Abstract
Background
Diet plays an important role in Alzheimer’s disease (AD) initiation,
progression and outcomes. Previous studies have shown individual
food-derived substances may have neuroprotective or neurotoxic effects.
However, few studies have systematically investigated the role of food
and food-derived metabolites in the development and progression of AD.
Methods
In this study, we systematically investigated 7596 metabolites and
identified AD-associated food metabolites using a novel network-based
approach. We constructed a context-sensitive network to integrate
heterogeneous chemical and genetic data, and to model context-specific
inter-relationships among foods, metabolites, human genes and AD.
Results
Our metabolite prioritization algorithm ranked 59 known AD-associated
food metabolites within the top 4.9% on average, which is significantly
higher than random expectation. Interestingly, a few top-ranked food
metabolites were specifically enriched in herbs and spices. Pathway
enrichment analysis shows that these top-ranked herb-and-spice
metabolites share many common pathways with AD, including the amyloid
processing pathway, which is considered a hallmark of AD-affected
brains and has pathological roles in AD development.
Conclusions
Our study represents the first unbiased systems approach to
characterizing the effects of food and food-derived metabolites in AD
pathogenesis. Our ranking approach prioritizes the known AD-associated
food metabolites, and identifies interesting relationships between AD
and the food group “herbs and spices”. Overall, our study provides
intriguing evidence for the role of diet, as an important environmental
factor, in AD etiology.
Keywords: Food metabolite, Network analysis, Alzheimer’s disease,
Disease prevention
Background
Alzheimer’s disease (AD) is the sixth leading cause of death in the
United States and affected 5.3 million people there in 2015 [1]. Diet
plays an important role in disease development [2]. Epidemiological
studies have shown that higher adherence to a Mediterranean-type diet
is associated with lower risk for AD [3–5] and mild cognitive
impairment [6, 7]. Evidence suggests that improper dietary habits may
accelerate the progression of neuronal damage by increasing the
concentration of pro-inflammatory mediators [8, 9]. In addition, a
number of experimental studies have investigated individual
food-derived substances, such as resveratrol [10], vitamin [11], and
advanced glycation end products [12], and demonstrated their
neuroprotective or neurotoxic effects. A systematic study of food
metabolites and their associations with AD may offer insights into the
disease-environment relationship and disease prevention, but this area
remains unexplored.
Knowledge of metabolites and their interactions with disease-associated
proteins has been obtained through in vitro, in silico, and in vivo
technologies [13]. Most previous studies used these data to understand
drug actions [14, 15]. Recently, large amounts of data have also
accumulated on food metabolites (Fig. 1). The Human Metabolome Database
(HMDB) [16] provides high-quality and comprehensive information for
74,462 metabolites, including their chemical, biological, and physical
properties; these metabolites can be linked to foods using the
large-scale food constituent resource in the Food Database (FooDB)
[17], which covers detailed compositional information for 907 foods. In
addition, the interactions between metabolites and human proteins are
available in chemical-protein interaction databases, such as the Search
Tool for Interactions of Chemicals (STITCH) [18]. Here, we developed a
network-based approach to integrate food metabolites with foods and
human proteins, and performed a systematic, unbiased study to identify
AD-associated food metabolites.
Fig. 1. Link disease, chemical and genetic data to infer the food
metabolites related with AD
Network-based approaches have been widely used in biomedical
applications, such as predicting disease-gene associations [19–21],
understanding disease comorbidity [22], and drug repurposing [23–25].
Traditional biomedical networks often model the relationships between
two nodes based on pairwise similarities [26, 27]. For example, disease
networks have been constructed using different similarity definitions:
some quantified disease-disease similarity based on shared phenotypes
[26, 27], and others used shared genetic factors [28]. These networks
capture only the strength of the links and ignore their semantic
meaning. Real-world interconnections, however, are multi-typed. In our
problem, two metabolites may share commonalities because they are
contained in the same food or because they interact with the same
protein. We recently introduced a novel concept, the context-sensitive
network [29], which preserves the context of how nodes are connected in
the network. In a disease-gene prediction study, our experimental
results demonstrated that the context-sensitive disease network
performed significantly better than the similarity-based disease
network [29]. Further analysis showed that the similarity-based network
tends to contain noise and to bury the true signals in a much denser
network structure than the context-sensitive network [29]. Motivated by
the benefits of context-sensitive networks, we construct a
gene-metabolite-food (GMF) network in this study to model the complex
relationships among foods, metabolites, human proteins, and AD by
integrating the heterogeneous databases in Fig. 1. We then use this
network to predict the food metabolites that are highly associated with
AD, and further investigate the pathways shared between AD and the
prioritized food metabolites. Because no gold standard exists, we
tested our approach in AD by manually curating a list of known
AD-associated food metabolites. To the best of our knowledge, our study
represents the first effort to systematically model the
context-sensitive interactions among tens of thousands of human genes,
food metabolites, foods, and diseases, and to understand which foods
and food-derived metabolites are involved in disease development and
how. In summary, identifying food and food-derived metabolites and
understanding their role as key mediators in promoting or protecting
against human diseases will enable new possibilities for disease
understanding, diagnosis, prevention, and treatment.
Methods
Our study consists of four steps (Fig. 2): first, we construct the GMF
network using the databases in Fig. 1; second, we prioritize
AD-associated metabolites using a network-based ranking algorithm with
AD-causing genes as input; third, we evaluate the metabolite ranking
using the known disorder-metabolite associations provided by HMDB; and
finally, we investigate the common pathways shared by AD and top-ranked
food metabolites to gain insights into how the metabolites affect AD.
The following subsections describe each step in detail.
Fig. 2. Four steps of our study: (1) GMF network construction (blue
nodes: genes; green nodes: metabolites; orange nodes: food); (2)
metabolite ranking using a network-based ranking algorithm; (3)
evaluation of the metabolite ranking; and (4) investigation of the
common pathways between AD and prioritized food metabolites
GMF network construction
We construct a context-sensitive network to model the interconnections
among foods, metabolites, and human genes. We first extract the three
types of nodes for the network. The metabolite nodes are extracted from
HMDB [16]. The gene nodes are obtained from the HUGO Gene Nomenclature
Committee (HGNC) [30] and labeled by approved gene names. For food
nodes, we extract food names from FooDB [17] and normalize these
strings using the unique identifier assigned by the database. We then
use the “group” information provided in FooDB for each food to further
clean the food names: we exclude the foods in the group of “dishes”,
such as “pizza” and “meatball”, which contain complex and uncertain
components, and remove the food names that are high-level food group
names, such as “herbs and spices”, “fruits”, and “green vegetables”.
Next, we identify three types of edges for the network:
metabolite-gene, metabolite-food, and gene-gene links. The
metabolite-food edges are extracted from FooDB: we aligned the unique
metabolite identifiers provided by FooDB to the metabolite names in
HMDB. We analyzed the distribution of food metabolites (Fig. 3). Each
food is associated with 78 metabolites on average, and 95% of the
metabolites are linked to fewer than 20 foods. The metabolite-gene
connections are extracted from the STITCH [18] database: we link the
metabolite names to PubChem compound identifiers, which are linked to
interacting genes in STITCH. In addition, genes are connected to other
gene nodes via the protein-protein interactions extracted from the
Search Tool for the Retrieval of Interacting Genes/Proteins (STRING)
[31]. Since protein-protein interactions in STRING and metabolite-gene
interactions in STITCH carry confidence scores provided by their
respective databases, we establish weighted gene-gene and
gene-metabolite edges and normalize the weights into the range [0,1].
Table 1 shows the size of the entire GMF network and the numbers of
nodes and edges of each kind.
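The construction described above can be illustrated with a minimal
sketch. It is not the authors' code. It assumes that raw confidence
scores lie on a 0 to 1000 scale and are normalized by dividing by the
maximum, and that metabolite-food edges carry unit weight; the text
states neither detail. The function name build_gmf_adjacency and all
node names and scores are hypothetical.

```python
import numpy as np


def build_gmf_adjacency(genes, metabolites, foods,
                        gene_gene, gene_metabolite, metabolite_food,
                        max_score=1000.0):
    """Build a symmetric weighted adjacency matrix for a toy GMF network.

    gene_gene and gene_metabolite hold (node_a, node_b, raw_score) triples;
    metabolite_food holds (metabolite, food) pairs.
    """
    # Typed node keys keep a gene and a metabolite with the same name apart.
    nodes = ([("gene", g) for g in genes]
             + [("metabolite", m) for m in metabolites]
             + [("food", f) for f in foods])
    index = {node: i for i, node in enumerate(nodes)}
    adjacency = np.zeros((len(nodes), len(nodes)))

    def connect(a, b, weight):
        i, j = index[a], index[b]
        adjacency[i, j] = adjacency[j, i] = weight

    # Assumption: confidence scores are scaled into [0, 1] by max_score.
    for a, b, score in gene_gene:
        connect(("gene", a), ("gene", b), score / max_score)
    for g, m, score in gene_metabolite:
        connect(("gene", g), ("metabolite", m), score / max_score)
    # Assumption: metabolite-food edges are unweighted (weight 1).
    for m, f in metabolite_food:
        connect(("metabolite", m), ("food", f), 1.0)
    return index, adjacency


# Toy input: names and scores are invented for illustration only.
index, A = build_gmf_adjacency(
    genes=["G1", "G2"], metabolites=["M1"], foods=["F1"],
    gene_gene=[("G1", "G2", 900)],
    gene_metabolite=[("G1", "M1", 450)],
    metabolite_food=[("M1", "F1")],
)
```

A dense matrix is used only to keep the toy example short; a network
with millions of edges would call for a sparse representation.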
Fig. 3. Distribution of (1) the number of metabolites for each food,
and (2) the number of food associated with each metabolite
Table 1. Number of nodes and edges in the gene-metabolite-food (GMF)
network

         Node/edge type      Number
Nodes    Gene nodes          18,338
         Metabolite nodes    7596
         Food nodes          790
         Total               26,724
Edges    Gene-gene           7,869,282
         Gene-metabolite     210,405
         Metabolite-food     62,216
         Total               8,141,903
Metabolite ranking algorithm
We first extracted from the Online Mendelian Inheritance in Man (OMIM)
database all 14 genes associated with AD [32], and set the
corresponding gene nodes in the GMF network as the “seeds.” Then we
rank the nodes in the GMF network using the random walk model, which
assumes that a walker starts from the seeds and randomly jumps to
neighboring nodes. We iteratively update a score for each node, defined
as the probability that the walker reaches the node from the seeds:
$$ p_i = (1 - \gamma) M p_{i-1} + \gamma p_0, \tag{1} $$
where M is the transition matrix, γ is the probability of restarting
from the seeds, p_i is the vector of node scores at iteration i, and
p_0 consists of the initial scores for all nodes. Here, the initial
score is 1/14 for each seed and zero for all other nodes; thus all the
scores add up to 1. The transition matrix M is the adjacency matrix of
the GMF network after column-wise normalization. We set the restarting
probability γ to 0.7; the algorithm is insensitive to the choice of γ.
We consider the algorithm converged when the difference ε in scores
between consecutive iterations falls below 10^−8. After the algorithm
converges, we extract the metabolite nodes and rank them by their
scores.
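The iteration in Eq. 1 can be sketched in a few lines. This is an
illustration under stated assumptions, not the authors' implementation:
the text does not say which norm measures the difference between
iterations, so the L1 norm is assumed here, and the function name
random_walk_with_restart, the max_iter cap, and the three-node toy
graph are hypothetical.

```python
import numpy as np


def random_walk_with_restart(adjacency, seed_indices, gamma=0.7,
                             tol=1e-8, max_iter=1000):
    """Iterate p_i = (1 - gamma) * M @ p_{i-1} + gamma * p_0 until stable."""
    A = np.asarray(adjacency, dtype=float)
    # Column-wise normalization turns the adjacency matrix into M.
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # leave columns of isolated nodes at zero
    M = A / col_sums
    # Seeds share the initial probability mass equally (1/14 each in the
    # paper, where there are 14 seed genes).
    p0 = np.zeros(A.shape[0])
    p0[list(seed_indices)] = 1.0 / len(seed_indices)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - gamma) * (M @ p) + gamma * p0
        # Assumption: convergence is judged by the L1 norm of the change.
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy path graph 0 - 1 - 2 with node 0 as the only seed.
toy = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
scores = random_walk_with_restart(toy, seed_indices=[0])
```

On the toy graph the scores decrease with distance from the seed, which
is the behavior the ranking relies on: nodes closer to the AD seed
genes receive higher scores.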
Evaluation of metabolite ranking
HMDB provides metabolite-disorder associations curated from literature.
We extract a total of 81 AD-associated metabolites from HMDB, 59 of
which appear in STITCH with associated genes. We used these 59
metabolites as the evaluation set. Most of the metabolite-AD
associations were identified in previous animal model or human cell
line studies. Although the 59 metabolites are not a perfect gold
standard for AD-associated metabolites, we consider them positive
examples that show relevance to AD and test whether they tend to be
ranked highly by our approach.
We calculate the mean and median ranks of the 59 metabolites in our
ranking. We also plot the precision-recall curve and calculate the
average precision across all recall levels, treating the top k
retrieved metabolites as positives. The evaluation metrics are compared
between our approach and random cases. A purely random ranking results
in an expected mean rank of 50% for the 59 metabolites. Here, we
generate random rankings by randomly selecting the seeds on the GMF
network. By comparing our ranking of the evaluation set with the
randomized cases, we test whether the top-ranked metabolites were
prioritized by chance.
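As a rough illustration of these metrics, the sketch below computes
percentile ranks and average precision for a scored list. It assumes
the standard definition of average precision (the mean of the precision
values at the position of each positive), which the text does not spell
out; the function names and the toy scores are hypothetical.

```python
from statistics import mean, median


def percentile_ranks(scores, positives):
    """Return the rank of each positive as a percentage of the list length.

    scores maps item -> score; the highest score gets rank 1.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    position = {item: rank for rank, item in enumerate(ordered, start=1)}
    return [100.0 * position[item] / len(ordered) for item in positives]


def average_precision(scores, positives):
    """Mean of precision@k over every k at which a positive is retrieved."""
    positives = set(positives)
    ordered = sorted(scores, key=scores.get, reverse=True)
    hits, total = 0, 0.0
    for k, item in enumerate(ordered, start=1):
        if item in positives:
            hits += 1
            total += hits / k
    return total / len(positives) if positives else 0.0


# Toy ranking: four items, two of them treated as known positives.
toy_scores = {"a": 0.9, "b": 0.5, "c": 0.3, "d": 0.1}
toy_positives = ["a", "c"]
ranks = percentile_ranks(toy_scores, toy_positives)
mean_rank, median_rank = mean(ranks), median(ranks)
ap = average_precision(toy_scores, toy_positives)
```

Running the same functions on rankings produced from randomly chosen
seeds would give the baseline against which the real ranking is
compared.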
Pathway analysis for top-ranked food metabolites
Only part of the 7596 metabolites are actually linked to food nodes
based on the FooDB data in the GMF network. In addition, many of the
food metabolites are components of hundreds of different foods. We
first extract the metabolites that were identified in fewer than ten
foods. Then we identify the significantly enriched pathways for each
top-ranked food-specific metabolite: we import the
metabolite-interacting genes into QIAGEN’s Ingenuity Pathway Analysis
software (IPA®, QIAGEN Redwood City,
https://www.qiagenbioinformatics.com/products/ingenuity-pathway-analysis/)
and download the significant canonical pathways. To compare the
pathways for prioritized metabolites and AD, we also identified
significant pathways for AD using the 14 AD-associated genes from OMIM.
We developed a method to rank the common significant pathways between
AD and each prioritized metabolite. Intuitively, we intended to
prioritize the pathways that are highly enriched for both AD- and
metabolite-associated genes. The IPA software provides a coverage score
for each AD- or metabolite-associated significant pathway; the score
measures the percentage of AD- or metabolite-associated genes in each
pathway. We design a score for each common pathway between AD and a
metabolite to ensure the balanced coverage:
$$ s = \frac{c_{AD} \times c_m}{c_{AD} + c_m}, \tag{2} $$
where c_AD and c_m are the coverage of the AD-associated genes and the
metabolite-associated genes, respectively. The score was inspired by
the definition of the F1 measure, which measures a test’s accuracy by
considering precision and recall at the same time. Last, we examine the
top-ranked common significant pathways between AD and each metabolite
based on the balanced score.
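The balanced score in Eq. 2 can be illustrated with a short sketch. The
coverage values would come from the IPA output; here the pathway names
and numbers are invented placeholders, and the function names are
hypothetical.

```python
def balanced_score(c_ad, c_m):
    """Eq. 2: product of the two coverages divided by their sum."""
    if c_ad + c_m == 0:
        return 0.0  # no coverage on either side
    return (c_ad * c_m) / (c_ad + c_m)


def rank_common_pathways(ad_coverage, metabolite_coverage):
    """Score pathways significant for both AD and a metabolite, best first."""
    common = set(ad_coverage) & set(metabolite_coverage)
    scored = {name: balanced_score(ad_coverage[name], metabolite_coverage[name])
              for name in common}
    return sorted(scored.items(), key=lambda pair: pair[1], reverse=True)


# Placeholder coverages (fractions of genes), invented for illustration.
ad_cov = {"pathway_A": 0.5, "pathway_B": 0.9, "pathway_C": 0.4}
met_cov = {"pathway_A": 0.5, "pathway_B": 0.1, "pathway_D": 0.7}
ranking = rank_common_pathways(ad_cov, met_cov)
```

With these placeholder values, pathway_A (0.5 and 0.5) scores 0.25 and
outranks pathway_B (0.9 and 0.1), which scores about 0.09 even though
both pairs sum to 1. This is the balancing effect the score is meant to
provide, analogous to the harmonic mean behind the F1 measure.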
Results
Metabolite ranking based on the context-sensitive GMF network is supported
by existing knowledge
On average, our approach ranked the 59 known AD-associated food
metabolites in the top 4.9% among the 5192 food metabolites in the GMF
network (metabolite nodes that have connections to food nodes).
Compared with the randomized rankings (generated with random seeds
placed on the GMF network), we achieved a significantly higher mean
rank (p < e-12, Student’s t-test) and median rank (p < e-14, Wilcoxon
rank-sum test). Also, 55 out of the 59 (93%) positive examples of
AD-associated metabolites were ranked within the top 10%. In addition,
the precision-recall curve in Fig. 4 demonstrates better performance of
our ranking compared with the randomized rankings; the mean average
precision calculated from the precision-recall curve is also
significantly higher than in the random case (Table 2, p < e-8).
Together, the results demonstrate that our ranking of the food
metabolites was able to prioritize relevant compounds for AD. Note that
our ranking algorithm is unbiased and did not use any prior knowledge
about the known AD-associated food metabolites.
Fig. 4. Precision recall curve for GMF network ranking algorithm for
food-contained metabolites and the average of 100 random rankings
Table 2. Performance of metabolite ranking using the reduced GMF
network compared with the average performance of random rankings

Ranking               Mean rank   Median rank   Mean average precision
GMF network ranking   4.9%        1.9%          0.287
Randomized ranking    11.4%       18.2%         0.093
Besides the ranking of metabolites, our approach also automatically
generated a ranking of all foods based on the strength of their
associations with AD. We grouped the foods into categories and ranked
the categories by the average rank of the foods in each category. The
ranking shows a trend that high-fiber foods, such as grains, vegetables
and legumes, tend to have higher scores than meats, sweets and milk
products. Interestingly, our ranking approximately follows the
Mediterranean diet pyramid, which suggests an eating pattern with many
healthy grains, fruits, vegetables, beans and nuts, and small amounts
of dairy, red wine and meats [33] (Fig. 5). Here, the ranking of food
categories only reflects the average rank of the foods in each class,
and individual foods in low-ranked food categories may also contain
metabolites that are closely relevant to AD. Next, we specifically
examined the top-ranked food metabolites.
Fig. 5. Mediterranean diet pyramid and food category ranking based on
the GMF network
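The category ranking behind Fig. 5 averages the ranks of the foods in
each category. A minimal sketch of that aggregation follows; the
function name, food names, categories, and rank values are invented for
illustration and are not taken from the study.

```python
from collections import defaultdict
from statistics import mean


def rank_food_categories(food_rank, food_category):
    """Order categories by the mean rank of their foods (lower is better)."""
    ranks_by_category = defaultdict(list)
    for food, rank in food_rank.items():
        ranks_by_category[food_category[food]].append(rank)
    averaged = {cat: mean(ranks) for cat, ranks in ranks_by_category.items()}
    return sorted(averaged.items(), key=lambda pair: pair[1])


# Hypothetical food ranks (1 = most strongly associated with the seeds).
toy_food_rank = {"food_a": 1, "food_b": 3, "food_c": 2, "food_d": 6}
toy_food_category = {"food_a": "cat_x", "food_b": "cat_x",
                     "food_c": "cat_y", "food_d": "cat_y"}
category_ranking = rank_food_categories(toy_food_rank, toy_food_category)
```

Because only the average is kept, a low-ranked category can still
contain an individual food with a high rank, which is the caveat noted
in the text.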
Top-ranked food metabolites contain interesting candidates of AD-associated
compounds
Many top-ranked metabolites are common nutrients found in hundreds of
different foods, such as calcium and glycerol. Here, we focus on the
unique metabolites that were exclusively identified in several specific
foods or food categories. Table 3 lists the top-ranked food metabolites
that were identified in fewer than ten foods. Seven out of ten
metabolites were constituents of “healthy foods,” which include fruits,
vegetables, grains, nuts and legumes. Among them, tetramethylpyrazine
has been shown to exhibit neuroprotective effects in rats [34], and
resveratrol is a widely known nutritional supplement with a number of
beneficial health effects, such as anti-cancer [35], antiviral [36],
neuroprotective [37–40], anti-aging [41], anti-inflammatory [42],
cardioprotective [43], and life-prolonging effects. Among the top ten
food-specific metabolites, only 4-hydroxynonenal was in the evaluation
set of food metabolites known to be associated with AD based on the
HMDB data. This shows that many AD-associated food metabolites may not
yet be included in existing databases. The ultimate goal of our study
is to identify these new relevant food metabolites, which may shed
light on disease prevention.
Table 3. Top-ranked unique metabolites that were found in less than ten
foods

Metabolite            Food group                     Rank among all
estradiol             fruits, legumes                0.12%
tetramethylpyrazine   fruits, vegetables             0.18%
resveratrol           fruits, nuts                   0.22%
theophylline          fruits                         0.46%
chloroform            herbs and spices               0.47%
4-hydroxynonenal      legumes                        0.55%
capsaicin             herbs and spices               0.62%
chlorine              fruits, vegetables             0.68%
emodin                herbs and spices, vegetables   0.75%
xylene                nuts, grains                   0.76%
Surprisingly, we found that three metabolites in Table 3 are uniquely
identified in the group of “herbs and spices”. Previous studies point
out that the incidence of neurodegenerative diseases among people
living in the Asian subcontinent, where people regularly consume
spices, is much lower than in countries of the western world [44]. In
addition, both in vitro and in vivo studies have indicated that
nutraceuticals derived from herbs and spices, such as red pepper, black
pepper, ginger, garlic, and cinnamon, target inflammatory pathways and
may help prevent neurodegenerative diseases [45, 46]. We filtered our
metabolite ranking and systematically extracted the compounds that are
specifically found in herbs and spices. Table 4 lists the top ten
spice-specific metabolites. Among these chemicals, capsaicin has been
studied in animal models to investigate whether it may attenuate memory
impairment [47, 48]. Next, we systematically investigated the pathways
targeted by the top AD-associated spice-specific metabolites.
Table 4. Top-ranked herbs and spices specific metabolites

Metabolite                           Food                                                 Rank among all
chloroform                           spearmint                                            0.6%
capsaicin                            ginger, pepper (C. frutescens), pepper (C. annuum)   0.79%
2,6-di-tert-butyl-4-methylphenol     soft-necked garlic                                   1.16%
sesamol                              sesame, fats and oils                                1.89%
desmosterol                          cardamom, soy bean                                   2.56%
santene                              parsley, rosemary, cornmint                          3.18%
1-piperidinecarboxaldehyde           herbs and spices, pepper (spice)                     3.28%
p-menthan-3-ol                       herbs and spices                                     4.5%
sanguinarine                         opium poppy                                          4.77%
1,1,1,3,3,3-hexachloro-2-propanone   herbs and spices                                     5.26%
Top-ranked spice-specific metabolites share significant pathways with AD
We identified 58 significantly enriched pathways for AD, and found that
each top-ranked herb-and-spice metabolite has many overlapping pathways
with AD. Figure 6 shows the overlapping pathways that are most enriched
for both AD- and metabolite-associated genes. Importantly, we found
that amyloid processing (highlighted in Fig. 6) appears repeatedly
among the enriched pathways for herb-and-spice metabolites. The
accumulation of the beta-amyloid protein is a major neuropathological
hallmark in AD-affected brains and has a pathological role in AD [49].
The pathway analysis supports the hypothesis that the identified
herb-and-spice metabolites are potentially involved in the development
of AD. Other AD-involved pathways, including melatonin degradation
[50], neuroprotective role of THOP1 [51], and Reelin signaling in
neurons [52], were also found enriched for the herb-and-spice
metabolite-interacting genes. As a control, we also investigated the
pathways for guanosine 2′,3′-cyclic phosphate, which is a food
metabolite ranked at the bottom by our approach; the metabolite has no
overlapping pathways with AD.
Fig. 6. Overlapping pathways between the top-ranked herbs and spices
specific metabolites and AD
Discussion
We developed a novel context-sensitive network approach to analyze
interactions among food, food metabolites, host genetics and pathways
in the context of specific diseases. In this study, we use the approach
to identify relevant food metabolites for AD, which is a complex
disease affected by both genetic and environmental factors. Our study
provides intriguing evidence for the role of diet, as an important
environmental factor, in AD etiology. We also provide hypotheses for
subsequent biological and clinical studies of host-environment
interactions in AD. Due to the lack of a gold standard (i.e., known
food metabolites for many diseases), we did not test our algorithm on
other diseases. However, our approach is not biased towards AD; it is
highly generic and can be applied to other diseases.
The future work of this study includes the following aspects. (1) We
will test and apply the algorithms to other food-related diseases, such
as cancers, inflammatory bowel diseases, and allergy. (2) We will
further classify food metabolites as neuroprotective or neurotoxic. As
more detailed and quantitative data become available, we will be able
to classify the effects of food metabolites as AD-promoting or
protective. (3) We constructed a network that contains gene, food, and
metabolite nodes in this study. Other types of data, such as
disease-phenotype relationships and disorder-metabolite associations in
HMDB, may also be helpful in inferring AD-associated food metabolites.
However, the usefulness of these data requires further evaluation. In
the future, we will investigate
effective approaches to rationally integrate more comprehensive data to
predict AD-affecting food metabolites. (4) We will further improve the
prediction algorithm based on the context-sensitive networks. In social
network analysis, researchers have developed improved random walk
algorithms that consider the semantic meanings of the paths in networks
[53]. However, these approaches usually require prior knowledge or
sufficient training data to define or learn meaningful paths for the
random walker in the network; such knowledge and training data cannot
be easily obtained in most biomedical prediction scenarios. We will
explore new unsupervised algorithms that could take further advantage
of the context-sensitive networks. (5) In addition, we need further
validation of the prioritized AD-associated food metabolites and of how
they might affect AD. Currently, we investigated the common
significantly enriched pathways between AD and the prioritized
metabolites, and found that a few metabolites are involved in the
amyloid processing pathway. Amyloid processing is a major activity in
AD-affected brains and is involved in the cause of AD. The result
suggests that the top-ranked food metabolites are highly associated
with AD development. However, further validation through in vitro and
in vivo experiments is essential. (6) Finally, AD may be related to the
interactions of different food metabolites. More generally, other
environmental factors, including toxins, drugs, and the gut microbiome,
may also contribute to AD development. In our previous work, we have
studied brain-gut-microbiome connections in AD [54]. In the future, we
will develop approaches for identifying chemicals from other sources
that are associated with AD. We will also explore more complex
computational models to investigate the combined effects of multiple
environmental factors.
Conclusions
In summary, we developed a novel network-based approach to
understanding how food and food-derived metabolites are involved in
complex human diseases, and conducted an exploratory study in AD. The
identification of disease-associated food metabolites and their
underlying pathways may provide insights into disease mechanisms and
offer opportunities for disease prevention and treatment.
Acknowledgements