Abstract
Objective
Alzheimer’s disease (AD) is a severe neurodegenerative disorder and has
become a global public health problem. Despite intensive research, the
pathophysiology of AD remains unclear. Comorbid diseases often share
overlapping patterns of genetic markers, which may point to a common
etiology and suggest essential protein targets. The US Food and Drug
Administration (FDA) Adverse Event Reporting System (FAERS) collects
large-scale postmarketing surveillance data that provide a unique
opportunity to investigate disease co-occurrence patterns. We aimed to
construct a heterogeneous network that integrates a disease comorbidity
network (DCN) derived from FAERS with a protein–protein interaction
(PPI) network, and to prioritize AD risk genes using a network-based
ranking algorithm.
Materials and Methods
We built a DCN from FAERS indication data using association rule
mining. The DCN was then integrated with a PPI network, and we used the
random walk with restart ranking algorithm to prioritize AD risk genes.
Results
We evaluated the performance of our approach using AD risk genes
curated from genetic association studies. Our approach achieved an area
under the receiver operating characteristic curve of 0.770. The top 500
ranked genes showed 5.53-fold enrichment for known AD risk genes
compared with random expectation. Pathway enrichment analysis of the
top-ranked genes revealed that two novel pathways, the ERBB and
coagulation pathways, might be involved in AD pathogenesis.
Conclusion
We leveraged FAERS, a comprehensive data resource for FDA
postmarket drug safety surveillance, for large-scale AD comorbidity
mining. This exploratory study demonstrated the potential of
disease comorbidity mining from FAERS in AD genetics discovery.
Keywords: Alzheimer’s disease, FAERS, disease comorbidity network,
protein–protein interaction, disease gene discovery
Introduction
Alzheimer’s disease (AD) is a debilitating neurodegenerative disorder
characterized by the progressive loss of cholinergic neurons, leading
to the onset of severe behavioral, motor, and cognitive impairments. An
estimated 5.4 million Americans have AD. It is the sixth leading cause
of death in the United States and the fifth leading cause of death in
Americans age ≥ 65 years. Between 2000 and 2013, deaths from AD
increased 71%.[26]^1 Despite intensive research, the etiology of AD
remains unclear.
Computational approaches have been widely used in disease gene
discovery.[27]^2^,[28]^3 Network-based algorithms use relationships
between diseases to prioritize candidate disease genes, so the key step
is constructing those disease relationships. Disease manifestations and
electronic medical records (EMRs) have been used for this purpose. For
example, we constructed a disease
manifestation network (DMN) to predict novel genes for Parkinson’s
disease.[29]^4 Bagley et al. discovered new genes for autoimmune
disorder and neuropsychiatric disorder using EMR.[30]^5 Disease
comorbidity often associates diseases with overlapping patterns of
genetic markers[31]^5^,[32]^6 and several comorbidity networks have
also been built.[33]^7–9 Recently, disease trajectory relationships
were also established from EMR data on 6.2 million patients.[34]^10
However, these networks are biased toward specific
populations[35]^7 or a single medical center[36]^9 and have not
been used in disease gene discovery.
The FDA Adverse Event Reporting System (FAERS) contains adverse event
reports from manufacturers, consumers, and healthcare professionals for
all marketed drug and therapeutic biologic products. It is a
large-scale database of seven linked data files representing patient
demographics, drugs, indications, outcomes, reactions, therapies, and
reporting sources.[37]^11 FAERS data have been used intensively in drug
safety studies, but other possible uses have not been explored. We
noticed that each case report in the indication data lists all drugs in
use, and the diseases they were used for, at the time the adverse event
occurred, which essentially reflects the co-occurring diseases in an
individual. Based on this observation, we explored the use of FAERS for
disease comorbidity studies. Compared with EMRs, the indication data of
FAERS have several advantages. First, all co-occurring diseases
reported in FAERS are treated by drugs, which helps to reduce noise in
the disease data. Second, the large scale of FAERS makes the data
unbiased with respect to specific diseases. Third, FAERS provides a
unified reporting system at the whole-population level, which can avoid
the potential bias of EMRs toward specific populations or discrepancies
across health care systems.[38]^12^,[39]^13
In this study, we used association rule mining on this large-scale data
set to construct a disease comorbidity network (DCN). One advantage of
this method is that it can flexibly detect comorbidities involving
multiple diseases, which are common in clinical settings.[40]^14 The
DCN was further integrated with a protein–protein interaction (PPI)
network. We used network and functional analyses to reveal novel genes
and pathways for AD.
METHODS
Our overall methods are shown in [41]Figure 1. First, we used
association rule mining to construct a DCN from FAERS. Second, we
constructed a heterogeneous network by integrating the DCN with a PPI
network. Third, we used random walk with restart to prioritize AD risk
genes and evaluated the performance of our method by de novo
prediction of a validation gene set from the AlzGene database. Fourth,
we used AD as the seed to prioritize new AD risk genes. Finally, we
performed pathway analysis using the top-ranked genes to discover novel
pathways that might be involved in AD pathogenesis.
Figure 1.
Overview of our method. ARM: association rule mining; DCN: disease
comorbidity network; PPI: protein–protein interaction.
Data
FAERS data, which contain 17 305 542 case reports for indications from
2004 to 2017, were downloaded from the US Food and Drug Administration
(FDA).[44]^11 Disease genetic data were extracted from Online Mendelian
Inheritance in Man (OMIM). The OMIM catalog contains 15 462
disease–gene associations for 8832 genes and 6018
diseases/traits.[45]^15 Protein–protein interactions were obtained from
the STRING database, which contains 1 380 504 interactions for 17 860
genes.[46]^16 The AlzGene database collects AD risk genes (679 genes)
derived from comprehensive genetic association studies.[47]^17
Construction of disease comorbidity network
Data processing
Indication files in FAERS from 2004 to 2017 were used in this study to
explore disease comorbidity patterns. After removing reports with
unknown indications, the data contained 6 480 372 case reports
representing 15 721 drug indications. [48]Table 1 shows sample
indication data for one patient, who was treated with 9 drugs for
different diseases/symptoms.
Table 1.
Sample indication data for one patient
Primary_id  Case_id   Drug_seq  Drug                       Indication
131970402   13197040  1         Trifluridine               Adenocarcinoma of colon
131970402   13197040  2         Irinotecan                 Adenocarcinoma of colon
131970402   13197040  3         Bevacizumab                Adenocarcinoma of colon
131970402   13197040  4         Fentanyl                   Back pain
131970402   13197040  5         Acetaminophen              Back pain
131970402   13197040  6         Ondansetron hydrochloride  Prophylaxis of nausea and vomiting
131970402   13197040  7         Levothyroxine sodium       Hypothyroidism
131970402   13197040  8         Rivaroxaban                Deep vein thrombosis
131970402   13197040  9         Dexamethasone              Prophylaxis of nausea and vomiting
Note: Primary_id links a record to the other FAERS data files. Case_id
identifies the patient.
Indications in FAERS are represented as Medical Dictionary for
Regulatory Activities (MedDRA) terms.[50]^18 In order to facilitate
downstream analysis, we mapped indication terms into Unified Medical
Language System (UMLS)[51]^19 using MetaMap (2016 V2 release).[52]^20
Considering these indications include not only diseases, but also
treatment procedures, etc., we constrained the mapping to 12 semantic
types that are categorized as disorders in UMLS, including Acquired
Abnormality, Anatomical Abnormality, Cell or Molecular Dysfunction,
Congenital Abnormality, Disease or Syndrome, Experimental Model of
Disease, Finding, Injury or Poisoning, Mental or Behavioral
Dysfunction, Neoplastic Process, Pathologic Function, and Sign or
Symptom. In total, 12 225 of the 15 721 indication terms (77.76%) were
mapped. The cleaned data set contains 6211 disorders and 5 784 501 case
reports.
We then summarized the data at the patient level, that is, each row
represents the co-occurring disorders in one patient. For example, the
patient in [53]Table 1 has multiple diseases, including adenocarcinoma
of colon, back pain, prophylaxis of nausea and vomiting,
hypothyroidism, and deep vein thrombosis, which together form
one record in our data set.
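The patient-level summarization described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `disorders_per_patient`, the `(case_id, indication)` tuple layout, and the lowercase normalization are assumptions, and the rows are the ones shown in Table 1.

```python
from collections import defaultdict


def disorders_per_patient(rows):
    """Collapse (case_id, indication) rows into one disorder set per case.

    Hypothetical helper: the row layout is an assumption for illustration.
    """
    by_case = defaultdict(set)
    for case_id, indication in rows:
        # A set removes repeats, e.g. three drugs given for the same disease.
        by_case[case_id].add(indication.strip().lower())
    return dict(by_case)


# Rows taken from the sample patient in Table 1 (one row per drug).
rows = [
    ("13197040", "Adenocarcinoma of colon"),
    ("13197040", "Adenocarcinoma of colon"),
    ("13197040", "Adenocarcinoma of colon"),
    ("13197040", "Back pain"),
    ("13197040", "Back pain"),
    ("13197040", "Prophylaxis of nausea and vomiting"),
    ("13197040", "Hypothyroidism"),
    ("13197040", "Deep vein thrombosis"),
    ("13197040", "Prophylaxis of nausea and vomiting"),
]

records = disorders_per_patient(rows)
print(sorted(records["13197040"]))
```

Each resulting set plays the role of one transaction for the association rule mining step.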
Disease comorbidity pattern calculation
We applied the Frequent Pattern-growth (FP-growth) algorithm
(implemented in Weka)[54]^21^,[55]^22 to these data to obtain disease
co-occurrence patterns. FP-growth is a widely used association rule
mining algorithm, and the choice of support and lift thresholds is a
tradeoff between precision and recall. We experimented with different
combinations of support and lift and evaluated the performance of
comorbidity mining against manually curated disease comorbidities
related to obesity, multiple sclerosis, and psoriasis. After
experimentation, we used support >12 and lift >1 and generated 20 101
rules. Each rule is a pattern between two sets of diseases, represented
in the form $\{X \Rightarrow Y\}$, for example,
$\{\text{anxiety},\ \text{diabetes mellitus, type 2} \Rightarrow \text{multiple sclerosis}\}$.
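To make the two thresholds concrete, the sketch below computes support and lift for a single candidate rule by brute-force counting. It is not FP-growth and not the Weka implementation; the function names are hypothetical, reading "support" as an absolute number of case reports is an assumption, and the four transactions are toy data.

```python
def support_count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = frozenset(items)
    return sum(1 for t in transactions if items <= t)


def lift(transactions, antecedent, consequent):
    """Lift of the rule antecedent => consequent.

    lift = P(X and Y) / (P(X) * P(Y)); values above 1 mean the two sides
    co-occur more often than expected under independence.
    """
    n = len(transactions)
    both = support_count(transactions, set(antecedent) | set(consequent))
    x = support_count(transactions, antecedent)
    y = support_count(transactions, consequent)
    if x == 0 or y == 0:
        return 0.0
    return (both / n) / ((x / n) * (y / n))


# Toy transactions (one disorder set per patient), for illustration only.
transactions = [
    {"anxiety", "multiple sclerosis"},
    {"anxiety", "multiple sclerosis"},
    {"anxiety"},
    {"hypertension"},
]

rule_support = support_count(transactions, {"anxiety", "multiple sclerosis"})
rule_lift = lift(transactions, {"anxiety"}, {"multiple sclerosis"})
print(rule_support, rule_lift)
```

A rule would be kept only if both values exceed the chosen thresholds.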
Construct disease comorbidity network
We constructed an undirected and unweighted DCN based on these rules.
Nodes in the DCN included all diseases in the rules, and edges were
established between each pair of diseases on both sides of a rule. The
DCN contains 1538 diseases and 21 321 edges.
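One way to turn rules into an undirected, unweighted network is sketched below. The source does not spell out the pairing, so this sketch assumes that each disease on the left side of a rule is linked to each disease on the right side; linking all pairs within a rule would be another reading. `build_dcn` is a hypothetical name.

```python
from itertools import product


def build_dcn(rules):
    """Build an undirected, unweighted network from (antecedent, consequent) rules.

    Assumption: an edge joins every left-side disease to every right-side
    disease of the same rule. Edges are frozensets, so direction is ignored
    and duplicates collapse.
    """
    nodes, edges = set(), set()
    for antecedent, consequent in rules:
        nodes.update(antecedent)
        nodes.update(consequent)
        for a, b in product(antecedent, consequent):
            if a != b:
                edges.add(frozenset((a, b)))
    return nodes, edges


# The example rule quoted in the text.
rules = [
    (("anxiety", "diabetes mellitus, type 2"), ("multiple sclerosis",)),
]
dcn_nodes, dcn_edges = build_dcn(rules)
print(len(dcn_nodes), len(dcn_edges))
```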
Evaluation of performance for AD comorbidity
We considered the neighbor nodes of AD as its comorbidities and
obtained a comorbidity subnetwork for AD. To test the performance of
the DCN, we manually curated comorbidities of AD from the literature
and compared them with the comorbidities from the DCN. Precision and
recall were computed accordingly.
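The comparison against curated comorbidities reduces to set overlap. A minimal sketch follows; the function name is hypothetical and the two small sets are toy inputs, not the curated list used in the study.

```python
def precision_recall(predicted, known):
    """Precision and recall of predicted comorbidities against a curated set."""
    predicted, known = set(predicted), set(known)
    true_positives = len(predicted & known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    return precision, recall


# Toy example: neighbors of a disease in the network vs a curated list.
network_neighbors = {"depression", "hypertension", "gout"}
curated = {"depression", "hypertension", "glaucoma", "osteoporosis"}

precision, recall = precision_recall(network_neighbors, curated)
print(precision, recall)
```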
Construction of a heterogeneous network by integration of disease comorbidity
network and protein–protein interaction network
The DCN was integrated with the PPI network through the disease–gene
association network from OMIM. Diseases in both the DCN and OMIM were
mapped to UMLS to enable the connection.
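The integration can be pictured as one adjacency structure holding three kinds of edges. The sketch below is an assumption about data layout, not the authors' code: nodes are tagged `("disease", name)` or `("gene", name)` so that identifiers from the two layers cannot collide, and all labels other than "AD" are placeholders.

```python
def build_heterogeneous_adjacency(dcn_edges, disease_gene_edges, ppi_edges):
    """Merge disease-disease, disease-gene, and gene-gene edges.

    Returns a dict mapping each tagged node to the set of its neighbors.
    """
    adjacency = {}

    def link(u, v):
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    for d1, d2 in dcn_edges:
        link(("disease", d1), ("disease", d2))
    for disease, gene in disease_gene_edges:
        link(("disease", disease), ("gene", gene))
    for g1, g2 in ppi_edges:
        link(("gene", g1), ("gene", g2))
    return adjacency


# Placeholder labels; none of these are real associations.
adjacency = build_heterogeneous_adjacency(
    dcn_edges=[("AD", "disease_B")],
    disease_gene_edges=[("disease_B", "gene_1")],
    ppi_edges=[("gene_1", "gene_2")],
)
print(sorted(adjacency[("disease", "disease_B")]))
```

In this toy network AD has no direct gene link, so it reaches `gene_1` only through its comorbid disease.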
Prioritization of candidate genes for AD
We used random walk with restart to prioritize AD candidate genes.
We used AD as the seed and prioritized genes according to their scores,
which represented the probability that each gene can be reached from
the seed at steady state. Assuming $p_0$ is a seed vector, the updated
score vector $p_k$ at step $k$ is defined:

$$p_{k+1} = (1-\gamma) M p_k + \gamma p_0, \tag{1}$$

where γ is the probability that the random walker restarts from the
seeds at each step, and M is the transition matrix of the entire
heterogeneous network, which contains two intranetwork transition
matrices on the diagonal and two internetwork transition matrices on
the off-diagonal defined below:

$$M = \begin{bmatrix} M_D & M_{DG} \\ M_{DG}^{T} & M_G \end{bmatrix}, \tag{2}$$

where $D$ and $G$ represent DCN and the genetic network, respectively.
The value of γ was set to 0.5 according to de novo prediction result
below and loop stopped when $\lVert p_{k+1} - p_k \rVert < 10^{-6}$,
indicating probability vector is stable.[56]^23
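The iteration in equation (1) can be sketched in plain Python as below. Several choices here are assumptions rather than details from the source: the whole adjacency matrix is column-normalized in one step instead of normalizing the blocks of equation (2) separately, the stopping rule uses the L1 norm, and the five-node chain (AD, a comorbid disease, and three genes) is a toy network.

```python
def column_normalize(adj):
    """Scale each column of a square matrix to sum to 1 (zero columns stay zero)."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def random_walk_with_restart(M, p0, gamma=0.5, tol=1e-6, max_iter=10000):
    """Iterate p_{k+1} = (1 - gamma) * M p_k + gamma * p0 until the change is below tol."""
    n = len(p0)
    p = list(p0)
    for _ in range(max_iter):
        nxt = [
            (1 - gamma) * sum(M[i][j] * p[j] for j in range(n)) + gamma * p0[i]
            for i in range(n)
        ]
        change = sum(abs(a - b) for a, b in zip(nxt, p))  # L1 norm (assumed)
        p = nxt
        if change < tol:
            break
    return p


# Toy chain: AD - disease_B - gene_1 - gene_2 - gene_3 (indices 0..4).
adj = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
M = column_normalize(adj)
seed = [1.0, 0.0, 0.0, 0.0, 0.0]  # all restart probability on AD
scores = random_walk_with_restart(M, seed, gamma=0.5)
print([round(s, 4) for s in scores])
```

Genes are then ranked by their entries in `scores`; in this chain the score falls with distance from the seed.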
Evaluation of predicted genes for AD
To evaluate our methods, we obtained a validation gene set from the
AlzGene database. Currently, there are 679 genes in this database,
which represents the largest AD risk gene set. We performed de novo
prediction to test how well our approach ranks these genes.
Specifically, we removed all edges between AD and its associated OMIM
genes. Then, we used random walk with restart to prioritize the AD risk
genes in the gene network. We evaluated the performance of our
algorithm in two ways.
First, we split the whole ranked gene list into 36 bins with a bin size
of 500 genes and investigated the distribution of validation genes
across the bins. We then calculated the fold enrichment of validation
genes in the top 500 ranked genes. To assess the statistical
significance of the enrichment, we randomly permuted all 17 860 genes
1000 times to generate random rankings. We then counted the number of
AD risk genes among the top 500 genes in each randomization to generate
the background distribution. The P-value and fold enrichment of our
ranking were calculated based on this distribution.
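The permutation procedure above can be sketched as follows. The function names, the fixed random seed, and the toy ranking are assumptions. The sketch returns an empirical P-value, which cannot fall below 1/(n_perm + 1); much smaller values would require a parametric fit to the background distribution, and the source does not state which form was used.

```python
import random
import statistics


def top_k_hits(ranking, validation, k):
    """Number of validation genes among the first k genes of a ranking."""
    return sum(1 for gene in ranking[:k] if gene in validation)


def fold_enrichment(ranking, validation, k=500, n_perm=1000, seed=0):
    """Observed top-k hits, fold enrichment over shuffled rankings, empirical P."""
    rng = random.Random(seed)
    observed = top_k_hits(ranking, validation, k)
    shuffled = list(ranking)
    background = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        background.append(top_k_hits(shuffled, validation, k))
    mean_background = statistics.mean(background)
    fold = observed / mean_background if mean_background else float("inf")
    empirical_p = (1 + sum(b >= observed for b in background)) / (n_perm + 1)
    return observed, fold, empirical_p


# Toy ranking of 200 genes with 20 validation genes, 10 of them in the top 20.
ranking = [f"g{i}" for i in range(200)]
validation = set(ranking[:10]) | set(ranking[100:110])
observed, fold, empirical_p = fold_enrichment(
    ranking, validation, k=20, n_perm=200, seed=0
)
print(observed, round(fold, 2), empirical_p)
```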
Second, we used different rank percentiles as thresholds to compute a
receiver operating characteristic (ROC) curve and a precision-recall
curve. Given a percentile, for example 5%, we considered all genes
ranked in the top 5% as positive predictions (AD risk genes, denoted
ADgenes) and the remaining 95% as negative predictions (non-AD risk
genes, denoted nADgenes). The true positive rate, false positive rate,
true negative rate, and false negative rate were defined by the
following formulas, where AlzGene and nAlzGene denote the sets of genes
in and not in the AlzGene database, respectively.

$$\text{True positive rate} = \frac{|\text{ADgenes} \cap \text{AlzGene}|}{|\text{AlzGene}|} \tag{3}$$

$$\text{False positive rate} = \frac{|\text{ADgenes} \cap \text{nAlzGene}|}{|\text{nAlzGene}|} \tag{4}$$

$$\text{True negative rate} = \frac{|\text{nADgenes} \cap \text{nAlzGene}|}{|\text{nAlzGene}|} \tag{5}$$

$$\text{False negative rate} = \frac{|\text{nADgenes} \cap \text{AlzGene}|}{|\text{AlzGene}|} \tag{6}$$

Once these values were calculated at each threshold, precision, recall,
specificity, and sensitivity were computed following the standard
definitions[57]^24, and the ROC and precision-recall curves were derived.
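A minimal sketch of the percentile-threshold ROC computation follows. The function names, the trapezoidal rule for the area, and the ten-gene toy ranking are assumptions; the sketch also assumes the ranking contains at least one validation gene and at least one non-validation gene.

```python
def roc_points(ranking, validation, percentiles):
    """(false positive rate, true positive rate) at each rank-percentile cutoff."""
    n = len(ranking)
    positives = sum(1 for gene in ranking if gene in validation)
    negatives = n - positives
    points = [(0.0, 0.0)]
    for pct in sorted(percentiles):
        k = round(n * pct / 100)  # genes in the top pct% are predicted positive
        true_pos = sum(1 for gene in ranking[:k] if gene in validation)
        false_pos = k - true_pos
        points.append((false_pos / negatives, true_pos / positives))
    if points[-1] != (1.0, 1.0):
        points.append((1.0, 1.0))
    return points


def auc(points):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy ranking: validation genes sit at ranks 1, 2, and 6 out of 10.
ranking = [f"g{i}" for i in range(10)]
validation = {"g0", "g1", "g5"}
points = roc_points(ranking, validation, percentiles=range(10, 101, 10))
area = auc(points)
print(round(area, 4))
```

With a cutoff at every rank, as in this toy case, the trapezoidal area equals the fraction of (validation, non-validation) gene pairs that are ordered correctly.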
Comparison of DCN with randomized disease network
To further test the usefulness of the DCN, we compared its performance
in predicting AD risk genes with that of randomized disease networks.
To generate such networks, we kept all disease nodes and the total
number of edges unchanged but assigned each edge to a random pair of
nodes. We generated 1000 such networks. Each network was then
integrated with the PPI network, and random walk with restart was used
to prioritize AD risk genes. We used the 679 genes from the AlzGene
database as the validation gene set to compute the area under the ROC
curve (AUC). The P-value of the AUC from the real DCN was computed
based on the normal distribution of AUCs from the 1000 randomized
networks.
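The randomization and the normal-tail P-value can be sketched as below. Assumptions in this sketch: random edges exclude self-loops and duplicates, the P-value is one-sided, and the standard deviation is estimated from the null AUCs with the sample formula. The function names and the five null AUC values are illustrative only.

```python
import math
import random
import statistics


def randomize_edges(nodes, n_edges, rng):
    """Draw n_edges distinct undirected edges between random pairs of nodes."""
    nodes = sorted(nodes)
    max_edges = len(nodes) * (len(nodes) - 1) // 2
    if n_edges > max_edges:
        raise ValueError("more edges requested than node pairs available")
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.sample(nodes, 2)  # two different nodes, so no self-loops
        edges.add(frozenset((a, b)))
    return edges


def normal_upper_tail_p(observed, null_values):
    """One-sided P-value of `observed` under a normal fit to `null_values`."""
    mu = statistics.mean(null_values)
    sigma = statistics.stdev(null_values)
    z = (observed - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))


rng = random.Random(1)
random_edges = randomize_edges(range(10), 15, rng)

# Illustrative null AUCs, not values from the study.
null_aucs = [0.62, 0.64, 0.66, 0.63, 0.65]
p_value = normal_upper_tail_p(0.77, null_aucs)
print(len(random_edges), p_value)
```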
Functional analysis of candidate genes for AD
We used the R package clusterProfiler (version 3.4.4)[58]^25 to perform
gene ontology analysis and gene set enrichment analysis to understand
the functions of the novel candidate genes obtained from our methods.
RESULTS
Disease comorbidity network captures known comorbidities of Alzheimer’s
disease
We extracted 20 101 comorbidity association rules from the indication
data of FAERS across thirteen years. The comorbidity network based on
these rules contains 1538 nodes and 21 312 edges. To obtain the
comorbidity subnetwork for AD, we considered all of its neighbor nodes
as comorbidities of AD. [59]Figure 2A shows the extracted comorbidity
network of AD. In total, 98 comorbidities were found in our network,
including five psychiatric disorders, such as depression and anxiety
disorder, and many nonpsychiatric disorders, such as hypertension and
diabetes mellitus, type 2.
Figure 2.
Comorbidity network of Alzheimer’s disease. (A) Diseases are
represented as nodes, and the size of each node is proportional to its
degree. Node color represents the disorder class (System Organ Class,
SOC, in MedDRA) to which it belongs (yellow nodes indicate psychiatric
disorders). Edges between nodes represent the co-occurrence of
diseases. (B) Precision and recall for AD comorbidities from the DCN.
To test the performance of our network, we compared comorbidities of AD
from the DCN with known comorbidities of AD from the literature. Known
comorbidities of AD include psychiatric disorders, such as depression,
sleep disorder, and bipolar disorder, and nonpsychiatric disorders,
such as cardiovascular diseases (ischemic damage, hypertension, etc.),
diabetes mellitus (type 2), hypercholesterolemia, hyperlipidemia,
arthrosis, thyroid disease, osteoporosis, and glaucoma.[62]^26^,[63]^27
Based on these reports, the precision and recall of AD comorbidities
from our network are 66.3% and 91.7%, respectively. Considering that
some comorbidities may not yet have been identified, this result
indicates that our network performs well in capturing disease
comorbidities for AD.
DCN-based network rank algorithm prioritizes known AD associated genes
We used 679 AD-associated genes from the AlzGene database as the
validation gene set to evaluate our approach. All connections between
AD and its associated genes reported in OMIM were removed, and we used
AD as the seed to prioritize all genes using random walk with restart.
We emphasize that this de novo prediction highlights the contribution
of the DCN to disease gene discovery for AD: with the direct AD–gene
links removed, the seed can reach genes only through its comorbid
diseases. The top 500 genes in the ranking contain 93 validation genes,
a 5.53-fold enrichment compared with random ranking
($P = 4.36 \times 10^{-69}$) ([64]Figure 3A). We also used ranking
percentiles as thresholds to compute the ROC curve ([65]Figure 3B) and
the precision-recall curve ([66]Figure 3C). Our approach achieved an
AUC of 0.770, and the top-ranked genes showed high precision.
Figure 3.
Evaluation of DCN-based AD risk gene prediction. (A) Distribution of
validation gene set from AlzGene database in gene ranking. (B) ROC
curve for de novo prediction of AD risk genes. (C) Precision-recall
curve for de novo prediction of AD risk genes. (D) Distribution of AUCs
generated from 1000 randomized disease networks.
To further demonstrate the usefulness of the DCN, we generated 1000
randomized disease networks and used them to rank AD risk genes. The
AUCs computed from these networks follow a normal distribution with a
mean of 0.639 and a standard deviation of 0.0146 ([68]Figure 3D). The
AUC (0.770) obtained from the real DCN is significantly better than
that from the randomized networks ($P = 1.48 \times 10^{-19}$).
DCN-based network rank algorithm prioritizes new AD risk candidate genes
We used AD and AD associated genes reported in OMIM as seeds to rank
new AD associated genes. [69]Table 2 lists the top 20 ranked genes (see
[70]Supplementary Material for full ranked gene list).
Table 2.
Top 20 ranked new AD risk genes
Rank  Gene_symbol  Gene_name                                                 Location             Type
1     UBC^a        Ubiquitin C                                               Cytoplasm            Enzyme
2     NOTCH1^a     Notch 1                                                   Plasma Membrane      Transcription regulator
3     EGFR^a       Epidermal growth factor receptor                          Plasma Membrane      Kinase
4     ALB          Albumin                                                   Extracellular Space  Transporter
5     APLP2^a      Amyloid beta precursor like protein 2                     Cytoplasm            Other
6     APLP1^a      Amyloid beta precursor like protein 1                     Extracellular Space  Other
7     CP^a         Ceruloplasmin                                             Extracellular Space  Enzyme
8     PRDM10^a     PR/SET domain 10                                          Nucleus              Transcription regulator
9     APBA2^a      Amyloid beta precursor protein binding family A member 2  Cytoplasm            Transporter
10    NAE1^a       NEDD8 activating enzyme E1 subunit 1                      Cytoplasm            Enzyme
11    NCSTN        Nicastrin                                                 Plasma Membrane      Peptidase
12    SHC1^a       SHC adaptor protein 1                                     Cytoplasm            Other
13    KAT5^a       Lysine acetyltransferase 5                                Nucleus              Transcription regulator
14    TSPO^a       Translocator protein                                      Cytoplasm            Transmembrane receptor
15    BACE1        Beta-secretase 1                                          Cytoplasm            Peptidase
16    APBA3^a      Amyloid beta precursor protein binding family A member 3  Cytoplasm            Transporter
17    BLMH         Bleomycin hydrolase                                       Cytoplasm            Peptidase
18    GEN1^a       GEN1, Holliday junction 5′ flap endonuclease              Cytoplasm            Enzyme
19    APBA1        Amyloid beta precursor protein binding family A member 1  Cytoplasm            Transporter
20    TP53         Tumor protein p53                                         Nucleus              Transcription regulator
^a New AD risk genes that are not included in AlzGene database.
Fourteen of these highly ranked genes are not included in the AlzGene
database, including UBC, PRDM10, EGFR, NOTCH1, APLP1, and APLP2. The
roles of most of these genes in AD pathogenesis have been implicated or
supported by recent studies. For instance, UBC is a major ubiquitin
protein, and the ubiquitin-proteasome system is reported to be
impaired in AD patients[86]^28; Notch1 activity is significantly
altered in the brains of AD patients[87]^29; and EGFR plays a central
role in neurometabolic aging and is associated with AD.[88]^30^,[89]^31
Hence, these highly ranked genes provide a starting point for further
experimental investigation of their roles in AD pathogenesis.
Pathway analysis of top-ranked novel AD candidate genes
To further investigate the functions of the top-ranked AD risk genes,
we performed gene ontology (GO) analysis using these genes.
[90]Figure 4A lists the top 10 enriched GO biological process
terms.[91]^32 AD is characterized by disruption of calcium homeostasis,
mitochondrial oxidative stress, impaired energy metabolism, abnormal
glucose regulation, and ultimately neuronal cell death.[92]^33 As
expected, several biological processes, such as cellular response to
oxidative stress and neuron death, are enriched in our analysis.
Interestingly, we found that a new pathway, the ERBB signaling pathway,
is also significantly enriched. Indeed, Mei et al. reported that the
ERBB signaling pathway is involved in nervous system development and
that disruption of ERBB is associated with nervous system
disorders.[93]^34
Figure 4.
Functional analysis of top-ranked AD risk genes. (A) Top ten enriched
biological process terms of gene ontology. (B) Top ten enriched
Hallmark pathways of MSigDB using gene set enrichment.
We also performed gene set enrichment analysis using Molecular
Signatures Database (MSigDB) Hallmark pathways. MSigDB is a collection
of annotated gene sets widely used in gene set enrichment
analysis.[95]^35 There are 8 major gene set collections in MSigDB, and
we used the Hallmark collection because it reduces noise and redundancy
and provides better delineated biological processes.[96]^35
[97]Figure 4B lists the top 10 enriched Hallmark pathways. APOPTOSIS,
NOTCH, TNFA, and HYPOXIA are well-defined AD pathways.[98]^36–39 WNT, a
recently identified AD pathway,[99]^40 is also ranked high in our
analysis. Interestingly, we found that the coagulation pathway is also
significantly enriched (fold enrichment = 3.97, P = .0002). A recent
report detected interactions of β-amyloid peptide with fibrinogen and
coagulation factor XII,[100]^41 which provides preliminary evidence
that the coagulation system might be involved in AD pathogenesis.
CONCLUSIONS AND DISCUSSION
Alzheimer’s disease is a complex disease whose etiology is still not
elucidated. While traditional in vitro and in vivo experimental methods
will continue to uncover disease mechanisms, we propose a new
framework that prioritizes AD risk genes by integrating a DCN with a
PPI network. We demonstrated that this framework can efficiently
prioritize known AD risk genes, suggesting that our network is useful
for genetic analysis of AD. We also predicted novel AD risk genes and
pathways that have preliminary literature support. Further
experimental evidence is needed to confirm our findings.
FAERS has been considered a largely uncurated and unstandardized
database. A recent study reported that, on average, 16 different names
were given for each active drug ingredient and that FAERS is biased
toward serious or life-threatening outcomes.[101]^42 This data
redundancy and bias may lead to incorrect interpretation of
drug–adverse event associations.[102]^43 However, these problems do not
affect the investigation of disease co-occurrence patterns from the
indication data, since we focus only on the co-occurring diseases in
individual patients, which are reported as standard MedDRA terms.
One source of variability in a DCN constructed using association rule
mining is the need to assign thresholds for support and lift. High
thresholds identify only very common comorbidities, which leads to
poor recall for a specific disease. Conversely, low thresholds admit
very rare co-occurring diseases, which may not be real comorbidities
and lead to poor precision. Therefore, these two values need to be
carefully tuned to balance precision and recall. However, two factors
make the evaluation difficult. First, no comprehensive gold standard
database of disease comorbidity is available. Second, disease
comorbidity is a dynamic concept: the number of comorbidities of a
specific disease changes over time. In this study, we manually curated
disease comorbidities from the literature or disease organizations for
several diseases, including obesity, multiple sclerosis, and psoriasis,
and used them as criteria to optimize the thresholds. Although this
evaluation is not comprehensive, it shows that the optimized DCN
performs well for AD comorbidity as well as for AD risk gene discovery.
Systems approaches to studying disease phenotypes can facilitate the
understanding of disease mechanisms. In this study, we demonstrated
that disease-comorbidity relationships mined from FAERS have potential
in AD genetics prediction. In future studies, we will integrate
disease-comorbidity associations mined from FAERS with other disease
phenotypic relationships (eg disease-manifestation) from other data
resources (eg UMLS, biomedical literature), disease genetics, and PPI
for AD genetic discovery. We have recently used disease-manifestation
relationships extracted from UMLS to construct a DMN and have
developed a combined phenome- and genome-driven network approach for
disease genetics prediction.[103]^44 We previously developed novel
natural language processing techniques to extract a large number of
disease-phenotypic relationships from over 21 million published
biomedical literature records and demonstrated the high potential of
integrating high-level disease-phenotypic relationships with
lower-level genetic and genomic data in both disease genetics
understanding and drug discovery.[104]^45–48
Modeling heterogeneous and complex relationships among tens of
thousands of biomedical entities extracted from different data
resources (eg FAERS, biomedical literature) is a challenging task.
Recently, we developed a novel context-sensitive network (CSN) approach
to model the complex, heterogeneous, and context-specific interactions
among tens of thousands of biomedical entities, including diseases,
disease phenotypes, drugs, drug phenotypes, and genes.[105]^49 Compared
with existing biomedical networks, in which the relationships among
entities are often modeled by pairwise similarity (similarity-based
network or SBN), CSNs preserve the context information on how
biomedical entities are connected. Our recent study showed that the
CSN-based approach to disease genetics prediction performed
significantly better than the SBN-based approach.[106]^49 In future
studies, we will use the CSN framework to model the context-specific
(eg comorbidity, manifestation, risk/causal) relationships among
diseases and other biomedical entities and integrate disease phenotypes
with disease genetics and genomics data for disease genetics prediction
and drug discovery.
Large-scale disease comorbidity relationships offer unique
opportunities to understand the shared genetic mechanisms underlying a
disease and its comorbidities, for example, AD and its associated
neuropsychiatric symptoms (eg anxiety, depression), or AD and type 2
diabetes. By integrating disease comorbidities with vast amounts of
genetic, genomic, and pathway data, we can understand how disease
comorbidities occur, for example, by directly sharing common disease
genes or by indirect coregulation through high-level biological
mechanisms such as cellular pathways.[107]^50
In summary, we leveraged FAERS, a comprehensive data resource for FDA
postmarket drug safety surveillance, for large-scale AD comorbidity
mining. This early-stage exploratory study demonstrated the potential
of disease comorbidity mining from FAERS in AD genetics discovery.
Data availability
Data available from the Dryad Digital Repository:
[108]https://doi.org/10.5061/dryad.3p9b4c2.
SUPPLEMENTARY MATERIAL
[109]Supplementary material is available at Journal of the American
Medical Informatics Association online.
Contributors
RX conceived the study. CZ performed the experiments and wrote the
manuscript. Both authors have participated in study discussion and
manuscript preparation. All authors read and approved the final
manuscript.
Funding
This work was supported by the Eunice Kennedy Shriver National
Institute of Child Health & Human Development of the National
Institutes of Health under the NIH Director’s New Innovator Award
number DP2HD084068 (Xu), NIH National Institute of Aging (1 R01
AG057557-01, Xu), NIH National Institute of Aging (1 R01 AG061388-01,
Xu), NIH National Institute of Aging (1 R56 AG062272-01, Xu), American
Cancer Society Research Scholar Grant (RSG-16-049-01 - MPC, Xu), NIH
Clinical and Translational Science Collaborative of Cleveland
(1UL1TR002548-01, Konstan).
Conflict of interest statement. None declared.
REFERENCES