Abstract
Background
Genome-wide association studies (GWAS) have identified many individual
genes associated with brain imaging quantitative traits (QTs) in
Alzheimer’s disease (AD). However single marker level association
discovery may not be able to address the underlying biological
interactions with disease mechanism.
Results
In this paper, we used the MGAS (Multivariate Gene-based Association
test by extended Simes procedure) tool to perform multivariate GWAS on
eight AD-relevant subcortical imaging measures. We conducted multiple
iPINBPA (integrative Protein-Interaction-Network-Based Pathway
Analysis) network analyses on MGAS findings using protein-protein
interaction (PPI) data, and identified five Consensus Modules (CMs)
from the PPI network. Functional annotation and network analysis were
performed on the identified CMs. The MGAS yielded significant hits
within APOE, TOMM40 and APOC1 genes, which were known AD risk factors,
as well as a few new genes such as LAMA1, XYLB, HSD17B7P2, and NPEPL1.
The identified five CMs were enriched by biological processes related
to disorders such as Alzheimer’s disease, Legionellosis, Pertussis, and
Serotonergic synapse.
Conclusions
The statistical power of coupling MGAS with iPINBPA was higher than
traditional GWAS method, and yielded new findings that were missed by
GWAS. This study provides novel insights into the molecular mechanism
of Alzheimer’s Disease and will be of value to novel gene discovery and
functional genomic studies.
Keywords: Brain imaging, Multivariate gene-based genome-wide analysis,
iPINBPA network analysis, Consensus modules
Background
Alzheimer’s disease (AD) is a debilitating and highly heritable disease
with great complexity in its genetic contributors [[47]1]. Genome-wide
association studies (GWAS) of AD or AD biomarkers have been performed
at the single-nucleotide polymorphism (SNP) level [[48]2–[49]4] as well
as at the higher level (e.g., gene, pathway and/or network)
[[50]5–[51]8]. It is widely recognized that AD has a complicated
genetic mechanism involving multiple genes. Different combinations of
functionally related variants in genes and pathways may interact to
produce the phenotypic outcomes in AD, single SNP-level and gene-level
GWAS results are unlikely to completely reveal the underlying genetic
mechanism in AD. GWAS have greatly facilitated the identification of
genetic markers (e.g., single nucleotide polymorphisms or SNPs)
associated with brain imaging quantitative traits (QTs) in AD [[52]9,
[53]10]. As a complex disease, it is highly likely that AD is
influenced by multiple genetic variants [[54]11, [55]12]. The
identified single-SNP-single-QT associations typically have small
effect sizes. To bridge this gap, exploring single-SNP-multi-QT
associations may have the potential to increase statistical power and
identify meaningful imaging genetic associations. With this
observation, we employ the MGAS (Multivariate Gene-based Association
test by extended Simes procedure) tool [[56]13] to perform multivariate
GWAS on eight AD-relevant subcortical imaging measures.
In addition, biological interactions may be important in contributing
to intermediate imaging QTs and overall disease outcomes [[57]14].
Network-based analysis guided by biologically relevant connections from
public databases provides a powerful tool for improved mechanistic
understanding of complex disorders [[58]15–[59]18]. Considering that
the etiology of AD might depend on functional protein-protein
interaction (PPI) network, we conduct multiple iPINBPA (integrative
protein-interaction-network-based pathway analysis) [[60]19] network
analyses on MGAS findings using the PPI data, and identify Consensus
Modules (CMs) based on the iPINBPA discoveries. Functional annotation
and network analysis are subsequently performed on the identified CMs.
In order to enhance the ability to recognize the aggregation effect of
multiple SNPs, it may be desirable to perform association analysis at
the SNP set (or gene) level rather than at a single SNP level. This
paper aims to reveal the relationship between genetic markers and
multiple phenotypes, improve statistical power, and find GWAS missing
results by MGAS. Network analysis could provide meaningful biological
relationships to help interpret GWAS data to further study the genetic
mechanism of AD. A schematic framework of our analysis is shown in
Fig. [61]1.
Fig. 1.
[62]Fig. 1
[63]Open in a new tab
An overview of the proposed analysis framework. a Multivariate genome
wide association analysis of eight subcortical imaging measures. b
Network-based analysis of MGAS findings using the CM-based network
strategy. c Functional enrichment analysis of the identified consensus
modules (CMs)
Results
Participant characteristics
The subjects (N = 866) consisted of 467 males (53.9%) and 399 females
(46.1%) aged 48–91 years. Shown in Table [64]1 are the demographic and
clinical characteristics of these subjects stratified by five
diagnostic groups. There is no significant difference on the APOE e4
status in the five diagnostic groups. Significant differences are
observed in gender (p = 0.035) and education (p = 0.037). Age is
significantly different across the five groups (p < 0.001).
Furthermore, eight neuroimaging phenotypes
(LAmygVol,RAmygVol,LHippVol,RHippVol,
LAccumVol,RAccumVol,LPutamVol,RPutamVol; see Table [65]2) show the
significant difference across the five diagnostic groups (p < 0.001).
Shown in Fig. [66]2 is the correlation matrix of these eight
phenotypes. The correlation between LHippVol and RHippVol (r = 0.83)
and that between LPutamVol and RPutamVol (r = 0.90) are among the
highest.
Table 1.
Demographic information and total number of participants involved in
each analysis
CN (N = 183) SMC (N = 95) EMCI (N = 281) LMCI (N = 177) AD (N = 130)
p-values
Age (years) 74.29(6.01) 72.20(5.67) 71.35(7.30) 71.87(7.98) 74.56(8.07)
p < 0.001
Gender(M/F) 92/91 39/56 159/122 99/78 78/52 0.035
Education (years) 15.78(2.69) 16.81(2.55) 16.07(2.66) 16.31(2.89)
15.78(2.69) 0.037
APOE e4 allele prensent 62(34%) 37(39%) 108 (38%) 64(36%) 45 (35%)
0.161
LAmygVol
(i.e., QT for GWAS)
1377.2 (230.40) 1434.48(202.03) 1387.06(257.16) 1258.07(286.96) 1081.93
(229.42) p < 0.001
RAmygVol
(i.e., QT for GWAS)
1425.05 (221.11) 1494.02 (210.39) 1450.03 (255.77) 1327.93 (282.28)
1178.03 (240.54) p < 0.001
LHippVol
(i.e., QT for GWAS)
3626.23 (497.63) 3730.98 (529.42) 3573.44 (558.02) 3243.94 (635.34)
2912.15 (518.08) p < 0.001
RHippVol
(i.e., QT for GWAS)
3679.88 (500.54) 3824.23 (487.64) 3663.93 (535.43) 3319.32 (641.19)
2985.63 (540.11) p < 0.001
LAccumVol
(i.e., QT for GWAS)
463.51 (100.88) 481.37 (92.49) 474.07 (93.96) 447.91 (101.37) 417.88
(96.74) p < 0.001
RAccumVol
(i.e., QT for GWAS)
490.44 (95.69) 506.76 (93.38) 508.65 (107.70) 474.41 (105.41) 429.57
(96.56) p < 0.001
LCaudVol 3442.57 (505.06) 3411.83 (518.23) 3477.88 (572.20) 3460.31
(518.58) 3429.95 (698.19) 0.0851
RCaudVol 3583.43 (528.08) 3539.56 (545.67) 3640.80 (623.54) 3588.41
(540.35) 3577.51 (697.23) 0.608
LPallVol 1602.48 (204.00) 1592.57 (196.97) 1633.63 (211.56) 1587.56
(220.94) 1584.27 (231.88) 0.101
RPallVol 1413.36 (189.35) 1434.79 (177.66) 1444.59 (192.61) 1433.18
(202.08) 1414.99 (217.48) 0.444
LPutamVol
(i.e., QT for GWAS)
4788.04 (636.56) 4788.21 (687.42) 4939.46 (750.31) 4733.41 (671.80)
4479.92 (688.84) p < 0.001
RPutamVol
(i.e., QT for GWAS)
4586.97 (599.25) 4584.27 (604.70) 4708.87 (752.76) 4522.97 (669.07)
4327.07 (687.91) p < 0.001
LThalVol 6060.0 (714.50) 6064.96 (868.69) 6072.55 (604.95) 6054.47
(745.24) 5953.05 (718.64) 0.592
RThalVol 6081.58 (684.84) 6056.69 (722.49) 6176.78 (663.50) 6128.47
(732.28) 5980.53 (750.51) 0.098
[67]Open in a new tab
AD Alzheimer’s disease, ADNI Alzheimer’s Disease Neuroimaging
Initiative, CDR–SOB clinical dementia rating–sum of boxes, CN
cognitively normal, SMC significant memory concern, EMCI early mild
cognitive impairment, LMCI late mild cognitive impairment
Number (%) or mean (s.d.) is shown in each entry. P-values are computed
using one-way ANOVA (*except for gender using chi-square test)
Table 2.
14 FreeSurfer subcortical ROIs
Phenotype ID Description Region
LAmygVol the volume of the left amygdala Subcortical (temporal)
RAmygVol the volume of the right amygdala
LHippVol the volume of the left hippocampus
RHippVol the volume of the right hippocampus
LAccumVol the volume of the left accumbens Subcortical (striatum/basal
ganglia)
RAccumVol the volume of the right accumbens
LCaudVol the volume of the left caudate
RCaudVol the volume of the right caudate
LPallVol the volume of the left pallidum
RPallVol the volume of the right pallidum
LPutamVol the volume of the left putamen
RPutamVol the volume of the right putamen
LThalVol the volume of the left thalamus Subcortical (thalamus)
RThalVol the volume of the right thalamus
[68]Open in a new tab
Fig. 2.
[69]Fig. 2
[70]Open in a new tab
Phenotypic correlations between 8 subcortical volumes traits, volumes
Multivariate genome wide association study
In multivariate genome wide association study (MGAS) [[71]13, [72]20],
the top SNP hit is rs769449 from the APOE and TOMM40 region
(p = 1.19E-09) (Table [73]3). According to the hypothesis that genes
are the functional units in biology [[74]15, [75]21], multivariate
gene-level association p-values were also obtained by MGAS which
combines p-value information in regressing univariate phenotypes on
common SNPs. Figure [76]3 shows the Manhattan plot of the gene-based
MGAS results. Using Bonferroni corrected p-value of 0.05 as the
threshold, three genes (APOE, TOMM40, APOC1) were significantly
associated with the studied eight subcortical measures. Table [77]4
shows that the top 10 gene-level findings identified by MGAS, where
APOE (p = 2.77E-08), TOMM40 (p = 3.49E-08), and APOC1 (p = 2.09E-06)
are the well-known AD risk regions. LAMA1 (p = 3.79E-05) was reported
to encode the laminin alpha subunit associated with late onset AD in
the Amish [[78]22]. HSD17B7P2 (p = 8.40E-05) was reported to play an
important role in brain development [[79]23]. The other five gene-level
findings in top 10 are XYLB, NPEPL1, CYP24A1, OR5B2 and MIR7160.
Table 3.
The top 10 SNPs identified by MGAS
SNP Position P[SNP] Chr Gene P[MGAS]
rs769449 45,410,002 1.19E-09 19 APOE 2.77E-08
rs405509 45,408,836 2.01E-04 19 APOE 2.77E-08
rs439401 45,414,451 1.99E-01 19 APOE 2.77E-08
rs584007 45,416,478 2.25E-01 19 APOE 2.77E-08
rs445925 45,415,640 2.62E-01 19 APOE 2.77E-08
rs769449 45,410,002 1.19E-09 19 TOMM40 3.49E-08
rs2075650 45,395,619 2.19E-06 19 TOMM40 3.49E-08
rs157582 45,396,219 8.54E-05 19 TOMM40 3.49E-08
rs1160985 45,403,412 6.88E-06 19 TOMM40 3.49E-08
rs405509 45,408,836 2.01E-04 19 TOMM40 3.49E-08
[80]Open in a new tab
P[SNP] is the p-value of top 10 snp identified by MGAS; Chr:
Chromosome; P[MAGS] is the p-value of top 10 gene identified by MGAS
Fig. 3.
[81]Fig. 3
[82]Open in a new tab
A Manhattan plot showing the gene-level p values in multivariate GWAS
study of 8 subcortical volumes. The blue line corresponds to
p = 10^− 5; the red line corresponds to p = 10^− 7
Table 4.
The top 10 FDR corrected genes identified by MGAS in 8 subcortical ROIs
NO. Gene P[MGAS] Chr SNP P[LAmygVol] P[RAmygVol] P[LHippVol]
P[RHippVol]P[LAccumVol] P[RAccumVol] P[LPutamVol] P[RPutamVol]
1 APOE 2.77E-08 19 rs769449 3.98E-06 1.38E-04 5.77E-07 9.52E-09
1.58E+ 00 1.48E+ 00 2.11E+ 00 2.44E+ 00
2 TOMM40 3.49E-08 19 rs769449 3.98E-06 1.38E-04 5.77E-07 9.52E-09
1.58E+ 00 1.48E+ 00 2.11E+ 00 2.44E+ 00
3 APOC1 1.18E-06 19 rs4420638 3.09E-05 3.46E-05 7.70E-06 5.23E-07
6.99E-01 1.34E+ 00 3.06E+ 00 2.71E+ 00
4 LAMA1 3.79E-05 18 rs656734 2.62E-05 8.64E-07 3.11E-04 5.53E-04
5.64E+ 00 2.45E+ 00 4.27E+ 00 5.26E+ 00
5 XYLB 6.72E-05 3 rs196376 1.80E-02 6.61E-04 8.64E-05 6.63E-06 2.06E-02
5.68E-01 9.22E-01 7.13E-02
6 HSD17B7P2 8.40E-05 10 rs12221164 3.42E+ 00 4.05E+ 00 5.92E+ 00
5.09E+ 00 1.16E-04 1.02E-01 7.86E+ 00 7.93E+ 00
7 NPEPL1 8.63E-05 20 rs2426778 5.96E-01 3.89E+ 00 7.38E-01 2.17E+ 00
7.32E+ 00 2.09E-02 2.13E-05 3.39E-03
8 OR5B2 9.40E-05 11 rs11229440 1.62E+ 00 8.20E-01 3.90E-01 1.05E+ 00
1.67E+ 00 8.30E-03 2.22E-04 9.52E-05
9 MIR7160 1.10E-04 8 rs6558595 8.22E-01 7.69E+ 00 3.07E-01 1.13E+ 00
8.50E-02 1.68E+ 00 3.49E-05 3.67E-04
10 CYP24A1 1.20E-04 20 rs3787555 2.10E-01 4.75E+ 00 3.64E+ 00 2.19E+ 00
7.80E+ 00 1.75E+ 00 1.71E-01 9.55E-01
[83]Open in a new tab
P[MAGS] is the p-value of top 10 genes identified by MGAS; Chr:
Chromosome; P[LAmygVol] is the p-value of top 10 genes associated to
LAmygVol; P[RAmygVol] is the p-value of top 10 genes associated to
RAmygVol; P[LHippVol] is the p-value of top 10 genes associated to
LHippVol; P[RHippVol] is the p-value of top 10 genes associated to
RHippVol; P[LAccumVol] is the p-value of top 10 genes associated to
LAccumVol; P[RAccumVol] is the p-value of top 10 genes associated to
RAccumVol; P[LPutamVol] is the p-value of top 10 genes associated to
LPutamVol; P[RPutamVol] is the p-value of top 10 genes associated to
RPutamVol, Bold font indicates p-value < 0.000001, Italic font
indicates p-value< 0.00001
Consensus modules
Consensus modules (CMs) were constructed based on our previous work
[[84]24]. To search for subnetworks in the multivariate GWAS finding,
we ran iPINBPA ten times by varying the random seed value from 1 to 10.
Table [85]5 shows the top 5 subnetworks identified in each run,
including the Dice’s coefficient value with the most similar modules in
other runs. Compared with the standard iPINBPA method, our CM-based
network strategy was designed to identify more reliable modules across
multiple runs.
Table 5.
The characteristics of the identified consensus modules in 10 iPINBPA
runs
CM RunA Ta: the top subnetwork in RunA. Sb: the most similar subnetwork
to Ta in RunB
RunB 1 2 3 4 5 6 7 8 9 10
1 TN11 Rank of Sb in RunB TN[11] TN[12] TN[13] TN[14] TN[15] TN[16]
TN[17] TN[18] TN[19] TN[1,10]
DC(x, y) 1 0.87 0.63 0.38 0.87 0.53 0.44 0.57 0.92 0.51
2 TN21 Rank of Sb in RunB TN[21] TN[22] TN[23] TN[24] TN[25] TN[26]
TN[27] TN[28] TN[29] TN[2,10]
DC(x, y) 1 0.77 0.58 0.71 0.95 0.59 0.47 0.7 0.77 0.38
3 TN31 Rank of Sb in RunB TN[31] TN[32] TN[33] TN[34] TN[45] TN[36]
TN[37] TN[38] TN[49] TN[4,10]
DC(x,y) 1 0.89 0.89 1 0.74 0.74 0.77 0.89 0.89 0.44
4 TN41 Rank of Sb in RunB TN[41] TN[32] TN[33] TN[34] TN[45] TN[36]
TN[37] TN[38] TN[49] TN[3,10]
DC(x,y) 1 0.61 0.61 0.7 0.96 0.67 0.67 0.7 0.7 0.36
5 TN51 Rank of Sb in RunB TN[51] TN[62] TN[53] TN[74] TN[75] TN[86]
TN[67] TN[88] TN[89] TN[4,10]
DC(x, y) 1 1 1 1 1 1 1 1 0.96 0.52
[86]Open in a new tab
For the overlapping subnetworks, five unique CMs were identified
(Fig. [87]4). CM[1] contains eight genes, including MAPK8, ATF2,
TNFRSF1A, JUND, NR3C1, RB1, IKBKB, and CDK2. CM[2] contains four genes,
including IGF1R, PRKCA, INSR, and PTPRC. CM[3] contains six genes,
including APP, APOE, CASP3, C3, PLTP, and CNTF. CM[4] contains five
genes, including APP, APOE, CASP3, C3, and PLTP. CM[5] contains six
genes, including MED8, GATA6, HNF4A, MED1, THOC7, and PHYHIP. The
individual genes in the CMs might not demonstrate a direct statistical
significance. All the genes in an identified module have a collective
effect on the studied QTs, and thus have the potential to provide
valuable information about the underlying biology.
Fig. 4.
Fig. 4
[88]Open in a new tab
Consensus modules identified by CM-based network strategy. Different
CMs are showed by different colors. The blue color indicates genes in
CM[1]; green color indicates genes in CM[2]; cyan color indicates genes
in CM[3]; red color indicates genes in CM[4]; and yellow color
indicates genes in CM[5].The genes appearing in multiple CMs have
multiple colors. For example, the genes APOE, APP, CASP3, C3, PLTP are
in both CM[3] and CM[4]
Pathway analysis of consensus modules
In our work, we hypothesize that the identified trait prioritized CMs
with high replication might have strong functional associations with
the studied subcortical volume phenotypes. We clustered the relevant
pathways for five CMs and plotted a heat map to summarize the
relationships between these pathways and CMs (Fig. [89]5). Figure [90]5
shows that Alzheimer’s disease, Apoptosis, TNF signaling pathway,
Herpes simplex infection, MAPK signaling pathway are the pathways
significantly enriched by one or more CMs [[91]25]. We also observe
that CM[1], CM[3], CM[4] enriched many interesting pathways. In
particular, CM[3] demonstrates the strongest functional association
with AD (p = 4.94E-05).
Fig. 5.
[92]Fig. 5
[93]Open in a new tab
Functional annotation of the five identified consensus modules
(CM[1]-CM[5]) using KEGG pathways. The five consensus modules were
treated as five gene sets, and went through pathway enrichment analysis
based on the KEGG pathway database. The enrichment results at a nominal
statistical threshold of p < 0.05 are shown. -log10(p) values are
color-mapped and displayed in the heat map. Heat map blocks labeled
with “x” reach the nominal significance level of p < 0.05. Only top
enrichment findings are included in the heat map, and so each row
(pathway) has at least one “x” block
Discussion
In this work, we performed multivariate genome wide association study
(MGAS) of eight AD-relevant subcortical ROIs, using 866 samples in the
ADNI database. To the best of our knowledge, this is the first MGAS on
the quantitative traits of eight subcortical ROIs. In our MGAS, we
confirmed associations at multiple genes previously associated with AD,
such as APOE (p = 2.77E-08, rs769449), TOMM40 (p = 3.49E-08, rs769449),
APOC1 (p = 1.18E-06, rs4420638), as well as identified a few novel
associations shown in Table [94]4. Table [95]4 also shows that the
associations to individual subcortical QTs (e.g. APOE, TOMM40, APOC1:
associated to LAmygVol, RAmygVol, LHippVol and RHippVol) have a range
of different significances.
XYLB (p = 6.72E-05, rs196376) had been reported to be associated with
neurological diseases such as ischemic stroke [[96]26]. We observed
that this gene is associated to RAmygVol, LHippVol and RHippVol. LAMA1
(p = 3.79E-05, rs656734) encodes one of the alpha 1 subunits of
Laminin, which has been demonstrated to be expressed in the hippocampal
neuronal cell layers [[97]27]. NPEPL1 was confirmed to be a potential
direct target of miR-19a in a breast cancer study [[98]28] and miR-19a
was up-regulated in primary motor cortex and hippocampus in the brain
of amyotrophic lateral sclerosis mice at late disease stage [[99]29].
In our study, we found that NPEPL1 was associated to LPutamVol
(P[LPutamVol] = 2.13E-05) and RPutamVol (P[RPutamVol] = 3.39E-03). In
Table [100]4, the hippocampus and amygdala volumes were associated with
multiple genes. While CYP24A1 was associated with none of eight studied
QTs, it was identified in MGAS to have an overall association with all
eight QTs. The statistical efficacy of MGAS of the detected gene
associations appears to be more powerful than univariate phenotype
models. Given that OR5B2 and MIR7160 have not been reported to be
related to AD or AD related biomarkers, it warrants further
investigation to examine their roles on AD in independent cohorts.
Because we found that different subnetworks could be identified by
using different random seed values, we present the consensus modules
discovered by an enhanced iPINBPA strategy. The genes for the CMs might
not show a direct individual statistical significance but demonstrated
a collected effect on the studied phenotypes. We assessed the
significance of each identified consensus module (Table [101]6). CM[1]
(Score = 3.32, p = 9.00E-04) contains totally 8 genes, including KEGG
AD genes TNFRSF1A. CM[2] (Score = 1.41, p = 0.16) contains totally 4
genes without reaching the significance level. CM[3] (Score = 4.37,
p = 1.24E-05) contains 6 genes, including KEGG AD genes APOE APP, and
CASP3. CM[4] (Score = 4.32, p = 1.56E-05) contains 5 genes, including
KEGG AD genes APOE, APP, and CASP3. CM[5] (Score = 1.89, p = 5.88E-02)
contains 6 genes with a marginal significance. The genes in the
significant CMs warrant further investigation. The consensus module
strategy applied to the iPINBPA framework yielded more stable results
than the standard iPINBPA.
Table 6.
The properties of Consensus Modules identified from the PPI network
Consensus Module Nodes Score P-value
CM[1] MAPK8, ATF2, TNFRSF1A, JUND, NR3C1, RB1, IKBKB, CDK2 3.32
9.00E-04
CM[2] IGF1R, PRKCA, INSR, PTPRC 1.41 1.59E-01
CM[3] APP, APOE, CASP3, C3, PLTP, CNTF 4.37 1.24E-05
CM[4] C3, PLTP, APP, APOE, CASP3 4.32 1.56E-05
CM[5] MED8, GATA6, HNF4A, MED1, THOC7, PHYHIP 1.89 5.88E-02
[102]Open in a new tab
Scores were computed by using the adjusted network score
The intersection of CM[3] and CM[4] yielded five genes, including APP,
APOE, CASP3, C3, and PLTP. The C3 gene was shown to contribute to the
pathogenesis of demyelinating disease by directly or indirectly
chemoattracting encephalitogenic cells to the CNS [[103]30]. The PLTP
gene was reported to play an important role in Aβ metabolism and it is
an interesting topic to further elucidate functions of PLTP in AD
susceptibility. Table [104]7 shows the top ten pathways enriched by the
intersection genes. Among these genes, the APP, APOE, and CASP3 genes
are known AD risk factors. Several significant pathways were observed,
including Alzheimer’s disease (p-value = 2.50E-05, FDR = 1.54E-03),
Legionellosis (p-value = 1.90E-04, FDR = 7.03E-03), Pertussis
(p-value = 3.63E-04, FDR = 8.40E-03), Serotonergic synapse
(p-value = 8.02E-04, FDR = 1.28E-02), Tuberculosis(p-value = 2.00E-03,
FDR = 1.92E-02), Herpes simplex infection (p-value = 2.13E-03,
FDR = 1.92E-02) and so on. It has been reported that Legionella
pneumonphila, one species of Legionella, is an intracellular
microorganism that causes Legionellosis. This type of pulmonary
infection is usually associated with neurological dysfunction
[[105]31]. Serotonergic neurotransmission and synapse activity are
highlighted as primary pathological factors in neuropsychiatric
symptoms [[106]32, [107]33]. Pertussis toxin inhibits the apoptosis and
DNA synthesis caused by FAD APP mutants which precedes FAD APP-mediated
apoptosis in neurons and inhibition of neuronal entry into the cell
cycle inhibits the apoptosis [[108]34]. Apoptotic pathways and DNA
synthesis are activated in neurons in the brains of individuals with
AD.
Table 7.
The pathway of genes appearing in all five consensus modules
NO. Pathway p-value FDR Genes
1 Alzheimer’s disease 2.50E-05 1.54E-03 CASP3, APOE, APP
2 Legionellosis 1.90E-04 7.03E-03 CASP3, C3
3 Pertussis 3.63E-04 8.40E-03 CASP3, C3
4 Serotonergic synapse 8.02E-04 1.28E-02 CASP3, APP
5 Tuberculosis 2.00E-03 1.92E-02 CASP3, C3
6 Herpes simplex infection 2.13E-03 1.92E-02 CASP3, C3
7 Viral carcinogenesis 2.51E-03 1.92E-02 CASP3, C3
8 Apoptosis - multiple species 1.32E-02 2.94E-02 CASP3
9 Amyotrophic lateral sclerosis (ALS) 2.03E-02 3.92E-02 CASP3
10 Staphylococcus aureus infection 2.23E-02 4.12E-02 C3
[109]Open in a new tab
Due to the limited number of samples available to us, in this work we
were only able to perform a discovery study. In the future, when more
data become available replication studies in independent cohorts
warrant investigation to validate the identified CMs.
Conclusion
In this study, we performed MGAS analysis to explore the multivariate
imaging genetic association effects for a set of AD-related subcortical
measures. In addition, we conducted the iPINBPA network analysis to
discover consensus modules related to these imaging phenotypes from a
protein-protein interaction network. The MGAS analysis identified
several genes associated with the studied imaging phenotypes, including
APOE, TOMM40, APOC1, LAMA1, XYLB, HSD17B7P2 and others. The statistical
power of coupling MGAS with iPINBPA was higher than traditional GWAS
method, and yielded findings missed by GWAS. In this work, we reported
top five consensus modules based on MGAS results. Network-based
analysis can take into account information on biological relationships
to interpret GWAS data. Our results suggested several susceptible genes
and network modules for further investigation and replication to better
understand the genetic mechanism of Alzheimer’s Disease.
Methods
Subjects and data
Data used in the preparation of this article were obtained from the
Alzheimer’s Disease Neuroimaging Initiative (ADNI) database
([110]adni.loni.usc.edu). The ADNI was launched in 2003 as a
public-private partnership, led by Principal Investigator Michael W.
Weiner, MD. The primary goal of ADNI has been to test whether serial
magnetic resonance imaging (MRI), positron emission tomography (PET),
other biological markers, and clinical and neuropsychological
assessment can be combined to measure the progression of mild cognitive
impairment (MCI) and early Alzheimer’s disease (AD). For up-to-date
information, see [111]www.adni-info.org.
Baseline 3 T MRI scans, demographic information, and diagnosis for the
ADNI-1 and ADNI-GO/2 cohorts were downloaded [[112]35]. MRI scans were
analyzed using FreeSurfer version 5.1 for brain segmentation. We
examined the volume measures of 14 subcortical ROIs; see
Tables [113]1-[114]2. We performed analysis of variance (ANOVA) to
evaluate the diagnostic effect on 14 volume measures. Using the
significance level of p < 0.05, we focused on the volume measures of
eight subcortical ROIs (i.e., LAmygVol, RAmygVol, LHippVol, RHippVol,
LAccumVol, LPutamVol, RPutamVol; see Table [115]2) in subsequent
genetic association studies.
Genotyping data of both ADNI-1 and ADNI-GO/2 cohorts were downloaded,
and then quality controlled and combined as described in [[116]36]. A
total of 866 non-Hispanic Caucasian participants with both complete
subcortical imaging measurements and genotyping data were included in
the study. The study sample (N = 866) included 183 cognitively normal
(CN), 95 significant memory concern (SMC), 281 early MCI (EMCI), 177
late MCI (LMCI) and 130 AD subjects. The demographic and clinical
characteristics of participants, stratified by the diagnosis, are shown
in Table [117]1.
Multivariate genome wide association study
GWAS was performed to examine the main effects of 563,980 SNPs on eight
subcortical measures as quantitative traits (QTs). Linear regression
model was performed using PLINK to examine the association between each
SNP-QT pair ([118]https://www.cog-genomics.org/plink2) [[119]37]. An
additive genetic model was tested with age, gender and brain volume as
covariates. We computed the correlation matrix (8 × 8 matrix) for the
QT data containing eight imaging phenotypes (Fig. [120]2). We applied
the MGAS (Multivariate Gene-based Association test by extended Simes
procedure) tool to all 563,980 SNPs and examined their multivariate
gene-based associations with eight imaging QTs [[121]13]. A Manhattan
plot was generated using R ([122]http://www.r-project.org) to visualize
the gene-level MGAS results for our work (Fig. [123]3).We obtained one
multivariate gene-based p-value P[MGAS] as follows.
[MATH: PMGAS=minqep
jqej :MATH]
Here, q[e] represents the effective number of p-values within a gene,
q[ej] represents the effective number of p-values among the top j
p-values where j runs from 1 to 8 × 563,980, and p[j] represents the
j-th p-value in the list of ordered p-values. P[MGAS] is the smallest
weighted p-value within a gene associated with the null hypothesis that
none of the eight phenotypes are related to the 563,980 SNPs within the
gene, and the alternative hypothesis that at least one of the eight
phenotypes is related to at least one of the 563,980 SNPs. We
identified 1386 genes with p-value< 0.05 [[124]13, [125]38].
Identifying consensus models using iPINBPA
This study used the protein-protein interaction (PPI) data from the
Human Protein Reference Database (HPRD, [126]http://www.hprd.org)
[[127]39], containing 9617 proteins and 39,240 interactions. Gene-level
p-values obtained from MGAS of subcortical imaging phenotypes were
mapped to the PPI network. Then the network was followed by an iPINBPA
(integrative protein-interaction-network-based pathway analysis)
procedure [[128]19] to identify enriched PPI network modules. Consensus
modules (CMs) were identified using the following approach based on our
prior study [[129]24].
Briefly, building on our prior study, we focus on analyzing the top 5
subnetworks (TN[1], TN[2], TN[3], TN[4], TN[5]) in each iPINBPA run.
Let TN[ij] be the top i-th subnetwork identified in the j-th run, where
i ∈ {1, 2, …, 5} and j ∈ {1, 2, ...10}. We first find SN[n](TN[ij]),
which is the most similar subnetwork to TN[ij] in the n-th run, where
n ∈ {1, 2, …, 10}\{j}. Clearly, we have.
SN[n](TN[ij]) = argmax[sn] DC(TN[ij], sn),
where sn is any subnetwork enriched in Run n, and DC(x, y) indicates
the dice coefficients between two subnetworks x and y. Consequently,
for Run j, we define its i-th consensus module CM[ij] as follows.
[MATH: CMij=TN
ij∩⋂j=1,2,...10i=1,2,...5SNnTNij,n∈12...10\j :MATH]
Namely, CM[ij] is the intersection of TN[ij] and its most similar
subnetworks identified in all the other runs. In our empirical study,
we will report the consensus modules based on Run 1, i.e., CM[i1] as
the i-th consensus module.
Functional analysis
Cytoscape 3.4 [[130]40] was used to visualize the identified CMs. We
used ToppGene online tool ([131]https://toppgene.cchmc.org/) for
functional enrichment analysis. The ToppGene suite is an advanced
bioinformatics tool, it could detect and arrange candidate genes
through a comprehensive assessment of a variety of factors, including
gene ontology (GO) annotating, phenotype, signaling pathway and protein
interactions from a specific list of genes [[132]41]. In this case, the
top 10 findings of our multivariate gene-based association analysis
were analyzed for functional enrichment. For the identified CMs, we
also performed functional enrichment analysis using the ToppGene Suite.
Acknowledgements