Abstract

Background

   The cerebrospinal fluid (CSF) levels of total tau (t-tau) and Aβ[1–42]
   are potential early diagnostic markers for probable Alzheimer’s disease
   (AD). The influence of genetic variation on these CSF biomarkers has
   been investigated in candidate or genome-wide association studies
   (GWAS). However, the investigation of statistically modest associations
   in GWAS in the context of biological networks is still an
   under-explored topic in AD studies. The main objective of this study is
   to gain further biological insights via the integration of statistical
   gene associations in AD with physical protein interaction networks.

Results

   The CSF and genotyping data of 843 study subjects (199 CN, 85 SMC, 239
   EMCI, 207 LMCI, 113 AD) from the Alzheimer’s Disease Neuroimaging
   Initiative (ADNI) were analyzed. PLINK was used to perform GWAS on the
   t-tau/Aβ[1–42] ratio using quality controlled genotype data, including
   563,980 single nucleotide polymorphisms (SNPs), with age, sex and
   diagnosis as covariates. Gene-level p-values were obtained by VEGAS2.
   Genes with p-value ≤ 0.05 were mapped on to a protein-protein
   interaction (PPI) network (9,617 nodes, 39,240 edges, from the HPRD
   Database). We integrated a consensus model strategy into the iPINBPA
   network analysis framework, and named it as CM-iPINBPA. Four consensus
   modules (CMs) were discovered by CM-iPINBPA, and were functionally
   annotated using the pathway analysis tool Enrichr. The intersection of
   four CMs forms a common subnetwork of 29 genes, including those related
   to tau phosphorylation (GSK3B, SUMO1, AKAP5, CALM1 and DLG4), amyloid
   beta production (CASP8, PIK3R1, PPA1, PARP1, CSNK2A1, NGFR, and RHOA),
   and AD (BCL3, CFLAR, SMAD1, and HIF1A).

Conclusions

   This study coupled a consensus module (CM) strategy with the iPINBPA
   network analysis framework, and applied it to the GWAS of CSF
   t-tau/Aβ1-42 ratio in an AD study. The genome-wide network analysis
   yielded 4 enriched CMs that share not only genes related to tau
   phosphorylation or amyloid beta production but also multiple genes
   enriching several KEGG pathways such as Alzheimer’s disease, colorectal
   cancer, gliomas, renal cell carcinoma, Huntington’s disease, and
   others. This study demonstrated that integration of gene-level
   associations with CMs could yield statistically significant findings to
   offer valuable biological insights (e.g., functional interaction among
   the protein products of these genes) and suggest high confidence
   candidates for subsequent analyses.

   Keywords: Alzhermer’s disease, CSF biomarker, t-tau/Aβ[1–42] ratio,
   Network analysis, Pathway analysis, Consensus module, CM-iPINBPA

Background

   Alzheimer’s disease (AD) is a neurodegenerative disease and the most
   common form of dementia. Although its etiology is not completely
   understood, a genetic component of susceptibility to AD has been shown
   in the literature [[51]1–[52]6]. Cerebrospinal fluid (CSF) studies
   [[53]7–[54]10] have also been conducted in AD to identify differential
   biomarkers. Given that one hallmark of AD pathology is a cerebral
   accumulation of amyloid-β 1–42 peptide (Aβ[1–42]) in amyloid plaques,
   the Aβ[1–42] level has been shown markedly reduced in CSF. In addition,
   the total tau (t-tau) protein level has been shown significantly
   elevated in the CSF of AD patients. As a result, the CSF t-tau/Aβ[1–42]
   ratio has also been studied as a biomarker for differentiating AD from
   normal older adults [[55]5, [56]11–[57]13].

   With the recent advances in high-throughput genotyping technologies,
   Genome-Wide Association Studies (GWAS) have been applied to the
   analysis of CSF biomarkers (e.g., [[58]13, [59]14]) to identify
   relevant genetic markers, such as Single Nucleotide Polymorphisms
   (SNPs). While most studies examined genetic associations with CSF
   biomarkers at the individual SNP or gene level, mining higher level
   genetic associations using biological interaction networks is still an
   under-explored topic for the CSF biomarker studies in AD. Recently,
   many studies in other domains have demonstrated that integrative
   analyses of GWAS data and protein-protein interaction (PPI) networks
   can provide valuable biological insights. Some methods have been
   proposed to identify subnetworks enriched by GWAS results
   [[60]15–[61]19]. One tool is iPINBPA [[62]16, [63]20–[64]23], which is
   based on the fact that the genes identified in GWAS are more likely to
   physically interact as well as to belong to the same or related
   pathways.

   With these observations, in this work, we performed a genome-wide
   network-based pathway analysis for CSF studies in AD. We analyzed an AD
   cohort from Alzheimer’s Disease Neuroimaging Initiative (ADNI), used
   the CSF biomarker t-tau/Aβ[1–42] ratio as the test phenotype or
   quantitative trait (QT), downloaded the PPI network from the Human
   Protein Reference Database (HPRD) ([65]http://www.hprd.org/), and
   applied the iPINBPA analysis to the GWAS findings of the CSF
   t-tau/Aβ[1–42] ratio. Our goal was to search for subnetworks or network
   modules enriched by the CSF GWAS findings, which may offer valuable
   biological insights and suggest high confidence candidates for
   subsequent analyses.

Methods

   Figure [66]1 illustrates the work-flow of this study. The genotyping
   and CSF data were downloaded from the Alzheimer’s Disease Neuroimaging
   Initiative (ADNI) database. GWAS of the CSF QT was performed using the
   PLINK software [[67]24]. This resulted in 563,980 SNPs with associated
   p-values, which were then assigned to 22,179 genes. The gene assignment
   and gene-based p-values were calculated using the VEGAS2 software
   [[68]25]. The nominally significant genes (i.e., gene-based
   p-values ≤ 0.05) were mapped onto the HPRD PPI network [[69]26, [70]27]
   and analyzed using the iPINBPA method in order to identify the enriched
   subnetworks. The Enrichr pathway analysis tool [[71]28] was applied to
   functionally annotate the subnetwork.

Fig. 1.

   Fig. 1
   [72]Open in a new tab

   Flow chart. Step1: GWAS using Plink was performed on 843 ADNI
   participants. Step2: VEGAS2 was used to obtain gene-level p-values,
   which were mapped onto the HPRD PPI network. Step3: Network-based
   analysis was performed by iPINBPA software 10 times, and 10 groups of
   subnetworks were obtained. For the top subnetwork of each result, we
   computed a consensus module by intersecting this top subnetwork with
   the most similar subnetworks obtained in all the other nine results.
   Step4: KEGG pathway enrichment analysis was performed for each
   consensus module by Enrichr tool.

Subjects

   Data used in the preparation of this article were obtained from the
   Alzheimer’s Disease Neuroimaging Initiative (ADNI) database
   (adni.loni.usc.edu). One goal of ADNI has been to test whether serial
   magnetic resonance imaging (MRI), positron emission tomography (PET),
   other biological markers, and clinical and neuropsychological
   assessment can be combined to measure the progression of mild cognitive
   impairment (MCI) and early AD. For up-to-date information, see
   [73]http://adni.loni.usc.edu/. Appropriate Institutional Review Boards
   approval occurred at each ADNI site and informed consent was obtained
   from each participant or authorized representative.

   In this study, our analyses were concentrated on 843 ADNI subjects
   whose genotyping data (after quality control described below) and the
   baseline CSF biomarker data including t-tau and Aβ[1–42] were both
   available. This sample included 199 cognitively normal (CN) subjects,
   85 subjects with significant memory concern (SMC), 239 subjects with
   early mild cognitive impairment (EMCI), 207 subjects with late mild
   cognitive impairment (LMCI), and 113 subjects with Alzheimer’s disease
   (AD). Table [74]1 shows the demographic and clinical characteristics of
   these participants at the baseline, where the characteristics were
   analyzed with the statistical software IBM SPSS [[75]29] Version 2 for
   differences across diagnostic groups using one-way analysis of variance
   (ANOVA) or Chi-square test.

Table 1.

   Selected demographic and clinical characteristics of 843ADNI
   participants
   CN
   (N = 199) SMC
   (N = 85) EMCI
   (N = 239) LMCI
   (N = 207) AD
   (N = 113) P-value
   Age (years) 74.4 (5.79) 72.0(5.48) 71.4(7.30) 72.4(7.62) 75.2 (8.19)
   p < 0.001
   Women* 96(48%) 50(59%) 102(43%) 83(40%) 45(40%) 0.002*
   Education (years) 16.2 (2.82) 16(2.79) 16.2(2.84) 16.4(2.53) 16.4(2.56)
   0.764
   APOE e4 allele present 47(24%) 31(36%) 99 (41%) 112(54%) 74 (65%)
   p < 0.001
   CDR-SOB 0.04(0.14) 0.08(0.18) 1.27(0.77) 1.65(0.94) 4.53 (1.70)
   p < 0.001
   Mini mental status examination 29.1(1.18) 29.0(1.2) 28.3(1.62)
   27.5(1.75) 23.1 (2.05) p < 0.001
   Logical memory immediate
   recall 14.42(3.00) 14.44(3.34) 11.09(2.68) 7.18(3.06) 4.15 (2.70)
   p < 0.001
   Logical memory delayed
   recall 13.34(3.13) 13.29(3.31) 8.97(1.73) 3.94 (2.7) 1.52 (1.80)
   p < 0.001
   t-tau/Aβ[1-42]Ratio
   (i.e., QT for GWAS) 0.40 (0.27) 0.37(0.24) 0.50(0.45) 0.70(0.47) 0.98
   (0.49) p < 0.001
   [76]Open in a new tab

   AD Alzheimer’s disease, ADNI Alzheimer’s Disease Neuroimaging
   Initiative, CDR–SOB clinical dementia rating–sum of boxes, CN
   cognitively normal, SMC significant memory concern, EMCI early mild
   cognitive impairment, LMCI late mild cognitive impairment. Number (%)
   or mean (s.d.) was shown in each entry. P-values were assessed due to
   significant differences between diagnosis groups, which computed using
   one-way ANOVA (*except for gender using chi-square test)

CSF Biomarker Measurement as Quantitative Trait

   The amyloid-β 1–42 peptide (Aβ[1–42]) and total tau (t-tau) measured in
   the baseline CSF samples of the participants were downloaded from the
   ADNI database. The t-tau/Aβ[1–42] ratio was computed and used as the
   quantitative trait in the GWAS.

Genotyping Data and Quality Control

   The genotyping data of the participants were collected using either the
   Illumina 2.5 M array (a byproduct of the ADNI whole genome sequencing
   sample) or the Illumina OmniQuad array. For the present analyses,
   single nucleotide polymorphism (SNP) markers that were present on both
   arrays were included [[77]6, [78]30, [79]31].

   Quality control (QC) was performed on the ADNI participants using the
   PLINK v1.07 software [[80]24] ([81]http://zzz.bwh.harvard.edu/plink/),
   following a similar procedure described in Li et al. [[82]32]. Briefly,
   SNPs not meeting any of the following criteria were excluded: (1) SNPs
   available on both 2.5 M array and OmniQuad array, (2) call rate per
   SNP ≥ 95%; (3) minor allele frequency ≥ 5% (n = 1,845,510 SNPs were
   excluded based on Criteria 1–3); and (4) Hardy-Weinberg equilibrium
   test of p ≥ 10^−6 (n = 198 SNPs were excluded) using control subjects
   only. Participants were excluded from the analysis if any of the
   following criteria were not satisfied: (1) call rate per subject ≥ 95%
   (no participant was excluded), (2) sex check (1 participant was
   excluded), (3) identity check for related pairs (8 sibling pairs and 1
   sibling triple were identified with PI_HAT >0.5, 1 participant from
   each family was randomly selected and included in the study).

   Population stratification analysis was performed using EIGENSTRAT
   [[83]33], and confirmed using STRUCTURE [[84]34]. It yielded 89 study
   participants who did not cluster with the remaining subjects and with
   the CEU HapMap samples who are primarily of European ancestry
   (non-Hispanic Caucasians). These 89 participants were excluded from the
   analysis. After QC, 563,980 SNPs and 843 individuals remained available
   for the subsequent GWAS, network and pathway analyses.

SNP-Level and Gene-Level GWAS Analyses

   For GWAS, to examine the main effects, linear regression was
   implemented by PLINK to evaluate the association between each SNP and
   the t-tau/Aβ[1–42] ratio. An additive genetic model was tested with
   covariates, including age, gender, and diagnosis (through five binary
   dummy variables indicating CN, SMC, EMCI, LMCI, and AD). Then, the
   SNP-level p-values were obtained.

   The VEGAS2 software [[85]25] was used to assign 563,980 SNPs to 22,179
   autosomal genes according to positions on the UCSC Genome Browser (out
   of 26,292 in hg19), and to compute gene-level p values. The software
   applies simulations from the multivariate normal distribution to employ
   information from a defined subset of markers within a gene as well as
   take into account linkage disequilibrium between the markers. To save
   running time, we use a multi-stage approach to adaptively determine the
   number of simulations per gene: (Stage 1) we run 10^3 simulations for
   all the genes; (Stage 2) we run 10^4 simulations only for genes with
   Stage 1 empirical p-values ≤ 0.1; (Stage 3) we run 10^6 simulations
   only for genes with Stage 2 empirical p-values ≤ 0.001. We interpret an
   empirical p-value of 0 from 10^6 simulations as p < 10^−6. Given 22,179
   genes included in this analysis, a Bonferroni-corrected threshold is
   p < 2.25 × 10^−6 (i.e., 0.05/22,179), which can be exceeded by the
   theoretically smallest empirical p-value shown above. A Manhattan plot
   was generated using R ([86]http://www.r-project.org) to visualize the
   gene-level GWAS results for our work.

Network-level Analysis

   The Human PPI data (n = 9,617) were downloaded from the Human Protein
   Reference Database (HPRD, [87]http://www.hprd.org); gene-level p-values
   obtained from the GWAS of the CSF t-tau/Aβ[1–42] ratio were mapped to
   the PPI network. The integrative protein interaction network-based
   pathway analysis (iPINBPA) software [[88]22] was used to integrate GWAS
   findings with physical evidence of interaction at the protein level,
   and to detect new high-level associations (i.e., subnetworks of
   functionally interacted genes) with the CSF biomarker. Briefly, iPINBPA
   identifies enriched subnetworks using the following three steps.

   In Step 1, using the GWAS findings, the nominally significant genes
   (i.e., p ≤ 0.05) are treated as seed genes, and assigned with certain
   weights (e.g., in this work, 1 for seed genes, 0 for the rest). After
   that, a random walk with restart strategy is employed to smooth these
   weights over the entire network. Intuitively, the nodes in the network
   are weighted based on their connectivity to seed genes (i.e.,
   guilt-by-association). Let n[k] be a node on the PPI network mapped
   with gene-level p-value p[i]. Let e[ij] be the edge connecting n[i] and
   n[j], and W[ij] be the weight of e[ij]. All the W[ij]’s form the
   adjacency matrix W. Extending Köhler’s approach [[89]35], iPINBPA
   weights the edge e[ij] as follows:
   [MATH: <msub><mi>W</mi><mrow><mi>i</mi><mi>j</mi></mrow></msub><mspace
   width="0.25em"></mspace><mo>=</mo><mspace
   width="0.25em"></mspace><mfenced close=")" open="("><mrow><mfenced
   close=")"
   open="("><mrow><mn>1</mn><mo>−</mo><msub><mi>p</mi><mi>i</mi></msub></m
   row></mfenced><mspace width="0.25em"></mspace><mo>+</mo><mspace
   width="0.25em"></mspace><mfenced close=")"
   open="("><mrow><mn>1</mn><mo>−</mo><msub><mi>p</mi><mi>j</mi></msub></m
   row></mfenced></mrow></mfenced><mo
   stretchy="true">/</mo><mn>2</mn><mtext>.</mtext> :MATH]

   In addition, it normalizes the adjacency matrix W by its columns. After
   each step of random walk, a score vector is calculated as
   [MATH: <mi>P</mi><mfenced close=")"
   open="("><mi>t</mi></mfenced><mspace
   width="0.25em"></mspace><mo>=</mo><mspace
   width="0.25em"></mspace><mfenced close=")"
   open="("><mrow><mn>1</mn><mo>−</mo><mi>r</mi></mrow></mfenced><mi>W</mi
   ><mspace width="0.25em"></mspace><mo>·</mo><mspace
   width="0.25em"></mspace><mi>P</mi><mfenced close=")"
   open="("><mrow><mi>t</mi><mo>−</mo><mn>1</mn></mrow></mfenced><mspace
   width="0.25em"></mspace><mo>+</mo><mspace
   width="0.25em"></mspace><mi>r</mi><mi>P</mi><mfenced close=")"
   open="("><mn>0</mn></mfenced><mtext>,</mtext> :MATH]

   where P(t) is the score after walking t steps, and r is the restart
   ratio. In this work, we assign 1 to all the seed genes, and 0 to the
   rest. Upon the completion of the random walk after T steps, the vector
   P(T) contains the node weights, which reflect the topological
   connections to the seed genes [[90]36].

   In Step 2, given a network A containing k nodes, iPINBPA defines its
   score by combining the gene-level p-values with node weights described
   above, using the Liptak-Stouffer method. Specifically, the network
   score of A is defined, via weighted Z transform test [[91]37], as
   follows:

   [MATH: <msub><mi>Z</mi><mi>A</mi></msub><mo>=</mo><mfrac><mstyle
   displaystyle="true"><msub><mo
   stretchy="true">∑</mo><mrow><mi>i</mi><mo>∈</mo><mi>A</mi></mrow></msub
   ><mrow><mi>P</mi><msub><mfenced close=")"
   open="("><mi>T</mi></mfenced><mi>i</mi></msub><msub><mi>z</mi><mi>i</mi
   ></msub></mrow></mstyle><msqrt><mstyle displaystyle="true"><msub><mo
   stretchy="true">∑</mo><mrow><mi>i</mi><mo>∈</mo><mi>A</mi></mrow></msub
   ><mrow><mi>P</mi><msubsup><mfenced close=")"
   open="("><mi>T</mi></mfenced><mi>i</mi><mn>2</mn></msubsup></mrow></mst
   yle></msqrt></mfrac> :MATH]
   .

   A random sampling of gene sets of size k∈[1, 500] for 1000 times was
   applied in iPINBPA [[92]36] to determine the background distribution of
   the network score. Using this distribution, the adjusted network score
   of A is defined as:

   [MATH:
   <msub><mi>S</mi><mi>A</mi></msub><mo>=</mo><mfrac><mrow><msub><mi>Z</mi
   ><mi>A</mi></msub><mo>−</mo><msub><mi>μ</mi><mi>k</mi></msub></mrow><ms
   ub><mi>σ</mi><mi>k</mi></msub></mfrac> :MATH]
   ,

   where Z[A] is the network score, and μ [k] and σ [k] are respectively
   the mean and the standard deviation of the background distribution of
   the network scores at size k.

   In Step 3, a greedy algorithm was developed to search for modules with
   high network scores, i.e., those enriched in genes with low p-values
   and high weights. Details about the algorithm is available in Wang et
   al. [[93]22].

   In this work, the parameters were set in iPINBPA as follows: r = 0.5,
   T = 5, NetScore > 3.0, NetSize ≥ 5, and MaxNetSize ≤ 300. Given the
   stochastic nature of the iPINBPA algorithm, we ran iPINBPA ten times,
   respectively by setting the random seed value from 1 to 10.
   Consequently, we obtained ten groups of enriched subnetworks (GNs)
   identified by iPINBPA. Below, we couple a consensus module (CM)
   strategy with iPINBPA (named as CM-iPINBPA) to generate consensus
   findings from these analyses.

   Given two subnetworks m and n, we use Dice’s coefficient DC(m,n) to
   measure their similarity:

   [MATH: <mi>D</mi><mi>C</mi><mfenced close=")" open="("
   separators=","><mi>m</mi><mi>n</mi></mfenced><mo>=</mo><mfrac><mrow><mn
   >2</mn><mfenced close=""
   open="|"><mrow><mi>m</mi><mo>∩</mo></mrow></mfenced><mfenced close="|"
   open=""><mi>n</mi></mfenced></mrow><mrow><mfenced close="|"
   open="|"><mi>m</mi></mfenced><mo>+</mo><mfenced close="|"
   open="|"><mi>n</mi></mfenced></mrow></mfrac> :MATH]
   .

   In this work, we only focused on analyzing the top subnetwork (TN) in
   each iPINBPA run. Let TN [a] be the top subnetwork identified in Run
   a ∈ {1, 2, …, 10}. We first find SN [b] (TN [a] ), which is the most
   similar subnetwork to TN [a] in Run b ∈ {1, 2, …, 10}\{a}. Clearly we
   have
   [MATH: <mi>S</mi><msub><mi>N</mi><mi>b</mi></msub><mfenced close=")"
   open="("><mrow><mi>T</mi><msub><mi>N</mi><mi>a</mi></msub></mrow></mfen
   ced><mspace width="0.25em"></mspace><mo>=</mo><mspace
   width="0.25em"></mspace><mi
   mathvariant="italic">argma</mi><msub><mi>x</mi><mrow><mi>s</mi><mi>n</m
   i></mrow></msub><mi>D</mi><mi>C</mi><mfenced close=")"
   open="("><mrow><mi>T</mi><msub><mi>N</mi><mi>a</mi></msub><mo>,</mo><ms
   pace
   width="0.25em"></mspace><mi>s</mi><mi>n</mi></mrow></mfenced><mtext>,</
   mtext> :MATH]

   where sn is any subnetwork enriched in Run b. After that, we define the
   consensus module (CM) based on Run a as follows:

   CM [a] = TN [a] ∩ (∩[a ≠ b,b ∈ {1,2,…,10}] SN [b](TN [a])).

   Namely, CM [a] is the intersection of TN [a] and its most similar
   subnetworks identified in all the other runs.

Network Module Visualization and Functional Analysis

   Cytoscape 3.2 [[94]38] was used to visualize the example identified
   network modules. The Enrichr [[95]28] pathway analysis tool
   ([96]http://amp.pharm.mssm.edu/Enrichr/) and the Kyoto Encyclopedia of
   Genes and Genomes database (KEGG; [97]http://www.genome.jp/kegg/)
   [[98]39] was applied to functional analysis of the identified network
   modules. Heat map was plotted, using R 3.2.0 software, to demonstrate
   relations between consensus modules and relevant KEGG pathways.

Results

GWAS and gene-level analysis

   The demographic and clinical characteristics for the 843 ADNI subjects
   in this study are presented in Table [99]1. The summary statistics for
   all diagnostic groups (CN, SMC, EMCI, LMCI, and AD) are given.
   Education level (p = 0.764) was not significantly different across the
   five diagnostic groups; however, gender demonstrated a significant
   difference (p = 0.002). Furthermore, as expected, age, APOE e4 status,
   clinical dementia rating–sum of boxes (CDR-SOB), mini mental status
   examination (MMSE), logical memory immediate recall, and logical memory
   delayed recall exhibited significant differences across the five groups
   (p < 0.001). Also as expected, the phenotype t-tau/Aβ[1–42] ratio
   significantly differed across the diagnostic groups (p < 0.001).

   The top SNP in the GWAS analysis was rs4420638 (chromosome 19, 14 kb
   away from the APOC1 gene, p = 2.576E-28), which was previously reported
   by Lars Bertram et al. [[100]40]. The SNP rs769449 within the APOE gene
   on chromosome 19 was also significant with p = 4.98E-23, and was
   previously reported by Soerensen et al. [[101]41]. Similar to the
   results reported in our earlier paper [[102]13], The TOMM40 SNP
   rs2075650 (chromosome 19, p =4.23E-18, was associated with
   t-tau/Aβ[1–42] ratio.

   Under the hypothesis that genes, rather than SNPs, are the functional
   units in biology [[103]42], a gene-level association analysis of the
   t-tau/Aβ[1–42] ratio was performed based on the SNP-level results by
   VEGAS2. Table [104]2 shows the top 10 genes identified by VEGAS2.
   Figure [105]2 shows the Manhattan plot of the gene-based GWAS results.

Table 2.

   The top 10 genes identified by VEGAS2
   Chr  Gene  nSNPs  Test   Pvalue    TopSNP   TopSNP-Pvalue
   19  APOC1  6     274.96 1.00E-06 rs4420638    2.58E-28
   19  APOE   7     188.92 1.00E-06 rs769449     4.98E-23
   19  PVRL2  19    228.01 1.00E-06 rs2075650    4.23E-18
   19  TOMM40 11    363.85 1.00E-06 rs769449     4.98E-23
   17  PDK2   11    103.37 5.00E-06 rs3809762    1.86E-06
   17  ITGA3  15    111.81 7.40E-05 rs3809762    1.86E-06
   19  CBLC   2     24.35  1.12E-04 rs2965121    1.45E-04
   17  CCL7   5     28.84  2.38E-04 rs991804     2.28E-04
   19  KLK7   12    56.02  2.76E-04 rs11084043   7.74E-05
   1   PTGER3 97    400.30 3.07E-04 rs7540868    1.89E-04
   [106]Open in a new tab

   Chr: Chromosome; Gene: Gene name; nSNPs: Number of SNPs in the input
   file that map to the gene; Test: The sum of the individual chi-squared
   1 degree of freedom SNP-association test statistic; Pvalue: The
   gene-based p-value considering the full set of SNPs; TopSNP: The name
   of the most significant SNP within the gene; TopSNP-Pvalue: The p-value
   for the most significant SNP with the gene

Fig. 2.

   Fig. 2
   [107]Open in a new tab

   a Manhattan plot showing the gene-level p values in ADNI t-tau/Aβ[1–42]
   ratio GWAS study. The blue line corresponds to p = 10^−5; the red line
   corresponds to p = 10^−6. Bonferroni-corrected threshold is
   p < 2.25 × 10^−6 (i.e., 0.05/22,179)

Network search for CMs

   Consensus modules (CMs) were identified by CM-iPINBPA network analysis
   strategy. Subnetwork search was conducted on the GWAS findings using
   iPINBPA ten times by varying the seed value of random number generator,
   which ranged from 1 to 10. Table [108]3 summarizes the results of ten
   iPINBPA runs. PRKCA and TP53 appeared to be the start nodes of the top
   subnetworks identified in multiple runs. The PRKCA gene was previously
   reported as being associated with an altered amyloid precursor protein
   (APP) secretion in fibroblasts from AD patients [[109]43, [110]44].
   Culmsee et al. demonstrated that TP53 was a novel gene as a biomarker
   of AD and was related to neurodegenerative processes [[111]45].

Table 3.

   Results of 10 iPINBPA runs
   SRN # of Subnetworks TN Size TNStartNode TN Score
   1   1058             99      PRKCA        10.11
   2   918              92      PRKCA         9.70
   3   975              151     TP53         10.95
   4   1100             133     -            10.52
   5   977              108     PRKCA         9.85
   6   936              135     -            10.64
   7   1166             101     PRKCA         9.41
   8   1050             147     -            11.27
   9   972              67      TP53          8.04
   10  955              146     -            11.25
   [112]Open in a new tab

   SRN:The seed value used for an iPINBPA run

   # of Subnetworks: The number of subnetworks identified in an iPINBPA
   run

   TN Size: The number of genes in the top subnetwork identified in an
   iPINBPA run

   TN StartNode: The start node of the top subnetwork

   TN Score: The score of the top subnetwork

   Table [113]4 shows the top subnetwork identified in each run, its most
   similar subnetworks in other runs coupled with the Dice’s coefficient
   value, and the corresponding CM. For example, in Table [114]4, the
   Dice’s coefficient between TN1 and SN1 (in GN[2]) is 0.96335. Thanks to
   the overlapping subnetworks, only four unique CMs were identified
   (Table [115]4). These four CMs are shown in Fig. [116]3, where the
   reddish nodes represent the known AD genes from the KEGG AD pathway
   (hsa05010). CM1 (S [A] = 10.15, p < 0.001) shown in Fig. [117]3(a)
   contains totally 67 genes, including KEGG AD genes GSK3B, MAPK1, PSEN2,
   CALM1, CALM2 and CASP8. CM2 (S [A] = 10.39, p < 0.001) shown in
   Fig. [118]3(b) contains totally 67 genes, including KEGG AD genes
   BACE1, GSK3B, MAPK1, PSEN2, CALM1, CALM2 and CASP8. CM3 (S [A] = 7.99,
   p < 0.001) shown in Fig. [119]3(c) contains 40 genes, including KEGG AD
   genes GSK3B, CALM1 and CASP8. CM4 (S [A] = 10.46, p < 0.001) shown in
   Fig. [120]3(d) contains 58 genes, including KEGG AD genes BACE1, GSK3B,
   MAPK1, PSEN2, CALM1, CALM2 and CASP8.

Table 4.

   The characteristics of the identified consensus modules in 10 iPINBPA
   runs
   CM RunA Ta: the top subnetwork in RunA. Sb: the most similar subnetwork
   to Ta in RunB
   RunB 1 2 3 4 5 6 7 8 9 10
   1 1 Rank of Sb in RunB 1 1 4 3 1 6 1 3 8 4
   DC(Ta, Sb) 1.00 0.96 0.88 0.88 0.96 0.96 0.99 0.82 0.81 0.96
   1 2 Rank of Sb in RunB 1 1 4 3 1 6 1 3 8 4
   DC(Ta, Sb) 0.96 1.00 0.85 0.85 0.92 1.00 0.95 0.78 0.84 0.92
   2 3 Rank of Sb in RunB 3 2 1 2 3 2 3 2 1 5
   DC(Ta, Sb) 0.78 0.77 1.00 0.93 0.83 0.91 0.81 0.97 0.61 0.83
   3 4 Rank of Sb in RunB 2 3 2 1 2 1 2 1 13 1
   DC(Ta, Sb) 0.85 0.82 0.97 1.00 0.89 0.99 0.86 0.95 0.67 0.95
   1 5 Rank of Sb in RunB 1 1 4 3 1 6 1 3 8 4
   DC(Ta, Sb) 0.96 0.92 0.93 0.93 1.00 0.92 0.97 0.86 0.77 1.00
   3 6 Rank of Sb in RunB 2 3 2 1 2 1 2 1 13 1
   DC(Ta, Sb) 0.84 0.82 0.98 0.99 0.88 1.00 0.86 0.96 0.66 0.96
   1 7 Rank of Sb in RunB 1 1 4 3 1 6 1 3 8 4
   DC(Ta, Sb) 0.99 0.95 0.89 0.89 0.97 0.95 1.00 0.83 0.80 0.97
   3 8 Rank of Sb in RunB 2 3 2 1 2 1 2 1 13 1
   DC(Ta, Sb) 0.80 0.78 0.98 0.95 0.84 0.96 0.81 1.00 0.63 1.00
   4 9 Rank of Sb in RunB 3 2 1 2 3 8 3 2 1 5
   DC(Ta, Sb) 0.82 0.83 0.61 0.68 0.77 0.73 0.79 0.64 1.00 0.77
   3 10 Rank of Sb in RunB 2 3 2 1 2 1 2 1 13 1
   DC(Ta, Sb) 0.80 0.78 0.98 0.95 0.85 0.96 0.82 1.00 0.63 1.00
   [121]Open in a new tab

   CM consensus module id, DC Dice’s coefficient

Fig. 3.

   Fig. 3
   [122]Open in a new tab

   Consensus modules identified by 10 runs of iPINBPA. a Consensus
   Module1; b Consensus Module 2; (c) Consensus Module 3; (d) Consensus
   Module 4. The reddish color indicates genes belonging to the KEGG
   Alzheimer’s Pathway. The adjusted network scores (i.e., S [A]) of these
   four modules are 10.15, 10.39, 7.99, and 10.46, respectively.
   Therefore, all the modules are significant (p < 0.001)

   In addition, the intersection of the four CMs was extracted and named
   as the common subnetwork. The common subnetwork is shown in
   Fig. [123]5, and contains total 29 genes, including 3 KEGG AD genes
   CALM1, CASP8, and GSK3B, and 26 other genes.

Fig. 5.

   Fig. 5
   [124]Open in a new tab

   The common subnetwork. This subnetwork consists of only overlapping
   genes of all consensus modules. The reddish color indicates genes
   belonging to the KEGG Alzheimer’s Pathway

Pathway analysis of consensus modules and the common subnetwork

   To test the hypothesis that CMs enriched by the GWAS findings might be
   significantly overrepresented in AD and other relevant pathways, the
   Enrichr method was performed for pathway analysis. For the genes in
   each CM, a pathway enrichment analysis was conducted, and the nominally
   enriched pathways were identified with adjusted p-value ≤ 0.05. Then,
   these identified pathways of the four CMs were plotted as a heat map
   shown in Fig. [125]4 to summarize the relationships between the
   pathways and CMs. Note that Fig. [126]4 lists only pathways enriched by
   at least one CM. Table [127]5 shows top 20 pathway enrichments analysis
   of the common subnetwork.

Fig. 4.

   Fig. 4
   [128]Open in a new tab

   Functional annotation of the four identified consensus modules
   (CM1-CM4) using KEGG pathways. The four consensus modules were treated
   as four gene sets, and went through pathway enrichment analysis based
   on the KEGG pathway database. The enrichment results at a nominal
   statistical threshold of adjusted p-value < 0.05 are shown.
   -log[10](p-value) values are color-mapped and displayed in the heat
   map. Heat map blocks labeled with “x” reach the nominal significance
   level of adjusted p-value < 0.05. Only top enrichment findings are
   included in the heat map, and so each row (pathway) has at least one
   “x” block

Table 5.

   Top 20 pathway enrichments analysis of the common consensus subnetwork
   Pathway Overlap P-value Adjusted P-value Z-score Combined Score Genes
   glioma 4/62 7.19E-05 0.002277 −2.14 13.00 PDGFRB; PIK3R1;CALM1;TP53
   apoptosis 4/81 1.94E-04 0.002277 −1.85 11.28 CASP8; CFLAR;PIK3R1;TP53
   huntingtons disease 3/31 2.20E-04 0.002277 −1.77 10.77 CASP8;CALM1;TP53
   colorectal cancer 4/84 2.22E-04 0.002277 −1.84 11.18 PDGFRB;
   GSK3B;PIK3R1;TP53
   prostate cancer 4/86 2.42E-04 0.002277 −1.80 10.97 PDGFRB;
   GSK3B;PIK3R1;TP53
   endometrial cancer 3/52 9.23E-04 0.007232 −1.65 8.13 GSK3B;PIK3R1;TP53
   renal cell carcinoma 3/69 2.02E-03 0.01286 −1.61 6.99 GAB1;PIK3R1;HIF1A
   melanoma 3/71 2.19E-03 0.01286 −1.49 6.50 PDGFRB;PIK3R1;TP53
   erbb signaling pathway 3/85 3.59E-03 0.018769 −1.53 6.08
   GSK3B;GAB1;PIK3R1
   focal adhesion 4/192 4.51E-03 0.02118 −1.60 6.16 PDGFRB;
   GSK3B;PIK3R1;RHOA
   neurodegenerative diseases 2/38 9.01E-03 0.038512 −0.87 2.83 NGFR;CASP8
   non small cell lung cancer 2/53 1.66E-02 0.064347 −1.21 3.32
   PIK3R1;TP53
   basal cell carcinoma 2/55 1.78E-02 0.064347 −0.92 2.53 GSK3B;TP53
   b cell receptor signaling pathway 2/62 2.22E-02 0.074377 −0.91 2.37
   GSK3B;PIK3R1
   phosphatidylinositol signaling system 2/72 2.91E-02 0.081711 −1.00 2.50
   PIK3R1;CALM1
   pancreatic cancer 2/73 2.98E-02 0.081711 −0.99 2.49 PIK3R1;TP53
   chronic myeloid leukemia 2/75 3.13E-02 0.081711 −0.93 2.33 PIK3R1;TP53
   adherens junction 2/75 3.13E-02 0.081711 −0.78 1.96 CSNK2A1;RHOA
   regulation of actin cytoskeleton 3/201 3.54E-02 0.087678 −0.96 2.34
   PDGFRB; PIK3R1;RHOA
   small cell lung cancer 2/85 3.92E-02 0.092081 −0.73 1.74 PIK3R1;TP53
   [129]Open in a new tab

   Pathway: The name of KEGG pathway

   Overlap: The number of overlapping genes compared with the number of
   input genes

   P-value: P-value was computed using the Fisher exact test

   Adjusted P-value: Adjusted P-value was a corrected p-value to the
   Fisher exact test

   Z-score: Computed by assessing the deviation from the expected rank

   Combined score: Computed by taking the log of the p-value from the Fish
   exact test and multiplying that by the z-score of the deviation from
   the expected rank

   Genes: The overlapping genes between the input and the pathway

Discussion

Gene-level analysis

   In the GWAS analysis, the gene-level p-values were determined and shown
   in Fig. [130]2. The use of the CSF t-tau/Aβ[1–42] ratio as a
   quantitative trait (QT) in this study enabled us to examine the effect
   of genes previously known to be associated with the QT as well as to
   identify novel genes. Table [131]2 lists the top ten genes obtained by
   converting SNPs into gene-wise p-values. Given the Bonferroni-corrected
   threshold of p < 2.25 × 10^−6 (i.e., 0.05/22,179), we found five
   significant genes. As expected, significant associations were
   identified between loci on chromosome 19 and the CSF t-tau/Aβ[1–42]
   ratio (e.g., APOC1, APOE, PVRL2, TOMM40, p = 1 × 10^−6, see
   Fig. [132]2). Among these genes, apolipoprotein C1 (ApoC1) encoded by
   the APOC1 gene is associated with amyloid β plaques; the APOE and
   TOMM40 (rs769449) genes code for proteins related to the clearance of
   Aβ and mitochondrial functions [[133]5, [134]13]; and the PVRL2 gene
   was previously reported as related to risk factors that contribute to
   AD pathogenesis [[135]46]. PDK2 (p = 5 × 10^−6) shows a trend towards
   the significance, and the overexpression of this gene may be related to
   cancer and diabetes[[136]47]. Additionally, CCL7 mRNA is highly
   increased by Aβ[1–42] stimulation [[137]48]. The CCL7 gene was
   previously reported as related to the chemotaxis of pro-inflammatory
   cells to the inflamed location [[138]49]. The PTGER3 gene was
   previously reported as being related to the inflammatory response
   [[139]50].

Network search for CMs and functional validation

   Although iPINBPA has been successfully applied in several previous
   studies [[140]16, [141]20–[142]23], we observed that different
   subnetworks could be obtained by using different random seed values. To
   overcome this limitation, we proposed to examine the consensus modules
   discovered by multiple iPINBPA analyses. In other words, we focused on
   examining the shared subnetworks among multiple iPINBPA runs, which
   turned out to be more stable patterns. A two-stage strategy was
   employed. First, ten groups of subnetworks were generated by running
   iPINBPA ten times with varying random seed values ranging from 1 to 10.
   After comparing these ten sets of results, we identified ten CMs, one
   from each run (defined as the intersection of the top subnetwork from
   the current run and the most similar subnetworks from all the other
   runs). As a result, there are four unique CMs based on ten identified
   ones.

   The genes in the CMs might not show a direct statistical significance
   but could interact with some genes identified in our GWAS. These genes
   can demonstrate indirect association with the studied QT, and may
   potentially provide valuable biological interpretation. For example,
   Consensus Module 1 contains the protein gamma-aminobutyric acid (GABA)
   A receptor (GABRB1) gene. GABRB1 codes for the β1 subunit of
   gamma-aminobutyric acid A (GABA[A]) receptors [[143]1]. The GABRB1 gene
   has been demonstrated to be involved in the thalamus structure and its
   interactive effects on intelligence [[144]51]. The GABRB1 gene had also
   been associated with many neuropsychological diseases, such as
   schizophrenia, major depression, bipolar disorder, and Alzheimer’s
   disease [[145]52].

   In this study, we hypothesize that trait prioritized CM with high
   replication might have strong functional associations with
   t-tau/Aβ[1–42] ratio. We gathered these identified pathways for 4 CMs
   to plot heatmap to explore and refine the relationships between
   pathways and CMs (Fig. [146]4). In Fig. [147]4, it was observed that
   four pathways, including colorectal cancer, gliomas, renal cell
   carcinoma, and Huntington’s disease, were commonly enriched in all the
   consensus modules. The neurodegenerative symptoms of neuron death
   affect many diseases, including Alzheimer’s, Parkinson’s, and
   Huntington’s diseases. Below we briefly discuss a few additional
   pathways identified in Fig. [148]4. In AD, focal adhesion complexes
   regulating neuronal viability can be used in treatment to adjust
   neuronal survival [[149]53]. The adherens junction has been
   demonstrated as maintaining blood–brain barrier integrity, and the
   adherens junction pathways is highly associated with neurodegenerative
   diseases [[150]54]. Apoptosis is an important pathway in Alzheimer’s
   disease that is associated with neuronal loss [[151]55]. The change in
   the MAPK signaling pathway contributes to significant change in
   neurotropin [[152]56]. Some cancer-related pathways were found, such as
   colorectal cancer, pancreatic cancer, prostate cancer, endometrial
   cancer, bladder cancer, and so on. Some prior studies have been
   performed to examine the relationship between cancer and neurologic
   disease [[153]57, [154]58].

   With these observations, the genes in the CMs may provide valuable
   information to suggest novel molecular mechanisms for subsequent
   analyses. Compared with the standard iPINBPA method, CM-iPINBPA network
   analysis strategy for mining consensus models could offer more stable
   results.

Common subnetwork and functional validation

   The common subnetwork is the intersection of the four identified common
   modules, and consists of 29 genes (Fig. [155]5). Among these genes, the
   GSK3B, SUMO1, AKAP5, CALM1 and DLG4 genes have been identified to be
   involved in tau phosphorylation, and over-phosphorylation of the tau
   protein can form the tangles in the brain of AD patients
   [[156]59–[157]63]. Additionally, the CASP8, PIK3R1, PPA1, PARP1,
   CSNK2A1, NGFR, and RHOA genes have been demonstrated to be involved in
   amyloid beta peptide production [[158]64–[159]70]. The BCL3, CFLAR,
   SMAD1, and HIF1A genes have been identified to be associated with
   late-onset Alzheimer's disease. The common subnetwork also contains the
   following genes TP53, DDX5, NDN, MST1R, CCDC106, NMT2, RPA1, AKAP13,
   GAB1, PPM1A, SPTBN1 and MED6, which interact with themselves and other
   genes. These findings offer valuable biological insights and suggest
   promising candidates for subsequent analyses.

   Table [160]5 shows the top twenty pathways enriched by the common
   subnetwork. Some significant pathways were observed, such as glioma,
   apoptosis, Huntington’s disease, renal cell carcinoma, melanoma, the
   erbb signaling pathway, focal adhesion, neurodegenerative diseases, and
   so on. Twelve genes (PDGFRB, PIK3R1, CALM1, CFLAR, TP53, RHOA, CASP8,
   HIF1A, GSK3B, GAB1, NGFR and CSNK2A1) were involved in the top twenty
   pathways. The gene PDGFRB has been confirmed as causative of primary
   familial brain calcifications (PFBC) [[161]71]. The gene PIK3R1 has
   been shown to be involved in Alzheimer’s disease [[162]72]. CALM1
   encodes for tau protein and regulates the subcellular localization and
   function of calmodulin in neurons [[163]73]. CFLAR suppresses death
   receptor-induced apoptosis and TCR activation, which induces cell death
   by inhibiting caspase-8 activation [[164]74]. RHOA was implicated in Aβ
   neurotoxicity, and the activation generates cytoskeletal changes
   [[165]75]. NGFR ligands play an important role in preventing
   fundamental tau-related pathologic mechanisms in Alzheimer’s disease
   [[166]76]. These pathways appear relevant given the results in prior
   studies. For other genes identified in this pathway analysis, it
   warrants further investigation to demonstrate the role they play.

Limitations

   Due to the limited number of subjects available to us, we were only
   able to perform a discovery study in this work. When more data become
   available in the future, replication studies in independent cohorts are
   required to evaluate and validate the identified network modules in our
   study. In addition, in this work, we reported the results using the
   default parameter setting provided by the software tool and according
   to Lili et al. [[167]22], except the random seed value. We ran iPINBPA
   multiple times by using different random seed values and then
   extracting the consensus patterns to stabilize the results. As to other
   parameters, we also briefly tested each of those by varying its value.
   For most of these parameters, we obtained very similar results. The
   most sensitive parameter is the restart ratio used in the step called
   “random walk with restart” for prioritizing phenotype-associated genes.
   It is excepted that different restart ratios will assign different
   weights to network nodes and subsequently produce different scores for
   network components. The determination of the optimal restart ratio
   warrants a separate and focused investigation and is an interesting
   future direction.

Conclusions

   Network-based methods form a new generation of enrichment analysis
   strategy, and they can overcome the limitation of traditional
   enrichment analysis where only a fixed set of pre-defined pathways are
   examined. In this study, a genome-wide network-based pathway analysis
   of the CSF biomarker of the t-tau/Aβ[1–42] ratio was performed, using a
   sample of 843 subjects from the ADNI database. To our knowledge, this
   is the first genome-wide network-based pathway study on the CSF
   biomarker t-tau/Aβ[1–42] ratio in Alzheimer’s disease. Due to the
   stochastic nature of the iPINBPA method, we employed a consensus module
   (CM) strategy to run iPINBPA multiple times and aimed to identify CMs
   from these different runs. We identified 4 CMs. These CMs contain not
   only genes from KEGG AD pathways, including BACE1, GSK3B, MAPK1, PSEN2,
   CALM1, CALM2, CASP8, and SK3B; but also interesting genes with relevant
   biological functions such as GABRB1, MMP2, CDK17, and IGFBP3. In sum,
   besides confirming previous findings (e.g., APOE, TOMM40, APOC1), this
   study has also suggested new susceptible genes, CMs and pathways
   underlying Alzheimer’s disease.

Acknowledgements