Abstract Bipolar disorder (BPD) is a complex psychiatric trait with high heritability. Despite efforts through conducting genome-wide association (GWA) studies, the success of identifying susceptibility loci for BPD has been limited, which is partially attributed to the complex nature of its pathogenesis. Pathway-based analytic strategy is a powerful tool to explore joint effects of gene sets within specific biological pathways. Additionally, to incorporate other aspects of genomic data into pathway analysis may further enhance our understanding for the underlying mechanisms for BPD. Patterns of DNA methylation play important roles in regulating gene expression and function. A commonly observed phenomenon, allele-specific methylation (ASM) describes the associations between genetic variants and DNA methylation patterns. The present study aimed to identify biological pathways that are involve in the pathogenesis of BPD while incorporating brain specific ASM information in pathway analysis using two large-scale GWA datasets in Caucasian populations. A weighting scheme was adopted to take ASM information into consideration for each pathway. After multiple testing corrections, we identified 88 and 15 enriched pathways for their biological relevance for BPD in the Genetic Association Information Network (GAIN) and the Wellcome Trust Case Control Consortium dataset, respectively. Many of these pathways were significant only when applying the weighting scheme. Three ion channel related pathways were consistently identified in both datasets. [29]Results in the GAIN dataset also suggest for the roles of extracellular matrix in brain for BPD. Findings from Gene Ontology (GO) analysis exhibited functional enrichment among genes of non-GO pathways in activity of gated channel, transporter, and neurotransmitter receptor. We demonstrated that integrating different data sources with pathway analysis provides an avenue to identify promising and novel biological pathways for exploring the underlying molecular mechanisms for bipolar disorder. Further basic research can be conducted to target the biological mechanisms for the identified genes and pathways. Introduction Bipolar disorder (BPD) is a severe and complex psychiatric disorder, with high heritability around 0.6 to 0.7 [30][1], [31][2]. Prior individual linkage studies and meta-analyses suggested a number of susceptible regions in human genome for the risk of developing BPD. However, most of these findings are inconsistent and rarely pointed to specific chromosomal locations for replication [32][3]. Recently, large scale genome-wide association (GWA) studies, which scanned half or a million single nucleotide polymorphisms (SNPs), were frequently employed. Although the GWA studies were anticipated to provide comprehensive genetic information for complex traits, previous GWA studies for BPD reported limited numbers of susceptible loci with small effect size. The odds ratios of significant findings in GWA studies for BPD in populations of European, American and Han Chinese were between 1.2 and 2.0 [33][4], [34][5], [35][6], [36][7], which are in accordance with the observations from other GWA studies in complex traits [37][8]. In addition, the reported associated variants from GWA studies often explain a small proportion of heritability for complex traits, a so called ‘missing heritability’ phenomenon [38][9]. Missing heritability may be owing to lack of power to detect common variants with very small effect, not including rare variants for their effects in whole-genomic array, or not considering other genomic mechanisms, such as complex gene-gene interaction and epigenetic influences [39][10]. In most of the genetic studies, a commonly applied strategy is to analyze single markers or specific haplotypes for their associations with disease of interests. This often produces limited success in identifying putative loci for BPD, especially for variants with small to moderate effect. The genetic causes of BPD are likely involved with a large collection of genetic variants in certain biological pathways to jointly exhibit their effects for the trait. Therefore, pathway-based approach becomes a useful and complementary method in addition to single locus analyses. A pathway is considered as a specific gene set that is defined according to certain biological function or process. Analyzing GWA dataset with pathway-based approach could provide integrating information of multiple loci with similar physiological functions to bring biological insights into the mechanisms of BPD. Previously, pathway analysis has been successfully conducted using GWA datasets for several complex traits, such as schizophrenia, major depressive disorder, and breast cancer [40][11], [41][12], [42][13] to reveal important biological mechanisms underlying the diseases. To perform pathway analysis using GWA dataset, an important first step is to extract SNPs information for each gene region. A commonly adopted method is to select the most significant SNP within a gene region to represent the gene [43][14]. Often, tens to hundreds of common SNPs are found in a typical gene region. Only a few of them are functional variants, thus, SNP with the maximum statistic may lack direct biological meaning and connection with the trait of interest. Nevertheless, non-structured variants may still regulate gene functions through other mechanisms. Epigenetic changes, one of regulatory mechanisms, can modify gene activity or gene expression without altering the genomic structure, including stable DNA methylation, post-translational modifications of histone proteins, and non-coding RNA [44][15]. Among these epigenetic modifications, changes of DNA methylation patterns at CpG sites are considered heritable and may play important roles in regulating gene functions [45][16], [46][17]. A pilot study of the Human Epigenome Project reported that more than 50% of CpG sites have greater than 50% variation within region of the major histocompatibility complex [47][18]. Compelling evidence also reveals a commonly observed phenomenon called allele-specific methylation (ASM) to describe different status of DNA methylation of a nearby CpG site by the two alleles presenting in a cell [48][19]. The profiles of DNA methylation are dynamic and tissue specific. Using human adult cerebellum samples, Zhang and colleagues (2010) conducted whole genome genetic polymorphisms and methylation quantitative association analysis to identify SNPs that regulate DNA methylation of CpG sites through cis- or trans- regulation [49][20]. Their findings documented that variation in genetic polymorphism affects the degree of DNA methylation in coding or non-coding region of specific genes. A considerable proportion of CpG sites were regulated by specific genetic variants distributed in the whole genome. Incorporating the ASM information into pathway-based analysis using GWA dataset may provide a new avenue to search for important biological pathways and to investigate the underlying pathogenesis of BPD. The present study aimed to integrate brain-specific ASM information into whole genome genotyping data to identify important pathways for bipolar disorder. We used two GWA datasets of BPD with relatively large samples in Caucasian populations, the Wellcome Trust Case Control Consortium (WTCCC) and the Genetic Association Information Network (GAIN). The list of brain-specific ASM was obtained from the Zhang's (2010) study. We applied comprehensive pathway based statistical approaches with novel weighting scheme to incorporate the impacts of ASM to evaluate the enrichment of annotated pathways for BPD. The present study successfully identified significant and novel pathways for BPD. Our strategy to explore potential mechanisms for BPD through integrating information from different genomic aspects can be well applied to other complex traits. Materials and Methods Genome-wide association (GWA) datasets In the current study, two GWA datasets of BPD in the Caucasian populations were used, the WTCCC [50][4] and the GAIN data [51][21]. We used these two individual GWA datasets to search for consistent pathway findings for BPD. The details of subject enrollment and genotyping of the two GWA studies were provided in their primary articles. In brief, all participants in the WTCCC were self-identified as white Europeans who lived in the United Kingdom. These included 1,868 subjects with BPD and 2,938 healthy subjects from the 1958 British Birth Cohort or United Kingdom Blood Donors. In the GAIN dataset, individuals were Americans with European ancestry, including 1,001 cases of BPD and 1,034 controls. The genotyping platform was Affymetrix GeneChip Human Mapping 500K Array and Affymetrix Genome-Wide Human SNP Array 6.0 for the WTCCC and GAIN, respectively. After quality control procedures implemented, a total of 485,263 (WTCCC) and 698,227 (GAIN) autosomal SNPs were retained in the following analyses [52][4], [53][21]. All single marker association analyses with additive model were conducted using PLINK versions 1.07 [54][22]. Brain specific allele-specific gene methylation (ASM) list and computing gene-wise statistic values Information of ASM in human brain tissues was obtained from Zhang and colleagues [55][20]. Regulating SNPs within 1 Mb region of both ends of each CpG site were considered cis-acting, and all the other regulating SNPs were trans-acting. SNPs with region-wise p-value less than 0.05 in cis-acting and SNPs with genome-wide p-value less than 0.05 in trans-acting were selected in the ASM list in the present study. In total, we had 9,414 SNP-CpG pairs in autosomatic chromosomes, which included 9,042 cis-acting and 372 trans-acting pairs (In the Table S5 of the Zhang et al's study). [56]Figure 1 described our analysis flow-chart. To obtain gene level significance for BPD in the two GWA datasets, we first mapped SNPs to a gene region (using NCBI build 35 for the WTCCC and build 36 for the GAIN due to different genotyping platforms) if SNPs were located within 5 kb of both ends of the gene. In the WTCCC dataset, there were 193,837 SNPs mapped to 15,054 genes. Among these SNPs, 6,324 SNPs in the ASM list locate in 1,785 genes. In the GAIN dataset, there were 304,343 SNPs mapped to 16,385 genes, and 6,992 SNPs in the ASM list locate in 1,961 genes. We therefore divided genes into two sets, ASM and non-ASM set, in the subsequent pathway analysis. We defined the ASM set to include 1) genes with SNPs in the ASM list, and 2) genes with CpG site that is regulated by SNPs in the ASM list. Genes not in the ASM set were assigned into the non-ASM set. In total, we had 2,327 and 2,298 genes in the WTCCC and GAIN datasets, respectively. Figure 1. The summary description of present pathway-based method. [57]Figure 1 [58]Open in a new tab *Gene with CpG site that is regulated by SNPs in the ASM list. The gene-wise p-value for each gene was defined by the most significant SNP within a gene region, which was commonly adopted in conducting pathway analysis [59][14]. For genes in the ASM set, the smallest p-value (min-p) among all SNPs in the ASM list in a gene region was used to represent the gene-wise statistic value. Similarly, for genes in the non-ASM set, the smallest p-value (min-p) of all SNPs in a gene region was used to represent the gene-wise statistic values. To account for potential bias caused by using minimum p-values to represent gene-wise statistic for genes with various sizes, we adopted the method by Yang and colleagues [60][23] to calculate normalized gene scores via 10,000 permutations for all the genes we analyzed. For each gene, a gene-size adjusted gene score was calculated and used in the following pathway analyses. Weighting procedure using the ASM information in pathways To incorporate methylation information into GWA dataset in pathway-based analyses, we applied weighting procedures for ASM and non-ASM gene sets for each pathway. The overall proportion of significant SNPs (i.e. p-value<0.05) in the whole GWA dataset was first calculated (11.4% in the WTCCC and 10.2% in the GAIN datasets). We then compared the proportion of significant SNPs in a given gene with the average proportion of significant SNPs in the whole GWA dataset to evaluate whether a gene is more informative than average. We used Harmonic average (H) as the basis for weighting the average amount of informative genes in the ASM set and non-ASM set in a pathway. The weighting procedures are described below. For each pathway, the corresponding numbers in the ASM and non-ASM gene sets were n and m, respectively. k[n] and k[m] represent the numbers of informative genes in the two gene sets separately. Thus, the proportion of informative genes in the ASM and non-ASM sets was denoted as k[n]/n and k[m]/m. Assuming that R[n] and R[m] are two reduced fractions of k[n]/n and k[m]/m, the Harmonic average of R[n] and R[m], H, was specified as 2/(1/R[n]+1/R[m]). The Harmonic average was chosen so that the average is less influenced by extreme R[n] or R[m], values, and H was used to calculate the gene-wise weights for genes in the ASM and non-ASM sets in a pathway and to further reduce the potential bias in pathway analysis due to pathway size variation (i.e., numbers of genes in each pathway vary). If the proportion of informative genes in the ASM set was greater than the non-ASM set, the weights for the ASM and non-ASM sets were R[n]/H and R[m]/H, respectively. If there were no informative genes in non-ASM set, we assigned a weight, ranging from 1 to 6, according to the proportion of informative genes (using 0.1, 0.3, 0.5, 0.7, and 0.9 as cutoff-values) to ASM set and 1 to non-ASM set. Otherwise genes in the ASM and non-ASM sets had equal weights. Statistical methods for pathway enrichment analysis We downloaded canonical pathway information from Molecule Signature Database (MsigDB). The MsigDB consists of several available online sources of pathway databases and manually curated pathways from the literature, including Kyoto Encyclopedia of Genes and Genomes (KEGG), BioCarta, Reactome, Gene Ontology (GO) terms, and gene sets compiled from published biomedical literature [61][24], which listed 4,726 pathways and 22,429 genes. Pathways with extreme numbers of genes (i.e., 10th percentile of pathway size distribution, less than 10 or more than 380) were removed from analysis to avoid stochastic bias or testing too general biological process. After mapping genes in GWA datasets into pathways, we tested in total 3,917 pathways for the WTCCC and 4,051 pathways for the GAIN datasets. We applied both competitive and self-contained pathway analyses approaches [62][14]. Competitive method compares the statistics of genes in a given pathway with the rest of genes not in the pathway. Self-contained method compares the statistics of genes in a given pathway with the null genomic background [63][25]. To obtain more comprehensive information in pathway analyses, three statistical methods were performed to evaluate the enrichment of all pathways, Gene Set Enrichment Analysis (GSEA), sum-square-statistic and sum-statistic [64][26], [65][27], [66][28]. The details of calculation procedures were provided in our previous study [67][13]. In brief, GSEA method ordered a set of genes by the min-p, and the gene-wise statistic values ( Inline graphic ) were defined as the chi-square statistic of the corresponding most-significant SNP. For each examined pathway, an enrichment score (ES) was calculated to evaluate association signals for all genes in the pathway. The sum-statistic or sum-square-statistic methods were to sum (the square) all gene-wise statistic values over the set of genes ( Inline graphic or Inline graphic ) [68][28]. The three pathway-based approaches were analyzed with or without weighting procedures using the ASM information. We performed five thousand permutations to obtain empirical p-values for each pathway. The Benjamini and Hochberg (BH) multiple comparison procedure was used to control for the false discovery rate (FDR) [69][29] in pathway analyses. A p-value less than 0.01 after FDR correction was considered significant in the present study. To examine the common processes or underlying biological themes among significant pathways, we also analyzed functions of genes in enriched pathways using GO terms, including domains in biological process, cellular component, and molecular function ([70]http://www.broadinstitute.org/gsea/msigdb/annotate.jsp). Results In the GAIN dataset, 88 pathways were significant, in which 32 were identified only by weighting the ASM information ([71]Table S1). Similarly, we identified 15 pathways (11 of them were identified only when weighting procedure is applied for the ASM set) in the WTCCC dataset with p-value less than 0.01 after BH correction ([72]Table S2). Among the 88 enriched pathways in the GAIN dataset, there were 32 (36.4%) from GO, nine (10.2%) from KEGG, and eight (9.1%) from Reactome. Among the 15 enriched pathways in the WTCCC dataset, there were six (40.0%) from GO and one (6.7%) from Biocarta. Three pathways consistently exhibited their biological relevance for BPD in both GWA datasets ([73]Table 1). These 3 pathways were cation channel activity, gated channel activity, and metal ion transmembrane transporter activity. Additionally, the enriched pathways in the GAIN dataset ([74]Table S1) were involved in a series of biological procedures and mechanisms, such as brain development and neuron function (e.g. nervous system development, neurological system process, axon guidance, etc.), component of extracellular matrix, ECM (e.g. cell matrix adhesion, ECM receptor interaction, focal adhesion, integrin cell surface interactions, etc.), neurotransmitter (e.g. glutamate signaling pathway), and ion channel activity (e.g. potassium channel activity, voltage gated cation channel activity, calcium signaling pathway, etc.). Most of the enriched GO pathways in the WTCCC dataset were associated with serotonin receptor and channel and transporter activity, such as serotonin receptor activity, gated channel activity, cation channel activity, and metal ion transmembrane transporter activity ([75]Table S2). Table 1. Concordant enriched pathways among GWA datasets of the GAIN and the WTCCC by different pathway-based methods. Pathway name Total genes in pathway GAIN WTCCC No. of genes on list % of ASM gene Empirical p-value after the BH correction[76]^# No. of genes on list % of ASM gene Empirical p-value after the BH correction[77]^# GSEA SUMST SUMSQ GSEA SUMST SUMSQ GO_Cation channel activity 118 113 19.5 0.0881[78]^a 0.0000[79]^c 0.0386[80]^b 107 20.6 0.1741[81]^a 0.0000[82]^b 0.3656[83]^b GO_Gated channel activity 121 114 19.3 0.0661[84]^b 0.0000[85]^c 0.0559[86]^c 109 22.0 0.0870[87]^b 0.0000[88]^b 0.2765[89]^b GO_Metal ion transmembrane transporter activity 145 136 19.1 0.0939[90]^b 0.0000[91]^c 0.0000[92]^a 129 19.4 0.0870[93]^b 0.0000[94]^b 0.2765[95]^b [96]Open in a new tab ASM: Gene set of allele-specific methylation; GSEA: Gene Set Enrichment Analysis; SUMST: sum-statistic; SUMSQ: sum-square-statistic. ^# : The p-value after correction by the Benjamini and Hochberg (BH) multiple comparison procedure. ^a : Empirical p-values of non-weighting method is less than weighting; ^b : Empirical p-values of weighting method is less than non-weighting; ^c : Empirical p-values of non-weighting and weighting are equivalent. Using GO term analysis, we further examined genes in the enriched pathways of GAIN and WTCCC datasets (other than the original significant GO pathways) to search for common functions of these genes. In the GAIN datasets, there were 4,600 unique genes in 56 non-GO pathways. [97]Table S3 shows the top 50 significant GO terms with p-value less than 0.05. Most of these significant GO terms were associated with cytoskeleton structure (e.g. actin cytoskeleton organization and biogenesis, actin filament based movement, etc.), ECM (e.g. extracellular matrix structural constituent, collagen, integrin complex, integrin complex, etc.), and cation and gated channel activity (e.g. cation channel activity, nicotinic acetylcholine gated receptor channel complex, voltage gated calcium channel activity, etc.). In the WTCCC datasets, there were 990 genes in 9 significant non-GO pathways. [98]Table S4 exhibits the top 50 significant GO terms. Most of these significant GO terms were associated with ion channel activity (e.g., calcium, potassium, sodium, chloride channel activity, etc.), transporter activity (e.g. cation transmembrane transporter activity, inorganic cation transmembrane transporter activity, etc.), and neurotransmitter receptor activitysuch as serotonin receptor activity. [99]Table 2 displays the significant GO terms that were concordantly identified for BPD in both GWA datasets. The 29 GO terms were mainly associated with ion channel activity, such as calcium channel, ligand gated channel, nicotinic acetylcholine gated channel, voltage gated channel, etc. Table 2. Concordant gene sets in the two GWA datasets of the GAIN and the WTCCC using Gene Ontology analysis. Gene get name NO. of gene in gene Set GAIN (4,600 genes) WTCCC (945 genes) % of the overlap in Gene set p-value* % of the overlap in Gene set p-value* Calcium channel activity 33 90.9 2.31E−05 87.9 0.00E+00 Cation transmembrane transporter activity 211 78.2 5.39E−11 62.1 0.00E+00 Cation transport 146 65.8 1.88E−02 53.4 0.00E+00 Delayed rectifier potassium channel activity 12 100.0 1.18E−03 91.7 5.82E−10 Excitatory extracellular ligand gated ion channel activity 21 85.7 5.29E−03 81.0 4.91E−13 Extracellular ligand gated ion channel activity 21 85.7 5.29E−03 81.0 4.91E−13 Gated channel activity 121 86.0 5.18E−12 86.8 0.00E+00 Inward rectifier potassium channel activity 12 100.0 1.18E−03 91.7 5.82E−10 Ion channel activity 147 83.7 3.29E−12 82.3 0.00E+00 Ion transmembrane transporter activity 275 70.2 3.46E−06 53.8 0.00E+00 Ion transport 184 64.7 1.99E−02 47.8 0.00E+00 Ligand gated channel activity 39 79.5 2.81E−03 79.5 0.00E+00 Metal ion transmembrane transporter activity 145 86.9 4.33E−15 86.9 0.00E+00 Monovalent inorganic cation transport 93 69.9 7.16E−03 61.3 0.00E+00 Nicotinic acetylcholine activated cation selective channel activity 11 100.0 2.07E−03 81.8 1.78E−07 Nicotinic acetylcholine gated receptor channel complex 11 100.0 2.07E−03 81.8 1.78E−07 Potassium channel activity 50 96.0 4.29E−10 92.0 0.00E+00 Potassium ion transport 58 84.5 7.29E−06 77.6 0.00E+00 Sodium channel activity 17 82.4 2.72E−02 76.5 1.10E−09 Substrate specific channel activity 154 80.5 4.58E−10 78.6 0.00E+00 Substrate specific transmembrane transporter activity 341 67.5 3.72E−05 43.7 0.00E+00 Substrate specific transporter activity 388 63.4 5.25E−03 39.2 0.00E+00 Transmembrane transporter activity 371 66.6 7.59E−05 40.7 0.00E+00 Voltage gated calcium channel activity 18 94.4 5.88E−04 88.9 1.37E−13 Voltage gated calcium channel complex 15 93.3 2.69E−03 86.7 6.10E−11 Voltage gated cation channel activity 66 93.9 1.75E−11 90.9 0.00E+00 Voltage gated channel activity 73 90.4 3.53E−10 90.4 0.00E+00 Voltage gated potassium channel activity 36 100.0 1.57E−09 94.4 0.00E+00 Voltage gated potassium channel complex 40 90.0 5.76E−06 82.5 0.00E+00 [100]Open in a new tab GAIN: The analysis of biological gene sets by Gene Ontology was among 4,600 genes from 56 enriched pathways; WTCCC: The analysis of biological gene sets by Gene Ontology was among 945 genes from 9 enriched pathways. We further identified genes that were over-represented in enriched pathways. We selected genes that were commonly involved in more than 20% out of all enriched pathways for each GWA dataset and had at least one SNP having p-value less than 0.05 in the GAIN or the WTCCC dataset, that is, more than 18 pathways in the GAIN and 3 pathways in the WTCCC datasets. The proportion of significant SNPs in these genes ranged from 1.2% to 54.5%. In total, there were 26 concordant genes that satisfied these criteria between the two GWA datasets ([101]Table 3). They were mainly associated with calcium and potassium channel. In the WTCCC datasets, the over-represented genes were also associated with synaptic transmission (e.g. ACCN1, CHRNAV6, HTR3B, HTR3A), mediation of calcium ion release (e.g. RYR1, RYR2, RYR3, TRPC3, TRPC4), and channels of calcium, potassium, and sodium. Table 3. Over-representing genes in enriched pathways in the two GWAS datasets of the GAIN and the WTCCC. Gene Set GAIN WTCCC No. of SNP in gene % of significant SNPs[102]^# Smallest p-value No. of SNP in gene % of significant SNPs[103]^# Smallest p-value ACCN1 ASM 348 11.5 1.35E−03 CACNA1A ASM 48 2.1 4.06E−02 CACNA1B ASM 20 30.0 1.11E−02 CACNA1C ASM 205 3.4 3.82E−03 149 26.8 5.49E−05 CACNA1D ASM 123 3.3 2.52E−02 68 14.7 4.45E−03 CACNA1E ASM 45 6.7 3.92E−03 CACNA2D1 non-ASM 77 2.6 2.47E−02 CACNB2 ASM 185 11.9 5.07E−04 127 9.4 3.33E−05 CACNB3 ASM 1 100.0 3.63E−02 CACNB4 ASM 68 1.5 4.59E−02 37 5.4 3.60E−02 CENPN non-ASM 4 25.0 5.15E−22 CHRNA6 non-ASM 6 33.3 2.84E−02 HTR3B non-ASM 14 7.1 6.88E−19 KCNA2 ASM 5 20.0 4.85E−02 KCNA4 non-ASM 5 20.0 3.64E−02 KCNB2 ASM 142 4.9 2.62E−03 107 3.7 2.34E−04 KCNC1 ASM 11 54.5 1.59E−02 9 22.2 2.99E−02 KCNC4 ASM 10 10.0 1.60E−02 KCND3 ASM 114 14.9 7.76E−04 59 5.1 1.31E−02 KCNE1 ASM 20 25.0 2.04E−02 KCNG2 non-ASM 8 25.0 6.66E−04 KCNH1 ASM 149 8.1 4.03E−03 79 3.8 1.19E−02 KCNH2 non-ASM 2 50.0 2.74E−02 KCNJ1 non-ASM 11 9.1 3.95E−02 KCNJ12 5 20.0 3.02E−02 KCNJ15 14 7.1 3.11E−02 KCNJ3 ASM 49 12.2 2.02E−03 KCNJ5 non-ASM 18 5.6 7.22E−03 KCNJ6 ASM 157 8.3 5.15E−03 97 2.1 2.12E−03 KCNK1 ASM 36 8.3 2.01E−02 21 38.1 3.04E−02 KCNK3 2 50.0 3.16E−05 KCNMB2 ASM 94 23.4 1.46E−03 63 11.1 1.03E−02 KCNN2 30 10.0 3.71E−03 KCNN3 ASM 83 15.7 4.28E−03 36 2.8 3.55E−02 KCNQ1 non-ASM 102 14.7 6.35E−04 64 4.7 2.53E−02 KCNQ3 non-ASM 162 1.2 1.65E−02 97 4.1 4.10E−03 KCNQ5 94 8.5 2.07E−03 KCNS1 non-ASM 5 40.0 6.14E−04 4 50.0 5.58E−05 KCNS3 ASM 21 14.3 3.18E−02 24 20.8 1.72E−02 P2RX4 3 33.3 2.93E−02 PKD2 ASM 27 22.2 7.65E−03 10 20.0 1.59E−02 RYR1 21 9.5 4.76E−03 RYR2 114 14.0 1.30E−03 RYR3 185 5.4 2.14E−03 SCN11A 20 30.0 1.13E−02 SCN2A 14 14.3 4.18E−02 SCN2B 11 18.2 7.79E−03 SCN5A 19 5.3 2.33E−02 SCN9A 22 9.1 2.08E−03 SERPINB5 21 4.8 2.98E−02 TRPC3 12 8.3 3.08E−02 TRPC4 57 10.5 8.55E−03 [104]Open in a new tab ASM: Gene set of allele-specific methylation; Non-ASM: Gene set of other than ASM in pathway analysis. ^# : significant level: p-value less than 0.05. Discussion For complex trait like bipolar disorder, the whole-genome screening provides comprehensive genetic data and pathway-based approaches offer complementary information to reveal underlying complex biological connections in the whole-genome scale. [105]Results of pathway–based analysis not only can verify prior causal hypotheses for BPD (e.g. neurotransmitter processes and neuron activity dysfunction in brain, etc) but also to explore novel biological pathways [106][30], [107][31]. In the present study, we found enriched pathways for BPD to be related to ion channel activity such as calcium, potassium, and sodium ion. These findings are consistent with some of the presumed pathological mechanisms for BPD. Additionally, our results in the GAIN datasets suggest for the roles of extracellular matrix in brain to be involved in the development of bipolar disorder. Among our reported significant pathways in the two datasets, some of them were identified through GO database, which include nervous system development, ion channel and transporter activity, extracellular matrix, etc in the GAIN dataset and gated channel activity in the WTCCC dataset. These enriched GO pathways for BPD are in line with findings from some of previous studies [108][32], [109][33], [110][34]. The other pathways we identified are mainly based on gene sets that were compiled from published biomedical literature (discussed later). We noted that few studies applying pathway based approaches for BPD utilized pathway sources only from KEGG or GO, which include only 25% to 40% genes in the whole human genome. Thus, a large proportion of genes with potential impacts for the trait of interest might be excluded from pathway analyses. This is clearly the restriction that results of pathway findings depend on the completeness and correctness of annotated pathway databases. Therefore, using more comprehensive pathway sources to include the whole genome (as we used in the present study) brings benefit to obtain more accurate and full understanding for the enriched pathways for bipolar disorder. A common phenomenon seeing in many GWA studies is that most of the reported genetic variants lack direct biological connection or knowledge (e.g. variants in a gene desert) [111][35], and are likely to be statistically artificial findings without meaningful biological explanations. The expectation of pathway analysis is to reveal more biological insights for the potential mechanisms of the trait, thus, to integrate other aspects of information from genetic regulation mechanisms into pathway analysis could be very useful. The present study focused on DNA methylation patterns, which represent as epigenetic markers to play critical roles in regulating gene expression. The commonly observed tissue specific allele-specific methylation could explain possible links between non-coding SNPs and gene function. We designed a weighting scheme to incorporate such information of functional variants into our pathway analyses for the GWA datasets of bipolar disorder. We found that several enriched pathways were only significant after applying the weighting scheme (36.3% in the GAIN dataset and 73.3% in the WTCCC datasets). These results with the weighting scheme reveal additional information, such as pathways related to exracellular matrix, long term potential, regulation of heart atrium and ventricle, methylated in acute lymphoblastic leukimia, etc in the GAIN dataset ([112]Table S1) and gated and cation channel activity in the WTCCC dataset ([113]Table S2). Prior studies have found that genes in some of these pathways are regulated by epigenetic mechanism. One recent study reported that epigenetic regulation of DNA demethylation of target genes, such as RELN (reelin, extracellular matrix serine protease) and BDNF, might underlie the mechanisms of synaptic plasticity and memory retention in rat medial prefrontal cortex [114][36] In addition, the expression of gene SCN3A that encodes for a subunit of voltage-gated sodium channel, which mainly expresses in the central nerve system, has found to be regulated by DNA methylation mechanism in mouse [115][37]. Some significant pathways in the present study identified through published biomedical databases were related to carcinogenesis (e.g. thyroid carcinoma, down-regulation of breast cancer, etc), specific targets of molecule complex (e.g. NCAM1 interactions, targets of CCND1 and CDK4, CHREBP, and SEMA3B etc.), and regulation of cellular processes or human diseases (e.g. arrhythmogenic right ventricular cardiomyopat, hypertrophic cardiomyopathyhcm and systolic heart failure, hematopoietic stem cell, etc). We examined the common functions of genes that included in these significant pathways to further explore their roles in the etiology of BPD. [116]Results of GO term analysis in the GAIN dataset ([117]Table S3) demonstrated the importance of collagen and extracellular matrix. Components of the extracellular matrix surround cell and mediate many important cellular processes such as cell differentiation, tissue rearrangement, and carcinogenesis. Neuron migration and colony from different brain areas which enrich the neuronal network with functional unit are highly associated with the extracellular matrix [118][38]. Disruption of this process in brain may be a potential cause of bipolar disorder. Interestingly, pathways that are related to cell movement (such as the extracellular matrix, focal adhesion, and regulation of actin cytoskeleton) were found to be associated with antipsychotic induced tremors in patient with mania episode [119][39]. In a GWA study of bipolar disorder, collagen type 11α2 (COL11A2), a component of extracellular matrix, was also shown to be associated with bipolar disorder comorbid with alcohol dependence [120][40]. This evidence altogether points to a new possibility to further investigate the roles of brain extracellular matrix in bipolar illness. We also identified several important genes that over-represented in reported enriched pathways. Many of these genes are associated with different biological processes and functions, including synaptic transmission (e.g. ACCN1, CHRNA6, HTR3B, HTR3A, etc) and cation channels activity (e.g. ACCN1, CACNA1C, KCNN3, etc). For genes encode for synapse components, prior association studies have indicated the involvement of certain genetic variants in a variety of psychiatric disorders [121][33], [122][41], [123][42], [124][43], [125][44], [126][45]. Genetic variants in genes related to serotonin transmission (e.g. HTR1B,HTR3A, HTR5A, etc) were associated with mood disorder [127][46]. Polymorphisms in CHRNA6 (nicotinic alpha subunit 6 of neuronal cholinergic receptor), have also been reported to be associated with bipolar disorder [128][47]. In addition, many studies have reported that variations in gene CACNAIC (alpha 1C subunit of the L-type voltage-gated calcium channel) had strong association signals with bipolar illness [129][5]. Potassium channels are found in most cell types and control a wide variety of cell functions, such as regulation of action potential and resting membrane potential in neurons. Prior study reported that long repeats of KCNN3 reduce potassium channel function and modify cognitive performance in schizophrenic patients [130][48]. ACCN1 (neuronal amiloride-sensitive cation channel 1), a cation channel with high affinity for sodium, is also permeable for lithium and potassium ions. A recent genome-wide scan found that genetic variants in ACCN1 were associated with response to lithium treatment in bipolar patients [131][49]. In conclusion, our study integrated methylation information with genome-wide genotyping data to bring biological insights into the underlying pathogenesis of bipolar illness. We identified significant pathways that are in line with evidence from prior causal hypotheses for bipolar disorder, and also reported novel biological pathways, such as the involvement of brain extracellular matrix in bipolar disorder. The strategy we applied provides another avenue to comprehensive our knowledge for the complex networks reside in the biological basis of bipolar disorder. Our findings could facilitate follow-up basic research to validate the functional and biological mechanisms for identified genes and pathways. There are some limitations in the current study. First, the smallest p-value (as commonly adopted in other studies [132][14]) was used to define the gene-level statistic, thus, information of other markers in a gene region is excluded. Using a combined method to include all markers' information in a defined gene may provide slightly different results in pathway analysis, such as the Inverse Gamma method [133][50], random effects model, or Bayesian statistical methods [134][51]. In addition, the accuracy of pathway analysis results depends on the completeness and correctness of annotated pathway database. Although we have used the more comprehensive databases, there is still likely that some pathways were not included in our analysis. Second, we incorporated methylation information in brain tissues into pathway analysis, while other genomic information such as gene expression or other epigenetic regulation was not used. Integration of genomic information from different platforms may provide additional benefit to identify enriched pathways for bipolar disorder. Third, we used two major GWA datasets of BPD in Caucasian populations to obtain concordant findings. Although these are large-scale GWA datasets, the association results from meta- or mega- analysis can be used in the near future to further increase the power to uncover the underlying biological mechanisms for BPD. Supporting Information Table S1 88 Significant pathways in the GAIN dataset by pathway-based methods after correction for multiple comparisons. #: The significant p-value after correction by the BH multiple comparison procedure; GSEA: Gene Set Enrichment Analysis; SUMSQ: sum-square-statistic; SUMST: sum-statistic. (DOCX) [135]Click here for additional data file.^ (34KB, docx) Table S2 15 Significant pathways in the WTCCC dataset by pathway-based methods after correction for multiple comparisons. #: The significant p-value after correction by the BH multiple comparison procedure; GSEA: Gene Set Enrichment Analysis; SUMSQ: sum-square-statistic; SUMST: sum-statistic. (DOCX) [136]Click here for additional data file.^ (21.6KB, docx) Table S3 The top 50 significant GO terms with p-value less than 0.05 in the GAIN dataset. (DOCX) [137]Click here for additional data file.^ (23.3KB, docx) Table S4 The top 50 significant GO terms with p-value less than 0.05 in the WTCCC dataset. (DOCX) [138]Click here for additional data file.^ (22.9KB, docx) Acknowledgments