Abstract Depression is currently the leading cause of disability around the world. We conducted an epigenome-wide association study (EWAS) in a sample of 58 depression score-discordant monozygotic twin pairs, aiming to detect specific epigenetic variants potentially related to depression and further integrate with gene expression profile data. Association between the methylation level of each CpG site and depression score was tested by applying a linear mixed effect model. Weighted gene co-expression network analysis (WGCNA) was performed for gene expression data. The association of DNA methylation levels of 66 CpG sites with depression score reached the level of P < 1 × 10^−4. These top CpG sites were located at 34 genes, especially PTPRN2, HES5, GATA2, PRDM7, and KCNIP1. Many ontology enrichments were highlighted, including Notch signaling pathway, Huntington disease, p53 pathway by glucose deprivation, hedgehog signaling pathway, DNA binding, and nucleic acid metabolic process. We detected 19 differentially methylated regions (DMRs), some of which were located at GRIK2, DGKA, and NIPA2. While integrating with gene expression data, HELZ2, PTPRN2, GATA2, and ZNF624 were differentially expressed. In WGCNA, one specific module was positively correlated with depression score (r = 0.62, P = 0.002). Some common genes (including BMP2, PRDM7, KCNIP1, and GRIK2) and enrichment terms (including complement and coagulation cascades pathway, DNA binding, neuron fate specification, glial cell differentiation, and thyroid gland development) were both identified in methylation analysis and WGCNA. Our study identifies specific epigenetic variations which are significantly involved in regions, functional genes, biological function, and pathways that mediate depression disorder. Subject terms: Clinical genetics, Personalized medicine Introduction Depression is currently the leading cause of disability worldwide [[38]1], and it is predicted to be one of the three leading causes of illness by 2030 [[39]2]. Although the heavy social and economic burden, the potential molecular mechanisms underlying depression remain poorly understood. The depression risk is influenced by both genetic and environmental factors. It is suggested that epigenetic modification could mediate the lasting increasing depression risk resulting from exposure to adverse life events and provide a mechanistic framework, where genetic and environmental factors were integrated [[40]3, [41]4]. DNA methylation was one important form of epigenetic modification, and one recent review of 67 studies concluded that there was evidence for the association of DNA methylation variation with depression risk [[42]5]. Additionally, candidate gene studies discovered that BDNF and SLC6A4 hypermethylation were related to depression or major depressive disorder (MDD) [[43]5]. Even currently some significant methylation modifications were found to be associated with depression, however, no consistent results were identified. Nowadays, using monozygotic twins discordant for a trait or disease has been proved to be a powerful and popular design for EWAS in linking the environmental basis of epigenetic modification variation to disease status while controlling for individual genetic component [[44]6, [45]7]. This design has been extensively used to explore specific DNA methylation variation associated with diseases, such as cognitive function decline [[46]8], Alzheimer’s disease [[47]9], and rheumatoid arthritis [[48]10]. Since the Chinese population are different from other ethnic populations worldwide concerning genetic constitutions, environmental exposure, a multitude of life styles and occupations, the DNA methylation variation may also differ. However, to our knowledge, yet no EWAS has been performed to explore the DNA methylation variation associated with depression using Chinese monozygotic twin samples. Accordingly, we aimed to conduct an EWAS to detect DNA CpG sites associated with depression and further integrate with gene expression data in a sample of middle and ole-aged Chinese monozygotic twins. Material and methods The primary materials and methods of this study were similar to our previously published studies [[49]8, [50]11, [51]12]. Participants Participants recruitment and collection were described in detail previously [[52]13]. Participants who suffered from cerebrovascular disorders, stroke, traumatic brain injury, central nervous system (CNS) tumor, CNS infections, and alcohol or drug dependence were excluded. Meanwhile, participants who were unconscious, unable, or unwilling to cooperate were also dropped. Finally, a total of 58 complete monozygotic twin pairs with a mean age of 52 years (SD: 7.5) were included in the methylation analysis, and a subsample consisted of 12 twin pairs were included in the gene expression analysis. The median of absolute values of intrapair depression score difference (∆(depression score)) of all twins was 4 (range: 1–15). The number of twin pairs for ∆(depression score) ranging in 1–5, 6–10, and 11–15 was 39, 15, and 4 in the methylation analysis and 9, 3, and 0 in the gene expression analysis, respectively. The median of ratio difference calculated as |∆(depression score)|/max(depression score) was 0.41 (range: 0.14–1.00). After providing written informed consent, all participants took a standardized questionnaire and underwent a health examination. This study was approved by the Regional Ethics Committee of the Qingdao CDC Institutional Review Boards. And the ethical principles of the Helsinki Declaration were also followed. Assessment of depression Depression was assessed by the Geriatric Depression Scale-30 (GDS-30) [[53]14]. The GDS-30 had 30 items, and participants were asked to answer “yes” or “no” to the items based on how they felt over the past 1 week. The higher the total score was, the more severe the participant’s mental condition was. Reduced representation bisulfite sequencing (RRBS) analysis The total DNA was extracted from whole blood and sent to one biomarker technology corporation in China for the RRBS experiment. Briefly, genomic DNA was first digested to generate short fragments that contained CpG dinucleotides at the ends. Then the CpG-rich DNA fragments were extracted and bisulfite-converted. The cDNA library was constructed and sequenced to get raw sequencing data, which was then preprocessed and mapped by Bismark [[54]15] and smoothed by R package BiSeq [[55]16]. The methylation β-value was transformed to M-value for statistical modeling. Finally, a total of 551,447 CpG sites were included. Cell-type composition estimation Considering total DNA was extracted from whole blood, different methylation profiles of distinct cell types may lead to false discoveries [[56]17]. We used ReFACTor method to control for the cell-type composition effect on DNA methylation in EWAS [[57]18]. ReFACTor is an unsupervised reference-free method that selects methylation sites, which are informative about the cell composition in the data to apply to principal component analysis (PCA) and further uses the top components of PCA to construct surrogates for the underlying cell-type compositions for adjustment in statistical analysis. In our study, the top five components were chosen as covariates to correct the cell-type heterogeneity. RNA library construction, sequencing, and quality control Briefly, after total mRNA was extracted from whole blood, the RNA-Seq library was constructed and sequenced to get the sequenced data. The data was then mapped to the human genome by TopHat2 [[58]19]. The gene expression level was estimated by FPKM value through Cufflinks software [[59]20]. Statistical analysis Methylation analysis Epigenome-wide association analysis The association between the methylation level of each CpG site and depression score was tested by a linear mixed effect model, which was equivalent to the regression model as proposed by Tan et al. [[60]6]. The fixed effect variables of age, gender, and cell-type composition as well as random effect variable of twin pairing were adjusted for in the model. Predicting functions of cis-regulatory regions and ontology enrichments analysis The identified epigenome CpG sites (P < 0.05) were submitted to the Genomic Regions Enrichment of Annotations Tool (GREAT) online to analyze the functional significance of cis-regulatory regions and ontology enrichments [[61]21]. The default “basal plus extension” association rule was chosen. In this rule, a “basal regulatory region” irrespective of the presence of neighboring genes which extended 5 kb upstream and 1 kb downstream of the transcription start site (TSS) were firstly assigned. Then each gene’s regulatory domain was extended up to the basal regulatory region of the nearest upstream and downstream genes, but no longer than 1 Mb in each direction. FDR < 0.05 was set as statistically significant in ontology enrichments analysis. Differentially methylated region (DMR) analysis Based on the bisulfite sequencing data and corresponding EWAS results, the DMRs associated with depression score were detected by using the comb-p approach [[62]22]. Significant enriched DMRs were evaluated by Stouffer-Liptak-Kechris (slk) corrected P-value < 0.05. Gene expression analysis Weighted gene co-expression network analysis (WGCNA) The WGCNA package is a comprehensive collection of R functions for performing various aspects of weighted correlation network analysis [[63]23, [64]24]. Briefly, we first established a weighted adjacency matrix. Then the topological overlap matrix (TOM) was constructed [[65]25–[66]27] and used as input for hierarchical clustering analysis [[67]28]. Gene modules were detected by using a dynamic tree cutting algorithm. The module eigengenes (MEs) were correlated with the trait of depression score. Relationships among modules were illustrated by a hierarchical clustering dendrogram of MEs [[68]29], and a heatmap plot of the corresponding eigengene network. Intramodular hub genes were defined following criteria of depression score based gene significance (GS) > 0.70 and module membership (MM) > 0.90 with a threshold of P-value < 0.01 [[69]30]. For the genes clustered in the module associated with depression score, GO enrichment analysis and BIOCARTA, KEGG, and REACTOME pathway enrichment analysis were conducted by the DAVID tool [[70]31, [71]32]. The modified fisher exact P-value < 0.05 was considered as enrichment cut-off criterion. Differentially expressed genes analysis Five depression cases (depression score > 10) and eight health controls were included. The gene expression levels of 46 genes (including the genes where the top CpG sites and the DMRs were located) between the two groups were compared by the Wilcoxon rank sum test. The P-value < 0.05 was considered as statistically significant. Results Methylation analysis A total of 58 monozygotic twin pairs with a mean age of 52 years (SD: 7.5) were included. The median of depression score was 8 (95% range: 0–21). Most of the clinical indicators were statistically intrapair correlated, indicating that the co-twin design could be beneficial (Table [72]1). Table 1. Basic characteristics of the participants. Characteristics Values Intrapair correlation r P-value Number of twin pairs 58 Gender, pairs (%) Male 29 (50) – – Female 29 (50) – – Age, mean (SD) (years) 52 (7.5) – – Depression score, M (P[2.5], P[97.5]) 8 (0, 21) 0.36 0.006 Cognitive function score, M (P[2.5], P[97.5]) 21 (8, 30) 0.39* 0.002 BMI, mean (SD), (kg/m^2) 25.18 (3.63) 0.63** <0.001 Systolic, M (P[2.5], P[97.5]) (mmHg) 130 (104, 179) 0.44* 0.001 Diastolic, M (P[2.5], P[97.5]) (mmHg) 82 (62, 100) 0.31* 0.020 SUA, mean (SD) (μmol/L) 302 (95) 0.52** <0.001 GLU, M (P[2.5], P[97.5]) (mmol/L) 5.4 (3.60, 10.86) 0.57** <0.001 CHOL, mean (SD) (mmol/L) 4.97 (1.18) 0.54** <0.001 TG, M (P[2.5], P[97.5]) (mmol/L) 1.19 (0.20, 5.67) 0.58** <0.001 HDLC, M (P[2.5], P[97.5]) (mmol/L) 1.34 (0.66, 2.71) 0.83** <0.001 LDLC, mean (SD) (mmol/L) 2.89 (0.88) 0.45** <0.001 [73]Open in a new tab Continuous variables were presented as mean (standard deviation (SD)) or median (P[2.5], P[97.5]); Categorical variables were presented as numbers with percentages. BMI body mass index, CHOL total cholesterol, GLU fasting glucose, HDLC high-density lipoprotein cholesterol, LDLC low-density lipoprotein cholesterol, SUA serum uric acid, TG triglyceride. Epigenome-wide association analysis As shown in the Manhattan plot (Fig. [74]1) and Table [75]2, the association of DNA methylation levels of 66 top CpG sites with depression score reached the level of P < 1 × 10^−4. The numbers of these top CpG sites located at chr1, chr4, chr16, chr19, chr20, chr11, chr7, chr6, chr12, and chr17 were 9, 8, 7, 6, 6, 5, 5, 4, 3, and 3, respectively. The strongest association (β = 0.403, P = 2.951 × 10^−8) was detected for the CpG site (chr16: 90,143,728 bp) located near PRDM7. All of the top CpG sites (P < 1 × 10^−4) were located at 34 genes, and there were 5, 4, 4, 4, 3, 3, 3, 3, and 3 CpG sites located at/near PTPRN2, HES5, PRDM7, RIOK1, FCGBP, HELZ2, HPF1, LAMA5, and TRIM69, respectively. Fig. 1. Circular Manhattan plot for epigenome-wide association study of depression. [76]Fig. 1 [77]Open in a new tab The numbers of chromosome and the −log[10] of P-values for statistical significance are shown. The dots represent the observed CpG sites. Table 2. The results of epigenome-wide association study on depression score (P-value <1 × 10^−4). Chromosome Position (bp) Coefficient P-value Ensembl gene ID HGNC symbol chr16 90,143,728 0.403 2.951E−08 ENSG00000126856 PRDM7 chr16 90,143,720 0.402 4.592E−08 ENSG00000126856 PRDM7 chr16 90,143,734 0.361 4.714E−08 ENSG00000126856 PRDM7 chr1 2,460,431 0.054 4.083E−07 ENSG00000197921 HES5 chr16 90,143,752 0.311 1.753E−06 ENSG00000126856 PRDM7 chr4 170,696,032 0.132 2.809E−06 NA NA chr4 170,696,037 0.131 3.123E−06 NA NA chr7 157,794,468 −0.055 3.609E−06 ENSG00000155093 PTPRN2 chr4 170,696,018 0.131 4.001E−06 NA NA chr4 170,696,053 0.130 4.603E−06 NA NA chr4 170,696,014 0.131 4.857E−06 ENSG00000056050 HPF1 chr6 7,441,817 0.061 7.444E−06 ENSG00000124784 RIOK1 chr17 16,540,145 −0.274 9.447E−06 ENSG00000197566 ZNF624 chr7 157,378,324 0.052 1.138E−05 ENSG00000155093 PTPRN2 chr4 170,696,062 0.128 1.172E−05 ENSG00000056050 HPF1 chr1 231,296,670 0.089 1.202E−05 ENSG00000119283 TRIM67 chr6 7,441,838 0.065 1.218E−05 ENSG00000124784 RIOK1 chr11 114,480,748 0.041 1.337E−05 ENSG00000137634 NXPE4 chr1 231,296,663 0.089 1.358E−05 ENSG00000167333 TRIM68 chr7 157,378,313 0.046 1.398E−05 ENSG00000155093 PTPRN2 chr1 231,296,660 0.089 1.555E−05 ENSG00000185880 TRIM69 chr11 114,480,768 0.044 1.590E−05 ENSG00000137634 NXPE4 chr11 130,343,943 0.336 1.617E−05 ENSG00000166106 ADAMTS15 chr12 2,903,455 −0.058 1.622E−05 ENSG00000256150 ITFG2-AS1 chr7 157,378,307 0.044 1.699E−05 ENSG00000155093 PTPRN2 chr5 140,045,268 0.026 1.765E−05 ENSG00000120314 WDR55 chr18 18,627,551 −0.040 1.910E−05 ENSG00000124784 ROCK1 chr11 130,343,954 0.333 2.152E-05 ENSG00000166106 ADAMTS15 chr19 999,961 −0.021 2.233E−05 ENSG00000116032 GRIN3B chr7 157,378,339 0.052 2.274E−05 ENSG00000155093 PTPRN2 chr17 16,540,132 −0.270 2.365E−05 ENSG00000197566 ZNF624 chr1 2,460,466 0.031 2.831E−05 ENSG00000197921 HES5 chr14 59,113,371 0.253 2.918E−05 ENSG00000165617 DACT1 chr20 62,194,220 0.206 3.388E−05 ENSG00000130589 HELZ2 chr6 7,441,846 0.068 3.433E−05 ENSG00000124784 RIOK1 chr1 231,296,648 0.085 3.830E−05 ENSG00000185880 TRIM69 chr1 2,475,177 0.062 4.081E−05 ENSG00000197921 HES5 chr4 170,696,067 0.127 4.084E−05 ENSG00000056050 HPF1 chr1 231,296,641 0.084 5.177E−05 ENSG00000185880 TRIM69 chr3 128,205,418 0.142 5.408E−05 ENSG00000179348 GATA2 chr19 999,937 −0.019 5.557E−05 ENSG00000116032 GRIN3B chr9^a 95,376,152 0.272 5.571E−05 ENSG00000188312 CENPP ENSG00000127080 IPPK chr19 40,366,323 0.163 5.666E−05 ENSG00000275395 FCGBP chr20 60,886,354 0.031 5.690E−05 ENSG00000130702 LAMA5 chr20 60,886,357 0.031 5.912E−05 ENSG00000130702 LAMA5 chr19 40,366,308 0.163 5.921E−05 ENSG00000275395 FCGBP chr20 62,194,213 0.203 6.059E−05 ENSG00000130589 HELZ2 chr12 2,903,464 −0.062 6.221E−05 ENSG00000256150 ITFG2-AS1 chr6 7,441,850 0.073 6.404E−05 ENSG00000124784 RIOK1 chr22 50,985,408 0.046 6.716E−05 ENSG00000130487 KLHDC7B chr4 24,423,094 0.040 6.749E−05 ENSG00000109819 PPARGC1A chr5 170,068,701 0.044 7.195E−05 ENSG00000182132 KCNIP1 chr16 69,969,329 0.249 7.352E−05 ENSG00000198373 WWP2 chr20 60,886,348 0.031 7.884E−05 ENSG00000130702 LAMA5 chr16 56,998,191 0.040 8.090E−05 ENSG00000087237 CETP chr9^a 95,376,137 0.283 8.368E−05 ENSG00000188312 CENPP ENSG00000127080 IPPK chr12 133,341,501 0.039 8.446E−05 ENSG00000176915 ANKLE2 chr16 56,998,186 0.038 8.749E−05 ENSG00000087238 CETP chr20 62,194,202 0.202 8.765E−05 ENSG00000130589 HELZ2 chr14 54,419,615 0.028 8.799E−05 ENSG00000125378 BMP4 chr19 3,823,080 0.041 9.100E−05 ENSG00000105278 ZFR2 chr1 2,460,472 0.028 9.116E−05 ENSG00000197921 HES5 chr19 40,366,341 0.162 9.234E−05 ENSG00000275395 FCGBP chr10 102,822,675 0.032 9.349E−05 ENSG00000107821 KAZALD1 chr17 40,825,688 0.030 9.473E−05 ENSG00000068137 PLEKHH3 chr11 2,037,042 0.055 9.810E−05 ENSG00000130600 H19 [78]Open in a new tab NA not available. ^aThe CpG sites annotated to two genes. Predicting functions of cis-regulatory regions and ontology enrichments analysis A total of 15,978 genomic cis-regulatory regions were identified to be associated with one or more genes. (Supplementary Fig. [79]1) Many important pathway terms probably related to depression were significantly enriched, such as Notch signaling pathway, nicotine pharmacodynamics pathway, Huntington disease, p53 pathway by glucose deprivation, Parkinson disease, and hedgehog signaling pathways. Moreover, the GO enriched terms mainly highlighted DNA binding and nucleic acid metabolic process (Table [80]3). Table 3. The top GREAT ontology enrichments for regions potentially related to depression by using binomial test. Ontology database Term name Binom FDR Q-value Binom region fold enrichment Pathways PANTHER Transcription regulation by bZIP transcription factor 1.94E−24 3.18 PANTHER Notch signaling pathway 2.38E−07 1.81 PANTHER Nicotine pharmacodynamics pathway 5.88E−06 1.85 PANTHER Huntington disease 5.95E−06 1.38 PANTHER p53 pathway by glucose deprivation 7.56E−05 2.03 PANTHER ATP synthesis 7.44E−05 10.97 PANTHER Parkinson disease 7.32E−05 1.45 PANTHER Hedgehog signaling pathway 1.72E−04 1.76 PANTHER Muscarinic acetylcholine receptor 1 and 3 signaling pathway 1.31E−03 1.37 PANTHER Adrenaline and noradrenaline biosynthesis 1.21E−02 1.59 BioCyc Histidine degradation III 8.26E−09 7.34 BioCyc Palmitate biosynthesis I (animals) 1.27E−08 3.92 BioCyc Catecholamine biosynthesis 1.09E−08 6.49 BioCyc Adenine and adenosine salvage I 2.15E−07 18.12 BioCyc Aspartate biosynthesis 2.86E−07 8.98 BioCyc Methylglyoxal degradation VI 4.68E−07 3.41 BioCyc Oxidized GTP and dGTP detoxification 6.76E−07 18.79 BioCyc Hypusine biosynthesis 8.58E−07 24.23 BioCyc Serine and glycine biosynthesis 2.47E−06 3.16 BioCyc Dolichyl-diphosphooligosaccharide biosynthesis 2.45E−06 3.50 MSigDB Elongation arrest and recovery 1.03E−43 6.52 MSigDB Formation of tubulin folding intermediates by CCT/TriC 6.80E−30 5.44 MSigDB Notch signaling pathway 5.63E−28 2.85 MSigDB Prefoldin mediated transfer of substrate to CCT/TriC 2.68E−26 4.43 MSigDB Thrombin signaling through proteinase activated receptors (PARs) 3.13E−25 3.45 MSigDB Formation of RNA Pol II elongation complex 3.55E−25 3.64 MSigDB Glucagon signaling in metabolic regulation 1.69E−24 2.84 MSigDB PKC-catalyzed phosphorylation of inhibitory phosphoprotein of myosin phosphatase 2.20E−23 2.67 MSigDB Prostacyclin signaling through prostacyclin receptor 4.70E−22 4.40 MSigDB G alpha (12/13) signaling events 3.12E−20 1.98 GO function GO-MF DNA binding 1.26E−114 1.36 GO-MF Nucleic acid binding 1.06E−97 1.27 GO-MF Sequence-specific DNA binding transcription factor activity 7.48E−98 1.46 GO-MF Nucleic acid binding transcription factor activity 9.13E−96 1.46 GO-MF Sequence-specific DNA binding 1.57E−90 1.54 GO-MF Organic cyclic compound binding 1.75E−87 1.18 GO-MF Heterocyclic compound binding 5.68E−87 1.18 GO-MF Transcription regulatory region DNA binding 4.84E−58 1.61 GO-MF Regulatory region DNA binding 1.72E−57 1.60 GO-MF Transcription regulatory region sequence-specific DNA binding 1.64E−53 1.85 GO-BP RNA metabolic process 1.71E−89 1.27 GO-BP Gene expression 8.91E−88 1.26 GO-BP RNA biosynthetic process 7.09E−87 1.30 GO-BP Transcription, DNA-dependent 7.90E−86 1.30 GO-BP Regulation of macromolecule biosynthetic process 2.51E−82 1.23 GO-BP Nucleobase-containing compound biosynthetic process 5.60E−82 1.28 GO-BP Regulation of RNA biosynthetic process 1.08E−81 1.24 GO-BP Nucleic acid metabolic process 1.56E−80 1.23 GO-BP Organic cyclic compound biosynthetic process 2.66E−79 1.26 GO-BP Regulation of RNA metabolic process 7.78E−79 1.23 [81]Open in a new tab MF molecular function, BP biological process. Differentially methylated region (DMR) analysis Among the 19 DMRs identified (Fig. [82]2 and Table [83]4), the methylation levels of 14 DMRs (1, 2, 3, 5–12, 15, 17, 19) were positively and two DMRs (13, 14) negatively correlated with depression score. But it was difficult to determine the trend of associations between three DMRs (4, 16, 18) and depression score. Fig. 2. Differential methylation patterns for the identified differentially methylated regions (DMRs). [84]Fig. 2 [85]Open in a new tab The horizontal axis shows the chromosome positions with the black point indicating each CpG, and the vertical axis shows the coefficient for the association of each CpG sites with depression score. The blue line indicates the methylation pattern for each DMR. BP, base pair; chr, chromosome. Table 4. The results of annotation to significant differentially methylated regions (DMRs) (slk corrected P-value < 0.05). DMR ID Chromosome Start End Length Stouffer-Liptak-Kechris (slk) corrected P-value Gene symbol Location 1 chr6 7,441,772 7,441,894 11 0.001 CAGE1 Near 2 chr21 10,989,962 10,991,112 59 0.002 TPTE At 3 chr7 157,378,078 157,378,360 13 0.002 PTPRN2 At 4 chr12 132,329,893 132,330,413 18 0.003 MMP17 At 5 chr5 140,045,103 140,045,416 13 0.004 WDR55 At 6 chr12 56,329,618 56,329,905 17 0.004 DGKA At 7 chr19 40,366,276 40,366,427 14 0.008 FCGBP At 8 chr22 50,985,408 50,985,540 15 0.009 KLHDC7B At 9 chr16 90,143,720 90,143,886 11 0.020 PRDM7 At 10 chr15 23,034,971 23,035,423 6 0.021 NIPA2 At 11 chr5 138,729,711 138,730,335 36 0.023 PROB1 At 12 chr1 2,475,075 2,475,337 14 0.024 TNFRSF14-AS1 At 13 chr2 95,539,959 95,540,223 9 0.032 TEKT4 At 14 chr9 37,002,619 37,002,829 24 0.036 PAX5 At 15 chr16 87,901,443 87,901,581 11 0.040 SLC7A5 At 16 chr6 101,850,301 101,850,869 22 0.043 GRIK2 At 17 chr14 54,419,545 54,419,845 16 0.044 BMP4 At 18 chr2 223,163,019 223,163,942 29 0.045 PAX3 At 19 chr1 871,143 871,655 23 0.048 SAMD11 At [86]Open in a new tab The DMRs were located at/near 19 genes, among which DGKA and NIPA2 might play an important roles in regulating depression. Interestingly, several DMRs covered the top CpG sites listed in Table [87]2. The DMR1 (located at CAGE1), DMR3 (PTPRN2), and DMR9 (PRDM7) covered 4 CpG sites, and the DMR7 (located at FCGBP) covered three CpG sites. Gene expression analysis There were 12 twin pairs (including seven male pairs) with a median age of 53 years (95% range: 43–65) and a median depression score of 7.5 (range: 1–27) included in the gene expression analysis. Weighted gene co-expression network analysis (WGCNA) As Fig. [88]3 illustrated, genes clustered in pink module (including 3629 genes) were positively correlated with both depression score (r = 0.62, P = 0.002) and disease status (r = 0.49, P = 0.02). For this module, neuroactive ligand–receptor interaction, nicotine addiction, calcium signaling pathway, glutam4atergic synapse, and nervous system development were significantly enriched. (Table [89]5) MM and depression score-based GS exhibited a very significant positive correlation in pink module (r = 0.67, P < 0.001) (Supplementary Fig. [90]2), and 27 hub genes were identified (Supplementary Table [91]1). Fig. 3. Relationships of consensus module eigengenes and external trait. [92]Fig. 3 [93]Open in a new tab Each row in the table corresponds to a consensus module, and each column to a trait. Numbers in the table report the correlations of the corresponding module eigengenes and trait with the P-values printed below the correlations in parentheses. The table is color coded by correlation according to the color legend. Table 5. The results of GO function and KEGG pathway enrichment analysis for genes clustered in pink module by DAVID tool. Category Term Count P-value Pathways KEGG Neuroactive ligand–receptor interaction 93 3.03E−14 KEGG Olfactory transduction 99 5.04E−07 KEGG Nicotine addiction 20 1.53E−06 KEGG Calcium signaling pathway 49 4.49E−05 KEGG Morphine addiction 30 5.49E−05 KEGG Glutamatergic synapse 35 5.99E−05 KEGG ECM-receptor interaction 28 1.62E−04 KEGG Serotonergic synapse 33 1.98E−04 KEGG Maturity onset diabetes of the young 12 8.83E−04 KEGG Retrograde endocannabinoid signaling 29 9.61E−04 BIOCARTA Intrinsic prothrombin activation pathway 10 1.61E−03 KEGG Bile secretion 21 2.78E−03 KEGG GABAergic synapse 23 7.88E−03 KEGG Linoleic acid metabolism 11 8.60E−03 BIOCARTA Platelet amyloid precursor protein pathway 7 8.88E−03 KEGG Steroid hormone biosynthesis 17 1.17E−02 KEGG Complement and coagulation cascades 19 1.42E−02 GO function GO-BP Chemical synaptic transmission 88 3.54E−13 GO-BP Potassium ion transmembrane transport 45 2.07E−07 GO-BP Regulation of ion transmembrane transport 41 9.55E−07 GO-MF G-protein coupled receptor activity 167 1.73E−06 GO-MF Calcium ion binding 165 7.84E−06 GO-MF Voltage-gated potassium channel activity 24 2.58E−05 GO-BP G-protein coupled receptor signaling pathway 199 3.06E−05 GO-BP Sodium ion import across plasma membrane 10 9.99E−05 GO-BP Nervous system development 74 1.54E−04 GO-BP G-protein coupled receptor signaling pathway 19 2.83E−04 GO-MF Glutamate receptor activity 8 6.30E−04 GO-MF GABA-A receptor activity 10 1.83E−03 GO-BP Central nervous system development 34 2.46E−03 [94]Open in a new tab MF molecular function, BP biological process. The common genes and enrichment terms between methylation analysis and WGCNA The CpG sites (P < 0.05) were annotated to 2808 genes, of which 404 genes were also clustered in the pink module in WGCNA. Among these common genes, DENND5B, KBTBD13, TENM3, and BMP2 were also identified as hub genes following our strict criteria. And genes including PRDM7, KCNIP1, PLEKHH3, GRIK2, PROB1, PAX3 were where the top CpG sites or DMRs were located. (Supplementary Table [95]2) Many common enrichment terms were also found, including extra cellular matrix (ECM)-receptor interaction pathway, maturity onset diabetes of the young pathway, complement and coagulation cascades pathway, DNA binding, neuron fate specification, glial cell differentiation, thyroid gland development, and cellular response to hormone stimulus. Differentially expressed genes related to depression Genes including HELZ2 (P = 0.013), PTPRN2 (P = 0.040), GATA2 (P = 0.013), ZNF624 (P = 0.019) were found differentially expressed between the two independent groups. Discussion In this study based on monozygotic twins, we detected some important epigenetic variants underlying depression by EWAS. A total of 66 interesting CpG sites (P < 1 × 10^−4) and 19 DMRs were identified. Moreover, many crucial GREAT ontology enrichments were identified. Genes clustered in the pink module were positively correlated with depression score in WGCNA, and many genes and enrichment terms were overlapped between methylation analysis and WGCNA. Finally, four genes were found to be differentially expressed in depression cases and health controls. In EWAS, some genes where the top CpG sites and DMRs were located (Tables [96]2 and [97]4) may play essential roles in regulating depression status: (1) PTPRN2: DNA methylation variation of PTPRN2 was found to be associated with mood state disturbances across [[98]33]; (2) HES5: HES5 could negatively regulate 5-HT1A receptor gene, which was related to MDD and suicide [[99]34]; (3) GATA2: It was reported that overexpression of human GATA2 interfered with spine formation and produced depressive behavior in rats [[100]35]; (4) DGKA: Blood transcript levels of DGKA differed significantly between participants with MDDs and nondepressed controls [[101]36]; (5) NIPA2: It was suggested that rare copy number variants (CNVs) in NIPA2 could increase the risk of MDD by disrupting regulatory regions [[102]37]; (6) PRDM7: The protein encoded by this gene was involved in lysine degradation pathway, and lysine methylation was a physiological post-translational modification of tau protein which played an important role in aging and Alzheimer’s disease [[103]38]; (7) KCNIP1: The protein encoded by this gene was a member of the family of cytosolic voltage-gated potassium (Kv) channel-interacting proteins (KCNIPs), and could regulate rapidly inactivating (A-type) currents and hence neuronal membrane excitability; (8) GRIK2: GRIK2, as one glutamate-related gene, might be related to risk for mood disorders [[104]39], and the gene polymorphism of GRIK2 was associated with depressive symptoms [[105]40]. The other genes have an unknown function in terms of depression now, whereas they may also be interesting potential candidates to be future researched and validated. As additional validation, we integrated the methylation data with gene expression data. Genes clustered in the pink module were positively correlated with depression score in WGCNA. And some genes were in common with EWAS findings, like BMP2, PRDM7, KCNIP1, and GRIK2. It was indicated that histone deacetylases could control neurogenesis in embryonic brain by inhibiting BMP2/4 signaling [[106]41]. The other three genes had been discussed above. Additionally, four genes HELZ2, PTPRN2, GATA2, and ZNF624 were differentially expressed between depression cases and health controls. PTPRN2 and GATA2 have been discussed above, whereas the biological of HELZ2 and ZNF624 involved in depression remained to be studied further. Two strengths can be noticed in our study. Since the case co-twin design using monozygotic twins discordant for a trait or disease was a powerful tool for EWAS [[107]6, [108]7], our results based on the twin data would be credible. Meanwhile, considering the various genetic constitutions, environmental exposures, and a multitude of life styles in different ethnic populations worldwide, our findings will specifically help elucidate the underlying pathogenesis of depression in the Chinese population. Nevertheless, some limitations of our study should also be considered. First, the sample size of the present study was relatively limited due to the challenges of recruiting and identifying qualified twin pairs. We’ll further validate the top CpG sites, essential genes, and biological pathways in a community population. And we’ll also evaluate if the physical distribution of top CpG sites at different chromosomes is over-represented in the regulatory domain of one specific biological pathway. Additionally, we’ll conduct a causal effect analysis based on one specific biological pathway by integrating data of genetic variation, epigenetic variation, and environmental factors. Second, the Townsend deprivation index (TDI) was indicated to be associated with depression [[109]42, [110]43]. However, we couldn’t add this factor as a covariate in the linear mixed effects model, because we didn’t investigate the corresponding information of TDI during the epidemiological survey. We’ll consider the TDI factor in the validation analysis in the future. In summary, we confirm that epigenetic factors are significant in explaining depression through twin-based analysis. We detected multiple CpG sites, genes, DMRs, and pathways that were potentially associated with depression. The findings provided an important clues to further elucidate the pathogenesis of depression and helped to identify new diagnostic biomarkers and therapeutic targets for this disease. Supplementary information [111]Supplementary legends^ (13.7KB, docx) [112]Supplementary table 1^ (18.1KB, docx) [113]Supplementary table 2^ (50.1KB, docx) [114]Supplementary figure 1^ (20.6KB, pdf) [115]Supplementary figure 2^ (204.3KB, pdf) Acknowledgements