Abstract

   Major depression (MD) is a leading cause of global disease burden, and
   both experimental and population-based studies suggest that differences
   in DNA methylation may be associated with the condition. However,
   previous DNA methylation studies have, so far, not been widely
   replicated, suggesting a need for larger meta-analysis studies. Here we
   conducted a meta-analysis of methylome-wide association analysis for
   lifetime MD across 18 studies of 24,754 European-ancestry participants
   (5,443 MD cases) and an East Asian sample (243 cases, 1,846 controls).
   We identified 15 CpG sites associated with lifetime MD with
   methylome-wide significance. The methylation score created using the
   methylome-wide association analysis summary statistics was
   significantly associated with MD status in an out-of-sample
   classification analysis (area under the curve 0.53). Methylation score
   was also associated with five inflammatory markers, with the strongest
   association found with tumor necrosis factor beta. Mendelian
   randomization analysis revealed 23 CpG sites potentially causally
   linked to MD, with 7 replicated in an independent dataset. Our study
   provides evidence that variations in DNA methylation are associated
   with MD, and further evidence supporting involvement of the immune
   system.

   Subject terms: Predictive markers, DNA methylation, Depression,
   Epigenomics
     __________________________________________________________________

   The authors report a meta-analysis of methylome-wide association
   studies, identifying 15 significant CpG sites linked to major
   depression, revealing associations with inflammatory markers and
   suggesting potential causal relationships through Mendelian
   randomization analysis.

Main

   Major depression (MD) is a common psychiatric disorder arising from a
   complex combination of genetic and environmental factors that include
   lifestyle factors such as physical activity, smoking, alcohol
   consumption and body mass index (BMI)^[180]1–[181]3. The heritability
   of MD estimated from twin studies is 37% (ref. ^[182]4), and polygenic
   risk scores trained on genome-wide association studies (GWASs)
   currently explain 1.5–3.2% of the variance in MD^[183]3.

   DNA methylation (DNAm) is one of the most studied epigenetic processes
   and is influenced by both genetic and environmental factors^[184]5.
   DNAm is dynamic and is associated with changes in environmental and
   lifestyle factors, including smoking^[185]6, alcohol^[186]7 and
   BMI^[187]8, all factors that are implicated in MD^[188]3. A recent
   study^[189]9 identified associations between methylation scores (MSs),
   calculated using methylome-wide association study (MWAS) summary
   statistics for several relevant lifestyle factors, and MD^[190]10.
   These environmental MS measures were able to capture additional
   variation associated with MD when added to direct lifestyle measures,
   and it is thought this may be due to their ability to act as an archive
   of environmental exposure.

   There is growing evidence from MWAS that DNAm measured in whole blood
   may be associated with MD. Jovanova et al.^[191]11 looked at 11 cohorts
   comprising 11,256 participants of European and African ancestry and
   identified 3 cytosine–phosphate–guanine (CpG) sites associated with
   depressive symptoms, which were annotated to genes implicated in axon
   guidance^[192]11. Starnawska et al.^[193]12 investigated depressive
   symptomatology in a sample of 724 monozygotic twins, with top findings
   annotated to genes previously implicated in depression^[194]12.
   Finally, Huls et al.^[195]13 identified DNAm associations with MD in
   dorsal lateral prefrontal cortex samples from N = 608 participants and
   uncovered CpGs annotated to several genes that are relevant to
   MD^[196]13. In addition, MS calculated using penalized regression
   models have previously been utilized to investigate MD^[197]14. We
   recently used lasso regression to calculate an MD–MS in N = 1,780
   participants (cases = 363; controls = 1,417) and found that the scores
   explain approximately 1.75% of the variance in prevalent MD, acting
   additively with polygenic risk score^[198]14. Findings from these
   studies have been somewhat inconsistent, and they have either included
   only European-ancestry samples^[199]15,[200]16 or used a combined
   multi-ancestry sample without considering trans-ancestry
   differences^[201]11. This has limited generalizability of previous
   findings. Larger MWAS studies may provide more reliable estimates of
   the differences in DNAm between MD cases and controls and, in doing so,
   bring insight into the molecular mechanisms associated with MD. We
   further hypothesized that DNAm associations with MDD would generalize
   across ancestries and sought to test this in a case–control sample of
   East Asian ancestry.

   Given previous evidence highlighting the role of DNAm in MD, we
   conducted an MD MWAS meta-analysis using whole-blood DNAm data from 18
   cohorts comprising 24,754 individuals (5,443 cases) of European
   ancestry. We sought to identify whether differences in DNAm were a
   potential cause or consequence of MD using a two-sample Mendelian
   randomization framework. Further, we trained a DNAm classifier of MD
   status from our summary statistics and assessed whether it could
   classify MD case–control status in an independent testing sample and
   its association with the abundance of inflammatory protein markers.
   Finally, we assessed whether significant effect sizes were positively
   associated with those in an independent East Asian sample.

Results

MD meta-MWAS

   A total of 15 CpG sites were significantly associated with MD after
   Bonferroni correction in the basic model (P < 6.55 × 10^−8,
   P[Bonferroni] < 0.05; see Table [202]1 for details of participating
   studies and Fig. [203]1 and Table [204]2 for significant findings).

Table 1.

   Information for cohorts that participated in the methylome-wide
   meta-analysis
   Cohort name PMID N cases Sample size MD prevalence (%) Case phenotype
   Mean age Gender (% male) Genetic ethnicity DNAm array Tissue type
   Avon Longitudinal Study of Parents and Children/Accessible Resource for
   Integrated Epigenomic Studies 25991711 204 657 31.1 Questionnaire
   29.4/45 0 European 450K Whole blood
   The biological classification of mental disorders/the Optimized
   Treatment Identification at the Max Planck Institute 32393358; 33054737
   222 305 72.8 Clinical interview 37.5 38.4 European EPIC Whole blood
   BiDirect 24924233 287 561 51.5 Questionnaire 51.6 49.7 European EPIC
   Whole blood
   Dublin Cohort 26419460 37 186 19.9 Questionnaire 38.0 52.7 European
   450K PBMCs
   Environmental Risk Longitudinal Twin Study 12236608; 12537874 322 1,625
   19.8 Clinical interview 18.5 51.0 Predominantly European 450K Whole
   blood
   EXTEND 29790996 190 1,181 16.1 Self-reported diagnosis/GP visit 56.3
   48.0 Predominantly European 450K Whole blood
   Finnish Twin Cohort (450K set) 31796134; 31640839 163 985 16.5 Clinical
   interview 23.8 45.9 Finnish, European 450K Whole blood
   Finnish Twin Cohort (EPIC set) 31796134; 31640839 93 366 25.4 Clinical
   interview 25.4 49.5 Finnish, European EPIC Whole blood
   FOR2107 30267149 340 667 51.0 Clinical interview 35.3 36.7 European
   EPIC Whole blood
   GS (sets 1 and 2) 30918249 1,632 9,502 17.2 Clinical interview 49.8
   41.0 European EPIC Whole blood
   Janssen 36319817 191 222 86.0 DSM-criteria questionnaire 42.8 19.7
   Predominantly European EPIC Whole blood
   Biopsychosocial factors of major depression in youth/University
   Hospital of Munich 30947532 381 633 60.2 Clinical interview 15.1 33.3
   European EPIC Whole blood
   Munich Antidepressant Response Signature/GlaxoSmithKline 18586274;
   16822851 311 497 62.6 DSM-criteria questionnaire 48.1 42.5 European
   450K Whole blood
   Netherlands Twin Register 31666148 436 2,701 16.1 Clinical interview
   36.5 33.0 European 450K Whole blood
   Study of Health in Pomerania-Trend 35348705 84 492 17.1 DSM-criteria
   questionnaire 51.1 46.3 European EPIC Whole blood
   TwinsUK 30760334 201 692 29.0 Clinical interview 59.0 0 European 450K
   Whole blood
   Understanding Society/UK Household Longitudinal Study Set 1 30401456 72
   1,121 6.4 Self-reported diagnosis/GP visit 58.4 41.9 European EPIC
   Whole blood
   Understanding Society/UK Household Longitudinal Study Set 2 30401456
   277 2,361 11.7 Self-reported diagnosis/GP visit 51.1 45.9 European EPIC
   Whole blood
   [205]Open in a new tab

   Mean ages are reported for two subsets of the Avon Longitudinal Study
   of Parents and Children cohort. DSM, Diagnostic and Statistical Manual
   of Mental Disorders^[206]54; GP, general practitioner; PMID, PubMed
   identifier.

Fig. 1. Meta-analysis of MWAS for MD.

   [207]Fig. 1
   [208]Open in a new tab

   a, Workflow of meta-analysis. b, Manhattan plot for the meta-analysis
   of MWAS for MD. Each dot represents a CpG probe. The x axis represents
   the relative positions of the probes in the genome. The y axis
   represents −log[10]-transformed, two-sided P values. The red and yellow
   dashed lines represent the significance threshold for Bonferroni and
   FDR correction, respectively.

Table 2.

   Significant CpG sites in the meta-analysis of MD MWAS
   CpG Chromosome Base pair location Beta s.e. P P-adj N UCSC gene
   Relation to Island Array EWAS catalog/EWAS atlas (PMID) GWAS catalog
   gene look-up (PMID)
   cg19432791 3 45894556 0.025 0.004 1.70 × 10^−9 0.001 16,545 LZTFL1
   OpenSea EPIC C-reactive protein (CRP)^a (38692281); glucose (33151971);
   schizophrenia (26619358) Multiple sclerosis (31604244); COVID-19
   (35910207); CD3 protein levels (32929287)
   cg11101030 13 93744566 −0.035 0.006 4.99 × 10^−9 0.004 15,901 – OpenSea
   EPIC – –
   cg10145533 8 20350779 0.019 0.003 5.34 × 10^−9 0.004 24,191 – OpenSea
   Both Crohn’s disease^a (27279921); preterm birth^a (28428831);
   autoimmune disease^a (31024609) –
   cg12187985 18 23654410 0.021 0.004 5.43 × 10^−9 0.004 16,850 SS18
   OpenSea EPIC CRP^a (38692281); preterm birth (28428831); rheumatoid
   arthritis (23334450); LRIG1 protein (35945220) Neuroticism (30595370);
   autoimmune thyroid disease (32581359)
   cg09726355 17 42544621 0.019 0.003 8.53 × 10^−9 0.007 16,628 GPATCH8
   OpenSea EPIC Rheumatoid arthritis (23334450); HIV infection (27105112);
   alcohol consumption (27843151) Cerebrospinal fluid progranulin levels
   (33087363); brain morphology (32665545); educational attainment
   (35361970)
   cg03671627 2 209185667 0.026 0.005 9.85 × 10^−9 0.008 16,850 PIKFYVE
   OpenSea EPIC CRP^a (38692281); kidney disease^a (33933144); alcohol
   consumption (31789449); childhood obesity (28614626) BMI (36581621);
   asthma (27611488)
   cg17494199 13 106834942 −0.011 0.002 1.22 × 10^−8 0.009 24,748 –
   OpenSea Both Very early preterm birth^a (30833390), birth weight^a
   (31015461), fetal brain development^a (25650246) –
   cg17002328 14 91751773 0.016 0.003 2.32 × 10^−8 0.018 24,750 CCDC88C
   N_Shelf Both CRP^a (38692281); multiple sclerosis^a (30479356); chronic
   fatigue syndrome (30036399); schizophrenia (33279932); autism
   (27404287); BMI (33517419); ADHD (28785368) White-matter hyperintensity
   volume (33293549); subcortical volume (32665545); Autism (35215271);
   Alzheimer’s disease (26830138)
   cg12805451 3 123672734 0.024 0.004 3.11 × 10^−8 0.024 16,850 CCDC14
   OpenSea EPIC – intelligence (29844566); general cognitive ability
   (29326435)
   cg20793665 2 232549224 0.019 0.003 3.79 × 10^−8 0.029 24,752 – S_Shelf
   Both CRP^a (38692281) –
   cg01150646 17 1377304 0.013 0.002 3.80 × 1^−8 0.029 16,850 MYO1C
   S_Shore EPIC Insulin resistance (30792424); BMI (29278407); Alzheimer’s
   disease (33257653); alcohol consumption (27843151); preterm birth
   (28428831) Alzheimer’s disease (35770850); educational attainment
   (30038396)
   cg02948555 15 42749848 0.036 0.007 4.72 × 10^−8 0.036 15,911 ZNF106
   OpenSea EPIC Downs syndrome (33547282); early onset intracranial
   atherosclerotic stenosis (31142690) CRP (31900758); cholesterol to
   total lipid ratio in low-density lipoprotein cholesterol (35213538)
   cg10604091 13 50650086 0.023 0.004 5.27 × 10^−8 0.040 16,850 DLEU2
   OpenSea EPIC BMI (29278407); inflammatory bowel disease (32281463);
   Alzheimer’s disease (25129075); birth weight (31015461); insufficient
   sleep (30718923) BMI (31669095); creatinine (35710981, 34594039);
   non-high-density lipoprotein cholesterol levels (34887591)
   cg04173586 19 2167496 0.025 0.005 6.30 × 10^−8 0.048 24,043 DOT1L
   S_Shore Both Schizophrenia^a (33646943); alcohol consumption^a
   (31789449) BMI-adjusted waist/hip ratio (30575882); gamma-linolenic
   acid (26584805)
   cg07063853 11 43333512 0.020 0.004 6.42 × 10^−8 0.049 23,395 API5
   Island Both HIV infection^a (27105112); obesity (29692867); rheumatoid
   arthritis (27585642); Alzheimer’s disease (33257653) –
   [209]Open in a new tab

   EWAS catalog ([210]http://www.ewascatalog.org/), EWAS atlas
   ([211]https://ngdc.cncb.ac.cn/ewas/atlas) and GWAS catalog
   ([212]https://www.ebi.ac.uk/gwas/) were used to find associated traits
   in previous studies (search was based on genes). Beta, standardized
   regression coefficient; P, double-sided P value; P-adj,
   Bonferroni-corrected P value.

   ^aTraits associated with CpG sites.

   Five of the significant CpG sites positionally mapped to genes
   associated with mental health, neurodegenerative and developmental
   disorders. The gene mapped from cg17002328 (CCDC88C) was associated
   with schizophrenia^[213]17 and attention-deficit/hyperactivity disorder
   (ADHD)^[214]16 in epigenetic studies and with brain structural measures
   in GWAS, such as cortical surface area^[215]18 and accumbens
   volume^[216]19. CpG site cg17494199 was associated with preterm
   birth^[217]20, birth weight^[218]21 and fetal brain development^[219]22
   in previous epigenetic studies. Genes mapped from cg01150646 (MYO1C),
   cg10604091 (DLEU2) and cg07063853 (API5) were associated with
   Alzheimer’s disease in brain-tissue methylation levels^[220]23–[221]25.

   A total of five significant CpG sites were associated with autoimmune
   diseases and biomarkers in previous studies, one of which was the most
   significantly CpG site (cg19432791 on chromosome 3), mapped to gene
   LZTFL1. From the catalog of epigenome-wide association studies (EWAS
   catalog), the LZTFL1 gene was associated with biomarkers related to
   pain, such as glucose level^[222]26, and autoimmune diseases or
   markers, such as rheumatoid arthritis^[223]27 and C9 protein
   levels^[224]28. Four of the significant CpG sites, cg12187985 (SS18),
   cg10145533, cg09726355 (GPATCH8) and cg07063853 (API5), map to genes
   previously found associated with autoimmune markers and diseases (for
   example, rheumatoid arthritis^[225]27) in MWAS studies.

   Other CpG sites, cg03671627 and cg02948555 (ZNF106), were associated
   with traits and markers relevant to obesity in both genetic and
   epigenetic studies. These markers include, for example, BMI^[226]29 and
   the ratio of cholesterol to total lipids^[227]30.

   A complete list of CpG sites and related genes and traits can be found
   in Table [228]2. Quantile–quantile plot of the meta-analysis and the
   inflation factors of the association statistics for each individual
   cohort can be found in Supplementary Figs. [229]3 and [230]4,
   respectively. See Supplementary Fig. [231]5 for effect sizes of each
   study for the significant CpG sites.

Identification of DMRs

   A total of 37 differentially methylated regions (DMRs) were identified
   to be associated with MD after Bonferroni correction. The largest
   associated region locates within the major histocompatibility complex
   region, discoidin domain receptor tyrosine kinase 1 (DDR1) gene (chr.
   6:30853258–30854233), previously implicated in major psychiatric
   disorders (for example, schizophrenia, MD and bipolar disorder).

   The full list of significant DMRs can be found in Supplementary Table
   [232]1.

Pathway enrichment analysis

   Following the meta-MWAS, pathway enrichment analysis was conducted to
   identify potential biological pathways that the significant CpG sites
   were implicated in. No gene ontology (GO) term or Kyoto Encyclopedia of
   Genes and Genomes (KEGG) pathway was significantly enriched after false
   discovery rate (FDR) correction (Supplementary Table [233]2). The top
   ten most significantly enriched GO terms included pathways relevant to
   protein and metabolic processes (for example, negative regulation of
   protein localization to ciliary membrane) (P ranged from 0.003 to
   8.14 × 10^−4). For KEGG, ‘transcriptional misregulation in cancer’ and
   ‘lysine degradation’ were the only pathways that reached nominal
   significance (P ≤ 0.007).

The basic model versus complex model

   We compared results for the basic model (18 cohorts, N[total] = 24,754)
   and complex model (15 cohorts, N[total] = 20,196; Supplementary Fig.
   [234]1) to evaluate potential confounding effects of BMI and alcohol
   consumption (Fig. [235]2). Effect sizes of the two models were highly
   correlated for the significant CpG sites (r = 0.988) and for CpG sites
   across the entire methylome (r = 0.920). All significant CpG sites in
   the basic model remained significant in the fully adjusted model after
   Bonferroni correction across the 15 significant CpGs found in the basic
   model, despite a significant reduction in sample size
   (P < 2.51 × 10^−4, P[Bonferroni] < 0.004 corrected across the
   significant CpG sites in discovery analysis).

Fig. 2. Comparison between basic and complex models.

   [236]Fig. 2
   [237]Open in a new tab

   a,b, In both comparison of effect sizes indicated by standardized
   coefficients of linear regression (a) and comparison of double-sided P
   values (b), each dot represents a CpG probe. In a, the x axis
   represents effect size, and in b, the x axis represents
   −log[10]-transformed, double-sided P value. Error bars represent ± one
   standard error of the mean. Differences in P values reflect both the
   model used and the sample sizes. The y axis represents individual CpG
   sites. The yellow dashed line in b represents the significance
   threshold for nominal significance (double-sided P < 0.05).

Out-of-sample classification of MD using methylation scores

   MD methylation scores (MD–MSs) were used to assess whether the present
   MD MWAS summary statistics can be used to classify MD case–control
   status in an independent sample. All five MD–MSs showed positive effect
   sizes (higher score associated with higher liability of MD). There was
   an increasing trend of effect sizes as the P-value threshold for MS
   calculation became increasingly stringent (Fig. [238]3). Of the five
   scores tested, only MS at P-value threshold of ≤5 × 10^−8 was found to
   be associated with MD diagnosis (β = 0.13, P = 0.003, area under the
   curve = 0.53; Fig. [239]3). P values for other MSs ranged from 0.069 to
   0.314.

Fig. 3. Out-of-sample classification of MD using MS.

   Fig. 3
   [240]Open in a new tab

   a, Box plot for case–control classification using MS created using
   P-value threshold at 5 × 10^−8. MS was standardized and residualized
   against age, sex and values for aryl hydrocarbon receptor repressor
   (AHRR) probes. Residuals of MSs were used for the plot. The x axis
   represents MS, and the y axis represents MD case and control groups.
   Box boundaries are determined by the top and bottom quartiles. Box
   whiskers are defined by 1.5 interquartile range. Any values outside of
   the box whiskers are presented as individual data points. Median values
   per group are indicated as vertical lines within the boxes. b,
   Statistics for MS association tests. Cohen’s d was calculated by
   comparing MD case–control differences in MS. Double-sided P values are
   reported. P[T], P threshold for creating the MS.

Association between MD–MS and inflammatory protein abundance

   Following out-of-sample classification using MD–MS, we sought to
   identify inflammatory proteins of which the abundance was associated
   with MD–MS. Five proteins were significantly (P < 3.5 × 10^−3)
   associated with the MD–MS created at the P threshold of ≤5 × 10^−8. The
   strongest association was found in tumor necrosis factor B (β = −0.15,
   P = 2.37 × 10^−5). Other proteins were interleukin 6 (β = 0.12,
   P = 5.18 × 10^−4), transforming growth factor alpha (β = 0.11,
   P = 2.8 × 10^−3), CD5 (β = −0.11, P = 2.1 × 10^−3) and EN-RAGE
   (β = 0.11, P = 3.5 × 10^−3; Supplementary Fig. [241]6).

Heterogeneity analysis

Leave-one-out analysis

   We systematically investigated the heterogeneity of effect sizes in
   each individual study by conducting leave-one-out (LOO) analysis. For
   every CpG site, all effect sizes for leave-one-out analyses remained in
   the same direction as the meta-analysis (Supplementary Fig. [242]7).
   All tested CpG sites remained significant after leaving individual
   studies out except when the largest study, Generation Scotland (GS),
   was left out (P < 1.03 × 10^−4). When GS was left out, 11 out of the 15
   tested CpG sites remained nominally significant (P < 8.91 × 10^−3). The
   four sites that became non-significant when GS was omitted were
   cg02948555 (β[meta-MWAS] = 0.036, β[LOO] = 0.008, P[LOO] = 0.511),
   cg11101030 (β[meta-MWAS] = −0.035, β[LOO] = −0.01, P[LOO] = 0.235),
   cg07063853 (β[meta-MWAS] = 0.02, β[LOO] = 0.01, P[LOO] = 0.232) and
   cg01150646 (β[meta-MWAS] = 0.013, β[LOO] = 0.007, P[LOO] = 0.054).

Effect of age difference between studies

   Due to systematic differences of age range across studies, we conducted
   meta-regression to investigate the impact of age difference on the
   findings of meta-analysis. Of the 15 CpG sites significant in the
   meta-MWAS, 14 sites did not show an effect of age difference across
   studies (absolute β ranged from 1.11 × 10^−6 to 6.09 × 10^−4,
   P > 0.234; Supplementary Fig. [243]8). One site, cg04173586, showed a
   significant effect of age (β = −9.25 × 10^−4, P = 0.006). However,
   leave-one-out analysis showed highly consistent findings for cg04173586
   across studies. There were 14 out of 17 participating cohorts that
   showed effect sizes consistent with the meta-MWAS (Supplementary Fig.
   [244]5) and leave-one-out analyses were significant for all iterations
   (P ranged from 1.03 × 10^−4 to 2.38 × 10^−8; Supplementary Fig. [245]5
   and Supplementary Table [246]3).

Correlation matrix for effect sizes

   Heterogeneity between studies was analyzed by looking at the
   between-study correlation of effect sizes estimated using the basic
   model (Supplementary Fig. [247]9). Correlations between the 18 studies
   participating in the MD meta-MWAS ranged from −0.19 to 0.31. The
   highest positive correlation was found between the Netherlands Twin
   Register and Janssen (r = 0.31). Of the 153 pairwise correlations, 96
   were positive (62.7%). Compared with the MD meta-MWAS, BMI MWAS (10
   studies) showed higher, positive effect size correlations between
   studies (r ranged from 0.305 to 0.864, 100% of the pairs were
   positively and significantly correlated).

Mendelian randomization

   Using cis mQTL data from GS and Wald ratio Mendelian randomization (MR)
   method, we found 23 significant and potentially causal effects of DNA
   methylation on MD (absolute β ranged from 0.06 to 0.93, P ranged from
   6.88 × 10^−3 to 4.58 × 10^−6; Fig. [248]4 and Supplementary Data
   [249]1). There were 17 CpG sites located in the major
   histocompatibility complex region (mapping to the DDR1R gene) on
   liability to MD (β ranged from 0.06 to 0.17, P ranged from 1.1 × 10^−3
   to 8.05 × 10^−5). See Supplementary Data [250]1 for the full list of
   significant causal effects found in the discovery analysis in GS.

Fig. 4. MR analysis of the causal effect of DNAm on MD.

   [251]Fig. 4
   [252]Open in a new tab

   a, P plot for discovery MR analysis in GS. Each dot represents a CpG
   site. The x axis represents chromosomes and base pair position. The y
   axis represents −log[10]-tranformed P value of Wald ratio MR analysis.
   The gray dashed line shows the FDR-corrected significance threshold.
   Discovery MR analysis was performed on CpG sites available in the EPIC
   array. b, Replication analysis in GoDMC for the significant CpGs found
   in the discovery analysis. Replication analysis was performed on CpG
   sites available on the 450K array. Of the 23 significant CpG sites in
   the discovery analysis, 19 were available on both EPIC and 450K arrays
   and therefore were included in the replication analysis. The x axes
   represent effect size and −log[10]-transformed P value on the left and
   right panels. The y axis represents individual CpG sites. The yellow
   dashed line in the right panel represents FDR-significance threshold.

   Of the 23 potentially causal effects found in the discovery analysis, 4
   CpG sites were specific to the EPIC array, while 19 were available on
   both array types and could, therefore, be included in the replication
   analysis. Seven CpG sites tested were replicated (P < 0.003; Fig.
   [253]4). Six of the replicated CpG sites were in the major
   histocompatibility complex region (β 0.040 to 0.073, P 1.22 × 10^−4 to
   1.19 × 10^−9). In addition, a potentially causal effect from cg26829071
   to MD (mapping to GPR133, β = −0.056, P = 0.004) was found on
   chromosome 12.

   No significant causal effects from MD to DNAm were found (absolute β
   1.62 × 10^−5 to 0.22, P[uncorrected] > 0.001, P[FDR] > 0.66).

MWAS in East Asian ancestry

   We sought to identify DNAm associations with MD in East Asian ancestry
   using data from Taiwan Biobank. There were no methylome-wide
   significant findings identified. Correlation between the methylome-wide
   significant CpGs effect sizes identified in the main results (N = 15)
   and the same CpGs in Taiwan Biobank was r = 0.48. The effect direction
   was consistent between the two cohorts for 11 of the 15 CpGs (73.3% in
   the same direction; Supplementary Tables [254]4–[255]6). Effect sizes
   were significantly and positively correlated when we extended the
   comparison in the top 100 and top 1,000 CpG sites found within the MWAS
   from European samples (top 100 CpG sites found in European samples:
   r = 0.26, P = 0.009; top 1,000 CpG sites found in European samples:
   r = 0.22, P = 4.07 × 10^−12).

Discussion

   Our meta-MWAS of 24,754 individuals found 15 CpG sites to be
   significantly associated with MD, an increase of 12 CpGs on previously
   reported associations^[256]11. Significant findings revealed CpGs
   mapped to genes associated with autoimmune markers and
   depression-related traits, such as BMI. Seven potentially causal
   effects from DNAm to MD were identified and replicated. Leave-one-out
   analysis showed that effects were highly consistent across studies for
   the significant sites. However, the correlation in CpG–MD association
   effect sizes was relatively low and showed considerable between-study
   heterogeneity for MD, in contrast to BMI, which showed uniformly
   positive and more homogeneous correlations across all available
   studies. A positive correlation was observed for top MD CpG sites
   between European and East Asian samples (r = 0.48 for significant CpG
   sites and r = 0.26 for top 100 CpG sites), and effects of 11 of the 15
   significant CpG sites found in the European sample remained in the same
   direction in the East Asian sample.

   Five CpG sites mapped to genes previously reported in association with
   autoimmune disorders and markers relevant to activity of the
   hypothalamic–pituitary–adrenal (HPA) axis. The top CpG, cg19432791,
   mapped to LZTFL1, a gene that was identified as an effector gene that
   contributes to severe autoimmune responses and inflammation, such as
   risk of respiratory failure caused by COVID-19^[257]31. In addition to
   its association with inflammation, LZTFL1 regulates ciliary
   localization of the Bardet–Biedl syndrome protein complex. The
   Bardet–Biedl syndrome complex is a well-replicated causative protein
   marker for obesity^[258]32 and highly relevant to HPA-axis activity by
   being involved in transducing leptin signals in hypothalamic
   neurons^[259]33,[260]34. Several top CpG sites and mapped genes in our
   MWAS were previously linked to other psychiatric disorders. For
   example, two CpG sites, cg04173586 and cg19432791, were associated with
   schizophrenia^[261]35,[262]36. Another CpG site, cg17002328, mapped to
   gene CCDC88C, was found to be related to multiple psychiatric
   disorders, including schizophrenia^[263]17,
   attention-deficit/hyperactivity disorder^[264]16 and autism^[265]37.
   Four of the 11 genes identified in our MWAS (LZFL1, SS18, CCDC88C and
   DLEU2) were found in significant DMRs associated with suicide attempts
   in adults with bipolar disorder^[266]38. Beyond psychiatric disorders,
   five CpG sites showed associations with CRP^[267]8,[268]39. CRP is one
   of the most well-studied predictors of vulnerability to
   depression^[269]40 and persistence of depressive symptoms^[270]41. As a
   pro-inflammatory cytokine, CRP has been associated with stress-related
   hyperactivity in the HPA axis^[271]42. The proteome-wide association
   analysis study (PWAS) findings provided additional evidence for the
   association between the top CpG sites and inflammation, for example, in
   tumor necrosis factor beta and interleukin 6 (IL6). Our findings
   provide further evidence supporting the association between MD and
   chronic inflammation, particularly involving the HPA axis. However, no
   association between these results and MD had been identified in
   previous MWAS, probably due to the limited sample sizes of MWAS
   conducted on mental health-related traits.

   Three CpG sites were associated with traits (for example, blood
   cholesterol levels, waist-to-hip ratio and pulse pressure) related to
   BMI. Obesity has been repeatedly associated with depression, and MR
   analyses from previous studies have indicated that this may be a causal
   association^[272]3,[273]43. Within the genes in the vicinity of the
   four CpG sites, ZNF106 is involved in the regulation of calcium
   homeostasis that is crucial for cell survival and death^[274]44. Other
   genes, including MYO1C and FNDC3B, are understood to be involved in
   energy metabolism and homeostasis and highly sensitive to stress and
   inflammation^[275]45,[276]46. Although the previous findings relevant
   to the BMI-related genes were based on GWAS studies, one of the genes,
   FNDC3B, was found in an earlier MD MWAS study based on brain-tissue
   samples^[277]13. Considering the dynamic nature of DNA methylation and
   its high sensitivity to environmental stressors, our findings suggest
   metabolic processes may play a potentially crucial role in depression,
   which may be exacerbated by adverse environmental factors and
   dysfunctional stress coping mechanisms.

   We found stronger evidence for DNAm causing MD than for the reverse
   direction. This aligns with the broader investigation by Min et al.,
   which showed that DNAm more frequently affects complex traits than vice
   versa^[278]15. However, previous research has shown MD can influence
   DNAm at CpG sites linked to MD genetic risk loci^[279]47. While our
   MWAS identified a dominant causal pathway from DNAm to MD in top CpG
   sites identified in our MWAS, reverse causation may occur in other
   genomic regions, particularly at MD genetic risk loci^[280]47.

   We observed varying results from different cohorts in our study, with
   little evidence of systematic differences in study age ranges
   contributing to the heterogeneity. The higher degree of effect size
   heterogeneity in the MD versus BMI MWAS meta-analyses suggests that
   phenotype may be a reason for differences between the studies and not
   other methodological factors, such as sample processing or covariate
   adjustment. In the MD meta-analysis, larger studies (N > 1,000) showed
   stronger correlation for the top associations, suggesting that
   sufficient statistical power may help overcome the issue of phenotyping
   inconsistency. This suggests that future individual MWAS studies of MD
   should be larger. The considerable costs of DNAm profiling, compared
   with genotyping, act as a barrier to achieving these aims.

   Although there was no significant finding in the MWAS conducted in
   non-European studies, it is notable that effects were consistent with
   the significant findings on European samples. Top pathways associated
   with MD also showed convergent results with the main analysis (for
   example pathways relevant to inflammation and immune processing).
   Correlations for effect sizes between European and East Asian samples
   were positive with the South Asian sample, consistent with previous
   evidence showing relatively high genetic consistency between European
   and East Asian groups^[281]48. Developing larger non-European DNA
   methylation samples will be crucial to provide a statistical balanced
   comparison between ancestry groups and to identify
   ancestry-group-specific DNA methylation sites for MD^[282]49.

   Our study combined many studies with widely varying sample sizes,
   compared results between two ancestry groups and replicated MR findings
   using two large mQTL datasets. We provided a comprehensive evaluation
   of sampling and analytic strategies to guide future large-scale
   meta-MWAS for mental health disorders. While the blood draw was not
   timed to coincide with the onset of a depressive episode, limiting
   causal inferences based on the temporal order of DNAm exposure and MD
   onset, the MR analyses helped to address this potential limitation.
   Further, we appreciate that factors, such as antidepressant use, could
   potentially influence the relationships observed^[283]50. Future
   research should consider environmental factors as potential moderating
   factors to better understand the epigenetic variation in individuals
   with MD. In addition, there is a lack of replication in other tissue
   types that are directly relevant to mood regulation, such as brain
   tissue. Studies have shown that the genetic drivers of DNAm have
   similar effects across multiple cell types^[284]51. Future clinical
   applications and larger sample sizes make whole-blood DNAm data more
   feasible than post-mortem tissue samples. However, to ensure the
   validity of the findings, future studies should broaden their scope by
   encompassing additional cell and tissue types.

Methods

Participants

   A total of 24,754 European-ancestry participants (5,443 MD cases) from
   18 studies were included in the meta-analysis. The mean age of
   participants in each study ranged from 15 to 59 years. Details for each
   individual study can be found in Table [285]1 and [286]Supplementary
   Information, ‘Cohort details’ section. Written consent was obtained
   from all participants. The study was approved by the NHS Tayside
   Research Ethics committee (05/s1401/89). Ethics declaration and
   participant compensation for each participating cohort can be found in
   the [287]Supplementary Information, ‘Cohort details’ section.

DNAm data preparation and quality check

   DNAm data were obtained from DNA extracted from whole blood. Eight
   studies used the Infinium Human Methylation 450 (450K) BeadChip array
   (Illumina Inc.; number of CpG sites ranged from 275,868 to 438,752
   after quality check), and the other ten studies used the Illumina
   Infinium Methylation EPIC array (Illumina Inc., number of CpG sites
   ranged from 673,085 to 809,447 after quality check). Quality checks and
   data normalization were conducted by each individual study team.
   Details are provided in the protocol papers for each individual study
   (Table [288]1 and [289]Supplementary Information, ‘Cohort details’
   section). In brief, the majority of studies used functional
   normalization for methylation data preprocessing, unless stated
   otherwise in the [290]Supplementary Information, ‘Cohort details’
   section^[291]52. Similar quality check procedures were used, including
   removing probes with outlying signal intensity, bead count and
   detection P values, removing participants with mismatched sex
   prediction from DNA methylation data, and removing cross-hybridizing
   probes that map to common genetic variants (at minor allele
   frequency > 0.05) or polymorphic probes^[292]52. M values were used for
   the association analysis^[293]53.

Diagnosis of MD

   Lifetime diagnosis of MD was derived on the basis of structured
   clinical interview or self-reported symptoms. Those studies that
   derived diagnoses of MD on the basis of structured clinical interviews
   used criteria from the DSM Fifth/Fourth Edition^[294]54. Self-declared
   MD was based on depressive symptoms lasting for more than 2 weeks and
   help-seeking due to depression. Studies that derived MD diagnosis on
   the basis of multiple time points defined cases as those who
   experienced any depressive episodes during their lifetime, and controls
   were those who did not declare MD throughout. A total of 9 studies
   defined MD cases using structural clinical interview (N cases = 3,790),
   3 studies used DSM-criteria questionnaires (N cases = 586), 3 studies
   used self-administered questionnaires for depressive symptoms (N
   cases = 528), and 3 studies defined MD cases on the basis of
   self-declared visits to a general practitioner (N cases = 539). Details
   for MD diagnosis for each cohort can be found in Table [295]1 and the
   [296]Supplementary Information, ‘Cohort details’.

   Additional exclusion criteria per study are stated in the
   [297]Supplementary Information, ‘Cohort details’ section.

Association analysis

   MWAS was conducted in each individual study before the meta-analysis.
   Linear regression was used to test the association between DNAm (M
   values, dependent variable) and MD diagnosis (independent variable)
   using a pipeline available at the URL
   [298]https://github.com/psychiatric-genomics-consortium/mdd-mwas. Those
   cohorts that used their own specific pipelines are specified in the
   [299]Supplementary Information, ‘Cohort details’ section. The pipeline
   uses the R package ‘limma’ (version 3.60.6) for linear regression on
   large omic data^[300]55. Three models representing increasingly
   rigorous correction for potential confounders were estimated.
   Covariates for the simplest model were age, sex, batch, the first 20
   methylation principal components (PCs) or surrogate variables^[301]56
   based on the study protocol for each individual cohort, and
   white-blood-cell proportions estimated from DNAm data of CD8^+ T cells,
   CD4^+ T cells, natural killer cells, B cells and granulocytes^[302]56.
   The AHRR-adjusted model had an additional covariate that adjusted for
   smoking status by including the M values for the AHRR probe
   (cg05575921), due to its known accuracy in predicting smoking^[303]57
   and its consistent availability in all studies. Finally, a third model
   with additional covariates (referred to as the ‘complex model’) was
   fitted that contained BMI and alcohol consumption in addition to all
   the other covariates included in the previous models.

   Results for the AHRR-adjusted model (referred to as the ‘main model’)
   are reported as the main findings. Standardized Cohen’s d between MD
   cases and controls were reported as effect sizes.

Meta-analysis

   Meta-analysis of cohort-level MWAS was conducted using METAL (version
   2011)^[304]58 in a two-stage process. First, meta-analysis was
   performed on studies that used 450K and EPIC arrays separately, due to
   the difference in CpG sites available for each array (Fig. [305]1).

   Those CpG sites that either were available for more than half of the
   studies using the given array or had a total sample size over 80% of
   the max sample size were kept for further analysis. CpG sites with
   excessive standard errors (s.e. > 0.5; Supplementary Fig. [306]1) were
   removed from analysis. Second, the summary statistics for 450K and EPIC
   array were meta-analyzed. A fixed-effect inverse-variance model was
   used without genomic control correction. P values were Bonferroni
   corrected (P-value threshold = 6.55 × 10^−8 to reach
   Bonferroni-corrected significance) for all 763,265 CpGs included in the
   analysis. CpGs were mapped to genes using an annotation object
   generated by the ‘IlluminaHumanMethylationEPICanno.ilm10b4.hg19’ R
   package (version 3.13)^[307]59. The annotation object was created on
   the basis of the product file provided by Illumina for the Infinium
   MethylationEPIC v1.0 Beadchip^[308]60, with the UCSC gene names
   provided as the target gene regions of the assay. We searched the EWAS
   Atlas ([309]https://ngdc.cncb.ac.cn/ewas/atlas) and EWAS Catalog
   ([310]http://www.ewascatalog.org/) for significantly associated CpGs
   and genes and the GWAS Catalog ([311]https://www.ebi.ac.uk/gwas/) for
   annotated genes.

Pathway enrichment analysis

   Following the meta-MWAS, we conducted pathway enrichment analysis to
   identify the biological pathways in which the significant CpG sites may
   be implicated. We used the ‘gometh’ function from the ‘missMethyl’ R
   package (version 1.38.0)^[312]61 for pathway analyses using the results
   of the AHRR-adjusted model. Significant CpGs found in the MWAS after
   Bonferroni correction were selected, and the rest of the CpGs profiled
   in the EPIC array were included as the background list. GO terms and
   KEGG pathways were analyzed separately. P values for both enrichment
   analyses were FDR corrected.

Identification of differentially methylated regions

   DMRs associated with MD were identified using the ‘dmrff’ R package
   (version 1.1.2)^[313]62. A DMR was identified if it contained at least
   two CpG sites within a maximum of 500-bp window, showed consistent
   direction of effect and both/all had meta-MWAS P < 0.05. Statistics
   were meta-analyzed within the identified region. P values were
   Bonferroni corrected for all regions (>2 CpG sites) and single CpG
   sites altogether. Significant DMRs were identified if the two-sided,
   Bonferroni-corrected P < 0.05.

Analysis of the confounding effect of BMI and alcohol consumption

   BMI and alcohol consumption are risk factors for MD and are known to
   have widespread associations with DNAm^[314]7,[315]29. To investigate
   whether the signals found in the main model were due to the effects of
   BMI and alcohol consumption, we conducted an additional meta-analysis
   of a complex model for the 14 cohorts that had BMI and alcohol
   consumption data available (N[total] = 20,196; see Supplementary Fig.
   [316]2 for sample sizes of individual studies). BMI and alcohol
   consumption were added as additional covariates for the complex model.
   We compared effect sizes and P values between the basic and complex
   models for significant associations found in the main model.

Out-of-sample classification of MD using MD–MS

Calculation of MD–MS and statistical model

   We created MD–MS in an independent testing sample by calculating the
   weighted sum of M values. Effect sizes from the MD meta-MWAS summary
   statistics were used as weights. Five P-value thresholds were used to
   select the CpG sites—P ≤ 1, P ≤ 0.01, P ≤ 0.001, P ≤1 × 10^−6 and
   P ≤ 5 × 10^−8—resulting in five MD–MSs.

   MD diagnosis was set as the independent variable and MS as the
   dependent variable in the logistic regression model using the ‘glm’
   function in R. Covariates were age, sex and M values of AHRR probe.
   DNAm-estimated cell proportions were not associated with any of the
   MD–MSs (P > 0.5) and were therefore not included as covariates. Batch
   and genomic relationship matrix were pre-corrected by residualizing M
   values against these covariates.

Testing sample

   GS DNAm set 3 data were used for out-of-sample classification (GS DNAm
   sets 1 and 2 were included in the meta-MWAS). GS DNAm set 3 data used
   the Illumina Infinium Methylation EPIC array, had no overlap with the
   GS DNAm data used in the meta-MWAS, and relatedness within set 3 and
   with the rest of the GS sample was removed by regressing the M values
   against the genomic relationship matrix. Quality check and
   preprocessing were kept consistent with the GS sample used in the
   meta-MWAS (sets 1 and 2).

   MD diagnosis was derived using electronic health records (EHRs) from GP
   diagnosis^[317]63. Details for the EHRs were explained in the protocol
   paper by Kerr et al.^[318]63. In brief, a subset of participants
   (N = 20,032) of GS with genotyping data gave consent to link their data
   to EHRs. All Read codes from the PH1021^[319]64–[320]66 and
   PH149^[321]67,[322]68 inventories of the Health Data Research UK
   Phenotype Library for primary care data of depression were used to
   identify cases of lifetime MD. Participants with ≥1 entry of diagnosis
   of depression were classified as cases, and those with no entry of any
   diagnosis for depression or no data to indicate depressive status were
   controls. The final testing sample contained 504 cases and 8,372
   controls.

Association between MD–MS and inflammatory protein markers

   A previous study demonstrated widespread association between protein
   abundance and DNAm^[323]28. We conducted a PWAS for MD–MS seeking to
   reveal the potentially downstream proteomic consequences of the
   measured DNAm differences.

   Lothian Birth Cohort 1936 (LBC1936), a cohort independent of the MD
   MWAS, was used for the PWAS analysis. LBC1936 is a community-based
   birth cohort of participants born in 1936, recruited in Scotland. The
   sample used in the PWAS analysis had 875 people with both DNAm and
   proteome data collected at mean age 69.8 ± 0.8 years. DNAm in LBC1936
   was profiled in whole-blood samples using the HumanMethylation450
   BeadChip Kit (Ilumina). Sample preparation and quality check were kept
   consistent with previously published work^[324]69. Proteomic data were
   profiled using lithium heparin collected plasma samples analyzed using
   a 92-plex proximity extension assay (inflammation panel; Olink
   Bioscience). The Olink team performed preprocessing using NPX Manager
   software. For 22 proteins, over 40% of samples fell below the lowest
   limit of detection, leaving 70 post-quality-check proteins.

   General linear models were used to test the association between
   relative abundance across all 70 proteins and MD–MS created at P
   threshold ≤ 5 × 10^−8. Protein abundance levels were rank-based-inverse
   normalized and residualized against age, sex, first four genetic PCs
   and array for proteomic data before entering association analysis.
   Residual scores of protein abundance were set as the dependent
   variable. Array for DNAm data and the AHRR probe were included as
   covariates in the GLM. Two-sided P values were corrected using FDR
   correction.

Heterogeneity analysis

Leave-one-out analysis

   To investigate whether a particular study had disproportionate
   influence on any meta-analytic association, we conducted leave-one-out
   meta-analyses. For each significant CpG, 18 iterations of meta-analysis
   were conducted leaving each individual study out, regardless of
   profiling arrays.

Meta-regression for age

   For the CpGs that were significant in the MWAS meta-analysis, we used
   meta-regression to analyze the potential effect of age across studies.
   A mixed-effect model with the ‘metareg’ R package^[325]70 was used for
   the meta-regression analysis. Mean age of each individual cohort was
   set as a random effect. Standardized regression coefficients from
   summary statistics of each individual study were set as estimates of
   treatment effects (the ‘TE’ option). Between-study variance was
   estimated using the restricted maximum likelihood method. The
   standardized regression coefficient of the random effect of age was
   reported as effect sizes for the meta-regression analysis.

Comparison between MD and BMI associations

   To further evaluate the heterogeneity of MD–DNAm associations between
   studies, we looked at the correlation of effect sizes for the summary
   statistics of the MD MWAS for individual studies. GS that was included
   in the MWAS, being the largest study sample in the meta-analysis
   (N = 9,502), was used to select a list of CpG sites of interest. The
   1,000 most significantly MD-associated CpG sites in GS were selected.
   Effect sizes for these CpG sites were extracted for all other studies.
   Correlation analysis was conducted on the effect sizes. We also
   performed an MWAS of BMI and conducted a similar analysis, for
   comparison with MD, to assess whether BMI–DNAm associations were
   similarly heterogeneous.

MR

   To identify potentially causal CpG sites to MD, we conducted MR
   analyses using Genetics of DNA Methylation Consortium (GoDMC; discovery
   analysis) and GS data (replication analysis).

MD GWAS summary statistics

   GWAS summary statistics were obtained from the Howard et al.
   meta-analysis for MD GWAS from the PGC, 23andMe and UK Biobank^[326]3.
   A total of 807,553 individuals (246,363 cases and 561,190 controls) of
   European ancestry were included in the MD meta-GWAS.

GoDMC and GS methylation quantitative trait loci

   Quantitative trait loci associated with DNAm (mQTL) summary statistics
   were obtained from GS and GoDMC. For GS, full mQTL summary statistics
   (N = 17,620) were obtained without any P-value thresholding.
   OmicS-data-based complex trait analysis was used for mQTL
   estimation^[327]47,[328]71. Covariates were kept consistent with the
   main model for the MD MWAS, except for using self-reported smoking
   behavior (current smoker, past smoker or non-smoker) and pack years
   (quantity of smoking) to control for smoking and adding ten genetic PCs
   as covariates. GoDMC mQTL data were obtained through the consortium
   website ([329]http://www.godmc.org.uk/)^[330]15,[331]47 using the same
   pipeline described in the GoDMC protocol paper by Min et al.^[332]15.
   The mQTL data contains 32 cohorts with 25,561 participants of European
   ancestry. The mQTL meta-analysis from GoDMC adopted a two-stage
   approach. First, a truncated set of mQTL data that reached the
   threshold of P value < 1 × 10^−5 were obtained from participating
   cohorts. This initial stage created a candidate list of mQTL
   associations (n = 120,212,413). Meta-analyses for mQTL were then
   conducted on these candidate associations.

   Samples for GS mQTL analysis, GoDMC mQTL analysis and MD GWAS were
   mutually exclusive.

Selection of CpG list

   A list of CpG sites that were either (1) significant in the MD MWAS or
   (2) within the identified DMRs were selected as CpG sites of interest.
   We extracted cis mQTL for further MR analysis^[333]15.

   In the GoDMC dataset, a total of 156 CpG sites that met the preceding
   criteria had at least one cis mQTL. The cis mQTL summary statistics for
   these CpG sites were extracted from the GoDMC dataset. For those CpG
   sites that had more than one cis mQTL after clumping (1 Mb window,
   P < 5 × 10^−8), the most significant mQTL with the lowest P value was
   selected for analysis. Those CpG sites that showed significant causal
   association with MD were selected for replication analysis using the GS
   mQTL summary statistics.

MR methods

   MR analysis was conducted using the ‘TwoSampleMR’ R package (version
   0.5.6)^[334]72. To identify causal effects of DNAm on MD, we used the
   Wald ratio MR method^[335]73 to analyze causal effects on MD using cis
   mQTLs (within a 1 Mb window in vicinity of the chosen CpG site). The
   most significant mQTL for each CpG site that reached the threshold of
   P < 5 × 10^−8 was selected. Causal effects from DNAm to MD were tested
   using both GoDMC and GS mQTL data. For the causal effect in the reverse
   direction (from MD to DNAm), MD GWAS summary statistics were clumped at
   P ≤ 5 × 10^−8, with a 1 Mb window and r^2 = 0.001. Causal effects from
   MD to DNAm were tested using mQTL data from the entire GS sample (sets
   1, 2 and 3).

MWAS in East Asian ancestry

   We sought to investigate MD associations with DNAm in participants of
   East Asian (Taiwan Biobank) ancestry. Demographic and descriptive
   statistics are included in the [336]Supplementary Information, ‘Cohort
   details’ section. As in the main meta-analysis, biological and
   technical covariates, as well as age, sex and smoking (indexed by AHRR
   probe cg05575921), were included as covariates. Evidence of
   trans-ancestry transferability of MD CpG effects was investigated by
   testing for the correlation of effect sizes across both ancestries. We
   then used the function ‘gometh’ in package ‘missMethyl’ to assess
   ontology and pathway enrichment (GO and KEGG) for differentially
   methylated CpG sites at a threshold of P < 1 × 10^−5 (N[CpG] = 24), as
   used in previous studies^[337]74.

Reporting summary

   Further information on research design is available in the [338]Nature
   Portfolio Reporting Summary linked to this article.

Supplementary information

   [339]Supplementary Information^ (2.2MB, pdf)

   Supplementary Figs. 1–9, Tables 1–6, Methods, Results and legend for
   Supplementary Data 1.
   [340]Reporting Summary^ (2.6MB, pdf)
   [341]Supplementary Data 1^ (52.2KB, xlsx)

   Results for Mendelian Randomization (MR) analyses on causal effect of
   DNAm to MD. The table consists of results for discovery MR in
   Generation Scotland and replication MR using mQTL data from GoDMC.
   Results for Wald’s ratio tests are shown: beta, Wald’s ratio effect
   size; SE, standard error; pval, double-side P value for Wald’s ratio
   tests; pFDR, FDR-corrected P value. Following the Wald’s ratio results
   is information for the genetic instruments used for Wald’s ratio
   analysis (SNP info),: SNP, rsID of instrument; CHR, chromosome; BP,
   base pair position; A1, reference allele; A2, alternative allele; Freq,
   allele frequency estimated from the MD GWAS. In addition, effect sizes
   (beta), standard errors (SE) and P values (pval) for mQTL and MD GWAS
   shown for each test.

Acknowledgements