Abstract Alzheimer’s disease (AD) is a progressive neurodegenerative disorder with no curative treatment available. Exploring the genetic and non-genetic contributors to AD pathogenesis is essential to better understand its underlying biological mechanisms, and to develop novel preventive and therapeutic strategies. We investigated potential genetically driven epigenetic heterogeneity of AD through summary data-based Mendelian randomization (SMR), which combined results from our previous genome-wide association analyses with those from two publicly available methylation quantitative trait loci studies of blood and brain tissue samples. We found that 152 probes corresponding to 113 genes were epigenetically associated with AD at a Bonferroni-adjusted significance level of 5.49E-07. Of these, 10 genes had significant probes in both brain-specific and blood-based analyses. Comparing males vs. females and hypertensive vs. non-hypertensive subjects, we found that 22 and 79 probes had group-specific associations with AD, respectively, suggesting a potential role for such epigenetic modifications in the heterogeneous nature of AD. Our analyses provided stronger evidence for possible roles of four genes (i.e., AIM2, C16orf80, DGUOK, and ST14) in AD pathogenesis as they were also transcriptionally associated with AD. The identified associations suggest a list of prioritized genes for follow-up functional studies and advance our understanding of AD pathogenesis. Keywords: neurodegenerative diseases, dementia, aging, GWAS, mQTLs, eQTLs, Alzheimer’s disease, Alzheimer’s disease pathogenesis, methylome-wide association analyses, summary data-based mendelian randomization 1. Introduction Alzheimer’s disease (AD) is the major cause of dementia and is projected to affect more than 13 million people in the United States by 2050, thus imposing huge health and economic burdens [[28]1,[29]2]. Late onset AD is believed to be a multifactorial disease caused by complex interactions between various genetic and non-genetic factors [[30]3]. Many genetic variants mapped to several chromosomal regions and genes have thus far been associated with AD by genome-wide association studies (GWAS) [[31]4,[32]5]; although, the vast majority of AD cases cannot be etiologically attributed to these variants [[33]2,[34]6]. Also, none of non-genetic AD-associated factors (e.g., age, cardiovascular risk factors, head trauma, depression, and educational attainment) has been proven to have a strong causal relationship with AD [[35]7,[36]8]. Epigenetic modifications of gene expression in interaction with non-genetic factors are hypothesized to contribute to AD development [[37]6,[38]9], particularly in light of the heterogeneous clinical manifestations of AD observed among patients with similar or identical genetic backgrounds [[39]10]. The potential role of epigenetic mechanisms in AD pathogenesis has been widely investigated in cell lines, mouse models, post-mortem brain tissue, and blood cells [[40]6,[41]10,[42]11,[43]12,[44]13]. Several studies have explored the global DNA methylation in AD cases compared with controls, although their findings have been inconclusive, with some reporting global hypomethylation in AD, some suggesting global hypermethylation in AD, and the others reporting no significant differences between cases and controls [[45]12]. Previous studies have also provided many lines of evidence of associations between AD and gene-specific epigenetic modifications. They mainly investigated the DNA methylation and histone modification differences between AD cases and unaffected controls using candidate gene or genome-wide analysis approaches (e.g., pyrosequencing and array hybridization) which revealed AD-associated epigenetic modifications in some well-known AD genes, such as amyloid-β precursor protein (APP), Microtubule Associated Protein Tau (MAPT) [[46]14], and Apolipoprotein E (APOE) [[47]15], as well as in other genes [[48]12]. For instance, Iwata et al. discovered CpG hypermethylation in APP and MAPT in post-mortem brain samples from AD patients, which were suggested to contribute to neural dysfunction and AD development [[49]14]. Foraker et al. found that AD patients had a lower mean methylation level in 76 CpG sites across APOE gene compared with age-matched controls when hippocampus and frontal lobe samples were analyzed. However, APOE methylation was not statistically different between cases and controls in samples obtained from their cerebellum [[50]15]. In most cases, epigenetically dysregulated genes were uniquely found in a single study [[51]6,[52]10,[53]12,[54]13], although AD-associated epigenetic modifications of some genes have been replicated in independent studies. For instance, several studies have reported CpG hypermethylation in the ANK1 gene in different brain regions, such as entorhinal and prefrontal cortices, superior temporal gyrus, and/or hippocampus in AD patients [[55]16,[56]17,[57]18]. Hypermethylated regions overlapping DUSP22 gene were previously detected in entorhinal and dorsolateral prefrontal cortices and/or hippocampus of AD affected individuals [[58]18,[59]19], and CpG hypermethylation of SORBS3 was detected in the cerebral cortex of AD patients and transgenic AD mouse models [[60]11,[61]20]. Moreover, differentially methylated regions overlapping CDH23, RHBDF2, and RPL13 genes were reported in previous studies [[62]16,[63]17,[64]21]. The mRNA expressions of these genes were also found to be altered in AD patients [[65]16]. In addition, several genes whose associations with AD were replicated by independent GWAS [[66]2], such as ABCA7, BIN1, CLU, HLA-DRB5, SLC24A4, and SORL1, are epigenetically implicated in AD as well [[67]16,[68]22,[69]23]. The case-control studies and cell/animal models may not, however, reflect genetic contributions to AD-associated epigenetic modifications as they are more likely to identify the environmentally induced epigenetic alterations [[70]6,[71]9]. In addition to the studies using individual-level data, several epigenetically AD-associated genes, such as BIN1, APOC1, HLA-DRB1, HLA-DRB5, and TOMM40, have been reported by summary data-based analyses [[72]24,[73]25] which reflect genetically driven (i.e., through cis acting variants) epigenetic alterations [[74]26]. In this study, we performed methylome-wide association (MWA) analyses of AD using the summary data-based Mendelian randomization (SMR) method [[75]26] to investigate genetically driven epigenetic contributors to AD pathogenesis. Instead of analyzing individual-level data, the SMR method integrates the summary results from previous GWAS [[76]27,[77]28] and methylation quantitative trait loci (mQTLs) studies using blood samples [[78]29] and brain tissue [[79]30] in order to identify associations between AD and methylation alterations that may mediate the genetic associations examined by GWAS. Central to our study was to investigate potential genetically driven epigenetic heterogeneity of AD. Therefore, summary results from our previous GWAS which aimed to analyze genetic heterogeneity of AD in contrasting groups of subjects stratified based on their sex and history of hypertension (HTN) were used for our MWA analyses. Sex has been identified as a risk factor for AD and there are many reports highlighting sex disparities in epidemiological and clinical features of AD [[80]31,[81]32,[82]33,[83]34,[84]35,[85]36,[86]37]. HTN is also a major cardiovascular risk factor for AD that may be involved in initiation and progression of the disease by causing structural and functional damages to cerebral microvasculature and promoting amyloid plaques formation [[87]8,[88]38,[89]39]. By detecting several group-specific AD-associated single-nucleotide polymorphisms (SNPs) at P < 5E-06, our GWAS suggested that differences in the genetic architecture of AD between these contrasting groups may differentially contribute to AD pathogenesis [[90]27,[91]28]. Thus, the current study using summary results from these two GWAS may provide novel insights into potential genetically driven epigenetic heterogeneity of AD. To further validate significant findings, we compared our MWA results with those from our previous transcriptome-wide association (TWA) analyses of AD [[92]27,[93]28] that implemented the SMR method using the same GWAS summary results along with data from blood-based [[94]40] and brain-specific [[95]30,[96]41] expression quantitative trait loci (eQTLs) studies. 2. Methods 2.1. GWAS Data This study makes use of the results of our previous genome-wide association meta-analyses [[97]27,[98]28]. Briefly, these meta-analyses were performed using genotype and phenotype data from four independent datasets: (1) Cardiovascular Health Study (CHS) [[99]42]; (2) Framingham Heart Study (FHS) [[100]43,[101]44]; (3) Late-Onset Alzheimer’s Disease Family Study (LOADFS) from the National Institute on Aging [[102]45], available to the research community through the dbGaP repository ([103]https://www.ncbi.nlm.nih.gov/gap); and (4) Health and Retirement Study (HRS) [[104]46], which can be accessed through dbGaP and the University of Michigan restricted access webpage ([105]http://hrsonline.isr.umich.edu/index.php?p=data). These meta-analyses were performed under five analysis plans in which the genetic basis of AD was investigated among: (1) all subjects in each dataset, (2) only males, (3) only females [[106]27], (4) only subjects with a history of HTN, or (5) only subjects with no history of HTN [[107]28]. AD patients were mainly diagnosed clinically based on neurologic findings (e.g., using National Institute of Neurological and Communicative Disorders and Stroke-Alzheimer’s Disease and Related Disorders Association (NINCDS-ADRDA) criteria [[108]47]) and were either identified directly (LOADFS and FHS datasets) or reported indirectly (CHS and HRS datasets) through the International Classification of Disease codes, Ninth revision (i.e., ICD-9:331.0 code). The numbers of AD cases were 2741, 952, 1789, 1262, and 796 under plans 1–5, respectively; and the numbers of unaffected controls were 14739, 6337, 8402, 9608, and 4010, respectively. The studied subjects were all of Caucasian ancestry to make samples more homogeneous. For each analysis plan, the additive genetic associations of ~2 million SNPs with AD were investigated by fitting logistic regression (CHS and HRS cohorts with population-based design) [[109]48] or generalized mixed logistic regression (LOADFS and FHS cohorts with family-based design) [[110]49] models. The top five principal components of genotype data, birth year, and sex (except plans 2 and 3) of subjects were considered as fixed-effects covariates. In the case of LOADFS and FHS cohorts, family identifier was also included as a random-effects covariate in the fitted models to adjust for potential confounding from family structure. Individual GWAS results from the four datasets were then combined by inverse-variance meta-analysis [[111]50]. Under plans 2–5 that aimed to investigate the genetic heterogeneity of AD through stratified analyses of datasets under consideration, group-specific SNPs effects were identified by a Wald chi-square test (df = 1) [[112]51] which was performed for any SNPs with significant association signals in only one of the contrasting groups in order to determine whether the SNPs odds ratios were significantly different between males and females (plans 2 and 3) [[113]27] and between hypertensive and non-hypertensive subjects (plans 4 and 5) [[114]28]. [MATH: χ2=(b1b2) 2se12< /mn>+se22 :MATH] (1) where b[1] (se[1]) and b[2] (se[2]) are the beta coefficients (and their standard errors) of a SNP in each of the two contrasting groups. 2.2. mQTLs Data The summary results from two previous mQTLs studies using blood samples (n = 1980) [[115]29] and human brain tissue (n = 1160 from a meta-analysis of three independent brain-specific mQTLs data of mostly dorsolateral prefrontal cortex and fetal brain samples) [[116]30] were also used for our analyses. The mQTLs studies provided genome-wide CpG methylation data using the Illumina Human Methylation 450 K array. The mQTLs data in the format compatible for MWA analyses can be downloaded at: [117]https://cnsgenomics.com/software/smr/#DataResource. The annotation of probes was in accordance with the Illumina support files for Human Methylation 450K array. Probes which were located in the inter-genic regions (IGRs) (i.e., not located within any gene or within 1.5 kb of the transcription start site of any gene [[118]52]) were annotated to their closest genes. 2.3. MWA Analysis Under each of the five analysis plans, two sets of MWA analyses (i.e., blood-based and brain-specific) were performed by combining the results from our GWAS with publicly available summary results from the two mQTLs studies. MWA analyses were performed by the SMR package (v 0.710) [[119]26] to identify SNPs that might be pleiotropically associated with AD and DNA methylation changes. The SMR package was run using default input arguments. Probes that had at least one significant mQTL (i.e., a SNP with P[mQTL] < 5E-08) that was also among the SNPs in our GWAS were included. This resulted in the inclusion of sets of up to 90,357 and 90,848 probes with significant cis-mQTLs from blood-based and brain-specific mQTLs studies under the five analysis plans. Associations of any probes with AD were first sought through a SMR test, and significant associations were determined at a Bonferroni-adjusted significance level of 5.49E-07 (i.e., 0.05/91000) to account for multiple comparisons. Probes with significant P[SMR] were then selected for heterogeneity in dependent instruments (HEIDI) testing to identify associations that were likely to arise from the pleiotropic effects of a single locus on both methylation changes and AD status (i.e., probes with P[HEIDI] ≥ 0.05) and not from the linkage between adjacent variants that affected AD susceptibility and methylation patterns separately (i.e., probes with P[HEIDI] < 0.05) [[120]26]. Here, HRS was used as the reference panel for estimating pair-wise linkage disequilibrium and SNP clumping. To examine the consistency of probe effects in blood-based and brain-specific analyses, the b[SMR] of any probes were compared between these analyses using the chi-square test mentioned above in the GWAS data section. In addition, probes that were detected in either males or females and in either hypertensive or non-hypertensive groups were subject to the chi-square test to find out whether their b[SMR] were significantly different between the two contrasting groups (i.e., they had group-specific effects). Finally, lists of AD-associated genes from MWA analyses were compared to those from our previous blood-based and brain-specific TWA analyses [[121]27,[122]28] to identify any overlaps between epigenetically and transcriptionally AD-associated genes. 2.4. Pathway Enrichment Analysis Pathway enrichment analysis was performed to correlate nominally AD-associated genes in our MWA results with biological processes that might contribute to AD pathogenesis. Pathway-based analyses were performed by the GSA-SNP2 (i.e., gene set analysis-single nucleotide polymorphism2) package [[123]53] using 1329 canonical pathways provided by the Broad Institute gene set enrichment analysis (GSEA) website [[124]54] based on information from several pathway databases such as Kyoto Encyclopedia of Genes and Genomes (KEGG) [[125]55], REACTOME pathway knowledgebase [[126]56], Pathway Interaction Database (PID) [[127]57], and Matrisome Project [[128]58]. Significant AD-associated pathways were determined using plan-specific false discovery rates (FDR) [[129]59] at which the numbers of possible false-positively detected pathways were smaller than 1 (i.e., FDR levels between 0.05 and 0.25). 2.5. Ethics Approval This study focuses on secondary analysis of data obtained from dbGaP and the University of Michigan [[130]42,[131]43,[132]44,[133]45,[134]46] (please see the [135]Supporting Acknowledgment in Additional File 1) and does not