Abstract Environmental factors including viruses, diet, and the metabolome have been linked with the appearance of islet autoimmunity (IA) that precedes development of type 1 diabetes (T1D). We measured global DNA methylation (DNAm) and untargeted metabolomics prior to IA and at the time of seroconversion to IA in 92 IA cases and 91 controls from the Diabetes Autoimmunity Study in the Young (DAISY). Causal mediation models were used to identify seven DNAm probe-metabolite pairs in which the metabolite measured at IA mediated the protective effect of the DNAm probe measured prior to IA against IA risk. These pairs included five DNAm probes mediated by histidine (a metabolite known to affect T1D risk), one probe (cg01604946) mediated by phostidyl choline p-32:0 or o-32:1, and one probe (cg00390143) mediated by sphingomyelin d34:2. The top 100 DNAm probes were over-represented in six reactome pathways at the FDR <0.1 level (q = 0.071), including transport of small molecules and inositol phosphate metabolism. While the causal pathways in our mediation models require further investigation to better understand the biological mechanisms, we identified seven methylation sites that may improve our understanding of epigenetic protection against T1D as mediated by the metabolome. Keywords: DNA methylation, metabolomics, type 1 diabetes 1. Introduction Type 1 diabetes (T1D) is an autoimmune disease characterized by the production of antibodies which target pancreatic [MATH: β :MATH] -cells. The disease currently affects over 30 million people worldwide [[46]1] and is increasing by 3–4% per year on average [[47]2]. Genetic predisposition accounts for some of the etiology of T1D (siblings of an individual with T1D have a relative risk 15 times higher than those without a sibling with T1D) [[48]3] and explains some geographic variation in incidence [[49]2]. Human leukocyte antigen (HLA) genes were the first to be linked to T1D and account for much of the genetic predisposition to the disease, but genome-wide association studies (GWAS) have also identified more than 40 other T1D-associated loci [[50]4]. However, it is still unclear how these multiple loci [[51]5] interact with one another and the environment to produce a T1D diagnosis. In addition to the complex genetics of T1D, low monozygotic (MZ) twin concordance (approximately 50%) and the increasing incidence over time [[52]2] support the theory that non-genetic factors play an important role in T1D development [[53]1]. Epigenetic differences may be important contributors to T1D etiology. Changes in methylation have been associated with other autoimmune conditions, and monozygotic twins can be epigenetically heterogeneous despite sharing an identical genetic code [[54]1]. Rakyan et al. [[55]1] and Stefan et al. [[56]6] performed epigenome-wide association studies (EWAS) in discordant and concordant twin pairs and found that methylation profiles were more similar among participants with T1D than in unaffected twins. Epigenetics profiles were also combined with GWAS data and differentially methylated sites were mapped to six well-known T1D susceptibility genes, including two major histocompatibility complex (MHC) genes and several HLA loci [[57]1,[58]6]. Finally, Johnson et al. found that T1D cases had different longitudinal methylation patterns compared to controls prior to diagnosis [[59]7]. Environmental factors including viruses, diet, and the metabolome have also been linked with islet autoimmunity (IA) [[60]8] and T1D etiology [[61]4]. Previous studies have found associations between T1D and differentially expressed phospholipids and sphingolipids, excretion of modified amino acids, and vitamin D (and related compounds on its metabolic pathway) [[62]4,[63]9]. To date, the effects of the metabolome and DNA methylation have been studied separately in T1D; thus, the combined nutrigenomics of T1D remain unclear. Early in vitro studies confirmed metabolome-dependent alterations in DNA methylation associated with various cancers [[64]10,[65]11], and even identified oncometabolites associated with the development of glioma [[66]12]. More recent human studies also support the connection between the metabolome and methylation in breast cancer [[67]13], colorectal cancer [[68]14], and smoking-related diseases [[69]15]. A 2018 study by Zaghlool et al. used linear models to link metabolomics and DNA methylation with type 2 diabetes (T2D) and obesity and used Mendelian randomization (MR) to provide evidence for “a causal effect of metabolite levels on methylation of obesity-associated CpG sites” [[70]16]. Causal interpretations are of particular interest when integrating epigenetic and metabolomics data. The counterfactual framework for mediation analyses is widely used in statistics and epidemiology because it can allow for causal interpretations of observational data. Additionally, it provides a relatively intuitive interpretation of mediation effects by quantifying how much an outcome of interest would likely differ given a change in either the exposure or mediator [[71]17]. Because mediation is defined based on an assumed causal model, it is important that causality assumptions are reasonable [[72]18]. It is difficult to evaluate causal assumptions across the epigenome, but longitudinal study designs allow for mediation models in which the exposure variable is measured prior to the mediator. To our knowledge, this is the first study to examine causal links between DNAm, the metabolome, and T1D. The Diabetes Autoimmunity Study in the Young (DAISY) follows 2547 high-risk children in Colorado for the development of IA and T1D (ClinicalTrials.gov identifier: [73]NCT03205865). Follow-up includes blood sample collection at 9, 15, and 24 months, then annual collection until one or more autoantibodies are detected. Participants who test positive for one or more autoantibodies are asked to follow an accelerated protocol with visits and blood sample collection every 3–6 months. IA is defined as two consecutive visits at which a confirmed autoantibody to insulin, GAD65, IA-2, or ZnT8, was detected ([74]Figure 1) [[75]19]. The children are followed over time, until they are diagnosed with T1D by a physician using American Diabetes Association criteria. Thus, DAISY’s study design allows us to include exposure and mediator variables that are temporally separated in our mediation models because methylation was measured prior to seroconversion (PSV), defined as the autoantibody-negative visit preceding the visit at which the child first tested autoantibody positive (SV), and metabolite was measured at SV and vice versa. This aids in interpretation of the results and in theory does not violate the assumptions of a mediation analysis. Figure 1. [76]Figure 1 [77]Open in a new tab (a) A flowchart depicting progression from birth to islet autoimmunity (IA) and the Diabetes Autoimmunity Study in the Young (DAISY) visits. To reduce computation time, we performed simple linear regression without covariate adjustment to identify pairs of DNA methylation (DNAm) probes and metabolites for mediation analysis. Candidate pairs were significantly associated at the p < 0.01 level, and both DNAm probe and metabolite were associated with development of IA at the p < 0.01 level. (b) We examined pairs with DNAm measured pre-seroconversion (PSV) and metabolite measured at seroconversion (SV). (c) We also examined pairs with metabolite measured at PSV and DNAm measured at SV in separate analyses. 2. Results A total of 183 participants had both metabolite and methylation data at their PSV and SV study visits. [78]Figure 1 outlines DAISY study visits. Participant characteristics are shown in [79]Table 1, and there were no statistically significant differences between IA cases and controls with respect to age, sex, race/ethnicity, DR3/4 status, and first-degree relative (FDR) status. Table 1. Participant characteristics at first visit. Case (n = 92) Control (n = 91) Total (n = 183) p Value Age 0.387 ^1 Mean (SD) 6.2 (4.3) 6.8 (4.2) 6.5 (4.3) Range 0.7–18.3 0.7–20.3 0.7–20.3 Non-Hispanic White 0.756 ^2 No 22 (23.9%) 20 (22.0%) 42 (23.0%) Yes 70 (76.1%) 71 (78.0%) 141 (77.0%) DR3/4 0.172 ^2 No 67 (72.8%) 74 (81.3%) 141 (77.0%) Yes 25 (27.2%) 17 (18.7%) 42 (23.0%) Sex 0.599 ^2 Female 44 (47.8%) 40 (44.0%) 84 (45.9%) Male 48 (52.2%) 51 (56.0%) 99 (54.1%) FDR Status 0.819 ^2 First-degree relative with T1D 49 (53.3%) 50 (54.9%) 99 (54.1%) General population (no first degree relative with T1D) 43 (46.7%) 41 (45.1%) 84 (45.9%) [80]Open in a new tab ^1 Linear model ANOVA; ^2 Pearson’s Chi-squared test. Out of a total 379,557,915 possible pairs, our linear regression filtering step identified 1490 pairs with methylation measured at PSV and metabolite measured at SV where metabolite and methylation probe were significantly associated with one another, and both were associated with IA at the p < 0.01 level ([81]Figure 1b). We also identified 600 pairs with metabolite measured at PSV and methylation measured at SV for mediation analysis ([82]Figure 1c). Correlations between DNAm and metabolite for all candidates are shown in [83]Figure 2. Figure 2. [84]Figure 2 [85]Open in a new tab Heat maps depicting the correlation coefficient between DNA methylation (DNAm) probes in rows and metabolites in columns. Columns and rows were clustered using the complete linkage method. (a) Correlation between DNAm at pre-seroconversion (PSV) and metabolite measured at seroconversion (SV). (b) Correlation between DNAm at seroconversion (SV) and metabolite measured pre-seroconversion (PSV). After FCR adjustment, seven pairs had a confidence interval for the estimate of NIE that did not contain 0 on the log odds scale ([86]Table 2). Each pair contained a unique probe, but the metabolite histidine was the mediator in five of the pairs. No pairs in which metabolite was measured at PSV and methylation was measured at SV were significant after FCR adjustment. All confidence intervals are presented after FCR adjustment. For all seven pairs, the metabolite was positively associated with IA, while methylation was negatively associated with both the outcome and respective metabolite. Thus, an indirect effect less than one suggests that some of the protective effect of methylation is via reduction in the metabolite levels. Table 2. Methylation–metabolite pairs with significant natural indirect effects. Metabolite Probe Position Relation to Island Gene Type Description NIE NIE CI Low NIE CI High Histidine cg15688253 chr1:1096717 N_Shore 0.766 0.418 0.993 Histidine cg15052330 chr2:72360243 OpenSea CYP26B1 0.714 0.339 0.950 PC (p-32:0) or PC (o-32:1) cg01604946 chr5:148398804 OpenSea SH3TC2 Protein Coding SH3 Domain; Tetratricopeptide Repeats 2; SH3TC2 Divergent Transcript 0.663 0.342 0.895 Histidine cg19939773 chr6:32729876 Island HLA-DQB2 0.785 0.439 0.994 SM (d34:2) [M+HAc-H]- & [M+Cl]- _YLWSJLLZUHSIEA-CKSUKHGVSA-N cg00390143 chr12:132842539 Island GALNT9 Protein Coding Polypeptide N-Acetylgalactosaminyltransferase 9 0.777 0.494 0.999 Histidine cg01172082 chr14:104645732 Island KIF26A Protein Coding Kinesin Family Member 26A 0.759 0.390 0.989 Histidine cg07964219 chr21:46847898 S_Shore COL18A1 Protein Coding Collagen Type XVIII Alpha 1 Chain; COL18A1 Antisense RNA 1; COL18A1 Antisense RNA 2 0.795 0.447 0.986 [87]Open in a new tab Of the unique probes, six measure methylation at a site in a known gene, and four genes are protein coding. Of the four probes linked to protein-coding genes, two are located within a CpG dinucleotide island (cg00390143 and cg01172082), one on the southern shore of the CpG island (cg07964219), and one in open sea (cg01604946). These probes are associated with the GALNT9, KIF26A, COL18A1, and SH3TC2 genes, respectively. The probe associated with HLA-DQB2 (cg19939773) was also located within a CpG island, while cg15052330 (linked to the CYP26B1 gene) was in open sea and cg15688253 was on the north shore of a CpG island. The metabolite histidine mediated the effect of the five methylation sites associated with the CYP26B1, HLA-DQB2, KIF26A, and COL18A1 genes, while a phosphatidyl choline (PC) identified as PC (p-32:0) or PC (o-32:1) mediated the effect of cg01604946 (located in the SH3TC2 gene), and a sphingomyelin (SM) we identified as SM (d34:2) mediated the effect of cg00390143 (GALNT9 gene). Six reactome pathways were enriched (FDR q-value < 0.10) ([88]Table 3), including plasma lipoprotein assembly, remodeling, and clearance and inositol phosphate metabolism. All pathways were enriched at least two-fold. Over-representation of the inositol phosphate metabolism pathway was driven by the INPP5A and PLCH2 genes (associated with probes cg13931663 and cg20468586, respectively, from our list of the top 100 probes). The plasma lipoprotein assembly, remodeling, and clearance pathway over-representation was driven by the LIPA and PCSK6 genes (probes cg24405248 and cg07375207, respectively). Table 3. Reactome pathway enrichment. Term Number of Genes in Top 100 Probes Number of Genes in Reference List Expected Fold Enrichment p Value FDR q Value Formation of the cornified envelope (R-HSA-6809371) 3 105 0.32 9.24 0.004 0.071 Inositol phosphate metabolism (R-HSA-1483249) 2 43 0.13 15.04 0.008 0.071 Ion transport by P-type ATPases (R-HSA-936837) 2 49 0.15 13.2 0.011 0.071 Keratinization (R-HSA-6805567) 3 159 0.49 6.1 0.014 0.071 Transport of small molecules (R-HSA-382551) 6 668 2.07 2.91 0.017 0.071 Plasma lipoprotein assembly, remodeling, and clearance (R-HSA-174824) 2 63 0.19 10.27 0.017 0.071 [89]Open in a new tab None of the biopathways were significant after p-value adjustment ([90]Table 4). However, several immune-system-related pathways were significant at an unadjusted p < 0.05 level, including interleukin signaling and transmembrane transport of small molecules. Interestingly, transport and metabolism of small molecules appeared in both the over-representation and enrichment analyses. Table 4. Bioplanet enrichment (top ten). Term Number of Genes in Top 100 Probes Number of Genes in Reference List p Value FDR q Value Fold Enrichment Genes Interleukin receptor SHC signaling 2 28 0.0031816 0.2497991 26.411141 IL3;IL5RA Ion transport by P-type ATPases 2 36 0.0052218 0.2497991 20.188641 ATP4B;ATP2B2 Interleukin-3 signaling pathway 2 45 0.0080651 0.2497991 15.955894 IL3;IL5RA Interleukin-3, interleukin-5, and GM-CSF signaling 2 45 0.0080651 0.2497991 15.955894 IL3;IL5RA Regulation of NFAT transcription factors 2 47 0.0087727 0.2497991 15.245211 IL3;IKZF1 Ion channel transport 2 61 0.0144583 0.2497991 11.619521 ATP4B;ATP2B2 Sodium-coupled sulphate, di- and tri-carboxylate transporters 1 5 0.0149116 0.2497991 84.474576 SLC13A2 Cytochrome P450 metabolism of vitamins 1 6 0.0178676 0.2497991 67.576271 CYP26B1 Phase I of biological oxidations: non-cytochrome P450 enzymes 1 7 0.0208149 0.2497991 56.310735 LIPA Eosinophils in the chemokine network of allergy 1 8 0.0237535 0.2497991 48.263922 IL3 Small cell lung cancer 2 84 0.0263601 0.2497991 8.350715 LAMA2;TRAF5 Hematopoietic cell lineage 2 88 0.0287269 0.2497991 7.960706 IL3;IL5RA Transmembrane transport of small molecules 4 432 0.0405993 0.2497991 3.256342 ATP4B;SLC13A2;STEAP3;ATP2B2 [91]Open in a new tab 3. Discussion We found seven unique pairs of CpG dinucleotide sites and metabolites with significant natural indirect mediation effects. These pairs included five DNAm probes mediated by histidine (a metabolite known to affect T1D risk), one probe (cg01604946) mediated by phostidyl choline p-32:0 or o-32:1, and one probe (cg00390143) mediated by sphingomyelin d34:2. For all seven pairs, the metabolite was positively associated with IA, and the methylation of the CpG site was negatively associated with both the metabolite and IA. Thus, an indirect effect less than one suggests that some of the protective effect of the CpG site is via reduction in metabolite levels. We previously discovered nominally significant positive associations between histidine and the risk of progression from IA to T1D [[92]20], and other studies indicated that the histidine–glutamate–glutamine pathway can be corrected by improving glycemic control [[93]21]. Additionally, histidine is a precursor to histamine, a monoamine that plays an important role in inflammation and has been linked to the development of T1D and other immune responses [[94]22,[95]23]. A knock-out mouse experiment for the gene encoding histidine decarboxylase (the enzyme that converts histidine to histamine) showed that decreased histamine levels were associated with a lower incidence of T1D and decreased levels of circulating IL-12 and IFN- [MATH: γ :MATH] [[96]22]. While the connection between histamine and T1D is relatively well studied, it remains unclear how it might mediate the protective effects of the CpG sites we identified, and how the methylation of those CpG sites might affect T1D etiology. The association of IA and the CpG site profiled by probe cg19939773, located approximately 1.4 kb upstream of the second exon of the HLA-DQB2 gene, was mediated by histidine. HLA-DQB1 is an important risk gene in development of T1D, but the literature on HLA-DQB2 is less clear [[97]24]. Interestingly, methylation of the KIF26A gene (probe cg01172082), which was also mediated by histidine, was found to be affected by a 5-day high-fat high-energy diet in young men, and these changes could potentially contribute to the insulin resistance seen in some of the study participants with T2D [[98]25]. Das et al. found using whole-exome sequencing (WES) that the COL18A gene (which we found was mediated by histidine) is protective against diabetic retinopathy, but the sample size was small, and these results warrant replication [[99]26]. A relatively large study also found that polymorphisms in the collagen protein encoded by COL18A may be related to increased obesity in patients with T2D [[100]27]. COL18A antisense RNA is also not well understood, but mouse models have provided some evidence that other antisense RNAs, namely GLUT-2, can increase risk of diabetes, [[101]28] so it is possible that increased methylation results in lower expression of risk-increasing antisense RNA. For the phosphatidyl choline (PC) identified as either PC (p-32:0) or PC (o-32:1) and the sphingomyelin (SM) identified as SM (d34:2), a study by Oresic et al. found that participants who went on to develop T1D had lower levels of PCs at birth, but that “the lysoPC PC(18:0/0:0) was increased 1.5-fold within 9–18 mo before seroconversion” [[102]8]. Elevated lysoPCs are believed to be a marker of oxidative stress prior to the development of islet autoantibodies [[103]29], but our group also found that PC (16:0_18:1(9Z)) was the strongest single metabolite predictor of IA reversion. The CpG site profiled by the cg01604946 probe is located within the SH3TC2 gene and its association with IA is mediated by the compound identified as PC (p-32:0) or PC (o-32:1). This phosphatidyl choline is generally associated with demyelinating motor and sensory neuropathy, and its connection to T1D requires more research [[104]30]. Like PCs, the research regarding SMs generally suggests that they have a protective effect against developing T1D [[105]8,[106]31] and mouse models have shown that SM patches on pancreatic [MATH: β :MATH] -cells correlate well with insulin production capacity [[107]32]. However, SMs are also strongly associated with rapid eGFR decline [[108]33] and general nephropathy [[109]34] in those who already have T1D. Both SMs and PCs have also been linked to renal impairment and all-cause mortality in T1D, [[110]35] which indicates that understanding of their role in T1D etiology remains incomplete and specific SMs and PCs may not be protective. Like the COL18A and KIF26A genes, which were mediated by histidine, the CpG probed by cg00390143 was associated with the GALNT9 gene, which has been implicated in T2D but remains understudied [[111]36]. This study has identified several potential causal pathways in the etiology of IA. The seven methylation sites that were significantly mediated by metabolites are potentially interesting candidates for elucidating epigenetic protection against T1D. While the causal pathways in our mediation models are temporally reasonable in the sense that the exposure was measured prior to the mediator, a better understanding of the biological pathways is necessary to confirm these exploratory analyses. A limitation of this work is that metabolites were measured in non-fasting samples, and these analyses did not account for dietary intake, which is the single biggest source of exposure to chemicals and nutrients [[112]37]. Although DAISY has collected dietary information, these data are available on only a subset of the relevant timepoints herein; therefore, adjustment for dietary intake would have significantly decreased our sample size. However, case/control status for each participant is unknown at the time of metabolite measurement, so participants are effectively randomized with respect to outcome. Thus, we would not expect there to be a systematic difference in diet that would affect these results. Enrichment and over-representation tools are not designed for the integration of methylation and metabolomics data. To our knowledge, there are no integrated enrichment or over-representation tools that can incorporate multiple omics datasets, so our biological interpretation relied on an ad hoc choice of the 100 most significant DNAm probes. Additionally, we did not adjust the candidate selection step for multiple testing, and only adjusted p values at the second stage, which could potentially lead to false positives. However, the best approach to p-value adjustment for correlated variables in this filtering step remains an active area of statistical research and we suggest that this approach is unlikely to result in many false positives due to the adjustments used in the later stages [[113]38]. However, despite these limitations we believe that our approach is a novel method for the integration of omics data in epidemiological studies. The combination of a longitudinal study design and mediation analysis allows for causal interpretation of our results, which will hopefully guide additional research into biological mechanisms. Additionally, this approach is not limited to DNAm and metabolite studies and could in theory be applied to any longitudinal study with multiple omics datasets. 4. Materials and Methods 4.1. Study Design and Participants Study participants were recruited via newborn screening at St. Joseph’s Hospital in Denver, CO, USA and from unaffected first-degree relatives (FDR) of type 1 diabetes patients. For the DAISY nested case–control study, IA cases were frequency-matched to controls by age at SV, race/ethnicity, and sample availability. The majority of participants were Non-Hispanic White (NHW), and race/ethnicity was categorized into NHW and Other for matching and analysis. All research was performed in accordance with relevant guidelines and regulations [[114]7,[115]39,[116]40]. Participants with both methylation and metabolomic measures at the visit at PSV and SV were selected for these analyses (n = 183). 4.2. DNA Methylation IA cases and frequency-matched controls were randomly assigned to either the 450 K group (which included duplicate samples for quality control) or EPIC group (which included replicates from the 450 K set for quality control). DNA methylation was profiled in peripheral whole blood using the Infinium HumanMethylation450K Beadchip (Illumina, San Diego, CA, USA, “450 K”) for the 450 K set, and the Infinium HumanMethylation EPIC Beadchip (“EPIC”) was used for the EPIC set. Both sets underwent identical pre-processing using the SeSAMe pipeline [[117]41], and measurement platform was included as a covariate in all statistical models in order to account for technological batch effects. Johnson and colleagues performed quality control and removed poor-quality samples and DNAm probes (see Johnson et al. [[118]7] for details). This resulted in 199,243 quality DNA methylation probes and 183 subjects having DNAm at both PSV and SV timepoints. Normalized M values were used in all statistical analyses. We use the term DNAm probe and the probe identifier when referring to the data in the results. However, each probe is designed to measure methylation at a CpG site which is used as a more general term in the discussion. 4.3. Metabolomics Untargeted metabolomics were obtained using gas chromatography–time-of-flight mass spectrometry (GC–TOF MS), charged surface hybrid column quadrupole time-of-flight mass spectrometry (CSH–QTOF MS), and hydrophilic interaction chromatography quadrupole time-of-flight mass spectrometry (HILIC–QTOF MS) at the UC Davis West Coast Metabolomics Center. Non-fasting plasma samples were prepared and analyzed as previously described in Johnson et al. [[119]20]. For GC–TOF data, peak picking and annotation was performed using BinBase [[120]42]. CSH–QTOF MS and HILIC–QTOF MS were processed using MS-Dial [[121]43], complex lipids were annotated with LipidBlast [[122]44] and Massbank of North America ([123]http://mona.fiehnlab.ucdavis.edu/, accessed on 6 August 2021), and erroneous peaks were removed using MSFLO [[124]45]. After collection, annotation, and post-processing, metabolomics data were normalized using SERRF [[125]46], a QC-based method designed to account for batch effects. Samples with low abundance (n = 2) and metabolites with a coefficient of variation more than two absolute deviations from the median (n = 344) were excluded, and data were transformed using the Box-Cox method [[126]47]. After processing and quality checks, 2457 untargeted metabolites remained, of which 1905 metabolites were annotated with either a lab-specific or InChIKey identifier [[127]20]. Only annotated metabolites were considered for these analyses. 4.4. Statistical Analysis All analyses were performed using the R programming environment [[128]48] version 4.0.0 and included only participants with methylation and metabolomics data at both their SV and PSV study visits (n = 183). To reduce the number of methylation probe–metabolite pairs examined in the mediation analysis, we performed simple linear regression to find probes and metabolites that were correlated at a nominal p < 0.01 level. Next, we performed two independent logistic regressions on all significantly correlated pairs (one model for DNAm probe and one for metabolite) to determine pairs in which both were significantly associated with the IA outcome, again at a nominal 0.01 level. Pairs in which both metabolite and probe were correlated with one another, and both variables were associated with the outcome, were selected for the mediation analysis. These p values were not adjusted for multiple comparisons because this was a candidate-filtering step intended to save computing time for later steps. Model-based mediation analyses were performed using the regmedint package [[129]49] version 0.2.1, an R implementation of Valeri and VanderWeele’s SAS macros [[130]50]. We report the natural indirect effect (NIE). The natural indirect effect is interpreted on the odds scale and represents the average change in outcome if the exposure [MATH: a :MATH] (methylation at PSV) was fixed at 1, but the mediator [MATH: t :MATH] (metabolite at SV) were changed from the level it would take if [MATH: a=0 :MATH] to the level it would take if [MATH: a=1 :MATH] [[131]17]. In other words, it is the effect of the exposure on the outcome that operates through changing the mediator [[132]17]. All regression models were adjusted for HLA-DR3/4 haplotype, age at PSV, time from PSV to SV, and sex, and included exposure mediator interaction terms as recommended by VanderWeele [[133]17]. Confidence intervals for the NIE were calculated based on 10,000 bootstrap simulations using the percentile method and adjusted using Benjamini and Yekutieli’s False Coverage–Statement Rate (FCR) method [[134]51] at the q = 0.05 level (equivalent selection based on a false discovery rate-adjusted p value < 0.05). Because DAISY is a case–control study, cases are oversampled relative to the general population. To account for this, the regmedint package fits the mediator model using only the control participants, which approximates the results from a cohort study when the outcome is rare [[135]17]. 4.5. Biological Interpretation Pathway over-representation was performed using PANTHER’s [[136]52] implementation of reactome pathways [[137]53] and included 60 unique genes associated with the top 100 probes from the mediation step of the analysis. These p values were based on the standard error of the NIE estimated using the multivariate delta method, as opposed to being converted from bootstrap confidence intervals [[138]17]. CpG sites were converted to Entrez IDs using the missMethyl R package [[139]54] and the 199,243 original probes were used as the reference list. Pathways with less than 5 genes in the reference list and less than two genes in the analysis list were excluded. Pathway enrichment analysis was performed using the Enrichr ([140]https://maayanlab.cloud/Enrichr/, accessed on 6 August 2021) [[141]55] Biopathways 2019 module [[142]56], again with the 60 unique genes associated with the top 100 candidate probes. We also searched for known genotypes at certain loci with allele-specific methylation using the BIOS QTL browser ([143]https://genenetwork.nl/biosqtlbrowser/, accessed on 6 August 2021) [[144]57]. These methylation quantitative trait loci (meQTLs) can alter methylation patterns across genomic regions [[145]58]. We then used the biomaRt R package [[146]59] to obtain gene symbols, which were annotated using GeneCards [[147]60] via the GeneBook R package [[148]61]. Author Contributions Conceptualization and methodology, K.K., J.M.N., L.A.V., P.M.C., R.K.J., A.M.K., L.P. and T.V.; formal analysis, T.V.; data curation, L.A.V., O.F., B.C.D. and I.Y.; writing—original draft preparation, T.V.; writing—review and editing, all authors; visualization, T.V. and P.M.C.; supervision, K.K., J.M.N., and L.A.V.; project administration, K.K., J.M.N. and L.A.V.; funding acquisition, M.R., K.K. and J.M.N. All authors have read and agreed to the published version of the manuscript. Funding This research was funded by NIH/NIDDK, grant number R01DK104351, NIH/NIAID, grant number R21AI142483 and NIH/NCI, grant number U01 [149]CA235488. Institutional Review Board Statement The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Colorado Multiple Institutional Review Board (COMIRB 92-080). Informed Consent Statement Informed consent was obtained from all subjects involved in the study. Data Availability Statement The data presented in this study are available on request from the corresponding author. The data are not publicly available because they contain protected health information. Conflicts of Interest The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results. Footnotes Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. References