Abstract
With age, hematopoietic stem cells can acquire somatic mutations in
leukemogenic genes that confer a proliferative advantage in a
phenomenon termed CHIP. How these mutations result in increased risk
for numerous age-related diseases remains poorly understood. We conduct
a multiracial meta-analysis of EWAS of CHIP in the Framingham Heart
Study, Jackson Heart Study, Cardiovascular Health Study, and
Atherosclerosis Risk in Communities cohorts (N = 8196) to elucidate the
molecular mechanisms underlying CHIP and illuminate how these changes
influence cardiovascular disease risk. We functionally validate the
EWAS findings using human hematopoietic stem cell models of CHIP. We
then use expression quantitative trait methylation analysis to identify
transcriptomic changes associated with CHIP-associated CpGs. Causal
inference analyses reveal 261 CHIP-associated CpGs associated with
cardiovascular traits and all-cause mortality (FDR adjusted
p-value < 0.05). Taken together, our study reports the epigenetic
changes impacted by CHIP and their associations with age-related
disease outcomes.
Subject terms: DNA methylation, Population genetics, Ageing,
Haematopoiesis, Cardiovascular diseases
__________________________________________________________________
In CHIP, somatic mutations in a hematopoietic stem cell lead to a
clonal subpopulation of blood cells. Here, the authors perform a CHIP
meta-EWAS to establish its epigenetic features and age-related
outcomes.
Introduction
A hallmark of aging is the accumulation of somatic mutations in
dividing cells. The vast majority of these mutations do not affect cell
fitness. In rare circumstances, however, a mutation can arise in a
progenitor cell that confers a selective fitness advantage, culminating
in its expansion relative to other cells. In the hematopoietic system,
this process is termed clonal hematopoiesis (CH). Individuals with CH
are at increased risk for the development of hematologic
malignancies^[77]1. A subset of CH is driven by pathogenic mutations in
myeloid malignancy-associated genes, which is termed CH of
indeterminate potential (CHIP) and has been shown to be associated with
hematologic cancers, cardiovascular disease (CVD), chronic obstructive
pulmonary disease, and mortality, among other conditions^[78]2–[79]4.
The prevalence of CHIP increases with advancing age^[80]2,[81]5–[82]7.
In a whole genome sequencing (WGS) study from the NHLBI Trans-Omics for
Precision Medicine (TOPMed) program that included ~100,000 individuals
across 51 separate studies, large CHIP clones were found to be uncommon
(<1%) in individuals younger than 40 years of age and increased to 12%
in those aged 70–89 and 20% in those aged 90 years and older^[83]5.
This age-dependent pattern was consistent across CHIP driver
genes^[84]5 and has been observed in other studies^[85]2,[86]6,[87]7.
DNA methylation (DNAm), the addition of a methyl group to a cytosine
followed by a guanosine (CpG) in DNA, is an epigenetic modification
that reflects age and environmental exposures. The gene products of the
three most frequently mutated CHIP driver genes, DNMT3A, TET2, and
ASXL1, are epigenetic regulators^[88]5. DNMT3A (DNA-methyltransferase
3A) is a methyltransferase that catalyzes de novo DNA methylation by
transferring methyl groups to CpG sites^[89]8. Conversely,
TET2 (ten-eleven translocation-2) is a DNA demethylase that catalyzes
the conversion of 5-methylcytosine to 5-hydroxymethylcytosine, one of
the steps leading to eventual demethylation of CpG sites^[90]9. ASXL1
(ASXL transcriptional regulator 1) is involved in histone
modification^[91]10. Its function in CHIP remains relatively
unknown^[92]11.
CHIP has been shown to be associated with global DNAm changes,
particularly for the DNMT3A and TET2 CHIP driver gene mutations^[93]12.
A previous epigenome-wide association study (EWAS) of CHIP in 582
Cardiovascular Health Study (CHS) participants, with replication in
2655 Atherosclerosis Risk in Communities (ARIC) participants, revealed
several thousand CpG sites associated with CHIP and its two major CHIP
driver genes, DNMT3A and TET2^[94]12. DNMT3A and TET2 CHIP were also
found to have directionally opposing DNAm signatures: DNMT3A CHIP
mutations were associated with hypomethylation of CpGs, whereas TET2
CHIP was associated with hypermethylation of CpGs, consistent with the
canonical regulatory functions of DNMT3A and TET2 elucidated in murine
and human model systems^[95]13–[96]15.
Despite the wealth of information from the previous EWAS of
CHIP^[97]12, several limitations and knowledge gaps remain. These
include the need to use larger sample sizes to enable analyses of less
prevalent CHIP driver gene mutations such as ASXL1, explore downstream
functions and pathways influenced by mRNA expression for any CHIP and
CHIP subtypes, and identify underlying molecular mechanisms linking
CHIP to CVD.
To address these knowledge gaps, we conduct a multiracial meta-analysis
of separate EWAS of CHIP in four independent cohort studies (N = 8196;
462 with any CHIP, 261 DNMT3A, 84 TET2, and 21 with ASXL1 CHIP) along
with analysis of the associations of CHIP-related CpGs with downstream
gene expression. We expand upon the previous EWAS of CHIP study^[98]12
by adding two cohorts, the Framingham Heart Study (FHS) and the
African-American Jackson Heart Study (JHS), to the ARIC
and CHS cohorts. The EWAS findings are functionally validated using
human hematopoietic stem cell (HSC) models of CHIP. Expression
quantitative trait methylation (eQTM) analysis identifies gene
expression changes associated with CHIP-associated CpGs. Causal
inference analysis using two-sample Mendelian randomization (MR) is
performed to gain insight into the molecular mechanisms linking CHIP to
CVD. A flowchart of the study design is shown in Fig. [99]1.
Fig. 1. Overview of Study Design.
This flowchart outlines the sequential steps of the study, from data
collection to downstream analyses. CHIP Clonal Hematopoiesis of
Indeterminate Potential, WGS Whole Genome Sequencing, WES Whole Exome
Sequencing, TOPMed Trans-Omics for Precision Medicine program, eQTM
Expression Quantitative Trait Methylation.
Results
Clinical characteristics of study participants
The baseline characteristics of FHS, JHS, CHS, and ARIC participants
included in this investigation are presented in Table [102]1. The mean
age at the time of blood draw for whole-genome sequencing (WGS) was 57,
56, and 58 years for FHS, JHS, and ARIC, respectively. Participants from CHS
were considerably older, with a mean age of 74 years. All four cohorts
had more women than men (54–63% women). Overall, CHIP mutations with a
variant allele fraction (VAF) ≥ 2% were present in 5% (166/3295) of
participants in FHS, 4% (68/1664) in JHS, 5% (142/2655) in ARIC, and
15% (86/582) in CHS. Consistent with previous reports^[103]5, the three
most frequently mutated CHIP driver genes across all cohorts were
DNMT3A, TET2, and ASXL1. Eighty percent of individuals with CHIP
demonstrated expanded CHIP clones with VAF > 10%.
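The cohort-level prevalences quoted above follow directly from the case counts in Table 1. As a quick arithmetic check, the sketch below recomputes them; the dictionary layout and function names are illustrative and are not part of the study's pipeline.

```python
# Counts taken from Table 1: (participants, CHIP cases, CHIP cases with VAF > 10%).
COHORTS = {
    "FHS": (3295, 166, 145),
    "JHS": (1664, 68, 63),
    "CHS": (582, 86, 76),
    "ARIC": (2655, 142, 86),
}

def chip_prevalence(cohorts):
    """Return the percentage of participants with CHIP in each cohort."""
    return {name: 100 * cases / n for name, (n, cases, _) in cohorts.items()}

def expanded_clone_fraction(cohorts):
    """Return the pooled percentage of CHIP cases with VAF > 10%."""
    cases = sum(c for _, c, _ in cohorts.values())
    expanded = sum(e for _, _, e in cohorts.values())
    return 100 * expanded / cases

for name, pct in chip_prevalence(COHORTS).items():
    print(f"{name}: {pct:.1f}% with CHIP")
print(f"Pooled VAF > 10%: {expanded_clone_fraction(COHORTS):.1f}% of CHIP cases")
```

Pooling the four cohorts this way also recovers the overall sample size (8196) and the total number of CHIP cases (462) reported in the Introduction.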
Table 1.
Baseline Characteristics of Cohorts
Study | N    | CHIP cases, N | CHIP cases at VAF > 10% | DNMT3A CHIP | TET2 CHIP | ASXL1 CHIP | White participants, N | Black participants, N | Age, mean (range) | Sex, Female (%) | Smoking, N
FHS   | 3295 | 166           | 145 (87%)               | 77 (46%)    | 38 (23%)  | 21 (13%)   | 3295                  | 0                     | 57 (24–92)        | 54              | 343
JHS   | 1664 | 68            | 63 (93%)                | 44 (65%)    | 14 (21%)  | N/A*       | 0                     | 1664                  | 56 (22–93)        | 63              | 236
CHS   | 582  | 86            | 76 (88%)                | 35 (41%)    | 18 (21%)  | N/A*       | 302                   | 280                   | 74 (64–91)        | 61              | 320
ARIC  | 2655 | 142           | 86 (61%)                | 105 (74%)   | 14 (10%)  | N/A*       | 758                   | 1897                  | 58 (47–72)        | 61              | 1486
*Less than 5 ASXL1 CHIP cases for indicated cohorts.
FHS Framingham Heart Study, JHS Jackson Heart Study, CHS Cardiovascular
Health Study, ARIC Atherosclerosis Risk in Communities, CHIP Clonal
Hematopoiesis of Indeterminate Potential, VAF Variant Allele Fraction.
Epigenome-wide association analysis
Race was classified based on self-report. In the race-stratified
analysis, we identified 2843 CpGs associated with any CHIP, 758 with
DNMT3A, and 4735 with TET2 CHIP in White participants, and 5498 with any
CHIP, 5065 with DNMT3A, and 290 with TET2 CHIP in Black participants at
Bonferroni-corrected P < 1 × 10^-7
(Supplementary Data [105]1–[106]6). For any CHIP, DNMT3A, and TET2 CHIP,
respectively, 1290, 675, and 254 CHIP-associated CpG sites were shared
between White and Black participants at the Bonferroni-corrected
threshold, with concordant directions of effect.
In a multiracial, meta-EWAS of CHIP, 9615 CpGs were associated with any
CHIP, and 5990, 5633, and 6078 CpGs were associated with DNMT3A CHIP,
TET2 CHIP, and ASXL1 CHIP, respectively (at Bonferroni-corrected
P < 1 × 10^-7). The top ten CpGs for any CHIP and for each of the three CHIP driver
genes are shown in Table [107]2. A full list of CpG signatures and
their directions of effect are reported in Supplementary
Data [108]7-[109]10. There was minimal to moderate overlap of CpGs
associated with DNMT3A, TET2, and ASXL1; 429, 904, and 1088 CpG sites
were shared between DNMT3A and TET2, DNMT3A and ASXL1, and TET2 and
ASXL1, respectively.
Table 2.
Top 10 CHIP-associated CpGs
CHIP Subtype | CpG        | CHR | Position | Gene     | β      | SE      | P-value   | Association with CHIP (Black participants)* | Association with CHIP (White participants)**
Any CHIP     | cg23014425 | 17  | 46648525 | HOXB3    | −0.016 | 8.7E-04 | 6.60E-79  | --- | ---
Any CHIP     | cg04800503 | 17  | 46648533 | HOXB3    | −0.028 | 1.5E-03 | 8.10E-76  | --- | ---
Any CHIP     | cg07727170 | 15  | 70458214 |          | −0.016 | 9.1E-04 | 5.30E-69  | --- | ---
Any CHIP     | cg01966117 | 3   | 52528714 | STAB1    | −0.034 | 1.9E-03 | 1.20E-68  | --- | ---
Any CHIP     | cg19825437 | 3   | 1.69E+08 |          | −0.038 | 2.2E-03 | 3.60E-68  | --- | ---
Any CHIP     | cg25113462 | 2   | 2.39E+08 | TRAF3IP1 | −0.023 | 1.3E-03 | 1.40E-64  | --- | ---
Any CHIP     | cg08343644 | 16  | 57662060 | GPR56    | −0.021 | 1.3E-03 | 1.10E-57  | --- | ---
Any CHIP     | cg01521274 | 14  | 71822452 |          | −0.025 | 1.5E-03 | 7.00E-57  | --- | ---
Any CHIP     | cg21517792 | 14  | 1.06E+08 | MTA1     | −0.024 | 1.5E-03 | 1.20E-55  | --- | ---
Any CHIP     | cg15059065 | 19  | 17354961 | NR2F6    | −0.04  | 2.6E-03 | 1.60E-53  | --- | ---
DNMT3A CHIP  | cg04800503 | 17  | 46648533 | HOXB3    | −0.048 | 1.8E-03 | 7.10E-150 | --- | ---
DNMT3A CHIP  | cg23014425 | 17  | 46648525 | HOXB3    | −0.026 | 1.0E-03 | 2.00E-143 | --- | ---
DNMT3A CHIP  | cg25113462 | 2   | 2.39E+08 | TRAF3IP1 | −0.038 | 1.7E-03 | 6.20E-112 | --- | ---
DNMT3A CHIP  | cg03785076 | 2   | 2.42E+08 | SNED1    | −0.052 | 2.5E-03 | 6.20E-95  | --- | ---
DNMT3A CHIP  | cg23551720 | 17  | 46633726 | HOXB3    | −0.038 | 1.9E-03 | 8.70E-91  | --- | ---
DNMT3A CHIP  | cg07727170 | 15  | 70458214 |          | −0.023 | 1.2E-03 | 4.70E-90  | --- | ---
DNMT3A CHIP  | cg09749364 | 15  | 40384779 | BMF      | −0.046 | 2.3E-03 | 5.10E-87  | --- | ---
DNMT3A CHIP  | cg16937168 | 2   | 2.42E+08 | SNED1    | −0.059 | 3.1E-03 | 9.20E-82  | --- | ---
DNMT3A CHIP  | cg23146197 | 12  | 66271002 | HMGA2    | −0.041 | 2.2E-03 | 5.00E-80  | --- | ---
DNMT3A CHIP  | cg24400630 | 1   | 89728035 | GBP5     | −0.046 | 2.5E-03 | 1.90E-78  | --- | ---
TET2 CHIP    | cg13742400 | 2   | 2.26E+08 | DOCK10   | 0.086  | 4.5E-03 | 1.90E-82  | +++ | +++
TET2 CHIP    | cg19695507 | 10  | 13526193 | BEND7    | 0.097  | 5.2E-03 | 3.30E-76  | +++ | +++
TET2 CHIP    | cg22562591 | 8   | 82002977 | PAG1     | 0.057  | 3.4E-03 | 6.20E-65  | N/A | +++
TET2 CHIP    | cg07905808 | 6   | 30297389 | TRIM39   | 0.068  | 4.0E-03 | 1.30E-64  | +++ | +++
TET2 CHIP    | cg00116699 | 2   | 2.40E+08 | HDAC4    | 0.082  | 4.9E-03 | 3.70E-63  | +++ | +-+
TET2 CHIP    | cg06043201 | 8   | 28974428 | KIF13B   | 0.085  | 5.1E-03 | 2.20E-62  | +++ | +++
TET2 CHIP    | cg09667606 | 6   | 1.59E+08 | SYNJ2    | 0.077  | 4.6E-03 | 7.10E-62  | +++ | +++
TET2 CHIP    | cg17607231 | 2   | 2.31E+08 | SP140    | 0.12   | 7.3E-03 | 1.70E-60  | +++ | +++
TET2 CHIP    | cg01133215 | 6   | 45399681 | RUNX2    | 0.085  | 5.2E-03 | 1.70E-59  | +++ | +++
TET2 CHIP    | cg25463483 | 6   | 30530544 | PRR3     | 0.064  | 4.0E-03 | 1.70E-58  | +++ | +++
ASXL1 CHIP   | cg07262247 | 5   | 1.32E+08 | PDLIM4   | −0.2   | 7.8E-03 | 2.10E-133 |     | -
ASXL1 CHIP   | cg17543112 | 5   | 1.32E+08 | PDLIM4   | −0.14  | 5.9E-03 | 1.20E-117 |     | -
ASXL1 CHIP   | cg01305625 | 5   | 1.32E+08 | PDLIM4   | −0.11  | 5.8E-03 | 2.70E-80  |     | -
ASXL1 CHIP   | cg17412560 | 2   | 95963403 | KCNIP3   | −0.16  | 8.5E-03 | 3.90E-72  |     | -
ASXL1 CHIP   | cg00443981 | 17  | 58499679 | C17orf64 | −0.17  | 9.7E-03 | 6.10E-64  |     | -
ASXL1 CHIP   | cg02544002 | 3   | 1.29E+08 | PLXND1   | −0.14  | 8.6E-03 | 3.40E-57  |     | -
ASXL1 CHIP   | cg19529621 | 12  | 2045722  |          | −0.15  | 9.3E-03 | 7.80E-54  |     | -
ASXL1 CHIP   | cg02341556 | 11  | 1.19E+08 | BCL9L    | −0.14  | 9.2E-03 | 3.40E-53  |     | -
ASXL1 CHIP   | cg06124793 | 11  | 1939725  | TNNT3    | −0.11  | 6.8E-03 | 5.60E-52  |     | -
ASXL1 CHIP   | cg10558233 | 8   | 94892613 |          | −0.16  | 1.0E-02 | 1.40E-51  |     | -
The effect size (β), standard error (SE), and P-values for any CHIP,
DNMT3A CHIP, and TET2 CHIP were derived from fixed-effect
meta-analysis. Because ASXL1 CHIP was only available in the Framingham
Heart Study (FHS) cohort, the β, SE, and P-values were derived from
linear regression models. Two-sided tests were used for all analyses.
P-values were adjusted for multiple comparisons using the
Benjamini-Hochberg FDR method.
* “Association with CHIP”: “+” or “-” represent the directions of
effect for any CHIP, DNMT3A, or TET2 CHIP in CHS, ARIC, and JHS,
respectively; **“Association with CHIP”: “+” or “-” represent the
directions of effect for any CHIP, DNMT3A, or TET2 CHIP in CHS, ARIC,
and FHS, respectively; “N/A”: CpG was not found in association with
CHIP.
CHIP Clonal Hematopoiesis of Indeterminate Potential, CHR Chromosome.
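The fixed-effect meta-analysis referred to in the Table 2 legend can be illustrated with a minimal sketch. It assumes standard inverse-variance weighting and a normal approximation for the P-value; the software and exact settings used in the study are not stated in this section, and the per-cohort numbers below are hypothetical.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect estimate for one CpG.

    Returns (pooled beta, pooled SE, two-sided P-value from a normal approximation).
    """
    weights = [1.0 / se**2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return pooled_beta, pooled_se, p_value

# Hypothetical per-cohort estimates for a single CpG (not values from the study).
betas = [-0.018, -0.015, -0.020, -0.014]
ses = [0.002, 0.003, 0.004, 0.002]
beta, se, p = fixed_effect_meta(betas, ses)
print(f"pooled beta = {beta:.4f}, SE = {se:.4f}, P = {p:.2e}")
```

Cohorts with smaller standard errors receive larger weights, so the pooled estimate is pulled toward the most precisely estimated cohort-level effects.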
We identified 5987 CpGs (~100%) associated with DNMT3A CHIP and 4607
CpGs (~76%) associated with ASXL1 CHIP that showed decreased
methylation (β < 0) (Fig. [111]2b, d). In contrast, 5079 CpGs (~90%)
associated with TET2 CHIP showed increased methylation (β > 0)
(Fig. [112]2c). Out of the 554 TET2-associated CpGs that showed
decreased methylation, 171 (~31%) CpGs were found to overlap with
DNMT3A CpG sites. The vast majority of CpGs associated with CHIP were
remote (>1 Mb) from the driver gene including 5969/5990 (99.6%) for
DNMT3A, 5632/5633 (~100%) for TET2, and 6070/6078 (99.9%) for ASXL1.
Fig. 2. Genome-wide Directions of Effect of Any CHIP and CHIP Subtypes.
Volcano plots showing the effect size (β) and -log10(P-value) from
the multiracial meta-analysis of epigenome-wide association studies
(EWAS) for (a) any CHIP (Clonal Hematopoiesis of Indeterminate
Potential), (b) DNMT3A CHIP, (c) TET2 CHIP, and the EWAS in the
Framingham Heart Study (FHS) for (d) ASXL1 CHIP. Genes annotated to the
CpG sites are shown. For panels (a–c), the effect size (β) and P-values
were derived from fixed-effect meta-analysis of multiple cohorts. For
panel (d), because ASXL1 CHIP was only available in the FHS cohort, the
effect size (β) and P-values were derived from linear regression
models. The color green indicates a significant negative association
between CHIP and DNA methylation while purple indicates a positive
association between the two variables. Yellow signifies non-significant
associations between CHIP and DNA methylation. Two-sided tests were
used for all analyses. P-values were adjusted for multiple comparisons
using the Benjamini-Hochberg false discovery rate (FDR) method.
Significant associations were defined as FDR < 0.05. Exact P-values,
standard errors for the β, and 95% confidence intervals for significant
results are provided in Supplementary Data [115]7-[116]10. Source data
are provided as a Source Data file.
Although age was included as a covariate, there remains a possibility
that common CpG sites across all three CHIP driver genes may be related
to age rather than CHIP mutation. To assess this, we compared the 19
CpGs that are common among all three CHIP driver gene mutations from
our meta-EWAS of CHIP with the CpGs from a recent EWAS of chronological
age in the Generation Scotland cohort^[117]16 (N = 18,413). No CpGs
overlapped, suggesting that common CpGs across all CHIP driver genes
are related to CHIP mutation rather than age.
A sensitivity analysis was performed by excluding CHIP cases with
VAF < 10%. The results are similar to the multiracial meta-EWAS of any
CHIP and are provided in Supplementary Fig. [118]4 and Supplementary
Data [119]11-[120]13. Approximately 78% of CpGs (7460/9615) in the
meta-EWAS of any CHIP were re-identified in the sensitivity analysis,
while 312 CpGs were newly identified.
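The comparison between the primary and sensitivity analyses amounts to a set overlap of significant CpG IDs. A minimal sketch of that bookkeeping follows; the CpG IDs are toy placeholders and the function name is hypothetical, since the real lists are in the Supplementary Data.

```python
def overlap_summary(primary, sensitivity):
    """Compare two sets of significant CpG IDs."""
    shared = primary & sensitivity
    return {
        "re_identified": len(shared),
        "pct_re_identified": 100 * len(shared) / len(primary),
        "newly_identified": len(sensitivity - primary),
    }

# Toy CpG IDs for illustration only.
primary = {"cg01", "cg02", "cg03", "cg04", "cg05"}
sensitivity = {"cg02", "cg03", "cg04", "cg05", "cg06"}
print(overlap_summary(primary, sensitivity))

# The reported proportion can be checked from the published counts alone.
print(f"{100 * 7460 / 9615:.1f}% of primary CpGs re-identified")
```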
Human hematopoietic stem cell models of CHIP validate EWAS
We sought to experimentally validate our multiracial meta-EWAS
methylation findings with an in vitro model of CHIP. CHIP was modeled
by introducing loss-of-function mutations in DNMT3A, TET2, and ASXL1 in
mobilized peripheral blood CD34+ hematopoietic cells, using
CRISPR-Cas9^[121]17. After seven days in culture, these cells were flow
sorted to isolate a purified population of CD34^+CD38^-Lin^- cells.
Following fluorescence-activated cell sorting, genomic DNA (gDNA) was
extracted, and methylation was assayed using biomodal duet evoC (see
Methods)^[122]18.
The analysis focused on the subset of CpG sites that were significantly
associated with CHIP in the EWAS data and nominally differentially
methylated (P < 0.05) in the in vitro model of CHIP. When comparing CpG
site subsets with their respective engineered cells, DNMT3A-associated
CpG sites showed significant enrichment in DNMT3A-engineered cells
(P < 2.88 × 10^-239) (Fig. [123]3A). TET2-associated CpG sites
were significantly enriched in both TET2-engineered cells
(P < 8.39 × 10^-56) (Fig. [124]3B) and ASXL1-engineered cells
(P < 1.65 × 10^-14) (Supplementary Fig. [125]5). ASXL1-associated CpG
sites showed no significant enrichment in ASXL1-engineered cells
(Fig. [126]3C) but a slight trend in TET2-engineered cells
(Supplementary Fig. [127]5). The any CHIP-associated CpG sites were
significantly enriched only in DNMT3A-engineered primary cells
(P < 9.29 × 10^-121) and not in TET2- or ASXL1-engineered cells
(Supplementary Fig. [128]5).
Fig. 3. Functional Validation in CRISPR/Cas9-edited Hematopoietic Stem Cells
Modeling CHIP.
Dot plots of methylation change from −1.0 (no methylation) to 1.0
(complete methylation) seen in engineered primary cell cultures
compared to correlation of EWAS results ranging from −0.1 to 0.1.
Following an initial CpG filtering using an uncorrected Student’s
t-test (p < 0.05), significance was determined with a two-sided
binomial test. a DNMT3A-associated CpG sites (n = 855 CpG sites)
compared to DNMT3A engineered human stem cells (n = 4). b
TET2-associated CpG sites (n = 312) compared to TET2-engineered human
stem cells (n = 4). c ASXL1-associated CpG Sites (n = 139) compared to
ASXL1-engineered human stem cells (n = 3). Source data are provided as
a Source Data file. KO Knockout, mC methylcytosine.
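The two-sided binomial test named in the Fig. 3 legend can be sketched as follows. The sketch assumes the null hypothesis is that a filtered CpG is equally likely to change in either direction (p = 0.5), which is an assumption on our part; the counts are hypothetical, and the direct summation is only practical for modest n.

```python
from math import comb

def binomial_two_sided(successes, n, p=0.5):
    """Exact two-sided binomial test.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one under the null success probability p.
    """
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[successes]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))

# Hypothetical example: 18 of 20 filtered CpGs change in the same direction
# in the engineered cells as in the EWAS.
print(f"P = {binomial_two_sided(18, 20):.2e}")
```

A strongly one-sided split of directions gives a small P-value, whereas an even split gives a P-value of 1.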
CpG association with gene expression and pathway analyses
To investigate the functional consequences of CHIP-associated CpGs, we
performed gene ontology (GO) and pathway enrichment analysis for genes
harboring CHIP-associated CpGs. For any CHIP, DNMT3A CHIP, and ASXL1
CHIP, the enriched GO terms related to broad cellular developmental and
organismal processes, while for TET2 CHIP the top GO terms related to
cellular regulation and cell signaling (Supplementary
Data [131]14–[132]17). For example, for any CHIP, DNMT3A CHIP, and
ASXL1 CHIP, the top ten most significant ontologies included
multicellular organism development, anatomical structure development,
system development, and developmental process. For TET2 CHIP, the most
significant ontology terms related to a cellular response to stimulus,
regulation of cellular processes, and cell signaling. Notably, for the
genes annotated to the 554 TET2-associated CpGs that were found to be
demethylated, the top GO terms were enriched for cellular developmental
and organismal processes such as multicellular organism development and
system development—similar to the enriched GO terms for genes annotated
to DNMT3A CHIP-associated CpGs (Supplementary Data [133]29).
To understand how differentially methylated CpGs associated with CHIP
might alter cellular function, we identified gene expression changes
associated with CHIP-linked CpGs. We analyzed the associations of
CHIP-associated CpGs with changes in cis gene expression (expressed
gene [eGene] within 100 kb of CpG) in 2115 FHS participants whose DNA
methylation data and whole-blood RNA-seq data were available. At
P < 1 × 10^-7, we identified 467 significant cis CpG-transcript pairs for any CHIP,
258 for DNMT3A CHIP, 293 for TET2 CHIP, and 234 for ASXL1 CHIP
(Supplementary Data [134]18–[135]21 provide the full expression
quantitative trait methylation (eQTM) results)^[136]19. The vast
majority of the associations between methylation and gene expression
were negative: lower methylation was associated with higher gene
expression, and higher methylation with lower gene expression. For any
CHIP, DNMT3A, TET2, and ASXL1 CHIP, ~68% (317/467),
~71% (184/258), ~77% (224/293), and ~72% (168/234) of CpG-transcript
pairs showed a negative association between methylation and gene
expression,
respectively. For any CHIP, the top enriched GO terms related to lipid
metabolism. eGenes associated with DNMT3A CHIP were enriched in cell
motility and adhesion processes. For TET2 CHIP, the top enriched terms
related to immune processes, such as leukocyte differentiation. ASXL1
CHIP eGenes were enriched in cellular and immune processes, including
cell importation and antigen processing and presentation (Supplementary
Data [137]22–[138]25).
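To make the eQTM idea concrete, the sketch below pairs a CpG with a gene when it falls inside a cis window and then estimates the sign of the methylation-expression association with a simple least-squares slope. This is a simplified illustration under stated assumptions: the window is measured from the gene boundaries (the exact definition is not given in this section), no covariates are adjusted for, and all values are hypothetical.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x (single predictor, with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def is_cis(cpg_pos, gene_start, gene_end, window=100_000):
    """True if the CpG lies within `window` bp of the gene (assumed definition)."""
    return gene_start - window <= cpg_pos <= gene_end + window

# Hypothetical methylation (beta values) and expression levels for one pair.
methylation = [0.80, 0.75, 0.60, 0.55, 0.40]
expression = [2.1, 2.4, 3.0, 3.3, 4.1]

if is_cis(cpg_pos=150_000, gene_start=200_000, gene_end=250_000):
    slope = ols_slope(methylation, expression)
    label = "negative" if slope < 0 else "positive"
    print(f"slope = {slope:.2f} ({label} association)")
```

In this toy pair, lower methylation goes with higher expression, which is the direction reported for the majority of CpG-transcript pairs.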
Association of methylation with variants and MR analysis
Cis-methylation quantitative trait loci (cis-mQTL)—genetic loci that
are significantly associated with CpG methylation levels and located
within 1 Mb of their associated CpG—linked 8642 CpGs associated with
any CHIP and CHIP subtypes to GWAS Catalog
traits/diseases^[139]19,[140]20. Of the cis-mQTL variants, a subset
were associated with clonal hematopoiesis traits, particularly myeloid
clonal hematopoiesis and the number of clonal hematopoiesis mutations
(Supplementary Data [141]26).
Additionally, enrichment tests of CHIP-associated CpG sites with EWAS
catalog traits^[142]21 were performed across 4023 traits using a
significance threshold of 1.24 × 10^-5
(0.05/4023) (Supplementary Data [143]27). For any CHIP, DNMT3A CHIP,
TET2 CHIP, and ASXL1 CHIP, the top outcomes reflected CpG sites related
to age/aging, alcohol consumption, smoking, and multiple CVD-related
traits including body mass index (BMI), type II diabetes, and fasting
insulin. In support of previous studies reporting ASXL1 CHIP enrichment
among smokers^[144]22,[145]23, 24% (1462/6078) of ASXL1 CHIP-associated
CpGs overlapped with smoking-associated CpGs.
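The enrichment threshold above is a Bonferroni correction for the number of traits tested. The short sketch below reproduces the arithmetic, along with the reported ASXL1 overlap with smoking-associated CpGs; the function name is illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls family-wise error at alpha."""
    return alpha / n_tests

threshold = bonferroni_threshold(0.05, 4023)
print(f"Bonferroni threshold: {threshold:.2e}")

# Share of ASXL1 CHIP-associated CpGs that overlap smoking-associated CpGs.
print(f"ASXL1 overlap with smoking CpGs: {100 * 1462 / 6078:.0f}%")
```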
Two-sample MR analysis of CHIP-associated CpGs (as exposures) with
cis-mQTLs as the instrumental variables in relation to CVD-related
traits and mortality (as outcomes) was performed to infer whether
differential methylation at CHIP-associated CpGs may causally influence
the outcomes. The significantly associated CpGs for any CHIP and for
the three CHIP driver genes were tested for causal associations with 22
traits, including all-cause mortality, BMI, LDL cholesterol,
hypertension, diabetes, CVD, and smoking. The top 20 CpGs and annotated
genes for each trait are reported in Table [146]3 (Supplementary
Data [147]28 displays the full MR results). 261 CHIP-associated,
differentially methylated CpG sites were identified that were
putatively causally associated with CVD-related traits and/or all-cause
mortality, including eight CpGs for myocardial infarction (MI) (e.g.,
cg11879188 (ABO), β_MR = −0.99, P_MR = 4.8 × 10^-18), 108 CpGs for
blood pressure (e.g., cg20305489 (SEPT9), β_MR = 10,
P_MR = 1.7 × 10^-31), 86 CpGs for lipid traits (e.g., cg11250194
(FADS2), β_MR = −0.89, P_MR = 2.0 × 10^-33), and two CpGs for
mortality (e.g., cg08756033 (C13orf33), β_MR = 0.016,
P_MR = 1.3 × 10^-4). 53 CpGs were associated with more than one trait.
For example, cg11879188 is annotated to the ABO gene and was associated
with seven traits, including diastolic blood pressure (β_MR = 2.7,
P_MR = 1.9 × 10^-23), MI (β_MR = −0.99, P_MR = 4.8 × 10^-18), and
triglycerides (β_MR = 0.20, P_MR = 2.0 × 10^-9).
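For readers unfamiliar with two-sample MR, the sketch below shows one common single-instrument estimator, the Wald ratio, in which the variant-outcome effect is divided by the variant-exposure (mQTL) effect. This section does not state which MR estimator the study used, so the choice here is an assumption made for illustration, and the summary statistics are hypothetical.

```python
import math

def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument MR estimate (Wald ratio) with a first-order SE.

    beta_exposure: effect of the mQTL variant on CpG methylation.
    beta_outcome:  effect of the same variant on the trait.
    """
    beta_mr = beta_outcome / beta_exposure
    se_mr = se_outcome / abs(beta_exposure)
    z = beta_mr / se_mr
    p_mr = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return beta_mr, se_mr, p_mr

# Hypothetical summary statistics, not taken from the study.
beta_mr, se_mr, p_mr = wald_ratio(beta_exposure=0.05, beta_outcome=-0.04, se_outcome=0.005)
print(f"beta_MR = {beta_mr:.2f}, SE = {se_mr:.2f}, P_MR = {p_mr:.1e}")
```

The first-order standard error ignores uncertainty in the variant-exposure estimate, which is a common simplification when the instrument is strong.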
Table 3.
Mendelian Randomization of CHIP-associated CpGs and CVD-related
Outcomes
CHIP Category         | Exposure   | Outcome                  | β     | SE    | P-value | FDR     | Nearby Gene | Association with CHIP*
DNMT3A                | cg11250194 | LDL cholesterol          | −0.89 | 0.074 | 2.0E-33 | 1.1E-28 | FADS2       | -
TET2                  | cg20305489 | Diastolic blood pressure | 10    | 0.89  | 1.7E-31 | 6.7E-27 | SEPT9       | +
TET2                  | cg20305489 | Systolic blood pressure  | 16    | 1.5   | 3.6E-26 | 1.3E-21 | SEPT9       | +
Any CHIP/DNMT3A/ASXL1 | cg00776080 | Diastolic blood pressure | −28   | 2.7   | 2.6E-24 | 7.9E-20 | TENC1       | -
Any CHIP/ASXL1        | cg11879188 | Diastolic blood pressure | 2.7   | 0.27  | 1.9E-23 | 4.7E-19 | ABO         | -
Any CHIP              | cg24530246 | HDL cholesterol          | 0.46  | 0.047 | 1.2E-22 | 2.7E-18 |             | -
DNMT3A                | cg11250194 | HDL cholesterol          | −1.5  | 0.16  | 1.3E-22 | 2.7E-18 | FADS2       | -
DNMT3A                | cg11250194 | Triglycerides            | 1.4   | 0.14  | 1.8E-22 | 3.6E-18 | FADS2       | -
DNMT3A                | cg16517298 | HDL cholesterol          | 0.56  | 0.062 | 1.1E-19 | 1.6E-15 | GALNT2      | -
Any CHIP/DNMT3A       | cg17892169 | Diastolic blood pressure | 8.9   | 0.99  | 2.2E-19 | 3.2E-15 | TNFSF12     | -
TET2                  | cg01687878 | Diastolic blood pressure | −6.7  | 0.75  | 3.8E-19 | 5.2E-15 |             | +
TET2                  | cg10632966 | Systolic blood pressure  | 25    | 2.8   | 3.3E-18 | 3.8E-14 |             | +
Any CHIP/ASXL1        | cg11879188 | Myocardial infarction    | −0.99 | 0.11  | 4.8E-18 | 5.3E-14 | ABO         | -
DNMT3A                | cg16517298 | Triglycerides            | −0.55 | 0.064 | 8.0E-18 | 7.9E-14 | GALNT2      | -
Any CHIP              | cg00417151 | HDL cholesterol          | 0.85  | 0.10  | 2.7E-17 | 2.6E-13 | RRBP1       | -
TET2                  | cg16060189 | Type 2 diabetes          | 2.4   | 0.29  | 1.1E-16 | 9.4E-13 |             | +
TET2                  | cg14016363 | Diastolic blood pressure | −21   | 2.5   | 1.3E-16 | 1.0E-12 |             | +
Any CHIP              | cg00526336 | Triglycerides            | −2.1  | 0.28  | 2.5E-14 | 1.4E-10 | GALNT2      | -
Any CHIP/DNMT3A       | cg06346307 | Systolic blood pressure  | −11   | 1.5   | 2.7E-14 | 1.5E-10 | COMT        | -
Any CHIP              | cg19758448 | HDL cholesterol          | −0.54 | 0.073 | 1.0E-13 | 5.5E-10 | PGAP3       | -
The effect size (β), standard error (SE), and P-values were derived
from two-sample Mendelian randomization (MR) tests. Two-sided tests
were used for all analyses. FDR values were calculated using the
Benjamini-Hochberg FDR method.
*“Association with CHIP”: “+” or “-” represent the directions of effect
for any CHIP, DNMT3A, TET2, or ASXL1 CHIP in meta-EWAS.
CHIP Clonal Hematopoiesis of Indeterminate Potential.
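The Benjamini-Hochberg FDR values in Table 3 come from a standard step-up adjustment of the raw P-values. A minimal sketch of that adjustment is shown below; the input P-values are hypothetical, and the study's own implementation is not described in this section.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical P-values for five MR tests.
raw = [0.001, 0.04, 0.03, 0.2, 0.01]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"P = {p:<6} FDR = {q:.3f}")
```

Each adjusted value is the raw P-value scaled by the number of tests over its rank, capped so that adjusted values never decrease as raw P-values increase.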
Discussion
We report the results of a multiracial meta-EWAS of CHIP and identify
thousands of CpG sites across the genome that are significantly
associated with any CHIP and with DNMT3A, TET2, and ASXL1 CHIP. Of
note, the vast majority of the CpGs were trans- relative to the CHIP
driver gene. This appears to be consistent with the functions of
DNMT3A, TET2, and ASXL1 in globally altering DNA methylation levels of
CpG sites genome wide, as seen in the EWAS of each of the three CHIP
driver genes, where the significantly associated CpGs were numerous and
located diffusely across the genome. The methylomic signatures of CHIP
and CHIP driver genes were experimentally validated with
human-engineered CHIP cells. Downstream analyses were conducted to
assess whether these alterations in DNA methylation levels may be
causally associated with CVD-related outcomes and all-cause mortality.
Causal inference analyses using two-sample MR revealed evidence of a
possible causal role of CHIP-associated CpGs in various CVD-related
traits and all-cause mortality.
For the experimental validation of our meta-EWAS results, any
CHIP-associated CpG sites were significantly enriched in
DNMT3A-engineered cells, which was expected given the overwhelming
predominance of DNMT3A CHIP among total CHIP cases reported in our
study and several others^[149]2,[150]5,[151]12. Interestingly,
TET2-associated CpG sites were enriched in ASXL1-engineered cells. This
finding is consistent with the substantial CpG overlap (~1000 shared
CpGs) between TET2 and ASXL1 CHIP from the meta-EWAS and suggests that
the epigenetic regulators TET2 and ASXL1 impact several of the same
genome regions and may lead to similar downstream consequences.
Notably, ASXL1-associated CpGs showed no significant enrichment in the
ASXL1-engineered cells. The lack of enrichment of ASXL1-associated CpGs
in the ASXL1-engineered cells may limit the validity of the study’s
downstream analyses with ASXL1 CHIP. This observation may be due to the
limited number of ASXL1 CHIP cases in the EWAS as well as several
biological factors. ASXL1 mutations primarily affect histone
modifications, particularly H2AK119 ubiquitination, which indirectly
influences chromatin accessibility^[152]24. Recent studies have shown
that ASXL1 loss-of-function mutations increase chromatin accessibility,
potentially resulting in individualistic methylation changes influenced
by other genetic and environmental factors^[153]25. Furthermore, the
effects of ASXL1 mutations on methylation might be temporally dynamic
or cell-type specific, aspects not fully captured in our current
experimental design. Future studies with larger sample sizes,
particularly for ASXL1 CHIP, and longer observation periods may help
elucidate the relationship between ASXL1 mutations and DNA methylation
patterns in CHIP and determine with greater certainty the epigenetic
signatures of this CHIP driver mutation.
By incorporating an engineered reductionist system, we provide an
orthogonal approach to confirm that the patterns observed in CHIP
donors result directly from the somatic mutation. By engineering
DNMT3A, TET2, and ASXL1 mutations into healthy CD34+ cells and
performing DNA methylation profiling, we can recapitulate the DNA
methylation patterns seen in CHIP donors. Our study also establishes
this reductionist system as a robust representation of the methylation
phenotype. Future studies leveraging this system will enable more
precise dissection of the causal relations between these mutations and
changes in DNA methylation than would be possible from population-scale
epidemiology data alone.
In the in vitro validation analysis, we focused on the subset of CpG
sites that were significantly associated with CHIP in the EWAS and
nominally differentially methylated in the in vitro CHIP model to
improve the validity of our findings and reduce the likelihood of false
positives. By concentrating on overlapping CpG sites, we prioritized
CpGs with a stronger potential biological relevance, as they were
consistent across population-level and experimental settings.
Importantly, despite the strong directional concordance in the subset
of overlapping CpG sites, a small proportion of the CpGs identified
from the meta-EWAS were captured in the in vitro model: ~14% (855/5990)
for DNMT3A, ~6% (312/5633) for TET2, and ~2% (139/6078) for ASXL1 CHIP.
This may be because the in vitro system does not fully capture the
complex in vivo environment in which CHIP is influenced by various cell
types, environmental factors, and systemic interactions, such as immune
system interactions.
Two-sample MR analysis identified 261 differentially methylated CpG
sites that were putatively causally related to one or more CVD traits
and/or all-cause mortality. For example, cg11250194 was putatively
causally associated with four CVD-related cardiometabolic traits: LDL
cholesterol, HDL cholesterol, triglycerides, and fasting glucose.
This CpG resides in the FADS2 gene. It is hypomethylated in association
with DNMT3A CHIP (β = −0.022, P = 1.6E-13), and this finding was
replicated in the DNMT3A CHIP-engineered cells. The FADS2 gene encodes
fatty acid desaturase 2, the first rate-limiting enzyme in the
biosynthesis of polyunsaturated fatty acids^[154]26. A recent study found that
cg11250194 (FADS2) was associated with Alternative Healthy Eating Index
and that hypermethylation of this CpG was associated with lower
triglyceride levels^[155]27. Moreover, cg11250194 was previously
identified in an EWAS of lipid-related metabolic measures^[156]28.
Based on our findings, hypomethylation of this diet-associated CpG may
be linked to higher triglyceride levels, putatively increasing the risk
for CVD. FADS2 overexpression has also been found to promote clonal
formation^[157]26. Thus, FADS2 may be an important gene connecting CHIP
with diet. Of note, of the 30 CpGs associated with either
Mediterranean-style Diet Score or Alternative Healthy Eating Index or
both in a 2020 study by Ma et al.^[158]27, 17 were CHIP-associated CpGs
(~57%) identified from our multiracial meta-EWAS of CHIP. The
substantial overlap between diet- and CHIP-associated CpGs is
consistent with the hypothesis that an unhealthy diet may be associated
with CHIP through epigenetic mechanisms. Despite including smoking
status as a covariate in the models for all participating cohorts in
both studies^[159]28, we recognize that our ability to fully separate
the effects of diet from smoking on CHIP risk may be limited due to our
cross-sectional study design. As a result, some of the observed
association between unhealthy diet and CHIP through epigenetic
mechanisms may reflect residual smoking effects.
Compared to a previously published EWAS of CHIP^[160]12 (N = 3273, 228
CHIP cases), the present study has a substantially larger sample size
(N = 8196, 462 CHIP cases), including all the samples from the previous
study. With the larger sample size of the present study, we identified
6687, 3524, and 4678 novel CpGs significantly associated with any CHIP
and with the top two CHIP driver genes DNMT3A and TET2. Of the CpG
sites identified from the previous EWAS study at P < 1 × 10^−7, a large
proportion overlapped and have concordant directions of effect with
CpGs from the multiracial meta-EWAS of CHIP at P < 1 × 10^−7: 91%
(2928/3217) for any CHIP, 89% (2466/2769) for DNMT3A CHIP, and
90% (955/1059) for TET2 CHIP. This is expected, as almost half of the
CHIP cases in our meta-EWAS of CHIP are from the previous EWAS of
CHIP^[161]12. Additionally, we report thousands of ASXL1
CHIP-associated CpGs from an EWAS of ASXL1 CHIP. Through eQTM analysis
that identified CpG-transcript pairs, the top eGenes in ASXL1 CHIP
relate to various immune processes, suggesting that dysregulated immune
function may contribute to ASXL1 CHIP-related disease outcomes. This
putative role of ASXL1 CHIP in perturbing immune function, specifically
T cell function, has been recently reported using an ASXL1 CHIP
conditional knock-in mouse model^[162]29. Notably, several of the ASXL1
CHIP-associated CpGs displayed putatively causal relations to
CVD-related traits in MR analysis, including cg11879188 (in ABO).
While there are several strengths of our study, some limitations should
be noted. Although smoking status was included as a covariate in the
statistical models for all study cohorts, there could be residual
confounding as smoking behavior may not be fully adjusted for in the
analysis. Thus, smoking could still be driving the association between
CHIP and CVD, as was reported in a recent study of CH^[163]30. Of note,
there are several studies across diverse populations and in different
settings that controlled for smoking as a covariate and also found an
association of CHIP (a specific form of CH) with CVD. For instance, in
a recent study, Diez-Diez et al.^[164]31 clarified the directionality
of the CHIP-CVD relationship with adjustment for smoking and concluded
that CH confers an increased risk of developing atherosclerosis.
Additionally, the way that the DeCODE^[165]30 investigators ascertained
“CH” may have contributed to their finding that CH was not associated
with CVD, as it is distinct from the definition of “CHIP.” CH includes
clonal events with known leukemic driver gene mutations, such as CHIP
and mosaic chromosomal alterations (mCAs), and clonal events without
clear driver genes. Given recent findings that each of these distinct
classes of CH has unique phenotypic consequences^[166]2,[167]32, the
lack of association between CH and CVD reported by Stacey et al. may
be due to the grouping of heterogeneous CH subtypes.
Moreover, it has been previously demonstrated that small changes in the
stringency with which CHIP is ascertained can have an outsize effect on
downstream analyses. For example, Vlasschaert et al.^[168]33 reported
the importance of CHIP detection stringency in relation to
CHIP-associated CVD risk. Specifically, more stringent criteria (≥5
supporting reads) were associated with CVD risk, while less stringent
criteria (≥3 supporting reads) attenuated the association of CHIP with
CVD. Their study provides an up-to-date and nuanced explanation of the
CHIP-CVD relationship.
Driverless CH is the occurrence of clonal expansions in blood without a
known CHIP driver mutation and is estimated to drive the majority of
clonal expansions in the elderly^[169]34. Bernstein et al.^[170]34
identified regions within exome sequences that are under positive
selection to identify additional driver mutations in whole blood (large
clones >0.1) together with validation of positive selection in single
cell-derived hematopoietic myeloid and lymphoid colonies. The inclusion
of mutations in these fitness-inferred CH genes increases prevalence of
CH by 18% in the UK Biobank cohort. In our study, CHIP was defined when
an individual harbored at least one deleterious insertion/deletion or
single nucleotide variant in any of the 74 genes that have been
previously linked to myeloid malignancy at a variant allele frequency
of at least 2%. Given the study’s scope, we did not include driverless
CH in our CHIP definition. Thus, CHIP prevalence in our study may be
underestimated relative to studies that account for driverless clonal
expansion.
A larger sample size is needed to examine less frequently mutated CHIP
driver genes, such as TP53, JAK2, and PPM1D. Moreover, the reported
putatively causal associations of CpGs with CVD outcomes and mortality
were based on two-sample MR analysis. Despite our attempt to minimize
horizontal pleiotropy by excluding cis-mQTLs that also serve as
trans-mQTLs and excluding CpGs with three or more independent
instrumental variables through MR-Egger using a threshold of
P-value < 0.05, the two-sample MR approach has known
limitations^[171]35. Methods for detecting and addressing pleiotropy
may be ineffective^[172]36 and, thus, longitudinal and functional
studies are needed to reinforce causal findings.
To account for the possibility that the VAF of CHIP driver gene
mutation may influence DNA methylation, we used a threshold of
FDR < 0.05 to detect associations between CpGs and VAF. There were no
significant associations between CpG sites and VAF. Based on these
results, we do not believe that VAF significantly influenced our DNA
methylation findings. Notably, the sample size with available VAF
information is limited. For example, in the FHS cohort, we have only
166 CHIP cases and cannot completely rule out the possibility that VAF
may still have a modest impact on DNA methylation. Experimental studies
or studies with larger sample sizes may be necessary to address the
effect of VAF of mutation on DNA methylation.
While this study benefits from a large sample size, which allows for
robust statistical comparisons, we acknowledge the distinction between
statistical significance and biological significance. The statistical
significance observed in this study does not necessarily equate to
meaningful biological effects. The impact of cell type correction on
downstream analyses, particularly in heterogeneous patient populations,
has not been fully validated. Further research is needed to determine
how this adjustment translates into biological outcomes. Additionally,
our study evaluated individual CpG sites; however, differentially
methylated regions consisting of several consecutive methylated CpGs
have been shown to have important implications for disease
pathogenesis^[173]37. Thus, studies exploring these broader methylation
patterns are warranted to better capture the functional relevance of
epigenetic signatures of CHIP.
Last, although cell-type proportions were included as covariates for
all cohorts, we cannot exclude the possibility that subtle uncorrected
effects in cell-type proportions due to clonal selection in immune
cells may contribute to the enrichment of immune function observed for
TET2 and ASXL1 CHIP eGenes. While cell-type adjustments reduce
confounding effects, residual contributions from altered immune cell
proportions remain possible. Future studies investigating cell-type
specific DNA methylation and gene expression may provide additional
clarity on the impact of CHIP on immune gene expression.
Overall, our study sheds light on the epigenetic changes linked to CHIP
and CHIP subtypes and their associations with CVD-related outcomes. The
differentially expressed genes and pathways linked to the epigenetic
features of CHIP may serve as therapeutic targets for CHIP-related
diseases. For example, Fc receptor-like protein 3 (FCRL3) (cg17134153,
Fx = -5.5, P = 1E-113) is the top differentially expressed gene for
TET2 CHIP. FCRL3 encodes a type I transmembrane glycoprotein that is
expressed by lymphocytes and plays a role in modulating immune
responses^[174]38. Polymorphisms in this gene have been implicated in
the pathogenesis of autoimmune diseases^[175]38,[176]39. A recent study
demonstrated that FCRL3 stimulation of regulatory T cells induced
production of pro-inflammatory cytokines, including IL-17 and
IL-26^[177]38. This finding suggests that FCRL3 may play a critical
role in mediating the transition of regulatory T cells to a
pro-inflammatory phenotype and could potentially contribute to the
increased inflammation observed among TET2 CHIP
carriers^[178]40,[179]41. Additionally, Clark et al. identified FCRL3
as a gene for which DNA methylation at the CpG site cg17134153 in CD4^+
T cells likely mediates the genetic risk for rheumatoid
arthritis^[180]42. Given that CHIP, including TET2 CHIP, has been
associated with rheumatoid arthritis (RA)^[181]43, the regulation of
FCRL3 expression by methylation changes at cg17134153 may, in part,
serve as the functional basis of the observed association between CHIP
and RA. Further experimental studies are warranted to better understand
how differential expression of FCRL3 may impact TET2 CHIP development
and the pathogenesis of RA. Taken together, our results provide insight
into the molecular mechanisms underlying age-related diseases, namely
cardiovascular disease.
Methods
Ethics
All participants provided written, informed consent. The study protocol
was approved by the following institutional review boards at each
collaborating institution: Institutional Review Board at Boston Medical
Center (FHS); University of Washington Institutional Review Board
(CHS); University of Mississippi Medical Center Institutional Review
Board (ARIC: Jackson Field Center); Wake Forest University Health
Sciences Institutional Review Board (ARIC: Forsyth County Field
Center); University of Minnesota Institutional Review Board (ARIC:
Minnesota Field Center); Johns Hopkins University School of Public
Health Institutional Review Board (ARIC: Washington County Field
Center); University of Mississippi Medical Center (JHS); Jackson State
University (JHS); and Tougaloo College (JHS). All research was
performed in accordance with relevant ethical guidelines and
regulations. The study design and conduct adhered to all relevant
regulations regarding the use of human study participants and was
conducted in accordance with the criteria set by the Declaration of
Helsinki.
Study cohorts
The Framingham Heart Study (FHS) is a prospective, observational
community-based cohort investigating risk factors for CVD. For our
discovery sample, DNAm was measured from FHS participants (N = 3295) in
the Offspring cohort (N = 1860; Exam 8; years 2005-2008)^[182]37 and in
the Third Generation cohort (N = 1435; Exam 2; years
2008-2011)^[183]44. CHIP calls were based on whole-genome sequencing of
whole blood DNA samples, the majority of which were from FHS Offspring
participants at Exam 8 and Gen 3 participants at Exam 2 and temporally
concordant with the time of DNAm profiling. All FHS participants
self-identified as White at the time of recruitment.
The Jackson Heart Study (JHS) is an observational community-based
cohort studying the environmental and genetic factors associated with
CVD in African Americans. For our discovery sample, data were collected
from 1664 JHS participants^[184]12. DNAm was measured from the majority
of JHS participants at visit 1, with a small subset at visit 2. CHIP
calls were concurrent with DNAm profiling and based on whole-genome
sequencing of whole blood DNA samples, where the majority were from
visit 1 (years 2000–2004) and a subset from visit 2 (years
2005-2008)^[185]12. All JHS participants self-identified as Black or
African American at the time of recruitment. No ancestry outliers were
excluded, as inferred based on genetic similarity to reference panels.
Similarity to the 1000 Genomes AFR reference panel varied by individual
(study Q1, median, and Q3: 77.9%, 84.3%, and 89.0%, respectively) in the
methylation and WGS overlap dataset, using estimates from RFMix.
The Cardiovascular Health Study (CHS) is a population-based cohort
study of risk factors for CVD in adults aged 65 or older^[186]45. DNAm
was measured from blood samples collected from participants at both
years 5 and 9, at year 5 only, or at year 9 only. CHIP calls were based on whole-genome
sequencing of blood samples, where the majority were taken 3 years
before or concurrently with the first DNAm measurement^[187]12. CHS
participants self-reported their race at the time of recruitment.
The Atherosclerosis Risk in Communities (ARIC) study is a prospective,
multiracial cohort study of risk factors and clinical outcomes of
atherosclerosis^[188]38. DNAm was measured from 2655 ARIC participants
at visit 2 (1990-1992) or visit 3 (1993-1995). CHIP calls were based on
whole exome sequencing of blood samples from visit 2 and visit
3^[189]12,[190]39. ARIC participants self-identified their race at the
time of recruitment. There is a subset of participants included in both
ARIC and JHS. These overlapping participants were not excluded.
DNA methylation profiling
All DNA samples were from whole blood. The four cohorts (FHS, JHS, CHS,
and ARIC) conducted independent laboratory DNAm measurements and
quality control, including sample-wise and probe-wise filtering and
probe intensity background correction (see Supplementary Information
File). DNA methylation was measured in FHS, CHS, and ARIC
participants using the Illumina Infinium Human Methylation-450 Beadchip
(450 K array) and in JHS participants using the Illumina EPIC
array^[191]40,[192]41.
CHIP calling
For the purposes of this investigation, CHIP was defined as a candidate
driver mutation in a gene reported to be associated with hematologic
malignancy that is present at a variant allele frequency (VAF) of at
least 2% in peripheral blood in the absence of hematologic
malignancy^[193]42. CHIP was detected in FHS, JHS, and
CHS from WGS blood DNA in the NHLBI Trans-Omics for Precision Medicine
(TOPMed) consortium using the Mutect2 software^[194]5. In ARIC, CHIP
calls were based on whole exome sequencing of blood DNA using the same
procedure^[195]5. An individual was classified as having CHIP when harboring at
least one pre-specified deleterious insertion/deletion or single
nucleotide variant in any of the 74 genes linked to myeloid malignancy
at a variant allele frequency (VAF) ≥ 2%^[196]5. TOPMed WGS samples
were sequenced to a median depth of 40x, with the sequencing depth
ranging from 30x-50x for a specific region. At this sequencing depth,
CHIP can be reliably ascertained at a VAF > 10%, but CHIP variants
with a VAF ≤ 10% cannot be robustly captured^[197]5. For a
sensitivity analysis, race-stratified and multiracial meta-EWAS of any
CHIP was performed using a more restrictive CHIP clone size of
VAF > 10% (See Supplementary Fig. [198]4 and Supplementary
Data [199]11-[200]13).
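The VAF thresholds above can be illustrated with a short calculation. The sketch below is only an arithmetic illustration, not the Mutect2 calling procedure used in the study; the function names and the example read counts are hypothetical. It shows why, at a median depth of about 40x, a clone at 2% VAF is expected to contribute fewer than one variant-supporting read, whereas a clone at 10% VAF is expected to contribute about four.

```python
def variant_allele_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads covering a site that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def expected_alt_reads(depth: float, vaf: float) -> float:
    """Expected number of variant-supporting reads at a given depth and VAF."""
    return depth * vaf


def meets_vaf_threshold(alt_reads: int, total_reads: int,
                        min_vaf: float = 0.02) -> bool:
    """True if the observed VAF is at least min_vaf (2% by default)."""
    return variant_allele_fraction(alt_reads, total_reads) >= min_vaf


if __name__ == "__main__":
    # Hypothetical site: 6 variant reads out of 40 total reads.
    print(variant_allele_fraction(6, 40))
    # Expected supporting reads at 40x for a 2% and a 10% clone.
    print(expected_alt_reads(40, 0.02))   # less than one read
    print(expected_alt_reads(40, 0.10))   # about four reads
```

Because the expected count at 2% VAF is below one read, detection of such clones at this depth depends heavily on sampling noise, which is consistent with the sensitivity analysis restricted to VAF > 10%.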
Cohort-specific EWAS
The correction of methylation data for technical covariates was cohort
specific. Each cohort performed an independent investigation to select
an optimized set of technical covariates (e.g., batch, plate, chip,
row, and column), using measured or imputed blood cell type fractions,
surrogate variables, and/or principal components. Most cohorts had
previous publications using the same dataset for EWAS of different
traits, such as EWAS of alcohol drinking and smoking. In this study,
those cohorts used the same strategies as they did previously for
correcting for technical variables, including batch effects. Linear
mixed models were used to test the associations between CHIP status as
the predictor variable and DNAm β values as the outcome variable.
Information about cohort-specific models is available in the
Supplementary Information File.
Meta-analysis
All analyses were contingent on self-reported Black or White race.
Previous ancestry inference in these cohort studies^[201]43 suggests
high genetic similarity of nearly all self-identified White
participants to EUR reference panels (including 1000 Genomes).
Self-identified Black participants have high but variable (average ~80%
but may vary based on study and by study participant) genetic
similarity to AFR reference panels and have some similarity to EUR
reference panels as well. In some cases, extreme ancestry outliers may
have been removed during study-specific QC. However, this has not been
thoroughly documented in the data we received from participating
studies. Importantly, we do not mean to imply that socially constructed
racial identities reported by study participants are synonymous with
genetic ancestry. Stratification by race may, however, capture
differential social and environmental exposures within the US, which
may impact the epigenome.
The meta-analysis was performed for any CHIP, DNMT3A, and TET2
separately in White participants from FHS, CHS, and ARIC (n = 4355) and
in Black participants from JHS, CHS, and ARIC (n = 3841), using
inverse variance-weighted fixed-effects models implemented in the
metagen() function of the R package meta
([202]https://rdrr.io/cran/meta/man/metagen.html). Summary statistics
from the previous EWAS of CHIP were used for the ARIC and CHS
cohorts^[203]12. Then, multiracial meta-analysis was performed for
White and Black participants (n = 8,196). The meta-analysis was
constrained to methylation probes passing filtering criteria in all
cohorts.
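For readers unfamiliar with the estimator, the inverse variance-weighted fixed-effects model for a single CpG can be sketched as follows. This is a minimal Python illustration of the standard formula (weights w_i = 1/SE_i^2), not the metagen() implementation used in the study; the function name and the example effect sizes are hypothetical.

```python
import math


def ivw_fixed_effect(betas, ses):
    """Inverse variance-weighted fixed-effects pooled estimate for one CpG.

    betas: per-cohort effect estimates
    ses:   matching per-cohort standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]        # w_i = 1 / SE_i^2
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided normal p-value; erfc keeps precision for large |z|.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


if __name__ == "__main__":
    # Hypothetical per-cohort estimates for one CpG.
    beta, se, z, p = ivw_fixed_effect([-0.020, -0.024], [0.004, 0.004])
    print(beta, se, z, p)
```

Cohorts with smaller standard errors receive larger weights, so the pooled estimate is dominated by the most precisely estimated cohort-level effects.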
Supplementary Fig. [204]1 presents QQ plots with genomic control (GC)
inflation factor (λ) to illustrate the EWAS results in each cohort and
in the meta-analysis. Our observations reveal a prevalence of high
inflation factors (λ > 1.1) across nearly all studies. Such elevated
inflation factors typically signal potential bias in the analysis
process. However, it’s important to note that in cases where a
significant portion of CpG sites exhibit differential methylation
associated with the outcome (e.g., age and CHIP), this can contribute
to the observed high λ values. Moreover, adjusting for additional PCs
moderately associated with the outcome may alleviate lambda values,
albeit at the expense of reduced power to detect CpGs related to the
outcome. To address this, we adopted strategies consistent with those
employed by the respective cohorts in previous analyses, focusing on
correcting for technical variables and latent factors identified in
prior studies across multiple outcomes^[205]46–[206]48. Furthermore,
prior to meta-analysis, we implemented additional corrections for
individual study results exhibiting λ > 1.5, ensuring the integrity of
our findings. The statistical significance threshold was
P < 0.05/400,000 ≈ 1 × 10^−7. A less stringent threshold, the
Benjamini-corrected FDR adjusted p-value < 0.05, was also used.
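The quantities in this paragraph can be made concrete with a short sketch. The code below computes the Bonferroni threshold (0.05/400,000), the genomic control inflation factor λ as the ratio of the median observed chi-square statistic to the median of a 1-df chi-square distribution, and FDR-adjusted p-values. The FDR step assumes the Benjamini-Hochberg procedure, which the text does not state explicitly; all function names are hypothetical.

```python
from statistics import NormalDist, median


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def genomic_inflation(z_scores) -> float:
    """lambda = median(z^2) / median of a chi-square(1 df) distribution."""
    # Median of chi-square(1) equals the squared 75th percentile of N(0, 1),
    # which is approximately 0.455.
    expected_median = NormalDist().inv_cdf(0.75) ** 2
    return median(z * z for z in z_scores) / expected_median


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(bonferroni_threshold(0.05, 400_000))   # 1.25e-07, rounded to 1e-07 in the text
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A λ close to 1 indicates test statistics consistent with the null distribution; as noted above, values above 1 can reflect either bias or a large number of true associations.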
Expression quantitative trait methylation analysis
Association tests of DNAm and gene expression were previously performed
in 2115 FHS participants in the Offspring (n = 686) and Third
Generation (n = 1429) cohorts with available whole blood DNA
methylation and RNA-seq gene expression data to identify CpG sites at
which differential methylation is associated with gene
expression^[207]49. Approximately 70,000 significant cis CpG-transcript
pairs were identified at P < 1 × 10^-7. Cis was defined as a CpG located
within 100 kb of the transcription start site of an mRNA. When
calculating the association between CpG sites and gene-level
transcripts, linear regression models were used. Residualized gene
expression served as the outcome and residualized DNA methylation β
value as the primary explanatory variable, with adjustment for age,
sex, white blood cell count, blood cell fraction, platelet count, five
gene expression PCs, and ten DNA methylation PCs. Through integration
of CpGs and gene-level transcripts (mRNAs) from RNA-seq, mRNAs were
identified that were significantly associated with each of the CpGs in
cis for any CHIP and the CHIP subtypes^[208]49,[209]50.
Pathway enrichment analysis
Enrichment analysis for CHIP EWAS signatures with a significance
threshold of P < 1 × 10^-7 was conducted on gene sets comprising genes
annotated to CpGs associated with CHIP and major CHIP subtypes using
the missMethyl R package. This package adjusts for known DNAm array
bias^[210]51. For the enrichment analysis for eQTM gene sets, the DAVID
Bioinformatics online tool was used
([211]https://david.ncifcrf.gov/home.jsp). To improve the focus of this
study, only the results of Gene Ontology (GO) terms related to
biological process and KEGG pathways were used. Over-representation
enrichment tests, specifically one-sided Fisher’s exact tests, were
used to assess whether a GO/KEGG term is significantly enriched
compared to the background. A significance threshold of FDR adjusted
p-value < 0.05 was used, corrected for the number of terms tested^[212]5.
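The over-representation test described above can be sketched as the upper tail of a hypergeometric distribution, which is equivalent to a one-sided Fisher's exact test on the 2 × 2 table of set membership by term annotation. This is a plain illustration under that standard formulation; it does not reproduce the array-bias adjustment performed by missMethyl or the internals of DAVID, and the function name and example counts are hypothetical.

```python
from math import comb


def fisher_one_sided(k: int, n_set: int, n_term: int, n_background: int) -> float:
    """Over-representation p-value: P(X >= k) under the hypergeometric null.

    k:            genes in the test set annotated to the term
    n_set:        size of the test gene set
    n_term:       genes in the background annotated to the term
    n_background: total number of background genes
    """
    upper = min(n_set, n_term)
    denominator = comb(n_background, n_set)
    tail = sum(
        comb(n_term, x) * comb(n_background - n_term, n_set - x)
        for x in range(k, upper + 1)
    )
    return tail / denominator


if __name__ == "__main__":
    # Hypothetical counts: 2 of 3 test genes hit a term with 4 of 10 background genes.
    print(fisher_one_sided(2, 3, 4, 10))
```

The resulting p-values, one per GO or KEGG term, would then be adjusted for the number of terms tested before applying the FDR threshold.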
Cell culture of mPB CD34+ cells
Patients were given G-CSF ≤ 10 mcg/kg/day for up to 5 days. Peripheral
blood mononuclear cells were collected and CD34+ cells were isolated
using a MACS sorter. Samples were then counted and frozen for
future use. This research was funded by NIDDK. Mobilized peripheral
blood (mPB) CD34+ cells were purchased from StemCell Technologies or the
Cooperative Center of Excellence in Hematology (CCEH) at the Fred Hutch
Cancer Research Center, Seattle, USA. The names and sources of all cell
lines used are as follows: mPB-001: Sex: Female, Supplier: Fred
Hutchinson; mPB-002: Sex: Male, Supplier: Fred Hutchinson; mPB-003: Sex:
Female, Supplier: StemCell Technologies; mPB-004: Sex: Male, Supplier:
StemCell Technologies; mPB-005: Sex: Male, Supplier: StemCell
Technologies. CD34+ cells were thawed and cultured in CD34+ expansion
medium (StemSpan II (StemCell Technologies) + 10% CD34+ expansion
supplement (StemCell Technologies) + 20 U/mL penicillin-streptomycin
(Gibco) + 500 nM UM729 (StemCell Technologies) + 750 nM StemRegenin-1
(StemCell Technologies)) for 48 h prior to editing with CRISPR-Cas9.
After 48 h, samples were electroporated with RNP complexes and seeded
at 400k cells per mL. Cells were maintained between 200k - 1M cells per
mL.
CRISPR-Cas9 of mPB CD34+ cells
Ribonucleoprotein (RNP) complexes targeting scramble, AAVS, TET2,
ASXL-1, and DNMT3A were made by incubating Cas9 (IDT Alt-R HiFi sp Cas9
Nuclease V3) and sgRNA (IDT Alt-R Cas9 sgRNAs) at a 1:3.26 ratio.
Guides for each gene are listed in Supplementary Table [213]1. On day
2 post-thaw, mPB CD34+ cells were counted and resuspended in Buffer R or
GE Buffer. RNP complexes and cells were mixed and electroporated using a
Neon Pipette (Thermo Scientific Inc.) with the following settings:
1650 V, 10 ms, 3 pulses. Samples were seeded in expansion media at
400k/mL.
Assessment of indel formation
Genomic DNA (gDNA) was isolated and amplified with the following
conditions: 95 °C for 2 min followed by 35 cycles of 95 °C for 45 s,
61-62 °C for 1 min, 72 °C for 2 min with a final extension at 72 °C for
5 min using primers towards TET2, ASXL-1, and DNMT3A (Supplementary
Table [214]2). PCR products were sent to GeneWiz (Azenta Life Sciences)
where PCR cleanup and Sanger sequencing were performed. Indel formation
was assessed using TIDE (Supplementary Table [215]3)^[216]52.
FACS sorting of mPB CD34+ cells
Edited CD34+ cells were sorted at day 7 post CRISPR-Cas9 using a
FACSymphony™ S6 Cell Sorter or a BD FACS Aria II to remove
differentiated cells. Briefly, CD34+ cells were washed in cell staining
buffer (Biolegend) once and stained with antibodies targeting CD34
(Biolegend: 343614; dilution: 1:50), CD38 (Biolegend: 303532; dilution:
1:100), and Lineage Markers (Biolegend: 348805; dilution: 1:10) for
30 min at 4 °C in the dark (Supplementary Fig. [217]6). The antibodies
are listed in Supplementary Table [218]4.
Duet evoC library generation and primary methylation analysis
DNA was extracted using Micro kits (Qiagen) from flow sorted cells from
3–5 donors. EvoC libraries were created following manufacturer
instructions (Biomodal). Briefly, DNA was sheared using a Covaris LE220,
and input DNA was assessed using a Bioanalyzer instrument
(Agilent) and Qubit (ThermoFisher). Library generation was performed
according to the duet evoC library generation protocol (biomodal).
Sequencing of duet evoC libraries
Capture of CpG sites was performed using Twist Human Methylome Panel
(Twist Biosciences) and next generation sequencing was completed by
using the NovaSeq 6000 (150 bp paired-end reads) targeting 160 M reads
per sample. Biomodal pipeline version 1.1.1 was used to analyze the raw
FASTQs with default settings. Briefly, adapter trimming was performed
with cutadapt, followed by resolution of R1 and R2 to generate
single-end reads with epigenetic information, mapping onto the human
genome (GRCh38), and quantification of the modification state of each
CpG site.
Comparisons between EWAS and biomodal data
For each sample and for each CpG, read counts from the forward and
reverse strand were summed and the mC fraction calculated as the number
of reads supporting mC divided by the total number of reads with
modified or unmodified C (excluding reads with A, T or G). The dataset
was reduced to the CpGs with significant levels of association from
each EWAS analysis. For each of these CpGs, methylation difference was
calculated as the difference between the average mC fraction of
multiple replicates of different KO primary cells (“DNMT3A”, “TET2”,
“ASXL-1”) and the average mC fraction of multiple replicates of control
cells (“Scramble” or “AAVS”). Only CpGs with uncorrected
p-values < 0.05 (t-test) were carried forward. For each EWAS analysis
(“any-CHIP”, “DNMT3A_chip”, “TET2_chip”, “ASXL1_chip”) and for each
gene-KO (“DNMT3A”, “TET2”, “ASXL-1”), the mC fraction of these CpGs was
plotted against the EWAS TE, and a binomial test was used to check for
enrichment in the top-right and bottom-left quadrant indicating a sign
correlation between the mC fraction change induced by the KO and the
EWAS TE.
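The comparison described above can be sketched in three steps: computing the mC fraction per CpG from strand-summed read counts, taking the difference between knockout and control means, and testing whether the signs of these differences agree with the signs of the EWAS effect estimates more often than expected by chance. The sketch assumes a one-sided binomial test with a null probability of 0.5, which the text does not specify; function names and example values are hypothetical.

```python
from math import comb


def mc_fraction(mc_fwd: int, c_fwd: int, mc_rev: int, c_rev: int) -> float:
    """Reads supporting mC divided by reads with modified or unmodified C,
    summed over the forward and reverse strands."""
    modified = mc_fwd + mc_rev
    total = modified + c_fwd + c_rev
    if total == 0:
        raise ValueError("no informative reads at this CpG")
    return modified / total


def methylation_difference(ko_fractions, control_fractions) -> float:
    """Mean mC fraction in KO replicates minus mean in control replicates."""
    ko_mean = sum(ko_fractions) / len(ko_fractions)
    control_mean = sum(control_fractions) / len(control_fractions)
    return ko_mean - control_mean


def sign_concordance_test(differences, ewas_effects):
    """One-sided binomial test (null probability 0.5) for sign agreement.

    Returns (n_concordant, n_compared, p_value). Pairs with a zero value
    are skipped because they have no sign.
    """
    pairs = [(d, e) for d, e in zip(differences, ewas_effects)
             if d != 0 and e != 0]
    n = len(pairs)
    concordant = sum(1 for d, e in pairs if (d > 0) == (e > 0))
    p_value = sum(comb(n, x) for x in range(concordant, n + 1)) / 2 ** n
    return concordant, n, p_value


if __name__ == "__main__":
    print(mc_fraction(30, 10, 50, 10))
    print(sign_concordance_test([-0.1, -0.2, 0.3, 0.1], [-1.0, -2.0, 3.0, -1.0]))
```

Concordant pairs correspond to points in the top-right and bottom-left quadrants of the scatter plot described in the text.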
Cis-mQTLs
Methylation quantitative trait loci (mQTLs) – SNPs associated with DNA
methylation – were identified from 4,170 FHS participants as previously
reported^[219]40, including 4.7 million cis-mQTLs at P < 2 × 10^−11.
Genotypes were imputed using the 1000 Genomes Project panel phase 3
using MACH / Minimac software. SNPs with MAF > 0.01 and imputation
quality ratio >0.3 were retained. Cis-mQTLs were defined as SNPs
residing within 1 Mb upstream or downstream of a CpG site.
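The filtering criteria above amount to two simple predicates, sketched below. The thresholds (MAF > 0.01, imputation quality ratio > 0.3, and a 1 Mb window on either side of the CpG) are taken from the text; the function names, argument layout, and example coordinates are hypothetical and do not reflect the original pipeline.

```python
def passes_snp_qc(maf: float, imputation_quality: float,
                  min_maf: float = 0.01, min_quality: float = 0.3) -> bool:
    """Retain SNPs with MAF > 0.01 and imputation quality ratio > 0.3."""
    return maf > min_maf and imputation_quality > min_quality


def is_cis(snp_chrom: str, snp_pos: int, cpg_chrom: str, cpg_pos: int,
           window_bp: int = 1_000_000) -> bool:
    """A SNP is cis to a CpG if it lies on the same chromosome within
    window_bp base pairs upstream or downstream of the CpG."""
    return snp_chrom == cpg_chrom and abs(snp_pos - cpg_pos) <= window_bp


if __name__ == "__main__":
    # Hypothetical SNP 250 kb away from a CpG on the same chromosome.
    print(is_cis("11", 61_850_000, "11", 61_600_000))
    print(passes_snp_qc(maf=0.05, imputation_quality=0.9))
```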
Association of methylation with complex diseases and traits
To annotate CHIP-associated CpGs and cis-mQTLs, we utilized both the
EWAS Catalog ([220]https://www.ewascatalog.org/)^[221]22 and the GWAS
Catalog ([222]https://www.ebi.ac.uk/gwas/)^[223]20. The EWAS Catalog
compiles published CpG signatures for about 4000 traits and/or
diseases. The GWAS Catalog compiles significant SNPs associated with
thousands of traits and/or diseases. For the identified CHIP-associated
CpGs, we matched these CpGs with reported trait-associated CpGs in the
EWAS Catalog. To evaluate the enrichment of CHIP-associated CpGs for
traits listed in the EWAS Catalog, we performed one-sided Fisher’s
exact tests. We applied a Bonferroni-corrected significance threshold
of P = 1.24E-05 (0.05/4023, accounting for 4023 traits in the EWAS
Catalog). Additionally, to assess whether any cis-mQTLs of
CHIP-associated CpGs demonstrated strong associations with human
complex traits, we matched the cis-mQTLs against SNPs in the GWAS
Catalog that were reported with P < 5E-8.
Mendelian randomization analysis
To investigate whether differential methylation at
CHIP-associated CpGs causally influences risk of CVD and mortality,
two-sample Mendelian randomization (MR) was performed between exposures
(CHIP-associated CpGs) and a list of CVD- and mortality-related traits
as outcomes. We utilized our in-house developed analytical pipeline
called MR-Seek ([224]https://github.com/OpenOmics/mr-seek.git) to
perform the analysis. The full summary statistics of different GWAS
datasets were downloaded from NHGRI-EBI. The list of CVD- and
mortality-related traits and list of references of those GWAS results