Abstract
We conduct a large-scale meta-analysis of heart failure genome-wide
association studies (GWAS) consisting of over 90,000 heart failure
cases and more than 1 million control individuals of European ancestry
to uncover novel genetic determinants for heart failure. Using the GWAS
results and blood protein quantitative loci, we perform Mendelian
randomization and colocalization analyses on human proteins to provide
putative causal evidence for the role of druggable proteins in the
genesis of heart failure. We identify 39 genome-wide significant heart
failure risk variants, of which 18 are previously unreported. Using a
combination of Mendelian randomization proteomics and genetic cis-only
colocalization analyses, we identify 10 additional putatively causal
genes for heart failure. Findings from GWAS and Mendelian
randomization-proteomics identify seven (CAMK2D, PRKD1, PRKD3, MAPK3,
TNFSF12, APOC3 and NAE1) proteins as potential targets for
interventions to be used in primary prevention of heart failure.
Subject terms: Genome-wide association studies, Cardiovascular
genetics, Heart failure, Statistical methods
__________________________________________________________________
Here, the authors perform a large-scale meta-analysis of genome-wide
association studies and cis-MR proteomics to identify protein
biomarkers and drug targets for heart failure.
Introduction
Heart failure (HF) is one of the most important threats to the
sustainability of health systems in the United States^[90]1. Despite
major improvements in the understanding of risk factors for incident
HF^[91]2, this knowledge has not yet been fully translated into
effective interventions for the primary prevention of HF, except for
blood pressure (BP) lowering medications^[92]3 and statins^[93]4. Due
to the inherent attributes of human genetics that minimize the risk of
residual confounding and reverse causation^[94]5, large-scale genomic
analyses provide an opportunity to uncover putative causal mechanisms
for complex phenotypes such as HF^[95]6. Recent genome-wide association
studies (GWAS) of HF by the Heart Failure Molecular Epidemiology for
Therapeutic Targets (HERMES) and the Million Veteran Program
(MVP)^[96]7 have identified 26 genomic loci associated with HF^[97]8.
This emerging knowledge has served to identify novel biological
mechanisms associated with incident HF and may inform the development
of novel interventions for the primary prevention of HF.
Novel technological developments can simultaneously measure thousands
of human proteins in a single blood sample. The SOMAscan V4 assay
includes 5207 aptamers capable of measuring 4988 unique human proteins,
of which 514 are the target of drugs licensed or in the clinical phase,
1153 are the target of compounds in the preclinical phase, and 1377 are
proteins predicted to be druggable^[98]9,[99]10. This offers a unique
opportunity for translating the genomic findings of HF into novel
interventions for the primary prevention of HF. Given that human
proteins account for the majority of targets for approved drugs to date
and that expression or activity is central to the development of human
disease^[100]11, leveraging GWAS data of HF and protein quantitative
trait loci (pQTL) offers an opportunity to provide mechanistic insight
into the causal pathways involved in the emergence of HF as well as to
inform novel therapeutic targets.
Here, we conduct a meta-analysis of GWAS on HF from the MVP and the
HERMES consortium and leverage our GWAS of HF with pQTLs from the
Fenland study to conduct Mendelian randomization (MR) and genetic
colocalization analyses on human proteins covered by SOMAscan
V4^[101]12. We then perform extensive downstream analyses covering HF
risk factors, cardiac MRI traits, -omics, and downstream
transcriptomics analyses to investigate the biological credibility of
our genetic findings.
Results
Genome-wide meta-analysis identifies 18 novel loci for HF
We meta-analyzed GWAS on HF from the HERMES consortium and MVP
(Supplementary Data [102]1) and identified variants at GW-significance
(p < 5 × 10^−8) (Fig. [103]1). The quantile-quantile (Q-Q) plot of the
meta-analysis is shown in Supplementary Fig. [104]1. We performed
follow-up analysis of the newly discovered HF variants to identify the
likely causal gene for each signal and to investigate associations with
15 HF risk factors and nine left ventricular (LV) cardiac MRI traits.
Fig. 1. Schematic diagram of the datasets and analyses.
[105]Fig. 1
[106]Open in a new tab
HF heart failure, MVP Million Veteran Program cohort, GWAS genome-wide
association study, pQTL protein quantitative trait loci, PheWAS
phenome-wide association study, MR Mendelian randomization, FDR false
discovery rate, PP.H4 posterior probability of H[4].
We performed meta-analyses of genome-wide association results for HF
from two studies: MVP (n[cases] = 43,344; n[controls] = 258,943) and
HERMES (n[cases] = 47,309; n[controls] = 930,014). After quality
control, we obtained association results for 10,227,138 genetic
variants with HF. We observed 39 variants with genome-wide significant
signals with HF, of which 18 variants were >500KB from a previously
reported indexed variant (Fig. [107]2 and Supplementary Data [108]2).
We performed fine-mapping using GWAS summary statistics (Supplementary
Fig. [109]2). We determined the gene closest to the indexed SNP, as
well as the gene with the highest score from Polygenic Priority Score
(PoPs)^[110]13 within a 500KB region of the indexed SNP (Table [111]1).
PoPs take genome-wide features into account while the nearest gene is
based on local information, providing complementary information for
annotation of indexed variants (see Methods). For all the genes
suggested by the nearest gene and PoPS, we retrieved the results from
gene-burden tests using putative Loss-of-Function (pLoF) variants from
the Genebass-UK Biobank resource (see Methods)^[112]14. RFX4 and UBC,
both suggested by PoPs, showed the most significant gene-based p values
with HF (p values of 9.12 × 10^−4 and 4.6 × 10^−3, respectively). From
herein, we used genes suggested by PoPs as default to describe the
distinct variants.
Fig. 2. Manhattan plots showing associations with HF from a GWAS
meta-analysis on n = 1,266,315 individuals and b MR-wide proteomics.
[113]Fig. 2
[114]Open in a new tab
a Manhattan plot showing the −log[10](P value) of association for each
SNP from the GWAS meta-analysis plotted on the y-axis against genomic
position on the x-axis. The red dotted line corresponds to the
genome-wide significance threshold. The summary statistics of
independent lead SNPs are noted in Supplementary Data [115]1. b
Manhattan plot showing the −log[10]-transformed FDR-adjusted P value of
association for each gene plotted against genomic position on the
x-axis. All tests were two-sided and adjusted for multiple comparisons.
The blue line corresponds to an FDR threshold of 5% and points are
color-coded by drug tractability information based on data provided by
OpenTargets; green for druggable genes. FDR false discovery rate.
Table 1.
Loci reported for HF in the meta-analysis of HERMES and MVP HF GWAS
datasets
rsID Chr Pos Nearest Gene PoPs gene CADD Phred Score NEA EA MAF Beta SE
P value
Novel variants
rs4755720 11 43628749 HSD17B12 HSD17B12 7.526 C T 0.400 −0.037 0.006
8.14E-11
rs7766436 6 22598259 HDGFL1 HDGFL1 1.902 C T 0.286 0.037 0.006 5.18E-10
rs3820888 2 201180023 SPATS2L SPATS2L 6.261 C T 0.378 −0.034 0.006
1.20E-09
rs12992672 2 632592 TMEM18 TMEM18 1.061 G A 0.172 0.045 0.007 1.70E-09
rs10846742 12 125308682 SCARB1 UBC 0.277 G A 0.173 −0.046 0.008
1.73E-09
rs17620390 4 114384328 CAMK2D CAMK2D 1.907 C A 0.284 −0.037 0.006
1.90E-09
rs72688573 1 50746997 FAF1 FAF1 6.834 C T 0.022 −0.122 0.021 4.31E-09
rs10938398 4 45186139 GNPDA2 N/A 1.094 G A 0.421 0.033 0.006 4.50E-09
rs6945340 7 75100124 POM121C HIP1 3.58 C T 0.208 −0.040 0.007 5.89E-09
rs7564469 2 145258445 ZEB2 GTDC1 19.56 C T 0.165 −0.043 0.007 6.66E-09
rs7977247 12 107259470 RIC8B RFX4 1.868 C T 0.434 0.032 0.006 1.07E-08
rs1016287 2 59305625 FANCL N/A 19.22 C T 0.280 0.037 0.006 1.11E-08
rs959388 14 30169987 PRKD1 PRKD1 0.596 G T 0.417 −0.031 0.006 1.30E-08
rs233806 4 103212846 SLC39A8 BANK1 9.713 C T 0.207 −0.037 0.007
1.57E-08
rs17038861 2 37233265 HEATR5B PRKD3 0.403 G T 0.195 0.039 0.007
2.35E-08
rs9352691 6 79785607 PHIP PHIP 6.373 C T 0.365 0.032 0.006 2.65E-08
rs10520390 19 46327831 SYMPK DMWD 2.849 G C 0.059 0.074 0.013 2.87E-08
rs79682748 1 66989719 SGIP1 SGIP1 4.719 G A 0.018 −0.155 0.028 3.00E-08
Previously reported variants
rs7859727 9 22102165 CDKN2B CDKN2A 1.448 C T 0.488 0.061 0.006 3.11E-29
rs2634071 4 111669220 PITX2 PITX2 1.622 C T 0.219 0.079 0.007 3.64E-29
rs11642015 16 53802494 FTO RPGRIP1L 4.826 C T 0.432 0.058 0.006
2.69E-25
rs10455872 6 161010118 LPA PLG 0.146 G A 0.074 −0.104 0.011 8.20E-23
rs3176326 6 36647289 CDKN1A CDKN1A 10.88 G A 0.173 −0.068 0.007
2.51E-22
rs602633 1 109821511 PSRC1 CELSR2 8.63 G T 0.207 −0.054 0.007 5.19E-16
rs1739833 1 16331108 C1orf64 ZBTB17 4.205 C T 0.331 −0.048 0.006
7.89E-15
rs17617337 10 121426884 BAG3 BAG3 0.079 C T 0.218 −0.050 0.007 8.88E-14
rs600038 9 136151806 ABO SURF1 7.596 C T 0.215 −0.049 0.007 9.44E-14
rs34163229 10 75406912 SYNPO2L SEC24C 24 G T 0.133 −0.056 0.008
7.40E-13
rs113437066 17 65836220 BPTF BPTF 2.588 ATTT A 0.197 0.061 0.010
1.81E-10
rs11746435 5 137006762 KLHL3 HNRNPA0 7.216 T A 0.229 0.042 0.007
2.04E-10
rs2832275 21 30602994 BACH1 LTN1 1.08 T A 0.139 −0.047 0.008 2.90E-10
rs7795282 7 74122857 GTF2I GTF2IRD1 0.262 G A 0.221 −0.042 0.007
7.69E-10
rs12933292 16 69566309 NFAT5 NFAT5 0.403 G C 0.425 0.034 0.006 8.96E-10
rs216199 17 2200871 SMG6 SMG6 3.442 C T 0.388 −0.037 0.006 1.11E-09
rs2013002 12 112200150 ALDH2 ATXN2 4.336 C T 0.415 0.033 0.006 5.68E-09
rs17163345 1 222806218 MIA3 MIA3 7.541 G A 0.270 −0.034 0.006 2.15E-08
rs3764351 17 37824339 PNMT MED1 5.182 G A 0.340 −0.033 0.006 2.27E-08
rs9349379 6 12903957 PHACTR1 PHACTR1 5.478 G A 0.401 −0.031 0.006
2.58E-08
rs4327120 18 36532976 N/A N/A 1.032 C T 0.128 0.050 0.009 3.09E-08
[116]Open in a new tab
Findings were identified using fixed effects inverse-variance weighted
meta-analysis. The chromosomal position is based on GRCh37/hg19
reference. Gene names are italicized. Genes that are druggable or
predicted to be druggable are highlighted in bold.
CADD combined annotation-dependent depletion, NEA non-effect allele, EA
effect allele, MAF minor allele frequency, SE standard error.
Except for rs6945340/HIP1 and rs79682748/SGIP1, all other distinct
variants for HF had an association (defined as 0.01/number of secondary
traits, p < 1 × 10^−4) with at least one HF risk factor (Fig. [117]3a).
Five variants had the largest number of associations with HF risk
factors: rs9352691/PHIP (blood pressure, body mass index (BMI),
high-density lipoprotein cholesterol (HDL-C), alcohol consumption, and
atrial fibrillation (AF)), rs12992672/TMEM18 (BMI, HDL-C, type-2
diabetes mellitus (T2DM), AF, and smoking), rs4755720/ HSD17B12 (BMI,
HDL-C, T2DM, and CAD), rs233806/BANK1 (blood pressure, HDL-C, and BMI)
and rs959388/PRKD1 (BMI, smoking, and blood pressure), details in
Supplementary Data [118]3. We observed that the directionality of the
associations with HF risk factors was concordant with the findings on
HF risk in 32 out of the 42 (76%) associations. HDL-C and diastolic BP
accounted for nine of the ten discordant associations (Supplementary
Fig. [119]3). We did not find associations with troponin, NT-proBNP,
and IL-6 (Supplementary Data [120]3).
Fig. 3. Plots showing a genetic association of 18 HF loci against risk
factors for HF and b MR and colocalization estimates of MR-proteomic
genes-hits against HF risk factors.
[121]Fig. 3
[122]Open in a new tab
a The color of the bubble corresponds to the beta coefficient of the
genetic association between the loci (x-axis) and trait (y-axis). Blue
corresponds to a negative and red corresponds to a positive beta
coefficient. The size of each bubble corresponds to the negative
logarithm of the association p value; larger size corresponds to lower
p values. Loci are grouped by druggable and non-druggable genes. All
tests were two-sided without adjustment for multiple comparisons.
Associations which passed the p value threshold (p < 1 × 10^−4) are
denoted by a yellow diamond. b This bubble plot shows MR estimates for
which p < 1 × 10^−4. The size of each bubble corresponds to the
posterior probability for hypothesis 4 derived from colocalization. The
color of the bubble corresponds to the beta coefficient derived from
MR. Blue corresponds to a negative association and red corresponds to a
positive association; note that a positive
[MATH: β :MATH]
indicates either an increase in protein levels corresponding to an
increase in HF risk or a decrease in protein levels corresponding to a
decrease in HF risk, while a negative
[MATH: β :MATH]
indicates either a decrease in protein levels corresponding to an
increase in HF risk or an increase in protein levels corresponding to a
decrease in HF risk. The intensity of the color corresponds to
−log[10](P value) for the strength of association in the MR. All tests
were two-sided without adjustment for multiple comparisons. Loci are
grouped by druggable and non-druggable genes. TNXB, SIRPA, and ENPEP
genes are not included as these had no MR estimates on HF risk factors
that pass the p < 1 × 10^−4 threshold.
[MATH: β :MATH]
, Beta coefficient, AC alcohol consumption, AF atrial fibrillation, BMI
body mass index, CAD coronary artery disease, COPD chronic obstructive
pulmonary disease, DBP diastolic blood pressure, eGFR estimated
glomerular filtration rate, HDL-C high-density lipoprotein cholesterol,
IL-6 Interleukin-6, LDL-C low-density lipoprotein cholesterol,
NT-proBNP N-terminal proBNP, SBP systolic blood pressure, SMK smoking,
T2D type-2 diabetes, TRP troponin I cardiac muscle, PP.H4 posterior
probability of H[4].
Only three variants (rs3820888/SPATS2L, rs4755720/HSD17B12, and
rs72688573/FAF1) showed at least one association (p < 1 × 10^−4) with
LV cardiac MRI traits (Supplementary Fig. [123]4 and Supplementary
Data [124]3). The rs3820888/SPATS2L variant was associated with six LV
cardiac MRI traits and AF; all these associations were directionally
concordant with the HF findings. The rs4755720/HSD17B12 variant was
associated with LV end-diastolic volume indexed to body surface area
and four HF risk factors, and rs72688573/FAF1 was associated with LV
mass to end-diastolic volume ratio and two HF risk factors, see details
in Supplementary Data [125]3. In the African-American subpopulation
from the MVP GWAS (Supplementary Data [126]4), we found none of our 39
genome-wide significant distinct variants with HF in the European
datasets achieved genome-wide significance (Supplementary Data [127]5).
MR Proteomics and colocalization identifies ten genes for HF
Using the GWAS data on SOMAscan V4 proteomics, we selected
conditionally independent cis-variants, defined as any variant within a
+/− 1 Mb region of the protein-encoding gene, that is associated with
plasma levels of SOMAscan proteins (p < 5 × 10^−8). We propose that
these variants are instrumental variables for measured SOMAscan
proteins and conducted two-sample MR analyses using our
European-descent GWAS meta-analysis of HF from the MVP and HERMES
consortium. We conducted several analyses to minimize confounding and
biases. For the MR results that passed our significance threshold (FDR
<5%), we performed genetic colocalization analysis to ensure the MR
results were unlikely to be confounded by linkage disequilibrium (LD).
For the MR results with evidence of colocalization, we conducted MR and
colocalization analyses against HF risk factors and cardiac MRI traits
and cis-eQTL searches. Then, we conducted a novel multi-step analytical
approach to reduce the risk of horizontal pleiotropy.
We used 2900 cis-pQTLs across 1557 genes from the Fenland study as
proposed instrumental variables for conducting two-sample MR of
proteomics with HF. We found 16 genes passed our MR threshold (FDR
<5%), of which ten genes also showed suggestive evidence of
colocalization between HF and pQTL signals (posterior probably of
Hypothesis 4 (PP.H4): one common causal variant >0.5) for at least one
of the instruments, and of which three genes show strong evidence of
colocalization (PP.H4 >0.8), see details in Table [128]2 and
Supplementary Data [129]6. Except for ENPEP, no other gene that
colocalized was within 500 KB of a known HF GWAS loci. For genes with
more than one instrument, we did not observe any evidence of
heterogeneity based on Cochran’s Q statistic according to the IVW model
or by the MR-Egger intercept test, Table [130]2. This lack of
heterogeneity suggests that average directional horizontal pleiotropy
may not explain these findings.
Table 2.
Protein-hits for heart failure identified through Mendelian
randomization that passed an FDR threshold of 5%
Protein Gene Name Number of SNPs Odds Ratio 95% CI P value pHet† FDR
MR-Egger Intercept (95% CI); P value Coloc PP.H4* Druggability
classification (Chemical Modality) Near known HF gene**
ITIH4* 2 1.13 (1.07, 1.17) 2.51E-07 0.66 3.90E-04 N/A 0.97
Non-druggable No
APOC3* 1 1.19 (1.11, 1.28) 1.74E-06 NA 1.35E-03 N/A 0.99 Advanced
Clinical Phase (Oligonucleotide) No
MAPK3 3 0.95 (0.93, 0.97) 6.70E-06 0.41 3.48E-03 0.02 (−0.01, 0.05);
0.43 0.52 Advanced Clinical Phase (Small molecule) No
TNFSF12 2 0.96 (0.94, 0.98) 1.78E-05 0.05 6.94E-03 N/A 0.79 Clinical
Phase 1 (Antibody) No
ABO 2 1.02 (1.01, 1.03) 2.89E-05 0.11 8.99E-03 N/A 0.01 Non-druggable
ABO
APOH* 2 0.96 (0.94, 0.98) 5.24E-05 0.83 1.36E-02 N/A 0.89 Non-druggable
No
B3GNT8 2 0.97 (0.96, 0.99) 9.35E-05 0.96 2.08E-02 N/A 0.48
Non-druggable No
NTN4 2 1.08 (1.04, 1.13) 1.10E-04 0.68 2.14E-02 N/A 0.04 Non-druggable
No
DLL1 1 0.87 (0.8, 0.93) 1.53E-04 NA 2.65E-02 N/A 0.75 Non-druggable No
MST1 3 1.02 (1.01, 1.03) 1.99E-04 0.11 3.10E-02 −0.20 (−0.38, −0.01);
0.29 0.37 Non-druggable No
ENPEP 4 0.96 (0.94, 0.98) 3.12E-04 0.18 4.27E-02 0.01 (−0.02, 0.03);
0.62 0.74 Non-druggable PITX2, FAM241A
NAE1 1 0.82 (0.74, 0.91) 3.55E-04 NA 4.27E-02 N/A 0.6 Advanced Clinical
Phase (Small molecule) No
TNXB 1 1.03 (1.02, 1.05) 3.56E-04 NA 4.27E-02 N/A 0.61 Non-druggable No
SIRPA 1 0.98 (0.97, 0.99) 3.94E-04 NA 4.39E-02 N/A 0.56 Non-druggable
No
EBI3 1 0.75 (0.64, 0.89) 4.44E-04 NA 4.61E-02 N/A 0.01 Non-druggable No
IL27 1 0.75 (0.64, 0.89) 4.44E-04 NA 4.61E-02 N/A 0.4 Non-druggable No
[131]Open in a new tab
Gene names are italicized.
Significant MR results, FDR <5%. MR estimates were calculated using
Wald ratio for instruments with one variant and inverse-variance
weighting and fixed effects for instruments that contained more than
one variant. Note that an OR >1 indicates an increase in protein
corresponding with an increase in HF risk or vice versa, suggesting
that the therapeutic solution may be an inhibitor; an OR <1 indicates
either a decrease in protein levels corresponding with an increase in
HF risk or an increase in protein levels corresponding with a decrease
in HF risk, suggesting the therapeutic solution may be an agonist.
Genes that passed a colocalization threshold of PP.H4 >0.5 (suggestive
threshold) are highlighted in bold and PP.H4 >0.8 (strong threshold)
are marked with an asterisk.
MR Mendelian randomization, FDR false discovery rate, PP.H4 posterior
probability of H4.
*Posterior probability of H4 (one common causal variant) from
colocalization of pQTL and GWAS results.
**Previously reported HF GWAS gene for instruments in GWAS loci (within
500 KB up or down from each loci).
^†MR pHet were measured by Cochran’s Q-test for heterogeneity across
individual-variant MR estimates within a genetic instrument;
instruments containing one variant were not tested for heterogeneity.
Except for ENPEP, TNXB, and SIRPA, all the other genes that passed
thresholds for MR and colocalization with HF also showed an association
(defined as MR p < 1 × 10^−4 and colocalization: PP.H4 >0.5) with at
least one of the 15 HF risk factors (Fig. [132]3b and Supplementary
Data [133]7). We observed that the directionality of the MR
associations with HF risk factors was concordant with the MR findings
on HF in 10 out of the 14 (71%) associations. HDL-C, LDL-C, and
systolic BP accounted for discordant associations. Only the TNFSF12
gene showed an association with an LV cardiac MRI trait that passed
statistical thresholds for MR and colocalization, see details in
Supplementary Data [134]7, [135]8. We investigated if the cis-pQTL
instruments for the ten MR genes were also cis-eQTLs (p < 5 × 10^−8).
Twelve of the 18 proposed instruments were also cis-eQTLs in at least
one tissue. None of the cis-pQTLs used as proposed instruments for
TNXB, APOC3, and APOH genes showed a cis-eQTL association
(Supplementary Data [136]9).
In our assessment of horizontal pleiotropy (see Methods and
Supplementary Fig. [137]5), the 18 proposed instruments for the ten MR
genes were associated (p < 5 × 10^−8) with 251 proteins or gene
expression using SOMAscan V4, Fenland study and eQTLGen, respectively
(Supplementary Data [138]10). For 217 of the 251 proteins/gene
expression, we identified at least one cis-pQTL or cis-eQTL at
p < 5 × 10^−8 associated with protein levels based on the SOMAscan V4
Fenland study, or gene expression based on eQTLGen. We then conducted
two-sample MR of these secondary proteins/genes expression against HF
and identified four genes (TP53, ZNF259, ACVR2A, and MYRF) that passed
multiple testing thresholds (0.05/217, p < 2 × 10^−4, Supplementary
Data [139]10). These four secondary genes correspond to the following
genes identified by MR proteomics as hits for HF: TNXB (ACVR2A and
MYRF), APOH (TP53), and APOC3 (ZNF259). TP53 and ACVR2A were in a
different biological pathway than APOH and TNXB, respectively,
suggesting potential horizontal pleiotropy. ZNF259 and MYRF did not
retrieve any biological pathways; hence, it is unknown if these are due
to horizontal pleiotropy. We then determined protein–protein
interaction (PPI) networks for APOH and TNXB proteins using Enrichr and
GPS-Prot databases. The Enrichr’s PPI Hub Protein pathways reported
interactions between APOH and CDC42, AKT1, TP53, and GRB2 (adjusted p
values <0.04), while the GPS-Prot showed that the APOH protein is
directly connected to TP53 with confidence >0.6 (Supplementary
Fig. [140]6). No significant interaction was identified for the TNXB
and ACVR2A proteins.
Genetic correlation estimates
Estimates of the genetic correlation between HF and 15 HF risk factors
are reported in Supplementary Data [141]11. Results that pass multiple
testing at 5% FDR are denoted, including a positive genetic correlation
between HF and BMI of 0.56 (0.03) and with AF of 0.11 (0.02), as well
as a negative genetic correlation between HF and HDL-C of −0.36 (0.03)
(Supplementary Data [142]11).
Polygenic risk score validation
To test the PRS for HF in an out-of-sample cohort, we used data from
75,119 participants of European descent from the BioVU, of which 5845
participants had HF. Individuals with a 1-standard deviation increase
in the PRS had a 1.28 higher odds of HF (95% confidence interval (CI),
1.24–1.31; p < 2 × 10^−16). Participants in the top decile had a
1.82-fold (95% CI, 1.60–2.06; p < 0.0001) higher odds of HF compared to
those in the bottom PRS decile.
Pathway enrichment analysis recovers pathways relevant to HF
We used previously published and our newly identified HF GWAS variants
(n = 40) together with the 18 proposed instruments for the ten
MR-proteomics genes associated with HF and conducted gene pathway
enrichment analysis using GTEx V8. These 58 variants are associated
with 1605 GTEx V8 cis-eQTLs (p < 1 × 10^−4), corresponding to a total
of 165 unique genes (see Supplementary Data [143]12). After restricting
the analysis to pathways described in Gene Ontology, KEGG, and
Reactome, we observed 56 enriched pathways (FDR <5%). Biological
pathways include muscle adaptation (adjusted p value = 0.03),
ventricular system development (p = 0.03), sarcomere organization
(p = 0.04), regulation of vasculature development (p = 0.04), and
aldosterone-regulated sodium reabsorption (p = 0.04), details on
Supplementary Fig. [144]7 and Supplementary Data [145]13.
For the 18 GWAS distinct variants on HF, we determined the differential
gene expression associated with the novel HF variants (p < 1 × 10^−4)
in each GTEx V8 tested tissue (heart atrial, heart ventricle, artery
aorta, adipose, liver, kidney, and whole-blood tissues, and transformed
cultured fibroblasts). We then used the set of differentially expressed
genes to conduct an overrepresentation analysis on a per-tissue basis
(Supplementary Fig. [146]8). A total of 605 enriched pathways had at
least two differentially expressed genes, with heart-left ventricle
being the tissue with the most significantly enriched pathways
(n = 393). The rs6945340/HIP1 variant showed the largest number of
enriched pathways (n = 391, all tissues) with the heart’s left
ventricle being the primary tissue. Pathways to highlight for this
variant include the Krebs cycle, respiratory electron transport chain
(both with p = 4.8 × 10^−30), and oxidative phosphorylation
(p = 3.2 × 10^−5). Further details are available in Supplementary
Data [147]14. For eight of the MR-proteomics genes, we identified 77
reported associations with HF-related medical terms according to the
EpiGraphDB database (Supplementary Data [148]15).
Mouse knock-out models for novel genes identified by GWAS or MR-proteomics
We queried for knock-out (KO) mouse models, using the Mouse Genomics
(MGI) resource, for evidence that modification of the target produces a
phenotype relevant to HF. In 13 genes (eight GWAS and five
MR-proteomics genes), we retrieved evidence of a KO associated with
cardiovascular abnormalities. KO models on CAMK2D, PRKD1, MAPK3, NAE1,
SLC39A8, PHIP, RFX4, SCARB1, and TNXB showed phenotypes such as
myocardial abnormalities, dilated cardiomyopathy, abnormal response to
cardiac infarction, and cardiac hypertrophy, suggesting an intrinsic
role in heart function regulation (Supplementary Data [149]16).
Druggability
A total of seven novel genes from the GWAS (CAMK2D, PRKD1, and PRKD3)
and MR-proteomics (MAPK3, TNFSF12, APOC3, and NAE1) were identified to
encode proteins that are predicted to be druggable (CAMK2D) or targets
for 14 unique drugs that are either licensed or in the clinical phase
(PRKD1, PRKD3, MAPK3, TNFSF12, APOC3, and NAE1). Except for drugs
targeting Apolipoprotein C-III mRNA, Volanesorsen, and
AKCEA-APO-CIII-L[Rx] evaluated for familial chylomicronemia syndrome,
all the other 12 drugs are either licensed or under clinical
investigation for cancer (n = 10 (MAPK3, PRKD1, PRKD3, and NAE1)) or
autoimmune disorders (n = 2, (TNFSF12)). In four of the seven druggable
genes, we were able to use our MR findings to infer the type of
pharmacological action (agonist versus antagonist) needed to prevent HF
and compared this against the pharmacological action of the existing
drugs with a single target (which are most likely to reproduce genetic
findings). Through this process, we observed a match in one gene
(APOC3); and for the other druggable genes (MAPK3, NAE1, and TNFSF12),
the existing drugs were an inhibitor/antagonist, while MR suggested an
agonist, details on Supplementary Data [150]17.
In silico trials
We searched for genetic associations for the GWAS hits and conducted
two-sample MR for the MR proteomics hits to evaluate safety and
efficacy outcomes relevant to the primary prevention trials on HF.
Seven of the 18 GWAS distinct variants and two of the ten MR-proteomics
genes were additionally associated (p < 1 × 10^−4) with efficacy
outcomes (CAD, T2DM) in the same direction as HF (Supplementary
Data [151]18). None of the 18 distinct GWAS variants or ten
MR-proteomics genes showed an association (p < 1 × 10^−4) with the
following safety traits: cancers (lung, prostate, colorectal, breast),
chronic kidney disease, Alzheimer’s disease, liver enzymes, or
creatinine.
Comparison with Global Biobank Meta-analysis Initiative (GBMI) on HF
An unpublished study from the GBMI reporting a multi-ancestry HF GWAS
(68,408 HF cases and 1,286,331 controls) identified 11 potentially
novel loci for HF^[152]15. We compared these associations with our
HERMES-MVP GWAS and determined that seven of the 11 GBMI variants were
associated (p < 5 × 10^−8) in our HF meta-analysis. None of these
variants were associated (p < 5 × 10^−8) in the HF GWAS in MVP
African-Americans dataset (Supplementary Data [153]19). Two GBMI loci
correspond to the same variants (rs10455872/PLG and rs600038/ SURF1)
previously reported by HERMES or MVP, and an additional five loci were
in LD (r^2 range: 0.39 to 1) with our findings (Supplementary
Data [154]19). Finally, two GBMI GWAS variants (rs17035646 and
rs61208973) showed suggestive evidence of association in our HF GWAS
(p < 0.003). In a replication study of the 18 novel loci, findings from
the HF GWAS in the GBMI multi-ancestry excluding UK Biobank indicate
33.3% (6 of 18) of variants are significant (p value <0.05/18), 61.1%
(11 of 18) are nominally significant (p value <0.05), and 100% have a
beta estimate that is directionally concordant with our meta-analysis
(Supplementary Data [155]20).
Discussion
Our genetic analysis on HF consisting of 90,653 cases identified 18
distinct HF variants through GWAS and an additional ten putatively
causal genes for HF through MR and colocalization using proteomic
instruments. Our study expands the knowledge on the biological pathways
associated with all HF risk loci discovered to date and identifies
seven druggable genes as potential drug targets for the primary
prevention of HF.
We conducted several strategies to provide biological credibility to
our 18 distinct GWAS variants. First, 16 of the 18 variants showed
genetic associations with HF risk factors that were directionally
concordant with the HF findings, and several LV cardiac MRI traits.
Second, overrepresentation analysis using differentially expressed
genes by each GWAS variant identified the heart LV myocardium as the
most significantly enriched tissue and recovered several pathways of HF
relevance. Third, systematic querying on KO mouse models identified
CAMK2D, PRKD1, PHIP, RFX4, SLC39A8, and SCARB1, genes found by our
GWAS, with phenotypes relevant to HF. Novel variants to highlight
include rs3820888/SPATS2L and rs4755720/HSD17B12 that showed
associations with HF risk factors and LV cardiac MRI traits. The
rs3820888/SPATS2L variant showed evidence of colocalization with six
cardiac MRI traits, including LVEF, LV mass to end-diastolic volume
ratio, and AF, all of which were directionally concordant with the HF
findings. Previous GWAS have also indicated that the same variant was
also associated with QT interval^[156]16. The rs4755720/HSD17B12
variant colocalized with LV end-diastolic volume indexed to BSA and HF
risk factors that were directionally concordant with the HF findings,
all showing a protective effect. Previous GWAS indicated that this
variant, as well as others in strong LD, associated with a reduction in
adiposity measures and an increase in lung function metrics, suggesting
that cardiometabolic fitness may explain the association with
HF^[157]17–[158]19.
We conducted MR-proteomic analyses to uncover the putative causal role
of human proteins in HF. Ten genes passed our genetic colocalization
test, of which nine were also not in LD with a previously reported HF
variant, minimizing the probability of confounding by LD. Seven of the
10 genes showed associations with at least one HF risk factor, and in
the majority (71%) of these associations, the point estimate was
directionally concordant with the MR findings on HF.
Four (MAPK3, PRKD1, CAMK2D, and PRKD3) of the seven druggable genes
identified by our analyses encode proteins with serine/threonine kinase
activity. These four genes are associated with HF risk factors in a
manner that is concordant with the findings on HF. CAMK2D also showed a
suggestive association (p = 9 × 10^−4) with LV mass. In support of our
findings, a mouse model with deletion of MAPK3/MAPK1 genes developed
cardiac hypertrophy and ventricular dilation followed by reduced
ventricular performance^[159]20. CAMK2D, PRKD1, and PRKD3 are
calcium/calmodulin-dependent protein kinases known to be associated
with cardiac pathophysiology. Protein kinase-D, encoded by the PRKD1
gene, appears to be a regulator of myocardial structure and function.
Mice with a deletion of PRKD1 in cardiomyocytes were reported to be
resistant to stress-induced hypertrophy in response to pressure
overload, angiotensin-II, and adrenergic activation^[160]21.
Calcium/Calmodulin-Dependent Protein Kinase II (CamKII) is composed of
four chains, one of which, delta (δ), is encoded by the CAMK2D gene.
CamKII-δ is largely expressed in cardiac tissue (confirmed by our
pathway enrichment analysis), where it regulates proteins involved in
calcium handling, excitation-contraction coupling, activation of
hypertrophy, cell death, and inflammation^[161]22. Several case-control
studies have shown an upregulation of cardiac CamKII-δ expression and
activity in patients with HF, dilated cardiomyopathy, and diabetic
cardiomyopathy. In support of this, several experimental studies in
animal models of dilated cardiomyopathy and HF have shown that chemical
inhibition of CamKII led to protection from cardiac dysfunction,
adverse cardiac remodeling, and cardiac arrhythmias^[162]22. More
recently, the administration of a novel ATP-competitive CaMKII-δ oral
inhibitor (RA306) in a dilated cardiomyopathy mouse model led to an
improvement in ejection fraction^[163]23. This oral inhibitor offers
the opportunity to test the causal role of CamKII-δ through clinical
trials for the prevention of HF. Interestingly, the CAMK2D gene was
also associated with AF, confirming an association demonstrated by
in-vitro and animal models of AF^[164]22.
Additional druggable genes identified were APOC3, TNFSF12, and NAE1.
The APOC3 gene, which achieved the highest level of evidence in our
analyses (FDR 5% and PP.H4 >0.8), is known for its associations with
lipids, and CAD, which were confirmed in our analysis. Apolipoprotein
C-III mRNA is targeted by two different antisense oligonucleotides
(ASO), Volanesorsen and AKCEA-APO-CIII-L[Rx], evaluated for familial
chylomicronemia syndrome. Phase 3 trials on Volanesorsen have shown an
increase in LDL-C levels and thrombocytopenia, which makes it an
unlikely candidate for the prevention of HF.^[165]24
AKCEA-APO-CIII-L[Rx] is an ASO liver-specific that appears to have a
better safety profile, and may be more suitable for long-term
use^[166]25. TNFSF12 gene encodes for the TNF superfamily member 12
protein; increased levels of this protein were associated with a risk
reduction in HF according to our MR and colocalization findings.
Similar, directionally concordant, findings were reported by recent MR
proteomics (using various proteomics platforms) against ischemic
stroke^[167]26. These results are consistent with the finding that
TNFSF12 is MR associated and colocalized with AF, a risk factor for
both ischemic stroke and HF. In addition, we observed a clear reduction
in LV mass to end-diastolic volume ratio and a suggestive
(p = 2 × 10^−3) increase in LVEF, both directionally concordant with a
risk reduction in HF. Transgenic mice and adenoviral-mediated gene
expression models have also pointed to the role of TNFSF12 in the
development of dilated cardiomyopathy and severe cardiac
dysfunction^[168]27. NAE1 gene encodes NEDD8 activating enzyme E1
subunit 1 protein, and our MR and colocalization findings showed this
gene was associated with lower values of blood pressure, which
coincides with the reduced risk of HF.
The strengths of the current analysis are multiple. First, the large
number of HF cases included in our analysis led us to identify new
variants and putatively causal genes for HF through GWAS and MR
proteomics. Second, we used three complementary strategies— nearest
gene (local method), PoPs (global method), and pLoF— to assign the most
likely gene responsible for the GWAS signal with HF. Through this
process, we observed agreements in 11 of 18 GWAS variants, which
provided some degree of confidence in the gene prioritization. However,
we acknowledge that the PoPs method will miss variants that do not act
through various mechanisms captured by PoPs^[169]13, highlighting the
challenge in assigning the gene responsible for GWAS
loci^[170]28–[171]30. Third, we provide biological credibility for most
of our genetic findings through an extensive and complementary analysis
covering HF risk factors, LV cardiac MRI, and -omics. Fourth, in seven
MR hits for HF, we showed that our proposed instruments, in addition to
associations with HF risk factors or LV cardiac MRI traits, were also
associated with gene expression, and protein levels all acting in cis.
Fifth, KO models of thirteen genes identified through GWAS and MR
developed highly relevant phenotypes to HF and in some cases (CAMK2D),
specific pharmacological inhibition showed reversibility of the HF
phenotypes. Six, the lack of associations between the distinct GWAS
loci and MR genes with safety outcomes used in the primary prevention
trials of HF provides some reassurance on target safety profiles.
The degree of credibility on the causality of proteins identified by MR
depends on whether the MR assumptions are valid. First, our
colocalization analysis on HF, risk factors for HF, and LV cardiac MRI
traits make confounding by LD unlikely. The selection of cis-variants
as proposed instruments minimize the chances of horizontal pleiotropy.
To further minimize the chances of horizontal pleiotropy, we developed
a novel analysis that attempted to empirically test the relevant
conditions needed for horizontal pleiotropy to invalidate MR. First, we
looked for secondary proteins or gene expression associated with our MR
protein-hits, and then evaluated if those secondary proteins/gene
expression were associated with HF and fall in a biological or PPI
pathway outside our protein-hits. After doing this, only TNXB showed
some evidence of horizontal pleiotropy. Interestingly, cis-pQTLs used
as instruments for TNXB were not associated with cis-eQTLs, HF risk
factors or LV cardiac MRI traits. Although we used multiple lines of
evidence to determine putative causal genes, the pathway enrichment
analysis identifies pathways linked to cardiac biology, but may not
point to specific insights for HF, and we did not functionally validate
any of our results, which remains as the highest level of evidence to
support causal roles for the hits, especially those that pass the
suggestive MR and coloc thresholds of FDR 5% and PP.H4 >0.5.
Although most of our variants and genes showed associations with HF
risk factors that were biologically concordant with HF risk, some
discordant associations were observed. HDL-C and diastolic BP accounted
for most of these discordant associations. It has been reported that
higher levels of diastolic BP may be protective on HF^[172]31,[173]32,
instead of deleterious as we assumed, while the HDL-C association with
HF seems to be non-linear^[174]32, which was not accounted for in our
MR analysis that included HDL-C as a co-variable. We validated seven of
the 11 variants reported in an unpublished multi-ancestry HF GWAS by
GBMI^[175]15. Another limitation is that our analysis was restricted to
individuals of European ancestry. While this does reduce the potential
bias caused by population stratification, our results may not apply to
populations of other ancestral groups. Future HF GWAS meta-analysis
including larger releases of MVP, All of US, and GBMI will not only
provide chances for replication of variants identified in Europeans,
but also include non-white populations to further increase the
discovery of genetic determinants of HF.
Although the absence of HF subtypes in this analysis most certainly
decreased our ability to detect signals specific to HF subtypes, it
does not invalidate the ones identified. Evidence from primary
prevention trials using HF as an outcome (as our genetic study) that
uncovered the benefits of BP lowering therapies and statins indicates
the plausibility for translation of our genetic findings. Future
genomic analysis should extend to different HF subtypes, with a focus
on HF with preserved ejection fraction, a major unmet need in medicine.
Although our design attempted to emulate a primary prevention trial on
HF, further studies with access to individual participant data that
reliably recreate eligibility criteria and outcome ascertainment that
cover efficacy (including HF subtypes) and safety outcomes are needed.
In conclusion, we discovered a total of 18 distinct novel HF-associated
variants and ten putatively causal genes for HF through GWAS and
MR-proteomics with evidence of biological plausibility. The new
mechanisms and pathways together with the seven druggable genes
discovered provide a tractable path for the translation of our genomic
findings for the primary prevention of HF.
Methods
Clinical and demographic characteristics
The study population for the meta-analysis consisted of 1,279,610
participants, of which 302,287 were from MVP (43,344 cases and 258,943
controls) and 977,323 were from HERMES Consortium (47,309 cases and
930,014 controls). The clinical and demographic features of the
participants are summarized in Supplementary Data [176]1. A detailed
breakdown of clinical and demographic characteristics according to each
study included in the HERMES Consortium has been previously
published^[177]8. The population characteristics of the BioVU PRS
cohort can be found in Supplementary Data [178]21.
Genotyping, quality control, and imputation of genetic data
For the data obtained from the Million Veteran Program (MVP), DNA was
extracted from participants’ blood and genotyped using the MVP 1.0
Genotyping Array, which is enriched for both common and rare genetic
variants of clinical significance. Imputation performance was assessed,
and variants that had poor quality as determined were removed from
further analyses. All studies included in the HERMES Consortium
utilized high-density genotyping arrays. A detailed table summarizing
the genotyping, quality control, imputation, and analysis across the 29
distinct datasets included in the HERMES Consortium has been previously
described^[179]8. For quality control, the per-variant call rate and
the per-sample call rate across all studies was at least greater than
90^[180]8. The MAF threshold ranged from >0 to 1% across
studies^[181]8. Further details can be found in the Supplementary
Information.
Phenotyping of heart failure
Across all 26 cohorts of the HERMES Consortium, cases with HF were
identified by a clinical diagnosis of HF of any etiology, as determined
by physician diagnosis or adjudication, ICD codes, and imaging, and
controls were participants without a clinical diagnosis of HF. In the
MVP, HF patients were identified as those with an International
Classification of Diseases (ICD)−9 code of 428.x or ICD-10 code of
I50.x and an echocardiogram performed within 6 months of diagnosis
(median time period from diagnosis to echocardiography was 3 days,
interquartile range 0–32 days). Further details can be found in the
Supplementary Information.
Genome-wide association study for HF
We performed a fixed effects inverse-variance weighted meta-analysis HF
from the published MVP (n = 302,258) and HERMES (n = 964,057)^[182]8
GWAS using METAL^[183]33 (version release 2020-05-05) in a total of
1,266,315 individuals. We removed variants with a MAF <0.5%, resulting
in 10,227,138 associations.
We used FUMA^[184]34 to annotate our results using the default
settings. In accordance with the default FUMA parameters, we defined
distinct variants to have an R^2 < 0.6 and determined the associations
that were >500 KB from a previously reported indexed variant in MVP and
HERMES. We used the closest gene to the indexed variant and the top
gene per locus identified by PoPs to prioritize genes for our
GWA-significant (p < 5 × 10^−8) loci.
The PoPS method^[185]13 is a new gene prioritization method that
identifies the causal genes by integrating GWAS summary statistics with
gene expression, biological pathway, and predicted protein–protein
interaction data. We applied the PoPS score because it has been shown
to nominate causal genes at non-coding GWAS loci with greater
predictive confidence compared to other similarity-based or locus-based
methods^[186]13. By leveraging a framework unbiased by previous
trait-specific knowledge, the PoPs tool can prioritize causal genes and
therefore highlight relevant biological pathways with greater
confidence. First, as part of the PoPS analysis, we used MAGMA to
compute gene association statistics (z-scores) and gene–gene
correlations from GWAS summary statistics and LD information from the
1000 Genomes. Next, PoPS performs marginal feature selection by using
MAGMA to perform enrichment analysis for each gene feature separately.
The model is fit by generalized least squares (GLS), and MAGMA results
are used to perform marginal feature selection, retaining only features
that pass a nominal significance threshold (p < 0.05). Then, PoPS
computes a joint enrichment of all selected features simultaneously in
a leave one chromosome out (LOCO) framework. The gene features employed
by PoPS are listed here:
[187]https://github.com/FinucaneLab/gene_features. The PoPs method uses
data from gene expression datasets, protein–protein interaction
networks, and pathway databases; however, variants that act through
mechanisms not captured by the PoPs model would not be identified.
Finally, PoPS computes polygenic priority scores for each gene by
fitting a joint model for the enrichment of all selected features. The
PoP score for a gene is independent of the GWAS data on the chromosome
where the gene is located. The PoPS analysis returned scores for a
total of 18,383 genes per set of GWAS datasets. We then annotated our
GWAS loci with the Ensembl genes in a 500 kb window and selected the
highest PoP score gene in the locus as the prioritized gene. For all
the genes suggested by the nearest gene and PoPS, we conducted
gene-burden tests derived using a gene-based (mean) approach in a mixed
model framework using the Genebass-UK Biobank resource (see
Supplementary Information).
Genome-wide association study in African-Americans MVP subpopulation
We conducted a GWAS of HF in the African-American MVP subpopulation and
performed lookups for our novel HF variants as well as the previously
described HF variants. The African-American subpopulation in the MVP is
composed of 11,399 cases with heart failure and 69,726 controls, of
which 94.9% cases and 85.4% controls were male with a mean age of 63.82
(9.92) and 56.39 (12.20) for the cases and controls, respectively
(Supplementary Data [188]4).
Associations of HF GWAS variants with HF risk factors and LV cardiac MRI
traits
For genetic variants that passed the GWAS threshold for HF
(p < 5 × 10^−8), we determined genetic associations for 15 HF risk
factors and nine LV cardiac MRI traits derived from available GWAS.
Data on HF risk factors was obtained from European-descent GWAS
studies: BMI^[189]35, smoking^[190]36, alcohol intake
frequency^[191]37, AF^[192]38, diastolic and systolic BP^[193]39,
T2DM^[194]40, CAD^[195]41, LDL-C^[196]42, HDL-C^[197]42, estimated
glomerular filtration rate (eGFR)^[198]29, and chronic obstructive
airways disease (COPD)^[199]36, and troponin I cardiac muscle,
N-terminal proBNP (NT-proBNP), and interleukin-6 (IL-6).
For LV cardiac MRI traits, we determined genetic associations from two
separate publications. Seven LV cardiac MRI measurements in 36,041
participants of the UK Biobank from ref. ^[200]43 and LV mass and LV
mass to end-diastolic volume ratio from cardiac MRI in 42,157 UK
Biobank participants from Aung et al. (unpublished) using automated CMR
analysis techniques and LV GWAS techniques^[201]44,[202]45.
We used p < 1 × 10^−4 (0.01/number of secondary to HF traits tested in
the manuscript) to account for multiple testing. For associations that
passed our p value threshold, we evaluated whether the directionality
of HF risk factors associations was concordant with findings on HF; for
example, for a variant that showed an increased risk of HF, we expect a
positive association with a deleterious risk factor.
Mendelian randomization on 1557 proteins and HF
Selection of proposed pQTL instruments
We obtained pQTLs from a genome-proteome-wide association study in the
Fenland study of 10,708 participants of European-descent^[203]12
(retrieved from [204]www.omicscience.org). The genome-proteome-wide
association study was conducted using 10.2 million genetic variants and
plasma abundances of 4775 distinct protein targets (proteins targeted
by a least one aptamer) measured using the SOMAscan V4 assay^[205]12.
Significant genetic variant pQTLs were defined as passing a Bonferroni
p value threshold of p < 1.004 × 10^–11. Approximate conditional
analysis was performed to detect secondary signals for each genomic
region identified by distance-based clumping of association
statistics^[206]12. To diminish the likelihood of horizontal
pleiotropy, we restricted proposed instrumental variables to (lead and
secondary signals) cis-pQTLs using a p value threshold of p < 5 × 10^−8
in marginal statistics, where cis is defined as any variant within a
+/− 1 Mb region of the protein-encoding gene. A total of 2900 cis-pQTLs
across 1557 genes (mean = 1.9, min = 1, max = 14) covering an equal
number of proteins from the Fenland study were used as proposed
instrument variables for conducting two-sample MR of proteomics against
HF.
Mendelian randomization and colocalization
We performed two-sample MR using the TwoSampleMR package in R
([207]https://mrcieu.github.io/TwoSampleMR/)^[208]46. The Wald Ratio
was used for instruments with one variant and the inverse-variance
weighted MR method was used for instruments with two or more variants.
We tested the heterogeneity across variant-level MR estimates, using
the Cochrane Q method (mr_heterogeneity option in TwoSampleMR package)
and plotted the effects of the variants on the proteins against the
effects of the variants on HF to validate our instruments when more
than one variant was included. We defined significant MR results using
a false discovery rate (FDR) of 0.05 calculated by the
Benjamini–Hochberg method (corresponding p value = 5 × 10^−4). We used
the MR-Egger intercept test to detect potential directional pleiotropy,
and report the Egger intercept and corresponding standard error and p
value for genes with three or more variants, where the MR-Egger
intercept can be interpreted as an estimate of the average horizontal
pleiotropic effect of the genetic variants^[209]47.
MR assumes the SNP influences the outcome only through exposure. To
help guard against the existence of distinct but correlated causal
variants for the exposure and outcome, for results that passed our MR
threshold (FDR <0.05), we performed colocalization using the COLOC
package^[210]48 in R. Colocalization assesses the probability of a
shared causal variant (PP.H4) or distinct causal variants (PP.H3)
between the HF GWAS and cis-pQTL instruments for the protein of
interest. We performed conditional analysis on the pQTL data to
identify conditionally distinct pQTL signals and performed
colocalization using marginal (unadjusted) pQTL results as well as
results conditional on each of the instruments used in the MR.
Statistically significant MR hits with a posterior probability of a
shared causal variant (PP.H4) >0.5 for at least one instrumental
variant were then investigated further. Colocalization was performed
using with default priors (prior probability of initial trait
association is 1 × 10^−4, prior probability of shared causal variant
across two traits is 1 × 10^−5). We also investigated if the cis-pQTL
instruments for genes that passed both MR and colocalization thresholds
were also cis-eQTLs (p < 5 × 10^−8). Tissues used were whole blood from
eQTLGen and heart atrial, heart ventricle, artery aorta, adipose,
liver, kidney tissues, and transformed cultured fibroblasts from GTEx
V8.
MR and colocalization for HF risk factors and cardiac MRI traits
For proteins that passed both MR and colocalization thresholds, we
conducted two-sample MR analyses of these proteins, using cis-pQTLs
from the Fenland study as proposed instrumental variables, against 15
HF risk factors and nine cardiac MRI traits described in the previous
section (see Supplementary Material for details on traits and
datasets). For the MR results that passed a p value threshold of
p < 1 × 10^−4, we conducted colocalization analyses as previously
described. We defined significant findings as those that passed
thresholds for MR (p < 1 × 10^−4) and colocalization (PP.H4 >0.5).
Assessment of horizontal pleiotropy
For statistical findings that passed the MR and colocalization
thresholds, we evaluated the possibility that horizontal pleiotropy may
invalidate our findings. The pipeline of analysis is depicted in
Supplementary Fig. [211]5. Step-1: We determined if our cis-pQTLs were
associated (p < 5 × 10^−8) with other proteins levels included in
SOMAscan V4 or with gene expression using data from eQTLGen. Step-2: We
queried if the genes (including genes that encode SOMAscan proteins)
identified in Step-1 were within 1MB of the risk loci for HF identified
by GWAS conducted to date. Step-3: We conducted a two-sample MR to
identify if the secondary genes/proteins (identified in Step-1) were
associated with HF, using a Bonferroni-corrected p value (0.05/number
of unique genes/proteins identified in Step-1). We leveraged as
proposed instruments the lead cis-pQTL (p < 5 × 10^−8) from the Fenland
study, and if it was not available, we used the lead cis-eQTL
(p < 5 × 10^−8) identified from eQTLGen. Step-4: We then mapped all
secondary genes/proteins identified in Step-3 to Reactome/KEGG
pathways; and compared if these pathways are on the same (vertical
pleiotropy) or different (horizontal pleiotropy) pathway as that
associated with the primary genes identified through MR proteomics for
HF. To further investigate the physiological functionalities of our
findings retrieved in Step-4, we queried two databases: the
Enrichr^[212]49–[213]51, an interactive gene knowledge discovery
database, and the GPS-Prot server^[214]52, a platform with aggregated
information about protein–protein interactions.
LD score regression
We used LD Score regression^[215]53 (LDSC) to estimate genetic
correlations between heart failure and 15 cardiovascular traits. We
estimated using European LD scores obtained from the 1000 Genomes
Project Phase 3 data for the HapMap2 SNPs. We used MungeSumstats to
perform standardization of association statistics^[216]54.
Polygenic risk score analysis
A polygenic score for heart failure was calculated using the HF
meta-analysis using the PRS-CS package^[217]55, which utilizes a
Bayesian regression framework to calculate posterior SNP effect sizes
under a continuous shrinkage prior. We used the LD reference panel
constructed using the 1000 Genomes Project Phase 3 data. We conducted
these analyses in Python, using the packages scipy and h5py. The PRS
was evaluated in the Vanderbilt University Medical Center (VUMC) BioVU,
a biobank that links the de-identified electronic medical record (EMR)
system containing phenotypic data to discarded blood samples from
routine clinical testing for the extraction of genetic data^[218]56. A
full description of the BioVU resource has been previously
published^[219]56. Participants with heart failure were identified by a
modified version of the eMERGE definition for heart failure, which
includes the International Classification of Diseases, Tenth Revision
(ICD-10) codes, where age was defined as age at heart failure for cases
and age at last medical visit for controls. To determine the ability of
PRS to stratify heart failure cases from controls, we used a logistic
regression model, adjusting for age, sex, and three principal
components of ancestry in the BioVU. We assessed enrichment in the more
extreme tail of the PRS distribution by evaluating the odds ratio for
individuals in the top PRS decile compared to individuals in the bottom
PRS decile. In the top decile of PRS, there were 723 participants with
HF and 6788 controls, and in the bottom decile, there were 416
participants with HF and 7096 controls.
Pathway enrichment analysis
We conducted an enrichment analysis to identify biological pathways
associated with HF risk loci (established and novel) that passed the
GWAS p value thresholds. For each locus, we selected the top variant
and then identified cis-eQTLs (within a 1 Mb region) from GTEx V8 in
any tissue associated with the top variants and extracted all genes
with a p < 1 × 10^−4. We merged all retrieved genes to a gene set that
was then used for inquiry for the enriched pathways. This set of genes
was set forth to an overrepresentation analysis using the pathways
described in Gene Ontology, KEGG, and Reactome. Selected pathways were
those significantly enriched at an FDR <0.05.
Additionally, we explored the downstream transcriptional consequences
associated with the distinct variants identified by our GWAS on HF and
those not previously reported. We used the distinct variants and
conducted a differential gene expression analysis (using a dominant
model) for all transcripts available in GTEx V8 for heart atrial, heart
ventricle, artery aorta, adipose, liver, kidney, transformed cultured
fibroblasts, and whole-blood tissues. After fitting models for our
variants, we retrieved all genes differentially expressed at a
p < 1 × 10^−4 and conducted an enrichment pathway analysis (through an
overrepresentation analysis, as described above). Enrichment analyses
were performed using the R packages clusterProfiler and
enrichplot^[220]57.
EpiGraphDB queries
To investigate the current knowledge about the biomedical functions of
the hit genes in association with HF, we used the EpiGraphDB
database^[221]58. We queried the biomedical and epidemiological
relationships curated in the database to identify associations between
the genes we identified and cardiovascular-related outcomes and risk
factors (see Supplementary Methods).
Querying the MGI database
We queried the Mouse Genome Informatics (MGI,
[222]http://www.informatics.jax.org/) resource for all candidate genes
from our novel GWAS hits list or those suggested as causal from our
MR/colocalization approach. MGI uses a standardized nomenclature, and
controlled vocabularies such as the Mouse Developmental Anatomy
Ontology, the Mammalian Phenotype Ontology, and the Gene Ontologies. As
MGI extracts and organizes data from the primary literature, we have
parsed all system abnormalities associated with models on all of the
queried genes^[223]59. For models that displayed cardiovascular
abnormalities, we have hand-curated the abnormalities and organized
them into three distinct groups associated with (1) congenital heart
malformations, (2) myocardial abnormalities, and (3) vascular
abnormalities.
Druggability annotations
Proteins encoded by genes identified in the GWAS and MR analyses for HF
were annotated with drug tractability information based on information
provided by OpenTargets^[224]10,[225]60,[226]61 (release 2021-03-08).
OpenTargets tractability system stratified drug targets into nine
mutually exclusive groups (termed “buckets”) based on the drug type and
the stage of the drug discovery pipeline. For easier interpretation, we
regrouped the original buckets into four mutually exclusive groups, as
follows: Licensed drugs: bucket-1 for antibodies, small molecules, and
other modalities. Drugs in clinical development: buckets 2 and 3 for
antibodies, small molecules, and other modalities. Compounds in the
preclinical phase: buckets 4 and 5 for small molecules. Predicted
druggable: buckets 6 to 8 for small molecules plus buckets 4 and 5 for
antibodies. The remaining proteins were considered non-druggable. For
genes that were the target of licensed drugs, we checked whether the
disease indication was also a risk factor for HF, as this may introduce
a bias analogous to confounding by indication in MR.
GBMI replication of novel loci
We conducted a replication of the 18 novel loci in the Global Biobank
Meta-analysis Initiative (GBMI) multi-ancestry GWAS on heart failure,
which includes 859,141 controls and 60,605 cases from BioBank Japan,
BioMe, BioVU, China Kadoorie Biobank, Estonian Biobank, FinnGen, Genes
& Health, HUNT, Lifelines, Michigan Genomics Initiative, Partners
Biobank, UCLA Precision Health Biobank, excluding UK Biobank^[227]62.
Heart failure cases were ascertained by ICD code (phecode 428.2). We
consider p < 0.05/18 as a level of significance for replication and
p < 0.05 as a level of nominal significance.
Reporting summary
Further information on research design is available in the [228]Nature
Portfolio Reporting Summary linked to this article.
Supplementary information
[229]Supplementary Information^ (13.2MB, pdf)
[230]Peer Review File^ (455KB, pdf)
[231]41467_2023_39253_MOESM3_ESM.pdf^ (161.4KB, pdf)
Description of Additional Supplementary Files
[232]Supplementary Data 1-21^ (253.1KB, xlsx)
[233]Reporting Summary^ (3.5MB, pdf)
Acknowledgements