Abstract
Atrial fibrillation (AF) is a common cardiac arrhythmia with strong
genetic components, yet its underlying molecular mechanisms and
potential therapeutic targets remain incompletely understood. We
conducted a cross-population genome-wide meta-analysis of 168,007 AF
cases and identified 525 loci that met genome-wide significance. Two
loci, near PITX2 and ZFHX3, were shared across populations of
different ancestries. Comprehensive gene prioritization
approaches reinforced the role of muscle development and heart
contraction while also uncovering additional pathways, including
cellular response to transforming growth factor-beta.
Population-specific genetic correlations uncovered common and unique
circulatory comorbidities between Europeans and Africans. Mendelian
randomization identified modifiable risk factors and circulating
proteins, informing disease prevention and drug development.
Integrating genomic data from this cross-population genome-wide
meta-analysis with proteomic profiling significantly enhanced AF risk
prediction. This study advances our understanding of the genetic
etiology of AF while also enhancing risk prediction, prevention
strategies, and therapeutic development.
Subject terms: Cardiovascular genetics, Risk factors, Predictive
markers, Genome-wide association studies
__________________________________________________________________
Atrial fibrillation has a strong genetic basis, but key mechanisms
remain unclear. Here, the authors show that a cross-population GWAS
meta-analysis identifies 525 loci and highlights shared and
ancestry-specific pathways relevant to AF risk and therapeutic
targeting.
Introduction
Atrial fibrillation (AF) is the most common arrhythmia, characterized
by disorganized atrial depolarizations, which can lead to symptoms
including palpitations and decreased exercise capacity, as well as more
serious complications. With an aging global population, AF has become
an epidemic and a major health issue, with increasing incidence and
prevalence^[56]1, particularly in North America and Europe^[57]2. The
Global Burden of Disease 2019 Study estimated that approximately 59.7
million individuals live with AF, which is associated with 8.4 million
disability-adjusted life years worldwide^[58]3. Hence, there is an
urgent need to elucidate the etiological basis of AF to improve risk
prediction, prevention and treatment.
While environmental factors play a role in AF development, the genetic
contribution to AF susceptibility has been increasingly recognized.
Multiple genome-wide association studies (GWASs) have uncovered over
100 risk loci, shedding light on AF’s genetic architecture^[59]4–[60]8.
However, existing studies have largely been conducted in European
populations, and a larger, more diverse GWAS, particularly one including
multiple populations, could enhance the discovery of variants with smaller
effects, as well as population-specific and shared loci.
Genomic data are now widely leveraged to improve disease risk
prediction, identify risk factors, and facilitate therapeutic
development. While some studies have explored these aspects for AF,
integrating cross-population genetic data with proteomic insights in a
large-scale study could further refine the identification of genetic
signals, associated comorbidities, causal risk factors, and potential
drug targets. Thus, we conducted a cross-population GWAS meta-analysis
involving over 2 million individuals and performed comprehensive
downstream analyses to uncover unreported genetic loci, identify causal
risk factors, and enhance AF risk prediction and therapeutic
opportunities.
Results
Cross-population GWAS meta-analysis identified 379 unreported loci
We conducted a cross-population GWAS meta-analysis and a series of
downstream analyses on AF (Fig. [61]1). The European meta-analysis,
which included 153,980 AF cases from nine studies (Supplementary
Data [62]1), identified 493 genetic loci reaching genome-wide
significance (Supplementary Data [63]2). Among these, five loci showed
significant evidence of heterogeneity in effect estimates across the
contributing GWAS (heterogeneity test; P < 0.05/493, Supplementary
Data [64]2). Of the 493 loci, 479 displayed consistent effect estimates
between the discovery and replication datasets (Supplementary
Data [65]2 and Supplementary Fig. [66]1), and 426 had P < 0.05 in the
replication study (Supplementary Data [67]2). Using linkage
disequilibrium score regression (LDSC) with the 1000 Genomes European
reference panel, we estimated that common variants explain 11.2% (95% CI:
9.2%–13.2%) of the variance in AF liability, assuming a 2% disease
prevalence.
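The liability-scale estimate depends on the assumed population prevalence. As a minimal sketch, and not the authors' pipeline, the standard observed-to-liability transformation for case-control data can be written as follows. The function name and the observed-scale heritability passed in are hypothetical placeholders; the sample prevalence uses the European case and control counts reported in this study.

```python
from statistics import NormalDist


def observed_to_liability_h2(h2_obs, sample_prevalence, population_prevalence):
    """Convert observed-scale SNP heritability to the liability scale.

    Uses h2_liab = h2_obs * K^2 (1 - K)^2 / (z^2 * P (1 - P)), where K is the
    population prevalence, P the case fraction in the sample, and z the
    standard normal density at the liability threshold for prevalence K.
    """
    K, P = population_prevalence, sample_prevalence
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - K)  # liability threshold
    z = std_normal.pdf(threshold)            # density at the threshold
    return h2_obs * K ** 2 * (1.0 - K) ** 2 / (z ** 2 * P * (1.0 - P))


# Case fraction in the European meta-analysis (counts from the text).
cases, controls = 153_980, 1_611_415
P_sample = cases / (cases + controls)

# Hypothetical observed-scale value, for illustration only.
h2_liability = observed_to_liability_h2(0.05, P_sample, 0.02)
print(f"liability-scale h2 = {h2_liability:.3f}")
```

Because AF cases are over-represented in the sample relative to the assumed 2% prevalence, the conversion factor exceeds 1 in this setting.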
Fig. 1. Study design overview.
AF atrial fibrillation, AFR African, AMR Admixed American, EAS East
Asian, EUR European, GWAS genome-wide association study, SAS South
Asian.
The cross-population GWAS meta-analysis, which included 168,007 AF
cases, identified 525 loci that met genome-wide significance
(Fig. [70]2a). Thirteen loci demonstrated significant heterogeneity in
effect estimates (P[het] < 0.05/525, Supplementary Data [71]3). The
majority of risk alleles conferred small-to-moderate effect sizes, with
odds ratios (ORs) ranging from 1.0 to 1.3 per allele (Fig. [72]2b).
However, six lead SNPs had ORs exceeding 1.3; these were located at the
SORCS3, POLD1, AGBL4, [73]AC126283.1, PITX2, and FAM241A loci
(Fig. [74]2b). Among the 525 significant loci, the breakdown by
population revealed 483 loci in Europeans, 29 in East Asians, 5 in
Africans, and 2 in Admixed Americans (Fig. [75]2c). Two loci, near PITX2
and ZFHX3, were shared across these populations
(Fig. [76]2d).
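To illustrate how per-study estimates combine into a single per-allele odds ratio, the following is a minimal sketch of a fixed-effect, inverse-variance-weighted meta-analysis for one variant. The weighting scheme is an assumption, since this section does not state which meta-analysis model was used, and the input effect sizes are hypothetical.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # threshold marked in the Manhattan plot


def fixed_effect_meta(betas, ses):
    """Pool per-study log-odds ratios with inverse-variance weights.

    Returns the pooled log-odds ratio, its standard error, and a two-sided
    P value from the normal approximation.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return beta, se, p


# Hypothetical log-odds ratios for one SNP in three contributing studies.
beta, se, p = fixed_effect_meta([0.10, 0.12, 0.08], [0.02, 0.03, 0.04])
print(f"OR = {math.exp(beta):.3f}, P = {p:.2e}, "
      f"genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

Studies with smaller standard errors dominate the pooled estimate, which is one reason the locus counts track the sample size of each ancestry group.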
Fig. 2. Genetic loci associated with atrial fibrillation (AF) across
populations of different ancestries.
a Manhattan plot of GWAS associations. The x-axis represents the
genomic positions of SNPs across chromosomes, while the y-axis displays
the -log10(P) values, indicating the strength of the association. Each
dot represents a single SNP, positioned based on its genomic location
and statistical significance. The red dashed line marks the genome-wide
significance threshold (P = 5 × 10⁻⁸). The statistical test was
two-sided, and the Bonferroni-corrected significance level was applied.
b Scatter plot of minor allele frequency (MAF) versus effect size
(log-odds ratio) for variant-AF associations. Two gray dashed lines
indicate MAFs of 0.001 and 0.01. Loci with an odds ratio
> 1.3 are labeled with the gene name. c Distribution of loci
identified across GWAS of different ancestries. d Venn diagram of
shared and unique loci across ancestries. Two loci near PITX2 and ZFHX3
were identified as shared across European (EUR), East Asian (EAS),
African (AFR), and Admixed American (AMR) populations. Source data are
provided as a Source Data file.
Comprehensive gene prioritization refined pathway exploration
Using a systematic prioritization framework, we nominated a likely
causal gene at each of the 504 genome-wide significant loci,
acknowledging that this assignment is based on available functional
evidence and may not be definitive for all loci. Among these, 70 genes
harbored protein-altering variants, and 47% of prioritized genes had
≥ 80% agreement across available methods (Supplementary Data [79]4).
To gain mechanistic insights into AF, we performed pathway enrichment
analysis using these 504 prioritized genes. Enrichment analysis in the
Reactome database identified 5 out of 1131 pathways significantly
associated with AF after Bonferroni correction. Among these, muscle
contraction and cardiogenesis showed strong associations (Fig. [80]3a
and Supplementary Data [81]5). In addition, we conducted enrichment
analysis using the Gene Ontology (GO) database. After Bonferroni
correction, we identified 50 biological processes (BP), 7 cellular
components (CC), and 6 molecular functions (MF) (Fig. [82]3b and
Supplementary Data [83]6). GO enrichment analysis reinforced the role
of muscle development and heart contraction in AF onset while also
uncovering additional pathways, including cellular response to
transforming growth factor-beta (TGF-β), artery morphogenesis,
regulation of cell communication via electrical coupling, and actin
filament-based movement (Supplementary Data [84]6).
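As a generic illustration of pathway enrichment with Bonferroni correction, the sketch below runs a one-sided hypergeometric over-representation test for a single pathway. This is not necessarily the method used in this study, which is not specified in this section. The gene universe size, pathway size, and overlap count are hypothetical; only the 504 prioritized genes and the 1131 Reactome pathways come from the text.

```python
from math import comb


def hypergeom_upper_tail(n_universe, n_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_universe genes, of which n_pathway belong to the pathway."""
    total = comb(n_universe, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


N_PATHWAYS_TESTED = 1131                      # Reactome pathways (from the text)
bonferroni_alpha = 0.05 / N_PATHWAYS_TESTED

# Hypothetical pathway: 200 of 20,000 genes, 20 of them among the 504 prioritized.
p_value = hypergeom_upper_tail(20_000, 200, 504, 20)
print(f"P = {p_value:.2e}, significant after Bonferroni: "
      f"{p_value < bonferroni_alpha}")
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow in the tail sum, at the cost of some speed for large gene universes.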
Fig. 3. Pathways enriched based on AF-associated genes.
a Pathway enrichment in the Reactome database. The x-axis represents
the effect size of the pathway’s influence on AF, while the y-axis
shows the -log10(P) values, indicating statistical significance. Each
dot corresponds to a pathway, with blue dots representing pathways that
are significant after Bonferroni correction. b Pathway enrichment in
the Gene Ontology (GO) database. The analysis includes pathways
categorized under biological processes (BP), molecular functions (MF),
and cellular components (CC). The x-axis represents the ratio of
AF-associated genes to the total number of genes in each pathway, while
the y-axis lists the pathways. Each dot represents a pathway, where the
color reflects the Bonferroni-adjusted p-value, and the size indicates
the count of AF-associated genes in each pathway. For clarity, the
figure only highlights the top 10 out of 50 BP pathways due to space
constraints. Full results, including all pathways, are provided in
Supplementary Data [87]5 and Supplementary Data [88]6. The statistical
test was two-sided, and the Bonferroni-corrected significance level was
applied. Source data are provided as a Source Data file.
Population-specific genetic correlations uncovered circulatory comorbidities
After Bonferroni correction, AF was significantly associated with 95 of
128 circulatory endpoints in Europeans (Supplementary Data [89]7) and
18 of 95 in Africans (Supplementary Data [90]8). Among the traits
assessed for heterogeneity in genetic correlation with AF between
European and African populations, several phenotypes demonstrated
substantial population-specific differences. We identified conditions
such as first-degree atrioventricular block, abdominal aortic aneurysm,
varicose vein of lower extremity, deep vein thrombosis, tachycardia,
transient cerebral ischemia, and abnormal heart sounds as having
significantly heterogeneous genetic correlations with AF across
ancestries (Fig. [91]4).
Fig. 4. Heterogeneity between Europeans and Africans regarding genetic
correlations between atrial fibrillation and other circulatory endpoints.
The analysis was conducted using data from the Million Veteran Program
(MVP). The analysis involved 94 correlations in both Europeans and
Africans, and heterogeneity was defined by I² > 75% and P-value for
Cochran’s Q < 0.05. The statistical test was two-sided. Detailed
information on these genetic correlations is available in Supplementary
Data [93]7 and [94]8. Source data are provided as a Source Data file.
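The heterogeneity criterion in the caption (I² > 75% and Cochran's Q P < 0.05) can be sketched for the two-ancestry case as follows. With two estimates, Q has one degree of freedom, so its P value has a closed form. The genetic correlations and standard errors in the example are hypothetical, and the function name is illustrative.

```python
import math


def two_group_heterogeneity(est_a, se_a, est_b, se_b):
    """Cochran's Q, I^2, and the Q-test P value for two estimates.

    With two groups Q has 1 degree of freedom, so the chi-square tail
    probability equals erfc(sqrt(Q / 2)).
    """
    w_a, w_b = 1.0 / se_a ** 2, 1.0 / se_b ** 2
    pooled = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    q = w_a * (est_a - pooled) ** 2 + w_b * (est_b - pooled) ** 2
    df = 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    p = math.erfc(math.sqrt(q / 2.0))
    return q, i2, p


# Hypothetical genetic correlations with AF in two ancestry groups.
q, i2, p = two_group_heterogeneity(0.60, 0.05, 0.10, 0.10)
is_heterogeneous = i2 > 0.75 and p < 0.05
print(f"Q = {q:.2f}, I2 = {i2:.0%}, P = {p:.2e}, "
      f"heterogeneous: {is_heterogeneous}")
```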
Mendelian randomization revealed modifiable risk factors
Among the 37 modifiable risk factors, genetically predicted body mass
index (BMI), waist-to-hip ratio, visceral adiposity, childhood BMI,
apolipoprotein A-I levels, apolipoprotein B levels, low-density
lipoprotein (LDL) cholesterol levels, type 2 diabetes, systolic and
diastolic blood pressure, thyroid-stimulating hormone levels, smoking
initiation, lifetime smoking index, alcohol consumption, leisure screen
time, and insomnia were significantly associated with AF risk after
Bonferroni correction (Fig. [95]5). Scatter plots of the SNP effects
on these traits against their effects on AF are shown in Supplementary
Figs. [96]2–[97]17. These associations remained robust in sensitivity
analyses (Supplementary Data [98]9).
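For readers unfamiliar with the estimator, the following is a minimal sketch of the fixed-effects inverse variance weighted (IVW) method from summary statistics, assuming independent instruments. It does not reproduce MR-PRESSO outlier removal or the other sensitivity analyses, and the SNP effects shown are hypothetical.

```python
import math


def ivw_fixed_effect(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effects IVW estimate of the exposure-outcome effect.

    Equivalent to a weighted regression of SNP-outcome effects on
    SNP-exposure effects through the origin, with weights 1 / se_outcome^2.
    """
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


# Hypothetical SNP effects on a risk factor and on AF (log-odds scale).
bx = [0.10, 0.20, 0.15]
by = [0.03, 0.05, 0.04]
se = [0.010, 0.012, 0.011]

estimate, std_err = ivw_fixed_effect(bx, by, se)
low, high = estimate - 1.96 * std_err, estimate + 1.96 * std_err
print(f"OR per unit increase = {math.exp(estimate):.2f} "
      f"(95% CI {math.exp(low):.2f}-{math.exp(high):.2f})")
```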
Fig. 5. Genetically predicted associations between 37 modifiable traits and
atrial fibrillation (AF).
The estimates and p-values were derived using the inverse variance
weighted (IVW) method with a fixed-effects model for traits with ≤ 4
genetic instruments. For traits with > 4 genetic instruments, the
results were obtained from MR-PRESSO, accounting for potential
pleiotropic effects by removing outlier SNPs where applicable. Detailed
results are presented in Supplementary Data [101]9. Supplementary
Data [102]18 lists the number of instrumental variables, the sample
sizes of the source studies, and the units for each trait. The x-axis
represents the odds ratio (OR) of AF per unit increase in the
genetically predicted trait. Triangles indicate associations with
P < 0.05 after Bonferroni correction, while red and blue dots represent
positive and inverse associations, respectively. Data are presented as
ORs ± 95% confidence intervals. The statistical test was two-sided,
and the Bonferroni-corrected significance level was applied. Source
data are provided as a Source Data file.
Bidirectional protein-wide Mendelian randomization identified causal proteins
After pooling protein quantitative trait loci (pQTL) from deCODE and
UKB-PPP, the forward Mendelian randomization (MR) analysis (the effect
of genetically predicted protein levels on AF) included 2847 unique
proteins with cis genetic variants as the instrumental variables. After
retaining associations with P < 0.05 after Bonferroni correction and a
heterogeneity in dependent instruments (HEIDI) test P > 0.05, we
found that genetically predicted levels of 95 circulating proteins were
associated with AF risk (Fig. [103]6a and Supplementary Data [104]10).
Among these, 21 and 16 protein-AF associations showed strong
colocalization evidence (PPH4 > 0.8) using traditional colocalization
(Fig. [105]6b and Supplementary Data [106]11) and Sum of Single Effects
(SuSiE) colocalization (Fig. [107]6c and Supplementary Data [108]11),
respectively. In total, 28 proteins were deemed to have potential causal
associations with AF, with odds ratios of AF per one standard deviation
increment ranging from 0.61 (95% CI 0.49–0.75) for ING1 to 1.68 (95% CI
1.35–2.09) for ATXN2L. Among these 28 proteins, 18 had cis instruments
available in the Fenland study, and 17 associations were replicated with
P < 0.05, although the direction of association was reversed for
ICAM1, CCN3 (also known as NOV), and QSOX2 (Supplementary
Data [109]12).
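The forward analysis relies on summary-data-based MR (SMR) with a single cis-pQTL per protein. A minimal sketch of the SMR point estimate and test statistic is given below; it omits the HEIDI test, which needs LD information for multiple variants, and the pQTL and GWAS effects are hypothetical.

```python
import math


def smr_estimate(beta_pqtl, se_pqtl, beta_gwas, se_gwas):
    """SMR estimate of a protein's effect on the outcome from one cis-pQTL.

    b_xy = beta_gwas / beta_pqtl, with a delta-method variance. The statistic
    T = b_xy^2 / var equals z_pqtl^2 * z_gwas^2 / (z_pqtl^2 + z_gwas^2) and is
    compared with a chi-square distribution on 1 degree of freedom.
    """
    b_xy = beta_gwas / beta_pqtl
    var_xy = (se_gwas ** 2 / beta_pqtl ** 2
              + beta_gwas ** 2 * se_pqtl ** 2 / beta_pqtl ** 4)
    t_smr = b_xy ** 2 / var_xy
    p = math.erfc(math.sqrt(t_smr / 2.0))  # chi-square (1 df) upper tail
    return b_xy, math.sqrt(var_xy), t_smr, p


# Hypothetical top cis-pQTL: effect on protein level (SD units) and on AF (log-odds).
b_xy, se_xy, t_smr, p = smr_estimate(0.50, 0.05, 0.10, 0.02)
print(f"OR per SD higher protein = {math.exp(b_xy):.2f}, P = {p:.2e}")

N_PROTEINS = 2847  # proteins tested in the forward analysis (from the text)
print("passes Bonferroni:", p < 0.05 / N_PROTEINS)
```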
Fig. 6. Genetically predicted levels of 2847 proteins associated with atrial
fibrillation (AF).
We analyzed 2847 unique proteins with cis-instrumental variables
derived from the deCODE and UKB-PPP datasets. For proteins present in
both datasets, data from UKB-PPP were prioritized due to its larger
sample size. All associations were scaled to a one standard deviation
increase in genetically predicted protein levels. a volcano plot of
protein-AF associations using SMR analysis. The x-axis represents the
effect size of protein-AF associations, while the y-axis shows the
-log10(P) values. The statistical test was two-sided, and the
Bonferroni-corrected significance level was applied. Associations with
P < 0.05 after Bonferroni correction and HEIDI test P > 0.05 are
labeled. Red and blue dots indicate positive and inverse associations,
respectively. b traditional colocalization analysis results. Only
protein-AF associations with PPH4 > 0.7 are displayed due to space
constraints. The gray line indicates PPH4 = 0.8, a commonly used
threshold for strong colocalization evidence. c SuSiE colocalization
analysis results. Similar to panel b, only protein-AF associations with
PPH4 > 0.7 are shown. The gray line indicates PPH4 = 0.8. d forest plot
of associations meeting the criteria of Bonferroni-corrected P < 0.05,
HEIDI P > 0.05, and colocalization PPH4 > 0.8. Data are presented as
ORs ± 95% confidence intervals. The statistical test was two-sided,
and the Bonferroni-corrected significance level was applied. Source
data are provided as a Source Data file.
Seven protein targets have corresponding drugs in clinical trials or
approved for other indications; however, none have been explicitly
approved for treating AF (Supplementary Data [112]13). Nonetheless,
certain targets, such as ICAM1, ANGPT1, and MAPK3, may hold therapeutic
potential due to their roles in cardiovascular and inflammatory
pathways, which are implicated in AF pathophysiology.
In the reverse MR analysis, genetic liability to AF was associated with
levels of 16 unique proteins in deCODE or UKB-PPP (Supplementary
Data [113]14 and [114]15) after Bonferroni correction. In particular,
genetic liability to AF was associated with reduced levels of
N-terminal pro-brain natriuretic peptide (NT-proBNP). The association
for natriuretic peptide B was conflicting between deCODE and UKB-PPP.
Polygenic risk and protein score enhanced disease prediction
To evaluate the performance of the polygenic risk score (PGS) in an
independent dataset, we tested it in the Penn Medicine BioBank (PMBB),
which was not used for PGS derivation. The PGS
derived from this cross-population GWAS meta-analysis demonstrated a
dose-response association with AF prevalence in 4401 individuals with
AF and 32,760 individuals without AF from the PMBB (Fig. [115]7a–c).
Each standard deviation (SD) increase in PGS was associated with an
odds ratio (OR) of 1.82 (95% CI: 1.79–1.85) for AF. Compared to
individuals in the first decile of the PGS, those in the tenth decile
had a sixfold increased risk of AF (OR = 6.38, 95% CI: 5.30–7.75)
(Fig. [116]7b). Our PGS outperformed PGS002814 from the Miyazawa et al.
study, with an area under the receiver operating characteristic curve
(AUC) of 0.780 (95% CI: 0.778–0.783) and a Brier score of 0.092 (95% CI:
0.091–0.093), compared with an AUC of 0.767 (95% CI: 0.764–0.769) and a
Brier score of 0.094 (95% CI: 0.093–0.095) for PGS002814
(Fig. [117]7c). The DeLong test showed that the AUC of
the PGS derived from our GWAS meta-analysis was significantly higher
than that of the Miyazawa PGS (P < 2.2e-16).
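The two metrics reported here measure different things: the AUC captures discrimination (how well scores rank cases above non-cases), while the Brier score captures the accuracy of predicted probabilities. A minimal sketch of both, using hypothetical predictions, is given below; it does not implement the DeLong test or the confidence intervals.

```python
def auc_from_ranks(scores, labels):
    """AUC as the probability that a randomly chosen case scores higher
    than a randomly chosen non-case, counting ties as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    noncases = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in noncases
    )
    return wins / (len(cases) * len(noncases))


def brier_score(probabilities, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, labels)) / len(labels)


# Hypothetical predicted AF probabilities and observed status (1 = AF).
predicted = [0.9, 0.8, 0.3, 0.1]
observed = [1, 0, 1, 0]
print("AUC:", auc_from_ranks(predicted, observed))
print("Brier score:", brier_score(predicted, observed))
```

The pairwise AUC computation is quadratic in sample size; for a cohort the size of PMBB a rank-based implementation would be preferable in practice.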
Fig. 7. Polygenic risk score (PGS) and protein score (ProS) for atrial
fibrillation (AF) risk prediction.
The analyses for panels a–c were based on the Penn Medicine
Biobank (PMBB; 4401 individuals with prevalent AF and 32,760
individuals without), and the analysis for panel d was based on the UK
Biobank (3441 individuals with incident AF and 47,437 without). Panels
a and b show the prevalence and odds ratio of AF, respectively, across
deciles of our PGS vs. PGS002814 from the Miyazawa et al. study.
Data in panels a and b are presented as mean
values ± SD and ORs ± 95% confidence intervals, respectively. Panel
c compares the predictive ability of the two PGSs (AUC for our
PGS = 0.780 and AUC for PGS002814 = 0.767). Panel d compares the
predictive ability of the PGS, the ProS, and their combination. AUC, area
under the receiver operating characteristic curve. Source data are
provided as a Source Data file.
In a cohort of 3441 individuals with incident AF and 47,437 without,
with available proteomic and genetic profiles, we constructed a protein
score (ProS) using the LASSO method and a PGS to assess their
predictive value for AF risk. The ProS included 87 proteins listed in
Supplementary Methods. The ProS exhibited a positive association with
AF incidence (Supplementary Data [120]16) and demonstrated strong
predictive performance, achieving an AUC of 0.792 and a Brier score of
0.119 in the testing set (Fig. [121]7d). Similarly, the PGS also showed
a robust association with incident AF (Supplementary Data [122]17).
Adding the ProS to the PGS significantly enhanced the performance of AF
risk prediction. The combined model incorporating PGS and ProS achieved
an AUC of 0.823 and a Brier score of 0.059 (Fig. [123]7d). The combined
score incorporating both the PGS and ProS demonstrated superior
predictive performance compared to either PGS alone (P = 1.34 × 10⁻²¹)
or ProS alone (P = 0.009).
Discussion
In this large-scale cross-population GWAS meta-analysis of AF,
comprising 168,007 cases and 1,959,739 controls, we identified numerous
previously unreported genetic loci, refined the genetic architecture of
AF, and emphasized the importance of population-inclusive research in
uncovering both shared and population-specific risk variants. Notably,
our population-specific analysis revealed significant disparities in
genetic risk loci, with a majority identified in Europeans and
relatively few in non-Europeans. This imbalance likely reflects
differences in sample sizes across ancestries, underscoring the urgent
need to increase representation of underrepresented populations in
future genetic studies to ensure equitable and comprehensive genetic
discovery^[124]9.
While most risk alleles had small-to-moderate effect sizes, we
identified six lead SNPs with larger effect sizes at loci where
SORCS3, POLD1, AGBL4, [125]AC126283.1, PITX2, and FAM241A were
prioritized, suggesting stronger genetic contributions at these loci.
PITX2 has a well-documented role in AF through mechanisms involving
electrical and structural remodeling, as well as calcium
handling^[126]10–[127]12. AGBL4 has been associated with AF in previous
GWASs^[128]7,[129]13. However, the involvement of SORCS3, POLD1,
[130]AC126283.1, and FAM241A in AF remains to be clarified in
future studies.
PITX2 and ZFHX3 are well-established AF-associated genes; our findings
reaffirm their consistent association across four population
groups^[131]7,[132]8, further supporting their pivotal role in AF
susceptibility. Regarding mechanisms, a knockout mouse study showed
that ZFHX3 loss leads to atrial dysfunction, arrhythmogenic
remodeling, and increased AF susceptibility^[133]14. However, no drugs
targeting these two genes have been approved or developed; whether
they can serve as therapeutic targets remains to be
investigated.
We employed a comprehensive gene prioritization strategy, identifying
putative causal genes for 504 loci, providing functional insights into
AF pathogenesis. This approach enhanced pathway enrichment analyses,
reaffirming muscle contraction and cardiac development^[134]7 as core
AF mechanisms while uncovering additional pathways, including TGF-β
signaling, vascular remodeling^[135]15, electrical coupling, and
cytoskeletal regulation^[136]16. These findings highlight potential
therapeutic opportunities, such as targeting TGF-β-mediated fibrosis or
refining anti-arrhythmic strategies through ion channel
modulation^[137]17, paving the way for potential interventions in AF
prevention and treatment.
We observed significant heterogeneity in the genetic correlation
between AF and several circulatory phenotypes across European and
African populations. While these findings suggest the possibility of
population-specific differences in the shared genetic architecture
between AF and its comorbidities, we acknowledge that these results are
exploratory and require validation in independent cohorts. Due to the
limited number of prior genetic studies addressing population-specific
correlations for these traits, we refrain from drawing strong
conclusions about the direction or clinical implications of individual
trait differences. Instead, our findings underscore the broader need
for population-informed genetic analyses and increased representation
of diverse populations. Facilitating this type of research may improve
the accuracy of risk stratification, inform targeted screening
strategies, and reduce disparities in cardiovascular outcomes across
diverse patient populations.
Our MR analyses identified obesity^[138]18, type 2 diabetes^[139]19,
hypertension^[140]20, high TSH levels^[141]21, smoking^[142]22, and
insomnia^[143]23 as causal risk factors for AF, consistent with
previous studies. However, for dyslipidemia^[144]24, alcohol
consumption^[145]22, and sedentary behavior^[146]25—traits with
conflicting evidence in prior research—our well-powered MR analysis
leveraging a larger sample size strengthened their associations with
AF. Mechanistically, obesity, lipid imbalances, and hypertension may
drive atrial remodeling and inflammation, while smoking, alcohol
consumption, and insomnia could exacerbate autonomic dysfunction and
electrical instability, increasing AF susceptibility. Clinically, these
findings emphasize the need for targeted AF prevention strategies,
including weight management, lipid-lowering therapies, blood pressure
control, and behavioral interventions to reduce sedentary behavior.
Addressing these modifiable risk factors through lifestyle changes and
medical interventions could play a crucial role in reducing AF
incidence and its associated complications.
Our study identified 28 circulating proteins with potential causal
roles in AF, some of which have been previously associated with the
condition^[147]26,[148]27. Among these, our MR associations for
ICAM1^[149]28 and CD40^[150]29 were directionally opposite to prior
observational studies, likely reflecting compensatory or feedback
mechanisms^[151]30,[152]31. The positive association for FURIN aligns
with its role in pro-fibrotic and inflammatory pathways^[153]32, while
ADM’s association supports its involvement in vascular
regulation^[154]33, both of which may contribute to AF onset. Although
none of these proteins have been established as direct therapeutic
targets for AF, our findings provide valuable insights into AF
pathophysiology and highlight promising candidates for further
investigation^[155]30,[156]34. Nonetheless, we observed that a subset
of associations could not be replicated in the independent dataset.
While such discrepancies may arise from differences in genetic
regulation across populations, platform-specific variation in protein
quantification, or measurement error in replication analyses, they do
not necessarily invalidate the MR results. However, they do warrant
caution in interpreting these findings. Importantly, the associations
identified in our study are based on protein levels measured in
circulation and may not fully capture tissue-specific effects relevant
to AF pathogenesis. Further validation in independent cohorts and
functional characterization of these proteins in cardiac-relevant
tissues and models will be essential to confirm their causal roles and
assess their translational potential.
MR revealed that genetic liability to AF is paradoxically associated
with lower circulating NT-proBNP levels, in direct contrast to
case-control studies reporting elevated NT-proBNP among AF
patients^[157]35,[158]36. This discordance implies that the NT-proBNP
elevations seen in AF may largely reflect secondary hemodynamic stress
and atrial stretch rather than a primary effect of AF itself. Moreover,
we observed inconsistent associations between genetic liability to AF
and NPPB—the prohormone precursor to NT-proBNP—across two independent
proteomic datasets, underscoring additional complexity. Although
longitudinal cohorts have linked higher baseline NT-proBNP to
subsequent AF^[159]37, our SMR analyses did not support a causal
influence of genetically proxied NT-proBNP or NPPB on AF risk.
Together, these data argue against a simple, unidirectional causal
relationship between AF and NT-proBNP, and highlight the need for
detailed longitudinal and mechanistic studies to untangle cause from
consequence in the AF–NT-proBNP axis.
Our study highlights the strong predictive value of a PGS derived from
a cross-population GWAS, demonstrating superior performance compared to
previous PGSs^[160]8. The enhanced predictive accuracy has significant
implications for risk differentiation at the population level. In
addition, we developed a ProS and found that combining the ProS with
the PGS significantly improved risk prediction, aligning with findings
from prior studies. A UK Biobank-based study demonstrated improved
disease prediction when integrating a protein score with a clinical
score^[161]38, while another UK Biobank study found a significant
improvement when combining a protein score with a PGS^[162]39. Even
though different protein selection methods were used between previous
studies and the current study, the findings remained consistent.
Collectively, these results underscore the value of multi-omic
approaches in refining AF risk assessment. Future research should focus
on validating these models in diverse populations and evaluating their
potential clinical applications to further enhance personalized AF
prevention and management strategies.
This study has several limitations. First, although we included data
from non-European populations, the statistical power for these
ancestries may be limited due to smaller sample sizes, potentially
affecting the identification of population-specific associations.
Second, despite employing multiple prioritization strategies, some
degree of gene misassignment is likely inevitable due to the
limitations of current functional annotation resources. While many
genes at these loci were prioritized based on proximity to the lead
variant, we have explicitly noted when proximity was the sole criterion
and, where possible, incorporated supporting evidence from eQTL
colocalization and fine mapping to strengthen biological plausibility.
Third, although we excluded variants with a minor allele count
(MAC) < 50 to remove rare variants, a small number of variants with
minor allele frequency (MAF) < 1% remained in the analysis (5 out of
493 in European GWAS and 7 out of 525 in cross-population GWAS).
However, nearly half of these
variants were replicated in our independent replication dataset or have
been previously reported in association with AF in other studies. Given
their limited number and supporting evidence, we believe that the
inclusion of these variants does not materially affect the
comparability of our results with earlier GWAS. Fourth, while there
were some sample overlaps in the MR analysis, the potential bias is
likely minimal due to the small proportion of overlapping samples and
the strong validity of the genetic instrumental variables used. Fifth,
the inclusion of coding variants may alter epitope binding in
aptamer-based proteomic analyses for certain proteins, potentially
introducing measurement bias that could affect the accuracy of MR
results^[163]40. Lastly, all analyses were conducted using in silico
approaches, emphasizing the need for further validation through
functional studies and experimental research to confirm the biological
relevance of the identified associations.
In summary, this cross-population GWAS meta-analysis identified 525
genetic loci for AF, refining its genetic architecture and biological
pathways. Mendelian randomization revealed causal risk factors and
circulating proteins, offering insights for prevention and therapeutic
development. The cross-population-derived PGS, combined with a protein
score, significantly improved risk prediction. This study integrates
genetic discovery, causal inference, and multi-omic data, advancing AF
risk stratification, prevention, and potential therapeutic strategies.
Methods
Ethics
The study complied with all relevant regulations governing the use of
human participants and was conducted in accordance with the principles
of the Declaration of Helsinki. Participants in the FinnGen study
provided informed consent for biobank research, with the study protocol
(No. HUS/990/2017) approved by the Coordinating Ethics Committee of the
Hospital District of Helsinki and Uusimaa (HUS). The UK Biobank
received ethical approval from the North West Multi-center Research
Ethics Committee (approval number: 11/NW/0382), with all participants
giving informed consent. The Million Veteran Program (MVP) was approved
by the VA Central Institutional Review Board (IRB), and participants
provided informed consent. The Penn Medicine Biobank (PMBB) was
approved by the University of Pennsylvania Institutional Review Board,
and all participants gave informed consent. The Swedish Ethical Review
Authority granted ethical approval for SIMPLER and the current protocol
(no. 2019-03986), and all participants gave informed consent. Each
study adheres to rigorous ethical guidelines to ensure the protection
of participants and the integrity of the research.
Study design and participants
Figure [164]1 summarizes the study design. We first performed a GWAS
meta-analysis across eight studies as the discovery analysis in
European populations. This was followed by a replication analysis using
data from the UK Biobank, resulting in a European GWAS meta-analysis
that included a total of 153,980 AF cases and 1,611,415 controls. Next,
we extended the analysis to include data from East Asians, South
Asians, Africans, and Admixed Americans, enabling a cross-population
meta-analysis comprising 168,007 AF cases and 1,959,739 controls.
Detailed descriptions of included studies are shown in Supplementary
Methods and Supplementary Data [165]1. Using this large-scale AF GWAS,
we conducted comprehensive downstream analyses to prioritize related
genes, explore potential etiologies, assess genetic correlations,
identify risk factors, and evaluate risk prediction models.
Cross-population GWAS meta-analysis
Eight studies (the Nord-Trøndelag Health Study [HUNT], deCODE,
DiscoverEHR, Michigan Genomics Initiative [MGI], AFGen
consortium^[166]4, FinnGen R12, Swedish Infrastructure for Medical
Population-Based Life-Course and Environmental Research [SIMPLER,
[167]https://www.simpler4health.se/], and Million Veteran Program [MVP])
contributed to the discovery analysis for the European GWAS, comprising
117,905 atrial fibrillation (AF) cases and 1,239,541
controls^[168]6,[169]41,[170]42. We performed GWAS association testing
using individual-level genotype and phenotype data from participants in
the SIMPLER cohort. By incorporating replication data from the UK
Biobank (36,075 cases and 371,874 controls), the total sample size for
the European GWAS reached 153,980 cases and 1,611,415 controls. To
expand the analysis, we included data from four additional ancestries
represented in Biobank Japan^[171]43, Genes & Health^[172]44, and
MVP^[173]41, culminating in a cross-population meta-analysis with
168,007 AF cases and 1,959,739 controls. Detailed descriptions of the
study populations, genotyping procedures, and quality control protocols
are provided in the Supplementary Methods, while AF definitions and
sample sizes for each included study are summarized in Supplementary
Data [174]1.
Each dataset underwent rigorous quality control, including initial
preprocessing, genotype imputation, post-imputation filtering, and
association testing, with adjustments for age (or birth year), sex, and
principal components as covariates. Post-GWAS quality control was
performed using GWASinspector^[175]45, and SNPs with minor allele
counts < 50 were excluded. Meta-analyses were conducted using
METAL^[176]46, employing the fixed-effect inverse-variance-weighted
method. After meta-analysis, variants that were present in only one
cohort were excluded from downstream analysis.
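The fixed-effect inverse-variance-weighted scheme named above can be illustrated for a single variant as follows. This is a minimal sketch, not METAL itself; the function name `ivw_fixed_effect` and the example inputs are hypothetical and chosen only for illustration.

```python
import math


def ivw_fixed_effect(betas, ses):
    """Pool per-cohort effect estimates (e.g., log odds ratios) for one variant.

    Each cohort is weighted by the inverse of its squared standard error.
    Returns (pooled_beta, pooled_se, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, p_value


# Hypothetical example: two cohorts with identical estimates.
beta, se, p = ivw_fixed_effect([0.10, 0.10], [0.02, 0.02])
```

With two equally precise cohorts, the pooled estimate equals the shared estimate and the standard error shrinks by a factor of √2.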
We applied LDSC to evaluate the contributions of population
stratification and polygenicity to GWAS test statistic
inflation^[177]47. Although the genomic inflation factor (λGC) was
2.04, the LDSC intercept (1.34) and ratio (15%) indicated that most of
the inflation could be attributed to a true polygenic signal rather
than confounding biases. Genome-wide significant SNPs were grouped into
loci if they were within 1 Mb of each other^[178]8. Loci were defined
by (1) identifying genome-wide significant variants (P < 5 × 10^−8)
from association results, (2) extending the region by 500 kb on either
side of these variants, and (3) merging overlapping regions. Genetic
loci in the European analysis were defined based on a GWAS
meta-analysis that combined both the discovery and replication
datasets. This integrated approach maximized statistical power,
enabling the identification of several loci that reached genome-wide
significance only after the datasets were meta-analyzed. Loci were
annotated as unreported if they had no overlapping coordinates with
previously reported genome-wide significant variants (P < 5 × 10^−8)
associated with AF based on a comprehensive evaluation. This included
PheWAS lookups using the Open Targets platform
([179]https://genetics.opentargets.org/, integrating data from the GWAS
Catalog, UK Biobank, and FinnGen), as well as cross-referencing with
prior AF GWAS reports, including those by Thorolfsdottir et al.
(2017)^[180]48, Nielsen et al. (2018)^[181]6, Roselli et al. (2018,
2025)^[182]7,[183]49, Miyazawa et al. (2023)^[184]8, Verma et al.
(2024)^[185]41, Choi et al. (2025)^[186]50, and other relevant studies.
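The three-step locus definition described in this paragraph (select significant variants, extend by 500 kb on either side, merge overlapping regions) can be sketched as an interval merge. The function name `define_loci` and the example coordinates are hypothetical; treating windows that merely touch as overlapping is also an assumption of this sketch.

```python
def define_loci(hits, flank=500_000):
    """Merge +/- flank windows around genome-wide significant variants.

    hits: iterable of (chromosome, position) tuples for variants that
          already pass the significance threshold.
    Returns a list of (chromosome, start, end) tuples, one per locus.
    """
    windows = sorted(
        (chrom, max(0, pos - flank), pos + flank) for chrom, pos in hits
    )
    loci = []
    for chrom, start, end in windows:
        # Assumption: windows on the same chromosome that overlap or touch
        # are merged into a single locus.
        if loci and loci[-1][0] == chrom and start <= loci[-1][2]:
            loci[-1] = (chrom, loci[-1][1], max(loci[-1][2], end))
        else:
            loci.append((chrom, start, end))
    return loci


# Hypothetical positions: the first two variants fall into one locus.
example_loci = define_loci(
    [("1", 1_000_000), ("1", 1_800_000), ("1", 5_000_000), ("2", 1_000_000)]
)
```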
Gene prioritization
We applied six complementary gene prioritization approaches to identify
the most confident locus-gene pairs: (1) nearest gene annotation, (2)
MAGMA-based gene prioritization^[187]51, (3) Polygenic Priority Score
(PoPS)^[188]52, (4) eQTL colocalization, (5) CARMA (Credible-variant
Analysis for Regional Meta-Analysis)-based functional gene
prioritization^[189]53, and (6) transcriptome-wide association study
(TWAS)^[190]54. For each genomic locus, the prioritized gene was
determined by selecting the gene with the highest count of selections
across these six methods. In cases where multiple genes had the same
count, prioritization was refined by first considering genes encoding
variants within CARMA-identified credible sets, followed by the nearest
gene^[191]55. Below is a detailed description of each approach:
Nearest gene annotation
The gene closest to the lead SNP in each locus was identified based on
its physical distance to the gene body. This analysis was performed
using the get_nearest_gene() function from the gwasRtools R package
([192]https://github.com/lcpilling/gwasRtools).
MAGMA-Based gene prioritization
We utilized MAGMA to annotate genes within genomic loci using the 1000
Genomes Project as the reference panel^[193]51. SNPs were mapped to
genes based on their physical positions, including the gene body and
flanking regions (± 10 kb). Gene-level p-values were then calculated by
aggregating SNP association statistics while accounting for linkage
disequilibrium (LD) structure. The gene with the smallest p-value
within each locus was selected as the prioritized gene.
PoPS
PoPS, a similarity-based gene prioritization tool, integrates publicly
available datasets, such as RNA sequencing data, curated pathway
annotations, and predicted protein-protein interaction
networks^[194]52. Based on the premise that causal genes share similar
functional characteristics, PoPS calculates gene-level association
statistics using GWAS summary statistics and MAGMA-based gene
annotations. It then selects relevant features from precomputed
statistics and assigns a score to each gene, reflecting its likelihood
of being causal. For each genome-wide significant locus, genes within
1 Mb of the index variant (in both directions) were ranked by their
PoPS scores, with the highest-ranked gene prioritized.
eQTL Colocalization
Colocalization analysis was conducted using the coloc R package, which
applies an approximate Bayes factor framework to assess whether two
traits share a causal genetic signal^[195]56. Using the coloc.abf()
function ([196]https://github.com/chr1swallace/coloc), we calculated
posterior probabilities for five hypotheses: (H0) no association with
either trait; (H1/H2) association with only one trait; (H3) association
with both traits but different causal variants; and (H4) association
with both traits with the same causal variant. A high posterior
probability for H4 (PP4 > 0.8) was considered evidence of
colocalization^[197]57. For this analysis, we used eQTL data from
eQTLGen Phase I^[198]58 and the Genotype-Tissue Expression (GTEx)
Project v8^[199]59 for heart atrial appendage and heart left ventricle
tissues. Variants within 500 kb of each GWAS index variant were
extracted to perform colocalization analysis.
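To make the five-hypothesis framework concrete, the sketch below computes posterior probabilities from per-variant approximate Bayes factors. It follows the general form of the approximate Bayes factor approach rather than reproducing the coloc package; the prior probabilities (`p1`, `p2`, `p12`), the prior standard deviations, and all example effect sizes are assumptions made for illustration.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_diff_exp(a, b):
    """log(exp(a) - exp(b)); returns -inf if the difference is not positive."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))


def wakefield_labf(beta, se, prior_sd):
    """Log approximate Bayes factor for one variant (Wakefield form)."""
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    z = beta / se
    return 0.5 * (math.log(1.0 - r) + r * z ** 2)


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 from two aligned lists of log ABFs.

    p1, p2, p12 are assumed prior probabilities that a variant is associated
    with trait 1 only, trait 2 only, or both traits.
    """
    s1 = log_sum_exp(labf1)
    s2 = log_sum_exp(labf2)
    s12 = log_sum_exp([a + b for a, b in zip(labf1, labf2)])
    log_h = {
        "H0": 0.0,
        "H1": math.log(p1) + s1,
        "H2": math.log(p2) + s2,
        "H3": math.log(p1) + math.log(p2) + log_diff_exp(s1 + s2, s12),
        "H4": math.log(p12) + s12,
    }
    denom = log_sum_exp(list(log_h.values()))
    return {h: math.exp(v - denom) for h, v in log_h.items()}


# Hypothetical (beta, se) pairs for three variants; the first variant carries
# a strong signal for both the GWAS trait and gene expression.
gwas = [(0.30, 0.05), (0.05, 0.05), (0.025, 0.05)]
eqtl = [(0.275, 0.05), (0.04, 0.05), (0.015, 0.05)]
labf_gwas = [wakefield_labf(b, s, prior_sd=0.2) for b, s in gwas]
labf_eqtl = [wakefield_labf(b, s, prior_sd=0.15) for b, s in eqtl]
posteriors = coloc_posteriors(labf_gwas, labf_eqtl)
colocalized = posteriors["H4"] > 0.8
```

In this toy input both traits peak at the same variant, so nearly all posterior mass falls on H4.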
CARMA-Based functional gene prioritization
We applied CARMA, a Bayesian fine-mapping approach^[200]53, to identify
credible sets of variants within each genomic locus. CARMA accounts for
LD structure and aggregates association signals across studies or
populations to identify variants most likely to be causal. For each
locus, CARMA generated a credible set with a high posterior probability
(e.g., 95%) of containing the causal variant(s). Functional annotation
of these variants was performed using Open Targets
([201]https://www.opentargets.org/), which provides information on
coding, regulatory, and splicing effects^[202]60. If a causal variant
was located within or directly affected a gene’s function, that gene
was assigned to the locus.
TWAS
We performed TWAS using MetaXcan^[203]54 to estimate the relationship
between genetically predicted gene expression and AF. MetaXcan
integrates GWAS summary statistics with precomputed gene expression
prediction models to identify genes associated with the phenotype. For
this analysis, we used expression prediction models for the heart
atrial appendage, artery tibial, and heart left ventricle, leveraging
LD reference data from GTEx v8^[204]59 and cross-population AF-GWAS
summary statistics. For the TWAS, the target tissues were selected
based on results from MAGMA tissue enrichment analysis and
stratified-LDSC^[205]61, both of which were conducted using gene
expression data from GTEx v8. MAGMA tissue enrichment analysis
identifies tissues where genes associated with the trait of interest
are significantly enriched by testing the relationship between GWAS
association signals and tissue-specific gene expression profiles.
S-LDSC further refines this by partitioning heritability across genomic
regions annotated with tissue-specific gene expression and estimating
the contribution of each tissue to the trait heritability. Using these
complementary approaches, tissues such as the heart atrial appendage,
heart left ventricle, and artery tibial were identified as relevant for
atrial fibrillation (Supplementary Fig. [206]18). These selected
tissues were then used to predefine the expression prediction models
for the TWAS. Bonferroni correction was applied to account for multiple
testing, and the gene with the lowest p-value within each locus was
prioritized.
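The selection rule stated at the start of this section (highest count across the six methods, ties broken first by CARMA credible-set membership and then by the nearest gene) can be sketched as below. The function name, the method labels, and the gene names are hypothetical, and the final alphabetical fallback is an assumption that the source does not specify.

```python
from collections import Counter


def prioritize_gene(method_calls, carma_genes=(), nearest_gene=None):
    """Pick one gene per locus from per-method gene calls.

    method_calls: dict mapping method name -> gene symbol (or None).
    carma_genes:  genes with variants in the CARMA credible set.
    nearest_gene: gene nearest to the lead SNP.
    """
    counts = Counter(g for g in method_calls.values() if g)
    if not counts:
        return None
    top = max(counts.values())
    tied = sorted(g for g, c in counts.items() if c == top)
    if len(tied) == 1:
        return tied[0]
    # Tie-break 1: prefer genes supported by the CARMA credible set.
    carma_set = set(carma_genes)
    pool = [g for g in tied if g in carma_set] or tied
    if len(pool) == 1:
        return pool[0]
    # Tie-break 2: prefer the nearest gene if it is among the candidates.
    if nearest_gene in pool:
        return nearest_gene
    # Assumption: fall back to alphabetical order if still tied.
    return pool[0]


calls = {
    "nearest": "GENE_A", "magma": "GENE_A", "pops": "GENE_B",
    "eqtl_coloc": "GENE_B", "carma": None, "twas": None,
}
chosen = prioritize_gene(calls, carma_genes=[], nearest_gene="GENE_A")
```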
Pathway enrichment
Pathway enrichment analysis was performed to identify biological
pathways and functional categories associated with the prioritized
genes. Reactome^[207]62 enrichment was conducted using Enrichr
([208]https://maayanlab.cloud/Enrichr/), enabling the exploration of
curated pathways^[209]63. Gene Ontology (GO) enrichment
analysis^[210]64, which provided insights into biological processes,
molecular functions, and cellular components, was carried out using the
enrichGO function from the clusterProfiler Bioconductor R package
([211]https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html).
To minimize false-positive findings, Bonferroni correction
was applied to account for multiple testing, with the significance
threshold set at P < 0.05/number of tests performed.
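As a small worked illustration of the threshold P < 0.05/number of tests, the sketch below flags which p-values pass. The function name and the example p-values are hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the Bonferroni threshold and a pass/fail flag per p-value."""
    if not p_values:
        raise ValueError("p_values must not be empty")
    threshold = alpha / len(p_values)
    return threshold, [p < threshold for p in p_values]


# Four hypothetical enrichment tests: threshold is 0.05 / 4 = 0.0125.
threshold, flags = bonferroni_significant([0.001, 0.02, 0.04, 0.0001])
```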
Population-specific genetic correlations with circulatory endpoints
Using LDSC, we calculated the genetic correlations of AF with 130 and
97 circulatory endpoints defined by phecodes, separately for Europeans
and Africans in the MVP cohort. The MVP GWAS included up to 449,042
European participants and 121,177 African participants^[212]41. Genetic
correlations with rg > 1.25 or < −1.25 were removed because the
heritability (h^2) estimates were very close to zero. To account for
multiple testing and reduce the likelihood of false-positive results,
the Bonferroni correction was applied. To objectively compare genetic
correlations between populations, we applied Cochran’s Q test to assess
heterogeneity in the correlation estimates between European and African
populations. Traits were considered to exhibit population-specific
differences if they showed evidence of substantial heterogeneity,
defined as an I² statistic greater than 75% and a Cochran’s Q test
P-value less than 0.05.
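The heterogeneity criteria above can be illustrated for a pair of genetic correlation estimates. This is a sketch under the assumption that each estimate is summarized by a point estimate and standard error and that the two populations are treated as two independent studies; the function name and input values are hypothetical.

```python
import math


def heterogeneity_two_groups(est_a, se_a, est_b, se_b):
    """Cochran's Q, I-squared (%), and p-value for two estimates (df = 1)."""
    estimates = [est_a, est_b]
    weights = [1.0 / se_a ** 2, 1.0 / se_b ** 2]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(q / 2.0))
    return q, i2, p_value


# Hypothetical genetic correlations in two populations.
q, i2, p = heterogeneity_two_groups(0.60, 0.05, 0.20, 0.10)
population_specific = i2 > 75.0 and p < 0.05
```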
Mendelian randomization analysis for modifiable risk factors
MR is an analytical approach that strengthens causal inference by
leveraging genetic variants as instrumental variables (IVs) to estimate
the causal effect of an exposure on an outcome. A comprehensive
description of the MR design is provided in the Supplementary Methods.
Using GWAS meta-analysis data, we conducted MR to evaluate the
associations between 37 modifiable risk factors and AF risk. These
modifiable factors span multiple categories, including adiposity, blood
lipids, type 2 diabetes and glycemic traits, other metabolic traits
(e.g., blood pressure, thyroid function, and kidney function),
lifestyle factors (e.g., smoking, alcohol and coffee consumption, and
physical activity), sleep behaviors, and dietary factors (e.g.,
circulating levels of vitamins and minerals). The selection of these
factors was guided by a recent comprehensive review of AF risk
factors^[213]65. Detailed information on the GWAS data sources for
these traits is summarized in Supplementary Data [214]18.
Genetic variants associated with the exposures were selected at a
genome-wide significance threshold of P < 5 × 10^−8. To ensure
independence among instrumental variables, SNPs were pruned at
r^2 < 0.01, minimizing the effects of collinearity due to LD. The
strength of the instrumental variables was assessed using
F-statistics^[215]66, with all variants meeting the threshold of
F > 10. Data harmonization was performed to align effect and non-effect
alleles consistently between the exposures and outcomes. Detailed
information on the used genetic instruments is presented in
Supplementary Data [216]19.
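The instrument-strength check can be sketched with a common per-variant approximation, F ≈ (β/SE)². The source does not state which F-statistic formula was used, so this approximation, the function names, and the example values are all assumptions.

```python
def f_statistic(beta, se):
    """Approximate per-variant F-statistic from summary statistics."""
    return (beta / se) ** 2


def strong_instruments(instruments, threshold=10.0):
    """Keep (snp, beta, se) tuples whose approximate F exceeds the threshold."""
    return [
        (snp, beta, se)
        for snp, beta, se in instruments
        if f_statistic(beta, se) > threshold
    ]


# Hypothetical instruments: the second one has F = 4 and is dropped.
kept = strong_instruments([("snp1", 0.10, 0.01), ("snp2", 0.02, 0.01)])
```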
For exposures with fewer than five genetic instruments, the inverse
variance weighted (IVW) method with a fixed-effects model was used. For
exposures with five or more genetic instruments, we employed MR-PRESSO
as the primary analysis method, as it accounts for pleiotropic effects
by identifying and removing outlier SNPs^[217]67. In the absence of
outlier SNPs, MR-PRESSO provides estimates equivalent to the IVW
method. Sensitivity analyses included the IVW method with random
effects, the weighted median method^[218]68, and MR-Egger
regression^[219]69. Heterogeneity among SNP-specific estimates was
assessed using Cochran’s Q test, while the MR-Egger intercept test was
used to evaluate the presence of horizontal pleiotropy. The scatter
plot was used to visualize potential pleiotropic SNPs. To minimize
false-positive findings, we applied Bonferroni correction to account
for multiple testing.
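For reference, the fixed-effects IVW estimator mentioned above can be written in a few lines from summary statistics. This sketch uses first-order weights that ignore uncertainty in the SNP-exposure estimates, which is an assumption; it does not implement MR-PRESSO, weighted median, or MR-Egger. The function name and input values are hypothetical.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effects inverse-variance-weighted MR estimate.

    Each SNP contributes a Wald ratio (beta_outcome / beta_exposure)
    weighted by beta_exposure**2 / se_outcome**2.
    Returns (causal_estimate, standard_error).
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    return estimate, math.sqrt(1.0 / total_weight)


# Hypothetical instruments whose Wald ratios all equal 0.5.
bx = [0.10, 0.20, 0.15]
by = [0.05, 0.10, 0.075]
estimate, se = ivw_mr(bx, by, [0.01, 0.01, 0.01])
```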
MR and colocalization analyses for circulating proteins
For the MR analysis of circulating proteins, we utilized two
large-scale pQTL (protein quantitative trait loci) datasets,
deCODE^[220]70 and UKB-PPP^[221]71, for IV selection (Supplementary
Fig. [222]19). After excluding overlapping proteins, a total of 2847
proteins with cis-SNPs were included in the analysis. For proteins
present in both datasets, we prioritized data from UKB-PPP due to its
larger sample size and the fact that it identified a greater number of
cis-pQTLs using the Olink platform^[223]72. Importantly, the associations of
overlapping proteins with AF showed strong consistency between the two
datasets, supporting the robustness of the findings. To validate the
results, we used the Fenland study as a replication dataset^[224]73,
focusing on proteins with available IVs in this study to replicate the
observed associations.
We used the lead cis-SNP associated with plasma protein levels at
P < 5 × 10^−8 as the genetic IV. Cis-SNPs were defined as variants
located within 250 kb of the encoding gene. Detailed information on
selected genetic IVs is shown in Supplementary Data [225]20. The
Summary-data-based Mendelian Randomization (SMR) method was employed to
estimate the association between genetically predicted protein levels
and AF risk^[226]74. SMR integrates GWAS and pQTL summary statistics to
evaluate whether the genetic association with a phenotype (i.e., AF) is
mediated through the genetically regulated protein levels. To evaluate
potential pleiotropy, we performed HEIDI (Heterogeneity in Dependent
Instruments) analysis^[227]74. HEIDI assesses whether the association
between the protein and the phenotype is driven by the same causal
variant or by independent variants in LD. The analysis uses 3–20 SNPs
in the cis region of the encoding gene to test for heterogeneity. A
HEIDI p-value > 0.05 suggests no evidence of pleiotropy and supports
the hypothesis of a shared causal variant. To further rule out
false-positive associations caused by LD, we conducted
traditional^[228]56 and SuSiE (Sum of Single Effects)^[229]75
colocalization analyses, using all SNPs in the cis gene region as
input. As described in detail in the eQTL colocalization section,
strong evidence of shared causal variants between protein levels and AF
was indicated by PP.H4 ≥ 0.8, a stringent but widely accepted threshold
in colocalization studies. We applied Bonferroni correction to account
for multiple testing in the SMR analysis. Associations were considered
potentially causal if they met the following criteria: adjusted P < 0.05
for the SMR analysis, adjusted P > 0.05 for the HEIDI test (indicating no
pleiotropy), and colocalization posterior probability PP.H4 ≥ 0.8.
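The single-instrument SMR estimate and the three-part filter can be sketched as follows. The test statistic uses the standard SMR approximation built from the two z-scores; the HEIDI p-value is taken as an input because the HEIDI procedure itself is not reproduced here. Function names, the number of tests, and all numeric inputs are hypothetical.

```python
import math


def smr_test(b_zx, se_zx, b_zy, se_zy):
    """SMR effect and test for one cis instrument.

    b_zx, se_zx: SNP effect on protein level (pQTL) and its standard error.
    b_zy, se_zy: SNP effect on the outcome (GWAS) and its standard error.
    Returns (b_xy, t_smr, p_smr); t_smr is approximately chi-square, df = 1.
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    b_xy = b_zy / b_zx
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)
    p_smr = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, t_smr, p_smr


def bonferroni_adjust(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


def is_candidate(p_smr_adjusted, p_heidi, pp_h4):
    """Apply the three criteria: SMR, HEIDI, and colocalization."""
    return p_smr_adjusted < 0.05 and p_heidi > 0.05 and pp_h4 >= 0.8


# Hypothetical protein with z-scores of 10 (pQTL) and 5 (GWAS).
b_xy, t_smr, p_smr = smr_test(0.5, 0.05, 0.1, 0.02)
# n_tests = 1000 is an illustrative value, not the study's actual count.
candidate = is_candidate(bonferroni_adjust(p_smr, 1000), p_heidi=0.30, pp_h4=0.90)
```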
The druggability of identified proteins was assessed using multiple
drug databases, including DrugBank^[230]76, DepMap^[231]77, and
OpenTargets^[232]60. Based on their therapeutic potential, proteins
were classified into five categories: (1) approved drug targets, (2) in
clinical trials, (3) preclinical candidates, (4) druggable, and (5) not
currently listed as druggable targets.
To examine the effect of genetic liability to AF on blood proteins, we
conducted a reverse MR analysis using 624 SNPs as instrumental variables
for AF (P < 5 × 10^−8 and r^2 for linkage disequilibrium < 0.01) and
protein GWAS data from deCODE and UKB-PPP. Bonferroni correction was
used for multiple testing.
Joint performance of PGS and protein score (ProS)
PGS analysis
The weights of the polygenic scores (PGS) in the current study were
generated using the “auto” setting of PRS-CSx^[233]78, incorporating
summary statistics from the meta-analysis and corresponding EUR, AFR,
AMR, EAS, or SAS LD reference panels derived from 1000 Genomes Project
Phase 3 samples. This approach eliminates the need for independent
training data. The effective sample size was calculated as
4/((1/ncases) + (1/ncontrols)). For the score applied to the UK
Biobank, weights were derived using data that excluded summary
statistics from UK Biobank participants. As a reference, we used the
PGS (PGS002814) from the Miyazawa et al. study^[234]8, which was
derived using the Pruning and Thresholding method (r^2 = 0.5 and
P = 5 × 10^−4; [235]https://www.pgscatalog.org/score/PGS002814/). We
used the DeLong test^[236]79, implemented in the pROC R package, to
statistically compare the AUROC of the polygenic scores and evaluate
whether the difference in predictive performance was significant. We
calculated PGS for 4401 individuals with AF and 32,760 individuals
without from the Penn Medicine BioBank (PMBB)^[237]80, an ongoing study
that integrates genomic and electronic health record data to
investigate the genetic and clinical determinants of various diseases.
The population breakdown of PMBB participants is predominantly European
(∼ 70%), followed by African (∼ 25%), with smaller proportions of South
Asian, East Asian, Admixed American, and other populations
(Supplementary Fig. [238]20). The study was approved by the University
of Pennsylvania Institutional Review Board. To standardize the scores,
we applied a principal component analysis-based method, normalizing
both the mean and variance to the 1000 Genomes reference panel. The
association between PGS and prevalent AF was assessed using a
generalized linear regression model with a logit link, adjusting for
age and sex as covariates. We evaluated the PGS effect size using odds
ratios and assessed model performance by calculating the area under the
receiver operating characteristic curve (AUROC) and Brier score. Using
the ‘tidymodels’ R package ([239]https://github.com/tidymodels), we
performed V-fold cross-validation to validate model performance. The
same approach was used to test PGS performance in the UK Biobank,
including 3441 individuals with incident AF and 47,437 without incident
AF with available proteomic profiles.
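Three quantities from this paragraph lend themselves to a short illustration: the effective sample size formula, the Brier score, and the AUROC. The sketch below uses plain Python with a pairwise AUROC that is only practical for small inputs; the function names and the toy labels and scores are hypothetical, and the study itself used R packages for these steps.

```python
def effective_sample_size(n_cases, n_controls):
    """Effective sample size: 4 / ((1 / n_cases) + (1 / n_controls))."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true, y_score):
    """Probability that a random case scores higher than a random control.

    Ties count as one half. Quadratic in sample size, so for illustration only.
    """
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Toy data: two controls (0) and two cases (1) with predicted probabilities.
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(labels, probs)
toy_brier = brier_score(labels, probs)
n_eff = effective_sample_size(168_007, 1_959_739)
```

Because controls greatly outnumber cases, the effective sample size is driven mainly by the number of cases and stays below four times that number.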
Protein score analysis
We derived a protein score (ProS) for AF using individual-level data
from the UK Biobank, a large, ongoing population-based prospective
cohort study with extensive proteomic and phenotypic data. To rule out
proteins with reverse associations, we first conducted a prospective
cohort analysis. Participants with baseline AF or those diagnosed with
AF within the first two years of follow-up were excluded, leaving
50,878 participants with proteomic data. Proteins with a missing rate
exceeding 30% were also excluded, resulting in a final dataset of 2920
proteins. After adjusting for age, sex, ethnicity, Townsend deprivation
index, education, body mass index, smoking status, drinking status, and
physical activity, 459 proteins were significantly associated with
incident AF after Bonferroni correction (Supplementary Data [240]21).
We then used the Least Absolute Shrinkage and Selection Operator
(LASSO) method to construct the ProS^[241]81. We applied LASSO logistic
regression to identify candidate proteins associated with AF, using
five-fold cross-validation to determine the optimal penalty parameter
(λ). A weighted protein score (ProS) was then constructed based on the
proteins selected via LASSO. Specifically, a Cox regression model was
used to estimate the log-hazard ratios for each protein and the
baseline hazard function. The individual risk score for each
participant was subsequently calculated using:
$\text{Risk Score} = h_0(t) \times \exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n)$,
where $X_n$ is the level of the n-th selected protein, and $\beta_n$ is the
corresponding coefficient from the Cox model. Participants were
randomly split into training and validation cohorts in a 7:3 ratio
using the R package caret ([242]https://github.com/topepo/caret). The
model demonstrating the best predictive performance in the training
cohort was then validated in the remaining 30% of participants and
ultimately combined into a final model for predicting the risk of AF
onset.
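The risk score formula can be evaluated directly once the Cox coefficients are available. The sketch below assumes the coefficients and baseline hazard have already been estimated elsewhere; the function name and all numeric values are hypothetical.

```python
import math


def risk_score(protein_levels, coefficients, baseline_hazard=1.0):
    """Risk score = h0(t) * exp(beta_1 * X_1 + ... + beta_n * X_n).

    protein_levels:  measured levels X_1..X_n of the selected proteins.
    coefficients:    Cox log-hazard ratios beta_1..beta_n.
    baseline_hazard: h0(t) at the time point of interest; with the default
                     of 1.0 the result is the relative hazard.
    """
    if len(protein_levels) != len(coefficients):
        raise ValueError("protein_levels and coefficients must match in length")
    linear_predictor = sum(b * x for b, x in zip(coefficients, protein_levels))
    return baseline_hazard * math.exp(linear_predictor)


# Hypothetical participant with two selected proteins.
score = risk_score([1.0, 2.0], [0.5, -0.25], baseline_hazard=0.01)
```

Here the two terms of the linear predictor cancel, so the score equals the baseline hazard.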
Joint performance
The AUROC analysis was performed to assess the predictive performance
of the selected key proteins for AF, both individually and in
combination with the PGS in the UK Biobank. The DeLong test^[243]79 was
used to test whether differences in AUROC between these scores were
statistically significant.
Reporting summary
Further information on research design is available in the [244]Nature
Portfolio Reporting Summary linked to this article.
Acknowledgements