Abstract

   Atrial fibrillation (AF) is a common cardiac arrhythmia with strong
   genetic components, yet its underlying molecular mechanisms and
   potential therapeutic targets remain incompletely understood. We
   conducted a cross-population genome-wide meta-analysis of 168,007 AF
   cases and identified 525 loci that met genome-wide significance. Two
   loci, near PITX2 and ZFHX3, were shared across populations of
   different ancestries. Comprehensive gene prioritization
   approaches reinforced the role of muscle development and heart
   contraction while also uncovering additional pathways, including
   cellular response to transforming growth factor-beta.
   Population-specific genetic correlations uncovered common and unique
   circulatory comorbidities between Europeans and Africans. Mendelian
   randomization identified modifiable risk factors and circulating
   proteins, informing disease prevention and drug development.
   Integrating genomic data from this cross-population genome-wide
   meta-analysis with proteomic profiling significantly enhanced AF risk
   prediction. This study advances our understanding of the genetic
   etiology of AF while also enhancing risk prediction, prevention
   strategies, and therapeutic development.

   Subject terms: Cardiovascular genetics, Risk factors, Predictive
   markers, Genome-wide association studies
     __________________________________________________________________

   Atrial fibrillation has a strong genetic basis, but key mechanisms
   remain unclear. Here, the authors show that a cross-population GWAS
   meta-analysis identifies 525 loci and highlights shared and
   ancestry-specific pathways relevant to AF risk and therapeutic
   targeting.

Introduction

   Atrial fibrillation (AF) is the most common arrhythmia, characterized
   by disorganized atrial depolarizations, which can lead to symptoms
   including palpitations and decreased exercise capacity, as well as more
   serious complications. With an aging global population, AF has become
   an epidemic and important health issue with increasing incidence and
   prevalence^[56]1, particularly in North America and Europe^[57]2. The
   Global Burden of Disease 2019 Study estimated that approximately 59.7
   million individuals live with AF, which is associated with 8.4 million
   disability-adjusted life years worldwide^[58]3. Hence, there is an
   urgent need to elucidate the etiological basis of AF to improve risk
   prediction, prevention and treatment.

   While environmental factors play a role in AF development, the genetic
   contribution to AF susceptibility has been increasingly recognized.
   Multiple genome-wide association studies (GWASs) have uncovered over
   100 risk loci, shedding light on AF’s genetic architecture^[59]4–[60]8.
   However, existing studies have largely been conducted in European
   populations, and a larger, more diverse GWAS, particularly one
   spanning multiple populations, could enhance the discovery of variants
   with smaller effects, as well as population-specific and shared loci.

   Genomic data are now widely leveraged to improve disease risk
   prediction, identify risk factors, and facilitate therapeutic
   development. While some studies have explored these aspects for AF,
   integrating cross-population genetic data with proteomic insights in a
   large-scale study could further refine the identification of genetic
   signals, associated comorbidities, causal risk factors, and potential
   drug targets. Thus, we conducted a cross-population GWAS meta-analysis
   involving over 2 million individuals and performed comprehensive
   downstream analyses to uncover unreported genetic loci, identify causal
   risk factors, and enhance AF risk prediction and therapeutic
   opportunities.

Results

Cross-population GWAS meta-analysis identified 379 unreported loci

   We conducted cross-population GWAS meta-analyses and a series of
   downstream analyses on AF (Fig. [61]1). The European meta-analysis,
   which included 153,980 AF cases from nine studies (Supplementary
   Data [62]1), identified 493 genetic loci reaching genome-wide
   significance (Supplementary Data [63]2). Among these, five loci showed
   significant evidence of heterogeneity in effect estimates across the
   contributing GWAS (heterogeneity test; P  <  0.05/493, Supplementary
   Data [64]2). Of the 493 loci, 479 displayed consistent effect estimates
   between the discovery and replication datasets (Supplementary
   Data [65]2 and Supplementary Fig. [66]1), and 426 had P < 0.05 in the
   replication study (Supplementary Data [67]2). Linkage disequilibrium
   score regression (LDSC) with the 1000 Genomes European reference panel
   estimated that common variants explain 11.2% (95% CI: 9.2%–13.2%) of
   the variance in AF liability, assuming a 2% disease prevalence.
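
   The liability-scale figure depends on the assumed prevalence. As a
   rough illustration, the sketch below applies the standard
   observed-to-liability transformation used for case-control
   heritability estimates. The function name and the example inputs are
   hypothetical, and the study's actual LDSC settings (including the
   sample case fraction) are not stated in this section.

```python
from statistics import NormalDist


def liability_scale_h2(h2_observed, population_prevalence, sample_case_fraction):
    """Convert observed-scale SNP heritability to the liability scale.

    Uses h2_liab = h2_obs * [K(1-K)]^2 / (z^2 * P(1-P)), where K is the
    population prevalence, P the case fraction in the sample, and z the
    standard normal density at the liability threshold for K.
    """
    K = population_prevalence
    P = sample_case_fraction
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - K)  # liability cut-off for being a case
    z = std_normal.pdf(threshold)            # density at that cut-off
    return h2_observed * (K * (1.0 - K)) ** 2 / (z ** 2 * P * (1.0 - P))


# Hypothetical inputs: observed-scale h2 of 0.02, 2% prevalence (as assumed
# in the text), and an illustrative 7% case fraction in the GWAS sample.
example_h2 = liability_scale_h2(0.02, 0.02, 0.07)
```

   A lower assumed prevalence or a sample enriched for cases changes the
   scaling factor, which is why the prevalence assumption is reported
   alongside the estimate.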

Fig. 1. Study design overview.

   AF atrial fibrillation, AFR African, AMR Admixed American, EAS East
   Asian, EUR European, GWAS genome-wide association study, SAS South
   Asian.

   The cross-population GWAS meta-analysis, which included 168,007 AF
   cases, identified 525 loci that met genome-wide significance
   (Fig. [70]2a). Thirteen loci demonstrated significant heterogeneity in
   effect estimates (P[het]  <  0.05/525, Supplementary Data [71]3). The
   majority of risk alleles conferred small-to-moderate effect sizes, with
   odds ratios (ORs) ranging from 1.0 to 1.3 per allele (Fig. [72]2b).
   However, six lead SNPs had ORs exceeding 1.3 and were located in loci
   with genes SORCS3, POLD1, AGBL4, [73]AC126283.1, PITX2, and FAM241A
   (Fig. [74]2b). Among the 525 significant loci, the breakdown by
   population revealed 483 loci in Europeans, 29 in East Asians, 5 in
   Africans, and 2 in Admixed Americans (Fig. [75]2c). Two loci, near
   PITX2 and ZFHX3, were shared across these populations (Fig. [76]2d).
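
   To make the pooling step concrete, the sketch below combines
   per-study log-odds ratios for a single variant with an
   inverse-variance-weighted fixed-effect model and compares the result
   with the genome-wide threshold of 5 × 10⁻⁸. The fixed-effect model is
   an assumption (this section does not name the meta-analysis model
   used), and the effect sizes are invented for illustration.

```python
import math

GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect pooling of log-odds ratios.

    Returns the pooled beta, its standard error, and a two-sided P value
    from a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return beta, se, p


# Hypothetical per-ancestry estimates for one variant (log-odds scale).
betas = [0.10, 0.12, 0.08]
ses = [0.010, 0.040, 0.060]

beta, se, p = fixed_effect_meta(betas, ses)
odds_ratio = math.exp(beta)          # per-allele odds ratio
is_significant = p < GENOME_WIDE_P
```

   Because weights scale with 1/SE², the largest study dominates the
   pooled estimate, which is one reason most loci in a
   European-majority meta-analysis are driven by the European data.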

Fig. 2. Genetic loci associated with atrial fibrillation (AF) across
populations of different ancestries.

   a Manhattan plot of GWAS associations. The x-axis represents the
   genomic positions of SNPs across chromosomes, while the y-axis displays
   the -log10(P) values, indicating the strength of the association. Each
   dot represents a single SNP, positioned based on its genomic location
   and statistical significance. The red dashed line marks the genome-wide
   significance threshold (P = 5 × 10⁻⁸). The statistical test was
   two-sided, and the Bonferroni-corrected significance level was applied.
   b Scatter plot of minor allele frequency (MAF) versus effect size
   (log-odds ratio) for variant-AF associations. Two gray dashed lines
   indicate MAFs of 0.001 and 0.01. Loci with an odds ratio > 1.3 are
   labeled with the gene name. c Distribution of loci
   identified across GWAS of different ancestries. d Venn diagram of
   shared and unique loci across ancestries. Two loci near PITX2 and ZFHX3
   were identified as shared across European (EUR), East Asian (EAS),
   African (AFR), and Admixed American (AMR) populations. Source data are
   provided as a Source Data file.

Comprehensive gene prioritization refined pathway exploration

   Using a systematic prioritization framework, we nominated a likely
   causal gene at each of the 504 genome-wide significant loci,
   acknowledging that this assignment is based on available functional
   evidence and may not be definitive for all loci. Among these, 70 genes
   harbored protein-altering variants, and 47% of prioritized genes had
   ≥ 80% agreement across available methods (Supplementary Data [79]4).

   To gain mechanistic insights into AF, we performed pathway enrichment
   analysis using these 504 prioritized genes. Enrichment analysis in the
   Reactome database identified 5 out of 1131 pathways significantly
   associated with AF after Bonferroni correction. Among these, muscle
   contraction and cardiogenesis showed strong associations (Fig. [80]3a
   and Supplementary Data [81]5). In addition, we conducted enrichment
   analysis using the Gene Ontology (GO) database. After Bonferroni
   correction, we identified 50 biological processes (BP), 7 cellular
   components (CC), and 6 molecular functions (MF) (Fig. [82]3b and
   Supplementary Data [83]6). GO enrichment analysis reinforced the role
   of muscle development and heart contraction in AF onset while also
   uncovering additional pathways, including cellular response to
   transforming growth factor-beta (TGF-β), artery morphogenesis,
   regulation of cell communication via electrical coupling, and actin
   filament-based movement (Supplementary Data [84]6).
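
   As a generic illustration of over-representation testing with a
   Bonferroni threshold, the sketch below computes a one-sided
   hypergeometric P value for a single pathway. It is not necessarily
   the test used by the authors, and the gene-universe size, pathway
   size, and overlap count are hypothetical; only the 504 prioritized
   genes and the 1131 Reactome pathways come from the text.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric P value, P(X >= n_overlap).

    X is the number of pathway genes among n_selected genes drawn without
    replacement from a universe of n_universe genes, n_pathway of which
    belong to the pathway.
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


N_PRIORITIZED = 504               # prioritized genes (from the text)
N_PATHWAYS_TESTED = 1131          # Reactome pathways tested (from the text)
bonferroni_alpha = 0.05 / N_PATHWAYS_TESTED

# Hypothetical pathway: 200 genes in a 20,000-gene universe, 20 of which
# are among the prioritized genes (about 5 would be expected by chance).
p_example = hypergeom_enrichment_p(20000, 200, N_PRIORITIZED, 20)
passes_bonferroni = p_example < bonferroni_alpha
```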

Fig. 3. Pathways enriched based on AF-associated genes.

   a Pathway enrichment in the Reactome database. The x-axis represents
   the effect size of the pathway’s influence on AF, while the y-axis
   shows the -log10(P) values, indicating statistical significance. Each
   dot corresponds to a pathway, with blue dots representing pathways that
   are significant after Bonferroni correction. b Pathway enrichment in
   the Gene Ontology (GO) database. The analysis includes pathways
   categorized under biological processes (BP), molecular functions (MF),
   and cellular components (CC). The x-axis represents the ratio of
   AF-associated genes to the total number of genes in each pathway, while
   the y-axis lists the pathways. Each dot represents a pathway, where the
   color reflects the Bonferroni-adjusted p-value, and the size indicates
   the count of AF-associated genes in each pathway. For clarity, the
   figure only highlights the top 10 out of 50 BP pathways due to space
   constraints. Full results, including all pathways, are provided in
   Supplementary Data [87]5 and Supplementary Data [88]6. The statistical
   test was two-sided, and the Bonferroni-corrected significance level was
   applied. Source data are provided as a Source Data file.

Population-specific genetic correlations uncovered circulatory comorbidities

   After Bonferroni correction, AF showed significant genetic correlations
   with 95 of 128 circulatory endpoints in Europeans (Supplementary
   Data [89]7) and 18 of 95 in Africans (Supplementary Data [90]8). Among
   the traits
   assessed for heterogeneity in genetic correlation with AF between
   European and African populations, several phenotypes demonstrated
   substantial population-specific differences. We identified conditions
   such as first-degree atrioventricular block, abdominal aortic aneurysm,
   varicose vein of lower extremity, deep vein thrombosis, tachycardia,
   transient cerebral ischemia, and abnormal heart sounds as having
   significantly heterogeneous genetic correlations with AF across
   ancestries (Fig. [91]4).

Fig. 4. Heterogeneity between Europeans and Africans regarding genetic
correlations between atrial fibrillation and other circulatory endpoints.

   The analysis was conducted using data from the Million Veteran Program
   (MVP). It involved 94 genetic correlations estimated in both Europeans
   and Africans, and heterogeneity was defined by I² > 75% and a Cochran’s
   Q P-value < 0.05. The statistical test was two-sided. Detailed
   information on these genetic correlations is available in Supplementary
   Data [93]7 and [94]8. Source data are provided as a Source Data file.
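
   The heterogeneity criteria in this caption can be sketched for a
   single trait as follows. With two ancestry-specific estimates,
   Cochran's Q has one degree of freedom, so its P value can be computed
   from the normal tail. The function names and the example genetic
   correlations are hypothetical.

```python
import math


def two_group_heterogeneity(rg_a, se_a, rg_b, se_b):
    """Cochran's Q, I-squared (%), and Q-test P value for two estimates."""
    w_a, w_b = 1.0 / se_a ** 2, 1.0 / se_b ** 2
    pooled = (w_a * rg_a + w_b * rg_b) / (w_a + w_b)
    q = w_a * (rg_a - pooled) ** 2 + w_b * (rg_b - pooled) ** 2
    df = 1  # number of estimates minus one
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    p = math.erfc(math.sqrt(q / 2.0))  # chi-square survival function, 1 df
    return q, i2, p


def is_heterogeneous(i2, p, i2_cutoff=75.0, p_cutoff=0.05):
    """Apply the caption's criteria: I-squared > 75% and Q-test P < 0.05."""
    return i2 > i2_cutoff and p < p_cutoff


# Hypothetical genetic correlations of one trait with AF in two ancestries.
q, i2, p = two_group_heterogeneity(0.60, 0.05, 0.10, 0.15)
flagged = is_heterogeneous(i2, p)
```

   Requiring both criteria guards against flagging traits whose
   estimates differ numerically but are too imprecise in the smaller
   ancestry group to support a real difference.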

Mendelian randomization revealed modifiable risk factors

   Among the 37 modifiable risk factors, genetically predicted body mass
   index (BMI), waist-to-hip ratio, visceral adiposity, childhood BMI,
   apolipoprotein A-I levels, apolipoprotein B levels, low-density
   lipoprotein (LDL) cholesterol levels, type 2 diabetes, systolic and
   diastolic blood pressure, thyroid-stimulating hormone levels, smoking
   initiation, lifetime smoking index, alcohol consumption, leisure screen
   time, and insomnia were significantly associated with AF risk after
   Bonferroni correction (Fig. [95]5). The scatter plots of the effect of
   SNPs on these traits and that on AF are shown in Supplementary
   Figs. [96]2–[97]17. These associations remained robust in sensitivity
   analyses (Supplementary Data [98]9).

Fig. 5. Genetically predicted associations between 37 modifiable traits and
atrial fibrillation (AF).

   The estimates and p-values were derived using the inverse variance
   weighted (IVW) method with a fixed-effects model for traits with ≤ 4
   genetic instruments. For traits with > 4 genetic instruments, the
   results were obtained from MR-PRESSO, accounting for potential
   pleiotropic effects by removing outlier SNPs where applicable. Detailed
   results are presented in Supplementary Data [101]9. Supplementary
   Data [102]18 lists the number of instrumental variables, the sample
   sizes of the source studies, and the units for each trait. The x-axis
   represents the odds ratio (OR) of AF per unit increase in the
   genetically predicted trait. Triangles indicate associations with
   P < 0.05 after Bonferroni correction, while red and blue dots represent
   positive and inverse associations, respectively. Data are presented as
   ORs +/− 95% confidence intervals. The statistical test was two-sided,
   and the Bonferroni-corrected significance level was applied. Source
   data are provided as a Source Data file.
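
   For readers unfamiliar with the IVW estimator named in this caption,
   the sketch below computes a fixed-effects IVW estimate from summary
   statistics and converts it to an odds ratio. It does not implement
   MR-PRESSO outlier removal, and the SNP effects are invented for
   illustration; only the count of 37 traits comes from the text.

```python
import math

N_TRAITS = 37                       # modifiable traits tested (from the text)
BONFERRONI_ALPHA = 0.05 / N_TRAITS


def ivw_fixed_effect(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effects inverse-variance-weighted MR estimate.

    Equivalent to a weighted regression of SNP-outcome effects on
    SNP-exposure effects through the origin, with weights 1 / se_outcome^2.
    """
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    estimate = numerator / denominator
    se = math.sqrt(1.0 / denominator)
    p = math.erfc(abs(estimate / se) / math.sqrt(2.0))  # two-sided
    return estimate, se, p


# Hypothetical instruments: SNP effects on the trait and on AF (log-odds).
bx = [0.10, 0.20, 0.15]
by = [0.03, 0.06, 0.045]
se_by = [0.01, 0.01, 0.02]

estimate, se, p = ivw_fixed_effect(bx, by, se_by)
odds_ratio = math.exp(estimate)     # OR of AF per unit increase in the trait
ci_low = math.exp(estimate - 1.96 * se)
ci_high = math.exp(estimate + 1.96 * se)
```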

Bidirectional protein-wide Mendelian randomization identified causal proteins

   After pooling protein quantitative trait loci (pQTL) from deCODE and
   UKB-PPP, the forward Mendelian randomization (MR) analysis (the effect
   of genetically predicted protein levels on AF) included 2847 unique
   proteins with cis genetic variants as the instrumental variables. After
   restricting to associations with Bonferroni-corrected P < 0.05 and a
   heterogeneity in dependent instruments (HEIDI) test P > 0.05,
   genetically predicted levels of 95 circulating proteins were associated
   with AF risk (Fig. [103]6a and Supplementary Data [104]10). Among
   these, 21 and 16 protein-AF associations showed strong colocalization
   evidence (PPH4 > 0.8) using traditional colocalization (Fig. [105]6b
   and Supplementary Data [106]11) and Sum of Single Effects (SuSiE)
   colocalization (Fig. [107]6c and Supplementary Data [108]11),
   respectively. In total, 28 proteins were deemed to have potential
   causal associations with AF, with the odds ratio of AF per one standard
   deviation increment in genetically predicted protein level ranging from
   0.61 (95% CI 0.49–0.75) for ING1 to 1.68 (95% CI 1.35–2.09) for ATXN2L.
   Among these 28 proteins, 18 had cis instruments available in the
   Fenland study, and 17 associations reached P < 0.05 in replication,
   although the direction of association was reversed for ICAM1, CCN3
   (also known as NOV), and QSOX2 (Supplementary Data [109]12).
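
   The filtering logic above can be sketched for one protein as follows.
   The SMR effect estimate and test statistic follow the published
   single-instrument formulas; the HEIDI and colocalization values are
   taken as given inputs rather than computed, and all numbers in the
   example are hypothetical apart from the 2847 proteins tested.

```python
import math

N_PROTEINS = 2847                   # proteins tested (from the text)


def smr_test(beta_pqtl, se_pqtl, beta_gwas, se_gwas):
    """Single-instrument SMR estimate of the protein effect on AF.

    Returns the effect (log-odds per unit protein level), its approximate
    standard error, and the P value from a 1-df chi-square test.
    """
    z_pqtl = beta_pqtl / se_pqtl
    z_gwas = beta_gwas / se_gwas
    b_xy = beta_gwas / beta_pqtl
    chi2 = (z_pqtl ** 2 * z_gwas ** 2) / (z_pqtl ** 2 + z_gwas ** 2)
    if chi2 == 0:
        return b_xy, math.inf, 1.0
    se_xy = abs(b_xy) / math.sqrt(chi2)
    p = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival function, 1 df
    return b_xy, se_xy, p


def passes_filters(p_smr, p_heidi, pph4, n_tests=N_PROTEINS):
    """Bonferroni-significant SMR, no HEIDI heterogeneity, strong coloc."""
    return p_smr < 0.05 / n_tests and p_heidi > 0.05 and pph4 > 0.8


# Hypothetical cis-pQTL and AF GWAS effects for the same lead variant.
b_xy, se_xy, p_smr = smr_test(0.50, 0.02, 0.05, 0.008)
odds_ratio = math.exp(b_xy)
kept = passes_filters(p_smr, p_heidi=0.30, pph4=0.90)
```

   A HEIDI P value above 0.05 is used here as evidence against linkage
   (two distinct causal variants), so unlike the other two filters it
   retains associations that fail to reject the null.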

Fig. 6. Genetically predicted levels of 2847 proteins associated with atrial
fibrillation (AF).

   We analyzed 2847 unique proteins with cis-instrumental variables
   derived from the deCODE and UKB-PPP datasets. For proteins present in
   both datasets, data from UKB-PPP were prioritized due to its larger
   sample size. All associations were scaled to a one standard deviation
   increase in genetically predicted protein levels. a volcano plot of
   protein-AF associations using SMR analysis. The x-axis represents the
   effect size of protein-AF associations, while the y-axis shows the
   -log10(P) values. The statistical test was two-sided, and the
   Bonferroni-corrected significance level was applied. Associations with
   P < 0.05 after Bonferroni correction and HEIDI test P > 0.05 are
   labeled. Red and blue dots indicate positive and inverse associations,
   respectively. b traditional colocalization analysis results. Only
   protein-AF associations with PPH4 > 0.7 are displayed due to space
   constraints. The gray line indicates PPH4 = 0.8, a commonly used
   threshold for strong colocalization evidence. c SuSiE colocalization
   analysis results. Similar to panel b, only protein-AF associations with
   PPH4 > 0.7 are shown. The gray line indicates PPH4 = 0.8. d forest plot
   of associations meeting the criteria of Bonferroni-corrected P < 0.05,
   HEIDI P > 0.05, and colocalization PPH4 > 0.8. Data are presented as
   ORs +/− 95% confidence intervals. The statistical test was two-sided,
   and the Bonferroni-corrected significance level was applied. Source
   data are provided as a Source Data file.

   Seven protein targets have corresponding drugs in clinical trials or
   approved for other indications; however, none have been explicitly
   approved for treating AF (Supplementary Data [112]13). Nonetheless,
   certain targets, such as ICAM1, ANGPT1, and MAPK3, may hold therapeutic
   potential due to their roles in cardiovascular and inflammatory
   pathways, which are implicated in AF pathophysiology.

   In the reverse MR analysis, genetic liability to AF was associated with
   levels of 16 unique proteins in deCODE or UKB-PPP (Supplementary
   Data [113]14 and [114]15) after Bonferroni correction. In particular,
   genetic liability to AF was associated with reduced levels of
   N-terminal pro-brain natriuretic peptide (NT-proBNP). The association
   for natriuretic peptide B was conflicting between deCODE and UKB-PPP.

Polygenic risk and protein score enhanced disease prediction

   To evaluate the performance of the polygenic risk score (PGS) in an
   independent dataset, we tested it in the Penn Medicine BioBank (PMBB),
   which was not used for PGS derivation. The PGS
   derived from this cross-population GWAS meta-analysis demonstrated a
   dose-response association with AF prevalence in 4401 individuals with
   AF and 32,760 individuals without AF from the PMBB (Fig. [115]7a–c).
   Each standard deviation (SD) increase in PGS was associated with an
   odds ratio (OR) of 1.82 (95% CI: 1.79–1.85) for AF. Compared to
   individuals in the first decile of the PGS, those in the tenth decile
   had more than sixfold higher odds of AF (OR = 6.38, 95% CI: 5.30–7.75)
   (Fig. [116]7b). Our PGS showed superior predictive performance compared
   to PGS002814 from the Miyazawa et al. study, with an area under the
   receiver operating characteristic curve (AUC) of 0.780 (95% CI:
   0.778–0.783)
   and a Brier score of 0.092 (95% CI: 0.091–0.093), outperforming
   PGS002814 (AUC = 0.767, 95% CI: 0.764–0.769; Brier score = 0.094, 95%
   CI: 0.093–0.095) (Fig. [117]7c). The DeLong test showed that the AUC of
   the PGS derived from our GWAS meta-analysis was significantly higher
   than that of the Miyazawa PGS (P < 2.2 × 10⁻¹⁶).
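
   The two metrics reported here measure different things: AUC captures
   discrimination (ranking cases above controls), whereas the Brier
   score captures the accuracy of predicted probabilities. The minimal
   sketch below computes both from scratch on toy data; the function
   names and values are illustrative, and the pairwise AUC loop is only
   practical for small samples.

```python
def auc_score(labels, scores):
    """Probability that a random case outranks a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        (c > k) + 0.5 * (c == k) for c in cases for k in controls
    )
    return wins / (len(cases) * len(controls))


def brier_score(labels, probabilities):
    """Mean squared difference between predicted probability and outcome."""
    return sum(
        (p - y) ** 2 for y, p in zip(labels, probabilities)
    ) / len(labels)


# Toy data: 1 = AF case, 0 = control, with hypothetical predicted risks.
labels = [1, 1, 0, 0]
predicted = [0.9, 0.4, 0.5, 0.1]

auc = auc_score(labels, predicted)      # 3 of 4 case-control pairs ranked correctly
brier = brier_score(labels, predicted)  # lower values indicate better calibration
```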

Fig. 7. Polygenic risk score (PGS) and protein score (ProS) for atrial
fibrillation (AF) risk prediction.

   The analysis for panels (a, b, and c) was based on the Penn Medicine
   Biobank (PMBB, 4401 individuals with prevalent AF and 32,760
   individuals without) and the analysis for panel d was based on the UK
   Biobank (3441 individuals with incident AF and 47,437 without). Panels
   a and b show the prevalence and odds ratio of AF, respectively, across
   deciles of our PGS vs. PGS002814 from the Miyazawa et al. study. Data
   in panels a and b are presented as mean values +/− SD and ORs +/− 95%
   confidence intervals, respectively. Panel c compares the predictive
   ability of the two PGSs (AUC for our PGS = 0.780 and AUC for
   PGS002814 = 0.767). Panel d compares the predictive ability of the
   PGS, the ProS, and their combination. AUC, area under the receiver
   operating characteristic curve. Source data are
   provided as a Source Data file.

   In a cohort of 3441 individuals with incident AF and 47,437 without,
   with available proteomic and genetic profiles, we constructed a protein
   score (ProS) using the LASSO method and a PGS to assess their
   predictive value for AF risk. The ProS included 87 proteins listed in
   Supplementary Methods. The ProS exhibited a positive association with
   AF incidence (Supplementary Data [120]16) and demonstrated strong
   predictive performance, achieving an AUC of 0.792 and a Brier score of
   0.119 in the testing set (Fig. [121]7d). Similarly, the PGS also showed
   a robust association with incident AF (Supplementary Data [122]17).
   Adding the ProS to the PGS significantly enhanced the performance of AF
   risk prediction. The combined model incorporating PGS and ProS achieved
   an AUC of 0.823 and a Brier score of 0.059 (Fig. [123]7d). The combined
   score incorporating both the PGS and ProS demonstrated superior
   predictive performance compared to either PGS alone (P = 1.34 × 10⁻²¹)
   or ProS alone (P = 0.009).

Discussion

   In this large-scale cross-population GWAS meta-analysis of AF,
   comprising 168,007 cases and 1,959,739 controls, we identified numerous
   previously unreported genetic loci, refined the genetic architecture of
   AF, and emphasized the importance of population-inclusive research in
   uncovering both shared and population-specific risk variants. Notably,
   our population-specific analysis revealed significant disparities in
   genetic risk loci, with a majority identified in Europeans and
   relatively few in non-Europeans. This imbalance likely reflects
   differences in sample sizes across ancestries, underscoring the urgent
   need to increase representation of underrepresented populations in
   future genetic studies to ensure equitable and comprehensive genetic
   discovery^[124]9.

   While most risk alleles had small-to-moderate effect sizes, we
   identified six lead SNPs with larger effect sizes at loci annotated
   with the genes SORCS3, POLD1, AGBL4, [125]AC126283.1, PITX2, and
   FAM241A,
   suggesting stronger genetic contributions at these loci. PITX2 has a
   well-documented role in AF through mechanisms involving electrical and
   structural remodeling, as well as calcium handling^[126]10–[127]12.
   AGBL4 has been associated with AF in previous
   GWASs^[128]7,[129]13. However, the involvement of SORCS3, POLD1,
   [130]AC126283.1, and FAM241A in AF remains to be clarified through
   future studies.

   PITX2 and ZFHX3 are well-established AF-associated genes; our findings
   reaffirm their consistent association across four population
   groups^[131]7,[132]8, further supporting their pivotal role in AF
   susceptibility. Regarding mechanisms, a knockout mouse study showed
   that ZFHX3 loss leads to atrial dysfunction, arrhythmogenic
   remodeling, and increased AF susceptibility^[133]14. However, no drugs
   targeting these two genes have been approved or developed, so whether
   they can serve as therapeutic targets remains to be investigated.

   We employed a comprehensive gene prioritization strategy, identifying
   putative causal genes for 504 loci, providing functional insights into
   AF pathogenesis. This approach enhanced pathway enrichment analyses,
   reaffirming muscle contraction and cardiac development^[134]7 as core
   AF mechanisms while uncovering additional pathways, including TGF-β
   signaling, vascular remodeling^[135]15, electrical coupling, and
   cytoskeletal regulation^[136]16. These findings highlight potential
   therapeutic opportunities, such as targeting TGF-β-mediated fibrosis or
   refining anti-arrhythmic strategies through ion channel
   modulation^[137]17, paving the way for potential interventions in AF
   prevention and treatment.

   We observed significant heterogeneity in the genetic correlation
   between AF and several circulatory phenotypes across European and
   African populations. While these findings suggest the possibility of
   population-specific differences in the shared genetic architecture
   between AF and its comorbidities, we acknowledge that these results are
   exploratory and require validation in independent cohorts. Due to the
   limited number of prior genetic studies addressing population-specific
   correlations for these traits, we refrain from drawing strong
   conclusions about the direction or clinical implications of individual
   trait differences. Instead, our findings underscore the broader need
   for population-informed genetic analyses and increased representation
   of diverse populations. Facilitating this type of research may improve
   the accuracy of risk stratification, inform targeted screening
   strategies, and reduce disparities in cardiovascular outcomes across
   diverse patient populations.

   Our MR analyses identified obesity^[138]18, type 2 diabetes^[139]19,
   hypertension^[140]20, high TSH levels^[141]21, smoking^[142]22, and
   insomnia^[143]23 as causal risk factors for AF, consistent with
   previous studies. For dyslipidemia^[144]24, alcohol
   consumption^[145]22, and sedentary behavior^[146]25, traits with
   conflicting evidence in prior research, our well-powered MR analysis
   leveraging a larger sample size strengthened the evidence for their
   associations with
   AF. Mechanistically, obesity, lipid imbalances, and hypertension may
   drive atrial remodeling and inflammation, while smoking, alcohol
   consumption, and insomnia could exacerbate autonomic dysfunction and
   electrical instability, increasing AF susceptibility. Clinically, these
   findings emphasize the need for targeted AF prevention strategies,
   including weight management, lipid-lowering therapies, blood pressure
   control, and behavioral interventions to reduce sedentary behavior.
   Addressing these modifiable risk factors through lifestyle changes and
   medical interventions could play a crucial role in reducing AF
   incidence and its associated complications.

   Our study identified 28 circulating proteins with potential causal
   roles in AF, some of which have been previously associated with the
   condition^[147]26,[148]27. Among these, our MR associations for
   ICAM1^[149]28 and CD40^[150]29 were directionally opposite to prior
   observational studies, likely reflecting compensatory or feedback
   mechanisms^[151]30,[152]31. The positive association for FURIN aligns
   with its role in pro-fibrotic and inflammatory pathways^[153]32, while
   ADM’s association supports its involvement in vascular
   regulation^[154]33, both of which may contribute to AF onset. Although
   none of these proteins have been established as direct therapeutic
   targets for AF, our findings provide valuable insights into AF
   pathophysiology and highlight promising candidates for further
   investigation^[155]30,[156]34. Nonetheless, we observed that a subset
   of associations could not be replicated in the independent dataset.
   While such discrepancies may arise from differences in genetic
   regulation across populations, platform-specific variation in protein
   quantification, or measurement error in replication analyses, they do
   not necessarily invalidate the MR results. However, they do warrant
   caution in interpreting these findings. Importantly, the associations
   identified in our study are based on protein levels measured in
   circulation and may not fully capture tissue-specific effects relevant
   to AF pathogenesis. Further validation in independent cohorts and
   functional characterization of these proteins in cardiac-relevant
   tissues and models will be essential to confirm their causal roles and
   assess their translational potential.

   MR revealed that genetic liability to AF is paradoxically associated
   with lower circulating NT-proBNP levels, in direct contrast to
   case-control studies reporting elevated NT-proBNP among AF
   patients^[157]35,[158]36. This discordance implies that the NT-proBNP
   elevations seen in AF may largely reflect secondary hemodynamic stress
   and atrial stretch rather than a primary effect of AF itself. Moreover,
   we observed inconsistent associations between genetic liability to AF
   and NPPB—the prohormone precursor to NT-proBNP—across two independent
   proteomic datasets, underscoring additional complexity. Although
   longitudinal cohorts have linked higher baseline NT-proBNP to
   subsequent AF^[159]37, our SMR analyses did not support a causal
   influence of genetically proxied NT-proBNP or NPPB on AF risk.
   Together, these data argue against a simple, unidirectional causal
   relationship between AF and NT-proBNP, and highlight the need for
   detailed longitudinal and mechanistic studies to untangle cause from
   consequence in the AF–NT-proBNP axis.

   Our study highlights the strong predictive value of a PGS derived from
   a cross-population GWAS, demonstrating superior performance compared to
   previous PGSs^[160]8. The enhanced predictive accuracy has significant
   implications for risk differentiation at the population level. In
   addition, we developed a ProS and found that combining the ProS with
   the PGS significantly improved risk prediction, aligning with findings
   from prior studies. A UK Biobank-based study demonstrated improved
   disease prediction when integrating a protein score with a clinical
   score^[161]38, while another UK Biobank study found a significant
   improvement when combining a protein score with a PGS^[162]39. Even
   though different protein selection methods were used between previous
   studies and the current study, the findings remained consistent.
   Collectively, these results underscore the value of multi-omic
   approaches in refining AF risk assessment. Future research should focus
   on validating these models in diverse populations and evaluating their
   potential clinical applications to further enhance personalized AF
   prevention and management strategies.

   This study has several limitations. First, although we included data
   from non-European populations, the statistical power for these
   ancestries may be limited due to smaller sample sizes, potentially
   affecting the identification of population-specific associations.
   Second, despite employing multiple prioritization strategies, some
   degree of gene misassignment is likely inevitable due to the
   limitations of current functional annotation resources. While many
   genes at these loci were prioritized based on proximity to the lead
   variant, we have explicitly noted when proximity was the sole criterion
   and, where possible, incorporated supporting evidence from eQTL
   colocalization and fine mapping to strengthen biological plausibility.
   Third, although we excluded variants with a minor allele count (MAC)
   < 50 to remove rare variants, a small number of variants with minor
   allele frequency
   (MAF) < 1% remained in the analysis (5 out of 493 in European GWAS and
   7 out of 525 in cross-population GWAS). However, nearly half of these
   variants were replicated in our independent replication dataset or have
   been previously reported in association with AF in other studies. Given
   their limited number and supporting evidence, we believe that the
   inclusion of these variants does not materially affect the
   comparability of our results with earlier GWAS. Fourth, while there
   was some sample overlap in the MR analysis, the potential bias is
   likely minimal due to the small proportion of overlapping samples and
   the strong validity of the genetic instrumental variables used. Fifth,
   coding variants among the instruments may alter epitope binding in
   aptamer-based proteomic assays for certain proteins, potentially
   introducing measurement bias that could affect the accuracy of MR
   results^[163]40. Lastly, all analyses were conducted using in silico
   approaches, emphasizing the need for further validation through
   functional studies and experimental research to confirm the biological
   relevance of the identified associations.

   In summary, this cross-population GWAS meta-analysis identified 525
   genetic loci for AF, refining its genetic architecture and biological
   pathways. Mendelian randomization revealed causal risk factors and
   circulating proteins, offering insights for prevention and therapeutic
   development. The cross-population-derived PGS, combined with a protein
   score, significantly improved risk prediction. This study integrates
   genetic discovery, causal inference, and multi-omic data, advancing AF
   risk stratification, prevention, and potential therapeutic strategies.

Methods

Ethics

   The study complied with all relevant regulations governing the use of
   human participants and was conducted in accordance with the principles
   of the Declaration of Helsinki. Participants in the FinnGen study
   provided informed consent for biobank research, with the study protocol
   (No. HUS/990/2017) approved by the Coordinating Ethics Committee of the
   Hospital District of Helsinki and Uusimaa (HUS). The UK Biobank
   received ethical approval from the North West Multi-center Research
   Ethics Committee (approval number: 11/NW/0382), with all participants
   giving informed consent. The Million Veteran Program (MVP) was approved
   by the VA Central Institutional Review Board (IRB), and participants
   provided informed consent. The Penn Medicine Biobank (PMBB) was
   approved by the University of Pennsylvania Institutional Review Board,
   and all participants gave informed consent. The Swedish Ethical Review
   Authority granted ethical approval for SIMPLER and the current protocol
   (no. 2019-03986), and all participants gave informed consent. Each
   study adheres to rigorous ethical guidelines to ensure the protection
   of participants and the integrity of the research.

Study design and participants

   Figure [164]1 summarizes the study design. We first performed a GWAS
   meta-analysis across eight studies as the discovery analysis in
   European populations. This was followed by a replication analysis using
   data from the UK Biobank, resulting in a European GWAS meta-analysis
   that included a total of 153,980 AF cases and 1,611,415 controls. Next,
   we extended the analysis to include data from East Asians, South
   Asians, Africans, and Admixed Americans, enabling a cross-population
   meta-analysis comprising 168,007 AF cases and 1,959,739 controls.
   Detailed descriptions of included studies are shown in Supplementary
   Methods and Supplementary Data [165]1. Using this large-scale AF GWAS,
   we conducted comprehensive downstream analyses to prioritize related
   genes, explore potential etiologies, assess genetic correlations,
   identify risk factors, and evaluate risk prediction models.

Cross-population GWAS meta-analysis

   Eight studies (the Nord-Trøndelag Health Study [HUNT], deCODE,
   DiscoverEHR, Michigan Genomics Initiative [MGI], AFGen
   consortium^[166]4, FinnGen R12, Swedish Infrastructure for Medical
   Population-Based Life-Course and Environmental Research [SIMPLER,
   [167]https://www.simpler4health.se/], and Million Veteran Program [MVP])
   contributed to the discovery analysis for the European GWAS, comprising
   117,905 atrial fibrillation (AF) cases and 1,239,541
   controls^[168]6,[169]41,[170]42. We performed GWAS association testing
   using individual-level genotype and phenotype data from participants in
   the SIMPLER cohort. By incorporating replication data from the UK
   Biobank (36,075 cases and 371,874 controls), the total sample size for
   the European GWAS reached 153,980 cases and 1,611,415 controls. To
   expand the analysis, we included data from four additional ancestries
   represented in Biobank Japan^[171]43, Genes & Health^[172]44, and
   MVP^[173]41, culminating in a cross-population meta-analysis with
   168,007 AF cases and 1,959,739 controls. Detailed descriptions of the
   study populations, genotyping procedures, and quality control protocols
   are provided in the Supplementary Methods, while AF definitions and
   sample sizes for each included study are summarized in Supplementary
   Data [174]1.

   Each dataset underwent rigorous quality control, including initial
   preprocessing, genotype imputation, post-imputation filtering, and
   association testing, with adjustments for age (or birth year), sex, and
   principal components as covariates. Post-GWAS quality control was
   performed using GWASinspector^[175]45, and SNPs with minor allele
   counts < 50 were excluded. Meta-analyses were conducted using
   METAL^[176]46, employing the fixed-effect inverse-variance-weighted
   method. After meta-analysis, variants that were present in only one
   cohort were excluded from downstream analysis.
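
   As an illustration of the filtering and pooling steps described above,
   the following minimal sketch computes a minor allele count and a
   fixed-effect inverse-variance-weighted estimate for a single variant.
   It is not the METAL implementation; the function names and the
   per-cohort effect sizes are hypothetical and chosen only for
   illustration.

```python
import math

def minor_allele_count(maf, n_samples):
    """Expected minor allele count in a diploid sample of size n_samples."""
    return 2.0 * n_samples * min(maf, 1.0 - maf)

def ivw_fixed_effect(betas, ses):
    """Fixed-effect inverse-variance-weighted pooled estimate for one variant.

    Returns None when fewer than two cohorts contribute, mirroring the
    exclusion of variants present in only one cohort.
    """
    if len(betas) < 2:
        return None
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, p

# Hypothetical per-cohort log-odds ratios and standard errors
beta, se, p = ivw_fixed_effect([0.10, 0.12, 0.08], [0.02, 0.03, 0.04])
keep = minor_allele_count(0.001, 20000) >= 50  # MAC filter from the text
```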

   We applied LDSC to evaluate the contributions of population
   stratification and polygenicity to GWAS test statistic
   inflation^[177]47. Although the genomic inflation factor (λGC) was
   2.04, the LDSC intercept (1.34) and ratio (15%) indicated that most of
   the inflation could be attributed to a true polygenic signal rather
   than confounding biases. Genome-wide significant SNPs were grouped into
   loci if they were within 1 Mb of each other^[178]8. Loci were defined
   by (1) identifying genome-wide significant variants (P < 5 × 10^−8)
   from association results, (2) extending the region by 500 kb on either
   side of these variants, and (3) merging overlapping regions. Genetic
   loci in the European analysis were defined based on a GWAS
   meta-analysis that combined both the discovery and replication
   datasets. This integrated approach maximized statistical power,
   enabling the identification of several loci that reached genome-wide
   significance only after the datasets were meta-analyzed. A locus was
   annotated as unreported if its coordinates did not overlap any
   previously reported genome-wide significant variant (P < 5 × 10^−8)
   associated with AF, based on a comprehensive evaluation. This included
   PheWAS lookups using the Open Targets platform
   ([179]https://genetics.opentargets.org/, integrating data from the GWAS
   Catalog, UK Biobank, and FinnGen), as well as cross-referencing with
   prior AF GWAS reports, including those by Thorolfsdottir et al.
   (2017)^[180]48, Nielsen et al. (2018)^[181]6, Roselli et al. (2018,
   2025)^[182]7,[183]49, Miyazawa et al. (2023)^[184]8, Verma et al.
   (2024)^[185]41, Choi et al. (2025)^[186]50, and other relevant studies.
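
   The three-step locus definition (significant variants, ± 500 kb
   windows, merging of overlapping windows) can be sketched as a simple
   interval merge. This is a minimal illustration under the stated window
   size; the function name and the variant positions are hypothetical.

```python
def define_loci(hits, flank=500_000):
    """Merge +/- flank windows around genome-wide significant variants.

    hits: iterable of (chromosome, position) tuples.
    Returns a list of (chromosome, start, end) loci.
    """
    windows = sorted((c, max(0, p - flank), p + flank) for c, p in hits)
    loci = []
    for chrom, start, end in windows:
        # Overlapping or touching windows on the same chromosome form one locus
        if loci and loci[-1][0] == chrom and start <= loci[-1][2]:
            loci[-1][2] = max(loci[-1][2], end)
        else:
            loci.append([chrom, start, end])
    return [tuple(locus) for locus in loci]

# Hypothetical positions: the first two variants lie within 1 Mb of each other
hits = [("4", 111_700_000), ("4", 112_300_000),
        ("4", 120_000_000), ("16", 73_000_000)]
loci = define_loci(hits)
```

   With a 500 kb flank, two significant variants end up in the same locus
   exactly when they are within 1 Mb of each other, which matches the
   grouping rule stated above.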

Gene prioritization

   We applied six complementary gene prioritization approaches to identify
   the most confident locus-gene pairs: (1) nearest gene annotation, (2)
   MAGMA-based gene prioritization^[187]51, (3) Polygenic Priority Score
   (PoPS)^[188]52, (4) eQTL colocalization, (5) CARMA (Credible-variant
   Analysis for Regional Meta-Analysis)-based functional gene
   prioritization^[189]53, and (6) transcriptome-wide association study
   (TWAS)^[190]54. For each genomic locus, the prioritized gene was
   determined by selecting the gene with the highest count of selections
   across these six methods. In cases where multiple genes had the same
   count, prioritization was refined by first considering genes harboring
   variants within CARMA-identified credible sets, followed by the nearest
   gene^[191]55. Below is a detailed description of each approach:

Nearest gene annotation

   The gene closest to the lead SNP in each locus was identified based on
   its physical distance to the gene body. This analysis was performed
   using the get_nearest_gene() function from the gwasRtools R package
   ([192]https://github.com/lcpilling/gwasRtools).

MAGMA-based gene prioritization

   We utilized MAGMA to annotate genes within genomic loci using the 1000
   Genomes Project as the reference panel^[193]51. SNPs were mapped to
   genes based on their physical positions, including the gene body and
   flanking regions (± 10 kb). Gene-level p-values were then calculated by
   aggregating SNP association statistics while accounting for linkage
   disequilibrium (LD) structure. The gene with the smallest p-value
   within each locus was selected as the prioritized gene.

PoPS

   PoPS, a similarity-based gene prioritization tool, integrates publicly
   available datasets, such as RNA sequencing data, curated pathway
   annotations, and predicted protein-protein interaction
   networks^[194]52. Based on the premise that causal genes share similar
   functional characteristics, PoPS calculates gene-level association
   statistics using GWAS summary statistics and MAGMA-based gene
   annotations. It then selects relevant features from precomputed
   statistics and assigns a score to each gene, reflecting its likelihood
   of being causal. For each genome-wide significant locus, genes within
   1 Mb of the index variant (in both directions) were ranked by their
   PoPS scores, with the highest-ranked gene prioritized.

eQTL colocalization

   Colocalization analysis was conducted using the coloc R package, which
   applies an approximate Bayes factor framework to assess whether two
   traits share a causal genetic signal^[195]56. Using the coloc.abf()
   function ([196]https://github.com/chr1swallace/coloc), we calculated
   posterior probabilities for five hypotheses: (H0) no association with
   either trait; (H1/H2) association with only one trait; (H3) association
   with both traits but different causal variants; and (H4) association
   with both traits with the same causal variant. A high posterior
   probability for H4 (PP4 > 0.8) was considered evidence of
   colocalization^[197]57. For this analysis, we used eQTL data from
   eQTLGen Phase I^[198]58 and the Genotype-Tissue Expression (GTEx)
   Project v8^[199]59 for heart atrial appendage and heart left ventricle
   tissues. Variants within 500 kb of each GWAS index variant were
   extracted to perform colocalization analysis.
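
   To make the five-hypothesis framework concrete, the sketch below
   computes posterior probabilities from per-SNP log approximate Bayes
   factors under a single-causal-variant assumption. It is an
   illustration rather than the coloc source code: the prior
   probabilities (p1 = p2 = 1e-4, p12 = 1e-5) and the prior effect
   standard deviation of 0.2 are assumed here, and the simulated
   z-scores are hypothetical.

```python
import math

def log_sum_exp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_diff_exp(a, b):
    """log(exp(a) - exp(b)); returns -inf when a <= b."""
    if a <= b:
        return float("-inf")
    return a + math.log(-math.expm1(b - a))

def wakefield_labf(beta, se, prior_sd):
    """Log approximate Bayes factor for one SNP (Wakefield approximation)."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - r) + r * z * z)

def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] for two traits."""
    s1, s2 = log_sum_exp(labf1), log_sum_exp(labf2)
    s12 = log_sum_exp([a + b for a, b in zip(labf1, labf2)])
    log_h = [
        0.0,
        math.log(p1) + s1,
        math.log(p2) + s2,
        math.log(p1) + math.log(p2) + log_diff_exp(s1 + s2, s12),
        math.log(p12) + s12,
    ]
    total = log_sum_exp(log_h)
    return [math.exp(value - total) for value in log_h]

def simulated_labfs(peak, n=50, z=8.0, se=0.05, prior_sd=0.2):
    """Hypothetical region: one SNP with a strong signal, the rest null."""
    return [wakefield_labf(z * se if i == peak else 0.0, se, prior_sd)
            for i in range(n)]

shared = coloc_posteriors(simulated_labfs(10), simulated_labfs(10))
distinct = coloc_posteriors(simulated_labfs(10), simulated_labfs(30))
colocalized = shared[4] > 0.8  # PP4 threshold used in the text
```

   In this toy setting, signals peaking at the same SNP favor H4, whereas
   signals peaking at different SNPs favor H3.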

CARMA-based functional gene prioritization

   We applied CARMA, a Bayesian fine-mapping approach^[200]53, to identify
   credible sets of variants within each genomic locus. CARMA accounts for
   LD structure and aggregates association signals across studies or
   populations to identify variants most likely to be causal. For each
   locus, CARMA generated a credible set with a high posterior probability
   (e.g., 95%) of containing the causal variant(s). Functional annotation
   of these variants was performed using Open Targets
   ([201]https://www.opentargets.org/), which provides information on
   coding, regulatory, and splicing effects^[202]60. If a causal variant
   was located within or directly affected a gene’s function, that gene
   was assigned to the locus.

TWAS

   We performed TWAS using MetaXcan^[203]54 to estimate the relationship
   between genetically predicted gene expression and AF. MetaXcan
   integrates GWAS summary statistics with precomputed gene expression
   prediction models to identify genes associated with the phenotype. For
   this analysis, we used expression prediction models for the heart
   atrial appendage, artery tibial, and heart left ventricle, leveraging
   LD reference data from GTEx v8^[204]59 and cross-population AF-GWAS
   summary statistics. For the TWAS, the target tissues were selected
   based on results from MAGMA tissue enrichment analysis and
   stratified-LDSC^[205]61, both of which were conducted using gene
   expression data from GTEx v8. MAGMA tissue enrichment analysis
   identifies tissues where genes associated with the trait of interest
   are significantly enriched by testing the relationship between GWAS
   association signals and tissue-specific gene expression profiles.
   S-LDSC further refines this by partitioning heritability across genomic
   regions annotated with tissue-specific gene expression and estimating
   the contribution of each tissue to the trait heritability. Using these
   complementary approaches, tissues such as the heart atrial appendage,
   heart left ventricle, and artery tibial were identified as relevant for
   atrial fibrillation (Supplementary Fig. [206]18). These selected
   tissues were then used to predefine the expression prediction models
   for the TWAS. Bonferroni correction was applied to account for multiple
   testing, and the gene with the lowest p-value within each locus was
   prioritized.
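
   The vote-counting rule described under Gene prioritization, which
   combines the six approaches above, can be sketched as follows. This is
   a minimal illustration: the gene labels are placeholders, and the
   final alphabetical fallback is an assumption added here for
   determinism, not a rule stated in the text.

```python
from collections import Counter

def prioritize_gene(method_calls, credible_set_genes, nearest_gene):
    """Pick one gene for a locus from per-method selections.

    method_calls: dict mapping method name to selected gene (or None).
    credible_set_genes: genes harboring variants in the CARMA credible set.
    """
    votes = Counter(g for g in method_calls.values() if g)
    if not votes:
        return None
    top = max(votes.values())
    tied = sorted(g for g, n in votes.items() if n == top)
    if len(tied) == 1:
        return tied[0]
    # Tie-break 1: genes with variants in the credible set
    in_credible = [g for g in tied if g in credible_set_genes]
    if len(in_credible) == 1:
        return in_credible[0]
    # Tie-break 2: the nearest gene
    candidates = in_credible or tied
    if nearest_gene in candidates:
        return nearest_gene
    return candidates[0]  # assumption: alphabetical fallback

calls = {"nearest": "GENE_A", "magma": "GENE_A", "pops": "GENE_B",
         "coloc": None, "carma": "GENE_B", "twas": None}
chosen = prioritize_gene(calls, {"GENE_B"}, "GENE_A")
```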

Pathway enrichment

   Pathway enrichment analysis was performed to identify biological
   pathways and functional categories associated with the prioritized
   genes. Reactome^[207]62 enrichment was conducted using Enrichr
   ([208]https://maayanlab.cloud/Enrichr/), enabling the exploration of
   curated pathways^[209]63. Gene Ontology (GO) enrichment
   analysis^[210]64, which provided insights into biological processes,
   molecular functions, and cellular components, was carried out using the
   enrichGO function from the clusterProfiler Bioconductor R package
   ([211]https://bioconductor.org/packages/release/bioc/html/clusterProfil
   er.html). To minimize false-positive findings, Bonferroni correction
   was applied to account for multiple testing, with the significance
   threshold set at P < 0.05/number of tests performed.
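
   Over-representation tools of this kind typically rest on a
   hypergeometric (one-sided Fisher) test. The sketch below shows that
   test together with the Bonferroni threshold; it is an assumption about
   the general form of the calculation, not a reproduction of Enrichr or
   clusterProfiler, and the gene counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_universe, n_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn from n_universe
    genes, of which n_pathway belong to the pathway."""
    upper = min(n_pathway, n_selected)
    denom = comb(n_universe, n_selected)
    num = sum(comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
              for k in range(n_overlap, upper + 1))
    return num / denom

def bonferroni_significant(p, n_tests, alpha=0.05):
    """Significance at P < alpha / number of tests performed."""
    return p < alpha / n_tests

# Hypothetical toy universe: all 3 selected genes fall in a 3-gene pathway
p_enrich = hypergeom_enrichment_p(10, 3, 3, 3)
```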

Population-specific genetic correlations with circulatory endpoints

   Using LDSC, we calculated the genetic correlations of AF with 130 and
   97 circulatory endpoints defined by phecodes, separately for Europeans
   and Africans in the MVP cohort. The MVP GWAS included up to 449,042
   European participants and 121,177 African participants^[212]41. Genetic
   correlations with rg > 1.25 or < −1.25 were removed because of low
   heritability (h^2 estimates were very close to zero). To account for
   multiple testing and reduce the likelihood of false-positive results,
   the Bonferroni correction was applied. To objectively compare genetic
   correlations between populations, we applied Cochran’s Q test to assess
   heterogeneity in the correlation estimates between European and African
   populations. Traits were considered to exhibit population-specific
   differences if they showed evidence of substantial heterogeneity,
   defined as an I² statistic greater than 75% and a Cochran’s Q test
   P-value less than 0.05.
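
   The heterogeneity criterion can be illustrated for a pair of
   population-specific estimates. The sketch below computes Cochran's Q,
   I², and the Q-test P-value for two genetic correlations; the closed-form
   P-value applies only to one degree of freedom (two estimates), and the
   input values are hypothetical.

```python
import math

def heterogeneity(estimates, ses):
    """Cochran's Q, I-squared, and P-value for exactly two estimates."""
    if len(estimates) != 2:
        raise ValueError("closed-form P-value implemented for two estimates only")
    weights = [1.0 / s ** 2 for s in ses]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    p = math.erfc(math.sqrt(q / 2.0))  # chi-square survival function, 1 df
    return q, i2, p

# Hypothetical rg estimates (European, African) with standard errors
q, i2, p_het = heterogeneity([0.60, 0.20], [0.05, 0.10])
population_specific = i2 > 0.75 and p_het < 0.05
```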

Mendelian randomization analysis for modifiable risk factors

   MR is an analytical approach that strengthens causal inference by
   leveraging genetic variants as instrumental variables (IVs) to estimate
   the causal effect of an exposure on an outcome. A comprehensive
   description of the MR design is provided in the Supplementary Methods.
   Using GWAS meta-analysis data, we conducted MR to evaluate the
   associations between 37 modifiable risk factors and AF risk. These
   modifiable factors span multiple categories, including adiposity, blood
   lipids, type 2 diabetes and glycemic traits, other metabolic traits
   (e.g., blood pressure, thyroid function, and kidney function),
   lifestyle factors (e.g., smoking, alcohol and coffee consumption, and
   physical activity), sleep behaviors, and dietary factors (e.g.,
   circulating levels of vitamins and minerals). The selection of these
   factors was guided by a recent comprehensive review of AF risk
   factors^[213]65. Detailed information on the GWAS data sources for
   these traits is summarized in Supplementary Data [214]18.

   Genetic variants associated with the exposures were selected at a
   genome-wide significance threshold of P < 5 × 10^−8. To ensure
   independence among instrumental variables, SNPs were pruned at
   R^2 < 0.01, minimizing the effects of collinearity due to LD. The
   strength of the instrumental variables was assessed using
   F-statistics^[215]66, with all variants meeting the threshold of
   F > 10. Data harmonization was performed to align effect and non-effect
   alleles consistently between the exposures and outcomes. Detailed
   information on the used genetic instruments is presented in
   Supplementary Data [216]19.
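
   Two of the instrument-processing steps can be sketched briefly. The
   F-statistic below uses the common per-variant approximation
   (beta/SE)^2, which is an assumption here because the exact formula is
   not stated in the text. The harmonization helper only handles allele
   swaps; strand flips and palindromic SNPs are deliberately not
   addressed. All names and values are hypothetical.

```python
def f_statistic(beta, se):
    """Approximate per-variant F-statistic (assumed form: (beta / se)^2)."""
    return (beta / se) ** 2

def harmonize(exposure, outcome):
    """Return the outcome beta aligned to the exposure effect allele.

    Each argument is a dict with keys 'ea' (effect allele),
    'oa' (other allele), and 'beta'. Returns None if alleles mismatch.
    """
    if (exposure["ea"], exposure["oa"]) == (outcome["ea"], outcome["oa"]):
        return outcome["beta"]
    if (exposure["ea"], exposure["oa"]) == (outcome["oa"], outcome["ea"]):
        return -outcome["beta"]  # alleles swapped: flip the sign
    return None  # incompatible alleles: drop the variant

strong = f_statistic(0.05, 0.01) > 10
aligned = harmonize({"ea": "A", "oa": "G", "beta": 0.05},
                    {"ea": "G", "oa": "A", "beta": -0.02})
```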

   For exposures with fewer than five genetic instruments, the inverse
   variance weighted (IVW) method with a fixed-effects model was used. For
   exposures with five or more genetic instruments, we employed MR-PRESSO
   as the primary analysis method, as it accounts for pleiotropic effects
   by identifying and removing outlier SNPs^[217]67. In the absence of
   outlier SNPs, MR-PRESSO provides estimates equivalent to the IVW
   method. Sensitivity analyses included the IVW method with random
   effects, the weighted median method^[218]68, and MR-Egger
   regression^[219]69. Heterogeneity among SNP-specific estimates was
   assessed using Cochran’s Q test, while the MR-Egger intercept test was
   used to evaluate the presence of horizontal pleiotropy. The scatter
   plot was used to visualize potential pleiotropic SNPs. To minimize
   false-positive findings, we applied Bonferroni correction to account
   for multiple testing.
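
   For reference, the IVW estimator used for exposures with few
   instruments, and as a sensitivity analysis, can be written in a few
   lines. This is a minimal sketch using first-order weights; the
   random-effects variant inflates the standard error by the
   over-dispersion implied by Cochran's Q, a common convention that is
   assumed here. MR-PRESSO, the weighted median, and MR-Egger are not
   implemented, and the SNP effects are hypothetical.

```python
import math

def ivw_mr(bx, by, se_y, random_effects=False):
    """Inverse-variance-weighted MR estimate from per-SNP summary statistics.

    bx: SNP-exposure effects; by: SNP-outcome effects; se_y: SEs of by.
    Returns (estimate, standard error, Cochran's Q).
    """
    weights = [x * x / (s * s) for x, s in zip(bx, se_y)]
    ratios = [y / x for x, y in zip(bx, by)]  # per-SNP Wald ratios
    total = sum(weights)
    est = sum(w * r for w, r in zip(weights, ratios)) / total
    se = math.sqrt(1.0 / total)
    q = sum(w * (r - est) ** 2 for w, r in zip(weights, ratios))
    if random_effects and len(bx) > 1:
        se *= max(1.0, math.sqrt(q / (len(bx) - 1)))
    return est, se, q

# Hypothetical instruments that all imply the same causal effect of 0.5
est, se_fixed, q_stat = ivw_mr([0.1, 0.2, 0.15], [0.05, 0.10, 0.075],
                               [0.01, 0.01, 0.01])
```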

MR and colocalization analyses for circulating proteins

   For the MR analysis of circulating proteins, we utilized two
   large-scale pQTL (protein quantitative trait loci) datasets,
   deCODE^[220]70 and UKB-PPP^[221]71, for IV selection (Supplementary
   Fig. [222]19). After excluding overlapping proteins, a total of 2847
   proteins with cis-SNPs were included in the analysis. For proteins
   present in both datasets, we prioritized data from UKB-PPP due to its
   larger sample size and the greater number of cis-pQTLs it identified
   using the Olink platform^[223]72. Importantly, the associations of
   overlapping proteins with AF showed strong consistency between the two
   datasets, supporting the robustness of the findings. To validate the
   results, we used the Fenland study as a replication dataset^[224]73,
   focusing on proteins with available IVs in this study to replicate the
   observed associations.

   We used the lead cis-SNP associated with plasma protein levels at
   P < 5 × 10^−8 as the genetic IV. Cis-SNPs were defined as variants
   located within 250 kb of the encoding gene. Detailed information on
   selected genetic IVs is shown in Supplementary Data [225]20. The
   Summary-data-based Mendelian Randomization (SMR) method was employed to
   estimate the association between genetically predicted protein levels
   and AF risk^[226]74. SMR integrates GWAS and pQTL summary statistics to
   evaluate whether the genetic association with a phenotype (i.e., AF) is
   mediated through the genetically regulated protein levels. To evaluate
   potential pleiotropy, we performed HEIDI (Heterogeneity in Dependent
   Instruments) analysis^[227]74. HEIDI assesses whether the association
   between the protein and the phenotype is driven by the same causal
   variant or by independent variants in LD. The analysis uses 3–20 SNPs
   in the cis region of the encoding gene to test for heterogeneity. A
   HEIDI p-value > 0.05 suggests no evidence of pleiotropy and supports
   the hypothesis of a shared causal variant. To further rule out
   false-positive associations caused by LD, we conducted
   traditional^[228]56 and SuSiE (Sum of Single Effects)^[229]75
   colocalization analyses, using all SNPs in the cis gene region as
   input. As described in detail in the eQTL colocalization section,
   strong evidence of shared causal variants between protein levels and AF
   was indicated by PP.H4 ≥ 0.8, a stringent but widely accepted threshold
   in colocalization studies. Bonferroni correction was applied to the SMR
   analysis to account for multiple testing. Associations were considered
   potentially causal if they met the following criteria: adjusted
   P < 0.05 for the SMR analysis, adjusted P > 0.05 for the HEIDI test
   (indicating no pleiotropy), and colocalization posterior probability
   PP.H4 ≥ 0.8.
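
   The SMR estimate and the three-part decision rule can be sketched as
   follows. The test statistic follows the standard single-instrument
   SMR approximation; the HEIDI test itself is not implemented, so its
   P-value is taken as an input and compared with 0.05 as given. The
   number of tests (2847 proteins) comes from the text, while the
   function names and the summary statistics are hypothetical.

```python
import math

def smr_test(b_zx, se_zx, b_zy, se_zy):
    """SMR effect of protein level on AF using one cis-SNP.

    b_zx: SNP effect on the protein; b_zy: SNP effect on AF.
    Returns (b_xy, P-value from the approximate 1-df chi-square test).
    """
    z_zx, z_zy = b_zx / se_zx, b_zy / se_zy
    b_xy = b_zy / b_zx
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)
    p = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, p

def potentially_causal(p_smr, p_heidi, pp_h4, n_tests):
    """Apply the SMR, HEIDI, and colocalization criteria jointly."""
    p_adj = min(1.0, p_smr * n_tests)  # Bonferroni-adjusted SMR P-value
    return p_adj < 0.05 and p_heidi > 0.05 and pp_h4 >= 0.8

b_xy, p_smr = smr_test(b_zx=0.5, se_zx=0.02, b_zy=0.06, se_zy=0.01)
flag = potentially_causal(p_smr, p_heidi=0.30, pp_h4=0.90, n_tests=2847)
```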

   The druggability of identified proteins was assessed using multiple
   drug databases, including DrugBank^[230]76, DepMap^[231]77, and
   OpenTargets^[232]60. Based on their therapeutic potential, proteins
   were classified into five categories: (1) approved drug targets, (2) in
   clinical trials, (3) preclinical candidates, (4) druggable, and (5) not
   currently listed as druggable targets.

   To examine the effect of genetic liability to AF on blood proteins, we
   conducted a reverse MR analysis using 624 SNPs as instrumental variables
   for AF (P < 5 × 10^−8 and r^2 for linkage disequilibrium < 0.01) and
   protein GWAS data from deCODE and UKB-PPP. Bonferroni correction was
   used for multiple testing.

Joint performance of PGS and protein score (ProS)

PGS analysis

   The weights of the polygenic scores (PGS) in the current study were
   generated using the “auto” setting of PRS-CSx^[233]78, incorporating
   summary statistics from the meta-analysis and corresponding EUR, AFR,
   AMR, EAS, or SAS LD reference panels derived from 1000 Genomes Project
   Phase 3 samples. This approach eliminates the need for independent
   training data. The effective sample size was calculated as
   4/((1/ncases) + (1/ncontrols)). For the score applied to the UK
   Biobank, weights were derived using data that excluded summary
   statistics from UK Biobank participants. As a reference, we used the
   PGS (PGS002814) from the Miyazawa et al. study^[234]8, which was
   derived using the Pruning and Thresholding method (r^2 = 0.5 and
   P = 5 × 10^−4; [235]https://www.pgscatalog.org/score/PGS002814/). We
   used the DeLong test^[236]79, implemented in the pROC R package, to
   statistically compare the AUROC of the polygenic scores and evaluate
   whether the difference in predictive performance was significant. We
   calculated PGS for 4401 individuals with AF and 32,760 individuals
   without from the Penn Medicine BioBank (PMBB)^[237]80, an ongoing study
   that integrates genomic and electronic health record data to
   investigate the genetic and clinical determinants of various diseases.
   The population breakdown of PMBB participants is predominantly European
   (∼ 70%), followed by African (∼ 25%), with smaller proportions of South
   Asian, East Asian, Admixed American, and other populations
   (Supplementary Fig. [238]20). The study was approved by the University
   of Pennsylvania Institutional Review Board. To standardize the scores,
   we applied a principal component analysis-based method, normalizing
   both the mean and variance to the 1000 Genomes reference panel. The
   association between PGS and prevalent AF was assessed using a
   generalized linear regression model with a logit link, adjusting for
   age and sex as covariates. We evaluated the PGS effect size using odds
   ratios and assessed model performance by calculating the area under the
   receiver operating characteristic curve (AUROC) and Brier score. Using
   the ‘tidymodels’ R package ([239]https://github.com/tidymodels), we
   performed V-fold cross-validation to validate model performance. The
   same approach was used to test PGS performance in the UK Biobank,
   including 3441 individuals with incident AF and 47,437 without incident
   AF with available proteomic profiles.
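
   The effective sample size formula and the two performance metrics
   named above can be expressed compactly. The sketch below is a plain
   illustration with hypothetical labels and predicted probabilities; the
   pairwise AUROC computation is quadratic in sample size and is meant
   for clarity, not for biobank-scale data.

```python
def effective_sample_size(n_cases, n_controls):
    """Effective sample size: 4 / (1/ncases + 1/ncontrols)."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)

def auroc(labels, scores):
    """Probability that a random case scores above a random control
    (ties count as one half)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

def brier_score(labels, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)

# Cross-population sample sizes reported in the text
n_eff = effective_sample_size(168_007, 1_959_739)

# Hypothetical predictions for five individuals
labels = [1, 1, 0, 0, 0]
probs = [0.9, 0.4, 0.5, 0.2, 0.1]
auc, brier = auroc(labels, probs), brier_score(labels, probs)
```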

Protein score analysis

   We derived a protein score (ProS) for AF using individual-level data
   from the UK Biobank, a large, ongoing population-based prospective
   cohort study with extensive proteomic and phenotypic data. To rule out
   proteins with reverse associations, we first conducted a prospective
   cohort analysis. Participants with baseline AF or those diagnosed with
   AF within the first two years of follow-up were excluded, leaving
   50,878 participants with proteomic data. Proteins with a missing rate
   exceeding 30% were also excluded, resulting in a final dataset of 2920
   proteins. After adjusting for age, sex, ethnicity, Townsend deprivation
   index, education, body mass index, smoking status, drinking status, and
   physical activity, 459 proteins were significantly associated with
   incident AF after Bonferroni correction (Supplementary Data [240]21).
   We then used the Least Absolute Shrinkage and Selection Operator
   (LASSO) method to construct the ProS^[241]81. We applied LASSO logistic
   regression to identify candidate proteins associated with AF, using
   five-fold cross-validation to determine the optimal penalty parameter
   (λ). A weighted protein score (ProS) was then constructed based on the
   proteins selected via LASSO. Specifically, a Cox regression model was
   used to estimate the log-hazard ratios for each protein and the
   baseline hazard function. The individual risk score for each
   participant was subsequently calculated using:

   $$\text{Risk Score} = h_0(t) \times \exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n)$$

   where $h_0(t)$ is the baseline hazard function, $X_n$ is the level of
   the n-th selected protein, and $\beta_n$ is the
   corresponding coefficient from the Cox model. Participants were
   randomly split into training and validation cohorts in a 7:3 ratio
   using the R package caret ([242]https://github.com/topepo/caret). The
   model demonstrating the best predictive performance in the training
   cohort was then validated in the remaining 30% of participants and
   ultimately combined into a final model for predicting the risk of AF
   onset.
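
   The risk score calculation and the 7:3 split can be illustrated with a
   short sketch. The coefficients and protein levels are hypothetical,
   and the split shown is a simple random partition, which may differ
   from the partitioning performed by caret.

```python
import math
import random

def linear_predictor(betas, levels):
    """Sum of Cox coefficients times protein levels."""
    return sum(b * x for b, x in zip(betas, levels))

def risk_score(baseline_hazard, betas, levels):
    """Baseline hazard multiplied by exp(linear predictor)."""
    return baseline_hazard * math.exp(linear_predictor(betas, levels))

def split_70_30(ids, seed=0):
    """Randomly split identifiers into training (70%) and validation (30%)."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    cut = int(round(0.7 * len(ids)))
    return ids[:cut], ids[cut:]

# Hypothetical coefficients for three selected proteins
score = risk_score(1.0, [0.30, -0.10, 0.05], [1.2, 0.4, -0.8])
train_ids, valid_ids = split_70_30(range(10))
```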

Joint performance

   The AUROC analysis was performed to assess the predictive performance
   of the selected key proteins for AF, both individually and in
   combination with the PGS in the UK Biobank. The DeLong test^[243]79 was
   used to test whether differences in AUROC between these scores were
   statistically significant.

Reporting summary

   Further information on research design is available in the [244]Nature
   Portfolio Reporting Summary linked to this article.

Acknowledgements