Abstract The extensive co-occurrence of cardiovascular diseases (CVDs), as evidenced by epidemiological studies, is supported by positive genetic correlations identified in comprehensive genetic investigations, suggesting a shared genetic basis. However, the precise genetic mechanisms underlying these associations remain elusive. By assessing genetic correlations, genetic overlap, and causal connections, we aim to shed light on common genetic underpinnings among major CVDs. Employing multi-trait analysis, we pursue diverse strategies to unveil shared genetic elements, encompassing SNPs, genes, gene sets, and functional categories with pleiotropic implications. Our study systematically quantifies genetic overlap beyond genome-wide genetic correlations across CVDs, while identifying a putative causal relationship between coronary artery disease (CAD) and heart failure (HF). We then pinpointed 38 genomic loci with pleiotropic influence across CVDs, of which the most influential pleiotropic locus is located at the LPA gene. Notably, 12 loci present high evidence of multi-trait colocalization and display congruent directional effects. Examination of genes and gene sets linked to these loci unveiled robust associations with circulatory system development processes. Intriguingly, distinct patterns predominantly driven by atrial fibrillation, coronary artery disease, and venous thromboembolism underscore the significant disparities between clinically defined CVD classifications and underlying shared biological mechanisms, according to functional annotation findings. Subject terms: Genomics, Cardiovascular biology, Cardiology __________________________________________________________________ Cardiovascular diseases often co-occur. Here the authors conduct cross-disease genome wide association studies using multi-trait analyses to uncover shared genetic mechanisms contributing to overlapping risk profiles. Introduction Cardiovascular diseases (CVDs), particularly coronary artery disease (CAD) and stroke, are the leading global causes of mortality and disability^[56]1. In detail, heart failure (HF) is a complex clinical syndrome that often represents the final stage of various heart diseases, commonly resulting from CAD^[57]2. Atrial fibrillation (AF), on the other hand, is a primary risk factor for both stroke and venous thromboembolism (VTE)^[58]3. Vascular diseases, such as peripheral artery disease (PAD) and CAD, further increase the risk of both stroke and VTE^[59]4,[60]5. Additionally, atherosclerosis and hypertension serve as major precursors to many CVDs, although the precise mechanisms triggering these diseases remain poorly understood^[61]6. Moreover, these conditions exhibit a complex genetic pattern, which suggests that children of individuals with CVDs face an increased risk of developing a range of CVDs, not merely the specific disease their parents experienced^[62]7,[63]8. Furthermore, the frequent co-occurrence of multiple CVDs, known as comorbidity, underscores the complex interplay among various CVDs. Recent large-scale genomic studies have established the substantial influence of genetic variation on the risk of various CVDs^[64]9. Genome-wide association studies (GWASs) have pinpointed numerous genetic risk loci for major CVDs such as AF, CAD, VTE, HF, PAD, and stroke^[65]10–[66]13. Strikingly, these loci often exert effects across multiple phenotypes, revealing a remarkable interconnectedness among CVDs. While pleiotropic effects are widely acknowledged, their systematic presence and downstream consequences in the context of CVDs have yet to be fully assessed. Furthermore, the common genetic variants discovered to date account for only a limited portion of the heritable risk for these diseases. The ongoing search for new risk loci associated with traits necessitates the continual expansion of sample sizes. Unfortunately, challenges in recruiting large study populations have notably hindered variant discovery for certain traits, such as PAD, which lags behind more common CVDs like CAD and stroke^[67]14 In response to these challenges, the concept of multi-trait analysis in GWAS has emerged to combine GWAS summary data from diverse cohorts with genetically correlated traits to increase the study sample size^[68]15. Recent statistical advancements, represented by multi-trait analysis of GWAS (MTAG)^[69]16, enhance the capacity to detect pleiotropic variants associated with multiple diseases, thereby unraveling the genetic basis for shared risk across diseases. To date, cross-disease GWAS multi-trait analyses have predominantly focused on psychiatric and autoimmune disorders^[70]17–[71]22. Given the pervasive comorbidities observed in epidemiological studies and the existing evidence indicating a shared genetic background across multiple CVDs, the need for a comprehensive, large-scale analysis became evident. In this work, we leverage MTAG to conduct an integrative analysis of the largest available GWAS datasets encompassing six major CVDs in individuals of European ancestry—AF, CAD, VTE, HF, PAD, and stroke—to characterize the underlying biology of the novel risk loci in the context of CVDs. This study addressed four central inquiries regarding the shared genetic basis of these six CVDs (Fig. [72]1): (i) Can we identify shared genetic architectures amidst the diverse landscape of these clinically distinct CVDs? (ii) Can we determine whether the shared genetic architectures related to major CVDs are driven by causal associations (i.e., vertical pleiotropy)? (iii) Can we detect additional genomic loci for multiple CVDs (i.e., pleiotropic loci), and can we determine whether some of the loci exhibit opposite allelic effects across CVDs? (iv) Can we identify functional features of the pleiotropic loci that could account for their widespread impact on cardiovascular pathology? Fig. 1. Schematic representation of analyses performed for all 6 major cardiovascular diseases in the current study. [73]Fig. 1 [74]Open in a new tab This figure illustrates the comprehensive analytical approach used for all six major cardiovascular diseases within this study. The initial GWAS findings were obtained from various repositories. Before integrating cardiovascular diseases into a multi-trait analysis, genetic characteristics, such as genome-wide and local genetic correlations, and genetic overlap were individually estimated to identify genetic overlap across the six CVDs beyond genetic correlations. The shared genetic basis was interpreted as genetic variants influencing multiple complex phenotypic traits through vertical and horizontal pleiotropy. Mendelian randomization was used to elucidate the important role of vertical pleiotropy in CVDs. Subsequently, the results of SNP and genomic loci analyses from the multi-trait assessment, along with cross-trait analysis for replication, were compared with the results from individual cardiovascular diseases, uncovering many novel pleiotropic loci. Various methods were employed to comprehensively characterize the shared genetic mechanisms for each of the six cardiovascular diseases. First, we examined the convergence of SNPs, genomic loci, and mapped genes. Subsequent stages involved biological pathway and functional category analyses, evaluating the enrichment of genetic signals across 9398 distinct gene sets and 49 tissue types for each disease. The enrichment assessments were further extended using LDSC-SEG and GARFIELD, covering 489 and 1005 functional genomic categories, respectively. The diagram was created using BioRender and included with permission for publication (Created in BioRender. Feng, Y. (2025) [75]https://BioRender.com/mp87c7d). AF atrial fibrillation, CAD coronary artery disease, VTE venous thromboembolism, HF heart failure, PAD peripheral artery disease. Results Genome-wide genetic overlap beyond genetic correlation across the six CVDs We obtained GWAS summary statistics for six major CVDs^[76]10–[77]13,[78]23,[79]24, with sample sizes varying widely (median: 1,098,263, ranging from 511,634 for PAD to 1,500,861 for VTE) (Supplementary Data [80]1 provides an overview and acronyms; Supplementary Note 1 lists data sources; Fig. [81]1 illustrates the study flowchart). After harmonizing and filtering of SNPs shared across GWAS summary statistics, we used linkage disequilibrium (LD) score regression (LDSC)^[82]25,[83]26 to estimate SNP-based heritability (h^2[SNP]) and genome-wide genetic correlation (r[g]). The statistical power and genetic signal strength of the GWASs varied substantially across the six CVDs, with the median h^2[SNP] being 1.4% (ranging from 0.6% to 3.2%; Fig. [84]2a and Supplementary Data [85]2a). Estimated r[g]s ranged from moderate to strong (median = 0.435; ranging from 0.148 to 0.677) and were all significantly positive after Bonferroni correction (P = 8.33 × 10^−3; Fig. [86]2b and Supplementary Data [87]2b). Substantial r[g]s were observed between CVD pairs regardless of whether the individual traits had low or high h^2[SNP]. For example, HF and stroke, with h^2[SNP] values of 0.6 and 0.8%, respectively, showed a relatively high r[g] of 0.552 (standard error [SE] = 0.051), whereas AF and CAD, despite having higher h^2[SNP] (2.5 and 3.2%, respectively), exhibited a lower r[g] of 0.197 (SE = 0.024). Fig. 2. Genome-wide and local genetic correlations, and genetic overlap of the six cardiovascular diseases. [88]Fig. 2 [89]Open in a new tab a Error-bar plot of the SNP-based heritability (h^2[SNP]) point estimates for the six cardiovascular diseases, calculated using univariate LDSC. The magnitude and precision of the estimates varied substantially across the CVDs (h^2[SNP] range from 0.0059 to 0.0324; standard error (SE) range from 0.0005 to 0.0033). Error bars represented one standard deviation (SD) (1.96 × SE). b Network visualization of the Bonferroni-corrected significant global r[g] among the cardiovascular diseases, calculated using bivariate LDSC. Connections represented significant r[g], with the correlation value along connections, thicker lines indicating stronger correlations, and dark gray denoting more significant correlations. The size of the nodes is weighted by the sample size and h^2[SNP] of the given cardiovascular disease (size = h^2[SNP] × sqrt(n)) (AF, n = 1,030,836; CAD, n = 1,165,690; VTE, n = 1,500,861; HF, n = 977,323; PAD, n = 511,634; Stroke, n = 1,308,460). c Along the diagonal, univariate MiXeR estimates are provided for each CVD. h^2[SNP] = SNP-based heritability estimate; polygenicity[90] = number (in thousands) of causal variants with the strongest effects required to explain 90% of SNP-based heritability. Mixer-modeled genome-wide genetic overlap and genetic correlations (top-right) and LAVA local correlations (bottom-left) across the six CVDs are shown. Top-right: MiXeR Venn diagrams showing the number of shared and disorder-specific “causal” variants for each pair of diseases. Genome-wide genetic correlation (r[g]) and genetic correlation of shared variants (r[gs]) are represented by the color of the disease-specific (r[g]) and shared regions (r[gs]), respectively, in the Venn diagrams. For AF-HF, CAD-PAD, and HF-PAD, both AICs (best_vs_min_AIC and best_vs_max_AIC) were negative, indicating that the analysis lacked sufficient power to provide precise estimates of genetic overlap. Bottom-left: Volcano plots of LAVA local genetic correlation coefficients (rho, y-axis) against -log10 (P values) for each pairwise analysis at each locus. P values of the local r[g] were corrected for the total number of bivariate tests conducted (P  =  0.05/ the number of bivariate tests = 0.05/1672  =  2.9 × 10^−5). All statistical tests were two-sided. Larger dots with black circles represent loci significantly correlated after the Bonferroni correction. MiXeR-estimated r[g] and r[gs], and LAVA estimated rho were represented on the same blue to red color scale. Note that the Volcano plots were plotted at P values truncated by 1 × 10^−10 for better visualization. AF atrial fibrillation, CAD coronary artery disease, VTE venous thromboembolism, HF heart failure, PAD peripheral artery disease. Although r[g] is a valuable measure to assess the genetic similarity between two phenotypes, it fails to capture all dimensions of genetic overlap. As r[g] is a genome-wide summary measure, it cannot distinguish genetic overlap resulting from a mixture of concordant and discordant effects from the absence of genetic overlap, leading to an estimated r[g] close to 0 in both cases. Therefore, it was imperative to use multiple methods with different modeling assumptions to capture this “missing dimension” of genetic overlap and comprehensively describe the shared genetic architecture of CVDs beyond genetic correlation, including the bivariate causal mixture model (MiXeR)^[90]27, local analysis of (co)variant association (LAVA)^[91]28, and pairwise GWAS (GWAS-PW)^[92]29. To further characterize the extent of polygenic overlap beyond genetic correlation, we applied MiXeR, a modeling framework that estimates the total number of trait-influencing (i.e., “causal”) variants for each phenotype via univariate analysis, as well as the number of shared and trait-specific variants between pairs of phenotypes via bivariate analysis. First, we used univariate MiXeR analysis to estimate the number of trait-influencing variants for each trait (i.e., polygenicity) and the average magnitude of additive genetic associations among these variants (i.e., discoverability). Among the six CVDs, HF and CAD exhibited higher estimated polygenicity, indicative of more extensive involvement of genetically correlated genomic regions compared to other CVDs (Supplementary Data [93]3a). In contrast, PAD exhibited higher discoverability but lower polygenicity than other CVDs. As a result, PAD exhibited comparatively fewer genomic loci than other CVDs (Supplementary Data [94]3a). Notably, the estimated sample size required to achieve 90% h^2[SNP] was over 24 times smaller for PAD than for HF (Supplementary Fig. [95]1). Overall, there were substantial genetic overlaps among the six CVDs, with the median Dice coefficient (i.e., the proportion of shared variants to the total number of variants) was 0.362 (ranging from 0.207 to 0.732; Fig. [96]2c, Supplementary Fig. [97]2, and Supplementary Data [98]3b), both in the presence of moderate positive r[g] (CAD-HF) and minimal r[g] (AF-PAD). For example, the genetic overlap between CAD and HF was particularly striking, with 1397 (standard deviation [SD] = 254) shared variants, representing 93.3% of the variants influencing CAD and 60.9% of the variants influencing HF. This mirrored their robust positive genome-wide r[g] of 0.677 (SE = 0.030) and genetic correlation of shared variants (r[gs] = 0.699, SE = 0.013). In contrast, fewer shared variants (74, SD = 11) were observed between AF and PAD, representing 14.0% of AF-influencing variants and 18.9% of PAD-influencing variants, consistent with their weak genome-wide r[g] of 0.204 (SE = 0.046). This was associated with weak positive genetic correlations for both shared variants (r[gs] = 0.204, SE = 0.046) and genome-wide levels (r[g] = 0.140, SE = 0.015). Extensive genetic overlap was also observed in the presence of weak genetic correlations. For example, CAD and VTE shared a large number of ‘causal’ variants (290, SD = 88), with only 35 unique VTE variants, despite having a weak genome-wide genetic correlation (r[g] = 0.182, SE = 0.026) and genetic correlation of shared variants (r[gs] = 0.220, SE = 0.008). This pattern of extensive genetic overlap but weak genetic correlation suggests a mixture of concordant and discordant genetic effects, as indicated by the MiXeR-estimated proportion of shared “causal” variants with concordant effects (0.696, SE = 0.074). To further investigate the regional architecture of genetic sharing between CVDs, we applied LAVA to estimate pairwise local genetic correlations (local r[g]) across ~2495 1-Mb semi-independent genomic regions. This approach facilitated the detection of localized genetic sharing and mixed-effect directions, even in disease pairs with minimal genome-wide r[g]. Among 896 unique genomic regions that met the significance threshold (P < 1 × 10^−4), 495 (55.25%) exhibited univariate genetic signals for multiple CVDs, with 21 regions (2.34%) displaying signals for more than half of the CVDs (Fig. [99]2c and Supplementary Data [100]4). In total, 130 unique genomic regions were significant in bivariate analyses (P < 0.05) across CVDs, with 18 of these regions shared more than one CVD pair, further providing evidence of mixed-effect directions. This phenomenon was most pronounced between CAD and VTE, where 14 regions were positively correlated and 5 were negatively correlated. Even CAD and PAD, despite their strong positive rg, exhibited a mixture of effect directions, with ten positively and 1 negatively correlated regions. However, after Bonferroni correction, only 24 distinct genomic regions remained significant for local rg, and just two regions displayed significant correlations for more than two CVDs (P = 0.05/number of bivariate tests = 0.05/577 = 8.67 × 10^−5; Fig. [101]2c, Supplementary Data [102]5, and Supplementary Note [103]2). To further investigate whether the genomic regions identified by LAVA harbored variants associated with multiple CVDs, we applied GWAS-PW to estimate the posterior probabilities of pleiotropy across the 2495 LAVA-defined regions shared by both traits. Overall, GWAS-PW analysis identified 2112 unique genomic regions with variants influencing pairs of CVDs (i.e., PPA3 >0.5 or PPA4 >0.5, Supplementary Data [104]6), suggesting the presence of shared genomic regions and causal variants. Notably, 1238 unique genomic regions demonstrated robust pleiotropic signals across loci shared by more than two trait pairs. Additionally, 15 out of the 24 unique genomic regions were significantly detected by both the bivariate LAVA and GWAS-PW analyses. For example, within the locus (LD block 1398; chr9: 21,674,033-23,091,193), six trait pairs exhibited strong positive local r[g] at the nominal significance level (P < 0.05). Four of these six trait pairs (including AF-CAD, CAD-HF, CAD-PAD, and CAD-Stroke) remained significant after Bonferroni correction and were also validated by GWAS-PW. Further analysis of this region using hypothesis prioritization in multi-trait colocalization (HyPrColoc) provided strong colocalization evidence across all CVDs except VTE and PAD, with a posterior probability (PP) greater than 0.7. This locus encompassed a shared causal SNP (rs4977574), located within an intron of cyclin-dependent kinase inhibitor 2B antisense RNA 1 (CDKN2B-AS1) on chromosome 9q21.3. Previous studies suggest a potential association between the CDKN2B-AS1 gene rs4977574 A/G polymorphism and susceptibility to coronary heart disease, which may influence the progression and severity of vascular calcification in vascular smooth muscle cells^[105]30. In summary, these findings reinforced the genetic resemblance among CVDs, implicating the likely presence of shared molecular mechanisms contributing to general CVD predisposition. Causal association among the six CVDs (i.e., vertical pleiotropy) The shared genetic basis can be interpreted as genetic variants affecting multiple complex phenotypic traits by vertical and/or horizontal pleiotropy. In detail, horizontal pleiotropy arises when two phenotypes are independently influenced by the same genetic variant, suggesting possible shared biological pathways linking complex traits. However, vertical pleiotropy occurs when a genetic variant influences one phenotype, which in turn affects the expression of a second phenotype. This type of pleiotropy—also referred to as genetic causality—is the primary focus of MR studies. Subsequently, the construction of the latent causal variable (LCV) model^[106]31 allowed us to estimate the partial genetic causality between significantly correlated CVD pairs. We observed convincing evidence for three CVD pairs (Supplementary Data [107]7), where the absolute value of genetic causality proportion (GCP) exceeded 0.6 and the P value passed Bonferroni correction (P < 0.05/the number of independent CVD pairs = 0.05/6  =  8.33 × 10^−3; Supplementary Data [108]7). By considering the sign of the r[g], it was plausible to infer that CAD could elevate HF risk, while VTE might have an adverse impact on both CAD and stroke. Notably, the AF-HF trait pair (GCP  =  0.575, P  =  1.07 × 10^−3) exhibited trends toward partial genetic causation, suggesting AF might partially influence HF, but fell short of the stringent GCP threshold. To validate the reproducibility of these partial causal associations, the MRlap method was employed, confirming the positive causal associations between CAD-HF and VTE-Stroke (Supplementary Fig. [109]3 and Supplementary Data [110]8a). Furthermore, a trait pair (VTE-CAD) also showed a partial positive causality, although it did not withstand multiple testing corrections (P < 0.05/the number of combinations/the number of MR-tests = 0.05/6/2 = 4.17 × 10^−3). However, the AF-HF pair exhibited bias due to reverse causality, which may have skewed the results. In all, we found a high degree of agreement between the MRlap and LCV results, with both methods inferring causality that aligns with existing epidemiological evidence. To investigate potential bias due to sample overlap, we repeated the analyses using GWAS summary statistics of AF and HF based on GWAS excluding the UK Biobank sample (the main source of overlap). We observed small differences in the estimates, but the interpretation remained the same for all results (Supplementary Data [111]8a). We extended the univariate results with multivariate MR^[112]32 to assess whether causal effects observed for CAD-HF and VTE-Stroke were confounded by other CVDs using both raw and UKB-excluded summary data. We found that the association between genetic liability to CAD and HF remained statistically significant even after adjusting for other CVDs, a pattern also observed for the VTE-Stroke pair (Supplementary Data [113]8b). Furthermore, the association between genetic liability to CAD and HF was strengthened in the UKB-excluded summary data, with the odds ratio (OR) rising from 1.33 (CAD-HF) to 2.72 (CAD-HF_UKB-excluded). In contrast, the association between genetically contributed VTE susceptibility and stroke slightly weakened after adjusting for other CVDs, with the OR changing from 1.09 (VTE-Stroke) to 1.08 (VTE-Stroke_UKB-excluded). Overall, these findings further support the robust causal relationships between CAD-HF and VTE-Stroke. In conclusion, the MR analyses illuminate that the shared genetic basis for most CVDs is not potentially mediated by vertical pleiotropy, given the limited causal relationships observed across CVDs despite their strong genetic associations. This implies that pleiotropy is more likely horizontal in nature, driven by shared risk factors or upstream biological mechanisms rather than direct causal pathways. Shared and novel loci for the six CVDs Previous studies have demonstrated that cross-trait meta-analysis is a powerful approach for increasing statistical power to identify pleiotropic SNPs, shared genetic architecture, and common biological annotations across distinct psychiatric and immunological diseases^[114]18,[115]22. Building on this, we conducted cross-trait GWAS meta-analyses using the MTAG method^[116]16 to enhance the identification of novel genetic associations for each CVD by pinpointing pleiotropic SNPs contributing to CVD susceptibility. As a result, we observed an increase in maximum effective sample sizes, with the following increments: from 1,030,836 to 1,076,012 for AF; from 1,165,690 to 1,237,236 for CAD; from 1,500,861 to 1,631,058 for VTE; from 977,323 to 1,219,100 for HF; from 511,634 to 843,568 for PAD; and from 1,308,460 to 1,735,909 for stroke (Table [117]1 and Fig. [118]3a–f). Importantly, analysis of heritability Z-scores derived from MTAG results revealed stronger genetic signals compared to the original GWAS outcomes (Table [119]1). Furthermore, LDSC intercepts were close to one, suggesting that the observed increase in mean χ2 statistics was attributable to polygenicity rather than confounding biases such as population stratification. The quantile–quantile (Q-Q) plots and genomic inflation factors (λGC) for both MTAG and original GWAS results are presented in Supplementary Figs. [120]S4, [121]S5. The λGC values ranged from 1.096 to 1.421 in the GWAS results and from 1.014 to 1.446 in the MTAG results, suggesting no evidence of systematic inflation. Detailed information on novel genomic loci and mapped genes identified from the MTAG analysis of the six CVDs is provided in Supplementary Notes [122]4, [123]5, Supplementary Fig. [124]5, and Supplementary Data [125]9–[126]16. Table 1. Statistical summary of GWAS and MTAG results Atrial fibrillation Coronary artery disease Venous thromboembolism Heart failure Peripheral artery disease Stroke GWAS MTAG GWAS MTAG GWAS MTAG GWAS MTAG GWAS MTAG GWAS MTAG mean χ2 1.5451 1.4888 1.7648 1.8020 1.5698 1.4830 1.1591 1.2062 1.1025 1.1253 1.2280 1.1944 h^2[SNP] 0.0250 0.0245 0.0324 0.0345 0.0182 0.0183 0.0080 0.0118 0.0094 0.0157 0.0060 0.0084 λGC 1.2932 1.2697 1.4210 1.4460 1.3203 1.2498 1.1270 1.1555 1.0957 1.0135 1.1876 1.1459 Intercept 1.0538 1.0025 1.0218 1.0067 1.0511 0.9501 1.0073 0.9797 1.0157 0.9795 1.0738 0.9750 Ratio 0.0978 0.0052 0.0302 0.0083 0.0898 <0 0.0416 <0 0.1560 <0 0.3245 <0 Sample size 1,030,836 1,076,012 1,165,690 1,237,236 1,500,861 1,631,058 977,323 1,219,100 511,634 843,568 1,308,460 1,735,909 Total SNPs 6,925,287 5,782,223 6,925,287 5,782,223 6,925,287 5,782,223 6,925,287 5,785,906 6,925,287 5,785,906 6,925,287 5,785,906 nSig SNPs 10,784 8401 15,789 12,118 9070 6511 272 449 92 426 941 585 nGenomic locus 117 108 203 171 91 72 11 13 4 37 24 18 nInd Sig SNPs 749 592 1049 862 765 598 21 40 8 60 50 48 nLead SNPs 224 191 362 295 222 176 12 14 5 43 25 20 nGenes mapped by position using FUMA 397 345 662 567 447 375 38 47 21 63 67 66 nGenes mapped by MAGMA 298 241 551 474 401 303 20 31 5 17 43 43 nGenes mapped by eqtl using FUMA 1018 900 1921 1768 1147 1056 70 121 25 151 156 157 nGenes mapped by TWAS 387 322 831 755 598 494 19 32 11 33 43 44 [127]Open in a new tab All reported P values are two-sided. mean χ2 mean chi-squared value, h^2[SNP] SNP-based heritability, λGC Lambda GC of 1 indicates no inflation of the median χ2 association statistic, Intercept LD score intercept, a value of 1 indicates that inflation is not due to population stratification biases, Ratio (intercept-1)/(mean chi-squared value-1), the ratio indicates what percentage of the inflation can be attributed to other sources than polygenicity, nSig SNPs number of genome-wide significant SNPs after additional Bonferroni correction for three independent traits (P < 5 ×  10^−8/3 = 1.67 × 10^−8), nGenomic locus number of the genomic locus, nLead SNPs number of lead SNPs in the genomic locus (P < 5 × 10^−8 and r^2 < 0.1), nInd Sig SNPs number of the independent significant SNPs in the genomic locus (P < 5 × 10^−8 and r^2 < 0.6), nGenes mapped by position using FUMA Number of genes mapped by position using FUMA, nGenes mapped by MAGMA number of genes mapped by MAGMA, nGenes mapped by eqtl using FUMA Number of genes mapped by eqtl using FUMA, nGenes mapped by TWAS Number of genes mapped by TWAS, FUMA functional mapping and annotation of genetic associations, MAGMA multi-marker analysis of genomic annotation, TWAS transcriptome-wide association study. Fig. 3. Results of multi-traits meta-analysis by MTAG based on over 1.2 million individuals. [128]Fig. 3 [129]Open in a new tab a–f Manhattan plots for MTAG results of each cardiovascular disease. The X-axis represented the chromosomal position, and the Y-axis showed the negative log10-transformed P values for each SNP. Cytoband annotations for the newly identified genomic loci are shown in gray. Genome-wide significance was indicated by the multiple comparisons-corrected threshold of P = 1.67 × 10^−8 (red dotted line). Colored dots indicated an independent genome-wide significant association with the smallest P value (Top lead SNP). Only SNPs common to all summary statistics were included. All statistical tests are two-sided. MTAG multi-trait analysis of GWAS, AF atrial fibrillation, CAD: coronary artery disease, VTE venous thromboembolism, HF heart failure, PAD peripheral artery disease. Exploration of shared biological mechanisms among the six CVDs (i.e., horizontal pleiotropy) In light of the challenges posed by prior GWAS results in discerning shared biological mechanisms among CVDs, our subsequent efforts involved a comprehensive assessment of genetic overlap within the MTAG results of six CVDs, spanning genomic loci, SNPs, genes, and pathways. Among the noteworthy observations, a total of 38 pleiotropic loci showed genetic signals across multiple CVDs, of which 9 (22.0%) were shared by more than half of the considered CVDs (Supplementary Data [130]17a). However, when testing whether these overlapping loci shared causal variants through hypothesis prioritization in multi-trait colocalization (HyPrColoc)^[131]33, only 12 (31.58%) loci exhibited strong colocalization evidence of sharing causal variants (posterior probability [PP] >0.70). Out of these, 11 loci were further corroborated by Metasoft analysis^[132]34 (Supplementary Fig. [133]6). The most prominent pleiotropic locus is located within an intron of the lipoprotein(a) gene (LPA) at 6q25.3. This locus harbored the shared causal SNP rs10455872, which colocalized across all CVDs except MTAG_VTE_EUR. Notably, rs10455872 has been established as linked to elevated lipoprotein (a) (Lp[a]) levels, in addition to an increased risk of CVDs across the general population^[134]35. Although statistically significant only in MTAG_CAD_EUR and MTAG_PAD_EUR (P[meta] = 1.11 × 10^−143, Fig. [135]4a) through MetaSoft analysis, the strong signal underscores the pleiotropic role of this locus. A second pleiotropic signal was found at 7p21.1 near rs8084351, located in the intergenic region between the TWIST1 (twist family bHLH transcription factor 1) and HDAC9 (histone deacetylase 9) genes. This locus demonstrated a strong connection with prevalent vascular diseases, including MTAG_CAD_EUR, MTAG_PAD_EUR, and MTAG_Stroke_EUR (P[meta] = 2.20 × 10^−41, Fig. [136]4b). The signal in our analysis was associated with artery aorta eQTLs for TWIST1 rather than HDAC9 (eQTL association FDR for TWIST1 = 5.54 × 10^−4), supporting TWIST1 as a plausible candidate gene. Previous functional studies have demonstrated that the rs2107595 variant modulates TWIST1 expression and affects vascular smooth muscle cell function^[137]36. We also discovered a novel pleiotropic locus at 19p13.2, located ~11 kb upstream of the LDLR gene (low-density lipoprotein receptor). This locus was associated with all CVDs except MTAG_AF_EUR. Although it fell short of the stringent PP threshold (PP < 0.7), the lead SNP rs112898275 showcased noteworthy meta-analysis significance (P[meta] = 1.47 × 10^−64, Fig. [138]4c). A pivotal aspect of our analysis encompassed investigating the presence of opposite directional effects of shared causal variants among the 12 pleiotropic loci across various diseases. Notably, all 12 loci exhibited the same directional effect on diseases, including ten susceptible loci and two protective loci, in line with their strong genome-wide r[g] (Supplementary Fig. [139]7). Fig. 4. The three most pleiotropic loci associated with all six cardiovascular diseases. [140]Fig. 4 [141]Open in a new tab a–c Forest plots with PM-plots show disease-specific effects of the index SNP in each locus, including rs10455872 on 6q25.3, rs2107595 on 7p21.1, and rs112898275 on 19p13.2. For each locus, the disease-specific effects of the causal SNP were illustrated using ForestPMPlot, based on Metasoft analysis. The first panel was a forest plot displaying the disease-specific association P value, log odds ratios (ORs), and standard errors (SE) for the SNP. Each black square represents the point estimate of the effect size (log OR) for an individual study, with the size of the square proportional to the weight assigned to that study in the meta-analysis. The width of the line in each study represents the confidence interval (CI) for the effect size estimate. The diamond at the bottom indicates the overall effect estimate from the fixed-effects (FE) model, with its width corresponding to the 95% CI of the pooled estimate. The meta-analysis P value and corresponding summary statistic were displayed at the top and bottom of the forest plot, respectively. The second panel was a PM-plot where the X-axis represents the m-value, the posterior probability that the effect exists in each disease, and the Y-axis represents the disease-specific association P value as −log10 (P value). Diseases were represented by dots, where the size corresponds to the sample size of the individual GWAS (AF, n = 1,076,012; CAD, n = 1,237,236; VTE, n = 1,631,058; HF, n = 1,219,100; PAD, n = 843,568; Stroke, n = 1,735,909). Diseases with estimated m-values of at least 0.9 are colored red, indicating that the SNP has an effect on the disease, while those with m values below 0.1 are marked blue, suggesting that the SNP does not have an effect. Diseases with estimated m-values between 0.1 and 0.9 are colored green, indicating that the SNP may have an uncertain effect on the disease. All statistical tests were two-sided. AF atrial fibrillation, CAD coronary artery disease, VTE venous thromboembolism, HF heart failure, PAD peripheral artery disease. Multi-trait colocalization analysis using HyPrColoc further highlighted 7 pleiotropic loci that were colocalized to share a causal variant, supporting the significant role of metabolic traits—including anthropometric traits^[142]37, blood pressure traits^[143]38, lipidemic traits^[144]39, glycemic traits^[145]40, kidney function^[146]41, liver function^[147]42, and other traits, such as carotid artery intima-media thickness (cIMT)^[148]43 and C-reactive protein (CRP)^[149]44—in the pathogenesis of CVDs (Supplementary Data [150]17b, c). Of these, three loci showed substantial evidence of colocalization among multiple CVDs and more than one metabolic trait. For example, a locus located at 1p36.22 was found to be colocalized between multiple CVDs (MTAG_AF_EUR and MTAG_Stroke_EUR) with multiple metabolic traits—including blood pressure, lipidemic, and glycemic traits (e.g., glycated hemoglobin [HbA1c])—with the index SNP rs880315 (mapped to CASZ1 [castor zinc finger 1]) identified as the likely shared causal variant. Notably, the locus at 16q12.2 (index SNP: rs11075990, mapped to FTO [FTO alpha-ketoglutarate dependent dioxygenase]) was found to be colocalized with a broad spectrum of metabolic traits—including diastolic blood pressure (DBP) and pulse pressure (PP) (blood pressure traits), high-density-lipoprotein cholesterol (HDL-C) and triglycerides (TC) (lipidemic traits), HbA1c (glycemic trait), alanine aminotransferase (ALT) (liver function), and CRP—although the PP did not exceed the stringent threshold (PP > 0.70). Besides, two loci (1p34.3 and 7p21.2) were colocalized between multiple CVDs and blood pressure traits, especially systolic blood pressure (SBP). Two other loci exhibited strong evidence of colocalization between multiple CVDs and lipidemic traits (6q25.3) or glycemic traits (18q21.1), respectively (Supplementary Data [151]17d). We further tested whether these pleiotropic loci could be colocalized with common risk factors, such as hypertension^[152]45, type 2 diabetes (T2D)^[153]46, chronic kidney disease (CKD)^[154]47, and non-alcoholic fatty liver disease (NAFLD)^[155]48. For example, the locus (4q21.21) was identified to have strong colocalization signals with MTAG_AF_EUR, MTAG_CAD_EUR, estimated glomerular filtration rate (eGFR), and CKD, where the index SNP rs1458038 (located near the FGF5 [fibroblast growth factor 5] gene) was identified as the likely shared causal variant. Strikingly, the locus located at the FTO gene also exhibited evidence of colocalization with NAFLD, although the PP did not meet the stringent threshold. All of these—680 lead SNPs, 924 susceptibility genes identified by multi-marker analysis of genomic annotation (MAGMA)^[156]49 and 1474 tissue-specific genes identified by transcriptome-wide association study (TWAS)^[157]50—achieved genome-wide significance for any of the six CVDs. Specifically, only 74 (10.88%) lead SNPs (P  <  1.67 × 10^-8; Supplementary Data [158]18), 118 (12.77%) susceptibility genes (P  <  9.45 × 10^−7; Supplementary Data [159]15), and 156 (10.58%) tissue-specific genes (P  <  8.71 × 10^−7; Supplementary Data [160]16) reached genome-wide significance for more than one CVD. Among these, only 17 (2.50%) lead SNPs, 16 (1.73%) susceptibility genes, and 11 (0.75%) tissue-specific genes achieved genome-wide significance in more than half of the six CVDs (i.e., at least four). Together with strong evidence from colocalization analyses, direct SNP- and gene-level comparisons across the six CVDs support the presence of shared genetic signals in overlapping loci. While the observation of a relatively limited number of overlapping genes and regions among cardiovascular diseases is apparent, the distinct genes associated with these conditions could potentially converge within shared biological pathways or display enrichment in similar tissues or functional categories. To explore this notion, we undertook a comprehensive analysis that included: (i) gene set enrichment based on MAGMA-identified susceptibility genes, (ii) pathway enrichment analysis of tissue-specific genes using Metascape, and (iii) tissue and functional category enrichment using SNP-based heritability from LDSC-SEG^[161]51. After meticulous correction for 9398 tests across the six CVDs (that is, 7744 and 1654 gene sets for GO BP and Reactome, respectively; P  =  0.05/9398/3  =  1.77 × 10^−6; Supplementary Data [162]19), we observed little convergence among sets of CVDs for gene sets. Of these, three gene sets—circulatory system development, circulatory system process, and heart development—were significantly enriched in more than one CVD, and none showed significance in more than two. Subsequently, more lenient enrichment tests using Metascape similarly revealed limited convergence, with circulatory system process being the only pathway enriched across multiple CVDs based on tissue-specific genes (Supplementary Data [163]20). Similarly, after separately correcting for tissues or functional categories and six CVDs (that is, 49 tissues and 489 functional categories; P-value for tissues = 0.05/49/3  =  3.40 × 10^−4; P value for functional categories = 0.05/489/3  = 3.41 × 10^−5; Supplementary Figs. [164]8, [165]9 and Supplementary Data [166]21, [167]22), no test surpassed the significance threshold. Taking a less stringent approach, only correcting separately for tissues or functional categories, there were still no convergence among CVDs for tissues and functional categories. To assess whether certain functional genomic categories (e.g., regulatory elements in particular tissues or cell types) are enriched among multiple CVDs, we applied GARFIELD (GWAS analysis of regulatory and functional information enrichment with LD correction)^[168]52. After correction for the 1005 tested functional categories across all CVDs (P  =  0.05/1005/3 = 1.66 × 10^−5; Supplementary Fig. [169]10 and Supplementary Data [170]23), we observed significant enrichment in the exon region and for multiple regulatory features. These included 240 (23.88%) DNaseI hypersensitive sites, 86 (8.56%) chromatin accessibility peaks, 37 (3.68%) histone-modified regions, 19 (1.89%) transcription-factor footprints, 8 (0.796%) chromatin states, 1 transcription-factor binding site (TFBS), and 2 formaldehyde-assisted isolation of regulatory elements (FAIRE) of different tissues or cell types in more than one CVD—excluding MTAG_HF_EUR, MTAG_PAD_EUR, and MTAG_Stroke_EUR. These findings suggest that shared functional category signals are largely driven by AF, CAD, and VTE, which are the CVDs with the strongest genetic associations, thus limiting their interpretation as general CVD susceptibility signals. Likewise, of the 74 lead SNPs, 118 MAGMA-derived susceptibility genes, and 156 TWAS-identified tissue-specific genes that were genome-wide significant across more than one CVD, most—45 (60.81%), 100 (84.75%), and 113 (72.44%), respectively—were shared among multiple diseases, again excluding MTAG_HF_EUR, MTAG_PAD_EUR, and MTAG_Stroke_EUR. Ultimately, these findings suggest that signals indicating shared biological mechanisms were predominantly driven by AF, CAD, and VTE, rather than reflecting a generalizable genetic basis across all CVDs. This is likely due to their relatively higher heritability and larger GWAS sample sizes, both of which enhance the statistical power to detect shared genetic associations. In contrast, other CVDs, including HF, PAD, and Stroke, may exhibit more heterogeneous etiologies or weaker genetic architectures, which complicates the identification of shared genetic components. Discussion In this work, we systematically investigate the overlap of genomic loci, SNPs, genes, gene sets, tissue types, and functional categories across six major CVDs, leveraging data from over 1.2 million individuals. This comprehensive approach uncovers locally shared genetic mechanisms that would otherwise go undetected and yields four key insights into the shared genetic architecture of CVDs. Specifically, we quantify genetic overlap beyond genome-wide genetic correlations across CVDs, and identify a putative causal relationship between CAD and HF. We further identify 38 genomic loci with pleiotropic effects, the most prominent of which is located at the LPA gene. Analysis of associated genes and gene sets reveals strong enrichment in pathways related to circulatory system development. Notably, distinct genetic patterns are primarily driven by AF, CAD, and VTE, suggesting a mismatch between clinical classifications and underlying shared biological mechanisms. First, our findings systematically quantified the genetic architecture of CVDs beyond genome-wide genetic correlations by integrating multiple methods, including LDSC, MiXeR, LAVA, and GWAS-PW. We observed marked differences in polygenicity across CVDs, accompanied by extensive genetic overlap and relatively few disease-specific variants. These findings were further supported by LAVA local r[g], which revealed heterogeneous effect directions that were not captured by genome-wide estimates. However, the Dice coefficient may underestimate the true extent of genetic overlap, particularly when one trait harbors significantly fewer causal variants, even if a substantial proportion of them are shared with the other trait. This limitation highlights the need to interpret overlap metrics cautiously, especially in cases with asymmetrical statistical power for variant discovery, such as the HF-PAD pair. In such settings, shared architecture may be obscured by the imbalance in variant discovery power (Supplementary Note [171]3). Overall, our results suggest that phenotypic specificity in CVDs is primarily shaped not by disease-specific variants, but by the distribution of effect sizes and directions among shared pleiotropic variants. Second, our MR analyses illuminated that genetic predisposition to CAD is significantly associated with an elevated risk of HF, consistent with findings from previous epidemiological studies. The improved survival rate following acute myocardial infarction, along with the declining prevalence of hypertension and valvular heart disease as primary causes of incident HF, has made CAD the leading risk factor for HF development^[172]53. Notably, nearly two-thirds of HF cases are now attributed to underlying CAD, underscoring its central role in the pathogenesis of HF^[173]54. We also identified a potential risk effect of VTE on stroke. However, current research mainly highlights an increased short-term risk of VTE following ischemic stroke, while evidence directly linking VTE to stroke as a causal risk factor remains limited, necessitating cautious interpretation. In sum, the shared genetic basis between most pairs of CVDs is potentially mediated by horizontal pleiotropy rather than vertical pleiotropy, pointing to a broader genetic framework underlying the risk of cardiovascular pathology. Moreover, most CVDs frequently co-occur with other cardiometabolic disorders and/or factors, suggesting a shared genetic susceptibility^[174]4,[175]5,[176]55. Our MR analysis of 24 metabolic traits or diseases and six major CVDs further confirmed that most metabolic traits and disorders exert a causal influence on CVD risk (Supplementary Note [177]6 and Supplementary Data [178]24). Among these, hypertension emerged as the most robust causal risk factor. Third, to address limitations in previous cross-disease genetic overlap research, we employed MTAG analysis to enhance power for detecting pleiotropic variants, which allowed us to elucidate the genetic foundation for shared risks across diseases. Remarkably, we identified novel genomic loci associated with various CVDs, significantly increasing the effective sample sizes for these conditions. Our variant-level analyses strongly supported the existence of horizontal pleiotropy, with ~10.88% of lead SNPs influencing more than one of the examined CVDs. Intriguingly, multi-trait colocalization analysis from HyPrColoc corroborated that 12 loci displayed particularly extensive pleiotropic profiles across CVDs. Of these, seven loci were further found to be colocalized across multiple CVDs and a variety of metabolic traits (such as blood pressure, lipidemic, and glycemic traits). The most highly pleiotropic locus was located in the LPA gene, which encodes apolipoprotein(a), a key component of Lp(a) that was strongly associated with increased CVD risk^[179]35,[180]56,[181]57. This result was supported by the colocalization between multiple CVDs and lipidemic traits (such as low-density lipoprotein and total cholesterol), which exemplified the complex genetic interplay underlying related diseases and their risk factors. Similarly, the second-most pronounced pleiotropic locus has been demonstrated to regulate the expression of the TWIST1 gene in vascular smooth muscle cells, which was also supported by the colocalization between multiple CVDs and blood pressure traits, particularly SBP. The typical example was the locus 1p36.22, located in the CASZ1 gene, which was colocalized with multiple CVDs and common metabolic traits (including blood pressure, lipidemic, and glycemic traits). CASZ1 encodes a zinc finger transcription factor that can inhibit tumorigenesis. Interestingly, CASZ1 promotes vascular development by directly regulating the EGFL7/RhoA-mediated pathway, which may contribute to stroke phenotypes^[182]58. Managing risk factors is also crucial to prevent stroke, because the association of CASZ1 variant rs880315 with stroke has been proven to be affected by its risk factors^[183]59. Another interesting example was the locus 16q12.2, located in the intron of the FTO gene, which had been reported to be associated with fat mass and obesity in multiple populations^[184]60. FTO is known to encode a nucleic acid methylase dependent on α-ketoglutarate and iron (Fe II), which is ubiquitous in human tissues^[185]61. Although this locus showed low colocalization signals between multiple CVDs and most metabolic traits, it was highly colocalized with MTAG_VTE_EUR, MTAG_HF_EUR, MTAG_PAD_EUR, and obesity-related traits (such as body mass index [BMI] and waist-hip ratio [WHR]). The index SNP rs1421085, located ~18.9 kb from the previously identified causal variant rs11075990, was identified as a shared causal variant. This result highlights the FTO gene as a common genetic connection between metabolic traits (such as obesity) and CVDs. Fourth, we unearthed extensive evidence linking the cross-disease genetics of CVDs to circulatory system development. Alongside LPA, pleiotropic genes like IL6R and SMAD7, and gene set assessment, further highlighted the connection between pleiotropy and genetic effects on circulatory system development^[186]62–[187]65. Our functional enrichment analyses using GARFIELD revealed that shared functional category signals were predominantly driven by AF, CAD, and VTE. These results underscore the existence of two distinct groups of CVDs: the first group, which includes AF, CAD, and VTE, is characterized by stronger genetic signals and more prominent shared biological mechanisms. In contrast, the second group, comprising HF, PAD, and stroke, exhibits weaker genetic signals and fewer overlapping biological mechanisms. This observation challenges the classical categorical classification of CVDs. Additionally, to evaluate the potential causal roles of circulating proteins in CVD, we conducted proteome-wide MR and colocalization analyses using whole-proteome data from two major proteomic platforms (i.e., SomaScan and Olink), encompassing 2730 proteins. However, due to challenges such as differences between proteome profiling platforms and unavoidable sample overlap between exposure and outcome datasets, we did not identify any proteins with consistent causal associations across multiple CVDs (Supplementary Note [188]7, Supplementary Fig. [189]11, and Supplementary Data [190]25, [191]26). However, our study has several limitations. First, although the dataset encompassed six major CVDs, larger sample sizes and the inclusion of additional disease types could further enhance the statistical power to detect cross-disorder genetic effects. Second, our study predominantly focused on individuals of European ancestry, and more research is needed to validate findings in other populations. To address this limitation, we incorporated GWAS summary statistics from East Asian cohorts for replication, which partially confirmed the consistency of the genetic architecture identified in the European sample (Supplementary Note [192]8 and Supplementary Data [193]27–[194]39). Nonetheless, future research adopting a trans-ancestral approach will be crucial for assessing the broader generalizability of these findings. Finally, while GWAS designs capture common variant aspects of genetic architecture, studies examining copy-number variants and rare mutations are needed for a comprehensive understanding. In conclusion, our study provides new insights into the complex genetic relationships among CVDs, uncovering shared genetic mechanisms, pleiotropy, and the role of circulatory system development. These findings not only advance our understanding of CVD genetics but also highlight potential therapeutic targets and emphasize the need for further research to disentangle the intricate interplay between genetic and environmental factors in cardiovascular health. Methods Data sources and quality control We sought the largest GWAS summary statistics of CVDs from publicly available data sources of European ancestry, due to the limited availability of well-powered GWAS for other ancestries, and selected GWAS with sample sizes larger than 50,000 to ensure enough statistical power. Finally, we collected GWAS summary statistics for six major CVDs. Detailed information about these diseases and their publication sources was provided in Supplementary Data [195]1 and Supplementary Note [196]1^[197]10,[198]13,[199]15. Only GWAS summary statistics from European ancestry were used in the main analysis. Pre-analysis quality control steps were performed on the GWAS summary statistics, including (1) aligning to 1000 Genomes Project v3 European reference (hg19 genome build); (2) excluding non-autosomal SNPs; (3) filtering out SNPs without rsID or with duplicated rsIDs; and (4) keeping only biallelic SNPs with minor allele frequency (MAF) >0.01. To assure fair and interpretable comparisons across the six CVDs, we filtered all summary statistics to retain only SNPs available for all six CVDs, resulting in a final set of 6,925,287 SNPs. Additional data processing was carried out according to the specific requirements of the different methods used in the subsequent analyses. Estimating SNP-based heritability and genome-wide genetic correlation using LDSC We used univariate linkage disequilibrium (LD) score regression (LDSC) to estimate the SNP-based heritability (h^2[SNP], i.e., the proportion of the phenotypic variance in a trait that could be explained by common genetic variants included in the analysis) of each of the six CVDs^[200]26,[201]66. All GWAS summary statistics were reformatted to match the precomputed LD scores from the 1000 Genomes Project European reference panel. SNPs were excluded if they did not intersect with the reference panel or were located within the MHC region (chromosome 6: 26–35 Mb). Bivariate LDSC was used to estimate genetic correlation (r[g,] i.e., the proportion of genetic variance shared by two traits divided by the square root of the product of their SNP-based heritability estimates). To account for multiple testing across the six CVDs in our analysis, we used the matrix spectral decomposition (matSpD) software^[202]67 to examine the degree of independence among each trait. We determined that the effective number of independent traits was four, resulting in six independent trait pairs. To correct for multiple testing, the Bonferroni correction was applied to the r[g] P values (P  =  0.05/6 combinations = 0.05/6  =  8.33 × 10^−3). Like univariate LDSC, bivariate LDSC only requires GWAS summary statistics and produces estimates that are unbiased by sample overlap. Estimating polygenicity and genetic overlap using MiXeR We used the univariate causal mixture modeling method (MiXeR) analysis to estimate the number of trait-influencing (i.e., “causal”) variants for each trait (i.e., polygenicity) and the average magnitude of additive genetic associations among these variants (i.e., discoverability)^[203]27. Following recommended guidelines, we used the 1000 Genomes Project Phase 3 European reference panel and excluded the MHC region (chromosome 6: 26–35 Mb). Bivariate MiXeR models were applied to infer additive genetic effects as a mixture of four bivariate Gaussian components: (a) variants not associated with either trait, (b) variants uniquely associated with the first trait, (c) variants uniquely associated with the second trait, and (d) variants associated with both traits (i.e., shared disease-influencing variants)^[204]68. Results were presented as Venn diagrams displaying the proportion of shared and unique variants that collectively accounted for 90% of SNP-based heritability in each GWAS. To evaluate genetic overlap, MiXeR was implemented to calculate the Dice coefficient (i.e., the ratio of shared variants to the total number of variants). Model fit was assessed using the Akaike information criterion (AIC), which was based on the maximum likelihood of GWAS z-scores and was visualized by modeled versus observed conditional quantile–quantile (Q-Q) plots. Estimating local genetic correlation using LAVA and GWAS-PW We used univariate local analysis of (co)variant association (LAVA) to test for local SNP-based heritability (local h^2[SNP]) within each CVD. We used genomic regions defined as autosomal linkage disequilibrium (LD) blocks (N = 2495) by ref. ^[205]28, which are characterized by minimal LD between blocks, contain at least 2500 variants each, and have an average size of ~1 Mb. The LD structure was estimated using the European sample reference panel from the 1000 Genomes Project Phase 3. The detection of valid and interpretable local genetic correlation (local r[g]) for bivariate testing requires the presence of sufficient local genetic signal. To this end, we used a threshold of P  <  1 × 10^−4 to filter out non-associated loci (i.e., loci devoid of any univariate heritability) that nonetheless might have contributed to the overall h^2[SNP,] to get a more global overview of the shared genetic signal. A total of 577 bivariate tests were conducted, as multiple tests could occur within a single region when different CVDs exhibit univariate signals in the same locus. Significance testing was performed using simulation-based P values. To correct for multiple testing, the Bonferroni correction was applied to the local r[g] P values (P  =  0.05/the number of bivariate tests = 0.05/577  =  8.67 × 10^−5). Sample overlap across GWAS datasets was accounted for by incorporating pairwise genetic covariance estimates from bivariate LDSC, which were subsequently standardized into a correlation matrix. We applied the Bayesian pleiotropy association method from the pairwise GWAS framework (GWAS-PW) to detect genomic regions with shared genetic effects among CVDs^[206]29. As in the LAVA analysis, we used the 1000 Genomes Project Phase 3 European panel as the LD reference and partitioned it into 2495 semi-independent blocks of ~1 Mb each. For each region, GWAS-PW estimates the posterior probabilities (PPAs) for four scenarios: (i) the locus is associated with the first trait (PPA_1); (ii) with the second trait (PPA_2); (iii) with both traits via the same causal variant (PPA_3); and (iv) with both traits via different causal variants (PPA_4). A posterior probability greater than 0.5 for either PPA_3 or PPA_4) was considered evidence of shared genetic signal within the region. Mendelian randomization analysis Genome-wide or local genetic correlations and overlaps may reveal important insights into shared biological mechanisms between two traits; however, these findings should not be interpreted as evidence of causal relationships in either direction. To evaluate potential causal relationships, we performed the latent causal variable (LCV) model to evaluate partial genetic causality between traits^[207]31. The LCV approach leverages the bivariate effect size distribution of SNPs from two GWAS and their LD scores to estimate the posterior mean genetic causality proportion (GCP), enabling partial genetic causality to be distinguished from r[g]. GCP estimates close to zero for genetically correlated traits suggest that the observed relationship may result from horizontal pleiotropy. In such a case, the traits share genetic pathways but are unlikely to influence each other through vertical pleiotropy within the same causal mechanism. LCV estimates the GCP on a scale ranging from −1 to 1, with values >0 indicating partial genetic causality from the first trait to the second, and values <0 indicating the reverse direction. A strong causal signal was defined as a posterior GCP significantly different from zero (based on a one-sided t-test) and an absolute GCP greater than 0.6. To correct for multiple testing, the Bonferroni correction was applied to the GCP P values (P  =  0.05/6 combinations = 0.05/6  =  8.33 × 10^−3). LCV accounts for SNP-based heritability and genetic correlation between traits and is robust to sample overlap. Notably, the LCV model assumes a single latent variable mediating the effect between two traits, with the effect being unidirectional. However, this assumption may be violated in the presence of bidirectional causality or multiple latent mediators, which could confound the results. Given the limitations of the LCV method, we performed a sensitivity analysis using the MRlap method for CVD pairs that exhibited partial genetic causality^[208]69. MRlap corrected for biases such as sample overlap, weak instrument strength, and the winner’s curse, thereby offering more reliable causal estimates. This method first estimates observed MR-based effects, followed by corrections based on genetic covariance calculated through LDSC. We compared IVW results before and after correction to assess whether sample overlap significantly influenced the findings. If a significant impact was observed, the corrected P values were considered the primary results. We repeated the analyses using AF and HF GWAS summary statistics after excluding the UKB sample, which contributed most of the overlap, and compared the results. We also examined associations between the genetic liability of one CVD and another, adjusting for the presence of other CVDs using multivariable Mendelian randomization (MVMR)^[209]32. MVMR allows us to estimate causal relationships between two traits while controlling for the influence of other related traits. However, since MVMR relies on regression analysis, it does not formally test for mediation effects. Instead, it provides adjusted causal estimates accounting for potential mediating factors, offering insights into the direct relationships between traits. Multi-trait analysis of GWAS using MTAG Multi-trait analysis of GWAS (MTAG) applies a generalized inverse-variance-weighted meta-analysis approach to multiple correlated traits and aims to detect novel genetic associations for each trait by boosting statistical power by borrowing the correlations among correlated traits^[210]16. Briefly, MTAG takes summary statistics from single-trait GWAS as input and estimates trait-specific effects for one common set of SNPs, and the resulting P value could be interpreted similarly to those from single-trait GWAS. MTAG incorporates bivariate LDSC to account for possibly unknown sample overlap among the GWASs of multiple correlated traits. MTAG relies on the key homogeneity assumption that all SNPs across traits share the same variance-covariance matrix of effect sizes. However, the MTAG estimator remains consistent even when this assumption is violated, when some SNPs influence only a subset of the traits. We calculate the upper bound of the false discovery rate (“maxFDR”) to evaluate the extent of overall inflation due to potential violations of the homogeneity assumption. In the initial MTAG analysis, the “maxFDR” values were 1.656 × 10^−3 for AF, 3.092 × 10^−4 for CAD, 2.798 × 10^−3 for VTE, 0.134 for HF, 0.354 for PAD, and 0.569 for stroke. The “maxFDR” values of HF, PAD, and stroke were greater than 0.05, suggesting potential overall inflation due to violation of the homogeneous assumption. We then repeated the MTAG analysis for HF, PAD, and stroke, obtaining revised ‘maxFDR’ values of 0.0398, 0.0419, and 0.0585, respectively. To investigate whether violations of the assumptions of equal SNP heritability for each trait and perfect genetic covariance between traits could bias our MTAG results, we performed pairwise cross-trait meta-analysis using cross-phenotype association (CPASSOC) as a sensitivity analysis^[211]70. CPASSOC assumes the presence of heterogeneous effects across traits and estimates the cross-trait statistic SHet and P-value through a sample size-weighted, fixed-effect meta-analysis of GWAS summary statistics. We denote the summary statistics from single-trait GWAS as GWAS_AF, GWAS_CAD, GWAS_VTE, GWAS_HF, GWAS_PAD, and GWAS_Stroke, and the summary statistics from MTAG analysis as MTAG_AF, MTAG_CAD, MTAG_VTE, MTAG_HF, MTAG_PAD, and MTAG_Stroke, respectively. MatSpD estimated that the six traits were equivalent to three independent traits based on the MTAG results. Independent SNPs that were genome-wide significant after an additional Bonferroni correction for three independent traits (P < 5 ×  10^−8/3 = 1.67 × 10^−8) in the cross-trait meta-analyses using both MTAG and CPASSOC were considered for further verification. Genomic loci definition and functional annotation using FUMA Functional mapping and annotation of genetic associations (FUMA) was an online platform that annotates SNPs with their biological functionality and maps implicated genes^[212]71. We performed FUMA annotation with default settings, using the 1000 Genomes Project v3 of European samples as a reference panel. Before annotation, FUMA first defined independent significant SNPs with a genome-wide significant P value (5 × 10^−8) and were independent at r^2  <  0.6 within 1 Mb. Based on LD information, a subset of these independent significant SNPs was labeled as lead SNPs (independent from each other at r^2  <  0.1). Genomic loci were identified by merging the LD blocks of lead SNPs that are closely located to each other (less than 250 kb apart). The top lead SNP was defined as the SNP with the lowest P-value in a specific region. Biological functionality for lead SNPs, including potential regulatory functions (RegulomeDB score), deleteriousness score (CADD score), effects on gene functions (using ANNOVAR), and mRNA expression levels (using eQTL data), was also estimated by FUMA. In detail, the nearest genes and the functional consequences of each SNP on gene functions were annotated based on ANNOVAR. Combined annotation dependent depletion (CADD) score indexes the deleteriousness of variants computed based on 67 annotation resources. SNPs with a CADD score higher than 12 were considered to confer deleterious effects. The RegulomeDB provides a categorical score that describes how likely a SNP is to play a regulatory role based on the integration of high-throughput datasets. The RDB score of 1a suggests the strongest evidence, while a score 7 represents the least support for a regulatory potential. eQTL mapping provides significant cis-SNP-gene pairs (up to 1 Mb apart) in CVD-related tissue types from GTEx. SNPs were mapped to genes based on their physical position in the genome and expression quantitative trait locus (eQTL) associations (obtained from ten related tissues in GTEx v8). The SNP locations were defined with reference to the human genome Build 37 (GRCh37/hg19), and only protein-coding genes were included in the analysis. In addition, genome-wide significant SNPs from CPASSOC and original GWAS results were also annotated by FUMA for comparison. Testing overlap of genomic risk loci across CVDs FUMA-annotated genomic risk loci were overlapped based on their chromosomal location to identify overlapping genomic risk loci across multiple CVDs. We further performed Multi-trait colocalization analysis on these overlapping genomic loci using the HyPrColoc (hypothesis prioritization in multi-trait colocalization) method to identify the potential shared causal variants, thus revealing potential biological mechanisms among CVDs^[213]33. HyPrColoc, an extension of the colocalization method, enables colocalization analysis for multiple traits and uses a deterministic Bayesian divisive clustering algorithm to identify clusters of colocalized traits and candidate causal variants within the same genomic locus, providing the posterior probability (PP) of colocalization for each cluster. A genomic risk locus with a PP greater than 0.7, whose traits were identified by HyPrColoc, are consistent with those obtained by overlapping the genomic risk loci, was declared a colocalized locus. Next, we estimated posterior probabilities (known as the m value) for each pleiotropic locus to quantify the best-fit model of cross-disease genotype-phenotype relationships using MetaSoft^[214]34 with the fixed-effects model (FE). An m value >0.9 indicated that a particular variant had an effect on a given disease, while an m value <0.1 predicted that the SNP had no effect on the disease. We further examined whether these pleiotropic loci could be colocalized between multiple CVDs and metabolic traits using multi-trait colocalization analysis by HyPrColoc, including anthropometric traits (body mass index [BMI], waist-to-hip ratio [WHR], and waist-to-hip ratio adjusted for BMI [WHRadj])^[215]37, blood pressure traits (systolic blood pressure [SBP], diastolic blood pressure [DBP], and pulse pressure [PP])^[216]38, lipidemic traits (Low-density lipoprotein cholesterol [LDL-C], High-density lipoprotein cholesterol [HDL-C], triglycerides [TG], and total cholesterol [TC])^[217]39, glycemic traits (fasting insulin [FI], fasting glucose [FG], 2-h glucose [2hGlu], and glycated hemoglobin levels [HbA1c])^[218]40, kidney function (estimated glomerular filtration rate [eGFR])^[219]41, liver function (alanine aminotransferase [ALT], alkaline phosphatase [ALP], and γ-glutamyl transferase [GGT])^[220]42, and other traits like carotid artery intima-media thickness [cIMT] and C-reactive protein [CRP])^[221]37,[222]38. The common risk factors, including hypertension, type 2 diabetes (T2D), chronic kidney disease (CKD), and non-alcoholic fatty liver disease (NAFLD), were also included in the above analysis^[223]33,[224]45,[225]46,[226]48. Gene-level analysis using MAGMA and TWAS Moving beyond SNP-level studies, we performed gene-based association analysis using multi-marker analysis of genomic annotation (MAGMA), which yields gene-level P values by evaluating the joint association of all SNPs within each gene, while accounting for LD between SNPs^[227]49. SNP positions were defined based on the human genome reference Build 37 (GRCh37/hg19), and only protein-coding genes (n = 17,363) were included in the analysis. Gene boundaries were extended by ±10 kb from the gene body, consistent with the default settings of positional mapping in FUMA. We applied Bonferroni correction for multiple testing based on the number of protein-coding genes (P  =  0.05/the number of protein-coding genes = 0.05/17,636 = 2.84 × 10^−6), and additionally also indicated which results remained significant after further correction for the number of independent CVDs (P  =  0.05/the number of protein-coding genes /the number of independent CVDs = 0.05/17,636/3 = 9.45 × 10^−7). In addition, we leveraged a transcriptome-wide association study (TWAS) using the functional summary-based Imputation (FUSION) software to further investigate tissue-specific genes across the ten CVD-related tissues^[228]50. Briefly, we used the FUSION method to prioritize CVD-associated genes by integrating MTAG-based summary statistics with precomputed gene expression weights reference from trait-relevant tissues from the genotype-tissue expression (GTEx) project version 8, while considering LD structures. The cis-genetic components of tissue-specific gene expression were imputed using MTAG-based summary statistics and five predictive models: best linear unbiased prediction (BLUP), Bayesian sparse linear mixed model (BSLMM), least absolute shrinkage and selection operator (LASSO), Elastic Net (ENET), and top single-nucleotide polymorphisms (SNPs). The gene expression weights from the best-performing model were then combined with MTAG results to identify significant associations between gene expression and CVDs. In TWAS, correcting for multiple comparisons using the total number of tested genes is expected to be too stringent and not reflect true biology (i.e., gene expression covariance)^[229]72. To address this, we estimated the effective number of independent genes per tissue using matSpD software. Briefly, matSpD estimates the effective number of independent variables in a correlation matrix by examining the eigenvalues derived from spectral decomposition. The expression values for genes whose differential expression was predicted by FUSION were extracted from normalized gene expression matrices obtained from GTEx. A total of 19,144 protein-coding genes were tested across the ten GTEx tissues relevant to CVDs., therefore, we applied a Bonferroni-corrected significance threshold (P = 0.05/the number of independent genes/the number of independent CVDs = 0.05/19,144/3 = 8.71 ×  10^−7). All gene-mapping analyses were conducted using both GWAS summary statistics and MTAG-based summary statistics across all six CVD traits. Pathway-level analysis using MAGMA, LDSC-SEG, and GARFIELD We conducted gene set enrichment analysis using MAGMA on 9398 gene sets, including canonical pathways and biological process sets from the Molecular Signatures Database (MSigDB, v.2023.1; categories C2: CP REACTOME and C5: GO BP)^[230]73. Multiple testing correction was applied to adjust for the number of gene sets analyzed (P < 0.05/(7,744 + 1,654) = 5.33 × 10^−6), and additionally also indicated which results remained significant after an additional Bonferroni correction for the number of independent CVD traits (P < 0.05/(7744 + 1654)/3 = 1.78 × 10^−6). All statistical tests were two-sided. LDSC applied to specifically expressed genes (LDSC-SEG) was used to identify tissue-specific enrichments in gene expression and chromatin modifications. The analysis incorporated multi-tissue gene expression datasets (from GTEx and the Franke lab) and multi-tissue chromatin datasets (from Roadmap Epigenomics and ENCODE)^[231]51. For tissue-specific gene expression, we used annotations derived from RNA-seq data of human tissues from GTEx, with the Bonferroni-corrected significant threshold set at P  < 0.05/48 /3 = 3.47 × 10^−4. For tissue-specific histone marks, we used annotations derived from the Roadmap Epigenomics Project, focusing on narrow peaks for DNase hypersensitivity and chromatin marks, including H3K27ac, H3K4me1, H3K4me3, H3K9ac, and H3K36me3, with the Bonferroni-corrected significant threshold set at P  <  0.05/489/3 = 3.41 × 10^−5. We also applied GWAS analysis of regulatory and functional information enrichment with LD correction (GARFIELD), a powerful enrichment tool for comparing different sets of annotation marks across specific cell types or tissues, to correlate GWAS summary statistics with regulatory and functional annotations. This approach identifies features relevant to a phenotype of interest across various GWAS P value thresholds, while adjusting for LD, MAF, and distance to the transcription start site (TSS)^[232]52. The annotations included 1005 features derived from ENCODE, GENCODE, and Roadmap Epigenomics projects, including DNase I hypersensitivity hotspots, chromatin accessibility peaks, transcription factor footprints, formaldehyde-assisted isolation of regulatory elements (FAIRE), histone modifications, chromatin segmentation states, genic annotations, and transcription factor binding sites (TFBS), among others, across multiple publicly available cell lines or tissues. Enrichment P values were empirically determined using a permutation-based procedure that accounts for the structure of associated genomic regions, based on the number of SNPs and their mean LD. We also assessed multiple testing corrections to account for the number of functional categories tested (P < 0.05/1005 = 4.98 × 10^−5), and further identified results that remained significant after additional correction for the number of CVDs (P < 0.05/1005/3 = 1.66 × 10^−5). Reporting summary Further information on research design is available in the [233]Nature Portfolio Reporting Summary linked to this article. Supplementary information [234]Supplementary Information^ (4MB, docx) [235]41467_2025_62419_MOESM2_ESM.pdf^ (139.5KB, pdf) Description of Additional Supplementary Files [236]Supplementary Data 1^ (15KB, xlsx) [237]Supplementary Data 2^ (14.6KB, xlsx) [238]Supplementary Data 3^ (19.7KB, xlsx) [239]Supplementary Data 4^ (342.3KB, xlsx) [240]Supplementary Data 5^ (81.3KB, xlsx) [241]Supplementary Data 6^ (3.3MB, xlsx) [242]Supplementary Data 7^ (12.9KB, xlsx) [243]Supplementary Data 8^ (28.7KB, xlsx) [244]Supplementary Data 9^ (183.5KB, xlsx) [245]Supplementary Data 10^ (114.9KB, xlsx) [246]Supplementary Data 11^ (122.3KB, xlsx) [247]Supplementary Data 12^ (404.3KB, xlsx) [248]Supplementary Data 13^ (438KB, xlsx) [249]Supplementary Data 14^ (373.4KB, xlsx) [250]Supplementary Data 15^ (229.1KB, xlsx) [251]Supplementary Data 16^ (302.3KB, xlsx) [252]Supplementary Data 17^ (22.6KB, xlsx) [253]Supplementary Data 18^ (94.1KB, xlsx) [254]Supplementary Data 19^ (27.8KB, xlsx) [255]Supplementary Data 20^ (37.7KB, xlsx) [256]Supplementary Data 21^ (16.8KB, xlsx) [257]Supplementary Data 22^ (65.2KB, xlsx) [258]Supplementary Data 23^ (272.7KB, xlsx) [259]Supplementary Data 24^ (50KB, xlsx) [260]Supplementary Data 25^ (35.7KB, xlsx) [261]Supplementary Data 26^ (27.3KB, xlsx) [262]Supplementary Data 27^ (12.8KB, xlsx) [263]Supplementary Data 28^ (13KB, xlsx) [264]Supplementary Data 29^ (17.6KB, xlsx) [265]Supplementary Data 30^ (432.5KB, xlsx) [266]Supplementary Data 31^ (67KB, xlsx) [267]Supplementary Data 32^ (1.3MB, xlsx) [268]Supplementary Data 33^ (15.6KB, xlsx) [269]Supplementary Data 34^ (27.8KB, xlsx) [270]Supplementary Data 35^ (53.1KB, xlsx) [271]Supplementary Data 36^ (37.9KB, xlsx) [272]Supplementary Data 37^ (13.4KB, xlsx) [273]Supplementary Data 38^ (22.6KB, xlsx) [274]Supplementary Data 39^ (12.9KB, xlsx) [275]Reporting Summary^ (2.7MB, pdf) [276]Transparent Peer Review file^ (1,004.8KB, pdf) Source data [277]Source Data^ (286.7KB, xlsx) Acknowledgements