Abstract Background Most coronary artery disease (CAD) risk loci identified by genome-wide association studies (GWAS) are located in non-coding regions, hampering the interpretation of how they confer CAD risk. It is essential to integrate GWAS with molecular traits data to further explore the genetic basis of CAD. Methods We used the probabilistic Mendelian randomization (PMR) method to identify potential proteins involved in CAD by integrating CAD GWAS data (∼76,014 cases and ∼264,785 controls) and human plasma proteomes (N = 35,559). Then, Bayesian co-localization analysis, confirmatory PMR analysis using independent plasma proteome data (N = 7752), and gene expression data (N1 = 213, N2 = 670) were performed to validate candidate proteins. We further investigated the associations between candidate proteins and CAD-related traits and explored the rationality and biological functions of candidate proteins through disease enrichment, cell type-specific, GO, and KEGG enrichment analysis. Results This study inferred that the abundance of 30 proteins in the plasma was causally associated with CAD (P < 0.05/4408, Bonferroni correction), such as PLG, IL15RA, and CSNK2A1. PLG, PSCK9, COLEC11, ZNF180, ERP29, TCP1, FN1, CDH5, IL15RA, MGAT4B, TNFRSF6B, DNM2, and TGF1R were replicated in the confirmatory PMR (P < 0.05). PCSK9 (PP.H4 = 0.99), APOB (PP.H4 = 0.89), FN1 (PP.H4 = 0.87), and APOC1 (PP.H4 = 0.78) coding proteins shared one common variant with CAD. MTAP, TCP1, APOC2, ERP29, MORF4L1, C19orf80, PCSK9, APOC1, EPOR, DNM2, TNFRSF6B, CDKN2B, and LDLR were supported by PMR at the transcriptome level in whole blood and/or coronary arteries (P < 0.05). Enrichment analysis identified multiple pathways involved in cholesterol metabolism, regulation of lipoprotein levels and telomerase, such as cholesterol metabolism (hsa04979, P = 2.25E-7), plasma lipoprotein particle clearance (GO:0034381, P = 5.47E-5), and regulation of telomerase activity (GO:0051972, P = 2.34E-3). Conclusions Our integration analysis has identified 30 candidate proteins for CAD, which may provide important leads to design future functional studies and potential drug targets for CAD. Keywords: Coronary artery disease, Plasma proteomes, PMR, Functional genomics 1. Introduction Coronary artery disease (CAD) stands as a major cardiovascular disease, ranking the top causes of mortality in both developed and developing countries [[29]1]. An individual's risk of developing CAD is influenced by a combination of genetic factors, environment, and lifestyle [[30]2]. Genetic factors are estimated to account for 30 %–60 % of the variation in coronary artery disease risk among individuals [[31][3], [32][4], [33][5], [34][6], [35][7], [36][8]]. Over the past decade, several genome-wide association studies (GWAS) have identified more than 200 risk loci associated with CAD [[37][3], [38][4], [39][5], [40][6], [41][7], [42][8]]. However, the majority of these variants are located outside protein-coding regions. Furthermore, most associations are typically attributed to specific genes primarily due to their proximity to the variant site, which makes it challenging to decipher the biological processes underlying these genetic effects [[43]3]. The significant enrichment of variants in regulatory regions underscores the primary role of CAD risk variants in modulating gene expression [[44]9].Therefore, it is essential to integrate GWAS with molecular traits data to further explore the genetic basis of CAD. Li et al. have integrated transcriptome data and GWAS to conduct transcription-wide association studies (TWAS) to explore susceptibility genes for CAD [[45]10]. Nevertheless, proteins are the main functional components of biological processes, as well as the final products of transcripts. They often undergo intricate post-translational alterations and refinement, and contain additional information [[46]11,[47]12]. Numerous genetic studies have identified protein quantitative trait loci (pQTLs) in plasma, and demonstrated that pQTLs strengthen the associations found in GWAS across various complex traits [[48][13], [49][14], [50][15], [51][16]]. Several studies have used pQTLs as instrumental variables to perform Mendelian randomization to identify causative proteins of different phenotypes [[52][17], [53][18], [54][19]]. Currently, few studies have integrated large-scale proteome data with GWAS to identify susceptibility proteins for CAD. The majority of exposure/molecular traits are influenced by multiple SNPs that may have potential linkage disequilibrium (LD) with each other. Previous studies mainly used Mendelian randomization (MR) or FUSION, PrediXcan, and other TWAS methods for transcription/protein-wide association analysis [[55]10,[56][17], [57][18], [58][19]]. Traditional MR methods are limited to using a single SNP or multiple independent SNPs as instruments. By incorporating multiple correlated SNPs, TWAS methods such as FUSION and PrediXcan improve statistical performance. Unfortunately, many TWAS methods rely on two-stage MR Inference procedures that fail to take into account the uncertainty of parameter estimates in exposure studies, which may lead to bias in estimates of causal effects and power loss [[59][20], [60][21], [61][22]]. Therefore, our study adopts a new multi-omics integration analysis method, probabilistic Mendelian randomization (PMR) [[62]23], which incorporates multiple correlated SNPs in the likelihood inference framework to infer the relationship between molecular phenotypes and traits, which greatly improves the statistical power. In this study, we employed an integrative analytical pipeline to identify proteins that were associated with CAD ([63]Fig. 1). First, we performed a discovery PMR to integrate GWAS of CAD and a large-scale plasma pQTL data to identify susceptibility proteins involved in CAD, protein-coding genes with P value less than 0.05/N after Bonferroni correction were regarded as candidate proteins; Second, we performed confirmatory PMR using independent pQTL data from ARIC and Bayesian co-localization analysis; Third, we explored the association of transcriptome-levels of candidate protein-coding genes with CAD using expression quantitative trait loci (eQTL) data from whole blood and coronary arteries; Fourth, we used PMR to further investigate the associations between candidate proteins and CAD-related traits, such as hyperlipidemia, hypertension, and atherosclerosis; Fifth, Disease enrichment, Cell type specific enrichment, GO and KEGG enrichment analysis were conducted to explore the plausibility, biological function and pathogenicity of the candidate proteins. Fig. 1. [64]Fig. 1 [65]Open in a new tab The overall analysis pipeline for this study. Step 1, we performed discovery PMR to integrate GWAS of CAD and a large-scale plasma pQTL data to identify candidate genes involved in CAD (Bonferroni correction P < 0.05/N); then, we performed confirmatory PMR using an independent pQTL data from ARIC and colocalization analysis; Step 2, we validated the candidate genes at the transcriptome-level; Step 3, we further investigated the associations between candidate genes and CAD related traits; Step 4, Disease enrichment, Cell-type specific enrichment, GO and KEGG enrichment analysis. CAD, coronary artery disease; PMR, probabilistic two-sample Mendelian randomization; pQTL, protein quantitative trait locus; TWAS, transcriptome-wide association studies; eQTL, expression quantitative trait locus; GTEx, Genotype-tissue expression; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes. 2. Materials and method 2.1. CAD GWAS data We used the summary statistics from a meta-analysis of UK Biobank SOFT CAD GWAS with CARDIoGRAMplusC4D 1000 Genomes-based GWAS and the Myocardial Infarction Genetics and CARDIoGRAM Exome [[66]24], which included ∼76,014 cases and ∼264785 controls. CAD was defined based on self-reported information (via questionnaire), hospital records, or death registry data, encompassing both fatal and nonfatal myocardial infarction, percutaneous transluminal coronary angioplasty (PTCA), coronary artery bypass grafting (CABG), chronic ischemic heart disease (IHD), and angina. Individuals who did not fulfill the criteria for CAD were designated as controls, after excluding individuals with aneurysms and atherosclerotic cardiovascular disease. Two distinct fixed-effect meta-analyses were performed to integrate the association summary statistics from the UKB with the 1000 Genomes-imputed GWAS outcomes and Exome results [[67]25]. Details information regarding this data resource is available in the original papers [[68]2,[69]24]. 2.2. Human plasma pQTLs in the discovery PMR The human plasma samples were collected from 40,004 Icelanders [[70]26], and were measured with the SomaScan version 4 assay (SomaLogic), which contains 5284 aptamers to recognize specific proteins. After converting protein concentrations into DNA aptamer concentrations, the proteins are quantified using a DNA microarray. After quality control, a total of 4907 aptamers that target a total of 4719 unique proteins remained for the pQTL analysis. A total of 166,281 Icelanders were genotyped using Illumina SNP arrays, resulting in >27 million imputed variants with MAF>0.01 and imputation info score >0.9 after imputation. Of these, 35,559 Icelanders had both genotype and protein measurements. The pQTLs were generated by genome-wide association testing using the linear mixed model implemented in BOLT-LMM [[71]27]. The phenotypes were the standardized residuals after adjusting for rank-inverse normal transformed for age, sex, and sample age. All P values were computed by likelihood-ratio test. More detailed information was provided in the original papers [[72]14]. 2.3. Human plasma pQTLs in the confirmatory PMR The human plasma proteomes used in the confirmatory PMR analysis were obtained from 7213 European American individuals in the Atherosclerosis Risk in Communities (ARIC) study [[73]28]. The relative concentrations of plasma proteins or protein complexes from the blood samples were measured utilizing the V4 platform by SomaLogic Inc. (Boulder, Colorado, USA), employing an aptamer (SOMAmer)-based methodology [[74]29,[75]30]. After quality control, the pQTL analysis included 4657 SOMAmers that measured 4483 unique proteins or protein complexes encoded by 4435 autosomal genes. The Affymetrix 6.0 DNA microarray was used to genotype the ARIC samples, and the TOPMed reference panel was used to perform imputation [[76]31,[77]32]. The SNPs with imputation quality R^2 ≥ 0.8, call rates ≥90 %, Hardy-Weinberg equilibrium p-values ≥10^−6, or MAF ≥0.01 were included in the pQTL analysis. The association tests were performed by an adaptive permutation approach implemented in QTLtools [[78]33]. More detailed information was provided in the original papers [[79]28]. 2.4. Human eQTL data in the PMR In this study, we used the eQTL summary data from the coronary artery (N = 213) and whole blood (N = 31684) for PMR analysis to explore the association of candidate protein-coding genes expression levels with CAD risk in these tissues. The eQTL summary data of coronary artery was obtained from Genotype-Tissue Expression version 8 (GTEx v8) [[80]34], which examined 15,201 RNA-sequencing samples from 49 tissues of 838 postmortem donors and genotyped 43,066,422 SNPs using whole-genome sequencing (WGS). The mRNA of each of the tissue samples was quantified using the Quant-iTTM RiboGreen®RNA Assay Kit and sequenced to a median depth of 82.6 million reads. STAR v2.5.3a was used to align the RNA-seq data to the human reference genome GRCh38/hg38 [[81]35]. We used two whole blood eQTL summary data, one from GTEx v8 (N = 670) and the other from a large-scale meta-analysis of 31,684 blood samples from 37 eQTLGen Consortium cohorts [[82]36]. 2.5. GWAS summary statistics of CAD-related traits In this study, we investigated the relationship between candidate proteins with CAD-related traits including hypertension, hyperlipidemia, atherosclerosis, stroke, venous thrombosis, Non-alcoholic fatty liver disease (NAFLD), Body Mass Index (BMI), and waist-to-hip ratio (WHR) adjusted for BMI (WHRadjBMI). Among them, the GWAS summary data of hypertension, hyperlipidemia, atherosclerosis, and NAFLD were from the UK Biobank (UKB) data [[83]2,[84]37,[85]38], the stroke GWAS summary data was from the International Stroke Genetics Consortium [[86]39]; the GWAS summary data of venous thrombosis was obtained from a genome-wide meta-analysis including six datasets (CHB-CVDC/DBDS, UKB, FinnGen freeze 6, deCODE, Intermountain Healthcare and MVP) [[87]40]. The GWAS summary data of BMI and WhRadjBMI were both from a meta-analysis of UKB results with publicly available GWAS summary statistics from GIANT [[88]41]. 2.6. Probabilistic mendelian randomization We used a probabilistic Mendelian randomization (PMR) method for proteome-wide association studies (PWAS) applications [[89]23]. PMR-Egger unifies many existing TWAS and MR methods in a MR likelihood framework to test the causal effect of proteins on traits in the presence of horizontal pleiotropy. The approach utilizes multiple related SNPs as instruments, facilitating inference within the likelihood framework and thereby significantly enhancing statistical power. In this study, we extracted cis-acting SNPs that are within 500 kb upstream of the transcription start site and 500 kb downstream of the transcription end site, and used the 1000G LD reference panel in Europeans. To ensure the existence of a solution, we limit the low IV correlation (r2 < 0.1 in 1000G) due to the possibility of decomposition failure caused by the relatively small sample size of the reference panel. In the discovery PMR analysis, P-values were corrected by the Bonferroni method, and p-value <0.05 was defined as statistically significant in the confirmatory PMR. The PMR was conducted by using the R package “PMR” ([90]https://www.xzlab.org./software.html) with the default parameters recommended by the developers. 2.7. Bayesian co-localization analysis In this study, we used Bayesian co-localization to assess the probability of the same SNP being responsible for both changing the CAD risk and modulating the protein levels [[91]42]. To begin with, it is necessary to set the prior probability of a certain SNP being associated with CAD (P1), the probability of it being a significant pQTL (P2), and the probability of a QTL and GWAS association overlap (P12). We used the default COLOC priors of P1 = 10^−4, P2 = 10^−4, and P12 = 10^−5. There are a total of 5 hypotheses ([92]Supplementary text 1) in Bayesian co-localization analysis, under the assumption that there is at most one causal variant between two traits. Among these five hypotheses, we are mainly interested in the fifth hypothesis, that a shared SNP is both associated with GWAS and pQTL. In addition, we also performed a Sum of Single Effects (SuSiE) regression that allows evidence for association at multiple causal variants to be evaluated simultaneously [[93]43]. As recommended by the developer of this method, posterior probabilities for hypotheses 4 (PP.H4) > 0.80 were regarded as evidence of co-localization. The Bayesian colocalization analysis was conducted by using the R package “coloc” ([94]https://github.com/clagiamba/moloc). 2.8. Cell-type expression specificity We examined the cell-type-specific expression of candidate protein-coding genes utilizing single-cell RNA sequencing (RNA-seq) data derived from atherosclerotic human coronary arteries ([95]GSE131778) [[96]44] and whole blood samples ([97]GSE149938)) [[98]44]. The [99]GSE131778 dataset comprises atherosclerotic coronary artery cells harvested from the excised hearts of four patients undergoing heart transplantation. The proximal-to-mid segment of the right coronary artery was used for RNA-seq. The cells were then loaded onto a microfluidics chip from 10X Genomics and encapsulated within gel beads containing barcoded oligo-dT. Subsequently, the cells were isolated using FACS and captured on the 10X Chromium controller. The [100]GSE149938 dataset offers transcriptome profiles for 7551 human blood cells, encompassing 32 distinct immuno-phenotypic cell types. These cell types encompass hematopoietic stem cells, progenitors, and mature blood cells sourced from 21 healthy donors. The Cell-type Expression Specificity (ES) profiles were generated using the CELLEX tool, which adopts a "wisdom of the crowd" approach. By integrating multiple ES metrics, it produces a comprehensive set of complementary cell-type ES profiles [[101]45]. 2.9. Gene set enrichment analysis We conducted gene ontology (GO) and pathway enrichment analysis on candidate protein-coding genes identified through PMR analysis. The clusterProfiler package was used to identify the GO terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enriched by the candidateprotein-coding genes [[102]46]. We also performed disease enrichment analysis based on a database of gene-disease associations (DisGeNET), one of the largest publicly available collections of genes and variants associated with human diseases, integrating data from expert-curated repositories, GWAS catalogs, animal models, and the scientific literature [[103]47]. q-value <0.05 was defined as a criterion for significant enrichment. 2.10. Ethics All summarized statistics used in this study were generated by previous studies and were public availavle, for which ethical approval and individual consent were obtained for all original studies. 2.11. Role of the funding source The funding sources had no role in the study design, data collection, analysis, interpretation, and writing of the manuscript. 3. Results A total of 499 proteins were excluded from PMR analysis due to various reasons such as unsuitable instrument variables, low heritability expression (less than 0.01 %), or inability to decompose the LD matrix, and the remaining 4408 proteins underwent PMR analysis ([104]Fig. 2). PMR identified 30 candidate proteins that were associated with CAD (P < 0.05/4408, [105]Table 1). Among them, 14 proteins were positively related to CAD, such as PCSK9, APOB, and LDLR, and 16 proteins were negatively related to CAD, such as PLG, APOC1, and CDKN2B. We obtained the lowest P values for the SNPs within 1 Mb of each of these identified 30 proteins using the summary statistics from the CAD GWAS. Among these, 19 proteins with the most significant P values were less than 5 × 10^−8 and the other 11 proteins ranged from 4.16 × 10^−4 to 5.77 × 10^−8, including MGAT4B, CSNK2A1, OPCML, TNFRSF6B, COLEC11, SIGLEC7, CDH5, IGF1R, IL15RA, SORCS2, and IL5RA ([106]Supplementary Table S1). We further performed a confirmatory PMR analysis with another independent human plasma pQTL summary data from the ARIC study. C19orf80, CDKN2B, MATP, and LDLR could not be tested due to low heritability (<0.01 %) or the LD matrix failed to decompose, and EPOR was not profiled in the pQTL data. Finally, 25 candidate proteins were tested and 19 proteins of them were associated with CAD (P < 0.05) in the confirmatory PMR ([107]Table 1), while the direction of effect of 6 proteins was not consistent with that in the discovery PMR analysis. Thus, a total of 13 proteins were replicated in the confirmatory PMR, including PLG, PSCK9, COLEC11, ZNF180, ERP29, TCP1, FN1, CDH5, IL15RA, MGAT4B, TNFRSF6B, DNM2, and IGF1R. Fig. 2. [108]Fig. 2 [109]Open in a new tab Manhattan plot for the discovery PMR analysis integrating the CAD GWAS (N = 336,924) with the large-scale plasma proteomes (N = 35,559). Each point represents a single association test between a gene and CAD ordered by genomic position on the x axis and the association strength on the y axis as the -log10(P). The discovery PWAS identified 30 genes whose cis-regulated plasma protein abundance was associated with CVD after Bonferroni correction (P < 1.13E-5, 0.05/4408). The blue horizontal line reflects the significant threshold of the P < 1.13E-5. Table 1. The discovery PMR identified 30 significant proteins, of which, 25 were found in the confirmatory PMR and 13proteins replicated. Protein-coding Genes Discovery PMR __________________________________________________________________ Confirmtary PMR __________________________________________________________________ Evidence for Replication __________________________________________________________________ Beta Se P P_pleiotropy Beta Se P P_pleiotropy Effect direction consistency Significance PLG −2.296 0.130 9.19E-70 0.107 −1.097 0.072 3.66E-52 0.278 Yes Yes PCSK9 0.044 0.007 3.63E-09 0.481 0.048 0.007 7.35E-12 0.251 Yes Yes COLEC11 0.011 0.002 5.67E-07 0.502 0.019 0.004 1.46E-07 0.497 Yes Yes ZNF180 −0.907 0.126 6.30E-13 0.715 −0.803 0.153 1.61E-07 0.791 Yes Yes ERP29 0.500 0.105 2.01E-06 0.960 0.296 0.058 2.74E-07 0.565 Yes Yes TCP1 −1.071 0.141 3.36E-14 0.198 −0.207 0.042 7.71E-07 0.175 Yes Yes FN1 −0.061 0.012 8.25E-07 0.055 −0.033 0.009 1.13E-04 0.082 Yes Yes CDH5 −0.080 0.016 1.03E-06 0.137 −0.032 0.010 1.50E-03 0.543 Yes Yes IL15RA −0.013 0.003 4.02E-06 0.350 −0.022 0.008 3.44E-03 0.490 Yes Yes MGAT4B 0.018 0.004 6.74E-06 0.779 0.013 0.004 3.49E-03 0.443 Yes Yes TNFRSF6B 0.116 0.023 3.98E-07 0.453 0.036 0.014 9.78E-03 0.486 Yes Yes DNM2 0.647 0.110 3.59E-09 0.033 0.115 0.046 1.15E-02 0.816 Yes Yes IGF1R −0.045 0.008 9.73E-09 0.645 −0.014 0.006 2.63E-02 0.160 Yes Yes IL5RA −0.011 0.002 4.24E-07 0.179 −0.005 0.003 1.02E-01 0.146 Yes No OPCML 0.032 0.003 2.32E-21 0.068 0.004 0.013 7.52E-01 0.962 Yes No APOB 0.145 0.032 7.50E-06 0.701 0.004 0.016 8.08E-01 0.256 Yes No PSMA4 −0.640 0.096 3.16E-11 0.408 1.091 0.243 7.04E-06 0.598 No Yes MORF4L1 −0.606 0.104 5.17E-09 0.138 0.167 0.041 4.45E-05 0.376 No Yes MAPKAPK5 0.664 0.116 1.02E-08 0.655 −0.405 0.113 3.22E-04 0.072 No Yes PARK2 0.071 0.012 2.10E-09 0.617 −0.055 0.017 9.57E-04 0.292 No Yes APOC2 −0.228 0.042 4.03E-08 0.023 0.085 0.030 4.59E-03 0.242 No Yes SORCS2 0.030 0.005 2.53E-10 0.559 −0.031 0.014 3.22E-02 0.483 No Yes CSNK2A1 −0.129 0.023 1.42E-08 0.125 0.035 0.021 8.82E-02 0.225 No No APOC1 −0.038 0.006 3.82E-09 0.052 0.013 0.008 9.06E-02 0.659 No No SIGLEC7 −0.025 0.005 5.27E-08 0.530 0.004 0.004 2.92E-01 0.231 No No LDLR 0.686 0.127 6.52E-08 0.281 EPOR 0.607 0.102 2.73E-09 0.084 C19orf80 1.056 0.147 7.80E-13 0.166 CDKN2B −1.614 0.125 3.68E-38 0.651 MTAP −2.921 0.216 1.35E-41 0.177 [110]Open in a new tab Bayesian colocalization analysis found that PCSK9 (PP.H4 = 0.99), APOB (PP.H4 = 0.89), and FN1 (PP.H4 = 0.87) each share a variation with CAD. The top shared variants of the above three proteins were rs11591147, rs548145, and rs1250229, respectively ([111]Fig. 3). To relax the single causal variant assumption, we also used SuSiE for colocalization analysis allowing for multiple causal variants. Coloc.SuSiE analysis identified APOC1 with evidence of genetic colocalization based on a PP.H4 [SuSiE] ^= 0.78. Fig. 3. [112]Fig. 3 [113]Open in a new tab Locuszoom plots for the colocalization analysis of APOB (A, GWAS; D, pQTL), FN1 (B, GWAS; E, pQTL), and PCSK9 (C, GWAS; F, pQTL). According to the central dogma of molecular biology that proteins are translated from mRNA, this study further explored the association between the transcription level of protein-coding genes and CAD. A total of 27 identified protein-coding genes were tested in PMR analysis using eQTL summary data from whole blood and coronary artery ([114]Table 2). Among them, genetically regulated mRNA expression levels of 12 and 11 protein-coding genes in whole blood and coronary arteries, respectively, were associated with the risk of coronary artery disease (P < 0.05) ([115]Table 2). The transcription levels of 6 protein-coding genes including MTAP, ERP29, MORF4L1, DNM2, CDKN2B, and LDLR in blood and coronary arteries were both related to CAD (P < 0.05). Additionally, the transcription levels of IL5RA, SIGLEC7, CSNK2A1, PSMA4, ZNF180, and MAPKAPK5 in blood, as well as TCP1, APOC2, C19orf80, APOC1, and TNFRSF6B in coronary arteries, were also found to be related to CAD (P < 0.05). In addition, the effect direction of the transcription levels of protein-coding genes such as PCSK9, PSMA4, EPOR, and LDLR on CAD are inconsistent between whole blood and coronary arteries, which may indicate tissue specificity. Table 2. Results of PMR analysis at the transcriptome level. Gene Symbol __________________________________________________________________ eQTLGen Blood eQTL __________________________________________________________________ GTEx Blood eQTL __________________________________________________________________ GTEx Artery Cononary eQTL __________________________________________________________________ Evidence tier[116]^a Beta Se P P_pleiotropy Beta Se P P_pleiotropy Beta Se P P_pleiotropy MTAP 0.042 0.007 5.97E-10 0.913 0.321 0.044 5.13E-13 0.849 0.444 0.024 1.51E-73 0.157 3 ERP29 0.124 0.02 6.71E-10 0.273 0.099 0.026 1.85E-04 0.628 0.225 0.070 1.22E-03 0.088 3 MORF4L1 −0.130 0.021 7.90E-10 0.355 −0.024 0.006 2.05E-04 0.014 −0.037 0.007 1.48E-07 0.742 3 TCP1 −0.001 0.003 7.58E-01 0.605 0.126 0.020 5.42E-10 0.369 0.200 0.039 3.42E-07 0.489 2 CDKN2B −0.057 0.005 9.50E-26 0.002 −0.007 0.004 5.47E-02 0.002 −0.264 0.022 2.39E-33 0.104 2 LDLR 0.180 0.045 7.51E-05 0.424 −0.009 0.005 8.63E-02 0.000 0.096 0.030 1.53E-03 0.939 2 DNM2 0.092 0.016 1.33E-08 0.847 −0.021 0.011 4.59E-02 0.577 −0.121 0.032 1.29E-04 0.794 2 C19orf80 – – – – 0.581 0.159 2.67E-04 0.000 0.301 0.047 1.94E-10 0.305 2 APOC1 – – – – 0.019 0.005 3.40E-04 0.064 0.008 0.004 4.58E-02 0.764 2 APOC2 – – – – 0.033 0.007 1.95E-06 0.749 0.010 0.004 1.83E-02 0.017 2 EPOR −0.005 0.006 4.43E-01 0.826 0.011 0.005 1.82E-02 0.744 −0.009 0.011 4.23E-01 0.855 1 MAPKAPK5 −0.037 0.012 2.47E-03 0.007 0.033 0.018 5.78E-02 0.402 0.011 0.019 5.47E-01 0.003 1 PSMA4 −0.031 0.007 2.64E-06 0.553 −0.004 0.004 3.33E-01 0.070 0.011 0.009 2.10E-01 0.594 1 ZNF180 0.257 0.03 1.92E-17 0.651 −0.003 0.006 6.02E-01 0.840 −0.002 0.005 6.79E-01 0.923 1 IL5RA −0.036 0.009 1.14E-04 0.078 – – – – – – – – 1 SIGLEC7 −0.02 0.005 1.39E-05 0.354 – – – – – – – – 1 CSNK2A1 −0.044 0.009 2.16E-06 0.172 – – – – – – – – 1 PCSK9 – – – – 0.009 0.002 2.77E-04 0.918 −0.004 0.003 2.16E-01 0.921 1 TNFRSF6B – – – – −0.005 0.003 5.45E-02 0.278 −0.008 0.004 2.49E-02 0.622 1 FN1 0.001 0.006 9.21E-01 0.008 0.003 0.006 6.12E-01 0.526 −0.001 0.005 8.18E-01 0.406 0 IGF1R −0.005 0.005 3.54E-01 0.356 – – – – – – – – 0 IL15RA 0.004 0.003 2.01E-01 0.598 – – – – – – – – 0 MGAT4B −0.007 0.006 2.76E-01 0.921 – – – – – – – – 0 COLEC11 0.008 0.007 2.42E-01 0.506 – – – – – – – – 0 SORCS2 −0.006 0.005 2.16E-01 0.287 – – – – – – – – 0 PARK2 −0.059 0.031 5.81E-02 0.862 – – – – – – – – 0 CDH5 – – – – 0.000 0.005 9.61E-01 0.563 −0.003 0.007 0.723 0.833 0 [117]Open in a new tab ^a Tier 3: the transcription level of the protein-coding gene is associated with CAD in all three-transcriptome data and the effect direction is consistent; Tier 2: the transcription level of the protein-coding gene is associated with CAD in coronary arteries and one of the whole blood transcriptomes and the effect direction is consistent; Tier 1: the transcription level of the protein-coding gene is associated with CAD in only one tissue transcriptome; Tier 0: Non-association evidence. PMR, Probabilistic two sample mendelian randomization; eQTL, expression quantitative trait loci. We investigated the association of candidate proteins with CAD-related traits such as hypertension, hyperlipidemia, atherosclerosis, stroke, venous thrombosis, NAFLD, BMI, and WhRadjBMI. Among the 30 candidate proteins, two proteinss (PARK2 and MAPKAPK5), 8 proteins (TNFRSF6B, APOC2, MORF4L1, MAPKAPk5, OPCML, ERP29, LDLR, and MGAT4B), two proteinss (OPCML and SIGLEC7), six proteinss (CSNK2A1, MAPKAPK5, ERP29, IGF1R, DNM2, and TCP1), five proteins (LDLR, DNM2, SORCS2, APOC1, and MAPKAPK5), nine proteins (DNM2, ZNF180, FN1, LDLR, C19orf80, APOC2, PSMA4, TNFRSF6B, and PLG), and one proteins (OPCML) were associated with stroke, BMI, atherosclerosis, hypertension, venous thrombosis, WhRadjBMI, and dyslipidemia, respectively (P < 0.05/4408) ([118]Supplementary Table S2). There were no overlapping candidate proteins between NAFLD and CAD. In addition, we performed disease enrichment analysis and found that candidate protein-coding genes were primarily enriched for CAD, coronary arteriosclerosis, and dyslipidemias ([119]Supplementary Table S3). We performed cell-type expression specificity analysis in two tissues. In atherosclerotic human coronary arteries, APOC1, APOC2, and SIGLEC7 were only found in macrophages; COLEC11, IL5RA, and OPCML were more abundant in artery cell; the ES of IL15RA, IGF1R, CDH5, and LDLR were relatively high in endothelial cell. The ES of APOB and FN1 in regulatory T cells and vascular smooth muscle cells were relatively high. The highest ES of MGAT4B, TCP1, MTAP, CDKN2B, ERP29, PSMA4, MORF4L1, IGF1R, DNM2, and EPOR in 7 coronary artery cell types were all lower than 0.50 ([120]Fig. 4A, [121]Supplementary Table S4). In whole blood, SIGLEC7 was more abundant in kineNK, nonM, and toxiNK; the expression abundance of APOC1 was relatively high in CMP, MEP, and erythrocytes; the four highest ES for LDLR were in HSC, MPP, claM, and preM. The highest ES of 7 protein-coding genes (TCP1, MAPKAPK5, ERP29, PSMA4, MORF4L1, CSNK2A1, DNM2) in 32 immunophenotypic cell types were all lower than 0.50 ([122]Fig. 4B–[123]Supplementary Table S5). In addition, GO and KEGG enrichment analysis identified 150 GO terms and 1 KEGG pathway ([124]Supplementary Table S6), and the candidate protein-coding genes were mainly enriched in GO terms and pathways involving cholesterol metabolism, regulation of lipoprotein levels, and endocytosis, such as Cholesterol metabolism (hsa04979, P = 2.25E-7), plasma lipoprotein particle clearance (GO:0034381, P = 5.47E-5), and receptor-mediated endocytosis (GO:0006898, P = 3.25E-7). Notably, we also found that candidate protein-coding genes were enriched in the alcohol metabolic process (GO:0006066, P = 8.01E-5) and secondary alcohol metabolic process (GO:1902652, P = 2.01E-3). Fig. 4. [125]Fig. 4 [126]Open in a new tab Single-cell-type expression of the potentially CAD-risk genes in atherosclerotic human coronary arteries (A) and whole blood (B). Bar graph of single-cell-type enrichment for risk genes in CAD from the discovery PMR. The diagram depicts Cell-type expression-specificity (y axis) for each gene (x axis), with evidence of substantial enrichment within a specific cell type (histogram of the bar). We used the “wisdom of the crowd” technique to assess enrichment based on gene expression in one cell type against all other cell types. CAD, coronary artery disease; PMR, probabilistic two-sample Mendelian randomization; VSMC, Vascular smooth muscle cell; Treg, Regulatory T cell. MEP, Megakaryocytic-Erythroid Progenitors; NKP, NK cell precursors; BNK, Blood Natural Killer Cells; CLP, common lymphoid progenitor; CMP, common myeloid progenitor; GMP, granulocyte/macrophage progenitor; HSC, hematopoietic stem cell; LMPP, Lymphoid-primed multipotent progenitor; MEP, Megakaryocyte/erythrocyte progenitor; MLP, Multilymphoid Progenitor; MPP, Multipotent Progenitors and Plasma Cells; NKP, NK cell precursors; cMOP, common monocyte progenitor; claM, classical monocle; Ery, Erythrocytes; hMDP, human monocyte-dendritic progenitor; immB, Immature B; interM, Intermediate monocyle; kineNK, cytokine NK; matureN, mature neutrophil; memB, memory B; metaN, Meta-myelocyte; myeN, myelocyte neutrophil; naiB, Naive B; nonM, non-classical monocyle; preB, preB; preM, Pre-monocyle; proB, proB; proN, Pro-myelocyte; regB, Regulatory B; toxiNK, cytotoxic NK. 4. Discussion In this study, we employed a range of analytical techniques to investigate the functional associations between protein biomarkers in blood and CAD risk. Our PMR analysis suggested that the abundance of 30 potential candidate proteins in the blood was causally associated with CAD, 13 of which were validated in a confirmatory PMR analysis which was based on independent pQTL data from ARIC. 19 candidate proteins were supported by PMR at the transcriptional level in whole blood or coronary arteries; The protein levels of PCSK9, APOB, APOC2, and FN1 each share a causal variant with CAD. Furthermore, we also investigated the cell-type expression specificity of candidate protein-coding genes in coronary arteries and whole blood and performed enrichment analysis to explore potential pathways involved in CAD, providing reference information for further mechanism and treatment research. The associations identified by PMR provide a substantial step toward gene prioritization of GWAS loci. Our study identified multiple proteins that have been extensively studied in previous CAD studies, such as FN1 and PCSK9. FN1 encodes fibronectin, mainly including two isoforms: plasma fibronectin 1 (pFN, present in plasma as a soluble dimer) and cellular fibronectin 1 (cFN, as a dimer or multimer present on the cell surface and in the extracellular matrix) [[127]48]. cFN is deposited as insoluble fibers and is involved in cell adhesion. pFN constitutes a significant fraction of arterial FN and may be involved in normal tissue remodeling, and angiogenesis [[128][49], [129][50], [130][51]]. Many studies have found that multiple SNPs of FN1 were associated with CAD risk, such as rs1250229 and rs1250259 [[131][52], [132][53], [133][54], [134][55]]. Our study also found that the rs1250229 was a shared variant between CAD and FN1 encoded proteins through coloc analysis. PSCK9 encodes a member of the subtilisin-like proprotein convertase family, which is mainly expressed in the liver, kidney, and intestinal tissues, and can promote the degradation of LDL receptors in the liver, thereby reducing the uptake of LDL in plasma by the liver, leading to hyperlipidemia [[135]56,[136]57]. PCSK9 inhibitors (PCSKi) have been used in the prevention of CAD [[137]58,[138]59]. The shared variant rs11571147 between CAD and PCSK9 has also been reported by other studies on its association with CAD [[139]60]. The above findings also indicate the effectiveness and robustness of the analytical methods used in this study to a certain extent. Our study also identified several novel candidate proteins, such as IL15RA, IL5RA, and CSNK2A1. Our study indicated that both IL15RA and IL5RA protein abundance were associated with decreased CAD risk. Both IL15RA and IL5RA encode interleukin receptors. Some studies have shown that IL-15 is upregulated in human and animal atherosclerotic lesions, which may lead to the recruitment and activation of T cells in the process of atherosclerosis [[140]61,[141]62]. On the other hand, IL-15 protects cardiomyocytes from oxidative stress through the PI3K/ERK1/2 pathway [[142]63], supplementing IL-15 can improve the survival rate of cardiomyocytes under oxidative stress and can improve cardiac function recovery after myocardial infarction [[143]64,[144]65]. The role of IL-15 in atherosclerosis and cardiac outcomes needs to be further explored [[145]66]. Several studies have shown that IL5 is athero-protective, supported by that it is primarily produced by anti-inflammatory T helper 2 (Th2) cells [[146][67], [147][68], [148][69], [149][70]]. CSNK2A1 encodes the catalytic subunit of a highly conserved serine-threonine kinase. Song et al. have reported that CSNK2 is involved in the regulation of the phosphorylation of the autophagy gene ATG16L1, which is related to various diseases including cardiovascular diseases [[150]71,[151]72]. Furthermore, CSNK2A1 plays a key role in the regulation of DNA damage repair and metabolic processes and promotes glycerolipid metabolism [[152]73]. In the confirmatory PMR analysis of 25 candidate proteins, only 13 proteins showed confirmatory evidence with a P < 0.05 and consistent effect direction. This could be due to several reasons such as differences in age, gender, and disease history between the two proteome data source populations, potential methodological differences for the two pQTL data in terms of tissue collection, sample processing, and analysis platform, and limited and non-overlapping genotypes of the two pQTL data, resulting in differences in the instrumental variables of the two PMR analyzes. Also, 6 proteins with P < 0.05 had inconsistent effect direction with the discovery PMR analysis, which may imply a complex relationship between these proteins and CAD rather than a simple linear relationship. We further performed PMR analysis at the transcriptome level to explore the GWAS-related transcriptional regulation mechanism. Among the 27 candidate proteins, 19 proteins reached a consensus at the transcript level, and the other 8 proteins had no statistical association with CAD at the transcript levels of whole blood and coronary arteries. This may be due to the inconsistency of mRNA expression and protein levels. Previous studies have shown that the expression of many genes was poorly correlated with protein levels, proteins with distant pQTLs (pQTLs not within ±10 Mb of the DNA sequence encoding the protein) were poorly correlated with mRNA expression, which may be partly attributable to post-transcriptional mechanisms, such as protein translation and degradation [[153][74], [154][75], [155][76]]. A weak correlation between expression and protein means that the potential impact of eQTLs on downstream phenotypes is usually weakened or buffered. Enrichment analysis showed that candidate protein-coding genes were mainly enriched in pathways involving cholesterol, lipid, and lipoprotein metabolism and their regulation. Seven protein-coding genes, including ZNF180, TNFRSF6B, MGAT4B, C19orf80, PARK2, OPCML, and SIGLEC7, were not included in the identified GO terms and KEGG pathways, indicating that they may affect CAD through other pathways that may not yet be identified, which requires further investigation. It is worth noting that this study also found that multiple pathways involving telomerase and telomeres may also be involved in the pathogenesis of CAD, such as the positive regulation of telomere maintenance by telomerase and the regulation of telomerase activity. Telomerase consists of a telomerase RNA component (TERC) and a catalytic component, telomerase reverse transcriptase (TERT), which uses TERC as a template to synthesize new telomere DNA repeats at single-stranded overhangs to maintain telomeres. Several studies have observed shorter telomeres in endothelial cells and vascular smooth muscle cells (VSMCs) in atherosclerotic artery walls, consistent with various regions of the human vasculature [[156][77], [157][78], [158][79]]. In the cell-type expression specific analysis, we also found that TCP1, a gene enriched in telomerase and telomere pathway, was specifically expressed in coronary artery VSMCs. Hoffmann et al. suggested that the antioxidant function of mitochondrial telomerase reverse transcriptase may have athero-protective effects, making it a potential target for clinical trials [[159]80]. The study has certain limitations. Firstly, the proteome data utilized in the study only covers specific proteomes and not the entire proteome. This may have restricted the scope of the study to reveal more associations. Further studies that employ alternative detection techniques and biological samples may uncover additional associations. Secondly, the PMR method requires the use of the 1000 Genomes Project data as the LD reference panel. Consequently, only SNPs overlapping with 1000 G data can be selected as instrumental variables, which may have reduced the statistical power of the study to a certain extent. Third, the diagnosis of CAD in GWAS is derived from various sources, which may lead to heterogeneity in the results due to different diagnostic criteria. To enhance accuracy and identify more specific disease-related markers, future GWAS studies should consider using more detailed diagnostic criteria. Fourth, hundreds of proteins were not tested using PMR due to their low expression heritability or failure of LD matrix decomposition, resulting in the loss of potential candidate proteins. Furthermore, this study only mapped GWAS signals at the protein level, and the biological mechanisms of some candidate proteins and pathways are still unclear. More mechanism studies and data such as mQTL and single-cell sequencing are required to gain a better understanding of the role of candidate protein-coding genes associated with CAD. In conclusion, we found strong evidence supporting the association of multiple protein biomarkers in blood with CAD, among which multiple proteins such as TCP1 and PCSK9 were further validated at the transcriptional level. We also performed cell-type expression specificity and enrichment analysis. These findings provide information about the genetic and physiological processes of CAD, providing clues for further exploration of CAD pathogenesis and new therapeutic targets. Future studies utilizing more extensive large-scale molecular datasets from CAD-associated tissues are warranted, which could further help generate new insights into the genetic and functional mechanisms of CAD, as well as drug targets. Funding This study was supported by the Shandong Province Natural Science Foundation, Youth Branch (ZR2023QH381) and the Taishan Scholar Youth Expert Program (tsqn202306356), and General Fund of China Postdoctoral Science Foundation (2024M751847). Data availability statement All data relevant to the study are public available. The data generated in this study will be available from the corresponding author upon reasonable request. GWAS summary statistics from these analyses are available at [160]CARDIoGRAMplusC4D Consortium ([161]http://www.cardiogramplusc4d.org/). The statistical code needed to reproduce the results in the article is available upon request. CRediT authorship contribution statement Jiqing Li: Writing – review & editing, Writing – original draft, Software, Methodology, Data curation. Jiate Wei: Writing – original draft. Ping Fu: Formal analysis. Jianhua Gu: Funding acquisition, Conceptualization. Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Footnotes ^Appendix A Supplementary data to this article can be found online at [162]https://doi.org/10.1016/j.heliyon.2024.e38036. Appendix A. Supplementary data The following is the Supplementary data to this article: Multimedia component 1 [163]mmc1.docx^ (62KB, docx) References