Abstract Coronary artery disease (CAD) is a leading cause of death worldwide. Recent genome-wide association studies have identified more than one hundred susceptibility loci associated with CAD. However, the underlying mechanism of these genetic loci to CAD susceptibility is still largely unknown. We performed a tissue-specific network analysis of CAD using the summary statistics from one of the largest genome-wide association studies. Variant-level associations were summarized into gene-level associations, and a CAD-related interaction network was built using experimentally validated gene interactions and gene coexpression in coronary artery. The network contained 102 genes, of which 53 were significantly associated with CAD. Pathway enrichment analysis revealed that many genes in the network were involved in the regulation of peripheral arteries. In summary, we performed a tissue-specific network analysis and found abnormalities in the peripheral arteries might be an important pathway underlying the pathogenesis of CAD. Future functional characterization might further validate our findings and identify potential therapeutic targets for CAD. Introduction Coronary artery disease (CAD) is a leading cause of death worldwide^[30]1–[31]3. CAD is heritable, and a familial history of CAD is associated with a significantly increased CAD risk^[32]4. Recent genome-wide association studies (GWAS) have identified more than one hundred genetic loci associated with CAD^[33]5–[34]11. However, most of these loci are located outside of protein coding regions, and their implication on CAD is still largely unknown. Moreover, many studies have been focused on the association of single variants with CAD. Given that the majority of complex diseases including CAD are caused by the interplay of many genetic and environmental factors^[35]12,[36]13, it is important to jointly investigate the combinatory effects of multiple genetic variants on biological pathways and interaction networks^[37]14–[38]16. The gene function is highly dependent on the tissue where the gene is expressed^[39]17, which is controlled by very distinct regulatory programs^[40]18. Genes with tissue-specific expression have shown important physiological processes for complex organisms^[41]19. However, functional studies of human genes have been traditionally carried out on specific cell lines, and the characterization of tissue-specific interactions is predominantly based on a small sample size. Recent advances in next generation sequencing provide an unprecedented opportunity to profile gene expression in a much larger scale of human samples^[42]20,[43]21. An excellent example is the Genotype-Tissue Expression (GTEx) project, which has characterized the association of genetic variants with gene expression in nearly 50 different types of human tissues that were collected from approximately 1000 individuals^[44]22. The objective of our current study is to understand the function of CAD-related genetic loci through the gene interaction network. The associations of individual variants with CAD were summarized into gene-level associations, which were then combined with experimentally validated gene interactions as well as gene coexpression in coronary artery. A tissue-specific interaction network then built to examine the interactions between CAD-related genes and their potential functions in terms of biological pathways. Results Gene-level Association with CAD The association of genetic variants with CAD was previously investigated^[45]7. The study included 9,455,778 common variants, among which 2,213 variants passed the genome-wide significance (P < 5 × 10^−8). Variant-level associations were then summarized into gene-level associations using fastBAT^[46]23. Variants within 50 kb of the gene region were collapsed and jointly modelled. Figure [47]1 is the Manhattan plot showing the association of each of the 26,228 genes with CAD. The Q-Q plot of the associations is shown in Supplementary Figure [48]1. A total of 143 genes were significantly associated with CAD after Bonferroni correction of multiple testing (P < 0.05/26,228 = 1.91 × 10^−6). Table [49]1 shows the top 25 genes associated with CAD, and the full list of CAD-related genes is shown in Supplementary Table [50]1. The most significant gene was CDKN2B-AS1 (P = 9.45 × 10^−69), which is the antisense of CDKN2B that is located at the 9p21 locus. The locus has long been recognized to be associated with various cardiovascular diseases^[51]24–[52]28. We also performed a sensitivity analysis by expanding the flanking sequence to 100 kb to include more regulatory variants. As shown in Supplementary Figure [53]2, the association was highly correlated (R^2 = 0.88). Figure 1. [54]Figure 1 [55]Open in a new tab Manhattan plot of gene-level association with CAD. Each dot represents one gene. The x-axis represents chromosome positions, whereas the y-axis represents the log[10](P). The red dash indicates P < 0.05/26228 = 1.91 × 10^−6. Only autosomal variants were included in the analysis. Table 1. Top 25 genes associated with CAD. Chromosome Gene #Variants within the gene Gene association with CAD (P value) Best variant with the gene Best variant association with CAD (P value) 9p21.3 CDKN2B-AS1 283 9.45E-69 rs2891168 2.29E-98 9p21.3 CDKN2B 96 2.04E-38 rs7028268 4.98E-48 9p21.3 CDKN2A 119 3.42E-32 rs3217992 1.03E-42 9p21.3 CDKN2A-AS1 104 2.26E-26 rs3217992 1.03E-42 6q25.3 LPA 180 8.34E-25 rs55730499 5.39E-39 19p13.2 SMARCA4 219 5.88E-21 rs56289821 4.44E-15 15q25.1 ADAMTS7 216 9.37E-20 rs4468572 4.44E-16 19p13.2 LDLR 256 1.71E-16 rs56289821 4.44E-15 1p13.3 SORT1 114 2.18E-16 rs7528419 1.97E-23 15q25.1 MORF4L1 134 6.37E-16 rs4468572 4.44E-16 19p13.2 MIR6886 182 2.13E-15 rs56289821 4.44E-15 1p13.3 MYBPHL 113 3.96E-15 rs7528419 1.97E-23 10q11.21 C10orf142 135 1.31E-14 rs1746050 6.28E-13 6p24.1 PHACTR1 656 1.59E-14 rs9349379 1.81E-42 1p13.3 PSRC1 128 1.88E-14 rs7528419 1.97E-23 6q23.2 LINC01312 113 9.18E-14 rs12202017 1.98E-11 2q33.2 WDR12 51 2.08E-13 rs115396314 5.11E-18 6q26 PLG 184 2.28E-13 rs2315065 2.88E-34 1q41 MIA3 99 7.04E-13 rs67180937 1.01E-12 1q41 TAF1A-AS1 111 9.66E-13 rs35700460 1.38E-12 1p13.3 CELSR2 159 9.70E-13 rs7528419 1.97E-23 2q33.2 CARF 67 1.28E-12 rs115654617 3.12E-18 10q11.21 LINC00841 222 1.57E-12 rs1870634 5.55E-15 1q41 TAF1A 132 1.96E-12 rs35700460 1.38E-12 1p32.2 PLPP3 211 2.40E-12 rs9970807 5.00E-14 [56]Open in a new tab Tissue-specific Interaction Network Related to CAD We then built a CAD-related interaction network by integrating GWAS and gene coexpression in coronary artery. As shown in Fig. [57]2, the network is comprised of 102 nodes and 182 edges. Each node represents one gene, and each edge represents the interaction between two genes. These genes were interconnected together, so not all of the 143 CAD-related gene were included. Among the 102 genes in the network, 53 were CAD-related genes (P < 1.91 × 10^−6). Figure 2. [58]Figure 2 [59]Open in a new tab CAD-related network derived from protein-protein interaction. Each node represents one gene, wheras each edge represents the interaction between two genes. The nodes were colored to represent their association with CAD: red color represents genes that were associated with CAD, white color represents genes were not associated with CAD. The node size is proportional to the number of edges that the node is connectted to. We then examined the potential functional implication of the network using WebGestalt^[60]29. As shown in Table [61]2, six disease pathways were significantly enriched with genes in the network (FDR <0.1). The most significant pathway is abnormalities of the peripheral arteries (P = 1.63 × 10^−6, FDR = 3.52 × 10^−3), which includes eight CAD-related genes. In addition, many CAD-related genes were involved in xanthomatosis (P = 2.06 × 10^−5, FDR = 2.96 × 10^−2) and cerebral ischemia (P = 4.94 × 10^−5, FDR = 5.33 × 10^−2). Table 2. Most significant disease pathways enriched with CAD-related genes (FDR < 0.1). Disease pathway #Genes in the pathway Ratio of enrichment P value FDR Overlapping Genes Abnormalities of the peripheral arteries 97 8.98 1.63E-06 3.52E-03 COL4A1; COL4A2; APOB; APOE; APP; LPL; TP53; BAZ1B Xanthomatosis 19 22.93 2.06E-05 2.96E-02 APOB; APOC2; APOE; LPL Cerebral ischemia 46 11.84 4.94E-05 5.33E-02 COL4A1; COL4A2; APP; TP53; BAZ1B Coronary artery disease 51 10.68 8.20E-05 6.80E-02 APOB; APOE; LPL; TP53; BAZ1B Abnormality of the coronary arteries 53 10.28 9.89E-05 6.80E-02 APOB; APOE; LPL; TP53; BAZ1B Neoplasm of the adrenal cortex 11 29.71 0.00011 6.80E-02 CDKN2B; MDM2; TP53 [62]Open in a new tab A recent study^[63]30 of CAD prioritized 184 candidate genes as the most likely causal genes for CAD based on functional evidence. These genes were involved in 286 modules. Interestingly, 26 of those candidate genes were also included in our network (enrichment P < 2.2e-16). We also examined if genes in the network are potential drug targets. DrugBank^[64]31 was queried, and 13 genes in the CAD-related network were reported as therapeutic targets for at least one known or developing drug, suggesting the potential of network analysis to identify drug targets. Identification of Key Drivers in the Network We also examined the structure of the CAD-related network, and tested if they were any key drivers of the network. Each gene in the network was evaluated by the association of its neighbors with CAD, and the distribution was then compared with random distribution. A single gene, UBC, turned out to be one of the most important genes in the network. The gene was nominally associated with CAD (P = 1.25 × 10^−3) but did not reach the genome-wide significance cutoff (P < 1.91 × 10^−6). However, it interacted with 22 other genes, including 10 CAD-related genes, suggesting that it might be an important regulator of the CAD-related pathways. We also calculated the weighted centroid of the network, which took into consideration of both the weight of neighboring genes as well as the strength of interactions (represented by the absolute value of the correlation coefficient between two genes). As shown in Supplementary Table [65]2, UBC, together with CAND1 and SMARCA4, were among the top centroid genes. Discussion One challenge in the post-GWAS era is to understand molecular mechanisms underlying the association of GWAS loci with diseases. Given that many genes are only expressed in certain tissues, it is important to study gene functions in a tissue-specific manner. Here we built a CAD-related interaction network in coronary artery, and found that many of genes in the network were involved in abnormalities of the peripheral arteries. Our study suggests that the disturbing of peripheral artery functions might be an important pathway leading to CAD. UBC was found to be a key driver of the network. The gene encodes ubiquitin C, a highly conserved gene^[66]32 that acts as a potential target of N-Formylmethionine^[67]33. In combination with proteasome, the ubiquitin-proteasome system (UPS) is responsible for the degradation of up to 80–90% of proteins in mammalian cells^[68]34, which is essential to the removal of non-functional or damaged polypeptides^[69]35. The ubiquitination also regulates multiple cardiac signal transduction pathways^[70]36–[71]38 as well as the promotion of pathologic hypertrophic growth of cardiomyocytes^[72]39,[73]40. It is interesting to see that APOB and APOE were both involved in multiple pathways enriched with CAD-related genes. APOB encodes apolipoprotein B, a lipoprotein that is known to be involved in the development of CAD^[74]41,[75]42. The inclusion of apolipoprotein B level into risk models would significantly improve the prediction performance of future risk of coronary heart disease^[76]43. Genetic variations in APOB were also associated with Mendelian diseases such as familial hypobetalipoproteinemia^[77]44. APOE encodes a ligand that binds to both low density lipoprotein receptor and APOE-specific receptor. It is involved in the regulation of cholesterol and the metabolism of lipoproteins, and the polymorphisms of APOE are associated with atherosclerosis^[78]45 and coronary heart disease^[79]46,[80]47. Tissue-specific expression has been recognized as an important pattern for many diseases. Traditional expression quantitative trait loci (eQTLs) studies were mostly focused on gene expression in the blood due to convenience and accessibility. However, these studies have limited power to detect tissue-specific expression profiles. Recent efforts are being devoted to identify eQTLs across a variety of tissue types^[81]22,[82]48,[83]49, which would empower an in-depth study of gene function among different types of tissues^[84]50,[85]51. GWAS provided an unbiased approach to screen a huge amount of genetic variants with complex diseases. In order to correct for multiple testing and reduce false positives, a stringent significance cutoff is applied (typically P < 5 × 10^−8), which on the other hand, could result in the loss of many true associations. By integrating gene interactions, we were able to reprioritize gene signals, and identify some additional associations that might be otherwise missed because of moderate significance. One example is TGFB1, which encodes a transforming growth factor, which plays an important role in the regulation vascular smooth muscle^[86]52. The gene was significantly associated with CAD after summarizing genetic variants within the gene region (P = 6.26 × 10^−9). However, the most significant SNP within the gene region was rs15052 (P < 2.21 × 10^−7), which did not reach the genome-wide significance cutoff (P < 5 × 10^−8). A recent study^[87]10 that combined UK Biobank together with CARDIoGRAMplusC4D 1000 Genomes-based GWAS and the Myocardial Infarction Genetics and CARDIoGRAM Exome found that rs8108632 within the TGFB1 region was significantly associated with CAD (P = 4.04 × 10^−8). Our results suggest that a combination of joint testing of multiple variants together with gene interactions could offer additional power to identify novel susceptibility loci for complex diseases. Given that many top variants identified by GWAS are just tagging variants, it is challenging to pinpoint the casual variants/genes. For example, the most significant variant associated with body mass index is rs9930506, which is located within the intron of FTO. However, it was found that the causative gene at the locus is actually IRX3, which is located ~480 kb away from the top SNP rs9930506^[88]53. Appropriately 6.22% of cis-eQTLs in coronary artery are even more than 500 kb away from the corresponding genes^[89]22, suggesting that causal genetic variants could be far away from the genes. On the other hand, the inclusion of long-range regulation could also result in an increase of irrelevant genes. Therefore it is useful to include additional annotations such as those derived from the ENCODE Project^[90]54 and the Roadmap Epigenomics Project^[91]55. Our study has several limitations. The interaction network was built from previously reported protein interactions. However, it was estimated that only a small proportion of interactions have been characterized^[92]56–[93]58. Therefore, many important interactions might still be missing and thus were not included in the current analysis. In addition, tissue-specific interactions were estimated from the correlation of gene expression, which had limited power to identify interactions between genes with low expression. It is also worth to note that the key driver analysis is a hypothesis generating approach solely based on the strength of associations of individual genes and the network structure. It is by no means to indicate causality between genes. Further experimental validation is needed to understand the mechanism of associations and the biological processes involved in the etiology of CAD. In conclusion, we performed a tissue-specific network analysis of genetic variants associated with CAD. Our study underscored the role of abnormalities in the peripheral arteries in the pathogenesis of CAD. Future functional characterization of CAD-related gene might identify potential therapeutic targets for CAD. Methods Association of Genetic Variants with CAD The summary statistics of genetic variants associated with CAD was obtained from the CARDIoGRAMplusC4D 1000 Genomes-based genome-wide association study ([94]http://www.cardiogramplusc4d.org/data-downloads/), which included 48 studies with a total of 60,801 CAD cases and 123,504 controls^[95]7. The raw genotypes were imputed to 1000 Genomes phase 1 v3 that included more than 38 million variants, but only common variants (minor allele frequency ≥1%) were included in the current analysis. Derivation of Gene-level Association with CAD Variant-level associations were summarized into gene-level associations using fastBAT^[96]23. For each gene, we took into account of all the variants within 50 kb of the gene region, and joint tested their association with CAD. Given that tens of thousands of genes were tested, a nominal significance cutoff (P < 0.05) could result in a number of false positives. Therefore Bonferroni correction was used, which is a conservative adjustment that assumes all tests are independent. Genes with p-value less than 0.05/N were considered as significant, where N was number of tested genes. Construction of a Tissue-specific Interaction Network Gene interactions were obtained from the iRefIndex database^[97]59. Self-interactions or interactions involved in non-human genes were removed from downstream analyses. Gene coexpression was based on 173 coronary artery samples from the GTEx project^[98]22. Pearson correlation was calculated for each interaction gene pair, and pairs with normalized correlation higher than 0.25 were considered as coexpression in the tissue. Construction of Gene Interaction Network Related to CAD A dense module searching strategy^[99]60 was implemented to identify modules enriched with CAD-related genes. Each gene was assigned a score equivalent to the absolute value of the z-score of the association with CAD. Seed genes were defined as those significantly associated with CAD after Bonferroni correction. The module searching started with a single seed gene. Neighboring genes were then added sequentially to the module if the addition would increase the overall module score^[100]61, which was defined as [MATH: Zm=gik :MATH] , where k was the number of genes in the module, and [MATH: gi :MATH] was the score of gene i. The process iterated until no more genes could be added. Modules derived from different seed genes were highly overlapped and thus were merged to create a combined interaction network. Identification of Key Drivers in the Network Key drivers were defined as genes that interact with more CAD-related genes than what would be expected from a randomly selected gene set with an equal number of genes. These genes are pivotal to the structure of the network and might be potential targets for further functional characterization. The score of each gene was defined as the z-score of the association with CAD. The Kolmogorov–Smirnov test was then used to assess the deviation of scores of neighboring genes from the random expectation. To correct for multiple testing, Bonferroni adjustment was used and genes with P < 0.05/N were defined as key drivers, where N was the total number of genes in the network. In addition, we also calculated the weighted centroid of the network, which was defined as [MATH: Ci=gi+k|wi< /mi>,k|gk :MATH] , where C[i] was the weighted centroid of gene i, and w[i,k] was the correlation coefficient between gene i and gene k, and g[k] is the score of gene k. Electronic supplementary material [101]Supplementary materials^ (167.8KB, docx) Acknowledgements