Abstract

   Coronary artery disease (CAD) is a leading cause of death worldwide.
   Recent genome-wide association studies have identified more than one
   hundred susceptibility loci associated with CAD. However, the
   underlying mechanism of these genetic loci to CAD susceptibility is
   still largely unknown. We performed a tissue-specific network analysis
   of CAD using the summary statistics from one of the largest genome-wide
   association studies. Variant-level associations were summarized into
   gene-level associations, and a CAD-related interaction network was
   built using experimentally validated gene interactions and gene
   coexpression in coronary artery. The network contained 102 genes, of
   which 53 were significantly associated with CAD. Pathway enrichment
   analysis revealed that many genes in the network were involved in the
   regulation of peripheral arteries. In summary, we performed a
   tissue-specific network analysis and found abnormalities in the
   peripheral arteries might be an important pathway underlying the
   pathogenesis of CAD. Future functional characterization might further
   validate our findings and identify potential therapeutic targets for
   CAD.

Introduction

   Coronary artery disease (CAD) is a leading cause of death
   worldwide^[30]1–[31]3. CAD is heritable, and a familial history of CAD
   is associated with a significantly increased CAD risk^[32]4. Recent
   genome-wide association studies (GWAS) have identified more than one
   hundred genetic loci associated with CAD^[33]5–[34]11. However, most of
   these loci are located outside of protein coding regions, and their
   implication on CAD is still largely unknown. Moreover, many studies
   have been focused on the association of single variants with CAD. Given
   that the majority of complex diseases including CAD are caused by the
   interplay of many genetic and environmental factors^[35]12,[36]13, it
   is important to jointly investigate the combinatory effects of multiple
   genetic variants on biological pathways and interaction
   networks^[37]14–[38]16.

   The gene function is highly dependent on the tissue where the gene is
   expressed^[39]17, which is controlled by very distinct regulatory
   programs^[40]18. Genes with tissue-specific expression have shown
   important physiological processes for complex organisms^[41]19.
   However, functional studies of human genes have been traditionally
   carried out on specific cell lines, and the characterization of
   tissue-specific interactions is predominantly based on a small sample
   size. Recent advances in next generation sequencing provide an
   unprecedented opportunity to profile gene expression in a much larger
   scale of human samples^[42]20,[43]21. An excellent example is the
   Genotype-Tissue Expression (GTEx) project, which has characterized the
   association of genetic variants with gene expression in nearly 50
   different types of human tissues that were collected from approximately
   1000 individuals^[44]22.

   The objective of our current study is to understand the function of
   CAD-related genetic loci through the gene interaction network. The
   associations of individual variants with CAD were summarized into
   gene-level associations, which were then combined with experimentally
   validated gene interactions as well as gene coexpression in coronary
   artery. A tissue-specific interaction network then built to examine the
   interactions between CAD-related genes and their potential functions in
   terms of biological pathways.

Results

Gene-level Association with CAD

   The association of genetic variants with CAD was previously
   investigated^[45]7. The study included 9,455,778 common variants, among
   which 2,213 variants passed the genome-wide significance
   (P < 5 × 10^−8). Variant-level associations were then summarized into
   gene-level associations using fastBAT^[46]23. Variants within 50 kb of
   the gene region were collapsed and jointly modelled. Figure [47]1 is
   the Manhattan plot showing the association of each of the 26,228 genes
   with CAD. The Q-Q plot of the associations is shown in Supplementary
   Figure [48]1. A total of 143 genes were significantly associated with
   CAD after Bonferroni correction of multiple testing
   (P < 0.05/26,228 = 1.91 × 10^−6). Table [49]1 shows the top 25 genes
   associated with CAD, and the full list of CAD-related genes is shown in
   Supplementary Table [50]1. The most significant gene was CDKN2B-AS1
   (P = 9.45 × 10^−69), which is the antisense of CDKN2B that is located
   at the 9p21 locus. The locus has long been recognized to be associated
   with various cardiovascular diseases^[51]24–[52]28. We also performed a
   sensitivity analysis by expanding the flanking sequence to 100 kb to
   include more regulatory variants. As shown in Supplementary
   Figure [53]2, the association was highly correlated (R^2 = 0.88).

Figure 1.

   [54]Figure 1
   [55]Open in a new tab

   Manhattan plot of gene-level association with CAD. Each dot represents
   one gene. The x-axis represents chromosome positions, whereas the
   y-axis represents the log[10](P). The red dash indicates
   P < 0.05/26228 = 1.91 × 10^−6. Only autosomal variants were included in
   the analysis.

Table 1.

   Top 25 genes associated with CAD.
   Chromosome Gene #Variants within the gene Gene association with CAD (P
   value) Best variant with the gene Best variant association with CAD (P
   value)
   9p21.3 CDKN2B-AS1 283 9.45E-69 rs2891168 2.29E-98
   9p21.3 CDKN2B 96 2.04E-38 rs7028268 4.98E-48
   9p21.3 CDKN2A 119 3.42E-32 rs3217992 1.03E-42
   9p21.3 CDKN2A-AS1 104 2.26E-26 rs3217992 1.03E-42
   6q25.3 LPA 180 8.34E-25 rs55730499 5.39E-39
   19p13.2 SMARCA4 219 5.88E-21 rs56289821 4.44E-15
   15q25.1 ADAMTS7 216 9.37E-20 rs4468572 4.44E-16
   19p13.2 LDLR 256 1.71E-16 rs56289821 4.44E-15
   1p13.3 SORT1 114 2.18E-16 rs7528419 1.97E-23
   15q25.1 MORF4L1 134 6.37E-16 rs4468572 4.44E-16
   19p13.2 MIR6886 182 2.13E-15 rs56289821 4.44E-15
   1p13.3 MYBPHL 113 3.96E-15 rs7528419 1.97E-23
   10q11.21 C10orf142 135 1.31E-14 rs1746050 6.28E-13
   6p24.1 PHACTR1 656 1.59E-14 rs9349379 1.81E-42
   1p13.3 PSRC1 128 1.88E-14 rs7528419 1.97E-23
   6q23.2 LINC01312 113 9.18E-14 rs12202017 1.98E-11
   2q33.2 WDR12 51 2.08E-13 rs115396314 5.11E-18
   6q26 PLG 184 2.28E-13 rs2315065 2.88E-34
   1q41 MIA3 99 7.04E-13 rs67180937 1.01E-12
   1q41 TAF1A-AS1 111 9.66E-13 rs35700460 1.38E-12
   1p13.3 CELSR2 159 9.70E-13 rs7528419 1.97E-23
   2q33.2 CARF 67 1.28E-12 rs115654617 3.12E-18
   10q11.21 LINC00841 222 1.57E-12 rs1870634 5.55E-15
   1q41 TAF1A 132 1.96E-12 rs35700460 1.38E-12
   1p32.2 PLPP3 211 2.40E-12 rs9970807 5.00E-14
   [56]Open in a new tab

Tissue-specific Interaction Network Related to CAD

   We then built a CAD-related interaction network by integrating GWAS and
   gene coexpression in coronary artery. As shown in Fig. [57]2, the
   network is comprised of 102 nodes and 182 edges. Each node represents
   one gene, and each edge represents the interaction between two genes.
   These genes were interconnected together, so not all of the 143
   CAD-related gene were included. Among the 102 genes in the network, 53
   were CAD-related genes (P < 1.91 × 10^−6).

Figure 2.

   [58]Figure 2
   [59]Open in a new tab

   CAD-related network derived from protein-protein interaction. Each node
   represents one gene, wheras each edge represents the interaction
   between two genes. The nodes were colored to represent their
   association with CAD: red color represents genes that were associated
   with CAD, white color represents genes were not associated with CAD.
   The node size is proportional to the number of edges that the node is
   connectted to.

   We then examined the potential functional implication of the network
   using WebGestalt^[60]29. As shown in Table [61]2, six disease pathways
   were significantly enriched with genes in the network (FDR <0.1). The
   most significant pathway is abnormalities of the peripheral arteries
   (P = 1.63 × 10^−6, FDR = 3.52 × 10^−3), which includes eight
   CAD-related genes. In addition, many CAD-related genes were involved in
   xanthomatosis (P = 2.06 × 10^−5, FDR = 2.96 × 10^−2) and cerebral
   ischemia (P = 4.94 × 10^−5, FDR = 5.33 × 10^−2).

Table 2.

   Most significant disease pathways enriched with CAD-related genes
   (FDR < 0.1).
   Disease pathway #Genes in the pathway Ratio of enrichment P value FDR
   Overlapping Genes
   Abnormalities of the peripheral arteries 97 8.98 1.63E-06 3.52E-03
   COL4A1; COL4A2; APOB; APOE; APP; LPL; TP53; BAZ1B
   Xanthomatosis 19 22.93 2.06E-05 2.96E-02 APOB; APOC2; APOE; LPL
   Cerebral ischemia 46 11.84 4.94E-05 5.33E-02 COL4A1; COL4A2; APP; TP53;
   BAZ1B
   Coronary artery disease 51 10.68 8.20E-05 6.80E-02 APOB; APOE; LPL;
   TP53; BAZ1B
   Abnormality of the coronary arteries 53 10.28 9.89E-05 6.80E-02 APOB;
   APOE; LPL; TP53; BAZ1B
   Neoplasm of the adrenal cortex 11 29.71 0.00011 6.80E-02 CDKN2B; MDM2;
   TP53
   [62]Open in a new tab

   A recent study^[63]30 of CAD prioritized 184 candidate genes as the
   most likely causal genes for CAD based on functional evidence. These
   genes were involved in 286 modules. Interestingly, 26 of those
   candidate genes were also included in our network (enrichment
   P < 2.2e-16). We also examined if genes in the network are potential
   drug targets. DrugBank^[64]31 was queried, and 13 genes in the
   CAD-related network were reported as therapeutic targets for at least
   one known or developing drug, suggesting the potential of network
   analysis to identify drug targets.

Identification of Key Drivers in the Network

   We also examined the structure of the CAD-related network, and tested
   if they were any key drivers of the network. Each gene in the network
   was evaluated by the association of its neighbors with CAD, and the
   distribution was then compared with random distribution. A single gene,
   UBC, turned out to be one of the most important genes in the network.
   The gene was nominally associated with CAD (P = 1.25 × 10^−3) but did
   not reach the genome-wide significance cutoff (P < 1.91 × 10^−6).
   However, it interacted with 22 other genes, including 10 CAD-related
   genes, suggesting that it might be an important regulator of the
   CAD-related pathways.

   We also calculated the weighted centroid of the network, which took
   into consideration of both the weight of neighboring genes as well as
   the strength of interactions (represented by the absolute value of the
   correlation coefficient between two genes). As shown in Supplementary
   Table [65]2, UBC, together with CAND1 and SMARCA4, were among the top
   centroid genes.

Discussion

   One challenge in the post-GWAS era is to understand molecular
   mechanisms underlying the association of GWAS loci with diseases. Given
   that many genes are only expressed in certain tissues, it is important
   to study gene functions in a tissue-specific manner. Here we built a
   CAD-related interaction network in coronary artery, and found that many
   of genes in the network were involved in abnormalities of the
   peripheral arteries. Our study suggests that the disturbing of
   peripheral artery functions might be an important pathway leading to
   CAD.

   UBC was found to be a key driver of the network. The gene encodes
   ubiquitin C, a highly conserved gene^[66]32 that acts as a potential
   target of N-Formylmethionine^[67]33. In combination with proteasome,
   the ubiquitin-proteasome system (UPS) is responsible for the
   degradation of up to 80–90% of proteins in mammalian cells^[68]34,
   which is essential to the removal of non-functional or damaged
   polypeptides^[69]35. The ubiquitination also regulates multiple cardiac
   signal transduction pathways^[70]36–[71]38 as well as the promotion of
   pathologic hypertrophic growth of cardiomyocytes^[72]39,[73]40.

   It is interesting to see that APOB and APOE were both involved in
   multiple pathways enriched with CAD-related genes. APOB encodes
   apolipoprotein B, a lipoprotein that is known to be involved in the
   development of CAD^[74]41,[75]42. The inclusion of apolipoprotein B
   level into risk models would significantly improve the prediction
   performance of future risk of coronary heart disease^[76]43. Genetic
   variations in APOB were also associated with Mendelian diseases such as
   familial hypobetalipoproteinemia^[77]44. APOE encodes a ligand that
   binds to both low density lipoprotein receptor and APOE-specific
   receptor. It is involved in the regulation of cholesterol and the
   metabolism of lipoproteins, and the polymorphisms of APOE are
   associated with atherosclerosis^[78]45 and coronary heart
   disease^[79]46,[80]47.

   Tissue-specific expression has been recognized as an important pattern
   for many diseases. Traditional expression quantitative trait loci
   (eQTLs) studies were mostly focused on gene expression in the blood due
   to convenience and accessibility. However, these studies have limited
   power to detect tissue-specific expression profiles. Recent efforts are
   being devoted to identify eQTLs across a variety of tissue
   types^[81]22,[82]48,[83]49, which would empower an in-depth study of
   gene function among different types of tissues^[84]50,[85]51.

   GWAS provided an unbiased approach to screen a huge amount of genetic
   variants with complex diseases. In order to correct for multiple
   testing and reduce false positives, a stringent significance cutoff is
   applied (typically P < 5 × 10^−8), which on the other hand, could
   result in the loss of many true associations. By integrating gene
   interactions, we were able to reprioritize gene signals, and identify
   some additional associations that might be otherwise missed because of
   moderate significance. One example is TGFB1, which encodes a
   transforming growth factor, which plays an important role in the
   regulation vascular smooth muscle^[86]52. The gene was significantly
   associated with CAD after summarizing genetic variants within the gene
   region (P = 6.26 × 10^−9). However, the most significant SNP within the
   gene region was rs15052 (P < 2.21 × 10^−7), which did not reach the
   genome-wide significance cutoff (P < 5 × 10^−8). A recent study^[87]10
   that combined UK Biobank together with CARDIoGRAMplusC4D 1000
   Genomes-based GWAS and the Myocardial Infarction Genetics and
   CARDIoGRAM Exome found that rs8108632 within the TGFB1 region was
   significantly associated with CAD (P = 4.04 × 10^−8). Our results
   suggest that a combination of joint testing of multiple variants
   together with gene interactions could offer additional power to
   identify novel susceptibility loci for complex diseases.

   Given that many top variants identified by GWAS are just tagging
   variants, it is challenging to pinpoint the casual variants/genes. For
   example, the most significant variant associated with body mass index
   is rs9930506, which is located within the intron of FTO. However, it
   was found that the causative gene at the locus is actually IRX3, which
   is located ~480 kb away from the top SNP rs9930506^[88]53.
   Appropriately 6.22% of cis-eQTLs in coronary artery are even more than
   500 kb away from the corresponding genes^[89]22, suggesting that causal
   genetic variants could be far away from the genes. On the other hand,
   the inclusion of long-range regulation could also result in an increase
   of irrelevant genes. Therefore it is useful to include additional
   annotations such as those derived from the ENCODE Project^[90]54 and
   the Roadmap Epigenomics Project^[91]55.

   Our study has several limitations. The interaction network was built
   from previously reported protein interactions. However, it was
   estimated that only a small proportion of interactions have been
   characterized^[92]56–[93]58. Therefore, many important interactions
   might still be missing and thus were not included in the current
   analysis. In addition, tissue-specific interactions were estimated from
   the correlation of gene expression, which had limited power to identify
   interactions between genes with low expression. It is also worth to
   note that the key driver analysis is a hypothesis generating approach
   solely based on the strength of associations of individual genes and
   the network structure. It is by no means to indicate causality between
   genes. Further experimental validation is needed to understand the
   mechanism of associations and the biological processes involved in the
   etiology of CAD.

   In conclusion, we performed a tissue-specific network analysis of
   genetic variants associated with CAD. Our study underscored the role of
   abnormalities in the peripheral arteries in the pathogenesis of CAD.
   Future functional characterization of CAD-related gene might identify
   potential therapeutic targets for CAD.

Methods

Association of Genetic Variants with CAD

   The summary statistics of genetic variants associated with CAD was
   obtained from the CARDIoGRAMplusC4D 1000 Genomes-based genome-wide
   association study
   ([94]http://www.cardiogramplusc4d.org/data-downloads/), which included
   48 studies with a total of 60,801 CAD cases and 123,504 controls^[95]7.
   The raw genotypes were imputed to 1000 Genomes phase 1 v3 that included
   more than 38 million variants, but only common variants (minor allele
   frequency ≥1%) were included in the current analysis.

Derivation of Gene-level Association with CAD

   Variant-level associations were summarized into gene-level associations
   using fastBAT^[96]23. For each gene, we took into account of all the
   variants within 50 kb of the gene region, and joint tested their
   association with CAD. Given that tens of thousands of genes were
   tested, a nominal significance cutoff (P < 0.05) could result in a
   number of false positives. Therefore Bonferroni correction was used,
   which is a conservative adjustment that assumes all tests are
   independent. Genes with p-value less than 0.05/N were considered as
   significant, where N was number of tested genes.

Construction of a Tissue-specific Interaction Network

   Gene interactions were obtained from the iRefIndex database^[97]59.
   Self-interactions or interactions involved in non-human genes were
   removed from downstream analyses. Gene coexpression was based on 173
   coronary artery samples from the GTEx project^[98]22. Pearson
   correlation was calculated for each interaction gene pair, and pairs
   with normalized correlation higher than 0.25 were considered as
   coexpression in the tissue.

Construction of Gene Interaction Network Related to CAD

   A dense module searching strategy^[99]60 was implemented to identify
   modules enriched with CAD-related genes. Each gene was assigned a score
   equivalent to the absolute value of the z-score of the association with
   CAD. Seed genes were defined as those significantly associated with CAD
   after Bonferroni correction. The module searching started with a single
   seed gene. Neighboring genes were then added sequentially to the module
   if the addition would increase the overall module score^[100]61, which
   was defined as
   [MATH:
   <msub><mrow><mi>Z</mi></mrow><mrow><mi>m</mi></mrow></msub><mo>=</mo><m
   frac><mrow><mo>∑</mo><msub><mrow><mi>g</mi></mrow><mrow><mi>i</mi></mro
   w></msub></mrow><mrow><msqrt><mi>k</mi></msqrt></mrow></mfrac> :MATH]
   , where k was the number of genes in the module, and
   [MATH: <msub><mrow><mi>g</mi></mrow><mrow><mi>i</mi></mrow></msub>
   :MATH]
   was the score of gene i. The process iterated until no more genes could
   be added. Modules derived from different seed genes were highly
   overlapped and thus were merged to create a combined interaction
   network.

Identification of Key Drivers in the Network

   Key drivers were defined as genes that interact with more CAD-related
   genes than what would be expected from a randomly selected gene set
   with an equal number of genes. These genes are pivotal to the structure
   of the network and might be potential targets for further functional
   characterization. The score of each gene was defined as the z-score of
   the association with CAD. The Kolmogorov–Smirnov test was then used to
   assess the deviation of scores of neighboring genes from the random
   expectation. To correct for multiple testing, Bonferroni adjustment was
   used and genes with P < 0.05/N were defined as key drivers, where N was
   the total number of genes in the network. In addition, we also
   calculated the weighted centroid of the network, which was defined as
   [MATH:
   <msub><mrow><mi>C</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>=</mo><m
   sub><mrow><mi>g</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>+</mo><mun
   der><mo>∑</mo><mrow><mi>k</mi></mrow></munder><mrow><mo
   stretchy="false">|</mo></mrow><msub><mrow><mi>w</mi></mrow><mrow><mi>i<
   /mi><mo>,</mo><mi>k</mi></mrow></msub><mrow><mo
   stretchy="false">|</mo></mrow><mo>∗</mo><msub><mrow><mi>g</mi></mrow><m
   row><mi>k</mi></mrow></msub> :MATH]
   , where C[i] was the weighted centroid of gene i, and w[i,k] was the
   correlation coefficient between gene i and gene k, and g[k] is the
   score of gene k.

Electronic supplementary material

   [101]Supplementary materials^ (167.8KB, docx)

Acknowledgements