Abstract

   Network analysis is a novel method to understand the complex
   pathogenesis of inflammation-driven atherosclerosis. Using this
   approach, we attempted to identify key inflammatory genes and their
   core transcriptional regulators in coronary artery disease (CAD).
   Initially, we obtained 124 candidate genes associated with inflammation
   and CAD using Polysearch and CADgene database for which protein-protein
   interaction network was generated using STRING 9.0 (Search Tool for the
   Retrieval of Interacting Genes) and visualized using Cytoscape v 2.8.3.
   Based on betweenness centrality (BC) and node degree as key topological
   parameters, we identified interleukin-6 (IL-6), vascular endothelial
   growth factor A (VEGFA), interleukin-1 beta (IL-1B), tumor necrosis
   factor (TNF) and prostaglandin-endoperoxide synthase 2 (PTGS2) as hub
   nodes. The backbone network constructed with these five hub genes
   showed 111 nodes connected via 348 edges, with IL-6 having the largest
   degree and highest BC. Nuclear factor kappa B1 (NFKB1), signal
   transducer and activator of transcription 3 (STAT3) and JUN were
   identified as the three core transcription factors from the regulatory
   network derived using MatInspector. For the purpose of validation of
   the hub genes, 97 test networks were constructed, which revealed the
   accuracy of the backbone network to be 0.7763 while the frequency of
   the hub nodes remained largely unaltered. Pathway enrichment analysis
   with ClueGO, KEGG and REACTOME showed significant enrichment of six
   validated CAD pathways - smooth muscle cell proliferation, acute-phase
   response, calcidiol 1-monooxygenase activity, toll-like receptor
   signaling, NOD-like receptor signaling and adipocytokine signaling
   pathways. Experimental verification of the above findings in 64 cases
   and 64 controls showed increased expression of the five candidate genes
   and the three transcription factors in the cases relative to the
   controls (p<0.05). Thus, analysis of complex networks aid in the
   prioritization of genes and their transcriptional regulators in complex
   diseases.

Introduction

   Coronary Artery Disease (CAD) is a chronic inflammatory disease. There
   is ample evidence on the role of inflammation in all stages of the
   atherosclerotic disease process [29][1], [30][2]. Genetic studies have
   revealed many causal or susceptible inflammatory loci associated with
   CAD, the manifest form of atherosclerosis [31][3], [32][4]. Circulating
   inflammatory biomarkers such as C-reactive proteins (CRP) and certain
   cytokines are also elevated in acute coronary syndrome, which reflect
   the extent of myocardial necrosis and ischemia/reperfusion damage
   [33][5]. All these studies singularly demonstrate that inflammatory
   genes act in an interactive manner to orchestrate the disease
   associated risk. In most conditions, inflammation signaling show a
   cascade effect with some molecules acting as primary triggers that
   stimulate a secondary line of molecules and so on, eventually
   generating a strong inflammatory milieu. Identification of such key
   inflammatory targets is critical from a translational aspect in order
   to treat the ‘inflammation’ component of the disease that can lead to
   slowing down or even arrest in its pathological and clinical
   progression.

   In this regard, systems biology focuses on understanding the complex
   nature of CAD by integrating the information across various systems
   such as the genome, transcriptome, proteome and the metabolome, which
   is in direct contrast to traditional approaches that focus on
   individual genes, proteins or metabolites [34][6], [35][7]. Despite
   decades of genetic research on CAD, it is striking to note that the
   identified genetic loci explain only a small proportion of the disease
   heritability [36][8], suggesting that there may still be many other
   genes involved in CAD that remain unknown to date. Network based
   approach is being extensively used for the prediction of putative
   candidate genes, prioritizing drug targets [37][9] and in the
   construction of gene regulatory networks [38][10] thus utilizing the
   data from gene expression or genome wide linkage and association
   studies. This approach primarily focuses on the inter-relationship
   between the various components using protein-protein interaction (PPI)
   network and assist in the identification of the best discerning
   molecules associated with the disease. Studies have employed a PPI
   network-based approach to identify important novel cardiovascular
   disease genes [39][11], [40][12]. There is growing interest in
   combining genome wide association studies with network analysis to
   improve our understanding of the molecular basis of complex diseases
   [41][13], [42][14]. Feldmann et al. and more recently Björkegren et al
   utilized this combined approach to delineate the transcription factor-
   regulatory molecules that potentiate the key target genes in
   atherosclerosis [43][15], [44][16]. Motivated by these studies and to
   overcome the inherent limitations in the selection of genes for network
   construction, we have taken into consideration all the inflammatory
   genes implicated in various inflammatory diseases and systematically
   analyzed the putative causal genes and deregulated pathways. Thus, the
   present study enumerates the following work: i) identify key
   inflammatory genes in CAD derived from an inflammatory bionetwork,
   generated with the help of bioinformatics tools; ii) perform functional
   enrichment analysis to ascribe biological functions to the key genes;
   iii) identify key transcriptional regulators; and iv) study the whole
   blood gene expression levels of these regulators in a matched cohort of
   CAD patients (cases) and controls. The study findings provide
   interesting evidence of increased expression of key candidate genes and
   their transcriptional regulators among the cases, thus enabling us to
   prioritize the significant contributors of inflammation among a myriad
   of inflammatory biomarkers through the application of computational
   biology. Given the simplicity of the methodology involved, this
   framework can be easily extended to other complex diseases.

Materials and Methods

   The general schematic approach of the study is summarized in [45]Figure
   1.

Figure 1. An overview of the work flow.

   [46]Figure 1
   [47]Open in a new tab

   An overview of the work flow has been summarized and consists of the
   following steps- Step 1: Retrieval of inflammatory genes from
   literature using PolySearch text mining tool and from CADgene database;
   Step 2: Analysis of protein interactions using STRING database; Step 3:
   Topological analysis of network using Cytoscape v2.8, based on
   betweenness centrality and node degree, leading to the identification
   of hub genes and construction of the backbone network; Step 4:
   Construction of a regulatory network for the hub genes and
   identification of common transcription factors (TFs) regulating them;
   Step 5: Validation of the hub genes based on cross-validation (1 to 4
   node deletion), functional enrichment analysis (ClueGo) and
   quantitative evaluation of key genes and their common transcription
   factors by real-time PCR.

Gene retrieval and prioritization

   Genes associated with inflammation in general and CAD in particular was
   used for the construction of the inflammatory bionetwork. These genes
   were retrieved from two independent sources, namely PolySearch and
   CADgene database. PolySearch is a comprehensive text mining system that
   extracts the relationships between diseases, genes, mutations, drugs
   and metabolites in humans from different types of biomedical text
   databases such as PubMed, OMIM, DrugBank, SwissProt, Human Metabolome
   Database (HMDB), Human Protein Reference Database (HPRD) and Genetic
   Association Database (GAD) [48][17]. The search was performed using
   ‘Disease-Gene/Protein Association’ for query type and ‘Inflammation’ as
   the query keyword, which resulted in a list of 589 genes. This initial
   gene list was prioritized based on a relevancy score, expressed as Z
   score, and refers to the number of standard deviations of the relevancy
   score above the mean value. A higher Z score denotes a lesser
   likelihood that the outcome is due to chance. We selected genes having
   a Z score >1 and arrived at a final list of 78 genes.

   Next, we selected the inflammatory genes from CADgene database
   [49][18], which includes ∼300 genes curated from literature and
   categorized according to their functionality. This search yielded 68
   genes. The lists of genes obtained from the two databases were then
   combined and the redundant gene symbols were removed, to finally obtain
   124 genes ([50]Table S1). We then searched for disease terms associated
   with the 124 putative genes using FunDO [51][19], which suggested that
   70% (87 genes out of 124) of the genes were associated with CAD and
   belonged to the following disease ontology (DO) terms: atherosclerosis,
   multiple sclerosis, intermediate coronary syndrome, heart failure,
   ischemia, cardiovascular disease, heart disease and vascular disease.

Protein-Protein interaction (PPI) network

   The 124 genes listed in [52]Table S1 were considered as the seed
   molecules from which we obtained direct and indirect protein-protein
   interactions using STRING 9.0 database (Search Tool for the Retrieval
   of Interacting Genes) [53][20]. This database provides information on
   both experimental and predicted interactions from varied sources based
   on their neighborhood, gene fusions, co-occurrence, co-expression,
   experiments and literature mining. We constructed an extended network
   based on a high confidence score of 0.7, which implies that only
   interactions with high level of confidence were extracted from the
   database and considered as valid links for the PPI network.

Construction and analysis of the network

   The protein-protein interaction (PPI) network was visualized using
   Cytoscape v 2.8.3 software. The network was analyzed based on
   topological parameters like betweenness centrality (BC) and node degree
   using a Cytoscape plug-in called ‘Network Analyzer’ [54][21]. In a
   given network, each gene is represented as a node and the interactions
   between the nodes are defined as edges. Degree indicates the number of
   edges linked to a given node and nodes having high degree may represent
   the hub genes possessing important biological functions. Betweenness
   centrality reflects the importance of the node based on the number of
   shortest paths that pass through each node. The final network was
   visualized based on these parameters wherein we mapped the node degree
   to the node size and betweenness to the node color in the network view.
   Nodes having high degree were displayed as a big circle while shades of
   red to green color represented high to low BC values for the node
   [55][21]. In short, we considered nodes carrying high degree and
   betweenness centrality to be the hub genes.

Construction of the regulatory network

   We carried out promoter analysis using MatInspector program from
   Genomatix software suite (Genomatix, Munich, Germany) [56][22] to
   identify transcription factors (TFs) that regulate the hub genes.
   Transcription factors which showed evidence of more than one binding
   site was selected for further analysis. Functionally related or similar
   TFs were grouped into the same family. Following the selection of
   regulators for the seed genes, we constructed the regulatory network
   using Cytoscape. Transcription factors that showed high connectivity
   with the target genes were then selected for assessing their expression
   levels in the matched cases and controls.

Validation of hub genes

   In order to evaluate the accuracy of our backbone network and determine
   the frequency of hub genes, we constructed test networks using a part
   of the 124 seed genes. The test networks were constructed by leaving
   out 1 to 4 genes by repeated removal of the top 22 nodes that had the
   highest BC and node degree. Deletion of one node each time would result
   in the construction of 22 networks. However, if the number of omitted
   genes was more than 2, we would have had a large number of
   combinations. Therefore, deletion of more than one node was carried out
   five times by omitting one hub gene each time and randomly selecting
   the remaining genes from the top 22 nodes. This step was carried out
   for all the hub genes, namely IL-6, TNF, IL1B, VEGFA and PTGS2.
   Deletion of 2, 3 or 4 nodes resulted in the construction of 25
   networks. Finally 97 (22+3*25) test networks were constructed and
   analyzed using Cytoscape v 2.8.3. The detailed list of nodes omitted
   for each test network is provided in [57]Table S3. Eventually, the top
   5 nodes were determined in these 97 test networks. The accuracy of the
   network, which is the proportion of the hubs retained in the test
   network to the number of hubs (n = 5) in the main network, was finally
   measured.

Functional enrichment analysis

   To further understand the biological relevance of the hub genes and
   their regulators in CAD, we performed functional enrichment analysis
   using ClueGO [58][23]. ClueGO facilitates the visualization of
   functionally related genes displayed as a clustered network and chart.
   The statistical test used for the enrichment was based on right-sided
   hypergeometric option with a Benjamini-Hochberg correction and kappa
   score of 0.3.

Testing the mRNA expression of genes constituting the backbone network and
core regulatory molecules by real-time qPCR

   Potential candidate genes and their common regulators obtained from the
   network analyses were selected for mRNA expression assay by
   quantitative RT-PCR (real time-PCR or qPCR) in 64 CAD individuals and
   64 matched controls.

Study population

   Two groups consisting of CAD patients (n = 64) and age and
   gender-matched controls (n = 64) were selected from the Indian
   Atherosclerosis Research Study (IARS). Detailed design of the IARS has
   been published [59][24]. Briefly, the IARS is an ongoing
   epidemiological study with an objective to investigate the genetic
   factors and biomarkers against a backdrop of the conventional
   cardiovascular risk factors in Asian Indians living in India.
   Recruitment of cases and controls was based on predefined
   inclusion/exclusion criteria. CAD patients were included if they
   belonged to any of the following criteria: i) angiographically
   confirmed presence of CAD with >70% stenosis in any one major
   epicardial artery or >50% stenosis in two or more arteries, ii) having
   past history of myocardial infarction, and iii) undergone/undergoing
   percutaneous transluminal coronary angioplasty or bypass graft surgery.

   Healthy volunteers who were clinically asymptomatic for CAD and other
   inflammatory disorders, enrolled from the same geographical area as
   that of the proband, having no family history of cardiovascular disease
   and showing normal ECG readings were treated as controls. Informed
   consent was obtained from all the study participants. The study has
   been approved by the Institutional Ethics Committee and follows the
   bioethical guidelines of the Indian Council of Medical Research (ICMR)
   [60][25].

Clinical and Biochemical assessment

   Detailed demographics and anthropometric measures were recorded for
   each participant based on a personal interview. Presence of
   hypertension and diabetes was ascertained based on self-report of
   physician's diagnosis and/or use of prescription medications along with
   perusal of their medical records. Body mass index (BMI) was calculated
   as body weight (kg) divided by the square of height (m^2). Prevalence
   of metabolic syndrome (MS) in the cohort was assessed based on the
   modified Adult Treatment Panel III(ATP-III) criteria which includes
   lower cut-offs for waist circumference(WC) (≥90 cm for men and ≥80 cm
   for women) and appears to be a better criteria for classification of MS
   in Asian Indians [61][26].

   Venous blood collected after overnight fasting of 12 to 14 hrs was
   centrifuged to separate the serum, EDTA and plasma samples. Plasma
   levels of total cholesterol (TC), triglyceride (TG) and High-density
   lipoprotein-cholesterol (HDL-c) were measured using Siemens Dimension
   Flex reagent cartridge (Siemens Healthcare Diagnostics Ltd, UK) and
   standards from Randox laboratories (Crumlin, UK) and assayed on Siemens
   dimension Xpand plus instrument (Siemens Dade Behring, Liederbach,
   Germany).

Real-Time QPCR Assay setup and analysis

   Total RNA was isolated from whole blood cells using QIAamp RNA Blood
   mini kit (Qiagen Inc, Hilden, Germany) and reverse transcribed to cDNA
   using cDNA archive kit (Applied Biosystems Inc., Foster Coty, CA, USA),
   following manufacturer's instructions. Quantitative RT-PCR was
   performed in duplicate using SYBR green chemistry on 7900 HT Fast
   RT-PCR system (Applied Biosystems, Foster City, CA). Relevant primer
   pairs were selected from the PrimerBank [62][27] and verified using
   BLAST search. The primer sequences are listed in [63]Table S2. The
   efficiency of RT-PCR experiment for each primer pair was determined by
   constructing a standard curve with serial sample dilutions. Relative
   gene expression was calculated with comparative Ct method [64][28]
   after normalization to beta-glucuronidase (GUSB), an endogenous
   control, by determining mRNA abundance in each sample relative to the
   reference sample (calibrator). Outliers were repeated in duplicates and
   persistent outliers were removed from the final analysis.

Statistical Analysis

   Student's t- test and univariate analysis were used to determine the
   differences in normalized mRNA expression levels and other quantitative
   traits between the cases and controls. Normality distribution of mRNA
   expression levels was assessed using Q-Q plot. Statistical differences
   for the experiments involving three or more groups were determined
   using analysis of variance (ANOVA) test. Pearson correlation was
   performed to evaluate correlations between the biomarkers. Age and
   gender were considered as potential confounders and appropriately
   adjusted for during analysis. All the analysis was carried out using
   SPSS v 17.0 statistical software package. Data was expressed as mean ±
   standard error of mean (SEM). All the statistical tests were two-sided,
   with 95% confidence interval (CIs). A nominal p value of 0.05 or less
   was considered as statistically significant.

Results

Network analysis and characterization of the hub nodes

   The extended PPI network generated using the 124 seed genes in STRING
   resulted in 1234 interactions between 145 nodes, of which 21 new nodes
   were pulled out based on their protein- protein interactions. The
   network obtained from STRING was subsequently analyzed as described in
   the methodology section. Nodes with large degree and high BC represent
   the key genes. In the present study, 5% of the nodes had a degree value
   greater than 50 and 4% of the nodes had BC above 0.05. In total, 3% of
   the nodes (5 genes) were finally selected having high degree and high
   BC values as shown in [65]table 1. The network so constructed revealed
   a scale free architecture with a power-law degree distribution
   r-squared value of 0.626. Interleukin-6 (IL-6), vascular endothelial
   growth factor A (VEGFA), interleukin-1 beta (IL1B), tumor necrosis
   factor (TNF) and prostaglandin-endoperoxide synthase 2 (PTGS2) were the
   key nodes (hub), displaying the highest connectivity within the
   network. The backbone network was subsequently constructed using these
   5 key genes, which consisted of 111 nodes connected via 348 edges
   ([66]Figure 2). Here, IL-6 occupied the centre of the backbone network,
   having the largest degree and highest BC, which suggests that IL-6
   could be considered as a super-hub gene.

Table 1. Key genes selected based on topological parameters like BC and
degree.

   Gene  Degree Betweenness centrality (BC)
   IL6   77     0.10995103
   VEGFA 61     0.09511043
   IL1B  61     0.05885691
   TNF   62     0.05394098
   PTGS2 49     0.05242533
   [67]Open in a new tab

   Node degree and Betweenness Centrality (BC) are topological parameters
   used for gene prioritization in the network; A cut-off of BC >0.05
   and/or node degree >50 were considered for gene prioritization. IL-6
   constituted the super hub node having the largest degree and the
   highest BC.

Figure 2. The inflammatory backbone network.

   [68]Figure 2
   [69]Open in a new tab

   The backbone network is derived from a master network consisting of 145
   nodes connected via 1234 edges, constructed from 124 combined gene sets
   obtained from Polysearch and CADgene database. This backbone network
   consists of 111 nodes connected via 348 edges. Node color code: shades
   of red to green color depicts node with highest to lowest value of
   betweenness centrality (BC); Node size: from biggest to smallest circle
   map the node degree. Bigger and bright colored nodes represent genes
   with more links. IL-6 appears to be the super hub gene in the network
   with largest degree and highest BC.

Identification of regulators for the hub nodes

   To investigate if there was a common transcriptional regulatory
   mechanism for these hub genes, we performed transcription promoter
   analysis and obtained a list of transcription factors using
   MatInspector. Further refinement of TFs based on evidence >1 resulted
   in 69 TFs belonging to 19 families for IL6, 75 TFs belonging to 18
   families for VEGFA, 74 TFs belonging to 21 families for IL1B, 70 TFs
   belonging to 21 families for TNF and 37 TFs belonging to 10 families
   for PTGS2. All the TFs were subsequently combined and redundancy was
   removed to obtain a final list of 184 unique TFs. A regulatory network
   was constructed by integrating these TFs and their target genes
   ([70]Figure 3). Nuclear factor kappa B1 (NFKB1), signal transducer and
   activator of transcription 3 (STAT3) and JUN showed the highest
   connections suggesting that these regulators could play a critical role
   in activating these putative inflammatory genes.

Figure 3. Interactive regulatory network constituted with five hub genes and
three common transcription factors.

   [71]Figure 3
   [72]Open in a new tab

   184 unique transcription factors (TFs) obtained with MatInspector were
   used for the construction of this network with Cytoscape v 2.8.3. The
   TFs showing top connectivity are represented as red circle while the
   putative target genes are shown in turquoise color. The text has been
   enlarged for easy identification.

Validation of the hub genes

   We performed cross-validation to confirm if the selected hub genes are
   being retained even after random gene removal. We ran 97 test networks
   by randomly removing the top 22 nodes having the highest BC and node
   degree. By doing so, the overall accuracy of the backbone network was
   estimated to be 0.7763. The frequency of the hub nodes, IL-6, IL1B,
   TNF, and PTGS2, was good for 1 to 4 gene deletions. However, the
   frequency of VEGFA was good for 1-2 gene removals, but decreased with
   three or more gene removals. The frequency of hub genes and accuracy of
   the 97 test networks are provided in [73]Table 2 and [74]Figure 4.

Table 2. Frequency of the hub genes and the accuracy of the backbone in 97
test network.

   Number of removed gene Frequency of the hub nodes in test network
   Accuracy of the backbone Number of test networks
   IL6 IL1B TNF PTGS2 VEGFA
   1 21 21 21 21 20 0.945 22
   2 20 20 20 19 20 0.792 25
   3 20 23 18 17 9 0.696 25
   4 20 23 18 19 4 0.672 25
   Overall 81 87 77 76 53 0.776 97
   [75]Open in a new tab

Figure 4. Accuracy of hub genes in the test network.

   [76]Figure 4
   [77]Open in a new tab

   Horizontal axis represents the number of genes removed in the test
   network. The vertical axis represents accuracy of the respective test
   network.

Molecular function and Pathway analysis

   To get an insight on the pathways that the 5 hub genes and their 3
   common transcriptional regulators might be involved in, we performed
   enrichment analysis using ClueGO software. Overall, 36 GO terms and 20
   pathways were significantly enriched, which could be categorized into
   six GO groups as represented in [78]Figure 5. The main GO categories
   were as follows: positive regulation of smooth muscle cell
   proliferation, acute-phase response and regulation of calcidiol
   1-monooxygenase activity. The main pathways from KEGG and REACTOME
   belonged to the following categories: toll-like receptor signaling,
   NOD-like receptor signaling and adipocytokine signaling pathways
   ([79]Figure 5).

Figure 5. Grouping of network based on functionally enriched GO terms and
pathways.

   [80]Figure 5
   [81]Open in a new tab

   A. Functionally grouped network of enriched categories was generated
   for the hub genes and their regulators using ClueGO. GO terms are
   represented as nodes based on their kappa score level (≥0.3).
   Functionally grouped networks are linked to their biological function,
   where only the most significant term in the group is labeled.
   Functionally related groups partially overlap. Visualization has been
   carried out using Cytoscape 2.8.3. B. Table provides the results of
   ClueGO analysis. Nr: Number of genes from our list (8-genes) associated
   with the GO term. %: percentage of genes found from the total number of
   associated genes. p value: p value of the GO term after
   Benjamini-Hochberg correction. Associated genes are represented from
   among those associated with either GO term or specific pathway.

Clinical characteristics of study subjects

   The characteristics of study subjects in the gene expression study are
   described in [82]Table 3. The average age of cases and controls was
   around 50 years, with higher frequency of males (89%) than female
   (11%). Hypertension and diabetes were higher in the cases than in the
   controls. There were 51 (79.7%) and 45 (70.3%) individuals having MS in
   cases and controls, respectively. There were 35 cases (62.5%) with
   chronic stable angina and 21cases (37.5%) with myocardial infarction.

Table 3. Clinical characteristics of study participants included in the gene
expression study.

   Clinical Factors            Cases         Controls     p value
                               (N = 64)      (N = 64)
   Age (years)                 50.02±0.817   50.22±0.788  0.858
   Males N (%)                 57(89.1)      57(89.1)     -
   Females N (%)               7(10.9)       7(10.9)      -
   BMI(kg/m^2)                 25.24±0.416   25.11±0.55   0.847
   Waist Circumference (cm)    89.71±1.10    93.54±1.09   0.015
   Hip Circumference (cm)      93.95±1.01    94.62±1.25   0.68
   Waist/Hip Ratio (cm)        0.95±0.009    0.98±0.010   0.015
   SBP (mmHg)                  120.75±1.41   124.09±2.49  0.246
   DBP (mmHg)                  77.84±0.90    79.25±1.97   0.518
   Age at onset (years)        49.33±0.934                -
   Stable Angina N (%)         35(62.5)
   Myocardial infarction N (%) 21(37.5)
   Number of Diseased Vessel   1 =  7(12.3)
                               2 =  18(31.6)
                               3 =  32(56.1)
   Laboratory studies
   TC (mg/dl)                  147.38±5.92   143.95±11.52 2.29*10^−6
   TG (mg/dl)                  161.79±8.19   194.71±9.06  0.218
   HDL-c (mg/dl)               31.48±1.01    39.89±1.445  7.70*10^−6
   LDL-c (mg/dl)               83.54±5.22    117.99±4.17  8.59*10^−7
   Medical history
   Smoking N (%)               26(40.6)      26(40.6)     -
   Hypertension N (%)          30(47.6)      7 (10.9)     5.06*10^−6
   Diabetes mellitus N (%)     27(42.2)      8 (12.5)     0.00028
   Metabolic Syndrome N (%)    51 (79.7)     45 (70.3)    0.154
   Medications
   Statin N (%)                41 (64%)      -            -
   [83]Open in a new tab

   Continuous variables are expressed as mean±standard deviation. BMI,
   body mass index; SBP, systolic blood pressure; DBP, diastolic blood
   pressure; TC-Total cholesterol; TG- Triglyceride; HDL-c, High Density
   Lipoprotein cholesterol; LDL-c, Low Density Lipoprotein cholesterol.

   Frequency distribution of subjects based on the number of disease
   vessels were as follows: 0 or 1 vessel: 7 (12.3%), 2 vessels: 18
   (31.6%), 3 or more vessels: 32 (56.1%). TC and LDL-c levels were lower
   in cases than in the controls, which might be attributed to the higher
   usage of lipid lowering drugs in the former group.

Expression analysis of key genes and their regulators

   In order to discern the association of key genes and transcription
   factors identified through network analysis, we measured the relative
   expression of all the 5 candidate genes and the 3 regulators in 64
   cases and 64 controls. Significant differences were observed in the
   mean expression levels between cases versus controls for IL6 (2.08±0.23
   vs 1.48±0.17; p = 0.02), VEGFA (1.91±0.11 vs 1.15; p<0.001), IL-1B
   (1.19±0.11 vs 0.93±0.07; p = 0.002), TNF (1.93±0.12 vs 1.26±0.08;
   p = 0.03), PTGS2 (6.12±0.68 vs 1.63±0.16; p<0.001), NFKB1 (0.31±0.03 vs
   0.21±0.02; p<0.001), STAT3 (0.61±0.06 vs 0.41±0.04; p = 0.02) and JUN
   (2.95±0.32 vs 1.43±0.15; p<0.001) ([84]Figure 6). Statistical
   significance was retained even after adjusting for age and gender.

Figure 6. Relative expression of hub genes and their transcription factors.

   [85]Figure 6
   [86]Open in a new tab

   Gene expression of individual samples was normalized to GUSB mRNA
   levels. The data are expressed as mean ± S.E.M. for the affected (black
   bar) and unaffected (white bar) subjects.

   Subset analysis of gene expression in individuals with (N = 26) and
   without (N = 66) diabetes and with (N = 24) and without (N = 66)
   hypertension showed significant differences for PTGS2 in individuals
   with diabetes (6.23±0.94 vs 2.93±0.40; p = 0.009) and hypertension
   (5.83±1.01 vs 3.04±0.39; p = 0.001), respectively. Analysis of these
   markers in individuals with MS (N = 70) and without MS (N = 24) showed
   significant expression differences for PTGS2 (4.13±0.51 vs 2.73±0.54;
   p = 0.008) and VEGFA (1.59±0.09 vs 1.38±0.10; p = 0.006). Furthermore,
   there was strong correlation in the mean expression levels of these hub
   genes with each other as well as with the transcriptional regulators
   (r = 0.24–0.78; p<0.05) ([87]Table S4).

Discussion

   Inflammation plays a pivotal role in atherosclerosis which can be
   attributed to the release of various enzymes, cytokines and chemokines
   during the different stages of disease development and progression
   [88][29], [89][30]. In line with this, several inflammatory genes are
   known to be associated with CAD [90][4]. Experimental validation of all
   the potential candidate genes is impractical due to the prohibitive
   cost and the time involved. In such a scenario, network analysis can
   serve as a powerful tool for gene prioritization. We have used a
   network based approach to identify key connecting molecules from a
   given panel of inflammatory genes. As our understanding of the
   molecular basis of CAD may suffer from incompleteness of available
   data, we considered inflammatory genes implicated with different
   disease types. Studies on human disease network in fact indicate a
   common genetic origin for many diseases, thus suggesting
   interconnectedness among the genes [91][31]. Hence, for this study, 68
   inflammatory genes associated with CAD were selected using the CADgene
   database, which was further extended by selecting additional
   inflammatory genes associated with other diseases. In this manner, the
   number of genes increased from 68 to 124, which was subsequently used
   for the construction of the network.

   The main network was characterized by a few large nodes which
   constituted the hub, having many connections and other smaller nodes
   with fewer connections [92][32]. This primary network consisted of 1234
   interactions between 145 nodes and the relative importance of the genes
   in the network was determined based on the centrality measures of node
   degree >50 and BC value >0.05. In this manner, we identified five key
   genes, IL6, VEGFA, IL-1B, TNF and PTGS2 that were used to construct the
   backbone network with an aim to analyze the interconnectivity of the
   hub genes with the other genes. The backbone interaction network
   included 111 nodes connected via 348 edges and implied that 76% of the
   connections were formed due to the five hub nodes, which may play a
   critical role in the inflammatory-driven disease process. In fact, IL-6
   formed the central molecule, showing the highest connectivity and
   constituting a super-hub in relation to the network. According to the
   centrality-lethality rule, hubs are more likely to be functionally
   relevant than the other genes [93][33]. In fact, it has been
   demonstrated that deletion of the hub genes produce a higher frequency
   of sick phenotypes than the deletion of genes showing low connectivity
   [94][34]. These topological properties of the network have been
   employed to improve our understanding of key genes associated with the
   diseases. For instance, Sarajlic et al employed network topological
   features to predict novel cardiovascular genes that are enriched in
   drug targets and driver genes [95][11]. As explained in the
   [96]introduction section, inflammation plays a pivotal role in CAD and
   network biology approaches also postulate that inflammation and immune
   response related genes have yielded the most consistent signals across
   different diseases [97][35]. Zhang et al. applied network topological
   features to construct a unified classifier and then used it to predict
   the candidate genes for CAD and their results showed an
   overrepresentation of the genes belonging to the inflammatory pathway
   [98][12]. Through an effective analysis of the genome wide action of
   liver X receptor α (LXRα) in foam cells and macrophages using chromatin
   immunoprecipitation sequencing, gene expression and in vitro
   experimentation, complimented by transcriptional network analysis,
   Feldmann et al, identified highly integrated LXRα ligand-dependent
   transcriptional networks that contributed to the reversal of
   cholesterol efflux and reduced inflammation, thus resulting in
   effective prevention of atherogenesis [99][15]. In yet another recent
   report on the molecular processes occurring on the atherosclerotic
   arterial wall in response to plasma cholesterol lowering therapies,
   distinct differences were observed in the type of transcription factors
   and associated regulatory networks of plasma cholesterol
   lowering-responsive gene sets in early, mature, and advanced lesions
   [100][16]. These studies highlight the effectiveness of combing
   experimentation with network analysis to delineate the complexity of
   the atherosclerotic disease process.

   We performed quantitative validation of the causal genes followed by
   functional enrichment analysis. Literature validations of hub genes
   have also confirmed their role in the pathogenesis of CAD and the
   details are discussed in the ensuing paragraphs. Next, in order, to
   evaluate the robustness of our network and eliminate false positives
   and false negatives, we constructed 97 test networks wherein we found
   that the frequency of the hub genes was not grossly altered. Concurrent
   with our findings, other studies have also demonstrated that the scale
   free network exhibit certain level of tolerance against errors,
   implying that the PPI network topologies are robust with respect to
   perturbations [101][36].

   Inflammatory response results from a complex interplay between the
   different inflammatory molecules which act either in an autocrine or
   paracrine manner to induce acute phase response. The inflammatory
   cascade is initiated by the pro-inflammatory cytokines, IL1B and TNF,
   which induce IL-6 synthesis, mediated by Phosphatidylinositol
   3-Kinase-dependent AKT/IκB Kinase α Pathway targeting AP-1 (Activator
   Protein-1) [102][37]. Cholesterol crystals in the plaque activate IL-1B
   secretion in the macrophages and thus link cholesterol metabolism to
   inflammation [103][38]. Increased circulatory levels of IL-1β in
   patients with unstable angina indicates its contribution to the acute
   disease process [104][39]. Clinical studies have also shown increased
   expression of IL-1B and TNF in the atherosclerotic plaques [105][40],
   [106][41]. Consistent with these reports, we also observed increased
   levels of IL-1B and TNF in the affected cases than in the asymptomatic
   controls. Studies in mouse models however, suggest a dual role for
   IL1B, wherein absence of IL-1B decreases the severity of
   atherosclerosis [107][42]. On the other hand, Alexander et al have
   shown that the inactivation of IL-1B enhance atherosclerotic plaque
   stability [108][43]. In the present study, IL-6 was represented at the
   centre of the backbone network, which highlights their significance as
   a pivotal player in the CAD inflammatory network. In fact, in a study
   on the IARS cohort, we have previously shown significant association
   between IL-6 gene polymorphisms and premature CAD and that IL6 acts as
   a key regulator of acute phase reactants, namely hsCRP and fibrinogen
   in the IARS cohort [109][44].

   Vascular endothelial growth factor A is a key member of the family of
   growth factors. It is involved in the angiogenesis and is essential for
   tissue growth as well as organ repair process [110][45]. Higher
   circulating levels of VEGFA have been previously detected in serum of
   patients with cardiovascular disease [111][46], [112][47]. The enzyme
   PTGS2 [more commonly known as cyclooxygenase-2 (COX-2)] mediates the
   production of prostaglandins, which can induce an inflammatory response
   [113][48]. PTGS2 has been shown to promote plaque rupture through the
   metalloproteases in symptomatic plaques showing recent evidence of
   ischemic attack [114][49]. Other studies have also reported a strong
   correlation between COX-2 and VEGFA, which together play an important
   role in angiogenesis [115][50]. We noted increased expression and
   positive correlation of PTGS2 and VEGFA in the present study.

   Controlled regulation of the gene transcripts is essential for
   maintaining homeostasis and a disease free state. Since specific TFs
   bind to the promoter region of their gene targets to regulate
   expression, we analyzed the regulatory network to understand the
   pattern of relationship between the five hub genes and their specific
   TFs. From among the 184 unique TFs, we identified NFKB1, JUN and STAT3
   as the three most common factors, whose role in CAD is well documented
   [116][51]–[117][53]. Recently, the CARDIoGRAMplusC4D consortium carried
   out a large scale association study followed by network analysis and
   identified NF-κB, MAPK and JAK-STAT as the key signaling pathways
   involved in the pathogenesis of CAD [118][8]. Inhibition of NFKB has
   been shown to retard atherosclerotic disease progression in apoE/LDLR
   double knockout mice models [119][54]. In fact, the NFKB family as a
   whole plays a crucial role in regulating a number of processes in the
   cardiovascular system such as inflammation, cell survival, cell
   proliferation, cellular response to stress, hypoxia and ischemia
   [120][55]. Among the many transcriptional targets for NFKB, IL6, VEGFA,
   IL-1B, TNF and PTGS2 are considered to be important and regulate the
   cytokines, chemokines and the adhesion molecules [121][53]. JUN is an
   important protein constituent of AP-1 (activator protein-1)
   transcription factors wherein, the c-Jun protein forms a homo or
   heterodimer with c-Fos, resulting in the AP-1 transcription factor
   complex [122][56]. The role of c-Jun in cellular proliferation,
   differentiation and apoptosis is well established [123][57], [124][58].
   Studies also suggest that c-Jun can regulate neo intima formation and
   promote the proliferation and differentiation of vascular smooth muscle
   cells [125][59] and human endometrial cells [126][60]. AP-1 has been
   shown to be involved in the activation of matrix metalloproteinase
   (MMP) in progressive and unstable atherosclerotic plaques [127][61] as
   well as during the early disease process [128][52]. JUN appeared to be
   one of the most significant transcription factors in our regulatory
   network.

   STAT3 has been implicated in the cardiovascular inflammatory process
   [129][51]. STAT3 is activated by phosphorylation of tyrosine 705 and
   serine 727 in response to the various cytokines and growth factors,
   including IL-6, IL-10, epidermal growth factor (EGF), interferon-alpha
   (IFN-alpha) and interferon-gamma (IFN-gamma) in distinct patterns
   [130][62], [131][63]. In addition, STAT3 is activated under a variety
   of stress conditions [132][64], translocate to the nucleus, where they
   bind to specific DNA sequences and regulate the expression of target
   genes. Initially, STAT3 was thought to be an acute phase reactant
   [133][65]. However, recent studies have shown that it exerts a cardio
   protective effect [134][66]. Experimental studies in rats have shown
   elevated IL6 expression in acute MI, accompanied by a marked increase
   in STAT3 through the activation of the JAK/STAT pathway [135][67],
   [136][68]. Our results are in line with the earlier reports, wherein
   both IL6 and STAT3 showed higher expression in CAD patients relative to
   the controls. We observed strong positive correlation between STAT3 and
   VEGFA (r = 0.565; p<0.001) which is comparable to the study by Osugi et
   al where they have shown that activation of STAT3 in the cardiac
   myocytes increased VEGF expression [137][69]. In fact, STAT3 activation
   could be ascribed to a protective systemic response to reduce cardiac
   damage and promote active remodeling. The above studies demonstrate the
   critical role of STAT3 as a negative regulator of inflammation and its
   potential as a therapeutic target for atherosclerosis.

   Through functional enrichment, we identified the following pathways
   associated with the hub genes and their regulators: smooth muscle cell
   proliferation, acute-phase response, calcidiol 1-monooxygenase
   activity, toll-like receptor (TLR), NOD-like receptor (NLR) and
   adipocytokine signaling pathways. Numerous studies have demonstrated
   the involvement of these biological processes in atherosclerosis.
   Abnormal vascular smooth muscle cell proliferation has been shown to
   increase the production of acute-phase reactant proteins like CRP,
   while Vitamin D deficiency has been implicated in cardiovascular
   disease [138][70]–[139][72]. Both TLR and NLR signaling pathways belong
   to a specific family of pattern recognition receptors that trigger the
   expression of several genes involved in innate immune response
   [140][73]. TLR signaling is associated with chronic inflammatory
   response [141][74] while NLR signaling results in the formation of
   inflammasomes, which along with TLR, orchestrate the induction of IL-1B
   and IL-18 secretion [142][75]. TLR-1, TLR-2 and TLR-4 have been shown
   to be elevated in human atherosclerotic lesions [143][76]. In a pilot
   study, we have previously shown the increased expression of TLR2 in the
   peripheral whole blood of CAD subjects [144][77]. Among NLRs,
   NLR-related protein 3 is one of the better characterized receptors in
   atherosclerosis. Interestingly, there have been studies to show that
   cholesterol crystals which are characteristic of atherosclerotic
   lesions, activate NLR3 and promote the secretion of pro-inflammatory
   cytokines [145][38], [146][78]. Adipocytokines are biologically active
   molecules secreted by the adipose tissue and promote the development of
   obesity-mediated atherosclerotic disease. Adipose tissue from obese
   individuals synthesizes and releases leptin, adiponectin and
   pro-inflammatory molecules such as TNF, IL1B and IL-6 that increase the
   cardiovascular risk [147][79].

   We recognize certain limitations in our study. The network generated
   was based on information obtained from literature mining and hence
   could carry an inherent bias. The generated network might also suffer
   from incompleteness or missing interactions since they have been
   derived from published reports and our understanding of cardiovascular
   genetics is far from complete. However, we have addressed these issues
   to some extent by using two different data sources to arrive at a
   comprehensive panel of inflammatory genes.

   In conclusion, we have identified IL-6, VEGFA, IL1B, TNF and PTGS2
   genes to play a pivotal role in the inflammation-driven atherosclerotic
   disease process from among a vast array of putative inflammatory genes
   using network-based analysis. Of the associated transcriptional
   regulatory factors, NFKB1, STAT3 and JUN was shown to highly network
   with the above gene targets and provide a glimpse of the inherent
   complexity of the inflammatory bionetwork. The frequency of the hub
   genes was also found to be unaffected against the changes in the
   initial seed genes. We were finally able to confirm our hypothesis with
   a pilot case-control gene expression association study. While there is
   convincing evidence to show an active role for these candidate genes
   and their transcription factors based on in vitro and in vivo
   experimental studies in animal models and humans, we have shown
   comparable outcomes using a network based approach. The advantage lies
   in our ability to visualize not only complex interactions among the
   individual components but also comprehend their relative importance in
   the network based on well defined topological parameters. Such
   visualization and identification can promote a better understanding of
   the underlying disease process and also identify specific gene targets
   for therapy. Nonetheless, additional studies are required to confirm
   these initial findings and to eventually realize their true potential
   in a clinical setting.

Supporting Information

   Table S1

   List of genes extracted from Polysearch and CADgene database that are
   associated with inflammation. Relevancy score has been provided for all
   the genes selected from Polysearch. Other genes have been selected from
   CADgene database.

   (DOCX)
   [148]Click here for additional data file.^ (23.3KB, docx)
   Table S2

   Oligonucleotides used for SYBR Green RT-qPCR. Primers were selected
   from PrimerBank and the respective IDs have been provided.

   (DOCX)
   [149]Click here for additional data file.^ (11.7KB, docx)
   Table S3

   Detailed information about frequency of hub node and accuracy of the
   backbone in the 97 test networks.

   (XLSX)
   [150]Click here for additional data file.^ (15.7KB, xlsx)
   Table S4

   Representation of correlation matrix for expression of candidate
   inflammatory genes. Pearson's correlation between the transcriptome is
   displayed in a color gradient of red to green, where red indicates high
   correlation and green indicates least correlation. Range of red to
   orange color indicates positive correlation. * indicates p<0.05;
   correlation coefficient range was 0.249 to 0.79.

   (DOCX)
   [151]Click here for additional data file.^ (13KB, docx)

Acknowledgments