Abstract

   Atherosclerosis is the primary cause of cardiovascular events and its
   molecular mechanism urgently needs to be clarified. In our study,
   atheromatous plaques (ATH) and macroscopically intact tissue (MIT)
   sampled from 32 patients were compared and an integrated series of
   bioinformatic microarray analyses were used to identify altered genes
   and pathways. Our work showed 816 genes were differentially expressed
   between ATH and MIT, including 443 that were up-regulated and 373 that
   were down-regulated in ATH tissues. GO functional-enrichment analysis
   for differentially expressed genes (DEGs) indicated that genes related
   to the “immune response” and “muscle contraction” were altered in ATHs.
   KEGG pathway-enrichment analysis showed that up-regulated DEGs were
   significantly enriched in the “FcεRI-mediated signaling pathway”, while
   down-regulated genes were significantly enriched in the “transforming
   growth factor-β signaling pathway”. Protein-protein interaction network
   and module analysis demonstrated that VAV1, SYK, LYN and PTPN6 may play
   critical roles in the network. Additionally, similar observations were
   seen in a validation study where SYK, LYN and PTPN6 were markedly
   elevated in ATH. All in all, identification of these genes and pathways
   not only provides new insights into the pathogenesis of
   atherosclerosis, but may also aid in the development of prognostic and
   therapeutic biomarkers for advanced atheroma.
     __________________________________________________________________

   Cardiovascular diseases are the leading cause of morbidity and
   mortality worldwide, and atherosclerosis is known to be the primary
   underlying factor responsible for the development of these
   diseases[47]^1. Despite extensive research, the detailed molecular
   mechanisms underlying the development of atherosclerosis and causing
   plaque rupture still remain unclear and new findings are urgently
   needed to complement the current knowledge and to identify new drug
   targets[48]^2. Rapid advances in biological technology, including DNA
   microarrays, able to detect the expression levels of tens of thousands
   of genes simultaneously, might help to provide comprehensive insights
   into the pathogenesis of atherosclerosis.

   Gene-expression profiling of atherosclerosis has recently been used to
   identify genes and pathways relevant to vascular pathophysiology. It
   has previously been used to analyze altered gene expression in normal
   and diseased arteries[49]^3, establish crucial players in
   atherosclerotic plaque progression[50]^4,[51]^5, identify
   differentially expressed genes (DEGs) by comparing plaques with or
   without cerebrovascular symptoms[52]^6, discover candidate pathways and
   genes related to atherosclerosis[53]^7, and find gene expression
   changes of atherosclerotic plaques in different vascular beds[54]^8.
   However, some drawbacks are associated with those previous studies. In
   microarray studies comparing atheroma with normal tissues[55]^3,[56]^7,
   differences in the cellular compositions and morphologies of
   atherosclerotic plaques and normal arteries may result in differential
   gene expression profiles that simply reflect this variation[57]^9. In
   addition, irregular sample-collection methods existed in some
   studies[58]^3,[59]^8 , for example, samples from different sites or
   sources or small sample sizes, may affect the reliability of
   studies[60]^10. Furthermore, in animal model experiments[61]^4,[62]^5,
   a high degree of variability in plaque composition and gene expression
   between humans and animal models may limit the extension of cDNA array
   studies on animal material to clinical use[63]^11. Features of unstable
   plaques, such as surface ulceration, rupture, intraplaque hemorrhage
   and thrombus, may also occur in both asymptomatic and symptomatic
   patients, which may also confound studies[64]^6 that classify samples
   according to patient symptomatology[65]^12. Additionally, the relative
   lack of systematic bioinformatic analysis of cDNA microarrays in
   current studies limits the effective exploitation of gene-expression
   data sets[66]^10. Therefore, an integrated bioinformatic analysis based
   on cDNA microarray studies of human tissues may help to clarify the
   mechanisms underlying the development and progression of
   atherosclerosis.

   To our knowledge, the variations between different individuals or blood
   vessels may affect the reliability of studies and it is very difficult
   to obtain healthy and diseased tissue from the same blood vessel of the
   same individual in human studies. To overcome the difficulty, we used a
   gene expression dataset from a previously published study[67]^13,
   comparing atheroma and its surrounding tissues from the same individual
   to track gene changes with disease progression and validated our
   findings with similar tissues. Besides, to interpret the biological
   relevance of these changes in gene expression, the microarray data were
   analyzed by integrated bioinformatic analysis expanding on traditional
   microarray analysis methods, namely gene-ontology and pathway analysis,
   thereby allowing the construction of interaction networks, that might
   identify novel prognostic markers and therapeutic targets.

Results

Identification of differentially expressed genes

   Through our microarray analysis, a total of 816
   differentially-expressed genes (DEGs) were identified between atheroma
   plaque (ATH) and macroscopically-intact tissue (MIT), including that
   443 genes were up-regulated and 373 genes were down-regulated ([68]Fig.
   1A,B). The greatest fold differential expressions were a six-fold
   up-regulation of the FABP4 gene (fatty acid-binding protein 4) and a
   3.3-fold down-regulation of the CNTN1 gene (contactin 1) in ATH
   compared with MIT.

Figure 1. Differentially expressed genes were identified between ATH and MIT.

   [69]Figure 1
   [70]Open in a new tab

   (A) The overlapping gene set dually identified by the SAM and FC
   method. (B) Volcano plots for all differentially expressed genes in
   comparison. The dots indicate that up-(red) and down-regulated (blue)
   DEGs were significant both at false-discovery rate <0.05 and
   Fold-change >1.5 or <0.667.

Gene ontology and pathway analyses

   Two hundred-ninety and 26 GO terms were significantly enriched among
   the up-regulated and down-regulated genes, respectively. [71]Table 1
   shows the ten most overrepresented GO terms for the up-regulated and
   down-regulated DEGs, including immune-related biological process, such
   as “cell activation” and “cytokine production”, and “muscle system
   process”. Meanwhile, the KEGG pathway analysis identified 77 and 26
   significantly enriched pathways for up-regulated and down-regulated
   genes, respectively. The ten most overrepresented KEGG pathways for
   up-regulated and down-regulated DEGs are shown in [72]Table 2, with “B
   cell receptor signaling pathway” and “TGF-beta signaling pathway” being
   most significantly enriched.

Table 1. Top 10 most overrepresented GO terms for the DEGs.

   Term Name Count P-value Adjustedp-value
   Up-regulated DEGs
   GO:0001775 cell activation 83 0 0
   GO:0001816 cytokine production 46 0 0
   GO:0002250 adaptive immune response 33 0 0
   GO:0002252 immune effector process 56 0 0
   GO:0002253 activation of immune response 42 0 0
   GO:0002376 immune system process 175 0 0
   GO:0002443 leukocyte mediated immunity 33 0 0
   GO:0002682 regulation of immune system process 104 0 0
   GO:0002684 positive regulation of immune system process 76 0 0
   GO:0002694 regulation of leukocyte activation 44 0 0
   Down-regulated DEGs
   GO:0003012 muscle system process 32 3.33E-16 3.77E-11
   GO:0006936 muscle contraction 28 6.35E-14 3.59E-09
   GO:0003008 system process 73 1.08E-11 4.06E-07
   GO:0032501 multicellular organismal process 157 2.01E-09 5.70E-05
   GO:0007155 cell adhesion 46 8.28E-09 0.000167
   GO:0022610 biological adhesion 46 8.87E-09 0.000167
   GO:0048731 system development 106 1.95E-08 0.000314
   GO:0044057 regulation of system process 28 2.64E-08 0.000373
   GO:0007507 heart development 25 3.82E-08 0.000481
   GO:0007399 nervous system development 64 1.30E-07 0.00147
   [73]Open in a new tab

   Count: the number of differentially expressed genes; P-value was
   obtained by hypergeometric test; P-value was adjusted by
   Benjamini-Hochberg method. If the p-value is less than 2.2E-16 in R
   tool, it will be automatically changed to 0, and FDR should also be 0.

Table 2. Top 10 most overrepresented KEGG pathways for the DEGs.

   Pathway id Pathway name Count P-value Adjustedp-value
   Up-regulated DEGs
   path:04662 B cell receptor signaling pathway 14 0 0
   path:04650 Natural killer cell mediated cytotoxicity 14 8.88178E-16
   6.94556E-13
   path:04664 Fc epsilon RI signaling pathway 11 2.87992E-13 9.65184E-11
   path:04062 Chemokine signaling pathway 18 1.55209E-12 3.64121E-10
   path:04380 Osteoclast differentiation 12 2.35412E-12 4.6023E-10
   path:04666 Fc gamma R-mediated phagocytosis 10 5.50975E-11 8.07867E-09
   path:04810 Regulation of actin cytoskeleton 12 1.0021E-09 1.17546E-07
   path:05144 Malaria 6 1.90164E-08 1.71586E-06
   path:04670 Leukocyte transendothelial migration 7 2.39578E-08
   2.00732E-06
   path:04610 Complement and coagulation cascades 8 7.67467E-08
   5.00133E-06
   Down-regulated DEGs
   path:04350 TGF-beta signaling pathway 4 0.0000639 0.075854499
   path:04713 Circadian entrainment 5 0.000112157 0.075854499
   path:04728 Dopaminergic synapse 6 0.000272488 0.109483429
   path:04020 Calcium signaling pathway 8 0.000326677 0.109483429
   path:04724 Glutamatergic synapse 6 0.00056299 0.146752804
   path:04540 Gap junction 4 0.000822902 0.154822692
   path:04723 Retrograde endocannabinoid signaling 3 0.000923945
   0.154822692
   path:04270 Vascular smooth muscle contraction 4 0.000934306 0.154822692
   path:04730 Long-term depression 4 0.001055909 0.154822692
   path:00380 Tryptophan metabolism 3 0.001142806 0.154862355
   [74]Open in a new tab

   Count: the number of differentially expressed genes; P-value was
   obtained by hypergeometric test; P-value was adjusted by
   Benjamini-Hochberg method. If the p-value is less than 2.2E-16 in R
   tool, it will be automatically changed to 0, and FDR should also be 0.

Protein–protein interaction (PPI) network analysis

   We constructed a PPI network to identify more important proteins and
   biological modules that may play crucial roles in the development of
   atherosclerosis. To confine the interactions only to those close to the
   DEGs, only first level interactions between DEGs and their neighbors
   were selected. There were 3,990 PPI pairs and 2,491 nodes in our
   constructed PPI network. The degree represents the number of
   neighboring nodes in the network and changes in the proteins/genes with
   higher degrees have more effects than changes in those with smaller
   degrees. SMAD9, LYN, PTPN6, ZBTB16, SYK, PRKCB, SVIL, VAV1, BMPR1B and
   BTK were located in the more important positions of network with higher
   degrees of 108, 102, 95, 81, 74, 66, 63, 62, 60 and 56, respectively,
   indicating those proteins play irreplaceable and critical roles in the
   maintaining the whole protein interactions in the network ([75]Fig.
   2A).

Figure 2. PPI interactions network of DEGs and the disease-relevant module
found in the network.

   [76]Figure 2
   [77]Open in a new tab

   Nodes and links represent human proteins and protein interactions;
   Nodes represent the encoding genes of proteins; Red color indicates
   up-regulated genes annotated in the PPI network; Blue color indicates
   the down-regulated genes annotated in the PPI network; Pink nodes
   represent the non-DEGs which have an interaction with DEGs (A,B). The
   disease-relevant module contains the most nodes in CFinder software
   (B).

   CFinder software was used to identify the disease-relevant modules in
   the PPI network. [78]Figure 2B shows the module containing the most
   nodes with parameter k = 5. This module contained eight DEGs, including
   CSF2RB, LCP2, LYN, PLCG2, PTPN6, PTPRC, SYK and VAV1. Jointly using
   topology of network and module analysis to select candidates would
   identify genes that have higher significance in the PPI network.
   Finally, we focused on four genes (SYK, LYN, PTPN6 and VAV1) that were
   overlapped between the top 10 nodes ranked by degrees and the
   disease-relevant modules in the PPI network ([79]Table 3).

Table 3. The candidate genes selected by our analysis.

   symbol entreze     FC      P-value  Ajusted P-value
   SYK     6850   1.696016958 5.02E-07    5.38E-06
   PTPN6   5777   1.694628747 5.02E-07    5.38E-06
   LYN     4067   1.635340328 0[80]^*     0[81]^*
   VAV1    7409   1.616980858 5.02E-07    5.38E-06
   [82]Open in a new tab

   ^*If the p-value is less than 2.2E-16 in R tool, it will be
   automatically changed to 0, and FDR should also be 0.

Verification of differentially expression genes in clinical samples

   To confirm and validate the expression of four candidates determined
   from microarray data analysis in clinical samples, eight fresh sets of
   ATH and MIT samples were collected from surgery. The mRNA expression of
   four candidates (VAV1, SYK, LYN and PTPN6) were examined by qRT-PCR in
   eight sets of atherosclerotic sample (8 ATH and 8 MIT, n = 16). The
   results from qRT-PCR showed that mRNA level of VAV1 (12.6 ± 5.0), LYN
   (3.7 ± 1.1), SYK (13.1 ± 4.4) and PTPN6 (11.2 ± 4.9) increased by 12.6,
   3.7, 13.1 and 11.2 folds respectively in ATH compared with in MIT
   ([83]Fig. 3A). The results are consistent with data from microarray
   analysis although the differences in mRNA level were even higher than
   the differences determined in the microarray analysis. Moreover, the
   whole lysates from four sets of atherosclerotic samples (4 ATH and 4
   MIT, n = 8) were analyzed by western blot. As [84]Fig. 3B shows, the
   protein level of LYN (2.2 ± 0.4), SYK (27.3 ± 8.3) and PTPN6
   (2.0 ± 0.5) significantly increased by degree of 2.2, 27.3 and 2.0
   folds respectively in all ATH samples compared with MIT samples. We
   noticed that VAV1 was not detected in the atherosclerotic samples at
   the protein level by current antibody.

Figure 3. Validation of four candidate genes expression determined from
microarray data analysis in clinical samples.

   [85]Figure 3
   [86]Open in a new tab

   (A) VAV1, LYN, SYK and PTPN6 expression was analyzed by qRT- PCR in 8
   sets of atherosclerotic sample (8 ATH and 8 MIT, n = 16). *p < 0.05;
   **p < 0.01. The data is a representative of three independent
   experiments. (B) whole lysate from atherosclerotic samples (4 ATH and 4
   MIT, n = 8) were analyzed by western blot. Top panel, the
   representative images of three independent experiments; Bottom panel,
   the quantitative data of the images in western blot by ImageJ software.
   The full-length blots are presented in [87]Supplementary materials.

Discussion

   Microarray studies have great potential to provide novel insights into
   the pathogenesis of complex diseases. In the present study, we
   systematically applied integrated bioinformatic methods to mine new
   candidate players in the process of atherosclerosis and validated our
   findings in an independent set of samples at both mRNA and protein
   levels. In our study, we identified a total of 816 genes differentially
   expressed in ATH compared with MIT, including 443 up-regulated and 373
   down-regulated DEGs. GO functional-enrichment analysis of these DEGs
   showed that genes mainly related to inflammation and immune responses
   were altered with disease progression. “Cell adhesion”,
   “proliferation”, “differentiation”, “motility”, “cell death”, “lipid
   metabolism” and “immune response” have all been reportedly associated
   with atherosclerosis[88]^14, and these processes were also identified
   in our enrichment results. Interestingly, although we identified
   similar numbers of up-regulated and down-regulated genes in ATHs, we
   observed an excess of significant GO categories for up-regulated genes,
   suggesting that the up-regulated genes are functionally more important
   in atherosclerosis progression.

   Pathway-enrichment results showed an overabundance of immune and
   inflammatory signals, represented by the “chemokine-signaling pathway”,
   “natural killer cell-mediated cytotoxicity”, and “FcεRI-signaling
   pathway” in atherosclerosis. Our finding indicates that innate and
   adaptive immune cells might contribute to the development and
   progression of atherosclerosis, especially in the advanced stages.
   Hypercholesterolemia was initially considered to be the major risk
   factor for atherosclerosis, but recent advances have proven that
   chronic inflammation and autoimmunity play major roles in the
   initiation and progression of the disease[89]^15, as supported by our
   pathway-enrichment results. In addition, the “TGF-β signaling”,
   “calcium signaling”, and “osteoclast differentiation” were all
   significantly enriched among DEGs.

   Mast cells are frequently in an activated state and participate in the
   process of atherosclerosis[90]^16. “FcεRI-mediated signaling” in mast
   cells is initiated by the interaction of an antigen with IgE bound to
   the extracellular domain of the α-chain of FcεRI. The mast cells
   activated by crosslinking of the FcεRI via IgE-antigen complexes could
   release and secrete biogenic amines, cytokines, lipid mediators and
   proteoglycans, which contribute to inflammatory responses. IgE and
   FcεRI have been implicated in several aspects of autoimmunity and
   chronic inflammatory diseases[91]^17. LYN, SYK and VAV1, the key
   modulators in our PPI network and module, are also implicated in this
   pathway ([92]Figure S1). Our results indicated that this pathway might
   play a crucial role in the process of atherosclerosis; however, there
   is currently no proof of a relationship between “FcεRI-mediated
   signaling” and atherosclerosis. And it is worth noting that calcium
   signaling is associated with FcεRI -mediataed signaling. “Calcium
   signaling”[93]^18, “TGF-β signaling”[94]^19 and “osteoclast
   differentiation”[95]^20 have all been shown to be involved in the
   process of atherosclerosis. Our pathway-enrichment analysis supported
   the involvement of some pathways known to be associated with
   atherosclerosis initiation and progression, and also highlighted the
   FcεRI-mediated signaling pathway, which, to the best of our knowledge,
   has not previously been reported in association with carotid atheroma.

   Our PPI network and further module analysis showed that VAV1, SYK, LYN
   and PTPN6 overlapped between the top 10 nodes and the disease-relevant
   module, suggesting that these genes play more crucial roles in the
   pathogenesis of carotid atheroma. Several studies have indicated
   important related functions for SYK and VAV1, suggesting that they play
   significant roles in atherosclerosis. SYK has been reported to be
   involved in the pathogenesis of atherosclerosis by activating monocyte
   chemotactic protein-1 expression[96]^21. Choi et al.[97]^22 found that
   TLR4/SYK-mediated macrophage responses may contribute to chronic
   inflammation in human atherosclerosis. Furthermore, the SYK inhibitor
   fostamatinib attenuated atherogenesis in mice, suggesting that SYK is a
   potential anti-inflammatory therapeutic target in
   atherosclerosis[98]^23. The validation study also indicated SYK was
   up-regulated both at the level of mRNA and protein which strengthens
   the assertion that it might play an important role in atherosclerosis
   development.

   VAV1, a member of the VAV gene family, is expressed exclusively in
   hematopoietic cells. It is a signal transduction molecule that acts as
   guanine nucleotide exchange factor for Rac1 and Rho GTPases, and also
   functions as an adaptor platform[99]^24. VAV1 impacts on processes that
   are highly relevant to atherogenesis, such as NADPH oxidase-mediated
   generation of reactive oxygen species, cell death, and leukocyte
   activation. An in vivo carotid artery thrombosis model showed that
   genetic deletion of Vav1 and Vav3 together may prevent the development
   of occlusive thrombi in mice fed a high-fat diet[100]^25. Deletion of
   Vav1 alone led to modest inhibition of oxidized low-density lipoprotein
   uptake and foam-cell formation, while deletion of both Vav1 and Vav3
   led to nearly complete inhibition of oxidized low-density lipoprotein
   uptake and foam-cell formation, suggesting that Vavs act as a critical
   regulator in the process of atherogenesis, and thus represents a novel
   therapeutic target[101]^26. In our validation study, VAV1 was not
   detected in the atherosclerotic samples at the protein level by current
   antibody, which worked well in the experiment of positive control
   tissues. Another VAV1 antibody was also applied, which still could not
   dectect the expression of this gene. Though we found similar expression
   patterns for upregulated VAV1 in both the qPCR and microarray analyses
   of ATH samples, VAV1 may not play a major role in the progression of
   atherosclerosis because its expression may be blocked at the
   translation phase.

   LYN encodes a tyrosine protein kinase that is involved in the
   regulation of mast-cell degranulation. Lyn is the major Src-family
   kinase regulating glycoprotein VI signaling, and its absence caused a
   delay in activation and a marked reduction in platelet aggregation on
   collagen in a laser-injury model[102]^27. However, an apparently
   contradictory study showed that Lyn inhibited platelet activation, and
   that Lyn was increasingly inactivated as platelet aggregation
   progressed[103]^28. Miki et al.[104]^29 suggested that Lyn plays an
   important role in the metabolism of serum lipids, and could induce the
   expression of monocyte chemotactic protein-1, which is related to
   atherosclerosis, during the development of atherosclerotic lesions on
   high-fat diets. Previous studies reflect the complex roles of LYN and
   our validation experiments showed that LYN was up-regulated in the
   atheroma at the levels of mRNA and proteins, which might promote the
   progression of this disease.

   PTPN6 is a member of the protein tyrosine phosphatase family of
   signaling molecules. It regulates a variety of cellular processes
   including cell growth, differentiation, mitotic cycle, and oncogenic
   transformation[105]^30. Kamata et al.[106]^31 suggested that PTPN6 acts
   as a negative regulator in the development of allergic responses such
   as allergic asthma. Dubois et al.[107]^32 concluded that PTPN6 played a
   crucial role in the negative modulation of insulin action and clearance
   in the liver, thus regulating whole-body glucose homeostasis. However,
   here we found PTPN6 up-regulated in atheroma at both the mRNA and
   proteins level and for the first time linked PTPN6 to atherosclerosis.

   Additionally, to ensure the robustness of our candidate DEGs, we
   confirmed the expression of our candidate genes in another dataset
   [108]GSE28829 ([109]Supplementary Figure S2). The analysis revealed
   that a similar representation of the gene expression patterns of our
   candidate DEGs was seen in the dataset [110]GSE28829, suggesting that
   VAV1, SYK, LYN and PTPN6 genes may play an important role in
   progression of atherosclerosis.

   To our knowledge, this is the first integrated bioinformatic analysis
   comparing gene expression between carotid plaque and macroscopically
   intact arterial tissue. The large-scale gene expression profile
   analysis in our study is a significant strength in addition to the fact
   that paired tissue samples were obtained from the same individual. Our
   integrated methods are based on pre-specified specific algorithms,
   established topology information of networks and existing knowledge
   from databases and literature. The integrated methods have an advantage
   over traditional, single-analysis microarray approaches and other
   enrichment-analysis methods, such as DAVID[111]^33 and
   NetGestalt[112]^34 and ensure the reliability and accuracy of the
   results. Key candidates were also validated in Chinese patients,
   indicating the generalization of molecular mechanisms among different
   ethnic groups. Meanwhile, our integrated bioinformatic analysis might
   reveal the relationship between DEGs at molecular interaction and
   pathway levels, which provided some clues for the deeper mechanism of
   our candidate DEGs.

   The discrepancies of expression between the qRT-PCR and microarray
   results may have been caused by a sensitivity bias between the two
   methods, difference in ethnicity, diet and lifestyle between French and
   Chinese people or by the use of different statistical methods in
   qRT-PCR and microarray. However, there are also some limitations in our
   study. First, the study population from the microarray analysis
   underwent carotid endarterectomy at the university hospital of Lyon, so
   the gene expression profile may be influenced by their ethnicity, diet
   and lifestyle. Secondly, the cohort consisted of older subjects that
   were predominantly male and the majority had hypertension. The
   generalization of our findings is unknown as the present results are
   limited to high risk populations with signs of atherosclerosis and
   severe symptoms. Finally, our work is a reanalysis of previously
   published dataset and although some previous work and our validation
   experiments support our gene expression analysis results, the work
   requires further study to identify the mechanism of action and to
   assess the relevance of our findings. This work serves as an excellent
   foundation and reference for further studies to expand on these
   findings in the future.

Conclusion

   This study identified SYK, LYN, PTPN6 and the “FcεRI-mediated signaling
   pathway” as potential candidate players involved in the pathogenesis of
   atherosclerosis. These findings enhance our understanding of the
   molecular mechanisms of this important disease. Further studies, such
   as gene functional studies, are needed to support the results of our
   study, with the aim of identifying candidate biomarkers with sufficient
   predictive power to act as prognostic and therapeutic biomarkers for
   advanced atheroma.

Methods

Source of data

   An existing dataset [113]GSE43292 within the Gene Expression Omnibus
   database was used for this work and obtained through approved access.
   The dataset was generated using the Affymetrix Human Gene 1.0 ST
   Array[114]^13. Ethical approval, sample tissue collection and
   preparation methods and characteristics of study participants were
   described in a previous report[115]^13. In brief, the dataset included
   32 from 34 consecutive patients admitted to the university hospital of
   Lyon in 2009 for carotid endarterectomy. Paired samples were taken from
   individuals meaning that 64 carotid artery samples were analyzed. The
   mean age of participants was 70 years (±10 years) and the majority were
   male and with hypertension, with elevated blood lipid levels and just
   over one third of the sample were diabetic ([116]Supplementary Table
   S1)[117]^13. Tissue samples were removed during surgery and dissected
   into two fragments: atheroma plaque tissue (ATH, subsequently
   identified as mostly stage IV and/or V lesions according to the
   American Heart Association classification) and macroscopically intact
   tissue (MIT, almost exclusively composed of stage I and II
   lesions)[118]^13.

Identification of differentially expressed genes

   The raw gene dataset obtained from the previous work[119]^13 was
   converted into expression measures, and background correction and
   quantile data normalization were performed using the robust multiarray
   average algorithm from the Affy package to obtain the expression
   profile data[120]^35. After deleting duplicated probes and averaging
   the multiple probes values for the same Entrez Genes (the unique
   integers as identifiers for gene records)[121]^36, we finally obtained
   expression profiles for 19,924 genes in the 64 samples.

   Because the differentially expressed genes (DEGs) might have stronger
   relationship with the development of disease, the significance analysis
   of microarrays[122]^37 and fold-change methods were jointly used to
   identify DEGs between ATH and MIT.

Functional-enrichment analysis

   We integrated GO annotation into the total DEGs by mining for enriched
   GO terms of proteins using the R-based GO function software packages,
   which extracts biologically relevant terms from statistically
   significant GO terms for a disease[123]^38,[124]^39.

   The DEGs were chosen for further analysis of Kyoto Encyclopedia of Gene
   and Genome (KEGG) enrichment. The SubpathwayMiner is a pathway
   identification system[125]^40 and accurately assessed the pathway
   structure to locate disease-relevant KEGG pathways[126]^41 and
   subpathways in DEGs relative to the genomic background.

Protein–protein interaction network construction

   In the study, we downloaded protein-protein interaction (PPI) data from
   human protein reference database (Release 9) on the website
   ([127]http://www.hprd.org/)[128]^42. These interactions were derived
   from literature of experimental validation, including physical
   interactions and enzymatic reactions found in signal transduction
   pathways. The PPI data were preprocessed, including removing redundancy
   and self-loops, resulting in a connected network of 9,618 nodes (unique
   Entrez IDs) and 39,240 documented interactions. PPI networks are
   visualized in Cytoscape[129]^43 with the nodes representing the
   proteins/genes and the edges representing interactions between any two
   proteins/genes.

   We constructed the PPI network by mapping the DEGs to the PPI network
   using the following steps. First, we extracted the nodes and
   relationships between DEGs and their direct interacting neighbors to
   confine the interactions only to those close to the DEGs using R
   software[130]^44, with each pair of interacting proteins in two lists
   of a text file. The DEGs (gene symbols) were listed in a NOA file with
   different node attribution annotations (down-regulated genes,
   up-regulated genes) and mapped to the constructed PPI network by the
   menu of “File-Import-Node Attributes”. Second, the degrees of nodes in
   the PPI networks were calculated by Network Analysis plugin by the menu
   of “Plugins- Network Analysis-Analyze network”. In our network, the
   degree of a node was the number of neighboring nodes in the network and
   node size was proportional to the degree of the protein. Third, CFinder
   software[131]^45 was used to find disease-related modules based on the
   Clique Percolation Method[132]^46, which is a free software for finding
   and visualizing overlapping dense groups of nodes in networks. PPI data
   from a text file was imported using the menu of “File- Open new
   network-run” with default parameters. The results of CFinder are highly
   correlated to the value of the parameter k. Larger k values correspond
   to smaller subgraphs with a higher density of links within them.

Ethics statement and validation study sample collection

   The collection of clinical samples was under approval by the Medical
   Ethics Committee of NanFang Hospital (Number: NFEC-2014-117) and
   informed consent was obtained from all subjects. The study was carried
   out in accordance with the standards set by the Declaration of Helsinki
   and Good Clinical Practice guidelines.

   Human carotid atherosclerotic plaques were obtained from patients who
   underwent endarterectomy at the Vascular Surgery Department of Nanfang
   hospital of Southern medical university (Guangzhou, China). Patients
   (n = 8, mean age: 67.3 years, range: 53–80 years) with internal carotid
   artery stenosis >70% were included. The carotid atheromas were
   separated as ATH and MIT according to macroscopic observation.
   Dissected vascular tissues were rapidly frozen in liquid nitrogen and
   stored at –80 °C. Sample characteristics used for each experiment are
   shown in [133]supplementary information Table S2.

Quantitative real-time PCR (qRT-PCR)

   Total RNA from specimens (n = 16) was isolated using Trizol reagent
   (TaKaRa Bio Inc, Japan) according to the manufacturer’s instructions.
   Complementary DNA was synthesized from 1000 ng of total RNA using the
   PrimeScript^TM RT reagent Kit with gDNA Eraser (TaKaRa Bio Inc, RR047Q)
   according to the manufacturer’s instruction, including the DNase step.
   Amplification was performed using SYBR^® Premix Ex Taq^TM (TaKaRa Bio
   Inc, RR420A). Quantitative real-time polymerase chain reaction (q-PCR)
   analysis was performed on Lightcycler96 (Roche Applied Science)
   according to the manufacturer’s protocol. GAPDH was used as internal
   control to normalize mRNA levels. All experiments were repeated three
   times. Primer sequences are listed in [134]Table 4. Analysis was
   performed by the comparative delta–delta–Ct method[135]^47.

Table 4. Primers for Real Time PCR.

   Gene Primer sequences (5′ → 3′)
   VAV1 Forward primer: CAACCTGCGTGAGGTCAACReverse primer:
   ACCTTGCCAAAATCCTGCACA
   SYK Forward primer: TGCACTATCGCATCGACAAAGReverse primer:
   CATTTCCCTGTGTGCCGATTT
   LYN Forward primer: GCTTTTGGCACCAGGAAATAGCReverse primer:
   TCATGTCGCTGATACAGGGAA
   PTPN6 Forward primer: TGAACTGCTCCGATCCCACTAReverse primer:
   CACGCACAAGAAACGTCCAG
   GAPDH Forward primer: GATGACATCAAGAAGGTGGTGAReverse primer:
   GTCTACATGGCAACTGTGAGGA
   [136]Open in a new tab

Western blotting

   Samples (n = 8) were taken from ultra-low temperature freezer and
   crushed in lysis buffer under liquid nitrogen. After detecting the
   concentration, proteins were separated on 7% SDS-polyacrylamide gel
   electrophoresis and transferred onto nitrocellulose membranes.
   Membranes were blocked with TBS-T(TBS/0.1% Tween-20) containing 5%
   non-fat dry milk for 1.5 hour at room temperature. Then, they were
   probed successively with mouse monoclonal anti-VAV1 (Cell Signaling
   Technology, Danvers, MA, USA), rabbit monoclonal anti-LYN (Abcam,
   Cambridge, MA, USA), rabbit monoclonal anti-PTPN6 (Abcam, Cambridge,
   MA, USA), and rabbit monoclonal anti-SYK (Abcam, Cambridge, MA, USA)
   antibodies at 4 °C overnight. Mouse monoclonal antibodies against GAPDH
   (Cell Signaling Technology, Danvers, MA, USA) were used as a loading
   control. Membranes were washed in TBS-T (TBS/0.1% Tween-20) three times
   for 5 min and probed with an anti-rabbit or -mouse HRP-conjugated
   secondary antibody in TBS-T with 5% of nonfat dry milk at room
   temperature for 1.5 h. Protein detection was performed using ECL
   reagents. Western blot bands were scanned using the ChemiDoc™ XRS
   Imaging System (BioRad, USA). Western blot bands were quantified using
   ImageJ software by measuring the band intensity for each group and
   normalized by GAPDH. The final results are expressed as fold changes by
   normalizing the data to the control values.

Statistical analyses

   Microarray analysis: DEGs for the microarray were identified using the
   fold-change and significance analysis of microarrays methods, with
   multiple testing corrections applied using the Benjamini-Hochberg
   method[137]^48. False-discovery rate <0.05 and fold-change >1.5 or
   <0.667 were set as the cutoff values of DEGs. For the functional
   enrichment analysis, significantly enriched GO terms in DEGs relative
   to the genomic background by GO function software packages were
   identified using the hypergeometric tests with an adjusted p-value
   <0.01, calculated by the Benjamini-Hochberg method[138]^48.
   Pathway-enrichment analysis was done using the R-based SubpathwayMiner
   software packages. Significantly enriched pathways were identified
   using hypergeometric tests and a p-value <0.01 was applied as the
   cut-off value for statistical significance.

   Validation study: The data in this study are shown as the mean ± S.D.
   For the real time PCR, groups were compared using the Wilcoxon
   signed-rank test for continuous variables (SPSS 19.0, Chicago, IL) and
   a 2-sided p value <0.05 was considered statistically significant.

Additional Information

   How to cite this article: Nai, W. et al. Identification of novel genes
   and pathways in carotid atheroma using integrated bioinformatic
   methods. Sci. Rep. 6, 18764; doi: 10.1038/srep18764 (2016).

Supplementary Material

   Supplementary Information
   [139]srep18764-s1.pdf^ (1.3MB, pdf)

Acknowledgments