Abstract

   The rupture of atherosclerotic plaques is essential for cardiovascular
   and cerebrovascular events. Identification of the key genes related to
   plaque rupture is an important approach to predict the status of plaque
   and to prevent the clinical events. In the present study, we downloaded
   two expression profiles related to the rupture of atherosclerotic
   plaques ([38]GSE41571 and [39]GSE120521) from GEO database. 11 samples
   in [40]GSE41571 were used to identify the differentially expressed
   genes (DEGs) and to construct the weighted gene correlation network
   analysis (WGCNA) by R software. The gene oncology (GO) and Kyoto
   Encyclopedia of Genes and Genomes (KEGG) enrichment tool in DAVID
   website, and the Protein-protein interactions in STRING website were
   used to predict the functions and mechanisms of genes. Furthermore, we
   mapped the hub genes extracted from WGCNA to DEGs, and constructed a
   sub-network using Cytoscape 3.7.2. The key genes were identified by the
   molecular complex detection (MCODE) in Cytoscape. Further validation
   was conducted using dataset [41]GSE120521 and human carotid
   endarterectomy (CEA) plaques. Results: In our study, 868 DEGs were
   identified in [42]GSE41571. Six modules with 236 hub genes were
   identified through WGCNA analysis. Among these six modules, blue and
   brown modules were of the highest correlations with ruptured plaques
   (with a correlation of 0.82 and −0.9 respectively). 72 hub genes were
   identified from blue and brown modules. These 72 genes were the most
   likely ones being related to cell adhesion, extracellular matrix
   organization, cell growth, cell migration, leukocyte migration,
   PI[3]K-Akt signaling, focal adhesion, and ECM-receptor interaction.
   Among the 72 hub genes, 45 were mapped to the DEGs (logFC > 1.0,
   p-value < 0.05). The sub-network of these 45 hub genes and MCODE
   analysis indicated 3 clusters (13 genes) as key genes. They were LOXL1,
   FBLN5, FMOD, ELN, EFEMP1 in cluster 1, RILP, HLA-DRA, HLA-DMB, HLA-DMA
   in cluster 2, and SFRP4, FZD6, DKK3 in cluster 3. Further expression
   detection indicated EFEMP1, BGN, ELN, FMOD, DKK3, FBLN5, FZD6, HLA-DRA,
   HLA-DMB, HLA-DMA, and RILP might have potential diagnostic value.

   Subject terms: Computational biology and bioinformatics, Genetics,
   Molecular biology, Biomarkers, Diseases

Introduction

   Atherosclerosis is a chronic inflammatory disease responsible for many
   clinical manifestations, such as coronary artery disease and ischemic
   stroke^[43]1. The deposition of lipid under a blood vessel wall leads
   to the formation of atherosclerotic plaques. The rupture of the plaque
   blocks the blood vessel and is the main reason for cardiovascular or
   cerebral vascular events^[44]2. Identification of the key genes related
   to plaque rupture is an important approach to predict the status of
   plaque and to prevent the clinical events.

   With the development of new technologies, microarray analysis has been
   a vital method for the research of genetic alteration in various
   diseases^[45]3. To deal with the big data from microarray profile,
   bioinformatics analysis has become a useful tool. For instance, linear
   models for microarray data (LIMMA) was used to identify the
   differential expressed genes (DEGs) from big data, Gene Oncology (GO)
   and Kyoto Encyclopedia of Genes and Genomes (KEGG) were used to predict
   the functions and mechanisms of genes^[46]4. Weighted gene correlation
   network analysis (WGCNA) is an advanced data mining method. It is used
   to describe the correlation patterns among genes, to find the clusters
   (modules) of highly correlated genes, to identify the hub genes in each
   module, and to associate the modules to external sample traits^[47]5.
   WGCNA provides another powerful method to group the genes into modules
   and coordinate the modules to biological functions and regulatory
   mechanisms^[48]6. WGCNA has been widely used in cancers, cardiovascular
   diseases and many other diseases^[49]7,[50]8.

   Recently, some microarray studies have been conducted to compare the
   gene expression profiles between carotid atheroma and the
   microscopically intact tissue ([51]GSE43292); human normal arterial
   intimae and advanced atherosclerotic plaques ([52]GSE97210); and
   ruptured and stable atherosclerotic human plaques ([53]GSE41571,
   [54]GSE120521)^[55]9,[56]10. WGCNA was used to identify the key lncRNAs
   and mRNAs associated with atherosclerosis or coronary artery disease
   progression^[57]8,[58]11. The bioinformatics method LIMMA was used to
   identify key genes in [59]GSE41571. COL3A1, COL1A2, ASPN, PPBP were
   thought to play critical roles in atherosclerotic plaque
   rupture^[60]10. However, they only identified individual DEGs in
   microarray profiles. No WGCNA analysis was conducted on stable and
   ruptured atherosclerotic plaques.

   In the present study, we downloaded the datasets [61]GSE41571 and
   [62]GSE120521 from GEO database
   ([63]https://www.ncbi.nlm.nih.gov/geo/), identified key genes by using
   bioinformatics tools, such as LIMMA, WGCNA, GO, KEGG, and Cytoscape.
   The key genes were further validated in human carotid endarterectomy
   (CEA) plaques.

Results

Datasets and workflow

   The datasets [64]GSE41571 and [65]GSE120521 were downloaded from the
   GEO database ([66]http://www.ncbi.nlm.nih.gov/geo/). [67]GSE41571
   contains 5 ruptured and 6 stable atherosclerotic plaques, while
   [68]GSE120521 contains 4 ruptured and 4 stable plaques.

   The workflow of the whole study was shown in Fig. [69]1. DEGs in
   [70]GSE41571 were firstly screened. The WGCNA was conducted on the
   ruptured and stable plaques in [71]GSE41571. The functional modules and
   the hub genes were identified. Then the hub genes were mapped to the
   DEGs and formed a sub-network by Cytoscape. Key genes were identified
   based on this network. The key genes were validated using [72]GSE120521
   and human CEA plaques.

Figure 1.

   [73]Figure 1
   [74]Open in a new tab

   Workflow of the whole study.

Identification of the DEGs

   The matrix file from [75]GSE41571 was pre-processed and annotated with
   the official gene symbol. The DEGs were identified using “LIMMA”
   package in the R software^[76]12. The genes with fold change over 2.0
   (logFC > 1.0) and p-value < 0.05 were considered DEGs. Eventually, 868
   genes were identified (Fig. [77]2A). Compared to the stable plaques,
   342 genes were upregulated and 526 genes were downregulated in the
   ruptured plaques. The heatmap of the top 40 upregulated and 40
   downregulated DEGs were shown in Fig. [78]2B, and Table [79]S1.

Figure 2.

   [80]Figure 2
   [81]Open in a new tab

   Identification of DEGs in [82]GSE41571. A: volcano plot of
   [83]GSE41571; B: Heatmap of the top 40 upregulated and top 40
   downregulated DEGs.

Construction of gene co-expression networks by WGCNA

   The gene expression profile [84]GSE41571 (11 samples, 8413 genes) was
   downloaded. The top 50% variant genes (4207 genes) in [85]GSE41571 were
   screened out by LIMMA and were used for the construction of gene
   co-expression network by WGNCA in the R software.

   The plaque traits of each sample were shown in Fig. [86]3A. To ensure a
   scale-free network for the construction of the co-expression network, a
   power of β = 9 was selected as a soft-thresholding parameter
   (Fig. [87]3B). The hierarchical clustering based on the topological
   overlap matrix (TOM) dissimilarity measure revealed six modules,
   namely, the blue, brown, green, red, turquoise, and yellow modules
   (Fig. [88]3C). Each module contained a group of coordinately expressed
   genes with high TOM value and was potentially related to the same
   clinical traits. Genes with the absolute values of Pearson’s
   correlation over 0.9 were considered hub genes. The number of genes
   involved in each module was 319 for the blue module (48 hub genes), 124
   for the brown module (24 hub genes), 51 for the green module (15 hub
   genes), 22 for the red module (14 hub genes), 391 for the turquoise
   module (103 hub genes) and 88 for the yellow module (32 hub genes)
   (Table [89]S2). Among these modules, blue and brown modules were found
   to have the highest association with atherosclerotic plaque rupture
   (correlations of 0.82, p-value of 0.002 for the blue module, and
   correlations of −0.9, p-value of 0.0001 for the brown module)
   (Fig. [90]3 D). Therefore, the hub genes in blue and brown modules were
   used for further analysis.

Figure 3.

   [91]Figure 3
   [92]Open in a new tab

   WGCNA identification of plaque rupture related hub genes. (A)
   Clustering dendrogram of 11 samples; (B) Determination of
   soft-threshoding power in the WGCNA; (C) Dendrogram of all
   differentially expressed genes clustered based on a dissimilarity
   measure (1-TOM); (D) Association between each individual module and the
   rupture of atherosclerotic plaque;.

Gene Oncology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG)
enrichment of hub genes in blue and brown modules

   To analyze the functions and mechaisms of the identified hub genes in
   blue and brown modules, GO and KEGG enrichment in DAVID (website:
   [93]https://david.ncifcrf.gov/) were used. The q-value < 0.05 was
   considered to be statistically significant for the correlations. The
   top 10 GO enrichment terms were visualized in the bubble charts by the
   Omicshare online tool ([94]http://www.omicshare.com/tools).

   As shown in Fig. [95]4, these hub genes were mostly enriched in
   Biological Process (BP) of cell adhesion and extracellular matrix
   organization (Fig. [96]4A); Cellular Components (CC) of extracellular
   exosome, plasma membrane (Fig. [97]4B); and Molecular Functions (MF) of
   protein binding (Fig. [98]4D). KEGG enrichment indicated the genes most
   likely affect the signal pathways of PI3K-Akt signaling and focal
   adhesion (Fig. [99]4C).

Figure 4.

   [100]Figure 4
   [101]Open in a new tab

   GO and KEGG enrichment of genes in blue and brown modules. (A)
   Biological Process of GO enrichment; (B) Cellular components of GO
   enrichment; (C) KEGG pathway enrichment; (D) Molecular function of GO
   enrichment; RichFactor = “count”/ “pop hits”. The “counts” is the
   number of hub genes enriched in a certain term. The “pop hits” is the
   number of all genes enriched in a certain term.

Protein-protein interaction (PPI) network of the hub genes in the blue and
brown modules

   The PPI was analyzed by the STRING online tool (website:
   [102]https://string-db.org/). 72 hub genes in the blue and brown
   modules were analyzed. As a result, the 72 hub genes formed a network
   with 82 nodes and 97 edges (Fig. [103]5). BGN, CTSA, FBLN5, and HLA-DRA
   were at the core of the network. These genes are mainly involved in the
   process of collagen formation and extracellular matrix degradation, and
   the functions of the immune system. All of the above pathological
   processes play critical roles in atherosclerosis.

Figure 5.

   [104]Figure 5
   [105]Open in a new tab

   PPI network for the 72 hub genes in blue and brown modules. The circles
   represent the proteins encoded by the corresponding genes; lines
   represent the interactions between the proteins.

Identification of key genes

   To figure out the key genes related to the ruptured plaques, we mapped
   the 72 hub genes to the DEGs. 45 hub genes with logFC > 1.0 and p-value
   < 0.05 were screened out. Among them, 12 were upregulated and 33 were
   downregulated (Table [106]S3). These 45 hub genes were subsequently
   used to construct the sub-network by Cytoscape 3.7.2 software. The
   plugin molecular complex detection (MCODE) tool was used to calculate
   the most significant clusters in the sub-network. As a result shown in
   Fig. [107]6 and Table [108]1, 3 clusters with 13 key genes were figured
   out. They were LOXL1, FBLN5, FMOD, ELN, EFEMP1 in cluster1, RILP,
   HLA-DRA, HLA-DMB, HLA-DMA in cluster 2, and SFRP4, FZD6, DKK3 in
   cluster 3. These genes were considered to be key genes.

Figure 6.

   [109]Figure 6
   [110]Open in a new tab

   The 45 hub genes sub-network constructed in Cytoscape 3.7.2 software.
   Yellow boxes were key genes in cluster 1; green boxes were key genes in
   cluster 2; red boxes were key genes in cluster 3; blue boxes were genes
   not clustered.

Table 1.

   The list of key genes.
   Gene symbol MCODE score Official full name
   Cluster 1 FMOD 3 Fibromodulin
   EFEMP1 3 EGF containing fibulin extracellular matrix protein 1
   LOXL1 2.7 Lysyl oxidase like 1
   BGN 2.7 Biglycan
   FBLN5 2.4 Fibulin 5
   ELN 2.4 Elastin
   Cluster 2 RILP 3.0 Rab interacting lysosomal protein
   HLA-DRA 3.0 Major histocompatibility complex, class II, DR alpha
   HLA-DMB 3.0 Major histocompatibility complex, class II, DM beta
   HLA-DMA 3.0 Major histocompatibility complex, class II, DM alpha
   Cluster 3 SFRP4 2.0 Secreted frizzled related protein 4
   FZD6 2.0 Frizzled class receptor 6
   DKK3 2.0 Dickkopf WNT signaling pathway inhibitor 3
   [111]Open in a new tab

Validation of key genes in [112]GSE120521 and human CEA plaques

   To further validate the key genes, we detected the expression of the 13
   key genes in the microarray profile [113]GSE120521, and in the human
   CEA plaques (10 stable and 10 ruptured). In [114]GSE120521, EFEMP1,
   BGN, ELN, FMOD, DKK3, FBLN5, and FZD6 were decreased in ruptured
   plaques, while HLA-DRA, HLA-DMB, HLA-DMA, and RILP were increased.
   (Fig. [115]7A). Similar results were found in human CEA plaques
   (Fig. [116]7B).

Figure 7.

   [117]Figure 7
   [118]Open in a new tab

   The expression of 13 key genes in [119]GSE120521 and human CEA plaques.
   (A) the expression of 13 key genes in [120]GSE120521. Black round:
   stable plaques; black square: ruptured plaques; n = 4 for both stable
   and ruptured plaques. (B) the expression of 13 key genes in human CEA
   plaques. n = 10 for both stable and ruptured plaques. *P < 0.05,
   **P < 0.01 vs Stable group.

Discussion

   In the present study, six modules were detected by WGCNA using the
   dataset [121]GSE41571. Among the six modules, blue and brown modules
   were associated with the rupture of atherosclerotic plaques
   significantly. 72 hub genes were screened out from the blue and brown
   modules. After mapping the 72 hub genes to the DEGs, 45 genes were
   found and used to construct a sub-network. Eventually, 13 key genes
   were identified. The expression levels of EFEMP1, BGN, ELN, FMOD, DKK3,
   FBLN5, FZD6, HLA-DRA, HLA-DMB, HLA-DMA, and RILP were significantly
   changed in ruptured plaques. These genes may play critical roles in the
   stability of atherosclerotic plaques.

   The stability of atherosclerotic plaques is now considered to be
   essential in the clinical events of atherosclerosis. Studies have shown
   the abundant presence of inflammatory cells, such as monocyte-derived
   macrophages and T-lymphocytes at the site of ruptured plaques. The
   mediators secreted by the inflammatory cells lead to the activation and
   proliferation of smooth muscle cells, lesion progression and matrix
   degradation of plaque fibrous cap^[122]13. Therefore, the active
   inflammation and matrix degradation might be the two most crucial
   processes that contribute to the rupture of plaques. However, the
   molecular mechanisms are still unclear.

   Previous studies compared the gene expression profile between unstable
   plaques and normal tissues^[123]14. They screened out the DEGs from
   [124]GSE41571, [125]GSE118481, and E-MTAB-2055, and analyzed the
   functions and mechanisms of these DEGs by GO and KEGG enrichment. The
   genes COL1A2, ADCY3, CXCR4, TYROBP, and IGFBP6 were thought to play
   roles in atherosclerotic plaques^[126]14. Another research analyzed the
   DEGs in [127]GSE41571^[128]10. COL3A1, COL1A2, ASPN, PPBP were thought
   to probably play important roles in the rupture of atherosclerotic
   plaques^[129]10. Our research used the WGCNA to construct the
   co-expression network. The modules with high association with the
   rupture of plaques were identified first. And then we mapped the hub
   genes to DEGs. The key genes screened out by this method were based on
   not only the expression level, but also the ruptured traits of the
   plaques.

   In the present study, we identified six modules through WGCNA analysis.
   Among these six modules, blue and brown modules were with the highest
   correlations with plaque traits (with a correlation of 0.82 and −0.9
   respectively); 72 hub genes were identified in these two modules.
   Further GO enrichment of the 72 hub genes from blue and brown modules
   indicates these genes be mostly related to cell adhesion, extracellular
   matrix organization, cell growth, cell migration, and leukocyte
   migration. These five processes are related to the key pathogenic steps
   of atherosclerosis progression and plaque rupture. In KEGG enrichment,
   terms of PI[3]K-Akt signaling, focal adhesion, ECM-receptor interaction
   were enriched. Focal adhesion, also named cell-matrix adhesion, is a
   structure which assembles the extracellular matrix (ECM) and
   interacting cells, through which regulatory signals are
   transmitted^[130]15. The interaction between ECM and cells is also
   mediated by some transmembrane receptors, such as integrins and CD36.
   The cell adhesion, migration, differentiation, proliferation and
   apoptosis, and the component of ECM are controlled by the interactions
   between ECM and cells. PI[3]K-Akt signaling is an important signaling
   pathway, which is included in variable processes such as protein
   synthesis, cell survival, apoptosis, proliferation, metabolism, etc.
   Taking all these together, the hub genes may affect the signal
   transduction and subsequent cell adhesion, migration, proliferation,
   and ECM degradation, which eventually lead to the progress and rupture
   of plaques.

   Further analysis discovered 13 key genes. Among them, EFEMP1, BGN, ELN,
   FMOD, DKK3, FBLN5, and FZD6 were decreased in ruptured plaques, while
   HLA-DRA, HLA-DMB, HLA-DMA, and RILP were downregulated.

   HLA-DRA, HLA-DMB, and HLA-DMA belong to the major histocompatibility
   complex, class II. They participate in the adaptive immune response and
   T cell receptor signaling pathway. They are binding peptides derived
   from antigens that access the endocytic route of antigen presenting
   cells (APC) and presents them on the cell surface for recognition by
   the CD4 T-cells^[131]16. The site of intimal rupture or erosion of
   thrombosed coronary atherosclerotic plaques were always characterized
   by abundant expression of HLA antigens on both inflammatory cells and
   adjacent smooth muscle cells, suggesting an active inflammatory
   reaction^[132]17. Our present research identified HLA-DRA, HLA-DMB, and
   HLA-DMA are key genes for the rupture of the plaques. The mechanisms
   may relate to the inflammation in atherosclerotic plaques. RILP encodes
   Rab interacting lysosomal protein. It is involved in the regulation of
   lysosomal morphology and distribution^[133]18. It promotes the
   centripetal migration of phagosomes and the fusion of phagosomes with
   the late endosomes and lysosomes^[134]18. However the mechanisms of
   RILP on atherosclerosis are still unknown. Based on the results of our
   present study, RILP might be a new target for the pathogenesis of
   atherosclerosis. BGN encodes biglycan. It is involved in collagen fiber
   assembly^[135]19. FMOD encodes fibromodulin. It affects the rate of
   fibrils formation and has a primary role in collagen
   fibrillogenesis^[136]20. FBLN5 encodes fibulin 5, while ELN encodes
   elastin. Elastin is a major structural protein of aorta^[137]21. It
   controls the stabilization of arterial structure by regulating
   proliferation and organization of vascular smooth muscle^[138]22. FBLN5
   is essential for elastic fiber formation. It is involved in the
   assembly of continuous elastin polymer and promotes the interaction of
   microfibrils and ELN^[139]21. These genes might contribute to the
   pathogenesis of atherosclerosis via affecting the collagen and the
   fibrous cap stability. FZD6 encodes the protein frizzled-6, a 7
   transmembrane protein receptor for Wnt4^[140]23,[141]24. FZD6 mediates
   the functions of Wnt signaling by both canonical and non-canonical
   pathways, but mainly through non-canonical pathway. The activation of
   Wnt4/FZD6 involves in the different diseases such as cancer, embryonic
   development, bone marrow mesenchymal stem cell
   dysfunction^[142]23,[143]25,[144]26. EFEMP1 encodes EGF containing
   fibulin extracellular matrix protein 1. EFEMPL binds to EGFR and play
   roles in cell adhesion and migration^[145]27. However, the roles of
   FZD6 and EFEMP1 in atherosclerosis are still unclear. Our present study
   indicates FZD6 might be a novel mechanism of atherogenesis. DKK3
   encodes protein dickkopf homolog 3, a secreted glycoprotein belongs to
   the dickkopf family. In the cardiovascular system, DKK3 is involved in
   the regulation of cardiac remodeling and vascular smooth muscle
   differentiation^[146]28,[147]29. In atherosclerotic mice, DKK3
   deficiency caused the acceleration of atherosclerosis and led to the
   vulnerable atherosclerotic plaques due to the reduction of the number
   of SMCs and matrix protein deposition^[148]23,[149]30. Further
   mechanism study on DKK3 is needed.

   Our study still has some limitations. According to FAQ’s of WGCNA
   package, no less than 15 samples are recommended. However, in our case,
   only 11 samples were analyzed. Therefore, we need to be aware of
   several problems with small sample size, such as poorer fit to
   scale-free topology, and the resulted network may be too noisy to be
   biologically meaningful. And another limitation is the lack of clinical
   trait data in the involved GEO datasets. These may affect the precision
   of present results on certain subgroup of patients.

   Overall, in our study, we used the WGCNA to identify the key genes for
   the ruptured traits of atherosclerotic plaques. 13 key genes were
   screened out. These genes may play crucial roles in inflammation,
   fibrous cap formation and degradation, and lead to the rupture of
   atherosclerotic plaques.

Materials and Methods

Data collection and preprocessing

   The gene expression profiles of [150]GSE41571 and [151]GSE120521 were
   downloaded from the Gene Expression Omnibus (GEO,
   [152]https://www.ncbi.nlm.nih.gov/geo/). The [153]GSE41571 was a
   genome-wide expression profile of macrophage-rich regions of 5 ruptured
   and 6 stable human atherosclerotic plaques. They were obtained from
   carotid endarterectomy operation. The clinical, radiological and
   histopathological criteria were used to evaluate the plaques as stale
   or ruptured. Laser micro-dissection was used to obtain the
   macrophage-rich regions. After amplification of total RNA of these
   samples, the transcriptional profiling was performed using [154]GPL570
   platform. The [155]GSE120521 was RNA-seq profile of 4 stable and 4
   ruptured human atherosclerotic plaques. The platform of [156]GPL16791
   Illumina HiSeq. 2500 was used.

   The preprocessing of the two series matrix profiles was conducted on R
   statistical software version 3.4.2 with related packages from
   Bioconductor ([157]www.bioconductor.org). After background correction,
   the original data were quantile normalized using Robust Multi-array
   Average (RMA) algorithm in the affy package in Bioconductor.

DEGs identification

   The two series matrix files were annotated with official gene symbols
   using the platform file and annotation package in the R software. The
   “LIMMA” package in the R was used to identify the DEGs. Genes with a
   logFC > 1.0 and p-value < 0.05 were considered to be DEGs. The heatmap
   of these DEGs were conducted using software “MeV”. “ggplot2” package in
   R software was used to produce the volcano plot. All the packages used
   in R software were downloaded from Bioconductor
   ([158]http://www.bioconductor.org).

Co-expressed module identification by WGCNA

   The WGCNA method was used to construct the co-expression network and to
   identify the co-expressed modules. The WGCNA was conducted in the R
   software using WGCNA package. The [159]GSE41571 dataset includes 8413
   genes. Among them, the top 50% most variantly expressed genes (4207
   genes) were used for further WGNCA analysis. The WGCNA was conducted as
   previously reported^[160]31. Briefly, the similarity matrix S[ij] was
   constructed by calculating the Pearson’s correlation coefficient
   between each pair of genes i and j. The adjacency matrix A[ij] was
   obtained by calculating the connection strength between gene i and j.
   The formulas were as follows:
   [MATH:
   <msup><mrow><mo>(</mo><mrow><msub><mrow><mi>S</mi></mrow><mrow><mi>i</m
   i><mi>j</mi></mrow></msub></mrow><mo>)</mo></mrow><mrow><mi>s</mi><mi>i
   </mi><mi>g</mi><mi>n</mi><mi>e</mi><mi>d</mi></mrow></msup><mo>=</mo><m
   row><mo>|</mo><mrow><mrow><mo>(</mo><mrow><mn>1</mn><mo>+</mo><mi
   mathvariant="normal">cor</mi><mrow><mo>(</mo><mrow><msub><mrow><mi>X</m
   i></mrow><mrow><mi>i</mi></mrow></msub><mo>,</mo><msub><mrow><mi>Y</mi>
   </mrow><mrow><mi>j</mi></mrow></msub></mrow><mo>)</mo></mrow></mrow><mo
   >)</mo></mrow><mo>/</mo><mn>2</mn></mrow><mo>|</mo></mrow> :MATH]
   [MATH:
   <msub><mrow><mi>A</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msub><m
   o>=</mo><mi>p</mi><mi>o</mi><mi>w</mi><mi>e</mi><mi>r</mi><mrow><mo>(</
   mo><mrow><msub><mrow><mi>S</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow>
   </msub><mo>,</mo><mi>β</mi></mrow><mo>)</mo></mrow> :MATH]

   Where Xi and Yj were vectors of expression values and Sij was the
   Pearson’s correlation coefficient of gene i and j; A[ij] represented
   the network strength between gene i and j. A power of β = 9 was
   selected in the present study as the soft-thresholding parameter to
   ensure a scale-free network.

   To identify the co-expressed gene modules, the adjacency metrix was
   changed to topological matrix by the method of TOM. The formula for TOM
   was as follows:
   [MATH:
   <mi>T</mi><mi>O</mi><msub><mrow><mi>M</mi></mrow><mrow><mi>i</mi><mi>j<
   /mi></mrow></msub><mo>=</mo><mfrac><mrow><msubsup><mo>∑</mo><mrow><mi>K
   </mi><mo>=</mo><mn>1</mn></mrow><mi>N</mi></msubsup><mrow><msub><mrow><
   mi>A</mi></mrow><mrow><mi>i</mi><mi>k</mi></mrow></msub><mo>⋅</mo><msub
   ><mrow><mi>A</mi></mrow><mrow><mi>k</mi><mi>j</mi></mrow></msub><mo>+</
   mo><msub><mrow><mi>A</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msub
   ></mrow></mrow><mrow><mi>min</mi><mo>(</mo><msub><mrow><mi>K</mi></mrow
   ><mrow><mi>i</mi></mrow></msub><mo>,</mo><msub><mrow><mi>K</mi></mrow><
   mrow><mi>j</mi></mrow></msub><mo>)</mo><mo>+</mo><mn>1</mn><mo>−</mo><m
   sub><mrow><mi>A</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msub></mr
   ow></mfrac> :MATH]

   With the minimum gene group size set as 20 for the gene dendrogram, the
   TOM-based dissimilarity measure was adopted to perform the average
   linkage hierarchical clustering. Using the Dynamic Tree Cut algorithm,
   genes were categorized into the same gene modules if they had similar
   expression profiles.

Relations between identified modules and plaque straits

   To identify the gene modules which were highly related to the ruptured
   traits of atherosclerotic plaques, module eigengenes (MEs) were firstly
   identified. MEs were principal components of a gene module. The
   expression of MEs represented all genes in the given module. The
   clinically significant modules were identified by calculating the
   correlation between MEs and ruptured traits of plaques. The gene
   significance (GS) and module significance (MS) were also calculated to
   identify the relations between gene expression and ruptured traits of
   plaques. Once the linear regression model of ruptured traits of plaques
   vs. gene expression was obtained, the GS was then defined as the
   mediated P-value of each gene (GS = lgP) in the model. The MS was
   defined as the average GS of all the genes involved in the module. MS
   was measured to incorporate clinical information into the coexpression
   network.

Hub genes identification

   There is a tendency that, in a given module, the hub genes have strong
   correlations with certain clinical traits. To identify the hub genes of
   a module, the correlation between ruptured traits of plaques and gene
   expression were measured by the absolute value of Pearson’s correlation
   (cor.gene TraitSignificance>0.2). The inter-module connectivity of
   genes was also measured by the absolute value of the Pearson’s
   correlation. Genes with cor.geneModuleMembership>0.9 were considered to
   be hub genes.

Gene ontology and pathway enrichment analysis

   The DAVID Functional Annotation Bioinformatics Microarray Analysis
   (website:[161]https://david.ncifcrf.gov/) was used for the GO
   enrichment and KEGG pathway analysis. The GO enriched the genes into 3
   terms: Molecular Function (MF), Biological Process (BP), and Cellular
   Component (CC). A q-value<0.05 was considered to be statistically
   significant for the correlations. Through GO and KEGG enrichment, the
   possible functions or biological processes of the genes in a given
   module were predicted. The Bubble Charts were conducted using the
   OmicShare tools, a free online platform for data analysis
   ([162]http://www.omicshare.com/tools). The RichFactor was calculated by
   “counts”/“pop hits”. The “counts” is the number of hub genes enriched
   in a certain term. The “pop hits” is the number of all genes enriched
   in a certain term.

Protein-protein interaction network integration

   The STRING online tool (website: [163]https://string-db.org/) was used
   for the identification of protein-protein interactions. The STRING
   database covers the number of 9 643 763 proteins from 2 031 organisms.
   It provides direct (physical) interactions and indirect (functional)
   associations; they stem from computational prediction, from knowledge
   transfer between organisms, and from interactions aggregated from other
   (primary) databases. The main databases used for STRING are the Genomic
   Context Predictions, the High-throughput Lab Experiments, the
   (Conserved) Co-Expression, the Automated Textming, and the Previous
   Knowledge in Databases.

Key gene identification and validation

   To figure out the key genes, we firstly mapped the hub genes to the
   DEGs and obtained the logFC and p-value of each hub gene. Then we
   screened out the hub genes with logFC>1.0 and p-value <0.05. These
   genes were used to construct the sub-network using Cytoscape 3.7.2
   software. The plugin “MCODE” tool in the “Apps” of the Cytoscarpe was
   used to calculate the most significant clusters of genes. The genes in
   the clusters were considered to be key genes.

   To validate the key genes, the expression profile of dataset
   [164]GSE120521 was downloaded from GEO, and the key genes expression
   were analyzed.

   For further validation, human CEA plaques were used. A total of 10
   patients with stable plaques and 10 patients with ruptured plaques were
   involved. All patients were performed CEA from the First Hospital of
   Jilin University (Changchun, Jilin, China) from July, 2019 to November,
   2019. The procedures were approved by the Ethics Committee of the First
   Hospital of Jilin University (No.2019-272, Changchun, Jilin). Written
   informed consent was obtained from every participant.

   The surgical specimens were obtained and fast-frozened in liquid
   nitrogen, then they were stored at −80 °C until use. The evaluation of
   stable and unstable plaques was conducted according to the
   classification defined by the American Heart Association (AHA) and
   previous report^[165]32. The type I/II plaques are with near-normal
   wall thickness but no calcification; the type III plaques are small
   eccentric plaque, or plaque with diffuse intimal thickening, but no
   calcification; the type IV/V plaques has lipid or necrotic core, which
   is surrounded by fibrous tissue, and with possible calcification; the
   type VI are complex plaque with possible surface defect, haemorrhage or
   thrombus; the type VII are plaques with obvious calcification; the type
   VIII are fibrotic plaque with possible small calcification, but without
   lipid core^[166]32. In the present study, the type I-II, III, VII, VIII
   were considered stable plaques, while type IV-V, VI to be ruptured
   plaques. The classification of carotid artery was performed by two
   independent investigators.

   The expressions of key genes in these plaques were analyzed using qPCR
   as previously reported by our lab^[167]33. Briefly, the total RNA was
   extracted by Trizol. After reverse transcripted to cDNA, the qPCR
   reaction was conducted using the SYBR Green Premix DimerEraser Kit
   (TaKaRa Bio Inc., Dalian, China) on ABI QuantaStudio5 system. The qPCR
   program was: 95 °C for 30 s, followed by 40 cycles of 95 °C 5 s, 60 °C
   30 s. The primers used were listed in Table [168]S3. All samples were
   run in triplicate. The results were calculated by the 2^(−ΔΔCq)
   Method^[169]34.

Acknowledgements