Abstract

Intracranial aneurysms (IAs) are characterized by localized dilation or ballooning of a cerebral artery. When an IA ruptures, blood leaks into the space around the brain and causes a subarachnoid hemorrhage, which is associated with a high risk of disability and mortality. The aims of this study were to gain greater insight into the pathogenesis of ruptured IAs, and to clarify whether identified hub genes represent potential biological markers for assessing the likelihood of IA progression and rupture. Briefly, the GSE36791 and GSE73378 datasets from the National Center for Biotechnology Information Gene Expression Omnibus database were reanalyzed and subjected to a weighted gene co-expression network analysis to test the association between gene sets and clinical features. The clinical significance of these genes as potential biomarkers was also examined, with their expression validated by quantitative real-time PCR. A total of 14 co-expression modules and 238 hub genes were identified. In particular, three modules (labeled turquoise, blue, and brown) correlated strongly with IA rupture events. Additionally, six potential biomarkers were identified (BASP1, CEBPB, ECHDC2, GZMK, KLHL3, and SLC2A3), which are strongly associated with the progression and rupture of IAs. Taken together, these findings provide novel insights into potential molecular mechanisms responsible for IAs, and they highlight the potential for these particular genes to serve as biomarkers for monitoring IA rupture.

1. Introduction

Intracranial aneurysms (IAs) are a cerebrovascular disorder that affects between 3% and 5% of individuals. There is a potential risk that an IA will rupture, and this risk is higher in the posterior circulation [1–3]. With advances in intracranial imaging technologies, IAs have been detected more frequently. Consequently, clinicians are increasingly confronted with a dilemma regarding the choice of clinical management.
Namely, they must decide whether to apply preventive treatments (e.g., endovascular or surgical aneurysm repair), which are associated with inherent complication risks, or conservative management with or without follow-up imaging, which leaves patients with a small, yet definite, risk of aneurysm rupture [3, 4].

When an IA ruptures, a subarachnoid hemorrhage (SAH) develops. This life-threatening clinical condition has an acute mortality rate of approximately 50% [5, 6]. Despite considerable advances in therapy for IAs, SAH remains a highly lethal condition that is associated with a high socioeconomic burden [7–9]. Thus, an ability to identify IAs that have a high risk of rupture and to provide timely preventive treatment may be key to the successful management of IAs.

To predict IA rupture, researchers have studied aneurysmal hemodynamics [10, 11], aneurysmal morphology and location [8], genetics [12], and other factors (e.g., cigarette smoking, hypertension, and positive family history for SAH) [13]. Among morphological parameters, only inflow angle was identified as a significant predictor of rupture [8]. However, aneurysm wall inflammation has also been shown to play a pivotal role in aneurysm growth and rupture [2, 3, 14]. Two scoring systems have been established to evaluate the risk of rupture and to guide treatment: the PHASES (population, hypertension, age, size of aneurysm, earlier SAH from another aneurysm, and site of aneurysm) system and the UIATS (unruptured IA treatment score) system [8, 15, 16]. However, standardized management of unruptured IAs remains controversial, as the risks of prophylactic treatment must be weighed against the possible risk of rupture for individual aneurysms [17].

Despite the vast efforts made to date to prevent the rupture of IAs, the mechanisms mediating the pathology of IAs remain largely unknown.
In particular, suitable biomarkers to predict IA rupture remain unavailable. Therefore, the aims of the present study were to gain greater insight into the pathogenesis of ruptured IAs and to clarify whether identified hub genes can be used as potential biological markers to assess the likelihood of IA progression and rupture (Fig 1). For these aims, the GSE36791 [18] and GSE73378 [19] datasets from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) were reanalyzed. A weighted gene co-expression network analysis (WGCNA) [20] was performed to test possible correlations between gene sets and clinical features of IAs. Clinically significant genes were identified and their expression levels were validated in patients with ruptured versus unruptured IAs by quantitative real-time (qRT)-PCR.

Fig 1. A flow chart illustrating the method used to identify six biomarkers associated with IA rupture. The entries shown in the green box were completed in our previous study, while the entries shown in the red box are those that we intend to focus on in future studies.

2. Materials and methods

2.1 Microarray dataset

Gene expression profile data from the GSE36791 dataset were used as a training set to construct co-expression networks and to identify hub genes. This dataset was generated from peripheral blood samples collected from 43 patients with SAH due to ruptured IAs, and from 18 individuals with headaches (the reference group) [18]. To verify the results obtained, the GSE73378 dataset was used as a test set. The GSE73378 dataset was generated from peripheral blood samples collected from 103 patients who had experienced an aneurysmal SAH at least two years earlier, and from 107 individuals used as a reference group [19].

2.2 Data processing and WGCNA

The GSE36791 expression data had already been normalized (quantile normalization) with the BeadArray package [21]. Probe sets were then mapped to gene symbols according to the GPL10558 platform. After probes without a corresponding gene symbol were filtered out, expression values were averaged for gene symbols mapped by multiple probes [22]. Outlier samples were identified with hierarchical cluster analysis (i.e., distance matrices constructed from Pearson's correlation matrices, followed by agglomerative clustering with average linkage) by using the hclust function in WGCNA [20]. Based on these results, three samples (GSM901111, GSM901112, and GSM901161) in GSE36791 were removed from subsequent analyses.

In the present study, the co-expression analysis was based on WGCNA, which is a systems biology method for describing correlation patterns among genes across microarray samples [20]. The method is widely used in biomedical research and helps identify clusters (modules) of highly correlated genes across samples. To identify ruptured IA-associated co-expression modules and their key constituents, the preprocessed GSE36791 data were analyzed. Briefly, Pearson's correlation matrices were generated (average linkage method) for all gene pairs. Next, co-expression similarities were transformed into a weighted adjacency matrix of connection strengths by using a power adjacency function (cutoff height of 0.85). This adjacency matrix was then transformed into a topological overlap matrix to measure relative gene interconnectedness and proximity. Finally, gene co-expression modules were identified by hierarchical clustering (average linkage) according to topological overlap.
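
As a rough illustration of the network construction steps described above (correlation, power adjacency, topological overlap), the following minimal Python sketch may be helpful. It is not the analysis code used in this study, which relied on the WGCNA package. The soft-thresholding power of 6, the unsigned adjacency, and the toy expression profiles are assumptions made only for illustration; none of them is taken from the datasets analyzed here. The sketch stops at the topological overlap matrix; module detection would then apply average-linkage hierarchical clustering to 1 − TOM.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def power_adjacency(profiles, power=6):
    """Unsigned adjacency a_ij = |cor(i, j)| ** power, with a zero diagonal.

    The default power is an assumed value, not one reported in the text.
    """
    n = len(profiles)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            value = abs(pearson(profiles[i], profiles[j])) ** power
            adj[i][j] = adj[j][i] = value
    return adj


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    connectivity = [sum(row) for row in adj]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Terms with u == i or u == j vanish because the diagonal is zero.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n))
            denom = min(connectivity[i], connectivity[j]) + 1.0 - adj[i][j]
            tom[i][j] = tom[j][i] = (shared + adj[i][j]) / denom
    return tom


# Toy data: one made-up expression profile per gene across five samples.
profiles = [
    [1.0, 2.0, 3.0, 4.0, 5.0],  # hypothetical gene A
    [1.1, 2.1, 2.9, 4.2, 4.8],  # hypothetical gene B, tracks gene A
    [5.0, 1.0, 4.0, 2.0, 3.0],  # hypothetical gene C, unrelated pattern
]
tom = topological_overlap(power_adjacency(profiles))
```

In this toy example, genes A and B end up with a topological overlap close to 1, whereas gene C has an overlap close to 0 with both, which is the property that the subsequent clustering step exploits to group genes into modules.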

2.3 Identification of modules with clinical significance

In order to identify modules related to the clinical traits of IA (group [ruptured IA/reference] and gender), two different approaches were used [20, 23]. The first approach was to determine gene significance (GS) from a linear regression analysis of gene expression data and clinical traits [20]. GS was defined based on a log10 transformation of P-values. Thus, a higher absolute value of GS_i corresponds to greater biological significance of gene i [20]. In addition, module significance (MS) is defined as the average GS value for all of the genes in a module, which implies that the higher the absolute value of MS_j, the more biologically significant module j is. The second approach was to define module eigengenes (MEs) as the first principal component in the principal component analysis for each module. Thus, MEs can be considered representative of the gene expression profiles in a module [20]. Correlations between MEs and clinical traits were calculated to identify relevant modules. Modules with a P-value less than 0.005 were considered to significantly correlate with certain clinical traits (ruptured IA).

2.4 Identification of hub genes and functional enrichment analysis

Hub genes were defined according to module connectivity [absolute value of Pearson's correlation (cor.geneModuleMembership > 0.8)] and clinical trait relationship [absolute value of Pearson's correlation (cor.geneTraitSignificance > 0.5)] [23]. Hub genes were highly interconnected with other genes in modules of potential significance. To functionally characterize the identified hub genes, Gene Ontology (GO) term and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed by using clusterProfiler [24].
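
The gene significance, module significance, and hub gene criteria described in Sections 2.3 and 2.4 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the code used in this study: the negative sign in the log10 transformation is an assumed convention, the function names are hypothetical, and the gene records are made-up values rather than results from GSE36791. Only the two thresholds (0.8 and 0.5) come from the text.

```python
import math


def gene_significance(p_value):
    """GS from a regression P-value; the negative sign is an assumed convention."""
    return -math.log10(p_value)


def module_significance(gs_values):
    """MS as the average GS value over all genes in a module."""
    return sum(gs_values) / len(gs_values)


def select_hub_genes(genes, mm_cutoff=0.8, trait_cutoff=0.5):
    """Keep genes whose absolute module membership and absolute
    gene-trait correlation both exceed the stated thresholds."""
    return [
        gene["symbol"]
        for gene in genes
        if abs(gene["module_membership"]) > mm_cutoff
        and abs(gene["trait_correlation"]) > trait_cutoff
    ]


# Hypothetical records for three genes in one module (made-up numbers).
genes = [
    {"symbol": "gene_1", "module_membership": 0.91, "trait_correlation": -0.62},
    {"symbol": "gene_2", "module_membership": 0.85, "trait_correlation": 0.31},
    {"symbol": "gene_3", "module_membership": 0.40, "trait_correlation": 0.77},
]
hubs = select_hub_genes(genes)
```

Here only the hypothetical gene_1 passes both filters: gene_2 is well connected within its module but only weakly related to the trait, and gene_3 shows the opposite pattern.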
The terms obtained from the KEGG pathway and GO analyses (including molecular functions, biological processes, and cellular components) that had false discovery rates (FDRs) < 0.05 were considered to be significantly enriched in the hub genes. Hub genes shared with the key genes identified in our previous study [22] were also screened and regarded as “candidate biomarkers” highly associated with IA rupture.

2.5 Analysis of the GSE73378 dataset

The GSE73378 expression data had already been normalized (quantile normalization) with the surrogate variable analysis (SVA) package [19]. Differentially expressed genes (DEGs) (ruptured IAs vs. references) were then screened by using the limma package [25]. The