Abstract Purpose This study focused on identification of long non-coding RNAs (lncRNAs) for prognosis prediction of glioblastoma (GBM) through weighted gene co-expression network analysis (WGCNA) and L1-penalized least absolute shrinkage and selection operator (LASSO) Cox proportional hazards (PH) model. Materials and methods WGCNA was performed based on RNA expression profiles of GBM from Chinese Glioma Genome Atlas (CGGA), National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO), and the European Bioinformatics Institute ArrayExpress for the identification of GBM-related modules. Subsequently, prognostic lncRNAs were determined using LASSO Cox PH model, followed by constructing a risk scoring model based on these lncRNAs. The risk score was used to divide patients into high- and low-risk groups. Difference in survival between groups was analyzed using Kaplan–Meier survival analysis. IncRNA-mRNA networks were built for the prognostic lncRNAs, followed by pathway enrichment analysis for these networks. Results This study identified eight preserved GBM-related modules, including 188 lncRNAs. Consequently, C20orf166-AS1, LINC00645, LBX2-AS1, LINC00565, LINC00641, and PRRT3-AS1 were identified by LASSO Cox PH model. A risk scoring model based on the lncRNAs was constructed that could divide patients into different risk groups with significantly different survival rates. Prognostic value of this six-lncRNA signature was validated in two independent sets. C20orf166-AS1 was associated with antigen processing and presentation and cell adhesion molecule pathways, involving nine common genes. LBX2-AS1, LINC00641, PRRT3-AS1, and LINC00565 were related to focal adhesion, extracellular matrix receptor interaction, and mitogen-activated protein kinase signaling pathways, which shared 12 common genes. Conclusion This prognostic six-lncRNA signature may improve prognosis prediction of GBM. This study reveals many pathways and genes involved in the mechanisms behind these lncRNAs. Keywords: lncRNA, risk score, WGCNA, network, pathway Introduction Glioblastoma (GBM), grade IV glioma, is the most common and aggressive type of brain cancer characterized by high morbidity and mortality and dismal prognosis.[35]^1^,[36]^2 Reportedly, the median survival of patients with newly diagnosed GBM is approximately 15 months.[37]^3 Despite the development of medical interventions such as surgical resection, radiological therapy, and chemotherapeutic therapy, the survival rate remains largely unchanged over the past years.[38]^4 A deep understanding on the pathogenesis of GBM and the discovery of molecular biomarkers likely contribute to the improvement of GBM survival. Long non-coding RNAs (lncRNAs) are defined as transcripts greater than 200 nucleotides that do not code proteins.[39]^5 With the development of genome-wide expression profiling, a huge amount of novel lncRNAs have been discovered. These lncRNAs are known to play key roles in a broad range of biological processes such as cell differentiation, human diseases, and tumorigenesis.[40]^6 Unraveling potential roles of lncRNAs in GBM has emerged as a leading edge of GBM research.[41]^7 For instance, Han et al[42]^8 revealed that ASLNC22381 and ASLNC2081 may engage in recurrence and progression of GBM through conducting lncRNA and mRNA profiling. In addition, Zhang et al[43]^9 reported a set of lncRNAs that have prognostic value for GBM by lncRNAs bioinformatics analysis in The Cancer Genome Atlas (TCGA). Moreover, a recent study identifies an immune-related lncRNA signature for prognostic prediction based on TCGA data of GBM patients.[44]^10 Despite these valuable findings, the majority of lncRNAs in GBM remains poorly understood. In comparison with previous studies that identified prognostic lncRNA signatures based on the limited microarray data from TCGA,[45]^9^,[46]^10 we carried out a comprehensive analysis on all publicly available gene expression data of GBM from Chinese Glioma Genome Atlas (CGGA), National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO), and the European Bioinformatics Institute (EBI) ArrayExpress repositories through a series of bioinformatics approaches. We searched for GBM-related key modules through constructing a weighted gene co-expression network analysis (WGCNA). Based on the lncRNAs contained in these key modules, we acquired a panel of lncR-NAs as prognostic biomarkers by univariate Cox regression analysis, in combination with Cox proportional hazards (PH) model based on the L1-penalized least absolute shrinkage and selection operator (LASSO) estimation. Subsequently, a prognostic scoring system was constructed based on these prognostic lncRNAs to evaluate the death risk due to GBM. In addition, lncRNA alterations in GBM compared to normal samples were analyzed using metaDE method. Furthermore, pathway enrichment analysis using Gene Set Enrichment Analysis (GSEA) was conducted to give some insights into the underlying mechanisms of these predictive lncRNAs. Materials and methods Data resource The data sets in this study were derived from three sources. First, the gene expression data of 325 glioma samples, named “Part D”,[47]^11 was downloaded from the CGGA ([48]http://cgga.org.cn/), including 144 GBM samples that were selected as the training set in this study (platform: Illumina HiSeq 2000 RNA Sequencing). Survival information was available for 138 patients with GBM, of whom, 92 were dead, while 46 were alive with a median survival time of 13.22±11.44 months. Second, the data sets were searched in the NCBI GEO ([49]http://www.ncbi.nlm.nih.gov/geo/) and the EBI ArrayExpress ([50]https://www.ebi.ac.uk/arrayexpress/) repositories for publication of human GBM with no less than 40 samples. As a result, three data sets of [51]GSE51062, [52]GSE36245, and E-TABM-898 were obtained, including 52 samples, 46 samples, and 56 samples, separately. The platform for all the three data sets was Affymetrix-[53]GPL570. We also searched for human GBM data sets that had no less than 50 samples and simultaneously available survival information in NCBI GEO and EBI ArrayExpress. Two data sets, the [54]GSE74187 (n=60) and [55]GSE83300 (n=50), meeting the criteria were included in this study. The platform for both of them was Agilent-014850. In addition, we needed human GBM gene expression data sets that had both GBM samples and paired normal tissue samples, with the total number of samples greater than 40. Through exploring NCBI GEO and EBI ArrayExpress, the [56]GSE22866 (including 40 GBM samples and six normal samples; platform: Affymetrix-[57]GPL570), [58]GSE50161 (including 34 GBM samples and 13 normal samples; platform: Affymetrix-[59]GPL570), and [60]GSE4290 (including 77 GBM samples and 23 normal samples; platform: Agilent-014850) were acquired. Third, RNA-seq data set comprising 154 GBM samples and 18 normal samples was downloaded from the TCGA ([61]https://gdc-portal.nci.nih.gov/). There were 152 samples available with survival information, including 102 dead and 50 live samples. Data preprocessing For the data sets downloaded from Affymetrix-[62]GPL570 platform, raw data (CEL files) were background corrected and normalized[63]^12 using the oligo package (version 1.41.1, [64]http://www.bioconductor.org/packages/release/bioc/html/oligo.html) in R language (version 3.4.1). With respect to the data sets from Agilent-014850 platform, raw data (TXT files) underwent log2 transformation to yield approximately normal distribution with the limma[65]^13 software (version 3.34.0, [66]https://bioconductor.org/packages/release/bioc/html/limma.html), followed by standardization using the median method. CGGA and TCGA data were subject to quantile normalization using the preprocessCore package[67]^14 (version1.40.0, [68]http://bioconductor.org/packages/release/bioc/html/prepro-cessCore. html) in R language (version 3.4.1). Next, according to platform annotation files, the probes in all data sets that had RefSeq transcript ID and annotation information as non-coding RNA in the Refseq database were chosen. Moreover, the platform sequencing data were aligned to human genome (GRCh38 version) by using the Clustal2 ([69]http://www.clustal.org/clustal2/).[70]^15 The acquired lncRNAs combining with the annotated lncRNAs in the Refseq database[71]^16 were extracted for further analysis. WGCNA The WGCNA (version 1.61, [72]https://cran.r-project.org/web/packages/WGCNA/index.html)[73]^17 was applied to build a WGCNA to mine GBM-related preserved modules. For this network analysis, the CGGA data were referred to as the training set, while the [74]GSE51062, [75]GSE36245, and E-TABM-898 as validation sets. Initially, comparability between the four sets was analyzed using correlation analysis. A WGCNA was constructed in accordance with a previous study.[76]^18 Briefly, using scale-free topology criterion, the soft threshold power of β was established, through which the weighted adjacency matrix was developed. The modules with size ≥150 and minimum cut height of 0.99 were selected using dynamic tree cut algorithm, and the preserved modules were determined using the module preservation function of WGCNA package. In addition, the possible biological functions of the significantly preserved modules were studied using userListEnrichment function of WGCNA package. Selection of prognosis-related lncRNAs Based on the lncRNAs in the preselected preserved WGCNA modules and the corresponding survival information, univariate Cox regression analysis was used to identify the lncRNAs that were significantly correlated with prognosis (logrank P<0.05) by using survival package (version 2.4, [77]https://cran.r-project.org/web/packages/survival/index.html) in R language (version 3.4.1).[78]^19 Construction of prognosis scoring model based on lncRNAs The identified prognosis-related lncRNAs were used to fit a Cox PH model based on the LASSO estimation[79]^20 to select the optimal panel of prognostic lncRNAs. The optimal value for penalization coefficient lambda was selected by running cross-validation likelihood (cvl) 1,000 times. Subsequently, the Cox PH coefficients and expression levels of these prognostic lncRNAs were extracted to calculate the risk score as a measure of survival risk for each patient using the following formula: [MATH: Risk score=βl ncRNA1×exprlncRNA1+βlncRNA2×exprlncRNA2++βlncRNAn×exprl ncRNAn, :MATH] where β[lncRNAn] represents Cox PH coefficient of lncRNAn and expr[lncRNA] represents expression level of lncRNAn. All samples in the CGGA set were dichotomized into high- and low-risk groups by risk score, with median risk score as the threshold. Then, three independent sets with concomitant survival information (TCGA set, [80]GSE74187, and [81]GSE83300) were utilized to evaluate the effectiveness and robustness of the abovementioned risk scoring model. As mentioned above, the three data sets contained all available GBM data with survival information in TCGA, NCBI GEO, and EBI ArrayExpress. In the same manner, samples in each set were categorized by risk score into predicted high-and low-risk groups. Survival difference between different risk groups in each set was analyzed using the Kaplan–Meier curve in combination with the Wilcoxon logrank test. Analysis of consensus differentially expressed RNAs (DERs) [82]GSE22866, [83]GSE50161, and [84]GSE4290 contained both GBM samples and normal control samples. We screened the overlapped DERs between GBM and normal samples across the three data sets using MetaDE package ([85]https://cran.r-project.org/web/packages/MetaDE/),[86]^21^,[87]^22 under the thresholds of tau2=0, Qpval>0.05, P<0.05 and false discovery rate (FDR) <0.05. Of them, tau2 and Qpval were measures of heterogeneity for the heterogeneity test. When tau2=0 and Qpval>0.05, the gene was homogeneous and unbiased. Pathway enrichment analysis We built lncRNA-mRNA networks with the selected prognostic lncRNAs and their correlated mRNAs in WGCNA modules. GSEA is a powerful approach for annotating gene expression data that are characterized by focusing on gene set with common biological function, chromosomal location, or regulation ([88]http://software.broadinstitute.org/gsea/index.jsp).[89]^23 We performed pathway enrichment analysis for the lncRNA-mRNA networks using GSEA. Pathways with nominal (NOM) P-value <0.05 were considered significant. GSEA-enriched results were shown by normalized enrichment score (NES) that was calculated as previously described.[90]^24 Results WGCNA co-expression network construction and module mining After preprocessing, 609 lncRNAs and 14,948 mRNAs were overlapped in CGGA data set, [91]GSE51062, [92]GSE36245, and E-TABM-898. By using WGCNA, correlation analysis between any two sets of the four data sets, CGGA data set (training set), [93]GSE51062, [94]GSE36245, and E-TABM-898 (validation sets), were performed. As shown in [95]Figure 1, the correlation coefficients ranged from 0.5 to 1 (P-values<1e–200), suggesting that the expression of common RNAs among the four data sets was coincident. Figure 1. [96]Figure 1 [97]Open in a new tab Correlation analysis between CGGA data set, [98]GSE51062, [99]GSE36245, and E-TABM-898. Abbreviation: CGGA, Chinese Glioma Genome Atlas. Initially, WGCNA of RNAs was built for the training set (CGGA set). According to the scale-free topology criterion, the soft threshold power of β was set as 5 when scale-free topology model-fit R[100]^2=0.9. The phylogenetic tree mined nine co-expression modules (module size, ≥50; cut height, ≥0.99) in the WGCNA ([101]Figure 2A). As shown by the color bands underneath the phylogenetic tree, nine modules were represented by branches of different colors (M1, black; M2, blue; M3, brown; M4, green; M5, gray; M6, pink; M7, red; M8, turquoise; M9, yellow). Moreover, these modules were validated in E-TABM-898, [102]GSE51062, and [103]GSE36245 ([104]Figure 2B and D). In the three validation sets, genes were colored in the same manner as in TCGA set. Figure 2. [105]Figure 2 [106]Open in a new tab Clustering results of WGCNA modules in CGGA set (A), E-TABM-898 (B), [107]GSE51062 (C), and [108]GSE36245 (D). Note: Modules are labeled in different colors. Abbreviations: CGGA, Chinese Glioma Genome Atlas; WGCNA, weighted gene co-expression network analysis. As can be seen from a multidimensional scaling (MDS) for gene expression data of the nine modules ([109]Figure 3A), genes in yellow and red modules showed similar expression and genes in brown and black modules exhibit similar expression. Hierarchical clustering analysis of modules found that the yellow and red modules were on the same branch ([110]Figure 3B). These observations illustrate that the yellow and red modules possess similar gene expression patterns. Figure 3. [111]Figure 3 [112]Open in a new tab Further analyses of WGCNA modules in CGGA. Notes: (A) An MDS plot displaying expression data of genes in different modules. (B) A hierarchical clustering tree of modules. Abbreviations: CGGA, Chinese Glioma Genome Atlas; MDS, multidimensional scaling; WGCNA, weighted gene co-expression network analysis. Module preservation analysis found that among the nine nodules, eight modules had Z-score >5 ([113]Table 1). The eight modules were ranked in a descending order of Z-score. Top three modules were yellow module (Z-score=34.5011), red module (Z-score=34.3040), and black module (Z-score=24.5504), which were highly overlapped across all datasets. This observation indicates that the three modules may provide important information concerning the pathological mechanisms of GBM. With regard to functional annotation, the yellow module (84 lncRNAs) was related to biological adhesion, the red module (26 lncRNAs) was associated with immune response, and the brown module (eight lncRNAs) was possibly involved in synaptic transmission ([114]Table 1). Table 1. Features of WGCNA modules Module Color Module size Number of mRNAs Number of lncRNAs Preservation Z-score Module annotation Module 1 Black 199 178 21 24.5504 Regulation of action potential in neuron Module 2 Blue 386 376 10 11.6620 Cell cycle Module 3 Brown 299 291 8 24.2916 Synaptic transmission Module 4 Green 214 209 5 7.5840 Regulation of system process Module 5 Gray 3,418 3,405 13 2.7203 Regulation of cell proliferation Module 6 Pink 173 173 0 7.6614 RNA processing Module 7 Red 232 206 26 34.3041 Immune response Module 8 Turquoise 483 449 34 14.4361 Regulation of transcription Module 9 Yellow 301 217 84 34.5012 Biological adhesion [115]Open in a new tab Abbreviations: lncRNAs, long non-coding RNAs; WGCNA, weighted gene co-expression network analysis. Identification of prognosis-related lncRNAs There were 188 lncRNAs in the eight overlapped WGCNA modules. Based on the survival information of CGGA set, 32 lncRNAs were identified to be significant prognosis-related lncRNAs by univariate Cox regression analysis. As shown in [116]Figure 4, among the 32 prognosis-related lncRNAs, 11 were in the yellow module, eight in the red module, and eight in the turquoise module. As aforementioned, the yellow and red modules had similar gene expression patterns. Moreover, the two modules were functionally related to biological adhesion and immune response, which were critical for GBM pathogenesis.[117]^25^,[118]^26 Therefore, the 19 lncRNAs in the yellow and red modules were selected for further analysis. Figure 4. Figure 4 [119]Open in a new tab Distribution of the identified prognosis-related lncRNAs in WGNCA modules. Abbreviations: lncRNA, long non-coding RNA; WGCNA, weighted gene co-expression network analysis. Development of a six-lncRNA prognostic scoring system Expression of the 19 lncRNAs in the yellow and red modules were used as input for LASSO Cox PH model. When the cvl was maximized to be −466.2711, the optimal lambda value was 18.0151. As a result, a panel of six lncRNAs was selected as predictive factors for survival, including C20orf166-AS1, LINC00645, LBX2-AS1, LINC00565, LINC00641, and PRRT3-AS1 ([120]Table 2). For predicting each individual patient’s survival probability, risk score was calculated for each patient with the following formula: [MATH: Risk score=< mo>(0.83631)× ExpC20orf166AS1+(1.18806) ×ExpLINC00645+(0.11155 )×ExpLBX2-AS1+(1.04407)×ExpLINC 00565+(1 .16291)×ExpLINC00641+(0.29694)×ExpPRRT3AS1. :MATH] Table 2. Six lncRNAs selected by Cox PH model lncRNA β value HR (95% CI) P-values in univariate Cox regression Module __________________________________________________________________ C20orf166-AS1 0.8363 1.899 (1.238–2.914) 0.0031 Red LINC00645 1.1881 1.119 (1.018–1.231) 0.0182 Red LBX2-AS1 0.1116 1.094 (0.652–1.724) 0.0137 Yellow LINC00565 1.0441 1.259 (1.057–1.500) 0.0075 Yellow LINC00641 −1.1629 0.120 (1.039–1.207) 0.0031 Yellow PRRT3-AS1 0.2969 2.688 (1.200–6.021) 0.0159 Yellow [121]Open in a new tab Abbreviations: lncRNA, long non-coding RNA; PH, proportional hazards. Prediction of overall survival (OS) of GBM patients The aforementioned lncRNA-based risk scoring system was applied to the CGGA set. With the median risk score as cutoff, all patients in the CGGA set were categorized into a high-risk group (n=69) and a low-risk group (n=69). The results showed that the low-risk group had significantly longer OS compared to the high-risk group (16.61±14.22 months vs 9.83±6.17 months, logrank, P=0.000127; [122]Figure 5A). Figure 5. [123]Figure 5 [124]Open in a new tab Kaplan–Meier survival curves estimating overall survival in CGGA set (A), E-TABM-898 (B), [125]GSE51062 (C), and [126]GSE36245 (D). Notes: Patients in each set are sorted by risk score into a high-risk group and a low-risk group. Logrank P-values are calculated by logrank test. Abbreviations: CGGA, Chinese Glioma Genome Atlas; TCGA, The Cancer Genome Atlas. The predictive capability of this prognostic scoring system was tested in TCGA set, [127]GSE74187, and [128]GSE83300, and the risk score and risk group categories were similar for each of them. As shown in [129]Figure 5B, for TCGA set (n=152), when compared to the high-risk group, a notably better survival was observed in the low-risk group (14.93±12.54 months vs 9.19±6.65 months, logrank P=0.0001195). Consistent results were also found for [130]GSE74187 (n=60; 22.47±10.14 month vs 15.83±10.11 month, log-rank p=0.02568, [131]Figure 5C). For [132]GSE83300, the low-risk group had a longer OS compared to the high-risk group, with marginally significant difference (logrank P=0.09198; [133]Figure 5D). It may be attributed to the relatively small sample size (n=50) of [134]GSE83300. These findings offer strong evidence for the prognostic power of the six-lncRNA prognostic scoring system. Identification of overlapped DERs in [135]GSE22866, [136]GSE50161, and [137]GSE4290 [138]GSE22866, [139]GSE50161, and [140]GSE4290 with both GBM and normal tissue samples were used in the present study to find overlapped DERs between GBM and normal samples. Totally, 3,989 overlapped DERs (tau2=0, Qpval>0.05, P-value<0.05, and FDR,0.05) were found; of which, 98 were lncRNAs. Notably, of the aforementioned six prognostic lncRNAs, LBX2-AS1, LINC00641, LINC00645, and LINC00565 were DERs. A heatmap for expression of these overlapped DERs demonstrates that the DERs in the [141]GSE22866, [142]GSE50161, and [143]GSE4290 had similar expression patterns ([144]Figure 6). Figure 6. [145]Figure 6 [146]Open in a new tab A heatmap showing expression of consensus DERs in tumor and control samples of [147]GSE22866, [148]GSE50161, and [149]GSE4290. Abbreviation: DER, differentially expressed RNA. Establishment of lncRNA-mRNA networks To explore the relationships between the six prognostic lncRNAs and genes in the yellow and red modules, lncRNA-mRNA networks were constructed for the two modules, respectively ([150]Figure 7A and B). For the red module, the lncRNA-mRNA network was composed of two lncRNAs (C20orf166-AS1 and LINC00645) and 206 genes, of which, five were downregulated DERs and 72 were upregulated ([151]Figure 7A). For the yellow module, the lncRNA-mRNA network contained four lncRNAs (LBX2-AS1, LINC00641, PRRT3-AS1, and LINC00565) and 217 genes, of which, four were downregulated DERs and 97 were upregulated ([152]Figure 7B). Figure 7. [153]Figure 7 [154]Open in a new tab LncRNA-mRNA networks for the identified prognostic lncRNAs in the red module (A) and the yellow module (B). Notes: A round node stands for a gene, while a square node stands for an lncRNA. A regular triangle represents an upregulated gene, while an inverted triangle represents a downregulated gene. Green or red link signals negative or positive association, respectively, between two nodes. Abbreviation: lncRNA, long non-coding RNA. Pathway analysis We conducted pathway enrichment analysis using GSEA for the two lncRNA-mRNA networks. It was revealed that C20orf166-AS1 in the red module was significantly enriched in antigen processing and presentation and cell adhesion molecules (CAMs) pathways ([155]Table 3). The two pathways involved nine common genes: major histocompatibility complex, class II, DM alpha (HLA-DMA), major histocompatibility complex, class II, DM beta (HLA-DMB), major histocompatibility complex, class II, DP beta 1 (HLA-DPB1), CD2, sialic acid-binding Ig-like lectin 1 (SIGLEC1), major histocompatibility complex, class II, DO alpha (HLA-DOA), major histocompatibility complex, class II, DQ alpha 1 (HLA-DQA1), major histocompatibility complex, class II, DR beta 1 (HLA-DRB1), and major histocompatibility complex, class II, DQ beta 1 (HLA-DQB1). As shown in [156]Table 4, LBX2-AS1, LINC00641, PRRT3-AS1, and LINC00565 in the yellow module were significantly associated with cancer, focal adhesion, extracellular matrix (ECM) receptor interaction, and mitogen-activated protein kinase (MAPK) signaling pathways. Twelve common genes were involved in the four pathways, including laminin subunit beta (LAMB) 1, collagen type V alpha 2 chain (COL5A2), TGFB 1, integrin subunit alpha (ITGA) 5, platelet-derived growth factor receptor beta (PDGFRB), TNF receptor superfamily (TNFRSF) 12A, dual-specificity phosphatase (DUSP) 6, laminin subunit gamma (LAMC) 1, LAMC3, TNFRSF1A and myosin light chain (MYL) 9. Table 3. Results of pathway enrichment analysis for the lncRNA-mRNA network of the red module Name ES NES NOM P-value KEGG_ANTIGEN_PROCESSING_AND_PRESENTATION 0.5920 1.3432 0.0192 KEGG_CELL_ADHESION_MOLECULES_CAMS 0.5170 1.3731 0.0311 [157]Open in a new tab Note: Positive and negative NES values denote upregulation and downregulation of genes or lncRNAs involved in pathways, respectively. Abbreviations: ES, enrichment score; lncRNA, long non-coding RNA; NES, normalized enrichment score; NOM, nominal. Table 4. Results of pathway enrichment analysis for the lncRNA-mRNA network of the yellow module Name PRRT3-AS1 LBX2-AS1 ES NES NOM P-value ES NES NOM P-value KEGG_PATHWAYS_IN_CANCER −0.4574 −1.0686 0.3951 0.1585 0.3544 0.0084 KEGG_FOCAL_ADHESION −0.3054 −0.7933 0.6885 0.2975 0.7244 0.0276 KEGG_ECM_RECEPTOR_INTERACTION −0.3010 −0.7540 0.7215 0.2821 0.6673 0.0381 KEGG_MAPK_SIGNALING_PATHWAY 0.3486 0.7623 0.7955 0.2419 0.5276 0.0498 LINC00641 LINC00565 ES NES NOM P-value ES NES NOM P-value KEGG_PATHWAYS_IN_CANCER 0.1796 0.4001 0.0099 −0.4807 −1.1598 0.0031 KEGG_FOCAL_ADHESION −0.1243 −0.3241 0.0137 −0.4050 −1.1102 0.0370 KEGG_ECM_RECEPTOR_INTERACTION 0.1864 0.4300 0.0397 −0.3442 −0.8985 0.0457 KEGG_MAPK_SIGNALING_PATHWAY 0.1801 0.3902 0.0464 −0.5860 −1.3128 0.0413 [158]Open in a new tab Note: Positive and negative NES values denote upregulation and downregulation of genes or lncRNAs involved in pathways, respectively. Abbreviations: ES, enrichment score; lncRNA, long non-coding RNA; NES, normalized enrichment score; NOM, nominal. Discussion Increasing evidence indicates that a growing number of lncRNAs are associated with various cancer types.[159]^27 This discovery leads to a growing interest in the study of lncRNAs in GBM. Based on the gene expression data of GBM from CGGA, NCBI GEO, and EBI ArrayExpress, we identified a prognostic signature of six lncRNAs (C20orf166-AS1, LINC00645, LBX2-AS1, LINC00565, LINC00641, and PRRT3-AS1) through a combination of WGCNA, univariate Cox regression analysis, and LASSO PH model. Moreover, a six-lncRNA-based risk scoring system was constructed and capable to classify GBM patients into two risk groups with significantly different survival rates. The prognostic performance of the risk scoring model was successfully validated in two independent sets. It indicates that the six lncRNAs are promising prognostic biomarkers for GBM and may play important roles in tumorigenesis of GBM. LINC00645 is an endometrial cancer-specific lncRNA.[160]^28 Emerging studies have proved that C20orf166-AS1 is aberrantly expressed in prostate cancer and bladder cancer.[161]^42^,[162]^43 However, the involvement of LINC00645 and C20orf166-AS1 in GBM has not been reported yet. In the present study, C20orf166-AS1 was identified as an important lncRNA of prognostic value for GBM. Moreover, pathway enrichment analysis showed that C20orf166-AS1 was significantly related to antigen processing and presentation and CAMs pathways, and both pathways shared HLA-DMA, HLA-DMB, HLA-DPB1, CD2, SIGLEC1, HLA-DOA, HLA-DQA1, HLA-DRB1, and HLA-DQB1. Among the nine common genes, HLA-DMA, HLA-DMB, HLA-DPB1, HLA-DOA, HLA-DQA, HLA-DRB1, and HLA-DQB1 are major histocompatibility complex class II molecules that are mainly expressed on antigen-presenting cells and play an important role in immune response. The protein encoded by CD2 gene is a CAM located on the surface of T cells and NK cells, and it acts as a specific marker for these cells.[163]^29 SIGLEC1 protein is a member of siglecs that are predominately expressed on the surface of immune cells and bind to glycans enclosing sialic acids.[164]^30 Interactions between siglecs and glycans are implicated in cell adhesion and cell signaling. These findings reveal that C20orf166-AS1 might participate in immune response and cell adhesion in GBM through the regulation of these genes in antigen processing and presentation and CAM pathways. Recent studies report that upregulation of LBX2-AS1 has been observed in lung cancer.[165]^31^,[166]^32 Interestingly, LBX2-AS1 is significantly upregulated with the increase of tumor grade in GBM,[167]^33 suggesting that this lncRNA probably has an important regulatory role in GBM prognosis. Alterations of LINC00641, PRRT3-AS1, and LINC00565 in cancer have been scarcely reported. The current study provided evidence that the four lncRNAs had predictive value for survival of GBM patients. Notably, the study uncovered that they were significantly linked to focal adhesion, ECM receptor interaction, and MAPK signaling pathways. These pathways involved 12 common genes, including LAMB1, COL5A2, TGFB1, ITGA5, PDGFRB, TNFRSF12A, DUSP6, LAMC1, LAMC3, TNFRSF1A, and MYL9. Increasing evidence has established that MAPK pathway is involved in regulating GBM cell migration and proliferation.[168]^34^,[169]^35 LAMB1, LAMC1, and LAMC3 are members of ECM glycoproteins. COL5A2 encodes an alpha chain for fibrillar collagen, a major component of ECM proteins.[170]^36 TGF-β1 is a member of TGF β superfamily and plays a role in the regulation of growth, proliferation, and differentiation of glioma cells.[171]^37 Integrin alpha-5 protein encoded by ITGA5 belongs to the integrin alpha chain family that is critical for cell adhesion.[172]^38 Recently, it is found that PDGFRB is elevated in GBM microvascular proliferation compared to GBM tumor cells and selectively expressed PDGFRB protein in pericytes.[173]^39 DUSP6 protein belongs to the dual-specificity protein phosphatase subfamily that acts as a negative regulator over MAPK members.[174]^40 Besides, it has been found that DUSP6 is upregulated in GBM and promotes the development of GBM.[175]^41 These results imply that the involvement of LBX2-AS1, LINC00641, PRRT3-AS1, and LINC00565 in GBM may be involved in focal adhesion, ECM receptor interaction, and MAPK signaling pathways. These common genes might be potential therapeutic targets for GBM. Conclusion Based on the comprehensive analysis of publicly accessible GBM data in CGGA, NCBI GEO, and EBI ArrayExpress, this study identifies a novel six-lncRNA signature for GBM prognostic prediction. This study also highlights the pathways and genes involved in the regulatory mechanisms underlying these prognostic lncRNAs. Further studies are warranted prior to the application of this lncRNA signature in clinical practice. Availability of data and material The raw data were collected and analyzed by the authors, and they are not ready to share their data because the data have not been published. Footnotes Author contributions RL and YQZ participated in the design of this study, and they both performed the statistical analysis. GZ and BZ carried out the study and collected important background information. HZ and MW drafted the manuscript. All authors read and approved the final manuscript. All authors contributed toward data analysis, drafting and revising the paper and agree to be accountable for all aspects of the work. Disclosure The authors report no conflicts of interest in this work. References