Abstract

Purpose

   This study focused on identification of long non-coding RNAs (lncRNAs)
   for prognosis prediction of glioblastoma (GBM) through weighted gene
   co-expression network analysis (WGCNA) and L1-penalized least absolute
   shrinkage and selection operator (LASSO) Cox proportional hazards (PH)
   model.

Materials and methods

   WGCNA was performed based on RNA expression profiles of GBM from
   Chinese Glioma Genome Atlas (CGGA), National Center for Biotechnology
   Information (NCBI) Gene Expression Omnibus (GEO), and the European
   Bioinformatics Institute ArrayExpress for the identification of
   GBM-related modules. Subsequently, prognostic lncRNAs were determined
   using LASSO Cox PH model, followed by constructing a risk scoring model
   based on these lncRNAs. The risk score was used to divide patients into
   high- and low-risk groups. Difference in survival between groups was
   analyzed using Kaplan–Meier survival analysis. IncRNA-mRNA networks
   were built for the prognostic lncRNAs, followed by pathway enrichment
   analysis for these networks.

Results

   This study identified eight preserved GBM-related modules, including
   188 lncRNAs. Consequently, C20orf166-AS1, LINC00645, LBX2-AS1,
   LINC00565, LINC00641, and PRRT3-AS1 were identified by LASSO Cox PH
   model. A risk scoring model based on the lncRNAs was constructed that
   could divide patients into different risk groups with significantly
   different survival rates. Prognostic value of this six-lncRNA signature
   was validated in two independent sets. C20orf166-AS1 was associated
   with antigen processing and presentation and cell adhesion molecule
   pathways, involving nine common genes. LBX2-AS1, LINC00641, PRRT3-AS1,
   and LINC00565 were related to focal adhesion, extracellular matrix
   receptor interaction, and mitogen-activated protein kinase signaling
   pathways, which shared 12 common genes.

Conclusion

   This prognostic six-lncRNA signature may improve prognosis prediction
   of GBM. This study reveals many pathways and genes involved in the
   mechanisms behind these lncRNAs.

   Keywords: lncRNA, risk score, WGCNA, network, pathway

Introduction

   Glioblastoma (GBM), grade IV glioma, is the most common and aggressive
   type of brain cancer characterized by high morbidity and mortality and
   dismal prognosis.[35]^1^,[36]^2 Reportedly, the median survival of
   patients with newly diagnosed GBM is approximately 15 months.[37]^3
   Despite the development of medical interventions such as surgical
   resection, radiological therapy, and chemotherapeutic therapy, the
   survival rate remains largely unchanged over the past years.[38]^4 A
   deep understanding on the pathogenesis of GBM and the discovery of
   molecular biomarkers likely contribute to the improvement of GBM
   survival.

   Long non-coding RNAs (lncRNAs) are defined as transcripts greater than
   200 nucleotides that do not code proteins.[39]^5 With the development
   of genome-wide expression profiling, a huge amount of novel lncRNAs
   have been discovered. These lncRNAs are known to play key roles in a
   broad range of biological processes such as cell differentiation, human
   diseases, and tumorigenesis.[40]^6 Unraveling potential roles of
   lncRNAs in GBM has emerged as a leading edge of GBM research.[41]^7 For
   instance, Han et al[42]^8 revealed that ASLNC22381 and ASLNC2081 may
   engage in recurrence and progression of GBM through conducting lncRNA
   and mRNA profiling. In addition, Zhang et al[43]^9 reported a set of
   lncRNAs that have prognostic value for GBM by lncRNAs bioinformatics
   analysis in The Cancer Genome Atlas (TCGA). Moreover, a recent study
   identifies an immune-related lncRNA signature for prognostic prediction
   based on TCGA data of GBM patients.[44]^10 Despite these valuable
   findings, the majority of lncRNAs in GBM remains poorly understood.

   In comparison with previous studies that identified prognostic lncRNA
   signatures based on the limited microarray data from
   TCGA,[45]^9^,[46]^10 we carried out a comprehensive analysis on all
   publicly available gene expression data of GBM from Chinese Glioma
   Genome Atlas (CGGA), National Center for Biotechnology Information
   (NCBI) Gene Expression Omnibus (GEO), and the European Bioinformatics
   Institute (EBI) ArrayExpress repositories through a series of
   bioinformatics approaches. We searched for GBM-related key modules
   through constructing a weighted gene co-expression network analysis
   (WGCNA). Based on the lncRNAs contained in these key modules, we
   acquired a panel of lncR-NAs as prognostic biomarkers by univariate Cox
   regression analysis, in combination with Cox proportional hazards (PH)
   model based on the L1-penalized least absolute shrinkage and selection
   operator (LASSO) estimation. Subsequently, a prognostic scoring system
   was constructed based on these prognostic lncRNAs to evaluate the death
   risk due to GBM. In addition, lncRNA alterations in GBM compared to
   normal samples were analyzed using metaDE method. Furthermore, pathway
   enrichment analysis using Gene Set Enrichment Analysis (GSEA) was
   conducted to give some insights into the underlying mechanisms of these
   predictive lncRNAs.

Materials and methods

Data resource

   The data sets in this study were derived from three sources.

   First, the gene expression data of 325 glioma samples, named “Part
   D”,[47]^11 was downloaded from the CGGA ([48]http://cgga.org.cn/),
   including 144 GBM samples that were selected as the training set in
   this study (platform: Illumina HiSeq 2000 RNA Sequencing). Survival
   information was available for 138 patients with GBM, of whom, 92 were
   dead, while 46 were alive with a median survival time of 13.22±11.44
   months.

   Second, the data sets were searched in the NCBI GEO
   ([49]http://www.ncbi.nlm.nih.gov/geo/) and the EBI ArrayExpress
   ([50]https://www.ebi.ac.uk/arrayexpress/) repositories for publication
   of human GBM with no less than 40 samples. As a result, three data sets
   of [51]GSE51062, [52]GSE36245, and E-TABM-898 were obtained, including
   52 samples, 46 samples, and 56 samples, separately. The platform for
   all the three data sets was Affymetrix-[53]GPL570. We also searched for
   human GBM data sets that had no less than 50 samples and simultaneously
   available survival information in NCBI GEO and EBI ArrayExpress. Two
   data sets, the [54]GSE74187 (n=60) and [55]GSE83300 (n=50), meeting the
   criteria were included in this study. The platform for both of them was
   Agilent-014850. In addition, we needed human GBM gene expression data
   sets that had both GBM samples and paired normal tissue samples, with
   the total number of samples greater than 40. Through exploring NCBI GEO
   and EBI ArrayExpress, the [56]GSE22866 (including 40 GBM samples and
   six normal samples; platform: Affymetrix-[57]GPL570), [58]GSE50161
   (including 34 GBM samples and 13 normal samples; platform:
   Affymetrix-[59]GPL570), and [60]GSE4290 (including 77 GBM samples and
   23 normal samples; platform: Agilent-014850) were acquired.

   Third, RNA-seq data set comprising 154 GBM samples and 18 normal
   samples was downloaded from the TCGA
   ([61]https://gdc-portal.nci.nih.gov/). There were 152 samples available
   with survival information, including 102 dead and 50 live samples.

Data preprocessing

   For the data sets downloaded from Affymetrix-[62]GPL570 platform, raw
   data (CEL files) were background corrected and normalized[63]^12 using
   the oligo package (version 1.41.1,
   [64]http://www.bioconductor.org/packages/release/bioc/html/oligo.html)
   in R language (version 3.4.1). With respect to the data sets from
   Agilent-014850 platform, raw data (TXT files) underwent log2
   transformation to yield approximately normal distribution with the
   limma[65]^13 software (version 3.34.0,
   [66]https://bioconductor.org/packages/release/bioc/html/limma.html),
   followed by standardization using the median method. CGGA and TCGA data
   were subject to quantile normalization using the preprocessCore
   package[67]^14 (version1.40.0,
   [68]http://bioconductor.org/packages/release/bioc/html/prepro-cessCore.
   html) in R language (version 3.4.1).

   Next, according to platform annotation files, the probes in all data
   sets that had RefSeq transcript ID and annotation information as
   non-coding RNA in the Refseq database were chosen. Moreover, the
   platform sequencing data were aligned to human genome (GRCh38 version)
   by using the Clustal2 ([69]http://www.clustal.org/clustal2/).[70]^15
   The acquired lncRNAs combining with the annotated lncRNAs in the Refseq
   database[71]^16 were extracted for further analysis.

WGCNA

   The WGCNA (version 1.61,
   [72]https://cran.r-project.org/web/packages/WGCNA/index.html)[73]^17
   was applied to build a WGCNA to mine GBM-related preserved modules. For
   this network analysis, the CGGA data were referred to as the training
   set, while the [74]GSE51062, [75]GSE36245, and E-TABM-898 as validation
   sets. Initially, comparability between the four sets was analyzed using
   correlation analysis. A WGCNA was constructed in accordance with a
   previous study.[76]^18 Briefly, using scale-free topology criterion,
   the soft threshold power of β was established, through which the
   weighted adjacency matrix was developed. The modules with size ≥150 and
   minimum cut height of 0.99 were selected using dynamic tree cut
   algorithm, and the preserved modules were determined using the module
   preservation function of WGCNA package. In addition, the possible
   biological functions of the significantly preserved modules were
   studied using userListEnrichment function of WGCNA package.

Selection of prognosis-related lncRNAs

   Based on the lncRNAs in the preselected preserved WGCNA modules and the
   corresponding survival information, univariate Cox regression analysis
   was used to identify the lncRNAs that were significantly correlated
   with prognosis (logrank P<0.05) by using survival package (version 2.4,
   [77]https://cran.r-project.org/web/packages/survival/index.html) in R
   language (version 3.4.1).[78]^19

Construction of prognosis scoring model based on lncRNAs

   The identified prognosis-related lncRNAs were used to fit a Cox PH
   model based on the LASSO estimation[79]^20 to select the optimal panel
   of prognostic lncRNAs. The optimal value for penalization coefficient
   lambda was selected by running cross-validation likelihood (cvl) 1,000
   times. Subsequently, the Cox PH coefficients and expression levels of
   these prognostic lncRNAs were extracted to calculate the risk score as
   a measure of survival risk for each patient using the following
   formula:
   [MATH:
   <mrow><mtext>Risk score</mtext><mo>=</mo><msub><mi>β</mi><mrow><mtext>l
   ncRNA1</mtext></mrow></msub><mo>×</mo><msub><mrow><mtext>expr</mtext></
   mrow><mrow><mtext>lncRNA1</mtext></mrow></msub><mo>+</mo><msub><mi>β</m
   i><mrow><mtext>lncRNA2</mtext></mrow></msub><mo>×</mo><msub><mrow><mtex
   t>expr</mtext></mrow><mrow><mtext>lncRNA2</mtext></mrow></msub><mo>+</m
   o><mo>⋯</mo><mo>+</mo><msub><mi>β</mi><mrow><mtext>lncRNAn</mtext></mro
   w></msub><mo>×</mo><msub><mrow><mtext>expr</mtext></mrow><mrow><mtext>l
   ncRNAn</mtext></mrow></msub><mo>,</mo></mrow> :MATH]

   where β[lncRNAn] represents Cox PH coefficient of lncRNAn and
   expr[lncRNA] represents expression level of lncRNAn.

   All samples in the CGGA set were dichotomized into high- and low-risk
   groups by risk score, with median risk score as the threshold. Then,
   three independent sets with concomitant survival information (TCGA set,
   [80]GSE74187, and [81]GSE83300) were utilized to evaluate the
   effectiveness and robustness of the abovementioned risk scoring model.
   As mentioned above, the three data sets contained all available GBM
   data with survival information in TCGA, NCBI GEO, and EBI ArrayExpress.
   In the same manner, samples in each set were categorized by risk score
   into predicted high-and low-risk groups. Survival difference between
   different risk groups in each set was analyzed using the Kaplan–Meier
   curve in combination with the Wilcoxon logrank test.

Analysis of consensus differentially expressed RNAs (DERs)

   [82]GSE22866, [83]GSE50161, and [84]GSE4290 contained both GBM samples
   and normal control samples. We screened the overlapped DERs between GBM
   and normal samples across the three data sets using MetaDE package
   ([85]https://cran.r-project.org/web/packages/MetaDE/),[86]^21^,[87]^22
   under the thresholds of tau2=0, Qpval>0.05, P<0.05 and false discovery
   rate (FDR) <0.05. Of them, tau2 and Qpval were measures of
   heterogeneity for the heterogeneity test. When tau2=0 and Qpval>0.05,
   the gene was homogeneous and unbiased.

Pathway enrichment analysis

   We built lncRNA-mRNA networks with the selected prognostic lncRNAs and
   their correlated mRNAs in WGCNA modules. GSEA is a powerful approach
   for annotating gene expression data that are characterized by focusing
   on gene set with common biological function, chromosomal location, or
   regulation
   ([88]http://software.broadinstitute.org/gsea/index.jsp).[89]^23 We
   performed pathway enrichment analysis for the lncRNA-mRNA networks
   using GSEA. Pathways with nominal (NOM) P-value <0.05 were considered
   significant. GSEA-enriched results were shown by normalized enrichment
   score (NES) that was calculated as previously described.[90]^24

Results

WGCNA co-expression network construction and module mining

   After preprocessing, 609 lncRNAs and 14,948 mRNAs were overlapped in
   CGGA data set, [91]GSE51062, [92]GSE36245, and E-TABM-898. By using
   WGCNA, correlation analysis between any two sets of the four data sets,
   CGGA data set (training set), [93]GSE51062, [94]GSE36245, and
   E-TABM-898 (validation sets), were performed. As shown in [95]Figure 1,
   the correlation coefficients ranged from 0.5 to 1 (P-values<1e–200),
   suggesting that the expression of common RNAs among the four data sets
   was coincident.

Figure 1.

   [96]Figure 1
   [97]Open in a new tab

   Correlation analysis between CGGA data set, [98]GSE51062, [99]GSE36245,
   and E-TABM-898.

   Abbreviation: CGGA, Chinese Glioma Genome Atlas.

   Initially, WGCNA of RNAs was built for the training set (CGGA set).
   According to the scale-free topology criterion, the soft threshold
   power of β was set as 5 when scale-free topology model-fit
   R[100]^2=0.9. The phylogenetic tree mined nine co-expression modules
   (module size, ≥50; cut height, ≥0.99) in the WGCNA ([101]Figure 2A). As
   shown by the color bands underneath the phylogenetic tree, nine modules
   were represented by branches of different colors (M1, black; M2, blue;
   M3, brown; M4, green; M5, gray; M6, pink; M7, red; M8, turquoise; M9,
   yellow). Moreover, these modules were validated in E-TABM-898,
   [102]GSE51062, and [103]GSE36245 ([104]Figure 2B and D). In the three
   validation sets, genes were colored in the same manner as in TCGA set.

Figure 2.

   [105]Figure 2
   [106]Open in a new tab

   Clustering results of WGCNA modules in CGGA set (A), E-TABM-898 (B),
   [107]GSE51062 (C), and [108]GSE36245 (D). Note: Modules are labeled in
   different colors.

   Abbreviations: CGGA, Chinese Glioma Genome Atlas; WGCNA, weighted gene
   co-expression network analysis.

   As can be seen from a multidimensional scaling (MDS) for gene
   expression data of the nine modules ([109]Figure 3A), genes in yellow
   and red modules showed similar expression and genes in brown and black
   modules exhibit similar expression. Hierarchical clustering analysis of
   modules found that the yellow and red modules were on the same branch
   ([110]Figure 3B). These observations illustrate that the yellow and red
   modules possess similar gene expression patterns.

Figure 3.

   [111]Figure 3
   [112]Open in a new tab

   Further analyses of WGCNA modules in CGGA.

   Notes: (A) An MDS plot displaying expression data of genes in different
   modules. (B) A hierarchical clustering tree of modules.

   Abbreviations: CGGA, Chinese Glioma Genome Atlas; MDS, multidimensional
   scaling; WGCNA, weighted gene co-expression network analysis.

   Module preservation analysis found that among the nine nodules, eight
   modules had Z-score >5 ([113]Table 1). The eight modules were ranked in
   a descending order of Z-score. Top three modules were yellow module
   (Z-score=34.5011), red module (Z-score=34.3040), and black module
   (Z-score=24.5504), which were highly overlapped across all datasets.
   This observation indicates that the three modules may provide important
   information concerning the pathological mechanisms of GBM. With regard
   to functional annotation, the yellow module (84 lncRNAs) was related to
   biological adhesion, the red module (26 lncRNAs) was associated with
   immune response, and the brown module (eight lncRNAs) was possibly
   involved in synaptic transmission ([114]Table 1).

Table 1.

   Features of WGCNA modules
   Module Color Module size Number of mRNAs Number of lncRNAs Preservation
   Z-score Module annotation
   Module 1 Black 199 178 21 24.5504 Regulation of action potential in
   neuron
   Module 2 Blue 386 376 10 11.6620 Cell cycle
   Module 3 Brown 299 291 8 24.2916 Synaptic transmission
   Module 4 Green 214 209 5 7.5840 Regulation of system process
   Module 5 Gray 3,418 3,405 13 2.7203 Regulation of cell proliferation
   Module 6 Pink 173 173 0 7.6614 RNA processing
   Module 7 Red 232 206 26 34.3041 Immune response
   Module 8 Turquoise 483 449 34 14.4361 Regulation of transcription
   Module 9 Yellow 301 217 84 34.5012 Biological adhesion
   [115]Open in a new tab

   Abbreviations: lncRNAs, long non-coding RNAs; WGCNA, weighted gene
   co-expression network analysis.

Identification of prognosis-related lncRNAs

   There were 188 lncRNAs in the eight overlapped WGCNA modules. Based on
   the survival information of CGGA set, 32 lncRNAs were identified to be
   significant prognosis-related lncRNAs by univariate Cox regression
   analysis. As shown in [116]Figure 4, among the 32 prognosis-related
   lncRNAs, 11 were in the yellow module, eight in the red module, and
   eight in the turquoise module. As aforementioned, the yellow and red
   modules had similar gene expression patterns. Moreover, the two modules
   were functionally related to biological adhesion and immune response,
   which were critical for GBM pathogenesis.[117]^25^,[118]^26 Therefore,
   the 19 lncRNAs in the yellow and red modules were selected for further
   analysis.

Figure 4.

   Figure 4
   [119]Open in a new tab

   Distribution of the identified prognosis-related lncRNAs in WGNCA
   modules.

   Abbreviations: lncRNA, long non-coding RNA; WGCNA, weighted gene
   co-expression network analysis.

Development of a six-lncRNA prognostic scoring system

   Expression of the 19 lncRNAs in the yellow and red modules were used as
   input for LASSO Cox PH model. When the cvl was maximized to be
   −466.2711, the optimal lambda value was 18.0151. As a result, a panel
   of six lncRNAs was selected as predictive factors for survival,
   including C20orf166-AS1, LINC00645, LBX2-AS1, LINC00565, LINC00641, and
   PRRT3-AS1 ([120]Table 2). For predicting each individual patient’s
   survival probability, risk score was calculated for each patient with
   the following formula:
   [MATH: <mtable
   columnalign="left"><mtr><mtd><mtext>Risk score</mtext><mo>=</mo><mrow><
   mo>(</mo><mrow><mn>0.83631</mn></mrow><mo>)</mo></mrow><mo>×</mo><msub>
   <mtext>Exp</mtext><mrow><mi
   mathvariant="normal">C</mi><mn>20</mn><mtext>orf</mtext><mn>166</mn><mo
   >−</mo><mtext>AS</mtext><mn>1</mn></mrow></msub><mo>+</mo><mrow><mo>(</
   mo><mrow><mn>1.18806</mn></mrow><mo>)</mo></mrow></mtd></mtr><mtr><mtd>
   <mo>×</mo><msub><mtext>Exp</mtext><mrow><mtext>LINC</mtext><mn>00645</m
   n></mrow></msub><mo>+</mo><mrow><mo>(</mo><mrow><mn>0.11155</mn></mrow>
   <mo>)</mo></mrow><mo>×</mo><msub><mtext>Exp</mtext><mrow><mtext>LBX</mt
   ext><mn>2</mn><mtext>-AS</mtext><mn>1</mn></mrow></msub></mtd></mtr><mt
   r><mtd><mo>+</mo><mrow><mo>(</mo><mrow><mn>1.04407</mn></mrow><mo>)</mo
   ></mrow><mo>×</mo><msub><mtext>Exp</mtext><mrow><mtext>LINC</mtext><mn>
   00565</mn></mrow></msub><mo>+</mo><mrow><mo>(</mo><mrow><mo>−</mo><mn>1
   .16291</mn></mrow><mo>)</mo></mrow></mtd></mtr><mtr><mtd><mo>×</mo><msu
   b><mtext>Exp</mtext><mrow><mtext>LINC</mtext><mn>00641</mn></mrow></msu
   b><mo>+</mo><mrow><mo>(</mo><mrow><mn>0.29694</mn></mrow><mo>)</mo></mr
   ow><mo>×</mo><msub><mtext>Exp</mtext><mrow><mtext>PRRT</mtext><mn>3</mn
   ><mo>−</mo><mtext>AS</mtext><mn>1</mn></mrow></msub><mo>.</mo></mtd></m
   tr></mtable> :MATH]

Table 2.

   Six lncRNAs selected by Cox PH model
   lncRNA β value HR (95% CI) P-values in univariate Cox regression Module
     __________________________________________________________________

   C20orf166-AS1 0.8363 1.899 (1.238–2.914) 0.0031 Red
   LINC00645 1.1881 1.119 (1.018–1.231) 0.0182 Red
   LBX2-AS1 0.1116 1.094 (0.652–1.724) 0.0137 Yellow
   LINC00565 1.0441 1.259 (1.057–1.500) 0.0075 Yellow
   LINC00641 −1.1629 0.120 (1.039–1.207) 0.0031 Yellow
   PRRT3-AS1 0.2969 2.688 (1.200–6.021) 0.0159 Yellow
   [121]Open in a new tab

   Abbreviations: lncRNA, long non-coding RNA; PH, proportional hazards.

Prediction of overall survival (OS) of GBM patients

   The aforementioned lncRNA-based risk scoring system was applied to the
   CGGA set. With the median risk score as cutoff, all patients in the
   CGGA set were categorized into a high-risk group (n=69) and a low-risk
   group (n=69). The results showed that the low-risk group had
   significantly longer OS compared to the high-risk group (16.61±14.22
   months vs 9.83±6.17 months, logrank, P=0.000127; [122]Figure 5A).

Figure 5.

   [123]Figure 5
   [124]Open in a new tab

   Kaplan–Meier survival curves estimating overall survival in CGGA set
   (A), E-TABM-898 (B), [125]GSE51062 (C), and [126]GSE36245 (D). Notes:
   Patients in each set are sorted by risk score into a high-risk group
   and a low-risk group. Logrank P-values are calculated by logrank test.

   Abbreviations: CGGA, Chinese Glioma Genome Atlas; TCGA, The Cancer
   Genome Atlas.

   The predictive capability of this prognostic scoring system was tested
   in TCGA set, [127]GSE74187, and [128]GSE83300, and the risk score and
   risk group categories were similar for each of them. As shown in
   [129]Figure 5B, for TCGA set (n=152), when compared to the high-risk
   group, a notably better survival was observed in the low-risk group
   (14.93±12.54 months vs 9.19±6.65 months, logrank P=0.0001195).
   Consistent results were also found for [130]GSE74187 (n=60; 22.47±10.14
   month vs 15.83±10.11 month, log-rank p=0.02568, [131]Figure 5C). For
   [132]GSE83300, the low-risk group had a longer OS compared to the
   high-risk group, with marginally significant difference (logrank
   P=0.09198; [133]Figure 5D). It may be attributed to the relatively
   small sample size (n=50) of [134]GSE83300. These findings offer strong
   evidence for the prognostic power of the six-lncRNA prognostic scoring
   system.

Identification of overlapped DERs in [135]GSE22866, [136]GSE50161, and
[137]GSE4290

   [138]GSE22866, [139]GSE50161, and [140]GSE4290 with both GBM and normal
   tissue samples were used in the present study to find overlapped DERs
   between GBM and normal samples. Totally, 3,989 overlapped DERs (tau2=0,
   Qpval>0.05, P-value<0.05, and FDR,0.05) were found; of which, 98 were
   lncRNAs. Notably, of the aforementioned six prognostic lncRNAs,
   LBX2-AS1, LINC00641, LINC00645, and LINC00565 were DERs. A heatmap for
   expression of these overlapped DERs demonstrates that the DERs in the
   [141]GSE22866, [142]GSE50161, and [143]GSE4290 had similar expression
   patterns ([144]Figure 6).

Figure 6.

   [145]Figure 6
   [146]Open in a new tab

   A heatmap showing expression of consensus DERs in tumor and control
   samples of [147]GSE22866, [148]GSE50161, and [149]GSE4290.

   Abbreviation: DER, differentially expressed RNA.

Establishment of lncRNA-mRNA networks

   To explore the relationships between the six prognostic lncRNAs and
   genes in the yellow and red modules, lncRNA-mRNA networks were
   constructed for the two modules, respectively ([150]Figure 7A and B).
   For the red module, the lncRNA-mRNA network was composed of two lncRNAs
   (C20orf166-AS1 and LINC00645) and 206 genes, of which, five were
   downregulated DERs and 72 were upregulated ([151]Figure 7A). For the
   yellow module, the lncRNA-mRNA network contained four lncRNAs
   (LBX2-AS1, LINC00641, PRRT3-AS1, and LINC00565) and 217 genes, of
   which, four were downregulated DERs and 97 were upregulated
   ([152]Figure 7B).

Figure 7.

   [153]Figure 7
   [154]Open in a new tab

   LncRNA-mRNA networks for the identified prognostic lncRNAs in the red
   module (A) and the yellow module (B).

   Notes: A round node stands for a gene, while a square node stands for
   an lncRNA. A regular triangle represents an upregulated gene, while an
   inverted triangle represents a downregulated gene. Green or red link
   signals negative or positive association, respectively, between two
   nodes.

   Abbreviation: lncRNA, long non-coding RNA.

Pathway analysis

   We conducted pathway enrichment analysis using GSEA for the two
   lncRNA-mRNA networks. It was revealed that C20orf166-AS1 in the red
   module was significantly enriched in antigen processing and
   presentation and cell adhesion molecules (CAMs) pathways ([155]Table
   3). The two pathways involved nine common genes: major
   histocompatibility complex, class II, DM alpha (HLA-DMA), major
   histocompatibility complex, class II, DM beta (HLA-DMB), major
   histocompatibility complex, class II, DP beta 1 (HLA-DPB1), CD2, sialic
   acid-binding Ig-like lectin 1 (SIGLEC1), major histocompatibility
   complex, class II, DO alpha (HLA-DOA), major histocompatibility
   complex, class II, DQ alpha 1 (HLA-DQA1), major histocompatibility
   complex, class II, DR beta 1 (HLA-DRB1), and major histocompatibility
   complex, class II, DQ beta 1 (HLA-DQB1). As shown in [156]Table 4,
   LBX2-AS1, LINC00641, PRRT3-AS1, and LINC00565 in the yellow module were
   significantly associated with cancer, focal adhesion, extracellular
   matrix (ECM) receptor interaction, and mitogen-activated protein kinase
   (MAPK) signaling pathways. Twelve common genes were involved in the
   four pathways, including laminin subunit beta (LAMB) 1, collagen type V
   alpha 2 chain (COL5A2), TGFB 1, integrin subunit alpha (ITGA) 5,
   platelet-derived growth factor receptor beta (PDGFRB), TNF receptor
   superfamily (TNFRSF) 12A, dual-specificity phosphatase (DUSP) 6,
   laminin subunit gamma (LAMC) 1, LAMC3, TNFRSF1A and myosin light chain
   (MYL) 9.

Table 3.

   Results of pathway enrichment analysis for the lncRNA-mRNA network of
   the red module
   Name                                     ES     NES    NOM P-value
   KEGG_ANTIGEN_PROCESSING_AND_PRESENTATION 0.5920 1.3432 0.0192
   KEGG_CELL_ADHESION_MOLECULES_CAMS        0.5170 1.3731 0.0311
   [157]Open in a new tab

   Note: Positive and negative NES values denote upregulation and
   downregulation of genes or lncRNAs involved in pathways, respectively.

   Abbreviations: ES, enrichment score; lncRNA, long non-coding RNA; NES,
   normalized enrichment score; NOM, nominal.

Table 4.

   Results of pathway enrichment analysis for the lncRNA-mRNA network of
   the yellow module
   Name PRRT3-AS1 LBX2-AS1
   ES NES NOM P-value ES NES NOM P-value
   KEGG_PATHWAYS_IN_CANCER −0.4574 −1.0686 0.3951 0.1585 0.3544 0.0084
   KEGG_FOCAL_ADHESION −0.3054 −0.7933 0.6885 0.2975 0.7244 0.0276
   KEGG_ECM_RECEPTOR_INTERACTION −0.3010 −0.7540 0.7215 0.2821 0.6673
   0.0381
   KEGG_MAPK_SIGNALING_PATHWAY 0.3486 0.7623 0.7955 0.2419 0.5276 0.0498
   LINC00641 LINC00565
   ES NES NOM P-value ES NES NOM P-value
   KEGG_PATHWAYS_IN_CANCER 0.1796 0.4001 0.0099 −0.4807 −1.1598 0.0031
   KEGG_FOCAL_ADHESION −0.1243 −0.3241 0.0137 −0.4050 −1.1102 0.0370
   KEGG_ECM_RECEPTOR_INTERACTION 0.1864 0.4300 0.0397 −0.3442 −0.8985
   0.0457
   KEGG_MAPK_SIGNALING_PATHWAY 0.1801 0.3902 0.0464 −0.5860 −1.3128 0.0413
   [158]Open in a new tab

   Note: Positive and negative NES values denote upregulation and
   downregulation of genes or lncRNAs involved in pathways, respectively.

   Abbreviations: ES, enrichment score; lncRNA, long non-coding RNA; NES,
   normalized enrichment score; NOM, nominal.

Discussion

   Increasing evidence indicates that a growing number of lncRNAs are
   associated with various cancer types.[159]^27 This discovery leads to a
   growing interest in the study of lncRNAs in GBM. Based on the gene
   expression data of GBM from CGGA, NCBI GEO, and EBI ArrayExpress, we
   identified a prognostic signature of six lncRNAs (C20orf166-AS1,
   LINC00645, LBX2-AS1, LINC00565, LINC00641, and PRRT3-AS1) through a
   combination of WGCNA, univariate Cox regression analysis, and LASSO PH
   model. Moreover, a six-lncRNA-based risk scoring system was constructed
   and capable to classify GBM patients into two risk groups with
   significantly different survival rates. The prognostic performance of
   the risk scoring model was successfully validated in two independent
   sets. It indicates that the six lncRNAs are promising prognostic
   biomarkers for GBM and may play important roles in tumorigenesis of
   GBM.

   LINC00645 is an endometrial cancer-specific lncRNA.[160]^28 Emerging
   studies have proved that C20orf166-AS1 is aberrantly expressed in
   prostate cancer and bladder cancer.[161]^42^,[162]^43 However, the
   involvement of LINC00645 and C20orf166-AS1 in GBM has not been reported
   yet. In the present study, C20orf166-AS1 was identified as an important
   lncRNA of prognostic value for GBM. Moreover, pathway enrichment
   analysis showed that C20orf166-AS1 was significantly related to antigen
   processing and presentation and CAMs pathways, and both pathways shared
   HLA-DMA, HLA-DMB, HLA-DPB1, CD2, SIGLEC1, HLA-DOA, HLA-DQA1, HLA-DRB1,
   and HLA-DQB1. Among the nine common genes, HLA-DMA, HLA-DMB, HLA-DPB1,
   HLA-DOA, HLA-DQA, HLA-DRB1, and HLA-DQB1 are major histocompatibility
   complex class II molecules that are mainly expressed on
   antigen-presenting cells and play an important role in immune response.
   The protein encoded by CD2 gene is a CAM located on the surface of T
   cells and NK cells, and it acts as a specific marker for these
   cells.[163]^29 SIGLEC1 protein is a member of siglecs that are
   predominately expressed on the surface of immune cells and bind to
   glycans enclosing sialic acids.[164]^30 Interactions between siglecs
   and glycans are implicated in cell adhesion and cell signaling. These
   findings reveal that C20orf166-AS1 might participate in immune response
   and cell adhesion in GBM through the regulation of these genes in
   antigen processing and presentation and CAM pathways.

   Recent studies report that upregulation of LBX2-AS1 has been observed
   in lung cancer.[165]^31^,[166]^32 Interestingly, LBX2-AS1 is
   significantly upregulated with the increase of tumor grade in
   GBM,[167]^33 suggesting that this lncRNA probably has an important
   regulatory role in GBM prognosis. Alterations of LINC00641, PRRT3-AS1,
   and LINC00565 in cancer have been scarcely reported. The current study
   provided evidence that the four lncRNAs had predictive value for
   survival of GBM patients. Notably, the study uncovered that they were
   significantly linked to focal adhesion, ECM receptor interaction, and
   MAPK signaling pathways. These pathways involved 12 common genes,
   including LAMB1, COL5A2, TGFB1, ITGA5, PDGFRB, TNFRSF12A, DUSP6, LAMC1,
   LAMC3, TNFRSF1A, and MYL9. Increasing evidence has established that
   MAPK pathway is involved in regulating GBM cell migration and
   proliferation.[168]^34^,[169]^35 LAMB1, LAMC1, and LAMC3 are members of
   ECM glycoproteins. COL5A2 encodes an alpha chain for fibrillar
   collagen, a major component of ECM proteins.[170]^36 TGF-β1 is a member
   of TGF β superfamily and plays a role in the regulation of growth,
   proliferation, and differentiation of glioma cells.[171]^37 Integrin
   alpha-5 protein encoded by ITGA5 belongs to the integrin alpha chain
   family that is critical for cell adhesion.[172]^38 Recently, it is
   found that PDGFRB is elevated in GBM microvascular proliferation
   compared to GBM tumor cells and selectively expressed PDGFRB protein in
   pericytes.[173]^39 DUSP6 protein belongs to the dual-specificity
   protein phosphatase subfamily that acts as a negative regulator over
   MAPK members.[174]^40 Besides, it has been found that DUSP6 is
   upregulated in GBM and promotes the development of GBM.[175]^41 These
   results imply that the involvement of LBX2-AS1, LINC00641, PRRT3-AS1,
   and LINC00565 in GBM may be involved in focal adhesion, ECM receptor
   interaction, and MAPK signaling pathways. These common genes might be
   potential therapeutic targets for GBM.

Conclusion

   Based on the comprehensive analysis of publicly accessible GBM data in
   CGGA, NCBI GEO, and EBI ArrayExpress, this study identifies a novel
   six-lncRNA signature for GBM prognostic prediction. This study also
   highlights the pathways and genes involved in the regulatory mechanisms
   underlying these prognostic lncRNAs. Further studies are warranted
   prior to the application of this lncRNA signature in clinical practice.

Availability of data and material

   The raw data were collected and analyzed by the authors, and they are
   not ready to share their data because the data have not been published.

Footnotes

   Author contributions

   RL and YQZ participated in the design of this study, and they both
   performed the statistical analysis. GZ and BZ carried out the study and
   collected important background information. HZ and MW drafted the
   manuscript. All authors read and approved the final manuscript. All
   authors contributed toward data analysis, drafting and revising the
   paper and agree to be accountable for all aspects of the work.

   Disclosure

   The authors report no conflicts of interest in this work.

References