Abstract

Background

   Gliomas are the most common primary tumors of the central nervous
   system. The complexity and heterogeneity of the tumor makes it
   difficult to obtain good biomarkers for drug development. In this
   study, through The Cancer Genome Atlas (TCGA) and Chinese Glioma Genome
   Atlas (CGGA), we analyze the common diagnostic and prognostic moleculer
   markers in Caucasian and Asian populations, which can be used as drug
   targets in the future.

Methods

   The RNA-seq data from Genotype-Tissue Expression (GTEx) and The Cancer
   Genome Atlas (TCGA) were analyzed to identify signatures. Based on the
   signatures, the prognosis index (PI) of every patient was constructed
   to predict the prognostic risk. Also, gene ontology (GO) functional
   enrichment analysis and KEGG analysis were conducted to investigate the
   biological functions of these mRNAs. Glioma patients’ data in the CGGA
   database were introduced to validate the effectiveness of the
   signatures among Chinese populations. Excluding the previously reported
   prognostic markers of gliomas from this study, the expression of HSPA5
   and MTPN were examined by qRT-PCR and immunohistochemical assay.

Results

   In total, 20 mRNAs were finally selected to build PI for patients from
   TCGA, including 16 high-risk genes and four low-risk genes. For Chinese
   patients, the log-rank test p values of PI were both less than 0.0001
   in two independent datasets. And the AUCs were 0.831 and 0.907 for 3
   years of two datasets, respectively. Moreover, among these 20 mRNAs, 10
   and 15 mRNAs also had a significant predictive effect via univariate
   COX analysis in CGGA_693 and CGGA_325, respectively. qRT-PCR and
   Immunohistochemistry assay indicated that HSPA5 and MTPN over-expressed
   in Glioma samples compared to normal samples.

Conclusion

   The 20-gene signature can forecast the risk of Glioma in TCGA
   effectively, moreover it can also predict the risks of Chinese patients
   through validation in the CGGA database. HSPA5 and MTPN are possible
   biomarkers of gliomas suitable for all populations to improve the
   prognosis of these patients.

   Keywords: Gliomas, Bioinformatics, Gene Signatures, Prognosis, LASSO

Introduction

   Gliomas are the most common primary tumors of the central nervous
   system ([36]Schneider et al., 2010; [37]Wen & Kesari, 2008). According
   to the World Health Organization (WHO) classification of neurological
   tumors, they were classified into grade I–IV ([38]Louis et al., 2007).
   The treatment mode is surgical resection combined with chemotherapy and
   radiotherapy ([39]Mrugala, 2013). The prognosis of most patients is
   still poor due to its high incidence, high recurrence rate, high
   fatality rate and low cure rate. Epidemiological studies have found
   that the median survival time of patients with low-grade gliomas is 6
   to 8 years, and that of patients with glioblastomas is only 12 to 18
   months ([40]Yang, Zhou & Lin, 2014).

   With the continuous development of genome sequencing technology, a
   variety of new treatment methods, such as targeted therapy and
   immunotherapy, have been adopted in the first-line treatment. But there
   is no better way to treat gliomas because of their own complexity and
   heterogeneity that makes it difficult to obtain good biomarkers for
   drug development. Therefore, based on The Cancer Genome Atlas (TCGA)
   and the Chinese Glioma Genome Atlas (CGGA), two important genomic
   databases of gliomas, this study analyzes the common diagnostic and
   prognostic markers in Caucasian and Asian populations, which can be
   used as drug targets in the future.

   The heterogeneity of gliomas includes histological and genetic
   complexity. A number of genetic tests have been included in the latest
   WHO classification of tumors in the central nervous system, and some
   genotypes and phenotypes have been included in the main diagnostic
   indicators ([41]Louis et al., 2016). In light of the genetic
   characteristics, gliomas can be divided into Isocitrate dehydrogenase
   (IDH) mutation and O-6-methylguanine-DNA methyltransferase(MGMT)
   methylation. The research focuses involve IDH mutation, 1P/19Q combined
   deletion, MGMT promoter methylation, etc. Many studies have shown that
   IDH mutation is a common initial factor in all types of gliomas, and
   high-grade glioma patients with IDH mutation have a significantly
   better prognosis than those without mutation ([42]Leu et al., 2016).
   Deletion of chromosome1P/19Q is closely related to the occurrence and
   development of oligodendroglioma. Previous studies have confirmed that
   1P/19Q combined deletion is not only a favorable prognostic factor in
   oligodendroglioma, but also a marker of its sensitivity to radiotherapy
   and chemotherapy ([43]Zhang et al., 2014). The occurrence and
   development of various malignant tumors (such as lymphoblastoma, lung
   cancer, esophageal cancer) are related to MGMT protein expression loss
   and DNA repair disorder caused by MGMT gene promoter methylation.
   Multiple studies have shown that glioma patients with MGMT promoter
   methylation are more sensitive to chemotherapy and their survival time
   is longer than those without methylation ([44]Jiang et al., 2014).
   Although a couple of histological subtypes and molecular subtypes are
   already found, the prognosis is bad for some glioma patients.

   Studies on protein targets mainly focus on epidermal growth factor
   receptor, vascular endothelial growth factor receptor, platelet-derived
   growth factor receptor, RAS-RAF-MEK-ERK pathway, PI3K/AKT/mTOR pathway,
   protein kinase C pathway, and multi-target kinase inhibitors, etc
   ([45]Hide et al., 2011; [46]Wick et al., 2011; [47]Sathornsumetee et
   al., 2007; [48]Riolfi et al., 2010; [49]Xiao et al., 2020; [50]Lo,
   2010; [51]Fan & Weiss, 2010; [52]Huang et al., 2020). Molecular and
   genetic analysis to explore the pathogenesis of gliomas and the
   clinical application of targeted therapy for each subtype of gliomas is
   also a current research hot spot where deletion of PTEN gene, deletion
   or mutation of cyclin-dependent kinase inhibitor protein, etc. are
   studied. The increase of epidermal growth factor receptor protein is
   the main manifestation of primary gliomas. Hall et al. suggested that
   inhibition of P53 and activation of MYC signaling pathways in normal
   astrocytes exposed to GBM-EVs might be a mechanism by which
   glioblastoma(GBM) manipulates astrocytes to acquire a phenotype that
   promotes tumor progression ([53]Hallal et al., 2019).

   Bioinformatics methods have also been employed to find connections
   between RNA and gliomas. Through bioinformatics analysis, [54]Zhang et
   al. (2010) found that Mir-221/222 might jointly regulate about 70
   targeted genes and play a synergistic regulatory function in gliomas
   through the Akt signaling pathway. LncRNA including ASLNC22381 and
   ASLNC20819 also play important roles in the development and
   pathological mechanism of gliomas ([55]Han et al., 2012). In our
   preceding work, we detected that increased expression of neuropilin-1
   gene indicated poor prognosis for patients ([56]Dai et al., 2017). Our
   team also used mRNAseq and micro (mi)RNAseq data to construct a
   co-expression network of gliomas and revealed the prognostic molecular
   signature of grade III gliomas. A total of 37 mRNAs and 10 miRNAs were
   identified, which were closely associated with the survival rates of
   patients with grade III gliomas ([57]Bing et al., 2016). But non-coding
   RNAs were difficult to develop drug. The mRNA encoding protein is more
   suitable for drug development. Data mining of public sources of gene
   expression is an effective way to identify novel tumor-associated
   genes, and this work may contribute to the identification of candidate
   genes for glioma angiogenesis ([58]Su et al., 2013).

   In this study, we identified protein coding gene signature from TCGA
   and CGGA, and screened the top-20 differential expression genes that
   are related to prognosis. By literature search, we excluded the
   previously reported prognostic biomarkers of gliomas, and obtained two
   gene signatures (HSPA5 and MTPN) that can predict the prognosis of both
   white and Asian populations ([59]Norris et al., 2016; [60]Li et al.,
   2014; [61]Ge et al., 2018; [62]Yan et al., 2019). We also collected
   some clinical specimens to verify the expression of HSPA5 and MTPN in
   glioma and normal brain tissue. We hope to find biomarkers of gliomas
   suitable for all populations to improve the prognosis of these
   patients.

Materials and Methods

Data resource and preprocessing

   In order to obtain the genes related to the prognosis of GBM
   patients, the phenotype information and gene expression data of tumor
   samples and those of normal samples were successively collected from
   TCGA database and The Genotype-Tissue Expression (GTEx) project,
   according to overall survival (OS) of glioblastoma (GBM) patients. The
   standardized form of TCGA data were collected from UCSC Xena
   ([63]http://xena.ucsc.edu), where gene expression was recorded as the
   log2 transformed FPKM (Reads Per Kilobase of exon model per Million
   mapped reads). To validate whether the gene biomarkers were valid in
   Chinese people, two datasets were downloaded from CGGA
   ([64]http://www.cgga.org.cn/). The following three kinds of samples
   were excluded: (1) patients with incomplete information of either
   phenotype or gene expression, (2) patients with OS time less than 30
   days, (3) patients with recurrent tumor. In the TCGA database, GBM
   sample included 147 samples and 196 normal tissue that selected from
   GTEx database. In the CGGA database, three datasets were included in
   this database. Of these datasets, CGGA_639 and CGGA_325 were tested by
   RNA-Seq. Another one was performed by Affymatrix chip. So, we selected
   two datasets that used the same platform.

Differential expression analysis

   To identify the genes related to overall survival of GBM patients, the
   differentially expressed mRNAs between GBM patients and normal
   individuals were selected at first. The Limma package of R 3.6.1 was
   employed to analyze the data ([65]https://www.r-project.org/). The
   threshold was set as follows: the adjusted p value was less than 0.001,
   abstract of log2-fold-change was larger than 1, B>5 and AveExpr>5. The
   p value was calculated via Student-t test and adjusted by
   Benjamini–Hochberg method ([66]Puoliväli, Palva & Palva, 2020). The
   differentially expressed genes (DEGs) could be explained as genes
   expressed differently between GBM and normal tissue.

The predictive prognostic genes

   Univariate Cox (Uni-Cox) proportional hazard regression and least
   absolute shrinkage and selection operator (LASSO) were employed to
   screen the genes that are related to survival among TCGA GBM patients.
   Uni-Cox was applied to identify the independent effect related to
   overall survival of each DEG. The hazard ratio (HR) of each mRNA was
   calculated according to following equation:
   [MATH: <mstyle displaystyle="true"><mi
   mathvariant="normal">HR</mi><mo>=</mo><msup><mrow><mi>e</mi></mrow><mro
   w><mi>β</mi></mrow></msup></mstyle> :MATH]
   (1)

   where β represents the coefficient from Uni-Cox. Here Survival package
   of R was applied.

   Then to simplify the predictive genes, least absolute shrinkage and
   selection operator (LASSO) method was adopted, which further filtered
   the statistically significant mRNAs further ([67]Zhang et al., 2019).
   Package glmnet of R was applied.

Prognostic index model construction

   Based on the identified prognostic genes, a risk value named prognosis
   index (PI) was calculated for every patient as an integrated signature.
   PI was calculated as follows:
   [MATH: <mstyle displaystyle="true"><mi
   mathvariant="normal">PI</mi><mo>=</mo><munderover><mrow><mo
   mathsize="big" movablelimits="false">
   ∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>m</mi
   ></mrow></munderover><msub><mrow><mi>β</mi></mrow><mrow><mi>i</mi></mro
   w></msub><mo>×</mo><msub><mrow><mi>E</mi></mrow><mrow><mi>i</mi></mrow>
   </msub></mstyle> :MATH]
   (2)

   where β[i] represents the coefficient of the involved gene i, and E[i]
   represents the corresponding gene’s expression level. Then normalized
   PI was calculated as follows:
   [MATH: <mstyle
   displaystyle="true"><mtext>normalized</mtext><mi>P</mi><msub><mrow><mi>
   I</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>=</mo><mfrac><mrow><mi>P
   </mi><msub><mrow><mi>I</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>−</
   mo><mi>m</mi><mi>e</mi><mi>a</mi><mi>n</mi><mrow><mfenced separators=""
   open="("
   close=")"><mi>P</mi><mi>I</mi></mfenced></mrow></mrow><mrow><mi>s</mi><
   mi>d</mi><mrow><mfenced separators="" open="("
   close=")"><mi>P</mi><mi>I</mi></mfenced></mrow></mrow></mfrac></mstyle>
   :MATH]
   (3)

   where mean(PI) represents the mean and sd(PI) represents the standard
   deviation of all PIs, respectively. The PIs below all referred to
   normalized PI. In order to classify the patients of high risks and low
   risks, the median of PIs was set as a cut-off. If the PI of a patient
   is larger than the cut-off, the patient is going to be assigned to high
   risk group, and be predicted with a bad OS, otherwise the patient will
   be predicted to have low risks of death.

Analysis of clinical confounding factors with PI

   Several clinical characteristics might correlate with the OS of GBM
   patients. In TCGA database, age, KPS score, cancer status, gender and
   race were considered as the main clinical confounding factors affecting
   prognosis ([68]Xiong et al., 2014). Therefore, we explored the role of
   age, KPS score, cancer status, gender, race and PI in TCGA datasets via
   Uni-Cox. Following this, multivariable Cox (Multi-Cox) proportional
   hazard regression was also conducted to explore the joint effect of
   these clinical factors and the previously calculated PI, which allowed
   comparison of the prognostic value of PI to that of each clinical
   factor. In the CGGA database, clinical variables contain more
   information, which is helpful to understand the influence of various
   confounding factors on Cox regression. In the CGGA database, it mainly
   includes grade, gender, age, radiotherapy status, chemotherapy status,
   IDH mutation status and 1p19q codeletion status. The inclusion and
   exclusion criteria of the cases to enter into the multivariable model
   were listed as follows: (1) patients with clinical data were selected;
   (2) untreated, primary (de novo) GBM patients were selected; (3)
   patients with history of neoadjuvant treatment were not admissible; (4)
   patients more than 30-day survival were selected. Variables with
   p-values < 0.05 were selected as candidates entering into the
   multivariable model.

Survival analysis and model test

   To evaluate the predictive effect of PI, Kaplan–Meier survival curves
   were created. According to the log-rank test, the PI is considered as a
   good predictor if the p value is less than 0.05. Package Survminer of R
   was used.Then the time-dependent receiver operating characteristic
   (ROC) curve analysis was introduced to assess the model. Package
   survival ROC of R was applied and the area under the curve (AUC) was
   calculated. If AUC equals to 0.5, it indicates that the predictive
   effect of the model is tantamount to random allocation of patients. But
   if AUC is more than 0.5, it implies that its predictive effect is
   superior to random allocation ([69]Kottas, Kuss & Zapf, 2014).

Geno ontology (GO) functional enrichment

   To analyze the basic biological function of genes identified in our
   model, package ClusterProfiler ([70]Yu et al., 2012) of R was adopted
   to conduct the GO analysis of functional process and Kyoto Encyclopedia
   of Genes and Genomes (KEGG) pathway enrichment analysis. The p-value
   cut-off is less than 0.05 as significant enrichment threshold.

Validation in CGGA dataset

   Because most of the patients in the sample set is white, and only 8
   samples are Asians (accounting for 5.5% of all samples), whether the
   model is valid for Chinese patients is unknown. Therefore, the
   validation of the model was conducted in two datasets collected from
   CGGA, one of which named CGGA_325 (a dataset with 325 Chinese GBM
   patients) and the other is CGGA_693 (a dataset with 693 Chinese GBM
   patients), respectively. The two datasets contain different sample
   numbers, among which CGGA_693 contains 693 samples, CGGA_325 contains
   325 samples. The two datasets also contain different types of samples.
   CGGA_693 database contains primary LGG, recurrent LGG, primary GBM,
   recurrent GBM. CGGA_325 database contains primary LGG, recurrent LGG,
   primary GBM, recurrent GBM and secondary GBM. The PI of each patient
   was calculated according to the mRNAs identified in the model, and then
   K-M curves and ROC curves analysis were performed. Meanwhile, several
   clinical factors recorded in CGGA datasets were added into
   multivariable COX model to validate the predictive effects of PI when
   adjusted for confounders.

Human tissue samples

   A total of 30 glioma tissue specimens and 10 cases of peritumor brain
   tissue (used control) resected by corticostomy were collected from
   Lanzhou University Second Hospital between October 2019 and May 2020.
   All these patients did not receive preoperative radiotherapy,
   chemotherapy or other immunotherapy, and were confirmed by surgical
   pathology. According to the WHO classification, there were 15 cases of
   WHO Grade II. Among them, 9 cases are of males, 6 cases are of females
   and the age ranges from 21 to 59 (40.5 ± 9.6). There were 9 cases with
   diffuse astrocytoma (A), 4 cases with oligodendroglioma (O), 2 cases
   with oligoastrocytoma (OA). There were 15 cases of WHO Grade III/IV,
   among which 6 cases are of males, 9 cases are of females, and the age
   ranges from 29 to 69 (47.9 ± 12.9). There were 4 cases with Anaplastic
   astrocytoma (AA), 1 case with anaplastic oligodendroglioma (AO), 4
   cases with anaplastic oligoastrocytoma (AOA), 6 cases with glioblastoma
   (GBM). Among control groups, 5 cases are of males, 5 cases are of
   females, and the age ranges from 31 to 69 (49.7 ± 11.4). This study was
   approved by the Ethics Committee of Lanzhou University Second Hospital
   (2020A-147), and all patients signed informed consent before surgery.

RNA extraction and qRT-PCR

   Total RNA from all cells and tumor tissues were isolated using Trizol
   reagent (Takara, Dalian, China) and the first-strand cDNA was converted
   using the PrimeScript RT reagent Kit with genomic DNA Eraser (Takara).
   Then, TB Green Premix ExTaq (Takara) was used to perform qRT-PCR
   (Bio-Rad CFX96). Based on the results of differential analysis, GAPDH
   was selected as reference gene in glioma samples. Relative gene
   expression was evaluated by a comparative CT method (2-ΔΔCtmethod).
   Statistical significance of qRT-PCR data was analyzed using IBM SPSS
   Statistics 22.0 software(Armonk, NY, USA) and determined by the
   Student’s t-test. p < 0.05 was considered to be statistically
   significant. Primers used were as following: HSPA5 forward:
   5′-GACATCAAGTTCTTGCCGTTCA-3′, HSPA5 reverse: 5′-CCAGCAATAGTTCCAGCG
   TCTT-3′; MTPN forward: 5′-CGGAGACTTGGATGAGGTGAA-3′, MTPN reverse:
   5′-AGAGCTTTGATTGCCTGGTTG-3′; GAPDH forward: 5′-GGAAGCTTGTCATCAAT
   GGAAATC-3′, GAPDH reverse: 5′-TGATGACCCTTTTGGCTCCC-3′.

Western blot assay

   Tissue samples were lysed in RIPA lysis buffer, and lysates were
   harvested by centrifugation (12,000 rpm) at 4 °C for 30 min. Western
   blot was performed in accordance with the protocols as described above,
   using β-actin as the internal control. Briefly, the protein sample was
   separated by SDS-PAGE electrophoresis and then transferred to a PVDF
   membrane. After blocking nonspecific binding sites for 60 min with 5%
   non-fat milk, the membrane was incubated with the primary antibody at
   4 °C overnight. Membranes were washed three times with tris buffered
   saline with 1‰tween-20 and incubated with horseradish
   peroxidase-conjugated secondary antibody at 37 °C for 1 h. After 3
   washes, the bands were detected by an enhanced chemiluminescence system
   (WBKLS0500, Merck KGaA). Band density was measured using ImageJ
   software (National Institutes of Health, Bethesda, MD) and standardized
   to that of β-actin. Antibodies against proteins and dilution multiples
   are as follows: HSPA5(ab21685, abcam, USA) (1:1000),
   MTPN(bs-11891R,bioss,China) (1:1000), β-actin (service, China)(1:3000).

Immunohistochemical assay

   Two-Step method was adopted to detect the expression of HSPA5 and MTPN.
   Tissue specimens were fixed promptly with 100g/L formaldehyde solution,
   embedded in paraffin and cut into 3 µm sections. All sections were
   dehydrated with graded alcohol, repaired with antigen, quenched with
   peroxidase solution, put into tap water, and washed with PBS. The
   slides were incubated in a moist chamber with HSPA5 or MTPN rabbit
   polyclonal antibody (1:100) at 37 °C for 30min. Then they were
   incubated in a moist chamber with the goat polyclonal antibody against
   rabbit at 37 °C for 30min. After being washed in PBS completely, the
   slides were developed in 0.05% freshly prepared diaminobenzedine
   solution (DAB) for 10 min, and then counterstained with hematoxylin.
   Finally, the slides were dehydrated in ascending concentrations of
   ethanol, airdried, and mounted. The expression of HSPA5 and MTPN were
   scored according to the degree of staining and the number of stained
   cells. Degree of staining: 0 for non-staining, 1 for light yellow, 2
   for brownish, and 3 for tan. Stained cell counts are defined as
   follows: Under high power microscope, 5 fields were randomly selected
   from each section to count the percentage of stained cells; 0 point is
   given to match the stained cells which represent less than 5%; 1 point
   matches 5% to 25%; 2 points matches 26%∼50%; 3 points matches 51%∼75%;
   4 points matches more than 75%. The product of the two scores is
   scored:0 is negative, 1-3 is weakly positive, 4-5 is moderately
   positive, and ≥6 is strongly positive. Negative and weak positive were
   classified as negative expression, while moderate positive and strong
   positive were classified as positive expression.

Statistical analysis

   Statistical analysis of human tissue data was expressed as mean ± SD
   (standard deviation) from three independent experiments. Differences
   between groups were estimated using the Student’s t-test. Discrete data
   values were expressed as rate and analyzed by chi-squared test. All
   these analyses were performed using IBM SPSS Statistics 22.0 software
   (Armonk, NY, USA) and a two-tailed value of P < 0.05 was considered
   statistically significant.

Results

DEGs Between GBM samples and Non-GBM samples

   By preprocessing, 147 GBM patients with 19,199 genes in TCGA and 196
   non-GBM individuals with 19199 genes in GTEx were collected to do the
   following analysis.

   Through the analysis of differential gene expressions, 581 DEGs were
   left in the model according to the threshold. Among them, 138 mRNAs
   were highly expressed and 443 mRNAs were lowly expressed. [71]Figure 1
   shows the volcano plots of the results of DEGs. For further biological
   experiment, we filtered average expression more than 5 as threshold.
   The gene significantly expression showed in [72]Fig. 1A. Filtering
   genes were showed in [73]Fig. 1B.

Figure 1. Volcano plots of the DEGs.

   [74]Figure 1
   [75]Open in a new tab

   (A) Blue dots implied the genes were down-regulated and red dots were
   up-regulated. The grey plots implied the mRNAs were not differentially
   expressed significantly. (B) Blue and red dots indicate genes with an
   average expression of more than 5.

The survival related genes of GBM

   Through Uni-cox regression, 50 mRNAs were identified as survival
   related genes according to p value less than 0.05 in TCGA database.
   Then LASSO method was employed and 20 mRNAs were finally selected. The
   information of these 20 genes were listed in [76]Table 1. Among them,
   16 mRNAs were high-risk genes with HRs from 1.230 to 2.039, which meant
   that the higher their expression, the worse OS the patient might have.
   In contrast, 4 genes were protective factors with HRs from 0.789 to
   0.552.

Table 1. Survival related genes (TCGA).

    Genes   Coefficients of Uni-Cox  HR   95%CI for HR    p-value
   High Risk Genes
   RNF10    0.713                   2.039 1.013–4.105  0.046[77]^*
   MTPN     0.593                   1.809 1.141–2.868  0.012[78]^*
   RTN4     0.540                   1.717 1.078–2.734  0.023[79]^*
   HSPA5    0.518                   1.679 1.175–2.398  0.004[80]^**
   PLD3     0.511                   1.667 1.134–2.451  0.009[81]^**
   GRN      0.495                   1.640 1.176–2.287  0.004[82]^**
   FLII     0.467                   1.595 1.068–2.383  0.023[83]^*
   NDUFB2   0.397                   1.488 1.026–2.158  0.036[84]^*
   DKK3     0.365                   1.441 1.133–1.831  0.003[85]^**
   MAP1LC3A 0.361                   1.435 1.130–1.822  0.003[86]^**
   SERPINE2 0.329                   1.390 1.079–1.791  0.011[87]^*
   TTYH3    0.285                   1.330 1.056–1.674  0.015[88]^*
   SCG5     0.258                   1.294 1.037–1.615  0.023[89]^*
   FN1      0.250                   1.284 1.047–1.576  0.017[90]^*
   TAGLN2   0.239                   1.270 1.051–1.534  0.013[91]^*
   LY6E     0.207                   1.230 1.017–1.489  0.033[92]^*
   Low Risk Genes
   RPS19    −0.241                  0.786 0.620–0.996  0.046[93]^*
   EIF3L    −0.441                  0.643 0.469–0.883  0.006[94]^**
   EIF4A2   −0.447                  0.639 0.430–0.951  0.027[95]^*
   FDPS     −0.594                  0.552 0.333–0.916  0.021[96]^*
   Integrated Genes
   PI       0.976                   2.653 1.780–3.955  <0.001
   [97]Open in a new tab

   Notes.
   ^*

   p < 0.05.
   ^**

   p < 0.01.

PI construction

   According to [98]Eqs. (2) and [99](3), the PI of each patient was
   calculated and the cut-off was set to 0.207, which implied patients
   with PIs larger than 0.207 were considered with high risks of overall
   survival. Ordered PIs were depicted in [100]Fig. 2. The HR of PI, which
   was displayed in the last line of [101]Table 1, was 2.653 with 95% CI
   from 1.976 to 4.122. The p-value of Uni-Cox of PI was less than 0.001.

Figure 2. Distribution of patients according to PI value.

   [102]Figure 2
   [103]Open in a new tab

Clinical factors and their joint effect with PI

   [104]Table 2 summarizes the results of clinical factors analysis. Five
   clinical factors were included in the analysis according to the
   characteristics of the original training set. They are age, karnofsky
   performance score (KPS), cancer status, gender and race of patients.
   Based on Uni-cox and survival analysis, it showed that age, KPS and
   cancer status were independent significant factors to OS of GBM
   patients. Then, through multivariate analysis, only KPS, cancer status
   and PI were statistically significant. HR of PI is 2.653 (95% CI
   [1.780–3.955], p < 0.001), which demonstrated PI is significant risk
   factor.

Table 2. Cox hazard regression of clinical factors and PI.

   Factors Uni-Cox Multi-Cox
   Numbers Coef HR 95%CI for HR p p
   Age (>= 45 VS <45) 128/19 0.982 2.669 1.382–5.155 0.004[105]^** 0.202
   KPS (>= 50 VS <50) 105/7 −1.606 0.201 0.077–0.523 0.001[106]^**
   0.001[107]^**
   Cancer status (WITH TUMOR VS TUMOR FREE) 118/12 1.351 3.860 1.535–9.707
   0.004[108]^** 0.049[109]^*
   Gender (male VS female) 95/52 −0.091 0.913 0.624–1.338 0.641 0.555
   Race (white VS not white) 133/13 −0.090 0.914 0.444–1.882 0.808 0.375
   PI (high-risk VS low-risk) 73/74 0.976 2.653 1.780–3.955 <0.001 <0.001
   [110]Open in a new tab

   Notes.
   ^*

   p < 0.05.
   ^**

   p < 0.01.

Survival analysis and model test

   [111]Figure 3 shows the results of K-M curves of PI. The red line was
   the survival curve of the low-risk patients and the green line was that
   of high-risk patients. For 500 days, 31 patients (41%, 31/74) were
   alive in the low-risk group and 11 patients (15%, 11/73) in the
   high-risk group. For 1,000 days, all patients in the high-risk groups
   were dead, while 10 patients (13%, 10/74) in the low-risk group were
   still alive. And the p value of log-rank test was less than 0.001. The
   right term of the figure was the ROC curves of our model. The AUCs of 3
   years and 5 years were 0.824 and 0.820, respectively, which represented
   that our model had a good prognostic effect for GBM.

Figure 3. Survival analysis results in TCGA.

   [112]Figure 3
   [113]Open in a new tab

   (A) High-risk and low-risk groups showed significantly different
   survival. (B) ROC curve showed the performance of PI in TCGA.

Results of GO enrichment analysis

   According to GO functional enrichment analysis, 8 biological processes
   were enriched with adjusted p value less than 0.05 ([114]Fig. 4). They
   were cell adhesion molecule binding, chaperone binding, ubiquitin
   protein ligase binding, ubiquitin-like protein ligase binding, cadherin
   binding, translation initiation factor activity, translation initiation
   factor activity, translation factor activity-RNA binding and unfolded
   protein binding. DEGs mainly involved in cell adhesion molecule
   binding, chaperone binding, ubiquitin protein ligase binding,
   ubiquitin-like protein ligase binding and cadherin binding. These genes
   are engaged in the biological processes of various brain functions such
   as cerebellum morphogenesis, hindbrain morphogenesis, cerebellar cortex
   development and etc. In molecular function, these genes play a part in
   binding function.

Figure 4. GO enrichment analysis.

   [115]Figure 4
   [116]Open in a new tab

   (A) Top eight genes involved in biological process. (B) Top eight genes
   involved in molecular function.

Validation in CGGA dataset

   In validation section, we applied above results from TCGA to test in
   CGGA datasets. The Uni-Cox results of 20 mRNAs were listed in
   [117]Table 3. It showed us that among 20 identified genes through TCGA,
   4 genes (MTPN, RTN4, MAP1LC3A and DKK3) were significantly associated
   with OS. Although MTPN is not remarkably associated with OS, the
   results of calculation showed that HR>1 of MTPN and p-value showed a
   significant trend.

Table 3. Uni-Cox results in CGGA datasets.

   CGGA_693 CGGA_325
   Genes Coef HR 95%CI for HR p Coef HR 95%CI for HR P
   TTYH3 0.452 1.571 1.391–1.775 <0.001 0.729 2.072 1.743–2.464 <0.001
   TAGLN2 0.423 1.526 1.406–1.656 <0.001 0.652 1.919 1.707–2.158 <0.001
   FN1 0.336 1.399 1.301–1.504 <0.001 0.484 1.623 1.479–1.781 <0.001
   HSPA5 0.235 1.265 1.113–1.438 <0.001 0.783 2.188 1.843–2.597 <0.001
   EIF3L −0.156 0.856 0.810–0.904 <0.001 −0.772 0.462 0.358–0.597 <0.001
   SCG5 −0.088 0.916 0.863–0.972 0.004[118]^** −0.194 0.823 0.739–0.917
   <0.001
   SERPINE2 −0.128 0.880 0.806–0.961 0.005[119]^** −0.691 0.501
   0.395–0.635 <0.001
   LY6E −0.136 0.873 0.802–0.949 0.001[120]^** 0.027 1.028 0.841–1.257
   0.789
   FDPS −0.158 0.854 0.775–0.941 0.001[121]^** −0.569 0.566 0.365–0.877
   0.011[122]^*
   GRN 0.135 1.145 1.018–1.287 0.023[123]^* 0.930 2.534 2.116–3.034 <0.001
   FLII 0.127 1.136 0.997–1.294 0.056 1.463 4.318 3.239–5.756 <0.001
   MTPN 0.102 1.107 0.959–1.279 0.166 0.148 1.160 0.858–1.568 0.335
   PLD3 0.059 1.061 0.928–1.213 0.388 0.480 1.615 1.294–2.017 <0.001
   EIF4A2 −0.002 0.998 0.913–1.091 0.967 −0.557 0.573 0.412–0.798
   0.001[124]^**
   RTN4 −0.014 0.986 0.870–1.117 0.820 −0.240 0.787 0.537–1.153 0.219
   RNF10 −0.033 0.968 0.850–1.103 0.624 1.123 3.075 2.253–4.198 <0.001
   RPS19 −0.034 0.966 0.915–1.020 0.218 0.850 2.340 1.852–2.955 <0.001
   NDUFB2 −0.046 0.955 0.907–1.005 0.078 0.543 1.721 1.282–2.309 <0.001
   MAP1LC3A −0.049 0.953 0.880–1.031 0.228 −0.009 0.991 0.851–1.154 0.912
   DKK3 −0.074 0.929 0.822–1.049 0.235 −0.015 0.985 0.832–1.165 0.859
   Integrated Genes
   PI 1.649 5.200 3.872–6.983 <0.001 1.875 6.524 4.496–9.466 <0.001
   [125]Open in a new tab

   Notes.
   ^*

   p < 0.05.
   ^**

   p < 0.01.

   Seven of these genes only were statistically significant via Uni-Cox in
   either CGGA_693 or CGGA_325 (LY6E, FLII, PLD3, EIF4A2, RNF10, RPS19 and
   NDUFB2). The other 9 genes (TTYH3, TAGLN2, FN1, HSPA5, EI3L, SCG5,
   SERPINE2, FDPS and GRN) were still prognostic genes in both CGGA_693
   and CGGA_325. Among these nine genes, SCG5 and SERPINE2 showed the
   opposite effects in CGGA and TCGA, which might because of racial
   difference. And the seven genes mentioned above showed the same effects
   (protective factor or risk factor) between TCGA and CGGA, which might
   be the real prognostic genes of GBM. The HRs of PI in CGGA_693 and
   CGGA_325 were 5.200 (95%CI [3.872–6.983]) and 6.524 (95%CI
   [4.496–9.466]), respectively. Thus our 20-gene model still had a good
   predictive effect among Chinese GBM patients. As is shown in [126]Table
   4, WHO clinical grade and 1p19q codelection status were also
   significant high-risk clinical factors of GBM patients for overall
   survival. Accounting for clinical confounders, PI was still a critical
   prognostic predictor in both CGGA_693 and CGGA_325 datasets.

Table 4. Validation in CGGA datasets adjusted for clinical factors.

   CGGA_693 CGGA_325
   Coef HR 95%CI for HR P Coef HR 95%CI for HR P
   PI (high-risk VS low-risk) 0.884 2.422 1.577–3.718 <0.001 0.667 1.948
   1.185–3.202 0.009[127]^**
   Grade (WHO III VS WHO II) 1.260 3.524 2.091–5.939 <0.001 1.129 3.094
   1.716–5.580 <0.001
   Grade (WHO IV VS WHO II) 1.731 5.648 3.097–10.301 <0.001 1.380 3.974
   2.150–7.348 <0.001
   Gender (Male VS Female) −0.009 0.991 0.702–1.399 0.960 −0.064 0.938
   0.643–1.370 0.742
   Age (>= 45 VS <45) 0.385 1.469 1.025–2.107 0.036[128]^* 0.203 1.224
   0.805–1.862 0.344
   Radio_status (Radio_therapy VS Non-Radio_therapy) −0.659 0.517
   0.305–0.877 0.014[129]^* −0.102 0.903 0.489–1.667 0.745
   Chemo_status (Chemo_therapy VS Non-Chemo_therapy) −0.186 0.830
   0.518–1.330 0.439 −0.359 0.698 0.452–1.079 0.106
   IDH_mutation_status (Wildtype VS mutant) 0.336 1.400 0.897–2.184 0.139
   0.143 1.153 0.703–1.893 0.573
   1p19q_codeletion_status (Non-code VS Codel) 1.081 2.947 1.510–5.751
   0.002[130]^** 1.491 4.442 2.106–9.369 <0.001
   [131]Open in a new tab

   Notes.
   ^*

   p < 0.05.
   ^**

   p < 0.01.

Verification gene signature performance of CGGA dataset

   Gene signature performance was analyzed in two CGGA datasets. Survival
   curve and ROC curve were employed to test prediction performance of the
   gene signature. [132]Figure 5 showed that the p-values of log-rank test
   of K-M curves were less than 0.0001 both in CGGA_693 and CGGA_325. In
   CGGA_693, 185 patients (74%, 185/248) were alive in the low-risk group
   and 44 patients (27%, 44/161) in the high-risk group for 1,000 days.
   And all patients in the high-risk group were dead, while 11 patients
   (4%, 11/248) in the low-risk group were still alive for 3,000 days. In
   CGGA_325, 104 patients (79%, 104/131) were alive in the low-risk group
   and 18 patients (19%, 18/93) in the high-risk group for 1,000 days. And
   all patients in the high-risk group were dead, while 15 patients (11%,
   15/131) in the low-risk group were still alive for 4,000 days. This
   implied that the biomarkers could efficiently classify the Chinese GBM
   patients into good and poor prognosis groups. ROC analysis represented
   that the AUCs of CGGA_693 were 0.831 and 0.808 for 3 and 5 years,
   respectively, and AUCs of CGGA_325 were 0.907 and 0.912 for 3 and 5
   years, respectively. Therefore, our forecasting model for GBM had a
   good classificaion ability in CGGA datasets. To test the predictive
   effects of our 20-gene model among Chinese GBM patients further, K-M
   curves and ROC analysis were conducted.

Figure 5. K-M curves and ROC analysis of our signature in CGGA datasets.

   [133]Figure 5
   [134]Open in a new tab

   (A) and (B) shows the K-M curves of our signature in CGGA_693 and
   CGGA_325 datasets, respectively; (C) and (D) shows the ROC curves of
   our signature in CGGA_693 and CGGA_325 datasets, respectively.

The expression of mRNA HSPA5 and MTPN by PCR

   By real-time quantitative PCR detection, compared with normal brain
   tissue (0.96 ± 0.28), the expression of mRNA HSPA5 in WHO II gliomas
   and WHO III/IV glioma tissue both increased significantly
   (1.25 ± 0.32,1.85 ± 0.70), and the differences were statistically
   significant (P < 0.05, [135]Fig. 6).

Figure 6. The expression of mRNA HSPA5 and MTPN.

   [136]Figure 6
   [137]Open in a new tab

   (A) The expression of mRNA HSPA5. (B) The expression of mRNA MTPN
   *represents p < 0.05 and **represents p < 0.01.

   Compared with normal brain tissue (0.60 ± 0.28), the expression of mRNA
   MTPN in WHO II gliomas and WHO III/IV glioma tissues also increased
   greatly (0.95 ± 0.19,1.01 ± 0.39), and the difference was statistically
   significant (P < 0.05, [138]Fig. 6).

   We filtered all gene signature in three datasets. All genes were
   significantly changed in three datasets. These genes included six genes
   (TAGLN2, HSPA5, FN1, TTYH3, GRN and MTPN). Of these genes, TAGLN2
   ([139]Beyer et al., 2018; [140]Han et al., 2017), FN1 ([141]Yu et al.,
   2020; [142]Gu, Gu & Shou, 2014; [143]Guo, Heller & Thorslund, 2016;
   [144]Liao et al., 2018), TTYH3 ([145]Weinberg et al., 2020), GRN,
   ([146]Ness, Riemenschneider & Baches., 2009; [147]Trigos et al., 2019)
   have been reported in GBM.

Western blot analysis

   Western blotting showed HSPA5 protein level in WHO II glioma tissue and
   WHO III/IV glioma tissue were both elevated obviously compared with
   normal brain tissue. MTPN protein expression in WHO II glioma tissue
   and WHO III/IV glioma tissue were also significantly higher than normal
   brain tissue ([148]Fig. 7).

Figure 7. The expression of HSPA5 and MTPN by Western blot.

   [149]Figure 7
   [150]Open in a new tab

   (A) The expression of HSPA5 protein. (B) The expression of MTPN
   protein.

Immunohistochemical analysis

   HSPA5 protein was mainly expressed in the cytoplasm while MTPN protein
   was expressed in the nucleus and cytoplasm.

   Immunohistochemical results indicated that compared with normal brain
   tissue, the positive expression of HSPA5 protein in gliomas rose
   obviously. For the most part, the expression of protein was strongly
   positive in grade III/IV glioma tissue, moderately positive in grade II
   glioma tissue, and mostly negative or weakly positive in normal brain
   tissue ([151]Figs. 8A–[152]8C, [153]8G). It is equally true of the
   expression of MTPN protein in gliomas ([154]Figs. 8D–[155]8F, [156]8H).

Figure 8. Immunohistochemical analysis in normal brain tissue and glioma
(400×).

   [157]Figure 8
   [158]Open in a new tab

   (A) Strongly positive HSPA5 expression; (B) Moderately positive HSPA5
   expression; (C) Negative HSPA5 expression; (D) Strongly positive MTPN
   expression; (E) Moderately positive MTPN expression; (F) Negative MTPN
   expression; (G) the positive rate of HSPA5 protein expression (%); (H)
   the positive rate of MTPN protein expression (%) *represents p < 0.05.

Discussion

   In recent years, the relationship between some genes or signal pathways
   and the occurrence and development of gliomas has been discoverd by
   bioinformatics analysis. In this study, many prognostic genes of
   gliomas have been identified from TCGA and CGGA databases by data
   mining and biological experiment validation. Of these genes, many
   biomarkers have been identified in previous studies. Relevant teams
   analyzed the glioma data obtained from TCGA and verified the expression
   of differential proteins in the tissues or body fluids of glioma
   patients, animal models and cell lines. Common targets are GFAP
   ([159]Wang et al., 2018), HSP70 ([160]Sharifzad et al., 2020), VEGF
   ([161]Nicolas et al., 2019), BDNF ([162]Huo & Chen, 2019), ECM
   ([163]Tejero et al., 2019), FN1 ([164]Yu et al., 2020; [165]Gu, Gu &
   Shou, 2014; [166]Guo, Heller & Thorslund, 2016; [167]Liao et al.,
   2018), CD44 ([168]Mooney et al., 2016), Fransgelin-2, Short Hairpin
   RNAGLN and GRN ([169]Ness, Riemenschneider & Baches., 2009; [170]Trigos
   et al., 2019) etc. Previous research suggested that TAGLN2 might be
   involved in progression due to its higher expression in glioblastomas
   compared to IDH1/2 WT gliomas of lower grades ([171]Beyer et al., 2018;
   [172]Han et al., 2017). A recent study showed that FN1 gene expression
   was higher in glioma tissues than in normal tissues. GO enrichment
   analysis and KEGG pathway enrichment analysis indicated that FN1 was
   involved in the synthesis of extracellular matrix (ECM) components and
   the PI3K/AKT signaling pathway. It was found that FN1 gene could
   inhibit cell proliferation, promote cell apoptosis and senescence, and
   reduce migration and invasion through the down-regulation of FN1 gene
   expression and disruption of the PI3K-AKT signaling pathway ([173]Liao
   et al., 2018). Long, H.et al. demonstrated the importance of some
   genes, such as COL3A1, FN1, and MMP9 for glioblastoma. Based on the
   selected genes, a prediction model was built and its predictive
   accuracy was found to be 94.4%. These findings might provide more
   insights into the genetic basis of glioblastoma ([174]Long et al.,
   2017).

   In this study, we excluded the genes that have been reported and
   expressed the opposite effects in CGGA and TGGA, instead we focus HSPA5
   and MTPN as important gene signature in both TCGA and CGGA databases.
   HSPA5 is GRP78 (Glucoregulated Protein 78), belonging to the heat shock
   Protein 70 family, usually located in the endoplasmic reticulum. The
   role of this gene is to maintain the biological process of endoplasmic
   reticulum and homeostasis. It was reported that this gene could protect
   organs and tissues from pathological damage. HSPA5 can regulate
   endoplasmic reticulum stress, initiate unfolded protein response (UPR),
   and improve cell viability in case of hypoxia, low glucose, and other
   stress states. Moreover, it is not confined to the endoplasmic
   reticulum of tumor cells. It can migrate to cell membranes, cell
   fluids, mitochondria, and nucleus, and can even be secreted. It was
   reported that this gene is over-expressed in many types of cancers
   however the high expression of HSPA5 in glioma was seldom reported. The
   results of the study showed that the gene expression level of HSPA5
   increased with the grades of gliomas. According to data mining and
   patient tissue analysis, the expression of HSPA5 is positively
   correlated with the malignant degree of tumor. In addition,
   corresponding peptide drugs would be developed accordingly. Previous
   studies screened a novel peptide sequence SNTRVAP (VAP for short) with
   high binding affinity to HSPA5 in vitro and in vivo using phage display
   technology.

   MTPN can promote dimerization of NF-kappa-B subunits and regulates
   NF-kappa-B transcription factor activity. This gene plays a role in the
   regulation of the growth of actin filaments. MTPN has been reported in
   breast cancer ([175]Muñiz Lino et al., 2014). However, it has never
   been reported in gliomas. Bioinformatics analysis and biochemical tests
   showed that the gene was also highly expressed in high-grade gliomas,
   but low in low-grade gliomas and normal brain tissues.

   In this study, we established reliable tumor markers through data
   analysis combined with biological experiments to provide scientific
   basis for future drug development.

Conclusions

   Through analysis of differential gene expressions, 581 DEGs were left
   according to our thresholds. Among them, 138 mRNAs were highly
   expressed and 443 mRNAs with low expression levels, 20 mRNAs were
   identified as survival related genes. The 20-gene signature can
   forecast the risk of Glioma in TCGA effectively, moreover it can also
   predict the risks of Chinese patients through validation in the CGGA
   database. Our study suggests that HSPA5 and MTPN are possible
   biomarkers of gliomas suitable for all populations to improve the
   prognosis of these patients.

Supplemental Information

   Supplemental Information 1. Raw data.
   [176]Click here for additional data file.^ (249.7KB, zip)
   DOI: 10.7717/peerj.11350/supp-1

Acknowledgments