Abstract

Introduction

   Osteosarcoma is a high-morbidity bone cancer with an unsatisfactory
   prognosis. The aim of this study is to develop novel potential
   prognostic biomarkers and construct a prognostic risk prediction model
   for recurrence in osteosarcoma.

Methods

   By analyzing microarray data, univariate and multivariate Cox
   regression analyses were performed to screen prognostic RNA signatures
   and to build a prognostic model. The RNA signatures were validated
   using Kaplan–Meier curves. Then, we developed and validated a nomogram
   combining age, recurrence, metastatic, and Prognostic score (PS) models
   to predict the individual’s overall survival at the 3- and 5-year
   points. Pathway enrichment of RNA was conducted based on the
   significant co-expressed RNAs.

Results

   A total of 319 mRNAs and 14 lncRNAs were identified in the microarray
   data. One lncRNA (LINC00957) and six mRNAs (METL1, CA9, B3GALT4,
   ALDH1A1, LAMB3, and ITGB4) were identified as RNA signatures and showed
   good performances in survival prediction for both the training and
   validation cohorts. Cox regression analysis showed that the seven RNA
   signatures could independently predict overall survival. Furthermore,
   age, recurrence, metastatic, and PS models were identified as
   independent prognostic factors via univariate and multivariate Cox
   analyses (P < 0.05) and included in the prognostic nomogram. The
   C-index values for the 3- and 5-year overall survival predictions of
   the nomogram were 0.809 and 0.740, respectively.

Conclusions

   The current study provides the novel potential of seven RNA candidates
   as prognostic biomarkers. Nomograms were constructed to provide
   accurate and individualized survival prediction for recurrence in
   osteosarcoma patients.

1. Introduction

   Osteosarcoma is the most common primary non-hematological bone cancer,
   affecting adolescents and children more than adults, histologically
   characterized by the production of osteoid in malignant cells [28][1].
   Recurrence and metastasis are principal pathological problems in the
   malignant progression of osteosarcoma, and the long-term survival rate
   of such patients is around 20%, which evidently hampers the
   effectiveness of osteosarcoma clinical treatments and brings
   unfavorable outcomes to osteosarcoma patients [29][2]. The survival
   rate of these patients has improved as a result of comprehensive
   management in the form of intensive chemotherapy and surgery [30][3].
   However, the progress has dwindled despite modern therapy over the past
   three decades, the prognosis remains poor for most patients with
   recurrent osteosarcoma [31][4], [32][5], [33][6], [34][7]. Therefore,
   accurate prognosis and efficient therapy are urgently needed to improve
   the treatment of recurrence in osteosarcoma.

   As a complicated disease, osteosarcoma results from interactions
   between genetic and other factors. Numerous studies have demonstrated
   that a variety of factors may contribute to the development of
   osteosarcoma, including age, gender, ethnicity, or physical, chemical,
   and biological agents [35][8], [36][9], [37][10]. In addition, some
   studies showed that genetic factors may play a more important role in
   the pathogenesis of osteosarcoma [38][11], [39][12], [40][13].
   Recently, several studies have investigated the potential role of
   lncRNAs and mRNAs as diagnostic or prognostic targets in osteosarcoma
   [41][14], [42][15], [43][16]. Decreased expression levels of lncRNA
   maternally expressed 3 (MEG3) have been reported to be an independent
   predictor of a shorter overall survival period in patients with
   osteosarcoma, suggesting that MEG3 may be a useful prognostic biomarker
   and may exhibit an important pro-oncogenic effect for the prognosis of
   osteosarcoma [44][17], [45][18], [46][19]. Hongzeng Wu et al. were the
   first to report that high expression of mRNA TfR1 and VEGF was
   significantly correlated to poor overall survival, and both TfR1 and
   VEGF were the independent prognostic indicators of osteosarcoma
   patients [47][3], [48][20]. However, because the risk of relapse may
   differ among patients at the same disease stage in the osteosarcoma
   population, the roles of lncRNA and mRNA are still unclear, and the
   association of lncRNAs and mRNAs in the prognosis of osteosarcoma
   patients remains elusive. Consequently, individual and early prediction
   of recurrence while planning osteosarcoma management is crucial as it
   could result in better-adapted treatments and survival rates.

   This study aimed to develop and validate a survival prediction nomogram
   for an individualized prediction of survival in patients with recurring
   osteosarcoma. Note that in this study, patients with osteosarcoma
   recurrence only represented local recurrence, not including metastasis.
   The lncRNA and mRNA expression profiles and matched clinical
   information in samples of patients with osteosarcoma were integrated to
   identify the prognostic biomarkers associated with the overall survival
   of patients with osteosarcoma and to establish an RNA prognostic risk
   model that can effectively predict clinical survival. The significant
   prognostic power of an RNA prognostic risk model was assessed and
   validated. Subsequently, an effective prognostic nomogram that
   incorporates both RNA signatures and clinical risk factors to predict
   3- and 5-year overall survival rates and cancer-specific survival rates
   for patients with osteosarcoma was constructed.

2. Materials & methods

2.1. Patient information and data preprocessing

   The RNA-seq dataset (including lncRNA and mRNA) of the 265 osteosarcoma
   samples and the clinical survival data was downloaded from TCGA
   ([49]https://tcga-data.nci.nih.gov/) and obtained by Illumina HiSeq
   2000 RNA Sequencing. After retaining the osteosarcoma samples with
   recurrence and clinical survival prognosis information, 169
   osteosarcoma samples were included in this study. Simultaneously, the
   microarray data [50]GSE39055 [51][21] was downloaded from GEO
   ([52]https://www.ncbi.nlm.nih.gov/) with “osteosarcoma, recurrence,
   Homo sapiens” as the search terms [53][22]. A total of 37 osteosarcoma
   samples (including 18 recurrence and 19 non-recurrence) were
   downloaded, and all of the samples had clinical survival prognosis
   information. The Illumina HumanHT-12 WG-DASL V4.0 R2 Expression
   BeadChip detection platform was used to obtain the dataset.

2.2. Differential expression analysis

   The lncRNA and mRNA in the TCGA datasets were re-annotated using the
   Ref Seq ID information supported by the detection platform of the HUGO
   Gene Nomenclature Committee ([54]http://www.genenames.org/) [55][23],
   which includes 4526 lncRNAs and 19,205 mRNAs. The limma package
   (version 3.34.7,
   [56]https://bioconductor.org/packages/release/bioc/html/limma.html)
   [57][24] in R was used to perform screening differentially expressed
   RNAs [DERs, including differentially expressed lncRNA (DE-lncRNAs) and
   differentially expressed mRNA (DE-mRNAs)] between recurrence and
   non-recurrence osteosarcoma samples. A significance analysis of
   microarrays with false discovery rate (FDR) < 0.05 and |log[2] fold
   change (FC)| > 0.263 were chosen as the cutoff criteria to define the
   DERs. Hierarchical clustering analyses of DERs were performed using the
   pheatmap package (version 1.0.8,
   [58]https://cran.r-project.org/web/packages/pheatmap/index.html) in
   R[59][25] and were presented in two-way hierarchical clustering
   heatmaps based on centered Pearson correlation [60][26]. P < 0.05 was
   considered statistically significant.

2.3. Construction prognostic score model

   The R package “survival”
   ([61]http://bioconductor.org/packages/survivalr/) [62][27] was used to
   identify the independent prognosis-related DERs for the 169
   osteosarcoma samples by univariate and multivariate COX regression
   analyses. Log-rank P-value < 0.05 was set as the significant
   correlation threshold. On the basis of these prognosis-related DERs, a
   Cox Proportional Hazards (Cox-PH) model was applied to select the
   optimal panel of prognostic DERs. The optimal lambda was determined
   after running 1,000 stimulations through a cross-validation likelihood
   using the R package “penalized” (version 0.9-50,
   [63]http://bioconductor.org/packages/penalized/) [64][28]. Afterwards,
   the prognostic score model was built based on the expression levels and
   independent prognostic coefficients of the independent
   prognosis-associated DERs. The prognostic risk score (PS) of the
   osteosarcoma samples were calculated using the formula below [65][29]:
   [MATH: <mrow><mrow><mtext>PS =</mtext><mspace
   width="0.333333em"></mspace></mrow><mo>∑</mo><msub><mi>β</mi><mtext>DER
   s</mtext></msub><mo>×</mo><msub><mtext>Exp</mtext><mtext>DERs</mtext></
   msub></mrow> :MATH]

   β[DERs] indicates the independent prognostic coefficients of
   independent prognosis-related DERs, and Exp[DERs] denotes the
   expression levels of independent prognosis-related DERs in the training
   dataset.

   The median of the PS value of osteosarcoma samples in the training
   dataset were calculated, then the samples were divided into high-risk
   and low-risk groups using the median PS as the cutoff point. The
   overall survival differences between high-risk groups and low-risk
   groups were determined using Kaplan–Meier survival curves in the R
   package “survival” (version 2.41-1,
   [66]http://bioconductor.org/packages/survivalr/) [67][27]. Then,
   P-values and hazard ratio (HR) with a 95% confidence interval were
   generated by log-rank tests. Further to this, time-dependent receiver
   operating characteristic (ROC) curves were employed to measure
   predictive performance, and the [68]GSE39055 dataset from the GEO
   database was used to validate the prognostic model.

2.4. Construction of an osteosarcoma-specific prognostic model

   Univariate and multivariate Cox regression analyses were used to screen
   the independent predictive value of the DERs prognostic model in 169
   osteosarcoma patients with clinical information from the TCGA,
   including age (>20 years), gender, tumor multifocal, tumor recurrence,
   tumor metastasis, radiotherapy, and tumor necrosis. Statistically
   significant correlation was performed with a log-rank P-value < 0.05
   and was set as the cutoff. Kaplan–Meier analysis was performed, and
   Kaplan–Meier curves were plotted for independent predictive clinical
   characteristic models using the P-value < 0.05 as a cutoff value.

   We included each independent predictive factor selected by the
   multivariate Cox regression analysis to generate a nomogram using the
   “rms” package (version 5.1-2,
   [69]https://cran.r-project.org/web/packages/rms/index.html) in R3.4.1
   [70][30], [71][31]. The calibration and discrimination of the
   independent predictive model were included as validation steps. The ROC
   curve analyses with 3- and 5-years as the defining points were
   performed using the R package “pROC” (version 1.14.0,
   [72]https://cran.r-project.org/web/packages/pROC/index.html) [73][32],
   which has been used to evaluate prognostic performance for survival
   prediction. The area under the ROC curve (AUC) for evaluating
   discriminatory ability was calculated, and the values ranged from 0.5
   to 1, with those closer to 1 indicating better efficiency. Furthermore,
   the prognostic capacity of the nomogram was assessed by calculating the
   Harrell’s concordance index (C-index) in the R3.4.1 “survcomp” package
   ([74]http://www.bioconductor.org/packages/release/bioc/html/survcomp.ht
   ml) [75][33]. The value of the C-index ranged from 0.5 to 1.0
   (C-index = 0.5 indicated random chance, and C-index = 1.0 indicated
   perfect predictive accuracy). Generally, C-index > 0.7 indicated an
   acceptable discrimination accuracy for prognosis.

2.5. Construction of lncRNA and mRNA co-expression network and function
enrichment analysis

   The co-expression analysis of the DERs significantly associated with
   prognosis was conducted based on the Pearson correlation coefficient of
   the Cor function in the R language
   ([76]https://stat.ethz.ch/R-manual/R-devel/library/stats/html/cor.test.
   html)[77][34].Their expression levels were measured on a network
   visualization display via Cytoscape (version 3.6.1,
   [78]http://www.cytoscape.org/) [79][35]. Then, KEGG pathway enrichment
   analysis was performed on DERs in the lncRNA–mRNA network using Gene
   Set Enrichment Analysis
   ([80]http://software.broadinstitute.org/gsea/index.jsp) in R[81][36].
   P < 0.05 was considered to screen KEGG pathways that were significantly
   enriched in the relevant DERs.

3. Results

3.1. Data preprocessing and DERs screening

   After data preprocessing, a total of 10,700 mRNAs and 1029 lncRNAs were
   detected ([82]Table S1). A total of 333 DERs were obtained among them,
   including 319 mRNAs (120 up-regulated and 199 down-regulated) and 14
   lncRNAs (two up-regulated and 12 down-regulated) in osteosarcoma
   samples with recurrence (n = 28) compared with non-recurrence (n = 141)
   when p < 0.05 and |log[2]FC| > 0.263 was the cutoff criteria ([83]Table
   S2). We identified all of the DERs that were shown in the volcano map,
   according to the value of |log[2]FC|, and displayed the DERs on a
   volcano map ([84]Fig. 1A). The expression values of differentially
   expressed lncRNAs and mRNAs were two-way hierarchically clustered, and
   the color contrast of the heatmap indicated that there was a
   significant difference in the expression levels between the
   non-recurrence and recurrence osteosarcoma samples ([85]Fig. 1B).

Fig. 1.

   [86]Fig. 1
   [87]Open in a new tab

   A: The volcano plot for DERs related to recurrence. The x-axis is the
   log[2] fold change (FC), and the y-axis is − log[10] false discovery
   rate (FDR). Blue dots indicate significant differentially expressed
   genes (DERs), the red horizontal dashed line indicates FDR < 0.05, and
   the two red vertical lines indicate |log[2]FC| > 0.263. B: Two-way
   hierarchically clustered heatmap for TCGA using the DERs: red indicates
   up-regulated DERs, green indicates down-regulated DERs, black indicates
   recurrence osteosarcoma samples, and white indicate non-recurrence
   osteosarcoma samples. (For interpretation of the references to color in