Abstract Introduction Osteosarcoma is a high-morbidity bone cancer with an unsatisfactory prognosis. The aim of this study is to develop novel potential prognostic biomarkers and construct a prognostic risk prediction model for recurrence in osteosarcoma. Methods By analyzing microarray data, univariate and multivariate Cox regression analyses were performed to screen prognostic RNA signatures and to build a prognostic model. The RNA signatures were validated using Kaplan–Meier curves. Then, we developed and validated a nomogram combining age, recurrence, metastatic, and Prognostic score (PS) models to predict the individual’s overall survival at the 3- and 5-year points. Pathway enrichment of RNA was conducted based on the significant co-expressed RNAs. Results A total of 319 mRNAs and 14 lncRNAs were identified in the microarray data. One lncRNA (LINC00957) and six mRNAs (METL1, CA9, B3GALT4, ALDH1A1, LAMB3, and ITGB4) were identified as RNA signatures and showed good performances in survival prediction for both the training and validation cohorts. Cox regression analysis showed that the seven RNA signatures could independently predict overall survival. Furthermore, age, recurrence, metastatic, and PS models were identified as independent prognostic factors via univariate and multivariate Cox analyses (P < 0.05) and included in the prognostic nomogram. The C-index values for the 3- and 5-year overall survival predictions of the nomogram were 0.809 and 0.740, respectively. Conclusions The current study provides the novel potential of seven RNA candidates as prognostic biomarkers. Nomograms were constructed to provide accurate and individualized survival prediction for recurrence in osteosarcoma patients. 1. Introduction Osteosarcoma is the most common primary non-hematological bone cancer, affecting adolescents and children more than adults, histologically characterized by the production of osteoid in malignant cells [28][1]. Recurrence and metastasis are principal pathological problems in the malignant progression of osteosarcoma, and the long-term survival rate of such patients is around 20%, which evidently hampers the effectiveness of osteosarcoma clinical treatments and brings unfavorable outcomes to osteosarcoma patients [29][2]. The survival rate of these patients has improved as a result of comprehensive management in the form of intensive chemotherapy and surgery [30][3]. However, the progress has dwindled despite modern therapy over the past three decades, the prognosis remains poor for most patients with recurrent osteosarcoma [31][4], [32][5], [33][6], [34][7]. Therefore, accurate prognosis and efficient therapy are urgently needed to improve the treatment of recurrence in osteosarcoma. As a complicated disease, osteosarcoma results from interactions between genetic and other factors. Numerous studies have demonstrated that a variety of factors may contribute to the development of osteosarcoma, including age, gender, ethnicity, or physical, chemical, and biological agents [35][8], [36][9], [37][10]. In addition, some studies showed that genetic factors may play a more important role in the pathogenesis of osteosarcoma [38][11], [39][12], [40][13]. Recently, several studies have investigated the potential role of lncRNAs and mRNAs as diagnostic or prognostic targets in osteosarcoma [41][14], [42][15], [43][16]. Decreased expression levels of lncRNA maternally expressed 3 (MEG3) have been reported to be an independent predictor of a shorter overall survival period in patients with osteosarcoma, suggesting that MEG3 may be a useful prognostic biomarker and may exhibit an important pro-oncogenic effect for the prognosis of osteosarcoma [44][17], [45][18], [46][19]. Hongzeng Wu et al. were the first to report that high expression of mRNA TfR1 and VEGF was significantly correlated to poor overall survival, and both TfR1 and VEGF were the independent prognostic indicators of osteosarcoma patients [47][3], [48][20]. However, because the risk of relapse may differ among patients at the same disease stage in the osteosarcoma population, the roles of lncRNA and mRNA are still unclear, and the association of lncRNAs and mRNAs in the prognosis of osteosarcoma patients remains elusive. Consequently, individual and early prediction of recurrence while planning osteosarcoma management is crucial as it could result in better-adapted treatments and survival rates. This study aimed to develop and validate a survival prediction nomogram for an individualized prediction of survival in patients with recurring osteosarcoma. Note that in this study, patients with osteosarcoma recurrence only represented local recurrence, not including metastasis. The lncRNA and mRNA expression profiles and matched clinical information in samples of patients with osteosarcoma were integrated to identify the prognostic biomarkers associated with the overall survival of patients with osteosarcoma and to establish an RNA prognostic risk model that can effectively predict clinical survival. The significant prognostic power of an RNA prognostic risk model was assessed and validated. Subsequently, an effective prognostic nomogram that incorporates both RNA signatures and clinical risk factors to predict 3- and 5-year overall survival rates and cancer-specific survival rates for patients with osteosarcoma was constructed. 2. Materials & methods 2.1. Patient information and data preprocessing The RNA-seq dataset (including lncRNA and mRNA) of the 265 osteosarcoma samples and the clinical survival data was downloaded from TCGA ([49]https://tcga-data.nci.nih.gov/) and obtained by Illumina HiSeq 2000 RNA Sequencing. After retaining the osteosarcoma samples with recurrence and clinical survival prognosis information, 169 osteosarcoma samples were included in this study. Simultaneously, the microarray data [50]GSE39055 [51][21] was downloaded from GEO ([52]https://www.ncbi.nlm.nih.gov/) with “osteosarcoma, recurrence, Homo sapiens” as the search terms [53][22]. A total of 37 osteosarcoma samples (including 18 recurrence and 19 non-recurrence) were downloaded, and all of the samples had clinical survival prognosis information. The Illumina HumanHT-12 WG-DASL V4.0 R2 Expression BeadChip detection platform was used to obtain the dataset. 2.2. Differential expression analysis The lncRNA and mRNA in the TCGA datasets were re-annotated using the Ref Seq ID information supported by the detection platform of the HUGO Gene Nomenclature Committee ([54]http://www.genenames.org/) [55][23], which includes 4526 lncRNAs and 19,205 mRNAs. The limma package (version 3.34.7, [56]https://bioconductor.org/packages/release/bioc/html/limma.html) [57][24] in R was used to perform screening differentially expressed RNAs [DERs, including differentially expressed lncRNA (DE-lncRNAs) and differentially expressed mRNA (DE-mRNAs)] between recurrence and non-recurrence osteosarcoma samples. A significance analysis of microarrays with false discovery rate (FDR) < 0.05 and |log[2] fold change (FC)| > 0.263 were chosen as the cutoff criteria to define the DERs. Hierarchical clustering analyses of DERs were performed using the pheatmap package (version 1.0.8, [58]https://cran.r-project.org/web/packages/pheatmap/index.html) in R[59][25] and were presented in two-way hierarchical clustering heatmaps based on centered Pearson correlation [60][26]. P < 0.05 was considered statistically significant. 2.3. Construction prognostic score model The R package “survival” ([61]http://bioconductor.org/packages/survivalr/) [62][27] was used to identify the independent prognosis-related DERs for the 169 osteosarcoma samples by univariate and multivariate COX regression analyses. Log-rank P-value < 0.05 was set as the significant correlation threshold. On the basis of these prognosis-related DERs, a Cox Proportional Hazards (Cox-PH) model was applied to select the optimal panel of prognostic DERs. The optimal lambda was determined after running 1,000 stimulations through a cross-validation likelihood using the R package “penalized” (version 0.9-50, [63]http://bioconductor.org/packages/penalized/) [64][28]. Afterwards, the prognostic score model was built based on the expression levels and independent prognostic coefficients of the independent prognosis-associated DERs. The prognostic risk score (PS) of the osteosarcoma samples were calculated using the formula below [65][29]: [MATH: PS =βDER s×ExpDERs :MATH] β[DERs] indicates the independent prognostic coefficients of independent prognosis-related DERs, and Exp[DERs] denotes the expression levels of independent prognosis-related DERs in the training dataset. The median of the PS value of osteosarcoma samples in the training dataset were calculated, then the samples were divided into high-risk and low-risk groups using the median PS as the cutoff point. The overall survival differences between high-risk groups and low-risk groups were determined using Kaplan–Meier survival curves in the R package “survival” (version 2.41-1, [66]http://bioconductor.org/packages/survivalr/) [67][27]. Then, P-values and hazard ratio (HR) with a 95% confidence interval were generated by log-rank tests. Further to this, time-dependent receiver operating characteristic (ROC) curves were employed to measure predictive performance, and the [68]GSE39055 dataset from the GEO database was used to validate the prognostic model. 2.4. Construction of an osteosarcoma-specific prognostic model Univariate and multivariate Cox regression analyses were used to screen the independent predictive value of the DERs prognostic model in 169 osteosarcoma patients with clinical information from the TCGA, including age (>20 years), gender, tumor multifocal, tumor recurrence, tumor metastasis, radiotherapy, and tumor necrosis. Statistically significant correlation was performed with a log-rank P-value < 0.05 and was set as the cutoff. Kaplan–Meier analysis was performed, and Kaplan–Meier curves were plotted for independent predictive clinical characteristic models using the P-value < 0.05 as a cutoff value. We included each independent predictive factor selected by the multivariate Cox regression analysis to generate a nomogram using the “rms” package (version 5.1-2, [69]https://cran.r-project.org/web/packages/rms/index.html) in R3.4.1 [70][30], [71][31]. The calibration and discrimination of the independent predictive model were included as validation steps. The ROC curve analyses with 3- and 5-years as the defining points were performed using the R package “pROC” (version 1.14.0, [72]https://cran.r-project.org/web/packages/pROC/index.html) [73][32], which has been used to evaluate prognostic performance for survival prediction. The area under the ROC curve (AUC) for evaluating discriminatory ability was calculated, and the values ranged from 0.5 to 1, with those closer to 1 indicating better efficiency. Furthermore, the prognostic capacity of the nomogram was assessed by calculating the Harrell’s concordance index (C-index) in the R3.4.1 “survcomp” package ([74]http://www.bioconductor.org/packages/release/bioc/html/survcomp.ht ml) [75][33]. The value of the C-index ranged from 0.5 to 1.0 (C-index = 0.5 indicated random chance, and C-index = 1.0 indicated perfect predictive accuracy). Generally, C-index > 0.7 indicated an acceptable discrimination accuracy for prognosis. 2.5. Construction of lncRNA and mRNA co-expression network and function enrichment analysis The co-expression analysis of the DERs significantly associated with prognosis was conducted based on the Pearson correlation coefficient of the Cor function in the R language ([76]https://stat.ethz.ch/R-manual/R-devel/library/stats/html/cor.test. html)[77][34].Their expression levels were measured on a network visualization display via Cytoscape (version 3.6.1, [78]http://www.cytoscape.org/) [79][35]. Then, KEGG pathway enrichment analysis was performed on DERs in the lncRNA–mRNA network using Gene Set Enrichment Analysis ([80]http://software.broadinstitute.org/gsea/index.jsp) in R[81][36]. P < 0.05 was considered to screen KEGG pathways that were significantly enriched in the relevant DERs. 3. Results 3.1. Data preprocessing and DERs screening After data preprocessing, a total of 10,700 mRNAs and 1029 lncRNAs were detected ([82]Table S1). A total of 333 DERs were obtained among them, including 319 mRNAs (120 up-regulated and 199 down-regulated) and 14 lncRNAs (two up-regulated and 12 down-regulated) in osteosarcoma samples with recurrence (n = 28) compared with non-recurrence (n = 141) when p < 0.05 and |log[2]FC| > 0.263 was the cutoff criteria ([83]Table S2). We identified all of the DERs that were shown in the volcano map, according to the value of |log[2]FC|, and displayed the DERs on a volcano map ([84]Fig. 1A). The expression values of differentially expressed lncRNAs and mRNAs were two-way hierarchically clustered, and the color contrast of the heatmap indicated that there was a significant difference in the expression levels between the non-recurrence and recurrence osteosarcoma samples ([85]Fig. 1B). Fig. 1. [86]Fig. 1 [87]Open in a new tab A: The volcano plot for DERs related to recurrence. The x-axis is the log[2] fold change (FC), and the y-axis is − log[10] false discovery rate (FDR). Blue dots indicate significant differentially expressed genes (DERs), the red horizontal dashed line indicates FDR < 0.05, and the two red vertical lines indicate |log[2]FC| > 0.263. B: Two-way hierarchically clustered heatmap for TCGA using the DERs: red indicates up-regulated DERs, green indicates down-regulated DERs, black indicates recurrence osteosarcoma samples, and white indicate non-recurrence osteosarcoma samples. (For interpretation of the references to color in