Abstract Osteosarcoma (OS) is the most prevalent secondary sarcoma associated with retinoblastoma (RB). However, the molecular mechanisms driving the interactions between these two diseases remain incompletely understood. This study aims to explore the transcriptomic commonalities and molecular pathways shared by RB and OS, and to identify biomarkers that predict OS prognosis effectively. RNA sequences and patient information for OS and RB were obtained from the University of California Santa Cruz (UCSC) Xena and Gene Expression Omnibus databases. When RB and OS were first identified, a common gene expression profile was discovered. Weighted Gene Co-expression Network Analysis (WGCNA) revealed co-expression networks associated with OS after immunotyping patients. To evaluate the genes shared by RB and OS, univariate and multivariate Cox regression analysis were then carried out. Three machine learning methods were used to pick key genes, and risk models were created and verified. Next, medications that target independent prognostic genes were found using the Cellminer database. The comparison of differential gene expression between OS and RB revealed 1216 genes, primarily linked to the activation and proliferation of immune cells. WGCNA identified 12 modules related to OS immunotyping, with the grey module showing a strong correlation with the immune-inflamed phenotype. This module intersected with differential genes from RB, producing 65 RB-associated OS Immune-inflamed Genes (ROIGs). Analysis identified 6 hub genes for model construction through univariate Cox regression and three machine learning techniques. A risk model based on these hub genes was established, demonstrating significant prognostic value for OS. Genes shared between OS and RB contribute to the progression of both cancers through multiple pathways. The ROIGs risk score model independently predicts the overall survival of OS patients. Additionally, this study highlights genes with potential as therapeutic targets or biomarkers for clinical use. Keywords: Osteosarcoma, Retinoblastoma, Weighted gene co-expression network analysis, Machine learning, Immunotherapy, MXI1, ARX Subject terms: Cancer, Molecular biology Introduction Osteosarcoma (OS) is the most prevalent primary malignant bone tumor among children and adolescents, with approximately 800–900 new cases diagnosed annually in the United States^[36]1. The five-year survival percentage for patients receiving combined surgery, chemotherapy, and radiation treatment can reach 79.3%^[37]2, despite the low incidence of OS. However, the prognosis for metastatic or recurrent OS remains dismal, with a five-year survival rate of about 20–30%^[38]3. This is because of OS’s high invasiveness and heterogeneity, which contribute to medication resistance, postoperative recurrence, and distant metastases. The intricate etiology and processes of OS remain poorly understood^[39]4, making it extremely difficult to predict patient outcomes with any degree of accuracy^[40]5. Furthermore, finding useful biomarkers is critical for enhancing long-term survival and prognostic evaluations, since the majority of OS patients are young. Retinoblastoma (RB) is the most common malignant eye tumor in children and frequently predisposes individuals to OS^[41]6. Studies indicate a higher incidence of OS among patients with tumor syndromes compared to the general population^[42]7. Research has demonstrated that RB and OS share similar genetic alterations, including mutations in the RB1 gene^[43]8. Identified as the first human tumor suppressor gene, the RB1 gene plays a critical role in tumor prevention by regulating the Rb protein, which inhibits cell division^[44]9. However, investigations into the shared genetic traits between RB and OS and their prognostic value remain scant. Therefore, this study aims to identify biomarkers that enhance the prediction of OS outcomes by exploring the transcriptomic similarities between RB and OS. Tumor-associated macrophages (TAMs), T cells, B cells, and NK cells are examples of immune cells found in the tumor immune microenvironment (TIME), while mesenchymal stem cells (MSC) and circulating tumor cells (CTC) are examples of non-immune cells. This composition is important for the genesis and progression of tumors and is suggestive of immune infiltration into the tumor environmen^[45]10–[46]12. Three main immunological phenotypes have been discovered by recent developments in tumor immunity: immune-inflamed, which is characterized by significant immune infiltration; immune-excluded, which is limited to T-cell infiltration in the stroma; and immune-desert, which lacks immune infiltration^[47]13. In immune-inflamed tumors, the precise location of immune cell infiltration can elicit distinct anti-tumor immune responses: (1) intra-tumoral infiltration leads to a robust anti-tumor effect; (2) peripheral infiltration results in restricted anti-tumor immunity. The presence of tertiary lymphoid structures (TLS), which provide an organized, lymph node-like structure for T-cell activation, is indicative of high activity in the TLS pathway suggesting scenario (1), while low activity indicates scenario (2). The level of immune cell infiltration within the TIME significantly impacts the capacity to combat tumors, influencing the prognosis of osteosarcoma (OS) patients^[48]14,[49]15. Thus, TIME status may serve as a prognostic biomarker and predict responses to treatments such as immunotherapy in OS. We conducted a thorough analysis in this study using transcriptome sequencing data from the Gene Expression Omnibus (GEO) and University of California, Santa Cruz (UCSC Xena) databases. This allowed us to construct a shared differential gene expression profile for retinoblastoma (RB) and OS. Analysis of the biological processes and pathway enrichments of these genes revealed their close association with immune characteristics and cancer pathways. We identified prognostic genes linked to RB and OS immunity using a combination of three machine learning methods and constructed a risk model. Additionally, we assessed the correlation between immune features and chemotherapy responses in OS patients, and identified drugs targeting characteristic genes from the CellMiner database. Our findings offer new insights into the shared pathogenic mechanisms of RB and OS, potentially guiding personalized treatments for OS patients. Materials and methods Data collection and processing We downloaded microarray data for RB from the Gene Expression Omnibus (GEO) database, which included datasets [50]GSE58780 and [51]GSE208143. [52]GSE58780: The gene expression matrix of 63 samples retinoblastomas tumor which were studied had received no treatment prior to surgical enucleation and 3 fetal retinas. [53]GSE208143: The gene expression matrix of 27 retinoblastoma samples and 6 control retina samples. The samples were collected from 9 retinoblastoma patients. (Among them, 5 cases were advanced, while 4 cases did not) and 2 infants who served as controls. OS datasets included [54]GSE36001, which provided gene expression matrices for 19 OS cell lines and 6 normal samples, [55]GSE154540, which comprised data from 50 OS patients receiving chemotherapy with 29 showing good response and 21 poor response, and [56]GSE21257, which contained clinical-pathological information of 34 OS patients who developed metastases within 5 years and 19 who did not. Datasets [57]GSE58780 and [58]GSE208143 were utilized to analyze differential gene expression in RB, while [59]GSE36001 was used for similar analyses in OS. [60]GSE154540 was employed to examine the relationship between chemotherapy and immunological features in OS. Dataset [61]GSE21257 was used to validate a prognostic model. We retrieved TARGET-OS FPKM normalized data and clinical information from the UCSC Xena database, encompassing 85 OS patients. Common gene identification and functional enrichment analysis First, we integrated and batch corrected the gene expression data from the [62]GSE58780 and [63]GSE208143 cohorts using the R packages “limma”^[64]16 and “sva”^[65]17. After that, the combined datasets were subjected to differential gene analysis using the “limma” R package to find RB differentially expressed genes (RB-DEGs). Volcano plots were used to visualize the DEGs, with the thresholds set at P < 0.05 and |log2FC (fold change)| > 0.585. The “limma” R program was used to analyze differential gene expression in dataset [66]GSE36001. The results showed up as a Venn diagram with OS differentially expressed genes (OS-DEGs) intersecting with RB-DEGs to discover shared differentially expressed genes. We used the “clusterProfiler” R package^[67]18 for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis in order to investigate the biological processes and pathways linked to these genes. Bar charts are used to display results that are significantly enriched, P < 0.05. By using gene set variation analysis (GSVA) and gene set enrichment analysis (GSEA), frequent differentially expressed genes in OS were examined for their functional roles and pathway enrichments. The “GSVA” R program was used to run GSVA^[68]19, and heatmaps and bar graphs were used to show the pathway enrichment of frequently differentially expressed genes in OS. Then, using the R package “clusterProfiler”, GSEA was carried out, and the outcomes were shown in enrichment plots. Immune infiltration characteristic and immunotherapeutic response prediction Prior research has indicated that immune cells and associated pathways are the primary sources of common DEGs. Thus, we used the R package “CIBERSORT” to look into the relationship between immunological features and chemotherapy responses^[69]20. Based on the characteristic gene sets of 22 immune cell subtypes, this investigation computed the composition ratios of immune cells in OS samples from patients with varying results from treatment. We examined the variations in immune cell infiltration between the well-responding and under-responding groups. We used the TIDE online method ([70]http://tide.dfci.harvard.edu/) to construct TIDE scores^[71]21,[72]22, Exclusion scores, Dysfunction scores, and MSI scores for OS samples from the [73]GSE154540 cohort, with P values < 0.05 considered significant, in order to predict the advantages of immunotherapy. Weighted gene co-expression network analysis (WGCNA) Gene correlation patterns can be identified and linked to clinical-pathological aspects using WGCNA, a crucial and popular bioinformatics technique that groups genes into modules based on similarities in their co-expression across samples. We divided the 85 OS patients from the TARGET-OS dataset into groups and used ESTIMATE and TIDE analyses to measure immune infiltration. This allowed us to get TIDE scores and Immune Scores before using WGCNA. We reversed the TIDE scores because high and low Immune Score values indicate opposing implications. After normalizing both scores to a similar baseline, PCA analysis was used to identify the first principal component, which served as the official Immune score. The OS samples were divided into two main groups based on this score: the lower third formed the low immune infiltration group (immune-desert type), while the upper two-thirds formed the strong immune infiltration group. Then, for tertiary lymphoid structures (TLS) in the high immune infiltration group, we computed single-sample Gene Set Enrichment Analysis (ssGSEA) pathway activity scores using the R package “GSVA”. Based on median results, a further classification was made within the high immune infiltration group, with the higher half being classified as immunological-inflammatory and the lower half as immune-exclusion. In total, the patients from the TARGET-OS dataset were divided into three categories: immune-inflammatory, immune-exclusion, and immune-desert. The “WGCNA” R program was then used to examine these subtypes, and the module that had the strongest correlation with the immune-inflammatory type was chosen to intersect with RB-DEGs. This allowed for the identification of important genes that are closely linked to OS immunity in RB^[74]23. After that, these important genes were the subject of KEGG and GO analyses to investigate the biological processes and pathways involved. The results were then shown using R. Machine learning algorithms screen prognostic features Using previously chosen important genes, we performed univariate and multivariate Cox regression analyses to tentatively identify genes related to OS prognosis. We used three different machine learning techniques to further narrow down feature genes from the 65 genes associated with prognosis found through univariate Cox regression analysis: LASSO, SVM, and RF (LASSO: Least Absolute Shrinkage and Selection Operator; SVM: Support Vector Machine; RF: Random Forest). Lasso regression is a linear model regularization method that addresses multicollinearity, promotes feature selection, and helps improve the predictive power of the model. Based on the least squares method, it adds a penalty term to the coefficient vector, causing some insignificant regression coefficients to shrink to zero, thereby selecting the most important variables. First, we input the previously identified 65 genes into the LASSO algorithm. Using the R package “glmnet”, we constructed a regression model with 10-fold cross-validation. The “family” parameter was set to “binomial” and the optimal lambda value was selected using “lambda.1min”. We plotted the logarithm (lambda) profiles of the LASSO coefficients for the 65 features and curves for partial likelihood deviation. SVM-RFE is a feature selection technique that leverages support vector machines (SVM) to iteratively remove the least significant features. At each step, the feature with the lowest importance, as determined by the SVM model’s coefficients, is discarded, and the model is re-trained using the remaining features. This process continues until the optimal subset of features is identified, enhancing the model’s performance and interpretability. SVM-RFE is realized utilizing the “e1071” and “MSVM-RFE” packages. All of the 65 key genes were used in the SVM model. The result of SVM-RFE was visualized, and by 10-fold cross-validation, the red circle indicated maximum classification precision and the corresponding gene sets were the most accurate features at the lowest 10×CV error and the highest 10×CV accuracy. The RandomForest method is an ensemble learning approach that enhances prediction accuracy and robustness by combining the outputs of multiple decision trees. Each tree is trained on a separate subset of samples. We utilized the RandomForest algorithm for gene classification with the R package “randomForest”. Through this decision tree-based method, we identified the most important variables. By applying the algorithm, we successfully pinpointed the significant genes. Initially, we built a random forest model using 1000 trees on the discovery cohorts. Then, the optimal number of trees was determined by minimizing cross-validation errors. Finally, we selected the top 16 most significant genes. The genes that were identified through the integration of three algorithms are known as hub genes, or ROIGs, and were used to create the risk model. Development and validation of the risk model ROIGS (Retinoblastoma-Osteosarcoma Immune-Inflamed Genes associated Signature) is a risk model that we created using six hub genes that were chosen using three different machine learning algorithms. The external dataset [75]GSE21257 served as the validation set, and the TARGET-OS cohort served as the training set. The R package “survminer”^[76]24 was utilized to calculate the ideal cutoff, which allowed for the classification of all patients into high-risk and low-risk groups. To create survival curves and risk plots that show the survival differences and status for every patient, the R packages “survminer” and “ggrisk”^[77]25 were used. Furthermore, we used the R package “timeROC” to plot 1-year, 3-year, and 5-year ROC curves in order to evaluate the predictive accuracy of our model^[78]26. Correlation study of drug sensitivity predictions and immunotherapy anticipated pathways Three separate OS risk variables (ARX, MXI1, and SLC38A5) were correlated with pathways predicted by immunotherapy. We used the R package “ggcor” to visualize the enrichment scores for immune-related pathways and those that predict the efficacy of immunotherapy, using the ssGSEA method. We obtained drug sensitivity data and gene expression datasets for the NCI-60 cancer cell lines from the CellMiner database ([79]https://discover.nci.nih.gov/cellminer/home.do) in order to identify medications sensitive to ARX, MXI1, and SLC38A5. The link between the expression of distinctive genes and drug sensitivity was ascertained by Spearman correlation analysis after the data had been integrated; a |correlation coefficient (R value)|> 0.3 and P < 0.01 was deemed statistically significant. Statistical analysis R software (version 4.1.1) was used for all statistical studies. Using the Pearson or Spearman coefficients, we investigated correlations between the variables. For categorical variables, chi-square tests were used, and for continuous variables, t-tests or Wilcoxon rank-sum tests. The log-rank test was utilized to assess variations in survival curves between the high-risk and low-risk groups, and Kaplan-Meier survival analysis was performed to compare overall survival between these groups. Furthermore, ROC curves and AUC values were computed to evaluate the risk ratings’ prediction ability. Statistical significance was defined as P-values less than 0.05 (*P < 0.05, **P < 0.01, ***P < 0.001, ns: not significant). Results Figure [80]1 displays the study’s workflow diagram. Fig. 1. [81]Fig. 1 [82]Open in a new tab Research flowchart. Functional characteristics of shared DEGs Utilizing the “limma” R package, differential gene expression studies were performed for both OS and RB. 7377 RB-DEGs were found in RB, as shown in Fig. [83]2A. considerably upregulated were TACC3, CDKN2A, POLD1, TK1, and UBE2C, while considerably downregulated were UNC13C, ARHGAP26, TBX2, and AHCYL2. The OS results are shown in Fig. [84]2B, where 4618 DEGs with significantly elevated levels of HOXB7, RPS28, and LOC390354 are identified. There are 1216 shared DEGs between RB and OS, as indicated by the intersection of the Venn diagram in Fig. [85]2C. Based on GO analysis, these frequent DEGs are principally involved in multiple regulatory processes of immunological and cell activation, such as leukocyte activation implicated in the immune response, lymphocyte and mononuclear cell proliferation, and positive regulation of cell activation (Fig. [86]2D). According to KEGG enrichment analysis (Fig. [87]2E), these DEGs are strongly linked to pathways that include rheumatoid arthritis, PD-L1 expression and PD-1 checkpoint pathways in cancer, Th17 cell differentiation, allograft rejection, Th1 and Th2 cell differentiation, T cell receptor signaling, osteoclast differentiation, B cell receptor signaling, hematopoietic cell lineage, natural killer cell-mediated cytotoxicity, T cell receptor signaling, HIV-1 infection, and PI3K-Akt signaling pathway. Fig. 2. [88]Fig. 2 [89]Open in a new tab Common DEGs and functional enrichment between OS and RB are identified. (A) The 7377 DEG volcano graphic for the RB and normal groups. Larger bubbles indicate greater significance. The size of the bubbles signifies significance. Genes that are upregulated are represented by orange, and those that are downregulated by green. (B) 4618 DEG volcano plots for the OS and normal groups. The Venn diagram in (C) shows intersected genes from OS-DEGs and RB-DEGs. (D) Common DEGs’ GO biological process enrichment analysis. (E) Common DEGs’ KEGG enrichment analysis. A functional study of frequent DEGs in OS via enrichment To explore the pathway enrichment of common DEGs in OS and normal tissues, we conducted GSVA and GSEA enrichment analyses. As illustrated in Fig. [90]3A, GSVA analysis shows that, compared to normal tissues, common DEGs in tumor tissues are upregulated in pathways including the RIG-I-like receptor signaling pathway, glyoxylate and dicarboxylate metabolism, nucleotide excision repair, DNA replication, Notch signaling pathway, regulation of autophagy, RNA degradation, bladder cancer, cell cycle, pancreatic cancer, chronic myeloid leukemia, ABC transporters, proteasome, and folate biosynthesis. We then analyzed the enrichment levels of the top 15 pathways (Fig. [91]3B), finding that common DEGs in tumor tissues are predominantly enriched in pathways such as hypertrophic cardiomyopathy, adipocytokine signaling pathway, cell adhesion molecules, type II diabetes mellitus, tryptophan metabolism, amyotrophic lateral sclerosis, dorso-ventral axis formation, arachidonic acid metabolism, fatty acid metabolism, starch and sucrose metabolism, epithelial cell signaling in helicobacter pylori infection, TGF-β signaling pathway, circadian rhythm mammal, sulfur metabolism, and glycosphingolipid biosynthesis in the lacto and neolacto series. Next, using GSEA, we assessed the signaling pathways and functions related to these common DEGs. Our findings suggest that these DEGs are closely associated with biosynthesis of cofactors, cell cycle, natural killer cell-mediated cytotoxicity, PI3K-Akt signaling pathway, and ribosome (Fig. [92]3C). Fig. 3. [93]Fig. 3 [94]Open in a new tab GSVA and GSEA analysis of common DEGs. (A) Volcano plot: GSVA between tumor tissues and normal tissues. (B) Bar plot: Enrichment levels of the top 15 pathways in tumor versus normal tissues, with tumor tissues represented in red, normal tissues in blue, and longer bars indicating higher enrichment. (C) GSEA of common DEGs. Immunological feature correlation analysis We previously established that immune infiltration is linked to the progression of RB and OS, associated with various lymphocytes and immune pathways. Consequently, we investigated whether immunological characteristics influence the chemotherapeutic responsiveness in OS. Clinical data on chemotherapy responses of OS patients are available from the [95]GSE154540 cohort. Based on these outcomes, patients were categorized into good and poor response groups Using the R package “CIBERSOTR,” we performed immune infiltration analysis, which showed different immune cell infiltration features throughout the groups. Figure [96]4A, B show that the group with a good reaction had slightly more lymphocyte infiltration than the group with a poor response. A greater TIDE score indicates a worse immunotherapy response, and this algorithm was used to predict patients’ clinical reactions to immunotherapy. Using TIDE analysis to forecast immunotherapy efficacy across various chemotherapy response groups, we found that the poor response group had significantly higher TIDE (Fig. [97]4C) and Exclusion scores (Fig. [98]4E). While the poor response group showed higher Dysfunction scores (Fig. [99]4D) and lower MSI scores (Fig. [100]4F), the differences were not statistically significant, most likely because of the small sample size. As a result, tumor tissues in the group with a poor response might have a cool tumor microenvironment, which could promote immune rejection during immunotherapy and result in less successful treatment results. Fig. 4. [101]Fig. 4 [102]Open in a new tab Correlation analysis between chemotherapy efficacy and immune features in OS. (A, B) Proportion of all 22 immune cell types based on CIBERSORT in the [103]GSE154540 cohort. (C–F) TIDE analysis of two chemotherapy response groups in the [104]GSE154540 cohort. An asterisk (“*”) represents P < 0.05, while “ns” indicates “not significant.” Identification of ROIGs After conducting immunotyping on OS patients in the TARGET-OS cohort, we applied WGCNA to analyze the expression profiles of three OS immune subtypes. Employing a soft threshold method, this study built a co-expression network. By setting β to 6, we achieved an R-squared value of 0.95, thus establishing a scale-free network (Fig. [105]5A, B). Figure [106]5C shows a hierarchical clustering tree, constructed using TOM similarity measure, with each branch indicating genes sharing similar expression and biological functions. We identified a total of 12 co-expression modules, and the heatmap showed the average correlation of module characteristic genes with different OS immune subtypes, with the grey module (MEgrey) displaying the strongest positive correlation at 21% with the OS immune-inflamed subtype (Fig. [107]5D). To explore genes associated with the OS immune-inflamed subtype in RB, we selected overlapping genes between RB-DEGs and OS-MEgrey for further analysis, identifying 672 common genes, termed ROIGs (RB associated with OS Immune-inflamed Genes) (Fig. [108]5E). Subsequently, GO and KEGG analyses were conducted to explore the biological processes and pathways enriched by ROIGs. The GO analysis found these genes to be primarily involved in the canonical Wnt signaling pathway, mitotic cell cycle phase transition, and other related processes (Fig. [109]5F). KEGG pathway enrichment analysis showed that ROIGs are predominantly enriched in pathways such as the cell cycle and P53 signaling pathway (Fig. [110]5G). These ROIGs are implicated in multiple cancer-related pathways and the cell cycle, potentially playing a key role in the development of RB and OS. Fig. 5. [111]Fig. 5 [112]Open in a new tab (A, B) Scale independence index and mean connectivity analysis at several soft thresholds, with the selected soft threshold indicated by the red line, along with WGCNA and functional analysis of ROIGs. (C) A dendrogram cluster. (D) Module-trait correlation heatmap with the p-value and correlation coefficient (in parentheses) displayed. (E) Venn diagram showing OS-MEgrey and RB-DEGs. (F-G) ROIG enrichment analysis using KEGG and GO pathways. Screening for prognostic genes based on machine learning algorithms We performed univariate and multivariate COX regression analysis on these 672 ROIGs in order to identify feature genes that are prognostically relevant. Univariate COX regression analysis identified 65 prognosis-related genes, including 13 highly significant ones, as shown in Fig. [113]6A. Multivariate COX regression analysis also identified SLC38A5, MXI1, and ARX as separate risk variables for OS prognosis. LASSO, SVM, and RF are three machine learning methods that we used to further refine feature genes based on the findings of the univariate COX regression analysis. Optimal parameter tuning and the distribution of LASSO coefficients recommended putting λ at 0.1255955, which produced 9 feature genes (Fig. [114]6B). 26 genes with the best 10-point CV accuracy and lowest 10-point CV error were identified by the SVM method (Fig. [115]6C). Finally, 22 feature genes were obtained by feeding the 65 genes from the univariate COX regression analysis into the RF classifier (Fig. [116]6D). Six hub genes for model development were identified by the intersection of the findings of the three algorithms: SLC38A5, SH3PXD2A, MXI1, SERPINH1, ARX, and COL5A2 (Fig. [117]6E). Fig. 6. [118]Fig. 6 [119]Open in a new tab Finding ROIGs associated to prognosis. (A) Cox regression analyses, both univariate and multivariate, of RIGs associated with OS prognosis. (B) Plots of the coefficient profiles and the partial likelihood deviance for logistic regression using Lasso algorithm. (C) Using the SVM technique, 26 genes were chosen. (D) Calculating the minimum error required to determine the number of trees and the significance of the ROIGs in the RF method. (E) A Venn diagram created by connecting the output of three algorithms revealed six hub genes. Building and verifying the model based on ROIGs Utilizing three machine learning methods to identify 6 hub genes, we created a risk model to investigate the association between ROIGs and OS prognosis. The [120]GSE21257 cohort served as the validation set and the TARGET-OS cohort as the training set. Based on the median cutoff, patients were divided into groups according to risk: high and low. The distribution of patients and risk scores is shown in Fig. [121]7A. It is seen that the high-risk group has a much shorter survival time than the low-risk group, and that hub gene expression is significantly higher in the high-risk group. Patients in the high-risk group have shorter survival periods than those in the low-risk group, according to the Kaplan-Meier analysis (log-rank P < 0.0001; Fig. [122]7C). Figure [123]7B shows the AUC values of 0.812, 0.877, and 0.866 for the risk scores that predict overall survival rates at 1, 3, and 5 years, respectively. The external validation set produced results that were similar as well. Patients in the high-risk category have lower survival periods than those in the low-risk group, as Fig. [124]7D illustrates. Patients in the high-risk group have a worse prognosis than those in the low-risk group, according to Kaplan-Meier analysis (log-rank P = 0.036; Fig. [125]7F). In addition, ROC analysis shows that the AUCs for 1, 3, and 5-year OS are 0.862, 0.736, and 0.773, respectively (Fig. [126]7E), indicating that our risk model has a high degree of predictive accuracy. Fig. 7. [127]Fig. 7 [128]Open in a new tab The ROIGs signature’s prognostic value. (A) A heatmap displaying the expression of the six signature genes of the ROIGs signature is displayed alongside the distribution of risk scores and survival status in the TARGET-OS cohort. (B) A ROC curve that shows how well the ROIGs signature predicts outcomes for the TARGET-OS cohort. (C) Kaplan-Meier curve illustrating the variations in overall survival between the TARGET-OS cohort’s high-risk and low-risk groups. (D) A summary of the six-gene signature’s gene expression profiles, survival status, and risk score distribution in the validation cohorts. (E) AUC values for the risk score in the validation cohorts that predicted the 1-, 3-, and 5-year survival rates. (F) Kaplan-Meier survival curves in the [129]GSE21257 cohort for patients with high and low ROIGS scores. Investigation of immune therapy-associated pathways and sensitive drug selection for ROIGs By analyzing the associations between ARX, MXI1, SLC38A5, and immune therapy prediction pathways, we found that ARX is positively correlated with pyrimidine metabolism (Fig. [130]8A). MXI1 shows positively associations with DNA replication, mismatch repair, and pyrimidine metabolism, but is negatively correlated with the IFN-Gamma signature and APM signal (Fig. [131]8B). We then identified sensitive drugs for these three characteristic genes from the CellMiner database, highlighting six drugs with notable sensitivity. We discovered that [132]AEG40730 (R = − 0.43, p = 0.00035; Fig. [133]8C) and azd5582 (R = − 0.35, p = 0.0056; Fig. [134]8D) are negatively correlate with ARX expression, whereas ARQ-680 (R = 0.48, p < 0.001; Fig. [135]8E) and HYPOTHEMYCIN (R = 0.5, p < 0.001; Fig. [136]8F) positively correlate with MXI1 expression, and 7-Hydroxystaurosporine (R = 0.38, p = 0.0025; Fig. [137]8G) and Acrichine (R = 0.39, p = 0.0024; Fig. [138]8H) are positively correlated with SLC38A5 expression. Therefore, OS patients with high ARX expression may be more responsive to [139]AEG40730 and azd5582, while those with high MXI1 expression may respond better to ARQ-680 and HYPOTHEMYCIN, and patients with high SLC38A5 expression may benefit from treatment with 7-Hydroxystaurosporine or Acrichine. Fig. 8. [140]Fig. 8 [141]Open in a new tab Examination of medication sensitivity and connections between immune treatment pathways. (A) ssGSEA analysis to ascertain whether immune treatment predicting pathways and ARX are related. (B) Correlation network diagram illustrating the relationship between MXI1 and immune therapy predicted pathways. Solid lines indicate positive correlations, dashed lines indicate negative correlations, green lines indicate P < 0.05, and purple lines indicate ns, or “not significant.” (C–H) Scatterplots showing the analysis of drug sensitivity and three distinctive gene expressions from the CellMiner database; ns denotes non-significant; *P < 0.05; **P < 0.01. Discussion OS is a common malignant tumor among adolescents and the second leading cause of cancer death in this age group^[142]27. Due to its highly heterogeneous and invasive nature, OS frequently metastasizes, with 10–20% of patients having lung metastases at diagnosis^[143]28. Following metastasis, there is a steep decline in patient prognosis, marked by a disappointing five-year survival rate^[144]29. In contrast, RB, a cancer susceptibility syndrome associated with OS, exhibits a higher survival rate. With advancements in clinical diagnosis and treatment, the 5-year survival rate for RB in developed countries can approach 100%^[145]30. However, RB patients face a significantly increased risk of developing OS due to exposure to radiation therapy and chemotherapy—up to 69 times higher, and nearly 400 times higher with radiation exposure specifically^[146]8. Studies indicate that the prognosis for patients with secondary OS is similar to those with primary OS^[147]31,[148]32. The complex molecular mechanisms underlying the interactions between OS and RB remain elusive, underscoring the vital importance of exploring the molecular pathways connecting the two conditions to better understand OS pathogenesis and to develop effective screening, diagnostic, preventive, and therapeutic strategies. We found that RB and OS have a differential gene expression profile, which indicates that these DEGs are mostly engaged in immune cell activation and proliferation. Tumor progression has been demonstrated to be significantly influenced by immune cell subsets present in the tumor microenvironment. The most common innate immune cell in humans, neutrophils, react to certain triggers called Neutrophil extracellular traps (NETs), which can encourage tumor growth and metastasis^[149]33,[150]34. Building on this, Tang et al.^[151]35 developed a prognostic model for OS related to NETs and confirmed that its key genes exacerbate OS metastasis. Therefore, exploring the role of immune cells in the OS microenvironment is crucial for a deeper understanding of its metastasis and progression mechanisms. Given the significant enrichment of DEGs in immunity-related pathways and functions, we aimed to further investigate the immunological differences between groups with good and poor chemotherapy response in OS. We discovered that the degree of immune cell infiltration was slightly higher in the group with a good response. Additionally, TIDE analysis indicated that TIDE and Exclusion scores are lower in the good group than in the poor group, suggesting that patients with a positive response to chemotherapy might be ideal candidates for immunotherapy. Subsequently, through WGCNA, we identified 65 ROIGs. Analyses via KEGG and GO demonstrate that these ROIGs are mainly enriched in cancer-related pathways such as the cell cycle, Wnt signaling pathway, and p53 pathway. Notably, the Wnt signaling pathway is crucial for regulating epithelial-mesenchymal transition (EMT), with sustained activation frequently observed in OS^[152]36,[153]37. These genes can drive OS progression through the Wnt/β-catenin signaling pathway, thus impacting the prognosis of OS patients^[154]38–[155]40. Machine learning, a significant branch of artificial intelligence, enables the identification of critical features from large volumes of high-dimensional data and has been widely used in the development of prognostic models^[156]41–[157]43. Six hub genes, namely SLC38A5, SH3PXD2A, MXI1, SERPINH1, ARX, and COL5A2, were discovered from ROIGs using Cox regression analysis and three different types of machine learning techniques. These were used to develop our model. COL5A2 is a collagen gene family member that contributes significantly to tumor progression by being engaged in tumor angiogenesis and metastasis^[158]44–[159]46. Han and colleagues conducted in vitro research to establish that increased COL5A2 expression speeds up osteosarcoma cell proliferation, invasion, and migration. They also discovered a negative correlation between high COL5A2 expression and a bad prognosis for OS patients^[160]39. The SH3PXD2A antisense RNA1 (SH3PXD2A-AS1) is notably expressed in colorectal cancer (CRC) tissues, where it interacts with the p53 protein and modulates p53-mediated gene transcription to facilitate CRC progression^[161]47, though its specific molecular mechanisms in OS are yet to be elucidated. SERPINH1, a major member of the serpin superfamily, is significantly overexpressed in osteosarcoma tissues compared to normal tissues. It promotes osteosarcoma progression by activating the PI3K-Akt signaling pathway and is an important prognostic biomarker for OS^[162]48,[163]49. Moreover, multivariate Cox regression analysis has demonstrated that SLC38A5, MXI1, and ARX are independent prognostic factors for OS. The dimerization protein MXI1, a member of the MAD gene family, plays a crucial role in controlling cell proliferation and growth by competing with Max to antagonize the transcriptional activity of Myc^[164]50. Downregulation of MXI1 contributes to tumor development and progression, yet its role in OS has not been reported. Xu et al.^[165]51 suggest that MXI1 might inhibit the progression of OS by impairing cell proliferation and inducing apoptosis. SLC38A5, an amino acid-dependent Na/H exchanger, plays a key role in cancer-related metabolic pathways^[166]52. In breast cancer, SLC38A5 enhances cancer cell vitality and reduces sensitivity to cisplatin chemotherapy by promoting glutamine metabolism^[167]53. Sikder et al.^[168]54 demonstrated the therapeutic effect of the selective SLC6A14 inhibitor α-methyl-L-tryptophan in colon cancer, but whether SLC38A5 can serve as a drug target for OS requires further exploration. The Aristaless-related homeobox gene (ARX) is an important transcription factor that plays a crucial role in the development of the central nervous system. It binds to the specific sequence motif 5′-TAATTA-3′ in the regulatory elements of the histone demethylase KDM5C, promoting its transcription and leading to neurodevelopmental disorders. ARX has been shown to be involved in the onset and progression of certain cancers^[169]55. Research by Azfar et al.^[170]56 demonstrated a strong association between high ARX expression in pancreatic neuroendocrine tumors and poor prognosis, as well as recurrence. However, studies on ARX in osteosarcoma (OS) remain scarce, and its role in OS requires further investigation. In our study, we found that ARX is associated with unfavorable prognosis in OS patients, highlighting a potential new direction for future research. In recent years, given the success of immunotherapy in various solid tumors^[171]57,[172]58, researchers have begun exploring immunotherapies that could improve the prognosis of OS patients. Studies suggest immunotherapy holds broad potential applications in OS^[173]59,[174]60. Mifamurtide (MTP-PE), by activating monocytes and selectively targeting tumor cells, has enhanced survival rates in OS patients^[175]56,[176]61. The combination of anti-TGF-β antibodies and dendritic cells (DCs) enhances systemic immune responses, thereby exerting an anti-tumor effect in OS^[177]62. The tyrosine kinase inhibitor of VEGFR2, cabozantinib (XL184), has demonstrated notable anti-tumor effects in advanced osteosarcoma patients, potentially offering a novel treatment option for those with unresectable or recurrent OS^[178]63. Our research indicates that ARX and MXI1 are intricately linked to several immunotherapy pathways, including pyrimidine metabolism, mismatch repair, APM signaling, DNA replication, and IFN-Gamma signature. We have also identified several drugs that target these genes from the CellMiner database, including [179]AEG40730, azd5582, ARQ-680, and HYPOTHEMYCIN. Thus, ARX and MXI1 represent promising new targets for immunotherapy in OS. To our knowledge, this is the first study to explore transcriptomic commonalities between RB and OS, utilizing machine learning algorithms to identify characteristic genes in OS and develop a risk model. Our research directs further exploration into the molecular mechanisms associated with OS, with the ROIGs signature able to accurately predict the prognosis of OS patients. However, our study has limitations. First, the data samples we have gathered are relatively limited in number. Secondly, further in vivo and in vitro studies are necessary to elucidate the specific mechanisms of three independent prognostic factors (ARX, MXI1, and SLC38A5) in OS. Lastly, while we have demonstrated the predictive capability of our model in OS, its prognostic value in RB still requires further investigation. Conclusion In this study, we analyzed the biological processes and pathways involving shared genes between RB and OS. Utilizing WGCNA and various machine learning algorithms, we identified six key immune-related shared genes between RB and OS, developed a risk model, and identified sensitive drugs. Our research contributes to the management and precision treatment of OS patients. Author contributions R. Y. and R. Z. conceptualized the manuscript; R. Y. and W. Y. was responsible to methodology and software; W. Y. was responsible to formal analysis; R. Z. and D. L. was responsible to investigation; Y. H. and H. T. was responsible to data curation; R. Y. and Q. Y. wrote original draft; H. T. and Q. Y. wrote, reviewed and edited the main manuscript text. All authors have read and agreed to the published version of the manuscript. Data availability All data come from the publicly available datasets in present study. All data come from the publicly available datasets in present study.The datasets analyzed during the current study are available in the GEO database ([180]GSE58780, [181]GSE208143, [182]GSE36001, [183]GSE154540 and [184]GSE21257, https://www.ncbi.nlm.nih.gov/geo/), UCSC Xena database (TARGET-OS, [185]https://xena.ucsc.edu/) and CellMiner database ([186]https://discover.nci.nih.gov/cellminer/home.do). Further information is available from the corresponding author on reasonable request (Rongdong Zeng, Email address: zrd201295@fjmu.edu.cn). Competing interests The authors declare no competing interests. Footnotes Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. These authors contributed equally: Rongjie Ye, Quan Yuan and Wenkang You. Contributor Information Haifeng Tang, Email: 2289325268@qq.com. Rongdong Zeng, Email: zrd201295@fjmu.edu.cn. References