Abstract Osteosarcoma, an aggressive bone malignancy predominantly affecting children and adolescents, is characterized by a poor prognosis and high mortality rates. The development of reliable prognostic tools is critical for advancing personalized treatment strategies. However, identifying robust gene signatures to predict osteosarcoma outcomes remains a significant challenge. In this study, we analyzed gene expression data from 138 osteosarcoma samples across two multicenter cohorts and identified 14 consensus prognosis-associated genes via univariate Cox regression analysis. Using 66 combinations of 10 machine learning (ML) algorithms, we developed a machine learning-derived prognostic signature (MLDPS) optimized by the average C-index across TARGET, [42]GSE21257, and merged cohorts. The MLDPS effectively stratified osteosarcoma patients into high- and low-risk score groups, achieving strong predictive performance for 1-, 3-, and 5-year overall survival (AUC range: 0.852 — 0.963). The MLDPS, comprising seven genes (CTNNBIP1, CORT, DLX2, TERT, BBS4, SLC7A1, NKX2-3), exhibited superior predictive accuracy compared to 10 established gene signatures. The findings of the MLDPS carry significant clinical implications for osteosarcoma treatment. Patients with a high-risk score demonstrated worse prognosis, increased metastasis risk, reduced immune infiltrations, and greater sensitivity to immunotherapy. Conversely, low-risk patients exhibited prolonged survival and distinct drug sensitivities. These findings underscore the potential of MLDPS to guide risk stratification, inform personalized therapeutic strategies, and improve clinical management in osteosarcoma. Supplementary Information The online version contains supplementary material available at 10.1038/s41598-025-00179-z. Keywords: Osteosarcoma, Machine learning, Tumor immunotherapy, Risk score, Prognosis Subject terms: Cancer, Computational biology and bioinformatics, Drug discovery, Immunology, Biomarkers, Oncology, Risk factors Introduction Osteosarcoma, a rare yet aggressive primary malignant bone tumor, predominantly affects pediatric and adolescent populations^[43]1,[44]2. Despite advancements in surgical resection, chemotherapy, and immunotherapy, survival outcomes remain poor, largely due to the high incidence of metastasis and chemotherapy resistance^[45]3,[46]4. The five-year survival rate for osteosarcoma patients exhibiting chemotherapy resistance or metastasis is a mere 20-30%^[47]5–[48]7. Although the TNM staging system provides basic prognostic insights^[49]8, it has limited capacity to capture the molecular complexity and heterogeneity of osteosarcoma. Consequently, researchers are prioritizing the discovery of novel molecular biomarkers to improve clinical outcome prediction and optimize therapeutic strategies^[50]9–[51]11. The treatment landscape for sarcomas, including osteosarcoma, is evolving through the integration of molecular and genomic insights. Immune checkpoint inhibitors, tumor microenvironment-targeted therapies, and combination regimens show promise in sarcoma management; however, their application in osteosarcoma remains exploratory. For example, genetic profiling of uterine leiomyosarcoma has uncovered frequent mutations in tumor suppressor genes such as TP53, RB1, and PTEN, pointing towards potential therapeutic targets^[52]12. Nonetheless, adjuvant chemotherapy yields inconsistent results across early-stage uterine leiomyosarcoma subtypes^[53]13,[54]14. Similarly, while immune checkpoint inhibitors demonstrate efficacyin advanced patientscancers^[55]15–[56]20, their role in osteosarcoma requires further validation. However, accurately forecasting prognosis has proved to be challenging due to the variations in the efficacy of treatment approaches^[57]21–[58]24. These findings emphasize the difficulty of establishing universal adjuvant therapies for diverse sarcoma subtypes and accentuate the necessity for individualized approaches grounded in tumor-specific biology. To significantly improve osteosarcoma survival rates, it is important to ascertain innovative biomarkers capable of accurately predicting clinical outcomes and therapeutic responsiveness. Therefore, high-confidence prognostic gene signatures are essential for guiding prognosis and treatment regimens. As high-throughput sequencing technology advances, researchers are increasingly confronted with vast amounts of biological and clinical data, collectively termed “Big Data”^[59]25–[60]27. These datasets enable the systematic identification of gene signatures and molecular pathways driving cancer progression^[61]28, including osteosarcoma^[62]29–[63]31. Previous studies have constructed prognostic signatures associated with hypoxia^[64]32, apoptosis^[65]33, energy metabolism^[66]22, and the immune tumor microenvironment (TME)^[67]34. Nevertheless, the absence of reliable biomarkers often leads to either overtreatment or undertreatment, causing significant financial burden, severe adverse effects, or accelerated disease progression^[68]35,[69]36. To address this gap, machine learning (ML), a subfield of artificial intelligence, has emerged as a powerful tool for constructing predictive models from large-scale datasets^[70]37,[71]38. ML algorithms has proven instrumental in biomedical research, facilitating biomarker discovery, therapeutic target identification, and mechanistic insights across diseases^[72]39–[73]41. For instance, Zhang et al.^[74]42 leveraged ML to develop a six-gene oxidative stress signature for predicting acute myeloid leukemia outcomes. Similarly, recent studies have employed ML to identify diagnostic and prognostic biomarkers in osteosarcoma^[75]43,[76]44, such as Shi et al.’s six-gene risk model (MYC, COL13A1, UHRF2, MT1A, ACTB, GBP1)al^[77]23.Nevertheless, integrating ML with bioinformatics to link prognostic signatures to diagnostic biomarkers and elucidate their clinical significance remains underexplored in osteosarcoma. To identify an optimal biomarker, we constructed an ML-derived prognostic signature (MLDPS) by implementing 66 ML algorithm combinations based on consensus prognostic signature genes (CPSGs). The MLDPS demonstrated robust predictive capabilities for overall survival (OS), metastasis, immunotherapy response, and drug treatment efficacy in osteosarcoma cohorts. After integrating key clinicopathological characteristics and 10 established osteosarcoma signatures, MLDPS exhibited consistent and significantly improved predictive performance. Overall, our research provides a valuable resource for guiding early diagnosis, prognostic evaluation, stratified treatment, personalized therapy, and post-treatment monitoring of osteosarcoma in clinical settings. Materials and methods Public osteosarcoma dataset collection and preprocessing The TARGET-osteosarcoma cohort ([78]https://ocg.cancer.gov/programs/target) consists of 85 samples with survival data and follow-up periods exceeding zero. We also acquired the [79]GSE21257 dataset from the Gene Expression Omnibus ([80]https://www.ncbi.nlm.nih.gov/geo/), which includes 53 osteosarcoma samples with survival information and follow-up periods greater than zero for all patients. In total, 138 samples were included in this study, encompassing 14,174 genes from the TARGET and [81]GSE21257 cohorts. Univariate Cox regression analysis Univariate Cox regression analysis was conducted on the [82]GSE21257 and TARGET cohorts using intersecting genes. CPSGs were selected for further investigation based on the criteria of P < 0.01 and consistent hazard ratios (HRs) that were either above or below 1 in both cohorts. Functional enrichment analysis The CPSGs were used to perform Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses. The R package ‘clusterProfiler’ was employed for Gene Set Enrichment Analysis (GSEA) along with the GO and KEGG analyses^[83]45–[84]49. Gene Set Variations Analysis (GSVA) was conducted using the R package ‘GSVA’^[85]50. The biocarta, KEGG, Reactome, wikipathways, and Hallmark gene sets used in GSVA were downloaded from the Molecular Signatures Database (MSigDB; [86]http://software.broadinstitute.org/gsea/msigdb/index.jsp) version 6.2. Development of MLDPS To construct the MLDPS and ensure high accuracy and generalizability, we integrated 10 widely used ML algorithms: gradient boosting machine (GBM), survival support vector machine (Survival-SVM), supervised principal components (SuperPC), ridge regression, partial least squares regression for Cox (plsRcox), elastic network (Enet), random forest (RSF), least absolute shrinkage and selection operator (LASSO), CoxBoost, and Stepwise Cox. Notably, RSF, LASSO, CoxBoost, and Stepwise Cox effectively reduce dimensions and screen variables. The development pipeline of the MLDPS is outlined as follows: * Univariate Cox analysis was performed on the merged cohort (created by removing batch effects and redundancies from the TARGET and [87]GSE21257 cohorts) to identify CPSGs, defined as genes with a significance level of P < 0.01 and consistent HR orientation across the two cohorts. * Ten ML algorithms were used to create 66 algorithms combinations to develop the most predictive MLDPS with superior C-index performance. * After establishing the model on the training cohort, we evaluated it on two separate testing cohorts. The average C-index was calculated for each model, with the model exhibiting the highest value being deemed optimal. Prognostic value of MLDPS and potential clinical application Patients from the TARGET, [88]GSE21257, and merged cohorts were stratified into high- and low-risk score groups based on the median MLDPS value. The prediction accuracy of the MLDPS was assessed using receiver operating characteristic (ROC) curves and by calculating the time-dependent area under the ROC curve (AUC). The prognostic significance of the MLDPS was evaluated using Kaplan–Meier (KM) curves. Collection and calculation of osteosarcoma’s published signatures Numerous prognostic signatures have been developed for osteosarcoma, focusing on stratified management and precise treatment, including those based on glycolysis- and cholesterol synthesis–related genes^[89]22, M1 macrophage–related genes^[90]24, and apoptosis–associated genes^[91]33. To compare the predictive efficacy of the MLDPS with established prognostic signatures, we systematically reviewed 10 published articles on prognostic models. Subsequently, we computed the risk scores for each signature in the three cohorts using the genes and coefficients outlined in the respective articles. In addition, we conducted a comparative analysis of key clinical characteristics, such as gender, age, and metastasis, between the high- and low-risk score groups in the merged cohort. Assessment of immune infiltrations Immune infiltrations were evaluated in the merged cohort using the R ‘IOBR’ package^[92]51 and various algorithms, including MCPcounter^[93]52, EPIC^[94]53, xCell^[95]54, CIBERSORT^[96]55, IPS, quanTIseq^[97]56, ESTIMATE^[98]57, and TIMER^[99]58. We compared microenvironment-related scores, such as ESTIMATEScore, ImmuneScore, MicroenvironmentScore, and StromalScore, between the high- and low-risk score groups in the merged cohort. The correlation between the MLDPS risk score and chemokines, as well as chemokine receptors, in the combined cohort was determined utilizing the Pearson method to calculate the correlation coefficient and P-value. Correlation coefficients greater than 0 signify a positive correlation, while those less than 0 indicate a negative correlation. A P-value below 0.05 was considered statistically significant. Response to immunotherapy Tumor Immune Dysfunction and Exclusion (TIDE) ([100]http://tide.dfci.harvard.edu/)^[101]59 is an integrated web tool used to forecast patients’ responses to immunotherapy. TIDE scores were calculated for each patient in the merged cohort. Subsequently, immunotherapeutic responses, represented by dysfunction scores and TIDE scores, were compared between high- and low-risk score groups. Prediction of drug sensitivity To further compare differences between high- and low-risk score groups, drug sensitivity analysis was performed to calculate the IC50 values of drugs for each sample using the ‘oncoPredict’ software package. Higher IC50 values indicate decreased sensitivity to treatment. Statistical analysis All statistical analyses were conducted using R software (version 4.2.2). AUC analysis was performed using the ‘timeROC’ package. The KM method, combined with the log-rank test, was employed to assess differences in survival rates among various patient subgroups. Results Development of osteosarcoma CPSGs using univariate Cox regression Prior to biological data analysis, the collected datasets were assessed for batch effects, which were found to be present in both datasets (Fig. [102]1A). To address this, the datasets were merged for downstream analysis, using the R packages ‘limma’ and “sva” to remove batch effects and reduce dataset size (Fig. [103]1B). Next, CPSGs were screened from the merged dataset consisting of 138 samples and 14,174 genes using univariate Cox regression analysis. The robust MLDPS was subsequently constructed using the 14 CPSGs, which showed consistent prognostic outcomes in both the [104]GSE21257 and TARGET cohorts (Fig. [105]1C). The 14 CPSGs comprised 12 risky genes (P-value < 0.01, HR > 1) and two protective genes (P-value < 0.01, HR < 1) (Fig. [106]1C). Fig. 1. [107]Fig. 1 [108]Open in a new tab Removal of batch effects and univariate regression analysis in osteosarcoma. (A) Principle-component Analysis (PCA) plot before batch effect removal. (B) PCA plot after batch effect removal. (C) Univariate Cox regression analysis results for 14 CPSGs in the [109]GSE21257 and TARGET cohorts. GO enrichment analysis revealed that the 14 CPSGs were predominantly enriched in the biological process (BP) of “pattern specification process” (Fig. [110]2A), the cell component (CC) of “mitochondrial inner membrane” (Fig. [111]2B), and the molecular function (MF) of “transcription coactivator binding,” “telomeric DNA binding,” and “telomerase RNA binding” (Fig. [112]2C). KEGG analysis identified the “Arginine and proline metabolism” as the most significant signaling pathway (Fig. [113]2D). These findings lay a groundwork for deeper investigations into the mechanisms of osteosarcoma and uncover pivotal biological processes and pathways that may influence its prognosis. Fig. 2. [114]Fig. 2 [115]Open in a new tab GO and KEGG pathway analyse of CPSGs. The GO enrichment analysis of 14 CPSGs. The enriched BPs (A), CCs (B), and MFs (C) are noted. (D) The KEGG pathway enrichment analysis of 14 CPSGs. Construction of MLDPS Risk prediction models were developed in the TARGET training cohort using a combination of 10 well-established ML algorithms. The predictive performance of all models was evaluated by calculating the average C-index across the TARGET, [116]GSE21257 and merged cohorts. The optimal model (MLDPS) was built by selecting the combination of LASSO and RSF with the highest average C-index (0.866) (Fig. [117]3A). The LASSO algorithm effectively identified the most critical CPSGs, while the RSF algorithm filtered the most valuable model. Finally, seven potential gene signatures (CTNNBIP1, CORT, DLX2, TERT, BBS4, SLC7A1, and NKX2-3) were identified, as illustrated in Fig. [118]3B. Fig. 3. [119]Fig. 3 [120]Open in a new tab Construction and evaluation of the MLDPS effectiveness. (A) The C-indexes values of 10 well-established ML algorithm combinations across three cohorts. (B) Seven CPSGs selected for the final model using the LASSO algorithm. (C) Time-dependent ROC analysis for predicting OS in the TARGET cohort. (D) AUC values of the time-dependent ROC curves in the TARGET cohort. (E) KM curves for OS in the TARGET cohort. Robust prediction performance of MLDPS To estimate the discriminatory capability and clinical utility of the MLDPS, we plotted ROC and survival curves. As shown in Fig. [121]3C and D, the AUCs for OS in the TARGET cohort at 1, 3, and 5 years were 0.949, 0.957, and 0.963, respectively. Similarly, the model exhibited robust performance in the [122]GSE21257 cohort, achieving AUCs of 0.954, 0.869, and 0.852 (Fig. [123]4A and B). The merged cohort had AUCs of 0.935, 0.913, and 0.912 (Fig. [124]4D and E). These findings indicate the robust prognostic capability of the MLDPS model for osteosarcoma patients, with AUCs ranging from 0.852 to 0.963 for OS at 1, 3, and 5 years across all three cohorts. Furthermore, KM analysis confirmed that the MLDPS could effectively stratify osteosarcoma patients into distinct high- and low-risk score groups. Patients in the high-risk score group, exhibited significantly worse survival outcomes compared to the low-risk score group across all three cohorts (P < 0.0001 for all cohorts) (refer to Figs. [125]3E and [126]4C, and Fig. [127]4F). Fig. 4. [128]Fig. 4 [129]Open in a new tab Prediction performance of MLDPS in the [130]GSE21257 and merged cohorts. Time-dependent ROC analysis for predicting OS in the (A) [131]GSE21257 and (D) merged cohorts. AUC values of the time-dependent ROC curves in the (B) [132]GSE21257 and (E) merged cohorts. KM curves for OS in the (C) [133]GSE21257 and (F) merged cohorts. Re-evaluation of 10 previously published signatures in osteosarcoma The expeditious advancement of high-throughput sequencing has boosted the stratified management and precise therapeutic approaches for tumors^[134]60. Recently, numerous prognostic signatures for osteosarcoma have been developed using ML algorithms, including RSF and CoxBoost, leveraging extensive and reliable datasets^[135]23,[136]61,[137]62. A total of 10 prognostic signatures were collected from published sources to perform a comparative assessment of the MLDPS’ predictive efficacy relative to these models. The evaluation was carried out using the C-index metric across three distinct cohorts. These results demonstrated that the MLDPS achieved significantly higher predictive accuracy than the other models across all three cohorts, consistently ranking first (Fig. [138]5). This underscores the robustness and reliability of the MLDPS. Notably, certain prognostic signatures, such as those reported in PMID34894177^[139]23, PMID37234254^[140]9, and PMID34337075^[141]32, exhibited higher C-index values in one or two cohorts but performed poorly in the others. This discrepancy may stem from impaired generalization ability caused by overfitting, as plotted in Fig. [142]5. Fig. 5. [143]Fig. 5 [144]Open in a new tab Comparisons of C-indexes values between MLDPS and 10 expression-based signatures. C-indexes values of MLDPS and 10 published signatures in the TARGET, [145]GSE21257, and merged cohorts. *P < 0.05, **P < 0.01, ***P < 0.001, and ****P < 0.0001. The clinical signature of MLDPS Univariate Cox regression analysis revealed that the MLDPS, published signatures (excluding PMID35598382^[146]33), and metastasis were statistically significant within the merged cohort (Fig. [147]6A). However, no statistically significant differences were observed for gender or age (Fig. [148]6A). Further analysis was conducted to compare OS and metastasis between high- and low-risk score groups. Compared to the low-risk score group, the high-risk score group exhibited significantly inferior OS (Fig. [149]6B). Patients in the high-risk score group showed a greater propensity for developing metastasis, potentially contributing to their unfavorable prognosis (Fig. [150]6C). Fig. 6. [151]Fig. 6 [152]Open in a new tab The clinical signature of MLDPS and other signatures in the merged cohort. (A) Univariate Cox regression analysis of MLDPS and other signatures in the merged cohort. (B) Comparison of osteosarcoma between the high- and low-risk score groups. Left: Differences in the MLDPS risk scores between the OS groups. Right: Composition percentage of the high- and low-risk score groups in osteosarcoma. (C) Comparison of metastasis between the high- and low-risk score groups. Left: Differences in the MLDPS risk scores between the metastasis groups. Right: Composition percentage of the high- and low-risk score groups in metastasis. Collectively, the timeROC curve, KM survival analysis, and Cox regression analysis outcomes consistently demonstrated that the MLDPS accurately forecasted the prognosis of osteosarcoma patients across all three cohorts. These findings suggest that the MLDPS holds promise as a valuable tool for clinical application. The underlying biological mechanisms of MLDPS Heatmaps (Fig. [153]7A and B) depicted the top 50 genes showing the most significant positive and negative correlations with the risk score. GO enrichment analysis of co-expressed genes positively associated with the MLDPS exhibited significant enrichment in various BPs, namely “ossification,” “ncRNA processing,” “embryonic organ morphogenesis,” “ribosome biogenesis,” “rRNA metabolic process,” and “rRNA processing” (Fig. [154]7C). Enrichment was observed in CCs such as “ribosome,” “mitochondrial matrix,” “ribosomal subunit,” “asymmetric synapse,” and “cytosolic ribosome” (Fig. [155]7D). With respect to MFs, “structural constituent of ribosome” (Fig. [156]7E) was significantly enriched. Notably, KEGG analysis showed “Ribosome” as the only pathway with significant enrichment (Fig. [157]7F). Fig. 7. [158]Fig. 7 [159]Open in a new tab Function and pathway analysis of MLDPS. (A) Heatmap of the top 50 positive correlation genes with the MLDPS risk score; (B) Heatmap of the top 50 negative correlation genes with the MLDPS risk score; The GO enrichment analysis of the top 500 positive correlation genes, including BP (C), CC (D) and MF (E); (F) The KEGG pathways enrichment analysis of the top 500 positive correlation genes. To understand the biological implications of MLDPS, we performed a GSEA analysis. The findings indicated a positive correlation between a high-risk score and the upregulation of key gene sets involved in translation, rRNA processing in the nucleus and cytosol, the major pathway of rRNA processing in the nucleolus and cytosol, and other biological pathways (Supplementary Fig. 1A). GSVA was utilized to analyze the effects of MLDPS on biological pathways in osteosarcoma. A heatmap was used to visualize the top 20 biological processes with statistically significant differences, ranked in ascending order of their P-values. As depicted in Supplementary Fig. 1B, significant pathway differences were observed between the high- and low-risk score groups. These observations may offer a potential explanation for the unfavorable prognosis associated with the high-risk score group. Immune landscape of MLDPS A thorough analysis of the TME in osteosarcoma was conducted using the ‘IOBR’ R package^[160]51. The heatmap exhibited variations in the immunological composition between high- and low-risk score groups (Fig. [161]8A). As shown in Fig. [162]8B, the high-risk score group demonstrated notably diminished ESTIMATEScore, ImmuneScore, MicroenvironmentScore, and StromalScore, suggesting an immunosuppressive condition. Fig. 8. [163]Fig. 8 [164]Open in a new tab Landscapes of the TME between the high- and low-risk score groups. (A) Heatmap indicates the difference in the TME between the high- and low-risk score groups. (B) The diversities of the ESTIMATEscore, ImmuneScore, MicroenvironmentScore, and StromalScore between the high- and low-risk score groups. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001. Further analysis identified a negative correlation between the risk score and the gene expression levels of most chemokines and chemokine receptors (Fig. [165]9A). Specifically, the correlation coefficients between the risk score and the expression levels of CCL8, CSF1, CCR1, CCR2, CCR4, and CCR7 were all below − 0.2, with corresponding P-values less than 0.05 (Fig. [166]9B). Fig. 9. [167]Fig. 9 [168]Open in a new tab The correlations between MLDPS and gene expression of chemokines and their receptors. (A) Correlation heatmap between MLDPS and expression of chemokines and their receptors. (B) Correlation analysis of MLDPS and CCL8, CSF1, CCR1, CCR2, CCR4 and CCR7 gene expression. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001. TIDE algorithms, widely used for predicting responses to cancer immunotherapy, indicate suboptimal immunotherapy outcomes when TIDE values are elevated^[169]59. In the merged cohort, TIDE algorithms predicted responses to immunotherapy for 138 patients. Of these, 46 were identified as responders (TIDE score < 0), whereas the remaining 92 were classified as non-responders (Fig. [170]10A). To evaluate the predictive power of the MLDPS for cancer immunotherapy response, dysfunction and TIDE scores were calculated. Osteosarcoma patients with elevated MLDPS risk scores exhibited decreased dysfunction and TIDE scores, indicating a higher likelihood of responding to immunotherapy in the merged cohort (Fig. [171]10B − D). Overall, these findings indicat that individuals in the high-risk score group are more likely to benefit from immunotherapy. Fig. 10. [172]Fig. 10 [173]Open in a new tab Immunotherapeutic response prediction by TIDE. (A) Patients were classified as potential responders or non-responders to immunotherapy based on their TIDE values in the merged cohort. (B) Differences in Dysfunction scores between high- and low-risk score groups (P < 0.001). (C) Percent weight of clinical response to immunotherapeutic in high- and low-risk score groups. (D) Differences in TIDE scores between high- and low-risk score groups (P = 0.02). Comparative analysis of drug sensitivity between high- and low-risk score groups Precision medicine necessitates clinicians to promptly recognize patients sensitive to different treatments in order to provide tailored and personalized interventions^[174]63,[175]64. Hence, the drug sensitivity between the two risk score groups was evaluated by the ‘oncoPredict’ R package. Drug sensitivity analysis demonstrated a notable discrepancy in the drug response between the two risk score groups. Specifically, the high-risk score group exhibited a correlation with decreased IC50 values for drugs such as ABT737_1910, AZ6102_2109, AZD5153_1706, AZD5991_1720, AZD6738_1917, and BDP.00009066_1866. Conversely, the low-risk score group was correlated with decreased IC50 values for drugs such as AZ960_1250, BMS.754807_2171, Entospletinib_1630, IAP_5620_1428, PF.4708671_1129, and Ribociclib_1632 (Fig. [176]11). Precise prediction of drug sensitivity may have the potential to enhance patient treatment outcomes. Fig. 11. [177]Fig. 11 [178]Open in a new tab Association between the risk model and drug sensitivity in osteosarcoma. Drug sensitivity of osteosarcoma patients with high- and low-risk score. *P < 0.05, **P < 0.01, ***P < 0.001. Discussion Osteosarcoma is a rare malignant tumor originating in bone tissue that predominantly afflicts adolescents and young adults^[179]65,[180]66. Clinicians and researchers face critical challenges due to the lack of readily available biomarkers for screening, stratified treatment, and prognostic follow-up, potentially resulting in inadequate treatment^[181]67. To bridge these gaps, novel methodologies are needed to identify osteosarcoma biomarkers that accurately reflect the biological characteristics of the disease. Recent studies have developed various prognostic signatures for osteosarcoma, many of which focus on specific biological pathways such as glycolysis, ferroptosis, and immunity^[182]68–[183]70. However, this focused approach neglects crucial biological processes that may also contribute to the oncogenesis and progression of osteosarcoma. Moreover, existing research indicates that researchers tend to select modeling algorithms based on their knowledge limitations and personal preferences^[184]23,[185]61,[186]71. To compensate for these