Abstract Lung squamous cell carcinoma (LUSC) represents a major subtype of lung cancer, and it demonstrates limited treatment options and worse survival. Identifications of a prognostic model and chemoresistance mechanism can be helpful for improving stratification and guiding therapy decisions. The integrative development of machine learning-based models reveals a random survival forest (RSF) prognostic model for LUSC. The 12-gene RSF model exhibits high prognostic power in more than 1,000 LUSC patients. High-risk LUSC patients are associated with worse survival and the activation of the epithelial-mesenchymal transition pathway. Additionally, high-risk LUSC patients are resistant to docetaxel or vinorelbine treatment. In vitro and in vivo drug sensitivity experiments indicates that high-risk HCC15/H226 tumour cells and cell line-derived xenograft models are more resistant to vinorelbine treatment. Furthermore, the combination of chemotherapy with transforming growth factor-β inhibition augments antitumour responses in LUSC tumours. Our study provides valuable insights into prognosis stratification and the development of therapeutic strategies for LUSC. graphic file with name 42003_2025_7595_Figa_HTML.jpg Subject terms: Non-small-cell lung cancer, Prognostic markers, Predictive markers, Machine learning, Translational research __________________________________________________________________ A random survival forest prognostic model for the risk assessment of newly diagnosed LUSC patients may guide the application of chemotherapy in LUSC. Introduction Lung cancer remains the leading cause of cancer-related deaths worldwide. Lung squamous cell carcinoma (LUSC) is the second most common lung cancer subtype and is responsible for 20–30% of lung carcinoma deaths^[48]1. However, unlike lung adenocarcinoma (LUAD), for which targeted therapies (including EGFR and ALK inhibitors) have demonstrated significant efficacy^[49]2, the lack of active genetic alterations and corresponding targeted therapies that are approved for the treatment of LUSC has hindered the development of precision medicine^[50]3. Chemotherapy is the standard treatment for LUSC^[51]4. Platinum doublet chemotherapy (which involves paclitaxel, gemcitabine, docetaxel and vinorelbine combined with platinum) is the first-line treatment for patients with advanced LUSC who are in good health (PS 0–1), and single-drug chemotherapy (which involves gemcitabine, paclitaxel, docetaxel, and vinorelbine) is recommended for patients with advanced LUSC (at PS = 2)^[52]5–[53]7. However, chemoresistance is a significant obstacle for treatment and may lead to poor survival^[54]8. Therefore, identifications of a prognostic model and chemoresistance mechanism are urgently needed for LUSC. Individual risk prediction demonstrates important benefits for clinical decision-making, as it can identify high-risk individuals and provide guidance for appropriate individual interventions and treatments. In recent years, machine learning algorithms have also been employed to predict the individual risk of disease development^[55]9,[56]10. A pioneering machine learning method is the random survival forest (RSF), which estimates a hazard function via the ensemble prediction of many decision trees. Previous studies have demonstrated the ability of RSF to effectively predict survival and identify molecular biomarkers in colorectal cancer^[57]11, hepatocellular carcinoma^[58]12, and oesophageal carcinoma^[59]13. In this study, we combined an integrated analysis of LUSC multiomics data and experimental validation to propose a RSF prognostic model for predicting patient clinical outcomes; importantly, this combined design may also suggest future directions for treatment strategies in LUSC. Results Identification of 12 core fitness genes in LUSC A complete flowchart of this study is shown in Fig. [60]1. To identify LUSC fitness genes, we performed an integrative analysis of 22 LUSC cell lines from the DepMap database, which stores genome-wide CRISPR information on various cancer cells. Genes with a CERES score of less than -1.0 in at least 80% of 22 LUSC cell lines were defined as fitness genes. In addition, we identified tumour-upregulated genes and prognosis-related genes in the TCGA-LUSC dataset. We intersected these three gene lists, and 31 essential fitness genes were identified (Fig. [61]2A and Supplementary Table [62]1). Via the training cohort from TCGA, we employed the “randomForestSRC” package to further select the final 12 core fitness genes (Fig. [63]2A and Supplementary Table [64]2), which were uniquely identified in LUSC (Supplementary Fig. [65]1A). Fig. 1. The workflow of this study. [66]Fig. 1 [67]Open in a new tab Created with BioRender.com. Fig. 2. Development and validation of a 12-gene fitness RSF model. [68]Fig. 2 [69]Open in a new tab A Core fitness gene selection in LUSC. B The 5-year AUCs and C-indices for seven prognostic models in the overall TCGA-LUSC dataset (n = 493). C Kaplan-Meier estimates of overall survival according to the RSF risk score in the training cohort from TCGA (n = 345) (Log-rank test). D Kaplan-Meier estimates of overall survival according to the RSF risk score in the test cohort from TCGA (n = 148) (Log-rank test). E Kaplan-Meier estimates of overall survival according to the RSF risk score in the [70]GSE157011 cohort (n = 484) (Log-rank test). F Multivariate Cox regression analysis of the RSF and seven prognostic models in the overall TCGA-LUSC dataset (n = 493). G Multivariate Cox regression analysis of the RSF and seven prognostic models in the [71]GSE157011 cohort (n = 484). Integrative construction of the prognostic model Some prognostic models using fitness genes have been reported in acute myeloid leukaemia^[72]14, bladder cancer^[73]15, and renal cell carcinoma^[74]16 but not in LUSC. With the training cohort from TCGA, seven prognostic models, including RSF, GBM, Lasso, Suprepc, Ridge, ElasticNet, and CoxPH, were established based on the 12 core fitness genes. In the training cohort, the predictive performance of the RSF model (0.798) was better than that of the other models, with C-indices of 0.734 for GBM, 0.591 for Lasso, 0.585 for Suprepc, 0.589 for Ridge, 0.588 for ElasticNet, and 0.582 for CoxPH. We also performed comparative analysis among these models in the overall TCGA-LUSC dataset (Fig. [75]2B). The results also revealed that the RSF (0.806) had higher 5-year AUCs than did the other models. After comprehensive consideration of the C-indices and 5-year AUCs, the RSF model was demonstrated to have better predictive performance than the other models and was chosen for further explorations. Evaluation and validation of the RSF model We calculated the RSF risk score for each patient and divided them into high-risk and low-risk groups. The high-risk group was associated with a significantly worse OS (HR: 6.06, 95% CI: 4.05–9.06; univariate Cox regression p < 0.01) (Fig. [76]2C) and remained a significant independent prognostic factor in the training cohort from TCGA (Supplementary Table [77]3). High-risk scores were also associated with worse survival (HR: 1.69, 95% CI: 1.03–2.78, univariate Cox regression p = 0.039) (Fig. [78]2D) and were an independent prognostic factor in the test cohort from TCGA (Supplementary Table [79]4). To ensure that the constructed RSF model had prognostic value in the different cohorts, another two independent validation cohorts were used. In [80]GSE157011, the high-risk score group exhibited worse survival (HR: 1.29, 95% CI: 1.01–1.65, univariate Cox regression p = 0.039) (Fig. [81]2E) and was determined to be an independent prognostic factor (Supplementary Table [82]5). Furthermore, LUSC patients with high risk scores also demonstrated worse survival in the CPTAC-LUSC cohort (Supplementary Fig. [83]1B). To validate the ability of the RSF model to predict survival outcomes, we derived an unbiased prognostic signature score in the TCGA training dataset and compared these two prognostic indices via multivariate Cox regression analysis. The results revealed that the RSF risk score was independent of the unbiased prognostic signature across the TCGA training, test, and [84]GSE157011 datasets (all p < 0.05; Supplementary Tables [85]6–[86]8). Furthermore, compared with seven reported prognostic models for LUSC (Model 1^[87]17, Model 2^[88]17, Model 3^[89]18, Model 4^[90]19, Model 5^[91]20, Model 6^[92]21, and Model 7^[93]22) via multivariate Cox regression analysis, the RSF model was an independent prognostic factor in the TCGA training, TCGA overall, and [94]GSE157011 datasets (all p < 0.05; Supplementary Fig. [95]1C and Fig. [96]2F, G). Subgroup analysis for the prognostic value of the RSF model in LUSC In the overall dataset from TCGA, we found that patients with higher RSF risk scores exhibited significantly worse OS, disease-free survival (DFS), and progression-free survival (PFS) (Supplementary Fig. [97]2A, C, E). To further investigate the impacts of clinical characteristics on the prognostic value of the RSF model, we conducted stratified analysis according to the baseline characteristics of LUSC patients (including age, sex, and disease stage) and found that the RSF model was able to discriminate LUSC patients into significantly different prognostic groups for OS, DFS, and PFS (Supplementary Fig. [98]2B, D, F). Genomic analysis in high-risk and low-risk groups of LUSC To obtain more comprehensive insights into the mutational landscape associated with the RSF model, we depicted a landscape of commonly mutated genes in NSCLC between the high-risk and low-risk groups (Supplementary Fig. [99]3A). Moreover, we revealed that the B2M mutation was associated with a higher risk score and poor survival (Supplementary Fig. [100]3B, C). However, the CSMD3 mutation was associated with a lower risk score and good survival (Supplementary Fig. [101]3D, E). The tumour mutational burden (TMB) is reported as being a predictive biomarker in solid tumours. We found that low-risk patients demonstrated higher TMB and better OS (Supplementary Fig. [102]3F, G), which was consistent with the results of Riviere’s study^[103]23. These results indicate that there is tumour genomic heterogeneity between the high-risk and low-risk groups and that precise treatment is needed. To gain a more comprehensive understanding of the molecular differences that exist between the low- and high-risk groups, we also analysed the copy number variation (CNV) data in the TCGA-LUSC dataset. Using GISTIC2, we detected aberrant CNV regions in the low/high-risk groups and identified several common and specific copy number alterations between these two groups. Both groups exhibited amplification in the cytoband of 8p11.23 and deletion in the cytoband of 2q22.1. In addition, specific deletions were detected in the cytoband 15q21.1 region in the low-risk group and in the cytoband 7q36.3 region in the high-risk group (Supplementary Fig. [104]4A, B). Moreover, there were specific amplifications observed in cytobands of 22q11.22 in the low-risk group and unique amplifications observed in cytobands of 2q31.2 in the high-risk group (Supplementary Fig. [105]4A, B). Notably, NFE2L2 is part of a recurrent focal amplicon at the 2q31.2 region in LUSC. Increased NFE2L2 accumulation in cancer tissues is strongly correlated with poor clinical outcomes in various cancer types^[106]24–[107]26. Tumours with 2q31.2 amplification exhibited significantly increased NFE2L2 expression in the TCGA-LUSC cohort (Supplementary Fig. [108]4C), which may partly explain the poor survival that is observed in patients with high-risk tumours. HIBCH is a central enzyme of valine catabolism^[109]27, and recent investigations have demonstrated HIBCH as being a key gene involved in the development of various tumours, such as clear cell renal cell carcinoma^[110]28, colorectal cancer^[111]29, and prostate cancer^[112]30. LUSC tumours with 2q31.2 amplification exhibited significantly increased HIBCH expression (Supplementary Fig. [113]4D), and more studies are needed to explore whether HIBCH is functionally involved in LUSC pathogenesis. A multivariate Cox regression model that included survival-associated genetic abnormalities and the RSF risk score revealed that the B2M mutation, CSMD3 mutation, and 22q11.22 amplification lost their prognostic significance. However, the RSF risk score and TMB remained significant independent prognostic factors (Supplementary Table [114]9). RSF risk score is associated with the tumour microenvironment in LUSC Tumour microenvironment differences were subsequently evaluated. First, we focused on several immune checkpoints and found that PDCD1, CTLA4, HAVCR2, TIGIT, and BTLA were highly expressed in the high-risk group (Supplementary Fig. [115]5A). In addition, the high-risk group demonstrated significantly higher stromal scores and immune scores (Supplementary Fig. [116]5B, C). Afterwards, we detected increased signatures of T cells, myeloid dendritic cells, and fibroblasts in the high-risk group (Supplementary Fig. [117]5D). Moreover, the high-risk group exhibited a higher TIDE score^[118]31, thus suggesting that high-risk patients may be resistant to ICB therapy (Supplementary Fig. [119]5E). Although high-risk patients exhibit high T-cell infiltration, most T cells may maintain an exhausted status and lack the practical ability to control tumour progression, thereby prolonging the survival times of these patients (Supplementary Fig. [120]5F). Moreover, we observed a greater expression level of TGFB1 in the high-risk group (Supplementary Fig. [121]5G), which is consistent with the findings of a previous study indicating that the cytokine TGF-β (encoded by TGFB1) plays a suppressive role during antitumour immunotherapy^[122]32. Moreover, multiple genes involved in TGF-β signalling are regulated via methylation of the promoters of these genes, thereby potentially contributing to mechanisms of immune evasion^[123]33. For example, the TGF-β1 complex is activated by MMP2, and the transduction of TGF-β signalling requires MMP2 activity^[124]34,[125]35. As shown in Supplementary Fig. [126]5H-I, we found that MMP2 exhibited a lower methylation level and a higher gene expression level in the high-risk group. Furthermore, SMAD7, which can be activated by TGF-β to restrain immune cell activation^[127]36, exhibited a lower methylation level and greater expression in the high-risk group (Supplementary Fig. [128]5J, K). Therefore, our investigation revealed that high-risk tumours exhibit a unique methylation pattern with a potential mechanism for regulating the immunosuppressive TME. A high risk score is associated with the EMT signalling pathway We further investigated the gene expression patterns related to the RSF score in the TCGA-LUSC cohort. We found that positively correlated genes were mainly enriched in the PI3K-AKT signalling, focal adhesion, and ECM interaction pathways (Fig. [129]3A). Genes involved in the EMT, PI3K-AKT signalling, focal adhesion, and TGF-β signalling pathways were extracted to generate a heatmap (Fig. [130]3B). The negatively correlated genes were mainly enriched in the ribosome, cell cycle, and DNA replication pathways (Fig. [131]3C). GSEA further revealed that the EMT, ECM, and TGF-β signalling pathways were significantly enriched in the high-risk group (Fig. [132]3D). Pathways enriched in the low-risk group were associated with E2F targets, DNA repair, and the cell cycle. Due to the fact that EMT has also been reported to be associated with poor survival^[133]37, we compared the prognostic value of the RSF risk score and EMT score in the TCGA-LUSC cohort. The results revealed that LUSC with higher EMT scores was associated with a worse prognosis (Supplementary Fig. [134]6). However, compared with the RSF risk score, the EMT score had an inferior ability to predict overall survival in the TCGA-LUSC cohort (5-year AUC: 0.56 vs. 0.81). Fig. 3. Functions and pathways associated with the RSF model. [135]Fig. 3 [136]Open in a new tab A GO enrichment analysis of genes that were positively correlated with the risk score. B Associations between gene expression in the EMT, PI3K-AKT signalling, focal adhesion, and TGF-β signalling pathways and the RSF risk score. C GO enrichment analysis of genes that were negatively correlated with the risk score. D Volcano plot showing normalized enrichment scores based on gene set enrichment analysis for significant pathways that were correlated (right) or anticorrelated (left) with the risk score. A high risk score predicts resistance to chemotherapy in LUSC In TCGA-LUSC chemotherapy-treated patients, the high-risk group also exhibited worse survival (Supplementary Fig. [137]7A), and patients who did not respond to chemotherapy had higher RSF risk scores (Supplementary Fig. [138]7B). Moreover, high-risk patients exhibited significantly worse survival after treatment with docetaxel, vinorelbine, navelbine, or cisplatin (Supplementary Fig. [139]7C–F). We further applied the RSF model to 9 LUSC cell lines with proteomics data. We conducted correlation analysis between the risk score and protein expression and subsequently performed GSEA. The EMT, ECM and focal adhesion pathways were significantly associated with a high risk score (Fig. [140]4A), which was consistent with the results from the TCGA-LUSC cohort (Fig. [141]3D). Drug susceptibility analysis revealed a positive correlation between the risk score and the LN IC50 of vinorelbine from the GDSC database (Fig. [142]4B). In addition, the risk score was significantly associated with the IC50 and AUC values of docetaxel in both databases (Fig. [143]4C, D). These results suggest that the RSF risk score can predict the effectiveness of chemotherapy in LUSC and that high-risk LUSCs are more resistant to chemotherapy. Fig. 4. Proteomic analysis of the RSF model and drug sensitivity analysis of the LUSC cell line. [144]Fig. 4 [145]Open in a new tab A GSEA revealed the upregulation of EMT signalling in the high-risk group. B The risk score was positively correlated with the LN IC50 of vinorelbine in the GDSC database (n = 7) (Pearson’s correlation test). C The risk score was positively correlated with the IC50 of docetaxel in the CTRP database (n = 8) (Pearson’s correlation test). D The risk score was positively correlated with the AUC of docetaxel in the GDSC database (n = 6) (Pearson’s correlation test). LUSC tumour cells with high risk scores exhibit significant resistance to vinorelbine or docetaxel treatment According to the abovementioned results, 9 LUSC cell lines were divided into high-risk (n = 4) and low-risk (n = 5) groups (Supplementary Table [146]10). HCC15 (high-risk), NCI-H226 (high-risk) and NCI-H520 (low-risk) cells were used for further experimental validation. We examined the IC50 values of vinorelbine and docetaxel in these three cell lines. Vinorelbine more efficiently inhibited the growth of NCI-H520 cells, with an IC50 value of 4.727 μM (95% CI: 4.092–5.467 μM), which was lower than the IC50 values for HCC15 (9.352 μM, 95% CI: 8.057–10.89 μM) and NCI-H226 (9.238 μM, 95% CI: 8.222–10.41 μM). Furthermore, the IC50 value of docetaxel was 12.47 μM (95% CI: 11.51–13.52 μM) in NCI-H520 cells, which was also lower than the IC50 values for HCC15 (19.57 μM, 95% CI: 17.84–21.58 μM) and NCI-H226 (25.34 μM, 95% CI: 23.44–27.48 μM) cells (Fig. [147]5A, B). Compared to NCI-H520 cells, the high-risk HCC15 and NCI-H226 cell lines were more resistant to vinorelbine and docetaxel treatment. Furthermore, the results of the cell colony formation assay suggested that high-risk cells were more resistant to vinorelbine or docetaxel (Fig. [148]5C, D). The difference in drug resistance between the high- and low-risk groups indicates that the RSF risk score can predict the efficacy of chemotherapy in LUSC patients. Fig. 5. Experimental validation of the RSF model’s ability to predict chemotherapy efficiency in cell lines. [149]Fig. 5 [150]Open in a new tab A, B NCI-H520, HCC15, and NCI-H226 cells were treated with vinorelbine or docetaxel in a dose-dependent manner. Cell viability was assessed to construct curve plots, after which the IC50 values were calculated (n = 3 per group; ∗∗, p < 0.01; ∗∗∗, p < 0.001; t test). C NCI-H520, HCC15, and NCI-H226 cells were plated into 6-well plates and then treated with DMSO, vinorelbine (3 μM), or docetaxel (10 μM). Fourteen days after implantation, the cells were stained with crystal violet and photographed with a camera (n = 5 per group). D NCI-H520, HCC15, and NCI-H226 cells were treated with DMSO, vinorelbine (3 μM), or docetaxel (10 μM). Following the colony formation assay, the number of colonies in each well was calculated by ImageJ (n = 5 per group). E NCI-H520, HCC15, and NCI-H226 cells were treated with DMSO or the TGF-β receptor I inhibitor SB525334 (5 μM) for 2 days. The cell lysates were immunoblotted with the indicated antibodies. F–G NCI-H520, HCC15, and NCI-H226 cells were treated with vinorelbine (3 μM) or docetaxel (10 μM) with either DMSO or SB525334 (5 μM) pretreatment. Cell viability was tested every day after cell implantation (n = 5 per group; ns, not significant; *, p < 0.05; **, p < 0.01; ***, p < 0.001, ANOVA). H NCI-H520, HCC15, and NCI-H226 cells were treated with multiple doses of the TGF-β receptor I inhibitor SB525334 combined with vinorelbine or docetaxel for 2 days. Synergy plots depicting Gaddum’s noninteraction model (HSA) for NCI-H520, HCC15, and NCI-H226 cells were separately drawn. An HSA value that exceeds 10 indicates a synergistic interaction. High-risk LUSC patients may benefit from combined treatment with chemotherapy and TGF-β pathway inhibition The EMT pathway was significantly enriched in the high-risk group in both the TCGA-LUSC dataset and the LUSC cell lines (Figs. [151]3D, [152]4A). TGF-β is a well-established and potent inducer of EMT^[153]38. The TGF-β receptor I inhibitor SB525334 has been reported to effectively inhibit the TGF-β pathway. We further investigated whether inhibition of the TGF-β pathway reverses chemotherapeutic resistance in high-risk HCC15 and NCI-H226 cells. To assess the efficacy of this inhibition, we examined the protein levels of downstream markers in the TGF-β pathway. After the inhibition of TGF-β receptor I, no significant changes in the expression levels of TGF-β receptor I itself were observed. Conversely, the protein levels of N-cadherin and vimentin were decreased, and these decreases were accompanied by an increase in E-cadherin expression (Fig. [154]5E). When combined with TGF-β pathway inhibition, both vinorelbine and docetaxel significantly suppressed the proliferation of high-risk HCC15 and NCI-H226 cell lines. However, for the low-risk NCI-H520 cell line, combined drug inhibition had a limited effect (Fig. [155]5F, G). To determine whether the inhibition of the TGF-β pathway enhances the effects of vinorelbine and docetaxel at multiple doses, we evaluated the synergistic effects by the highest single agent (HSA) synergy model. In high-risk HCC15 and NCI-H226 cells (but not in low-risk H520 cells), the HSA synergy scores far exceeded 10, thus indicating that SB525334 synergised with vinorelbine and docetaxel (Fig. [156]5H). These results suggest that vinorelbine or docetaxel combined with TGF-β inhibition (TGF-βi) may aid in the treatment of LUSC patients with high RSF risk scores. Vinorelbine resistance is reversed via combined inhibition of the TGF-β pathway in mice To further verify the efficacy of vinorelbine combined with TGF-β pathway inhibition in vivo, we conducted tumour implantation procedures in NOD-SCID mice. As shown in Fig. [157]6A, the mice were injected with NCI-H520, HCC15 or NCI-H226 cells and treated with vinorelbine, SB525334 or a combination of both drugs, respectively. Compared with the vehicle group, the growth of the NCI-H520-derived tumours was greater than that of the HCC15-derived and NCI-H226-derived tumours following vinorelbine treatment (Fig. [158]6B, C, D). Additionally, vinorelbine exerted minimal effects on the weights of the HCC15-derived and NCI-H226-derived tumours (Fig. [159]6E). Furthermore, HCC15- and NCI-H226-derived tumours exhibited increased sensitivity to vinorelbine after combination with SB525334, as did NCI-H520-derived tumours. Tumours derived from these LUSC cell lines were significantly inhibited after the combination treatment (Fig. [160]6D, E). These in vivo results provide valuable evidence for the development of therapeutic strategies for LUSC. Fig. 6. Vinorelbine resistance was reversed via the combined inhibition of the TGF-β pathway in mice. [161]Fig. 6 [162]Open in a new tab A NCI-H520, HCC15, and NCI-H226 cells were subcutaneously injected into NOD-SCID mice. Once the average tumour volume reached 100 mm³, the mice were randomized to receive different treatments. At 21 days postimplantation, the tumours were collected and weighed (n = 5 mice per group). B Mice were euthanized at 21 days after implantation, and all of the collected tumours are displayed in the figure (n = 5 mice per group). C Tumour volumes of NCI-H520-, HCC15-, and NCI-H226-derived tumours were measured every three days following implantation. Curve plots were generated to illustrate tumour growth associated with different cell line-derived tumours (n = 5 mice per group; ns, not significant; *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.0001, ANOVA). D Tumour volumes of NCI-H520, HCC15, and NCI-H226-derived tumours on day 21 postimplantation were measured before the mice were euthanized (n = 5 mice per group; ns, no significance; *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.0001, t test). E Tumour weights of NCI-H520, HCC15, and NCI-H226-derived tumours were measured after they were harvested from the mice (n = 5 mice per group; ns, not significant; *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.0001, t test). Discussion Treatment options and efficacy in LUSC remain limited. The lack of targetable mutations in LUSC has hindered progress in the development of more effective, personalized treatment approaches^[163]3. Since the implementation of the CRISPR-Cas9 system in cancer research, numerous candidate LUSC targets have been revealed via in vitro or in vivo CRISPR screens^[164]39,[165]40. Given the critical importance of fitness genes for the functional phenotypes of cells, we speculated that these LUSC-specific fitness genes may also provide critical prognostic information. Thus, we established a powerful prognostic RSF model composed of 12 core fitness genes, which were validated in an internal cohort and two independent cohorts. Therefore, our study, which included more than 1,000 LUSC patients (representing the largest cohort that has ever been examined), proposed a potential prognostic indicator for LUSC. To explore whether essential genes selected for the RSF were lineage-specific, we further derived essential genes in LUAD and SCLC and found that essential genes were predominantly specific for lung cancer histologic subtypes. GSEA revealed that the RSF risk score was associated with the EMT pathway; however, the RSF risk score performed much better than the EMT signature in survival prediction, thus highlighting the potential role of machine learning in enhancing the predictive power of prognosis^[166]41. Furthermore, when genomic prognostic alterations and RSF scores were adjusted via multivariate Cox regression analysis, both the RSF score and the TMB emerged as independent prognostic factors. These findings suggest that the combination of RSF and TMB may be recommended for further application. Due to the lack of specific targeted drugs, chemotherapy remains the standard treatment for LUSC^[167]4. Our study serves as a proof-of-concept for the utility of the RSF model to identify potential therapeutic candidates for specific chemotherapy agents for patients with LUSC. In the drug sensitivity analysis of the TCGA-LUSC cohort, LUSC patients who received vinorelbine or docetaxel treatment were divided into significantly different prognosis groups according to the RSF risk score. The results of the LUSC cell line drug sensitivity analysis also suggest that a high RSF risk score can predict chemoresistance to vinorelbine and docetaxel. Therefore, we performed in vitro and in vivo drug sensitivity validation to confirm the predictive value of the RSF model by using the LUSC H520 and HCC15/H226 cell lines and CDX models. We found that the H520 cell line or the CDX model with a low RSF risk score exhibited increased sensitivity to vinorelbine. Our pathway enrichment analysis revealed significant EMT activation in the high-risk group. TGF-β is a well-established potential inducer of EMT^[168]38. Although high-risk tumours are resistant to vinorelbine, combined treatment with a TGF-βi reversed resistance in our in vitro and in vivo models. These findings suggest that the RSF risk score may guide the rational use of vinorelbine in patients who are most likely to benefit, such as patients with low RSF risk scores. Furthermore, the combination of vinorelbine with a TGF-βi may provide potential benefits for LUSC patients. A potential limitation of the current study is that we used only published datasets to validate the performance of our prognostic model. In addition, the differences in clinicopathological characteristics between the training and validation cohorts may influence the predictive power of the RSF model. Furthermore, a focus on LUSC fitness genes may overlook some superior prognostic genes that are not associated with cellular fitness. Moreover, the clinical utility of the RSF model needs to be further validated in further large-scale prospective studies in LUSC. In summary, we established a robust 12-gene RSF prognostic model for the rapid risk assessment of newly diagnosed LUSC patients. The predictive power of the RSF model risk score for chemotherapy response indicates its considerable potential for clinical applications. Our study provides a roadmap for patient stratification and may help in developing strategies for personalized follow-up and individualized decision-making for patients with LUSC. Methods Data source and processing RNA-seq expression data for TCGA-LUSC was downloaded from UCSC Xena ([169]https://xenabrowser.net/). Mutation, copy number variation, DNA methylation and clinical data were downloaded from the Genomic Data Commons (GDC) ([170]https://portal.gdc.cancer.gov/). External validation cohorts include [171]GSE157011^[172]42 and CPTAC-LUSC^[173]43. Dependence scores of LUSC cell lines were downloaded from DepMap ([174]https://depmap.org/portal/download/); scores were obtained following a series of loss-of-function genomic screenings in different cell lines. The CERES algorithm^[175]44 was used to determine dependency scores: a negative score of a gene indicates that knocking out the gene inhibits the survival of a cell line, whereas a positive score indicates that knocking out the gene promotes survival and proliferation. To evaluate the specificity of our screening flow, we applied the same criteria used in LUSC to identify essential genes in LUAD and small cell lung cancer (SCLC), the other two predominant subtypes of lung cancer. Differentially expressed genes and prognosis analysis were performed using the data from studies by Gillette et al. ^[176]45 (LUAD) and Liu et al. ^[177]46 (SCLC). Construction and verification of a random survival forest model For the LUSC dataset from TCGA, we performed random sampling at a ratio of 7:3 to split TCGA-LUSC patients into TCGA training cohort (n = 345) and TCGA test cohort (n = 148). In the training cohort, we further selected 12 genes using the ‘var.select’ function in the “randomForestSRC” package; next, we build prognostic models via seven different models, including RSF, GBM, Lasso, Suprepc, Ridge, Elasticnet, and CoxPH. The performance of these models was measured and compared with 5-year AUCs and C-indexes. The 5-year AUCs were calculated via “timeROC” R package. Comparison of time-dependent ROC curves was conducted via “compare” function in the “timeROC” R package. RSF model presented with the best performance and was screened out. Based on the median RSF risk score, patients in the training, test, and [178]GSE157011 cohorts were divided into low- or high-risk groups. For CPTAC-LUSC, the cutoff was generated based on the association between risk score and survival data using the “survminer” package because of the short follow-up time. We evaluated and compared the survival differences between the high- and low-risk groups using Kaplan-Meier plots. Unbiased prognostic signature in LUSC We performed univariate cox analysis in the TCGA-training cohort and identified 132 prognostic genes (Univariate cox regression P < 0.01) which was used to construct an unbiased prognostic signature. The unbiased prognostic signature score was evaluated via single-sample gene set enrichment analysis. Somatic mutation, copy number variation, DNA methylation, immune microenvironment, and functional enrichment analyses Mutation data were processed using the “maftools” R package^[179]47 and we calculated the TMB for the TCGA-LUSC dataset. GISTIC2^[180]48 was applied to detect aberrant CNV regions in the low-risk and high-risk groups. MCPcounter^[181]49 evaluated the infiltration of different cell types. TIDE algorithm^[182]31 was used to estimate the likelihood of an immunotherapy response. The immune score of each sample was calculated via the “ESTIMATE” R package^[183]50. Correlation analysis between the RSF risk score and gene expression was conducted via the Pearson correlation method. Gene set enrichment analysis (GSEA), biological functions and pathways were conducted using “clusterProfifiler4.0” package in R^[184]51. “Hallmark” and “KEGG” gene sets were acquired from MSigDb. EMT score We calculated the EMT score using the EMT signatures identified in the study by Mak et.al. ^[185]52 via single-sample gene set enrichment analysis. Drug sensitivity analysis Proteome data for 9 LUSC cell lines was acquired from Nusinow’s study^[186]53. Drug response data was acquired from Genomics of Drug Sensitivity in Cancer (GDSC) ([187]https://www.cancerrxgene.org/) and Cancer Therapeutics Response Portal (CTRP) ([188]https://portals.broadinstitute.org/ctrp). These two databases provide the IC50 (half the maximal inhibitory concentration) and/or the area under the dose-response curve (AUC) measures of drug sensitivity. A lower IC50 or AUC value indicates a higher sensitivity to an agent. Cell Culture NCI-H520 cell line (human lung squamous cell carcinoma, male), HCC15 cell line (human lung squamous cell carcinoma, male) and NCI-H226 cell line (human lung squamous cell carcinoma, male) were obtained from American Type Culture Collection (ATCC). All human lung squamous cells were maintained at 37 °C with 5% CO2 in RPMI-1640 (Gibco) medium supplemented with 10% foetal bovine serum (FBS, Sigma) and penicillin (100 units/ml)/streptomycin (0.1 mg/ml) (Gibco). All cell lines used were routinely confirmed as pathogen-free by PCR. Cell viability and proliferation Cell viability and proliferation were measured by using cell counting kit-8 (CCK-8) (Biomake, [189]B34304). For IC50 determination, NCI-H520, HCC15 and NCI-H226 cells were plated in 96-well plates at 1 × 10^4 per well and treated with vinorelbine (Selleck, S4269) or docetaxel (Selleck, S1148) next day at different concentrations for 24 h. For cell proliferation measurement, cells were seeded into 96-well plates at 2 × 10^3 per well and treated with 5 μM transforming growth factor-β receptor I inhibitor SB525334 (Selleck, S1476) next day. After 2 days of medication, 3 μM vinorelbine or 10 μM docetaxel was added into plates. For HSA synergy score estimation, cells were seeded into 96-well plates at 1 × 10^4 per well and treated with vinorelbine or docetaxel at multiple doses for 24 h. Then, 10 μl CCK-8 reagent was added to each well at the time of harvest. The cells were incubated at 37 °C for 1.5 hours and then measured at 450 nm absorbance to determine the cell viability or proliferation using the microplate reader (AMR-100, ALLSHENG). Clone formation NCI-H520, HCC15 and NCI-H226 cells were plated in 6-well plates at 800 per well. The medium was changed every 3 days. After 12 days of culture, 3 μM vinorelbine or 10 μM docetaxel were added and the cells were continuously maintained for 2 days. Afterward, cells in plates were fixed with methanol and stained with crystal violet. Pictures of stained cells were taken by camera. Counts of colonies in each well were calculated by ImageJ software. Immunoblotting assay All the cell samples were lysed with RIPA buffer (Beyotime, P0013B) and then the supernatant was boiled in 5× loading buffer (Beyotime, P0013L) for 10 minutes. Each sample was electrophoresed on 10% PAGE gels, then transferred to PVDF membranes and immunoblotted with the antibodies targeting N-cadherin (Abclonal, A19083), E-cadherin (CST, 3195S), VIM (CST, 5741S), TGFβR1 (Abclonal, A0708) and β-Actin (Abclonal, AC004). Immunoblots were visualized by a chemiluminescence imaging system (Tanon). Synergy analysis CCK-8 assays were conducted to evaluate the synergistic interactions between TGF-β receptor I inhibitor (0, 0.5, 1, 2, 4, 8 μM) and vinorelbine (0, 0.125, 0.25, 0.5, 1, 2, 4, 8, 16 μM)/docetaxel (0, 0.25, 0.5, 1, 2, 4, 8, 16, 32 μM). The suppression rates of cell proliferation were calculated after normalized to day 0 measurement. Synergy scores were calculated using Gaddum’s non-interaction model - Highest Single Agent (HSA) with SynergyFinder ([190]https://synergyfinder.fimm.fi/synergy)^[191]54. An HSA value that exceeds 10 indicates a synergistic interaction. Mouse model For animal studies, 5-week-old female NOD-SCID mice were chosen to perform tumour implantation. 1×10^6 NCI-H520, HCC15 or NCI-H226 LUSC cells were injected into the right flank of each mouse. Tumours were measured at serial time points after implantation by digital calipers and tumour volume was calculated as the 0.5×length×width×width. When average tumour volume reached 100 mm^3, mice were randomized to treatment groups. For vinorelbine treatment, mice were injected intraperitoneally at 0, 2 and 4 days after grouping at a concentration of 5 mg/kg. SB525334 was used for TGF-β pathway inhibition through oral gavage at 10 mg/kg every other day. Both drugs were dissolved in DMSO, and then we prepareda solution in 10% DMSO suspension, 40% PEG300 (S6704, Selleck), 5% Tween-80 (A600562, Sangon Biotech), and 45% saline. Twenty-one days after tumour implantation, all mice were executed and dissected to collect tumour tissues and weigh tumours by digital electronic scales. Animal care, housing, and procedures were performed following the Guide for the Care and Use of Laboratory Animals of the Shanghai Pulmonary Hospital Ethics Committee. Animals were randomized for this study and kept under specific pathogen-free (SPF) conditions at the Laboratory Animal Centre of Shanghai Pulmonary Hospital (SYXK (Shanghai) 2022-0013). Ethics approval and consent to participate This study was approved by the Ethics Committee of Shanghai Pulmonary Hospital (K24-311). Statistics and Reproducibility Statistical analysis was performed using GraphPad Prism (Version 9) and R Studio software (version 4.3.1). Student’s t-test or Wilcoxon-rank test was used to compare the statistical differences between sample groups. One-way anova test was used to compare the effects of drugs in mouse model. All p-values were two-tailed with a significance level of 0.05. Supplementary information [192]Supplementary Information^ (4MB, pdf) [193]Supplementary Data 1^ (297.9KB, xlsx) [194]Supplementary Data 2^ (3.2MB, xlsx) [195]Supplementary Data 3^ (280.3KB, xlsx) [196]Supplementary Data 4^ (24.9KB, xlsx) [197]Supplementary Data 5^ (24.2KB, xlsx) [198]42003_2025_7595_MOESM7_ESM.pdf^ (30.6KB, pdf) Description of Additional Supplementary Files [199]reporting-summary^ (2MB, pdf) Acknowledgements