Abstract
Background
Endometrial cancer represents a significant health challenge, with rising incidence and difficult prognostic assessment. This study aimed to develop a robust predictive model integrating programmed cell death-related genes and advanced machine learning techniques.
Methods
Utilizing transcriptomic data from the TCGA-UCEC and GSE119041 datasets, we employed a comprehensive approach involving 117 combinations of machine learning algorithms. Key methodologies included differential gene expression analysis, weighted gene co-expression network analysis, functional enrichment studies, immune landscape evaluation, and multi-dimensional risk stratification.
Results
We identified 10 critical genes (PTGIS, TIMP3, SRPX, SNCA, HIC1, BAK1, STXBP2, TRIB3, RTKN2, E2F1) and constructed a prognostic model with superior predictive performance. The StepCox[forward] + plsRcox algorithm combination demonstrated excellent predictive accuracy (AUC > 0.8). Kaplan–Meier analysis revealed significant survival differences between high- and low-risk groups in both training (HR = 3.37, p < 0.001) and validation cohorts (HR = 2.05, p = 0.021). The model showed strong correlations with clinical characteristics, immune cell infiltration patterns, and potential therapeutic responses.
Conclusions
This study presents a novel, comprehensive approach to endometrial cancer prognosis, integrating machine learning and molecular insights to provide a more precise risk stratification tool with potential clinical translation.
Keywords: Endometrial cancer, Prognostic modeling, Programmed cell death, Machine learning, Precision oncology
Introduction
Endometrial cancer is the sixth most common malignancy among women globally and the fourth most common gynecological cancer. Risk factors include obesity, diabetes, hypertension, hormonal imbalances, and genetic predisposition. The disease is typically diagnosed in postmenopausal women, with a median age of diagnosis around 60 years [1, 2].
Endometrial cancer represents a significant health burden for women, with incidence rates rising annually. However, the management of endometrial cancer faces several critical challenges. Current diagnostic approaches have limited ability to accurately identify high-risk patients at initial presentation, leading to suboptimal treatment selection. Additionally, the heterogeneous response to immunotherapy and other targeted treatments highlights the need for more precise patient stratification tools. These challenges underscore the urgent need for improved prognostic models. Although prognostic prediction models still require refinement, significant advances have been achieved through various innovative models in recent years. Machine learning models incorporating immune cells and molecular markers have demonstrated 69% accuracy in recurrence prediction [3]. While predictive models based on hysteroscopic data have shown high sensitivity and specificity, MRI radiomics models have been proven superior to traditional assessments in early diagnosis and staging [4]. Novel perspectives for prognostic prediction have been provided by multi-omics studies and immune-related scoring models [5, 6], and significant prognostic value has been demonstrated by inflammatory and hypoxia-related research, along with inflammatory markers, in high-grade endometrial cancer [7, 8]. While these models show promise in improving endometrial cancer prognostic prediction, challenges persist. The integration of diverse data types requires sophisticated computational tools and validation in larger, more diverse patient populations. Additionally, translating these models into clinical practice necessitates collaboration among researchers, clinicians, and healthcare systems to ensure accessibility and applicability. As research progresses, these models may pave the way for more personalized and effective management strategies for endometrial cancer patients.
Programmed cell death (PCD) is a crucial biological process that plays a vital role in tumor development and clinical prognosis. It encompasses multiple modalities including apoptosis, autophagy, pyroptosis, and ferroptosis, each contributing differently to cancer progression and treatment outcomes. The expression and regulation of PCD-related genes are essential for cancer development, and their dysregulation may lead to uncontrolled cell proliferation and tumor progression [9–12]. In terms of prognostic prediction, several studies have developed prognostic models based on PCD-related genes. For instance, in colorectal cancer, a risk score combining genes such as FABP4, AQP8, and NAT1 has been shown to effectively predict patient prognosis [13], while in hepatocellular carcinoma, PCD-related gene-based prognostic models identified subtypes with distinct prognostic outcomes [14]. Pan-cancer analysis further identified a gene signature that could distinguish patients with unfavorable prognosis [12]. Regarding the tumor microenvironment, PCD functions by influencing immune cell infiltration and immune checkpoint expression, with high-risk patients typically showing altered immune landscapes that may affect their response to immunotherapy [15, 16]. Meanwhile, PCD-related gene expression can predict sensitivity to certain drugs; for example, colorectal cancer patients with high risk scores showed reduced response to immunotherapy and first-line clinical drugs [13]. In clinical applications, therapeutic strategies targeting PCD pathways are being explored to enhance cancer treatment efficacy [15]. Additionally, PCD-related genes can serve as potential biomarkers for cancer diagnosis and prognosis; for example, SERPINE1 and G6PD have been identified as important prognostic markers in hepatocellular carcinoma [17].
Although PCD plays a crucial role in cancer development and prognosis, its mechanisms of action vary considerably across different cancer types. The interactions between PCD and the tumor microenvironment (TME), along with the influence of genetic and epigenetic factors, further add to the complexity of research. The TME influences PCD through various mechanisms: its hypoxic, acidic, and nutrient-poor conditions can modulate cell death pathways, while stromal and immune cells can either promote or inhibit PCD depending on the context [18]. These processes are further regulated by genetic and epigenetic alterations, including DNA methylation and histone modifications, which affect both PCD-related gene expression and immune responses [18, 19]. Understanding these complex interactions is essential for developing effective therapeutic strategies, as they significantly impact treatment response and resistance mechanisms [20]. Future studies should focus on unraveling these complexities to develop more precise and effective therapeutic strategies. Although prognostic prediction models have been continuously improved, current predictive accuracy remains inadequate, particularly in assessing tumor recurrence and metastasis risk. While programmed cell death-related genes have been proven valuable for prognostic evaluation in various tumors, their potential application in endometrial cancer prognosis prediction has not been fully explored. Furthermore, existing studies predominantly employ single-dimensional data analysis, lacking comprehensive analytical approaches that integrate multi-dimensional information from genomics, transcriptomics, and clinicopathological features. This study addresses this gap by developing an innovative prognostic risk model using 117 machine learning algorithm combinations, integrating data from the TCGA and GEO databases, and focusing on programmed cell death-related genes.
Through comprehensive analyses including differential gene expression, weighted gene co-expression network analysis, and immune landscape evaluation, the research aims to create a robust predictive tool that integrates risk scores, clinical stage, and age, ultimately providing a more nuanced approach to understanding endometrial cancer progression and potentially guiding personalized treatment strategies.
Methods
Data collection and processing
RNA sequencing data were downloaded from the TCGA-UCEC project, which contained transcriptomic profiles from 550 tumor tissue samples and 35 healthy control samples, along with corresponding clinical and survival information. For external validation, the GSE119041 dataset was obtained from the GEO database, comprising gene expression microarray data from 50 UCEC tumor samples analyzed on the GPL570 platform. Additionally, 1254 programmed cell death-related genes were identified through comprehensive literature review [21]. Quality control procedures were systematically applied to the raw data. For missing value processing, we excluded samples with > 20% missing values; remaining missing values were imputed using the k-nearest neighbor method (k = 10). Outliers were identified using the interquartile range (IQR) method, where values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR were flagged and verified. Expression values underwent log2 transformation and quantile normalization to ensure comparability across samples. Batch effects between different sequencing platforms were corrected using the ComBat algorithm. Sample size adequacy was verified through power analysis, indicating that our cohort size would provide 90% power to detect a hazard ratio of 1.5 at a significance level of 0.05.
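The IQR rule and log2 transformation described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the quartile convention (`method="inclusive"`), the pseudocount of 1, and the toy expression values are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
import statistics


def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # Quartile definitions differ between tools; "inclusive" is an assumption.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Mark each value lying outside the IQR fences."""
    lower, upper = iqr_fences(values, k)
    return [v < lower or v > upper for v in values]


def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount is an assumed convention."""
    return [math.log2(v + pseudocount) for v in values]


# Toy expression values for one gene across nine samples (not study data).
expression = [1, 2, 3, 4, 5, 6, 7, 8, 100]
fences = iqr_fences(expression)
flags = flag_outliers(expression)
logged = log2_transform(expression)
```

With these toy values the quartiles are 3 and 7, so only the last sample falls outside the fences. Flagged values would still need the manual verification the text mentions before any exclusion.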
Inclusion and exclusion criteria
For TCGA-UCEC cohort selection, the following inclusion criteria were applied: (1) primary endometrial cancer samples with complete RNA sequencing data; (2) samples with complete clinical information including survival time, survival status, age, and tumor stage; (3) patients with follow-up time > 30 days. Exclusion criteria included: (1) samples with missing key clinical parameters; (2) patients without clear survival status; (3) samples with low RNA sequencing quality (RNA integrity number < 7); and (4) patients lost to follow-up within 30 days. For the GSE119041 validation cohort, similar criteria were applied: included samples required complete gene expression data, survival information, and clinical parameters, while samples with incomplete follow-up data or missing key variables were excluded. After applying these criteria, 550 tumor samples and 35 normal samples were included from TCGA-UCEC, and 50 tumor samples were retained from GSE119041 for subsequent analysis.
Differential gene expression analysis
Differential expression analysis was performed using the limma package in R. Raw expression matrices were subjected to log2 transformation and quantile normalization. A linear modeling approach was implemented to identify differentially expressed genes between tumor and normal samples. Statistical significance was determined using thresholds of |log2FC| ≥ 1 and p-value < 0.05. Multiple testing correction was applied using the FDR method to control false discovery rates. Expression patterns were visualized through volcano plots and hierarchical clustering heatmaps. The biological relevance of identified genes was evaluated through comparison with known disease-associated genes.
Co-expression network analysis
Gene co-expression networks were constructed using the WGCNA approach. Low-quality genes were filtered based on expression level and variance.
A soft-thresholding power was selected by analyzing the scale-free topology fit index across various thresholds. Gene modules were identified through hierarchical clustering and dynamic tree cutting, with a minimum module size of 100 genes enforced. Module-trait relationships were quantified using module eigengenes, and significant associations were determined (correlation coefficient > 0.3, p < 0.05). Intramodular connectivity and gene significance measures were calculated to identify hub genes within each module.
Candidate gene selection and functional analysis
A rigorous multi-step approach was employed to identify candidate genes. Differentially expressed genes were intersected with WGCNA key module genes and cross-referenced against programmed cell death-related genes. Protein–protein interaction networks were constructed using the STRING database with a confidence threshold of 0.4. Network topology metrics, including degree centrality and betweenness centrality, were computed for each node. Functional characterization was performed through GO and KEGG pathway enrichment analyses using the clusterProfiler package (adj.p < 0.05). Expression patterns across different tumor stages and grades were analyzed to evaluate biomarker potential.
Machine learning model construction
Multiple machine learning algorithms were integrated to develop a robust prognostic model. Candidate genes were initially screened through univariate Cox regression analysis (p < 0.05). Ten classical algorithms were implemented, including Random Survival Forest (RSF), Elastic Net (Enet), Stepwise Cox (StepCox), CoxBoost, Partial Least Squares Regression Cox (plsRcox), Supervised Principal Components (superpc), Generalized Boosted Regression Model (GBM), Survival Support Vector Machine (survivalsvm), Ridge, and Lasso regression. These algorithms were systematically combined into 117 different modeling strategies to leverage their complementary strengths.
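Survival models such as those listed above are commonly compared by Harrell's concordance index (C-index). The following minimal Python sketch shows how that index can be computed from follow-up times, event indicators, and model risk scores. It is a generic illustration rather than the implementation used in the study; the function name and the toy data are hypothetical, and pairs with tied survival times are simply skipped, which is one of several conventions.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ranked correctly by risk score.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) strictly earlier than patient j's follow-up time.
    The pair is concordant when i also has the higher risk score;
    tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort (not study data): follow-up in days, 1 = death, 0 = censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
c_index = harrell_c_index(times, events, risk)
```

In this toy cohort five pairs are comparable and four are ranked correctly. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.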
Model performance was evaluated through k-fold cross-validation, with metrics including C-index and calibration curves being calculated. Sensitivity analyses were conducted to assess model robustness under varying parameter settings.
Model validation and clinical assessment
The prognostic model was subjected to comprehensive validation procedures. Patient cohorts in both training and validation sets were stratified into high- and low-risk groups based on calculated risk scores. Survival differences between groups were assessed using Kaplan–Meier analysis, examining both overall survival and progression-free survival outcomes. The relationship between risk scores and clinical parameters was investigated through detailed statistical analyses. Univariate Cox regression was performed to evaluate the prognostic impact of clinical features. Significant variables (p < 0.05) were tested for proportional hazards assumptions and incorporated into multivariate analyses. Nomograms were constructed to predict 3-, 5-, and 7-year survival probabilities, with predictive accuracy evaluated through calibration curves.
Molecular mechanism investigation
Gene Set Enrichment Analysis (GSEA) was performed for individual prognostic biomarkers. Correlation analysis and ranking procedures were applied to gene expression profiles. Human KEGG pathway gene sets were utilized as reference databases. Samples were dichotomized based on median risk scores, and differential pathway enrichment was analyzed between high- and low-risk groups. Enrichment analyses were conducted for GO terms and KEGG pathways, with significance threshold set at adj.p < 0.05. Results were visualized through appropriate graphical representations to illustrate biological implications.
Immune landscape and therapeutic prediction
The tumor immune microenvironment was systematically evaluated using the xCell algorithm. Immune cell infiltration patterns were quantified and analyzed for significant enrichment (p < 0.05).
The relationship between immune cell populations and risk scores was assessed through Spearman correlation analysis. Somatic mutation profiles were analyzed using the maftools package, focusing on differences between risk groups. Immunotherapy response prediction was performed using the TIDE computational framework, incorporating T cell dysfunction and exclusion scores. Drug sensitivity was predicted through integration with the GDSC database, with IC50 values calculated and correlated with risk scores to guide potential therapeutic strategies.
Results
Identification of prognostic markers and WGCNA analysis in endometrial cancer
A total of 4,300 differentially expressed genes were identified, including 1,930 upregulated and 2,370 downregulated genes. Figure 1A shows the volcano plot of differential gene expression, with significantly upregulated genes in red and downregulated genes in blue. Figure 1B presents the heatmap of differentially expressed genes, clearly demonstrating distinct expression patterns between tumor and normal samples. Using WGCNA analysis, samples were first clustered to detect outliers (Fig. 1C). Figure 1D shows the analysis of network topology for different soft-thresholding powers. The left panel displays the scale-free fit index (y-axis) versus soft-thresholding power (x-axis), with powers ranging from 1 to 30. The optimal β = 10 was selected as it was the lowest power at which the scale-free topology fit index curve flattened out upon reaching a high value (> 0.8). The right panel shows the mean connectivity (y-axis) versus soft-thresholding power (x-axis), demonstrating how connectivity decreases as the soft threshold increases. The gene co-expression network was constructed, revealing 10 distinct modules (Fig. 1E). The dendrogram from dynamic tree cutting shows the hierarchical clustering of genes into these modules (Fig. 1F).
Figure 1G illustrates the module-trait relationships through a heatmap, where each row corresponds to a module eigengene and each column to a trait (Case/Control). The numbers in each cell represent the correlation coefficient and p-value (in parentheses). Three modules showed significant positive correlations with cancer phenotype: MEblue (r = 0.4, p < 0.05), MEbrown (r = 0.37, p < 0.05), and MEred (r = 0.32, p < 0.05).
Fig. 1. Identification of prognostic markers and WGCNA analysis in endometrial cancer. A Volcano plot showing differentially expressed genes between tumor and normal samples. Red dots represent upregulated genes, blue dots represent downregulated genes. B Heatmap of differentially expressed genes between tumor and normal samples. C Sample clustering dendrogram and trait heatmap to detect outliers. D Analysis of network topology for different soft-thresholding powers. Left panel shows scale-free fit index versus soft-thresholding power; right panel shows mean connectivity versus soft-thresholding power. E Gene clustering dendrogram based on topological overlap. F Dynamic tree cut results showing gene modules. G Module-trait relationships showing correlation between module eigengenes and cancer traits
Integration analysis of DEGs, WGCNA, and enrichment analysis in endometrial cancer
Integrative analysis identified key molecular features and pathways in endometrial cancer. The Venn diagram (Fig. 2A) shows the intersection of differentially expressed genes (DEGs), cell death-related genes (CDRGs), and WGCNA results. Among these, 65 genes were found at the intersection of all three analyses, representing potential key regulators. The protein–protein interaction (PPI) network of these 65 candidate genes (Fig. 2B) was constructed using the STRING database with a confidence score > 0.4, revealing complex interactions among these molecules.
Fig. 2. Integration analysis of DEGs, WGCNA, and enrichment analysis. A Venn diagram showing overlaps between DEGs, CDRGs, and WGCNA results. B PPI network of 65 overlapping genes constructed using STRING database. C GO enrichment analysis results showing biological processes, cellular components, and molecular functions. D KEGG pathway analysis results showing enriched cancer-related pathways
The GO enrichment analysis (Fig. 2C) revealed multiple significantly enriched biological processes, cellular components, and molecular functions. The biological processes (BP) were primarily enriched in apoptotic signaling pathway regulation (including positive and negative regulation), transport regulation, and autophagy regulation. Notably, mitochondrial organization and extrinsic apoptotic signaling pathway showed the highest enrichment scores. For cellular components (CC), the analysis highlighted membrane-related structures including membrane raft, caveola, plasma membrane, and clathrin-coated vesicle membrane. The molecular functions (MF) showed significant enrichment in protein binding activities, particularly heat shock protein binding, lyase activity, and peptidase regulator activity. The KEGG pathway analysis (Fig. 2D) identified five major cancer-related pathways, illustrated in a circular layout. The central node represents the intersection of all pathways, with individual pathways radiating outward. The most significantly enriched pathways included proteoglycans in cancer (containing genes like ERBB3, F2R, and HGF), calcium signaling pathway (including ITPR1, KDR, and NGF), melanoma pathway, bladder cancer pathway, and microRNAs in cancer.
Performance evaluation of machine learning models for prognostic prediction in endometrial cancer
A comprehensive evaluation of 117 machine learning combinations was conducted to develop an optimal prognostic model. Time-dependent ROC analysis at 3-year (Fig. 3A), 5-year (Fig.
3B), and 7-year (Fig. 3C) intervals demonstrated model performance across different time points. The heatmap visualization shows AUC values for each algorithm combination, with Dataset1 (training) and Dataset2 (validation) performance displayed in green and blue bars respectively. The StepCox[forward] + plsRcox combination consistently achieved superior performance, with the highest AUC values across all time points (AUC > 0.8).
Fig. 3. Machine learning model performance and survival analysis. A–C ROC curve analysis results for 3-year, 5-year, and 7-year survival predictions across 117 algorithm combinations. D Kaplan–Meier survival curves for high- and low-risk groups in training (left) and validation (right) cohorts
Kaplan–Meier survival analysis (Fig. 3D) further validated the model’s prognostic value. In the training cohort (n = 546, left panel), the high-risk group showed significantly worse survival compared to the low-risk group (HR = 3.37, 95% CI 2.24–5.07, p < 0.001). This finding was independently validated in Dataset2 (n = 50, right panel), where the high-risk group maintained significantly poorer survival outcomes (HR = 2.05, 95% CI 1.08–3.87, p = 0.021). The survival curves demonstrated clear stratification between risk groups, with the separation particularly pronounced in the first 2000 days. The shaded areas represent 95% confidence intervals, and the dotted lines indicate median survival times.
Clinical relevance and prognostic model validation in endometrial cancer
Boxplot analysis (Fig. 4A) revealed significant associations between risk scores and clinical characteristics. Risk scores were significantly higher in patients aged > 60 years (p < 0.0001). Similarly, advanced tumor stages (III–IV) showed progressively higher risk scores compared to early stages (I–II), with significant inter-stage differences (p < 0.05).
Fig. 4. Clinical relevance and model validation.
A Boxplots showing risk score distribution by age and stage. B Forest plots of univariate and multivariate Cox regression analyses. C Nomogram for predicting 3-, 5-, and 7-year survival probability. D Calibration curves for nomogram-predicted survival. E Decision curve analysis at 3-, 5-, and 7-year time points
Univariate and multivariate Cox regression analyses (Fig. 4B) identified three independent prognostic factors: stage (HR = 1.61, 95% CI 1.5–2.18), risk score (HR = 2.04, 95% CI 1.44–2.87), and age (HR = 1.03, 95% CI 1.01–1.05). The nomogram (Fig. 4C) integrated these factors to predict 3-, 5-, and 7-year survival probabilities. The calibration curves (Fig. 4D) demonstrated excellent agreement between predicted and observed outcomes (C-index = 0.78). Decision curve analysis (DCA) at 3-, 5-, and 7-year time points (Fig. 4E) evaluated the clinical utility of the nomogram compared to individual factors. The nomogram consistently showed superior net benefit across a wide range of threshold probabilities, outperforming both single predictors and treat-all/treat-none strategies. This advantage was particularly pronounced at threshold probabilities between 0.2 and 0.6, indicating optimal clinical applicability in this range.
Single-gene GSEA pathway analysis of key prognostic genes in endometrial cancer
GSEA revealed distinct pathway enrichment patterns for each prognostic gene (Fig. 5). PTGIS was enriched in focal adhesion, ECM-receptor interaction, calcium signaling, muscle contraction, and neuroactive ligand-receptor interaction pathways, with peak enrichment scores around 0.6 (Fig. 5). TRIB3 showed significant association with cell cycle regulation, spliceosome function, RNA degradation, DNA replication, and pyrimidine metabolism, with highest enrichment scores of approximately 0.5 (Fig. 5).
TIMP3 was enriched in focal adhesion, ECM-receptor interaction, muscle contraction, and calcium signaling pathways, with enrichment scores reaching 0.6 (Fig. 5). STXBP2 exhibited strong enrichment in immune-related pathways including antigen processing/presentation, graft-versus-host disease, autoimmune disease, and allograft rejection, as well as in ribosome function, showing both positive and negative enrichment patterns (Fig. 5). SRPX showed significant enrichment in focal adhesion, ECM-receptor interaction, axon guidance, cardiac development, and melanogenesis pathways (Fig. 5). BAK1 was primarily associated with spliceosome, cell cycle, proteasome, aminoacyl-tRNA biosynthesis, and RNA degradation pathways (Fig. 5). SNCA demonstrated enrichment in focal adhesion, ECM-receptor interaction, MAPK signaling, melanogenesis, and gap junction pathways (Fig. 5). RTKN2 showed strong association with spliceosome, ubiquitin-mediated proteolysis, RNA degradation, and protein export pathways (Fig. 5). HIC1 exhibited significant enrichment in focal adhesion, neuroactive ligand-receptor interaction, oxidative phosphorylation, and autoimmune disease pathways, with some showing negative enrichment patterns (Fig. 5). E2F1 was enriched in cell cycle, spliceosome, complement/coagulation cascades, RNA splicing, and proteasome pathways, with distinct positive and negative enrichment patterns (Fig. 5). Each gene's enrichment analysis showed statistical significance (FDR < 0.05), with enrichment curves showing distinct patterns and peak enrichment scores ranging from 0.4 to 0.8.
Fig. 5. Single-gene GSEA analysis of key prognostic genes.
GSEA results for 10 prognostic genes (PTGIS, TRIB3, TIMP3, STXBP2, SRPX, BAK1, SNCA, RTKN2, HIC1, and E2F1) showing enriched pathways and their enrichment scores
Immune cell infiltration analysis and its association with risk score in endometrial cancer
Analysis of immune cell infiltration patterns revealed distinct immunological features across risk groups. The stacked bar plot (Fig. 6A) displays the proportion of immune cells in each sample, showing heterogeneous immune cell composition across patients. A quantitative comparison of immune cell infiltration between risk groups (Fig. 6B) identified significant differences in multiple immune cell populations, with particularly notable variations in CD8+ T cells, M1 macrophages, and dendritic cells. The correlation heatmap (Fig. 6C) illustrates relationships between key prognostic genes and immune cell populations. Strong positive correlations were observed between certain genes (e.g., TIMP3, SRPX) and immune cells like M2 macrophages and CD4+ memory T cells, while negative correlations were found with cells like neutrophils and activated NK cells. Further analysis (Fig. 6D) revealed significant correlations between risk scores and key immune parameters: Exclusion score (R = 0.38, p < 2.2e−16), Dysfunction score (R = −0.31, p = 1.1e−13), and TIDE score (R = 0.22, p = 1.4e−07). Higher risk scores were associated with increased immune exclusion and TIDE scores but decreased dysfunction scores, suggesting that high-risk patients might have compromised immune surveillance and potentially different responses to immunotherapy.
Fig. 6. Immune cell infiltration analysis. A Bar plot showing immune cell composition in individual samples. B Box plots comparing immune cell proportions between risk groups. C Correlation heatmap between key genes and immune cell types.
D Correlation plots between risk scores and immune parameters (Exclusion, Dysfunction, and TIDE scores)
Somatic mutation and pathway enrichment analysis in high- and low-risk groups
Somatic mutation analysis (Fig. 7A) revealed distinct mutational patterns. The low-risk group showed predominant mutations in PTEN (86%), ARID1A (56%), PIK3CA (54%), TTN (40%), PIK3R1 (37%), CTNNB1 (34%), CTCF (33%), KMT2D (29%), ZFHX3 (25%), CSMD3 (24%), MUC16 (24%), OBSCN (24%), RYR2 (22%), FAT1 (22%), and MACF1 (22%). In contrast, the high-risk group was characterized by frequent mutations in TP53 (62%), PIK3CA (44%), PTEN (41%), ARID1A (35%), TTN (34%), KMT2D (27%), MUC16 (25%), PPP2R1A (22%), FAT1 (21%), and ZFHX4 (20%).
Fig. 7. Mutation landscape and pathway analysis in risk groups. A Oncoplot showing mutation profiles in low-risk (left) and high-risk (right) groups. B GSEA results showing enriched GO terms and KEGG pathways between risk groups. C Correlation plots between risk scores and drug sensitivity (IC50 values)
GSEA demonstrated significant downregulation of cellular movement-related pathways in the high-risk group, including axoneme assembly, microtubule bundle formation, cilium organization, and cell motility processes (Fig. 7B). KEGG pathway analysis revealed enrichment in the low-risk group of neuroactive ligand-receptor interaction, cell cycle regulation, and immune-related pathways such as antigen processing, cytokine interactions, and complement cascades (Fig. 7B). Drug sensitivity analysis revealed significant negative correlations between risk scores and IC50 values for several compounds (p < 0.0001): MG-132 (R = −0.32, p = 9.9e−15), UMI-77 (R = −0.33, p = 8.4e−15), Sepantronium bromide (R = −0.53, p < 2.2e−16), and WEHI-539 (R = −0.54, p < 2.2e−16), suggesting potential therapeutic implications for high-risk patients (Fig. 7C).
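To make the correlation figures above concrete, here is a minimal Python sketch of a Spearman rank correlation between risk scores and IC50 values. The Methods name Spearman correlation for the immune analysis; that the drug-sensitivity R values were computed the same way is an assumption here. The function names and the five toy patients are hypothetical and are not study data.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: higher risk score paired with mostly lower predicted IC50.
risk_scores = [0.2, 0.5, 0.9, 1.4, 2.0]
ic50_values = [8.1, 7.4, 7.9, 5.2, 4.0]
rho = spearman_rho(risk_scores, ic50_values)
```

A negative rho, as in the toy data, means that patients with higher risk scores tend to have lower predicted IC50, i.e. higher predicted sensitivity to the compound.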
Discussion
Endometrial cancer is a malignant tumor that seriously threatens women's health, and despite advances in traditional treatments such as surgery, radiotherapy, and chemotherapy, patient prognosis remains unsatisfactory [22]. Therefore, developing more accurate prognostic prediction tools is crucial for improving patient outcomes. Machine learning (ML) approaches have overcome the limitations of traditional statistical methods in handling high-dimensional data and complex biological relationships. Traditional approaches are insufficient for processing and integrating multi-omics datasets [23], and complex biological interactions and nonlinear relationships lie beyond their analytical scope [24]. In contrast, ML models have demonstrated unique advantages in efficiently processing large, complex datasets [23] and integrating multi-dimensional information for comprehensive assessment [25]. This study integrated data from the TCGA and GEO databases, evaluating 117 machine learning algorithm combinations to construct, for the first time, a prognostic risk model based on 10 programmed cell death-related genes (PTGIS, TIMP3, SRPX, SNCA, HIC1, BAK1, STXBP2, TRIB3, RTKN2, E2F1). This model not only demonstrated good predictive performance in both training and validation sets but also effectively stratified patients into high- and low-risk groups, with significant differences in survival outcomes between the groups. Notably, the model showed significant correlation with clinicopathological features (such as tumor stage and age), suggesting its potential clinical application value in disease progression assessment. Recent research indicates that, with advances in bioinformatics technology, an increasing number of computational methods are being applied to tumor prognosis prediction. Traditional studies primarily utilized single machine learning algorithms such as LASSO regression [26] or random forests [27] to construct prognostic models.
In comparison, this study evaluated 117 machine learning algorithm combinations for model selection, a systematic strategy intended to improve both prediction accuracy and the reliability of clinical application. Previous studies of molecular features have separately explored the immune microenvironment [[130]28], mutation spectrum [[131]29], and drug sensitivity [[132]30], but lacked systematic integrated analysis. Through multi-dimensional data integration, this study revealed significant differences between high- and low-risk groups in all three aspects and provided new insights for individualized treatment. In particular, the focus on programmed cell death aligns with recent research emphasizing its importance in tumor progression [[133]9, [134]11, [135]31, [136]32]. Furthermore, the elevated T cell exclusion score observed in the high-risk group corresponds with recent findings on immune escape mechanisms [[137]33]. Single-cell sequencing has clear advantages in revealing the complexity of the tumor microenvironment [[138]34–[139]36]. Future research could therefore combine single-cell and spatial transcriptomics [[140]37] to explore the expression patterns of programmed cell death-related genes in different cell types and their interactions with the immune microenvironment. Such work would contribute to a more comprehensive understanding of disease progression mechanisms and could provide new targets for precision therapy. In the construction of the prognostic model, this study employed multi-level screening strategies and validation methods.
Compared to traditional research methods of screening candidate genes through differential expression analysis [[141]38], or studies focusing on specific pathway-related genes [[142]39], this study integrated multi-dimensional screening including WGCNA network analysis, differential expression analysis, and programmed cell death-related genes, ultimately identifying 10 candidate genes: PTGIS, TIMP3, SRPX, SNCA, HIC1, BAK1, STXBP2, TRIB3, RTKN2, and E2F1. This multi-level screening strategy enhances gene selection reliability and aligns with recent concepts of multi-omics integration analysis [[143]40, [144]41]. In model construction, the study evaluated the predictive performance of 117 machine learning algorithm combinations using the Mime1 package, including random survival forest (RSF), elastic net (Enet), stepwise Cox (StepCox), CoxBoost, partial least squares regression for Cox (plsRcox), supervised principal components (superpc), generalized boosted regression models (GBM), survival support vector machines (survivalsvm), Ridge, and least absolute shrinkage and selection operator (Lasso), ultimately selecting StepCox[forward] + plsRcox as the optimal combination. This systematic evaluation approach provides a different modeling perspective compared to existing prediction models [[145]42, [146]43]. Notably, the model demonstrated stable predictive performance in the independent validation set [147]GSE119041 (including 50 UCEC samples), with cross-platform validation results aligning with model validation standards suggested by recent research [[148]44]. Kaplan–Meier survival analysis showed significant differences in survival prognosis between high and low-risk groups (p < 0.05) in both training and validation sets, supporting the stability and potential generalizability of the prognostic model. 
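As an illustration of how candidate survival models can be compared, the sketch below computes Harrell's concordance index (C-index) and ranks candidate models by their mean C-index across cohorts. The ranking criterion is an assumption, since the text does not state how the optimal combination was chosen, and the function names, model labels, and toy values are hypothetical. This is not the Mime1 implementation.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when subject i has the shorter follow-up
    time and an observed event (events[i] == 1). The pair is concordant
    when subject i also has the higher risk score. Tied risk scores count
    as 0.5; pairs with tied times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


def rank_models(cindex_by_model):
    """Return model names sorted by mean C-index across cohorts, best first."""
    def mean_cindex(name):
        values = list(cindex_by_model[name].values())
        return sum(values) / len(values)

    return sorted(cindex_by_model, key=mean_cindex, reverse=True)


# Hypothetical toy cohort (NOT data from the study):
# follow-up time, event indicator (1 = death observed, 0 = censored),
# and model-derived risk score for five patients.
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risk_scores = [2.1, 1.7, 1.9, 0.6, 0.3]
c_index = concordance_index(times, events, risk_scores)

# Hypothetical C-index values for two placeholder model combinations.
toy_results = {
    "combination_a": {"training": 0.70, "validation": 0.60},
    "combination_b": {"training": 0.68, "validation": 0.66},
}
ranking = rank_models(toy_results)
print(f"C-index = {c_index:.3f}; ranking = {ranking}")
```

Ranking by the mean across cohorts, rather than by training performance alone, favors combinations that hold up in the validation data. In this toy example, that places the second combination first despite its lower training C-index.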
Furthermore, the validation strategy based on multiple independent datasets aligns with the methodological framework proposed in recent publications on clinical prediction model evaluation [[149]45]. The ten key genes identified through 117 machine learning algorithms—PTGIS, TIMP3, SRPX, SNCA, HIC1, BAK1, STXBP2, TRIB3, RTKN2, and E2F1—play crucial roles in the development and progression of endometrial cancer. PTGIS is involved in prostaglandin synthesis, while TIMP3 inhibits matrix metalloproteinases, both affecting tumor growth and metastasis [[150]46]. E2F1, a key regulator of the cell cycle, is overexpressed in various cancers, promoting cell proliferation [[151]47]. SNCA is associated with neurodegenerative disorders, and HIC1 functions as a tumor suppressor gene; their dysregulation can contribute to tumorigenesis [[152]48]. TRIB3 and BAK1 are related to stress response and apoptosis, respectively, and their mutations may lead to cancer cell resistance to death signals [[153]49]. RTKN2 and STXBP2 are involved in cytoskeletal organization and vesicle trafficking, and their mutations can affect cellular structure and signal transduction. TRIB3 modulates the PI3K/AKT pathway, a critical signaling cascade in cancer cell survival and proliferation [[154]50]. BAK1, a pro-apoptotic gene, plays a vital regulatory role in programmed cell death [[155]49]. HIC1 and SNCA may influence the tumor immune microenvironment by modulating immune cell infiltration and response, thereby affecting tumor progression and prognosis [[156]51]. PTGIS, associated with fatty acid metabolism, has been identified as a key gene affecting the malignant biological behavior of EC, with its expression levels correlating with tumor invasiveness and immune status in the microenvironment [[157]52]. SNCA can influence tumor progression by regulating the cell cycle and apoptosis [[158]48]. E2F1 dysregulation is associated with increased cell proliferation and tumor progression in EC [[159]53]. 
Although the specific mechanisms of TIMP3, SRPX, and STXBP2 were not discussed in detail in the previous endometrial cancer studies, it is generally understood that TIMP3 is involved in extracellular matrix degradation, SRPX participates in cell adhesion and migration [[160]54], and STXBP2 is involved in vesicle transport and secretion [[161]55], all of which may affect tumor cell invasion and metastasis. HIC1 [[162]56] and RTKN2 [[163]57] influence cell proliferation and survival through tumor suppressor and oncogenic signaling pathways, respectively. Through functional enrichment analysis, this study found that the selected candidate genes were mainly enriched in pathways such as proteoglycans in endometrial cancer, consistent with previous research [[164]58] highlighting the crucial role of proteoglycans in endometrial tumor progression. Single-gene GSEA analysis further revealed associations between these genes and multiple cancer-related pathways, particularly PTGIS and BAK1 genes showing enrichment in apoptosis and proliferation-related pathways. PPI network analysis identified E2F1 at the network's core, aligning with previous research [[165]59] reporting E2F1’s bidirectional regulatory role in tumor progression. In clinical applications, the nomogram constructed in this study integrated factors including risk score, clinical stage, and age, providing clinicians with an intuitive prognostic assessment tool. This integrated model construction approach is similar to strategies employed in recent research [[166]60], with DCA curve analysis supporting the model's clinical utility across a wide range of threshold probabilities. Regarding the immune microenvironment, immune cell infiltration analysis revealed significant differences in immune cell composition between risk groups, with TIDE analysis indicating higher immune escape potential in the high-risk group, echoing recent findings [[167]61]. 
Notably, we found significantly higher TP53 mutation rates in the high-risk group compared to the low-risk group (62% vs 20%), while PTEN mutations were more prevalent in the low-risk group (86%), suggesting these mutation patterns may be associated with immune microenvironment remodeling. Finally, drug sensitivity analysis identified four drugs showing enhanced therapeutic potential in the high-risk group, aligning with recent research [[168]62] suggesting that immune checkpoint inhibitors may not be suitable for all patients, emphasizing the importance of individualized treatment planning. While these findings provide new perspectives for precision treatment of endometrial cancer, prospective clinical studies are needed to validate their clinical application value. Drug sensitivity analysis revealed four compounds showing significantly increased efficacy in the high-risk group, as indicated by lower IC50 values: MG-132 (R = − 0.32, p = 9.9e−15), UMI-77 (R = − 0.33, p = 8.4e−15), Sepantronium bromide (R = − 0.53, p < 2.2e−16), and WEHI-539 (R = − 0.54, p < 2.2e−16). Each of these drugs targets specific nodes in programmed cell death pathways through distinct but complementary mechanisms. MG-132, a proteasome inhibitor, enhances cancer cell apoptosis by inhibiting the ubiquitin–proteasome pathway, leading to increased caspase-3 activation and reactive oxygen species (ROS) upregulation. Notably, MG-132 has shown synergistic effects with cisplatin in endometrial cancer cells and can enhance the expression of apoptotic markers when combined with other therapeutics [[169]63, [170]64]. UMI-77 specifically targets the Bcl-2 family of proteins, key regulators of the intrinsic apoptotic pathway, promoting apoptosis by inhibiting anti-apoptotic protein function [[171]65]. 
Sepantronium bromide (YM155) operates through a distinct mechanism as a survivin suppressant, targeting this critical inhibitor of apoptosis protein that is frequently overexpressed in endometrial cancer [[172]66]. WEHI-539, a selective Bcl-xL inhibitor, disrupts the balance of pro- and anti-apoptotic signals by specifically targeting Bcl-xL-mediated survival pathways [[173]67]. The enhanced sensitivity to these compounds in high-risk patients suggests that their tumors may be more dependent on anti-apoptotic mechanisms for survival, particularly through proteasome-mediated protein degradation and Bcl-2 family protein regulation. Recent years have witnessed diverse approaches to developing prognostic models for endometrial cancer, yet achieving consistent high accuracy remains challenging. A systematic review of risk prediction models for the general population revealed AUC values ranging from 0.68 to 0.77, even when incorporating comprehensive epidemiological variables including reproductive history, hormone use, BMI, and smoking history [[174]68]. While diagnostic models for symptomatic women showed somewhat improved performance with AUC values between 0.73 and 0.957, many still struggled to consistently exceed the 0.8 threshold despite incorporating clinical predictors such as endometrial thickness and recurrent bleeding patterns [[175]68]. Traditional epidemiologic models, even when utilizing data from the Epidemiology of Endometrial Cancer Consortium, achieved limited discriminative ability with AUC values between 0.64 and 0.69, and notably, the addition of genetic factors did not significantly enhance their performance [[176]69]. More sophisticated approaches using machine learning algorithms to predict concurrent endometrial carcinoma in patients with endometrial intraepithelial neoplasia achieved a maximum AUC of only 0.646 with random forest models [[177]70]. 
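For readers less familiar with the AUC values compared in this part of the discussion, the following minimal sketch computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. It covers the simple binary-outcome case only. Survival models are often evaluated with time-dependent ROC analysis, which this sketch does not implement, and the function name and toy values are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equivalent to the Mann-Whitney U statistic divided by the number of
    positive-negative pairs; tied scores contribute 0.5.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy values (NOT data from any cited study):
# label 1 = event occurred, 0 = no event; scores are predicted risks.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]

toy_auc = auc(labels, scores)
print(f"AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is the scale on which the 0.64 to 0.957 range quoted here should be read.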
Bayesian network models analyzing survival-related factors showed improvement with an AUC of 0.787, outperforming traditional Cox proportional hazards models (AUC = 0.723) but still falling short of optimal predictive power [[178]42]. Recent molecular and biomarker-based approaches have also shown moderate success: fragmentomics-based liquid biopsy models demonstrated AUC values of 0.72 for stage prediction and 0.73 for histological subtype classification [[179]71], while disulfidptosis-related prognostic models achieved AUCs of 0.71 for overall survival and 0.69 for disease-free survival [[180]72]. In contrast, our model achieved superior predictive performance (AUC > 0.8) through the integration of 117 machine learning algorithms, comprehensive incorporation of programmed cell death-related genes, multi-dimensional validation strategy, and robust cross-platform validation. Our immune microenvironment analysis revealed significantly elevated T cell exclusion scores in the high-risk group, suggesting complex immune evasion mechanisms that operate through multiple molecular pathways. At the cytokine and chemokine level, we observed decreased expression of CXCL9/CXCL10/CXCL11 in the high-risk group, key molecules crucial for effector T cell recruitment. The altered expression of matrix remodeling genes (such as TIMP3) indicates significant extracellular matrix restructuring, potentially creating physical barriers to T cell infiltration [[181]73]. Changes in adhesion molecule expression (including VCAM-1 and ICAM-1) may affect T cell rolling and adhesion on vascular endothelium, influencing their migration to tumor sites [[182]74, [183]75]. The abnormal activation of angiogenesis-related genes (particularly the VEGF signaling pathway) likely contributes to aberrant tumor vasculature, impacting effective T cell infiltration [[184]76]. 
Notably, the high TP53 mutation rate in the high-risk group (62% versus 20% in the low-risk group) may influence the immune microenvironment through multiple mechanisms: mutant TP53 potentially alters cytokine and chemokine expression profiles affecting immune cell recruitment, and may modulate PD-L1 expression impacting immune checkpoint pathways [[185]77]. Our GSEA results point in the same direction: cell motility-related pathways differed significantly between the risk groups, which is compatible with the T cell exclusion phenotype. TIDE analysis revealed not only elevated T cell exclusion scores but also higher immune evasion potential in the high-risk group, suggesting multiple immunosuppressive mechanisms. These molecular insights have important therapeutic implications: high-risk patients may benefit from combination strategies rather than single-agent immune checkpoint inhibition, such as combining anti-angiogenic agents to normalize vasculature or matrix-remodeling inhibitors to enhance T cell infiltration. For patients with TP53 mutations, more personalized immunotherapy approaches may be necessary. These findings provide a theoretical foundation for developing novel therapeutic strategies and emphasize the importance of precision medicine in endometrial cancer treatment. Our study's methodological framework, integrating 117 machine learning algorithms, represents both an innovation and a challenge in prognostic modeling. The key advantage of this large-scale algorithm integration lies in its ability to capture complex, non-linear relationships within high-dimensional data that might be missed by single-algorithm approaches. By systematically evaluating combinations of algorithms, including random survival forests, elastic nets, stepwise Cox models, and various boosting methods, we can identify complementary strengths among different approaches.
For instance, while elastic nets excel at handling high-dimensional data with strong correlations, random forests better capture non-linear interactions. The StepCox[forward] + plsRcox combination emerged as optimal, suggesting that the stepwise feature selection complemented by partial least squares regression effectively balances model complexity with predictive power. However, we acknowledge several limitations in our methodological approach. The validation cohort is relatively small (50 cases), necessitating larger-scale, multicenter clinical cohorts to validate the model's stability and generalizability. While bioinformatics analyses revealed potential mechanisms of these genes, laboratory-level functional validation and mechanistic exploration are lacking. Additionally, the model incorporates limited clinical features; future studies could integrate more clinical indicators to improve prediction accuracy. Looking forward, prospective clinical studies are needed to validate the model's utility, along with in vitro and in vivo experiments to explore the specific molecular mechanisms of these programmed cell death-related genes in endometrial cancer development. Incorporating emerging technologies like single-cell sequencing could further elucidate these genes' regulatory networks within the tumor microenvironment. Such in-depth research will provide a more solid theoretical foundation for precision diagnosis and treatment of endometrial cancer. The clinical implications of our predictive model extend beyond prognostication to potentially guide therapeutic decision-making, particularly in immunotherapy. Our risk stratification system, combined with immune microenvironment characterization, offers several practical advantages for treatment selection. For high-risk patients showing elevated T cell exclusion scores and altered immune profiles, single-agent checkpoint inhibition may be insufficient. 
Instead, these patients might benefit from more aggressive combination approaches: for example, combining anti-PD-1/PD-L1 therapy with anti-VEGF agents to normalize tumor vasculature and enhance T cell infiltration. Conversely, low-risk patients with favorable immune profiles might achieve adequate responses with standard immunotherapy regimens. Additionally, the model's incorporation of TP53 mutation status could help identify patients who might benefit from specific immunotherapy combinations targeting p53-related immune evasion mechanisms. This personalized approach to immunotherapy selection, based on both risk score and immune profile, could potentially improve response rates and patient outcomes while avoiding unnecessary treatment toxicity in low-risk patients. Furthermore, our model could be valuable in clinical trial design, helping to stratify patients and identify those most likely to benefit from novel immunotherapy combinations or emerging therapeutic strategies.

Conclusion

This study developed a novel prognostic risk model for endometrial cancer by integrating 117 machine learning algorithms and focusing on programmed cell death-related genes, revealing significant insights into disease progression. The model demonstrates robust predictive performance across training and validation datasets, effectively stratifying patients into high- and low-risk groups with distinct survival outcomes, immune microenvironment characteristics, and potential therapeutic responses. By integrating multi-dimensional data analysis, including gene expression, molecular pathways, immune infiltration, and mutation profiles, the research provides a comprehensive framework for personalized prognosis and treatment strategy, highlighting the potential of computational approaches in advancing precision oncology for endometrial cancer.
Author contributions

TC and YY contributed equally to this work as co-first authors and were responsible for conceptualization, methodology, data analysis, and original draft writing. ZH and FP conducted data curation, software implementation, and validation. ZX and KG performed statistical analysis and figure preparation. WH and LX contributed to literature review and manuscript revision. XL and CF contributed equally as co-corresponding authors, supervised the project, acquired funding, and reviewed the final manuscript. All authors read and approved the final manuscript.

Funding

This work was supported by the State Key Laboratory of Ultrasound in Medicine and Engineering (Grant No.: 2022KFKT012) awarded to Prof. Yiqun Zhang for the project "Investigation on the efficacy and safety of different GnRH-a pretreatment protocols for FUAS in the treatment of adenomyosis" and the 2024 Shiyan City Guidance Scientific Research Project (Grant No.: 24Y083) awarded to the corresponding author, Caiyun Fang, for the project "Expression of IC3 cells in cervical lesions and analysis of related immune functions."

Data availability

All datasets utilized in the study are publicly available. The data analysis intermediate files are available from the corresponding author upon reasonable request.

Declarations

Ethics approval and consent to participate: Not applicable.

Consent for publication: Not applicable.

Competing interests: The authors declare no competing interests.

Footnotes

Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Tianshu Chen and Yuhan Yang are co-first authors.

Contributor Information

Xueqin Liu, Email: 376973042@qq.com.

Caiyun Fang, Email: 864876634@qq.com.

References