Abstract

Background

   Endometrial cancer represents a significant health challenge, with
   rising incidence and difficult prognostic assessment. This study aimed to
   develop a robust predictive model integrating programmed cell
   death-related genes and advanced machine learning techniques.

Methods

   Utilizing transcriptomic data from the TCGA-UCEC and GSE119041
   datasets, we evaluated 117 combinations of machine learning
   algorithms. Key methodologies included differential gene
   expression analysis, weighted gene co-expression network analysis,
   functional enrichment studies, immune landscape evaluation, and
   multi-dimensional risk stratification.

Results

   We identified 10 critical genes (PTGIS, TIMP3, SRPX, SNCA, HIC1, BAK1,
   STXBP2, TRIB3, RTKN2, E2F1) and constructed a prognostic model with
   superior predictive performance. The StepCox[forward] + plsRcox
   algorithm combination demonstrated excellent predictive accuracy
   (AUC > 0.8). Kaplan–Meier analysis revealed significant survival
   differences between high- and low-risk groups in both training
   (HR = 3.37, p < 0.001) and validation cohorts (HR = 2.05, p = 0.021).
   The model showed strong correlations with clinical characteristics,
   immune cell infiltration patterns, and potential therapeutic responses.

Conclusions

   This study presents a novel, comprehensive approach to endometrial
   cancer prognosis, integrating machine learning and molecular insights
   to provide a more precise risk stratification tool with potential
   clinical translation.

   Keywords: Endometrial cancer, Prognostic modeling, Programmed cell
   death, Machine learning, Precision oncology

Introduction

   Endometrial cancer is the sixth most common malignancy among women
   globally and the fourth most common gynecological cancer. Risk factors
   include obesity, diabetes, hypertension, hormonal imbalances, and
   genetic predisposition. The disease is typically diagnosed in
   postmenopausal women, with a median age at diagnosis of around 60 years
   [1, 2]. Endometrial cancer represents a significant health
   burden for women, and its incidence continues to rise. However, the
   management of endometrial cancer faces several critical challenges.
   Current diagnostic approaches have limited ability to accurately
   identify high-risk patients at initial presentation, leading to
   suboptimal treatment selection. Additionally, the heterogeneous
   response to immunotherapy and other targeted treatments highlights the
   need for more precise patient stratification tools. These challenges
   underscore the urgent need for improved prognostic models. Although
   prognostic prediction models still require refinement, significant
   advances have been achieved through various innovative models in recent
   years. Machine learning models incorporating immune cells and molecular
   markers have demonstrated 69% accuracy in recurrence prediction [3].
   Predictive models based on hysteroscopic data have shown high
   sensitivity and specificity, and MRI radiomics models have proven
   superior to traditional assessments in early diagnosis and staging
   [4]. Multi-omics studies and immune-related scoring models have
   provided novel perspectives for prognostic prediction [5, 6], and
   inflammation- and hypoxia-related studies, along with inflammatory
   markers, have shown significant prognostic value in high-grade
   endometrial cancer [7, 8]. While these
   models show promise in improving endometrial cancer prognostic
   prediction, challenges persist. The integration of diverse data types
   requires sophisticated computational tools and validation in larger,
   more diverse patient populations. Additionally, translating these
   models into clinical practice necessitates collaboration among
   researchers, clinicians, and healthcare systems to ensure accessibility
   and applicability. As research progresses, these models may pave the
   way for more personalized and effective management strategies for
   endometrial cancer patients.

   Programmed cell death (PCD) is a crucial biological process that plays
   a vital role in tumor development and clinical prognosis. It
   encompasses multiple modalities including apoptosis, autophagy,
   pyroptosis, and ferroptosis, each contributing differently to cancer
   progression and treatment outcomes. The expression and regulation of
   PCD-related genes are essential for cancer development, and their
   dysregulation may lead to uncontrolled cell proliferation and tumor
   progression [9–12]. Several studies have developed prognostic models
   based on PCD-related genes. For instance, in colorectal cancer, a risk
   score combining genes such as FABP4, AQP8, and NAT1 has been shown to
   effectively predict patient prognosis [13], while in hepatocellular
   carcinoma, PCD-related gene-based prognostic models identified subtypes
   with distinct prognostic outcomes [14]. Pan-cancer analysis further
   identified a gene signature that could distinguish patients with
   unfavorable prognosis [12]. Within the tumor microenvironment (TME),
   PCD influences immune cell infiltration and immune checkpoint
   expression, and high-risk patients typically show altered immune
   landscapes that may affect their response to immunotherapy [15, 16].
   PCD-related gene expression can also predict sensitivity to certain
   drugs; for example, in colorectal cancer, patients with high risk
   scores showed reduced response to immunotherapy and first-line
   clinical drugs [13]. In clinical applications, therapeutic strategies
   targeting PCD pathways are being explored to enhance cancer treatment
   efficacy [15]. Additionally, PCD-related genes can serve as potential
   biomarkers for cancer diagnosis and prognosis; for example, SERPINE1
   and G6PD have been identified as important prognostic markers in
   hepatocellular carcinoma [17]. Although PCD plays a crucial role in
   cancer development and prognosis, its mechanisms of action vary
   considerably across cancer types. The interactions between PCD and the
   TME, together with genetic and epigenetic factors, add further
   complexity. The TME influences PCD through several mechanisms: its
   hypoxic, acidic, and nutrient-poor conditions can modulate cell death
   pathways, while stromal and immune cells can either promote or inhibit
   PCD depending on the context [18]. These processes are further
   regulated by genetic and epigenetic alterations, including DNA
   methylation and histone modifications, which affect both PCD-related
   gene expression and immune responses [18, 19]. Understanding these
   complex interactions is essential for developing effective therapeutic
   strategies, as they significantly impact treatment response and
   resistance mechanisms [20]. Future studies should focus on unraveling
   these complexities to develop more precise and effective therapeutic
   strategies.

   Although prognostic prediction models have been continuously improved,
   current predictive accuracy remains inadequate, particularly in
   assessing tumor recurrence and metastasis risk. While programmed cell
   death-related genes have been proven valuable for prognostic evaluation
   in various tumors, the potential application in endometrial cancer
   prognosis prediction has not been fully explored. Furthermore, existing
   studies predominantly employ single-dimensional data analysis, lacking
   comprehensive analytical approaches that integrate multi-dimensional
   information from genomics, transcriptomics, and clinicopathological
   features. This study addresses this gap by developing a prognostic
   risk model focused on programmed cell death-related genes, built by
   evaluating 117 combinations of machine learning algorithms on data
   from the TCGA and GEO databases. Through comprehensive analyses
   including differential gene expression, weighted gene co-expression
   network analysis, and immune landscape evaluation, the study aims to
   create a robust predictive tool that integrates risk scores, clinical
   stage, and age, providing a more nuanced view of endometrial cancer
   progression and potentially guiding personalized treatment
   strategies.

Methods

Data collection and processing

   RNA sequencing data were downloaded from the TCGA-UCEC project, which
   contained transcriptomic profiles from 550 tumor tissue samples and 35
   healthy control samples, along with corresponding clinical and survival
   information. For external validation, the GSE119041 dataset was
   obtained from the GEO database, comprising gene expression microarray
   data from 50 UCEC tumor samples analyzed on the GPL570 platform.
   Additionally, 1254 programmed cell death-related genes were identified
   through comprehensive literature review [21].

   Quality control procedures were systematically applied to the raw data.
   For missing value processing, we excluded samples with > 20% missing
   values; remaining missing values were imputed using the k-nearest
   neighbor method (k = 10). Outliers were identified using the
   interquartile range (IQR) method, where values beyond
   Q1 − 1.5 × IQR or Q3 + 1.5 × IQR were
   flagged and verified. Expression values underwent log2 transformation
   and quantile normalization to ensure comparability across samples.
   Batch effects between different sequencing platforms were corrected
   using the ComBat algorithm. Sample size adequacy was verified through
   power analysis, indicating that our cohort size would provide 90% power
   to detect a hazard ratio of 1.5 at a significance level of 0.05.
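
   The quality-control steps described above can be sketched in plain
   Python. This is a minimal illustration rather than the pipeline used
   in the study: the function names, the pseudocount of 1 in the log2
   transform, and the order-based handling of ties in quantile
   normalization are assumptions made for the example.

```python
import math
from statistics import quantiles


def iqr_outlier_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


def log2_transform(matrix, pseudocount=1.0):
    """log2(x + pseudocount) for every entry (pseudocount is an assumption)."""
    return [[math.log2(v + pseudocount) for v in row] for row in matrix]


def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix (list of rows).

    Every sample (column) is mapped onto the same reference distribution,
    the mean of the sorted columns. Ties are broken by position, which is
    a simplification of what dedicated packages do.
    """
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[matrix[g][s] for g in range(n_genes)] for s in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[r] for col in sorted_cols) / n_samples for r in range(n_genes)
    ]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for s, col in enumerate(columns):
        order = sorted(range(n_genes), key=lambda g: col[g])
        for rank, g in enumerate(order):
            normalized[g][s] = rank_means[rank]
    return normalized
```

   After normalization every sample column contains the same set of
   values, which is what makes expression levels comparable across
   samples.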

Inclusion and exclusion criteria

   For TCGA-UCEC cohort selection, the following inclusion criteria were
   applied: (1) primary endometrial cancer samples with complete RNA
   sequencing data; (2) samples with complete clinical information
   including survival time, survival status, age, and tumor stage; (3)
   patients with follow-up time > 30 days. Exclusion criteria included:
   (1) samples with missing key clinical parameters; (2) patients without
   clear survival status; (3) samples with low RNA sequencing quality (RNA
   integrity number < 7); and (4) patients lost to follow-up within
   30 days. For the GSE119041 validation cohort, similar criteria were
   applied: included samples required complete gene expression data,
   survival information, and clinical parameters, while samples with
   incomplete follow-up data or missing key variables were excluded. After
   applying these criteria, 550 tumor samples and 35 normal samples were
   included from TCGA-UCEC, and 50 tumor samples were retained from
   GSE119041 for subsequent analysis.

Differential gene expression analysis

   Differential expression analysis was performed using the limma package
   in R. Raw expression matrices were subjected to log2 transformation and
   quantile normalization. A linear modeling approach was implemented to
   identify differentially expressed genes between tumor and normal
   samples. Statistical significance was determined using thresholds of
   |log2FC| ≥ 1 and p-value < 0.05. Multiple testing correction was applied
   using the FDR method to control false discovery rates. Expression
   patterns were visualized through volcano plots and hierarchical
   clustering heatmaps. The biological relevance of identified genes was
   evaluated through comparison with known disease-associated genes.
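
   The thresholding step can be illustrated with a short sketch of the
   Benjamini-Hochberg procedure followed by the fold-change filter. The
   text does not state whether the 0.05 cutoff was applied to raw or
   FDR-adjusted p-values; applying it to adjusted values here is an
   assumption, as are the function names. The study itself used limma in
   R, which this sketch does not reproduce.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2fc, pvalues, fc_cutoff=1.0, alpha=0.05):
    """Split genes into up- and downregulated sets.

    Assumption: the significance cutoff is applied to adjusted p-values.
    """
    adjusted = benjamini_hochberg(pvalues)
    up, down = [], []
    for gene, fc, q in zip(genes, log2fc, adjusted):
        if q >= alpha:
            continue
        if fc >= fc_cutoff:
            up.append(gene)
        elif fc <= -fc_cutoff:
            down.append(gene)
    return up, down
```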

Co-expression network analysis

   Gene co-expression networks were constructed using the WGCNA approach.
   Low-quality genes were filtered based on expression level and variance.
   A soft-thresholding power was selected by analyzing the scale-free
   topology fit index across various thresholds. Gene modules were
   identified through hierarchical clustering and dynamic tree cutting,
   with a minimum module size of 100 genes enforced. Module-trait
   relationships were quantified using module eigengenes, and significant
   associations were determined (correlation coefficient > 0.3, p < 0.05).
   Intramodular connectivity and gene significance measures were
   calculated to identify hub genes within each module.
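
   The role of the soft-thresholding power can be seen in a small sketch
   that raises absolute pairwise correlations to a power β and sums them
   into a per-gene connectivity. This assumes an unsigned network built
   on Pearson correlation, which the text does not specify, and it omits
   the topological overlap and tree-cutting steps that WGCNA performs.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def soft_threshold_adjacency(expr, beta):
    """Unsigned adjacency a_ij = |cor(i, j)| ** beta.

    expr maps a gene name to its expression values across samples.
    """
    genes = list(expr)
    adjacency = {g: {} for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a = abs(pearson(expr[gi], expr[gj])) ** beta
            adjacency[gi][gj] = a
            adjacency[gj][gi] = a
    return adjacency


def connectivity(adjacency):
    """Sum of adjacencies to all other genes, per gene."""
    return {g: sum(nbrs.values()) for g, nbrs in adjacency.items()}
```

   Raising β suppresses weak correlations much more than strong ones, so
   mean connectivity falls as the power increases while strongly
   co-expressed pairs remain connected.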

Candidate gene selection and functional analysis

   A rigorous multi-step approach was employed to identify candidate
   genes. Differentially expressed genes were intersected with WGCNA key
   module genes and cross-referenced against programmed cell death-related
   genes. Protein–protein interaction networks were constructed using the
   STRING database with a confidence threshold of 0.4. Network topology
   metrics, including degree centrality and betweenness centrality, were
   computed for each node. Functional characterization was performed
   through GO and KEGG pathway enrichment analyses using the
   clusterProfiler package (adj.p < 0.05). Expression patterns across
   different tumor stages and grades were analyzed to evaluate biomarker
   potential.
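
   The candidate selection logic amounts to a three-way set intersection
   followed by simple network statistics. The sketch below shows the
   intersection and degree centrality only; the helper names and the toy
   inputs in the usage note are hypothetical, and betweenness centrality
   is left out for brevity.

```python
def intersect_candidates(degs, module_genes, pcd_genes):
    """Genes present in all three lists, returned in sorted order."""
    return sorted(set(degs) & set(module_genes) & set(pcd_genes))


def degree_centrality(edges, nodes):
    """Degree of each node divided by (n - 1) in an undirected graph."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        if a in neighbors and b in neighbors and a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    denom = max(len(nodes) - 1, 1)
    return {n: len(v) / denom for n, v in neighbors.items()}
```

   For example, `intersect_candidates(["BAK1", "E2F1", "X"], ["BAK1",
   "E2F1", "Y"], ["E2F1", "BAK1", "Z"])` keeps only the two genes shared
   by all three toy lists.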

Machine learning model construction

   Multiple machine learning algorithms were integrated to develop a
   robust prognostic model. Candidate genes were initially screened
   through univariate Cox regression analysis (p < 0.05). Ten classical
   algorithms were implemented, including Random Survival Forest (RSF),
   Elastic Net (Enet), Stepwise Cox (StepCox), CoxBoost, Partial Least
   Squares Regression Cox (plsRcox), Supervised Principal Components
   (superpc), Generalized Boosted Regression Model (GBM), Survival Support
   Vector Machine (survivalsvm), Ridge, and Lasso regression. These
   algorithms were systematically combined into 117 different modeling
   strategies to leverage their complementary strengths. Model performance
   was evaluated through K-fold cross-validation, with metrics including
   C-index and calibration curves being calculated. Sensitivity analyses
   were conducted to assess model robustness under varying parameter
   settings.
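
   As an illustration of the evaluation metric, the following sketch
   computes a concordance index for right-censored survival data. It
   assumes Harrell's definition, which the text does not name
   explicitly, and it ignores pairs with tied survival times; the
   function name is chosen for the example.

```python
def concordance_index(times, events, risk_scores):
    """Harrell-style C-index for right-censored data.

    A pair (i, j) is comparable when subject i had an observed event and
    subject j was still under observation at a later time. The pair is
    concordant when the subject with the earlier event has the higher
    risk score; tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

   A value of 0.5 corresponds to random ranking and 1.0 to perfect
   ranking of patients by risk.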

Model validation and clinical assessment

   The prognostic model was subjected to comprehensive validation
   procedures. Patient cohorts in both training and validation sets were
   stratified into high- and low-risk groups based on calculated risk
   scores. Survival differences between groups were assessed using
   Kaplan–Meier analysis, examining both overall survival and
   progression-free survival outcomes. The relationship between risk
   scores and clinical parameters was investigated through detailed
   statistical analyses. Univariate Cox regression was performed to
   evaluate the prognostic impact of clinical features. Significant
   variables (p < 0.05) were tested for proportional hazards assumptions
   and incorporated into multivariate analyses. Nomograms were constructed
   to predict 3-, 5-, and 7-year survival probabilities, with predictive
   accuracy evaluated through calibration curves.
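
   Risk stratification and the Kaplan–Meier estimate can be sketched as
   follows. Splitting at the median risk score is an assumption for this
   step (the text states a median split only for the pathway analysis),
   and the function names are illustrative.

```python
from statistics import median


def split_by_median(risk_scores):
    """Label each score 'high' if above the cohort median, else 'low'."""
    cutoff = median(risk_scores)
    return ["high" if r > cutoff else "low" for r in risk_scores]


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    Returns (time, survival probability) pairs at each distinct time with
    at least one observed event. Censored subjects leave the risk set
    without changing the estimate.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

   Computing one curve per risk group and comparing them is the basis of
   the survival comparison described above.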

Molecular mechanism investigation

   Gene Set Enrichment Analysis (GSEA) was performed for individual
   prognostic biomarkers. Correlation analysis and ranking procedures were
   applied to gene expression profiles. Human KEGG pathway gene sets were
   utilized as reference databases. Samples were dichotomized based on
   median risk scores, and differential pathway enrichment was analyzed
   between high- and low-risk groups. Enrichment analyses were conducted
   for GO terms and KEGG pathways, with significance threshold set at
   adj.p < 0.05. Results were visualized through appropriate graphical
   representations to illustrate biological implications.

Immune landscape and therapeutic prediction

   The tumor immune microenvironment was systematically evaluated using
   the xCell algorithm. Immune cell infiltration patterns were quantified
   and analyzed for significant enrichment (P < 0.05). The relationship
   between immune cell populations and risk scores was assessed through
   Spearman correlation analysis. Somatic mutation profiles were analyzed
   using the maftools package, focusing on differences between risk
   groups. Immunotherapy response prediction was performed using the TIDE
   computational framework, incorporating T cell dysfunction and exclusion
   scores. Drug sensitivity was predicted through integration with the
   GDSC database, with IC50 values calculated and correlated with risk
   scores to guide potential therapeutic strategies.
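
   The Spearman correlation used for these associations is the Pearson
   correlation of ranks. A minimal sketch with average ranks for ties is
   shown below; the function names are illustrative, and no p-value is
   computed.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (inputs must not be constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

   Because only ranks enter the calculation, the measure is insensitive
   to monotone transformations such as taking the logarithm of IC50
   values.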

Results

Identification of prognostic markers and WGCNA analysis in endometrial cancer

   A total of 4300 differentially expressed genes were identified,
   including 1930 upregulated and 2370 downregulated genes. Figure 1A
   shows the volcano plot of differential gene expression, with
   significantly upregulated genes in red and downregulated genes in blue.
   Figure 1B presents the heatmap of differentially expressed genes,
   demonstrating distinct expression patterns between tumor and
   normal samples. For WGCNA, samples were first clustered to
   detect outliers (Fig. 1C). Figure 1D shows the analysis of
   network topology for different soft-thresholding powers. The left panel
   displays the scale-free fit index (y-axis) versus soft-thresholding
   power (x-axis), with values ranging from 1 to 30. The power β = 10
   was selected because it was the lowest power at which the scale-free
   topology fit index curve flattened out upon reaching a high value
   (> 0.8). The right panel shows the mean connectivity (y-axis) versus
   soft-thresholding power (x-axis), demonstrating how connectivity
   decreases as the soft threshold increases. The gene co-expression
   network was constructed, revealing 10 distinct modules (Fig. 1E).
   The dendrogram from dynamic tree cutting shows the hierarchical
   clustering of genes into these modules (Fig. 1F). Figure 1G
   illustrates the module-trait relationships through a heatmap, where
   each row corresponds to a module eigengene and each column to a trait
   (Case/Control). The numbers in each cell represent the correlation
   coefficient and p-value (in parentheses). Three modules showed
   significant positive correlations with the cancer phenotype: MEblue
   (r = 0.4, p < 0.05), MEbrown (r = 0.37, p < 0.05), and MEred (r = 0.32,
   p < 0.05).

Fig. 1.

   Identification of prognostic markers and WGCNA analysis in endometrial
   cancer. A Volcano plot showing differentially expressed genes between
   tumor and normal samples. Red dots represent upregulated genes, blue
   dots represent downregulated genes. B Heatmap of differentially
   expressed genes between tumor and normal samples. C Sample clustering
   dendrogram and trait heatmap to detect outliers. D Analysis of network
   topology for different soft-thresholding powers. Left panel shows
   scale-free fit index versus soft-thresholding power; right panel shows
   mean connectivity versus soft-thresholding power. E Gene clustering
   dendrogram based on topological overlap. F Dynamic tree cut results
   showing gene modules. G Module-trait relationships showing correlation
   between module eigengenes and cancer traits

Integration analysis of DEGs, WGCNA, and enrichment analysis in endometrial
cancer

   Integrative analysis identified key molecular features and pathways in
   endometrial cancer. The Venn diagram (Fig. 2A) shows the
   intersection of differentially expressed genes (DEGs), cell
   death-related genes (CDRGs), and WGCNA key module genes. In total, 65
   genes were found at the intersection of all three sets, representing
   potential key regulators. The protein–protein interaction (PPI) network
   of these 65 candidate genes (Fig. 2B) was constructed using the STRING
   database with a confidence score > 0.4, revealing complex interactions
   among these molecules.

Fig. 2.

   Integration analysis of DEGs, WGCNA, and enrichment analysis. A Venn
   diagram showing overlaps between DEGs, CDRGs, and WGCNA results. B PPI
   network of 65 overlapping genes constructed using STRING database. C GO
   enrichment analysis results showing biological processes, cellular
   components, and molecular functions. D KEGG pathway analysis results
   showing enriched cancer-related pathways

   The GO enrichment analysis (Fig. 2C) revealed multiple
   significantly enriched biological processes, cellular components, and
   molecular functions. The biological processes (BP) were primarily
   enriched in apoptotic signaling pathway regulation (including positive
   and negative regulation), transport regulation, and autophagy
   regulation. Notably, mitochondrial organization and extrinsic apoptotic
   signaling pathway showed the highest enrichment scores. For cellular
   components (CC), the analysis highlighted membrane-related structures
   including membrane raft, caveola, plasma membrane, and clathrin-coated
   vesicle membrane. The molecular functions (MF) showed significant
   enrichment in protein binding activities, particularly heat shock
   protein binding, lyase activity, and peptidase regulator activity.

   The KEGG pathway analysis (Fig. 2D) identified five major
   cancer-related pathways, illustrated in a circular layout. The central
   node represents the intersection of all pathways, with individual
   pathways radiating outward. The most significantly enriched pathways
   included proteoglycans in cancer (containing genes like ERBB3, F2R, and
   HGF), calcium signaling pathway (including ITPR1, KDR, and NGF),
   melanoma pathway, bladder cancer pathway, and microRNAs in cancer.

Performance evaluation of machine learning models for prognostic prediction
in endometrial cancer

   A comprehensive evaluation of 117 machine learning combinations was
   conducted to develop an optimal prognostic model. Time-dependent ROC
   analysis at 3-year (Fig. 3A), 5-year (Fig. 3B), and 7-year
   (Fig. 3C) intervals demonstrated model performance across different
   time points. The heatmap visualization shows AUC values for each
   algorithm combination, with Dataset1 (training) and Dataset2
   (validation) performance displayed in green and blue bars respectively.
   The StepCox[forward] + plsRcox combination consistently achieved
   superior performance, with the highest AUC values across all time
   points (AUC > 0.8).

Fig. 3.

   Machine learning model performance and survival analysis. A–C ROC curve
   analysis results for 3-year, 5-year, and 7-year survival predictions
   across 117 algorithm combinations. D Kaplan–Meier survival curves for
   high- and low-risk groups in training (left) and validation (right)
   cohorts

   Kaplan–Meier survival analysis (Fig. 3D) further validated the
   model’s prognostic value. In the training cohort (n = 546, left panel),
   the high-risk group showed significantly worse survival compared to the
   low-risk group (HR = 3.37, 95% CI 2.24–5.07, p < 0.001). This finding
   was independently validated in Dataset2 (n = 50, right panel), where
   the high-risk group maintained significantly poorer survival outcomes
   (HR = 2.05, 95% CI 1.08–3.87, p = 0.021). The survival curves
   demonstrated clear stratification between risk groups, with the
   separation particularly pronounced in the first 2000 days. The shaded
   areas represent 95% confidence intervals, and the dotted lines indicate
   median survival times.

Clinical relevance and prognostic model validation in endometrial cancer

   Boxplot analysis (Fig. 4A) revealed significant associations
   between risk scores and clinical characteristics. Risk scores were
   significantly higher in patients aged > 60 years (p < 0.0001).
   Similarly, advanced tumor stages (III–IV) showed progressively higher
   risk scores compared to early stages (I–II), with significant
   inter-stage differences (p < 0.05).

Fig. 4.

   Clinical relevance and model validation. A Boxplots showing risk score
   distribution by age and stage. B Forest plots of univariate and
   multivariate Cox regression analyses. C Nomogram for predicting 3-, 5-,
   and 7-year survival probability. D Calibration curves for
   nomogram-predicted survival. E Decision curve analysis at 3-, 5-, and
   7-year time points

   Univariate and multivariate Cox regression analyses (Fig. 4B)
   identified three independent prognostic factors: stage (HR = 1.61, 95%
   CI 1.5–2.18), risk score (HR = 2.04, 95% CI 1.44–2.87), and age
   (HR = 1.03, 95% CI 1.01–1.05). The nomogram (Fig. 4C) integrated
   these factors to predict 3-, 5-, and 7-year survival probabilities. The
   calibration curves (Fig. 4D) demonstrated excellent agreement
   between predicted and observed outcomes (C-index = 0.78).

   Decision curve analysis (DCA) at 3-, 5-, and 7-year time points
   (Fig. 4E) evaluated the clinical utility of the nomogram compared
   to individual factors. The nomogram consistently showed superior net
   benefit across a wide range of threshold probabilities, outperforming
   both single predictors and treat-all/treat-none strategies. This
   advantage was particularly pronounced at threshold probabilities
   between 0.2 and 0.6, indicating optimal clinical applicability in this
   range.
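
   The net benefit plotted in a decision curve can be written down
   directly. The sketch below treats the outcome as binary at a fixed
   time horizon and ignores censoring, which is a simplification of what
   a survival-based decision curve analysis would require; the function
   names are illustrative.

```python
def net_benefit(predicted_risk, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold), with 0 < threshold < 1.
    """
    n = len(outcomes)
    tp = sum(1 for p, y in zip(predicted_risk, outcomes) if p >= threshold and y)
    fp = sum(1 for p, y in zip(predicted_risk, outcomes) if p >= threshold and not y)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: treat every patient regardless of the model."""
    prevalence = sum(1 for y in outcomes if y) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

   The treat-none strategy has a net benefit of zero by definition, so a
   model is useful at a given threshold only when its net benefit
   exceeds both zero and the treat-all value.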

Single-gene GSEA pathway analysis of key prognostic genes in endometrial
cancer

   GSEA revealed distinct pathway enrichment patterns for each prognostic
   gene (Fig. 5). PTGIS was enriched in focal adhesion, ECM-receptor
   interaction, calcium signaling, muscle contraction, and neuroactive
   ligand-receptor interaction pathways, with peak enrichment scores
   around 0.6. TRIB3 showed significant association with cell cycle
   regulation, spliceosome function, RNA degradation, DNA replication,
   and pyrimidine metabolism, with highest enrichment scores of
   approximately 0.5. TIMP3 was enriched in focal adhesion, ECM-receptor
   interaction, muscle contraction, and calcium signaling pathways, with
   enrichment scores reaching 0.6. STXBP2 exhibited strong enrichment in
   immune-related pathways, including antigen processing/presentation,
   graft-versus-host disease, autoimmune disease, and allograft
   rejection, as well as ribosome function, showing both positive and
   negative enrichment patterns. SRPX showed significant enrichment in
   focal adhesion, ECM-receptor interaction, axon guidance, cardiac
   development, and melanogenesis pathways. BAK1 was primarily associated
   with spliceosome, cell cycle, proteasome, aminoacyl-tRNA biosynthesis,
   and RNA degradation pathways. SNCA demonstrated enrichment in focal
   adhesion, ECM-receptor interaction, MAPK signaling, melanogenesis, and
   gap junction pathways. RTKN2 showed strong association with
   spliceosome, ubiquitin-mediated proteolysis, RNA degradation, and
   protein export pathways. HIC1 exhibited significant enrichment in
   focal adhesion, neuroactive ligand-receptor interaction, oxidative
   phosphorylation, and autoimmune disease pathways, with some showing
   negative enrichment patterns. E2F1 was enriched in cell cycle,
   spliceosome, complement/coagulation cascades, RNA splicing, and
   proteasome pathways, with distinct positive and negative enrichment
   patterns. Each gene's enrichment analysis showed statistical
   significance (FDR < 0.05), with enrichment curves showing
   gene-specific patterns and peak enrichment scores ranging from 0.4 to
   0.8.

Fig. 5.

   Single-gene GSEA analysis of key prognostic genes. GSEA results for 10
   prognostic genes (PTGIS, TRIB3, TIMP3, STXBP2, SRPX, BAK1, SNCA, RTKN2,
   HIC1, and E2F1) showing enriched pathways and their enrichment scores

Immune cell infiltration analysis and its association with risk score in
endometrial cancer

   Analysis of immune cell infiltration patterns revealed distinct
   immunological features across risk groups. The stacked bar plot
   (Fig. 6A) displays the proportion of immune cells in each sample,
   showing heterogeneous immune cell composition across patients. A
   quantitative comparison of immune cell infiltration between risk groups
   (Fig. 6B) identified significant differences in multiple immune
   cell populations, with particularly notable variations in CD8+ T
   cells, M1 macrophages, and dendritic cells. The correlation heatmap
   (Fig. 6C) illustrates relationships between key prognostic genes
   and immune cell populations. Strong positive correlations were observed
   between certain genes (e.g., TIMP3, SRPX) and immune cells such as M2
   macrophages and CD4+ memory T cells, while negative correlations were
   found with cells such as neutrophils and activated NK cells. Further
   analysis (Fig. 6D) revealed significant correlations between risk
   scores and key immune parameters: Exclusion score (R = 0.38,
   p < 2.2e−16), Dysfunction score (R = −0.31, p = 1.1e−13), and TIDE
   score (R = 0.22, p = 1.4e−07). Higher risk scores were associated with
   increased immune exclusion and TIDE scores but decreased dysfunction
   scores, suggesting that high-risk patients might have compromised
   immune surveillance and potentially different responses to
   immunotherapy.

Fig. 6.

   Immune cell infiltration analysis. A Bar plot showing immune cell
   composition in individual samples. B Box plots comparing immune cell
   proportions between risk groups. C Correlation heatmap between key
   genes and immune cell types. D Correlation plots between risk scores
   and immune parameters (Exclusion, Dysfunction, and TIDE scores)

Somatic mutation and pathway enrichment analysis in high- and low-risk groups

   Somatic mutation analysis (Fig. 7A) revealed distinct mutational
   patterns. The low-risk group showed predominant mutations in PTEN
   (86%), ARID1A (56%), PIK3CA (54%), TTN (40%), PIK3R1 (37%), CTNNB1
   (34%), CTCF (33%), KMT2D (29%), ZFHX3 (25%), CSMD3 (24%), MUC16 (24%),
   OBSCN (24%), RYR2 (22%), FAT1 (22%), and MACF1 (22%). In contrast, the
   high-risk group was characterized by frequent mutations in TP53 (62%),
   PIK3CA (44%), PTEN (41%), ARID1A (35%), TTN (34%), KMT2D (27%), MUC16
   (25%), PPP2R1A (22%), FAT1 (21%), and ZFHX4 (20%).

Fig. 7.

   Mutation landscape and pathway analysis in risk groups. A Oncoplot
   showing mutation profiles in low-risk (left) and high-risk (right)
   groups. B GSEA results showing enriched GO terms and KEGG pathways
   between risk groups. C Correlation plots between risk scores and drug
   sensitivity (IC50 values)

   GSEA demonstrated significant downregulation of cellular
   movement-related pathways in the high-risk group, including axoneme
   assembly, microtubule bundle formation, cilium organization, and cell
   motility processes (Fig. 7B). KEGG pathway analysis revealed
   enrichment in the low-risk group of neuroactive ligand-receptor
   interaction and cell cycle regulation, together with immune-related
   pathways such as antigen processing, cytokine interactions, and
   complement cascades (Fig. 7B).

   Drug sensitivity analysis revealed significant negative correlations
   between risk scores and IC50 values for several compounds (p < 0.0001):
   MG-132 (R = −0.32, p = 9.9e−15), UMI-77 (R = −0.33, p = 8.4e−15),
   Sepantronium bromide (R = −0.53, p < 2.2e−16), and WEHI-539
   (R = −0.54, p < 2.2e−16), suggesting potential therapeutic
   implications for high-risk patients (Fig. [122]7C).
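
   To illustrate what a negative correlation between risk score and
   IC50 means, the sketch below computes a correlation coefficient in
   plain Python. The text does not state which coefficient R denotes;
   Pearson's r is assumed here, and the risk scores and IC50 values are
   hypothetical toy numbers, not study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values: higher risk score paired with lower predicted IC50.
risk_scores = [0.2, 0.5, 0.9, 1.4, 2.0]
ic50_values = [5.1, 4.8, 4.0, 3.5, 2.9]

r = pearson_r(risk_scores, ic50_values)
print(f"R = {r:.2f}")  # negative: IC50 falls as the risk score rises
```

   A negative R indicates that samples with higher risk scores tend to
   have lower predicted IC50 values, that is, greater predicted
   sensitivity to the compound.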

Discussion

   Endometrial cancer is a malignant tumor that seriously threatens
   women's health, and despite advances in traditional treatments such as
   surgery, radiotherapy, and chemotherapy, patient prognosis remains
   unsatisfactory [[123]22]. Therefore, developing more accurate
   prognostic prediction tools is crucial for improving patient outcomes.
   Machine learning (ML) approaches can overcome limitations of
   traditional statistical methods in handling high-dimensional data and
   complex biological relationships. Traditional approaches are poorly
   suited to processing and integrating multi-omics datasets
   [[124]23], and complex biological interactions and nonlinear
   relationships often lie beyond their analytical scope [[125]24].
   In contrast, ML models have demonstrated unique advantages in
   efficiently processing large, complex datasets [[126]23] and
   integrating multi-dimensional information for comprehensive assessment
   [[127]25]. This study integrated data from the TCGA and GEO databases
   and evaluated 117 machine learning algorithm combinations to construct,
   to our knowledge for the first time, a prognostic risk model based on
   10 programmed cell death-related
   genes (PTGIS, TIMP3, SRPX, SNCA, HIC1, BAK1, STXBP2, TRIB3, RTKN2,
   E2F1). This model not only demonstrated good predictive performance in
   both training and validation sets but also effectively stratified
   patients into high and low-risk groups, with significant differences in
   survival outcomes between the groups. Notably, the model showed
   significant correlation with clinicopathological features (such as
   tumor stage and age), suggesting its potential clinical application
   value in disease progression assessment.

   With advances in bioinformatics, a growing number of computational
   methods are being applied to tumor prognosis prediction. Earlier
   studies primarily used single machine learning algorithms such as
   LASSO regression [[128]26] or random forests [[129]27] to construct
   prognostic models. In comparison, this study systematically evaluated
   117 machine learning algorithm combinations for model selection, a
   strategy intended to improve prediction accuracy and the reliability
   of the model in clinical application. In molecular feature research,
   previous studies have separately explored immune microenvironment
   [[130]28], mutation spectrum [[131]29], and drug sensitivity [[132]30]
   characteristics, but lacked systematic integrated analysis. Through
   multi-dimensional data integration, this study not only revealed
   significant differences between high and low-risk groups in these
   aspects but also provided new insights for individualized treatment.
   Particularly, approaching from the perspective of programmed cell death
   aligns with recent research emphasizing its importance in tumor
   progression [[133]9, [134]11, [135]31, [136]32]. Furthermore, the
   observed elevated T cell exclusion score in the high-risk group
   corresponds with recent findings on immune escape mechanisms [[137]33].
   Considering recent advances in tumor heterogeneity research, especially
   the advantages of single-cell sequencing technology in revealing tumor
   microenvironment complexity [[138]34–[139]36], future research could
   combine single-cell transcriptomics and spatial transcriptomics
   technologies [[140]37] to deeply explore the expression patterns of
   programmed cell death-related genes in different cell types and their
   interactions with the immune microenvironment, which will contribute to
   a more comprehensive understanding of disease progression mechanisms
   and provide new targets for precision therapy.

   In the construction of the prognostic model, this study employed
   multi-level screening strategies and validation methods. Compared to
   traditional research methods of screening candidate genes through
   differential expression analysis [[141]38], or studies focusing on
   specific pathway-related genes [[142]39], this study integrated
   multi-dimensional screening including WGCNA network analysis,
   differential expression analysis, and programmed cell death-related
   genes, ultimately identifying 10 candidate genes: PTGIS, TIMP3, SRPX,
   SNCA, HIC1, BAK1, STXBP2, TRIB3, RTKN2, and E2F1. This multi-level
   screening strategy enhances gene selection reliability and aligns with
   recent concepts of multi-omics integration analysis [[143]40, [144]41].
   In model construction, the study evaluated the predictive performance
   of 117 machine learning algorithm combinations using the Mime1 package,
   including random survival forest (RSF), elastic net (Enet), stepwise
   Cox (StepCox), CoxBoost, partial least squares regression for Cox
   (plsRcox), supervised principal components (superpc), generalized
   boosted regression models (GBM), survival support vector machines
   (survivalsvm), Ridge, and least absolute shrinkage and selection
   operator (Lasso), ultimately selecting StepCox[forward] + plsRcox as
   the optimal combination. This systematic evaluation approach provides a
   different modeling perspective compared to existing prediction models
   [[145]42, [146]43]. Notably, the model demonstrated stable predictive
   performance in the independent validation set [147]GSE119041 (including
   50 UCEC samples), with cross-platform validation results aligning with
   model validation standards suggested by recent research [[148]44].
   Kaplan–Meier survival analysis showed significant differences in
   survival prognosis between high and low-risk groups (p < 0.05) in both
   training and validation sets, supporting the stability and potential
   generalizability of the prognostic model. Furthermore, the validation
   strategy based on multiple independent datasets aligns with the
   methodological framework proposed in recent publications on clinical
   prediction model evaluation [[149]45].
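
   The risk stratification and Kaplan–Meier comparison described above
   can be sketched as follows. This is an illustrative example, not the
   study's pipeline: it assumes patients are split at the median risk
   score (the cutoff is not specified in this section), and all survival
   times, event indicators, and risk scores are hypothetical.

```python
from statistics import median


def split_by_median(risk_scores):
    """Label each score 'high' if above the median, otherwise 'low'."""
    cutoff = median(risk_scores)
    return ["high" if s > cutoff else "low" for s in risk_scores]


def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, S(time)) at each event time.

    `events` uses 1 for an observed death and 0 for a censored patient.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Hypothetical cohort: follow-up (months), event flag, model risk score.
times = [5, 12, 20, 30, 34, 40, 48, 60]
events = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [2.1, 1.8, 1.5, 1.6, 0.4, 0.9, 0.3, 0.2]

groups = split_by_median(scores)
for label in ("high", "low"):
    t = [x for x, g in zip(times, groups) if g == label]
    e = [x for x, g in zip(events, groups) if g == label]
    print(label, kaplan_meier(t, e))
```

   In practice the two curves would then be compared with a log-rank
   test and a Cox model to obtain the p-value and hazard ratio; those
   steps are omitted here for brevity.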

   The ten key genes identified through the evaluation of 117 machine
   learning algorithm combinations (PTGIS, TIMP3, SRPX, SNCA, HIC1, BAK1,
   STXBP2, TRIB3, RTKN2, and E2F1) play crucial roles in the development
   and progression of endometrial cancer. PTGIS is involved in
   prostaglandin synthesis, while
   TIMP3 inhibits matrix metalloproteinases, both affecting tumor growth
   and metastasis [[150]46]. E2F1, a key regulator of the cell cycle, is
   overexpressed in various cancers, promoting cell proliferation
   [[151]47]. SNCA is associated with neurodegenerative disorders, and
   HIC1 functions as a tumor suppressor gene; their dysregulation can
   contribute to tumorigenesis [[152]48]. TRIB3 and BAK1 are related to
   stress response and apoptosis, respectively, and their mutations may
   lead to cancer cell resistance to death signals [[153]49]. RTKN2 and
   STXBP2 are involved in cytoskeletal organization and vesicle
   trafficking, and their mutations can affect cellular structure and
   signal transduction. TRIB3 modulates the PI3K/AKT pathway, a critical
   signaling cascade in cancer cell survival and proliferation [[154]50].
   BAK1, a pro-apoptotic gene, plays a vital regulatory role in programmed
   cell death [[155]49]. HIC1 and SNCA may influence the tumor immune
   microenvironment by modulating immune cell infiltration and response,
   thereby affecting tumor progression and prognosis [[156]51]. PTGIS,
   associated with fatty acid metabolism, has been identified as a key
   gene affecting the malignant biological behavior of EC, with its
   expression levels correlating with tumor invasiveness and immune status
   in the microenvironment [[157]52]. SNCA can influence tumor progression
   by regulating the cell cycle and apoptosis [[158]48]. E2F1
   dysregulation is associated with increased cell proliferation and tumor
   progression in EC [[159]53]. Although the specific mechanisms of TIMP3,
   SRPX, and STXBP2 have not been discussed in detail in previous
   endometrial cancer studies, it is generally understood that TIMP3
   regulates extracellular matrix degradation by inhibiting matrix
   metalloproteinases, SRPX participates in cell adhesion and migration
   [[160]54], and STXBP2 is involved in vesicle transport and secretion
   [[161]55], all of which may affect tumor cell invasion and metastasis.
   HIC1 [[162]56] and RTKN2 [[163]57] influence
   cell proliferation and survival through tumor suppressor and oncogenic
   signaling pathways, respectively.

   Through functional enrichment analysis, this study found that the
   selected candidate genes were mainly enriched in pathways such as
   proteoglycans in endometrial cancer, consistent with previous research
   [[164]58] highlighting the crucial role of proteoglycans in endometrial
   tumor progression. Single-gene GSEA analysis further revealed
   associations between these genes and multiple cancer-related pathways,
   particularly PTGIS and BAK1 genes showing enrichment in apoptosis and
   proliferation-related pathways. PPI network analysis identified E2F1 at
   the network's core, aligning with previous research [[165]59] reporting
   E2F1’s bidirectional regulatory role in tumor progression. In clinical
   applications, the nomogram constructed in this study integrated factors
   including risk score, clinical stage, and age, providing clinicians
   with an intuitive prognostic assessment tool. This integrated model
   construction approach is similar to strategies employed in recent
   research [[166]60], with DCA curve analysis supporting the model's
   clinical utility across a wide range of threshold probabilities.
   Regarding the immune microenvironment, immune cell infiltration
   analysis revealed significant differences in immune cell composition
   between risk groups, with TIDE analysis indicating higher immune escape
   potential in the high-risk group, echoing recent findings [[167]61].
   Notably, we found significantly higher TP53 mutation rates in the
   high-risk group compared to the low-risk group (62% vs 20%), while PTEN
   mutations were more prevalent in the low-risk group (86%), suggesting
   these mutation patterns may be associated with immune microenvironment
   remodeling. Finally, drug sensitivity analysis identified four drugs
   showing enhanced therapeutic potential in the high-risk group, aligning
   with recent research [[168]62] suggesting that immune checkpoint
   inhibitors may not be suitable for all patients, emphasizing the
   importance of individualized treatment planning. While these findings
   provide new perspectives for precision treatment of endometrial cancer,
   prospective clinical studies are needed to validate their clinical
   application value.
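
   The decision curve analysis (DCA) mentioned above rests on the net
   benefit at a given threshold probability pt, defined as
   TP/n − (FP/n) × pt/(1 − pt). The sketch below computes this quantity
   for a model and for the "treat all" reference strategy. The predicted
   probabilities and outcomes are hypothetical toy values, and the
   function names are illustrative rather than taken from any package.

```python
def net_benefit(probs, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(probs)
    true_pos = sum(1 for p, y in zip(probs, outcomes)
                   if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(probs, outcomes)
                    if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold.
    return true_pos / n - (false_pos / n) * (threshold / (1.0 - threshold))


def net_benefit_treat_all(outcomes, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * (threshold / (1.0 - threshold))


# Hypothetical predicted event probabilities and observed outcomes.
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
outcomes = [1, 1, 0, 1, 0, 0]

for pt in (0.2, 0.5, 0.7):
    print(pt, round(net_benefit(probs, outcomes, pt), 3),
          round(net_benefit_treat_all(outcomes, pt), 3))
```

   A model is considered useful at a threshold when its net benefit
   exceeds both the "treat all" value and zero (the "treat none"
   strategy).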

   Drug sensitivity analysis revealed four compounds showing significantly
   increased efficacy in the high-risk group, as indicated by lower IC50
   values: MG-132 (R = −0.32, p = 9.9e−15), UMI-77 (R = −0.33,
   p = 8.4e−15), Sepantronium bromide (R = −0.53, p < 2.2e−16), and
   WEHI-539 (R = −0.54, p < 2.2e−16). Each of these drugs targets
   specific nodes in programmed cell death pathways through distinct but
   complementary mechanisms. MG-132, a proteasome inhibitor, enhances
   cancer cell apoptosis by inhibiting the ubiquitin–proteasome pathway,
   leading to increased caspase-3 activation and reactive oxygen species
   (ROS) upregulation. Notably, MG-132 has shown synergistic effects with
   cisplatin in endometrial cancer cells and can enhance the expression of
   apoptotic markers when combined with other therapeutics [[169]63,
   [170]64]. UMI-77 targets Mcl-1, an anti-apoptotic member of the Bcl-2
   family of proteins, which are key regulators of the intrinsic
   apoptotic pathway, and thereby promotes apoptosis by inhibiting
   anti-apoptotic protein function [[171]65]. Sepantronium
   bromide (YM155) operates through a distinct mechanism as a survivin
   suppressant, targeting this critical inhibitor of apoptosis protein
   that is frequently overexpressed in endometrial cancer [[172]66].
   WEHI-539, a selective Bcl-xL inhibitor, disrupts the balance of pro-
   and anti-apoptotic signals by specifically targeting Bcl-xL-mediated
   survival pathways [[173]67]. The enhanced sensitivity to these
   compounds in high-risk patients suggests that their tumors may be more
   dependent on anti-apoptotic mechanisms for survival, particularly
   through proteasome-mediated protein degradation and Bcl-2 family
   protein regulation.

   Recent years have witnessed diverse approaches to developing prognostic
   models for endometrial cancer, yet achieving consistent high accuracy
   remains challenging. A systematic review of risk prediction models for
   the general population revealed AUC values ranging from 0.68 to 0.77,
   even when incorporating comprehensive epidemiological variables
   including reproductive history, hormone use, BMI, and smoking history
   [[174]68]. While diagnostic models for symptomatic women showed
   somewhat improved performance with AUC values between 0.73 and 0.957,
   many still struggled to consistently exceed the 0.8 threshold despite
   incorporating clinical predictors such as endometrial thickness and
   recurrent bleeding patterns [[175]68]. Traditional epidemiologic
   models, even when utilizing data from the Epidemiology of Endometrial
   Cancer Consortium, achieved limited discriminative ability with AUC
   values between 0.64 and 0.69, and notably, the addition of genetic
   factors did not significantly enhance their performance [[176]69]. More
   sophisticated approaches using machine learning algorithms to predict
   concurrent endometrial carcinoma in patients with endometrial
   intraepithelial neoplasia achieved a maximum AUC of only 0.646 with
   random forest models [[177]70]. Bayesian network models analyzing
   survival-related factors showed improvement with an AUC of 0.787,
   outperforming traditional Cox proportional hazards models (AUC = 0.723)
   but still falling short of optimal predictive power [[178]42]. Recent
   molecular and biomarker-based approaches have also shown moderate
   success: fragmentomics-based liquid biopsy models demonstrated AUC
   values of 0.72 for stage prediction and 0.73 for histological subtype
   classification [[179]71], while disulfidptosis-related prognostic
   models achieved AUCs of 0.71 for overall survival and 0.69 for
   disease-free survival [[180]72]. In contrast, our model achieved an
   AUC above 0.8, which we attribute to the systematic evaluation of 117
   machine learning algorithm combinations, the focus on programmed cell
   death-related genes, and validation in an independent cross-platform
   cohort.
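
   For readers comparing the AUC values quoted above, the sketch below
   shows one way an AUC can be computed: as the fraction of
   (event, non-event) pairs in which the event case receives the higher
   score. This is a simplified binary-outcome illustration with
   hypothetical scores; time-dependent AUCs for censored survival data
   require additional handling that is not shown here.

```python
def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative.

    `labels` uses 1 for patients with the event and 0 for those without.
    Ties between a positive and a negative score count as half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores and event labels.
example_scores = [0.9, 0.2, 0.8, 0.1]
example_labels = [1, 1, 0, 0]
example_auc = auc(example_scores, example_labels)
print(f"AUC = {example_auc:.2f}")
```

   An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect
   separation, which is the scale on which the values above are compared.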

   Our immune microenvironment analysis revealed significantly elevated T
   cell exclusion scores in the high-risk group, suggesting complex immune
   evasion mechanisms that operate through multiple molecular pathways. At
   the cytokine and chemokine level, we observed decreased expression of
   CXCL9/CXCL10/CXCL11 in the high-risk group, key molecules crucial for
   effector T cell recruitment. The altered expression of matrix
   remodeling genes (such as TIMP3) indicates significant extracellular
   matrix restructuring, potentially creating physical barriers to T cell
   infiltration [[181]73]. Changes in adhesion molecule expression
   (including VCAM-1 and ICAM-1) may affect T cell rolling and adhesion on
   vascular endothelium, influencing their migration to tumor sites
   [[182]74, [183]75]. The abnormal activation of angiogenesis-related
   genes (particularly the VEGF signaling pathway) likely contributes to
   aberrant tumor vasculature, impacting effective T cell infiltration
   [[184]76]. Notably, the high TP53 mutation rate in the high-risk group
   (62% versus 20% in the low-risk group) may influence the immune
   microenvironment through multiple mechanisms: mutant TP53 potentially
   alters cytokine and chemokine expression profiles affecting immune cell
   recruitment, and may modulate PD-L1 expression impacting immune
   checkpoint pathways [[185]77]. Our GSEA results, which showed
   significant downregulation of cell motility-related pathways in the
   high-risk group, may also be relevant to the T cell exclusion
   phenotype. TIDE analysis revealed
   not only elevated T cell exclusion scores but also higher immune
   evasion potential in the high-risk group, suggesting multiple
   immunosuppressive mechanisms. These molecular insights have important
   therapeutic implications: high-risk patients may benefit from
   combination strategies rather than single-agent immune checkpoint
   inhibition, such as combining anti-angiogenic agents to normalize
   vasculature or matrix-remodeling inhibitors to enhance T cell
   infiltration. For patients with TP53 mutations, more personalized
   immunotherapy approaches may be necessary. These findings provide a
   theoretical foundation for developing novel therapeutic strategies and
   emphasize the importance of precision medicine in endometrial cancer
   treatment.

   Our study's methodological framework, which evaluates 117 machine
   learning algorithm combinations, represents both an innovation and a
   challenge in prognostic modeling. The key advantage of this
   large-scale algorithm integration
   lies in its ability to capture complex, non-linear relationships within
   high-dimensional data that might be missed by single-algorithm
   approaches. By systematically evaluating combinations of algorithms,
   including random survival forests, elastic nets, stepwise Cox models,
   and various boosting methods, we can identify complementary strengths
   among different approaches. For instance, while elastic nets excel at
   handling high-dimensional data with strong correlations, random forests
   better capture non-linear interactions. The StepCox[forward] + plsRcox
   combination emerged as optimal, suggesting that the stepwise feature
   selection complemented by partial least squares regression effectively
   balances model complexity with predictive power. However, we
   acknowledge several limitations in our methodological approach. The
   validation cohort is relatively small (50 cases), necessitating
   larger-scale, multicenter clinical cohorts to validate the model's
   stability and generalizability. While bioinformatics analyses revealed
   potential mechanisms of these genes, laboratory-level functional
   validation and mechanistic exploration are lacking. Additionally, the
   model incorporates limited clinical features; future studies could
   integrate more clinical indicators to improve prediction accuracy.
   Looking forward, prospective clinical studies are needed to validate
   the model's utility, along with in vitro and in vivo experiments to
   explore the specific molecular mechanisms of these programmed cell
   death-related genes in endometrial cancer development. Incorporating
   emerging technologies like single-cell sequencing could further
   elucidate these genes' regulatory networks within the tumor
   microenvironment. Such in-depth research will provide a more solid
   theoretical foundation for precision diagnosis and treatment of
   endometrial cancer.
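
   The idea of systematically comparing algorithm combinations can be
   sketched as below. The example assumes that each combination is
   scored by Harrell's concordance index (C-index) in each cohort and
   that combinations are ranked by their mean C-index; this selection
   criterion is an assumption for illustration and is not stated in this
   section. Combination names and C-index values are hypothetical
   placeholders, not results from the study.

```python
def concordance_index(times, events, risk):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) at a strictly earlier time than patient j's
    follow-up. The pair is concordant when patient i also has the
    higher predicted risk; tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def rank_by_mean_cindex(results):
    """Sort combinations by mean C-index across cohorts, best first."""
    means = {name: sum(by_cohort.values()) / len(by_cohort)
             for name, by_cohort in results.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Hypothetical C-index values per combination and cohort.
cindex_by_cohort = {
    "combo_A": {"train": 0.70, "validation": 0.62},
    "combo_B": {"train": 0.74, "validation": 0.68},
    "combo_C": {"train": 0.80, "validation": 0.55},
}
ranking = rank_by_mean_cindex(cindex_by_cohort)
for name, mean_c in ranking:
    print(f"{name}: mean C-index {mean_c:.3f}")
```

   Averaging across cohorts penalizes combinations such as the
   hypothetical combo_C, which fits the training data well but
   generalizes poorly to the validation cohort.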

   The clinical implications of our predictive model extend beyond
   prognostication to potentially guide therapeutic decision-making,
   particularly in immunotherapy. Our risk stratification system, combined
   with immune microenvironment characterization, offers several practical
   advantages for treatment selection. For high-risk patients showing
   elevated T cell exclusion scores and altered immune profiles,
   single-agent checkpoint inhibition may be insufficient. Instead, these
   patients might benefit from more aggressive combination approaches: for
   example, combining anti-PD-1/PD-L1 therapy with anti-VEGF agents to
   normalize tumor vasculature and enhance T cell infiltration.
   Conversely, low-risk patients with favorable immune profiles might
   achieve adequate responses with standard immunotherapy regimens.
   Additionally, the association between risk group and TP53 mutation
   status could help identify patients who might benefit from specific
   immunotherapy combinations targeting p53-related immune evasion
   mechanisms. This
   personalized approach to immunotherapy selection, based on both risk
   score and immune profile, could potentially improve response rates and
   patient outcomes while avoiding unnecessary treatment toxicity in
   low-risk patients. Furthermore, our model could be valuable in clinical
   trial design, helping to stratify patients and identify those most
   likely to benefit from novel immunotherapy combinations or emerging
   therapeutic strategies.

Conclusion

   This study developed a novel prognostic risk model for endometrial
   cancer by evaluating 117 machine learning algorithm combinations and
   focusing on programmed cell death-related genes, offering insights
   into disease progression. The model demonstrates robust predictive
   performance across training and validation datasets, effectively
   stratifying patients into high- and low-risk groups with distinct
   survival outcomes, immune microenvironment characteristics, and
   potential therapeutic responses. By integrating multi-dimensional data
   analysis, including gene expression, molecular pathways, immune
   infiltration, and mutation profiles, the research provides a
   comprehensive framework for personalized prognosis and treatment
   strategy, highlighting the potential of computational approaches in
   advancing precision oncology for endometrial cancer.

Author contributions

   TC and YY contributed equally to this work as co-first authors and were
   responsible for conceptualization, methodology, data analysis, and
   original draft writing. ZH and FP conducted data curation, software
   implementation, and validation. ZX and KG performed statistical
   analysis and figure preparation. WH and LX contributed to literature
   review and manuscript revision. XL and CF contributed equally as
   co-corresponding authors, supervised the project, acquired funding, and
   reviewed the final manuscript. All authors read and approved the final
   manuscript.

Funding

   This work was supported by the State Key Laboratory of Ultrasound in
   Medicine and Engineering (Grant No.: 2022KFKT012) awarded to Prof.
   Yiqun Zhang for the project "Investigation on the efficacy and safety
   of different GnRH-a pretreatment protocols for FUAS in the treatment of
   adenomyosis" and the 2024 Shiyan City Guidance Scientific Research
   Project (Grant No.: 24Y083) awarded to the corresponding author, Caiyun
   Fang, for the project "Expression of IC3 cells in cervical lesions and
   analysis of related immune functions."

Data availability

   All datasets utilized in the study are publicly available. The data
   analysis intermediate files are available from the corresponding author
   upon reasonable request.

Declarations

Ethics approval and consent to participate

   Not applicable.

Consent for publication

   Not applicable.

Competing interests

   The authors declare no competing interests.

Footnotes

   Publisher's Note

   Springer Nature remains neutral with regard to jurisdictional claims in
   published maps and institutional affiliations.

   Tianshu Chen and Yuhan Yang are co-first authors.

Contributor Information

   Xueqin Liu, Email: 376973042@qq.com.

   Caiyun Fang, Email: 864876634@qq.com.

References