Abstract

Background
The tumor microenvironment (TME) exerts a profound influence on the progression, therapeutic response, and clinical outcome of acute myeloid leukemia (AML), a prevalent hematologic malignancy in adults. This study aimed to establish a TME-based prognostic model to uncover new therapeutic and prognostic avenues for AML.

Methods
Gene expression profiles and clinical information for 134 AML patients were retrieved from The Cancer Genome Atlas (TCGA). TME cellular components were evaluated using the ESTIMATE algorithm, and differentially expressed genes (DEGs) were identified. A Microenvironment Prognostic Model (MPM) was then constructed through univariate Cox regression, LASSO regression, and multivariate Cox regression analyses. The predictive performance of the MPM was validated in a separate cohort of 312 AML patients from the TARGET database.

Results
ESTIMATE scores were significantly associated with French-American-British (FAB) classification and overall survival (p-values = 3.6e-07 and 0.011, respectively). LASSO-Cox regression identified eight key genes (CXCL12, GZMB, ITPR2, LYN, RAB9B, RGMB, RUFY4, TRIM16) that were strongly associated with survival (p-value < 0.0001). The MPM showed strong prognostic performance in TCGA, with area under the curve (AUC) values of 84.05%, 85.73%, and 89.54% for predicting 1-, 3-, and 5-year survival, respectively. External validation in the TARGET cohort yielded more modest AUC values of 60.5%, 56.7%, and 55.7% at the corresponding intervals.

Conclusion
These findings present a TME-based prognostic model that offers a promising avenue for risk stratification and targeted therapeutic strategies in AML.

1 Background
Acute myeloid leukemia (AML), the most prevalent form of adult acute leukemia, arises from the unchecked proliferation of myeloid precursor cells.
This abnormal growth disrupts normal blood cell production, leading to bone marrow failure [1,2]. Although many patients achieve complete remission with initial induction chemotherapy, five-year survival remains disappointingly low, primarily because of frequent relapse. Relapse is often attributed to minimal residual disease (MRD) harbored within a protective tumor microenvironment (TME) that promotes immune evasion and treatment resistance [3,4]. The TME constitutes a spatially organized, metabolically dynamic niche in which malignant cells co-opt stromal fibroblasts, endothelial cells, immunosuppressive myeloid populations, and extracellular matrix (ECM) components to drive tumor progression and therapeutic resistance [5–8]. AML blasts actively remodel their niche through complex interactions with mesenchymal stromal cells, endothelial cells, and immunosuppressive myeloid populations, which together establish a cytokine- and chemokine-rich milieu [9–11]. This remodeled niche supports leukemic stem cells (LSCs) through CXCR4/CXCL12-mediated retention, CD44/VLA-4 adhesion, and survival signals (VEGF, TGF-β) while suppressing normal hematopoiesis [12]. The TME confers chemoresistance via hypoxic sanctuaries, mitochondrial transfer, and NF-κB/STAT3 activation [13]. Therapeutic targeting remains challenging because of niche plasticity and hematopoietic toxicity, although emerging approaches combining CXCR4 inhibitors (plerixafor) with chemotherapy or metabolic disruptors show promise [14,15]. A comprehensive understanding of these dynamic microenvironmental interactions is therefore critical for developing more effective strategies to overcome treatment resistance and prevent relapse. Rapid advances in microarray and next-generation sequencing (NGS) technologies now enable more precise prognostication and treatment customization for AML patients [16].
The ESTIMATE algorithm quantifies immune and stromal cell infiltration within the TME and has been applied across various cancers, including gastric cancer [17], breast cancer [18], prostate cancer [19], colon cancer [20], osteosarcoma [21], renal cell carcinoma [22], and hepatocellular carcinoma [23]. It has also been used to derive immune and stromal scores specifically for AML patients [24–28]. Despite extensive research into novel therapeutic avenues and drugs, relapse and mortality rates in AML remain stubbornly high, so accurately predicting patient outcomes at diagnosis is crucial [29]. Existing prognostic models, which incorporate factors such as leukemic stem cells (LSCs), microRNAs, gene expression patterns, methylation profiles, and markers of immunogenic cell death, often show limitations, particularly in their performance across different AML subtypes [30–35]. Consequently, more refined prognostic models are urgently needed. To address this need, we developed a prognostic model that integrates gene expression data from AML cohorts in The Cancer Genome Atlas (TCGA) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) databases, refined with the ESTIMATE algorithm to improve its predictive precision.

2 Methods
2.1 Data acquisition
Level 3 RNA sequencing data and corresponding clinical information for 151 newly diagnosed AML patients from the TCGA database and 312 AML patients from the TARGET database were downloaded from the GDC data portal (https://portal.gdc.cancer.gov/repository). TCGA data were used to construct the prognostic model, and TARGET data were used for its external validation.
Within the TCGA cohort, 17 individuals had incomplete clinical records, notably missing survival data. Because developing a prognostic model requires complete clinical details, these 17 patients were excluded, leaving a final cohort of 134 patients for further analyses.

2.2 Microenvironment-related differentially expressed genes
To estimate stromal and immune cell infiltration in the TME of AML patients, an ESTIMATE analysis was performed. Patients were categorized into low and high groups according to the median ESTIMATE score. The “DESeq2” package was used to identify differentially expressed genes (DEGs) between the high and low ESTIMATE groups. Genes with |fold change| > 1.5 and a false discovery rate (FDR) < 0.05 were considered DEGs. Heatmaps, PCA plots, and volcano plots were generated with the “pheatmap” package, the plotPCA function of “DESeq2”, and the “ggplot2” package, respectively.

2.3 Gene ontology and KEGG pathway
The DEGs were analyzed using DAVID (http://david.niaid.nih.gov) for Gene Ontology (GO), REACTOME, and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment, with statistical significance set at P < 0.05.

2.4 Protein-protein interaction analysis
STRING version 12 (https://string-db.org/) was used to investigate interactions between DEGs through protein-protein interaction (PPI) analysis, with the required interaction score set to medium confidence and an FDR stringency of 5%. The PPI results were then imported into Cytoscape v3.10.1 to construct a network model of the interplay among these DEGs.
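The DEG selection rule in Section 2.2 (median split on the ESTIMATE score, then |fold change| > 1.5 and FDR < 0.05) can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: it assumes per-gene log2 fold changes and raw p-values have already been computed (for example by DESeq2, whose `padj` column already holds Benjamini-Hochberg-adjusted values), and it assumes the 1.5 cutoff refers to the linear fold change, i.e. |log2 FC| > log2(1.5). The function names are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg (FDR) adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def median_split(scores):
    """Label each patient 'high' or 'low' relative to the median score.

    Patients exactly at the median are labelled 'low' (an arbitrary choice).
    """
    s = sorted(scores)
    n = len(s)
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    return ["high" if x > median else "low" for x in scores]


def select_degs(log2_fc, pvalues, fc_cutoff=1.5, fdr_cutoff=0.05):
    """Indices of genes with |fold change| > fc_cutoff and FDR < fdr_cutoff."""
    fdr = benjamini_hochberg(pvalues)
    lfc_cutoff = math.log2(fc_cutoff)  # |FC| > 1.5 is |log2 FC| > ~0.585
    return [
        i
        for i, (lfc, q) in enumerate(zip(log2_fc, fdr))
        if abs(lfc) > lfc_cutoff and q < fdr_cutoff
    ]


# Toy example with four genes (values are illustrative, not study data).
groups = median_split([3.0, 1.0, 2.0, 4.0])
degs = select_degs([1.0, 2.0, -0.9, 0.3], [0.01, 0.04, 0.03, 0.5])
print(groups)  # ['high', 'low', 'low', 'high']
print(degs)    # [0]
```

In the toy example, gene 1 has a large fold change but its adjusted p-value (about 0.053) misses the 0.05 cutoff, which shows why the FDR filter is applied after, not instead of, the fold-change filter.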
The top ten hub DEGs were identified separately for upregulated and downregulated DEGs using cytoHubba, a Cytoscape plug-in, with the closeness, betweenness, and degree algorithms.

2.5 Survival analysis and prognostic model construction
To construct the Microenvironment Prognostic Model (MPM), univariate Cox regression analysis was first performed with the R package “survival” (version 3.8–3) to assess associations between DEG expression levels and overall survival. DEGs with P < 0.05 in univariate Cox regression were considered predictive genes. The dataset was then randomly partitioned into training and test sets: the training set was used to construct the MPM, while the test set and the entire dataset were used to validate the prediction signature. The “glmnet” package (version 4.1–8) was used to perform least absolute shrinkage and selection operator (LASSO) regression, with the penalty parameter determined by 10-fold cross-validation, to reduce the risk of overfitting. Multivariate Cox regression analysis was then used to generate a risk score (RS) for each AML patient, calculated as RS = $\sum_{i=1}^{n} \beta_i \times \mathrm{Exp}_i$, where $n$ is the number of prognostic hub genes, $\beta_i$ is the multivariate Cox regression coefficient of gene $i$, and $\mathrm{Exp}_i$ is its expression level. To assess model accuracy, the R packages “survival”, “caret”, “glmnet”, “rms”, “survminer”, and “timeROC” were used to perform Kaplan-Meier analysis, generate receiver operating characteristic (ROC) curves for 1-, 3-, and 5-year survival, and calculate the area under the curve (AUC) for the training, test, and entire datasets.
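To make the time-dependent AUC concrete, the sketch below computes a simplified cumulative/dynamic AUC at a single horizon (for example 1, 3, or 5 years): cases are patients who died by the horizon, controls are patients still followed beyond it, and the AUC is the fraction of case-control pairs in which the case has the higher risk score. This is an illustrative assumption, not the timeROC estimator used in the study: timeROC additionally applies inverse-probability-of-censoring weights, whereas this sketch simply drops patients censored before the horizon, so its values will generally differ. The function name and toy data are hypothetical.

```python
def time_dependent_auc(times, events, scores, horizon):
    """Simplified cumulative/dynamic AUC at one time horizon (no IPCW weights).

    Cases: patients with an event (death) at or before `horizon`.
    Controls: patients whose follow-up extends beyond `horizon`.
    Patients censored before `horizon` are dropped, which can bias the
    estimate when censoring is heavy.
    """
    cases = [s for t, e, s in zip(times, events, scores) if t <= horizon and e == 1]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0  # higher risk score for the patient who died
            elif case_score == control_score:
                concordant += 0.5  # ties count as half a concordant pair
    return concordant / (len(cases) * len(controls))


# Toy cohort: follow-up in years, event (1 = death, 0 = censored), risk score.
times = [0.5, 1.5, 2.0, 4.0, 6.0, 0.8]
events = [1, 1, 0, 1, 0, 0]
scores = [2.1, 1.8, 0.9, 1.9, 0.2, 1.5]
for year in (1, 3):
    print(year, time_dependent_auc(times, events, scores, year))
# 1 1.0
# 3 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of patients who die by the horizon from those who survive past it.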
A higher AUC indicated better predictive performance of the MPM. To compare the predictive capacity of the MPM with that of the ESTIMATE score and age, ROC curve analysis was performed at 1, 3, and 5 years. External validation of the MPM was performed on AML patient data from the TARGET database, using Kaplan-Meier analysis and ROC curves for 1-, 3-, and 5-year predictions to assess the model’s robustness. A nomogram for predicting survival in AML patients was then built with the “rms” and “survival” packages, incorporating the risk score and clinical features such as age, FAB classification, and Cancer and Leukemia Group B (CALGB) cytogenetic risk category. The concordance index (C-index) was computed to assess the nomogram’s accuracy. Finally, associations between the MPM and clinical factors, including age, FAB classification, and CALGB category, were examined.

2.6 Statistical analysis
All statistical analyses were performed in R (version 4.4.2; R Foundation for Statistical Computing, Vienna, Austria). The Kruskal–Wallis test was applied to evaluate differences among multiple groups, given the non-parametric nature of the data, and the Wilcoxon rank-sum test was used for two-group comparisons. Statistical significance was defined as P < 0.05.

3 Results
3.1 ESTIMATE scores are associated with AML clinical parameters
After excluding 17 AML patients with incomplete clinical information, 134 patients remained (Table 1). Of these, 76 (56.72%) were male and 58 (43.28%) were female. The median age at initial pathological diagnosis was 58 years (range, 21–88 years).
Among the 132 patients with an FAB classification, the eight subtypes were distributed as follows: M0 (undifferentiated; 14, 10.6%), M1 (30, 22.7%), M2 (32, 24.2%), M3 (14, 10.6%), M4 (27, 20.5%), M5 (12, 9.1%), M6 (2, 1.5%), and M7 (1, 0.8%); two patients were not classified. We then calculated the ESTIMATE score for each patient using the ESTIMATE algorithm.

Table 1. Clinical characteristics of the TCGA AML cohort (n = 134).

Categorical variables: number (percentage, %)
Sex
  Male: 76 (56.72)
  Female: 58 (43.28)
FAB classification
  M0: 14 (10.45)
  M1: 30 (22.39)
  M2: 32 (23.88)
  M3: 14 (10.45)
  M4: 27 (20.15)
  M5: 12 (8.96)
  M6: 2 (1.49)
  M7: 1 (0.75)
  Not classified: 2 (1.49)
CALGB category
  Favorable: 29 (21.64)
  Intermediate: 76 (56.72)
  Poor: 27 (20.15)
  NA: 2 (1.49)
Age
  < 60 years: 73 (54.48)
  > 60 years: 61 (45.52)

Continuous variables: range; median
  Stromal score: −1582.2 to 425.9; −937.4
  Immune score: 1243 to 3669; 2462
  ESTIMATE score: −218.8 to 4094.9; 1500.7
  Age (years): 21 to 88; 58

To evaluate the association of ESTIMATE scores with AML cytogenetic risk, we classified patients’ cytogenetic risk as favorable, intermediate/normal, or poor and plotted the distribution of ESTIMATE scores across risk levels; the association was not significant (p-value = 0.16; Fig 1-A). In contrast, ESTIMATE scores were significantly associated with FAB classification (p-value = 9.7e-08; Fig 1-B). To investigate the relationship between ESTIMATE scores and overall survival, patients were divided into high- and low-score groups. Patients with low ESTIMATE scores had longer median overall survival than those with high ESTIMATE scores (p-value = 0.011; Fig 1-C).

Fig 1. Association of ESTIMATE scores with AML clinical features.
A, Correlation between ESTIMATE scores and AML cytogenetic risk (P = 0.16). B, Distribution of ESTIMATE scores across AML subtypes (p-value = 9.7e-08).
C, Kaplan-Meier survival curves showing that higher ESTIMATE scores are associated with significantly shorter overall survival (log-rank test, p-value = 0.011).

3.2 Identification of differentially expressed genes (DEGs) based on ESTIMATE scores in AML
We analyzed the patients’ RNA-Seq data to examine the relationship between gene expression profiles and ESTIMATE scores. Using cut-off criteria of p-value < 0.05 and |log2 fold change| > 1.5, we identified 2134 DEGs (1380 commonly upregulated and 754 commonly downregulated genes) between the ESTIMATE score groups (Fig 2-A). Principal component analysis (PCA) was performed to assess the relationship between ESTIMATE scores and FAB classification (Fig 2-B). The heatmap in Fig 2-C shows the DEGs between the low and high ESTIMATE score groups. Our subsequent analyses focused on these common DEGs.

Fig 2. Identification of DEGs based on ESTIMATE scores.
A, Volcano plot of DEGs between the low and high ESTIMATE score groups. Genes with p < 0.05 are shown in red (fold change > 1.5) and blue (fold change < −1.5); grey points represent the remaining genes, which showed no significant difference. B, PCA plot of TCGA data based on ESTIMATE scores and FAB classification. C, Heatmap of the top 20 upregulated and top 20 downregulated DEGs between the ESTIMATE score groups.

3.3 Gene ontology
Gene ontology (GO), KEGG, and REACTOME pathway analyses were used to investigate the biological processes and pathways involved. Using the DAVID annotation tool, the DEGs were analyzed for three sub-ontologies (Fig 3-A): biological process (BP), cellular component (CC), and molecular function (MF). For BP, DEGs were most enriched in neutrophil degranulation, inflammatory response, immune response, signal transduction, and cytokine-mediated signaling pathways.
KEGG pathway enrichment and interrelationship analysis showed that the DEGs were involved in cytokine-cytokine receptor interaction, phagosome, tuberculosis, and osteoclast differentiation (Fig 3-B). REACTOME pathway analysis revealed that the top pathways related to the DEGs were immune system, neutrophil degranulation, innate immune system, immunoregulatory interactions between a lymphoid and a non-lymphoid cell, toll-like receptor cascades, and cytokine signaling in the immune system (Fig 3-C).

Fig 3. GO term enrichment analysis of common DEGs.
A, The top 30 significantly enriched GO terms across the three sub-ontologies: biological process, molecular function, and cellular component. B, Interrelation analysis of KEGG and REACTOME pathways of common DEGs.

3.4 Protein-protein interaction (PPI) network construction and functional enrichment of genes of prognostic value
We built PPI networks using the STRING online database and Cytoscape software to investigate the interactions among upregulated and among downregulated DEGs. As shown in the supplementary data, the network of upregulated DEGs contains 1366 nodes and 17147 edges, and the network of downregulated DEGs contains 739 nodes and 1238 edges. The STRING data were then analyzed further in Cytoscape, and hub genes were ranked by closeness, betweenness, and degree using cytoHubba for upregulated DEGs (Fig 4A-C) and downregulated DEGs (Fig 4D-F).

Fig 4. PPI networks of the top 10 hub upregulated and downregulated DEGs according to cytoHubba analysis.
The algorithms are: A, betweenness of the top 10 upregulated DEGs; B, closeness of the top 10 upregulated DEGs; C, degree of the top 10 upregulated DEGs; D, betweenness of the top 10 downregulated DEGs; E, closeness of the top 10 downregulated DEGs; F, degree of the top 10 downregulated DEGs. Red indicates a higher score and yellow a lower score.
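The degree and closeness rankings used for hub-gene selection can be illustrated with a short Python sketch on a toy undirected network. It is an assumption-laden illustration rather than cytoHubba’s implementation: edges are taken as an unweighted, de-duplicated list (such as an exported STRING edge table), closeness is computed in its harmonic form (sum of 1/shortest-path distance, which tolerates disconnected components), and betweenness is omitted for brevity. Node names and function names are hypothetical.

```python
from collections import deque


def degree_centrality(edges):
    """Count interaction partners per node (assumes no duplicate edges)."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return degree


def closeness_centrality(edges):
    """Harmonic closeness: sum of 1/shortest-path distance to reachable nodes."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    scores = {}
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search on the unweighted graph
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        scores[source] = sum(1.0 / d for n, d in dist.items() if n != source)
    return scores


def top_k(scores, k=10):
    """Node names sorted by descending score, ties broken alphabetically."""
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Toy network (hypothetical nodes, not STRING output).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]
print(top_k(degree_centrality(edges), k=2))     # ['A', 'D']
print(top_k(closeness_centrality(edges), k=2))  # ['A', 'D']
```

Different centrality measures often agree on the most connected nodes, as here, but can diverge for bridging nodes, which is one reason to report several rankings side by side.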
3.5 Microenvironment prognostic model establishment
To construct the microenvironment prognostic model (MPM), we first performed univariate Cox regression analysis on the DEGs. Of the 2134 microenvironment-related genes, 733 were significantly associated with overall survival. LASSO regression was then performed to avoid overfitting, and 24 genes were selected for further analysis. Multivariate Cox proportional hazards regression identified eight genes strongly associated with the overall survival of AML patients (Figs 5A-C). The expression levels of these eight genes, weighted by their coefficients from the multivariate Cox model, were used to calculate a risk score for each patient:

risk score = ITPR2 × (−2.695558) + LYN × (1.762128) + RGMA × (−0.657528) + GZMB × (0.783182) + RAB9B × (0.880839) + CXCL12 × (−0.219737) + RUFY4 × (0.540176) + TRIM16 × (1.605168).

Because the coefficients differ in sign, higher expression of LYN, GZMB, RAB9B, RUFY4, and TRIM16 raises the risk score and is associated with a higher risk of mortality, whereas higher expression of ITPR2, RGMA, and CXCL12 lowers the risk score (Fig 5D, 5E).

Fig 5. Establishment of the MPM.
A, LASSO coefficient profiles of the prognostic DEGs. B, Ten-fold cross-validation for tuning parameter selection in the LASSO model. The partial likelihood deviance is plotted against log(λ), where λ is the tuning parameter, with error bars representing SE. The dotted vertical lines are drawn at the optimal values by the minimum and 1-SE criteria. C, Forest plot of hazard ratios for the 8 prognostic DEGs. D, Distribution of risk scores and overall survival status with increasing risk score. E, Expression profiles of the signature genes in the high and low risk score groups.

We calculated the risk score for all patients and classified them as low or high risk based on the median.
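The risk score formula above, the median split into risk groups, and the log-rank comparison of those groups can be sketched in Python. The coefficients are copied from the formula, including the gene symbol RGMA as printed there. The expression scale is an assumption, since the text does not state which normalization was used to fit the model, so scores computed from differently normalized data are not comparable. The log-rank test is a standard two-group implementation with a chi-square (1 df) p-value; the toy survival data and function names are hypothetical.

```python
import math

# Coefficients copied from the risk score formula (gene symbols as printed).
COEFFICIENTS = {
    "ITPR2": -2.695558, "LYN": 1.762128, "RGMA": -0.657528,
    "GZMB": 0.783182, "RAB9B": 0.880839, "CXCL12": -0.219737,
    "RUFY4": 0.540176, "TRIM16": 1.605168,
}


def risk_score(expression):
    """Sum of coefficient x expression; `expression` maps gene -> value.

    Assumes values are on the same (unspecified) normalized scale used to fit the model.
    """
    return sum(beta * expression[gene] for gene, beta in COEFFICIENTS.items())


def median_risk_groups(scores):
    """'high' if above the median risk score, otherwise 'low'."""
    s = sorted(scores)
    n = len(s)
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    return ["high" if x > median else "low" for x in scores]


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, p-value) with 1 df."""
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        died = [g for ti, e, g in zip(times, events, groups) if ti == t and e == 1]
        n, d = len(at_risk), len(died)
        n_high = at_risk.count("high")
        d_high = died.count("high")
        observed_minus_expected += d_high - d * n_high / n
        if n > 1:  # hypergeometric variance of deaths in the high group
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p_value


# Toy example: all eight genes at expression 1.0 gives the sum of coefficients.
print(round(risk_score({gene: 1.0 for gene in COEFFICIENTS}), 6))  # 1.99867
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 0, 1, 0]
groups = ["high", "high", "high", "low", "low", "low"]
chi2, p = logrank_test(times, events, groups)
print(round(chi2, 3), round(p, 3))
```

In practice, `groups` would come from `median_risk_groups` applied to the patients’ risk scores; the toy groups are written out explicitly so the log-rank arithmetic is easy to follow.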
Kaplan-Meier survival analysis of the training, test, and entire datasets showed that the high-risk group had a considerably lower survival rate than the low-risk group (p-value < 0.0001, p-value = 0.00041, and p-value < 0.0001, respectively; Fig 6A-C). ROC curves were constructed to test the model’s accuracy in the training, test, and entire datasets (Fig 6D-F). In particular, the AUCs for 1-, 3-, and 5-year survival in the entire dataset were 84.05%, 85.73%, and 89.54%, respectively, indicating strong predictive power in the TCGA cohort across different timeframes.

Fig 6. Evaluating the prognostic efficacy of the MPM in AML.
A-C, Kaplan–Meier analysis showing the prognostic relevance of the MPM in the training, test, and overall patient cohorts (p-values < 0.0001, = 0.0004, and < 0.0001, respectively). D-F, Time-dependent ROC curves showing the accuracy of the MPM in predicting 1-, 3-, and 5-year overall survival among patients in the TCGA dataset. G, External validation using TARGET data confirming the MPM’s significant relationship with AML prognosis. H, Time-dependent ROC curves showing the MPM’s performance in predicting 1-, 3-, and 5-year OS within the TARGET AML population.

Our prognostic model also showed higher predictive accuracy than either the ESTIMATE score or age. For 1-, 3-, and 5-year survival, the AUC values were 64.6%, 63.2%, and 71.2% for the ESTIMATE score and 68.8%, 72.8%, and 79.3% for age, all lower than those of the MPM (S1A and S1B Fig). We then used TARGET data from 312 AML patients for external validation of the prognostic model.
Kaplan-Meier analysis revealed a statistically significant relationship between the risk score and patient survival in this dataset (p-value = 0.025; Fig 6G, 6H). ROC curves for 1-, 3-, and 5-year survival yielded AUC values of 60.5%, 56.7%, and 55.7%, respectively, indicating modest discrimination in the external cohort.

3.6 Nomogram model construction
We constructed and validated a nomogram for predicting outcomes in AML patients. The nomogram (Fig 7A) integrates the microenvironment prognostic model, patient age, FAB classification, and CALGB category to provide risk estimates at 1, 3, and 5 years, with the aim of facilitating personalized risk evaluation and informing clinical decision-making. To assess its ability to distinguish patients who experienced the clinical event of interest from those who did not, we used the concordance index (C-index) (Fig 7B).

Fig 7. Nomogram development for survival prediction in AML.
A, Nomogram displaying the predictive factors, including RSG, age, FAB classification, and CALGB category, with survival probabilities at 1, 3, and 5 years. B, CI plot comparing the nomogram-predicted overall survival probability with the observed overall survival probability.

3.7 Prognostic accuracy of the MPM in the AML clinical context
The MPM was significantly associated with key clinical parameters in AML patients, including age, FAB classification, and cytogenetic status, suggesting potential value for tailoring treatment approaches and improving patient care. MPM risk scores differed significantly between patients younger and older than 60 years (p-value = 1.1e-05, Fig 8A).
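The two-group comparison of MPM risk scores by age group (Fig 8A) relies on the Wilcoxon rank-sum test named in Section 2.6. The sketch below shows the test with average ranks for ties and a normal approximation. It is illustrative only and omits the continuity correction and exact small-sample p-values that R’s `wilcox.test` applies by default, so p-values may differ slightly from the reported ones. Function names and inputs are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Two-sided rank-sum test (normal approximation, tie-corrected, no continuity correction).

    Returns (U statistic for x, p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())  # zero when no ties
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail


# Hypothetical risk scores for two age groups (illustrative only).
under_60 = [0.1, 0.4, 0.3]
over_60 = [0.9, 1.2, 0.8]
u, p = wilcoxon_rank_sum(under_60, over_60)
print(u, round(p, 3))
```

A rank-based test is used because it makes no normality assumption about the risk-score distribution, consistent with the non-parametric approach described in Section 2.6.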
The MPM was also significantly associated with the FAB (French-American-British) classification, which may support subtype-specific survival prediction (p-value = 6.1e-06, Fig 8B). Additionally, as illustrated in Fig 8C, the MPM was significantly associated with cytogenetic status defined by CALGB criteria, supporting risk stratification of patients with favorable, intermediate, and poor cytogenetic profiles (p-value = 2.7e-07).

Fig 8. Validation of the MPM in clinical characteristics of AML patients.
A, The MPM significantly correlates with AML patient age, distinguishing patients younger and older than 60 years (p-value = 1.1e-05). B, The MPM shows a significant relationship with FAB classification, aiding subtype-specific survival prediction (P = 6.1e-06). C, The MPM is significantly associated with cytogenetic status (CALGB criteria), supporting risk stratification of patients with favorable, intermediate, and poor cytogenetic profiles (p-value = 2.7e-07).

4 Discussion
AML is the most common type of acute leukemia in adults and is characterized by high mortality and variable outcomes [36]. Despite ongoing advances in identifying new drugs and therapeutic targets, the five-year survival rate remains disappointingly low [37,38]. Accurately determining prognosis at diagnosis is crucial for improving overall survival in AML patients [39,40]. AML prognosis is influenced by several factors, including genetic abnormalities, and the role of the bone marrow microenvironment has attracted significant attention in recent years. The bone marrow is the primary site of leukemia onset and progression, and its stromal and immune cells are pivotal in the proliferation, survival, and drug resistance of leukemic cells [41,42].
Our study examines the relationships between TME cell populations and AML prognosis, developing an MPM rooted in the ESTIMATE score to improve the precision of prognostic biomarkers for AML. This exploration aims to clarify how the TME shapes the course of AML and to identify new therapeutic targets. Using the ESTIMATE algorithm, we estimated the stromal and immune cell content of the TME in AML patients. Because ESTIMATE scores were significantly associated with FAB classification and overall survival, we classified patients into high- and low-ESTIMATE-score subgroups and identified differentially expressed genes (DEGs) between them. Pathway enrichment analysis showed that these TME-related DEGs are strongly associated with immune response, inflammatory response, and innate immune response pathways. Inflammation-related genes are associated with AML progression and chemoresistance, and the inflammatory response is therefore regarded as a prognostic factor in these patients [43]. A growing number of studies have also implicated immune-related processes as prognostic factors in AML [44]. In our analysis, survival analyses of the DEGs enabled the construction of the MPM. Notably, the expression of eight genes (CXCL12, GZMB, ITPR2, LYN, RAB9B, RGMB, RUFY4, TRIM16) was strongly associated with patient survival. Among these, the chemokine CXCL12 and its receptor CXCR4, which is often expressed on AML cells, are crucial mediators of interactions between leukemia cells and their microenvironment, promoting cell migration and survival and thereby contributing to chemotherapy resistance [45]. The interaction between CXCL12 and CXCR4 promotes the migration and homing of AML cells to the bone marrow microenvironment, where they receive support and protection from surrounding stromal cells.
This interaction also contributes to chemoresistance, as CXCL12 signaling can activate survival pathways in leukemic cells. Elevated bone marrow CXCL12 levels have been associated with adverse outcomes in AML patients [46,47]. Granzyme B (GzmB), a serine protease released by cytotoxic lymphocytes, targets and destroys virus-infected or malignant cells [48]. In AML, however, GZMB levels are markedly reduced, weakening its role as a key cytotoxic mediator of T and NK cells against cancer cells [49]. The ITPR2 gene encodes the inositol 1,4,5-trisphosphate receptor type 2 (IP3R2), a calcium channel that modulates intracellular signaling pathways. Although initial studies suggest a prognostic role for ITPR2, its implications in AML remain underexplored [50,51]. LYN, a tyrosine kinase, is integral to intracellular signaling and is crucial for the differentiation and persistence of the leukemic phenotype across various blood cancers, including AML, CML, and B-cell lymphocytic leukemia [52]. LYN activity has been significantly associated with overall survival in AML [53]. RAB9B, a member of the RAB family of GTPases, is essential for vesicle trafficking and protein transport within cells. Although, to our knowledge, RAB9B has not been studied in AML, Jiang et al. found that RAB9B is overexpressed in colorectal cancer and promotes tumor growth and metastasis [54]. RGMB, a member of the repulsive guidance molecule family, is known for its role in neuronal development and axon guidance. RGMB is overexpressed in glioblastoma and promotes tumor growth and invasion. In other tumors, including NSCLC, breast cancer, liver cancer, squamous cell carcinoma, and nasopharyngeal carcinoma, RGMb can instead inhibit cell proliferation and invasion, thereby suppressing tumor progression and even improving survival [55].
RUFY4 (RUN and FYVE domain-containing protein 4) plays a role in intracellular membrane trafficking and cytoskeletal organization [56]. Its role in AML is not well understood; however, studies have shown its involvement in regulating autophagy, a cellular process implicated in cancer development and progression [57]. TRIM16 (tripartite motif containing 16) is a member of the tripartite motif (TRIM) protein family [58]. Although research on TRIM16 in AML is limited, it has been implicated in other cancers and cellular processes; for example, TRIM16 is involved in regulating the NF-κB signaling pathway, which plays a crucial role in cancer development and progression [59]. We developed an integrated prognostic model combining univariate Cox regression, LASSO regression, and multivariate Cox regression, supported by Kaplan-Meier survival curves and a nomogram. This model, based on CXCL12, GZMB, ITPR2, LYN, RAB9B, RGMB, RUFY4, and TRIM16, was designed to predict outcomes for AML patients. Its performance was supported by ROC curve analysis in the training, test, and entire patient groups. In the TARGET dataset, the Kaplan-Meier analysis remained significant, although the AUC values were modest. Patients in the high-risk category had markedly lower survival than those in the low-risk group. Moreover, the MPM risk score was significantly associated with patient age, FAB classification, and CALGB classification. Nonetheless, this study has certain limitations. First, the prognostic model was constructed from retrospective data, so prospective multi-institutional cohort studies are needed to validate its clinical applicability and reproducibility across heterogeneous populations.
Although external validation was performed using the TARGET database, broader evaluation across geographically and ethnically diverse cohorts is essential to confirm the model’s generalizability, particularly given potential variability in tumor microenvironment composition influenced by genetic ancestry or regional therapeutic practices. Second, reliance on bulk RNA sequencing and computational deconvolution algorithms such as ESTIMATE introduces inherent technical constraints, including limited resolution of cellular heterogeneity and potential batch effects across sequencing platforms. Emerging single-cell transcriptomic and spatially resolved omics technologies could address these limitations through high-resolution microenvironmental mapping, but their integration into clinical workflows remains constrained by infrastructure requirements, technical expertise, and standardization challenges, which are critical barriers in resource-limited settings. Third, although the model provides prognostic stratification, its translational utility requires rigorous validation in prospective clinical trials assessing its capacity to guide therapeutic decisions, particularly in the context of emerging targeted therapies against TME components. Despite these considerations, our predictive framework provides a promising foundation for identifying novel therapeutic targets and, ultimately, for more robust diagnostic and treatment paradigms for AML.

5 Conclusion
Our study establishes a Microenvironment Prognostic Model (MPM) that integrates eight TME-associated genes (CXCL12, GZMB, ITPR2, LYN, RAB9B, RGMB, RUFY4, TRIM16) to stratify AML patients into risk categories with significant survival differences.
Unlike existing models that focus on genetic mutations or immunogenic cell death markers, the MPM builds on stromal and immune infiltration metrics derived from the ESTIMATE algorithm, capturing interactions between leukemic cells and their protective niche. The model showed strong prognostic accuracy in the TCGA cohort and significant, though more modest, risk separation in the independent TARGET cohort, and it outperformed age and the ESTIMATE score alone. Clinically, the MPM offers an actionable framework for risk-adapted therapy: high-risk patients could be prioritized for intensive regimens or novel TME-targeted therapies (e.g., CXCR4 inhibitors), while low-risk patients might benefit from reduced-intensity protocols to minimize toxicity. Future steps include prospective validation in multicenter trials to assess its utility in guiding real-time therapeutic decisions and integration into digital platforms for rapid risk scoring, thereby bridging the gap between computational biology and bedside practice.

Supporting information
S1 Fig. Comparative ROC curve analysis of 1-, 3-, and 5-year prognostic accuracy. A, ESTIMATE algorithm; B, patient age in the AML cohort. (PDF) pone.0325145.s001.pdf

Acknowledgments