Abstract

Background

   The tumor microenvironment (TME) exerts a profound influence on the
   progression, therapeutic responses, and clinical outcomes of acute
   myeloid leukemia (AML), a prevalent hematologic malignancy in adults.
   This study aimed to establish a TME-based prognostic model to unveil
   novel therapeutic and prognostic avenues for AML.

Methods

   Gene expression profiles and clinical information for 134 AML patients
   were retrieved from The Cancer Genome Atlas (TCGA). The TME cellular
   components were evaluated using the ESTIMATE algorithm, and
   differentially expressed genes (DEGs) were identified. A
   Microenvironment Prognostic Model (MPM) was subsequently constructed
   through univariate Cox regression, LASSO regression, and multivariate
   Cox regression analyses. The predictive performance of the MPM was
   validated in a separate cohort of 312 AML patients from the TARGET
   database.

Results

   Kaplan-Meier analysis revealed significant associations between the
   TME, French-American-British (FAB) classification, and overall survival
   (p-values = 3.6e-07 and 0.011, respectively). LASSO-Cox regression
   identified eight essential genes (CXCL12, GZMB, ITPR2, LYN, RAB9B,
   RGMB, RUFY4, TRIM16) that exhibited a strong correlation with survival
   (p-value < 0.0001). The MPM demonstrated excellent prognostic
   performance, with area under the curve (AUC) values of 84.05, 85.73,
   and 89.54 for predicting 1-, 3-, and 5-year survival, respectively.
   External validation with the TARGET database underscored the robustness
   of this model, yielding AUC values of 60.5%, 56.7%, and 55.7% at the
   corresponding intervals.

Conclusion

   These findings present a TME-based prognostic model that offers a
   promising avenue for precise risk stratification and targeted
   therapeutic strategies in AML.

1 Background

   Acute myeloid leukemia (AML), the most prevalent form of adult acute
   leukemia, arises from the unchecked proliferation of myeloid precursor
   cells. This abnormal growth disrupts normal blood cell production,
   leading to bone marrow failure [[42]1,[43]2]. Despite achieving
   complete remission with initial induction chemotherapy, AML patients
   face a disappointingly low five-year survival rate, primarily due to
   frequent relapses. These relapses are often attributed to minimal
   residual disease (MRD) ensconced within a protective tumor
   microenvironment (TME) that promotes immune evasion and resistance to
   treatment [[44]3,[45]4].

   The tumor microenvironment (TME) constitutes a spatially organized,
   metabolically dynamic niche where malignant cells co-opt stromal
   fibroblasts, endothelial cells, immunosuppressive myeloid populations,
   and extracellular matrix (ECM) components to drive tumor progression
   and therapeutic resistance [[46]5–[47]8]. AML blasts actively remodel
   their niche through complex interactions with mesenchymal stromal
   cells, endothelial cells, and immunosuppressive myeloid populations,
   which together establish a cytokine- and chemokine-rich milieu
   [[48]9–[49]11]. This remodeled niche supports leukemic stem cells
   (LSCs) through CXCR4/CXCL12-mediated retention, CD44/VLA-4 adhesion,
   and survival signals (VEGF, TGF-β) while suppressing normal
   hematopoiesis [[50]12]. The TME confers chemoresistance via hypoxic
   sanctuaries, mitochondrial transfer, and NF-κB/STAT3 activation
   [[51]13]. Therapeutic targeting remains challenging due to niche
   plasticity and hematopoietic toxicity, though emerging approaches
   combining CXCR4 inhibitors (plerixafor) with chemotherapy or metabolic
   disruptors show promise [[52]14,[53]15]. Thus, a comprehensive
   understanding of these dynamic microenvironmental interactions is
   therefore critical for developing more effective therapeutic strategies
   to overcome treatment resistance and prevent disease relapse.

   Rapid strides in microarray and next-generation sequencing (NGS)
   technologies now enable precise prognostication and customization of
   treatment for AML patients [[54]16]. The ESTIMATE algorithm stands out
   by effectively quantifying immune and stromal cell infiltration within
   the TME, and has been employed across various cancers including those
   of the gastric cancer [[55]17], breast cancer [[56]18], prostate cancer
   [[57]19], colon cancer [[58]20], osteosarcoma [[59]21], renal cell
   carcinoma [[60]22] and hepatocellular carcinoma [[61]23]. This approach
   has also facilitated the determination of immune and stromal scores
   specifically for AML patients [[62]24–[63]28].

   Despite extensive research into novel therapeutic avenues and drugs,
   the relapse and mortality rates in AML remain stubbornly high.
   Accurately predicting patient outcomes at diagnosis is therefore
   crucial [[64]29]. Existing prognostic models, which incorporate factors
   like leukemia hematopoietic stem cells (LSC), microRNAs, gene
   expression patterns, methylation profiles, and markers of immunogenic
   cell death, often show limitations, particularly in their effectiveness
   across different AML subtypes [[65]30–[66]35]. Consequently, there is
   an urgent and pressing need for the development of more refined
   prognostic models. To address these challenges, our research introduces
   an innovative prognostic model that integrates gene expression data
   from AML patient cohorts in both the Cancer Genome Atlas (TCGA) and the
   Therapeutically Applicable Research to Generate Effective Treatments
   (TARGET) databases, refined further with the ESTIMATE algorithm to
   boost its predictive precision.

2 Methods

2.1 Data requisition

   The level 3 RNA sequencing data with corresponding clinical information
   of 151 newly diagnosed AML patients from the TCGA database and 312 AML
   patients from the TARGET database were downloaded from the GDC database
   ([67]https://portal.gdc.cancer.gov/repository). The patient data
   derived from the TCGA database were employed in the construction of the
   prognostic model, while the data from the TARGET database were applied
   for the external validation of the model. Within this patient of TCGA
   cohort, 17 individuals were identified as having incomplete clinical
   records, notably in relation to their survival data. In line with our
   principal aim of developing a prognostic model, which hinged on the
   availability of comprehensive clinical details, these 17 patients were
   subsequently omitted from the analytical process, resulting in a final
   cohort of 134 patients for further analyses.

2.2 Microenvironment-related differentially expressed genes

   In order to assess the quality of stromal and immune cells in the TME
   of the patients with AML, an ESTIMATE analysis was performed. According
   to the median of their ESTIMATE scores, AML patients were categorized
   into low and high groups. The “DESeq2” package was utilized to obtain
   differentially expressed genes (DEGs) between high and low ESTIMATE
   groups. Genes with a |Fold Change| higher than 1.5 and a false
   discovery rate (FDR) lower than 0.05 were considered DEGs. The
   “pheatmap,” “plotPCA,” and “ggplot2” packages were utilized to perform
   heatmap, PCA, and volcano plots, respectively.

2.3 Gene ontology and KEGG pathway

   The DEGs were analyzed using DAVID ([68]http://david.niaid.nih.gov) for
   Gene Ontology (GO), REACTOME, and Kyoto Encyclopedia of Genes and
   Genomes (KEGG), with statistical significance at P < 0.05.

2.4 Protein-protein interaction analysis

   STRING version 12 ([69]https://string-db.org/) was utilized to
   investigate interactions between DEGs using Protein-Protein Interaction
   (PPI) analysis. The database settings were configured with a required
   score set to medium confidence and a False Discovery Rate (FDR)
   stringency of 5%. The results of this PPI analysis were subsequently
   imported into Cytoscape v.3.10.1 to enable the construction of a
   network model, providing insights into the intricate interplay among
   these DEGs. The top ten hub DEGs Were identified using Cytohubba, a
   plug‑in for Cytoscape, for closeness, betweenness, and degree
   algorithms for both upregulated and downregulated DEGs.

2.5 Survival analysis and prognostic model construction

   To construct a Microenvironment-Prognostic Model (MPM), the initial
   step involved the execution of univariate Cox regression analysis with
   the R package “survival” (Version 3.8–3) to determine the associations
   between DEG expression levels and overall patient survival. DEGs with a
   significance level of P < 0.05 in univariate Cox regression were
   identified as predictive genes. Subsequently, the dataset was randomly
   partitioned into training and test groups to validate the model’s
   accuracy. The train set was utilized to construct MPM, while the
   testing set and the entire dataset were utilized to validate the
   prediction signature. The “glmnet” package (version 4.1–8) was used to
   perform Least absolute shrinkage and selection operator (LASSO)
   regression analysis (with a penalty parameter determined by 10-fold
   cross-validation) to narrow the risk of overfitting. Multivariate Cox
   regression analysis was used to generate the risk score (RS) for each
   AML patient, which is statistically equivalent to Σ (βi * Expi)
   (i = the number of prognostic hub genes).

   To assess the model’s accuracy comprehensively, an array of R packages,
   including “survival”, “caret,” “glmnet”, “rms”, “survminer”, and
   “timeROC” were employed. These packages facilitated the execution of
   various analyses, including Kaplan-Meier analysis and the generation of
   receiver operating characteristic (ROC) curves for 1, 3, and 5-year
   survival across the training, testing, and entire patient datasets.
   Additionally, the calculation of the area under the curve (AUC) was
   carried out, providing a valuable measure of the model’s predictive
   performance for training, testing, and entire patient datasets. A
   higher AUC value indicated enhanced predictability of the
   Microenvironment-Prognostic Model (MPM) under the ROC curve.

   To assess the predictive capacity of the MPM in comparison to the
   ESTIMATE algorithm and age, ROC curve analysis was carried out across
   1, 3, and 5-year intervals. Furthermore, external validation of the MPM
   was performed using AML patient data from the TARGET database,
   involving Kaplan-Meier analysis and ROC curves for 1, 3, and 5-year
   predictions to affirm the model’s predictive robustness. Subsequently,
   a nomogram model was established for forecasting survival years in AML
   patients by incorporating the risk score and various clinical features,
   such as age, FAB classification, and Cancer and Leukemia Group B
   (CALGB) stage, utilizing the “rms” and “survival” packages. The
   Consistency Index (C-index) was then computed to assess the model’s
   accuracy and provide insights into its reliability and effectiveness.
   Additionally, a comprehensive examination was conducted to explore the
   correlations between the MPM and various clinical factors, encompassing
   variables like age, FAB classification, and CALGB stage.

2.6 Statistical analysis

   All statistical analyses were performed in R software (version 4.4.2;
   Auckland, New Zealand, United States). The Kruskal–Wallis test was
   applied to evaluate differences among multiple groups, acknowledging
   the non-parametric nature of the data. For two-group comparisons, the
   Wilcoxon rank-sum test was employed. Statistical significance was
   defined as P < 0.05.

3 Results

3.1 ESTIMATE scores are associated with AML clinical parameters

   After excluding 17 AML patients with incomplete clinical information,
   134 patients remained ([70]Table 1). Of these patients, 76 (56.71%)
   were male, and 58 (43.28%) were female. The median age at initial
   pathological diagnosis was 58 years, ranging from 21 to 88 years. The
   fourteen subtypes of these patients were M0 undifferentiated (14,
   10.6%), M1 (30, 22.7%), M2 (32, 24.2%), M3 (14, 10.6%), M4 (27, 20.5%),
   M5 (12, 9.1%), M6 (2, 1.5%), and M7 (1, 0.8%); two patients were not
   classified. Subsequently, we determined the ESTIMATE scores for each
   patient using the ESTIMATE algorithm.

Table 1. Clinical characteristics of the TCGA AML cohort.

                        Number/ range   Percentage (%)
   Sex
    Male                76              56.72
    Female              58              43.28
   FAB classification
    M0                  14              10.45
    M1                  30              22.39
    M2                  32              23.88
    M3                  14              10.45
    M4                  27              20.15
    M5                  12              8.96
    M6                  2               1.49
    M7                  1               0.75
    Not classified      2               1.49
   CALGB category
    Favorable           29              21.64
    Intermediate        76              56.72
    Poor                27              20.15
    NA                  2               1.49
   Age
    < 60 years          73              54.48
    > 60 years          61              45.52
   Continuous variables Range           Median
   Stromal Score        −1582.2 - 425.9 −937.4
   Immune Score         1243–3669       2462
   ESTIMATE Score       −218.8 - 4094.9 1500.7
   Age                  21–88           58
   [71]Open in a new tab

   In order to evaluate the association of ESTIMATE scores with AML
   cytogenetic risk, we classified the cytogenetic risk of AML patients as
   favorable, intermediate/normal, or poor and plotted the distribution of
   ESTIMATE scores concerning the level of cytogenetic risk; however, the
   result was not significant (p-value = 0.16; [72]Fig 1-A). On the other
   hand, ESTIMATE scores were significantly associated with the FAB
   classification (p-value = 9.7e-08; [73]Fig 1-B). Moreover, the AML
   patients were divided into high- and low-score groups to investigate
   the potential relationship between ESTIMATE scores and overall
   survival. Patients with low ESTIMATE scores had a longer median overall
   survival than those with high ESTIMATE scores (p-value = 0.011; [74]Fig
   1-C).

Fig 1. Association of ESTIMATE scores with AML clinical features.

   [75]Fig 1
   [76]Open in a new tab

   A, The correlation between ESTIMATE scores and AML cytogenetic risk
   (P = 0.16). B, Distribution of ESTIMATE scores for AML subtypes
   (p-value = 9.7e-08). C, Kaplan-Meier survival curve reveals that higher
   ESTIMATE scores are associated with significantly shorter overall
   survival (log-rank test, p-value = 0.011).

3.2 Identification of differentially expressed genes (DEGs) based on Estimate
scores in AML

   We evaluated the RNA-Seq data of the patients to examine the
   relationship between gene expression profiles and ESTIMATE scores.
   Using the cut-off criteria of p-value = 0.05 and |log2 fold
   change| > 1.5, 2134 DEGs (1380 commonly upregulated genes and 754
   commonly downregulated genes) were found based on ESTIMATE scores
   ([77]Fig 2-A). Moreover, the principal component analysis (PCA) was
   performed to assess the relation between ESTIMATE scores and FAB
   classification ([78]Fig 2-B). The DEGs of the low versus high ESTIMATE
   score groups are depicted in [79]Fig 2-[80]C’s heatmap ([81]Fig 2-C).
   The focus of our subsequent analysis was on these common DEGs.

Fig 2. Identification of DEGs based on ESTIMATE scores.

   [82]Fig 2
   [83]Open in a new tab

   A, Volcano plot of DEGs from the low vs. high stromal score groups.
   Genes with p < 0.05 are shown in red (fold change > 1.5) and blue (fold
   change <−1.5). Grey plots represent the remaining genes (those with no
   significant difference). B, PCA plot of TCGA data based on ESTIMATE
   scores and FAB classification. C, Heatmap of top-20 upregulated-DEGs
   and top-20 downregulated-DEGs for the ESTIMATE score groups.

3.3 Gene ontology

   Gene ontology (GO), KEGG, and REACTOME pathway analyses were used to
   investigate the biological processes and pathways involved. Using the
   DAVID gene annotation tool, the DEGs were analyzed for three
   sub-ontologies, as shown in [84]Fig 3-A: biological processes (BP),
   cellular components (CC), and molecular function (MF). Regarding BP,
   DEGs were most enriched in neutrophil degranulation, inflammatory
   response, immune response, signal transduction, and cytokine-mediated
   signaling pathways. KEGG pathway enrichment and interrelationship
   showed that the DEGs involved the cytokine-cytokine receptor
   interaction, phagosome, tuberculosis, and osteoclast differentiation
   ([85]Fig 3-B). REACTOME pathway analysis revealed that the top pathways
   related to DEGs were the immune System, neutrophil degranulation,
   innate immune System, immunoregulatory interactions between a lymphoid
   and a non-lymphoid cell, toll-like receptor cascades, and cytokine
   signaling in the immune system ([86]Fig 3-C).

Fig 3. GO term enrichment analysis of common DEGs.

   [87]Fig 3
   [88]Open in a new tab

   A, the top 30 significantly enriched GO terms, including three
   sub-ontologies, biological process, molecular function, and cellular
   component, are shown. B, Interrelation analysis of KEGG and REACTOME
   pathways of common DEGs.

3.4 Protein-protein interaction (PPI) network construction and functional
enrichment of genes of prognostic value

   We made a PPI network using the STRING online database and Cytoscape
   software to investigate the interactions between upregulated and
   downregulated DEGs. The supplementary shows that the network of
   upregulated DEGs contains 1366 nodes and 17147 edges, and the network
   of downregulated DEGs contains 739 nodes and 1238 edges. The STRING
   data were then further analyzed using Cytoscape, and closeness,
   betweenness, and degree were identified for upregulated DEGs ([89]Fig
   4A-C) and downregulated DEGs ([90]Fig 4D-F) using cytoHubba.

Fig 4. The PPI network consists of the top 10 hub upregulated and
downregulated DEGs according to the cytoHubba analysis.

   [91]Fig 4
   [92]Open in a new tab

   The algorithms are: A, betweenness of top 10 upregulated-DEGs; B,
   closeness of top 10 upregulated-DEGs; C, Degree of top 10
   upregulated-DEGs. D, betweenness of top 10 downregulated-DEGs; E,
   closeness of top 10 downregulated-DEGs; F, Degree of top 10
   downregulated-DEGs. The red indicates a higher score, and the yellow
   indicates a lower score.

3.5 Microenvironment prognostic model establishment

   In order to construct a microenvironment prognostic model (MPM),
   initially, we performed a univariate Cox regression analysis on the
   DEGs. Of 2134 microenvironment-related genes, 733 were prognostic. The
   LASSO regression was performed to avoid overfitting, and 24 genes were
   selected for further analysis. The multi-cox proportional hazard test
   revealed that eight genes were strongly associated with the overall
   survival of AML patients ([93]Figs 5A-C). The expression levels of
   these eight genes and their respective coefficients derived from the
   multi-Cox proportional hazard test were used to calculate
   individual-level risk scores for each patient. The following formula
   was used for calculating each patient’s risk score: risk score = ITPR2
   × (−2.695558) + LYN × (1.762128) + RGMA × (−0.657528) + GZMB ×
   (0.783182) + RAB9B × (0.880839) + CXCL12 × (−0.219737) + RUFY4 ×
   (0.540176) + TRIM16 × (1.605168). Examination of the risk factors
   linked to these eight genes revealed a positive relation between
   increased gene expression and a heightened risk of mortality ([94]Fig
   5D, [95]5E).

Fig 5. Establishment of MPM.

   [96]Fig 5
   [97]Open in a new tab

   A, LASSO coefficient profiles of the prognostic DEGs. B, Ten-fold
   cross-validation for tuning parameter selection in the LASSO model. The
   partial likelihood deviance is plotted against log (λ), where λ is the
   tuning parameter. Partial likelihood deviance values are shown, with
   error bars representing SE. The dotted vertical lines are drawn at the
   optimal values by minimum criteria and 1-SE criteria. C, Forest plot of
   hazard ratios for 8 prognostic DEGs. D, Distributions of risk score and
   overall survival status according to risk score increment. E,
   Expression profile of signature genes in high and low risk score
   groups.

   We obtained the Risk Score for all patients and then classified them as
   low or high risk based on the median. A Kaplan-Meier survival analysis
   of test, train, and whole data showed that the high-risk group had a
   considerably lower survival rate than the low-risk group
   (p-value < 0.0001, p-value = 0.00041, and p-value < 0.0001,
   respectively; [98]Fig 6A-C). The ROC curve was constructed to test the
   model’s accuracy in test, train, and entire data ([99]Fig 6D-F).
   Especially, the AUC of 1, 3, and 5-year survival for entire data were
   84.05%, 85.73%, and 89.54%, respectively, which indicate the robust
   predictive power of our prognostic model across different timeframes.

Fig 6. Evaluating the Prognostic Efficacy of MPM in AML.

   [100]Fig 6
   [101]Open in a new tab

   A-C, Kaplan–Meier analysis substantiates the robust prognostic
   relevance of MPM within the training, test, and overall patient
   cohorts, exhibiting statistical significance (p-values
   <0.0001, = 0.0004, < 0.0001, respectively). D-F, Time-dependent ROC
   curves elucidate the precision of MPM in forecasting 1-, 3-, and 5-year
   Overall Survival rates among patients within the TCGA dataset. G,
   External validation using TARGET data corroborates MPM’s significant
   relationship with AML prognosis. H, Time-dependent ROC curves further
   highlight the MPM’s competence in predicting 1-, 3-, and 5-year OS
   rates within the TARGET AML patient population.

   Crucially, it is worth emphasizing that our prognostic model
   demonstrated superior predictive accuracy compared to both the ESTIMATE
   algorithm and age. When evaluating ROC curves for 1, 3, and 5-year
   survival, the AUC values for the ESTIMATE algorithm were 64.6%, 63.2%,
   and 71.2%, respectively, while age yielded AUC values of 68.8%, 72.8%,
   and 79.3%. This contrast underscores the enhanced predictive capability
   of our model in foreseeing AML patient outcomes ([102]S1A, [103]S1B
   Fig).

   Furthermore, we applied TARGET data comprising 312 AML patients to
   conduct external validation of the prognostic model. Our analysis,
   which included Kaplan-Meier survival curves, revealed a statistically
   significant relationship between the risk score and patient survival
   within this dataset (p-value = 0.025; [104]Fig 6G, [105]6H).
   Additionally, we generated ROC curves to assess the model’s validity
   across 1, 3, and 5-year intervals, yielding respective AUC values of
   60.5%, 56.7%, and 55.7%, respectively.

3.6 Nomogram model construction

   We established and meticulously validated a predictive nomogram
   tailored for predicting outcomes in AML patients. This nomogram,
   presented in [106]Fig 7A, integrates our microenvironment prognostic
   model, patient age, FAB classification, and CALGB category, providing
   risk assessments for patients at 1, 3, and 5-year intervals. Its
   development aimed to facilitate personalized risk evaluation and inform
   clinical decision-making. To gauge its performance in distinguishing
   patients who experienced the targeted clinical event from those who
   did, we employed the concordance index (C-index) ([107]Fig 7B).

Fig 7. Nomogram Development for Survival Prediction in AML.

   [108]Fig 7
   [109]Open in a new tab

   A, Nomogram displaying the predictive factors, including RSG, age, FAB
   classification, and CALGB category, with survival probabilities for 1,
   3, and 5 years. B, CI illustrating the comparison between the
   nomogram-predicted overall survival probability and the actual overall
   survival probability.

3.7 MPM’s Prognostic Accuracy in AML Clinical Context

   Our analysis has uncovered a robust connection between the MPM and
   critical clinical parameters in AML patients, which include age, FAB
   classification, and cytogenetic status. These findings underscore the
   potential value of the MPM in customizing treatment approaches and
   enhancing patient care. The MPM has demonstrated significant
   associations with AML patient age, effectively distinguishing between
   patients below and above the age of 60 (p-value = 1.1e-05, [110]Fig
   8A). Furthermore, it exhibits a substantial correlation with the FAB
   (French-American-British) classification system, allowing for
   subtype-specific survival predictions (p-value = 6.1e-06, [111]Fig 8B).
   Additionally, as illustrated in [112]Fig 8C, the MPM has revealed
   notable relationships with cytogenetic status as characterized by CALGB
   criteria, enabling precise risk stratification for patients with
   favorable, intermediate, and poor cytogenetic profiles
   (p-value = 2.7e-07).

Fig 8. Validation of the MPM in Clinical Characteristics of AML Patients.

   [113]Fig 8
   [114]Open in a new tab

   A, MPM significantly correlates with AML patient age, distinguishing
   those under and above 60 years. (p-value = 1.1e-05). B, MPM shows a
   significant relationship with FAB classification, aiding in
   subtype-specific survival predictions (P = 6.1e-06). C, MPM is notably
   associated with cytogenetic status (CALGB criteria), offering precise
   risk stratification for favorable, intermediate, and poor cytogenetic
   patients (p-value = 2.7e-07).

4 Discussion

   AML represents the most common acute leukemia type in adults,
   characterized by a high mortality rate and variable outcomes [[115]36].
   Despite ongoing advancements in identifying new drugs and therapeutic
   targets, the five-year survival rate remains disappointingly low
   [[116]37,[117]38]. Accurately determining prognosis at diagnosis is
   crucial for improving overall survival rates in AML patients
   [[118]39,[119]40]. AML prognosis is influenced by several factors,
   including genetic abnormalities; however, the role of the bone marrow
   microenvironment has garnered significant attention in recent years.
   The bone marrow serves as the primary site for leukemia’s onset and
   progression, where stromal and immune cells within this
   microenvironment are pivotal in the proliferation, survival, and drug
   resistance of leukemic cells [[120]41,[121]42]. Our study delves into
   the intricate relationships between specific TME cell populations and
   AML prognosis, pioneering a MPM rooted in the ESTIMATE score to
   heighten the precision of prognostic biomarkers for AML. This
   exploration aims to uncover how the TME dictates the destiny of AML,
   paving the way for innovative therapeutic targets.

   Using the ESTIMATE algorithm, we calculated the purity of
   microenvironment cells in the TME of AML patients. By comparing
   ESTIMATE scores among patients, which were highly correlated with FAB
   classification and overall survival, we classified our patients into
   two subgroups: high and low ESTIMATE scores. Subsequently, we
   identified differential expression genes (DEGs). Pathway enrichment
   analysis of TME-related DEGs revealed that these genes are remarkably
   associated with immune response, inflammatory response, and innate
   immune response pathways. Inflammation-related genes are attendant to
   AML progression and chemoresistance; therefore, the inflammatory
   response is known as a prognostic factor in these patients [[122]43].
   Also, there has been a growing number of studies involving
   immune-related processes as a prognostic factor in AML [[123]44].

   In our analysis, survival studies on DEGs facilitated the creation of
   the MPM. Notably, the overexpression of eight genes (CXCL12, GZMB,
   ITPR2, LYN, RAB9B, RGMB, RUFY4, TRIM16) correlated strongly with
   patient survival. Among these, the chemokine CXCL12 and its receptor
   CXCR4 are crucial for mediating interactions between leukemia cells and
   their microenvironment, promoting cell migration and survival, thereby
   contributing to chemotherapy resistance [[124]45]. AML cells often
   express the CXCR4 receptor, which is the receptor for CXCL12. The
   interaction between CXCL12 and CXCR4 promotes the migration and homing
   of AML cells to the bone marrow microenvironment, where they can
   receive support and protection from the surrounding stromal cells. This
   interaction also contributes to the resistance of AML cells to
   chemotherapy, as CXCL12 signaling can activate survival pathways in the
   leukemic cells. Elevated CXCL12 levels in the bone marrow are
   invariably associated with adverse AML patient outcomes
   [[125]46,[126]47]. Granzyme B (GzmB), a serine protease from cytotoxic
   lymphocytes, targets and destroys virus-infected or malignant cells
   [[127]48]. In AML, however, the levels of GZMB are notably diminished,
   undermining its role as a crucial cytotoxic mediator for T and NK cells
   in combatting cancer cells [[128]49]. The ITPR2 gene encodes for the
   Inositol 1,4,5-trisphosphate receptor type 2 (IP3R2) protein, a
   critical calcium channel that modulates intracellular signaling
   pathways. Although initial studies suggest a prognostic role for ITPR2,
   its implications in AML remain underexplored [[129]50,[130]51]. LYN, a
   tyrosine kinase, is integral to internal signaling processes and is
   crucial for the differentiation and persistence of the leukemic
   phenotype across various blood cancers including AML, CML, and B-cell
   lymphocytic leukemia [[131]52]. Research has indicated a significant
   association between LYN activity and overall survival in AML [[132]53].
   RAB9B, belonging to the RAB family of GTPases, is essential for vesicle
   trafficking and protein transport within cells. While there is no study
   on RGMB in AML, Jiang et.al found that RAB9B is overexpressed in
   colorectal cancer and promotes tumor growth and metastasis [[133]54].
   RGMB, a member of the repulsive guidance molecule family, is known for
   its role in neuronal development and axon guidance. RGMB is
   overexpressed in glioblastoma and promotes tumor growth and invasion.
   in terms of tumor progression, RGMb can inhibit cell proliferation, and
   invasion, in NSCLC, breast cancer, liver cancer, squamous cell
   carcinoma, nasopharyngeal carcinoma and other related tumors, so as to
   inhibit tumor progression and even improving survival ratio [[134]55].
   RUFY4 (RUN and FYVE domain-containing protein 4) is a protein that
   plays a role in intracellular membrane trafficking and cytoskeletal
   organization [[135]56]. the role of RUFY4 is not well understood in
   AML. However, studies have shown its involvement in regulating
   autophagy, a cellular process that plays a role in cancer development
   and progression [[136]57]. TRIM16 (Tripartite Motif Containing 16), is
   a member of the tripartite motif (TRIM) family of proteins [[137]58].
   While there is limited research on TRIM16 in AML, studies have shown
   its involvement in other types of cancers and cellular processes.
   Previous studies demonstrated that TRIM16 is involved in regulating the
   NF-κB signaling pathway, which plays a crucial role in cancer
   development and progression [[138]59].

   We developed an integrated prognostic model combining univariate Cox
   regression, LASSO regression, and multivariate Cox regression,
   augmented by Kaplan-Meier survival curves and nomograms. This model,
   leveraging genes such as CXCL12, GZMB, ITPR2, LYN, RAB9B, RGMB, RUFY4,
   and TRIM16, was designed to forecast outcomes for AML patients. The
   efficacy of this model was corroborated through ROC curve analysis
   across training, testing, and overall patient groups. It demonstrated a
   notably high AUC in the TARGET dataset, and the Kaplan-Meier analysis
   also yielded significant results. The findings revealed that patients
   in the high-risk category exhibited markedly lower survival rates
   compared to those in the low-risk group. Moreover, the model’s
   prognostic relevance was significantly associated with patient age, the
   FAB classification, and the CALGB classification.

   Nonetheless, this study presents certain limitations. First, the
   prognostic model was constructed from retrospective data, necessitating
   prospective multi-institutional cohort studies to validate its clinical
   applicability and reproducibility across heterogeneous populations.
   While external validation was performed using the TARGET database,
   broader evaluation across geographically and ethnically diverse cohorts
   is essential to confirm the model’s generalizability, particularly
   given potential variability in tumor microenvironment composition
   influenced by genetic ancestry or regional therapeutic practices.
   Second, the reliance on bulk RNA sequencing and computational
   deconvolution algorithms such as ESTIMATE introduces inherent technical
   constraints, including limited resolution to dissect cellular
   heterogeneity and potential batch effects across sequencing platforms.
   Although emerging single-cell transcriptomic and spatially resolved
   omics technologies could address these limitations by providing
   high-resolution microenvironmental mapping, their integration into
   clinical workflows remains constrained by infrastructural requirements,
   technical expertise, and standardization challenges—critical barriers
   for resource-limited settings. Third, while the model demonstrates
   robust prognostic stratification, its translational utility requires
   rigorous validation through prospective clinical trials assessing its
   capacity to guide therapeutic decisions, particularly in the context of
   emerging targeted therapies against TME components. Despite these
   considerations, our predictive framework provides a promising
   foundation for identifying novel therapeutic targets, ultimately
   informing more robust diagnostic and treatment paradigms for AML.

5 Conclusion

   Our study establishes a Tumor Microenvironment-derived Prognostic Model
   (MPM) that integrates eight TME-associated genes (CXCL12, GZMB, ITPR2,
   LYN, RAB9B, RGMB, RUFY4, TRIM16) to stratify AML patients into distinct
   risk categories with significant survival differences. Unlike existing
   models that focus on genetic mutations or immunogenic cell death
   markers, the MPM uniquely leverages stromal and immune infiltration
   metrics derived from the ESTIMATE algorithm, capturing dynamic
   interactions between leukemic cells and their protective niche. The
   model demonstrated robust prognostic accuracy across two independent
   cohorts (TCGA and TARGET), outperforming conventional parameters such
   as age and cytogenetic risk. Clinically, the MPM offers a actionable
   framework for risk-adapted therapy: high-risk patients could be
   prioritized for intensive regimens or novel TME-targeted therapies
   (e.g., CXCR4 inhibitors), while low-risk patients might benefit from
   reduced-intensity protocols to minimize toxicity. Future steps include
   prospective validation in multicenter trials to assess its utility in
   guiding real-time therapeutic decisions and integration into digital
   platforms for rapid risk scoring, thereby bridging the gap between
   computational biology and bedside practice.

Supporting information

   S1 Fig. Comparative ROC Curve analysis for 1, 3, and 5-year prognostic
   accuracy.

   A, ESTIMATE algorithm; B, patient age in AML cohort.

   (PDF)
   [139]pone.0325145.s001.pdf^ (6.8KB, pdf)

Acknowledgments