Abstract Hepatocellular carcinoma (HCC) is a prevalent malignancy influenced by the interplay between the immune system and tumor progression, but the detailed biological mechanism still elusive. To address this, we integrate single-cell RNA sequencing (scRNAseq) data with bulk sequencing data to investigate the prognostic significance of tumor-associated macrophages (TAMs) signatures in HCC. Utilizing bioinformatics approaches, including differential gene expression analysis, Cox regression, and logistic regression modeling, we constructed a robust prognostic model that effectively stratifies HCC patients into distinct risk groups with significant differences in survival outcomes. Applying our model to multiple HCC cohorts, robust predictive and prognostic performances were observed. Moreover, examination of the tumor microenvironment (TME) revealed distinct patterns of immune cell infiltration between high-risk and low-risk patient groups, which may contribute to the poorer outcomes observed in high-risk patients. Finally, drug sensitivity and AutoDock simulations suggest that the signature genes we identified could be potential targets for HCC therapy. In summary, this study provides novel insights into the HCC tumor microenvironment and its interaction with TAMs, offering a prognostic model with potential for improving patient stratification and guiding the development of novel therapeutic approaches. Future research ought to concentrate on confirming our findings in larger, prospective studies and examining the functional implications of TAMs in HCC progression. Introduction Liver cancer is the 6th largest primary cancer site and the 4th largest cause of cancer deaths. Hepatocellular carcinoma (HCC), which makes up about 90% of liver cancers, is a prevalent malignancy whose progression is significantly shaped by immune system interactions [[28]1]. HCC typically arises in the context of chronic inflammatory liver disease and is strongly associated with risk factors including HBV, HCV, heavy alcohol consumption, and conditions like obesity, type 2 diabetes, and non-alcoholic fatty liver disease (NAFLD). These factors contribute to liver damage and increase the likelihood of HCC development. Curative treatments, including surgical excision, liver transplant, and radiofrequency ablation, can be used in early-stage disease. However, most patients present with advanced HCC, for which systemic therapies, such as tyrosine kinase inhibitors and immune checkpoint inhibitors, offer only limited survival benefits. Despite advances in therapeutic strategies, the prognosis of patients with advanced HCC remains dismal. In recent years, substantial efforts have been devoted to identifying reliable prognostic biomarkers to predict the HCC clinical outcomes and to guide personalized treatment strategies. Prognostic biomarkers for HCC encompass a wide range of molecular features, including genetic mutations, gene expression signatures, and epigenetic alterations. For example, TERT promoter mutations, one of the most frequent genetic mutations in HCC, have been strongly associated with malignant transformation, poorer prognosis, and a heightened risk of recurrence [[29]2]. Kornelius Schulze and his colleagues indicated that TP53 mutations are another common marker of HCC, whose mutational status are often linked to adverse outcomes and uncontrolled growth regulation of tumor cells [[30]3]. Additionally, the over-expression of immune checkpoint genes, such as PD-L1, CTLA-4, LAG3, and TIM3, has been frequently correlated with unfavorable prognosis in HCC patients, with high levels of PD-L1 expression particularly indicative of a diminished response to immunotherapy [[31]4]. Overexpression of glypican-3 (GPC3) has also emerged as a hallmark of aggressive tumor phenotypes, correlating with shortened overall and disease-free survival, and representing a promising therapeutic target in HCC [[32]5]. Recent studies have also identified PIAS family genes as prognostic indicators, whose expression levels are associated with immune modulation and sensitivity to chemotherapeutic agents in HCC [[33]6]. In the context of epigenetic alterations, DNA hypermethylation of tumor suppressor genes (e.g., GSTP1, RASSF1A, and SOCS1), have been widely reported in HCC tissues and are associated with more aggressive tumor behavior and lower survival rates [[34]7]. In parallel, the downregulation of microRNAs (miRNAs), such as miR-122 and miR-26, has been implicated in promoting tumor progression and poor patient survival [[35]8]. Moreover, Ma et al. demonstrated that IGF2 BP3 promotes HCC progression by modulating the phenotypes of macrophages and CD8 ⁺ T cells within the tumor microenvironment, further illustrating the interplay between oncogenic signaling and immune regulation in HCC [[36]9]. Despite these advances, the predictive accuracy and clinical applicability of these biomarkers remain suboptimal, highlighting the need for more integrative and mechanistically informed approaches to improve prognostication and guide therapy selection in HCC. Increasing studies demonstrated that tumor immune microenvironment (TIME), comprising tumor, immune, and inflammatory cells, plays a pivotal role in HCC pathogenesis [[37]10,[38]11]. Among them, tumor associated macrophages (TAMs) are central regulators of tumor progression, angiogenesis, and immune suppression. TAMs exhibit remarkable functional heterogeneity, and can be sculpted into divergent functional phenotypes, M1 and M2, each with its own distinct attributes and functions. M1 TAMs are linked to antitumor immunity, unleashing inflammatory cytokines and reactive oxygen species that are cytotoxic to tumor cells. In stark contrast, M2 TAMs foster tumor growth and progression by secreting growth factors, cytokines, and chemokines that suppress the immune system and promote angiogenesis and tissue remodeling. In HCC, the TIME is often skewed toward the M2 phenotype, facilitating immune evasion and tumor progression. Thus, unraveling the mechanisms that govern TAM polarization and function is essential for devising effective therapeutic strategies to reprogram these cells, thereby enhancing anti-tumor immunity and improving patient outcomes. Additionally, recent investigations have underscored the potential of TAMs as prognostic biomarkers in HCC [[39]12]. Their abundance and polarization state within the tumor microenvironment have been shown to correlate with clinical outcomes, such as tumor stage, recurrence rates, and patient overall survival. Consequently, the exploration of TAMs in HCC is indispensable for comprehending their impact on cancer development, immune suppression, and for identifying potential therapeutic targets and prognostic biomarkers. By targeting and modulating TAMs, it may be able to enhance anti-tumor immunity, improve therapeutic outcomes, and ultimately prolong HCC patients’ survival. In this study, we integrated scRNA-seq data with bulk sequencing data (TCGA-LIHC) to identify TAM-associated genes for predicting HCC patient’s prognosis. Using these genes as inputs, we developed a prognostic prediction model for HCC patients. When applied to independent external validation cohorts, the model demonstrated robust prognostic and predictive performance. Moreover, notable disparities in TME and drug susceptibility were also noted between the low-risk and high-risk patient groups, underscoring the broader implications of our findings. Materials and methods Acquisition and processing of data To investigate the relationships between TAMs and the clinical efficacy of immunotherapy, we performed a comprehensive analysis of the TME. The scRNA-seq transcriptome profiles ([40]GSE125449) of liver cancer patients undergoing immunotherapy were retrieved from the Gene Expression Omnibus (GEO) database. Next, data preprocessing was performed using the Seurat (v4.3.0) R package, which included cell and gene filtering, data normalization, principal component analysis (PCA), and t-distributed stochastic neighbor embedding (t-SNE). The criteria for quality control were similar to those described by the original data contributors. Cells with more than 10% mitochondrial gene expression were excluded, while those expressing over 500 genes and with genes present in more than 5 cells were retained. To account for potential batch effects among different single-cell samples, the Harmony algorithm was applied after PCA for batch effect correction. Harmony integration was conducted using the RunHarmony function within Seurat package to ensure that biological variation, rather than technical artifacts, guided the downstream clustering and differential analysis. To identify genes with high levels of variability for downstream analyses, scRNA-seq data were normalized. High-quality scRNA-seq data were normalized to identify highly variable genes for downstream analyses. Next, we performed PCA to detect the top meaningful principal components (PCs), and the top 20 PCs were clustered using the t-SNE algorithm. In parallel, the bulk RNA-seq data for HCC patients (n = 424) was retrieved from the TCGA database using the TCGAbiolinks package [[41]13]. Raw count data were converted into TPM format. Corresponding clinical information, including survival data, follow-up duration, and cancer stage, was retrieved from the TCGA. Patients without complete survival or TNM staging data were excluded, resulting in a final cohort of 424 hepatocellular carcinoma (HCC) patients for subsequent analyses to develop the predictive model. To evaluate the robustness of our predictive model, we downloaded six external validation datasets ([42]GSE76427, [43]GSE15654, [44]GSE10141, [45]GSE112790, [46]GSE174570, and [47]GSE228782) from GEO database. To correct for batch effects across these external bulk transcriptome datasets, we used the Combat function from the sva R package (version 3.54.0), which employs an empirical Bayes framework to adjust for non-biological variation between batches. This step ensured consistency in downstream differential expression and prognostic analyses across datasets of different origins. Comprehensive details for the datasets utilized in this analysis are presented in [48]Table 1. Table 1. List of datasets used in this study. Datasets Name Source No. Of Sample Sequencing Type PMID [49]GSE125449 GEO 19 Single-cell RNAseq 39932456 [50]GSE76427 GEO 167 Bulk RNAseq 29117471 [51]GSE15654 GEO 216 Bulk RNAseq 23333348 [52]GSE10141 GEO 80 Bulk RNAseq 18923165 [53]GSE112790 GEO 198 Bulk RNAseq 30598371 [54]GSE174570 GEO 114 Bulk RNAseq 35302601 [55]GSE228782 GEO 83 Bulk RNAseq 38072306 TCGA-LIHC TCGA 424 Bulk RNAseq – [56]Open in a new tab Furthermore, TAMs-associated marker gene sets were obtained from the study by Jiang et al. Additionally, 50 hallmark gene sets were retrieved from the MSigDB using the “h.all.v2023.1.Hs.symbols.gmt” file. KEGG pathway sets were acquired from the “c2.cp.kegg.v7.4.symbols.gmt” file for subsequent gene set enrichment analysis (GSEA). Identification of differentially expressed genes (DEGs) related to immunotherapy response and TAMs The AUCell algorithm was used to calculate TAMs scores for individual cells in the [57]GSE125449 dataset. To explore the association between TAMs and immunotherapy efficacy, the “FindAllMarkers” function was employed to detect DEGs between immunotherapy responders and non-responders, as well as between high- and low-TAM-score samples. A Venn diagram was then utilized to identify the intersection of DEGs associated with both immunotherapy response and TAMs, which were defined as candidate DEGs. Screening prognostically relevant genes in the TCGA-LIHC cohort To further identify prognostically relevant genes among the candidate DEGs, we utilized the TCGA-LIHC dataset, which includes comprehensive survival information, as the HCC datasets from the GEO database lack such data. Univariate Cox regression analysis was initially conducted to determine DEGs significantly linked to patient prognosis. Subsequently, stepwise Cox regression analysis was then applied to refine this list and identify the optimal combination of prognostic genes capable of most effectively predicting patient survival. This approach aimed at enhancing the robustness and clinical relevance of the final prognostic gene set, providing a foundation for constructing predictive models with high accuracy and interpretability. Construction of a prognostic prediction model in TCGA-LIHC cohort To evaluate the prognostic predictive capability of the optimal combination of prognostic genes in HCC patients, we constructed a logistic regression-based prognostic model using the TCGA-LIHC dataset. The predictive score formula was calculated by incorporating gene expression levels and their corresponding logistic regression coefficients. Predictive Score = (Exp[1]*Coef[1]) + (Exp[2]*Coef[2]) +...+ (Exp[n]*Coef[n]). Patients were divided into high-risk and low-risk groups according to their predictive scores. Overall survival (OS) differences between the groups were analyzed using Kaplan-Meier survival curves and log-rank tests with the “survival” R package. To further assess the model’s predictive performance, time-dependent ROC curves were generated, and AUC values were determined using the “timeROC” R package. To assess the robustness and generalizability of the predictive model, we tested its performance on external validation datasets. By applying the same predictive scoring formula, patients were similarly classified into low-risk and high-risk groups. Kaplan-Meier analysis, log-rank tests, and time-dependent ROC curve analysis were repeated to confirm the model’s stability and consistency in predicting survival outcomes across independent datasets. This comprehensive validation ensured that the model retained high predictive accuracy across diverse patient populations. Tumor microenvironment and drug sensitivity analysis To elucidate the potential mechanisms underlying the prognostic value of the model, we performed immune infiltration analysis using the CIBERSORT [[58]14] and ssGSEA [[59]15] algorithms. Specifically, CIBERSORT estimated the relative proportions of 22 distinct immune cell types within the TME of HCC patients. Concurrently, the ssGSEA algorithm was applied to evaluate the infiltration of 24 immune cell types. Both CIBERSORT and ssGSEA analyses were performed using the exp2cell function from the SMDIC R package, with 100 iterations of permutation applied to ensure robustness of the immune cell infiltration estimates. Subsequently, the relationship between the expression of prognosis-related gene and the abundance of immune cell types was assessed using person correlation analysis. Additionally, by utilizing the Genomics of Drug Sensitivity in Cancer (GDSC) database, we predicted the sensitivity of HCC to various chemotherapeutic agents based on the expression levels of genes by oncoPredict algorithm. This analysis aimed to provide insights into whether the signature genes could be used as potential targets for HCC therapy. Statistical Analysis All data processing and statistical analyses were conducted using R software (version 4.0.2). For the purpose of comparing continuous variables between two groups with a normal distribution, the t-test was utilized, otherwise the Mann-Whitney U test was used. The Chi-square test or Fisher’s exact test were applied for analyzing categorical variables. Furthermore, survival analysis was conducted using the R package “survival”, accompanied by the plotting of Kaplan-Meier curves to visually depict survival between distinct groups, along with the application of the log-rank test for the purpose of evaluating statistical significance. P < 0.05 was considered as statistical significance. Results In the tumor microenvironment, macrophages play a complex role. They are not only capable of clearing pathogens and dead cells through phagocytosis but also regulate the immune response by secreting cytokines. A growing corpus of research findings has demonstrated a close correlation between the expression patterns of specific genes belonging to the macrophages and the efficacy of cancer immunotherapy, as well as the development and progression of HCC. However, the underlying biological mechanisms remain poorly understood. To this end, the present study adopted bioinformatics approaches for identification of TAM-associated features relevant to immunotherapy and used them to develop a prognostic prediction model for HCC. The detailed workflow is illustrated in [60]Fig 1. Specifically, we first obtained single-cell sequencing data of HCC patients treated with immunotherapy from the GEO database. By performing differential expression analysis between pre- and post-treatment patients, along with groups with high and low TAM levels, we identified a set of DEGs linked to TAMs and immunotherapy. Based on these genes’ expression, we conducted univariate Cox regression analysis followed by stepwise Cox regression to identify the optimal gene list related with HCC prognosis. Finally, we applied logistic regression to construct a predictive model for HCC prognosis, which effectively stratifies patient outcomes. Furthermore, the present investigation sought to elucidate the functional contributions of these genes within the complex interplay of the TME, particularly regarding their potential influence on immune evasion strategies and the response to therapeutic interventions. Through these analyses, our objective is to provide novel targets and strategies for improving prognosis in HCC. Fig 1. The flowchart of this study. [61]Fig 1 [62]Open in a new tab Decoding the impact of tumor-associated macrophages on immunotherapy We commenced our analysis by evaluating the association between tumor-associated macrophages and the effectiveness of immunotherapy in the [63]GSE125449 HCC dataset. Utilizing the AUCell algorithm, we determined TAMs scores for individual cells ([64]Fig 2A–[65]B). The results indicated that TAMs scores were notably elevated in monocytes, macrophages, and tumor cells, with monocytes and macrophages showing significantly higher scores than other cell types (with P values < 0.05, [66]Fig 2C–[67]D). This suggests a potentially pivotal role for TAMs in hepatocellular carcinoma. Subsequently, we examined the variation in TAM scores between pre- and post-immunotherapy samples. Strikingly, there was a marked increase in TAM scores among patients who received immunotherapy (P value = 0.0016, [68]Fig 2E). This observation points to a likely connection between TAMs and the immunotherapy response, suggesting that TAMs may contribute to the dynamics of the immunotherapy process. Fig 2. Calculating the TAMs-associated score. [69]Fig 2 [70]Open in a new tab (A) The number of genes detected by cell; (B) The AUC histogram obtained from AUCell algorithm; (C) t-SNE analysis of cell types; (D) Distribution of TAMs score in different cell types; (E) Distribution of TAMs score between HCC patients receiving and not receiving immunotherapy. Previous study has demonstrated that the polarization states of TAMs, specifically the M1 and M2 phenotypes, play a pivotal role in tumor progression and therapeutic response [[71]16–[72]18]. Classically activated M1 macrophages are generally characterized by their pro-inflammatory and tumoricidal properties, including the secretion of cytokines such as IL-12 and TNF-α, which promote antigen presentation and enhance cytotoxic T cell activation. In contrast, alternatively activated M2 macrophages—commonly regarded as the predominant phenotype of TAMs in many solid tumors—are associated with tumor-promoting functions. Given the functional heterogeneity of TAMs, we sought to further investigate their polarization dynamics in the context of cancer immunotherapy. Specifically, we employed the AddModuleScore function from the Seurat package to quantify M1 and M2 polarization states based on previously validated marker gene sets. Our analysis revealed a transcriptional bias toward M2-like macrophage polarization in patients who had not undergone immunotherapy, indicative of a more immunosuppressive TME. In contrast, patients treated with immune checkpoint blockade (ICB) exhibited a more balanced or even M1-skewed TAM profile, suggesting that immunotherapy may reshape macrophage polarization toward a more pro-inflammatory and immunostimulatory state (S1 Fig). These findings underscore the potential of TAM polarization status not only as a biomarker for immunotherapeutic responsiveness but also as a targetable axis for enhancing treatment efficacy. To delve into the underlying biological mechanisms, we initially conducted a differential gene expression analysis of samples before and after immunotherapy, pinpointing genes that exhibited significant changes in expression levels. Through this analysis, we identified a total of 1115 immunotherapy-associated differentially expressed genes ([73]Fig 3A–[74]B). Our previous findings suggested that tumor-associated macrophages (TAMs) may be involved in the response to immunotherapy. Patients were subsequently categorized into two groups according to median TAM scores, herein designated as low and high TAM score groups, with the objective of identifying DEGs between these groups. This analysis uncovered 620 DEGs related to macrophages ([75]Fig 3C–[76]D). To further pinpoint genes that are both related to immunotherapy and macrophages, we intersected the set of immunotherapy-related DEGs with the TAMs-related DEGs, resulting in the identification of 360 overlapping genes ([77]Fig 3E), which included genes from the immunoglobulin family (such as IGHG1, IGHG2, IGHG3, IGHG4) and the cytochrome P450 family (like CYP2A7, CYP4A11, CYP2A6). Immunoglobulins represent a class of proteins that fulfill a pivotal function within the immune system, primarily responsible for recognizing and binding to antigens, thereby triggering an immune response. The cytochrome P450 family, on the other hand, consists of a large group of enzymes containing heme groups, which play a significant role in drug metabolism. CYP2A7, in particular, is an important enzyme involved in the metabolic process of a variety of drugs and xenobiotics, including the biotransformation and detoxification of certain pharmaceuticals. With high expression levels in the liver, CYP2A7 significantly impacts the metabolism of many clinical drugs, thus holding substantial relevance in pharmacogenetics and personalized medicine. To further investigate the functions of these DEGs, we employed Gene Ontology (GO) and KEGG enrichment analysis. The GO analysis identified significant enrichment in three aspects: biological processes, cellular components, and molecular functions, involving key biological processes such as the immunoglobulin complex, antigen binding, and amoeboid-type cell migration ([78]Fig 3F). Pathway enrichment analysis indicated that the DEGs were predominantly related to drug metabolism and chemical carcinogenesis ([79]Fig 3G). These analyses yielded a more profound comprehension of the potential mechanisms through which TAMs mediate their effects in immunotherapy, providing a theoretical foundation for subsequent clinical research. Fig 3. Identifying immunotherapy and TAMs related genes. [80]Fig 3 [81]Open in a new tab (A) The volcano plot of gene expressed differences between patients receiving and not receiving immunotherapy; (B) Heatmap of DEGs for patients receiving and not receiving immunotherapy; (C) The volcano plot of gene expressed differences between high- and low- TAMs score patients; (D) Heatmap of DEGs for patients with high- and low- TAMs score; (E) Immunotherapy and TAMs related DEGs Venn diagram; (F) GO functional enrichment analysis of DEGs; (G) KEG functional enrichment analysis of DEGs. Constructing a prognostic model for hepatocellular carcinoma Subsequently, we further identified key genes among these differentially expressed genes that are linked to HCC prognosis. In light of the paucity of clinical information in the [82]GSE125449 dataset, we proceeded with subsequent HCC prognosis-related analyses using the TCGA-LIHC dataset. Initially, we employed univariate Cox regression analysis to pinpoint genes within the DEGs that correlate with patient prognosis, resulting in the identification of 34 genes associated with prognosis, including 9 prognostic risk factors and 25 protective factors ([83]Fig 4A). To discern the optimal set of genes for predicting patient outcomes, stepwise regression analysis was conducted, leading to the identification of 9 genes, such as CLEC14A, TM4SF18, PLOD2, ROBO4, CYYR1, GCHFR, ECI1, SPP1, and COL15A1 ([84]Fig 4B and [85]Table 2). CLEC14A has been characterized as a tumor endothelial marker that plays a pivotal role in pathological angiogenesis [[86]19]. Mechanistically, it modulates signaling pathways mediated by vascular endothelial growth factor receptors VEGFR-2 and VEGFR-3, thereby contributing to endothelial sprouting and maintenance of vascular integrity [[87]20]. Notably, the interaction between CLEC14A and multimerin-2 (MMRN2) has been shown to facilitate angiogenic processes, and pharmacological disruption of this interaction significantly suppresses sprouting angiogenesis and impedes tumor growth, underscoring its potential as a therapeutic target in anti-angiogenic strategies. CYYR1 (Cysteine and Tyrosine-Rich 1), although less well characterized, has been implicated in the pathogenesis of several malignancies, including breast, ovarian, and colorectal cancers [[88]21–[89]23]. The Aberrant expression of CYYR1 has been associated with tumor progression, suggesting that it may function as an oncogenic factor contributing to tumorigenesis and disease advancement. Fig 4. Constructing the prognostic model for HCC. [90]Fig 4 [91]Open in a new tab (A) Forest plot of prognosis-related genes; (B) The histogram of logistic regression coefficients for signature genes; (C) Kaplan-Meier survival curve between high-risk and low-risk sample groups; (D) Time-dependent ROC curve of the model for predicting patients’ prognosis. Table 2. Coefficients of the genes included in the prognostic model. Gene Symbol Description Coefficient (95%CI) CYYR1 Cysteine and tyrosine-rich protein 1 1.422 (2.395, 0.486) GCHFR GTP cyclohydrolase I feedback regulatory protein 1.276 (2.069, 0.522) COL15A1 Collagen type XV alpha 1 chain 0.580 (0.922, 0.248) ECI1 Enoyl-CoA isomerase 1 0.567 (0.954, 0.1926) TM4SF18 Transmembrane 4 L6 family member 18 0.212 (0.774, −0.328) SPP1 Secreted phosphoprotein 1 (osteopontin) −0.120 (−0.043, −0.199) PLOD2 Procollagen-lysine,2-oxoglutarate 5-dioxygenase 2 −0.255 (−0.005, −0.512) CLEC14A C-type lectin domain family 14 member A −0.717 (−0.168, −1.283) ROBO4 Roundabout guidance receptor 4 −1.134 (−0.374, −1.923) [92]Open in a new tab We then utilized logistic regression analysis to construct an HCC prognosis model using the expression data of the above 9 genes, calculating a Predictive Score for each sample: Predictive Score = CYYR1 * 1.422 + GCHFR * 1.276 + COL15A1 * 0.580 + ECI1 * 0.567 + TM4SF18 * 0.212 - SPP1 * 0.120 - PLOD2 * 0.255 - CLEC14A * 0.717 - ROBO4 * 1.134 ([93]Table 2). Employing ROC curve analysis, we determined an optimal threshold value, which was used to stratify samples into two groups. Subsequently, a comparison was performed to ascertain the disparities in overall survival between these two distinct groups. The utilization of Kaplan-Meier survival analysis yielded a discernible outcome: low-risk samples exhibited a marked increase in overall survival periods when contrasted with high-risk samples (median OS, 1.814 months versus 1.425 months, P < 0.0001, [94]Fig 4C). Subsequently, time-dependent ROC curve analysis indicated that the model achieved an AUC of 0.71 for predicting patient survival within one year ([95]Fig 4D). Comparison of the 9-gene prognostic model with existing biomarkers To benchmark the predictive utility of our 9-gene prognostic model, we compared its performance with several previously reported HCC prognostic biomarkers. These included the expression levels of SMPD3 [[96]24], GPC3 [[97]5], CD274, CTLA4, LAG3, and HAVCR2 [[98]4], as well as a 12-gene prognostic signature developed by Yan et al. [[99]25]. We evaluated each model’s ability to predict 1-year overall survival using the area under the ROC curve (AUC) and assessed their stratification power using Kaplan-Meier survival analysis. As shown in [100]S2A Fig, our 9-gene model achieved the highest AUC (0.71), outperforming SMPD3 (0.52), GPC3 (0.53), CD274 (0.48), CTLA4 (0.52), LAG3 (0.52), HAVCR2 (0.56), and the 12-gene signature (0.56). Kaplan-Meier analysis further demonstrated that only the 12-gene signature, among the biomarkers, was able to separate patients into risk subgroups with statistically significant survival differences. However, the stratification achieved by our 9-gene model was more robust, with greater separation of survival curves and a more significant log-rank P value ([101]S2B–[102]H Fig). These findings collectively indicate that our model provides superior prognostic discrimination and may offer greater clinical utility than existing single-gene and multi-gene biomarkers in HCC. Assessing the differences in immune infiltration between high- and low-risk samples After this, we proceeded to assess the disparities in immune infiltration between the high and low-risk groups. Initially, we calculated several scores—stromal, immune, ESTIMATE, and tumor purity—applying the ESTIMATE algorithm. Our results revealed that low-risk samples exhibited higher stromal, immune, and ESTIMATE scores, along with lower tumor purity ([103]Fig 5A–[104]D). We then conducted a TIDE analysis on the dataset to further investigate the presence of Tumor Immune Dysfunction and Exclusion, indicating that high-risk samples were more prone to exhibit tumor immune dysfunction and exclusion ([105]Fig 5E–[106]G). Additionally, a significant decrease in the scores related to tumor-associated M2 macrophages was observed in low-risk samples (P < 0.05, [107]Fig 5H). M2 macrophages, a subtype of tumor-associated macrophages, have been shown to exert a multifaceted function in cancer progression and therapy response. They may contribute to the malignancy of tumors by promoting angiogenesis, tumor cell growth, and metastasis via releasing a range of cytokines and signaling molecules. In some contexts, M2 macrophages may suppress anti-tumor immune responses by suppressing the activity of T cells and natural killer (NK) cells, potentially leading to a poorer prognosis for HCC patients. Fig 5. Comparison of immune features between high-risk and low-risk sample groups. [108]Fig 5 [109]Open in a new tab Comparison of stromal score (A), immune score (B), ESTIMATE score (C), tumor purity (D), TIDE (E), Dysfunction (F), Exclusion (G), TAM.M2 score between high-risk and low-risk sample groups. To enhance our comprehension of the mechanisms driving these differences, we took a meticulous examination of TME between patients with distinct risk scores. The results indicated that in low-risk samples, there was a higher infiltration level of regulatory T cells (Tregs) and dendritic cells (DCs), which may enhance the anti-tumor immune response. In contrast, high-risk samples exhibited lower infiltration levels of tumor-infiltrating lymphocytes (TILs), potentially associated with poorer prognosis. Additionally, we observed higher expression levels of pro-inflammatory cytokines, including IL-6 and TNF-α in high-risk samples, which may reflect an inflammatory state within the tumor microenvironment. These findings suggest that differences in immune infiltration may be closely related to the composition and functionality of immune cells within the TME, thereby influencing HCC clinical outcomes. We employed the CIBERSORT and ssGSEA algorithms to assess relative immune cell abundance. By comparing the immune infiltration among individuals with low-risk and high-risk score subgroups, we identified significant differences in certain immune cell types. For CIBERSORT, consistent with prior findings, elevated levels of M2 macrophages were observed in high-risk groups, while increased infiltration of M1 macrophages was noted in low-risk samples ([110]Fig 6A). For ssGSEA, we discovered that high-risk patients enriched in macrophages, cytotoxic cells, mast cells, and neutrophils ([111]Fig 6B). Moreover, we found that the immune checkpoint-related genes were under-expression in high-risk samples. Subsequently, an in-depth investigation was undertaken to elucidate the relationship between signature genes expression and sample immune composition. The result of person correlation analysis demonstrated a significant correlation between signature genes expression and patient immune infiltration. These discoveries suggest that differences in immune infiltration may be closely associated with the composition and status of immune cells in TME, thus affecting the clinical outcomes of HCC patients. Fig 6. Comparison of immune cell infiltration between high-risk and low-risk sample groups. [112]Fig 6 [113]Open in a new tab (A) Comparison of immune cell infiltration between high-risk and low-risk sample groups estimated by CIBERSORT algorithm; (B) Comparison of immune cell infiltration between high-risk and low-risk sample groups estimated by ssGSEA algorithm. To further validate these results, we also performed GSEA to explore biological pathways associated with HCC prognosis. Our analysis revealed significant differences in biological processes associated with cancer development between distinct patient subgroups. Specifically, low-risk samples exhibited significant involvement in hallmark pathways such as TNFA signaling, IL6-JAK-STAT3, P53 pathway, and so on ([114]Fig 7 and [115]S1 Table). To further validate our findings, GSEA analysis was also performed based on the KEGG pathway gene sets. The results showed that low-risk samples exhibited significant involvement in cytokine-cytokine receptor interaction, JAK STAT signaling pathway, T/B cell receptor signaling pathway, and MAPK signaling pathway ([116]Fig 8 and [117]S2 Table). In summary, these findings indicate that low-risk patients exhibit activation of immune-related and tumor-suppressive pathways, correlating with favorable prognosis, while high-risk patients show suppressed immune activity and enhanced oncogenic signaling, contributing to poor outcomes. These insights underscore the molecular basis of HCC prognosis and highlight potential therapeutic targets. Fig 7. GSEA analysis of HALLMARK gene sets between high-risk and low-risk sample groups. [118]Fig 7 [119]Open in a new tab GSEA plots of TNFA signaling via NFKB (A); IL6 JAK STAT3 signaling (B); interferon gamma response (C); IL6 IL2 STAT5 signaling (D); interferon alpha response (E); and P53 pathway (F). Fig 8. GSEA analysis of KEGG pathways between high-risk and low-risk sample groups. [120]Fig 8 [121]Open in a new tab GSEA plots of cytokine cytokine receptor interaction (A); JAK STAT signaling pathway (B); natural killer cell mediated cytotoxicity (C); T cell receptor signaling pathway (D); MAPK signaling pathway (E); and B cell receptor signaling pathway (F). Drug sensitivity analysis Further investigation into the relationships between the risk model and drug sensitivity was undertaken by employing the oncoPredict algorithm to calculate IC50 values for drugs corresponding to TCGA-LIHC dataset samples against the GDSC database. oncoPredict predicts patient responses to various drugs based on specific gene expression patterns. Subsequently, we identified disparities in drug sensitivity among high-risk and low-risk patients ([122]Fig 9A), revealing that 14 drugs exhibited significant differences in IC50 values between these groups, such as gemcitabine, trametinib, and dasatinib (P < 0.05, [123]Fig 9A). Dasatinib, an oral multikinase inhibitor used to treat Philadelphia chromosome (Ph) -positive leukemia, has also been found to be useful for HCC [[124]26]. We then performed a correlation analysis between the signature genes expression and drug susceptibility, uncovering that the expression of genes CLEC14A and CYYR1 was significantly negatively correlated with sensitivity to all drugs ([125]Fig 9B), suggesting their potential key roles in HCC progression and treatment response. Specifically, we also found that the expression of the gene CLEC14A exhibited the strongest correlation with sensitivity to the drug dasatinib, while the expression of CYYR1 showed the highest correlation with sensitivity to talazoparib. To further investigate the possible application for signature genes as therapeutic targets for these drugs, molecular docking examination was conducted. Utilizing the AutoDock software, our findings indicated that the optimal binding conformation of CLEC14A with dasatinib demonstrated a high binding affinity, with a binding energy of −6.46 kcal/mol ([126]Fig 9C). Similarly, the optimal binding conformation of CYYR1 with talazoparib manifested an impressive binding affinity, exhibiting a binding energy of −7.66 kcal/mol ([127]Fig 9D). These findings suggested that signature genes we identified could serve as potential therapy targets and offer clinical guidance in the development of personalized treatment strategies. Fig 9. Drug sensitivity analysis. [128]Fig 9 [129]Open in a new tab (A) Distribution of drug sensitivity between high-risk and low-risk HCC patients; (B) Correlation heatmap between the expression of signature genes and drug sensitivity; (C) 3-D schematic view of AutoDock between gene CLEC14A and drug dasatinib; (D) 3-D schematic view of AutoDock between gene CYYR1 and drug talazoparib. Validation of the model’s predictive performance To validate the predictive capacity for our model, we first applied it to external validation datasets for HCC. We began by assessing the model’s prognostic capabilities in HCC datasets with clinical information, including [130]GSE76427, [131]GSE15654, and [132]GSE10141. Predictive scores were calculated and an optimal threshold determined for all samples according to the ROC curve. The samples were then categorized as low-risk and high-risk groups. Subsequently, we conducted survival analysis to ascertain the disparities in prognosis among low-risk and high-risk subgroups. Our results indicated that patients with high-risk scores exhibited notably poorer clinical outcomes against those with low-risk scores across all three cohorts, as clearly depicted by the Kaplan-Meier survival curves (P < 0.05, [133]Fig 10A–[134]C), confirming the model’s accuracy and reliability in predicting HCC patient outcomes. Furthermore, we also downloaded three additional HCC datasets without prognostic information from the GEO database. We then applied our predictive model to these external validation datasets to further verify the model’s predictive power for HCC. The ROC curve analysis demonstrated that the model maintained a high level of performance in predicting HCC across the [135]GSE112790, [136]GSE174570, and [137]GSE228782 datasets, with AUC values of 0.862 for [138]GSE112790, 0.884 for [139]GSE174570, and 0.703 for [140]GSE228782 ([141]Fig 10D–[142]F). Fig 10. Validating the predictive and prognostic performances of model across cohorts. [143]Fig 10 [144]Open in a new tab Kaplan-Meier survival curve between high-risk and low-risk sample groups in [145]GSE76427 cohort (A), [146]GSE15654 cohort (B), [147]GSE10141 cohort (C); ROC curve of model for predicting HCC in [148]GSE112790 cohort (D), [149]GSE174570 cohort (E), [150]GSE228782 cohort (F). Discussion In this study, we presented a comprehensive integrative analysis by combining scRNAseq data with bulk transcriptomic profiles from TCGA-LIHC to elucidate the prognostic significance of tumor-associated macrophages in HCC prognosis. By employing univariate Cox regression analysis and stepwise Cox regression analyses, we identified nine TAM-related prognostic genes and constructed a predictive model based on their expression profiles. This model effectively stratified HCC patients into high- and low-risk subgroups with distinct survival outcomes across multiple cohorts, as demonstrated by Kaplan–Meier curves and time-dependent ROC analyses, underscoring its clinical potential. Beyond its prognostic utility, our model offers mechanistic insights into how TAMs orchestrate the immunological landscape of HCC. TAMs represent a phenotypically plastic population capable of polarizing into pro-inflammatory M1-like or immunosuppressive M2-like states in response to microenvironmental cues. In HCC, a predominant M2-like polarization contributes to an immunosuppressive milieu through multiple mechanisms, including the secretion of anti-inflammatory cytokines (e.g., IL-10, TGF-β), promotion of regulatory T cell recruitment, suppression of cytotoxic T lymphocyte function, and enhancement of angiogenesis and extracellular matrix remodeling. These mechanisms collectively enable tumor progression and immune evasion. In our study, we observed that high-risk patients displayed increased expression of M2-associated markers and reduced infiltration of tumor-infiltrating lymphocytes (TILs), supporting the hypothesis that both the quantity and polarization state of TAMs critically shape immune responsiveness and clinical outcomes. Notably, patients receiving immune checkpoint blockade exhibited a more balanced or M1-skewed macrophage profile, suggesting that TAM reprogramming may enhance immunotherapeutic efficacy in HCC [[151]27,[152]28]. Moreover, drug sensitivity analysis and in silico docking suggest several of the identified genes (e.g., CLEC14A, CYYR1) may serve as potential therapeutic targets. CLEC14A has been implicated in tumor angiogenesis and endothelial cell adhesion [[153]19,[154]20,[155]29], while CYYR1 is reportedly dysregulated in several malignancies including breast and ovarian cancers [[156]21–[157]23]. However, we acknowledge that the lack of experimental validation of these targets is a limitation of our study. Given the retrospective and computational nature of this work, and current constraints in experimental capacity, we have not conducted functional assays. Nevertheless, we believe our findings provide a strong foundation for future in vitro or in vivo validation studies, which will be critical for confirming the translational relevance of our candidate genes. Moreover, the model’s compact gene set offers practical advantages for clinical translation, potentially enabling its application via qPCR panels, RNA-seq diagnostics, or immunohistochemistry to support risk stratification and therapeutic decision-making. This enhances the feasibility of integrating the model into clinical workflows for HCC management. In conclusion, our study highlights the prognostic value of TAM-related transcriptional programs in HCC and their potential interplay with immune responses and therapy. The proposed model offers a feasible tool for patient stratification and hypothesis generation in translational oncology. Future work should focus on prospective validation and mechanistic dissection of key candidate genes and pathways in experimental systems. Supporting information S1 Fig. Distribution of M1 scores between immunotherapy-treated and immunotherapy-naïve patients. (A) Comparison of M1 polarization scores between immunotherapy-treated and untreated samples. (B) Comparison. of M2 polarization scores between immunotherapy-treated and untreated samples. (C) t-SNE plot depicting the distribution of M1 polarization scores in immunotherapy-treated versus untreated samples. (D) t-SNE plot depicting the distribution of M2 polarization scores in immunotherapy-treated versus untreated samples. (E) Comparison of M1 polarization scores among different cell types. (F) Comparison of M2 polarization scores among different cell types. (TIF) [158]pone.0325120.s001.tif^ (6.1MB, tif) S2 Fig. Comparative prognostic performance of our 9-gene model versus established biomarkers in hepatocellular carcinoma. (A) ROC curves of prognostic biomarkers such as SMPD3, GPC3, CD274, CTLA4, LAG3, HAVCR2, and 12-gene signature for one year survival. (B) Kaplan-Meier survival analysis of OS comparing the SMPD3 high- and low-expression groups. (C) Kaplan-Meier survival analysis of OS comparing the GPC3 high- and low-expression groups. (D) Kaplan-Meier survival analysis of OS comparing the CD274 high- and low-expression groups. (E) Kaplan-Meier survival analysis of OS comparing the CTLA4 high- and low-expression groups. (F) Kaplan-Meier survival analysis of OS comparing the LAG3 high- and low-expression groups. (G) Kaplan-Meier survival analysis of OS comparing the HAVCR2 high- and low-expression groups. (H) Kaplan-Meier survival analysis of OS comparing the high- and low-risk groups classified by 12-gene signature. (TIF) [159]pone.0325120.s002.tif^ (2.5MB, tif) S1 Table. GSEA results based on HALLMARK gene sets. (DOCX) [160]pone.0325120.s003.docx^ (14.1KB, docx) S2 Table. GSEA results based on KEGG pathways. (DOCX) [161]pone.0325120.s004.docx^ (17KB, docx) Data Availability All datasets utilized in this study are publicly available from established repositories. Specifically, the GSE125449, GSE76427, GSE15654, GSE10141, GSE112790, GSE174570, and GSE228782 datasets can be accessed through the Gene Expression Omnibus (GEO, [162]https://www.ncbi.nlm.nih.gov/geo/) database, while the TCGA-LIHC dataset is available from The Cancer Genome Atlas (TCGA, [163]https://portal.gdc.cancer.gov/) database. Funding Statement The author(s) received no specific funding for this work. References