Abstract Background Hepatocellular carcinoma (HCC) is a common malignant tumor characterized by a high recurrence rate and poor prognosis. This study aimed to identify glycolysis-related prognostic markers and immunological abnormalities in patients with HCC. Methods We collected samples from cancerous and adjacent non-cancerous tissues for transcriptomic, metabolomic, and 16 S rRNA sequencing analyses. Glycolysis-related prognostic markers were identified by integrating public data from The Cancer Genome Atlas, [38]GSE14520, and [39]GSE76427 datasets. Additionally, single-cell sequencing data ([40]GSE202642) were used to analyze the significantly infiltrated cellular subpopulations in HCC and investigate the expression of prognostic markers across different cell types. Spatial transcriptomics and mass cytometry (CyTOF) data were used to examine the expression differences in immune cells across tumor, peritumoral, and control tissues. Key prognostic markers were validated using reverse transcription-quantitative polymerase chain reaction, western blotting, and immunohistochemistry. Results Differentially expressed genes (DEGs) between HCC and control tissues were primarily clustered in cell cycle and metabolic pathways, particularly in the glycolysis pathway. Metabolomic analysis identified 175 differentially expressed metabolites that were mainly enriched in digestive and amino acid metabolism pathways. 16 S rRNA analysis revealed a significant increase in the abundance of Aenigmarchaeota and a decrease in the abundance of Proteobacteria in HCC tissues. The former was positively associated with glycolysis, whereas the latter showed a negative association. Through public data integration, 17 glycolysis-related DEGs were identified and 101 predictive models were constructed using machine learning. The StepCox[both] + random survival forest model using AGL, G6PD, GOT2, and KIF20A exhibited the best diagnostic performance among the three datasets. Single-cell RNA sequencing indicated significant infiltration of CD8 + Tex, CD8 + T, CD8 + Trm, and epithelial cells in HCC tissues. AGL, G6PD, GOT2, and KIF20A were highly expressed in CD8 + Tex cells, CD8 + Trm cells, macrophages, and monocytes, respectively. Spatial transcriptomics and CyTOF analyses showed greater infiltration of CD8 + Tex and CD8 + Trm cells in tumor tissues than in controls. Molecular assays further confirmed that G6PD and KIF20A expression levels were significantly higher, whereas AGL and GOT2 expression levels were lower, in HCC tissues than in control tissues. Conclusion Through integrative multi-omics analysis, we identified glycolysis-related prognostic markers with distinct expression profiles across immune cell subsets in HCC. Our findings identify potential biomarkers and therapeutic targets for the diagnosis and treatment of HCC. Supplementary Information The online version contains supplementary material available at 10.1186/s12967-025-06421-6. Keywords: Hepatocellular carcinoma, Glycolysis, Immune cells, Prognostic markers Introduction Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related deaths globally and one of the most common malignant tumors, with incidence and mortality rates continuing to rise in recent years [[41]1]. The development of HCC is closely associated with various factors, including chronic hepatitis virus infections (e.g., HBV and HCV), cirrhosis, obesity, metabolic syndrome, and environmental factors [[42]2]. Despite advances in early screening and treatment strategies for liver cancer, patient prognosis remains poor, with a 5-year survival rate of only 12%, as the disease is often diagnosed at an advanced stage [[43]3]. Therefore, a better understanding of the molecular mechanisms and the heterogeneity of HCC may aid in the identification of novel biomarkers and therapeutic targets. Current perspectives suggest that cancer cells initially undergo unique metabolic alterations to adapt to their adverse local environment, which activates downstream signaling pathways that promote tumor invasion and metastasis [[44]4]. In recent years, the importance of glycolysis in tumor metabolism has attracted increasing attention [[45]5]. The classic “Warburg effect” describes how tumor cells tend to rely on glycolysis over oxidative phosphorylation to obtain energy, even in the presence of oxygen, thereby supporting their rapid proliferation [[46]6]. This phenomenon has made glycolysis-related metabolic pathways crucial areas of research in tumor biology. Glycolysis provides a rapid energy source for tumor cells and generates various metabolic intermediates that play key roles in lipid, amino acid, and nucleotide biosynthesis [[47]7]. There is an increasing amount of evidence that alterations in glycolysis are closely associated with tumor initiation, progression, and metastasis [[48]8]. Although some studies have indicated that glycolysis creates an inhibitory tumor microenvironment (TME) [[49]9], its precise biological functions and roles in the TME remain to be elucidated. In liver cancer, the expression patterns and functional mechanisms of glycolysis-related genes have yet to be fully explored. Recent studies have highlighted the significant role of bacterial communities in the development and progression of HCC [[50]10]. However, bacterial identification is traditionally conducted through gut microbiome sequencing, rather than through direct analysis of tissue samples. In our study, we employed 16 S rRNA sequencing to directly examine the microbial composition of HCC and adjacent non-cancerous tissues, aiming to explore the potential interactions between the intrahepatic microbiota and HCC-associated metabolic characteristics. This study aimed to identify glycolysis-related genes associated with liver cancer and to assess their prognostic value across various datasets. Additionally, by examining changes in the microbial community and their relationship with host metabolism, we sought to uncover the interactions between metabolic reprogramming in liver cancer and microbial ecology. The findings of this study provide new insights into the metabolic characteristics of HCC, and potentially offer novel biomarkers and targets for the development of early diagnostic and therapeutic strategies for liver cancer. Materials and methods Study subjects This study included 31 patients with HCC, from whom cancerous and adjacent noncancerous tissue samples were collected during curative partial hepatectomy. Cancerous tissues were obtained from the solid tumor mass, whereas non-cancerous adjacent tissues were obtained from the liver tissue at least 2 cm away from the tumor margin. All samples were pathologically confirmed to ensure the accuracy of their categorization as cancerous or noncancerous tissues. The inclusion criteria were: an age from 35 to 65 years; no sex restrictions; no history of other malignancies; and no liver-cancer-related treatments, including surgery, radiotherapy, chemotherapy, immunotherapy, or targeted therapy. The exclusion criteria were severe liver dysfunction (Child-Pugh grade C), the presence of severe comorbidities, pathological reports indicating that the HCC was secondary or metastatic rather than primary, any medications or interventions within 6 months before sample collection, and active infections at the time of sample collection. Informed consent was obtained from all patients. This study strictly adhered to the Declaration of Helsinki and was approved by the Ethics Committee of the First Affiliated Hospital of Xinjiang Medical University (No. K202304-20). Immediately after collection, the tissues were washed with saline to remove blood and impurities and then stored at -80 °C. For sequencing experiments, 11 pairs of cancerous and adjacent non-cancerous tissues were randomly selected. Tissues using for immunohistochemistry were fixed in 10% formalin. Transcriptomic sequencing and data analysis Total RNA was extracted from freshly frozen cancerous and adjacent non-cancerous tissues using TRIzol reagent (Thermo Fisher Scientific, Waltham, MA, USA). RNA integrity was confirmed using an Agilent 2100 Bioanalyzer (Agilent, Santa Clara, CA, USA), based on an RNA integrity number ≥ 7.0, an A260/A280 ratio between 1.8 and 2.1, and an A260/A230 ratio greater than 1.8. mRNA was isolated from total RNA using magnetic bead technology to remove non-coding RNA, and the enriched mRNA was reverse-transcribed into cDNA using reverse transcriptase. The cDNA was then fragmented into short segments of 200–300 bp, and sequencing adapters were added to the fragment ends. The cDNA library was amplified using polymerase chain reaction (PCR), and the library quality and fragment length distribution were assessed using quantitative PCR and a bioanalyzer. Libraries meeting the quality standards were sequenced on a NovaSeq 6000 platform (Illumina, San Diego, CA, USA). Raw sequencing data were first quality-checked using FastQC software and trimmed using Trimmomatic to remove adapter sequences and low-quality bases and obtain high-quality, clean reads. Clean reads were aligned to the human genome (reference genome version: GRCh38) using HISAT2 software to retrieve alignment data for each gene. FeatureCounts software was used to quantify the read counts for each gene, providing expression levels across different samples. To account for differences in sequencing depth and technical variation, normalization was performed using variance-stabilizing transformation in DESeq2 software. Read counts were converted into transcripts per million. Differential expression analysis was performed using DESeq2, calculating the fold change and P-value of each gene. Genes with P-value < 0.05 and|log(fold change)| > 1 were identified as significantly differentially expressed genes (DEGs). Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses of DEGs were conducted using the ClusterProfiler package. Gene set enrichment analysis (GSEA) was also performed on whole-genome expression data to further validate the enrichment of DEGs in specific pathways. Metabolomics analysis Cancerous and adjacent non-cancerous tissues were homogenized in a pre-cooled 80% methanol solution (pH adjusted with 0.1% formic acid) for metabolomic analysis. Cell debris was removed using high-speed centrifugation (4 °C, 15,000 × g, 10 min), and the supernatant was collected. Quality control (QC) samples were prepared using a homogenized mixture of cancerous and noncancerous tissues. Non-targeted metabolomics was performed using a high-resolution, ultra-high-performance liquid chromatography-tandem mass spectrometry (UHPLC-MS/MS) platform. A C18 reverse-phase column was used in the UHPLC system to separate metabolites with varying polarities based on gradient elution. Detection was conducted in both positive and negative ion modes. Raw MS data were preprocessed using Compound Discoverer 3.3 software, including denoising, alignment, peak extraction, and peak identification. Metabolite peaks were characterized by their mass-to-charge ratio (m/z) and retention time (RT) to ensure comparability across samples. Peak area quantification was performed, and metabolites were identified by referencing the high-resolution secondary spectra databases mzCloud, mzVault, and the primary MassList database. Metabolites in QC samples with a coefficient of variance (CV) greater than 30% were excluded from further analysis. Metabolite intensities were normalized using total ion current normalization to correct for sample-to-sample variability. The peak area of each feature in the sample represented the relative quantification of a metabolite, and the quantification results were normalized to the total peak area to obtain the final metabolite quantification. Principal component analysis (PCA) was conducted on the peaks extracted from all experimental samples. Partial least squares discriminant analysis (PLS-DA) was used to compare metabolite levels between HCC and adjacent non-cancerous tissues. Metabolites with a variable importance in the projection (VIP) value > 1 and a P-value < 0.05 were considered significant. KEGG pathway enrichment analysis of the identified differentially expressed metabolites was performed using the ClusterProfiler package. 16 S rRNA sequencing analysis Strict aseptic techniques were used during tissue sampling to minimize environmental contamination. Negative control samples, including blank extraction and PCR-negative controls, were used to monitor potential contaminants. Microbial DNA was extracted from cancerous and adjacent non-cancerous tissues using the DNeasy PowerSoil Kit (Qiagen, Hilden, Germany). Specific primers targeting the 16 S rRNA gene were used for amplification and the PCR products were purified using Agilent AMPure XP magnetic beads. Purified PCR products were ligated to paired-end sequencing adapters to construct sequencing libraries. Library quality and concentration were assessed using quantitative PCR and a Qubit instrument. Sequencing was performed using an Illumina NovaSeq 6000 platform with PE250 reads. Raw sequencing data were quality-filtered using Trimmomatic software to remove low-quality reads and adapter sequences, and paired-end reads were merged into complete 16 S rRNA sequences. High-quality reads with lengths from 400 bp to 500 bp were retained to ensure data accuracy. Denoising was performed using the DADA2 algorithm and processed reads were clustered into amplicon sequence variants (ASVs). ASV sequences were aligned to the Silva 138.1 database with a classification confidence threshold of 97% similarity, and taxonomic annotation was conducted using QIIME2 to determine microbial taxonomy at various levels based on sequence similarity. To ensure consistency across samples, ASV counts were rarefied to the same sequencing depth prior to downstream analysis. Alpha diversity indices, including the Chao1 index (to estimate species richness), were calculated using QIIME2 to compare the microbial diversity between cancerous and adjacent tissues. The UniFrac distance metric was used to assess the structural differences in microbial communities across samples, and non-metric multidimensional scaling (NMDS) was used to visualize beta diversity. Linear discriminant analysis effect size (LEfSe) was used to identify significantly different microbes in cancerous and adjacent tissues. Functional prediction of 16 S rRNA data was performed using PICRUSt software. Network analysis was conducted using Cytoscape, with key nodes representing metabolic pathways, microbial genera, and associated genes. Public dataset collection The [51]GSE14520 dataset [[52]11] contains gene expression data for 247 tumor tissues and 239 paired non-tumor tissues from 247 patients with HCC. The [53]GSE76427 dataset [[54]12] includes gene expression data for 115 tumor tissues and 52 paired non-tumor tissues from 115 patients with HCC. The Cancer Genome Atlas (TCGA) dataset provides gene expression data for 371 tumor tissues and 50 adjacent normal tissues. Differential expression analysis between tumor and adjacent normal samples was performed using the DESeq2 package, with selection criteria for DEGs set to a P-value < 0.05 and|log(fold change)| > 1. The [55]GSE202642 dataset [[56]13] contains single-cell RNA sequencing data from seven HBV-related HCC tissues and four adjacent liver tissues. Genes related to glycolysis were extracted from the KEGG pathway database “Glycolysis/Gluconeogenesis”, forming a glycolysis gene set. Model construction and validation To further assess the role of glycolytic genes in predicting patient survival, we integrated 10 machine learning algorithms and 101 algorithm combinations. The algorithms included a random survival forest (RSF), CoxBoost, elastic network (Enet), generalized boosted regression modeling (GBM), stepwise Cox, partial least squares regression for Cox (plsRcox), Lasso, Ridge, supervised principal components (SuperPC), and survival support vector machines (survival-SVM). For each model, we calculated the area under the receiver operating characteristic curve (AUC) for all datasets. The model with the highest average AUC was selected as optimal. Cross-validation was performed across the three datasets to evaluate and compare the diagnostic and predictive performances of each model. Univariate Cox regression analysis was used to calculate the hazard ratios and 95% confidence intervals of the models for the three datasets. A patient risk score model was constructed based on the expression levels of genes associated with the optimal model. Individual risk scores were calculated by multiplying the expression value of each gene by the regression coefficients from the StepCox model and summing the results. Single-cell RNA sequencing analysis The Seurat toolkit was used to perform QC on the raw data from the [57]GSE202642 dataset. Cells with more than 20% mitochondrial gene expression or fewer than 200 or more than 6,000 detected genes were excluded. The log-normalization method was applied to log-transform and normalize the expression values for each cell type. Genes with high variance (> 2,000) were selected through variance analysis and PCA was performed on these highly variable genes, retaining the top 20 principal components for further analysis. The PCA results were used for dimensionality reduction with Uniform Manifold Approximation and Projection (UMAP), and cell clustering was conducted using the Leiden algorithm. The FindAllMarkers function was used to identify significant marker genes in each subpopulation. Based on marker gene expression, the cell marker database, and known cell-type markers, cell-type identities were assigned to each cell subpopulation. Spatial transcriptome and cytof analysis Spatial transcriptome sequencing data from Wu et al. [[58]14] were normalized using the log-normalization method to standardize the gene expression level for each spot. Dimensionality reduction of the spatial expression data was performed using Seurat and similar tools. Based on these results, spatially aware clustering algorithms (Spatially Variable Features in Seurat) were applied to segment the tissue slices into multiple subregions. Region-specific DEGs were identified for each spatial area and marker genes from single-cell analyses were used to determine the distribution of major cell types across different spatial regions. A spatial feature plot was used to generate heat maps displaying the spatial distribution of cells and genes within tissue slices, highlighting the spatial characteristics of the TME. CyTOF data were obtained from the HCC CyTOF Project ([59]https://data.mendeley.com/datasets/jxsz3hdsyg/1). FlowSOM was used to normalize signals across different samples to mitigate batch effects, with metal signals converted to relative abundances using reference-based standardization. Unsupervised clustering analysis was performed on the mass cytometry data using algorithms such as FlowSOM, which categorizes cells into different subpopulations. The resulting cell clusters were annotated based on the known cell marker genes. Reverse transcription-quantitative PCR (RT-qPCR) Total RNA was extracted from the remaining 20 samples using TRIzol reagent (Invitrogen, Carlsbad, CA, USA). RNA quality and concentration were assessed using a NanoDrop 2000 spectrophotometer. The extracted RNA was reverse transcribed into cDNA using a high-efficiency reverse transcriptase (Invitrogen). qPCR was performed on an Applied Biosystems 7500 system using the SYBR Green Master Mix (Applied Biosystems, Waltham, MA, USA). The specific primer sequences are listed in Table [60]S1. GAPDH was used as the reference gene, and the relative expression levels of the target genes were calculated using the 2^^−ΔΔCt method. Western blotting The remaining 20 tissue samples were ground in liquid nitrogen and total protein was extracted using radioimmunoprecipitation assay lysis buffer containing protease inhibitors. The protein concentration was determined using a BCA protein assay kit (Beyotime Biotechnology, Nantong, China). After denaturation, 20 µg of total protein was loaded onto an sodium dodecyl sulfate-polyacrylamide gel for electrophoretic separation. Proteins were transferred from the gel to a polyvinylidene fluoride membrane, which was blocked in 5% non-fat milk at room temperature for 1 h. The membrane was then incubated overnight at 4 °C with specific primary antibodies (1:1,200; Abcam, Cambridge, UK) targeting the protein of interest. After washing three times with Tris0buffered saline with Tween 20, the membrane was incubated with horseradish-peroxidase-conjugated secondary antibodies (1:5,000; Abcam) at room temperature for 1 h. The bands were visualized using an enhanced chemiluminescence reagent (Beyotime Biotechnology). The membrane was then exposed to an imaging system to capture the target protein bands. β-actin was used as the reference protein, and the band density was analyzed using ImageJ software (National Institutes of Health, Bethesda, MD, USA). Immunohistochemistry Fixed tissue samples were dehydrated, cleared, and embedded in paraffin to prepare 5-µm-thick sections, which were mounted onto pre-treated glass slides. The sections were deparaffinized by immersion in xylene (10 min per immersion, repeated twice), followed by gradual rehydration through an ethanol gradient (100%, 95%, and 80%) and a final rinse in distilled water to completely remove the paraffin. Heat-induced antigen retrieval was performed using a sodium citrate buffer (pH 6.0). The sections were incubated in 3% hydrogen peroxide for 10 min to block endogenous peroxidase activity, followed by blocking with 5% goat serum for 30 min. The sections were then incubated with primary antibodies (Abcam) overnight in a humid chamber at 4 °C for 12 h. After washing three times with phosphate-buffered saline (PBS), horseradish-peroxidase-conjugated secondary antibodies were added and the sections were incubated at room temperature for 1 h in a humid chamber. After three washes with PBS, the sections were incubated with DAB (3,3’-diaminobenzidine) for 3 min for color development. The sections were counterstained with hematoxylin to visualize cell nuclei, dehydrated using an ethanol gradient (80%, 95%, and 100%), and cleared in xylene. Finally, the slides were mounted using a neutral resin mounting medium and covered with coverslips. The stained sections were examined under a Leica optical microscope. The optical density of the stained areas was quantified using ImageJ software to assess the relative expression changes of the target protein in HCC tissues compared to adjacent non-cancerous tissues. Statistical analysis Data analysis was performed using R version 4.2 and GraphPad Prism (version 9.0.3) software. Pearson’s correlation coefficient was used to assess the linear relationships between two variables. All experiments were conducted with at least three biological replicates, and data are presented as the mean ± standard deviation. Differences in group means were evaluated using Student’s t-test or one-way analysis of variance. Statistical significance was set at a P-value < 0.05 for all statistical tests. Results Identification of differentially expressed genes and signaling pathways A total of 3,099 DEGs were identified between HCC and adjacent noncancerous tissues (Fig. [61]1A). Enrichment analysis revealed that these DEGs were significantly involved in GO functions such as cell cycle and cell division (Fig. [62]1B). KEGG pathway analysis indicated that the DEGs were primarily enriched in pathways related to the cell cycle, metabolism, and glycolysis (Fig. [63]1C). GSEA of the entire gene expression dataset revealed significant enrichment of gene sets associated with glycolysis, cell cycle regulation, and oxidative stress in HCC tissues (Fig. [64]1D). Fig. 1. [65]Fig. 1 [66]Open in a new tab Screening and enrichment analyses of differentially expressed genes. (A) The volcano plot displays differentially expressed genes (DEGs) between hepatocellular carcinoma (HCC) tissue and adjacent control tissue. Red dots indicate upregulated genes and blue dots indicate downregulated genes. (B) Geneo Ontology (GO) functional enrichment analysis, including biological processes (BPs), molecular functions (MFs), and cellular components (CCs). (C) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. (D) Gene set enrichment analysis (GSEA) of signaling pathway activation in liver cancer. NES, normalized enrichment score; NP, normalized P-value Metabolomics analysis PLS-DA analysis of the peaks extracted from all experimental samples indicated significant intergroup differences (Figure [67]S1). Metabolomic analysis identified 175 metabolites that were significantly different between HCC and adjacent tissues, with 110 detected in the positive ion mode and 65 in the negative ion mode (Fig. [68]2A). KEGG enrichment analysis of these differential metabolites showed primary enrichment in pathways related to the digestive system, lipid metabolism, and amino acid metabolism (Fig. [69]2B and C). NetScape analysis further revealed a strong association between the DEGs and glycolytic metabolic processes (Figure [70]S2). Fig. 2. [71]Fig. 2 [72]Open in a new tab Screening and enrichment analyses of differential metabolites. (A) Heatmap of significantly differential metabolites between hepatocellular carcinoma (HCC) and adjacent control tissue. (B) Metabolic pathways with significantly enriched differential metabolites detected in positive polarity mode. (C) Metabolic pathways with significantly enriched differential metabolites detected in negative polarity mode 16 S rRNA microbial community analysis 16 S rRNA sequencing revealed the microbial community composition in the HCC and adjacent control tissues. Representative sequences of the top 100 genera were obtained using multiple sequence alignment, as shown in Figure [73]S3. Phylogenetic analysis revealed significant differences in the abundances of various microbial genera between HCC and adjacent tissues. Alpha diversity analysis (Chao1 index) indicated increased microbial richness in HCC tissues compared to adjacent tissues (Fig. [74]3A). Beta diversity analysis (NMDS) further showed a distinct separation in the microbial community composition between HCC and adjacent tissues, supporting the significant structural differences in the microbiota of the two groups (Fig. [75]3B). LEfSe analysis revealed a notable increase in the relative abundance of the phylum Aenigmarchaeota in HCC tissues, whereas the abundance of the phylum Proteobacteria was significantly reduced (Fig. [76]3C). Fig. 3. [77]Fig. 3 [78]Open in a new tab Microbial community analysis in hepatocellular carcinoma (HCC) and adjacent tissues. (A) Alpha diversity analysis of microbial communities (Chao1 index). (B) The non-metric multidimensional scaling (NMDS) plot displays the differences in microbial community structure between liver cancer and adjacent tissues. Left: Based on the weighted Unifrac distance; Right: Based on the unweighted Unifrac distance. (C) Linear discriminant analysis effect size (LEfSe) analysis shows the relative abundance of different microorganisms in liver cancer and adjacent control tissues. (D) Functional annotation of microorganisms. (E) Correlation analysis between microbial abundance and glycolysis pathways Functional annotation indicated that the differential microbes were mainly associated with L-isoleucine biosynthesis (PWY-5101), amino acid biosynthesis (BRANCHED-CHAIN-AA-SYN-PWY), and nucleoside and nucleotide biosynthesis (PWY-6277) (Fig. [79]3D). Correlation analysis between differential microbes and metabolomic data showed a positive correlation between Aenigmacheota abundance and the glycolytic pathway, whereas Proteobacteria abundance was negatively correlated with glycolysis (Fig. [80]3E). Screening of prognostic markers related to Glycolysis Using public datasets, we investigated the prognostic value of glycolysis-related genes in HCC. In total, 1,016 DEGs were identified in the [81]GSE14520 dataset (Fig. [82]4A), 1,763 in the [83]GSE76427 dataset (Fig. [84]4B), and 3,685 in TCGA dataset (Fig. [85]4C). By intersecting these DEGs with glycolysis-related genes from the transcriptome sequencing data, we identified 17 glycolysis-related genes that were differentially expressed in HCC (Fig. [86]4D). Fig. 4. [87]Fig. 4 [88]Open in a new tab Identification of differentially expressed glycolysis-related genes in hepatocellular carcinoma (HCC). The volcano plot displays differentially expressed genes (DEGs) between HCC and controls in [89]GSE14520 (A), [90]GSE76427 (B), and The Cancer Genome Atlas (TCGA) (C) datasets. Red dots indicate upregulated genes and blue dots indicate downregulated genes. (D) The intersection of DEGs and genes related to glycolysis Based on these 17 glycolytic genes, we constructed 101 models using machine learning algorithms to predict the prognosis of patients with HCC. Among these, the StepCox[both] + RSF model demonstrated strong diagnostic and prognostic performance for the 1-year (Figure [91]S4), 3-year (Figure [92]S5), and 5-year (Figure [93]S6) outcomes across the three datasets (Fig. [94]5A). Univariate Cox regression analysis of patient prognosis in different cohorts ([95]GSE14520, TCGA, and [96]GSE76427) showed that the StepCox[both] + RSF model consistently provided reliable prognostic efficacy across multiple datasets (Fig. [97]5B). Comparisons with models from previously published studies further highlighted the superior performance of the StepCox[both] + RSF model in predicting patient prognosis (Figure [98]S7). Fig. 5. [99]Fig. 5 [100]Open in a new tab Constructing and screening HCC prognosis prediction models using machine learning algorithms. (A) The StepCox [both] + RSF model for predicting 1-year, 3-year, and 5-year survival rates in three sets of data. (B) Meta analysis of univariate Cox regression analysis on StepCox [both] + RSF model in the [101]GSE14520, TCGA, and [102]GSE76925 datasets. The association between risk scores constructed based on key gene expression and overall survival time and gene expression in the [103]GSE14520 (C), [104]GSE76427 (D), and TCGA (E) cohorts The StepCox[both] + RSF model incorporated the expression levels of AGL, G6PD, GOT2, and KIF20A to construct a patient risk score model. HCC samples were ranked according to the risk score and divided into low- and high-risk groups. Overall survival decreased with increasing risk score, with higher expression levels of G6PD and KIF20A and lower expression levels of AGL and GOT2 in the high-risk group (Fig. [105]5C, D, E). Single-cell RNA sequencing analysis Analysis of the single-cell RNA sequencing data ([106]GSE202642) identified 28 distinct clusters (Fig. [107]6A). Based on the expression of marker genes, we further identified cell subpopulations, including B, CD4 + Tn, CD4 + Trm, CD8 + Tex, CD8 + Tn, CD8 + Trm, and epithelial cells; fibroblasts; macrophages; monocytes; NK cells; and Treg cells (Figs. [108]6B, C). Significant infiltration of epithelial cells, CD8 + Tn cells, and their subgroups (CD8 + Tex and CD8 + Trm cells) was observed in HCC tissues (Figs. [109]6D, E). Additionally, there was notably differential expression levels of glycolysis-related prognostic markers across various cell types in HCC (Fig. [110]6F). Specifically, AGL, G6PD, GOT2, and KIF20A were highly expressed in CD8 + Tex cells, CD8 + Trm cells, macrophages, and monocytes, respectively, within HCC tissues. The glycolytic pathway was significantly elevated in epithelial cells, CD8 + Tex cells, and macrophages (Figs. [111]7A, B). Cell-cell communication analysis revealed enhanced interactions between CD8 + Tex cells and macrophages in HCC (Fig. [112]7C). Fig. 6. [113]Fig. 6 [114]Open in a new tab Cell composition and gene expression in the microenvironment of HCC revealed by single-cell RNA sequencing analysis. (A) Uniform Manifold Approximation and Projection (UMAP) plot showing the distribution of different clusters. (B) The expression levels of cell marker genes in different clusters. (C) UMAP plot showing cell types. Different colors represent different cell types. (D) UMAP plot showing the distribution of cells in tumor and control tissues. (E) The infiltration ratio of cell types in tumor and control tissue. (F) Thermogram of the expression of glycolysis-related prognostic markers in various cell types of tumor and control tissues Fig. 7. [115]Fig. 7 [116]Open in a new tab Expression of glycolysis- and intercellular-communication-related genes in hepatocellular carcinoma cells. (A) UMAP plot shows the expression pattern of the glycolytic pathway in cells. (B) The expression level of glycolytic-pathway-related genes in different cell types. (C) The intercellular ligand receptor interaction network displays the interactions between cell types Spatial transcriptome and cytof analysis Transcriptomic data revealed the spatial distribution of markers of glycolysis in the TME. The analysis revealed significantly higher infiltration levels of epithelial cells, CD8 Tex cells, CD8 Trm cells, monocytes, and macrophages in tumor tissues than in control tissues (Fig. [117]8). Of note, fibroblasts were more highly expressed in the leading-edge section than in both control and tumor tissues (Figures [118]S8). Spatial transcriptomics further illustrated the distribution of glycolysis-related prognostic markers in both tumor and control tissues (Fig. [119]9). Specifically, G6PD and KIF20A showed significantly elevated expression levels in the core region of the tumor, whereas AGL and GOT2 were highly expressed in the adjacent and control tissues. Fig. 8. [120]Fig. 8 [121]Open in a new tab The spatial transcriptome data displays the infiltration levels of immune cells in tumor, adjacent tissues, and control tissues Fig. 9. [122]Fig. 9 [123]Open in a new tab The spatial transcriptome data displays the expression levels of prognostic genes related to glycolysis in tumor, leading-edge section, and control tissues CyTOF analysis further confirmed the infiltration characteristics of different cell subpopulations. The results indicated that CD8 + Tex cells, CD8 + Trm cells, and monocytes were more abundant in tumor tissues than in the leading-edge sections, and their levels in the leading-edge sections exceeded those in the control tissues (Fig. [124]10). Fig. 10. [125]Fig. 10 [126]Open in a new tab Mass cytometry (CyTOF) analysis of the infiltration of CD8 + Tex cells, CD8 + Trm cells, and monocytes in tumor, leading-edge section, and control tissues Molecular experimental verification In the transcriptome sequencing data, the expression levels of AGL and GOT2 were lower and those of G6PD and KIF20A were higher in HCC tissues than in adjacent noncancerous tissues (Fig. [127]11A). Molecular experiments validated the expression of glycolysis-related prognostic markers. RT-qPCR results showed that the mRNA levels of G6PD and KIF20A were significantly elevated in HCC tissues compared with adjacent non-cancerous tissues, whereas AGL and GOT2 levels were lower in HCC tissues (Fig. [128]11B). Western blotting analysis confirmed that the protein levels of G6PD and KIF20A were notably upregulated in HCC tissues, whereas AGL and GOT2 expression levels were reduced compared to those in adjacent tissues (Fig. [129]11C). Immunohistochemistry demonstrated that the staining intensity of G6PD and KIF20A was markedly stronger in tumor tissues, whereas AGL and GOT2 were strongly expressed in adjacent tissues, but showed reduced expression levels in HCC tissues (Fig. [130]12). Fig. 11. [131]Fig. 11 [132]Open in a new tab Molecular experiments were conducted to detect the expression of prognostic genes related to glycolysis. (A) The expression levels of AGL, G6PD, GOT2, and KIF20A in transcriptome data. (B) Revers transcription-quantitative polymerase chain reaction (RT-qPCR) analysis of the mRNA levels of AGL, G6PD, GOT2, and KIF20A. (C) Western blotting analysis of the protein expression levels of AGL, G6PD, GOT2, and KIF20A. ** P < 0.01, *** P < 0.001 Fig. 12. [133]Fig. 12 [134]Open in a new tab Immunohistochemical detection of the protein expression levels of AGL, G6PD, GOT2, and KIF20A. Bar = 50 μm Discussion We comprehensively investigated the expression characteristics and roles of glycolysis-related genes in HCC by integrating transcriptomic, metabolomic, 16 S rRNA sequencing, single-cell RNA sequencing, spatial transcriptomic, and CyTOF data. The DEGs identified in HCC were predominantly enriched in pathways related to the cell cycle, metabolism, and glycolysis, underscoring the importance of metabolic reprogramming in HCC [[135]15, [136]16]. GSEA further confirmed the significant enrichment of gene sets related to glycolysis, cell cycle regulation, and oxidative stress in HCC tissues, suggesting that tumor cells rely on glycolysis to meet the demands of proliferation and survival [[137]17, [138]18]. Glycolysis not only provides essential energy and metabolic intermediates for tumor cells, but also influences tumor growth and metastasis by altering the TME [[139]5]. Metabolomic analysis revealed that the differential metabolites in HCC were mainly enriched in the digestive system, lipid metabolism, and amino acid metabolism pathways. The inhibition of lipid accumulation suppresses HCC growth, highlighting the potential of targeting lipid metabolism as a therapeutic approach and for treatment assessment [[140]19]. Increasing evidence suggests that amino acids promote tumorigenesis and tumor immunity by serving as nutrients, signaling molecules, and regulators of gene transcription and epigenetic modifications [[141]20]. Through association analysis between metabolites and DEGs, we found a significant correlation between metabolite levels and DEGs in the HCC microenvironment. Although the microbiome influences HCC progression, the tumor-associated microbial profile of HCC remains elusive. In the 16 S rRNA sequencing analysis, the composition of microbial communities was significantly different between HCC and control tissues. LEfSe analysis indicated that the relative abundance of Aenigmarchaeota was significantly elevated in HCC tissues, whereas that of Proteobacteria was notably reduced, suggesting a selective impact of the HCC microenvironment on microbial communities. Aenigmarchaeota possesses the metabolic potential to produce harmful secondary metabolites [[142]21], whereas Proteobacteria abundance has been associated with clinicopathological characteristics in patients with HCC [[143]22]. At the phylum level, the relative abundance of Proteobacteria is lower in tumor tissues than in non-tumor tissues [[144]23]. The accumulation of Proteobacteria is driven by diet and proteins secreted by the host cells, contributing to a state of low-level, chronic inflammation [[145]24]. Functional annotation results showed that the differentially expressed microbes were primarily associated with pathways involved in L-isoleucine, amino acid, and nucleotide synthesis, suggesting that these metabolic pathways may play critical roles in metabolic reprogramming in HCC. Glycolytic metabolism is closely linked to specific microbiome shifts in HCC, with Aenigmachaeota abundance positively correlated and Proteobacteria abundance negatively correlated with the glycolysis pathway. Members of the phylum Aenigmarchaeota have been identified in human samples [[146]25] and found to participate in pathways such as glycolysis, gluconeogenesis, the pentose phosphate pathway, and pyruvate metabolism [[147]26]. Increased abundance of Proteobacteria is typically associated with reduced glycolysis, as observed in inflammatory bowel disease [[148]27]. These findings provide important insights into the potential role of the microbiome in the metabolic reprogramming of HCC. Our microbiome analysis revealed significant compositional differences between HCC and adjacent non-cancerous tissues, suggesting that microbial alterations may play a role in HCC progression. Previous studies have demonstrated that gut and intrahepatic microbiome signatures are associated with liver disease progression [[149]28]. Microbially derived RNA fragments can enter the bloodstream through multiple mechanisms, such as direct secretion by bacteria or passive release from damaged host cells [[150]29]. The growing interest in cell-free RNA (cfRNA) analysis as a minimally invasive diagnostic tool raises the intriguing possibility of detecting microbiome-related RNA in cfRNA samples collected from patients’ blood [[151]30]. Furthermore, analyzing the functional potential of microbiome-derived cfRNAs may reveal microbial metabolic activities associated with tumor progression, immune evasion, or altered liver metabolism. Building on this analysis, we examined the prognostic roles of glycolysis-related genes in HCC using public datasets. Using a machine learning model, we identified 17 glycolysis-related genes and successfully constructed 101 prognostic prediction models. Among these, the StepCox[both] + RSF model demonstrated excellent predictive ability across multiple datasets, underscoring the importance of glycolysis-related genes in clinical prognostic assessments [[152]31]. This model offers a potential biomarker combination for identifying high-risk patients and supports the development of personalized treatment strategies. These findings suggest that glycolysis-related genes play a crucial role not only in metabolic adaptation in HCC, but also as prognostic markers, providing valuable information for survival predictions in patients with HCC [[153]32]. Glycolysis-related prognostic markers within the StepCox[both] + RSF model exhibited significant differential expression between HCC and control tissues. G6PD is transcribed through a p53-independent process, ultimately leading to metabolic reprogramming and tumorigenesis in tumor cells, making it a promising target for the diagnosis and treatment of tumors and metabolic disorders [[154]33]. During the G2 phase of the cell cycle, KIF20A accumulates in the nucleus, thereby promoting pathological hepatocyte proliferation. Silencing KIF20A suppresses HCC cell proliferation and enhances chemotherapeutic sensitivity [[155]34]. AGL expression is downregulated in HCC, resulting in the storage of glucose as glycogen, reflecting liver involvement in the disease [[156]35, [157]36]. Downregulation of GOT2 is associated with a poor prognosis in patients with HCC and increased Treg cell infiltration [[158]37]. These differential expression patterns highlighted metabolic reprogramming in HCC cells, enabling them to survive and proliferate in hypoxic and nutrient-limited microenvironments. Further, single-cell RNA sequencing analysis revealed significant infiltration of epithelial, CD8 + Tn, CD8 + Tex, and CD8 + Trm cells into HCC tissues, indicating their potential importance in the TME. CD8 + Tex cells, characterized by low responsiveness and functional impairment, are key components of the tumor immune microenvironment that can influence HCC progression and prognosis [[159]38]. CD8 + Tex cells pose a major challenge to current anti-cancer immunotherapies, as they lose the ability to produce anti-tumor cytokines [[160]39]. Research suggests that CD103^+CD8 + Trm cells represent a highly activated T cell subpopulation with tumor reactivity [[161]40]. Enrichment of PD-1^+CD8 + Trm cells in HBV^+ HCC tissues is closely associated with liver injury and fibrosis [[162]41]. Spatial transcriptomics and CyTOF analyses further validated the significantly higher infiltration levels of CD8 + Tex and CD8 + Trm cells in HCC tissues than in control tissues. Additionally, fibroblast levels was notably higher in the leading-edge section than in both the control and tumor tissues, suggesting a complex role within the TME. Fibroblasts are the most common components of the tumor stroma and the promote cancer progression [[163]42]. They secrete various cytokines or metabolites that impair immune cell function and facilitate tumor growth, invasion, and metastasis [[164]43]. The expression of glycolysis-related prognostic markers varied significantly across different HCC cell types. Higher expression levels of AGL, G6PD, GOT2, and KIF20A were found in CD8 + Tex cells, CD8 + Trm cells, macrophages, and monocytes, respectively. AGL is responsible for glycogen debranching, and its high expression levels in CD8 + Tex cells suggests a potential metabolic shift that supports exhaustion, limiting its cytotoxic potential against tumor cells. Given that CD8 + Trm cells are crucial for long-term immune surveillance in HCC, and G6PD upregulation may facilitate their persistence in the hypoxic and nutrient-deprived TME. Decreased expression levels of GOT2 in HCC macrophages may indicate a shift toward an M2-like, tumor-promoting state, as M2 macrophages suppress antitumor immune responses and promote angiogenesis. The expression of GOT2 in HCC macrophages may indicate an antitumor immune response and may promote angiogenesis [[165]44]. In HCC, high KIF20A expression levels may drive monocyte expansion and differentiation into tumor-associated macrophages, which contribute to an immunosuppressive TME [[166]45]. Spatial transcriptomics further illustrated the distribution patterns of these markers in tumor versus control tissues. G6PD and KIF20A levels were significantly elevated in the tumor core, whereas AGL and GOT2 exhibited higher expression levels in the leading-edge section and non-cancerous tissues. Molecular experiments validated the expression profiles of these glycolysis-related markers, providing further evidence of their crucial role in HCC. These findings suggest that glycolysis-related genes play a key role not only in metabolic reprogramming, but also in potentially influencing tumor progression by modulating the immune microenvironment. This study has several limitations. First, although we performed integrative analysis across multiple datasets, variations in data collection, processing, and normalization across different databases may have affected the consistency of the results. The relatively limited sample size may not fully represent the biological characteristics of the diverse populations and HCC subtypes. Second, the heterogeneity of HCC introduces substantial differences in the TME, metabolic state, and genetic background across patients, which potentially limits the generalizability of our findings to broader clinical settings. Additionally, the complexity of metabolite functions and interactions means that changes in individual metabolites may not fully capture shifts in tumor metabolic states, necessitating further mechanistic studies using functional experiments. The machine learning models constructed in this study demonstrated good predictive performance across different datasets; however, external validation remains a critical step. Future research should involve larger and more diverse sample populations, and integrate functional experiments and clinical validation to enhance our understanding of the biological significance of glycolysis in HCC and facilitate the translation of these research findings into clinical applications. Conclusions In summary, this study utilized multi-omics technologies to explore the metabolic characteristics of glycolysis in HCC and its interactions with microbial communities, immune cells, and the TME. These findings highlight the critical role of glycolysis in the progression of HCC and provide a valuable foundation for the development of biomarkers and personalized therapeutic strategies for HCC. Future studies should investigate the glycolytic pathway and its regulatory mechanisms to identify new therapeutic targets for liver cancer treatment. Electronic supplementary material Below is the link to the electronic supplementary material. [167]12967_2025_6421_MOESM1_ESM.tif^ (1.9MB, tif) Supplementary Material 1: Figure S1. Partial least squares discrimination analysis (PLS-DA) score scatter plot and ranking validation plot. (A) PLS-DA score in positive polarity mode. (B) PLS-DA score in negative polarity mode [168]12967_2025_6421_MOESM2_ESM.pdf^ (31.3KB, pdf) Supplementary Material 2: Figure S2. Network map of differentially expressed genes and metabolites. Circles represent genes, squares represent enzymes, hexagons represent metabolites, and diamonds represent pathways [169]12967_2025_6421_MOESM3_ESM.tif^ (4.7MB, tif) Supplementary Material 3: Figure S3. Phylogenetic relationships among horizontal species [170]12967_2025_6421_MOESM4_ESM.pdf^ (57.2KB, pdf) Supplementary Material 4: Figure S4. One hundred and one HCC prognostic prediction models constructed using 10 machine learning algorithms for predicting the 1-year survival rate of patients with HCC [171]12967_2025_6421_MOESM5_ESM.pdf^ (57.6KB, pdf) Supplementary Material 5: Figure S5. One hundred and one HCC prognostic prediction models constructed using 10 machine learning algorithms for predicting the 3-year survival rate of patients with HCC [172]12967_2025_6421_MOESM6_ESM.pdf^ (57.8KB, pdf) Supplementary Material 6: Figure S6. One hundred and one HCC prognostic prediction models constructed using 10 machine learning algorithms for predicting the 5-year survival rate of patients with HCC [173]12967_2025_6421_MOESM7_ESM.pdf^ (27.6KB, pdf) Supplementary Material 7: Figure S7. Evaluation of the predictive ability of clinical prediction models for StepCox [both] + RSF model and other studies. [174]12967_2025_6421_MOESM8_ESM.tif^ (41.2MB, tif) Supplementary Material 8: Figure S8. The spatial transcriptome data displays the infiltration levels of immune cells in tumor, adjacent, and control tissues [175]Supplementary Material 9^ (15.4KB, docx) Acknowledgements