Abstract

Background
Hepatocellular carcinoma (HCC) remains a major global health challenge, and the pathogenesis of HBV-induced HCC is incompletely understood. Because HCC is highly heterogeneous, more accurate prognostic models and reliable biomarkers are urgently needed. This study aimed to develop such tools for HBV-related HCC.

Methods
Protein expression profiles and clinicopathological data were obtained from the CPTAC (Clinical Proteomic Tumor Analysis Consortium) database. Differential protein expression analysis between HBV-induced HCC and normal tissues was performed, followed by univariate and multivariate Cox regression to construct a prognostic protein signature. The four-protein signature’s prognostic power was assessed using time-dependent receiver operating characteristic (ROC) curves, Kaplan-Meier analysis, multivariate Cox regression, and a nomogram. Expression levels of the four proteins were validated in clinical specimens of HBV-related HCC. The signature was further validated in an independent cohort.

Results
A novel four-protein signature (SMC2, SMC4, UBE2C, UHRF1) was established for prognostic prediction in HBV-related HCC. ROC curves demonstrated good survival prediction performance in both the CPTAC HCC cohort and the validation cohort. Transcriptomic and immunohistochemical (IHC) analyses revealed significant overexpression of these four proteins at both mRNA and protein levels in HCC tissues. The signature stratified patients into high-risk and low-risk groups with significantly different survival outcomes. Multivariate Cox regression analysis confirmed that the 4-protein signature independently predicted overall survival (OS). Furthermore, the signature showed strong discriminatory power between HBV-related HCC and normal tissue. Time-dependent ROC analysis indicated that the protein signature exhibited superior specificity and sensitivity compared to alpha-fetoprotein (AFP) for diagnosing HBV-associated HCC.
Conclusions
Our study establishes a novel 4-protein signature and nomogram for predicting OS in HBV-related HCC, potentially aiding individualized clinical treatment decisions.

Supplementary Information
The online version contains supplementary material available at 10.1007/s12672-025-03693-8.

Keywords: HBV, Hepatocellular carcinoma, Prognostic marker, Protein signature

Introduction
Primary liver cancer ranks as the sixth most common malignancy and the third leading cause of cancer-related death globally [1]. Hepatocellular carcinoma (HCC) is the predominant form, accounting for 75–85% of cases [2]. HCC development is influenced by genetic predisposition, viral infections, and environmental factors [3]. Key risk factors include metabolic syndrome, alcohol consumption, aflatoxin exposure, and hepatitis viruses (HBV and HCV), with HBV and HCV being the most significant [4]. HBV infects over 250 million people chronically worldwide. Infection can range from asymptomatic to causing liver failure and cancer [4]. HBV contributes to HCC development through integration into the host genome, proto-oncogene activation, and interactions between HBV proteins and host factors that dysregulate cell signaling pathways [5, 6]. Alcohol consumption further increases HCC risk in HBV patients by more than two-fold, primarily by accelerating liver fibrosis [4].

While HBV is a well-established risk factor for HCC, its precise molecular mechanisms remain incompletely understood. Current HCC treatments, including surgery and chemotherapy, have limited efficacy and high recurrence rates [7, 8]. Early detection and treatment are crucial for improving outcomes, particularly in HBV-related HCC, highlighting the urgent need for research into novel mechanisms and biomarkers.

This study analyzed proteomic data from 159 paired HBV-related HCC tumor and adjacent liver tissue samples obtained from the CPTAC database.
Data were normalized and grouped, and differentially expressed proteins (DEPs) were identified. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of DEPs were conducted to guide mechanistic investigations. Subsequently, Kaplan-Meier survival analysis, univariate and multivariate Cox regression, and time-dependent ROC analysis were employed to identify proteins predictive of overall survival (OS) in HCC patients. A prognostic risk score model based on protein expression was developed for OS prediction. Ultimately, the identified prognostic biomarkers derived from DEPs may facilitate targeted treatments and improved patient outcomes.

Materials and methods

Antibodies and reagents
Antibodies targeting SMC2 (ab10399) and UHRF1 (ab213223) were obtained from Abcam (Cambridge, MA). The antibody specific to SMC4 (PA5-56150) was sourced from Thermo Fisher Scientific (Waltham, MA). Paraformaldehyde, hydrogen peroxide, and diaminobenzidine were acquired from ZSGB-BIO (Beijing, China).

Proteomics data and preprocessing
Protein expression data and clinical information for HCC patients were retrieved from the CPTAC database (https://pdc.cancer.gov/pdc/) [9]. Data from 159 patients with complete information were combined for analysis (Supplementary Table 1) [10–13]. Proteomic data underwent quality control and normalization as described by Mertins et al. [14]. Specifically, the data were normalized by median centering across all proteins to account for variations in sample loading. Missing values were imputed by k-nearest neighbor (k-NN) imputation using the “impute” R package; proteins with more than 50% missing values were excluded to ensure sufficient data for imputation [9].

Identification of differentially expressed proteins (DEPs) and GO/KEGG enrichment analysis
The limma package was used to identify DEPs using thresholds of |log2FC| > 1 and p < 0.05.
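As an illustration of the preprocessing and thresholding steps described above, the following minimal Python sketch drops proteins with more than 50% missing values, median-centers each sample, and applies the |log2FC| > 1 and p < 0.05 cutoffs. It is not the pipeline used in this study, which was run in R with the impute and limma packages. The function names, the dictionary layout, and the toy values are assumptions made for the example, and the log2 fold changes and p-values are treated as given inputs; the sketch neither computes moderated statistics nor performs k-NN imputation.

```python
from statistics import median


def drop_sparse_proteins(matrix, max_missing=0.5):
    """Keep proteins whose fraction of missing values (None) is at most max_missing.

    matrix maps a protein name to a list of per-sample abundances (hypothetical layout).
    """
    return {
        protein: values
        for protein, values in matrix.items()
        if sum(v is None for v in values) / len(values) <= max_missing
    }


def median_center(matrix):
    """Subtract each sample's median (over observed proteins) from its values."""
    n_samples = len(next(iter(matrix.values())))
    sample_medians = [
        median(values[j] for values in matrix.values() if values[j] is not None)
        for j in range(n_samples)
    ]
    return {
        protein: [None if v is None else v - sample_medians[j]
                  for j, v in enumerate(values)]
        for protein, values in matrix.items()
    }


def select_deps(stats, lfc_cutoff=1.0, p_cutoff=0.05):
    """Split proteins into up- and downregulated DEPs.

    stats maps a protein name to (log2 fold change, p-value), e.g. taken from
    a differential expression table computed elsewhere.
    """
    up = sorted(p for p, (lfc, pval) in stats.items()
                if lfc > lfc_cutoff and pval < p_cutoff)
    down = sorted(p for p, (lfc, pval) in stats.items()
                  if lfc < -lfc_cutoff and pval < p_cutoff)
    return up, down


# Toy data (invented for illustration only)
toy = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [None, None, None, 1.0],   # 75% missing, so it is dropped
    "P3": [3.0, None, 5.0, 6.0],
}
centered = median_center(drop_sparse_proteins(toy))
up, down = select_deps({"A": (1.5, 0.01), "B": (-2.0, 0.001),
                        "C": (0.8, 0.001), "D": (3.0, 0.2)})
```

In this toy example only protein A passes both cutoffs as upregulated and only B as downregulated; C fails the fold-change cutoff and D fails the p-value cutoff.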
The clusterProfiler package was used for GO and KEGG enrichment analysis, and GOplot for visualization. GO analysis categorized DEPs into Biological Processes (BP), Cellular Components (CC), and Molecular Functions (MF). KEGG analysis identified pathways associated with DEPs. A significance threshold of p < 0.05 was applied.

Protein-protein interaction (PPI) network, module analysis, and hub protein identification
The STRING database (https://string-db.org/) was used to construct a PPI network of DEPs with an interaction score ≥ 0.9. The MCODE plugin identified functional modules within the PPI network using the following parameters: node score cutoff = 0.2, k-core = 2, max.depth = 100, degree cutoff = 2 [15]. The igraph package was used to visualize the core network of critical module genes.

Selection of prognostic and diagnostic proteins
Kaplan-Meier analysis assessed the association between DEP expression and HCC patient prognosis. Univariate and multivariate Cox proportional hazards regression models were used to determine hazard ratios (HRs) and 95% confidence intervals (CIs), evaluating the independent predictive value of clinicopathological parameters and DEP expression on survival (p < 0.05 considered significant). The area under the curve (AUC) of the time-dependent ROC for 1-, 3-, and 5-year OS was calculated for DEPs associated with prognosis. DEPs with prognostic relevance and AUC values > 0.7 were selected for further investigation.

Clinical validation of the 4-protein signature by immunohistochemistry (IHC)
Twenty paired HCC tumor and adjacent normal tissue samples were collected from untreated patients undergoing surgery at Hangzhou TCM Hospital Affiliated to Zhejiang Chinese Medical University between January 2024 and June 2025. Each sample was divided into two parts: one was paraffin-embedded for histopathology, and the other was frozen at −80 °C.
All patients had confirmed clinical and pathological HCC diagnoses; tumor stage and grade were defined according to UICC, AJCC, and WHO criteria. Paraffin-embedded tissues were sectioned at 4 μm. Sections were deparaffinized, rehydrated, and subjected to antigen retrieval. After PBS washing, endogenous peroxidase was blocked with 3% hydrogen peroxide for 10 min. Sections were then blocked with 5% BSA for 20 min. Primary antibodies (anti-SMC2 (1:50), anti-SMC4 (1:100), anti-UHRF1 (1:50 or 1:100)) were applied overnight at 4 °C. Signal detection used the streptavidin-peroxidase method with diaminobenzidine (DAB) as the chromogen, followed by hematoxylin counterstaining. Sections were examined and photographed under light microscopy. Scoring was performed independently by two blinded observers.

Construction of the protein signature and risk score calculation
Prognostic proteins (AUC > 0.7 in time-dependent ROC) were used to build a prognostic signature. An individual risk score for each HCC patient was calculated using a formula derived from the protein expression levels weighted by their regression coefficients in the multivariate Cox proportional hazards model. To identify the most prognostically relevant threshold that is robust to the underlying distribution of the risk score, the optimal cut-off value for stratifying patients was determined using the “surv_cutpoint” function (from the “survminer” R package), which maximizes the survival differences between groups based on the two-sided log-rank test statistic. This method is distribution-free and identifies the cut-point with the most significant association with survival outcomes. Subsequently, patients with HBV-related HCC were stratified into high-risk and low-risk cohorts based on this empirically derived threshold. Risk curve plots, survival status plots, and heatmaps visualized the patient risk distribution and protein expression patterns.
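The idea of choosing a cut-off that maximizes the log-rank separation between groups can be sketched in a few lines of Python. This is a simplified stand-in written for illustration, not the algorithm inside surv_cutpoint, which relies on maximally selected rank statistics: here every observed score is simply tried as a threshold and the one with the largest two-group log-rank chi-square is kept. The function names, the `min_prop` argument (the smallest allowed group fraction), the convention that scores strictly above the threshold are "high", and the toy data are all assumptions made for the example.

```python
def logrank_chi2(times, events, in_high):
    """Two-group log-rank chi-square statistic (one degree of freedom)."""
    observed = expected = variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for i in at_risk if in_high[i])
        deaths = [i for i in at_risk if times[i] == t and events[i]]
        d = len(deaths)
        observed += sum(1 for i in deaths if in_high[i])
        expected += d * n_high / n
        if n > 1:
            frac = n_high / n
            variance += d * frac * (1 - frac) * (n - d) / (n - 1)
    return (observed - expected) ** 2 / variance if variance > 0 else 0.0


def best_cutpoint(scores, times, events, min_prop=0.1):
    """Scan candidate thresholds and return (cutpoint, chi-square) with the largest statistic."""
    n = len(scores)
    best_cut, best_stat = None, -1.0
    for cut in sorted(set(scores))[:-1]:          # the largest score cannot split the cohort
        in_high = [s > cut for s in scores]
        n_high = sum(in_high)
        if min(n_high, n - n_high) < min_prop * n:
            continue                              # skip very unbalanced splits
        stat = logrank_chi2(times, events, in_high)
        if stat > best_stat:
            best_cut, best_stat = cut, stat
    return best_cut, best_stat


# Toy cohort (invented): low scores are censored late, high scores die early
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
times = [50, 48, 46, 44, 42, 2, 4, 6, 8, 10]
events = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
cut, stat = best_cutpoint(scores, times, events)
```

Because the threshold is selected to maximize the test statistic, the associated p-value is optimistic; this is one reason resampling-based checks of cut-point stability are often recommended for such data-driven thresholds.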
The timeROC R package was used for time-dependent ROC curve analysis to evaluate the signature’s discriminatory capacity over time.

Construction of a predictive nomogram
A nomogram incorporating independent prognostic factors identified by multivariate Cox regression was developed to predict 1-, 3-, and 5-year OS in HCC. Discriminatory ability was assessed using time-dependent ROC curves. Calibration curves plotted predicted probabilities against observed survival rates to evaluate accuracy. The performance of the nomogram was compared to models using single prognostic factors.

Statistical analysis
Statistical analyses were performed using R software version 3.6.3. The Wilcoxon test was used for two-group comparisons, and one-way ANOVA for comparisons across multiple groups of normalized data. Kaplan-Meier curves with log-rank tests and univariate Cox proportional hazards regression were used to calculate p-values, HRs, and 95% CIs. Statistical significance was set at p < 0.05 unless otherwise stated.

Results

Identification of DEPs in HCC
Compared to adjacent liver tissues, 418 DEPs were identified in HCC tissues: 109 upregulated and 309 downregulated (Supplementary Table 2).

Functional and pathway enrichment of DEPs
GO and KEGG analyses revealed the functional roles of DEPs. The top 10 enriched BPs were predominantly metabolic processes, including small molecule catabolic process, organic acid biosynthetic process, carboxylic acid biosynthetic process, and amino acid metabolic process. For CC, DEPs were significantly enriched in the mitochondrial matrix, collagen-containing extracellular matrix, peroxisome, microbody, collagen trimer, peroxisomal matrix, microbody lumen, CMG complex, DNA replication preinitiation complex, and MCM complex. Enriched MFs included oxidoreductase activity, iron ion binding, vitamin binding, monooxygenase activity, heme binding, tetrapyrrole binding, and arachidonic acid monooxygenase activity (Fig. 1A-B).
KEGG pathway analysis showed enrichment in retinol metabolism, carbon metabolism, steroid hormone biosynthesis, amino acid biosynthesis and metabolism, fatty acid metabolism, pyruvate metabolism, and glycolysis/gluconeogenesis (Fig. 1C-D).

Fig. 1. GO and KEGG pathway analysis of the differentially expressed proteins in HBV-related HCC. A GO enrichment analysis of the differentially expressed proteins. B Circular diagram of GO term enrichment among the differentially expressed proteins. C KEGG pathway enrichment analysis of the differentially expressed proteins. D Circular diagram of KEGG enrichment among the differentially expressed proteins.

PPI network construction and hub gene identification
The STRING database generated a DEP interaction network comprising 212 nodes and 1038 edges (Fig. 2A). MCODE analysis identified 30 hub proteins (Fig. 2B). The top 10 DEPs with the highest connectivity were CYP2E1, CYP3A4, AOX1, CYP1A1, CYP1A2, MCM2, MCM4, CDK1, TOP2A, and CYP2B6. CYP2E1, CYP3A4, AOX1, CYP1A1, CYP1A2, and CYP2B6 were predominantly downregulated (Fig. 2C-D), while TOP2A, CDK1, MCM4, and MCM2 were significantly upregulated (Fig. 2D).

Fig. 2. Identification of hub proteins in the protein-protein interaction (PPI) network. A PPI network of the differentially expressed proteins in HBV-related HCC, generated with the STRING online database. B The top 30 nodes in the PPI network based on their degree centrality.
C-D The correlation network of ten hub genes, with correlation coefficients represented by color.

Survival analysis, Cox regression, and time-dependent ROC identify four prognostic proteins in HBV-related HCC
Survival analysis in R, along with univariate and multivariate Cox analysis, was used to investigate DEP associations with OS in HBV-related HCC. Initially, 105 DEPs were associated with OS. Univariate analysis identified 95 DEPs significantly correlated with OS. Multivariate Cox regression identified 37 DEPs as independent prognostic factors (data not shown). Time-dependent ROC curve analysis revealed that only four DEPs (SMC2, SMC4, UBE2C, UHRF1) had AUC values > 0.7. Kaplan-Meier analysis demonstrated that high expression of SMC2, SMC4, UBE2C, and UHRF1 was significantly associated with poor prognosis in HBV-related HCC (Fig. 3A, E, I, M). Multivariate Cox regression including sex, age, tumor differentiation, tumor size, tumor thrombus, and the four proteins confirmed that SMC2, SMC4, UBE2C, and UHRF1 were independent prognostic factors (Fig. 3C, G, K, O). Time-dependent ROC analysis supported the ability of these proteins to predict 1-year, 3-year, and 5-year survival (Fig. 3D, H, L, P).

Fig. 3. Identification of four prognostic proteins in HBV-related HCC using Kaplan-Meier survival analysis, univariate and multivariate Cox regression analysis, and time-dependent receiver operating characteristic (ROC) analysis. A, E, I, M Kaplan-Meier survival curves for SMC2 (A), SMC4 (E), UBE2C (I), and UHRF1 (M) in HBV-related HCC. Forest plots of univariate and multivariate Cox regression analysis for SMC2 (B-C), SMC4 (F-G), UBE2C (J-K), and UHRF1 (N-O) in HBV-related HCC.
D, H, L, P Time-dependent ROC analysis of SMC2 (D), SMC4 (H), UBE2C (L), and UHRF1 (P) in HBV-related HCC.

Correlation between four prognostic protein levels and clinical characteristics
We examined the correlation between the expression levels of the four prognostic proteins and clinical parameters (histological grade, tumor thrombus, age). The highest expression levels of all four proteins were associated with grade 3 histological tumors. SMC2, SMC4, and UHRF1 expression was higher in tumors with thrombus compared to those without. UHRF1 expression was also higher in patients under 55 years old (Fig. 4A-H).

Fig. 4. Correlation between the expression of four prognostic proteins and clinicopathologic characteristics. A-D Scatter plots of the relationship between SMC2/4 expression and clinicopathologic features, such as histologic grade and tumor thrombus. E Correlation between UBE2C expression and the histologic grade of HBV-related HCC. F-H Box plots of UHRF1 expression levels in patients with HBV-related HCC, stratified by histologic grade, presence of tumor thrombus, and age. The grade I group is omitted from Fig. 4A, C, E, and F because it contained only a single patient.

Construction and validation of the 4-protein prognostic signature
Multivariate Cox analysis confirmed the prognostic significance of SMC2, SMC4, UBE2C, and UHRF1, associating high expression with increased risk (Fig. 5A). The final risk score incorporated the four protein biomarkers weighted by their Cox regression coefficients:

\[
\text{risk score} = 1.9106 \times \mathrm{Exp}_{\mathrm{SMC2}} - 2.0133 \times \mathrm{Exp}_{\mathrm{SMC4}} + 0.8297 \times \mathrm{Exp}_{\mathrm{UBE2C}} + 0.2065 \times \mathrm{Exp}_{\mathrm{UHRF1}}
\]

where $\mathrm{Exp}_{X}$ denotes the expression level of protein $X$. The risk score formula was applied to each patient.
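To make the calculation concrete, the short Python sketch below evaluates the risk score for one patient and assigns a risk group given a cutoff. The coefficients are those reported in the formula above; everything else (function names, the dictionary of expression values, the example numbers, and the convention that a score strictly above the cutoff counts as high risk) is an assumption made for illustration rather than the code used in this study.

```python
# Cox regression coefficients from the reported risk score formula
COEFFICIENTS = {
    "SMC2": 1.9106,
    "SMC4": -2.0133,
    "UBE2C": 0.8297,
    "UHRF1": 0.2065,
}


def risk_score(expression):
    """Weighted sum of the four protein expression levels.

    expression maps each protein name to its (normalized) expression value.
    """
    return sum(coef * expression[protein] for protein, coef in COEFFICIENTS.items())


def stratify(scores, cutoff):
    """Label each score 'high' if it is strictly above the cutoff, otherwise 'low'."""
    return ["high" if s > cutoff else "low" for s in scores]


# Hypothetical patient with unit expression for every protein
example = {"SMC2": 1.0, "SMC4": 1.0, "UBE2C": 1.0, "UHRF1": 1.0}
score = risk_score(example)   # equals the sum of the four coefficients
groups = stratify([0.1, 0.5, 0.9], cutoff=0.5)
```

With unit expression for all four proteins the score is simply the sum of the coefficients, 0.9335; note that SMC4 enters the sum with a negative weight.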
Using the median risk score as the cutoff (determined by survminer), patients were stratified into high-risk (n = 71) and low-risk (n = 72) groups (Fig. 5B). Higher risk scores correlated with increased patient mortality (Fig. 5C). A heatmap confirmed significant overexpression of the four proteins in the high-risk group (Fig. 5D). Kaplan-Meier analysis revealed significantly worse OS in the high-risk group compared to the low-risk group (Fig. 5E). Time-dependent ROC analysis for the signature predicting 1-, 3-, and 5-year OS yielded AUC values of 0.775, 0.670, and 0.836, respectively (Fig. 5F), indicating sensitive and specific prognostic capability.

Fig. 5. Development of a prognostic protein signature and assessment and validation of its predictive capability. A Development of a protein signature through prognostic analysis in HBV-related HCC. B Distribution and median value of risk scores within the cohort of patients with HBV-related HCC. C Survival outcomes of patients with HBV-related HCC, showing a positive correlation between mortality and higher risk scores. D Heatmap of the expression profiles of the four prognostic proteins in the low- and high-risk groups. E Kaplan-Meier survival analysis of the four-protein signature. F Time-dependent ROC curve analysis of the four-protein signature.

Analysis of TCGA data revealed significant mRNA overexpression of SMC4, UBE2C, and UHRF1 in HCC tissues (p < 0.05) (Fig. 6B-D), but not SMC2 (Fig. 6A). However, analysis of RNA-seq data from Gao et al. confirmed SMC2 mRNA overexpression in HBV-related HCC compared to adjacent tissues (Fig. 6E). IHC analysis demonstrated significant protein overexpression of SMC2, SMC4, and UHRF1 in HBV-related HCC tissues (Fig. 7A-I and Supplementary Fig. 1–10).
Supporting data from The Human Protein Atlas also showed marked overexpression of SMC4 and UBE2C proteins in HCC tissues (Fig. 7J-K).

Fig. 6. Validation of mRNA expression of the prognostic protein signature in HCC. A-D mRNA levels of SMC2, SMC4, UBE2C, and UHRF1 in HCC and normal tissues from the TCGA cohort. E SMC2 mRNA levels in HBV-related HCC and adjacent non-tumor liver tissues, quantified using RNA-Seq (n = 159); these data were derived from the study by Gao et al. [9]. Data are depicted as mean ± SE. *, p < 0.05.

Fig. 7. Immunohistochemistry (IHC) confirmed the expression of the prognostic protein signature in HCC. A, D, G Representative IHC images of SMC2 expression in thyroid cancer (A), SMC4 in nephridial tissue (D), and UHRF1 in colorectal cancer (G), serving as positive controls. B, E, H IHC staining of SMC2 (B), SMC4 (E), and UHRF1 (H) in HBV-related HCC and adjacent normal tissues, with images provided at 100× and 400× magnifications. C, F, I H-score statistics calculated for 20 (C, F) or 5 (I) paired specimens. J-K Compared to normal liver tissue, SMC4 (J) and UBE2C (K) expression is higher in HCC, according to The Human Protein Atlas database.

Construction and validation of a predictive nomogram
Univariate and multivariate Cox regression confirmed that the 4-protein signature risk score, alongside tumor grade, tumor thrombus, and AFP level, were independent prognostic variables for OS in HBV-related HCC (Fig. 8A-B). A composite nomogram incorporating these factors was developed to predict 1-year, 3-year, and 5-year OS probabilities (Fig. 8C). Calibration plots showed good agreement between nomogram predictions and actual survival outcomes (Fig. 8D).
The AUC values for the nomogram predicting 3-year and 5-year OS surpassed those of the individual risk score, AFP, tumor thrombus, or grade alone (Fig. 8F-G), indicating superior predictive performance.

Fig. 8. Identification of independent prognostic parameters and development and validation of a protein-based prognostic model. A Forest plot of the univariate Cox regression analysis in HBV-related HCC. B Forest plot of the multivariate Cox regression analysis in HBV-related HCC. C The nomogram incorporates the four-protein risk score, AFP level, presence of tumor thrombus, and tumor grade. D Calibration plot of the nomogram, assessing the agreement between predicted 1-, 3-, and 5-year OS rates and actual outcomes in patients with HBV-related HCC. E-G Time-dependent ROC curves for the nomogram, risk score, AFP, tumor thrombus, and grade in predicting 1-, 3-, and 5-year OS in HBV-related HCC.

Discussion
HCC remains a significant global public health burden. While demographic and clinical factors (age, sex, BMI, TNM stage, vascular invasion, tumor thrombus, AFP) offer some prognostic value [16], the marked heterogeneity of HCC underscores the need for novel biomarkers and more precise prognostic models. For example, Li et al. [17] utilized portal venous and hepatic arterial coefficients to predict overall and recurrence-free survival following hepatectomy in HCC patients. Similarly, Xiang et al. [18] conducted an extensive review on the application of the PANoptosis signature (a novel inflammatory cell death pathway) in forecasting therapeutic response and prognosis in HCC. The integration of molecular signatures with clinical parameters appears promising as a strategy to address the heterogeneity of HCC, potentially offering enhanced predictive capabilities compared to individual biomarkers [19].
Furthermore, the analysis of liver cancer biopsy specimens through immunohistochemistry (IHC) has revealed aberrant protein expression profiles as promising tools for cancer prognosis [20–22].

This study identified a novel 4-protein signature (SMC2, SMC4, UBE2C, UHRF1) for prognostic prediction in HBV-related HCC. Importantly, high expression of each of these proteins was a significant risk factor associated with poor patient outcomes. Furthermore, a nomogram integrating this signature with traditional prognostic factors proved highly effective for predicting both short-term (1-year, 3-year) and long-term (5-year) survival. Collectively, these findings indicate that a risk model based on this 4-protein signature is a valuable prognostic tool for survival assessment in HBV-related HCC.

This study demonstrated that the expression levels of SMC2, SMC4, UBE2C, and UHRF1 increase significantly as the degree of tumor differentiation decreases. However, among the 159 cases of HBV-related HCC detailed in Supplementary Table 1, only one patient was classified as Grade I. Consequently, it is currently not feasible to ascertain whether these four proteins can serve as early diagnostic markers for HBV-related HCC. Furthermore, Supplementary Table 1 lacks data on the TNM staging of HBV-related HCC patients. Future research should classify HBV-related HCC patients according to their clinical data, including tumor size, number, vascular invasion, liver function, and the patient’s physical condition, using the BCLC (Barcelona Clinic Liver Cancer) and CNLC (Chinese Liver Cancer) staging systems.
By analyzing the correlation between the expression levels of SMC2, SMC4, UBE2C, and UHRF1 proteins and the staging of HBV-related HCC patients, we can provide scientific evidence to determine whether these proteins can be utilized as early diagnostic markers for HBV-related HCC and to elucidate their association with the occurrence and progression of HBV-related HCC.

Tumor thrombus is a prevalent complication of cancer that significantly affects both the survival duration and quality of life of patients [23]. When tumor thrombus develops in critical tissues and organs, it poses a heightened risk to patient survival, potentially leading to rapid mortality [23–25]. Consequently, the accurate diagnosis of tumor thrombus in patients with liver cancer, along with precise localization, is essential for ensuring safe and effective treatment. For example, a systematic review and single-arm meta-analysis by Wang et al. [26] demonstrated that local ± systemic therapy offers superior long-term OS and manageable complications for HCC with hepatic vein tumor thrombus (HVTT), inferior vena cava tumor thrombus (IVCTT), and/or right atrium tumor thrombus (RATT), compared to surgery with adjuvant therapy or surgery alone. Globally, treatment strategies for HCC with portal vein tumor thrombus (PVTT) exhibit considerable variation. Although guidelines recommend systemic therapies such as sorafenib and immune checkpoint inhibitors, their efficacy remains limited in cases involving PVTT [27]. Emerging evidence increasingly supports the use of aggressive local treatments, including liver resection, transplantation, radiation, and transarterial chemoembolization (TACE), for select patient populations [27]. Notably, this study identified elevated expression levels of SMC2, SMC4, and UHRF1 in HBV-related HCC patients with tumor thrombus compared to those without.
Moreover, multivariate Cox regression analysis demonstrated that tumor thrombus serves as an independent prognostic factor for patients with HBV-related HCC. Consistent with our findings, Yang et al. [28] demonstrated that CCL20 is overexpressed in HCC with bile duct tumor thrombus and is inversely correlated with surgical outcomes. Furthermore, Li et al. [29] reported higher expression of CXCR4 in PVTT tissue relative to HCC tissue; however, the invasion capacity of PVTT cells was significantly reduced (p < 0.05) following the downregulation of CXCR4 expression [29]. These findings highlight the need to investigate whether SMC2, SMC4, and UHRF1 overexpression increases HCC invasiveness, in which case these proteins could serve as diagnostic markers and therapeutic targets for HBV-related HCC with tumor thrombus.

SMC proteins, highly conserved ATPases, are essential for chromosome structure, dynamics, and processes like mitosis, gene regulation, and DNA repair, and are implicated in tumorigenesis [30, 31]. The SMC2-SMC4 heterodimer plays a crucial role in chromosome dynamics [32, 33]. SMC4 is overexpressed in lung adenocarcinoma and functions as an independent prognostic marker; its knockdown significantly inhibits A549 cell proliferation and invasion [34]. Consistent with Zhou et al., we observed significant SMC2 and SMC4 protein overexpression in HCC tissues by IHC (Fig. 7B-C, E-F). Zhou et al. also demonstrated SMC4’s association with tumor size, differentiation, TNM stage, and vascular invasion in HCC, and its role in modulating the miR-219-SMC4-JAK2/Stat3(Tyr705) signaling axis to promote hepatoma cell proliferation and invasion [35]. Aberrant SMC4 expression is also reported in prostate cancer [36], and it serves as a potential indicator for breast cancer chemotherapy response [37] and prognosis [38].
UBE2C, which encodes a ubiquitin-conjugating enzyme, plays a key role in mitotic cyclin degradation and cell cycle progression, contributing to cancer development [39, 40]. UBE2C overexpression is observed in 27 tumor types, including breast, colorectal, lung, and liver cancer, and is strongly associated with poor patient prognosis [41, 42]. Its dysregulation promotes chromosomal instability and facilitates progression in gastric cancer, NSCLC, and melanoma [43–45]. Wei et al. confirmed that UBE2C overexpression in HCC tissues and cell lines correlates with poor prognosis [46]; our study corroborates UBE2C mRNA and protein overexpression in HCC (Figs. 6C and 7K). These findings underscore UBE2C’s significance as a critical oncogene in HCC and other cancers.

UHRF1, a key epigenetic regulator, maintains promoter methylation of tumor suppressor genes, suppressing their expression and influencing tumorigenesis, treatment response, and prognosis [47, 48]. Dysregulated UHRF1 expression, characterized by elevated levels throughout the cell cycle, significantly contributes to tumor development, metastasis, and recurrence [49]. IHC and microarray analyses confirm UHRF1 upregulation in lung, gastric, prostate, and colorectal cancers [50–52]. Consistent with this, our IHC analysis showed marked UHRF1 overexpression in the nuclei of hepatoma cells compared to adjacent normal liver (Supplementary Fig. 1). UHRF1 thus represents a potential pan-cancer biomarker; reliable detection methods could benefit cancer diagnosis and prognosis evaluation.

To our knowledge, prognostic models and nomograms based on protein expression signatures in HBV-related HCC are rarely reported. Our novel model demonstrated efficacy in predicting HBV-related HCC outcomes. The roles of SMC2, SMC4, UBE2C, and UHRF1 in tumorigenesis are well-supported in the literature, as summarized above.
While we have not explored the mechanisms in depth here, potential regulatory pathways can be inferred from the known functions of these proteins. Further experiments are warranted to investigate their specific roles in HBV-related HCC pathogenesis.

Limitations of the study
Our study is subject to several limitations. Firstly, the CPTAC cohort predominantly consists of White or Asian patients, necessitating caution when generalizing findings to other ethnic groups. Secondly, we did not perform bootstrap internal validation to evaluate the variability of the optimal cutoff point for the risk score, which is a limitation of this exploratory study. Thirdly, it is crucial to perform external validation of the signature and nomogram in an independent cohort. Fourthly, the current analyses are primarily descriptive. Thus, functional experiments are required to elucidate the underlying mechanisms of these four proteins in HBV-related HCC. For instance, techniques like CRISPR knockout can assess the impact on cell line sensitivity to treatments like sorafenib and PD-1 inhibitors. Additionally, experiments involving nude mouse tumor formation or tail vein injection could be conducted to investigate the effects of gene silencing or overexpression of these proteins on the initiation, progression, and metastasis of HCC. For clinical application, future research should use fluorescence in situ hybridization (FISH) or IHC to measure protein levels in biopsies, and enzyme-linked immunosorbent assay (ELISA) or chemiluminescence immunoassay (CLIA) for blood samples. Additionally, findings should be validated with a larger, independent cohort of HCC patients, including both HBV-related and non-HBV-related cases.

Conclusions
Our study establishes a novel 4-protein signature and nomogram for predicting OS in HBV-induced HCC, offering potential guidance for personalized clinical treatment decisions.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary Material 1 (21.9KB, xls)
Supplementary Material 2 (34.3KB, xls)
Supplementary Material 3 (28.4MB, ppt)

Acknowledgements