Abstract

Background

   The mortality rate of hepatocellular carcinoma (HCC) remains high
   worldwide despite surgery and chemotherapy. Immunotherapy is a
   promising treatment for the rapidly expanding HCC spectrum. Therefore,
   it is necessary to further explore the immune-related characteristics
   of the tumour microenvironment (TME), which plays a vital role in
   tumour initiation and progression.

Methods

   In this research, 866 immune-related differentially expressed genes
   (DEGs) were identified by integrating the DEGs of samples from The
   Cancer Genome Atlas (TCGA)-HCC dataset and the immune-related genes
   from databases (InnateDB; ImmPort). Afterwards, 144 candidate
   prognostic genes were defined through weighted gene co-expression
   network analysis (WGCNA).

Results

   Seven immune-related prognostic DEGs were identified using the
   L1-penalized least absolute shrinkage and selection operator (LASSO)
   Cox proportional hazards (PH) model, and the ImmuneRiskScore model was
   constructed on this basis. The prognostic index of the ImmuneRiskScore
   model was then validated in the relevant dataset. Patients were divided
   into high- and low-risk groups according to the ImmuneRiskScore.
   Differences in the immune cell infiltration of patients with different
   ImmuneRiskScore values were clarified, and the correlation of immune
   cell infiltration with immunotherapy biomarkers was further explored.

Conclusion

   The ImmuneRiskScore of HCC could be a prognostic marker and can reflect
   the immune characteristics of the TME. Furthermore, it provides a
   potential biomarker for predicting the response to immunotherapy in HCC
   patients.

Supplementary Information

   The online version contains supplementary material available at
   10.1186/s12885-021-08052-8.

   Keywords: Hepatocellular carcinoma, Biomarker, Prognostic, Immune

Background

   Hepatocellular carcinoma (HCC) is one of the most common malignancies
   [[51]1, [52]2]. With a 5-year survival rate of 18%, HCC is the second
   most lethal tumour after pancreatic cancer [[53]3] and the fourth
   leading cause of cancer-related mortality worldwide [[54]4, [55]5]. HCC
   is the main type of primary liver cancer, and its increasing mortality
   rate is receiving growing concern. However, conventional treatments
   such as radiotherapy and chemotherapy do not significantly prolong
   overall survival (OS) in HCC patients [[56]6].

   Immunotherapy with immune checkpoint inhibitors (ICIs) is also an
   important treatment option. These therapies are revolutionizing the
   clinical treatment pattern of multiple tumours, most notably advanced
   melanoma [[57]7–[58]13], non-small-cell lung cancer [[59]14, [60]15]
   and renal cell carcinoma (RCC) [[61]16, [62]17]. Since HCC benefits
   from programmed cell death protein 1 (PD-1) pathway blockade [[63]18],
   approved ICIs may be used in alternative HCC treatment strategies in
   the near future [[64]6, [65]19, [66]20]. Although significant progress
   has been achieved with ICIs, only a small number of patients can
   benefit from them [[67]21]. Therefore, there is an urgent need for new,
   immune-based biomarkers to identify HCC patients who may have better
   prognoses after immunotherapy.

   Immune cells play an important role in the HCC microenvironment and
   show clinicopathological significance in predicting prognosis and
   treatment efficacy [[68]22–[69]24]. The characteristics of the tumour
   microenvironment (TME) and their functional impact on immunotherapy are
   actively being studied.

   In this study, we made full use of The Cancer Genome Atlas (TCGA) data
   and an a priori set of immune-related genes to construct a prognostic
   immune risk score using weighted gene co-expression network analysis
   (WGCNA) and the least absolute shrinkage and selection operator (LASSO)
   Cox model. We also analysed the correlation between the ImmuneRiskScore
   and the infiltration level of different immune cells to clarify
   potential mechanisms for the formation of the microenvironment.
   Finally, we explored its relevance to other immune biomarkers and its
   potential to identify patients eligible for immunotherapy to improve
   the therapeutic effects. The overall process is shown in Fig. [70]1.

Fig. 1.

   [71]Fig. 1
   [72]Open in a new tab

   Overview of the experiments

Methods

Data download and processing

   We downloaded the RNA sequencing (RNA-seq) expression profile (count
   and fragments per kilobase of transcript per million mapped reads
   (FPKM) format) and clinical data of the TCGA-HCC dataset from the
   University of California, Santa Cruz (UCSC) Xena data portal
   ([73]https://xena.ucsc.edu/), which contains information on 50 normal
   samples and 374 tumour samples. An immune-related gene set containing
   1052 immune genes was downloaded from InnateDB
   ([74]https://www.innatedb.ca/) (Table [75]S1), and a gene set
   containing 1811 immune-related genes was downloaded from ImmPort
   [[76]25] ([77]https://www.immport.org) (Table [78]S2). Expression
   profiling data and clinical data were obtained from the [79]GSE14520 by
   the GEOquery [[80]26] package of Bioconductor in R-3.5.2. [81]GSE14520
   contains 162 tumour samples after removing the normal samples. The
   microarray probe IDs were mapped to gene symbols based on the
   [82]GPL3921 platform (Affymetrix HT Human Genome U133A Array) and
   incorporated in the dataset matrix for each dataset. Eventually, the
   average of multiple probes that correspond to a single gene for each
   dataset was calculated individually using in-house R scripts. The
   tumour mutation burden (TMB) data was downloaded from the PanCancer
   Atlas ([83]https://www.cell.com/consortium/pancanceratlas).

   To further verify the predictive performance of the ImmuneRiskScore on
   immunotherapy, we collected the gene expression profiles and clinical
   response information of 33 RCC patients who received ICI therapies
   [[84]27]. In addition, we collected the RNA-seq data of 65 melanoma
   patients treated with anti-PD-1 or anti-PD-L1 therapies [[85]28,
   [86]29]. For each dataset, we standardized the transcriptome data
   across patients by max-min normalization.

Differentially expression genes

   Limma [[87]30] packages in R-3.5.2 were employed to identify
   differentially expressed genes (DEGs) between tissue adjacent to
   cancerous (n = 50) and cancer (n = 367) patients based on the raw
   counts for HCC gene expression from the TCGA. The empirical Bayes
   method was applied in the limma package to select significant DEGs.
   Here, the standard comparison mode was employed, and the threshold was
   treated as p-value < 0.05 and |log2-fold change| > = 1.5.

Gene ontology (GO) and pathway enrichment analysis of DEGs

   In this research, the clusterProfiler package [[88]31] was used to
   identify and visualize the GO terms (including biological process,
   cellular component, and molecular function terms) and Kyoto
   Encyclopedia of Genes and Genomes (KEGG) pathways enriched for DEGs. We
   set a p-value < 0.01 as the cut-off criterion and BH as the significant
   adjustment method; the cut-off for the q-value was also set as 0.01.

WGCNA

   The transcript FPKM data was used for WGCNA [[89]32]. First, the
   original data was preprocessed to identify samples with missing data
   and exclude outliers. Second, the soft-thresholding power was selected
   with the pickSoftThreshold package, which can calculate the scaleless
   topological fitting index for several powers and provide the
   appropriate soft-thresholding power for network construction. Third, we
   performed one-step automatic network construction and module detection.
   Adjacent relationships were converted into topological overlaps to
   measure the network connectivity of a gene as the sum of its
   connectivity to its neighbours and to all other genes to generate a
   network. A hierarchical clustering function was used to divide genes
   with similar expression profiles into several modules [[90]33]. Next,
   key modules related to OS were selected and visualized by Cytoscape
   [[91]34]. In the present study, we calculated the correlation between
   module eigengenes and the clinical traits survival events and survival
   time to determine the relevant module. Then, through linear regression
   analysis of gene expression and clinical information, we defined gene
   significance (GS) as the log10 transformation of the P-value
   (GS = lgP). In a module, the module significance (MS) was defined as
   the average GS for all the genes. Hub genes were identified as those
   with high clinical trait significance (> 0.1) and high intramodular
   connectivity (> 0.5) in relevant modules, and they were selected as
   candidate genes for further analysis and validation.

Gene set enrichment analysis of hub genes

   Gprofiler2 ([92]https://CRAN.R-project.org/package=gprofiler2) was used
   to perform overrepresentation analysis on input HCC hub genes [[93]35].
   It maps these immune genes to known sources of functional information
   and detects significantly enriched terms. We included pathways from
   KEGG ([94]https://www.genome.jp/kegg/), Reactome
   ([95]https://reactome.org/), and CORUM
   ([96]http://mips.helmholtz-muenchen.de/corum/). This method can obtain
   p-values by the hypergeometric test and perform false discovery rate
   (FDR) correction for multiple testing.

Construction and validation of an immune-related prognostic model

   The univariate Cox proportional hazards (PH) regression model in the
   ‘survival’ package was used to calculate the hazard ratio (HR) for DEGs
   of the HCC cohort. DEGs with significance (p-values < 0.05) were
   analysed, and their survival risks were recalculated using the Cox
   regression model from the glmnet R package [[97]35] so that the most
   important prognostic genes were selected. The regularization path of
   the LASSO method was calculated by setting the regularization parameter
   lambda to 1. To predict patient survival, the formula for the
   ImmuneRiskScore model was established as follows:
   [MATH: <mtext>ImmuneRiskScore</mtext><mo>=</mo><mo>∑</mo><mfenced
   close=")" open="("><mrow><mtext mathvariant="italic">Normalized
   expression value of gene</mtext><mspace width="0.25em"></mspace><mi
   mathvariant="italic">Gi</mi><mo>×</mo><mtext
   mathvariant="italic">LASSO</mtext><mspace width="0.25em"></mspace><mi
   mathvariant="italic">cox</mi><mspace width="0.25em"></mspace><mtext
   mathvariant="italic">coefficient of gene</mtext><mspace
   width="0.25em"></mspace><mi mathvariant="italic">Gi</mi><mspace
   width="0.25em"></mspace></mrow></mfenced> :MATH]

   Subsequently, we obtained the ImmuneRiskScore of 367 TCGA HCC patients
   and determined the threshold for the high- and low-risk groups based on
   the average.

Estimation of immune cell type fractions and ESTIMATE score

   CIBERSORT [[98]36] and LM22 reference gene expression matrices were
   employed to quantify the cell composition of different HCC samples. The
   normalized gene expression data were analysed using the CIBERSORT
   algorithm with 1000 permutations. Afterwards, immune and stromal scores
   were calculated with ESTIMATE, an algorithm that provides information
   on the abundance of these cell types in tumour tissues [[99]37].

Statistical analysis

   The survival curves of patients stratified according to the expression
   of hub immune genes were generated via the Kaplan-Meier method, and the
   statistical significance of differences was determined by the log-rank
   test. Receiver operating characteristic (ROC) curves were applied to
   assess the sensitivity and specificity of survival prediction based on
   the ImmuneRiskScore, and the pROC package was utilized to quantify the
   area under the curve (AUC). The nonparametric Mann-Whitney-Wilcoxon
   test was used to compare the data from different groups, and Pearson’s
   chi-square test was performed to measure the level of significance for
   associations among variables. All statistical analyses were performed
   using R-3.5.2. All reported p-values were two-tailed, and p < 0.05 was
   considered to indicate statistical significance.

Results

Identification of immune-related DEGs

   From the TCGA database, we obtained the expression profiles of 417 HCC
   samples, including 367 tumour samples and 50 normal samples after data
   preprocessing. A total of 7194 genes were identified as DEGs with the
   threshold of p-value < 0.05 and |log2-fold change| > 1.5; of the genes,
   3657 were upregulated and 3537 were downregulated (Fig. [100]2a, Table
   [101]S3). The samples were well clustered into normal and tumour groups
   when the top 200 DEGs were selected for unsupervised hierarchical
   clustering (Fig. [102]2b). To obtain the immune-related DEGs for HCC
   samples, we had to select genes that were both immune-related and
   differentially expressed in the two groups. Thus, we collected 2542
   human immune-related genes (1811 immune-related genes from ImmPort and
   1052 innate immune-related genes from InnateDB). We overlapped the DEGs
   with immune-related genes and selected 866 immune-related DEGs
   (Fig. [103]2c, Table [104]S4) for further analysis.

Fig. 2.

   [105]Fig. 2
   [106]Open in a new tab

   Identification of DEGs in 417 HCC patients. a Volcano plot of 7194
   DEGs. The upper-left brown and upper-right blue dots represent genes
   that are down- and upregulated in HCC, respectively. b Unsupervised
   hierarchical clustering heatmap for the top 200 DEGs ranked by fold
   change. Red: upregulated DEGs; navy blue: downregulated DEGs. c Venn
   diagram of immune genes and DEGs

   The results of GO term enrichment analysis varied according to GO
   classification and differences in the expression of immune-related
   DEGs. In terms of biological processes, the upregulated genes were
   significantly enriched in cell chemotaxis, positive regulation of
   cytokine production, leukocyte migration, etc., and the downregulated
   genes were enriched in the hormone-mediated signalling pathway, the
   steroid hormone mediated signalling pathway, etc. Regarding cellular
   components, upregulated immune-related DEGs were significantly enriched
   on the external side of the plasma membrane, major histocompatibility
   complex (MHC) protein complex, cytoplasmic vesicle lumen, vesicle lumen
   etc., and the downregulated DEGs were significantly enriched in the RNA
   polymerase II transcription factor complex, nuclear transcription
   factor complex, and transcription factor complex. In terms of molecular
   function, the upregulated immune-related DEGs were significantly
   enriched in receptor ligand activity, cytokine activity, etc. and the
   downregulated DEGs were significantly enriched in receptor ligand
   activity, steroid hormone receptor activity, and nuclear receptor
   activity. More detailed GO enrichment analysis results are shown in
   Fig. [107]3a, b and Table [108]S5. These significantly enriched
   pathways and terms help us better understand the role of DEGs in the
   HCC immune microenvironment.

Fig. 3.

   [109]Fig. 3
   [110]Open in a new tab

   Enrichment analysis of the immune-related DEGs. a GO results of
   upregulated immune-related DEGs. b GO results of downregulated
   immune-related DEGs. c KEGG analysis of immune-related DEGs. Red bars
   represent upregulated DEGs, and blue bars represent downregulated DEGs.
   d DEGs (y axis) significantly enriched in GO terms and KEGG pathways (x
   axis). Pink: GO terms enriched for the upregulated DEGs; purple: GO
   terms enriched for the downregulated DEGs; orange: KEGG pathways
   enriched for the upregulated DEGs; green: KEGG pathways enriched for
   the downregulated DEGs

   Ten significantly enriched KEGG pathways for the upregulated genes are
   shown in Fig. [111]3c and Table [112]S6: cytokine-cytokine receptor
   interaction, viral protein interaction with cytokine and cytokine
   receptor, natural killer (NK) cell-mediated cytotoxicity, the chemokine
   signalling pathway and rheumatoid arthritis. The top 10 KEGG and GO
   enrichment results for the up- and downregulated DEGs are detailed in
   Fig. [113]3d, and most of the genes enriched in the top five KEGG
   pathways and GO terms were upregulated genes. As shown in Fig. [114]3d,
   168 upregulated DEGs and 11 downregulated DEGs were enriched in
   cytokine-cytokine receptor interactions, which indicated that there may
   be a complicated molecular mechanism in the HCC immune
   microenvironment.

Weighted co-expression network construction and key modules identification

   The 866 immune-related DEGs from 367 HCC tumour samples were used to
   construct the gene co-expression network. After handling the missing
   values, we detected the outlier samples by hierarchical clustering
   (Figure [115]S1), and the dendrogram showed the 5 outlier samples that
   were removed from the analysis.

   performed network topology analysis for thresholding powers from 1 to
   20, and 4 was the lowest power with a scale-free topology fit index of
   0.85 (Fig. [116]4a). We obtained a gene clustering tree using
   hierarchical clustering of topological overlap measure (TOM)-based
   dissimilarity and identified 6 modules (Fig. [117]4b, Table [118]1). To
   select the clinically significant modules, we used WGCNA to calculate
   the correlations between the external clinical information and gene
   modules. As shown in Fig. [119]4c, the green module was the most
   associated with OS, and the green and brown eigengenes were highly
   relevant (Figure [120]S2).

Fig. 4.

   [121]Fig. 4
   [122]Open in a new tab

   WGCNA and enrichment analysis. a Scale-free fit index analysis for
   various soft-thresholding powers, cut-off > 0.85. The mean connectivity
   analysis for various soft-thresholding powers. b The cluster dendrogram
   of DEGs. In the figure, each branch represents one gene, and each
   colour indicates a co-expression module. The grey refers to genes that
   cannot be classified in any module. c Heatmap of the correlation
   between module eigengenes and the clinical traits of HCC. The green
   module was most relevant to the OS events and OS time. d The top 50 hub
   genes in the green module. The genes are represented as nodes; node
   size is related to connectivity of the gene by degree, and edge size is
   related to weight. e Enrichment analysis of immune hub genes. Left: The
   x axis represents the functional terms that are grouped and
   colour-coded according to data sources. The y-axis indicates the
   adjusted p-values on a negative-log10 scale. Every circle is one term
   and is sized according to the term enrichment degree. Right: the table
   of interesting enrichment results

Table 1.

   The number of genes in different modules
    Module   The number of genes
   Blue      302
   Brown     147
   Green     139
   Red       86
   Turquoise 389
   Yellow    139
   Grey      254
   [123]Open in a new tab

   We visualized the green module as a network by Cytoscape and selected
   the top 50 gene pairs by sorting the weight of gene pairs. As shown in
   Fig. [124]4d, some genes, such as PDLIM7, EHHADH, DMGDH, and CYP8B1 are
   represented with larger circles indicating higher node degrees.
   Finally, we screened 144 immune hub genes with high significance in
   terms of OS events and OS time (> 0.1) and high relevance to the green
   module (> 0.5) (Table [125]S7).

   Gene set enrichment analysis was performed on 144 immune hub genes to
   find overrepresented functions in the context of biological pathways,
   such as those in the KEGG and Reactome databases, and in the context of
   complexes in CORUM. The results showed that the immune hub genes were
   significantly enriched in 52 pathways and complexes (p < 0.05)
   (Fig. [126]4e; Table [127]S8), such as the TNF signalling pathway,
   tyrosine metabolism, IGF2R − PLAUR − PLAU complex, etc. These findings
   suggest that these hub genes not only affect the metabolism, apoptosis,
   cell survival, inflammation and immunity of HCC but also play a pivotal
   role in regulating the protein complexes of immune cells.

Construction of a prognostic gene signature with the LASSO Cox PH model

   We revealed that 108 of the 144 immune hub genes were significantly
   associated with OS through univariate Cox regression analysis (Table
   [128]S9). Subsequently, LASSO-penalized Cox analysis was performed to
   further narrow the scope of OS-related hub genes (Fig. [129]5a and b).
   As a result, seven genes were identified to construct the
   ImmuneRiskScore model for evaluating the prognosis of HCC patients. The
   details of the seven genes and their Cox coefficients are listed in
   Table [130]2. GO enrichment analysis showed that these seven genes were
   enriched in several molecular functions: asparaginase activity,
   beta-aspartyl-peptidase activity, 6-phosphofructokinase activity,
   glucose-6-phosphatase activity, and sugar-terminal-phosphatase
   activity.

Fig. 5.

   [131]Fig. 5
   [132]Open in a new tab

   LASSO Cox regression analysis of immune-related genes in HCC. a The
   solution path plot of each independent variate. The lateral axis and
   longitudinal axis represent the lambda value and independent variable
   coefficient, respectively. Each curve corresponds to a variable. b The
   confidence interval for each lambda, including the cross-validation
   curve (red dotted line) and upper and lower standard deviation curves
   along the λ sequence (error bars). The selected λ is shown by the
   vertical dotted lines. LASSO, least absolute shrinkage and selection
   operator. c Analysis of the prognostic value of the selected LASSO cox
   hub genes and ImmuneRiskScore

Table 2.

   The result of LASSO regression
                         Genes                        Coef.
   Secreted Phosphoprotein 2 (SPP2)                  −0.27112
   Glucose-6-Phosphatase Catalytic Subunit (G6PC)    −0.12328
   Phospholipase B Domain Containing 1 (PLBD1)       0.66909
   ETS Variant Transcription Factor 4 (ETV4)         0.359013
   Phosphofructokinase Platelet (PFKP)               0.402211
   G Protein Subunit Alpha Z (GNAZ)                  0.126409
   Asparaginase And Isoaspartyl Peptidase 1 (ASRGL1) 0.795796
   [133]Open in a new tab

   Next, HCC patients were divided into high- and low-score groups based
   on the LASSO Cox model-identified hub genes and ImmuneRiskScore, and
   the optimal threshold was obtained from the survminer package. The
   results indicated that five genes (PLBD1, ETV4, PFKP, GNAZ, ASRGL1)
   were risk factors, while SPP2 and G6PC were protective factors
   (Fig. [134]5c). In addition, high-scoring samples had worse OS than
   low-scoring samples. The prognostic accuracy of the ImmuneRiskScore
   (95% confidence interval (CI) for the HR: 0.48 (0.33–0.68), log-rank
   test p < 0.0001) is shown in the last figure of Fig. [135]5c.
   Additionally, the results of multivariate Cox regression analyses
   showed that the predictive value of the ImmuneRiskScore was independent
   of common clinical variables (Table [136]S10).

Verification of the ImmuneRiskScore in another HCC cohort

   To further investigate the prognostic value of the ImmuneRiskScore, we
   conducted a validation analysis in another Gene Expression Omnibus
   (GEO) cohort ([137]GSE14520, n = 221). The samples were categorized
   into two groups based on the ImmuneRiskScore, and the results indicated
   the significant prognostic value of the ImmuneRiskScore in predicting
   OS as well as recurrence (Fig. [138]6a and b). Figure [139]6c shows the
   prognostic accuracy of the ImmuneRiskScore, which was considered a
   continuous variable in this experiment. The AUC of the ROC curve of the
   prognostic model for OS was 0.608 at 1 year, 0.614 at 3 years, and
   0.620 at 5 years. These results suggest that the ImmuneRiskScore could
   be a potential survival predictor.

Fig. 6.

   [140]Fig. 6
   [141]Open in a new tab

   Validation of the ImmuneRiskScore model in the HCC cohort. a The OS of
   the high-risk group was significantly shorter than that of the low-risk
   group. b The recurrence rate of the high-risk group was higher than
   that of the low-risk group. c Time-dependent ROC curve analysis of the
   ImmuneRiskScore

Landscape of stromal and immune cell infiltration in patients with high and
low ImmuneRiskScore values

   The infiltrating cells and tumour purity of tumour tissues were
   assessed by ESTIMATE [[142]37]. The stromal score represented the
   presence of stromal cells in tumour tissue, and the immune score
   indicated the infiltration of immune cells in tumour tissue. Both of
   them were used for the determination of tumour purity (Table [143]S11).
   We then showed differences in terms of stromal and immune scores in
   high-risk and low-risk HCC patients (Fig. [144]7a and b).

Fig. 7.

   [145]Fig. 7
   [146]Open in a new tab

   Profile of the immune microenvironment between the low- and
   high-ImmuneRiskScore groups. a Difference in the distribution of immune
   score values in the low- and high-ImmuneRiskScore groups. b Difference
   in the distribution of stromal score values in the low- and
   high-ImmuneRiskScore groups. c The relative proportion of immune cell
   categories in the low- and high-ImmuneRiskScore groups. d The absolute
   proportion of immune cell categories in the low- and
   high-ImmuneRiskScore groups. Comparisons between the two groups were
   performed through the Wilcoxon rank-sum test. Each boxplot is labelled
   with asterisks indicating the p-values (*p < 0.05, **p < 0.01,
   ***p < 0.001, ****p < 0.0001)

   Both the immune score and stromal score of high-risk patients were
   significantly higher than those of low-risk patients (Wilcox,
   p < 0.05), indicating that immune and stromal cell infiltration were
   associated with the ImmuneRiskScore.

   We further estimated the proportions of 22 immune cell types in HCC
   patients using CIBERSORT [[147]36] (Table [148]S12). We compared the
   relative proportions of 22 immune cell types between patients with low
   and high ImmuneRiskScore values and found significant differences in
   memory B cells, plasma cells, CD4 memory activated T cells, regulatory
   T cells (Tregs), etc. (Fig. [149]7c).

   Furthermore, we calculated the absolute immune infiltration score of
   these 22 immune cell types in combination with tumour purity.
   Afterwards, a similar comparison of absolute proportions of these 22
   immune cell types was made between low and high ImmuneRiskScore
   patients. Significant differences were found in the proportions of
   memory B cells, plasma cells, CD4 memory-activated T cells, etc.
   (Fig. [150]7c).

   Both the relative and absolute proportions of several immune cell
   types, including memory B cells, plasma cells, CD4 memory-activated T
   cells, Tregs, resting NK cells, M0 macrophages, resting dendritic
   cells, resting mast cells, and neutrophils, were associated with the
   ImmuneRiskScore. The results also demonstrate that the changes in the
   proportions of immune cells may be indirectly associated with the OS of
   HCC patients.

Correlations between the ImmuneRiskScore and immune biomarkers

   It is well documented that immune biomarkers in the TME can effectively
   predict the clinical benefit of ICIs, which are revolutionizing the
   clinical treatment landscape. We next introduced a few important immune
   biomarkers, including PD-L1, PD-1, PD-L2, CTLA4, cytolytic activity
   (CYT), and interferon-gamma (IFN-γ). Among these biomarkers, the immune
   checkpoint genes PD-L1, PD-1, PD-L2, and CTLA4 are co-expressed in HCC
   [[151]38]. The CYT value reflects the activity of cytotoxic T cells
   (CTLs) and NK cells due to their powerful ability to lyse tumour cells
   [[152]39]. A recent study found that CYT-high HCC has stronger
   immunogenicity and a more favourable TME than CYT-low HCC, which would
   result in better clinical outcomes [[153]40]. CYT is measured based on
   the geometric mean of expression of granzyme A (GZMA) and perforin
   (PRF1). IFN-γ is a key cytokine that activates the PD-1 signalling axis
   by directly upregulating the ligands PD-L1 and PD-L2 in tumour cells
   produced by activated T cells, NK cells and NK T cells [[154]41,
   [155]42]. The expression of the IFN-γ receptor can affect the mechanism
   of escape from host immune surveillance in HCC [[156]43].

   TMB is defined as the number of nonsynonymous mutations per megabase
   sequenced. A high TMB is associated with an improved response to immune
   checkpoint blockade in HCC [[157]44], melanoma [[158]45], and
   non-small-cell lung cancer [[159]46, [160]47]. Dysregulation of the
   transforming growth factor beta (TGF-β) pathway plays a central role in
   inflammation, fibrogenesis, and immunomodulation in the HCC
   microenvironment [[161]48, [162]49]. TGF-β signalling in fibroblasts is
   documented as a pleiotropic cytokine associated with poor prognosis in
   multiple tumour categories [[163]50, [164]51] and is considered
   critical in advanced cancers in terms of the promotion of
   immunosuppression, angiogenesis, metastasis, tumour cell epithelial to
   mesenchymal transition (EMT), fibroblast activation and desmoplasia
   [[165]52–[166]54].

   We further explored the relationship between the ImmuneRiskScore and
   these immune biomarkers (Table [167]S13, Table [168]S14), which
   satisfied a bivariate normal distribution (Pearson result in Table
   [169]S15). As shown in Fig. [170]8, the ImmuneRiskScore values were
   significantly positively related to several immune inflammation
   biomarkers (PD-L1: r = 0.31; p = 1.63e-09, 95% CI: 0.21–0.40; PD-1:
   r = 0.35; p = 8.45e-12, 95% CI: 0.25–0.43; PD-L2: r = 0.27;
   p = 1.02e-07, 95% CI: 0.18–0.37; CTLA-4: r = 0.42; p = 6.78e-05, 95%
   CI: 0.33–0.50; CYT: r = 0.21; p = 5.41e-17, 95% CI: 0.11–0.30, IFN-γ:
   r = 0.42; p = 0.00037, 95% CI: 0.17–0.36) and Pan-F-TBRS (r = 0.27;
   p = 2.51e-07, 95% CI: 0.33–0.50). The p-values of all these
   correlations were smaller than 0.001, suggesting that the
   ImmuneRiskScore is correlated with immune biomarkers. In addition,
   there was no correlation between the ImmuneRiskScore and TMB,
   indicating that the TMB is an independent factor mediating the TME in
   these two groups.

Fig. 8.

   [171]Fig. 8
   [172]Open in a new tab

   Correlation scatterplots between the ImmuneRiskScore and combined with
   a density plot of expression distribution. The ICB biomarkers included
   PD-L1, PD-1, PD-L2, CTLA-4, CYT, IFN-γ, and Pan-F-TBRS

   Tumour immune dysfunction and exclusion (TIDE) is a gene expression
   biomarker developed for predicting the clinical response to immune
   checkpoint blockade (ICB) therapy [[173]55]. We obtained the TIDE score
   for the [174]GSE14520 dataset (Table [175]S16) through an online
   webserver ([176]http://tide.dfci.harvard.edu/). There was a significant
   difference in TIDE scores between the high and low ImmuneRiskScore
   values (Wilcox test p = 4.588315e-05) (Fig. [177]9a).

Fig. 9.

   [178]Fig. 9
   [179]Open in a new tab

   The value of the ImmuneRiskScore for predicting the response to ICB
   treatment. a Correlation between TIDE and ImmuneRiskScore (p < 0.0001,
   Wilcoxon test). b The performance of the ImmuneRiskScore in predicting
   ICB response in four datasets

   We further tested the predictive performance of the ImmuneRiskScore on
   the efficacy of ICB treatment in four public RNA-seq datasets
   (melanoma: HugoW_Cell_anti.PD-L1/Gide_CancerCell_anti.PD-1; RCC:
   Miao_Science_anti.PD-1/Miao_Science_multiTreat). The ImmuneRiskScore
   model achieved AUCs of 0.71 and 0.60 in predicting the response of
   melanoma patients treated with anti-PD-1 and anti-PD-L1 antibodies,
   respectively. It also achieved AUCs of 0.73 and 0.83 in predicting the
   response of RCC patients treated with anti-PD-1 and anti-CTLA4
   antibodies, respectively (Fig. [180]9b). These results indicate that
   the ImmuneRiskScore might be a potential biomarker for predicting the
   immunotherapy response.

Discussion

   Increasing evidence indicates that immune-related biomarkers are
   associated with the prognosis of various cancer types
   [[181]56–[182]58]. However, biomarkers that can be used directly to
   determine the efficacy of cancer immunotherapy and the prognosis of
   patients remain to be explored.

   Therefore, we propose the ImmuneRiskScore, which can predict OS based
   on the TME of HCC. We performed WGCNA, and identified 6 modules and
   found that the green module was highly relevant to the OS events and OS
   time. To identify clinically significant hub genes in the green module,
   we then performed LASSO Cox regression analysis and finally screened 7
   genes: SPP2, G6PC, PLBD1, ETV4, PFKP, GNAZ, and ASRGL1. Then, a
   seven-gene risk scoring system was constructed, and this system
   successfully classified 417 HCC patients into two risk groups with
   significantly different survival rates. The predictive performance of
   the risk scoring model was successfully validated in an independent set
   from the GEO. This suggests that the ImmuneRiskScore based on these
   seven genes may be a promising prognostic biomarker and play an
   important role in the TME of HCC.

   The pipeline constituted by WGCNA and LASSO Cox regression analysis is
   effective for many cancer types, such as glioblastoma [[183]59],
   prostate [[184]60], gastric cancer [[185]61], lung cancer [[186]62] and
   bladder cancer [[187]63]. To the best of our knowledge, only two
   studies [[188]64, [189]65] have used this pipeline to study HCC, but
   these studies did not focus on the immune microenvironment.

   Generally, the ImmuneRiskScore is primarily a reflection of the
   constituents of the TME and accounts for the complex interactions
   between cancer cells, stromal cells, and immune cells. Thus, we aimed
   to further explore the relationship between these components. We found
   that the relative or absolute infiltration levels of memory B cells,
   plasma cells, activated CD4 memory T cells, Tregs, resting NK cells, M0
   macrophages, resting dendritic cells, resting mast cells, and
   neutrophils were significantly associated with the ImmuneRiskScore,
   which also indicated the prognostic value of assessing these cell
   types. The infiltration of macrophages in solid tumours is associated
   with a poor prognosis and chemotherapy resistance in most cancers
   [[190]66]. It is noteworthy that these cell types did not cover most of
   the T cell compartment components, which are a key part of the clinical
   response. In fact, other immune cells may also contribute to antitumour
   immunity [[191]67–[192]69]. For example, memory B cells also have a
   potential role in the response to ICB treatment [[193]70].

   To validate the potential clinical benefits of assessing the
   ImmuneRiskScore to guide ICI strategies, we explored the status of
   active innate and adaptive immune responses within the TME by gene
   expression profiling. The predictive value of our immune relation score
   was positively associated with PD-L1, PD-1, PD-L2, CTLA-4, CYT, IFN-γ
   and Pan-F-TBRS. These biomarkers are proinflammatory cytokine-related
   components of the inflammatory microenvironment of tumours [[194]71,
   [195]72] and the TGF-β signalling pathway-related immune-excluded
   microenvironment of tumours [[196]49].

   Inflamed tumours contain proinflammatory cytokines and a type-I IFN
   signature, indicating activation of the innate immune response. TGF-β
   can drive the immune-excluded phenotype in the TME because it
   influences stromal cells and prevents T cells from penetrating into the
   tumour centre [[197]49]. These results indicate that antitumour
   immunity is a bidirectional and dynamic system in the TME. This
   biomarker analysis will help to unravel the complexities of the
   interaction and molecular mechanisms between cancer and the host immune
   system.

   We validated the predicted value of the ImmuneRiskScore by comparing it
   with that of the TIDE score. The results showed that the
   ImmuneRiskScore could possibly predict ICB clinical response based on
   pre-treatment tumour profiles. In addition, the predictive performance
   of the ImmuneRiskScore on four ICB treated cohorts indicate that it may
   be a potential immune-related biomarker for pan-cancer.

Conclusions

   In summary, we presented comprehensive insight into TME of HCC and
   identified ImmuneRiskScore, a potential biomarker that can be used to
   predict the response of immunotherapy for HCC patients, even for
   pan-cancer patients.

Supplementary Information

   [198]12885_2021_8052_MOESM1_ESM.docx^ (439.5KB, docx)

   Additional file 1: Figure S1. a: Clustering dendrogram of samples based
   on their Euclidean distance. b: Hierarchical clustering dendrogram of
   module eigengenes (top) and module heatmaps (bottom).
   [199]12885_2021_8052_MOESM2_ESM.docx^ (1.1MB, docx)

   Additional file 2: Figure S2. Stacked bar charts of the infiltrating
   immune cells in HCC samples separated by high and low ImmuneRiskScore
   values.
   [200]12885_2021_8052_MOESM3_ESM.xlsx^ (257.2KB, xlsx)

   Additional file 3: Supplementary Table S1. Genes downloaded from
   InnateDB.
   [201]12885_2021_8052_MOESM4_ESM.xlsx^ (148.4KB, xlsx)

   Additional file 4: Supplementary Table S2. Genes downloaded from
   ImmPort.
   [202]12885_2021_8052_MOESM5_ESM.xlsx^ (945.7KB, xlsx)

   Additional file 5: Supplementary Tables S3. Results of the differential
   expression analysis..
   [203]12885_2021_8052_MOESM6_ESM.xlsx^ (23.3KB, xlsx)

   Additional file 6: Supplementary Tables S4. Immune-related
   differentially expressed genes (DEGs).
   [204]12885_2021_8052_MOESM7_ESM.xlsx^ (340.8KB, xlsx)

   Additional file 7: Supplementary Tables S5. Results of the GO analysis.
   [205]12885_2021_8052_MOESM8_ESM.xlsx^ (44.5KB, xlsx)

   Additional file 8: Supplementary Tables S6. Results of the KEGG
   analysis.
   [206]12885_2021_8052_MOESM9_ESM.xlsx^ (29.8KB, xlsx)

   Additional file 9: Supplementary Tables S7. Immune-related hub genes
   related to OS.
   [207]12885_2021_8052_MOESM10_ESM.xlsx^ (16.2KB, xlsx)

   Additional file 10: Supplementary Tables S8. Enrichment results for the
   144 immune-related hub genes.
   [208]12885_2021_8052_MOESM11_ESM.xlsx^ (14.7KB, xlsx)

   Additional file 11: Supplementary Tables S9. Results of the univariate
   Cox regression analysis.
   [209]12885_2021_8052_MOESM12_ESM.xlsx^ (11.8KB, xlsx)

   Additional file 12: Supplementary Tables S10. Results of the
   multivariate Cox regression analysis.
   [210]12885_2021_8052_MOESM13_ESM.xlsx^ (36.9KB, xlsx)

   Additional file 13: Supplementary Tables S11. ESTIMATE results for the
   TCGA-HCC dataset.
   [211]12885_2021_8052_MOESM14_ESM.xlsx^ (113.4KB, xlsx)

   Additional file 14: Supplementary Tables S12. CIBERSORT results for the
   TCGA-HCC dataset.
   [212]12885_2021_8052_MOESM15_ESM.xlsx^ (55.3KB, xlsx)

   Additional file 15: Supplementary Tables S13. Biomarkers in the
   TCGA-HCC dataset.
   [213]12885_2021_8052_MOESM16_ESM.xlsx^ (17.9KB, xlsx)

   Additional file 16: Supplementary Tables S14. TMB data for the TCGA-HCC
   cohort.
   [214]12885_2021_8052_MOESM17_ESM.xlsx^ (9.4KB, xlsx)

   Additional file 17: Supplementary Tables S15. Results for the Pearson
   analysis of biomarkers.
   [215]12885_2021_8052_MOESM18_ESM.xlsx^ (17.9KB, xlsx)

   Additional file 18: Supplementary Tables S16. TIDE results for
   GSE14520.

Acknowledgements