Abstract Although immunotherapy has revolutionized cancer management, most patients do not derive benefits from it. Aiming to explore an appropriate strategy for immunotherapy efficacy prediction, we collected 6251 patients’ transcriptome data from multicohort population and analyzed the data using a machine learning algorithm. In this study, we found that patients from three immune gene clusters had different overall survival when treated with immunotherapy (P < 0.001), and that these clusters had differential states of hypoxia scores and metabolism functions. The immune gene score showed good immunotherapy efficacy prediction (AUC was 0.737 at 20 months), which was well validated. The immune gene score, tumor mutation burden, and long non-coding RNA score were further combined to build a tumor immune microenvironment signature, which correlated more strongly with overall survival (AUC, 0.814 at 20 months) than when using a single variable. Thus, we recommend using the characterization of the tumor immune microenvironment associated with immunotherapy efficacy via a multi-omics analysis of cancer. Keywords: Cancer, Immunotherapy, Tumor immune microenvironment, Machine learning Abbreviations: AUC, Area under the curve; CIs, Confidence intervals; CTL, Cytotoxic T-lymphocyte infiltration; GEO, Gene Expression Omnibus; GO, Gene Ontology; GSEA, Gene set enrichment analysis; GSVA, Gene set variation analysis; HLAs, Human leukocyte antigens; HRs, Hazard ratios; KEGG, Kyoto Encyclopedia of Genes and Genomes; LASSO, Penalized logistic least absolute shrinkage and selector operation; lncRNA, Long non-coding RNA; NSCLC, Non-small cell lung cancer; OS, Overall survival; PCA, Principal componentanalysis; PD-L1, Programmed death ligand-1; PFS, Profession-free survival; RNA-seq, Transcriptome RNA sequencing; ROC, receiver operating characteristic curves; TCGA, The Cancer Genome Atlas; TMB, Tumor mutation burden; TME, Tumor immunemicroenvironment; WGCNA, Weighted gene co-expression network analysis 1. Introduction Immunotherapy has revolutionized the management of cancers such as melanoma, non-small cell lung cancer (NSCLC), breast cancer, and bladder cancer [[41][1], [42][2], [43][3]]. Several biomarkers, namely programmed death ligand-1 (PD-L1) expression [[44]1,[45]3] and tumor mutation burden (TMB) [[46]2,[47]4] are used as predictors for immunotherapy efficacy. Nevertheless, a subset of patients cannot benefit from immunotherapy [[48]5,[49]6]. Some studies have found that there is no strong association between the radiological response and PD-L1 expression [[50]7], and the cutoff for high TMB varies across tumor types [[51]8]. Thus, there is an urgent need to develop an appropriate prediction strategy to help identify the potential population that can benefit from immunotherapy, as well as to provide a guide for clinical decision-making. The tumor immune microenvironment (TME) represents a multifaceted cellular environment that can not only restrict tumor development but also affect the anti-tumor therapeutic response and tumor progression [[52]9]. Emerging evidence indicates that tumor cells can activate different immune pathways, generate immune suppressive conditions, and transform the TME. A comprehensive understanding of TME can serve as a breakthrough in predicting the clinical benefits accurately. Except for the abovementioned immune checkpoints, immune cells within the TME also play determinative roles in tumorigenesis [[53]10]. Studies have indicated that human leukocyte antigens (HLAs) can influence the immune cell infiltration and stromal cell reaction of the TME, and losing the ability to present neoantigens through HLAs may facilitate immune invasion and contribute to tumor progression [[54]11,[55]12]. Previous studies have found that the PD-L1 expression, TMB, sex status, genetic mutation, and ctDNA maximum somatic allele frequency can collectively achieve a better predictive effect for immunotherapy efficacy than one of those alone [[56][13], [57][14], [58][15]]. Long non-coding RNA (lncRNA) has been shown to be closely related to immune dysfunction and the response of immunotherapy. By combining lncRNA score, PD-L1 expression, cytotoxic T-lymphocyte infiltration (CTL), and TMB, a novel multi-omics algorithm model was established, and showed a strong association with patients’ overall survival (OS) of immunotherapy [[59]16]. This indicates that the multi-omics data prediction model is a promising direction for immunotherapy efficacy prediction. In this multicohort study, we conducted an individual patient-level analysis with the transcriptome RNA sequencing (RNA-seq) data of patients by machine learning. The key immune cells, HLAs, and immune checkpoints were selected, and the immune-related genes of those three aspects of the TME were combined to construct the immune gene score. Finally, a prediction model combined with the immune gene score, TMB, and lncRNA score [[60]16] was constructed to improve the predictive ability, and it can serve as a promising strategy for immunotherapy efficacy prediction. 2. Methods 2.1. Patients and study design In this study, the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guideline was used. An individual patient-level analysis was conducted with RNA-seq data of patients from a multicohort population by using machine learning. The primary endpoint was OS. Overall study design was shown in [61]Fig. 1. Fig. 1. [62]Fig. 1 [63]Open in a new tab Schematic diagram of the study flow. GEO: Gene Expression Omnibus; GSEA: gene set enrichment analysis; GSVA: gene set variation analysis; GO: Gene Ontology; HLAs: human leukocyte antigens; KEGG: Kyoto Encyclopedia of Genes and Genomes; LASSO: penalized logistic least absolute shrinkage and selector operation; lncRNA: long non-coding RNA; NSCLC: non-small cell lung cancer; OS: overall survival; PCA: principal component analysis; PD-L1: programmed death ligand-1; TCGA: The Cancer Genome Atlas; TMB: tumor mutation burden; TME: tumor immune microenvironment; ROC: receiver operating characteristic; WGCNA: weighted gene co-expression network analysis. A total of 6251 patients from a multicohort population were enrolled and divided into a training cohort and three validation cohorts, including 675 patients treated with immunotherapy and 5576 patients given standard treatment (treatment strategies unannounced). The training cohort contained 348 bladder cancer patients treated with the PD-L1 inhibitor atezolizumab from the single-arm, phase 2, multicenter Imvigor210 trial. Validation-1 contained 87 metastatic urothelial cancer patients treated with immunotherapy from [64]GSE176307, Validation-2 contained 213 melanoma patients treated with the immunotherapy from [65]GSE78220, TCGA-SKCM, PRJEB23709, and [66]GSE100797, Validation-3 contained 27 NSCLC patients given anti-PD-1/PD-L1 treatment from [67]GSE135222, and Validation-4 consisted of 5576 pan-cancer patients from The Cancer Genome Atlas (TCGA). The individual patient-level data of these patients were downloaded from the Gene Expression Omnibus (GEO) database ([68]http://www.ncbi.nlm.nih.gov/geo/), and TCGA was downloaded from the UCSC Xena browser (GDC hub: [69]https://gdc.xenahubs.net). The RNA-seq data were transformed into transcripts per kilobase million values. The study protocol was approved by the ethics committee of the Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China. The requirement for informed consent of study participants was waived by the ethics committee because the data were obtained from publicly available datasets. 2.2. Immune-related genes selection The quality control and normalization of count data were performed by R package variance-stabilizing transformation. The human immune cells were signed by 577 marker genes, which are highly representative of 24 human immune cell phenotypes in the TME [[70]17]. The penalized logistic least absolute shrinkage and selector operation (LASSO) algorithm was used to select the key immune cells, HLAs, and immune checkpoints; the LASSO algorithm was conducted by the R package glmnet, and the penalty parameter tuning was performed using a fivefold cross-validation. The unsupervised hierarchical clustering method (K-means) was carried out by the R package CancerSubtypes to identify the optimal cluster number with the highest stability and lowest ambiguity among different clusters. In order to screen for differentially expressed genes, the R package limma was used to calculate the fold change of transcripts, for which a fold-change value greater than two and an adjusted P value less than 0.05 were set as the cutoff values. The distribution and expression levels of differentially expressed genes were shown by the gene's heatmap and volcano plots. 2.3. Immunotherapy efficacy prediction model construction A principal component analysis (PCA) was performed, and principal component 1 was extracted to serve as the signature score, where the signature score = A*PC1 − B*PC2. Here, PC1 is the expression of immune genes whose Cox coefficient is positive, and PC2 is the expression of genes whose Cox coefficient is negative. After the prognostic value of each gene score was obtained, a formula was built to define the immune gene score of each patient. Then, a prediction model combined with the immune gene score, TMB, and lncRNA score [[71]16] was also constructed to predict the survival of immunotherapy. 2.4. Internal mechanism pathways exploring The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis were performed by the R package clusterProfiler. The cellular component, molecular function, and biological process of KEGG pathways and GO terms were considered statistically significant when P values and false discovery rates were less than 0.05. The modules of co-expressed genes were identified by weighted gene co-expression network analysis (WGCNA). Gene set enrichment analysis (GSEA) was used to quantify the proportions of the gene function pathway. Through the R package gsva, the Broad Institute enrichment algorithm was used to conduct gene set variation analysis (GSVA) of a gene sets from the molecular signature database (MSigDB) to calculate the hypoxia score. The metabolic pathways enriched in genes was conducted with the algorithm reported by Xiao et al. [[72]17]. The expression of CYT was defined as the average expression level of five genes: CD8A, CD8B, GZMA, GZMB, and PRF1. 2.5. Statistical analysis R version 3.6.1 (R Project for Statistical Computing) was used for statistical analysis. The Kaplan–Meier method with the log-rank test was used to aggregate OS or profession-free survival (PFS) data. The hazard ratios (HRs) and 95% confidence intervals (CIs) of the Cox proportional hazards regression model were calculated to compare patient survival. Fisher's exact test or the χ^2 test was used for comparisons of categorical variables, and the Wilcoxon rank sum test for 2-group comparisons or the Kruskal–Wallis test was used for multiple comparisons of continuous variables. The R package survminer was used to identify the optimal cutoff value for continuous variables. The sensitivity and specificity of the signature were evaluated by generate time-dependent receiver operating characteristic (ROC) curves, and the area under the curve (AUC) was obtained by the R package timeROC. Two-sided P < 0.05 was considered statistically significant for all analyses. 3. Results 3.1. Patients’ characteristics A total of 6251 patients from a multicohort population were enrolled and divided into a training cohort and three validation cohorts, including 675 patients undergoing immunotherapy and 5576 patients undergoing standard treatment (treatment strategies unannounced). The detailed characteristics of datasets are listed in Appendix [73]Table S1. 3.2. The key factors of tumor immune microenvironment were selected Following the analysis of the clinical information and RNA-seq data of the training cohort, five immune cells, eight HLAs, and four immune checkpoints were selected. Details of all immune cells, HLAs, and immune checkpoints are shown in Appendix [74]Table S2. Five of 22 immune cells, including gamma delta T cells (γδ T cells), activated natural killer (NK) cells, M1 macrophages, M2 macrophages, and activated mass cells were identified (Appendix [75]Fig. S1A, B). These were further divided into two distinct immune-cell-based clusters based on the differential gene expression (Appendix [76]Fig. S1C, D). These two immune-cell-based clusters had a differential OS benefit of immunotherapy (HR, 3.23; 95% CI, 1.85–5.67; P < 0.001) ([77]Fig. 2A), and the differences expression gene of two clusters are shown by the genes’ heatmap ([78]Fig. 2B). The functional annotations and pathways of two clusters were enriched in the chemokine signaling pathway and genomic biological processes in the response to interferon-gamma, cytokine receptor binding, and chemoattractant activity (Appendix [79]Fig. S1E, F). Fig. 2. [80]Fig. 2 [81]Open in a new tab The different overall survival (OS) benefits of immunotherapy in different tumor immune microenvironment clusters. (A) OS of immunotherapy of two immune cell–related clusters; (B) gene heatmap of two immune cell-related clusters; (C) OS of immunotherapy of two human leukocyte antigen–based clusters; (D) gene heatmap of two human leukocyte antigen-based clusters; (E) OS of immunotherapy of two immune checkpoint–related clusters; (F) gene heatmap of two immune checkpoint-related clusters. Seven of 23 HLAs, including HLA-C, HLA-DMA, HLA-DOA, HLA-DQB1, HLA-DRB3, HLA-F-AS1, HLA-G, and HLA-B were identified (Appendix [82]Fig. S2A, B). These were divided to two distinct HLA-based clusters based on differential gene expression (Appendix [83]Fig. S2C, D). These two HLA-based clusters had a differential OS benefit of immunotherapy (HR, 1.43; 95% CI, 1.09–1.89; P = 0.010) ([84]Fig. 2C), and the difference expression genes of the two clusters are shown by the genes’ heatmap ([85]Fig. 2D). The functional annotations and pathways of two clusters were enriched in hydrolase activity and secretory granule membrane and peptidase regulator activity (Appendix [86]Fig. S2E). Four of 22 immune checkpoints, that is, CD276, IDO1, LAG3, and SIRPA, were identified (Appendix [87]Fig. S3A, B). These were divided to two immune-checkpoint-based clusters based on the differential gene expression (Appendix [88]Fig. S3C, D). These two immune-checkpoint-based clusters had a differential OS benefit of immunotherapy (HR, 1.46; 95% CI, 1.13–1.89; P = 0.004) ([89]Fig. 2E), and the difference expression genes of two clusters are shown by the genes’ heatmap ([90]Fig. 2F). The functional annotations and pathways of two clusters were enriched in cytokine–cytokine receptor interaction and the chemokine signaling pathway, as well as genomic biological processes in T cell activation (Appendix [91]Fig. S3E, F). 3.3. The immune gene score was constructed for immunotherapy efficacy prediction To combine the differential expression gene of immune cells, HLAs, and immune checkpoints for predicting the immunotherapy benefit, a total of 1360 immune-related genes underwent a rounded analysis and were divided into three immune gene–related clusters ([92]Fig. 3A, Appendix [93]Fig. S4). Patients' OS benefit of immunotherapy showed a significant difference among the three clusters (P < 0.001, [94]Fig. 3B). This result was confirmed in patients with the standard treatment from Validation-4, and the immune gene–related clusters could significantly distinguish the patients’ survival differences (P < 0.001, [95]Fig. 3C). Fig. 3. [96]Fig. 3 [97]Open in a new tab Immune-related genes were divided into three immune gene–related clusters. (A) Gene heatmap of three immune gene-related clusters; (B, C) OS benefit of immunotherapy among these clusters was significantly different both in the IMvigor210 trial and Validation-4; (D, E) hypoxia scores and metabolism functional annotation were significantly different. Among these clusters, the hypoxia scores were calculated and metabolism functional annotation was conducted, which showed a significant difference state (P < 0.001, [98]Fig. 3D and E). Cluster A had the highest hypoxia score and upregulated the metabolism function of glycan biosynthesis and metabolism; cluster B had a medium hypoxia score and upregulated the metabolism function of lipid metabolism; and cluster C had the lowest hypoxia score and downregulated almost all metabolic pathways ([99]Fig. 3E). Furthermore, different co-expressed gene modules and distinct gene expression patterns were found in three clusters ([100]Fig. 4A–D) and showed different genomic functions. Cluster A was expressed in the Rap1 signaling pathway, while cluster C was expressed in the MAPK signaling pathway ([101]Fig. 4E–H). Fig. 4. [102]Fig. 4 [103]Open in a new tab The three immune gene–related clusters had different genomic functions. (A–D) Different co-expressed genes modules and distinct gene expression patterns of three immune gene–related clusters; (E–G) Gene Ontology analysis of three immune gene–related clusters; (H) Kyoto Encyclopedia of Genes and Genomes analysis of three immune gene–related clusters. Then, the PCA algorithm was employed to construct the immune gene score for predicting immunotherapy efficacy by using the datasets of immune-related genes of the training cohort. The patient's immune gene score was calculated by the following formula: The immune gene score = 0.4189*PC1 − 0.4501*PC2. Here, PC1 is the expression of immune genes whose Cox coefficient is positive, and PC2 is the expression of genes whose Cox coefficient is negative. Patients were divided into a high immune gene score group and low immune gene score group, with an optimal threshold of 0.22. These two groups had differential gene expression ([104]Fig. 5A, B). By screening the correlations between immune-related genes with patients’ survival benefit of immunotherapy, we found that there were 107 positive genes and 71 negative genes ([105]Fig. 5C). The differential expression gene of the two groups was involved in complement and coagulation cascades, cytokine-cytokine receptor interaction, systemic lupus erythematosus, viral protein interaction with cytokines and the cytokine receptor, and the NF-kappa B signaling pathway ([106]Fig. 5D). The GO enrichment analysis revealed that the high and low group had different genomic functions in DNA replication, cell-substrate junction, chromosomal region, and organelle fission ([107]Fig. 5E). The KEGG pathway enrichment analysis found that the two groups had different expression in the PI3K-Akt signaling pathway, focal adhesion, cell cycle, and DNA replication ([108]Fig. 5F). Fig. 5. [109]Fig. 5 [110]Open in a new tab Two immune gene score groups had different gene expressions and genomic functions. (A) OncoPrint of the high immune gene score group; (B) oncoPrint of the low immune gene score group; (C) volcano plot of differentially expressed genes; (D) enrichment plots showing that the five gene sets had significantly different expression; (E) Gene Ontology analysis; (F) Kyoto Encyclopedia of Genes and Genomes analysis. Patients in the high immune gene score group had a significantly greater OS compared with those in the low immune gene score group (HR, 2.41; 95% CI 1.76–3.31; P < 0.0001, [111]Fig. 6A). Validation analyses were performed in a multicohort population including metastatic urothelial cancer patients given the immunotherapy from Validation-1 (PFS: HR = 1.86; 95% CI, 1.09–3.19; P = 0.024, [112]Fig. 6B), melanoma patients given the immunotherapy from Validation-2 (OS: HR = 2.83; 95% CI, 1.84–4.39, P < 0.001, PFS: HR = 2.46; 95% CI, 1.40–4.35; P = 0.002, [113]Fig. 6C and D), and NSCLC patients given the immunotherapy from Validation-3 (PFS: HR, 3.36; 95% CI, 1.26–8.94; P = 0.015; [114]Fig. 6E). Moreover, the immune gene score showed an association with the survival of patients given the standard treatment from Validation-4 ([115]Fig. 6F). Fig. 6. [116]Fig. 6 [117]Open in a new tab Two immune gene score groups could effectively predict patients' survival benefit. (A) Overall survival of 348 bladder cancer patients treated with the PD-L1 from IMvigor210; (B) overall survival of 87 metastatic urothelial cancer patients treated with immunotherapy from [118]GSE176307; (C) overall survival of 213 melanoma patients treated with the immunotherapy from [119]GSE78220, TCGA-SKCM, PRJEB23709, and [120]GSE100797; (D) progression-free survival of 96 melanoma patients treated with the immunotherapy from [121]GSE78220, TCGA-SKCM, PRJEB23709, and [122]GSE100797; (E) overall survival of 27 non-small cell lung cancer patients given immunotherapy from [123]GSE135222; (F) overall survival of 5576 pan-cancer patients given the standard treatment from The Cancer Genome Atlas. High score, high immune gene score group; Low score, low immune gene score group. Furthermore, the high immune gene score group had higher expression of CD274 (PD-L1) and CYT ([124]Fig. 7A and F) than the low immune gene score group. These findings were also validated in a multicohort population including melanoma patient given the immunotherapy from Validation-1 ([125]Fig. 7B and G), Validation-2 ([126]Fig. 7C and F), and Validation-3 ([127]Fig. 7D and I), and pan-cancer patients given the standard treatment from Validation-4 ([128]Fig. 7E and J). Fig. 7. [129]Fig. 7 [130]Open in a new tab Two immune gene score groups had different CD274 and CYT expressions. (A, F) The different CD274 and CYT expression of 348 bladder cancer patients treated with the PD-L1 from IMvigor210; (B, G) the different CD274 and CYT expression of 87 metastatic urothelial cancer patients treated with immunotherapy from [131]GSE176307; (C, H) the different CD274 and CYT expression of 213 melanoma patients treated with the immunotherapy from [132]GSE78220, TCGA-SKCM, PRJEB23709, and [133]GSE100797; (D, I) the different CD274 and CYT expression of 27 non-small cell lung cancer (NSCLC) patients treated with the immunotherapy from [134]GSE135222; (E, J) the different CD274 and CYT expression of 5576 pan-cancer patients given the standard treatment from The Cancer Genome Atlas. High score, high immune gene score group; Low score, low immune gene score group. 3.4. Tumor immune microenvironment signature was constructed and showed better immunotherapy efficacy prediction The TME signature was constructed by combining the immune gene score, TMB, and lncRNA score [[135]16]. Compared with single-factor prediction strategies including the immune gene score, TMB, and lncRNA score, the TME signature generated more substantial improvement in immunotherapy efficacy prediction (AUC: 0.814 for the TME signature, 0.61 for TMB, 0.791 for the lncRNA score, 0.737 for the immune gene score), and also greater than the CD274 (PD-L1, AUC = 0.615) and IFNG (AUC = 0.646) in the IMvigor210 trial ([136]Fig. 8A and B). Fig. 8. [137]Fig. 8 [138]Open in a new tab Tumor immune microenvironment (TME) signature had good prediction of immunotherapy survival benefit. (A) Receiver operating characteristic curves of the CD274, INFG, immune gene score, tumor mutation burden (TMB), and TME signature for immunotherapy efficacy prediction at 20 months. (B) The ROC curves of the CD274, INFG, immune gene score, TMB, and TME signature for immunotherapy efficacy prediction with times. 4. Discussion Given the excellent outcomes of immunotherapy on multiple types of difficult-to-treat tumors, it is increasingly important to develop a strategy that can accurately predict the efficacy and prognosis of immunotherapy for patients. In this study, total of 6251 patients’ RNA-seq data from a multicohort population were analyzed by machine learning to find a suitable prediction strategy for immunotherapy efficacy. It determined that the novel immune gene pattern with the highest hypoxia score was more closely associated with immunotherapy. Furthermore, a combined prediction model was constructed by combining the immune gene score, TMB, and lncRNA score, which showed better immunotherapy efficacy prediction than TMB, the lncRNA score, or the immune gene score alone. Increasing evidence indicates that the TME has important influences on tumor progression and the immunotherapy response [[139]18,[140]19]. In this study, we found that five immune cells, eight HLAs, and four immune checkpoints were significantly associated with immunotherapy efficacy. As one of the most powerful immune cells, γδ T cells can mobilize almost the entire immune system to kill tumor cells directly through the NK cell receptor on the cell surface, or activate B/DC/αβ T/NK cells in various possible ways to kill tumor cells indirectly [[141]20,[142]21]. Other immune cells have also been shown to contribute to tumor progression and immunotherapy responses, including macrophages, NK cells, and B cells [[143][22], [144][23], [145][24]]. HLAs are the expression products of the major human histocompatibility complex and associated with survival in the majority of cancer types [[146]11,[147]25,[148]26]. A previous study including two cohorts of more than 1535 patients treated with immunotherapy found that the HLA-I molecular polymorphism was associated with improved survival [[149]27]. Moreover, as for the key immune checkpoints identified in this study, high expression of CD276 on the surface of tumor cells could evade immune system surveillance or evade macrophage phagocytosis by the SIRPA-CD47 pathway [[150]28,[151]29]. In addition, a high level of IDO1 and LAG3 expression would inhibit the function of T cells and NK cells [[152][30], [153][31], [154][32]]. Therefore, these three aspects of the TME should be the basics of further exploration of the immunotherapy efficacy prediction strategy. Research has shown that hypoxia affects tumor immunosuppression and immune escape [[155]33,[156]34]. In anoxic microenvironments, T cells and NK cells exhibit impotent or depleted states, leading to dysfunction [[157][35], [158][36], [159][37]]. Hypoxia stimulates inhibitory cells or immunosuppressive cytokines that block immune effector cells [[160]38]. In addition, cancer cell metabolism has important interaction with immune metabolism, and cell metabolism is a key factor in maintaining the vitality and function of cancer cells and immune cells [[161]39]. Tumor cells have enormous anabolic needs, and through metabolic pathways different from those of normal cells, tumors leave the tumor microenvironment acidic, hypoxic, and/or deplete key nutrients needed by immune cells, which has been called “Warburg effect” [[162]40]. In other words, tumor cells mainly supply energy through the glycolysis pathway under aerobic conditions, the lipid metabolism is increased to maintain the energy supply required by tumor rapid proliferation, and the metabolic competition between cancer cells and invading immune cells is established [[163]41,[164]42]. As a complex form of post-translational modification, glycosylation affects more than 50% of cellular proteins. In many cancers, abnormal glycosylation commonly regulates the occurrence and progression of tumors by controlling a variety of physiological and pathological processes, such as cell–cell adhesion; cell–matrix interaction; epithelial–mesenchymal transformation; and tumor proliferation, invasion, metastasis, and angiogenesis [[165]43,[166]44]. The glycan pattern expressed in cancer cells has a significant effect on tumor behavior, and the tumor-associated carbohydrate antigen's overexpression on a variety of cancer cells makes it an attractive target for immunotherapy development [[167]45]. This can explain that why these three immune-related gene clusters can have significantly different immunotherapy OS benefits. In order to explore the potential underpinning of the above biomarkers, enrichment analysis of differentially expressed genes among different clusters was performed. Based on the immune gene signature, the results indicate that the three clusters are involved in the Rap1 signaling pathway, various metabolism-related pathways, and the MAPK signaling pathway, respectively. As we know, the Rap1 signaling pathway is a crucial player in the process of tumor cell migration, invasion, and metastasis [[168][46], [169][47], [170][48]], such as the MAPK signaling pathway in cell proliferation, growth, and differentiation [[171]49,[172]50]. Most of the genes are associated with the immune response and highlight diverse biological processes such as activating antitumor immunity, promoting the proliferation and differentiation of T and B cells, and enhancing the cytolysis of NK cells, which may provide avenues for further research in tumor immunotherapy. Even though profound insights into these gene-related mechanisms remain to be explored, these characteristics make them biomarker candidates and suggest that our prediction model may be a promising prognostic tool. Previous studies showed that the combined prediction model could significantly improve the ability of prediction for survival of immune checkpoint inhibitors treatment compared to the single biomarker of PD-L1 or TMB [[173]16,[174]51]. In our previous study [[175]16], we successfully used CTL and the lncRNA signature to categorize patients treaded with immunotherapy in the IMvigor210 trial into four classes: an immune-active class, immune-exclusion class, immune-dysfunctional class, and immune-desert class. Among them, the patients of the immune-dysfunctional class had highly immune-infiltrated tumors but did not have a favorable prognosis or immunotherapeutic response. The lncRNA signature could provide additional information about immune dysfunction. In this study, the TME signature was constructed in combination with the immune gene score, TMB, and lncRNA score, which was shown to be more strongly predictive of immunotherapy efficacy. Moreover, the TME signature was also greater than the single biomarkers such as PD-L1 and INFG, which are seen as useful biomarkers for the immunotherapy. The study findings indicate that comprehensive consideration of multi-omics data can contribute to new directions for immunotherapy predictive strategy development. This study found that the novel immune gene pattern with the highest hypoxia score was more closely associated with immunotherapy. In addition, the TME signature with full consideration of the level of cellular, molecular, and genetic factors of the TME, lncRNA, and TMB had a good association with immunotherapy efficacy in patients with cancer. Thus, we recommend using the characterization of the tumor immune microenvironment associated with immunotherapy efficacy via multi-omics analysis of cancer. 5. Limitations of the study There were several limitations to this study. First, the patients treated with immunotherapy in the current validation cohorts could not be generalized to all tumor types, and in-house cohort data are not yet available. However, we found that the survival prediction could also be verified in pan-cancer patients given standard treatment, which suggests that further validation can be carried out in the future. Second, gene expression signatures can be influenced by sampling bias owing to differences in specimen preservation methods and intratumor genetic heterogeneity. Compared with fresh specimens, there will be vitro RNA degradation in formalin-fixed, paraffin-embedded, and fresh frozen specimen. Last, the TME signature only contains the data of genetic mutations and RNA sequencing, so it cannot be called a real multi-omics prediction model. Previous studies have shown that the radiomic features are correlated with the TME and immune function [[176]52,[177]53], which can serve as one of the prognostic factors. At the same time, the role of genomic methylation signatures and plasma microRNA in the TME should not be ignored [[178]54,[179]55]. Our future research direction is to establish a new immunotherapy efficacy prediction model based on multi-omics data. Author contribution statement Lili Lin, Wenda Zhang and Yunfang Yu: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper; Yongjian Chen, Wei Ren, and Wenhao Ouyang: Contributed reagents, materials, analysis tools or data; Wrote the paper; Jianli Zhao, and Zifan He: Performed the experiments; Wrote the paper. Herui Yao, Weifeng Su: Conceived and designed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Contributed reagents, materials, analysis tools or data; Wrote the paper. Funding statement This study was supported by grants 82273204, 81972471 and 82073408 from the National Natural Science Foundation of China, grant 2023A1515012412 and 2023A1515011214 GuangDong Basic and Applied Basic Research Foundation, grant 202206010078 and 202201020574 from the Guangzhou Science and Technology Project, grant 2018007 from the Sun Yat-Sen University Clinical Research 5010 Program, grant SYS-C-201801 from the Sun Yat-Sen Clinical Research Cultivating Program, grant A2020558 from the Guangdong Medical Science and Technology Program, grant 7670020025 from Tencent Charity Foundation, grant YXQH202209 from the Scientific Research Launch Project of Sun Yat-Sen Memorial Hospital. Declaration of interest's statement The authors have declared that no competing interest exists. Acknowledgments