Graphical abstract

   graphic file with name fx1.jpg
   [43]Open in a new tab

Highlights

     * •
       Construction of a long-term survival predictive score of
       immunotherapy (iLSPS)
     * •
       Rigorous multi-omics feature selection based on extensive TCGA and
       clinical trial data
     * •
       Model incorporates neoantigen, CD8^+ T cell, metabolism, and
       MHC_score
     * •
       High iLSPS identifies ICI beneficiaries and correlates with
       inflamed microenvironment

Motivation

   Long-term survival (LTS) benefit is the landmark achievement of
   immunotherapy, but significant questions remain about the underlying
   molecular mechanism and predictive biomarkers. However, the limited
   clinical and genetic data on patients with long-term follow up poses
   challenges for research into ICI LTS biomarkers. Since conducting
   in vitro/in vivo experiments using cell lines or mouse models to
   explore the mechanism of human LTS is impractical, in silico
   bioinformatic analysis with human genomic sequencing data is a needed
   approach. Here, we conduct a systematic analysis to elucidate LTS
   predictors, including a panoramic biomarker review, rigorous feature
   selection, and Gaussian process regression machine-learning model
   training and testing. We validate model performance with multiple
   cohorts, discuss the biological interpretation of model findings, and
   provide illustrative guidance on real-world clinical application.
     __________________________________________________________________

   Zhao et al. utilize TCGA multi-omics datasets and cancer immunotherapy
   clinical trial and cohort data to develop predictive biomarkers and an
   integrative machine-learning model for long-term survival after
   immunotherapy in melanoma, lung cancer, and several other cancer types.
   This work helps to address the need for developing long-term survival
   predictors.

Introduction

   Immune checkpoint inhibitor (ICI)-based immunotherapy is a cornerstone
   of cancer treatment.[44]^1 Recently, Keynote-001 reported 5-year
   overall survival (OS) rates of 23.2% and 15.5% following pembrolizumab
   treatment among treatment-naive and previously treated patients with
   advanced non-small cell lung cancer (NSCLC), respectively.[45]^2 The
   survival benefits from ICI far exceeded those of the pre-ICI era,
   representing a chance for long-term survival (LTS) and even curability
   for patients with cancer.[46]^3 However, to date, the 5-year OS of
   immunotherapy, an indicator of LTS, has been only determined in the
   clinical trials of six cancer types, namely kidney renal clear cell
   carcinoma (KIRC), lung adenocarcinoma (LUAD), lung squamous cell
   carcinoma (LUSC), skin cutaneous melanoma (SKCM), esophageal squamous
   carcinoma (ESCC), and bladder urothelial carcinoma (BLCA). Therefore,
   the key determinants underpinning the LTS benefit have not been fully
   elucidated owing to the limited clinical and genetic data of these
   patients.

   Discriminating LTS patients implies identifying those who truly benefit
   from ICI. From previous studies, sustained responders to ICIs were
   enriched among patients with high tumor mutation burden (TMB) and/or
   PD-L1 tumor proportion scores (TPSs) >50%.[47]^4^,[48]^5 In the era of
   artificial intelligence (AI) and “big data,” machine-learning
   techniques are extensively utilized in biomedical
   research.[49]^6^,[50]^7^,[51]^8 Pretreatment circulating tumor DNA
   (ctDNA), peripheral T cell levels, and early on-treatment ctDNA
   dynamics were harmonized to develop a Bayesian probit model to predict
   durable response to ICIs.[52]^9 A machine-learning algorithm (XGBoost)
   was also deployed to construct a model based on 11 genetic features,
   predicting response to ICIs across seven tumor types.[53]^10 Modeling
   of LTS could similarly shed light on molecular biomarkers and
   mechanisms contributing to this critical clinical parameter.

   Previous work[54]^11 has utilized Spearman correlation analysis between
   the median value of 36 variables per cancer type from TCGA and the
   objective response rate (ORR) from anti-PD-(L)1 of corresponding
   clinical trials[55]^12 to identify predictors for ICI response. Another
   study[56]^13 also performed Spearman correlation analysis between the
   molecular features from TCGA and real-world pharmacovigilance data to
   identify biomarkers for immune-related adverse effects. In this study,
   to establish a multi-parameter network predictive of LTS, we adopted
   the approach of mapping TCGA multi-omics data to survival data from
   immunotherapy clinical trials. Each cancer type was treated as a
   separate unit. The median value of molecular features and survival of
   each cancer type were used to perform primary and secondary feature
   selection. The selected markers were harmonized to construct a
   machine-learning GPR model to estimate the LTS predictive score of
   immunotherapy (iLSPS). We present assessments of prediction accuracy in
   ICI-treated cohorts and correlations between iLSPS and immune-related
   indexes to explore the biological implications of iLSPS. A study
   overview is presented in [57]Figure 1.

Figure 1.

   [58]Figure 1
   [59]Open in a new tab

   Study schematic and surrogate metric for LTS

   (A) Study schematic. Clinical data of ICI efficacy and TCGA multi-omics
   parameters were collected. Spearman correlation, random forest, and GPR
   analyses were performed to select features used to construct the
   predictive model. The GPR model-estimated iLSPS was utilized for
   clinical application.

   (B and C) Pearson’s correlation between all efficacy and AE-related
   items from the publications of the included clinical trials. Each
   circle represents a trial (C).

   See also [60]Figure S1 and [61]Table S2.

Results

Cohort overview

   (1) 158 clinical trial cohorts ([62]Table S1) involving 21,023
   single-agent ICI-treated advanced/metastatic patients with 25 solid
   cancer types ([63]Table S2) were utilized to investigate an optimal
   surrogate for 5-year OS benefit. (2) TCGA multi-omics datasets of
   corresponding 25 cancer types were used for biomarker selection and GPR
   model construction. (3) Two single-cell RNA sequencing (scRNA-seq)
   datasets including ICI-treated patients with advanced basal cell
   carcinoma (BCC) and renal cell carcinoma (RCC), respectively, were
   included to validate the predictive ability of biomarkers in
   single-cell resolution. (4) Six external and one internal ICI-treated
   cohorts including a total of 634 patients were tested for model
   performance, which comprised five melanoma
   cohorts,[64]^14^,[65]^15^,[66]^16^,[67]^17^,[68]^18 one BLCA
   cohort,[69]^19 and one lung cancer cohort from our center (Zenodo:
   8275988).

Establishment of surrogate metric for LTS following immunotherapy

   We firstly reviewed 23 clinical trials with available 5-year OS rates
   of 3,999 patients with LUAD, LUSC, KIRC, SKCM, ESCC, or BLCA. However,
   we were interested in building a model that leverages relationships
   between ICI and LTS across more cancer types. We therefore explored the
   correlation of 5-year OS rate with near-time point OS from clinical
   trials with available 4-year OS (6 cancer types), 3-year OS (10 cancer
   types), 2-year OS (18 cancer types), and 1-year OS (25 cancer types)
   data. Of the included clinical trials, 22 indices were assessed,
   including OS and progression-free survival (PFS) outcomes, responses
   (ORR and median duration of response), and immune-related adverse
   events (irAEs). Pearson’s correlations across all indices were
   measured, and we found that the 5-year OS rate was strongly correlated
   with the 1-, 2-, 3-, and 4-year OS rates ([70]Figure 1B). The
   correlation between the 1-year OS rate and the 5-year OS rate was
   notably significant across 6 cancer types (Pearson’s R = 0.80;
   p < 0.0001; [71]Figure 1C).

   We would like to further note stepwise correlations between different
   time point OS data. 3- and 5-year OS rates were generally correlated
   (Pearson’s R = 0.96; p < 0.0001; [72]Figure S1A). We also analyzed 42
   clinical trials with available 3-year OS rates ([73]Table S4), finding
   that 2- and 3-year OS rates were significantly correlated (Pearson’s
   R = 0.94; p < 0.0001; [74]Figure S1B). Then, among 80 clinical trials
   with available 2-year OS rates, we found the correlation between the 2-
   and the 1-year OS rates to also be significant (Pearson’s R = 0.92;
   p < 0.0001; [75]Figure S1C). The above stepwise correlations involving
   more cohorts provide further evidence supporting the use of 1-year OS
   rate as a surrogate for the 5-year OS rate.

   To incorporate more cancer types for feature selection and model
   construction, subsequent analyses included 158 clinical trials
   involving 25 cancer types and 21,023 patients with available 1-year OS
   rates. The number of patients, follow-ups, lines of treatment, and ICI
   regimens of these clinical trials are described in detail in
   [76]Table S3. However, we want to clearly acknowledge the assumptions
   and limitations inherent to utilizing the 1-year OS rate, particularly
   when moving beyond the cancer types that had direct 5-year OS data
   available.

Exploring multi-omics biomarkers of LTS and primary selection of candidate
biomarkers

   To explore the multi-omics predictive biomarkers, we collected exome,
   transcriptome, and methylation data from TCGA. A panel of 107 potential
   biomarkers belonging to ten categories was considered ([77]Figure 2A),
   mapping median values to the 1-year OS rates per cancer type (see
   [78]STAR Methods). Spearman’s correlation analyses identified a total
   of 29 biomarkers significantly correlated with the 1-year OS rate
   following ICI (Spearman’s p < 0.05) but not with the 1-year OS rate
   from TCGA (Spearman’s p > 0.05) ([79]Figure 2A). Among these
   biomarkers, we selected the top 20 variables according to the
   Benjamini-Hochberg (BH) false discovery rate (FDR) adjusted p values on
   the 1-year OS rate following ICI ([80]Table S5).

Figure 2.

   [81]Figure 2
   [82]Open in a new tab

   Landscape of multi-omics biomarkers for LTS

   (A) The correlation of a panel of predictors with 1-year OS rate after
   ICI and TCGA prognostic 1-year OS rate. In the annular stacked bar
   chart, the color and the height of each bar represent the p value and
   the correlation coefficient from the Spearman’s rank test,
   respectively.

   (B and C) Spearman’s correlation between CD8^+ T cell, MHC_score, and
   1-year OS after ICI.

   (D) Spearman’s correlation between MHC_score and ssGSEA scores of 12
   biological pathways in all 25 cancer types. ∗p ≤ 0.05.

   (E) Validation of MHC_score in two immunotherapy datasets. The efficacy
   was evaluated according to RECIST v.1.1. CR, complete response; PR,
   partial response; SD, stable disease; PD, progressive disease. p values
   were from Wilcoxon rank-sum test. Data are represented as mean ± SEM.

   (F) Spearman’s correlation between metabolism score and 1-year OS after
   ICI.

   (G) GSEA plot showing enrichment of pathways in the population with
   superior ICI efficacy. p values from Benjamini-Hochberg (BH) FDR
   adjustment are shown on a color scale.

   (H and I) Spearman’s correlation between CNV burden, total neoantigen
   (log-scaled), and 1-year OS after ICI.

   (B, C, F, H, and I) Each circle represents the median value of a
   biomarker per cancer type from TCGA and the median value of the 1-year
   OS rates from clinical trials of that cancer type.

   See also [83]Figure S2 and [84]Table S5.

   A significant association between the level of CD8^+ T cells and the
   1-year OS rate following ICI (Spearman’s R = 0.55; p = 0.0048;
   [85]Figure 2B) reflected the critical role of CD8^+ T cells in
   antitumor immune effects. Major histocompatibility complex (MHC) class
   I and II genes were utilized to calculate the MHC_score using the Ridge
   regression model, which significantly correlated with the 1-year OS
   rate following ICI (Spearman’s R = 0.48; p = 0.0167; [86]Figure 2C). In
   most cancer types, MHC_score was positively correlated with the level
   of antitumor cytokines, coactivation molecules, and inflammatory
   response, as assayed by single-sample gene set enrichment analysis
   (ssGSEA) scores ([87]Figure 2D). Moreover, the MHC_score for patients
   with NSCLC with response to anti-PD-1 (p = 0.041; [88]Figure 2E, left)
   and patients with melanoma with response to adoptive T cell therapy
   (p = 0.030; [89]Figure 2E, right) was significantly higher than that of
   non-responders, supporting the relevance of MHC_score.

   We next identified set of 527 genes significantly correlated with the
   1-year OS rate after ICI but not with TCGA prognostic 1-year OS rate
   ([90]Table S6), finding that these were mainly enriched in metabolic
   pathways ([91]Figure S2A), which reflects metabolic reprogramming in
   most cancers.[92]^20 We calculated ssGSEA scores for all 106 metabolic
   pathways in the KEGG database, nine of which significantly correlated
   with the 1-year OS rate after ICI but not TCGA 1-year OS rate
   ([93]Table S7). Among these, five pathways ([94]Figure S2B) were
   significantly enriched in ICI beneficiaries (BH FDR adjusted p value
   [p.adjust] < 0.01; [95]Figure 2G). The top two pathways were valine,
   leucine, and isoleucine degradation and tryptophan metabolism
   ([96]Figure S2C). The ssGSEA scores of these five pathways were then
   added to define the metabolism score, which significantly correlated
   with the 1-year OS rate after ICI (Spearman’s R = 0.54; p = 0.0056;
   [97]Figure 2F) and 19 out of the 107 identified biomarkers in
   [98]Figure 2A ([99]Figure S2D).

   Traditionally defined copy-number variation (CNV) burden (“fraction
   altered” and “number of segments”) was not significantly predictive
   ([100]Figure 2A). However, a modified, gene-level CNV burden, based on
   the GISTIC2.0 value of the top 500 altered genes ([101]Table S8),
   correlated with the 1-year OS rate after ICI (Spearman’s R = −0.48; p =
   0.0147; [102]Figure 2H). These altered genes were mainly located on
   chromosomes 8q and 3q ([103]Figure S2E).

   385 mRNAs and seven long non-coding RNAs (lncRNAs) associated with the
   1-year OS rate after ICI (not TCGA 1-year OS rate) were obtained and
   yielded 273 significantly interacting lncRNA-mRNA pairs, the top 20 of
   which are presented in [104]Figure S2F. LINC00601 potentially targeted
   many mRNAs associated with immunotherapy efficacy. The summation of the
   four lncRNAs’ expression levels (RNA-TPM) was defined as the lncRNA
   score, which significantly correlated with the 1-year OS rate following
   ICI (Spearman’s R = 0.46; p = 0.022; [105]Figure S2G).

   Since TMB is an FDA-approved tumor-agnostic predictive biomarker of
   immunotherapy, we manually incorporated mutation-based variables,
   including total TMB, non-synonymous TMB, insertions and deletions
   (indels), non-synonymous indels, single-nucleotide variants (SNVs),
   non-synonymous SNVs, truncating and missense mutations, and tumor
   neoantigen burden (TNB). These variables were highly correlated
   ([106]Figure S2H), and among them we selected the top 5 variables
   according to the adjusted p values. In addition, the correlation
   between log-scaled TNB and the 1-year OS rate after ICI was presented
   ([107]Figure 2I).

   In summary, a total of 25 biomarkers were selected from the analyses
   for further refinement.

Secondary selection of candidate biomarkers

   The median value of the top 25 biomarkers per cancer type, 1-, 2-, and
   3-year OS rates from ICI, and their correlations are summarized in
   [108]Figure 3A. Spearman’s correlation analyses of the top 25
   biomarkers suggest five major groups ([109]Figure 3B): one comprised
   TMB-related biomarkers; one comprised tumor microenvironment
   (TME)-related biomarkers, such as tumor-infiltrating lymphocytes
   (TILs); one comprised genomic instability-related biomarkers, such as
   CNV burden; and the other two groups referred to metabolism score,
   lncRNA score, and MCAM expression, respectively. Importantly,
   inter-group variables had little to no correlation with each other.
   Based on the above intra-variable correlations, it would be redundant
   to incorporate all of the top 25 biomarkers in the predictive model;
   secondary feature selection was thus required to improve its
   interpretability and performance. The importance of the 25 features was
   ranked using random forests (RFs; [110]Figure 3C). We also used GPR
   algorithm-based wrapper methods to select a subset of crucial features.
   The intersection of the RF top 10 important features and GPR-selected
   features was obtained, and based on this, total neoantigen
   (log-scaled), metabolism score, CD8^+ T, and MHC_score were selected as
   candidate biomarkers for the predictive model construction
   ([111]Figure 3D). The correlations between these four biomarkers and
   the 1-year OS rate after ICI remained stable when iteratively removing
   one cancer type to avoid overfitting ([112]Figure S3B). Moreover, the
   likelihood-ratio test of HLM mixed-effects models indicated that these
   four biomarkers were significantly correlated with the 1-year OS rate
   after ICI across cancers (p < 0.05; [113]Figure S3C).

Figure 3.

   [114]Figure 3
   [115]Open in a new tab

   Candidate biomarker selection and validation using scRNA-seq

   (A) Heatmap of normalized median value of top 25 biomarkers correlated
   with 1-year OS rate after ICI (not TCGA prognostic 1-year OS rate) in
   each cancer type. The annotation on top presents the 1-, 2-, and 3-year
   OS rates after ICI of 25 cancer types. The annotation on the right
   presents the Spearman’s correlation coefficients of top 25 biomarkers
   with 1-, 2-, and 3-year OS rates following ICI treatment.

   (B) Spearman’s correlation matrix across top 25 biomarkers (median
   value per cancer type).

   (C) Feature importance scores assessed via random forest analysis.

   (D) Venn plot of features selected from the random forest and GPR
   wrapper method.

   (E–H) scRNA-seq validation using BCC (E and F) and RCC datasets (G and
   H). (E and G) Violin plots of metabolism scores of tumor cells among
   patients responsive or non-responsive to ICI. The point represents a
   single cell, and the color of the point denotes the patient. (F and H)
   Violin plots of immune-related gene expression per cell of CD8^+ T
   populations among patients responsive or non-responsive to ICI.

   Chi-squared (χ^2) and p values were obtained via one-way ANOVA from the
   linear mixed-effects models; ns: p > 0.05, ∗p ≤ 0.05, ∗∗p ≤ 0.01, ∗∗∗p
   ≤ 0.001. Data are represented as mean ± SEM (E–H).

   See also [116]Figures S3 and [117]S4.

Marker validation using scRNA-seq

   We utilized two scRNA-seq datasets to validate the biomarkers beyond
   what could be resolved via bulk transcriptomic analysis. In the BCC
   dataset, the metabolism scores of tumor cells (“tumor_1” cluster) were
   significantly more upregulated among ICI responders than among
   non-responders (χ^2 = 4.37, p = 0.0366), and the metabolism scores of
   the “tumor_2” cluster were also higher among ICI responders, although
   they did not achieve statistical significance (χ^2 = 0.29, p = 0.5892;
   [118]Figure 3E). Instead, the metabolism scores of CD8^+ T cells and
   natural killer (NK) cells indicated no association with ICI response
   ([119]Figure S4A). The CNV burden of the “Tumor_1” cluster from ICI
   responders was significantly lower than that from non-responders (χ^2 =
   5.16, p = 0.0231; [120]Figures S4B and S4D). The relative fraction of
   CD8^+ T and NK cell populations was seemingly higher in
   anti-PD-1-responsive patients ([121]Figure S4C), corresponding with the
   bulk analysis results. Furthermore, GZMB, GZMK, and CXCR3 transcript
   levels of activated CD8^+ T cells; interferon gamma (IFNG), TNFRSF4,
   and CD27 of exhausted CD8^+ T cells; and PRF1 and FASLG of memory CD8^+
   T cells were significantly higher in ICI responders than in
   non-responders ([122]Figure 3F). The effector gene expression of NK
   cells indicated no association with ICI response ([123]Figure S4E).

   In another RCC dataset, the metabolism scores of tumor cells, including
   “TP1” (χ^2 = 5.11, p = 0.0238), “TP2” (χ^2 = 4.28, p = 0.0387), and
   “cycling tumor” clusters (χ^2 = 6.76, p = 0.0093), were significantly
   more upregulated among ICI responders than among non-responders
   ([124]Figure 3G). However, the metabolism scores of CD8^+ T and NK cell
   clusters, excluding “FGFBP2- NK,” indicated no association with ICI
   response ([125]Figure S4F). The CNV burden of tumor cells in ICI
   responders tended to be lower ([126]Figures S4G and S4I). The fraction
   of CD8^+ T and NK cell populations were also higher in ICI responders
   ([127]Figure S4H). Moreover, most cytotoxic and costimulatory genes in
   all CD8^+ T cells clusters and partial NK cell clusters were
   significantly higher in ICI responders than in non-responders
   ([128]Figures 3H and [129]S4J).

   Overall, these results further demonstrated the significance of the
   metabolism score and CD8^+ T cells for model construction of LTS
   benefit.

GPR model construction and validation using TCGA datasets

   To construct a multi-parametric model, we utilized the GPR
   machine-learning algorithm, a non-parametric Bayesian approach, of
   which the prior probability and likelihood are generally assumed to
   have a Gaussian distribution, and the predictive posterior probability
   is speculated using Bayes’ rule.[130]^21 The median value of total
   neoantigen, metabolism score, CD8^+ T, and MHC_score for each cancer
   type from TCGA database, and the median 1-year OS rate from the
   corresponding clinical trials, were input to train the GPR model, and
   the predicted value was nominated as the iLSPS. The hyperparameter of
   the GPR model was tuned by random search optimization. The predictive
   performance of the model was measured using the Pearson correlation
   coefficient between the predicted iLSPS and the true 1-year OS rate.
   Through the training process in a bootstrap resampling framework
   ([131]Figure S5A), iLSPS values were closed to the actual 1-year OS
   rates (Pearson’s R = 0.81; p < 0.0001) in the training set
   ([132]Figure 4A). Using the trained GPR model, we also predicted the
   iLSPS of 18 cancers with available 2-year OS rates, which significantly
   correlated with the actual 2-year OS rates (Pearson’s R = 0.53; p =
   0.0235; [133]Figure 4B). Likewise, the iLSPS of eight cancers with
   available 3-year OS rates were estimated and correlated with the actual
   3-year OS rates (Pearson’s R = 0.89; p = 0.0029; [134]Figure 4C).

Figure 4.

   [135]Figure 4
   [136]Open in a new tab

   GPR model construction with TCGA datasets and utilization in
   ICI-treated cohorts

   (A) Pearson’s correlation between iLSPS and actual 1-year OS rates
   after ICI in the training set of 25 cancer types. The model achieved a
   mean square error (MSE) of 0.0098 and a root MSE (RMSE) of 0.0988.

   (B) Pearson’s correlation between iLSPS and actual 2-year OS rates
   after ICI of 18 cancer types.

   (C) Pearson’s correlation between iLSPS and actual 3-year OS rates
   after ICI of eight cancer types.

   (D) Leave-one-out cross-validation of the model. MSE was 0.0154, and
   RMSE was 0.1242. Each point refers to the true 1-year OS rate after ICI
   of that cancer type and the iLSPS predicted from the model trained with
   the remained 24 cancer types.

   (E) Kaplan-Meier (K-M) curves of OS stratified by iLSPS with the
   optimal cutoff in six public cohorts and our internal cohort.

   (F) Forest plot of OS HRs for patients with high versus low iLSPSs
   (with the optimal cutoff) in seven cohorts.

   (G and J) Pooled AUC for 1-, 2-, 3-, and 4-year OS of seven datasets
   for each biomarker.

   (H and K) Distribution of pooled HRs for patients with high versus low
   iLSPSs from seven cohorts for each biomarker. A series of cutoffs
   across the quartiles of biomarkers were used. Q values were calculated
   via the Wilcoxon rank-sum test, followed by BH FDR adjustment.

   (E–H) Data are represented as mean ± SEM.

   (I) K-M curves of OS stratified by PD-L1 (IHC).

   See also [137]Figure S5.

   To validate the model, we iteratively removed data of one cancer type
   and predicted its survival rate using the model trained with the
   remaining 24 cancer types. The correlation between the predicted iLSPS
   and the 1-year OS rate was still significant (Pearson R = 0.63; p =
   0.0007; [138]Figure 4D). Next, TCGA samples of each cancer type were
   randomly split into several subsets according to the number of clinical
   trials reporting 1-year OS rates for that specific cancer type, which
   constituted a dataset of 158 groups. The median value of total
   neoantigen, metabolism score, CD8^+ T, and MHC_score for each of these
   groups mapped to the 1-year OS rates from 158 clinical trials,
   performing as the test dataset. This random split of TCGA samples was
   repeated 500 times, in which the Pearson’s R between the predicted
   iLSPS and the 1-year OS rate consistently fluctuated within a narrow
   range of 0.60–0.68 ([139]Figure S5B). The most frequent result of
   Pearson’s R was 0.64 (p < 0.0001; [140]Figures S5B and S5C), validating
   the robustness of the model in this independent test dataset.

Utilization of iLSPS in individual patients from ICI-treated cohorts

   We sought to estimate the iLSPSs of individual patients and tested the
   predictive performance in seven independent ICI-treated cohorts of
   melanoma,[141]^14^,[142]^15^,[143]^16^,[144]^17^,[145]^18 BLCA,[146]^19
   and lung cancer (our own cohort, Wang 2022), which were not included in
   model training ([147]Table S9). The predictive ability of iLSPS was
   measured with a series of cutoff points, and the optimal cut point was
   determined with the lowest hazard ratio (HR) for OS in the iLSPS-high
   versus iLSPS-low groups in each cohort ([148]Figure S5D). In four
   melanoma,[149]^14^,[150]^15^,[151]^17^,[152]^18 one BLCA,[153]^19 and
   one lung cancer (Zenodo: 8275988) datasets—with 231, 347, and 30
   patients, respectively—a substantially longer OS was observed in
   patients with high iLSPS compared with those with low iLSPS under the
   optimal cut point ([154]Figure 4E). In the Hugo et al.[155]^16 cohort
   with 26 patients, a better OS was observed in the iLSPS-high than the
   iLSPS-low group, although the difference was not statistically
   significant (HR = 0.35, p = 0.1664). We then performed a pooled
   analysis of all seven cohorts, yielding a pooled HR of 0.44 (95%
   confidence interval [CI], 0.34–0.56; [156]Figure 4F), indicating that
   the predictive effect of the iLSPS on survival after ICIs was
   pronounced in these cohorts. Using the upper 1/3 as the cut point,
   patients with high iLSPSs also exhibited consistently superior survival
   to those with low iLSPSs across most cohorts, with some reaching
   statistical significance ([157]Figure S5E). More importantly, among
   these ICI-treated cohorts, melanoma,[158]^14^,[159]^18 LUAD, and LUSC
   (Zenodo: 8275988) cohorts held relative long-term survival, with some
   of the patients living more than 36 months.

   We compared the predictive power of iLSPS versus conventional
   individual and combined biomarkers. The area under the time-dependent
   receiver operating characteristic curve (AUC) was calculated to
   quantify the ability of continuous biomarkers to predict
   survival.[160]^22 Here, the pooled AUCs of the iLSPS in seven cohorts
   were 0.70, 0.75, 0.71, and 0.83 for 1-, 2-, 3-, and 4-year survival
   rates, respectively, consistently ranking first compared with each
   individual biomarker ([161]Figure 4G). Moreover, HR is an important
   index evaluating the predictive efficacy of biomarker for time-to-event
   outcomes.[162]^23 Pooled HRs of iLSPSs in seven cohorts were
   significantly lower than individual biomarkers with a series of cutoffs
   ([163]Figure 4H). In BLCA[164]^19 and lung cancer (Zenodo: 8275988)
   cohorts with available PD-L1 (immunohistochemistry [IHC]) levels, PD-L1
   alone failed to stratify patients benefitting from ICI ([165]Figures 4I
   and [166]S5F), while iLSPS displayed significant predictive efficacy
   ([167]Figure 4E). Furthermore, we compared the predictive performance
   of iLSPS with TMB- and IFNG-related signatures, including IFNG gene
   expression, IFNG4 signature (IFNG, CD274, LAG3, CXCL9), and effector
   T cell (Teff)/IFNG signature (CD8A, GZMA, GZMB, IFNG, EOMES, CXCL9,
   CXCL10, TBX21). Similarly, the pooled AUC of iLSPSs in seven cohorts
   for 1-, 2-, 3-, and 4-year survival rates consistently ranked first
   compared with these well-known biomarkers ([168]Figure 4J). The pooled
   HRs of iLSPSs across seven cohorts were significantly lower than TMB-
   and IFNG-related signatures across a series of cut points
   ([169]Figure 4K). In addition, TMB and CD8^+ T cells, TMB and IFNG were
   combined, respectively, to construct a linear regression model with
   leave-one-out cross-validation ([170]Figures S5G and S5H), which
   exhibited inferior predictive ability to iLSPS ([171]Figures 4A–4C and
   [172]5C) for survival after ICI. Taken together, iLSPS was predictive
   for 1-year OS of 25 cancer types after ICI and presents as a potential
   composite biomarker to predict LTS benefit at least for melanoma, LUAD,
   and LUSC.

Figure 5.

   [173]Figure 5
   [174]Open in a new tab

   Immune-related biological implications of iLSPS

   (A) Stacked bar plot showing the percentage per TME subtype in the
   high- and low-iLSPS groups in five immunotherapy datasets. p value was
   determined via the Fisher’s exact test.

   (B) Boxplots displaying the association between iLSPS and PD-L1 IHC
   (TC).

   (C) Boxplots showing the association between iLSPS and three
   histologically defined TME subtypes.

   (D) Boxplots showing the distribution of median iLSPS of TCGA 17 cancer
   types in four TME subtypes.

   (B–D) Q values were obtained via the Wilcoxon rank-sum test, followed
   by BH FDR adjustment.

   (E) Boxplots showing the distribution of median IS scores of TCGA 22
   cancer types in the high- and low-iLSPS groups.

   (F) Boxplots showing the distribution of median iLSPSs of TCGA 14
   cancer types in the high- and low-IPS groups.

   (D–F) p value (E and F) was determined via the Wilcoxon rank-sum test.
   The dots (D–F) denote the median score per cancer type.

   (G) Representative microphotographs of mIF (scale: 200 μm) from patient
   no. 2 with high iLSPS and patient no. 5 with low iLSPS.

   (H) The immune phenotype of iLSPS-high and -low patients.

   (I) Percentage of cells in tumor region of iLSPS-high and iLSPS-low
   patients.

   (J) Boxplots showing the association between iLSPS and TCR richness of
   pretreatment and on-treatment samples.

   (K) Boxplots showing the association between iLSPS and ITH in Liu
   et al.[175]^18 cohort.

   p value (I–K) was determined via the Wilcoxon rank-sum test. Data are
   represented as mean ± SEM (B–F and I–K).

   See also [176]Figures S6 and [177]S7.

   We subsequently utilized the model to calculate the iLSPS of all
   patients—i.e., not selecting for those receiving immunotherapy—from 22
   cancer types in TCGA ([178]Figure S5I). However, no significant
   difference in TCGA-reporting OS was observed between the high- (upper
   1/3) and low-iLSPS arms of most cancers, suggesting that iLSPS was not
   a prognostic indicator ([179]Figure S5J). Importantly, iLSPS in
   patients with stage I and II tumors distributed similarly to that of
   stages III and IV ([180]Figure S5K), thereby overcoming the bias from
   various stages of TCGA samples to some extent.

Immune-related biological implications of iLSPS

   Transcriptomic-based analyses are widely used to categorize patients
   into specific TME subtype to explain heterogeneous prognoses or
   responses to regimens. We assessed the association between iLSPSs and
   previously reported TME-related signatures[181]^24 and proposed four
   distinct TME clusters: immune-enriched, fibrotic (IE/F);
   immune-enriched, non-fibrotic (IE); fibrotic (F); and immune-depleted
   (D). We also make reference to an “immune-excluded” phenotype, in which
   the CD8^+ Teff signature was intermediate between immune-inflamed and
   immune-dessert phenotypes. The IE and IE/F subtypes generally exhibited
   superior survival benefits after ICI to the F and D subtypes. In four
   melanoma cohorts[182]^15^,[183]^16^,[184]^17^,[185]^18 and one BLCA
   dataset,[186]^19 a higher fraction of IE/F and IE phenotypes and a
   lower fraction of F and D phenotypes were observed in the iLSPS-high
   group ([187]Figure 5A). Generally, patients with the IE or IE/F subtype
   had significantly higher iLSPS than those with the F or D subtype
   ([188]Figure S6A). In the BLCA cohort,[189]^19 iLSPS was statistically
   higher in patients with positive PD-L1 expression on tumor cells
   (TC2^+ > TC0, p = 7.70 × 10^−5; [190]Figure 5B), and more
   PD-L1-positive (TC2^+) patients were observed in the iLSPS-high group
   ([191]Figure S6B). In addition, in the BLCA cohort,[192]^19 a higher
   iLSPS was observed among patients with immune-inflamed and -excluded
   phenotypes than those with immune-desert phenotypes ([193]Figure 5C),
   and the iLSPS-high group exhibited more immune-inflamed subtypes than
   the iLSPS-low group ([194]Figure S6C). Both TILs and immunomodulatory
   genes ([195]Table S11) differed between patients with high and low
   iLSPSs across the seven ICI-treated cohorts ([196]Figures S6D and S6E).
   These results implied that a high iLSPS may reflect an immunoactive or
   “hot” microenvironment.

   Using TCGA pan-cancer datasets, we found that iLSPS was consistently
   connected to the four major TME phenotypes in most cancers
   ([197]Figure S7A). The IE subtype generally had a higher iLSPS than the
   IE/F, F, and D subtypes ([198]Figure 5D). Similarly, a significantly
   higher immune signature (IS) was attained in the iLSPS-high group (p =
   0.0021; [199]Figures 5E and [200]S7B). Similarly, iLSPS values of
   patients were higher in immunophenoscore (IPS)-high groups
   ([201]Figures 5F and [202]S7C). In most cancer types, the C2 (IFNG
   dominant) subset uniformly possessed the highest iLSPS
   ([203]Figure S7D). This result supported the activated immune status
   associated with high iLSPSs, considering that the C2 subset is
   characterized by a massive CD8 signal, the greatest TCR diversity, and
   highest level of M1/M2 macrophage polarization.

   To confirm the indication of iLSPS high for a “hot” TME, multiplex
   immunofluorescence (mIF) was performed ([204]Figure 5G) in eight lung
   cancer samples from our center (Zenodo: 8275988). Two patients
   displayed an immune-inflamed phenotype and one patient displayed an
   immune-excluded phenotype in the iLSPS-high group, while five patients
   with low iLSPSs were characterized by an immune-desert phenotype
   ([205]Figure 5H). The infiltration of total CD3^+ T cells (p = 0.036),
   PD-1^+CD8^+ T cells (p = 0.026), CD68^+CD163^− M1 macrophages (p =
   0.036) and PD-L1^+CD68^+ macrophages (p = 0.046) in the tumor region
   was significantly higher in the iLSPS-high than in the iLSPS-low group
   ([206]Figure 5I). Overall, the mIF depiction of the histological TME
   validated the association of high iLSPSs with a molecular
   information-derived inflamed TME.

   Upon recognition of neoantigens, the characterization of a T cell
   receptor (TCR) repertoire reflects the T cell expansion and immune
   activation.[207]^25 In the Riaz et al.[208]^17 cohort, for both
   pretreatment and on-treatment samples, a significant increase in the
   number of TCRβ-chain complementarity-determining region (CDR3)
   repertoires and median CDR3 counts per VJ was observed in iLSPS-high
   compared with iLSPS-low patients ([209]Figure 5J), indicating the
   association between iLSPS and TCR richness. Besides, we found that
   patients with lower intra-tumor heterogeneity (ITH) had higher iLSPSs
   (p = 0.016, [210]Figure 5K) in the Liu et al.[211]^18 cohort.

Clinical application of iLSPS

   Here, we provided an overview of iLSPSs, other indicators, and OS in
   ICI-treated cohorts. First, our internal dataset of lung cancer was
   displayed, with higher levels of four features and a more prolonged OS
   (32.90 versus 13.67 months) in the iLSPS-high group ([212]Figure S8A).
   Further, a nomogram of our lung cancer cohort that incorporated
   features—including Eastern Cooperative Oncology Group (ECOG) score,
   smoking history, primary type of tumor, and iLSPS—was developed to
   forecast the survival after ICI ([213]Figure S8B). Likewise, overviews
   and nomograms for the BLCA cohort[214]^19 ([215]Figures S9A and S9B)
   and the largest melanoma cohort[216]^18 ([217]Figures S9C and S9D) were
   presented. The lung cancer, BLCA, and melanoma nomograms all
   demonstrated satisfactory accuracy, with C indices of 0.75, 0.64, and
   0.67, respectively, and the calibration plots indicated good
   concordance between the nomogram-estimated and actual survival
   probabilities ([218]Figures S9E–S9G).

   To be more intuitive, we selected two typical cases, nos. 1 and 2, of
   low and high iLSPS, respectively, from our lung cancer cohort to
   demonstrate the calculation and application of iLSPS in predicting
   survival in a clinical workflow. The clinicopathological features and
   model-calculated iLSPSs were extracted, and the nomogram predicted a
   superior survival for patient no. 2 compared with no. 1 after receiving
   tislelizumab ([219]Figure S8C). As evidenced by computed tomography
   (CT) images ([220]Figure S8D), patient no. 1 with a low iLSPS showed
   the best response of stable disease (SD; assessed according to RECIST
   criteria) and finally progressed, with an OS of 10.43 months.
   Conversely, patient no. 2 with a high iLSPS exhibited the best response
   of complete response (CR) on the liver metastatic lesion, with an OS of
   39.27 months (still living). These results suggested that the iLSPS is
   relevant and convenient in real-world clinical practice, with
   considerable predictive power for survival post-immunotherapy.

Discussion

   In this study, we integrated TCGA molecular multi-omics data and
   immunotherapy clinical trial reports of 1-year OS rates to identify LTS
   predictors and construct a predictive machine-learning model. Since
   in vivo and in vitro experiments to detect the predictors of human LTS
   are impractical and unreachable, in silico prediction models are an
   attractive approach.

   The 5-year OS rate, which reflects the proportion of potentially
   curable patients, is considered an indicator of LTS.[221]^26 Here, we
   investigated the correlation between 5-year OS and other survival time
   points. The direct correlation between 5-year and 1-year OS rates after
   ICI was significant in six cancer types: LUAD, LUSC, KIRC, SKCM, ESCC,
   and BLCA. The stepwise correlation analysis, based on more clinical
   trials with available 3-, 2-, and 1-year OS rates, also help to support
   the use of 1-year OS as a proxy. Since only six cancer types with
   5-year OS performed as six samples were unfeasible to develop a
   predictive model, we then utilized 1-year OS rate, reported in 158
   trials involving 25 cancer types and 21,023 patients, for the following
   in silico bioinformatic analysis, which expanded the sample size to 25
   cancer types.

   Using TCGA database, we considered a panel of 107 variables that might
   reflect the LTS predictor landscape. Based on bulk transcriptomic data,
   we defined an accessible MHC_score using MHC class I and II genes
   expression, which showed a significant association with survival after
   ICI and was validated in two independent immunotherapy datasets. The
   infiltration of CD8^+ T cells was also significantly correlated with
   the 1-year OS rate after ICI. Moreover, CD8^+ T cell clusters among
   ICI-responsive patients exhibited higher expression of cytotoxic genes
   and costimulatory genes, suggesting a dominant role of CD8^+ T cells
   with antitumor cytotoxicity. We calculated a metabolism score based on
   bulk sequencing data and pathway analysis. Interestingly, in our
   supporting analyses of scRNA-seq datasets, we found that the metabolism
   score of TCs, rather than that of CD8^+ T and NK cells, informed
   differential clinical outcomes of treatment with ICIs. These findings
   underscore the implication of tumor metabolism alterations in shaping
   the microenvironment.[222]^27^,[223]^28

   We ultimately constructed the GPR model based on log-scaled TNB, CD8^+
   T cell, metabolism score, and MHC_score. The model-estimated iLSPS
   produced significantly superior pooled HRs and the largest pooled AUC
   value for predicting 1-year OS benefit when compared with established
   individual and combined features. Moreover, in three trial cohorts of
   melanoma,[224]^14^,[225]^18 LUAD and LUSC (our own Wang 2022 cohort)
   with relative long-term-surviving, iLSPS-high patients exhibited
   increased OS compared with iLSPS-low patients, validating the
   predictive value of iLSPS for LTS, at least for melanoma, LUAD, and
   LUSC. We found that a higher proportion of immune-inflamed subtypes and
   signatures were enriched in the high-iLSPS group, consistent with the
   biological underpinnings of ICI therapies.

   Above all, neoantigen burden, MHC_score, CD8^+ T cells, and metabolism
   score were filtered to develop the model. These four biomarkers covered
   the core process of initiating immune defection. Due to the complex
   interactions between tumors and the microenvironment, incorporating
   tumor and host factors, intrinsic and extrinsic factors to establish a
   multi-omics model can enhance prediction accuracy and clinical
   applicability more effectively. Among the four biomarkers developed in
   this study, CD8^+ T cells may be the most crucial. However, the
   contribution weights of these four biomarkers may vary across different
   cancer types. Therefore, it is necessary to include more cancer types
   with sufficient LTS datasets and conduct in-depth analyses of
   identifying key predictive factors in each cancer type to boost the
   predictive power for the LTS of patients.

Limitations of the study

   First, since the direct correlation between 5- and 1-year OS rates
   after ICI was observed in only six cancer types, the utilization of the
   1-year OS rate as surrogate metric of LTS in other cancer types should
   be cautious. Moreover, the predictive value of iLSPSs for LTS was
   tested only in melanoma, LUAD, and LUSC cohorts with relative long-term
   survival. Therefore, it remains unclear how iLSPS performs in other
   cancer types. However, along with the continuous increment of
   immunotherapy cohorts of more tumor types with longer follow-up, we
   anticipate that iLSPS could be utilized for predicting LTS in other
   cancers, as the four key features reflect fairly general immune
   mechanisms. Second, treating each cancer type as a unit to conduct
   feature selection and model training might oversimplify tumor biology.
   However, differences among cancer types often exceed patient-level
   heterogeneity, which was the rationale for the approach. Moreover, the
   value of the selected features and iLSPS was validated across external
   and internal cohorts at the patient level. We look forward to
   consolidating this approach in larger datasets of individuals. Third,
   the included clinical trials involved patients receiving other
   therapies before immunotherapy, which was reflective of the organic
   reality of present-day patients with cancer. However, the predictive
   value of iLSPS for LTS would not be diminished since biomarkers are
   signposts and proxies and not indications of one-to-one mechanistic
   relationships. Finally, due to the heterogeneity and potential bias
   across multiple cohorts, the power of the pooled HR and the pooled AUC
   of seven cohorts may be diluted; however, it remained superior to that
   of individual features in our research. Direct comparison of iLSPS with
   other biomarkers in cohorts with larger sample sizes and prospective
   exploration of iLSPS as a predictor is warranted.

STAR★Methods

Key resources table

   REAGENT or RESOURCE SOURCE IDENTIFIER
   Antibodies
     __________________________________________________________________

   anti-PD-1 Cell Signaling Technology Cat#86163S; RRID:AB_2728833
   anti-PD-L1 Cell Signaling Technology Cat#13684S
   anti-CD8 Abcam Cat#ab178089
   anti-CD163 Abcam Cat#ab182422
   anti-CD68 Abcam Cat#ab213363
   anti-CD20 Daco Cat#L26, IR604
   anti-CD56 Abcam Cat#ab75813
   anti-CD4 Abcam Cat#ab133616
   anti-FoxP3 Abcam Cat#ab20034
   anti-CD3 Daco Cat#A0452
   anti-panCK Abcam Cat#ab7753
     __________________________________________________________________

   Biological samples
     __________________________________________________________________

   Lung cancer samples (n = 30) This study N/A
   Pan-cancer samples TCGA ([226]https://portal.gdc.cancer.gov/)
     __________________________________________________________________

   Critical commercial assays
     __________________________________________________________________

   AllPrep DNA/RNA FFPE Kit QIAGEN Cat#80234
   NEBNext® Ultra™II RNA First Strand Synthesis Module NEB Cat#E7771L
   NEBNext® Ultra™ II Directional RNA Second Strand Synthesis Module NEB
   Cat#E7550L
   Hieff NGS Ultima DNA Library Prep Kit for MGI Yeasen Cat#13310ES98
   DNBSEQ OneStep DNB Make Reagent Kit (OS-DB) MGI Cat#1000026466
     __________________________________________________________________

   Deposited data
     __________________________________________________________________

   Processed scRNA-seq data of BCC cohort Yost et al. 2019[227]^29 GEO:
   [228]GSE123813
   Raw and processed scRNA-seq data of RCC cohort Biet al. 2021[229]^30
   dbGaP: phs002065.v1.p1
   Single Cell Portal:
   ([230]https://singlecell.broadinstitute.org/single_cell/study/SCP1288/t
   umor-and-immune-reprogramming-during-immunotherapy-in-advanced-renal-ce
   ll-carcinoma#study-summary)
   Anti-PD-1-treated NSCLC-Cho Cho et al. 2020[231]^31 GEO: [232]GSE100797
   Adoptive T cell-treated melanoma-Lauss Lauss et al. 2017[233]^32 GEO:
   [234]GSE126044
   ICI-treated melanoma-Snyder Snyder et al. 2014[235]^14
   [236]https://www.cbioportal.org/study/summary?id=skcm_mskcc_2014
   ICI-treated melanoma-Van Allen Van Allen et al. 2015[237]^15 dbGaP:
   phs000452
   ICI-treated melanoma-Hugo Hugo et al. 2016[238]^16 GEO: [239]GSE78220
   ICI-treated melanoma-Riaz Riaz et al. 2017[240]^17 GEO: [241]GSE91061
   ICI-treated bladder cancer- Mariathasan Mariathasan et al. 2018[242]^19
   EGA: EGAS00001002556
   ICI-treated melanoma-Liu Liu et al. 2019[243]^18 dbGaP: phs000452
     __________________________________________________________________

   Software and algorithms
     __________________________________________________________________

   iLSPS code base This paper [244]https://doi.org/10.5281/zenodo.8275988
   survival (version 3.2–10) CRAN
   [245]https://cran.r-project.org/web/packages/survival/index.html
   GSVA (version 1.36.3) Hänzelmann et al. 2013[246]^33
   [247]https://bioconductor.org/packages/GSVA/
   maftools (version 2.7.10) Mayakonda et al. 2018[248]^34
   [249]https://github.com/PoisonAlien/maftools
   pathview (version 1.28.1) Luo and Brouwer. 2013[250]^35
   [251]https://pathview.uncc.edu/
   clusterProfiler (version 3.16.1) Wu et al. 2021[252]^36
   [253]https://doi.org/10.1016/j.xinn.2021.100141(paper)
   ssGSEA Subramanian et al. 2005[254]^37
   [255]http://software.broadinstitute.org/gsea
   Seurat (version 2.3.4) Butler et al. 2018[256]^38
   [257]https://satijalab.org/seurat
   infercnv (version 1.4.0) Elyanow et al. 2021[258]^39
   [259]https://github.com/broadinstitute/inferCNV/wiki
   rms (version 6.1–0) NA [260]https://hbiostat.org/R/rms/
   R (version 4.0.2) R Development Core Team, 2010
   [261]https://www.r-project.org/
   mlr (version 2.18.0) Bischl et al. 2016[262]^40
   [263]https://github.com/mlr-org/mlr
   [264]Open in a new tab

Resource availability

Lead contact

   Further information and requests for resources should be directed to
   and will be fulfilled by the lead contact, Dr. Wang (zlhuxi@163.com).

Materials availability

   This study did not generate new unique reagents.

Experimental model and study participant details

Tissue samples

   Patients diagnosed with advanced/metastatic lung cancer confirmed
   histologically or cytologically and received single-agent ICI were
   retrospectively collected from our center. A total of 30 patients with
   eligible formalin-fixed paraffin-embedded (FFPE) tumor samples were
   recruited. All tissue samples were retrieved with the informed consent,
   and used to perform WES, WTS, and mIF. The clinicopathological
   information of patients were summarized in [265]Table S10. The age of
   the included patients ranged between 56 and 64, and 76.67% of the
   patients were males. This study was approve by the ethics committee of
   the National Cancer Center/Cancer Hospital, Chinese Academy of Medical
   Sciences and Peking Union Medical College, National GCP Center for
   Anticancer Drugs, The Independent Ethics Committee (22/302–3504 and
   23/035–3774). Written consent was obtained from the included patients.

Method details

Study design

   We first reviewed clinical trials enrolled with patients receiving
   single-agent ICI, to detect a surrogate of the 5-year OS rate. Second,
   we depicted the landscape of TCGA predictive biomarkers of
   immunotherapy efficacy and defined several novel predictors. Third,
   primary and secondary feature selection were conducted to filter
   candidate variables, which were validated using scRNA-seq datasets.
   Then, four features were filtered to construct the predictive GPR
   model, which was utilized in ICI-treated public cohorts and our
   internal cohort. Finally, the model application in clinical practice
   was visualized. The detailed methods were described as follows.

Cohort review

Immunotherapy-related clinical trials

   Clinical trials on immunotherapy were searched and manually selected
   from PubMed, Embase, Cochrane Central, clinicaltrials.gov and major
   conference proceedings with the key words: “anti-PD-1 or anti-PD-L1 or
   anti-CTLA-4 or immune checkpoint inhibitor or immune checkpoint
   blockade” and “clinical trial.” To avoid the interference of
   chemotherapy or other regimens, only clinical trials that reported the
   efficacy of single ICI on advanced/metastatic solid tumors were
   included. Trials that only included PD-L1 positive patients or
   first-line therapy were excluded.

   For each clinical trial, the following information, if provided, were
   manually recorded, including the mOS, 1-, 2-, 3-, 4-, 5-year OS rate,
   mPFS, 6-month PFS rate, 1-, 2-, 3-, 4-year PFS rate, ORR, mDOR, any
   grade treatment-related AE incidence rate (AE-all-rate), grade 3 or 4
   treatment-related AE incidence rate (AE-3/4-rate), any grade endocrine
   AE incidence rate (AE-endocrine-rate), grade 3 or 4 endocrine AE
   incidence rate (AE-endocrine-3/4-rate), any grade gastrointestinal AE
   incidence rate (AE-GI-rate), grade 3 or 4 GI AE incidence rate
   (AE-GI-3/4-rate), any grade skin AE incidence rate (AE-skin-rate),
   grade 3 or 4 skin AE incidence rate (AE-skin-3/4-rate), follow-up time,
   number of patients, line of therapy, and treatment. The most recent and
   articulated data were documented for a given clinical trial with
   multiple publications. Pearson’s correlation analyses were conducted
   between these ICI-treated outcomes from all included clinical trials to
   identify the surrogate of the 5-year OS rate. The median value of the
   survival outcomes of the clinical trials of each cancer type was
   computed when regarding each cancer type as a unit.

TCGA tumor datasets

   We only included cancers with both TCGA molecular data and 1-year OS
   rate after ICIs. We further re-classified these cancers according to
   the included clinical trials. For example, we selected patients with
   high microsatellite instability (MSI-H) from colon and rectal
   adenocarcinoma to constitute the cancer type named CRCH. We also
   utilized the MSI status to stratify uterine corpus endometrial
   carcinoma (UCEC) into MSI-H (mismatch repair [MMR] deficient, dMMR)
   UCEC (UCECH) and MSI-L (MMR-proficient, pMMR) UCEC (UCECL).
   Triple-negative breast cancer (TNBC) that did not express estrogen
   receptor, progesterone receptor, or human epidermal growth factor
   receptor 2 (HER-2) were selected. Esophageal carcinoma was divided into
   esophageal squamous carcinoma (ESCC) and esophageal adenocarcinoma
   carcinoma, and the latter was combined with gastroesophageal junction
   cancer (GEJC) and stomach adenocarcinoma, to produce the “GESC” cancer
   type. Data regarding pancreatic ductal adenocarcinoma (PDAC) were
   selected from pancreatic adenocarcinoma. Therefore, 25 newly classified
   cancer types were produced ([266]Table S2).

   The OS time from TCGA clinical data
   ([267]https://portal.gdc.cancer.gov/) was manually curated from
   “days_to_last_follow-up” (CDE_ID: 3008273) if vital_status (CDE_ID:5) =
   “Alive” or “days_to_death” (CDE_ID: 3165475) if vital_status = “dead”.
   The TCGA 1-year OS rate for each cancer type was estimated using the
   Kaplan–Meier method with the R package survival (version 3.2–10). Six
   immune subtypes of C1 (wound healing), C2 (IFN-γ dominant), C3
   (inflammatory), C4 (lymphocyte depleted), C5 (immunologically quiet),
   and C6 (TGF-β dominant) were characterized across multiple TCGA tumor
   types.[268]^41 The immune signature (IS) scores of 22 TCGA cancer types
   were obtained from.[269]^42 The IPS values of TCGA 14 cancer types were
   obtained from.[270]^43

Single-cell RNA sequencing datasets

   Two scRNA-seq datasets of patients receiving ICI were obtained. One was
   the BCC cohort,[271]^29 and pretreatment biopsy samples of 11 patients
   (su001, su002, su003, su004, su005, su006, su007, su008, su009, su010,
   and su012) treated with pembrolizumab or cemiplimab were included.
   Another was the RCC cohort,[272]^30 and four patients (P55, P906, P913,
   and P915) reporting ICI-treated outcomes were included.

Immunotherapy-related datasets

   We curated public datasets of PD-L1 non-selective advanced/metastatic
   cancer treated with single-agent ICIs. In total, five melanoma
   datasets[273]^14^,[274]^15^,[275]^16^,[276]^17^,[277]^18 and one BLCA
   dataset[278]^19 were finally included. Besides, a total of 30 lung
   cancer patients receiving single-ICI from our center were respectively
   collected and utilized as the internal cohort.

   Samples with available transcriptomics sequencing data and OS data were
   included in the analysis, and those with unknown neoantigen burden were
   filled using the k-nearest neighbors algorithm with k = 10. For the
   Snyder 2014 cohort, patients SD0346 and SD2051 treated with combined
   ICI therapy were excluded. For the Mariathasan 2018 cohort, duplicate
   sample data of patient 10285 were excluded. Finally, patients’
   clinicopathological information and molecular data of all seven cohorts
   were manually curated and summarized ([279]Tables S9, [280]S10, and
   [281]S11).

   Three distinct pathological immune phenotypes (immune-inflamed,
   immune-excluded, or immune-desert) and PD-L1 expression (TC levels) of
   patients in the Mariathasan 2018 cohort, as well as four distinct TME
   classifications (immune-enriched, fibrotic [IE/F]; immune-enriched,
   non-fibrotic [IE]; fibrotic [F]; or immune-depleted [D]) of the cohorts
   (Van Allen 2015, Hugo 2016, Riaz 2017, Mariathasan 2018, Liu 2019) were
   retrieved from.[282]^24 ITH data was available in,[283]^18 and
   TCRβ-chain CDR3s repertoire data was available in.[284]^17

   In addition, an NSCLC cohort treated with anti-PD-1,[285]^31 and a
   melanoma cohort treated with adoptive T cell therapy[286]^32 were
   included, among which patients with available MHC genes expression data
   were enrolled.

Bioinformatics

WES processing

   For samples from our internal cohort, genomic DNA was extracted using
   the TIANamp Genomic DNA kit (Tiangen Biotech, Beijing, China) following
   manufacturer’s instruction. DNA concentration and purity were estimated
   using Nanodrop 2000 spectrophotometer and Qubit 2.0 Fluorometer with
   Quanti-IT dsDNA HS Assay Kit (Thermo Fisher Scientific, MA, USA).
   Library construction was performed using a custom 53M length capturing
   probe, made by Integrated DNA Technologies (IDT, IA, USA), and covering
   the coding regions of all genes and partial non-coding regions.
   Captured libraries were then pair-end sequenced in 100 bp lengths with
   DNBSEQ-T7R platform (MGI, Shenzhen, China) following manufacturer’s
   instructions. Raw data was filtered to remove low-quality reads and
   adaptor sequence. Reads were further mapped to the reference human
   genome (hg19) utilizing BWA aligner (version 0.7.10) for mutation
   calling.

Somatic mutations

   MuTect2 and GATK were performed on the WES data of our internal cohort
   to call SNVs and Indels mutations. Two or more mutation callers
   (NCALLERS >1) were assessed on the WES data of TCGA samples to generate
   maf files.[287]^41 The TMB was assessed as the total number of defined
   mutations with the variatllele frequency (VAF) ≥ 1% per sample. Data
   regarding somatic SNVs were extracted with theed a filter
   Variant_Type = ‘‘SNP’, while somatic indel variants were extracted with
   the filter Variant_Type = “INS,” ‘‘DEL”. Data regarding non-synonymous
   mutations were extracted using Variant_Classification =
   “Frame_Shift_Del,” “Frame_Shift_Ins,” “Splice_Site,”
   “Translation_Start_Site,” “Nonsense_Mutation,” “Nonstop_Mutation,”
   “In_Frame_Del,” “In_Frame_Ins,” “Missense_Mutation”. Data regarding
   truncating mutations were extracted using Variant_Classification =
   “Nonsense_Mutation,” “Splice_Site,” “Frame_Shift_Del,”
   “Frame_Shift_Ins.” Data regarding missense mutations were extracted
   using Variant_Classification = “Missense_Mutation”. Transition (Ti),
   transversion (Tv), their ratio (Ti/Tv), and mutant-allele tumor
   heterogeneity (MATH), a straightforward estimate of intratumor
   heterogeneity[288]^44 of each sample were calculated using the R
   package maftools (version 2.7.10).[289]^34

Neoantigen prediction

   To screen the neoantigen of our internal samples, depth-based filters
   were employed,[290]^45 and any variants with normal coverage ≤5× and
   normal VAF ≥2% were filtered out. For tumor coverage from DNA, a cutoff
   was placed at ≥ 10× with a VAF of ≥40%. To further evaluate the effect
   of relevant nearby variants on neoantigen identification, we used
   netMHC- 4.0, an updated version of the pVAC tools software, to assess
   the binding affinities of the neoantigens with the corrected mutant
   peptide sequence.[291]^46 TNB was defined as the number of neoantigens
   obtained through the above prediction process.

   The TNB of TCGA samples were calculated using NetMHCpan v3.0 by
   identifying those with MHC binding affinity <500 nM and retrieved from
   [292]https://gdc.cancer.gov/about-data/publications/panimmune.

   In addition, the TCGA landscape of ten canonical pathways, cell cycle,
   Hippo, Myc, Notch, Nrf2, PI-3-Kinase/Akt, RTK-RAS, TGFβ signaling, p53,
   and β-catenin/Wnt were obtained from.[293]^47 For each cancer type, the
   proportion of patients with one or more alterations (amplification,
   deletion, mutation, epigenetic silencing, or fusion) in at least one
   member of each pathway was calculated for each cancer type.

Chromosomal instability

   The aneuploidy score reflected the imbalance of chromosome numbers and
   was obtained from.[294]^48 Purity, ploidy, whole genomic doubling
   (WGD), and ITH, defined as the subclonal genome fraction, were
   calculated using ABSOLUTE.[295]^49 The chromosomal instability (CIN)
   score was defined as the sum of squares of gene-level GISTIC2.0 values
   and derived from.[296]^42 Loss of heterozygosity (LOH) events were
   measured by “LOH_n_seg” and “LOH_frac_altered.” The HRD score was
   calculated by summing the HRD-LOH, large-scale state transitions, and
   telomeric allelic imbalance (NtAI).

Copy number variation

   Data regarding focal copy number gains and losses identified by
   in silico admixture removal (ISAR)-corrected GISTIC2.0 were downloaded
   from [297]https://gdc.cancer.gov/about-data/publications/panimmune. CNV
   burden scores (“fraction altered” and “number of segments”), which were
   respectively defined as the fraction of altered bases and the total
   number of altered segments measured the general copy number variation
   profile. To refine the degree of copy number alteration, we retained
   genes with a CNV frequency among each cancer type that significantly
   correlated with 1-year OS rate after ICI but not TCGA 1-year OS rate.
   Top 500 genes according to the absolute value of Spearman correlation
   coefficient with 1-year OS rate after ICI were extracted to further
   define “CNV burden” as the summation of gene-level GISTIC2.0 absolute
   value, with 2 indicating deep amplification/deletion (alterations
   beyond the range of the median copy-ratio for each chromosome arm), and
   1 indicating shallow amplification/deletion (alterations between 0.1
   relative copy number and the thresholds for deep alterations).

DNA damage response

   Nine “core DNA damage response (DDR)” pathways comprising a set of 9
   DDR genes and 71 DNA repair pathway-specific genes were defined as
   previously described: base excision repair (BER), nucleotide excision
   repair (NER, including TC-NER and GC-NER), mismatch repair (MMR),
   Fanconi anemia (FA), homologous recombination (HR), non-homologous
   end-joining (NHEJ), direct repair (DR), and translesion synthesis
   (TLS). The fraction of patients with at least one gene alteration
   (mutation, deep copy number deletion, or epigenetic silencing) per
   given DDR pathway was calculated for each cancer type.

RNA-seq processing

   RNA extraction, library construction, sequencing, and FASTQ data
   quality control of our internal cohort were performed as previously
   described.[298]^50 The RNA-seq expression profiles (tumors only) of
   TCGA 25 cancer types were downloaded from The Cancer Genome Atlas
   (TCGA) portal ([299]https://portal.gdc.cancer.gov/) and transformed to
   log-scaled transcripts-per-million (TPM) values.

Immune cell infiltration

   The profiles of 16 immune cell populations[300]^51 were computed using
   the ssGSEA algorithm, including CD8 T cells (CD8^+ T), central memory
   T cells (Tcm), effector memory T cells (Tem), gamma delta T cells
   (Tgd), regulatory T cells (Treg) T helper cells, follicular helper
   T cells (Tfh), B cells, macrophages, monocytes, mast cells,
   eosinophils, activated dendritic cells (aDCs), immature dendritic cells
   (iDCs), NK CD56 dim cells (NKdim), and NK CD56 bright cells
   (NKbright)]. “CD8/Treg” cells were determined as the ratio of CD8^+ T
   to Treg cells. “dim/bri NK” represented the ratio of NKdim to NKbright.
   “aiDC” represented the ratio of aDCs to iDCs and “adp/inat” represented
   the ratio of adaptive immune cells (CD8^+ T, Tcm, Tem, Tgd, Treg, and T
   helper cells, Tfh, and B cells) to innate immune cells (macrophages,
   monocytes, mast cells, eosinophils, aDC, iDC, NKdim, and NKbright).

   TCR CDR3 sequences from T cells were inferred via RNAseq using MiTCR
   v1.0.3.[301]^52 B cell receptor (BCR) heavy chain contigs were
   reconstructed via RNAseq using the V’ DJer tool.[302]^53 V region
   identity, V and J gene segments, and CDR3 sequences were further
   identified using IMGT/HighV-Quest.[303]^54 TCR and BCR diversity scores
   (Shannon entropy, evenness, and richness) of TCGA samples were
   calculated.[304]^41

Inflammation signature

   The T cell-inflamed gene expression profile (GEP) score was calculated
   as the weighted sum of the transcript levels of 18 inflammatory genes
   involved in antigen presentation, chemokine expression, immune
   response, and immune resistance.[305]^55 The dendritic cell (DC) index
   referred to the level of infiltration and was computed as the weighted
   sum of chemokines recruiting DCs into the tumor
   microenvironment.[306]^56 Immune cytolytic activity (CYT) was defined
   as the geometric mean of granzyme A (GZMA) and perforin (PRF1)
   transcript levels (0.01 offset).[307]^57 The IFN-γ 4/6/10/18/28 gene
   signature,[308]^58 effector T cell (Teff) gene signature,[309]^59 and T
   effector and interferon-γ (Teff/IFN-γ) gene signatures[310]^60 were
   determined as the geometric means of the associated gene transcript
   levels (0.01 offset). The immune score, stromal score, and ESTIMATE
   score were analyzed using the ESTIMATE algorithm.[311]^61 Due to
   restrictions of access to germline HLA types, MHC classes I and II
   genes (HLA-A, HLA-B, HLA-C, MICA, MICB, HLA-DPA1, HLA-DPB1, HLA-DQA1,
   HLA-DQA2, HLA-DQB1, HLA-DQB2, HLA-DRA, HLA-DRB1, and HLA-DRB5)
   expression (RNA-TPM) were used to train Ridge regression model to
   calculate MHC_score.

lncRNA

   Protein-coding mRNAs and lncRNAs were identified according to GENCODE
   (release 22). Only genes with a median value >0 in more than half of
   the cancer types and associated with 1-year OS rate after ICI but not
   TCGA prognostic 1-year OS rate were selected to perform Spearman’s
   correlation analysis between mRNAs and lncRNAs. Moreover, significant
   mRNA-lncRNA interactive pairs with adjusted p value (from BH.fdr
   correction) < 0.05 were selected, from which the top 20 pairs according
   to the absolute value of Spearman correlation coefficient with 1-year
   OS after ICI were extracted to establish a mRNA-lncRNA regulatory
   network. The expression of lncRNA from these top 20 pairs was added to
   compute the lncRNA score.

Tumor metabolism

   KEGG pathway enrichment analysis was performed on genes associated with
   immunotherapy efficacy using the R package clusterProfiler (version
   3.16.1).[312]^62 The p value adjusted using the BH.fdr method was used
   to determine the significance.

   Genes panel regarding 106 metabolic pathways were obtained from the
   KEGG database and the metabolic activities of the pathways were
   assessed using the ssGSEA algorithm.[313]^33 We selected pathways for
   which the ssGSEA scores were significantly correlated with 1-year OS
   rate after ICI but not with TCGA 1-year OS rate. Gene set enrichment
   analysis (GSEA) enrichment analysis was conducted to further select
   metabolic pathways enriched in patients with superior survival benefit
   from immunotherapy. Genes were ranked according to Spearman’s
   correlation coefficients between transcript levels and 1-year OS rate
   after ICI. The pathways were portrayed using the R package pathview
   (version 1.28.1).

   In addition, the activities of immune-related pathways were also
   determined using the ssGSEA method.

Methylation

   DNA methylation data derived from the Infinium HM450 array (485,577 CpG
   site targeting probes) were downloaded from TCGA portal
   ([314]https://portal.gdc.cancer.gov/). The global genomic methylation
   loss of each sample was measured based on the beta values of probes
   mapped to long interspersed nuclear element-1 (LINE-1).[315]^63 A probe
   was considered hyper- or hypo-methylated as previously defined,[316]^64
   and hyper- or hypo-methylated probe frequencies of each sample were
   calculated. Then, DNA methylation inversion (DMI) scores were generated
   based on hyper- or hypo-methylation frequencies.

Feature selection

Primary feature selection

   For primary feature selection from 107 potential variables mentioned
   above, Spearman correlation analyses were conducted between the median
   value of TCGA multi-omics data and immunotherapy clinical trials’
   1-year OS rates per cancer type by regarding each cancer type as a
   unit.

Secondary feature selection

Random forest analysis

   The random forests (RFs) technique[317]^65 is an ensemble machine
   learning technique that aggregates a large number of decision trees,
   and predictions for regression from all trees are pooled to arrive at
   the final prediction, reducing the risk of overfitting phenomenon. Each
   tree was constructed based on bootstrap sample sets randomly drawn from
   the original dataset and a randomly selected subset of features. The
   remaining samples, the so-called out-of-bag samples, were used as the
   test set to determine the predictive ability of trees. The increase in
   mean square error (MSE) produced per given randomly permuted variable
   was calculated as the %IncMSE, indicating the importance of each
   feature.[318]^66 A higher %IncMSE value for a predictor indicates
   higher importance. Here, we utilized an RF of 200 trees to identify the
   importance of the features. Missing values in the dataset were filled
   using proximity imputation from the RF.

Gaussian process regression model

   The Gaussian process (GP) is a nonparametric kernel-based machine
   learning model[319]^67 that can make predictions relying on a few
   parameters. Due to the limited data of high quality in clinical
   scenarios, it was not reasonable to utilize other supervised learning
   methods that required a myriad of samples.

   GP regression (GPR) is a common application of the GP that was used to
   solve regression problems in our work. It makes predictions by first
   assuming a prior distribution and then updating the posterior
   distribution based on the observed samples corresponding to the
   Bayesian theorem. The GPR does not predict a specific value; rather, it
   predicts a function based on a sample from the GP by specifying a mean
   function
   [MATH: <mrow><mi mathvariant="normal">m</mi><mspace
   width="0.25em"></mspace><mrow><mo>(</mo><mi
   mathvariant="normal">x</mi><mo>)</mo></mrow><mo linebreak="goodbreak"
   linebreakstyle="after">=</mo><mi>E</mi><mspace
   width="0.25em"></mspace><mrow><mo>(</mo><mrow><mi>f</mi><mrow><mo>(</mo
   ><mi>x</mi><mo>)</mo></mrow></mrow><mo>)</mo></mrow></mrow> :MATH]
   and a covariance function
   [MATH: <mrow><mi>k</mi><mspace
   width="0.25em"></mspace><mrow><mo>(</mo><mrow><mi>x</mi><mo>,</mo><mi>z
   </mi></mrow><mo>)</mo></mrow><mo linebreak="goodbreak"
   linebreakstyle="after">=</mo><mi>c</mi><mi>o</mi><mi>v</mi><mrow><mo>(<
   /mo><mrow><mi>f</mi><mrow><mo>(</mo><mi>x</mi><mo>)</mo></mrow><mo>,</m
   o><mi>f</mi><mrow><mo>(</mo><mi>z</mi><mo>)</mo></mrow></mrow><mo>)</mo
   ></mrow></mrow> :MATH]
   where
   [MATH: <mrow><mi>x</mi><mo>,</mo><mi>z</mi><mo linebreak="goodbreak"
   linebreakstyle="after">∈</mo><msup><mi>R</mi><mi>d</mi></msup></mrow>
   :MATH]
   . This can be rewritten as
   [MATH: <mrow><mi mathvariant="normal">f</mi><mspace
   width="0.25em"></mspace><mrow><mo>(</mo><mi
   mathvariant="normal">x</mi><mo>)</mo></mrow><mspace
   width="0.25em"></mspace><mo>∼</mo><mspace
   width="0.25em"></mspace><mi>G</mi><mi>P</mi><mspace
   width="0.25em"></mspace><mrow><mo>(</mo><mrow><mi>m</mi><mrow><mo>(</mo
   ><mi>x</mi><mo>)</mo></mrow><mo>,</mo><mi>k</mi><mspace
   width="0.25em"></mspace><mrow><mo>(</mo><mrow><mi>x</mi><mo>,</mo><mi>z
   </mi></mrow><mo>)</mo></mrow></mrow><mo>)</mo></mrow></mrow> :MATH]
   . The prior mean is assumed to be the mean of the training data, as the
   features are centered and scaled. The prior’s covariance is specified
   by passing a radial basis function kernel object, given by
   [MATH: <mrow><mi>k</mi><mrow><mo
   stretchy="true">(</mo><mrow><msub><mi>x</mi><mi>i</mi></msub><mo>,</mo>
   <msub><mi>x</mi><mi>j</mi></msub></mrow><mo
   stretchy="true">)</mo></mrow><mo linebreak="goodbreak"
   linebreakstyle="after">=</mo><mi>exp</mi><mrow><mo
   stretchy="true">(</mo><mrow><mo>−</mo><mfrac><mrow><mi>d</mi><mspace
   width="0.25em"></mspace><msup><mrow><mo
   stretchy="true">(</mo><mrow><msub><mi>x</mi><mi>i</mi></msub><mo>,</mo>
   <msub><mi>x</mi><mi>j</mi></msub></mrow><mo
   stretchy="true">)</mo></mrow><mn>2</mn></msup></mrow><mrow><mn>2</mn><m
   sup><mi>l</mi><mn>2</mn></msup></mrow></mfrac></mrow><mo
   stretchy="true">)</mo></mrow></mrow> :MATH]
   , which is parameterized by a length-scale parameter
   [MATH: <mrow><mi>l</mi><mo linebreak="goodbreak"
   linebreakstyle="after">></mo><mn>0</mn></mrow> :MATH]
   and
   [MATH: <mrow><mi>d</mi><mspace width="0.25em"></mspace><mrow><mo
   stretchy="true">(</mo><mrow><msub><mi>x</mi><mi>i</mi></msub><mo>,</mo>
   <msub><mi>x</mi><mi>j</mi></msub></mrow><mo
   stretchy="true">)</mo></mrow></mrow> :MATH]
   is the Euclidean distance.

Feature selection via the wrapper method

   The wrapper method is an approach to feature selection that aims to
   select a feature subset that leads to the best training performance.
   Each sample in our dataset contained top 25 different features. To
   perform the feature selection,
   [MATH: <mrow><mi mathvariant="normal">D</mi><mo linebreak="goodbreak"
   linebreakstyle="after">=</mo><mrow><mo
   stretchy="true">{</mo><mrow><mrow><mo
   stretchy="true">(</mo><mrow><msub><mi>x</mi><mi>i</mi></msub><mo>,</mo>
   <msub><mi>y</mi><mi>i</mi></msub></mrow><mo
   stretchy="true">)</mo></mrow><mspace
   width="0.25em"></mspace><mo>|</mo><mspace
   width="0.25em"></mspace><mi>i</mi><mo
   linebreak="badbreak">=</mo><mn>1</mn><mo>,</mo><mn>2</mn><mo>,</mo><mo>
   …</mo><mo>…</mo><mo>,</mo><mi>n</mi></mrow><mo
   stretchy="true">}</mo></mrow><mo linebreak="goodbreak"
   linebreakstyle="after">=</mo><mrow><mo>(</mo><mrow><mi>X</mi><mo>,</mo>
   <mi>Y</mi></mrow><mo>)</mo></mrow><mo>,</mo><mi>n</mi><mo
   linebreak="goodbreak"
   linebreakstyle="after">=</mo><mn>25</mn><mo>,</mo><msub><mi>x</mi><mi>i
   </mi></msub><mo linebreak="goodbreak"
   linebreakstyle="after">∈</mo><msup><mi>R</mi><mi>d</mi></msup></mrow>
   :MATH]
   (
   [MATH: <mrow><mi>d</mi></mrow> :MATH]
   is the 25 dimensions of features, e.g., the median CD8^+ T value and
   other features of each cancer type), and
   [MATH: <mrow><msub><mi>y</mi><mi>i</mi></msub></mrow> :MATH]
   represents the median 1-year OS rate of the included clinical trials
   per cancer type, n is the number of samples, referring to 25 cancer
   types). Through repeated training of various possible feature subsets
   following backward search, we estimated each model’s performance by
   10-fold cross-validation and selected the best feature subset evaluated
   by the MSE.

   Intersections of the top 10 important features in RFs and the selected
   feature subset from GPR were extracted for subsequent model
   establishment.

   Moreover, taken into consideration cancer type-specific factors, TCGA
   samples of each cancer type were split into several parts, mapping to
   the 1-year OS rates from 158 clinical trials of 25 cancer types, to
   measure the associations between these four selected features and
   1-year OS rate after ICI using HLM mixed-effects models. For each
   feature of interest, we compared a full model (1) with a null model (2)
   using ANOVA to determine if the feature was correlated with survival
   after ICI, using the R package lme4 (v1.1-26).

   Full = 1-year OS rate ∼ feature + (1 | cancer type) (1)

   Null = 1-year OS rate ∼ (1 | cancer type) (2)

   Cancer type was incorporated as a random factor while the feature was
   incorporated as a fixed factor.

scRNAseq validation of markers

   Uniform manifold approximation and projection clustering was performed
   using the R package Seurat (version 2.3.4). Cell clusters were obtained
   from[320]^29 and.[321]^30 The metabolism scores of individual cells
   were computed with the normalized count data following bulk analysis
   using the ssGSEA method. CNV profiles of all single cells were
   estimated using the R package infercnv (version 1.4.0), based on
   scRNA-seq profiles, and the CNV burden was calculated as the sum of
   alteration values of top 500 genes from bulk analysis for each given
   cell.

   Linear mixed-effects model was used to measure the difference of
   markers, including metabolism score, CNV burden, and effector genes
   expression, between ICI responders and non-responders using the R
   package lme4 (v1.1-26). In each cell cluster, we compared a full model
   (1) with a null model (2) using an ANOVA to determine if the marker was
   correlated with ICI response.

   [MATH: <mrow><mi>F</mi><mi>u</mi><mi>l</mi><mi>l</mi><mo
   linebreak="goodbreak"
   linebreakstyle="after">=</mo><mi>m</mi><mi>a</mi><mi>r</mi><mi>k</mi><m
   i>e</mi><mi>r</mi><mo linebreak="goodbreak"
   linebreakstyle="after">∼</mo><mtext>Response</mtext><mo
   linebreak="goodbreak"
   linebreakstyle="after">+</mo><mrow><mo>(</mo><mn>1</mn><mspace
   width="0.25em"></mspace><mo>|</mo><mspace
   width="0.25em"></mspace><mi>P</mi><mi>a</mi><mi>t</mi><mi>i</mi><mi>e</
   mi><mi>n</mi><mi>t</mi><mo>)</mo></mrow></mrow> :MATH]
   1

   [MATH: <mrow><mi>N</mi><mi>u</mi><mi>l</mi><mi>l</mi><mo
   linebreak="goodbreak"
   linebreakstyle="after">=</mo><mi>m</mi><mi>a</mi><mi>r</mi><mi>k</mi><m
   i>e</mi><mi>r</mi><mo linebreak="goodbreak"
   linebreakstyle="after">∼</mo><mrow><mo>(</mo><mn>1</mn><mspace
   width="0.25em"></mspace><mo>|</mo><mspace
   width="0.25em"></mspace><mtext>Patient</mtext><mo>)</mo></mrow></mrow>
   :MATH]
   2

   Patients were incorporated as a random factor while ICI responses were
   incorporated as a fixed factor.

Model construction, utilization

Final predictive score for the GPR model

   In our research, each cancer type was regarded as a unit, and there was
   a total of 25 cancer types, implying that 25 samples were incorporated
   to train this model. The GPR model was trained using the dataset D as
   shown in the equation
   [MATH: <mrow><mi>D</mi><mo linebreak="goodbreak"
   linebreakstyle="after">=</mo><mrow><mo
   stretchy="true">{</mo><mrow><mrow><mo
   stretchy="true">(</mo><mrow><msub><mi>x</mi><mi>i</mi></msub><mo>,</mo>
   <msub><mi>y</mi><mi>i</mi></msub></mrow><mo
   stretchy="true">)</mo></mrow><mspace
   width="0.25em"></mspace><mo>|</mo><mspace
   width="0.25em"></mspace><mi>i</mi><mo
   linebreak="badbreak">=</mo><mn>1</mn><mo>,</mo><mn>2</mn><mo>,</mo><mo>
   …</mo><mo>…</mo><mo>,</mo><mi>n</mi></mrow><mo
   stretchy="true">}</mo></mrow><mo linebreak="goodbreak"
   linebreakstyle="after">=</mo><mrow><mo>(</mo><mrow><mi>X</mi><mo>,</mo>
   <mi>Y</mi></mrow><mo>)</mo></mrow><mo>,</mo><mi>n</mi><mo
   linebreak="goodbreak"
   linebreakstyle="after">=</mo><mn>25</mn><mo>,</mo><msub><mi>x</mi><mi>i
   </mi></msub><mo linebreak="goodbreak"
   linebreakstyle="after">∈</mo><msup><mi>R</mi><mi>d</mi></msup></mrow>
   :MATH]
   , and the number of features in each sample was optimized to four
   (log-scaled TNB, metabolism score, CD8^+ T cell, and MHC_score) to
   calculate the posterior distribution of the 1-year OS rate, which was
   defined as iLSPS.

   Missing values were filled using the k-nearest neighbors algorithm with
   k = 10. The space of hyperparameter sigma was searched and located via
   the Random search optimization method, and the optimized value was
   selected through bootstrap in that space. The fitting performance of
   the model was evaluated by Pearson R between the predicted iLSPS and
   true 1-year OS rate, and statistics in terms of the mean square error
   (MSE) and root-mean-square error (RMSE). The GPR stated above was
   achieved with R package mlr version 2.18.0 using the learner
   "regr.gausspr".

   Leave-one-out validation of the model was conducted. Besides, TCGA
   samples of each cancer type were divided into several parts according
   to the number of corresponding clinical trials reporting 1-year OS
   rates following single-ICI, which constituted a dataset of 158
   TCGA-based groups, with the median value of four features mapping to
   the 1-year OS rates from 158 clinical trials. This random split of TCGA
   samples was repeated 500 times and the performance of the model was
   also measured by Pearson R between the predicted iLSPS and 1-year OS
   rates of these 158 groups. Moreover, six public cohorts and our
   internal cohort mentioned above were used to test the model utility in
   ICI-treated patients.

Nomogram

   A nomogram was formulated based on the results of multivariable Cox
   proportional hazards regression analysis, using the R package rms
   (version 6.1–0). The predictive performance of the nomogram was
   measured by depicting a calibration plot and quantified by calculating
   Harrell’s C-index, with a value larger than 0.5 indicating predictive
   performance superior to random guessing.

Multiplex immunofluorescence

   Multiplex immunofluorescence (IF) staining was performed on 4 mm
   sections of FFPE tumor samples from our internal cohort, using an OPAL
   7-color Automation mIF Kit (Akoya Biosciences, Marlborough, MA, USA).
   FFPE tissue slides were incubated with specific primary antibodies
   including pan-CK, anti-CD8, anti-PD-1, anti-CD68, anti-CD163,
   anti-PD-L1 (panel 1), pan-CK, anti-CD3, anti-CD4, anti-CD20, anti-CD56,
   anti-FoxP3 (panel 2). Horseradish peroxidase-conjugated secondary
   antibody and corresponding reactive Opal fluorophores were then
   applied. Multiplex-stained slides were scanned using a Vectra Polaris
   Automated Quantitative Pathology Imaging System (Akoya), and scans from
   different channels were merged. InForm (Akoya) software was used for
   quantitative image analysis. Tumor and stroma regions were divided by
   4′,6-diamidino-2-phenylindole (DAPI)-labeled cell nuclei and CK-labeled
   tumor cells. Quantities of each cell population were reported as
   percentages (immune cells/total cells of DAPI) in the tumor area. The
   histology of tumor–immune phenotypes (inflamed, excluded, and desert)
   was interpreted as previously described.[322]^19

Quantification and statistical analysis

Survival analysis

   For the OS analysis, the log rank test was used to compare Kaplan–Meier
   survival curves of two arms. The Cox proportional hazards regression
   method was used to determine HRs and 95% CIs using the R package
   survival (version 3.2–10). A time-dependent receiver operating
   characteristic curve was plotted and the AUC was used to determine the
   predictive ability of continuous biomarkers for survival.

Statistical analysis

   Pearson’s and Spearman’s correlation analysis were performed using the
   R package stats version 4.0.2. For continuous data, the Wilcoxon test
   was used to compare two groups, while the Kruskal-Wallis test was used
   to compare across multiple groups. BH.fdr adjustment was utilized for
   multiple tests correction. Fisher’s exact test was used to compare two
   groups for categorical data.

Acknowledgments