Graphical abstract
graphic file with name fx1.jpg
[63]Open in a new tab
Highlights
* •
A TMErisk model was established based on six immune/stroma-related
genes
* •
High TMErisk predicts poor prognosis and immunotherapy efficacy in
multiple cancers
* •
High TMErisk correlated with immune-suppressive tumor micro-milieu
* •
TMErisk outperformed PD-L1 and TMB in predicting immunotherapy
efficacy
__________________________________________________________________
Biological sciences; Immunology; Cancer; Machine learning
Introduction
Antibodies targeting programmed cell death 1 (PD-1) immune checkpoint
or its ligand PD-L1 have yielded substantial overall survival (OS) and
therapeutic response benefits in non-small cell lung cancer
(NSCLC).[64]^1^,[65]^2^,[66]^3^,[67]^4 Their clinical application,
however, faces challenges which mainly lie in the relatively low
response rate of 14%–20% for single-agent treatment in unselected
patient population,[68]^4^,[69]^5^,[70]^6 and the lack of
well-established predictive biomarkers. PD-L1 by immunohistochemistry
(IHC) and tumor mutation burden (TMB) have been supported by multiple
evidence to serve as predictive biomarkers for immunotherapeutic
response and have gained approval for routine clinical
use,[71]^7^,[72]^8^,[73]^9 whereas in some cases they are confronted by
paradoxical prediction. It is difficult to draw a precise screening of
potential beneficiaries using a single biomarker. A multidimensional
indicator based on tumor cells, tumor microenvironment (TME) and host
immunity holds promise to integrate complementary factors for
individualized immunotherapy.
Tumor micro-milieu is an ecological niche where tumor cells exist and
is also composed of other nonmalignant components including
infiltrating immune cells, stromal cells, other types of cells,
extracellular matrix (ECM), vasculature, and signaling
molecules.[74]^10^,[75]^11 Components in the tumor ecosystem interact
with each other and collectively control tumor initiation and
progression, patient prognosis and drug sensitivity.[76]^10^,[77]^12
Current published studies mainly put emphasis on crosstalk between
cancer cells and two major benign cell components (immune cells and
stromal cells). The type, proportion, and spatial distribution of
infiltrating immune cells in the TME determine the immune phenotype of
tumor and therefore affect drug sensitivity of cancer cells and patient
survival.[78]^12^,[79]^13 The stromal cells bridge the crosstalk
between cancer cells and other TME components, which plays a crucial
regulatory role in tumor development and progression as well as
anti-tumor immunity, such as the immunosuppressive function of
cancer-associated fibroblasts (CAFs).[80]^14^,[81]^15 A previous study
published in Nature unraveled a dynamic co-evolution of pre-invasive
bronchial cells and the immune response during carcinogenesis, which
highlighted the feasibility of developing immune biomarkers.[82]^16
Considering the critical role of TME in carcinogenesis and tumor
progression, evaluating TME status could provide robust personalized
prediction of drug sensitivity and patient prognosis.
In this study, we employed the Estimation of STromal and Immune cells
in MAlignant Tumors using Expression data (ESTIMATE) algorithm and the
Least Absolute Shrinkage and Selection Operator (LASSO) regression
analysis to screen TME-related prognostic genes in the lung squamous
cell carcinoma (LUSC) patients from The Cancer Genome Atlas (TCGA)
database and established a TME-related risk scoring (TMErisk) model.
Prognostic efficacy of TMErisk and its ability to predict
immunotherapeutic response were comprehensively assessed and validated
in multiple cohorts. On development of the TMErisk model, we aimed to
provide a reliable and efficient quantitative instrument to select
patients likely or unlikely to be immunotherapy responders, and provide
insights into development of new combined therapy strategies.
Results
Correlation of immune/stromal scores with patient prognosis
494 patients containing complete prognosis information in TCGA-LUSC
cohort were enrolled in our study to develop TMErisk model
([83]Figure 1, [84]Table S1). Based on ESTIMATE algorithm, these
patients had a median immune score of 993.75 (range, −1189.75–3436.15),
and a median stromal score of −92.13 (range, −2217.10–1856.99)
([85]Table S2). The optimal cut-points of immune score and stromal
score determined by maximally selected rank statistics were 1904.86 and
191.14, respectively ([86]Figures S1A and S1B), based on which,
patients were divided into low and high immune/stromal score groups.
Kaplan–Meier survival analysis showed that patients with higher
immune/stromal scores had unfavorable OS compared with those with lower
scores (immune score: hazard ratio [HR] 1.41, 95% confidence interval
[CI] 1.01–1.97, p = 0.047; stromal score: HR 1.52, 95% CI 1.15–2.00,
p = 0.003) ([87]Figures S1C and S1D).
Figure 1.
[88]Figure 1
[89]Open in a new tab
A flowchart delineating process of model building and our data analysis
protocol
TCGA, the cancer genome atlas; LUSC, lung squamous cell carcinoma;
FPKM, Fragments Per Kilobase of exon model per Million mapped
fragments; ESTIMATE, Estimation of STromal and Immune cells in
MAlignant Tumors using Expression data; WGCNA, weighted gene
co-expression network analysis; TME, tumor microenvironment; NSCLC,
non-small cell lung cancer; LUAD, lung adenocarcinoma; TMB, tumor
mutation burden; ssGSEA, single sample gene set enrichment analysis;
TIDE, tumor immune dysfunction and exclusion; GDSC, Genomics of Drug
Sensitivity in Cancer; CTRP, Cancer Therapeutics Response Portal;
PD-L1, programmed cell death ligand 1; AUC, area under the receiver
operating characteristic curve.
Development and validation of TMErisk model
DESeq2 package was used to compare gene expression profile of patients
with low immune/stromal scores with those with high scores.
2885 differentially expressed genes (DEGs) were identified to be
related with immune score and 2007 to be related with stromal score
([90]Figures 2A and 2B). In weighted gene co-expression network
analysis (WGCNA), we identified 35 co-expressed gene modules
([91]Figure 2C, [92]Table S3), among which genes in the light green,
brown, blue and cyan modules were robustly associated with immune
score, and genes in the brown, blue, cyan and yellow modules had a
strong correlation with stromal score ([93]Figure 2D). Venn diagrams
presented overlapped genes between DEGs and strong TME-related genes
identified by WGCNA ([94]Figure 2E, [95]Table S4). These intersections
were further inputted into LASSO-Cox regression analyses to achieve
dimension reduction. The output genes associated with immune score and
stromal score were integrated to generate 15 candidate model genes
([96]Figure 2E, [97]Table S5). Association of the 15 candidate genes
with OS was evaluated in univariate Cox analyses ([98]Figure S2A) and
14 genes (HLA-DMB was excluded, p > 0.15) were incorporated into a
stepwise multivariate Cox model. The model containing TGM2, C11orf96,
PLAAT4, PNCK, KLF5 and C4BPA had a minimum AIC value (AIC = 2225.845).
Then, TMErisk could be calculated through the following formula:
TMErisk = 0.272255∗C11orf96 + 0.570467∗TGM2 - 0.318056∗PNCK -
0.752920∗PLAAT4 - 0.262275∗KLF5 + 0.311932∗C4BPA.
Figure 2.
[99]Figure 2
[100]Open in a new tab
Identification of significant TME-related genes
(A and B) The volcano map shows the differentially expressed genes
according to immune score (A) and stromal score strata (B).
(C) A dendrogram of clustered genes in WGCNA. Each branch in the figure
represents a gene and each color lump below represents a co-expressed
gene module.
(D) Correlation heatmap showing the relationship between the gene
modules and immune/stromal scores. Robust TME-related gene modules are
marked with black frames. The number within the color lump indicates
the correlation coefficient, followed by a statistical p value in the
parenthesis.
(E) Screening of candidate model genes. The Venn diagram shows the
intersection genes between DEGs and robust TME-related gene modules
identified by WGCNA. These overlapping genes are then inputted into a
LASSO regression analysis. The vertical dashed lines in the middle
panel represent the selected appropriate lambda value. Immune
score-related and stromal score-related LASSO output genes are combined
to generate candidate model genes. Genes in bold in the rectangular
boxes are unique genes in immune-related/stromal-related genes. DEGs,
differentially expressed genes; WGCNA, weighted gene co-expression
network analysis; LASSO, Least Absolute Shrinkage and Selection
Operator. See also [101]Figures S1–S4.
Patients in the TCGA-LUSC cohort were scored according to this model
and were stratified into low- and high-TMErisk groups based on a
cut-point value of 0.27 ([102]Figure S2B). Survival analysis revealed
that high TMErisk was significantly associated with worse OS (HR 2.88,
95% CI 2.09–3.96, p < 0.001) ([103]Figure 3A). This significant
association was validated in 6 NSCLC datasets ([104]GSE81089,
[105]GSE30219, [106]GSE37745, [107]GSE157011, TCGA-LUAD and
[108]GSE31210; all with a univariate Cox p-value of <0.001)
([109]Figures 3B–3G).
Figure 3.
[110]Figure 3
[111]Open in a new tab
Prognostic value of TMErisk in multiple publicly available NSCLC
datasets
(A–G) Kaplan–Meier survival analysis based on TMErisk strata in the
TCGA-LUSC training cohort. (B-G) Kaplan–Meier survival analyses based
on TMErisk strata in the [112]GSE81089 (B), [113]GSE30219 (C),
[114]GSE37745 (D), [115]GSE157011 (E), TCGA-LUAD (F) and [116]GSE31210
(G) cohorts.
(H) Time-dependent AUC shows accuracy of TMErisk for predicting OS in
the TCGA-LUSC training cohort and 6 NSCLC validation cohorts. NSCLC,
non-small cell lung cancer; TCGA, the cancer genome atlas; LUSC, lung
squamous cell carcinoma; AUC, area under the receiver operating
characteristic curve.
Prognostic potency of TMErisk and its correlation with clinicopathological
characteristics
Predictive accuracy of TMErisk for OS was evaluated using C-index and
time-dependent AUC. In the TCGA-LUSC cohort, TMErisk had a higher
C-index compared with TNM stage (0.620 [95% CI 0.578–0.661] vs. 0.557
[95% CI 0.516–0.598]) and combination of TMErisk and TNM stage could
yield a better C statistic (0.632, 95% CI 0.591–0.673). C-index (95%
CI) for validation cohorts [117]GSE81089, [118]GSE30219, [119]GSE37745,
[120]GSE157011, TCGA-LUAD and [121]GSE31210 were 0.711 (0.607–0.815),
0.654 (0.566–0.742), 0.711 (0.649–0.787), 0.606 (0.569–0.643), 0.629
(0.582–0.676) and 0.771 (0.689–0.853), respectively. Time-dependent AUC
analyses demonstrated that the training cohort and most of the
validation sets possessed good accuracy for predicting OS across
different time points ([122]Figure 3H). To determine relationship
between TMErisk and clinicopathological characteristics, we evaluated
distribution of TMErisk across selected baseline patient
characteristics in the TCGA-LUSC cohort. It was found that the elderly
patients had significantly higher TMErisk (median, 0.000 versus −0.046,
p = 0.027) ([123]Figure S3A). However, we observed no significant
difference between TMErisk and other clinicopathological factors
including gender, TNM stage, ECOG (Eastern Cooperative Oncology Group)
performance status (PS) and smoking status (all with p > 0.05)
([124]Figures S3B–S3H). Univariate and multivariate Cox analyses were
used to analyze impact of TEMrisk and other baseline parameters on
patient prognosis. In univariate analysis, higher TMErisk, AJCC tumor
stage of III and IV (versus stage I), and two or more ECOG PS
significantly predicted poorer OS (p < 0.05) ([125]Figure S4A). All
covariates analyzed in univariate analyses were incorporated into a
stepwise multivariate Cox model and the result revealed that TMErisk
was an independent prognostic factors with adjustment for other
baseline factors ([126]Figure S4B).
TMErisk correlated with immunosuppressive functional features and TME
landscape
To clarify the underlying cellular and molecular mechanisms for
prognostic difference related to TMErisk, we sought to investigate
signatures of biological function and TME landscape between the low-
and high-TMErisk groups. We found that significant enriched molecular
processes in the low-TMErisk group were mainly related with
down-regulated KRAS signaling ([127]Figure 4A). In contrast, gene sets
associated with coagulation, TNFα signaling via NF-κB, IL6-JAK-STAT3,
up-regulated KRAS signaling, complement, epithelial-mesenchymal
transition (EMT), Inflammatory response, angiogenesis and IL2-STAT5
signals were significantly enriched in the high-TMErisk group
([128]Figure 4A), which indicated chronic inflammation and formation of
immunosuppressive microenvironment.
Figure 4.
[129]Figure 4
[130]Open in a new tab
Association of TMErisk with cellular and molecular signatures
(A) TME-related Functional enrichment was performed by the
compareCluster function. The x-axis represents different selected genes
used for conducting enrichment analysis. The left facet panel shows the
enriched pathways for genes up-regulated in the high-TMErisk group, and
the right facet panel shows the enriched pathways for genes
up-regulated in the low-TMErisk group.
(B) The heatmap shows the normalized infiltration abundance of immune
and stromal cells that estimated by CIBERSORT, EPIC and xCell
algorithms. In the Wilcoxon test results, the orange asterisks indicate
cells significantly enriched in the low-TMErisk group, and the
lightgreen asterisks indicate cells significantly enriched in the
high-TMErisk group. ∗p < 0.05; ∗∗p < 0.01; ∗∗∗p < 0.001; ns, not
significant. See also [131]Figures S5–S8.
The abundance of infiltrating immune and stromal cells in the TME was
inferred by CIBERSORT, EPIC and xCell algorithms ([132]Tables S6,
[133]S7, and [134]S8). We observed that tumors with high-TMErisk had
higher infiltration levels of cells that could contribute to
immunosuppressive microenvironment, including regulatory T cells
(Tregs), endothelial cells, CAFs, macrophages M2 and neutrophil;
however, abundance of CD4^+ and CD8^+ T cells was higher in the
low-TMErisk group ([135]Figure 4B, [136]Table S9). Because most
chemokines exert an important influence on recruitment of immune cells
into the TME and the crucial role of some genes in modulating immune
response, we evaluated expression levels of 41 chemokine genes and 48
immune-related genes between different TME risk stratifications. It
revealed that expression levels of 34 immune-related genes (including
CTLA-4, PD-1, PD-L2, TIGIT, BTLA, TNFRSF14, VISTA, etc.) and 23
chemokine genes (including CCL2, CCL5, CCL13, CCL17-24, CXCL1, CXCL2,
CXCL5, etc.) were elevated in the high-TMErisk group ([137]Figures S5A
and S5B). Two co-stimulatory genes (TNFRSF18, TNFRSF25) and two
chemokine gens (CCL26, XCL1) were highly expressed in the low-TMErisk
group, whereas there was no significant difference in PD-L1 expression
level ([138]Figures S5A and S5B). In addition, the correlation between
6 model genes and selected immune-related genes was analyzed using
Pearson method. Our results revealed that TGM2 ([139]Figure S6A), C4BPA
([140]Figure S6B), C11orf96 ([141]Figure S7A) and PLAAT4
([142]Figure S7B) had significant positive correlation with most of the
selected immune-related genes, while PNCK and KLF5 were negatively
associated with most of the immune-related genes ([143]Figures S8A and
S8B).
Mutation profile for patients with low and high TMErisk in the TCGA-LUSC
cohort
Genetic alteration has been recognized as a pivotal factor that can
impact tumor biological behaviors and prognosis. We used “maftools”
instrument to explore whether there existed genetic mechanisms that
related to TMErisk. The top 15 genes mutated more frequently in the
low-TMErisk group included CDK6, FAM133B, LOC101927497, MTOR, PLCH1,
BCL11A, MIR4432, MIR4432HG, ATP13A2, COL6A3, IGSF3, CASZ1, KIF17, MMEL1
and PRDM2, whereas for the high-TMErisk group, the top 15 genes with
higher mutation frequency compared with those in the low-TMErisk group
were MYO15A, ATAD2, JMJD1C, IRF6, ITPR1, KIAA1524, VWA3A, CDKL5, DPH2,
DDC, HEG1, VGLL1, SLC35F1, OR7C2 and RNF10 ([144]Figures S9A and S9B).
The top 20 mutated genes and their mutation types based on the TME risk
stratification are respectively presented in [145]Figures S9C and S9D.
It has emerged that mutation heterogeneity of TP53 gene had predictive
potential for immunotherapy efficacy, in which PD-L1 expression, IFN-γ
signatures and TME composition were significantly distinguished between
tumors with TP53 missense and nonsense mutations.[146]^17 In our work,
the low-TMErisk group had higher frequency of TP53 missense mutation
compared with the high-TMErisk group (53.9% versus 38.6%, Chi-square p
value = 0.017), whereas frequency of nonsense mutation and truncating
mutation (referring to frame_shift_del, frame_shift_ins, and nonsense
mutations) was similar ([147]Figure S9E).
TMErisk had potential in predicting drug sensitivity
Based on the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer
Therapeutics Response Portal (CTRP) databases, we investigated whether
TMErisk could serve as a predictive biomarker for therapeutic response
to chemotherapy and targeted therapy. It turned out that patients with
low TMErisk were more sensitive to most chemotherapeutic and targeted
agents, including drugs that are commonly used in clinical setting,
such as paclitaxel, docetaxel, cisplatin and afatinib
([148]Figure S10A–S10H). TMB and Tumor Immune Dysfunction and Exclusion
(TIDE) algorithm were utilized to infer predictive potential of TMErisk
in terms of immunotherapeutic response. The results demonstrated that
patients with high TMErisk had higher TIDE score (Wilcoxon p < 0.001)
and lower TMB (Wilcoxon p = 0.017) than those with low TMErisk
([149]Figures 5A and 5B), indicating a better response to immunotherapy
in the low-TMErisk group.
Figure 5.
[150]Figure 5
[151]Open in a new tab
Evaluation of value of TMErisk in predicting immunotherapeutic response
and validation of predictive value of TMErisk for immunotherapy in
external cohorts
(A) Difference of TIDE score between the low- and high-TMErisk groups
was examined by Wilcoxon test.
(B) Difference of TMB between the low- and high-TMErisk groups was
examined by Wilcoxon test. Kaplan–Meier analyses for OS and PFS based
on TMErisk strata in OAK cohorts (C-D, G-H), POPLAR cohort (K-L) and
IMvigor210 cohort (O-P). Stacked percentage bar charts show the
association of TMErisk with ORR and DCR in in OAK cohorts (E-F, I-J),
POPLAR cohort (M−N) and IMvigor210 cohort (Q-R). TIDE, tumor immune
dysfunction and exclusion; TMB, tumor mutation burden; HR, hazard
ratio; CI, confidence interval; BOR, best overall response; CR,
complete response; PR, partial response; SD, stable disease; PD,
progressive disease; ORR, objective response rate; DCR, disease control
rate. See also [152]Figures S10 and[153]S11.
TMErisk predicted immunotherapeutic response in external cohorts
We first assessed the ability of TMErisk to serve as a predictive
biomarker for anti-PD-1/PD-L1 therapy in two GEO transcriptomic
datasets. Response and survival benefits could be observed for patients
with low TMErisk in the two GEO cohorts ([154]GSE135222: HR 5.28, 95%
CI 2.06–13.51, p < 0.001; [155]GSE78220: HR 6.49, 95% CI 1.93–21.78,
p = 0.002; objective response rate [ORR] for [156]GSE78220: 0.0% vs.
61.9%, p = 0.039) ([157]Figures S11A–S11C). Similar results were found
in another four external immunotherapy datasets from OAK, POPLAR and
IMvigor210 clinical trials, in which patients with high TMErisk score
had poorer OS and progression-free survival (PFS) ([158]Figures 5C, 5D,
5G, 5H, 5K, 5L, 5O, and 5P). In the OAK cohorts, high TMErisk was
significantly associated with lower ORR in LUSC patients (6.0% vs.
25.0%, p = 0.028) ([159]Figure 5E) and reduced disease control rate
(DCR) in LUAD patients (35.8% vs. 61.2%, p < 0.001) ([160]Figure 5J).
For the immunotherapy groups in the IMvigor210 and POPLAR cohorts,
patients with high TMErisk had both lower ORR and DCR
([161]Figures 5M−5N, 5Q, and 5R). TMErisk also was negatively
associated with OS and PFS in the chemotherapy groups from OAK and
POPLAR clinical trials ([162]Figures 6A, 6B, 6E, 6F, 6I, and 6J);
however, no significant relationship between the TMErisk and ORR as
well as DCR was observed (all with a p value of >0.05)
([163]Figures 6C, 6D, 6G, 6H, 6K, and 6L).
Figure 6.
[164]Figure 6
[165]Open in a new tab
Validation of predictive value of TMErisk for chemotherapy in external
cohorts
(A-B, E-F, I-J) Kaplan–Meier analyses for OS and PFS based on TMErisk
strata in OAK cohorts (A-B, E-F) and POPLAR cohort (I-J). (C-D, G-H,
K-L) Stacked percentage bar charts show the association of TMErisk with
ORR and DCR in in OAK cohorts (C-D, G-H) and POPLAR cohort (K-L).
(M−N) Predictive accuracy of TMErisk, TMB and PD-L1 for OS (M) and PFS
(N) in the POPLAR and IMvigor210 cohorts. HR, hazard ratio; CI,
confidence interval; BOR, best overall response; CR, complete response;
PR, partial response; SD, stable disease; PD, progressive disease; ORR,
objective response rate; DCR; OS, overall survival; PFS,
progression-free survival; TMB, tumor mutation burden; PD-L1,
programmed cell death ligand 1. See also [166]Figures S10 and [167]S12.
Univariate Cox analyses of TMErisk and other clinicopathological
covariates for OS and PFS in the POPLAR and IMvigor210 cohorts are
shown in [168]Figures S12A–S12D. These analyzed variables were then
incorporated into multivariate Cox models with stepwise screening.
TMErisk was found to be an independent predictive factor for OS and PFS
in patients treated with immunotherapy or chemotherapy drugs (all with
a p-value of <0.05). TMB and PD-L1 are two commonly used predictive
biomarker for response to immunotherapy in clinical practice. We next
compared the predictive performance of TMErisk, TMB and PD-L1 for OS
and PFS in the POPLAR and IMvigor210 cohorts. Our results demonstrated
that TMErisk outperformed TMB and PD-L1 in predicting OS and PFS for
immunotherapy and a combination of the three indexes could promote
predictive accuracy ([169]Figures 6M−6N). Predictive advance of TMErisk
for OS was also observed in the chemotherapy cohort from POPLAR, while
these three biomarkers alone had similar predictive accuracy for PFS
([170]Figures 6M−6N).
TMErisk predicted benefits of immunochemotherapy in the in-house cohort
For the internal ORIENT-11 cohort, a total of 113 samples from
immunochemotherapy (combo) group and 58 samples from chemotherapy group
were included in our study to conduct transcriptomic analysis. Detailed
sample screening procedure have been delineated in our previously
published study.[171]^18 Median TMErisk was 0.00 (range, −1.68 – 1.93)
for the combo group and −2.00 (−3.56 – 1.16) for the chemo group. In
the combo group, patients with high-TMErisk (42/113, 37.2%) had worse
median OS (11.7 vs. 34.0 months; HR 3.42, 95% CI 2.10–5.56, p < 0.001)
and PFS as well as lower ORR and DCR than those with low-TMErisk
([172]Figures 7A–7C). For the chemotherapy group from ORIENT-11,
high-TMErisk was significantly associated with shorter OS (10.6 versus
not reached months, HR 4.04, 95% CI 1.84–8.87, p < 0.001) whereas there
was no significant association between TMErisk and PFS (p = 0.423) as
well as ORR (p = 0.195) ([173]Figures 7D–7F).
Figure 7.
[174]Figure 7
[175]Open in a new tab
Validation of predictive value of TMErisk for immunotherapy and
chemotherapy in the internal cohort
(A and B) Kaplan–Meier analyses for OS and PFS based on TMErisk strata
in the immunochemotherapy group from ORIENT-11 cohort.
(C) Stacked percentage bar chart shows the distribution of BOR
according to TMErisk (left panel) and association of TMErisk with ORR
(right panel) in the immunochemotherapy group from ORIENT-11 cohort.
(D and E) Kaplan–Meier analyses for OS and PFS based on TMErisk strata
in the chemotherapy group from ORIENT-11 cohort.
(F) Stacked percentage bar chart shows the distribution of BOR
according to TMErisk (left panel) and association of TMErisk with ORR
(right panel) in the chemotherapy group from ORIENT-11 cohort.
(G–J) Predictive accuracy of TMErisk and PD-L1 for OS and PFS in the
ORIENT-11 immunochemotherapy and chemotherapy groups. Chemo,
chemotherapy; IO + Chemo, immunotherapy plus chemotherapy; HR, hazard
ratio; CI, confidence interval; BOR, best overall response; CR,
complete response; PR, partial response; SD, stable disease; PD,
progressive disease; ORR, objective response rate; R, responder; NR,
non-responder; AUC (t), time-dependent area under the receiver
operating characteristic curve; OS, overall survival. See also
[176]Figure S12.
Association of other clinicopathological factors with OS and PFS for
two treatment groups in ORIENT-11 was also evaluated using univariate
Cox analyses and the results are presented in [177]Tables S10 and
[178]S11. Multivariate analysis corroborated that TMErisk was an
independent predictive factor for OS in the combo (HR 3.81, 95% CI
2.33–6.24, p < 0.001) and chemo (HR 4.04, 95% CI 1.84–8.87, p < 0.001)
group, and for PFS in the combo group (HR 3.01, 95% CI 1.83–4.94,
p < 0.001); whereas TMErisk failed to predict PFS in the chemotherapy
group ([179]Figures S12E and S12F). We next specifically compared the
predictive efficacy of TMErisk and PD-L1 on patient survival and
therapeutic response. It was found that, irrespective of treatment
groups, TMErisk did better than PD-L1 in predicting OS at different
time points ([180]Figures 7G and 7H). In terms of predictive accuracy
for PFS, TMErisk was significantly superior to PD-L1 in the combo group
([181]Figure 7I), while both of the two biomarkers had low and similar
predictive accuracy in the chemo group ([182]Figure 7J). Intriguingly,
we observed that predictive efficacy of PD-L1 had a reduction tendency
over time; predictive accuracy of TMErisk, however, decreased first and
then raised ([183]Figures 7G–7J).
Discussion
Immune cells and stromal cells, as two major components in the tumor
micro-milieu, both can exert an important influence on biology and
prognosis of tumors and their response to treatment through direct
interaction with cancer cells or other indirect
mechanisms.[184]^10^,[185]^12^,[186]^15 In this study, we developed a
risk scoring (TMErisk) model with 6 gene signatures that related to
immune and stromal scores, and dissected the association of TMErisk
with TME as well as cellular and molecular signatures. Performance of
TMErisk for predicting OS and immunotherapeutic efficacy were evaluated
and validated in an internal cohort and multiple external cohorts. Our
results demonstrated that TMErisk could mirror the composition and
features of TME, and showed significant association with unfavorable OS
and poorer response (PFS, ORR and DCR) to checkpoint blockade
immunotherapy.
Accumulating studies have employed the ESTIMATE algorithm to infer
proportion of infiltrating immune and stromal cells in the tumor
samples, which highlighted reliability and effectiveness of prognosis
prediction according to TME signatures and the predictive advantage of
TME scores than other single biomarkers.[187]^19^,[188]^20 Based on
methodology of WGCNA and a machine-learning method LASSO-Cox, we
conducted a dimension-reduced screening for gene expression matrix from
TCGA-LUSC samples and a simplified model containing 6 gene signatures
(TGM2, C11orf96, PLAAT4, PNCK, KLF5 and C4BPA) was generated.
Prognostic performance of TMErisk was comprehensively evaluated via
several methodologies in the present study. Compared with TNM stage,
TMErisk had a higher C statistic for predicting OS and combination of
the two variables could enhance the predictive efficacy. Then, TMErisk
was corroborated to be an independent prognosticator in adjusted
multivariate Cox regression analysis. Ability of TMErisk for OS
prediction was validated in 6 NSCLC datasets, in which C-index ranged
from 0.606 to 0.711 and favorable predictive accuracy at different time
points was exhibited in time-dependent AUC. These assessments indicated
that TMErisk could serve as a quantitative and promising risk
stratification tool to guide personalized treatment. Furthermore,
analysis of drug sensitivity revealed a better sensitivity to most of
the tested chemotherapeutic and targeted agents in the low-TMErisk
group than that in the high-TMErisk group, which provided a preliminary
hint of therapeutic prediction. However, drug sensitivity was estimated
based on gene expression matrix and drug response data of tumor cell
lines and therefore, should be interpreted with this caveat in mind. It
appeared that TMErisk was unable to significantly stratify PFS and was
marginally associated with therapeutic response in the chemotherapy
groups from OAK, POPLAR and ORIENT-11 cohorts. The chemotherapeutic
sensitivity of patients screened using TMErisk remains to be evaluated
in studies with larger sample size and different cancer types.
Antibodies blocking PD-1 or PD-L1 checkpoint have revolutionized the
treatment paradigm of various cancer types, whereas only a fraction of
patients benefit from this treatment, and a reliable and accurate
predictive biomarker has not been established. To investigate the
potential role of TMErisk in predicting benefits of immunotherapy, we
analyzed the relationship of TMErisk with response and survival
benefits in multiple external cohorts and an in-house cohort. We
identified robust value of TMErisk for predicting therapeutic response
and patient prognosis in all these immunotherapy cohorts, though only a
marginal statistical significance was observed for LUSC patients from
OAK trial. We thought the small sample size and the retrospective
nature might be, in part, responsible for this marginal association. In
the atezolizumab group from POPLAR and IMvigor210 clinical trials, high
TMErisk was demonstrated to be an independent factor both for OS and
PFS in the multivariate analyses. In addition, predictive accuracy (as
measured using C-index) of TMErisk for OS and PFS was significantly
superior to that of PD-L1 and TMB. A strategy assessing combined
TMErisk, PD-L1 and TMB could promote predictive performance, and
therefore, might provide additional information for screening of
potential beneficiaries. In the internal cohort ORIENT-11,
time-dependent AUC also uncovered a remarkable predictive advantage of
TMErisk than PD-L1 for OS and PFS across different time points. More
interestingly, predictive accuracy of TMErisk first attenuated, and
then escalated, whereas there was a diminishing tendency for PD-L1,
which, to some extent, revealed inadequacy of PD-L1 as a predictive
biomarker.
We conducted in-depth bioinformatics analyses to elucidate mechanisms
that contributed to differential prognosis and response status between
the low- and high-TMErisk groups. CIBERSORT, EPIC and xCell algorithms
were applied to assess infiltration levels of immune and stromal cells
for each sample in the TCGA-LUSC cohort. It was found that high-TMErisk
group had higher infiltration abundance of most immune and stromal
cells than the low-TMErisk group. Specifically, cells that can catalyze
formation of immune-suppressive microenvironment, such as Tregs,
endothelial cell, CAFs, macrophages M2 and neutrophil, were enriched in
the high-TMErisk group. Communication between these cells and tumor
cells provides compensatory inhibitory mechanisms that contribute to
tumorigenesis and development as well as immune evasion, and thus may
affect the response to immunotherapy.[189]^21 By contrast, CD4^+ and
CD8^+ T cells, as the prime effectors of anti-tumor immunity, were
enriched in the low-TMErisk group. Accordingly, we found that
chemokines that guide Tregs (CCL17 and CCL22), microphages (CCL2 and
CCL5), and neutrophils (CXCL1, CXCL2 and CXCL5) into solid tumors
upregulated in the high-TMErisk group. Most of the selected
immune-related genes were highly expressed in the high-TMErisk group,
including negative checkpoints CTLA-4, PD-1, PD-L2, TIGIT, BTLA,
TNFRSF14 and VISTA, which might demonstrate a series of immune evasion
events in tumors with high TMErisk.
In functional enrichment analysis, we found that gene sets involved in
immunosuppression-related biological procedures including
IL6-JAK-STAT3, complement, EMT, inflammatory response, angiogenesis and
IL2-STAT5 signaling, were enriched in the high-TMErisk group. EMT is a
procedure characterized as phenotype transition from epithelial cells
to mesenchymal cells, in which series of biological events can occur,
such as induction of tumor stem cells, angiogenesis and immune
evasion.[190]^22 A recent study uncovered that there existed complex
and dynamic immunomodulatory crosstalk between EMT and immune evasion,
which was relevant to immunotherapeutic response and patient
survival.[191]^23
Taken together, we identified that tumors in the high-TMErisk group was
characterized by immunosuppressive micro-milieu, low abundance of
infiltrating CD4^+ and CD8^+ T cells, and low TMB. These phenotypes
might contribute to inferior therapeutic response to anti-PD1/PD-L1
antibody and OS in patients with high TMErisk. In regard to the
deserted infiltration of T cells and immune-suppressive
microenvironment, we speculated that differential intrinsic mechanisms
might be responsible for the two microenvironment phenotypes, and it is
worthwhile to explore the underlying mechanisms.
Our work had advantage in developing a simplified and clinically
friendly risk scoring model based on the 6 TME-related genes. We
identified that high TMErisk significantly predicted unfavorable
prognosis and immunotherapeutic response in multiple validation
cohorts. High TMErisk was associated with immune-suppressive
micro-milieu, deserted infiltration of CD4^+ and CD8^+ T cells, and low
TMB. This model can assist clinicians in screening patients who are
likely or unlikely to be immunotherapy responders and the relevant
findings provide a theoretical basis for developing new combined
treatment strategies.
Limitations of the study
The retrospective nature in the training set made our study subject to
potential biases and affected the statistical power. The association of
TMErisk with immunotherapeutic efficacy and prognosis remained to be
validated in multicentric prospective studies with larger sample size.
Furthermore, we established the TMErisk model using RNA-seq data, and
therefore it remained to be determined whether it could employ more
economic detection technology, such as IHC or a PCR assay panel,
to generate a risk score. Whereas, currently we can detect gene
expression levels based on a panel containing 6 model genes, which is
more cost-effective compared with the whole transcriptomic sequencing.
In addition, the bulk RNA-seq data limited us identifying immune or
stromal components that were directly relevant to TMErisk, and leading
to inadequate mechanism exploration. Our ongoing effort is to identify
key regulatory mechanisms of these model genes through mechanistic and
clinical validations.
STAR★Methods
Key resources table
REAGENT or RESOURCE SOURCE IDENTIFIER
Biological samples
__________________________________________________________________
Bulk tissues from NSCLC patients in ORIENT-11 clinical trial Yang
et al.[192]^18 N/A
__________________________________________________________________
Critical commercial assays
__________________________________________________________________
PD-L1 IHC 22C3 pharmDx Agilent Technologies Cat#SK006; RRID:
[193]AB_2889976
__________________________________________________________________
Deposited data
__________________________________________________________________
TCGA-LUSC RNA-seq data The Cancer Genome Atlas N/A
TCGA-LUAD RNA-seq data The Cancer Genome Atlas N/A
Bulk tissue RNA-seq from LUSC patients Mezheyeuski et al.[194]^24
[195]GSE81089
Microarray data from LUSC patients Rousseaux et al.[196]^25
[197]GSE30219
Microarray data from LUSC patients Botling et al.[198]^26 [199]GSE37745
Microarray data from LUSC patients Bueno et al.[200]^27 [201]GSE157011
Microarray data from LUAD patients Okayama et al.[202]^28 [203]GSE31210
Bulk RNA-seq data from NSCLC patients who received immunotherapy Jung
et al.[204]^29 [205]GSE135222
Bulk RNA-seq data from melanoma patients who received immunotherapy
Hugo et al.[206]^30 [207]GSE78220
Bulk RNA-seq data from POPLAR clinical trial Patil et al.[208]^31
EGAS00001005013
Bulk RNA-seq data from OAK clinical trial Patil et al.[209]^31
EGAS00001005013
Bulk RNA-seq data from IMvigor210 clinical trial Banchereau
et al.[210]^32 EGAS00001004343
__________________________________________________________________
Software and algorithms
__________________________________________________________________
R 4.1.0 R Development Core Team [211]https://cran.r-project.org/
estimate Yoshihara et al.[212]^19
[213]https://sourceforge.net/projects/estimateproject/
maxstat Hothorn et al.[214]^33
[215]https://cran.r-project.org/web/packages/maxstat/index.html
WCGNA Langfelder et al.[216]^34
[217]https://cran.r-project.org/web/packages/WGCNA/index.html
immunedeconv Sturm et al.[218]^35
[219]https://github.com/omnideconv/immunedeconv
clusterProfiler Wu et al.[220]^36
[221]https://bioconductor.org/packages/release/bioc/html/clusterProfile
r.html
oncoPredict Maeser et al.[222]^37
[223]https://cran.r-project.org/web/packages/oncoPredict/index.html
TIDE Jiang et al.[224]^38 [225]http://tide.dfci.harvard.edu
X-tile Camp et al.[226]^39
[227]https://medicine.yale.edu/lab/rimm/research/software/
[228]Open in a new tab
Resource availability
Lead contact
Further information and requests for resources should be directed to
and will be fulfilled by the lead contact, Shaodong Hong
([229]hongshd@sysucc.org.cn).
Materials availability
This study did not generate new unique reagents.
Gene expression data and clinical data for the TCGA-LUSC, TCGA-LUAD and
validation cohorts from Gene Expression Omnibus (GEO) (GEO:
[230]GSE30219, [231]GSE37745, [232]GSE157011, [233]GSE31210,
[234]GSE30219, [235]GSE37745, [236]GSE157011, [237]GSE31210) can be
retrieved from public repositories. RNA-seq data and corresponding
clinical data for OAK, POPLAR and IMvigor210 are available in the
European Genome-phenome Archive (EGA) ([238]https://ega-archive.org/)
with restricted access (EGA: EGAS00001004343 and EGAS00001005013). The
remaining data analyzed during this study are included within the
published article and its supplementary information files. Other
materials relevant to the study and data for ORIENT-11 are available
from the corresponding authors upon reasonable request.
Experimental model and study participant details
Public and in-house datasets
For model development, we downloaded TCGA-LUSC RNA-seq data (FPKM
normalized) and corresponding clinical information via R package
“TCGAbiolinks”. Publicly available gene expression datasets for NSCLCs
in Gene Expression Omnibus (GEO) database
([239]https://www.ncbi.nlm.nih.gov/geo/) and TCGA were systematically
searched to retrieve external validation cohorts. Four normalized
microarray datasets, [240]GSE30219 (LUSC),[241]^25 [242]GSE37745
(LUSC),[243]^26 [244]GSE157011 (LUSC),[245]^27 [246]GSE31210 (lung
adenocarcinoma, LUAD)[247]^28 and two RNA-sequencing datasets,
[248]GSE81089 (LUSC)[249]^24 and TCGA-LUAD, were identified. Samples
lacking complete prognosis information were excluded from further
evaluation. The TCGA-LUSC somatic mutation data (data category: Simple
Nucleotide Variation, workflow type: MuTect2 Variant Aggregation and
Masking) were obtained through R package “TCGAbiolinks”. Mutation
status was analyzed and visualized via R package “maftools”.
To investigate value of established model for predicting treatment
response and prognosis in patients who received immunotherapy or
chemotherapy, we collected gene expression data and corresponding
clinical information from several cohorts, including 5 external
datasets ([250]GSE135222 [NSCLC], [251]GSE78220 [melanoma], POPLAR, OAK
and IMvigor210) and an internal cohort (ORIENT-11). POPLAR (phase 2)
and OAK (phase 3) are multicenter, open-label, randomized controlled
trials, in which previously treated NSCLC patients were administrated
with atezolizumab or docetaxel.[252]^4^,[253]^40 IMvigor210 is a
multicenter, single-arm phase 2 trial, evaluating efficacy of
atezolizumab in patients with metastatic urothelial carcinoma who have
progressed on platinum-based chemotherapy.[254]^41 Gene expression data
and clinical information for these three cohorts have been stored in
European Genome-phenome Archive (EGA) ([255]https://ega-archive.org/)
with controlled access (EGA: EGAS00001004343 and
EGAS00001005013).[256]^31^,[257]^32 We submitted data access
applications to the corresponding Data Access Committee (DAC) and
downloaded relevant data using Python after approval. ORIENT-11 is a
multi-center, randomized, double-blind, phase 3 study that compared
sintilimab or placebo, in combination with pemetrexed and platinum, for
locally advanced or metastatic nonsquamous NSCLC in China.[258]^42 We
have prospectively collected the baseline demographical and
clinicopathological data as well as follow-up and survival information
for patients enrolled in the ORIENT-11 study. Formalin-fixed,
paraffin-embedded baseline tumor samples have also been prospectively
collected to perform PD-L1 IHC analysis (evaluated using 22C3 pharmDx,
Agilent Technologies) and RNA sequencing on the NovaSeq 6000 system
(Illumina), as previously described.[259]^18 Baseline
clinicopathological characteristics of patients in the POPLAR,
IMvigor210 and ORIENT-11 cohorts are presented in [260]Tables S12 and
[261]S13.
Study protocol of ORIENT-11 was approved by the respective
institutional review boards and ethics committees and all participants
provided written informed consent. Ethic approval and patient informed
consents for TCGA, GEO and EGA were waived due to their public
availability. The study was conducted in accordance with the
Declaration of Helsinki (as revised in 2013). A flowchart delineating
process of model building and our data analysis protocol is presented
in [262]Figure 1.
Method details
DEGs associated with TME scores and weighted gene co-expression network
analysis
TCGA-LUSC was utilized as a training set to evaluate the prognostic
TME-related genes from expression matrix. Immune scores and stromal
scores were calculated based on ESTIMATE algorithm via the R package
“estimate”.[263]^19 Tumor purity was inferred by ESTIMATE scores that
generated from the combination of immune and stromal scores.[264]^19
Based on the immune and stromal scores calculated for each sample,
patients were stratified into two groups (high vs. low immune scores or
high vs. low stromal scores) through methodology of maximally selected
rank statistics which can be employed using R package “maxstat”[265]^33
to select optimum prognosis-based cut-off. Differentially expressed
genes (DEGs) between the low and high immune/stromal scores were
identified using R package “DESeq2”.[266]^43 An adjusted p-value of
<0.05 and simultaneously an absolute value of log[2](fold change) of >1
were set as the significant criteria to filter DEGs. Weighted gene
co-expression network analysis (WCGNA)[267]^34 was performed to
investigate co-expressed gene modules correlated with immune/stromal
scores and modules with |correlation coefficient| > 0.45 were
considered as strong TME-related gene modules. we started with
constructing a co-expression network based on interaction patterns
among genes in the TCGA-LUSC cohort and genes with similar interaction
patterns were allocated into a module. We then calculated correlation
coefficients between gene modules and immune scores as well as stromal
scores (inputted as continuous variables), through which co-expressed
modules were connected to phenotypic variables.
Development and validation of TME-related risk scoring model
The intersections of DEGs and robust TME-related gene modules generated
by WGCNA were input into the LASSO regression analysis which was
performed through R package “glmnet” to identify candidate model genes.
The LASSO technique is able to modulate complexity and reduce redundant
variables when constructing a general linear model. A method combined
with LASSO and Cox regression model can be used to screen prognostic
biomarkers.[268]^44 The optimal penalty parameter λ was identified
through running 10-fold cross-validations. Generated candidate genes
were dichotomized (0 for low expression and 1 for high expression)
using X-tile software[269]^39 and univariate Cox analyses were
performed to evaluate association of these candidate genes with OS.
Gene variables with a univariate p-value of <0.15 were integrated into
an Akaike information criterion (AIC)-based multivariate stepwise Cox
regression model to establish the final TMErisk model. The TMErisk for
each sample was calculated by the formula: TMErisk =
[MATH:
∑i=1nβi∗Expi :MATH]
, where
[MATH: Expi :MATH]
indicates gene expression level of a given patient and
[MATH: βi :MATH]
is the corresponding model coefficient for this gene. Patients were
then divided into two groups, low- and high-TMErisk groups, based on
maximally selected rank statistics method. Value of TMErisk for
predicting OS was then validated in four LUSC cohorts (GEO:
[270]GSE81089, [271]GSE30219, [272]GSE37745 and [273]GSE157011) and two
LUAD cohorts (TCGA-LUAD and [274]GSE31210), during which cut-points
were derived in corresponding datasets using maximally selected rank
statistics method ([275]Table S14).
TME landscape and pathway enrichment analysis
CIBERSORT,[276]^45 EPIC,[277]^46 and xCell[278]^47 were utilized to
extrapolate infiltrating abundance of immune and stromal cells. The R
package “immunedeconv” provides a unified access to these
algorithms.[279]^35 Expression levels of chemokine genes and
immune-related genes[280]^48 were also compared to evaluate TME
signature between the low- and high-TMErisk groups. In addition, we
analyzed association of gene signatures in the TMErisk model with
immune-related genes using Pearson method and results were visualized
by R package “ggpubr”. DEGs between the low- and high-TMErisk groups
were identified using R package “DESeq2”, and an adjusted p-value of
<0.05 and simultaneously an absolute value of log[2](fold change) of >0
were set as the significant criteria to filter DEGs. Gene set
enrichment analysis of these DEGs were conducted via “compareCluster”
function in R package “clusterProfiler” to analyze biological functions
associated with TMErisk.[281]^36 The GMT file named
“h.all.v2022.1.Hs.symbols.gmt”, containing 50 hallmark gene sets that
represent specific biological states or processes, was downloaded from
the Molecular Signatures Database (MSigDB) (MSigDB:
[282]http://www.gsea-msigdb.org/gsea/msigdb/collections.jsp#H).[283]^49
Prediction of drug response in the low- and high-TMErisk groups
The Genomics of Drug Sensitivity in Cancer (GDSC)[284]^50 and Cancer
Therapeutics Response Portal (CTRP)[285]^51^,[286]^52 databases are two
important resources for predicting drug response and therapeutic
biomarkers based on high-throughput cancer cell line screening data.
Data from the two databases have been processed and included in the R
package “oncoPredict”.[287]^37 The function calcPhenotype in
“oncoPredict” package utilized a pipeline to predict therapeutic
response based on baseline tumor expression data. Briefly, gene
expression and drug screening data from these databases are used as
training sets to build ridge regression model that can be applied to a
new gene expression matrix to generate drug sensitivity
prediction.[288]^37 We used the R package “oncoPredict” to infer the
half maximal inhibitory concentration (IC50) value of various drugs for
TCGA-LUSC patients. Predictive potential of TMErisk for immunotherapy
was inferred by the tumor immune dysfunction and exclusion (TIDE)
algorithm. TIDE is a computational approach to reflect immune evasion
capacity of tumor and a lower TIDE score indicates a favorable response
to immunotherapy.[289]^38
Evaluation of TMErisk in immunotherapy/chemotherapy cohorts
We validated predictive value of TMErisk for predicting immunotherapy
or chemotherapy response and patient survival in five external public
datasets ([290]GSE135222, [291]GSE78220, OAK, POPLAR and IMvigor210)
and an in-house cohort (ORIENT-11). Since [292]GSE135222 lacks response
data, we failed to investigate association of TMErisk with
immunotherapeutic response in this cohort. As per the established
TMErisk model, patients were scored and then stratified into two groups
using maximally selected rank statistics method in corresponding
datasets ([293]Table S14).
Quantification and statistical analysis
Continuous variables between two groups were compared using Wilcoxon
rank-sum test or unpaired t-test depending on the normality of
distribution. Fisher’s exact test or Chi-square test, where
appropriate, was used to examine contingency tables. Correlation
between two continuous variables was examined by Pearson method for
normally distributed variables, and Spearman method for non-normally
distributed variables. The Kaplan–Meier method was conducted to
generate survival curves between different TMErisk groups in each
cohort and the statistical difference was compared using log rank test.
Association of TMErisk and other baseline factors with OS or
progression-free survival (PFS) was investigated by univariate Cox
regression models and multivariate Cox analysis was used to determine
independent factors. Aforementioned survival analyses were conducted
via R package “survival” and “survminer”. We utilized concordance index
(C-index) and time-dependent area under the receiver operating
characteristic curve (time-dependent AUC) to evaluate predictive
accuracy of TMErisk, PD-L1 and TMB for OS and PFS. The distribution of
response categories between the low- and high-TMErisk groups were
visualized through percentage stacked bar charts generated by R package
“ggplot2”. All statistical analyses and visualization were conducted
using R software ([294]http://www.R-project.org, version 4.1.0) and
two-sided p-value of less than 0.05 was considered statistically
significant.
Additional resources
Clinical trial registry number of ORIENT-11: [295]NCT03607539,
[296]https://clinicaltrials.gov/ct2/show/NCT03607539.
Acknowledgments