Abstract
Intratumoral heterogeneity (ITH) has been linked to decreased efficacy
of clinical treatments. However, although genomic ITH has been
characterized in genetic, transcriptomic and epigenetic alterations are
hallmarks of esophageal squamous cell carcinoma (ESCC), the extent to
which these are heterogeneous in ESCC has not been explored in a
unified framework. Further, the extent to which tumor-infiltrated T
lymphocytes are directed against cancer cells, but how the immune
infiltration acts as a selective force to shape the clonal evolution of
ESCC is unclear. In this study, we perform multi-omic sequencing on 186
samples from 36 primary ESCC patients. Through multi-omics analyses, it
is discovered that genomic, epigenomic, and transcriptomic ITH are
underpinned by ongoing chromosomal instability. Based on the RNA-seq
data, we observe diverse levels of immune infiltrate across different
tumor sites from the same tumor. We reveal genetic mechanisms of
neoantigen evasion under distinct selection pressure from the diverse
immune microenvironment. Overall, our work offers an avenue of
dissecting the complex contribution of the multi-omics level to the ITH
in ESCC and thereby enhances the development of clinical therapy.
Subject terms: Cancer genomics, Oesophageal cancer, Cancer epigenetics,
Tumour heterogeneity, Tumour immunology
__________________________________________________________________
It is essential to understand heterogeneity and evolution at different
omics levels in oesophageal squamous cell carcinoma (ESCC). Here, the
authors use multi-omics to analyse heterogeneity and evolution in ESCC
patient samples, and characterise the levels of immune infiltration as
well as selective pressure from the tumour microenvironment.
Introduction
Esophageal squamous cell carcinoma (ESCC) is one of the most prevalent
cancer types which occurs in Eastern Asia and parts of
Africa^[46]1,[47]2. Several large-scale sequencing studies have
revealed the complex genomic landscape of ESCC^[48]3–[49]8. However,
oncological biomarkers might be confounded by sampling bias derived
from spatial intratumoral heterogeneity (ITH). ITH, which supplies the
fuel for clonal evolution and drug resistance^[50]9, is pervasive
across multiple levels of molecular features. A precise understanding
of ITH is crucial for the development of effective diagnosis and
biomarker design.
Recently, several studies have performed multi-regional whole-exome
sequencing (WES), methylation profiling, or TCR sequencing to unveil
ITH in ESCC^[51]10–[52]12. However, ESCC ITH has not been
comprehensively characterized across multi-omics from a large cohort of
ESCC patients. Thus, an understanding of the complex interplay between
the genome, transcriptome, and methylome underpinning ITH in ESCC and
the evolution of this disease is lacking. Moreover, the interaction
between the cancer cell and its immune microenvironment and how their
cross-talk influences cancer evolution has not been investigated.
Indeed, although a strong correlation between TCR (T cell receptor)
repertoire and genomic ITH has been found in ESCC, mechanisms of
neoantigen evasion in ESCC remain unclear.
In this study, we prospectively collected 186 samples from 36 ESCC
tumors from 36 patients (Supplementary Fig. [53]S1). All samples
(including adjacent normal tissues) were characterized by whole-exome
sequencing (WES). We calculated CpG-rich methylation on a genome-wide
scale for all the patients by single-cytosine-resolution DNA
methylation analysis using reduced representation bisulfite sequencing
(RRBS) (Fig. [54]1a). We also performed RNA sequencing (RNA-seq) for
the 33 ESCC patients with high RNA quality. The WES data was derived
from each tumor to a median of 260x depth. RRBS data provided more than
10x sequencing coverage (median 77x) on three to six million CpG sites
genome-wide for each tumor sample. RNA-seq data achieved a median of
44.5 M high-quality reads. All the detailed information on tri-omics
sequencing data is listed in Supplementary Data [55]1. One main target
of our study is to investigate the tumor-intrinsic causes that promote
ITH across diverse molecular features. We also explore the hypothesis
that intratumor heterogeneity across the diverse molecular features is
associated with the selection pressures from distinct tumor
microenvironments in ESCC.
Fig. 1. Fuels of Genomic ITH in ESCC.
[56]Fig. 1
[57]Open in a new tab
a Overview of the experimental strategy. Each ESCC multi-sites samples
of patients was performed using WES, RNASeq and RRBS to explore the
intrinsic multiple omics ITH, the extrinsic TILs ITH and the fuels of
each omic ITH. b The top panel shows the tumor mutational burden (TMB)
for each tumor region (n = 186). The minimum and maximum are indicated
by the extreme points of the box plot; the median is indicated by the
thick horizontal line; and the first and third quartiles are indicated
by box edges. The second and third-panel shows the proportion of
somatic mutations and copy-number alterations that are identified as
clonal (deep blue bars) or subclonal (orange bars) in each patient. The
fourth and fifth-panel show the mutation signatures weight identified
in the ESCC cohort by using clonal and subclonal somatic mutations
separately. The bottom panel shows the summary of the genome doubling
(GD), evolution patterns, TIL group, and HLALOH status for each patient
(n = 36). c, d The top panel shows the predicted cellular prevalence of
each clonal population for each tumor sample from ESCC24 (c) and ESCC51
(d) patient; The same color represents the same cell subpopulation
across different tumor regions. The bottom panel shows the inferred
clonal evolution tree of each tumor sample (the circle in gray means
the corresponding clone does not exist in this sample). e Box plot
represents WGII comparison by genome doubling (GD, n = 112) and
non-genome doubling (NGD, n = 74) status across 186 tumor regions.. f,
g The box plot shows the subclonal somatic mutation proportion
comparison (f) and subclonal somatic copy-number proportion comparison
(g) between GD (n = 112) and no GD status (n = 74). All Box plot center
line represents median value; lower and upper hinges represent 25th and
75th percentiles; the minimum and maximum are indicated by the extremes
of the box plot; P value is shown; two-sided Wilcoxon rank-sum test.
Results
Genomic ITH in ESCC
Genomic heterogeneity between multi-sites of a single tumor poses major
barriers to the biomarker discovery and the development of target
therapies^[58]10,[59]13. To assess the genomic ITH in ESCC, we
classified the somatic mutations and copy-number alterations (SCNA) as
clonal (prevalent in all cancer cells) or subclonal (present in only a
subset of cancer cells). We confirmed early mutational events of
somatic mutations in TP53, NOTCH3, and PTPRC in ESCC (Supplementary
Fig. [60]S2a). We observed a median of 33.3% (range 6.1% to 83.4%) of
somatic mutations identified as subclonal and a median of 55.1% (range
10.2% to 90.8%) of SCNA as subclonal (Fig. [61]1b).
Investigating genomic ITH can unveil the evolution history of how
mutations are chronologically accumulated. To investigate clone
architecture and the evolutionary patterns of each patient, we
constructed the tumor phylogeny according to the mutational cellular
prevalence (Fig. [62]1c, d). Each node on the phylogenetic tree
represents clonal (shared by all tumor sites) or subclonal (shared by
part of tumor sites) mutation clusters and was mapped in the
evolutionary history of each ESCC tumor. A total of 207 mutation
clusters were identified (Supplementary Fig. [63]S3a, [64]6 per
patient, ranging from 2 to 13). There are 47% of subclones captured in
a single branch of the phylogenetic tree (Supplementary Fig. [65]S3b).
We also detected a median of 14% (range 0.1 to 53%) of mutations
exhibited a clonal illusion^[66]14, whereby they appeared clonal within
single samples yet were subclonal in the tumor as a whole,emphasizing
the importance of multi-sites sampling (Supplementary Fig. [67]S3c).
The evolutionary trees of our ESCC cohort showed two topological
patterns of evolution: the predominant pattern of evolution was the
branched pattern (33/36, 92%), and the minor pattern of evolution was
the liner pattern 3/36 (8%). This phenomenon was also observed in other
cancer types^[68]15,[69]16. This suggested that ESCC is a much more
intratumoral heterogeneous cancer type. For the linear evolutionary
pattern, successive clones overgrew their ancestral clones by
accumulating somatic alterations without expansion (Fig. [70]1c and
Supplementary Fig. [71]S3a). While the other cases (33/36, 92%) showed
a branched pattern, with diverse subclones coexisting and only sharing
part of mutations from the ancestral clone (Fig. [72]1d and
Supplementary Fig. [73]S3a). This result revealed the limitation of
single sampling for the accurate assessment of genomic ITH. Comparing
with the branched evolutionary pattern, we found that linear evolution
patterns generally showed limited intratumor heterogeneity, and no
clonal expansion occurred during tumor progression. Although the number
of patients with linear evolution patterns was limited, we did not
observe the purity or number of sites of these patients were
significantly lower than the patients with a branched evolutionary
pattern. Conceivably, monotherapies against the tumor with the linear
evolutionary pattern might show better clinical effects.
Fuels of genomic ITH in ESCC
Chromosomal instability
Ongoing chromosomal instability (CIN) may drive intratumoral
heterogeneity^[74]14. To distinguish CIN+ from CIN tumors, we evaluated
the weighted Genome Instability Index (wGII) for each ESCC tumor
regions^[75]17. We observed that higher wGII were found in high-ploidy
(ploidy ≥ 3) than diploid tumors (Fig. [76]1e), which is consistent
with previous studies across different tumor types^[77]18. The allele
information showed that genome-doubling events were detected in 70% of
ESCC patients and was shared by all the regions except four subclonal
genome doubling (GD) tumors (ESCC10, ESCC27, ESCC30, ESCC35), which
suggested that GD is frequently an early event during ESCC tumor
progression. We observed that there is a significant association
between genome doubling and subclonal copy-number alterations
(Fig. [78]1g, P = 4.33 × 10^−8, Wilcoxon rank-sum test), but not with
regard to mutational heterogeneity (Fig. [79]1f, P = 0.32, Wilcoxon
rank-sum test). This suggested that GD events provided the substrates
for the genomic ITH at the copy-number alterations level.
Mirrored subclonal allelic imbalance (MSAI), which resulting from
different parental alleles being gained or lost in distinct subclones,
is another cause of ITH driven by CIN^[80]14,[81]19. We detected MSAI
in 33.3% (12/36) patients and a total of 37 MSAI events from focal to
arm level alterations. Besides, this resulted in parallel evolution
involving multiple distinct events converging on the same regions in
different subclones (Supplementary Fig. [82]S4a–[83]4c).
Mutagenic processes
To determine which mutagenic processes promote the intratumoral
heterogeneity, we systematically analyzed the mutational signatures of
both clonal and subclonal stages (Fig. [84]1b). Using published
mutational signatures^[85]20, the number of clonal mutations
significantly correlated with the burden of mutations related to aging
(Supplementary Fig. [86]S5a, Signature 1a, P = 0.01) and DNA mismatch
repair (DMMR) processes (Supplementary Fig. [87]S5b, P = 9.5 × 10^−4)
across the ESCC cohort. We analyzed ESCC13 separately because of the
significant burden of signature 15 mutations, which correlated with
defective DNA mismatch repair. This might be attributed to the positive
MSI status of ESCC13^[88]21. In comparison to the clonal stage, the
contributions of signature 3 (BRAC1/2) significantly correlated with
the burden of subclonal mutations (Supplementary Fig. [89]S5d,
P = 2.4 × 10^−7). Tumors obtained from the 36 ESCC patients got both
clonal and subclonal mutations that could also be attributed to DNA
mismatch repair signatures (Supplementary Fig. [90]S5e) and aging
processes (Supplementary Fig. [91]S5c), which implies that both of the
processes continuously induced the mutational events which may
contribute to clonal expansion.
Epigenomic ITH in ESCC
Although informative, the genomic alterations did not fully explain the
heterogeneity of ESCC. DNA epigenetic changes also play an important
role in the tumorigenesis of ESCC^[92]13. Firstly, we calculated the
methylation level with 1-kb sliding bins across all the tumor regions
in our ESCC cohorts. We observed the global hypomethylation trend in
ESCC tumor compared with adjacent normal tissue (Fig. [93]2a and
Supplementary Fig. [94]S6a, P = 4.6 × 10^−10, Wilcoxon rank-sum test),
which was consistent with results from the previous studies^[95]22. The
tumor’s hypomethylated bins were significantly enriched in long
interspersed nuclear element 1 (LINE-1, L1) and endogenous retrovirus
(ERV) regions (Fig. [96]2c, P < 0.05, Fisher’s exact test). In
contrast, significantly increased DNA methylation levels at CpG islands
and promoter regions were detected in ESCC tumors (Fig. [97]2b,
P < 0.05, Fisher’s exact test). We compared the relative methylation
level changes between LINE-1 and LINE-2 regions, which showed that
LINE-1 tends to show significantly stronger DNA demethylation than
LINE-2 in cancer cells (Supplementary Fig. [98]S6b, P = 4.43 × 10^−13,
Wilcoxon rank-sum test). L1 is more active and evolutionarily younger
than L2^[99]23. This suggested that the DNA demethylation changes in
LINE-1 regions promote the tumorigenesis and progression of tumors,
which develop an avenue contrasted to normal tissue development.
Fig. 2. Fuels of epigenetic ITH in ESCC.
[100]Fig. 2
[101]Open in a new tab
a Methylation level comparison between paired Tumor (n = 186) and
adjacent normal (n = 36) samples. Two-sided paired t test is used for
comparison. b, c Enrichment of hypermethylated DMRs (b) and
hypomethylated DMRs (c) for annotated genomic elements between tumor
(n = 186) and paired normal (n = 36) samples. *P < 0.05, **P < 0.01,
***P < 0.001; two-sided Fisher’s exact test. d The pairwise distance of
tumor samples from the same patient is based on the DMRs and SCNA
profiles. Each dot represents a pair of tumor samples from the same
patient. Spearman’s correlation coefficient (rho) and corresponding
P-value is shown (n = 452 tumor region pairs). Line of best fit shown
in blue and gray area represents 95% confidence bands. e Phylogenetic
and phyloepigenetic trees were constructed using ESCC14 and ESCC25. f
Distribution of entropy, epipolymorphism, and discordantly methylated
read (PDR) scores between neutralnd somatic copy-number alterations
regions across 36 ESCC patients with 186 tumor regions. Box plot center
line represents median value; lower and upper hinges represent 25th and
75th percentiles; the minimum and maximum are indicated by the extremes
of the box plot; ***P < 0.001, NS, no significance, P > 0.05; two-sided
Wilcoxon rank-sum test.
To address the epigenetic ITH, we identified the differentially
methylated regions (DMR) with the 1-kb sliding bins between the tumor
and adjacent normal tissue (Supplementary Fig. [102]S7a). To quantify
the methylation level heterogeneity, we calculated the average pairwise
ITH (APITH) to measure the methylation ITH of tumors (Methods). The
advantage of APITH in ESCC is that its value is not biased by the
number of samples of tumor (Supplementary Fig. [103]S6c and
Supplementary Fig. [104]S6d). To explore the relationship between the
epigenetic ITH and the genomic ITH, we compared the APITH score and the
SNV-ITH (proportion of subclonal SNVs). We found a significant
correlation between them (Supplementary Fig. [105]S6e, Spearman’s
rho = 0.49, P = 0.0037). This confirmed the conclusion that genetic
mutations and epigenetic modifications diversify along with similar
patterns during ESCC progression^[106]10.
Fuels of epigenetic ITH
We next investigate the mechanism underpinning epigenetic ITH. First of
all, we calculated the Euclidean distance separately for SCNA profiles
and the DNA methylation levels of the DMRs. We observed that the
pairwise DNA methylation distances were positively correlated with the
pairwise SCNA distances (Fig. [107]2d, Spearman’s rho = 0.63,
P = 4.46 × 10^−52). To get a whole picture of the interplay between
genomics and DNA methylation ITH, we exhibited the phylogenetic and
phyloepigenetic trees to delineate the relationship among the tumor
samples from the same patient. We observed that phylogenetic trees
inferred from DNA methylation in several patients, such as ESCC14 and
ESCC25, closely recapitulated the phylogenetic trees from SCNA, which
is consistent with a previous study^[108]10 (Fig. [109]2e). Besides, we
also find several patients, such as ESCC 17 and ESCC36, got different
topologic structures (but with partial similarity) of phylogenetic and
phyloepigenetic trees (Supplementary Fig. [110]S7b). This suggested
that there is still a potential mechanism that regulates the
methylation of ITH during the evolution of ESCC tumors. (Supplementary
Fig. [111]S7b). We further explored the dynamics of chromosomal
alterations and the extent to which chromosomal instability may drive
epigenetic ITH. DNA-methylation at sequential CpG dinucleotides
constitutes a phased epigenetic pattern (epiallele), which provides a
snapshot of cells. The diversity of epialleles within the tumor
provides a measure of cellular subpopulations within the sample. To
further identify the relationship between epigenetic changes and SCNAs,
we employed the methclone^[112]24 and epihet^[113]25 to detect the
epigenetic loci and the corresponding intratumor methylation
heterogeneity which was quantified by PDR, entropy, or epipolymorphism.
The proportion of discordant reads (PDR), entropy, and epipolymorphism
was considered to be a measurement of epigenetic instability. We
observed the SCNA regions got an increased entropy level compared with
neutral regions. Similarly, SCNA regions with higher entropy also
exhibited higher epipolymorphism and PDR levels (Fig. [114]2f). In our
study, we found that CIN provides a higher level of somatic CNA
(Fig. [115]1g). CIN provides more copies of DNA segments, and these
segments provide the potential substrate permitting different
methylation statuses of CpG sites (Fig. [116]2f). We could understand
this point from the eloci changes. Extra copies will increase the
entropy (Epipolymorhism or PDR) of eloci. These results indicated a
close relationship between epigenetic variations and CIN.
As LINE-1 is a well-characterized phenomenon in cancer. We further
examine the association between the methylation level of LINE-1 and
chromosomal instabilities. We correlated the methylation level of L1
and SCNA-ITH, and we found that there is a significant correlation
between L1 and SCNA-ITH (Supplementary Fig. [117]S6f, R^2 = 0.54,
P = 0.00077). This suggested that the methylation level of LINE-1
elements is actually associated with CIN in ESCC. Methylation loss in
late-replicating regions makes the heterochromatic structure formation
which is called partial methylation domains (PMDs). A recent study
showed that PMD demethylation is pervasive in diverse cancer
type^[118]26. To investigate the relationship between epigenetic ITH
and CIN, we calculated the PMD APITH scores for each tumor sample and
correlated the PMD APITH score with the patient-wise RNA-ITH value. We
did not observe a significant correlation between them (R^2 = 0.32,
P = 0.11).
Transcriptomic ITH in ESCC
RNA-ITH is a crucial factor that partly was determined by the status of
genomic and epigenomic alterations. To identify which genes are more
vulnerable to sampling bias in ESCC due to the RNA intratumoral
heterogeneity or show higher intertumoral heterogeneity among ESCC
patients, we employed a gene classification method^[119]27, which is
based on the per-gene metric for RNA intra- and intertumoral
heterogeneity. Genes are divided into four RNA heterogeneity quadrants
for ESCC (Fig. [120]3a): low intertumoral heterogeneity and high
intratumor heterogeneity (Class I = 3354 genes); both low in intra- and
intertumoral heterogeneity (Class II = 11622 genes); both high in
intra- and intertumoral heterogeneity (Class III = 5368 genes); and
high intertumoral heterogeneity and low intratumor heterogeneity (Class
IV = 3668 genes). Genes in Class IV showed consistent expression within
regions and were highly variable between tumors, which provided
stronger information for patients’ stratification. By using clustering
concordance scores of each gene, we observed that Class IV genes
outperformed other class types in stratifying ESCC tumor regions by the
patient (Fig. [121]3b). This indicated that genes in Class IV, which
prohibited sampling bias and maximizing the difference between
patients.
Fig. 3. Fuels of transcriptomic ITH in ESCC.
[122]Fig. 3
[123]Open in a new tab
a RNA intertumor (x-axis) and RNA intratumor heterogeneity (y-axis) are
shown on the axes using density curves. The genes are split into four
classes by the mean intratumor and intertumor heterogeneity scores. The
classes are numbered and colored. b Boxplots show the clustering
concordance score per gene in each class (I = 3354 genes, II = 11622
genes, III = 5368 genes, IV = 3668 genes). The minimum and maximum are
indicated by the extremes of the box plot; the median is indicated by
the thick horizontal line; and the first and third quartiles are
indicated by box edges; *P < 0.05, **P < 0.01, two-sided Wilcoxon
rank-sum test. c Correlation of gene expression ITH with copy-number
ITH. The Spearman correlation coefficient (Rs) between patient-wise
RNA-ITH scores and patient-wise SCNA-ITH scores calculated in the ESCC
cohort (n = 33 ESCC patients; Rs = 0.38; P = 0.027). Line of best fit
shown in orange and gray area represents 95% confidence bands. d Violin
boxplots show the correlation between subclonal chromosomal copy-number
changes and gene expression across 36 ESCC patients. The minimum and
maximum are indicated by the extremes of the box plot; the median is
indicated by the thick horizontal line; and the first and third
quartiles are indicated by box edges; two-sided Wilcoxon rank-sum test.
e All the genes were classified in clonal/subclonal gain, loss, or no
change types. Enrichment was tested by RNA heterogeneity class. Odds
ratios are shown using a natural log scale. A two-sided Fisher’s exact
test was performed. f The spearman correlation between patient-wise
RNA-ITH and methylation of promoter regions calculated in the 36 ESCC
cohort. Spearman’s correlation coefficient (Rs) and corresponding
P-value are shown. Line of best fit shown in brown and gray area
represents 95% confidence bands.
To determine the potential biological features of the four classes, we
performed Reactome pathway analysis, respectively (Supplementary
Fig. [124]S8a). Class I showed no significant enrichment, RNA splicing
pathway enriched in Class II genes, and Class III showed involvement in
an extracellular matrix organization. It should be noted that pathways
in cell proliferation, including cleavage of the damaged pyrimidine,
DNA damage stress-induced senescence, and epigenetic regulation
enriched in Class IV, which suggested Class IV genes potentially encode
cell proliferation modules.
Moreover, we assessed the RNA-ITH scores based on the number of
sequencing regions. We observed RNA-ITH scores increased with the
number of sampling regions but became saturated at around four samples
for most tumors (Supplementary Fig. [125]S8b). This suggested that at
least four regions were required to accurately estimate the RNA-ITH
level in ESCC.
Fuels of transcriptomic ITH in ESCC
We next investigated the potential fuels which contribute to RNA-ITH.
Considering the association between CNV and RNA expression, we further
explored whether RNA-ITH provided the fuels for RNA-ITH. Overall, we
observed a positive correlation between the median RNA-ITH score and
SCNA-ITH (Fig. [126]3c, Spearman’s rho = 0.38; P = 0.027). The genomic
regions with subclonal gained copies also exhibited increased RNA
expression, whereas the expression of genes decreased within subclonal
lost copies (Fig. [127]3d). These results suggested that ongoing
dynamic chromosomal instability contributes to the ITH of gene
expression by altering the copy numbers and dosages of genes. To
identify the basic substrates in genomic features for the four
quadrants of genes, we performed relative enrichment of clonal and
subclonal copy-number changes of genes in different RNA-ITH classes.
Class IV genes were enriched in clonal copy-number gain events (odds
ratio = 1.59; P = 1.2 × 10^−6), which suggested Class IV genes may be
driven by clonal DNA copy-number gains selected early in tumor
evolution (Fig. [128]3e).
Epigenetic modifications of the gene’s promoter are important for
regulating gene expression. To explore the DNA methylation underpinning
the RNA-ITH, we detected the different methylated status in the
promoter of genes and the RNA expression changes in any paired regions
from the same patient. We observed that methylation of first exons and
gene promoters related to significant changes in gene expression
(P < 0.001, Wilcoxon test) (Supplementary Fig. [129]S8c). However, no
significant association was observed between methylation of the gene
body (excluding promoter regions) and corresponding gene expression
(P = 0.58, Wilcoxon test). Our results indicate DNA methylation also
regulates the RNA-ITH by changing the methylation level of the gene’s
promoter. We next calculated the ITH of the methylation ITH of genes’
promoters and found methylation ITH of gene’s promoters positively
correlated with the patient-wise RNA ITH (Fig. [130]3f, Spearman’s
rho = 0.43; P = 0.012). This suggested whole genome epigenetic-ITH
contributed more to RNA-ITH compared with SCNA-ITH (Fig. [131]3c, f).
We next measure and rank per gene metric for the methylation intra- and
intertumor heterogeneity, and we splited both heterogeneity metrics by
their mean value. Similar to the RNA-ITH level, this resulted in four
quadrants. Next, we examined the biological significance of the DNA
methylation ITH in each quadrant which correlated with the gene
expression ITH in the same quadrant. We performed the Reactome pathway
analysis to explore the overlapped genes in each quadrant. We found
that Q4 was significantly enriched for pathways involved in cell
proliferation and epigenetic regulation. Moreover, we investigate the
relationship between RNA-ITH and cellular composition. Overall, we did
not observe that RNA-ITH is associated with any immune cell composition
and tumor purity (Supplementary Fig. [132]S9a, [133]S9b).
Immune infiltration drives the heterogeneity across a different omic level
Finally, to determine how immune infiltration varies between and within
tumors across the ESCC cohort, we first implemented the published
Danaher method^[134]28 to evaluate immune infiltration in the
multi-regional ESCC RNA-Seq cohort. Using this method, we profiled
RNASeq-determined infiltrating immune cell populations of the 176 tumor
regions from 33 ESCC patients, for which the RNA data is qualified.
Unsupervised hierarchical clustering was performed for each immune cell
population using the ‘ward.D2’ method on the Manhattan metric. We
revealed two distinct TIL groups, which corresponded to high and low
immune infiltration levels (Fig. [135]4a). Of the 33 ESCC patients with
RNA-seq data, 22 patients harbored tumors with consistently high (7
patients (21%)) or low (15 patients (45%)) immune infiltration.
Besides, we found that 11 patients (33%) exhibited intratumor
heterogeneous immune infiltration (Supplementary Fig. [136]11c). To
further confirm the TILs groups of our ESCC cohort, we next performed
Consensus^TME to estimate the TILs^[137]29, which were positively
associated with the Danaher method across different immune cell types
(Supplementary Fig. [138]S10a). The distance between pairwise
Consensus^TME distance between every two tumor regions strongly
correlated with the pairwise Danaher distance (Supplementary
Fig. [139]S10b).
Fig. 4. Immune infiltrate heterogeneity in ESCC.
[140]Fig. 4
[141]Open in a new tab
a The ESCC tumor samples (n = 176) are clustered by the score of
estimated immune infiltrate. The immune cell population score is
estimated using the Danaher method. Each row represents a type of
immune cell population. Each column represents a tumor region. Tumor
regions are classified as having low levels of immune infiltration (low
immune) are shown in green; Regions classified as having high levels of
immune infiltration (high immune) are shown in red. If all the tumor
regions of the same patient are in the high levels of immune
infiltration, the patient is marked red. Patients who have tumors that
contain heterogeneous levels of immune infiltration are indicated in
orange. The patient who has consistently low levels of immune
infiltration is indicated in green. b Pairwise immune and somatic
mutations distances between every two tumor regions from the same
patient are compared. Spearman’s correlation coefficient (Rs) and
corresponding P-value is shown. c Evidence of HLA LOH has been
annotated with the most likely timing of the HLA LOH event (n = 11). d
Correlation heatmap between the ITH gene score of HLA, B2M and ITH
score of tumor immune infiltration cells.
Next, multiplex immunostaining on paraffin sections on the
heterogeneous TILs patients for CD4, CD8, CD68, and FOXP3 were taken to
validate the difference between high and low immune infiltrated regions
from the same patient (Supplementary Fig. [142]S11a). We used the
CD4^+, CD8^+, and FOXP3 T cell density in tumors to quantify the immune
infiltration. Consistent with the TILs groups inferred from RNAseq
data, tumor regions with high levels of immune infiltration contained
higher immune cell density as compared to regions with low levels of
immune infiltration (Supplementary Fig. [143]S11b).
Understanding how the immune microenvironment shape tumor evolution may
inform strategies to limit tumor adaption in the immune clinical
practice^[144]30,[145]31. We first calculated the Euclidean distance of
both somatic mutations and immune infiltration score between any pair
of tumor regions from the same tumor (Fig. [146]4b). We observed a weak
correlation between the somatic mutations and the immune
microenvironment (Spearman’s rho = 0.35, P = 1.1 × 10^−13). This result
suggested a potential interrelationship between the immune and
mutational landscape. Previous evidence suggested that a higher tumor
mutation burden (TMB) was likely to harbor more neoantigens as targets
for activated immune cells. Tumor with a high mutation burden (>10
mutations per megabase) was found to get a better response in
immunotherapy of non-small cell lung cancer^[147]32. We only observed
two ESCC patients (ESCC11 and ESCC13) with at least one tumor region
with a high mutation burden (Fig. [148]1b). Specifically, one tumor
region with low immune infiltration of the ESCC11 has a high tumor
mutation burden. In contrast, the other tumor regions of ESCC11 with
high immune infiltration have a low tumor mutation burden. This result
further confirmed the distinct selective pressure on the mutational
landscape that existed in different TILs.
Cancer cells can avoid immune-related negative selection through
neoantigen depletion or dysfunction of antigen-presenting machinery
(APM). We hypothesized that the mechanisms of immune escape might
promote genetic clone expansion. The neoantigens from nonsynonymous
mutations were used to explore neoantigen evasion. By implementing a
bioinformatics pipeline to identify neoantigens from the tumours
(Methods), a median of 107 predicted neoantigens per tumour (range from
42 to 754) was detected, and neoantigen heterogeneity varied across the
ESCC cohort (median 36% neoantigens were found heterogeneously, range
from 8 to 89%). In our cohort, we found the neoantigen heterogeneity
was significantly higher in heterogeneous immune infiltration
(Supplementary Fig. [149]S11f, Wilcoxon rank-sum test, P = 0.063),
implying the specific selective pressure on the neoantigen from the
TILs. We next checked the genetic aberrations of HLA-I, which were the
most frequent events leading to APM deficiency. Although HLA mutations
could interfere with neoantigen-MHC binding^[150]33, we did not detect
non-silent mutations of HLA or B2M in our ESCC cohort. We used the
LOHHLA^[151]34 tool to analyze WES data for HLA class I allele loss. Of
36 patients evaluated, we identified nine patients harboring clonal HLA
LOH and two with subclonal HLA LOH, suggesting HLA LOH is a prevalent
mechanism of APM deficiency in ESCC (Fig. [152]4c). We found HLA LOH
was associated with significantly higher expression of TIL markers,
indicating that HLA LOH as an immune evasion mechanism happened under
abundant TILs (Supplementary Fig. [153]S11e). We next identified
neoantigens, which bind to the lost HLA allele across the ESCC cohort
(Supplementary Fig. [154]S11d). We found that mutations predicted to
bind to a lost HLA allele exist in all the ESCC patients, which
highlights the importance of HLA LOH for immune evasion.
Tumor cells have utilized several means to escape TME recognition
through HLA-related mechanisms. They can change the HLA expression to
downregulate the expression of MHC complexes and therefore display
fewer identifying antigens. The TME is an important factor which
affected the HLA expression in ESCC. We assessed the relationship
between HLA gene expression and infiltration of different immune cell
types in ESCC. Based on the correlation patterns between immune cell
infiltration and HLA and B2M expression, we found that HLA expression
showed a significant positive association between HLA and B2M gene
expression and the total immune infiltration, CD8 T cells, Cytotoxic
cells, and T cells which indicating high HLA gene expression is related
with a relatively hot tumor microenvironment (Supplementary
Fig. [155]11g). This result is consistent with previous TCGA
research^[156]35. We further investigate the relationship between the
ITH of HLA gene expression and the ITH of TME. We found that the ITH of
CD8 T cells and NK cells are significantly correlated with the ITH of
HLA-A, HLA-B, and B2M expression (Fig. [157]4d). This result strongly
suggested that the transcriptomic ITH of HLA gene expression is driven
by immune infiltration.
Comparison between Inter-heterogeneity and Intra-heterogeneity in ESCC
The tumor heterogeneity recapitulates tumor heterogeneity in the
context of the immune microenvironment in ESCC, but the comparison
between inter- or intra-heterogeneity of cancer cell-intrinsic
molecular features has not been investigated. To compare the overall
intertumoral and intratumoral heterogeneity across multiple omics
profile, an integrative unsupervised hierarchical cluster of the
multi-regional ESCC cohorts was performed on CNA, gene expression, and
DNA methylation data (176 tumor regions; 33 ESCC patients). We revealed
there was significant segregation among different patients and perfect
clustering concordance between regions from the same tumor, which
suggested intertumoral heterogeneity exceeds intratumoral heterogeneity
across different omic levels (Supplementary Fig. [158]S12a). To
quantify the degree of within- and between patient heterogeneity, we
calculated the Euclidean distance between any tumor sample pairs in the
ESCC cohort across the three omics levels. We identified a significant
difference between intra-patient and inter-patient tumor pairs in SCNA
(Supplementary Fig. [159]S12b, P < 2.2 × 10^−16, Wilcoxon rank-sum
test), gene expression (Supplementary Fig. [160]S12c, P < 2.2 × 10^−16,
Wilcoxon rank-sum test), and DNA methylation (Supplementary
Fig. [161]S12d, P < 2.2 × 10^−16, Wilcoxon rank-sum test). This
suggested that each tumor has a uniquely identifiable omics profile on
all the copy-number level, gene expression level, and genomic
methylation level.
Discussion
To investigate the potential causes and consequences of ITH in ESCC on
different molecular layers, we collected genomic, transcriptomic, and
epigenomic data to track how the ITH of tumors are sculpted. We also
integrated the multi-omics data to explore how the immune
microenvironment shapes the tumor evolution and promotes immune evasion
of the tumor.
We observed chromosomal instability might drive the ITH across all the
molecular levels. At the genomic level, our results suggested that
genome doubling is associated with subclonal copy-number alterations.
At the epigenomic level, SCNA regions showed a higher extent of entropy
and PDR levels, which suggested CIN may promote epigenetic instability
and thereby regulate the epigenetic ITH. At the RNA level, genes with
subclonal gain copies also exhibited increased RNA expression or vice
versa. This suggests that CIN drives the ITH of gene expression by
changing the copy numbers. SCNA contributed directly to the RNA-ITH
mainly by dosage effect of copy-number alterations of corresponding
genes, and the epigenome regulates the RNA-ITH by methylation level
alterations of distant genomic components. To directly identify the
immune cell heterogeneity, we calculated the standard deviation of all
the immune cell type scores from Consensus^TME as a measurement of
total immune cell heterogeneity. We next examined the correlation
between immune-cell heterogeneity and RNA-ITH. We observed that the
immune-cell ITH negatively correlated with RNA-ITH (R^2 = −0.27,
P = 0.17). Although not significant, the result suggested the potential
contribution of immune-cell ITH to the RNA-ITH. Considering the
character of relatively low purity in ESCC, RNA-ITH does not solely
capture cancer cell-intrinsic differences. The immune infiltration
could also contribute to the RNA-ITH.
The overall intertumoral heterogeneity significantly outperformed the
intratumoral heterogeneity across all the three omics levels, which
suggested each tumor showed a uniquely identifiable omics profile. We
next profiled the infiltrating immune cell populations of the ESCC
cohort using RNA-seq data. We found the immune microenvironment is
heterogeneous between and within ESCC patients. 33% of ESCC patients in
our cohort showed a distinct status of immune infiltration. The
selective pressure from diverse TILs also promoted immune evasion. We
found neoantigen heterogeneity showed significantly higher in
heterogeneous TILs. HLA LOH is also an immune evasion mechanism of the
tumor. We found that 9/36 exhibited clonal HLA LOH and two with
subclonal HLA LOH. Besides, we found HLA LOH was associated with
significantly higher expression of TIL markers. This suggested HLA LOH
is also a potential mechanism of immune evasion in ESCC. We also found
that the infiltration of immune cell types was almost uniformly
positively associated with HLA and B2M gene expression in ESCC.
Besides, the ITH of CD8+ T cells and NK cells are associated with the
ITH of HLA and B2M gene expression. This highlights the immune
infiltration that drives both genomic and transcriptomic ITH in the
ESCC cohort.
Previous lung cancer using multi-regional sequencing across diverse
omics studies comprehensively described the spatial and temporal
aspects of tumor evolution and the population characteristics of their
subclonal evolution^[162]14,[163]27,[164]36,[165]37. Our results
emphasize the importance of both the intrinsic molecular ITH regulation
mechanisms and the selection pressures that the immune system regulates
the tumor ITH across different omics levels. Our results suggest that
the ITH of ESCC tumors is characterized by multiple independent
mechanisms, which involve both inner causes and the outer immune
microenvironment. In conclusion, a single tumor site may be
insufficient for clinical diagnosis for ESCC patients^[166]38.
Material and methods
Patients and sample collection
The cohort in our study was carefully sorted out from 82 ESCC patients
who were prospectively collected from Shenzhen Cancer Hospital based on
the eligibility criteria. All the patients underwent primary curative
resection and received no prior anticancer treatments. The spatially
separated tumor specimens were obtained from each individual, with each
region at least 0.5 cm away from the others. All the hematoxylin and
eosin slides of the ESCC tumor samples were macroscopically reviewed
(Tumor purity of each tumor sample was estimated at least 60% by two
pathologists, which makes sure all the selected regions were
comparable). All samples were immediately frozen in liquid nitrogen and
stored at −80 °C. Thin slices of snap-frozen, OCT-embedded tissue
blocks were sent for hematoxylin and eosin (H&E) staining.
Multi-regional tumors and adjacent non-tumor esophageal tissue from 36
ESCC patients were initially collected for the study. All specimens
were collected by surgical resection from Cancer Hospital&Shenzhen
Hospital with the written informed consent provided by the patients and
the approval by the Research Ethics Committee of Cancer
Hospital&Shenzhen Hospital (approval 2019-57). Clinical information
(provide in Source Data file) was collected, including gender, age,
TNMs stage, smoking and drinking. In total, 186 ESCC tumors and 36
non-cancerous matched normal tissue were performed by the following
sequencing experiments. We have consent to publish indirect information
including both gender and age. Other clinical information which could
identify patients directly are not shown in this study. MOST
approval(*BF2022123012755) had been obtained to share genetic data
outside of China.
Multi-regional whole-exome library construction and sequencing
Genomic DNA was taken for whole-exome sequencing. The libraries were
constructed by the protocol of Agilent SureSelectXT Human All Exon v6
(Agilent Technologies, CA, USA). First, fragmentation was carried out
by the hydrodynamic shearing system (Covaris, USA) to generate
180–280 bp fragments. Next, these short fragments went through a series
of library construction steps, such as purification, end blunts, ‘A’
tailed, adaptor ligation, and amplification. After the PCR reaction,
the library hybridizes with the Liquid phase with a biotin-labeled
probe, then uses magnetic beads with streptomycin to capture the exons
of genes. Captured libraries were enriched in a PCR reaction to add
index tags to prepare for hybridization. Products were purified using
the AMPure XP system (Beckman Coulter, Beverly, USA) and quantified
using the Agilent high sensitivity DNA assay on the Agilent Bioanalyzer
2100 system. After fluorescence quantification by ABI StepOne Plus
real-time PCR system (Life technologies), the libraries were sequenced
on the Illumina HiSeq X Ten platform, with 150-bp paired-end reads,
according to the manufacturer’s instructions.
Sequence alignment and SNV calling
First, paired reads (150 bp) were acquired from Hiseq and evaluated by
fastqc (version 0.11.7)
([167]http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Then,
the clean reads were aligned to the reference human genome (build
hg19), using the Burrows-Wheeler Aligner (BWA) v0.7.15 to get the
alignment file stored in BAM format^[168]39. SAMTools^[169]40, Picard
([170]http://broadinstitute.github.io/picard/), and GATK
(3.6.0)^[171]41 were separately used to sort BAM files, and remove
duplicated reads, local realignment, and base quality recalibration to
generate final BAM files. Recalibration training databases included
HapMap 3.3, dbSNP build 132, Omni 2.5 M chip, and Mills. Bamdst
([172]https://github.com/shiquan/bamdst) was performed to evaluate the
sequence coverage and depth.
Both MuTect^[173]42 (1.1.4) and VarScan2^[174]43 were used to detect
SNVs. To identify somatic variants in the target exon data, we first
used the Mutect algorithm^[175]42, which detects candidate somatic
mutations by the Bayesian statistical analysis of bases and their
qualities in both tumor and normal BAM files at a given genomic locus.
Variants called by MuTect were filtered according to the filtering
parameter ‘PASS’ Meanwhile, tumor and matched normal pileup files were
generated using the “samtools mpileup” command. Then VarScan2 somatic
(v2.3.9) utilized the output to identify somatic variants between tumor
and matched normal samples. Raw somatic variants were filtered using
the VarScan ‘processSomatic’ subcommand with arguments –min-tumor-freq
0.07, --max-normal-freq 0.02 and –p-value 0.05. An SNV was detected if
the mutations were called by both VarScan2 (p-value < = 0.01) and
Mutect when variant allele frequency is greater than 5% and somatic
p-value < = 0.01.
High-quality somatic variants could be obtained based on the following
filtering conditions:
* (i)
The total reads of the variant site were higher than 10 in both
tumor and normal samples.
* (ii)
VAF < 0.02 and no more than three reads in the normal samples.
* (iii)
VAF > 0.05 and with more than five reads support variants allele in
the tumor samples.
ANNOVAR^[176]44 was utilized to annotate single-nucleotide variants,
and SNVs from the 1000 Genome database and dbSNP were removed, but SNVs
in the COSMIC database^[177]45 were retained.
Multi-regional WES provides the opportunity to increase the sensitivity
to detect low VAF variants^[178]46. A somatic mutation was not captured
by all the tumor regions but only called in part of regions. Read depth
and allele information were obtained from the final BAM file using
bam-readcount ([179]https://github.com/genome/bam-readcount). Variants
were considered to be present if their VAF was more than 0.02 or there
were more than three reads supporting the variant allele.
The SNVs from ESCC40A were selected for validation by ultra-deep WES
(1200x). A strong relationship was observed between the VAF from the
exome-sequencing and the validation sequencing datasets.
We performed multi-regional whole-exome sequencing on 38 ESCC tumors
and classified somatic mutations as clonal or subclonal mutations.
These mutations were defined as single-nucleotide variants and
copy-number alterations, which were shared by all the tumor regions
(clonal mutations) or shared by a subset of tumor regions
(subclonal)^[180]14. For the somatic SNV, we merged all the tumor
region’s SNVs and divided them into two parts: the SNVs shared by all
the tumor regions are called clonal SNV, and the SNVs existed in only
parts of tumor regions are called subclonal. For the somatic SCNA, we
performed bedtools^[181]47 to obtain the SCNA segments shared by all
the tumor regions from one patient. The SCNA segments which were shared
by parts of tumor regions, were called subclonal SCNA.
Copy-number analysis
VarScan2 (v2.3.9) was performed to detect the copy-number status from
paired tumor and normal samples with default parameters. Chromosomal
arm copy-number alteration and the ITH status were detected as the
following steps:
* (i)
Clonal arm gain or loss was detected if all the same chromosomal
arms showed at least 75% gain (including amplification) or loss
across all the biopsies of the patient. And at least one biopsy
showed at least 90% gain or loss.
* (ii)
Subclonal arm gain or loss was detected if at least one biopsy of
the patient showed greater than 75% gain or loss of the chromosomal
arm.
Allele-specific copy number and ploidy estimates were generated by
using Sequenza^[182]48. Firstly, we converted VarScan2 output to
Sequenza format and further processed it using the Sequenza R package
to generate segmented copy-number profile and ploidy estimates. The
processed copy-number for each sample was divided by the sample mean
ploidy and then log[2] transformed. Gain and loss were defined as
log[2](2.5/2) and log[2](1.5/2), respectively. Amplification was
defined as log[2](4/2).
To correct purity for the copy number of each segment, we calculate the
expected log2ratio for each segment as follows:
[MATH:
log2purity×cnseg+(1−purity)×cnnploidy :MATH]
1
The cn[seg], purity, and ploidy are calculated from Sequenza. cn[n] is
the copy number of copy neutral region (i.e. 2).
The heterogeneity of CNAs was detected using the minimum overlapped
alteration regions. The output generated by Sequenza was subsequently
reviewed, and all gene amplifications, homozygous deletions, and loss
of heterozygosity were visually inspected using plots of raw log[2] and
allele ratios. Genome doubling and wGII for each tumor were determined
as previously described^[183]17. We performed ABSOLUTE to analyze exome
sequencing data to identify the genome doubling status of each tumor
sample. The weighted genome integrity index (wGII) is calculated by the
percentage of SNPs across the genome present at an aberrant copy
number, relative to the normal copy number of the tumor sample. The use
of percentages helps eliminates the bias induced by differing
chromosome sizes. And the wGII score of a sample is defined as the
average of this percentage value over the 22 autosomal chromosomes.
Estimating cancer cell fraction for somatic mutations
To estimate the cancer cell fraction (CCF) for each somatic mutation.
The variant allele frequency, copy number in the corresponding region,
and the tumor purity were all taken into consideration as previously
described^[184]49.
[MATH: CCF=VAF×1/PurityCNtumor×Purity+2×(1−Purity)
:MATH]
2
CCF is the cancer cell fraction, VAF is the variant allele frequency,
CN[tumor] is the copy number at the mutation, and Purity is estimated
by Sequenza.
Subclonal deconstruction
PyClone^[185]50, which implemented the Dirichlet process clustering
method, was used to infer the clonal subpopulations based on the
mutations and copy numbers. We ran PyClone with 10,000 iterations and a
burn-in of 1000 and default parameters. When we constructed the
phylogenetic tree based on clonal compositions, two principals were
considered: first, the pigeonhole principle, which stated that two
mutation clusters could not be considered independently and on separate
branches of an evolutionary tree if the sum of the cancer cell
prevalence values of the two clusters exceeded 100% within a single
tumor region. Second, a descendent clone must exhibit a smaller
cellular prevalence than its ancestor within each tumor region,
referred to as the “cross rule.”
Detection of MSAI (Mirrored Subclonal Allelic Imbalance)
Mirrored subclonal allelic imbalance discovery involves comparing the
BAF of the same heterozygous SNPs across multiple tumor regions and
detecting whether the BAF values always follow the same distribution or
whether their positions are reversed (mirrored). Germline SNPs were
called by using VarScan2, and only SNPs with a minimum coverage of 10×
were analyzed. The B allele frequency (BAF) of each SNP was calculated
as the ratio of reads of reference base to variant. Heterozygous SNPs
and BAFs were used as input, and mirror subclone allelic imbalances
were analyzed by RECUR^[186]19 with default parameters. We also
detected a mirrored subclonal allelic imbalance arm gain or loss as
parallel evolution events, which is defined as when the opposite
parental alleles were affected in at least 75% of a chromosome arm in a
minimum of two regions.
Mutational signature analysis
Both silent and non-silent somatic SNVs were classified as either
clonal or subclonal as described, and then the mutational signatures
estimation of these SNVs were performed separately by using
deconstructSigs^[187]51. Signatures 1A, 2, 3, 4, 5, 6, 13, 15, 20, and
26 of the published Signatures from Sanger COSMIC^[188]45 were
considered. Both clonal and subclonal-specific mutational signature
analysis was applied to each spatial biopsies with at least 15
mutations.
Microsatellite instability
MSI was estimated using the exome sequence data by MSIsensor^[189]52
version 0.2 with default parameters and filtered using a 0.05 false
discovery rate threshold.
RRBS sequencing
Library construction and RRBS sequencing
The RRBS library was constructed using 2 μg of high-quality genomic
DNA. Briefly, DNA was restriction digested using the MspI enzyme, which
cut the DNA at sites CCGG, then the fragment was end-repaired and
dA-tailing to blunt-end products, followed by adaptor-ligation with T
overhang. The ligation products were purified by 2% agarose gel
electrophoresis and size-selected of DNA fragments 150–400 bp long
(including a 100 bp adaptor). Size-selected DNA was bisulfite
conversion with the NEXTflex Bisulfite-Seq Kit (Bioo Scientific,
Austin, Tx, USA). Bisulfite-converted DNA was then amplified with
Illumina PCR primers PE1.0 and 2.0 for 18 cycles. The final library was
enriched for fragments with adapters on both ends, and the RRBS was
performed by the Illumina HiSeq X Ten platform.
RRBS quality control and alignment
We preprocessed the raw reads in the fastq format by in-house Perl
scripts. Clean reads were obtained by removing reads which contained
adapter, poly-N, and low-quality bases in the read end from raw data.
Meanwhile, Q20, Q30, and GC content of clean data were evaluated using
the fastqc. Then, clean reads were mapped to the reference genome using
Bismark alignment software (version 0.18.1)^[190]53. The mapped reads
number, CpG site number, bisulfite conversion rate, and other related
details are shown in Supplementary Data [191]1.
Bulk purified tumor methylation level from RRBS and Multidimensional scaling
(MDS)
The bulk tumor methylation level can be explained as the mixture of
methylation level in the tumor cells and normal cells. To evaluate the
purified tumor cell’s methylation level from RRBS data, we calculated
the local copy-number state and the purity of sample. The purity were
evaluated using the WES dataset, and the local copy number was
estimated by HMMcopy. Considering the copy number at each CpG site and
the tumor purity, we can calculate the deconvolved tumor methylation
rate m[t]:
[MATH:
mt=mb<
mfenced close=")"
open="(">ρnt<
/mrow>+nn1−ρ
mfenced>−nn<
/msub>mn1−ρρnt :MATH]
3
The mt should be between 0 and 1, but considering the technical and
biological noise, the mt values could be outside of the ranges. We
corrected the negative methylation values up to 0 and values greater
than one were set to 1.
Next, we aggregated the multi-regional methylation data of each
patient. The CpG sites covered by all the samples of each patient were
used for further analyses. Then the genome was divided into 3-kb bins
(with ≥ 3 CpGs), and the genome-wide DNA methylation level was measured
by the mean methylation level of the 3-kb tiles. Then the Euclidean
distances between samples were calculated and input to the cmdscale
function in R to perform MDS classification. The methylation level of
the gene promoter was determined by the mean value of all the
methylation sites located in the promoter regions.
DNA methylation variance among distinct spatial regions
We estimated the region-to-region variance with 3-kb bins. For a given
tumor, the standard deviation of methylation levels for a bin across
tumor regions was calculated. The variable bins were then ranked with
their values. Distribution enrichment of each genomic element in the
top 500 variable bins was calculated, and the significance was
calculated using Fisher’s exact test.
Identification of DMRs and genomic element enrichment of differentially
methylated regions (DMRs)
The CpG sites covered by all the tumor regions and attached normal
samples were used for further DMR analysis. Differentially methylated
regions (1-kb) were identified by methylKit^[192]54. Hyper DMRs are the
genomic bins with a q-value <0.01 and a percent methylation difference
larger than 25%. Hypo DMRs are the genomic bins with q-value <0.01 and
percent methylation difference less than −25%. A schema was shown to
explain the multi-regional methylation level ITH across the ESCC cohort
(Supplementary Fig. [193]7A). DMRs were annotated on gene promoters
(defined as 2-kb upstream and downstream of the transcription start
site), first exons, gene bodies, CpG islands, CpG shores (2-kb genomic
regions upstream and downstream of CpG islands), and other genomic
element regions (using the University of California Santa Cruz (UCSC)
table browser) using BedTools^[194]47.
Quantification of epigenetic ITH
We performed an average pairwise ITH^[195]36 (APITH) to measure the
methylation ITH for each patient. The value of APITH is not biased by
the number of multi-regional samples per tumor (Supplementary
Fig. [196]6C, D). For each ESCC patient with k tumor regions, we
defined d[ij] as the epigenetic distance between a pair of samples (i,
j) based on the methylation level of 1kb-bins, and the APITH is defined
as the average across all pairs of samples:
[MATH: APITH=2
k(k−1)∑1≤i<j≤kdij :MATH]
4
Epiallele shift analysis and epiallele diversity inference
We employed an updated version of methclone^[197]55
([198]https://github.com/TheJacksonLaboratory/Methclone) to analyze the
epiallele composition of each locus. The bam files were provided to the
methclone to calculate the dominant methylation pattern information of
one locus in the sample. To find suitable thresholds for read coverage,
we designed a series of read thresholds set to 40, 60, and 80 reads. We
did not find significant differences among the results. So we chose a
relatively moderate threshold of 60 reads for methclone to calculate
loci. The methclone discards loci that have coverage below 60. To
evaluate intratumoral epigenetic heterogeneity, we used an open-source
R package epihet^[199]25 to calculate the proportion of discordant
reads (PDR), Epipolymorphism, and Shannon entropy.
Unsupervised clustering analysis
Unsupervised clustering analysis was performed on the copy-number, RNA
expression, and methylation level of tumor samples in the ESCC cohort.
Briefly, the Euclidean distances were calculated based on the copy
number calls, RNA expression of each gene, and the methylation level of
each bin in any pair samples. We next integrated all the three omics
distances to perform unsupervised hierarchical clustering. Moreover,
the distance of each omics level was also evaluated to compare the
concordance within patients and between patients.
Multi-regional RNA-Seq analysis
RNA sequencing and alignment
Total RNA was extracted and purified from fresh frozen tissues using
the Trizol reagent (Invitrogen). RNA integrity was measured on an
Agilent 2100 Bioanalyzer (Agilent Technologies). Paired samples with
high RNA integrity (RNA integrity number > 5), no contaminants, and
enough amount of RNA were used to prepare the transcriptome library.
mRNA was purified from total RNA using poly-T oligo-attached magnetic
beads (Thermo Scientific) and fragmented with an NEB Fragmentation
Reagents kit (NEB). The cDNA synthesis, end-repair, A-base addition,
and ligation of the Illumina index adapters were performed according to
Illumina’s TruSeq RNA protocol (Illumina). Library quality was measured
on an Agilent 2100 Bioanalyzer for product size and concentration.
Paired-end libraries were sequenced by an Illumina HiSeq X Ten
(2 × 150-nucleotide read length), with a sequence coverage of 44 M
paired reads. FASTQ data underwent quality control and were aligned to
the hg19 genome using STAR^[200]56 to the human reference sequence
(UCSC hg19 assembly).
RNA qualification and quantification
Transcript quantification was performed using RSEM with default
parameters^[201]57. And transcript per million (TPM) expression values
were generated. Genes were kept with an expression value of at least 1
TPM in at least 20% (35/176) of tumor samples in the multi-region
RNA-Seq dataset, and log[2](TPM + 1) were used to represent the
expression levels. In total, 24,302 genes were applied for further
analysis.
RNA heterogeneity scores
Both Intratumor RNA heterogeneity scores and Intertumor RNA
heterogeneity scores were calculated using multi-region RNA-Seq data
(normalized count values) from ESCC, which is similar to Dhruva et al.
method^[202]27.
RNA heterogeneity quadrants
We devised the RNA heterogeneity quadrants by splitting the intra- and
inter-RNA heterogeneity using their respective mean values.
Pathway analysis
Pathway enrichment analysis was performed on genes in ESCC Q1-Q4
quadrants using the ReactomePA package (version 1.28.0).
Bonferroni-adjusted P-value was evaluated, which was based on the
threshold P-value <0.01.
Correlated subclonal SCNA with gene expression changes
Based on the heterogeneous copy-number segments between paired tumor
regions, we explored the gene expression difference between copy-number
alterations and copy-number neutral in the corresponding genes by
subtracting the log2 expression value of the SCNA neutral gene from the
SCN alteration genes. We then performed a two-sided paired t-test to
evaluate the statistical significance.
To examine the enrichment of genes with different copy-number statuses
across the four heterogeneity quadrants, we first divided all genes
within individual patients into “clonal gain”, “clonal loss”,
“subclonal gain”, “subclonal loss” and “no changes” classes. Then we
determined whether the different classes of genes across the four
quadrants are enriched or depleted by using a two-sided Fisher’s exact
test.
Correlation between the differentially methylated promoter and corresponding
gene expression changes
We assessed differentially methylated promoters as follows: The
methylation levels of all the CpG sites within the promoter of any
paired tumor regions from the same patient were compared. Sites that
were differentially methylated at the significant level of 0.05 as
determined by Fisher’s exact test and had a minimum methylation
difference of 0.2 between two tumor regions were considered as
differentially methylated promoters. Then we detected the difference
between the expression values of corresponding genes in the two tumor
regions. Statistical significance was tested with a two-sided paired
t-test. Similar steps were also performed in the first exon genomic
region.
Estimating tumor immune infiltration based on RNA-seq
In this part, we performed the Danaher et al. method^[203]28 to
evaluate the scores of the immune cell population. The immune signature
of this method optimally estimated immune infiltrates compared with
other immune cell deconvolution method^[204]37. The method of Danaher
et al. was used to estimate immune cell populations for the tumor
region samples by the RNA-seq dataset. The immune cell populations
consist of CD4^+ T cells (CD4), helper T cells (TH1), regulatory T
cells (Treg), dendritic cells (dendritic), B cells (B cell), mast cells
(mast), natural killer cells (NK), natural killer CD56^− cells (NK
CD56^−), neutrophils, macrophages, CD45^+ cells (CD45), CD8^+ T cells
(CD8), exhausted CD8^+ T cells (CD8 exhausted), and measures for total
T cells (T cells), total TILs (total TIL) and cytotoxic cells (cyto).
We also used Consensus^TME method^[205]29 to validate the TILs score of
each immune cell population. This method used co-expression patterns in
large tumor gene expression datasets to obtain candidate cell type
marker genes lists, eliminate numerous false positives and provide
high-confidence marker genes. This method provides robust TILs
estimation and performs excellently in all cancer-related benchmarks to
estimate the existing TILs of tumors. The immune cell population scores
of Consensus^TME were normalized to compare with the Danaher method
results by using the “ESCA” cancer type. The immune distance was
determined by using the Euclidean distance of immune-infiltrate
estimates between tumor regions.
Classifying tumor regions based on the levels of immune infiltration
First, we clustered the samples from the ESCC cohort based on the
immune cell population estimated score using “ward.D2”. Tumor samples
were split into two different groups according to the dendrogram. The
samples with higher immune cell population scores were considered to be
tumor regions with high levels of immune infiltration. And the samples
with lower immune cell populations were considered to be tumor regions
with low levels of immune infiltration. If all the tumor regions of the
patient were classified as immune high/low, the patient was considered
as immune high/low group. If the high immune samples and low immune
samples co-existed in the same patient, the overall tumor was
classified as a heterogeneous immune group. The three patients (ESCC12,
ESCC15, ESCC56) without RNA-seq data were excluded from the immune
infiltration analysis.
Validating tumor immune infiltration using opal
Multiplex staining was performed on 4 mm formalin-fixed
paraffin-embedded sections using the Opal multiplex IHC system
(PerkinElmer; NEL800001KT) according to the manufacturer’s
instructions. Briefly, slides were baked for 1 h at 80 °C, followed by
deparaffinization with xylene and a graded series of ethanol dilutions
(100%, 95%, and 70%), fixation with 10% neutral buffered formalin for
30 minutes, microwave antigen retrieval using the AR9 buffer
(PerkinElmer; AR900250ML), and blocking. The antibodies used in this
section were CD8 (R&D, MAB1509, clone # 37006, diluted at 1:200), CD56
(R&D, AF2408, Polyclonal, diluted at 1:150), CD4 (Abcam, ab133616,
clone EPR6855, diluted at 1:200), CD68 (Abcam, ab213363, clone
[206]EPR20545, diluted at 1:200), FOXP3 (Abcam, ab22510, clone
mAbcam22510, diluted at 1:200), CK (Abcam, ab756, clone MNF116, diluted
at 1:50). To visualize immunofluorescent signaling, the following
OPALTM (Perkin Elmer) TSA dyes were used: OPAL520, 570, 620, 650, 690.
Spectral DAPI (Perkin Elmer) was used for the nuclear counterstain.
Imaging of slides was performed on a Vectra 3.0 Automated Imaging
System (Perkin Elmer). Counterstain was done using DAPI (1:1000) and
subsequently mounted using Vectashield (Vectra; H-1000) fluorescence
media. Tyrosinase for higher-resolution imaging at 20x magnification
using Phenochart (Perkin Elmer). The number of 20x images per sample
scanned by the Vectra 3.0 microscope and used for further analysis.
HLA type and HLA LOH prediction
The HLA type for each patient was detected using POLYSOLVER^[207]33,
which uses a normal tissue BAM file as input and employs a Bayesian
classifier to determine genotype. Tumour regions with HLA LOH events
were detected using LOHHLA^[208]34.
Neoantigen binders prediction
All 9-11mer peptides that overlapped identified non-silent mutations
present in the sample were considered candidate epitopes. MHC-I binding
affinity was calculated for every mutation and corresponding wild-type
allele using netMHCpan-4.0^[209]58. The mutant epitope with percentile
binding scores of ≤ 2% and equal or better affinity than the wild-type
epitope were considered putative neoepitopes. In cases of tumor sample
with HLA LOH, predicted neoepitopes associated with the lost HLA allele
were excluded (for subclonal HLA LOH, a neoepitope was only excluded if
all clones with the neoepitope also exhibited loss of the corresponding
HLA allele). When RNA-seq data were available, an expressed neoantigen
was detected if at least five RNA-seq reads mapped to the mutation
position, and at least three contained the mutated allele.
Statistical information
We performed all the statistical tests using R. Comparisons of
distributions were done using ‘wilcox. test’ or ‘t.test’. The two-sided
Student’s t-test was used to compare the significant differences
between the two groups. To consolidate the conclusion of the
correlation test of our study, We also performed partial correlation to
estimate Pearson (linear) correlation between two variables while
controlling for one or more other variables (An R Package for a Fast
Calculation to Semi-partial Correlation Coefficients.). This method has
been validated in many studies. We computed the partial correlation
controlling for tumor purity using the pcor.test() function from the R
package ppcor. All data are presented as mean values ± SEM.
Reporting summary
Further information on research design is available in the [210]Nature
Portfolio Reporting Summary linked to this article.
Supplementary information
[211]Supplementary Figures^ (20.3MB, pdf)
[212]Peer review file^ (3.8MB, pdf)
[213]Supplementary Data 1^ (52.4KB, xlsx)
[214]Reporting Summary^ (1.6MB, pdf)
Acknowledgements