Abstract
Primary small cell carcinoma of the esophagus (PSCCE) is a lethal
neuroendocrine carcinoma. Previous studies proposed a genetic
similarity between PSCCE and esophageal squamous cell carcinoma (ESCC)
but provided little evidence for differences in clinical course and
neuroendocrine differentiation. We perform whole-exome sequencing, RNA
sequencing and immunohistochemistry profiling on 46 PSCCE cases.
Integrated analyses enable the discovery of multiple mechanisms of RB1
disruption in 98% (45/46) of cases. The transcriptomic landscape of
PSCCE closely resembles small cell lung cancer (SCLC) but differs from
ESCC or esophageal adenocarcinoma (EAC). Distinct gene expression
patterns regulated by ASCL1 and NEUROD1 define two molecular subtypes,
PSCCE-A and PSCCE-N, which are highly similar to SCLC subtypes. A T
cell excluded phenotype is widely observed in PSCCE. In conclusion,
PSCCE has genomic alterations, transcriptome features and molecular
subtyping highly similar to SCLC but distinct from ESCC or EAC. These
observations are relevant to oncogenesis mechanisms and therapeutic
vulnerability.
Subject terms: Cancer genomics, Oesophageal cancer, Tumour-suppressor
proteins
__________________________________________________________________
Primary small cell carcinoma of the oesophagus has a poor prognosis,
and has not been fully characterised molecularly. Here, the authors
study the disease using multi-omics technology and find frequent RB1
disruptions and similarities to small cell lung cancer, opening
potential therapeutic avenues.
Introduction
Primary small-cell carcinoma of the esophagus (PSCCE) is one of the
deadliest neuroendocrine malignancies, featuring rapid progression,
high metastasis propensity, and dismal prognosis. The 5-year survival
rates of limited-stage patients were ~10%^[62]1–[63]3 and were
practically zero for extensive-stage patients^[64]4,[65]5. The
prevalence of PSCCE has been increasing in the recent decade^[66]5,
estimated to account for 0.5–2.8% of all esophageal
malignancies^[67]6–[68]8. However, no consensus on standardized
treatment for PSCCE has been reached at present, and all current
strategies showed limited capacity to improve prognosis.
Diagnosis of PSCCE requires appropriate morphology and positive
immunohistochemistry (IHC) staining of neuroendocrine markers including
CD56, Synaptophysin, and Chromagranin A^[69]9. PSCCE has long been
observed to have different morphology^[70]10, disease
course^[71]2,[72]3,[73]8 and IHC staining^[74]9 from esophageal
squamous cell carcinoma (ESCC) or esophageal adenocarcinoma (EAC), two
predominant histologies of esophageal cancer. Indeed, most management
strategies for PSCCE were extrapolated from those designed for
small cell lung cancer (SCLC) based on their similar histology.
To date, there was no comprehensive multi-omics study of PSCCE. One
previous whole-exome sequencing (WES) study of 55 PSCCE revealed
disruption of putative tumor-suppressors including TP53 (80%), RB1
(27%), and NOTCH1 (24%), claiming that PSCCE genetically resembled
ESCC^[75]11. However, the reported mutation rate of RB1 was
significantly lower than that reported in SCLC (93%)^[76]12. Knowledge
of transcriptomic features, molecular subtyping, and therapeutic
targets is still lacking.
Integrated genomics and transcriptomics studies of SCLC converged on
four distinct molecular subtypes each with unique susceptibilities to
different targeted therapies^[77]13–[78]15, which was informative to
future researches and clinical trials. These studies of SCLC inspired
several studies on promising therapeutic targets, including the
inhibition of Aurora kinase^[79]14,[80]16 and targeting metabolism
addition of specific SCLC subtype^[81]17.
To provide a more comprehensive understanding of the biology of PSCCE,
we perform genomic, transcriptomic, and immune profiling of 46 PSCCE
screened from 7539 consecutive esophageal cancer patients treated in a
single institute. By integrating multi-omics studies, we discover an
array of molecular features distinguishing PSCCE from ESCC or EAC,
several of which may translate into therapeutic targets. We identify
the nearly universal disruption of RB1 by multiple mechanisms,
confirming the fundamental role for RB1 in PSCCE. We identify two
molecular subtypes of PSCCE, each with unique genomic and
transcriptomic features. The tumor microenvironment (TME) of PSCCE is
featured by insufficient T-cell infiltration, which might be induced by
some intrinsic expression programs. These findings deepen our
understanding of PSCCE and may translate into clinical impact.
Results
Patients and study overview
We retrospectively screened 7539 consecutive esophageal cancer patients
treated in the Department of Thoracic Surgery from 2006 to 2017 and
identified 65 PSCCE cases (0.9%). After intensive quality evaluation,
46 cases with complete medical records and sufficient specimens for
sequencing were included in our study (Supplementary Data [82]1).
We performed WES on 46 pairs of tumor and matched normal sample,
including 13 pairs of fresh-frozen (FF) samples and 33 pairs of
formalin-fixed, paraffin-embedded (FFPE) samples. The average coverages
for FF samples were 333
[MATH: × :MATH]
and 172
[MATH: × :MATH]
for tumors and normal samples, respectively. The average coverages for
FFPE samples were 314
[MATH: × :MATH]
and 166
[MATH: × :MATH]
for tumors and normal samples, respectively. Sequencing coverages were
comparable between FF samples and FFPE samples (tumors, P = 0.647;
normal samples, P = 0.454, all by the Wilcoxon rank-sum test).
We performed RNA Exome Access sequencing (RNA-seq) on FFPE samples of
38 tumors and 23 matched normal esophageal samples, achieving an
average read depth of 86.7 million reads per sample.
Mutational burden and signatures
We identified 4918 somatic mutations by WES, including 1200 synonymous
single-nucleotide variants (SNVs), 3511 nonsynonymous SNVs and 207
indels (Supplementary Data [83]2), with an average of 81 nonsynonymous
mutations per tumor (range: 16–190), corresponding to an average
nonsynonymous mutation rate of 2.31 mutations per megabase (Mb).
We observed that FFPE samples demonstrated significantly higher
mutational burden than FF samples (overall mutation rate: 3.36/Mb and
2.28/Mb for FFPE and FF samples, respectively, P = 0.00628, Wilcoxon
rank-sum test; nonsynonymous mutation rate: 2.54/Mb and 1.72/Mb for
FFPE and FF samples, respectively, P = 0.0044, Wilcoxon rank-sum test).
The proportion of low allele frequency (AF) mutations (AF < 0.05) in
FFPE samples (887/3881, 23%) was significantly higher than that in FF
samples (164/1037, 16%, P = 1.14 × 10^−6, Chi-square test), consistent
with previous reports that FFPE samples were enriched for low-AF
nucleotide transitions induced by fixation^[84]18. The proportions of
nonsynonymous mutations in all mutation were comparable between FFPE
(75.6%, 2933/3881) and FF samples (75.7%, 785/1037, P = 0.9656,
Chi-square test).
The nonsynonymous mutation rate in this study was significantly higher
than that reported by Wang et al.^[85]11 (2.12/Mb, P = 0.0225, Wilcoxon
rank-sum test), owing to the higher sequencing depth. The nonsynonymous
mutation rate of PSCCE was significantly lower than those of
ESCC^[86]19–[87]21 (3.15/Mb, P = 0.00021), EAC^[88]21 (5.16/Mb,
P = 1.11
[MATH:
×10−7 :MATH]
) and SCLC^[89]12 (8.62/Mb, P < 2.2
[MATH:
×10−16 :MATH]
, all by the Wilcoxon rank-sum test), ranking medially among cancers
sequenced by The Cancer Genome Atlas (TCGA, Supplementary Fig. [90]1a)
project.
We observed a high frequency of C>T substitutions, comprising 51.1%
(2407/4711) of all SNVs (including 1200 synonymous SNVs and 3511
nonsynonymous SNVs), especially in the NpCpG trinucleotide context
(Fig. [91]1a). C>T substitution was the most common SNVs in both FF and
FFPE samples; however, the proportion of C>T substitutions was higher
in FFPE samples (53.1%) than in FF samples (43.1%), consistent with
previous reports of C>T substitution enrichment in FFPE samples
resulted from deamination during fixation^[92]22. The overall
mutational spectrum was highly similar to that reported by Wang et
al.^[93]11 (cosine similarity: 0.97, Supplementary Fig. [94]1b), while
similar to EAC^[95]21 (0.89) and ESCC^[96]21 (0.85) to a lesser extent
and quite different from SCLC^[97]12 (0.62).
Fig. 1. Genomic alterations detected by WES in PSCCE.
[98]Fig. 1
[99]Open in a new tab
a Mutational spectrum of PSCCE. Substitutions are plotted in different
colors with their context arranged in the denoted order. b Five
mutational signatures, denoted as E1–E5, were identified in the
integrated analysis of PSCCE, ESCC, EAC, and SCLC. COSMIC signatures
showing high similarity, its etiology and cosine similarity score are
shown. c The proportions of signature E1–E5 in each sample of PSCCE,
ESCC, EAC, and SCLC tumors. The predominant signature in each cancer is
shown. d Recurrent somatic copy-number variations in PSCCE.
Amplifications and deletions are plotted in red and blue, respectively.
e Landscape of somatic alterations in PSCCE. Somatic alterations of
each gene (row) in each tumor (column) are plotted as a heatmap
according to the color legend below. Samples are arranged by the number
of mutations (top panel). Clinical parameters of each patient are shown
below. Alteration frequencies are shown in the right panel. The
proportion of single-nucleotide substitutions in each sample is shown
in the bottom panel.
In the integrated analysis of mutational signatures of PSCCE (n = 101,
combined with 55 cases reported by Wang et al.^[100]11, hereinafter
referred to as the “combined cohort”), SCLC (n = 110, ref. ^[101]12),
EAC (n = 88, TCGA), and ESCC (n = 96, TCGA), we identified five
principal mutational signatures, E1–E5 (Fig. [102]1b), which were
highly similar to Catalogue Of Somatic Mutations In Cancer (COSMIC)
signatures 1, 13, 4, 16, and 17, respectively. We did not discover any
de novo signature unique to small-cell carcinomas. The proportions of
signatures varied among cancer types (Fig. [103]1c), reflecting the
effects of both exposure and intrinsic tumorigenesis mechanism.
Endogenous 5-methylcytosine deamination-associated Signature E1 was the
predominant mutational signature in PSCCE. Signature E4 was recently
reported to be associated with alcohol drinking^[104]23. Consistent
with this observation, Signature E4 contribution was significantly
higher in regular drinkers of PSCCE patients (median 25.6%) compared to
never or occasional drinkers (median 13.0%, P = 0.039, Wilcoxon
rank-sum test).
Somatic copy-number variations (SCNVs)
We performed data cleaning and applied stringent thresholds to identify
solid somatic copy-number variations (SCNVs) from the noisy background
of FFPE WES data (Source Data 1).
GISTIC2.0 algorithm^[105]24 identified significant deletions
(Fig. [106]1d and Supplementary Data [107]3) of 13q14 (harboring RB1)
and 3p12-14 (harboring FHIT and ROBO1), both of which were also
recurrently deleted in SCLC^[108]12. We also observed amplifications of
canonical cancer genes, including CCNE1 on 19q12 and MYC on 8q24.
We validated the copy numbers of 102 loci in 34 samples (including 12
FF and 22 FFPE; MYC, TERT, and SOX4 loci for each sample, Supplementary
Data [109]4) by quantitative polymerase chain reaction (PCR). Thirty
five of 36 (97%) loci in FF samples and 61 of 66 (92%) loci in FFPE
samples showed consistent results between WES and quantitative PCR.
Several genes in cell-cycle pathway and receptor tyrosine kinase
pathway showed remarkably different frequency of SCNV between PSCCE and
esophageal cancers. CCND1 amplification was observed in 33–54% of
ESCCs^[110]19,[111]21 and in only 4% of PSCCEs (P = 1.06
[MATH:
×10−7 :MATH]
, Fisher’s exact test). CDKN2A deletion was reported in up to 66% of
ESCCs^[112]19,[113]21 but was only observed in 4% of PSCCEs (P = 2.46
[MATH:
×10−6 :MATH]
, Fisher’s exact test). ERBB2 was amplified in 17–27% of
EACs^[114]21,[115]25 but in none of PSCCEs (P = 7.36
[MATH:
×10−5 :MATH]
, Fisher’s exact test).
Recurrently mutated genes and pathways
The MutsigCV algorithm^[116]26 identified three significantly mutated
genes (SMGs, P < 0.05 and q < 0.1, Fig. [117]1e, Supplementary
Data [118]5): TP53, RB1, and NOTCH1. Two genes were identified with
significant mutation clusters (P < 0.05 and q < 0.1, Supplementary
Data [119]5): TP53 and NOTCH1. In the combined cohort of 101 PSCCE
patients, no more SMGs were identified, while two more genes were
identified with significant mutation clusters: EP300 and FBXW7
(Supplementary Fig. [120]1c).
Sequencing of FFPE samples was reported to have comparable performance
in detecting driver mutations and actionable events to sequencing of FF
samples^[121]18. We performed Sanger sequencing of somatic mutations in
TP53 (n = 54), RB1 (n = 19), and NOTCH1 (n = 15, Supplementary
Data [122]6). Twenty-four of 24 loci (100%) in FF samples and 55 of 64
loci (86%) in FFPE samples were successfully amplified by PCR. All loci
that were amplified were validated by Sanger sequencing. We further
validated seven of nine loci that suffered PCR amplification failure by
confirming expression of somatic mutations in RNA-seq reads
(Supplementary Fig. [123]2).
In total, 54 somatic mutations (43 nonsynonymous SNVs, 2 synonymous
SNVs and 9 indels) of TP53 were observed, affecting 85% (39/46) of all
tumors, similar to the previous report by Wang et al.^[124]11 (44/55,
P = 0.608, Fisher’s exact test). Thirty-four percent (18/52) of the
nonsynonymous mutations of TP53 were nonsense, frame-shifting indels or
splice site mutations truncating the protein.
We identified 22 genes with established roles in cancers (the Cancer
Gene Census, ref. ^[125]27) among 97 genes that were nonsynonymously
mutated in at least three tumors. We observed frequent mutations in
histone modifier genes. Histone acetyltransferase EP300 and CREBBP were
mutated in 9% (4/46) and 7% (3/46) of cases, respectively. Two out of
four EP300 mutations and all four CREBBP mutations affected the histone
acetyltransferase domain. COMPASS-like complex components KMT2D and
KDM6A, which play important roles in modifying histone methylation,
were mutated in 4 and 3 tumors, respectively. Mutations of EP300,
CREBBP, KMT2D, and KDM6A were largely mutually exclusive, affecting 26%
(12/46) of all PSCCEs (Fig. [126]1e). FAT atypical cadherin family,
including FAT1, FAT3, and FAT4, which are considered tumor-suppressive
in numerous cancers^[127]28, were mutated in a total of ten cases.
Nearly universal disruption of RB1 by multiple mechanisms
We identified 19 somatic mutations of RB1 affecting 16 (34.8%) cases
(Fig. [128]2a). The mutation frequency of RB1 was comparable to that
reported by Wang et al.^[129]11 (15/55, 27.3%, P = 0.5166, Fisher’s
exact test). Seventy-nine percent (15/19) of RB1 mutations were
truncating, including 9 nonsense, 3 frame-shifting indels, and 3 splice
site mutations.
Fig. 2. RB1 disruption by multiple mechanisms.
[130]Fig. 2
[131]Open in a new tab
a Schematic summary of RB1 disruption events observed in each sample
(column) by different methods (row). The rightmost sample is PSCCE_33T,
the only tumor positive for Rb, yet not evaluated by RNA-seq. b (From
top to bottom) Schematic plot of exon deletions, Sanger sequencing
validation of exon deletion breakpoints, sashimi plot of subsequent
splicing abnormalities in RB1 mRNA and Rb IHC of three representative
tumors. Schematic summary of disrupting mechanism is shown on the
right. Black rectangles represent RB1 exons and gray dashed line
represent deleted genomic regions. Green, red, blue, and black peaks in
Sanger sequencing chromatograms represent bases A, T, C, and G,
respectively. Genomic coordinates are in hg19 assembly. Curves in
sashimi plot represent reads spanning exon junction with numbers of
reads denoted. Abnormal exon junctions are plotted in red. Rb IHC of
matched normal sample is provided as control. Scale bar: 50 μm. c
Schematic plot and Sanger sequencing validation of exon deletions
(left), sashimi plot and validation of abnormal mRNA exon junctions
(middle) and Rb IHC staining (right) of PSCCE_10T. Two alleles of RB1
are plotted separately to show different ranges of exon deletions.
Exons of ITM2B are plotted in green. Sanger sequencing chromatograms
and sashimi plots are plotted in the same way described above. Scale
bar: 50 μm.
SCNV analysis discovered that 14 (30.4%) tumors harbored homozygous
deletions affecting RB1, including 13 homozygous deletions affecting
only part of but not the whole RB1 locus (Supplementary Data [132]7,
hereinafter referred to as “RB1 exon deletions”), and one homozygous
deletion of whole RB1 locus in PSCCE_79T. RB1 exon deletions affected a
median of 11 exons (range: 1–25) and had a median length of
52.5 kilobase (kb, range: 0.2–2828 kb). RB1 exon deletions left the
unaffected exons intact and were mutually exclusive to somatic
mutations (P = 0.00117, hypergeometric test). According to the SCNV
results, we designed series of PCR primers to locate breakpoints of RB1
exon deletions, consolidating the bioinformatics SCNV findings
(Fig. [133]2b and Supplementary Data [134]7). RB1 exon deletions
occurred at a distinguishably high frequency (28.3%) in PSCCE,
remarkably higher than that of 110 whole-genome sequenced SCLC^[135]12
(8.2%, P = 0.002, Fisher’s exact test) and 508 whole-genome sequenced
ESCCs^[136]29 (0.79%, P = 2.43
[MATH:
×10−13 :MATH]
, Fisher’s exact test). RB1 exon deletions were not thoroughly
described in the previous WES profiling of PSCCE^[137]11.
We further integrated RNA-seq and IHC profiling to discover additional
RB1-disrupting events. We found that 28 of 38 RNA-sequenced tumors
harbored splicing abnormalities of RB1 mRNA, including exon skipping
(n = 21), gene fusion (n = 2), formation of new splice site (n = 1),
and disrupted expression of 3′-terminal exons (n = 4, Supplementary
Data [138]8). RNA-seq recapitulated splicing abnormalities of RB1 mRNA
that are the deduced consequences of exon deletions and splicing site
mutations (Fig. [139]2b, c and Supplementary Fig. [140]3a, b). Eight
tumors with splicing abnormality in RNA-seq were otherwise RB1 “wild
type” in WES (Supplementary Fig. [141]3c), suggesting alterations
undetectable by WES.
Five tumors (11%) stained positive for RB1 protein (Rb) by IHC
(Supplementary Data [142]8). However, deleterious abnormalities were
observed in 4 of them. In PSCCE_10T and PSCCE_13T, in-frame exon
deletions excluded exons encoding >400 amino acids (Fig. [143]2c) but
retained C-terminal epitope for IHC antibody recognition. In PSCCE_10T,
two alleles of RB1 suffered different deletions: one allele suffered
exons 1–17 deletion (encoding 499 amino acids) and formed in-frame
fusion with the upstream ITM2B; the other allele suffered
frame-shifting deletion of exons 3–18 (Fig. [144]2c). Strong Rb
staining was observed in cytoplasm but not nucleus of PSCCE_10T. Two
tumors, namely PSCCE_32T and PSCCE_56T, were wild type by WES and
stained positive for Rb. However, RNA-seq revealed in-frame exon
skipping of RB1 mRNA in PSCCE_32T (exons 3–17 skipping, encoding 477
amino acids) and PSCCE_56T (exons 14–17 skipping, encoding 121 amino
acids, Supplementary Fig. [145]3d), which severely disrupted Rb
function. Notably, Rb was completely lost in four cases with no
abnormalities observed by WES or RNA-seq (Fig. [146]2a and
Supplementary Fig. [147]3e), indicating mechanisms disrupting RB1 mRNA
translation or protein stability.
By integrating WES, RNA-seq, and IHC results, we discovered disruption
of RB1 in 45 (98%) cases. Only one tumor, PSCCE_33T, stained positive
for Rb and showed no abnormality in WES; we had no RNA to test in-frame
exon skipping in PSCCE_33T. Integrative analysis identified
substantially more RB1 disruption events than did the single WES
performed by us or by Wang et al.^[148]11. RB1 disruption frequency was
remarkably higher than ESCC^[149]21 (10%, 9.8-fold, P < 2.2
[MATH:
×10−16 :MATH]
, Fisher’s exact test), EAC^[150]21 (3.4%, 28.8-fold, P < 2.2
[MATH:
×10−16 :MATH]
, Fisher’s exact test) and was comparable to SCLC^[151]12 (93%).
PSCCE has a transcriptome highly similar to SCLC
We next turned to the transcriptomic landscape of PSCCE (Source Data
[152]2). We compared the gene expressions of 38 tumors against 23
matched normal esophageal samples to identify differentially expressed
genes (DEGs, Source Data [153]3). DEG analysis revealed significant
upregulation of genes involved in DNA replication, cell cycle, and
neuroendocrine differentiation (Fig. [154]3a and Supplementary
Fig. [155]4a) and downregulation of genes related to cell adhesion in
PSCCE. By enrichment analysis, we discovered several pathways that were
largely distinguishable between PSCCE and esophageal cancers
(Fig. [156]3b). Neuroactive ligand-receptor interaction pathways were
specifically upregulated in PSCCE, which was consistent with its
neuroendocrine differentiation. Upregulated DEGs of EAC and ESCC were
significantly enriched for immune response pathways. In contrast, these
pathways were attenuated in PSCCE.
Fig. 3. The transcriptome landscape of PSCCE.
[157]Fig. 3
[158]Open in a new tab
a Volcano plot of differentially regulated genes (DEGs). Up- and
downregulated DEGs are plotted in red and cyan, respectively. Key genes
are plotted in purple with symbols annotated. Source data are provided
as a Source Data file. b Pathways that were largely discordant between
PSCCE, EAC, and ESCC. c Heatmap of gene expressions of three principle
clustering groups. Key signature genes associated with clustering are
marked on the right side. d Expression of neuroendocrine
differentiation marker genes and lineage transcription factors across
three groups. PSCCE and SCLC samples are plotted separately to show
that there was no hijacking of either cancer by the other. The upper
bound, centerline, and lower bound of boxplot represent the 75
percentile, the median and the 25 percentile of data; the upper and
lower whiskers extend to the largest and smallest value within 1.5
times of interquartile range (IQR) from corresponding bound. Data
beyond the whiskers are plotted as outlier dots. ***P < 0.001, by the
Wilcoxon rank-sum test. e GSEA revealed that E2F target genes were
specifically expressed in Group NE. f GSEA revealed that RB1-loss
associated genes were specifically expressed in Group NE.
Given the prominent neuroendocrine differentiation of PSCCE and its
divergence from EAC and ESCC, we sought to describe relationship with
its counterpart in the lung, SCLC. We collected transcriptomic profiles
of SCLC^[159]12, ESCC^[160]21, EAC^[161]21, lung squamous cell
carcinoma (LUSC)^[162]30, and LUAD^[163]31 (Supplementary Fig. [164]4b
and Supplementary Data [165]9). Iterations of unsupervised clustering
yielded three principal groups that showed distinct gene expression
patterns (“Group NE,” “Group Squamous,” and “Group Adeno,”
Fig. [166]3c). Consensus clustering demonstrated that the transcriptome
landscape of PSCCE was highly similar to SCLC: all PSCCEs and all but
one SCLCs clustered together into Group NE.
We identified signature genes highly and specifically expressed in each
group against other groups (Supplementary Data [167]10). Group NE
signature genes included several neuroendocrine markers including
NCAM1, SYP, and CHGA (Fig. [168]3d). These genes had vanishingly low
expressions in other groups. Genes associated with squamous cell and
gland differentiation were among signature genes specifically expressed
in Group Squamous and Group Adeno, respectively (Supplementary
Fig. [169]4c).
We next turned to transcription regulation networks of signature genes.
Several neuroendocrine lineage transcription factors including ASCL1,
NEUROD1, and INSM1 were among Group NE signature genes (Fig. [170]3d).
Group NE signature genes were also significantly enriched for targets
of the E2F family (Supplementary Fig, [171]4d). Gene set enrichment
analysis (GSEA) revealed significant enrichment of E2F targets in Group
NE (Fig. [172]3e), consistent with relief of E2F suppression by RB1
loss observed in both PSCCE and SCLC. Group Squamous signature genes
were regulated by TP63, consistent with its squamous cell
differentiation.
The three groups also showed differential activation of oncogenic
signaling pathways. The RB1 depletion-associated signature^[173]32
(“RB-loss signature”) was specifically enriched in Group NE tumors,
consistent with universal RB1 disruption in both PSCCE and SCLC
(Fig. [174]3f). EGFR signaling pathway was preferentially activated in
Group Squamous and Group Adeno (Supplementary Fig. [175]4e), consistent
with observations of EGFR signaling activation by activating mutations
or overexpression of EGFR in lung and esophageal
cancers^[176]31,[177]33.
Taken together, the distinguishable expression pattern, regulation
network, and oncogenic pathway activation encapsulated that PSCCE was
different from EAC and ESCC but highly similar to SCLC. PSCCE should
not be viewed as a neuroendocrine variant of ESCC but as a distinct
entity.
Gene expression pattern reveals two subtypes of PSCCE
We further looked into the transcriptome of PSCCE alone to decipher the
heterogeneity within PSCCE, which would be masked when compared with
strongly divergent cancers.
Unsupervised consensus clustering of 38 PSCCE tumors yielded two
molecular subtypes with distinct gene expression patterns (Fig. [178]4a
and Supplementary Data [179]11). The first subtype comprised nine
tumors stably clustered together. The other 29 tumors showed unstable
clustering but seldom clustered with the first subtype and were hence
considered collectively. We found lineage transcription factors NEUROD1
and ASCL1 within signature genes (Supplementary Data [180]12) of each
group; they were also among the most differentially expressed genes
across the two subtypes (Fig. [181]4b). According to these
observations, we named the 9 NEUROD1^high tumors as subtype PSCCE-N,
and the other 29 ASCL1^high tumors as subtype PSCCE-A. GSEA revealed
that target genes of NEUROD1 and ASCL1^[182]13 were preferentially
expressed in PSCCE-N and PSCCE-A tumors, respectively (Fig. [183]4c),
confirming the regulatory roles of ASCL1 and NEUROD1. The inverse
expression pattern of ASCL1 and NEUROD1 in two subtypes was validated
by IHC (Fig. [184]4d). Several neuroendocrine marker genes including
GRP, DDC, and SSTR2 also showed remarkable differences between two
subtypes (Fig. [185]4b).
Fig. 4. PSCCE had two molecular subtypes.
[186]Fig. 4
[187]Open in a new tab
a Gene expression heatmap of PSCCE-A and PSCCE-N subtypes identified by
unsupervised consensus clustering. Amplifications of MYC family are
shown on the top of the heatmap. Selected key genes related to
subtyping are shown on the right side. NA: no significant SCNV
observed. Source data are provided as a Source Data file. b Expressions
of lineage transcription factor and neuroendocrine marker genes that
were significantly differentially expressed across two subtypes. The
upper bound, centerline, and lower bound of boxplot represent the 75
percentile, the median and the 25 percentile of data; the upper and
lower whiskers extend to the largest and smallest value within 1.5
times of IQR from corresponding bound. Data beyond the whiskers are
plotted as outlier dots. ***P < 0.001, by the Wilcoxon rank-sum test. c
GSEA revealed that target genes of NEUROD1 and ASCL1 were
preferentially expressed in PSCCE-N and PSCCE-A subtype. d IHC of
Neurod1 and Ascl1 showed inverse expression pattern in PSCCE-N and
PSCCE-A subtypes. Scale bar: 50 μm. e Kaplan–Meier plot of the overall
survival of patients from two subtypes.
We screened for genomic alterations associated with this subtyping and
found that PSCCE-N subtype was associated with amplification of
multiple segments on chromosome 8q, including region harboring cancer
gene MYC (Supplementary Fig. [188]5a). The fraction of patients with
MYC amplification was significantly higher in PSCCE-N (56%, 5/9) than
in PSCCE-A (17%, 5/29, P = 0.036, Fisher’s exact test). However, the
expression levels of MYC and MYCL mRNA showed no significant difference
between two subtypes (Supplementary Fig. [189]5b). The PSCCE-N subtype
had a significantly higher MYCN level than PSCCE-A (P = 0.0012,
Wilcoxon rank-sum test, Supplementary Fig. [190]5b).
We also observed that PSCCE-N patients had significantly worse
prognoses when compared with PSCCE-A patients (hazard ratio (HR): 2.44,
95% confident interval (CI): 1.04–5.69, P = 0.0394, log-rank test,
Fig. [191]4e). Other clinical features, including gender, age at
diagnosis, T stage, N stage, and TNM stage presented no significant
difference between two subtypes (Supplementary Data [192]13).
A synthesis of recent studies of SCLC converged on four molecular
subtypes—SCLC-A, SCLC-N, SCLC-P, and SCLC-Y, as featured by the
differential expression of lineage transcription factors ASCL1,
NEUROD1, POU2F3, and YAP1, respectively^[193]15. We found that PSCCE
subtypes highly resembled the SCLC-A and SCLC-N subtypes. Gene sets
associated with SCLC-A and SCLC-N tumors^[194]13 were specifically
expressed in PSCCE-A and PSCCE-N tumors, respectively (Supplementary
Fig. [195]5c). The SCLC-N subtype was reported to associate with MYC
amplification^[196]14,[197]34 and to rapidly metastasize and relapse in
human and murine model^[198]14,[199]34, consistent with our findings
that PSCCE-N subtype was associated with MYC amplification and poorer
prognoses.
POU2F3 and YAP1 levels in the 38 RNA-sequenced PSCCEs were relatively
low compared to ASCL1 and NEUROD1 and did not appear to be a selective
master regulator (Supplementary Fig. [200]6). Given the prominent
ASCL1- or NEUROD1-associated expression patterns observed, the evidence
obtained in the present study was insufficient to confirm a POU2F3- or
YAP1-dominated subtype of PSCCE.
Notch pathway inactivation in PSCCE
Notch signaling was considered tumor-suppressive in SCLC^[201]12. In
PSCCE, we also observed low activity of Notch signaling, as
characterized by significant downregulation of Notch receptors and
effector, and upregulation of Notch antagonists (Fig. [202]5a).
Fig. 5. Notch pathway was inactivated in PSCCE.
[203]Fig. 5
[204]Open in a new tab
a A schematic plot of Notch pathway dysregulation in PSCCEs. Up:
upregulated, defined by having expression higher than three times of
95% quantile of 23 normal esophageal samples; Dn: downregulated,
defined by having expression lower than one-third of 5% quantile of 23
normal esophageal samples; Mut: mutated. The proportions of tumors
showing dysregulation are denoted as percentages, which are also
indicated by color intensities according to the color legend. b
Mutations of NOTCH1 were enriched in extracellular EGF-like repeats.
Truncating mutations are plotted in red. c Kaplan–Meier plots of
overall survival of PSCCE patients. Patients with mutated NOTCH1-4 had
significantly poorer prognoses. d The dysregulation of Notch pathway
components was observed regardless of Notch receptor status. The red
dots represent tumors harboring mutations in Notch receptors (NOTCH1-4)
and blue dots represent tumors with wild-type Notch receptors.
Log[2](foldchange), P and q values determined by DESeq2 are shown
below. T tumors. N normal samples. The upper bound, centerline, and
lower bound of boxplot represent the 75 percentile, the median, and the
25 percentile of data; the upper and lower whiskers extend to the
largest and smallest value within 1.5 times of IQR from corresponding
bound. Data beyond the whiskers are plotted as outlier dots. Source
data are provided as a Source Data file.
The inhibition of Notch signaling was observed on multiple levels. (a)
NOTCH1 was mutated in 26% (12/46) of cases. Sixty-four percent (9/14)
of NOTCH1 nonsynonymous mutations were frame-shifting deletions and
nonsense mutations truncating the protein far N-terminal to its
transactivation domain, and were thus considered inactivation. Missense
mutations of NOTCH1 showed significant local clustering in
extracellular epidermal growth factor-like repeats important for ligand
binding (Fig. [205]5b). Consistent with its tumor-suppressive role,
mutations in NOTCH1-4 were significantly associated with a poorer
prognosis in the combined cohort (HR 1.67, 95% CI 1.01–2.69, P = 0.046,
Fig. [206]5c). (b) The relatively mutation-sparse NOTCH2 and NOTCH3
genes were identified as downregulated DEGs when compared to matched
normal samples (Fig. [207]5d, detailed P, q, and log2FC value in Source
Data [208]3). (c) Transcriptomic profiling also revealed inhibitory
ligands of Notch pathway DLK1 and DLL3 as upregulated DEGs
(Fig. [209]5d and Source Data [210]3). (d) HES6, a member of basic
helix-loop-helix transcription factor, which binds and inhibits the
major Notch pathway effector HES1^[211]35, was overexpressed in PSCCE
(Source Data [212]3).
The downregulation of Notch receptors and overexpression of Notch
antagonists were also observed in tumors with wild-type Notch receptors
(Fig. [213]5d), indicating a constitutive suppression of Notch
signaling. Consequently, pro-neural TFs ASCL1 and NEUROD1, whose
expression was inhibited by Notch signaling pathway^[214]36, were
significantly upregulated.
PSCCE features a T-cell excluded tumor microenvironment
In DEG pathway enrichment analysis, we observed downregulation of
immune response pathways in PSCCE. To further describe TME in PSCCE, we
used computational methods to evaluate abundance of infiltrating immune
cells.
We first applied single-sample GSEA (ssGSEA) projection of immune cell
signatures collected from the literatures (n = 19, Supplementary
Data [215]14), including signatures of T cells, B cells, macrophages
and granulocytes, on gene expression profiles of PSCCE, SCLC, LUAD,
LUSC, EAC, and ESCC (see “Method” and Source Data [216]4). We found
that PSCCE had significantly lower T cells and CD8 T cells scores than
EAC and ESCC (Fig. [217]6a). Both organ-of-origin and histology had
significant impacts on the immune milieu: esophageal cancers generally
had significantly fewer infiltrating T cells than lung cancers,
consistent with previous reports^[218]37; and small-cell carcinomas had
significantly lower T cell and CD8 T-cell abundance compared to tumors
with other histologies arising from same organ (Fig. [219]6a and
Supplementary Fig. [220]7a, two-way ANOVA test). Enumeration of immune
infiltrates using a deconvolution method (CIBERSORT, ref. ^[221]38)
showed similar trend: EAC and ESCC had lower abundance of infiltrating
T cells and CD8 T cells than LUAD and LUSC, while PSCCE had
significantly even lower abundances than ESCC and EAC (both P < 0.001,
Wilcoxon rank-sum test, Fig. [222]6b). We also quantified T-cell
abundance using recent single-cell RNA-seq (scRNA-seq) derived cell
type signature^[223]39. The Overall Expression (OE) scores for both T
cells and CD8 cytotoxic T cells were significantly lower in PSCCE and
SCLC than in corresponding non-neuroendocrine malignancies
(Fig. [224]6c).
Fig. 6. PSCCE presented a T-cell excluded phenotype.
[225]Fig. 6
[226]Open in a new tab
a ssGSEA scores of T cells and CD8 T cells related signatures in each
cancer type. The impacts of “organ-of-origin” and “histology” on scores
were determined using two-way ANOVA tests. Degree of freedom:
organ-of-origin: 1; histology: 1. F values of each variate are shown in
brackets behind P values. Source data are provided as a Source Data
file. b T cell and CD8 T-cell abundance estimated by CIBERSORT are
plotted against marker genes (CD3E and CD8A) expression level. PSCCEs
are plotted in blue and other cancers in gray. The medians of each
cancer type are plotted separately and annotated on the plot. c Overall
Expression (OE) scores of T cells and CD8 cytotoxic T cells in each
cancer. The impacts of “organ-of-origin” and “histology” on OE score
were determined using the same method mentioned above. d Representative
CD8A IHC staining of PSCCE. Left: overview of the fields distribution
in whole section. Right top: fields of tumor parenchyma; right lower:
fields of invasive margin (dashed line) between the parenchyma (P) and
surrounding stroma (S). Scale bar: 100 μm. e OE scores of exclusion
program for CD8 cytotoxic T cells and T cells in each cancer. In all
panels, the upper bound, centerline, and lower bound of boxplot
represent the 75 percentile, the median and the 25 percentile of data;
the upper and lower whiskers extend to the largest and smallest value
within 1.5 times of IQR from corresponding bound. Data beyond the
whiskers are plotted as outlier dots. *P < 0.05, **P < 0.01,
***P < 0.001, ns not significant, by the Wilcoxon rank-sum test.
Most solid tumors could be categorized into three immune phenotypes:
inflamed, excluded, and desert^[227]40,[228]41. Consistent with the
computational analysis, IHC staining of CD8A revealed that 85% (23/27,
Supplementary Data [229]15) of PSCCEs were CD8 T-cell “excluded”, in
which CD8 T cells failed to infiltrate into tumor parenchyma and
aggregated in the surrounding stroma instead (Fig. [230]6d and
Supplementary Fig. [231]7b). A recent study showed that the exclusion
of T cell was induced by certain programs expressed by malignant
cells^[232]39. Intriguingly, DEGs of PSCCE were significantly
overrepresented in exclusion programs—of 146 repressed genes associated
with CD8 cytotoxic T-cell exclusion, 52 were downregulated DEGs in
PSCCE (P < 1.0
[MATH:
×10−5 :MATH]
, Monte Carlo stimulation of hypergeometric test). PSCCE and SCLC also
had significantly higher OE scores of exclusion programs for both CD8
cytotoxic T cells and T cells (Fig. [233]6e).
Discussion
In this study, we performed genomic, transcriptomic, and immune
profiling on a rare but highly aggressive neuroendocrine malignancy
arising from the alimentary tract–PSCCE. By integrating multi-omics
data, we revealed the following findings: (1) PSCCE harbors a high
frequency (98%) of RB1 disruption mediated by multiple mechanisms; (2)
the transcriptome of PSCCE highly resembles that of SCLC, but not that
of ESCC or EAC; (3) PSCCE has two distinct subtypes regulated by
lineage TFs ASCL1 and NEUROD1; and (4) insufficient T-cell infiltration
is widely observed in PSCCE.
RB1 was disrupted in up to 93% of all SCLC^[234]12,[235]42. The
previous WES profiling of PSCCE reported a RB1 mutation frequency of
27%^[236]11. Here, we demonstrated that with an integrated analysis of
WES, RNA-seq, and IHC, we could identify RB1 disruption in 98% of
PSCCEs with high fidelity. We also discovered that exon deletions—small
deletions affecting as few as one exon—were a major and unique
RB1-disrupting mechanism in PSCCE. It should be noted that none of WES,
RNA-seq, or IHC alone succeeded to detect all RB1 disruption events in
PSCCE. Given that only two-thirds of RB1 disruption events in the
present study were detectable by the canonical WES, we proposed a
potential underestimate of RB1 disruption in previous WES studies of
cancers. Nearly universal disruption of RB1, together with paucity of
CCND1 amplification or CDKN2A deletion, comprised genomic features
distinguishing PSCCE from ESCC or EAC.
In the present study, unsupervised clustering of gene expression
patterns revealed a molecular-based taxonomy of cancers different from
the present organ-of-origin classification. We discovered that PSCCE
had a transcriptome much more akin to SCLC. This shared transcriptomic
landscape of PSCCE and SCLC was consistent to the neuroendocrine
differentiation and loss of RB1 in both entities. On the other hand,
transcriptome of PSCCE was quite different from ESCC or EAC.
Recent research of SCLC categorized it into four molecular
subtypes^[237]15. Whether these subtypes also exist in other
neuroendocrine cancers is relevant to oncogenesis mechanisms and
therapy vulnerabilities. Two subtypes of PSCCE, namely PSCCE-A and
PSCCE-N, highly resembled the SCLC-A and SCLC-N subtypes of SCLC, in
the aspects of transcriptomic features, regulatory factors, associated
genomic alterations, and impact on patients’ prognoses^[238]13,[239]15.
It is quite striking that molecular subtyping persisted in cancers
arising from different organs and from patients with different ethnic
backgrounds. Given the highly similar genetic and transcriptomic
features of PSCCE and SCLC, it appeared intuitive that subtyping would
also be shared. However, the molecular fundaments of ASCL1- and
NEUROD1-regulated subtypes in both cancers remain largely unknown. It
was once supposed that SCLC-N lesions might have a different
cell-of-origin from SCLC-A lesions, due to the absence of Neurod1^+
cells in mouse lungs^[240]13. Recently, a murine model of SCLC
combining Trp53 and Rb1 knockout with Myc overexpression revealed that
SCLC-N tumors could arise from SCLC-A precursor lesions^[241]14,
suggesting lineage plasticity fueled by Myc. Consistent with this idea,
PSCCE-N tumors were also associated with MYC amplification.
Although we observed shared ASCL1- and NEUROD1-regulated subtypes, we
did not confirm any POU2F3- or YAP1-regulated subtypes in PSCCE. One
possible cause is our limited sample size. SCLC-P and SCLC-Y subtypes
are relatively rare in SCLC^[242]15. If the proportions of subtypes
were similar in PSCCE, a PSCCE-P or PSCCE-Y subtype could simply be
missed due to rarity. Further studies are required to elucidate whether
other molecular subtypes exist in PSCCE.
The similarities in terms of genomic alterations, transcriptomic
features, and molecular subtypes between PSCCE and SCLC lead to another
intriguing question: whether these two cancers have the same
cell-of-origin, or some shared genomic alterations (such as loss of
TP53 and RB1) restricted lineage to small- cell carcinoma? Pulmonary
neuroendocrine cells were reported to be the major cell-of-origins of
SCLC^[243]43. A recent study found that neuroendocrine stem cell
(NE^stem), which started uncontrolled proliferation upon loss of TP53
and RB1, could be the cell-of-origin of SCLC^[244]44. However, evidence
supporting such cells in the esophagus is lacking. Another emerging
hypothesis is that certain genomic alteration enhances lineage
plasticity and promotes cells to differentiate along a neuroendocrine
trajectory. In support of this idea, Trp53 and Rb1 dual knockout was
indispensable in all genetic-engineered murine models of
SCLC^[245]14,[246]43,[247]45. Lung adenocarcinomas (LUAD) with loss of
RB1 were more likely to transform into SCLC after
treatment^[248]46,[249]47, supporting the role of RB1 loss in lineage
switching. Our observations here emphasize role of RB1 disruption in
PSCCE. RB1 loss also shaped the shared transcriptome landscape of PSCCE
and SCLC. However, our study could not disclose the underlying
mechanisms. Further research, including cell line and mouse model
studies, is required.
Through the computational dissection of TME, we found that PSCCE had a
TME characterized by insufficient infiltration of both T cells and CD8
cytotoxic T cells. IHC of PSCCE tumors revealed that a large fraction
of PSCCE presented a CD8 T-cell “excluded” phenotype. CD8 T-cell
“excluded” was also the predominant immune phenotype of SCLC, and was
associated with failure of immune checkpoint blockage^[250]48. Multiple
lines of evidence indicated that the small-cell carcinoma histology was
associated with insufficient T-cell infiltration. Gene expression
programs associated with T-cell exclusion^[251]39 significantly
overlapped with DEGs of PSCCE and were highly expressed in PSCCE,
indicating that a modulated transcriptional network in small-cell
carcinoma might be responsible. Further research are required to
provide more comprehensive description of various immune populations in
PSCCE.
Our integrated study also provided therapeutic insights into PSCCE.
Systemic chemotherapy regimens extrapolated from SCLC proved beneficial
to PSCCE patients^[252]1,[253]49. By providing rationales through
molecular similarities, our observation supported the extrapolation of
systemic treatment. The identification of subtypes of PSCCE, which were
highly similar to SCLC-A and SCLC-N subtypes of SCLC, encouraged the
extrapolation of future targeted therapy designed and trialed for
certain SCLC subtypes into corresponding PSCCE subtype. The differences
between PSCCE-A and PSCCE-N tumors also warranted molecular
characterization when designing researches. Our profiling also
introduced insight for future preclinical research for PSCCE. Highly
and specifically expressed genes might be therapeutic targets,
including E2F family^[254]50 and DLL3^[255]51. Notch pathway
reactivation may suppress essential neuroendocrine programs of PSCCE,
leading to tumor regression^[256]52. The induction of T-cell inflamed
TME by antibody-guided chemokines^[257]53, or oncolytic viruses armed
with recruiting chemokines^[258]54 might improve response rate of
immune therapy.
Methods
Study design and inclusion criteria
This study was approved by the Ethics Committee of National Cancer
Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking
Union Medical College (2018103113071202). The study was conducted in
accordance with local laws and the guidelines of Declaration of
Helsinki. PSCCE patients were screened from patients who received
radical esophagectomy for esophageal cancer in the Department of
Thoracic Surgery, Cancer Hospital, Chinese Academy of Medical Sciences
from 2006 to 2017. All patients provided written informed consent. Both
FF and FFPE specimens of PSCCE tumors and matched esophageal tissue
were obtained from institutional biobank. Pathological diagnoses were
independently confirmed by two certificated pathologists (S. Shi and
L.X.). Tumors with positive IHC staining for at least one of CD56,
Synaptophysin, or Chromagranin A were included. Tumors were also
assessed to have ≥60% tumor cell purity and without extensive necrosis.
Matched non-cancerous esophageal samples were provided as normal
controls in sequencing studies
DNA and RNA extraction
DNA from FF specimens was extracted using Allprep DNA/RNA/miRNA
Universal Kit (QIAGEN) following the manufacturer’s protocol. DNA and
RNA from FFPE specimens were extracted using Allprep DNA/RNA FFPE Kit
(QIAGEN) by WuXi NextCODE (Shanghai). The amount of DNA was determined
by Qubit2.0 (Thermo Fisher) and integrity of DNA was determined by
agarose electrophoresis. The amount of extracted RNA from FFPE sample
was determined by NanoDrop (Thermo Fisher). Quality of RNA from FFPE
sample was assessed by Agilent 2100 Bioanalyzer (Agilent) to have a
DV[200] (percentage of RNA fragments >200 nucleotides fragment
distribution value) ≥30%.
Whole-exome sequencing (WES)
WES library was prepared using the SureSelect Human All Exon Kit
(Agilent, V5) following the manufacturer’s protocol. Libraries were
sequenced on Illumina NovaSeq platform, with the aim to obtain
sequencing depth of 200
[MATH: × :MATH]
and 100
[MATH: × :MATH]
for tumor and normal tissue, respectively.
Somatic mutation calling
Sequenced reads were trimmed by Trimmomatic^[259]55 (0.36) and mapped
to reference genome hg19 by Burrows–Wheeler Aligner^[260]56
(0.7.12-r1044). PCR duplicates were marked by Picard
([261]https://broadinstitute.github.io/picard/, 2.9.0) and excluded
from further analysis. Somatic mutations were detected using
MuTect2^[262]57 (3.5) with assistance from Genetron Health (Beijing).
After integration of results and cross-validation of mutation calls, we
further filtered mutation calls with following filters: (1) coverage at
mutation site in tumor ≥10; (2) number of variant reads in tumor ≥3;
(3) number of variant reads in matched normal sample ≤2; (4) maximal
frequency in population (Exome Aggregation Consortium and 1000 Genome)
<0.01; (5) mutations registered in snp142 database were removed unless
it was also recorded in the COSMIC database (v70); (6) variant AF
should be significantly higher in tumor than in matched normal tissue
(examined by the Fisher’s exact test with P < 0.05 and
Benjamini–Hochberg corrected q < 0.1); (7) distribution of forward and
backward reads in variant and reference reads covering mutation site
should show no significant difference (“strand bias,” examined by the
Fisher’s exact test with P > 0.0001).
SMGs and genes with mutation cluster
SMGs were called by MutsigCV^[263]26 (1.4). Genes with significantly
clustered mutations were identified using a nonrandom clustering method
implanted in R package iPAC. A cluster ≤100 amino acids was considered
within one functional domain and was included. Further, as mutations in
genes that were not expressed conferred neutral impact in
tumorigenesis, only gene that was expressed in tumors (defined by
having expression ≥10.0 transcript per million (TPM) in ≥10% of 38
tumors with RNA-seq data) were considered as candidate driver genes. In
analysis of combined cohort of 101 PSCCE, we only considered genes that
were mutated in ≥5 cases (4.95% of combined cohort) in order to further
identify genes of relevance.
Validation of somatic mutations
PCR primers were designed online using Primer3Plus
([264]https://primer3plus.com/) to amplify mutation loci in TP53, RB1,
and NOTCH1. As DNA fragmentation in FFPE samples greatly hampered
amplification of long amplicons, we limited amplicons for FFPE
DNA < 500 bp. A PCR failure was declared after three failed attempts.
Sanger sequencing of PCR product was performed on 3730xl DNA Analyzer
(Applied Biosystems). For loci that suffered PCR amplification failure,
expression of somatic mutations in RNA-seq reads was checked by
Integrative Genomics Viewer^[265]58 (IGV, 2.6.2).
Mutational signature analysis
Mutational signatures were decomposited using a nonnegative matrix
fraction method implemented in R package SomaticSignatures^[266]59. We
included mutations from the present study and those reported by Wang et
al.^[267]11 into a “combined cohort” of PSCCE. SCLC mutations were also
included to discover potential signatures unique to small-cell
carcinomas. Optimal number of output signatures (K, candidate range
from 2 to 20) was determined by 1000 iterations at each K value. For
each K, the cosine similarities between resultant signatures and COSMIC
signatures (version 2) were calculated. Optimal K was determined when
(i) explained variance did not increase remarkably by further
increasing K, and (ii) the average of cosine similarities to the most
resembling COSMIC signatures reached maximum.
SCNV calling
SCNVs were called using CNVkit^[268]60 (0.9.5). FF samples and FFPE
samples were analyzed separately as two groups. Normal samples in each
group were pooled up to serve as reference used in corresponding group.
Coverages of each bin (each bin equals a capture region by WES) were
compared against reference to calculate log2 ratios. Bins with log2
ratio < −15.0 were discarded. Segments were called from bins with same
copy number. Segments were filtered for copy-number variations (CNVs)
in healthy population. CNVs in healthy population were downloaded from
Database of Genomic Variants^[269]61 (DGV,
[270]http://dgv.tcag.ca/dgv/app/home, 2020-02-25 release). We built a
common CNVs reference set (DGV.refset), containing CNVs with frequency
>1% in studies with sample size ≥1000 included by DGV. Segments with a
log2 ratio > 0.2 were filtered for gains in DGV.refset; segments with a
log2 ratio < −0.2 were filtered for losses. Segments, which had >50%
reciprocal overlap with DGV.refset CNVs, were excluded. Stringent
thresholds were applied to call amplifications (log2 ratio > 0.807
which equaled >3.5 copies) and deletions (log2 ratio < −2.0 which
equaled <0.5 copies). Significant SCNV peaks were called from
DGV.refset filtered segments using GISTIC2.0^[271]24 module (version
6.15.28) on GenePattern public server
([272]https://cloud.genepattern.org/).
Validation of SCNVs
Copy number of MYC, TERT, and SOX4 in all samples was determined using
7900HT real-time PCR system (Applied Biosystems). IFNG was used as the
reference. When SCNVs were observed in IFNG locus, an alternative
reference including AQP5, ACACA, and ACLY was used. SCNV results were
determined with same thresholds (amplifications: >3.5 copies;
deletions: <0.5 copies) and were compared with SCNVs called from WES
data.
Validation of RB1 exon deletion breakpoints
We designed series of PCR primers according to RB1 exon deletions
detected by SCNV analysis pipeline to gradually approach and finally
locate exact breakpoints of deletions. As long PCR amplicons (>2000 bp)
were required to efficiently amplify regions flanking breakpoints,
validation was performed only on FF samples. Downstream breakpoint of
PSCCE_15T was predicted to residue in large intergenic region and was
not validated due to technical difficulty. Sanger sequencing of PCR
products was performed by GENEWIZ (Suzhou) on 3730xl DNA Analyzer
(Applied Biosystems).
RNA sequencing (RNA-seq)
Library was prepared using the TruSeq RNA Access Library Prep Kit
(Illumina), which was optimized to provide reproducible result of RNA
from FFPE samples. Paired-end 150-bp sequencing of the subsequent
libraries was sequenced on Illumina NovaSeq platform, with the aim to
obtain ≥6-Gb sequencing data.
RNA-seq data analysis
Sequencing reads were trimmed by Trimmomatic^[273]55 (0.36) and aligned
to reference genome hg19 by STAR^[274]62 (2.4.2a). RSEM^[275]63
(v1.3.1) was used to estimate abundance of annotated genes. Expression
value in TPM was supplemented in Source Data [276]2.
DEGs were identified using R package DESeq2^[277]64, by comparing 38
tumors against 23 matched normal esophageal samples sequenced together
in the present study. “apeglm” parameter of DESeq2 was activated to
accurately estimate true effect size. Genes with P < 0.01, q < 0.05,
and log2FoldChange ≥ 1.0 by DESeq2 were defined as upregulated DEGs;
genes with P < 0.01, q < 0.05, and log2FoldChange ≤ −1.0 were defined
as downregulated DEGs.
Gene ontology and Kyoto Encyclopedia of Genes and Genomes enrichment
was performed using R package clusterprofiler^[278]65. Enrichment with
P < 0.01 and q value < 0.05 was considered significant. Pathways that
were significantly enriched for upregulated DEGs but not for
downregulated DEGs were considered upregulated; similarly, pathways
which were specifically enriched for downregulated DEGs were considered
downregulated.
GSEA^[279]66 was performed using “GSEA” module at GenePattern server
([280]https://cloud.genepattern.org/).
Identification of RB1 exon skipping events
Sashimi plot of RNA-seq bam files was visualized and generated by
IGV^[281]58 (2.6.2). Sashimi plots of driver genes in PSCCE (TP53, RB1,
and NOTCH1) were manually checked for reads spanning abnormal exon
junctions. Abnormal exon junctions with ≥10 supporting reads were
included in further analyses. Sashimi plots of 23 matched normal sample
were used as control.
Comparison with TCGA tumors and SCLC
RNA-seq reads count data of TCGA esophageal cancers^[282]21 were
downloaded from the National Cancer Institute Genomic Data Commons data
portal ([283]https://portal.gdc.cancer.gov/projects/TCGA-ESCA) for DEG
analysis. Gene expression data of 81 SCLC samples reported by George et
al.^[284]12 were obtained from Supplementary Table [285]10 of ref.
^[286]12. Batch-effect-normalized TOIL recomputed TPM gene expression
data of TCGA LUAD^[287]31, LUSCs^[288]30, and esophageal
cancers^[289]21 were downloaded from UCSC Xena data hubs
([290]https://toil.xenahubs.net/download/tcga_RSEM_gene_tpm.gz).
Multiple cancers clustering
Thirty-eight EAC samples and 38 ESCC samples from TCGA were selected
randomly to match PSCCE in number. Similarly, 81 LUAD samples and 81
LUSC samples from TCGA were matched to 81 SCLC samples reported by
George et al.^[291]12 (list of samples used in analysis in
Supplementary Data [292]9).
Expression data of selected samples were combined and
quantile-normalized to minimize the batch effect. Genes with high
expression (average expression in top 50%) were included and then
log2-transformed. Median absolute deviations (MAD) of genes were
calculated. Top G genes with highest MAD were median-centered and then
supplied as input of consensus clustering using R package
ConsensusClusterPlus^[293]67. Candidate number of resultant groups N
was set ranging from 2 to 10; candidate number of input genes G was set
1000, 2000, 3000, 4000, and 5000. For each N and G, 1000 times of
fraction and clustering were iterated. The output tracking plots,
cluster matrices heatmaps, and CDF curve were manually checked. N was
finally determined as four as: (i) change in CDF sharply decreased to
nearly zero at 4; (ii) extremely small group (n ≤ 5) representing
outliers begin to emerge from N ≥ 4; (iii) the three major groups
(Group NE, Squamous, and Adeno) remained relatively stable when further
tuning of N and G. Due to its small size, Group 4 was not included for
further analysis and not plotted in Fig. [294]3c. Genes associated with
the clustering most were identified using R package ropls^[295]68 and
top 800 genes were plotted in Fig. [296]3c.
After optimal clustering was determined, comparisons of G genes
expression among N groups were performed using the Tukey’s HSD test.
Genes with log2Foldchange ≥ 2.0, P < 0.05, and Benjamini–Hochberg
corrected q < 0.1 against other groups were considered signature genes
and listed in Supplementary Data [297]10.
Identification of subtypes in PSCCE
Genes with high expression (average expression ≥5.0 TPM) were
log2-transformed. MAD of genes were calculated. Top G genes with
highest MAD were median-centered and then supplied as input of
consensus clustering by R package ConsensusClusterPlus^[298]67.
Candidate number of clusters N ranged from 2 to 10 and candidate
numbers of input genes G were 1000, 2000, 3000, 4000, and 5000. For
each N and G, 1000 times of fraction and clustering were iterated. The
output track plot, cluster matrix heatmap, and CDF curve were manually
checked. One stable group of nine samples (PSCCE-N) was repeatedly
observed. The other 29 samples had unstable clustering but seldom
clustered with PSCCE-N samples, thus were studied collectively as
PSCCE-A. Genes associated with the clustering most were identified
using R package ropls, and top 800 genes were plotted in Fig. [299]4a.
To better discover signature genes for each subtype, we compared all
genes with an average expression ≥1.0 TPM between subtypes using the
Tukey’s HSD test. Genes with log[2]Foldchange ≥ 2.0, P < 0.05, and
Benjamini–Hochberg corrected q < 0.1 against other groups were
considered signature genes and listed in Supplementary Data [300]12.
SCNVs that affected ≥3 of 38 RNA-sequenced tumors were tested for
significantly different distribution between two subtypes using the
Fisher’s exact test. Chromosomal location enrichment analysis of SCNVs
with P < 0.05 was performed on Enrichr
([301]http://amp.pharm.mssm.edu/Enrichr/). Chromosomal locations with
P < 0.01 and q < 0.05 were considered significant.
Computational dissection of TME in PSCCE
Infiltrating immune cell signatures were collected from literatures
(Supplementary Data [302]14). Gene expression profiles of PSCCE, SCLC,
LUAD, LUSC, EAC, and ESCC samples in Supplementary Data [303]9 were
combined and quantile-normalized. ssGSEA scoring of the signatures was
performed using R package GSVA. The deconvolution estimation of immune
cells’ abundance was performed using CIBERSORT^[304]38
([305]https://cibersort.stanford.edu/). For CIBERSORT, SCLC data were
not included due to missing value in several genes. scRNA-seq-derived
cell type signatures, immune cell exclusion signatures, and OE scoring
source code were obtained from report by Jerby-Arnon et al^[306]39. Raw
scores of immune signatures are provided in Source Data [307]4.
Median-centered and standard-deviation-scaled scores were used to
generate boxplots in Fig. [308]6 for better visualization. Monte Carlo
stimulation was carried out by repeating random draw of L gene
(repressed genes in CD8 cytotoxic T-cell exclusion signature) from
37536 gene (genes whose expression values were not all zero in 38
PSCCEs and 23 matched normal samples) for 100,000 times and recording
M, the numbers of certain 2249 genes (downregulated DEGs in PSCCE) in
each draw. Then, P(M
[MATH: ≥ :MATH]
m) equals (times when M
[MATH: ≥ :MATH]
m)/100,000. Stimulation was repeated three times with different random
seeds and the maximal P value was reported.
Immunohistochemistry (IHC)
To guarantee consistency of IHC procedure, staining of Rb, Ascl1, and
Neurod1 was performed on tissue microarray sections of PSCCE tumors and
matched normal esophageal samples. Briefly, cylindrical tissue cores of
tumor and matched normal samples were extracted from paraffin blocks
and re-embedded into a microarray block. Microarray block was
sectioned. Sections were deparaffinized in xylene, rehydrated in serial
ethanol solution and distilled water. Heat-mediated antigen retrieval
was performed in pH9.0 Tris-EDTA solution. Sections were blocked and
incubated with primary antibody and secondary antibody. Sections were
developed with DAB and counterstained with hematoxylin.
T-cell phenotype profiling by IHC
Whole slides of tumor section covering both the bulk of tumor and
surrounding normal tissue were stained for CD8A. Evaluation was carried
out by certificated pathologists (S. Shi and L.X.). First, ten random
high power fields (HPF, 400
[MATH: × :MATH]
) of tumor parenchyma >300 μm from outermost border of tumor were
checked for CD8A-positive cells. Efforts were taken to evenly
distribute the HPFs inside tumor parenchyma. If ≥10 CD8A-positive
lymphocytes were observed to directly contact with tumor cell in ≥3 HPF
(for tumors with large volume or extensive connective-tissue septa, 20
fields were checked and ≥6 HPF were then used as threshold) then immune
phenotype of examined tumor was considered “inflamed.” Otherwise, ten
HPFs around the invasive margin were checked. If ≥10 CD8A-positive
lymphocytes were observed in ≥3 HPF along the invasive margin, then
immune phenotype was considered “excluded”; if ≥10 CD8A-positive
lymphocytes were merely observed in <3 HPF, the examined tumor were
considered “desert.” Profiling results were listed in Supplementary
Data [309]15.
Antibodies and PCR primers
Antibodies and PCR primers used in this study were listed in
Supplementary Tables [310]1 and [311]2, respectively.
Statistics
The R program (3.6.1) was used for statistics. Differences between
groups were examined by the Fisher’s exact test or Wilcoxon rank-sum
test where appropriate. All tests were two-sided unless stated
otherwise. Survivals of different groups were compared using the
log-rank test. HR and its 95% confidential interval were estimated
using Cox proportional hazard model. Significant level was set to 0.05
unless stated otherwise.
Reporting summary
Further information on research design is available in the [312]Nature
Research Reporting Summary linked to this article.
Supplementary information
[313]Supplementary Information^ (11MB, pdf)
[314]Peer Review File^ (1.7MB, pdf)
[315]41467_2021_24043_MOESM3_ESM.pdf^ (38.8KB, pdf)
Description of Additional Supplementary Files
[316]Supplementary Data 1^ (14.1KB, xlsx)
[317]Supplementary Data 2^ (1.1MB, xlsx)
[318]Supplementary Data 3^ (27KB, xlsx)
[319]Supplementary Data 4^ (17.4KB, xlsx)
[320]Supplementary Data 5^ (14.1KB, xlsx)
[321]Supplementary Data 6^ (16.1KB, xlsx)
[322]Supplementary Data 7^ (10.1KB, xlsx)
[323]Supplementary Data 8^ (11.6KB, xlsx)
[324]Supplementary Data 9^ (20.4KB, xlsx)
[325]Supplementary Data 10^ (51.4KB, xlsx)
[326]Supplementary Data 11^ (9.8KB, xlsx)
[327]Supplementary Data 12^ (19.2KB, xlsx)
[328]Supplementary Data 13^ (10.2KB, xlsx)
[329]Supplementary Data 14^ (10KB, xlsx)
[330]Supplementary Data 15^ (10KB, xlsx)
[331]Reporting Summary^ (6.6MB, pdf)
Acknowledgements