Abstract

   Esophageal squamous cell carcinoma (ESCC) is malignant while the
   carcinogenesis is still unclear. Here, we perform a comprehensive
   multi-omics analysis of 786 trace-tumor-samples from 154 ESCC patients,
   covering 9 histopathological stages and 3 phases. Proteogenomics
   elucidates cancer-driving waves in ESCC progression, and reveals the
   molecular characterization of alcohol drinking habit associated
   signatures. We discover chromosome 3q gain functions in the transmit
   from nontumor to intraepithelial neoplasia phases, and find TP53
   mutation enhances DNA replication in intraepithelial neoplasia phase.
   The mutations of AKAP9 and MCAF1 upregulate glycolysis and Wnt
   signaling, respectively, in advanced-stage ESCC phase. Six major tracks
   related to different clinical features during ESCC progression are
   identified, which is validated by an independent cohort with another
   256 samples. Hyperphosphorylated phosphoglycerate kinase 1 (PGK1, S203)
   is considered as a drug target in ESCC progression. This study provides
   insight into the understanding of ESCC molecular mechanism and the
   development of therapeutic targets.

   Subject terms: Oesophagus, Proteomics, Oesophageal cancer, Cancer
   genomics
     __________________________________________________________________

   The progression of oesophageal squamous cell carcinoma (ESCC) from
   early to advanced stages requires comprehensive molecular
   characterisation. Here, the authors perform a proteogenomics analysis
   of ESCC patient samples across nine histopathological stages and three
   phases, identifying key alterations and paths for progression.

Introduction

   Esophageal cancer (EC) is a malignant gastrointestinal carcinoma,
   ranking the seventh most common cancer and the sixth leading cause of
   cancer-related death worldwide^[70]1. Esophageal adenocarcinoma (EAC)
   and esophageal squamous cell carcinoma (ESCC) are the two major
   histologic subtypes of EC, of which EAC is more prevalent in western
   countries^[71]2, whereas ESCC predominantly occurs in East Asia,
   particularly in China and Japan^[72]3, indicating the diverse lifestyle
   of the countries as a major etiological factor of EC. The other
   etiological factors of EC include gender, age^[73]4, and the habits of
   drinking/smoking^[74]5. However, the molecular signatures, which are
   associated with the risk factors in EC progression, are still unknown.

   Generally, the carcinogenesis of tumor development hall long-drawn-out
   process, and genome data act as a “fossil” record of how a tumor came
   to be. Several large-scale ESCC cohorts, including The Cancer Genome
   Atlas (TCGA)^[75]6, presented genomic aberrations and identified highly
   mutated genes (e.g., TP53) in the advanced stages of ESCC. However, the
   first occurrences of the mutations/key events and the related effects
   during the carcinogenesis of ESCC are poorly understood. Liu et al.
   showed that earlier and advanced stages of ESCC shared several
   significantly mutated genes^[76]7. Thus, the mutations were cumulative
   in ESCC progression, and the key mutation/events which were found in
   the advanced stages might exist in the earlier stages, whereas there is
   still a lack of proteogenomic landscape in ESCC progression. In
   addition, the molecular mechanism of the diversity and tumor
   heterogeneity of ESCC is poorly understood, imposing a challenge for
   developing ESCC therapeutic strategies.

   Histologically, the esophageal wall includes the epithelium, lamina
   propria, muscularis mucosa, submucosa, muscularis externa, and
   adventitia^[77]8. The depth of infiltration of the cancer cells,
   determining the stage of lesions, was measured at the deepest point of
   their penetration in corresponding layers^[78]9. Surgery is the
   predominant curative treatment strategy in advanced stages where the
   cells invade muscularis externa (T2–T4 stages), with poor quality of
   life (QOL) and a low 5-year survival rate (<30%)^[79]10. Though the
   recent advances in endoscopic submucosal dissection (ESD)^[80]11 have
   achieved the early detection of ESCC patients (T1 stage) with higher
   QOL and significantly improved overall survival rate (>90%)^[81]12, the
   complexity of the early ESCC progression and the extreme trace amount
   of tissue samples in different stages have limited in portraying the
   multi-omics molecular landscape of ESCC.

   In this study, we perform a comprehensive multi-omics analysis of 786
   trace-tumor samples from 154 ESCC patients. The integrative multi-omics
   dataset reveals the stage-specific and risk factor-associated molecular
   characterization, and defines the cancer-driving waves along with the
   mutation accumulation in EC progression. Proteogenomics uncovers key
   events in the transit of the phases and the trajectory analysis shows 6
   major tracks and their molecular characteristics during the
   carcinogenesis of ESCC.

Results

Overview of proteogenomic landscape in ESCC progression

   We performed multi-omics-based profiling of trace 786 samples collected
   from 154 ESCC patients who had not experienced prior chemotherapy or
   radiotherapy. The clinicopathological characteristics of patients and
   tumors are summarized in Supplementary Table [82]1. Subsequently, 22
   substages during ESCC progression from healthy esophageal tissue to
   tumor development were established for these samples following WHO and
   Japanese pathology diagnostic criteria^[83]13. The number of substages
   identified from the ESCC patients varied from 4 to 16, and the tissue
   samples from the corresponding substages were separately dissected from
   the formalin-fixed, paraffin-embedded (FFPE) slides (Supplementary
   Fig. [84]1a). All ESCC samples in our cohort were dissected 3 mm thick
   and stood up one by one in the embedding, and were then marked in the
   hematoxylin and eosin (H&E)-stained sections (Supplementary
   Fig. [85]1c, d). In total, 786 samples were collected and classified
   into 9 histopathological stages covering 22 substages in our cohort
   (Supplementary Table [86]2), including stage 1 (normal tissue stage),
   stage 2 (hyperplasia stage), stage 3 (Tis stage), stage 4 (lamina
   propria cancer stage), stage 5 (muscularis mucosa stage), stage 6
   (submucosal invasion cancer stage a), stage 7 (submucosal invasion
   cancer stage b), stage 8 (T2 stage), and stage 9 (T3 stage)
   (Fig. [87]1a and Supplementary Data [88]1a; Methods). The normal (stage
   1)/tumor cell purity of all samples was over 95%, indicating the high
   quality of all samples of our cohort (Supplementary Fig. [89]1e). In
   addition, we collected another 256 samples as an independent validation
   cohort from 49 early-stage ESCC patients, covering 6 of 9
   histopathological stages as stages 1, 2, 3, 4, 5, and 7 (Fig. [90]1a
   and Supplementary Table [91]3; Methods). That represented the
   advantages of our samples in a pathological region-resolved mode,
   providing the chance to portray molecular profiles of ESCC in a
   time-resolved mode. Subsequently, we performed mass spectrometry (MS)
   profiling of all 786 samples, phosphoproteomic profiling of 145 samples
   (Supplementary Table [92]4 and Supplementary Data [93]1b), and
   whole-exome sequencing (WES) of 102 samples in the main cohort
   (Fig. [94]1a, Supplementary Figs. [95]1b and [96]2a, and Supplementary
   Data [97]1c, d; Methods). To demonstrate the findings and results in
   the main cohort, we also performed proteomic profiling of 256 samples
   in the validation cohort.

Fig. 1. The multi-omics landscape in ESCC progression.

   [98]Fig. 1
   [99]Open in a new tab

   a Overview of the experimental design and the number of samples for the
   genomic, proteomic, and phosphoproteomic analyses. ESCC esophageal
   squamous cell carcinoma. b The genomic profile of ESCC progression.
   Top: the mutation number and types of all the samples from early to
   progressive ESCC. Bottom: the somatic copy number alterations of all
   the samples from early to progressive ESCC. The mutation frequencies
   are shown by a bar plot at the right panel. NT phase: non-tumor phase,
   IEN phase: intraepithelial neoplasia phase, A-ESCC phase:
   advanced-stage ESCC phase. c The gain of neo-mutations at all stages in
   ESCC progression. d Analysis of the mutations loads of diverse cohorts.
   EESCC cohort: early ESCC cohort. e The number of the identified
   proteins of 786 samples (Kruskal–Wallis test, p < 2.2E–16). f Boxplot
   showing the number of the phosphosites identifications of 145 samples
   (Kruskal–Wallis test). Boxplot shows median (central line), upper and
   lower quartiles (box limits), 1.5× interquartile range (whiskers).
   ****p < 1.0E–4, ***p < 1.0E–3, **p < 0.01, *p < 0.05, ns. > 0.05.
   Source data are provided as a Source data file.

   WES profiling identified 9547 mutations in the Fudan cohort (this
   study) (Supplementary Data [100]2a). The top mutations were TP53,
   MUC16, FAT3, SYNE1, AKAP9, FAT4, etc. (Fig. [101]1b). TP53, frequently
   mutated in EC^[102]14, was the top-ranked mutation in the Fudan cohort,
   and was co-occurrence with the mutation of KMT2D (Supplementary
   Fig. [103]2b). In our study, we observed that the number of mutations
   gradually cumulated during ESCC carcinogenesis, ranging from 422
   mutations in hyperplasia stage to 5280 mutations in the T3 stage.
   Neo-mutations were indicated as those just appearing at a certain stage
   (Methods). In our cohort, we found that the number of the neo-mutations
   peaked at the Tis stage (n = 1330), and the T2 (n = 3003), indicating
   the significant events during carcinogenesis (Fig. [104]1c). Thus, nine
   pathological stages of our cohort were grouped into three phases (NT,
   IEN, and A-ESCC) (Fig. [105]1c), which allowed us to explore the key
   mutational events and the corresponding impacts in ESCC progression.
   Observation of mutation loads of the Fudan cohort and other ESCC
   cohorts (TCGA cohort^[106]15, Lin’s cohort^[107]16, and Song’s
   cohort^[108]17) showed the lowest mutation loads in the NT phase in the
   Fudan cohort (Fig. [109]1d). Liu et al. reported fewer mutations were
   detected in the esophageal nondysplastic epithelium (simple
   hyperplasia)^[110]7, which was also observed in our cohort that the
   mutation loads of the NT phase were lower (Wilcoxon signed-rank test,
   p < 0.05, A-ESCC vs. NT ratio = 9.36, IEN vs. NT ratio = 2.08)
   (Supplementary Fig. [111]2c), indicating the low mutation burden of our
   cohort was due to the low mutation loads of the early-stages of ESCC.
   Collectively, we built a comprehensive genomic landscape in ESCC
   progression, and presented the difference between the early and
   advanced stages of ESCC.

   At the protein and phosphoprotein levels, proteomic analysis was
   performed using a label-free quantification strategy^[112]18. Protein
   abundance was first calculated by intensity-based absolute
   quantification (iBAQ)^[113]19 and then normalized as the fraction of
   total (FOT). Spearman’s correlation coefficient of HEK293T cells was
   0.91, indicating the consistent stability of our MS platform
   (Supplementary Fig. [114]2d and Supplementary Data [115]3a). The
   gradually decreased (Spearman’s) correlation coefficient of the nine
   histopathological stages reflected the increased tumor heterogeneity
   during the carcinogenesis of ESCC, highlighting the importance of
   exploring molecular characteristics in ESCC progression (Supplementary
   Fig. [116]2e). With the advancement of the stages of ESCC, the numbers
   of protein identifications were slightly increased (Kruskal–Wallis
   test, p < 2.2E–16) from ~5000 in stage 1 (normal tissue) to ~7000 in
   stages 8 and 9 (T2 and T3 stages) with a total of 15,071 in 786 samples
   at 1% false discovery rate (FDR) (Methods) (Fig. [117]1e and
   Supplementary Fig. [118]2f). In addition, the reference proteome was
   highly dynamic based on the protein abundance, which spanned over eight
   orders of magnitude (Supplementary Fig. [119]2g). In addition, the ESCC
   biomarkers identified in previous studies (several ESCC tissues or cell
   lines), including ACTA2, TAGLN^[120]20, HSPA9, PDIA4, PLEC, POSTN,
   PSAP, and THBS1^[121]21 were also covered in the Fudan cohort
   (Supplementary Data [122]3b). At the phosphoprotein level, a total of
   52,856 phosphosites corresponding to 7612 phosphoproteins were
   identified in 145 samples (Supplementary Fig. [123]2h, i and
   Supplementary Data [124]3c). During the process of ESCC carcinogenesis,
   the number of phosphosite identifications was slightly increased
   (Kruskal–Wallis test, p = 5.2E–3) (Fig. [125]1f). In the validation
   cohort, label-free quantification measurement of 256 samples resulted
   in a total of 12,383 protein groups with a 1% FDR at the protein and
   peptide levels (Supplementary Fig. [126]2j). Consistent with the
   findings in the main cohort, with the advancement of the stages of
   ESCC, slightly elevated protein identifications were detected in the
   validation cohort (Kruskal–Wallis test, p < 1.0E–4) (Supplementary
   Fig. [127]2k). Compared to the NT phase, more protein identifications
   were identified in the IEN phase both in the main cohort and validation
   cohort (Wilcoxon rank-signed test, p < 1.0E–4) (Supplementary
   Fig. [128]2l). Overall, we established a comprehensive landscape of
   ESCC progression at the multi-omics level.

The SBS16 signature, associated with alcohol drinking, promoted DNA
replication in the IEN phase

   Except diverse lifestyle of the countries^[129]2,[130]3, alcohol
   drinking is one of the key risk factors of ESCC^[131]22. Moody et al.
   pointed that SBS16 is the significant signature associated with alcohol
   drinking^[132]22, whereas the impacts of SBS16 signature on the
   molecular level in the ESCC progression have not been revealed.
   Furthermore, SBS16 signature shows a negative association with the
   overall survival of ESCC patients (referred from TCGA cohort, log-rank
   test, p = 0.049) (Supplementary Fig. [133]3a). These findings allowed
   us to explore the impacts of SBS16 in ESCC patients with drinking habit
   at the molecular level.

   Of note, gene ontology (GO) analysis revealed that DNA replication was
   enriched in the ESCC patients with SBS16 signature (Supplementary
   Fig. [134]3b). Among the mutations, significantly related to SBS16
   signature, we found that OLFM4 mutation upregulated its corresponding
   protein expression (Wilcoxon signed-rank test, OLFM4 Mut vs. WT ratio =
   4.6, FDR = 3.4E–5) (Fig. [135]2a, b). Generally, OLFM4 promotes S-phase
   transition in cancer cell proliferation^[136]23. In our cohort,
   OLFM4-positive-associated proteins were involved in cell proliferation,
   including DNA replication, cell division, etc. (Fig. [137]2c, d).
   Moreover, OLFM4 displayed a positive association with CDK1 and CDK2 at
   the protein level (Pearson’s R = 0.44 for CDK1 and 0.54 for CDK2,
   p = 0.013 for CDK1 and 2.5E–3 for CDK2) (Supplementary Fig. [138]3c),
   which is a key regulator in the transition from S-phase to G2/M phase
   in cell cycle^[139]24. The mutation of OLFM4 showed positive impacts on
   the protein levels of cell proliferation-related markers, such as CDK2,
   MCM3/5/6, etc. (Fig. [140]2d). Besides, over-phosphorylation of DNA
   replication-related proteins was also detected during the
   carcinogenesis of ESCC (Kruskal–Wallis test, FDR < 0.05, A-ESCC and IEN
   vs. NT ratio ≥ 2) (Fig. [141]2e). To further validate the findings in
   our cohort, we analyzed the impacts of OLFM4 in other ESCC cohorts,
   such as the Li’s cohort^[142]25 and the Liu’s cohort^[143]26, and found
   that OLFM4 displayed positive association with DNA replication-related
   proteins at the protein level in these cohorts (Fig. [144]2d and
   Supplementary Fig. [145]3d). Taken together, SBS16 signature was a key
   event in the transition from the NT phase to the IEN phase for ESCC
   patients with a drinking habit, in which OLFM4 mutation showed positive
   impacts on DNA replication (Fig. [146]2j).

Fig. 2. The risk factor-associated mutational signatures in ESCC progression.

   [147]Fig. 2
   [148]Open in a new tab

   a Heatmap showing the significantly positive associated mutations in
   SBS16 signature (two-sided Fisher’s exact test). ***p = 1.4E–4 (BCKDK),
   **p = 2.7E–3 (CEP104), ***p = 1.4E–4 (LRRC31), ***p = 4.5E–4 (OLFM4),
   ***p = 1.5E–3 (DSC3), ***p = 1.5E–3 (PGC). The square directs to a
   subset of patient samples used for WES. b Volcano analysis of the
   impacts of significantly positive associated mutations in SBS16
   signature on their counterpart proteins expression (two-sided Wilcoxon
   signed-rank test). c Represented pathway enrichment that was positively
   correlated with OLFM4. d Heatmap showing the protein levels of DNA
   replication in OLFM4 mutation group vs. WT group (two-sided Wilcoxon
   signed-rank test, BH-adjusted *p < 0.05). e The expression
   (log[2]-transformed Intensity, median) of represented DNA
   replication-related phosphoprotein in ESCC progression (Kruskal–Wallis
   test, BH-adjusted *p < 0.05). A total of 145 samples for
   phosphoproteomic profiling are used in this analysis. n (NT) = 57, n
   (IEN) = 62, n (A-ESCC) = 26 biologically independent samples examined.
   f The correlation between etiological factors (top) and the
   significantly associated mutations of APOBEC signature (bottom)
   (two-sided Fisher’s exact test). *p = 0.012 (Phase), **p = 4.2E–3
   (Habit), **p = 4.6E–3 (DCTN2), **p = 4.6E–3 (EPS8), **p = 4.6E–3
   (CENPE). g Venn diagram depicting the number of the overlapped proteins
   overrepresented in the APOBEC signature and non-smoking/drinking ESCC
   patients (left, two-sided Wilcoxon signed-rank test, BH-adjusted
   *p < 0.05), and the associated signaling pathways (right). h Venn plot
   (left) showing the overlapped proteins significantly correlated with
   DCTN2 at the gene and protein levels (two-sided Wilcoxon signed-rank
   test, BH-adjusted *p < 0.05). Volcano plot (right) depicting the
   correlation between DCTN2 and the overlapped proteins (n = 86,
   two-sided Pearson’s correlation test). i Heatmap (left) presenting the
   represented chromosomal/spindle components and cell proliferation
   markers overrepresented in the DCTN2 mutation group (two-sided Wilcoxon
   signed-rank test), and correlation (right) between DCTN2/RUVBL1 and
   represented chromosomal/spindle components and cell proliferation
   markers (two-sided Pearson’s correlation test). j A brief summary of
   the impacts of the SBS16 signature (top) and APOBEC signature (bottom)
   in ESCC progression. A total of 102 samples for WES are used in the
   analysis. ****p < 1.0E–4, ***p < 1.0E–3, **p < 0.01, *p < 0.05, ns. >
   0.05. Source data are provided as a Source data file.

The APOBEC signature was overrepresented in the non-smoking/drinking ESCC
patients

   To explore the specific etiological factors that might contribute to
   the mutagenesis of ESCC, we adopted a non-negative matrix factorization
   (NMF) algorithm^[149]27 to extract mutational signatures from the WES
   data of the Fudan cohort and other ESCC cohorts, such as Moody’s
   cohort^[150]22, TCGA cohort^[151]15, Lin’s cohort^[152]16, and Song’s
   cohort^[153]17. In our cohort, DNA repair signature was detected as
   early as in the NT phases and lasted till the A-ESCC phase (Fisher’s
   exact test, p = 2.8E–3, 41.7%, 31.8%, 70.6% for NT/IEN/A-ESCC phases,
   respectively) (Supplementary Fig. [154]3e, f), indicating DNA damage is
   one of the causes leading to esophageal carcinogenesis. In addition, we
   found APOBEC signature was prominent in ESCC progression and was
   specially detected as early as in the IEN phase (Fisher’s exact test,
   p = 0.012, 66.7%) (Fig. [155]2f and Supplementary Fig. [156]3g).
   Similarly, we confirmed that APOBEC signature was detected in the
   earlier stage of ESCC (Fisher’s exact test, p = 0.029, 50%) in the TCGA
   cohort^[157]28 (Supplementary Fig. [158]3h).

   Integration of the findings in the Moody’s cohort^[159]22, we found
   APOBEC signature was prominent in the non-smoking/drinking ESCC
   patients (Fisher’s exact test, p = 8.4E–3, 42.5%), in which higher
   enrichment score was detected (Wilcoxon signed-rank test, FDR = 5.1E–4,
   non-smoking/drinking vs. smoking vs. drinking ratio = 1.78)
   (Supplementary Fig. [160]3i, j). To explore the impacts of APOBEC
   signature in the non-smoking/drinking ESCC patients, we integrated the
   highly expressed proteins both detected in the APOBEC signature and
   ESCC patients with non-smoking/drinking habit, and discovered these
   proteins were correlated with cell proliferation process
   (Fig. [161]2g). Somatic mutations analysis revealed that the mutations
   of CENPE, EPS8, and DCTN2, were positive associated with APOBEC
   signature (Fisher’s exact test, p < 0.05) (Fig. [162]2f). We further
   investigated the impacts of these three mutations, and found DCTN2
   mutation upregulated its corresponding protein expression (Wilcoxon
   signed-rank test, FDR = 0.040, Mut vs. WT ratio = 2.0) at the protein
   level (Supplementary Fig. [163]3k). In addition, only the protein level
   of DCTN2 was significantly overrepresented in the APOBEC signature
   group (Wilcoxon signed-rank test, FDR = 5.1E–5) (Supplementary
   Fig. [164]3k). These findings indicated the potential effects of DCTN2
   mutation in the patients with APOBEC signature.

   DCTN2, a subunit of dynactin, binds to both microtubules and
   cytoplasmic dynein, and is involved in a diverse array of cellular
   functions, including spindle formation, chromosome movement, nuclear
   positioning, etc.^[165]29. Among the dynein families, we found that
   DYNLRB1 exhibited a positive correlation with DCTN2 (Pearson’s
   R = 0.30, p = 4.3E–3) at the protein level (Supplementary
   Fig. [166]3l). Dynein families share a conserved motor domain coupled
   cycles of ATP hydrolysis with conformational changes to produce
   movement^[167]30. To determine the activation and movement of dynein in
   mitosis, we integrated DCTN2 positively correlated proteins, and found
   RUVBL1 showed a significant correlation with DCTN2 (Pearson’s R = 0.54,
   p = 4.3E–8) at the protein level (Fig. [168]2h). Generally, RUVBL1
   exhibits DNA- and nucleosome- activated ATPase activity and catalyzes
   ATP-dependent nucleosome sliding in mitosis^[169]31,[170]32. In our
   cohort, we found DCTN2 mutation had positive impacts on the protein
   level of chromosomal/spindle components, evidenced by the markers
   including SMC2, SMC3, etc. (Fig. [171]2i). As well as RUVBL1, DCTN2
   showed a significantly positive correlation with chromosomal/spindle
   components and markers of tumor cell proliferation (e.g., TOP2A, MKI67,
   etc.)^[172]33 (Fig. [173]2i). These findings indicated the activation
   of microtubule and spindle in G2/M in the mitosis. Taken together, the
   links of the findings in our cohort and the Moody’s cohort revealed
   that APOBEC signature was prominent in the ESCC patients with
   non-smoking/drinking habit, in which DCTN2 mutation upregulated the
   protein level of DCTN2, elevating mitosis and cell proliferation
   (Fig. [174]2j).

The gain of chromosome 3q was a key event in the transmit from the NT phase
to the IEN phase

   To explore the impacts of key arm events, we performed whole
   exome-based somatic copy number alterations (SCNAs) analyses based on
   WES data and examined the regulatory effects of 23,109 SCNAs on protein
   expressions of 102 samples in ESCC progression (Supplementary
   Data [175]3d). The integrated genomic data of all phases illustrated
   that the gain of chromosomes 3q (chr3q), which was also observed in the
   TCGA ESCC cohort, was the top-ranked arm event, which was prominent in
   the IEN phase (Fisher’s exact test, p = 2.9E–3, 45.5%) (Fig. [176]3a
   and Supplementary Fig. [177]4a–c). In addition, the individual regions
   at the chr3q gain also showed significant association in ESCC
   progression, including chr3q26.1 gain, chr3q29 gain, chr3q22.1 gain,
   and chr3q12.2 gain (Supplementary Fig. [178]4d). To explore the
   biological functions of chr3q gain, we performed cis-effects analysis
   of the genes with CNA regions at the protein level, and found 20 genes
   at chr3q gain perturbation profiles had significantly positive cis
   effects on their associated proteins (Fig. [179]3b). Interestingly,
   nearly 35% genes at chr3q gain were positively associated with Ca^2+
   signal.

Fig. 3. Integrative omics analyses of early ESCC samples.

   [180]Fig. 3
   [181]Open in a new tab

   a The significant arm events of 102 samples in ESCC progression. The
   gain and loss events are highlighted in red and blue, respectively. b
   Volcano plot showing the cis- correlation of the SCNA (x-axis) and the
   associated –log[10] (p value) (y-axis) on the genes at chr3q gain
   (two-sided Spearman’s correlation test). c The cis SCNA-protein
   regulations of significantly correlated genes (top) on their
   corresponding proteins expression (bottom) (two-sided Spearman’s
   correlation test, p < 0.05). d Dependency map-supported
   ([182]https://depmap.org) panels showing relative survival averaged
   across all available ESCC cell lines after depletion of the indicated
   genes by RNAi or CRISPR. The right shows Pearson’s correlation and p
   value of these genes with PCNA and MKI67 at the protein level
   (two-sided Pearson’s correlation test). e A brief summary of the
   impacts of chr3q gain. f Volcano plot showing the impacts of the top
   ten mutations of ESCC progression on proteins expression
   (log[2]-transformed Intensity) (x-axis) and the associated –log[10]
   (FDR) (y-axis) (two-sided Student’s t-test). g Scatterplot showing the
   relationship between log[10] GSK3A and log[10] MACF1 expression at the
   protein level in the Fudan cohort (two-sided Pearson’s correlation
   test, mean ± SD). h Represented pathway enrichment of proteins that was
   positively correlated with MACF1 in the Fudan cohort (top) and the TCGA
   ESCC cohort (bottom). Biological pathways are analyzed from the GO/KEGG
   database. i Heatmap showing the impacts of the mutation of MACF1 on the
   expression of Wnt signaling-related proteins in ESCC progression
   (Kruskal–Wallis test, BH-adjusted *p < 0.05). The square directs to a
   subset of patient samples used for WES (n = 102). j A brief summary of
   the impacts of the mutation of MACF1. ****p < 1.0E–4, ***p < 1.0E–3,
   **p < 0.01, *p < 0.05, ns. > 0.05. Source data are provided as a Source
   data file.

   The Ca^2+ signal is identified as a key regulator in processes of
   excitable cells (e.g., neurons) and non-excitable cells (including
   those of the epithelia)^[183]34, where it controls a diverse array of
   processes such as secretion, proliferation^[184]35, and promotes cancer
   cell survival^[185]36. In our cohort, all the amplifications of genes
   in the Ca^2+ signal were detected in the IEN phase (Fig. [186]3c),
   suggesting the gain of chr3q event was detected as early as in the ESCC
   early stage. In addition, we found the amplifications of those genes
   also had significantly positive effects on other parallel proteins in
   the Ca^2+ signal at the protein level, such as the amplification of
   GMPS was positively associated with the protein levels of GFM1, OPA1,
   etc. (Fig. [187]3c). To assess the effects of these amplified genes, we
   annotated outliers for the degree of which CRISPR- or short hairpin RNA
   (RNAi)-mediated depletion reduced ESCC cell lines^[188]37. As a result,
   we found the deletion of three genes (GMPS, GFM1, and OPA1) had
   negative effects on the proliferation of ESCC cell lines (Fig. [189]3d
   and Supplementary Data [190]4a, b). In addition, the three genes (GMPS,
   GFM1, and OPA1) exhibited a positive correlation with PCNA and MKI67,
   makers of tumor cell proliferation^[191]33 (Fig. [192]3d).

   Furthermore, we found that the other important genes at chr3q (e.g.,
   TP63/PIK3CA/SOX2) were also prominent in the IEN phase (Fisher’s exact
   test, p < 0.05, 59.1%, 70.9%, 59.1% for TP63/PIK3CA/SOX2 respectively)
   (Supplementary Fig. [193]4e). Specifically, the amplifications of TP63
   at chr3q28 showed positive impacts on DNA replication (Supplementary
   Fig. [194]4f, g), further indicating the functions of chr3q gain in the
   transmit process from the NT phase to the IEN phase, and implying the
   impacts of TP63 amplification on DNA replication in ESCC progression.
   Taken together, the chr3q gain was a driven event in the transmit
   process from the NT phase to the IEN phase, leading to the activation
   of Ca^2+ signal and cell proliferation in the IEN phase (Fig. [195]3e).

The impacts of genomic aberrations in ESCC progression

   Notably, TP53, highly mutated in ESCC^[196]7, was the top-ranked
   mutation in ESCC progression, and was prominent in the younger male
   ESCC patients (Fisher’s exact test, p = 0.044, 65.8%) in our cohort
   (Supplementary Fig. [197]4h). To explore the impacts of TP53 mutation
   in ESCC progression, we incorporated the alterations of significantly
   upregulated proteins (SUPs, Wilcoxon rank-signed test, FDR < 0.05)
   (Supplementary Fig. [198]4i). As well as in the TCGA cohort^[199]15,
   the GO enrichment disclosed that those overlapped SUPs were related to
   DNA replication and cell cycle (e.g., MCM3/4/5/6, CCNK, PPP1R8, etc.),
   and ECM signaling (e.g., ITGA2/3/5, LAMA3/5, etc.), which was also
   overrepresented in the tumor tissues compared to the paired
   non-cancerous adjacent tissues (NATs) in the Liu’s cohort^[200]26
   (Supplementary Fig. [201]4j, k). Corresponding to the stages in ESCC
   progression, TP53 mutation displayed positive impacts on DNA
   replication/cell cycle in the IEN phase, and ECM signaling in the
   A-ESCC phase (Supplementary Fig. [202]4l).

   Among the top ten mutations in ESCC progression, the mutation-protein
   correlation analysis revealed that MACF1 mutation upregulated its
   counterpart protein level (t-test, FDR = 5.5E–5) (Fig. [203]3f and
   Supplementary Fig. [204]4m). In addition, gradually increased
   expression of MACF1 was detected (Kruskal–Wallis test, FDR = 6.3E–9,
   A-ESCC vs. NT ratio = 1.5E + 3) at the protein level in ESCC
   progression (Supplementary Fig. [205]4n).

   Generally, MACF1 is a multidomain protein that associates with
   microfilaments and microtubules, and binds to a complex (e.g., CTNND1,
   GSK3B, etc.) in Wnt signaling^[206]38. In our study, significant
   correlations were found between MACF1 and GSK3A (Pearson’s R = 0.46,
   p = 3.0E–4), and MACF1 and GSK3B (Pearson’s R = 0.28, p = 0.027) at the
   protein level (Fig. [207]3g and Supplementary Fig. [208]4o). In
   addition, consistent results were found in TCGA cohort^[209]15 at the
   RNA level (Supplementary Fig. [210]4p), inferring the mutation of MACF1
   would activate Wnt signaling. To investigate the impacts, we integrated
   the MACF1 positively associated proteins in our cohort and TCGA cohort,
   and found that those proteins participated in Wnt signaling
   (Fig. [211]3h). Of note, the incorporation of the proteins in Wnt
   signaling affected by MACF1 mutation, were highly expressed in the
   A-ESCC phase (Kruskal–Wallis test, FDR < 0.05, A-ESCC vs. NT ratio ≥
   2), such as GSK3A, GSK3B, CSNK1A, CSNK2B, DVL3, etc. (Fig. [212]3i).
   Furthermore, the CRISPR-mediated depletion of the genes (e.g., CSNK2B,
   CSNK1A, etc.) reduced the proliferation of ESCC cell lines
   (Fig. [213]3i). At the phosphoprotein level, the overrepresented
   phosphorylation of CTNNB1 (T551, S191) and CTNND1 (S346, Y865), were
   observed in the A-ESCC phase (Kruskal–Wallis test, FDR < 0.05, A-ESCC
   vs. NT ratio ≥ 2) (Supplementary Fig. [214]4q). Therefore, the mutation
   of MACF1 upregulated the protein level of MACF1, which bound to the Wnt
   complex and activated Wnt signaling in the A-ESCC phase (Fig. [215]3j).
   Collectively, we uncovered that TP53 mutation observed in the whole
   ESCC stages exhibited diverse impacts during ESCC carcinogenesis, and
   MACF1 mutation had roles in promoting ESCC progression in the A-ESCC
   phase.

Proteomic characterization of three phases in ESCC progression

   During the carcinogenesis of ESCC, three phases covered the NT phase,
   the IEN phase, and the A-ESCC phase. Specifically, the NT phase
   contained stage 1 and stage 2, the IEN phase covered stage 3 last till
   to stage 7, and the A-ESCC phase included the advanced stages T2 and T3
   (Supplementary Fig. [216]5a). To explore the characteristics of the
   three phases of ESCC, we analyzed the TMB and differentially expressed
   proteins (DEPs) of the three phases. As a result, the comparative
   analysis revealed that the NT phase, with the lowest TMB, was featured
   by metabolic process (e.g., KLK12/13, ALOX12, etc.) and inflammatory
   response (e.g., IL36A, SERPINB1, etc.) (Supplementary Figs. [217]2c and
   [218]5c). DNA repair signature was detected in the IEN phase,
   indicating DNA damage was initiated. To react to the external damage,
   the proteins involved in the inflammatory response (e.g., IL36A,
   SERPINB1, etc.) were highly expressed in the NT phase. As shown in
   Figs. [219]2 and [220]3, the IEN phase of ESCC had higher TMB compared
   to the NT phase, and was featured with the chr3q gain. The IEN
   overrepresented proteins participated in the oncogenic-related
   pathways, such as cell cycle (e.g., CDK1/2, RB1, etc.), DNA repair
   (e.g., LIG1, PARP1, etc.), EGFR signaling pathway (e.g., EGFR, SOS1,
   etc.), and so on. Comparatively, the A-ESCC phase of ESCC, with the
   highest TMB, was characterized by the dominant pathways of Wnt
   signaling (e.g., WNT2B, GSK3A, etc.) and glycolysis (e.g., PGK1, ENO3,
   etc.).

   To further demonstrate the findings, we analyzed the DEPs (Wilcoxon
   rank-signed test, FDR < 0.05, IEN vs. NT ratio ≥ 2 or ≤ 0.5) of the
   phases in the main cohort (n = 786) and the validation cohort
   (n = 256), and found consistent findings of the characterizations of
   the phases both in the main cohort and validation cohort. Specifically,
   the primary functions of esophagus were dominant in the NT phase, such
   as metabolic processes including keratinization (e.g., KLK12/13, CSTA,
   etc.), lipid and amino acid metabolism (e.g., CYP4F22, ALOX12, etc.)
   (Supplementary Fig. [221]5b–d). In addition, the overrepresented
   proteins in the IEN phase participated in the oncogenic-related
   pathways, such as cell cycle (e.g., CDK1/2, RB1, etc.), DNA repair
   (e.g., LIG1, PARP1, etc.), EGFR signaling pathway (e.g., EGFR, SOS1,
   etc.), etc. (Supplementary Fig. [222]5b–d). These consistent findings
   in the main cohort and validation cohort indicated the benign of the NT
   phase, and the relative malignancy of the IEN phase, highlighting the
   importance of investigating the key molecular events during the
   carcinogenesis of esophagus.

A carcinogenesis path with eight dynamic waves in ESCC progression

   The proteogenomics on three phases indicated a temporal correlation
   between genomic aberrations and proteomic alterations in ESCC
   progression. To further portray molecular profiles of ESCC in a
   time-resolved mode, we split the whole ESCC progression into 22
   substages. Visualization of the abundance of the E2 proteins (n = 6885;
   Methods) by principal component analysis (PCA) (Methods) differentiated
   the proteome profiles for the 22 substages, which clearly discriminated
   the proteomes of the early and the advanced ESCC stages. In addition,
   PCA of early ESCCs displayed obvious diversity among the substages
   along with ESCC progression (Fig. [223]4a). These results further
   showed the distinct profiles of the substages in ESCC progression.

Fig. 4. The temporal driver pathway waves in ESCC progression.

   [224]Fig. 4
   [225]Open in a new tab

   a Principal component analysis (PCA) of the Fudan cohort. Left: PCA of
   all 786 ESCC samples (including NT phase, IEN phase, and A-ESCC phase);
   Right: PCA of 746 early ESCC samples (NT phase and IEN phase). b
   Heatmap analysis of the dynamic switches during the carcinogenesis of
   ESCC (Kruskal–Wallis test). Left: heatmap analysis of DEPs of the 22
   substages in ESCC progression. Right: the driver pathway waves of 8
   panels in ESCC progression. c The mutations are significantly
   associated with stages in ESCC progression (two-sided Fisher’s exact
   test). The highlighted mutations (right) are exclusively co-mutations
   (two-sided Fisher’s exact test). The square directs to a subset of
   patient samples used for WES (n = 102). p values of the co-mutations
   with AKAP9: ****p = 3.7E–5 (PCDHB16), ****p = 3.7E–5 (BOC),
   **p = 1.7E–3 (STAG2). d The number of the proteins regulated by the
   co-mutations and e the associated biological pathways. f Heatmap
   showing the impacts of the co-mutations of PCDHB16, BOC, SYNE2, BCL9L,
   and STAG2 (top, two-sided Fisher’s exact test), on the protein level
   (bottom) in ESCC progression (Kruskal–Wallis test). *p = 0.038
   (PCDHB16), *p = 0.038 (BOC), *p = 0.048 (SYNE2), *p = 0.048 (BCL9L),
   *p = 0.048 (STAG2). The square directs to a subset of patient samples
   used for WES (n = 102). g The kinase-substrate interactions in ESCC
   progression (Kruskal–Wallis test). A total of 145 samples for
   phosphoproteomic profiling are used in this analysis. *p = 0.015
   (PRKCD), ***p = 2.4E–4 (SRC), **p = 9.6E–3 (CDK7), ****p = 8.0E–6
   (PAK1), ****p = 5.3E–9 (AKT1), ****p = 2.8E–10 (CDK1). ****p < 1.0E–4,
   ***p < 1.0E–3, **p < 0.01, *p < 0.05, ns. > 0.05. Source data are
   provided as a Source data file.

   To investigate specific characteristics of the substages in ESCC
   progression, we performed the substage-based supervised clustering
   analysis (Methods) with substage highly expressed proteins
   (Kruskal–Wallis test, FDR < 0.05, certain substage vs. other substages
   ratio ≥ 2) (Fig. [226]4b). Based on the enrichment analysis of the
   substages highly expressed proteins showing that the molecular
   characterization of the substages in ESCC progression was associated
   with the lesion invasion layers, and could be grouped into eight models
   (1–2_1/2_2–3_1/3_2/3_3/3_4/3_5–4_1/4_2/4_3/–5_1/5_2/5_3–6_1/6_2/6_3–7_1
   /7_2/7_3–8/9) (Supplementary Fig. [227]5g). Thus, we speculated that
   the carcinogenesis path in ESCC progression could be linked to those
   eight models.

   To further demonstrate the speculation, we performed co-expression
   analysis of the substage highly expressed proteins and identified eight
   protein patterns, which were consistent with the results in those eight
   models correlated to the lesion invasion layers (Supplementary
   Fig. [228]5h, k). In addition, the consistent findings of the
   enrichment pathways were also detected in eight models based on
   substages and eight protein patterns based on the results of
   co-expression analysis. Therefore, eight panels were identified on the
   basis of the histopathological stages in a time-resolved mode
   (Fig. [229]4b and Supplementary Fig. [230]5h). Specifically, in the
   (sub)stage 1 (normal tissue), the primary metabolic machinery was
   dominant, such as lipid metabolism (e.g., HMGCS1, ALOX15B, etc.),
   pyruvate metabolism (e.g., CYP4F12, ACOX3, etc.), and amino acid
   metabolism (e.g., ANXA1, CTSB, etc.). In stage 2 (hyperplasia stage),
   the expression of proteins, including PPP2R5A, SERPINB3, and CSTA,
   involved in the immune response to external damage, were
   overrepresented. Enhancement of ERBB signaling and NOTCH signaling, and
   the related proteins (e.g., EGFR, GRB7, etc.) were detected in stage 3
   (Tis stage). In stage 4 (lamina propria stage), insulin signaling and
   IGF signaling-related proteins were overrepresented (e.g., SOD1/2,
   PPP3CC, etc.). Cell cycle/TLR signaling (e.g., MCM2/3, TLR8, etc.) and
   DNA repair/mTOR signaling (e.g., MSH3, MTOR, etc.) were the dominant
   pathways in stage 5 (muscularis mucosa stage), and the stage 6 (sm a
   stage), respectively. In stage 7 (sm b stage), PI3K-AKT signaling and
   EMT signaling (e.g., AKT2/3, COL2A1, etc.) were dominant, of which EMT
   signaling is involved in tumor-initiation and motility^[231]39.
   Glycolysis and Wnt signaling (e.g., PGK1, WNT2B/7B, etc.) related
   proteins were overrepresented in stages 8 (T2 stage) and 9 (T3 stage).
   PGK1, an important glycolytic enzyme, though not mutated at the gene
   level, was highly increased at the protein level, especially in the T2
   and T3 stages of ESCC, which provided further evidence for the crucial
   role of glycolysis, and suggested the potential function of PGK1 in
   ESCC progression.

   In the validation cohort, 256 samples covered 6 of 9 histopathological
   stages, in which stages 6 (submucosal invasion cancer stage a), 8 (T2),
   and 9 (T3), were not included. Substage-based supervised clustering
   analysis identified six proteomic patterns (V1–V6), covering six of
   eight patterns in the main cohort (Supplementary Fig. [232]5i). In
   addition, the stages in the proteomic panels were consistent with those
   in the main cohort.

   To further demonstrate the result at the molecular level, we analyzed
   the DEPs of the stages (Kruskal–Wallis test, FDR < 0.05), and found
   that the molecular characterizations of the six patterns in the
   validation cohort were consistent with those in the main cohort
   (Supplementary Fig. [233]5j). Specifically, the V1 (stage 1, normal
   tissues) proteins were associated with lipid and protein metabolism
   (e.g., ACOX1, ALOX12, etc.) (Supplementary Fig. [234]5j, k). In the V2
   (stage 2, hyperplasia stage), inflammatory response (e.g., COL6A5,
   SERPINB3, etc.) was dominant. NOTCH signaling (e.g., ADAM10, CREBBP,
   etc.) and insulin signaling (e.g., IRS2, PPP1CA, etc.) were the
   dominant pathways in the V3 (stage 3, Tis stage) and V4 (stage 4,
   lamina propria stage), respectively. At the muscularis mucosa stage
   (stage 5, V5), cell cycle and TLR signaling (e.g., MCM2, TLR2, etc.)
   were dominant. The V6 (stage 7, submucosal invasion cancer stage b)
   proteins participated in EMT signaling and PI3K-AKT-mTOR signaling
   (e.g., MMP1/2, PIK3CB, etc.) were consistent with the molecular
   features of the submucosal invasion cancer stage in the main cohort.
   Therefore, the molecular characterizations of the proteomic patterns in
   the validation cohort were consistent with those in the main cohort
   (Supplementary Fig. [235]5k).

   At the gene level, we observed the stage-associated mutations including
   PCDHB16 (Fisher’s exact test, p = 0.038), BOC (Fisher’s exact test,
   p = 0.038), SYNE2 (Fisher’s exact test, p = 0.048), BCL9L (Fisher’s
   exact test, p = 0.048), STAG2 (Fisher’s exact test, p = 0.048), which
   were detected in stage 6 and last till to the stage 9 (Fig. [236]4c).
   In addition, the mutations of PCDHB16, BOC, and STAG2 were co-occurrent
   with AKAP9 mutation, which exhibited positive impacts on glycolysis.
   The recurrent co-occurrence of genomic events helps to dissect the
   genomic complexity underlying tumor progression^[237]40, thus enabling
   us to explore the functional impacts of the co-occurrence mutations in
   ESCC progression. Furthermore, the multi-dimensional omics data
   provided an excellent chance to explore the relationships between the
   genome and the proteome in the time-resolved ESCC progression. We thus
   performed enrichment pathway analysis using the DEPs (n = 608), which
   were the overlapped between the differentially upregulated proteins
   (n = 1748, Mut vs. WT ratio ≥ 2) and highly expressed phosphoproteins
   (n = 2818, submucosa stage to T3 stage) (Fig. [238]4d). The results
   showed that those proteins participated in DNA repair, cell
   proliferation, and glycolysis (Fig. [239]4e). Specifically, DNA repair
   (e.g., LIG1 S66, PARP1 T361, etc.) and glycolysis (e.g., GPI Y564,
   PGAM1 S118, etc.) related proteins were overrepresented in the
   submucosa stages (stages 6 and 7) and in the advanced stages (stages 8
   and 9) of ESCC, respectively, at the protein and phosphoprotein levels
   (Kruskal–Wallis test, FDR < 0.05, certain stage vs. other stages ratio
   ≥ 2) (Fig. [240]4f, g).

   To explore the kinase-substrate interactions in ESCC progression, we
   integrated the overrepresented kinases in the co-occurrence mutations
   group, and performed the kinase-substrate analysis based on
   phosphoprotein data. As a result, we found the kinases-substrate
   regulation was notably consistent with the dynamic waves in ESCC
   progression. Specifically, the stage 6 overrepresented kinases (e.g.,
   PRKCD, SRC, CDK7) (Kruskal–Wallis test, FDR < 0.05, stage 6 vs. other
   stages ratio ≥ 2), showed positive regulation with the DNA
   repair-related phosphoproteins, such as BCLAF1 S161, EP300 T885, POLR2A
   T1898, etc. The T2 and T3 stages overrepresented kinases (e.g., PAK1,
   AKT1, CDK1) (Kruskal–Wallis test, FDR < 0.05, T2 and T3 stages vs.
   other stages ratio ≥ 2), displayed positive regulation on glycolysis
   related phosphoproteins, evidenced by the overrepresented
   phosphorylation of PGAM1 S14 and S118, PFKFB3 S22, LDHA T95, etc.
   (Fig. [241]4g). These results suggested the potential therapy of kinase
   inhibitor (e.g., XMD11-85h, Dasatinib, CGP-60474/Seliciclib, etc.) in
   ESCC progression.

   Taken together, a carcinogenesis path with eight dynamic waves in ESCC
   progression was revealed on the basis of the consistency among the
   genomic aberrations, proteomic alterations, and phosphoproteomic
   actions: metabolism (e.g., ANXA1 and CTSB) – DNA damage (e.g., PPP2R5A
   and SERPINB3) – cell proliferation (e.g., EGFR and EGF) – lesion
   invasion (e.g., GRB2 and PRKCB) – cell cycle (e.g., MCM2/3 and FGFR1OP)
   – cell differentiation (e.g., MSH3 and MTOR) – tumor metastasis (e.g.,
   MMP2 and AKT2/3) – esophageal carcinogenesis (e.g., PGK1 and CTNNB1)
   (Supplementary Fig. [242]5k). Furthermore, these results also defined
   the substages-specific molecular characteristics and uncovered the
   potential candidates for ESCC malignancy.

Proteome clusters of ESCC progression

   Consensus clustering (Methods) identified two major proteomic clusters
   which showed association with the classification of the (sub)stages in
   ESCC progression: Cluster 1 (C1, n = 314) included the stage 1, and 2,
   and Cluster 2 (C2, n = 472) contained the rest of the stages, including
   the T2 and T3 stages (Fig. [243]5a, Supplementary Fig. [244]6a, and
   Supplementary Data [245]5a). Comparative analysis of the later-stage
   samples (especially the IEN phase samples) of C1 and C2 with the clinic
   features disclosed that the later samples in the C1 were prominent in
   the younger ESCC patients (<50 years old, Fisher’s exact test,
   p = 0.029, 15.0%) (Supplementary Fig. [246]6b), indicating the
   potential impacts of ages in ESCC progression. In addition, the two
   clusters were associated with 3 phases in ESCC progression, in which
   the C2 contained the most samples in the IEN phase (95.3%) and the
   A-ESCC phase (100%), reflecting more malignancy of the C2
   (Supplementary Fig. [247]6c). This distinct two-stage separation
   suggested that the irreversible fundamental proteome alterations were
   detected as early as in the Tis stage (stage 3). Van der Schaaf et al.
   pointed out that the Tis stage was associated with worse
   survival^[248]41, further indicating the malignancy of Tis stage in
   ESCC progression.

Fig. 5. Proteomic clusters and the impacts of AKAP9 mutation in ESCC
progression.

   [249]Fig. 5
   [250]Open in a new tab

   a Consensus clustering analysis of 786 samples (two-sided Fisher’s
   exact test). Left: the percentages of the two clusters in 22 substages;
   Right: 786 samples were classified into two clusters based on proteomic
   patterns. *p = 0.029 (Age), ****p < 2.2E–16 (Phases), ****p < 2.2E–16
   (Substages). b Volcano analysis of DEPs (left) in the two clusters and
   their associated biological pathways (right) in the two clusters
   (two-sided Student’s t-test). Biological pathways were analyzed from
   the Reactome database. C1: the Cluster 1. C2: Cluster 2. c Venn diagram
   depicting the number of the genes both detected in the genome and
   proteome in C2. The right shows the significant C2 mutations with
   mutation frequency over 10%. d Heatmap showing the impacts of AKAP9
   mutation on the protein level of AKAP9 (two-sided Student’s t-test,
   BH-adjusted **p = 8.4E–3). e Scatterplot showing the relationship
   between log[10] PRKACA and log[10] AKAP9 expression at the protein
   level (two-sided Pearson’s correlation test, mean ± SD). f GSEA plot
   (KEGG gene sets) for glycolysis in AKAP9 mutation and WT comparison. g
   Heatmap depicting the impacts of AKAP9 mutation on glycolysis in ESCC
   progression (two-sided Student’s t-test, BH-adjusted *p < 0.05). The
   square directs to a subset of patient samples used for WES (n = 102). h
   Scatterplots showing the relationship between log[10] G6PD (left)/HK1
   (right) and log[10] GPI expression at the protein level (two-sided
   Pearson’s correlation test, mean ± SD). i A brief summary of the
   impacts of AKAP9 mutation. ****p < 1.0E–4, ***p < 1.0E–3, **p < 0.01,
   *p < 0.05, ns. > 0.05. Source data are provided as a Source data file.

   The significance analysis of microarray (SAM)^[251]42 analysis
   (Methods) was performed to investigate the characteristics of the two
   clusters at the protein level, which identified 2922 DEPs between C1
   and C2 (t-test, FDR < 0.05, C2 vs. C1 ratio ≥ 2 or ≤ 0.5), including
   168 and 2754 proteins overrepresented in the C1 and C2, respectively
   (Supplementary Fig. [252]6d and Supplementary Data [253]5b). The
   results showed that the oncogenic pathways-related proteins, including
   ATM, CDK1, EGFR, CASP3, VCAN, etc., were significantly overrepresented
   in the C2. On the contrary, the overrepresented proteins of the C1,
   including KLK12, ACOX3, ALOX15B, and ANXA1, involved in keratinization
   and inflammatory response, were associated with the primary biological
   function of normal esophagus (Fig. [254]5b and Supplementary
   Fig. [255]6g). To validate the molecular characterization of two
   proteomic clusters, we performed PCA of the 256 samples in the
   validation cohort, which were also classified into two clusters.
   Consistent with the findings in the main cohort, the C1 included stages
   1 and 2, and the C2 contained the rest of the stages in the validation
   cohort (Supplementary Fig. [256]6e). In addition, the proteomic
   clusters in the validation cohort also showed a positive association
   with the phases in ESCC progression, and specifically, the C1 (Fisher’s
   exact test, p < 1.0E–4, 89.5%) and C2 (Fisher’s exact test, p < 1.0E–4,
   64.4%) consisted most of the samples in the NT phases and the IEN
   phase, respectively. Furthermore, based on the SAM analysis, the C1
   proteins in the validation cohort participated in the primary
   biological function of normal esophagus, such as keratinization (e.g.,
   KLK12/13, CSTA, etc.), fatty acid metabolism (e.g., ACOX3, ALOX12,
   etc.), and amino acid metabolism (e.g., MGLL, CYP4F12, etc.)
   (Supplementary Fig. [257]6f, g). The oncogenic pathways-related
   proteins were overrepresented in the C2 both in the main cohort and
   validation cohort, including cell cycle (e.g., CDK1, SKP1, etc.), DNA
   repair (e.g., DDB1, XPC, etc.), glycolysis (e.g., ENO2, HK3, etc.), and
   so on. The specific biomarkers of normal esophagus annotated from Human
   Proteome Atlas (HPA, [258]https://www.proteinatlas.org), including
   GBP6, TGM1, TGM3, and S100A14, were overrepresented in the C1, which
   gradually decreased from the normal stage (stage 1) to advanced stages
   (stages 8 and 9) of ESCC (Kruskal–Wallis test, FDR < 2.2E–16, stage 9
   vs. stage 1 ratio ≥ 2) (Supplementary Fig. [259]6h, i). Taken together,
   these results further confirmed the dysregulation of metabolism and
   oncogenic pathways in ESCC progression.

The AKAP9 mutation enhanced glycolysis in the A-ESCC phase

   Incorporation of genomic aberrations (n = 40, mutation frequency > 10%)
   and the C2 overrepresented proteome (n = 3385, C2 vs. C1 ratio ≥ 2)
   revealed that seven mutations (e.g., AKAP9, MACF1, EP400, etc.) were
   exclusively detected in the C2 with mutation frequency over 10%
   (Fig. [260]5c). Furthermore, the mutation of AKAP9 was associated with
   poor prognosis outcomes of ESCC patients (log-rank test, p = 0.029)
   indicating the potential functions of AKAP9 mutation in ESCC
   progression (Supplementary Fig. [261]6j).

   AKAP9, one of the A-kinase anchoring proteins (AKAPs), binds to the
   regulatory subunit of AMP-dependent protein kinase (PKA) and achieves
   the activation of PKA^[262]43, which regulates multiple signaling
   cascade^[263]44, such as glucose metabolism including
   glycolysis^[264]45. In our cohort, we found the mutation of AKAP9
   upregulated the protein level of AKAP9 (t-test, AKAP9 Mut vs. WT ratio
   = 6.58, FDR = 8.4E–3) (Fig. [265]5d and Supplementary Fig. [266]6k).
   Moreover, AKAP9 showed a significantly positive correlation with
   PRKACA, one of the PKA catalytic subunits^[267]46, at the protein level
   (Pearson’s R = 0.37, p = 0.011) (Fig. [268]5e). Gene set enrichment
   analysis (GSEA) demonstrated that the AKAP9 mutation
   positive-correlated proteins were converged on glycolysis (normalized
   enrichment score (NES) = 2.37, FDR = 0) (e.g., HK1, GPI, PGK1, etc.),
   which were gradually increased in ESCC progression (Kruskal–Wallis
   test, FDR < 0.05, A-ESCC vs. NT ratio ≥ 2) (Fig. [269]5f, g).
   Consistently, the expression of AKAP9 was significantly overrepresented
   in the C2 in the validation cohort (t-test, FDR = 3.1E–3, C2 vs. C1
   ratio = 3.29) (Supplementary Fig. [270]6l). We integrated AKAP9
   positively associated proteins (n = 1344), and found those proteins
   were involved in glucose metabolism, such as glycolysis, TCA cycle,
   etc. (Supplementary Fig. [271]6m). These consistent findings further
   indicated the impacts of AKAP9 mutation on glycolysis in ESCC
   progression.

   Allosteric regulation refers to the process where the effect of binding
   of a ligand at one site of a protein is transmitted to another, often
   distant, functional site, and thus regulates biological processes
   including glucose metabolism^[272]47. As well as HK1 and GPI, G6PD, as
   one of the activators in the transformation process from
   glucose-6-phosphate to fructose-6-phosphate^[273]48, showed a
   significant association with GPI (Pearson’s R = 0.33, p = 1.7E–3),
   further demonstrating the activation of glycolysis (Fig. [274]5h and
   Supplementary Fig. [275]6n). At the phosphoprotein level, we also
   detected the increased phosphorylation of glycolysis (e.g., ENO1 S272,
   PGAM1 S189, etc.) in ESCC progression (Kruskal–Wallis test, FDR < 0.05,
   A-ESCC vs. NT ratio ≥ 2) (Supplementary Fig. [276]6o). Taken together,
   the mutation of AKAP9, activated PKA and enhanced the energy formation,
   elevating the process of glucose metabolism, especially glycolysis, in
   the A-ESCC phase (Fig. [277]5i). Thus, our study provided valuable
   insights into the proteomic characteristics with AKAP9 mutation and
   revealed the impacts on glycolysis in ESCC progression at the
   multi-omics level.

Personalized trajectory revealed six major carcinogenesis tracks of early
ESCC

   The diversity and tumoral heterogeneity of ESCC remain challenging to
   decide precise clinical strategies for different ESCC patients who have
   diverse featured carcinogenesis tracks. To this end, we used trajectory
   inference methods^[278]49 (Methods) to trace the carcinogenesis
   lineages of early ESCC in the cohort. As a result, six major tracks
   were classified (13.2%, 7.0%, 7.9%, 43.9%, 19.3%, and 8.7% of patients,
   respectively) (Fig. [279]6a, b and Supplementary Data [280]6a). The
   track proteins were determined by the expression trend along with ESCC
   progression (Kruskal–Wallis test, FDR < 0.05, stage 9 vs. stage 1 ratio
   ≥ 2), and the dominant pathways of six major tracks were annotated as
   follows: (1) track 1 (T1, n = 15), biomaterial synthesis; (2) track 2
   (T2, n = 8, non-drinking/smoking track), ECM signaling; (3) track 3
   (T3, n = 9, female track), cell cycle; (4) track 4 (T4, n = 50,
   mainstream track (mainstream population of ESCC patients)), DNA repair;
   (5) track 5 (T5, n = 22, older track), glucose metabolism; and (6)
   track 6 (T6, n = 10, drinking/smoking track), immune response
   (Fig. [281]6b and Supplementary Data [282]6b, c). The proteins, of
   which the expression was gradually decreased in six tracks
   (Kruskal–Wallis test, FDR < 0.05, stage 9 vs. stage 1 ratio ≤ 0.5),
   were involved in the primary biology of normal esophagus, such as
   epithelial cell differentiation (e.g., EVPL and AHNAK2) and
   keratinization (e.g., FLG, KLK12/13 and SPRP3) (Supplementary
   Fig. [283]7a).

Fig. 6. Personalized trajectory reveals six major carcinogenesis tracks of
the early ESCCs.

   [284]Fig. 6
   [285]Open in a new tab

   a The trajectory of 746 samples (top) and 114 early ESCC cases were
   grouped into 9 (bottom). b Sankey diagram analysis of 114 early ESCC
   cases (top, main cohort) and 49 early ESCC cases (bottom, validation
   cohort). c Venn diagram showing the track mutations (top) and the CAGs
   (bottom) (two-sided Fisher’s exact test). CAGs: cancer-associated
   genes. The overlapped mutations are shown in the box. d The
   CAG-associated track mutations in the early ESCCs. The co-mutations are
   highlighted on the left (two-sided Fisher’s exact test), and the
   mutation frequency is shown on the right. The square directed to a
   subset of patient samples used for WES (n = 68) in early ESCCs.
   Co-mutations: *p = 0.032 (TP53 and EPAS1), *p = 0.032 (TP53 and EPHA3),
   ****p = 9.3E–6 (EPAS1 and EPHA3), *p = 2.2E–7 (STAG2 and USP6),
   ****p = 9.3E–6 (USP6 and AKAP9), ****p = 3.2E–5 (STAG2 and AKAP9). e
   GSEA plot (KEGG gene sets) for ECM signaling in EPAS1 mutation and WT
   comparison. f Venn diagram depicting the number of the overlapped
   proteins enhanced by the mutation of EPAS1 and T2 enhanced
   phosphoprotein (top), and the associated biological pathways (bottom).
   SUPs: the significantly upregulated proteins. g Heatmap showing the
   represented protein in the cell–cell adhesion positive associated with
   EPAS1 mutation (two-sided Fisher’s exact test). The square directs to a
   subset of patient samples used for WES (n = 68) in early ESCCs.
   Co-mutations: *p = 0.032 (TP53 and EPAS1), *p = 0.032 (TP53 and EPHA3),
   ****p = 9.3E–6 (EPAS1 and EPHA3). h Heatmap showing the phosphorylation
   of the phosphoproteins in cell–cell adhesion (Kruskal–Wallis test). The
   square directs to a subset of patient samples used for phosphoproteome
   (n = 119) in early ESCCs. ****p < 1.0E–4, ***p < 1.0E–3, **p < 0.01,
   *p < 0.05, ns. > 0.05. Source data are provided as a Source data file.

   Furthermore, different tracks were closely associated with various
   clinical features of early-stage ESCC patients, which improved our
   understanding of tumor heterogeneity. For example, track 3, featured
   with cell cycle, had the highest proportion of female patients
   (Fisher’s exact test, p = 7.7E–14, 29.2%) (Supplementary Fig. [286]7c),
   significantly greater than the natural proportion of female patients in
   ESCC (no more than 10%)^[287]50. Track 6, featured as immune response,
   had the highest proportion of patients with drinking/smoking habits
   (Fisher’s exact test, p = 2.3E–16, 18.1%), which was associated with
   chronic inflammatory and regulated oxidative stress in various cancer
   types^[288]51, revealing the unique track in the patients with
   drinking/smoking habits.

   To validate the features of the six major tracks, the trajectory
   inference methods were also applied to 49 early-stage ESCC patients in
   the validation cohort. As a result, we found all the patients in the
   validation cohort were also classified into tracks 1–6, of which the
   molecular characterization was similar to those in track 1 to track 6
   from 114 early-stage ESCC patients in the main cohort (Fig. [289]6b,
   Supplementary Fig. [290]7b, d, and Supplementary Data [291]6d–f).
   Specifically, the ESCC patients with drinking/smoking habits were
   prominent in track 6 of the validation cohort (Fisher’s exact test,
   p = 5.1E–9, 18.6%) (Supplementary Fig. [292]7c). To explore the
   features of the ESCC patients with drinking/smoking habits, we
   integrated the gradually increased proteins, which participated in the
   immune response pathways, including interleukins signaling, antigen
   processing and presentation, etc. (Supplementary Fig. [293]7e). The
   consistent findings in the characterizations of track 6 were observed
   in the main and validation cohorts, further validating the findings of
   the personalized tracks.

   At the gene level, we identified 150 differential track mutations, 5 of
   which were also covered in CAGs with mutation frequency over 10%,
   including TP53, EPAS1, EPHA3, STAG2, and USP6 (Fig. [294]6c and
   Supplementary Data [295]6g). Furthermore, we found the mutations of
   STAG2 (Fisher’s exact test, p = 1.1E–3, track 2 vs. track 4 vs. track 5
   vs track 6 = 0% vs. 0% vs. 30.4% vs. 0%) and USP6 (Fisher’s exact test,
   p = 3.1E–3, track 2 vs. track 4 vs. track 5 vs track 6 = 0% vs. 0% vs.
   26.1% vs. 0%) were prominent in track 5, which were all co-occurrence
   with AKAP9 mutation (Fisher’s exact test, p = 3.2E–5 for STAG2 and
   9.3E–6 for USP6), suggesting the roles of glycolysis in the ESCC
   patients in track 5 (Fig. [296]6d). In addition, GSEA showed the
   positive impacts of the mutations of STAG2 (NES = 2.10, FDR = 7.2E–4),
   and USP6 (NES = 2.04, FDR = 0) on pentose phosphate pathway
   (Supplementary Fig. [297]7f), which were the evidence of glucose
   metabolism characteristics in track 5.

   Notably, the mutations of TP53 (Fisher’s exact test, p = 7.2E–3, track
   2 vs. track 4 vs. track 5 vs track 6 = 100% vs. 73% vs. 57% vs. 27%),
   EPAS1 (Fisher’s exact test, p = 3.2E–5, track 2 vs. track 4 vs. track 5
   vs track 6 = 71% vs. 0% vs. 4% vs. 0%), and EPHA3 (Fisher’s exact test,
   p = 2.2E–7, track 2 vs. track 4 vs. track 5 vs track 6 = 86% vs. 0% vs.
   0% vs. 0%) were prominent in track 2. As shown in Supplementary
   Fig. [298]4l, TP53 mutation displayed positive impacts on ECM
   signaling, demonstrating the characteristics of track 2. GSEA displayed
   the positive impacts of the mutations of EPAS1 (NES = 1.99, FDR = 0),
   EPHA3 (NES = 1.83, FDR = 1.6E–3) on ECM signaling (Fig. [299]6e and
   Supplementary Fig. [300]7g). Furthermore, EPAS1 mutation was
   co-occurrence with the mutations of TP53 (Fisher’s exact test,
   p = 0.032) and EPHA3 (Fisher’s exact test, p = 9.3E–6). To elucidate
   the impacts of EPAS1 genomic aberrations on proteomic alterations and
   phosphoproteomic actions, we integrated the molecules (n = 126)
   overlapped in the EPAS1 mutation SUPs and track 2 overrepresented
   phosphoproteins. As a result, we found those overlapped molecules were
   involved in cell–cell adhesion (e.g., ITGA5, EGFR, etc.) (Fig. [301]6f,
   g). Compared with other tracks, the phosphorylation of the cell–cell
   adhesion proteins (ITGA5 S127 and S128, EGFR S991, etc.) was
   overrepresented in track 2 (Fig. [302]6h). Together, our study revealed
   six major carcinogenesis tracks, and found the track specific mutations
   had positive impacts on the track carcinogenesis lineages of ESCC.

PGK1, aberrant glycolytic enzyme, is a potential therapeutic target

   Abnormal glycolytic metabolism was observed in the whole process of
   ESCC carcinogenesis at the multi-omics level, which dramatically
   increased throughout carcinogenesis (Fig. [303]7a). PGK1, the first
   ATP-generating enzyme in glycolysis, gradually increased in ESCC
   progression in all 6 tracks at the protein level and identified as the
   nominated drug-targetable protein in ESCC progression (Supplementary
   Fig. [304]8a). In addition, we performed Cox regression analysis to
   assess the prognostic value of PGK1 expression, which was negatively
   correlated with the OS of ESCC in the TCGA dataset (log-rank test,
   p = 7.8E–3) (Fig. [305]7b and Supplementary Data [306]7a). To further
   cross-validate these results, our dataset confirmed that the expression
   of PGK1 was gradually increased in ESCC at the protein and
   phosphoprotein levels (Kruskal–Wallis test, FDR = 1.3E–18 for proteome
   and 4.2E–3 for phosphoproteome, stage 9 vs. stage 1 ratio = 2.30 for
   proteome and 2.49 for phosphoproteome) (Fig. [307]7c), as demonstrated
   by immunohistochemistry of ESCC FFPE slides in which PGK1 was gradually
   increased in the process from T0 stage (normal tissue) to the Tis
   stage, SM2 stage, and advanced stage (Fig. [308]7d). In addition, the
   elevated protein of PGK1 was identified in more than 75% (476/672)
   samples for proteome and 60% (75/125) for phosphoproteome from the
   stage 2 to stage 9 (n = 672 for proteome and 125 for phosphoproteome)
   (Fig. [309]7a, c). The only identified motif (sP) of PGK1 was
   ubiquitously (125/145) detected, and the expression of PGK1 S203 was
   also gradually increased with ESCC progression (Kruskal–Wallis test,
   FDR = 0.011, stage 9 vs. stage 1 ratio = 2.44) (Fig. [310]7e and
   Supplementary Fig. [311]8b).

Fig. 7. Aberrant glycolytic metabolism in ESCC and alterations in the
activities of its key enzyme, PGK1.

   [312]Fig. 7
   [313]Open in a new tab

   a Aberrant glycolysis in ESCC progression at the multi-omics level
   (Kruskal–Wallis test, BH-adjusted *p < 0.05). A total of 786 samples
   for proteomic profiling and 145 samples for phosphoproteomic profiling
   are used. b Highly expressed PGK1 is negatively correlated to prognosis
   (two-sided log-rank test). c Boxplots showing the increased expression
   (log[10]-transformed Intensity) of PGK1 in ESCC at the protein (left)
   and phosphoprotein (right) levels (Kruskal–Wallis test). Boxplots show
   median (central line), upper and lower quartiles (box limits), 1.5×
   interquartile range (whiskers). A total of 786 samples and 145 samples
   were used for proteome and phosphoproteome, respectively. Proteome: n
   (stage 1) = 114, n (stage 2) = 206, n (stage 3) = 259, n (stage 4) =
   86, n (stage 5) = 32, n (stage 6) = 17, n (stage 7) = 32, n (stage 8) =
   16, n (stage 9) = 24 biologically independent samples examined.
   Phosphoproteome: n (stage 1) = 20, n (stage 2) = 37, n (stage 3) = 31,
   n (stage 4) = 14, n (stage 5) = 5, n (stage 6) = 3, n (stage 7) = 9, n
   (stage 8) = 10, n (stage 9) = 16 biologically independent samples
   examined. d Immunohistochemistry analysis of PGK1 expression in normal
   (T0), Tis (T1), SM2 (T1), and advanced stage (T2/T3) tissues. The zone
   with the dotted lines and red arrow represents PGK1 positive staining.
   The scale bar indicates 50 µm. e Analysis of the serine motif of PGK1
   (sP). The top shows the sequence and phosphorylated sites of PGK1
   (S203). The bottom presents that PGK1 S203 is detected at almost
   samples in ESCC (125/145) and the kinases associated with the motif of
   PGK1 S203 (“sP”). f The expression of the kinases and the substrates in
   ESCC progression at the phosphoprotein level. The square directs to a
   subset of patient samples used for phosphoproteome. g Volcano plot
   displaying the ERK2-substrates (top) and CDK2-substrates (bottom)
   regulation results (two-sided Wilcoxon rank-sum test). The red marks
   the overrepresented substrates (left) and phosphorylations (right) in
   the kinases highly expressed group. h Histogram showing the Z-score and
   FDR of the KSEA results. A total of 145 samples for phosphoproteomic
   profiling are used in the analysis. i The SCNAs of CDK2 have positive
   effects on PGK1 expression (two-sided Wilcoxon signed-rank test). j The
   impacts of the SCNAs of CDK2 (middle) on the substrates expression of
   the kinases (bottom), associated with the PGK1 motif (sP). The square
   directs to a subset of patient samples used for WES (n = 102).
   ****p < 1.0E–4, ***p < 1.0E–3, **p < 0.01, *p < 0.05, ns. > 0.05.
   Source data are provided as a Source data file.

   To investigate the proteome-phosphoproteome regulation between the
   motif of PGK1 and the kinases, we applied motif extraction
   algorithm^[314]52 to MS phosphorylation dataset from 145 samples. The
   results revealed the association of ERK1/2, CDKs, and GSK-3 with the
   motif of PGK1 (sP)^[315]53, whose downstream substrates and the related
   corresponding phosphorylations, including MAP1S (S759), FOXK1 (S420),
   MAP2K1 (T386), etc., were increased in ESCC progression at the protein
   and phosphoprotein levels (Kruskal–Wallis test, FDR < 0.05, stage 9 vs.
   stage 1 ratio ≥ 2) (Fig. [316]7f). In addition, we found that PGK1 and
   the phosphorylation (S203) were overrepresented in the kinases highly
   expressed group (two-sided Wilcoxon rank-sum test, FDR < 0.05, highly
   vs. lowly expressed group ratio ≥ 2), especially in ERK2 and CDK2
   highly expressed group (Fig. [317]7g), indicating the potential
   functions of ERK2 and CDK2 in the activation of PGK1 (S203) in ESCC
   progression.

   Our previous study has found ERK1/2 could phosphorylate PGK1 S203 and
   result in the mitochondrial translocation of PGK1^[318]54. In this
   study, we performed the kinase-substrate enrichment analysis (KSEA) of
   the phosphoproteome of PGK1 (S203). The results of the score and FDR
   showed that ERK2 was the top-rank one kinase to activate PGK1 in ESCC
   progression (Fig. [319]7h). Furthermore, we found that the SCNAs of
   CDK2 had positive impacts on the expression of PGK1 (Wilcoxon rank-sum
   test, FDR = 2.8E–3) (Fig. [320]7i), and the substrates (e.g., GTF2F1,
   PAK4, etc.) of kinases (Fig. [321]7j). These findings indicated that
   the SCNA of CDK2 and ERK1/2 synergistically induced the total activity
   of PGK1 through increasing PGK1 expression both at the protein and
   phosphoprotein levels. Thus, we proposed that PGK1 S203 could be
   implicated in identifying the potential therapeutic target to manage
   ESCC.

   Next, we investigated the roles of PGK1 in regulating glucose and
   serine metabolism. Overexpression of PGK1 in KYSE150 cells increased
   the levels of glycolytic-citrate cycle flux metabolites, including
   3-PG, pyruvate, and lactate in glycolysis, and citrate, succinate, and
   fumarate in citrate cycle (Supplementary Fig. [322]8c and Supplementary
   Data [323]7b). In addition, the levels of serine and glycine were also
   increased in PGK1-overexpressing cells. Conversely, the knockdown of
   PGK1 decreased the concentration of metabolites in glycolysis and
   citrate cycle, as well as serine and glycine (Supplementary
   Fig. [324]8d and Supplementary Data [325]7c). Furthermore, the
   overexpression of ERK2 led to increased Ser-phosphorylation level, but
   not Thr- or Tyr- phosphorylation levels of PGK1 in KYSE150, KYSE70,
   ECA109, and TE-8 cell lines (Fig. [326]8a). On the contrary, ERK2 could
   not increase the Ser-phosphorylation level of PGK1 S203 mutant (S203A)
   (Fig. [327]8a), indicating that ERK2 phosphorylated PGK1 S203 in ESCC
   cells. Moreover, the increased Ser-phosphorylation level of PGK1 led to
   mitochondrial translocation of PGK1 (Fig. [328]8b), which increased the
   phosphorylation level of PDHK1 at T338 (Fig. [329]8c), and decreased
   pyruvate dehydrogenase (PDH) activity in ERK2 overexpressing cells
   (Fig. [330]8d and Supplementary Data [331]7d). It was also observed
   that the metabolites of glycolysis and serine metabolism were further
   increased, while citrate cycle metabolites were decreased in ERK2
   overexpression cells (t-test, p < 0.05) (Supplementary Fig. [332]8c),
   suggesting that the overexpression of ERK2 could shut down the pyruvate
   dehydrogenase complex. Moreover, through Seahorse assay, we found
   overexpression of PGK1 decreased oxygen consumption rate (OCR) and ATP
   production, and increased extracellular acidification rate (ECAR)
   (Fig. [333]8e, f and Supplementary Data [334]7e). In contrast,
   knockdown of PGK1 increased OCR and ATP production, and decreased ECAR
   (Fig. [335]8e, f and Supplementary Data [336]7f). In addition,
   increased expression of ERK2 further decreased OCR and ATP production,
   and increased ECAR, in PGK1-overexpression cells (Fig. [337]8e, f).
   Collectively, these results indicated that the increased expression and
   phosphorylation levels of PGK1 synergistically enhanced glycolysis and
   serine metabolism. Accordingly, we confirmed that co-overexpression of
   PGK1 and ERK2 in the above four kinds of ESCC cell lines (KYSE150,
   KYSE70, ECA109, and TE-8) promoted their proliferation most profoundly,
   compared to cells overexpressing either PGK1 or ERK2 (t-test,
   p < 1.0E–4) (Fig. [338]8g and Supplementary Fig. [339]8e, and
   Supplementary Data [340]7g). In contrast, the knockdown of PGK1 slowed
   down the cell proliferation, which was further inhibited by
   double-knock down of PGK1 and ERK2 in those four kinds of ESCC cell
   lines (t-test, p < 1.0E–3) (Supplementary Fig. [341]8f, g and
   Supplementary Data [342]7h). Furthermore, unlike PGK1, the
   overexpression of other glycolytic enzymes, including GAPDH and PGM1,
   which catalyzed the last and the next step reaction of PGK1,
   respectively, did not show pro-proliferation effects in KYSE150 cells
   (t-test, p < 0.05) (Supplementary Fig. [343]8h and Supplementary
   Data [344]7i). Taken together, these results indicated that the
   activation of glycolytic enzyme PGK1 was associated with the
   proliferation of ESCC cells.

Fig. 8. PGK1 reprograms glucose metabolism and contributes to ESCC
progression.

   [345]Fig. 8
   [346]Open in a new tab

   a Pan Serine/Threonine/Tyrosine-phosphorylation levels of PGK1 in
   KYSE150 cells, KYSE70 cells, ECA109 cells, and TE-8 cells. b PGK1 level
   in mitochondria and cytosol fraction of in KYSE150 cells, KYSE70 cells,
   ECA109 cells, and TE-8 cells. c The impacts of PGK1 and/or ERK2 on
   PDHK1 T338 phosphorylation levels in KYSE150 cells, KYSE70 cells,
   ECA109 cells, and TE-8 cells. d The impacts of PGK1 and/or ERK2 on PDH
   activity in KYSE150 cells (n = 36) and ECA109 cells (n = 36) (two-sided
   Student’s t-test, mean ± SD). KYSE150: p = 0.088, *p = 0.031,
   ****p = 7.7E–6, p = 0.81, p = 0.66 from left to right. ECA109:
   *p = 0.043, p = 0.052, ****p = 1.3E–7, p = 0.78, p = 0.058 from left to
   right. e The impacts of overexpressed and knockdown PGK1 and ERK2 on
   OCR and ATP production (two-sided Student’s t-test, mean ± SD).
   Twenty-four cell samples were used in the analysis. Top: ****p = 1.6E–8
   (PGK1), ****p = 2.6E–12 (PGK1 + ERK2). Bottom: ***p = 7.7E–4. OCR:
   oxygen consumption rate. f The impacts of overexpressed and knockdown
   PGK1 and ERK2 on ECAR (two-sided Student’s t-test, mean ± SD). Sixteen
   cell samples are used in the analysis. Top: **p = 1.2E–3 (PGK1),
   ****p = 1.0E–11 (PGK1 + ERK2). Bottom: ***p = 3.9E–7. ECAR:
   extracellular acidification rate. g The impacts of PGK1-overexpression
   (OE) and/or ERK2-OE on cell proliferation in KYSE150 cells, KYSE70
   cells, ECA109 cells, and TE-8 cell (two-sided Student’s t-test,
   mean ± SD). A total of 320 cell samples were used in the analysis.
   KYSE150: ****p = 1.5E–7 (PGK1), ****p = 1.6E–5 (ERK2), ****p = 2.9E–9
   (PGK1 + ERK2). KYSE70: *p = 0.014 (PGK1), **p = 4.5E–3 (ERK2),
   ****p = 6.8E–5 (PGK1 + ERK2). ECA109: ****p = 5.4E–7 (PGK1),
   ****p = 3.2E–6 (ERK2), ****p = 2.7E–9 (PGK1 + ERK2). TE-8: **p = 2.1E–3
   (PGK1), ***p = 5.0E–4 (ERK2), ***p = 3.2E–4 (PGK1 + ERK2). h
   Gemcitabine inhibits cell proliferation (n = 30, two-sided Student’s
   t-test, **p = 4.8E–3, mean ± SD). i The impacts of PGK1-OE (left) and
   PGK1 knockdown (right) on the weight of the xenografts in the KYSE150
   cells, ECA109 cells, and TE-8 cells (two-sided Student’s t-test,
   mean ± SD). A total of 130 samples are used in the analysis. Left:
   KYSE150: ****p = 5.9E–8 (Control and PGK1-OE), ****p = 4.4E–9 (PGK1-OE
   and PGK1-OE-inhibitor), p = 0.17 (Control and PGK1-OE-inhibitor).
   ECA109: ****p = 2.1E–8 (Control and PGK1-OE), ****p = 4.4E–9 (PGK1-OE
   and PGK1-OE-inhibitor), p = 0.17 (Control and PGK1-OE-inhibitor). TE-8:
   ****p = 1.3E–7 (Control and PGK1-OE), ****p = 4.0E–8 (PGK1-OE and
   PGK1-OE-inhibitor), p = 1.7E–3 (Control and PGK1-OE-inhibitor). Right:
   ****p = 8.6E–8 (KYSE150), ****p = 3.4E–7 (ECA109), ****p = 7.6E–8
   (TE-8). ****p < 1.0E–4, ***p < 1.0E–3, **p < 0.01, *p < 0.05, ns. >
   0.05. Source data are provided as a Source data file.

   It has been reported that pyrimidine deoxynucleoside analog
   diphosphates (for example, gemcitabine) could be used as PGK1
   inhibitors because L-nucleoside analog diphosphates were selectively
   phosphorylated by PGK1^[347]55. Therefore, we tested the potential of
   gemcitabine to inhibit ESCC tumor growth. Firstly, we validated the
   significant inhibitory effects of gemcitabine on PGK1 (t-test,
   p = 4.8E–3) with an IC50 of 16.3 nM (Fig. [348]8h and Supplementary
   Fig. [349]8i, and Supplementary Data [350]7j) using an in vitro
   enzymatic assay. Secondly, treating the cultured cells with 50 nmol/L
   gemcitabine significantly decreased the glycolytic flux (such as
   pyruvate (t-test, p = 1.6E–3) and lactate (t-test, p = 0.014)) and cell
   proliferation (Supplementary Fig. [351]8j and Supplementary
   Data [352]7k, l). Thirdly, the xenograft growth-promoting ability of
   PGK1 overexpression was abolished in gemcitabine treated mice bearing
   either KYSE150 cells (t-test, p = 4.4E–9), ECA109 cells (t -test,
   p = 2.2E–9), or TE-8 cells (t-test, p = 4.0E–8), which was consistent
   with the effects of PGK1 knockdown (t-test, p (KYSE150) = 8.6E–8, p
   (ECA109) = 3.4E–7, p (TE-8) = 7.6E–8) (Fig. [353]8i and Supplementary
   Fig. [354]8k, and Supplementary Data [355]7m, n). In addition, we found
   that increased levels of PGK1 phosphorylation, but not PGK1 protein
   levels, were associated with tumor weight in xenograft TE-8 cells
   (Supplementary Fig. [356]8l, m and Supplementary Data [357]7o).
   Overall, these observations suggested that enhanced PGK1 expression was
   associated with early ESCC development.

Discussion

   ESCC is one of the most common malignancies, with a relatively low
   overall 5-year survival rate (less than 30%). Even though the whole
   genome sequences of ESCC patients have been obtained, the tumor
   heterogeneity and lack of understanding of the molecular mechanisms in
   ESCC progression impose many challenging unmet clinical needs in ESCC.
   It was speculated that tracking the occurrence and development of early
   ESCC could provide direct evidence of cancer-driving pathways and
   molecules in each stage.

   Depending on the substage-based model, our study detected precise
   temporal molecular switches promoting the progression of ESCC at the
   multi-omics level. The sequence of canonical cancer pathways was also
   disclosed, which involved ERBB, NOTCH, IGF, cell cycle, DNA repair,
   PI3K-AKT, mTOR, glycolysis, and Wnt signaling. The co-occurrence
   mutations of BOC, AKAP9, and PCDHB16, detected as early as in stage 6
   and last till the T2 and T3 stages, had positive impacts on DNA repair
   in stage 6, and glycolysis in the T2 and T3 stages, respectively. To
   further validate the findings and results in our cohort, we collected
   another 256 samples as an independent validation cohort from 49
   early-stage ESCC patients. The number of stages in the validation
   cohort was comparable with the main cohort. Comparative analysis showed
   consistent findings in a time-resolved mode in ESCC progression, and
   presented that the gradual decrease of keratinization and lipid
   metabolism revealed that the loss of normal esophagus was an important
   event in initiating early ESCC. The biomarkers of ESCC tissues, such as
   ACTA2, TAGLN, POSTN, PSAP, and THBS1^[358]20,[359]21, were also
   detected and significantly increased during ESCC progression
   (Kruskal–Wallis test, FDR < 2.2E–16, stage 9 vs. stage 1 ratio ≥ 2)
   (Supplementary Fig. [360]5e, f). These findings provided a temporal
   dimension and trans-omics dimension in understanding the ESCC
   progression.

   It is generally agreed that carcinogenesis is a chronic process
   involving several genes and pathways in different stages. The
   significant genomic aberrations were then translated to proteomic
   alterations in ESCC progression. Liu et al. also showed that several
   significantly mutated genes were shared in the earlier and advanced
   stage of ESCC^[361]7. Thus, we hypothesized that the mutations were
   cumulative in ESCC progression, and the key mutation/events which were
   found in the advanced stages might exist in the earlier stages. In our
   cohort, we observed that the number of mutations gradually cumulated
   during ESCC carcinogenesis. The dual-peak of the neo-mutations detected
   at the Tis stage and the T2 stage, perfectly matched their
   corresponding pathological phenotypes observed in the clinic,
   delivering that the mutations were not based on the principle of the
   linear accumulation model, but surged in certain histopathological
   stages during the carcinogenesis process. The mutation-surging wave at
   the advanced stage of ESCC (T2 and T3 stages) in the Fudan cohort
   indicated that the sudden rise of the mutations at specific stages was
   regular and would determine the carcinogenesis trend.

   The neo-mutation peaks in ESCC progression promoted to divide the
   carcinogenesis of ESCC into three phases (NT, IEN, A-ESCC). Compared to
   the IEN and A-ESCC phases, the lowest TMB was observed in the NT phase,
   in which TP53 mutation and DNA repair signature were detected.
   Biological function of the normal esophagus and inflammatory response
   were the dominant pathways in the NT phase. The Ca^2+ signal impacts
   nearly all aspects of cellular life^[362]56, and is implicated in a
   variety of processes that are important in tumor progression, such as
   proliferation and invasiveness^[363]57. In our cohort, we found the
   gain of chr3q was characterized in the IEN phase, in which the
   cis-effects genes related to the Ca^2+ signal and showed positive
   impacts on DNA replication. Interestingly, the amplification of TP63 at
   chr3q28, prominent in the IEN phase, showed positive impacts on DNA
   replication, implying the combined effects of Ca^2+ signal and TP63
   amplification on DNA replication in the IEN phase and further
   indicating the functions of chr3q gain in the transmit process from the
   NT phase to the IEN phase and on DNA replication in ESCC progression.
   The highest TMB was detected in the A-ESCC phase, in which the mutation
   of MACF1, a large crosslinker that contributes to cell integrity and
   cell differentiation^[364]58, activated Wnt signaling at the
   multi-omics level. Interestingly, based on the two proteomic clusters
   (C1 and C2) which were associated with the three phases in ESCC
   progression, the C2 (relatively malignant cluster) prominent mutation
   of AKAP9 upregulated the expression-level of AKAP9 and thus activated
   PKA, improving the transfer of ATP to ADP and enhancing glycolysis at
   the protein and phosphoprotein levels (Supplementary Fig. [365]9).
   These results had explored that all these pathways followed a very
   precise temporal order during the carcinogenesis progress of ESCC.

   Except for the diverse lifestyles of the countries^[366]2,[367]3,
   drinking/smoking habit, gender, and ages are key environmental risks of
   ESCC^[368]1, while the molecular mechanism is yet unknown. We applied
   the NMF algorithm to analyze the mutational signatures of the Fudan
   cohort and other ESCC cohorts, including the TCGA cohort, Moody’s
   cohort, etc. The integrated findings revealed that the SBS16 signature,
   which was associated with the ESCC patients with a drinking habit and
   OLFM4 mutation exhibited positive impacts on CDKs activation and thus
   enhancing DNA replication evidenced by the related markers, indicating
   the potential medicative of Dinaciclib for drinking ESCC patients. In
   addition, the APOBEC signature was the dominant signature in the
   non-drinking/smoking ESCC patients, and the significant mutation of
   DCTN2 upregulated the protein level of DCTN2, and displayed positive
   impacts on RUVBL1 and thus activated DNA replication, implying the
   potential effects of CB-6644 in the non-drinking/smoking ESCC patients.
   These results revealed the functions of DNA replication in the IEN
   phase, whereas the molecular mechanism and clinic strategy were diverse
   in the ESCC patients with the habits of drinking and
   non-drinking/smoking.

   Furthermore, 746 samples from 114 early ESCC patients in the Fudan
   cohort allowed us to trace the carcinogenesis lineages of early-stage
   ESCCs, resulting in six tracks closely related to the clinical feature,
   including gender, age, and risk habits of drinking/smoking. For
   example, more female ESCC patients were observed in track 3
   (Supplementary Fig. [369]9). To further validate the results in our
   cohort, we collected another 256 samples as an independent validation
   cohort from 49 early-stage ESCC patient, which were then classified
   into tracks 1–6 with similar molecular characterization. A large-scale,
   population-based cohort study has shown that drinking/smoking promotes
   ESCC carcinogenesis^[370]5. Our study disclosed the carcinogenesis
   lineage of ESCC patients with drinking/smoking habits (track 6).
   Integration with the findings in the validation cohort revealed that
   the immune response pathways, including interleukins signaling, antigen
   processing and presentation, etc., were the dominant pathways in the
   ESCC patients with drinking/smoking habits, which was consistent with
   features of track 6, both in the main cohort and validation cohort,
   further validating the findings of the personalized tracks. In brief,
   in addition to the driver waves, this study provided 6 carcinogenesis
   tracks as references for diverse ESCC clinical therapies.