Abstract

   Although the incidence of cervical cancer (CC) has been reduced in
   high-income countries due to human papillomavirus (HPV) vaccination and
   screening strategies, it remains a significant public health issue that
   poses a threat to women’s health in low-income countries. Here, we
   perform a comprehensive proteogenomic profiling of CC tumors obtained
   from 139 Chinese women. Integrated proteogenomic analysis links genetic
   aberrations to downstream pathogenesis-related pathways and reveals the
   landscape of HPV-associated multi-omic changes. EP300 is found to
   enhance the acetylation of FOSL2-K222, consequently accelerating the
   malignant proliferation of CC cells. Proteomic stratification
   identifies three patient subgroups with distinct features in prognosis,
   genetic alterations, immune infiltration, and post-translational
   modification regulations. PRKCB is further identified as a potential
   radioresponse-related biomarker of CC patients. This study provides a
   valuable public resource for researchers and clinicians to delve into
   the molecular basis of CC, to identify potential treatments and to
   ultimately advance clinical practice.

   Subject terms: Cervical cancer, Cancer genomics, Tumour biomarkers,
   Proteomics
     __________________________________________________________________

   Cervical cancer remains a significant public health problem in many
   regions. Here, the authors perform a proteogenomic analysis of cervical
   cancer in Chinese patients; they reveal proteomic subgroups associated
   with clinical and biological features, and a potential biomarker of
   response to radiotherapy.

Introduction

   Cervical cancer (CC) remains the fourth most common malignancy and the
   fourth leading cause of cancer mortality in women^[86]1. Unlike
   numerous other cancers lacking direct etiological association with
   virus infection, persistent human papillomavirus (HPV) infection is the
   major cause of CC, and over 90% of CC patients are associated with HPV
   infection at the time of diagnosis^[87]2. Although HPV vaccination and
   screening strategies have reduced the incidence of CC in high-income
   countries^[88]3, CC is still a major public health problem threatening
   women’s health in low-income countries^[89]4. Patients with early-stage
   CC who are treated with radical surgery generally have a good
   prognosis. For locally advanced CC, cisplatin-based concurrent
   chemoradiotherapy is recommended as the standard treatment^[90]5–[91]7.
   However, about 30–50% of CC patients treated with standard care of
   chemoradiotherapy experienced progression or recurrence within five
   years^[92]8–[93]10. Some patients do not benefit optimally from
   radiotherapy because of radio-resistance, and the prognosis for those
   with persistent, recurrent, or metastatic CC is poor. Therefore,
   exploring potential radio-resistance biomarkers and effective
   interventions to improve outcomes for locally advanced disease is
   pivotal. Investigating the molecular mechanism of CC development and
   its HPV association, and identifying candidate biomarkers or
   therapeutic targets could improve individualized therapy for CC, which
   would greatly improve patient survival and quality of life.

   Previous genomic and transcriptomic studies revealed the genetic
   landscape of CC as well as patient stratification based on distinct
   molecular signatures^[94]11–[95]15. The study from The Cancer Genome
   Atlas (TCGA) clustered 178 CC patients using mRNA, methylation, miRNA,
   and copy number data, and identified three subtypes: (i) Keratin-low
   squamous, (ii) Keratin-high squamous, and (iii)
   Adenocarcinoma-rich^[96]14. The CC multi-omics research by Fan et al.
   focused on exploring the functional consequences of HPV fusion
   transcripts on the biology and pathophysiology^[97]16. However, the
   previous studies either lacked or possessed only limited proteomic
   data. Consequently, a comprehensive characterization of the changes in
   the protein landscape within CC was absent. In contrast, recent
   research on various women’s cancer types, including
   breast^[98]17,[99]18, endometrial^[100]19,[101]20, and ovarian
   cancers^[102]21,[103]22, conducted by the Clinical Proteomic Tumor
   Analysis Consortium (CPTAC), has demonstrated the indispensable role of
   integrating proteomic and post-translational modification (PTM)
   analyses with genomic strategies to reveal essential biological
   insights into complex diseases. This compelling precedent prompted our
   pursuit of proteogenomic analysis to delve deeper into the molecular
   mechanisms of CC, particularly in the context of HPV-induced
   carcinogenesis.

   In this study, we performed integrated genomic, transcriptomic,
   proteomic, phosphoproteomic, and acetylproteomic analyses on Chinese CC
   patients. Our proteogenomic study discovered: 1) HPV-promoted
   phenotypic perturbations at multi-omics levels; 2) acetylation of
   FOSL2-K222 by EP300 accelerating malignant proliferation; 3) proteomic
   subgroups associated with distinct clinical and biological features;
   and 4) PRKCB as a potential radioresponse-related biomarker. The
   underlying data presented here serves as a valuable resource for
   advancing research in various aspects of CC, including biology,
   treatment strategies, and the exploration of potential therapeutic
   drugs.

Results

Proteogenomic landscape of a Chinese CC cohort

   To comprehensively understand the molecular characteristics of CC in
   China and provide valuable biological insights to guide clinical
   treatments, we conducted a comprehensive analysis using whole exome
   sequencing (WES), RNA sequencing (RNA-seq), proteomics,
   phosphoproteomics, and acetylproteomics (Supplementary Fig. [104]1a).
   This analysis was performed on a total of 139 treatment-naive tumor
   samples and 33 normal adjacent tissues (NATs) prospectively collected
   from the Peking Union Medical College (PUMC). Out of 139 patients, 112
   were diagnosed with squamous cell carcinoma, 19 with adenocarcinoma, 2
   with adenosquamous carcinoma, 4 with small cell neuroendocrine
   carcinoma of the cervix (NECC), and 2 with other mixed histology type
   (Supplementary Fig. [105]1b). Clinical parameters including tumor
   grade, stage, primary treatment modality, and progression-free survival
   (PFS) information were summarized in Supplementary Fig. [106]1c and
   Supplementary Data [107]1a, b. The analysis flow chart of this study is
   presented in Supplementary Fig. [108]1d.

   A total of 31,047 non-silent point mutations and 1050 small
   insertions-deletions were detected in the WES data of the primary
   tumors of 134 CC patients (Supplementary Data [109]2a). The most
   frequently mutated genes in the PUMC cohort included PIK3CA (32.8%),
   KMT2C (18.7%), KMT2D (17.2%), FAT1 (13.4%), AR1D1A (11.2%), FBXW7
   (11.2%), EP300 (8.2%), TP53 (8.2%), PTEN (7.5%), and ERBB3 (7.5%)
   (Fig. [110]1a). PIK3CA was the most frequently mutated gene in CC, as
   previously published^[111]14,[112]15. KMT2C and KMT2D were frequently
   mutated in this cohort, and also observed across various types of
   cancers^[113]23–[114]25. The mutation rate of KRAS was higher in
   cervical adenocarcinoma than in squamous cell carcinoma (25.0% vs 0%)
   in our cohort, consistent with the TCGA CC study (15.6% vs
   2.9%)^[115]14 (Fig. [116]1b). In this study, 24,595 genes were
   quantified by RNA-seq; Isobaric tandem mass tags (TMT)-based proteomic
   analysis identified 9,600 proteins; Phosphoproteomic analysis
   identified 41,448 highly reliable phosphosites corresponding to 7721
   phosphoproteins and acetylproteomics detected 5749 highly reliable
   acetylsites corresponding to 2456 acetylproteins (Supplementary
   Data [117]3a-d). Besides, data-independent acquisition (DIA)-based
   proteomics identified 11,904 proteins and exhibited good consistency
   with TMT-proteomics data (Supplementary Data [118]3e, Supplementary
   Fig. [119]2a). Principal-component analysis (PCA) clearly discriminated
   tumors and NATs based on both TMT and DIA proteomics data, and no
   obvious batch effects were observed (Supplementary Fig. [120]2b). In
   addition, high data reproducibility and technical quality were
   demonstrated across the entire proteomics analysis (Supplementary
   Fig. [121]2c, d).

Fig. 1. WES-based mutation profile of Chinese CC cohort.

   [122]Fig. 1
   [123]Open in a new tab

   a Genetic profile of the 134 CC patients. b Comparisons of frequently
   mutated genes between cervical squamous cell carcinoma and
   adenocarcinoma in Chinese and TCGA CC cohorts (one-sided Fisher’s exact
   test). Source data are provided as a [124]Source Data file.

   For 6089 genes whose mRNA and protein abundance were both measured
   across 132 CC samples, we observed an overall positive mRNA-protein
   correlation (81.7% showed significantly positive correlation at
   Benjamini-Hochberg (BH) adjusted P < 0.05) with a median Spearman’s
   correlation coefficient of 0.40 (Supplementary Fig. [125]2e), which is
   similar to previous studies investigating ovarian cancer (median
   r = 0.45)^[126]21 and clear cell renal cell carcinoma (median
   r = 0.44)^[127]26. Previous studies have revealed the intricate
   cellular heterogeneity of CC tissues^[128]27–[129]29. We leveraged
   transcriptomic data to estimate the immune and stromal infiltration
   scores^[130]30 (Supplementary Data [131]1a). The immune infiltration
   scores were significantly higher in squamous cell carcinoma than in
   adenocarcinoma (P = 2.2E-03, Supplementary Fig. [132]2f), consistent
   with the better prognosis observed in squamous cell
   carcinoma^[133]31,[134]32.

Multi-omics analysis of somatic copy number alternations

   For 134 CC samples, somatic copy number alternations (SCNAs) were
   derived based on WES data. Overall, genome-wide focal alterations
   showed the most frequent gains in chromosomes 1q, 3q, 5p and focal
   losses in chromosome 2q, 3p, 11q, 13q (Supplementary Fig. [135]3a),
   consistent with previous reports^[136]12,[137]14,[138]15. In addition
   to amplification of genes CD274, PDCD1LG2, TP63 and PIK3CA reported in
   previous studies, we observed recurrent amplification events at PIK3CB
   (3q22.3, 55.97%), CDK4 (12q14.1, 37.3%), AKT1 (14q32.22, 39.55%), AKT2
   (19q13.2, 57.46%) (Supplementary Fig. [139]3b, Supplementary
   Data [140]4a), and also identified deletions of tumor suppressors such
   as LATS1 (6q25.1, 15.67%), AR1D1B (6q25.3, 14.93%), BRCA2 (13q13.1,
   23.88%) and RB1 (13q14.2, 23.88%) (|Log[2] (Tumor/Blood)| >0.3)
   (Supplementary Fig. [141]3c, Supplementary Data [142]4b).

   The gains and losses of gene copies often cis or trans-regulate mRNA,
   protein, and phosphoprotein abundance, as shown by the diagonal and
   vertical patterns in Fig. [143]2a. The cis and trans association among
   copy number alternation (CNA)-protein (TMT and DIA) and
   CNA-phosphoprotein were more attenuated than that of CNA-RNA
   (Fig. [144]2a, Supplementary Fig. [145]3d, e). Furthermore, based on
   the TMT and DIA proteomic datasets, 1358 common attenuated proteins
   were identified, which were mainly enriched in biological processes
   such as mRNA processing, mRNA splicing, and ribosome biogenesis
   (Supplementary Fig. [146]3f–h). Thus, PTM may play a key role in
   determining protein half-lives, resulting in decreased correlation
   between gene dosage and protein abundance^[147]33.

Fig. 2. Effects of SCNAs.

   [148]Fig. 2
   [149]Open in a new tab

   a Correlations of CNAs (x-axes) to TMT (left) and DIA (right) protein
   expression (y-axes) highlight CNA cis and trans effects. Significant
   positive (red) and negative (green) correlations (two-sided Spearman’s
   correlation, BH adjusted P < 0.01) are indicated in top panels. The
   proteins positively associated with a particular CNA were presented as
   red bars underneath the respective panels and those negatively
   associated with a particular CNA were represented by dark green bars. b
   Approach schematic for identifying CNG-Cis genes. From 8035 genes
   located in GISTIC focal amplification peaks, 337 were up-regulated in
   tumors (two-sided Wilcoxon rank-sum test, BH adjusted P < 0.05), of
   which 277 had mRNA effects and 178 had protein level effects
   (Spearman’s correlation > 0, BH adjusted P < 0.05). c Lymph
   node-positive patients (n = 44) exhibited a higher level of protein
   expression of CNG-Cis genes compared to lymph node-negative patients
   (n = 86, two-sided Student’s t test). d Flow chart for identification
   of CNL-Cis genes. From 2149 genes located in GISTIC focal deletion
   peaks, 80 were down-regulated in tumors (two-sided Wilcoxon rank-sum
   test, BH adjusted P < 0.05), of which 48 had mRNA effects and 28 had
   protein level effects (Spearman’s correlation > 0, BH adjusted
   P < 0.05). e Protein abundance (TMT) of CNL-Cis genes was negatively
   correlated with tumor size (two-sided Pearson’s correlation). f Lymph
   node-positive patients (n = 44) displayed lower protein expression
   levels of CNL-Cis genes than lymph node-negative patients (n = 86;
   two-sided Student’s t test). g KEGG pathway enrichment analysis of
   proteins with positive (n = 85) or negative (n = 64) CNA-protein
   correlations in chromosome 3q (one-sided Fisher’s exact test, BH
   adjusted P). h Heatmaps of CNG of chromosomes 3q and the
   protein/phospho-protein abundance of cell cycle and focal adhesion
   related proteins. i KEGG pathway enrichment analysis of proteins with
   positive (n = 197) or negative (n = 178) CNA-protein correlations in
   chromosome 11q25 (one-sided Fisher’s exact test, BH adjusted P). j
   Heatmap of CNL of chromosomes 11q25 and the protein/phospho-protein
   abundance of cell cycle and focal adhesion related proteins. Data in
   (c, f) are shown using boxplots. Boxplots show the median (central
   line), the 25–75% IQR (box limits), and the ± 1.5 × IQR (whiskers).
   Source data are provided as a [150]Source Data file.

   To identify hub CNA genes, we screened correlated CNA, mRNA, and
   protein levels across tumors and concordant protein-level changes
   between tumors and NATs as below. 178 genes were identified as copy
   number gain genes with cis effects (CNG-Cis genes, see Methods) and
   were enriched in proteasome, RNA transport, riboflavin metabolism,
   spliceosome, DNA replication and N-Glycan biosynthesis (Fig. [151]2b,
   Supplementary Data [152]5a), highlighting the functional impact of CNA
   on tumor proliferation and progression. It is noteworthy that tumors
   with positive lymph node exhibited higher average CNG-Cis protein
   abundances than tumors with negative lymph node (Fig. [153]2c).
   Meanwhile, 28 copy number loss genes with cis effects (CNL-Cis genes,
   see Methods) were identified and were involved in cell adhesion
   molecules and valine, leucine and isoleucine degradation pathways
   (Fig. [154]2d, Supplementary Data [155]5b). The average protein
   expression level of CNL-Cis genes was inversely correlated with tumor
   size (Fig. [156]2e, Supplementary Fig. [157]3j). Additionally, tumors
   with positive lymph node displayed significantly lower CNL-Cis protein
   abundance than tumors with negative lymph node (Fig. [158]2f). Among
   the CNG-Cis genes, 50 (28.1%) were located on chromosome 3q, while 16
   (57.1%) of the CNL-Cis genes were located on chromosome 11q
   (Supplementary Fig. [159]3i). Meanwhile, we found Chr3q and Chr11q25
   CNAs had much more trans effects at protein and phosphoprotein levels
   (Fig. [160]2a, Supplementary Fig. [161]3e). Pathway enrichment analysis
   revealed that gain of Chr3q and loss of Chr11q25 demonstrated the same
   trans effects on up-regulated cell-cycle pathway and down-regulated
   focal adhesion pathway at the protein and phosphoprotein levels
   (Fig. [162]2g–j). Furthermore, co-occurrence of CNG on Chr3q and CNL on
   Chr11q25 was observed (Supplementary Fig. [163]3k). Notably, loss of
   Chr11q25 was associated with bigger tumor size (Supplementary
   Fig. [164]3l) and positive lymph node metastasis (Supplementary
   Fig. [165]3m), also likely contributing to proliferation and metastasis
   of CC cells.

Landscape of HPV-associated proteogenomic changes

   HPV detection and genotyping were done based on RNA-seq data^[166]34,
   the HPV gene transcriptome data were shown in Supplementary
   Data [167]6a. Most squamous cell carcinoma samples were HPV positive
   (98.2%), with 77.7% being infected with HPV A9 clade; 16.7% and 33.3%
   of adenocarcinoma samples were infected with A9 and A7 clade
   respectively, and 44.4% of adenocarcinoma samples were HPV-negative
   (Fig. [168]3a). Furthermore, the proportion of HPV-positive samples
   among all tumors was 92.6%, substantially higher than that among NATs
   (42.9%) (Fig. [169]3a). We also found that viral gene mRNA abundance
   (E1/2, E5/6/7, L1/2) in tumors was significant higher compared to NATs
   (Supplementary Fig. [170]4a), suggesting severe HPV infection was
   primarily localized within cervical lesions. In addition, tumor samples
   infected with HPV A9 displayed higher immune infiltration scores
   compared with those with HPV A7 (P = 0.008, Fig. [171]3b).

Fig. 3. HPV-associated proteogenomic landscape and patient subgrouping based
on HPV E2/E5 mRNA level.

   [172]Fig. 3
   [173]Open in a new tab

   a The landscape of HPV infection types in Chinese CC cohort, with
   sample counts displayed. b Boxplot showing the comparison of immune
   score among different types of HPV clade (A9, n = 89; A7, n = 30,
   two-sided Wilcoxon rank-sum test). c Scatterplot displayed the
   Spearman’s correlation between the protein (x-axis)/phosphoprotein
   (y-axis) and HPV E6 mRNA abundance across tumor samples (two-sided
   Spearman’s correlation). d Scatterplot showing Spearman’s correlation
   between HPV E6 mRNA and H2AX protein/phosphoprotein abundance
   (two-sided Spearman’s correlation). e Scatterplot showing Spearman’s
   correlation between HPV E6 mRNA and WEE1 protein/phosphoprotein
   abundance (two-sided Spearman’s correlation). f Patient subgrouping
   based on HPV E2/E5 mRNA level (n = 122), and the heatmap in the bottom
   panel displayed differentially expressed proteins, phosphoprotein, and
   phosphosites (two-sided Student’s t test, BH adjusted P). g Age
   differences between patients in two HPV subgroups (HPV_G1, n = 56;
   HPV_G2, n = 66, two-sided Student’s t test, BH adjusted P). h
   Kaplan-Meier curves for PFS of radical radiotherapy CC patients based
   on HPV subgroups (two-sided log-rank test). Numbers in parentheses
   represented the sample sizes for the involved groups. Boxplots in (b,
   g) show the median (central line), the 25–75% IQR (box limits), and the
   ± 1.5 × IQR (whiskers). Source data are provided as a [174]Source Data
   file.

   Consistent with previous study that E6 and E7 could promote malignant
   transformation of host cells via forming a complex^[175]35, there was a
   strong positive correlation observed between the mRNA abundance of E6
   and E7 in tumor samples (Spearman’s correlation = 0.98, P = 1.2E-94,
   Supplementary Fig. [176]4b). To further elucidate the relationship
   between viral gene and proteome and tumor phenotype, we performed a
   correlation analysis of E6, and discovered that proteins positively
   correlated with E6 expression were significantly enriched in the cell
   cycle pathway, whereas proteins negatively correlated were associated
   with focal adhesion, protein processing in the endoplasmic reticulum,
   and N-glycan biosynthesis (Supplementary Fig. [177]5a–c).
   Simultaneously, we observed upregulation in the phosphoprotein of H2AX
   and WEE1 in tumors with higher E6 mRNA abundance, while their protein
   expressions did not show significant alteration (Fig. [178]3c–e).
   Previous studies have demonstrated that HPV-positive tumors responded
   remarkably to the treatment of WEE1 inhibitor
   Adavosertib^[179]36,[180]37. Phosphorylation of H2AX is a signature of
   DNA double-strand breaks^[181]38. The two key HPV capsid genes L1 and
   L2 in tumor samples showed a high correlation (Spearman’s
   correlation = 0.81, P = 3.2E-34, Supplementary Fig. [182]4b). L1 and L2
   together form the HPV’s capsid, which is essential for the HPV’s
   survival, transmission, and spread in the environment^[183]39. Proteins
   positively associated with L1 mRNA abundance were enriched in DNA
   replication, while proteins negatively associated were involved in cell
   adhesion, platelet activation, Fc gamma receptor-mediated autophagy,
   and leukocyte transendothelial migration (Supplementary Fig. [184]5d).
   The proteins most correlated with L1 mRNA abundance were shown in
   Supplementary Fig. [185]5e (|Spearman’s correlation | > 0.3).
   Interestingly, transcription levels of L1 gene negatively correlated
   with immune infiltration score and stromal score (Supplementary
   Fig. [186]5f).

   The carcinogenic progression of lesions infected with HPV16 and HPV18
   is typically linked to the integration of the viral genome into the
   host cell chromosome, and this process often results in the loss of E2
   gene expression^[187]40–[188]42. Additionally, we observed the mRNA
   abundance of E2 and E5 exhibited a bimodal distribution in this cohort
   (Supplementary Fig. [189]4b), so we clustered the tumors into two
   groups with either high or low E2/E5 mRNA abundance. VRK2 and MTDH
   showed higher expression in HPV_G2 tumors (Fig. [190]3f). VRK2 has been
   identified as an unfavorable prognostic marker in liver cancer, renal
   cancer, and pancreatic cancer, and MTDH is an unfavorable prognostic
   marker in lung cancer, renal cancer, and pancreatic cancer
   ([191]https://www.proteinatlas.org). Additionally, increased
   phosphorylation of SMIM13 and VANGL1 was observed in HPV_G2 compared
   with HPV_G1 (Fig. [192]3f). VANGL1 plays an important role in
   regulating the polarized cell behavior in cancer development^[193]43.
   HPV_G2 contained a higher prevalence of HPV A7 infection
   (Fig. [194]3f). In addition, patients in HPV_G2 were significantly
   younger (Fig. [195]3g) and had a worse PFS (Fig. [196]3h) than those in
   HPV_G1. Together, our data revealed the landscape of HPV-associated
   proteogenomic changes and the impact of HPV on prognosis.

Proteomic alterations associated with tumorigenesis in CC

   Tumors and paired NATs revealed remarkable differences in proteins,
   phosphosites, and acetylsites (Supplementary Data [197]7a-c). In total,
   3268 (38.8%) proteins showed significantly differential expression (BH
   adjusted P < 0.01), with 483 proteins up-regulated and 1041 proteins
   down-regulated for more than 2-fold abundance changes (Fig. [198]4a,
   Supplementary Fig. [199]6a). The up-regulated proteins were enriched in
   DNA replication, mismatch repair, p53 signaling pathway, and cell cycle
   pathway, while the down-regulated proteins were enriched in pathways
   such as cell adhesion molecules, focal adhesion, and complement and
   coagulation cascades (Fig. [200]4b). Enrichment analysis using altered
   phosphosites or acetylsites associated proteins, displayed similar
   results (Supplementary Fig. [201]6b, c). Additionally, we defined 23
   proteins as CC-associated proteins that were elevated by more than
   2-fold in at least 80% of all tumor-NAT pairs, and annotated its
   potential clinical utility by the Human Protein Atlas (Fig. [202]4c).
   Furthermore, the elevated tumor expression of these proteins was
   validated using DIA proteomic (Supplementary Fig. [203]6d).

Fig. 4. Proteomic alterations associated with CC tumorigenesis.

   [204]Fig. 4
   [205]Open in a new tab

   a Volcano plot depicting differentially expressed proteins between
   paired (n = 30) tumors and NATs (two-sided Student’s t test, BH
   adjusted). b Representative KEGG pathway terms for 2-fold upregulated
   and downregulated proteins. c Boxplot showing Log[2] FC between paired
   (n = 30) tumors and NATs for CC-associated proteins annotated with
   potential clinical utilities by the Human Protein Atlas (two-sided
   Student’s t test, BH adjusted P < 0.01). d Scatterplot of significant
   enriched GO biological processes terms in cervical adenocarcinoma and
   squamous cell carcinoma based on differentially expressed proteins
   between their tumors and NATs. e Heatmap of the relative abundance of
   response to type I interferon-related proteins that were significantly
   upregulated between cervical squamous cell carcinoma and adenocarcinoma
   (two-sided Student’s t test). f Boxplots showing the expression of
   immune checkpoints and MHC molecules at protein level in NATs (n = 33),
   tumors from cervical adenocarcinoma (n = 19) and squamous cell
   carcinoma (n = 109) (two-sided Student’s t test). g Enriched KEGG
   pathways for proteins that were differentially expressed among
   different stage. h Summary of metabolism-related proteins that were
   differentially expressed among stage and between tumors and NATs
   (two-sided Student’s t test). i–l Boxplot showing comparison of protein
   expression among NATs and tumors from different stage of B4GALT1 (i),
   DHCR24 (j), AGPS (k) and ACACA (l) (one-way ANOVA test). Kaplan-Meier
   curves for OS based on mRNA abundance of B4GALT1, DHCR24, AGPS, and
   ACACA from TCGA CC cohort (n = 291, two-sided log-rank test). Patients
   were stratified by the optimal cutpoint using maximally selected rank
   statistics (maxstat) on mRNA abundance. Boxplots in (c, f and i–l) show
   the median (central line), the 25–75% IQR (box limits), and the
   ± 1.5 × IQR (whiskers). Source data are provided as a [206]Source Data
   file.

   Next, we compared the molecular differences between cervical
   adenocarcinoma and squamous cell carcinoma, by focusing on the
   molecules that exhibited increased abundance in tumors compared to NATs
   at multi-omics level (Fig. [207]4d, Supplementary Fig. [208]6e, f). For
   proteins, we found immune-related biological processes such as response
   to type I interferon, response to interferon-beta/gamma, and regulation
   of innate immune response were significantly up-regulated in squamous
   cell carcinoma, while several metabolism processes were enriched in
   adenocarcinoma (Fig. [209]4d). The expression of interferon-induced
   antiviral RNA-binding protein (IFIT1/2/3), interferon regulatory factor
   6 (IRF6), interferon-induced GTP-binding protein (MX1/2), and STAT1/2
   were only significantly up-regulated in tumors of squamous cell
   carcinoma (Fig. [210]4e). Furthermore, the expression of immune
   checkpoints and MHC molecules including CD274, CD44, CXCL10, and
   HLA-E/F/G were also up-regulated in tumors of squamous cell carcinoma
   (Fig. [211]4f). We thus speculate that squamous cell carcinoma has an
   active immune microenvironment and may be beneficial from immune
   therapies.

   To explore the molecular dynamics of CC tumor progression, we performed
   differential protein expression analysis across different stages (I,
   II, III + IV). Enrichment analysis of KEGG pathways for these
   differentially expressed proteins suggested that several metabolic
   pathways including pyruvate metabolism, glycolysis/gluconeogenesis,
   carbon metabolism, and galactose metabolism were dysregulated among
   tumor progression (Fig. [212]4g). Notably, 14 proteins presented an
   overall positive regulation trend during CC progression and showed
   significantly up-regulated between tumors and NATs, while 9 proteins
   showed opposite trend and were down-regulated in tumors (Fig. [213]4h).
   Furthermore, TCGA CC cohort was utilized to better understand the
   clinical outcome for these metabolism-related proteins, and suggested
   that high transcription level, including B4GALT1, DHCR24, AGPS, ACACA,
   HK2 and SLC2A1, were positively associated with poor overall survival
   (OS) in CC (Fig. [214]4i–l and Supplementary Fig. [215]7a, b). It is
   worth noting that, HK2 expression is well known to be positively
   correlated with tumor size, pathological grade, and prognosis and
   served as a carcinogenic role in CC through AKT
   pathway^[216]44,[217]45. High SLC2A1 expression as an independent
   prognostic factor for OS may be associated with immunomodulation in
   CC^[218]46–[219]49. However, the relevance of B4GALT1, DHCR24, AGPS and
   ACACA in CC has not reported in previous studies. In addition to
   correlation with CC stage, we found that the expression of the above
   four proteins was higher in patients with positive lymph nodes
   (Supplementary Fig. [220]7c-f). To further validate the driving role of
   these genes in CC, we performed CCK-8 and colony formation experiments.
   These results suggested that knockdown of B4GALT1, DHCR24, AGPS and
   ACACA by siRNAs significantly decreased cell proliferation and colony
   formation capacity of the SiHa cells, further indicating their
   oncogenic roles in CC progression (Supplementary Fig. [221]7g-j).

Aberrate protein acetylation regulates CC progression

   To date, very few attempts have been made to systematically describe
   and make use of the acetylome regulation in CC. In this study, we
   performed differential analysis of lysine acetyltransferases (KATs),
   bromodomain-containing proteins (BRDs), and histones with protein and
   acetylation abundances. Compared to NATs, the acetylation of these
   proteins was dramatically up-regulated in tumors relative to its
   protein abundance (Fig. [222]5a). Notably, we found that EP300 had
   considerable K1542, K1546, K1549 and K1590 hyperacetylation in its
   histone acetyltransferase domain (HAT)^[223]50 (Fig. [224]5a). Among
   them, K1549, a key autoacetylation site in the activation loop that may
   activate EP300 activity was significantly up-regulated in
   tumors^[225]51.

Fig. 5. Aberrate protein acetylation regulates CC progression.

   [226]Fig. 5
   [227]Open in a new tab

   a Heatmap illustrating the protein and acetylation abundance of KATs,
   BRDs, and histones between paired (n = 30) tumors and NATs (two-sided
   Student’s t test). b Scatterplot showing Spearman’s correlation of
   FOSL2 acetylation level (x-axis) and EP300 acetylation level (y-axis)
   (two-sided Spearman’s correlation). c Boxplots showing comparison of
   protein or K222 acetylsite expression of FOSL2 between paired (n = 30)
   tumors and NATs (two-sided Student’s t test). Boxplots show the median
   (central line), the 25–75% IQR (box limits), and the ± 1.5 × IQR
   (whiskers). d EP300 acetylated FOSL2. Flag-FOSL2 expression vector was
   co-transfected separately with expression vectors containing a variety
   of acetyltransferases into HEK293T cells. WCL and immunoprecipitates
   were detected by immunoblot with indicated antibodies. Empty Vector
   (EV) was used as a negative control. The experiments were repeated at
   least three times. e Mutation of K222 significantly decreased FOSL2
   acetylation by EP300. HEK293T cells were co-transfected with indicated
   plasmids. Cell lysates were harvested for IP and immunoblotting. The
   experiments were repeated at least three times. f Relative mRNA level
   of WNT5A and FOSL2 in SiHa cells after FOSL2 siRNA transfection. Data
   are represented as mean ± SEM (n = 3, one-way ANOVA test). WNT5A:
   ^⋆⋆⋆P = 7.3E-05 (NC and siFOSL2-1), ^⋆⋆⋆P = 4.3E-04 (NC and siFOSL2-2).
   FOSL2: ^⋆⋆⋆P = 1.3E-06 (NC and siFOSL2-1), ^⋆⋆⋆P = 1.1E-04 (NC and
   siFOSL2-2). g SiHa cells ectopically expressing FLAG-tagged FOSL2-WT,
   FOSL2-K222R and FOSL2-K222Q mutants were generated and confirmed by
   immunoblot. The experiments were repeated at least three times. h
   Relative mRNA level of WNT5A was determined by qRT-PCR in the indicated
   SiHa cells. Data are represented as mean ± SEM (n = 3, two-sided
   Student’s t test). ^⋆⋆P = 2.9E-03 (EV and WT), ^⋆⋆P = 4.0E-03 (WT and
   K222Q), ^⋆⋆P = 1.1E-03 (WT and K222R), ^⋆⋆⋆P = 2.3E-05 (K222Q and
   K222R). i Proliferation of indicated SiHa cells was measured. Data are
   represented as mean ± SEM (n = 3, two-sided Student’s t test).
   ^⋆⋆⋆P = 7.5E-06 (EV and WT), ^⋆⋆⋆P = 4.5E-06 (WT and K222Q),
   ^⋆⋆⋆P = 1.1E-05 (WT and K222R), ^⋆⋆⋆P = 1.6E-06 (K222Q and K222R). j
   Xenograft tumor pictures of nude mice in different groups. Scale bar,
   1 cm. k Tumor growth curves of indicated SiHa cells subcutaneously
   injected into nude mice. Data are represented as mean ± SEM (n = 8 mice
   per group, two-sided Student’s t test). ^⋆⋆⋆P = 9.2E-05 (EV and WT),
   ^⋆⋆⋆P = 1.7E-04 (WT and K222Q), ^⋆⋆⋆P = 3.4E-06 (WT and K222R),
   ^⋆⋆⋆P = 6.4E-08 (K222Q and K222R). l Tumor weight of nude mice in
   different groups. Data are represented as mean ± SEM (n = 8 mice per
   group, two-sided Student’s t test). ^⋆⋆⋆P = 1.7E-04 (EV and WT),
   ^⋆⋆⋆P = 5.6E-06 (WT and K222Q), ^⋆⋆⋆P = 1.3E-05 (WT and K222R),
   ^⋆⋆⋆P = 4.6E-08 (K222Q and K222R). Source data are provided as a
   [228]Source Data file.

   We then examined the potential impact EP300 activity by measuring the
   acetylation of EP300 itself and performed correlation analysis with our
   acetylproteomics data. Consequently, high acetylation levels of EP300
   were associated with high acetylation of FOSL2 (Fig. [229]5b), a
   transcription factor that belongs to the AP-1 family and was reported
   to be involved in cell proliferation and differentiation^[230]52.
   However, the acetylation or function of FOSL2 in CC has not been
   studied. Interestingly, we found a significant upregulation of
   FOSL2-K222 acetylation but not its protein abundance (Fig. [231]5c).
   Representative spectra of the acetylated peptides of FOSL2-K222 and
   EP300 were shown in Supplementary Fig. [232]8a-i.

   To further explore the connection between EP300 and FOSL2-K222 in CC,
   we conducted a comprehensive set of experiments. First, co-transfection
   of EP300, but not KAT2B, KAT5 or KAT2A, promoted FOSL2 acetylation
   (Fig. [233]5d). The elevation of FOSL2-K222 acetylation upon EP300
   co-transfection was further confirmed by mass spectrometry (MS)
   analysis (Supplementary Fig. [234]8j, k). Compared with FOSL2-WT (wild
   type), when the FOSL2-K222R was mutated (acetylation-deficient), the
   acetylation level of FOSL2 caused by co-transfection with EP300 was
   significantly reduced (Fig. [235]5e), suggesting that K222 was a major
   acetylation site of FOSL2 by EP300. FOSL2 has been implicated as an
   onco-protein in several cancers, including lung cancer^[236]53, breast
   cancer^[237]54 and hepatocellular carcinoma^[238]55, and WNT5A has been
   shown as a transcription target gene of FOSL2^[239]54. To explore the
   function of FOSL2-K222 acetylation in CC, we performed real-time
   quantitative RT-PCR (qRT-PCR) to validate the FOSL2-WNT5A regulation in
   SiHa cells (Fig. [240]5f). We established SiHa cells stably expressing
   FLAG-tagged FOSL2-WT, FOSL2-K222R and FOSL2-K222Q, respectively
   (Fig. [241]5g). Interestingly, we found the acetylation mimetic
   FOSL2-K222Q mutant was more potent than FOSL2-WT in elevating the
   transcription of WNT5A (Fig. [242]5h). Furthermore, compared with
   FOSL2-WT and FOSL2-K222R, cells expressing FOSL2-K222Q showed more
   potent proliferation rate (Fig. [243]5i), and formed the largest tumors
   in xenograft tumorigenesis assay (Fig. [244]5j–l), implying a strong
   tumor-promoting effect of FOSL2-K222 acetylation. Therefore, these
   results together indicated that EP300 promoted CC cell proliferation at
   least impartially via FOSL2-K222 acetylation (Supplementary
   Fig. [245]8l).

Proteomic subgroups with distinct biological and clinical features

   We performed unsupervised consensus clustering based on TMT proteomic
   data (see Methods), resulting in three proteomic subgroups
   (Fig. [246]6a, Supplementary Data [247]8a). Focusing further on radical
   radiotherapy patients with PFS information (see Methods, n = 62), a
   significant difference in PFS was observed among the three proteomic
   subgroups. Patients in Subgroup 3 exhibited the best PFS, even when
   further divided by known prognostic clinicopathological variables such
   as stage, lymph node status, grade, and histology type (Fig. [248]6b,
   Supplementary Fig. [249]9a-d). DIA-based proteomic and TMT-based
   phosphoproteomic/acetylproteomic data were also used to cluster CC
   patients into three subgroups respectively, consistent with the TMT
   proteomic subgroups, demonstrating the reliability and robustness of
   proteomic subgrouping (Supplementary Fig. [250]9e-h).
   Transcriptome-based subgroups showed lower concordance with TMT
   proteomic subgroups, and their PFS differences were less pronounced
   than those based on proteomic profiling (Supplementary Fig. [251]9i),
   highlighting the superiority of proteomic subgrouping.

Fig. 6. Proteomic stratification of the CC cohort and the corresponding
protein pathways and subgroup-specific kinases.

   [252]Fig. 6
   [253]Open in a new tab

   a Patient subgrouping analysis of proteomic profiling identified three
   proteomic subgroups: Subgroup 1 (cyan, n = 53), Subgroup 2 (blue,
   n = 61), Subgroup 3 (red, n = 22). Each column represented a tumor
   sample, the associations of proteomic subgroups with clinical
   characteristics and somatic mutations were annotated in the top panel
   (Chi-square test or one-way ANOVA test, BH adjusted P). Rows in the
   bottom panel indicated differentially expressed proteins (One-way
   ANOVA test, Bonferroni-adjusted P < 0.05), color of each cell showed
   TMT protein abundance. b Kaplan-Meier curves of PFS for radical
   radiotherapy patients in each TMT proteomic subgroup (two-sided
   log-rank test). The comparison of immune score (c) and tumor size (d)
   among TMT proteomic subgroups (two-sided Student’s t test). Boxplots in
   c show the median (central line), the 25–75% IQR (box limits), and the
   ± 1.5 × IQR (whiskers). Data in d are represented as mean ± SEM. e
   Transcriptome-based deconvolution of mRNA transcript cell signatures in
   tumors using Xcell (one-way ANOVA test, BH adjusted P). f Differential
   phosphorylation sites among three TMT proteomic subgroups (one-way
   ANOVA test, Bonferroni-adjusted P < 0.05). Unsupervised clustering and
   biological pathways significantly enriched are presented on the right
   (one-sided Fisher’s exact test, BH adjusted P < 0.05). g Circular plot
   depicting the active kinases in each TMT proteomic subgroup compared to
   all other subgroups identified by KSEA (see Methods). The eight
   kinase-regulated phosphorylation sites with the highest t-statistics
   were indicated by black dots. h Heatmap showing the protein (TMT and
   DIA) and phosphorylation abundance of TMT subgroup-specific kinases.
   Source data are provided as a Source Data file.

   We found a higher prevalence of adenocarcinoma and HPV-A7 infection,
   along with a lower incidence of abnormal SCC level in Subgroup 1.
   Subgroup 2 showed the highest rates of 11q25 deletion and 3q29
   amplification (Fig. [254]6a, Supplementary Fig. [255]9j). Supervised
   differential expression analysis revealed subgroup-specific proteins.
   In Subgroup 1, proteins linked to keratinocyte differentiation and KRT
   family KRT4, KRT5, KRT6A, KRT6B), as well as S100A family members
   (S100A7, S100A8, S100A9) were down-regulated (Supplementary
   Fig. [256]9k), suggesting that Subgroup 1 may be associated with
   previously reported keratin-low cluster^[257]14. In Subgroup 2, 395
   proteins were up-regulated and enriched in DNA replication (MCM
   family), indicating proliferative characteristics for these tumors. In
   Subgroup 3, 381 proteins were up-regulated and enriched in complement
   and coagulation cascades, and neutrophil extracellular trap formation,
   indicating the inflammatory status of Subgroup 3^[258]56,[259]57.
   Pathway enrichment analysis also revealed that protein abundance in
   Subgroup 1 higher than in Subgroups 2 and 3 were enriched in
   ECM-receptor interaction, regulation of actin cytoskeleton, and amino
   sugar and nucleotide sugar metabolism pathways; Proteins up-regulated
   in Subgroup 1 and 2, were associated with ribosome, suggesting
   over-activated ribosome biogenesis^[260]58,[261]59 of CC tumors in
   these two subgroups (Fig. [262]6a, Supplementary Data [263]9a). The
   subgroup-related signatures from DIA-based proteomic data further
   corroborated the findings (Supplementary Fig. [264]9l, Supplementary
   Data [265]9b). Additionally, Subgroup 3 showed the highest immune
   infiltration scores, followed by Subgroup 2 and then Subgroup 1
   (Fig. [266]6c). This could further explain the highest response rate to
   radiotherapy in Subgroup 3 patients. While patients in Subgroup 1 have
   smaller tumors, their poorer PFS can be attributed to their
   significantly lower immune infiltration scores (Fig. [267]6d). Further
   comparison of the landscape of cellular heterogeneity revealed that
   Subgroup 1 patients exhibited the highest proportion of stromal cells
   and less NK and γδ T and DC cells, Subgroup 3 patients showed
   significantly higher proportions of monocytes and neutrophils
   (Fig. [268]6e), indicating Subgroup 3 patients might benefit from
   immunotherapy.

   To investigate the phosphoproteomic characteristics of CC subgroups,
   Unsupervised clustering was performed on the differentially expressed
   phosphosites, uncovering four phosphorylation modules corresponding to
   the four protein modules (Fig. [269]6f). Kinase-Substrate Enrichment
   Analysis (KSEA) was applied to further identify subgroup-specific
   kinases (Fig. [270]6g). PRKACA, PRKACB, PRKAA1, and PRKAA2 were
   identified as the Subgroup 1-specific kinases; CDK1, CDK2, EIF2AK2,
   MAPKAPK5 and TTK were identified as the Subgroup 2-specific kinases,
   associated with poor prognosis. PRKCB and PRKCD were identified as the
   Subgroup 3-specific kinases. PRKCB is involved in B cell activation,
   and apoptosis induction, and was considered as a tumor suppressor in
   colon cancer^[271]60. Overexpression of PRKCD in SiHa cells enhances
   the radiation sensitivity, while silencing of PRKCD expression in ME180
   cells by siRNA decreases the radiation sensitivity^[272]61. In TMT and
   DIA proteomic datasets, protein abundance of these subgroup-specific
   kinases showed similar expression patterns with phosphorylation,
   further demonstrating the strong association between phosphorylation
   function and protein expression (Fig. [273]6h).

Identification and validation of radioresponse-related biomarkers and
risk-scoring model

   By combining PFS data with protein expression in radical radiotherapy
   patients, we identified 92 radioresponse-related biomarkers (see
   Methods), including 37 favorable and 55 unfavorable candidates
   (Supplementary Fig. [274]10a). Then a risk scoring model was developed
   based on these biomarkers (see Methods). 62 radical radiotherapy
   patients were then divided into high-risk and low-risk groups based on
   the median risk score. Both TMT and DIA proteomic results showed that
   patients in the high-risk group had worse PFS than the low-risk group
   (Supplementary Fig. [275]10b, c). The prognostic predictive power of
   the risk scoring model was further validated in the TCGA CC cohort
   (Supplementary Fig. [276]10d).

   It is noteworthy that the Subgroup 3-specific kinase PRKCB
   (Fig. [277]6g), identified as one of the favorable biomarkers, was
   down-regulated in tumors (Fig. [278]7a, Supplementary Fig. [279]10e).
   Immunohistochemistry (IHC) images also showed reduced PRKCB expression
   in tumors compared to NATs (P = 4.4E-03, Supplementary Fig. [280]10i).
   Additionally, PRKCB proteins levels showed no significant association
   with clinical features, including different stages, lymph node
   metastasis and grade status (Supplementary Fig. [281]10f–h). Tumors
   with high PRKCB showed specific upregulation of Fc Gamma R-mediated
   phagocytosis, natural killer cell-mediated cytotoxicity, T cell
   receptor signaling pathway and B cell receptor signaling pathway, and
   downregulation of cell cycle, DNA replication, spliceosome, and ErbB
   signaling pathway (Fig. [282]7b). Besides, radical radiotherapy
   patients with high PRKCB protein (TMT/DIA) expression have a better PFS
   (Fig. [283]7c, d). Moreover, increased PRKCB mRNA levels were
   positively correlated with a better OS in the TCGA CC cohort
   (Fig. [284]7e). The IHC results from an independent cohort of 124
   patients (see Methods) who underwent radical radiotherapy further
   validated that higher PRKCB expression is positively correlated with
   better radiotherapy efficacy (Fig. [285]7f).

Fig. 7. Identification and validation of the radioresponse-related biomarker,
PRKCB.

   [286]Fig. 7
   [287]Open in a new tab

   a Relative protein (TMT and DIA) and phosphoprotein abundance of PRKCB
   in NATs and TMT proteomic subgroups (two-sided Student’s test).
   Boxplots show the median (central line), the 25–75% IQR (box limits),
   and the ± 1.5 × IQR (whiskers). b Associations of PRKCB protein
   abundance (TMT) with multi-omics profiles (|Spearman’s
   correlation | > 0.3). c Kaplan-Meier curves for PFS based on PRKCB
   protein abundance (TMT) of radical radiotherapy patients (two-sided
   log-rank test). d Kaplan-Meier curves for PFS based on PRKCB protein
   abundance (DIA) of radical radiotherapy CC patients (two-sided log-rank
   test). e Kaplan-Meier curves of OS based on PRKCB mRNA levels of TCGA
   CC cohort (two-sided log-rank test). f Kaplan-Meier curves for PFS
   based on PRKCB protein expression investigated by IHC of independent
   validation cohort (two-sided log-rank test). g SiHa cells stably
   expressing PRKCB were generated and confirmed by immunoblot. EV was
   used as a negative control. The experiments were repeated at least
   three times. h Proliferation of indicated SiHa cells was measured. Data
   are represented as mean ± SEM (n = 6, two-sided Student’s t test).
   ^⋆⋆⋆P = 1.7E-06 (EV and PRKCB). i Xenograft tumor pictures of nude mice
   in different groups. Scale bar, 1 cm. j Tumor growth curves of
   indicated SiHa cells subcutaneously injected into nude mice. Data are
   represented as mean ± SEM (n = 7 mice per group, two-sided Student’s t
   test). ^⋆⋆⋆P = 2.1E-04 (EV and PRKCB). k Tumor weight of nude mice in
   different groups. Data are represented as mean ± SEM (n = 7 mice per
   group, two-sided Student’s t test). ^⋆⋆⋆P = 1.6E-04 (EV and PRKCB). l
   Radiosensitization by PRKCB overexpression in SiHa cells in colony
   formation assays compared to EV. Clonogenic survival fraction were
   analyzed using SiHa cells in the relevant groups at radiation doses of
   0, 1, 2, 4, 6, and 8 Gray. Data are represented as mean ± SEM (n  =  6,
   two-sided Student’s t test). ^⋆⋆⋆P = 2.3E-05 (EV and PRKCB). m Flow
   cytometry was used to assess the cell cycle distribution of SiHa cells
   in the relevant groups. Modfit LT 3.1 was used for data analysis.
   Forward scatter and side scatter were used to circle the cell
   population (excluding debris), and the fluorescence channels PE-A
   (area) /PE-W (width) were used to exclude adherent cells. Finally, the
   proportion of cells in each stage was fitted. Left: Representative
   results of the cell cycle analysis. Right: Histogram of cell cycle
   distribution. Data are represented as mean ± SEM (n  =  3, two-sided
   Student’s t test). S: ^⋆⋆⋆P = 2.9E-03 (EV and PRKCB). G2/M:
   ^⋆⋆⋆P = 7.6E-05 (EV and PRKCB). Source data are provided as a
   [288]Source Data file.

   To explore the function of PRKCB in CC cell growth, we performed CCK-8
   experiments and confirmed that overexpression of PRKCB in SiHa cells
   inhibited tumor cell proliferation (Fig. [289]7g, h).
   PRKCB-overexpressing cells formed smaller tumors than the control cells
   in xenograft models (Fig. [290]7i–k). It is worth noting that
   overexpression of PRKCB significantly enhanced the killing effect of
   radiation and induce G2/M cycle arrest in SiHa cells, implying that
   PRKCB may enhance radiosensitivity by regulating cell cycle progression
   (Fig. [291]7l, m). Altogether, our data indicated that PRKCB could be a
   promising biomarker for prediction of radiotherapy efficacy and may
   play a tumor suppressor role in CC.

Discussion

   Comprehensive genomic and transcriptomic analysis of CC has
   significantly enhanced our comprehension of the molecular events of
   this malignancy^[292]11–[293]15,[294]62. Unlike previous CC proteomic
   studies^[295]63,[296]64, this study presents global genomic,
   transcriptomic, proteomic, phosphoproteomic and acetylproteomic data,
   offering additional and more comprehensive insights into the clinical,
   biological, and therapeutic dimensions of CC. Our integrated analysis
   provided a global view of the multi-omic changes associated with HPV,
   revealed functional PTMs on the key proteins such as FOSL2-K222
   acetylation involved in CC malignant proliferation, classified CC
   patients into three clinically relevant subgroups, and identified a
   candidate biomarker PRKCB and risk scoring model associated with
   radiotherapy response.

   Previous studies reported that HPV viral protein interacted with
   different epigenetic modifiers, including CREBBP, KAT2B,
   EP300^[297]65–[298]67. However, effects of viral genes on multi-omic
   layers in CC remain unexplored. Our investigation was conducted to
   examine the alterations in the profiles of core proteins and
   phosphorylated proteins, with a particular emphasis on the upregulation
   of H2AX and WEE1 phosphorylation in tumors expressing elevated levels
   of E6 mRNA. It has been demonstrated that the WEE1 inhibitor (AZD1775)
   can be considered as a potential radiosensitizer in CC^[299]68.
   Phosphorylation of the Ser 139 residue of H2AX, also known as γH2AX, is
   an initial cellular response to DNA double-strand breaks^[300]69.
   Phosphorylated H2AX is enriched in CC tumors with high expression of
   E6, which may be a consequence of HPV viral gene integration. We
   subsequently utilized high-quality MS data to uncover significant
   biological pathways and aberrant proteins that are closely associated
   with viral genes at the proteomic level, which is less studied in
   previous studies that primarily focused on genome and transcriptome.
   The integration of HPV viral genes usually results in the loss of viral
   E2 gene expression and a selection for the continued expression of E6
   and E7^[301]40–[302]42. Additionally, we observed a bimodal
   distribution in the mRNA levels of E2 and E5 in tumors, along with low
   or absent expression of the E2 gene. Then we clustered the samples into
   two groups with either high or low E2/E5 mRNA abundance. The patients
   undergoing radical radiotherapy, who exhibit low mRNA abundance of
   E2/E5 (HPV G_2), experience a worse PFS in comparison to patients with
   high mRNA abundance of E2/E5 (HPV G_1). We further analyzed the
   intrinsic motivation for the different prognosis from the perspective
   of aberrant proteins and phosphorylated proteins. These results not
   only expand and enhance the understanding that the deletion of the E2
   gene promotes tumor progression, but also innovatively reveal the
   impact of HPV E2/E5 gene deletion expression on the prognosis of CC
   patients.

   Acetyltransferases mediated protein acetylation is an essential protein
   PTM that regulates various biological processes, including oncogenesis
   and tumor progression. Previous studies have applied acetylome analysis
   in different cancers to demonstrate the function of acetylation
   protein^[303]70–[304]72, but few of them are related to CC. Here, we
   systematically described the acetylome characteristics in CC to
   identify the putative regulatory mechanisms. As a major
   acetyltransferase and transcriptional co-activator, EP300 has been
   indicated in many cancer types and plays important roles in cancer
   progression and poor prognosis^[305]71,[306]73. There are numerous
   instances where EP300 overexpression and inappropriate activation have
   been found to correlate with malignancy and drive cancer
   growth^[307]74–[308]76. Notably, small molecule inhibitors of EP300
   have also shown considerable inhibition of tumor progression in certain
   types of cancer^[309]77–[310]80. C646, a selective inhibitor of EP300,
   inhibited the proliferation and induced apoptosis of CC
   cells^[311]81.We found that acetyltransferase EP300, consistent with a
   previous study in CC^[312]82, was hyperacetylation in CC tissues.
   Previous studies revealed that autoacetylation of the EP300 HAT domain
   enhances its HAT activity^[313]50,[314]51. Therefore, hyperacetylation
   of EP300, especially in its HAT domain, may cause aberrate protein
   acetylation change to drive CC progression. This study discovered and
   confirmed that hyperacetylated EP300 drives cell proliferation and
   accelerates malignant through the FOSL2-K222 acetylation axis. These
   findings may offer a theoretical foundation for targeting EP300
   activity in the treatment of CC, but more precise mechanisms need to be
   investigated in future work.

   In this study, we classified CC patients into three TMT proteomic
   subgroups which provided therapeutic and biological insights. Subgroup
   2 consisted of sixty cases of squamous cell carcinoma and one case of
   adenosquamous carcinoma. However, the distribution of histological
   types of patients included in this study aligns with epidemiological
   statistics and is comparable to the TCGA CC patients with molecular
   subgroups^[315]14,[316]83,[317]84. Both TCGA CC subgroups and our
   proteomic subgroups suggest a correlation between the histological type
   and molecular classification. Protein enriched in Subgroup (such as
   MCM2, MCM3, MCM4, MCM5, MCM6, MCM7) were involved in DNA replication,
   indicating proliferative characteristics for these tumors. Compared
   with subgroup 1 and subgroup 2, Subgroup 3 associated with the best
   radiotherapy effect and showed the highest immune infiltration scores,
   proportions of monocytes and neutrophils, indicating that Subgroup 3
   patients might benefit the most from immunotherapy. Previous
   single-cell sequencing study has revealed radiochemotherapy-induced
   innate immune activation and MHC-II upregulation in CC^[318]85, which
   could further explain the best radiotherapy effect in Subgroup 3
   patients. Besides, the proteomic subgroups may also serve as a
   classifier to screen immune-benefiting populations and predict patient
   prognosis.

   Notably, we identified the biomarker PRKCB and built a risk-scoring
   model associated with the response to radiotherapy at the proteomic
   level through high-throughput screening. This finding was also
   confirmed in both TMT and DIA proteomic datasets. PRKCB was
   down-regulated in CC tumor tissues, and high PRKCB expression was
   positively correlated with better prognosis, consistent with previous
   findings reported in lung adenocarcinoma^[319]86. Other studies
   reported the implication of PRKCB in the regulation of mitochondrial
   integrity and oxidative phosphorylation^[320]87,[321]88. Overexpression
   of PRKCB in HEK293 cells led to a significant downregulation of
   autophagy, as assessed by the decrease of endogenous LC3-II
   levels^[322]89. PRKCB significantly enhanced the killing effect of
   radiation and induced G2/M cell cycle arrest in SiHa cells in our
   study. On one hand, we will elucidate the intricate molecular mechanism
   by which PRKCB enhances the sensitivity of CC to radiotherapy and its
   agonists. On the other hand, we plan to carry out multicenter cohort
   studies to facilitate the clinical implementation and utilization of
   PRKCB and risk scoring models, with the goal of enhancing the
   effectiveness of radiotherapy in treating locally advanced CC.

   This study bears the following limitations: 1) Only some of the tumors
   had matching NATs. Besides, although the current data correlates with
   patients’ radioresponse, however, we do not have tissue samples after
   radical radiotherapy. In the future, by expanding current primary
   treatment cohort and conducting a comparative analysis of samples
   before and after radiotherapy, it will be possible to find clues about
   important molecular events related to radioresponse. 2) The prognostic
   analysis of our current study was based on PFS data. And the analysis
   specifically focused on patients receiving radical radiotherapy, as
   surgery patients generally have a good prognosis. Extending the
   follow-up period in the future (up to 6 to 7 years post-treatment), we
   could consider including all patients and annotating patient prognosis
   from an OS perspective. 3) The proteogenomic data of tumors were
   collected from the same powdered sample in this study. This sample
   preparation methodology is good to keep the consistency when acquiring
   different omics data and to improve the accuracy of cross-omics
   analysis results, but it lost intra-tumor heterogeneity information in
   cellularity, especially the spatial structure information. Further
   integration of single-cell and spatial omics data will facilitate a
   more comprehensive analysis of the characteristics of CC.

   In summary, we conducted a comprehensive and integrative proteogenomic
   analysis of Chinese CC, resulting in a valuable public resource for
   exploring the landscape features of the CC genome, transcriptome, and
   proteome. Our proteogenomic findings extended the biology and clinical
   aspects of CC, and they could lead to clues as well as opportunities to
   improve the treatment of CC patients.

Methods

Clinical sample of PUMC-CC cohort

   Based on the sample selection parameters of TCGA CC cohort^[323]14,
   this study implemented the following criteria: 1) All tumors and NATs
   were collected before treatment, and were stored in liquid nitrogen
   within 30 min after isolation. 2) Each piece of tissue was sectioned
   and Hematoxylin and eosin (H&E) stained for tumor cellularity analysis
   before sequencing, and all H&E staining results were scanned and
   pathologically analyzed. 3) All tumor samples were completed
   independently by two pathologists to ensure histological accuracy and
   the NATs contained no tumor cells. 4) The percent tumor nuclei and
   percent necrosis were also assessed. Tumors with at least 50% tumor
   cell nuclei and less than 10% necrosis were reserved. All patients
   signed separate informed consent forms for sampling, research, and
   publication. This study was conducted in accordance with the guidelines
   and regulations set forth by the Ethics Committee of National Cancer
   Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking
   Union Medical College (approval No. 22/374-3576). The treatment-naive
   CC patients were recruited from Cancer Hospital Chinese Academy of
   Medical Sciences between June 2019 and July 2021. 139 tumors and 33
   NATs (30 NATs with paired tumors) were collected, and each sample was
   homogenized via cryo-pulverization and aliquoted to subsequent genomic,
   transcriptomic, proteomic, phosphoproteomic, and acetylproteomic
   analyses.

   Each sample was assigned a new research ID, and the patient’s name or
   medical record number used during hospitalization was de-identified.
   The clinical information of these enrolled patients were collected,
   including SCC, FIGO stage, tumor size, tumor grade, histology type, age
   diagnosis, HPV type, primary treatment, lymph node status and PFS
   information. PFS was defined as the time from the start of treatment to
   disease progression, relapse, or death. The follow-up deadline for
   these enrolled patients was July 2023. 1 patient did not receive
   antitumor treatment. The surgical patients generally have a good
   prognosis, no progression was observed in 62 out of the 73 surgical
   patients at the time of the follow-up cutoff. In addition, the standard
   treatment for locally advanced CC is cisplatin-based concurrent
   chemoradiotherapy, while radiotherapy alone is indicated in older
   patients or those with insufficient renal function^[324]90. Cisplatin
   remains the radiosensitizing agent for patients with locally advanced
   CC when used concomitantly with radiotherapy in the National
   Comprehensive Cancer Network guidelines^[325]91. In this study, 65
   patients (46.8% of the cohort) received primary treatment with either
   concurrent chemoradiotherapy (n = 54) or radiotherapy alone (n = 11).
   These patients, collectively referred to as radical radiotherapy
   patients^[326]92,[327]93, were combined for the PFS analysis.

   The 124 patients in the independent validation cohort were stage IIB to
   stage IVB CC patients who received concurrent chemoradiotherapy or
   radiotherapy alone at the Cancer Hospital of the Chinese Academy of
   Medical Sciences from November 2015 to December 2017. The
   clinicopathological parameters and FPS information for these cases were
   detailed in Supplementary Data [328]10.

Cell line

   SiHa and HEK293T cells were kept in Dr. Daming Gao’s lab. SiHa cells
   were cultured in MEM-NEAA with 10% FBS, 100 units of penicillin and
   100 mg/mL streptomycin. HEK293T cells were cultured in DMEM with 10%
   FBS, 100 units of penicillin, and 100 mg/mL streptomycin.

Proteogenomic workflow

   The proteogenomic research was carried out through the workflow shown
   in Supplementary Fig. [329]1a. To reduce the impact of intra-tumor
   heterogeneity on multi-omics analysis, tumors, and NATs were pulverized
   using the CryoPrepTM CP02 (Covaris) and then divided into two parts:
   the first part was snap-frozen in liquid nitrogen and stored at −80 °C
   for the following proteomic, phosphoproteomic and acetylproteomics
   analyses as well as DNA extraction; the second part was added to the
   RNAlater^TM Stabilization Solution (Invitrogen, AM7020) and kept in
   −80 °C, and then used for RNA extraction.

DNA/RNA extraction, WES, and RNA-seq

   Genomic DNA was extracted from tumors and NATs using Magnetic Universal
   Genomic DNA Kit (TIANGEN, DP705) according to manufacturer’s protocol.
   Matched blood DNA was also extracted using the above kit. The quality
   of isolated genomic DNA was verified by Qubit® DNA Assay Kit in Qubit®
   3.0 Flurometer (Invitrogen) and NanoDrop 2000 (Thermo Fisher
   Scientific) and the integrity was assessed by TapeStation (Agilent
   Technologies). A total amount of 0.2 μg DNA per sample was used as
   input material for the DNA library preparations. Sequencing library was
   generated using NEB Next® Ultra^TM DNA Library Prep Kit for Illumina
   (NEB) following manufacturer’s recommendations and index codes were
   added to each sample. DNA libraries were sequenced on the Illumina
   NovaSeq 6000 platform and 150 bp paired-end reads were generated. WES
   was conducted with a mean coverage depth of 156X (range: 112-253X) for
   tumors,157X (range: 125-216X) for NATs, and 126X (range: 83-205X) for
   blood samples.

   Total RNA was extracted and purified from fresh frozen tissues using
   the TRIzol® reagent (Invitrogen, 15596026CN). RNA integrity was
   assessed using the RNA Nano 6000 Assay Kit of the Bioanalyzer 2100
   system (Agilent Technologies). Briefly, mRNA was purified from total
   RNA using poly-T oligo-attached magnetic beads. Fragmentation was
   carried out using divalent cations under elevated temperature in First
   Strand Synthesis Reaction Buffer (5X). Paired-end libraries were
   sequenced on the NovaSeq 6000 Illumina platform, for each tumor and NAT
   sample, RNA-seq resulted in an average of 64.1 M and 64.4 M
   high-quality reads, respectively.

MS methods

Protein extraction and tryptic digestion

   For protein extraction, ~50 mg of cryo-pulverized tumors or NATs were
   homogenized in 500 μL lysis buffer (4% SDS, 0.1 M DTT, 0.1 M Tris-HCl,
   pH 7.6) and sonicated at 20% amplitude for 2 min (5 s on, 5 s off)
   using Ultrasonic Homogenizer (NingBo Scientz Biotechnology Co., LTD,
   JY92-IIDN). The proteins were then denatured and reduced at 95 °C for
   5 min. Lysates were centrifuged at 12,000 g for 10 min to remove the
   insoluble debris and protein concentrations of the clarified lysates
   were determined using tryptophan-based fluorescence quantification
   method^[330]94. Protein lysates (1 mg) were diluted to equal
   concentration and alkylated with iodoacetamide (IAA) for 45 min at room
   temperature in the dark. Then, protein lysates were precipitated by
   adding ice-cold acidified acetone/ethanol buffer with five times the
   volume of lysis buffer mixed and put them overnight at −20 °C.
   Precipitated proteins were collected by centrifugation at 18,000 g for
   40 min at 4 °C, washed with 1 mL ice-cold acetone and 1 mL 70% ethanol
   for 40 min at 4 °C. The proteins were then subjected to proteolytic
   digestion with sequencing grade modified trypsin (Promega) at 1:50
   enzyme-to-substrate ratio overnight at 37 °C. The resulting digests
   were acidified with FA to achieve a final volumetric concentration of
   1%, and then subjected to desalting using Waters Sep-Pak® Vac 1cc
   (50 mg) tC18 cartridges.

TMT 16-plex labeling of peptides

   Desalted peptides from each sample were labeled with TMTpro 16-plex
   reagents (Thermo Fisher Scientific). Tumors and NATs were co-randomized
   to 12 TMT sets. The NATs were labeled using channels that closely
   matched the paired tumors within the same TMT set. In addition, paired
   samples were evenly distributed among the 12 TMT sets to ensure a
   balance of NATs across all sets. A mixed sample was prepared by pooling
   an aliquot from 30 CC tumors and NATs and labeled with channel 134N in
   all of the TMTpro 16-plex sets as internal reference. For each TMT
   labeling experiment, dried peptides (250 μg) from each sample were
   dissolved in 250 μL 100 mM TEAB. 5 mg of TMT reagent was dissolved in
   205 μL anhydrous acetonitrile, and then 50 μL of each TMT reagent was
   added to the corresponding peptides. After 1-h incubation at room
   temperature, 12.5 μL 5% hydroxylamine was added to quench the labeling
   reaction for 15 min at room temperature. The labeled peptides were
   pooled, dried down via Speed-Vac, and subsequently desalted on a
   reversed-phase tC18 SepPak column (Waters).

Peptide fractionation

   In the TMT-based proteome analysis, the labeled peptides were pooled
   together to create a highly complex mixture. To reduce sample
   complexity and increase the depth of protein identification, high-pH
   reverse phase liquid chromatography (RPLC) was used for peptide
   fractionation, as previously described^[331]33. For each TMT set, about
   4 mg TMTpro 16-plex labeled peptides were fractionated using a
   4.6 mm × 250 mm Waters XBridge BEH300 C18 column with 3.5 μm size beads
   (Waters). Peptides were separated using an Agilent 1260 HPLC instrument
   via high-pH RPLC with solvent A (10 mM ammonium formate, pH 10) and a
   non-linear increasing concentration of solvent B (90% ACN, 10 mM
   ammonium formate, pH 10) at a flow rate of 0.7 mL/min. The 110-min
   separation gradient was set as follows: 1–5% B in 2 min; 5–25% B in 35
   min; 25–40% B in 43 min; 40–55% B in 6 min; 55–95% B in 3 min; 95% B
   for 4 min; 95–1% B in 1 min; 1% B for 16 min. Peptides were separated
   and collected every minute for a total of 96 fractions from 3 min to
   99 min, with fractions combined into 24 fractions by a stepwise
   concatenation strategy. 5% of each of the 24 fractions was allocated
   and dried down in a Speed-Vac for global proteome analysis. The
   remaining 95% sample was then utilized for phosphopeptides enrichment.
   In DIA-based proteome analysis, since the samples are analyzed
   individually, each sample is directly analyzed without prior
   fractionation.

Phosphopeptide enrichment

   High-Select Fe-NTA kit (Thermo Fisher Scientific, A32992) was used for
   phosphopeptide enrichment according to the manufacturer’s instructions.
   Briefly, the 24 fractionated peptides were dissolved in 200 μL 80%
   ACN/0.1% TFA and incubated with 50 μL Fe^3+-NTA agarose beads for
   20 min at room temperature. Then, the mixture was transferred into the
   filter tip (Axygen, TF-200-L-R-S) and clarified peptide flow-throughs
   with unbound peptides were collected by centrifugation. The resins with
   phosphopeptides were washed with 200 μL 80% ACN/0.1% TFA for 3 times
   and 200 μL H[2]O for 3 times. The bound phosphopeptides were eluted
   twice with 200 μL 50% ACN/5% NH[3]·H[2]O and dried down via Speed-Vac.
   All centrifugation steps above were conducted at 50 g at room
   temperature. The eluted peptides were desalted using C18 stage tips and
   dried down.

Acetylpeptide enrichment

   PTMScan Acetyl-Lysine Motif [Ac-K] Kit (CST, 13416) was used for
   acetylpeptide enrichment according to the manufacturer’s instructions.
   In brief, tryptic peptides from the flow-through of IMAC were
   concatenated into 4 fractions and dried down via Speed-Vac. The dried
   peptides were reconstituted in 1.4 mL of the immunoaffinity
   purification (IAP) buffer (50 mM MOPS/NaOH pH 7.2, 10 mM Na[2]HPO[4]
   and 50 mM NaCl) and incubated with freshly prepared acetyl-lysine motif
   antibody agarose beads for 2 h at 4 °C. After removing the supernatant,
   peptide-bound beads were washed 2 times with 1 mL of ice-cold IAP
   buffer followed by 3 times with 1 mL HPLC H[2]O. Then, acetylated
   peptides were eluted by incubation 2 times with 55 μL of 0.15% TFA at
   room temperature for 10 min. The eluted peptides were desalted using
   C18 stage tips and dried down.

LC-MS/MS analysis for TMT-based proteomics

   Peptides were resolved with 0.1% formic acid (FA) and separated on a
   nanoflow Easy nLC 1200 UHPLC system (Thermo Fisher Scientific) with an
   in-house packed 20 cm × 75 μm interbal diameter C18 column (3 μm
   ReproSil-Pur C18-AQ beads, Dr. Maisch GmbH). The column was heated to
   50 °C using a home-made column heater. The flow rate was set at
   300 nL/min with 0.1% FA in H[2]O (Solvent A) and 0.1% FA in 80%
   acetonitrile (Solvent B). The 120-min separation gradient was used for
   proteome analysis and set as followings: 2–5% B in 1 min; 5–27% B in 94
   min; 27–35% B in 15 min; 35–52% B in 3 min; 52–100% B in 1 min; and
   100% B in 6 min. For phoshoproteome analysis, the same LC and column
   setup was used except for a 70-min LC gradient (2–4% B in 1 min; 4–25%
   B in 51 min; 25–32% B in 8 min; 32–50% B in 3 min; 50–100% B in 1 min;
   and 100% B in 6 min). For acetylproteome analysis, the same LC and
   column setup was used except for a 180-min LC gradient (4–35% B in 152
   min; 35–45% B in 20 min; 45–100% B in 2 min; and 100% B in 6 min).

   For proteomic and phosphoproteome analysis, samples were analyzed with
   a Q-Exactive HF mass spectrometer (Thermo Fisher Scientific) equipped
   with a nanoflow ionization source. Data-dependent acquisition was
   performed using Xcalibur software in positive ion mode at a spray
   voltage of 2300 V. The MS1 spectra were measured with a resolution of
   120,000@m/z 200, an AGC target of 3e6, a maximum injection time of
   50 ms and a mass range of 300 to 1700 m/z. The data-dependent mode
   cycle was set to trigger MS2 scan on up to the top 15 most abundant
   precursors per cycle at an MS2 resolution of 60,000 @ m/z 200, an AGC
   target of 1e5, a maximum injection time of 120 ms, an isolation window
   of 0.7 m/z, an high collision dissociation (HCD) collision energy of
   30, and a fixed first mass of 110.0 m/z. The dynamic exclusion time was
   set as 40 s (30 s for phosphoproteome) and precursor ions with charge
   1, 7, 8 and >8 were excluded for MS2 analysis.

   For acetylproteome analysis, samples were analyzed with a Q-Exactive
   HF-X mass spectrometer (Thermo Fisher Scientific) equipped with a
   nanoflow ionization source. Data-dependent acquisition was performed
   using Xcalibur software in positive ion mode at a spray voltage of 1800
   V. The MS1 spectra was measured with a resolution of 120,000 @ m/z 200,
   an AGC target of 3e6, a maximum injection time of 50 ms and a mass
   range of 350 to 1700 m/z. The data-dependent mode cycle was set to
   trigger MS2 scan on up to the top 15 most abundant precursors per cycle
   at an MS2 resolution of 45,000 @ m/z 200, an AGC target of 1e5, a
   maximum injection time of 120 ms, an isolation window of 0.7 m/z, an
   HCD collision energy of 30, and a fixed first mass of 110.0 m/z. The
   dynamic exclusion time was set as 30 s and precursor ions with charge
   1, 7, 8 and >8 were excluded for MS2 analysis.

DIA analysis

   Unlabeled, digested peptides from each sample were separated on a
   nanoElute LC system (Bruker) with an analytical column (20 cm × 75 μm,
   1.6 μm C18 resin, IonOpticks). The column was heated to 50 °C using a
   column heater (Bruker). The flow rate was set at 300 nL/min with 0.1%
   FA in H[2]O (Solvent A) and 0.1% FA in acetonitrile (Solvent B). The
   90-min separation gradient was used for proteome analysis and set as
   followings: 2–22% B in 75 min; 22–37% B in 5 min; 37–80% B in 5 min;
   and 80% B in 5 min. Samples were analyzed using a timsTOF Pro mass
   spectrometer (Bruker) equipped with a nano-electrospray ion source
   (Bruker). Source capillary voltage was set to 1500 V in positive ion
   mode, and dry gas flow to 3 L/min at 180 °C. The DIA MS data were
   acquired using the diaPASEF method^[332]95 consisting of 14 cycles,
   which including a total of 28 mass-width windows (25 Da width, from 452
   to 1152 Da) with 4 mobility windows each, making a total of 56 windows
   covering the ion mobility range (1/K0) from 0.76 to 1.29 Vs/cm^2. The
   MS and MS/MS spectra were acquired from 100 to 1700 m/z. The TIMS-MS
   survey scan was acquired between 0.7 and 1.3 Vs/cm^2. The TIMS-MS
   survey scan was acquired between 0.7–1.3 Vs/cm^2 and 100–1700 m/z. The
   acquisition time of each PASEF scan was set to 100 ms with a near 100%
   duty cycle, which led to a total cycle time of around 1.59 s.

Database searching of MS data

   For TMT-based proteomics, all MS raw files were searched against the
   human Swiss-Prot database containing 20,360 sequences plus 277
   Swiss-Prot HPV protein sequences using MaxQuant (version
   1.6.17.0)^[333]96. TMTpro 16-plex based MS2 reporter ion quantification
   was chosen with reporter mass tolerance set as 0.003 Da. The purities
   of TMT labeling channels were corrected according to the kit LOT number
   (WC320807). The PIF (precursor intensity fraction) filter value was set
   at 0.5. Enzyme digestion specificity was set to Trypsin and maximum 2
   missed cleavages were allowed. Oxidized methionine, protein N-term
   acetylation, lysine acetylation, asparagine, and glutamine (NQ)
   deamidation were set as variable modifications. Carbamidomethyl
   cysteine was set as fixed modification. For phosphorylation data
   analysis, phospho (STY) was also chosen as a variable modification. The
   tolerances of first search and main search for peptides were set at 20
   ppm and 4.5 ppm, respectively. A cutoff of 1% FDR was applied at the
   peptide, protein, and site level. A minimum of 7 amino acids was
   required for peptide identification. For phosphosite and acetylsite
   localization, the localization probability >0.75 was considered as a
   confident identification. Contaminants, reverse sequences, and only
   considered identified sequences identified from the MaxQuant runs were
   removed except for KRT family proteins since KRT is known to be highly
   expressed in squamous cell carcinoma.

   For DIA proteomics, raw data files were analyzed using DIA-NN software
   (version 1.8.0)^[334]97. The Swiss-Prot database was used for a
   library-free search with precursor FDR set as 1%. Deep learning-based
   spectra and retention time prediction were enabled. The fragment m/z
   was set from 200 to 1800 and the peptide length was set from 7 to 30.
   Trypsin/P was enabled and maximum number of missed cleavages set to 1.
   N-terminal methionine excision was enabled and cysteine
   carbamidomethylation was set as a fixed modification. The median number
   of points measured across chromatographic peak is 7.

Somatic mutation calling and filtering

   WES sequencing reads after exclusion of low-quality reads were mapped
   to GRCh38 reference genome with BWA (Version: 0.7.17,
   [335]http://bio-bwa.sourceforge.net/)^[336]98, PCR duplicates were
   removed by Picard (version 2.27.2,
   [337]http://broadinstitute.github.io/picard/), the base quality score
   was recalibrated by the BaseRecalibrator tool from GATK (version
   v4.2.6.1, [338]https://software.broadinstitute.org/gatk/). For patients
   with blood WES data, somatic variants were detected using Mutect2 on
   tumor exome data and matched blood data. For those without matched
   blood data, the genetic variants database of the 1000 Genome Project
   was used as the panel of normal. Specifically, germline variants were
   filtered from the database of the 1000 Genome Project, NHLBI-ESP 6500
   Exome Sequencing Project, Exome Aggregation Consortium (EXAC), and
   Genome Aggregation Database (gnomAD). Further filtering was done to
   obtain high-confidence somatic mutations using the criteria: a minimum
   of 8X coverage, Variant Allele Fraction (VAF)
   [MATH: <mo>≥</mo> :MATH]
   5% and at least 3 variant supporting reads in the sample. Then,
   FilterMutectCalls. Annovar (version 2017 Jul 17,
   [339]https://annovar.openbioinformatics.org/en/latest/) was used
   functionally annotate genetic variants based on the annotation of the
   human genome (GRCh38), version 32 (Ensembl 98)^[340]99, and mutations
   in the non-coding regions (3′UTR, 5′UTR, Intron, gene intergenic etc.)
   were removed.

Analysis of significantly mutated genes

   The filtered mutations (including SNV and indel) were further used to
   identify significantly mutated genes by MutSigCV (version 1.41
   [341]http://www.broadinstitute.org/cancer/cga/MutSig) with default
   parameters. Genes with P < 0.05 were identified as significantly
   mutated genes^[342]100.

CNAs analysis

   CNAs were called using cnvkit (version 0.9.7,
   [343]https://cnvkit.readthedocs.io/en/stable/pipeline.html)^[344]101 on
   exome data with default parameters. All blood samples were generated to
   build the copy-number reference. Confident focal SCNAs across all tumor
   samples were identified by GISTIC2 with parameters amplifications
   threshold: 0.3, deletions threshold: −0.3, focal length cutoff: 0.5,
   value threshold: 0.25, genetic: yes, run broad analysis: yes, focal
   length cutoff: 0.5, confidence level: 0.99, arm peel: yes, joined
   segment size: 100, gene collapse method: extreme and max sample segs:
   3000. Significantly amplified and deleted chromosome arms were
   identified with a threshold of FDR < 0.25^[345]102.

Comparisons of frequently mutated genes between cervical squamous cell
carcinoma and adenocarcinoma

   The somatic mutation data called by Mutect from GDC TCGA CC were
   obtained from [346]https://xenabrowser.net/datapages/, and the cancer
   type information was retrieved from TCGA CC paper^[347]14. Fisher’s
   exact test was used to evaluate the statistical significance of the
   difference in mutation rates between patient groups.

RNA-seq data analysis

   After removal of adaptor contamination and low-quality reads,
   sequencing reads were aligned using STAR (version 2.7.2a) to human
   reference sequence (GRCh38 assembly) and featureCounts was used to
   produce a matrix of read counts across all genes based on the GENCODE
   gene annotation (version 32)^[348]103. An average of 68 million
   paired-end reads were sequenced per sample. The ratios of uniquely
   aligned reads exceeded 88% in all samples. Then, Transcripts Per
   Million (TPM) normalized read count of each gene was calculated based
   on the gene length and read count mapped to this gene. Subsequently,
   the Log[2] transformed TPM values were used for downstream analysis.

HPV detection

   After being aligned to hg38 reference genome using STAR, the unmapped
   RNA-seq reads were then aligned to the reference genome of 440 HPV
   subtypes. HPV detection and genotyping algorithm HPV-EM was used to
   identify the exact genotypic makeup of each sample^[349]34. Here
   subtypes with mapped read count less than 10 were not considered. The
   dominant infected HPV subtype was determined based on the criterion
   that its mapped read count was more than ten times higher than the sum
   of the other subtypes. For the three tumor samples (TN33, TB103-1, and
   TB98-1) lacking transcriptome data, HPV infection status was determined
   based on clinical detection records. HPV-EM was also utilized to
   measure HPV gene expression, providing read counts for HPV genes. HPV
   gene expression was subsequently normalized to TPM using the total read
   count aligned to the human genome.

Inference of cell type score

   ESTIMATE was used to infer the stromal and immune scores from the RNA
   expression data^[350]104. In addition, the signature-based method Xcell
   was used to dissect the relative infiltration levels of different
   immune cell types^[351]105.

Proteomic data analysis

Analysis of TMT quantitative proteomic data

   Here we used reverse-zMAP developed based on our previous MAP
   model^[352]106 to process cancer proteomics data generated using
   iTRAQ/TMT platforms with internal reference samples. Briefly, all the
   clinical samples were separately compared to the internal reference
   sample generated in the same MS run using a modified MAP algorithm and
   a rescaled Log[2] ratio of MS intensities, termed z-statistic, was
   calculated for each protein to describe its relative abundance change
   between the two samples. In details, for each MS run, we systematically
   performed pairwise comparison for each tissue sample against the
   reference sample and the Log[2] ratio of its protein intensities is
   calculated as
   [MATH:
   <msub><mrow><mi>M</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msub><m
   o>=</mo><msub><mrow><mi>L</mi><mi>o</mi><mi>g</mi></mrow><mrow><mn>2</m
   n></mrow></msub><mfenced close=")"
   open="("><mrow><mfrac><mrow><msub><mrow><mi>S</mi></mrow><mrow><mi>i</m
   i><mi>j</mi></mrow></msub></mrow><mrow><msub><mrow><mi>R</mi></mrow><mr
   ow><mi>i</mi><mi>k</mi></mrow></msub></mrow></mfrac></mrow></mfenced><m
   o>,</mo> :MATH]
   1

   in which
   [MATH:
   <msub><mrow><mi>S</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msub>
   :MATH]
   denotes the MS intensity of protein
   [MATH: <mi>i</mi> :MATH]
   in sample
   [MATH: <mi>j</mi> :MATH]
   and
   [MATH:
   <msub><mrow><mi>R</mi></mrow><mrow><mi>i</mi><mi>k</mi></mrow></msub>
   :MATH]
   represents its intensity in the reference sample of this MS run, i.e.,
   run k. Then, a similar sliding window analysis to that performed in MAP
   was applied to derive a linear model for the Log[2] ratios in each
   window.
   [MATH:
   <msub><mrow><mi>M</mi></mrow><mrow><mi>j</mi></mrow></msub><mo>=</mo><m
   sub><mrow><mi>μ</mi></mrow><mrow><mi>j</mi></mrow></msub><mo>+</mo><msp
   ace
   width="0.25em"></mspace><msub><mrow><mi>σ</mi></mrow><mrow><mi>j</mi></
   mrow></msub><mo>*</mo><mspace width="0.25em"></mspace><mover
   accent="true"><mrow><mi>q</mi></mrow><mo>^</mo></mover> :MATH]
   2

   Next, natural cubic spline fitting was separately applied to
   characterize the dependence of the fitted
   [MATH: <msub><mrow><mi>μ</mi></mrow><mrow><mi>j</mi></mrow></msub>
   :MATH]
   and
   [MATH:
   <msup><mrow><msub><mrow><mi>σ</mi></mrow><mrow><mi>j</mi></mrow></msub>
   </mrow><mrow><mn>2</mn></mrow></msup> :MATH]
   on the Log[2] protein intensity across all windows. Finally, for each
   protein,
   [MATH:
   <msub><mrow><mi>μ</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msub>
   :MATH]
   and
   [MATH:
   <msub><mrow><mi>σ</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msub>
   :MATH]
   were calculated based on the fitted natural cubic spline function,
   respectively, and then, similar to the z-statistic defined in MAP, we
   defined the z-statistic of protein
   [MATH: <mi>i</mi> :MATH]
   in sample
   [MATH: <mi>j</mi> :MATH]
   as
   [MATH:
   <msub><mrow><mi>Z</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msub><m
   o>=</mo><mfrac><mrow><msub><mrow><mi>M</mi></mrow><mrow><mi>i</mi><mi>j
   </mi></mrow></msub><mo>−</mo><msub><mrow><mi>μ</mi></mrow><mrow><mi>i</
   mi><mi>j</mi></mrow></msub></mrow><mrow><msub><mrow><mi>σ</mi></mrow><m
   row><mi>i</mi><mi>j</mi></mrow></msub></mrow></mfrac><mo>,</mo> :MATH]
   3

   in which the dependence of observed Log[2] ratios on the protein
   intensities was effectively mitigated and the contribution of
   systematic and technical errors was rescaled to follow the standard
   normal distribution across all samples. Thus, it enables a better
   characterization of relative protein expression abundance and can be
   reliably used for downstream analyses.

Analysis of DIA quantitative proteomic data

   For DIA proteomic data, protein expression profiles from all samples
   were normalized together, to minimize the impact of missing values, and
   we used the proteins that were detected in all samples to calculate the
   normalization factors. Since there are no internal reference samples in
   the DIA dataset, we take the geometric mean of all DIA proteomic
   profiles as a pseudo-reference sample. Then, a similar parallel
   pairwise comparison analysis to that performed on TMT data was applied
   to compare each DIA proteomic profile against the pseudo reference
   sample and calculate a z-statistic for each protein as the basis for
   downstream analysis.

Analysis of phosphoproteomic and acetylproteomic data

   The phosphoproteomic and acetylproteomic data were normalized using the
   median centering method across adjacent phosphorylation/acetylation
   sites. Then, similar to the above analysis, we also utilized a sliding
   window approach on the M-A plot to first calculate the median
   modification intensity change and average modification intensity within
   each window. Subsequently, we applied natural cubic spline fitting to
   fit the dependence between the median modification intensity change and
   the average modification intensity across all windows. Then, the Log[2]
   ratio of each protein’s modification intensities was adjusted by the
   estimated median modification intensity derived from the fitted natural
   cubic spline function. The median ratio of all the modification sites
   of each protein was calculated as a measure of its modification level.

Imputation of missing values

   DreamAI ensemble algorithm^[353]107 (implemented using the DreamAI R
   package, [354]https://github.com/WangLab-MSSM/DreamAI) was applied to
   impute the missing values in proteomic, phosphoproteomic and
   acetylproteomic data. Imputation was only performed on the proteins,
   phosphosites and acetylsites with a missing rate <50%.

Batch effect and data quality analysis of proteomic data

   The batch effect in TMT data after transforming the original MS
   intensities into z-statistics was assessed by performing unsupervised
   PCA. The leading PCs of the global proteomic data clearly separated the
   tumors from NATs, and no obvious batch effect was observed among the 12
   TMT batches. We also performed unsupervised PCA on the DIA proteomic
   data, and no obvious batch effect was observed among the 3 DIA batches,
   too.

Subgrouping analysis of transcriptomic, proteomic, phosphoproteomic and
acetylproteomic data

   For TMT proteomic data, we selected the foremost 3000 proteins
   demonstrating the highest variability across tumor samples. Among
   these, 1674 proteins were identified without any missing values across
   all tumor samples. Subsequently, consensus clustering was performed on
   this set of 1674 highly variable proteins using the
   ConsensusClusterPlus R package^[355]108. The detailed parameter
   settings were number of repetitions = 1000 bootstraps; pItem = 0.8
   (resampling 80% of any sample); pFeature = 1; distance = “euclidean”;
   and k-means clustering with up to 5 clusters.

   For phosphoproteomic and acetylproteomic data, top 3000 most variable
   phosphoproteins/phosphoproteins within tumor samples were selected
   based on the protein-level modification abundance matrix, and among
   them, phosphoproteins/phosphoproteins without missing values were used
   as input for the ConsensusClusterPlus R package to perform sample
   clustering, with the same parameters applied as above.

   For RNA-seq data, we applied HyperChIP to model the mean-variance curve
   and subsequently utilized scaled variances that consider the
   mean-variance relationship for gene ranking^[356]109. Before
   constructing the model, we excluded genes with an average expression
   value of Log[2] (TPM + 1) < 2 in the tumor samples. Standardized
   z-score matrix of Log[2] (TPM + 1) for the top 3000 hypervariable genes
   were used to perform consensus clustering using identical parameters as
   above. In summary, the tumor samples were classified into three
   clusters at transcriptome and PTMs levels as well.

Differential protein expression/modification analysis between patient
subgroups

   ANOVA test was used to detect differentially expressed proteins,
   phosphosites and acetylsites among the three subgroups, with
   Bonferroni-adjusted P-values < 0.05 as the cutoff. Then, to detect
   proteins/PTM sites specifically expressed/modified in each subgroup,
   hierarchical clustering was applied to divide the differentially
   expressed proteins and PTM sites into 3 or 4 clusters using
   seaborn.clustermap (python module) with parameter metric =
   “correlation”; method = “average” and scipy.cluster.hierarchy.fcluster
   (Python module) with parameter criterion = “maxclust”. Pathway
   enrichment analysis of the proteins/PTM sites in each cluster was
   performed using GSEApy.enrichr (Python module) with the parameter:
   gene_sets = “KEGG_2021_Human”. Pathways with BH adjusted P < 0.05 were
   considered as significantly regulated.

Association between proteomic subgroup and clinical outcome

   To compare the survival outcomes among the three proteomic subgroups,
   log-rank test was employed. Then, Kaplan-Meier survival curves were
   generated using the KaplanMeierFitter function in Python to visualize
   the survival difference between three subgroups of patients.

Tumor-NAT samples differential expression analysis

   Differential expression analysis was performed for 30 paired CC tumors
   and NATs using the Student’s t test. P-values were adjusted using the
   BH method. Each feature was required to be non-missing in at least 50%
   of the paired samples. Proteins or phosphosites/acetylsites (collapsed
   into gene-level) differentially expressed between tumors and NATs (BH
   adjusted P < 0.01, Log[2] FC > 1 or < −1) were further used for
   over-representation analysis by WebGestalt^[357]110.

Multi-omics data integration

Analysis of mRNA-protein expression correlation

   Spearman’s correlation coefficient was used to measure the correlation
   between each mRNA-protein pair across all 132 tumors (using the RNA-seq
   and TMT proteomic data). In addition, a P-value was calculated for each
   correlation coefficient, which was then adjusted for multiple testing
   using the BH method. The correlation between a mRNA-protein pair was
   called significant if the adjusted P-value was found to be <0.05. A
   total number of 6089 mRNA-protein-matched genes were examined, with a
   median Spearman’s correlation coefficient of r = 0.40. Moreover, the
   mRNA and protein levels were found to be positively correlated for most
   (95.9%) mRNA-protein pairs, and 81.7% showed significant positive
   correlation.

Analysis of the cis/trans effect of CNAs

   The effects of SCNAs on mRNA, protein, and phosphoprotein abundance
   levels in either cis (within the same aberrant locus) or trans (remote
   locus) mode were visualized by multiOmicsViz (R package). Spearman’s
   correlation coefficients and the associated P values after adjustment
   for multiple testing were calculated for all possible
   CNA-mRNA/protein/phosphoprotein pairs. BH adjusted P < 0.01 was used to
   identify the significant correlations. A total of 130 tumor samples
   with both mRNA, WES, and proteomic data were included in this analysis.

Identification of CNG-Cis and CNL-Cis genes

   Among the 8035 genes located within amplification foci, 337 genes
   showed significantly higher abundance in tumor samples compared with
   NATs at both mRNA and protein levels (Wilcoxon rank-sum test, BH
   adjusted P < 0.05), and 227 genes also represented significant
   correlation of copy number with corresponding RNA levels, including 178
   that displayed concordant protein expression (Spearman’s correlation
   >0, BH adjusted P < 0.05). These 178 genes were identified as CNG-Cis
   genes. The same screening method was used to identify 28 CNL-Cis genes
   that were downregulated in tumor samples.

Analysis of the effects of arm-level CNAs

   Arm-level CNAs were detected using Log[2] (tumor/blood) = 0.3/−0.3
   (corresponding to arm-gain/loss, respectively) as the cutoff. Then, for
   each detected arm gain/loss event, the tumor samples were accordingly
   divided into two subgroups and the proteins whose expression was
   significantly associated with this event were identified using
   Student’s t test (with FDR < 0.05 as the cutoff). KEGG pathway
   enrichment analysis of the arm gain/loss-associated proteins supported
   by both TMT and DIA data was performed using GSEApy.enricher (python
   module) with the parameter gene_sets = “KEGG_2021_Human”^[358]111.
   Pathways with FDR < 0.05 were regarded as arm gain/loss-associated
   pathways.

Identification of radioresponse-related biomarkers

   For each candidate protein, we stratified the 62 radical radiotherapy
   CC patients into two groups using its median expression level as the
   cutoff. Then, Cox proportional hazards regression for PFS data was
   performed to identify radioresponse-related biomarkers of CC. Proteins
   detected in at least more than half of the samples are used here. The
   filter criteria for biomarker were: P-value of Cox proportional hazards
   regression <0.05; Log (HR) upper 95% < 0 and Log (HR) lower 95% > 0 for
   candidate favorable and unfavorable proteins, respectively. Finally, 37
   favorable and 55 unfavorable radioresponse-related biomarkers were
   identified, supported by both TMT and DIA proteomic data. Kaplan-Meier
   curves (log-rank test) were used to visualize the prognosis difference
   between radical radiotherapy patients with high and low protein
   expression.

Risk scoring model based on radioresponse-related biomarkers

   We identified 92 radioresponse-related biomarkers, comprising 55
   prognostically unfavorable proteins and 37 favorable ones. Unfavorable
   score and favorable score of each sample were calculated according to
   the following formulas.
   [MATH:
   <msub><mrow><mi>u</mi><mi>n</mi><mi>f</mi><mi>a</mi><mi>v</mi><mi>o</mi
   ><mi>r</mi><mi>a</mi><mi>b</mi><mi>l</mi><mi>e</mi><mspace
   width="0.16em"></mspace><mi>s</mi><mi>c</mi><mi>o</mi><mi>r</mi><mi>e</
   mi></mrow><mrow><mi>j</mi></mrow></msub><mo>=</mo><mfrac><mrow><msubsup
   ><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mro
   w><msub><mrow><mi>m</mi></mrow><mrow><mi>j</mi></mrow></msub></mrow></m
   subsup><mfenced close=")"
   open="("><mrow><msub><mrow><mi>Z</mi></mrow><mrow><mi>i</mi><mi>j</mi><
   /mrow></msub><mo>−</mo><mi>m</mi><mi>e</mi><mi>d</mi><mi>i</mi><mi>a</m
   i><mi>n</mi><mfenced close=")"
   open="("><mrow><msub><mrow><mi>Z</mi></mrow><mrow><mi>i</mi><mo>,</mo><
   mi>j</mi><mo>=</mo><mn>1</mn><mo>,</mo><mn>2</mn><mo>,</mo><mo>⋯</mo><m
   sub><mrow><mi>n</mi></mrow><mrow><mi>i</mi></mrow></msub></mrow></msub>
   </mrow></mfenced></mrow></mfenced></mrow><mrow><msub><mrow><mi>m</mi></
   mrow><mrow><mi>j</mi></mrow></msub></mrow></mfrac> :MATH]
   4
   [MATH:
   <msub><mrow><mi>f</mi><mi>a</mi><mi>v</mi><mi>o</mi><mi>r</mi><mi>a</mi
   ><mi>b</mi><mi>l</mi><mi>e</mi><mspace
   width="0.16em"></mspace><mi>s</mi><mi>c</mi><mi>o</mi><mi>r</mi><mi>e</
   mi></mrow><mrow><mi>j</mi></mrow></msub><mo>=</mo><mfrac><mrow><msubsup
   ><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mro
   w><msub><mrow><mi>p</mi></mrow><mrow><mi>j</mi></mrow></msub></mrow></m
   subsup><mfenced close=")"
   open="("><mrow><msub><mrow><mi>Z</mi></mrow><mrow><mi>i</mi><mi>j</mi><
   /mrow></msub><mo>−</mo><mi>m</mi><mi>e</mi><mi>d</mi><mi>i</mi><mi>a</m
   i><mi>n</mi><mfenced close=")"
   open="("><mrow><msub><mrow><mi>Z</mi></mrow><mrow><mi>i</mi><mo>,</mo><
   mi>j</mi><mo>=</mo><mn>1</mn><mo>,</mo><mn>2</mn><mo>,</mo><mo>⋯</mo><m
   sub><mrow><mi>q</mi></mrow><mrow><mi>i</mi></mrow></msub></mrow></msub>
   </mrow></mfenced></mrow></mfenced></mrow><mrow><msub><mrow><mi>p</mi></
   mrow><mrow><mi>j</mi></mrow></msub></mrow></mfrac> :MATH]
   5

   [MATH:
   <msub><mrow><mi>Z</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msub>
   :MATH]
   donate z-statistic of protein
   [MATH: <mi>i</mi> :MATH]
   in sample
   [MATH: <mi>j</mi> :MATH]
   .
   [MATH: <msub><mrow><mi>m</mi></mrow><mrow><mi>j</mi></mrow></msub>
   :MATH]
   and
   [MATH: <msub><mrow><mi>p</mi></mrow><mrow><mi>j</mi></mrow></msub>
   :MATH]
   represent the number of prognostically unfavorable or favorable
   proteins detected in sample
   [MATH: <mi>j</mi> :MATH]
   , respectively.
   [MATH: <msub><mrow><mi>n</mi></mrow><mrow><mi>i</mi></mrow></msub>
   :MATH]
   and
   [MATH: <msub><mrow><mi>q</mi></mrow><mrow><mi>i</mi></mrow></msub>
   :MATH]
   represent the number of samples in which unfavorable or favorable
   protein
   [MATH: <mi>i</mi> :MATH]
   was detected, respectively.

   Then we calculated the risk score for each sample, the risk scoring
   model based on above formulas was successfully constructed.
   [MATH: <msub><mrow><mi>r</mi><mi>i</mi><mi>s</mi><mi>k</mi><mspace
   width="0.16em"></mspace><mi>s</mi><mi>c</mi><mi>o</mi><mi>r</mi><mi>e</
   mi></mrow><mrow><mi>j</mi></mrow></msub><mo>=</mo><msub><mrow><mi>u</mi
   ><mi>n</mi><mi>f</mi><mi>a</mi><mi>v</mi><mi>o</mi><mi>r</mi><mi>a</mi>
   <mi>b</mi><mi>l</mi><mi>e</mi><mspace
   width="0.16em"></mspace><mi>s</mi><mi>c</mi><mi>o</mi><mi>r</mi><mi>e</
   mi></mrow><mrow><mi>j</mi></mrow></msub><mo>−</mo><msub><mrow><mi>f</mi
   ><mi>a</mi><mi>v</mi><mi>o</mi><mi>r</mi><mi>a</mi><mi>b</mi><mi>l</mi>
   <mi>e</mi><mspace
   width="0.16em"></mspace><mi>s</mi><mi>c</mi><mi>o</mi><mi>r</mi><mi>e</
   mi></mrow><mrow><mi>j</mi></mrow></msub> :MATH]
   6

   62 radical radiotherapy patients were divided into high-risk and
   low-risk groups based on the median of the risk scores. Next, we
   validated the prognostic predictive ability of the risk scoring model
   using the TCGA CC. Firstly, we performed z-score transformation on the
   FPKM matrix at the gene level. Subsequently, risk scores were
   calculated for TCGA samples. 291 TCGA CC samples were divided into
   high-risk and low-risk groups based on the median of the risk scores.
   Kaplan-Meier analysis (log-rank test) was utilized to illustrate the
   prognosis disparity between the two groups.

Identification of subgroup-specific kinases

   We applied KSEA to estimate the change in a kinase’s activity based on
   the collective phosphorylation changes of its identified substrates
   using the method proposed in KSEA app^[359]112. Here all
   Kinase-Substrate annotations from PhosphoSitePlus and from NetworKIN
   were used. To identify subgroup-specific kinases, we first performed a
   Student’s t test for each phosphosite between its phosphorylation
   levels in the samples of a specific subgroup and all other samples.
   Then, a kinase’s normalized KSEA score is calculated as:
   [MATH: <mi>K</mi><mi>S</mi><mi>E</mi><mi>A</mi><mspace
   width="0.16em"></mspace><mi>S</mi><mi>c</mi><mi>o</mi><mi>r</mi><mi>e</
   mi><mo>=</mo><mfrac><mrow><mrow><mo>(</mo><mrow><mover
   accent="true"><mrow><mi>t</mi></mrow><mo>¯</mo></mover><mo>−</mo><mover
   accent="true"><mrow><mi>p</mi></mrow><mo>¯</mo></mover></mrow><mo>)</mo
   ></mrow><msqrt><mrow><mi>m</mi></mrow></msqrt></mrow><mrow><mi>δ</mi></
   mrow></mfrac><mo>,</mo> :MATH]
   7

   in which
   [MATH: <mover accent="true"><mrow><mi>t</mi></mrow><mo>¯</mo></mover>
   :MATH]
   denotes the mean t-statistic of the known substrate phosphosites of
   this kinase,
   [MATH: <mover accent="true"><mrow><mi>p</mi></mrow><mo>¯</mo></mover>
   :MATH]
   represents the mean t-statistic of all phosphosites in the dataset, m
   denotes the total number of known substrate phosphosites of this
   kinase, δ denotes the standard deviation of the t-statistics across all
   phosphosites in the dataset. Subsequently, the corresponding P-value is
   determined by assessing the one-tailed probability of having a more
   extreme score than the one measured based on the normal distribution.
   Finally, subgroup-specific kinases were selected based on: protein
   expression levels were found to be significantly higher in a specific
   subgroup than in other tumor samples (Student’s t test,
   P-value < 0.05); P-value of KSEA score < 0.05.

Pathway enrichment of HPV gene-associated proteins

   Spearman’s correlation coefficient was calculated between HPV E6/L1
   mRNA level and protein abundance of human genes, which was subsequently
   used to rank the human genes. Then, the resulting ranked gene list was
   used as input for the GSEApy.prerank (Python module) to perform pathway
   enrichment analysis using the following detailed settings: gene_sets =
   “KEGG_2021_Human”; min_size = 5; max_size = 1000; and permutation_num =
   1000. Pathways with FDR < 0.05 were regarded as HPV gene-associated
   pathways.

Subgrouping analysis based on HPV gene expression

   CC tumors were stratified into two subgroups based on the mRNA
   expression levels of HPV E2 and E5 genes by hierarchical clustering
   (using the Python function scipy.cluster.hierarchy.linkage with
   parameters method = “ward” and metric = “euclidean”). The resulting
   linkage matrix was then used as input for
   scipy.cluster.hierarchy.fcluster to create flat clusters, with the
   criterion parameter set to “maxclust” and “t = 2”. Then, we employed
   either Chi-square test or Fisher’s exact test to test the association
   between the detected subgroups and the clinical features of patients,
   depending on the number of categories in each clinical feature. For
   continuous clinical features, we used the Student’s t test. P-values
   were adjusted for multiple testing using the BH method.

Functional experiments

siRNA Interference

   The siRNAs were synthesized by GenePharma. All siRNA transfections were
   performed with Lipofectamine 2000 (Invitrogen, 11668019) at 50 nM final
   concentration according to the manufacturer’s protocol. The siRNA
   sequence information was shown in Supplementary Data [360]11.

Plasmids

   The HA-tagged coding sequence of human HATs was cloned into pCDNA3.1
   vector. The Flag-tagged coding sequence of human FOSL2-WT or the
   relevant FOSL2-K222R, FOSL2-K222Q mutant were cloned into the
   lentiviral vector pLEX-MCS-CMV-puro (Addgene) to generate corresponding
   expression plasmids. The Flag-tagged coding sequence of human PRKCB
   were cloned into pLVX-IRES-Neo (Clontech) vector.

Cell transfection and lentiviral production

   Cells were transfected with plasmid vectors using Lipofectamine 2000
   according to the manufacturer’s instructions. To generate the
   lentivirus, HEK293T cells were co-transfected with psPAX2, pMD2.G, and
   recombinant lentiviral vectors. Forty-eight hours after transfection,
   the lentiviral supernatants were harvested and passed through a 0.45 μM
   filter. The lentiviruses were added to media supplemented with 10 μg/mL
   polybrene (Beyotime, C0351-1 mL) to transduce SiHa cells following the
   manufacturer’s instructions.

Constructions of stable cell lines

   To establish FOSL2-WT or FOSL2-K222R, FOSL2-K222Q mutant SiHa cell line
   models, SiHa cells were infected with respective lentivirus and further
   selected with 0.6 µg/mL puromycin (Beyotime Biotechnology, ST551-10mg).
   To establish inducible PRKCB-overexpressing SiHa cell strains, SiHa
   cells were infected with pLVX-IRES-Neo-PRKCB lentivirus and further
   selected with 0.8 mg/mL G418 (Gibco, 10131035).

Cell proliferation assay

   For cell growth assays, experimental and control cells were plated in
   96-well plate (2*10^3 cells per well). For measurement, CCK-8 solution
   (Beyotime Biotechnology, C0039) at the final concentration of 10% was
   added to the wells, and absorbance at 450 nm was measured 2 h after
   incubation to represent the relative cell numbers.

Colony formation assay

   To evaluate the clonogenic capacity of experimental and control cells,
   500 cells were plated in six-well plate in triplicated. After 10–12
   days of continuous culture, the cells were fixed with methanol and
   stained with 1% crystal violet (Sigma, C0775). The colonies were
   counted, and each assay was repeated at least three times
   independently.

Cell cycle analysis

   For cell cycle assays, experimental and control cells were seeded in
   6-well plate (2*10^5 cells per well). The cells were collected by
   trypsinization and fixed in ice-cold 70% ethanol overnight at 4 °C. The
   cell cycle detection kit (Nanjing KeyGen Biotech, KGA512) and BD LSR II
   flowcytometer (BD Biosciences) were applied to detect cell cycle
   distribution. The cell cycle result profiles were analyzed using ModFit
   LT 3.1 (Verity Software House).

IHC

   The fresh tumors and NATs were fixed in 10% neutral buffered formalin,
   embedded in paraffin, and used to prepare sections that were 4 μm
   thick. The sections were deparaffinized, rehydrated, and then immersed
   in a 3% hydrogen peroxide solution for 10 min. For antigen retrieval,
   the sections were heated in citrate buffer (pH 6.0) at 95 °C for 25 min
   and then cooled at room temperature for 60 min. After each incubation
   step, the sections were washed with PBS (pH 7.4). Subsequently, the
   sections were incubated overnight at 4 °C with the anti-PRKCB antibody
   (Abcam, ab195039). Immunostaining was carried out following the
   instructions provided by the manufacturer (PV-9000 Polymer Detection
   System, Zhongshan Golden Bridge). The sections were counterstained with
   hematoxylin, dehydrated using graded ethanol, and sealed with neutral
   resin. Two investigators independently assessed the IHC staining of
   PRKCB on the sections.

Immunoblot and immunoprecipitation

   Cells were lysed in EBC lysis buffer (50 mM Tris HCl, pH 8.0, 120 mM
   NaCl, 0.5% NP-40) supplemented with protease inhibitors (Selleck
   Chemicals) and phosphatase inhibitors (Selleck Chemicals), followed by
   pulse sonication for 10 s. 30 mg total protein were separated by 10%
   SDS-PAGE gel and blotted with indicated primary antibodies. For
   immunoprecipitation (IP), the whole cell lysates (WCL) were
   immunoprecipitated with anti-Flag M2 agarose beads for 2 h in the
   presence of 2 μM TSA and 10 mM nicotinamide (NAM). The pellet was then
   washed with NETN buffer (20 mM Tris-HCl, pH 8.0, 100 mM NaCl, 0.5%
   NP-40, 1 mM EDTA) for four times and analyzed by immunoblotting or MS.

qRT-PCR

   Total RNA was extracted following the manufacturer’s instructions using
   the RNApure Tissue & Cell Kit (Cwbiotech, CW0560S). Subsequently, the
   isolated RNA served as a template for reverse-transcription reactions
   employing the HiFiScript cDNA Synthesis Kit (Cwbiotech, CW2569M).
   qRT-PCR analysis was conducted using the TB Green® Premix Ex Taq™ II
   (TaKaRa, RR820A) and CFX96 Real-Time System (Bio-Rad). β-actin was
   served as an internal control. The relative quantification of gene
   expression was analyzed by the 2^−△△Ct method. The primers used for
   qRT-PCR analyses are as following:

   WNT5A Forward: 5′-GCCAGTATCAATTCCGACATCG-3′,

   Reverse: 5′-TCACCGCGTATGTGAAGGC-3′

   FOSL2 Forward: 5’-CAGAAATTCCGGGTAGATATGCC-3′

   Reverse: 5′-GGTATGGGTTGGACATGGAGG-3′

   β-actin Forward: 5′-AGAGCTACGAGCTGCCTGAC-3′,

   Reverse: 5′-AGCACTGTGTTGGCGTACAG-3′

Xenograft tumorigenesis assay

   Five-week-old female BALB/c nude mice (HFK Bioscience) were maintained
   in pathogen-free conditions. All animals were acclimated for 1 week
   before experiments. 3*10^6 experimental and control SiHa cells in
   100 μL PBS were subcutaneously inoculated at the flank of randomly
   grouped nude mice. Tumor size was measured every 3 days with a caliper
   and tumor volumes were calculated by the formula:
   volume = (width)^2*length*0.52. The maximum tumor burden allowed by the
   ethics committee did not exceed 1500 mm^3. When the tumor burden
   reached 1500 mm^3, the mice were euthanized, and the tumors were
   dissected for further analysis. After the mice were euthanized, the
   transplanted tumors were weighed and photographed. All animal
   procedures were performed according to the guidelines approved by the
   National Cancer Center/National Clinical Research Center for
   Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking
   Union Medical College and adhered to the National Institutes of Health
   Guide for the care and use of laboratory animals.

Reporting summary

   Further information on research design is available in the [361]Nature
   Portfolio Reporting Summary linked to this article.

Supplementary information

   [362]Supplementary Information^ (2.7MB, pdf)
   [363]Peer Review File^ (7.4MB, pdf)
   [364]41467_2024_53830_MOESM3_ESM.pdf^ (99KB, pdf)

   Description of additional supplementary files
   [365]Supplementary Data 1^ (188.4KB, xlsx)
   [366]Supplementary Data 2^ (6.4MB, xlsx)
   [367]Supplementary Data 3^ (164.8MB, xlsx)
   [368]Supplementary Data 4^ (57.9KB, xlsx)
   [369]Supplementary Data 5^ (20.4KB, xlsx)
   [370]Supplementary Data 6^ (27KB, xlsx)
   [371]Supplementary Data 7^ (1.9MB, xlsx)
   [372]Supplementary Data 8^ (15.5KB, xlsx)
   [373]Supplementary Data 9^ (26.9KB, xlsx)
   [374]Supplementary Data 10^ (17KB, xlsx)
   [375]Supplementary Data 11^ (139.7KB, xlsx)
   [376]Reporting Summary^ (5.3MB, pdf)

Source data

   [377]Source Data^ (26.6MB, xlsx)

Acknowledgements