Graphical abstract

   graphic file with name fx1.jpg
   [47]Open in a new tab

Highlights

     * •
       A large-scale multi-omics study of hepatocellular carcinoma (HCC)
     * •
       Robustness and simplified panel for HCC proteomic subtypes
     * •
       Proteomic model to personalize prediction of sorafenib response for
       HCC precision therapy
     * •
       Workflow from generating large-scale omics datasets to drug testing
     __________________________________________________________________

   Xing et al. develop a workflow from generating large-scale omics
   datasets to drug testing. The robustness and simplified panel for
   hepatocellular carcinoma (HCC) proteomic subtypes, as well as effective
   prediction of sorafenib response, demonstrate that targeting
   multi-omics can improve the implementation of personalized therapeutic
   strategies for HCC.

Introduction

   Hepatocellular carcinoma (HCC) accounts for about 75%–85% of all
   primary liver cancer, with limited therapies and accompanied by poor
   prognosis.[48]^1 The majority of HCC patients are diagnosed at an
   advanced stage and can only receive systemic anti-tumor therapy, which
   is less effective and has a low response
   rate.[49]^2^,[50]^3^,[51]^4^,[52]^5^,[53]^6^,[54]^7 In particular, the
   prognoses of patients with HCC at the same clinical stage have been
   remarkably different, making it challenging to predict the outcomes.
   Therefore, accurate staging and subsequent appropriate personalized
   treatment are the keys to improving clinical outcomes of HCC.

   Extensive efforts have been devoted to identify new subtypes for HCC
   using genomic and transcriptomic
   data,[55]^8^,[56]^9^,[57]^10^,[58]^11^,[59]^12 but with limited impact
   on clinical decisions. Proteins ultimately perform the real biological
   functions and represent the real world of complex diseases; in
   particular, the important post-translational modifications (PTMs) of
   proteins cannot be informed from genomic and transcriptomic data, such
   as phosphorylation-related kinases that are important drug targets.
   Proteomics, which reflects systemic changes in functional executors,
   has been extensively applied for biomarker discovery and therapy
   selection in cancers[60]^13^,[61]^14^,[62]^15^,[63]^16^,[64]^17 and
   other complex
   diseases,[65]^18^,[66]^19^,[67]^20^,[68]^21^,[69]^22^,[70]^23 and
   provides unlimited potential for precision medicine. Recently, Jiang
   et al. performed a comprehensive proteomics study and stratified
   early-stage HCC patients into three subtypes with different clinical
   outcomes (SI, SII, and SIII), and they identified a therapeutic target
   sterol O-acyltransferase 1 (SOAT1) for subtype SIII with the worst
   prognosis.[71]^24 Subsequently, Gao et al. performed the proteogenomic
   characterization of hepatitis B virus (HBV)-related HCC and also
   reported three similar HCC proteomic subtypes; meanwhile, they further
   identified two prognostic biomarkers involved in metabolic
   reprogramming.[72]^25 These two studies opened the avenue to precision
   typing and prognosis determination of HCC based on proteomics.

   To apply HCC proteomic subtypes in clinical precision therapy, further
   verifying the robustness and universality of HCC proteomic subtypes and
   identifying signatures for HCC subtypes are crucial. In addition,
   occurrence and progression of HCC is an extremely complex process, and
   the proteomics also needs to be integrated with genomics and
   transcriptomics to panoramically and comprehensively understand the
   molecular changes of HCC from the genetic level to the protein level
   and even the PTM level to better predict prognosis, find personalized
   drug targets, and evaluate drug
   efficacy.[73]^26^,[74]^27^,[75]^28^,[76]^29^,[77]^30^,[78]^31^,[79]^32^
   ,[80]^33^,[81]^34^,[82]^35^,[83]^36^,[84]^37^,[85]^38^,[86]^39^,[87]^40

   Here, we comprehensively analyzed the genomic, transcriptomic,
   proteomic, and phosphoproteomic profiles of an independent HCC cohort
   to reveal and verify the robustness and universality of the proteomic
   subtypes and have developed a simplified discriminating panel of
   proteomic subtypes using machine learning. Integrated multi-omics
   analysis further revealed the functional impact of the molecular
   alterations for different HCC subtypes. In addition, we identified
   actionable drug targets and corresponding drugs for different proteomic
   subtypes by integrating proteomics and phosphoproteomics and further
   assessed the therapeutic responses in patient-derived cells (PDCs).
   Overall, our study highlights the importance of proteomics and
   phosphoproteomics in clinical decisions for HCC patients, especially in
   precise prognosis evaluation and personalized therapeutic strategy
   selection.

Results

Overview of the study

   Proteomics-based precision typing of HCC leads us to the era of
   precision medicine. Regarding the robustness and universality of HCC
   proteomic subtypes, molecular characterization and a simplified panel
   for clinical application are the main obstacles to further advancement
   of proteomics to the clinic. To this end, we performed multi-omics
   profiling of primary tumor tissues (T) and paired para-tumor tissues
   (N) from 160 HCC patients (proteomics, N = 160; phosphoproteomics, N =
   132 from 160; whole-exome sequencing [WES], N = 58 from 132;
   transcriptome sequencing [RNA_Seq], N = 57 from 58) ([88]Figure S1A and
   [89]Table S1). HCC subtypes were identified by proteomic analysis based
   on data-independent acquisition (DIA) ([90]Figure S1B) and validated in
   three independent publicly available external validation sets to
   confirm its robustness and universality. Machine learning was used to
   develop a simplified panel for discriminating HCC proteomic subtypes
   for facilitating clinical applications. Furthermore, to explore the
   molecular characteristics and screen the potential clinically
   actionable drugs for different HCC proteomic subtypes, integrated
   multi-omics analysis was performed, especially kinase abundance and
   kinase-substrate regulatory network analysis. Finally, 26 PDC models
   were established to investigate the personalized response to sorafenib
   in different proteomic subtypes, and a sorafenib therapeutic effect
   prediction model was constructed and validated by machine learning to
   provide precise clinical decisions for patients who should undergo
   sorafenib treatment.

Proteomic characterization identified three HCC subtypes

   In total, we identified 6,512 proteins (false discovery rate <0.01)
   across all 152 paired samples that passed quality control, accounting
   for 76.33% of the hybrid spectral library ([91]Figures S1C and S1D;
   [92]Table S2). The high stability of the mass spectrometry (MS)
   platform (median Pearson’s correlation coefficient of 0.954 in
   data-dependent acquisition [DDA] mode, 0.976 in DIA mode) resulted in
   high quantitative stability and accuracy of HCC samples
   ([93]Figures S1E and S1F). As expected, the coefficient of variation
   values were higher in T than that in N (6.37% vs. 5.37%)
   ([94]Figure S1G), and the T and N were distinctly distinguished
   ([95]Figures S1H and S1I), highlighting the high inter-tumor
   heterogeneity of HCC, consistent with previously reported
   studies.[96]^24 These results strongly affirmed the high quality of our
   proteomic data.

   Unsupervised consensus clustering identified three proteomic subtypes
   in 152 HCC patients, which were designated as SI (n = 49), SII (n =
   47), and SIII (n = 56) ([97]Figures 1A and [98]S2A–S2C). Furthermore,
   the three proteomic subtypes significantly differed in overall survival
   (OS) and recurrence-free survival (RFS) (log-rank test, p < 0.0001)
   ([99]Figure 1B). The three proteomic subtypes were further verified in
   two independent surgical specimen cohorts from Jiang et al. (N =
   101)[100]^24 ([101]Figure 1C) and Gao et al. (N = 159)[102]^25
   ([103]Figure 1D), and even in an independent biopsy specimen cohort
   from Ng et al. (N = 51)[104]^41 (p < 0.01) ([105]Figure 1E), which
   supported the reliability of our proteomic subtypes. Consistent with
   the current clinical consensus, the advanced BCLC stage (p = 0.041) and
   TNM stage (p = 0.032, Fisher’s exact test), positive α-fetoprotein
   (AFP) (p < 0.001), microscopic vascular invasion (MVI^+, p = 0.006),
   tumor low differentiation (p = 0.001), multiple tumor number (p =
   0.027, Fisher’s exact test), and absence of tumor capsule (p = 0.045,
   Fisher’s exact test) were more prominent in SIII than in SI, and SII
   was intermediate ([106]Figures 1A and [107]S2D–S2J; [108]Table S2). In
   addition, the proteomic subtypes were also authenticated as an
   independent prognosis indicator by multivariate analysis (hazard ratio,
   2.43; 95% confidence interval, 1.22–4.82; p = 0.011) ([109]Figure S2K).

Figure 1.

   [110]Figure 1
   [111]Open in a new tab

   Proteomic characterization identified three HCC subtypes

   (A) Consensus clustering of 152 HCC tumors. The associations of HCC
   proteomic subtypes with clinical characteristics are annotated in the
   upper panel (chi-squared test, ∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001).
   The heatmap depicts the relative abundance of signature proteins
   (log[2]-transformed). Each column represents a patient sample, and rows
   indicate proteins.

   (B) Kaplan-Meier (KM) curves of OS and RFS for each proteomic subtype
   (log-rank test).

   (C–E) KM curves of OS for each proteomic subtype in Jing et al.’s
   cohort (C), Gao et al.’s cohort (D), and Ng et al.’s cohort (E). The p
   values are calculated by log-rank test.

   (F) ssGSEA reveals the pathways that are significantly enriched in the
   three respective proteomic subtypes. The specific enriched signaling
   pathways of each subtype are analyzed and summarized into four
   clusters. C1, upregulated in SI only; C2, upregulated in both SI and
   SII; C3, upregulated in SII and SIII; C4, upregulated in SIII only.

   (G) Signal pathway changing trend in the four clusters (C1–C4).

   See also [112]Figures S1 and [113]S2.

Molecular features of three HCC proteomic subtypes

   To comprehensively characterize the molecular features of three HCC
   proteomic subtypes, the specific enriched signaling pathways of each
   subtype were analyzed and summarized ([114]Figure 1F). SI was
   characterized by the highest enrichment of metabolism-related pathways,
   which accounted for the majority of all enriched pathways and was
   highly consistent with Jiang et al.’s and Gao et al.’s (denoted as
   metabolism subgroup, S-Mb) results.[115]^24^,[116]^25 SII was a
   transitional subtype from SI to SIII with an increase in cell-growth-
   and immunity-related pathways but a decrease in metabolism-related
   pathways. The enrichment of immune-related pathways, such as antigen
   processing and presentation and Toll-like receptor, was consistent with
   the immune properties of SII in Jiang et al.’s and Gao et al.’s studies
   (denoted as microenvironment dysregulated subgroup, S-Me). SIII showed
   an increase in proliferation-, metastasis-, and immune-related pathways
   but almost no metabolic pathways, which validated the immune/metastatic
   characteristics of Jiang et al.’s SIII subtype and the proliferation
   characteristics of Gao et al.’s SIII subtype (denoted as proliferation
   subgroup, S-Pf), suggesting more aggressive characteristics in SIII
   than in both SI and SII ([117]Figure 1G and [118]Table S2).

   Thus, the molecular characteristics of the three proteomic subtypes in
   the three independent studies were highly consistent, indicating the
   stability and reliability of HCC proteomic subtypes. Meanwhile, the
   relevant or specific pathways of HCC significantly correlated with
   their proteomic subtypes, further highlighting their clinical
   implications.

The HCC proteomic subtypes were robust and universal

   To investigate the universality of HCC proteomic subtypes, we performed
   a cross-validation with our subtypes and the proteomic subtypes from
   Jiang et al.’s cohort[119]^24 and Gao et al.’s cohort,[120]^25 with
   patient number >100. These three cohorts belonged to different clinical
   centers with different inclusion/exclusion criteria and MS strategies
   ([121]Figure 2A). The subtype-specific signatures (namely, signatures)
   contained 761 proteins in our cohort, which was significantly lower
   than the signatures of Jiang et al. (1,269 proteins) and Gao et al.
   (1,274 proteins) ([122]Figure S3A and [123]Table S3). However, to
   increase the potential for clinical application, these signatures need
   to be further simplified, as it is generally believed that fewer
   proteins are more applicable for clinical translation. For
   cross-validation, firstly, clustering our DIA data with Jiang et al.’s
   and Gao et al.’s signatures from DDA data using unsupervised consensus
   clustering also generated three subtypes with different OS (p = 0.00015
   and p = 0.00047), and there was obvious concordance of patient
   allocation with our original subtypes (Jiang et al.: 85.5%; Gao et al.:
   82.9%), supporting the reliable data-acquisition and subtyping
   procedure in this study ([124]Figure 2B and [125]Table S3). In turn,
   the consistency was up to 79.2% in Jiang et al.’s cohort (p = 0.0051)
   and 78.6% in Gao et al.’s cohort (p < 0.0001) when clustering their
   data using our signatures ([126]Figure 2C). Furthermore, in the mutual
   verification of the signatures from each of Jiang et al.’s or Gao
   et al.’s cohort, the consistency between these subtypes and the
   original subtypes reached up to 88.1% in Jiang et al.’s cohort (p =
   0.05) and 77.9% in Gao et al.’s cohort (p = 0.00032) ([127]Figures 2D
   and [128]S3B; [129]Table S3). These results showed that the signatures
   from three proteomic studies were robust and universal.

Figure 2.

   [130]Figure 2
   [131]Open in a new tab

   The robustness and universality of HCC proteomic subtypes and their
   simplified discriminating panel

   (A) Workflow of cross-validation of proteomic signatures in three
   cohorts (Gao et al.’s cohort: N = 159; Jiang et al.’s cohort: N = 101;
   present cohort: N = 152).

   (B) Validation of Jiang et al.’s and Gao et al.’s signatures in our
   cohort. The left panel shows the KM curves of OS and RFS according to
   the proteomic subtypes (log-rank test). The right alluvial plot shows
   the comparison between these subtypes and the original subtypes.

   (C) Validation of our signatures in Jiang et al.’s and Gao et al.’s
   cohorts. The confusion matrices are provided.

   (D) Validation of Jiang et al.’s and Gao et al.’s signatures in each
   other’s cohorts. The confusion matrices are provided.

   (E) Concordance rate of three proteomic signatures in three cohorts.

   (F) Workflow for developing the simplified panel for discriminating
   proteomic subtypes.

   (G) Receiver-operating characteristic accuracy, sensitivity, and
   specificity of simplified panel for discriminating proteomic subtypes
   in the validation set.

   (H) Alluvial plot shows the comparison between proteomic subtypes
   identified by simplified panel and the original subtypes in the
   validation set. The predictor represents subtypes from the SP9, while
   the response represents the original subtypes from the full panel.

   See also [132]Figure S3.

   Generally, a strong concordance of three proteomic subtypes was
   validated in three independent cohorts (over 80% on average), with SIII
   having the highest concordance among the three subtypes in three
   independent cohorts, followed by SI, while SII showed the lowest
   consistency because of its transition properties ([133]Figure 2E). In
   addition, most of the inconsistencies by different signatures in the
   same cohort were found in SII or SIII because of the similar molecular
   features in SII and SIII as mentioned above ([134]Figures S3C–S3H).

A machine-learning-based simplified panel for distinguishing HCC proteomic
subtypes

   It is expressly necessary to reduce the number of signatures of HCC
   proteomic subtypes to facilitate potential clinical applications. For
   machine learning, our cohort and Jiang et al.’s cohort were used as the
   training set, and Gao et al.’s cohort was used as the independent
   external validation set, after removing the batch effect from these
   three cohorts ([135]Figures 2F and [136]S3I). The machine-learning
   analysis resulted in a simplified panel (SP9) comprising nine proteins
   (DCXR, EHHADH, ALDH4A1, ABAT, ALDH6A1, ALDH7A1, SULT2A1, SORD, and
   ACSM2B), which were all drastically downregulated in SIII
   ([137]Figure S3J) and significant for OS to stratify the patients
   ([138]Figure S3K). Mechanistically, they were involved with fatty acid
   metabolism (ACSM2B, EHHADH), amino acid metabolism (ALDH6A1, ALDH7A1,
   ALDH4A1, ABAT), glycometabolism (SORD, DCXR), and bile secretion
   (SULT2A1).

   SP9 showed high sensitivity and specificity to discriminate SI or SIII
   from the total HCC cohort ([139]Figures S3L and S3M). In the validation
   cohort, the sensitivity and specificity of SP9 for SI discrimination
   were 84.5% and 96.4%, respectively with an area under the curve (AUC)
   of 0.961; for SII discrimination the sensitivity and specificity were
   72.7% and 89.4% with an AUC of 0.881; and the highest sensitivity
   (95.8%) and specificity (95.5%) with the highest AUC of 0.988 were
   obtained in discriminating SIII from SI and SII patients, respectively
   ([140]Figure 2G). Furthermore, the subtypes using SP9 showed a
   similarity up to 84.3% with the original subtypes, in conformance with
   the similarity of the three subtypes in different cohorts
   ([141]Figure 2H). These results indicated that SP9 was stable and
   reliable for HCC subtyping, which provided a possibility for clinical
   application of HCC proteomic subtypes.

Mutation and copy-number variation landscapes of three HCC proteomic subtypes

   To further explore the molecular characteristics of these three
   proteomic subtypes, we performed WES (N = 58), RNA_Seq (N = 57), and
   phosphoproteomic (N = 132) analysis of paired T and N samples
   ([142]Figure 3A), whose clinicopathological characteristics were not
   significantly different from the total samples ([143]Figure S4A). The
   mutation landscape showed that TP53, TNN, OBSCN, CTNNB1, RYR2, PCLG,
   MUC16, RYR1, ARIDA1, and RB1 were the most frequently mutated genes in
   HCC ([144]Figures 3B and [145]S4B; [146]Table S4). CTNNB1 mutations
   were the most important feature in SI and SIII and were significantly
   associated with OS and RFS (p < 0.05) rather than WNT pathway
   alterations ([147]Figures 3C–3F and [148]S4C–S4G). These findings were
   consistent with previous reports, indicating the reliability of our
   results.[149]^42^,[150]^43

Figure 3.

   [151]Figure 3
   [152]Open in a new tab

   Proteogenomic and immune landscape of three HCC proteomic subtypes

   (A) Illustration of 152 paired HCC cases used in the individual omics
   experiments. The omics experiments are colored blue, and the tumor
   tissues and non-tumor tissues were detected in pairs.

   (B) WES-based genomic landscape of the three HCC proteomic subtypes.
   The top panel represents the mutation profile, the middle panel
   represents TMB and TNB levels, and the bottom panel represents the CNV.

   (C–E) Feature importance ranking by the random forest algorithm for SI
   (C), SII (D), and SIII (E). The feature with the highest rank of
   importance score indicates the highest association (either positive or
   negative) with the proteomic subtype.

   (F) KM curves for OS of HCC patients with CTNNB1 mutation or wild type
   (log-rank test).

   (G) Heatmap shows the immune cell populations of HCC patients belonging
   to different proteomic subtypes.

   (H) Principal component analysis plot of immune scores of immune cell
   populations based on proteomic data in three proteomic subtypes.

   (I) Proteome-based immune scores in three proteomic subtypes
   (two-tailed Wilcoxon test). Box plots show median (central line), upper
   and lower quartiles (box limits), and 1.5× interquartile range
   (whiskers).

   (J) Proteomics-based immune scores of anti-tumor immunity and pro-tumor
   immune suppression in three proteomic subtypes (two-tailed Wilcoxon
   test). Box plots show median (central line), upper and lower quartiles
   (box limits), and 1.5× interquartile range (whiskers).

   (K) Correlation between anti-tumor immunity and pro-tumor immune
   suppression based on proteome in three proteomic subtypes. Each point
   represents a patient sample, the lines represent the fitted curves of
   correlation in each subtype, and the shaded area represents 95%
   confidence interval. Pearson’s correlation coefficient (r) and p values
   are present in the table. The p values were calculated using Pearson’s
   correlation method.

   See also [153]Figure S4.

   Meanwhile, the copy-number variation (CNV) landscape showed significant
   differences among different HCC subtypes, with different regulatory
   genes during hepatocarcinogenesis and progression. 1q was predominantly
   co-amplified in SII and SIII, suggesting that they may co-regulate gene
   expression ([154]Figure 3B). Additionally, the tumor mutation burden
   (TMB) and tumor neoantigen burden (TNB) in individual patients revealed
   the heterogeneity of HCC, consistent with previous reports.[155]^44
   Nevertheless, there was no significant difference in TMB and TNB among
   the three proteomic subtypes ([156]Figure 3B and [157]Table S4).

Immune landscape of three HCC proteomic subtypes

   To investigate the immune landscape of three HCC proteomic subtypes, we
   estimated the abundance of 28 immune cell populations using the
   single-sample gene set enrichment analysis (ssGSEA) method based on the
   proteomic and transcriptomic data. The immune landscape of three HCC
   proteomic subtypes showed high heterogeneity, with distinctly
   distinguished abundance of immune cell populations ([158]Figures 3G,
   3H, [159]S4H, and S4I). The immune infiltration score was significantly
   higher in SIII than that in SI and SII (p < 0.05, median ± SD: SIII
   [162.74 ± 258.87] vs. SII [17.72 ± 154.67] vs. SI [−186.26 ± 163.07])
   ([160]Figures 3I and [161]S4J). In detail, SI was characterized by
   increasing CD56 dim natural killer cells compared to SII and SIII.
   Meanwhile the abundance of most immune cells, including but not limited
   to central memory CD4 T cells, natural killer T cells, regulatory
   T cells, and myeloid-derived suppressor cells, were significantly
   higher in SIII than in the other two subtypes ([162]Figure S4K). The
   enriched ssGSEA score of immune activation cells (p < 0.05, median ±
   SD: SIII [1.25 ± 3.05] vs. SII [0.32 ± 2.64] vs. SI [−0.73 ± 4.05]) and
   immune suppression cells (p < 0.05, median ± SD: SIII [0.57 ± 3.33] vs.
   SII [−3.10 ± 2.37] vs. SI [−0.48 ± 2.32]) were both significantly
   higher in SIII than in SI and SII ([163]Figures 3J and [164]S4L).
   Meanwhile, anti-tumor immunity and pro-tumor immune suppression were
   significantly positively correlated in SIII, indicating a co-existence
   of immune activation and immunosuppression in aggressive HCC
   ([165]Figures 3K and [166]S4M).

   In addition, the abundance of key molecules of the immune
   microenvironment were significantly different among the three proteomic
   subtypes. As expected, SIII displayed significant upregulation of human
   leukocyte antigen (HLA) molecules (including HLA class I and II
   molecules), immune checkpoints (such as PDCD1 [PD-1], CD86, and CD47),
   immune-related CT antigens (such as ADAM28, CEP55, SPAG9, and PBK), and
   cytokines (such as IL-16, CCL3, TGFB-3, TGFB-1, CCL21, CFS1, and IL-18)
   compared with the other two subtypes ([167]Figure S4N). These results
   suggested that SIII HCC patients might be more suitable for
   immunotherapy, although this would need to be validated in other
   prospective clinical trials.

Phosphoproteomic and kinase profiles of three HCC proteomic subtypes

   In phosphoproteomics analysis, the high inter-tumor heterogeneity in
   HCC that was shown in proteome was also clearly presented in
   phosphoproteome, and the characteristics of the phosphorylation
   modifications were similar to those in other reports ([168]Figures 4A
   and [169]S5A–S5G; [170]Table S5).[171]^45 The signaling pathways
   enriched in three proteomic subtypes at the phosphorylation level were
   also very similar to that at RNA level and protein level
   ([172]Figures 4B, 4C, and [173]S5H–S5J; [174]Table S6). Considering
   that protein kinases have been developed as viable drug targets for
   cancer therapies, we next investigated the kinase activity (reflected
   by the overall phosphorylation level of the substrates) in three
   proteomic subtypes and its correlation with the substrate
   phosphorylation level and kinase abundance. In total, 215 kinases were
   found in HCC, among which the activity of 75 kinases differed among
   three subtypes, and the enriched pathways also suggested the close
   association of kinase activity with the occurrence and development of
   HCC ([175]Figures 4D and 4E, [176]Table S5). Furthermore, the
   kinase-substrate correlation was significantly lower in SIII than that
   in SI and SII (p < 0.01, median ±SD: SIII [0.15 ± 0.22] vs. SII [0.17 ±
   0.22] vs. SI [0.16 ± 0.23]) ([177]Figures 4F and 4G). Nevertheless,
   there was no correlation between kinase abundance and kinase activity
   ([178]Figure 4H). Moreover, the clinical value of kinase activity and
   kinase abundance in HCC was also significantly different. For example,
   the kinase activity of BCKDK, CDK16, and MAP3K2, but not the kinase
   abundance, was significant for the prognosis of HCC ([179]Figures 4I,
   4J, and [180]S5K). Therefore, it is necessary to systematically analyze
   the abundance and activity of kinases by integrating proteome and
   phosphoproteome for drug screening and precise treatment.

Figure 4.

   [181]Figure 4
   [182]Open in a new tab

   Phosphoproteomic profile and kinase-substrate regulatory network of
   three HCC proteomic subtypes

   (A) Summary of the identification of phosphoproteins and kinases.

   (B) Pathway alterations of phosphoproteins in three proteomic subtypes.

   (C) Enriched functions of three proteomic subtypes by ssGSEA.

   (D) Kinase activation enrichment of differentially abundant
   phosphosites among three proteomic subtypes. Each column represents a
   patient sample, and each row indicates a kinase.

   (E) Kinase regulation-pathway network. Rhombus indicates signaling
   pathways, and roundness indicates kinases differentially expressed in
   phosphoproteomic data.

   (F) Kinase-substrate regulation networks in three proteomic subtypes.
   The edges represent Pearson’s correlation coefficient between kinases
   and the corresponding phosphosubstrates.

   (G) Distribution of Pearson’s correlation coefficients of
   kinase-substrate networks in (F) (two-tailed Wilcoxon test). Box plots
   show median (central line), upper and lower quartiles (box limits), and
   1.5× interquartile range (whiskers).

   (H) Correlation between kinase abundance and kinase activity. r
   represents Pearson’s correlation coefficient. The p values were
   calculated using the Pearson’s correlation method.

   (I and J) KM curves of OS (I) and RFS (J) for BCKDK activity (log-rank
   test).

   See also [183]Figure S5.

Multi-omics landscape of three HCC proteomic subtypes

   To explore the inter-omics correlation, we visualized the correlations
   of different omics using Pearson’s correlation coefficient. The results
   demonstrated that the correlation decreased along with the “central
   dogma,” which may be attributed to the enrichment of attenuated
   proteins, reflecting RNA splicing, protein synthesis, protein
   degradation, and even PTMs ([184]Figures 5A–5C, [185]S6A, and S6B).
   Notably, phosphorylation levels poorly correlated with CNV alterations,
   mRNA, and protein abundance (median Pearson’s r < 0.02), indicating the
   important role of the protein PTM in the development and progression of
   HCC ([186]Figures 5D and 5E). The inter-omics correlations differed
   among the three proteomic subtypes, especially the correlation between
   RNA abundance and protein abundance, while the correlation between
   protein abundance and phosphorylation level was not significantly
   different among the three proteomic subtypes, mainly due to the low
   overall correlation between them ([187]Figure S6C). Furthermore, we
   found that two-thirds of the significant mRNA-protein co-variates were
   positive while showing a higher number of negative regulatory
   interactions of protein phosphorylation, which were classified into
   three clusters ([188]Figures 5F and [189]S6D). Consistent with the
   characterization of proteomic subtypes, the SI-specific positive
   correlation cluster featured metabolism characteristics, and
   SIII-specific positive correlation clusters were significantly involved
   in proliferation and metastasis pathways ([190]Figures 5G and
   [191]S6E).

Figure 5.

   [192]Figure 5
   [193]Open in a new tab

   Integrated multi-omics analysis of three HCC proteomic subtypes

   (A) Correlation between WES, transcriptome, proteome, and
   phosphoproteome. The overlap patients of every two individual omics
   were analyzed.

   (B) Spearman’s correlation of CNV to mRNA and protein. Each dot
   represents a transcript/protein. Attenuated proteins are represented in
   red using a Gaussian mixture model with two mixture components.

   (C) Enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway
   analysis of the attenuated proteins.

   (D) Scatterplots depicting log[2] (T/N) of protein (x axis) and
   phosphoprotein (y axis) abundance. The red dots indicate positive
   correlations, and the blue dots indicate negative correlations. The p
   values were calculated using the Pearson’s correlation method.

   (E) Enriched KEGG pathways for negative-correlated proteins in (D).

   (F) Hierarchical clustering analysis map of significantly changed
   phosphosite-to-protein correlations among three proteomic subtypes.
   Pearson’s correlation coefficients between matched pairs of phosphosite
   abundances versus protein abundances were calculated.

   (G) Functional enrichment for significant phosphosite-to-protein
   correlations in each cluster.

   (H) Phosphosite-to-protein co-varying MCODE complexes/subnetworks of
   significantly changed phosphosite-to-protein correlations among three
   proteomic subtypes. Top five phosphosite-to-protein co-varying MCODE
   complexes/subnetworks are shown in different colors.

   (I–K) Top specific MCODEs in SI (I), SII (J), and SIII (K) positivity
   are combined with upstream kinases.

   See also [194]Figure S6.

   Furthermore, MCODE complex/subnetwork analysis showed the top five key
   co-varying phosphosite-to-protein MCODEs ([195]Figure 5H). We noticed
   that some kinases were shared within MCODEs; for example, AKT1 and
   CDK1, the top co-varying phosphosite-to-protein MCODEs, differed in the
   three proteomic subtypes. mRNA processing was the top MCODE in
   SI-positive cluster, and the kinase PRKCA involved in transcriptional
   regulation of upstream mRNA splicing was identified, consistent with
   previous reports ([196]Figure 5I).[197]^46 Furthermore,
   nucleocytoplasmic transport was the top MCODE in the SII-positive
   cluster, and the experimentally verified or predicted upstream kinases
   (CLK1, MAPKAPK2) were also mapped ([198]Figure 5J). In contrast,
   regulation of actin cytoskeleton was the top MCODE in the SIII-positive
   cluster, and the kinase AKT1 involved in the upstream regulation of
   actin cytoskeleton organization was also reported ([199]Figure 5K).

   Altogether, due to the high heterogeneity and patient specificity, the
   integrated analysis of multi-omics is more suitable for the
   comprehensive molecular characterization of HCC. In particular, the
   integrated analysis of proteomics and phosphoproteomics has promising
   prospects for drug screening and precision treatment of HCC.

Drug-response prediction for three HCC proteomic subtypes

   To identify potential clinical drugs for the three HCC subtypes, we
   screened drug targets of solid tumor within the DrugBank database using
   quantitative proteomic and phosphoproteomic data. A total of 38
   quantifiable drug targets (including 11 kinases) belonging to 33
   clinically actionable drugs were identified, with significant
   difference in abundance among three proteomic subtypes
   ([200]Figure 6A). We identified targets of bevacizumab (C1QC, C1QB,
   C1QA, FCGR3A, FCGR2C), dasatinib (RSC, HSPA8, PPAT, CSK, LYN, MAPK14,
   FRK), and neratinib (ORM1, ORM2), with higher abundance in SIII than in
   SI and SII. Notably, RAF1 and LYN, key targets of sorafenib and
   regorafenib, had significantly higher kinase abundance in SIII than in
   SI and SII. However, enzymes affecting the drug activity (UGT1A9,
   UGT1A1, UGT1A3) or drug metabolism (CYP2D6, CYP3A4) of sorafenib and
   regorafenib were identified as highly ranked targets in SI and SII but
   not in SIII. Furthermore, phosphorylated substrates of kinase RAF1,
   MAP2K1, MAP2K2, and LYN were also significantly higher in SIII,
   suggesting abnormal regulation of kinase activity in SIII. In addition,
   the inconsistency between kinase abundance and substrate
   phosphorylation suggested that the aberrant regulation of kinases was
   caused by both kinase abundance and activity ([201]Figure 6A).

Figure 6.

   [202]Figure 6
   [203]Open in a new tab

   Drug prediction and key drug target screening for three HCC subtypes

   (A) Phosphosubstrates of kinases with clinical available drugs and fold
   change at proteomic and phosphoproteomic levels for kinases and
   substrates, respectively. The top section displays the abundance of
   drug-targeting proteins across the three subtypes, with each row
   representing a drug-targeting protein and “k” labeling representing
   drug-targeting kinase. The middle section shows the phosphorylation
   site abundance of substrate for differentially abundance kinases among
   the three subtypes, with each row representing the phosphorylation
   sites of substrate. The bottom section displays the abundance of the
   protein where the phosphorylation site is located, with each row
   representing a protein. The color gradient represents the abundance of
   drug-targeting proteins (kinases), phosphorylation sites, and substrate
   proteins, with green indicating low expression and red indicating high
   expression. Red labels indicate statistically significant differences
   in abundance among the three proteomic subtypes.

   (B) Kinase activity of FDA-approved drug targets in three proteomic
   subtypes (two-tailed Wilcoxon test).

   (C) Pathways based on the selected phosphosubstrates and kinases, with
   relevant drugs shown by targets.

   (D) Prognostic risk scores of each target from FDA-approved HCC
   clinical drugs. The x axis indicates log[2]-based hazard ratio for each
   target (log-rank test); y axis indicates log[2]-based T/N fold change
   for each target (two-tailed Wilcoxon test).

   (E) KM curves of OS and RFS for RAF1. The p values were calculated by
   log-rank test.

   (F) Abundance of RAF1 among three proteomic subtypes (two-tailed
   Wilcoxon test).

   (G) Correlation between substrate RAF1 and upstream kinase activity
   among three proteomic subtypes (two-tailed Wilcoxon test).

   See also [204]Figure S6.

   We next analyzed the kinase activity of the drug targets and observed
   high heterogeneity in kinase activity among the three proteomic
   subtypes ([205]Figure S6F). Specifically, the kinase activity of EGFR
   was significantly increased in SII compared to SI and SIII (p < 0.05);
   the kinase activity of ERBB2 was significantly decreased in SIII
   compared to SI and SII (p < 0.05); while the kinase activity of MAP2K1
   was significantly increased in SIII compared to SI and SII (p < 0.05),
   highly consistent with the protein abundance of these kinases in the
   three subtypes. However, the kinase activity of RAF1 was significantly
   downregulated in SIII compared to SI and SII, which was completely
   opposite to the protein abundance ([206]Figures 6B and [207]S6G).
   Combined with the significant increase in protein abundance of BARF,
   MAP2K2, and MAPK1 in SIII, we speculated that the RAF-MEK-ERK signaling
   pathway was significantly upregulated in SIII ([208]Figure S6G).
   Additionally, the decrease in kinase activity of downstream mTOR and
   the increase of its substrate EIF4EBP1 phosphorylation in SIII
   suggested that combination therapy targeting both the RAF-MEK-ERK and
   mTOR pathways may be a more effective approach for SIII
   ([209]Figures 6C and [210]S6H). Therefore, a more comprehensive
   investigation of the signaling pathway regulatory networks could
   provide effective evidence for expanding treatment options beyond the
   current Food and Drug Administration (FDA)-approved HCC therapies.

   Furthermore, after stringent filtering processes, 19 out of 22 drug
   targets with clinically available drugs showed significant prognostic
   values for HCC ([211]Figures 6D and [212]S6I; [213]Table S7). Among
   these targets, RAF1 had the highest negative correlation with OS and
   RFS (p < 0.01) ([214]Figure 6E). The abundance of RAF1 was also
   significantly higher in SIII than in SI (p < 0.05) ([215]Figure 6F).
   Additionally, the correlation between RAF1 and its upstream kinases
   significantly increased in SII and SIII compared to SI (p < 0.01),
   suggesting the important regulating role of RAF1 in SII and SIII
   ([216]Figure 6G). These observations can explain the potential clinical
   benefits to SIII subtype HCC patients of RAF1-targeted therapies such
   as sorafenib, regorafenib, and dabrafenib.

SIII subtype has more specific drug sensitivity to sorafenib than SI

   To further explore the potential drug response of HCC patients in three
   proteomic subtypes, we investigated the therapeutic effects of
   sorafenib on 26 PDC models ([217]Figures 7A and 7B; [218]Table S1). The
   clinicopathological features of these PDC models were not significantly
   different from those of the proteomic subtype discovery cohort
   ([219]Figure S7A). We assigned these 26 PDCs to their corresponding
   proteomic subtypes by consensus clustering of their proteomic data
   ([220]Figure S7B).

Figure 7.

   [221]Figure 7
   [222]Open in a new tab

   Subtype-specific drug sensitivities and machine-learning-based efficacy
   prediction model for sorafenib

   (A) Workflow of pharmacological tests using PDC models.

   (B) Immunofluorescence of PDCs. Blue represents DAPI staining of the
   cell nucleus, green represents the distribution of GPC3 on the cell
   membrane, and red represents the localization of α-fetoprotein (AFP)
   and albumin (ALB) in the cytoplasm. Scale bar, 20 μm.

   (C) Sorafenib sensitivity results of the PDC models in three proteomic
   subtypes (two-tailed Wilcoxon test). Box plots show median (central
   line), upper and lower quartiles (box limits), and 1.5× interquartile
   range (whiskers).

   (D) Sorafenib sensitivity results of the PDC models under different
   concentrations in three proteomic subtypes (two-tailed Wilcoxon test).
   Box plots show median (central line), upper and lower quartiles (box
   limits), and 1.5× interquartile range (whiskers).

   (E) Proportion of patients who achieved IC[50] response in three
   proteomic subtypes. IC[50], half-maximal inhibitory concentration.

   (F) Correlations between elastic-net-predicted and observed AUC in the
   validation set. Correlations were calculated by Pearson’s correlation
   method.

   (G) Regulation networks of selected protein features in sensitivity
   prediction model for sorafenib (Pearson’s correlation).

   (H) Correlations between enriched pathways and observed AUC in PDC
   models.

   (I) Enrichment of pathways associated with sorafenib sensitivity in
   three proteomic subtypes (two-tailed Wilcoxon test).

   See also [223]Figure S7.

   We then measured the drug response of each PDC to sorafenib by
   calculating the AUC of the dose-response curve after 4 days of
   treatment and compared the differences in response to sorafenib among
   the three proteomic subtypes ([224]Figure S7C). Overall, sorafenib
   effectively inhibited the proliferation of PDCs with a wide range of
   drug sensitivities for each subtype, indicating high heterogeneity of
   HCC PDCs ([225]Figure S7D). Notably, SII and SIII tumors showed high
   sensitivity to sorafenib while SI was mostly resistant (p < 0.05)
   ([226]Figures 7C and[227]S7D). In particular, the inhibition of PDC
   growth in SII and SIII was significantly stronger than in SI at
   sorafenib concentration of 32 μM or even higher ([228]Figure 7D).
   Moreover, the percentage of tumors achieving half-maximal inhibitory
   concentration (IC[50]) in SIII was significantly higher than that in SI
   ([229]Figure 7E), suggesting that SIII tumors may have more benefit
   from sorafenib. These findings demonstrated the strong potential of
   proteomic subtypes in predicting targeted drug sensitivity of HCC
   patients.

Machine-learning-based efficacy prediction model for sorafenib

   We developed an elastic net model to predict the responses to sorafenib
   using 16 tumors as the training set, while the remaining 10 tumors were
   regarded as the validation set. Remarkably, we observed high
   correlations (Pearson’s r^2 = 0.753, p = 0.0011) between the predicted
   and observed AUC for sorafenib ([230]Figure 7F). In total, 19 protein
   features were selected by elastic net in the prediction of drug
   response to sorafenib, with most showing negative correlations with
   sorafenib sensitivity ([231]Figure 7G). Notably, most of the key
   proteins identified in the prediction model have not been reported to
   contribute to sorafenib response, such as EPHX2, ENPP1, and AKR7A3,
   which need further attention in future studies.

   As shown in [232]Figure 7H, metabolism-related pathways were highly
   enriched in the tumors resistant to sorafenib, while cell cycle and
   mRNA surveillance pathways were highly enriched in tumors sensitive to
   sorafenib ([233]Figures 7H, 7I, and [234]S7E). These findings were
   highly consistent with the molecular characteristics of SI and SIII,
   further demonstrating the higher sensitivity of SIII and SII tumors to
   sorafenib than SI. Interestingly, herpes simplex virus 1 infection,
   Salmonella infection, and viral life cycle HIV-1 showed high
   correlations with sorafenib sensitivity, suggesting that viral or
   bacterial infection might enhance the sorafenib sensitivity of HCC
   patients due to the close relationship between viral or bacterial
   infection and the immune microenvironment, which may also affect the
   efficacy of sorafenib ([235]Figure S7E).[236]^47 Collectively, these
   results suggest that proteomic subtypes can serve as proxies for
   predicting the treatment efficacy of targeted therapies for HCC
   patients.

Discussion

   Currently, major advances and breakthroughs in precision medicine rely
   entirely on genomic analysis.[237]^48^,[238]^49^,[239]^50 A major
   challenge for proteomics-driven precision medicine is how to use the
   comprehensive proteome to identify subtypes of patients with shared
   biological factors that can be targeted for treatment. In this study,
   we identified three HCC proteomic subtypes characterized by metabolism,
   proliferation & immune, and proliferation & immune & metastasis,
   respectively. These findings were generally consistent with the
   proteomic subtypes of HCC reported by Jiang et al. and Gao et al.
   Furthermore, the cross-validation of three independent large-scale
   proteomics studies resulted in a stable and universal proteomic subtype
   for HCC, which was not limited to multiple centers, patient
   characteristics (BCLC stages and HBV infection), protein quantitative
   strategies (labeling and label-free), and data-acquisition mode of MS.
   These results demonstrated the reliability of the proteomic subtype
   strategies for HCC reported by Jiang et al. and Gao et al. as well as
   the feasibility of using the proteomics for HCC classification.
   Moreover, we developed a multi-centric and robust simplified panel
   based on machine learning for distinguishing HCC proteomic subtypes,
   suggesting that proteomics holds great promise in identifying
   HCC-subtype patients associated with different prognosis who might
   benefit from further clinical treatment.

   The integrated multi-omics analysis revealed alterations in mutation
   profiles, immune landscapes, and kinase-substrate regulation networks
   among three proteomic subtypes. SIII had stronger immune infiltration,
   suggesting that SIII HCC patients might derive potential benefits from
   immunotherapy. Therefore, a clinical trial of immunotherapy for SIII
   HCC patients could be attempted to test this speculation. Furthermore,
   the drug-targetable proteins identified through proteomic and
   phosphoproteomic data may provide evidence for expanding treatment
   selection beyond the current FDA-approved HCC therapies, which is
   promising for overcoming the limited availability of therapeutic drugs
   and their low response rates in HCC.

   Sorafenib remains the first-line drug for advanced
   HCC.[240]^2^,[241]^51 Based on the multi-omics dataset and PDC drug
   testing, we found that response rates to sorafenib were significantly
   higher in SII and SIII than in SI. We therefore hypothesize that SII
   and SIII patients may derive more benefit from sorafenib than SI
   patients. However, this hypothesis needs to be tested separately in a
   clinical trial. We further established a response prediction model for
   sorafenib based on machine learning. For patients who have lost the
   opportunity for surgery, we proposed a strategy that exploits the
   kinase-substrate regulation networks to effectively predict the benefit
   form sorafenib, regardless of whether tumors carry targets of
   sorafenib. Promisingly, the accumulation of multi-omics data combined
   with effective drug testing has the potential to establish a precise
   strategy for identifying the most appropriate drugs for HCC patients.

   In summary, we validated the proteomic subtypes of HCC in multi-centric
   datasets and developed a simplified panel for distinguishing these
   subtypes. Integrated multi-omics analysis revealed the functional
   impact of characterized molecular alterations in three proteomic
   subtypes. Effective prediction of sorafenib response demonstrated that
   targeting multi-omics could improve the implementation of advanced
   personalized therapeutic strategies for HCC. These findings provided
   important insights into the development of precision medicine for HCC
   patients and offered promising strategies to improve the selection of
   appropriate therapies for individual patients.

Limitations of the study

   A potential limitation of our study is primarily the sample size of HCC
   patients in the clinical cohort. Inclusion of a larger sample size
   could allow for more sufficient statistical analyses among the three
   proteomic subtypes, including prognostic and ssGSEA analyses.
   Especially for the construction of a sorafenib efficacy prediction
   model, expanding the sample size of the training set and validation set
   could greatly improve the reliability. Future studies in large
   prospective cohorts are warranted to validate the proteomic subtypes of
   HCC and evaluate the efficacy of the prediction model for consideration
   in clinical trials. Taken together, while our study has some
   limitations, it establishes a strong foundation for the investigation
   of precision medicine for HCC patients.

STAR★Methods

Key resources table

   REAGENT or RESOURCE SOURCE IDENTIFIER
   Antibodies
     __________________________________________________________________

   AFP Polyclonal antibody Proteintech Cat# 14550-1-AP;
   RRID: [242]AB_2223933
   Albumin Polyclonal antibody Proteintech Cat# 16475-1-AP;
   RRID: [243]AB_2242567
   glypican-3 (1G12) SANTA Cat# sc-65443;
   RRID: [244]AB_831376
   Goat anti-Mouse IgG (H + L) Highly Cross-Adsorbed Secondary Antibody,
   Alexa Fluor Plus 488 Invitrogen Cat# A32723;
   RRID: [245]AB_2633275
   Alexa Fluor® 546 goat anti-rabbit IgG (H + L) Life Technologies Cat#
   A11010;
   RRID: [246]AB_2534077
   Alexa Fluor® 546 goat anti-mouse IgG (H + L) Life Technologies Cat#
   A11003;
   RRID: [247]AB_2534071
     __________________________________________________________________

   Biological samples
     __________________________________________________________________

   Tumor with paired surrounding non-tumor tissues from primary HCC
   patients Mengchao Hepatobiliary Hospital of Fujian Medical University
   (Fujian, China) This paper
   Patient-derived cell (PDC) Mengchao Hepatobiliary Hospital of Fujian
   Medical University (Fujian, China) This paper
     __________________________________________________________________

   Chemicals, peptides, and recombinant proteins
     __________________________________________________________________

   Urea Sigma-Aldrich Cat# U1250
   β-Glycerophosphate disodium salt hydrate Sigma-Aldrich Cat# G9422
   Sodium orthovanadate Sigma-Aldrich Cat# S6508
   Sodium pyrophosphate tetrabasic decahydrate Sigma-Aldrich Cat# S6422
   Sodium dihydrogen phosphate dihydrate Sigma-Aldrich Cat# 04269
   Protease inhibitor cocktail tablets Roche Cat# 11697498001
   PhosSTOP Roche Cat# 04906837001
   Iodoacetamide Sigma-Aldrich Cat# I6125
   DL-Dithiothreitol Sigma-Aldrich Cat# D0632
   Acetonitrile, LC-MS Grade Thermo Fisher Scientific Cat# 51101
   Water, LC-MS Grade Thermo Fisher Scientific Cat# 51140
   Triethylammonium bicarbonate buffer Sigma-Aldrich Cat# 90360
   Sequencing Grade Modified Trypsin Promega Cat# V5111
   0.1% Formic acid in water Thermo Fisher Scientific Cat# 85170
   0.1% Formic acid in ACN Thermo Fisher Scientific Cat# 85174
   Pierce HeLa Protein Digest Standard Thermo Fisher Scientific Cat# 88329
   iRT Biognosys Cat# KI-3002-1
   Collagenase D Roche Cat# 11088866001
   Dnase I Roche Cat# 10104159001
   EBSS Gibco Cat# 14155063
   DMEM/F12 Gibco Cat# C11330500BT
   GlutaMax Gibco Cat# 35050061
   HEPES Gibco Cat# 15630106
   Pen/Strep Gibco Cat# 15140122
   B27 Gibco Cat# 12587010
   N2 Gibco Cat# 17502048
   Forskolin Sigma-Aldrich Cat# F3917
   Nicotinamide Sigma-Aldrich Cat# N0636
   N-acetyl-L-cysteine Sigma-Aldrich Cat# A9165
   recombinant human EGF PeproTech Cat# AF-100-15
   recombinant human FGF-10 PeproTech Cat# 100-26
   recombinant human HGF PeproTech Cat# 100-39
   [Leu[248]^15]-Gastrin I human Sigma-Aldrich Cat# G9145
   A83-01 Tocris Bioscience Cat# 2939
   Y-27632 MCE Cat# HY-10071
   Dexamethasone Sigma-Aldrich Cat# D1756
   Hoechst 33342 Beyotime Cat# C1028
   Sorafenib Selleckchem Cat# S7397
     __________________________________________________________________

   Critical commercial assays
     __________________________________________________________________

   EasyⅡProtein Quantitative Kit (BCA) TransGen Biotech Cat# DQ111
   High-Select™ Fe-NTA Phosphopeptide Enrichment Kit Thermo Fisher
   Scientific Cat# A32992
   Pierce High pH Reversed-Phase Peptide Fractionation Kit Thermo Fisher
   Scientific Cat# 84868
   QIAamp DNA Mini Kit Qiagen Cat# 51306
   Agilent SureSelect Human All Exon V6 kits Agilent Technologies Cat#
   5190-8864
   TransDetect Cell Counting Kit (CCK) TransGen Biotech Cat# FC101
     __________________________________________________________________

   Deposited data
     __________________________________________________________________

   DrugBank database NA RRID:[249]SCR_002700;
   [250]https://go.drugbank.com/drugs
   MSigDB database Broad Institute RRID: [251]SCR_002700;
   [252]https://data.broadinstitute.org/gsea-msigdb/msigdb/release/2022.1.
   Hs/c2.cp.kegg.v2022.1.Hs.symbols.gmt
   NetworKIN 3.0 Horn et al.[253]^46 RRID: [254]SCR_007818;
   [255]http://networkin.info
   PhosphoSitePlus Hornbeck et al.[256]^52 RRID:[257]SCR_001837;
   [258]https://www.phosphosite.org
   Proteomic data of this cohort (N = 152) This paper iPROX:
   IPX0005108000;
   ProteinXchange:
   PXD046519
   WES data of this cohort (N = 58) Cai et al.[259]^53 NGDC: HRA000045
   RNA_Seq data of this cohort (N = 57) Li et al.[260]^44 NGDC: HRA000464
   MSigDB c2 gene sets (version 6.2) NA
   [261]http://software.broadinstitute.org/gsea/msigdb/index.jsp
   Proteogenomic data of the Jiang et al. ’s cohort (N = 101) Jiang
   et al.[262]^24 PMID: [263]30814741
   Proteogenomic data of the Gao et al. ’s cohort (N = 159) Gao
   et al.[264]^25 PMID: [265]31585088
   Proteomic data of the Ng et al. ’s cohort (N = 51) Ng et al.[266]^41
   PMID: [267]35508466
     __________________________________________________________________

   Software and algorithms
     __________________________________________________________________

   HCC Proteomic Molecular subtypes and the SP9 This paper
   [268]https://doi.org/10.5281/zenodo.10053446
   Spectronaut (version 13.2) Biognosys Inc.
   [269]https://www.biognosys.com
   R (version3.6.3) R Core Team [270]https://www.r-project.orgs
   NIS-Elements (version 4.40) Nikon RRID: [271]SCR_014329
   TitanCNA (version 1.23.1) Ha et al.[272]^54
   [273]https://github.com/gavinha/TitanCNA
   Gephi (version 0.9.2) Bastian et al.[274]^55 RRID:[275]SCR_004293;
   [276]https://gephi.org/
   Cytoscape (version 3.9.0) Shannon et al.[277]^56 RRID:[278]SCR_003032;
   [279]https://cytoscape.org/
   ConsensusClusterPlus R package (version 1.50.0) Wilkerson
   et al.[280]^57 RRID:[281]SCR_016954;
   [282]http://bioconductor.org/packages/release/bioc/html/ConsensusCluste
   rPlus.html
   GISTIC (version 2.0.23) Mermel et al.[283]^58 RRID:[284]SCR_000151;
   [285]https://portals.broadinstitute.org/cgi-bin/cancer/publications/vie
   w/216
   NMF (version 2.23.0) Gaujoux et al.[286]^59
   [287]https://cran.r-project.org/web/packages/NMF/index.html
   Mutect (version 2) Broad Institute RRID:[288]SCR_000151;
   [289]https://portals.broadinstitute.org/cgi-bin/cancer/publications/vie
   w/216
   Genome Analysis Toolkit (GATK) (version 4.1.0.0) Broad Institute
   RRID:[290]SCR_001876; [291]https://software.broadinstitute.org/gatk/
   caret R package (version 1.50.0) Github
   [292]http://topepo.github.io/caret/index.html
   ANNOVAR (version 2017 Jul 17) Wang et al.[293]^60
   [294]http://annovar.openbioinformatics.org/en/latest/
   ggplot2 R package (version 3.3.6) Github
   [295]https://github.com/tidyverse/ggplot2
   [296]Metascape software (version 3.5) Zhou et al.[297]^61
   [298]https://metascape.org/gp/index.html#/main/step1
     __________________________________________________________________

   Other
     __________________________________________________________________

   Centrifugal filter Millipore Cat# UFC5010BK
   ACQUITY UPLC BEH C18 Column Waters Cat# 186002352
   nanoEase M/Z HSS T3 Column Waters Cat# 186008818
   Acclaim PepMap™ 100 C18 column Thermo Fisher Scientific Cat# 164946
   Cell Strainers Biologix Cat# 15-1070
   Glass bottom cell culture dish Biosharp Cat# BS-15-GJM
   96 Well cell culture plate Corning Cat# CLS3599
   SpectraMax M5e Molecular Device RRID: [299]SCR_020300
   Waters Acquity H-class UPLC system Waters Cat# ACQUITY UPLC H-Class;
   RRID: [300]SCR_022217
   EASY-nLC 1000 Thermo Fisher Scientific Cat# LC120; RRID:
   [301]SCR_014993
   Q Exactive Plus mass spectrometry Thermo Fisher Scientific Cat#
   IQLAAEGAAPFALGMBDK; RRID: [302]SCR_020556
   Zeiss LSM 780 Confocal Laser Scanning Microscope Carl Zeiss RRID:
   [303]SCR_020922
   [304]Open in a new tab

Resource availability

Lead contact

   Further information and requests for resources and reagents should be
   directed to and will be fulfilled by the lead contact, Xiaolong Liu
   (xiaoloong.liu@gmail.com).

Materials availability

   This study did not generate new unique reagents.

Data and code availability

     * •
       Raw data of proteome and phosphoproteome derived from HCC tissues
       have been deposited at iProX: IPX0005108000 and ProteomeXchange:
       PXD046519.[305]^62^,[306]^63 Raw data of genome and transcriptome
       derived from HCC tissues have been deposited at Genome Sequence
       Achieve of National Genomics Data Center (NGDC): [307]HRA000045,
       [308]HRA000464. Accession numbers are listed in the [309]key
       resources table.
     * •
       All analysis code has been deposited at Zenodo:
       [310]https://doi.org/10.5281/zenodo.10053446 and is publicly
       available as of the date of publication. DOls are listed in the
       [311]key resources table.
     * •
       Any additional information required to reanalyze the data reported
       in this work paper is available from the [312]lead contact upon
       request.

Experimental model and study participant details

Patients and tissue samples for multi-omics analysis

   A total number of 160 primary HCC patients obtained from Mengchao
   Hepatobiliary Hospital of Fujian Medical University in China were
   recruited for multi-omics analysis in this study (Figure S1A). The
   clinical and pathological characteristics of these HCC patients were
   summarized in [313]Table S1. The included HCC patients represented all
   major subtypes at different pathological stages, such as age, gender,
   tumor number, tumor size, tumor capsule, AFP concentration, HBV-DNA,
   PVTT, MVI, BCLC stage, TNM stage, cirrhosis, and differentiation. The
   median follow-up of this cohort for overall survival was 30.5 months,
   and recurrence-free survival had a median of 7.2 months
   ([314]Figure S1A and Table S1). This project was approved by the
   Institution Review Board of Mengchao Hepatobiliary Hospital of Fujian
   Medical University. Informed consent was obtained from each participant
   before the operation. The use of clinical specimens was completely in
   compliance with the “Declaration of Helsinki”.

Patient derived cell models of HCC

   To established Patient derived cell models of HCC, a total of 26 fresh
   HCC tumor tissue samples from patients who underwent surgical resection
   were collected, and the study protocol of clinical samples was approved
   by the Institutional Ethics Committee of Mengchao Hepatobiliary
   Hospital of Fujian Medical University. Tumor tissue acquisition was
   approved by the ethics committees of Mengchao Hepatobiliary Hospital of
   Fujian Medical University (Fuzhou, China) and agreed by each patient
   via written informed consent, and the whole procedure was carried out
   according to national and institutional regulations on experimental use
   of human tissues. The surgical tissue samples were enzymatically
   dissociated using digestive lysate (2.5 mg/mL Collagenase D, 1 mg/mL
   DNase I, 1×EBSS), and then the tumor cells from HCC tissues were
   collected by centrifugation at 500g for 3 min, followed by washing with
   washing buffer (1×DMEM/F12, 2mM GlutaMax, 10mM HEPES,
   1×Pen/Strep).[315]^64 PDCs were cultured in defined medium (1×DMEM/F12,
   2mM GlutaMax, 10mM HEPES, 1×Pen/Strep, 1×B27, 1×N2, 10μM forskolin,
   10mM Nicotinamide, 1.25 mM N-Ac, 50 ng/mL hEGF, 100 ng/mL FGF-10,
   25 ng/mL hHGF, 10 nM Gastrin, 5 μM A83-01, 10 μM Gastrin, 10 μM
   Y-27632, 3 nM Dexamethasone), and seeded in 96-well plates at 50000
   cells/well in quintuplicate for sorafenib (purchased from Selleckchem)
   treatment at 37°C in a humidified atmosphere with 5% CO[2]. In
   addition, the PDCs were inoculated in 15 mm confocal dishes (Biosharp
   Life sciences) with 1 × 10^5 cells in triplicate for
   immunofluorescence. The cells were cultured as described above.

Method details

Protein extraction and digestion

   Total protein was extracted from tissue samples by lysis buffer
   containing 9 M urea, 75 mM NaCl, 10 mM Tris-Hcl (pH8.0), 10 mM IAA,
   1 mM NaF, 1 mM β-glycerophosphate, 1mM Sodium orthovanatate, 10 mM
   Sodium pyrophosphate, 100 mM Sodium dihydrogen phosphate, 1 mM PMSF and
   protease inhibitors as described previously. Briefly, tissue was
   disrupted using a tissue homogenizer on ice for 3 min followed by
   centrifugation at 17000× g for 10 min at 4°C and then the supernatant
   was collected. The proteins were sequentially reduced with 10 mM
   dithiothreitol at 55°C for 30 min and alkylated with 50 mM
   iodoacetamide in the dark for 30 min at room temperature. Then, the
   protein concentration was determined using the BCA assay. Subsequently,
   100 μg of total protein was loaded in 10 kDa centrifugal filter tubes
   (Millipore), washed twice with 400 mL 100 mM triethyl ammonium
   bicarbonate (TEAB), and enzymatic digestion was carried out with
   trypsin at a concentration of 1:50 (w/w) using filter-aided sample
   preparation for 18 h at 37°C. Peptides were eluted by centrifugation at
   14,000g for 15 min and evaporated to dryness for LC-MS/MS analysis.

   To prepare samples for spectral library construction, each 20 samples
   were pooled into a 1 mg protein mixture (equally to a final of 16
   pools), and prepared in the same method as mentioned above
   ([316]Figure S1B).

Phosphopeptide enrichment

   The phosphopeptide enrichment was performed using High-Select Fe-NTA
   Phosphopeptide Enrichment Kit (Thermo Scientific) according to the kit
   manual. In brief, the peptide mixture from 1 mg proteins were dissolved
   with 200 μL binding buffer, then were added to a Fe-NTA spin column and
   incubated for 20 min at room temperature with end-over-end rotation.
   After 1 min of centrifugation at 1,000 g for 1 min, the column was
   washed for 4 times with wash buffer to remove nonspecific peptides.
   Finally, phosphopeptides were eluted with elution buffer and were
   immediately dried by speed-vac for MS analysis.

   To prepare samples for spectral library construction, each 40 samples
   were pooled into a 10 mg protein mixture (equally to a final of 8
   pools), and prepared in the same method as mentioned above
   ([317]Figure S1B).

High pH reversed-phase separation

   In order to increase the depth of protein or phosphopeptide
   identification for generating the spectral libraries, the pooled
   samples were fractionated by high-pH reverse phase liquid
   chromatography. Peptide mixtures were resuspended in FA/water (1/1000)
   for peptide fractionation. Offline basic RP UPLC fractionation was
   performed on a XBridge BEH C18 column (1.7 μm, 100 μm × 10 cm) using a
   Waters Acquity H-class UPLC system. Chromatographic separation was
   performed with 30 min gradient as following: 0–5 min, increase to 7% B;
   5–20 min, 7–35% B; 20–25 min, 35–80% B (mobile phase A: 0.1% FA (v/v);
   mobile phase B: 90% ACN, 0.08% FA (v/v), flow rate 300 nL/min). A
   fraction was collected every 30 s; to reduce the MS running time, the
   60 fractions were pooled (F01 + F13 + F25, F02 + F14 + F26 and so on)
   as described[318]^45 to a final of 12 samples ([319]Figure S1B). For
   each sample, 2 μg were taken out and dried by vacuum centrifugation for
   proteome analysis.

LC-MS/MS analysis

   Peptide fractions were analyzed on an EASY-nLC 1000 LC system (Thermo
   Fisher Scientific,Waltham, MA) coupled with an Orbitrap Exactive Plus
   mass spectrometry (Thermo Fisher Scientific, Waltham, MA). Briefly, the
   peptide was resuspended in mobile phase C (0.1% formic acid in water)
   and equal amounts of 11 indexed retention time (iRT) peptide standards
   (Biognosys) were spiked into each shotgun run in data-dependent
   acquisition (DDA) mode. And the peptides were separated onto the
   analytical column (1.8 μm, 75 μm × 25 cm) with a 120 min gradient at a
   constant flow rate of 300 nL/min (0–3 min, 4 to 7% of D; 3–79 min, 7 to
   20% D; 79–103 min, 20 to 32% of D, 103–105 min, 32 to 90% of D and held
   at 90% D for 15min mobile phase D: 0.1% Formic acid in ACN). MS was
   operated under a data-dependent acquisition mode. The spectra of full
   MS scan (m/z 300–1800) were acquired by Orbitrap mass analyzer at
   70,000 resolution for a maximum injection time of 60 ms with an AGC
   target value of 3e6. Up to 20 precursors were selected for MS2 analysis
   and fragmented by HCD using a normalized collision energy of 27% and
   analyzed using the Orbitrap at 17,500 resolution for a maximum
   injection time of 45 ms with AGC target value of 1e5. The dynamic
   exclusion was 20 s and the precursor ions with unassigned charge state
   as well as charge state of 1+, or superior to 8+ were excluded from
   fragmentation selection.

   As above, equal amount of iRT peptide standards were mixed into
   individual sample peptide for each shotgun run at DIA mode, and nano-LC
   MS/MS basic parameters were as the same as described above. In DIA
   mode, precursor ions were acquired using an MS1 master scan (m/z range:
   400–1200, max. injection time: 20 ms, AGC target: 3e6, resolution:
   70K), following 32 DIA scans for MS2 with two kinds of wide isolation
   window as follows: m/z range 400–1000 using an isolation window width
   of m/z 20, and m/z range 1000–1200 using an isolation window width of
   m/z 100. Each isolation window overlapped by 1 m/z.

Generation of proteomic and phosphoproteomic spectral library

   To generate an extensive spectral library, both DDA and DIA files were
   processed using Pulsar search engine inside Spectronaut with default
   settings. Reference FASTA files for human was downloaded from UniProt
   (release 2019-04, 23046 entries), combining with the fusion sequence of
   iRT. Enzyme specificity was set as trypsin/P. The maximum missing
   cleavage site was set as 2. The mass tolerances of precursor ion and
   fragment ion for peptides were set at 10 parts per million (ppm) and
   0.02 Da, respectively. The minimal peptide length was set at 7.
   Carbamidomethyl (C) was set as fixed modifications. Oxidation (M) and
   Acetyl (Protein N-term) were set as variable modifications. For
   phosphoproteomic analysis, phospho (STY) was also set as a variable
   modification. In additional, the false discovery rate (FDR) was set to
   1% at peptide spectrum matches, 1% at peptide precursor level and 1% at
   protein level.

DIA mode analysis to get proteomic and phosphoproteomic data

   In total, 304 proteome DIA files and 264 phosphoproteome DIA files
   passed the quality-control were searched using Spectronaut based on the
   hybrid spectral library of HCC proteome or phosphoproteome,
   respectively. Reference FASTA files from UniProt database were
   combining with the fusion sequence of iRT. The samples were grouped by
   T (HCC tumor tissue group) and N (paratumor noncancerous tissue group).
   Calibration was set to non-linear iRT calibration with enabled
   precision iRT. Identification was performed using 5% q-value cutoff on
   precursor and protein level, while the maximum number of decoys was set
   to a fraction of 0.1 of library size. Quantity was determined on MS/MS
   level using area of XIC peaks with enabled cross run normalization. For
   phosphoproteomic analysis, minor quantified (Peptide) grouping was set
   by modified sequence and PTM localization was activated and the
   probability cutoff was set to 0, in order to summarize phosphopeptide
   or phosphosite later.

Quality control of the MS platform

   To evaluate the performance of the MS system, the HeLa protein digest
   standard (Pierce) was measured every 10 samples from beginning to the
   end of the project as the quality-control standard. DDA analysis were
   performed all through the project, and DIA analysis were performed
   during the DIA process. The standard was analyzed using the same method
   and conditions as using in the HCC related tissue samples. A Pearson’s
   correlation coefficient was calculated for all quality-control runs in
   the statistical analysis environment R3.6.3.

Quality assessment of proteomic and phosphoproteomic data

   The two-component mixture model implemented in the R MCLUST package
   v.5.2 was applied to filter the samples with insufficient peptide
   counts and protein identifications. In total, 152 paired samples among
   the 160 total involved paired samples passed the quality-control and
   were used for further proteomic analysis. Furthermore, 132 paired
   samples among the 152 paired samples passed the quality-control and
   were used for further phosphoproteomic analysis. All samples that
   passed quality control showed good consistency in the proteome
   quantification and phosphoproteome quantification ([320]Figures S1F and
   [321]S5F). Furthermore, PCA (Principal Component Analysis) and C.V
   (Coefficient of Variance) distribution of protein quantification showed
   that there was no batch effect in proteomic and Phosphoproteomic data
   ([322]Figures S1G, S1I, [323]S5E, and S5G).

Pre-processing of proteomic and phosphoproteomic data

   In order to correct the initial loading volume for each sample,
   proteomic (304 samples and 6512 proteins) and phosphoproteomic (264
   samples and 54789 phosphopeptides) data from Spectronaut software were
   first log2-transformed, and then normalized using the median centering
   method. Glmnet Ridge Regression (GRR) of NAguideR
   ([324]https://www.omicsolution.org/)[325]^65 was used to impute missing
   values in tumor or paired non-tumor samples. Before missing value
   imputation, proteins and phosphopeptides with more than 50% missing
   data were taken out to make sure that each sample had enough data for
   imputation.

Consensus clustering for HCC in proteomic data

   For proteomic subtype identification, 1128 proteins from the training
   set with the top 1500 of MAD (median absolute deviation) and with
   expression in more than 80% of the tumor samples were selected for
   consensus clustering using the ConseususClusterPlus R package.[326]^57
   The parameters were as follows: maxK = 8, reps = 1,000 bootstraps,
   pItem = 0.8, pFeature = 0.8, clusterAlg = “kmdist”, distance =
   “euclidean”. The number of clustering was determined by the average
   pairwise consensus matrix within consensus clusters, the delta plot of
   the relative change in the area under the cumulative distribution
   function (CDF) curve, and the average silhouette distance for consensus
   clusters. To verify the reliability of the proteomic subtypes,
   consensus clustering was also performed in the validation set using the
   same parameters.

Identification of signatures for HCC subtypes

   Signatures for HCC subtypes were a group of proteins that meet the
   following criteria: (1) the Benjamini-Hochberg-adjusted p values in
   each subtype should be less than 0.01 compared to the other subtypes;
   (2) fold change of the protein abundance between proteomic subtypes was
   greater than 2 for upregulation or less than 0.5 for downregulation. We
   performed these analyses using the “Wilcoxon test” function in the
   R3.6.3 environment. In total, we identified the signatures containing
   761 proteins from the 3 HCC proteomic subtypes, and these signatures
   were then subjected to further analyses. To verify the reliability of
   the signatures, consensus clustering was also performed in all 152
   tumor patients using both the 1128 proteins and the signatures
   containing 761 proteins with the same parameters. The subtypes
   identified using 1128 proteins were taken as the true subtypes, and the
   results were compared using a confusion matrix.

Correlations between proteomic subtypes and clinical features

   The chi-square and Fisher-exact tests were performed to identify the
   correlations of proteomic subtypes with clinicopathologic feature.
   Survival curves were generated using the KM method, and the log rank
   test was applied to calculate differences between the curves. Hazard
   ratios (HR), 95% confidence intervals (CI) and figures were estimated
   by the survival and survminer or plotted by ggplot2 R packages.

   The univariable and multivariable Cox analyses were further used to
   evaluate the prognostic power of HCC proteomic subtypes. All
   statistical analyses were performed in R (version 3.6.3), and a
   significance level of 0.05 was used.

Pathway alterations of 3 HCC proteomic subtypes

   To explore the pathway alterations of 3 HCC proteomic subtypes, we
   performed single sample gene set enrichment analysis (ssGSEA) with 4026
   proteins that were identified in more over 50% of patients using the R
   package GSVA,[327]^66 and the enrichment scores were scaled by R. In
   this study, 186 canonical pathway gene sets derived from the KEGG
   database were obtained from the MSigDB database
   ([328]https://data.broadinstitute.org/gsea-msigdb/msigdb/release/2022.1
   .Hs/c2.cp.kegg.v2022.1.Hs.symbols.gmt). The clusterProfiler R
   package[329]^67 was used to perform pathway enrichment analysis of
   differentially abundant proteins or phosphoproteins. The statistical
   significance of pathways between 3 subtypes were analyzed by the
   “Wilcoxon test”, and p value less than 0.01 was considered as
   significant.

Cross-validation of HCC proteomic subtypes from multi-centers

   To verify the robustness and universality of HCC proteomic subtypes,
   the protein abundance matrix, signatures, proteomic subtypes, and
   prognostic information were collected from Jiang et al.’s and Gao
   et al.’s cohorts. 1269 signatures obtained from Jiang et al.’s cohort
   containing 101 patients and 9252 proteins, as well as 1274 signatures
   were obtained from Gao et al.’s cohort containing 159 patients and 6478
   proteins in total. The cross-validation was mainly focused on the
   two-by-two comparison of the 3 subtypes in 3 cohorts. For example,
   consensus clustering was performed using Gao et al.’s signatures and
   Jiang et al.’s signatures in our cohort, and they were further compared
   with our original subtypes based our signatures. When reclassifying
   samples in different cohort based on other signatures, the
   classification method in the original article related to the cohort was
   used. The validation results were presented in a confusion matrix, and
   the proportion of consistently assigned patients to the total patients
   was used to assess the consistency of the two subtypes. For patients
   with different assignments, survival analysis with two assignments by
   the KM method were required.

Simplified panel for discriminating proteomic subtypes

   To construct the simplified panel for discriminating proteomic
   subtypes, a total of 412 patients (152 patients from our cohort, 101
   patients from Jiang et al.’s cohort, and 159 patients from Gao et al.’s
   cohort) were used. Of these, our cohort and Jiang et al.’s cohort (N =
   253) were used as the training set for model training and parameter
   tuning, and Gao et al.’s cohort was used as an external independent
   validation set to evaluate the model performance (N = 159). The scale
   function in R was used to normalize 3 datasets to eliminate the batch
   effect. Then, the panel for discriminating proteomic subtypes was
   constructed and validated by the following steps. Firstly, a random
   sampling method was used to screen for differentially expressed
   proteins in 3 subtypes, a total of 50 replicates were performed, 40 of
   which reached the threshold (p value < 0.01, fold change >1.5) that
   were initially selected as differentially expressed proteins. After
   removing the proteins with correlations higher than 0.9 with other
   proteins, a total of 1133 differentially expressed proteins were
   selected in the training set. Secondly, for reducing the misleading
   impact of random fluctuations and correlations, Boruta algorithm from
   the Boruta R package was used to select the signatures. Then, 59 SI
   specifically expressed proteins and 88 SIII specifically expressed
   proteins were selected in total. Thirdly, the KNN (K-Nearest Neighbors)
   algorithm of the caret R package was used to construct the SI
   discriminating panel and SIII discriminating panel, respectively. And a
   5-fold cross-validation was performed to further reduce the number of
   signatures of the discriminating panel. The specific parameters are as
   follows: method = “cv”, 5, summaryFunction = twoClassSummary, and
   classProbs = TRUE for the trainControl function; method = “knn”,
   tuneLength = 30, and metric = “ROC” for the train function. Finally,
   9-protein simplified panel containing 4 proteins from the SI subtype
   and 6 proteins from the SIII subtype (1 protein was shared) were
   integrated to identify HCC proteomic subtypes. In addition, the
   simplified panel for discriminating proteomic subtypes was validated in
   an external independent validation set, and the accuracy, sensitivity
   and specificity were calculated to evaluate the panel performance.

DNA extraction and whole-exome sequencing

   Total DNA from HCC tissues and para-tumor tissues were extracted and
   subjected to DNA library preparation using QIAamp DNA Mini Kit (Qiagen)
   according to the manufacturer’s instructions. Whole-exome capture was
   carried out using Agilent SureSelect Human All Exon V6 kits. And
   sequencing was performed using the Illumina HiSeq X ten system
   (Annoroad Gene Tech. Co., Ltd). As described previously, qualified WES
   reads were aligned to hg19 human genome assembly (GRCh37) using BWA,
   and duplicates of all mapped reads were then marked and discarded using
   Picard.[330]^50

Transcriptome sequencing and mapping

   RNA was extracted from HCC tumor and para-tumor tissues, and the whole
   transcriptome sequencing (paired-end, 150 bp) was performed on the
   Illumina HiSeq X10 platform by Anoda Gene Technology (Beijing) Co. All
   high quality reads were aligned to GRCh37 with GENCODE gene annotations
   using STAR as described previously.[331]^44 The abundance of all genes
   was quantified using transcripts per kilobase per million mapped reads
   (TPM).

Somatic mutation detection, CNV, TMB calculation and neoantigen prediction

   Somatic mutations of HCC tumor were identified with Mutect2 in GATK
   (version 4.1.0.0) using paired peritumoral tissue samples as the
   control. Somatic mutations that meet the following criteria are
   retained for downstream analysis: ≥20×depth in both tumor and
   peritumoral samples; variant allele frequencies (VAF) ≥5% in tumor
   samples; VAF≤1% in peritumoral samples. CNVs were identified using
   TitanCNA R/Bioconductor package (version 1.23.1),[332]^54 and regions
   of the genome significantly amplified or deleted were identified by
   Gistic2.0 in “bzhanglab/gistic2”. BAM format files and the data of TMB
   and neoantigens were analyzed according to our previously published
   protocols.[333]^44^,[334]^53

Random forest algorithm for ranking the importance of mutations

   Random forest algorithm was applied to the WES data to identify the
   most important mutations associated with the proteomic subtypes. In
   brief, the gene mutation dataset and subtype tags were applied as
   input. The caret R package was used to find the best hyperparameter for
   the random forest model.

Immune microenvironment and immune infiltration analysis

   28 immune cell types and their feature gene panels were downloaded from
   Charoentong et al.’s study.[335]^68 Immune cell infiltration of each
   cell type was estimated by ssGSEA of GSVA R package. Then, the
   enrichment score was scaled to represent the relative level of
   infiltration. The estimate R package was also used to assess the
   overall immune infiltration of every tumor sample. The anti-tumor
   immunity and pro-tumor immune suppression were calculated from the
   enrichment score of infiltration cell types executing anti-tumor
   immunity (activated CD4 T cell, activated CD8 T cell, central memory
   CD4 T cell, central memory CD8 T cell, effector memory CD4 T cell,
   effector memory CD8 T cell, type 1 T helper cell, type 17 T helper
   cell, activated dendritic cell, CD56 bright natural killer cell,
   natural killer cell and natural killer T cell) and cell types executing
   pro-tumor immune suppressive functions (regulatory T cell, type 2 T
   helper cell, CD56 dim natural killer cell, immature dendritic cell,
   macrophage, MDSC, neutrophil, plasmacytoid dendritic cell).[336]^69

Correlation analysis of multi-omics data

   Correlations of multi-omics data were calculated for each gene/protein
   across the tumor samples using Pearson’s correlation coefficient,
   including CNV alterations and RNA abundance, CNV alterations and
   protein abundance, CNV alterations and phosphopeptide abundance, RNA
   abundance and protein abundance, RNA abundance and phosphopeptide
   abundance, as well as protein abundance and phosphopeptide abundance.
   Protein attenuation was calculated by the difference between the
   Pearson coefficient of the transcript correlation (correlation between
   CNV and transcript measurements) and the Pearson coefficient of the
   protein correlation (correlation between copy-number and protein
   measurements). Then proteins were classified according to their
   attenuation effect using a Gaussian mixture model with 2 mixture
   Components.[337]^70 Differentially expressed proteins or
   phosphopeptides between tumor and paired non-tumor samples were
   filtered using p values of less than 0.01 and fold change greater than
   2.

Phosphosite-to-protein co-variation analysis

   As for all proteomic and phosphoproteomic data from 132 HCC tumor
   samples, Pearson correlations for each protein and corresponding
   phosphorylation sites were also calculated in the 3 proteomic subtypes,
   where the upper quartile of all differences were considered as
   significantly co-regulated among subtypes. In total, we obtained 609
   proteins corresponding to 1,185 phosphorylation sites for further
   analysis. Functional enrichment and interactome analysis of these
   proteins were performed using Metascape.[338]^61 The top 5 MCODE
   complexes extracted from the interactome networks were selected and
   visualized by Gephi software. These significantly co-regulated proteins
   were classified into 3 classes by the k-means algorithm. Also, for each
   cluster, known or predicted upstream kinases of the phosphosites in the
   significantly co-regulated phosphosite-to-protein correlations were
   given based on PhosphoSitePlus (PSP)[339]^52 or NetworKIN 3.0
   (NetworKIN Score >1).[340]^46

Kinase activity prediction

   To estimate the kinase activity for each patient, enrichment analysis
   was performed by the ssGSEA method using the GSVA R package.[341]^71
   Kinase activity for each kinase was estimate using a kinase enrichment
   score, which was calculated based on the overall phosphorylation status
   of its all substrates.[342]^72^,[343]^73 To deduct protein levels and
   abundance in the para-tumor, we used the following formula to correct
   each phosphorylation site abundance:
   [MATH: <mrow><msub><mi>y</mi><mi>j</mi></msub><mo
   linebreak="badbreak">=</mo><mrow><mo
   stretchy="true">(</mo><mrow><msub><mi>T</mi><mi>j</mi></msub><mo>−</mo>
   <mi>T</mi></mrow><mo stretchy="true">)</mo></mrow><mo>−</mo><mrow><mo
   stretchy="true">(</mo><mrow><msub><mi>N</mi><mi>j</mi></msub><mo>−</mo>
   <mi>N</mi></mrow><mo stretchy="true">)</mo></mrow></mrow> :MATH]

   “
   [MATH: <mrow><mi>j</mi></mrow> :MATH]
   ” represented a single phosphorylation site; “
   [MATH: <mrow><msub><mi>y</mi><mi>j</mi></msub></mrow> :MATH]
   ” was the abundance of this phosphorylation site; “
   [MATH: <mrow><msub><mi>T</mi><mi>j</mi></msub></mrow> :MATH]
   ” and “
   [MATH: <mrow><msub><mi>N</mi><mi>j</mi></msub></mrow> :MATH]
   ” referred to the log2-transformed abundance of this phosphorylation
   site in tumor and non-tumor tissues, respectively; while “
   [MATH: <mrow><mi>T</mi></mrow> :MATH]
   ” and “
   [MATH: <mrow><mi>N</mi></mrow> :MATH]
   ” represented the log2-transformed abundance of the protein that
   corresponded to this phosphorylation site in tumor and non-tumor
   tissues.

   The known or predicted kinase-substrate relations, were also gained
   from PSP or NetworKIN 3.0. The gene symbol and Uniprot ID were
   corrected by the Uniprot Database ([344]https://www.uniprot.org/), and
   any ambiguous correspondence was excluded. In total, 239759 pairs of
   kinase-substrate relations were collected, which contained 381 kinases,
   6906 protein substrates, and 48730 phosphorylation sites.

Clinical available drugs and target mapping

   The FDA-approved drugs or candidate drugs and corresponding targets
   were downloaded from the DrugBank database (release version 5.1.7).
   Only targets with known pharmacological action as inhibitors, enzyme,
   carrier and transporter were selected, and 46 candidate targets were
   identified in this study. To get subtype-specific drug targets, we
   comprehensively evaluated their protein abundance, kinase activity, and
   correlations of kinase-substrate among the 3 proteomic subtypes.

Immunofluorescence analysis

   The PDCs on confocal dishes were fixed with 4% paraformaldehyde for
   15min, and then permeabilized with 0.5% Triton X-100 in PBS for 15 min
   at room temperature. After blocking (5% BSA in TBS containing 0.01%
   Triton X-100) for 1h, the PDCs were subsequently incubated overnight at
   4°C with primary antibodies against AFP (1:300, proteintech), albumin
   (1:300, proteintech) and glypican-3 (1:300, SANTA) diluted in blocking
   buffer. After washing with TBS for 4 times, PDCs were incubated with
   secondary Alexa Fluor Plus 488 (1:1000, Invitrogen) and Alexa Fluor 546
   (1:1000, Life Technologies). In addition, cell nucleus was stained with
   Hoechst 33342 (1:500, Beyotime). Following extensive washing, stained
   organoids were imaged in confocal (Zeiss LSM780, Gemany).

Drug treatment and viability assays

   PDCs were treated with sorafenib in an eight-point serial dilution
   series from 0.25 nM to 32 μM. After 4 days of incubation, cell
   viability was analyzed using CCK8 assay (TransGen Biotech, Beijing,
   China). Viable cells were estimated using SpectraMax M5e (Molecular
   Device, USA). Control cells treated with dimethyl sulfoxide (DMSO) were
   used to calculate relative cell viability and to normalize the data.
   Dose-response curve fitting was performed using R (3.6.3) and was
   evaluated by measuring the area under curve (AUC). In brief, the plate
   was normalized to the mean value from the seven serial conditions
   compared with DMSO control. The AUC of curve was determined using R
   (3.6.3), ignoring regions defined by fewer than two peaks.
   Non-convergence or ambiguous curves were excluded in every analysis.

Drug sensitivity prediction model

   To find sorafenib response associated proteomic features and build
   prediction models, the 26 PDC models were randomly split into two
   datasets. The 16 PDC models were used as the training dataset, and the
   remaining 10 PDC models were regarded as testing dataset. The Elastic
   net (EN) algorithm was used to build drug sensitivity prediction
   models.[345]^31^,[346]^74^,[347]^75 We firstly pre-selected the
   proteins in the training set based on their Pearson’s correlation
   coefficients with drug sensitivity, and then elastic net regression
   models were built using the 19 selected proteins. RMSE was used to
   select the optimal model using the smallest value. The final values
   used for the model were alpha = 0.770 and L1 ratio = 0.1. The model was
   used to predict the drug response of the 10 PDC models in the testing
   cohort. The Pearson’s correlation coefficients were calculated between
   the predicted drug sensitivity and examined ones to assess the
   prediction performance.

Identification of sorafenib sensitivity-related pathways

   To explore the pathways that may affect sorafenib sensitivity, the
   ssGSEA algorithm was used to evaluate the alteration of different
   pathways between PDC samples. The correlation analysis of sorafenib
   sensitivity with pathways was performed using the Pearson test, and p
   values of less than 0.01 were considered as statistically significant.

Quantification and statistical analysis

   Quantification and statistical analysis methods for single-omics and
   multi-omics analyses were mainly presented and cited in the respective
   [348]STAR Methods details.

   Standard statistical tests were used to analyze the clinical data,
   including but not limited to Wilcoxon test, Chi-square test, Fisher’s
   exact test, Log rank test. For categorical variables versus continuous
   variables, Wilcoxon test was used to test if any of the differences
   between the subtypes were statistically significant; for categorical
   variables versus categorical variables, Chi-square test and Fisher’s
   exact test were used; and for continuous variables versus continuous
   variables, Pearson correlation was used. All statistical tests were
   two-sided, and p value <0.05 statistical was considered statistically
   significant. Kaplan-Meier plots (Log rank test) were used to describe
   OS and RFS. Univariate Cox proportional hazards regression models were
   used to identify the variables associated with OS and RFS. Significant
   factors in the univariate analysis were further subjected to
   multivariate Cox regression analysis using a forward LR approach. All
   the analyses of clinical data were performed in R (version 3.6.3).

Acknowledgments