Abstract

Hepatocellular carcinoma (HCC) is one of the most common cancers worldwide, and understanding its molecular pathogenesis is pivotal to managing this disease. Sequential window acquisition of all theoretical mass spectra (SWATH-MS) is an optimal proteomic strategy to seek crucial proteins involved in HCC development and progression. In this study, tumour and adjacent non-tumour liver tissues were compared quantitatively using a SWATH-MS strategy. In total, 4,216 proteins were reliably quantified, and 338 were differentially expressed, with 191 proteins up-regulated and 147 down-regulated in HCC tissues compared with adjacent non-tumourous tissues. Functional analysis revealed distinct pathway enrichment of up- and down-regulated proteins. The most significantly down-regulated proteins were involved in metabolic pathways. Notably, our study revealed sophisticated metabolic reprogramming in HCC, including alteration of the pentose phosphate pathway; serine, glycine and sarcosine biosynthesis/metabolism; glycolysis; gluconeogenesis; fatty acid biosynthesis; and fatty acid β-oxidation. Twenty-seven metabolic enzymes, including PCK2, PDH and G6PD, were significantly changed in this study. To our knowledge, this study presents the most complete view of tissue-specific metabolic reprogramming in HCC, identifying hundreds of differentially expressed proteins, which together form a rich resource for novel drug targets or diagnostic biomarker discovery.

Liver cancer is one of the most common malignant cancers in the world, with more than 850,000 new cases worldwide annually^1. This neoplasm is currently the second leading cause of cancer-related death globally, and the incidence is increasing^2.
Among all primary liver cancers, hepatocellular carcinoma (HCC) is the most common neoplasm, accounting for approximately 90% of all cases^1,3,4,5,6,7,8. Hepatitis B virus (HBV) infection, hepatitis C virus (HCV) infection, alcohol abuse and intake of aflatoxin B1 are the main factors contributing to HCC^1,3,4,5,6,7. In China, HCC has been ranked as the second most frequent fatal cancer since the 1990s^9, and the majority of HCCs in China are caused by HBV infection^10,11. Currently, surgical resection and liver transplantation are considered the best treatment options for early-stage HCC and are curative therapies for approximately 30% to 40% of early-stage patients^3,12. Because HCC is asymptomatic at early stages, patients are often diagnosed at very advanced stages. Thus, there is an urgent need to find key carcinogenesis-associated molecules for HCC diagnosis and treatment.

Mass spectrometry (MS)-based proteomic analysis of human clinical tissues is a powerful tool to investigate cancer biomarkers and therapeutic targets^13. Numerous clinical studies of HCC have been reported over the past decade using various quantitative techniques^14,15,16,17,18,19,20, including SILAC (stable isotope labelling by amino acids in cell culture), iTRAQ (isobaric tags for relative and absolute quantification) and CDIT (culture-derived isotope tags) labelling techniques as well as label-free proteomics approaches based on quantification by ion intensity or spectral counting. Label-free approaches are relatively cheap compared with labelling approaches because no labelling reagents are required, and they still permit high-throughput, sensitive analysis in a mass spectrometer. Quantitative studies of HCC using spectral counting and ion intensities have also been reported^19,20.
SWATH-MS (sequential window acquisition of all theoretical mass spectra) is an emerging label-free quantification approach that combines a highly specific data independent acquisition (DIA) method with a novel targeted data extraction strategy to mine the resulting fragment ion data sets. SWATH-MS has been widely used to compare protein expression and alterations in protein modification^21,22,23,24. To our knowledge, no SWATH-MS approach has been used to study HCC proteomics until now. In this study, we compared the protein expression of tumourous (HCC) and adjacent non-tumourous (non-HCC) tissues from 14 HBV-associated HCC patients using a SWATH-MS technique to identify new HCC biomarkers and potential therapeutic target candidates. In total, 338 differentially expressed proteins were identified, and most down-regulated proteins were involved in metabolism. Sophisticated reprogramming of cell metabolic pathways was revealed. These observations are essential to elucidate the mechanisms underlying the occurrence and progression of HCC and contribute to the discovery of new candidates for early HCC diagnosis.

Results

Differentially expressed proteins quantified by SWATH-MS analysis in HCC tissues

The experimental scheme of the present study is shown in Fig. 1. HCC and non-HCC liver tissue samples were compared by SWATH-MS to identify differentially expressed proteins that could serve as biomarkers for HCC diagnosis or that are involved in HCC development and progression. To reduce the effect of inter-individual variation and detect true HCC-related proteins, two or three tissues within each group were pooled in equal amounts, and a quantitative expression ratio between the HCC and non-HCC liver tissue groups was determined based on total ion intensity normalization. Five biological replicates were analysed, and 14 pairs of tissue samples were used in total (Supplementary Table S1).

Figure 1.
Quantitative proteomic workflow of human HCC and adjacent non-tumourous liver tissues analysed using a SWATH-MS approach.

The targeted identification of peptides in SWATH-MS datasets requires a priori generation of a spectral library that includes essential coordinates for each targeted peptide, such as precursor ion masses, fragment ion masses, fragment ion intensities and retention times^21. For each biological replicate, a spectral library was generated with a traditional data-dependent acquisition (DDA) mass spectrometry technique as described in the Methods section. Five libraries were obtained in total. On average, the spectral libraries contained approximately 2,491 distinct protein groups, and 26,586 peptides were identified with greater than 99% confidence and passed the global false discovery rate (FDR) from fit analysis using a critical FDR of 1% (Supplementary Table S2, Supplementary Figure S1). Taken together, these findings indicated that the experimentally generated spectral libraries contained only high-confidence proteins.

Following generation of the spectral library, the identification and quantification of HCC and non-HCC proteins were performed using a SWATH-MS approach as described by Gill et al.^21, with modifications. Proteins were digested by trypsin, and the peptides were separated using a gradient of 120 min on a reverse-phase nanoLC instrument. SWATH data from six injections for each biological replicate were submitted in unison to PeakView software (Version 1.2, AB Sciex) for targeted data extraction, which resulted in the quantitative export of 2,122, 2,681, 1,860, 2,115 and 2,346 unique proteins for the analysis of five biological replicates (Fig. 2a). In total, 4,216 proteins were quantified in at least one biological replicate, and 1,903 proteins were quantified in at least three biological replicates, which accounted for 45% of all quantified proteins (Fig.
2b and Supplementary Table S3).

Figure 2. Quantitative proteomic analysis between HCC and non-HCC tissues.
(a) Numbers of proteins quantified by SWATH-MS in each biological replicate. Larger and smaller numbers in each column indicate total quantified and significantly changed proteins (p < 0.05, fold change (FC) ≥ 1.5 or FC ≤ 1/1.5) in each biological replicate. Rep1 through Rep5 are abbreviations of biological replicates 1 through 5. (b) Distribution of proteins quantified in one to five biological replicates. Numbers in parentheses indicate the percentage of total proteins. Legend numbers 1–5 on the right side show the number of replicates in which a protein was quantified. (c) Venn diagram depicting overlap of significantly regulated proteins (p < 0.05, FC ≥ 1.5 or FC ≤ 1/1.5) in five biological replicates. Red lines show proteins significantly regulated in at least 3 of 5 biological replicates. (d) Heatmap of 338 significantly regulated proteins, including 191 up-regulated and 147 down-regulated proteins. The colour bar on the right side represents the expression level of the proteins, corresponding to log2-ratios of FC (HCC/non-HCC). Red indicates up-regulation, and blue indicates down-regulation.

Relative protein quantification was analysed using MarkerView (Version 1.2.1, AB Sciex) and R (Version 3.3.1, the R foundation), as described in the Methods section. The normality of each technical replicate was assessed after log2 transformation of the peak intensities of all MS measurements prior to further analysis, and the histograms are shown in Supplementary Figure S2. In total, four criteria were used to select differentially expressed proteins. First, the Shapiro-Wilk test was used to test normality for each protein within one biological replicate, and only proteins that met normality were used for further analysis.
Second, Welch's t-test and Benjamini-Hochberg multiple test correction were performed, and an adjusted p-value < 0.05 together with a fold change (FC) ≥ 1.5 or FC ≤ 1/1.5 was considered statistically significant. With these two criteria, 865, 884, 550, 715 and 760 differentially expressed proteins were obtained for the five biological replicates (Fig. 2a). Third, proteins had to be up- or down-regulated in at least three biological replicates; these are shown in the Venn diagram (Fig. 2c). Fourth, the average ratio of up- or down-regulated proteins had to meet the 1.5-FC cutoff requirements. With the above-mentioned four criteria, 191 up-regulated and 147 down-regulated proteins were obtained in total and are shown in the heatmap (Fig. 2d). Supplementary Table S4 and Table S5 present the differentially expressed proteins for the adjusted p-value < 0.05 and FC ≥ 1.5 or FC ≤ 1/1.5 in five biological replicates and the final analysis results.

To evaluate our SWATH-MS data, comprehensive proteomics-based quantitative expression data obtained from five previously published studies of HCC and adjacent/normal tissues were used to perform side-by-side comparisons^15,17,18,20,25. The differentially expressed proteins quantified in these five studies were 151, 573, 71, 267 and 648. The comparison revealed that 49 (42), 155 (150), 28 (26), 68 (65) and 70 (62) differentially expressed proteins detected in our data were present in the five previous studies; the trends of the major proteins were consistent with published data (numbers in brackets represent proteins with the same change trend) (Supplementary Table S6). In total, 214 (199) proteins in our dataset were detected in previous studies, which accounted for 63.31% (58.88%) of the 338 differential proteins. Among the 214 proteins, 99 proteins were detected in ≥3 studies, including our data (Supplementary Table S6).
Among these 99 frequently identified common proteins, 41 proteins were up-regulated, and 58 were down-regulated. Among the 338 differentially expressed proteins, nine were validated by western blotting in previous studies, and five were validated by immunohistochemical (IHC) methods. Seven of the nine western blotting-validated proteins (OTC, PEBP1, CPS1, BHMT, CLIC1, PPA1 and APEX1) and all five IHC-tested proteins (OTC, BHMT, CLIC1, PPA1 and APEX1) were detected in ≥3 data sets. These five proteins (OTC, BHMT, CLIC1, PPA1 and APEX1) were tested by both methods as potential HCC biomarkers. Our SWATH data were consistent with the western blotting- and IHC-validated results (Supplementary Table S6).

To determine whether the differentially expressed proteins were detectable in plasma, we searched for these proteins in the Human Plasma Proteome database (HPPD)^26. Approximately 85.50% of the differentially expressed proteins (289/338) appeared in this database, suggesting that they had relatively strong potential to be secreted into the blood (Supplementary Table S7). Among these 289 proteins, the numbers of up- and down-regulated proteins were 169 and 120, respectively. Furthermore, 94 proteins were detected in ≥3 studies and were present in HPPD; among these, 38 proteins were up-regulated (Supplementary Table S7). These proteins are potential biomarker candidates for HCC; some may be tested using multiple reaction monitoring (MRM), as described by Hou et al.^27. The above results show the reliability of our SWATH-MS proteomic results and indicate that these differentially expressed proteins may be useful to delineate HCC properties and screen HCC biomarker candidates.
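The filtering criteria described in this section (Benjamini-Hochberg correction, the 1.5-fold cutoff, agreement in at least three biological replicates, and the average-ratio check) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' MarkerView/R pipeline: it assumes that per-protein Welch p-values and HCC/non-HCC fold changes have already been computed for each biological replicate (for example with scipy.stats.ttest_ind using equal_var=False, after a scipy.stats.shapiro normality screen). The function names, the dictionary layout, and the choice to average log2 ratios over every replicate in which a protein was quantified are hypothetical and chosen only for illustration.

```python
import math

FC_CUTOFF = 1.5       # fold-change threshold stated in the text
ALPHA = 0.05          # adjusted p-value threshold stated in the text
MIN_REPLICATES = 3    # "at least three biological replicates"


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [1.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_replicate(fold_changes, pvalues):
    """Criterion 2: label proteins 'up' or 'down' within one replicate.

    Both arguments are dicts keyed by protein name (assumed layout);
    fold changes are HCC/non-HCC ratios and must be positive.
    """
    proteins = sorted(pvalues)
    adjusted = benjamini_hochberg([pvalues[p] for p in proteins])
    calls = {}
    for protein, adj_p in zip(proteins, adjusted):
        fc = fold_changes[protein]
        if adj_p < ALPHA and fc >= FC_CUTOFF:
            calls[protein] = "up"
        elif adj_p < ALPHA and fc <= 1 / FC_CUTOFF:
            calls[protein] = "down"
    return calls


def consensus(replicates):
    """Criteria 3 and 4 over a list of (fold_changes, pvalues) pairs."""
    calls = [call_replicate(fc, pv) for fc, pv in replicates]
    log_cutoff = math.log2(FC_CUTOFF)
    result = {}
    for protein in set().union(*calls):
        votes = [c[protein] for c in calls if protein in c]
        # Assumption: average log2 ratios over all replicates with a value.
        logs = [math.log2(fc[protein]) for fc, _ in replicates if protein in fc]
        mean_log = sum(logs) / len(logs)
        if votes.count("up") >= MIN_REPLICATES and mean_log >= log_cutoff:
            result[protein] = "up"
        elif votes.count("down") >= MIN_REPLICATES and mean_log <= -log_cutoff:
            result[protein] = "down"
    return result
```

Calling consensus with one (fold_changes, pvalues) pair per biological replicate returns a dictionary that maps each retained protein to "up" or "down". Averaging on the log2 scale treats a 2-fold increase and a 2-fold decrease symmetrically; the text does not say which averaging the authors used, so that detail should be read as an assumption.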
GO and KEGG pathway enrichment analysis

To investigate the function of these differentially expressed proteins, GO and KEGG pathway analyses of up-regulated and down-regulated proteins were performed separately by DAVID (Version 6.8, LHRI & DAVID Bioinformatics)^28,29. Analysing the two groups separately made the characteristics of the up- and down-regulated proteins easy to distinguish. Liver proteins downloaded from the human proteome map (http://humanproteomemap.org) were used as the background dataset for enrichment analysis^30.

For up-regulated proteins, 34 significant enrichments were identified using GO analysis (p < 0.05; p-values were corrected using the Benjamini-Hochberg procedure). These were classified into three GO categories, including biological processes (BP, 5), molecular functions (MF, 6) and cellular components (CC, 23) (Table 1). In BP, the enriched terms included cell-cell adhesion (p = 7.12 × 10^−5), mRNA splicing via the spliceosome (p = 4.60 × 10^−3), SRP-dependent cotranslational protein targeting to membrane (p = 6.97 × 10^−3) and translation initiation (p = 9.07 × 10^−3). The most significant terms for MF and CC were poly(A) RNA binding (p = 5.60 × 10^−20) and extracellular exosome (p = 1.30 × 10^−15), respectively.

For the down-regulated proteins, 66 significant enrichments were obtained, including 29 BP, 28 MF and 9 CC (Table 1). The five most significant terms of BP were related to metabolic processes, including oxidation-reduction processes (p = 4.93 × 10^−21), xenobiotic metabolic processes (p = 1.03 × 10^−8), metabolic processes (p = 1.95 × 10^−8), drug metabolic processes (p = 3.32 × 10^−8) and epoxygenase P450 pathway (p = 7.15 × 10^−7). For MF and CC, the most significant terms were oxidoreductase activity (p = 1.03 × 10^−16) and mitochondrial matrix (p = 1.69 × 10^−18), respectively.

Table 1. GO and KEGG pathway analyses of up-regulated and down-regulated proteins by DAVID.
Enrichment terms    Up-regulated proteins enrichment (p < 0.05)    Down-regulated proteins enrichment (p < 0.05)
GO: BP              5                                              29
GO: MF              6                                              28
GO: CC              23                                             9
KEGG pathway        1                                              37
Total               35                                             103

BP, biological processes; CC, cellular components; MF, molecular functions.

To locate the key pathways implicated in HCC development and progression, KEGG pathway enrichment analysis was performed for the 338 differentially expressed proteins. As for GO analysis, up- and down-regulated proteins were analysed separately, and liver proteins from the human proteome map were used as the enrichment background^30. Pathway analysis showed that the up-regulated proteins were significantly enriched in a single term, spliceosome (p = 4.24 × 10^−2). For down-regulated proteins, 37 terms were enriched, and the most significant terms were metabolic pathways (p = 3.36 × 10^−42), glycine, serine and threonine metabolism (p = 9.03 × 10^−13), retinol metabolism (p = 1.65 × 10^−12), drug metabolism - cytochrome P450 (p = 6.72 × 10^−12) and biosynthesis of antibiotics (p = 7.05 × 10^−11). All five significant GO BP terms for up-regulated proteins and the 10 most significant GO BP terms and KEGG pathways for down-regulated proteins are shown in Fig. 3(a–c). Those of MF and CC are in Supplementary Figure S3(a–d). All enriched terms are shown in Supplementary Table S8.

Figure 3. GO and KEGG pathway enrichment of 338 significantly regulated proteins according to DAVID functional annotation.
(a) All five significantly enriched biological processes of up-regulated proteins quantified using the SWATH-MS approach. (b,c) Top 10 significantly enriched biological processes and KEGG pathways of down-regulated proteins quantified using the SWATH-MS approach.
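The idea behind enrichment against a background set can be illustrated with a plain hypergeometric over-representation test. This is only a sketch of the general technique: DAVID reports a modified Fisher's exact statistic (the EASE score), which is more conservative than the unmodified tail probability computed here, so the values below would not reproduce the p-values in the text. The function name and the toy counts in the usage line are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(background_size, term_in_background,
                           list_size, term_in_list):
    """Probability of seeing at least `term_in_list` annotated proteins
    when `list_size` proteins are drawn without replacement from a
    background of `background_size` proteins, of which
    `term_in_background` carry the annotation."""
    upper = min(term_in_background, list_size)
    total = comb(background_size, list_size)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(term_in_background, i)
        * comb(background_size - term_in_background, list_size - i)
        for i in range(term_in_list, upper + 1)
    )
    return tail / total


# Toy numbers for illustration only (not taken from the study).
p = hypergeom_enrichment_p(background_size=10, term_in_background=4,
                           list_size=3, term_in_list=2)
```

In practice the background would be the liver proteome used in the text, the list would be the up- or down-regulated proteins, and the resulting p-values for all tested terms would then be corrected for multiple testing.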
Western blot validation for nine selected proteins in clinical HCC tissues

Nine candidate proteins were selected for validation by western blot analysis using six sample pairs of HCC and non-HCC liver tissues that differed from those used in the proteomics studies (Supplementary Table S1, patient ID 15–20). The candidate proteins were selected based on either dramatic fold change or involvement in key metabolic pathways. Five candidate proteins were up-regulated, namely FBXO2, ACSL4, PLIN2, PKM2 and GFPT1. The four down-regulated proteins were CYP1A2, FTCD, UGT2B7 and PCK2. The analysis showed differential expression of all the candidates in HCC tissues compared with non-HCC tissues. FBXO2, ACSL4 and PLIN2 showed strong expression in all six tumour samples but weak or no expression in non-tumour tissues. PKM2 and GFPT1 showed generally high expression levels in five HCC tissues compared with the control group. For all down-regulated proteins, low expression was detected in HCC tissues compared with non-HCC tissues (Fig. 4). The representative extracted ion chromatogram (XIC) comparisons for these nine proteins are shown in Supplementary Figure S4. Overall, the western blot analysis results for all nine proteins were consistent with the proteomics data, which indicated that our proteomics data were highly reliable and that some proteins are worthy of further investigation.

Figure 4. Validation of nine selected proteins in clinical HCC tissues by western blotting.
The abundance of FBXO2, ACSL4, PLIN2, PKM2, GFPT1, CYP1A2, FTCD, UGT2B7 and PCK2 proteins in HCC and adjacent non-HCC liver tissues was analysed by western blotting using six pairs of samples. The GAPDH protein was used as an internal reference.
Discussion

The purpose of this study was to characterize proteomic changes in HCC tissues, provide potential protein candidates for biomarker discovery and suggest molecular mechanisms of HCC development and progression. Although much proteomic research has been performed, the biological mechanisms of HCC development and progression are still unclear. Metabolic reprogramming is a hallmark of cancer^31. Cancer cells can increase their consumption of glucose and glutamine to satisfy energy needs and macromolecular synthesis demands. Therefore, understanding the metabolism of tumours remains an intense study topic with important therapeutic potential^32. Using the newly developed SWATH-MS technique, we quantified more than 4,000 proteins, and 338 proteins were differentially expressed in HCC. Sophisticated metabolic reprogramming was revealed as depicted in Fig. 5, including the following major aspects.

Figure 5. Sophisticated metabolic reprogramming in HCC.
Our proteomic data revealed the up-regulation of glycolysis and pentose phosphate pathways and fatty acid biosynthesis and the down-regulation of gluconeogenesis; serine, glycine and sarcosine metabolism; and fatty acid β-oxidation. The red letters and arrows indicate up-regulated proteins and pathways, respectively, and the blue letters and arrows indicate down-regulated proteins and pathways.

First, the oxidative pentose phosphate pathway (PPP) was up-regulated in HCC. PPP is the first branch pathway of glycolysis. In PPP, glucose-6-phosphate becomes partially oxidized to generate NADPH and ribose-5-phosphate. PPP is frequently elevated in tumourigenesis. In our study, two key enzymes, the rate-limiting enzyme glucose-6-phosphate dehydrogenase (G6PD) and transaldolase (TALDO), were over-expressed in HCC. The over-expression of G6PD and TALDO was detected in previous HCC studies^18,25,33,34.