Graphical abstract

   graphic file with name fx1.jpg
   [43]Open in a new tab

Highlights

     * •
       Tumor imaging and transcriptomics enables patient selection for
       good response to TACE
     * •
       PRETACE is a LR model based on tumor area and expression of FAM111B
       and HPRT1
     * •
       PRETACE predicts response to TACE with ∼90% accuracy
     __________________________________________________________________

   Transarterial chemoembolization (TACE) is a widely used first-line
   therapy for intermediate-stage hepatocellular carcinoma, but not all
   patients respond to the treatment. Using imaging data and transcriptome
   data from tumor biopsies, Boldanova et al. develop a logistic
   regression model that predicts responses to TACE.

Introduction

   Transarterial chemoembolization (TACE) is the most widely used
   treatment for intermediate-stage, unresectable hepatocellular carcinoma
   (HCC).[44]^1 TACE consists of an intra-arterial infusion of a cytotoxic
   agent followed by embolization of the tumor-feeding blood vessels,
   resulting in a strong cytotoxic and ischemic effect targeted to the
   tumor. In conventional TACE, chemotherapeutic drugs such as
   doxorubicin, epirubicin, cisplatin, or miriplatin are emulsioned with
   Lipiodol. In many centers, TACE is performed with drug-eluting beads
   (TACE-DEB). The DEB occlude the small tumor-feeding arterioles and
   obstruct arterial blood flow to the tumor. The chemotherapeutic drug is
   then released over a 1-week period. The efficacy of TACE was
   demonstrated in 2 randomized controlled trials,[45]^2^,[46]^3 and more
   recently confirmed in a large systematic review that included data from
   10,108 patients.[47]^4 The objective response rate was 52% and overall
   survival was 70% at 1 year, 52% at 2 years, and 32% at 5 years, with a
   median overall survival of 19.4 months.[48]^4 These numbers indicate
   that 30% of patients have a poor response to TACE and die within the
   first year. However, 32% of the patients are alive after 5 years. This
   suggests that TACE may be very efficacious for a subgroup of patients.
   Tumor burden, invasion of the portal vein and its main branches, liver
   function, and general performance status are criteria to select
   patients for TACE.[49]^5 Even in a carefully selected group of
   patients, however, response to TACE is variable. Moreover, there are
   alternative treatment options for patients in the intermediate
   Barcelona Clinic Liver Cancer (BCLC) stage B. Clearly, there is a
   medical need for predictive biomarkers to guide treatment decisions.

   We hypothesized that tumor-intrinsic characteristics are major
   determinants of the response to TACE. We therefore analyzed
   pretreatment tumor biopsies using total RNA and whole-exome sequencing,
   in addition to standard clinical and radiological measures. Our
   analysis focuses on the prediction of a complete response to treatment
   at 3 months following TACE. We explored the features that are most
   informative in predicting the response to TACE in a discovery cohort of
   33 tumors, 13 of which showed complete response to TACE at 3 months.
   Using random forest (RF) models of the discovery cohort, we identified
   the most informative features and tested ensembles of logistic
   regression (LR) models against the validation cohort. The most
   informative gene expression features were tested independently using
   reverse transcription-digital droplet PCR (RT-ddPCR), which is more
   readily accessible in the clinic. We report the distributions of the
   most informative features for predicting response to TACE in both the
   discovery and validation cohorts. The biological pathways associated
   with predictive features are described in the context of
   carcinogenesis.

Results

Patient clinical data and response to TACE

   From May 2008 to July 2017, 30 HCC patients were included in the
   discovery cohort: 11 (37%) patients with BCLC intermediate-stage HCC
   and 19 patients (63%) with BCLC early-stage HCC. The early-stage
   patients were treated with TACE because comorbidities precluded them
   from surgical treatment, because local ablation therapies were not
   possible due to the location of the tumor nodules, or because TACE was
   performed as a bridge-to-transplant treatment. Most of the patients
   (83%) were male and had cirrhosis of the liver (80%). Alcohol was the
   main reason for the liver disease (50%), followed by chronic hepatitis
   C infection (23%). One patient had no liver disease; 77% of the HCCs
   had Edmondson grade I or II and 23% grade III. The diameter of the
   tumors ranged from 14 to 155 mm (average 49 mm, median 44 mm). From
   December 2016 to March 2020, 7 additional patients were included in a
   validation cohort. The baseline characteristics of the 2 cohorts were
   consistent ([50]Figure S1C). Patient and tumor characteristics are
   summarized in [51]Table S1.

   In the discovery cohort, radiological complete responses (CRs) were
   found in 39.4% of biopsied nodules in computed tomography (CT) scans
   obtained at 3 months after TACE. For the subsequent data analysis,
   these tumors were classified as “responders,” and all others (including
   the modified Response Evaluation Criteria in Solid Tumors (mRECIST)
   categories partial response [PR], stable disease [SD], or progressive
   disease [PD]) as “non-responders.” Treatments, response to TACE, and
   clinical follow-up of patients in the discovery cohort are shown in
   [52]Figure S1A. At the time of the last follow-up, 26 of the 30
   patients were dead and 4 were alive (3 of them had received
   transplants). The overall survival in the discovery cohort of patients
   was 97%, 83%, 63%, and 43% at 6, 12, 18, and 24 months after the
   baseline imaging. Treatments, response to TACE, and clinical follow-up
   of patients in the validation cohort are shown in [53]Figure S1B.

Tumor area is predictive of response to TACE

   The clinical and radiological information of the study patients at
   baseline is summarized in [54]Table S1. In a univariate analysis using
   t test statistics, tumor area was the only feature that was
   significantly different between responder and non-responder tumors
   ([55]Figure 1A). In a multivariate analysis, we assessed the importance
   of these features in predicting response to TACE using RF models.
   Overall, standard clinical features such as age, sex, etiology of
   underlying liver disease, tumor stage, and grade, as well as
   radiological metrics for HCC nodules, were poor predictors of response
   to TACE treatment, yielding a mean accuracy of 64% across the bootstrap
   resampling ([56]Figure 1B).

Figure 1.

   [57]Figure 1
   [58]Open in a new tab

   Feature importance in predicting response to TACE

   (A) Boxplots showing distribution of tumor area for responder and
   non-responder tumors (∗∗∗p ≤ 0.001 for Student’s t test).

   (B) Feature importance scores from a random forest model involving
   clinical and radiological measurements in prediction of complete
   response to TACE at 3 months. The boxplots indicate the distribution of
   scores for 500 bootstraps of training and testing sets using the
   discovery cohort. Area_log10, log10 of tumor area; Mpp, mean of
   positive pixels; AFP_pre, pre-treatment α-fetoprotein (AFP) measurement
   at the time of biopsy; AFP_post, post-treatment AFP measurement
   1–3 months after TACE; num_nodules, number of TACEed nodules.

Transcriptomic features are predictive of response to TACE

   When considering the whole transcriptome dataset covering 20,126 genes,
   the first 3 principal components of the normalized count matrix can
   readily distinguish normal liver, non-tumor, and tumor samples
   ([59]Figure S2A). There is a hyperplane separating normal liver and
   non-tumor samples and another that separates tumors from paired
   non-tumor samples.

   We next compared the transcriptome of tumors against their paired
   non-tumor tissue. There are 341 significantly differentially expressed
   genes when contrasting tumors versus non-tumors. Using the normalized
   count matrix of these differentially expressed genes, projection in the
   first 2 principal components reveals that the distance between paired
   non-tumor and tumor samples is significantly greater for non-responder
   tumors ([60]Figure S2B). Although there is a significant difference in
   the distribution of these projected distances ([61]Figure S2C), it is
   not among the most informative features in RF models for response to
   TACE ([62]Figure S2D).

   Considering only tumor samples, differential expression analysis with a
   contrast of response versus non-response yields 289 significantly
   differentially expressed genes ([63]Table S2). Decomposition of the
   matrix of differentially expressed genes by principal-component
   analysis (PCA) ([64]Figure 2A) indicates a significant separation of
   response and non-response classes along the first principal component
   ([65]Figure 2B). Hierarchical clustering of this gene expression matrix
   produces 2 main clusters, which generally split the response and
   non-response classes ([66]Figure 2C). RF models indicate that FAM111B,
   hypoxanthine phosphoribosyltransferase 1 (HPRT1), and the projection
   along the first principal component are the most informative
   transcriptomic features in predicting tumor response to TACE
   ([67]Figure S2D).

Figure 2.

   [68]Figure 2
   [69]Open in a new tab

   Transcriptomic features are predictive of response to TACE

   (A) Scatterplot of the first 2 principal components of the RNA-seq gene
   expression matrix for significantly differentially expressed genes
   between responder and non-responder tumors. Expression values were
   normalized using the variance stabilizing transformation implemented in
   DESeq2.

   (B) Boxplot showing the distribution of projections along the first
   principal component for responder and non-responder tumors (∗∗∗p ≤
   0.001 for Student’s t test).

   (C) Heatmap showing hierarchically clustered RNA-seq gene expression
   vectors for differentially expressed genes between responder and
   non-responder tumors. The responder tumor labels are prefixed with
   “R_.” Expression values were normalized using the variance stabilizing
   transformation implemented in DESeq2 and scaled before hierarchical
   clustering using Pearson distance and Ward D2 agglomeration implemented
   in R version 4.0.3 using the heatmap.2 function from the gplots
   package.

Expression of hypoxia and drug efflux pathways are not correlated with
response to TACE

   Among the genes differentially expressed between responder and
   non-responder tumors, there is an enrichment of Reactome pathways[70]^6
   related to cell cycle and division, DNA repair, Golgi and endoplasmic
   reticulum homeostasis, platelet production, and major
   histocompatibility complex (MHC) class II presentation ([71]Figure 3).
   Strikingly absent are several pathways with well-established links to
   HCC progression, namely the hypoxia-inducible factor 1α (HIF1α) hypoxia
   pathway,[72]^7^,[73]^8 the ABC efflux transporter
   family,[74]^9^,[75]^10 and the targets of Notch
   signaling.[76]^11^,[77]^12 We further explored these pathways based on
   gene sets from previous studies. We used a well-characterized set of
   HIF1α target genes that form the core response to hypoxia[78]^13
   ([79]Table S3). Although P-glycoprotein (ABCB1/MDR1) is the canonical
   transporter for the efflux of chemotherapeutics,[80]^14 we extended our
   search to the entire set of ABC transporters, which were expressed in
   our dataset ([81]Table S3). The set of Notch target genes was derived
   from a previous study[82]^11 ([83]Table S3). Although none of the 3
   pathways are significantly enriched among genes that are differentially
   expressed between responder and non-responder tumors, subsets of the
   non-responder tumors have expression values that fall outside the range
   of expression observed for responder tumors in different genes of these
   pathways ([84]Figures S3A–S3C). However, we observed a similar pattern
   for random subsets of expressed genes ([85]Figures S3D–S3F). Although
   there is no discernible enrichment of differentially expressed Notch
   target genes, we observed significant differential expression of
   several genes involved in regulation of Notch signaling, specifically
   Cdk1, Fos, and HMOX1 ([86]Figures S3G–S3I).

Figure 3.

   [87]Figure 3
   [88]Open in a new tab

   Dotplot showing a subset of the most significantly enriched Reactome
   pathways among genes differentially expressed between responder and
   non-responder tumors

   p values were adjusted using false discovery rate control. The count
   indicates the number of genes within the pathway that are
   differentially expressed.

Transcriptomic and radiological features accurately predict response to TACE

   RF models clearly indicate that the tumor area at baseline and a small
   number of transcriptomic features are the most predictive of response
   to TACE. LR models based on tumor area and the expression of FAM111B
   and HPRT1 achieve ∼90% accuracy in the discovery cohort, with less
   specificity than sensitivity ([89]Figure 4A). A majority vote ensemble
   of 500 bootstraps of this LR model using the RNA sequencing
   (RNA-seq)-derived expression data of FAM11B and HPRT1 and tumor area
   correctly predicts response to TACE for 7/8 tumors in the validation
   cohort (data not shown). Apart from response to TACE, there was no
   significant association between expression levels of these 2
   transcripts and baseline characteristics such as BCLC stage or
   underlying liver disease (data not shown). We next evaluated the
   performance of similar LR decision support ensembles using
   semiquantitative measures of FAM111B and HPRT1 at the protein level
   using immunohistochemical staining of tumor biopsies ([90]Figure S4A).
   The protein abundances were generally consistent with the cognate
   RNA-seq gene expression measures; however, the correlation with
   response to TACE was less significant ([91]Figures S4B and S4C). We
   then evaluated the expression of FAM111B and HPRT1 using RT-ddPCR,
   which is generally more accessible and cost-effective in the clinic
   than RNA-seq. The RT-ddPCR gene expression estimates were consistent
   with the RNA-seq data ([92]Figure S4D). LR models based on the RT-ddPCR
   data achieved only ∼80% accuracy in the discovery cohort
   ([93]Figure 4B); however, bootstrapped ensembles also correctly
   predicted response to TACE for 7/8 tumors in the validation cohort.

Figure 4.

   [94]Figure 4
   [95]Open in a new tab

   Transcriptomic and radiological features accurately predict response to
   TACE

   (A) Violin plots showing the distribution of accuracy, sensitivity, and
   specificity scores for logistic regression (LR) models using tumor area
   at baseline, and RNA-seq gene expression measures of FAM111B and HPRT1.
   The gene expression measures were normalized using the variance
   stabilizing transformation implemented in DESeq2, followed by scaling
   for use in the LR models. The violin plots indicate the distribution of
   scores for 500 bootstraps of training and testing sets of the discovery
   cohort.

   (B) Violin plots showing the distribution of accuracy, sensitivity, and
   specificity scores for LR models using tumor area at baseline and
   RT-ddPCR gene expression measures of FAM111B and HPRT1. The gene
   expression measures were calculated as the average of 2 technical
   replicates followed by scaling for use in the LR models. The violin
   plots indicate the distribution of scores for 500 bootstraps of
   training and testing sets of the discovery cohort.

Discussion

   Treatment allocation for patients with HCC is currently based largely
   on radiological staging criteria, liver function, and general
   performance status.[96]^5 The widely used BCLC staging system
   classifies patients into 5 stages and allocates the different treatment
   modalities according to these stages. However, within a given stage,
   several treatment options are available, and there is an urgent medical
   need for additional criteria that guide the allocation of the different
   treatments within a BCLC staging category. TACE is the first treatment
   of choice for patients in BCLC stage B (intermediate stage). TACE is
   highly effective in ∼30% of patients,[97]^4 but for the other 70%,
   alternative therapies may be a better choice. This question has become
   even more topical with the increasing number of new systemic
   combination treatments that are being explored in clinical
   trials[98]^15 or have even been shown to be effective.[99]^16 In the
   present study, we investigated whether molecular characteristics
   extracted from the analysis of treatment-naive tumor biopsies could
   predict response to TACE.

   Genomic analyses have revealed that the telomerase reverse
   transcriptase (TERT) promoters CTNNB1 (encoding β-catenin) and TP53
   (encoding p53) are frequently mutated in HCC, while genes involved in
   other critical processes, such as oxidative stress response, chromatin
   remodeling, and hepatocyte differentiation, are recurrently mutated but
   in <10% of HCCs.[100]17, [101]18, [102]19 We investigated whether
   genomic analysis of tumor biopsies could predict the response to TACE.
   The whole-exome sequencing and variant analysis of our set of
   pretreatment tumor biopsies yielded a sparse matrix of known cancer
   mutations ([103]Table S4). Genomic information, however, was not useful
   for response prediction. Even among the most frequent mutations, there
   is no discernible distinction between responder and non-responder
   classes ([104]Table S4).

   The transcriptomic data readily distinguish tumor response to TACE,
   however ([105]Figure 2), confirming our hypothesis that tumor intrinsic
   characteristics regulate response to treatments. We expected to find an
   upregulation of pathways known to confer resistance to hypoxia or
   chemotherapy. However, gene expression patterns of hypoxia, Notch, and
   drug efflux pathways do not generally distinguish responder and
   non-responder tumors. It is conceivable that these pathways are
   predominantly regulated on a post-transcriptional level and could be
   assessed by proteomic profiling. Intriguingly, several non-responsive
   tumors fit our expectations: non-responsive tumor D359 has a high
   expression of 14 HIF1 target genes; tumors C710 and B854b have a high
   expression of 9 and 8 different ABC transporters, respectively; and
   B702b and A915b have a high expression of 7 and 6 targets of Notch
   signaling, respectively. More complex decision tree models may be able
   to capture these signals for response prediction from larger cohorts.
   Although the targets of Notch signaling are not predictive of response
   to TACE, several regulators of Notch signaling were significantly
   differentially expressed between responder and non-responder tumors
   ([106]Figure S3G–S3I). Cdk1 is directly implicated in Notch signaling
   via phosphorylation of the Notch intracellular domain,[107]^20 Fos is
   linked to Notch signaling via cross-talk with AP-1,[108]^21 and HMOX1
   is linked to Notch signaling via cross-talk with
   Nrf2.[109]^22^,[110]^23

   Family with sequence similarity 111 member B (FAM111B) encodes a
   protein with a trypsin-like cysteine/serine peptidase domain. Causative
   mutations in FAM111B have been identified in patients with congenital
   poikiloderma.[111]^24 The molecular function of FAM111B is not entirely
   clear. In multiple myeloma, it has been identified as a target gene of
   TP53,[112]^25 and in the human adenocarcinoma cell line A549, FAM111B
   was shown to degrade the cell-cycle inhibitor p14.[113]^26 Furthermore,
   FAM111B expression was significantly correlated with clinical
   progression and poor prognosis in lung
   adenocarcinoma.[114]^25^,[115]^26 It is therefore conceivable that in
   HCC, FAM111B is a surrogate marker for the subclass of TP53 mutant
   HCCs. In reanalyzing The Cancer Genome Atlas (TCGA) dataset on
   HCC,[116]^17 we found a significant increase in FAM111B expression in
   TP53-mutated tumor samples (data not shown).

   HPRT plays a central role in the purine salvage pathway and is widely
   regarded as a housekeeping gene. However, HPRT expression on the mRNA
   and protein levels is induced by HIF1α.[117]^27 HPRT is most likely a
   critical pathway that helps preserve the purine nucleotide resources of
   the cell under hypoxic conditions. One may speculate that HPRT
   expression is a surrogate marker for tumor cell adaptation to hypoxia,
   adaptations that protect against the “embolization arm” of TACE.

   Given the significant separation of the transcriptomic profile between
   responder and non-responder tumors, we combined transcriptomic data
   with clinical data in an LR analysis to identify the most important
   features that correlate with treatment outcome. Clinicians are well
   aware of the fact that increasing tumor size reduces the chance of
   complete response to TACE. Not surprisingly, tumor area was one of the
   most informative features in our analysis, and tumor area alone in an
   LR model yields an accuracy of ∼75% in predicting response to TACE in
   the discovery cohort samples. Adding the expression values of just 2
   transcripts increased this value to ∼90%. The two patients 10 and 13
   exemplify the relative contribution and importance of the components of
   the predict response to TACE (PRETACE) algorithm. In these patients, 2
   tumor nodules were treated and showed discrepant responses. In patient
   10, the 2 nodules did not differ in size, but in their transcriptome
   (samples C737 and C738; [118]Figure 2C). However, in patient 13, the 2
   nodules have closely related transcriptional profiles (samples C696 and
   C697; [119]Figure 2C), but C696 is larger than C697. Since RNA-seq from
   tumor biopsies may not be feasible for the next several years in a
   routine setting, we explored alternative methods for quantification of
   HPRT1 and FAM111B. Clinically most useful would be an
   immunohistochemical staining of histopathological sections. Both
   proteins can be detected, and their expression can be assessed in a
   semiquantitative approach ([120]Figures S4A and S4B). However, compared
   to the quantification of the gene transcripts, immunohistochemistry was
   less accurate. We next explored the usefulness of RT-ddPCR. LR models
   using RT-ddPCR and tumor area ([121]Figure 4B) were overall less
   accurate in discovery cohort compared to the RNA-seq-based models
   ([122]Figure 4A) but equally accurate in predicting response to TACE in
   the validation cohort. Of note, ddPCR has been proven to provide
   reproducible absolute quantification across different platforms[123]^28
   and laboratories.[124]^29 We therefore propose to implement RT-ddPCR
   quantification in our PRETACE algorithm.

   There are several non-response tumors (B766b, B904b, C168b, and C696)
   that cluster with response tumors at the level of gene expression
   ([125]Figure 2C). These non-response tumors do not have extremely large
   tumor areas; however, one of these patients had extensive
   comorbidities. We can only speculate why these tumor nodules with a
   “responder” profile did not respond to TACE, but it could have been due
   to technical failure of the TACE procedure.

   In conclusion, the LR model PRETACE using tumor area and expression
   values of HPRT1 and FAM111B accurately predicts the response of HCC
   tumor nodules to TACE. It is a first proof-of-principle that the
   molecular analysis of HCC tumor biopsies allows us to predict response
   to treatment. We propose that tumor biopsy-based approaches be explored
   to identify predictive biomarkers of response to other treatment
   modalities as well.

Limitations of the study

   The study has several limitations. The sample size of the cohorts is
   small, and we cannot exclude a selection bias. However, our findings
   are well supported statistically in both the discovery and the
   validation cohorts. The study is also not powered to analyze the impact
   of radiological response to TACE at time point 3 months on overall
   survival. Kaplan-Meier analysis revealed a hazard ratio of 0.60 in
   favor of TACE responders, but the result was statistically not
   significant ([126]Figure S1D). Nevertheless, clinical experience
   strongly suggests that non-response to TACE is associated with poorer
   prognosis unless patients receive an effective second-line therapy.

STAR★Methods

Key resources table

   REAGENT or RESOURCE SOURCE IDENTIFIER
   Antibodies
     __________________________________________________________________

   Polyclonal rabbit anti-human FAM111B Antibody Invitrogen Cat#PA5-58474;
   RRID:[127]AB_2641223
   Monoclonal rabbit anti-human HPRT antibody abcam Cat#ab109021;
   RRID:[128]AB_10866312
     __________________________________________________________________

   Biological samples
     __________________________________________________________________

   Tumor and Non-Tumor liver biopsy tissue University Hospital Basel This
   paper
     __________________________________________________________________

   Critical commercial assays
     __________________________________________________________________

   ZR-Duet DNA and RNA MiniPrep Plus kit Zymo Research, Irvine, CA
   Cat#D7003
   SureSelectXT Clinical Research Exome Agilent Technologies Cat#5190-7339
   SureSelect Human All Exon V6+COSMIC Agilent Technologies Cat#5190-9308
   TruSeq Stranded Total RNA Library Prep Kit with Ribo-Zero Gold Illumina
   Cat#20020599
   QX200 ddPCR EvaGreen Supermix Bio-Rad Cat#1864034
   QX200 Droplet Generation Oil for EvaGreen Bio-Rad Cat#1864005
   DG8 Cartridges for QX200/QX100 Droplet Generator Bio-Rad Cat#1864008
   DG8 Gaskets for QX200/QX100 Droplet Generator Bio-Rad Cat#1863009
   PCR Plate Heat Seal, foil, pierceable Bio-Rad Cat#1814040
   ddPCR 96-Well Plates Bio-Rad Cat#12001925
   MultiScribe Reverse Transcriptase Invitrogen Cat#4311235
   Random hexamers Roche Cat#11034731001
   RNase Inhibitor Applied Biosystems Cat# N8080119
   10 mM dNTSs Promega Cat# U1515
     __________________________________________________________________

   Deposited data
     __________________________________________________________________

   RNA-sequencing data (41 tumors and 37 adjacent non-tumor tissue and 15
   liver biopsies with normal histology) This paper European
   Genome-phenome Archive: EGAS00001005558
   Whole-exome sequencing data (122 tumors and 115 adjacent non-tumor
   liver tissues) Ng et al., unpublished data European Genome-phenome
   Archive: EGAS00001005073
     __________________________________________________________________

   Oligonucleotides
     __________________________________________________________________

   FAM111B – forward, 5′-CTGGCATAAGA
   AAGTGTAGCAGCA-3′ This paper N/A
   FAM111B – reverse, 5′-GGGCTGAGTAG
   ATACTCTCGCTG-3′ This paper N/A
   HPRT1 – forward, 5′-ACATTGTAGCCCT
   CTGTGTGC-3′ This paper N/A
   HPRT1 – reverse, 5′-AATCCAGCAGGT
   CAGCAAAGA-3′ This paper N/A
     __________________________________________________________________

   Software and algorithms
     __________________________________________________________________

   mint Lesion™ 3.0 software Mint Medical GmbH, Heidelberg, Germany
   [129]https://mint-medical.com
   GraphPad Prism 9.2.0 GraphPad Software, San Diego
   [130]https://www.graphpad.com
   STAR 2.7.1a Dobin et al., 2013[131]^30 N/A
   R 4.1.0 [132]https://cran.r-project.org/ N/A
   DESeq2 1.32.0
   [133]https://bioconductor.org/packages/release/bioc/html/DESeq2.html
   N/A
   scikit-learn 0.24.1 [134]https://scikit-learn.org/ N/A
     __________________________________________________________________

   Other
     __________________________________________________________________

   PRETACE Python code This paper [135]https://github.com/gfucile/PRETACE
   [136]Open in a new tab

Resource availability

Lead contact

   Further information and requests for resources and reagents should be
   directed to and will be fulfilled by the lead contact, Markus H. Heim
   ([137]markus.heim@unibas.ch).

Materials availability

   This study did not generate new unique reagents.

Experimental model and subject details

   From May 2008 to July 2017, 30 HCC patients undergoing TACE ([138]Table
   S1) and 15 control patients with no histopathological liver disease (4
   female and 11 male, median age 48 (range 30-62) and 54 (range 33-80),
   respectively) at the University Hospital Basel were included in our
   study. The patients undergoing diagnostic tumor and/or liver biopsy at
   the University Hospital Basel agreed to donate additional biopsies for
   research purposes. Written informed consent was obtained in accordance
   with the study protocol approved by the ethics committee of the
   Northwestern part of Switzerland (authorization number EKNZ 2014-099).
   The indication for TACE was provided by the institutional
   multidisciplinary gastrointestinal tumor board. Computed tomography
   (CT) scans and biopsies of tumors and liver parenchyma were obtained
   prior to treatment. Tumor staging was done according to the Barcelona
   Clinic Liver Cancer (BCLC) classification system.[139]^31

Method details

CT imaging

   All patients underwent dynamic CT consisting of four acquisitions:
   unenhanced, arterial phase (AP), portal venous phase (PVP) and 3-min
   delayed phase (DP). CT scans were performed using a 128-slice (SOMATOM
   Definition AS+ or SOMATOM Definition Edge, Siemens Healthineers,
   Erlangen, Germany) or a 256-slice (SOMATOM Definition Flash, Siemens
   Healthineers) scanner system. Following the unenhanced scan, 1.2 mL/kg
   of 370 mg I/ml Iopromide (Ultravist® 370, Bayer Pharma) or 370 mg I/ml
   Iopamidol (Iopamiro® 370, Bracco S.p.A.) were injected intravenously.
   Using bolus tracking technique, AP images were acquired 18 s after
   reaching an enhancement of 100 HU in the descending aorta at the level
   of the celiac trunk. PVP and DP images were obtained 70 and 180 s after
   reaching the scan initiation threshold, respectively. CT imaging was
   done at baseline prior to TACE, and after TACE every 3 months.

   Image datasets of all patients and time points were imported into mint
   Lesion™ 3.0 software (Mint Medical GmbH, Heidelberg, Germany;
   commercially available) for post-processing. Every lesion was manually
   segmented on arterial phase images in axial plane (slice thickness
   1.5 mm) according to mRECIST 1.1 criteria[140]^32 by one radiologist
   with two years of experience in abdominal imaging (JV). All
   segmentations were subsequently reviewed by one staff radiologist
   specialized in abdominal imaging with > 15 years of experience (DTB)
   and one staff radiologist specialized in abdominal imaging and
   interventional radiology with > 15 years of experience (CJZ).
   Disagreements were resolved by consensus. For each tumor nodule, short
   and long axis were measured (mm) and the largest area was calculated
   (mm^2). CT texture analysis of each segmented ROI was performed
   automatically by mint Lesion™ software based on gray-level histograms
   and included the following parameters: entropy, kurtosis, skewness,
   mean of positive pixels (MPP) and uniformity of positive pixel
   distribution (UPP).

TACE procedure

   TACE procedures were performed or supervised by two interventional
   radiologists with > 15 and > 10 years of experience. The right femoral
   artery was punctured using the Seldinger technique and a 4-French Cobra
   catheter was inserted into the celiac trunk and common hepatic artery,
   respectively. In case of variant vascular anatomy, also the superior
   mesenteric artery was catheterized. To visualize feeding arteries of
   the tumor, digital subtraction angiography was performed and feeding
   vessels were superselectively intubated with a highly flexible
   2.4-French microcatheter. For the following embolization,
   Doxorubicin-coated 100-300 μm beads were slowly injected under
   fluoroscopic guidance. The total applied dose per patient was between
   100 to 150 mg Doxorubicin. If stasis in the feeding vessel and
   disappearance of tumor staining was observed earlier, injection was
   stopped at a lower total Doxorubicin dose, respectively. A closing
   Dyna-CT was obtained to assess residual staining of treated lesions and
   vessel blockage.

Radiological response assessment

   Response of a biopsied HCC lesion to treatment with TACE as per mRECIST
   1.1 criteria was calculated automatically after manual segmentation of
   the enhancing portions of each lesion on arterial phase images at
   baseline and all follow-ups. mRECIST 1.1 criteria are: Complete
   response (CR) = disappearance of any intratumoral arterial enhancement;
   Partial response (PR) = at least a 30% decrease in the sum of diameters
   of viable tumor (arterial phase enhancement); Stable disease (SD) = any
   cases that do not qualify for either partial response or progressive
   disease; Progressive disease (PD) = an increase of at least 20% in the
   sum of the diameters of viable (enhancing) tumor. If a lesion did not
   achieve CR or progressed at the post-therapeutic follow-ups, the
   institutional multidisciplinary gastrointestinal tumor conference
   decided the next therapeutic step.

Biopsy procedure

   Biopsies were taken from tumor lesions and from non-tumor liver
   parenchyma using an US-guided coaxial technique as previously
   described.[141]^33 In 3 cases we obtained biopsy of the 2 different
   tumor nodules of the same patient, in 1 patient we obtained 2 biopsies
   of the same large tumor. 15 normal biopsies from patients (4 female,
   median age 48 (range 30-62); 11 male, median age 54 (range 33-80)
   without HCC and with normal liver values were included in the RNaseq
   analysis as ‘normal liver’.

Histology/pathology

   Liver biopsies used for routine diagnostic histopathology were
   processed using standard procedures. Histopathological evaluation was
   performed by L.M.T., an experienced board-certified pathologist
   specialized in liver pathology. Tumors were graded according to the
   Edmondson grading system.[142]^34 To be included in the study, tumor
   biopsies were required to contain > 50% tumor tissue, and non-tumor
   liver parenchyma samples had to be completely free of cancer cells.
   Immunohistochemical staining was performed on 4 μm thick sections.
   Tissue sections were deparaffinized, and subsequently subjected to
   heat-induced epitope retrieval for 20 minutes in citrate buffer (pH6.0
   for HPRT1) and EDTA buffer (pH9.0 for FAM111B). Slides were then
   stained using anti-FAM111B (Invitrogen PA5-58474; dilution 1:600) and
   anti-HPRT-1 (Abcam EPR5299; dilution 1:750) on a Leica Bond III IHC
   staining system (Muttenz, Switzerland) using DAB as chromogen. Images
   were acquired using an Olympus BX46 microscope ([143]Figure S4A). The
   expression of tissues was evaluated and scored by two board certified
   pathologists (L.M.T., C.E.) who were unaware of the clinical data. Each
   tumor was scored semi quantitively for each staining by multiplying the
   proportion of positive cells (%) and the staining intensity score (0,
   none; 1, weak; 2, moderate; 3, strong).

RNA and DNA extraction

   Snap frozen biopsies were crushed in liquid nitrogen, subsequent
   genomic DNA and total RNA were extracted using the ZR-Duet DNA and RNA
   MiniPrep Plus kit (Zymo Research) according to the manufacturer’s
   instructions. RNA isolation included a DNase digestion according to the
   Zymo kit instructions.

Next-generation sequencing

   RNA-Seq libraries (TruSeq Stranded Total RNA Ribo-Zero Gold Kit) were
   sequenced on an Illumina HiSeq 2500 at the Genomics Facility Basel. For
   whole exome sequencing (WES), whole-exome capture was performed using
   the SureSelectXT Clinical Research Exome (Agilent Technologies) or
   SureSelect Human All Exon V6+COSMIC (Agilent Technologies) platforms
   according to the manufacturer’s guidelines. Sequencing was performed on
   an Illumina HiSeq 2500 at the Genomics Facility Basel according to the
   manufacturer’s guidelines. Paired-end 101-bp reads were generated.

Reverse-transcription digital droplet PCR (RT-ddPCR)

   For RT-ddPCR, 250 ng RNA were reverse transcribed using MultiScribe™
   Reverse Transcriptase (RT) (Thermo Fischer Scientific) and random
   hexamer primers (Roche) in a 25 μL reaction volume according to the
   manufacturer’s instructions. Reactions omitting the RT enzyme or the
   RNA template served as negative controls. 2 μl of RT reaction (or water
   as negative control) was used for ddPCR in a total volume of 20 μl of
   1x QX200 ddPCR EvaGreen Supermix (Bio-Rad, #1864034) and primers (50 nM
   each). Droplets were generated in the QX200 Droplet Generator (Bio-Rad)
   with EvaGreen compatible oil (Bio-Rad, #1864006) following the device’s
   instruction manual. Amplification was performed in 96-well plates
   (Bio-Rad, #12001925) on a C1000 Touch Thermal Cycler (Bio-Rad) using
   the recommended cycling conditions for EvaGreen Supermix (5 min at
   95°C, followed by 40 cycles of 95°C for 30 s and 60°C for 1 minute,
   then 1 cycle of 4°C for 5 min and 95°C for 5 min, and a final step of
   5 min at 95°C; ramp rate 2°C/sec). After overnight incubation at 4°C,
   the plate was analyzed in the QX200 Droplet Reader (Bio-Rad), and the
   absolute copy numbers per 20 ng input RNA for each transcript were
   calculated with the QuantaSoft (Bio-Rad) software. The following
   primers were used for FAM111B and HPRT1 specific ddPCR: FAM111B –
   forward, 5′-CTGGCATAAGAAAGTGTAGCAGCA-3′; reverse,
   5′-GGGCTGAGTAGATACTCTCGCTG-3′; HPRT1 – forward,
   5′-ACATTGTAGCCCTCTGTGTGC-3′; reverse, 5′-AATCCAGCAGGTCAGCAAAGA-3′.
   Primer specificity was confirmed by analyzing the ddPCR products on a
   2% agarose gel. Each primer pair produced a single sharp band of the
   expected size.

Quantification and statistical analysis

Deep sequencing data analysis

   Sequences were aligned against the NCBI human GRCh38 assembly using
   STAR[144]^30 version 2.7.1a in 2-Pass mode, with an average mapping
   rate of 85% for uniquely mapped reads corresponding to an average of 47
   million aligned reads per sample. Counts were summarized at the gene
   level using featureCounts from the Subread package using strand
   information and the NCBI GRCh38 annotations.[145]^35 Differential
   expression was assessed using the DESeq2 Bioconductor package,[146]^36
   specifically the Wald test for significance of GLM coefficients. We
   used a significance threshold of 1.5 fold (0.67 log2-fold) change and
   adjusted P values < 0.05. Variance stabilizing transformation as
   implemented in DESeq2 was used as a normalization scheme for principal
   components analysis and hierarchical clustering. Validation cohort
   samples were normalized as an ensemble with the discovery cohort for
   the final models presented in this study. Principal Components Analysis
   (PCA) was conducted in R using the prcomp function without scaling.
   Heatmaps were generated using the heatmap.2 function in R, with Pearson
   distance and Ward D2 agglomeration of row-scaled VST counts.

   Pathway enrichment analysis was conducted in R using the
   ClusterProfiler[147]^37 and ReactomePA[148]^6 packages using
   hypergeometric tests.

   For whole exome sequencing, sequence alignment and single nucleotide
   variants and indels analysis were performed exactly as
   described.[149]^33

Random forest and logistic regression analysis

   Random Forest (RF) and Logistic Regression (LR) classifiers were
   developed using the Python scikit-learn library. Selecting subsets of
   the most informative features was based on an outlier analysis of the
   distribution of RF model feature importance metrics (Gini impurity
   index). Training and testing sets were split at 40% and 60% of the
   discovery dataset, respectively. Given the relatively small number of
   samples, the accuracy and generalizability of the models were assessed
   based on bootstrap resampling (n = 500). The class labels were balanced
   for each resampling, such that the difference was less than 15% of the
   total number of labels in the training set (corresponds to 6 or 7 of
   each sample class out of 33 tumors in discovery cohort). Exhaustive
   grid search for parameter optimization indicated that pruning should
   not be used, and generally the RF models behaved similarly across a
   wide range of number of trees. Feature vectors were scaled for LR
   models. The validation set of 8 tumors was evaluated after development
   of an ensemble classifier obtained from bootstrap resampling of the
   discovery cohort. A Jupyter Notebook including the Python code used to
   train the LR models against the discovery cohort and test against the
   validation cohort has been provided (see [150]Key resources table).
   This code (“PRETACE” = Predict REsponse to TACE) can be used to
   classify additional samples provided the RNA-Seq expression values are
   normalized with the discovery cohort samples as per this study using
   the samples deposited in the EGA (EGAS00001005558).

Statistical analysis of baseline characteristics and patient survival

   Fishers’s exact tests, chi square tests, Mann-Whitney tests and
   survival analysis were performed using GraphPad Prism version 9.2.0 for
   macOS, GraphPad Software, San Diego, California USA,
   [151]https://www.graphpad.com.

Acknowledgments