Abstract Background Non-alcoholic fatty liver disease (NAFLD) is affecting more people globally. Indeed, NAFLD is a spectrum of metabolic dysfunctions that can progress to hepatocellular carcinoma (NAFLD-HCC). This development can occur in a non-cirrhotic liver and thus, often lack clinical surveillance. The aim of this study was to develop non-invasive surveillance method for NAFLD-HCC. Methods Using comprehensive ultra-high-performance liquid chromatography mass-spectrometry, we investigated 1,295 metabolites in serum from 249 patients. Area under the receiver operating characteristic curve was calculated for all detected metabolites and used to establish their diagnostic potential. Logistic regression analysis was used to establish the diagnostic score. Findings We show that NAFLD-HCC is characterised by a complete rearrangement of the serum lipidome, which distinguishes NAFLD-HCC from non-cancerous individuals and other HCC patients. We used machine learning to build a diagnostic model for NAFLD-HCC. We quantified predictive metabolites and developed the NAFLD-HCC Diagnostic Score (NHDS), presenting superior diagnostic potential compared to alpha-fetoprotein (AFP). Patients’ metabolic landscapes show a progressive depletion in unsaturated fatty acids and acylcarnitines during transformation. Upregulation of fatty acid transporters in NAFLD-HCC tumours contribute to fatty acid depletion in the serum. Interpretation NAFLD-HCC patients can be efficiently distinguished by serum metabolic alterations from the healthy population and from HCC patients related to other aetiologies (alcohol and viral hepatitis). Our model can be used for non-invasive surveillance of individuals with metabolic syndrome(s), allowing for early detection of NAFLD-HCC. Therefore, serum metabolomics may provide valuable insight to monitor patients at risk, including morbidly obese, diabetics, and NAFLD patients. Funding The funding sources for this study had no role in study design, data collection, data analyses, interpretation or writing of the report as it is presented herein. Keywords: NAFLD, HCC, metabolomics, lipidomics, biomarker discovery Abbreviations: AC, acylcarnitines; AV-HCC, alcohol- and viral-associated HCC; AFP, alpha-fetoprotein; AA, amino acids; ArAA, aromatic amino acids; BA, bile acids; BMI, body mass index; BCAAs, branch-chain amino acids; Cer, ceramides; ChoE, cholesteryl esters; CTRL, control; DEM, differentially expressed metabolites; DG, diglycerides; LPC, lysophosphatidylcholines; LPE, lysophosphatidylethanolamines; LPI, lysophosphatidylinositols; CMH, monohexylceramides; MUFA, monounsaturated fatty acids; OB-NAFLD, morbidly obese bariatric surgery NAFLD; NAFLD, non-alcoholic fatty liver disease; NAFL, non-alcoholic fatty liver; NAS, non-alcoholic steatohepatitis; NAE, N-acyl ethanolamines; oxFA, oxidized fatty acids; PC, phosphatidylcholines; PE, phosphatidylethanolamines; PI, phosphatidylinositols; PUFA, polyunsaturated fatty acids; SFA, saturated fatty acids; SM, sphingomyelins; S1P, sphingosine-1-phosphate; TG, triglycerides __________________________________________________________________ Research in context. Evidence before this study Non-alcoholic fatty liver disease (NAFLD) is a group of metabolic disorders that can progress to hepatocellular carcinoma (HCC) often without a background of cirrhosis. Non-cirrhotic NAFLD patients are not under surveillance for HCC development. Some metabolomic studies distinguish stages of NAFLD (steatosis, steatohepatitis and cirrhosis). However, NAFLD versus early hepatocarcinogenesis (NAFLD-HCC) in a metabolic background has not been investigated. Added value of this study This is the first metabolomic study with comprehensive serum lipidomics, investigating a broad-range of lipid classes in NAFLD and NAFLD-HCC. This study provides evidence of lipidomic utility in NAFLD-HCC detection, presents the NAFLD-HCC Diagnostic Score (NHDS) as a putative screening tool for HCC development in a metabolic background and provides a comprehensive lipidomic landscape of NAFLD-HCC patients serum. We show that fatty acid (FA) uptake is increased in NAFLD-HCC tumours, leading to FA depletion in the circulation. Implications of all the available evidence Nearly 30% of newly diagnosed HCC patients have underlying metabolic liver disease. Unlike in the case of viral hepatitis or alcoholic-related HCC, NAFLD-HCC often develops without cirrhosis and thus, patients are not under surveillance. This demonstration of effectiveness using a non-invasive, blood-based assay in detection of resectable (early) NAFLD-HCC should encourage consideration for surveillance of this rapidly increasing, at-risk population. Alt-text: Unlabelled box 1. Introduction Non-alcoholic fatty liver disease (NAFLD) involves a spectrum of metabolic diseases affecting 24% of the population globally [63][1]. NAFLD is associated with obesity and type 2 diabetes [64][2], and represents a spectrum of non-malignant conditions that range from hepatic steatosis to non-alcoholic steatohepatitis (NASH) with fibrosis and ultimately cirrhosis [65][3]. NAFLD is emerging as a leading risk factor of hepatocellular carcinoma (HCC) in both men and women (hazard ratio ∼17) [66][4], but tools to detect the progression of NAFLD to HCC remain inadequate. HCC is the fourth most common cause of cancer-related death worldwide [67][5] with rapidly increasing incidence and mortality rates [[68]6,[69]7]. The incidence of NAFLD-related HCC (NAFLD-HCC) varies between 0·5% and 19·5%, depending on the disease state (presence or absence of cirrhosis) and geography [70][7], [71][8], [72][9]. Indeed, the prevalence of NAFLD-HCC is increasing compared to alcohol- and viral hepatitis-related HCC (AV-HCC) [[73]8,[74]10]. Importantly, accumulating evidence suggests that NAFLD-HCC may develop in a non-cirrhotic background [[75]9,[76]11]. This presents a major clinical challenge as non-cirrhotic NAFLD patients currently are not under surveillance for developing HCC [[77]7,[78]11]. Furthermore, NAFLD-HCC is characterised by unique mutational signatures and an immune environment, which plays an important role in the response to checkpoint inhibitors and may lead to therapeutic failure [79][12], [80][13], [81][14]. Therefore, non-invasive serum biomarkers are urgently needed to monitor NAFLD, its progression to HCC, and to distinguish NAFLD-HCC from other aetiologies. Serum metabolomics provides insight into holistic metabolic changes and can potentially predict hepatic pathological lipid abundance [82][15] and is reflective of the liver pathology [83][16]. However, metabolomic profiles are highly dependent on ethnicity and risk factors such as obesity and diabetes [84][17], [85][18], [86][19], [87][20]. The understanding of the biology underlying NAFLD-to-HCC progression is still lacking. Therefore, analysis of the metabolic shifts in the liver and in circulation at the stage of NAFLD and during malignant progression (NAFLD-HCC) is crucial for the development of new diagnostics and therapeutic modalities [88][21]. The aim of this study was to develop and validate a non-invasive biomarker for surveillance and early detection of NAFLD-HCC. Here, we used a comprehensive serum metabolomics-based approach to identify unique NAFLD-HCC biomarkers, allowing us to distinguish early NAFLD-HCC from patients with NAFLD or HCC on a background of alcohol or viral hepatitis (AV-HCC). 2. Patients and Methods 2.1. Study population A total of 249 patient samples were collected in this international and multicentre study. All samples were collected and stored according to REMARK for biomarker analysis. Our study cohort was divided into discovery and validation sets. The proportion of NAFLD-HCC patients in this study (17·3%) is reflecting the reported prevalence of NAFLD-HCC [89][7], [90][8], [91][9]. The discovery set included serum samples from 196 patients from Spain and France: 27 patients with NAFLD-HCC, 32 patients with AV-HCC, 102 morbidly obese NAFLD patients undergoing bariatric surgery (OB-NAFLD), and 35 healthy subjects described previously [92][22]. Based on unsupervised clustering (Supplementary Materials) and clinical information, we classified 9 obese bariatric surgery patients with a NAS<3 and liver fibrosis score <2 as controls (CTRL). The validation set included serum samples from 37 NAFLD patients from Chile and Spain and plasma sampled from 16 NAFLD-HCC patients in Brazil and Denmark. NAFLD patients from the validation set were overweight (body mass index (BMI>25)) or obese (BMI) >30), however, significantly leaner compared to patients in the OB-NAFLD group (p<0·0001). All patients had biopsy-proven NAFLD. The NAFLD-HCC group consisted of patients who were diagnosed non-alcoholic steatohepatitis (NASH) based on liver biopsy, self-reported alcohol consumption (<20 g/day) and hepatitis B or C serology (hepatitis B surface antigen, hepatitis B surface antibody, hepatitis B core antibody, and hepatitis C antibody). All patients were qualified for curative liver resection. Serum was obtained from all patients prior to surgery, including patients undergoing bariatric surgery or liver resection. The clinical and biochemical representation of the study population is presented in Supplementary Table 1. The NAFLD-HCC patients were diagnosed before surgery with imaging (33%), biopsy (26%), both (15%), or medical concilium (11%). For remaining, 15% of patients’ diagnostic method was not reported. After removal tumours were classified by a skilled pathologist. In all patients’ blood was collected before surgical intervention. 2.2. Ethics The study was performed following individual patient consent, local institutional review board (IRB) approval, and assessed by the Committee on Health and Research Ethics for the Capital Region of Denmark for use of archival material no. 17029679. All patient data sets were anonymised. 2.3. Metabolomic analyses Comprehensive metabolomics including 1295 metabolites was performed on serum samples with three platforms as described previously [93][23]. Briefly, metabolite extraction was accomplished by fractionating the samples into pools of species with similar physicochemical properties, using appropriate combinations of organic solvents. Two UHPLC-time of flight-MS based platforms analysing methanol and chloroform/methanol serum extracts were combined with the amino acid measurement using an UHPLC-single quadrupole-MS based analysis. Platform used for the analysis methanol extraction was optimized for the profiling of fatty acids, oxidized fatty acids, acyl carnitines, lysoglycerophospholipids (monoacylglycerophospholipids and monoetherglycerophospholipids), free sphingoid bases, bile acids, and steroid sulfates. The chloroform/methanol extract platform provided coverage over glycerolipids (di- and triglycerides), cholesterol esters, sphingolipids (ceramides and sphingomyelins), and glycerophospholipids (diacylglycerophospholipids and 1-ether, 2-acylglycerophospholipids). Metabolite extraction procedures, chromatographic separation conditions, and mass spectrometric detection conditions have been previously described [94][23] Data pre-processing generated a list of chromatographic peak areas for the metabolites detected in each sample injection. An approximated linear detection range was defined for each identified metabolite, assuming similar detector response levels for all metabolites belonging to a given chemical class represented by a single standard compound. Metabolites for which more than 30% of data points were found outside their corresponding linear detection range were not used for statistical analyses. Intra and inter batch data normalisation was performed following the procedure described by Martinez-Arranz et al [95][24]. For fatty acid profiling we used tissue samples matched to serum samples from 9 CTRL, 32 NAFLD, 20 NASH, 27 surrounding liver tissues, and 27 tumour tissues. Methanol with internal standards was added to liver tissue (30:1, v/w) and the homogenization of the resulting mixture was then performed using a Precellys 24 homogenizer at 6500 rpm for 26 seconds. After brief vortex mixing, the samples were incubated for 1h at -20 °C. Samples were centrifuged at 18000 x g for 15 minutes at 4 °C. 120 μL of supernatant were collected from every sample and transferred to vials for UHPLC-MS analysis. Absolute quantitation of Linoleyl carnitine, Linoleic acid, Osbond acid, 9 (Z),12 (Z)-Hexadecadienoic acid, 9 (E)-Tetradecanoic acid, Tetradecadienoic acid, Hexadecatrienoic acid, Hydroxyoctadecadienoic acid, 1-Hydroxy-2-Linoleoyl-sn-Glycero-3- Phosphatidylcholine 1-Linoleoyl-2-Hydroxy-sn-Glycero-3-Phosphatidylcholine, and 1-Hydroxy-2-Docosapentaenoyl-sn-Glycero-3-Phosphatidylcholine was also performed. Stock standard solution of Linoleyl carnitine, Linoleic acid, osbond acid, 9 (Z),12 (Z)-Hexadecadienoic acid, 9 (E)-Tetradecanoic acid, (2E, 4E)-2,4-Tetradecadienoic acid, 7 (Z),10 (Z),13 (Z)-Hexadecatrienoic acid, 9- (S)-HODE and 18:0 Lyso PC were prepared individually in methanol at a concentration level of approximately 100, 1000 or 10000 μg/mL, depending on the compound. The working standard was prepared by mixing the appropriate amount of each standard solution in methanol to reach a final concentration of approximately 100 μg/mL. Calibration standards were prepared by consecutive dilutions in methanol, ranged between 0·0005-5 μg/mL for FFAox13 and 0·005-50 μg/mL for the rest of the standards in the calibration curve. Likewise, internal standards (IS) stock solution of Octadecanoyl (18,18,18-D3)-L-Carnitine, Linoleic-9,10,12,13-D4 acid, Hexadecatrienoic 7 (Z),10 (Z), [96][13]14-D6 acid, and 18:1-d7 Lyso PC were prepared at a concentration of 1000 or 5000 μg/mL. The IS working solution was prepared by mixing the appropriate amount of each stock solution in methanol to reach the desire concentration. Quality control (QC) samples (reference sera samples commercially available) were treated according to the same protocol that serum and plasma samples and were analysed against the calibration curve. The intra-day precision was determined by analysing five replicates. To estimate the concentration of validated metabolites in the discovery set, the measured absolute values of metabolites (normalised to IS and standard curves) were plotted against the normalised chromatographic peak areas for 58 samples. Simple linear regression was applied to generate the equation for each metabolite and estimate R-square (Goodness of fit). The equations used for concentration estimation are presented in Supp. Table 2. 2.4. Gene expression analysis To estimate expression of fatty acid transport and metabolism genes we used 48 NAFLD-HCC tumour tissues and 47 matched, surrounding tissues. RNA was isolated from 20 mg of frozen tissue with AllPrep DNA/RNA/Protein Mini Kit (Qiagen) according to manufactures recommendation and sequenced Illumina PE150 (Novogene). Fastq files were trimmed with Trimmomatic-0-2·36 (TruSeq3-PE illumina adapters cut). Reads were annotated with STAR-2.5.1a to hg38 canonical genome assembly 1 pass mode, with default parameters. Gene reference (Gencode.v30) was used for gene abundancy estimation is and normalised by gene length (transcript per million, TPM). Differential expression between tumour and surrounding tissue was established with t-test (p<0·05). The Level 3 TCGA gene expression data (LIHC.rnaseqv2__illuminahiseq_rnaseqv2) from TCGA [97][25] were downloaded from Firebrowse ([98]http://firebrowse.org/). Clinical data were obtained from ([99]https://www.cbioportal.org/). Total of 12 tumour samples had identified NAFLD as one of aetiologies. For surrounding liver tissue only two matched NAFLD samples were available in the dataset, thus we included ‘Other’ (n=4) and ‘No History of Primary Risk Factors’ (n=19) as controls. 2.5. Statistics We identified 0·3% of missing values in the matrix with the maximum of missing values in one feature equal to 15%. The missing values were estimated by the k-nearest neighbour method with Metaboanalyst 4.0 [100][26]. Data were quantile-normalised, log-transformed before analysis (Metaboanalyst 4.0). The covariate testing and correction was performed with linear regression priori to analysis, and age, sex and BMI were included as covariates in differential expression analysis (R 4.0.4) The outline of statistical approaches is presented in [101]Figure 1a. Figure 1. [102]Figure 1 [103]Open in a new tab Workflow of statistical approaches and multivariate analysis between the patient groups. a. Schematic outline of the analytical steps applied post-detection and normalisation of metabolites b. The 3 components that based on spares partial least squares discriminant analysis (sPLS-DA model) of all detected metabolites show samples separation. c. Unsupervised hierarchical clustering of patient subgroups (each subgroup is averaged) across all metabolites (Ward's method clustering). Differences in biochemical parameters were established with one-way ANOVA, Kruskal-Wallis test with Tukey's multiple comparisons, or Fisher exact test for categorical data (Prism 8.2.0). The covariate testing and correction was performed with linear regression priori to analysis, and age, sex and BMI were included as covariates in differential expression analysis (R 4.0.4). Data sets per metabolic class were calculated as the sum of the normalised areas of all the metabolites with the same chemical characteristics. The outlier analysis was performed before one-way ANOVA or Kruskal-Wallis test with Tukey's or Dunn's multiple testing (Prism 8.2.0) depending on data normality. Multivariate analyses were performed with package mixOmics [104][27] or Metaboanalyst 4.0. The sparse partial least squares discriminant analysis (sPLS-DA) was used to investigate the best separation between the groups for multiple groups and orthogonal partial least squares discriminant analysis (oPLS-DA) for pair-wise compresence. Hierarchical cluster analysis was performed with Ward's algorithm using Euclidean distances. The receiver operating characteristic (ROC) curves were generated for pair-wise comparison and areas under the curves (AUC) were estimated with Metaboanalyst 4.0 or Prism 8.2.0 The diagnostic model was constructed using ROC curves generated by Monte-Carlo cross-validation (MCCV) using balanced sub-sampling. In each MCCV, two-thirds (2/3) of the samples were used to evaluate the feature importance and the classification model was validated on 1/3 of the samples that were left out. The procedure was repeated 50 times to calculate the performance and confidence interval of each model detecting the optimal number of features for best accuracy. The linear support vector machine (SVM) method was used for sample classification. Feature selection was performed with SVM Mean Importance Measure. The average accuracy of the model was based on 100 cross validations. Model performance was measured with 1000 permutations and both area under the ROC curve and predictive accuracy p<0·001 (Metaboanalyst 4.0). NAFLD-HCC diagnostic score (NHDS). In the discovery set of 196 samples, 8 metabolites [105][1] had commercial standards allowing absolute quantification, [106][2] highly correlated between measured concentration and peak areas. These were used to build an SVM model to identify the optimal number of metabolites and select features (Mean Importance Measure) distinguishing NAFLD-HCC. The concentrations of selected metabolites were used to build a logistic regression model (with an additional 10-fold Cross Validation) retaining metabolites in the equation with p<0·05. Model performance was tested with 1000x permutations and both AUC and predictive accuracy reaching p<0·001 (Metaboanalyst 4.0). A logistic model was generated based on metabolite concentrations in the discovery set data establishing the NAFLD-HCC diagnostic score (NHDS) and the cut-off value. The NHDS was calculated for the validation set. ROC curves between the groups were computed and compared (Prism 8.2.0). The odds ratio and relative risk for NHDS, clinical and biochemical features were established using Prism 8.2.0. We generated confusion matrixes for the features in discovery and validation sets as well as pulled patient sets using a cut-off value of age (50 years), BMI=30, and reference ranges for biochemical features. Statistical significance was measured with Fisher's exact test (p<0·05), the relative risk was calculated with Koopman asymptotic score and odds ratio with the Baptista-Pike method. Role of covariates on the performance of NHDS. Patients were stratified in groups according to the underlying condition (diabetes, dyslipidaemia, cirrhosis (NAFLD-HCC)), reported medication (statin and paracetamol) use or NAFLD risk-genotype of Patatin-like phospholipase domain-containing protein 3 (PNPLA3), Transmembrane 6 Superfamily Member 2 (TM6SF2), Membrane-Bound O-acyltransferase Domain Containing 7 (MBOAT7). Pearson correlation was applied for pattern hunter analysis to detect metabolites significantly correlating with progressive increase or decline. Metabolites were considered as correlated with pattern when (q<0·05) and abs (r)> 0·3. Pathway overrepresentation analysis was performed with the Integrated Molecular Pathway Level Analysis (IMPaLA) [107][28]. As the majority of our metabolites belong to lipids, which to a high extend are not included in metabolic pathways, we used all detected serum metabolites with HMDB identifiers as background in the enrichment analysis. Lipid pathway enrichment analysis was performed with BioPAN software and default settings. 2.6. Role of funding sources The funding sources for this study had no role in study design, data collection, data analyses or interpretation nor writing of the report as it is presented herein. 3. Results 3.1. Clinical and biochemical features of studied populations We analysed the metabolic composition of serum obtained from 249 patients divided into a discovery set comprising patients with NAFLD-HCC (n=27), non-cancerous individuals (n=137) and 32 patients with alcohol- or viral-related HCC (AV-HCC). The control (CTRL) group included 44 individuals (35 healthy subjects [108][22] and 9 bariatric surgery patients with a NAFLD-activity score (NAS)<3 and fibrosis score <2) and 93 morbidly obese NAFLD patients awaiting bariatric surgery (OB-NAFLD). Furthermore, we independently validated our findings in serum from 37 patients with NAFLD and in plasma from 16 patients with NAFLD-HCC. Importantly, all NAFLD-HCC patients had no prior history of viral hepatitis or excessive alcohol consumption. The clinical, pathological, and biochemical features of all patients are summarized in Supp. Table 1 and Supp. Fig. 1. We observed a significant difference in mean age (NAFLD-HCC patients were significantly older compared to CTRL, OB-NAFLD and AV-HCC), BMI (significantly higher in OB-NAFLD compared to CTRL, AV-HCC and NAFLD-HCC), as well as gender ratio (significantly higher prevalence of HCC among males, but equal between NAFLD-HCC and AV-HCC). Nevertheless, unsupervised principal component analysis (PCA) showed that the metabolic profiles grouped independently of these covariates (Supp. Fig. 2). Additionally, the presence of underlying diabetes, cirrhosis, and level of fibrosis did not affect patients’ grouping in unsupervised clustering. Further, the analysis of alpha-fetoprotein (AFP) or other liver biochemical features (alanine aminotransferase (ALT), gamma-glutamyl transferase (GGT), alkaline phosphatase (AP), bilirubin, albumin, and prothrombin activity) to follow the liver function showed extensive variability already in non-cancerous patients, but generally remained within the range of their references (Supp. Fig. 1). As such,