Abstract Colorectal cancer (CRC) is a high incidence disease, characterized by high morbidity and mortality rates. Early diagnosis remains challenging because fecal occult blood screening tests have performed sub-optimally, especially due to hemorrhoidal, inflammatory, and vascular diseases, while colonoscopy is invasive and requires a medical setting to be performed. The objective of the present study was to determine if serum metabolomic profiles could be used to develop a novel screening approach for colorectal cancer. Furthermore, the study evaluated the metabolic alterations associated with the disease. Untargeted serum metabolomic profiles were collected from 100 CRC subjects, 50 healthy controls, and 50 individuals with benign colorectal disease. Different machine learning models, as well as an ensemble model based on a voting scheme, were built to discern CRC patients from CTRLs. The ensemble model correctly classified all CRC and CTRL subjects (accuracy = 100%) using a random subset of the cohort as a test set. Relevant metabolites were examined in a metabolite-set enrichment analysis, revealing differences in patients and controls primarily associated with cell glucose metabolism. These results support a potential use of the metabolomic signature as a non-invasive screening tool for CRC. Moreover, metabolic pathway analysis can provide valuable information to enhance understanding of the pathophysiological mechanisms underlying cancer. Further studies with larger cohorts, including blind trials, could potentially validate the reported results. Keywords: metabolomics, colorectal cancer, screening test, fecal occult blood test, ensemble machine learning 1. Introduction Colorectal cancer (CRC) represents approximately 10% of all annually diagnosed cancers and cancer-related deaths worldwide. It is the second most frequent cancer diagnosed in women and the third most in men [[46]1]. Its incidence is higher in developed countries. However, while screening programs and improved lifestyle habits help stabilize incidence in these countries, the global incidence is expected to increase as affluence increases and lifestyle changes in developing countries [[47]2]. Both hereditary and environmental risk factors play a role in the onset of CRC. Family history of CRC is a known risk factor for developing the disease and encompasses both genetic and shared environmental risk factors [[48]3]. Among familial cases, a subgroup of patients (3–5% of all CRC [[49]4]) is affected by hereditary CRC syndrome, which can be subdivided into non-polyposis (such as Lynch syndrome) and polyposis syndromes, such as Peutz–Jeghers syndrome and familial adenomatous polyposis (FAP) [[50]5]. Known modifiable environmental risk factors include smoking, eating processed and red meat, alcohol intake, low intake of vegetables and fruits, and obesity [[51]6]. Among non-modifiable factors, male sex, age, and inflammatory bowel diseases such as ulcerative colitis are associated with an increased risk of developing CRC [[52]7]. Clinical signs and symptoms associated with CRC are occult or overt rectal bleeding, change in bowel habits, anemia, weight loss, and abdominal pain. However, they are not specific for this condition. Unfortunately, CRC may be asymptomatic until it reaches advanced stages [[53]8]. Secondary prevention, implemented through screening programs aimed at an early diagnosis, is of crucial importance for CRC, for several reasons. First, the most important prognostic factor is disease stage at diagnosis. Indeed, in the USA in the period 2001–07, 5-year survival was 90.1% for patients with localized stage, 69.2% for patients with regional spread, and 11.7% for patients with distant tumor spread [[54]9]. Second, CRC mostly progresses from precursor lesions (mainly polypoid) by the sequential accumulation of genetic mutations and epigenetic alterations, over a mean progression period of 10–15 years [[55]10]. Thus, the ideal method of screening should be able to detect pre-neoplastic lesions or, at least, localized neoplastic lesions, to allow radical and resolutive intervention. At present, CRC surveillance is based on invasive, i.e., colonoscopy, and non-invasive methods, i.e., fecal occult blood test, targeting either heme (guaiac fecal occult blood test-gFOBT) or hemoglobin (fecal immunochemical test; hemoglobin-FIT). A meta-analysis of four randomized controlled trials concluded that annual or biennial gFOBT screening had no effect on CRC incidence but led to an average 16% mortality reduction [[56]11]. No randomized controlled trial has reported the impact of FIT screening on CRC incidence and mortality; however, the latter method is preferred over the former because it is more sensitive. Nevertheless, as with all screening tests, FIT diagnostic performance depends on the cutoff value for a positive test result [[57]12]. Colonoscopy is, at present, the best method to screen for CRC, in terms of specificity and sensitivity [[58]13]. Nevertheless, randomized controlled trials aimed at quantifying the impact of colonoscopy screening on disease outcomes are still ongoing (clinical trial numbers: [59]NCT01239082, [60]NCT00883792, [61]NCT02078804). An advantage is that colonoscopy offers the possibility of direct lesion removal [[62]13]; however, it is invasive and requires adequate bowel preparation in addition to highly trained personnel. Alternative screening methods are sigmoidoscopy and computed tomography (CT) colonography. Four large randomized controlled trials on sigmoidoscopy screening have been done. All studies showed a reduced incidence of colorectal cancer and, three of four resulted in a lowered relative mortality risk as well [[63]14,[64]15,[65]16,[66]17]. CT colonography has comparable accuracy with respect to colonoscopy [[67]18]. However, both of these are often two-step screening methods, because they must also perform a total colonoscopy in any case in which further diagnosis is needed. In spite of the wide range of screening options currently available for CRC, such tests remain under-utilized by the public [[68]19]. In addition to low public awareness of the importance of screening programs, psychological factors play a key role in determining how often CRC screening is performed. Invasive procedures, such as colonoscopy, are often rejected due to the fear of pain, complications, and discomfort. On the contrary, stool-based approaches are generally preferred due to lower costs and invasiveness, but the need for multiple tests represents a major discouraging factor for many [[69]20]. Even though utilization of invasive procedures is significantly lower than that of stool-based methods, overall compliance with the latter remains low because people tend to postpone these investigations, despite the potentially crucial loss of time before being screened [[70]21]. However, as the adherence of the target population to screening programs is pivotal to obtaining successful public health results, the need to address this issue becomes prominent. Hence, significant effort is expended to develop novel diagnostic tools in order to encourage and simplify the screening of CRC, including, as an example, the development of a toilet paper-based FOBT method [[71]22]. In this context, metabolomics analysis may offer valuable support. Metabolomics is an emerging field of research in the -omic domain and refers to a comprehensive analysis of low molecular weight compounds, such as metabolic substrates and products, lipids, small peptides, vitamins, and other protein cofactors, generated by metabolism, in a biological fluid. It is a rapidly growing field in biomarker discovery [[72]23]. Moreover, unlike the other -omic sciences, such as genomics, transcriptomics, and proteomics, it can be more precise in the characterization of multifactorial diseases because it reflects the interactions between genes and the environment [[73]23,[74]24]. Compared to other diagnostic tools, the metabolomics approach can offer high diagnostic performance by means of a single analysis, in a cheap, fast, and non-invasive manner, potentially representing the ideal screening test. Furthermore, as the metabolome provides unique information regarding the mechanisms underlying the disease onset and progression, a thorough investigation of the metabolomic fingerprint of CRC may provide crucial insights to enhance the understanding of the pathology as well as to identify prognostic biomarkers and assess the severity of the disease [[75]25]. Several metabolomic studies have been conducted on a variety of biological matrices (blood, urine, fecal water, tissue) in small cohorts of colorectal cancer patients [[76]26,[77]27]. These compared either metabolic profiles to healthy subjects (and to normal tissue samples) [[78]26] or to patients with benign polypoid pathology [[79]28], using gas and liquid chromatography coupled to mass spectrometry (GC-MS, LC-MS) or nuclear magnetic resonance (NMR) as analytical techniques. Here, we describe the results of an untargeted metabolomics-based profiling of serum samples collected from subjects that tested positive using the FOBT screening program. Stratifying them according to their colonoscopy-based biopsies, the population was divided into three groups: healthy subjects, participants with benign colon lesions, and patients with CRC. The specific aim was to propose a novel, non-invasive method for the screening of CRC using a robust ensemble machine learning approach based on serum metabolomes. 2. Results The reported results were achieved by analyzing serum samples taken from 200 individuals who tested negative for FOBT or positive and subsequently underwent a colonoscopy and a biopsy to pathologically analyze any evident lesions. Fifty of the 200 participants presented with no lesions and were considered healthy subjects (HS), 50 presented with benign colon or rectum tumors (BCRT), and 100 were diagnosed with CRC. Gas chromatography–mass spectrometry analysis of derivatized samples detected up to 261 peaks in each specimen using an untargeted metabolite extraction procedure. Peaks present in at least 75% of samples and with sufficient signal to be confirmed as metabolites using library comparison were further investigated. As a result, a total of 243 signals were consistently detected. [80]Supplementary Figure S1 reports the deconvoluted chromatograms of typical CTRL and CRC samples. For all of the enrolled subjects, age, sex, weight, height, and biochemical parameters results were recorded. Moreover, the presence of other pathological conditions, as well as chronic treatments for these conditions were investigated ([81]Table 1). These parameters were normally distributed according to the Shapiro–Wilk test. All statistical comparisons used a significance value of α = 0.05 (described in detail below). CRC patients were significantly older than HS subjects (p = 0.009), whereas CRC patients showed a lesser mean body mass index (BMI) compared to both HS and BCRT subjects (p < 0.001). Table 1. Enrolled subject characteristics (mean ± standard deviation or %). Abbreviations used are HS: Healthy subjects, BCRT: Benign colon or rectum tumors, CRC: Colorectal cancer affected patients, BMI: Body mass index, HDL: High-density lipoprotein, LDL: Low-density lipoprotein, GGT: Gamma-glutamyltransferase, AST: Aspartate aminotransferase, ALT: Alanine transaminase LDH: Lactate dehydrogenase. HS (n = 50) BCRT (n = 50) CRC (n = 100) Age (years) 61.6 ± 7.0 62.8 ± 7.1 66.2 ± 11.3 * Men (%) 56 59 64 Weight (kg) 76.4 ± 15.5 80.0 ± 16.9 72.8 ± 15.1 ^§ Height (cm) 165.0 ± 9.5 167.5 ± 8.7 167.7 ± 9.4 BMI (kg/cm^2) 27.9 ± 4.3 28.4 ± 4.8 25.7 ± 9.4 *^,§ Blood Pressure (mm Hg) Systolic 135.2 ± 24.4 132.3 ± 17.7 139.9 ± 17.4 Diastolic 81.6 ± 11.4 81.9 ± 11.1 80.7 ± 8.0 Heart rate (bmp) 79.7 ± 7.7 79.8 ± 6.8 79.4 ± 7.5 Oxygen saturation (%) 99.0 ± 1.5 98.8 ± 1.6 99.7 ± 10.0 Azotemia (g/dL) 38.4 ± 10.4 40.8 ± 18.8 43.2 ± 13.5 * Total Cholesterol (mg/dL) 191.9 ± 39.1 194.9 ± 42.2 189.2 ± 40.0 HDL (mg/dL) 57.2 ± 13.1 52.5 ± 14.7 62.9 ± 18.8 ^§ LDL (mg/dL) 114.2 ± 30.1 113.7 ± 33.9 113.9 ± 33.9 Triglycerides (mg/dL) 115.4 ± 57.3 138.4 ± 95.3 116.6 ± 63.5 Creatinine (mg/dL) 0.8 ± 0.2 0.9 ± 0.4 0.9 ± 0.3 * Alkaline phosphatase (UI/L) 53.7 ± 16.2 55.2 ± 13.3 81.4 ± 59.3 *^,§ GGT (U/L) 26.4 ± 17.5 26.7 ± 22.4 49.2 ± 118.6 Glycaemia (mg/dL) 92.1 ± 25.3 99.1 ± 29.3 101.0 ± 26.7 White blood cells (n/µL) 6725.6 ± 1917.1 8427.6 ± 12315.9 6033.3 ± 1928.7 * Red blood cells (n/µL) 4.97 * 10^6 ± 4.92 * 10^6 4.95 * 10^6 ± 7.14 * 10^6 4.66 * 10^6 ± 6.78 * 10^6 * AST (mU/mL) 21.5 ± 7.7 23.8 ± 11.6 25.2 ± 11.9 * ALT (mU/mL) 25.1 ± 12.2 28.1 ± 14.9 27.2 ± 16.9 LDH (U/L) 169.3 ± 27.1 177.6 ± 28.9 172.3 ± 34.9 Serum iron (µg/dL) 95.0 ± 33.6 95.5 ± 36.9 79.6 ± 42.8 *^,§ Uric acid (mg/dL) 5.4 ± 1.4 5.8 ± 1.7 5.2 ± 1.4 ^§ Other pathologies (n(%)) 45 (90%) 40 (80%) 98 (98%) Hypertension ^¶ 24 (53%) 25 (63%) 49 (50%) Diabetes ^¶ 9 (20%) 8 (20%) 14 (14%) Hypertriglyceridemia ^¶ 2 (4%) 1 (3%) 0 (0%) Hypercholesterolemia ^¶ 6 (13%) 6 (15%) 4 (4%) Heart disease ^¶ 6 (13%) 5 (13%) 7 (7%) Cancer in other organ ^¶ 6 (13%) 3 (8%) 13 (13%) Other ^¶ 11 (24%) 10 (25%) 24 (24%) Pharmacological treatments (n(%)) 39 (78%) 37 (74%) 82 (82%) [82]Open in a new tab * Indicates statistical difference (p < 0.05) compared to HS; ^§ indicates statistical differences (p < 0.05) compared to BCRT; ^¶ indicates the percentage based on the cases with other pathologies. For the purpose of attempting to distinguish serum metabolomes of cancer vs. non-cancer subjects, the HS and BCRT groups were combined to form the control (CTRL) group. Based on this sample aggregation, training and test sets were prepared by randomly dividing the overall dataset (N = 200) into two parts (66:34). One (n = 133; composed of 69 CTRL and 64 CRC) was used to train and cross-validate multiple classification models, while the other (n = 67; with 31 CTRL and 36 CRC) was used to test them. Overall classification performance was evaluated using the test set. In total, 86,625 models, based on 25,141 feature subsets, were developed and tested with the aim of determining the most effective combination of hyperparameters and metabolites to maximize the accuracy of classification of the examined models. Ten machine learning algorithms were trained to classify samples as CTRL or CRC based on the metabolomic profile. These include naïve Bayes (NB), generalized linear model (GLM), logistic regression (LR), fast large margin (FLM), deep learning (DL), decision tree (DT), random forest (RF), gradient boosted tree (GBT), support vector machine (SVM), and partial least square discriminant analysis (PLS-DA). Ultimately, individual results from the ten classification models were statistically “ensembled” to generate an ensemble machine learning algorithm (EML). The best metabolite subsets used for training the final models, as selected by a genetic algorithm (GA), are reported in [83]Supplementary Table S1. The table also reports the metabolites with the highest weight used to build the UpSet representation. As highlighted in [84]Table 2, individual model classification accuracy ranged from 71% to 100% while the EML model reported no classification errors, resulting in 100% accuracy. For EML score evaluation, an EML score = 0 was selected as the optimized cut-off value and represents situations in which the individual votes for and against CRC-positive diagnosis were equal. [85]Figure 1 reports the EML score distribution among the samples in the test set and the corresponding ROC curve. [86]Supplementary Table S2 reports the classification results as well as the classification confidence for the enrolled samples among the test set. Table 2. Performance metrics (value ± standard error) of the individual and the ensembled machine learning classification algorithms when applied to the test set. Abbreviations; NB: Naïve Bayes, GLM: Generalized linear model, LR: Logistic regression, FLM: Fast large margin, DL: Deep learning, DT: Decision tree, RF: Random forest, GBT: Gradient boosted tree, SVM: Support vector machine, PLS-DA: Partial least square discriminant analysis, EML: Ensemble machine learning, S: Sensitivity, Sp: Specificity; PLR: Positive likelihood ratio, NLR: Negative likelihood ratio, NPV: Negative predictive value, PPV: Positive predictive value, A: Accuracy, ND: Not determinable. Model S Sp PLR NLR NPV PPV A NB 0.58 ± 0.10 1.00 ± 0.00 ND 0.42 0.67 ± 0.08 1.00 ± 0.00 0.77 GLM 0.96 ± 0.04 1.00 ± 0.00 ND 0.04 0.96 ± 0.04 1.00 ± 0.00 0.98 LR 0.88 ± 0.06 0.95 ± 0.05 18.58 0.12 0.87 ± 0.07 0.96 ± 0.04 0.91 FLM 1.00 ± 0.00 0.77 ± 0.09 4.40 0.00 1.00 ± 0.00 0.83 ± 0.07 0.89 DL 1.00 ± 0.00 1.00 ± 0.00 ND 0.00 1.00 ± 0.00 1.00 ± 0.00 1.00 DT 1.00 ± 0.00 0.82 ± 0.08 5.50 0.00 1.00 ± 0.00 0.86 ± 0.06 0.91 RF 0.69 ± 0.09 1.00 ± 0.00 ND 0.31 0.72 ± 0.08 1.00 ± 0.00 0.83 GBT 0.46 ± 0.10 1.00 ± 0.00 ND 0.54 0.61 ± 0.08 1.00 ± 0.00 0.71 SVM 0.81 ± 0.08 1.00 ± 0.00 ND 0.19 0.81 ± 0.08 1.00 ± 0.00 0.89 PLS-DA 0.92 ± 0.05 0.87 ± 0.06 7.10 0.10 0.90 ± 0.05 0.89 ± 0.05 0.90 EML 1.00 ± 0.00 1.00 ± 0.00 ND 0.00 1.00 ± 0.00 1.00 ± 0.00 1.00 [87]Open in a new tab Figure 1. [88]Figure 1 [89]Open in a new tab Ensemble machine learning (EML) scores calculated for the healthy controls (CTRL) and patients with colorectal cancer (CRC) among the test set; red dashed line represents the optimized cut-off value (Panel A). Receiver operating characteristic (ROC) curve obtained by varying the cut-off value when applying the EML model to the test set (Panel B); the area under the ROC curve is 1.0. Dotted blue line represents the 95% Confidence Bounds. The PLS-DA scatter plots of the first two latent components, reported in [90]Figure 2A, show the graphical representation of the class separation achieved between CTRL and CRC samples. The model showed the best performance using four latent components ([91]Figure 2B) and was statistically robust with no overfitting as confirmed by the permutation test represented by a histogram plot presented in [92]Figure 2C. Fitting value R^2 and its cross-validation homolog Q^2 were 0.907 and 0.787, respectively. Twelve metabolites were found to be most relevant to the class separation (as determined by a variable importance in projection (VIP) score >2.0). These metabolites, reported in [93]Figure 2D, were: glucose, tertraethylene glycol, fructose, quinolinic acid, tartaric acid, myristic acid, pyruvic acid, estradiol, hydroxylamine, nicotinic acid, oleamide, and palmitic acid. Figure 2. [94]Figure 2 [95]Open in a new tab Partial least square discriminant analysis (PLS-DA) score plot performed to classify CTRL and CRC subjects (panel A). For each axis, the percentage of explained variance is reported in parentheses. Panel (B) reports the PLS-DA classification performance using increasing number of latent variables. The red star indicates the best classifier. (C) Permutation test results in which models were built by randomly assigning the class label and then comparing the performance of the permuted models with that of the original model built with the correct class assignment. These were statistically different (based on 2000 permutations), highlighting the lack of overfitting in the original model. (D) The metabolites showing a variable importance in projection (VIP) score higher than 2.0. The blue bars represent metabolites increased in CTRL, while the red bars represent the metabolites decreased in CTRL with respect to CRC. *** represent metabolites with a p-value < 0.001 (E) Volcano plot reporting metabolite concentration fold-changes and their statistical significance when comparing CTRL vs. CRC subjects. 1. Galactose, 2. 4-Hydroxybenzyl alcohol, 3. Myristic acid, 4. Hydroxylamine, 5. Arabinose, 6. Guanine, 7. Fructose, 8. Tetraethylene glycol, 9. Glucose, 10. Quinolinic acid, 11. Estradiol, 12. Threonine, 13. Glutamine, 14. Glyceryl-glycoside, 15. Oxoproline, 16. Lactose, 17. Oxoglutaric acid, 18. 2-Ketobutyric acid, 19. Mandelic acid, 20. Creatinine, 21. Glutamic acid, 22. Nicotinic acid, 23. Norepinephrine, 24. Acetic acid. Horizontal dashed grey line shows p = 0.05; vertical dashed lines represent log[2]FC = ±1. The exploratory analysis illustrated in the volcano plot of [96]Figure 2E showed that 24 metabolites displayed both large magnitude fold-changes (2 < FC < 0.5) and high statistical significance (p < 0.05) when comparing CTRL vs. CRC among the 200 enrolled subjects. Of these, glucose, quinolinic acid, estradiol, threonine, glutamine, glyceryl-glycoside, oxyproline, lactose, oxoglutaric acid, 2-ketobutyric acid, mandelic acid, creatinine, glutamic acid, nicotinic acid, norepinephrine, and acetic acid were higher in CTRL compared to CRC. Conversely, galactose, 4-hydroxybenzyl alcohol, myristic acid, hydroxylamine, arabinose, guanine, fructose, and tetraethylene glycol were higher in CRC. The statistical significance of each metabolite in model training was evaluated for all of the classification algorithms. Those that were found to be most relevant within a given model combined with their multiple selections (in several classification models) were summarized using an UpSet diagram reported in [97]Figure 3. All selected metabolites were coded according to the Human Metabolites Database (HMDB) and reported in [98]Supplementary Table S2. [99]Supplementary Figure S2 reports the box and whisker plot representation of the relative abundances of the relevant metabolites according to the raw signals and the transformed data. Figure 3. [100]Figure 3 [101]Open in a new tab UpSet representation showing the metabolites selected as significant by a given classification model (horizontal) in addition to multiple models selecting a given metabolite (vertical). These metabolites were also employed to conduct a metabolite-set enrichment analysis, reported in [102]Figure 4. An intricate interplay of a number of different metabolic pathways and metabolites was found. For example: arginine biosynthesis; valine, leucine, and isoleucine biosynthesis; aminoacyl-tRNA biosynthesis; D-glutamine and D-glutamate metabolism; alanine, aspartate, and glutamate metabolism; nicotinate and nicotinamide metabolism; glyoxylate and dicarboxylate metabolism; nitrogen metabolism; galactose metabolism; galactose metabolism and butanoate metabolism were all found to play a role in distinguishing the serum metabolomes of CTRL from CRC subjects. The complete list of involved pathways can be found in [103]Supplementary Table S3. Figure 4. [104]Figure 4 [105]Open in a new tab Metabolite set enrichment analysis establishes whether compounds implicated in a specific pathway are increased compared to casual occurrence applying the hypergeometric test. Node centrality, which represents an estimate of node importance, was achieved by 0 employing the betweenness centrality. This reveals the number of shortest paths passing through the node. Because the metabolic network is directed, the relative betweenness centrality for a metabolite has been applied as the importance measure. The betweenness centrality measure is focused on the total network topology. Pathway relevance (represented in terms of circle size) was evaluated as the distance of each point (a metabolic pathway) from the axis origin. Colors represent the matching status of each pathway (number of reported metabolites compared to the total metabolites in the pathway). 3. Discussion In the present study, we evaluated the diagnostic performance of a machine learning ensemble model based on the statistical comparison of the serum metabolomic fingerprints of individuals that are and are not affected by malignant colorectal cancer. Univariate and multivariate comparison of hundreds of serum metabolites illustrates biochemical discrepancies in serum samples according to the presence or absence of CRC. According to the results, numerous serum metabolites and metabolic pathways seem to be associated with CRC. The analysis of the complex network of metabolic pathways connecting molecules, in combination with a powerful machine learning algorithm, enables an effective group separation, offering an innovative approach for noninvasive CRC screening, and providing useful biochemical insights regarding the involved metabolomic pathways. In particular, lower levels of glucose and glutamine were observed in CRC patients. In cancer cells, glucose and glutamine represent the most consumed nutrients [[106]29]. Moreover, both glutaminase and glutamate dehydrogenase are overexpressed in many cancers [[107]30,[108]31]. Notably, glutamine modulates glutaminolysis. This, in combination with leucine, which is capable of activating glutamate dehydrogenase, induces α-ketoglutarate production preventing glutaminase inhibition by glutamate accumulation [[109]32]. Line et al. [[110]33] reported that low glutamine amounts were associated with older age, advanced-stage cancer, low albumin levels, high carcinoembryonic antigen levels, increased C-reactive protein levels, higher modified Glasgow prognostic scores, and augmented proinflammatory cytokine levels in colorectal cancer-affected patients. Moreover, according to their results, subjects exhibiting decreased glutamine levels had lower overall survival and progression-free survival compared to individuals with higher glutamine levels. The low levels of glutamine are consistent with the well-known “glutamine addiction” of cancer cells [[111]34] reported in bladder cancer [[112]35], lung cancer [[113]36], and glioma [[114]37]. This is due to the need for cancer cells to sustain the production of acetyl-coenzyme A to synthetize fatty acids when pyruvate is converted to lactate. The low levels of aspartic acid reported here could explain the low levels of quinolinic and nicotinic acid levels because these molecules are produced from aspartic acid. The involvement of aspartic acid in CRC was reported several times although with different concentration changes in both serum and feces, highlighting its different role in advanced and early stages of the disease [[115]38]. The pivotal role of aspartic acid was also highlighted by Nishiumi et al. [[116]39] who proposed a four metabolite-based (2-hydroxybutyrate, aspartic acid, kynurenine, and cystamine) screening test to discriminate CRC patients from controls with AUC-ROC of 0.91 and an accuracy of 85%. Nicotinate and nicotinamide metabolism as well as leucine, isoleucine, and valine metabolism; short-chain fatty acid and alanine and aspartate metabolism were shown to have a significant impact on CRC metabolomics signature. Brown et al. [[117]40] also found that nicotinate and nicotinamide metabolism was associated with a CRC metabolomics signature. The higher abundances of tetraethylene glycol and hydroxylamine observed in CRC subjects may be caused by an increased exposure to these molecules or the metabolism of correlated pollutants. The selection of non-smoking subjects, both among CRC patients and healthy controls, allows us to exclude these substances as being related to smoking, although passive smoking exposure was not deeply investigated. In addition, the involvement of short-chain fatty acids (SCFA) including acetic, propionic, and 2-keto butyric acids might be explained by differences in microbiome composition between patients suffering from CRC and healthy controls. Indeed, evidence exists that microbiomes play a crucial role in CRC. In particular, several studies [[118]41,[119]42,[120]43,[121]44] modulating several host mechanisms, ranging from inflammation to DNA damage, and producing metabolites able to modulate cellular transformation and cancer progression or suppression. Gut microbiota imbalance (aka dysbiosis) has been reported in subjects affected by CRC, who exhibit lower counts of commensal bacteria (especially butyrate-producing species) and increased levels of pro-inflammatory opportunistic pathogens. The low concentrations of SCFA reported herein could be interpreted in line with this evidence. The reduced concentration of 4-hydroxybenzyl alcohol reported herein has an established correlation in CRC metabolomics-based studies [[122]40,[123]45]. This metabolite shows good angiogenetic antagonist activity [[124]46] and was decreased in polyvinyl chloride workers with hepatic hemangiosarcoma [[125]47]. 4-hydroxybenzyl alcohol, (aka gastrodigenin), is an intermediate metabolite produced during the biosynthesis of thiamine by Escherichia coli. Luo et al. [[126]48] reported that the anti-angiogenic and anti-tumor effects of 4-hydroxybenzyl alcohol could, in part, be due to upregulation of nuclear factor erythroid 2-related factor 2 (Nrf2), an emerging regulator of cellular resistance to oxidants, and in part to other antioxidant pathways. Moreover, this result could also be related to the already discussed CRC-derived dysbiosis. Lipid involvement in cancer and in particular, colorectal cancer, is established [[127]49]. Li et al. [[128]50] reported a nine biomarker-based panel, containing palmitic amide, oleamide, hexadecanedioic acid, octadecanoic acid, eicosatrienoic acid, LPC (18:2), LPC (20:4), LPC (22:6), myristic acid, and LPC (16:0), for an effective differentiation of early-stage patients from healthy controls (AUC-ROC = 0.991, sensitivity = 0.981 and specificity = 1.000). Herein, we observed high levels of oleamide in serum samples of CRC patients. This endogenous metabolite is known to accumulate in the cerebrospinal fluid during sleep deprivation and induces sleep in animals [[129]51]. The exact mechanism of action of oleamide’s sleep-inducing effects is still to be completely elucidated; however, it is likely that oleamide interacts with multiple neurotransmitter systems. Oleamide is structurally related to the endogenous cannabinoid anandamide and has the ability to bind to the CB1 receptor as a full agonist. The higher concentrations reported herein could reflect cannabinoid receptor expression increase in CRC patients [[130]52]. Overall, the CRC signature seems in part related to the Warburg effect (i.e., altered energetic metabolism of cancer cells to facilitate growth, survival, and proliferation), and specifically, the glutamine addiction reported in several cancer types. These well-established effects may help explain the high sensitivity of the ensemble machine learning model built herein using the serum signature. The other part of the CRC signature seems to be related to more specific effects such as the SCFA imbalance, the lipid profile, especially involving oleamide, and gastrodigenin. These differences may contribute to the high degree of CRC specificity of the serum signature and could lead to a differentiation with other oncological forms. This aspect was not part of our experimental design and represents the weakness of our study. The diagnostic performances of the classification models described herein, taken independently, are comparable to other studies reported in the literature. As an example, the sensitivity and specificity of PLS-DA we reported (92% and 87%, respectively) are comparable to those reported by Farshidfar et al. [[131]53] (85% and 86%, respectively), as well as the results of the minimal panel reported by Li et al. [[132]50] (98% and 100%). The strength of our approach lies in the EML algorithm. Indeed, all individual models (except DL) made some classification errors on the test set. On the contrary, the ensemble did not make any errors because more than half of the models would have to make the same mistake on the same sample for EML to show a classification error. This makes the EML system more robust compared to using a single classification approach and contributes to the observed high performance of the ensemble approach. Nevertheless, our study must only be considered a pilot study. Studies based on a single population can be affected by population selection biases as well as analytical biases. Despite efforts to exclude such biases, and the use of cross-validation and permutation tests to avoid overfittings, this risk cannot be completely excluded. Independent validation from a larger, multi-centric, blind cohort is imperative to probe the diagnostic performance under real-world conditions. In this way, it could be investigated whether confounding factors (other age range subjects, concurrent pathologies, etc.) contribute to limiting its applicability or diminish its overall performance. 4. Materials and Methods 4.1. Study Design and Patients Enrollment The clinical specimens have been collected within the “Prima Prevenzione—SPEM” study protocol—“Analysis of environmental, dietary, transcriptomic and genomic factors as biomarkers for risk assessment and early diagnosis of colorectal cancer within the screening campaign “Prima Prevenzione”, approved by the Ethics Committee “Campania Sud” (approval n° n.144, 6 December 2018). The “Prima Prevenzione—SPEM” study is an epidemiological observational cohort study promoted by the Istituto Zooprofilattico Sperimentale del Mezzogiorno (IZSM) of Portici (Naples), in collaboration with the Local Health Authorities of Salerno (ASL SA) and the National Tumor Institute IRCCS (Istituto di Ricovero e Cura a Carattere Scientifico) “G. Pascale” in Naples. The study design considers an integrated and holistic search for a predictive profile of the occurrence risk of CRC. It takes into account the individual hereditary/etiological causes and their combination, especially focusing on the correlation between environment and health and an innovative perspective of primary (prophylactic interventions) and secondary (early diagnosis) prevention. There are three main study areas: (1) Genomic, epigenomic, and metabolomics biomarkers; (2) exposure to sources of pollution; (3) eating habits and intestinal microbiota. About 60,000 subjects were screened for FOBT; the positive ones were subjected to colonoscopy and eventually to colon mucosa biopsy to confirm the presence of colorectal cancer. Inclusion criteria were: (1) age between 50 and 74 years; (2) residence in the municipality of enrollment for at least 5 years (in order to standardize environmental exposure); (3) history of negative hepatitis B, C and HIV; (4) signature of the informed. Exclusion criteria were: (1) Contraindication to blood sampling; (2) lack of cooperation or poor compliance. Among the enrolled subjects, 200 non-smokers were randomly selected resulting in 3 study groups: (I) Control Healthy Group: Negative or positive fecal occult blood test (FOBT−/+) with a negative endoscopy (n = 50); (II) Benign Colorectal Disease Group: Positive fecal occult blood test (FOBT +) with a positive endoscopy for benign adenomatous polyp without carcinoma (n = 50); (III) CRC Group: Positive fecal occult blood test (FOBT +) with a positive endoscopy and positive histology for CRC (pT1-2 N0 or pT3-4N0 or PT1-4 N +) (n = 100). After subscribing to an informed consent, each subject enrolled in the study first underwent medical examination, completed a questionnaire on lifestyle and eating habits (EPIC questionnaire—European Prospective Investigation into Cancer and Nutrition—validated by WHO—World Health Organization) [[133]54,[134]55], and filled in an anamnestic case report form (CRF). Then, samples of blood (about 50 mL), urine, and feces were collected. All data were treated in accordance with current legislation on privacy (EU 679/2016). Indeed, data encryption was applied in order to ensure the protection of privacy on the web-based platform for the study management ([135]https://pps.openspes.campaniatrasparente.it/, accessed on 2 December 2021). 4.2. Blood Sampling Blood samples were collected in vacutainer tubes for serum (BD REF 366,468 SST II Advance Tube) and separated by centrifugation at +4 °C, 10 min at 2000 RCF. Fresh samples were allocated to biochemical analysis, within 3 h from venous sampling. The remaining aliquots were stored at −80 °C in the biobank dedicated to the study ([136]www.biobancaizsm.it, accessed on 1 December 2021), until analysis of biomarkers. The clinical and biochemical parameters defining a basal profile of the individuals were evaluated. These included blood count, blood biochemistry, and endocrinological panel. These were considered in order to investigate the function of bone marrow, liver, kidney, and of the endocrine system and to underline the presence of clinically relevant differences among the three study groups. 4.3. Metabolomics Analysis Untargeted extraction, purification, and derivatization of serum samples were performed using MetaboPrep GC kit (Theoreo srl, Montecorvino Pugliano, Italy) as described in Troisi et al. [[137]56,[138]57]. Briefly, 50 µL of serum was placed in an Eppendorf tube and incubated with the alcohol-based extraction solution containing 2-isopropyl malic acid as internal standard. Tubes were vortexed at 1250 rpm for 30 min and subsequently centrifuged for 5 min at 16,000 rpm at 4 °C; 200 µL of supernatants, collected in new tubes were incubated under vortex conditions (1250 rpm for 30 s) with the purification solution and then centrifuged at 16,000 rpm at 4 °C for 5 min. Supernatants (175 µL) were transferred into a glass vial and freeze-dried overnight. Derivatization was conducted in two steps: First, a pyridine solution of methoxylamine was added and solutions were incubated under vortexed conditions (1200 rpm for 90 min); subsequently, 25 µL of an N,O-Bis(trimethylsilyl)trifluoroacetamide (BSTFA)-based derivatizing solution was added. Vials were subjected to an additional 90 min vortexing at 1200 rpm. The derivatized metabolome was transferred to a GC vial with a low-volume insert for the autosampler injection. Vials were centrifuged for 5 min at 16,000 rpm maintaining the temperature below 4 °C, before the injection into GC-MS. Derivatized samples (2 µL) were injected into the GCMS-2010SE (Shimadzu Corp., Kyoto, Japan). The chromatographic separation was performed using a 30 m × 0.25 mm CP-Sil 8 CB fused silica capillary column with 1.00 µm film thickness from Agilent (Agilent, J&W). Helium was used as carrier gas; the initial oven temperature was set at 100 °C and was maintained for 1 min and subsequently raised to 320 °C at 6 °C/min with a further hold time of 2.33 min. The gas flow was set to reach a constant linear speed of 39 cm/s, and the split flow was set to 1:5. The mass spectrometer was operated with electron impact ionization (70 eV) in full scan mode with a range of 35–600 m/z, a scanning speed of 3333 amu/sec, and a solvent cut-time of 5 min. Relevant metabolites were annotated setting to 50 the linear index difference max tolerance and setting to 85% the minimum matching for NIST-14 library search. These were further confirmed using external standards according to Level 1 Metabolomics Standard Initiatives (MSI) annotation [[139]58]. The samples were partitioned into batches, each consisting of 25 samples. Four controls were used to monitor each batch: an instrument blank injection, an injection of a standard mixture, an injection of a pooled sample solution, and a duplicated injection of a randomly chosen sample in the batch. In particular, 2 µL of hexane was employed for the instrument blank, while the standard mixture contained a solution of 15 molecules (organic acids, sugars, amino acids, steroids, and fatty acids) that underwent the same derivatization process used for the samples. The pooled sample consisted of 2 µL each from 50 randomly selected derivatized samples, and the duplicated injection was performed employing a sample chosen at random from the batch. Four conditions had to be met for each batch to be validated: No peaks were generated by the solvent blank; the ratio between the areas underlying the peak of the analytical standards (normalized by the internal standard area) remained within 10% of the expected value; the peak areas (normalized to the internal standard) of the 100 highest peaks of the repeated injection were within 15% of the first injection; and the pooled sample was allocated in the same area of the other pooled samples; that is, <5% of the total area of a model built using all the samples analyzed. For each batch, an alkane mixture (C10-C40, Sigma-Aldrich, Milan, Italy) was injected to evaluate the Kovats’ index [[140]59]. Moreover, for each batch, an injector liner change was performed. Injection liners contained a small amount of GC-MS-grade glass wool. Gas chromatography–mass spectrometry signals not consistently found in at least 80% of the samples were excluded. Very low-intensity metabolite peaks, resulting from low concentration and therefore poor mass spectral quality, were not investigated further. Signals derived from the same metabolites (e.g., sugars that result in multiple derivatization products) were considered as independent features. Chromatographic signals were first deconvoluted and peaks were picked using the GC-MS Solution software v.2.72 (Shimadzu, Kyoto, Japan) and then aligned using the MetaboPredict software (Theoreo srl, Montecorvino Pugliano, Italy), which uses the ptw R package [[141]60] within a proprietary script for gas chromatographic data alignment and missing data management. 4.4. Statistical Analysis Statistical analyses were performed by means of R Studio ver. 1.2.5042. The Shapiro–Wilk test was used to analyze clinical data distribution. Because the continuous variables were normally distributed, p-values were determined using the Student’s t-test, whereas the comparison of percentages was achieved using the χ^2-test. Statistical significance was evaluated using α = 0.05. With regard to bioinformatic analysis, data obtained by the chromatographic investigations were compiled in a table with one sample per row and one variable (metabolite) per column (dataset). Data transformation of the raw chromatographic signal intensities was performed by first taking the logarithm of the peak areas for each metabolite (normalized to that of the internal standard) and then scaling these values via the autoscaling process (mean centered and divided by the standard deviation for that variable). 4.5. Machine Learning Models After samples were separated into training (n = 133) and test (n = 67) sets (66:34 ratio), the first set was employed to train 10 classification models: Naïve Bayes (NB), generalized linear model (GLM), logistic regression (LR), fast large margin (FLM), deep learning (DL), decision tree (DT), random forest (RF), gradient boosted trees (GBT), support vector machine (SVM), and partial least square discriminant analysis (PLS-DA). Fine hyperparameter tuning was implemented to reach the optimal combination in order to maximize the classification accuracy of the models and avoid overfitting (evaluated by means of a cross-validation procedure on the training dataset). Meanwhile, numerous metabolite subsets were used to train the models. As suggested by the software, we used for data analysis, Rapid Miner Studio ver. 9.7.0 (RapidMiner GmbH, Boston, MA, USA), features to be included in the model were screened in accordance with three criteria: (a) correlation (features that too closely, or not at all, mirror the Yes/No diagnosis criterion), (b) stability (features where nearly all values are identical), (c) missing (features with missing values) and then mixed to identify the combination providing the best performance. Moreover, a metabolites subset selection was applied to each classification model as a nested genetic algorithm (GA) using the evolutionary features selection tool included in the RapidMiner software. For each model, the weights of all the metabolites selected to explain the class attribution were also evaluated by using a scree-plot-like graph, and the elbow of the graph was used as a threshold to identify the most relevant metabolites. These metabolites were combined in the Upset representation [[142]61] to evaluate the selected metabolites from different models. As part of the model training procedures, each model was subjected to cross validation. Furthermore, hyperparameter optimizations were also subjected to cross validation. To avoid overfitting of the trained models, the two cross-validations were nested in a single process to both train the models and tune the hyperparameters in an unbiased manner. The 10 individual classification models were also “ensembled” according to a voting scheme that used both the cross-validation accuracy and the confidence (i.e., distance from classification margin) as a vote weight. Ensembling was executed in accordance with Troisi et al. [[143]56,[144]57,[145]62,[146]63,[147]64,[148]65]. In brief, for samples identified as “CRC”, the scores (obtained by multiplying the model cross-validation accuracy and classification confidence) were used as is, whereas the scores of “CTRL” samples were multiplied by −1. Lastly, the sum of the individual classification model scores was used to calculate a CRC-ensemble machine learning (EML) score for each subject. These scores were then compared to the optimized cut-off value to arrive at a final prediction of whether a given metabolome derived from a CTRL or CRC sample. The overall diagnostic performance of the proposed score was investigated using a confusion matrix to summarize the results obtained using samples in the test set. The area under receiver operating characteristic (AUC-ROC) curve, as well as sensitivity, specificity, positive and negative predictive values, positive and negative likelihood ratios, and accuracy were calculated to assess the ability of the CRC-EML score to correctly predict CRC presence. A non-parametric approach (DeLong et al. [[149]66]) was used to compare the AUC-ROC curves. 4.6. Pathway Analysis The pathway analysis was performed combining the results from pathway enrichment analysis with pathway topology analysis according to Xia and Wishart [[150]67]. The analysis was based on the KEGG metabolic pathways as the backend knowledgebase included in the web application based on the MetPa algorithm. By means of the over-representation analysis, we first tested if compounds involved in a particular pathway were enriched compared to random hits. This evaluation was based on the hypergeometric test. Moreover, to take into consideration the structural information of the pathways, a pathway topology analysis was performed using the betweenness centrality as a measure of node centrality to estimate node importance. Because several pathways were tested at the same time, the statistical p values from enrichment analysis were adjusted for multiple testing with the false discovery rate (FDR) method. The Impact is the pathway impact value calculated from pathway topology analysis. 5. Conclusions The present pilot study allowed the identification of a complex network of serum metabolites significantly associated with the presence of colorectal cancer. The metabolomic signature appears to be strongly correlated with the Warburg effect and glutamine addiction that are widely reported in several cancers. However, several aspects of this signature seem more specific to CRC; namely, oleamide and SCFA imbalances play a large role in determining the high specificity of this signature. Other studies are necessary to validate these preliminary results and to evaluate the specifics of the signature in the differentiation of CRC from other cancer types. Supplementary Materials The following are available online at [151]https://www.mdpi.com/article/10.3390/metabo12020110/s1. Figure S1: Deconvoluted chromatograms of a control and a colorectal cancer serum sample, Figure S2: Box and Whisker plot of the raw and transformed data related to the relevant metabolites, Table S1: Metabolite subsets selected by genetic algorithm and used to train the final models, Table S2: Classification confidence for each model, and resulting final EML score, for each subject in the test set, Table S3: Metabolites selected as relevant according to VIP-score criterion (>2.0), volcano plot (2 < FC < 0.5 and p-value < 0.05), and UpSet plot, Table S4: Metabolite-set Enrichment Pathways analysis results derived from the metabolites selected by VIP-score and Volcano plot in CTRL vs. CRC comparison. [152]Click here for additional data file.^ (5.2MB, zip) Author Contributions Conceptualization, J.T., B.P., G.S. and P.C.; methodology, J.T., S.M.R., S.J.K.S. and G.S.; formal analysis, J.T., M.T. and M.L.; investigation, B.P., M.T., F.T., C.B., J.T. and M.L.; resources, P.C. and G.S.; data curation, J.T., B.P., S.M.R. and S.J.K.S.; writing—original draft preparation, J.T., B.P., M.T. and M.L.; writing—review and editing, S.M.R., S.J.K.S., P.A.A., P.D. and P.C.; visualization, J.T., M.T. and M.L.; supervision, P.A.A., P.D. and P.C.; project administration, P.C. and G.S.; funding acquisition, P.C. and G.S. All authors have read and agreed to the published version of the manuscript. Funding This project was funded by POR Campania FESR 2014/2020—Project “Campania Oncoterapie” grant n. B61G18000470007. Institutional Review Board Statement The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the ethics committee “Campania Sud” (protocol number 144, 6th December 2018). Informed Consent Statement Informed consent was obtained from all subjects involved in the study. Data Availability Statement The data that support the findings of this study are available from the corresponding author, upon reasonable request. The data are not publicly available due to privacy. Conflicts of Interest J.T. and G.S. (Giovanni Scala) are employed in companies (Theoreo Srl and Hosmotic Srl, respectively) dealing with the development and market of diagnostic tests based on the metabolomics. All the other authors have no conflict of interest. Footnotes Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. References