Abstract

Parkinson’s disease (PD) is a progressive neurodegenerative disorder, which is characterised by degeneration of distinct neuronal populations, including dopaminergic neurons of the substantia nigra. Here, we use a metabolomics profiling approach to identify changes to lipids in PD observed in sebum, a non-invasively available biofluid. We used liquid chromatography-mass spectrometry (LC-MS) to analyse 274 samples from participants (80 drug naïve PD, 138 medicated PD and 56 well matched control subjects) and detected metabolites that could predict PD phenotype. Pathway enrichment analysis shows alterations in lipid metabolism related to the carnitine shuttle, sphingolipid metabolism, arachidonic acid metabolism and fatty acid biosynthesis. This study shows sebum can be used to identify potential biomarkers for PD.

Subject terms: Parkinson's disease, Medical and clinical diagnostics

Studies of metabolites in neurodegeneration have not yet used sebum as a source fluid. Here the authors demonstrate the potential of metabolomics of sebum samples from individuals with Parkinson’s disease and controls.

Introduction

Parkinson’s disease (PD) is a neurodegenerative disorder affecting over 6 million people globally, second only in prevalence to Alzheimer’s disease^1. The principal pathological hallmark of PD is the formation of aggregated α-synuclein deposits in the brainstem, which are the major components of Lewy bodies^2,3. The disease is also characterised by the loss of dopaminergic neurons in the substantia nigra pars compacta, producing a decline in striatal dopamine levels and subsequent loss of motor function^4. There is no conclusive preclinical diagnostic test for PD. Clinical diagnosis is achieved primarily through a physician’s observation of the decline in motor functions^5,6.
These clinical manifestations normally present as a combination of one or more of the four cardinal signs of PD, namely bradykinesia, resting tremor, rigidity, and postural instability^7,8. A formal diagnosis often occurs following the depletion of 60–80% of the brain’s dopaminergic neurons^2. Non-motor symptoms are thought to precede motor symptoms by up to 20 years; these include mood disorders, sleep disorders, and olfactory deficits^9,10. Seborrhoeic dermatitis is a common non-motor symptom reported in up to 60% of people with Parkinson’s (PwP)^11,12. This condition presents as “oily skin” that correlates with an excess of sebum, produced and secreted by the sebaceous glands in the dermis of the skin. Sebum is a complex lipid-rich substance that is predominantly composed of triglycerides, fatty acids, wax esters, squalene, and cholesterol^13. It serves as a protective agent to the skin, providing waterproofing, thermoregulation, and photoprotection, alongside suggested antimicrobial and antioxidant activities^14,15. Studies of sebum are commonplace in dermatological conditions such as acne; however, sebum as a biofluid has rarely been used in disease diagnostics. In our recent study, we reported the presence and differential regulation of volatile organic compounds in the sebum of PwP^16.

The analysis of complex mixtures of metabolites present in a lipid-rich biofluid such as sebum calls for a sensitive and robust analytical platform. Mass spectrometry (MS) is a leading analytical technique for clinical metabolomics analyses and, when hyphenated to chromatography, benefits from increased resolution and sensitivity^17,18. Liquid chromatography-mass spectrometry (LC-MS) facilitates the qualitative and quantitative analysis of the wide range of molecular species found within complex mixtures such as sebum.
LC-MS has been used to study a number of biofluids in relation to PD prognosis and diagnosis, such as blood, saliva, and cerebrospinal fluid (CSF)^19–25. Alterations in the expression of metabolites and the downstream effects on their corresponding metabolic pathways have also been extensively studied for PD diagnostics within the blood and CSF metabolome, including catecholamines, dopamine metabolites, amino acids, and urate, alongside fatty acid metabolism, energy metabolism, and kynurenine metabolism^19,22,26–29. The use of sebum as a diagnostic tool for PD provides an exciting prospect from which a non-invasive and inexpensive test could be developed to detect the onset of the disease.

In this study, we have used LC-MS to separate and detect lipid-like species and small molecules present in sebum. We have used data-driven approaches, with robust statistical validation, to discover biomarkers of Parkinson’s disease present in sebum. This will inform the development of future PD biomarkers alongside the understanding of metabolic pathways altered in PD. Additionally, we investigate whether variations in the measured sebum metabolome between early drug naïve PD and later medicated PD were observed, suggesting changes in the metabolic pathways during disease progression.

Results

Analysis of patient metadata

The study population comprised 274 participants, which included 138 medicated PD, 80 drug naïve PD and 56 control subjects. An overview of important patient demographics is given in Table 1. The results of significance tests between cohort group metadata are reported in Supplementary Table 1. A two-tailed Mann–Whitney U-test showed that age is significantly different (p < 0.05) between control and PD cohorts (both drug naïve and medicated); however, BMI was not significantly different between these groups.
There were more male participants in both PD cohorts (M/F > 1.5) compared to a higher proportion of female participants within the control group (M/F < 1). This was perhaps expected, as the higher incidence and prevalence rates of PD in the male population are recognised and studies show a 1.4–1.5 fold increase in the number of male PD cases, although the reason for this is not yet understood^1,30. A similar comparison of the number of participants who smoke (yes/no) or consume alcohol (yes/no) showed no significant differences between drug naïve PD and control cohorts, with p-values of 0.837 and 0.192, respectively. However, the alcohol intake ratio (Yes:No) was about 2.5 times higher in the control group (4.60) than in medicated PD (1.81). There were no smokers in the medicated PD cohort and 7% within the control group, which was deemed significant by a Fisher’s exact test (p-value 0.006). The discovery of significant differences in these metadata parameters between PD and control cohorts led us to test their impact on classification accuracy, as described in the following results sections.

Table 1. Demographics of participants included in classification modelling and statistical analysis.

Parameters | Independent control | Drug naïve PD | Medicated PD
n | 56 | 80 | 138
Age (years)^a | 54.3 ± 14.4 | 69.8 ± 9.4 | 70.3 ± 8.2
BMI (kg/m^2)^a | 26.1 ± 4.4 | 25.8 ± 4.9 | 26.3 ± 5.4
Gender (Male:Female)^b | 0.87 | 1.76 | 1.65
Alcohol intake (Yes:No)^c | 4.60 | 1.76 | 1.81
Smoker (Yes:No)^c | 0.08 | 0.07 | 0.00

BMI body mass index. ^aBMI and age values are expressed as mean ± standard deviation. ^bExpressed as a ratio (Male:Female). ^cExpressed as a ratio (Yes:No).

Data driven prediction of PD

In order to assess variation between the measured metabolome by phenotype, partial least squares-discriminant analysis (PLS-DA) was used. Two PLS-DA models were constructed, each using a two-class input: (1) drug naïve PD vs. control and (2) medicated PD vs. control.
It is well known that unbalanced numbers within classification groups may bias prediction accuracy towards the majority class; to overcome this, the Synthetic Minority Over-sampling Technique (SMOTE) was applied^31. PLS-DA models were built and validated using bootstrap resampling with replacement (n = 250). Figure 1 reports the classification sensitivity and specificity rates of each PLS-DA model alongside the observed and null distributions (from permutation testing).

Fig. 1. PLS-DA classification models for (a, b) drug naïve PD vs. control and (c, d) medicated PD vs. control.
a, c Classification rates for each model including true positive (TP, sensitivity), true negative (TN, specificity), false positive (FP), and false negative (FN). b, d Null distribution (grey bars) and observed distribution (blue bars) for each PLS-DA bootstrap model. The correct classification rates (CCR) were calculated from the test sets only (n = 250 from the bootstraps).

To evaluate whether gender influenced classification accuracy, two PLS-DA models were built for each gender separately, for drug naïve PD vs. control and medicated PD vs. control. If the compounds accounting for variance between disease and control were gender specific, we would expect consistently and significantly higher sensitivity and specificity values for one gender, which we did not find (see Supplementary Table 3). Combined gender models (Fig. 1) were used for subsequent analysis owing to the greater power of statistical models with larger input groups.

PLS-DA was also used to determine whether geographical location or variances between clinician sampling could impact classification, using an independent control cohort. Samples (n = 40) were chosen from four recruitment clinics, located in the north (n = 2) and south (n = 2) of the UK.
Confounding factors were controlled so that age and BMI were not significantly different between groups (one-way ANOVA p-value > 0.05) and the male-to-female ratio was identical. The average CCR for this model was 21%, which indicates that our data are not biased by recruitment site or by the clinician who collected the samples.

Selection of significant features which classify PD

To define the features responsible for the measured variance in PLS-DA prediction models, variable importance in projection (VIP) scores were calculated. Receiver operating characteristic (ROC) analysis was performed on variables with VIP score > 1 (Fig. 2). The number of variables that met this threshold was 15 in the drug naïve PD analysis and 26 in the medicated PD analysis. The area under the curve (AUC) and 95% confidence intervals (CI) for each individual variable obtained from univariate ROC curve analysis are reported in Supplementary Fig. 2. A limitation of ROC analysis of individual features is the failure to consider relationships between the features that account for the observed variance. The outcome of a multivariate analysis is reduced to a univariate one, in which each individual feature is treated as the sole biomarker accounting for 100% of the variation between the classes. Therefore, in combination with assessing individual metabolite ROC curves, a multivariate ROC analysis approach was also implemented based on the PLS-DA method (Fig. 2a, b).

Fig. 2. ROC curve analyses based on a multivariate PLS-DA algorithm with a two latent variable input. AUC and 95% confidence intervals (CI) were calculated by Monte Carlo cross validation (MCCV) using balanced subsampling with multiple repeats.
a ROC curve analysis (n = 15 independent metabolite features) in drug naïve PD vs. control PLS-DA with VIP > 1. b ROC curve analysis (n = 26 independent metabolite features) in medicated PD vs. control PLS-DA with VIP > 1.
c A bar chart displaying the comparison of AUCs for drug naïve PD (purple) and medicated PD (blue) using common VIPs between models (n = 10 independent metabolite features). Data are presented as mean AUC value with error bars representing the minimum and maximum values of the 95% CI range.

We note that PLS-DA could not accurately differentiate medicated PD and drug naïve PD. Sensitivity and specificity values of 59.7 and 50.3% were returned for PLS-DA models in which medicated PD was the “positive” predicting class (data shown in Supplementary Fig. 1). Figures 2a and 2b report ROC curves for the drug naïve PD and medicated PD models, respectively, each of which uses all compounds with VIP > 1 for the respective model. VIP score examination of drug naïve PD vs. control and medicated PD vs. control models confirms that ten variables (VIP > 1) are common between the two PD groups. To investigate biomarkers associated with the diagnosis of PD rather than disease stage stratification, and to avoid a possible effect of medication, the common metabolites (VIP > 1) between drug naïve and medicated PD analyses were evaluated further. Figure 2c presents a multivariate ROC analysis for each common variable; this analysis reports increased sensitivity and specificity rates as a function of the number of variables included in each model, as demonstrated by higher AUC values. In addition, the 95% confidence interval range decreases as the number of variables in each model increases.

Pearson correlation coefficients were calculated for each significant variable (VIP > 1) to investigate the association between alcohol consumption and the significant variables. None of the significant compounds is associated with an increase in alcohol consumption (Supplementary Fig. 3). To exclude the possible contribution of age to disease classification, age was included as a variable in further PLS-DA models, giving it equal weighting to any other measured variable.
If age had any significance, it had an equal chance to contribute to the model and would be ranked as high as other measured variables. The difference in CCR between models with and without the inclusion of age was negligible (<0.5%), and VIP scores for the age variable were 1.17 × 10^−11 and 2.11 × 10^−11 for drug naïve and medicated PD models, respectively. To put this in perspective, the age variable was ranked at 6492 and 6498 out of a possible 6502 ranks, which strongly indicates that age is not a contributing factor for the separation presented.

Annotation of metabolites associated with PD diagnosis

Metabolomics Standards Initiative (MSI) guidelines^32 and International Lipid Classification and Nomenclature Committee (ILCNC)^33 guidelines were adhered to for the annotation of common significant metabolites (n = 10) (Table 2). Table 2A reports putative annotations based upon accurate mass and tandem MS fragmentation data for five of the significant compounds (MSI level 2). Table 2B reports the database matches based upon accurate mass. Although there are no fragmentation data to support these matches, the only possible hits from two databases (Lipid Maps and METLIN) within a low mass tolerance (10 ppm) correspond to a single chemical formula in three of the five compounds; the remaining two compounds had no matches. Ceramides, triacylglycerol, glycosphingolipid, and fatty acyl lipid classes were amongst those putatively annotated in both common and non-common VIP compounds. Putative annotations and database matches listed in Table 2A, B are expounded upon in Tables S3A and S3B, respectively. Notably, metabolites belonging to ceramide, triacylglycerol, and fatty acyl classes were downregulated whereas glycosphingolipid and fatty acyl metabolites were upregulated in PD. Box plots comparing control, drug naïve PD and medicated PD cohorts for these compounds are displayed in Fig. 3.
Further details of putative compound annotations for all metabolites with VIP score > 1 in drug naïve PD and medicated PD analyses are found in Supplementary Tables 4A, B and 5A, B, respectively.

Table 2. Putative annotations of the ten VIP compounds common between drug naïve PD and medicated PD analyses (VIP > 1).

(A) Putative annotations have been assigned using accurate mass and MS/MS fragmentation matched against Lipid Maps database (LMSD) and Lipid Blast.

Feature | Putative annotation (accurate mass & MS/MS fragmentation) | Expression drug naïve PD (fold change) | Expression medicated PD (fold change)
m/z 825.6939 | TG(50:5) | ↓ (0.77) | ↓ (0.64)
m/z 764.5681 | HexCer(36:2) | ↑ (1.15) | ↑ (1.10)
m/z 666.6370 | Cer(42:0) | ↓ (0.60) | ↓ (0.47)
m/z 638.6067 | Cer(40:0) | ↓ (0.61) | ↓ (0.47)
m/z 610.5763 | Cer(38:1) | ↓ (0.63) | ↓ (0.48)

(B) Putative annotations have been assigned using accurate mass measurements matched against Lipid Maps (LMSD and COMP_DB) and METLIN databases.

Measured feature | Database match(es) (accurate mass) | Formula | Expression drug naïve PD (fold change) | Expression medicated PD (fold change)
m/z 414.4308 | FA(26:0); Methyl pentacosanoate | C26H52O2 | ↑ (1.23) | ↓ (0.84)
m/z 358.3677 | FA(22:0)* | C22H44O2 | ↓ (0.81) | ↓ (0.78)
m/z 194.1396 | FA(8:0); l-Cladinose; Metaldehyde^† | C8H16O4 | ↑ (1.74) | ↑ (1.78)
m/z 550.6277 | – | – | ↑ (1.33) | ↑ (1.10)
m/z 368.4242 | – | – | ↓ (0.15) | ↓ (0.14)

TG triacylglyceride, HexCer hexosylceramide, Cer ceramide, FA fatty acyl. ^†Pesticide.

Fig. 3. Box whisker plots for each of the eight putatively annotated compounds for control (Ctrl, yellow) (n = 56 biologically independent samples), drug naïve PD (DN, purple) (n = 80 biologically independent samples) and medicated PD (Meds, blue) cohorts (n = 138 biologically independent samples).
Box plots display mean (square), median (line within box) and quartiles (box limits), range (whiskers) and outliers (diamond).
The y-axis of each plot corresponds to the natural log of intensity values and the measured m/z value for each compound is labelled above the plot. These species correspond to the data presented in Table 2A, B.

Sebum metabolome measurements: context to current understanding of PD

Pathway enrichment analysis was performed to explore changes in metabolic pathways with respect to disease onset and progression. A prerequisite for traditional pathway analysis methods is the annotation of all analytically detected features via spectral and compound database matching. This is a major bottleneck in untargeted metabolomics workflows and, due to the large number of features detected in this study, Mummichog analysis was employed^34. The analysis was performed independently for the two PD cohorts using a Student’s t-test (p-value < 0.05) between control subjects vs. (1) drug naïve PD and (2) medicated PD. There were 1378 and 504 features for drug naïve PD and medicated PD, respectively, which were significant between disease and control groups. Further details of significantly enriched pathways associated with PD can be found in Supplementary Tables 6 and 7 for drug naïve PD and medicated PD, respectively.

Mummichog analysis reveals the carnitine shuttle to be the most important pathway linked to drug naïve PD patients (p = 0.002) (Fig. 4a). This pathway increases in significance (p = 5.09 × 10^−5) and enrichment within the medicated PD cohort, as can be seen in Fig. 4c. The carnitine shuttle is highly involved in energy metabolism through the facilitation of long chain fatty acid (LCFA) β-oxidation via assisted transportation into the mitochondria by acyl-carnitine substrates^35. Decreased long-chain acyl-carnitines, associated with insufficient β-oxidation, have previously been reported as potential diagnostic markers for PD^28.
The dysregulation of carnitine shuttle and vitamin E pathways has also been observed in frail elderly cohorts (between 56 and 84 years old) compared to resilient age-matched individuals^36. The mapped m/z features correspond to a series of acyl-carnitine conjugates with fatty acid chains of differing length. As the carnitine shuttle is a mediation pathway for fatty acid oxidation, it is reasonable that the perturbations of fatty acid biosynthesis and fatty acid metabolism pathways could be linked, which is further supported by the putative assignment of associated compound classes to VIP compounds.

Fig. 4. Results of mummichog analysis for significant pathways (p < 0.05).
Bar charts report pathways for (a) drug naïve PD vs. control and (b) medicated PD vs. control. c A bubble chart displaying the common significant pathways between drug naïve PD and medicated PD compared against controls; the bubble size refers to the enrichment factor of the pathway and the colour represents the natural log of the pathway p-value.

Additional compounds putatively annotated from PLS-DA models (VIP > 1) belong to the sphingolipid class of compounds (Table 2). The sphingolipid metabolism pathway was enriched in both drug naïve and medicated PD. Sphingolipids are a major lipid class that is abundant in lipid-rich structures of the body (such as skin) and has central roles in cell signalling and regulation. Interestingly, disruption to sphingolipid metabolism has been reported as a downstream effect of increased α-synuclein levels^37,38 and α-synuclein is disrupted in PD skin^39. Perturbations within the sphingolipid pathway have previously been linked to defects in both lysosomal and mitochondrial metabolism, which are often implicated in the pathogenesis of neurodegenerative diseases such as PD and Gaucher’s disease^38,40–43.
Interestingly, the link between mitochondrial dysregulation and PD has been widely established in skin fibroblasts but never before in sebum^44,45. Recent studies have found dysregulation of the levels of ceramides, which are common structural units of all sphingolipids, in numerous diseases including PD, Alzheimer’s disease and depression, although the general consensus from studies of sphingolipids in PD is an increase in ceramide levels^46–48. Due to their bioactive role within cell membranes, sphingolipids are strongly linked to sterol metabolism pathways and have an established role in the modulation of steroidogenesis. There is a direct link between ceramides and the biosynthesis of cholesterol, which is in turn the substrate for steroid hormone biosynthesis, the most significantly altered pathway shown for medicated PD patients^49,50.

In conclusion, an untargeted LC-MS analysis of sebum obtained non-invasively from a simple skin swab from people with Parkinson’s reveals a difference in the composition of sebum compared to control subjects. The overlap of ten metabolites from separate statistical analyses for drug naïve PD and medicated PD strengthens the evidence that these compounds are associated with PD and not with dopaminergic medication. This is further supported by the identification of common pathways between the two PD classes that are significantly enriched. Insufficient clinical data are available for these patients to hypothesise on the ability of a sebum analysis to help stratify disease progression, although this should be included in further studies. Future work will also focus on targeting the putatively identified lipid classes, with the inclusion of ion mobility to enhance separation and increase the confidence in compound identification.
Methods

Sample participants

The participants included in this study were part of a nationwide recruitment process taking place at 25 different NHS clinics, in addition to subjects (n = 4) that participated in a clinical trial in the Netherlands^51. A total of 274 participants were recruited from three subject groups: control (n = 56), drug naïve PD (n = 80), and medicated PD (n = 138). The participants included in this study were selected at random from these sites. Ethical approval for this project (IRAS project ID 191917) was obtained from the NHS Health Research Authority (REC reference: 15/SW/0354). Informed consent was received from all participants prior to their enrolment in the study.

Chemicals and materials

The chemicals and materials utilised in this study were: gauze swabs (Arco, UK), sample bags (GE Healthcare Whatman^TM, UK), 15 mL and 50 mL centrifuge tubes (Greiner Bio-One, UK), microcentrifuge tubes 2 mL (Eppendorf, UK), Minisart® 0.2 µm syringe filter (Sartorius, UK), Optima® LC-MS grade solvents 2-propanol, acetonitrile, methanol, and formic acid (Fisher Scientific), HPLC grade HiPerSolv CHROMANORM® ethanol absolute (99.8%), CHROMASOLV^TM LC-MS grade water (Honeywell) and Leucine Enkephalin (Waters, Wilmslow, UK).

Sample collection

Using a standard sampling procedure, each participant was swabbed by a clinician on the upper back with cotton-based medical gauze (7.5 cm × 7.5 cm) to collect sebum present on the skin. The sampled gauze swabs were sealed in background-inert plastic bags and transported to the central facility at the University of Manchester, where they were stored at −80 °C until the end of recruitment.

Sample extraction

Gauze swabs were removed from −80 °C storage and allowed to equilibrate to room temperature. A solvent extraction method was used to prepare the samples for LC-MS analysis. Each gauze swab was transferred to an inert glass bottle.
Methanol (9 mL) was added to each glass bottle, followed by vortex-mixing (10 s) and sonication (30 min) at ambient temperature, to extract sebum metabolites from the gauze. The extracted metabolite-rich methanol was decanted from the gauze swab bottle and this solution was passed through a filter (0.2 µm). A recovery of approximately 7 mL per sample was achieved, which was aliquoted into three 2 mL fractions and one 1 mL fraction. Each 2 mL fraction was vacuum concentrated (Eppendorf) at ambient temperature for 12 h to remove methanol, which resulted in three identical sebum extracts per patient sample. These dried pellets were stored at −80 °C until required for analysis. A portion (100 μL) of the remaining 1 mL liquid fraction of each sample was used to create a biological pooled quality control (QC) sample. The mixture was vacuum centrifuged (Eppendorf) for 12 h at ambient temperature and the dried extract stored at −80 °C until analysis.

Sample reconstitution

Prior to LC-MS analyses, dried sebum extracts were equilibrated to ambient temperature before reconstitution. Extracts were resuspended in 200 µL of MeOH:EtOH (v/v, 50:50). Samples were vortex-mixed (20 s), sonicated (5 min), and centrifuged (Eppendorf) at 12,000 × g for 10 min. The recovered supernatant (160 µL) was then submitted for LC-MS analysis.

LC-MS analysis

LC-MS analysis was performed on an Ultimate 3000 UHPLC (Thermo Scientific) coupled to a Synapt G2-Si QToF mass spectrometer (Waters). LC-MS data were acquired using MassLynx 4.2 (Waters). An ACQUITY UPLC BEH C18 column (1.7 µm, 2.1 mm × 100 mm) heated at 55 °C was utilised for chromatographic separation. The mobile phases were as follows: mobile phase A was acetonitrile:water (v/v, 60:40) with 0.1% formic acid; mobile phase B was isopropanol:acetonitrile (v/v, 90:10) with 0.1% formic acid. An injection volume of 5 µL was used.
The flow rate was set at 0.6 mL/min and the gradient elution began at 40% B and increased to 50% B over 30 s, then to 69% B at 1.8 min, with a final ramp to 88% B at 6 min. The gradient was reduced back to 40% B and held for 1 min to equilibrate the column. Full MS spectra were obtained for the mass range m/z 50–2000, whilst infusing Leucine-Enkephalin (m/z 556.2766) as an online mass calibrant to retain mass accuracy. MS settings were as follows: the Synapt G2-Si MS was operated in Q-ToF mode; capillary voltage was set to 3.0 kV, sampling cone voltage was set to 40 V, source temperature was kept at 120 °C, desolvation temperature was set to 550 °C and desolvation gas flow was 900 L/h. MS^E acquisitions used identical LC and MS conditions, with an added high energy ramp from 19 to 45 V.

Sample sequence and quality control

Pooled QC samples were used to check analytical reproducibility both during analysis and during the data processing stages^52. QC samples were injected at the beginning of each analytical batch (n = 3), at every 5th injection, and at the end of each analytical batch (n = 2). Samples from 274 participants were stratified and randomised into 15 equal analytical batches. Each batch was reconstituted on the day of analysis to maintain sample integrity. LC-MS^E data were acquired for five sequential injections of a single pooled QC sample using an LC-MS^E method in which all sample preparation/handling, LC and MS conditions were identical to those for patient samples, except for an added high energy MS ramp.

Data pre-processing and deconvolution

LC-MS raw data were deconvolved using Progenesis QI (Waters, Wilmslow, UK). Peak picking, alignment, and area normalisation were carried out with reference to a pooled QC. The resulting peak table had 8765 metabolite features. Features that were absent in more than 10% of pooled QC injections throughout analysis were removed.
From the remaining features, those with more than 20% relative standard deviation (RSD) in peak intensity across pooled QC injections were also removed. The remaining peak set of 6202 metabolite features comprised robust features detected reproducibly throughout analysis within QC samples. The data were mean centred and auto-scaled, and missing values were replaced with cubic spline interpolation in MATLAB 2019a (MathWorks) prior to statistical analysis.

LC-MS^E raw data were deconvolved using Progenesis QI (Waters, Wilmslow, UK). Peak picking, alignment, and area normalisation were carried out using one of the QC data files as the reference. Significant features extracted from raw data were aligned to significant features in clinical samples, using a RT window of ±15 s and a mass tolerance of ±10 ppm as filters. Features were annotated using accurate mass match and tandem MS data with Lipid Maps, Lipid Blast, and METLIN. Mass tolerances of 10 and 30 ppm were applied for precursor and fragment ions, respectively. Compounds with a fragmentation score <20 were not annotated. Progenesis QI score, fragmentation score, and isotope similarity are reported for all annotations based on a combination of accurate mass and fragmentation data (see Supplementary Tables 4–6).

Statistical analysis

PLS-DA was performed for classification and prediction of data; resampling with replacement (bootstrapping) was used for model validation, where the correct classification rates (CCRs) from the Y-variable were computed for the test data sets only (n = 250). An in-house script was used in MATLAB (2019a) to perform PLS-DA. Univariate ROC analysis was performed in Origin (Version 2017, OriginLab Corporation, Northampton, MA, USA) and multivariate ROC curve-based exploratory analysis was executed using MetaboAnalyst Biomarker Analysis (Version 4.0), in which the data matrix was auto-scaled and PLS-DA was used as both the classification method and the feature ranking method, with a two latent variable input.
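The two pooled-QC filters described under data pre-processing above (removing features absent in more than 10% of QC injections, then features with an RSD above 20%) can be sketched as follows. This is only a minimal Python illustration under stated assumptions, not the authors' pipeline, which used Progenesis QI and MATLAB: the function name `qc_filter`, the dictionary layout, the use of `None` for a non-detected feature and the example intensities are all hypothetical.

```python
from statistics import mean, stdev


def qc_filter(qc_intensities, max_missing=0.10, max_rsd=0.20):
    """Return the names of features that pass both pooled-QC filters.

    qc_intensities maps a feature name to its peak intensities across
    pooled QC injections; None marks an injection in which the feature
    was not detected (hypothetical layout, chosen for illustration).
    """
    kept = []
    for feature, values in qc_intensities.items():
        detected = [v for v in values if v is not None]
        missing_fraction = (len(values) - len(detected)) / len(values)
        if missing_fraction > max_missing:
            continue  # absent in more than 10% of QC injections
        if len(detected) < 2 or mean(detected) <= 0:
            continue  # RSD is undefined, so the feature cannot be assessed
        rsd = stdev(detected) / mean(detected)  # relative standard deviation
        if rsd > max_rsd:
            continue  # more than 20% RSD across QC injections
        kept.append(feature)
    return kept


# Made-up intensities for four features over ten QC injections.
example = {
    "stable": [100, 102, 98, 101, 99, 100, 103, 97, 100, 100],
    "noisy": [100, 150, 60, 130, 70, 100, 160, 50, 100, 80],
    "sparse": [100, None, None, 101, 99, 100, 100, 100, 100, 100],
    "one_gap": [100, None, 101, 99, 100, 100, 100, 100, 100, 100],
}
print(qc_filter(example))
```

In this sketch the missing-value filter is applied before the RSD filter, following the order given in the text, and the RSD is computed on detected values only; whether the original software treats non-detections the same way is an assumption.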
Pathway analysis

Mummichog analysis was performed using MetaboAnalyst (Version 4.0). During mummichog analysis a list of all m/z features (L_ref) and a refined list of significant m/z features (L_sig) were generated using Student’s t-test as the discriminatory test (p-value < 0.05). Significant m/z features were mapped onto a combination of metabolic models: Kyoto Encyclopaedia of Genes and Genomes (KEGG), Biochemical Genetic and Genomic knowledgebase (BiGG) and the Edinburgh Model. Feature hits on known metabolite networks were tested against a null distribution produced from permutations of random m/z features from L_ref to yield significance values of metabolites enriched within any given network^34.

Reporting summary

Further information on experimental design is available in the Nature Research Reporting Summary linked to this paper.

Supplementary information

Supplementary Information (1MB, pdf); Peer Review File (239.2KB, pdf); Reporting Summary (229KB, pdf)

Acknowledgements