Abstract

Background
Metabolic syndrome (MetS), a cluster of factors associated with risks of developing cardiovascular diseases, is a public health concern because of its growing prevalence. Considering the combination of concomitant components, their development and severity, MetS phenotypes are largely heterogeneous, inducing disparity in diagnosis.

Methods
A case/control study was designed within the NuAge longitudinal cohort on aging. From a 3-year follow-up of 123 stable individuals, we present a deep phenotyping approach based on a multiplatform metabolomics and lipidomics untargeted strategy to better characterize metabolic perturbations in MetS and define a comprehensive MetS signature stable over time in older men.

Findings
We characterize significant changes associated with MetS, involving modulations of 476 metabolites and lipids, and representing 16% of the detected serum metabolome/lipidome. These results revealed a systemic alteration of metabolism, involving various metabolic pathways (urea cycle, amino-acid, sphingo- and glycerophospholipid, and sugar metabolisms…) that are not only intrinsically interrelated, but also reflect environmental factors (nutrition, microbiota, physical activity…).

Interpretation
These findings allowed the identification of a comprehensive MetS signature, reduced to 26 metabolites for future translation into clinical applications for better diagnosing MetS.

Keywords: Metabolic syndrome, Metabolomics, Deep phenotyping, Lipidomics, Metabolic signature

Research in context

Evidence before this study
Prior association studies linking metabolites and lipids with MetS (i) have been limited in terms of molecular species profiled, (ii) failed to consider the interactions between different metabolisms as well as with extrinsic factors, and (iii) were very rarely based on longitudinal studies.

Added value of this study
Our deep phenotyping approach, along with a 3-year follow-up design, provides robust and integrated insights into MetS mechanisms and proposes new candidate biomarkers within an optimized molecular signature refined statistically, analytically and biologically.

Implications of all the available evidence
These findings highlight the value of a comprehensive molecular signature as a marker of MetS, which should be validated for future translation into clinical applications for better diagnosing MetS.

1. Introduction

The metabolic syndrome (MetS), defined as a cluster of risk factors for cardiovascular disease (CVD), has been recognized for decades with a rising prevalence worldwide [1]. The main culprits of this rise are the aging of the population and the complex interactions between lifestyle factors such as unhealthy dietary habits and sedentary behavior, leading to overweight and obesity [2,3,4]. Because several clinical definitions co-exist [5] among health organizations (e.g. National Cholesterol Education Program (NCEP), International Diabetes Federation (IDF), World Health Organization (WHO)), the true prevalence of MetS is difficult to establish reliably. However, MetS comprises elevated blood pressure, dyslipidemia, including hypertriglyceridemia and reduced blood levels of high-density lipoprotein cholesterol (HDL-C), fasting hyperglycemia, and central adiposity. It is now accepted that MetS represents a global public health concern, with a prevalence reaching one third of US adults and over 45% by the age of 60 [1,6]. There is also a consensus regarding the presence of multiple metabolic risk factors for CVD and type 2 diabetes (T2D) [7,8].
Moreover, considering the combination of concomitant components, and their development and severity profile, patients identified with MetS are largely heterogeneous, inducing a disparity in the diagnosis and therapeutic approach [9]. A better characterization of pathophysiological alterations associated with MetS could therefore contribute to improved diagnosis and better delineation of the syndrome.

In this context, metabolomics and lipidomics have emerged over the last decade as powerful tools for the analysis of phenotypes, providing key insights into modified metabolic pathways and a better understanding of pathophysiological processes [10,11]. Indeed, metabolic profiles provide an integrated view of metabolism through the sensitive detection of molecular changes over time, resulting from the interaction between intrinsic and extrinsic factors. Metabolites, used as single targets or in combination within a comprehensive signature, are thus promising biomarkers to reveal metabolic dysfunctions. Metabolomics has therefore been widely applied for metabolic disease phenotyping and candidate biomarker discovery as well as pathophysiological exploration of underlying mechanisms [12,13]. However, although studies on T2D have been among the main drivers of biomarker research with these global approaches in the field of chronic metabolic diseases, few studies have focused on MetS, and these often relied on targeted approaches with a restricted number of detected metabolites [14]. Consequently, an integrated view of metabolic derangements is lacking, and the capacity to compare studies is limited [15].

In the present study, a 3-year follow-up design of stable subjects within an observational longitudinal cohort, as well as a deep phenotyping approach based on a multiplatform strategy involving untargeted metabolomics and lipidomics methods, were set up with the objective of better characterizing metabolic perturbations in MetS and defining a comprehensive MetS signature stable over time in older men.

2. Materials

2.1. The NuAge cohort and subject selection
The present study was designed within the 5-year observational Quebec Longitudinal Study on Nutrition and Successful Aging (NuAge). The cohort comprised 1793 men and women in good general health, selected from three age groups (68–72, 73–77, 78–82) at recruitment. French or English-speaking community-dwelling participants committed to giving fasting blood samples, undergoing several direct measurements annually, and answering questionnaires related to food and health biannually. The NuAge database comprises extensive qualitative and quantitative data related to anthropometry/body composition, nutrition/dietary intakes, numerous markers of physical, clinical and cognitive status, physical activity, functional autonomy and social functioning. The measures, questionnaires, and blood sample testing, processing and storage have been described in Gaudreau et al. [16].

2.1.1. Ethical approval
All procedures performed in the study involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Informed consent was obtained from all individual participants included in the NuAge study. The NuAge Study has been approved by the Research Ethics Board (REB) of both the Geriatric University Institutes of Montreal and Sherbrooke Research Centers.
The management framework of the current NuAge Database and Biobank has been approved by the REB of the CIUSSS-de-l'Estrie-CHUS (protocol #2019-2832).

2.1.2. Subject selection
A case/control study on MetS was designed within the NuAge cohort, with serum samples collected at two time points (recruitment 2003–2005 (T1) and 3 years later (T4)), with the objective of identifying a metabolic signature of MetS that is stable over time, using a multiplatform lipidomics/metabolomics approach. In this context, an optimized subject selection strategy was developed. Briefly, the selection was based on the presence and number of MetS criteria, and their stability over the three years. It was performed among the 853 males, as it has been recognized that in the province of Quebec, men have more risk factors of MetS than women [17,18,19,20]. MetS was defined using the following criteria, with thresholds defined for men [5,21]: elevated waist circumference (WC ≥ 102 cm); high blood pressure (systolic > 130 mmHg and/or diastolic > 85 mmHg) or antihypertensive drug treatment with history of hypertension; elevated fasting blood glucose (≥ 5.7 mM) or drug treatment for hyperglycemia (oral hypoglycemic, insulin); high circulating triglyceride levels (≥ 1.7 mM) or drug treatment (fibrates, nicotinic acid); and reduced HDL-cholesterol (< 1.0 mM) or drug treatment (fibrates, nicotinic acid). In line with the study objectives, only subjects stable over time were included: using these five criteria, subjects with an unstable (changing) or uncertain (due to missing values) MetS status over time were excluded. Cases were defined as having three or more of the MetS criteria, while controls were defined as having fewer than three MetS criteria, at each time point. This resulted in the identification of 61 incident cases and 88 controls. Concerning control individuals, it was important to exclude extreme subjects that could generate false negative results.
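
As an illustration of the case definition above, the following minimal Python sketch counts the five criteria at each visit and assigns a stable status. The `Visit` class, its field names and the `stable_status` helper are hypothetical and are not taken from the NuAge database; missing values, which the study treats as an uncertain status, are not handled here.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """One annual visit for one participant (hypothetical field names)."""
    waist_cm: float
    systolic_mmhg: float
    diastolic_mmhg: float
    glucose_mm: float
    triglycerides_mm: float
    hdl_mm: float
    on_antihypertensive: bool = False  # treatment with history of hypertension
    on_glucose_drug: bool = False      # oral hypoglycemic or insulin
    on_lipid_drug: bool = False        # fibrates or nicotinic acid


def count_mets_criteria(v: Visit) -> int:
    """Number of positive MetS criteria (0 to 5), thresholds for men as in the text."""
    criteria = [
        v.waist_cm >= 102,
        v.systolic_mmhg > 130 or v.diastolic_mmhg > 85 or v.on_antihypertensive,
        v.glucose_mm >= 5.7 or v.on_glucose_drug,
        v.triglycerides_mm >= 1.7 or v.on_lipid_drug,
        v.hdl_mm < 1.0 or v.on_lipid_drug,
    ]
    return sum(criteria)


def stable_status(visits: list[Visit]) -> str:
    """'case' if >= 3 criteria at every visit, 'control' if < 3 at every visit."""
    counts = [count_mets_criteria(v) for v in visits]
    if all(c >= 3 for c in counts):
        return "case"
    if all(c < 3 for c in counts):
        return "control"
    return "unstable"  # changing status: excluded from the study
```

For example, a participant with a waist circumference of 105 cm, blood pressure of 135/80 mmHg, glucose of 6.0 mM, triglycerides of 1.9 mM and HDL-cholesterol of 1.2 mM meets four criteria and would be a case if this holds at every visit.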
Therefore, in agreement with clinicians, controls with seven or more drug treatments were excluded [22]. Moreover, outlying values were analyzed. Because no time effect was observed for the quantitative variables defining MetS, individuals whose mean values over time for the MetS biological variables were extreme, i.e. outside the range defined by the mean (T1 to T4) ± 1.5 interquartile range (IQR), were excluded. This strategy ultimately selected 61 cases and 62 controls. Because metabolomic profiles are known to be modified by age, we checked that there was no significant age difference between cases and controls to avoid a potential bias. To do so, three experimental classes were defined according to the age distribution (67–72 years old (n = 25 vs 22), 73–77 years old (n = 22 vs 24), 78–84 years old (n = 15 vs 15)), and the size balance between age classes in both groups was checked using Fisher's exact test.

2.1.3. Epidemiological data
Fifty-eight quantitative variables evaluated at T1 and T4 were considered to precisely describe the selected population: 23 biochemical parameters, 8 clinical variables, 25 nutritional data and finally 2 scores related to physical activity (Physical Activity Scale for the Elderly (PASE) questionnaire [23]) and health-related quality of life (using the physical component summary (PCS) score derived from the Medical Outcome Study 36-item Short Form Health Survey (SF-36) questionnaire [24,25]). In particular, nutritional data consisted of intake data obtained from the mean of two to three non-consecutive 24 h dietary recalls (24-HR) [26], as well as a validated Canadian global dietary quality index, the Canadian Healthy Eating Index (C-HEI) [27]. This index is based on the intake of four food groups (grain products, fruits and vegetables, milk products, meat and alternatives) and five other items (% of energy as total fat intake and as saturated fat intake, cholesterol, salt and diet variety).
The total score ranges from 0 to 100, with higher scores indicating that the nutritional quality of the diet is closer to the Canadian guidelines for healthy eating.

2.2. Randomization of biological samples
Following sample selection and in view of the multiplatform analyses, the sample preparation and analytical sequences had to be carefully built. In metabolomics, analytical sequences are usually randomized using a Williams-Latin-Square strategy defined according to the main factors of the study, as well as potential confounding factors linked to sampling conditions. In the present work, samples were randomized using this strategy, defined first according to the main factor of the study (MetS), considering the sum of the annual number of MetS criteria between the two time points (T1 to T4), divided into 4 groups (0–3; 3–7; 11–14; 15–20; 0 being no positive criteria over the 3 years and 20 corresponding to 5 positive criteria over this period). This randomization was used both for sample preparations and analyses.

3. Methods

Seven complementary untargeted metabolomics methods based on 3 different analytical platforms, Ultra High-Performance Liquid Chromatography coupled to High-Resolution Mass Spectrometry (LC-MS), Gas Chromatography coupled to High-Resolution Mass Spectrometry (GC-MS), and Nuclear Magnetic Resonance spectroscopy (NMR), were used to characterize the MetS phenotypic spectrum. Quality control samples were designed and prepared to control for potential bias due to sample preparation or analytical drifts. Since hundreds to thousands of metabolites are detected in untargeted metabolomics, the use of internal standards for each metabolite is almost impossible, and pooled quality control (QC) samples are recognized to be the most appropriate approach [28]. In the present study, these QC samples consisted of a pool of human serum samples extracted independently and subsequently diluted 1/2, 1/4 and 1/8.
All analytical sequences were standardized: at least three blank (solvent) samples and five pooled QC samples were injected for column conditioning. Then, the stability of the analytical system was monitored using these QC samples, injected once at the beginning of each analytical sequence and thereafter every 10 samples.

3.1. Data production

3.1.1. Ultra high-performance liquid chromatography coupled to high-resolution mass spectrometry (LC-MS)
Three methods were performed to maximize the serum metabolome coverage: reversed-phase LC-MS (C18) analysis, complemented by hydrophilic interaction chromatography (HILIC) to allow the detection of polar metabolites, and an untargeted lipidomics approach using reversed-phase LC-MS (C8) to profile a large set of lipid species.

3.2. C18-based system (C18Pos and C18Neg)
Serum samples (100 µL) were slowly thawed on ice at room temperature. Protein precipitation was performed by addition of 200 µL of ice-cold methanol (MeOH). This mixture was vortexed and placed at −20 °C for 30 min. After a 10 min centrifugation (4 °C, 15,493 g, Sigma 3-16PK, Fischer Bioblock Scientific), the supernatant was divided into three 45 µL aliquots, dried completely (EZ2.3 Genevac, Biopharma Technologies France) and stored at −80 °C until further analysis. Just before analysis, 150 µL of injection solvent (water and acetonitrile 50/50 + 0.1% formic acid) was added to the dry fraction. A pooled QC sample was prepared by mixing 5 µL from each extracted sample. This sample preparation was automated on a Freedom EVO200 TECAN robot (Tecan Trading AG, Switzerland), enabling liquid handling with high repeatability (CV ≤ 0.75%). Metabolic profiles were determined using a U3000 liquid chromatography system (Thermo Fisher Scientific, San Jose, CA, USA) coupled to a high-resolution Bruker Impact HDII UHR-QTOF (Bruker Daltonics, Wissembourg, France) equipped with an electrospray source (ESI).
Chromatographic separation was performed on a Waters HSS T3 column (150 × 2.1 mm, 1.8 µm) at 0.4 mL/min and 30 °C, using an injection volume of 5 µL. Mobile phases A and B were water and acetonitrile with 0.1% formic acid, respectively. The gradient elution was 0% B (2 min), 0–100% B (13 min), 100% B (7 min), 100–0% B (0.1 min) and 0% B (3.9 min for re-equilibration). The mass resolving power of the mass spectrometer was 50,000 full width at half maximum (FWHM) at a mass-to-charge ratio (m/z) of 1222. Samples were analyzed in the positive and negative ionization modes (C18Pos, C18Neg). Capillary and end plate offset voltages were set at 2500 V and 500 V for the ESI source. The drying gas temperature was 200 °C and the nebulization gas flow was 10 L/min. Mass spectral data were acquired in full-scan mode over the m/z range 50–1000.

3.3. HILIC-based system (HILICneg)
Metabolite extraction was performed from 50 µL of serum following methanol-assisted protein precipitation as previously described [29]. Briefly, 200 µL of methanol containing internal standards at 3.75 µg/mL (Dimetridazole, 2-amino-3-(3-hydroxy-5-methyl-isoxazol-4-yl) propanoic acid (AMPA), 2-methyl-4-chlorophenoxyacetic acid (MCPA), Dinoseb (Sigma-Aldrich, Saint-Quentin Fallavier, France)) were added to 50 µL of serum. The resulting samples were then left on ice for 90 min until complete protein precipitation. After centrifugation (20,000 g, 15 min, 4 °C), supernatants were collected, dried under a nitrogen stream using a TurboVap instrument (Thermo Fisher Scientific, Courtaboeuf, France) and stored at −80 °C until analysis. Dried extracts were resuspended in 150 µL of 10 mM ammonium carbonate pH 10.5/acetonitrile (40:60). After reconstitution, the tubes were vortexed, incubated in an ultrasonic bath for 5 min on ice, and centrifuged (20,000 g, 15 min, 4 °C). A volume of 95 µL of the supernatant was transferred into 0.2 mL vials.
External standard solution (5 µL; mixture of 9 authentic chemical standards covering the mass range of interest: ^13C-glucose, ^15N-aspartate, ethylmalonic acid, amiloride, prednisone, metformin, atropine sulfate, colchicine, imipramine) was added to all samples in order to check the consistency of analytical results in terms of signal and retention time stability throughout the experiments. The QC samples were prepared by mixing 20 µL of each extracted sample. QC samples were injected every 5 samples. Metabolic profiling experiments were performed using a U3000 liquid chromatography system coupled to an Exactive mass spectrometer from Thermo Fisher Scientific (Courtaboeuf, France) fitted with an electrospray source operating in the negative ion mode. Chromatographic separation was performed on a Sequant ZIC-pHILIC column (5 µm, 2.1 × 150 mm, Merck, Darmstadt, Germany) maintained at 15 °C for improved peak shape and chromatographic separation of nucleotide metabolites [30,31], and equipped with an on-line prefilter (Thermo Fisher Scientific, Courtaboeuf, France). Mobile phases A and B were an aqueous buffer of 10 mM ammonium carbonate adjusted to pH 10.5 with ammonium hydroxide, and 100% acetonitrile, respectively. The flow rate was 200 µL/min. Chromatographic elution was achieved under the following gradient conditions: an isocratic step of 2 min at 80% B, followed by a linear gradient from 80 to 40% B from 2 to 12 min. The chromatographic system was then rinsed for 5 min at 0% B, and the run ended with an equilibration step of 15 min (80% B). The Exactive mass spectrometer was operated with a capillary voltage set at −3 kV and a capillary temperature set at 280 °C. The sheath gas pressure and the auxiliary gas pressure (nitrogen) were at 60 and 10 arbitrary units, respectively. The mass resolving power of the analyzer was 50,000 (FWHM) at m/z 200, for singly charged ions. The detection was achieved from m/z 75 to 1000.

3.4. Lipidomic untargeted approach (LIPIDO)
Serum samples were extracted using a method adapted from that previously described [32]. Briefly, 100 µL of serum was added to 490 µL of CHCl3/MeOH 1:1 (v/v) and 10 µL of internal standard mixture. Samples were vortexed for 60 s, sonicated for 30 s using an ultrasonic probe (Bioblock Scientific Vibra Cell VC 75,185, Thermo Fisher Scientific Inc., Waltham, MA, USA) and incubated for 2 h at 4 °C with mixing. Seventy-five µL of H2O was then added and samples were vortexed for 60 s before centrifugation at 15,000 g for 15 min at 4 °C. The upper phase (aqueous phase), containing gangliosides, lysoglycerophospholipids, and short-chain glycerophospholipids, was transferred into a glass tube and dried under a stream of nitrogen. The protein disk interphase was discarded and the lower lipid-rich phase (organic phase) was pooled with the dried upper phase and the mixture dried under nitrogen. Samples were resuspended in 100 µL of CHCl3/MeOH 1:1 (v/v). Ten µL were diluted 100-fold in a solution of MeOH/isopropanol/H2O 65:35:5 (v/v/v) before injection. Lipidomic profiles were determined using an Ultimate 3000 liquid chromatography system (Thermo Fisher Scientific, San Jose, CA, USA) coupled to a high-resolution Thermo Orbitrap Fusion (Thermo Fisher Scientific, San Jose, CA, USA) equipped with an electrospray source (ESI). Chromatographic separation was performed on a Phenomenex Kinetex C8 column (150 × 2.1 mm, 2.6 µm) at 0.4 mL/min and 60 °C, using an injection volume of 10 µL. In the negative ionization mode (LIPIDOneg), mobile phases A and B were H2O/MeOH 60:40 (v/v) with 0.1% formic acid and isopropanol/MeOH 90:10 (v/v) with 0.1% formic acid, respectively. Ammonium formate (10 mM) was added to both mobile phases in the positive ionization mode (LIPIDOpos) in order to detect glycerolipids and cholesteryl esters in their [M+NH4]^+ form.
The gradient elution was as follows: solvent B was maintained at 32% for 2.5 min; it was then increased to 45% B from 2.5 to 3.5 min, to 52% B from 3.5 to 5 min, to 58% B from 5 to 7 min, to 66% B from 7 to 10 min, to 70% B from 10 to 12 min, to 75% B from 12 to 15 min, to 80% B from 15 to 19 min, to 85% B from 19 to 22 min, and to 95% B from 22 to 23 min; 95% B was maintained from 23 to 25 min; from 25 to 26 min solvent B was decreased to 32% and then maintained for 4 min for column re-equilibration. The mass resolving power of the mass spectrometer was 240,000 (FWHM) for MS experiments. Samples were analyzed in both positive and negative ionization modes. The ESI source parameters were as follows: the spray voltage was set to 3.7 kV and −3.2 kV in positive and negative ionization mode, respectively. The heated capillary was kept at 360 °C and the sheath and auxiliary gas flows were set to 50 and 15 (arbitrary units), respectively. Mass spectra were recorded in full-scan MS mode from m/z 50 to m/z 2000.

3.4.1. Gas chromatography coupled to high-resolution mass spectrometry (GCMS)
Serum samples were slowly thawed at 4 °C overnight. Four hundred µL of ice-cold methanol (−20 °C) were added to 100 µL of serum and the mixture was vortexed. After protein precipitation, samples were kept at −20 °C for 30 min and then centrifuged (Sigma 3-16PK, Fischer Bioblock Scientific) at 20,627 g for 10 min at 4 °C. Two hundred and fifty µL of supernatant were transferred into a 2 mL amber glass vial. After the addition of 10 µL of [^13C1]-L-valine (200 µg/mL), samples were evaporated in an EZ2.3 Genevac (Biopharma Technologies France). In parallel, a control derivatization sample (serum substituted by milliQ water) was prepared in order to remove the background noise produced during sample pre-processing, derivatization, and GC/MS analysis.
The dry residues were dissolved by addition of 80 µL of methoxylamine solution (15 mg/mL in pyridine) to each vial, vortexed vigorously for 1 min and incubated for 24 h at 37 °C (in order to inhibit the cyclization of reducing sugars and the decarboxylation of α-keto acids). Then, 80 µL of N,O-bis(trimethylsilyl)trifluoroacetamide (BSTFA) with 1% trimethylchlorosilane (TMCS) as catalyst were added to the mixture for derivatization (60 min, 70 °C). Before injection, 50 µL of the derivatized mixture were transferred into a glass vial containing 100 µL of heptane. QC pool samples were prepared using 10 µL of each extracted and derivatized sample. Metabolic profiles were obtained using an Agilent 7890B Gas Chromatograph coupled to an Agilent Accurate Mass QTOF 7200 equipped with a 7693A Injector (SSL) Auto-Sampler (Agilent Technologies, Inc). Separation was achieved on a fused silica column HP-5MS UI (30 m × 0.25 mm i.d.) chemically bonded with a 5% phenyl-95% methylpolysiloxane cross-linked stationary phase (0.25 µm film thickness) (Agilent J & W Scientific, Folsom, CA, USA). Helium was used as the carrier gas at a flow rate of 1 mL/min. Two µL of derivatized sample were injected using a 1:20 split. The temperatures of the injector, transfer line, and electron impact (EI) ion source were set to 250 °C, 280 °C and 230 °C, respectively. The initial oven temperature was 60 °C for 2 min, ramped to 140 °C at a rate of 10 °C/min, to 240 °C at a rate of 4 °C/min, to 300 °C at a rate of 10 °C/min and finally held at 300 °C for 8 min. Agilent "retention time locking" (RTL) was applied to control the reproducibility of retention times. [^13C1]-L-valine was used to lock the GC method [33]. The electron energy was 70 eV and mass data were collected in full-scan mode (m/z 55–700) using a resolving power of 7000 (FWHM) at m/z 464 (perfluorotributylamine, PFTBA). The acquisition rate was 5 spectra/s, with an acquisition time of 200 ms/spectrum.
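
As a worked check of the oven program described above, the short Python sketch below adds up the hold and ramp durations and returns the set-point temperature at a given time. The function and constant names are illustrative only; the sketch assumes ideal linear ramps and ignores oven equilibration and cool-down between injections.

```python
# Oven program from the text: 60 °C held for 2 min, then three linear ramps.
INITIAL_TEMP_C = 60.0
INITIAL_HOLD_MIN = 2.0
# Each ramp: (target temperature in °C, rate in °C/min, hold at target in min)
RAMPS = [(140.0, 10.0, 0.0), (240.0, 4.0, 0.0), (300.0, 10.0, 8.0)]


def oven_run_time_min(initial_temp=INITIAL_TEMP_C, initial_hold=INITIAL_HOLD_MIN,
                      ramps=RAMPS):
    """Total duration of the temperature program in minutes."""
    total, temp = initial_hold, initial_temp
    for target, rate, hold in ramps:
        total += (target - temp) / rate + hold
        temp = target
    return total


def oven_temperature_at(t_min, initial_temp=INITIAL_TEMP_C,
                        initial_hold=INITIAL_HOLD_MIN, ramps=RAMPS):
    """Set-point temperature (°C) at time t_min after injection."""
    if t_min <= initial_hold:
        return initial_temp
    elapsed, temp = initial_hold, initial_temp
    for target, rate, hold in ramps:
        ramp_time = (target - temp) / rate
        if t_min <= elapsed + ramp_time:
            return temp + rate * (t_min - elapsed)
        elapsed += ramp_time
        temp = target
        if t_min <= elapsed + hold:
            return temp
        elapsed += hold
    return temp  # after the end of the program, the final temperature


print(oven_run_time_min())  # 2 + 8 + 25 + 6 + 8 = 49.0 min
```

The stated acquisition settings are also mutually consistent: 5 spectra/s corresponds to 1/5 s, i.e. 200 ms, per spectrum.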
Four heptane blanks were injected at the beginning of each sequence, followed by four pool samples, and then one pool sample and one derivatization control sample after each set of 10 samples. Initial tuning and calibration of the system were performed using PFTBA under 2 GHz EDR acquisition conditions with N2 (1.5 mL/min); the limits were 3.0 ppm for the average error and 8.0 ppm for the maximum error. A calibration was also performed between samples.

3.5. Nuclear magnetic resonance spectroscopy (NMR)
Serum aliquots (50 µL) were slowly thawed on ice at room temperature. One hundred µL of phosphate buffer (0.2 M, pH 7.0) prepared in deuterium oxide (D2O) were added to the aliquots; each sample was vortexed and centrifuged for 15 min at 4500 g, and 150 µL of the supernatant were transferred into 3 mm NMR tubes. All ^1H NMR spectra of serum samples were obtained on a Bruker Avance III HD spectrometer (Bruker, Karlsruhe, Germany) operating at 600.13 MHz for ^1H resonance frequency and equipped with an inverse detection 5 mm CQPCI ^1H-^31P-^13C-^15N cryoprobe connected to a cryoplatform and a cooled SampleJet sample changer. Spectra were acquired at 300 K using the Carr-Purcell-Meiboom-Gill (CPMG) spin-echo pulse sequence with a total spin-echo delay of 240 ms to attenuate broad signals from proteins and lipoproteins, and a 2 s relaxation delay. Water signal suppression was achieved by pre-saturation during the relaxation delay. The spectral width was set to 20 ppm for each spectrum, and 256 scans were collected with 32 K points. Free induction decays were multiplied by an exponential window function before Fourier transformation. The spectra were manually phased and calibrated to the lactate signal (δ 1.33 ppm), and the baseline was corrected using TopSpin 3.2 software (Bruker, Karlsruhe, Germany).

3.6. Data treatment
Following metabolomic/lipidomic analyses, some samples were missing because of problems in sample preparation or missing data (1 for C18Pos, 6 for HILIC, 4 for Lipidomic and 1 for GCMS). All the raw data obtained from metabolic profiles were processed to yield a data matrix containing variables and peak intensities. All the data treatments were performed separately for each analytical method as individual datasets, on the Galaxy web-based platform Workflow4Metabolomics (W4M) [34], to ensure the standardization and reproducibility of the data treatment workflows.

3.6.1. Data extraction and pre-processing for MS
First, raw data were extracted using XCMS [35], followed by quality checks and signal drift correction according to the strategy described by van der Kloet et al. [36] based on the use of pooled QC samples, to yield a data matrix containing retention times, masses, and peak intensities corrected for batch effects. These steps include noise filtering, automatic peak detection, and chromatographic alignment. In particular, all XCMS extractions used a "minfrac" parameter of 0.2 to keep variables present in at least 20% of the samples, since a large variability of profiles among the selected individuals was expected. Owing to the high degree of correlation between the two extracted lipidomic datasets, they were merged for further data processing. After signal drift and batch effect correction within the six datasets, metabolite MS signals were filtered using the following criteria: ratio of chromatographic peak areas of samples over blanks (above 3), correlation between QC pool dilution factors and areas of chromatographic peaks (over 0.7), repeatability of QC pool samples (CVs under 30%) and ratio of QC pool sample CVs over biological sample CVs (below 1).

3.6.2. NMR data pre-processing
The NMR spectra were imported into the Amix software (version 3.9.15, Bruker, Rheinstetten, Germany) for data integration. Variable-size bucketing was performed based on graphical pattern (74 buckets) and each bucket was then integrated.

3.6.3. Filtration
During the analysis, each metabolite produces several analytical features, corresponding to signals derived from different adduct ions generated in the ESI process, signals from the presence of isotopes in the molecule, signals from in-source fragmentation processes, and different peaks from the same molecule in NMR. The data extraction step thus results in thousands of features in the final datasets with a high degree of correlation, which is a constraint for the use of various data mining and statistical methods. For example, analytical redundancy strongly affects multiple testing correction. Indeed, having non-independent variables (coming from the same metabolite) leads to an over-correction of the data that can hide potentially relevant information. Therefore, the analytical redundancy inside each of the 6 datasets was reduced in the present study. In metabolomics, filtering was technique-specific, but with the common principle of grouping features correlated above 0.90 and selecting a single representative per group: the most intense signal for MS data and the purest one for NMR. This procedure was conducted using the Analytic Correlation Filtration (ACorF) tool [37] within W4M, with a manual selection of the representative feature only for NMR. In lipidomics, this step was performed according to the workflow previously described [32].
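
The principle of the correlation-based redundancy reduction described above for MS data can be sketched in Python as follows. This is a deliberately simplified, greedy illustration and not the ACorF algorithm itself: the function names are hypothetical, only Pearson correlation of intensities is used, and any additional grouping constraints applied by the actual tool are ignored.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant feature: treat as uncorrelated
    return sxy / sqrt(sxx * syy)


def reduce_redundancy(features, threshold=0.90):
    """Keep one representative per group of highly correlated features.

    `features` maps a feature name to its intensities across samples.
    Features are visited by decreasing mean intensity, and a feature is kept
    only if it is not correlated above `threshold` with one already kept,
    so the most intense member of each correlated group is retained.
    """
    def mean_intensity(name):
        values = features[name]
        return sum(values) / len(values)

    kept = []
    for name in sorted(features, key=mean_intensity, reverse=True):
        if all(pearson(features[name], features[k]) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy example: an adduct that tracks its parent ion is dropped.
toy = {
    "M+H": [100, 200, 300, 400],
    "M+Na": [10, 21, 29, 41],
    "other": [50, 10, 40, 20],
}
print(reduce_redundancy(toy))
```

Removing such redundant features before univariate testing reduces the number of non-independent tests, which is the over-correction issue raised above.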
Briefly, a first automatic feature annotation was achieved using an in silico database containing the exact masses corresponding to pseudo-molecular ions ([M+H]^+, [M-H]^− and [M-2H]^2−), adducts ([M+NH4]^+, [M+Na]^+, [M-H+CHO2]^−), and in-source fragment ions ([M+H−H2O]^+), along with their corresponding ^13C and double ^13C isotopes. Furthermore, specific retention time windows for each lipid class were added by examining the retention times of species containing the longest and the shortest fatty acyl chains. Annotated lipid species were then kept if (i) their ^13C isotope was detected and aligned in time (± 5 s), and (ii) all related ions (i.e. pseudo-molecular ions, adduct ions and/or in-source fragments, either as monoisotopic or as ^13C and 2×^13C isotopes) had the same retention time as a reference ion specific to a lipid class/subclass (± 5 s, and ± 10 s between the two ionization modes after merging the corresponding peak tables). In addition, the relative isotopic abundance (RIA) between the monoisotopic ion and its corresponding ^13C isotope was automatically calculated and compared to the theoretical one. Annotated lipid species with an RIA error higher than 30% were filtered out. This threshold of 30% was selected since the RIA errors of all internal standards were below this value. The two lipidomic peak tables obtained in positive and negative ionization modes were merged because of their high degree of correlation, due to the detection of specific lipid classes in both modes (i.e. lysophosphatidylcholines, phosphatidylcholines and sphingomyelins) under the [M+H]^+ and [M-H+CHO2]^− forms, respectively. The two peak tables were aligned according to retention time at ± 10 s.

3.7. Statistical analyses
All statistical analyses were performed after data pre-processing and filtration of the 6 individual datasets.

3.7.1. Measurement of serum metabolomes in the NuAge MetS sub-cohort
Correlation analyses were performed to give an overview of the links between detected metabolites/lipids in serum, both at the level of method datasets and of individual variables. First, the RV coefficients [38] were used to provide insight into the global association between datasets, using the R software (version 3.4.1) [39] with the R package FactoMineR [40]. This coefficient [38] is a multivariate generalization of the squared Pearson correlation coefficient, defining a scale of similarity between two matrices and measuring to what degree the different datasets give the same view of the samples [41]. Second, to investigate individual correlations between detected features, pair-wise Spearman correlation coefficients between variables were calculated using the Between Table Correlation tool available in W4M, and a network analysis was performed. Significant correlation coefficients > 0.7 (after Benjamini-Hochberg correction) were selected and a graphical representation of the Spearman correlation network was made with Cytoscape [42].

3.7.2. Metabolite and lipid levels modulated with MetS
Individuals in this study were selected for the stability of their MetS status. Nonetheless, part of their metabolism could be expected to be affected by time. Thus, the impact of time on the metabolomic/lipidomic datasets was also evaluated. As no interaction effect was observed between status and time, linear mixed models (LMM) were used to analyze repeated measures, considering fixed effect factors (time, status (case/control), and their interactions) and subject as a random effect, using the module available in W4M. In order to verify that the LMM assumptions were met, we considered the different residuals of the LMM. The assumption of homogeneity of variance of the residuals was checked for each fixed factor using a Levene test.
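
For reference, the Benjamini-Hochberg adjustment applied to the correlation and mixed-model p-values in this section can be sketched in a few lines of Python. This is a generic textbook implementation written for illustration, with a hypothetical function name; it is not the code of the W4M modules used in the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adj]  # only the first test passes at 0.05
```

With these four illustrative p-values, the adjusted values are 0.04, about 0.053, about 0.053 and 0.20, so two tests that are nominally below 0.05 no longer pass after correction.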
Then, the normality of the conditional residuals and of the random effect residuals was verified using quantile-quantile plots. The linearity of the fixed effects was checked as proposed by Singer et al. [43], using a plot of the marginal predictions vs the standardized marginal residuals. A p-value threshold of 0.05 after Benjamini-Hochberg (BH) correction was used to detect variables significantly affected by status and time. Similar statistical analyses were performed on the epidemiological data. 3.7.3. Identification of a comprehensive molecular MetS signature The objective of the present study was to identify a limited number of metabolites that could together reflect the MetS status. In this context, a variable selection was first performed on each individual dataset, based on the methodology developed by Rinaudo et al. [44] and using the biosigner module available in the W4M Galaxy instance [34]. The aim was to focus on the variables that significantly contribute to the performance of the discrimination. As feature selection may be affected by correlations between variables, a Pearson correlation filter (threshold 0.8) was applied to each dataset beforehand. All variables selected by biosigner with at least one of three classifiers (Partial Least Squares Discriminant Analysis (PLS-DA), Support Vector Machine (SVM), and Random Forest (RF)) were first considered. This process was then repeated five times to account for the selection variability induced by the bootstrap step of the methodology. The union of the variables selected over the five repetitions formed the predictive subset of each dataset. In a second step, the selected variables of each subset were integrated into a common PLS-DA model, combining the 6 individual predictive subsets, to characterize the discriminant power of the comprehensive signature. For all PLS analyses, unit variance (UV) scaling was applied to variable intensities. All PLS models were built using 7-fold cross-validation.
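The Benjamini-Hochberg (BH) correction applied to the p-values in the analyses above can be sketched as follows. This is a minimal, dependency-free illustration of the standard step-up adjustment, not the code of the W4M modules; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values, in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the
    # adjusted values stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four variables.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
# Variables kept at the 0.05 threshold used in this study.
selected = [i for i, p in enumerate(adj) if p < 0.05]
```

In R, the same adjustment is available through `p.adjust(p, method = "BH")`.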
The predictive power of the model was assessed using the Q^2 parameter. To check that the PLS components could not lead to a correct classification by chance, a permutation test was carried out (n = 200). For each permutation, samples are randomly assigned to the experimental groups, a PLS model is fitted, and R^2Y and Q^2 are computed. The results are displayed on a validation plot, which shows the correlation coefficient with the original, non-permuted samples on the horizontal axis (the true model having a value of 1) and the R^2Y and Q^2 values on the vertical axis. Logically, permuted samples must lead to poor predictive models, with lower Q^2 values than the true model. With a view to future clinical application, an optimized reduced signature was then proposed. To this end, the redundancy between methods was eliminated (correlation coefficient > 0.8), keeping the most robust variable (highest intensity, best peak purity). In a second step, this signature was restricted to formally identified compounds. The performance of the prediction model was evaluated using a confusion matrix, cross-validated error rates (200 repetitions of random training/test splits), and areas under the ROC curves (AUC) [45], using the R software (version 3.6.2) and the R package "pROC" [46], with confidence intervals (CI) estimated by DeLong's method [47]. 3.8. Metabolite annotation Metabolite annotation was first conducted computationally using W4M; all annotations then underwent manual curation and interpretation of spectra. Metabolites contributing to the discrimination of the MetS phenotype were first identified using in-house databases containing the reference spectra of more than 2000 authentic standard compounds analyzed under the same analytical conditions, and providing comprehensive spectral information (i.e. protonated or deprotonated molecules, adducts and in-source fragment ions for LC-HRMS, or molecular ions as well as major fragments for GC-MS).
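The first annotation step in this section matches measured features against in-house reference entries by accurate mass and retention time. A minimal sketch of such a lookup is given below, using the tolerances this section states for database queries (0.005 Da and 0.1 min). The function name, the tuple layout and the feature and reference values are hypothetical and serve only to illustrate the matching rule; they are not taken from the in-house databases.

```python
def match_features(features, references, mass_tol=0.005, rt_tol=0.1):
    """Match features to reference entries by m/z and retention time.

    features:   list of (feature_id, mz, rt_min)
    references: list of (name, mz, rt_min)
    Returns a dict mapping each feature_id to the list of reference
    names within both tolerances (mass in Da, retention time in min).
    """
    matches = {}
    for feature_id, mz, rt in features:
        matches[feature_id] = [
            name
            for name, ref_mz, ref_rt in references
            if abs(mz - ref_mz) <= mass_tol and abs(rt - ref_rt) <= rt_tol
        ]
    return matches


# Hypothetical reference entry and measured features.
references = [("compound_A", 180.0630, 5.00)]
features = [
    ("f1", 180.0634, 5.02),  # within both tolerances
    ("f2", 180.0700, 5.02),  # mass error too large
    ("f3", 180.0634, 5.30),  # retention time too far
]
hits = match_features(features, references)
```

A feature with several candidate names would still need the manual curation and MS/MS confirmation described in this section; the lookup only narrows the candidates.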
Metabolite annotation was first performed with these spectral databases, on the basis of the accurately measured masses within the MS spectra and the chromatographic retention times. Confirmation of metabolite annotation in LC-HRMS was then accomplished by running additional LC-MS/MS experiments on a Dionex Ultimate chromatographic system coupled to a Q-Exactive mass spectrometer (Thermo Fisher Scientific), under non-resonant collision-induced dissociation conditions using higher-energy C-trap dissociation (HCD) in both positive and negative ion modes. These experiments were conducted on the same QC samples, with the instrument set in targeted acquisition mode using inclusion lists. The resulting MS/MS spectra were then manually matched to those of the in-house spectral database, which were acquired at different collision energies. Confirmation of metabolite annotation in GC-MS was done by matching electron impact spectra, as well as by using reports from the literature. Then, the remaining unknown compounds were identified on the basis of their exact masses, which were compared to those registered in Metlin (https://metlin.scripps.edu; [48]), in the Human Metabolome Database (HMDB; www.hmdb.ca; [49]), in Massbank (https://massbank.eu/MassBank/; [50]), in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (http://www.genome.jp/kegg/; [51]), or in the National Institute of Standards and Technology database (NIST; https://www.nist.gov/srd/nist-special-database-14; [52]). Database queries were performed with a mass error of 0.005 Da, and a retention time difference of 0.1 min for the in-house databases. Database results were confirmed using appropriate standards when available, isotopic patterns, and mass fragmentation analyses. For unidentified ions, the number of plausible elemental compositions was restricted to a small number (or to a unique composition) with the support of additional chemical information, i.e.
the molecular formula of the precursor ions, reports from the literature [53], and knowledge of possible metabolic pathways. Metabolites were classified according to Sumner et al. [54] with respect to the level of confidence in the identification process: identified (confirmed by an authentic chemical standard analyzed under the same conditions, matching at least two orthogonal criteria among accurately measured mass, retention time and MS/MS or EI(MS) spectrum), putatively annotated (spectral similarity with public/commercial spectral libraries), putatively characterized compound classes, or unknown. It is important to note that very few lipid standards are commercially available compared to the large diversity of endogenous lipid species present in complex biological matrices. Therefore, the results of in-house database queries were filtered according to the workflow described in Seyer et al. [32], taking into account the retention time range of each lipid class as well as isotope patterns to select relevant lipid species, as previously described in the data filtration section. Finally, all HCD mass spectra resulting from the additional MS/MS experiments were manually inspected to identify specific diagnostic ions and to confirm the structure of lipid species [55] (see Supplemental Fig. 2), which were named following the LipidMaps nomenclature [56]. Spectral assignments were based on matching 1D and 2D NMR data to reference spectra in a homemade reference database, as well as in other databases (http://www.bmrb.wisc.edu/metabolomics/; http://www.hmdb.ca/), and on reports from the literature [57]. 3.9.
Extraction of modulated metabolic network To place the metabolites identified in the untargeted metabolomics/lipidomics experiments in the context of genome-scale reconstructed metabolic networks, the metabolites that were modulated according to the LMM, stable over time, and identified or annotated were mapped onto the human genome-scale metabolic network Recon2.2 [58]. This network contains 7785 reactions and 6047 metabolites. To map the modulated metabolites onto this network, we first retrieved their ChEBI identifiers and then searched for the matching identifiers in the Recon2.2 network using the "identifier matcher tool" in MetExplore. This tool performs both an exact matching (to find the exact metabolite in the network corresponding to the modulated metabolite from the experimental dataset) and an ontology-based matching (to make the link with a corresponding, more generic class metabolite when the exact same metabolite cannot be retrieved in the network) [59]. In the metabolic network, each metabolite is assigned to several different cellular compartments. However, because current global and untargeted metabolomics approaches do not provide information on the cellular localization of metabolites, we chose to consider only cytosolic metabolites. To focus on the part of the network most likely to be modulated, we first selected all the metabolic pathways in which at least one modulated metabolite was found, while excluding pathways involving only transport reactions. Forty-one pathways, including 2753 reactions, were selected. Then, a metabolic sub-network was extracted from the modulated metabolites. This consists of computationally identifying, among the previously selected reactions, those most likely to be related to the modulated metabolites. The algorithm computes the lightest path between each pair of metabolites in the dataset.
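The sub-network extraction can be sketched as a weighted shortest-path search followed by a union of paths. The sketch below is an illustration only and not the MetExplore implementation: the text states that the lightest path minimizes "a topological criterion" without naming it, so the node weight used here (the squared degree of each node, which penalizes highly connected hub compounds) is an assumption, as are the function names and the toy graph.

```python
import heapq
from itertools import combinations


def lightest_path(graph, source, target):
    """Dijkstra search where entering a node costs its squared degree.

    graph: dict mapping each node to the set of its neighbours.
    The squared-degree weight is an assumed topological criterion.
    """
    weight = {node: len(nbrs) ** 2 for node, nbrs in graph.items()}
    best = {source: 0}
    previous = {}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt in graph[node]:
            new_cost = cost + weight[nxt]
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                previous[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if target not in best:
        return None  # no path between the two nodes
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1]


def extract_subnetwork(graph, metabolites):
    """Union of the lightest paths between every pair of metabolites."""
    nodes = set()
    for a, b in combinations(metabolites, 2):
        path = lightest_path(graph, a, b)
        if path is not None:
            nodes.update(path)
    return nodes


# Toy graph: A and B are linked through a hub and through x-y.
toy = {
    "A": {"hub", "x"}, "B": {"hub", "y"},
    "x": {"A", "y"}, "y": {"x", "B"},
    "hub": {"A", "B", "h1", "h2", "h3", "h4"},
    "h1": {"hub"}, "h2": {"hub"}, "h3": {"hub"}, "h4": {"hub"},
}
sub = extract_subnetwork(toy, ["A", "B"])
```

In the toy graph, the route through the hub has fewer steps but a higher total weight, so the extracted sub-network follows the longer route through the low-degree nodes. This is the kind of behaviour a degree-based criterion is meant to produce, since hub compounds connect many unrelated reactions.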
The lightest path is a sequence of reactions and metabolites connecting two metabolites and minimizing a topological criterion in the network [60,61]. For one dataset, the related sub-network is thus the union of all the lightest paths between the metabolites present in this dataset. Pathway enrichment analyses were performed to assess whether the modulated metabolites were significantly over-represented in a metabolic pathway. Pathway enrichment statistics were computed using the one-tailed Fisher exact test, with a BH correction for multiple testing, using the metabolic pathways defined in Recon2.2. All computational and visualization tasks were performed within the MetExplore web server, based on the Recon2.2 metabolic network (biosource id #4311) [62,63]. Role of funding sources: All metabolomics and lipidomics analyses (data collection) were funded by the MetaboHUB French infrastructure (ANR-INBS-0010). Funders had no role in study design, data analysis, interpretation or writing of the report. 4. Results 4.1. Overview of study population Fifty-eight quantitative variables in total were considered to precisely describe the selected population: 23 biochemical parameters, 8 clinical variables, 25 nutritional variables (essentially related to macronutrient intake and to selected nutrients described as being related to MetS), and 2 scores related to physical activity and global health (see Materials and Methods). As defined in the subject selection process (see Materials and Methods), the MetS status of individuals was stable over the 3-year follow-up (for the 4 time points considered). Beyond the stability of MetS status, the clinical parameters associated with MetS, analyzed at T1 and T4, were found to be stable over time, with a slight improvement for some of them (i.e. significant reduction of systolic blood pressure, fasting glucose and triglycerides (TG)).
Differences between cases and controls were highly significant for most of the quantitative MetS criteria (BH-corrected p-values from 10^−5 to 10^−21; Table 1; Supplementary Fig. 1). The main descriptive data, as well as the results of the linear mixed models, are presented in Table 1 and Supplementary Tables 1a and 1b. They showed that all subjects were globally stable over time, not only for the clinical values of the MetS criteria, as already emphasized, but also for the main parameters related to physical activity, nutrition and health-related quality of life (physical component summary (PCS) score). Regarding MetS status, results showed that MetS subjects were less healthy and less active than controls, with significant differences in the global scores for physical activity (PASE) and health-related quality of life (PCS) (corrected p-values = 0.02 and 3.6 × 10^−3, respectively). Moreover, cases also showed significantly lower total energy and carbohydrate intakes (corrected p-values = 0.025 and 6.8 × 10^−3, respectively). In addition, although the difference in total dietary fiber intake was at the limit of significance, the consumption of cereal products, evaluated against the Canadian Food Guide recommended intakes for grain products (Canadian-Healthy Eating Index, C-HEI), was significantly lower in cases than in controls (corrected p-values = 0.052 and 0.013, respectively). Table 1. Overview of the study population.
| Variable | Controls T1 | Controls T4 | Cases T1 | Cases T4 | Corrected p-value, time (BH) | Corrected p-value, MetS status (BH) |
|---|---|---|---|---|---|---|
| n | 62 | 62 | 61 | 61 | – | – |
| Age (yrs) | 73.5 ± 4.1 (62) | – | 74.1 ± 3.6 (61) | – | 1.00 | 0.34 |
| Body weight (kg) | 71.0 ± 8.0 (62) | 69.9 ± 7.8 (62) | 87.7 ± 12.5 (61) | 87.4 ± 13.3 (61) | 0.04 | 6.2 × 10^−14 |
| BMI (kg/m^2) | 25.1 ± 2.3 (62) | 24.8 ± 2.4 (62) | 30.5 ± 3.7 (61) | 30.6 ± 3.7 (61) | 0.37 | 1.5 × 10^−16 |
| Waist circumference (cm) | 93.3 ± 6.9 (62) | 92.8 ± 6.9 (62) | 109.9 ± 8.9 (61) | 110.8 ± 9.5 (61) | 0.67 | 6.2 × 10^−21 |
| Fasting serum glucose (mM) | 5.08 ± 0.44 (62) | 4.86 ± 0.58 (62) | 6.66 ± 1.45 (61) | 6.54 ± 1.21 (61) | 0.04 | 2.1 × 10^−15 |
| Fasting TG (mM) | 1.23 ± 0.47 (50) | 1.18 ± 0.40 (53) | 2.23 ± 1.01 (51) | 1.94 ± 0.86 (51) | 0.04 | 1.6 × 10^−8 |
| Fasting HDL-C (mM) | 1.43 ± 0.45 (50) | 1.50 ± 0.34 (53) | 1.13 ± 0.29 (56) | 1.16 ± 0.26 (56) | 0.74 | 1.1 × 10^−5 |
| SBP (mmHg) | 126.2 ± 16.6 (62) | 120.9 ± 18.4 (62) | 138.4 ± 15.8 (61) | 133.7 ± 19.3 (61) | 0.02 | 4.4 × 10^−5 |
| DBP (mmHg) | 71.8 ± 9.9 (62) | 73.9 ± 8.1 (62) | 74.7 ± 8.9 (61) | 73.6 ± 9.4 (61) | 0.69 | 0.47 |
| Leucoc (10^9/L) | 5.61 ± 1.29 (62) | 5.97 ± 1.57 (62) | 6.32 ± 1.22 (61) | 6.55 ± 1.34 (61) | 0.02 | 2.0 × 10^−2 |
| Lympho (10^9/L) | 1.49 ± 0.45 (62) | 1.54 ± 0.46 (62) | 1.75 ± 0.43 (61) | 1.76 ± 0.63 (61) | 0.50 | 2.0 × 10^−2 |
| SF-36-Physical Component Summary Score^* (PCS) | 52.8 ± 5.8 (62) | 52.3 ± 6.0 (61) | 49.7 ± 8.0 (61) | 46.7 ± 9.2 (61) | 0.01 | 3.6 × 10^−3 |
| Physical activity (PASE score) | 125.4 ± 51.9 (62) | 124.7 ± 53.5 (59) | 107.1 ± 55.2 (61) | 94.6 ± 50.5 (57) | 0.41 | 2.3 × 10^−2 |
| Energy (kCal/day) | 2179 ± 524 (62) | 2251 ± 576 (62) | 1935 ± 462 (60) | 2026 ± 506 (59) | 0.08 | 2.5 × 10^−2 |
| Carbohydrate (g/day) | 269.6 ± 68.8 (62) | 272.3 ± 73.6 (62) | 231.7 ± 64.0 (60) | 238.4 ± 62.5 (59) | 0.45 | 6.8 × 10^−3 |
| Protein (g/day) | 87.5 ± 22.8 (62) | 89.3 ± 28.2 (62) | 83.1 ± 20.4 (60) | 82.9 ± 22.6 (59) | 0.75 | 0.31 |
| Lipid (g/day) | 78.5 ± 23.9 (62) | 83.8 ± 26.7 (62) | 71.9 ± 23.4 (60) | 80.3 ± 27.0 (59) | 0.02 | 0.35 |
| C-HEI-Cereals (score: 0–10) | 7.99 ± 2.25 (62) | 7.72 ± 2.26 (62) | 6.66 ± 2.05 (60) | 6.99 ± 2.20 (60) | 0.88 | 1.3 × 10^−2 |
| Total dietary fiber (g/day) | 23.6 ± 9.4 (62) | 24.9 ± 10.7 (62) | 19.8 ± 7.5 (60) | 21.4 ± 7.2 (59) | 0.10 | 5.2 × 10^−2 |

Mean ± SD; linear mixed model p-value (after Benjamini-Hochberg (BH) correction for 58 parameters). Bold: corrected p-value < 0.05. ^* T-scores based on a mean of 50 and a SD of 10.

4.2. Serum metabolomes in the NuAge sub-cohort Given the high diversity of metabolites present over a wide concentration range in biological samples, proper MetS evaluation requires a broad metabolome coverage. The use of complementary technologies, combining both Nuclear Magnetic Resonance (NMR) and high-resolution mass spectrometry (HRMS), as well as different chromatographic systems, including gas, reverse-phase and hydrophilic interaction chromatography with detection in both positive and negative electrospray ionization modes, allowed covering both polar and apolar compounds for relevant and comprehensive metabolome and lipidome analysis (see Materials and Methods). On this basis, a full workflow was developed for serum analysis, allowing the sample preparation and the measurement of a wide diversity of metabolites from the more polar ones to lipids (Fig. 1a and b).

Fig. 1. Study design and multiplatform metabolomics data generation. a: Experimental design of the MetS case/control study, involving the follow-up of 123 stable subjects over 3 years. b: Analytical workflow based on seven complementary untargeted metabolomics methods using 3 different analytical platforms (LC- and GC-HRMS, NMR) for the serum analysis of the 123 subjects collected at two time points. Data analysis resulted in 2915 metabolite and lipid related features, detected across at least 20% of the subjects over time. c: The similarity of method blocks evaluated using RV coefficients (a matrix correlation coefficient (see method section)) after Multiple Factorial Analysis.
d: Spearman correlation network between the 2915 metabolite and lipid features from the six metabolomic/lipidomic datasets (significant correlation >0.7). Nodes correspond to features obtained using the different analytical platforms (Pink: C18pos; Dark Blue: C18neg; Green: GCMS; Blue: HILICneg; Yellow: Lipidomics; Black: NMR). Two nodes are connected by an edge if the correlation between the corresponding features is significant (For interpretation of the references to color in this figure legend, the reader is referred to