Abstract

Background

   Metabolic syndrome (MetS), a cluster of factors associated with risks
   of developing cardiovascular diseases, is a public health concern
   because of its growing prevalence. Considering the combination of
   concomitant components, their development and severity, MetS phenotypes
   are largely heterogeneous, inducing disparity in diagnosis.

Methods

   A case/control study was designed within the NuAge longitudinal cohort
   on aging. From a 3-year follow-up of 123 stable individuals, we present
   a deep phenotyping approach based on a multiplatform metabolomics and
   lipidomics untargeted strategy to better characterize metabolic
   perturbations in MetS and define a comprehensive MetS signature stable
   over time in older men.

Findings

   We characterize significant changes associated with MetS, involving
   modulations of 476 metabolites and lipids, and representing 16% of the
   detected serum metabolome/lipidome. These results revealed a systemic
   alteration of metabolism, involving various metabolic pathways (urea
   cycle, amino-acid, sphingo- and glycerophospholipid, and sugar
   metabolisms…) not only intrinsically interrelated, but also reflecting
   environmental factors (nutrition, microbiota, physical activity…).

Interpretation

   These findings led to a comprehensive MetS signature, reduced to 26
   metabolites for future translation into clinical applications to
   improve MetS diagnosis.

   Keywords: Metabolic syndrome, Metabolomics, Deep phenotyping,
   Lipidomics, Metabolic signature
     __________________________________________________________________

Research in context.

Evidence before this study

   Prior association studies linking metabolites and lipids with MetS (i)
   profiled a limited range of molecular species, (ii) did not consider
   the interactions between metabolic pathways or with extrinsic factors,
   and (iii) were very rarely based on longitudinal studies.

Added value of this study

   Our deep phenotyping approach, combined with a 3-year follow-up design,
   provides robust and integrated insights into MetS mechanisms and
   proposes new candidate biomarkers within a molecular signature refined
   statistically, analytically and biologically.

Implications of all the available evidence

   These findings highlight the value of a comprehensive molecular
   signature as a marker of MetS, which should be validated before
   translation into clinical applications for better MetS diagnosis.


1. Introduction

   The metabolic syndrome (MetS), defined as a cluster of risk factors for
   cardiovascular disease (CVD), has been recognized for decades and shows
   a rising prevalence worldwide [1]. The main drivers of this rise are
   population aging and the complex interactions between lifestyle
   factors, such as unhealthy dietary habits and sedentary behavior, that
   lead to overweight and obesity [2], [3], [4]. Because several clinical
   definitions coexist [5] among health organizations (e.g. National
   Cholesterol Education Program (NCEP), International Diabetes Federation
   (IDF), World Health Organization (WHO)), the true prevalence of MetS is
   difficult to establish reliably. Nevertheless, MetS comprises elevated
   blood pressure, dyslipidemia (including hypertriglyceridemia and
   reduced blood levels of high-density lipoprotein cholesterol (HDL-C)),
   fasting hyperglycemia, and central adiposity. MetS is now accepted as a
   global public health concern, with a prevalence reaching one third of
   US adults and over 45% by the age of 60 [1,6]. There is also a
   consensus that MetS involves multiple metabolic risk factors for CVD
   and type 2 diabetes (T2D) [7,8]. Moreover, given the combination of
   concomitant components and their development and severity profiles,
   patients identified with MetS are largely heterogeneous, which leads to
   disparities in diagnosis and therapeutic approach [9]. A better
   characterization of the pathophysiological alterations associated with
   MetS could therefore help improve diagnosis and delineate the syndrome
   more precisely.

   In this context, metabolomics and lipidomics have emerged over the last
   decade as powerful tools for phenotype analysis, providing key insights
   into altered metabolic pathways and a better understanding of
   pathophysiological processes [10,11]. Indeed, metabolic profiles
   provide an integrated view of metabolism because they sensitively
   detect molecular changes over time that result from the interaction
   between intrinsic and extrinsic factors. Metabolites, used as single
   targets or in combination within a comprehensive signature, are thus
   promising biomarkers of metabolic dysfunction. Metabolomics has
   therefore been widely applied to metabolic disease phenotyping,
   candidate biomarker discovery, and exploration of the underlying
   pathophysiological mechanisms [12,13]. However, although studies on
   T2D have been among the main drivers of global metabolomics approaches
   to biomarker research in chronic metabolic disease, few studies have
   focused on MetS, and these were often targeted approaches with a
   restricted number of detected metabolites [14]. Consequently, an
   integrated view of metabolic derangements is lacking, and studies are
   difficult to compare [15].

   In the present study, a 3-year follow-up design of stable subjects
   within an observational longitudinal cohort, as well as a deep
   phenotyping approach based on a multiplatform strategy involving
   metabolomics and lipidomics untargeted methods, were set up, with the
   objective to better characterize metabolic perturbations in MetS and
   define a comprehensive MetS signature stable over time in older men.

2. Materials

2.1. The NuAge cohort and subject selection

   The present study was designed within the 5-year observational Quebec
   Longitudinal Study on Nutrition and Successful Aging (NuAge). The
   cohort consisted of 1793 men and women in good general health, selected
   from three age groups (68–72, 73–77, 78–82) at recruitment. French- or
   English-speaking community-dwelling participants agreed to provide
   fasting blood samples, undergo several direct measures annually, and
   answer questionnaires related to food and health biannually. The NuAge
   database comprises extensive qualitative and quantitative data on
   anthropometry/body composition, nutrition/dietary intakes, numerous
   markers of physical, clinical and cognitive status, physical activity,
   functional autonomy and social functioning. Measures, questionnaires,
   and blood sampling, processing and storage have been described in
   Gaudreau et al. [16].

2.1.1. Ethical approval

   All procedures performed in the study involving human participants were
   in accordance with the ethical standards of the institutional and/or
   national research committee and with the 1964 Helsinki declaration and
   its later amendments or comparable ethical standards. Informed consent
   was obtained from all individual participants included in the NuAge
   study. The NuAge Study has been approved by the Research Ethics Board
   (REB) of both the Geriatric University Institutes of Montreal and
   Sherbrooke Research Centers. The management framework of the current
   NuAge Database and Biobank has been approved by the REB of the
   CIUSSS-de-l'Estrie-CHUS (protocol #2019-2832).

2.1.2. Subject selection

   A case/control study on MetS was designed within the NuAge cohort, with
   serum samples collected at two time points (recruitment 2003–2005 (T1)
   and 3 years later (T4)), with the objective of identifying a metabolic
   signature of MetS that is stable over time using a multiplatform
   metabolomics/lipidomics approach. To this end, an optimized subject
   selection strategy was developed. Briefly, selection was based on the
   presence and number of MetS criteria and their stability over the three
   years. It was performed among the 853 men, as it has been recognized
   that in the province of Quebec, men have more risk factors for MetS
   than women [17], [18], [19], [20]. MetS was defined using the following
   criteria, with thresholds defined for men [5,21]: elevated waist
   circumference (WC ≥ 102 cm); high blood pressure (systolic > 130 mmHg
   and/or diastolic > 85 mmHg) or antihypertensive drug treatment with a
   history of hypertension; elevated fasting blood glucose (≥ 5.7 mM) or
   drug treatment for hyperglycemia (oral hypoglycemic agents, insulin);
   high circulating triglyceride levels (≥ 1.7 mM) or drug treatment
   (fibrates, nicotinic acid); and reduced HDL cholesterol (< 1.0 mM) or
   drug treatment (fibrates, nicotinic acid). Given the study objectives,
   only subjects with a stable status over time were included: using these
   five criteria, subjects with an unstable (changing) or uncertain
   (missing values) MetS status over time were excluded. Cases were
   defined as having three or more MetS criteria, and controls as having
   fewer than three MetS criteria, at each time point. This yielded 61
   incident cases and 88 controls. Among controls, it was important to
   exclude extreme subjects that could generate false negative results.
   Therefore, in agreement with clinicians, controls with seven or more
   drug treatments were excluded [22]. Outlier values were also examined.
   Because no time effect was observed for the quantitative variables
   defining MetS, individuals with extreme mean values (T1 to T4) of the
   MetS biological variables, i.e. outside the range defined by the
   mean ± 1.5 interquartile range (IQR), were excluded. This strategy
   ultimately selected 61 cases and 62 controls. Because metabolomic
   profiles are known to be modified by age, the absence of a significant
   age difference between cases and controls was verified to avoid a
   potential bias. To do so, three age classes were defined according to
   the age distribution (67–72 years old (n = 25 vs 22), 73–77 years old
   (n = 22 vs 24), 78–84 years old (n = 15 vs 15)), and the balance of
   class sizes between the two groups was checked using Fisher's exact
   test.
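   The case/control rule above can be sketched in Python. This is a
   minimal illustration, not the study's selection code: the visit record
   layout and field names (waist_cm, sbp, antihypertensive, lipid_drug,
   etc.) are hypothetical, missing values are assumed to be encoded as
   None, the antihypertensive flag is assumed to already encode "treated,
   with a history of hypertension", and the later exclusions (drug count,
   IQR-based outliers) are not shown.

```python
from typing import Optional


def count_mets_criteria(v: dict) -> int:
    """Number of MetS criteria met at one visit (thresholds for men, from the text)."""
    criteria = [
        v["waist_cm"] >= 102,                                      # central adiposity
        v["sbp"] > 130 or v["dbp"] > 85 or v["antihypertensive"],  # blood pressure
        v["glucose_mM"] >= 5.7 or v["glucose_drug"],               # fasting glucose
        v["tg_mM"] >= 1.7 or v["lipid_drug"],                      # triglycerides
        v["hdl_mM"] < 1.0 or v["lipid_drug"],                      # low HDL-C
    ]
    return sum(bool(c) for c in criteria)


def classify_subject(visits: list) -> Optional[str]:
    """Return 'case', 'control', or None (unstable or uncertain status, i.e. excluded)."""
    # Uncertain status: a missing value at any visit
    if any(value is None for visit in visits for value in visit.values()):
        return None
    counts = [count_mets_criteria(v) for v in visits]
    if all(c >= 3 for c in counts):
        return "case"
    if all(c < 3 for c in counts):
        return "control"
    return None  # status changed between visits
```

   Requiring the same status at every visit is what makes the resulting
   signature "stable over time" by construction.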

2.1.3. Epidemiological data

   Fifty-eight quantitative variables evaluated at T1 and T4 were
   considered to describe the selected population precisely: 23
   biochemical parameters, 8 clinical variables, 25 nutritional variables,
   and 2 scores related to physical activity (Physical Activity Scale for
   the Elderly (PASE) questionnaire [23]) and health-related quality of
   life (physical component summary (PCS) score derived from the Medical
   Outcomes Study 36-item Short Form Health Survey (SF-36) questionnaire
   [24,25]). In particular, nutritional data consisted of intake data
   obtained from the mean of two to three non-consecutive 24 h dietary
   recalls (24-HR) [26], as well as a validated Canadian global dietary
   quality index, the Canadian Healthy Eating Index (C-HEI) [27]. This
   index is based on the intake of four food groups (grain products,
   fruits and vegetables, milk products, meat and alternatives) and five
   other items (% of energy from total fat and from saturated fat,
   cholesterol, salt, and diet variety). The total score ranges from 0 to
   100, with higher scores indicating a diet closer to the Canadian
   guidelines for healthy eating.

2.2. Randomization of biological samples

   Following sample selection, and in view of the multiplatform analyses,
   sample preparation and the analytical sequence had to be carefully
   designed. In metabolomics, analytical sequences are usually randomized
   using a Williams Latin square strategy defined according to the main
   factors of the study, as well as potential confounding factors linked
   to sampling conditions. In the present work, samples were randomized
   using this strategy, defined primarily according to the main factor of
   the study (MetS), considering the sum of the annual number of MetS
   criteria between the two time points (T1 to T4), divided into 4
   groups: 0–3; 3–7; 11–14; 15–20 (0 meaning no positive criterion over
   the 3 years and 20 meaning 5 positive criteria over this period). This
   randomization was used both for sample preparation and for analyses.
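   A Williams Latin square is a Latin square in which, for an even number
   of groups, each group immediately follows every other group exactly
   once across rows, which balances first-order carry-over. The sketch
   below builds such a square for the 4 MetS groups and cycles through its
   rows to interleave samples. The function names, the labeling of groups
   as 0–3, and the row-cycling scheme are assumptions for illustration;
   the text does not specify how rows were mapped onto preparation
   batches or injection blocks.

```python
import random


def williams_first_row(n: int) -> list:
    """First row 0, 1, n-1, 2, n-2, ... of a Williams design."""
    row, lo, hi, take_lo = [0], 1, n - 1, True
    while len(row) < n:
        if take_lo:
            row.append(lo)
            lo += 1
        else:
            row.append(hi)
            hi -= 1
        take_lo = not take_lo
    return row


def williams_square(n: int) -> list:
    """Rows of a Williams design; odd n needs the mirrored square as well."""
    first = williams_first_row(n)
    rows = [[(x + i) % n for x in first] for i in range(n)]
    if n % 2 == 1:
        rows += [list(reversed(r)) for r in rows]
    return rows


def build_sequence(samples_by_group: dict, seed: int = 0) -> list:
    """Interleave sample IDs from groups 0..n-1 following successive rows.

    samples_by_group maps a group index to a list of (hypothetical) sample IDs.
    Exhausted groups are skipped, so unequal group sizes are tolerated.
    """
    rng = random.Random(seed)
    pools = {g: rng.sample(ids, len(ids)) for g, ids in samples_by_group.items()}
    square = williams_square(len(pools))
    order, r = [], 0
    while any(pools.values()):
        for g in square[r % len(square)]:
            if pools[g]:
                order.append(pools[g].pop())
        r += 1
    return order
```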

3. Methods

   Seven complementary untargeted metabolomics methods based on 3
   different analytical platforms, Ultra High-Performance Liquid
   Chromatography coupled to High-Resolution Mass Spectrometry (LC-MS),
   Gas Chromatography coupled to High-Resolution Mass Spectrometry
   (GC-MS), and Nuclear Magnetic Resonance spectroscopy (NMR), were used
   to characterize the MetS phenotypic spectrum.

   Quality control samples were designed and prepared to control for
   potential bias due to sample preparation or analytical drift. Because
   hundreds to thousands of metabolites are detected in untargeted
   metabolomics, using an internal standard for each metabolite is almost
   impossible, and pooled quality control (QC) samples are recognized as
   the most appropriate approach [28]. In the present study, these QC
   samples consisted of a pool of human serum samples extracted
   independently and subsequently diluted 1/2, 1/4 and 1/8. All analytical
   sequences were standardized: at least three blank (solvent) samples and
   five pooled QC samples were injected for column conditioning. The
   stability of the analytical system was then monitored using these QC
   samples, injected once at the beginning of each analytical sequence and
   thereafter every 10 samples.
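   The standardized sequence layout described above can be expressed as a
   small generator. This sketch assumes one QC injection before the first
   sample and one after every 10th sample; whether a closing QC was also
   injected, and where the diluted QCs were placed, is not stated, so both
   are left out. The labels "blank", "QC_cond" and "QC" are hypothetical.

```python
def injection_sequence(samples, n_blanks=3, n_conditioning_qc=5, qc_every=10):
    """Build an injection order: blanks, conditioning QCs, then samples with periodic QCs."""
    seq = ["blank"] * n_blanks + ["QC_cond"] * n_conditioning_qc
    seq.append("QC")  # QC at the start of the analytical sequence
    for i, sample in enumerate(samples, start=1):
        seq.append(sample)
        if i % qc_every == 0:
            seq.append("QC")  # QC after every `qc_every` samples
    return seq


# Example: 25 randomized samples (IDs are placeholders)
order = injection_sequence([f"S{i}" for i in range(1, 26)])
```

   Setting qc_every=5 would reproduce the denser QC spacing used for the
   HILIC method.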

3.1. Data production

3.1.1. Ultra high-performance liquid chromatography coupled to high-resolution mass spectrometry (LC-MS)

   Three methods were performed to maximize serum metabolome coverage:
   reversed-phase LC-MS (C18) analysis complemented by hydrophilic
   interaction chromatography (HILIC) to allow the detection of polar
   metabolites, and an untargeted lipidomics approach using reversed-phase
   LC-MS (C8) to profile a large set of lipid species.

3.2. C18-based system (C18Pos and C18Neg)

   Serum samples (100 µL) were slowly thawed on ice at room temperature.
   Protein precipitation was performed by adding 200 µL of ice-cold
   methanol (MeOH). This mixture was vortexed and placed at −20 °C for
   30 min. After a 10 min centrifugation (4 °C, 15,493 g, Sigma 3-16PK,
   Fisher Bioblock Scientific), the supernatant was divided into three
   45 µL aliquots, dried completely (EZ2.3 Genevac, Biopharma Technologies
   France) and stored at −80 °C until further analysis. Just before
   analysis, 150 µL of injection solvent (water/acetonitrile 50/50 + 0.1%
   formic acid) was added to the dry fraction. A pooled QC sample was
   prepared by mixing 5 µL from each extracted sample. This sample
   preparation was automated on a Freedom EVO200 TECAN robot (Tecan
   Trading AG, Switzerland), enabling liquid handling with high
   repeatability (CV ≤ 0.75%).

   Metabolic profiles were determined using a U3000 liquid chromatography
   system (Thermo Fisher Scientific, San Jose, CA, USA) coupled to a
   high-resolution Bruker Impact HDII UHR-QTOF (Bruker Daltonics,
   Wissembourg, France) equipped with an electrospray ionization (ESI)
   source. Chromatographic separation was performed on a Waters HSS T3
   column (150 × 2.1 mm, 1.8 µm) at 0.4 mL/min and 30 °C, with an
   injection volume of 5 µL. Mobile phases A and B were water and
   acetonitrile, respectively, each with 0.1% formic acid. The gradient
   elution was 0% B (2 min), 0–100% B (13 min), 100% B (7 min), 100–0% B
   (0.1 min) and 0% B (3.9 min for re-equilibration). The mass resolving
   power of the mass spectrometer was 50,000 full width at half maximum
   (FWHM) at m/z 1222. Samples were analyzed in both positive and negative
   ionization modes (C18Pos, C18Neg). Capillary and end plate offset
   voltages were set at 2500 V and 500 V for the ESI source. The drying
   gas temperature was 200 °C and the nebulization gas flow was 10 L/min.
   Mass spectra were acquired in full-scan mode over the mass-to-charge
   ratio (m/z) range 50–1000.
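   Resolving power in FWHM terms is R = m/Δm, where Δm is the peak width
   at half height. The short calculation below converts the stated C18
   setting into a peak width; it uses only the numbers given in the text
   and makes no assumption about how resolving power varies with m/z on
   this instrument. The function name is illustrative.

```python
def peak_fwhm(mz: float, resolving_power: float) -> float:
    """Peak width at half maximum (in m/z units) implied by R = m / delta_m."""
    return mz / resolving_power


# C18 QTOF setting quoted in the text: R = 50,000 (FWHM) at m/z 1222
width = peak_fwhm(1222, 50_000)
print(f"FWHM at m/z 1222: {width:.5f} ({width / 1222 * 1e6:.0f} ppm of the mass)")
```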

3.3. HILIC-based system (HILICneg)

   Metabolite extraction was performed from 50 µL of serum following
   methanol-assisted protein precipitation as previously described [29].
   Briefly, 200 µL of methanol containing internal standards at
   3.75 µg/mL (dimetridazole, 2-amino-3-(3-hydroxy-5-methyl-isoxazol-4-yl)
   propanoic acid (AMPA), 2-methyl-4-chlorophenoxyacetic acid (MCPA) and
   dinoseb (Sigma-Aldrich, Saint-Quentin Fallavier, France)) was added to
   50 µL of serum. The resulting samples were then left on ice for 90 min
   until protein precipitation was complete. After centrifugation
   (20,000 g, 15 min, 4 °C), supernatants were collected, dried under a
   nitrogen stream using a TurboVap instrument (Thermo Fisher Scientific,
   Courtaboeuf, France) and stored at −80 °C until analysis. Dried
   extracts were resuspended in 150 µL of 10 mM ammonium carbonate
   pH 10.5/acetonitrile (40:60). After reconstitution, the tubes were
   vortexed, incubated in an ultrasonic bath for 5 min on ice, and
   centrifuged (20,000 g, 15 min, 4 °C). A volume of 95 µL of the
   supernatant was transferred into 0.2 mL vials. An external standard
   solution (5 µL; mixture of 9 authentic chemical standards covering the
   mass range of interest: 13C-glucose, 15N-aspartate, ethylmalonic acid,
   amiloride, prednisone, metformin, atropine sulfate, colchicine,
   imipramine) was added to all samples to check the consistency of
   analytical results in terms of signal and retention time stability
   throughout the experiments. The QC samples were prepared by mixing
   20 µL of each extracted sample. QC samples were injected every 5
   samples.

   Metabolic profiling experiments were performed using a U3000 liquid
   chromatography system coupled to an Exactive mass spectrometer from
   Thermo Fisher Scientific (Courtaboeuf, France) fitted with an
   electrospray source operating in the negative ion mode. Chromatographic
   separation was performed on a SeQuant ZIC-pHILIC column (5 µm,
   2.1 × 150 mm, Merck, Darmstadt, Germany) maintained at 15 °C for
   improved peak shape and chromatographic separation of nucleotide
   metabolites [30,31], and equipped with an on-line prefilter (Thermo
   Fisher Scientific, Courtaboeuf, France). Mobile phases A and B were an
   aqueous buffer of 10 mM ammonium carbonate adjusted to pH 10.5 with
   ammonium hydroxide, and 100% acetonitrile, respectively. The flow rate
   was 200 µL/min. Chromatographic elution was achieved under the
   following gradient conditions: an isocratic step of 2 min at 80% B,
   followed by a linear gradient from 80 to 40% B from 2 to 12 min. The
   chromatographic system was then rinsed for 5 min at 0% B, and the run
   ended with a 15 min equilibration step (80% B). The Exactive mass
   spectrometer was operated with a capillary voltage of −3 kV and a
   capillary temperature of 280 °C. The sheath gas pressure and the
   auxiliary gas pressure (nitrogen) were set at 60 and 10 arbitrary
   units, respectively. The mass resolving power of the analyzer was
   50,000 (FWHM) at m/z 200 for singly charged ions. Detection was
   performed from m/z 75 to 1000.

3.4. Lipidomic untargeted approach (LIPIDO)

   Serum samples were extracted using a method adapted from that
   previously described [32]. Briefly, 100 µL of serum was added to
   490 µL of CHCl3/MeOH 1:1 (v/v) and 10 µL of internal standard mixture.
   Samples were vortexed for 60 s, sonicated for 30 s using an ultrasonic
   probe (Bioblock Scientific Vibra Cell VC 75185, Thermo Fisher
   Scientific Inc., Waltham, MA, USA) and incubated for 2 h at 4 °C with
   mixing. Seventy-five µL of H2O was then added and samples were vortexed
   for 60 s before centrifugation at 15,000 g for 15 min at 4 °C. The
   upper (aqueous) phase, containing gangliosides,
   lysoglycerophospholipids, and short-chain glycerophospholipids, was
   transferred into a glass tube and dried under a stream of nitrogen. The
   protein disk interphase was discarded, and the lower lipid-rich
   (organic) phase was pooled with the dried upper phase and the mixture
   dried under nitrogen. Samples were resuspended in 100 µL of CHCl3/MeOH
   1:1 (v/v). Ten µL were diluted 100-fold in MeOH/isopropanol/H2O
   65:35:5 (v/v/v) before injection.

   Lipidomic profiles were determined using an Ultimate 3000 liquid
   chromatography system (Thermo Fisher Scientific, San Jose, CA, USA)
   coupled to a high-resolution Thermo Orbitrap Fusion (Thermo Fisher
   Scientific, San Jose, CA, USA) equipped with an electrospray ionization
   (ESI) source. Chromatographic separation was performed on a Phenomenex
   Kinetex C8 column (150 × 2.1 mm, 2.6 µm) at 0.4 mL/min and 60 °C, with
   an injection volume of 10 µL. In negative ionization mode (LIPIDOneg),
   mobile phases A and B were H2O/MeOH 60:40 (v/v) with 0.1% formic acid
   and isopropanol/MeOH 90:10 (v/v) with 0.1% formic acid, respectively.
   In positive ionization mode (LIPIDOpos), ammonium formate (10 mM) was
   added to both mobile phases in order to detect glycerolipids and
   cholesteryl esters as [M+NH4]+ ions. The gradient elution was as
   follows: solvent B was held at 32% for 2.5 min, then increased to 45% B
   from 2.5 to 3.5 min, to 52% B from 3.5 to 5 min, to 58% B from 5 to
   7 min, to 66% B from 7 to 10 min, to 70% B from 10 to 12 min, to 75% B
   from 12 to 15 min, to 80% B from 15 to 19 min, to 85% B from 19 to
   22 min, and to 95% B from 22 to 23 min; 95% B was maintained from 23 to
   25 min; solvent B was then decreased to 32% from 25 to 26 min and held
   for 4 min for column re-equilibration. The mass resolving power of the
   mass spectrometer was 240,000 (FWHM) for MS experiments. Samples were
   analyzed in both positive and negative ionization modes. The ESI source
   parameters were as follows: the spray voltage was set to 3.7 kV and
   −3.2 kV in positive and negative ionization modes, respectively; the
   heated capillary was kept at 360 °C; and the sheath and auxiliary gas
   flows were set to 50 and 15 (arbitrary units), respectively. Mass
   spectra were recorded in full-scan MS mode from m/z 50 to m/z 2000.
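   The LIPIDO gradient is a piecewise-linear program, so the programmed
   solvent composition at any time can be recovered by linear
   interpolation between breakpoints. The table below is transcribed from
   the text, assuming the run starts at 32% B at t = 0 and ends after the
   4 min re-equilibration (30 min). It ignores the system dwell volume, so
   the actual on-column composition lags these values. The names
   LIPIDO_GRADIENT and percent_b are illustrative.

```python
import numpy as np

# (time in min, %B) breakpoints transcribed from the LIPIDO method text
LIPIDO_GRADIENT = [
    (0.0, 32), (2.5, 32), (3.5, 45), (5.0, 52), (7.0, 58), (10.0, 66),
    (12.0, 70), (15.0, 75), (19.0, 80), (22.0, 85), (23.0, 95),
    (25.0, 95), (26.0, 32), (30.0, 32),
]


def percent_b(t_min, program=LIPIDO_GRADIENT):
    """Programmed %B at time t_min, by linear interpolation between breakpoints."""
    times, values = zip(*program)
    return float(np.interp(t_min, times, values))


def run_length(program=LIPIDO_GRADIENT):
    """Total programmed run time in minutes."""
    return program[-1][0]


if __name__ == "__main__":
    for t in (0, 3, 8.5, 24, 25.5, 29):
        print(f"t = {t:>4} min -> {percent_b(t):.1f}% B")
```

   The same representation applies unchanged to the C18 and HILIC
   gradients by swapping in their breakpoints.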

3.4.1. Gas chromatography coupled to high-resolution mass spectrometry (GC-MS)

   Serum samples were slowly thawed at 4 °C overnight. Four hundred µL of
   ice-cold methanol (−20 °C) were added to 100 µL of serum and the
   mixture was vortexed. After protein precipitation, samples were kept at
   −20 °C for 30 min and then centrifuged (Sigma 3-16PK, Fisher Bioblock
   Scientific) at 20,627 g for 10 min at 4 °C. Two hundred and fifty µL of
   supernatant were transferred into a 2 mL amber glass vial. After the
   addition of 10 µL of [13C1]-L-valine (200 µg/mL), samples were
   evaporated in an EZ2.3 Genevac (Biopharma Technologies France). In
   parallel, a derivatization control sample (serum replaced by Milli-Q
   water) was prepared in order to remove the background noise produced
   during sample pre-processing, derivatization, and GC-MS analysis. The
   dry residues were dissolved by adding 80 µL of methoxylamine solution
   (15 mg/mL in pyridine) to each vial, vortexed vigorously for 1 min and
   incubated for 24 h at 37 °C (to inhibit the cyclization of reducing
   sugars and the decarboxylation of α-keto acids). Then, 80 µL of
   N,O-bis(trimethylsilyl)trifluoroacetamide (BSTFA) with 1%
   trimethylchlorosilane (TMCS) as catalyst were added to the mixture for
   derivatization (60 min, 70 °C). Before injection, 50 µL of derivatized
   mixture were transferred into a glass vial containing 100 µL of
   heptane. QC pool samples were prepared using 10 µL of each extracted
   and derivatized sample.

   Metabolic profiles were obtained using an Agilent 7890B gas
   chromatograph coupled to an Agilent Accurate Mass QTOF 7200 equipped
   with a 7693A Injector (SSL) Auto-Sampler (Agilent Technologies, Inc).
   Separation was achieved on an HP-5MS UI fused silica column (30 m ×
   0.25 mm i.d.) chemically bonded with a 5% phenyl-95% methylpolysiloxane
   cross-linked stationary phase (0.25 µm film thickness) (Agilent J & W
   Scientific, Folsom, CA, USA). Helium was used as the carrier gas at a
   flow rate of 1 mL/min. Two µL of derivatized sample were injected with
   a 1:20 split ratio. The temperatures of the injector, transfer line,
   and electron impact (EI) ion source were set to 250 °C, 280 °C and
   230 °C, respectively. The initial oven temperature was 60 °C for 2 min,
   ramped to 140 °C at 10 °C/min, to 240 °C at 4 °C/min, to 300 °C at
   10 °C/min, and finally held at 300 °C for 8 min. Agilent "retention
   time locking" (RTL) was applied to control the reproducibility of
   retention times, with [13C1]-L-valine used to lock the GC method [33].
   The electron energy was 70 eV and mass data were collected in full-scan
   mode (m/z 55–700) with a resolving power of 7000 (FWHM) at m/z 464
   (perfluorotributylamine, PFTBA). The acquisition rate was 5 spectra/s
   with an acquisition time of 200 ms/spectrum. Four heptane blanks were
   injected at the beginning of each sequence, followed by four pool
   samples, and then one pool sample and one derivatization control sample
   after each set of 10 samples. The system was initially tuned and
   calibrated using PFTBA under 2 GHz EDR acquisition conditions with N2
   (1.5 mL/min), with limits of 3.0 ppm for the average mass error and
   8.0 ppm for the maximum error. A calibration was also performed between
   samples.
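   Two pieces of arithmetic in this paragraph can be checked with a short
   sketch: the total oven time implied by the temperature program, and a
   mass-accuracy check against the stated calibration limits. The
   function names are hypothetical, and treating the 3.0 ppm limit as a
   mean of absolute errors is an assumption; the instrument software may
   compute it differently.

```python
def oven_program_minutes(initial_temp_c, initial_hold_min, ramps):
    """Total oven time for a GC temperature program.

    ramps: list of (rate_c_per_min, target_temp_c, hold_min) tuples.
    """
    total = initial_hold_min
    temp = initial_temp_c
    for rate, target, hold in ramps:
        total += (target - temp) / rate + hold  # ramp duration plus hold
        temp = target
    return total


def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def calibration_passes(errors_ppm, max_mean_ppm=3.0, max_abs_ppm=8.0):
    """Assumed check: mean |error| <= 3.0 ppm and max |error| <= 8.0 ppm."""
    abs_err = [abs(e) for e in errors_ppm]
    return sum(abs_err) / len(abs_err) <= max_mean_ppm and max(abs_err) <= max_abs_ppm


# Oven program from the text: 60 °C (2 min), 10 °C/min to 140, 4 °C/min to 240,
# 10 °C/min to 300, then 8 min at 300 °C
GC_RAMPS = [(10, 140, 0), (4, 240, 0), (10, 300, 8)]
print(oven_program_minutes(60, 2, GC_RAMPS))  # 49.0
```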

3.5. Nuclear magnetic resonance spectroscopy (NMR)

   Serum aliquots (50 µL) were slowly thawed at room temperature on ice.
   One hundred µL of phosphate buffer (0.2 M, pH 7.0) prepared in
   deuterium oxide (D2O) was added to each aliquot, and each sample was
   vortexed and centrifuged for 15 min at 4500 g; 150 µL of the
   supernatant was then transferred into 3 mm NMR tubes.

   All 1H NMR spectra of serum samples were obtained on a Bruker Avance
   III HD spectrometer (Bruker, Karlsruhe, Germany) operating at
   600.13 MHz for 1H resonance frequency and equipped with an inverse
   detection 5 mm CQPCI 1H-31P-13C-15N cryoprobe connected to a
   cryoplatform and a cooled SampleJet sample changer. Spectra were
   acquired at 300 K using the Carr-Purcell-Meiboom-Gill (CPMG) spin-echo
   pulse sequence with a total spin-echo delay of 240 ms to attenuate
   broad signals from proteins and lipoproteins, and a 2 s relaxation
   delay. Water signal suppression was achieved by presaturation during
   the relaxation delay. The spectral width was set to 20 ppm for each
   spectrum, and 256 scans were collected with 32 K points. Free induction
   decays were multiplied by an exponential window function before Fourier
   transformation. The spectra were manually phased and calibrated to the
   lactate signal (δ 1.33 ppm), and the baseline was corrected using
   TopSpin 3.2 software (Bruker, Karlsruhe, Germany).
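   The exponential apodization, Fourier transform and chemical-shift
   referencing steps can be sketched with NumPy on a synthetic FID. This
   is an illustration, not the TopSpin processing: the line-broadening
   value (0.3 Hz) is an assumption since the text does not give it, the
   FID size is arbitrary, phasing and baseline correction are omitted, and
   the reference signal is treated as a single peak even though the
   lactate methyl signal is a doublet.

```python
import numpy as np

SPECTROMETER_MHZ = 600.13
SW_HZ = 20 * SPECTROMETER_MHZ  # 20 ppm spectral width expressed in Hz


def exponential_apodize_and_ft(fid, dwell_s, lb_hz=0.3):
    """Multiply a complex FID by exp(-pi * LB * t), then Fourier transform.

    lb_hz is an assumed line-broadening value.
    Returns (frequency offset from the carrier in Hz, complex spectrum).
    """
    t = np.arange(fid.size) * dwell_s
    window = np.exp(-np.pi * lb_hz * t)
    spectrum = np.fft.fftshift(np.fft.fft(fid * window))
    freqs_hz = np.fft.fftshift(np.fft.fftfreq(fid.size, d=dwell_s))
    return freqs_hz, spectrum


def reference_ppm(ppm_axis, observed_ref_ppm, target_ppm=1.33):
    """Shift the ppm axis so the reference signal (e.g. lactate) sits at target_ppm."""
    return ppm_axis - observed_ref_ppm + target_ppm


# Synthetic single-resonance FID at +1000 Hz from the carrier (illustration only)
n_points = 16384
dwell = 1.0 / SW_HZ
t = np.arange(n_points) * dwell
fid = np.exp(2j * np.pi * 1000.0 * t) * np.exp(-t / 0.5)

freqs, spec = exponential_apodize_and_ft(fid, dwell)
peak_idx = int(np.argmax(np.abs(spec)))
peak_hz = freqs[peak_idx]
ppm = freqs / SPECTROMETER_MHZ  # Hz offset divided by MHz gives ppm offset
ppm_ref = reference_ppm(ppm, ppm[peak_idx])
```

   The window trades resolution for signal-to-noise: a larger LB damps
   the noisy tail of the FID but broadens every line by LB Hz.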

3.6. Data treatment

   Following metabolomic/lipidomic analyses, some samples were identified
   as missing because of problems in sample preparation or missing data (1
   for C18Pos, 6 for HILIC, 4 for lipidomics and 1 for GC-MS). All raw
   data from metabolic profiles were processed to yield a data matrix
   containing variables and peak intensities. All data treatments were
   performed separately for each analytical method as individual datasets,
   using the Galaxy-based web platform Workflow4Metabolomics (W4M) [34] to
   ensure the standardization and reproducibility of the data treatment
   workflows.

3.6.1. Data extraction and pre-processing for MS

   First, raw data were extracted using XCMS [35], which includes noise
   filtering, automatic peak detection, and chromatographic alignment.
   This was followed by quality checks and signal drift correction
   according to the strategy described by van der Kloet et al. [36], based
   on pooled QC samples, to yield a data matrix containing retention
   times, masses, and peak intensities corrected for batch effects. In
   particular, all XCMS extractions used a “minfrac” parameter of 0.2,
   keeping variables present in at least 20% of the samples, since a large
   variability of profiles among the selected individuals was expected.
   Because of the high degree of correlation between the two extracted
   lipidomic datasets, they were merged for further data processing.
   After signal drift and batch effect correction within the six datasets,
   metabolite MS signals were filtered using the following criteria: ratio
   of chromatographic peak areas of samples over blanks (above 3),
   correlation between QC pool dilution factors and chromatographic peak
   areas (over 0.7), repeatability of QC pool samples (CVs under 30%), and
   ratio of QC pool sample CVs over biological sample CVs (below 1).
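   The four feature-quality criteria above can be applied as a vectorized
   mask over a peak table. This sketch is not the W4M tool itself: it
   assumes mean areas for the sample/blank ratio, a Pearson correlation
   against the relative concentration of each QC dilution, and sample
   standard deviations for the CVs; function and argument names are
   hypothetical.

```python
import numpy as np


def cv(x):
    """Coefficient of variation per feature (column), using the sample SD."""
    return np.std(x, axis=0, ddof=1) / np.mean(x, axis=0)


def quality_filter(samples, blanks, qc, dil_levels, dil_areas,
                   min_blank_ratio=3.0, min_dil_corr=0.7,
                   max_qc_cv=0.30, max_cv_ratio=1.0):
    """Boolean mask of features passing all four criteria.

    samples, blanks, qc: arrays of shape (n_injections, n_features)
    dil_levels: relative concentration of each QC dilution, shape (n_dil,)
    dil_areas: peak areas of the dilution series, shape (n_dil, n_features)
    """
    with np.errstate(divide="ignore", invalid="ignore"):
        blank_ratio = samples.mean(axis=0) / blanks.mean(axis=0)
        # Pearson correlation between dilution level and area, per feature
        x = dil_levels - dil_levels.mean()
        y = dil_areas - dil_areas.mean(axis=0)
        dil_corr = (x @ y) / (np.sqrt((x ** 2).sum()) * np.sqrt((y ** 2).sum(axis=0)))
        qc_cv = cv(qc)
        cv_ratio = qc_cv / cv(samples)
    # NaN results (e.g. flat dilution response) compare as False and are dropped
    return ((blank_ratio > min_blank_ratio) & (dil_corr > min_dil_corr)
            & (qc_cv < max_qc_cv) & (cv_ratio < max_cv_ratio))


# Toy peak table with two features: feature 1 is blank-dominated and flat
samples = np.array([[100., 50.], [150., 52.], [80., 48.], [120., 51.]])
blanks = np.array([[5., 40.], [6., 42.]])
qc = np.array([[110., 50.], [112., 51.], [108., 49.]])
dil_levels = np.array([1.0, 0.5, 0.25, 0.125])
dil_areas = np.array([[110., 50.], [55., 50.], [27., 50.], [14., 50.]])
keep = quality_filter(samples, blanks, qc, dil_levels, dil_areas)
```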

3.6.2. NMR data pre-processing

   The NMR spectra were imported into the Amix software (version 3.9.15,
   Bruker, Rheinstetten, Germany) for data integration. Variable-size
   bucketing was performed based on the graphical pattern of the spectra
   (74 buckets), and each bucket was then integrated.
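   Variable-size bucketing reduces a spectrum to one integral per
   user-defined chemical-shift window. The sketch below assumes an evenly
   spaced ppm axis and uses a simple rectangle-rule sum; Amix's own
   integration modes may differ, and the bucket edges in the usage example
   are arbitrary placeholders rather than the study's 74 buckets.

```python
import numpy as np


def integrate_buckets(ppm, intensity, buckets):
    """Rectangle-rule integral of the spectrum within each (low, high) ppm bucket.

    ppm must be evenly spaced (ascending or descending); buckets may differ in width.
    """
    step = abs(ppm[1] - ppm[0])
    areas = []
    for low, high in buckets:
        mask = (ppm >= low) & (ppm < high)
        areas.append(intensity[mask].sum() * step)
    return np.array(areas)


# Flat toy spectrum: each bucket's area should equal its width in ppm
ppm = np.linspace(0.0, 10.0, 1001)
intensity = np.ones_like(ppm)
areas = integrate_buckets(ppm, intensity, [(1.005, 2.005), (3.205, 3.305)])
```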

3.6.3. Filtration

   During the analysis, each metabolite produces several analytical
   features: in MS, signals from the different adduct ions generated in
   the ESI process, from isotopes present in the molecule, and from
   in-source fragmentation; in NMR, different peaks from the same
   molecule. The data extraction step therefore results in thousands of
   highly correlated features in the final datasets, which constrains the
   use of various data mining and statistical methods. For example,
   analytical redundancy strongly affects multiple testing correction:
   non-independent variables (coming from the same metabolite) lead to an
   over-correction that can hide potentially relevant information.
   Therefore, the analytical redundancy within each of the 6 datasets was
   reduced in the present study. In metabolomics, filtering was
   technique-specific but shared a common principle: groups of features
   correlated above 0.90 were reduced to a single representative, the
   most intense signal for MS data and the purest one for NMR. This
   procedure was conducted using the Analytic Correlation Filtration
   (ACorF) tool [37] within W4M, with manual selection of the
   representative feature for NMR only. In lipidomics, this step was
   performed according to the previously described workflow [32].
   In this lipidomics workflow, a first automatic feature annotation was
   performed using an in silico database. It contained the exact masses
   of pseudo-molecular ions ([M+H]+, [M−H]− and [M−2H]2−), adducts
   ([M+NH4]+, [M+Na]+, [M−H+CHO2]−) and in-source fragments
   ([M+H−H2O]+), along with their corresponding 13C and double 13C
   isotopes. Specific retention time windows were also defined for each
   lipid class, by examining the retention times of the species
   containing the longest and the shortest fatty acyl chains.

   Annotated lipid species were then kept only if two conditions held.
   First, their 13C isotope had to be detected and aligned in time
   (± 5 s). Second, all related ions (i.e. pseudo-molecular ions, adduct
   ions and/or in-source fragments, whether monoisotopic, 13C or 2×13C
   isotopes) had to share the retention time of a reference ion specific
   to a lipid class/subclass (± 5 s, and ± 10 s between the two
   ionization modes after merging the corresponding peaktables).

   In addition, the relative isotopic abundance (RIA) between each
   monoisotopic ion and its corresponding 13C isotope was automatically
   calculated and compared with the theoretical value. Annotated lipid
   species with an RIA error higher than 30% were filtered out. This
   threshold was chosen because the RIA errors of all internal standards
   were below it.

   The two lipidomic peaktables obtained in positive and negative
   ionization modes were merged because they were highly correlated.
   This correlation arose because specific lipid classes
   (i.e. lysophosphatidylcholines, phosphatidylcholines and
   sphingomyelins) were detected in both modes, as [M+H]+ and
   [M−H+CHO2]− ions, respectively. The two peaktables were aligned
   according to retention time (± 10 s).
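   The isotope-based checks described above can be illustrated with a
   short sketch. Two assumptions are not stated in the text. First, the
   theoretical RIA of the first 13C isotope is approximated from the
   carbon count alone, as n·p/(1−p) with p ≈ 0.0107, ignoring the small
   2H, 15N and 17O contributions. Second, the RIA error is expressed
   relative to the theoretical value. The lipid in the example is
   hypothetical.

```python
P13C = 0.0107  # approximate natural abundance of 13C (assumption)


def theoretical_ria(n_carbons):
    """Expected M+1/M intensity ratio, counting carbon isotopes only."""
    return n_carbons * P13C / (1 - P13C)


def keep_lipid(n_carbons, mono_intensity, c13_intensity, mono_rt_s, c13_rt_s,
               max_error_pct=30.0, rt_tol_s=5.0):
    """Return True if an annotated lipid passes both isotope-based checks.

    1. Its 13C isotope co-elutes with the monoisotopic ion (± rt_tol_s).
    2. The error between observed and theoretical RIA, relative to the
       theoretical value (assumption), is at most max_error_pct.
    """
    if abs(mono_rt_s - c13_rt_s) > rt_tol_s:
        return False  # 13C isotope not aligned in time
    observed = c13_intensity / mono_intensity
    expected = theoretical_ria(n_carbons)
    error_pct = abs(observed - expected) / expected * 100
    return error_pct <= max_error_pct


# Hypothetical lipid with 44 carbon atoms: expected RIA is about 0.476.
print(keep_lipid(44, 1.0e6, 4.5e5, 300.0, 301.0))  # True (error about 5%)
print(keep_lipid(44, 1.0e6, 3.0e5, 300.0, 301.0))  # False (error about 37%)
```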

3.7. Statistical analyses

   All statistical analyses were performed after data pre-processing and
   filtration of each of the 6 individual datasets.

3.7.1. Measurement of serum metabolomes in the NuAge MetS sub-cohort

   Correlation analyses were performed to give an overview of the links
   between the metabolites/lipids detected in serum. They were run both
   at the level of the method datasets and at the level of individual
   variables.

   First, RV coefficients [38] were used to assess the global
   association between datasets, using R (version 3.4.1) [39] with the
   FactoMineR package [40]. This coefficient [38] is a multivariate
   generalization of the squared Pearson correlation coefficient. It
   defines a scale of similarity between two matrices and measures the
   degree to which different datasets give the same view of the
   samples [41].

   Second, to investigate individual correlations between detected
   features, pairwise Spearman correlation coefficients were calculated
   using the Between Table Correlation tool available in W4M, and a
   network analysis was performed. Significant correlations (after
   Benjamini-Hochberg correction) with coefficients > 0.7 were retained.
   The resulting Spearman correlation network was drawn with
   Cytoscape [42].
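   The two correlation views described above can be sketched in plain
   Python. These are illustrative re-implementations, not the FactoMineR
   or W4M code. The RV coefficient is computed on column-centred blocks
   that describe the same samples in the same order. Network edges are
   Spearman correlations above 0.7; the Benjamini-Hochberg significance
   filter applied before this cut-off is not reproduced here. The
   feature values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def center_columns(X):
    """Column-centre a matrix given as a list of rows (samples x variables)."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[v - m for v, m in zip(row, means)] for row in X]


def rv_coefficient(X, Y):
    """RV = tr(XX'YY') / sqrt(tr(XX'XX') tr(YY'YY')) on column-centred blocks."""
    Xc, Yc = center_columns(X), center_columns(Y)
    n = len(Xc)
    S = [[sum(a * b for a, b in zip(Xc[i], Xc[k])) for k in range(n)] for i in range(n)]
    T = [[sum(a * b for a, b in zip(Yc[i], Yc[k])) for k in range(n)] for i in range(n)]
    # S and T are symmetric, so tr(ST) is the sum of their element-wise products.
    num = sum(S[i][k] * T[i][k] for i in range(n) for k in range(n))
    ss = sum(v * v for row in S for v in row)
    tt = sum(v * v for row in T for v in row)
    return num / sqrt(ss * tt)


def ranks(values):
    """Ranks starting at 1; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rho, i.e. the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def correlation_edges(features, threshold=0.7):
    """Network edges (a, b, rho) for feature pairs with rho above threshold."""
    edges = []
    for a, b in combinations(sorted(features), 2):
        rho = spearman(features[a], features[b])
        if rho > threshold:
            edges.append((a, b, round(rho, 3)))
    return edges


# Hypothetical features measured on five samples.
features = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 11], "c": [5, 3, 4, 1, 2]}
print(correlation_edges(features))  # → [('a', 'b', 1.0)]
```

   An RV value of 1 means two blocks give the same configuration of the
   samples up to rotation and scaling. This is why a block and a
   rescaled copy of it yield exactly 1.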

3.7.2. Metabolite and lipid levels modulated with MetS

   Individuals in this study were selected for the stability of their
   MetS status. Nonetheless, part of their metabolism could be expected
   to change over time, so the impact of time on the
   metabolomic/lipidomic datasets was also evaluated.

   As no interaction effect was observed between status and time, linear
   mixed models (LMM) were used to analyze the repeated measures. The
   models included time, status (case/control) and their interaction as
   fixed effects and subject as a random effect, using the module
   available in W4M.

   To verify that the LMM assumptions were met, the different LMM
   residuals were examined:
   - Homogeneity of the residual variance was checked for each fixed
     factor using a Levene test.
   - Normality of the conditional residuals and of the random-effect
     residuals was verified using quantile-quantile plots.
   - Linearity of the fixed effects was checked as proposed by Singer et
     al. [43], using plots of the marginal predictions vs the
     standardized marginal residuals.

   A threshold of 0.05 on Benjamini-Hochberg (BH) corrected p-values was
   used to detect variables significantly affected by status and time.
   Similar statistical analyses were performed on the epidemiological
   data.
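   The BH step used to flag significant variables can be sketched as
   below. It assumes that one raw p-value per variable (e.g. for the
   status effect) is already available from the LMM fits, which are not
   shown. The variable names and p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(names, pvalues, alpha=0.05):
    """Names of variables whose BH-adjusted p-value is below alpha."""
    return [name for name, q in zip(names, bh_adjust(pvalues)) if q < alpha]


# Hypothetical status p-values for four features.
print(significant(["m1", "m2", "m3", "m4"], [0.01, 0.04, 0.03, 0.20]))  # → ['m1']
```

   In this example, m2 and m3 have raw p-values below 0.05 but adjusted
   values of about 0.053. This illustrates how the correction becomes
   stricter as the number of tested variables grows.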

3.7.3. Identification of a comprehensive molecular MetS signature

   The objective of the present study was to identify a limited number of
   metabolites that could together reflect MetS status. To this end,
   variable selection was first performed on each individual dataset
   following the methodology developed by Rinaudo et al. [44], using the
   biosigner module available in the W4M Galaxy instance [34]. The aim
   was to focus on the variables that contribute significantly to the
   performance of the discrimination. As feature selection may be
   affected by correlations between variables, a Pearson correlation
   filter (threshold 0.8) was first applied to each dataset.

   All variables selected by biosigner with at least one of the three
   classifiers were considered: Partial Least Squares Discriminant
   Analysis (PLS-DA), Support Vector Machine (SVM) and Random Forest
   (RF). This process was repeated five times to account for the
   selection variability induced by the bootstrap resampling of the
   methodology. The union of the variables selected over the five
   repetitions formed the predictive subset of each dataset.

   In a second step, the selected variables of the 6 individual
   predictive subsets were integrated into a common PLS-DA model to
   characterize the discriminant power of the resulting comprehensive
   signature. For all PLS analyses, unit variance (UV) scaling was
   applied to variable intensities, and models were built using 7-fold
   cross-validation. The predictive power of each model was assessed
   using the Q² parameter.

   To check that the PLS components could not lead to a correct
   classification by chance, a permutation test was carried out
   (n = 200). For each permutation, samples were randomly assigned to the
   experimental groups, a PLS model was built, and R²Y and Q² were
   computed. The results are displayed on a validation plot. Its
   horizontal axis shows the correlation between the permuted and
   original class assignments (1 for the original, non-permuted model),
   and its vertical axis shows the R²Y and Q² values. For a valid model,
   permuted data should yield poorly predictive models, with lower Q²
   values than the true model.
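   Several bookkeeping steps around the PLS-DA modelling can be sketched
   without the model itself: the union of repeated selections, a 7-fold
   split, Q² from cross-validated predictions, and an empirical
   permutation p-value. These are illustrative assumptions rather than
   the biosigner/ropls internals. Class labels are coded 0/1, the
   cross-validated predictions y_cv are taken as given, and the folds are
   contiguous for simplicity (samples would normally be shuffled first).
   The single-response Q² shown here is a simplification of the
   component-wise computation in PLS software.

```python
def union_of_selections(repetitions):
    """Union of the variables selected over repeated biosigner runs."""
    selected = set()
    for run in repetitions:
        selected |= set(run)
    return sorted(selected)


def kfold_indices(n, k=7):
    """Split sample indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def q2(y, y_cv):
    """Q2 = 1 - PRESS / TSS, from cross-validated predictions y_cv."""
    mean_y = sum(y) / len(y)
    press = sum((obs - pred) ** 2 for obs, pred in zip(y, y_cv))
    tss = sum((obs - mean_y) ** 2 for obs in y)
    return 1 - press / tss


def permutation_p_value(q2_true, q2_permuted):
    """Empirical p-value: share of models (true one included) with Q2 >= q2_true."""
    hits = sum(1 for value in q2_permuted if value >= q2_true)
    return (hits + 1) / (len(q2_permuted) + 1)


# Hypothetical selections from five biosigner repetitions.
runs = [["v1", "v3"], ["v3"], ["v2", "v3"], ["v1"], ["v4"]]
print(union_of_selections(runs))  # → ['v1', 'v2', 'v3', 'v4']
print([len(f) for f in kfold_indices(123)])  # → [18, 18, 18, 18, 17, 17, 17]
```

   Counting the true model in the permutation p-value keeps it strictly
   positive. With 200 permutations, the smallest attainable value is
   therefore 1/201.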

   With a view to future clinical application, an optimized reduced
   signature was then proposed. To this end, redundancy between methods
   was eliminated (correlation coefficient > 0.8), keeping the most
   robust variable (highest intensity, best peak purity). In a second
   step, the signature was restricted to formally identified compounds
   only.

   The performance of the prediction model was evaluated using three
   measures:
   - a confusion matrix;
   - cross-validated error rates (200 repetitions of random
     training/test splits);
   - areas under ROC curves (AUC) [45], computed in R (version 3.6.2)
     with the pROC package [46], with confidence intervals estimated
     using DeLong's method [47].
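   The AUC and its DeLong confidence interval can be sketched in plain
   Python. This is not pROC's code. It computes the empirical AUC in its
   Mann-Whitney form and DeLong's variance for a single ROC curve, under
   three assumptions: higher scores indicate MetS, the interval is
   symmetric on the AUC scale, and the interval is clipped to [0, 1]. The
   scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def psi(x, y):
    """Mann-Whitney kernel: 1 if x > y, 0.5 if tied, 0 otherwise."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def auc_delong(case_scores, control_scores, level=0.95):
    """Empirical AUC and a DeLong-based confidence interval.

    Higher scores are assumed to indicate MetS (cases).
    """
    m, n = len(case_scores), len(control_scores)
    # Structural components: per-case and per-control placement values.
    v10 = [sum(psi(x, y) for y in control_scores) / n for x in case_scores]
    v01 = [sum(psi(x, y) for x in case_scores) / m for y in control_scores]
    auc = sum(v10) / m
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    se = sqrt(s10 / m + s01 / n)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return auc, (max(0.0, auc - z * se), min(1.0, auc + z * se))


# Hypothetical scores: AUC = 8/9 (about 0.889), CI lower bound about 0.581.
print(auc_delong([0.8, 0.6, 0.4], [0.5, 0.3, 0.2]))
```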

3.8. Metabolite annotation

   Metabolite annotation was first conducted computationally using W4M.
   All annotations then involved manual curation and interpretation of
   spectra.

   Metabolites contributing to the discrimination of the MetS phenotype
   were first identified using in-house databases. These contain the
   reference spectra of more than 2000 authentic standard compounds
   analyzed under the same analytical conditions. They also provide
   comprehensive spectral information: protonated or deprotonated
   molecules, adducts and in-source fragment ions for LC-HRMS, or
   molecular ions and major fragments for GC-MS. Annotation was performed
   against these spectral databases on the basis of accurately measured
   masses within MS spectra and chromatographic retention times.

   In LC-HRMS, annotations were then confirmed by additional LC-MS/MS
   experiments on the same QC samples. These used a Dionex Ultimate
   chromatographic system coupled to a Q-Exactive mass spectrometer
   (Thermo Fisher Scientific), under non-resonant collision-induced
   dissociation conditions (higher-energy C-trap dissociation, HCD), in
   both positive and negative ion modes. The instrument was set in
   targeted acquisition mode using inclusion lists. The resulting MS/MS
   spectra were manually matched to those of the in-house spectral
   database, which were acquired at different collision energies.

   In GC-MS, annotations were confirmed by matching electron impact
   spectra and by using reports from the literature.

   The remaining unknown compounds were then identified on the basis of
   their exact masses, which were compared with those registered in the
   following databases:
   - Metlin (https://metlin.scripps.edu; [48]);
   - the Human Metabolome Database (HMDB; www.hmdb.ca; [49]);
   - MassBank (https://massbank.eu/MassBank/; [50]);
   - the Kyoto Encyclopedia of Genes and Genomes (KEGG) database
     (http://www.genome.jp/kegg/; [51]);
   - the National Institute of Standards and Technology database (NIST;
     https://www.nist.gov/srd/nist-special-database-14; [52]).

   Database queries were performed with a mass tolerance of 0.005 Da
   and, for the in-house databases, a retention time tolerance of
   0.1 min. Database results were confirmed using appropriate standards
   when available, isotopic patterns and mass fragmentation analyses.

   For unidentified ions, the number of plausible elemental compositions
   was restricted to a small number (or to a unique composition) using
   additional chemical information: the molecular formula of the
   precursor ions, reports from the literature [53], and knowledge of
   possible metabolic pathways.

   Metabolites were classified according to the levels of confidence in
   the identification process defined by Sumner et al. [54]:
   - identified: confirmed by an authentic chemical standard analyzed
     under the same conditions, matching at least two orthogonal criteria
     among accurately measured mass, retention time and MS/MS or EI-MS
     spectrum;
   - putatively annotated: spectral similarity with public/commercial
     spectral libraries;
   - putatively characterized compound classes;
   - unknown.
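   The database query tolerances given above (0.005 Da on mass, plus
   0.1 min on retention time for in-house entries) can be illustrated as
   a simple matching function. The sketch assumes that each reference
   entry stores an m/z for a given ion form. Public-database entries are
   assumed to carry no retention time, so only their mass is compared.
   The entry names, m/z and retention time values are hypothetical.

```python
MZ_TOL_DA = 0.005   # mass tolerance quoted in the text
RT_TOL_MIN = 0.1    # retention-time tolerance for in-house databases


def match_feature(mz, rt_min, references):
    """Names of reference entries matching a feature's m/z (and RT if known)."""
    hits = []
    for ref in references:
        if abs(mz - ref["mz"]) > MZ_TOL_DA:
            continue
        # Public databases carry no retention time (rt_min is None).
        if ref["rt_min"] is not None and abs(rt_min - ref["rt_min"]) > RT_TOL_MIN:
            continue
        hits.append(ref["name"])
    return hits


# Hypothetical reference entries (names, m/z and RT values are illustrative).
references = [
    {"name": "inhouse_std_1", "mz": 114.0662, "rt_min": 2.35},
    {"name": "inhouse_std_2", "mz": 114.0690, "rt_min": 2.20},
    {"name": "public_entry_1", "mz": 114.0650, "rt_min": None},
]
print(match_feature(114.0665, 2.40, references))  # → ['inhouse_std_1', 'public_entry_1']
```

   In this example, inhouse_std_2 matches on mass but is rejected on
   retention time. This shows how the chromatographic criterion narrows
   candidates that mass alone cannot separate.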

   It is important to note that very few standards of lipid species are
   commercially available compared with the large diversity of
   endogenous lipid species present in complex biological matrices.
   Therefore, the results of in-house database queries were filtered
   according to the workflow described in Seyer et al. [32]. This
   filtering took into account the retention time ranges of each lipid
   class as well as isotope patterns, as described in the data
   filtration section.

   Finally, all HCD mass spectra from the additional MS/MS experiments
   were manually inspected to identify specific diagnostic ions and
   confirm the structure of the lipid species [55] (see Supplemental
   Fig. 2). These species were named following the LipidMaps
   nomenclature [56].

   NMR spectral assignments were based on matching 1D and 2D NMR data to
   reference spectra in an in-house reference database and in other
   databases (http://www.bmrb.wisc.edu/metabolomics/;
   http://www.hmdb.ca/), as well as to reports from the
   literature [57].

3.9. Extraction of modulated metabolic network

   The metabolites found in the untargeted metabolomics/lipidomics
   experiments were placed in the context of genome-scale reconstructed
   metabolic networks. To do so, the metabolites that were modulated
   according to LMM, stable over time, and identified or annotated were
   mapped onto the human genome-scale metabolic network Recon2.2 [58].
   This network contains 7785 reactions and 6047 metabolites.

   To map the modulated metabolites onto the network, we first retrieved
   their ChEBI identifiers. We then searched for matching identifiers in
   Recon2.2 using the "identifier matcher tool" in MetExplore [59]. This
   tool supports two kinds of matching:
   - exact matching, which finds the metabolite in the network that
     corresponds exactly to the modulated metabolite from the
     experimental dataset;
   - ontology-based matching, which links to a corresponding, more
     generic class of metabolite when the exact metabolite cannot be
     found in the network.

   In the metabolic network, a metabolite can be assigned to several
   cellular compartments. However, current global and untargeted
   metabolomics approaches provide no information on the cellular
   localization of metabolites, so we chose to consider only cytosolic
   metabolites.

   To focus on the most likely modulated part of the network, we first
   selected all metabolic pathways containing at least one modulated
   metabolite, excluding pathways involving only transport reactions.
   Forty-one pathways, including 2753 reactions, were selected. A
   metabolic sub-network was then extracted from the modulated
   metabolites. This extraction identifies computationally, among the
   previously selected reactions, those most likely to be related to the
   modulated metabolites. The algorithm computes the lightest path
   between each pair of metabolites in the dataset, i.e. the sequence of
   reactions and metabolites connecting two metabolites that minimizes a
   topological criterion in the network [60,61]. For a given dataset, the
   sub-network is thus the union of all the lightest paths between the
   metabolites in that dataset.

   Pathway enrichment analyses were performed to assess whether the
   modulated metabolites were significantly over-represented in a
   metabolic pathway. They used a one-tailed Fisher's exact test with BH
   correction for multiple testing and the metabolic pathways defined in
   Recon2.2. All computational and visualization tasks were performed
   within the MetExplore web server, based on the Recon2.2 metabolic
   network (biosource id #4311) [62,63].
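   The two network computations described above can be sketched on a toy
   example. The enrichment p-value is the upper tail of the
   hypergeometric distribution, which is what a one-tailed Fisher's
   exact test for over-representation computes. For the lightest path,
   the exact MetExplore weighting is not given here. As an assumption,
   each metabolite is weighted by its degree, so that highly connected
   hub compounds are penalized, and each reaction is weighted by 1. The
   path is then found by a node-weighted Dijkstra search. The network
   and the counts are hypothetical.

```python
from heapq import heappop, heappush
from math import comb


def enrichment_p(k, n, K, N):
    """One-tailed Fisher exact (hypergeometric) p-value, P(X >= k).

    N: metabolites in the network; K: metabolites in the pathway;
    n: modulated metabolites mapped; k: modulated metabolites in the pathway.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def lightest_path(graph, weight, source, target):
    """Dijkstra search in which entering a node costs weight[node] (default 1).

    graph: dict mapping each node (metabolite or reaction) to its neighbours.
    Returns (cost, path), or (inf, []) if the target is unreachable.
    """
    queue = [(0.0, source, [source])]
    best = {source: 0.0}
    while queue:
        cost, node, path = heappop(queue)
        if node == target:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt in graph.get(node, ()):
            new_cost = cost + weight.get(nxt, 1.0)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heappush(queue, (new_cost, nxt, path + [nxt]))
    return float("inf"), []


# Hypothetical toy network: metabolite H is a hub linked to many reactions.
edges = [("A", "r1"), ("r1", "H"), ("H", "r2"), ("r2", "B"),
         ("A", "r3"), ("r3", "C"), ("C", "r5"), ("r5", "B"),
         ("H", "r6"), ("H", "r7"), ("H", "r8")]
graph = {}
for u, v in edges:
    graph.setdefault(u, []).append(v)
    graph.setdefault(v, []).append(u)
# Assumption: metabolites weighted by their degree, reactions by 1.
weight = {m: len(graph[m]) for m in ("A", "B", "C", "H")}
print(lightest_path(graph, weight, "A", "B"))  # → (6.0, ['A', 'r3', 'C', 'r5', 'B'])
print(enrichment_p(3, 3, 3, 10))  # 1/120, about 0.0083
```

   Both routes from A to B have the same number of steps. The weighting
   steers the path away from the hub H, which is the purpose of a
   lightest-path criterion over a plain shortest path.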

   Role of funding sources: All metabolomics and lipidomics analyses
   (data collection) were funded by the MetaboHUB French infrastructure
   (ANR-INBS-0010). Funders had no role in study design, data analysis,
   interpretation or writing of the report.

4. Results

4.1. Overview of study population

   In total, 58 quantitative variables were considered to describe the
   selected population precisely (see Materials and Methods):
   - 23 biochemical parameters;
   - 8 clinical variables;
   - 25 nutritional variables, mainly related to macronutrient intake and
     to selected nutrients reported to be associated with MetS;
   - 2 scores related to physical activity and global health.

   As defined in the subject selection process (see Materials and
   Methods), the MetS status of individuals was stable over the
   three-year follow-up (for the 4 time points considered). Consistent
   with this stability, the clinical parameters associated with MetS,
   analyzed at T1 and T4, were stable over time. Some of them improved
   slightly, with a significant reduction of systolic blood pressure,
   fasting glucose and triglycerides (TG). For most quantitative MetS
   criteria, differences between cases and controls were highly
   significant (BH-corrected p-values from 10^−5 to 10^−21; Table 1;
   Supplementary Fig. 1).

   The main descriptive data and the results of the linear mixed models
   are presented in Table 1 and Supplementary Tables 1a and 1b. They show
   that all subjects were globally stable over time, not only for the
   clinical values of MetS criteria, as already emphasized, but also for
   the main parameters related to physical activity, nutrition and
   health-related quality of life (physical component summary (PCS)
   score).

   Regarding MetS status, MetS subjects were less healthy and less active
   than controls. Their global scores differed significantly for
   physical activity (PASE; corrected p-value = 0.02) and for
   health-related quality of life (PCS; corrected p-value =
   3.6 × 10^−3). Cases also had significantly lower total energy intake
   (corrected p-value = 0.025) and carbohydrate intake (corrected
   p-value = 6.8 × 10^−3). Total dietary fiber intake was only at the
   limit of significance (corrected p-value = 0.052). However, the
   cereal-product score of the Canadian Healthy Eating Index (C-HEI),
   which is based on the Canadian Food Guide recommended intakes for
   grain products, was significantly lower in cases than in controls
   (corrected p-value = 0.013).

Table 1.

   Overview of the study population.
   Variable | Controls, T1 | Controls, T4 | Cases, T1 | Cases, T4 | Corrected p-value, time (BH) | Corrected p-value, MetS status (BH)
   n | 62 | 62 | 61 | 61 | – | –
   Age (yrs) | 73.5 ± 4.1 (62) | – | 74.1 ± 3.6 (61) | – | 1.00 | 0.34
   Body weight (kg) | 71.0 ± 8.0 (62) | 69.9 ± 7.8 (62) | 87.7 ± 12.5 (61) | 87.4 ± 13.3 (61) | 0.04 | 6.2 × 10^−14
   BMI (kg/m²) | 25.1 ± 2.3 (62) | 24.8 ± 2.4 (62) | 30.5 ± 3.7 (61) | 30.6 ± 3.7 (61) | 0.37 | 1.5 × 10^−16
   Waist circumference (cm) | 93.3 ± 6.9 (62) | 92.8 ± 6.9 (62) | 109.9 ± 8.9 (61) | 110.8 ± 9.5 (61) | 0.67 | 6.2 × 10^−21
   Fasting serum glucose (mM) | 5.08 ± 0.44 (62) | 4.86 ± 0.58 (62) | 6.66 ± 1.45 (61) | 6.54 ± 1.21 (61) | 0.04 | 2.1 × 10^−15
   Fasting TG (mM) | 1.23 ± 0.47 (50) | 1.18 ± 0.40 (53) | 2.23 ± 1.01 (51) | 1.94 ± 0.86 (51) | 0.04 | 1.6 × 10^−8
   Fasting HDL-C (mM) | 1.43 ± 0.45 (50) | 1.50 ± 0.34 (53) | 1.13 ± 0.29 (56) | 1.16 ± 0.26 (56) | 0.74 | 1.1 × 10^−5
   SBP (mmHg) | 126.2 ± 16.6 (62) | 120.9 ± 18.4 (62) | 138.4 ± 15.8 (61) | 133.7 ± 19.3 (61) | 0.02 | 4.4 × 10^−5
   DBP (mmHg) | 71.8 ± 9.9 (62) | 73.9 ± 8.1 (62) | 74.7 ± 8.9 (61) | 73.6 ± 9.4 (61) | 0.69 | 0.47
   Leucocytes (10^9/L) | 5.61 ± 1.29 (62) | 5.97 ± 1.57 (62) | 6.32 ± 1.22 (61) | 6.55 ± 1.34 (61) | 0.02 | 2.0 × 10^−2
   Lymphocytes (10^9/L) | 1.49 ± 0.45 (62) | 1.54 ± 0.46 (62) | 1.75 ± 0.43 (61) | 1.76 ± 0.63 (61) | 0.50 | 2.0 × 10^−2
   SF-36 Physical Component Summary score* (PCS) | 52.8 ± 5.8 (62) | 52.3 ± 6.0 (61) | 49.7 ± 8.0 (61) | 46.7 ± 9.2 (61) | 0.01 | 3.6 × 10^−3
   Physical activity (PASE score) | 125.4 ± 51.9 (62) | 124.7 ± 53.5 (59) | 107.1 ± 55.2 (61) | 94.6 ± 50.5 (57) | 0.41 | 2.3 × 10^−2
   Energy (kcal/day) | 2179 ± 524 (62) | 2251 ± 576 (62) | 1935 ± 462 (60) | 2026 ± 506 (59) | 0.08 | 2.5 × 10^−2
   Carbohydrate (g/day) | 269.6 ± 68.8 (62) | 272.3 ± 73.6 (62) | 231.7 ± 64.0 (60) | 238.4 ± 62.5 (59) | 0.45 | 6.8 × 10^−3
   Protein (g/day) | 87.5 ± 22.8 (62) | 89.3 ± 28.2 (62) | 83.1 ± 20.4 (60) | 82.9 ± 22.6 (59) | 0.75 | 0.31
   Lipid (g/day) | 78.5 ± 23.9 (62) | 83.8 ± 26.7 (62) | 71.9 ± 23.4 (60) | 80.3 ± 27.0 (59) | 0.02 | 0.35
   C-HEI-Cereals (score: 0–10) | 7.99 ± 2.25 (62) | 7.72 ± 2.26 (62) | 6.66 ± 2.05 (60) | 6.99 ± 2.20 (60) | 0.88 | 1.3 × 10^−2
   Total dietary fiber (g/day) | 23.6 ± 9.4 (62) | 24.9 ± 10.7 (62) | 19.8 ± 7.5 (60) | 21.4 ± 7.2 (59) | 0.10 | 5.2 × 10^−2

   Mean ± SD (number of subjects); p-values from linear mixed models
   after Benjamini-Hochberg (BH) correction for 58 parameters. Corrected
   p-values < 0.05 are considered significant.

   * T-scores based on a mean of 50 and an SD of 10.

4.2. Serum metabolomes in the NuAge sub-cohort

   Given the high diversity of metabolites present over a wide
   concentration range in biological samples, proper MetS evaluation
   requires broad metabolome coverage. Complementary technologies were
   therefore combined: Nuclear Magnetic Resonance (NMR) and
   high-resolution mass spectrometry (HRMS), different chromatographic
   systems (gas, reverse-phase and hydrophilic interaction
   chromatography), and electrospray detection in both positive and
   negative ionization modes. This combination made it possible to cover
   both polar and apolar compounds for a relevant and comprehensive
   metabolome and lipidome analysis (see Materials and Methods). On this
   basis, a full workflow was developed for serum analysis. It covers
   sample preparation and the measurement of a wide diversity of
   metabolites, from the most polar ones to lipids (Fig. 1a and b).

Fig. 1.


   Study design and multiplatform metabolomics data generation.

   a: Experimental design of the MetS case/control study, involving the
   follow-up of 123 stable subjects over 3 years.

   b: Analytical workflow based on seven complementary untargeted
   metabolomics methods using 3 different analytical platforms (LC- and
   GC-HRMS, NMR), applied to serum samples collected from the 123
   subjects at two time points. Data analysis resulted in 2915
   metabolite- and lipid-related features, detected in at least 20% of
   the subjects over time.

   c: Similarity of the method blocks, evaluated using RV coefficients (a
   matrix correlation coefficient; see Methods section) after Multiple
   Factor Analysis.

   d: Spearman correlation network between the 2915 metabolite and lipid
   features from the six metabolomic/lipidomic datasets (significant
   correlation > 0.7). Nodes correspond to features obtained using the
   different analytical platforms (Pink: C18pos; Dark Blue: C18neg;
   Green: GCMS; Blue: HILICneg; Yellow: Lipidomics; Black: NMR). Two
   nodes are connected by an edge if the correlation between the
   corresponding features is significant (For interpretation of the
   references to color in this figure legend, the reader is referred to