Abstract

Background

   Outcome prediction tools for patients with type 2 diabetes mellitus
   (T2DM) undergoing percutaneous coronary intervention (PCI) are lacking.
   Here, we developed a machine learning-based metabolite classifier for
   predicting 1-year major adverse cardiovascular events (MACEs) after PCI
   among patients with T2DM.

Methods

   Serum metabolomic profiling was performed in a nested case–control
   study of 108 matched pairs of patients with T2DM occurring and not
   occurring MACEs at 1 year after PCI, then the matched pairs were 1:1
   assigned into the discovery and internal validation sets. External
   validation was conducted using targeted metabolite analyses in an
   independent prospective cohort of 301 patients with T2DM receiving PCI.
   The function of candidate metabolites was explored in high
   glucose-cultured human aortic smooth muscle cells (HASMCs).

Results

   Overall, serum metabolome profiles differed between diabetic patients
   with and without 1-year MACEs after PCI. Through VSURF, a machine
   learning approach for feature selection, we identified the 6 most
   important metabolic predictors, which mainly targeted the nicotinamide
   adenine dinucleotide (NAD^+) metabolism. The 6-metabolite model based
   on random forest and XGBoost algorithms yielded an area under the curve
   (AUC) of ≥ 0.90 for predicting MACEs in both discovery and internal
   validation sets. External validation of the 6-metabolite classifier
   also showed good accuracy in predicting MACEs (AUC 0.94, 95% CI
   0.91–0.97) and target lesion failure (AUC 0.89, 95% CI 0.83–0.95). In
   vitro, there were significant impacts of altering NAD^+ biosynthesis on
   bioenergetic profiles, inflammation and proliferation of HASMCs.

Conclusion

   The 6-metabolite model may help for noninvasive prediction of 1-year
   MACEs following PCI among patients with T2DM.

Supplementary Information

   The online version contains supplementary material available at
   10.1186/s12933-022-01561-1.

   Keywords: Metabolomics, Type 2 diabetes, Clinical outcomes after PCI,
   Prediction model, NAD metabolites

Background

   Patients with type 2 diabetes mellitus (T2DM) account for more than a
   quarter of all coronary artery disease (CAD) patients receiving
   percutaneous coronary intervention (PCI) [[29]1]. Despite great
   advances in stent technologies, T2DM remains a strong indicator of
   major adverse cardiovascular events (MACEs) after PCI [[30]2, [31]3].
   Thus, identifying biomarkers for noninvasive prediction of post-PCI
   outcomes among type 2 diabetic patients has substantial clinical
   implications [[32]4].

   Metabolomics, which provides untargeted measurements of the
   multiparametric metabolic response of living systems to
   pathophysiological stimuli [[33]5], has the potential to both discover
   new biomarkers and reveal key metabolic pathways intrinsic to the
   disease pathogenesis [[34]6]. Although such metabolomic approaches have
   been increasingly explored in cardiovascular biomarker discovery, most
   previous studies have concentrated on the screening of metabolic
   biomarkers for the discrimination of established CAD from non-CAD
   controls [[35]7, [36]8]. Yet it remains unclear how systemic metabolic
   alterations impact clinical outcomes after PCI, especially for patients
   having T2DM. Moreover, the majority of metabolic biomarker candidates
   for cardiovascular disease were identified using the classical
   generalized linear method of regression [[37]9, [38]10]. Modern machine
   leaning approaches, which are better able to incorporate high-order
   nonlinear associations between predictors to gain predictive
   performance [[39]11], have rarely been applied to outcome predictions
   for type 2 diabetic patients receiving PCI.

   Hence, in a nested case–control study of 216 patients with T2DM who
   underwent PCI due to obstructive CAD, we first assessed the prospective
   associations of serum metabolic profiles at baseline with the risk of
   incident MACEs at 1 year after PCI. Then, a 6-metabolite model, mainly
   targeting the pathway of nicotinamide adenine dinucleotide (NAD^+)
   metabolism, was developed and internally validated for the prediction
   of 1-year MACEs following PCI based on a set of machine learning
   algorithms. Next, we externally verified the 6-metabolite model using a
   targeted metabolite analysis in an independent prospective cohort of
   301 diabetic patients who received PCI. Finally, we explored the
   biological relevance of altering NAD^+ biosynthesis to abnormal
   phenotypes of human aortic smooth muscle cells (HASMCs) under high
   glucose (HG) conditions.

Methods

Study design and participants

   An overview of the study design is depicted in Fig. [40]1. In brief, we
   first conducted a nested case–control study within a prospective cohort
   of 702 patients with T2DM who underwent primary PCI from Sep 2017 to
   Jan 2019 in the First Affiliated Hospital of Zhengzhou University. As
   previously described [[41]12, [42]13], the prospective cohort excluded
   patients who had systematic diseases including cancer, serious
   infection, chronic liver disease, and type 1 diabetes. T2DM was
   diagnosed based on the 2014 American Diabetes Association criteria
   [[43]14]. All participants were hospitalized for angiographically
   confirmed obstructive CAD [[44]15] and underwent PCI at baseline, and
   then completed a follow-up of 1 year to track MACEs [composite of
   all-cause death, myocardial infarction (MI), stroke, and repeat
   revascularization] as the primary outcome. Clinical, angiographic, and
   procedural data were collected at baseline (Table [45]1), and outcome
   data were obtained from medical records and telephone interviews with
   participants at 30 days and 6, 9, 12 months after PCI. Within a median
   follow up of 12.5 months (interquartile range [IQR]: 11.9–12.6 months),
   108 (15.4%) patients occurred MACEs (all-cause death, n = 46; repeat
   revascularization, n = 54; MI, n = 13; stroke, n = 7). Of the remaining
   594 participants without the occurrence of MACEs, 108 individuals,
   matched for baseline characteristics using propensity scores [[46]16],
   were selected as the controls. Then, the matched case–control pairs
   were randomly assigned (1:1) to a discovery set or an internal
   validation set. Both sets could provide a > 90% power to detect a fold
   change of > 4/3 or < 3/4 for differential metabolites at a false
   discovery rate (FDR) of < 5%.

Fig. 1.

   [47]Fig. 1
   [48]Open in a new tab

   Study workflow and design. PCI percutaneous coronary intervention;
   MACEs major adverse cardiovascular events; VSURF variable selection
   using random forest; XGBoost extreme gradient boosting; SVM Support
   Vector Machines; DNN deep neural network

Table 1.

   Baseline characteristics of each study set
   Variables^a Nested case–control study Prospective cohort study
   Discovery set Internal validation set External validation set
   MACEs (n = 54) Controls (n = 54) P^a MACEs (n = 54) Controls (n = 54)
   P^a With MACEs (n = 47) Without MACEs (n = 254) P^a
   Clinical data
    Age, years 63.4 ± 9.5 63.1 ± 9.5 0.86 63.9 ± 9.0 64.5 ± 10.6 0.78
   62.3 ± 9.5 62.8 ± 9.4 0.71
    Male 33 (61.1) 35 (64.8) 0.69 32 (59.3) 34 (63.0) 0.69 30 (63.8) 146
   (57.5) 0.42
    Current smokers 11 (20.4) 11 (20.4) 1.00 10 (18.5) 12 (22.2) 0.63 17
   (36.2) 53 (20.9) 0.023
    BMI, kg/m^2, 25.6 ± 4.0 25.3 ± 4.3 0.73 25.1 ± 4.5 25.4 ± 3.6 0.75
   25.2 ± 3.9 25.0 ± 4.1 0.75
    FPG, mmol/L 8.8 ± 2.1 8.4 ± 1.9 0.24 8.4 ± 2.1 8.8 ± 2.1 0.37
   8.2 ± 2.3 8.4 ± 2.1 0.72
    HbA1c, (%) 7.6 ± 1.8 7.3 ± 1.5 0.33 7.2 ± 1.7 7.8 ± 2.3 0.15 8.1 ± 1.9
   7.4 ± 1.7 0.020
    Diabetes duration, years 9.2 ± 5.1 10.0 ± 4.8 0.39 9.9 ± 5.4 9.0 ± 4.9
   0.36 9.5 ± 5.6 10.0 ± 5.4 0.58
    Diabetes management
     Lifestyle modification 12 (22.2) 18 (33.3) 0.55 15 (27.8) 17 (31.5)
   0.95 14 (29.8) 87 (34.3) 0.34
     Oral agents only 13 (24.1) 11 (20.4) 18 (33.3) 16 (29.6) 12 (25.5) 80
   (31.5)
     Insulin only 16 (29.6) 16 (29.6) 11 (20.4) 10 (18.5) 10 (21.3) 53
   (20.9)
     Oral agents and insulin 13 (24.1) 9 (16.7) 10 (18.5) 11 (20.4) 11
   (23.4) 34 (13.4)
    PAD 8 (14.8) 9 (16.7) 0.79 7 (13.0) 8 (14.8) 0.78 6 (12.8) 25 (9.8)
   0.55
    Dyslipidemia 15 (27.8) 13 (24.1) 0.66 12 (22.2) 12 (22.2) 1.00 10
   (21.3) 62 (24.4) 0.64
    Hypertension 20 (37.0) 22 (40.7) 0.69 19 (35.2) 20 (37.0) 0.84 23
   (48.9) 111 (43.7) 0.51
    LVEF < 50% 11 (20.4) 8 (14.8) 0.45 9 (16.7) 10 (18.5) 0.80 11 (23.4)
   27 (10.6) 0.015
    Clinical Presentation
     ACS 35 (64.8) 37 (68.5) 0.68 34 (63.0) 36 (66.7) 0.69 30 (63.8) 137
   (53.9) 0.21
     SCAD 19 (35.2) 17 (31.5) 20 (37.0) 18 (33.3) 17 (36.2) 117 (46.1)
   Angiographic data
    Chronic total occlusion 7 (13.0) 7 (13.0) 1.00 7 (13.0) 9 (16.7) 0.59
   9 (19.1) 40 (15.7) 0.56
    Multivessel CAD 35 (64.8) 36 (66.7) 0.84 32 (59.3) 35 (64.8) 0.55 37
   (78.7) 158 (62.2) 0.029
    Calcification 8 (14.8) 10 (18.5) 0.61 7 (13.0) 8 (14.8) 0.78 9 (19.1)
   36 (14.2) 0.38
    ACC/AHA lesion B/C 36 (66.7) 36 (66.7) 1.00 37 (68.5) 35 (64.8) 0.68
   30 (63.8) 165 (65.0) 0.88
    SYNTAX score 22.9 ± 8.5 22.7 ± 9.3 0.91 24.6 ± 7.7 24.2 ± 9.0 0.81
   21.7 ± 7.3 22.4 ± 8.5 0.63
   Procedure data
    Stent diameter, mm 3.1 ± 0.4 3.0 ± 0.4 0.33 3.2 ± 0.4 3.1 ± 0.4 0.74
   3.1 ± 0.5 3.1 ± 0.6 0.99
    Stent length, mm 32.9 ± 17.4 34.3 ± 17.4 0.64 34.5 ± 14.7 31.7 ± 10.8
   0.27 31.1 ± 14.5 35.0 ± 20.5 0.22
    Complete revascularization 21 (38.9) 21 (38.9) 1.00 17 (31.5) 18
   (33.3) 0.84 13 (27.7) 81 (31.9) 0.57
    Stent type
     First generation DES 36 (66.7) 39 (72.2) 0.53 36 (66.7) 34 (63.0)
   0.69 31 (66.0) 169 (66.5) 0.94
     Second generation DES 18 (33.3) 15 (27.8) 18 (33.3) 20 (37.0) 16
   (34.0) 85 (33.5)
   [49]Open in a new tab

   MACEs major adverse cardiovascular event; BMI body mass index; FPG
   fasting plasma glucose; HbA1c glycosylated haemoglobin; PAD peripheral
   artery disease; LVEF left ventricular ejection fraction; ACS acute
   coronary syndrome; SCAD stable coronary artery disease; DES
   drug-eluting stent

   ^aValues are mean ± SD or n (%) as appropriate. P values are obtained
   from the student t test for continuous variables, and the Pearson χ^2
   test for categorical variables

   The external validation set was a prospective study of 301 patients
   with T2DM who were treated with PCI at the Zhongnan Hospital of Wuhan
   University between May 2016 and Jun 2017. The exclusion criteria were
   the same as in the above nested case–control study. For all
   participants, clinical follow-up was performed at 30 days and 6, 9,
   12 months after PCI, and angiographic follow-up was conducted at
   12 months after PCI [[50]17]. The primary outcome remained MACEs, while
   the secondary outcome was target lesion failure (TLF), a
   device-oriented composite endpoint of cardiac death, target vessel MI,
   and target lesion revascularization [[51]18]. During a median follow-up
   of 12.4 months (IQR: 11.7–12.6 months), a total of 47 (15.6%) MACEs
   (all-cause death, n = 17; repeat revascularization, n = 22; MI, n = 9;
   stroke, n = 3) and 30 (10.0%) TLF were documented. This study followed
   the Transparent Reporting of a Multivariable Prediction Model for
   Individual Prognosis or Diagnosis (TRIPOD) statement [[52]19]. More
   details in study design, baseline characteristics, and outcome
   definitions are summarized in Additional file [53]1: Supplementary
   Methods. The Study protocol was approved by local institutional review
   boards, and written informed consents were obtained from all
   participants.

Untargeted metabolic profiling by liquid chromatography-mass spectrometry
(LC–MS)

   For all participants, serum samples were isolated from whole blood by
   centrifugation within 30 days before PCI, and stored at – 80 ℃ until
   use. To minimize the potential impacts of storage conditions on
   metabolite stability, we analyzed the level of methionine, a metabolite
   that could be extensively degraded if the frozen serum samples were
   stored too long or improperly [[54]20]. As a result, the relative
   abundance of methionine for all samples was greater than 3 standard
   deviations below the mean (Additional file [55]1: Fig. S1), indicating
   that the serum samples were properly stored for metabolite detection.

   Details in LC–MS procedures are described in Additional file [56]1:
   Supplementary Methods. Briefly, serum samples were first treated with
   acetonitrile/methanol (1:1, v/v) and isotope-labeled internal standard
   mixtures for metabolite extraction. Then, metabolite extracts were
   separated using an UPLC BEH Amide column (2.1 mm × 100 mm, 1.7 µm) on a
   Vanquish UHPLC system (Thermo, Waltham, USA). The column eluent was
   further detected for the acquisition of MS/MS spectra on a Q Exactive
   Orbitrap mass spectrometer (Thermo, Waltham, USA) operating in the
   positive and negative ion modes. The acquired MS raw data were analyzed
   on the XCMS Online platform ([57]https://xcmsonline.scripps.edu)
   [[58]21] for peak detection and metabolite annotation. The best-matched
   internal standard (B-MIS) normalization method [[59]22] was used to
   normalize peak areas and yield relative abundance of metabolites. To
   ensure data quality, the quality control (QC) samples were prepared by
   mixing an equal aliquot of all samples, and run at the beginning of the
   sample queue for column conditioning and each of 10 samples thereafter.
   Metabolic peaks with relative standard deviations (RSD) of > 30% across
   QC samples or presenting in < 80% of QC samples were removed for
   further analysis [[60]23].

Targeted metabolite analysis in the external validation set

   External validation of 6 metabolic biomarkers selected from untargeted
   metabolomic profiling was conducted by targeted metabolite analyses on
   a 20AD UPLC system (Shimadzu, Kyoto, Japan) coupled with a QTrap 5500
   mass spectrometer (SCIEX, Framingham, USA) operating in the multiple
   reaction monitoring mode (Additional file [61]1: Supplementary
   Methods). The absolute concentration (µmol/L) of each metabolite in
   serum was determined using a 7-point calibration curve, created by
   calculating the peak area ratio of each calibrator (Sigma, St. Louis,
   USA) versus its concentration. All the calibration curves showed a good
   linearity, with R^2 values of > 0.990, intra- and inter-batch precision
   values (as RSD) of < 15%, and accuracy values (as relative error)
   ranging from − 9.1 to 5.3% (Additional file [62]1: Table S1).

In vitro experiments in HG-cultured HASMCs

   HASMCs (LONZA) were grown in Dulbecco’s Modified Eagle’s medium (LONZA)
   supplemented with 25 mM D-glucose, 4 mM L-glutamine, 10% fetal bovine
   serum, and 1% penicillin/streptomycin in a humidified atmosphere at
   37 ℃ and 5% CO[2]. After 3–5 passages, HG-cultured HASMCs were treated
   with 10 μM FK866 (an inhibitor of nicotinamide
   phosphoribosyltransferase, Sigma) for 20 h to block the salvage
   biosynthetic pathway of NAD^+, or 200 μM 1-methyl-L-tryptophan (1MT, an
   inhibitor of indole-2,3-dioxygenase, Sigma) for 20 h to inhibit de novo
   synthesis of NAD^+, or 10 μM β-nicotinamide mononucleotide (NMN, a
   NAD^+ precursor, Sigma) for 20 h to sustain NAD^+ levels [[63]24]. For
   each group of HASMCs, NAD^+ levels were detected by LC–MS; the activity
   of mitochondrial respiratory chain complexes (I–V) was measured using
   spectrophotometric assays [[64]25]. The bioenergetic profile of HASMCs
   was determined by an XF24 Extracellular Flux Analyzer (Agilent, Santa
   Clara, USA) to calculate the parameters of mitochondrial respiration
   and glycolysis [[65]26]. The mRNA expression and protein secretion of
   proinflammatory cytokines in HASMCs were examined by reverse
   transcription quantitative PCR and cytometric bead array
   (BD-Pharmingen), respectively. A transwell migration assay was
   performed to evaluate the ability of HASMCs to recruit THP1 monocytes
   [[66]27]. Proliferation of HASMCs was assessed using a methylene blue
   dye assay, as described in our previous report [[67]13]. All in vitro
   experiments were repeated 3 times. More Details are provided in
   Additional file [68]1: Supplementary Methods and Additional file [69]1:
   Table S2.

Statistical analysis

   In both discovery and internal validation sets, the global metabolic
   differences between participants with and without MACEs were assessed
   by a supervised model of orthogonal partial least-squares discriminate
   analysis (OPLS-DA). For assessing the robustness of OPLS-DA, we
   performed 200 permutations of the metabolomic datasets, and for each
   permutation, the values of Q^2 and R^2Y were calculated by a seven-fold
   cross validation to reflect the goodness of prediction and the risk of
   overfitting, respectively. The values of variable importance in the
   projection (VIP) were also calculated to reflect the contribution of
   each metabolite to the group discrimination in the OPLS-DA model.
   Differential comparisons of single metabolites between groups were
   examined using the Mann–Whitney U test, followed by the
   Benjamini–Hochberg FDR-controlling method [[70]28] to adjust for
   multiple comparisons. Metabolites with VIP values of > 1.0, FDRs
   of < 0.05, and fold changes of > 4/3 or < 3/4 were considered as the
   differential metabolites [[71]23, [72]29], which were further mapped
   into the KEGG database ([73]https://www.kegg.jp/) for pathway
   enrichment analyses. Then, an optimal set of metabolic features for
   predicting MACEs was selected from the differential metabolites using
   the R package VSURF (Variable Selection Using Random Forests) [[74]30],
   in which a recommended stepwise random forest (RF) procedure [[75]31]
   was executed to identify the best combination of discriminant variables
   for classification prediction modeling on the basis of predictive
   accuracy (as the amount of out-of-bag error) and parsimony (as the
   number of selected variables).

   In the discovery set, we integrated the metabolic features selected by
   VSURF to develop the reference model of logistic regression and 4
   machine learning models for prediction of MACEs. The 4 machine learning
   algorithms were: (1) RF, an ensemble approach that produces multiple
   decision trees for classification [[76]32], (2) extreme gradient
   boosting (XGBoost), another ensemble machine using the Shapley additive
   explanations method to create an additive model of decision trees
   [[77]33], (3) nonlinear Support Vector Machines (SVM) with a polynomial
   kernel [[78]34], and (4) deep multilayer neural network (DNN) with the
   adaptive moment estimation optimizer [[79]35]. Details in model
   parameters are listed in the Additional file [80]1: Supplementary
   Methods. The importance of each feature to the prediction was assessed
   by Gini index in RF and by relative importance values in XGBoost. The
   predictive performance of 4 machine learning models was compared with
   that of the reference model (i.e., logistic regression) by measuring:
   (1) discrimination statistics including area under the
   receiver-operating characteristic curve (AUC), sensitivity,
   specificity, positive predictive value, and negative predictive value;
   (2) continuous net reclassification index (NRI); (3) calibration
   statistics (P-value of the Hosmer–Lemeshow test, calibration slope, and
   calibration plots); and (4) net clinical benefit through decision curve
   analyses [[81]36]. To adjust for potential over-fitting and
   over-optimism, a seven-fold cross validation was also performed to
   obtain a bias-corrected AUC for each model.

   We performed Kaplan–Meier curves and Cox regression to assess the
   prognostic values of the derived models. The proportional hazards
   assumption of Cox models was verified by visual inspection of
   log-minus-log plots and calculation of Schoenfeld residuals.
   Sensitivity analyses were also conducted to validate model performance
   after stratification by initial clinical presentations and different
   components of MACEs. For in vitro experiments, intergroup differences
   were compared using one-way ANOVA with LSD post hoc tests. All
   statistical analyses were conducted with R (version 3.5.3) and SIMCA-P
   (version 16.0.2). A P-value < 0.05 was considered significant.

Results

Baseline characteristics

   As described in Table [82]1, after propensity score matching and random
   assignment, the baseline characteristics in both discovery and internal
   validation sets were similar between patients occurring MACEs (n = 54
   for each set) and matched controls (n = 54 for each set). In the
   external validation set, MACEs (n = 47) were more likely to occur in
   patients who were current smokers or had left ventricular ejection
   fractions of < 50%, multivessel CAD, or higher levels of glycated
   hemoglobin at baseline (Table [83]1).

Global metabolic alterations in patients with T2DM occurring MACEs after PCI

   We first assessed the reliability of the LC–MS analysis using QC
   samples. As presented in Additional file [84]1: Figure S2, the Pearson
   correlation coefficients of metabolomic data among QC samples were
   greater than 0.99, indicating a good reproducibility of the LC–MS
   analysis. After data quality control and peak alignment, serum
   metabolome analyses annotated a total of 778 metabolites (discovery
   set: 743; internal validation set: 674; identified in both sets: 639),
   which were distributed across 10 ontology classes (Additional file
   [85]1: Table S3).

   As illustrated in the OPLS-DA models, the metabolomic profile of the
   MACEs group was significantly distinct from that of the matched control
   group in both discovery (R^2Y = 0.81, Q^2 = 0.65, Fig. [86]2A) and
   internal validation sets (R^2Y = 0.83, Q^2 = 0.67, Fig. [87]2B). After
   200 permutations, the intercepts of goodness-of-prediction (Q^2) and
   goodness-of-fit (R^2) values were within ± 0.5 (Fig. [88]2C and D),
   indicating that the OPLS-DA models were well explained and not
   overfitting. Based on the conditions of VIP values > 1.0, FDR < 0.05,
   and fold changes > 4/3 or < 3/4, the volcano plots depicted 69
   differential metabolites (37 upregulated, 32 downregulated, Fig. [89]2E
   and Additional file [90]1: Table S4) in the discovery set and 89
   differential metabolites (56 upregulated, 33 downregulated, Fig. [91]2F
   and Additional file [92]1: Table S5) in the internal validation set.
   Pathway enrichment analyses of the differential metabolites found 4
   metabolic pathways significantly perturbed in patients with incident
   MACEs, involving nicotinate and nicotinamide metabolism, tryptophan
   metabolism, glycerophospholipid metabolism, pentose phosphate pathway,
   and glycolysis (Fig. [93]2G). Of all the differential metabolites, 35
   were identified in both discovery and internal validation sets (20
   upregulated, 15 downregulated, Fig. [94]2H), with the potential to
   better distinguish MACEs from matched controls (Additional file [95]1:
   Figure S3).

Fig. 2.

   [96]Fig. 2
   [97]Open in a new tab

   Differences in serum metabolome profiles between type 2 diabetic
   patients with and without the incidence of 1-year MACEs after PCI in
   the discovery and internal validation sets. A and B The OPLS-DA scatter
   plot. Each dot and square represents the serum metabolomic profile of a
   single participant in a 2-dimensional space. The ellipses represent 95%
   confidence intervals. Cross-validation plot with 200 permutations in
   the discovery (C) and internal validation sets (D). The volcano plot of
   differential metabolites in the discovery (E) and internal validation
   sets (F). The vertical dashed lines indicate the threshold of fold
   changes > 4/3 or < 3/4. The horizontal dashed line indicates the
   threshold of FDR < 0.05. G The pathway enrichment analysis of all
   differential metabolites. The horizontal dashed line indicates the
   threshold of FDR < 0.05. H The Venn diagram. MACEs major adverse
   cardiovascular events; FDR false discovery rate; FC fold change; VIP
   variable importance in the projection

Development and internal validation of a 6-metbolite signature to predict
MACEs

   From the 35 metabolites differentially expressed in both data sets
   (Fig. [98]3A), we sought to select an optimal set of discriminators for
   MACEs by considering the balance between classification accuracy and
   parsimony. For this purpose, the normalized data of the 35 metabolites
   from the discovery set were inputted into a tree-based VSURF algorithm,
   in which a total of 29 metabolites were gradually excluded due to low
   importance or high redundancy (Fig. [99]3B and C), finally leaving a
   subset of 6 metabolites with the lowest prediction error for
   multivariate modeling (Fig. [100]3D). Of particular note, among the 6
   metabolites, 4 were involved in biosynthesis (nicotinamide [NAM] and
   L-tryptophan), consumption (adenosine diphosphate ribose [ADPR]), or
   excretion [1-methylnicotinamide (1-MNAM)] of NAD^+ [[101]37], implying
   a crucial role of NAD^+ metabolism in the occurrence of MACEs among
   diabetic patients who received PCI.

Fig. 3.

   [102]Fig. 3
   [103]Open in a new tab

   The variable selection process for identifying a 6-metabolite signature
   to predict 1-year MACEs after PCI among patients with T2DM. A
   Hierarchical cluster analysis of the 35 metabolites differentially
   expressed in both discovery and internal validation sets. B Of the 35
   metabolites, 27 important metabolites with some redundancy are selected
   in the first step of a variable selection algorithm (VSURF). C Of the
   remaining 27 metabolites, 11 metabolites that avoid redundancy are
   further selected in the second step of VSURF. D The final step of VSURF
   identifies a combination of 6 metabolites with the best potential for
   predicting 1-year MACEs. The OPLS-DA scatter plot based on the 6
   metabolites in both discovery (E) and internal validation (F) sets. G
   The ROC analysis of the 6 metabolites based on logistic regression.
   ADPR adenosine diphosphate ribose; 1-MNAM 1-Methylnicotinamide; PC
   phosphatidylcholine; NAM nicotinamide

   As depicted in the OPLS-DA models (Fig. [104]3E and F), the combination
   of these 6 metabolites could clearly separated patients with MACEs from
   matched controls in both discovery and internal validation sets. The
   logistic regression model composed of these 6 metabolites yielded an
   AUC of 0.89 [95% confidence interval (CI) 0.81–0.94] in the discovery
   set and 0.85 (95% CI 0.76–0.91) in the internal validation set for
   predicting MACEs (Fig. [105]3G). When the 2 datasets were divided into
   high-(> 62%) and low-risk (≤ 62%) groups based on the risk probability
   predicted by the 6-metabolite model (cut-off value derived from the ROC
   analysis), Kaplan–Meier estimates of the rates of MACEs significantly
   differed between high- and low-risk groups (P < 0.001, Additional file
   [106]1: Figure S4). After multivariable adjustment by potential
   confounding factors, the 6-metabolite model remained a powerful and
   independent prognostic predictor for MACEs [discovery set: hazard ratio
   (HR) = 8.92; internal validation set: HR = 6.17, both P < 0.001,
   Additional file [107]1: Figure S4].

Improving performance of the 6-metabolite model using machine learning

   To evaluate whether the predictive performance of the 6-metabolite
   panel could be improved by the application of machine learning, we
   developed the 6-metabolite prediction models by incorporating the data
   of the discovery set into 4 machine learning algorithms: RF, XGBoost,
   SVM, and DNN. As summarized in Table [108]2, all of the machine
   learning models, except for DNN, yielded a significantly higher AUC
   (0.93–0.99, P[difference] < 0.05) than the reference model of logistic
   regression (AUC = 0.89). Likewise, the reclassification ability of 4
   machine learning models was also improved, with continuous NRIs ranging
   from 0.96 to 1.93.

Table 2.

   Discrimination and reclassification ability of the 6-metabolite model
   based on 4 machine learning algorithms and logistic regression
   Model Discrimination statistics Seven-fold CV Reclassification index
   AUC P value^a Sensitivity Specificity PPV NPV AUC NRI P value
   Discovery set
    Logistic regression 0.89 (0.81–0.94) Ref. 80 (67–89) 91 (80–97) 61
   (40–79) 96 (94–98) 0.89 Ref. Ref.
    Random forest 0.99 (0.95–1.00) < 0.001 98 (90–99) 96 (87–99) 83
   (55–95) 99 (98–100) 0.99 1.93 (1.83–2.03) < 0.001
    XGBoost 0.99 (0.96–1.00) < 0.001 98 (90–99) 96 (87–99) 83 (55–95) 99
   (98–100) 0.99 1.89 (1.77–2.01) < 0.001
   SVM 0.93 (0.86–0.97) 0.005 85 (73–93) 93 (82–98) 68 (45–84) 97 (95–99)
   0.91 1.00 (0.68–1.32) < 0.001
    DNN 0.91 (0.85–0.96) 0.092 96 (87–99) 87 (75–95) 58 (40–73) 99
   (97–100) 0.92 0.96 (0.66–1.26) < 0.001
   Internal validation
    Logistic regression 0.85 (0.76–0.91) Ref. 72 (58–84) 94 (85–99) 70
   (44–88) 95 (92–97) 0.85 Ref. Ref.
    Random forest 0.93 (0.87–0.97) 0.003 83 (71–92) 94 (85–99) 73 (48–89)
   97 (95–98) 0.91 1.33 (1.05–1.61) < 0.001
    XGBoost 0.91 (0.84–0.95) 0.023 85 (73–93) 94 (85–99) 73 (48–89) 97
   (95–99) 0.91 1.33 (1.05–1.61) < 0.001
    SVM 0.89 (0.82–0.94) 0.119 85 (73–93) 83 (71–92) 48 (34–63) 97 (94–98)
   0.88 0.22 (− 0.15 to 0.59) 0.240
    DNN 0.82 (0.74–0.89) 0.329 69 (54–81) 94 (85–99) 69 (42–87) 94 (92–96)
   0.84 0.22 (− 0.14 to 0.59) 0.233
   [109]Open in a new tab

   AUC area under the curve; PPV positive predictive value; NPV negative
   predictive value; CV cross validation; NRI net reclassification
   improvement; XGBoost extreme gradient boosting tree; SVM Support Vector
   Machines; DNN deep neural network

   ^aPaired comparisons of AUCs between logistic regression and each
   machine learning model are conducted using a DeLong test

   When the machine learning models were applied in the internal
   validation set, the top 2 best-performing models were RF and XGBoost
   models, which both showed significant improvements in discrimination
   and reclassification abilities to predict MACEs compared with the
   logistical regression model. Specifically, sensitivities for the RF and
   XGBoost models increased to  ~ 85% compared with 72% for the logistic
   regression model, meaning that about 13% (7/54) of patients who
   developed MACEs after PCI would be correctly identified using the RF
   and XGBoost models but would be missed when the logistic regression
   model was applied (Additional file [110]1: Figure S5). Interestingly,
   all 4 metabolites related to NAD^+ metabolism were highlighted as the
   top 4 important variables to the predictive outcomes in both RF and
   XGBoost models (Additional file [111]1: Figure S6). Otherwise, the SVM
   and DNN models did not improve predictive performance relative to the
   regression model in the internal validation set.

   For all prediction models, the average AUCs calculated by cross
   validation remained largely unchanged (Table [112]2); calibration plots
   showed a good agreement between predicted and observed outcomes
   (calibration slope around 1, Additional file [113]1: Figure S7).

External validation using a targeted metabolite analysis

   When determining the absolute concentrations of the 6 metabolites using
   a targeted metabolite analysis in the external validation cohort, we
   found that 3 of the 6 metabolites were associated with a higher risk of
   MACEs and the other 3 were associated with a lower risk of MACEs
   (Fig. [114]4A–F), which were consistent with the results from the
   untargeted metabolomics mentioned above. After inputting the normalized
   (also used the B-MIS method) data of the 6 metabolites into the
   established machining learning models, the AUCs for predicting MACEs
   reached to 0.92 (95% CI 0.88–0.95, P[difference] = 0.005, Fig. [115]4G)
   in RF and 0.94 (95% CI 0.91–0.97, P[difference] < 0.001, Fig. [116]4G)
   in XGBoost, compared to 0.85 (95% CI 0.81–89) in logistic regression.
   Decision curve analyses also showed that both RF and XGBoost models had
   larger net benefits (i.e., a greater number of appropriate triage)
   across the range of risk thresholds compared with the logistic
   regression model (Fig. [117]4H). When categorizing participants into
   high-risk and low-risk groups based on the prediction of the
   6-metbaolite classifier, the adjusted HR for MACEs was 7.68 (95% CI
   3.57–16.55, Additional file [118]1: Figure S8) for the comparison of
   high-risk versus low-risk groups. Likewise, the 6-metabolite classifier
   achieved a smaller but still good AUC (0.83–0.89) to predict the
   device-oriented endpoint of TLF (Fig. [119]4I).

Fig. 4.

   [120]Fig. 4
   [121]Open in a new tab

   External validation of the 6-metabolite signature by targeted
   metabolite analyses in a prospective cohort of 301 type 2 diabetic
   patients undergoing PCI. Comparisons of the absolute concentrations of
   ADPR (A), 1-MNAM (B), D-Ribose 5-phosphate (C), NAM (D), L-Tryptophan
   (E), and PC(36:2) (F) between patients with and without the incidence
   of MACEs at 1-year after PCI. G The improvements in AUCs of the
   6-metabolite models based on random forest or XGBoost compared with the
   logistic regression model. H The greater clinical benefit of
   6-metabolite model based on random forest or XGBoost compared with the
   logistic regression model. I The ROC analysis of the 6-metabolite
   models for predicting target lesion failure at 1 year after PCI. ADPR
   adenosine diphosphate ribose; 1-MNAM 1-Methylnicotinamide; PC
   phosphatidylcholine; NAM nicotinamide; XGBoost gradient-boosted
   decision tree

   Recently, the FREEDOM trial derived a personalized clinical risk model
   for MACE prediction in diabetic patients undergoing revascularization
   [[122]38]. Here, we further assessed the additional value of our
   6-metabolite model beyond the FREEDOM tool. As shown in Additional file
   [123]1: Table S6, adding the RF-based 6-metbolite model into the
   FREEDOM tool substantially increased the C-index to 0.87 (95% CI
   0.81–0.93) for predicting MACEs. The classification performance was
   also improved after addition of the 6-metabolite panel, with a
   categorical NRI of 0.60 (95% CI 0.45–0.75, P < 0.001) and an IDI of
   0.27 (95% CI 0.22–0.32, P < 0.001).

Internal and external validation by sensitivity analyses

   We first performed sensitivity analyses for assessing the performance
   of the 6-metabolite model in predicting different components of MACEs.
   As shown in Fig. [124]5, the 6-metabolite model consistently yielded
   high AUCs (≥ 0.90) for either predicting the combined end point of
   death, MI, and stroke or predicting repeat revascularization in both
   internal and external validation sets.

Fig. 5.

   [125]Fig. 5
   [126]Open in a new tab

   Sensitivity analyses for internally and externally validating the
   predictive performance of the 6-metabolite model after stratification
   by initial clinical presentations and different components of MACEs. RF
   random forest; XGBoost gradient-boosted decision tree; ACS acute
   coronary syndrome; SCAD stable coronary artery disease; MI myocardial
   infarction

   Then, considering that the prognosis following PCI significantly
   differed between patients initially presenting with acute coronary
   artery syndrome (ACS) and stable CAD (SCAD) [[127]39], sensitivity
   analyses were further conducted after stratification by initial
   clinical presentations. We observed that the AUCs of the 6-metabolite
   model for predicting MACEs were generally higher (0.93–0.96) in
   patients presenting with ACS, and slightly lowered to 0.87–0.90 in
   patients presenting with SCAD, but the differences were not
   statistically significant (Fig. [128]5).

Effects of altering NAD^+ biosynthesis in HG-cultured HASMCs

   Considering the importance of NAD^+ metabolites in gaining predictive
   performance of our prediction model, we investigated the effects of
   altering NAD^+ biosynthesis on the phenotypes of HASMCs under HG
   conditions. As shown in Fig. [129]6A and B, pharmacological inhibition
   of NAD^+ biosynthesis by FK866 or 1MT led to a substantial reduction in
   basal NAD^+ levels, accompanied by a marked deficit in mitochondrial
   complex I activity, which requires reduced NAD^+ for mitochondrial
   electron transfer [[130]40]. Consistent with abnormal changes in
   activities of mitochondrial complexes, significant reductions in
   parameters of mitochondrial respiratory, including mitochondrial basal
   respiration, ATP-linked respiratory capacity, and maximal respiration
   (Fig. [131]6C and D), were observed along with increases in glycolytic
   flux after pharmacological blockade of NAD^+ biosynthesis in HASMCs
   (Fig. [132]6E and F). Conversely, supplementation of NMN, a NAD^+
   precursor, increased basal NAD^+ levels and enhanced mitochondrial
   respiratory capacities while decreasing glycolysis in HG-cultured
   HASMCs (Fig. [133]6A–F).

Fig. 6.

   [134]Fig. 6
   [135]Open in a new tab

   Effects of inhibiting (FK866 or 1MT) or sustaining (by NMN) NAD^+
   biosynthesis on bioenergetic profiles, inflammatory activation, and
   proliferation of HASMCs under high glucose conditions. A NAD^+ levels
   detected by LC–MS. B Mitochondrial complex activity. C Mitochondrial
   stress test where different parameters of mitochondrial respiration are
   compartmentalized by a sequential application of Oligo, FCCP, and a
   combination of rotenone and antimycin. D Parameters of mitochondrial
   respiration. E Glycolysis stress test where different parameters of
   glycolytic flux are compartmentalized by a sequential application of
   glucose, Oligo, and 2DG. F Parameters of glycolytic flux. G Heatmap of
   mRNA expression of inflammatory cytokines determined by RT-qPCR. H
   Protein secretion of inflammatory cytokines. I Number of THP1 monocytes
   migrating toward HASMCs. J Proliferation of HASMCs. For all in vitro
   experiments, the control group is HG-cultured HASMCs. Data are
   presented as mean ± SD, n = 3 independent experiments. *P < 0.05;
   **P < 0.01. 1MT 1-methyl-L-tryptophan; NMN nicotinamide mononucleotide;
   Oligo oligomycin; R rotenone; A antimycin; OCR oxygen consumption rate;
   2DG 2-deoxy-glucose; ECAR extracellular acidification rate; FC fold
   change; Gly glycolytic

   Given the potential link of aerobic glycolysis to inflammatory
   activation [[136]41], we elected to further explore the impact of
   interfering NAD^+ biosynthesis on the HG-induced expression of an array
   of proinflammatory factors with documented roles in cardiovascular
   disease. As a result, inhibition of NAD^+ biosynthesis by FK866 or 1MT
   in HASMCs was found to significantly increase the production of a
   series of chemokines (MCP1, CCL3, CCL4, etc.) and interleukins (IL6,
   IL8, IL-1β etc.), both in terms of mRNA expression and protein
   secretion (Fig. [137]6G and H). In parallel, exposure of HASMCs to
   FK866 or 1MT resulted in a more than 80% increase in chemotaxis of THP1
   monocytes toward HASMCs (Fig. [138]6I), along with increased
   proliferation of HASMCs (Fig. [139]6J). In contrast, NMN
   supplementation normalized the production of proinflammatory factors,
   attenuated the ability of HASMCs to recruit THP1 monocytes, and
   inhibited HASMCs proliferation (Fig. [140]6G–J).

Discussion

   Diabetes is deemed as one of the most significant prognostic factors
   for adverse outcomes following PCI, with hazard ratios ranging from 1.9
   to 2.5 [[141]42]. Mounting evidence also indicates that diabetes causes
   increased proliferation of vascular smooth muscle cells [[142]43], more
   extensive neointimal hyperplasia [[143]44], and consequent more severe
   restenosis after stent implantation [[144]45], highlighting the view
   that the mechanism underlying the adverse outcomes after PCI in
   diabetes is probably different from that in non-diabetes [[145]42].
   Specific to metabolomic studies, there has been epidemiological
   evidence showing that diabetic patients with macrovascular
   complications have distinct circulating metabolic profiles compared
   with those without [[146]46, [147]47]. Recently, Cui and colleagues
   constructed a metabolite panel of phospholipids and sphingolipids with
   high accuracy (AUC > 0.90) for diagnosis of stent restenosis in
   non-diabetic patients [[148]48]. However, this metabolite panel only
   achieved a modest AUC of < 0.70 (data not shown) for predicting MACEs
   in our cohorts of type 2 diabetic patients receiving PCI, implying that
   the metabolite fingerprint causally associated with the occurrence of
   adverse outcomes after PCI may also differ between diabetic and
   non-diabetic status.

   Hence, for the first time, the present study focused on the
   identification of differential metabolic patterns at baseline to
   predict the incidence of MACEs at 1 year after PCI for patients with
   T2DM. By applying both untargeted and targeted metabolomic approaches,
   we first found a significant difference in serum metabolome profiles
   between type 2 diabetic patients with and without incident MACEs
   following PCI. Then, the Venn diagram depicted 35 metabolites
   differentially expressed in both discovery and internal validation
   sets, which had the potential to better discriminate patients with the
   occurrence of MACEs from matched controls. For constructing the
   parsimonious model that would be more achievable in the clinical
   setting, our next step was to select an optimal set of predictors from
   the 35 metabolites.

   Considering that the 35 metabolites belong to several metabolic
   pathways with possible interconnections, we did not use the traditional
   feature selection methods like logistic regression, because fitting a
   logistic regression model is, in fact, algebraically difficult when
   there are too many variables with complex interactions between each
   other [[149]11]. Instead, the 35 metabolic features were filtered by
   the VSURF algorithm, which is a recommended tree-based method to
   identify the most important variables for classification after
   accounting for the complex, nonlinear relations in the dataset
   [[150]11, [151]31]. As a result, a total of 6 metabolic features,
   including 4 NAD^+ metabolites, 1 phosphatidylcholine lipid, and 1 sugar
   phosphate, were finally selected for multivariate modeling.

   Then, we went on to consider how the 6 metabolites could best be
   incorporated to increase predictive accuracy. For this purpose, we
   trained the 6 metabolites using a series of powerful machine learning
   algorithms, including ensemble methods like RF and XGBoost, nonlinear
   method like SVM, and multilayer neural network. The testing results
   showed that the 6-metbolite models based on RF and XGBoost were the
   best-performing models, with high discriminative accuracy of ≥ 90% in
   both internal and external validation cohorts. Of note, the two
   ensemble models could detect an additional 13% of patients in whom the
   occurrence of MACEs following PCI would not be identified when using
   the traditional logistic regression model. If these findings can be
   verified, our 6-metabolite classifier may be as a helpful tool for
   identifying patients at high risk for MACEs and guiding early
   prevention among patients with T2DM undergoing PCI.

   An interesting finding of the current study is that our prediction
   model includes 4 key metabolites involved in biosynthesis (NAM,
   L-tryptophan), consumption (ADPR), or secretion (1-MNAM) of NAD^+,
   suggesting a possible link of NAD^+ metabolism to adverse outcomes
   following PCI. Physiologically, NAD^+ can be synthesized de novo
   starting with tryptophan, or from salvage pathway starting with NAD^+
   precursors like NAM derived from cellular NAD^+ metabolism or dietary
   supply [[152]49]. Under critical conditions (e.g., acute DNA injury and
   cell death), the synthesized NAD^+ can be excessively consumed by
   hyperactivated poly(ADP-ribose) polymerases to produce ADPR and NAM as
   byproducts, in which ADPR continues to form poly(ADP-ribose) chains
   with pivotal effects on posttranslational modification of target
   proteins [[153]50], whereas NAM can be methylated to form MNAM that is
   mainly secreted via urine [[154]37]. Our previous bidirectional
   Mendelian Randomization study has shown that higher extent of leukocyte
   poly(ADP-ribose), as a hallmark of massive NAD^+ consumption, is
   causally associated with the incidence of 1-year MACEs after PCI
   [[155]12]. The present study extends our previous work by providing
   experimental evidence that pharmacologically blocking the salvage or de
   novo biosynthetic pathways of NAD^+ causes abnormal changes in
   bioenergetic profiles, upregulated expression of proinflammatory
   factors, increased chemotaxis of monocytes to HASMCs, and enhanced
   proliferation of HASMCs. In contrast, sustaining NAD^+ levels via NMN
   supplementation may inhibit HG-induced glycolysis, pro-inflammation,
   and proliferation of HASMCs, all of which are aberrant phenotypes
   related to incident MACEs after PCI [[156]42, [157]51]. These findings
   are in line with recent evidence that the intrinsic NAD^+ fueling
   system is essential to protect against DNA damage, premature
   senescence, and chaotic migration of smooth muscle cells [[158]52,
   [159]53], supporting the close link between NAD^+ biosynthesis and
   phenotypic switching of HASMCs. Collectively, our in vitro data may
   provide an experimental foundation for the significant effects of NAD^+
   metabolites on post-PCI outcomes.

   Our study is the first to develop a machine learning-based metabolite
   classifier to predict incident MACEs at 1 year after PCI for type 2
   diabetic patients, with rigorous steps for model specification and
   evaluation of model performance (i.e., discrimination, calibration, and
   clinical usefulness). Nevertheless, our study also has limitations.
   First, owing to a relatively short follow-up period, we could not
   evaluate the predictive performance of our 6-metabolite model for
   long-term outcomes after PCI. Second, although we validated the
   6-metabolite model using both internal and external datasets, its
   predictive utility should be extended in larger independent cohorts,
   especially in other geographic populations. Third, sensitivity analyses
   observed that the predictive accuracy of the 6-metabolite model might
   slightly decrease in patients initially presenting with SCAD. So we
   could not rule out the possibility that simultaneously enrolling
   patients with different clinical presentations might potentially
   confound the performance of our prediction model. Fourth, other
   angiographic parameters, such as lesion type, vessel tortuosity, and
   presence of thrombus, might provide additional information on disease
   progression, but were not available in the current study. Fifth,
   although we computed variable importance to define predictors that most
   affected classification, the prediction model based on RF and XGBoost
   might still be harder to be interpreted, compared with the regression
   model simply using given coefficients to weight predictors [[160]54].
   Finally, more experimental research is needed for better understanding
   of the exact mechanism underlying the predictive value of NAD^+
   metabolites.

Conclusions

   Using an array of machine learning algorithms, we develop a
   6-metabolite signature with high accuracy for predicting incident MACEs
   at 1 year after PCI in patients with T2DM. A diagnostic test based on
   this metabolite model is clinically achievable because of the small
   number of metabolites included in our model, and may have a potential
   utility of early identification of type 2 diabetic patients at high
   risk for post-PCI outcomes. Our study also provides the first evidence
   for a critical role of abnormal changes in NAD^+ metabolites in the
   occurrence of adverse outcomes after PCI under diabetic conditions.

Supplementary Information

   [161]12933_2022_1561_MOESM1_ESM.pdf^ (2.4MB, pdf)

   Additional file 1: Supplementary methods. Figure S1. The Z distribution
   of methionine abundance in both discovery and internal validation sets.
   The Z scores of methionine across all samples were greater than -3
   (-3SD), indicating that the serum samples were properly stored for
   metabolite detection. Figure S2. The Pearson correlation between
   metabolic data of quality control samples assessing the reliability of
   the LC-MS analysis in the discovery (A) and internal validation sets
   (B). Figure S3. The OPLS-DA analysis assessing the performance of 35
   differential metabolites for discrimination of MACEs from matched
   controls in the discovery (A) and internal validation sets (B). Figure
   S4. The differences in cumulative rates of MACEs between participants
   with high-risk and low risk scores of the 6-metabolite signatures. P
   values were derived from Cox regression with adjustment for age, sex,
   smoking status, obesity (BMI > 25 kg/m^2), hypertension, HbA1c, LVEF <
   50%, clinical presentations, multivessel CAD, SYNTAX score, and stent
   types. Figure S5. Scatter plots for comparing the predictive
   performance of the random forest (A) and XGBoost (B) models to the
   logistic regression model of the 6-metabolite panel. The models are
   generated using the discovery dataset and presented here in the
   internal validation set. Red lines indicate the cut-offs of random
   forest and XGBoost models; Black lines indicate the cut-off of logistic
   regression model. Black circles label MACEs that would be identified
   using the random forest and XGBoost models but would be missed when the
   logistic regression model is applied. Figure S6. The importance of each
   predictor of the 6-metabolite classifier constructed by random forest
   (A) and XGBoost (B). Abbreviations: NAM, nicotinamide; ADPR, adenosine
   diphosphate ribose; 1-MNAM, 1-methylnicotinamide; PC,
   phosphatidylcholine. Figure S7. Calibration plots for the logistic
   regression and 4 machine learning models of the 6-metabolite classifier
   in the discovery (A) and internal validation sets (B). Abbreviations:
   RF, random forest; XGBoost, extreme gradient boosting; SVM, Support
   Vector Machines; DNN, deep neural network. Figure S8. Backward stepwise
   Cox regression analyses of the association between the 6-metabolite
   classifier and MACEs in the external validation set. (A) The
   log-minus-log plot for graphically testing the proportional hazards
   assumption. (B) Kaplan-Meier curve for assessing the performance of the
   6-metabolite classifier to predict MACEs. In the backward stepwise Cox
   regression analyses, variables including age, sex, smoking status,
   obesity (BMI > 25 kg/m^2), hypertension, HbA1c, LVEF < 50%, clinical
   presentations, multivessel CAD, SYNTAX score, stent types, and the
   6-metabolite classifier were first entered one at a time. Then, 4
   variables with P< 0.10 (i.e. HbA1c, LVEF < 50%, multivessel CAD, and
   the 6-metabolite classifier) in the stepwise procedure were retained to
   fit the final model. The HR and P value were calculated accordingly.
   Table S1. Calibration data for 6 metabolites detected by targeted
   metabolite analyses. Table S2. List of primers used in RT-qPCR. Table
   S3. Ontology classes of metabolites in the discovery and internal
   validation sets. Table S4. 69 differential metabolites in the discovery
   set. Table S5. 89 differential metabolites in the internal validation
   set. Table S6. The additional values of the RF-based 6-metabolite model
   beyond the FREEDOM clinical risk score in the external validation set.

Acknowledgements