Abstract
Serum metabolite profiling in Duchenne muscular dystrophy (DMD) may
enable discovery of valuable molecular markers for disease progression
and treatment response. Serum samples from 51 DMD patients from a
natural history study and 22 age-matched healthy volunteers were
profiled using liquid chromatography coupled to mass spectrometry
(LC-MS) for discovery of novel circulating serum metabolites associated
with DMD. Fourteen metabolites were found significantly altered (1%
false discovery rate) in their levels between DMD patients and healthy
controls while adjusting for age and study site and allowing for an
interaction between disease status and age. Increased metabolites
included arginine, creatine and unknown compounds at m/z of 357 and 312
while decreased metabolites included creatinine, androgen derivatives
and other unknown yet to be identified compounds. Furthermore, the
creatine to creatinine ratio is significantly associated with disease
progression in DMD patients. This ratio sharply increased with age in
DMD patients while it decreased with age in healthy controls. Overall,
this study yielded promising metabolic signatures that could prove
useful to monitor DMD disease progression and response to therapies in
the future.
Introduction
Duchenne muscular dystrophy is an X-linked genetic disease and the most
common childhood neuromuscular disorder, with an incidence of about 1
in 5,000 newborn males [[53]1–[54]3]. The disease is characterized by a
natural history consisting of progressive muscle degeneration, loss of
ambulation by the age of 12, and eventually death by late 20s to early
30s [[55]4, [56]5]. DMD is caused by mutations in the dystrophin gene
resulting in complete loss of the dystrophin protein that maintains
muscle fiber integrity and function [[57]6]. Absence of dystrophin
causes muscle fragility and loss of muscle fibers which are replaced by
fat and connective tissue. While understanding of the disease and
clinical management–including the use of glucocorticoid treatment and
assisted ventilation–has continued to improve over the last few decades
[[58]7, [59]8], development of novel therapies and testing of novel
drugs is still hindered by the small numbers of patients available for
clinical trials and the difficulties in selecting appropriate outcome
measures [[60]9]. As a result, regulatory agencies such as the Food and
Drug Administration (FDA) and European Medicines Agency (EMA) are more
willing to consider surrogate biomarkers as endpoint measures in Phase
II dose-ranging studies. Thus it is important to find and identify
reliable biomarkers associated with disease progression and response to
therapies for DMD patients.
The best-known serum molecular biomarker for DMD is muscle-derived
creatine kinase (CK) [[61]10], which is typically measured by enzymatic
activity. Serum elevations of CK are generally considered a diagnostic
biomarker of DMD, even in presymptomatic infants [[62]11]. However, the
large inter- and intra-person variability in CK levels, partly caused
by the impact of the child’s age and sensitivity to physical activity,
makes it a poor candidate for use as a surrogate biomarker.
Furthermore, CK levels have been shown to be inversely correlated with
disease progression and severity due to the loss of muscle tissue for
more advanced patients, which causes less CK to leak into the blood
stream and be detected [[63]12].
In the last few years, there has been renewed interest in defining
biochemical biomarkers for DMD. These include miRNAs [[64]13–[65]15]
and proteins [[66]12, [67]16–[68]20], but little research has been
dedicated to identifying metabolic biomarkers for DMD [[69]21].
Metabolites are small molecular mass components or intermediate
products of metabolism that can be detected in biofluids and tissues;
they regulate and maintain physiology homeostasis and have various
biological functions. They can be influenced by genetics, as well as by
environmental factors [[70]22]. Metabolites are easily measurable using
high throughput technologies and can be transitioned into clinical
assays. They are widely used in other diseases (e.g. serum creatinine
is a marker of liver function) but had very limited implementation in
DMD. “Metabolomics” refers to the systematic assessment of metabolites.
Metabolic profiles have been used to predict disease risk, to diagnose
disease, or as biomarkers of disease in a variety of disorders,
including diabetes, prostate cancer, and Crohn’s disease
[[71]23–[72]26]. Untargeted metabolomics approaches in DMD can add to
the growing catalogue of protein and miRNA biomarkers which may be used
to monitor disease progression and, as a result, to evaluate
therapeutic response. The present study proposes several such putative
metabolic biomarkers.
Materials and Methods
Study participants and samples
We analysed the circulating serum metabolites of 51 DMD patients and 22
healthy volunteers from 5 different study sites enrolled through the
Cooperative International Neuromuscular Research Group (CINRG) Duchenne
Natural History Study. All the study participants were male. The cases
and controls were age-matched, with some enrichment for older cases.
Summary characteristics of the study participants are provided in
[73]Table 1 and detailed in [74]S1 Table. The DMD patients had a
minimum age of 4, a maximum age of 28.7, and a median age of 11.4
years. The healthy controls had a minimum age of 6, a maximum age of
17.8, and a median age of 13.7 years. The study protocol was approved
by Institutional Review Boards at all participating institutions, and
informed written consent was obtained from participants or their parent
or legal guardian. The Institutional Review Boards were: Conjoint
Health Research Ethics Board (CHREB) (for Alberta Children’s Hospital,
Calgary), UC Davis Institutional Review Board, University of Pittsburgh
Institutional Review Board, Children’s National Medical Center
Institutional Review Board (CNMC IRB) and Executive Committee of the
Sydney Children’s Hospitals Network (SCHN) Human Research Ethics
Committee (HREC) (for Children’s Hospital at Westmead in Australia).
Table 1. Summary characteristics of study participants.
Study site DMD patients Healthy controls
Alberta Children’s Hospital (Calgary) 19 16
University of California, Davis (UC Davis) 26 0
University of Pittsburgh / Children's Hospital of Pittsburgh of UPMC (U
of Pittsburgh) 5 2
Children’s National Medical Center (CNMC) 0 4
Children’s Hospital at Westmead (Australia) 1 0
Median age in years (minimum, maximum) 11.4 (4, 28.7) 13.7 (6, 17.8)
Total by age category
4–7 years 15 2
> 7–11 years 8 3
> 11–18 years 17 17
> 18–29 years 11 0
Total 51 22
[75]Open in a new tab
For both DMD patients and healthy controls, 15 mL of blood was taken
according to the study protocol. The blood was separated into 3 red top
tubes, which were then mixed by 5 complete inversions. Samples were
allowed to clot for 30 minutes at room temperature in a vertical
position. After clotting, the tubes were centrifuged for 10 minutes to
separate the serum, which was carefully collected by avoiding contact
with the blood clot. The serum was then aliquoted into four
polypropylene Cryogenic vials from Thermo Scientific Nalgene. Each
transfer tube of 500 μL was frozen in a secure -80°C freezer until
shipment. Samples were not thawed and refrozen until analysis.
Liquid chromatography-mass spectrometry (LC-MS) analysis
For metabolite extraction, a single technical replicate for each serum
sample was processed by adding 175 μl extraction buffer (A solution of
40% Acetonitrile, 25% Methanol and 35% Water containing internal
standards [10μl of 1mg/ml debrisoquine and 50μl of 1mg/ml 4-
nitrobenzoic acid]) to 25 μl of each serum sample. The samples were
incubated on ice for 10 minutes and centrifuged at 13,000 rpm for 20
minutes at 4°C. The supernatant was transferred to a fresh Eppendorf
vial and dried under vacuum. The dried samples were reconstituted in
200 μl of 5% Methanol, 1% Acetonitrile and 94% water solution. The
samples were re-centrifuged at 13,000 rpm for 20 minutes at 4°C and
supernatant transferred to fresh vials for UPLC- QTof analysis. 2μl of
each sample was injected onto a Waters Acquity CSH C18 1.7 μm, 2.1 ×
100mm column using an Acquity UPLC system by Waters Corporation,
Milford, MA. The gradient mobile phase consisted of Solvent A- 100%
water with 0.1% formic acid, Solvent B- 100% acetonitrile with 0.1%
Formic acid and Solvent D- 9:1 ratio of Isopropanol to Acetonitrile
with 0.1% Formic acid and 10mM Ammonium Formate. To reduce the chance
of possible batch effects, the cases and controls were randomized. Each
sample was run onto the column for 13 minutes at a flow rate of 0.4 ml
per minute. The column temperature was set to 60° C. The gradient
consisted of 97% Solvent A for 3 minutes and then at a ramp of curve 6
to 60% Solvent B from 0.5 to 4 minutes. From 4.0 to 8.0 minutes at a
ramp of curve 6, the gradient moved to 98% of solvent B and at 9
minutes shifting to 5% Solvent B and 95% solvent D for10 minutes. At 11
minutes, the gradient was 25% Solvent A, 25% solvent B and 50% solvent
D at a curve of ramp 6 and then back to initial conditions of 97%
solvent A and 3% solvent B at 13 minutes. The elution from the column
was introduced to Quadrupole Time of flight Mass spectrometer (Waters
G2- Qtof) by electrospray Ionization in both positive and negative mode
at a capillary voltage of 3.5 kV and sampling cone voltage of 35 V. The
source temperature was set to 120° C and Desolvation temperature to 350
°C. The cone gas flow was maintained at 25L/hr and Desolvation gas flow
at 750L/hr. Leucine- Encephalin solution in 50% acetonitrile was used a
reference mass ([M+H]^+ = 556.2771 and [M-H]^− = 554.2615) to maintain
accurate mass. The data was acquired in centroid mode from 50 to 1200
mass range with the software Mass lynx (Waters Corporation). The column
was initially conditioned by multiple injections of pooled quality
controls and then every ten injections subsequently. The mass accuracy
was monitored by injecting a mixture of standard compounds at the
beginning and at the end of the batch.
Data processing
The resulting CDF files were processed together using the XCMS method
[[76]27] to align the peaks and estimate the metabolite intensities for
each sample. The exact steps performed were: feature detection,
retention time correction, alignment of peaks into peak groups, and
imputation of the values for the missing peaks based on the raw files.
Thus, each peak group may have slightly different m/z values in
different samples. The groups which had fewer than 19 peaks detected
across all samples prior to imputation were removed. Henceforth, we
will refer to “peak groups” as simply “peaks.” Prediction of isotopes
was then performed using the CAMERA package [[77]28] and higher-weight
isotopes were removed. This resulted in 313 peaks in the negative mode
and 1892 peaks in the positive mode, including nitrobenzoic acid and
debrisoquine, respectively. Four of the estimated peak intensities,
corresponding to two peaks, were equal to zero; we imputed these as the
minimum non-zero value for that peak across all samples. The
intensities were then normalized to the intensities of the two internal
standards, quantile normalized (separately for the positive and
negative modes), then log transformed (we used a log[2] basis, but the
results do not depend on the basis). Thus, a total of 2203 peaks were
considered in downstream analyses. Individual-level internal standard
normalized, quantile normalized, and log2-transformed peak intensity
values are given in [78]S1 Table.
Statistical and bioinformatic analysis
Variables considered
The primary variable of interest for all the analyses considered below
was DMD disease status. Additional variables considered were: age,
age-by-DMD status multiplicative interaction, and study site. We
included the effect of age, as metabolites related to hormonal and
other physical changes in individuals between childhood and adulthood
are expected to be reflected in metabolomics measurements, even in
healthy controls. Other sources of variation not related to the disease
process might exist. For example, many metabolites show associations
with diet [[79]29], some of which might also change with age. Since DMD
is a progressive disorder, the age trends may be different in cases and
controls, hence the use of the age-by-DMD status interaction term. We
considered the effect of the study site, since demographic
characteristics and environment factors including altitude or latitude
might be different in the populations served by the different centers.
Furthermore, we wished to account for any effects due to technical
variables which were not controlled for between sites and which might
have an impact on the results. For example, issues like personnel
differences could play an important role and can never be fully
eliminated [[80]30].
Global exploratory analyses
A principal components analysis (PCA) was also performed as an
exploratory analysis to check for possible technical artifacts and any
other unusual patterns. After the PCA, 9 linear regression models were
considered for the first principal component (PC1) to perform an
assessment of variables which may have a “global” impact on the
metabolite profiles. These models included as explanatory variables all
possible combinations of DMD status, age, and site, as well as the
possible age-by-DMD status interaction, only considered in models where
both age and DMD status were also included separately. The best-fitting
model in terms of the trade-off between model fit and number of
parameters were chosen by the Akaike Information Criterion (AIC).
Linear regression analyses
The goal of the main statistical analysis was to find metabolites that
are significantly different between DMD patients and healthy controls.
The following linear model was fit to the transformed and normalized
intensities:
[MATH: Met=β0+
mo>β1x1(Status=DMD)+β2xAge+β3x1(Status=DMD)xAge+βsit<
mi>e+Noise,
:MATH]
where β[site] is an intercept for each site with the exception of
Australia, which serves as a baseline.
Equivalently, this means that for DMD patients, the model was:
[MATH: Met=(β0+
mo>β1)
mo>+(β2+
mo>β3)
mo>xAge+βsit<
mi>e+Noise,
:MATH]
while for healthy controls, it was:
[MATH: Met=β0+
mo>β2xAge+βsit<
mi>e+Noise.
:MATH]
As a result, testing the null of no effect of the DMD disease on the
metabolite levels is equivalent, in this framework, to testing whether
both the intercept and the slope for age are the same in the two
groups:
[MATH:
H0
:β1=
mo>β3=
mo>0. :MATH]
We perform this F-test for each of the 2203 metabolite peaks, adjusting
for multiple testing via the Bonferroni threshold, to ensure control of
the familywise error rate (FWER) at a significance level of 0.05 and
via the Benjamini-Hochberg (BH) correction [[81]31]–which estimates
conservative q-values [[82]32]–to ensure control of the false discovery
rate (FDR) at a significance level of 0.01. The significant peaks from
the FDR-adjusted analysis were annotated via the HMDB [[83]22] and
Metlin [[84]33] databases and manually inspected. These peaks were also
considered in receiver operator characteristic (ROC) curve analyses,
which fit a logistic regression model with case/control status as
outcome and age and peak intensity as explanatory variables.
Two further analyses were considered. One subgroup analysis considered
the same linear model, but only on the 62 study participants with ages
4–18 years. This is because the DMD group contained adult patients,
whereas the control group did not. A subgroup analysis was also
performed for the study participants from Calgary, using the model:
[MATH: Met=β0+
mo>β1x1(Status=DMD)+β2xAge+β3x1(Status=DMD)xAge+Noise :MATH]
and performing the analogous F-test. The purpose of this analysis was
to see whether the most significant peaks are also found in a subset
which is expected to be more homogeneous, in terms of both demographics
and sample processing. Note that the Calgary subgroup had the largest
sample size and was also somewhat balanced in terms of cases and
controls ([85]Table 1).
Validation of significant metabolites
For validation studies, eight serum samples were processed and run for
tandem mass spectrometry (MS/MS) on Waters Quadruple time of flight
Mass Spectrometer with ultra-performance liquid chromatography (Waters
G2- Qtof). Samples were processed in the similar manner as in the
profiling experiment and the chromatographic conditions used for data
acquisition remained the same. The MS/MS spectra obtained for the
significant m/z was then used to match with available spectra in online
databases for identification of the biomarkers.
Pathway analyses
We used the web-based resource ConsensusPathDB [[86]34] to perform
pathway over-representation analysis. ConsensusPathDB contains a total
of 4349 pathways from 12 different pathway databases such as Reactome,
Kegg, Signalink, Biocarta, Pharmgkb, Netpath, Smpdb, Inoh,
Wikipathways, Pid, Ehmn and Humancyc. HMDB IDs of the peaks validated
by MS/MS were used to conduct pathway enrichment analysis. For each
pathway, the hypergeometric distribution was used to test for
over-representation of the metabolites in the input list of validated
peaks among the metabolites in the pathway set. A multiple testing
correction was performed using the Benjamini-Hochberg approach to
control the FDR for the pathways that overlapped with at least one of
the input HMDB IDs.
Bayesian network analyses
A Bayesian network analysis was conducted to develop a classifier for
the participants in the study using a minimal set of metabolites, for
both the entire dataset and the Calgary subset. The goal of this
analysis was to identify the minimum set of peaks that classify samples
as either DMD or control. The full data analysis considered age,
normalized peak intensities, and site. Age and the peak intensity
values were discretized using density approximation with 3 bins. The
Markov Blanket network learning algorithm [[87]35] was then applied to
create the network. The Calgary-only analysis considered age and
normalized peak intensities.
We then performed five-fold cross validation using the Markov Blanket
algorithm. The data was split randomly into 5 approximately equal
subsets, with the learning being performed on 80% of the data and the
remaining 20% being used for validation for each of the 5 combinations
of learning/validation sets.
Results
Global exploratory analysis
The top two principal components are shown in [88]Fig 1(A) and 1(B).
The samples in [89]Fig 1 are shape-coded by DMD disease status; in
[90]Fig 1(A) they are also color-coded by age category, whereas in
[91]Fig 1(B) they are color-coded by study site. 23% of the total
variance is explained by the first principal component (PC1) and 6% by
the second principal component (PC2). No clear clusters are observed.
The regression models considered with PC1 as the outcome, along with
the corresponding AIC and relative probability that they minimize the
information loss [[92]36] are presented in [93]S2 Table. The top model
selected via AIC for PC1 had age and study site, with study site being
significant (p = 0.008 from F-test comparing it to model with only age)
and age showing a borderline significant association (p = 0.066). Thus,
both age and study site appear to have an impact on global metabolite
profiles.
Fig 1. PCA plots of the internal standard normalized, quantile normalized,
and log2-transformed intensity data.
[94]Fig 1
[95]Open in a new tab
PC2 is plotted against PC1. Solid circles are DMD patients, empty
circles are healthy controls. In panel a), individuals are color-coded
by age category and in panel b) by study site.
Serum metabolome signature for DMD
Based on the F-tests performed for each of the 2203 peaks, testing for
an effect of disease status in the presence of age and study site and
allowing for an interaction between disease status and age, we find 14
metabolites to be statistically significant using the
Benjamini-Hochberg procedure with q-value (adjusted p-value) ≤ 0.01,
thus controlling the FDR at the 0.01 level. This means that we would
expect the average fraction of false discoveries to be no more than 1%
when repeatedly using this procedure. Of these 14 metabolites, 8 also
showed statistically significant associations at a Bonferroni threshold
of 0.05/2203, therefore controlling the FWER at 0.05. This means that
using a p-value cut-off of 0.05/2203 for rejection for each peak, we
would expect to obtain at least one false discovery (a peak which is
not truly different between cases and controls but for which the null
is rejected) no more than 5% of the time when repeatedly using this
procedure on individuals sampled in the same way. Significant
differences indicate that the fitted age trends for the DMD and control
groups differed in slope and/or intercept, i.e. either there is a
difference between intensity levels in normal and DMD individuals at
baseline, which is maintained at different ages (difference in
intercept); or there is no difference at baseline, but the values
change according to age (difference in slope); or there is a difference
at baseline and the values change according to age as well (difference
in intercept and slope.) We focused the metabolite annotation and
validation efforts on these 14 peaks. Each peak was annotated using the
HMDB and Metlin databases (see [96]S3 Table for the positive mode and
[97]S4 Table for the negative mode). Following MS/MS validation ([98]S1
Fig), the identities of 4 of these peaks were determined very likely to
be: 5a-DHT; creatinine; either epitestosterone sulfate,
dehydroepiandrosterone sulfate, or testosterone sulfate; and creatine,
corresponding to adducts with median m/z values of 369.17, 114.07,
367.16, and 132.08 in the LC-MS experiment. The MS/MS spectra for these
peaks were all in accordance with the most likely database annotations.
The identity of a fifth peak was determined to likely be L-arginine
upon manual inspection; the adduct for this peak has an m/z value of
174.15, which was close to the monoisotopic mass of 174.11 for
L-arginine, however it was not identified in the database annotation.
A subgroup analysis was conducted for the participants aged 4–18 years.
This retained 6 significant peaks (m/z values of 357.25, 449.15,
369.17, 114.07, 132.08, 312.01) at the Bonferroni threshold of
0.05/2203, with no additional peaks significant at the q-value
threshold of 0.01. Five of these peaks (all except the peak at m/z of
449.15) were among the top 14 peaks in the full data analysis. Three of
these peaks were validated as being creatinine, creatine, and 5a-DHT.
The remaining peak was ranked 16^th (q-value = 0.013) in the full data
analysis.
A subgroup analysis was also conducted for the Calgary dataset. This
resulted in 8 significant peaks at the Bonferroni threshold of
0.05/2203, with no additional peaks significant at the q-value
threshold of 0.01. Six (m/z values of 357.25, 369.17, 114.07, 367.16,
451.17, and 209.12) of these 8 peaks were among the top 14 peaks in the
full data analysis, three of them corresponding to the validated
metabolites creatinine, 5a-DHT, and either testosterone sulfate,
epitestosterone sulfate, or dehydroepiandrosterone sulfate. The
remaining 2 peaks at m/z of 223.13 and 175.12 respectively were ranked
15^th (q-value = 0.0101), and 612^th (q-value > 0.99) in the full data
analysis. We note that creatine was only slightly outside of the
threshold deemed significant in the Calgary-only analysis (q-value =
0.016).
[99]Fig 2 plots the normalized intensity values versus age for the top
14 peaks, with the regression lines shown for the Calgary and UC Davis
sites, which had the largest numbers of participants. It is clearly
shown that levels of some metabolites (e.g. creatinine at m/z 114.06)
and the unknown compound at m/z = 312) are not substantially different
between DMD and healthy volunteers at young ages, then their levels
substantially diverge at around 15 years of age. Creatinine was lower
in DMD group relative to control group at all ages with the most
substantial difference seen at an older age (> 15 years old). [100]Fig
3 displays boxplots for these same metabolites for the two groups and
clearly shows substantial differences between DMD and healthy
volunteers at older ages (blue dots).
Fig 2. Plots of the intensity versus age for the top 14 metabolites
associated with DMD status.
[101]Fig 2
[102]Open in a new tab
The intensity levels have been internal standard normalized, quantile
normalized, and log2-transformed. The likely annotations are given for
the validated peaks and the m/z value of the adduct in the LC-MS
experiment is given for the other peaks; the peak labelled testosterone
sulfate may be dehydroepiandrosterone sulfate or epitesterone sulfate.
The points are color- and shape-coded by disease status. The regression
lines obtained from the interaction model are shown for the Calgary
site for DMD cases and controls (solid lines) and for the UC Davis site
for DMD cases (broken line).
Fig 3. Boxplots of the intensity versus age for the top 14 metabolites
associated with DMD status.
[103]Fig 3
[104]Open in a new tab
The intensity levels have been internal standard normalized, quantile
normalized, and log2-transformed. The likely annotations are given for
the validated peaks and the m/z value of the adduct in the LC-MS
experiment is given for the other peaks; the peak labelled testosterone
sulfate may be dehydroepiandrosterone sulfate or epitesterone sulfate.
The points are separated by DMD status and color-coded by age category.
The top 14 metabolites include multiple pairs which have strong
positive or negative correlations ([105]Fig 4). In particular, we note
that creatine and creatinine are negatively correlated (Pearson
correlation = -0.48) while the two testosterone metabolites, 5a-DHT and
testosterone sulfate (or epitestosterone sulfate or
dehydroepiandrosterone sulfate), are strongly positively correlated
(Pearson correlation = 0.92).
Fig 4. Correlation plot of the top 14 metabolites associated with DMD status.
[106]Fig 4
[107]Open in a new tab
The intensity levels have been internal standard normalized, quantile
normalized, and log2-transformed. The likely annotations are given for
the validated peaks and the m/z value of the adduct in the LC-MS
experiment is given for the other peaks; the peak labelled testosterone
sulfate may be dehydroepiandrosterone sulfate or epitesterone sulfate.
We further considered a similar linear regression analysis for the
creatine/creatinine ratio (on the log[2] scale). Once again an F-test
was performed, looking for an effect of disease status in the presence
of age and study site and allowing for an interaction between disease
status and age; the result was highly significant (p = 4 x 10^−13).
[108]Fig 5 shows the plot of the ratio on the log[2] scale against
participants’ ages. We also compared this ratio to the values of serum
CK muscle type (CKM) measured in a recent proteomics study [[109]12].
All 51 of the DMD cases in the current study were also considered in
the proteomics study. Overall, there was a negative correlation between
CKM level and creatine/creatinine in the cases, reflecting a decrease
in CKM activity that is most likely due to loss of muscle mass with age
([110]S2 Fig).
Fig 5. Plot of the creatine/creatinine ratio of intensities on the log[2]
scale versus age.
[111]Fig 5
[112]Open in a new tab
The intensity levels have been internal standard normalized and
quantile normalized. The points are color- and shape-coded by disease
status. The regression lines obtained from the interaction model are
shown for the Calgary site for DMD cases and controls (solid lines) and
for the UC Davis site for DMD cases (broken line).
The top 14 metabolites and the creatine/creatinine ratio all had areas
under the curve (AUC) values greater than 0.5 in the ROC analysis
([113]S3 Fig) 8 of these 15 biomarkers had AUC greater than 0.8,
including creatine, creatinine, creatine/creatinine ratio, and the two
testosterone metabolites.
Pathway analyses for metabolites significantly different between DMD patients
and healthy controls
The pathways in which at least one of the 5 validated peaks,
corresponding to 7 HMDB IDs–due to the uncertainty for the peak with
m/z 367.16 –were considered. Two IDs, corresponding to epitestosterone
sulfate and 5-DHT were not found in the ConsensusPathDB database. The
pathways in which at least one of the metabolites associated with the
remaining 5 HMDB IDs is present are given in [114]S5 Table. Creatine
metabolism is the most significant pathway followed by urea cycle and
metabolism of amino groups, creatine biosynthesis and arginine and
proline metabolism. The validated metabolites present in these pathways
are creatine, creatinine, and L-arginine. In addition to the amino acid
metabolism pathways, hormonal pathways such as 17-beta hydroxysteroid
dehydrogenase III deficiency and androgen and estrogen metabolism were
also significant. These pathways contain testosterone sulfate and
dehydroepiandrosterone sulfate.
Bayesian network analyses
Six peaks were identified as important for classification for the
full-data analysis, two of them being validated as creatinine and
creatine; the remaining 4 peaks were ranked 27^th, 58^th, 88^th, and
1479^th using the F-test. Using only these 6 peaks, the participants in
the study can be classified at 97% accuracy (in-sample classification).
Four of these peaks were identified in at least 4 of the 5 combinations
in the five-fold cross-validation, corresponding to creatine (5 times),
creatinine (5 times), and the peaks ranked 27^th (q-value = 0.04) and
58^th (q-value = 0.10), which have m/z values of 154.06 and 176.04,
respectively (4 times). The peak with m/z = 154.06 is likely to be
annotated to a +Na adduct of creatine or beta-guanidinopropionic acid
based on the HMDB database.
The average classification precision for the five-fold validation was
88%.
The Bayesian network analysis applied to the Calgary subset selected
only two peaks, corresponding to creatine and creatinine. Using only
these two peaks, the samples were classified at 89% accuracy. Five-fold
cross validation identified the creatine and creatinine peaks as the
only peaks being selected in 3 of the 5 combinations. None of the peaks
appeared in more than 3 of the combinations. The total classification
precision for the 5 fold subgroup analysis was 71%. The lower
classification accuracy of the subgroup analysis may be due to the
smaller learning set size.
Discussion
To the best of our knowledge, this is the first serum metabolomics
study of DMD. We considered serum from 51 DMD patients and 22
age-matched controls for discovery of novel metabolic biomarkers for
DMD using LC-MS profiling. After data processing, linear regression
models were fit for each metabolite, considering the associations with
DMD disease status, age, study site, as well as the age-by-DMD status
interaction. We found 14 putative molecular biomarkers that differ in
their levels between DMD patients and healthy controls when age is
taken into consideration. Essentially, as in the DMD proteomics study,
we considered increasing age as a surrogate for disease progression
among the DMD patients [[115]12]. These metabolites included creatine,
creatinine, and testosterone-related steroids, which were confirmed by
MS/MS analysis. Creatinine and creatine were, as expected, inversely
correlated, with creatinine being generally lower and creatine
generally higher in the DMD patients. Creatinine tends to increase with
age in the controls and decrease in the DMD patients, while creatine
tends to decrease in the controls but stay relatively stable in the DMD
patients. Creatinine is a breakdown product from the high energy
metabolite phosphocreatine in muscle. With loss of muscle mass in DMD
there is a decrease in production of creatinine while creatine that is
synthesized by the liver stays at a steady level. These results are
also in agreement with a recent study which considers serum creatinine
in DMD and the less severe Becker muscular dystrophy phenotype, showing
an inverse correlation with a number of measures of disease severity
[[116]37], given that in our study creatinine is higher in controls
compared to cases and shows a slight decrease in cases with age, which
is correlated with disease severity.
We note that the ratio between creatine and creatinine may also serve
as a possible biomarker, as it generally appears to increase in DMD
patients and decrease in controls ([117]Fig 5) and is more
significantly associated with DMD status than either creatine or
creatinine individually. It is apparent that DMD patients are not
metabolizing creatine as they age, most likely due to loss of muscle
mass. These results are also in agreement with the values of serum CKM,
which were measured on all the DMD cases in a recent proteomics study
[[118]12]. While CKM decreases in DMD cases with age, thus leading to
levels more similar to those in healthy controls, which makes it not an
ideal marker of disease progression, the creatine to creatinine ratio
appears to increase in cases and decrease in controls.
Other interesting biomarkers are testosterone-related steroids that
showed an increase with age in both DMD patients and controls. However,
they remain lower in the case group compared to the control group. This
is most probably due to glucocorticoid use, which is known to cause a
decrease in levels of endogenous testosterone and derivatives
[[119]38]. We note that we detected two testosterone-related
derivatives, 5a-DHT and testosterone sulfate (or epitestosterone
sulfate or dehydroepiandrosterone sulfate), which are positively
correlated with each other. Our subgroup analyses–one restricted to
participants aged 4–18 years and one restricted to the Calgary
participants–both confirmed several of the significant peaks from the
all-data results. The Bayesian network analysis confirms the importance
of creatine and creatinine, although it does not select the
testosterone metabolites. We note that our study did not include
reliable recording of glucocorticoid use, because this is a natural
history study and it has been shown that DMD patients may take either
deflazacort or prednisone/prednisolone and that many different dosing
regimens exist [[120]39]. This is certainly an important aspect which
should be investigated in future work. Additional variables such as
muscle mass, dietary intake, disease conditions, and drugs or
supplements taken by the study participants at the time of the blood
draw are also known to impact metabolism and could not be controlled in
the present DMD natural history study, therefore additional studies
considering these variable are desirable and could be achieved with a
well-controlled cohort in the future.
A major challenge we faced and a common one for this type of study
involves separating metabolites related to the disease process from
those that are a result of drugs or dietary supplements used by the
patients. Several compounds were found to have altered levels in DMD
patients compared to controls, but are yet to be identified.
Potential differences between study sites could also be a concern.
These were significant for the first principal component, which is not
surprising, due to possible differences in demographics and
environmental exposures which could not be controlled, although the
sample collection, processing and storage protocols were consistent at
each site. We note as a limitation of this work the fact that the
number of patients in 3 of the five sites was less than 10 and that the
second largest site, UC Davis, only provided cases. We adjusted for
study site in the linear model and performed the Calgary-only subgroup
analysis, which resulted in very similar top hits, to partially address
this issue but are aware that in the future more sites should be
considered, with a larger number of participants per site and more
detailed individual-level demographic and exposure information. A
further limitation is due to the retrospective nature of the
study–meaning that we cannot tell if the change in a metabolite’s
intensity is related to a process which impacts disease progression or
is impacted by disease progression. We are presently collecting
longitudinal samples within a well-controlled cohort in order to answer
these questions and to validate the biomarkers proposed in our current,
natural history, study. Our future work will also be powered to detect
correlations between clinical outcomes which measure disease severity
and metabolic markers. As creatine and creatinine are well-studied
metabolites, the use of more targeted methods to measure them as well
as their ratio in future studies may also provide additional validation
for our results.
Conclusions
We described herein the first comprehensive metabolomic study for DMD.
We believe that this preliminary study represents one of the first
steps towards finding metabolic surrogate biomarkers of disease
progression in DMD patients. Given that the present work considered a
natural history study, important variables could not be controlled. We
are hopeful that future work will include prospective longitudinal
studies with a larger number of variables collected, as well as more
targeted assays, which will provide better insights into temporal
trends and causal mechanisms related to DMD progression. While further
validation of the results from this initial discovery study is expected
to follow from the longitudinal cohort, our current work should provide
valuable insights as the first metabolomic signature of DMD and might
prove to be a useful reference for future metabolomics studies.
Integration with additional types of data–such as genomics, proteomics,
and clinical outcomes–is a further avenue of interest.
Supporting Information
S1 Fig. MS/MS spectra for the top peaks.
(PDF)
[121]Click here for additional data file.^ (236.3KB, pdf)
S2 Fig. Plot of CKM versus the creatine/creatinine ratio in DMD cases,
both on the log[2] scale.
The intensity levels for creatine and creatinine have been internal
standard normalized and quantile normalized. The points are color- and
shape-coded by age category. ρ represents the correlation on the log
scale. Both the overall correlation and the correlations within age
categories are given.
(PDF)
[122]Click here for additional data file.^ (7.1KB, pdf)
S3 Fig. ROC curves for the top 14 peaks from the overall analysis and
for the creatine/creatinine ratio.
A logistic model was fit with case/control status as outcome and age
and peak intensity as explanatory variables. The AUC for each possible
biomarker is given in the title; tpr = true positive rate, fpr = false
positive rate.
(PDF)
[123]Click here for additional data file.^ (15KB, pdf)
S1 Table. Individual characteristics and peak intensities of the study
participants.
Each peak is labelled according to its m/z value, retention time in
seconds, and mode (‘p’ = positive, ‘n’ = negative); for example, peak
M132T37p has an m/z value of 132, a retention time of 37 seconds, and
was detected in the positive mode. The log[2]-transformed CKM values
from the [[124]12] study are also given for the common samples.
(XLSX)
[125]Click here for additional data file.^ (2.2MB, xlsx)
S2 Table. Models considered for PC1.
The 9 linear models considered for PC1. The variables and Akaike
Information Criteria (AIC) are given. Smaller AIC values indicate a
better model fit, while also adjusting for the number of parameters,
with the best model according to the AIC being shown in bold. The
relative probabilities representing the strength of evidence for each
model j compared to the model deemed to be the best are calculated by
exp(-(AIC[j]-AIC[min])/2); for instance, the model including only site
has 0.43 times as much evidence than the model including both age and
site.
(XLSX)
[126]Click here for additional data file.^ (8.1KB, xlsx)
S3 Table. Annotations for the top peaks detected in positive mode from
HMDB and Metlin.
Drugs and non-human metabolites are shown in red. The most likely
annotations are highlighted.
(XLSX)
[127]Click here for additional data file.^ (17KB, xlsx)
S4 Table. Annotations for the top peaks detected in negative mode from
HMDB and Metlin.
Drugs and non-human metabolites are shown in red. The most likely
annotations are highlighted.
(XLSX)
[128]Click here for additional data file.^ (14.7KB, xlsx)
S5 Table. Pathways from ConsensusPathDB in which at least one of the 5
validated peaks is present.
The ‘Size’ column represents the absolute size of the pathway, the
‘Effective size’ column represents the number of set members that are
annotated with an ID of the user-specified ID type (HMDB ID in this
case); the ‘Input overlap’ column represents number of entities from
the user input list that overlap with entities in the pathway based
set. The p-values are obtained from a hypergeometric test of
over-representation. The q-values are calculated using the
Benjamini-Hochberg approach considering just the tests for the pathways
which include at least one validated peak.
(XLSX)
[129]Click here for additional data file.^ (21.7KB, xlsx)
Acknowledgments