Abstract
Moyamoya disease (MMD) is a progressive cerebrovascular disorder that
increases the risk of intracranial ischemia and hemorrhage. Timely
diagnosis and intervention can significantly reduce the risk of
new‐onset stroke in patients with MMD. However, the current diagnostic
methods are invasive and expensive, and non‐invasive diagnosis using
biomarkers of MMD is rarely reported. To address this issue,
nanoparticle‐enhanced laser desorption/ionization mass spectrometry
(LDI MS) was employed to record serum metabolic fingerprints (SMFs)
with the aim of establishing a non‐invasive diagnosis method for MMD.
Subsequently, a diagnostic model was developed based on deep learning
algorithms, which exhibited high accuracy in differentiating the MMD
group from the healthy control (HC) group (AUC = 0.958, 95% CI of 0.911
to 1.000).
Additionally, hierarchical clustering analysis revealed a significant
association between SMFs across different groups and vascular cognitive
impairment in MMD. This approach holds promise as a novel and intuitive
diagnostic method for MMD. Furthermore, the study may have broader
implications for the diagnosis of other neurological disorders.
Keywords: biomarkers, fingerprints, mass spectrometry, moyamoya disease
diagnosis
__________________________________________________________________
Moyamoya disease (MMD) increases stroke risk, but current diagnostic
methods are invasive and costly. To address this, nanoparticle‐enhanced
laser desorption/ionization mass spectrometry was used to analyze serum
metabolic fingerprints, and a deep learning diagnostic model was
developed that achieved high accuracy in distinguishing MMD patients
from healthy controls (AUC = 0.958). This approach holds promise as a
novel and intuitive diagnostic method for MMD.
1. Introduction
Moyamoya disease (MMD) is a chronic progressive cerebrovascular
disorder characterized by bilateral occlusion or stenosis of the
internal carotid artery at the entrance to the circle of Willis.^[ [52]^1
, [53]^2 ^] It is highly prevalent in Asian countries,
particularly Korea, Japan, and China, and its annual incidence is
rising.^[ [54]^3 , [55]^4 ^] The disease primarily manifests as
intracranial ischemia and hemorrhage, maintaining a bimodal age
distribution with peaks at ages 5 and 40.^[ [56]^5 , [57]^6 , [58]^7 ^]
Among European Caucasian MMD patients, 81% report ischemic symptoms at
the initial stage, while 8.5% experience hemorrhagic symptoms.^[ [59]^8
^] The 5‐year stroke risk, encompassing both ischemic and hemorrhagic
manifestations, ranges from 40% to 65% based on natural history studies
of MMD patients.^[ [60]^9 , [61]^10 ^] Pediatric patients are more
prone to intracranial ischemia, while intracranial hemorrhage typically
manifests after 25 years of age, leading to varied neurological
symptoms depending on the bleeding site.^[ [62]^11 , [63]^12 ^]
Rebleeding symptoms, affecting ≈40% of patients, are a significant
cause of mortality, with death rates reaching 28.6%.^[ [64]^13 ^]
Beyond these physical symptoms, MMD patients frequently suffer from
cognitive dysfunction, likely attributable to brain structure damage
and hypoperfusion.^[ [65]^14 ^] Initial studies suggested that 23%–31%
of patients experience significant cognitive impairment.^[ [66]^15 ,
[67]^16 , [68]^17 ^] However, recent research indicates that 79% of
patients exhibit impairment in one or more cognitive domains, with 58%
affected in two or more areas.^[ [69]^18 ^] MMD is associated with a
broad spectrum of cognitive impairment, with severe damage observed in
working memory, attention, and executive function.^[ [70]^19 ^]
Neurocognitive dysfunction induced by MMD, as significant as stroke
events, markedly diminishes quality of life and exacerbates economic
burdens.^[ [71]^20 ^]
Timely surgical intervention not only is an effective treatment for
reducing the long‐term stroke risk but also can reverse hypoperfusion
and mitigate brain microstructure damage, underscoring the importance
of early detection and intervention in improving MMD outcomes.^[
[72]^21 , [73]^22 , [74]^23 ^] Digital subtraction angiography (DSA)
remains the diagnostic gold standard for MMD,^[ [75]^24 , [76]^25 ^]
with an AUC of 0.813 in DSA image models.^[ [77]^26 ^] Despite its
time‐consuming nature and lower spatial resolution, magnetic resonance
imaging/angiography (MRI/MRA) is increasingly used to identify the
disease's major neuroradiological features.^[ [78]^27 ^] These
procedures, however, are invasive, expensive, and require expert
annotation of images, making them unsuitable for mass screening. The
advent of non‐invasive, low‐cost blood tests may revolutionize MMD
detection.
Blood tests are indispensable for the early diagnosis of diseases as
they encompass a plethora of biomarkers capable of providing
disease‐related information, containing metabolic byproducts,
inflammation markers, biochemical indicators, immunoserum markers, and
hormone levels.^[ [79]^28 , [80]^29 , [81]^30 ^] In comparison to
alternative diagnostic approaches, such as tissue biopsies or imaging
tests, blood tests exhibit superior safety, simplicity, and
cost‐effectiveness attributable to their non‐invasive nature,
convenience, and broad applicability.^[ [82]^31 , [83]^32 , [84]^33 ^]
A solitary blood sample can furnish abundant information, facilitating
physicians' assessment of a patient's health status and timely
detection of potential diseases. The significance of blood tests in
early disease diagnosis extends beyond prevalent conditions like
cancer, cardiovascular diseases, and diabetes,^[ [85]^34 , [86]^35 ,
[87]^36 ^] encompassing the vital realm of brain diseases.^[ [88]^37 ^]
Thus, blood tests are capable of uncovering specific genetic variations
or distinct biomarkers of brain diseases, offering a dependable basis
for diagnosing brain conditions.^[ [89]^38 , [90]^39 ^] In conclusion,
blood tests are poised to serve an irreplaceable role in early brain
disease detection, their speed, precision, and cost‐efficiency
rendering them essential in clinical settings.
Metabolites hold the potential to serve as biomarkers for the diagnosis
of brain diseases, given their ability to cross the blood‐brain barrier
(BBB).^[ [91]^40 ^] This characteristic allows for the acquisition of
metabolite information that reflects the state of brain diseases from
the blood, obviating the need for more invasive diagnostic procedures
such as brain tissue biopsies or neuroimaging.^[ [92]^38 , [93]^41 ,
[94]^42 ^] Furthermore, disease conditions can impact the metabolic
activities of brain cells, leading to changes in metabolite levels.^[
[95]^43 ^] The pathophysiological processes underlying brain diseases
can be elucidated by measuring these alterations, aiding in early
diagnosis and monitoring of disease progression. Serum metabolic
fingerprints (SMFs), a type of blood test that employs small molecule
metabolites (such as amino acids, peptides, lipids), have the capacity
to identify specific diseases by analyzing metabolite patterns in the
blood.^[ [96]^35 , [97]^44 , [98]^45 , [99]^46 ^] This non‐invasive
method of testing, characterized by its convenience, speed, and
cost‐effectiveness, can be widely applied in clinical practice and has
been reported in intracranial diseases, including stroke,^[ [100]^36 ^]
acute traumatic brain injury,^[ [101]^47 ^] intracranial aneurysm,^[
[102]^48 ^] and other diseases.^[ [103]^45 ^] Consequently, metabolic
diagnosis plays a crucial role in MMD diagnosis, as it can provide
information about the brain's microenvironment and disease state,
offering support for early diagnosis, treatment selection, and disease
monitoring.
Over the last ten years, analytical methods, including nuclear magnetic
resonance (NMR) and mass spectrometry (MS), have emerged in the field
of SMFs.^[ [104]^49 , [105]^50 , [106]^51 , [107]^52 , [108]^53 ^] NMR
detection has limitations in sensitivity, molecular recognition, and
lengthy detection times due to relaxation.^[ [109]^54 ^] MS, measuring
the mass‐to‐charge ratio (m/z), is traditionally used with
chromatography for sample purification and metabolite enrichment, which
can limit analytical efficiency and throughput.^[ [110]^44 , [111]^55 ,
[112]^56 , [113]^57 ^] In contrast, nanoparticle‐assisted laser
desorption/ionization mass spectrometry (LDI MS) offers fast analysis
(on the order of seconds), low sample volume (0.1–1 µL), simple sample
preprocessing,
high sensitivity, and low cost.^[ [114]^55 , [115]^58 , [116]^59 ^] It
presents a high‐performance technique that could significantly aid SMFs
analysis of MMD. Concurrently, machine learning unveils correlations
and mechanisms between metabolomics and diseases, offering a novel
perspective for disease comprehension and treatment.^[ [117]^60 ,
[118]^61 ^] Machine learning algorithms are capable of identifying
biomarkers linked to specific diseases, thereby enabling early disease
detection and diagnosis.^[ [119]^45 , [120]^62 , [121]^63 , [122]^64 ,
[123]^65 ^] Moreover, machine learning can forecast disease risk and
progression using patient metabolomic data through classification and
predictive models, providing a basis for treatment selection in
personalized medicine.^[ [124]^36 , [125]^66 ^] Nonetheless, challenges
pertaining to model interpretability and generalizability must be
addressed to improve the reliability and precision of machine learning
in metabolomics‐assisted diagnosis.
In this study, we employed nanoparticle‐assisted LDI MS to extract
serum metabolic fingerprints (SMFs) from patients with MMD, aiming for
non‐invasive diagnosis. We first acquired SMFs from 288 serum samples
(144 MMD and 144 HC; Scheme [126]1A) with high reproducibility and low
sample consumption (Scheme [127]1B). Using machine learning, we
distinguished the MMD group from the HC group, achieving an optimal
area under the curve (AUC) of 0.958 (Scheme [128]1C). A cluster
analysis was performed on the group of MMD patients based on
metabolites to examine the variations in cognitive impairment levels
across groups. Our study has propelled the analysis of serum metabolism
in MMD forward and laid the groundwork for future non‐invasive blood
tests for MMD.
Scheme 1.
Schematics for MMD diagnosis based on SMFs and machine learning. A)
Sample collection: collection of serum samples from the MMD group and
HC group. B) Nanoparticle‐assisted LDI MS analysis: serum samples were
arrayed on chip, followed by nanoparticles to obtain raw mass spectra
and serum metabolic fingerprints. C) Processing flow: The MMD group and
HC group were differentiated after conducting machine learning of serum
metabolic fingerprints (SMFs) for signal selection.
2. Results
2.1. Characterization of MMD Specific SMFs
For the analysis of MMD‐specific SMFs, we gathered 288 serum samples,
comprising 144 from patients diagnosed with MMD and 144 from healthy
controls (HC), as depicted in Figure [130]1A. The diagnosis for the MMD
group was confirmed through DSA, with representative DSA images from
the MMD group presented in Figure [131]1B and Figure [132]S1
(Supporting Information). Concurrently, the HC group exhibited no
clinical indications of MMD or any other significant diseases.
Comparative analysis between the MMD and HC groups indicated no
significant differences in age, gender, or Body Mass Index (BMI), with
the P‐value for age exceeding 0.5, an F‐value for sex of 0.906, and a
P‐value for BMI greater than 0.05 (Table [133]S1, Supporting
Information).
Figure 1.
Characterization of MMD‐specific SMFs. A) The distribution of age and
sex among the 144 participants in the MMD group and the 144
participants in the HC group. B) A representative DSA image from an MMD
patient. C) Two typical mass spectra at the m/z range of 100 to 600 are
shown for serum samples of MMD group and HC group. D) Intensities of
four molecular peaks (gray line for [Val + Na]^+ at an m/z of 140.14,
red line for [Lys + Na]^+ at an m/z of 169.09, blue line for [Glu +
Na]^+ at an m/z of 203.05, green line for [Suc + Na]^+ at an m/z of
365.11) for eight replicates per sample. E) A heat map of distinct
metabolic fingerprints for 288 serum samples was constructed using
134 m/z signals obtained through data preprocessing. F) The PCA plot
for the MMD group (depicted as blue points) and HC group (depicted as
green points).
We established a serum metabolic fingerprinting database utilizing
nanoparticle‐assisted LDI MS analysis, which exhibited high
reproducibility and necessitated minimal sample volumes with optimal
nanoparticles (Figure [135]S2, Supporting Information). To determine
the optimal dilution factor, we acquired mass spectra of undiluted
serum and of serum diluted 5‐fold, 10‐fold, and 20‐fold. The 10‐fold
diluted serum gave the best mass spectrum, with significant intensity
differences and more peaks with signal‐to‐noise ratios (S/N) exceeding
3 (Figure [136]S3,
Supporting Information). For stability assessment, we collected mass
spectra from standard serum on days 1, 3, 5, and 7, with no clear
distinction between the four‐day results and similarity analyses
showing more than 96% of scores exceeded 0.9 (Figure [137]S4,
Supporting Information). Regarding detection sensitivity, the efficient
absorption and transfer of nanoparticle‐assisted LDI MS resulted in
enhanced detection sensitivity compared to organic matrices (Figure
[138]S5, Supporting Information).
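The stability check described above can be illustrated with a short sketch. The text does not state which similarity score was used, so cosine similarity is an assumption here, and the spectra are hypothetical intensity vectors on a shared m/z axis.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine similarity between two intensity vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def fraction_above(spectra, threshold=0.9):
    """Fraction of distinct spectrum pairs whose similarity exceeds threshold."""
    scores = [cosine_similarity(a, b) for a, b in combinations(spectra, 2)]
    return sum(s > threshold for s in scores) / len(scores)

# Hypothetical spectra of one standard serum measured on days 1, 3, 5, and 7.
daily_spectra = [
    [10.0, 50.0, 5.0, 20.0],
    [11.0, 48.0, 6.0, 19.0],
    [9.0, 52.0, 5.0, 21.0],
    [10.0, 49.0, 4.0, 22.0],
]
print(fraction_above(daily_spectra, threshold=0.9))
```

A criterion such as "more than 96% of scores exceed 0.9" then corresponds to `fraction_above(...) > 0.96` under this assumed metric.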
Conventional methods like liquid chromatography‐mass spectrometry
(LC‐MS) and gas chromatography‐mass spectrometry (GC‐MS) necessitate
tedious sample pretreatment and prolonged chromatographic separation
(≈40 min) to diminish the complexity of biological samples and enrich
the molecules.^[ [139]^67 , [140]^68 ^] In contrast,
nanoparticle‐assisted LDI MS facilitates size‐selective capture of
metabolites and in‐source cation adduction (e.g., Na^+ and K^+) for
in‐situ pre‐concentration.^[ [141]^35 , [142]^69 , [143]^70 ^] This
allows for the direct detection of metabolites in complex biological
fluids with minimal sample consumption (0.1–1 µL). Moreover, we
attained high analysis speeds (≈5–20 s per sample) owing to the 2000
laser shots fired at a pulse frequency of 1000 Hz when samples on chip
microarrays were analyzed by nanoparticle‐assisted LDI MS.^[ [144]^44 ,
[145]^46 ^] This satisfies the requirements for high‐throughput
detection and provides a high‐performance platform for disease
diagnosis through body fluids, such as blood and urine.
Our methodology exhibited high analytical throughput, generating ample
m/z signals for each metabolic fingerprint. In each sample, we procured
≈124 000 data points in the original MS result, with over 98% of strong
m/z signals acquired within the low mass range (m/z of 100 to 600)
(Figure [146]1C; Figure [147]S6A, Supporting Information).
Additionally, the intensity CVs of the four molecular peaks ([Val +
Na]^+ at an m/z of 140.14, [Lys + Na]^+ at an m/z of 169.09, [Glu +
Na]^+ at an m/z of 203.05, and [Suc + Na]^+ at an m/z of 365.11) in the
standard ranged from 3.04% to 4.10% (Figure [148]1D), indicating the
reliability of SMFs and their potential for further diagnostic
applications.
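For reference, the coefficient of variation (CV) quoted for replicate peak intensities is the sample standard deviation divided by the mean. A minimal sketch with hypothetical replicate intensities (not data from this study):

```python
from statistics import mean, stdev

def cv_percent(intensities):
    """Coefficient of variation (sample SD / mean), expressed in percent."""
    return 100.0 * stdev(intensities) / mean(intensities)

# Hypothetical intensities of one peak across eight replicate measurements.
replicates = [100, 102, 98, 101, 99, 100, 103, 97]
print(round(cv_percent(replicates), 2))  # 2.0
```

The same calculation gives the relative standard deviation (RSD) used for the selected m/z values.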
After data preprocessing, which encompassed binning, smoothing,
baseline correction, peak detection, and alignment, we extracted the
MMD‐specific SMFs containing 134 m/z signals from the original raw mass
spectra (Figure [149]1E). To reduce bias, each sample was measured five
times, using the average to represent the SMFs of the sample. We
selected 10 m/z values to calculate the relative standard deviation
(RSD), which was found to be less than 15% (Figure [150]S6C, Supporting
Information). Furthermore, each batch experiment included five quality
control (QC) samples, and we observed metabolic clustering with the
standard serum collected in each independent experiment batch,
indicating data reliability (Figure [151]S6B, Supporting Information).
Meanwhile, the similarity scores for these batches surpassed 0.95,
indicating the data's stability and reliability (Figure [152]S6D,
Supporting Information). Principal component analysis (PCA) revealed a
degree of separation between the MMD and HC groups (Figure [153]1F).
These findings suggest that the MMD‐specific SMFs obtained by
nanoparticle‐assisted LDI MS can be used for signal selection and for
constructing an MMD diagnosis model that distinguishes the MMD group
from the HC group.
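The preprocessing steps named above (binning, smoothing, baseline correction, peak detection) can be sketched in a few lines. This is a simplified illustration, not the pipeline used in the study: the bin width, the 1-2-1 smoothing kernel, the rolling-minimum baseline, and the noise estimate are all assumptions, and alignment across samples is omitted.

```python
from statistics import median

def bin_spectrum(mz, intensity, lo=100.0, hi=600.0, width=1.0):
    """Sum intensities into fixed-width m/z bins covering [lo, hi)."""
    n_bins = int(round((hi - lo) / width))
    binned = [0.0] * n_bins
    for m, i in zip(mz, intensity):
        if lo <= m < hi:
            binned[min(int((m - lo) / width), n_bins - 1)] += i
    return binned

def smooth(values):
    """Three-point binomial (1-2-1) smoothing; edge values are replicated."""
    padded = [values[0]] + list(values) + [values[-1]]
    return [(padded[k] + 2 * padded[k + 1] + padded[k + 2]) / 4
            for k in range(len(values))]

def subtract_baseline(values, half_window=25):
    """Subtract a rolling-minimum baseline estimate; results are >= 0."""
    out = []
    for k, v in enumerate(values):
        window = values[max(0, k - half_window): k + half_window + 1]
        out.append(v - min(window))
    return out

def detect_peaks(values, snr=3.0):
    """Indices of local maxima higher than snr times the noise level,
    with noise estimated as the median absolute deviation."""
    med = median(values)
    noise = median(abs(v - med) for v in values) or 1e-12
    return [k for k in range(1, len(values) - 1)
            if values[k] > values[k - 1] and values[k] >= values[k + 1]
            and values[k] > snr * noise]

# Toy spectrum (hypothetical): flat background of 10 with two sharp signals.
mz = [100.5 + k for k in range(500)] + [140.14, 365.11]
intensity = [10.0] * 500 + [500.0, 300.0]
processed = subtract_baseline(smooth(bin_spectrum(mz, intensity)))
peaks = detect_peaks(processed)
print([100 + k for k in peaks])  # lower m/z edge of each peak bin: [140, 365]
```

Only the standard library is used so that the sketch stays self-contained; a production workflow would normally rely on a dedicated mass spectrometry package.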
2.2. Construction of a SMFs‐Based Diagnosis Model for MMD
To complement imaging‐based diagnosis, we verified that the metabolite
information in the SMFs, consisting of 134 signals, showed promise as a
blood test to distinguish the MMD group from the HC group. Serum
samples were randomly assigned to a discovery cohort (n = 230; 115 MMD
and 115 HC) and an independent validation cohort (n = 58; 29 MMD and 29
HC), and we constructed an MMD diagnosis model using machine learning
based on MMD‐specific SMFs. At the outset, a power analysis was
conducted to determine the required sample size, targeting a false
discovery rate (FDR) of 0.1. With a sample size of 230 (115/115,
MMD/HC), a power of 0.90 can be reached (Figure [154]S6E, Supporting
Information).
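The cohort assignment can be reproduced in outline as follows. The sample identifiers and the seed are hypothetical, and splitting within each group is an assumption made here to match the balanced 115/115 and 29/29 counts reported above.

```python
import random

def stratified_split(ids_by_group, n_validation_per_group, seed=0):
    """Randomly hold out a fixed number of samples per group for validation."""
    rng = random.Random(seed)
    discovery, validation = [], []
    for group, ids in ids_by_group.items():
        shuffled = list(ids)
        rng.shuffle(shuffled)
        validation += [(s, group) for s in shuffled[:n_validation_per_group]]
        discovery += [(s, group) for s in shuffled[n_validation_per_group:]]
    return discovery, validation

# Hypothetical sample identifiers: 144 MMD and 144 HC sera.
samples = {
    "MMD": [f"MMD_{i:03d}" for i in range(144)],
    "HC": [f"HC_{i:03d}" for i in range(144)],
}
discovery, validation = stratified_split(samples, n_validation_per_group=29)
print(len(discovery), len(validation))  # 230 58
```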
Machine learning has been integrated with metabolomics data to diagnose
various diseases, including stroke,^[ [155]^36 ^] breast cancer,^[
[156]^71 ^] stomach cancer,^[ [157]^72 ^] metabolic syndrome,^[
[158]^73 ^] and cerebral aneurysm.^[ [159]^48 ^] This integration
allows for efficient analysis of metabolomic data, identifying
potential diagnostic markers for clinical use. To test the
classification performance of MMD‐specific SMFs, five algorithms
(Neural Network [NN],^[ [160]^36 , [161]^74 ^] Adaptive Boosting
[AdaBoost],^[ [162]^75 ^] Ridge Regression [RR],^[ [163]^76 ^]
K‐Nearest Neighbor [kNN],^[ [164]^77 ^] and Naïve Bayes [NB]^[ [165]^78
^]), which are suitable for disease diagnosis and classification,^[
[166]^73 , [167]^79 , [168]^80 ^] were chosen for model construction.
We assessed the models using 10‐fold cross‐validation, evaluating
average performance by sensitivity, specificity, and AUC. For the
discovery cohort, the NN algorithm achieved an AUC of 0.945 (95% CI
0.912 to 0.977, Figure [169]2A,C). For comparison, RR gave an AUC of
0.824 (95% CI 0.767 to 0.880, Figure [170]S7A, Supporting Information),
AdaBoost 0.770 (95% CI 0.707 to 0.833, Figure [171]S7B, Supporting
Information), kNN 0.852 (95% CI 0.801 to 0.903, Figure [172]S7C,
Supporting Information), and NB 0.789 (95% CI 0.729 to 0.849, Figure
[173]S7D, Supporting Information) (Table [174]S2, Supporting
Information). Notably, NN performed best among the five algorithms
(p < 0.05, DeLong test, Table [175]S3, Supporting Information),
although the AUCs of the other algorithms for differentiating the MMD
group from the HC group in the discovery cohort all exceeded 0.750.
Furthermore, we obtained consistent results in the blind test on the
independent validation cohort, with an AUC of 0.958 (95% CI 0.911 to
1.000, Figure [176]2B,D; Figure [177]S8, Supporting Information) for
the NN.
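To make the evaluation procedure concrete, the sketch below shows how an AUC can be computed from classifier scores and how samples can be assigned to 10 class-balanced folds. It is a generic illustration with hypothetical scores, not the authors' code; the classifier itself is left out.

```python
import random

def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative
    (ties count as 0.5); equivalent to the area under the ROC curve."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def stratified_folds(labels, k=10, seed=0):
    """Assign each sample a fold index in 0..k-1, balancing classes per fold."""
    rng = random.Random(seed)
    folds = [None] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[i] = pos % k
    return folds

# Hypothetical predicted probabilities for three MMD and three HC samples.
print(round(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]), 3))  # 0.889

# Fold assignment for a discovery cohort of 115 MMD (1) and 115 HC (0).
labels = [1] * 115 + [0] * 115
folds = stratified_folds(labels, k=10)
```

In each round of cross-validation, one fold is held out for scoring while the model is trained on the other nine, and the resulting AUCs are averaged.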
Figure 2.
The machine learning model for MMD screening. A sample‐level scatter
plot stratifies the HC group (depicted in orange) and the MMD group
(depicted in blue) for A) discovery cohort (n = 230; 115/115, HC/MMD)
and B) validation cohort (n = 58; 29/29, HC/MMD). The ROC curves for
five typical algorithms (NN, RR, AdaBoost, kNN, and NB) for diagnosing
the MMD group from the HC group in C) the discovery cohort and D) the
validation cohort. E) Model evaluation of five algorithms, including
accuracy, F1 score, recall, precision, and MCC. F) AUC value of 10
repeated instances for the five algorithms in the validation cohort.
Moreover, we compared the five algorithms in more detail, including
accuracy, F1 score, MCC, recall, precision, and stability, to further
illustrate the superiority of the NN algorithm. The NN algorithm
performed best among the five in accuracy (0.914), F1 score (0.918),
MCC (0.832), recall (0.966), and precision (0.875) (Figure [179]2E;
Table [180]S4, Supporting Information). We also repeated the AUC
evaluation 10 times in the validation cohort. The average AUC of NN was
higher than that of the other four algorithms, and its performance was
the most stable, as the AUC values were highly consistent across the 10
repetitions (Figure [181]2F; Table [182]S5, Supporting Information).
The stability of the NN model was further verified through 20
repetitions of 10‐fold cross‐validation and 20 repeated AUC evaluations
in the validation cohort (Figure [183]S9, Supporting Information).
Notably, the NN algorithm performed significantly better on
MMD‐specific SMFs in distinguishing the MMD group from the HC group.
Compared with commonly used traditional machine learning algorithms,
end‐to‐end learning and multilayer stacked networks contribute to the
excellent performance of NN algorithms.^[ [184]^36 ^] Whereas traditional
machine learning algorithms need to gradually integrate separate signal
extraction and supervised classification processes into reliable
routines, end‐to‐end learning in neural networks combines signal
extraction and disease classification into a single step. This involves
computing convolution and nonlinear transformations of features to
extract high‐level abstractions for final classification within the
network.^[ [185]^81 , [186]^82 ^] For multilayer stacked networks,
locally connected 1D layers and stacked nonlinear signal interaction
layers are built in NN to solve linear and nonlinear problems between
input signals. Compared to a multilayer perceptron, the nonlinear signal
interaction layer, in particular, prevents potential overfitting and
enhances classification performance in scenarios with limited samples.
In comparison, traditional machine learning algorithms such as RR,
AdaBoost, kNN, and NB, while capable of addressing certain nonlinear
complexities, exhibit limited performance in terms of scalability and
flexibility, and are sensitive to noise and outliers in the data.
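The validation-cohort metrics reported above for the NN model follow directly from a confusion matrix. The counts used below (28 true positives, 1 false negative, 25 true negatives, 4 false positives) are not stated in the text; they are inferred here from the reported sensitivity (0.966) and specificity (0.862) with 29 subjects per group, and they reproduce the reported accuracy, F1 score, MCC, recall, and precision.

```python
import math

def classification_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "recall": tp / (tp + fn),            # sensitivity
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "mcc": (tp * tn - fp * fn) / math.sqrt(
            (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }

# Inferred counts (assumption): MMD is the positive class, 29 MMD and 29 HC.
metrics = classification_metrics(tp=28, fn=1, tn=25, fp=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.914, recall: 0.966, specificity: 0.862,
# precision: 0.875, f1: 0.918, mcc: 0.832
```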
2.3. Identification of a Metabolic Biomarker Panel
To further identify the characteristic signals associated with MMD, we
screened signals by three criteria: integrated gradients (IG), fold
change (FC), and t‐test. This allowed us to obtain the MMD‐specific
metabolite panel
(Figure [187]3A). Integrated Gradients (IG) is a method for
interpreting the predictions of deep learning models by determining how
much each signal contributes to a model.^[ [188]^83 , [189]^84 ^] In
this regard, we selected the top 20% signals based on their
contribution degree in the NN algorithm (Figure [190]3B).
Simultaneously, we compared the SMFs of the MMD group and HC group to
identify the characteristic signals with |log₂(FC)| greater than 1.8
and a P‐value less than 0.05 (Figure [191]S10A, Supporting
Information). Furthermore, by matching in the HMDB database, we
cataloged 134 potential metabolites in Table [192]S6 (Supporting
Information), and from this, we identified 6 metabolites that
constitute a specific panel for MMD (Figure [193]3C). These metabolites
exhibited significant differences in the serum between the MMD group
and HC group, with 2 metabolites down‐regulated and 4 metabolites
up‐regulated in the MMD group (Figure [194]3D). To confirm the
stability of these six serum metabolites, we utilized Fourier Transform
Ion Cyclotron Resonance (FT ICR) mass spectrometry and LC MS, which
detected these metabolites with absolute molecular weight deviations
under 150 ppm (Table [195]S7, Supporting Information).
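The three-way screen described above can be sketched as follows. A toy logistic model stands in for the trained NN, integrated gradients are approximated with a midpoint Riemann sum along the straight path from a baseline to the input, and the p-values are supplied as inputs rather than computed. The weights, data, and the ranking by absolute attribution are illustrative assumptions; only the thresholds (top 20%, |log₂(FC)| > 1.8, p < 0.05) come from the text.

```python
import math

def model(x, w, b):
    """Toy differentiable classifier (logistic regression) standing in for the NN."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    totals = [0.0] * len(x)
    for s in range(steps):
        alpha = (s + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        p = model(point, w, b)
        for i, wi in enumerate(w):
            totals[i] += p * (1.0 - p) * wi  # dF/dx_i for the logistic model
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, totals)]

def log2_fold_change(case_values, control_values):
    """log2 of the ratio of group means (both means must be positive)."""
    mean_case = sum(case_values) / len(case_values)
    mean_ctrl = sum(control_values) / len(control_values)
    return math.log2(mean_case / mean_ctrl)

def screen(ig_scores, log2_fc, p_values, top_fraction=0.2, fc_cut=1.8, p_cut=0.05):
    """Indices of signals that pass all three criteria."""
    n_top = max(1, round(top_fraction * len(ig_scores)))
    ranked = sorted(range(len(ig_scores)),
                    key=lambda i: abs(ig_scores[i]), reverse=True)
    by_ig = set(ranked[:n_top])
    by_fc = {i for i, v in enumerate(log2_fc) if abs(v) > fc_cut}
    by_p = {i for i, v in enumerate(p_values) if v < p_cut}
    return sorted(by_ig & by_fc & by_p)

# Hypothetical three-signal example.
w, b = [2.0, -1.0, 0.5], 0.1
x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
ig = integrated_gradients(x, baseline, w, b)
# Completeness check: attributions sum to F(x) - F(baseline).
print(abs(sum(ig) - (model(x, w, b) - model(baseline, w, b))) < 1e-4)  # True
```

The completeness property checked in the last line is what makes the attributions interpretable as each signal's share of the model output.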
Figure 3.
Biomarker panel development. A) The procedure for deriving metabolite
panels via three methods based on SMFs. B) A heatmap representing the
contribution value of 134 signals in IG. C) The intersection of the
signals filtered by the three methods, including integrated gradients
(IG), fold change (FC), and T test. D) The violin plot illustrated the
differential expression of six metabolites between the MMD group
(depicted in red) and HC group (depicted in white), with P‐value (^****
p < 0.0001) indicated at the top of each violin plot. E) The ROC curves
exhibit a higher AUC of 0.861 using the metabolic biomarker panel
compared to a single metabolic biomarker (AUC ranging from 0.682 to
0.837).
To assess the predictive power of the MMD‐specific panel consisting of
6 metabolites, we utilized the MMD diagnosis model to calculate the AUC
value. The panel exhibited an improved diagnostic AUC (0.861),
surpassing that of a single metabolic biomarker (AUC value of
0.682–0.837) (Figure [197]3E; Figure [198]S10B and Table [199]S8,
Supporting Information). The performance of the MMD‐specific panel
suggests its potential as a promising biomarker for the clinical
diagnosis of MMD. Additionally, we conducted a search for metabolites
associated with brain diseases such as epilepsy and Alzheimer's, which
have been previously reported in HMDB, using the m/z values of
metabolic signals (Tables [200]S6 and [201]S7, Supporting
Information).^[ [202]^85 , [203]^86 ^] Furthermore, to validate the
repeatability of the six metabolic indicators, we collected 86 samples
from three centers as an independent external validation set with no
obvious separation of the three‐center samples (Table [204]S9 and
Figure [205]S10C, Supporting Information). The AUC reached 0.867 based
on the 6 metabolic signals, with specificity 0.892 and sensitivity
0.735, indicating that the six metabolic indicators can be clinically
applied (Figure [206]S10D,E, Supporting Information). Finally, during
pathway enrichment analysis using the Kyoto Encyclopedia of Genes and
Genomes (KEGG) pathway library, we investigated the potential
biological relevance and metabolic pathways involving these six
metabolic signals. There are 7 pathways involved, including taurine and
hypotaurine metabolism, vitamin B6 metabolism, nicotinate and
nicotinamide metabolism, pantothenate and CoA biosynthesis,
beta‐Alanine metabolism, pyrimidine metabolism, and primary bile acid
biosynthesis (Figure [207]S11A, Supporting Information). Furthermore,
the pathways including 1) taurine and hypotaurine metabolism (PI of
0.429), 2) vitamin B6 metabolism (PI of 0.078), and 3) pyrimidine
metabolism (PI of 0.053) have a significant pathway impact (PI) > 0.05
and a hit number (the number of matched metabolites in the pathway) ≥1
(Table [208]S10, Supporting Information).
Utilizing biofluid biomarkers for MMD diagnosis could revolutionize
clinical practice, particularly as non‐invasive, cost‐effective liquid
biopsy techniques like blood tests are currently underexplored in MMD
clinical diagnosis. Currently, traditional imaging techniques such as
MRI and DSA remain the primary diagnostic tools for MMD, despite their
inherent risks, including potential anaphylaxis and nephropathy from
contrast medium use.^[ [209]^27 ^] Furthermore, MMD diagnosis via
imaging necessitates a seasoned physician specializing in intracranial
diseases, posing a significant barrier for many patients residing in
smaller cities. In contrast, assessing biomarker content via blood
tests offers a convenient, non‐invasive, and intuitive method for MMD
diagnosis.
A few proteomic and genomic studies have reported protein and nucleic
acid changes associated with MMD.^[
[210]^87 , [211]^88 , [212]^89 ^] Meanwhile, small‐molecule
metabolites are rarely used in MMD diagnosis, largely because existing
studies have had limited sample sizes (20–45). In comparison, our
study, based on a
well‐structured cohort, demonstrated superior diagnostic performance
with an AUC of 0.958 (95% CI of 0.911 to 1.000), sensitivity of 0.966,
and specificity of 0.862, indicating promise for MMD diagnosis in
clinical settings.
2.4. Differentiation of Cognitive Impairment by Metabolic Grouping
To elucidate the clinical significance of metabolites in MMD, we
explored the correlation between metabolite patterns and cognitive
functions reflecting MMD severity. Hierarchical clustering analysis
revealed a natural division of MMD metabolite patterns into three
distinct groups (Figure [213]4A). We retrospectively assessed
neuropsychological performance in 92 patients (Table [214]S11,
Supporting Information), evaluating neurocognitive functions including
global cognition, verbal memory, attention, executive function,
visuospatial ability, and language (Figure [215]S11B,C; Tables [216]S12
and [217]S13, Supporting Information).^[ [218]^90 ^] Each
neuropsychological scale's scores were represented using rank‐sum.
Interestingly, we found that Group 3's neuropsychological performance
was significantly inferior to that of Groups 1 and 2 (Kruskal–Wallis
test, p < 0.001), with no significant difference observed between
Groups 1 and 2 (Figure [219]4B,C). This difference persisted both in
the data of 134 metabolites and on the MMD‐specific panel of 6
metabolites, with Group 3 consistently showing significantly lower
neuropsychological performance than Groups 1 and 2 (Figure [220]S12A,
Supporting Information). Additionally, a significantly higher
proportion of patients in Group 3 were diagnosed with vascular
cognitive impairment (VCI), including mild VCI and vascular dementia
(VaD), compared to Groups 1 and 2 (Kruskal–Wallis test, p = 0.002)
(Figure [221]4D; Figure [222]S12B, Supporting Information). These
results suggest that the metabolite patterns in Group 3 are associated
with poor neuropsychological performance and VCI.
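The group comparison above relies on the Kruskal–Wallis test. The sketch below computes the H statistic with tie correction; because three groups give two degrees of freedom, the chi-square p-value has the closed form exp(−H/2), which keeps the example free of external libraries. The scores are hypothetical and chosen so the result can be checked by hand.

```python
import math

def kruskal_wallis_3(group1, group2, group3):
    """Kruskal-Wallis H (tie-corrected) and chi-square p-value for three groups."""
    groups = (group1, group2, group3)
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    ranks, tie_term, k = {}, 0, 0
    while k < n:
        j = k
        while j < n and pooled[j] == pooled[k]:
            j += 1
        ranks[pooled[k]] = (k + 1 + j) / 2  # mean of 1-based ranks k+1..j
        tie_term += (j - k) ** 3 - (j - k)
        k = j
    h = 12.0 / (n * (n + 1)) * sum(
        sum(ranks[v] for v in g) ** 2 / len(g) for g in groups) - 3 * (n + 1)
    h /= 1 - tie_term / (n ** 3 - n)
    return h, math.exp(-h / 2)  # survival function of chi-square with 2 df

# Hypothetical neuropsychological scores for metabolic Groups 1, 2, and 3.
h, p = kruskal_wallis_3([7, 8, 9], [4, 5, 6], [1, 2, 3])
print(f"H = {h:.2f}, p = {p:.4f}")  # H = 7.20, p = 0.0273
```

For a different number of groups the closed-form p-value no longer applies and a general chi-square survival function is needed.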
Figure 4.
The association between metabolite pattern and cognitive function. A)
The outcome of hierarchical clustering analysis. B) The heatmap of the
rank‐sum of neuropsychological scores indicates that Group 3 exhibited
the poorest performance. C) Group 3 demonstrated the lowest
neuropsychological score in comparison to Group 1 (^**** p < 0.0001)
and Group 2 (^**** p < 0.0001). D) Group 3 comprised the highest
proportion of mild Vascular Cognitive Impairment (VCI) and Vascular
Dementia (VaD).
Currently, VCI diagnosis primarily depends on comprehensive evaluations
by professional neuropsychologists, incorporating clinical
manifestations, neuropsychological assessments, and neuroimaging
findings.^[ [224]^91 ^] The early stages of mild VCI can often be
elusive without the use of neuropsychological tests. However, the
diagnostic efficacy of neuropsychological tests can be influenced by
factors such as age,^[ [225]^92 ^] educational background,^[ [226]^93
^] cultural differences,^[ [227]^94 ^] and test‐retest effects.^[
[228]^95 ^] Thus, employing serum biomarkers could offer a promising
new approach for screening cognitive impairment. In this study, we
discovered a strong correlation among specific MMD metabolite patterns,
neuropsychological performance, and the incidence of VCI. The
underlying pathophysiological mechanisms remain elusive, potentially
involving metabolites released from dysfunctional neurons and/or
neuroglial cells, which then traverse the BBB and are detected in
peripheral circulation.
To date, several studies have reported potential peripheral biomarkers
for identifying VCI. However, most of these studies were conducted
between VCI and HC, where significant biases may arise due to primary
diseases like stroke or atherosclerosis.^[ [229]^96 , [230]^97 ^] In
contrast to those studies, our research was conducted within the MMD
group, effectively eliminating bias from primary diseases. These
findings offer new insights into the role of serum metabolite
biomarkers in identifying VCI in MMD patients, which is strongly
associated with MMD severity.
3. Conclusion
In summary, we achieved rapid sample detection using the
nanoparticle‐assisted LDI MS platform. Coupled with machine learning,
we successfully diagnosed MMD patients with an AUC of up to 0.958, and
based on this, we screened to obtain MMD‐related metabolic panels.
Finally, we analyzed the cognitive status of the patients with MMD and
discovered a strong correlation between specific MMD metabolite
patterns, neuropsychological performance, and the incidence of VCI. Our
work could potentially contribute to the diagnosis of MMD patients via
clinical blood tests. However, there are certain limitations to our
work that should be acknowledged. First, the compounds have not yet been
characterized, nor has their functionality been verified in
vitro or in vivo. Second, our metabolic panels have not yet been
integrated with imaging; constructing such a multimodal database could
potentially enhance clinical practice.
4. Experimental Section
Chemicals and Reagents
Chemicals and reagents in this work included standard small metabolites,
reagents to prepare nanoparticles, and organic matrices. For standard
small metabolites, L‐valine (Val, 98%), D‐glucose (Glu, 99.5%),
L‐arginine (Arg, 99.5%), L‐leucine (Leu, 98%), and sucrose (99.5%) were
purchased from Sigma–Aldrich (St. Louis, MO, USA). For reagents to
prepare nanoparticles, trisodium citrate dihydrate (99%), ferric
chloride hexahydrate (97%), ethylene glycol (99.5%), and anhydrous
sodium acetate (99%) were purchased from Sinopharm Chemical Reagent
Beijing Co., Ltd. (Beijing, China). For organic matrices,
α‐cyano‐4‐hydroxy‐cinnamic acid (CHCA) and 2,5‐dihydroxybenzoic acid
(DHB) were purchased from Bruker (Bremen, Germany). Trifluoroacetic
acid (TFA, 99.5%) was purchased from Macklin Biochemical Co., Ltd.
(Shanghai, China). Acetonitrile (ACN, 99%) was purchased from Aladdin
Reagent (Shanghai, China). LC‐MS grade acetonitrile (ACN) and HPLC grade
methanol (MeOH) were obtained from Honeywell (Houston, TX, USA). All
chemicals and reagents above were used directly without further
purification or enrichment. All aqueous solutions were prepared using
deionized water (18.2 MΩ·cm, Milli‐Q, Millipore, GmbH).
Clinical Subjects and Sample Harvesting
This study was approved by the institutional review boards of Huashan
Hospital, Fudan University (reference number: KY2015‐256 and
KY2022‐821), and all participants or their guardians signed the
informed consent. A total of 288 blood samples from MMD patients and
healthy controls were collected between September 2020 and October 2022.
All MMD patients underwent digital subtraction angiography (DSA), and the
diagnosis was confirmed by two senior neurosurgeons. Healthy controls
were recruited from the Health Examination Center of Huashan Hospital,
Fudan University. Additionally, from November 2023 to January 2024, 86
samples were collected for an external validation cohort, including 49
MMD samples from three centers: Center A (Huashan Hospital), Center B
(Liaocheng People's Hospital), and Center C (The First Affiliated
Hospital of Fujian Medical University). Blood samples were
obtained at the same time as regular blood tests after overnight
fasting and then centrifuged for 10 min (1500 g, 4 °C). Serum samples
were aliquoted into sterile centrifuge tubes and stored in a −80 °C
freezer. The results presented are derived solely from all qualified
samples suitable for mass spectrometry analysis, and no samples were
discarded during the course of the project.
Assessment of Cognitive Impairment in Clinical Subjects
A battery of neuropsychological assessments and the diagnosis of vascular
cognitive impairment were performed by three professional
neuropsychologists. The battery included the Mini‐Mental State Examination (MMSE),
Memory and Executive Screening (MES),^[ [231]^98 ^] Symbol Digit
Modalities Test (SDMT),^[ [232]^99 ^] Trail Making Test A and B (TMT‐A
and B),^[ [233]^100 ^] Chinese auditory verbal learning test (AVLT),^[
[234]^101 ^] Animal/Vegetable/Alternation Verbal Fluency Test (VFT),^[
[235]^102 ^] Boston Naming Test (BNT),^[ [236]^103 ^] Clock Drawing
Test (CDT)^[ [237]^104 ^] and Rey‐Osterrieth Complex Figure Test
(CFT).^[ [238]^105 ^] The diagnostic criteria for VCI were based on the
Guidelines from the Vascular Impairment of Cognition Classification
Consensus Study (VICCCS).^[ [239]^91 ^]
Nanoparticle‐Assisted LDI MS Experiments
Nanoparticle‐assisted LDI MS experiments on standard small metabolites
and serum samples of MMD patients were conducted on Bruker
systems equipped with Nd:YAG lasers (355 nm), namely Autoflex (Time of
Flight‐Mass Spectrometry, TOF‐MS) and Solarix 7.0T (Fourier Transform‐Ion
Cyclotron Resonance‐Mass Spectrometry, FT‐ICR‐MS), using prepared
nanoparticles and organic matrices for the laser desorption/ionization
(LDI) process.
Typically, the prepared nanoparticles were suspended in deionized water
at a 1.0 mg mL^−1 concentration. CHCA and DHB were dissolved in
0.3% TFA aqueous solution/ACN (2/1, v/v) to prepare saturated and
10 mg mL^−1 solutions. In LDI MS experiments, 1 µL of analyte solution
(prepared standard small metabolites or samples) was first spotted on
the polished steel plate and dried, followed by deposition of 1 µL of
nanoparticle suspension or organic matrix solution and drying in air at
room temperature. Standard small metabolites were utilized for precise
mass measurement during mass calibration, and all MS experiments were
performed in the positive ion mode. Each analysis was conducted with a
pulse frequency of 1000 Hz and 2000 laser shots. The acceleration
voltage was set as 20 kV, and the delay time was optimized to 200 ns.
For internal quality control in each batch of testing, a nine‐grid
spotting method was employed during the sample analysis process. In
this 3 × 3 matrix, the center point served as the quality control spot
(composed of various standard molecules), where calibration was
performed before collecting the metabolic spectra of the other eight
samples in the matrix. To address instrument detection biases, each
sample point was measured five times in each batch test, and the average
of these five measurements was used to obtain stable spectral values for
further analysis.
LC‐MS Experiments
To prepare serum, 400 µL of a methanol: acetonitrile mixture (1:1 v:v)
was added to a 100 µL serum sample and vortexed for 60 s. The mixture
was then incubated at −20 °C for 2 h to facilitate protein
precipitation. Following incubation, the sample was centrifuged at
13 000 rpm for 15 min at 4 °C, and the supernatant was collected and
evaporated to dryness using a vacuum concentrator. The dry extracts
were subsequently resuspended in 100 µL of a 3:7 methanol: water
solution, after which the supernatant was collected and stored at −80
°C. Analysis of the supernatant was performed using HPLC‐MS/MS on a
TripleTOF 7600 plus mass spectrometer (AB SCIEX, USA) coupled with an
Agilent 1290 liquid chromatography system (Agilent, USA). For LC
separation, the ACQUITY UPLC BEH C18 column (100 mm × 2.1 mm i.d.,
1.7 µm; Waters) was utilized. The mobile phase consisted of A) formic
acid in water (containing 25 mM ammonium acetate and 25 mM ammonia) and
B) formic acid in acetonitrile (0:100, v/v). The flow rate was
maintained at 0.5 mL min^−1, with an injection volume of 2 µL and the
column temperature set to 40 °C. The chromatographic gradient was as follows:
0–0.5 min at 95% A; 0.5–7 min from 5% to 35% A; 7–8 min from 35% to 60%
A; 8–9 min at 60% A; 9–9.1 min from 60% to 5% A; and 9.1–12 min at 5%
A. Electrospray ionization MS was conducted in positive/negative ion
modes. Information dependent acquisition (IDA) was employed to
simultaneously gather full scan MS and MS/MS data. Mass data were
collected within the m/z range of 60 to 1200 Da, with ion spray
voltages set to 5000 V (positive mode) and 4000 V (negative mode), and
a heated‐capillary temperature maintained at 600 °C. The flow rates for
the curtain gas, nebulizer, and heater gas were established at 35, 60,
and 60 arbitrary units, respectively, with the collision energy set to
30 V. The raw data generated were imported into Progenesis QI (version
2.2, Waters, USA) for peak detection, chromatogram deconvolution,
alignment, and normalization. The coefficient of variation (CV) values
for QC samples were calculated, and features with CVs less than 30% were
retained for further analysis. Subsequently, metabolites were
identified by comparing the detected MS/MS spectra with information
from the Human Metabolome Database (HMDB).
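The QC-based filtering step described above can be sketched as follows. This is a minimal illustration only: the feature names and intensities are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state how Progenesis QI or the authors computed the CV.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation (sample SD / mean) as a percentage."""
    m = mean(values)
    if m == 0:
        return float("inf")  # CV is undefined; treat as failing the filter
    return 100.0 * stdev(values) / m


def filter_by_qc_cv(qc_intensities, threshold=30.0):
    """Keep the features whose CV across QC injections is below threshold."""
    return [name for name, vals in qc_intensities.items()
            if cv_percent(vals) < threshold]


# Hypothetical intensities of two features across four QC injections.
qc = {
    "feature_A": [100.0, 105.0, 98.0, 102.0],  # stable across QC runs
    "feature_B": [100.0, 40.0, 180.0, 90.0],   # unstable across QC runs
}
kept = filter_by_qc_cv(qc)
```

Only features that pass this filter would then be carried forward to identification against HMDB.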
Data Preprocessing
The mass spectrometry data preprocessing workflow included binning,
smoothing, baseline correction, peak detection, and
alignment. Spectrum smoothing was performed using a 1D Gaussian filter
(sigma = 1) to eliminate noise. To reduce complexity, spectra were
down‐sampled by binning with a window size of
0.05 Da. Baselines were corrected using the white top‐hat
operation, a morphological transformation, to eliminate background
noise. Peak extraction and spectrum alignment were used to address
inherent variations in mass‐to‐charge (m/z) ratios across
individual sample spectra. During peak extraction, the raw spectral data
from each sample were processed to distinguish signal from noise, and a
local maxima operation was performed to extract the final metabolic
signals. Spectrum alignment then adjusted the m/z positions so that
identical peaks in different spectra coincided, giving each signal
(identified peak) a consistent representation across the dataset and
enabling accurate cross‐sample comparisons. As a result of
preprocessing, the positions of 134 signals across 288 samples were
standardized, ensuring uniform m/z positions while allowing
signal intensities to vary among samples.
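A minimal pure-Python sketch of the per-spectrum steps named above (binning, Gaussian smoothing, white top-hat baseline correction, and local-maxima peak picking) is given here for orientation. It is not the authors' code: the top-hat window size, the edge handling, the peak-height threshold, and the synthetic spectrum are all assumptions, and cross-sample alignment is not shown.

```python
import math


def bin_spectrum(mz, intensity, width=0.05):
    """Down-sample by summing intensities into fixed-width m/z bins."""
    start = min(mz)
    n_bins = int((max(mz) - start) / width) + 1
    binned = [0.0] * n_bins
    for m, i in zip(mz, intensity):
        binned[min(int((m - start) / width), n_bins - 1)] += i
    return binned


def gaussian_smooth(y, sigma=1.0):
    """1D Gaussian filter; edges are handled by repeating the end values."""
    radius = int(4 * sigma + 0.5)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2)
              for k in range(-radius, radius + 1)]
    total = sum(kernel)
    n = len(y)
    return [
        sum(w * y[min(max(i + k - radius, 0), n - 1)]
            for k, w in enumerate(kernel)) / total
        for i in range(n)
    ]


def _sliding(y, half, func):
    """Apply func (min or max) over a centered window, truncated at edges."""
    n = len(y)
    return [func(y[max(i - half, 0):min(i + half + 1, n)]) for i in range(n)]


def white_tophat(y, size=9):
    """Signal minus its morphological opening (erosion, then dilation)."""
    half = size // 2
    opened = _sliding(_sliding(y, half, min), half, max)
    return [a - b for a, b in zip(y, opened)]


def local_maxima(y, min_height=0.0):
    """Indices above the left neighbor, not below the right, and above min_height."""
    return [i for i in range(1, len(y) - 1)
            if y[i] > y[i - 1] and y[i] >= y[i + 1] and y[i] > min_height]


# Synthetic binned spectrum: flat background of 5 with one peak at index 10.
raw = [5.0] * 20
raw[10] = 105.0
corrected = white_tophat(gaussian_smooth(raw, sigma=1.0), size=9)
peaks = local_maxima(corrected, min_height=1.0)
```

The top-hat window must be wider than the broadest real peak; otherwise the opening follows the peak itself and the correction removes signal along with the baseline.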
Machine Learning
Machine learning in this work, comprising model building, signal
selection, parameter tuning, and validation, was conducted in Python
version 3.9 and Orange3‐3.37.0 ([240]https://github.com/biolab/orange3).
During model building, five machine learning algorithms, namely Ridge
Regression (RR), Adaptive Boosting (AdaBoost), K‐Nearest Neighbor (kNN),
Naïve Bayes (NB), and Neural Network (NN), were trained using 10‐fold
cross‐validation repeated 10 times. The code for these five
algorithms can be accessed through the following links: RR
([241]https://github.com/biolab/orange3/blob/master/Orange/classification/logistic_regression.py),
AdaBoost
([242]https://github.com/biolab/orange3/blob/master/Orange/modelling/ada_boost.py),
kNN
([243]https://github.com/biolab/orange3/blob/master/Orange/modelling/knn.py),
NB
([244]https://github.com/biolab/orange3/blob/master/Orange/classification/naive_bayes.py),
and NN ([245]https://github.com/Alltnl/MMD). This
data is available at the NIH Common Fund's National Metabolomics Data
Repository (NMDR) website, the Metabolomics Workbench,
[246]https://www.metabolomicsworkbench.org, where it has been assigned
Project ID ST003368. The data can be accessed directly via its Project
DOI: 10.21228/M8B83W.
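To make the resampling scheme concrete, the sketch below generates index splits for 10‐fold cross‐validation repeated 10 times over 288 samples. It is an illustration under stated assumptions, not the Orange3 implementation: the function name `repeated_kfold` and the seed are hypothetical, and the splits are unstratified for brevity, since the text does not say whether stratification was used.

```python
import random


def repeated_kfold(n_samples, n_splits=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for rep in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh shuffle for every repetition
        folds = [order[f::n_splits] for f in range(n_splits)]
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f
                         for i in fold]
            yield rep, f, train_idx, test_idx


# 288 samples, as in the discovery cohort described in the text.
splits = list(repeated_kfold(n_samples=288))
```

Each repetition partitions the cohort into ten disjoint test folds, so every sample is held out exactly once per repetition and ten times overall.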
For the Neural Network (NN), the initial input was standardized to a
12D vector (x_input). For the recognition of serum metabolic
fingerprints (SMFs), this input comprised 134
mass‐to‐charge ratio (m/z) signals (x_spectral). To augment the
networks' adaptability, the remaining features (10 for SMFs
recognition) were assigned a value of 0. The architecture of these
networks was rooted in deep neural networks (DNN), featuring two
primary components: a feature extraction part (feature_extract) and a
non‐linear feature interaction layer (feature_interaction). After
undergoing extraction and interaction processes, the reorganized
features were fed into a classification layer equipped with a Softmax
function. This layer was responsible for generating the output
probabilities for the classification tasks.^[ [247]^36 ^]
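The forward pass described above (zero‐padded input, a feature extraction stage, a non‐linear interaction layer, and a Softmax classification layer) can be sketched in plain Python as follows. The names x_input, x_spectral, feature_extract, and feature_interaction come from the text; everything else is assumed. The layer widths are toy values, the weights are random and untrained, and dropout and training are omitted, so this shows only the data flow and not the published network.

```python
import math
import random


def pad_input(x_spectral, n_total):
    """Place the spectral signals first and fill unused features with 0."""
    return list(x_spectral) + [0.0] * (n_total - len(x_spectral))


def relu(v):
    return max(v, 0.0)


def dense(x, weights, bias, activation=None):
    """Fully connected layer: one weight row and one bias per output unit."""
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(weights, bias)]
    return [activation(v) for v in out] if activation else out


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def random_layer(n_out, n_in, rng):
    """Untrained placeholder weights with zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
n_total, n_hidden = 8, 4                        # toy sizes, not the paper's
w1, b1 = random_layer(n_hidden, n_total, rng)   # feature_extract
w2, b2 = random_layer(n_hidden, n_hidden, rng)  # feature_interaction
w3, b3 = random_layer(2, n_hidden, rng)         # classification layer

x_input = pad_input([0.2, 0.0, 1.3, 0.7, 0.1, 0.9], n_total)
h = dense(x_input, w1, b1, relu)
h = dense(h, w2, b2, relu)
probs = softmax(dense(h, w3, b3))               # two class probabilities
```

Because the padded positions are always 0, they contribute nothing to the first layer's output; they only reserve input slots that other tasks could fill.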
For validation, all the machine learning algorithms were evaluated in
an independent validation cohort.^[ [248]^106 ^] The stage prediction
task was considered a binary classification. All included machine
learning algorithms output a normalized probability of advanced stage
from 0 to 1. For each machine learning model, the parameters were as
follows: Ridge Regression: alpha = 0.140; AdaBoost: base estimator =
Tree, number of estimators = 60, learning rate = 1.0, classification
algorithm = SAMME.R, regression loss function = Linear; kNN: number of
neighbors = 5, weight = uniform, metric = Euclidean; Naïve Bayes: none;
Neural Network: batch_size_cv = 32, epochs_cv = 200, drop_rate = 0.25.
During signal selection, the Integrated Gradients algorithm was used to
quantify the contribution of each signal in the NN algorithm. It
attributes importance to each input by integrating the gradients of the
model output along a path from a baseline input to the actual input,
which captures the contribution of non‐zero gradients in the unsaturated
region.^[ [249]^83 ^]
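As an illustration of how Integrated Gradients assigns a contribution to each input signal, the sketch below applies it to a toy logistic model with an analytic gradient. The model, its weights, the all‐zero baseline, the straight‐line path, and the number of integration steps are all assumptions made for the example; the published NN and its attribution settings are not reproduced here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy differentiable classifier: a logistic model on the input signals."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def gradient(x, w, b):
    """Analytic gradient of the model output with respect to each input."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann sum of gradients on the straight path baseline -> x."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b0 + alpha * (xi - b0) for xi, b0 in zip(x, baseline)]
        for i, g in enumerate(gradient(point, w, b)):
            total[i] += g
    return [(xi - b0) * t / steps for xi, b0, t in zip(x, baseline, total)]


w, b = [1.5, -2.0, 0.0], 0.1           # hypothetical weights and bias
x, baseline = [1.0, 0.5, 3.0], [0.0, 0.0, 0.0]
attr = integrated_gradients(x, baseline, w, b)
```

A useful sanity check is the completeness property: the attributions sum, up to integration error, to the difference between the model output at the input and at the baseline. A signal with zero weight, such as the third one here, receives zero attribution however large its intensity.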
Power Analysis
Power analysis (FDR = 0.1, power = 0.9) was conducted on MetaboAnalyst
5.0 ([250]https://www.metaboanalyst.ca/). The specifics of this method
are as follows: Assume a set of test statistics measuring differential
expression follows a normal distribution N (µ, σ^2). Under the null
hypothesis H[0] (no differential expression), the mean µ = 0; under the
alternative hypothesis H[1] (differential expression), the mean µ ≠ 0.
The cumulative distribution functions (CDF) of the test statistics
under H[0] and H[1] are denoted as K and L, respectively. The observed
test statistics' mixed CDF is given by:
$$M(t) = \pi_0 K(t) + (1 - \pi_0) \int_{-\infty}^{+\infty} L(t, \theta)\, \lambda(\theta)\, d\theta \tag{1}$$
where λ represents the density of effect sizes θ, and π[0] represents
the proportion of non‐differentially expressed features.
The effect size is defined as the difference between the mean
expression levels of metabolites under two conditions, divided by their
pooled standard deviation. The estimation of π[0] follows the approach
suggested by Langaas et al.,^[ [251]^107 ^] and λ is estimated by a
deconvolution estimator. The average power can be estimated by solving
the following equation:
$$\int_{-\infty}^{+\infty} T(u, \theta)\, \lambda(\theta)\, d\theta = \frac{u\, \pi_0 (1 - \delta)}{\delta (1 - \pi_0)} \tag{2}$$
where T represents the power for a single metabolite as a function of the
p‐value u and effect size θ, and δ is the user‐defined false discovery
rate. The average power is controlled for multiple testing through the
adaptive Benjamini‐Hochberg method,^[ [252]^108 ^] which prevents
overestimation. The estimated effect size density must be constrained
to be non‐negative and to integrate to 1. To avoid discontinuities where
the constraint is applied, the π[0] estimate needs to be readjusted.
Estimating the average power also involves detecting effect sizes
around zero, which are technically challenging to measure accurately. A
small region around zero can be defined and excluded from the effect
size density, thereby increasing the estimated average power.^[
[253]^109 ^]
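The average-power equation above can be solved numerically once a power function and an effect size density are fixed. The sketch below is a simplified illustration and not the MetaboAnalyst procedure: it assumes a two‐sided two‐sample z‐test for the single‐metabolite power T, replaces the estimated density λ with a hypothetical two‐point discrete distribution, and uses an assumed π[0] of 0.8 and 20 samples per group. Only δ = 0.1 is taken from the text.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def single_power(u, theta, n_per_group):
    """Power T(u, theta) of a two-sided two-sample z-test at p-value cutoff u."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - u / 2.0)
    shift = theta * (n_per_group / 2.0) ** 0.5
    return (1.0 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)


def average_power(effects, weights, pi0, delta, n_per_group, iters=100):
    """Solve sum_j w_j T(u, theta_j) = u*pi0*(1-delta)/(delta*(1-pi0)) for u."""
    slope = pi0 * (1.0 - delta) / (delta * (1.0 - pi0))

    def lhs(u):
        return sum(w * single_power(u, t, n_per_group)
                   for t, w in zip(effects, weights))

    lo, hi = 1e-12, min(1.0, 1.0 / slope)  # the left side never exceeds 1
    for _ in range(iters):                 # bisection on lhs(u) - slope * u
        mid = 0.5 * (lo + hi)
        if lhs(mid) > slope * mid:
            lo = mid
        else:
            hi = mid
    u_star = 0.5 * (lo + hi)
    return u_star, lhs(u_star)


# Hypothetical effect size density: half the true effects at 0.5, half at 1.0.
u_star, power = average_power(effects=[0.5, 1.0], weights=[0.5, 0.5],
                              pi0=0.8, delta=0.1, n_per_group=20)
```

The bisection looks for the non‐trivial crossing of the average power curve with the straight line on the right‐hand side of the equation; the value of the curve at that cutoff is the estimated average power at the chosen false discovery rate.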
Statistical Analysis
The heatmap, principal component analysis (PCA), and clustering
analysis were performed on R version 4.2.2 using pheatmap, ggplot2,
FactoMineR, factoextra, and NbClust packages.^[ [254]^58 , [255]^110 ^]
Fold change and pathway analysis were conducted on MetaboAnalyst 5.0
([256]https://www.metaboanalyst.ca/).^[ [257]^111 ^] For pathway
analysis, verification of the metabolites that displayed a significant
difference between the HC group and MMD group was conducted on the
human metabolome database (HMDB, [258]http://www.hmdb.ca/) using the
signal molecular formula.^[ [259]^112 ^] The pathway analysis was
performed on MetaboAnalyst 5.0 based on the KEGG pathway library for
Homo sapiens. Other statistical analyses in this work were performed
in SPSS software (version 24.0, SPSS Inc., USA) to calculate
P‐values, including the two‐sided Student's
t‐test, chi‐square test, ANOVA, and Kruskal–Wallis H test. The
significance level was set at 0.05 for all tests. AUC was calculated
using Origin 2024.
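For readers who want to reproduce the AUC outside Origin, the quantity can be computed directly from predicted probabilities and class labels as the fraction of positive and negative pairs that are ranked correctly. The sketch below is a generic rank‐based calculation with made‐up scores; it is not the procedure Origin uses internally, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up model outputs for two controls (label 0) and two patients (label 1).
auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])  # 3 of 4 pairs ranked correctly
```

This pairwise form is equivalent to the area under the empirical ROC curve and to the normalized Mann–Whitney U statistic.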
Conflict of Interest
The authors declare no conflict of interest.
Author Contributions
Y.X., R.W., and X.G. contributed equally to this work. K.Q. and W.X.
designed the overall approach and planned this work with Y.G. and W.N.
R.W., X.G., J.S., M.Z., J.H. and H.Y. contributed to the serum sample
collection and preparation. R.W. and X.G. also
contributed to the summary of clinical information. Y.X., L.C., and
W.X. contributed to the MS data acquisition and MS data analysis. Y.X.
and W.X. organized the figure and wrote the manuscript with R.W. and
X.G. All authors joined in the critical discussion and revised the
manuscript. Additionally, we thank Tingting Xie for assisting with the
collection and treatment of blood samples.
Supporting information
[260]ADVS-12-2405580-s001.docx^ (3.1MB, docx)
Acknowledgements