Abstract
Evaluating the plant risk of soil pollution by plant physiological
indices usually requires a long cycle and has significant uncertainty.
In this study, oxidative potential (OP) of the in situ heavy metal
contaminated soils was measured by the dithiothreitol method. The
oxidative stress response of the model plant ryegrass (Lolium perenne
L.) induced by heavy metal contaminated soil was evaluated by the
biomarkers, including superoxide dismutase and total antioxidant
capacity. The comprehensive biomarker response index has a significant
exponential correlation with the OP of soil (r = 0.923, p < 0.01)
in ryegrass. Metabolomics analysis also showed a significant
relationship of the metabolic effect level index of amino acids and
sugars with OP. Random forest was selected from four machine learning
models to screen the metabolites most relevant to OP, and Shapley
additive explanations analysis was used to explain the contribution and
the influence direction of the features on the model. Based on the
selected 20 metabolites, the metabolic pathways most related to OP in
plants, including alkaloid synthesis and amino acids metabolism, were
identified. Compared to the plant physiological indices, OP is a more
stable and faster indicator for the plant risk assessment of heavy
metals contaminated soil.
Keywords: Oxidative potential, Heavy metal contaminated soil, Plant
health, Metabolomics, Machine learning
Graphical abstract
[39]Image 1
[40]Open in a new tab
Highlights
* •
Oxidative potential (OP) is exponential correlated with the
comprehensive biomarker response index.
* •
OP was significantly related to the metabolic effect level indexes
of amino acids and sugars.
* •
Top 20 important metabolites in ryegrass contributed to OP was
screened based on random forest.
* •
The metabolic pathways in ryegrass most related to OP were
identified.
1. Introduction
Heavy metal-contaminated soil poses widespread health risks to plants
[[41]1]. Cu and Pb are major soil contaminants widely distributed in
industrial sites and can disrupt the redox state of plant cells by
generating reactive oxygen species (ROS) [[42]2], which is considered a
crucial mechanism of the ecological risk posed by Cu and Pb. ROS can
negatively affect important physiological processes such as plant
respiration and photosynthesis, ultimately leading to toxicity symptoms
such as growth retardation, root blackening, and yellowing in plants
[[43]3,[44]4]. For example, ROS can result in metabolic imbalances in
plant cells [[45]5], affect membrane structure and permeability, and
inhibit photosynthesis of plants [[46]6]. Ryegrass (Lolium perenne L.)
is a widely distributed, fast-growing plant with high biomass. It plays
a vital role in maintaining the functions of soil ecosystems and has
the ability to tolerate soil contamination, such as heavy metals. In
addition, the growth status and physiological indicators of ryegrass
are sensitive to heavy metals, so it can serve as a model plant for
evaluating the ecological risk of heavy metals contamination
[[47]7,[48]8]. The oxidative stress of heavy metals on plants was
usually measured by indicators such as antioxidant enzymes and
malondialdehyde (MDA). However, the indicators have limitations, such
as the long time required for plant cultivation and multiple factors
that can cause increased bias in evaluation, including interspecific or
individual variations of plants and environmental factors like soil
properties and temperature [[49]9].
Oxidative potential (OP) is an index reflecting the integrated contents
of redox-active components in contaminated particle matters [[50]10].
Dithiothreitol (DTT) assay can quantitatively measure particle
matter-mediated ROS formation, and Cu and Pb have been proven to be
contributors to OP values measured by the DTT method. In addition, OP
value has been found to be associated with several biological
endpoints, such as congestive heart failure, lung cancer, and asthma in
humans [[51][11], [52][12], [53][13], [54][14]] and antioxidant enzymes
in soil animals [[55]15,[56]16], indicating that OP may be a reasonable
predictor parameter for ROS induced oxidative stress effects
[[57]12,[58]17,[59]18]. Although there are differences in the oxidative
damage of heavy metals to animals and plants [[60]19,[61]20], OP is
also speculated to be applicable for assessing ecological risks caused
by ROS induced by heavy metals to plants and may have a dose–response
correlation with oxidative stress indicators in plants. However, there
is still no direct evidence for the correlation between soil OP and
oxidative stress indicators in plants, such as antioxidant enzymes and
metabolomics, which limits the application of OP in ecological risk
assessment of heavy metal-contaminated soils to plants. Meanwhile, due
to the large scale and interrelationship of metabolites, traditional
mathematical methods are unable to screen out the most important
metabolites related to OP as a whole, and the accuracy of the screening
is limited.
Machine learning has the ability to effectively select the most
important features because of the classification function, and is
suitable for complex omics data analysis due to the high flexibility of
its algorithms [[62]21]. Tree-based algorithms, such as extreme
gradient boosting (XGBoost), decision tree and random forest, and
neural networks, such as multi-layer perceptron classifier
(MLPClassifier), are common modeling methods used in machine learning
[[63]22]. These algorithms can provide insights into the potential
interactions between numerous input features and output results,
thereby improving the prediction accuracy [[64]23]. Machine learning
has been applied in biological models and has been proven to be
effective in genomic selection, treatment target prediction, and drug
target prediction [[65]24,[66]25]. Technically, metabolites related to
OP can be ranked and selected, and the accuracy of screening can be
improved based on machine learning.
To further explain the results of machine learning screening,
explainable machine learning was used for analysis. As a new branch of
machine learning, explainable machine learning can better explain the
prediction results of machine learning models [[67]26]. It provides
details such as the correctness, robustness, bias, improvement, and
transferability, and further reveals the dependencies between inputs
and outputs and the importance of inputs [[68]27]. Shapley additive
explanations (SHAP) and local interpretable model-agnostic explanations
(LIME) are the common explainable machine learning techniques. LIME
weights instances based on their similarity to the corresponding
original instances, while Shapley values in SHAP analysis are
calculated by estimating the weights corresponding to sampled
instances. Compared to LIME, SHAP can provide a global explanation by
combining individual local explanations and using predictions with a
fair distribution among the input [[69]28]. SHAP value is considered a
unified measure of feature importance, and SHAP has an advantage over
LIME. Therefore, SHAP was selected to explain the feature contribution
and the direction of feature influence on the model.
To assess the applicability of OP in evaluating the ecological risk of
heavy metal-contaminated soil on plants, the potential relationship
between soil OP values and the oxidative stress indicators in ryegrass
Lolium perenne L.) planted in heavy metals contaminated soils was
analyzed. Using machine learning, the most relevant metabolic
perturbations associated with OP were screened from metabolomics and
explained by SHAP analysis to prove that OP can be used as a rapid and
stable indicator for assessing risks in ryegrass under oxidative stress
caused by heavy metals.
2. Materials and methods
2.1. Chemicals and soils
The reagents for OP determination, including DTT, phosphate-buffered
saline (PBS) buffer (pH 7.4), and 5,5-dithiobis(2-nitrobenzoic acid)
(DTNB) and the reagents for heavy metal determination, including HNO[3]
and HF, were purchased from Sigma Aldrich (Shanghai, China) and Aladdin
Reagent Co., Ltd. of China, respectively. Fisher Scientific Inc.
(Loughborough, UK) provided acetonitrile of chromatographic grade,
ammonium acetate, ethanoic acid, and methanol. All the chemicals
purchased were of the highest purity available. The total protein, MDA,
superoxide dismutase (SOD), hydrogen peroxide (H[2]O[2]), peroxidase
(POD), metallothionein (MT), catalase (CAT), and total antioxidant
capacity (T-AOC) assay kits were bought from Nanjing Jiancheng
Bioengineering Institute (Nanjing, China).
A metal smelter plant (Shenyang, Liaoning Province, China) provided the
topsoil samples contaminated with Cu and Pb (i.e., S1–S8), and the
concentrations of Cu and Pb in the soil are shown in [70]Table S1. The
soil located 3 km upwind from the plant was selected as the control.
The physical and chemical properties of soil are shown in [71]Table S2,
and the determination is described in a previous study [[72]15]. The
soils were air-dried and sieved to a particle size of <2 mm for
ryegrass cultivation and sieved to <0.15 mm for Cu and Pb
concentrations detection and OP determination. Cu and Pb concentrations
were analyzed using inductively coupled plasma mass spectrometry
(ICP-MS) (Elan DRC-e, PerkinElmer, Waltham, MA, USA) after soil
digestion, and the digestion procedure was described in previous
literature [[73]29]. The limit of detection for ICP-MS was 0.5 μg/L.
The OP of soil water extracts was assessed by the DTT assay, as
described in previous research [[74]15].
2.2. Ryegrass exposure design
Ryegrass (Lolium perenne L.) seeds were obtained from a local gardening
supply store. The ryegrass seed surfaces were sterilized with a
solution of 3% H[2]O[2] for a duration of 20 min, and then sowed in
the blank soil for 14 days. When the ryegrass reached a height of about
15–20 cm, 20 ryegrass plants with similar growth were selected and
transplanted into the pots filled with 2.5 kg of control soil or S1–S8
contaminated soils for 21 days [[75]30]. The pots were incubated at
20 ± 1 °C and light cycle of 16 h/8 h (day/night). Ryegrass leaf
samples were cleaned with PBS and collected after 21 days for
biomarkers and metabolomics determination.
2.3. Biomarkers assay
Fresh leaf tissues were ground using liquid nitrogen mixed with PBS
solution, followed by centrifugation at 4 °C for 15 min at 5,000×g to
extract the supernatants. Supernatants were collected and used for
subsequent determination. Details on the assessment of total protein,
ROS, MDA, T-AOC, MT, as well as the activities of CAT, POD, and SOD and
the calculation of biomarker response index (BRI) has been described in
the previous research [[76]12,[77]15,[78]31,[79]32].
2.4. Metabolomics detection and profiling
Ryegrass leaves exposed to S1, S2, S5, S6, and the control groups were
used for metabolomics analysis. Samples with 25 mg were extracted
using a 500 μL mixture of acetonitrile:methanol:water in a 1:2:1 ratio
with isotopic internal standard. Subsequently, samples were homogenized
at 35 Hz for 4 min and sonicated in an ice water bath for 5 min and
repeated three times. The samples were then incubated at −40 °C for
1 h and centrifuged at 13,800 × g at 4 °C for 15 min. The obtained
supernatants were used for analysis. The mixture of an equal amount of
supernatant from all samples was prepared for the quality control
sample [[80]33,[81]34]. The masses of the metabolites were analyzed by
LC-MS/MS [[82]35,[83]36]. The LC gradient elution programs and mass
detection are detailed in [84]Table S3. Metabolic analysis, including
metabolite identification and pathway enrichment, is described in the
previous research [[85]15].
2.5. The calculation of the metabolic effect level index
The metabolic effect level index (MELI) is a quantitative endpoint to
reflect the aggregated response of the changes in all individual
metabolites after exposure [[86]37].
Firstly, the metabolic changes (MC) for each individual metabolites in
plants samples of the contaminated group compared to the average MC in
plants of the control group were calculate as Eq. [87]1:
[MATH:
MCi,
T=e|lnA
i,T|−e|lnA
i,C| :MATH]
(1)
Here A[T] is the ratio of the abundance of individual metabolite i in
the contaminated group to the mean abundance of these metabolites in
the control group, and A[C] is the relative metabolic change of the
control group samples, which, by definition, equals to 1. Then MELI was
calculated as Eq. [88]2, dividing the sum of MC for all n metabolites
in the sample by the number of metabolites n:
[MATH: MELI=MCtotal,T
/n=∑i=1i=nMCi,T/n :MATH]
(2)
In this study, MELI for different classes of metabolites was calculated
separately.
2.6. Machine learning for identifying important metabolites related to OP
2.6.1. Machine learning models
In order to further identify metabolites related to OP, the machine
learning analysis based on the results of differential expression
analysis was conducted. The whole process of machine learning analysis
is illustrated in [89]Fig. S1. To find the machine learning model that
best fits the data in this study, four commonly used classification
models, including XGBoost, MLPClassifier, decision tree, and random
forest, were tested [[90]21]. A brief introduction to the principles of
four machine learning models is described in Text S1, and the details
of the algorithms can be found in previous literature
[[91]22,[92]28,[93]38,[94]39]. For the purposes of the model
performance evaluation, each of the machine learning models was applied
to the same normalized data set. Significantly differential metabolites
in S1, S2, S5, and S6 groups compared to the control group were
selected and used for model analysis. The data set was divided into
training (60%) and test (40%) sets prior to the application of machine
learning [[95]21].
2.6.2. Model optimization and performance evaluation
The grid search method was used for hyperparameter optimization of the
four machine learning models [[96]28]. The grid search method evaluates
the models by selecting the most appropriate set of hyperparameters.
The optimized models were then used for validation purposes.
Metrics for evaluating model performance, including cross-validation
accuracy, recall, precision, F1 score, receiver operating
characteristic (ROC) curves and the area under the ROC curve (AUC), and
the calculation of cross-validation accuracy, recall, precision, and F1
score are described in Text S2. Five-fold cross-validation was used to
evaluate the accuracy of the models. Based on the metrics, the best
model was selected, and all metabolites with differential expression
compared to the control group were ranked using the selected model to
identify the top 20 important metabolites related to OP values.
2.6.3. SHAP analysis
SHAP analysis interprets machine learning models through SHAP values
[[97]27]. A brief introduction to the principles of SHAP is described
in Text S3, and the details of the algorithm can be found in previous
literature [[98]26,[99]28]. Each feature possesses a SHAP value
representing its average marginal contribution. A higher absolute SHAP
value indicates a greater influence of the feature on the model, with
positive or negative values indicating the average impact on the
model's prediction [[100]40]. Features that have a significant impact
on the model output can be identified through SHAP global explanations,
which rank the features based on the SHAP values.
2.7. Statistical analysis
SPSS software (V22.0, Chicago, USA) was used to identify significant
differences through one-way analysis of variance (ANOVA, p < 0.05)
and post hoc testing using Fisher's least significant difference (LSD).
Additionally, the R software V.3.3.5 package Ropls was used for
univariate (Student's t-test) testing and multivariate analyses,
including principal component analysis (PCA). The machine learning
model was developed using the package Scikit-learn V.1.4 and SHAP
V.0.45.1. All steps were performed using Python V.3.12.2 and Jupyter
Notebook. Pandas V.2.2 and NumPy V.1.26.3 were used for basic
statistical calculations and data collation, respectively.
3. Results and discussion
3.1. OP values of heavy metals contaminated soil
OP values were measured by the DTT oxidation efficiency. The OP of the
control group was much lower than that of the contaminated groups, with
values of 20.9 and 25.5–62.8 pmol/(min·μg), respectively
([101]Table S1). Cu and Pb concentrations in water extracts of
contaminated soils were 0.155–1.93 and 0.00399–0.384 μM,
respectively, while the Cu concentration in water extract of control
soil was 0.0783 μM, and Pb was lower than the detection limit. The OP
values increased with the increase of total Cu and Pb concentration in
water extracts ([102]Table S1).
Heavy metals could be the dominant species responsible for catalyzing
DTT oxidation [[103]15]. Transition metals such as Cu and Pb play a
dominant role in ROS formation from the soil samples. The family of
ROS, including O[2]^·–, HO[2], H[2]O[2], and ·OH, is linked by multiple
reactions beginning with O[2] accepting an electron from DTT catalyzed
by Cu and Pb to become O[2]^·–. O[2]^·– can add H^+ to form HO[2],
which leads to the formation of H[2]O[2] and then ·OH, which is
eventually converted to H[2]O [[104]41]. Transition metals with
particle matter-induced cytotoxicity and health effects have been
reported [[105]42] and explained by the formation of ROS [[106]43].
3.2. Oxidative stress on ryegrass and correlation with OP
Heavy metals can severely damage the defense system by disturbing the
normal metabolic processes in plants, ultimately affecting the
production of antioxidants in plants [[107]44]. The increase in ROS
levels in plants compared to the control group also indicated that
heavy metals induced oxidative stress in ryegrass ([108]Fig. S2). The
activity of biomarkers, including antioxidant enzymes (i.e., SOD, CAT,
POD), T-AOC, MDA, and MT, in the ryegrass leaves under the Cu or Pb
stress changed significantly compared to the control group
([109]Fig. S3). T-AOC, MT, and POD in leaves of the contaminated group
were significantly higher than those in the control groups, while SOD,
CAT, and MDA were significantly lower than those in the control group
(p < 0.05). Antioxidant enzymes (e.g., SOD, CAT, POD), assessment
indicators of oxidative level (e.g., T-AOC, MDA), and antioxidant
molecules (e.g., MT) are a common plant response to oxidative stress.
Due to the significant differences in dose–response relationships among
different biomarker responses, BRI, as an integrated index based on
multiple biomarker responses, was used to establish dose-effect
relationships between contamination levels and biomarker responses
[[110]45]. In this study, BRI varied from 1.95 to 3.45 for ryegrasses
exposed to industrial-contaminated soils ([111]Fig. 1), while BRI of
the ryegrass exposed to the control soil was 4.00, indicating that
heavy metals caused slight or severe changes of the physiological
status of ryegrass, respectively [[112]45].
Fig. 1.
Fig. 1
[113]Open in a new tab
Correlation between OP values of the soils and biomarker response index
(BRI) of the ryegrass (Lolium perenne L.).
BRI has a significant exponential correlation with OP (r = 0.923,
p < 0.01) ([114]Fig. 1), and OP values have been linked to the health
effect indicators of particles, such as airway inflammation markers
[[115]46], cellular cytotoxicity [[116]47,[117]48], cellular oxidative
stress markers [[118]49], and cardiorespiratory health endpoints
[[119]12,[120]14]. Although there is a good exponential relationship
between BRI in earthworms after 28 days of exposure and the
concentration of antimony [[121]50], for the first time, OP was linked
to the biomarker changes (i.e., BRI) in plants, an indicator of
oxidative stress induced by heavy metals. The exponential relationship
between OP and BRI also indicated that the range of BRI variance at
lower OP values was greater than that at higher OP values. This may be
because plants can self-regulate through their own antioxidant systems
to slow down cell damage and resist oxidative stress, while higher
concentrations of heavy metals (i.e., at higher OP values) can quickly
induce plant tolerance mechanisms, leading to inconsistent changes in
biomarkers and OP values [[122]51]. As a whole, the OP values can be
used as an index for assessing the oxidative stress from lower heavy
metal contamination in soil on plants.
3.3. Relationship between OP values and metabolomics response of ryegrass
3.3.1. Changes in metabolic profiles of ryegrass exposed to soils with
different OP
To visualize the overall changes in metabolites between control and
groups with different OP values, PCA analysis of all detected
metabolites was performed. Along the first principal axis, all treated
groups were clearly separated from the control ([123]Fig. S4). In
addition, according to the volcano plots ([124]Fig. S5), a total of 345
metabolites were detected. Compared to the control group, 31, 26, 11,
and 38 metabolites were significantly up-regulated, and 19, 12, 31, and
14 metabolites were significantly down-regulated in the S1, S2, S5, and
S6 treatments, respectively ([125]Fig. S5). There were 50, 38, 42, and
52 significant differentially expressed metabolites in the S1, S2, S5,
and S6 treatments, with a total of 81 significant differentially
expressed metabolites based on p < 0.05 and VIP >1 ([126]Fig. 2). The
relative contents of all 81 metabolites is shown in [127]Fig. 2A. This
indicates that the metabolite profiles of ryegrass leaves significantly
changed after exposure to soils with different OP values. Plants are
sensitive to stimuli in environmental conditions, which can lead to the
imbalance of redox state [[128]52]. Excessive ROS in plants can affect
the cellular and molecular functions of plants and cause changes in the
metabolites [[129]52,[130]53].
Fig. 2.
[131]Fig. 2
[132]Open in a new tab
The significantly changed metabolites in ryegrass leaves exposure to
soils contaminated by heavy metals showed by heat map (A) and the
number of different classes of significantly regulated metabolites in
treatment groups compared to the control group (B).
Significant changes between the treatments and the control group were
observed in seven classes of metabolites, including amino acids,
purines, pyrimidines and their derivatives, sugars, organic acids,
fatty acids, prenol lipids, and glycerophospholipids, as well as other
metabolites ([133]Fig. 2B). The number of amino acids was the highest
among metabolites in different treatment groups ([134]Fig. 2B). In
addition, compared to that in the control group, most of the amino acid
metabolites such as l-proline, l-phenylalanine, and l-serine, were
up-regulated in the treatments ([135]Fig. 2A), which may indicate that
the amino acid synthesis of ryegrass was promoted after exposure to
soils with higher OP values, thereby improving the stress resistance.
Amino acids play an important role in plant resistance to oxidative
stress. For example, proline can directly scavenge free radicals and
affect the transcription levels of antioxidant enzyme genes, thereby
inducing oxidative stress, while other amino acids can indirectly act
as signaling molecules involved in plant development and antioxidant
stress [[136]54].
In addition to amino acid metabolites, other metabolites, such as fatty
acids, organic acids, and sugars, have significantly changed. Compared
to the control group, sugar metabolites, such as l-sorbose, trehalose,
and sucrose, were upregulated in the S1 and S2 treatments, which may be
due to the fact that sugars may induce stress response signals and
enhance plant resistance to stress by directly acting as negative
signals or altering cellular response pathways [[137]55]. The contents
of some sugars, such as d-mannose and melibiose, decreased in the S5
and S6 treatments, which may be related to the fact that soils with
higher OP values reduce sugar contents by regulating hormone signal
transduction. Most of the organic acids and fatty acids were
down-regulated in the S1, S2, and S5 treatments, and up-regulated in
the S6 treatment. Different contamination levels can result in
different regulation of metabolites, which may be related to plant
self-regulation [[138]56,[139]57]. ROS can cause disruption of energy
metabolism and redox balance, lipid peroxidation, nucleic acid damage,
etc. These metabolites were involved in the maintenance of redox
balance, synthesis of metabolites with ROS scavenging capacities,
generation and consumption of ATP, and stabilization and protection of
cell membranes in plants, all of which were associated with oxidative
stress [[140]58].
3.3.2. Relationship between metabolic effect level index and OP values
To quantitatively reflect the aggregated metabolic changes in ryegrass
with different soil OP, MELI of different classes of 81 metabolites
were calculated, and the relationship between MELI and OP was
evaluated. Compared to the control, the MELI values were significantly
higher in the treatment groups ([141]Fig. S6), which was consistent
with the evident separation between control and treatment groups shown
in the PCA analysis ([142]Fig. S4), indicating that higher oxidative
stress may induce more metabolic disturbance. The MELI of the
lipophilic metabolites of M. spicatum significantly increased after two
days of isoproterenol exposure due to oxidative stress [[143]37]. A
significant correlation between OP and MELI of amino acids and sugar
metabolites in ryegrass was found ([144]Fig. 3). There were also
significant correlations among the MELI of other classes of metabolites
([145]Fig. 3), indicating that OP was directly or indirectly related to
the disturbance of metabolites in ryegrass leaves exposed to Cu and
Pb-contaminated soils.
Fig. 3.
[146]Fig. 3
[147]Open in a new tab
The correlation between OP value and MELI values of different classes
of significantly regulated metabolites. The thickness of the line
indicates the correlation coefficient. The red lines represent
significant correlation with p < 0.05 and the grey lines represent
insignificant correlation with p > 0.05.
Plants under excessive ROS stress interact with other organelles
through a “fast” signaling communication network and establish
appropriate defense responses [[148]59]. OP has been linked to the
potential to induce ROS formation [[149]60], and the OP values of fine
particulate matter have been reported to mainly perturb amino acid
metabolism, lipid metabolism, sugar biosynthesis, as well as cofactor
and vitamin metabolism in the plasma of participants from two
dormitories near and far from a congested highway [[150]16]. The
correlation between OP and MELI of specific metabolites, especially
amino acids and sugars, in ryegrass was first reported in our study,
illustrating the relationship between OP and metabolic disturbances of
plant induced by oxidative stress.
3.4. Identification of important metabolites related to OP values by machine
learning
To screen the most important metabolites related to OP values,
different machine learning algorithms were used to rank the importance
of metabolites. The performance of four machine learning models,
including XGBoost, MLPClassifier, decision tree, and random forest, was
optimized through the tuning of hyperparameters, and the optimized
hyperparameter sets of the four models are shown in [151]Table S4.
The best model was selected by cross-validated accuracy, recall,
precision, F1 score, ROC, and AUC [[152]61]. The F1 score achieved by
the four models was 0.80–0.97 ([153]Table S5), indicating a good
balance between precision and recall. Random forest yielded the highest
cross-validated accuracy (97.0%), while the cross-validated accuracies
of the XGBoost and decision tree models were 77.0% and 60.0%,
respectively ([154]Table S5). The optimization did not converge when
maximum iterations (200) were reached through five-fold cross
validation with the MLPClassifier model ([155]Table S5). Moreover, ROC
curves of the four machine learning methods further illustrated that
random forest performed best ([156]Fig. S7). AUC values of random
forest reached 1. Therefore, the random forest model was selected to
rank the significantly differential metabolites based on their
importance to OP values.
Eighty-one significant differentially expressed metabolites between the
treatment groups and the control group were involved in the analysis
and ranked by the feature's importance. The relative abundances of the
top 20 metabolites are shown in [157]Fig. S8, and their sum importance
to OP arrived 50.6% ([158]Fig. 4A). The top 20 metabolites with the
highest importance can be divided into six categories, including seven
amino acids (i.e., l-isoleucine, l-arginine, tyramine, l-histidine,
l-asparagine, l-phenylalanine, l-tyrosine), three purine, pyrimidine
and their derivatives (i.e., cytidine, 7-methylxanthine, 3′-AMP), two
organic acids (i.e., malonic acid, l-malic acid), three sugars (i.e.,
trehalose, l-sorbose, d-mannose), one fatty acid (i.e., stearidonic
acid) and four other metabolites (i.e., piperidine, putrescine,
hydrocinnamic acid, phytosphingosine).
Fig. 4.
[159]Fig. 4
[160]Open in a new tab
The importance ranking of the top 20 metabolites related to the OP
values based on the random forest model (A) and the SHAP global
explanation of top 20 features (B). The horizontal position represents
the SHAP value of each data point, and the color reflects the content
of the features. The point less than zero SHAP value represents a
negative contribution to the output, while the point on the right
represents a positive contribution.
Feature importance can be assessed based on random forest algorithm. To
further intuitively explain the contribution of these features to the
model, SHAP was used for in-depth model analysis [[161]62]. SHAP values
of the top 20 features contribution obtained from SHAP analysis for
each data point are shown in [162]Fig. 4B. Seventeen of the top 20
metabolites screened based on SHAP analysis were the same as the rank
results of random forest, which further verified the correctness of the
rank results of random forest. The SHAP explanation provided a more
comprehensive understanding of the relationships and interactions
within the dataset [[163]27]. In the global SHAP explanation, the most
important feature is “piperidine”, which is consistent with the random
forest results. Some metabolites, such as l-malic acid and
l-phenylalanine, have larger contents with higher SHAP values,
indicating that these metabolites have a positive correlation with OP
values, while some metabolites such as trehalose and putrescine showed
a negative correlation with OP values ([164]Fig. 4B). The top 20 most
important metabolites for OP were identified through machine learning
([165]Fig. 4A), providing further indication for the relationship
between OP and metabolic disturbances of plants.
3.5. The identified metabolic pathways related to oxidative stress
Based on the pathway enrichment analysis using the public database
Kyoto Encyclopedia of Genes and Genomes (KEGG), the metabolic pathways
of the top 20 metabolites most related to OP were analyzed by pathway
enrichment analysis ([166]Fig. 5), including amino acid metabolism,
tricarboxylic acid (TCA) cycle, alkaloid biosynthesis, caffeine
metabolism, purine and pyrimidine metabolism, sphingolipid metabolism,
sugar metabolism, and α-linolenic acid metabolism.
Fig. 5.
[167]Fig. 5
[168]Open in a new tab
Schematic diagram of the proposed metabolic pathway of ryegrass related
to OP values after exposed to Cu and Pb-contaminated soils. Red
indicates the top 20 important metabolites related to OP values
identified based on the random forest model.
ROS can affect alkaloid biosynthesis, and alkaloids such as piperine
have a feedback regulatory effect on ROS, participating in the
clearance of free radicals and ROS in plants to decrease oxidative
stress ([169]Fig. 5) [[170]63,[171]64]. Amino acid metabolic pathways
are influenced by ROS, as ROS can form stable oxidative covalent
modifications with histidine, arginine, lysine, and tyrosine, leading
to the formation of carbonyl compounds of these amino acids
([172]Fig. 5) [[173]65]. The TCA cycle plays a crucial role in plant
stress responses to heavy metals by supplying reductants and adenosine
triphosphate (ATP) for various adaptive mechanisms such as ROS
detoxification and the electron transport chain [[174]66,[175]67].
Purine, pyrimidine metabolism and ROS interact with each other, and
purine, pyrimidine metabolism may lead to secondary ROS generation,
exacerbating oxidative stress [[176]68]. Caffeine biosynthesis is
important for maintaining the redox state of plants, and caffeine plays
an important role in maintaining the redox homeostasis of peppermint
leaves during later stages of phenanthrene stress [[177]69]. Sugar
metabolism, involved with osmoregulatory substances such as d-mannose
and trehalose, can affect the antioxidant defense of plants by
regulating the expression levels of enzymatic antioxidants [[178][70],
[179][71], [180][72]]. α-linolenic acid functions as an antioxidant in
plant lipid metabolism and serves as a precursor for the production of
jasmonic acid, a recognized signaling molecule involved in subsequent
antistress responses [[181]73,[182]74]. Sphingolipid metabolism can be
affected due to the damage to the fluidity of membranes by excessive
ROS [[183]75]. Overall, the above metabolites were directly or
indirectly related to oxidative stress induced by heavy metals.
4. Conclusions
OP of the in situ Cu and Pb contaminated soils were measured by DTT
method, and biomarkers related to oxidative stress were determined. An
exponential correlation between soil OP and the biomarker response
index of ryegrass was observed. OP was found to be significantly
associated with MELI of amino acids and sugars. By using random forest
machine learning, the top 20 metabolites in plants and metabolic
pathways related to OP values were identified. SHAP analysis was used
to interpret the feature's contribution and the influence direction of
the model. Metabolic pathways, such as amino acid metabolism, caffeine
and purine metabolism, α-linolenic acid metabolism, sphingolipid
metabolism, pyrimidine metabolism, alkaloid biosynthesis, TCA cycle,
and sugar metabolism, were identified to be related to OP values. As a
whole, OP can be used to evaluate the oxidative stress in plants
induced by heavy metal-contaminated soil. The applicability of OP
values in the ecological risk assessment of heavy metals or other
contaminants on more types of plants can be further studied in the
future.
CRediT authorship contribution statement
Chunmei Ran: Writing – review & editing, Writing – original draft,
Validation, Investigation. Meiqi Guo: Validation, Investigation. Yuan
Wang: Validation, Formal analysis. Ye Li: Validation, Formal analysis.
Jiao Wang: Writing – review & editing, Validation. Yinqing Zhang:
Validation, Writing – review & editing. Chunguang Liu: Writing – review
& editing, Supervision. Bridget A. Bergquist: Writing – review &
editing, Supervision. Chu Peng: Writing – review & editing,
Supervision, Methodology.
Declaration of competing interests
The authors declare that they have no known competing financial
interests or personal relationships that could have appeared to
influence the work reported in this paper.
Acknowledgments