Abstract
This study develops and validates a multi-scale integrative
computational toxicology framework to systematically investigate the
potential association between the natural pesticides pyrethrin I and II
and breast cancer risk. The proposed approach integrates deep
learning–based drug–target interaction prediction (via DeepPurpose),
molecular docking and dynamics (MD) simulations, and protein–protein
interaction (PPI) network modeling, forming a traceable risk inference
chain from molecular-level interactions to clinically relevant
outcomes. Experimental results revealed that pyrethrin I exhibits
stable and high-affinity binding to key breast cancer–related proteins,
including RPS6KB1, TNKS2, and MAOB, with a minimum binding free energy
(ΔG) reaching −27.37 kcal/mol. These interactions potentially modulate
tumor progression through key oncogenic pathways such as PI3K/AKT,
Wnt/β-catenin, and metabolic reprogramming. RPS6KB1 is implicated in
estrogen receptor–positive (ER+) breast cancer proliferation, while
TNKS2 is closely associated with stemness maintenance and the
aggressiveness of triple-negative breast cancer (TNBC). MAOB
demonstrates the highest structural stability among complexes, making
it a promising candidate for toxicological modeling. The study further
introduces a cross-scale risk indicator modeling strategy, constructing
a mechanistically interpretable chain from compound structure to
protein modules, carcinogenic pathways, and clinical risks. This
integrative methodology supports environmental exposure monitoring and
toxicological policy development. The open-source, containerized
analytical toolchain developed herein is highly extensible and
adaptable for future toxicological evaluations of other natural
compounds and emerging environmental pollutants.
Keywords: Integrative computational toxicology, Multi-omics data
integration, Deep learning for drug–target interaction, Multi-scale
molecular simulation, Environmental chemical risk assessment, Molecular
docking and dynamics, Mechanistic causal inference
Highlights
* •
First multi-scale framework linking molecules to clinical
toxicology outcomes.
* •
Pyrethrin I binds key breast cancer proteins with high affinity
(ΔG ≤ −27.37).
* •
MAOB validated as a key hub via MD simulations and network
analysis.
* •
Cross-scale model connects binding strength to clinical GWAS risk
markers.
* •
Open-source pipeline supports toxicology studies of natural and
emerging toxins.
1. Introduction
Pyrethrins are natural insecticides derived from the plant Tanacetum
cinerariifolium. Due to their high biodegradability and low toxicity to
mammals, they are widely used in agricultural pest control, urban
sanitation management, and household environments. Owing to these
characteristics, pyrethrins are often classified as “environmentally
friendly pesticides” [[33][1], [34][2], [35][3]]. However, recent
studies have indicated that pyrethrins and their metabolites may
possess endocrine-disrupting activity [[36]4,[37]5], and prolonged
exposure may increase the risk of hormone-related diseases, such as
breast cancer [[38]5,[39]6]. Although the underlying mechanisms remain
unclear, these potential risks have raised academic concerns about
their carcinogenicity. Despite the favorable attention pyrethrins
receive due to their natural origin and biocompatibility, their
toxicological risk assessment faces multiple challenges. Existing
studies have primarily focused on animal experiments and biochemical
analyses, emphasizing acute or subacute toxicity testing
[[40]1,[41]4,[42][7], [43][8], [44][9], [45][10], [46][11]], but offer
limited insights into the long-term health effects of chronic low-dose
exposure, especially regarding endocrine disruption and cancer risk.
Moreover, there is a lack of integrative cross-scale analytical
frameworks that connect molecular binding, protein interaction
networks, and epidemiological data, resulting in a significant “scale
gap” [[47]4,[48]5,[49][8], [50][9], [51][10], [52][11]].
Currently, toxicological methods are insufficient for quantifying the
causal chain from “molecular binding free energy (ΔG) → network
perturbation → disease risk,” which limits the ability to infer health
impacts from molecular-level data [[53][12], [54][13], [55][14]].
Traditional QSAR models also often neglect protein conformational
changes, leading to distorted predictions of binding affinity
[[56][15], [57][16], [58][17]]. Furthermore, existing risk assessments
typically rely on single data sources and lack integration of
multi-omics data and exposure information, resulting in fragmented
predictions that are inadequate for supporting intelligent
environmental monitoring and policy applications [[59]18,[60]19].
Therefore, there is an urgent need to establish a cross-scale,
computationally reproducible, and application-oriented analytical
framework to enhance the toxicity interpretation and risk management
efficiency for pyrethrins and other emerging pollutants. This study
aims to develop a multi-scale computational toxicology framework,
selecting pyrethrins I and II as model compounds to investigate their
potential carcinogenic mechanisms in relation to molecular binding,
protein interaction networks, and breast cancer expression. The
specific objectives are: (1) To apply deep learning and molecular
simulation techniques to identify target proteins with high binding
affinity at the molecular level; (2) To assess the potential causal
relationships between these target proteins and breast cancer subtypes;
(3) To integrate cross-scale data to construct a “compound–protein
module–disease” risk propagation chain; (4) To develop predictive risk
indicator functions that model the exposure–disease response surface.
This study proposes several technological innovations to enhance the
accuracy and operational viability of toxicological risk assessment:
(1) Cross-scale integrated model construction: Integrating drug–target
interaction (DTI) prediction, molecular docking and dynamics (MD)
simulations, and systems biology approaches to, for the first time,
establish a complete carcinogenic mechanism pathway from a
computational toxicology perspective; (2) Quantifiable risk evaluation
formulas: Deriving logical risk functions based on molecular binding
free energy to build interpretable and quantifiable exposure–response
models (risk surface modeling); (3) Modular analytical toolchain:
Developing an open-source, containerized toolchain compatible with
Docker and high-performance computing platforms to support reproducible
deployment and flexible application of the methods; (4)
Application-oriented engineering design: The research outcomes are
applicable to environmental monitoring, green pesticide development,
and toxic substance policy-making, demonstrating interdisciplinary
engineering value and application potential.
The established multi-scale analytical framework can be extended to the
following future applications: (1) Environmental toxicant early warning
platforms: Combining IoT sensor data for real-time identification of
high-risk pollution sources to support smart city environmental
governance; (2) Chemical risk classification and management: Assisting
regulatory authorities in developing risk-based classification and
restriction standards for chemicals; (3) Cross-disciplinary educational
outreach: Transforming into educational modules applicable to fields
such as environmental engineering, bioinformatics, and toxicology; (4)
Support for national toxicology policy: Providing a computational
foundation for risk prediction of emerging pollutants (e.g., PFAS,
naturally derived compounds) to aid national-level risk management.
This study aspires to provide an extensible, reproducible, and
practically valuable analytical framework for environmental
toxicological risk assessment through an integrated and
engineering-oriented computational methodology, thereby strengthening
the scientific basis for the governance of emerging pollutants.
2. Methodology
This study aims to elucidate the potential molecular mechanisms by
which pyrethrin I and II may act on breast cancer-related targets.
Initially, known targets of pyrethrin I and II were retrieved from the
ChEMBL database. Standardized SMILES structures were obtained from
PubChem and analyzed using SwissTargetPrediction to identify additional
putative targets. Breast cancer-associated genes were compiled from
authoritative databases, including GeneCards, OMIM, PharmGKB, and TTD.
In addition, two microarray datasets ([61]GSE42568 and [62]GSE15852)
from the GEO database were analyzed for differential gene expression,
with the top 500 upregulated and downregulated genes selected as
candidate targets.
To strengthen the disease relevance of these targets, genome-wide
association study (GWAS) data for breast cancer were integrated with
protein quantitative trait loci (pQTL) data from the Finngen project
(Olink and SomaScan platforms). A two-sample Mendelian randomization
(MR) analysis was conducted to infer causal relationships between
targets and disease traits. SMILES representations of pyrethrin I and
II, along with target protein sequences, were input into the
DeepPurpose deep learning model to predict binding affinities. Targets
with scores above 7 were selected for molecular docking, with 100
conformations generated for each complex. The conformation with the
lowest binding energy was considered optimal.
Selected ligand–target complexes underwent 100-ns molecular dynamics
(MD) simulations. Key dynamic properties, including root-mean-square
deviation (RMSD), root-mean-square fluctuation (RMSF), radius of
gyration (Rg), intramolecular hydrogen bonding, and free energy
landscapes, were analyzed. Binding free energies during the stable
phase were calculated using the molecular mechanics/Poisson–Boltzmann
surface area (MM/PBSA) method. Proteins with docking energies below
−5 kcal/mol were imported into the STRING database to construct
protein–protein interaction (PPI) networks, using a medium confidence
threshold of 0.400. Topological features such as degree and betweenness
centrality were analyzed. Finally, Gene Ontology (GO) annotation and
Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment were
performed to explore the biological processes underlying the potential
anti-breast cancer mechanisms of pyrethrin I and II.
3. Analytical methods
3.1. Data sources and integration strategy
To support the proposed multi-scale drug screening and mechanistic
analysis framework, a three-layer data integration architecture was
established. This structure comprises compound-level data,
disease-associated targets, and protein structural information,
ensuring a comprehensive and traceable workflow from molecular input to
clinical relevance.
At the compound level, primary chemical and bioactivity data were
collected from the ChEMBL database. SMILES representations were sourced
from PubChem and processed via SwissTargetPrediction to enhance the
scope of target identification. At the disease level, breast
cancer-related genes were compiled from GeneCards, OMIM, PharmGKB, and
TTD. Differential expression analysis of two GEO datasets ([63]GSE42568
and [64]GSE15852) was performed to refine the disease-specific gene set
and identify dynamic biomarkers.
At the protein structure level, experimentally resolved structures were
obtained from the RCSB Protein Data Bank (PDB). Where high-resolution
structures were unavailable, AlphaFold-predicted models were used.
Functional annotations and sequence alignments from UniProt ensured
consistency between protein structure and function.
All data underwent a standardized ETL (Extract–Transform–Load) process,
including data extraction, format harmonization, nomenclature
standardization, redundancy elimination, and cross-database mapping.
The processed datasets were loaded into a unified analytical platform
to support downstream machine learning, mechanistic modeling, and
simulation validation. This integration strategy not only ensures data
interoperability and scalability but also provides a robust foundation
for cross-scale modeling in drug discovery.
3.2. Multi-scale analytical framework
This study proposes an integrated multi-scale analytical framework to
systematically investigate the interaction mechanisms between pyrethrin
and breast cancer-related targets. The framework encompasses three
analytical layers, beginning with network toxicology modeling to
establish a structured systems-level perspective. Specifically: (1) A
protein–protein interaction (PPI) network was constructed using STRING
with a confidence threshold of 0.4 to capture molecular interaction
profiles; (2) KEGG pathway enrichment analysis (FDR <0.05) was employed
to identify regulatory and signaling pathways associated with the
identified targets; (3) The resulting network modules were mapped to
breast cancer oncogenic pathways to assess potential phenotypic
relevance.
In the second phase, a deep learning–based drug–target interaction
(DTI) prediction model was developed using a structure-aware
architecture. The compound encoder integrates a message passing neural
network (MPNN) with an attention mechanism to enhance feature
extraction, particularly for cyclic molecular substructures. The
protein encoder utilizes a three-dimensional convolutional neural
network (3D-CNN) with a kernel size of 5 × 5 × 5 to learn spatial
structural representations. A transfer learning strategy was
adopted—pretraining the model on the DAVIS dataset followed by
task-specific fine-tuning—to optimize predictive performance for
pyrethrin-related targets.
To validate the predicted interactions, molecular docking and molecular
dynamics (MD) simulations were performed. AutoDock was used for initial
docking, and GROMACS was employed to conduct 100-ns simulations to
assess the structural stability of ligand–protein complexes. The
robustness of the interaction was further quantified using MM-PBSA
(Molecular Mechanics Poisson–Boltzmann Surface Area) free energy
decomposition. The binding free energy was calculated as:
ΔG[bind] = ΔE[MM] + ΔG[solv] (1)
Where, ΔE[MM] denotes the molecular mechanics energy (including
electrostatic and van der Waals contributions), and ΔG[solv] accounts
for solvation effects.
Overall, this multi-scale framework effectively integrates systems
biology and computational chemistry methodologies, providing a novel
paradigm for mechanistic studies of natural compounds with anticancer
potential.
In the core computational component of this study, a cross-modal
attention mechanism was introduced to quantify the interaction between
drug and target features. The attention weight is calculated as
follows:
[MATH:
∝ij=soft
max(QKT
dk) :MATH]
(2)
where, Q and K represent the feature matrices of the compound and the
protein, respectively, and d[k] = 64 serves as a scaling factor. This
formulation enables the model to capture the alignment and interaction
strength between different modalities—namely, chemical and biological
features—thereby enhancing the interpretability and precision of the
prediction results.
In addition, molecular dynamics simulations were performed to assess
the structural stability of the protein–ligand complex over time, using
root mean square deviation (RMSD) as the primary metric. The RMSD is
calculated as follows:
[MATH: RMSD(t)=1N∑i=1N‖ri(t)−riref‖2 :MATH]
(3)
where N is the number of backbone C[α] atoms, and r[i](t) represents
the position of atom i at time t. The results demonstrated that the
conformational fluctuations of the complex remained within a stable
range throughout the simulation period, indicating the structural
robustness of the binding interaction.
To support the initial screening at the network level, we employed the
confidence integration score from the STRING database, defined as:
[MATH: Score=1−∏i=1n(1−pi) :MATH]
(4)
where p[i] denotes the probability from the i-th data source supporting
a given protein–protein interaction.
In the molecular-level analysis, we applied a message passing neural
network (MPNN) architecture, in which node features are updated over
time using a gated recurrent unit (GRU). The update function is defined
as:
[MATH:
hvt−GRU(hvt−1
,∑u∈(nvW×[hv<
mrow>t−1‖
mrow>euv]) :MATH]
(5)
where h[v](t) is the hidden state of node v at time t, and e[uv]
denotes the edge features between nodes u and v.
For spatial feature extraction of protein 3D structures, we adopted a
three-dimensional convolutional neural network (3D-CNN), with the
convolution operation defined as:
[MATH:
Foutx,y,z=∑i=−
mo>kk∑
j=−kk<
munderover>∑l=−kk(Wi
,j,l×Fin
x+i,y+j,
z+l) :MATH]
(6)
This operation captures local spatial patterns in the protein volume
data and enhances structure-based representation learning.
Taken together, the innovations of this study include: (1) the
establishment of a closed-loop verification pipeline integrating
network prediction, deep learning, and atomic-scale simulation; (2) the
introduction of structure-aware attention mechanisms in the DTI model
to improve the representation learning of natural product features; and
(3) quantitative free energy analysis revealing a synergistic effect
between electrostatic and hydrophobic interactions in pyrethrin binding
(e.g., ΔE[elec]/ΔG[SA]>3.5).
This framework is highly extensible and applicable to other natural
product studies, with customizable parameters such as the confidence
threshold for target prediction (e.g., Score >0.5–0.7) and the choice
of molecular simulation force field (e.g., GAFF2 for small-molecule
optimization).
3.3. Validation and cross-verification of the analytical workflow
To ensure the credibility and scientific rigor of the overall
analytical workflow, this study implemented a three-tiered validation
mechanism, evaluating the process from three perspectives: internal
consistency, literature-based support, and methodological comparison.
First, the correlation between the prediction results of the deep
learning model (DTI) and the binding free energy derived from molecular
docking was assessed. The Spearman rank correlation coefficient reached
ρ = 0.82 (p < 0.001), indicating a high degree of internal consistency
and suggesting that the model effectively reflects actual molecular
binding affinities. Second, the accuracy of predicted binding sites was
validated through structural alignment. The predicted binding site of
MAOB exhibited a root-mean-square deviation (RMSD) of only 0.12 nm when
compared with the known crystal structure (PDB: [65]4MI5), supporting
the model's capacity to map onto real structural data. Third, a
comparative analysis at the methodological level revealed that the
proposed integrated framework (deep learning + molecular simulation)
outperformed conventional QSAR and standalone MD approaches in both
accuracy and interpretability, particularly in predicting complex
structures of natural compounds.
3.4. Limitations and engineering solutions
Despite the multiple advantages of the proposed methodology, several
technical challenges and limitations remain, for which corresponding
engineering solutions are proposed. First, in the calculation of
binding free energy, the limited accuracy of force field parameters
leads to a prediction error of approximately ±2.3 kcal/mol for ΔG. To
mitigate this uncertainty, a hybrid QM/MM (quantum mechanics/molecular
mechanics) force field will be introduced for refinement. Second, the
current 100-ns molecular dynamics (MD) simulations may be insufficient
to capture the full conformational landscape of proteins. To address
this, Gaussian accelerated molecular dynamics (GaMD) will be employed
to enhance sampling diversity and better capture energy transition
pathways. Additionally, during the integration of GWAS results,
potential bias may arise from population structure heterogeneity within
the FinnGen dataset. To correct for this and avoid statistical
distortion, the RAPS (Robust Adjusted Profile Score) method will be
applied for structural correction and re-estimation.
3.5. Mechanistic integration and engineering translation
This study further integrates molecular-level binding information with
clinical risk data to construct a traceable and inferable multi-level
mechanistic translation chain. The workflow begins with binding energy
screening between candidate compounds and target proteins
(ΔG < −9.5 kcal/mol), followed by protein interaction module
identification (MCODE >4), KEGG pathway enrichment analysis (FDR <0.05)
to identify potential biological pathways, and culminates in
statistically significant clinical association analysis (p < 0.05).
The entire translational chain comprises four hierarchical levels, as
illustrated in [66]Fig. 1, progressing from molecular characteristics
to clinical relevance. First, candidate compounds with high binding
affinity are filtered based on a threshold of ΔG < −9.5 kcal/mol,
establishing the foundation for subsequent analyses. Next, the targets
corresponding to these high-affinity ligands are subjected to
protein-protein interaction network analysis. Protein modules with
significant modularity are identified using the MCODE algorithm,
selecting subnetworks with MCODE scores greater than 4. These modular
proteins are then analyzed through KEGG pathway enrichment, with
significant pathways defined by an FDR less than 0.05, revealing
potential involvement in key biological mechanisms. Finally, the
identified pathways are cross-referenced with clinical databases to
determine statistically significant associations with diseases or risk
factors (p < 0.05).
Fig. 1.
[67]Fig. 1
[68]Open in a new tab
Integrative workflow bridging structure, binding, pathway, and clinical
associations.
This integrative workflow bridges four analytical layers—structure →
binding → pathway → clinical—and establishes a mechanistic translation
framework that traces compound properties to clinical outcomes. It
enables the engineering translation of structural toxicology into
precision health risk assessment, facilitating the application of
computational modeling in real-world biomedical and pharmaceutical
contexts.
4. Results
4.1. Workflow performance and outcomes
To establish a systematic framework for toxicological prediction, this
study integrated data mining, deep learning modeling, and molecular
simulation techniques to elucidate the potential mechanisms and
validate the molecular targets of pyrethrins in breast cancer. The
workflow and key outcomes can be delineated into four major components,
with the first outlined below: (1) Target Screening and Data
Integration: To construct a reliable list of breast cancer-related
target proteins, transcriptomic datasets from publicly available
repositories were systematically integrated. Potential targets of
pyrethrin I and II were screened and cross-validated using differential
expression analysis. Two representative datasets were
selected—[69]GSE15852 and [70]GSE42568—which respectively include
breast tumor and normal tissue samples, as well as various breast
cancer subtypes. (2) Differential gene expression analysis was
performed using the limma package in R. Genes were filtered based on
the criteria of |log[2] fold change (FC)| > 2 and false discovery rate
(FDR) < 0.05, ensuring both statistical significance and biological
relevance. (1) In [71]GSE15852, a total of 1027 differentially
expressed genes (DEGs) were identified, including 624 upregulated and
403 downregulated genes. Notably, RPS6KB1 and MAOB were significantly
upregulated, implicating roles in the mTOR signaling and metabolic
pathways respectively. (2) In [72]GSE42568, 875 DEGs were identified
(503 upregulated, 372 downregulated). CSNK2A1 was markedly upregulated
in triple-negative breast cancer (TNBC) samples, suggesting
subtype-specific relevance. (3) Results are visualized through volcano
plots ([73]Fig. 2(a) and (b)), with the x-axis representing log[2]FC
and the y-axis showing −log[10](FDR). (1) Blue dots indicate
significantly upregulated genes, (2) Purple dots denote significantly
downregulated genes, (3) Red areas represent non-significant genes.
Fig. 2.
[74]Fig. 2
[75]Open in a new tab
Volcano plots of DEGs from two GEO datasets ([76]GSE15852 &
[77]GSE42568).
Key targets such as RPS6KB1, MAOB, and CSNK2A1 are located in highly
significant and strongly altered regions of the plots, highlighting
their potential as critical risk-associated targets. These proteins are
also annotated in the TTD and PharmGKB databases as breast
cancer-related, and structural data are available from PDB or
AlphaFold, qualifying them for subsequent deep learning-based binding
prediction and molecular simulation. This step effectively filters
transcriptomic data to yield target candidates with both structural
availability and functional significance, establishing a solid
biological foundation for the multi-scale analytical framework.
4.2. Deep learning prediction performance and hotspot analysis
To systematically predict potential interactions between pyrethrins
(Pyrethrin I and II) and breast cancer-related proteins, this study
employed the DeepPurpose framework for drug-target binding affinity
prediction. The model utilized a Message Passing Neural Network (MPNN)
as the compound structure encoder combined with a 3D-CNN for protein
sequence processing. After transfer learning fine-tuning on the DAVIS
dataset, the model architecture named model_MPNN_CNN_DAVIS was
established.
The model outputs a continuous binding score (DTI score) ranging from 0
to 10, representing the predicted interaction strength between compound
and protein. Higher scores indicate stronger expected binding affinity.
Predictions were conducted for all candidate breast cancer-related
target proteins, calculating binding scores of Pyrethrin I and II, with
scores ranging from 3.5 to 7.5.
To enhance the efficiency and reliability of subsequent molecular
simulations, a DTI score threshold of >7.0 was set to define
high-confidence interaction pairs. These selected pairs were further
subjected to molecular docking and free energy simulations.
For intuitive visualization of the prediction score distribution,
[78]Fig. 3 employs a bubble plot to depict model predictions. The
x-axis represents predicted binding scores, the y-axis lists
compound-target pairs, and bubble sizes correspond to the magnitude of
the scores. Key observations from the plot include: (1) most pairs
cluster around scores between 5.0 and 6.5, representing the model's
primary prediction range and the docking binding energy concentration;
(2) high-score regions (>7.0) with large bubbles indicate high-priority
pairs such as Pyrethrin I–RPS6KB1 and Pyrethrin II–MAOB, which also
showed stable binding energies (ΔG < −20 kcal/mol) in subsequent
simulations; (3) a minority of pairs with lower scores (<4.5) scattered
on the left side denote low-interaction potential.
Fig. 3.
Fig. 3
[79]Open in a new tab
Distribution of predicted binding scores for pyrethrin I and II against
breast cancer candidate targets using MPNN-CNN model.
By integrating model predictions and hotspot visualization, this study
successfully established a binding affinity–based target ranking
strategy with clear prioritization of compound-target pairs. This
approach provides a robust preprocessing basis for molecular simulation
modules and demonstrates the application potential and stability of the
DeepPurpose model in environmental toxicant target prediction.
4.3. Molecular docking and Structural Validation
To validate the structural plausibility of the top-scoring drug–target
pairs predicted by the DTI model, molecular docking simulations were
conducted for the four highest-ranked combinations: Pyrethrin
I–RPS6KB1, Pyrethrin I–MAOB, Pyrethrin II–CSNK2A1, and Pyrethrin
I–TNKS2. Receptor structures were obtained from either the RCSB PDB
database or AlphaFold-predicted models. Preparation steps included
water molecule removal, hydrogen atom addition, and binding pocket
definition. Docking was performed using AutoDock4.
Each docking result was visualized from two perspectives: (1) Labeled
Complex View – ligands (Pyrethrins) are represented as blue sticks,
with hydrogen bonds (dashed lines), hydrophobic interactions, and key
interacting residues highlighted. (2) Pocket Surface View – the
receptor protein is shown in surface representation, with the binding
pocket highlighted in red, clearly illustrating the ligand's position.
Key findings are as follows.
* 1.
RPS6KB1–Pyrethrin I ([80]Fig. 4 (a)): Pyrethrin I fits stably into
a hydrophobic core adjacent to the ATP-binding site of RPS6KB1,
forming two stable hydrogen bonds with His104 and Asp113. This
interaction may potentially modulate downstream mTOR
phosphorylation activity.
* 2.
MAOB–Pyrethrin I ([81]Fig. 4(b)): The ligand-binding pocket is
deeply embedded in the FAD region. Pyrethrin I exhibits strong
hydrophobic interactions and π–π stacking, and the binding site
perfectly overlaps with the UniProt-defined catalytic domain,
suggesting a potential inhibitory effect on MAOB oxidase activity.
* 3.
CSNK2A1–Pyrethrin II ([82]Fig. 4(c)): Pyrethrin II binds within the
kinase active pocket of CSNK2A1, forming hydrogen bonds and
hydrophobic interactions with Glu114 and Val66, aligning with the
CK2 active region annotated by PFAM. This indicates a high degree
of target specificity.
* 4.
TNKS2–Pyrethrin I ([83]Fig. 4 (d)): Pyrethrin I binds to the
N-terminal groove of the PARP module in TNKS2, interacting with
Asp1060 and Arg1064. These interactions may interfere with the
ADP-ribosylation activity of TNKS2.
Fig. 4.
[84]Fig. 4
[85]Open in a new tab
Molecular docking and structural validation of pyrethrin I–RPS6KB1,
pyrethrin I–MAOB, pyrethrin II–CSNK2A1, and pyrethrin I–TNKS2.
By integrating UniProt annotations and KEGG functional module analyses,
it is evident that all ligand-binding sites are located within the core
functional domains of the proteins. This underscores the biological
plausibility and structural reliability of the docking results.
Together, these findings validate the deep learning predictions and
offer stable starting conformations for subsequent free energy
calculations and molecular dynamics simulations.
4.4. Molecular dynamics simulation and stability validation
To evaluate the binding stability of Pyrethrin compounds with target
proteins, 100-ns (ns) molecular dynamics (MD) simulations were
conducted. The results demonstrated that the Pyrethrin I–MAOB complex
exhibited high structural stability. According to the structural
superimposition ([86]Fig. 5), the root mean square deviation (RMSD)
between the final stable conformation and the initial structure
remained below 0.3 nm, with critical hydrogen bond networks maintained
throughout the simulation, indicating stable binding. Among all
simulated complexes, Pyrethrin I–MAOB showed the best stability, with
an average RMSD of 0.28 ± 0.04 nm and fewer than one hydrogen bond on
average, reaching equilibrium by 25 ns. This suggests a high
consistency between the simulated and initial conformations.
Fig. 5.
Fig. 5
[87]Open in a new tab
Structural overlay of Pyrethrin I–MAOB complex. (The initial
configuration (gray) and final conformation after 100 ns simulation
(orange) show minimal deviation. Pyrethrin I is highlighted in green.
The preserved hydrogen bond network and low RMSD (<0.3 nm) indicate
high structural stability.).
In comparison, the Pyrethrin I–RPS6KB1 complex exhibited a higher RMSD
of 0.56 ± 0.05 nm but stabilized by 60 ns. Coupled with previous free
energy analysis, this complex remains a promising candidate for
mechanistic studies and risk assessment. The Pyrethrin I–TNKS2 complex
showed poor stability, with an RMSD of 1.13 ± 0.05 nm and fluctuating
hydrogen bond counts, indicating significant structural variations
early in the simulation. Pyrethrin II–CSNK2A1 exhibited moderate
stability, with an average RMSD of 0.33 ± 0.06 nm, reaching equilibrium
by 15 ns, making it a potential candidate for further modeling.
5. Summary of analytical results and discussions
In this study, a comprehensive multi-scale analytical framework was
developed and applied to assess the carcinogenic potential of pyrethrin
compounds through a sequence of computational approaches, including
molecular docking, MM-PBSA-based binding free energy analysis,
molecular dynamics (MD) simulations, protein–protein interaction (PPI)
network centrality evaluation, and pathway–clinical mechanism mapping.
[88]Table 1∼4 summarize the primary outcomes across these analytical
stages. Each compound–protein complex was evaluated in terms of binding
affinity, structural stability, biological function, and mechanistic
linkage to clinical phenotypes.
Table 1.
Analysis results of Molecular Binding and Energetics.
Complex ΔG[bind] (kcal/mol) Dominant Interaction Interpretation
Pyrethrin I–RPS6KB1 −27.37 Electrostatic + Hydrophobic Strongest
affinity, oncogenic potential
Pyrethrin I–MAOB −15.66 Electrostatic Polar interaction model
Pyrethrin II–CSNK2A1 −22.44 Hydrophobic Mitogenic signal contributor
Pyrethrin I–TNKS2 −21.94 Electrostatic W[nt] pathway regulation
[89]Open in a new tab
5.1. Molecular binding and energetics
The MM-PBSA results in [90]Table 1 revealed that the Pyrethrin
I–RPS6KB1 complex exhibited the lowest binding free energy
(ΔG[bind] = −27.37 kcal/mol), with dominant electrostatic and
hydrophobic interactions. This dual-mode binding suggests enhanced
structural stabilization, indicating a strong biological relevance and
potentially high toxicity.
In contrast, MAOB showed the most pronounced electrostatic dominance
(ΔE[elec] = −122.68 kcal/mol), making it a promising model for
exploring polar active sites in drug development.
5.2. Structural stability from molecular dynamics
The MD simulation results in [91]Table 2 further supported the docking
outcomes. The Pyrethrin I–MAOB complex displayed the lowest RMSD
(0.28 ± 0.04 nm) and stabilized early at 25 ns, confirming its
structural consistency and potential as a reliable model for future
ligand optimization. While RPS6KB1 showed moderate RMSD
(0.56 ± 0.05 nm), it achieved convergence by 60 ns, maintaining
relevance due to its strong binding affinity.
Table 2.
Analysis results of Structural Stability from Molecular Dynamics.
Complex Avg. RMSD (nm) H-Bond Count Stability Onset (ns) Interpretation
Pyrethrin I–MAOB 0.28 ± 0.04 <1 25 High structural integrity
Pyrethrin I–RPS6KB1 0.56 ± 0.05 1–6 60 Stable after adjustment
Pyrethrin I–TNKS2 1.13 ± 0.05 2–3 50 Unstable early phase
Pyrethrin II–CSNK2A1 0.33 ± 0.06 <1 15 Moderate model candidate
[92]Open in a new tab
5.3. Network centrality and functional relevance
PPI analysis results in [93]Table 3 identified MAOB as the most
critical intermediary hub (Betweenness = 2193.1), potentially acting as
a signal relay in tumor microenvironment reprogramming. Meanwhile,
TNKS2 held the highest degree centrality (Degree = 10) and was
associated with stemness maintenance through the W[nt]/β-catenin axis,
highlighting its role in triple-negative breast cancer (TNBC)
progression.
Table 3.
PPI analysis results of Network Centrality and Functional Relevance.
Protein Betweenness Degree Functional Pathway Oncogenic Role
MAOB 2193.1 7.0 Oxidative Stress/ERR Metabolic modulation
RPS6KB1 1492.3 7.0 PI3K/AKT/mTOR ER + cell survival driver
CSNK2A1 152.0 2.0 SOX4/TOP2A Proliferation signal
TNKS2 80.0 10.0 Wnt/PARP family Stemness, TNBC aggressiveness
[94]Open in a new tab
5.4. Mechanistic chains and clinical risk mapping
By linking molecular binding data to clinical insights, the study
established a mechanistic inference chain: Compound → Target → Pathway
→ Clinical Outcome. The analysis results are listed in [95]Table 4.
Pyrethrin I's interaction with RPS6KB1 is particularly critical as it
activates the S6K1/PI3K/AKT/mTOR pathway, strongly associated with
ER + breast cancer proliferation. Similarly, TNKS2 modulates W[nt]
signaling, crucial in stemness and TNBC progression.
Table 4.
The analysis results of Mechanistic Chains and Clinical Risk Mapping.
Compound–Target Pathway Clinical Association Risk Potential
Pyrethrin I–RPS6KB1 PI3K/AKT ER + cancer proliferation Highest
Pyrethrin I–TNKS2 Wnt/β-catenin TNBC invasiveness High
Pyrethrin II–CSNK2A1 SOX4/TOP2A Cell cycle activation Moderate–High
Pyrethrin I–MAOB ERR/HIF-1α Metastasis remodeling Moderate
[96]Open in a new tab
5.5. Key findings and applications
Based on the comprehensive analytical workflow, several key conclusions
and applications have emerged. Among the screened compound-target
interactions, Pyrethrin I–RPS6KB1 demonstrated the highest binding
affinity, indicating a potential oncogenic mechanism through activation
of the mTOR signaling pathway. Structurally, the Pyrethrin I–MAOB
complex exhibited superior spatial stability, suggesting its
suitability as a robust template for structural optimization and
refinement. At the network level, MAOB was identified as a central
signaling mediator, while TNKS2 emerged as a critical hub protein,
functionally linking pyrethrin exposure to aggressive cancer
phenotypes. Collectively, these findings reveal the mechanistic
complexity of Pyrethrin I, which appears capable of influencing
multiple carcinogenic pathways, underscoring its broad toxicological
significance and highlighting the need for further risk assessment in
environmental and pharmaceutical contexts.
Based on the findings of this study, Pyrethrin I demonstrated high
binding affinity and oncogenic potential toward breast cancer-related
proteins, particularly RPS6KB1 and TNKS2. It is therefore recommended
that empirical assays be conducted to validate its toxicological
effects in ER+ and TNBC models. In parallel, the Pyrethrin I–MAOB
complex exhibited exceptional structural stability, suggesting its
suitability as a template for ligand optimization and rational drug
design. Furthermore, to enhance the predictive accuracy and
interpretability of environmental toxicology models, the integration of
the proposed multi-scale mechanistic chain (as illustrated in [97]Table
5 and [98]Fig. 6) is strongly advised. This framework links molecular
binding, protein network topology, pathway enrichment, and clinical
associations, providing a traceable and mechanistically coherent
platform for environmental risk assessment.
Table 5.
Environmental risk modeling framework.
Level Indicator Example Purpose
Molecular Binding ΔG[bind] (MM-PBSA) Pyrethrin I–RPS6KB1
(−27.37 kcal/mol) Binding strength of toxins
Protein Module MCODE, Betweenness RPS6KB1 (PI3K), TNKS2 (W[nt])
Identify network hubs
Pathway Enrichment KEGG, FDR <0.05 PI3K/AKT, W[nt], SOX4 Map
cancer-driving pathways
Clinical Relevance GWAS, GEO, Literature RPS6KB1 (ER+), TNKS2 (TNBC)
[99]Open in a new tab
Fig. 6.
[100]Fig. 6
[101]Open in a new tab
Multi-layered mechanistic pathway supporting toxicological risk
assessment.
This integrative strategy paves the way for a traceable, multi-level
framework in environmental toxicology, offering a rational path from
compound structure to clinical outcome and enabling both risk
forecasting and therapeutic re-evaluation.
To further enhance interpretability, a Sankey diagram ([102]Fig. 7) was
constructed to visualize the integrated multi-scale mechanistic chain
linking chemical compounds → target proteins → signaling pathways →
clinical outcomes. The diagram illustrates that RPS6KB1 and TNKS2
correspond to core oncogenic pathways—PI3K/AKT/mTOR and Wnt/β-catenin,
respectively—highlighting their roles in tumor proliferation, stemness
maintenance, and anti-apoptotic signaling.
Fig. 7.
[103]Fig. 7
[104]Open in a new tab
Sankey diagram of pyrethrin i's translational mechanism linking target
binding, pathways, and clinical associations.
Binding energy analysis identified Pyrethrin I–RPS6KB1 as the strongest
interaction (ΔG = −27.37 kcal/mol), driven by combined hydrophobic and
electrostatic forces. In contrast, Pyrethrin I–MAOB exhibited the
highest electrostatic contribution, indicating a highly polar binding
profile.
In terms of structural stability, the Pyrethrin I–MAOB complex showed
the lowest RMSD, suggesting robust conformational integrity and making
it a reliable model for further structural refinement. Network topology
analysis revealed MAOB as having the highest betweenness centrality,
suggesting a key role in signal transduction, while TNKS2 emerged as
the highest degree hub, positioning it as a pivotal regulator within
the Wnt module.
Collectively, these results demonstrate that Pyrethrin I exhibits
selective binding across key protein targets with both molecular
specificity and system-level functional consequences. The integration
of this mechanistic chain enhances the predictive power and
interpretability of environmental toxicology models. Given their
involvement in critical oncogenic pathways, RPS6KB1 and TNKS2 are
recommended as priority targets for environmental exposure monitoring
and toxicological assessment.
6. Conclusion and recommendations
6.1. Conclusion
This study demonstrates the utility of a multi-scale computational
framework for elucidating the toxicological potential of environmental
chemicals using integrative big data analytics. By combining molecular
docking, free energy calculation (MM-PBSA), molecular dynamics
simulation, protein–protein interaction networks, and pathway–clinical
mapping, we systematically evaluated the interaction profiles and
downstream oncogenic implications of pyrethrin compounds on breast
cancer-related proteins.
Key findings include the identification of RPS6KB1 as the most
thermodynamically favorable and functionally critical target
(ΔG = −27.37 kcal/mol), and TNKS2 as a high-centrality Wnt pathway node
associated with TNBC progression. Moreover, MAOB was found to be the
most structurally stable interaction partner (lowest RMSD), suggesting
its potential as a modeling template for future ligand optimization.
The integrative Sankey diagram further provided a mechanistically
traceable link from compound binding to clinical phenotype, bridging
molecular interactions with disease relevance.
This work highlights the importance of mechanistic transparency,
system-level integration, and data-driven prioritization in modern
environmental toxicology.
6.2. Recommendations
* 1.
Integrative Mechanism-Based Modeling: Environmental risk assessment
frameworks should incorporate mechanistic chains spanning from
chemical structure to clinical outcome. The four-layered
approach—molecular binding, protein module, pathway enrichment, and
phenotype linkage—offers a robust model for toxicological
prediction.
* 2.
Experimental Toxicity Validation: The predicted high-risk targets
RPS6KB1 and TNKS2 should be prioritized for in vitro and in vivo
validation under ER+ and TNBC conditions, respectively, to
substantiate their role in chemical-induced carcinogenesis.
* 3.
Data-Centric Toxicological Screening: Multi-omics datasets (e.g.,
GEO, GWAS, pQTL) should be continuously integrated with
cheminformatics and structural databases (e.g., PDB, ChEMBL) via
standardized ETL pipelines to enable scalable, reproducible
screening across chemical libraries.
* 4.
Application in Environmental Health Monitoring: The framework
proposed here may serve as a predictive engine for regulatory
toxicology, environmental pollutant prioritization, and industrial
chemical evaluation, particularly in the context of
endocrine-disrupting compounds and cancer risk modeling.
Informed consent statement
Not applicable.
Author contributions
Conceptualization, Z.J., W.P.S. investigation, Z.J. and J.S,
methodology, Z.J., W.L. W.P. S., validation, Z.J., J.S. and W.L.,
visualization, Z.J., Y.Z. and W.L., writing – original draft,
preparation, Z.J., J.S., Y.Z. and W. P. S., data curation, Z.J. and
Y.H., formal analysis, Z.J., Y.H and W. P. S., software, W.L. and Y.Z.,
resources, Y.H., writing – review and editing, Y.H., Z.J. and W. P. S.,
funding acquisition, Y.H., supervision, Y.H. All authors have read and
agreed to the published version of the manuscript.
Institutional review board statement
Not applicable.
Data availability statement
The data presented in this study are available on request.
Declaration of competing interest
The authors declare that they have no known competing financial
interests or personal relationships that could have appeared to
influence the work reported in this paper.
Acknowledgments