Abstract

   This study develops and validates a multi-scale integrative
   computational toxicology framework to systematically investigate the
   potential association between the natural pesticides pyrethrin I and II
   and breast cancer risk. The proposed approach integrates deep
   learning–based drug–target interaction prediction (via DeepPurpose),
   molecular docking and dynamics (MD) simulations, and protein–protein
   interaction (PPI) network modeling, forming a traceable risk inference
   chain from molecular-level interactions to clinically relevant
   outcomes. Experimental results revealed that pyrethrin I exhibits
   stable and high-affinity binding to key breast cancer–related proteins,
   including RPS6KB1, TNKS2, and MAOB, with a minimum binding free energy
   (ΔG) reaching −27.37 kcal/mol. These interactions potentially modulate
   tumor progression through key oncogenic pathways such as PI3K/AKT,
   Wnt/β-catenin, and metabolic reprogramming. RPS6KB1 is implicated in
   estrogen receptor–positive (ER+) breast cancer proliferation, while
   TNKS2 is closely associated with stemness maintenance and the
   aggressiveness of triple-negative breast cancer (TNBC). MAOB
   demonstrates the highest structural stability among complexes, making
   it a promising candidate for toxicological modeling. The study further
   introduces a cross-scale risk indicator modeling strategy, constructing
   a mechanistically interpretable chain from compound structure to
   protein modules, carcinogenic pathways, and clinical risks. This
   integrative methodology supports environmental exposure monitoring and
   toxicological policy development. The open-source, containerized
   analytical toolchain developed herein is highly extensible and
   adaptable for future toxicological evaluations of other natural
   compounds and emerging environmental pollutants.

   Keywords: Integrative computational toxicology, Multi-omics data
   integration, Deep learning for drug–target interaction, Multi-scale
   molecular simulation, Environmental chemical risk assessment, Molecular
   docking and dynamics, Mechanistic causal inference

Highlights

     * •
       First multi-scale framework linking molecules to clinical
       toxicology outcomes.
     * •
       Pyrethrin I binds key breast cancer proteins with high affinity
       (ΔG ≤ −27.37).
     * •
       MAOB validated as a key hub via MD simulations and network
       analysis.
     * •
       Cross-scale model connects binding strength to clinical GWAS risk
       markers.
     * •
       Open-source pipeline supports toxicology studies of natural and
       emerging toxins.

1. Introduction

   Pyrethrins are natural insecticides derived from the plant Tanacetum
   cinerariifolium. Due to their high biodegradability and low toxicity to
   mammals, they are widely used in agricultural pest control, urban
   sanitation management, and household environments. Owing to these
   characteristics, pyrethrins are often classified as “environmentally
   friendly pesticides” [[33][1], [34][2], [35][3]]. However, recent
   studies have indicated that pyrethrins and their metabolites may
   possess endocrine-disrupting activity [[36]4,[37]5], and prolonged
   exposure may increase the risk of hormone-related diseases, such as
   breast cancer [[38]5,[39]6]. Although the underlying mechanisms remain
   unclear, these potential risks have raised academic concerns about
   their carcinogenicity. Despite the favorable attention pyrethrins
   receive due to their natural origin and biocompatibility, their
   toxicological risk assessment faces multiple challenges. Existing
   studies have primarily focused on animal experiments and biochemical
   analyses, emphasizing acute or subacute toxicity testing
   [[40]1,[41]4,[42][7], [43][8], [44][9], [45][10], [46][11]], but offer
   limited insights into the long-term health effects of chronic low-dose
   exposure, especially regarding endocrine disruption and cancer risk.
   Moreover, there is a lack of integrative cross-scale analytical
   frameworks that connect molecular binding, protein interaction
   networks, and epidemiological data, resulting in a significant “scale
   gap” [[47]4,[48]5,[49][8], [50][9], [51][10], [52][11]].

   Currently, toxicological methods are insufficient for quantifying the
   causal chain from “molecular binding free energy (ΔG) → network
   perturbation → disease risk,” which limits the ability to infer health
   impacts from molecular-level data [[53][12], [54][13], [55][14]].
   Traditional QSAR models also often neglect protein conformational
   changes, leading to distorted predictions of binding affinity
   [[56][15], [57][16], [58][17]]. Furthermore, existing risk assessments
   typically rely on single data sources and lack integration of
   multi-omics data and exposure information, resulting in fragmented
   predictions that are inadequate for supporting intelligent
   environmental monitoring and policy applications [[59]18,[60]19].

   Therefore, there is an urgent need to establish a cross-scale,
   computationally reproducible, and application-oriented analytical
   framework to enhance the toxicity interpretation and risk management
   efficiency for pyrethrins and other emerging pollutants. This study
   aims to develop a multi-scale computational toxicology framework,
   selecting pyrethrins I and II as model compounds to investigate their
   potential carcinogenic mechanisms in relation to molecular binding,
   protein interaction networks, and breast cancer expression. The
   specific objectives are: (1) To apply deep learning and molecular
   simulation techniques to identify target proteins with high binding
   affinity at the molecular level; (2) To assess the potential causal
   relationships between these target proteins and breast cancer subtypes;
   (3) To integrate cross-scale data to construct a “compound–protein
   module–disease” risk propagation chain; (4) To develop predictive risk
   indicator functions that model the exposure–disease response surface.

   This study proposes several technological innovations to enhance the
   accuracy and operational viability of toxicological risk assessment:
   (1) Cross-scale integrated model construction: Integrating drug–target
   interaction (DTI) prediction, molecular docking and dynamics (MD)
   simulations, and systems biology approaches to, for the first time,
   establish a complete carcinogenic mechanism pathway from a
   computational toxicology perspective; (2) Quantifiable risk evaluation
   formulas: Deriving logical risk functions based on molecular binding
   free energy to build interpretable and quantifiable exposure–response
   models (risk surface modeling); (3) Modular analytical toolchain:
   Developing an open-source, containerized toolchain compatible with
   Docker and high-performance computing platforms to support reproducible
   deployment and flexible application of the methods; (4)
   Application-oriented engineering design: The research outcomes are
   applicable to environmental monitoring, green pesticide development,
   and toxic substance policy-making, demonstrating interdisciplinary
   engineering value and application potential.

   The established multi-scale analytical framework can be extended to the
   following future applications: (1) Environmental toxicant early warning
   platforms: Combining IoT sensor data for real-time identification of
   high-risk pollution sources to support smart city environmental
   governance; (2) Chemical risk classification and management: Assisting
   regulatory authorities in developing risk-based classification and
   restriction standards for chemicals; (3) Cross-disciplinary educational
   outreach: Transforming into educational modules applicable to fields
   such as environmental engineering, bioinformatics, and toxicology; (4)
   Support for national toxicology policy: Providing a computational
   foundation for risk prediction of emerging pollutants (e.g., PFAS,
   naturally derived compounds) to aid national-level risk management.

   This study aspires to provide an extensible, reproducible, and
   practically valuable analytical framework for environmental
   toxicological risk assessment through an integrated and
   engineering-oriented computational methodology, thereby strengthening
   the scientific basis for the governance of emerging pollutants.

2. Methodology

   This study aims to elucidate the potential molecular mechanisms by
   which pyrethrin I and II may act on breast cancer-related targets.
   Initially, known targets of pyrethrin I and II were retrieved from the
   ChEMBL database. Standardized SMILES structures were obtained from
   PubChem and analyzed using SwissTargetPrediction to identify additional
   putative targets. Breast cancer-associated genes were compiled from
   authoritative databases, including GeneCards, OMIM, PharmGKB, and TTD.
   In addition, two microarray datasets ([61]GSE42568 and [62]GSE15852)
   from the GEO database were analyzed for differential gene expression,
   with the top 500 upregulated and downregulated genes selected as
   candidate targets.

   To strengthen the disease relevance of these targets, genome-wide
   association study (GWAS) data for breast cancer were integrated with
   protein quantitative trait loci (pQTL) data from the Finngen project
   (Olink and SomaScan platforms). A two-sample Mendelian randomization
   (MR) analysis was conducted to infer causal relationships between
   targets and disease traits. SMILES representations of pyrethrin I and
   II, along with target protein sequences, were input into the
   DeepPurpose deep learning model to predict binding affinities. Targets
   with scores above 7 were selected for molecular docking, with 100
   conformations generated for each complex. The conformation with the
   lowest binding energy was considered optimal.

   Selected ligand–target complexes underwent 100-ns molecular dynamics
   (MD) simulations. Key dynamic properties, including root-mean-square
   deviation (RMSD), root-mean-square fluctuation (RMSF), radius of
   gyration (Rg), intramolecular hydrogen bonding, and free energy
   landscapes, were analyzed. Binding free energies during the stable
   phase were calculated using the molecular mechanics/Poisson–Boltzmann
   surface area (MM/PBSA) method. Proteins with docking energies below
   −5 kcal/mol were imported into the STRING database to construct
   protein–protein interaction (PPI) networks, using a medium confidence
   threshold of 0.400. Topological features such as degree and betweenness
   centrality were analyzed. Finally, Gene Ontology (GO) annotation and
   Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment were
   performed to explore the biological processes underlying the potential
   anti-breast cancer mechanisms of pyrethrin I and II.

3. Analytical methods

3.1. Data sources and integration strategy

   To support the proposed multi-scale drug screening and mechanistic
   analysis framework, a three-layer data integration architecture was
   established. This structure comprises compound-level data,
   disease-associated targets, and protein structural information,
   ensuring a comprehensive and traceable workflow from molecular input to
   clinical relevance.

   At the compound level, primary chemical and bioactivity data were
   collected from the ChEMBL database. SMILES representations were sourced
   from PubChem and processed via SwissTargetPrediction to enhance the
   scope of target identification. At the disease level, breast
   cancer-related genes were compiled from GeneCards, OMIM, PharmGKB, and
   TTD. Differential expression analysis of two GEO datasets ([63]GSE42568
   and [64]GSE15852) was performed to refine the disease-specific gene set
   and identify dynamic biomarkers.

   At the protein structure level, experimentally resolved structures were
   obtained from the RCSB Protein Data Bank (PDB). Where high-resolution
   structures were unavailable, AlphaFold-predicted models were used.
   Functional annotations and sequence alignments from UniProt ensured
   consistency between protein structure and function.

   All data underwent a standardized ETL (Extract–Transform–Load) process,
   including data extraction, format harmonization, nomenclature
   standardization, redundancy elimination, and cross-database mapping.
   The processed datasets were loaded into a unified analytical platform
   to support downstream machine learning, mechanistic modeling, and
   simulation validation. This integration strategy not only ensures data
   interoperability and scalability but also provides a robust foundation
   for cross-scale modeling in drug discovery.

3.2. Multi-scale analytical framework

   This study proposes an integrated multi-scale analytical framework to
   systematically investigate the interaction mechanisms between pyrethrin
   and breast cancer-related targets. The framework encompasses three
   analytical layers, beginning with network toxicology modeling to
   establish a structured systems-level perspective. Specifically: (1) A
   protein–protein interaction (PPI) network was constructed using STRING
   with a confidence threshold of 0.4 to capture molecular interaction
   profiles; (2) KEGG pathway enrichment analysis (FDR <0.05) was employed
   to identify regulatory and signaling pathways associated with the
   identified targets; (3) The resulting network modules were mapped to
   breast cancer oncogenic pathways to assess potential phenotypic
   relevance.

   In the second phase, a deep learning–based drug–target interaction
   (DTI) prediction model was developed using a structure-aware
   architecture. The compound encoder integrates a message passing neural
   network (MPNN) with an attention mechanism to enhance feature
   extraction, particularly for cyclic molecular substructures. The
   protein encoder utilizes a three-dimensional convolutional neural
   network (3D-CNN) with a kernel size of 5 × 5 × 5 to learn spatial
   structural representations. A transfer learning strategy was
   adopted—pretraining the model on the DAVIS dataset followed by
   task-specific fine-tuning—to optimize predictive performance for
   pyrethrin-related targets.

   To validate the predicted interactions, molecular docking and molecular
   dynamics (MD) simulations were performed. AutoDock was used for initial
   docking, and GROMACS was employed to conduct 100-ns simulations to
   assess the structural stability of ligand–protein complexes. The
   robustness of the interaction was further quantified using MM-PBSA
   (Molecular Mechanics Poisson–Boltzmann Surface Area) free energy
   decomposition. The binding free energy was calculated as:
   ΔG[bind] = ΔE[MM] + ΔG[solv] (1)

   Where, ΔE[MM] denotes the molecular mechanics energy (including
   electrostatic and van der Waals contributions), and ΔG[solv] accounts
   for solvation effects.

   Overall, this multi-scale framework effectively integrates systems
   biology and computational chemistry methodologies, providing a novel
   paradigm for mechanistic studies of natural compounds with anticancer
   potential.

   In the core computational component of this study, a cross-modal
   attention mechanism was introduced to quantify the interaction between
   drug and target features. The attention weight is calculated as
   follows:
   [MATH:
   <mrow><msub><mo>∝</mo><mrow><mi>i</mi><mi>j</mi></mrow></msub><mo
   linebreak="badbreak">=</mo><mi>s</mi><mi>o</mi><mi>f</mi><mi>t</mi><mi>
   m</mi><mi>a</mi><mi>x</mi><mrow><mo
   stretchy="true">(</mo><mfrac><mrow><mi>Q</mi><msup><mi>K</mi><mi>T</mi>
   </msup></mrow><msub><mi>d</mi><mi>k</mi></msub></mfrac><mo
   stretchy="true">)</mo></mrow></mrow> :MATH]
   (2)

   where, Q and K represent the feature matrices of the compound and the
   protein, respectively, and d[k] = 64 serves as a scaling factor. This
   formulation enables the model to capture the alignment and interaction
   strength between different modalities—namely, chemical and biological
   features—thereby enhancing the interpretability and precision of the
   prediction results.

   In addition, molecular dynamics simulations were performed to assess
   the structural stability of the protein–ligand complex over time, using
   root mean square deviation (RMSD) as the primary metric. The RMSD is
   calculated as follows:
   [MATH: <mrow><mi>R</mi><mi>M</mi><mi>S</mi><mi>D</mi><mrow><mo
   stretchy="true">(</mo><mi>t</mi><mo stretchy="true">)</mo></mrow><mo
   linebreak="badbreak">=</mo><mfrac><mn>1</mn><mi>N</mi></mfrac><munderov
   er><mo>∑</mo><mrow><mi>i</mi><mo
   linebreak="badbreak">=</mo><mn>1</mn></mrow><mi>N</mi></munderover><msu
   p><mrow><mo>‖</mo><mrow><msub><mi>r</mi><mi>i</mi></msub><mrow><mo
   stretchy="true">(</mo><mi>t</mi><mo stretchy="true">)</mo></mrow><mo
   linebreak="badbreak">−</mo><msubsup><mi>r</mi><mi>i</mi><mrow><mi>r</mi
   ><mi>e</mi><mi>f</mi></mrow></msubsup></mrow><mo>‖</mo></mrow><mn>2</mn
   ></msup></mrow> :MATH]
   (3)

   where N is the number of backbone C[α] atoms, and r[i](t) represents
   the position of atom i at time t. The results demonstrated that the
   conformational fluctuations of the complex remained within a stable
   range throughout the simulation period, indicating the structural
   robustness of the binding interaction.

   To support the initial screening at the network level, we employed the
   confidence integration score from the STRING database, defined as:
   [MATH: <mrow><mi>S</mi><mi>c</mi><mi>o</mi><mi>r</mi><mi>e</mi><mo
   linebreak="badbreak">=</mo><mn>1</mn><mo
   linebreak="goodbreak">−</mo><mrow><munderover><mo>∏</mo><mrow><mi>i</mi
   ><mo
   linebreak="badbreak">=</mo><mn>1</mn></mrow><mi>n</mi></munderover><mro
   w><mo stretchy="true">(</mo><mrow><mn>1</mn><mo
   linebreak="badbreak">−</mo><msub><mi>p</mi><mi>i</mi></msub></mrow><mo
   stretchy="true">)</mo></mrow></mrow></mrow> :MATH]
   (4)

   where p[i] denotes the probability from the i-th data source supporting
   a given protein–protein interaction.

   In the molecular-level analysis, we applied a message passing neural
   network (MPNN) architecture, in which node features are updated over
   time using a gated recurrent unit (GRU). The update function is defined
   as:
   [MATH:
   <mrow><msub><mi>h</mi><mi>v</mi></msub><mfenced><mi>t</mi></mfenced><mo
   >−</mo><mi>G</mi><mi>R</mi><mi>U</mi><mo>(</mo><msub><mi>h</mi><mi>v</m
   i></msub><mfenced><mrow><mi>t</mi><mo>−</mo><mn>1</mn></mrow></mfenced>
   <mo>,</mo><munder><mo>∑</mo><mrow><mi>u</mi><mo>∈</mo><mo
   stretchy="true">(</mo><mi>n</mi><mfenced><mi>v</mi></mfenced></mrow></m
   under><mrow><mi>W</mi><mo>×</mo><mo
   stretchy="true">[</mo><msub><mi>h</mi><mi>v</mi></msub><mrow><mfenced><
   mrow><mi>t</mi><mo>−</mo><mn>1</mn></mrow></mfenced><mo>‖</mo></mrow></
   mrow><msub><mi>e</mi><mrow><mi>u</mi><mi>v</mi></mrow></msub></mrow><mo
   >]</mo><mo>)</mo> :MATH]
   (5)

   where h[v](t) is the hidden state of node v at time t, and e[uv]
   denotes the edge features between nodes u and v.

   For spatial feature extraction of protein 3D structures, we adopted a
   three-dimensional convolutional neural network (3D-CNN), with the
   convolution operation defined as:
   [MATH:
   <msub><mi>F</mi><mrow><mi>o</mi><mi>u</mi><mi>t</mi></mrow></msub><mfen
   ced><mrow><mi>x</mi><mo>,</mo><mi>y</mi><mo>,</mo><mi>z</mi></mrow></mf
   enced><mo>=</mo><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mo>−</
   mo><mi>k</mi></mrow><mi>k</mi></munderover><munderover><mo>∑</mo><mrow>
   <mi>j</mi><mo>=</mo><mo>−</mo><mi>k</mi></mrow><mi>k</mi></munderover><
   munderover><mo>∑</mo><mrow><mi>l</mi><mo>=</mo><mo>−</mo><mi>k</mi></mr
   ow><mi>k</mi></munderover><mo>(</mo><mi>W</mi><mfenced><mrow><mi>i</mi>
   <mo>,</mo><mi>j</mi><mo>,</mo><mi>l</mi></mrow></mfenced><mo>×</mo><msu
   b><mi>F</mi><mrow><mi>i</mi><mi>n</mi></mrow></msub><mfenced><mrow><mi>
   x</mi><mo>+</mo><mi>i</mi><mo>,</mo><mi>y</mi><mo>+</mo><mi>j</mi><mo>,
   </mo><mi>z</mi><mo>+</mo><mi>l</mi></mrow></mfenced><mo>)</mo> :MATH]
   (6)

   This operation captures local spatial patterns in the protein volume
   data and enhances structure-based representation learning.

   Taken together, the innovations of this study include: (1) the
   establishment of a closed-loop verification pipeline integrating
   network prediction, deep learning, and atomic-scale simulation; (2) the
   introduction of structure-aware attention mechanisms in the DTI model
   to improve the representation learning of natural product features; and
   (3) quantitative free energy analysis revealing a synergistic effect
   between electrostatic and hydrophobic interactions in pyrethrin binding
   (e.g., ΔE[elec]/ΔG[SA]>3.5).

   This framework is highly extensible and applicable to other natural
   product studies, with customizable parameters such as the confidence
   threshold for target prediction (e.g., Score >0.5–0.7) and the choice
   of molecular simulation force field (e.g., GAFF2 for small-molecule
   optimization).

3.3. Validation and cross-verification of the analytical workflow

   To ensure the credibility and scientific rigor of the overall
   analytical workflow, this study implemented a three-tiered validation
   mechanism, evaluating the process from three perspectives: internal
   consistency, literature-based support, and methodological comparison.
   First, the correlation between the prediction results of the deep
   learning model (DTI) and the binding free energy derived from molecular
   docking was assessed. The Spearman rank correlation coefficient reached
   ρ = 0.82 (p < 0.001), indicating a high degree of internal consistency
   and suggesting that the model effectively reflects actual molecular
   binding affinities. Second, the accuracy of predicted binding sites was
   validated through structural alignment. The predicted binding site of
   MAOB exhibited a root-mean-square deviation (RMSD) of only 0.12 nm when
   compared with the known crystal structure (PDB: [65]4MI5), supporting
   the model's capacity to map onto real structural data. Third, a
   comparative analysis at the methodological level revealed that the
   proposed integrated framework (deep learning + molecular simulation)
   outperformed conventional QSAR and standalone MD approaches in both
   accuracy and interpretability, particularly in predicting complex
   structures of natural compounds.

3.4. Limitations and engineering solutions

   Despite the multiple advantages of the proposed methodology, several
   technical challenges and limitations remain, for which corresponding
   engineering solutions are proposed. First, in the calculation of
   binding free energy, the limited accuracy of force field parameters
   leads to a prediction error of approximately ±2.3 kcal/mol for ΔG. To
   mitigate this uncertainty, a hybrid QM/MM (quantum mechanics/molecular
   mechanics) force field will be introduced for refinement. Second, the
   current 100-ns molecular dynamics (MD) simulations may be insufficient
   to capture the full conformational landscape of proteins. To address
   this, Gaussian accelerated molecular dynamics (GaMD) will be employed
   to enhance sampling diversity and better capture energy transition
   pathways. Additionally, during the integration of GWAS results,
   potential bias may arise from population structure heterogeneity within
   the FinnGen dataset. To correct for this and avoid statistical
   distortion, the RAPS (Robust Adjusted Profile Score) method will be
   applied for structural correction and re-estimation.

3.5. Mechanistic integration and engineering translation

   This study further integrates molecular-level binding information with
   clinical risk data to construct a traceable and inferable multi-level
   mechanistic translation chain. The workflow begins with binding energy
   screening between candidate compounds and target proteins
   (ΔG < −9.5 kcal/mol), followed by protein interaction module
   identification (MCODE >4), KEGG pathway enrichment analysis (FDR <0.05)
   to identify potential biological pathways, and culminates in
   statistically significant clinical association analysis (p < 0.05).

   The entire translational chain comprises four hierarchical levels, as
   illustrated in [66]Fig. 1, progressing from molecular characteristics
   to clinical relevance. First, candidate compounds with high binding
   affinity are filtered based on a threshold of ΔG < −9.5 kcal/mol,
   establishing the foundation for subsequent analyses. Next, the targets
   corresponding to these high-affinity ligands are subjected to
   protein-protein interaction network analysis. Protein modules with
   significant modularity are identified using the MCODE algorithm,
   selecting subnetworks with MCODE scores greater than 4. These modular
   proteins are then analyzed through KEGG pathway enrichment, with
   significant pathways defined by an FDR less than 0.05, revealing
   potential involvement in key biological mechanisms. Finally, the
   identified pathways are cross-referenced with clinical databases to
   determine statistically significant associations with diseases or risk
   factors (p < 0.05).

Fig. 1.

   [67]Fig. 1
   [68]Open in a new tab

   Integrative workflow bridging structure, binding, pathway, and clinical
   associations.

   This integrative workflow bridges four analytical layers—structure →
   binding → pathway → clinical—and establishes a mechanistic translation
   framework that traces compound properties to clinical outcomes. It
   enables the engineering translation of structural toxicology into
   precision health risk assessment, facilitating the application of
   computational modeling in real-world biomedical and pharmaceutical
   contexts.

4. Results

4.1. Workflow performance and outcomes

   To establish a systematic framework for toxicological prediction, this
   study integrated data mining, deep learning modeling, and molecular
   simulation techniques to elucidate the potential mechanisms and
   validate the molecular targets of pyrethrins in breast cancer. The
   workflow and key outcomes can be delineated into four major components,
   with the first outlined below: (1) Target Screening and Data
   Integration: To construct a reliable list of breast cancer-related
   target proteins, transcriptomic datasets from publicly available
   repositories were systematically integrated. Potential targets of
   pyrethrin I and II were screened and cross-validated using differential
   expression analysis. Two representative datasets were
   selected—[69]GSE15852 and [70]GSE42568—which respectively include
   breast tumor and normal tissue samples, as well as various breast
   cancer subtypes. (2) Differential gene expression analysis was
   performed using the limma package in R. Genes were filtered based on
   the criteria of |log[2] fold change (FC)| > 2 and false discovery rate
   (FDR) < 0.05, ensuring both statistical significance and biological
   relevance. (1) In [71]GSE15852, a total of 1027 differentially
   expressed genes (DEGs) were identified, including 624 upregulated and
   403 downregulated genes. Notably, RPS6KB1 and MAOB were significantly
   upregulated, implicating roles in the mTOR signaling and metabolic
   pathways respectively. (2) In [72]GSE42568, 875 DEGs were identified
   (503 upregulated, 372 downregulated). CSNK2A1 was markedly upregulated
   in triple-negative breast cancer (TNBC) samples, suggesting
   subtype-specific relevance. (3) Results are visualized through volcano
   plots ([73]Fig. 2(a) and (b)), with the x-axis representing log[2]FC
   and the y-axis showing −log[10](FDR). (1) Blue dots indicate
   significantly upregulated genes, (2) Purple dots denote significantly
   downregulated genes, (3) Red areas represent non-significant genes.

Fig. 2.

   [74]Fig. 2
   [75]Open in a new tab

   Volcano plots of DEGs from two GEO datasets ([76]GSE15852 &
   [77]GSE42568).

   Key targets such as RPS6KB1, MAOB, and CSNK2A1 are located in highly
   significant and strongly altered regions of the plots, highlighting
   their potential as critical risk-associated targets. These proteins are
   also annotated in the TTD and PharmGKB databases as breast
   cancer-related, and structural data are available from PDB or
   AlphaFold, qualifying them for subsequent deep learning-based binding
   prediction and molecular simulation. This step effectively filters
   transcriptomic data to yield target candidates with both structural
   availability and functional significance, establishing a solid
   biological foundation for the multi-scale analytical framework.

4.2. Deep learning prediction performance and hotspot analysis

   To systematically predict potential interactions between pyrethrins
   (Pyrethrin I and II) and breast cancer-related proteins, this study
   employed the DeepPurpose framework for drug-target binding affinity
   prediction. The model utilized a Message Passing Neural Network (MPNN)
   as the compound structure encoder combined with a 3D-CNN for protein
   sequence processing. After transfer learning fine-tuning on the DAVIS
   dataset, the model architecture named model_MPNN_CNN_DAVIS was
   established.

   The model outputs a continuous binding score (DTI score) ranging from 0
   to 10, representing the predicted interaction strength between compound
   and protein. Higher scores indicate stronger expected binding affinity.
   Predictions were conducted for all candidate breast cancer-related
   target proteins, calculating binding scores of Pyrethrin I and II, with
   scores ranging from 3.5 to 7.5.

   To enhance the efficiency and reliability of subsequent molecular
   simulations, a DTI score threshold of >7.0 was set to define
   high-confidence interaction pairs. These selected pairs were further
   subjected to molecular docking and free energy simulations.

   For intuitive visualization of the prediction score distribution,
   [78]Fig. 3 employs a bubble plot to depict model predictions. The
   x-axis represents predicted binding scores, the y-axis lists
   compound-target pairs, and bubble sizes correspond to the magnitude of
   the scores. Key observations from the plot include: (1) most pairs
   cluster around scores between 5.0 and 6.5, representing the model's
   primary prediction range and the docking binding energy concentration;
   (2) high-score regions (>7.0) with large bubbles indicate high-priority
   pairs such as Pyrethrin I–RPS6KB1 and Pyrethrin II–MAOB, which also
   showed stable binding energies (ΔG < −20 kcal/mol) in subsequent
   simulations; (3) a minority of pairs with lower scores (<4.5) scattered
   on the left side denote low-interaction potential.

Fig. 3.

   Fig. 3
   [79]Open in a new tab

   Distribution of predicted binding scores for pyrethrin I and II against
   breast cancer candidate targets using MPNN-CNN model.

   By integrating model predictions and hotspot visualization, this study
   successfully established a binding affinity–based target ranking
   strategy with clear prioritization of compound-target pairs. This
   approach provides a robust preprocessing basis for molecular simulation
   modules and demonstrates the application potential and stability of the
   DeepPurpose model in environmental toxicant target prediction.

4.3. Molecular docking and Structural Validation

   To validate the structural plausibility of the top-scoring drug–target
   pairs predicted by the DTI model, molecular docking simulations were
   conducted for the four highest-ranked combinations: Pyrethrin
   I–RPS6KB1, Pyrethrin I–MAOB, Pyrethrin II–CSNK2A1, and Pyrethrin
   I–TNKS2. Receptor structures were obtained from either the RCSB PDB
   database or AlphaFold-predicted models. Preparation steps included
   water molecule removal, hydrogen atom addition, and binding pocket
   definition. Docking was performed using AutoDock4.

   Each docking result was visualized from two perspectives: (1) Labeled
   Complex View – ligands (Pyrethrins) are represented as blue sticks,
   with hydrogen bonds (dashed lines), hydrophobic interactions, and key
   interacting residues highlighted. (2) Pocket Surface View – the
   receptor protein is shown in surface representation, with the binding
   pocket highlighted in red, clearly illustrating the ligand's position.

   Key findings are as follows.
     * 1.
       RPS6KB1–Pyrethrin I ([80]Fig. 4 (a)): Pyrethrin I fits stably into
       a hydrophobic core adjacent to the ATP-binding site of RPS6KB1,
       forming two stable hydrogen bonds with His104 and Asp113. This
       interaction may potentially modulate downstream mTOR
       phosphorylation activity.
     * 2.
       MAOB–Pyrethrin I ([81]Fig. 4(b)): The ligand-binding pocket is
       deeply embedded in the FAD region. Pyrethrin I exhibits strong
       hydrophobic interactions and π–π stacking, and the binding site
       perfectly overlaps with the UniProt-defined catalytic domain,
       suggesting a potential inhibitory effect on MAOB oxidase activity.
     * 3.
       CSNK2A1–Pyrethrin II ([82]Fig. 4(c)): Pyrethrin II binds within the
       kinase active pocket of CSNK2A1, forming hydrogen bonds and
       hydrophobic interactions with Glu114 and Val66, aligning with the
       CK2 active region annotated by PFAM. This indicates a high degree
       of target specificity.
     * 4.
       TNKS2–Pyrethrin I ([83]Fig. 4 (d)): Pyrethrin I binds to the
       N-terminal groove of the PARP module in TNKS2, interacting with
       Asp1060 and Arg1064. These interactions may interfere with the
       ADP-ribosylation activity of TNKS2.

Fig. 4.

   [84]Fig. 4
   [85]Open in a new tab

   Molecular docking and structural validation of pyrethrin I–RPS6KB1,
   pyrethrin I–MAOB, pyrethrin II–CSNK2A1, and pyrethrin I–TNKS2.

   By integrating UniProt annotations and KEGG functional module analyses,
   it is evident that all ligand-binding sites are located within the core
   functional domains of the proteins. This underscores the biological
   plausibility and structural reliability of the docking results.
   Together, these findings validate the deep learning predictions and
   offer stable starting conformations for subsequent free energy
   calculations and molecular dynamics simulations.

4.4. Molecular dynamics simulation and stability validation

   To evaluate the binding stability of Pyrethrin compounds with target
   proteins, 100-ns (ns) molecular dynamics (MD) simulations were
   conducted. The results demonstrated that the Pyrethrin I–MAOB complex
   exhibited high structural stability. According to the structural
   superimposition ([86]Fig. 5), the root mean square deviation (RMSD)
   between the final stable conformation and the initial structure
   remained below 0.3 nm, with critical hydrogen bond networks maintained
   throughout the simulation, indicating stable binding. Among all
   simulated complexes, Pyrethrin I–MAOB showed the best stability, with
   an average RMSD of 0.28 ± 0.04 nm and fewer than one hydrogen bond on
   average, reaching equilibrium by 25 ns. This suggests a high
   consistency between the simulated and initial conformations.

Fig. 5.

   Fig. 5
   [87]Open in a new tab

   Structural overlay of Pyrethrin I–MAOB complex. (The initial
   configuration (gray) and final conformation after 100 ns simulation
   (orange) show minimal deviation. Pyrethrin I is highlighted in green.
   The preserved hydrogen bond network and low RMSD (<0.3 nm) indicate
   high structural stability.).

   In comparison, the Pyrethrin I–RPS6KB1 complex exhibited a higher RMSD
   of 0.56 ± 0.05 nm but stabilized by 60 ns. Coupled with previous free
   energy analysis, this complex remains a promising candidate for
   mechanistic studies and risk assessment. The Pyrethrin I–TNKS2 complex
   showed poor stability, with an RMSD of 1.13 ± 0.05 nm and fluctuating
   hydrogen bond counts, indicating significant structural variations
   early in the simulation. Pyrethrin II–CSNK2A1 exhibited moderate
   stability, with an average RMSD of 0.33 ± 0.06 nm, reaching equilibrium
   by 15 ns, making it a potential candidate for further modeling.

5. Summary of analytical results and discussions

   In this study, a comprehensive multi-scale analytical framework was
   developed and applied to assess the carcinogenic potential of pyrethrin
   compounds through a sequence of computational approaches, including
   molecular docking, MM-PBSA-based binding free energy analysis,
   molecular dynamics (MD) simulations, protein–protein interaction (PPI)
   network centrality evaluation, and pathway–clinical mechanism mapping.
   [88]Table 1∼4 summarize the primary outcomes across these analytical
   stages. Each compound–protein complex was evaluated in terms of binding
   affinity, structural stability, biological function, and mechanistic
   linkage to clinical phenotypes.

Table 1.

   Analysis results of Molecular Binding and Energetics.
   Complex ΔG[bind] (kcal/mol) Dominant Interaction Interpretation
   Pyrethrin I–RPS6KB1 −27.37 Electrostatic + Hydrophobic Strongest
   affinity, oncogenic potential
   Pyrethrin I–MAOB −15.66 Electrostatic Polar interaction model
   Pyrethrin II–CSNK2A1 −22.44 Hydrophobic Mitogenic signal contributor
   Pyrethrin I–TNKS2 −21.94 Electrostatic W[nt] pathway regulation
   [89]Open in a new tab

5.1. Molecular binding and energetics

   The MM-PBSA results in [90]Table 1 revealed that the Pyrethrin
   I–RPS6KB1 complex exhibited the lowest binding free energy
   (ΔG[bind] = −27.37 kcal/mol), with dominant electrostatic and
   hydrophobic interactions. This dual-mode binding suggests enhanced
   structural stabilization, indicating a strong biological relevance and
   potentially high toxicity.

   In contrast, MAOB showed the most pronounced electrostatic dominance
   (ΔE[elec] = −122.68 kcal/mol), making it a promising model for
   exploring polar active sites in drug development.

5.2. Structural stability from molecular dynamics

   The MD simulation results in [91]Table 2 further supported the docking
   outcomes. The Pyrethrin I–MAOB complex displayed the lowest RMSD
   (0.28 ± 0.04 nm) and stabilized early at 25 ns, confirming its
   structural consistency and potential as a reliable model for future
   ligand optimization. While RPS6KB1 showed moderate RMSD
   (0.56 ± 0.05 nm), it achieved convergence by 60 ns, maintaining
   relevance due to its strong binding affinity.

Table 2.

   Analysis results of Structural Stability from Molecular Dynamics.
   Complex Avg. RMSD (nm) H-Bond Count Stability Onset (ns) Interpretation
   Pyrethrin I–MAOB 0.28 ± 0.04 <1 25 High structural integrity
   Pyrethrin I–RPS6KB1 0.56 ± 0.05 1–6 60 Stable after adjustment
   Pyrethrin I–TNKS2 1.13 ± 0.05 2–3 50 Unstable early phase
   Pyrethrin II–CSNK2A1 0.33 ± 0.06 <1 15 Moderate model candidate
   [92]Open in a new tab

5.3. Network centrality and functional relevance

   PPI analysis results in [93]Table 3 identified MAOB as the most
   critical intermediary hub (Betweenness = 2193.1), potentially acting as
   a signal relay in tumor microenvironment reprogramming. Meanwhile,
   TNKS2 held the highest degree centrality (Degree = 10) and was
   associated with stemness maintenance through the W[nt]/β-catenin axis,
   highlighting its role in triple-negative breast cancer (TNBC)
   progression.

Table 3.

   PPI analysis results of Network Centrality and Functional Relevance.
   Protein Betweenness Degree  Functional Pathway         Oncogenic Role
   MAOB      2193.1     7.0   Oxidative Stress/ERR Metabolic modulation
   RPS6KB1   1492.3     7.0   PI3K/AKT/mTOR        ER + cell survival driver
   CSNK2A1    152.0     2.0   SOX4/TOP2A           Proliferation signal
   TNKS2      80.0      10.0  Wnt/PARP family      Stemness, TNBC aggressiveness
   [94]Open in a new tab

5.4. Mechanistic chains and clinical risk mapping

   By linking molecular binding data to clinical insights, the study
   established a mechanistic inference chain: Compound → Target → Pathway
   → Clinical Outcome. The analysis results are listed in [95]Table 4.
   Pyrethrin I's interaction with RPS6KB1 is particularly critical as it
   activates the S6K1/PI3K/AKT/mTOR pathway, strongly associated with
   ER + breast cancer proliferation. Similarly, TNKS2 modulates W[nt]
   signaling, crucial in stemness and TNBC progression.

Table 4.

   The analysis results of Mechanistic Chains and Clinical Risk Mapping.
     Compound–Target       Pathway      Clinical Association    Risk Potential
   Pyrethrin I–RPS6KB1  PI3K/AKT      ER + cancer proliferation Highest
   Pyrethrin I–TNKS2    Wnt/β-catenin TNBC invasiveness         High
   Pyrethrin II–CSNK2A1 SOX4/TOP2A    Cell cycle activation     Moderate–High
   Pyrethrin I–MAOB     ERR/HIF-1α    Metastasis remodeling     Moderate
   [96]Open in a new tab

5.5. Key findings and applications

   Based on the comprehensive analytical workflow, several key conclusions
   and applications have emerged. Among the screened compound-target
   interactions, Pyrethrin I–RPS6KB1 demonstrated the highest binding
   affinity, indicating a potential oncogenic mechanism through activation
   of the mTOR signaling pathway. Structurally, the Pyrethrin I–MAOB
   complex exhibited superior spatial stability, suggesting its
   suitability as a robust template for structural optimization and
   refinement. At the network level, MAOB was identified as a central
   signaling mediator, while TNKS2 emerged as a critical hub protein,
   functionally linking pyrethrin exposure to aggressive cancer
   phenotypes. Collectively, these findings reveal the mechanistic
   complexity of Pyrethrin I, which appears capable of influencing
   multiple carcinogenic pathways, underscoring its broad toxicological
   significance and highlighting the need for further risk assessment in
   environmental and pharmaceutical contexts.

   Based on the findings of this study, Pyrethrin I demonstrated high
   binding affinity and oncogenic potential toward breast cancer-related
   proteins, particularly RPS6KB1 and TNKS2. It is therefore recommended
   that empirical assays be conducted to validate its toxicological
   effects in ER+ and TNBC models. In parallel, the Pyrethrin I–MAOB
   complex exhibited exceptional structural stability, suggesting its
   suitability as a template for ligand optimization and rational drug
   design. Furthermore, to enhance the predictive accuracy and
   interpretability of environmental toxicology models, the integration of
   the proposed multi-scale mechanistic chain (as illustrated in [97]Table
   5 and [98]Fig. 6) is strongly advised. This framework links molecular
   binding, protein network topology, pathway enrichment, and clinical
   associations, providing a traceable and mechanistically coherent
   platform for environmental risk assessment.

Table 5.

   Environmental risk modeling framework.
   Level Indicator Example Purpose
   Molecular Binding ΔG[bind] (MM-PBSA) Pyrethrin I–RPS6KB1
   (−27.37 kcal/mol) Binding strength of toxins
   Protein Module MCODE, Betweenness RPS6KB1 (PI3K), TNKS2 (W[nt])
   Identify network hubs
   Pathway Enrichment KEGG, FDR <0.05 PI3K/AKT, W[nt], SOX4 Map
   cancer-driving pathways
   Clinical Relevance GWAS, GEO, Literature RPS6KB1 (ER+), TNKS2 (TNBC)
   [99]Open in a new tab

Fig. 6.

   [100]Fig. 6
   [101]Open in a new tab

   Multi-layered mechanistic pathway supporting toxicological risk
   assessment.

   This integrative strategy paves the way for a traceable, multi-level
   framework in environmental toxicology, offering a rational path from
   compound structure to clinical outcome and enabling both risk
   forecasting and therapeutic re-evaluation.

   To further enhance interpretability, a Sankey diagram ([102]Fig. 7) was
   constructed to visualize the integrated multi-scale mechanistic chain
   linking chemical compounds → target proteins → signaling pathways →
   clinical outcomes. The diagram illustrates that RPS6KB1 and TNKS2
   correspond to core oncogenic pathways—PI3K/AKT/mTOR and Wnt/β-catenin,
   respectively—highlighting their roles in tumor proliferation, stemness
   maintenance, and anti-apoptotic signaling.

Fig. 7.

   [103]Fig. 7
   [104]Open in a new tab

   Sankey diagram of pyrethrin i's translational mechanism linking target
   binding, pathways, and clinical associations.

   Binding energy analysis identified Pyrethrin I–RPS6KB1 as the strongest
   interaction (ΔG = −27.37 kcal/mol), driven by combined hydrophobic and
   electrostatic forces. In contrast, Pyrethrin I–MAOB exhibited the
   highest electrostatic contribution, indicating a highly polar binding
   profile.

   In terms of structural stability, the Pyrethrin I–MAOB complex showed
   the lowest RMSD, suggesting robust conformational integrity and making
   it a reliable model for further structural refinement. Network topology
   analysis revealed MAOB as having the highest betweenness centrality,
   suggesting a key role in signal transduction, while TNKS2 emerged as
   the highest degree hub, positioning it as a pivotal regulator within
   the Wnt module.

   Collectively, these results demonstrate that Pyrethrin I exhibits
   selective binding across key protein targets with both molecular
   specificity and system-level functional consequences. The integration
   of this mechanistic chain enhances the predictive power and
   interpretability of environmental toxicology models. Given their
   involvement in critical oncogenic pathways, RPS6KB1 and TNKS2 are
   recommended as priority targets for environmental exposure monitoring
   and toxicological assessment.

6. Conclusion and recommendations

6.1. Conclusion

   This study demonstrates the utility of a multi-scale computational
   framework for elucidating the toxicological potential of environmental
   chemicals using integrative big data analytics. By combining molecular
   docking, free energy calculation (MM-PBSA), molecular dynamics
   simulation, protein–protein interaction networks, and pathway–clinical
   mapping, we systematically evaluated the interaction profiles and
   downstream oncogenic implications of pyrethrin compounds on breast
   cancer-related proteins.

   Key findings include the identification of RPS6KB1 as the most
   thermodynamically favorable and functionally critical target
   (ΔG = −27.37 kcal/mol), and TNKS2 as a high-centrality Wnt pathway node
   associated with TNBC progression. Moreover, MAOB was found to be the
   most structurally stable interaction partner (lowest RMSD), suggesting
   its potential as a modeling template for future ligand optimization.
   The integrative Sankey diagram further provided a mechanistically
   traceable link from compound binding to clinical phenotype, bridging
   molecular interactions with disease relevance.

   This work highlights the importance of mechanistic transparency,
   system-level integration, and data-driven prioritization in modern
   environmental toxicology.

6.2. Recommendations

     * 1.
       Integrative Mechanism-Based Modeling: Environmental risk assessment
       frameworks should incorporate mechanistic chains spanning from
       chemical structure to clinical outcome. The four-layered
       approach—molecular binding, protein module, pathway enrichment, and
       phenotype linkage—offers a robust model for toxicological
       prediction.
     * 2.
       Experimental Toxicity Validation: The predicted high-risk targets
       RPS6KB1 and TNKS2 should be prioritized for in vitro and in vivo
       validation under ER+ and TNBC conditions, respectively, to
       substantiate their role in chemical-induced carcinogenesis.
     * 3.
       Data-Centric Toxicological Screening: Multi-omics datasets (e.g.,
       GEO, GWAS, pQTL) should be continuously integrated with
       cheminformatics and structural databases (e.g., PDB, ChEMBL) via
       standardized ETL pipelines to enable scalable, reproducible
       screening across chemical libraries.
     * 4.
       Application in Environmental Health Monitoring: The framework
       proposed here may serve as a predictive engine for regulatory
       toxicology, environmental pollutant prioritization, and industrial
       chemical evaluation, particularly in the context of
       endocrine-disrupting compounds and cancer risk modeling.

Informed consent statement

   Not applicable.

Author contributions

   Conceptualization, Z.J., W.P.S. investigation, Z.J. and J.S,
   methodology, Z.J., W.L. W.P. S., validation, Z.J., J.S. and W.L.,
   visualization, Z.J., Y.Z. and W.L., writing – original draft,
   preparation, Z.J., J.S., Y.Z. and W. P. S., data curation, Z.J. and
   Y.H., formal analysis, Z.J., Y.H and W. P. S., software, W.L. and Y.Z.,
   resources, Y.H., writing – review and editing, Y.H., Z.J. and W. P. S.,
   funding acquisition, Y.H., supervision, Y.H. All authors have read and
   agreed to the published version of the manuscript.

Institutional review board statement

   Not applicable.

Data availability statement

   The data presented in this study are available on request.

Declaration of competing interest

   The authors declare that they have no known competing financial
   interests or personal relationships that could have appeared to
   influence the work reported in this paper.

Acknowledgments