Abstract This study develops and validates a multi-scale integrative computational toxicology framework to systematically investigate the potential association between the natural pesticides pyrethrin I and II and breast cancer risk. The proposed approach integrates deep learning–based drug–target interaction prediction (via DeepPurpose), molecular docking and dynamics (MD) simulations, and protein–protein interaction (PPI) network modeling, forming a traceable risk inference chain from molecular-level interactions to clinically relevant outcomes. Experimental results revealed that pyrethrin I exhibits stable and high-affinity binding to key breast cancer–related proteins, including RPS6KB1, TNKS2, and MAOB, with a minimum binding free energy (ΔG) reaching −27.37 kcal/mol. These interactions potentially modulate tumor progression through key oncogenic pathways such as PI3K/AKT, Wnt/β-catenin, and metabolic reprogramming. RPS6KB1 is implicated in estrogen receptor–positive (ER+) breast cancer proliferation, while TNKS2 is closely associated with stemness maintenance and the aggressiveness of triple-negative breast cancer (TNBC). MAOB demonstrates the highest structural stability among complexes, making it a promising candidate for toxicological modeling. The study further introduces a cross-scale risk indicator modeling strategy, constructing a mechanistically interpretable chain from compound structure to protein modules, carcinogenic pathways, and clinical risks. This integrative methodology supports environmental exposure monitoring and toxicological policy development. The open-source, containerized analytical toolchain developed herein is highly extensible and adaptable for future toxicological evaluations of other natural compounds and emerging environmental pollutants. Keywords: Integrative computational toxicology, Multi-omics data integration, Deep learning for drug–target interaction, Multi-scale molecular simulation, Environmental chemical risk assessment, Molecular docking and dynamics, Mechanistic causal inference Highlights * • First multi-scale framework linking molecules to clinical toxicology outcomes. * • Pyrethrin I binds key breast cancer proteins with high affinity (ΔG ≤ −27.37). * • MAOB validated as a key hub via MD simulations and network analysis. * • Cross-scale model connects binding strength to clinical GWAS risk markers. * • Open-source pipeline supports toxicology studies of natural and emerging toxins. 1. Introduction Pyrethrins are natural insecticides derived from the plant Tanacetum cinerariifolium. Due to their high biodegradability and low toxicity to mammals, they are widely used in agricultural pest control, urban sanitation management, and household environments. Owing to these characteristics, pyrethrins are often classified as “environmentally friendly pesticides” [[33][1], [34][2], [35][3]]. However, recent studies have indicated that pyrethrins and their metabolites may possess endocrine-disrupting activity [[36]4,[37]5], and prolonged exposure may increase the risk of hormone-related diseases, such as breast cancer [[38]5,[39]6]. Although the underlying mechanisms remain unclear, these potential risks have raised academic concerns about their carcinogenicity. Despite the favorable attention pyrethrins receive due to their natural origin and biocompatibility, their toxicological risk assessment faces multiple challenges. Existing studies have primarily focused on animal experiments and biochemical analyses, emphasizing acute or subacute toxicity testing [[40]1,[41]4,[42][7], [43][8], [44][9], [45][10], [46][11]], but offer limited insights into the long-term health effects of chronic low-dose exposure, especially regarding endocrine disruption and cancer risk. Moreover, there is a lack of integrative cross-scale analytical frameworks that connect molecular binding, protein interaction networks, and epidemiological data, resulting in a significant “scale gap” [[47]4,[48]5,[49][8], [50][9], [51][10], [52][11]]. Currently, toxicological methods are insufficient for quantifying the causal chain from “molecular binding free energy (ΔG) → network perturbation → disease risk,” which limits the ability to infer health impacts from molecular-level data [[53][12], [54][13], [55][14]]. Traditional QSAR models also often neglect protein conformational changes, leading to distorted predictions of binding affinity [[56][15], [57][16], [58][17]]. Furthermore, existing risk assessments typically rely on single data sources and lack integration of multi-omics data and exposure information, resulting in fragmented predictions that are inadequate for supporting intelligent environmental monitoring and policy applications [[59]18,[60]19]. Therefore, there is an urgent need to establish a cross-scale, computationally reproducible, and application-oriented analytical framework to enhance the toxicity interpretation and risk management efficiency for pyrethrins and other emerging pollutants. This study aims to develop a multi-scale computational toxicology framework, selecting pyrethrins I and II as model compounds to investigate their potential carcinogenic mechanisms in relation to molecular binding, protein interaction networks, and breast cancer expression. The specific objectives are: (1) To apply deep learning and molecular simulation techniques to identify target proteins with high binding affinity at the molecular level; (2) To assess the potential causal relationships between these target proteins and breast cancer subtypes; (3) To integrate cross-scale data to construct a “compound–protein module–disease” risk propagation chain; (4) To develop predictive risk indicator functions that model the exposure–disease response surface. This study proposes several technological innovations to enhance the accuracy and operational viability of toxicological risk assessment: (1) Cross-scale integrated model construction: Integrating drug–target interaction (DTI) prediction, molecular docking and dynamics (MD) simulations, and systems biology approaches to, for the first time, establish a complete carcinogenic mechanism pathway from a computational toxicology perspective; (2) Quantifiable risk evaluation formulas: Deriving logical risk functions based on molecular binding free energy to build interpretable and quantifiable exposure–response models (risk surface modeling); (3) Modular analytical toolchain: Developing an open-source, containerized toolchain compatible with Docker and high-performance computing platforms to support reproducible deployment and flexible application of the methods; (4) Application-oriented engineering design: The research outcomes are applicable to environmental monitoring, green pesticide development, and toxic substance policy-making, demonstrating interdisciplinary engineering value and application potential. The established multi-scale analytical framework can be extended to the following future applications: (1) Environmental toxicant early warning platforms: Combining IoT sensor data for real-time identification of high-risk pollution sources to support smart city environmental governance; (2) Chemical risk classification and management: Assisting regulatory authorities in developing risk-based classification and restriction standards for chemicals; (3) Cross-disciplinary educational outreach: Transforming into educational modules applicable to fields such as environmental engineering, bioinformatics, and toxicology; (4) Support for national toxicology policy: Providing a computational foundation for risk prediction of emerging pollutants (e.g., PFAS, naturally derived compounds) to aid national-level risk management. This study aspires to provide an extensible, reproducible, and practically valuable analytical framework for environmental toxicological risk assessment through an integrated and engineering-oriented computational methodology, thereby strengthening the scientific basis for the governance of emerging pollutants. 2. Methodology This study aims to elucidate the potential molecular mechanisms by which pyrethrin I and II may act on breast cancer-related targets. Initially, known targets of pyrethrin I and II were retrieved from the ChEMBL database. Standardized SMILES structures were obtained from PubChem and analyzed using SwissTargetPrediction to identify additional putative targets. Breast cancer-associated genes were compiled from authoritative databases, including GeneCards, OMIM, PharmGKB, and TTD. In addition, two microarray datasets ([61]GSE42568 and [62]GSE15852) from the GEO database were analyzed for differential gene expression, with the top 500 upregulated and downregulated genes selected as candidate targets. To strengthen the disease relevance of these targets, genome-wide association study (GWAS) data for breast cancer were integrated with protein quantitative trait loci (pQTL) data from the Finngen project (Olink and SomaScan platforms). A two-sample Mendelian randomization (MR) analysis was conducted to infer causal relationships between targets and disease traits. SMILES representations of pyrethrin I and II, along with target protein sequences, were input into the DeepPurpose deep learning model to predict binding affinities. Targets with scores above 7 were selected for molecular docking, with 100 conformations generated for each complex. The conformation with the lowest binding energy was considered optimal. Selected ligand–target complexes underwent 100-ns molecular dynamics (MD) simulations. Key dynamic properties, including root-mean-square deviation (RMSD), root-mean-square fluctuation (RMSF), radius of gyration (Rg), intramolecular hydrogen bonding, and free energy landscapes, were analyzed. Binding free energies during the stable phase were calculated using the molecular mechanics/Poisson–Boltzmann surface area (MM/PBSA) method. Proteins with docking energies below −5 kcal/mol were imported into the STRING database to construct protein–protein interaction (PPI) networks, using a medium confidence threshold of 0.400. Topological features such as degree and betweenness centrality were analyzed. Finally, Gene Ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment were performed to explore the biological processes underlying the potential anti-breast cancer mechanisms of pyrethrin I and II. 3. Analytical methods 3.1. Data sources and integration strategy To support the proposed multi-scale drug screening and mechanistic analysis framework, a three-layer data integration architecture was established. This structure comprises compound-level data, disease-associated targets, and protein structural information, ensuring a comprehensive and traceable workflow from molecular input to clinical relevance. At the compound level, primary chemical and bioactivity data were collected from the ChEMBL database. SMILES representations were sourced from PubChem and processed via SwissTargetPrediction to enhance the scope of target identification. At the disease level, breast cancer-related genes were compiled from GeneCards, OMIM, PharmGKB, and TTD. Differential expression analysis of two GEO datasets ([63]GSE42568 and [64]GSE15852) was performed to refine the disease-specific gene set and identify dynamic biomarkers. At the protein structure level, experimentally resolved structures were obtained from the RCSB Protein Data Bank (PDB). Where high-resolution structures were unavailable, AlphaFold-predicted models were used. Functional annotations and sequence alignments from UniProt ensured consistency between protein structure and function. All data underwent a standardized ETL (Extract–Transform–Load) process, including data extraction, format harmonization, nomenclature standardization, redundancy elimination, and cross-database mapping. The processed datasets were loaded into a unified analytical platform to support downstream machine learning, mechanistic modeling, and simulation validation. This integration strategy not only ensures data interoperability and scalability but also provides a robust foundation for cross-scale modeling in drug discovery. 3.2. Multi-scale analytical framework This study proposes an integrated multi-scale analytical framework to systematically investigate the interaction mechanisms between pyrethrin and breast cancer-related targets. The framework encompasses three analytical layers, beginning with network toxicology modeling to establish a structured systems-level perspective. Specifically: (1) A protein–protein interaction (PPI) network was constructed using STRING with a confidence threshold of 0.4 to capture molecular interaction profiles; (2) KEGG pathway enrichment analysis (FDR <0.05) was employed to identify regulatory and signaling pathways associated with the identified targets; (3) The resulting network modules were mapped to breast cancer oncogenic pathways to assess potential phenotypic relevance. In the second phase, a deep learning–based drug–target interaction (DTI) prediction model was developed using a structure-aware architecture. The compound encoder integrates a message passing neural network (MPNN) with an attention mechanism to enhance feature extraction, particularly for cyclic molecular substructures. The protein encoder utilizes a three-dimensional convolutional neural network (3D-CNN) with a kernel size of 5 × 5 × 5 to learn spatial structural representations. A transfer learning strategy was adopted—pretraining the model on the DAVIS dataset followed by task-specific fine-tuning—to optimize predictive performance for pyrethrin-related targets. To validate the predicted interactions, molecular docking and molecular dynamics (MD) simulations were performed. AutoDock was used for initial docking, and GROMACS was employed to conduct 100-ns simulations to assess the structural stability of ligand–protein complexes. The robustness of the interaction was further quantified using MM-PBSA (Molecular Mechanics Poisson–Boltzmann Surface Area) free energy decomposition. The binding free energy was calculated as: ΔG[bind] = ΔE[MM] + ΔG[solv] (1) Where, ΔE[MM] denotes the molecular mechanics energy (including electrostatic and van der Waals contributions), and ΔG[solv] accounts for solvation effects. Overall, this multi-scale framework effectively integrates systems biology and computational chemistry methodologies, providing a novel paradigm for mechanistic studies of natural compounds with anticancer potential. In the core computational component of this study, a cross-modal attention mechanism was introduced to quantify the interaction between drug and target features. The attention weight is calculated as follows: [MATH: ij=soft max(QKT dk) :MATH] (2) where, Q and K represent the feature matrices of the compound and the protein, respectively, and d[k] = 64 serves as a scaling factor. This formulation enables the model to capture the alignment and interaction strength between different modalities—namely, chemical and biological features—thereby enhancing the interpretability and precision of the prediction results. In addition, molecular dynamics simulations were performed to assess the structural stability of the protein–ligand complex over time, using root mean square deviation (RMSD) as the primary metric. The RMSD is calculated as follows: [MATH: RMSD(t)=1Ni=1Nri(t)riref2 :MATH] (3) where N is the number of backbone C[α] atoms, and r[i](t) represents the position of atom i at time t. The results demonstrated that the conformational fluctuations of the complex remained within a stable range throughout the simulation period, indicating the structural robustness of the binding interaction. To support the initial screening at the network level, we employed the confidence integration score from the STRING database, defined as: [MATH: Score=1i=1n(1pi) :MATH] (4) where p[i] denotes the probability from the i-th data source supporting a given protein–protein interaction. In the molecular-level analysis, we applied a message passing neural network (MPNN) architecture, in which node features are updated over time using a gated recurrent unit (GRU). The update function is defined as: [MATH: hvtGRU(hvt1 ,u(nvW×[hv< mrow>t1euv]) :MATH] (5) where h[v](t) is the hidden state of node v at time t, and e[uv] denotes the edge features between nodes u and v. For spatial feature extraction of protein 3D structures, we adopted a three-dimensional convolutional neural network (3D-CNN), with the convolution operation defined as: [MATH: Foutx,y,z=i=kk j=kk< munderover>l=kk(Wi ,j,l×Fin x+i,y+j, z+l) :MATH] (6) This operation captures local spatial patterns in the protein volume data and enhances structure-based representation learning. Taken together, the innovations of this study include: (1) the establishment of a closed-loop verification pipeline integrating network prediction, deep learning, and atomic-scale simulation; (2) the introduction of structure-aware attention mechanisms in the DTI model to improve the representation learning of natural product features; and (3) quantitative free energy analysis revealing a synergistic effect between electrostatic and hydrophobic interactions in pyrethrin binding (e.g., ΔE[elec]/ΔG[SA]>3.5). This framework is highly extensible and applicable to other natural product studies, with customizable parameters such as the confidence threshold for target prediction (e.g., Score >0.5–0.7) and the choice of molecular simulation force field (e.g., GAFF2 for small-molecule optimization). 3.3. Validation and cross-verification of the analytical workflow To ensure the credibility and scientific rigor of the overall analytical workflow, this study implemented a three-tiered validation mechanism, evaluating the process from three perspectives: internal consistency, literature-based support, and methodological comparison. First, the correlation between the prediction results of the deep learning model (DTI) and the binding free energy derived from molecular docking was assessed. The Spearman rank correlation coefficient reached ρ = 0.82 (p < 0.001), indicating a high degree of internal consistency and suggesting that the model effectively reflects actual molecular binding affinities. Second, the accuracy of predicted binding sites was validated through structural alignment. The predicted binding site of MAOB exhibited a root-mean-square deviation (RMSD) of only 0.12 nm when compared with the known crystal structure (PDB: [65]4MI5), supporting the model's capacity to map onto real structural data. Third, a comparative analysis at the methodological level revealed that the proposed integrated framework (deep learning + molecular simulation) outperformed conventional QSAR and standalone MD approaches in both accuracy and interpretability, particularly in predicting complex structures of natural compounds. 3.4. Limitations and engineering solutions Despite the multiple advantages of the proposed methodology, several technical challenges and limitations remain, for which corresponding engineering solutions are proposed. First, in the calculation of binding free energy, the limited accuracy of force field parameters leads to a prediction error of approximately ±2.3 kcal/mol for ΔG. To mitigate this uncertainty, a hybrid QM/MM (quantum mechanics/molecular mechanics) force field will be introduced for refinement. Second, the current 100-ns molecular dynamics (MD) simulations may be insufficient to capture the full conformational landscape of proteins. To address this, Gaussian accelerated molecular dynamics (GaMD) will be employed to enhance sampling diversity and better capture energy transition pathways. Additionally, during the integration of GWAS results, potential bias may arise from population structure heterogeneity within the FinnGen dataset. To correct for this and avoid statistical distortion, the RAPS (Robust Adjusted Profile Score) method will be applied for structural correction and re-estimation. 3.5. Mechanistic integration and engineering translation This study further integrates molecular-level binding information with clinical risk data to construct a traceable and inferable multi-level mechanistic translation chain. The workflow begins with binding energy screening between candidate compounds and target proteins (ΔG < −9.5 kcal/mol), followed by protein interaction module identification (MCODE >4), KEGG pathway enrichment analysis (FDR <0.05) to identify potential biological pathways, and culminates in statistically significant clinical association analysis (p < 0.05). The entire translational chain comprises four hierarchical levels, as illustrated in [66]Fig. 1, progressing from molecular characteristics to clinical relevance. First, candidate compounds with high binding affinity are filtered based on a threshold of ΔG < −9.5 kcal/mol, establishing the foundation for subsequent analyses. Next, the targets corresponding to these high-affinity ligands are subjected to protein-protein interaction network analysis. Protein modules with significant modularity are identified using the MCODE algorithm, selecting subnetworks with MCODE scores greater than 4. These modular proteins are then analyzed through KEGG pathway enrichment, with significant pathways defined by an FDR less than 0.05, revealing potential involvement in key biological mechanisms. Finally, the identified pathways are cross-referenced with clinical databases to determine statistically significant associations with diseases or risk factors (p < 0.05). Fig. 1. [67]Fig. 1 [68]Open in a new tab Integrative workflow bridging structure, binding, pathway, and clinical associations. This integrative workflow bridges four analytical layers—structure → binding → pathway → clinical—and establishes a mechanistic translation framework that traces compound properties to clinical outcomes. It enables the engineering translation of structural toxicology into precision health risk assessment, facilitating the application of computational modeling in real-world biomedical and pharmaceutical contexts. 4. Results 4.1. Workflow performance and outcomes To establish a systematic framework for toxicological prediction, this study integrated data mining, deep learning modeling, and molecular simulation techniques to elucidate the potential mechanisms and validate the molecular targets of pyrethrins in breast cancer. The workflow and key outcomes can be delineated into four major components, with the first outlined below: (1) Target Screening and Data Integration: To construct a reliable list of breast cancer-related target proteins, transcriptomic datasets from publicly available repositories were systematically integrated. Potential targets of pyrethrin I and II were screened and cross-validated using differential expression analysis. Two representative datasets were selected—[69]GSE15852 and [70]GSE42568—which respectively include breast tumor and normal tissue samples, as well as various breast cancer subtypes. (2) Differential gene expression analysis was performed using the limma package in R. Genes were filtered based on the criteria of |log[2] fold change (FC)| > 2 and false discovery rate (FDR) < 0.05, ensuring both statistical significance and biological relevance. (1) In [71]GSE15852, a total of 1027 differentially expressed genes (DEGs) were identified, including 624 upregulated and 403 downregulated genes. Notably, RPS6KB1 and MAOB were significantly upregulated, implicating roles in the mTOR signaling and metabolic pathways respectively. (2) In [72]GSE42568, 875 DEGs were identified (503 upregulated, 372 downregulated). CSNK2A1 was markedly upregulated in triple-negative breast cancer (TNBC) samples, suggesting subtype-specific relevance. (3) Results are visualized through volcano plots ([73]Fig. 2(a) and (b)), with the x-axis representing log[2]FC and the y-axis showing −log[10](FDR). (1) Blue dots indicate significantly upregulated genes, (2) Purple dots denote significantly downregulated genes, (3) Red areas represent non-significant genes. Fig. 2. [74]Fig. 2 [75]Open in a new tab Volcano plots of DEGs from two GEO datasets ([76]GSE15852 & [77]GSE42568). Key targets such as RPS6KB1, MAOB, and CSNK2A1 are located in highly significant and strongly altered regions of the plots, highlighting their potential as critical risk-associated targets. These proteins are also annotated in the TTD and PharmGKB databases as breast cancer-related, and structural data are available from PDB or AlphaFold, qualifying them for subsequent deep learning-based binding prediction and molecular simulation. This step effectively filters transcriptomic data to yield target candidates with both structural availability and functional significance, establishing a solid biological foundation for the multi-scale analytical framework. 4.2. Deep learning prediction performance and hotspot analysis To systematically predict potential interactions between pyrethrins (Pyrethrin I and II) and breast cancer-related proteins, this study employed the DeepPurpose framework for drug-target binding affinity prediction. The model utilized a Message Passing Neural Network (MPNN) as the compound structure encoder combined with a 3D-CNN for protein sequence processing. After transfer learning fine-tuning on the DAVIS dataset, the model architecture named model_MPNN_CNN_DAVIS was established. The model outputs a continuous binding score (DTI score) ranging from 0 to 10, representing the predicted interaction strength between compound and protein. Higher scores indicate stronger expected binding affinity. Predictions were conducted for all candidate breast cancer-related target proteins, calculating binding scores of Pyrethrin I and II, with scores ranging from 3.5 to 7.5. To enhance the efficiency and reliability of subsequent molecular simulations, a DTI score threshold of >7.0 was set to define high-confidence interaction pairs. These selected pairs were further subjected to molecular docking and free energy simulations. For intuitive visualization of the prediction score distribution, [78]Fig. 3 employs a bubble plot to depict model predictions. The x-axis represents predicted binding scores, the y-axis lists compound-target pairs, and bubble sizes correspond to the magnitude of the scores. Key observations from the plot include: (1) most pairs cluster around scores between 5.0 and 6.5, representing the model's primary prediction range and the docking binding energy concentration; (2) high-score regions (>7.0) with large bubbles indicate high-priority pairs such as Pyrethrin I–RPS6KB1 and Pyrethrin II–MAOB, which also showed stable binding energies (ΔG < −20 kcal/mol) in subsequent simulations; (3) a minority of pairs with lower scores (<4.5) scattered on the left side denote low-interaction potential. Fig. 3. Fig. 3 [79]Open in a new tab Distribution of predicted binding scores for pyrethrin I and II against breast cancer candidate targets using MPNN-CNN model. By integrating model predictions and hotspot visualization, this study successfully established a binding affinity–based target ranking strategy with clear prioritization of compound-target pairs. This approach provides a robust preprocessing basis for molecular simulation modules and demonstrates the application potential and stability of the DeepPurpose model in environmental toxicant target prediction. 4.3. Molecular docking and Structural Validation To validate the structural plausibility of the top-scoring drug–target pairs predicted by the DTI model, molecular docking simulations were conducted for the four highest-ranked combinations: Pyrethrin I–RPS6KB1, Pyrethrin I–MAOB, Pyrethrin II–CSNK2A1, and Pyrethrin I–TNKS2. Receptor structures were obtained from either the RCSB PDB database or AlphaFold-predicted models. Preparation steps included water molecule removal, hydrogen atom addition, and binding pocket definition. Docking was performed using AutoDock4. Each docking result was visualized from two perspectives: (1) Labeled Complex View – ligands (Pyrethrins) are represented as blue sticks, with hydrogen bonds (dashed lines), hydrophobic interactions, and key interacting residues highlighted. (2) Pocket Surface View – the receptor protein is shown in surface representation, with the binding pocket highlighted in red, clearly illustrating the ligand's position. Key findings are as follows. * 1. RPS6KB1–Pyrethrin I ([80]Fig. 4 (a)): Pyrethrin I fits stably into a hydrophobic core adjacent to the ATP-binding site of RPS6KB1, forming two stable hydrogen bonds with His104 and Asp113. This interaction may potentially modulate downstream mTOR phosphorylation activity. * 2. MAOB–Pyrethrin I ([81]Fig. 4(b)): The ligand-binding pocket is deeply embedded in the FAD region. Pyrethrin I exhibits strong hydrophobic interactions and π–π stacking, and the binding site perfectly overlaps with the UniProt-defined catalytic domain, suggesting a potential inhibitory effect on MAOB oxidase activity. * 3. CSNK2A1–Pyrethrin II ([82]Fig. 4(c)): Pyrethrin II binds within the kinase active pocket of CSNK2A1, forming hydrogen bonds and hydrophobic interactions with Glu114 and Val66, aligning with the CK2 active region annotated by PFAM. This indicates a high degree of target specificity. * 4. TNKS2–Pyrethrin I ([83]Fig. 4 (d)): Pyrethrin I binds to the N-terminal groove of the PARP module in TNKS2, interacting with Asp1060 and Arg1064. These interactions may interfere with the ADP-ribosylation activity of TNKS2. Fig. 4. [84]Fig. 4 [85]Open in a new tab Molecular docking and structural validation of pyrethrin I–RPS6KB1, pyrethrin I–MAOB, pyrethrin II–CSNK2A1, and pyrethrin I–TNKS2. By integrating UniProt annotations and KEGG functional module analyses, it is evident that all ligand-binding sites are located within the core functional domains of the proteins. This underscores the biological plausibility and structural reliability of the docking results. Together, these findings validate the deep learning predictions and offer stable starting conformations for subsequent free energy calculations and molecular dynamics simulations. 4.4. Molecular dynamics simulation and stability validation To evaluate the binding stability of Pyrethrin compounds with target proteins, 100-ns (ns) molecular dynamics (MD) simulations were conducted. The results demonstrated that the Pyrethrin I–MAOB complex exhibited high structural stability. According to the structural superimposition ([86]Fig. 5), the root mean square deviation (RMSD) between the final stable conformation and the initial structure remained below 0.3 nm, with critical hydrogen bond networks maintained throughout the simulation, indicating stable binding. Among all simulated complexes, Pyrethrin I–MAOB showed the best stability, with an average RMSD of 0.28 ± 0.04 nm and fewer than one hydrogen bond on average, reaching equilibrium by 25 ns. This suggests a high consistency between the simulated and initial conformations. Fig. 5. Fig. 5 [87]Open in a new tab Structural overlay of Pyrethrin I–MAOB complex. (The initial configuration (gray) and final conformation after 100 ns simulation (orange) show minimal deviation. Pyrethrin I is highlighted in green. The preserved hydrogen bond network and low RMSD (<0.3 nm) indicate high structural stability.). In comparison, the Pyrethrin I–RPS6KB1 complex exhibited a higher RMSD of 0.56 ± 0.05 nm but stabilized by 60 ns. Coupled with previous free energy analysis, this complex remains a promising candidate for mechanistic studies and risk assessment. The Pyrethrin I–TNKS2 complex showed poor stability, with an RMSD of 1.13 ± 0.05 nm and fluctuating hydrogen bond counts, indicating significant structural variations early in the simulation. Pyrethrin II–CSNK2A1 exhibited moderate stability, with an average RMSD of 0.33 ± 0.06 nm, reaching equilibrium by 15 ns, making it a potential candidate for further modeling. 5. Summary of analytical results and discussions In this study, a comprehensive multi-scale analytical framework was developed and applied to assess the carcinogenic potential of pyrethrin compounds through a sequence of computational approaches, including molecular docking, MM-PBSA-based binding free energy analysis, molecular dynamics (MD) simulations, protein–protein interaction (PPI) network centrality evaluation, and pathway–clinical mechanism mapping. [88]Table 1∼4 summarize the primary outcomes across these analytical stages. Each compound–protein complex was evaluated in terms of binding affinity, structural stability, biological function, and mechanistic linkage to clinical phenotypes. Table 1. Analysis results of Molecular Binding and Energetics. Complex ΔG[bind] (kcal/mol) Dominant Interaction Interpretation Pyrethrin I–RPS6KB1 −27.37 Electrostatic + Hydrophobic Strongest affinity, oncogenic potential Pyrethrin I–MAOB −15.66 Electrostatic Polar interaction model Pyrethrin II–CSNK2A1 −22.44 Hydrophobic Mitogenic signal contributor Pyrethrin I–TNKS2 −21.94 Electrostatic W[nt] pathway regulation [89]Open in a new tab 5.1. Molecular binding and energetics The MM-PBSA results in [90]Table 1 revealed that the Pyrethrin I–RPS6KB1 complex exhibited the lowest binding free energy (ΔG[bind] = −27.37 kcal/mol), with dominant electrostatic and hydrophobic interactions. This dual-mode binding suggests enhanced structural stabilization, indicating a strong biological relevance and potentially high toxicity. In contrast, MAOB showed the most pronounced electrostatic dominance (ΔE[elec] = −122.68 kcal/mol), making it a promising model for exploring polar active sites in drug development. 5.2. Structural stability from molecular dynamics The MD simulation results in [91]Table 2 further supported the docking outcomes. The Pyrethrin I–MAOB complex displayed the lowest RMSD (0.28 ± 0.04 nm) and stabilized early at 25 ns, confirming its structural consistency and potential as a reliable model for future ligand optimization. While RPS6KB1 showed moderate RMSD (0.56 ± 0.05 nm), it achieved convergence by 60 ns, maintaining relevance due to its strong binding affinity. Table 2. Analysis results of Structural Stability from Molecular Dynamics. Complex Avg. RMSD (nm) H-Bond Count Stability Onset (ns) Interpretation Pyrethrin I–MAOB 0.28 ± 0.04 <1 25 High structural integrity Pyrethrin I–RPS6KB1 0.56 ± 0.05 1–6 60 Stable after adjustment Pyrethrin I–TNKS2 1.13 ± 0.05 2–3 50 Unstable early phase Pyrethrin II–CSNK2A1 0.33 ± 0.06 <1 15 Moderate model candidate [92]Open in a new tab 5.3. Network centrality and functional relevance PPI analysis results in [93]Table 3 identified MAOB as the most critical intermediary hub (Betweenness = 2193.1), potentially acting as a signal relay in tumor microenvironment reprogramming. Meanwhile, TNKS2 held the highest degree centrality (Degree = 10) and was associated with stemness maintenance through the W[nt]/β-catenin axis, highlighting its role in triple-negative breast cancer (TNBC) progression. Table 3. PPI analysis results of Network Centrality and Functional Relevance. Protein Betweenness Degree Functional Pathway Oncogenic Role MAOB 2193.1 7.0 Oxidative Stress/ERR Metabolic modulation RPS6KB1 1492.3 7.0 PI3K/AKT/mTOR ER + cell survival driver CSNK2A1 152.0 2.0 SOX4/TOP2A Proliferation signal TNKS2 80.0 10.0 Wnt/PARP family Stemness, TNBC aggressiveness [94]Open in a new tab 5.4. Mechanistic chains and clinical risk mapping By linking molecular binding data to clinical insights, the study established a mechanistic inference chain: Compound → Target → Pathway → Clinical Outcome. The analysis results are listed in [95]Table 4. Pyrethrin I's interaction with RPS6KB1 is particularly critical as it activates the S6K1/PI3K/AKT/mTOR pathway, strongly associated with ER + breast cancer proliferation. Similarly, TNKS2 modulates W[nt] signaling, crucial in stemness and TNBC progression. Table 4. The analysis results of Mechanistic Chains and Clinical Risk Mapping. Compound–Target Pathway Clinical Association Risk Potential Pyrethrin I–RPS6KB1 PI3K/AKT ER + cancer proliferation Highest Pyrethrin I–TNKS2 Wnt/β-catenin TNBC invasiveness High Pyrethrin II–CSNK2A1 SOX4/TOP2A Cell cycle activation Moderate–High Pyrethrin I–MAOB ERR/HIF-1α Metastasis remodeling Moderate [96]Open in a new tab 5.5. Key findings and applications Based on the comprehensive analytical workflow, several key conclusions and applications have emerged. Among the screened compound-target interactions, Pyrethrin I–RPS6KB1 demonstrated the highest binding affinity, indicating a potential oncogenic mechanism through activation of the mTOR signaling pathway. Structurally, the Pyrethrin I–MAOB complex exhibited superior spatial stability, suggesting its suitability as a robust template for structural optimization and refinement. At the network level, MAOB was identified as a central signaling mediator, while TNKS2 emerged as a critical hub protein, functionally linking pyrethrin exposure to aggressive cancer phenotypes. Collectively, these findings reveal the mechanistic complexity of Pyrethrin I, which appears capable of influencing multiple carcinogenic pathways, underscoring its broad toxicological significance and highlighting the need for further risk assessment in environmental and pharmaceutical contexts. Based on the findings of this study, Pyrethrin I demonstrated high binding affinity and oncogenic potential toward breast cancer-related proteins, particularly RPS6KB1 and TNKS2. It is therefore recommended that empirical assays be conducted to validate its toxicological effects in ER+ and TNBC models. In parallel, the Pyrethrin I–MAOB complex exhibited exceptional structural stability, suggesting its suitability as a template for ligand optimization and rational drug design. Furthermore, to enhance the predictive accuracy and interpretability of environmental toxicology models, the integration of the proposed multi-scale mechanistic chain (as illustrated in [97]Table 5 and [98]Fig. 6) is strongly advised. This framework links molecular binding, protein network topology, pathway enrichment, and clinical associations, providing a traceable and mechanistically coherent platform for environmental risk assessment. Table 5. Environmental risk modeling framework. Level Indicator Example Purpose Molecular Binding ΔG[bind] (MM-PBSA) Pyrethrin I–RPS6KB1 (−27.37 kcal/mol) Binding strength of toxins Protein Module MCODE, Betweenness RPS6KB1 (PI3K), TNKS2 (W[nt]) Identify network hubs Pathway Enrichment KEGG, FDR <0.05 PI3K/AKT, W[nt], SOX4 Map cancer-driving pathways Clinical Relevance GWAS, GEO, Literature RPS6KB1 (ER+), TNKS2 (TNBC) [99]Open in a new tab Fig. 6. [100]Fig. 6 [101]Open in a new tab Multi-layered mechanistic pathway supporting toxicological risk assessment. This integrative strategy paves the way for a traceable, multi-level framework in environmental toxicology, offering a rational path from compound structure to clinical outcome and enabling both risk forecasting and therapeutic re-evaluation. To further enhance interpretability, a Sankey diagram ([102]Fig. 7) was constructed to visualize the integrated multi-scale mechanistic chain linking chemical compounds → target proteins → signaling pathways → clinical outcomes. The diagram illustrates that RPS6KB1 and TNKS2 correspond to core oncogenic pathways—PI3K/AKT/mTOR and Wnt/β-catenin, respectively—highlighting their roles in tumor proliferation, stemness maintenance, and anti-apoptotic signaling. Fig. 7. [103]Fig. 7 [104]Open in a new tab Sankey diagram of pyrethrin i's translational mechanism linking target binding, pathways, and clinical associations. Binding energy analysis identified Pyrethrin I–RPS6KB1 as the strongest interaction (ΔG = −27.37 kcal/mol), driven by combined hydrophobic and electrostatic forces. In contrast, Pyrethrin I–MAOB exhibited the highest electrostatic contribution, indicating a highly polar binding profile. In terms of structural stability, the Pyrethrin I–MAOB complex showed the lowest RMSD, suggesting robust conformational integrity and making it a reliable model for further structural refinement. Network topology analysis revealed MAOB as having the highest betweenness centrality, suggesting a key role in signal transduction, while TNKS2 emerged as the highest degree hub, positioning it as a pivotal regulator within the Wnt module. Collectively, these results demonstrate that Pyrethrin I exhibits selective binding across key protein targets with both molecular specificity and system-level functional consequences. The integration of this mechanistic chain enhances the predictive power and interpretability of environmental toxicology models. Given their involvement in critical oncogenic pathways, RPS6KB1 and TNKS2 are recommended as priority targets for environmental exposure monitoring and toxicological assessment. 6. Conclusion and recommendations 6.1. Conclusion This study demonstrates the utility of a multi-scale computational framework for elucidating the toxicological potential of environmental chemicals using integrative big data analytics. By combining molecular docking, free energy calculation (MM-PBSA), molecular dynamics simulation, protein–protein interaction networks, and pathway–clinical mapping, we systematically evaluated the interaction profiles and downstream oncogenic implications of pyrethrin compounds on breast cancer-related proteins. Key findings include the identification of RPS6KB1 as the most thermodynamically favorable and functionally critical target (ΔG = −27.37 kcal/mol), and TNKS2 as a high-centrality Wnt pathway node associated with TNBC progression. Moreover, MAOB was found to be the most structurally stable interaction partner (lowest RMSD), suggesting its potential as a modeling template for future ligand optimization. The integrative Sankey diagram further provided a mechanistically traceable link from compound binding to clinical phenotype, bridging molecular interactions with disease relevance. This work highlights the importance of mechanistic transparency, system-level integration, and data-driven prioritization in modern environmental toxicology. 6.2. Recommendations * 1. Integrative Mechanism-Based Modeling: Environmental risk assessment frameworks should incorporate mechanistic chains spanning from chemical structure to clinical outcome. The four-layered approach—molecular binding, protein module, pathway enrichment, and phenotype linkage—offers a robust model for toxicological prediction. * 2. Experimental Toxicity Validation: The predicted high-risk targets RPS6KB1 and TNKS2 should be prioritized for in vitro and in vivo validation under ER+ and TNBC conditions, respectively, to substantiate their role in chemical-induced carcinogenesis. * 3. Data-Centric Toxicological Screening: Multi-omics datasets (e.g., GEO, GWAS, pQTL) should be continuously integrated with cheminformatics and structural databases (e.g., PDB, ChEMBL) via standardized ETL pipelines to enable scalable, reproducible screening across chemical libraries. * 4. Application in Environmental Health Monitoring: The framework proposed here may serve as a predictive engine for regulatory toxicology, environmental pollutant prioritization, and industrial chemical evaluation, particularly in the context of endocrine-disrupting compounds and cancer risk modeling. Informed consent statement Not applicable. Author contributions Conceptualization, Z.J., W.P.S. investigation, Z.J. and J.S, methodology, Z.J., W.L. W.P. S., validation, Z.J., J.S. and W.L., visualization, Z.J., Y.Z. and W.L., writing – original draft, preparation, Z.J., J.S., Y.Z. and W. P. S., data curation, Z.J. and Y.H., formal analysis, Z.J., Y.H and W. P. S., software, W.L. and Y.Z., resources, Y.H., writing – review and editing, Y.H., Z.J. and W. P. S., funding acquisition, Y.H., supervision, Y.H. All authors have read and agreed to the published version of the manuscript. Institutional review board statement Not applicable. Data availability statement The data presented in this study are available on request. Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Acknowledgments