Abstract

Cepharanthine (CEP) is a natural bisbenzylisoquinoline alkaloid known for its antibacterial, antiviral, and anti-inflammatory activities. Its antifungal effect, however, has not been well studied. In this work, we used machine learning-based virtual screening with Random Forest, Neural Network, and Support Vector Machine models to identify potential inhibitors of Fusarium solani. CEP was selected as a candidate and tested experimentally. The results showed that it inhibited the growth of Fusarium solani, Fusarium proliferatum, Fusarium oxysporum, Alternaria alternata, and Botrytis cinerea. It also reduced the sporulation and spore germination of Fusarium solani and disrupted its redox balance. Transcriptome analysis showed changes in gene expression related to basic metabolic pathways. Molecular docking suggested that CEP binds to the FsCFEM1 protein, and molecular dynamics simulations confirmed stable binding, with key roles for residues THR748 and LEU950. These results suggest that CEP is a potential bio-based antifungal agent and provide novel insights into its mechanism against Fusarium solani.

Keywords: cepharanthine, Fusarium solani, antifungal activity, machine learning, CFEM domain-containing protein, natural compound

1. Introduction

Fusarium solani is a widespread soilborne pathogen that causes root rot and wilt in many crops, including alfalfa, tobacco, sweet potato, and peanut [1]. These diseases impair plant growth, reduce yields, and affect product quality. At present, chemical fungicides such as carbendazim and procymidone are the main means of control [2]. However, the long-term and extensive use of these fungicides has led to environmental pollution, the development of resistant strains, and food safety concerns, which conflict with the goals of sustainable agriculture [3].

Biological control strategies have drawn increasing attention as alternatives to chemical fungicides.
Several endophytic fungi, such as Trichoderma reesei and Chaetomium globosum, have shown the ability to inhibit the mycelial growth of F. solani in vitro [4,5]. In addition, plant-derived secondary metabolites, including carvacrol and eugenol, exhibit antifungal activity by disrupting cell wall and membrane structures, altering fungal morphology, and interfering with ergosterol biosynthesis [6,7]. Seaweed extracts and certain biofertilizers have also shown inhibitory effects on F. solani, with reduced spore germination observed at higher concentrations [8,9]. Despite these findings, the field application of antagonistic microorganisms and natural compounds remains limited. There is an urgent need to develop efficient, safe, and broad-spectrum agents for managing F. solani.

Natural products and microbial metabolites are important sources for discovering new antifungal compounds [10]. In recent years, machine learning has been applied in virtual screening, allowing the analysis of large chemical libraries based on structural and biological data to predict antifungal activity [11,12,13]. Unlike traditional experimental screening, which is often time-consuming and resource-intensive, machine learning-based virtual screening allows the rapid, large-scale identification of potential compounds with improved efficiency and accuracy. Models trained on peptide datasets have been used to extract key activity-related features, with the Random Forest algorithm showing relatively high performance in prediction tasks [14]. Structural biology and molecular modeling approaches, including docking and dynamics simulations, have also been used to identify candidate compounds against crop pests and pathogens, such as cotton pests and Rhizoctonia solani [15,16].
Furthermore, machine learning can assist in the interpretation of fungal genomic and transcriptomic data to uncover gene networks affected by antifungal agents, supporting the identification of potential targets [17]. Together, these tools provide a valuable platform for screening and characterizing antifungal candidates against F. solani.

Cepharanthine (CEP) is a bisbenzylisoquinoline alkaloid extracted from the roots of Stephania species (Menispermaceae). It has demonstrated broad-spectrum antibacterial, antiviral, and anti-inflammatory activities [18,19]. CEP can activate type I interferon signaling to enhance host antiviral responses and has shown inhibitory effects on viruses such as human immunodeficiency virus (HIV), hepatitis B virus (HBV), influenza A virus (H1N1), and herpes simplex virus type 1 (HSV-1) [20,21,22,23]. It also modulates the MAPK and NF-κB pathways, suppresses inflammatory cytokine expression, restores autophagy, and contributes to cellular protection [24,25]. Additionally, CEP has been reported to inhibit tumor cell proliferation, induce apoptosis and autophagy, and prevent viral entry [26]. Although its pharmacological effects are well-documented, its antifungal potential, especially against plant pathogenic fungi, remains unclear.

In this study, we constructed virtual screening models using Random Forest (RF), Neural Network (NN), and Support Vector Machine (SVM) algorithms to identify potential inhibitors of F. solani. These models were selected to leverage their complementary strengths, as they represent distinct classes of algorithms (ensemble, neural-inspired, and kernel-based, respectively) commonly used in biological prediction tasks [27,28]. CEP was identified as a promising candidate and subsequently validated through in vitro assays. The results showed that CEP significantly inhibited the mycelial growth of F.
solani, Fusarium proliferatum, Fusarium oxysporum, Alternaria alternata, and Botrytis cinerea. We further investigated its effects on the sporulation, spore germination, and oxidative stress response in F. solani. Transcriptome analysis revealed significant changes in gene expression following CEP treatment. Molecular docking indicated a potential interaction between CEP and the FsCFEM1 protein, which was further supported by molecular dynamics simulations. Key residues THR748 and LEU950 were found to contribute to the stability of the binding.

2. Materials and Methods

2.1. Data Preprocessing

Bioactivity data for Fusarium solani were collected from the ChEMBL database (https://www.ebi.ac.uk/chembl/) (accessed on 28 July 2023) as the dataset for constructing the machine learning models, yielding molecular information for 1275 relevant active compounds. The modeling dataset was cleaned and proofread: entries with duplicate ChEMBL IDs were removed, minimum inhibitory concentration (MIC) values were converted to a unified unit, and compounds with no MIC or with multiple MICs were deleted. Generally, when essential oils, alkaloids, and other substances are used for the biocontrol of certain pathogenic fungi, they are considered to have inhibitory effects if they can suppress the fungi at concentrations lower than 50 µg/mL [29,30,31]. Isoeugenol exhibited considerable efficacy against free radicals, with MIC50 values of 38.97 and 43.76 µg/mL [30]. Compounds were therefore divided into inhibitors and non-inhibitors according to their MIC values: compounds with a MIC < 50 µg/mL were labeled 0 (inhibitors), and those with a MIC ≥ 50 µg/mL were labeled 1 (non-inhibitors).

2.2.
Molecular Characterization, Feature Selection, and Dataset Partitioning

The Descriptors and MoleculeDescriptors modules of the RDKit toolkit in Python 3.7.0 were used to batch-calculate descriptors from the Simplified Molecular Input Line Entry System (SMILES) representation of each compound in the modeling dataset; the full list of descriptors was obtained from Descriptors._descList, an attribute of the RDKit Descriptors module. Feature dimensionality was reduced by Recursive Feature Elimination (RFE), and 50 molecular descriptors were finally retained for the construction of the machine learning models. A tree-based ensemble model was selected as the base model for RFE to evaluate feature importance. After the feature set and target variable were initialized, the final number of features and the number of features eliminated in each iteration were set to 50 and 1, respectively. The model was then built and trained on all features and the target variable, and the importance score of each feature (feature_importances_) was calculated from the model. The features with the lowest scores were removed to obtain a new feature set, and the steps of model construction, importance evaluation, and feature elimination were repeated until the predetermined number of features was reached, the model performance no longer improved, or the difference in feature importance fell below the threshold. The remaining features formed the selected optimal feature subset. The dataset was normalized using the StandardScaler class of the Scikit-learn toolkit in Python 3.9 and then divided into a training set and a test set in a 4:1 ratio.

2.3. Grid Search and Five-Fold Cross-Validation

To achieve the best model performance, a grid search was used to determine the optimal parameter combination for RF, SVM, and NN.
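As a rough illustration of how the feature selection, normalization, 4:1 split, and grid search described here fit together, the following minimal sketch uses Scikit-learn on synthetic data. It is not the authors' code: the synthetic dataset, the reduced feature counts (40 reduced to 10 instead of the study's 50 retained descriptors), the candidate grid values, the stratified split, the AUC scoring choice, and the random seeds are all assumptions made to keep the example small and fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the descriptor matrix (assumption: the study used
# RDKit descriptors of ChEMBL compounds). Labels: 0 = inhibitor, 1 = non-inhibitor.
X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                           random_state=0)

# RFE with a tree-based ensemble, eliminating one feature per iteration.
# The study retained 50 descriptors; 10 is used here only for speed.
selector = RFE(RandomForestClassifier(n_estimators=20, random_state=0),
               n_features_to_select=10, step=1)
X_sel = selector.fit_transform(X, y)

# Standardize the selected features, then split 4:1 (order follows the text).
X_std = StandardScaler().fit_transform(X_sel)
X_train, X_test, y_train, y_test = train_test_split(
    X_std, y, test_size=0.2, stratify=y, random_state=0)

# Grid search over RF hyperparameters, each combination scored by
# five-fold cross-validation on the training set (grid values are illustrative).
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "min_samples_leaf": [1, 3],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=5, scoring="roc_auc")
search.fit(X_train, y_train)

# AUC of the refitted best model on the held-out test set.
test_auc = search.score(X_test, y_test)
print(search.best_params_, round(test_auc, 3))
```

The same pattern would apply to SVM and NN by swapping the estimator and the parameter grid (for example `gamma` and `C`, or `hidden_layer_sizes` and `max_iter`).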
For each of the three models, a hyperparameter space was defined by specifying the hyperparameters to be optimized and a set of candidate values for each. For RF, n_estimators, max_depth, and min_samples_leaf were selected as the hyperparameters to be optimized; for SVM, the hyperparameters to be optimized were gamma and C; for the NN model, hidden_layer_sizes and max_iter were selected. The training set was randomly divided into five non-overlapping subsets, each containing approximately 20% of the data. The class distribution was kept as consistent as possible across subsets so that each fold contained both inhibitor and non-inhibitor data. In each round, one subset served as the held-out test set and the remaining four subsets were combined as the training set, so that each subset was used once for testing and four times for training. The performance indicators on the held-out subset were recorded in each round. After five rounds of training and validation, the average values of the performance indicators were calculated, including accuracy, precision, recall, F1 score, and Area Under the receiver operating characteristic Curve (AUC), and the performance of the models was compared.

2.4. Assessment of Mycelial Growth, Conidiation, and Spore Germination

Following the methodology described previously [32], Fusarium solani, Fusarium oxysporum, Fusarium proliferatum, Botrytis cinerea, and Alternaria alternata were cultured and treated with CEP (Macklin, Shanghai, China). Each fungal strain was inoculated onto potato dextrose agar (PDA) medium supplemented with different concentrations of CEP (10, 20, 30, 40, 50, 60, 80, 100, 120, 200, 250, and 300 mg/L), while PDA medium without CEP was used as a control.
The cultures were incubated at 26 °C for 7 days, and colony diameters were measured to calculate the mycelial growth inhibition rate. For conidiation assays in F. solani, the fungus was inoculated into mung bean broth supplemented with 200 mg/L CEP and incubated at 26 °C with shaking at 250 rpm for 2 days. The number of conidia produced was determined using a hemocytometer. To assess conidial germination, conidia were harvested from 2-day-old F. solani cultures and suspended in YEPD medium (3 g yeast extract, 10 g peptone, and 20 g glucose per liter) supplemented with 200 mg/L CEP. The cultures were incubated at 26 °C with shaking for 6 and 12 h. At each time point, at least 100 randomly selected conidia per field of view were examined under a microscope to determine their germination rates. Statistical significance was assessed by one-way analysis of variance (ANOVA) with pairwise comparisons, using the SPSS 21.0 statistical software package.

2.5. Mycelial Preparation and Oxidative Stress Assays

To evaluate the oxidative stress of F. solani after treatment with 20 mg/L, 50 mg/L, 100 mg/L, and 200 mg/L CEP, malondialdehyde (MDA) content, hydrogen peroxide (H₂O₂) content, and the activities of superoxide dismutase (SOD), peroxidase (POD), and catalase (CAT) were measured. First, an F. solani spore suspension was prepared and adjusted to 1 × 10^6 spores/mL. A 1 mL aliquot was inoculated into 100 mL of potato dextrose broth (PDB) and incubated at 26 °C with shaking at 200 rpm for 2 days. The mycelia were collected by filtration, washed with sterile water, and transferred into culture media containing 20 mg/L, 50 mg/L, 100 mg/L, or 200 mg/L CEP. Untreated mycelia served as controls. After incubation for 12 h with shaking, the mycelia were collected by vacuum filtration, washed with sterile water, and used for subsequent assays. For MDA content measurement, the thiobarbituric acid (TBA) (Sigma-Aldrich, St.
Louis, MO, USA) method was used. A 1.0 g mycelial sample was ground into powder in liquid nitrogen, followed by the addition of 5.0 mL of 10% trichloroacetic acid (TCA) (Sigma-Aldrich, St. Louis, MO, USA). After homogenization, the mixture was centrifuged at 10,000× g for 20 min at 4 °C, and the supernatant was collected. A 2.0 mL aliquot of the supernatant (for the blank control, 2.0 mL of 10% TCA was used instead) was mixed with 2.0 mL of 0.67% TBA solution, heated in a boiling water bath for 20 min, cooled, and centrifuged again. The absorbance of the supernatant was measured at 450 nm, 532 nm, and 600 nm. For H₂O₂ content measurement, a commercial hydrogen peroxide assay kit (Biyuntian, Shanghai, China) was used, following the manufacturer’s instructions. For antioxidant enzyme activity assays, SOD activity was measured using a SOD assay kit (Solarbio, Beijing, China). For the POD activity assay, a 1 g mycelial sample was ground into powder in liquid nitrogen, suspended in 1 mL of PBS (pH 7.2) (Sigma-Aldrich, St. Louis, MO, USA), and centrifuged at 4000 rpm for 10 min. The supernatant was collected, and POD activity was determined using a POD assay kit (Solarbio, Beijing, China). For the CAT activity assay, a CAT assay kit (Solarbio, Beijing, China) was used.

2.6. Chitin Content Measurement

To investigate the effect of CEP on fungal cell wall integrity, the chitin content in F. solani mycelia was determined. A 5 mg sample of ground mycelial powder was suspended in 1 mL of 6% KOH solution (Aladdin, Shanghai, China). The sample was incubated in an 80 °C water bath for 1.5 h, followed by centrifugation at 12,000 rpm for 10 min. The supernatant was discarded, and the pellet was resuspended in 1 mL of 10 mM phosphate-buffered saline (pH 7.4). The washing step was repeated twice under the same centrifugation conditions. The final pellet was resuspended in 100 μL of McIlvaine buffer (pH 6.0) (Sigma-Aldrich, St.
Louis, MO, USA) with 5 μL of chitinase and incubated at 37 °C for 24 h. Following enzymatic hydrolysis, 100 μL of 0.27 M boric acid solution (Sigma-Aldrich, St. Louis, MO, USA) was added, and the sample was boiled for 10 min before cooling it to room temperature. After adding 1 mL of dimethylaminobenzaldehyde (DMAB) (Aladdin, Shanghai, China) solution, the mixture was incubated at 37 °C for 20 min, and absorbance was measured at 585 nm. A standard curve was generated using N-acetylglucosamine (0.05–0.40 mM) (Sigma-Aldrich, St. Louis, MO, USA) to determine the chitin content.

2.7. RNA Extraction

For RNA extraction, at least 0.5 g of F. solani mycelia from CEP-treated samples was collected, along with untreated mycelia as controls. Each group included three biological replicates. The collected mycelia were ground into fine powder in liquid nitrogen and transferred into 2 mL centrifuge tubes. A total of 1 mL of TRIzol reagent (Vazyme Biotech, Nanjing, China) was added to each tube, followed by thorough mixing and incubation at room temperature for 10 min. Then, 200 μL of chloroform (Sigma-Aldrich, St. Louis, MO, USA) was added, and the mixture was shaken at 70 Hz for 120 s, followed by another 10 min of incubation at room temperature. The sample was centrifuged at 12,000 rpm for 10 min at 4 °C, and the aqueous phase was transferred to a new tube. An additional 200 μL of chloroform was added for a second extraction, followed by the same centrifugation step. The final aqueous phase was mixed with an equal volume of isopropanol (Sigma-Aldrich, St. Louis, MO, USA) and incubated on ice for 1 h. The RNA was then precipitated by centrifugation at 12,000 rpm for 10 min at 4 °C. The supernatant was discarded, and the RNA pellet was washed with 1 mL of 70% ethanol prepared with DEPC-treated water (Sigma-Aldrich, St. Louis, MO, USA), followed by centrifugation at 12,000 rpm for 5 min at 4 °C.
After discarding the supernatant, the pellet was air-dried in a biosafety cabinet and dissolved in 70 μL of DEPC-treated water. The RNA samples were either stored at −80 °C or used immediately for further experiments. RNA quality was assessed by agarose gel electrophoresis.

2.8. Transcriptome Analysis

For the construction of the RNA sequencing (RNA-seq) library, the NEB or strand-specific method was used, ensuring a library concentration above 2 nM. The insert size of the library was also evaluated before sequencing. The paired-end clean reads were aligned to the F. solani reference genome using HISAT2. Gene expression levels were quantified using StringTie v2.0.4, employing both fragments per kilobase per million reads (FPKM) and transcripts per million (TPM) as expression metrics. After gene quantification, expression values from all the samples were merged into an expression matrix. Differential expression analysis was performed using DESeq2 v1.26.0, with differentially expressed genes (DEGs) between CEP-treated and control groups defined by a p-value ≤ 0.05 and |log₂FC| ≥ 1. GO enrichment analysis of DEGs was conducted using the clusterProfiler package v3.14.0, with significantly enriched GO terms identified based on a p-value threshold of ≤0.05. KEGG pathway enrichment analysis was performed using hypergeometric testing, with pathways considered significantly enriched at a p-value ≤ 0.05. For functional annotation using the Clusters of Orthologous Groups (COG) database, the protein sequences of DEGs were aligned against the COG database using BLASTP v2.2.31 with an E-value threshold of ≤10^−5. Homologous COG protein clusters were identified, and the functional classification of DEGs was performed accordingly.

2.9. Molecular Docking and Molecular Dynamics Simulation

The structure file of the CEP small molecule was obtained from the PubChem database.
The three-dimensional structure of the FsCFEM1 protein was predicted using SWISS-MODEL (https://swissmodel.expasy.org/) (accessed on 12 September 2024), and the corresponding structure file was retrieved. Molecular docking was performed using AutoDock 4.2.6. Before docking, both the protein and the small molecule were preprocessed by removing unnecessary atoms, adding hydrogen atoms, and assigning charges. Docking parameters, including the search space, sampling algorithm, and scoring function, were set accordingly. The CFEM domain region (amino acids 725–789) of FsCFEM1 was selected as the docking box, and a semi-flexible molecular docking approach was adopted. After docking, binding modes were evaluated, and the conformations with reasonable binding poses and high docking scores were selected for further analysis. For the selected protein–ligand complexes, molecular dynamics (MD) simulations were performed using Gromacs 2022. The Amber99sb-ildn force field was applied for the protein, while the General Amber Force Field (GAFF) was used for the small molecule. A TIP3P water model was used to construct a 10 × 10 × 10 nm³ water box, ensuring at least 1.2 nm between the protein and the box edges. Ions were added to neutralize the system. Long-range electrostatic interactions were treated using the particle–mesh Ewald (PME) method, and energy minimization was conducted for 50,000 steps using the steepest descent algorithm, with Coulomb and van der Waals cutoff distances set to 1 nm. After energy minimization, equilibration was performed under the NVT (constant number of particles, volume, and temperature) and NPT (constant number of particles, pressure, and temperature) ensembles. The MD simulation was run for 100 ns at 300 K, controlled by the Langevin thermostat, and 1 bar, controlled by the Berendsen barostat. A 10 Å cutoff was used for non-bonded interactions.
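For orientation, two stability metrics that are commonly computed from a trajectory like this one, the root-mean-square deviation (RMSD) and the radius of gyration (Rg), can be sketched in NumPy as follows. This is only an illustration of the definitions, not the analysis pipeline used in the study: the function names and the four-atom toy coordinates are hypothetical, and the RMSD function assumes the two frames have already been superimposed, whereas dedicated MD analysis tools typically perform a least-squares fit first.

```python
import numpy as np

def rmsd(coords, ref):
    """RMSD between two (N, 3) coordinate arrays.

    Assumes the frames are already superimposed (no fitting is done here).
    """
    diff = coords - ref
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of an (N, 3) coordinate array."""
    center_of_mass = np.average(coords, axis=0, weights=masses)
    sq_dist = ((coords - center_of_mass) ** 2).sum(axis=1)
    return float(np.sqrt(np.average(sq_dist, weights=masses)))

# Hypothetical toy system: four unit-mass atoms, coordinates in nm.
ref = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
frame = ref + 0.1          # every atom displaced by 0.1 nm along x, y, and z
masses = np.ones(len(ref))

print(rmsd(frame, ref))                    # deviation of the frame from the reference
print(radius_of_gyration(frame, masses))   # compactness; unchanged by pure translation
```

A flat RMSD trace and a steady Rg over the trajectory are the usual signs that a complex has remained structurally stable and compact.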
Post-simulation analysis was conducted using built-in Gromacs analysis tools. Structural stability and flexibility were assessed by calculating the root-mean-square deviation (RMSD), root-mean-square fluctuation (RMSF), and radius of gyration (Rg). The binding free energy was estimated using the gmx_MMPBSA tool with the Gromacs 2022 trajectories.

3. Results

3.1. Construction and Evaluation of Machine Learning Models

In this study, three machine learning algorithms, namely Random Forest (RF), Support Vector Machine (SVM), and Neural Network (NN), were selected to build models for screening compounds that inhibit F. solani. Before model building, a chemical space analysis was conducted based on the Molecular Weight (MW) and Aliphatic and Aromatic LogP (AlogP) of the compounds in the processed dataset (Figure 1A). The MW of the modeling dataset was concentrated in the range of 100 to 600, and the AlogP values ranged from 0 to 8. This indicates that the modeling dataset covers a relatively broad chemical space and may contain various types of candidate compounds. To improve the stability and reliability of the models, the 208 molecular descriptors obtained through molecular characterization were screened using recursive feature elimination (RFE). The number of retained features was chosen from the RFE results: with 50 features, evaluation metrics such as the accuracy of the model on the dataset were relatively high (Figure S1). A feature set of this size helps support the generalizability of the models. The Pearson correlation coefficients among these 50 features were calculated (Figure S2). The absolute values of the correlation coefficients for most feature pairs were less than 0.5, indicating only weak correlation among those features. RFE thus removed much of the redundancy among the features, which helps improve the performance and accuracy of the models.
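The redundancy check described above (pairwise Pearson correlations among the retained features, and how many fall below 0.5 in absolute value) can be sketched as follows. This is a minimal illustration on random data, not the study's descriptor matrix; the array shape, the seed, and the variable names are assumptions.

```python
import numpy as np

# Hypothetical stand-in for the retained descriptor matrix:
# 200 compounds (rows) by 50 features (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))

# Pearson correlation matrix between features (columns).
corr = np.corrcoef(X, rowvar=False)

# Keep each feature pair once by taking the strict upper triangle.
upper = np.triu_indices_from(corr, k=1)
abs_r = np.abs(corr[upper])

# Fraction of feature pairs that are only weakly correlated (|r| < 0.5).
weak_fraction = float((abs_r < 0.5).mean())
print(f"{weak_fraction:.1%} of {abs_r.size} feature pairs have |r| < 0.5")
```

On real descriptors the fraction would typically be lower than for independent random columns, and pairs above the threshold are the candidates to inspect for redundancy.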
The three selected machine learning models were generated using the open-source toolkit Scikit-learn in Python 3.9. The parameters of the models were adjusted using grid search, learning curves, and accuracy values to achieve the best prediction results (Figure 1C–E). The five-fold cross-validation results on the training set and the test set results of the generated machine learning models are shown in Table 1 and Figure 1F–H, respectively. The accuracies of the three models on the training set were close to those on the test set, indicating that the models exhibit good generalizability and robustness. RF and SVM performed similarly on inhibitors (“0”) and non-inhibitors (“1”), whereas NN performed better on non-inhibitors. The F1 scores and precision of all three models were greater than 0.7, indicating that model performance is balanced and that the models can predict compounds that inhibit F. solani within a certain range. Additionally, the ROC curves of the RF, SVM, and NN models were highly similar, and the area under the curve (AUC) values were all above 0.80, indicating that the three models performed well; RF had the highest AUC, 0.93. Considering that different algorithms have preferences and the generalizability of a single model is