Abstract

   Cepharanthine (CEP) is a natural bisbenzylisoquinoline alkaloid known
   for its antibacterial, antiviral, and anti-inflammatory activities. Its
   antifungal effect, however, has not been well studied. In this work, we
   used machine learning-based virtual screening with Random Forest,
   Neural Network, and Support Vector Machine models to identify potential
   inhibitors of Fusarium solani. CEP was selected as a candidate and
   tested experimentally. The results showed that it inhibited the growth
   of Fusarium solani, Fusarium proliferatum, Fusarium oxysporum,
   Alternaria alternata, and Botrytis cinerea. It also reduced the
   sporulation and spore germination of Fusarium solani and disrupted its
   redox balance. Transcriptome analysis showed changes in gene expression
   related to basic metabolic pathways. Molecular docking suggested that
   CEP binds to the FsCFEM1 protein, and molecular dynamics simulations
   confirmed stable binding, with key roles for residues THR748 and
   LEU950. These results suggest that CEP is a potential bio-based
   antifungal agent and provide novel insights into its mechanism against
   Fusarium solani.

   Keywords: cepharanthine, Fusarium solani, antifungal activity, machine
   learning, CFEM domain-containing protein, natural compound

1. Introduction

   Fusarium solani is a widespread soilborne pathogen that causes root rot
   and wilt in many crops, including alfalfa, tobacco, sweet potato, and
   peanut [1]. These diseases impair plant growth, reduce yields, and
   lower product quality. At present, chemical fungicides such as
   carbendazim and procymidone are the main means of control [2].
   However, the long-term and extensive use of these fungicides has led to
   environmental pollution, the emergence of resistant strains, and food
   safety concerns, which conflict with the goals of sustainable
   agriculture [3]. Biological control strategies have therefore drawn
   increasing attention as alternatives to chemical fungicides. Several
   endophytic fungi, such as Trichoderma reesei and Chaetomium globosum,
   have been shown to inhibit the mycelial growth of F. solani in
   vitro [4,5]. In addition, plant-derived secondary metabolites,
   including carvacrol and eugenol, exhibit antifungal activity by
   disrupting cell wall and membrane structures, altering fungal
   morphology, and interfering with ergosterol biosynthesis [6,7].
   Seaweed extracts and certain biofertilizers have also shown inhibitory
   effects on F. solani, with reduced spore germination observed at higher
   concentrations [8,9]. Despite these findings, the field application of
   antagonistic microorganisms and natural compounds remains limited, and
   efficient, safe, and broad-spectrum agents for managing F. solani are
   urgently needed.

   Natural products and microbial metabolites are important sources of new
   antifungal compounds [10]. In recent years, machine learning has been
   applied to virtual screening, allowing large chemical libraries to be
   analyzed on the basis of structural and biological data to predict
   antifungal activity [11,12,13]. Unlike traditional experimental
   screening, which is often time-consuming and resource-intensive,
   machine learning-based virtual screening enables rapid, large-scale
   identification of potential compounds with improved efficiency and
   accuracy. Models trained on peptide datasets have been used to extract
   key activity-related features, with the Random Forest algorithm showing
   relatively high performance in prediction tasks [14]. Structural
   biology and molecular modeling approaches, including docking and
   dynamics simulations, have also been used to identify candidate
   compounds against agricultural targets such as cotton pests and the
   plant pathogen Rhizoctonia solani [15,16]. Furthermore, machine
   learning can assist in interpreting fungal genomic and transcriptomic
   data to uncover gene networks affected by antifungal agents, supporting
   the identification of potential targets [17]. Together, these tools
   provide a valuable platform for screening and characterizing antifungal
   candidates against F. solani.

   Cepharanthine (CEP) is a bisbenzylisoquinoline alkaloid extracted from
   the roots of Stephania species (Menispermaceae). It has demonstrated
   broad-spectrum antibacterial, antiviral, and anti-inflammatory
   activities [18,19]. CEP can activate type I interferon signaling to
   enhance host antiviral responses and has shown inhibitory effects on
   viruses such as human immunodeficiency virus (HIV), hepatitis B virus
   (HBV), influenza A virus (H1N1), and herpes simplex virus type 1
   (HSV-1) [20,21,22,23]. It also modulates the MAPK and NF-κB pathways,
   suppresses inflammatory cytokine expression, restores autophagy, and
   contributes to cellular protection [24,25]. Additionally, CEP has been
   reported to inhibit tumor cell proliferation, induce apoptosis and
   autophagy, and prevent viral entry [26]. Although its pharmacological
   effects are well documented, its antifungal potential, especially
   against plant pathogenic fungi, remains unclear.

   In this study, we constructed virtual screening models using Random
   Forest (RF), Neural Network (NN), and Support Vector Machine (SVM)
   algorithms to identify potential inhibitors of F. solani. These models
   were chosen for their complementary strengths, as they represent
   distinct classes of algorithms (ensemble, neural-inspired, and
   kernel-based, respectively) commonly used in biological prediction
   tasks [27,28]. CEP was identified as a promising candidate and
   subsequently validated through in vitro assays, which showed that CEP
   significantly inhibited the mycelial growth of F. solani, Fusarium
   proliferatum, Fusarium oxysporum, Alternaria alternata, and Botrytis
   cinerea. We further investigated its effects on sporulation, spore
   germination, and the oxidative stress response in F. solani.
   Transcriptome analysis revealed significant changes in gene expression
   following CEP treatment. Molecular docking indicated a potential
   interaction between CEP and the FsCFEM1 protein, which was further
   supported by molecular dynamics simulations, and residues THR748 and
   LEU950 were found to contribute to binding stability.

2. Materials and Methods

2.1. Data Preprocessing

   Bioactivity data for compounds tested against Fusarium solani were
   retrieved from the ChEMBL database (https://www.ebi.ac.uk/chembl/;
   accessed on 28 July 2023) as the dataset for building the machine
   learning models, yielding molecular information for 1275 active
   compounds. The modeling dataset was then cleaned and checked: entries
   with duplicate ChEMBL IDs were removed, minimum inhibitory
   concentration (MIC) values were converted to a unified unit, and
   compounds with no MIC value or with multiple MIC values were excluded.
   When essential oils, alkaloids, and other substances are used for the
   biocontrol of pathogenic fungi, they are generally considered
   inhibitory if they suppress the fungi at concentrations below 50 µg/mL
   [29,30,31]. Isoeugenol, for instance, exhibited considerable efficacy
   against free radicals, with MIC50 values of 38.97 and 43.76 µg/mL [30].
   Compounds were therefore divided into inhibitors and non-inhibitors by
   MIC value: those with MIC < 50 µg/mL were labeled 0 (inhibitors), and
   those with MIC ≥ 50 µg/mL were labeled 1 (non-inhibitors).
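   The cleaning and labeling rules above can be sketched with pandas as
   follows. This is a minimal illustration, not the authors' script: the
   column names (chembl_id, smiles, mic_ug_per_ml) are hypothetical, and
   MIC values are assumed to have already been converted to µg/mL.
   Entries sharing a ChEMBL ID but reporting different MIC values are
   treated as "multiple MICs" and dropped; exact duplicates are reduced to
   one row.

```python
import pandas as pd

# Threshold from the text: MIC < 50 µg/mL -> inhibitor.
MIC_THRESHOLD_UG_PER_ML = 50.0


def clean_and_label(records: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete/ambiguous entries and assign 0 (inhibitor) / 1 (non-inhibitor).

    Assumes hypothetical columns 'chembl_id', 'smiles', 'mic_ug_per_ml'
    with MIC values already in µg/mL.
    """
    # Remove compounds with no MIC (or missing identifiers/structures).
    df = records.dropna(subset=["chembl_id", "smiles", "mic_ug_per_ml"])
    # Compounds reported with more than one distinct MIC are ambiguous: drop them.
    n_mic = df.groupby("chembl_id")["mic_ug_per_ml"].transform("nunique")
    df = df[n_mic == 1]
    # Remaining duplicates share the same MIC, so keep one row per ChEMBL ID.
    df = df.drop_duplicates(subset="chembl_id").copy()
    # Label convention from the text: MIC < 50 -> 0 (inhibitor), MIC >= 50 -> 1.
    df["label"] = (df["mic_ug_per_ml"] >= MIC_THRESHOLD_UG_PER_ML).astype(int)
    return df.reset_index(drop=True)


if __name__ == "__main__":
    demo = pd.DataFrame({
        "chembl_id": ["C1", "C2", "C2", "C3", "C3", "C4"],
        "smiles": ["CCO", "c1ccccc1", "c1ccccc1", "CCN", "CCN", "CC(=O)O"],
        "mic_ug_per_ml": [12.5, 64.0, 64.0, 8.0, 32.0, None],
    })
    print(clean_and_label(demo)[["chembl_id", "mic_ug_per_ml", "label"]])
```

   In the demo, C3 (two different MICs) and C4 (no MIC) are removed, and
   the duplicate rows of C2 collapse into one entry.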

2.2. Molecular Characterization, Feature Selection, and Dataset Partitioning

   Molecular descriptors were batch-calculated from the Simplified
   Molecular Input Line Entry System (SMILES) strings of the compounds in
   the modeling dataset using the Descriptors and MoleculeDescriptors
   modules of the RDKit toolkit (Python 3.7.0); the full descriptor set was
   taken from RDKit's Descriptors._descList. Feature dimensionality was
   then reduced by Recursive Feature Elimination (RFE), and 50 molecular
   descriptors were retained for model construction. A tree-based ensemble
   model was used as the base estimator for RFE to evaluate feature
   importance. After the feature set and target variable were initialized,
   the final number of features was set to 50 and the number of features
   eliminated per iteration to 1. The model was first trained on all
   features, and the importance score (feature_importances_) of each
   feature was calculated. The lowest-scoring feature was then removed to
   obtain a new feature set, and model construction, importance
   evaluation, and feature elimination were repeated until the
   predetermined number of features was reached, model performance no
   longer improved, or the difference in feature importance fell below the
   threshold. The remaining features formed the selected optimal feature
   subset. The dataset was standardized using the StandardScaler function
   of the Scikit-learn toolkit (Python 3.9) and then divided into a
   training set and a test set at a 4:1 ratio.
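   The feature-selection and partitioning steps can be sketched with
   Scikit-learn's RFE class. The sketch assumes a Random Forest as the
   "tree-based ensemble" base estimator and replaces the RDKit descriptor
   matrix (208 columns) with synthetic data from make_classification;
   the stratified split is also an assumption, since the text only gives
   the 4:1 ratio.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the RDKit descriptor matrix (208 descriptors per compound).
X, y = make_classification(n_samples=300, n_features=208, n_informative=20,
                           random_state=0)

# RFE with a tree-based ensemble (assumed: Random Forest) ranks features by
# feature_importances_ and drops one feature per iteration (step=1) until 50 remain.
selector = RFE(
    estimator=RandomForestClassifier(n_estimators=25, random_state=0),
    n_features_to_select=50,
    step=1,
)
X_selected = selector.fit_transform(X, y)

# Standardize the retained descriptors, then split 4:1 (test_size=0.2).
X_scaled = StandardScaler().fit_transform(X_selected)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, stratify=y, random_state=0
)
print(X_selected.shape, X_train.shape, X_test.shape)
```

   selector.get_support(indices=True) maps the kept columns back to
   descriptor names. Following the order in the text, the scaler is fitted
   before the split; fitting it on the training portion only would avoid
   leaking test-set statistics.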

2.3. Grid Search and Five-Fold Cross-Validation

   To obtain the best-performing models, grid search was used to determine
   the optimal hyperparameter combination for RF, SVM, and NN. For each
   model, a hyperparameter space was defined by specifying the
   hyperparameters to be optimized and a set of candidate values for each:
   n_estimators, max_depth, and min_samples_leaf for RF; gamma and C for
   SVM; and hidden_layer_sizes and max_iter for the NN model. For
   five-fold cross-validation, the training set was randomly divided into
   five non-overlapping subsets (folds), each containing approximately 20%
   of the training data. The class distribution was kept as consistent as
   possible across folds so that each fold contained both inhibitors and
   non-inhibitors. In each round, one fold was held out for validation and
   the remaining four were combined for training, so that each fold served
   as the validation set once and as part of the training set four times.
   The performance metrics on the held-out fold were recorded in each
   round, and after five rounds the mean accuracy, precision, recall, F1
   score, and area under the receiver operating characteristic curve (AUC)
   were calculated and compared across models.
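   A grid search with stratified five-fold cross-validation over the
   hyperparameters named above can be sketched with GridSearchCV. The
   candidate values in SEARCH_SPACES, the RBF kernel, and the choice of
   accuracy as the refit criterion are assumptions for illustration; the
   text does not report the searched ranges.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Hypothetical candidate values; only the hyperparameter names come from the text.
SEARCH_SPACES = {
    "RF": (RandomForestClassifier(random_state=0),
           {"n_estimators": [50, 100], "max_depth": [None, 10],
            "min_samples_leaf": [1, 3]}),
    "SVM": (SVC(kernel="rbf"),
            {"gamma": ["scale", 0.01], "C": [1.0, 10.0]}),
    "NN": (MLPClassifier(random_state=0),
           {"hidden_layer_sizes": [(32,), (64,)], "max_iter": [200, 400]}),
}
SCORING = ["accuracy", "precision", "recall", "f1", "roc_auc"]


def tune_models(X_train, y_train):
    """Grid-search each model and return its best parameters and mean CV metrics."""
    # Stratified folds keep both classes in every fold, as described in the text.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    results = {}
    for name, (estimator, grid) in SEARCH_SPACES.items():
        search = GridSearchCV(estimator, grid, scoring=SCORING,
                              refit="accuracy", cv=cv)
        search.fit(X_train, y_train)
        best = search.best_index_
        # Mean of each metric over the five validation folds for the chosen setting.
        results[name] = {
            "params": search.best_params_,
            **{m: search.cv_results_[f"mean_test_{m}"][best] for m in SCORING},
        }
    return results


if __name__ == "__main__":
    X_demo, y_demo = make_classification(n_samples=120, n_features=10, random_state=1)
    for model_name, summary in tune_models(X_demo, y_demo).items():
        print(model_name, summary)
```

   Note that Scikit-learn's precision, recall, and F1 treat label 1 as the
   positive class by default, which under the labeling used here is the
   non-inhibitor class; per-class scores would be needed to report
   inhibitors separately.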

2.4. Assessment of Mycelial Growth, Conidiation, and Spore Germination

   Following a previously described method [32], Fusarium solani,
   Fusarium oxysporum, Fusarium proliferatum, Botrytis cinerea, and
   Alternaria alternata were cultured and treated with CEP (Macklin,
   Shanghai, China). Each fungal strain was inoculated onto potato
   dextrose agar (PDA) medium supplemented with CEP at 10, 20, 30, 40, 50,
   60, 80, 100, 120, 200, 250, or 300 mg/L, with PDA medium without CEP as
   the control. Cultures were incubated at 26 °C for 7 days, and colony
   diameters were measured to calculate the mycelial growth inhibition
   rate. For conidiation assays, F. solani was inoculated into mung bean
   broth supplemented with 200 mg/L CEP and incubated at 26 °C with
   shaking at 250 rpm for 2 days, and the number of conidia produced was
   determined using a hemocytometer. To assess conidial germination,
   conidia were harvested from 2-day-old F. solani cultures and suspended
   in YEPD medium (3 g yeast extract, 10 g peptone, and 20 g glucose per
   liter) supplemented with 200 mg/L CEP, then incubated at 26 °C with
   shaking for 6 h and 12 h. At each time point, at least 100 randomly
   selected conidia per field of view were examined under a microscope to
   determine the germination rate. Statistical significance was assessed
   by one-way analysis of variance (ANOVA) with pairwise comparisons in
   the SPSS 21.0 statistical software package.
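   The text does not give the inhibition-rate formula. A commonly used
   form subtracts the inoculum plug diameter from both colonies before
   comparing radial growth; the sketch below assumes that form and a
   hypothetical 5 mm plug, and uses scipy.stats.f_oneway for the one-way
   ANOVA in place of SPSS.

```python
from scipy import stats


def mycelial_inhibition_rate(control_diam_mm, treated_diam_mm, plug_diam_mm=5.0):
    """Percent inhibition of radial growth, correcting for the inoculum plug.

    Assumed formula: (Dc - Dt) / (Dc - d0) * 100, with d0 the plug diameter
    (hypothetical default 5 mm).
    """
    control_growth = control_diam_mm - plug_diam_mm
    treated_growth = treated_diam_mm - plug_diam_mm
    return 100.0 * (control_growth - treated_growth) / control_growth


def germination_rate(n_germinated, n_examined):
    """Percentage of examined conidia that germinated."""
    return 100.0 * n_germinated / n_examined


def one_way_anova(*groups):
    """One-way ANOVA across treatment groups; returns (F statistic, p-value)."""
    result = stats.f_oneway(*groups)
    return result.statistic, result.pvalue
```

   For example, a control colony of 80 mm and a treated colony of 40 mm
   give (75 - 35) / 75, or about 53.3% inhibition. Pairwise comparisons
   after a significant ANOVA could use scipy.stats.tukey_hsd, although the
   post hoc test used in SPSS is not stated.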

2.5. Mycelial Preparation and Oxidative Stress Assays

   To evaluate oxidative stress in F. solani after treatment with 20, 50,
   100, and 200 mg/L CEP, malondialdehyde (MDA) content, hydrogen peroxide
   (H2O2) content, and the activities of superoxide dismutase (SOD),
   peroxidase (POD), and catalase (CAT) were measured. First, an F. solani
   spore suspension was prepared and adjusted to 1 × 10^6 spores/mL. A 1
   mL aliquot was inoculated into 100 mL of potato dextrose broth (PDB)
   and incubated at 26 °C with shaking at 200 rpm for 2 days. The mycelia
   were collected by filtration, washed with sterile water, and
   transferred into culture media containing 20, 50, 100, or 200 mg/L CEP;
   untreated mycelia served as controls. After 12 h of incubation with
   shaking, the mycelia were collected by vacuum filtration, washed with
   sterile water, and used for subsequent assays. MDA content was measured
   by the thiobarbituric acid (TBA) (Sigma-Aldrich, St. Louis, MO, USA)
   method. A 1.0 g mycelial sample was ground into powder in liquid
   nitrogen, and 5.0 mL of 10% trichloroacetic acid (TCA) (Sigma-Aldrich,
   St. Louis, MO, USA) was added. After homogenization, the mixture was
   centrifuged at 10,000 × g for 20 min at 4 °C, and the supernatant was
   collected. A 2.0 mL aliquot of the supernatant (or 2.0 mL of 10% TCA for
   the blank control) was mixed with 2.0 mL of 0.67% TBA solution, heated
   in a boiling water bath for 20 min, cooled, and centrifuged again. The
   absorbance of the supernatant was measured at 450, 532, and 600 nm. H2O2
   content was measured with a commercial hydrogen peroxide assay kit
   (Biyuntian, Shanghai, China) following the manufacturer’s instructions.
   SOD activity was measured using a SOD assay kit (Solarbio, Beijing,
   China). For the POD assay, a 1 g mycelial sample was ground into powder
   in liquid nitrogen, suspended in 1 mL of PBS (pH 7.2) (Sigma-Aldrich,
   St. Louis, MO, USA), and centrifuged at 4000 rpm for 10 min; POD
   activity in the supernatant was then determined using a POD assay kit
   (Solarbio, Beijing, China). CAT activity was measured using a CAT assay
   kit (Solarbio, Beijing, China).
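   The three absorbance readings are typically combined with a widely used
   correction formula, C (µmol/L) = 6.45 × (A532 − A600) − 0.56 × A450,
   where A600 corrects for turbidity and A450 for interference from
   soluble sugars. The text does not state which formula was applied, so
   the sketch below is an assumption; the conversion to µmol per gram also
   assumes that the 2.0 mL aliquot + 2.0 mL TBA dilution is undone and that
   the whole 5.0 mL extract comes from the 1.0 g sample.

```python
def mda_concentration_umol_per_l(a450, a532, a600):
    """MDA concentration in the reaction mixture (µmol/L).

    Assumed formula: 6.45 * (A532 - A600) - 0.56 * A450.
    """
    return 6.45 * (a532 - a600) - 0.56 * a450


def mda_content_umol_per_g(a450, a532, a600, fresh_weight_g=1.0,
                           extract_volume_ml=5.0, aliquot_ml=2.0,
                           reaction_volume_ml=4.0):
    """MDA content per gram of mycelium (µmol/g), using the volumes from the text."""
    c_reaction = mda_concentration_umol_per_l(a450, a532, a600)
    # Undo the dilution of the supernatant aliquot in the TBA reaction mixture.
    c_extract = c_reaction * reaction_volume_ml / aliquot_ml
    # Total MDA in the extract (µmol), normalized to sample weight.
    return c_extract * (extract_volume_ml / 1000.0) / fresh_weight_g
```

   With A450 = 0.10, A532 = 0.50, and A600 = 0.05, the reaction-mixture
   concentration is 2.8465 µmol/L.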

2.6. Chitin Content Measurement

   To investigate the effect of CEP on fungal cell wall integrity, the
   chitin content in F. solani mycelia was determined. A 5 mg sample of
   ground mycelial powder was suspended in 1 mL of 6% KOH solution
   (Aladdin, Shanghai, China). The sample was incubated in an 80 °C water
   bath for 1.5 h, followed by centrifugation at 12,000 rpm for 10 min.
   The supernatant was discarded, and the pellet was resuspended in 1 mL
   of 10 mM phosphate-buffered saline (pH 7.4). The washing step was
   repeated twice under the same centrifugation conditions. The final
   pellet was resuspended in 100 μL of McIlvaine buffer (pH 6.0)
   (Sigma-Aldrich, St. Louis, MO, USA) with 5 μL of chitinase and
   incubated at 37 °C for 24 h. Following enzymatic hydrolysis, 100 μL of
   0.27 M boric acid solution (Sigma-Aldrich, St. Louis, MO, USA) was
   added, and the sample was boiled for 10 min and then cooled to room
   temperature. After 1 mL of dimethylaminobenzaldehyde (DMAB) solution
   (Aladdin, Shanghai, China) was added, the mixture was incubated at 37
   °C for 20 min, and absorbance was measured at 585 nm. A standard curve was
   generated using N-acetylglucosamine (0.05–0.40 mM) (Sigma-Aldrich, St.
   Louis, MO, USA) to determine the chitin content.
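   Converting A585 readings to chitin content through the
   N-acetylglucosamine (GlcNAc) standard curve amounts to a linear fit and
   its inverse. In the sketch below, the linear model, the 100 μL
   hydrolysate volume used for scaling, and expressing the result as nmol
   GlcNAc equivalents per mg of mycelium are assumptions; the text
   specifies only the standard range (0.05–0.40 mM) and the 5 mg sample.

```python
import numpy as np


def fit_standard_curve(conc_mm, absorbance):
    """Least-squares line A585 = slope * c + intercept for the GlcNAc standards."""
    slope, intercept = np.polyfit(conc_mm, absorbance, deg=1)
    return slope, intercept


def glcnac_from_absorbance(a585, slope, intercept):
    """Invert the standard curve to get GlcNAc concentration (mM)."""
    return (a585 - intercept) / slope


def chitin_nmol_glcnac_per_mg(a585, slope, intercept,
                              hydrolysate_ul=100.0, sample_mg=5.0):
    """GlcNAc equivalents released by chitinase, per mg of mycelial powder.

    Assumes the concentration refers to a 100 µL hydrolysate (hypothetical).
    """
    conc_mm = glcnac_from_absorbance(a585, slope, intercept)
    # 1 mM x 1 µL = 1 nmol
    return conc_mm * hydrolysate_ul / sample_mg
```

   Readings outside the absorbance range of the standards should be
   diluted and re-measured rather than extrapolated.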

2.7. RNA Extraction

   For RNA extraction, at least 0.5 g of F. solani mycelia from
   CEP-treated samples was collected, along with untreated mycelia as
   controls. Each group included three biological replicates. The
   collected mycelia were ground into fine powder in liquid nitrogen and
   transferred into 2 mL centrifuge tubes. A total of 1 mL of TRIzol
   reagent (Vazyme Biotech, Nanjing, China) was added to each tube,
   followed by thorough mixing and incubation at room temperature for 10
   min. Then, 200 μL of chloroform (Sigma-Aldrich, St. Louis, MO, USA) was
   added, and the mixture was shaken at 70 Hz for 120 s, followed by
   another 10 min of incubation at room temperature. The sample was
   centrifuged at 12,000 rpm for 10 min at 4 °C, and the aqueous phase was
   transferred to a new tube. An additional 200 μL of chloroform was added
   for a second extraction, followed by the same centrifugation step. The
   final aqueous phase was mixed with an equal volume of isopropanol
   (Sigma-Aldrich, St. Louis, MO, USA) and incubated on ice for 1 h. The
   RNA was then precipitated by centrifugation at 12,000 rpm for 10 min at
   4 °C. The supernatant was discarded, and the RNA pellet was washed with
   1 mL of 70% ethanol prepared with DEPC-treated water (Sigma-Aldrich,
   St. Louis, MO, USA), followed by centrifugation at 12,000 rpm for 5 min
   at 4 °C. After discarding the supernatant, the pellet was air-dried in
   a biosafety cabinet and dissolved in 70 μL of DEPC-treated water. The
   RNA samples were either stored at −80 °C or used immediately for
   further experiments. RNA quality was assessed by agarose gel
   electrophoresis.

2.8. Transcriptome Analysis

   RNA sequencing (RNA-seq) libraries were constructed using the NEB or
   strand-specific method, ensuring a library concentration above 2 nM,
   and the insert size of each library was evaluated before sequencing.
   Paired-end clean reads were aligned to the F. solani reference genome
   using HISAT2. Gene expression levels were quantified with StringTie
   v2.0.4 using both fragments per kilobase of transcript per million
   mapped reads (FPKM) and transcripts per million (TPM), and expression
   values from all samples were merged into an expression matrix.
   Differential expression analysis was performed with DESeq2 v1.26.0;
   genes with a p-value ≤ 0.05 and |log2FC| ≥ 1 between the CEP-treated and
   control groups were considered differentially expressed genes (DEGs).
   GO enrichment analysis of DEGs was conducted with the clusterProfiler
   package v3.14.0, and GO terms with a p-value ≤ 0.05 were considered
   significantly enriched. KEGG pathway enrichment analysis was performed
   using a hypergeometric test, with pathways considered significantly
   enriched at a p-value ≤ 0.05. For functional annotation against the
   Clusters of Orthologous Groups (COG) database, the protein sequences of
   DEGs were aligned to COG using BLASTP v2.2.31 with an E-value threshold
   of ≤ 10^−5, and DEGs were functionally classified according to their
   homologous COG protein clusters.
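   The DEG thresholds and the hypergeometric test for pathway enrichment
   can be illustrated in a few lines. The sketch assumes a DESeq2 results
   table exported with its standard pvalue and log2FoldChange columns, and
   uses scipy.stats.hypergeom for the one-sided enrichment p-value; the
   pathway counts in the example are made up.

```python
import pandas as pd
from scipy.stats import hypergeom


def filter_degs(results: pd.DataFrame, p_cutoff=0.05, lfc_cutoff=1.0) -> pd.DataFrame:
    """Keep genes with p-value <= p_cutoff and |log2FC| >= lfc_cutoff.

    Column names follow the DESeq2 results() table. NA p-values compare as
    False and are therefore excluded.
    """
    mask = (results["pvalue"] <= p_cutoff) & (results["log2FoldChange"].abs() >= lfc_cutoff)
    return results[mask]


def pathway_enrichment_p(k, pathway_size, n_degs, n_background):
    """One-sided hypergeometric p-value P(X >= k).

    k: DEGs annotated to the pathway; pathway_size: background genes in the
    pathway; n_degs: annotated DEGs; n_background: all annotated genes.
    """
    # scipy parameterization: M = population size, n = successes, N = draws.
    return hypergeom.sf(k - 1, n_background, pathway_size, n_degs)
```

   For instance, pathway_enrichment_p(12, 40, 300, 6000) tests whether 12
   of 300 DEGs falling in a 40-gene pathway exceeds what random draws from
   6000 annotated genes would give. Because many pathways are tested,
   adjusted p-values (as in the padj column) are often preferred to raw
   ones.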

2.9. Molecular Docking and Molecular Dynamics Simulation

   The structure file of the CEP molecule was obtained from the PubChem
   database. The three-dimensional structure of the FsCFEM1 protein was
   predicted using SWISS-MODEL (https://swissmodel.expasy.org/; accessed
   on 12 September 2024), and the corresponding structure file was
   retrieved. Molecular docking was performed using AutoDock 4.2.6. Before
   docking, both the protein and the ligand were preprocessed by removing
   unnecessary atoms, adding hydrogen atoms, and assigning charges.
   Docking parameters, including the search space, sampling algorithm, and
   scoring function, were then set. The CFEM domain region (amino acids
   725–789) of FsCFEM1 was used to define the docking box, and a
   semi-flexible docking approach was adopted. After docking, binding
   modes were evaluated, and conformations with reasonable binding poses
   and high docking scores were selected for further analysis. Molecular
   dynamics (MD) simulations of the selected protein–ligand complexes were
   performed using GROMACS 2022, with the Amber99sb-ildn force field for
   the protein and the General Amber Force Field (GAFF) for the ligand. The
   complex was solvated with TIP3P water in a 10 × 10 × 10 nm^3 box,
   leaving at least 1.2 nm between the protein and the box edges, and ions
   were added to neutralize the system. Long-range electrostatic
   interactions were treated with the particle–mesh Ewald (PME) method,
   and the Coulomb and van der Waals cutoffs were set to 1 nm. The system
   was energy-minimized for 50,000 steps using the steepest descent
   algorithm and then equilibrated in the NVT (constant number of
   particles, volume, and temperature) and NPT (constant number of
   particles, pressure, and temperature) ensembles. The production MD
   simulation was run for 100 ns at 300 K, controlled by a Langevin
   thermostat, and 1 bar, controlled by a Berendsen barostat.
   Post-simulation analysis was conducted with the built-in GROMACS
   analysis tools, and structural stability and flexibility were assessed
   by calculating the root-mean-square deviation (RMSD), root-mean-square
   fluctuation (RMSF), and radius of gyration (Rg). The binding free
   energy was estimated using gmx_MMPBSA, a tool that performs MM/PBSA
   calculations on GROMACS trajectories.
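   The three stability metrics can be written out directly in NumPy to
   make their definitions concrete. This is an illustrative sketch, not
   the GROMACS implementation used in the study (gmx rms, gmx rmsf, and
   gmx gyrate): RMSD is computed after optimal superposition with the
   Kabsch algorithm, Rg is mass-weighted when masses are supplied, and
   RMSF assumes the trajectory frames have already been fitted to a common
   reference.

```python
import numpy as np


def kabsch_rmsd(mobile, reference):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    p = mobile - mobile.mean(axis=0)
    q = reference - reference.mean(axis=0)
    h = p.T @ q                                # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))     # guard against reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # rotation taking p onto q
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def radius_of_gyration(coords, masses=None):
    """Rg of an (N, 3) structure; equal masses unless masses are given."""
    masses = np.ones(len(coords)) if masses is None else np.asarray(masses, dtype=float)
    com = (coords * masses[:, None]).sum(axis=0) / masses.sum()
    sq_dist = ((coords - com) ** 2).sum(axis=1)
    return float(np.sqrt((masses * sq_dist).sum() / masses.sum()))


def rmsf(trajectory):
    """Per-atom RMSF for a (frames, atoms, 3) trajectory already fitted to a reference."""
    mean_pos = trajectory.mean(axis=0)
    return np.sqrt(((trajectory - mean_pos) ** 2).sum(axis=2).mean(axis=0))
```

   A plateau in the RMSD time series indicates an equilibrated complex,
   while RMSF peaks highlight flexible residues; the units follow the
   input coordinates (nm for GROMACS output).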

3. Results

3.1. Construction and Evaluation of Machine Learning Models

   In this study, three machine learning algorithms, namely Random Forest
   (RF), Support Vector Machine (SVM), and Neural Network (NN), were
   selected to build models for screening compounds that inhibit F.
   solani. Before model building, a chemical space analysis was conducted
   based on the molecular weight (MW) and the calculated octanol–water
   partition coefficient (AlogP) of the compounds in the processed dataset
   (Figure 1A). The MW of the modeling dataset was concentrated in the
   range of 100 to 600 Da, and the AlogP values ranged from 0 to 8,
   indicating that the dataset covers a relatively broad chemical space
   and may contain diverse types of candidate compounds. To improve model
   stability and reliability, the 208 molecular descriptors obtained by
   molecular characterization were screened using recursive feature
   elimination (RFE), and the number of retained features was determined
   from the RFE results: with 50 features, evaluation metrics such as
   model accuracy were relatively high (Figure S1), and this
   dimensionality was expected to preserve the generalizability of the
   models. The Pearson correlation coefficients among these 50 features
   were then calculated (Figure S2). The absolute correlation coefficients
   were below 0.5 for most feature pairs, indicating weak correlation;
   RFE therefore reduced feature redundancy, which helped improve model
   performance and accuracy. The three models were built using the
   open-source toolkit Scikit-learn in Python 3.9, and their parameters
   were tuned using grid search, learning curves, and accuracy values to
   achieve the best prediction results (Figure 1C–E). The five-fold
   cross-validation results on the training set and the results on the
   test set are shown in Table 1 and Figure 1F–H, respectively. The
   accuracy of each model on the training set was close to that on the
   test set, indicating good generalizability and robustness. RF and SVM
   performed similarly on inhibitors (“0”) and non-inhibitors (“1”),
   whereas NN performed better on non-inhibitors. The F1 scores and
   precision values of all three models exceeded 0.7, indicating balanced
   performance and the ability to predict compounds that inhibit F.
   solani with reasonable accuracy within a certain range of chemical
   space. In addition, the ROC curves of the RF, SVM, and NN models were
   highly similar, with AUC values all above 0.80. Among them, RF had the
   highest AUC of 0.93, indicating that
   the three models performed well. Considering that different algorithms
   have preferences and the generalizability of a single model is