Abstract Breast cancer (BC) is a complex disease, which causes of high mortality rate in women. Early diagnosis and therapeutic improvements may reduce the mortality rate. There were more than 74 individual studies that have suggested BC-causing hub-genes (HubGs) in the literature. However, we observed that their HubG sets are not so consistent with each other. It may be happened due to the regional and environmental variations with the sample units. Therefore, it was required to explore hub of the HubG (hHubG) sets that might be more representative for early diagnosis and therapies of BC in different country regions and their environments. In this study, we selected top-ranked 10 HubGs (CCNB1, CDK1, TOP2A, CCNA2, ESR1, EGFR, JUN, ACTB, TP53, and CCND1) as the hHubG set by the protein-protein interaction network analysis based on all of 74 individual HubG sets. The hHubG set enrichment analysis detected some crucial biological processes, molecular functions, and pathways that are significantly associated with BC progressions. The expression analysis of hHubGs by box plots in different stages of BC progression and BC prediction models indicated that the proposed hHubGs can be considered as the early diagnostic and prognostic biomarkers. Finally, we suggested hHubGs-guided top-ranked 10 candidate drug molecules (SORAFENIB, AMG-900, CHEMBL1765740, ENTRECTINIB, MK-6592, YM201636, masitinib, GSK2126458, TG-02, and PAZOPANIB) by molecular docking analysis for the treatment against BC. We investigated the stability of top-ranked 3 drug-target complexes (SORAFENIB vs ESR1, AMG-900 vs TOP2A, and CHEMBL1765740 vs EGFR) by computing their binding free energies based on 100-ns molecular dynamic (MD) simulation based Molecular Mechanics Poisson-Boltzmann Surface Area (MM-PBSA) approach and found their stable performance. The literature review also supported our findings much more for BC compared with the results of individual studies. Therefore, the findings of this study may be useful resources for early diagnosis, prognosis, and therapies of BC. Keywords: Breast cancer, hub of the hub-genes (hHubGs), early diagnosis and prognosis, regulatory factors, drug repurposing, molecular docking Introduction Cancer is a malignant tumor that is initiated by multiple changes at the genetic and epigenetic levels that progressively lead to abnormal cell division and cellular alteration.^[35]1,[36]2 Breast cancer (BC) is a major public health concern, since it is the leading cause of cancer-related deaths in women worldwide. Breast cancer was the second most common cancer diagnosed, and the number was approximately 2 million (11.6% of all cancers) in 2018.^ [37]3 Breast cancer became the leading cause of diagnosis in 2020, and the number was approximately 2.2 million (11.7% of all cancers) worldwide.^ [38]4 Thus, it is a matter of concern that the number of patients with BC and deaths is increasing every year, despite claims that there have been significant improvements in BC treatment. According to the American Cancer Society, the 5-year survival rates of patients with BC are 98%, 93%, 72%, and 22% in stage 1, stage 2, stage 3, and stage 4, respectively.^ [39]5 The 5-year BC survival rate is the percentage of people who survive at least 5 years after diagnosis of BC. From a previous study, it was found that around 52.34% of the patients with BC received treatment in stage 4, 32.81% in stage 3, 10.16% in stage 2, and only 4.69% patients were treated in stage 1.^ [40]6 It has been observed that highest number of patients is detected in the fourth stage of BC and their survival rate is the lowest. Therefore, early diagnosis and therapeutic improvement is essential to reduce the death rate of patients with BC. The BC-causing genes may play the key role in this regard. There are several individual studies on transcriptomics (microarray/RNA-Seq) analysis in the literature that suggested BC-causing hub-genes (HubGs) to investigate the pathogenetic processes of BC. We systematically reviewed 74 individual articles from different online sources that suggested BC-causing HubGs and found that their HubG sets are not so similar with each other (see Supplemental Table S1). It may be happened due to the regional and environmental variations with the experimental units. We found a total of 259 different HubGs from those articles in which there was no any common HubGs among the articles. So, it may be difficult to take a common treatment plan for all patients with BC in different regions and environment. Therefore, it is required to explore hub of the HubG (hHubG) sets that may be more representative BC-causing HubGs for diagnosis and therapeutic improvement for the treatment against BC in different environments. Exploring BC-causing hHubGs out of 259 HubGs highlighting their early diagnostic, prognostic, and therapeutic characteristics by the wet-laboratory experimental procedure may be laborious, time-consuming, and costly. In this study, we attempted to explore BC-causing hHubGs highlighting their early diagnostic, prognostic, and therapeutic characteristics by using the integrated statistics and bioinformatics approaches. To explore disease-causing HubGs, the protein-protein interaction (PPI) network analysis is a widely used popular approach.^[41]7-[42]12 In this study, we also explored BC-causing hHubGs by the PPI network analysis of 259 HubGs that we mentioned earlier. In the case of drug discovery, drug repurposing (DR) is a promising strategy, where new therapeutic inventions are made through existing drugs.^ [43]13 It is secure, more efficient, less expensive, and less time-consuming compared with the de novo process due to it skip several steps from the target-based drug selection to clinical validation.^ [44]14 Therefore, in this study, we detected hHubGs-guided repurposable drug molecules. There are several steps involved in the in silico analysis of the DR process, such as network pharmacology, molecular docking analysis, and dynamic simulation analysis.^ [45]15 Network pharmacology is an efficient tool for systematic pharmacologic research which creates interactions between target genes and drugs through network-based methods.^[46]16,[47]17 Molecular docking analysis plays a vital role in drug design through structural binding between therapeutic targets and drug molecules by calculating affinity scores.^ [48]18 The molecular dynamic (MD) simulation indicates the stability of docking patterns. Recently, it has been widely used and has become one of the most popular platforms to validate the candidate drugs.^[49]19,[50]20 The workflow of this study is displayed in [51]Figure 1. It should be mentioned here that the similar workflow was also considered in some previous studies for other diseases.^[52]21,[53]22 Figure 1. [54]Figure 1. [55]Open in a new tab The workflow of this study. Methods and Materials Data collection on BC-causing HubG sets We collected HubG sets and drug molecules as molecular metadata from different published articles up to December 31, 2023, through online search strategy with the keywords “breast cancer [ti], gene expression, RNA-Seq, scRNA-Seq, differentially expressed genes (DEGs), and core/key/hub genes” from the public sources including PubMed in the National Center of Biotechnology Information (NCBI) ([56]https://pubmed.ncbi.nlm.nih.gov/), Google Scholar ([57]https://scholar.google.com/), and Google ([58]https://www.google.com/). We found a total of 74 independent articles that suggested BC-related HubGs-set. We collected 74 HubG sets from the selected 74 articles. We found a total of 259 unique HubGs from those HubG sets and used them to explore hHubGs for further investigation in this study (see Supplemental Table S1). Data collection on drug molecules We collected candidate drug molecules from 3 platforms: (1) published articles, (2) the Gene Set Cancer Analysis (GSCALite) database,^ [59]23 and (3) the Drug Gene Interaction Database (DGIdb).^ [60]24 Some of our selected articles suggested candidate drugs for the treatment of BC and considered them set A.^[61]25,[62]26 We collected meta-drugs from 2 databases DGIdb and GSCALite through the hHubG-drugs interaction and considered them sets B and C, respectively (Supplemental Table S2). Detection of BC-causing hub of the HubGs Network-based strategies are widely used in identifying disease-driving KGs and their potential regulators.^[63]27,[64]28 We submitted all unique HubGs to the STRING web tool (Search Tool for the Retrieval of Interacting Genes) (version: 11.5) to construct PPI network.^ [65]29 Protein-protein interactions of the all HubGs were constructed with a confidence score 0.90. Subsequently, Cytoscape software (version: 3.9.0) was used for visualization of the PPI network.^ [66]30 The “Analyze Network tool” in Cytoscape was used for topological degree analysis, and the degree >90 was considered as the cutoff criterion for hHubG identification. Assessment of hHubGs as the early diagnostic and prognostic biomarkers To assess hHubGs as early diagnostic and prognostic biomarkers, we performed their expression analysis with the different stages (normal status, stage 1, stage 2, stage 3, and stage 4) of BC progression by using box plots through the UALCAN web-tool and The Cancer Genome Atlas (TCGA) database.^ [67]31 To investigate the prognostic power of hHubGs, we also developed BC prediction models through 2 popular machine learning methods known as support vector machine (SVM) and random forest (RF). To perform prediction models, we collected 3 microarray gene-expression data sets with accession numbers [68]GSE65216,^ [69]32 [70]GSE10810,^ [71]33 and [72]GSE36295^ [73]34 from the Gene Expression Omnibus (GEO) database. Each data set divided into 2 groups: training data (70% of all data) and test data (30% of all data). Finally, the receiver operating characteristic (ROC) curve was used to assess the prognosis performance. The ROC curve was constructed using the R package ROCR. We also used independent database “DisGeNET” and web-tool “Enrichr” to verify the association of hHubGs with BC and other disease by disease-hHubGs enrichment analysis.^ [74]35 Functional and pathway enrichment analysis To investigate the pathogenetic processes of hHubGs, we performed their Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genome (KEGG) pathway enrichment analysis. The GO enrichment analysis was performed in 3 categories namely biological process (BP), cellular component (CC), and molecular function (MF). Gene Ontology functional and KEGG pathway enrichment analyses were performed using the online database DAVID (version: 6.8) online tool.^ [75]36 Regulatory interaction network We performed regulatory interaction network to identify the key transcription factors (TFs) and microRNAs (miRNAs) as the transcriptional and posttranscriptional regulators of hHubGs. The online tool “NetworkAnalyst” (version: 3.0) was used to construct the regulatory interaction network.^ [76]37 ChEA and TarBase (v8.0) databases were selected for constructing hHubG-TF and hHubG-miRNA interaction networks, respectively. Drug repurposing We used molecular docking analysis, a popular in silico validation technique, to perform the binding between the drug and target. We considered hHubGs as targets (receptors/proteins) and meta-drug agents collected from published articles and the online databases GSCALite and DGIdb as drugs (compounds) for performing molecular docking analysis. We collected 3-dimensional (3D) structures of proteins from the Protein Data Bank (PDB) ([77]https://www.rcsb.org/)^ [78]38 and the SWISS-MODEL ([79]https://swissmodel.expasy.org/)^ [80]39 databases due to the required to perform molecular docking analysis. The 3D structures of all compounds were downloaded from the PubChem ([81]https://pubchem.ncbi.nlm.nih.gov/) database.^ [82]40 PyMOL (PyMOL2) software was used for the preprocessing of 3D structure proteins.^ [83]41 PyRx software was used to perform the blind molecular docking analysis between proteins and compounds and selected shortlisted drugs based on binding affinity scores (BASs) (kcal/mol).^[84]42,[85]43 Then, we validate the selected drug molecules with the active sites (by using PyMOL2 software)^ [86]41 of our selected hHubGs through flexible docking analysis. The 3D interaction between the protein and compound was constructed and visualized by using USCF Chimera and Discovery Studio Visualizer 2021 software.^ [87]44 MD simulation studies Molecular dynamic simulations were performed using YASARA Dynamics software^ [88]45 and the AMBER (Assisted Model Building with Energy Refinement) force field^ [89]46 to test the dynamic behavior of the top protein-compound interactions. The AMBER’s force fields have been extensively validated and optimized for various biological molecules, making it a robust choice for simulation study.^ [90]47 The root mean square deviation (RMSD) value between the crystal reference and the predicted structure is commonly used to confirm whether a close-match docked pose was predicted or not by docking simulations. If the RMSD value is ⩽2 Å, the result is considered as good result.^ [91]48 Three top-ranked interactions based on affinity scores were selected to perform MD simulations. For simulation performance, the protein-compound interaction hydrogen bonding network is optimized and solvated through the TIP3P water model in a simulation cell.^ [92]49 The periodic boundary condition was maintained by a solvent concentration of 0.997 g L^−1. pKa was calculated during solvation by subjecting to titratable amino acids in the interaction. A time-step interval of 2.50 fs (298 K, pH = 7.4, 0.9% NaCl) under physiological conditions was used to run several time-step algorithms for each simulation.^ [93]50 The SETTLE and LINCS algorithms were used to contain water molecules and constrain all bond lengths, respectively.^[94]51,[95]52 For further analysis, trajectories were recorded every 250 ps, and subsequent analyses were implemented by default scripts of YASARA Macro and SciDAVis software.^ [96]53 Then, all snapshots were subjected to the Molecular Mechanics Poisson-Boltzmann surface area (MM-PBSA) in YASARA software to calculate the binding free energy using the given equation.^ [97]54 [MATH: Binding freeEnergyΔG=G< mrow>boundGunbo< mi>und :MATH] Here, YASARA built-in macros were used with consideration AMBER 14 as a force field to calculate the MM-PBSA binding energy, where a greater positive energy indicates better binding. Results Exploring BC-causing hub-genes by the systematic literature review We found a total of 297 BC-related articles from various online sources (PubMed, Google Scholar, and Google) by keywords search up to December 31, 2023. Finally, we selected 74 articles out of them through the inclusion-exclusion criteria (see Supplemental Figure S1 and Supplemental Table S1). Then, we detected a total of BC-causing 259 different HubGs from those 74 HubG sets as described in the “Materials and Methods” section and used them for further investigation in this study. Detection of hub of the HubGs The PPI network was constructed based on those 259 individual HubGs to detect BC-causing hHubGs as key genomic biomarkers. It included 209 of nodes, 5785 edges, average node degree 39.5, average local clustering coefficient 0.599, and P < 1.0e-15. The PPI network was visualized in [98]Figure 2, where hHubGs are indicated by large-size nodes and selected based on the top degree of connectivity (degree >95). We selected top-ranked 10 hHubGs (CCNB1, CDK1, TOP2A, CCNA2, ESR1, EGFR, JUN, ACTB, TP53, and CCND1) in which 6 hHubGs (CCNB1, CDK1, TOP2A, CCNA2, ESR1, and EGFR) belong to the set of already published BC-causing top-ranked 10 HubGs (see [99]Table 1 and Supplemental Table S1). Figure 2. Figure 2. [100]Open in a new tab The PPI network based on 259 unique HubGs, where large size indicates hHubGs with the highest degree of connectivity (degree >90). Table 1. List of top-ranked 10 BC-causing HubGs/hHubGs based on the agreement of published articles (Supplemental Table S1) (A*) and hHubGs through PPI network analysis (B*) ([101]Figure 2), respectively. Top-ranked 10 HubGs by the agreement of published articles (Supplemental Table S1) Top-ranked 10 hHubGs (proposed) Common hHubGs [MATH: (A*B*) :MATH] Genes (A*) Number of agreed articles Genes (B*) Degree of PPI score TOP2A 8 TP53 143 CCNB1 CDK1 TOP2A CCNA2 ESR1 EGFR CDK1 7 ACTB 122 CCNB1 9 EGFR 102 FN1 8 CCND1 100 RRM2 6 JUN 100 BUB1B 7 ESR1 95 CCNB2 8 CDK1 99 CCNA2 8 CCNB1 99 ESR1 6 CCNA2 98 EGFR 6 TOP2A 91 [102]Open in a new tab Bold indicates common genes. Performance of hHubGs as early diagnostic and prognostic biomarkers To investigate the significance of differential expression patterns of top-ranked 10 hHubGs between the normal group and 4-stage group (stage 1, stage 2, stage 3, and stage 4) of BC individually, we performed box-plot analysis using t test statistic with P < .001 based on their TCGA expression profiles ([103]Figure 3A). We observed that all hHubGs individually showed significantly differential expression patterns (P values ranged from 8.99E-04 [TP53] to 2.22E-16 [ACTB]) between stage 1 BC and control samples group. Thus, our identified hHubGs may play an important role in the early stage diagnosis of BC. To investigate the prognostic power of top-ranked 10 hHubGs, we developed RF and SVM-based 2 BC prediction models by using the gene-expression data with NCBI-GEO accession number [104]GSE65216 as training data set, and [105]GSE10810 and [106]GSE36295 as the test data sets. To investigate the performance of the prediction models, we constructed ROC curves in [107]Figure 3B, where green indicates training performance and red and blue indicate test performance. We observed that both the training and test performances were good with area under the curve (AUC) values >0.86 for both the RF and SVM-based prediction models. Furthermore, we observed that hHubGs are significantly associated with invasive carcinoma of BC and luminal B breast carcinoma (Supplemental Figure S2). Thus, our suggested hHubGs are significantly associated with almost all subtypes of BC. Figure 3. [108]Figure 3. [109]Open in a new tab (A) Analysis of expression levels of hHubGs between the normal group and 4-stage group (stage 1, stage 2, stage 3, and stage 4) of BC, and (B) ROC curves to investigate the prognostic power of hHubGs by the RF and SVN-based BC prediction models with 3 gene-expression profile data sets in NCBI-GEO database. HubGs-set enrichment analysis The GO functional and KEGG enrichment analysis showed that unique HubGs were enriched by 234 GO-BP terms, 56 GO-MF terms, 38 GO-CC terms, and 85 pathways. Then, we selected the significant GO terms and KEGG pathways that were associated with at least 4 of our identified hHubGs and P < .01. Four BP terms (positive regulation of transcription from RNA polymerase II promoter, cell proliferation, cell division, and regulation of cell cycle), 5 CC terms (nucleoplasm, nucleus, cytosol, cytoplasm, and membrane), 5 MF terms (protein binding, enzyme binding, adenosine triphosphate (ATP) binding, TF binding, and protein heterodimerization activity), and 7 KEGG pathways (pathways in cancer, PI3K-Akt signaling pathway, cell cycle, herpes simplex infection, human T-lymphotropic virus type 1 (HTLV-I) infection, proteoglycans in cancer, and PI3K-Akt signaling pathway) were associated with at least 4 hHubGs and may be responsible for BC progression in [110]Table 2. Table 2. Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enrichment analysis results of all HubGs highlighting number of HubGs (nHubGs), P value, and associated hub of the HubGs (hHubGs). ID Terms/pathways HubGs P Associated hHubGs GO: Biological process (BP) GO:0045944 Positive regulation of transcription from RNA polymerase II promoter 66 3.8E−21 TOP2A, TP53, EGFR, ESR1, JUN GO:0008283 Cell proliferation 38 5.0E−18 EGFR, CDK1, TP53, JUN GO:0051301 Cell division 36 5.9E−17 CCNB1, CCND1, CCNA2, CDK1 GO:0051726 Regulation of cell cycle 23 2.8E−16 JUN, CCNB1, TP53, ESR1 GO: Cellular components (CCs) GO:0005654 Nucleoplasm 123 2.6E−26 ACTB, CCND1, TP53, CCNB1, ESR1, JUN, CDK1, TOP2A, CCNA2 GO:0005634 Nucleus 175 5.9E−24 CCND1, TP53, CCNB1, JUN, ESR1, CDK1, TOP2A, CCNA2, EGFR GO:0005829 Cytosol 129 1.0E−22 ACTB, CCND1, TP53, CCNB1, JUN, CDK1 GO:0005737 Cytoplasm 154 8.7E−16 ACTB, CCND1, TP53, CCNB1, ESR1, CDK1, TOP2A, CCNA2, EGFR GO:0016020 Membrane 73 1.1E−08 ACTB, CCND1, EGFR, CCNB1, ESR1, CDK1 GO: Molecular functions (MFs) GO:0005515 Protein binding 245 2.6E−29 ACTB, TP53, JUN, TOP2A, EGFR, CCND1, CCNB1, ESR1, CDK1, CCNA2 GO:0019899 Enzyme binding 33 3.9E−15 TOP2A, EGFR, CCND1, JUN, ESR1, TP53 GO:0005524 ATP binding 67 9.8E−13 TOP2A, ACTB, TP53, EGFR, CDK1 GO:0008134 Transcription factor binding 24 1.0E−09 JUN, ESR1, CCND1, TP53 GO:0046982 Protein heterodimerization activity 28 4.8E−08 TOP2A, EGFR, JUN, TP53 KEGG pathways hsa05200 Pathways in cancer 51 1.4E−20 CCND1, TP53, EGFR, JUN hsa04151 PI3K-Akt signaling pathway 33 1.6E−20 CCND1, JUN, CCNA2, TP53 hsa04110 Cell cycle 29 2.6E−18 CCNB1, CCND1, CCNA2, CDK1, TP53 hsa05168 Herpes simplex infection 24 1.0E−16 EGFR, CCND1, TP53, JUN hsa05166 HTLV-I infection 36 1.8E−15 CCND1, JUN, TP53, ESR1 hsa05205 Proteoglycans in cancer 31 2.4E−14 EGFR, ACTB, CCND1, ESR1, TP53 hsa04151 PI3K-Akt signaling pathway 40 3.1E−14 EGFR, CCND1, TP53, TOP2A [111]Open in a new tab Regulatory network analysis To identify the top-ranked transcriptional and posttranscriptional regulators of hHubGs, we constructed and visualized KG-TF and KG-miRNA interaction networks in [112]Figure 4A and [113]B, where pink indicates hHubGs and green indicates TFs and miRNAs in [114]Figure 4A and [115]B, respectively. We observed that 5 TFs (MYC, HNF4A, KLF4, POU5F1, and SOX2) were associated with at least 8 hHubGs (degree ⩾8), so these 5 TFs considered as transcriptional factors of hHubG. Similarly, 5 miRNAs (hsa-mir-103a-3p, hsa-mir-107, hsa-mir-16-5p, hsa-mir-34a-5p, and hsa-mir-23b-3p) were associated with at least 8 hHubGs (degree ⩾8), so these 5 miRNAs were considered as posttranscriptional factors of hHubGs. We found from EMBL-EBI (European Molecular Biology Laboratory—European Bioinformatics Institute) database that 3 hHubGs (JUN, TP53, and ESR1) out of 10 hHubGs are TFs and indicated them by diamond shape in [116]Figure 4. [117]Figure 4C(I) indicates that 3 TFs of hHubGs (MYC, KLF4, and POU5F1) showed downregulation, and 2 TFs (HNF4A and SOX2) showed upregulation. [118]Figure 4C(II) shows that KLF4 is a upstream and SOX2 is a downstream TFs for the JUN hHubGs (TFs), so JUN acts as an inhibitor to both KLF4 and SOX2 (if the expression level of the downstream TF decreased, then core TF acted as an activator otherwise inhibitor). Similarly, TP53 acted as an activator to POU5F1 and inhibitor to MYC and SOX2, and ESR1 acted as an activator to KLF4. Figure 4. [119]Figure 4. [120]Open in a new tab hHubGs-regulatory networks, where pink indicates hHubGs (diamond shape hHubGs indicate TFs themselves): (A) hHubGs-TFs interaction network, where large green indicates key transcription factors; (B) hHubG-miRNA interaction network, where large green indicates key posttranscriptional factors; and (C) upstream and downstream analysis of TFs (I) differential expressions for TFs of hHubGs (II) upstream and downstream regulation by hHubGs that are also TFs (JUN, TP53, and ESR1), where pink diamond indicates core TFs, red rectangle indicates upstream TFs, and blue rectangle indicates downstream TFs. Drug repurposing First, we collected BC and hHubGs associated with a total of 258 drug agents from the published articles and 2 online databases: DGIdb and GSCALite. Then, we performed blind molecular docking analysis (entire protein) between compounds (drug agents) and target proteins (hHubGs). We selected the top-10 poses based on the smallest docking scores between compounds and target proteins. To check the quality of binding poses, we calculated the RMSD between native pose and our sleeted 10 poses. Finally, we selected the best pose based on the smallest RMSD and considered the docking score in that pose. We visualized the molecular docking results of the top 125 interactions in [121]Figure 5A, where the y-axis represents the target proteins and x-axis compounds and different colors indicate the affinity scores. Compounds were arranged in x-axis in ascending order based on the average BAS across all proteins. For example, top-ranked 10 compounds (SORAFENIB, AMG-900, CHEMBL1765740, ENTRECTINIB, MK-6592, YM201636, masitinib, GSK2126458, TG-02, and PAZOPANIB) bind to the receptor proteins with average BAS less than −8.76 kcal/mol (binds strongly to the receptor proteins), and the next top-ranked 10 compounds (GSK-269962A, MLN-8054, ENMD-2076, ENMD-981693, NERVIANO, SB590885, Lapatinib, GSK1070916, TAK-715, and PF-03814735) bind to the receptor proteins with average BAS greater than −8.23 kcal/mol (binds weakly to the receptor proteins). Therefore, top-ranked 10 compounds (SORAFENIB, AMG-900, CHEMBL1765740, ENTRECTINIB, MK-6592, YM201636, masitinib, GSK2126458, TG-02, and PAZOPANIB) were proposed as candidate drugs against BC. In [122]Figure 5B, the BAS of top-ranked 10 compounds with all receptors was highlighted. We found 6 overlapping genes between the top-ranked HubGs and hHubGs in [123]Figure 5C. We cross-validated the proposed candidate drugs with 4 uncommon top-ranked HubGs in [124]Figure 5D. To validate the top-ranked drug molecules with active sites of top-ranked HubGs by their flexible docking analysis, we predicted their active sites by the PyMOL software and listed in Table S3. Then, we perform flexible docking analysis between drugs and selected top-ranked HubGs (with flexibility of active sites) in [125]Table 3. We also found a strong bond between them, so we may recommend our candidate drugs for universal use against BC. Figure 5. [126]Figure 5. [127]Open in a new tab Molecular docking analysis results for screening candidate drugs against BC: (A) presented interaction binding affinity scores between hHubGs and drug agents; (B) visualization of interaction between 10 candidate drugs and hHubGs by zooming; (C) overlap between the top-ranked HubGs and hHubGs; and (D) cross-validation of the candidate drugs by top-ranked HubGs. Table 3. Flexible docking scores between the proposed drug molecules and the active sites of hHubGs. Proteins AMG900 CHEMBL1765740 SORAFENIB ENTRECTINIB MK6592 YM201636 masitinib GSK2126458 TG02 PAZOPANIB TOP2A −8.2 −7.3 −7.1 −7 −7.5 −7.3 −6.8 −7.1 −6.9 −6.3 EGFR −9.3 −8.5 −7.3 −7.6 −7.8 −8 −7.1 −6.9 −7.1 −6.1 ESR1 −8.9 −9.3 −9.5 −8.3 −8 −6.6 −7.3 −7.2 −7 −6.6 TP53 −8.5 −8.8 −7.8 −8 −8.1 −7.9 −7.6 −6.7 −7.1 −7.8 CDK1 −7.7 −6.8 −7.5 −7.6 −7.1 −7.2 −7 −6.7 −6.5 −6.8 CCND1 −7.6 −7.4 −8.1 −7.7 −7.3 −7.4 −7.6 −7.7 −7.4 −7.1 JUN −8.5 −7.7 −8.1 −8.1 −7.7 −7.1 −6.7 −7 −6.9 −6.7 ACTB −9.2 −7.8 −8.6 −7.2 −7.5 −7.7 −7.1 −7.7 −7.1 −6.6 CCNB1 −8.3 −7.5 −7.5 −8.1 −7.7 −7.6 −7.4 −7 −6.7 −7.2 CCNA2 −9.6 −7.6 −8.5 −7.5 −7.6 −7.2 −6.9 −7.5 −5.8 −6.3 [128]Open in a new tab Molecular dynamic simulations Three top interactions, SORAFENIB-ESR1 (BAS = −12.5), CHEMBL1765740-EGFR (BAS = −10.8), and AMG900-TOP2A (BAS = −12.4), between candidate drugs and target proteins were selected for stability analysis using 100-ns MD-based MM-PBSA simulations. We noticed that all systems were remarkably stable in variations of moving and initial drug-target interactions in [129]Figure 6. We calculated the RMSD of selected interactions to check the stability of structure during the simulation period in [130]Figure 6A. All the systems projected RMSD range from 1 Å to 1.75 Å, .75 Å to 2.25 Å, and .5 Å to 2.75 Å for SORAFENIB-ESR1, CHEMBL1765740-EGFR, and AMG900-TOP2A interactions, respectively. We observed that all interactions fluctuated slightly between 0 and 16 000 ps and were stable in the remaining simulations. We also calculated the MM-PBSA binding energy for the 3 selected top interactions in [131]Figure 6B. The average binding forces for SORAFENIB-ESR1, CHEMBL1765740-EGFR, and AMG900-TOP2A interactions were 217.5, 130.45, and 131.4 kJ/mol, respectively. Figure 6. [132]Figure 6. [133]Open in a new tab MD simulation analysis of the top 3 interactions, where blue, red, and green indicate SORAFENIB-ESR1, CHEMBL1765740-EGFR, and AMG900-TOP2A interactions, respectively: (A) time evolution of RMSDs of backbone atoms (C, Cα, and N) for protein for each docked complex, and (B) represented binding free energy of each snapshot for the top 3 interactions. Discussion This study analyzed metadata on BC-causing HubGs collected from different published articles. At first, we selected 74 articles out of 297 by the systematic review as shown in Supplemental Figure S1. Different 259 HubGs were found in total from the selected articles (Supplemental Table S1), and these data were used for further analysis in this study. Top-ranked 10 HubGs (CCNB1, CDK1, TOP2A, CCNA2, ESR1, EGFR, JUN, ACTB, TP53, and CCND1) were selected as the hHubGs out of different 259 HubGs by the PPI network analysis, where 6 hHubGs (CCNB1, CDK1, TOP2A, CCNA2, ESR1, and EGFR) are common with the top-ranked HubGs based on the agreement of published articles ([134]Table 1). Then, we verified their differential expression patterns between BC and control samples using independent data sets from NCBI and TCGA databases to the prediction models and box-plot analysis. We observed that the proposed hHubGs can be used as the strong biomarkers for early diagnosis and prognosis, where 8 genes (CCNB1, CDK1, TOP2A, CCNA2, ESR1, ACTB, TP53, and CCND1) are up-regulated hHubGs and the rest 2 genes (EGFR and JUN) are down-regulated hHubGs. In particularly, the CCNB1 gene is associated with aggressive tumor behavior, larger tumor size, higher grade, hormone receptor negativity, HER2 positivity, and poor clinical outcome.^ [135]55 The TOP2A gene has been suggested in 8 articles that this gene plays a prognostic role in patients with BC.^[136]56-[137]62 Overexpression of TOP2A is a marker of poor prognosis, especially in luminal/hormone receptor-positive BC, and may serve as both a prognostic and predictive biomarkers in BC disease.^ [138]63 Seven articles claim that the CDK1 gene is responsible for the occurrence and development of BC.^[139]26,[140]57-[141]59,[142]64,[143]65 CDK1 is associated with shorter overall survival, progression-free survival, and relapse-free survival, as well as more advanced clinical stage of cancer.^ [144]66 The CCNA2 gene has been identified in 6 studies as a promising biomarker in silico analysis related to BC progression.^[145]26,[146]56,[147]57,[148]67-[149]72 The CCNA2 gene is linked to a poor prognosis and resistance to endocrine therapy and serve as a prognostic biomarker in estrogen receptor-positive patients with BC.^ [150]73 The ESR1 gene suggested in several articles suggested as generalized BC.^[151]56,[152]74-[153]78 The EGFR gene and its signaling pathways are related to the progression of BC and can be used as a therapeutic target against BC therapy.^[154]61,[155]74,[156]79-[157]81 Overexpression of EGFR is associated with more aggressive, ER-negative disease and poorer prognosis, particularly in triple negative BC.^ [158]82 The JUN gene has been suggested that it can be used as a biomarker for the diagnosis and therapeutic target of BC.^[159]74,[160]83,[161]84 In addition, 3 genes (ACTB, TP53, and CCND1) are supported by 3 articles.^[162]25,[163]74,[164]75,[165]77,[166]79,[167]85-[168]89 For more details, please see [169]Figure 7. The CCND1 gene encodes cyclin D1, with a critical role in BC prognosis.^ [170]90 We observed that our proposed HubGs are mostly associated with a poor prognosis in patients with BC and also suggested for generalized BC by almost all of the 74 articles we listed in Supplemental Figure S1. Figure 7. [171]Figure 7. [172]Open in a new tab Agreement of the proposed BC-related hHubGs with other independent studies. To investigate the pathogenetic processes of hHubGs, we selected the top significant terms for each BP, MF, CC, and KEGG pathway associated with at least 4 hHubGs and P < .01 in [173]Table 2. The GO terms and KEGG pathways related to our identified hHubGs have suggested several studies that are directly linked to BC progression. In particularly, the positive regulation of transcription from RNA polymerase II promoter term with BC cells associates late events in transcription with repair processes in eukaryotic cells.^ [174]91 The regulation of cell cycle and cell proliferation are important indicators of BC prognosis.^ [175]92 The cell proliferation employs transcriptional and nontranscriptional mechanisms to activate the cascade of cyclin-dependent kinases in BC.^ [176]93 The asymmetric cell division intricately regulates the diverse states of stem cells, pivotal in orchestrating the initiation of BC.^ [177]94 Observing the spatial overlap between BRCA1 and nucleus presents intriguing implications for the interplay between BC proteins within the nucleoplasm-nucleolus pathway, shedding light on their potential functional significance.^ [178]95 Several proteins are detected in both cytosolic and membrane fractions, exhibiting robust associations with distinct subfamilies of BC.^ [179]96 The TF binding term may lead to new investigations to understand the mechanism of CXXC5 in BC and may suggest new treatments for ER + BC.^ [180]97 The protein-binding term associated with BC-causing DEGs.^ [181]78 Two terms, enzyme binding and ATP binding, proposed by other studies that are important for BC.^[182]98,[183]99 The KEGG term PI3K-Akt signaling pathway proposed for BC cancer by bioinformatics study.^ [184]12 Furthermore, 3 KEGG pathways are cell cycle, HTLV-I infection, and proteoglycans in cancer are also related to BC progression claimed by other studies.^[185]100-[186]102 Finally, we suggested hHubGs-guided 10 candidate drugs molecules (SORAFENIB, AMG-900, CHEMBL1765740, ENTRECTINIB, MK-6592, masitinib, GSK2126458, TG-02, YM201636, and PAZOPANIB) for the treatment of patients with BC. These drug molecules might be effective for generalized patients with BC in different regions and environments, since these molecules strongly bind to all target genes/proteins (hHUBGs). It should remind here that hHubGs are the representative of all HubGs that were detected in different regions and environment (column 3 in Supplemental Table S1). However, among the identified candidate drugs, some drugs are already approved by the Food and Drug Administration (FDA) for the treatment of other diseases, some are under clinical trials and almost all drugs are supported by other in silico studies for BC. The mechanism of action of our proposed drug is also briefly described below. In particular, SORAFENIB was approved by the FDA in 2013 for the treatment of thyroid carcinoma^ [187]103 and also is under phase 2 clinical trials for the treatment of stage 4 BC.^[188]104,[189]105 The drug SORAFENIB is a multikinase inhibitor that provides antiproliferative, antiangiogenic, and antimetastatic effects by targeting cell surface tyrosine kinase receptors and downstream intracellular kinases implicated in tumor cell proliferation.^ [190]106 The drug AMG-900 is not yet approved for the treatment of any disease but it is under clinical trial (phase 1) for advanced solid tumors.^ [191]107 AMG-900 is an aurora kinases inhibitor, an essential regulator of cell division in mammalian cells, Aurora-A and Aurora-B expression and kinase activity is elevated in a variety of human cancers and is associated with high level of proliferation and poor prognosis.^ [192]108 The drug CHEMBL1765740 has not yet been published or is not under clinical trial for any disease but our study shows good efficacy in both molecular docking and MD simulation analysis. The drug ENTRECTINIB for the treatment of BC by Japan in 2019.^ [193]109 Entrectinib is a tyrosine kinase inhibitor that has shown potent antitumor effects in preclinical studies and has been found to inhibit tumor growth and survival in a dose-dependent manner in murine models.^ [194]110 The drug masitinib was approved by the European Union (EU) for the treatment of amyotrophic lateral sclerosis and mast cell disease and was later denied approval in 2017 and 2018.^[195]111,[196]112 The drug GSK2126458 is under clinical trial of phase 1 study in human patients with BC.^ [197]113 The primary mechanism of action of GSK2126458 is inhibition of the PI3K-Akt signaling pathway, which is critical for cancer cell proliferation and survival.^ [198]114 The drug TG-02 has been approved by the FDA as an orphan drug for glioma and has also been suggested as an enzyme inhibitor of CDK for the treatment of cancer disease.^[199]115,[200]116 The TG-02 and YM201636 drugs have been also suggested for treatment against BC through molecular docking analysis.^ [201]12 TG-02 reduces cell proliferation and promotes cell death in cancer by inhibiting multiple kinases, particularly CDK1, CDK2, CDK7, and CDK9.^ [202]117 YM201636 inhibits the phosphatidylinositol 3,5-bisphosphate kinase (PIKfyve), and has been shown to have antiproliferative effects on liver cancer cells in a dose-dependent manner.^ [203]118 The drug PAZOPANIB is under clinical trial (phase 2) for BC treatment.^ [204]119 Pazopanib is an oncostatic drug with antiangiogenic properties via inhibition of the intracellular tyrosine kinase vascular endothelial growth factor receptors (VEGFR).^ [205]120 We conclude from the literature review that the drug ENTRECTINIB have been approved and three drugs (SORAFENIB, GSK2126458, and PAZOPANIB) are under the clinical trial for BC treatment and other 6 drugs (AMG-900, CHEMBL1765740, MK-6592, masitinib, TG-02, and YM201636) are approved/under clinical trials for different diseases. Thus, we strongly recommend for further evaluation at the molecular level by experimental-laboratory testing and hope that effective results can be obtained. Conclusions This study identified most representative top-ranked 10 BC-causing HubGs (CCNB1, CDK1, TOP2A, CCNA2, ESR1, EGFR, JUN, ACTB, TP53, and CCND1) as the hHubGs by the PPI network analysis of 259 individual HubGs that are obtained from 74 previously published individual studies. Breast cancer prediction analysis based 3 gene-expression profiles ([206]GSE65216, [207]GSE10810 and [208]GSE36295) of NCBI and the box-plot analysis with the gene-expression profiles of TCGA databases confirmed the differential expression patterns of hHubGs between BC and control samples. The enrichment analysis with GO terms and KEGG pathways revealed some crucial BC-causing BP (regulation of cell cycle and cell proliferation), MFs (nucleoplasm and nucleus), CCs (protein binding and enzyme binding), and pathways (PI3K-Akt signaling pathway and cell cycle) that we mentioned in the “Discussion” section. Gene regulatory network (GRN) disclosed top-ranked 5 TFs (MYC, HNF4A, KLF4, POU5F1, and SOX2) and 5 miRNAs (hsa-mir-103a-3p, hsa-mir-107, hsa-mir-16-5p, hsa-mir-34a-5p, and hsa-mir-23b-3p) as the transcriptional and posttranscriptional regulators of hHubGs. Three hHubGs (JUN, TP53, and ESR1) also act as TFs. As for example, the JUN inhibits to both TFs KLF4 and SOX2, TP53 activates to the TF POU5F1 and inhibits the TFs MYC and SOX2, and ESR1 activates to the TFs KLF4. Finally, this study recommended 10 candidate drug molecules (SORAFENIB, AMG-900, CHEMBL1765740, ENTRECTINIB, MK-6592, masitinib, GSK2126458, TG-02, YM201636, and PAZOPANIB) for the treatment against BC, where 1 drug (ENTRECTINIB) already approved by the Japanese drug administration authority, 3 drug molecules SORAFENIB, GSK2126458, and PAZOPANIB are under the clinical trial for BC treatment and the rest 6 molecules (AMG-900, CHEMBL1765740, MK-6592, masitinib, TG-02, and YM201636) require further experimental validation in wet-laboratory although they received support by some independent computational studies for BC treatment. Therefore, the findings of this study might be played a vital role for taking a proper treatment plan against BC. Acknowledgments