Abstract
   Breast cancer (BC) is a complex disease, which causes of high mortality
   rate in women. Early diagnosis and therapeutic improvements may reduce
   the mortality rate. There were more than 74 individual studies that
   have suggested BC-causing hub-genes (HubGs) in the literature. However,
   we observed that their HubG sets are not so consistent with each other.
   It may be happened due to the regional and environmental variations
   with the sample units. Therefore, it was required to explore hub of the
   HubG (hHubG) sets that might be more representative for early diagnosis
   and therapies of BC in different country regions and their
   environments. In this study, we selected top-ranked 10 HubGs (CCNB1,
   CDK1, TOP2A, CCNA2, ESR1, EGFR, JUN, ACTB, TP53, and CCND1) as the
   hHubG set by the protein-protein interaction network analysis based on
   all of 74 individual HubG sets. The hHubG set enrichment analysis
   detected some crucial biological processes, molecular functions, and
   pathways that are significantly associated with BC progressions. The
   expression analysis of hHubGs by box plots in different stages of BC
   progression and BC prediction models indicated that the proposed hHubGs
   can be considered as the early diagnostic and prognostic biomarkers.
   Finally, we suggested hHubGs-guided top-ranked 10 candidate drug
   molecules (SORAFENIB, AMG-900, CHEMBL1765740, ENTRECTINIB, MK-6592,
   YM201636, masitinib, GSK2126458, TG-02, and PAZOPANIB) by molecular
   docking analysis for the treatment against BC. We investigated the
   stability of top-ranked 3 drug-target complexes (SORAFENIB vs ESR1,
   AMG-900 vs TOP2A, and CHEMBL1765740 vs EGFR) by computing their binding
   free energies based on 100-ns molecular dynamic (MD) simulation based
   Molecular Mechanics Poisson-Boltzmann Surface Area (MM-PBSA) approach
   and found their stable performance. The literature review also
   supported our findings much more for BC compared with the results of
   individual studies. Therefore, the findings of this study may be useful
   resources for early diagnosis, prognosis, and therapies of BC.
   Keywords: Breast cancer, hub of the hub-genes (hHubGs), early diagnosis
   and prognosis, regulatory factors, drug repurposing, molecular docking
Introduction
   Cancer is a malignant tumor that is initiated by multiple changes at
   the genetic and epigenetic levels that progressively lead to abnormal
   cell division and cellular alteration.^[35]1,[36]2 Breast cancer (BC)
   is a major public health concern, since it is the leading cause of
   cancer-related deaths in women worldwide. Breast cancer was the second
   most common cancer diagnosed, and the number was approximately 2
   million (11.6% of all cancers) in 2018.^ [37]3 Breast cancer became the
   leading cause of diagnosis in 2020, and the number was approximately
   2.2 million (11.7% of all cancers) worldwide.^ [38]4 Thus, it is a
   matter of concern that the number of patients with BC and deaths is
   increasing every year, despite claims that there have been significant
   improvements in BC treatment. According to the American Cancer Society,
   the 5-year survival rates of patients with BC are 98%, 93%, 72%, and
   22% in stage 1, stage 2, stage 3, and stage 4, respectively.^ [39]5 The
   5-year BC survival rate is the percentage of people who survive at
   least 5 years after diagnosis of BC. From a previous study, it was
   found that around 52.34% of the patients with BC received treatment in
   stage 4, 32.81% in stage 3, 10.16% in stage 2, and only 4.69% patients
   were treated in stage 1.^ [40]6 It has been observed that highest
   number of patients is detected in the fourth stage of BC and their
   survival rate is the lowest. Therefore, early diagnosis and therapeutic
   improvement is essential to reduce the death rate of patients with BC.
   The BC-causing genes may play the key role in this regard.
   There are several individual studies on transcriptomics
   (microarray/RNA-Seq) analysis in the literature that suggested
   BC-causing hub-genes (HubGs) to investigate the pathogenetic processes
   of BC. We systematically reviewed 74 individual articles from different
   online sources that suggested BC-causing HubGs and found that their
   HubG sets are not so similar with each other (see Supplemental Table
   S1). It may be happened due to the regional and environmental
   variations with the experimental units. We found a total of 259
   different HubGs from those articles in which there was no any common
   HubGs among the articles. So, it may be difficult to take a common
   treatment plan for all patients with BC in different regions and
   environment. Therefore, it is required to explore hub of the HubG
   (hHubG) sets that may be more representative BC-causing HubGs for
   diagnosis and therapeutic improvement for the treatment against BC in
   different environments. Exploring BC-causing hHubGs out of 259 HubGs
   highlighting their early diagnostic, prognostic, and therapeutic
   characteristics by the wet-laboratory experimental procedure may be
   laborious, time-consuming, and costly. In this study, we attempted to
   explore BC-causing hHubGs highlighting their early diagnostic,
   prognostic, and therapeutic characteristics by using the integrated
   statistics and bioinformatics approaches.
   To explore disease-causing HubGs, the protein-protein interaction (PPI)
   network analysis is a widely used popular approach.^[41]7-[42]12 In
   this study, we also explored BC-causing hHubGs by the PPI network
   analysis of 259 HubGs that we mentioned earlier. In the case of drug
   discovery, drug repurposing (DR) is a promising strategy, where new
   therapeutic inventions are made through existing drugs.^ [43]13 It is
   secure, more efficient, less expensive, and less time-consuming
   compared with the de novo process due to it skip several steps from the
   target-based drug selection to clinical validation.^ [44]14 Therefore,
   in this study, we detected hHubGs-guided repurposable drug molecules.
   There are several steps involved in the in silico analysis of the DR
   process, such as network pharmacology, molecular docking analysis, and
   dynamic simulation analysis.^ [45]15 Network pharmacology is an
   efficient tool for systematic pharmacologic research which creates
   interactions between target genes and drugs through network-based
   methods.^[46]16,[47]17 Molecular docking analysis plays a vital role in
   drug design through structural binding between therapeutic targets and
   drug molecules by calculating affinity scores.^ [48]18 The molecular
   dynamic (MD) simulation indicates the stability of docking patterns.
   Recently, it has been widely used and has become one of the most
   popular platforms to validate the candidate drugs.^[49]19,[50]20 The
   workflow of this study is displayed in [51]Figure 1. It should be
   mentioned here that the similar workflow was also considered in some
   previous studies for other diseases.^[52]21,[53]22
Figure 1.
   [54]Figure 1.
   [55]Open in a new tab
   The workflow of this study.
Methods and Materials
Data collection on BC-causing HubG sets
   We collected HubG sets and drug molecules as molecular metadata from
   different published articles up to December 31, 2023, through online
   search strategy with the keywords “breast cancer [ti], gene expression,
   RNA-Seq, scRNA-Seq, differentially expressed genes (DEGs), and
   core/key/hub genes” from the public sources including PubMed in the
   National Center of Biotechnology Information (NCBI)
   ([56]https://pubmed.ncbi.nlm.nih.gov/), Google Scholar
   ([57]https://scholar.google.com/), and Google
   ([58]https://www.google.com/). We found a total of 74 independent
   articles that suggested BC-related HubGs-set. We collected 74 HubG sets
   from the selected 74 articles. We found a total of 259 unique HubGs
   from those HubG sets and used them to explore hHubGs for further
   investigation in this study (see Supplemental Table S1).
Data collection on drug molecules
   We collected candidate drug molecules from 3 platforms: (1) published
   articles, (2) the Gene Set Cancer Analysis (GSCALite) database,^ [59]23
   and (3) the Drug Gene Interaction Database (DGIdb).^ [60]24 Some of our
   selected articles suggested candidate drugs for the treatment of BC and
   considered them set A.^[61]25,[62]26 We collected meta-drugs from 2
   databases DGIdb and GSCALite through the hHubG-drugs interaction and
   considered them sets B and C, respectively (Supplemental Table S2).
Detection of BC-causing hub of the HubGs
   Network-based strategies are widely used in identifying disease-driving
   KGs and their potential regulators.^[63]27,[64]28 We submitted all
   unique HubGs to the STRING web tool (Search Tool for the Retrieval of
   Interacting Genes) (version: 11.5) to construct PPI network.^ [65]29
   Protein-protein interactions of the all HubGs were constructed with a
   confidence score 0.90. Subsequently, Cytoscape software (version:
   3.9.0) was used for visualization of the PPI network.^ [66]30 The
   “Analyze Network tool” in Cytoscape was used for topological degree
   analysis, and the degree >90 was considered as the cutoff criterion for
   hHubG identification.
Assessment of hHubGs as the early diagnostic and prognostic biomarkers
   To assess hHubGs as early diagnostic and prognostic biomarkers, we
   performed their expression analysis with the different stages (normal
   status, stage 1, stage 2, stage 3, and stage 4) of BC progression by
   using box plots through the UALCAN web-tool and The Cancer Genome Atlas
   (TCGA) database.^ [67]31 To investigate the prognostic power of hHubGs,
   we also developed BC prediction models through 2 popular machine
   learning methods known as support vector machine (SVM) and random
   forest (RF). To perform prediction models, we collected 3 microarray
   gene-expression data sets with accession numbers [68]GSE65216,^ [69]32
   [70]GSE10810,^ [71]33 and [72]GSE36295^ [73]34 from the Gene Expression
   Omnibus (GEO) database. Each data set divided into 2 groups: training
   data (70% of all data) and test data (30% of all data). Finally, the
   receiver operating characteristic (ROC) curve was used to assess the
   prognosis performance. The ROC curve was constructed using the R
   package ROCR. We also used independent database “DisGeNET” and web-tool
   “Enrichr” to verify the association of hHubGs with BC and other disease
   by disease-hHubGs enrichment analysis.^ [74]35
Functional and pathway enrichment analysis
   To investigate the pathogenetic processes of hHubGs, we performed their
   Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genome (KEGG)
   pathway enrichment analysis. The GO enrichment analysis was performed
   in 3 categories namely biological process (BP), cellular component
   (CC), and molecular function (MF). Gene Ontology functional and KEGG
   pathway enrichment analyses were performed using the online database
   DAVID (version: 6.8) online tool.^ [75]36
Regulatory interaction network
   We performed regulatory interaction network to identify the key
   transcription factors (TFs) and microRNAs (miRNAs) as the
   transcriptional and posttranscriptional regulators of hHubGs. The
   online tool “NetworkAnalyst” (version: 3.0) was used to construct the
   regulatory interaction network.^ [76]37 ChEA and TarBase (v8.0)
   databases were selected for constructing hHubG-TF and hHubG-miRNA
   interaction networks, respectively.
Drug repurposing
   We used molecular docking analysis, a popular in silico validation
   technique, to perform the binding between the drug and target. We
   considered hHubGs as targets (receptors/proteins) and meta-drug agents
   collected from published articles and the online databases GSCALite and
   DGIdb as drugs (compounds) for performing molecular docking analysis.
   We collected 3-dimensional (3D) structures of proteins from the Protein
   Data Bank (PDB) ([77]https://www.rcsb.org/)^ [78]38 and the SWISS-MODEL
   ([79]https://swissmodel.expasy.org/)^ [80]39 databases due to the
   required to perform molecular docking analysis. The 3D structures of
   all compounds were downloaded from the PubChem
   ([81]https://pubchem.ncbi.nlm.nih.gov/) database.^ [82]40 PyMOL
   (PyMOL2) software was used for the preprocessing of 3D structure
   proteins.^ [83]41 PyRx software was used to perform the blind molecular
   docking analysis between proteins and compounds and selected
   shortlisted drugs based on binding affinity scores (BASs)
   (kcal/mol).^[84]42,[85]43 Then, we validate the selected drug molecules
   with the active sites (by using PyMOL2 software)^ [86]41 of our
   selected hHubGs through flexible docking analysis. The 3D interaction
   between the protein and compound was constructed and visualized by
   using USCF Chimera and Discovery Studio Visualizer 2021 software.^
   [87]44
MD simulation studies
   Molecular dynamic simulations were performed using YASARA Dynamics
   software^ [88]45 and the AMBER (Assisted Model Building with Energy
   Refinement) force field^ [89]46 to test the dynamic behavior of the top
   protein-compound interactions. The AMBER’s force fields have been
   extensively validated and optimized for various biological molecules,
   making it a robust choice for simulation study.^ [90]47 The root mean
   square deviation (RMSD) value between the crystal reference and the
   predicted structure is commonly used to confirm whether a close-match
   docked pose was predicted or not by docking simulations. If the RMSD
   value is ⩽2 Å, the result is considered as good result.^ [91]48 Three
   top-ranked interactions based on affinity scores were selected to
   perform MD simulations. For simulation performance, the
   protein-compound interaction hydrogen bonding network is optimized and
   solvated through the TIP3P water model in a simulation cell.^ [92]49
   The periodic boundary condition was maintained by a solvent
   concentration of 0.997 g L^−1. pKa was calculated during solvation by
   subjecting to titratable amino acids in the interaction. A time-step
   interval of 2.50 fs (298 K, pH = 7.4, 0.9% NaCl) under physiological
   conditions was used to run several time-step algorithms for each
   simulation.^ [93]50 The SETTLE and LINCS algorithms were used to
   contain water molecules and constrain all bond lengths,
   respectively.^[94]51,[95]52 For further analysis, trajectories were
   recorded every 250 ps, and subsequent analyses were implemented by
   default scripts of YASARA Macro and SciDAVis software.^ [96]53 Then,
   all snapshots were subjected to the Molecular Mechanics
   Poisson-Boltzmann surface area (MM-PBSA) in YASARA software to
   calculate the binding free energy using the given equation.^ [97]54
   [MATH:
   Binding
   freeEnerg
   mi>yΔG=G<
   mrow>bound−Gunbo<
   mi>und :MATH]
   Here, YASARA built-in macros were used with consideration AMBER 14 as a
   force field to calculate the MM-PBSA binding energy, where a greater
   positive energy indicates better binding.
Results
Exploring BC-causing hub-genes by the systematic literature review
   We found a total of 297 BC-related articles from various online sources
   (PubMed, Google Scholar, and Google) by keywords search up to December
   31, 2023. Finally, we selected 74 articles out of them through the
   inclusion-exclusion criteria (see Supplemental Figure S1 and
   Supplemental Table S1). Then, we detected a total of BC-causing 259
   different HubGs from those 74 HubG sets as described in the “Materials
   and Methods” section and used them for further investigation in this
   study.
Detection of hub of the HubGs
   The PPI network was constructed based on those 259 individual HubGs to
   detect BC-causing hHubGs as key genomic biomarkers. It included 209 of
   nodes, 5785 edges, average node degree 39.5, average local clustering
   coefficient 0.599, and P < 1.0e-15. The PPI network was visualized in
   [98]Figure 2, where hHubGs are indicated by large-size nodes and
   selected based on the top degree of connectivity (degree >95). We
   selected top-ranked 10 hHubGs (CCNB1, CDK1, TOP2A, CCNA2, ESR1, EGFR,
   JUN, ACTB, TP53, and CCND1) in which 6 hHubGs (CCNB1, CDK1, TOP2A,
   CCNA2, ESR1, and EGFR) belong to the set of already published
   BC-causing top-ranked 10 HubGs (see [99]Table 1 and Supplemental Table
   S1).
Figure 2.
   Figure 2.
   [100]Open in a new tab
   The PPI network based on 259 unique HubGs, where large size indicates
   hHubGs with the highest degree of connectivity (degree >90).
Table 1.
   List of top-ranked 10 BC-causing HubGs/hHubGs based on the agreement of
   published articles (Supplemental Table S1) (A*) and hHubGs through PPI
   network analysis (B*) ([101]Figure 2), respectively.
   Top-ranked 10 HubGs by the agreement of published articles
   (Supplemental Table S1) Top-ranked 10 hHubGs (proposed) Common hHubGs
   [MATH: (A*∩B*
   mrow>) :MATH]
   Genes (A*) Number of agreed articles Genes (B*) Degree of PPI score
   TOP2A 8 TP53 143 CCNB1
   CDK1
   TOP2A
   CCNA2
   ESR1
   EGFR
   CDK1 7 ACTB 122
   CCNB1 9 EGFR 102
   FN1 8 CCND1 100
   RRM2 6 JUN 100
   BUB1B 7 ESR1 95
   CCNB2 8 CDK1 99
   CCNA2 8 CCNB1 99
   ESR1 6 CCNA2 98
   EGFR 6 TOP2A 91
   [102]Open in a new tab
   Bold indicates common genes.
Performance of hHubGs as early diagnostic and prognostic biomarkers
   To investigate the significance of differential expression patterns of
   top-ranked 10 hHubGs between the normal group and 4-stage group (stage
   1, stage 2, stage 3, and stage 4) of BC individually, we performed
   box-plot analysis using t test statistic with P < .001 based on their
   TCGA expression profiles ([103]Figure 3A). We observed that all hHubGs
   individually showed significantly differential expression patterns (P
   values ranged from 8.99E-04 [TP53] to 2.22E-16 [ACTB]) between stage 1
   BC and control samples group. Thus, our identified hHubGs may play an
   important role in the early stage diagnosis of BC. To investigate the
   prognostic power of top-ranked 10 hHubGs, we developed RF and SVM-based
   2 BC prediction models by using the gene-expression data with NCBI-GEO
   accession number [104]GSE65216 as training data set, and [105]GSE10810
   and [106]GSE36295 as the test data sets. To investigate the performance
   of the prediction models, we constructed ROC curves in [107]Figure 3B,
   where green indicates training performance and red and blue indicate
   test performance. We observed that both the training and test
   performances were good with area under the curve (AUC) values >0.86 for
   both the RF and SVM-based prediction models. Furthermore, we observed
   that hHubGs are significantly associated with invasive carcinoma of BC
   and luminal B breast carcinoma (Supplemental Figure S2). Thus, our
   suggested hHubGs are significantly associated with almost all subtypes
   of BC.
Figure 3.
   [108]Figure 3.
   [109]Open in a new tab
   (A) Analysis of expression levels of hHubGs between the normal group
   and 4-stage group (stage 1, stage 2, stage 3, and stage 4) of BC, and
   (B) ROC curves to investigate the prognostic power of hHubGs by the RF
   and SVN-based BC prediction models with 3 gene-expression profile data
   sets in NCBI-GEO database.
HubGs-set enrichment analysis
   The GO functional and KEGG enrichment analysis showed that unique HubGs
   were enriched by 234 GO-BP terms, 56 GO-MF terms, 38 GO-CC terms, and
   85 pathways. Then, we selected the significant GO terms and KEGG
   pathways that were associated with at least 4 of our identified hHubGs
   and P < .01. Four BP terms (positive regulation of transcription from
   RNA polymerase II promoter, cell proliferation, cell division, and
   regulation of cell cycle), 5 CC terms (nucleoplasm, nucleus, cytosol,
   cytoplasm, and membrane), 5 MF terms (protein binding, enzyme binding,
   adenosine triphosphate (ATP) binding, TF binding, and protein
   heterodimerization activity), and 7 KEGG pathways (pathways in cancer,
   PI3K-Akt signaling pathway, cell cycle, herpes simplex infection, human
   T-lymphotropic virus type 1 (HTLV-I) infection, proteoglycans in
   cancer, and PI3K-Akt signaling pathway) were associated with at least 4
   hHubGs and may be responsible for BC progression in [110]Table 2.
Table 2.
   Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes
   (KEGG) pathways enrichment analysis results of all HubGs highlighting
   number of HubGs (nHubGs), P value, and associated hub of the HubGs
   (hHubGs).
   ID Terms/pathways HubGs P Associated hHubGs
   GO: Biological process (BP)
   GO:0045944 Positive regulation of transcription from RNA polymerase II
   promoter 66 3.8E−21 TOP2A, TP53, EGFR, ESR1, JUN
   GO:0008283 Cell proliferation 38 5.0E−18 EGFR, CDK1, TP53, JUN
   GO:0051301 Cell division 36 5.9E−17 CCNB1, CCND1, CCNA2, CDK1
   GO:0051726 Regulation of cell cycle 23 2.8E−16 JUN, CCNB1, TP53, ESR1
   GO: Cellular components (CCs)
   GO:0005654 Nucleoplasm 123 2.6E−26 ACTB, CCND1, TP53, CCNB1, ESR1, JUN,
   CDK1, TOP2A, CCNA2
   GO:0005634 Nucleus 175 5.9E−24 CCND1, TP53, CCNB1, JUN, ESR1, CDK1,
   TOP2A, CCNA2, EGFR
   GO:0005829 Cytosol 129 1.0E−22 ACTB, CCND1, TP53, CCNB1, JUN, CDK1
   GO:0005737 Cytoplasm 154 8.7E−16 ACTB, CCND1, TP53, CCNB1, ESR1, CDK1,
   TOP2A, CCNA2, EGFR
   GO:0016020 Membrane 73 1.1E−08 ACTB, CCND1, EGFR, CCNB1, ESR1, CDK1
   GO: Molecular functions (MFs)
   GO:0005515 Protein binding 245 2.6E−29 ACTB, TP53, JUN, TOP2A, EGFR,
   CCND1, CCNB1, ESR1, CDK1, CCNA2
   GO:0019899 Enzyme binding 33 3.9E−15 TOP2A, EGFR, CCND1, JUN, ESR1,
   TP53
   GO:0005524 ATP binding 67 9.8E−13 TOP2A, ACTB, TP53, EGFR, CDK1
   GO:0008134 Transcription factor binding 24 1.0E−09 JUN, ESR1, CCND1,
   TP53
   GO:0046982 Protein heterodimerization activity 28 4.8E−08 TOP2A, EGFR,
   JUN, TP53
   KEGG pathways
   hsa05200 Pathways in cancer 51 1.4E−20 CCND1, TP53, EGFR, JUN
   hsa04151 PI3K-Akt signaling pathway 33 1.6E−20 CCND1, JUN, CCNA2, TP53
   hsa04110 Cell cycle 29 2.6E−18 CCNB1, CCND1, CCNA2, CDK1, TP53
   hsa05168 Herpes simplex infection 24 1.0E−16 EGFR, CCND1, TP53, JUN
   hsa05166 HTLV-I infection 36 1.8E−15 CCND1, JUN, TP53, ESR1
   hsa05205 Proteoglycans in cancer 31 2.4E−14 EGFR, ACTB, CCND1, ESR1,
   TP53
   hsa04151 PI3K-Akt signaling pathway 40 3.1E−14 EGFR, CCND1, TP53, TOP2A
   [111]Open in a new tab
Regulatory network analysis
   To identify the top-ranked transcriptional and posttranscriptional
   regulators of hHubGs, we constructed and visualized KG-TF and KG-miRNA
   interaction networks in [112]Figure 4A and [113]B, where pink indicates
   hHubGs and green indicates TFs and miRNAs in [114]Figure 4A and [115]B,
   respectively. We observed that 5 TFs (MYC, HNF4A, KLF4, POU5F1, and
   SOX2) were associated with at least 8 hHubGs (degree ⩾8), so these 5
   TFs considered as transcriptional factors of hHubG. Similarly, 5 miRNAs
   (hsa-mir-103a-3p, hsa-mir-107, hsa-mir-16-5p, hsa-mir-34a-5p, and
   hsa-mir-23b-3p) were associated with at least 8 hHubGs (degree ⩾8), so
   these 5 miRNAs were considered as posttranscriptional factors of
   hHubGs. We found from EMBL-EBI (European Molecular Biology
   Laboratory—European Bioinformatics Institute) database that 3 hHubGs
   (JUN, TP53, and ESR1) out of 10 hHubGs are TFs and indicated them by
   diamond shape in [116]Figure 4. [117]Figure 4C(I) indicates that 3 TFs
   of hHubGs (MYC, KLF4, and POU5F1) showed downregulation, and 2 TFs
   (HNF4A and SOX2) showed upregulation. [118]Figure 4C(II) shows that
   KLF4 is a upstream and SOX2 is a downstream TFs for the JUN hHubGs
   (TFs), so JUN acts as an inhibitor to both KLF4 and SOX2 (if the
   expression level of the downstream TF decreased, then core TF acted as
   an activator otherwise inhibitor). Similarly, TP53 acted as an
   activator to POU5F1 and inhibitor to MYC and SOX2, and ESR1 acted as an
   activator to KLF4.
Figure 4.
   [119]Figure 4.
   [120]Open in a new tab
   hHubGs-regulatory networks, where pink indicates hHubGs (diamond shape
   hHubGs indicate TFs themselves): (A) hHubGs-TFs interaction network,
   where large green indicates key transcription factors; (B) hHubG-miRNA
   interaction network, where large green indicates key
   posttranscriptional factors; and (C) upstream and downstream analysis
   of TFs (I) differential expressions for TFs of hHubGs (II) upstream and
   downstream regulation by hHubGs that are also TFs (JUN, TP53, and
   ESR1), where pink diamond indicates core TFs, red rectangle indicates
   upstream TFs, and blue rectangle indicates downstream TFs.
Drug repurposing
   First, we collected BC and hHubGs associated with a total of 258 drug
   agents from the published articles and 2 online databases: DGIdb and
   GSCALite. Then, we performed blind molecular docking analysis (entire
   protein) between compounds (drug agents) and target proteins (hHubGs).
   We selected the top-10 poses based on the smallest docking scores
   between compounds and target proteins. To check the quality of binding
   poses, we calculated the RMSD between native pose and our sleeted 10
   poses. Finally, we selected the best pose based on the smallest RMSD
   and considered the docking score in that pose. We visualized the
   molecular docking results of the top 125 interactions in [121]Figure
   5A, where the y-axis represents the target proteins and x-axis
   compounds and different colors indicate the affinity scores. Compounds
   were arranged in x-axis in ascending order based on the average BAS
   across all proteins. For example, top-ranked 10 compounds (SORAFENIB,
   AMG-900, CHEMBL1765740, ENTRECTINIB, MK-6592, YM201636, masitinib,
   GSK2126458, TG-02, and PAZOPANIB) bind to the receptor proteins with
   average BAS less than −8.76 kcal/mol (binds strongly to the receptor
   proteins), and the next top-ranked 10 compounds (GSK-269962A, MLN-8054,
   ENMD-2076, ENMD-981693, NERVIANO, SB590885, Lapatinib, GSK1070916,
   TAK-715, and PF-03814735) bind to the receptor proteins with average
   BAS greater than −8.23 kcal/mol (binds weakly to the receptor
   proteins). Therefore, top-ranked 10 compounds (SORAFENIB, AMG-900,
   CHEMBL1765740, ENTRECTINIB, MK-6592, YM201636, masitinib, GSK2126458,
   TG-02, and PAZOPANIB) were proposed as candidate drugs against BC. In
   [122]Figure 5B, the BAS of top-ranked 10 compounds with all receptors
   was highlighted. We found 6 overlapping genes between the top-ranked
   HubGs and hHubGs in [123]Figure 5C. We cross-validated the proposed
   candidate drugs with 4 uncommon top-ranked HubGs in [124]Figure 5D. To
   validate the top-ranked drug molecules with active sites of top-ranked
   HubGs by their flexible docking analysis, we predicted their active
   sites by the PyMOL software and listed in Table S3. Then, we perform
   flexible docking analysis between drugs and selected top-ranked HubGs
   (with flexibility of active sites) in [125]Table 3. We also found a
   strong bond between them, so we may recommend our candidate drugs for
   universal use against BC.
Figure 5.
   [126]Figure 5.
   [127]Open in a new tab
   Molecular docking analysis results for screening candidate drugs
   against BC: (A) presented interaction binding affinity scores between
   hHubGs and drug agents; (B) visualization of interaction between 10
   candidate drugs and hHubGs by zooming; (C) overlap between the
   top-ranked HubGs and hHubGs; and (D) cross-validation of the candidate
   drugs by top-ranked HubGs.
Table 3.
   Flexible docking scores between the proposed drug molecules and the
   active sites of hHubGs.
   Proteins AMG900 CHEMBL1765740 SORAFENIB ENTRECTINIB MK6592 YM201636
   masitinib GSK2126458 TG02 PAZOPANIB
   TOP2A −8.2 −7.3 −7.1 −7 −7.5 −7.3 −6.8 −7.1 −6.9 −6.3
   EGFR −9.3 −8.5 −7.3 −7.6 −7.8 −8 −7.1 −6.9 −7.1 −6.1
   ESR1 −8.9 −9.3 −9.5 −8.3 −8 −6.6 −7.3 −7.2 −7 −6.6
   TP53 −8.5 −8.8 −7.8 −8 −8.1 −7.9 −7.6 −6.7 −7.1 −7.8
   CDK1 −7.7 −6.8 −7.5 −7.6 −7.1 −7.2 −7 −6.7 −6.5 −6.8
   CCND1 −7.6 −7.4 −8.1 −7.7 −7.3 −7.4 −7.6 −7.7 −7.4 −7.1
   JUN −8.5 −7.7 −8.1 −8.1 −7.7 −7.1 −6.7 −7 −6.9 −6.7
   ACTB −9.2 −7.8 −8.6 −7.2 −7.5 −7.7 −7.1 −7.7 −7.1 −6.6
   CCNB1 −8.3 −7.5 −7.5 −8.1 −7.7 −7.6 −7.4 −7 −6.7 −7.2
   CCNA2 −9.6 −7.6 −8.5 −7.5 −7.6 −7.2 −6.9 −7.5 −5.8 −6.3
   [128]Open in a new tab
Molecular dynamic simulations
   Three top interactions, SORAFENIB-ESR1 (BAS = −12.5),
   CHEMBL1765740-EGFR (BAS = −10.8), and AMG900-TOP2A (BAS = −12.4),
   between candidate drugs and target proteins were selected for stability
   analysis using 100-ns MD-based MM-PBSA simulations. We noticed that all
   systems were remarkably stable in variations of moving and initial
   drug-target interactions in [129]Figure 6. We calculated the RMSD of
   selected interactions to check the stability of structure during the
   simulation period in [130]Figure 6A. All the systems projected RMSD
   range from 1 Å to 1.75 Å, .75 Å to 2.25 Å, and .5 Å to 2.75 Å for
   SORAFENIB-ESR1, CHEMBL1765740-EGFR, and AMG900-TOP2A interactions,
   respectively. We observed that all interactions fluctuated slightly
   between 0 and 16 000 ps and were stable in the remaining simulations.
   We also calculated the MM-PBSA binding energy for the 3 selected top
   interactions in [131]Figure 6B. The average binding forces for
   SORAFENIB-ESR1, CHEMBL1765740-EGFR, and AMG900-TOP2A interactions were
   217.5, 130.45, and 131.4 kJ/mol, respectively.
Figure 6.
   [132]Figure 6.
   [133]Open in a new tab
   MD simulation analysis of the top 3 interactions, where blue, red, and
   green indicate SORAFENIB-ESR1, CHEMBL1765740-EGFR, and AMG900-TOP2A
   interactions, respectively: (A) time evolution of RMSDs of backbone
   atoms (C, Cα, and N) for protein for each docked complex, and (B)
   represented binding free energy of each snapshot for the top 3
   interactions.
Discussion
   This study analyzed metadata on BC-causing HubGs collected from
   different published articles. At first, we selected 74 articles out of
   297 by the systematic review as shown in Supplemental Figure S1.
   Different 259 HubGs were found in total from the selected articles
   (Supplemental Table S1), and these data were used for further analysis
   in this study. Top-ranked 10 HubGs (CCNB1, CDK1, TOP2A, CCNA2, ESR1,
   EGFR, JUN, ACTB, TP53, and CCND1) were selected as the hHubGs out of
   different 259 HubGs by the PPI network analysis, where 6 hHubGs (CCNB1,
   CDK1, TOP2A, CCNA2, ESR1, and EGFR) are common with the top-ranked
   HubGs based on the agreement of published articles ([134]Table 1).
   Then, we verified their differential expression patterns between BC and
   control samples using independent data sets from NCBI and TCGA
   databases to the prediction models and box-plot analysis. We observed
   that the proposed hHubGs can be used as the strong biomarkers for early
   diagnosis and prognosis, where 8 genes (CCNB1, CDK1, TOP2A, CCNA2,
   ESR1, ACTB, TP53, and CCND1) are up-regulated hHubGs and the rest 2
   genes (EGFR and JUN) are down-regulated hHubGs. In particularly, the
   CCNB1 gene is associated with aggressive tumor behavior, larger tumor
   size, higher grade, hormone receptor negativity, HER2 positivity, and
   poor clinical outcome.^ [135]55 The TOP2A gene has been suggested in 8
   articles that this gene plays a prognostic role in patients with
   BC.^[136]56-[137]62 Overexpression of TOP2A is a marker of poor
   prognosis, especially in luminal/hormone receptor-positive BC, and may
   serve as both a prognostic and predictive biomarkers in BC disease.^
   [138]63 Seven articles claim that the CDK1 gene is responsible for the
   occurrence and development of
   BC.^[139]26,[140]57-[141]59,[142]64,[143]65 CDK1 is associated with
   shorter overall survival, progression-free survival, and relapse-free
   survival, as well as more advanced clinical stage of cancer.^ [144]66
   The CCNA2 gene has been identified in 6 studies as a promising
   biomarker in silico analysis related to BC
   progression.^[145]26,[146]56,[147]57,[148]67-[149]72 The CCNA2 gene is
   linked to a poor prognosis and resistance to endocrine therapy and
   serve as a prognostic biomarker in estrogen receptor-positive patients
   with BC.^ [150]73 The ESR1 gene suggested in several articles suggested
   as generalized BC.^[151]56,[152]74-[153]78 The EGFR gene and its
   signaling pathways are related to the progression of BC and can be used
   as a therapeutic target against BC
   therapy.^[154]61,[155]74,[156]79-[157]81 Overexpression of EGFR is
   associated with more aggressive, ER-negative disease and poorer
   prognosis, particularly in triple negative BC.^ [158]82 The JUN gene
   has been suggested that it can be used as a biomarker for the diagnosis
   and therapeutic target of BC.^[159]74,[160]83,[161]84 In addition, 3
   genes (ACTB, TP53, and CCND1) are supported by 3
   articles.^[162]25,[163]74,[164]75,[165]77,[166]79,[167]85-[168]89 For
   more details, please see [169]Figure 7. The CCND1 gene encodes cyclin
   D1, with a critical role in BC prognosis.^ [170]90 We observed that our
   proposed HubGs are mostly associated with a poor prognosis in patients
   with BC and also suggested for generalized BC by almost all of the 74
   articles we listed in Supplemental Figure S1.
Figure 7.
   [171]Figure 7.
   [172]Open in a new tab
   Agreement of the proposed BC-related hHubGs with other independent
   studies.
   To investigate the pathogenetic processes of hHubGs, we selected the
   top significant terms for each BP, MF, CC, and KEGG pathway associated
   with at least 4 hHubGs and P < .01 in [173]Table 2. The GO terms and
   KEGG pathways related to our identified hHubGs have suggested several
   studies that are directly linked to BC progression. In particularly,
   the positive regulation of transcription from RNA polymerase II
   promoter term with BC cells associates late events in transcription
   with repair processes in eukaryotic cells.^ [174]91 The regulation of
   cell cycle and cell proliferation are important indicators of BC
   prognosis.^ [175]92 The cell proliferation employs transcriptional and
   nontranscriptional mechanisms to activate the cascade of
   cyclin-dependent kinases in BC.^ [176]93 The asymmetric cell division
   intricately regulates the diverse states of stem cells, pivotal in
   orchestrating the initiation of BC.^ [177]94 Observing the spatial
   overlap between BRCA1 and nucleus presents intriguing implications for
   the interplay between BC proteins within the nucleoplasm-nucleolus
   pathway, shedding light on their potential functional significance.^
   [178]95 Several proteins are detected in both cytosolic and membrane
   fractions, exhibiting robust associations with distinct subfamilies of
   BC.^ [179]96 The TF binding term may lead to new investigations to
   understand the mechanism of CXXC5 in BC and may suggest new treatments
   for ER + BC.^ [180]97 The protein-binding term associated with
   BC-causing DEGs.^ [181]78 Two terms, enzyme binding and ATP binding,
   proposed by other studies that are important for BC.^[182]98,[183]99
   The KEGG term PI3K-Akt signaling pathway proposed for BC cancer by
   bioinformatics study.^ [184]12 Furthermore, 3 KEGG pathways are cell
   cycle, HTLV-I infection, and proteoglycans in cancer are also related
   to BC progression claimed by other studies.^[185]100-[186]102
   Finally, we suggested hHubGs-guided 10 candidate drugs molecules
   (SORAFENIB, AMG-900, CHEMBL1765740, ENTRECTINIB, MK-6592, masitinib,
   GSK2126458, TG-02, YM201636, and PAZOPANIB) for the treatment of
   patients with BC. These drug molecules might be effective for
   generalized patients with BC in different regions and environments,
   since these molecules strongly bind to all target genes/proteins
   (hHUBGs). It should remind here that hHubGs are the representative of
   all HubGs that were detected in different regions and environment
   (column 3 in Supplemental Table S1). However, among the identified
   candidate drugs, some drugs are already approved by the Food and Drug
   Administration (FDA) for the treatment of other diseases, some are
   under clinical trials and almost all drugs are supported by other in
   silico studies for BC. The mechanism of action of our proposed drug is
   also briefly described below. In particular, SORAFENIB was approved by
   the FDA in 2013 for the treatment of thyroid carcinoma^ [187]103 and
   also is under phase 2 clinical trials for the treatment of stage 4
   BC.^[188]104,[189]105 The drug SORAFENIB is a multikinase inhibitor
   that provides antiproliferative, antiangiogenic, and antimetastatic
   effects by targeting cell surface tyrosine kinase receptors and
   downstream intracellular kinases implicated in tumor cell
   proliferation.^ [190]106 The drug AMG-900 is not yet approved for the
   treatment of any disease but it is under clinical trial (phase 1) for
   advanced solid tumors.^ [191]107 AMG-900 is an aurora kinases
   inhibitor, an essential regulator of cell division in mammalian cells,
   Aurora-A and Aurora-B expression and kinase activity is elevated in a
   variety of human cancers and is associated with high level of
   proliferation and poor prognosis.^ [192]108 The drug CHEMBL1765740 has
   not yet been published or is not under clinical trial for any disease
   but our study shows good efficacy in both molecular docking and MD
   simulation analysis. The drug ENTRECTINIB for the treatment of BC by
   Japan in 2019.^ [193]109 Entrectinib is a tyrosine kinase inhibitor
   that has shown potent antitumor effects in preclinical studies and has
   been found to inhibit tumor growth and survival in a dose-dependent
   manner in murine models.^ [194]110 The drug masitinib was approved by
   the European Union (EU) for the treatment of amyotrophic lateral
   sclerosis and mast cell disease and was later denied approval in 2017
   and 2018.^[195]111,[196]112 The drug GSK2126458 is under clinical trial
   of phase 1 study in human patients with BC.^ [197]113 The primary
   mechanism of action of GSK2126458 is inhibition of the PI3K-Akt
   signaling pathway, which is critical for cancer cell proliferation and
   survival.^ [198]114 The drug TG-02 has been approved by the FDA as an
   orphan drug for glioma and has also been suggested as an enzyme
   inhibitor of CDK for the treatment of cancer disease.^[199]115,[200]116
   The TG-02 and YM201636 drugs have been also suggested for treatment
   against BC through molecular docking analysis.^ [201]12 TG-02 reduces
   cell proliferation and promotes cell death in cancer by inhibiting
   multiple kinases, particularly CDK1, CDK2, CDK7, and CDK9.^ [202]117
   YM201636 inhibits the phosphatidylinositol 3,5-bisphosphate kinase
   (PIKfyve), and has been shown to have antiproliferative effects on
   liver cancer cells in a dose-dependent manner.^ [203]118 The drug
   PAZOPANIB is under clinical trial (phase 2) for BC treatment.^ [204]119
   Pazopanib is an oncostatic drug with antiangiogenic properties via
   inhibition of the intracellular tyrosine kinase vascular endothelial
   growth factor receptors (VEGFR).^ [205]120 We conclude from the
   literature review that the drug ENTRECTINIB have been approved and
   three drugs (SORAFENIB, GSK2126458, and PAZOPANIB) are under the
   clinical trial for BC treatment and other 6 drugs (AMG-900,
   CHEMBL1765740, MK-6592, masitinib, TG-02, and YM201636) are
   approved/under clinical trials for different diseases. Thus, we
   strongly recommend for further evaluation at the molecular level by
   experimental-laboratory testing and hope that effective results can be
   obtained.
Conclusions
   This study identified most representative top-ranked 10 BC-causing
   HubGs (CCNB1, CDK1, TOP2A, CCNA2, ESR1, EGFR, JUN, ACTB, TP53, and
   CCND1) as the hHubGs by the PPI network analysis of 259 individual
   HubGs that are obtained from 74 previously published individual
   studies. Breast cancer prediction analysis based 3 gene-expression
   profiles ([206]GSE65216, [207]GSE10810 and [208]GSE36295) of NCBI and
   the box-plot analysis with the gene-expression profiles of TCGA
   databases confirmed the differential expression patterns of hHubGs
   between BC and control samples. The enrichment analysis with GO terms
   and KEGG pathways revealed some crucial BC-causing BP (regulation of
   cell cycle and cell proliferation), MFs (nucleoplasm and nucleus), CCs
   (protein binding and enzyme binding), and pathways (PI3K-Akt signaling
   pathway and cell cycle) that we mentioned in the “Discussion” section.
   Gene regulatory network (GRN) disclosed top-ranked 5 TFs (MYC, HNF4A,
   KLF4, POU5F1, and SOX2) and 5 miRNAs (hsa-mir-103a-3p, hsa-mir-107,
   hsa-mir-16-5p, hsa-mir-34a-5p, and hsa-mir-23b-3p) as the
   transcriptional and posttranscriptional regulators of hHubGs. Three
   hHubGs (JUN, TP53, and ESR1) also act as TFs. As for example, the JUN
   inhibits to both TFs KLF4 and SOX2, TP53 activates to the TF POU5F1 and
   inhibits the TFs MYC and SOX2, and ESR1 activates to the TFs KLF4.
   Finally, this study recommended 10 candidate drug molecules (SORAFENIB,
   AMG-900, CHEMBL1765740, ENTRECTINIB, MK-6592, masitinib, GSK2126458,
   TG-02, YM201636, and PAZOPANIB) for the treatment against BC, where 1
   drug (ENTRECTINIB) already approved by the Japanese drug administration
   authority, 3 drug molecules SORAFENIB, GSK2126458, and PAZOPANIB are
   under the clinical trial for BC treatment and the rest 6 molecules
   (AMG-900, CHEMBL1765740, MK-6592, masitinib, TG-02, and YM201636)
   require further experimental validation in wet-laboratory although they
   received support by some independent computational studies for BC
   treatment. Therefore, the findings of this study might be played a
   vital role for taking a proper treatment plan against BC.
Acknowledgments