Abstract
Breast cancer (BC) is a complex disease, which causes of high mortality
rate in women. Early diagnosis and therapeutic improvements may reduce
the mortality rate. There were more than 74 individual studies that
have suggested BC-causing hub-genes (HubGs) in the literature. However,
we observed that their HubG sets are not so consistent with each other.
It may be happened due to the regional and environmental variations
with the sample units. Therefore, it was required to explore hub of the
HubG (hHubG) sets that might be more representative for early diagnosis
and therapies of BC in different country regions and their
environments. In this study, we selected top-ranked 10 HubGs (CCNB1,
CDK1, TOP2A, CCNA2, ESR1, EGFR, JUN, ACTB, TP53, and CCND1) as the
hHubG set by the protein-protein interaction network analysis based on
all of 74 individual HubG sets. The hHubG set enrichment analysis
detected some crucial biological processes, molecular functions, and
pathways that are significantly associated with BC progressions. The
expression analysis of hHubGs by box plots in different stages of BC
progression and BC prediction models indicated that the proposed hHubGs
can be considered as the early diagnostic and prognostic biomarkers.
Finally, we suggested hHubGs-guided top-ranked 10 candidate drug
molecules (SORAFENIB, AMG-900, CHEMBL1765740, ENTRECTINIB, MK-6592,
YM201636, masitinib, GSK2126458, TG-02, and PAZOPANIB) by molecular
docking analysis for the treatment against BC. We investigated the
stability of top-ranked 3 drug-target complexes (SORAFENIB vs ESR1,
AMG-900 vs TOP2A, and CHEMBL1765740 vs EGFR) by computing their binding
free energies based on 100-ns molecular dynamic (MD) simulation based
Molecular Mechanics Poisson-Boltzmann Surface Area (MM-PBSA) approach
and found their stable performance. The literature review also
supported our findings much more for BC compared with the results of
individual studies. Therefore, the findings of this study may be useful
resources for early diagnosis, prognosis, and therapies of BC.
Keywords: Breast cancer, hub of the hub-genes (hHubGs), early diagnosis
and prognosis, regulatory factors, drug repurposing, molecular docking
Introduction
Cancer is a malignant tumor that is initiated by multiple changes at
the genetic and epigenetic levels that progressively lead to abnormal
cell division and cellular alteration.^[35]1,[36]2 Breast cancer (BC)
is a major public health concern, since it is the leading cause of
cancer-related deaths in women worldwide. Breast cancer was the second
most common cancer diagnosed, and the number was approximately 2
million (11.6% of all cancers) in 2018.^ [37]3 Breast cancer became the
leading cause of diagnosis in 2020, and the number was approximately
2.2 million (11.7% of all cancers) worldwide.^ [38]4 Thus, it is a
matter of concern that the number of patients with BC and deaths is
increasing every year, despite claims that there have been significant
improvements in BC treatment. According to the American Cancer Society,
the 5-year survival rates of patients with BC are 98%, 93%, 72%, and
22% in stage 1, stage 2, stage 3, and stage 4, respectively.^ [39]5 The
5-year BC survival rate is the percentage of people who survive at
least 5 years after diagnosis of BC. From a previous study, it was
found that around 52.34% of the patients with BC received treatment in
stage 4, 32.81% in stage 3, 10.16% in stage 2, and only 4.69% patients
were treated in stage 1.^ [40]6 It has been observed that highest
number of patients is detected in the fourth stage of BC and their
survival rate is the lowest. Therefore, early diagnosis and therapeutic
improvement is essential to reduce the death rate of patients with BC.
The BC-causing genes may play the key role in this regard.
There are several individual studies on transcriptomics
(microarray/RNA-Seq) analysis in the literature that suggested
BC-causing hub-genes (HubGs) to investigate the pathogenetic processes
of BC. We systematically reviewed 74 individual articles from different
online sources that suggested BC-causing HubGs and found that their
HubG sets are not so similar with each other (see Supplemental Table
S1). It may be happened due to the regional and environmental
variations with the experimental units. We found a total of 259
different HubGs from those articles in which there was no any common
HubGs among the articles. So, it may be difficult to take a common
treatment plan for all patients with BC in different regions and
environment. Therefore, it is required to explore hub of the HubG
(hHubG) sets that may be more representative BC-causing HubGs for
diagnosis and therapeutic improvement for the treatment against BC in
different environments. Exploring BC-causing hHubGs out of 259 HubGs
highlighting their early diagnostic, prognostic, and therapeutic
characteristics by the wet-laboratory experimental procedure may be
laborious, time-consuming, and costly. In this study, we attempted to
explore BC-causing hHubGs highlighting their early diagnostic,
prognostic, and therapeutic characteristics by using the integrated
statistics and bioinformatics approaches.
To explore disease-causing HubGs, the protein-protein interaction (PPI)
network analysis is a widely used popular approach.^[41]7-[42]12 In
this study, we also explored BC-causing hHubGs by the PPI network
analysis of 259 HubGs that we mentioned earlier. In the case of drug
discovery, drug repurposing (DR) is a promising strategy, where new
therapeutic inventions are made through existing drugs.^ [43]13 It is
secure, more efficient, less expensive, and less time-consuming
compared with the de novo process due to it skip several steps from the
target-based drug selection to clinical validation.^ [44]14 Therefore,
in this study, we detected hHubGs-guided repurposable drug molecules.
There are several steps involved in the in silico analysis of the DR
process, such as network pharmacology, molecular docking analysis, and
dynamic simulation analysis.^ [45]15 Network pharmacology is an
efficient tool for systematic pharmacologic research which creates
interactions between target genes and drugs through network-based
methods.^[46]16,[47]17 Molecular docking analysis plays a vital role in
drug design through structural binding between therapeutic targets and
drug molecules by calculating affinity scores.^ [48]18 The molecular
dynamic (MD) simulation indicates the stability of docking patterns.
Recently, it has been widely used and has become one of the most
popular platforms to validate the candidate drugs.^[49]19,[50]20 The
workflow of this study is displayed in [51]Figure 1. It should be
mentioned here that the similar workflow was also considered in some
previous studies for other diseases.^[52]21,[53]22
Figure 1.
[54]Figure 1.
[55]Open in a new tab
The workflow of this study.
Methods and Materials
Data collection on BC-causing HubG sets
We collected HubG sets and drug molecules as molecular metadata from
different published articles up to December 31, 2023, through online
search strategy with the keywords “breast cancer [ti], gene expression,
RNA-Seq, scRNA-Seq, differentially expressed genes (DEGs), and
core/key/hub genes” from the public sources including PubMed in the
National Center of Biotechnology Information (NCBI)
([56]https://pubmed.ncbi.nlm.nih.gov/), Google Scholar
([57]https://scholar.google.com/), and Google
([58]https://www.google.com/). We found a total of 74 independent
articles that suggested BC-related HubGs-set. We collected 74 HubG sets
from the selected 74 articles. We found a total of 259 unique HubGs
from those HubG sets and used them to explore hHubGs for further
investigation in this study (see Supplemental Table S1).
Data collection on drug molecules
We collected candidate drug molecules from 3 platforms: (1) published
articles, (2) the Gene Set Cancer Analysis (GSCALite) database,^ [59]23
and (3) the Drug Gene Interaction Database (DGIdb).^ [60]24 Some of our
selected articles suggested candidate drugs for the treatment of BC and
considered them set A.^[61]25,[62]26 We collected meta-drugs from 2
databases DGIdb and GSCALite through the hHubG-drugs interaction and
considered them sets B and C, respectively (Supplemental Table S2).
Detection of BC-causing hub of the HubGs
Network-based strategies are widely used in identifying disease-driving
KGs and their potential regulators.^[63]27,[64]28 We submitted all
unique HubGs to the STRING web tool (Search Tool for the Retrieval of
Interacting Genes) (version: 11.5) to construct PPI network.^ [65]29
Protein-protein interactions of the all HubGs were constructed with a
confidence score 0.90. Subsequently, Cytoscape software (version:
3.9.0) was used for visualization of the PPI network.^ [66]30 The
“Analyze Network tool” in Cytoscape was used for topological degree
analysis, and the degree >90 was considered as the cutoff criterion for
hHubG identification.
Assessment of hHubGs as the early diagnostic and prognostic biomarkers
To assess hHubGs as early diagnostic and prognostic biomarkers, we
performed their expression analysis with the different stages (normal
status, stage 1, stage 2, stage 3, and stage 4) of BC progression by
using box plots through the UALCAN web-tool and The Cancer Genome Atlas
(TCGA) database.^ [67]31 To investigate the prognostic power of hHubGs,
we also developed BC prediction models through 2 popular machine
learning methods known as support vector machine (SVM) and random
forest (RF). To perform prediction models, we collected 3 microarray
gene-expression data sets with accession numbers [68]GSE65216,^ [69]32
[70]GSE10810,^ [71]33 and [72]GSE36295^ [73]34 from the Gene Expression
Omnibus (GEO) database. Each data set divided into 2 groups: training
data (70% of all data) and test data (30% of all data). Finally, the
receiver operating characteristic (ROC) curve was used to assess the
prognosis performance. The ROC curve was constructed using the R
package ROCR. We also used independent database “DisGeNET” and web-tool
“Enrichr” to verify the association of hHubGs with BC and other disease
by disease-hHubGs enrichment analysis.^ [74]35
Functional and pathway enrichment analysis
To investigate the pathogenetic processes of hHubGs, we performed their
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genome (KEGG)
pathway enrichment analysis. The GO enrichment analysis was performed
in 3 categories namely biological process (BP), cellular component
(CC), and molecular function (MF). Gene Ontology functional and KEGG
pathway enrichment analyses were performed using the online database
DAVID (version: 6.8) online tool.^ [75]36
Regulatory interaction network
We performed regulatory interaction network to identify the key
transcription factors (TFs) and microRNAs (miRNAs) as the
transcriptional and posttranscriptional regulators of hHubGs. The
online tool “NetworkAnalyst” (version: 3.0) was used to construct the
regulatory interaction network.^ [76]37 ChEA and TarBase (v8.0)
databases were selected for constructing hHubG-TF and hHubG-miRNA
interaction networks, respectively.
Drug repurposing
We used molecular docking analysis, a popular in silico validation
technique, to perform the binding between the drug and target. We
considered hHubGs as targets (receptors/proteins) and meta-drug agents
collected from published articles and the online databases GSCALite and
DGIdb as drugs (compounds) for performing molecular docking analysis.
We collected 3-dimensional (3D) structures of proteins from the Protein
Data Bank (PDB) ([77]https://www.rcsb.org/)^ [78]38 and the SWISS-MODEL
([79]https://swissmodel.expasy.org/)^ [80]39 databases due to the
required to perform molecular docking analysis. The 3D structures of
all compounds were downloaded from the PubChem
([81]https://pubchem.ncbi.nlm.nih.gov/) database.^ [82]40 PyMOL
(PyMOL2) software was used for the preprocessing of 3D structure
proteins.^ [83]41 PyRx software was used to perform the blind molecular
docking analysis between proteins and compounds and selected
shortlisted drugs based on binding affinity scores (BASs)
(kcal/mol).^[84]42,[85]43 Then, we validate the selected drug molecules
with the active sites (by using PyMOL2 software)^ [86]41 of our
selected hHubGs through flexible docking analysis. The 3D interaction
between the protein and compound was constructed and visualized by
using USCF Chimera and Discovery Studio Visualizer 2021 software.^
[87]44
MD simulation studies
Molecular dynamic simulations were performed using YASARA Dynamics
software^ [88]45 and the AMBER (Assisted Model Building with Energy
Refinement) force field^ [89]46 to test the dynamic behavior of the top
protein-compound interactions. The AMBER’s force fields have been
extensively validated and optimized for various biological molecules,
making it a robust choice for simulation study.^ [90]47 The root mean
square deviation (RMSD) value between the crystal reference and the
predicted structure is commonly used to confirm whether a close-match
docked pose was predicted or not by docking simulations. If the RMSD
value is ⩽2 Å, the result is considered as good result.^ [91]48 Three
top-ranked interactions based on affinity scores were selected to
perform MD simulations. For simulation performance, the
protein-compound interaction hydrogen bonding network is optimized and
solvated through the TIP3P water model in a simulation cell.^ [92]49
The periodic boundary condition was maintained by a solvent
concentration of 0.997 g L^−1. pKa was calculated during solvation by
subjecting to titratable amino acids in the interaction. A time-step
interval of 2.50 fs (298 K, pH = 7.4, 0.9% NaCl) under physiological
conditions was used to run several time-step algorithms for each
simulation.^ [93]50 The SETTLE and LINCS algorithms were used to
contain water molecules and constrain all bond lengths,
respectively.^[94]51,[95]52 For further analysis, trajectories were
recorded every 250 ps, and subsequent analyses were implemented by
default scripts of YASARA Macro and SciDAVis software.^ [96]53 Then,
all snapshots were subjected to the Molecular Mechanics
Poisson-Boltzmann surface area (MM-PBSA) in YASARA software to
calculate the binding free energy using the given equation.^ [97]54
[MATH:
Binding
freeEnerg
mi>yΔG=G<
mrow>bound−Gunbo<
mi>und :MATH]
Here, YASARA built-in macros were used with consideration AMBER 14 as a
force field to calculate the MM-PBSA binding energy, where a greater
positive energy indicates better binding.
Results
Exploring BC-causing hub-genes by the systematic literature review
We found a total of 297 BC-related articles from various online sources
(PubMed, Google Scholar, and Google) by keywords search up to December
31, 2023. Finally, we selected 74 articles out of them through the
inclusion-exclusion criteria (see Supplemental Figure S1 and
Supplemental Table S1). Then, we detected a total of BC-causing 259
different HubGs from those 74 HubG sets as described in the “Materials
and Methods” section and used them for further investigation in this
study.
Detection of hub of the HubGs
The PPI network was constructed based on those 259 individual HubGs to
detect BC-causing hHubGs as key genomic biomarkers. It included 209 of
nodes, 5785 edges, average node degree 39.5, average local clustering
coefficient 0.599, and P < 1.0e-15. The PPI network was visualized in
[98]Figure 2, where hHubGs are indicated by large-size nodes and
selected based on the top degree of connectivity (degree >95). We
selected top-ranked 10 hHubGs (CCNB1, CDK1, TOP2A, CCNA2, ESR1, EGFR,
JUN, ACTB, TP53, and CCND1) in which 6 hHubGs (CCNB1, CDK1, TOP2A,
CCNA2, ESR1, and EGFR) belong to the set of already published
BC-causing top-ranked 10 HubGs (see [99]Table 1 and Supplemental Table
S1).
Figure 2.
Figure 2.
[100]Open in a new tab
The PPI network based on 259 unique HubGs, where large size indicates
hHubGs with the highest degree of connectivity (degree >90).
Table 1.
List of top-ranked 10 BC-causing HubGs/hHubGs based on the agreement of
published articles (Supplemental Table S1) (A*) and hHubGs through PPI
network analysis (B*) ([101]Figure 2), respectively.
Top-ranked 10 HubGs by the agreement of published articles
(Supplemental Table S1) Top-ranked 10 hHubGs (proposed) Common hHubGs
[MATH: (A*∩B*
mrow>) :MATH]
Genes (A*) Number of agreed articles Genes (B*) Degree of PPI score
TOP2A 8 TP53 143 CCNB1
CDK1
TOP2A
CCNA2
ESR1
EGFR
CDK1 7 ACTB 122
CCNB1 9 EGFR 102
FN1 8 CCND1 100
RRM2 6 JUN 100
BUB1B 7 ESR1 95
CCNB2 8 CDK1 99
CCNA2 8 CCNB1 99
ESR1 6 CCNA2 98
EGFR 6 TOP2A 91
[102]Open in a new tab
Bold indicates common genes.
Performance of hHubGs as early diagnostic and prognostic biomarkers
To investigate the significance of differential expression patterns of
top-ranked 10 hHubGs between the normal group and 4-stage group (stage
1, stage 2, stage 3, and stage 4) of BC individually, we performed
box-plot analysis using t test statistic with P < .001 based on their
TCGA expression profiles ([103]Figure 3A). We observed that all hHubGs
individually showed significantly differential expression patterns (P
values ranged from 8.99E-04 [TP53] to 2.22E-16 [ACTB]) between stage 1
BC and control samples group. Thus, our identified hHubGs may play an
important role in the early stage diagnosis of BC. To investigate the
prognostic power of top-ranked 10 hHubGs, we developed RF and SVM-based
2 BC prediction models by using the gene-expression data with NCBI-GEO
accession number [104]GSE65216 as training data set, and [105]GSE10810
and [106]GSE36295 as the test data sets. To investigate the performance
of the prediction models, we constructed ROC curves in [107]Figure 3B,
where green indicates training performance and red and blue indicate
test performance. We observed that both the training and test
performances were good with area under the curve (AUC) values >0.86 for
both the RF and SVM-based prediction models. Furthermore, we observed
that hHubGs are significantly associated with invasive carcinoma of BC
and luminal B breast carcinoma (Supplemental Figure S2). Thus, our
suggested hHubGs are significantly associated with almost all subtypes
of BC.
Figure 3.
[108]Figure 3.
[109]Open in a new tab
(A) Analysis of expression levels of hHubGs between the normal group
and 4-stage group (stage 1, stage 2, stage 3, and stage 4) of BC, and
(B) ROC curves to investigate the prognostic power of hHubGs by the RF
and SVN-based BC prediction models with 3 gene-expression profile data
sets in NCBI-GEO database.
HubGs-set enrichment analysis
The GO functional and KEGG enrichment analysis showed that unique HubGs
were enriched by 234 GO-BP terms, 56 GO-MF terms, 38 GO-CC terms, and
85 pathways. Then, we selected the significant GO terms and KEGG
pathways that were associated with at least 4 of our identified hHubGs
and P < .01. Four BP terms (positive regulation of transcription from
RNA polymerase II promoter, cell proliferation, cell division, and
regulation of cell cycle), 5 CC terms (nucleoplasm, nucleus, cytosol,
cytoplasm, and membrane), 5 MF terms (protein binding, enzyme binding,
adenosine triphosphate (ATP) binding, TF binding, and protein
heterodimerization activity), and 7 KEGG pathways (pathways in cancer,
PI3K-Akt signaling pathway, cell cycle, herpes simplex infection, human
T-lymphotropic virus type 1 (HTLV-I) infection, proteoglycans in
cancer, and PI3K-Akt signaling pathway) were associated with at least 4
hHubGs and may be responsible for BC progression in [110]Table 2.
Table 2.
Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes
(KEGG) pathways enrichment analysis results of all HubGs highlighting
number of HubGs (nHubGs), P value, and associated hub of the HubGs
(hHubGs).
ID Terms/pathways HubGs P Associated hHubGs
GO: Biological process (BP)
GO:0045944 Positive regulation of transcription from RNA polymerase II
promoter 66 3.8E−21 TOP2A, TP53, EGFR, ESR1, JUN
GO:0008283 Cell proliferation 38 5.0E−18 EGFR, CDK1, TP53, JUN
GO:0051301 Cell division 36 5.9E−17 CCNB1, CCND1, CCNA2, CDK1
GO:0051726 Regulation of cell cycle 23 2.8E−16 JUN, CCNB1, TP53, ESR1
GO: Cellular components (CCs)
GO:0005654 Nucleoplasm 123 2.6E−26 ACTB, CCND1, TP53, CCNB1, ESR1, JUN,
CDK1, TOP2A, CCNA2
GO:0005634 Nucleus 175 5.9E−24 CCND1, TP53, CCNB1, JUN, ESR1, CDK1,
TOP2A, CCNA2, EGFR
GO:0005829 Cytosol 129 1.0E−22 ACTB, CCND1, TP53, CCNB1, JUN, CDK1
GO:0005737 Cytoplasm 154 8.7E−16 ACTB, CCND1, TP53, CCNB1, ESR1, CDK1,
TOP2A, CCNA2, EGFR
GO:0016020 Membrane 73 1.1E−08 ACTB, CCND1, EGFR, CCNB1, ESR1, CDK1
GO: Molecular functions (MFs)
GO:0005515 Protein binding 245 2.6E−29 ACTB, TP53, JUN, TOP2A, EGFR,
CCND1, CCNB1, ESR1, CDK1, CCNA2
GO:0019899 Enzyme binding 33 3.9E−15 TOP2A, EGFR, CCND1, JUN, ESR1,
TP53
GO:0005524 ATP binding 67 9.8E−13 TOP2A, ACTB, TP53, EGFR, CDK1
GO:0008134 Transcription factor binding 24 1.0E−09 JUN, ESR1, CCND1,
TP53
GO:0046982 Protein heterodimerization activity 28 4.8E−08 TOP2A, EGFR,
JUN, TP53
KEGG pathways
hsa05200 Pathways in cancer 51 1.4E−20 CCND1, TP53, EGFR, JUN
hsa04151 PI3K-Akt signaling pathway 33 1.6E−20 CCND1, JUN, CCNA2, TP53
hsa04110 Cell cycle 29 2.6E−18 CCNB1, CCND1, CCNA2, CDK1, TP53
hsa05168 Herpes simplex infection 24 1.0E−16 EGFR, CCND1, TP53, JUN
hsa05166 HTLV-I infection 36 1.8E−15 CCND1, JUN, TP53, ESR1
hsa05205 Proteoglycans in cancer 31 2.4E−14 EGFR, ACTB, CCND1, ESR1,
TP53
hsa04151 PI3K-Akt signaling pathway 40 3.1E−14 EGFR, CCND1, TP53, TOP2A
[111]Open in a new tab
Regulatory network analysis
To identify the top-ranked transcriptional and posttranscriptional
regulators of hHubGs, we constructed and visualized KG-TF and KG-miRNA
interaction networks in [112]Figure 4A and [113]B, where pink indicates
hHubGs and green indicates TFs and miRNAs in [114]Figure 4A and [115]B,
respectively. We observed that 5 TFs (MYC, HNF4A, KLF4, POU5F1, and
SOX2) were associated with at least 8 hHubGs (degree ⩾8), so these 5
TFs considered as transcriptional factors of hHubG. Similarly, 5 miRNAs
(hsa-mir-103a-3p, hsa-mir-107, hsa-mir-16-5p, hsa-mir-34a-5p, and
hsa-mir-23b-3p) were associated with at least 8 hHubGs (degree ⩾8), so
these 5 miRNAs were considered as posttranscriptional factors of
hHubGs. We found from EMBL-EBI (European Molecular Biology
Laboratory—European Bioinformatics Institute) database that 3 hHubGs
(JUN, TP53, and ESR1) out of 10 hHubGs are TFs and indicated them by
diamond shape in [116]Figure 4. [117]Figure 4C(I) indicates that 3 TFs
of hHubGs (MYC, KLF4, and POU5F1) showed downregulation, and 2 TFs
(HNF4A and SOX2) showed upregulation. [118]Figure 4C(II) shows that
KLF4 is a upstream and SOX2 is a downstream TFs for the JUN hHubGs
(TFs), so JUN acts as an inhibitor to both KLF4 and SOX2 (if the
expression level of the downstream TF decreased, then core TF acted as
an activator otherwise inhibitor). Similarly, TP53 acted as an
activator to POU5F1 and inhibitor to MYC and SOX2, and ESR1 acted as an
activator to KLF4.
Figure 4.
[119]Figure 4.
[120]Open in a new tab
hHubGs-regulatory networks, where pink indicates hHubGs (diamond shape
hHubGs indicate TFs themselves): (A) hHubGs-TFs interaction network,
where large green indicates key transcription factors; (B) hHubG-miRNA
interaction network, where large green indicates key
posttranscriptional factors; and (C) upstream and downstream analysis
of TFs (I) differential expressions for TFs of hHubGs (II) upstream and
downstream regulation by hHubGs that are also TFs (JUN, TP53, and
ESR1), where pink diamond indicates core TFs, red rectangle indicates
upstream TFs, and blue rectangle indicates downstream TFs.
Drug repurposing
First, we collected BC and hHubGs associated with a total of 258 drug
agents from the published articles and 2 online databases: DGIdb and
GSCALite. Then, we performed blind molecular docking analysis (entire
protein) between compounds (drug agents) and target proteins (hHubGs).
We selected the top-10 poses based on the smallest docking scores
between compounds and target proteins. To check the quality of binding
poses, we calculated the RMSD between native pose and our sleeted 10
poses. Finally, we selected the best pose based on the smallest RMSD
and considered the docking score in that pose. We visualized the
molecular docking results of the top 125 interactions in [121]Figure
5A, where the y-axis represents the target proteins and x-axis
compounds and different colors indicate the affinity scores. Compounds
were arranged in x-axis in ascending order based on the average BAS
across all proteins. For example, top-ranked 10 compounds (SORAFENIB,
AMG-900, CHEMBL1765740, ENTRECTINIB, MK-6592, YM201636, masitinib,
GSK2126458, TG-02, and PAZOPANIB) bind to the receptor proteins with
average BAS less than −8.76 kcal/mol (binds strongly to the receptor
proteins), and the next top-ranked 10 compounds (GSK-269962A, MLN-8054,
ENMD-2076, ENMD-981693, NERVIANO, SB590885, Lapatinib, GSK1070916,
TAK-715, and PF-03814735) bind to the receptor proteins with average
BAS greater than −8.23 kcal/mol (binds weakly to the receptor
proteins). Therefore, top-ranked 10 compounds (SORAFENIB, AMG-900,
CHEMBL1765740, ENTRECTINIB, MK-6592, YM201636, masitinib, GSK2126458,
TG-02, and PAZOPANIB) were proposed as candidate drugs against BC. In
[122]Figure 5B, the BAS of top-ranked 10 compounds with all receptors
was highlighted. We found 6 overlapping genes between the top-ranked
HubGs and hHubGs in [123]Figure 5C. We cross-validated the proposed
candidate drugs with 4 uncommon top-ranked HubGs in [124]Figure 5D. To
validate the top-ranked drug molecules with active sites of top-ranked
HubGs by their flexible docking analysis, we predicted their active
sites by the PyMOL software and listed in Table S3. Then, we perform
flexible docking analysis between drugs and selected top-ranked HubGs
(with flexibility of active sites) in [125]Table 3. We also found a
strong bond between them, so we may recommend our candidate drugs for
universal use against BC.
Figure 5.
[126]Figure 5.
[127]Open in a new tab
Molecular docking analysis results for screening candidate drugs
against BC: (A) presented interaction binding affinity scores between
hHubGs and drug agents; (B) visualization of interaction between 10
candidate drugs and hHubGs by zooming; (C) overlap between the
top-ranked HubGs and hHubGs; and (D) cross-validation of the candidate
drugs by top-ranked HubGs.
Table 3.
Flexible docking scores between the proposed drug molecules and the
active sites of hHubGs.
Proteins AMG900 CHEMBL1765740 SORAFENIB ENTRECTINIB MK6592 YM201636
masitinib GSK2126458 TG02 PAZOPANIB
TOP2A −8.2 −7.3 −7.1 −7 −7.5 −7.3 −6.8 −7.1 −6.9 −6.3
EGFR −9.3 −8.5 −7.3 −7.6 −7.8 −8 −7.1 −6.9 −7.1 −6.1
ESR1 −8.9 −9.3 −9.5 −8.3 −8 −6.6 −7.3 −7.2 −7 −6.6
TP53 −8.5 −8.8 −7.8 −8 −8.1 −7.9 −7.6 −6.7 −7.1 −7.8
CDK1 −7.7 −6.8 −7.5 −7.6 −7.1 −7.2 −7 −6.7 −6.5 −6.8
CCND1 −7.6 −7.4 −8.1 −7.7 −7.3 −7.4 −7.6 −7.7 −7.4 −7.1
JUN −8.5 −7.7 −8.1 −8.1 −7.7 −7.1 −6.7 −7 −6.9 −6.7
ACTB −9.2 −7.8 −8.6 −7.2 −7.5 −7.7 −7.1 −7.7 −7.1 −6.6
CCNB1 −8.3 −7.5 −7.5 −8.1 −7.7 −7.6 −7.4 −7 −6.7 −7.2
CCNA2 −9.6 −7.6 −8.5 −7.5 −7.6 −7.2 −6.9 −7.5 −5.8 −6.3
[128]Open in a new tab
Molecular dynamic simulations
Three top interactions, SORAFENIB-ESR1 (BAS = −12.5),
CHEMBL1765740-EGFR (BAS = −10.8), and AMG900-TOP2A (BAS = −12.4),
between candidate drugs and target proteins were selected for stability
analysis using 100-ns MD-based MM-PBSA simulations. We noticed that all
systems were remarkably stable in variations of moving and initial
drug-target interactions in [129]Figure 6. We calculated the RMSD of
selected interactions to check the stability of structure during the
simulation period in [130]Figure 6A. All the systems projected RMSD
range from 1 Å to 1.75 Å, .75 Å to 2.25 Å, and .5 Å to 2.75 Å for
SORAFENIB-ESR1, CHEMBL1765740-EGFR, and AMG900-TOP2A interactions,
respectively. We observed that all interactions fluctuated slightly
between 0 and 16 000 ps and were stable in the remaining simulations.
We also calculated the MM-PBSA binding energy for the 3 selected top
interactions in [131]Figure 6B. The average binding forces for
SORAFENIB-ESR1, CHEMBL1765740-EGFR, and AMG900-TOP2A interactions were
217.5, 130.45, and 131.4 kJ/mol, respectively.
Figure 6.
[132]Figure 6.
[133]Open in a new tab
MD simulation analysis of the top 3 interactions, where blue, red, and
green indicate SORAFENIB-ESR1, CHEMBL1765740-EGFR, and AMG900-TOP2A
interactions, respectively: (A) time evolution of RMSDs of backbone
atoms (C, Cα, and N) for protein for each docked complex, and (B)
represented binding free energy of each snapshot for the top 3
interactions.
Discussion
This study analyzed metadata on BC-causing HubGs collected from
different published articles. At first, we selected 74 articles out of
297 by the systematic review as shown in Supplemental Figure S1.
Different 259 HubGs were found in total from the selected articles
(Supplemental Table S1), and these data were used for further analysis
in this study. Top-ranked 10 HubGs (CCNB1, CDK1, TOP2A, CCNA2, ESR1,
EGFR, JUN, ACTB, TP53, and CCND1) were selected as the hHubGs out of
different 259 HubGs by the PPI network analysis, where 6 hHubGs (CCNB1,
CDK1, TOP2A, CCNA2, ESR1, and EGFR) are common with the top-ranked
HubGs based on the agreement of published articles ([134]Table 1).
Then, we verified their differential expression patterns between BC and
control samples using independent data sets from NCBI and TCGA
databases to the prediction models and box-plot analysis. We observed
that the proposed hHubGs can be used as the strong biomarkers for early
diagnosis and prognosis, where 8 genes (CCNB1, CDK1, TOP2A, CCNA2,
ESR1, ACTB, TP53, and CCND1) are up-regulated hHubGs and the rest 2
genes (EGFR and JUN) are down-regulated hHubGs. In particularly, the
CCNB1 gene is associated with aggressive tumor behavior, larger tumor
size, higher grade, hormone receptor negativity, HER2 positivity, and
poor clinical outcome.^ [135]55 The TOP2A gene has been suggested in 8
articles that this gene plays a prognostic role in patients with
BC.^[136]56-[137]62 Overexpression of TOP2A is a marker of poor
prognosis, especially in luminal/hormone receptor-positive BC, and may
serve as both a prognostic and predictive biomarkers in BC disease.^
[138]63 Seven articles claim that the CDK1 gene is responsible for the
occurrence and development of
BC.^[139]26,[140]57-[141]59,[142]64,[143]65 CDK1 is associated with
shorter overall survival, progression-free survival, and relapse-free
survival, as well as more advanced clinical stage of cancer.^ [144]66
The CCNA2 gene has been identified in 6 studies as a promising
biomarker in silico analysis related to BC
progression.^[145]26,[146]56,[147]57,[148]67-[149]72 The CCNA2 gene is
linked to a poor prognosis and resistance to endocrine therapy and
serve as a prognostic biomarker in estrogen receptor-positive patients
with BC.^ [150]73 The ESR1 gene suggested in several articles suggested
as generalized BC.^[151]56,[152]74-[153]78 The EGFR gene and its
signaling pathways are related to the progression of BC and can be used
as a therapeutic target against BC
therapy.^[154]61,[155]74,[156]79-[157]81 Overexpression of EGFR is
associated with more aggressive, ER-negative disease and poorer
prognosis, particularly in triple negative BC.^ [158]82 The JUN gene
has been suggested that it can be used as a biomarker for the diagnosis
and therapeutic target of BC.^[159]74,[160]83,[161]84 In addition, 3
genes (ACTB, TP53, and CCND1) are supported by 3
articles.^[162]25,[163]74,[164]75,[165]77,[166]79,[167]85-[168]89 For
more details, please see [169]Figure 7. The CCND1 gene encodes cyclin
D1, with a critical role in BC prognosis.^ [170]90 We observed that our
proposed HubGs are mostly associated with a poor prognosis in patients
with BC and also suggested for generalized BC by almost all of the 74
articles we listed in Supplemental Figure S1.
Figure 7.
[171]Figure 7.
[172]Open in a new tab
Agreement of the proposed BC-related hHubGs with other independent
studies.
To investigate the pathogenetic processes of hHubGs, we selected the
top significant terms for each BP, MF, CC, and KEGG pathway associated
with at least 4 hHubGs and P < .01 in [173]Table 2. The GO terms and
KEGG pathways related to our identified hHubGs have suggested several
studies that are directly linked to BC progression. In particularly,
the positive regulation of transcription from RNA polymerase II
promoter term with BC cells associates late events in transcription
with repair processes in eukaryotic cells.^ [174]91 The regulation of
cell cycle and cell proliferation are important indicators of BC
prognosis.^ [175]92 The cell proliferation employs transcriptional and
nontranscriptional mechanisms to activate the cascade of
cyclin-dependent kinases in BC.^ [176]93 The asymmetric cell division
intricately regulates the diverse states of stem cells, pivotal in
orchestrating the initiation of BC.^ [177]94 Observing the spatial
overlap between BRCA1 and nucleus presents intriguing implications for
the interplay between BC proteins within the nucleoplasm-nucleolus
pathway, shedding light on their potential functional significance.^
[178]95 Several proteins are detected in both cytosolic and membrane
fractions, exhibiting robust associations with distinct subfamilies of
BC.^ [179]96 The TF binding term may lead to new investigations to
understand the mechanism of CXXC5 in BC and may suggest new treatments
for ER + BC.^ [180]97 The protein-binding term associated with
BC-causing DEGs.^ [181]78 Two terms, enzyme binding and ATP binding,
proposed by other studies that are important for BC.^[182]98,[183]99
The KEGG term PI3K-Akt signaling pathway proposed for BC cancer by
bioinformatics study.^ [184]12 Furthermore, 3 KEGG pathways are cell
cycle, HTLV-I infection, and proteoglycans in cancer are also related
to BC progression claimed by other studies.^[185]100-[186]102
Finally, we suggested hHubGs-guided 10 candidate drugs molecules
(SORAFENIB, AMG-900, CHEMBL1765740, ENTRECTINIB, MK-6592, masitinib,
GSK2126458, TG-02, YM201636, and PAZOPANIB) for the treatment of
patients with BC. These drug molecules might be effective for
generalized patients with BC in different regions and environments,
since these molecules strongly bind to all target genes/proteins
(hHUBGs). It should remind here that hHubGs are the representative of
all HubGs that were detected in different regions and environment
(column 3 in Supplemental Table S1). However, among the identified
candidate drugs, some drugs are already approved by the Food and Drug
Administration (FDA) for the treatment of other diseases, some are
under clinical trials and almost all drugs are supported by other in
silico studies for BC. The mechanism of action of our proposed drug is
also briefly described below. In particular, SORAFENIB was approved by
the FDA in 2013 for the treatment of thyroid carcinoma^ [187]103 and
also is under phase 2 clinical trials for the treatment of stage 4
BC.^[188]104,[189]105 The drug SORAFENIB is a multikinase inhibitor
that provides antiproliferative, antiangiogenic, and antimetastatic
effects by targeting cell surface tyrosine kinase receptors and
downstream intracellular kinases implicated in tumor cell
proliferation.^ [190]106 The drug AMG-900 is not yet approved for the
treatment of any disease but it is under clinical trial (phase 1) for
advanced solid tumors.^ [191]107 AMG-900 is an aurora kinases
inhibitor, an essential regulator of cell division in mammalian cells,
Aurora-A and Aurora-B expression and kinase activity is elevated in a
variety of human cancers and is associated with high level of
proliferation and poor prognosis.^ [192]108 The drug CHEMBL1765740 has
not yet been published or is not under clinical trial for any disease
but our study shows good efficacy in both molecular docking and MD
simulation analysis. The drug ENTRECTINIB for the treatment of BC by
Japan in 2019.^ [193]109 Entrectinib is a tyrosine kinase inhibitor
that has shown potent antitumor effects in preclinical studies and has
been found to inhibit tumor growth and survival in a dose-dependent
manner in murine models.^ [194]110 The drug masitinib was approved by
the European Union (EU) for the treatment of amyotrophic lateral
sclerosis and mast cell disease and was later denied approval in 2017
and 2018.^[195]111,[196]112 The drug GSK2126458 is under clinical trial
of phase 1 study in human patients with BC.^ [197]113 The primary
mechanism of action of GSK2126458 is inhibition of the PI3K-Akt
signaling pathway, which is critical for cancer cell proliferation and
survival.^ [198]114 The drug TG-02 has been approved by the FDA as an
orphan drug for glioma and has also been suggested as an enzyme
inhibitor of CDK for the treatment of cancer disease.^[199]115,[200]116
The TG-02 and YM201636 drugs have been also suggested for treatment
against BC through molecular docking analysis.^ [201]12 TG-02 reduces
cell proliferation and promotes cell death in cancer by inhibiting
multiple kinases, particularly CDK1, CDK2, CDK7, and CDK9.^ [202]117
YM201636 inhibits the phosphatidylinositol 3,5-bisphosphate kinase
(PIKfyve), and has been shown to have antiproliferative effects on
liver cancer cells in a dose-dependent manner.^ [203]118 The drug
PAZOPANIB is under clinical trial (phase 2) for BC treatment.^ [204]119
Pazopanib is an oncostatic drug with antiangiogenic properties via
inhibition of the intracellular tyrosine kinase vascular endothelial
growth factor receptors (VEGFR).^ [205]120 We conclude from the
literature review that the drug ENTRECTINIB have been approved and
three drugs (SORAFENIB, GSK2126458, and PAZOPANIB) are under the
clinical trial for BC treatment and other 6 drugs (AMG-900,
CHEMBL1765740, MK-6592, masitinib, TG-02, and YM201636) are
approved/under clinical trials for different diseases. Thus, we
strongly recommend for further evaluation at the molecular level by
experimental-laboratory testing and hope that effective results can be
obtained.
Conclusions
This study identified most representative top-ranked 10 BC-causing
HubGs (CCNB1, CDK1, TOP2A, CCNA2, ESR1, EGFR, JUN, ACTB, TP53, and
CCND1) as the hHubGs by the PPI network analysis of 259 individual
HubGs that are obtained from 74 previously published individual
studies. Breast cancer prediction analysis based 3 gene-expression
profiles ([206]GSE65216, [207]GSE10810 and [208]GSE36295) of NCBI and
the box-plot analysis with the gene-expression profiles of TCGA
databases confirmed the differential expression patterns of hHubGs
between BC and control samples. The enrichment analysis with GO terms
and KEGG pathways revealed some crucial BC-causing BP (regulation of
cell cycle and cell proliferation), MFs (nucleoplasm and nucleus), CCs
(protein binding and enzyme binding), and pathways (PI3K-Akt signaling
pathway and cell cycle) that we mentioned in the “Discussion” section.
Gene regulatory network (GRN) disclosed top-ranked 5 TFs (MYC, HNF4A,
KLF4, POU5F1, and SOX2) and 5 miRNAs (hsa-mir-103a-3p, hsa-mir-107,
hsa-mir-16-5p, hsa-mir-34a-5p, and hsa-mir-23b-3p) as the
transcriptional and posttranscriptional regulators of hHubGs. Three
hHubGs (JUN, TP53, and ESR1) also act as TFs. As for example, the JUN
inhibits to both TFs KLF4 and SOX2, TP53 activates to the TF POU5F1 and
inhibits the TFs MYC and SOX2, and ESR1 activates to the TFs KLF4.
Finally, this study recommended 10 candidate drug molecules (SORAFENIB,
AMG-900, CHEMBL1765740, ENTRECTINIB, MK-6592, masitinib, GSK2126458,
TG-02, YM201636, and PAZOPANIB) for the treatment against BC, where 1
drug (ENTRECTINIB) already approved by the Japanese drug administration
authority, 3 drug molecules SORAFENIB, GSK2126458, and PAZOPANIB are
under the clinical trial for BC treatment and the rest 6 molecules
(AMG-900, CHEMBL1765740, MK-6592, masitinib, TG-02, and YM201636)
require further experimental validation in wet-laboratory although they
received support by some independent computational studies for BC
treatment. Therefore, the findings of this study might be played a
vital role for taking a proper treatment plan against BC.
Acknowledgments