Abstract Metastasis is a major breast cancer hallmark due to which tumor cells tend to relocate to regional or distant organs from their organ of origin. This study is aimed to decipher the interaction among 113 differentially expressed genes, interacting non-coding RNAs and drugs (614 miRNAs, 220 lncRNAs and 3241 interacting drugs) associated with metastasis in breast cancer. For an extensive understanding of genetic interactions in the diseased state, a backbone gene co-expression network was constructed. Further, the mRNA–miRNA–lncRNA–drug interaction network was constructed to identify the top hub RNAs, significant cliques and topological parameters associated with differentially expressed genes. Then, the mRNAs from the top two subnetworks constructed are considered for transcription factor (TF) analysis. 39 interacting miRNAs and 1641 corresponding TFs for the eight mRNAs from the subnetworks are also utilized to construct an mRNA–miRNA–TF interaction network. TF analysis revealed two TFs (EST1 and SP1) from the cliques to be significant. TCGA expression analysis of miRNAs and lncRNAs as well as subclass-based and promoter methylation-based expression, oncoprint and survival analysis of the mRNAs are also done. Finally, functional enrichment of mRNAs is also performed. Significant cliques identified in the study can be utilized for identification of newer therapeutic interventions for breast cancer. This work will also help to gain a deeper insight into the complicated molecular intricacies to reveal the potential biomarkers involved with breast cancer progression in future. Subject terms: Cancer, Systems biology Introduction Research shows that breast cancer (BC) morbidity has been on the rise^[26]1. BC cell motility is one of the key traits of individual tumor cells. The primary clinical cause of BC morbidity accounts to metastasis^[27]2,[28]3. Metastasis is the migration of tumor cells from the site of origin to different locations of the body resulting in new tumor colonies that result in deaths^[29]4. The main sites of BC metastasis include the lungs, liver, bone and brain^[30]5. The current therapeutic strategies to combat BC are surgery, chemotherapy and radiation therapy^[31]6. Tumor metastasis is associated with various types of RNAs and compounds that play a role in deregulating different BC signalling cascades to lead to BC progression. Before metastasis, the primary tumor cells secrete cytokines and extracellular vesicles that modulate the pre-metastatic niche formation. The messenger RNA (mRNA)-microRNA (miRNA)-long noncoding RNA (lncRNA) axis is significantly associated with tumor pathogenesis such tumor stage, rates of progression and metastasis making them potential candidates of clinical outcome or progression prediction including survival and severity of BC. In this study we deal with two categories of noncoding RNAs (miRNAs and lncRNAs) interacting with the differentially expressed metastatic mRNAs. The miRNAs are endogenous single-stranded noncoding RNAs (ncRNA) having 19–23 nucleotides (nt) that are key regulators in cell development and division^[32]7. The miRNAs also regulate key signalling pathways and mRNA translation is inhibited or enhanced by posttranscriptional degradation modulating target gene expression and indirectly control tumor cell growth and apoptosis, triggers angiogenesis, and/or controls the cell cycle processes. The lncRNAs play an important role in evolution and progression of BC^[33]8,[34]9. These noncoding transcripts can affect to different cancer hallmarks like angiogenesis, epithelial-to-mesenchymal transition (EMT), apoptotic cascades, tissue invasion and metastasis to name a few. For instance, NEAT1 promotes progression of BC by modulating CCND1 gene expression^[35]10, while LINC01355 lncRNA suppresses BC progression via FOXO3-mediated inhibition of CCND1^[36]11. DNA methylation is frequently linked to differential gene expression, and until recently, the general consensus was that higher levels of cytosine methylation around and inside genes are associated with regulation of gene expression^[37]12. Calculating ratios for gene expression across samples to obtain the fold-change (FC) signifies the factor of change in expression between groups. Such genes are called differentially expressed genes (DEGs) which are expressed differently across different types of samples. They exhibit statistically significant variable behavior across samples constructed after marginalizing only those genes that truly demonstrate a difference. Limma (Linear Models for Microarray Data) t-statistical algorithm is generally used to identify DEGs for feature selection. The ground-level idea behind limma is modelling of expression levels of each gene as a linear combination of experimental factors and covariates. It studies gene expression microarray data, with a focus on the evaluation of differential expression and the application of linear models for the analysis of specified experiments. The model coefficients are estimated by maximum likelihood estimation (MLE) alongside hypothesis testing performed on estimated coefficients to determine DEGs across experimental condition with associated significant p values and false discovery rates (FDRs). The algorithm initially normalizes the microarray data to denoise and avoid bias of datasets. Also, Benjamini–Hochberg (B–H) correction is employed for reduction of FDR^[38]13,[39]14. The final computationally exhaustive list of DEGs that has undergone these algorithms and corrections is obtained and are considered for the study. Already existing drugs including chemotherapeutic agents not only have side effects, but also with prolonged usage of such drugs the BC cells become resistant to the effects of the drug. The only way to minimize side effects and avoid chemoresistance is dosage modifications or modulations of the chemotherapeutic drugs. Chemotherapeutic agents have shown directly or indirectly cause cytotoxicity resulting in tumor regression even after dosage modulations. Moreover, only a handful number of drugs target tumor tissues. Increasing research reveals that BC development and recurrence is not only associated with physiological aspects of the body but also have a direct impact on psychological and social life^[40]15. It is the need of the hour to explore drugs targeted to treat BC cells that can potentially work without effecting the non-cancerous native cells of the body, thereby mitigating the side effects of existing treatment options with newer therapeutic options. Hence, in this study, the detection of significant cliques, biomarkers and drugs derived from interaction networks of mRNA–miRNA–lncRNA–drug interactions, followed by their expression, survival and oncoprint profiles can be useful and critical for an enhanced understanding of the BC progression, which may lead to identification of newer therapeutic strategies as well. Methodology Retrieval of significant mRNAs The mRNAs associated with BC metastasis are retrieved from the HCMDB database ([41]https://hcmdb.i-sanger.com/)^[42]16. This integrated database supports wide range of expression data compiled from Gene Expression Omnibus (GEO) ([43]https://www.ncbi.nlm.nih.gov/geo/) as well as The Cancer Genome Atlas (TCGA) datasets pertaining to cancer metastasis. Moreover, the shared genes among the four datasets and Human Cancer Metastasis Database (HCMDB) were explored for distinguishing metastatic target genes of BC using InteractiVenn tool ([44]http://www.interactivenn.net/)^[45]17. These metastatic mRNAs are then compared to differentially expressed genes (DEGs) in BC obtained from the GEPIA database ([46]http://gepia.cancer-pku.cn/)^[47]18 to get a final list of 113 differentially expressed metastatic mRNAs involved in BC. GEPIA (Gene Expression Profiling Interactive Analysis) aids in time-efficient and customized functions of genes including the differential expression, correlation status, survival plots, gene similarity indices etc. encompassing the gene set from TCGA ([48]https://www.cancer.gov/ccg/research/genome-sequencing/tcga)^[49]19 and GTEx datasets ([50]https://www.genome.gov/Funded-Programs-Projects/Genotype-Tissue-Ex pression-Project)^[51]20. Retrieval of non-coding RNAs In this study, we deal with two different types of ncRNAs:miRNAs and lncRNAs. The common miRNAs interacting with the DEGs are retrieved from ENCORI ([52]https://starbase.sysu.edu.cn/)^[53]21, miRTarBase ([54]https://mirtarbase.cuhk.edu.cn/~miRTarBase/miRTarBase_2022/php/sea rch.php)^[55]22 and TargetScan ([56]http://www.targetscan.org/vert_80/). ENCORI database is composed of experimentally identified interaction networks of RNA–RNA and protein–RNA form 108 CLIP-Seq datasets generated from 37 independent studies. miRTarBase datasets exclusively deals with miRNAs along with high-throughput corroborative miRNA-target interactions. The contemporary miRNA-mRNA interactions are specifically supported from CLIP-Seq verified data. TargetScan gathers anticipated targets of miRNAs by exploring the conserved 8-, 7- and 6-mer sites that complement the seed region of miRNA. In case of common lncRNAs interacting with the DEGs two databases (TarBase v8: [57]https://dianalab.e-ce.uth.gr/html/diana/web/index.php?r=tarbasev8/i ndex^[58]21 and lncBase v3 [59]https://diana.e-ce.uth.gr/lncbasev3)^[60]23 are utilised. ENCORI is also used to obtain the miRNAs and lncRNAs that interact with each other. Retrieval of interacting drug list The drugs interacting with mRNAs are retrieved from the Comparative Toxicogenomics Database ([61]https://ctdbase.org/)^[62]24. CTD comprises of manually curated mRNA–drug interactions that affects various biological pathways underlying various diseases such as cancer. For information regarding drugs interacting with miRNAs, Sm2miR ([63]http://www.jianglab.cn/SM2miR/)^[64]25 is utilised. Sm2miR database is an inclusive database consisting of information of the role of drugs and various small molecules affecting miRNA expression influencing miRNA-associated therapeutics. The lncRNA–drug interaction data is retrieved from the D-lnc database ([65]http://www.jianglab.cn/D-lnc/)^[66]26. Generation of mRNA–miRNA–lncRNA–drug interaction network followed by hub RNA, module identification and TF analysis In order to understand the interaction in the complex linkage of miRNA–mRNAs–lncRNA–drugs, the cytoscape v3.9.0^[67]27 software was utilized effectively. It is a freely accessible platform to perform complex cluster- based networks for multiple bioentities approach. The software enables visualization of complex networks comprising of multiple bioentities. From the ‘Tool” dropdown menu, ‘Merge” attribute has been used utilized to merge the small networks formed namely, mRNA–mRNA, mRNA–lncRNA, mRNA–drugs, miRNA–drugs, lncRNA–drugs and miRNA–lncRNA. Consequently, the Maximal Clique Centrality (MCC) ranking method of the ‘cytohubba’ plugin has been utilized for extracting the hub mRNAs from interaction network. Additionally, the ‘MClique’ plugin was employed to obtain cliques. The MCODE plugin is employed on the interaction network to obtain the top two subnetworks (based on MCODE Score of 5.438 and 5.417) to carry out transcription factor (TF) analysis, respectively. The top BC carcinogenic conditions in which the highly expressed hub genes (cut-off degree of 30 for significant genes) are obtained from the topological parameters. For the mRNAs from the top two subnetworks from MCODE results, interacting miRNAs are searched from the ENCORI database. Once the mRNAs and miRNAs are retrieved, the interacting TFs for mRNAs and miRNAs are retrieved from the TRRUST ([68]https://www.grnpedia.org/trrust/)^[69]28 and TransmiR ([70]https://www.cuilab.cn/transmir)^[71]29 databases, respectively. To retrieve the hub TFs, the subnetworks are first merged and then MClique plugin is employed on the interaction network involving mRNAs, miRNAs and TFs. ChA3 ([72]https://maayanlab.cloud/chea3/)^[73]30 TF analysis was done for all the TFs obtained from TRRUST and TransmiR to validate the hub TFs revealed in the Clique identification in MClique of cytoscape. To validate the significance of the TFs, CheA3 TF analysis of the TFs was done. Status of various correlated gene regulation in diseased state In TCSBN, the gene co-expression network for the 20 hub RNAs is built under the category of cancer tissue utilising the Breast (BRCA) dataset. Based on the hub RNAs retrieved from the interaction network, a backbone gene co-expression network is built using the Tissue/Cancer-Specific Biological Networks ([74]https://inetmodels.com/)^[75]31 database. Among the top 20 hub RNAs, 11 were mRNAs. The adjusted p value is set to 0.05, the node limit is 25, correlation is set to both (positive and negative). Expression, survival and oncoprint analyses of the RNAs Following the retrieval of hub RNAs, the expression analysis for mRNAs and miRNAs is done using UALCAN (The University of ALabama at Birmingham CANcer). The UALCAN webtool ([76]http://ualcan.path.uab.edu/cgi-bin/ualcan-res.pl)^[77]32 generally comprises of omics data related to cancer and provides efficient expression profiles for genes corresponding to protein-coding and non-coding RNAs (ncRNAs). The gold-standard metastatic dataset of TCGA corresponding to ‘BRCA: breast invasive carcinoma’ dataset is exploited effectively to retrieve the expression pattern of the hub RNAs. For the selections pertaining to sample types, BC subclasses and their subclass-associated DNA-methylation status, genes with p value < 0.05 were checked for their significant expression in BC. Subclass-based promoter methylation status is generally an epigenetic event in the initial phases of tumorigenesis and therefore has prognostic cancer biomarker potential^[78]33. In order for normal regulation of the genes, DNA methylation as well as structure of the chromatin play significant roles. Hence, DNA methylation status was also checked using UALCAN tool. The beta-value along the ordinate-axis of the methylation plots range from 0 (unmethylated) to 1 (full methylated mRNAs). Hypermethylation ranges from beta value of 0.7–0.5 while the 0.3–0.25 range of beta value indicates hypomethylation. Further, survival analysis of mRNAs was performed using Oncolnc ([79]http://www.oncolnc.org/)^[80]34 that provides an interactive platform correlating the survival data of cancer patients obtained from TCGA with mRNA, miRNA and lncRNA expressions. The patients’ tumor samples were divided into high- and low-expression groups (n = 503) which were analyzed by log-rank test and statistical significance of the selected markers was confirmed with p value < 0.05. To validate the most significant survival biomarkers PrognoScan ([81]http://dna00.bio.kyutech.ac.jp/PrognoScan/)^[82]35 and Fp tool ([83]http://dcv.uhnres.utoronto.ca/FPCLASS)^[84]36 scores were compared to the UALCAN p values for individual DEGs. The robust platform provided by PrognoScan makes it possible to assess prospective tumor markers and treatment targets, which would speed up the study of cancer. Finding the genes that are co-expressed with hub genes can be done using the in-silico method known as the Fp tool, which predicts high-confidence protein–protein interactions. Based on the total scores, the co-expressed genes were determined. In order to achieve vital information on the genetic alteration of the DEGs in individual hallmarks, oncoprint analysis was performed with the help of cBioPortal server ([85]https://www.cbioportal.org/)^[86]37. The Oncoprint technique is particularly helpful for finding trends like co-occurrence and mutual exclusivity as well as for visualizing changes across a group of instances. In the end, two genes that do not co-occur in the same patient may be in the same pathway. Finding these genes makes it possible to develop artificially deadly treatments. It shows discrete gene values for all data types, including data with continuous values like mRNA expression (i.e., whether a gene is altered or not based on a predetermined threshold. This integrative platform provides high quality genetic profiles relative to the various alterations at molecular level. Samples across the TCGA dataset (Cell, 2015) was explored to see genetically altered mRNAs. This database reveals different chromosomal mutations with their chromosomal position and changes in base pair. In addition to the cBioPortal default layout, there are supplementary bar plots on either side of the heatmap that display the number of various modifications for each sample and each gene. In this study we have incorporated this data in the form of a Circos plot using the web-based Circos tool ([87]http://mkweb.bcgsc.ca/tableviewer/)^[88]38. Enrichment analysis of the DEGs EnrichR was used to conduct the functional enrichment analysis of the DEGs. This web-based tool ([89]https://maayanlab.cloud/Enrichr/)^[90]39 enables enrichment analysis of the gene list based on genome-wide experiments. This study utilizes this open-access tool to perform pathway analysis (BioPlanet 2019, KEGG 2021 Human, Elsevier Pathway Collection and Reactome 2022) and ontology analysis of the DEGs categorized into three semantics: Biological Processes (BP), Molecular function (MP) and Cellular Component (CC). Figure [91]1 schematically represents the framework of the overall study. Figure 1. [92]Figure 1 [93]Open in a new tab The integrated methodology of the study exploring the role of coding and non-coding RNAs, small molecules and transcription factors involved in breast cancer metastasis. Ethical approval This article does not contain any studies with human participants or animals performed by any of the authors. Results Obtaining target RNAs and interacting drugs Initially, 450 BC metastatic mRNAs are retrieved from HCMDB database followed by retrieval of 3159 BC associated differentially expressed mRNAs from GEPIA2. Comparing both the gene lists, 113 differentially expressed mRNAs involved in BC carcinogenesis. These 113 DEGs are found to be interacting with 614 common miRNAs retrieved from TargetScan, ENCORI and miRTarBase databases and 220 common lncRNAs of TarBase, lncBase (mRNA–lncRNA interaction data) and ENCORI (miRNA–lncRNA interaction data) are considered for the study. 1049 drugs interacting with mRNA, miRNA and lncRNA were discovered from CTDbase, Sm2miR and D-lnc database, respectively. BC-specific backbone PPI network The backbone gene co-expression network for the top 20 hub DEGs provides the positively and negatively correlated genes involved in the network (Fig [94]S3) using TCSBN. The correlated gene clusters are listed along with their p values in Tables [95]S1 and [96]S2. The positively correlated edges are represented as orange lines while the negative ones as blue lines. The nodes are in blue for both the correlation networks representing genes. The net correlation network is represented in Fig. [97]2. Figure 2. [98]Figure 2 [99]Open in a new tab Overall correlation network for the 11 hub mRNAs. The orange lines represent positive correlation and blue lines represent negative correlation of genes. The central nodes are the 11 hub DEGs and the surrounding nodes are correlated genes forming clusters. Interaction network analysis using cytoscape v3.9 In cytoscape, the interaction data tables are loaded individually summing up to six different interaction tables: mRNA–miRNA, miRNA–lncRNA, mRNA–lncRNA, mRNA–small molecules, miRNA–small molecules and lncRNA–small molecules interaction network. The complex interaction network is generated by merging these individual interaction networks (Fig [100]S1). In the interaction network the mRNAs, miRNAs, lncRNAs and drugs are represented by green rectangles, yellow parallelogram, blue rhombus and grey triangles, respectively. Cytohubba and MClique plugins applied on the interaction network reveals the top 10 hub RNAs (STAT3, MAPK1, BIRC5, NEAT1, XIST, MALAT1, Doxorubicin, Curcumin, Formaldehyde, Glucose) and a total of 656 cliques: one pentagon clique with 5 nodes, 133 cliques with four nodes and remaining 522 cliques comprised of three nodes. Top 10 cliques are represented in Fig. [101]3. Among the top 10 cliques generated, significant mRNAs include INPPL1, STAT5B, BIRC5, YAP1, SATB1, BSG, FGFR1, significant lncRNAs include NEAT1 and XIST and significant drugs present in the top 10 cliques include doxorubicin (6) and dexamethasone (2). Ten different miRNAs are involved in cliques. No redundant/common miRNAs among the top 10 cliques. Figure 3. [102]Figure 3 [103]Open in a new tab Top 10 cliques from the complex mRNA–miRNA–lncRNA–drug interaction network. DEGs involved in the cliques include INPPL1, STAT5B, BIRC5, YAP1, SATB1, BSG, FGFR1. Significant lncRNAs include NEAT1 (6) and XIST (3). Drugs in maximum number of cliques include doxorubicin (6) and dexamethasone (2). There are no shared miRNAs among the top 10 cliques. Retrieval of sub networks from the interaction complex network MCODE is implemented on the interaction network based on the haircut algorithm taking into consideration the parameters like a k core of 2, a node cut-off value of 0.2, and a maximum depth of 100. To create sub-networks, the top two clusters according to the clustering score were utilised (Fig. [104]4a) taking into consideration the top 20 hub RNAs (Fig [105]S2) using cytoscape. The first module had cluster score of 5.438 encompassing 90 nodes and 242 edges followed by the module with a cluster score of 5.417 that was made up of 25 nodes with 65 edges. Figure 4. [106]Figure 4 [107]Open in a new tab (a) Based on the MCODE plugin score of cytoscape, the top 2 sub-networks are retrieved: Cluster 1 with a score of 5.438 and Cluster 2 with a score of 5.417. (b) The merged network for mRNAs-miRNA-TF (c) Cliques generated involving mRNAs-miRNAs-TFs reveal two significant TFs: SP1 and ETS1. Gene ontology (GO) analysis of the top 2 modules revealed their involvement in salivary gland morphogenesis, lymph node development, pathway restricted SMAD protein phosphorylation, JAK-STAT cascade involved in growth hormone signalling pathway (biological processes), ion transmembrane transported activity and Ran GTPase binding (molecular function) and connexon complex, contractile fiber, pseudopodium, dendrite cytoplasm, axon and gap junction (cellular components) were the topmost ontologies enriched. Construction of TF–miRNA–mRNA interaction network and TF analysis The mRNAs involved in these two sub-networks (Fig. [108]4a) are further exploited to study a mRNA–miRNA–transcription factor (TF) interaction network. For the 8 common mRNAs involved, 39 interacting miRNAs and 56 interacting TFs were retrieved. The merged network for mRNA–miRNA–TF is generated (Fig. [109]4b). The MClique plugin of cytoscape is utilised to reveal that two of the 56 TFs (ETS1 and SP1) are present in two of the three generated cliques (Fig. [110]4c). Two out of the three cliques have TFs ETS1 and SP1. Further, the CheA3 TF analysis reveals that these two TFs from the cliques are present among the top 20 integrated mean rank TF analysis (Table [111]S3). The third clique is comprised of two mRNAs and one miRNA. Expression, survival and oncoprint analyses of the DEGs Expression analysis of mRNAs and lncRNAs Subclass-based expression analysis. The expression analysis of mRNAs and miRNAs was performed using the UALCAN tool (Table [112]1). The BC subtype-based expression analysis reveals GJA1 is the most expressed mRNAs and TGFB1 and EGFR has similar expression levels (Fig [113]S4) in BC subtypes while the hub lncRNAs (NEAT1 and MALAT1) (Fig [114]S5) are almost similarly expressed in all BC subtypes. XIST less expressed in luminal than the other lncRNAs. Table 1. Subclass-based expression analysis of hub RNAs and DNA promoter methylation status of mRNAs. [115]graphic file with name 41598_2023_42904_Tab1_HTML.jpg [116]Open in a new tab Promoter-methylation based expression. The promoter methylation status analyses of the mRNAs revealed that (Table [117]1) MMP2, CXCL12, MAPK1, ICAM1 and BIRC5 are significantly methylated in Luminal A subtype of BC. GJA1, ENAH, EGFR, TGFB1and YAP1 are highly methylated in HER2 + BC subtype. STAT3 is the only gene found to be most methylated in triple-negative breast cancer (TNBC). Expression analysis of miRNAs Expression levels of miRNAs (Fig. [118]5a,b) reveals that miR-105-2 and miR137 are the most significantly expressed miRNAs, while miR-204 and miR133b expression levels are reduced in BC (Fig [119]S6). Figure 5. [120]Figure 5 [121]Open in a new tab (a) Top 7 downregulated miRNAs include miR-516-2, miR-488, miR-346, miR-204, miR-139, miR-133b and miR-934. (b) Top 7 upregulated miRNAs include miR-301b, miR-190b, miR-105-2, miR-18b, miR429, miR-96 and miR-137. Oncoprint analysis results from cBioPortal reveals that SQLE has the highest (27%) genetic alteration followed by DEPTOR (23%), TNFRSF11B (21%), GATA3 (19%), CTTN, ERBB2 & CCN4 (18%), AKT3 & FGFR1 (17%), SHC1 & S100A9 (16%), DDR2, IL10 & S100A4 (14%) and LYN, CDKN2A & BIRC5 (11%). The oncoprint analysis of all the significant DEGs are depicted in Fig. [122]6. Figure 6. [123]Figure 6 [124]Open in a new tab The oncoprint analysis of the significantly altered DEGs. > 10% genetic regulation of SQLE, DEPTOR, TNFRSF11B, GATA3, CTTN, ERBB2 & CCN4, AKT3 & FGFR1, SHC1 & S100A9, DDR2, IL10 & S100A4 and LYN, CDKN2A & BIRC5 from TCGA dataset. Survival Analysis of DEGs Survival Analysis using Oncolnc in Table [125]2 lists significant survival biomarkers with log rank p values of less than 0.05. Further, p values from the PrognoScan tool analysis and Fp class scores validates the survival status of the top hub mRNAs. Table 2. Kaplan–Meier plot analysis of DEGs significant in survival of BC patient. Survival biomarkers Logrank p value ADGRF5 0.0379 ALDH1A1 0.0284 FOXM1 0.0125 GBP2 0.0221 KRT5 0.0442 STAT5A 0.0163 TGFB1 0.03 [126]Open in a new tab Validation of survival biomarkers was done by comparing the p values of the hub mRNAs from PrognoScan and Oncolnc (Table [127]3). This was done to finalize the most probable survival biomarkers that can be targeted as a therapeutic alternative to manage BC. To each hub gene, a gene expression score and a network’s topological score were generated. The co-expressed partner gene with the common survival marker TGFB1 (hub gene) was identified as FURIN. Table 3. Validation of top 20 hub genes: by utilizing the p value from PrognoScan and Oncolnc and Fp Class scores. hub mRNA PrognoScan Oncolnc Fp class Co-expressed gene symbol Total score Gene co-expression score Network topology score BIRC5 0.000296436 0.07 NUSAP1 0.9429 0.9423 0 STAT3 0.0027682 0.1 SOD2 (Superoxide dismutase [Mn], mitochondrial) 0.5549 0.527 0 MAPK1 9.00E-06 0.5 KCTD20 0.4916 0.4457 0 TGFB1 0.000147159 0.03 FURIN 0.4944 0.1083 0 MMP2 0.00829304 1 COL1A2 0.8826 0.8801 0 EGFR 3.39E-05 0.3 FBLN1 0.8117 0.6334 0 ENAH 0.00351316 0.5 PTK2 0.2609 0.1297 0 GJA1 0.000256725 0.04 SPARC 0.7426 0.731 0 CXCL12 3.02E-05 0.3 DCN 0.8826 0.8436 0 YAP1 2.71E-05 0.3 ID1 0.803 0.0741 0 ICAM1 0.00176679 0.03 SOD2 (Superoxide dismutase [Mn], mitochondrial) 0.5027 0.4925 0 [128]Open in a new tab Mutation-related data for DEGs From cBioPortal, the types of mutation for each hub mRNAs are obtained. Circos plot (Fig. [129]7) represents missense, frameshift deletions and splicing in different chromosomal positions of the hub genes. Figure 7. Figure 7 [130]Open in a new tab Circos plot representing mutations of the 11 DEGs. FI-frameshift insertions, FD-frameshift deletions, SP-splicing. The circular two-dimensional graphical representations known as “circos plots” offer a thorough method for presenting and understanding multi-dimensional genetic data. The different versions of the same genes are mentioned as gene_1, gene_2 and so on depending on the number of different chromosomal positions at which the mutations occur. ENAH is the only mRNA with missense (MS), Splice sites (SP) and frameshift deletion (FD) while BIRC5 has FD AND missense mutations. No data was found for CXCL12. The Table [131]S4 lists all the mutations with respective chromosomal positions for the 11 hub DEGs. Enrichment analysis of DEGs The web-based EnrichR tool (Fig. [132]8a,b) is used for pathway enrichment and functional ontology analysis. Based on REACTOME, KEGG, Elsevier and WiKi datasets, the most significant pathways include ‘Role of ERBB2 in Signal Transduction and Oncology’, ‘Bioactive Peptide Induced Signaling Pathway’, ‘IL-2 Receptor Beta Chain in T cell Activation’, ‘TPO Signaling Pathway’ and ‘cell cycle arrest at G1/S Check point’ to name a few. Figure 8. [133]Figure 8 [134]Open in a new tab (a) Pathways enriched by the 113 DEGs as per REACTOME, KEGG, Elsevier and WiKi datasets. (b) Ontology enrichment (cellular components, molecular function and biological processes) of the 113 DEGs. Transmembrane receptor protein kinase activity, transmembrane receptor protein tyrosine kinase activity, protein tyrosine kinase activity, type I transforming growth factor beta receptor binding and serine-type peptidase activity are the top 5 molecular functions that are regulated by mRNAs like AXL, FLT4, ERBB2, EGFR, TGFBR2, DDR2, FGFR1, TGFB1, ENG, TGFBR2, ST14, HGF, HTRA3, MMP2, MST1 and F3. TGFB1, ANGPT2, MMP2, MST1, F3, THBS1, SOD3, ICAM1, VCAN, EFEMP1, CXCL12, TIMP3, S100A4, S100A9, TGM2, CAV1, ITGA2, EGFR, ENAH, FERMT1, IL1RL1, GJA1, ALCAM, CTTN, BSG, MAPK1, TGM2, DDR2 and ENG are some of the mRNAs that are present in the collagen-containing extracellular matrix, cell-substrate junction, focal adhesion points, intracellular membrane-bounded organelle and integral component of plasma membrane amongst many. The top significant biological processes include cytokine-mediated signaling pathway, cellular response to cytokine stimulus, regulation of cell population proliferation, positive regulation of intracellular signal transduction and positive regulation of cell population proliferation. Discussion Conventionally, an individual signaling pathway or a dysregulated proteins is targeted for devising therapies. In this work, DEGs involved in BC are considered that mediate the functioning of various ncRNAs involved in the disease development. Hence, we have utilised network-based approaches to identify specific cliques involving the interaction of DE mRNAs, miRNAs, lncRNAs and drugs/small molecules that can be considered for devising newer therapeutic strategies as they give us an overall idea of the disease environment under the influence of interacting RNAs and drugs. The ncRNAs are often not extensively discussed or considered with respect to disease causing phenomena and in some of the studies they are considered as separate identities, when the underlying mechanism of disease involves the interaction of mRNAs and ncRNAs with each other. For instance, when no drug is provided into a system, dysregulated levels of ncRNAs can modulate the levels of coding RNAs or expression of protein levels leading to a diseased state. Through this network-based approach we are trying to understand the crosstalk between mRNAs and ncRNAs that can therapeutically be addressed in future considering the entire disease environment. Differential expression of genes helps in the developing in-depth biological insight into the underlying mechanism of a disease condition owing to its interaction with various ncRNAs and drugs that also play a major role in modulating the overall scenario underpinning any diseased state. According to the in-silico study conducted, 11 out of 113 differentially expressed metastatic genes undergo different types of mutations, primarily missense, frameshift and splicing that lead to the diseased state. Regulation of expression of different DEGs in turn influence the expression levels of miRNAs, lncRNAs and different molecules (small molecules/drugs). Among the mRNAs from the top 10 cliques generated, BIRC5 was one of the most significant with more than 10% of probable genetic alteration. BIRC5 expression is associated with resistance to chemotherapy, radiation and neo-adjuvant therapy, specifically in II/III stage of BC^[135]40. Also, with age of the patients the expression of BIRC5 increases and hence targeting it aids in better survival in BC patients. Significant amount of DNA methylation is observed in BC tissues. Also, from the DNA promoter methylation status of BIRC5 reveals its negative association with the methylation status of the gene. Survivin coded by BIRC5 is responsible for cell division bypassing the cell death in normal as well as cancer cells leading to decreases survival of cancer patients. However, the validated survival markers include TGFB1, GJA1 and ICAM1. The role of TGFB1 is multifaceted in BC depending on the stage of cancer^[136]41. This cytokine exhibits tumor-suppressing properties in the initial stages of BC by hindering epithelial cell cycle progression and enhancing apoptosis. TGFB1 is associated with tumor development, higher cell motility, cancer invasiveness, and metastasis in late stages, however. Additionally, it promotes the EMT and modifies the cancer microenvironment. However, there are three known therapeutic strategies involving TGFB1. On order to disrupt ligand-receptor interactions, soluble receptors and anti-receptor monoclonal antibodies are used along with TGFB1 receptor kinase inhibitors and peptide aptamers that suppress intracellular signalling cascades by using antisense compounds to impede TGFB1 production on the ligand level^[137]42–[138]45. GJA1 is subtype-dependent mRNA that codes for the protein Connexin-43 (Cx43). Cx43 is overexpressed in ERα- or PR-positive breast tumors compared to ERα- or PR-negative tumors^[139]46. The expression of ICAM1 is directly proportional to the metastatic potential of the tumor cells. Reduced invasion of human epithelial BC cells in vitro is done by targeting endogenous human ICAM1^[140]46. The significant DEGs are not only identified by exploiting multiple correction algorithms but are also backed up by existing literature review regarding their relevant role in BC. Hence, the results are consistent and comparable (Table [141]4) as per literature provided regarding the role of these DEGs in BC. Table 4. Comparison of our integrated analysis with similar studies reported in the literature. Cancer type Gene-expression mRNA–miRNA–lncRNA–drugs Hub identification Functional module /subnetwork extraction Pathway enrichment analysis Oncoprint and survival analysis Mutation circos plots References