Abstract Bladder Cancer (BC) is one of the most common cancers in the world. Recent studies show that non-coding RNAs such as lncRNAs and circRNAs play critical roles in the progression of this cancer, but their regulatory relationships and functions are still largely unknown. As a new regulatory process within the cell, the coding and non-coding RNAs compete with each other to sponge their target miRNAs. This mechanism is described as “the competing endogenous RNA (ceRNA) hypothesis” which provides a new perspective to understand the regulation of gene expression in health and diseases such as cancer. In this study, to investigate the role of non-coding RNAs in BC, a new approach was used to reconstruct the ceRNA network for Non-Muscle Invasive Bladder Cancer (NMIBC) based on the expression data of coding and non-coding genes. Analysis of ceRNA networks in the early stage of BC led to the detection of an important module containing the lncRNA MEG3 as the central gene. The results show that the lncRNAs CARMN, FENDRR and ADAMTS9-AS2 may regulate MEG3 in NMIBC through sponging some important miRNAs such as miR-143-3p, miR-106a-5p and miR-34a-3p. Also, the lncRNA [30]AC007608.2 is shown to be a potential BC related lncRNA for the first time based on ceRNA stage-specific network analysis. Furthermore, hub and altered genes in stage-specific and between stage networks led to the detection of hsa_circ_0017586 and hsa_circ_0001741 as novel potential circRNAs related to NMIBC. Finally, the hub genes in the networks were shown to be valuable candidates as biomarkers for the early stage diagnosis of BC. Subject terms: Urological cancer, Biochemical networks Introduction Bladder Cancer (BC) is one of the most common cancers in men and women. This cancer is the 10th common cancer worldwide with about 200,000 people dying every year^[31]1. It has been estimated that there were 79,030 new cases of BC and 16,870 BC deaths in the United States in 2017^[32]2. In China, the mortality rate of BC patients has significantly increased in recent years, and 32,900 deaths were estimated in 2015^[33]3. Furthermore, due to the complex progression process and molecular interactions linked to the disease, the mechanism of BC remains poorly understood^[34]4. Recent studies demonstrate that non-coding RNAs are closely related to BC progression and tumorigenesis^[35]5,[36]6. Non-coding RNAs construct 80–90 percent of human transcripts^[37]7. This type of RNA is involved in many gene regulation processes in the cell, and any change in these processes can trigger complex disorders especially cancer^[38]8. Coding and non-coding RNAs regulate the expression of each other through common miRNAs that target them. This mechanism is known as the competing endogenous RNA (ceRNA)^[39]9,[40]10. More precisely, transcripts with the same miRNA response elements (MREs) that are targeted by common miRNAs compete with each other to decoy or sponge shared miRNAs (these transcripts are called ceRNAs)^[41]9. This competition affects gene expression in the cell. Recent studies have shown that ceRNA interactions control many important biological processes and alteration in these interactions may cause diseases such as cancer^[42]11–[43]13. Based on previous studies, long non-coding RNAs (lncRNAs), circular RNAs (circRNAs), microRNAs (miRNAs) and transcribed pseudogenes are the most important non-coding RNAs involved in the ceRNA mechanism^[44]10,[45]14. lncRNAs are RNAs longer than 200 nucleotides and play pivotal roles in many complex diseases^[46]15. In the past decade, the role of many lncRNAs has been studied in various cancer types. For example, lncRNA H19 is involved in metastasis, tumor initiation, and progression of many cancers^[47]16. HOTAIR, a well-known lncRNA, is overexpressed in lung cancer and correlates with its metastasis and poor prognosis^[48]17. Another type of non-coding RNA with very important regulatory functions are circRNAs^[49]18. These RNAs are constructed as a result of back-splicing from pre-mRNAs^[50]19. The conservation, stability, and tissue specificity properties of circRNAs make them valuable candidates for biomarker discovery in cancer research^[51]18,[52]20. For instance, circRNA hsa_circ_0003221 promotes the proliferation and migration of BC cells^[53]21, and the expression of circRNA Hsa_circ_0000190 correlates with tumor size, metastasis, and disease stage of gastric cancer^[54]20. The pivotal non-coding RNAs in the ceRNA hypothesis are miRNAs. These non-coding RNAs are involved in post-transcriptional regulatory processes. They bind to the 3′ untranslated region (3′UTR) of mRNAs and prevent translation^[55]22. miRNAs dysregulation affects many critical biological processes such as cell proliferation, migration, differentiation, and apoptosis^[56]23. The expression of miRNAs significantly change in many cancerous tissues^[57]24, and these non-coding RNAs can act as both tumor suppressor or oncogene in cancer^[58]25. The ceRNA interactions can be modeled as a network in which each node represents a ceRNA, and each interaction represents the competing relationship between two ceRNAs based on common miRNAs as mediators^[59]14. Reconstructing and analysis of such networks is a very fascinating approach to identify key genes in complex diseases to understand disease mechanism. Since the function of many non-coding RNAs (especially lncRNAs and circRNAs) in cancer remain unclear, the ceRNA network analysis can be helpful to discover novel cancer related non-coding RNAs as biomarkers or therapeutic targets. To understand the post-translation mechanism and non-coding RNA roles in BC, the ceRNA network analysis has been a desirable approach in recent years^[60]4,[61]6,[62]7,[63]26. For instance, Zhu et al. proposed six lncRNAs as potential biomarkers for BC based on ceRNA network analysis^[64]4. Also, they found one miRNA and six differentially expressed mRNAs that correlated with BC patient survival rates. The ceRNA network reconstructed by Li et al. revealed that lncRNA MIR194-2HG, AATBC, and circRNA PGM5 could sponge the BC related miRNAs^[65]6. In^[66]7 constructing and analyzing the ceRNA network indicated that lncRNA H19 and circRNA MYLK compete with each other to decoy miRNA-29a-3p and increase the expression of its target genes DNMT3B, VEGFA and ITGB1. Moreover, in the same study, Wang et al. reconstruct the BC related ceRNA network based on 322 muscle-invasive bladder cancer tissues and 19 normal tissues^[67]26. Consequently, they proposed 5 BC related lncRNAs as potential biomarkers and therapeutic targets. Although many studies have assessed the role of non-coding RNAs in BC, the molecular mechanism remain mostly unclear. To tackle this problem, in this study, we reconstruct and analyze a ceRNA network based on lncRNA, circRNA, transcribed pseudogene, and mRNA expression data from two early-stage urothelial carcinomas. A new computational approach was used to select disease-related mRNAs. Consequently, the stage-specific and the between stage ceRNA networks were reconstructed and examined. Finally, the hub nodes in the networks were analyzed as candidate prognostic biomarkers to detect BC at the early stages. Results Gene set enrichment We selected the disease-related mRNAs based on their expression and interaction at the protein level for further analysis. Two approaches were used to this end: the PPR algorithm and the DNR approach (Eq. [68]2). Ten percent of the mRNAs with the highest rank (605 mRNAs) were selected in each approach, and about 70% of mRNAs chosen by the two methods were the same. To evaluate our new approach for selecting cancer-related mRNAs, we extracted those genes selected by our method that they were not in the DEGs set. Subsequently, the biological process, pathway, and disease enrichment analysis were applied to these extracted genes (Supplementary Files [69]2–[70]4). The results show that the selected genes were associated to critical cancer-related pathways and processes such as “PI3K-Akt signaling pathway”, “Rap1 signaling pathway”, “Ras signaling pathway”, “regulation of cell proliferation”, “programmed cell death”, “cell cycle” and “DNA Repair” (Fig. [71]1, Supplementary File [72]1 and Fig. [73]S1). Furthermore, we investigated the relationship between these genes and BC using PubMed enrichment analysis in ToppFun and literature search. For the genes selected by DNR and PPR method, 62 and 45 articles about BC were found, respectively. Also, 107 and 81 genes of the genes selected by DNR and PPR respectively were studied in those mentioned articles (Supplementary File [74]5). These results show that there are many BC related genes in our samples that are not differentially expressed, but our new approach may be able to find them. For example, Michiels et al.^[75]27 investigated the role of 85 DNA repair genes in BC. Among the genes significantly related to BC in this study, six genes were found by our approach (CDKN2A, FANCD2, LIG1, POLR2K, RFC2, and RFC5). In a recent study, Books et al.^[76]28 experimentally found a significant association between the expression of COL1A1 and COL1A2 mRNAs and NMIBC. These genes are also found by the PPR method. MDM2, HMOX1, and SDC1 are other examples of the NMIBC related genes^[77]29,[78]30 that are not differentially expressed in the samples but are found by our ranking methods. Figure 1. [79]Figure 1 [80]Open in a new tab Some of the most critical enriched pathways and processes belonging to the mRNAs selected by DNR approach that they are not in the DEGs set. Stage-specific ceRNA networks Considering selected key mRNAs based on PPR and DNR methods, two ceRNA networks for each stage of BC were reconstructed (Supplementary File [81]1, Figs [82]S2 and [83]S3): * DNR-Ta: the network of ceRNAs that was reconstructed based on the mRNAs selected by the DNR method based on stage Ta expression data. * DNR-T1: the network of ceRNAs that was reconstructed based on the mRNAs selected by the DNR method based on stage T1 expression data. * PPR-Ta: the network of ceRNAs that was reconstructed based on the mRNAs selected by the PPR method based on stage Ta expression data. * PPR-T1: the network of ceRNAs that was reconstructed based on the mRNAs selected by the PPR method based on stage T1 expression data. Table [84]1 shows the global properties of these networks. The degree distribution of the networks follows the power low mode (Supplementary File [85]1, Figs [86]S2 and [87]S3). It is clear that the networks are scale-free and their connections are not organized randomly. Table 1. Global properties of the stage-specific networks. Network # nodes # edges # mRNAs # lncRNAs or Pseudogenes # circRNAs DNR-Ta 1692 32478 529 910 253 PPR-Ta 1655 31311 494 909 252 DNR-T1 1295 13899 491 624 180 PPR-T1 1283 15010 469 632 182 [88]Open in a new tab Stage-specific modules To find important functional blocks in each stage-specific network, the MCL algorithm was used to cluster the networks and detect modules. Consequently, the ToppFun web tool was used to perform GO, pathway, and disease enrichment analysis on the modules. Considering the modules with a size larger than three nodes, 55 and 83 modules were detected in DNR-T1 and DNR-Ta networks, respectively (48 and 90 modules in PPR-T1 and PPR-Ta networks, respectively). Although the enrichment tools could not find any significant process and pathway for many non-coding modules, we found some significant cancer-related pathways and processes in the modules based on enrichment analysis on coding genes (Supplementary Files [89]6–[90]9). The most important modules (called MEG3 modules) detected in stage-specific networks are the modules depicted in Fig. [91]2. Figure 2. [92]Figure 2 [93]Open in a new tab MEG3 module in stage-specific ceRNA networks ((A): DNR-Ta, (B): PPR-Ta, (C): DNR-T1, (D): PPR-T1). The thickness of the edges shows the correlation, and the number on each edge is the number of shared miRNAs between two genes. As is shown in Fig. [94]2, many of the genes in the modules are the same (almost 75–80 percent of genes in modules detected by two approaches in each stage are common). This shows that in fact, we found one module by two approaches, altered when the disease has progressed from stage Ta to T1. Searching in the literature shows that there are many cancer-related genes in this module (Supplementary File [95]1, Table [96]S1). Also, the biological process and pathway enrichment analysis of the protein-coding genes in this module showed that it is involved in many important processes and pathways in human cells (Supplementary File [97]10). The genes in this module could be interesting candidates for therapeutic targets. Stage-specific key genes Key genes in the stage-specific networks were detected using the degree, and betweenness centrality measures. The degree of each gene in the network demonstrates the number of genes that are connected with it and the betweenness centrality measures the amount of shortest path that passes from the genes. Therefore, the genes with high degree and betweenness centralities can be important because these genes compete with many other genes and play a pivotal role to construct pathways in the network, respectively. Accordingly, we selected the top 10 genes with the highest degree and betweenness centrality in each stage-specific network as key genes and analyzed cancer relativity of them based on literature searches (Tables [98]2 and [99]3). Table 2. Key nodes in stage Ta networks based on Degree and Betweenness centrality measures. Gene name Gene type DNR-Ta network* PPR-Ta network* Some related cancers Reference (PMID) NRAS Protein coding D, B D, B Bladder, Colorectal, Melanoma 21072204 22820081 28074351 PDIA6 Protein coding D, B — Bladder, Breast 27760590 26125904 CCNG1 Protein coding D, B — Bladder, Ovarian 27982046 25981880 hsa_circ_0000591 circRNA D D — — hsa_circ_0000592 circRNA D D, B — — RPLP0P6 Pseudogene D, B D, B — — PARPBP Protein coding D D Pancreatic 23436799 CCNT2 Protein coding D — leukemia 28409330 MSH6 Protein coding D D, B Bladder, colorectal 20591884 21642682 hsa_circ_0003221 circRNA D D Bladder 29125888 KCNQ1OT1 lncRNA B B Breast, Lung 30157476 28600629 PWAR6 lncRNA B B Glioma, Breast 30472640 30297886 WEE1 Protein coding B B Breast, Leukemia, Melanoma, Brain 27427153 HCG18 lncRNA B B Bladder 30426533 [100]AC015813.1 lncRNA B B — — NCAPD2 Protein coding B — Bladder 21982874 RFC3 Protein coding — D Breast, Ovarian, Esophageal 27888707 26464638 22328562 CCNA2 Protein coding — D Bladder, Breast, Colorectal 30138038 24622579 30464611 hsa_circ_0017586 circRNA — D — — NAMPTP1 Pseudogene — B — — [101]Open in a new tab *D: Degree, B: Betweenness. Table 3. Key nodes in stage T1 networks based on Degree and Betweenness centrality measures. Gene name Gene type DNR-T1 network* PPR-T1 network* Some related cancers Reference (PMID) CCNA2 Protein coding D D Bladder, Breast, Colorectal 30138038 24622579 30464611 CCNB1 Protein coding D D Bladder, Breast, Gastric 30170380 28821558 30518842 PARPBP Protein coding D D Pancreatic 23436799 NCAPD2 Protein coding D D Ovarian 21423607 RFC3 Protein coding D D Breast, Ovarian, Esophageal 27888707 26464638 22328562 MCM7 Protein coding D D Bladder, Lung 23201130 27797825 KPNA2 Protein coding D D Bladder, Lung, Colorectal 27611951 27009856 26663089 ASF1B Protein coding D D Breast 21179005 LMNB1 Protein coding D — Liver, Cervical 19522540 30394198 CDC25A Protein coding — D Bladder, Breast, Ovarian 29246203 30443946 30405749 NUP214 Protein coding B — Leukemia 25120641 RPLP0P6 Pseudogene B B — — NR2F1-AS1 lncRNA B — Hepatocellular carcinoma 29602203 YWHAH Protein coding B — Glioma 26370624 GAPDHP1 Pseudogene D, B B — — NRAS Protein coding B B Bladder, Colorectal, Melanoma 21072204 22820081 28074351 LINC01355 lncRNA B — Lung 28949095 GLI2 Protein coding B — Bladder, Gastric 27465044 28975979 CCNT2 Protein coding B — leukemia 28409330 EEF1A1P13 Pseudogene B — — — FASN Protein coding — D Bladder, Breast 30221356 28922023 ANXA2P2 Pseudogene — B — — HCG18 lncRNA — B Bladder 30426533 NCOR2 Protein coding — B Lung 29764865 COL1A2 Protein coding — B Bladder, Gastric 27655672 28401451 MEG3 lncRNA — B Bladder, Gastric, Esophageal, Glioma, Breast, Ovarian 30461333 30417553 30310931 28975980 28539329 30389444 ENO1 Protein coding — B Pancreatic, Gastric, Colorectal 28086938 29986635 26097998 CDC25B Protein coding — B Bladder, Endometrial, Gastric 29234286 29353204 27863420 [102]Open in a new tab *D: Degree, B: Betweenness. As shown in Table [103]3, no circRNA is detected as a hub in T1 networks, but in Ta networks, the circRNAs are seen in the hub nodes set. If 13 and 12 nodes with the highest degree are considered in PPR-Ta and DNR-Ta networks respectively, five circRNAs (hsa_circ_0000591, hsa_circ_0000592, hsa_circ_0003221, hsa_circ_0017586, hsa_circ_0001741) are seen as hub nodes in both networks. hsa_circ_0000591, hsa_circ_0000592, and hsa_circ_0003221 are derived from the PTK2 gene and hsa_circ_0017586, and hsa_circ_0001741 are derived from GDI2 and TNPO3, respectively. Interestingly, all of these transcripts interact with each other in Ta networks (Fig. [104]3A). Furthermore, the circRNA hsa_circ_0003221 has been recently reported as a novel biomarker in BC that promotes the migration and proliferation of BC cells^[105]21. hsa_circ_0003221, hsa_circ_0017586 and hsa_circ_0001741 share 2 miRNAs that are involved in multiple cancers including BC (Fig. [106]3B)^[107]31–[108]38. Figure 3. [109]Figure 3 [110]Open in a new tab The module constructed from 5 hub circRNAs in Ta networks. Between stage networks To investigate how the connections of the ceRNAs change between stage Ta and T1, we compared the degree, shortest path and neighborhood connectivity of the genes in stage-specific networks (Fig. [111]4). As shown in Fig. [112]4, the degree and neighborhood connectivity of the genes in Ta networks are higher than T1. Also, the average shortest path of the genes in T1 networks is higher than Ta. These results show that many of the ceRNAs connections are lost when the tumor progresses from Ta to T1 stage. To determine which connection is altered and which genes have the most changes in their connections, we reconstructed the between stage networks through calculating the difference of edges. Two network types were defined: Lose Interaction Network (LIN = Ta − T1) and Gain Interaction Network (GIN = Ta − T1). Figure 4. [113]Figure 4 [114]Open in a new tab Comparing the connection between genes in Ta and T1 networks. The node degree, average shortest path and neighborhood connectivity distributions in all stage-specific networks show that many of connections have been lost in T1 networks relative to Ta. Subsequently, the most important genes in these networks (genes with the most connection changes from stage Ta to T1) were detected using degree and betweenness centrality measures (top 10 genes with the highest rank based on each measure were selected). The results show that most of the important genes in the between stage networks, especially the genes with the most altered connections (high degree or hub genes) are critical in cancer (Tables [115]4 and [116]5). These genes could be potential biomarkers for bladder cancer diagnosis. For example, circRNA hsa_circ_0003221 is one of the hub genes in DNR-LIN and PPR-LIN networks. Recently, Xu et al.^[117]21 have demonstrated that this circRNA promotes the proliferation and migration of BC cells. Table 4. Key nodes in stage LIN networks based on Degree and Betweenness centrality measures. Gene name Gene type DNR-LIN network* PPR-LIN network* Some related cancers Reference (PMID) hsa_circ_0000591 circRNA D, B D, B — — CCNT2 Protein coding D, B — Leukemia 28409330 hsa_circ_0000592 circRNA D D, B — — NRAS Protein coding D, B D, B Bladder, Colorectal, Melanoma 21072204 22820081 28074351 WEE1 Protein coding D, B D, B Breast, Leukemia, Melanoma, Brain 27427153 CCNG1 Protein coding D, B — Bladder, Ovarian 27982046 25981880 hsa_circ_0001741 circRNA D D — — hsa_circ_0017586 circRNA D D — — hsa_circ_0007646 circRNA D D Hypopharyngeal squamous cell carcinoma, Pancreatic 28514762 29922161 hsa_circ_0003221 circRNA D D Bladder 29125888 MDM2 Protein coding B B Bladder, Ovarian 28498468 28817834 PWAR6 lncRNA B B Glioma, Breast 30472640 30297886 KCNQ1OT1 lncRNA B B Breast, Lung 30157476 28600629 hsa_circ_0008345 circRNA — D — — [118]AC015813.1 lncRNA B B — — IGF1R Protein coding B — Bladder, Colon 27470393 28882129 NEAT1 lncRNA — B Bladder, thyroid 30349370 30596336 GABPB1-AS1 lncRNA — B — — hsa_circ_0011536 circRNA — D — — [119]Open in a new tab *D: Degree, B: Betweenness. Table 5. Key nodes in stage GIN networks based on Degree and Betweenness centrality measures. Gene name Gene type DNR-GIN network* PPR-GIN network* Some related cancers Reference (PMID) FADD Protein coding D, B — Pancreatic, Lung 28454341 30190126 PSMC4 D D Breast 18042273 GAPDHP1 Pseudogene D, B D — — PDIA5 Protein coding D — Melanoma 25912252 SLC25A1 Protein coding D — Lung 29651165 TFRC Protein coding D D Melanoma 28551638 SNRPA Protein coding D — Gastric 30039889 MYC Protein coding D, B D, B Bladder, Prostate 29463565 27159573 MCM3 Protein coding D D Hepatocellular Carcinoma 30363964 HSPA1A Protein coding D Bladder, Ovarian 23874968 26868087 NR3C1 Protein coding B — Bladder, Gastric 29991691 29285253 NUP214 Protein coding B — Leukemia 25120641 NR2F1-AS1 lncRNA B B Hepatocellular carcinoma 29602203 GLI2 Protein coding B — Bladder, Gastric 27465044 28975979 NCOR2 Protein coding B — Lung 29764865 ADCY7 Protein coding B B Leukemia 26220344 FASN Protein coding — D, B Bladder, Breast 30221356 28922023 CBX2 Protein coding — D Ovarian 30478317 CDC25B Protein coding — D, B Bladder, Endometrial, Gastric 29234286 29353204 27863420 ADCY3 Protein coding — D Gastric 24113161 DAG1 Protein coding — D Leukemia 28591567 STAT1 Protein coding — B Bladder, Breast, Lung, Melanoma, Gastric, Colorectal, Esophageal 27080594 28950072 ANXA2P2 Pseudogene — B — — CDK6 Protein coding — B Bladder, Esophageal 30362519 30551480 IL10 Protein coding — B Bladder, Melanoma, Lung 27987237 26188281 PART1 lncRNA B — Esophageal, Lung, Prostate 30049286 28819376 29261512 LINC02607 lncRNA — B — — [120]Open in a new tab *D: Degree, B: Betweenness. Potential biomarker detection As mentioned in section 2.8, we analyzed the hub nodes (nodes with the highest degree) as candidate biomarkers to investigate which hubs were able to separate Ta and T1 samples based on their expression patterns. Since the expression of circRNAs relative to other RNAs is very low in the samples (Supplementary File [121]1, Fig. [122]S4), we analyzed circRNAs separately. Overall, 48 gene sets from 8 networks were analyzed (16 circRNA sets and 32 other RNA sets from DNR and PPR networks). Note that there were no circRNAs as a hub in the GIN and T1 networks. None of the circRNA sets were able to separate Ta and T1 samples significantly. Among all circRNA hubs, five circRNAs (Fig. [123]3A) with the highest degree in Ta networks (DNR-Ta and PPR-Ta) showed approximately good clustering, but validation results demonstrated that these genes are not able to separate samples based on their expression pattern or maybe our clustering method is not sensitive enough for the circRNAs (Supplementary File [124]1, Fig. [125]S5). These transcripts are seen in the top ten circRNAs with the highest degree in all networks (except T1 and GIN networks that do not have any circRNAs in their highest degree genes). In the LIN networks, the set of 10 non-circRNAs with the highest degree in the DNR-LIN network showed an acceptable clustering result on all samples (Fig. [126]5). The validation results for these genes were also satisfactory and the average AUC in the 5-fold cross validation was 0.93 for these genes. For the GIN networks, the best clustering result was obtained from the 10 RNAs with the highest degree in the PPR-GIN network (Fig. [127]6). Figure 5. [128]Figure 5 [129]Open in a new tab The clustering and Validation results for ten non-circRNAs with the highest degree in the DNR-LIN network. Figure 6. [130]Figure 6 [131]Open in a new tab The clustering and Validation results for ten genes with the highest degree in the PPR-GIN network. The clustering result for 20 genes with the highest degree in DNR-T1 networks was interesting, and the validation showed that these genes could be a bona fide potential biomarker set to separate Ta and T1 bladder cancer stages from each other (Fig. [132]7). Figure 7. [133]Figure 7 [134]Open in a new tab The clustering and Validation results for 20 genes with the highest degree in the DNR-T1 network. The results show that the hub genes in the Ta networks are interesting to further investigate as biomarkers. The best validation result was acquired from the set of 15 and 20 high degree non-circRNA genes in the DNR-Ta network (Fig. [135]8). The average AUCs for these sets in the 5-fold cross validation were 0.96 and 0.95, respectively. Approximately the same results were acquired from the PPR-Ta network (Supplementary File [136]1, Fig. [137]S6). Figure 8. [138]Figure 8 [139]Open in a new tab The clustering and validation results for 15 and 20 genes with the highest degree in DNR-Ta network. Discussion Reconstructing the ceRNA networks based on differentially expressed genes is widely used in the researches but the expression difference is not the only key factor in the ceRNAs activity. For instance, a gene may decoy a large number of miRNAs through a large number of MREs while its expression may not be changed significantly^[140]14. To challenge those mentioned limitations and reconstruct ceRNA network for Non-Muscle Invasive Bladder Cancer (NMIBC), we used the expression data of four RNA types (mRNA, lncRNA, pseudogene, and circRNA) in stages of Ta and T1 in bladder cancer transcriptomics data. Additionally, we integrated gene expression data with PPI scaffold to select the ceRNAs based on both expression changes and interactions at the protein level. The stage-specific network analysis led to the prediction of an important module, entitled the MEG3 module, containing some key cancer-related coding and non-coding genes. As shown in Fig. [141]2, indeed four modules in four networks were predicted but many of the genes in these modules are the same. More precisely, there is one module (detected by two computational approaches) that has its genes and interactions rewired when the disease progresses from stage Ta to T1. Many of the genes in the MEG3 module are cancer-related (more than 80% in T1 and more than 50% in Ta networks), and interestingly in T1 networks, all protein-coding genes detected in the module are bladder cancer-related. This proves that our suggested approach enjoys significant power to predict disease-related genes. lncRNA MEG3 is the core gene in this module that interacts with all other genes (except PDGFB and [142]AC018647.1 in DNR-T1 and PPR-Ta networks, respectively). This is a very important cancer-related lncRNA that acts as a tumor suppressor in many cancers including bladder cancer^[143]5,[144]39–[145]41. Strong correlated competing interactions are observed between this gene and some other bladder cancer-related genes in this module that has not yet been reported for NMIBS. For example, ADAMTS9-AS2 is a bladder cancer-related lncRNA^[146]4 that interacts with MEG3 in both Ta and T1 networks through sharing four miRNAs (hsa-miR-148b-3p, hsa-miR-143-3p, hsa-miR-574-5p and hsa-miR-4701-3p). Among these miRNAs, miR-574-5p is an important cancer-related miRNA that shows oncogenic function in thyroid, breast, and lung cancers^[147]42,[148]43 but has been not reported in NMIBC. CARMN (or miR143HG) is another bladder cancer related lncRNA^[149]44 that has a strong connection with MEG3 (correlation 0.76) via sharing 5 miRNAs (hsa-miR-148b-3p, hsa-miR-106a-5p, hsa-miR-9-5p, hsa-miR-148a-3p, hsa-miR-501-3p). miR-106a-5p is significantly associated with tumor grade in NMIBC^[150]45 and miR-148a-3p suppresses Epithelial Mesenchymal Transition (EMT) and cell proliferation in bladder cancer^[151]46. An example of a bladder cancer-related mRNA that is found in the MEG3 module is Decorin^[152]47. In our results, the lncRNA MEG3 competes with Decorin through sponging 6 miRNA including hsa-miR-122-5p, hsa-miR-484, hsa-miR-338-3p, hsa-miR-30a-3p, hsa-miR-30e-3p, hsa-miR-224-5p. miR-484 is a tumor suppressor miRNA between MEG3 and Decorin in our module. Also, the lncRNA ZFAS1 promotes colorectal cancer cell proliferation via sponging this miRNA^[153]48. Furthermore, ZFAS1 recently has been detected as an oncogenic lncRNA in BC^[154]49. Therefore, miR-484 can be introduced as an important NMIBC related miRNA. FENDRR is another key lncRNA in MEG3 module. This gene has a suppressive function in non-small cell lung cancer, and downregulation of it correlates with poor prognosis in renal cell carcinoma^[155]50,[156]51. We found just one study by Suqing et al.^[157]52 reporting it as a significantly downregulated lncRNA in bladder cancer. Five miRNAs are shared between FENDRR and MEG3 in our networks (hsa-miR-148a-5p, hsa-miR-106b-5p, hsa-miR-4705, hsa-miR-20a-5p, hsa-miR-34a-3p). Interestingly, all of these miRNAs (except miR-4705) are bladder cancer related^[158]53–[159]55, and one of them (miR-34a-3p) has been reported as a biomarker of NMIBC recurrence^[160]56. These results indicate that the lncRNA FENDRR can potentially be a novel candidate lncRNA related to NMIBC. As shown in Fig. [161]2, there are some cancer-related genes in the MEG3 module that highly correlate and interact with bladder cancer genes, but no report was found about the activity of these genes in bladder cancer. These genes are potentially related to NMIBC and should be investigated as future work. For instance, NR2F1-AS1 was strongly connected with MEG3 in Ta networks. This gene is relevant to oxaliplatin resistance in hepatocellular carcinoma^[162]57. Furthermore, some other genes in the module such as lncRNA [163]AC007608.2 and [164]AL117190.1 have a strong connection with MEG3 and some other cancer-related genes, but we did not find any report about their roles in cancer. These genes could be candidate genes to investigate their function in bladder cancer especially NMIBC. Taking a closer look at these interactions and shared miRNAs can help to select a better candidate gene. For instance, there are three shared miRNAs between [165]AC007608.2 and MEG3 and one of them (miR-324-3p) is bladder cancer related^[166]58 and two others (miR- 365a-3p, miR-365b-3p) are involved in other cancer types^[167]59,[168]60. Therefore, the lncRNA [169]AC007608.2 is another novel candidate gene potentially related to NMIBC. Change in ceRNA interactions is one of the key features of cancer^[170]11,[171]13. Therefore, we reconstructed between-stage networks (LIN and GIN) by calculating the difference between the edge set of Ta and T1 networks. Subsequently, the top 10 hub genes (10 genes with the highest degree) in these networks were detected as the genes with the most connection rewiring when the disease progressed from Ta to T1 stage. In the LIN and Ta networks, five circRNAs had the highest degrees simultaneously (Fig. [172]3A). Among these hub circRNAs, hsa_circ_0003221, hsa_circ_0000591, and hsa_circ_0000592 were derived from one gene (PTK2), and hsa_circ_0003221 recently has been reported as a potential biomarker for bladder cancer that promotes proliferation and migration of cancer cells^[173]21. We did not find any report about the four other circRNAs in this set but closely looking in the module constructed by these transcripts revealed that there are only two important miRNAs (miR-103a-3p, miR-107) shared between hsa_circ_0003221, hsa_circ_0017586, and hsa_circ_0001741 (Fig. [174]3B). These two miRNAs have recently been reported as regulators to promote proliferation and migration of BC cells through ceRNA activity of circTCF25 and CDK6^[175]31. miR-107 is one of the most important miRNAs in cancer^[176]36. This miRNA is sponged by the lncRNA RP11-79H23.3 and regulates PTEN in BC cells^[177]34. This process can suppress BC development via inactivation of the PI3K/Akt signaling pathway^[178]35. The tumor suppressor function of miR-107 via ceRNA activity has also been demonstrated by Su et al. in another recent study^[179]32. Therefore, hsa_circ_0017586 and hsa_circ_0001741 may have oncogenic functions in BC through sponging miR-107. Furthermore, miR-103a-3p, another miRNA shared between the three mentioned circRNAs, is also involved in BC and some other cancer types such as Gastric and Glioma^[180]33,[181]37,[182]38. The last analysis in this study was to evaluate the hub genes as potential biomarkers to design a diagnostic panel. To this end, the set of hub genes in each network were analyzed using hierarchical clustering with correlation based distance and SVM classifier with 5-fold cross validation (see section Materials and method). As shown in Figs [183]5–[184]8 and Supplementary File [185]1, Fig. [186]S6, the results show the power of hub genes to separate Ta and T1 samples based on their expression patterns. The results of our study demonstrate that the selection of genes via a computational manner such as mapping the expression difference of genes on PPI data and ranking them based on their connection and expression could be an alternative approach to find significant disease-related genes. Also, the ceRNA hypothesis and its novel perspective to the gene regulatory system could be very helpful to understand complex disease mechanisms such as cancer. Material and Method Data collection Total RNA sequencing of 457 Non-Muscle-Invasive Bladder Cancer (NMIBC) samples (348 samples in stage Ta and 109 samples in stage T1) were selected from the ArrayExpress database ([187]http://www.ebi.ac.uk/arrayexpress)^[188]61 under accession number E-MTAB-4321^[189]62. circRNA expression data was obtained from Okholm et al.^[190]63. Preprocessing and normalization Based on the ceRNA hypothesis, lncRNAs, transcribed pseudogenes, circRNAs, and mRNAs are four key RNA categories that act as ceRNAs^[191]14,[192]64. Therefore, we selected only these types of RNAs for further analysis. The BioMart tool^[193]65 from the Ensembl genome browser ([194]www.ensembl.org)^[195]66 was used to select the gene types. In the next step, the genes with low expression in all samples were removed. To this end, we filtered out each gene that had an average count in all samples of less than one^[196]67. Furthermore, the genes with low standard deviation (lower than the first quartile of the standard deviation of all genes) in all samples were also filtered out. Additionally, the genes with a high standard deviation of their count in T1 or Ta stages were also filtered out (threshold was the third quartile of the standard deviation of all genes in each group). Considering all samples of a stage of the disease, if the average Pearson correlation of a sample with other samples was less than the first quartile of Pearson correlations between all sample pairs, then this sample was ignored as a within stage outlier. After all preprocessing steps, 220 samples (148 Ta and 72 T1) were selected for further analysis. The ceRNA count data was normalized using the TMM method^[197]68 implemented in the edgeR package^[198]69 and the log-transformed normalized count table used for further analysis. Key mRNA selection Generally, due to a large number of genes, the ceRNA networks are reconstructed based on Differentially Expressed Genes (DEGs), but it is obvious that DEGs are not the only actors in this process^[199]14. Therefore, we tried to find key disease-related mRNAs based on expression data and physical interactions at the protein level. To this end, firstly we applied differential expression analysis to get the fold change and p-value of all mRNAs (the edgeR package was used). After that, a combined weight was assigned to each mRNA based on Eq. [200]1: [MATH: w(g)=|log(FoldChange(g))|(log(correctedP_Value(g)) :MATH] 1 where g is a protein-coding gene or mRNA in the data. In the next step, the Protein-Protein Interactions (PPI) of all weighted mRNAs were extracted from the STRING database^[201]70 with a minimum interaction confidence score of 0.5. The weighted nodes in the PPI network were ranked based on their weights and connections. Finally, 10 percent of the nodes with the highest ranks were selected for further analyses. Two ranking methods were applied: the Personalized PageRank (PPR) algorithm^[202]71 (implemented by the igrapg package^[203]72) and our proposed method based on Eq. [204]2 (entitled Depth Neighborhood Ranking or DNR): [MATH: rank(g)=w(g)+i =1d< mrow>|Ng d|d(|Ng d|w(Ng d)) :MATH] 2 where g is a vertex in the PPI network (a protein-coding gene), w(g) is the weight of vertex g calculated by Eq. [205]1, [MATH: Ngd :MATH] is the set of neighbors of vertex g that the shortest path of g to them is equal to d. In this new approach, we tried to add a fraction of total weights of the neighbors to the initial weight of each vertex. This fraction is the square root of the number of neighbors because the slope of the square root function grows softly by increasing the number of neighbors. Concerning “d” as a depth threshold, the effect of farther neighbors can be applied. However, the greater the distance from g the lower the effect as neighbors weights are divided by d threshold (The d threshold was set to 2 as default). ceRNA network reconstruction and Module Detection The miRNAs that target the key mRNAs were extracted from the TarBase^[206]73 mirTarBase^[207]74 and RAID^[208]75 databases. In addition to the mRNA-miRNA bipartite network obtained from the key mRNAs and abovementioned databases, three other ceRNA-miRNA bipartite networks (lncRNA-miRNA, pseudogenes-miRNA, and circRNA-miRNA) were reconstructed based on the extracted miRNAs and the experimentally validated data from the RAID, lncBase^[209]76 and StarBase^[210]77 databases. The ceRNA-miRNA bipartite networks were merged and reconstructed to form a basic ceRNA-ceRNA interaction network. In this network, the nodes represent ceRNAs (mRNA, lncRNA, circRNA or pseudogene) and two ceRNAs are connected if they have common miRNAs in bipartite networks. All edges with less than three shared miRNAs were removed from the network. The hypergeometric test was applied to all ceRNA-ceRNA interactions based on Eq. [211]3^[212]12: [MATH: pvalue< mo>=1i =0s1(C1i)(TC1C2i)(Tc2) :MATH] 3 where, s is the number of miRNAs shared between ceRNA1 and ceRNA2, c[1] is the number of miRNAs targeting ceRNA1, c[2] is the number of miRNAs targeting ceRNA2 and T is the total number of miRNAs in the human genome. All edges with a hypergeometric test p-value higher than 0.01 were removed from the basic network. Next, the normalized ceRNA count table was used to implement the Pearson correlation test for all interactions in the basic ceRNA-ceRNA interaction network. All edges with a correlation test p-value larger than 0.01 were removed from the network. Since the Pearson correlation test for each interaction was applied based on Ta and T1 samples separately, two final ceRNA-ceRNA interaction networks (Ta ceRNA network and T1 ceRNA network) were obtained for further analysis. Figure [213]9 shows the overall process of our study. Figure 9. [214]Figure 9 [215]Open in a new tab The overall process of our study. Additionally, we reconstructed between stage ceRNA network from Ta and T1 networks using the difference of edge sets. Cytoscape version 3.2^[216]78 was applied to visualize and reconstruct the network differences. The Markov CLuster (MCL) algorithm^[217]79 via the ClusterMaker Cityscape application^[218]80 was used to detect modules in the networks. Enrichment analysis The Gene Ontology (GO), disease and pathway enrichment analyses were performed using the ToppFun web tool^[219]81. Centrality calculation The critical genes in the networks were predicted based on degree and betweenness centrality measures. The CytoNCA^[220]82 tool was applied to calculate centralities. Potential biomarker detection For potential biomarker detection, 5, 10, 15 and 20 hub genes (genes with the highest degree) in the networks were selected. After that, all samples were clustered based on the expression pattern of each chosen gene set (hierarchical clustering with ward.D2 method^[221]83 and 1-Pearson correlation as the distance). The gene sets that significantly separated the Ta and T1 samples based on expression pattern were selected as candidate biomarker sets. To validate biomarkers, we used SVM classifier and 5-fold cross-validation. Consequently, by calculating the Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC) the best gene sets were selected as final potential biomarkers to detect BC in the early stage. The e1071^[222]84, plotROC^[223]85 and ggplot2^[224]86 R packages were used to apply the SVM algorithm and calculate and visualize ROC curves. Supplementary information [225]41598_2019_44944_MOESM1_ESM.docx^ (3.8MB, docx) Supplementary file 1 (Supplementary Figures and Tables) [226]41598_2019_44944_MOESM2_ESM.xlsx^ (286KB, xlsx) Supplementary file 2 (mRNAs Disease enrichment analysis results) [227]41598_2019_44944_MOESM3_ESM.xlsx^ (427.8KB, xlsx) Supplementary file 3 (mRNAs BP enrichment analysis results) [228]41598_2019_44944_MOESM4_ESM.xlsx^ (170.3KB, xlsx) Supplementary file 4 (mRNAs Pathway enrichment analysis results) [229]41598_2019_44944_MOESM5_ESM.xlsx^ (30.7KB, xlsx) Supplementary file 5 (mRNAs PubMed enrichment analysis results) [230]41598_2019_44944_MOESM6_ESM.xlsx^ (39.7KB, xlsx) Supplementary file 6 (DNR-Ta network modules enrichment analysis results) [231]41598_2019_44944_MOESM7_ESM.xlsx^ (58.2KB, xlsx) Supplementary file 7 (DNR-T1 network modules enrichment analysis results) [232]41598_2019_44944_MOESM8_ESM.xlsx^ (46.5KB, xlsx) Supplementary file 8 (PPR-Ta network modules enrichment analysis results) [233]41598_2019_44944_MOESM9_ESM.xlsx^ (49.8KB, xlsx) Supplementary file 9 (PPR-T1 network modules enrichment analysis results) [234]41598_2019_44944_MOESM10_ESM.xlsx^ (117.3KB, xlsx) Supplementary file 10 (MEG3 module enrichment analysis results) Acknowledgements