Abstract Background Autosomal dominant polycystic kidney disease (ADPKD), a common of monogenetic disorder caused by the polycystic kidney disease-1 (PKD1) or PKD2 genes deficiency. In this study, we have re-analyzed a microarray dataset to generate a holistic view of this disease. Methodology [26]GSE7869, an expression profiling dataset was downloaded from the Gene Expression Omnibus (GEO) database. After quality control assessment, using GEO2R tool of GEO, genes with adjusted p-value ≤ 0.05 were determined as differentially expressed (DE). The expression profiles from ADPKD samples in different sizes were compared. Using CluePedia plugin of Cytoscape software, the protein–protein interaction (PPI) networks were constructed and analyzed by Cytoscape NetworkAnalyzer tool and MCODE application. Pathway enrichment analysis of clustered genes by MCODE with the high centrality parameters in PPI networks was performed using Cytoscape ClueGO plugin. Moreover, by Enrichr database, microRNAs (miRNAs) and transcription factors (TFs) targeted DE genes were identified. Results In this study to explore the molecular pathogenesis of kidney in ADPKD, mRNA expression profiles of cysts from patients in different sizes were re-analyzed. The comparisons were performed between normal with minimally cystic tissue (MCT) samples, MCTs with small cysts, and small cysts with large cysts. 512, 7024, and 655 DE genes were determined, respectively. The top central genes, e.g. END1, EGFR, and FOXO1 were identified with topology and clustering analysis. DE genes that were significantly enriched in PPI networks are critical genes and their roles in ADPKD remain to be assessed in future experimental studies beside miRNAs and TFs predicted. Furthermore, the functional analysis resulted in which most of them are expected to be associated with ADPKD pathogenesis, such as signal pathways that involved in cell growth, inflammation, and cell polarity. Conclusion We have here explored systematic approaches for molecular mechanisms assay of ADPKD as a monogenic disease, which may also be used for other monogenetic diseases beside complex diseases to provide suitable therapeutic targets. Keywords: Autosomal dominant polycystic kidney disease, Microarray, Protein interaction network, Signal pathway Background The hereditary autosomal dominant polycystic kidney disease (ADPKD) is the most common monogenic disorder. ADPKD is a multi-systematic disease diagnosed by growing multiple cysts on kidneys. liver cysts and cerebral aneurysms are also the main clinical findings of disease [[27]1]. ADPKD is genetically heterogeneous and results from mutations in at least two genes, Polycystic Kidney Disease-1 (PKD1) or PKD-2 [[28]2]. These genes encode transmembrane proteins, Polycystin-1 (PC-1) and Polycystin-2 (PC-2) which form a functional complex [[29]3]. This protein complex, similar to other proteins are affected in polycystic kidney diseases locate in primary cilia of epithelial and endothelial cells [[30]4]. PC-1 known as a cell surface receptor and PC-2 is a cation channel and both of them play a critical role in controlling of signaling pathways related to proliferation, apoptosis, and cell polarities through Ca2+ homeostasis regulation [[31]5]. In spite of numerous studies related to polycystins functions, their roles are poorly understood. Regarding this major limitation being sensible to recognize the underlying mechanisms, systems biology approaches with a holistic view of the molecular mechanisms of disorders, have the potential to overcome these limitations. These approaches with comprehensive interpretation, using high throughput data extracted from omics data, provide the opportunity to represent the behavior of networks and emerge new therapeutic strategies. Therefore, we re-analyzed the array dataset deposited by Song X et al. which was compared transcription profiling of all samples from PKD1 patients with normal tissue, and gene set enrichment analysis (GSEA) was performed [[32]6]. But here, we have shown large-scale protein interaction networks. For deeply understanding of central genes that related with phenotypes of disease in each step, network and clustering analysis were carried out. These revealed some of the key genes, such as EDN1, EGFR, ARF6, FOXO1, and ITGB5 involved during disease. Pathways were identified with enrichment analysis with the notice on cysts size, from early to late steps. Moreover, for the purpose of assay the regulatory mechanisms of DE genes, microRNAs (miRNAs) and transcription factors (TFs) enriched with DE genes were predicted. Methods Microarray data and DE genes screening Microarray dataset with accession number “[33]GSE7869” from the Gene Expression Omnibus (GEO) database was extracted. The quality of transcriptomics dataset was measured by principal component analysis (PCA) through the ggplot2 package and prcomp function of R [[34]7]. Using GEO2R a web tool of GEO, groups were compared to detect genes that are differentially expressed with cysts growth. Samples of normal tissues (n = 3), minimally cystic tissues (n = 5), small cysts (n = 5), and large cysts (n = 3) were compared based on during the time of disease progression, using Student’s t-test, respectively. Benjamini–Hochberg false discovery rate (FDR) was used for p-value correction. Genes were declared as differentially expressed, had an adjusted p-value less than 0.05. Protein-protein interaction networks construction The protein–protein interaction (PPI) networks were built with DE genes. For networks construction, CluePedia plugin version 1.5.2 [[35]8] of Cytoscape software version 3.7.1 [[36]9] was used. STRING database with confidence cutoff 0.80 was provided, for retrieving interactions [[37]10]. Networks topology was investigated using the NetworkAnalyzer tool of Cytoscape [[38]11]. “Molecular Complex Detection” (MCOD) plugin of Cytoscape detected modules, highly connected sub-networks, based on default settings [[39]12]. Pathway enrichment analysis Functional analysis of genes clustered with MCODE was done by Cytoscape ClueGO plugin version 2.5.2 [[40]13]. Reactome [[41]14] and KEGG (Kyoto Encyclopedia of Genes and Genomes) [[42]15] databases were chosen for retrieving pathways. Bonferroni step down was applied for p-value correction, and signaling pathways with adjusted p-value ≤ 0.05 were determined. miRNA and TF enrichment analysis The microRNAs (miRNAs) and transcription factors (TFs), key regulators of genes, were predicted by Enrichr web server [[43]16]. TargetScan microRNA 2017 and ChEA 2016 libraries were used for miRNA and TF enrichment analysis, respectively. Adjusted p-value less than 0.05 was considered as the significant threshold. The miRNAs with more targeted genes were selected. Results By microarray data analysis, differentially expressed genes were identified The microarray dataset “[44]GSE7869” which includes renal cysts in different sizes; small cysts (SC) less than 1 mm, medium cysts between 10 and 25 mm, and large cysts (LC) greater than 50 mm have been analyzed. Minimally cystic tissues (MCT) obtained from healthy parts of the renal cortex of PKD1 patients were considered as heterozygote samples. In quality assay step except medium cysts, the samples were segregated based on their states (normal tissue, minimally cystic tissue, small cyst, and large cyst), indicate the acceptable quality of this dataset (Fig. [45]1). Using GEO2R tool, we obtained 512, 7024, and 655 genes which are significantly variably expressed between normal vs. MCT samples, MCT vs. SC samples, and SC vs. LC samples, respectively (Additional file [46]1). Interestingly, these sets of DE genes have few overlapping genes (Fig. [47]2a). Fig. 1. [48]Fig. 1 [49]Open in a new tab The quality of the microarray dataset is satisfying. The Principle component analysis results of the [50]GSE7869 dataset were shown the samples were separated appropriately Fig. 2. [51]Fig. 2 [52]Open in a new tab The overlapping of differentially expressed genes and protein–protein interaction networks. The protein- protein interaction networks were built with differentially expressed genes. b: Normal vs. MCT, c: MCT vs. SC, d: SC vs. LC) Protein–protein interaction networks were constructed The PPI networks with DE genes were constructed. Links between genes were selected based on activation, binding, post-translational modification, and inhibition interactions. PPI networks are shown small cyst growth phase is an important and complex step during the progression of the disease. 81, 2737, and 155 nodes (genes) are in PPI networks (normal vs. MCT, MCT vs. SC, and SC vs. LC), respectively (Fig. [53]2b–d). The MCODE application identified protein clusters in networks. These protein complexes and modules are highly interconnected subnetworks with the most effective genes. Network topology were measured based on the graph theory concepts such as degree, betweenness, and closeness centrality. The seed gene with the highest centrality is EDN1 in the early stage, normal vs. MCT comparison. Seed genes such as EGFR, ARF6, WWTR, SMURF2, TGFB2, and HSD17B8 are critical genes in the comparison of MCT with SC. FOXO1, EDN1, and ITGB5 are introduced as central genes in the late stage, SC vs. LC comparison. Some of these genes including EGFR and EDN1 have been recognized related to ADPKD in previous experimental studies [[54]17, [55]18] and other genes are candidates for future studies. The genes are represented in Table [56]1. Table 1. Top clustered genes in the PPI networks. The seed genes with the highest density in PP networks are shown GOID Betweenness centrality Closeness centrality Degree GOID Betweenness centrality Closeness centrality Degree Normal vs. MCT HSF1 0.002468 0.285681 17 EDN1 0.412698 0.313043 3 NRCAM 0.001105 0.25658 13 MCT vs. SC PRMT5 0.000581 0.296164 17 EGFR 0.017832 0.371711 149 POLR3C 3.92E-05 0.265655 17 RPL35 5.85E-05 0.30435847 73 SEMA3A 3.9E-05 0.28154 16 COPA 0.00661 0.299735 54 TAF9B 0.00086 0.274302 15 PMF1 1.41E-05 0.309089 49 PIAS4 0.000677 0.299916 15 ARF6 0.006571 0.317375 45 THRB 0.000456 0.312586 13 FBXW11 0.002383 0.324162 44 DCAF7 0.000292 0.276992 12 SMURF2 0.003544 0.323445 40 EIF1 1.64E-05 0.240752 12 WWTR1 0.004166 0.318269 35 GSN 0.004329 0.284374 11 CACNB2 0.001283 0.313691 34 OCLN 0.001005 0.269397 11 TBL1X 0.001139 0.317051 33 AES 7.63E-05 0.308475 11 CUL3 0.001145 0.302286 28 ATP6 3.7E-07 0.186567 11 RPP30 0.000402 0.250681 28 ALDOA 0.001615 0.264891 4 AP1M2 5.86E-05 0.281063 26 HSD17B8 0.666667 1 3 TGFB2 0.002654 0.314127 25 SC vs. LC PHAX 0.000336 0.294655 23 NCBP1 0.153881 0.26259 17 FABP4 0.000212 0.297939 23 KDELR2 0.01875 0.533333 7 SEC23A 0.001501 0.258339 22 FOXO1 0.152968 0.264493 5 CXCL9 6.9E-05 0.269514 21 EDN1 0.121988 0.302905 7 ND2 5E-05 0.215126 21 PSMA5 0.066667 0.769231 7 CTF1 0.000322 0.267398 19 ITGB5 0.134354 0.25 5 CITED2 9.8E-05 0.289845 19 PPIE 0.105023 0.216617 7 [57]Open in a new tab Pathway enrichment analysis was performed Functional analysis was carried out based on genes detected by MCODE. Using pathway enrichment analysis from 18, 1318, and 66 genes, we reached to 7, 113, and 39 pathways, respectively (Fig. [58]3). Interestingly, the GoTerms are informative and related to the phenotype of each step, such as collecting duct acid secretion in early step. An interesting finding in this study was the detection of critical pathways and functions such as EGF, Wnt, MAPK, HIF, P53, CFTR, AMPK, PDGF, NFκB, IGF1, MET signaling, oxidative phosphorylation, energy metabolism, cell–cell and cell–matrix interaction, and signaling by interleukins which were previously shown to be associated with ADPKD in experimental studies [[59]19–[60]21], and other pathways could consider for more studies and validation. Fig. 3. [61]Fig. 3 [62]Open in a new tab Pathway enrichment analysis of clustered genes. Functional analysis showed interconnected and informative pathways mainly are associated with renal cystic growth (a Normal vs. MCT, b MCT vs. SC, and c SC vs. LC). The significance of pathways is labeled based on the color code. The number of mapped genes in each path is shown according to the size of nodes miRNAs and TFs enriched with DE gene were determined The miRNAs and TFs as important regulators of DE genes were predicted. HNF4A, ESR1, and RXR were defined as TFs in the initial step, in normal vs. MCT comparison. TFs were significant in the small and large cyst growth steps are shown in Table [63]2. The top miRNAs enriched with DE genes in each phase are shown in Fig. [64]4. Previous studies reported the association of ADPKD with some of TFs e.g. HNF4A, STAT3, VDR, TP53, and HIF1A [[65]6, [66]20, [67]22, [68]23]. Also, the role of miR-17 family and miR-192 in cyst enlargement were identified [[69]24, [70]25]. It is valuable to investigate other miRNAs and TFs in experimental studies. Table 2. Transcription factor enrichment analysis Normal vs. MCT MCT vs. SC SC vs. LC HNF4A CLOCK ESR1 BRD4 VDR ESR1 ZNF217 TCF21 NERF2 AR RXR NUCKS1 ESR2 CEBPD PPARG RELA NR1H3 TOP2B KDM5B WT1 OLIG2 SMARCA4 TP53 PAX3-FKHR SOX9 STAT3 MYB CEBPD FOXA1 TP63 KDM5A SMARCA4 AR EBF1 EKLF FOXA2 KLF4 PAX3-FKHR FOXA1 PRDM5 MYB SOX2 CEBPB SOX2 CTNNB1 ZNF217 PCGF2 HNF4A KLF6 NFE2L2 NFE2L2 P300 NRF2 NRF2 VDR SMAD4 SMAD4 HNF4A [71]Open in a new tab TFs were obtained with adjusted p-value < 0.05 Fig. 4. [72]Fig. 4 [73]Open in a new tab miRNA enrichment analysis results. The top of the miRNA were predicted. An adjusted p-value less than 0.05 was considered as the significant cut-off Discussion ADPKD caused by mutations in PKD1 or PKD2 genes [[74]2] and the protein products of these genes, polycystin-1 and polycystin-2 act as a mechanosensor on the surface of epithelial and endothelial cells [[75]4]. The loss and gain of function of these proteins, leading to dysregulation of pathways related to proliferation, apoptosis, and polarity of cells [[76]5]. Despite many studies indicated the functions of the polycystins, the numerous ambiguities remain about the molecular mechanisms of the disease progression. For the importance of time series analysis of diseases [[77]26], the purpose of this study was the computational analysis of the expression profile of renal cysts that were compared based on different sizes of cysts. Bioinformatics methods were performed in this study showed that 512, 7024, and 655 DE genes, respectively dysregulated in each step. The PPI networks were shown nodes and their interactions became more complex with the progression of disease in small cyst growth. The topology and clustering analysis of networks were employed for revealing candidate genes with high centrality as therapeutic targets. Nodes (genes) with high degree, they have many connections and are important for the networks. Betweenness centrality is based on the number of shortest paths going through a node and are shortcuts of the networks. Also, closeness centrality calculated physically nearest genes to all nodes [[78]27]. Modules are high density regions in the network and identify functional genes [[79]12]. The role of some of these genes has been well documented in ADPKD such as EDN1 as a vasoconstrictor may promote tumorigenesis and recent studies have documented that an increase in serum endothelin levels is associated with renal pathogenesis of ADPKD. Also, polymorphisms of EDN1 can influence the age of onset of end-stage renal disease in ADPKD [[80]18, [81]28]. EGFR promotes cell growth, proliferation, and cell survival and has important functions in the progression of ADPKD [[82]17]. Other genes introduced as applicant genes for future studies are ARF6, SMURF2, WWTR1, CACNB2, and FOXO1. ARF6 is a member of the RAS superfamily that regulates signaling pathways related to actin remodeling such as wnt path, the central pathway in ADPKD [[83]26]. SMURF2 controls cell migration with BMP and TGFβ signaling pathways [[84]29]. WWTR1 acts as a transcriptional coactivator downstream of the Hippo signaling pathway that plays a major function in the control of organ size [[85]30]. Ablation of CACNB2 leads to calcium homeostasis derivation and could have a critical role in the initiation and progression of the disease. Previous studies showed that mutation in the PKD1 leads to higher glycolysis in ADPKD kidneys. FOXO1 through insulin signaling plays a main role in glucose metabolism and consequently involved in ADPKD pathogenesis [[86]31, [87]32]. Also, ITGB5 contributes to cell adhesion and known as a biomarker in kidney disease [[88]33]. The mechanisms of the newly introduced crucial genes such as PPIE remain to be identified with experimental studies. We pointed out TFs such as HNF4A, STAT3, VDR, TP53, and HIF1A associated with ADPKD [[89]22, [90]23]. In addition, other TFs as CLOCK in ADPKD pathogenesis firstly are described in this study. Since CLOCK involved in kidney function, confirmation its role in ADPKD can get interesting results [[91]34]. Functional analysis was shown that the pathways are correlated with the phenotype of disease in each step including pathways involved in cell proliferation, apoptosis, and inflammation. The roles of some of the pathways have determined in ADPKD pathogenesis [[92]19, [93]20]. Conclusions Here by computational tools we generate a systematic view of the ADPKD to explore the comprehensive molecular mechanisms of a monogenic disease. Methods employed in this study may also be used for each monogenic disorder to reach novel therapeutic targets. Also, the necessity of holistic maps assay of monogenetic disease besides complex disease is desired. Supplementary information [94]Additional file 1. ^(649.8KB, xlsx) Acknowledgements