Abstract Human clear cell renal cell carcinoma (ccRCC) is the most common solid lesion within kidney, and its prognostic is influenced by the progression covering a complex network of gene interactions. In our study, we screened differential expressed genes, and constructed protein-protein interaction (PPI) network and a weighted gene co-expression network to identify key genes and pathways associated with the progression of ccRCC (n = 56). Functional and pathway enrichment analysis demonstrated that upregulated differentially expressed genes (DEGs) were significantly enriched in response to wounding, positive regulation of immune system process, leukocyte activation, immune response and cell activation. Downregulated DEGs were significantly enriched in oxidation reduction, monovalent inorganic cation transport, ion transport, excretion and anion transport. In the PPI network, top 10 hub genes were identified (TOP2A, MYC, ALB, CDK1, VEGFA, MMP9, PTPRC, CASR, EGFR and PTGS2). In co-expression network, 6 ccRCC-related modules were identified. They were associated with immune response, metabolic process, cell cycle regulation, angiogenesis and ion transport. In conclusion, our study illustrated the hub genes and pathways involved in the progress of ccRCC, and further molecular biological experiments are needed to confirm the function of the candidate biomarkers in human ccRCC. Keywords: clear cell renal cell carcinoma (ccRCC), differentially expressed genes (DEGs), biomarker, weighted gene co-expression network analysis (WGCNA), protein-protein interaction (PPI) Introduction Renal cell carcinoma (RCC), which makes up approximately 3% of all adult malignancies and 90-95% of adult kidney neoplasm, is being diagnosed with increasing incidence and mortality rates worldwide [45]^1. Clear cell renal cell carcinoma (ccRCC) is the most common tumor in kidney. However, its prognostic is influenced by the progression covering a complex network of gene interactions. As the most common pathological type of renal cancer, apart from surgery, it is both radiotherapy and chemotherapy resistant. At present, biomarkers for early detection and follow-up of the disease are not available. Therefore, targeted therapies are the best choices of non-surgical treatment for ccRCC [46]^2. Many studies revealed that targeted therapies, such as multipotase inhibitors, anti-VEGF antibodies and mTOR, had been approved for clinical use [47]^3. However, identification of novel therapeutic targets or biomarkers for prognostic, diagnostic or predictive use remains a priority. Currently, gene expression profiles have been used to identify genes associated with progression of renal cancer [48]^4^-[49]^7. Meanwhile, some researchers also used integrated approach to screen changes in renal carcinogenesis [50]^8^, [51]^9. However, most studies just focused on the screening of differentially expressed genes and ignored the high degree of interconnection between genes, although genes with similar expression patterns may be functionally related. Weighted gene expression network analysis (WGCNA) is a systems biology method for describing the correlation patterns among genes across microarray or RNA Sequence data, and it is an algorithm used for finding clusters (modules) of highly correlated genes as well as identifying phenotype related modules or genes clusters [52]^10. Nowadays, there are various studies revealing the phenotypes-related genes via WGCNA method, especially in cancers [53]^11^, [54]^12. For example, Zhou et al. found that TOP2A could be used as the potential prognostic and progression biomarker for pancreatic ductal adenocarcinoma [55]^12. Wang et al. revealed that ASPM could cause cirrhosis and eventually led to hepatocellular carcinoma [56]^13. In this study, we attempt to firstly screen differential expressed genes, construct protein-protein interaction networks and a co-expression network of relationships between genes through a systematic biology method based on WGCNA and to identify key genes and pathways participating in the carcinogenesis of ccRCC [57]^13^, [58]^14. Materials and Methods Data collection Gene expression profile was downloaded from Gene Expression Omnibus (GEO) database ([59]http://www.ncbi.nlm.nih.gov/geo/). Dataset [60]GSE53000 performed on Affymetrix Human Gene 1.0 ST Array [transcript (gene) version] (Affymetrix, Santa Clara, CA, USA) was used to screen differential expressed genes, construct protein-protein interaction networks and co-expression networks and identify hub genes and pathways in this study. This dataset included 56 clear cell renal cell carcinoma samples (including 2 lymph node metastasis samples and 1 venous thrombus samples) and 6 normal kidney samples. Study design and data preprocessing The study design was performed in a flow diagram (Fig. [61]1). Raw expression data were calculated following the pre-processing procedures: RMA background correction, log2transformation, quantile normalization and median polish algorithm summarization using the “affy” R package. Probes were annotated by the Affymetrix annotation files. Microarray quality was assessed by sample clustering according to the distance between different samples in Pearson's correlation matrices. No samples were removed from subsequent analysis in the test dataset (Fig. [62]2A). Figure 1. [63]Figure 1 [64]Open in a new tab Flow diagram of study. Data preparing, processing and analysis was shown in the flow diagram. Figure 2. [65]Figure 2 [66]Open in a new tab Samples clustering and identification of differentially expressed genes (DEGs) in ccRCC tissues. (A) Samples clustering of [67]GSE53000 to detect outliers. (B) The volcano plot of all DEGs. Differentially expressed genes (DEGs) screening We use the “limma” R package to screen the DEGs between ccRCC samples and normal kidney samples. The FDR (false discovery rate) < 0.05 and |log[2]fold change (FC)| > 1 were chosen as the cut-off criteria (Fig. [68]2B). Furthermore, we chose the same condition and screened differentially expressed genes between ccRCC samples and metastasis samples. Functional and pathway enrichment analysis The Database for Annotation, Visualization and Integrated Discovery (DAVID) ([69]http://david.abcc.ncifcrf.gov/) is an online program providing a comprehensive set of functional annotation tools for investigators to understand biological meaning behind large list of genes [70]^15. Enriched biological themes of DEGs, particularly GO terms and visualization of those on KEGG pathway maps were performed using DAVID database, p < 0.05 was set as the cut-off criterion. PPI network construction We used Search Tool for the Retrieval of Interacting Genes (STRING) Database (STRING) ([71]http://www.string-db.org/) to assess protein-protein interaction (PPI) information [72]^16. In addition, to evaluate the interrelationships of DEGs, we used STRING database for analysis and Cytoscape software for visualization [73]^17. Confidence score > 0.4 was set as significant. Hub module selection and functional analysis We used plug-in Molecular Complex Detection (MCODE) to select hub modules of PPI network in Cytoscape [74]^18. Meanwhile, degree = 5, node score = 0.2, k-core = 2, and max. depth = 100 were used as cut-off criteria. Then we uploaded the genes in hub module to DAVID database to perform functional analysis. Co-expression network construction and module functional analysis Firstly, expression data profile of DEGs was tested to check if they were the good samples and good genes. Then, we used the “WGCNA” package in R to construct co-expression network for the DEGs [75]^19^, [76]^20. At first, the Pearson's correlation matrices were both performed for all pair-wise genes. Then, a weighted adjacency matrix was constructed using a power function a[mn] = |c[mn]|^β (c[mn] = Pearson's correlation between gene m and gene n; a[mn] = adjacency between gene m and gene n). β was a soft-thresholding parameter that could emphasize strong correlations between genes and penalize weak correlations. Next, the adjacency was transformed into topological overlap matrix (TOM), which could measure the network connectivity of a gene defined as the sum of its adjacency with all other genes for network generation. To classify genes with similar expression profiles into gene modules, average linkage hierarchical clustering was conducted according to the TOM-based dissimilarity measure with a minimum size (gene group) of 50 for the genes dendrogram. To further analyze the module, we calculated the dissimilarity of module eigengenes, chose a cut line for module dendrogram and merged some modules, after which we performed GO enrichment analysis and KEGG pathway enrichment analysis on gene modules to characterize modules related to ccRCC. Hub genes and pathway validation Here, we chose genes with the most connectivities as hub genes and used TCGA KIRC data to perform validation using GEPIA database [77]^21. Common pathways in all modules were chosen to perform validation on both TCGA data and patients tissues. Preparation for human ccRCC samples The ccRCC and paracancerous tissues samples were collected from patients after surgery at Zhongnan Hospital of Wuhan University. The histology diagnosis was confirmed by two pathologists independently. The ccRCC and paracancerous tissues were immediately frozen and stored in liquid nitrogen or fixed in 4 % PFA after collection. The study using ccRCC and paracancerous tissue samples for total RNA isolation and qRT-PCR analysis was approved by the Ethics Committee at Zhongnan Hospital of Wuhan University (approval number: 2015029). Informed consent was obtained from all subjects. Total RNA isolation Total RNA from ccRCC tissues were isolated using RNeasy Mini Kit (Cat. #74101, Qiagen, Germany) according to the manufacturer's instruction. DNase I digestion (Cat. #79254, Qiagen, Germany) was used in each RNA preparation to remove genomic DNA. After that, total RNA quantity was measured using NanoPhotometer (Cat. #N60, Implen, Germany). Quantitative real time PCR (qRT-PCR) The cDNA was synthesized using 1 µg of total RNA isolated from patients tissues by ReverTra Ace qPCR RT Kit (Toyobo, China) and qRT-PCR was performed using 400 ng cDNA per 25 μl reaction. Each reaction was conducted with iQTM SYBR^® Green Supermix (Bio-Rad, China) using 400 or 500 ng of cDNA in a final volume of 25 µl. Primers sequences and annealing temperature were summarized in Supplementary Tab. S1. Statistical analyses All analyses were performed three times and represent data from three individual experiments. Two-tailed Student's t-test was used for significance of differences between subgroups. Statistical analyses were performed with SPSS 16.0. Statistical significance was set at probability values of p < 0.05. Results Identification of DEGs in ccRCC tissues The gene expression profiling of [78]GSE53000 including 56 ccRCC tissues and 6 normal kidney tissues were analysed. Using “limma” package of R software, selecting p < 0.05 and |log[2]fold change (FC)| > 1 as the cut-off criteria, 1175 DEGs were identified, of which, 533 were upregulated and 642 were downregulated (Supplementary Tab. S2). The volcano plot of all DEGs is shown in Fig. [79]2B. Meanwhile, we screened 11 genes differentially expressed in metastasis samples compared with primary ccRCC (Supplementary Tab. S3). Functional enrichment analysis of DEGs To obtain further insight into the function of DEGs of ccRCC, the up- and downregulated DEGs were respectively uploaded to the DAVID database. GO analysis results showed as to biological process (BP), the upregulated DEGs significantly enriched in response to wounding, positive regulation of immune system process, leukocyte activation, immune response and cell activation, and the downregulated DEGs significantly enriched in oxidation reduction, monovalent inorganic cation transport, ion transport, excretion and anion transport. For molecular function (MF), the upregulated DEGs significantly enriched in protein homodimerization activity, protein dimerizetion activity, phospholipid binding, kinase binding and identical protein binding, and downregulated DEGs significantly enriched in sodium ion binding, cofactor binding, alkali metal ion binding, coenzyme binding and anion transmembrane transporter activity. About cellular component, the upregulated DEGs significantly enriched in intrinsic to plasma membrane, integral to plasma membrane, cell surface, plasma membrane and external side of plasma membrane, and downregulated DEGs significantly enriched in plasma membrane part, apical plasma membrane, apical part of cell, plasma membrane and basolateral plasma membrane (Fig. [80]3A-B). These significantly enriched GO terms could help us a lot to further understand the role of DEGs played in ccRCC development and progression. Figure 3. [81]Figure 3 [82]Open in a new tab GO analysis of DEGs. (A) upregulated DEGs with fold change > 2. (B) downregulated DEGs with fold change < -2. BP: biological process, CC: cellular component, MF: molecular function. KEGG pathway enrichment analysis of DEGs KEGG pathway analysis was performed for all DEGs, and we found that complement and coagulation cascades, primary immunodeficiency, cell adhesion molecules (CAMs), glycolysis / gluconeogenesis and glycine, serine and threonine metabolism were most significantly enriched (Fig. [83]4). Figure 4. [84]Figure 4 [85]Open in a new tab KEGG enrichment analysis of all DEGs with |fold change| > 2. All differentially expressed genes (DEGs) were analysed by KEGG enrichment. Fold change > 2 was set as cut-off value. PPI network construction Based on the string profile obtained from STRING analysis tool (Supplementary Tab. S3), the PPI network of DEGs consisted of 949 nodes and 4774 edges, including 367 upregulated genes and 414 downregulated genes. We considered the top 10 DEGs with high degree of connectivity as the hub genes of ccRCC (TOP2A, MYC, ALB, CDK1, VEGFA, MMP9, PTPRC, CASR, EGFR and PTGS2), which may play a critical role in tumorogenesis and proliferation. Hub module selection and validation Degree cut-off = 5, node score cut-off = 0.2, k-core = 2, and max. depth = 100 as the criterion, top 3 significant modules were selected by using plug-in MCODE. Gene Ontology analysis of each module was performed in DAVID database (Fig. [86]5). Figure 5. [87]Figure 5 [88]Open in a new tab Module analysis of PPI network. (A) Module rank 1. (B) GO enrichment analysis of module rank 1. (C) Module rank 2. (D) GO enrichment analysis of module rank 2. (E) Module rank 3. (F) GO enrichment analysis of module rank 3. Weighted co-expression network construction and analysis We used “WGCNA” package in R, after the quality assessment for expression data matrix of [89]GSE53000, select the power of β = 6 (scale free R^2 = 0.92) to ensure a scalefree network (Fig. [90]6A-D). After putting the DEGs with similar expression patterns into modules by average linkage clustering, a total of 7 modules were identified (Fig. [91]6E). Except grey module, Gene Ontology was performed for other 6 modules so as to explore the underlying biological process correlated to ccRCC. In Fig. [92]7, we could find that DEGs in blue module significantly enriched in hexose biosynthetic process, oxidation reduction and monosaccharide biosynthetic process. In brown module, DEGs were enriched in immune response, defense response and inflammatory response. In green module, DEGs were enriched in M phase, nuclear division and mitosis. In red module, DEGs were enriched in blood vessel morphogenesis, blood vessel development and vasculature development. In turquoise module, DEGs were enriched in ion transport, monovalent inorganic cation transport and cation transport. In yellow module, DEGs were enriched in immune response, T cell activation and lymphocyte activation. In Fig. [93]8, we found the common pathways related to renal carcinoma: PI3K/AKT pathway, Rap1 pathway, NF-kappa B pathway and insulin associated pathways. Figure 6. [94]Figure 6 [95]Open in a new tab Determination of soft-thresholding power in the weighted gene co-expression network analysis (WGCNA). (A) Analysis of the scale-free fit index for various soft-thresholding powers (β). (B) Analysis of the mean connectivity for various soft-thresholding powers. (C) Histogram of connectivity distribution when β = 6. (D) Checking the scale free topology when β = 6. (E) Dendrogram of all differentially expressed genes clustered based on a dissimilarity measure (1-TOM). Figure 7. [96]Figure 7 [97]Open in a new tab GO enrichment analysis of 6 genes modules. (A) blue module, (B) brown module, (C) green module, (D) red module, (E) turquoise module and (F) yellow module. Figure 8. [98]Figure 8 [99]Open in a new tab KEGG pathway enrichment analysis of 6 genes modules. (A) blue module, (B) brown module, (C) green module, (D) red module, (E) turquoise module and (F) yellow module. Hub genes and pathway validation In Supplementary Fig. [100]S1-2, we performed validation of hub genes using TCGA KIRC data and microarray data ([101]GSE53000). We found that CASR and PTGS2 were tumor suppressors in both microarray and RNA-Seq data; CDK1, EGFR, MMP9, MYC, PTPRC, TOP2A and VEGFA were oncogenes in the common expression data. To perform the pathway validation, we chose the common pathway related to renal carcinogenesis and validated the key molecules using qRT-PCR. We found PIK3CD, PIK3CG and NFKB2 were upregulated in tumor samples and PIK3CB, AKT1, AKT2, AKT3, RAP1A and RAP1B were downregulated in tumor samples (Fig. [102]9). To further validate the pathway, we use TCGA KIRC data to perform the validation as well (Fig. [103]10). And we found that there were 7 genes deregulated in common validation sets (PIK3CB, PIK3CD, PIK3CG, AKT2, AKT3, NFKB2, RAP1A, Fig. [104]11). Figure 9. [105]Figure 9 [106]Open in a new tab Pathway validation using qRT-PCR analysis. (A) PIK3CB, (B) PIK3CD, (C) PIK3CG, (D) AKT1, (E) AKT2, (F) AKT3, (G) NFKB2, (H) RAP1A and (I) RAP1B. Figure 10. [107]Figure 10 [108]Open in a new tab Pathway validation using TCGA KIRC data. (A) PIK3CB, (B) PIK3CD, (C) PIK3CG, (D) AKT1, (E) AKT2, (F) AKT3, (G) NFKB2, (H) RAP1A and (I) RAP1B (* p < 0.05; ** p < 0.01; *** p <0.001). Figure 11. [109]Figure 11 [110]Open in a new tab Venn plot of common deregulated genes. Blue: downregulated genes; red: upregulated genes. Discussion Clear cell renal cell carcinoma is biologically heterogeneous and has variable clinical courses, therefore, it is essential to understand the molecular mechanism for better diagnosis and treatment of ccRCC. In this study, we investigated the gene expression profile of [111]GSE53000, including 56 clear cell renal cell carcinoma samples (including 2 lymph node metastasis samples and 1 venous thrombus samples) and 6 normal kidney samples to explore the molecular mechanism of ccRCC and find some biomarkers, which might be helpful therapeutic targets by using bioinformatics analysis. In this study, results showed that expressions of total 1175 genes were altered between normal kidney tissues and ccRCC tissues at FDR < 0.05. Among the 1175 DEGs, 533 were upregulated and 642 were downregulated. PPI network analysis and WGCNA analysis were performed to identify protein-protein interactions and gene co-expression modules related with the clinical features of ccRCC. In addition, functional and pathway analysis were also performed to find ccRCC-related biological process and pathways. According to the GO analysis of DEGs, we found that upregulated DEGs were significantly enriched in immune response, cell activation, leukocyte activation and positive regulation of immune system process; downregulated DEGs were significantly enriched in ion transport, monovalent inorganic cation transport, cation transport and anion transport. Giraldo NA et al. investigated that the infiltration and the localization of DC, and the expression of immune checkpoints (PD-1, LAG-3, PD-L1, and PD-L2) in relation with prognosis in the tumor microenvironment modulated the clinical impact of CD8(+) T cells in ccRCC [112]^22. In addition, Balan M et al. also reported that c-Met can promote increased survival of renal cancer cells through the regulation of HO-1 and PD-L1 [113]^23. Ciarimboli G et al. found that OCT2 played a decisive role in the renal secretion of creatinine and the process could be inhibited by OCT2 substrates [114]^24. Pochini L et al. also discovered that OCTN cation transporters were associated with several pathologies [115]^25. Meanwhile, KEGG pathway analysis revealed those glycolysis/gluconeogenesis and glycine, serine and threonine metabolisms were significantly enriched. Many studies illustrated that carcinogenesis could have very closely correlation with metabolism [116]^26^-[117]^28. As to renal carcinoma, significant progress had been made to understand the metabolic derangements present, which had been derived through translational, in vitro, and in vivo studies. So far, von Hippel-Lindau (VHL) loss was the well-characterized metabolic features linked to renal cancer [118]^29. And several metabolic pathways were altered, including glycolysis and oxidative phosphorylation due to VHL loss and the influence caused by increasing expression of hypoxia-inducible factor [119]^30^-[120]^35. Furthermore, protein-protein interaction network analysis demonstrated that TOP2A, MYC, ALB, CDK1, VEGFA, MMP9, PTPRC, CASR, EGFR and PTGS2 had the highest degree of connectivity among DEGs. TOP2A and CDK1 played an important role in regulation of cell cycle, which might modulate the tumor proliferation [121]^36^-[122]^38. MYC, as an oncogene, encoded a transcription factor, triggered selective gene expression amplification to promote cell growth and proliferation [123]^39. Shroff EH et al. demonstrated that MYC overexpression causes RCC and pointed to the inhibition of glutamine metabolism [124]^40. ALB was reported to be an independent prognostic factor for patients with mRCC treated with angiogenesis-targeted therapy [125]^41. VEGFA, as a member of PDGF/VEGF growth factor family, induced proliferation and migration of vascular endothelial cells, and was essential for both physiological and pathological angiogenesis, which played a vital role in renal carcinogenesis [126]^42^, [127]^43. MMP9 (matrix metalloproteinase-9) was reported to have a strong correlation with tumor invasion and migration [128]^44^-[129]^46. PTPRC, as an essential regulator of T- and B-cell antigen receptor signaling, might influence tumor proliferation via immune regulation [130]^47. CASR was a robust promoter of differentiation in colonic epithelial cells and functions as a tumor suppressor in colon cancer [131]^48. EGFR, upregulated in ccRCC, was a receptor tyrosine kinase involved in many important aspects of cell biology that are related to tumorigenesis [132]^49^-[133]^51. PTGS2 was reported to be an prognostic biomarker for poor clinical outcome of upper tract urothelial cancer [134]^52. WGCNA analysis found six modules with highly relevant expression pattern. Then Gene Ontology enrichment analysis was performed to explore the biological process of each module. Blue module including 244 DEGs was involved in glycometabolic process such as hexose biosynthetic process, monosaccharide biosynthetic process and hexose metabolic process. Many studies had mentioned the close relationship between metabolism and renal carcinoma [135]^53^, [136]^54. Brown module including 199 DEGs was associated with immune response. Meanwhile, yellow module including 182 DEGs also participated in immune response, which revealed the correlation between renal cancer and immune regulation. 125 DEGs in green module were enriched in cell cycle regulation, playing a critical role in tumor proliferation. Red module including 71 DEGs was closely related to angiogenesis and vascular repair, relating to the prognosis of ccRCC. Turquoise module including 326 DEGs has a strong correlation to ion transport, which might participated in the tumor migration and invasion. In conclusion, by using a series of bioinformatics analysis, we have illustrated the hub genes and pathways which may be involved in the progress of ccRCC, based on differentially expressed genes between ccRCC samples and normal kidneys. However, further molecular biological experiments are needed to confirm the function of the candidate biomarkers in ccRCC. Supplementary Material Supplementary figures and tables. [137]Click here for additional data file.^ (9.8MB, pdf) Acknowledgments