Abstract Although it is well known that smoking is one of pathogenesis of clear cell renal cell carcinoma (ccRCC), the underlying molecular mechanism is still unclear. In our study, the microarray dataset [39]GSE46699 is analyzed by weighted gene co-expression network analysis (WGCNA). Then we identify 15 co-expressed gene modules in which the lightcyan module (R^2 = 0.30) is the most significant. Combined with the protein-protein interaction (PPI) network and WGCNA, two hub genes are identified. Meanwhile, linear regression analyses indicate that TOP2A has a higher connection with smoking in ccRCC, survival analysis proved that overexpression of TOP2A in ccRCC could lead to shorter survival time. Furthermore, bioinformatical analyses based on [40]GSE46699 and [41]GSE2109 as well as qRT-PCR experiment show similar results that TOP2A is significantly up-regulated in smoking ccRCC compared to non-smoking ccRCC samples. In addition, Functional analysis, pathway enrichment analysis and gene set enrichment analysis (GSEA) indicate that high expression of TOP2A is related to cell cycle and p53 signaling pathway in ccRCC samples. Moreover, in vitro experiments revealed that TOP2A induced cell cycle arrest at G2 phase and proliferation inhibition via p53 phosphorylation. Taken together, by using WGCNA, we have identified a novel biomarker named TOP2A, which could affect the development of smoking-related ccRCC by regulating cell cycle and p53 signaling pathway. Keywords: weighted gene co-expression network analysis (WGCNA), TOP2A, biomarker, smoking, clear cell renal cell carcinoma (ccRCC) Introduction Renal cell carcinoma is the third most frequent urological malignant neoplasms worldwide, among which, clear cell renal cell carcinoma (ccRCC) accounts for approximately 80-90% [42]^1. It is estimated that about 14,400 deaths and 63,990 new cases associated with renal cell carcinoma occur during 2017 in the U.S [43]^2. Smoking, hypertension, and obesity are proven to be the high risk of renal cell carcinoma [44]^3^,[45]^4. Among these, smoking increases renal cell carcinoma risk by 54% in men and by 22% in women [46]^5. Therefore, it is urgent to identify the molecular mechanism of ccRCC associated with smoking to understand the pathogenesis of smoking-related ccRCC and to find a novel biomarker for early diagnosis and treatment. The breakthrough in sequencing of cancer genomes make great contribute to reveal the potential pathological mechanisms in various cancers [47]^6. However, few studies concentrate on the interrelated genes which may have similar function. Weighted gene co-expression network analysis (WGCNA) is a system biology approach which has been widely applied to identifying the hub genes associated with clinical features in various cancer [48]^7^,[49]^8. According to the gene expression profiles, WGCNA can construct a gene co-expression network and divide the co-expressed genes into different modules. After that, WGCNA can find the most related module with clinical traits and identify the most central genes among it [50]^9. In this study, we utilized WGCNA to identify hub genes of ccRCC associated with smoking, which may decipher the potential mechanisms between smoking and ccRCC. Material and Methods Study design and collection of microarray data First, we designed a flow chart to understand the whole process of the study (Figure [51]1A). The raw data of [52]GSE46699 was obtained from the Gene Expression Omnibus database ([53]http://www.ncbi.nlm.nih.gov/geo/) and was annotated according to the Affymetrix Human Genome U133 Plus 2.0 Array platform (Affymetrix, Santa Clara, CA, USA). This dataset contained 130 samples, including 40 non-smoking normal samples, 44 non-smoking tumor samples, 23 smoking normal samples, 23 smoking tumor samples. Moreover, another two datasets [54]GSE2109 and [55]GSE10072 were downloaded to validate the results. The description of [56]GSE46699, [57]GSE2109 and [58]GSE10072 were listed in Table [59]1. Figure 1. [60]Figure 1 [61]Open in a new tab Description of the study. (A) Flow chart of this study. (B) Sample dendrogram and cigarette smoking heatmap ([62]GSE46699). Table 1. The description of [63]GSE46699, [64]GSE2109 and [65]GSE10072. GEO Dataset Platform Tumor type Total samples Non-smoking samples Smoking samples Normal samples Tumor samples [66]GSE46699 Affymetrix Human Genome U133 Plus 2.0 Array ccRCC 130 84 46 63 67 [67]GSE2109 Affymetrix Human Genome U133 Plus 2.0 Array ccRCC 283 140 143 - 283 [68]GSE10072 Affymetrix Human Genome U133A Array LUAD 107 31 76 49 58 [69]Open in a new tab Data preprocessing Robust Multiarray Averaging (RMA) background correction, log2 transformed and quantile normalization were sequentially utilized to process the raw expression data. Then, the ''affy'' R package and Affymetrix annotation files were used to summarize and annotate the probes respectively. Besides, sample clustering was applied to evaluate the quality of [70]GSE46699 (Figure [71]1B). Test sets [72]GSE2109 and [73]GSE10072 were performed the same data preprocessing. Screening for differentially expressed genes The limma R package, based on empirical Bayes methods and linear models, was used to find DEGs between smoking and non-smoking ccRCC samples. The DEGs threshold was set at a false discovery rate (FDR) < 0.05 and |log2 fold change (FC)| > 0.585 to be further analyzed. Constructing gene co-expression network Before we utilized the “WGCNA” package to construct the scale-free gene co-expression network [74]^10^,[75]^11, we first verified the qualification of DEGs. Then, Pearson's correlation matrices were conducted and a weighted adjacency matrix were performed by a formula amn = |cmn|^β (cmn represents Pearson's correlation between genes, amn represents adjacency between genes, the β parameter could magnify the correlation between genes). An appropriate power of β was chosen on basis of standard scale-free networks. Subsequently, we transformed the adjacency into topological overlap matrix (TOM) [76]^12 and identified modules including similar genes by hierarchically clustering genes [77]^13. Discovering module of interest We first defined module eigengenes (MEs) as the most principal component and summarize all genes into a single characteristic expression profile. Then, the interested module was identified by calculating the relevance between MEs and smoking. After that, the log10 transformation of the P value was defined as gene significance (GS) and the average GS for all genes in the module was defined as the module significance (MS). Finally, the module with the highest MS score was chosen as the one related to smoking. Identification and validation of hub genes Genes in the interested module with module membership > 0.8 and gene significance > 0.2 were defined as hub gene. Then, we constructed protein-protein interaction (PPI) networks and screen hub nodes through mapping all genes to the Cytoscape v3.4.0 software ([78]http://cytoscape.org/) [79]^14 and the Search Tool for the Retrieval of Interacting Genes (STRING, [80]http://www.string-db.org/) [81]^15^,[82]^16. To screen the most important hub gene, we performed linear regression analysis to observe the relationship between the hub gene and smoking. In orders to further validate the results, we used [83]GSE2109 and [84]GSE10072 datasets to check the relevance between the hub gene and smoking. Furthermore, The Gene Expression Profiling Interactive Analysis (GEPIA, [85]http://gepia.cancer-pku.cn/) was used to validate the expression and prognosis of hub gene [86]^17^,[87]^18. Functional enrichment analysis DEGs in the interested module were put into The Database for Annotation, Visualization and Integrated Discovery (DAVID, [88]https://david.ncifcrc.gov/) [89]^19 to obtain the enriched biological process and KEGG pathway with the cutoff criteria of P < 0.05. Gene Set Enrichment Analysis (GSEA) According to the expression level of the hub gene, we divided 530 samples of ccRCC downloaded from TCGA into two groups. To detect whether the previously defined biological processes and KEGG pathway were enriched in the two groups, we conducted GSEA with the cutoff criteria of FDR < 0.05 [90]^20. Ethics statement Human ccRCC tissue samples (n=18) and human paracancerous tissues (n=18) were all collected from patients suffering renal cancer in surgery and were stored in liquid nitrogen for later RNA isolation. The study was approved by the Ethics Committee at Zhongnan Hospital of Wuhan University (approval number: 2015029). All experimental methods were carried out in accordance with the approved guidelines. Cell culture Human ccRCC cell line 786-O was purchased from Chinese Academy of Sciences in Shanghai. 786-O was cultured in RPMI-1640 (Gibco, China) medium with 10% fetal bovine serum (FBS, Gibco, Australia) in a humidified atmosphere containing 5% CO[2] at 37 °C. Total RNA isolation and quantitative real time PCR (qRT-PCR) Total RNA was isolated from kidney tissues by RNeasy Mini Kit (Cat. #74101, Qiagen, Germany). 1 µg of total RNA was used as the template to synthesize first-strand cDNA by utilizing ReverTra Ace qPCR RT Kit (Toyobo, China). All primers were conducted with iQTM SYBR® Green Supermix (Bio-Rad, China) in a final volume of 20 μl with Bio-Rad iCycler (Cat. #CFX96). Primer sequences of TOP2A (annealing temperature is 60°C): 5'-ACCATTGCAGCCTGTAAATGA-3' (forward); 5'-GGGCGGAGCAAAATATGTTCC-3' (reverse). Primer sequences of GAPDH (loading control, annealing temperature is 60°C): 5'-GGAGCGAGATCCCTCCAAAAT-3' (forward); 5'-GGCTGTTGTCATACTTCTCATGG-3' (reverse). Relative gene abundance was calculated from the ΔΔCT values as: 2^-ΔΔCT. Cell transfection and reagents The three specific small interfering RNAs: siTOP2A-1: 5'-CUCCUAACUUCUAGUAACUTT-3'; siTOP2A-2: 5'-GAUCCACCAAAGAUGUCAATT-3'; siTOP2A-3: 5'-GUCCAGUUAAACAAGAAGUTT-3' against TOP2A were synthesized by Genepharma Ltd. in Suzhou, China. For transfection, Lipofectamine 2000 (Invitrogen, USA) was used as the transfection reagent according to the manufacturer's instructions. MTT assay The MTT (methyl thiazolyl tetrazolium, Sigma-Aldrich, MO, USA) assay was used for cell viability measurement in ccRCC cells. After transfection for 48 h, 786-O cells were seeded in 96-well plates (3000 cells per well) in RPMI-1640 medium containing 10% FBS for 5 days. Then, 20 μl of MTT reagent was added to each well for 4 h at 37 °C. After discarding the medium, the precipitates were dissolved by 200 μl of DMSO. The absorbance was measured at 490 nm using a Spectramax M5 spectrophotometer (Molecular Devices, Sunnyvale, USA). Colony formation assay After transfection for 48 h, 786-O cells were plated in 6-well plates (800 per well) and cultured for 14 days. Then, surviving colonies (> 50 cells per colony) were fixed by 4% PFA for 30 min, stained with crystal violet and photographed. Flow cytometry analysis for the alterations of cell cycle For cell cycle analysis, 786-O cells were harvested and washed by cold PBS for three times after transfection for 48 h. Then, the cells were resuspended with 1× DNA Staining Solution containing propidium iodide and permeabilization solution (Multisciences, China) in the dark. After incubation at 37 °C for 30 min. The sample was analyzed by flow cytometry analysis (Cat. #FC500, Beckman, USA). Western blot analysis The RIPA buffer with protease inhibitor and phosphatase inhibitor (Sigma-Aldrich, USA) was used to extract total protein of 786-O cells. We used 10% SDS-PAGE to resolve the total protein and transferred the SDS-PAGE to PVDF membrane (Millipore, USA). Then, membranes were blocked by 5% non-fat milk and incubated with primary antibodies at 4°C for overnight. After washing, secondary antibody was used to incubated the membranes at room temperature for 2 h. Bands were visualized using an enhanced chemiluminescence (ECL) kit (Bio-rad, USA) and detected by BIO-RAD ChemiDoc MP Imaging System (Bio-Rad, USA). Western blotting was performed using the following antibodies: anti-TOP2A, 1:1000 (Proteintech); anti-CDK1, 1:1000 (Abcam); anti-CDK2, 1:1000 (Abcam ); anti-CCNA1/2, 1:2000 (Abcam); anti-p-TP53, 1:1000 (Abcam); anti-TP53, 1:1000 (Abcam). Blotting membranes were stripped and probed again with anti-GAPDH antibodies (Santa) as a loading control. Statistical analyses All analyses were performed at least three times and represented data from three individual experiments. Two-tailed Student's t-tests were used to assess the statistical significance of differences between the groups. Statistical analyses were performed using SPSS 16.0. Statistical significance was considered as a P value < 0.05. Results Screening for differentially expressed genes (DEGs) A total of 3053 overlapping DEGs were chosen for further analysis. Among these, 1736 were up-regulated and 1317 were down-regulated. Constructing co-expression network When we set the soft threshold power β to 11 (scale-free R^2=0.89), the link between genes in the gene network satisfied a scale-free network distribution (Figure [91]2A-D). A total of 15 co-expressed modules were identified (Figure [92]3A). Figure 2. [93]Figure 2 [94]Open in a new tab Determination of soft-thresholding power and analysis of hub genes as well as protein-protein interaction network (PPI). (A) Analysis of the scale-free fit index for various soft-thresholding powers (β). (B) Analysis of the mean connectivity for various soft-thresholding powers. (C) Histogram of connectivity distribution when β = 11. (D) Checking the scale-free topology when β = 11. (E) Scatter plot of module eigengenes in lightcyan module. (F) PPI network of genes in the lightcyan module. Figure 3. [95]Figure 3 [96]Open in a new tab Identify modules associated with the smoking of ccRCC. (A) Dendrogram of all differentially expressed genes clustered. (B) Distribution of average gene significance in the modules associated with smoking of ccRCC. (C) The correlation between module eigengenes and the smoking group. Finding module of interest We applied 2 approaches to verify the relevance between each module and smoking. Consequently, the lightcyan module (r = 0.3, P = 0.01) exhibited higher MS and greater connection with smoking (Figure [97]3B-C) and was identified as the most relevant module. Identification of hub genes Seven hub genes (PRC1, TOP2A, SMC4, RACGAP1, CENPK, PAPD4, ECT2) were defined by module connectivity (Figure [98]2E) and eight hub genes (PRC1, TOP2A, ZWINT, KNTC1, ASPM, PCNA, RRM2, CEP55) were defined as hub nodes (Figure [99]2F). Validation of hub genes By conducting linear regression analysis with [100]GSE2109, we observed that TOP2A was more relevant to smoking (Figure [101]4B), and we selected TOP2A as the hub gene. According to GEPIA database, we found TOP2A were more highly expressed in ccRCC samples (Figure [102]4A). Meanwhile, [103]GSE2109 and [104]GSE46699 showed that TOP2A was significantly overexpressed in smoking ccRCC than non-smoking ccRCC samples (Figure [105]4C, D). Interestingly, [106]GSE10072 also proved the positive correlation between the expression of TOP2A and smoking in lung adenocarcinoma (Figure [107]4E). The P value of Figure [108]4E was listed in Table [109]2. More convincingly, the qRT-PCR experiment revealed that TOP2A was obviously up-regulated in smoking ccRCC patients as well (Figure [110]4F). In addition, the survival analysis uncovered that up-regulation of TOP2A had an obviously shorter overall survival time and disease-free survival time (Figure [111]4G- H). Figure 4. [112]Figure 4 [113]Open in a new tab Validate the relationship between TOP2A and smoking ccRCC samples. (A) TOP2A is significantly overexpressed in the ccRCC according to GEPIA database. (B) Linear regression analysis between the expressions of TOP2A and smoking group. (C) The expression of TOP2A is correlated with smoking of ccRCC in [114]GSE46699 (N represents non-smoking patients, S represents smoking patients). (D) TOP2A expression is correlated with smoking of ccRCC in [115]GSE2109. (E) TOP2A expression is positive correlation with smoking status in lung adenocarcinoma in [116]GSE10072 (NNS: Normal never smoked, NFS: Normal former smoker, NCS: Normal current smoker, TNS: Tumor never smoked, TFS: Tumor former smoker, TCS: Tumor current smoker). (F) qRT-PCR indicates the expression of TOP2A is up-regulated in smoking ccRCC tissues. (G-H) Kaplan-Meier survival curve downloaded from GEPIA database demonstrate that up-regulation of TOP2A have a significantly shorter overall survival time and disease-free survival time. (I) Biological processes and (J) KEGG pathway enrichment analysis of TOP2A. Table 2. The P value of the [117]GSE10072 in Figure [118]4E. Smoking status Mean±SD Smoking status Mean±SD P NNS 5.805±0.469 VS NFS 6.088±0.555 0.116475331 NNS 5.805±0.469 VS NCS 6.251±0.631 0.00052866 NFS 6.088±0.555 VS NCS 6.251±0.631 0.034655055 NNS 5.805±0.469 VS NFS+NCS 6.642±0.598 0.002130744 TNS 7.658±1.028 VS TFS 8.451±0.913 0.076959943 TNS 7.658±1.028 VS TCS 9.123±0.851 0.001676115 TFS 8.451±0.913 VS TCS 9.123±0.851 0.148011761 TNS 7.658±1.028 VS TNS+TCS 8.835±0.939 0.003132606 NNS+NFS+ NCS 6.054±0.585 VS TNS+TFS+ TCS 8.510±1.098 1.95169E-25 [119]Open in a new tab Functional Enrichment Analysis Biological process (BP) in the lightcyan module was found to focus on cell cycle, cell division, mitotic cell cycle, cell cycle phase and cell cycle process (Figure [120]4I). KEGG pathway enrichment analysis showed DNA replication, cell cycle and p53 signaling pathway were significantly enriched in lightcyan module (Figure [121]4J). Gene Set Enrichment Analysis (GSEA) Six gene sets (cell cycle, leishmanial infection, antigen processing and presentation, autoimmune thyroid disease, viral myocarditis and p53 signaling pathway) were enriched in samples with highly expressed TOP2A by conducting GSEA (Figure [122]5). Figure 5. [123]Figure 5 [124]Open in a new tab Gene set enrichment analysis (GSEA). “Viral myocarditis”, “Antigen processing and presentation”, “Leishmania infection”, “Cell cycle”, “Autoimmune Thyoid disease”, “P53 signaling pathway” were enriched in ccRCC samples with TOP2A highly expressed. Knockdown of TOP2A inhibited cell proliferation by triggered cell cycle arrest at G2 phase in ccRCC cell To explore the biological role of TOP2A in ccRCC, we established a model of TOP2A deficiency in 786-O cell line. The knockdown efficiency of the siRNAs was validated by qRT-PCR and Western blot analysis (Figure [125]6A). MTT assay revealed that downregulation of TOP2A inhibited the cell proliferation significantly comparing with the NC group (Figure [126]6B). This uncovered that silencing TOP2A inhibited the proliferative capacity of ccRCC cell. The colony formation assays demonstrated a similar conclusion (Figure [127]6C). Flow cytometry analysis also showed that knockdown of TOP2A triggered cell cycle arrest at G2 phase (Figure [128]6D). Indeed, cell cycle related proteins such as Cyclin A1/2 and CDK1/2 were significantly decreased in the ccRCC cell in TOP2A knockdown group (Figure [129]6E). Moreover, TOP2A deficiency strongly induced phosphorylated p53 in ccRCC cells (Figure [130]6F). Figure 6. [131]Figure 6 [132]Open in a new tab Downregulation of TOP2A inhibited cell proliferation by triggered cell cycle arrest at G2 phase in ccRCC cells. (A) The expression of TOP2A in transcriptional level (upper picture) and translation level (lower picture) by treating with siTOP2A-1, siTOP2A-2 and siTOP2A-3 in 786-O cell line. The proliferative capacity was measured by (B) MTT assay and (C) colony formation assays in 786-O cell line. (D) Flow cytometry analysis of TOP2A deficiency group and NC group in 786-O cell line. (E) Cell cycle related proteins such as Cyclin A1/2 and CDK1/2 were significantly decreased in the 786-O cell line in TOP2A knockdown group. (F) TOP2A deficiency strongly induced phosphorylated p53 in 786-O cell line. Discussion In our study, the lightcyan module associated with smoking of ccRCC was identified by applying WGCNA. Seven hub genes were obtained from the module. Then, we constructed PPI networks and found that PRC1 and TOP2A were both hub genes in PPI network and WGCNA. Furthermore, linear regression analyses indicated that TOP2A was the most correlated hub gene with the smoking ccRCC samples. The GEPIA database indicated that TOP2A was significantly overexpressed and resulted in poorer prognosis in ccRCC samples. TOP2A encodes topoisomerase 2-alpha, an enzyme that plays a key role in the topologic states of DNA during replication and transcription, which is essential for chromosome condensation and segregation [133]^21. Our group had revealed TOP2A played a critical role in ccRCC carcinogenesis and progression, involving in the biological process of ccRCC proliferation [134]^22^,[135]^23. Meanwhile, two publications suggested that TOP2A is overexpressed in colorectal cancer and hepatocellular carcinoma [136]^24^,[137]^25, and overexpression of TOP2A was presumed to inhibit cell proliferation in colorectal cancer [138]^26. What's more, several investigators have reported that TOP2A is an independent prognostic biomarker in various cancer, such as nasopharyngeal carcinoma, breast cancer, pancreatic cancer and renal cell carcinoma [139]^27^-[140]^30. Given that a cigarette contains approximately 50 chemicals that are associated with carcinogenesis [141]^31, it is not surprising that cigarette smoking is one of an etiology of renal cell carcinoma. In fact, the relationship between cigarette smoking and the development of renal cell carcinoma has been widely recognized, and previous reports demonstrated that overall survival time was significantly shorter in renal cell carcinoma patients who smoke [142]^32^-[143]^34. It is noteworthy that many researchers have concentrated on elucidating the potential molecular mechanism of smoking-related lung cancer [144]^35^-[145]^37. However, there is little progress in illustrating the specific underlying molecular alterations triggered by cigarette smoking in the development of ccRCC. Our study unveiled that TOP2A was significantly up-regulated in smoking ccRCC samples in [146]GSE46699 and [147]GSE2109 datasets, which was consistent with the positive correlation between TOP2A and smoking status in lung adenocarcinoma in [148]GSE10072 dataset. Moreover, the result of qRT-PCR experiment revealed that TOP2A was significantly up-regulated in smoking ccRCC than non-smoking ccRCC samples. More importantly, in vitro experiments uncovered that TOP2A inhibited the proliferative capacity and induced cell cycle arrest at G2 phase via p53 phosphorylation. To explain how TOP2A accelerated the tumorigenesis of ccRCC related to cigarette smoking, we conducted KEGG pathway enrichment analyses and GSEA. Combined with the results of the two methods, we speculated that the potential pathways influenced by TOP2A were cell cycle and p53 signaling pathway. It has been proved that cell cycle pathway can regulate the proliferation of colorectal cancer and lung adenocarcinoma [149]^38^,[150]^39. Additionally, many researchers have uncovered that p53 signaling pathway is essential to carcinogenesis, proliferation, apoptosis, and progression [151]^40^-[152]^42. Interestingly, Mizuno S. et al. [153]^43 indicated p53 signaling pathway was important for the pathogenesis of smoking-related emphysematous changes. In addition, synchronous double primary lung cancers have been reported to be induced by heavy smoking via p53 pathway [154]^44. Taken all of these evidences into account, TOP2A is likely to be a potential key gene in the development of smoking-related ccRCC by regulating cell cycle and p53 signaling pathway. The major limitation of our study is lack of adequate evidence in illustrating the specific pathway of smoking-related ccRCC regulated by TOP2A. Further experiments in mRNA and protein levels are needed. A large-scale epidemiological survey of the relationship between smoking and ccRCC may be helpful to verify our results. In conclusion, we used WGCNA, PPI network, GSEA and qRT-PCR to identify and validate the hub gene of ccRCC associated with cigarette smoking. TOP2A was identified to be a potential biomarker of ccRCC associated with cigarette smoking by regulating cell cycle signaling and p53 signaling pathway. In future, TOP2A may be used to assist in the diagnosis of ccRCC in smoking patients. Acknowledgments