Abstract

Background

   The dreadful prognosis of nonmuscle invasive bladder cancer mainly
   results from the delay in recognition of individuals with a high risk
   of progression. Thus, the emphasis of this work lies in developing
   valuable biomarkers that is conducive to accurately predicting the
   progression of NMIBC.

Methods

   Microarray data from [32]GSE32894 including 209 NMIBC samples were
   performed by weighted gene coexpression network analysis (WGCNA), which
   could find modules of highly correlated genes and relate modules to
   external sample traits. Besides, we constructed a protein–protein
   interaction to facilitate screening the hub gene. At last, we used
   RNA‐seq and microarray data and clinical information from ArrayExpress
   (E‐MTAB‐4321) and [33]GSE13507 to select and validate the candidate
   gene.

Results

   In current paper, blue module of 13 gene coexpression clusters we
   identified was selected as the key modules. Seven genes namely: CDCA8,
   CENPF, MCM6, MELK, PRC1, STIL, and TPX2 have been identified as
   candidate genes. Notably, among them, only elevated CENPF in NIMBC
   tissue was closely associated with low progression‐free survival (PFS)
   and overall survival (OS) rate in three datasets and had a large area
   under receiver operating characteristic (ROC) curve. Finally, CENPF was
   identified as an effective biomarker in NMIBC.

Conclusion

   Therefore, our findings submit a new progressive and prognostic
   molecular marker and therapeutic target for NMIBC. Moreover, these
   genes that deserve to be further researched may improve the
   comprehension about the occurrence and development of superficial
   bladder cancer.

   Keywords: biomarkers, CENPF, nonmuscle invasive bladder cancer (NMIBC),
   progression, weighted gene coexpression network analysis (WGCNA)

1. INTRODUCTION

   Bladder cancer is the ninth most common cancer in the world, with the
   13th most common cause of cancer‐associated mortality (Ferlay et al.,
   [34]2015). Depending on the guidelines of European Association of
   Urology (EAU) and American Urological Association (AUA), bladder cancer
   (BCa) can be divided into muscle invasive bladder cancer (MIBC) and
   nonmuscle invasive bladder cancer (NMIBC)—which are comprised of Ta
   (noninvasive papillary carcinoma), Tis (carcinoma in situ: “flat
   tumor”), and T1 (tumor invades subepithelial connective tissue) and
   represented the majority of primary BCa with approximately 85% (Babjuk
   et al., [35]2017; Chang et al., [36]2016). The prevailing management
   strategy for NMIBC is complete transurethral resection of the bladder
   tumor (TURBT) and adjuvant intravesical treatment (Babjuk et al.,
   [37]2017; Chang et al., [38]2016). Despite this, it is reported that
   the progression rate of NIMBC can reach 10%–30%, especially those
   patients with T1G3 (Cambier et al., [39]2016). Van den Bosch and Alfred
   pointed out that two‐thirds of the secondary MIBC from progression died
   to BCa within 48 months even after radical treatment, and
   cancer‐specific survival is significantly dreadful (van den Bosch &
   Alfred, [40]2011). Fortunately, European Organization for Research and
   Treatment of Cancer (EORTC) risk tables were recommended for predicting
   the progression of NMIBC after TURBT (Babjuk et al., [41]2017; Chang et
   al., [42]2016). However, there are certain deficiencies in these risk
   tables. Numerous researches in recent years have established that
   remarkable the molecular heterogeneity has been shown in bladder cancer
   (Cancer Genome Atlas Research Network, [43]2014; Choi et al., [44]2014;
   Sanchez‐Carbayo, Socci, Lozano, Saint, & Cordon‐Cardo, [45]2006;
   Sjodahl et al., [46]2012; van Kessel et al., [47]2018). NMIBC is a
   heterogeneous set of tumors with thoroughly different oncologic
   outcomes. Furthermore, variability in staging and grading assessment is
   a recognized problem (Bol et al., [48]2003). Unfortunately, EORTC risk
   tables do not take these factors into account.

   Due to rapid technological advancements in the next‐generation
   sequencing, whole‐genome and RNA sequencing have been used to research
   pathological mechanisms and related biomarkers of BCa (Cancer Genome
   Atlas Research Network, [49]2014; Choi et al., [50]2014;
   Sanchez‐Carbayo et al., [51]2006; Sjodahl et al., [52]2012). According
   to the latest research, certain molecular markers can improve the
   accuracy of predictive progression of EAU risk stratification (van
   Kessel et al., [53]2018). Even though molecular profiles of NMIBC
   subtypes and considerable diagnostic indicators have been investigated
   as biomarkers of the risk of disease progression (Choi et al.,
   [54]2014; Ding et al., [55]2014; Kim et al., [56]2010; Sanchez‐Carbayo
   et al., [57]2006; Sjodahl et al., [58]2012) and play a role in clinical
   management, none are able to accurately predict the behavior of NMIBC.
   Hence, identifying molecular biomarkers or therapeutic targets to
   address those problems is imminent. Unlike most other studies that
   focus on the screening of differentially expressed genes, this paper
   explores the affiliation between gene sets and clinical features by
   WGCNA—a systems biology method and enables the description of the
   correlation patterns between microarray samples and gene expression
   profiles (Langfelder & Horvath, [59]2008). Here, by this method, we
   find some candidate biomarkers associated with NMIBC progression and
   prognosis or therapeutic targets.

2. MATERIALS AND METHODS

2.1. Ethical compliance and study design

   We retrospectively analyzed the gene expression profiles of four public
   BCa cohorts, therefore ethical approval is not required. This study
   design was performed in a flow diagram (Figure [60]1).

Figure 1.

   Figure 1
   [61]Open in a new tab

   Flowchart of data collection, preparation, processing, analysis, and
   validation in this report

2.2. Data collection

   For consistency in the microarray platform, we searched in GEO database
   ([62]http://www.ncbi.nlm.nih.gov/geo/) for the BCa‐related gene
   expression datasets measured by the Illumina HumanHT‐12 V3.0 expression
   beadchip (Illumina, Inc.) and obtained two independent datasets. For
   one, non‐normalized microarray gene expressing profiles and
   corresponding clinical data were downloaded from [63]GSE32894 (n = 213)
   (Sjodahl et al., [64]2012) and used to screen probesets, construct of
   coexpression networks and identify hub genes in the present study. For
   another, the processed expression matrix of [65]GSE48075 (n = 54) (Choi
   et al., [66]2014) was obtained for module preservation analysis.
   Meanwhile, to verify the analysis results, expression data and related
   clinicopathologic information were acquired from [67]GSE13507 (n = 103)
   (Kim et al., [68]2010) and ArrayExpress
   ([69]https://www.ebi.ac.uk/arrayexpress/) (n = 460, E‐MTAB‐4321)
   (Hedegaard et al., [70]2016) which was performed on Illumina human‐6
   v2.0 expression beadchip (48K) and paired‐end RNA‐Seq
   (101 + 7 + 101 bp) analysis on an Illumina HiSeq 2000 (Illumina, Inc.),
   respectively. In addition, the ONCOMINE database ([71]www.oncomine.org)
   (Rhodes et al., [72]2004; Sanchez‐Carbayo et al., [73]2006) were used
   for further analysis to validating the result of mRNA levels compared
   with normal tissue. All NMIBC (Ta/Tis/T1, 2009, 7th edition) patients
   were saved.

2.3. Data preprocessing and genes screening

   Non‐normalized data sets from [74]GSE32894 and ArrayExpress
   (E‐MTAB‐4321) were performed log2 transformation and standardized by
   quantile normalization used “preprocessCore” package (Bolstad,
   [75]2018) in R. Then, the reference genome (GRCh38.86) was used to
   convert Gene ID in the later dataset. Processed gene expression
   profiles from other datasets were utilized directly in the further
   analysis. Furthermore, according to the variance of the probe sets in
   all samples, the genes relevant to the probes of ranking in the top 20%
   were chosen for subsequent analysis. All the above operations are
   conducted on the programming software R and could be found in Appendix
   Code [76]S1.

2.4. Weighted coexpression network construction

   Following the instructions below, expression matrix of 9,761 probes was
   used to build coexpression network by the “WGCNA” packages in R
   (Langfelder & Horvath, [77]2008). Firstly, a gene coexpression
   resemblance measure (absolute value of the Pearson correlation) was
   utilized to estimate for all pairwise gene–gene relationship. Next, a
   weighted adjacency matrix was constructed using a “soft” power
   adjacency function a[ij] = |cor (x[i], x[j])| ^β. a[ij] indicates the
   weighted Pearson's correlation coefficient that measures the connection
   strength between gene i and gene j. The soft threshold power β, is the
   lowest integer where the resulting gene coexpression networks satisfy
   approximate scale‐free topology (Zhang & Horvath, [78]2005). The
   adjacency matrix was converted to topological overlap matrix (TOM)
   subsequently, which could evaluate the direct correlation of gene pairs
   and the degree of agreement in association with other genes in the data
   set. (Yip & Horvath, [79]2007). After that, average linkage
   hierarchical clustering was conducted in accordance with the TOM‐based
   dissimilarity measure. An appropriate minimum gene module size for the
   gene dendrogram was set to classify similar genes into one module
   (Ravasz, Somera, Mongru, Oltvai, & Barabasi, [80]2002). Moreover,
   module eigengene (ME) that can be regarded as a representative of the
   gene expression profiles of a module, is defined as the first principal
   component of an interesting module (Langfelder & Horvath, [81]2008).
   Similar modules would be merged by the calculation of ME. More detailed
   WGCNA method processing can be found in these references (Langfelder &