Abstract

Simple Summary

   Bladder cancer is evidently a challenge as far as its prognosis and
   treatment are concerned. The investigation of potential biomarkers and
   therapeutic targets is indispensable and still in progress. Most
   studies attempt to identify differential signatures between distinct
   molecular tumor subtypes. Therefore, keeping in mind the heterogeneity
   of urinary bladder tumors, we attempted to identify a consensus
   gene-related signature between the common expression profile of bladder
   cancer and control samples. In the quest for substantive features, we
   were able to identify key hub genes, whose signatures could hold
   diagnostic, prognostic, or therapeutic significance, but, primarily,
   could contribute to a better understanding of urinary bladder cancer
   biology.

Abstract

   Bladder cancer (BCa) is one of the most prevalent cancers worldwide and
   accounts for high morbidity and mortality. This study intended to
   elucidate potential key biomarkers related to the occurrence,
   development, and prognosis of BCa through an integrated bioinformatics
   analysis. In this context, a systematic meta-analysis, integrating 18
   microarray gene expression datasets from the GEO repository into a
   merged meta-dataset, identified 815 robust differentially expressed
   genes (DEGs). The key hub genes resulted from DEG-based protein–protein
   interaction and weighted gene co-expression network analyses were
   screened for their differential expression in urine and blood plasma
   samples of BCa patients. Subsequently, they were tested for their
   prognostic value, and a three-gene signature model, including COL3A1,
   FOXM1, and PLK4, was built. In addition, they were tested for their
   predictive value regarding muscle-invasive BCa patients’ response to
   neoadjuvant chemotherapy. A six-gene signature model, including ANXA5,
   CD44, NCAM1, SPP1, CDCA8, and KIF14, was developed. In conclusion, this
   study identified nine key biomarker genes, namely ANXA5, CDT1, COL3A1,
   SPP1, VEGFA, CDCA8, HJURP, TOP2A, and COL6A1, which were differentially
   expressed in urine or blood of BCa patients, held a prognostic or
   predictive value, and were immunohistochemically validated. These
   biomarkers may be of significance as prognostic and therapeutic targets
   for BCa.

   Keywords: bladder cancer, urologic cancer, diagnosis, prognosis,
   prediction, biomarkers, hub genes, bioinformatics, microarrays,
   meta-analysis

1. Introduction

1.1. Bladder Cancer towards Biomarker-Directed Management

   Bladder cancer (BCa) is any of the various types of cancer that arise
   from the urinary bladder lining. BCa is a complex and heterogeneous
   disease that requires intensive surveillance owing to its high global
   prevalence, recurrence rate, as well as poor prognosis of invasive
   disease [[30]1,[31]2]. BCa constitutes the most common neoplasm of the
   urinary tract and is estimated to be the fourth most frequent
   malignancy in males, with a male-to-female preponderance of at least
   three to one [[32]3,[33]4]. In the American Cancer Society’s latest
   annual report, it is stated that an estimated number of 81,180 new
   cases of BCa will be diagnosed in the USA in 2022 and 17,100 people
   will die due to the disease this year [[34]5]. BCa is primarily
   categorized into non-muscle-invasive BCa (NMIBC), which pertains to
   approximately 70–75% of diagnoses, and muscle-invasive BCa (MIBC),
   which refers to the other 25–30%. The two subtypes differ genetically
   and are related to distinct prognoses [[35]6].

   Advanced age, male sex, cigarette smoking, and chemical exposure
   contribute to the development of BCa [[36]2]. There is currently no
   routine screening test recommended for the general public or for people
   at average risk [[37]7]. The main diagnostic tools for symptomatic or
   people at increased risk are cystoscopy—which constitutes the gold
   standard for the evaluation of the lower urinary tract—and urine tests,
   such as urine cytology and urinalysis. In view of the fact that
   cystoscopy is an invasive and high-cost method, and cytology is
   restricted by poor sensitivity, particularly for early-stage and
   low-grade tumors [[38]8], new urine tests for tumor biomarkers, which
   were found to partially overcome these limitations, have emerged. These
   tests investigate certain substances in urine, called extracellular
   vesicles (EVs), which comprise a new promising source of diagnostic and
   prognostic biomarkers in liquid biopsies [[39]9]. However, they often
   lack sensitivity and specificity, in particular for low-grade and
   early-stage BCa tumors and recurrent diagnoses, and return many false
   positive results [[40]10,[41]11]. For this reason, these tests have not
   substituted the current diagnostic standards of cystoscopy and cytology
   [[42]12].

   Over the past years, many efforts have been stepped up to identify
   high-value molecular markers for BCa. There is no doubt that, although
   the exact molecular mechanisms underlying the progression of BCa remain
   unclear [[43]13], we have greatly broadened our comprehension of the
   BCa molecular pathology, which has allowed us to establish new
   prognostic and predictive biomarkers using high-throughput
   technologies. However, there is still no biomarker approved for
   clinical practice, and advances in the treatment of BCa are lacking as
   opposed to those in other cancers [[44]14]. On that account, the need
   to develop reliable and non-invasive methods to detect and predict BCa
   biological behavior is indispensable and still ongoing [[45]15].

1.2. Reuse of Public Genome-Wide Gene Expression Data

   The growing use of high-throughput technologies for gene expression
   analysis for the past two decades and the deposition of the vast
   majority of research data in public repositories have created a wealth
   of publicly available archives [[46]16]. All these data offer an
   invaluable resource for reuse so that scientific findings and new
   knowledge can be introduced. In particular, the data integration
   approach from multiple experimental studies allows for increasing the
   sample size, the statistical power, and the robustness of the results
   [[47]16,[48]17], as well as improving reproducibility and the relevance
   of the biological information extracted [[49]18].

   The motivation of this study was to identify key hub genes serving as
   potential diagnostic, prognostic, and predictive biomarkers for BCa. On
   this basis, the aim of the first part of our analysis was to reuse all
   the available microarray-based gene expression data and carry out an
   integrative meta-analysis in order to assess the alterations of gene
   expression in urinary bladder tissue and to identify key hub genes in
   BCa. In this context, we also incorporated gene expression data derived
   from urine and blood samples in order to further investigate the
   potential altered expression of the identified key hub genes in these
   biological fluids. Subsequently, we conducted a survival analysis in
   order to assess the prognostic value of the key hub genes and
   constructed a prognostic model for BCa. The performance of the
   developed model was validated using two independent datasets as well as
   an online bioinformatics tool. In addition, we included data from MIBC
   patients receiving preoperative cisplatin-based chemotherapy to explore
   the predictive value of the hub genes in terms of therapy response and
   to construct a predictive model for BCa which was validated onto two
   external datasets. Finally, we resulted in a nine-gene panel of
   potential key biomarkers for BCa and equipped machine learning
   techniques [[50]19] in order to deepen our research results and
   validate its diagnostic performance. We believe that these biomarkers
   could be promising diagnostic and prognostic targets for the management
   and treatment of BCa.

2. Materials and Methods

2.1. Overall Study Design and Workflow

   The overall design and pipeline of our integrative bioinformatics
   meta-analysis is presented in [51]Figure 1. In the first phase, this
   study aggregated multiple microarray datasets and after the creation of
   a merged meta-dataset, it identified the differentially expressed genes
   as well as the key hub genes for BCa, through protein–protein
   interaction and weighted gene co-expression network analyses. In
   addition, functional analysis of differentially expressed genes was
   performed using the Gene Ontology (GO), the Kyoto Encyclopedia of Genes
   and Genomes pathways (KEGG), the Reactome (REAC) knowledge base, and
   the Disease Ontology (DO). Subsequently, the identified key hub genes
   were assessed for their diagnostic, prognostic, and predictive value.
   Towards this goal, urine- and blood-based gene expression data were
   incorporated, as well as survival data from BCa patients with various
   stages, and from MIBC patients receiving neoadjuvant chemotherapy. The
   key hub genes that were significantly expressed in urine or blood
   plasma and concurrently held a prognostic or predictive value were
   considered as potential key biomarkers. Finally, these biomarkers were
   validated for their expression in BCa and also evaluated for their
   diagnostic performance in multiple datasets.

Figure 1.

   [52]Figure 1
   [53]Open in a new tab

   The overall study design and workflow of the integrative bioinformatics
   analysis of this study.

2.2. Data Source, Systematic Search, and Selection of Eligible Microarray
Datasets

   All the microarray datasets used are publicly available and were
   derived from the Gene Expression Omnibus (GEO) at the National Center
   for Biotechnology Information (NCBI), which is the largest public
   repository designed for archiving and distributing microarray,
   next-generation sequencing, and other functional high-throughput
   genomic data [[54]20].

   We conducted a systematic search in the GEO repository entering the
   following query: “(bladder OR urothelial) AND (tumor OR cancer OR
   carcinoma)”. Additionally, we applied the following filter criteria:
   “Series”, “Expression profiling by array”, and “Homo sapiens” as entry
   type, study type, and organism, respectively. We obtained a total of
   255 datasets from the inception up to 10 January 2022. A dataset was
   encompassed in the meta-analysis if the following main inclusion
   criteria were fulfilled: (1) implemented a case-control study design;
   (2) conducted using a single-color commercial microarray platform; (3)
   performed on human samples and derived from a lower urinary tract
   tissue (i.e., bladder or urethra); (4) performed only on untreated
   samples. We performed this meta-analysis conforming to the guidelines
   provided by the Preferred Reporting Items for Systematic Reviews and
   Meta-Analysis (PRISMA) statement published in 2020 [[55]21]. The
   details of the selection process, including the complete criteria and
   workflow implemented, are depicted in [56]Figure 2. Each dataset was
   independently checked by two authors (M.S. and G.I.L.).

Figure 2.

   [57]Figure 2
   [58]Open in a new tab

   Preferred reporting items for systematic reviews and meta-analyses
   (PRISMA 2020) flow diagram. * An array identical to a commercial
   platform in which a custom, remapped CDF environment is used to extract
   data, was considered as a commercial platform.

   The studies included in this meta-analysis examined human cancerous and
   normal urinary bladder tissues and were conducted using commercial
   platforms (Affymetrix, Illumina, and Agilent) for reproducibility and
   consistency reasons [[59]22]. In addition, the selection of
   single-color arrays allowed us to conduct an integrative “early stage”
   approach [[60]23], setting aside the increased complexity of
   incorporating data from two-color arrays.

   For some datasets the initial number of samples was higher than the
   samples ultimately included in our study and the reasons for ruling
   some of them out are specified as follows: the 24 samples from the
   series [61]GSE37815 were all included in series [62]GSE13507;
   therefore, they were removed from the latter series. Moreover, the
   series [63]GSE38264 included 13 samples from the organism mus musculus,
   and, thus, they were excluded from the series. Finally, the series
   [64]GSE40355 included 24 samples that were obtained using a non-coding
   RNA microarray platform (Agilent Human miRNA Microarray V2) and they
   were consequently removed from this series.

2.3. Platform-Specific Pre-Processing

   After identifying the eligible datasets for our study, we retrieved the
   raw microarray expression data for each dataset from GEO. Then,
   normalization was conducted in order to adjust the technical and
   environmental effects on the data. This procedure allows samples from a
   common study to be on a similar scale. The normalization process was
   performed in accordance with the dataset’s microarray platform.

   The normalization of the Affymetrix datasets was performed by using the
   Robust Multiarray Analysis (RMA) algorithm, within the R/Bioconductor
   packages affy (version 1.72) [[65]24], for the HG-U133A and HG-U133
   Plus 2 platform types, and oligo (version 1.58) [[66]25], for the
   HuGene-1.0 ST, HuEx-1.0 ST and HTA-2.0 platform types. This algorithm
   performs background correction, log2 data transformation, quantile
   normalization, and summarization of all probe sets into a single
   expression value for each gene. It has been shown to perform very well
   in terms of sensitivity to biological variation and to improve
   cross-platform comparability [[67]26,[68]27].

   The normalization of the Illumina datasets was conducted by utilizing
   methods implemented in the R/Bioconductor package limma (version 3.50)
   [[69]28], using the read.ilmn and neqc functions which read the
   Illumina expression data and perform background correction, log2
   transformation, quantile normalization, using negative and positive
   control probes for normalization, and only negative controls for
   background correction. In datasets [70]GSE13507 and [71]GSE37815, the
   control probes have been removed from the non-normalized data; hence,
   we utilized the read.table and normalizeBetweenArrays functions of the
   same R package to properly read the raw data and perform the
   above-described steps.

   Finally, the normalization of the Agilent datasets was also conducted
   by utilizing R/Bioconductor package limma methods. Raw data were read,
   background corrected (using the method normexp), log2 transformed, and
   quantile normalized, using the functions read.maimages,
   backgroundCorrect, and normalizeBetweenArrays within the package limma.
   The recommendations for the commonly found two-color Agilent arrays
   were followed since the same procedure applies and corresponds to a
   similar error model [[72]29].

2.4. Quality Control

   For all datasets, a common quality control (QC) non-platform-dependent
   analytical framework was applied for consistency reasons. After the
   corresponding pre-processing, each normalized dataset underwent QC
   implementing the outlier removal strategy. QCs were conducted using the
   R/Bioconductor package arrayQualityMetrics (version 3.50) [[73]30],
   inspecting three visualizations included in the arrayQualityMetrics
   reports: the heatmap of the distance between arrays, the boxplot of the
   logarithm ratios, and the MA plot, which includes the logarithm of the
   intensity ratios (M) vs. average log intensities (A). This strategy has
   been shown to improve the efficacy of the meta-analysis and to increase
   the power of differentially expressed gene detection [[74]31]. Samples
   classified as outliers in at least two of the three metrics during the
   quality control process were removed from their dataset. Subsequently,
   the raw data without outliers were normalized de novo, following the
   process described in the previous section, and were used for the
   downstream analysis.

2.5. Gene Annotation

   All probes were mapped at the gene level using gene symbols as the
   common identifier across platforms. The official gene symbols are
   approved by the HUGO Gene Nomenclature Committee (HGNC) [[75]32]. The
   use of HGNC-approved nomenclature is recommended since it is well
   curated and has been previously shown to enhance accuracy in scientific
   and public communication [[76]33]. If more than one probe was mapped to
   the same gene symbol, the final expression level of this gene was
   calculated as the average expression values of the different probes.
   Probes with annotations for more than one gene or with no annotations
   were excluded from our study.

   The mapping between probe sets and corresponding gene symbols was
   performed through particular annotation packages for each array model
   provided in the Bioconductor repository. The conversion between probes
   and gene symbols was achieved by the R/Bioconductor packages
   AnnotationDbi (version 1.56.2) and org.Hs.eg.db (version 3.14). In
   particular, the datasets were annotated using the R/Bioconductor
   packages hgu133a.db, hgu133plus2.db (version 3.13),
   hugene10sttranscriptcluster.db, huex10sttranscriptcluster.db,
   hta20transcriptcluster.db (version 8.8), illuminaHumanv2.db,
   illuminaHumanv3.db, and illuminaHumanWGDASLv4.db (version 1.26),
   depending on the platform. For the three series, [77]GSE21142,
   [78]GSE24152, and [79]GSE42089, the corresponding custom brainarray
   chip description file (CDF) was utilized for annotation. For the
   Agilent platforms, the probe-gene mapping was conducted utilizing the
   R/Bioconductor package biomaRt (version 2.50.3) in order to access the
   Ensembl annotation [[80]34]. The selection of these gene annotation
   resources was based on their constant updates, consistency, and
   reliability [[81]34,[82]35,[83]36]. It is essential that the probe set
   annotations are updated and reliable so that biological inferences can
   be made accurately throughout the downstream analysis.

2.6. Batch Effects and Cross-Platform Normalization

   Gene expression levels may vary due to biological factors in
   conjunction with non-biological ones, i.e., technical sources of
   variation, which are time- and place-dependent. These sources of
   variation, which are irrelevant to inter- and intra-sample class
   differences, are almost inevitable and summarily termed “batch effects”
   [[84]37]. On account of them, the data integration from diverse
   microarray gene expression experiments, which are conducted in this
   study, becomes a complicated procedure [[85]38].

   The information on the batch numbers or the date of experiments is not
   available for many of the 18 datasets of our integrative meta-analysis,
   so applying a method that adjusts data for known batches is unfeasible.
   In order to perform a batch effect detection, a visual inspection of
   dimension-reduced data representations, using principal component
   analysis (PCA), was conducted. It needs to be mentioned that, due to
   the detection of a very strong batch effect, samples from [86]GSE13507
   were further separated into two subgroups, GSE13507A and GSE13507B,
   respectively. These two subgroups were considered distinct datasets
   during our integrative meta-analysis.

   The Z-score transformation or standardization was applied to gene
   expression data, using the scale function in R package stats. The
   application of this classical normalization method constitutes an
   approach to standardizing data over a broad range of experiments and
   allows the microarray data juxtaposition regardless of the initial
   hybridization intensities [[87]39]. In addition, the Z-score
   transformation is simple, it has low time and memory complexity, it
   does not require any assumption on data distribution, and it has been
   implemented successfully in previous studies indicating high
   performance [[88]40,[89]41,[90]42]. Z-score transformation was applied
   to all samples by subtracting each sample’s mean and dividing by its
   standard deviation (SD), according to the formula:
   [MATH:
   <mrow><mrow><msub><mi>Z</mi><mrow><mi>s</mi><mi>c</mi><mi>o</mi><mi>r</
   mi><mi>e</mi></mrow></msub><mo>=</mo><mfrac><mrow><msub><mi>I</mi><mi>G
   </mi></msub><mo>−</mo><msub><mrow><mover><mi>I</mi><mo>-</mo></mover></
   mrow><mrow><msub><mi>G</mi><mn>1</mn></msub><mo>…</mo><msub><mi>G</mi><
   mi>n</mi></msub></mrow></msub></mrow><mi>σ</mi></mfrac></mrow></mrow>
   :MATH]
   (1)

   where
   [MATH: <mrow><mrow><msub><mi>I</mi><mi>G</mi></msub></mrow></mrow>
   :MATH]
   represents the intensity of gene G, and
   [MATH:
   <mrow><mrow><msub><mrow><mover><mi>I</mi><mo>-</mo></mover></mrow><mrow
   ><msub><mi>G</mi><mn>1</mn></msub><mo>…</mo><msub><mi>G</mi><mi>n</mi><
   /msub></mrow></msub></mrow></mrow> :MATH]
   and
   [MATH: <mrow><mi>σ</mi></mrow> :MATH]
   represent the mean intensity and standard deviation of the aggregate
   measure of all genes within a sample.

   After the simple data homogenization, a further removal of, or
   adjustment for, batch effects was not attempted as it could
   systematically induce incorrect group differences, especially for our
   analysis where the batch–group design is unbalanced. Instead, it is
   recommended that, when possible, the batch variables should be
   incorporated into the downstream analysis [[91]43]. Therefore, during
   the differential analysis, we incorporated each sample’s dataset as a
   covariate.

2.7. Differential Expression Analysis

   In our integrative meta-analysis, we followed an “early stage” data
   integration method [[92]23]. We created a merged microarray
   meta-dataset by binding the Z-score transformed samples and by matching
   the 8201 common gene symbols. This meta-dataset contained a total of
   606 samples, incorporating 410 BCa samples and 196 control samples,
   across 19 different datasets.

   The differentially expressed genes (DEGs) between BCa and normal tissue
   samples were screened using the R/Bioconductor package limma (version
   3.50) [[93]28], with the dataset/series of microarrays included as a
   covariate in the model. For the significance analysis, the main
   statistic used was the moderated t-statistic, which was computed for
   each gene symbol between cancer and control samples. In order to
   control the false discovery rate (FDR) for multiple comparisons, the
   p-value was adjusted based on the Benjamini–Hochberg (BH) method.

   The statistical methods used to identify DEGs depend on the
   determination of arbitrary thresholds for p-value and fold change (FC)
   of expression levels, which can significantly alter microarray
   interpretations [[94]44]. In our analysis, we used a stringent
   threshold for an adjusted p-value of 0.01. The cut-off threshold for
   |log[2]FC| is usually chosen between one and two. For the 11 different
   values of |log[2]FC| from one to two in steps of 0.1, 11 different sets
   of DEGs were obtained. It needs to be noted that the log[2]FC value for
   each gene that resulted from limma was corrected by dividing by the SD
   of the mean group differences for all genes, according to the formula:
   [MATH:
   <mrow><mrow><msub><mrow><mi>l</mi><mi>o</mi><mi>g</mi></mrow><mn>2</mn>
   </msub><mi>F</mi><msub><mi>C</mi><mrow><mi>c</mi><mi>o</mi><mi>r</mi><m
   i>r</mi><mi>e</mi><mi>c</mi><mi>t</mi><mi>e</mi><mi>d</mi></mrow></msub
   ><mo>=</mo><mfrac><mrow><msub><mrow><mi>l</mi><mi>o</mi><mi>g</mi></mro
   w><mn>2</mn></msub><mi>F</mi><mi>C</mi></mrow><mrow><msub><mi>σ</mi><mr
   ow><mi>Z</mi><mo>−</mo><mi>s</mi><mi>c</mi><mi>o</mi><mi>r</mi><mi>e</m
   i><mo> </mo><mi>d</mi><mi>i</mi><mi>f</mi><mi>f</mi><mi>e</mi><mi>r</mi
   ><mi>e</mi><mi>n</mi><mi>c</mi><mi>e</mi><msub><mi>s</mi><mrow><msub><m
   i>G</mi><mn>1</mn></msub><mo>…</mo><msub><mi>G</mi><mi>n</mi></msub></m
   row></msub></mrow></msub></mrow></mfrac></mrow></mrow> :MATH]
   (2)

   where
   [MATH:
   <mrow><mrow><msub><mi>σ</mi><mrow><mi>Z</mi><mo>−</mo><mi>s</mi><mi>c</
   mi><mi>o</mi><mi>r</mi><mi>e</mi><mo> </mo><mi>d</mi><mi>i</mi><mi>f</m
   i><mi>f</mi><mi>e</mi><mi>r</mi><mi>e</mi><mi>n</mi><mi>c</mi><mi>e</mi
   ><msub><mi>s</mi><mrow><msub><mi>G</mi><mn>1</mn></msub><mo>…</mo><msub
   ><mi>G</mi><mi>n</mi></msub></mrow></msub></mrow></msub></mrow></mrow>
   :MATH]
   represents the SD of the mean Z-score differences between the two
   experimental conditions (cancer versus control group) for all genes
   [[95]39].

   In order to find the optimal set of DEGs that led to a more robust
   classification of samples, a support vector machine (SVM) model was
   established using DEGs as features. The SVM is one of the most popular
   supervised learning algorithms and has demonstrated a high ability to
   handle high-dimensional data and superior performance in the microarray
   classification of cancers [[96]45]. In particular, an SVM model was
   built for each of the 11 sets of DEGs, implemented using the R package
   caret (version 6.0). For every set of features, the merged meta-dataset
   was split into a training set and a test set in the ratio of 90:10 and
   a random manner, and this procedure was iterated 10 times, implementing
   a 10-cross fold validation. The value of |log[2]FC|, and by extension
   the set of DEGs, which resulted in the higher area under the receiver
   operating characteristic (ROC) curve (AUC) of the classifier was
   selected. The ROC curve is a probability plot that features the true
   positive rate (sensitivity) against a false positive rate
   (1—specificity) at various threshold settings and constitutes an
   evaluation plot for binary classification problems. The AUC is a metric
   for the classifier’s ability to discriminate between classes and for
   the classification performance evaluation.

   After the definition of the cut-off criteria, the set of DEGs was
   obtained and the volcano plot, as well as the heatmap for the first 100
   DEGs, were plotted by implementing the R packages ggplot2 (version
   3.3.5) and ComplexHeatmap (version 2.10), respectively.

2.8. DEG Functional Enrichment Analysis

   In order to analyze and visualize functional profiles of the identified
   DEGs, the R/Bioconductor package clusterProfiler (version 4.2.2)
   [[97]46] was utilized. Gene Ontology (GO), Kyoto Encyclopedia of Genes
   and Genomes (KEGG) pathway, Reactome (REAC) pathway, and Disease
   Ontology (DO) enrichment analyses were conducted. Before performing
   enrichment analyses, the gene symbols were mapped to the Entrez Gene
   database in NCBI to retrieve Entrez Gene IDs, using the Bioconductor
   annotation org.Hs.eg.db. For all the following analyses, the cut-off
   thresholds were p-valueCutoff = 0.01 and q-valueCutoff = 0.05,
   corrected using the BH method.

   The GO knowledgebase is the most extensive information resource
   regarding gene functions [[98]47]. GO enrichment analysis covers three
   areas including cell component (CC), molecular function (MF), and
   biological process (BP), which were all included in our analysis. The
   GO terms for the down- and upregulated genes were enriched using the
   enrichGO function in the clusterProfiler. The GO terms were enriched by
   assigning OrgDb = “org.Hs.eg.db”, when running the enrichGO. Redundant
   enriched GO terms were removed using the simplify function, applying a
   threshold cut-off = 0.7 and the Wang method to measure the similarity
   [[99]48]. Subsequently, the most significantly enriched terms were
   plotted using a bar plot.

   KEGG is an integrated database for comprehending and associating
   higher-order functional information of the biological systems with
   genomic information [[100]49]. KEGG pathway enrichment analysis was
   performed using the enrichKEGG function in the clusterProfiler. The
   corresponding Entrez Gene IDs of DEGs were imported and the
   aforementioned threshold criteria were implemented. The enrichment
   analysis was plotted using a dot plot.

   REAC is a public, open-source, curated, and peer-reviewed pathway
   database that systematically relates human proteins to their molecular
   functions [[101]50]. The REAC pathway analysis was performed against
   REAC (version 79), and the R/Bioconductor package ReactomePA (version
   1.38) [[102]51] was used. The pathway analysis was plotted using a dot
   plot, and an enrichment map of the results, based on the pairwise
   similarities of the enriched terms, was also visualized [[103]52].

   The DO represents a comprehensive knowledge base of over 10,000
   inherited, developmental, and acquired human diseases [[104]53]. For
   the DO enrichment analysis, the R/Bioconductor package DOSE (version
   3.20.1) [[105]54] was used. DO terms with more than minGSSize = 5 and
   less than maxGSSize = 500 genes annotated were tested, and from them,
   only those satisfying the cut-off thresholds were considered to be
   significantly enriched. The results were presented in the form of a dot
   plot.

2.9. Protein–Protein Interaction Network Analysis

   A protein–protein interaction (PPI) network analysis was conducted to
   further explore the potential interaction between DEGs obtained from
   the integrative meta-analysis of the different datasets and to discover
   the key hub genes among them. The search tool for retrieval of
   interacting genes (STRING) database (version 11.5) [[106]55], which
   incorporates both known and predicted PPIs, was employed to predict
   functional interactions between proteins. The PPI network of the 815
   DEGs was created and visualized via the STRING web interface, applying
   a minimum required interaction score of 0.4.

   In addition, the PPI network was imported into the Cytoscape software
   (version 3.9.1) [[107]56]. The PPI network nodes were ranked performing
   10 topological analysis methods from the cytoHubba plugin [[108]57] in
   Cytoscape. These included three local-based methods, which are Maximal
   Clique Centrality (MCC), Maximum Neighborhood Component (MNC), and
   degree, as well as seven global-based methods, which are Edge
   Percolated Component (EPC), BottleNeck, EcCentricity, Closeness,
   Radiality, Betweenness, and Stress. A final ranking of the PPI
   network’s hub genes, based on the cytoHubba analysis, was obtained by
   utilizing the robust rank aggregation (RRA) method from the R package
   RobustRankAggreg (version 1.1). In the final ranking, only the hub
   genes with a p-value < 0.01 were kept.

   Additionally, the MCODE (Molecular Complex Detection) plugin [[109]58]
   of Cytoscape was used to determine gene clusters in the constructed
   network. The selection parameters were set as follows: MCODE scores ≥
    7, degree cut-off  =  2, node score cut-off  =  0.2, max depth  =
    100, k-score  =  2, and haircut = true. A gene list with all the genes
   belonging to the clusters that fulfill the above criteria was acquired.

   Finally, the intersection of the two generated gene lists was
   calculated, in order to obtain a final list of hub genes based on the
   two Cytoscape plugins. The PPI network of the final hub genes list was
   also constructed.

2.10. Weighted Correlation Network Analysis

   Weighted gene co-expression network analysis (WGCNA) can be used to
   construct a weighted gene co-expression network, define clusters
   (modules) of highly correlated genes, correlate modules with clinical
   traits, and identify intramodular hub genes [[110]59]. In this study,
   we performed consensus WGCNA, using the R/Bioconductor package WGCNA
   [[111]60], in order to find key gene modules that are highly associated
   with BCa. We utilized the individual datasets that were employed in the
   current integrative meta-analysis. Only the series that contained more
   than 20 samples were deployed, as datasets with fewer samples would
   simply be too noisy for the network to be biologically meaningful. Due
   to the fact that data came from different batches, which are unknown,
   we checked and adjusted for batch effects using the R/Bioconductor
   package sva (version 3.42) [[112]61].

   Gene expression values were hierarchically clustered for each of the
   datasets, in order to identify outliers and remove them from further
   analysis. For each dataset, the network topology analysis was
   performed. The intended scale-free topology fitting index threshold
   (R^2) was set above 0.77 and the median connectivity was set below 30.
   After the selection of the proper soft thresholding power, the
   consensus modules across the datasets were calculated using the
   blockwiseConsensusModules function. The parameters were set as soft
   threshold power = 7, minModuleSize = 30, deepSplit = 2, and
   mergeCutHeight = 0.25, for merging the highly similar modules, and all
   the genes were processed into one block. Subsequently, for each
   dataset, the correlation matrix was converted into an adjacency matrix,
   which was analyzed further to compute the topological overlap matrix
   (TOM), using the TOM similarity algorithm. Based on the dissimilarity
   topological overlap calculation formula, the 8201 genes were assigned
   to distinct gene modules indicated by various colors.

   Consequently, the correlation degree between each module’s eigengene
   (ME) and sample phenotype for each dataset was calculated by the
   Pearson correlation coefficient, using the cor function, and
   corPvalueFisher function for the calculation of the corresponding
   p-value. In order to find a consensus module–trait correlation, we
   formed a measure of module–trait relationship that summarized all the
   datasets into one measure: for each module–trait pair we obtained the
   correlation based on the shared correlation sign across datasets.
   Particularly, the lower absolute value was attributed to each consensus
   module–trait coefficient, if the correlations had the same sign, and a
   zero correlation for those with opposite signs. Hence, only modules
   with consistent correlation coefficients, either positive or negative,
   across datasets were considered key modules. The key gene modules were
   determined based on the correlation coefficient and the significance
   between the module’s ME values and sample traits (phenotypic group).

   To further identify which genes in the key modules were highly
   associated with clinical traits, the correlation between sample
   phenotype, gene significance (GS), and module membership (MM) was
   evaluated. MM stands for the correlation between MEs and the profile of
   gene expression, and GS represents the correlation between genes and
   phenotypic traits. Thus, for every ME we calculated the GS and MM in
   each dataset, then, we combined the Z-scores of correlations from each
   dataset to form a consensus meta-Z-score and the corresponding
   meta-p-value for each module. Genes with high Z-scores for both MM and
   GS in the key module were highly interrelated with the cancer trait.
   Particularly, genes for which the MM and GS values were in the upper or
   lower quartile of all genes in the module were determined as hub genes
   for BCa. Finally, we compared these hub genes with the hub genes
   derived from the PPI network analysis in order to obtain the key hub
   genes of this study which were highly connected with BCa.

2.11. Differential Expression in Urine and Blood Plasma Samples

   In order to explore the potential altered expression of the key hub
   genes, which derived from the gene expression analysis of urinary
   bladder tissues, in other biological fluids, urine and blood samples
   were included in our integrative meta-analysis. These samples underwent
   gene expression analysis by array, following a similar case-control
   study design to the meta-analyzed datasets in this study, and they were
   also downloaded from the GEO repository.

   For the gene expression profiling in urine, we retrieved two datasets
   from the GEO repository, namely [113]GSE51843 and [114]GSE68020. The
   first dataset ([115]GSE51843) includes a total of 11 mRNA-containing
   extracellular vesicle samples, 5 urine samples from BCa patients, and 6
   samples from non-cancer patients, which were characterized by Illumina
   Human HT12 v4 BeadChip ([116]GPL10558) [[117]62]. The latter dataset
   ([118]GSE68020) contains a total of 50 urine samples, including 30
   high-grade urothelial carcinomas and 20 non-tumor healthy controls,
   which were characterized by the same microarray platform
   ([119]GPL10558).

   The raw data were downloaded for both datasets and platform-specific
   pre-processing was conducted, as was described in [120]Section 2.3 for
   the other datasets. Implementing the num.sv function from the
   R/Bioconductor package sva, no surrogate variables were identified and,
   hence, no batch correction was needed. Each one of the key hub genes
   was tested for its statistically significant difference between the BCa
   and the control group in each dataset, using the Wilcoxon rank sum test
   which consists of a convenient and robust way to identify
   differentially expressed genes [[121]63].

   The blood dataset ([122]GSE138118) includes a total of 75 samples, 11
   newly diagnosed patients with BCa, 18 recurrence-negative formerly
   diagnosed BCa patients, 17 recurrence-positive formerly diagnosed BCa
   patients, and 29 healthy volunteers with no previous history of BCa or
   any other cancer. Total plasma RNA was isolated from clinical whole
   blood samples and was characterized by Affymetrix Human Gene 2.1 ST
   Array ([123]GPL17692).

   The raw expression data were downloaded and a platform-specific
   pre-processing was conducted, as was previously described in
   [124]Section 2.3 for the other datasets. A batch effects removal was
   performed to minimize the unwanted variation on the data, using the sva
   function as implemented in the R/Bioconductor package sva, since it can
   be used without known batch variables. For this dataset, only the 28
   BCa blood samples, from newly- or recurrence-positive formerly
   diagnosed patients, were kept along with the 29 control blood samples
   from healthy individuals. Each one of the key hub genes was tested for
   its statistically significant difference between the BCa and the
   control group, using the Wilcoxon rank sum test.

2.12. Finding Prognostic Genes for BCa

   For the purpose of identifying which of the key hub genes hold a
   prognostic value, a survival analysis was conducted. Towards this
   purpose, the dataset [125]GSE13507, which contains gene expression
   profile data from 165 patients with BCa of various stages (102 NMIBC
   and 63 MIBC) [[126]64], was utilized. The clinical data of the patients
   are also available and contain information about the cancer-related
   events and the overall survival (OS) time. A univariate Cox regression
   analysis on the key hub genes was conducted to evaluate the association
   between cancer-specific OS of each patient and gene expression values,
   considering only genes with a p-value < 0.05. The R package survival
   (version 3.2) was used to conduct the univariate Cox regression
   analysis [[127]65]. In order to select a panel of genes, and then build
   a prognostic multi-gene signature model, the least absolute shrinkage
   and selection operator (LASSO) Cox regression was performed, applying a
   10-fold cross validation for 100 iterations, using the R package glmnet
   (version 4.1) [[128]66]. Aiming to eliminate the selected gene
   correlation and prevent model overfitting, the gene coefficients were
   shrunk towards zero, by applying the minimum deviance lambda.min in
   each iteration and using Harrell’s C-index (concordance index) as the
   fit measure. The genes with nonzero coefficients for 75% of iterations
   were selected. In order to narrow the gene list down further and
   optimize the model, a multivariate Cox analysis was performed to
   identify the independent predictors for the prognosis of BCa patients
   and construct a prognostic index (PI) model. The PI was calculated
   based on the formula:
   [MATH: <mrow><mrow><mi>P</mi><mi>I</mi><mo>=</mo><munderover><mstyle
   mathsize="140%"
   displaystyle="true"><mo>∑</mo></mstyle><mrow><mi>i</mi><mo>=</mo><mn>1<
   /mn></mrow><mi>n</mi></munderover><msub><mi>c</mi><mi>i</mi></msub><msu
   b><mi>X</mi><mi>i</mi></msub></mrow></mrow> :MATH]
   (3)

   where
   [MATH: <mrow><mrow><msub><mi>c</mi><mi>i</mi></msub></mrow></mrow>
   :MATH]
   is the coefficient of the ith gene,
   [MATH: <mrow><mrow><msub><mi>X</mi><mi>i</mi></msub></mrow></mrow>
   :MATH]
   is the expression of the ith gene, and
   [MATH: <mrow><mi>n</mi></mrow> :MATH]
   is the number of the selected genes in the optimal model. The
   prognostic score was calculated for each patient, and the median score
   was defined as the cut-off value that stratified BCa patients into low-
   and high-risk groups to contrast their survival. The one-, three-,
   five-, and ten-year ROC curves were drawn along with AUC values for the
   evaluation of the model’s performance, using the R packages survivalROC
   (version 1.3.0) [[129]67] and plotROC (version 2.2.1) [[130]68]. To
   explore the relationship among the prognostic genes of this panel, we
   determined the Pearson correlation coefficient between all pairs.
   Finally, the R package survminer (version 0.4.9) was used to perform
   the Cox proportional hazards model analysis.

   To investigate whether our prognostic model was applicable to other
   datasets and to validate its prognostic value, we obtained two
   independent microarray datasets from the GEO repository ([131]GSE32894
   and [132]GSE32548), which incorporated gene expression data along with
   survival information of BCa patients. The [133]GSE32894 contains a
   total of 224 primary BCa samples of various stages, which were
   characterized by Illumina HumanHT-12 V3.0 ([134]GPL6947) [[135]69]. The
   original dataset contains more samples but information about the
   survival events is available only for a subset of the original samples.
   The [136]GSE32548 dataset includes a total of 131 primary BCa tumor
   samples, which were characterized by the same platform ([137]GPL6947)
   [[138]70]. Subsequently, the prognostic index was calculated for each
   patient of the two datasets. Based on this index, patients were divided
   into low- and high-risk groups, and Kaplan–Meier survival curves were
   generated to compare survival between the two groups by log-rank test,
   considering a p-value < 0.05 as statistically significant. The hazard
   ratios (HR) and 95% confidence intervals (CI) were also calculated.
   Time-dependent ROC analyses were conducted to evaluate the prognostic
   effectiveness of the prognostic risk score model.

   Additionally, we utilized publicly available online bioinformatics
   tools to also access the prognostic value of the identified key hub
   genes. Gene Expression Profiling Interactive Analysis (GEPIA2)
   [[139]71] is an open-access online tool for the interactive exploration
   of RNA sequencing data from The Cancer Genome Atlas (TCGA) [[140]72]
   and the Genotype-Tissue Expression (GTEx) [[141]73] programs. GEPIA2
   was utilized for accessing the prognostic value of the key hub genes in
   terms of OS or disease-free survival (DFS) of the TCGA-BCa patients.
   The discovery TCGA-BCa cohort consists of 404 BCa patients and 19
   controls. The difference between the survival rates of high- and
   low-expression groups for each key hub gene was contrasted using the
   log-rank test, considering statistical significance when the p-value <
   0.05 and using the median or the quartile as cut-off criteria. The
   survival curves with the calculated HR and the log-rank p-value were
   plotted. Lastly, the GEPIA2 platform was utilized to confirm the
   prognostic validity of the gene signature generated by the multivariate
   Cox regression analysis and to plot the survival curves.

2.13. Finding Predictive Genes for BCa

   Aiming to further investigate the predictive value of the key hub
   genes, samples from MIBC patients receiving preoperative
   cisplatin-based chemotherapy were included in our analysis. These
   samples underwent gene expression analysis by array and they were
   derived from the GEO repository. One of the aims of this study was to
   explore to what extent gene expression signatures can predict
   chemotherapy response. It is noteworthy that the current standard for
   MIBC is platinum-containing (e.g., cisplatin) neoadjuvant chemotherapy
   followed by radical cystectomy. Nonetheless, for many patients, there
   is a low chemotherapy success rate and several candidate biomarkers of
   therapy responsiveness are investigated [[142]74].

   The selected dataset ([143]GSE169455) includes a total of 149 samples,
   which are all derived from MIBC patients receiving neoadjuvant
   cisplatin-based chemotherapy undergoing radical cystectomy [[144]75].
   RNA was extracted from bladder transurethral resection specimens and
   hybridized on an Affymetrix Human Gene 1.0 ST Array ([145]GPL6244). The
   raw expression data were downloaded and a platform-specific
   pre-processing was conducted, as was previously described ([146]Section
   2.3). A batch effects removal was performed to minimize the unwanted
   variation on the data, using the Combat function from the
   R/Bioconductor package sva [[147]76].

   The main outcome measure was a pathological response in the cystectomy
   specimen, stratified as “complete response”, “partial response”, and
   “no response”. Each one of the key hub genes was tested for its
   statistically significant difference between the three response groups,
   using the Wilcoxon rank sum test. Furthermore, the univariate Cox
   regression analysis was conducted in order to calculate the association
   between each hub gene and the recurrence-free survival (RFS),
   cancer-specific survival (CSS), and overall survival (OS) of each
   patient, using the R package survival and considering a p-value < 0.05
   as statistically significant. As univariate analysis resulted in a
   limited number of genes, the LASSO Cox regression for penalty parameter
   tuning (as described in [148]Section 2.12) with 10-fold cross
   validation was performed to screen the key hub genes. The prevailing
   nonzero-coefficient genes were incorporated into the multivariate
   analysis, applying the Cox proportional hazards regression model, which
   resulted in a predictive gene signature. Finally, we successfully
   constructed a predictive risk score formula by using the corresponding
   coefficients of the gene signature (as in [149]Section 2.12). The risk
   score divided the patients into low- and high-risk groups by the median
   value. The Kaplan–Meier survival curves were plotted for the two groups
   and time-dependent ROC curve analysis was performed based on the
   prediction risk score, and the AUC values were calculated to assess the
   prediction performance.

   To validate the predictive value of our model, we acquired two
   independent microarray datasets ([150]GSE87304 and [151]GSE69795) from
   the GEO database, which included survival information of BCa patients
   recruited into a neoadjuvant trial. The [152]GSE87304 contains 305
   specimens from patients with MIBC, obtained by transurethral resection
   prior to pre-neoadjuvant chemotherapy, which was characterized by
   Affymetrix Human Exon 1.0 ST Array [[153]77]. The [154]GSE69795
   contains 38 formalin-fixed paraffin-embedded bladder tumors, obtained
   by transurethral resection from BCa patients receiving neoadjuvant
   chemotherapy with dose-dense methotrexate, vinblastine, doxorubicin,
   and cisplatin along with bevacizumab, which was characterized by
   Illumina HumanHT-12 WG-DASL V4.0 R2 [[155]78]. For both datasets,
   patients were divided into low- and high-risk groups, according to the
   predictive index, and Kaplan–Meier survival curves were plotted,
   including HR and 95% CI. Time-dependent ROC analysis was performed to
   evaluate the predictive effectiveness of the risk score model.

2.14. Expression Validation of Key Biomarkers and Immunohistochemistry

   Based on the identified key hub genes and taking the above analysis
   into consideration, we opted for nine potential key biomarker genes
   that seem to play a significant role in the development and progression
   of BCa. All these biomarkers are significantly expressed in the urine
   or blood plasma of BCa patients and hold a prognostic or predictive
   value.

   The GEPIA2 platform was utilized to confirm the expression alterations
   of the key biomarker genes. The external validation was done by
   comparing transcriptomic data from TCGA-BCa, TCGA normal, and GTEx
   datasets. The cut-off criteria |log[2]FC| > 1 and p-value < 0.05 were
   considered for a statistically significant difference. In addition, the
   association of the key biomarker genes with the pathological stages of
   BCa was performed through the GEPIA2 platform.

   Further, the protein expression encoded by these biomarker genes was
   validated in BCa specimens using the Human Protein Atlas platform,
   which incorporates spatial proteomics and quantitative transcriptomics
   (RNA-Seq) data obtained from immunohistochemistry (IHC) analysis of
   tissue microarrays [[156]79].

2.15. Diagnostic Performance Analysis

   Using the identified key biomarker genes as features, we built and
   tested various classification models to access their diagnostic
   performance. Initially, all the datasets used in the current analysis
   and contained more than 10 samples were repurposed as training/test
   sets in order to validate the diagnostic ability of the nine features.
   Their diagnostic value was also evaluated in the final merged
   meta-dataset.

   Finally, an external dataset was also utilized to further evaluate
   these features as diagnostic biomarkers for BCa. This dataset was
   obtained from the ArrayExpress repository at the European
   Bioinformatics Institute (EMBL-EBI) [[157]80] and contained 19 BCa (14
   NMIBC and 5 MIBC) and 11 control samples from urinary bladder tissue
   biopsies [[158]81], which were hybridized to Affymetrix Human Gene 1.0
   ST ([159]GPL6244).

   For every dataset, a fivefold cross-validation technique was
   implemented and repeated 10 times in order to get a more accurate
   evaluation of our classification model’s performance. For the final
   merged meta-dataset, a 10-fold cross validation was implemented. For
   all the developed models, the AUC of the classifier was used to
   evaluate the diagnostic performance of the model. The resulting ROC
   curves for all the built models along with their corresponding AUC
   values and 95% CIs were plotted. Each point on the ROC curves denotes a
   sensitivity/specificity pair obtained from a particular decision
   threshold, and the AUC indicates the efficacy of the corresponding
   model. The closer the AUC is to 1.0, the better the performance of the
   classification model.

3. Results

3.1. Systematic Search and Selection of Eligible Microarray Datasets

   A total of 18 studies from GEO (accession numbers: [160]GSE3167,
   [161]GSE7476, [162]GSE13507, [163]GSE21142, [164]GSE23732,
   [165]GSE24152, [166]GSE31189, [167]GSE37815, [168]GSE38264,
   [169]GSE40355, [170]GSE41614, [171]GSE42089, [172]GSE45184,
   [173]GSE52519, [174]GSE65635, [175]GSE76211, [176]GSE100926, and
   [177]GSE121711) met the inclusion criteria ([178]Figure 2) and were
   selected for the integrative meta-analysis. The final dataset included
   619 samples (417 BCa samples and 202 controls). The datasets included
   in this meta-analysis followed a similar experimental design and
   compared human BCa tissues with normal ones. Notably, the datasets were
   characterized by 13 different microarray platforms. [179]Table 1
   provides detailed information on each dataset included in the
   integrative meta-analysis and highlights the sample type, their
   phenotypic characteristics, year, and reference of the study and
   microarray platform used.

Table 1.

   Characteristics of the 18 individual series included in the integrative
   meta-analysis.
   GEO
   Accession Samples (n) Year Platform Sample Characteristics Reference
   Total BCa Controls
   [180]GSE3167 60 46 14 2005 [181]GPL96
   (HG-U133A) Affymetrix Human Genome U133A Array
     * 13 superficial transitional cell carcinomas with surrounding CIS
     * 15 without surrounding CIS lesions13 muscle-invasive carcinomas
     * 5 CISs
     * 5 CISs
     * 14 normal bladder tissues

   [[182]82]
   [183]GSE7476 12 9 3 2007 [184]GPL570
   (HG-U133_Plus_2) Affymetrix Human Genome U133 Plus 2.0 Array
     * 3 groups of 5 pooled Ta tumors
     * 1 group of 5 pooled T1 tumors
     * 2 groups of 4 pooled T1 tumors
     * 3 groups of 5 pooled T2+ tumors
     * 3 groups of 4 pooled normal bladder tissues

   [[185]83]
   [186]GSE13507 232 170 62 2010 [187]GPL6102
   Illumina human-6 v2.0 expression beadchip
     * 24 primary Ta cancer tissues
     * 62 primary T1 cancer tissues
     * 31 primary T2 cancer tissues
     * 19 primary T3 cancer tissues
     * 11 primary T4 cancer tissues
     * 23 recurrent NMIBC tissues
     * 62 normal bladder tissues

   [[188]64]
   [189]GSE21142 24 12 12 2013 [190]GPL10274
   Affymetrix GeneChip Human Genome U133 Plus 2.0 Array
   (Brainarray CustomCDF, GU133Plus2_Hs_UG_Version 12.cdf)
     * 6 superficial urothelial carcinomas
     * 6 invasive urothelial carcinomas
     * 12 normal bladder tissues

   [[191]84]
   [192]GSE23732 8 7 1 2012 [193]GPL6244
   (HuGene-1_0-st) Affymetrix Human Gene 1.0 ST Array
   (transcript (gene) version)
     * 7 muscle-invasive bladder cancer
     * 1 normal bladder tissue

   -
   [194]GSE24152 17 10 7 2010 [195]GPL6791
   Affymetrix GeneChip Human Genome U133 Plus 2.0 Array
   (CDF: Hs_ENTREZG_10)
     * 10 muscle-invasive urothelial bladder carcinomas
     * 7 benign bladder tissues

   [[196]85]
   [197]GSE31189 92 52 40 2013 [198]GPL570
   (HG-U133_Plus_2) Affymetrix Human Genome U133 Plus 2.0 Array
     * 52 urothelial cancer cells
     * 40 normal urothelial cells

   [[199]86]
   [200]GSE37815 24 18 6 2013 [201]GPL6102
   Illumina human-6 v2.0 expression beadchip
     * 18 NMIBC tissues
     * 6 normal bladder tissues

   [[202]87]
   [203]GSE38264 38 28 10 2014 [204]GPL6244
   (HuGene-1_0-st) Affymetrix Human Gene 1.0 ST Array
   (transcript (gene) version)
     * 28 Ta and T1 tumors
     * 10 normal bladder tissues

   [[205]88]
   [206]GSE40355 24 16 8 2013 [207]GPL13497
   Agilent-026652 Whole Human Genome Microarray 4x44K v2
   (Probe Name version)
     * 8 Ta urothelial carcinoma tissues
     * 5 T1 urothelial carcinoma tissues
     * 3 T2 urothelial carcinoma tissues
     * 8 normal bladder tissue samples

   [[208]89]
   [209]GSE41614 10 5 5 2013 [210]GPL5175
   (HuEx-1_0-st) Affymetrix Human Exon 1.0 ST Array
   (transcript (gene) version)
     * 2 samples with blood vessels from T1 bladder cancer tissue
     * 2 samples with blood vessels from T2 bladder cancer tissue
     * 1 sample with blood vessels from T3 bladder cancer tissue
     * 5 samples with blood vessels from normal bladder

   [[211]90]
   [212]GSE42089 18 10 8 2013 [213]GPL9828
   (HG-U133_Plus_2) Affymetrix Human Genome U133 Plus 2.0 Array (CDF:
   Brainarray Hs133P_Hs_ENTREZG version 10)
     * 10 urothelial cell carcinomas
     * 8 normal bladder tissues

   [[214]91]
   [215]GSE45184 6 3 3 2013 [216]GPL14550
   Agilent-028004 SurePrint G3 Human GE 8x60K Microarray
   (Probe Name Version)
     * 3 bladder cancer tissues
     * 3 normal adjacent tissues

   [[217]92]
   [218]GSE52519 12 9 3 2013 [219]GPL6884
   Illumina HumanWG-6 v3.0 expression beadchip
     * 1 T1 cancerous tissue sample
     * 2 T2 cancerous tissue samples
     * 2 T3 cancerous tissue samples
     * 4 T4 cancerous tissue samples
     * 3 normal bladder tissues

   [[220]93]
   [221]GSE65635 12 8 4 2015 [222]GPL14951
   Illumina HumanHT-12 WG-DASL V4.0 R2 expression beadchip
     * 5 T1 bladder cancer tissues
     * 2 T3 bladder cancer tissues
     * 1 T4 bladder cancer tissues
     * 4 normal bladder tissues

   [[223]93]
   [224]GSE76211 6 3 3 2017 [225]GPL17586
   (HTA-2_0) Affymetrix Human Transcriptome Array 2.0
   (transcript (gene) version)
     * 3 T3 bladder cancer tissues
     * 3 normal bladder tissues

   [[226]94]
   [227]GSE100926 6 3 3 2017 [228]GPL14550
   Agilent-028004 SurePrint G3 Human GE 8x60K Microarray
   (Probe Name Version)
     * 2 T2 MIBC tissues
     * 1 T3 MIBC tissue
     * 3 normal bladder tissues

   [[229]95]
   [230]GSE121711 18 8 10 2019 [231]GPL17586
   (HTA-2_0) Affymetrix Human Transcriptome Array 2.0
   (transcript (gene) version)
     * 3 Ta primary tumors
     * 2 T1 primary tumors
     * 3 T2 primary tumors
     * 10 normal bladder tissues

   [[232]96]
   Total 619 417 202 -- -- -- --
   [233]Open in a new tab

3.2. Quality Control

   In total, 13 samples were identified as outliers according to the
   implemented QC framework and they were consequently excluded from
   further analysis. More specifically, we removed two samples from the
   dataset [234]GSE3167 (one BCa sample and one control), one sample from
   the dataset [235]GSE13507 (BCa sample), one sample from the dataset
   [236]GSE21142 (BCa sample), five samples from the dataset [237]GSE31189
   (two BCa samples and three controls), one sample from the dataset
   [238]GSE38264 (control), two samples from the dataset [239]GSE40355
   (two BCa samples), and one sample from the dataset [240]GSE42089
   (control). After QC, the meta-analysis included 606 samples, consisting
   of 410 BCa samples and 196 controls.

3.3. Gene Annotation

   Subsequent to the proper probe-gene mapping for each individual
   platform, we juxtaposed the coverage of the 13 different human arrays
   (Affy HG U133A, Affy HG U133 Plus 2, Illu Human-6 V2, Affy HuGene 1 ST,
   Agi WHG 4x44K V2, Affy HuEx 1 ST, Agi G3 GE 8x60K, Illu Human-6 V3,
   Illu Human-12 WG-DASL V4, Affy HTA 2, and three custom Brainarray CDFs
   for Affy HG U133 Plus 2). Overall, the probes on the 13 array platforms
   targeted a total of 27,579 unique gene symbols, out of which 8201 gene
   symbols were common among all 13 microarray platforms. Hence, the
   integrative meta-analysis was conducted only on these 8201 common gene
   symbols across all datasets.

3.4. Batch Effects and Cross-Platform Normalization

   The batch effects presented in each dataset were inspected utilizing
   the PCA. Due to a very strong detected batch effect, samples from
   [241]GSE13507 were further separated into two subgroups, GSE13507A and
   GSE13507B, with 41 (24 BCa samples and 17 controls) and 190 (145 BCa
   samples and 45 controls) samples, respectively ([242]Figure S1). The
   new PCA plots for each of these two datasets are presented in
   [243]Figure S2. These two subgroups were considered as individual
   datasets during the downstream analysis.

   The Z-score transformation was used to correct intra-sample data and to
   adjust the systematic bias across datasets generated by different
   platforms. Therefore, the hybridization values for each gene within a
   sample are expressed in SD units from the zero mean. Comparisons across
   samples were then performed on uniformly transformed data.

3.5. Differential Expression Analysis

   Following the pre-processing and standardization of each dataset, we
   combined the Z-score transformed expression data for every sample into
   a universal dataset by using the common gene symbols. This merged
   meta-dataset included 606 samples (410 BCa and 196 control samples) and
   8201 shared gene symbols.

   The DEGs between BCa and control tissue samples of the 19 merged
   datasets from the GEO were obtained using the R/Bioconductor package
   limma. The p-value was adjusted using the BH method in order to control
   the FDR, and a cut-off threshold of adjusted p-value < 0.01 was
   selected. In order to determine the |log[2]FC| cut-off value, an SVM
   classification model was established for each of the different sets of
   DEGs corresponding to the |log[2]FC| values from one to two in steps of
   0.1. Thus, for every |log[2]FC| value the corresponding set of DEGs was
   used as features. For every model, the number of the features along
   with the estimated area under the ROC curve (AUC), the sensitivity, and
   the specificity of the classifier are presented in [244]Table 2. The
   AUC actually expresses the probability value that one sample is
   classified correctly. As we notice, the classifier achieves very high
   classification precision and the differences are almost negligible.

Table 2.

   Performance parameters of the various classification models for the
   different set of DEGs. Bold: the one with the highest AUC.
   |log[2]FC| No of Features (DEGs)  AUC   Sensitivity Specificity
       1               1295         0.9525    0.7964      0.9366
      1.1              1099         0.9517    0.7934      0.9327
      1.2              929          0.9527    0.7842      0.9346
      1.3              815          0.9531   0.7985      0.9334
      1.4              725          0.9510    0.7996      0.9322
      1.5              625          0.9516    0.8022      0.9342
      1.6              549          0.9487    0.7929      0.9278
      1.7              495          0.9482    0.7844      0.9312
      1.8              442          0.9510    0.7966      0.9298
      1.9              407          0.9519    0.8001      0.9288
      2.0              364          0.9507    0.7903      0.9356
   [245]Open in a new tab

   However, the highest value for AUC, which indicates the classifier with
   the highest performance, was achieved for the cut-off value of
   |log[2]FC| = 1.3. Therefore, a total of 815 DEGs between BCa and
   control samples were obtained through the expression profiles of the
   limma package, implementing |log[2]FC| ≥ 1.3 and adjusted p-value <
   0.01 as cut-off criteria. Overall, the DEGs contained 540 downregulated
   genes and 275 upregulated genes. The volcano plot and the top 100 DEGs
   heatmap are illustrated in [246]Figure S3 and [247]Figure 3,
   respectively. In the generated heatmap, hierarchical clustering was
   performed on gene and on sample level as well. It can be observed that
   batch effects were present in the gene expression space, as samples
   were clustered based on their phenotype and microarray study.

Figure 3.

   [248]Figure 3
   [249]Open in a new tab

   Heatmap plot of the top 100 DEGs between BCa and control samples in the
   merged meta-dataset. Blue and red represent relative downregulation and
   upregulation, and white represents no significant change in gene
   expression. Horizontal and vertical axes are clustered by genes and
   samples, respectively.

3.6. Functional and Pathway Enrichment Analysis

   To gain insight into the functional roles of DEGs and pathways involved
   in BCa, we performed a comprehensive functional enrichment analysis in
   various databases. In particular, GO, KEGG pathway, REAC pathway, and
   DO enrichment analysis of the 815 robust DEGs were performed, using the
   R/Bioconductor package clusterProfiler. For all the following analyses,
   the cut-off threshold parameters were p-valueCutoff = 0.01 and
   q-valueCutoff = 0.05, corrected using the BH method.

   Gene Ontology (GO) enrichment analysis was performed based on the list
   of identified DEGs. The bar plots of the top 25, if present, enriched
   GO terms of biological processes (BP), molecular functions (MF), and
   cellular components (CC) were generated in the form of bar plots and
   are presented in [250]Figure 4. GO terms of downregulated genes related
   to BP included extracellular matrix organization, extracellular
   structure organization, angiogenesis, vasculature, muscle structure,
   and muscle tissue development; GO terms of upregulated genes related to
   BP included cell division, mitotic cell cycle process, chromosome
   segregation, and organization. In the MF category, downregulated DEGs
   were enriched with extracellular matrix structural constituent, as well
   as glycosaminoglycan, integrin, sulfur compound, and calcium ion
   binding functions. The upregulated genes in the MF area were enriched
   only with DNA replication origin binding. CC GO terms of downregulated
   genes were primarily associated with the collagen-containing
   extracellular matrix, extracellular encapsulating structure, and
   extracellular matrix. Finally, CC GO terms of upregulated genes were
   enriched in the spindle, chromosome centromeric region, and chromosomal
   region.

Figure 4.

   [251]Figure 4
   [252]Open in a new tab

   Top 25 significant Gene Ontology (GO) enrichment analysis terms of
   (A,B) biological processes (BP) of down- and upregulated DEGs; (C,D)
   molecular functions (MF) of down- and upregulated DEGs; (E,F) cellular
   components (CC) of down- and upregulated DEGs.

   KEGG pathway enrichment analysis demonstrated that there were 35
   pathways enriched in the set of DEGs. The top 25 enriched terms are
   presented in [253]Figure 5. The analysis showed that PI3K-Akt signaling
   pathway, micro-RNAs in cancer, cell cycle, focal adhesion, cell
   adhesion molecules, cellular senescence, complement, and coagulation
   cascades, ECM–receptor interaction, and bladder cancer were highly
   connected with the detected DEGs.

Figure 5.

   [254]Figure 5
   [255]Open in a new tab

   Top 25 significant terms of the KEGG pathway analysis.

   Through the REAC enrichment analysis, a number of 81 pathways were
   enriched. All of the top 25 pathways showed high significance for their
   respective entities and are presented in [256]Figure 6. The most
   enriched REAC terms were extracellular matrix organization, cell cycle
   checkpoints, and Rho GTPase effectors, which are key regulators of
   cytoskeletal dynamics. The pairwise similarities of the enriched terms
   were also calculated and visualized in an enrichment map ([257]Figure
   6). In this map, two main clusters were defined, which were involved in
   the processes of cell cycle and replication and extracellular matrix,
   respectively.

Figure 6.

   [258]Figure 6
   [259]Open in a new tab

   (Upper) Top 25 significant terms of the Reactome pathway enrichment
   analysis. (Lower) Enrichment map of the Reactome enriched terms
   presented into a network.

   DO enrichment analysis demonstrated that there were 185 enriched terms
   that were strongly connected with the detected DEGs. The top 25
   enriched DO terms are presented in [260]Figure 7. Noteworthy, the
   highest enriched DO term was the urinary system cancer, along with
   non-small cell lung, kidney, breast, and musculoskeletal system
   carcinomas in the following ranks.

Figure 7.

   [261]Figure 7
   [262]Open in a new tab

   Top 25 significant terms of the Disease Ontology (DO) enrichment
   analysis.

3.7. Protein–Protein Interaction Network Analysis

   The PPI network of the 815 DEGs was constructed and visualized via the
   STRING database ([263]Figure 8). The network included 813 nodes and
   8260 edges, with an average node degree of 20.3 and an average local
   clustering coefficient of 0.382. The PPI network’s enrichment p-value
   was < 10^−16, indicating that the proteins were biologically related as
   a group.

Figure 8.

   [264]Figure 8
   [265]Open in a new tab

   The constructed PPI network visualized by the STRING database. Nodes
   represent proteins and edges represent protein–protein associations.
   Line thickness represents the confidence score of a functional
   association.

   The nodes of the PPI network were ranked applying 10 topological
   analysis methods and, hence, 10 ranked gene lists were obtained. These
   methods included local as well as global algorithms, as implemented in
   the cytoHubba plugin in Cytoscape software. In order to result in a
   final ranked gene list, we utilized the robust rank aggregation (RRA)
   method. The final ranked gene list, based on the cytoHubba plugin,
   included 129 genes, which showed a p-value < 0.01 (A in [266]Table 3).

Table 3.

   Hub genes as obtained from cytoHubba and MCODE plugins of Cytoscape.
   A. Genes Included in the Final Ranked List Aggregated from the 10
   Topological cytoHubba Methods
   IL6, VEGFA, CCNB1, BRCA1, CCNA2, CD44, TYMS, CDH1, LMNB1, AURKB, EZH2,
   MKI67, KIF23, ECT2, MCM4, CDC6, PLK1, CDC25C, CDKN3, CENPA, MMP2,
   TOP2A, CENPE, PBK, NDC80, FOXM1, SPP1, IGF1, UBE2C, RRM2, KIF11, CHEK1,
   CD8A, CCNB2, ASPM, NCAM1, FLNA, LGALS4, ITPR1, DLGAP5, CDCA8, COL5A1,
   TIMELESS, CDC20, DMD, PPARGC1A, WNT5A, BUB1, KIF20A, EXO1, CDC25A, VCL,
   LUM, CCND2, CD34, MCM2, MAD2L1, HPGDS, ISL1, ESRP1, SKP2, NCAPG, CENPU,
   HJURP, CCL2, TPM1, CDH11, PLK4, FABP4, H2AFX, GJA1, DHCR7, PTGS2, MSN,
   ANXA5, COL6A1, TRIP13, OIP5, MYH11, KRT20, TTK, MYL9, CAV1, FBXO5,
   PROM1, BMP4, CDT1, KIAA0101, CCNE1, ANXA1, FGFR3, SNCA, ATAD2, ESPL1,
   FASN, NT5E, ZWINT, SDC1, FGF2, NEK2, ACTG2, KIF14, COL3A1, EPCAM,
   ASF1B, IGFBP5, RAD54L, CYP1B1, STMN1, COL4A5, ATF3, CASC5, CENPM,
   ERBB3, DNMT3B, ITGB2, ISG15, ANK2, CDC45, PLAT, TACC3, EGR1, MYLK,
   CTSG, GINS2, ITGA8, CENPF, TGFBR2, OGN
   B. Genes included in the first three clusters of MCODE
   Cluster Score Nodes Gene clusters
   1 74.268 83 PLK4, TRIP13, CDC45, PBK, RRM2, ERCC6L, CHAF1A, DEPDC1,
   DLGAP5, ASPM, E2F8, MAD2L1, CDCA8, CCNB1, BRCA1, FANCI, FBXO5, CENPA,
   KIAA0101, TK1, TACC3, DTL, CDCA3, HJURP, CENPE, ZWINT, ESPL1, POLQ,
   OIP5, CDC25C, ASF1B, CDKN3, POLE2, CCNB2, CHAF1B, EZH2, UBE2C, RAD54L,
   CDT1, MCM5, CDC20, TROAP, CKS2, NEK2, SPC25, MKI67, CHEK1, TTK, CDC6,
   GINS2, BUB1, CENPU, CCNE2, STIL, KIF14, TYMS, CDC7, MCM2, KIF23, KNTC1,
   SKA1, CASC5, CENPF, HELLS, NUSAP1, ATAD2, CEP55, NCAPG, MCM4, NDC80,
   ECT2, TOP2A, CENPM, CDC25A, MCM10, ORC1, KIF20A, AURKB, CCNA2, PLK1,
   EXO1, FOXM1, KIF11
   2 18 23 CXCL12, PTGS2, BMP4, IL6, GJA1, CD34, FGF2, NES, PROM1, CD8A,
   VEGFA, CD44, SDC1, SPP1, ANXA5, NCAM1, SELP, CCL2, CCL5, IGF1, CSF1R,
   NT5E, SELE
   3 7.923 27 TGFBI, COL6A2, THBS2, TPM1, MYH11, ACTG2, COL6A1, COL13A1,
   COL3A1, TGFBR2, VCL, FBLN2, COL4A5, CTSK, LYVE1, CLDN5, ANGPT2, LUM,
   MYL9, LEPREL1, TPM2, SPARC, MYLK, CAV1, ADAMTS5, TAGLN, FMOD
   C. Common genes between cytoHubba and MCODE
   IL6, VEGFA, CCNB1, BRCA1, CCNA2, CD44, TYMS, AURKB, EZH2, MKI67, KIF23,
   ECT2, MCM4, CDC6, PLK1, CDC25C, CDKN3, CENPA, TOP2A, CENPE, PBK, NDC80,
   FOXM1, SPP1, IGF1, UBE2C, RRM2, KIF11, CHEK1, CD8A, CCNB2, ASPM, NCAM1,
   DLGAP5, CDCA8, CDC20, BUB1, KIF20A, EXO1, CDC25A, VCL, LUM, CD34, MCM2,
   MAD2L1, NCAPG, CENPU, HJURP, CCL2, TPM1, PLK4, GJA1, PTGS2, ANXA5,
   COL6A1, TRIP13, OIP5, MYH11, TTK, MYL9, CAV1, FBXO5, PROM1, BMP4, CDT1,
   KIAA0101, ATAD2, ESPL1, NT5E, ZWINT, SDC1, FGF2, NEK2, ACTG2, KIF14,
   COL3A1, ASF1B, RAD54L, COL4A5, CASC5, CENPM, CDC45, TACC3, MYLK, GINS2,
   CENPF, TGFBR2
   [267]Open in a new tab

   Furthermore, we performed cluster analysis utilizing the MCODE plugin
   in Cytoscape software and kept only clusters with a score of more than
   seven. According to this, the selected modules included 133 nodes and
   3346 edges grouped in three clusters. Cluster one contained 83 nodes
   and 3045 edges, with a cluster score of 74.268. Cluster two contained
   23 nodes and 198 edges, with a cluster score of 18. Finally, cluster
   three contained 27 nodes and 103 edges, with a score of 7.923. The
   whole node list for each of the three clusters, containing a total of
   133 genes, is presented in B in [268]Table 3.

   The final list of hub genes, based on cytoHubba and MCODE plugins of
   Cytoscape, contained 87 common genes which were considered the most
   significant nodes of the PPI network (C in [269]Table 3). The PPI
   network of these genes is presented in [270]Figure 9.

Figure 9.

   [271]Figure 9
   [272]Open in a new tab

   The PPI network of the final 87 hub genes. Nodes represent proteins and
   edges represent protein–protein associations. Line thickness represents
   the confidence score of a functional association.

3.8. Weighted Protein–Protein Interaction Network Analysis

   In order to identify strongly BCa-correlated genes among the common
   cross-platform genes, a weighted gene co-expression network analysis
   (WGCNA) was conducted, including only the datasets containing more than
   20 samples. Based on the hierarchical clustering trees, no outliers
   were detected, since they were removed in the previous steps
   ([273]Section 3.2). After the batch effects correction through the sva
   function, a total of 482 samples, distributed across eight datasets,
   were used in this consensus network topology analysis.

   We chose seven as the consensus suitable soft thresholding power, as
   this is the lowest power at which two conditions are fulfilled: the
   scale-free topology fit index reaches 0.77 and the median connectivity
   measurements decrease below 30 ([274]Figure S4). Along with the
   threshold power, we set 30 as the minimal module gene size and 0.25 as
   the height for the dynamic tree cutting algorithm. Accordingly, we
   obtained eight consensus gene co-expression modules. The number of the
   included genes in each module ranged from 34 to 1901 ([275]Figure S5);
   the gray module contained 4898 genes that could not be assigned to any
   module.

   After obtaining gene modules, the correlation coefficients of each
   module for all eight datasets were calculated ([276]Figure S6). In
   order to obtain a consensus module–trait correlation heatmap, only the
   modules with a consistent coefficient sign across all datasets were
   kept. For them, the lower absolute correlation value in all datasets
   and the higher p-value were assigned as the module’s consensus
   correlation coefficient and significance, respectively. For the
   remaining modules, the zero value was assigned as the consensus
   correlation ([277]Figure 10). The advantage of the consensus
   relationship heatmap is that it isolates the module–trait relationships
   that are present in all datasets, and hence may be in a sense
   considered validated. It has to be noted that all the module–trait
   correlations and significance were very low for the [278]GSE31189
   dataset, presumably due to the strong batch effects that remained.
   Hence, this dataset was not considered for the consensus module–trait
   correlation. Based on the final heatmap of module–trait correlations,
   we determined that the turquoise (cor  =  −0.68, p-value  =  3 ×
   10^−8), brown (cor  =  −0.65, p-value  =  1 × 10^−7), black (cor  =
    −0.54, p-value  =  2 × 10^−4), blue (cor  =  0.71, p-value  =  9 ×
   10^−11), green (cor  =  0.66, p-value  =  4 × 10^−8), and yellow (cor =
   0.64, p-value  =  3 × 10^−8) modules were most highly correlated to BCa
   (p-value < 0.01) and were characterized as key modules ([279]Figure
   10). The turquoise module contained 1901 genes, the brown module
   included 764 genes, the black included 231 genes, the blue module
   encompassed 139 genes, the green module included 115 genes and,
   finally, the yellow module integrated 58 genes.

Figure 10.

   [280]Figure 10
   [281]Open in a new tab

   Heatmap of the consensus relationships of consensus module eigengenes
   and phenotypic traits. Each row corresponds to a consensus module
   eigengene and each column corresponds to the phenotypic characteristic.
   Each cell contains the corresponding correlation (ranging from blue to
   red) and p-value.

   Next, we calculated the gene significances and module memberships in
   each key module. We set the criteria to specify the hub genes highly
   associated with BCa: module membership (MM) and gene significance (GS)
   meta-Z-scores in the upper or lower quartile of each module. We
   identified that 815 hub genes from the turquoise module, 345 hub genes
   from the brown, 91 genes from the black, 49 genes from the blue, 50
   genes from the green, and 21 genes from the yellow module met the
   inclusion criteria ([282]Figure S7). Finally, we combined the genes of
   each key module with the hub genes obtained from the PPI network
   analysis, and we determined the key hub genes of our analysis. These
   key hub genes are listed in [283]Table 4.

Table 4.

   The key hub genes of our study are defined as the intersection of hub
   genes between PPI network analysis and WGCNA.
   Module Common Hub Genes
   turquoise ACTG2, ANXA5, AURKB, BUB1, CD34, CD44, CDC25A, CDT1, CENPM,
   ESPL1, EXO1, FGF2, GINS2, KIF20A, NCAM1
   brown CAV1, COL3A1, COL4A5, IGF1, LUM, MYLK, PROM1, SDC1, SPP1, TPM1,
   VCL, VEGFA
   black ASPM, CCNA2, CCNB1, CCNB2, CDC20, CDC45, CDCA8, CDKN3, CENPA,
   CENPF, CENPU, DLGAP5, ECT2, EZH2, FOXM1, HJURP, KIF11, KIF14, KIF23,
   MCM2, MCM4, MKI67, NCAPG, NDC80, NEK2, PBK, PLK4, RAD54L, TOP2A, TTK,
   UBE2C, ZWINT
   blue --
   green COL6A1, MYH11
   yellow --
   [284]Open in a new tab

3.9. Differential Expression in Urine and Blood Plasma Samples

   The raw gene expression data of urine samples, from series
   [285]GSE51843 and [286]GSE68020, characterized by Illumina Human HT12
   v4 BeadChip ([287]GPL10558), were downloaded and appropriately
   pre-processed. The datasets consist of five BCa and six control urine
   samples, and 30 BCa and 20 control urine samples, respectively. The
   Wilcoxon rank sum test was applied in each dataset to assess the
   statistical significance between the BCa and control groups for all the
   key hub genes. The hub genes with a p-value < 0.05 were identified as
   significantly differentially expressed between BCa and control urine
   samples. These genes included KIF20A, CDCA8, and TTK for the
   [288]GSE51843, and AURKB, CDT1, GINS2, COL3A1, SDC1, SPP1, CCNB2,
   CDC45, CDCA8, CENPU, MCM4, PBK, PLK4, TOP2A, and UBE2C for the
   [289]GSE68020 dataset. The Wilcoxon rank sum test results for these
   genes are presented in boxplots in [290]Figure 11 and [291]Figure 12.

Figure 11.

   [292]Figure 11
   [293]Open in a new tab

   Significantly differentially expressed key hub genes in urine samples
   using the Wilcoxon rank sum test in [294]GSE51843 (n = 11).

Figure 12.

   [295]Figure 12

   [296]Figure 12

   [297]Figure 12
   [298]Open in a new tab

   Significantly differentially expressed key hub genes in urine samples
   using the Wilcoxon rank sum test in [299]GSE68020 (n = 50).

   The raw gene expression data of blood plasma samples, from series
   [300]GSE138118, characterized by Affymetrix Human Gene 2.1 ST Array
   ([301]GPL17692), were downloaded and appropriately pre-processed. The
   dataset consisting of 28 BCa and 29 control blood plasma samples was
   adjusted for batch effects. The Wilcoxon rank sum test was conducted to
   assess the statistical significance between the BCa and control groups
   for all the key hub genes. The hub genes with a p-value < 0.05 were
   considered as significantly differentially expressed between BCa and
   control blood samples. These genes included ANXA5, CD34, CDT1, COL4A5,
   VEGFA, ASPM, CDC20, ECT2, HJURP, MCM2, and COL6A1. The Wilcoxon rank
   sum test results for these genes are presented in boxplots in
   [302]Figure 13.

Figure 13.

   [303]Figure 13

   [304]Figure 13
   [305]Open in a new tab

   Significantly differentially expressed key hub genes in blood plasma
   samples using the Wilcoxon rank sum test in [306]GSE138118 (n = 57).

3.10. Prognostic Genes for BCa

   To better comprehend which of the 61 key hub genes were more closely
   associated with clinical outcomes of BCa patients, we further evaluated
   these genes by applying univariate Cox regression analysis on survival
   data of 165 patients with BCa of various stages ([307]GSE13507).
   Univariate Cox regression analysis indicated that 46 genes were of
   statistically significant correlation with OS ([308]Table 5). We
   performed LASSO Cox regression analysis with 10-fold cross validation
   to further decrease the number of significant genes and properly detect
   those that were highly associated with BCa-related survival. Nine
   genes, namely ACTG2, ASPM, CDCA8, COL3A1, COL4A5, FOXM1, MKI67, PLK4,
   and SPP1, were identified. Multivariate Cox regression analysis
   indicated that the expressions of COL3A1, FOXM1, and PLK4 were highly
   and independently connected with the BCa patients’ prognosis
   ([309]Table 5), and were used to calculate each gene’s coefficient.
   Finally, a three-gene signature prognostic model was constructed. We
   calculated a prognostic risk score for every patient of the training
   set ([310]GSE13507) based on their distinct expression levels of the
   three genes, using the prognostic index (PI):
   [MATH:
   <mrow><mrow><mi>P</mi><mi>I</mi><mo>=</mo><mn>0.5405</mn><mo>·</mo><mi>
   e</mi><mi>x</mi><msub><mi>p</mi><mrow><mi>C</mi><mi>O</mi><mi>L</mi><mn
   >3</mn><mi>A</mi><mn>1</mn></mrow></msub><mo>+</mo><mn>1.6748</mn><mo>·
   </mo><mi>e</mi><mi>x</mi><msub><mi>p</mi><mrow><mi>F</mi><mi>O</mi><mi>
   X</mi><mi>M</mi><mn>1</mn></mrow></msub><mo>−</mo><mn>0.9583</mn><mo>·<
   /mo><mi>e</mi><mi>x</mi><msub><mi>p</mi><mrow><mi>P</mi><mi>L</mi><mi>K
   </mi><mn>4</mn></mrow></msub></mrow></mrow> :MATH]
   (4)

   where
   [MATH: <mrow><mrow><mi>e</mi><mi>x</mi><mi>p</mi></mrow></mrow> :MATH]
   is the expression value of the respective gene. The forest plot of our
   prognostic model is depicted in [311]Figure 14C. The co-expression
   correlation analysis was performed through the GEPIA2 platform and
   indicated that no couple of the three genes held a Pearson correlation
   coefficient greater than 0.6 in BCa ([312]Figure S8).

Table 5.

   Univariate Cox regression analysis of the survival-associated hub genes
   in BCa patients (HR: hazard ratio, CI: confidence interval, * p-value <
   0.05, ** p-value < 0.01, *** p-value < 0.001, **** p-value < 0.0001).
   Univariate Analysis Multivariate Analysis
   RFS Related Gene HR (95% CI) p-Value HR (95% CI) p-Value
   ACTG2 1.2 (1–1.4) 1.40 × 10^−2 * -- --
   AURKB 2.1 (1.5–3) 3.30 × 10^−5 **** -- --
   BUB1 1.8 (1.2–2.6) 1.90 × 10^−3 ** -- --
   CDC25A 3.1 (1.6–5.9) 9.80 × 10^−4 *** -- --
   CDT1 1.9 (1.4–2.6) 1.00 × 10^−4 *** -- --
   CENPM 2.1 (1.4–3.1) 1.90 × 10^−4 *** -- --
   ESPL1 2.8 (1.7–4.6) 5.30 × 10^−5 **** -- --
   EXO1 3.8 (1.9–7.7) 2.50 × 10^−4 *** -- --
   GINS2 1.8 (1.2–2.5) 1.40 × 10^−3 ** -- --
   KIF20A 1.8 (1.3–2.5) 6.00 × 10^−4 *** -- --
   COL3A1 1.5 (1.1–1.9) 3.50 × 10^−3 ** 1.72 (1.29–2.29) 0.000223 ***
   COL4A5 0.66 (0.51–0.85) 1.70 × 10^−3 ** -- --
   LUM 1.3 (1–1.6) 1.80 × 10^−2 * -- --
   SPP1 1.4 (1.1–1.7) 4.80 × 10^−3 ** -- --
   ASPM 1.9 (1.4–2.6) 5.40 × 10^−5 **** -- --
   CCNA2 1.6 (1.2–2.3) 2.50 × 10^−3 ** -- --
   CCNB1 1.7 (1.1–2.6) 1.00 × 10^−2 * -- --
   CCNB2 1.8 (1.3–2.4) 9.90 × 10^−5 **** -- --
   CDC20 1.7 (1.3–2.3) 1.90 × 10^−4 *** -- --
   CDC45 2.7 (1.7–4.2) 1.30 × 10^−5 **** -- --
   CDCA8 2.2 (1.5–3.2) 2.40 × 10^−5 **** -- --
   CDKN3 2.1 (1.5–3) 3.40 × 10^−5 **** -- --
   CENPA 1.8 (1.3–2.5) 2.70 × 10^−4 *** -- --
   CENPF 2 (1.5–2.7) 7.50 × 10^−6 **** -- --
   CENPU 1.7 (1.1–2.6) 2.20 × 10^−2 * -- --
   DLGAP5 1.7 (1.2–2.3) 1.20 × 10^−3 ** -- --
   ECT2 3 (1.5–6.2) 3.00 × 10^−3 ** -- --
   EZH2 2.2 (1.4–3.4) 5.30 × 10^−4 *** -- --
   FOXM1 2.7 (1.8–4) 3.20 × 10^−6 **** 5.34 (2.95–9.64) 2.87 × 10^−8 ****
   HJURP 2.1 (1.5–2.9) 4.30 × 10^−5 **** -- --
   KIF11 1.9 (1.3–2.8) 6.00 × 10^−4 *** -- --
   KIF14 2.4 (1.5–3.7) 1.40 × 10^−4 *** -- --
   KIF23 3 (1.5–5.7) 1.20 × 10^−3 ** -- --
   MCM2 1.7 (1.2–2.3) 1.70 × 10^−3 ** -- --
   MCM4 1.4 (1–1.9) 3.80 × 10^−2 * -- --
   MKI67 9.8 (4–24) 8.60 × 10^−7 **** -- --
   NCAPG 2.1 (1.5–3) 6.40 × 10^−5 **** -- --
   NDC80 1.5 (1.1–2.1) 1.30 × 10^−2 * -- --
   NEK2 2.5 (1.4–4.5) 2.40 × 10^−3 ** -- --
   PBK 1.7 (1.2–2.4) 2.50 × 10^−3 ** -- --
   PLK4 1.6 (1–2.5) 3.90 × 10^−2 * 0.38 (0.19–0.80) 0.010188 *
   RAD54L 2.3 (1.5–3.4) 4.40 × 10^−5 **** -- --
   TOP2A 1.5 (1.2–1.9) 1.50 × 10^−3 ** -- --
   TTK 1.8 (1.3–2.4) 5.50 × 10^−4 *** -- --
   UBE2C 2.3 (1.5–3.6) 2.50 × 10^−4 *** -- --
   ZWINT 3 (1.4–6) 3.00 × 10^−3 ** -- --
   [313]Open in a new tab

Figure 14.

   [314]Figure 14

   [315]Figure 14
   [316]Open in a new tab

   Survival analysis for the [317]GSE13507 (n = 165) dataset (training
   set). (A) Kaplan–Meier curves for the overall survival of BCa patients
   as stratified by the three-gene prognostic index. Patients were divided
   into low- and high-risk groups according to the median prognostic
   index. (B) Time-dependent ROC curves (one-, three-, five-, and ten-year
   predictions) to assess the prognostic accuracy of the three-gene
   prognostic model. (C) Forest plot for the multivariate Cox regression
   analysis on the [318]GSE13507. The figure incorporates the hazard ratio
   value (e^coef) along with the 95% CI and p-value for each gene.

   Calculating the prognostic index for each of the patients in the
   training set and grouping them based on the median value, the survival
   time for the high-risk (poor prognosis) patients (n  =  82) was
   significantly worse (p-value < 0.0001) than that of the low-risk (good
   prognosis) patients (n  =  83), as indicated by the Kaplan–Meier curves
   ([319]Figure 14A). Additionally, the three-gene prognostic signature
   was assessed for its prognostic accuracy, conducting time-dependent ROC
   analysis at specific follow-up times; namely one, three, five, and ten
   years after diagnosis. The AUC at the different cut-off times were
   0.796, 0.779, 0.846, and 0.8, respectively ([320]Figure 14B).

   In the first test set ([321]GSE32894), the low-risk patient group (n =
   112), as predicted by the prognostic model, demonstrated significantly
   longer OS (p-value < 0.0001) in contrast to the high-risk patient group
   (n = 112) ([322]Figure 15A). The time-dependent ROC curves were plotted
   and the one-, three-, five-, and ten-year AUC values were 0.819, 0.86,
   0.871, and 0.789, respectively ([323]Figure 15B). In the second test
   set ([324]GSE32548), the low-risk (n = 65) and the high-risk (n = 66)
   patient groups also produced significantly different OS times (p-value
   < 0.0001) ([325]Figure 16A). Likewise, the time-dependent ROC curves
   were plotted and the corresponding AUC values were 0.82, 0.806, 0.774,
   and 0.724 ([326]Figure 16B).

Figure 15.

   [327]Figure 15
   [328]Open in a new tab

   Survival analysis for the [329]GSE32894 (n = 224) dataset (first test
   set). (A) Kaplan–Meier curves for overall survival of BCa patients as
   stratified by the three-gene prognostic index. Patients were divided
   into low- and high-risk groups according to the median prognostic
   index. (B) Time-dependent ROC curves (one-, three-, five-, and ten-year
   predictions) to assess the prognostic accuracy of the three-gene
   prognostic model.

Figure 16.

   [330]Figure 16
   [331]Open in a new tab

   Survival analysis for the [332]GSE32548 (n = 131) dataset (second test
   set). (A) Kaplan–Meier curves for overall survival of BCa patients as
   stratified by the three-gene prognostic index. Patients were divided
   into low- and high-risk groups according to the median prognostic
   index. (B) Time-dependent ROC curves (one-, three-, five-, and ten-year
   predictions) to assess the prognostic accuracy of the three-gene
   prognostic model.

   Finally, we investigated whether the signature, constituted of COL3A1,
   FOXM1, and PLK4 could be prognostic for the OS or DFS of TCGA-BCa
   patients. For this purpose, we tested the three-gene signature with
   Kaplan–Meier survival analysis using the GEPIA2 platform. The patients
   were separated into low- and high-risk groups (n = 201 in each group)
   based on the median expression value and the two groups showed
   statistically significant different OS time (p-value = 0.02)
   ([333]Figure 17A). Regarding the DFS time, the lower 15% and the upper
   85% of the sorted expression values were used for distinguishing the
   low- and high-expression patient groups, respectively (n = 61 in each
   group). The difference between the DFS curves of these expression
   groups was found to be significant (p-value = 0.019) ([334]Figure 17B).

Figure 17.

   [335]Figure 17
   [336]Open in a new tab

   Kaplan–Meier survival plots of the three-gene prognostic signature,
   generated using the GEPIA2 platform. Red and blue lines indicate the
   high- and low-risk patient groups, respectively. Patients were grouped
   according to (A) median cut-off value for overall survival and (B) a
   custom cut-off high and low value of 85% and 15%, respectively, for
   disease-free survival.

   To conclude, the Kaplan–Meier survival analysis and the AUC values at
   the different cut-off times indicated that the three-gene signature
   model holds a very good prognostic accuracy regarding grouping BCa
   patients in terms of survival. These results endorsed the validation of
   this prognostic gene signature.

   Survival plots for the individual key hub genes were generated by
   utilizing the GEPIA2 platform and were used to observe the OS and DFS
   status for each key hub gene in BCa ([337]Figure 18). The OS and DFS
   plots compared high- and low-expression groups in Bca tissues and a
   p-value <  0.05 was regarded as statistically significant. Elevated
   expression levels of ANXA5, CD34, FGF2, CAV1, COL3A1, IGF1, LUM, MYLK,
   SPP1, TPM1, VCL, DLGAP5, COL6A1, and MYH11 were found to be correlated
   with poorer patient OS, whereas expression levels of VEGFA were found
   to be inversely correlated with OS ([338]Figure 18). Moreover, the high
   expression levels of KIF20A, NCAM1, PROM1, CCNA2, CCNB1, CENPU, HJURP,
   MCM4, NCAPG, PBK, TTK, UBE2C, and ZWINT were found to be correlated
   with worse DFS. No significant relationship was observed for other hub
   genes (data not shown).

Figure 18.

   [339]Figure 18

   [340]Figure 18
   [341]Open in a new tab

   Kaplan–Meier survival plots of key hub genes, generated using the
   GEPIA2 platform. Red and blue lines indicate the high- and low-risk
   patient groups, respectively. Patients were grouped according to median
   or quartile cut-off values.

3.11. Predictive Genes for BCa

   The raw gene expression data of MIBC samples, from series
   [342]GSE169455, characterized by Affymetrix Human Gene 2.1 ST Array
   ([343]GPL17692), were downloaded and appropriately pre-processed. The
   dataset consists of 149 MIBC patients receiving preoperative
   cisplatin-based chemotherapy and was adjusted for batch effects. The
   Wilcoxon rank sum test was applied to assess each key hug gene’s
   statistical significance between the “No response”, the “Partial
   response”, and the “Complete response” groups. In addition, statistical
   significance between the “No response” and “Partial/Complete response”
   groups was also investigated for all the key hub genes.

   The hub genes with a p-value < 0.05 were identified as significantly
   differentially expressed between the groups. These genes included
   ESPL1, SPP1, CDCA8, HJURP, MKI67, PBK, TOP2A, and ZWINT between
   patients that did not respond to therapy and those who responded
   completely, KIF14 between patients that did not respond to therapy and
   those who responded partially, and KIF20A and KIF14 between the
   partially and completely responded patients. The Wilcoxon rank sum test
   results for these genes are presented in [344]Figure 19. As regards the
   two-class comparison between the “No response” and “Partial/Complete
   response” groups, the genes CD44, ESPL1, SPP1, and CDCA8 were
   significantly differentially expressed. The Wilcoxon rank sum test
   results for these genes are presented in [345]Figure 20.

Figure 19.

   [346]Figure 19

   [347]Figure 19
   [348]Open in a new tab

   Significantly differentially expressed key hub genes between “No
   response”, “Partially response”, and “Complete response” to
   preoperative cisplatin-based chemotherapy groups of MIBC patients.

Figure 20.

   [349]Figure 20

   [350]Figure 20
   [351]Open in a new tab

   Significantly differentially expressed key hub genes between “No
   response” and “Partially/Complete response” for preoperative
   cisplatin-based chemotherapy groups of MIBC patients.

   For the purpose of investigating the role of the key hub genes
   regarding patients’ response to cisplatin-based chemotherapy, we
   additionally assessed these genes using univariate Cox regression
   analysis in a total of 149 patients with MIBC ([352]GSE169455). The
   clinical characteristics of patients for this dataset contain
   information about recurrence-free survival (RFS), cancer-specific
   survival (CSS), and overall survival (OS) events. Therefore, we
   conducted the univariate Cox regression analysis for each one of these
   events. Only two genes, SPP1 and CDCA8, were found to be statistically
   significant with RFS, CSS, and OS time simultaneously, with SPP1
   showing a strong statistical significance ([353]Table 6). Additionally,
   we performed LASSO Cox regression analysis with 10-fold cross
   validation to select which of the key hub genes were highly associated
   with BCa RFS. A total of 19 genes, namely ACTG2, ANXA5, AURKB, CCNA2,
   CCNB2, CD44, CDC45, CDCA8, CDT1, CENPA, COL6A1, DLGAP5, IGF1, KIF14,
   NCAM1, NEK2, SPP1, VCL, and ZWINT, were identified. Multivariate Cox
   regression analysis indicated that the expression of ANXA5, CD44,
   NCAM1, SPP1, CDCA8, and KIF14 were highly and independently associated
   with the RFS of BCa patients ([354]Table 6) and were used to calculate
   the coefficient of each gene. Finally, a six-gene signature predictive
   model was constructed, and the predictive index (PDI) formula for this
   model was:
   [MATH: <mrow><mrow><mtable
   columnalign="left"><mtr><mtd><mi>P</mi><mi>D</mi><mi>I</mi><mo>=</mo></
   mtd><mtd><mo>|</mo><mo>−</mo><mn>0.87492</mn><mo>·</mo><mi>e</mi><mi>x<
   /mi><msub><mi>p</mi><mrow><mi>A</mi><mi>N</mi><mi>X</mi><mi>A</mi><mn>5
   </mn></mrow></msub><mo>+</mo><mn>0.50317</mn><mo>·</mo><mi>e</mi><mi>x<
   /mi><msub><mi>p</mi><mrow><mi>C</mi><mi>D</mi><mn>44</mn></mrow></msub>
   <mo>+</mo><mn>0.46781</mn><mo>·</mo><mi>e</mi><mi>x</mi><msub><mi>p</mi
   ><mrow><mi>N</mi><mi>C</mi><mi>A</mi><mi>M</mi><mn>1</mn></mrow></msub>
   </mtd></mtr><mtr><mtd></mtd><mtd><mo>+</mo><mn>0.54406</mn><mo>·</mo><m
   i>e</mi><mi>x</mi><msub><mi>p</mi><mrow><mi>S</mi><mi>P</mi><mi>P</mi><
   mn>1</mn></mrow></msub><mo>−</mo><mn>1.70391</mn><mo>·</mo><mi>e</mi><m
   i>x</mi><msub><mi>p</mi><mrow><mi>C</mi><mi>D</mi><mi>C</mi><mi>A</mi><
   mn>8</mn></mrow></msub><mo>+</mo><mn>1.54315</mn><mo>·</mo><mi>e</mi><m
   i>x</mi><msub><mi>p</mi><mrow><mi>K</mi><mi>I</mi><mi>F</mi><mn>14</mn>
   </mrow></msub><mo>|</mo></mtd></mtr></mtable></mrow></mrow> :MATH]
   (5)

Table 6.

   Univariate Cox regression analysis of the survival-associated hub genes
   in MIBC patients receiving neoadjuvant cisplatin-based chemotherapy
   (HR: hazard ratio, CI: confidence interval, * p-value < 0.05, **
   p-value < 0.01, *** p-value < 0.001, **** p-value < 0.0001).
   Univariate Analysis Multivariate Analysis
   RFS-Related Gene HR (95% CI) p-Value HR (95% CI) p-Value
   ANXA5 1.1 (0.70–1.70) 0.7 0.42 (0.24–0.74) 0.00268 **
   CD44 1.3 (0.99–1.70) 0.055 1.65 (1.20–2.28) 0.00220 **
   NCAM1 1.1 (0.81–1.60) 0.48 1.60 (1.09–2.35) 0.01718 *
   IGF1 0.48 (0.24–0.98) 0.043 * -- --
   SPP1 1.5 (1.30–1.80) 3.3 × 10^−6 **** 1.72 (1.42–2.09) 2.93 × 10^−8
   ****
   CDCA8 0.5 (0.26–0.96) 0.037 * 0.18 (0.08–0.42) 5.72 × 10^−5 ****
   KIF14 1.8 (0.87–3.70) 0.11 4.68 (2.17–10.11) 8.59 × 10^−5 ****
   CSS-Related Gene HR (95% CI) p -Value
   ANXA5 1.1 (0.68–1.70) 0.74 0.44 (0.24–0.82) 0.009708 **
   CD44 1.3 (0.95–1.60) 0.11 1.60 (1.12–2.29) 0.009785 **
   NCAM1 1.1 (0.82–1.60) 0.47 1.43 (0.99–2.05) 0.049989 *
   SPP1 1.4 (1.20–1.70) 5.8 × 10^−5 **** 1.64 (1.35–2.01) 1.15 × 10^−6
   ****
   CDCA8 0.48 (0.24–0.94) 0.034 * 0.17 (0.07–0.41) 6.49 × 10^−5 ****
   KIF14 1.7 (0.84–3.60) 0.14 4.82 (2.12–10.96) 0.000171 ***
   OS-Related Gene HR (95% CI) p -Value
   ACTG2 1.3 (1.00–1.60) 0.038 * -- --
   ANXA5 0.97 (0.63–1.50) 0.9 0.41 (0.23–0.72) 0.002139 **
   CD44 1.2 (0.96–1.60) 0.095 1.63 (1.17–2.28) 0.003812 **
   NCAM1 1.1 (0.82–1.50) 0.48 1.42 (1.00–2.01) 0.047272 *
   SPP1 1.4 (1.20–1.60) 0.00012 *** 1.60 (1.32–1.92) 1.1 × 10^−6 ****
   CDCA8 0.48 (0.25–0.91) 0.024 * 0.19 (0.09–0.42) 5.0 × 10^−5 ****
   KIF14 1.6 (0.78–3.20) 0.21 4.45 (2.01–9.85) 0.000225 ***
   [355]Open in a new tab

   The forest plot of our predictive model is presented in [356]Figure
   21C. The co-expression correlation analysis for the six genes was
   performed through the GEPIA2 platform and indicated that no couple of
   these genes held a Pearson correlation coefficient greater than 0.44 in
   BCa ([357]Figure S9).

Figure 21.

   [358]Figure 21
   [359]Open in a new tab

   Survival analysis for the [360]GSE169455 (n = 148) dataset (training
   set). (A) Kaplan–Meier curves for disease-free survival of MIBC
   patients who received cisplatin-based chemotherapy as stratified by the
   six-gene predictive index. Patients were divided into low- and
   high-risk groups according to the median predictive index. (B)
   Time-dependent ROC curves (one-, three-, five-, and 10-year
   predictions) to assess the predictive accuracy of the six-gene
   predictive model. (C) Forest plot for multivariate Cox regression
   analysis on the [361]GSE169455. The figure incorporates the hazard
   ratio value (e^coef) along with the 95% CI and p-value for each gene.

   Applying the predictive index for each of the patients in the training
   set, the recurrence-free survival time for the high-risk patients (n  =
    74) was significantly worse (p-value < 0.0001) than that of the
   low-risk patients (n  =  74), as indicated by the Kaplan–Meier curves
   ([362]Figure 21A). In addition, we assessed the predictive performance
   of the six-gene signature using time-dependent ROC analysis at specific
   follow-up times, namely one, three, five, and ten years after
   diagnosis. The AUC at the different cut-off times were 0.603, 0.688,
   0.716, and 0.801, respectively ([363]Figure 21B).

   In the first test set ([364]GSE87304), the survival data were available
   for 258 out of 305 patients. Thus, the low-risk patient group (n =
   129), as indicated by the predictive model, demonstrated statistically
   significant longer DFS (p-value < 0.012) in comparison with the
   high-risk patient group (n  =  129) ([365]Figure 22A). The
   time-dependent ROC curves were drawn and the three-, five-, and
   six-year AUC values were 0.699, 0.572, and 0.582, respectively
   ([366]Figure 22B). In the second test set ([367]GSE69795), the low-risk
   patient group (n = 19), as indicated by the predictive model,
   demonstrated statistically significant longer DFS (p-value < 0.038) in
   comparison with the high-risk patient group (n  =  19) ([368]Figure
   23A). The time-dependent ROC curves were drawn and the one-, three-,
   five-, and seven-year AUC values were 0.619, 0.673, 0.611, and 0.611,
   respectively ([369]Figure 23B).

Figure 22.

   [370]Figure 22
   [371]Open in a new tab

   Survival analysis using the [372]GSE87304 (n = 258) dataset (first test
   set). (A) Kaplan–Meier curves for disease-free survival of MIBC
   patients who received cisplatin-based chemotherapy as stratified by the
   six-gene predictive index. Patients were divided into low- and
   high-risk groups according to the median predictive index. (B)
   Time-dependent ROC curves (three-, five-, and six-year predictions) to
   assess the predictive accuracy of the six-gene predictive model.

Figure 23.

   [373]Figure 23
   [374]Open in a new tab

   Survival analysis using the [375]GSE69795 (n = 38) dataset (second test
   set). (A) Kaplan–Meier curves for disease-free survival of MIBC
   patients who received cisplatin-based chemotherapy as stratified by the
   six-gene predictive index. Patients were divided into low- and
   high-risk groups according to the median predictive index. (B)
   Time-dependent ROC curves (one-, three-, five-, and seven-year
   predictions) to assess the predictive accuracy of the six-gene
   predictive model.

   Then, we investigated whether the six-gene signature constituted of
   ANXA5, CD44, NCAM1, SPP1, CDCA8, and KIF14 could be prognostic for the
   OS or DFS of TCGA-BCa patients and tested this signature with
   Kaplan–Meier survival analysis using the GEPIA2 platform. The patients
   were separated into the low- and high-risk groups (n = 121 in each
   group) based on custom low- and high-cut-off values of 30% and 70%,
   respectively. The two groups showed statistically significant different
   OS time (p-value = 0.0011) and DFS time (p-value = 0.0014) ([376]Figure
   24). This indicated that the six-gene signature held also a prognostic
   value in OS and DFS of BCa patients.

Figure 24.

   [377]Figure 24
   [378]Open in a new tab

   Kaplan–Meier survival plots of the six-gene predictive signature,
   generated using the GEPIA2 platform. Red and blue lines indicate the
   high- and low-expression patient groups, respectively. Patients were
   grouped (n = 121 in each group) according to custom low and high
   cut-off values of 30% and 70%, respectively, for comparing their (A) OS
   and (B) DFS.

   In brief, the Kaplan–Meier survival analysis and the AUC values at the
   different cut-off times indicated that the six-gene signature model
   holds a quite good prognostic accuracy regarding the DFS time of MIBC
   patients who received preoperative cisplatin-based chemotherapy and,
   thus, it could be further assessed for whether it may predict the MIBC
   patients’ response to the preoperative chemotherapy treatment.

3.12. Expression Validation of Key Biomarkers and Immunohistochemistry

   The key hub genes which were revealed to be differentially expressed
   between the BCa and control samples in either the urine or the blood
   plasma of Bca patients (p-value < 0.05), and concurrently hold a
   prognostic or predictive value, were considered as the key biomarker
   genes of our integrative meta-analysis. These key biomarker genes
   include ANXA5, CDT1, COL3A1, SPP1, VEGFA, CDCA8, HJURP, TOP2A, and
   COL6A1.

   So that we would be able to confirm the altered mRNA expression levels
   of the proposed biomarker genes between Bca and normal groups, TCGA and
   GTEx datasets were analyzed using the GEPIA2 platform. The selected
   cut-off values were set as |log[2]FC| = 1 and p-value = 0.05. The
   corresponding boxplots were generated and downloaded from GEPIA2
   ([379]Figure 25). The plots demonstrated that the results of our
   differential expression analysis for all the genes were validated, in
   terms of the occurrence of down- or upregulation. However, for three
   genes, namely ANXA5, COL3A1, and VEGFA, the differences between the Bca
   and the control group means had a lower |log[2]FC| value than the
   selected one and, thus, they were not characterized as statistically
   significant. To allow us to explore the expression of these genes for
   the main Bca subtypes (i.e., non-papillary and papillary) in more
   detail, we further analyzed their expression and found that ANXA5 and
   COL3A1 were significantly differentially expressed in papillary
   subtype, using the aforementioned cut-off values ([380]Figure 26).

Figure 25.

   [381]Figure 25

   [382]Figure 25
   [383]Open in a new tab

   The gene expression level analysis of the nine key biomarkers in BCa
   patients, generated using the GEPIA2 platform. The red boxes represent
   the mRNA expression levels in BCa tissues and the gray boxes represent
   the expression levels in control bladder tissues from patients of the
   TCGA-BCa and GTEx cohorts. * indicates statistical significance
   applying p-value < 0.05 and |log[2]FC| < 1 as cut-off criteria (BLCA:
   bladder cancer, TPM: transcript count per million).

Figure 26.

   [384]Figure 26
   [385]Open in a new tab

   The gene expression level analysis of the ANXA5 and COL3A1 in BCa
   patients for non-papillary and papillary subtypes, generated using the
   GEPIA2 platform. The red boxes represent the mRNA expression levels in
   BCa subtype tissues and the gray boxes represent the expression levels
   in control bladder tissues from patients of the TCGA-BCa and GTEx
   cohorts. * indicates statistical significance applying p-value < 0.05
   and |log[2]FC| < 1 as cut-off criteria (TPM: transcript count per
   million).

   In an attempt to investigate the correlation of the key biomarker genes
   with the different pathological stages of BCa, we used the TCGA-BCa
   data and the corresponding feature from the GEPIA2 platform. The
   analysis showed that five out of the nine genes were strongly
   associated with the pathological BCa stages, highlighting their
   prognostic value for BCa. In particular, ANXA5, COL3A1, SPP1, VEGFA,
   and COL6A1 were identified to be highly correlated with BCa stages,
   while no significant correlation was found for the others ([386]Figure
   27).

Figure 27.

   [387]Figure 27

   [388]Figure 27
   [389]Open in a new tab

   Violin plots representing the association between the expression levels
   of the nine key biomarker genes and the three pathological tumor stages
   among BCa patients, generated using the GEPIA2 platform and based on
   the TCGA-BLCA cohort. F value and Pr(>F) indicate the statistical value
   and the p-value of the F-test, respectively.

   The Human Protein Atlas (HPA) was utilized to obtain the protein
   expression levels which are encoded by the key biomarker genes in the
   urinary bladder tissue for both pathologic and normal states. The
   immunohistochemistry (IHC) analysis based on the HPA images revealed
   that SPP1, CDCA8, and TOP2A showed high antibody staining intensity in
   BCa tissues and low staining intensity in normal tissues. Further,
   ANXA5, COL3A1, and HJURP had medium staining intensity in cancerous
   tissues, whereas low intensity was inspected in normal bladder. CDT1
   and VEGFA showed high staining intensity in both BCa and normal bladder
   tissues. Lastly, the antibody intensity for COL6A1 was higher in BCa
   tissue compared to the corresponding normal one, in which no staining
   was detected. The IHC analysis showcased that the expression levels of
   these proteins were generally upregulated in the protein expression
   level in BCa ([390]Figure 28).

Figure 28.

   [391]Figure 28

   [392]Figure 28

   [393]Figure 28
   [394]Open in a new tab

   Immunohistochemical (IHC) validation of the nine key biomarker genes in
   cancer and normal human bladder tissue specimens, obtained from the
   Human Protein Atlas (available online: [395]www.proteinatlas.org
   (accessed on 4 July 2022)). Staining demonstrated that protein
   expression of the nine key biomarker genes was higher in BCa tissues
   compared to normal bladder tissue samples.

3.13. Diagnostic Performance of Key Biomarkers

   To determine whether the identified key biomarker gene signature holds
   a diagnostic value, we used the genes as features and built various
   classification models utilizing all the datasets (with more than 10
   samples) used in this integrative meta-analysis ([396]Table 1), as well
   as the merged meta-dataset and an external set (ArrayExpress
   E-MTAB-1560).

   For the individual datasets, a fivefold cross validation method was
   implemented, whereas for the final merged meta-dataset, a 10-fold cross
   validation was conducted and they were all repeated 10 times. The
   resulting ROC curves for all the built models in addition to their
   corresponding AUC values and the 95% CIs were plotted ([397]Figure 29
   and [398]Figure 30). The results indicated a very high diagnostic
   performance of the various models, with the AUC values ranging from
   0.8863 to 1.00 for the individual datasets, and reaching 0.9307 and
   0.8909 for the merged meta-dataset and the external dataset,
   respectively. The classification model built from [399]GSE31189
   resulted in an AUC value of 0.6325, as this dataset suffered due to
   batch effects (as mentioned in [400]Section 3.8), and was not
   considered in our overall evaluation.

Figure 29.

   [401]Figure 29

   [402]Figure 29
   [403]Open in a new tab

   Validation of the diagnostic models, using the nine key biomarker genes
   as features, by ROC curve analysis on each of the datasets included in
   our integrative meta-analysis and contained at least 10 samples (see
   [404]Table 1).

Figure 30.

   [405]Figure 30
   [406]Open in a new tab

   Validation of the diagnostic models, using the nine key biomarker genes
   as features, by ROC curve analysis on the merged meta-dataset (n = 606)
   and on external validation E-MAT-1650 dataset (n = 30).

4. Discussion

   BCa is among the most common cancer types worldwide, accounting for
   high incidence, prevalence, mortality, as well as recurrence rate, and
   still remains an open clinical and social problem. Improved
   comprehension of its pathophysiology has evolved, but underlying
   molecular mechanisms and genetics need to be further elucidated. A
   major obstacle is the fact that its detection remains demanding due to
   the lack of specific and sensitive tumor markers, and the absence of
   new symptoms. This issue has become even more imperative during the
   COVID-19 era [[407]97]. There is thus an urgent need to develop more
   efficient diagnostic, prognostic, and predictive markers in order to
   better manage and treat the onset and course of BCa.

   In this study, we employed microarray data to investigate gene
   expression profiles in BCa. By combining and reanalyzing a high number
   of samples, we aimed to conclude more reliable results, statistical
   inferences, and gene expression signatures. In order to find a robust
   list of DEGs for BCa, we conducted a systematic review across multiple
   GEO studies using the PRISMA guidelines ([408]Figure 2), selected the
   eligible datasets, which were downloaded from the GEO and pre-processed
   according to their microarray platform, controlled the quality of all
   samples and removed outliers, and created a common gene symbol set for
   all datasets. Finally, we developed a merged microarray meta-dataset,
   comprising 410 BCa and 196 healthy urinary bladder tissue samples, from
   18 independent datasets, adopting an “early stage” integration approach
   [[409]23]. Our comprehensive analysis, which is among the largest of
   its kind to the best of our knowledge, identified 815 DEGs between BCa
   and normal tissues.

   The pathways significantly overrepresented in the DEGs list were
   investigated. The results from the GO analysis revealed biological
   processes related to the extracellular matrix, angiogenesis, muscle
   development, cell division, chromosome organization, and DNA
   replication, which are all fundamental processes for cancer development
   and progression ([410]Figure 4) [[411]98]. The KEGG pathway enrichment
   analysis exposed pathways enriched in PI3K-PKB/Akt signaling, microRNAs
   in cancer, cell cycle, focal adhesion, regulation of actin
   cytoskeleton, calcium signaling, proteoglycans in cancer, cellular
   senescence, vascular smooth muscle contraction, and bladder cancer,
   among others ([412]Figure 5). The Reactome pathway analysis showed
   pathways enriched in the extracellular matrix organization, cell cycle
   checkpoints, Rho GTPase effectors, control of insulin-like growth
   factor (IGF) transport, DNA replication, platelet degranulation, and
   collagen degradation, to name a few ([413]Figure 6). Finally, the
   enrichment analysis based on disease ontology exhibited urinary system
   cancer as the most significantly enriched term, followed by non-small
   cell lung carcinoma, as well as kidney, breast, musculoskeletal system,
   and renal and prostate cancers ([414]Figure 7), indicating the high
   association of the identified DEGs with BCa and disclosing the shared
   mechanisms and commonalities of different types of cancer [[415]99].

   In our study, we combined the results from the PPI network analysis and
   WGCNA methods in order to identify the key hub genes for the occurrence
   and development of BCa ([416]Table 4). The consensus WGCNA created a
   network established on the association among genes, as it is an
   unsupervised analysis, whereas the PPI network was created grounded on
   the know interactions among human proteins. The PPI analysis resulted
   in a densely connected protein network, which indicated high biological
   relevance ([417]Figure 8). In the WGCNA, despite the meta-analysis of
   various and heterogeneous datasets, we resulted in highly correlated
   consensus key modules and phenotypic characteristics ([418]Figure 10).
   It is noteworthy that the hub genes contained in the brown module were
   found to have a strong association with OS and DFS of BCa patients,
   whereas genes of the black module had relevance with the DFS of
   patients. The combination of the identified hub genes by these two
   methods resulted in 61 common genes characterized as key hub genes for
   our analysis.

   A crucial current issue is the capability to detect BCa easily and
   early using less invasive methods and, ideally, with markers showing
   high sensitivity and specificity [[419]100]. Molecular markers, such as
   circulating mRNAs, in urine and blood could offer promising sources to
   gain comprehension of BCa and its associated micro- and
   macro-environment. Therefore, we tested whether each of the key hub
   genes was differentially expressed in the urine or blood plasma of BCa
   patients. In urine specimens, 17 genes, namely AURKB, CCNB2, CDC45,
   CDCA8, CDT1, CENPU, COL3A1, GINS2, KIF20A, MCM4, PBK, PLK4, SDC1, SPP1,
   TOP2A, TTK, and UBE2C, showed statistically significant differential
   expression between BCa and healthy individuals. In previous studies,
   osteopontin (SPP1) was investigated in the urine of nephrolithiasis
   [[420]101], Alzheimer’s disease patients [[421]102], and in cancer
   patients presenting cisplatin-induced nephrotoxicity [[422]103], and
   was found to provide diagnostic value. Syndecan one (SDC1) was measured
   by a multiplex immunoassay along with nine other protein biomarkers for
   the diagnosis of BCa [[423]104] and for the detection of recurrent BCa
   [[424]105]. Ubiquitin-conjugating enzyme E2 C (UBE2C) was analyzed in
   urine samples as a potential diagnostic marker for BCa [[425]106].
   Additionally, urinary peptidome profiling, using a 22-marker panel
   including collagen type III alpha one chain (COL3A1), was investigated
   for clinical diagnostics of preeclampsia [[426]107]. Finally,
   minichromosome maintenance five (MCM5), a protein in the same family as
   MCM4, was measured in urine specimens using an immunofluorometric assay
   in order to diagnose genitourinary tract cancer [[427]108]. Apart from
   the above proteins, literature on the rest of the urinary biomarkers is
   extremely limited, if any. Hence, these urine targets should be
   investigated for their potential diagnostic value in BCa patients.

   As regards blood plasma specimens, 11 genes, namely ANXA5, ASPM, CD34,
   CDC20, CDT1, COL4A5, COL6A1, ECT2, HJURP, MCM2, and VEGFA, presented
   with statistically significant differential expression between BCa and
   healthy individuals. In previous findings, Annexin A5 (ANXA5) plasma
   levels were investigated as a potential biomarker for asthma diagnosis
   [[428]109], pregnant and non-pregnant subjects [[429]110], as well as
   for liver cirrhosis and hepatocellular carcinoma [[430]111]. Abnormal
   spindle protein homolog (ASPM) was detected in circulating tumor cells
   through single-cell genomic characterization in cancer patients
   [[431]112]. CD34 serves as an essential marker in disease research, as
   it is routinely used for identifying and isolating human hematopoietic
   stem/progenitor cells applied in bone marrow transplantation. Due to
   its high sensitivity regarding endothelial cell differentiation, it has
   also been studied as a marker for cancer [[432]113]. What is more,
   cell-free mRNAs of Holliday junction recognition protein (HJURP) were
   found to be expressed at significant levels in plasma from patients
   with lung cancer [[433]114]. Vascular endothelial growth factor A
   (VEGFA) protein plays a significant role in the growth of blood vessels
   and, as such, in diseases that involve them. These diseases include
   heart disease [[434]115], COVID-19 [[435]116], and various types of
   cancer [[436]117], like ovarian [[437]118] and breast cancer
   [[438]119]. Conclusively, it appears that the value of most of these
   potential blood molecular biomarkers remains unclear for BCa and their
   further examination may offer opportunities to improve understanding of
   BCa and assist its early identification, patient stratification, and
   enhanced outcome predictions.

   In order to assess the prognostic value of the key hub genes, we
   conducted univariate Cox, LASSO, and multivariate Cox regression
   analyses. Based on the genes and coefficients resulting from the
   multivariate Cox regression analysis, we built a three-gene prognostic
   model for BCa patients, constituting COL3A1, FOXM1, and PLK4.
   Expression changes of COL3A1 were found to be prognostic markers in BCa
   [[439]120] and to be involved in the development of MIBC [[440]121].
   This gene was determined as a potential key biomarker gene for BCa in
   our study and is further analyzed below. Forkhead box protein M1
   (FOXM1) was reported to participate in an axis that regulates the cell
   cycle process and promotes progression of BCa [[441]122] and to be a
   strong prognostic marker for disease progression in NMIBC
   [[442]123,[443]124]. It was also found to play a role in BCa recurrence
   and drug resistance to cancer therapies [[444]125]. Polo-like kinase
   four (PLK4) was characterized as an important regulator of BCa cell
   proliferation, and, therefore, as a potential novel molecular target
   for BCa treatment [[445]126].

   Our prognostic model achieved AUC values of time-dependent ROC curves
   for the 1/3/5 years of 0.796/0.779/0.846, 0.819/0.86/0.871, and
   0.82/0.806/0.774 for the training set ([446]GSE13507), the first test
   set ([447]GSE32894) and the second test set ([448]GSE32548),
   respectively. In the same context, L. Yang et al. proposed a nine-gene
   prognostic model to enhance the prognosis prediction of BCa, achieving
   an AUC value of 0.76 at five years on the training set and a value of
   0.63 for the same time on the test set [[449]127]. In another study, Z.
   Xie et al. suggested a 10-inflammatory response-associated gene
   prognostic model which reached AUC values of 0.71 and 0.67 at year one
   in the TCGA-BCa and [450]GSE13507 cohorts, respectively [[451]128].
   Furthermore, J. Lin et al. constructed an 11-gene prognostic model for
   predicting overall survival in BCa patients, reaching 1/3/5-year AUC
   values of 0.686/0.665/0.666, 0.800/0.742/0.697, 0.826/0.792/0.763, and
   0.781/0.831/0.839 for the TCGA-BCa, [452]GSE13507, [453]GSE32548, and
   [454]GSE32894 cohorts, respectively [[455]129]. F. Tang et al.
   developed a seven-gene signature in order to predict the BCa patient
   prognosis, succeeding AUC values of 0.711/0.714/0.711 and
   0.608/0.680/0.638 for the years one, three, and five in the training
   and test sets, respectively [[456]130]. Additionally, C. Zhou suggested
   an 11-autophagy-related gene signature to predict the prognosis of BCa
   patients, showing a predictive efficiency for 1/3/5-year of
   0.702/0.744/0.794 and 0.695/0.640/0.658 in the training and validation
   cohorts, respectively [[457]131]. Moreover, F. Xu developed a six-gene
   prognostic signature for BCa, showing AUC values for cancer-specific
   survival of 3/5 years of 0.96/0.967, 0.744/0.748, and 0.576/0.606 for
   the training set ([458]GSE32894) and the test sets ([459]GSE13507 and
   TCGA-BCa), respectively [[460]132]. Finally, F. Chen et al. constructed
   an eight-gene prognostic prediction model for BCa, which achieved
   maximum AUC values of 0.795 and 0.669 for the TCGA-BCa training and
   test sets, respectively [[461]133]. Remarkably, this mini review of
   recent studies underlines the superior performance of our simple
   three-gene prognostic model and emphasizes its validity.

   For the purpose of evaluating the predictive value of the key hub genes
   in terms of therapy response, we performed univariate Cox, LASSO, and
   multivariate Cox regression analyses on disease-free survival data of
   MIBC patients who received cisplatin-based chemotherapy treatment.
   Based on the genes and coefficients resulting from multivariate Cox
   regression analysis, we built a six-gene predictive model for MIBC BCa
   patients, constituting ANXA5, CD44, NCAM1, SPP1, CDCA8, and KIF14.
   ANXA5, SSP1, and CDCA8 were characterized as potential key biomarker
   genes by our analysis, and their function, as well as connection with
   Bca, are described below. The cluster of differentiation 44 (CD44)
   antigen expression levels were reported to be associated with
   progression, metastasis, and disease failure of BCa [[462]134].
   Additionally, CD44 expression was associated with BCa tumor
   aggressiveness [[463]135], and it was related to the prediction of the
   radiation response of BCa cells [[464]136]. Its expression was also
   suggested to be useful for prognostication and treatment options in
   primary BCa [[465]137]. Neural cell adhesion molecule one (NCAM1) has
   not been extensively investigated in BCa, but there are studies that
   link its expression with drug resistance in acute myeloid leukemia
   [[466]138], pleuropulmonary blastoma [[467]139], and cervical
   intraepithelial neoplasia [[468]140]. Kinesin family member 14 (KIF14)
   expression levels were linked to chemosensitivity of hepatocellular
   carcinoma [[469]141] and cervical cancer [[470]142], as well as to
   prognosis of various cancers, such as breast [[471]143] and pancreatic
   cancer [[472]144].

   Our predictive model achieved AUC values of time-dependent ROC curves
   for the first and third years of 0.603/0.688, 0.699/0.572, and
   0.619/0.673 for the training set ([473]GSE169455), the first test set
   ([474]GSE87304), and the second test set ([475]GSE69795), respectively.
   The literature on gene signature models for predicting MIBC patients’
   response to preoperative therapy is limited. In a similar study, W.
   Jiang et al. developed an immune-relevant nine-gene signature that
   could predict the immunotherapeutic response of immune checkpoint
   inhibitors, achieving a maximum AUC value of 0.69 and 0.64, in TCGA-BCa
   and IMvigor210 cohorts, respectively [[476]145]. C. Shen et al.
   constructed an immune-associated two-gene signature to predict MIBC
   patients’ response to immunotherapy, succeeding with an AUC value of
   0.695 in terms of its predictive ability [[477]146]. S. J. Choi et al.
   developed a radiomic-based model for predicting the response of MIBC
   patients to neoadjuvant chemotherapy (NAC), achieving an AUC value of
   0.75 for the validation set [[478]147]. In a similar line, A. Parmar et
   al. used a predictive radiomic signature for MIBC patients’ response to
   NAC, reaching an AUC value of 0.63 in terms of discriminating the
   patients into responders and non-responders [[479]148]. These findings
   indicate that our model’s predictive performance is satisfactory. To
   date, efforts to predict tumor response to NAC are still ongoing and
   mRNA-based gene expression profiling markers that can accurately
   predict response have yet to be introduced [[480]149].

   Taking into account the results of our integrative meta-analysis
   regarding the key hub genes that were identified to be differentially
   expressed in urine or blood plasma of BCa patients and concurrently
   hold a prognostic or predictive value, we concluded with some potential
   key biomarker genes regarding BCa. These genes include ANXA5, CDT1,
   COL3A1, SPP1, VEGFA, CDCA8, HJURP, TOP2A, and COL6A1.

   Annexin A5 (ANXA5) is a protein kinase C inhibitor and one of the
   twelve annexins that have been identified in humans (ANXA1-11, 13). It
   constitutes an anticoagulant protein that indirectly inhibits the
   thromboplastin-specific complex that participates in the coagulation
   cascade. In general, annexins are involved in the homeostatic
   regulation of intracellular calcium ion concentration and play a
   significant role in the cell life cycle, cell signaling, inflammation,
   growth, differentiation, exocytosis, and apoptosis. The annexins are
   normally found inside human cells. However, some annexins (ANXA1,
   ANXA2, and ANXA5) can be secreted from the cytoplasm to outside
   cellular environments, such as blood. In our study, the expression of
   ANXA5 was significantly overexpressed in the blood plasma of BCa
   patients, whereas it was significantly under-expressed in the urinary
   bladder tissue of BCa patients. This is owed to the fact the merged
   meta-dataset with bladder tissues included more NMIBC than MIBC
   samples, and ANXA5 was shown to be downregulated at the early stages,
   but to be upregulated at the higher stages [[481]150]. Therefore, this
   protein has been suggested to be a marker of the low- to high-grade
   stage transition of tumors in BCa. ANXA5, along with the Annexin family
   members, was found to be aberrantly expressed and highly connected with
   BCa prognosis [[482]151]. More specifically, high expression of ANXA5
   was found to be correlated with poor disease-free and progression-free
   survival times, indicating that it may be involved in the recurrence
   and progression of BCa. In another study, the unfavorable prognostic
   value of ANXA5 was verified and its high expression was linked with the
   basal-subtype MIBC [[483]152]. ANXA5 was also found to be
   differentially expressed in a variety of other cancers, such as breast
   cancer [[484]153], hepatocellular carcinoma [[485]154], and lung
   squamous cell carcinoma [[486]155]. ANXA5 is considered a predictive
   biomarker for tumor development, progression, invasion, and metastasis,
   and is suggested to be of diagnostic, prognostic, and therapeutic
   importance in cancer [[487]156].

   Chromatin licensing and DNA replication factor one (CDT1) constitutes a
   licensing factor that operates so as to restrict DNA from replicating
   over once per cell cycle. Particularly, CDT1 protein is implicated in
   the formation of the pre-replication complex which is required for DNA
   replication. CDT1 is inhibited by geminin, preventing the assembly of
   the pre-replication complex and the replication initiation at
   inappropriate origins. CDT1 protein is phosphorylated by cyclin
   A-dependent kinases resulting in its degradation. Hence, CDT1 is highly
   associated with the cell cycle, cell division, DNA replication, and
   mitosis. In our study, CDT1 was overexpressed in bladder tissue and
   urine of BCa patients, which was also confirmed by X. C. Mo et al.
   [[488]157], whereas it was significantly under-expressed in the blood
   plasma of BCa patients. CDT1 has been considered to contribute to cell
   proliferation and genome instability [[489]158] and to be often
   misregulated in cancer [[490]159]. Furthermore, when expressed at a
   high level, it was linked with poor survival and prognosis in breast
   cancer [[491]160], hepatocellular carcinoma [[492]161], colon
   [[493]162], and prostate cancer [[494]163]. Overexpression of CDT1 is
   connected with irregular cell replication, activation of DNA damage
   checkpoints, and predisposition to malignant transformation in various
   human cancers. The aberrant expression of CDT1 in BCa and its
   concomitant diagnostic and prognostic relevance remains to be furtherly
   elucidated.

   Collagen type III alpha one chain (COL3A1) encodes the pro-alpha one
   chains of type III collagen, a fibrillary collagen protein that occurs
   in most soft connective tissues, such as arteries, skin, and soft
   organs, frequently along with type I collagen. It is an essential
   extracellular matrix-related gene, as its monomers cross-assemble into
   thicker fibrils, which aggregate to form fibers, providing a strong
   support structure for tissues requiring tensile strength and playing an
   essential role in their extensibility [[495]164]. COL3A1 levels were
   reported to be remarkably upregulated in high-grade and MIBC compared
   to low-grade and NMIBC cases, and this high expression was linked with
   shorter disease-free survival [[496]120] as well as with worse overall
   survival [[497]165]. COL3A1 was also found to be among the hub genes
   associated with the progression of NMIBC to MIBC [[498]121]. Notably,
   the expression levels of COL3A1 were reported to be lower among
   patients with NMIBC [[499]166] and higher among patients with invasive
   disease, contributing to tumor progression and metastasis
   [[500]167,[501]168]. This is consistent with our meta-analysis results
   in which COL3A1 was found to be downregulated in bladder tissue, as the
   number of NMIBC samples was higher compared to MIBC. In contrast to
   this, COL3A1 was found to be under-expressed in urine samples as well,
   despite the fact that the majority of these patients had high-grade
   urothelial carcinomas. Future studies are needed in order to shed light
   on COL3A1′s role in the development of BCa and in the progression from
   NMIBC to MIBC.

   Collagen type VI alpha one chain (COL6A1) encodes the pro-alpha one
   chain of type VI collagen and belongs to the superfamily of collagen
   proteins, as COL3A1. Collagens are extracellular matrix proteins and
   play an important role in sustaining the integrity of various tissues.
   Namely, collagen VI acts as a cell-binding protein and is involved in
   the cell adhesion process. COL6A1′s elevated expressions were
   significantly correlated with worse overall survival in BCa patients
   [[502]165]. In a recent bioinformatics analysis [[503]169], COL6A1 was
   found to be a risk indicator for high progression of BCa and negatively
   associated with the patient’s prognosis. In the same study, it was also
   stated that it may be used as an individual effective diagnostic and
   prognostic biomarker for BCa, along with five other collagen family
   members. In our study, COL6A1 levels were found to be significantly
   downregulated in the blood plasma and urinary bladder tissue among BCa
   patients, substantiating previous findings in the literature
   [[504]170]. In the latter study, it was suggested that COL6A1 and
   COL6A2 may act as standard collagens by constructing a physical barrier
   to inhibit BCa tumor growth and invasion. According to a study that
   applied comparative urine proteomics profiling from prostate cancer
   patients, COL6A1 protein had a highly confirmed involvement in prostate
   cancer as well [[505]171]. Urine and blood levels of collagens may hold
   a potential diagnostic and prognostic value for BCa and should be
   properly investigated, especially in the context of the extracellular
   matrix–tumor interaction. Collagen is an essential constituent of the
   tumor microenvironment, as it participates in cancer fibrosis. Thus,
   the comprehension of its structural properties and pathophysiological
   functions in human cancers may lead to the development of novel
   anticancer therapies [[506]172].

   Secreted phosphoprotein one (SPP1), commonly known as osteopontin
   (OPN), is a major non-collagenous extracellular matrix structural
   protein and an organic component of bone. The SPP1 protein participates
   in the osteoclast attachment to the mineralized extracellular bone
   matrix. Apart from SPP1′s beneficial roles in wound healing and bone
   homeostasis, SPP1 is considered to be involved in several
   pathophysiological processes including cancer progression and
   metastasis, acting as a cardinal mediator of tumor-associated
   inflammation [[507]173], as well as immunomodulation [[508]174]. It was
   found to be significantly upregulated among bladder tumor samples in
   various previous studies [[509]175,[510]176] and to indicate poor
   prognosis in relation to advanced disease stage [[511]177]. SPP1 also
   proved to be markedly overexpressed in the bladder tissue and serum of
   transitional cell carcinoma patients [[512]178] and in MIBC patients
   compared to healthy individuals [[513]81], which is consistent with our
   results. It has also been suggested that SPP1 may be an effective
   therapeutic or diagnostic target in certain cancers, such as melanoma,
   breast, colorectal, head and neck, and lung cancer, as it appeared to
   correlate with poor clinical outcomes and promote tumor progression by
   interacting with carcinogenic genes and facilitating immune cell
   infiltration [[514]179,[515]180]. Significantly, expression of SPP1
   showed a subtype-dependent effect on chemotherapy response [[516]75],
   which was also confirmed in our analysis. More specifically, we found
   that patients who had higher SPP1 expression levels showed a lower
   response to cisplatin-based chemotherapy, which is also supported by
   previous evidence from the literature and research on other types of
   cancers, such as lung [[517]181] and ovarian cancer [[518]182]. Results
   from another study found that SPP1 was upregulated in upper tract
   urothelial carcinoma cells and tissues, and high plasma SPP1 expression
   levels were strongly connected with higher stage and grade [[519]183].
   Considering all the above, it is suggested that circulating SPP1 levels
   may be a potential biomarker for identifying BCa patients and
   predicting invasive disease and therapy response. Further research is
   required to explore its exact molecular mechanisms in BCa and to assess
   its value as a biomarker.

   Vascular endothelial growth factor A (VEGFA) is a member of the
   vascular endothelial growth factor (VEGF) and placental growth factor
   (PGF) family, which both play essential and complementary roles in
   angiogenesis. This gene encodes a heparin-binding protein, which
   constitutes a disulfide-linked glycosylated homodimer. It provokes
   proliferation and migration of vascular endothelial cells, comprising a
   key regulator of both physiological and pathological angiogenesis.
   VEGFA has long been recognized as a potential vascular and
   proliferative therapeutic target in cancer patients and it has revealed
   innovative therapeutic approaches in oncology [[520]184]. Particularly,
   VEGFA is overexpressed in many known tumors, including BCa, and its
   expression has been associated with tumor stage and progression as well
   as the patient’s prognosis [[521]185]. VEGFA’s levels were reported to
   be highly expressed in BCa [[522]186], and it was also recognized as a
   key candidate gene in BCa and as a gene related to the prognosis of
   patients with BCa [[523]187], which was also confirmed by our results.
   Although the prognostic role of VEGFA in BCa remains controversial,
   most studies converge on the fact that patients with higher tissue or
   urine VEGFA levels showed worse outcomes in both overall and
   disease-free survival [[524]177,[525]188,[526]189,[527]190], which does
   not appear to be corroborated by the results from TCGA-BCa cohort. In a
   study conducted by Z. Zhong and M. Huang et al., it was suggested that
   MYLK, which was described as a key hub gene for BCa by our study, might
   function as a competing endogenous RNA promoting BCa progression
   through modulating VEGFA/VEGFR2 signaling pathway [[528]191]. VEGFA was
   previously proposed, along with other ELISA-detected markers such as
   IL8 and MMP9, as a urinary biomarker that can accurately detect primary
   or recurrent BCa [[529]105,[530]192,[531]193]. There is evidence to
   suggest that plasma levels of VEGFA hold value as a potential
   diagnostic and prognostic biomarker for BCa patients.

   Cell division cycle associated 8 (CDCA8) encodes Borealin/Dasra B,
   which is a crucial protein component of the chromosomal passenger
   complex (CPC), an important dynamic structure that functions as a key
   regulator during mitosis. In particular, CDCA8 is necessary for the
   kinetochore attachment-error correction and for the stability of the
   bipolar spindle in human mitosis and, in disease states, it can
   contribute to distant metastasis of cancer cells. CDCA8 was reported as
   a potential prognostic biomarker for a variety of cancers, including
   breast [[532]194], liver [[533]195], and prostate cancer [[534]196]. In
   a recent study by X. Gao et al., results found that CDCA8 was
   upregulated in BCa in contrast to normal tissues, and its high
   expression was highly associated with the unfavorable prognosis of
   patients [[535]197], findings which were also highlighted by other
   authors [[536]198]. In the same study, it was shown that through the
   CDCA8 expression inhibition, the proliferation, migration, and invasion
   of BCa cell lines were also inhibited and the apoptosis of cells was
   induced. What is more, S. Pan et al. found that CDCA8, along with
   KIF11, NCAPG, and NEK2, played an essential role in the maintenance of
   BCa stem cells [[537]199]. K. Chen et al. reported that CDCA8, together
   with CENPF, AURKB, CCNB2, CDC20, TTK, and ASPM, were considered hub
   genes for BCa and verified their prognostic value [[538]200]. In
   addition, it was indicated that CDCA8 in conjunction with MKI67, CENPA,
   AURKB, FOXM1, and DLGAP5, were among the top hub genes with regard to
   BCa [[539]201]. In a current bioinformatics study [[540]202], CDCA8 and
   CDC20 were identified as candidate diagnostic biomarkers for BCa. Its
   essential role in BCa was also supported by S. Li et al., who suggested
   that lower expression of CDCA8, TOP2A, CENPF, and FOXM1, were
   associated with favorable overall survival of BCa patients [[541]203].
   CDCA8 was also proposed by J. Shi et al. to be a candidate gene in
   NMIBC [[542]204]. The aforementioned genes were all indicated as key
   hub genes for BCa by our study and the previous findings in the
   literature confirm our results. In our analysis, high CDCA8 expression
   levels were found to significantly deteriorate the overall survival of
   BCa patients ([543]Table 5), but to be associated with better overall
   survival of the MIBC patients receiving cisplatin-based chemotherapy.
   Despite the fact that there was some inconsistency, chemotherapy
   response is connected with a multitude of parameters. The reason for
   this inconsistency could be owed to the presence of more
   basal/squamous-like subtypes in the [544]GSE169455 dataset, which have
   been suggested to suppress chemotherapy efficacy [[545]205], or to the
   CDCA8′s association with the immune cell infiltration, which could be a
   predictive biomarker for chemotherapy responsiveness
   [[546]206,[547]207]. Hence, there is growing evidence that CDCA8 may
   constitute an effective therapeutic target for prognosis and treatment
   of BCa, but its exact biological function remains still obscure and
   needs to be clarified.

   Holliday junction recognition protein (HJURP) is a centromeric histone
   chaperone required for the histone H3-like variant centromere protein A
   (CENPA) recruitment and its deposition at centromeres during the early
   G1 phase. More explicitly, HJURP is a cell cycle regulated factor
   responsible for the maintenance and deposition of CENPA at centromeres
   [[548]208]. Apart from CENPA-containing chromatin assembly, it is
   involved in the regulation of DNA binding activity, chromosomal
   segregation, cell mitosis, and regulation of protein-containing complex
   assembly. Our analysis confirmed previous bioinformatics studies in
   which HJURP was suggested to be a key hub gene for BCa
   [[549]157,[550]209]. R. Cao et al. found that HJURP is highly
   overexpressed in BCa tissues at both mRNA and protein levels and
   suggested that HJURP might regulate cell proliferation and apoptosis in
   BCa by acting on the PPARγ-SIRT1 negative feedback loop [[551]210].
   Other studies reported that HJURP levels were significantly higher in
   cancerous than those in normal tissues in pancreatic [[552]211], lung
   [[553]212], breast [[554]213], prostate [[555]214], and renal cell
   cancer [[556]215], and its high expression was liked with poor
   survival. Its oncogenic role was also investigated in a recent
   pan-cancer analysis by R. Su et al., validating the aforementioned
   findings [[557]216]. In our study, HJURP was found overexpressed in BCa
   tissues and its expression levels were reported lower in the blood
   plasma of BCa patients. The role of HJURP in tumor development, and
   especially in BCa, is still unclear and it remains to be elucidated, as
   it appears that HJURP could serve as a novel diagnostic and prognostic
   biomarker for the management of BCa.

   DNA topoisomerase IIα (TOP2A) encodes a key nuclear enzyme that
   regulates the state of DNA during transcription, generates DNA
   single-strand breaks, and induces gene transcription during cell
   division. Accordingly, TOP2A is involved in chromosome formation,
   enrichment, and separation, in DNA replication and transcription, and
   it is suggested to be involved in the development of several cancer
   types. TOP2A was reported to be one of the top 10 hub genes identified
   in BCa, along with VEGFA, CCNB1, CDC20, AURKB, UBE2C, and CCNB2
   [[558]187]. Furthermore, S. Zeng et al. found that TOP2A was highly
   overexpressed in BCa, especially in high-grade and advanced-stage
   tumors, and its overexpression was highly connected with worse
   cancer-specific, progression-free, and recurrence-free survival
   [[559]217]. In the latter study, it was proved that the proliferation
   of BCa cells was especially inhibited by the knockdown of TOP2A, and
   their migration and invasion capacity was strongly suppressed. In the
   same line, F. Zhang and H. Wu found that by inhibiting TOP2A the BCa
   tumorigenesis is repressed [[560]218]. In our study, TOP2A was found to
   be significantly overexpressed in the urine of the BCa patients
   compared to controls. These results correlate fairly well with findings
   by W. T. Kim et al. [[561]219] and further support the concept of
   urinary cell-free nucleic acids that may be complementary diagnostic
   biomarkers for BCa. TOP2A was previously confirmed to be more abundant
   in the urine of patients with BCa than in the urine of controls, using
   the Western blotting technique [[562]220]. G. Botti et al. reviewed the
   effective utility of ProEx C, an immunohistochemical reagent
   incorporating TOP2A and MCM2 antibodies, as an assistant tool in
   evaluating the urothelial lesions in urine cytology and stated that it
   could accurately differentiate high-grade lesions from benign and
   reactive conditions [[563]221]. It is remarkable that MCM2 was found to
   be significantly downregulated in the blood plasma of BCa patients in
   our study ([564]Figure 13). Notably, the TOP2A/MCM2 combination was
   reported to be the best biomarker for discriminating between low- and
   high-grade squamous intraepithelial lesions for cervical cancer
   [[565]222]. Additionally, TOP2A expression levels were found to be
   significantly different between patients who completely responded to
   therapy and those who do not in our study. Previous studies have
   reported that TOP2A constitutes a marker for predicting prognosis and
   response to various cancer therapies, for instance, breast cancer
   [[566]223], soft tissue sarcomas [[567]224], as well as clear cell
   renal cell carcinoma [[568]225]. There are strong indications that
   TOP2A plays a functional role in the BCa proliferation and invasion, as
   well as in patients’ response to the disease, which remain to be
   proved.

   The nine identified potential key biomarker genes were employed as
   features in classification models, aiming to distinguish the cancerous
   and normal samples. These models showed a very high prediction accuracy
   for the vast majority of the utilized datasets, indicating that these
   genes may be used as potential diagnostic biomarkers in BCa. The
   protein expression of these genes in cancerous and normal urinary
   bladder tissues was confirmed by immunohistochemistry from the HPA.
   Overall, we highlighted the findings and main points of our study in
   [569]Table 7. The results underscore the need for validation of these
   promising BCa biomarkers in independent pre-clinical settings.

Table 7.

   The highlights of this study at a glance.
                         Highlights of This Study
     * The analysis of the merged microarray meta-dataset, comprising of
       410 BCa and 196 healthy urinary bladder tissue samples from 18
       independent datasets, revealed 815 robust differentially expressed
       genes (DEGs).
     * A total of 61 key hub genes resulted from DEG-based protein–protein
       interaction (PPI) and weighted gene co-expression (WGCNA) network
       analyses.
     * A subset of key hub genes, namely AURKB, CCNB2, CDC45, CDCA8, CDT1,
       CENPU, COL3A1, GINS2, KIF20A, MCM4, PBK, PLK4, SDC1, SPP1, TOP2A,
       TTK, and UBE2C, were found to be differentially expressed in the
       urine of BCa patients.
     * A subset of key hub genes, namely ANXA5, ASPM, CD34, CDC20, CDT1,
       COL4A5, COL6A1, ECT2, HJURP, MCM2, and VEGFA, were found to be
       differentially expressed in the blood plasma of BCa patients.
     * Bioinformatics tools and machine learning techniques were utilized
       to reveal and assess the diagnostic, prognostic, and predictive
       value of the identified key hub genes.
     * A three-gene signature prognostic model for BCa patients, including
       COL3A1, FOXM1, and PLK4, was built and demonstrated high
       performance.
     * A six-gene signature predictive model regarding MIBC patients’
       response to neoadjuvant chemotherapy, including ANXA5, CD44, NCAM1,
       SPP1, CDCA8, and KIF14, was developed and showed satisfactory
       performance.
     * Overall, nine genes, namely ANXA5, CDT1, COL3A1, SPP1, VEGFA,
       CDCA8, HJURP, TOP2A, and COL6A1, were identified as potential
       prognostic and therapeutic target biomarkers for BCa, they were
       immunohistochemically validated using Human Protein Atlas (HPA),
       and were bibliographically analyzed.

   [570]Open in a new tab

   It is plausible that there were a number of limitations that could have
   influenced the results of the present study and should be declared. The
   first is the fact that the 606 bladder tissue samples that were
   meta-analyzed did not originate from the corresponding number of
   patients. In many studies, case and control samples were collected from
   the same patient, from cancerous and adjacent healthy tissues (matched
   pairs), which could introduce some intra-subject correlations.
   Additionally, there were studies that performed sample pooling prior to
   hybridization. Secondly, the inevitable technical sources of variation
   confounded our analysis either when they were corrected or not. We
   tried to handle them properly and justify the followed methodology in
   each case. Another considerable limitation was the fact that there were
   no in vivo experiments conducted in order to validate the potential
   functions and mechanisms of the identified genes in the development and
   progression of BCa. Moreover, it should be underlined that BCa
   constitutes a highly heterogeneous malignancy, which is composed of
   various molecular subtypes. The identification of DEGs, as well as
   genes with prognostic and predictive value, may considerably vary
   depending on the specific subtype under investigation. However, in this
   study, we aimed to identify the main hub genes that aggregately
   differentiate BCa cells from the normal ones. Last but not least, this
   study integrated mRNA gene expression data derived solely from
   microarray experiments and validated some results using bioinformatics
   tools that incorporate RNA-Seq or IHC data. Further studies including a
   wider range of data types, such as non-coding RNA, DNA methylation, and
   RNA-Seq data, are planned to be performed.

5. Conclusions

   In conclusion, this study aspired to contribute to the elucidation of
   the genetic changes occurring in BCa, using systematic bioinformatic
   tools and methods. In particular, we successfully integrated gene
   expression data from multiple datasets and we identified a list of hub
   genes that appear to play an essential role in the development and
   progression of BCa. A subset of these genes, namely ANXA5, CDT1,
   COL3A1, SPP1, VEGFA, CDCA8, HJURP, TOP2A, and COL6A1, was associated
   with altered gene expression in urine or blood plasma of patients and
   were highlighted as potential diagnostic markers for BCa. Moreover, the
   study revealed a three-gene signatures (COL3A1, FOXM1, and PLK4) that
   achieved high prognostic performance in relation to the overall
   survival of BCa patients, and a six-gene signature (ANXA5, CD44, NCAM1,
   SPP1, CDCA8, and KIF14) that showed satisfactory predictive performance
   in terms of disease-free survival of MIBC patients receiving
   cisplatin-based NAC. Further research is needed to validate the
   clinical value of these biomarkers and their potential in BCa
   treatment.

Supplementary Materials

   The following supporting information can be downloaded at:
   [571]https://www.mdpi.com/article/10.3390/cancers14143358/s1, Figure
   S1: PCA on each of the 18 datasets, using Euclidean distance to measure
   dissimilarities over gene expression values between samples. Red points
   denote BCa samples and blue points denote control samples; Figure S2:
   PCA on datasets GSE13507A and GSE13507B. Red points denote BCa samples
   and blue points denote control samples; Figure S3: Volcano plot of DEGs
   between BCa and control samples in the merged meta-dataset. The DEGs
   were identified based on the criteria |log[2](FC)| ≥ 1.3 and adj.
   p-value < 0.01, as represented with gray lines. The blue and red points
   denote downregulated genes and upregulated genes, respectively. The
   black points denote genes showing no statistically significant
   difference in expression between the two phenotypes; Figure S4: Summary
   of structural network indices in relation to the soft thresholding
   power for each of the eight selected datasets; Figure S5: Hierarchical
   clustering dendrogram of genes for determining consensus modules based
   on consensus topological overlap. Genes in a common module are assigned
   the same color, as presented in the color band below the dendrogram.
   Genes not assigned to any of the modules are colored gray; Figure S6:
   Heatmaps relationships of consensus module eigengenes and phenotypic
   traits across the eight datasets. Each row corresponds to a consensus
   module eigengene and, each column corresponds to a phenotypic
   characteristic. Each cell contains the corresponding correlation
   (ranging from blue to red) and p-value; Figure S7: Scatter plots of
   gene significance (GS) for “tumor” and module membership (MM) for the
   key modules. The lines indicate the upper and the lower quartile;
   Figure S8: Expression correlation analysis of the three genes in the
   prognostic model, obtained from the GEPIA2 platform. The Pearson
   correlation coefficient was used, and indicated that there is no
   statistically significant correlation coefficient among these genes
   (maximum value 0.6); Figure S9: Expression correlation analysis of the
   six genes in the predictive model, obtained from the GEPIA2 platform.
   The Pearson correlation coefficient was used and indicated that there
   is no statistically significant correlation coefficient among these
   genes (maximum value 0.44).
   [572]Click here for additional data file.^ (1.3MB, zip)

Author Contributions

   Conceptualization, M.S., G.I.L. and V.Z.; methodology, M.S. and G.I.L.;
   software, M.S.; validation, M.S.; formal analysis, M.S.; investigation,
   M.S.; resources, M.S. and G.I.L.; data curation, M.S.; writing—original
   draft preparation, M.S.; writing—review and editing, M.S., G.I.L., V.Z.
   and D.K.; visualization, M.S. and G.I.L.; supervision, G.I.L. and D.K.;
   project administration, M.S., G.I.L. and D.K. All authors have read and
   agreed to the published version of the manuscript.

Institutional Review Board Statement

   Not applicable.

Informed Consent Statement

   Not applicable.

Data Availability Statement

   All data are available from the GEO repository. The present analysis is
   available after a reasonable request to the first author.

Conflicts of Interest

   The authors declare no conflict of interest.

Funding Statement

   This research received no external funding.

Footnotes

   Publisher’s Note: MDPI stays neutral with regard to jurisdictional
   claims in published maps and institutional affiliations.

References