Abstract The effective detection and management of muscle-invasive bladder Transition Cell Carcinoma (TCC) continues to be an urgent clinical challenge. While some differences of gene expression and function in papillary (Ta), superficial (T1) and muscle-invasive (≥T2) bladder cancers have been investigated, the understanding of mechanisms involved in the progression of bladder tumors remains incomplete. Statistical methods of pathway-enrichment, cluster analysis and text-mining can extract and help interpret functional information about gene expression patterns in large sets of genomic data. The public availability of patient-derived expression microarray data allows open access and analysis of large amounts of clinical data. Using these resources, we investigated gene expression differences associated with tumor progression and muscle-invasive TCC. Gene expression was calculated relative to Ta tumors to assess progression-associated differences, revealing a network of genes related to Ras/MAPK and PI3K signaling pathways with increased expression. Further, we identified genes within this network that are similarly expressed in superficial Ta and T1 stages but altered in muscle-invasive T2 tumors, finding 7 genes (COL3A1, COL5A1, COL11A1, FN1, ErbB3, MAPK10 and CDC25C) whose expression patterns in muscle-invasive tumors are consistent in 5 to 7 independent outside microarray studies. Further, we found increased expression of the fibrillar collagen proteins COL3A1 and COL5A1 in muscle-invasive tumor samples and metastatic T24 cells. Our results suggest that increased expression of genes involved in mitogenic signaling may support the progression of muscle-invasive bladder tumors that generally lack activating mutations in these pathways, while expression changes of fibrillar collagens, fibronectin and specific signaling proteins are associated with muscle-invasive disease. These results identify potential biomarkers and targets for TCC treatments, and provide an integrated systems-level perspective of TCC pathobiology to inform future studies. Introduction Bladder cancer is a disease receiving growing attention within the cancer biology community. Transition Cell Carcinoma (TCC) occurs as papillary tumors (Ta stage), superficial tumors (T1 stage), and muscle-invasive tumors of increasing severity (T2, T3 and T4 stage). Approximately 20% of primary bladder cancers are muscle-invasive at presentation and are associated with a poor prognosis, with 5 year survival estimates for muscle-invasive TCC approaching the low survival rates for advanced metastatic pancreatic cancers, small cell lung cancers, liver and bile-duct cancers, stomach and non-small cell lung carcinomas [29][1], [30][2]. Although papillary and superficial tumors recur in 70% of patients after surgical removal, non-invasive tumors have a more favorable outcome than muscle-invasive tumors as only 10–20% of these recurrences progress to muscle-invasive disease [31][2]. The regulatory mechanisms that are altered and disrupted in muscle-invasive bladder cancer may represent a barrier to progression in superficial tumors, and are candidate targets for therapeutic intervention. Accumulating evidence suggests that superficial and muscle-invasive tumors are pathobiologically distinct [32][3], [33][4]. Superficial tumors frequently overexpress or express constitutively active mutants of HRAS and FGFR3 leading to hyperactivated Ras/MAPK signaling activity [34][3], [35][4]. Muscle-invasive tumors demonstrate disrupted activity of p53 and Rb and other tumor suppressors, overexpress EGFR and ErbB2, MMP2 and MMP9, and other pro-angiogenic factors, while having deleted cyclin-dependent kinase inhibitor genes CDKN2A (p16^Ink4a) and CDKN2B (p15^Ink4b) [36][4]. However, there is evidence that suggests that Ta tumors in some patients may progress and become muscle-invasive. In addition to observations that 10–20% of patients who initially have Ta tumors later develop muscle-invasive disease, tumors ≥T1 share common chromosomal deletions, gains and amplifications that are distinct from those found in Ta stage tumors, suggesting that accumulated chromosomal aberrations may be involved in the progression from papillary to muscle-invasive tumors [37][1], [38][2]. Moreover, while activating mutations in FGFR3, ras isoforms, and PI3K are more common in papillary tumors, most high grade superficial tumors lack these mutations and are similar to invasive tumors, suggesting that these may be predisposed to progress to muscle-invasive disease [39][5]. At the same time, some tumor recurrences lose the activating mutations that are present in earlier tumors, which may also potentially drive progression [40][5]. In all, the relationship of superficial and muscle-invasive bladder cancer remains controversial and largely unresolved. There are few effective systemic treatments for muscle-invasive bladder cancer, and an improved understanding of the molecular pathogenesis and progression of TCC is urgently needed. Analysis mRNA expression in different stages of bladder cancer could illustrate differences that exist which may promote progression from Ta and T1 tumors to higher stage recurrences, while similarities in expression patterns could clarify the relationship of tumor stages in progression of the disease. Insights into these molecular mechanisms of bladder tumor progression can provide targets for preventative and therapeutic interventions while providing biomarkers that reliably predict progression into muscle-invasive disease. Microarray technology provides a powerful tool to measure mRNA expression across the entire genome of biological samples, allowing detailed analysis of large numbers of experimental and clinical samples in relatively little time. Typically in studies of various cancers, clinical samples are analyzed to identify genes expressed at relatively high or low amounts in common patterns that correlate with tumor stage and patient survival [41][6]. While this approach may be appropriate for identifying potentially useful diagnostic and prognostic biomarkers, it is often difficult to determine whether signature genes in advanced cancers are functionally relevant to tumor progression. Strategies to derive functional information from gene-expression datasets include expression clustering to identify genes with similar patterns of expression [42][7]and pathway enrichment programs including WebGestalt [43][8] that query Gene Ontology, Kyoto Encyclopedia of Genes and Genomes (KEGG), and other databases to identify processes in which gene expression changes are focused. Another strategy is to analyze the published literature using automated text-mining programs such as PubGene and Chilibot to identify functional associations between genes, phenotypes and diseases within publications indexed in PubMed [44][9]–[45][12]. Such resources have made computational informatic tools available to scientists who need to analyze and interpret microarray data, yet few studies have taken advantage of this information. Studies of whole-genome expression using microarray technology require resources that may not be widely available to all researchers. Reliable microarray studies require large numbers of samples to provide for an adequate analysis and can be prohibitively expensive and time consuming. Access to human tissues required for relevant analyses is often not available to researchers outside of the medical field. The establishment of public microarray data archives, including the Gene Expression Omnibus ([46]http://www.ncbi.nlm.nih.gov/geo/), has allowed free-access to a wealth of clinical and experimental data that can be analyzed and re-purposed by any investigator with the interest and the means to do so. The development of open-access analytical programs, including Oncomine ([47]https://www.oncomine.org) [48][13] and Mayday [49][14], provide a means to investigate gene expression in data across multiple microarray studies. These advances allow researchers to pursue genome-scale investigations in clinically-derived human tissues without the time, expense, and administrative efforts associated with generating primary genomic data. Here, we use public data to identify genes that are associated with TCC tumor progression and muscle-invasive disease in male patients without metastases. We find that the differential expression of a network of genes related to the mitogenic Ras/MAPK and related signaling pathways are associated with progression beyond Ta stage. Additionally, we identified a subset of 7 genes within this network whose expression is associated with muscle-invasive TCC. Materials and Methods Microarray Dataset The Gene Expression Omnibus (GEO) website ([50]http://www.ncbi.nlm.nih.gov/geo/) was used to search publicly available datasets for recent studies of bladder cancers with at least 3 samples representing all tumor grades and performed using up-to-date whole-genome microarray chips. We chose a dataset (GEO# GSE 31684; accessed 2/29/2012) which was originally used to develop predictive models of bladder cancer outcome [51][15], as it fulfilled these requirements and provided detailed patient data. Data was further selected from male TCC patients without associated metastases, carcinoma in situ or sarcoma to reduce potential variation in data, resulting in a total of 36 patient datasets ([52]Table 1). Table 1. Tumor stage, sample size and age range of data selected from GSE 31684 dataset. Tumor Stage n =  Age Range Ta 3 57–78 T1 7 57–74 T2 10 51–80 T3 9 66–84 T4 7 50–83 [53]Open in a new tab Data selection, calculations and statistics The meta-analysis of public data was performed according to PRISMA guidelines ([54]Table S1, [55]Table S2). Microarray expression data was organized, managed, and calculated using Microsoft Excel (Redmond, WA). The microarray expression data of all 36 sample datasets were expressed in log[2] scale, and the average expression values were calculated for samples of each pathological tumor grade. To identify general expression changes associated with progression of TCC, the gene probe expression values of tumor stage T1, T2, T3 and T4 samples were individually normalized to Ta by subtracting their log[2] averages. Gene probes were selected based on both an increase or decrease in expression by a factor of 2 (log[2]1/−1) and statistical significance (p<0.05) in at least one tumor stage based on Student's t-test as a more stringent method of selection than either criterion alone. To identify expression changes associated with muscle-invasive disease, genes were selected that did not significantly change from Ta to T1 but are significantly different by a factor of 2 in T2 tumors in addition to significance based on Student's t-test. Further, the average expression values of selected gene probes in each tumor grade were compiled, and 3×3 matrix self-organizing map-based clustering analysis of relative gene expression across tumor stages was performed with 10,000 iterations using Mayday [56][14]. Clusters of gene probes with minimal expression changes, representing apparent false-positives, were arbitrarily omitted from subsequent analysis. Statistical pathway enrichment analysis of differentially expressed genes To perform pathway enrichment analysis on selected genes, gene probe IDs were annotated with Entrez gene symbols (Affymetrix Human Genome U133 Plus 2.0 [HG-U133_Plus_2]; GEO# [57]GPL570) and analyzed using the WebGestalt website ([58]http://bioinfo.vanderbilt.edu/webgestalt/) to identify defined KEGG pathways that are significantly over-represented in the dataset [59][8], using default settings and the Fisher's Exact test, selecting pathways where p<0.01. These results produced lists of the genes included in the enriched pathways, and provided a connectivity map of each pathway. A model network based on these pathways was generated using the Visual Understanding Environment ([60]http://vue.tufts.edu/). Gene expression heatmaps were generated using Mayday ([61]http://www-ps.informatik.uni-tuebingen.de/mayday/wp/) [62][14]. Statistical validation of muscle-invasive tumor gene signature using outside datasets To determine whether muscle-invasive-specific expression changes identified in our initial dataset are typical across an expanded number of samples and studies, we obtained expression data for each gene compiled within the Oncomine database [63][13]. This included 8 datasets from 7 studies in which gene expression is measured in superficial and muscle-invasive bladder tumors [64][16]–[65][22]. The median expression of each gene in superficial and muscle-invasive tumors from each dataset was recorded and the averaged difference was calculated. A Mann-Whitney Rank-Sum test was performed to determine the significance of expression changes across datasets. Bibliomic text-mining for functional gene association To obtain published results that link functional interactions of genes and phenotypes, selected genes were analyzed using PubGene [66][10], while genes and keywords “bladder cancer”, “metastasis”, “angiogenesis”, “invasive” were analyzed using Chilibot [67][11]. An association network based on co-occurrances and implied functional relationships determined by each program was constructed based on these results. Immunohistochemistry of Bladder Tumors Samples (n = 5 each) of normal bladder and T1, T2 and T3 bladder tumors were purchased from Biomax.us (Rockville, MD). Samples were processed and stained by the TRIP Laboratory facility (Depatment of Pathology, University of Wisconsin School of Medince and Public Health) using standard immunostaining procedures. Antibodies to COL3A1 and COL5A1 were purchased from Santa Cruz Biotechnology (Santa Cruz, CA) and visualized using DAB and Warp Red stains. Nuclei were visualized using hematoxylin staining. Microscopy and image processing was performed as previously described [68][23]. Western Immunoblotting T24 bladder cancer cells and E6-immortalized human urothelial cells (HUCs) were the kind gift of Dr. Dale Bjorling, University of Wisconsin. T24 cells were cultured in DMEM media +10% FBS, while HUCs were cultured in Ham's F12 media +10% FBS. Cells were cultured to 70% confluence, scraped from plates, collected, solubilized and processed, analyzed by western immunoblotting, and quantified using Image J, as previously described [69][23]. Antibodies to COL3A1 and COL5A1 are described above. Antibodies recognizing β-actin were used as a loading control. Results TCC progression is associated with differential expression of Ras/MAPK and related pathway genes Our hypothesis was that gene expression differences exist between papillary/superficial and muscle-invasive bladder TCC tumors that define the regulation of tumor invasion, progression and metastasis. We developed an intuitive method, summarized in [70]Figure 1, to select, analyze and interpret public microarray data using open access computational resources. We found a suitable dataset from the GEO database ([71]GSE31684) containing expression data from 93 patient bladder tumors that were removed prior to any chemical treatment and used to identify gene expression signatures that predict outcome of high-risk bladder cancer [72][15]. To limit potential variation in sample expression data, sample data were selected from male patients with TCC of the bladder, without metastatic tumors, CIS or sarcoma, producing 36 total datasets that represented at least three patients per stage for all stages of TCC tumors ([73]Table 1). Figure 1. Flow diagram detailing the methods used to identify genes associated with TCC progression and muscle-invasive behavior from expression microarray data. [74]Figure 1 [75]Open in a new tab First, we calculated expression differences between Ta and T1–T4 bladder tumors to identify changes broadly associated with bladder cancer progression beyond papillary tumor stages. Data were processed, averaged, normalized to the expression of Ta tumors, and selected based on a greater than 2-fold threshold difference in expression at any stage beyond Ta, and p<0.05 using Student's t-test. This produced a list of 8110 gene-associated probes. Expression data was retrieved for those selected probes and analyzed using the self-organizing map cluster analysis within Mayday [76][14] to arbitrarily remove probes that reflect little change in expression, selecting 5509 total probes. As receptor tyrosine kinases, mitogenic signaling pathways and tumor suppressors are each involved in superficial and muscle-invasive bladder cancer, our aim was to identify changes in gene expression that could compliment the activities of these pathways. The list of selected probes was analyzed with WebGestalt [77][8] to identify defined pathways that are significantly represented in the microarray data (p<0.05). Whether the selected genes were analyzed as a single group or separated based on increased or decreased expression, many of the pathways reliably associated with these genes are related to cancer-related signal transduction and similar pathways ([78]Table 2, [79]Table S3). The ten best supported pathways associated with the entire set of genes include “Cell Cycle”, “Pathways in Cancer”, and “Focal Adhesion”. Beyond these ten, “MAPK signaling”, “ErbB signaling” and “Bladder Cancer” pathways were also well supported. Likewise, when this list was separated into separate lists of 4816 increasing and 693 decreasing genes, many similar cancer-related signal transduction pathways were found to be represented in these genes. While other pathways represented in the data could also be associated with cancer progression and metastasis, such as “Metabolism” and “DNA Repair”, the relationship of these pathways to current models of bladder progression is less direct. These results suggest that bladder cancer progression from Ta to T1+ stage tumors involves the increased expression of genes in mitogenic, cancer-associated pathways related to FGFR3 and ErbB family signaling. Table 2. KEGG pathway enrichment of genes differentially expressed in T1–T4 staged bladder tumors versus Ta-stage tumors. KEGG Pathway Number of Pathway Genes in Dataset Enrichment Factor “R” Fisher's Exact, Adjusted P-value All Changes, Top 10 Metabolic pathways 332 3.46 5.26E-93 Cell cycle[80]^* 68 6.11 2.46E-36 Spliceosome 68 6.11 2.46E-36 DNA replication 29 9.27 3.29E-23 Pyrimidine metabolism 47 5.52 1.05E-22 Ubiquitin mediated proteolysis 54 4.50 6.33E-21 Pathways in cancer[81]^* 86 3.00 2.50E-19 Focal adhesion[82]^* 63 3.61 1.30E-18 Huntington's disease 60 3.73 1.32E-18 Purine metabolism 52 3.96 1.75E-17 Selected Pathways MAPK signaling pathway[83]^* 60 2.57 6.88E-11 ErbB signaling pathway[84]^* 26 3.44 5.92E-08 Bladder cancer[85]^* 16 4.38 7.25E-07 Genes Up, Top 10 Metabolic pathways 299 3.57 1.64E-85 Spliceosome 66 6.79 1.34E-37 Cell cycle[86]^* 62 6.38 2.13E-33 DNA replication 29 10.61 7.10E-25 Pyrimidine metabolism 46 6.18 3.64E-24 Ubiquitin mediated proteolysis 50 4.77 3.54E-20 Huntington's disease 58 4.13 7.22E-20 Purine metabolism 51 4.45 3.88E-19 Oocyte meiosis 42 4.85 1.95E-17 Alzheimer's disease 50 3.91 4.12E-16 Parkinson's disease 43 4.26 1.71E-15 Genes Down, Top 10 Complement and coagulation cascades 11 14.12 1.22E-08 Pathways in cancer[87]^* 21 5.64 1.22E-08 Focal adhesion[88]^* 17 7.49 1.22E-08 MAPK signaling pathway[89]^* 17 5.60 4.00E-07 Small cell lung cancer 10 10.54 8.56E-07 Metabolic pathways 35 2.81 1.07E-06 p53 signaling pathway 9 11.55 1.31E-06 Tight junction 11 7.27 5.35E-06 NOD-like receptor signaling pathway 8 11.43 5.84E-06 Metabolism of xenobiotics by cytochrome P450 8 10.12 1.36E-05 [90]Open in a new tab ^(*) indicates KEGG pathways that involve mechanisms of signal transduction. P value represents the results of Fisher's Exact tests reported by WebGestalt. KEGG pathways and other related ontologies and annotations are highly redundant, with individual genes associated with many multiple limited and interconnecting pathways. For this and other reasons, pathway enrichment analysis alone often does not adequately represent the global scope and relationships within the data. To better understand the relationships of these genes and how they may function in bladder cancer progression, we built an integrated pathway network model based on the KEGG maps of selected signaling pathways represented in the data ([91]Table 2; [92]Table S3). These results reveal that gene expression differences between Ta and T1 stage tumors largely occur within a signaling network related to Ras/MAPK signaling pathways, with increased expression of a majority of these genes ([93]Figure 2). Figure 2. Altered expression in a network of Ras/MAPK associated signaling pathway genes in TCC progression. [94]Figure 2 [95]Open in a new tab The stimulatory and inhibitory interactions of gene products differentially expressed in T1–T4 versus Ta tumors were mapped based on KEGG pathway maps. Node color denotes relative expression: Yellow = Increased expression in muscle-invasive tumors; Blue = Decreased expression in muscle-invasive tumors. Green and red arrows represent stimulatory and inhibitory interactions, respectively. The names of genes with supported associations with muscle-invasive tumors are highlighted in bold italics. Muscle-invasive TCC is associated with specific expression changes in ECM and signaling proteins We further hypothesized that the expression of a subset of the genes within this signaling network model is specifically altered in muscle-invasive versus superficial tumors. Returning to the complete dataset, we selected genes that are expressed at similar levels within a 2 fold difference between Ta and T1 tumors, and are changed greater than 2 fold in T2 stage tumors with Student's t-test p<0.05 ([96]Figure 1). After performing expression clustering analysis and arbitrarily removing clusters with minimal expression changes, as above, 496 total gene probes were selected for pathway enrichment analysis using WebGestalt. As before, these results show that the selected genes are associated with many signal transduction and cancer-related pathways that involve 23 of the selected genes ([97]Table 3, [98]Table S4, [99]Figure 3). These genes were not focused in any single pathway, but instead were distributed across the network of pathways ([100]Figure 2). Visualizing gene expression data relative to Ta tumors in a heatmap, the expression changes of selected genes are consistent in T2, T3 and T4 tumors ([101]Figure 3). These results suggest that the differences of selected gene expression are distinct fundamental molecular characteristics of muscle-invasive tumors. Table 3. KEGG pathway enrichment of genes differentially expressed in T2 versus Ta and T1 stage bladder tumors. KEGG Pathway Number of Pathway Genes in Dataset Enrichment Factor “R” Fisher's Exact, Adjusted P-value All Changes, Top 10 Focal adhesion[102]^* 14 8.4 1.44E-07 Tight junction 11 9.92 7.39E-07 Pancreatic cancer 8 13.43 4.23E-06 Androgen and estrogen metabolism 6 16.12 3.96E-05 Adherens junction 7 10.99 6.08E-05 ECM-receptor interaction 7 10.08 9.07E-05 Retinol metabolism 6 11.33 2.00E-04 MAPK signaling pathway[103]^* 11 4.94 2.00E-04 Ascorbate and aldarate metabolism 4 19.34 5.00E-04 Drug metabolism – other enzymes 5 11.85 5.00E-04 Selected Pathways Pathways in cancer[104]^* 10 3.6 1.90E-03 Cell cycle[105]^* 6 5.67 2.30E-03 Genes Up, Top 10 Focal adhesion[106]^* 11 12.79 5.49E-08 ECM-receptor interaction 7 19.48 1.66E-06 Cell cycle[107]^* 5 9.13 2.10E-03 MAPK signaling pathway[108]^* 7 6.08 2.10E-03 Tight junction 5 8.72 2.50E-03 Regulation of actin cytoskeleton 6 6.49 2.70E-03 Progesterone-mediated oocyte maturation 4 10.87 2.90E-03 Valine, leucine and isoleucine degradation 3 15.93 4.60E-03 Sulfur metabolism 2 35.95 6.40E-03 Vascular smooth muscle contraction 4 8.13 6.60E-03 Genes Down, Top 10 Retinol metabolism 6 12024 1.09E-05 Androgen and estrogen metabolism 5 27.68 1.63E-05 Adherens junction 6 19.41 1.63E-05 Drug metabolism – other enzymes 5 24.42 2.29E-05 Ascorbate and aldarate metabolism 4 39.86 2.73E-05 Pentose and glucuronate interconversions 4 36.91 3.14E-05 Pancreatic cancer 5 17.3 5.67E-05 Drug metabolism – cytochrome P450 5 17.3 5.67E-05 Metabolism of xenobiotics by cytochrome P450 5 17.79 5.67E-05 [109]Open in a new tab ^(*) indicates KEGG pathways that involve mechanisms of signal transduction. P value represents the results of Fisher's Exact tests reported by WebGestalt. Figure 3. Expression heatmap of candidate muscle-invasive genes. Figure 3 [110]Open in a new tab Yellow and blue indicate increased and decreased expression in candidate muscle-invasive genes, respectively, relative to the average expression in Ta tumors. We then determined whether the relative expression of the selected genes could be observed in outside microarray datasets of superficial and muscle-invasive bladder tumors. Using Oncomine [111][24], we identified archived datasets containing expression data for each of the 23 selected candidate genes. The median gene expression of non-invasive and invasive tumors in each dataset was recorded, and the average change in expression of each gene was calculated across all datasets ([112]Table 4). The Mann-Whitney Rank-Sum test was performed to determine the significance of the changes in median gene expression across datasets. These results show that expression of extracellular matrix genes COL3A1, COL5A1, COL11A1 and FN1 are significantly increased in muscle-invasive bladder tumors relative to Ta/T1 (p<0.05), while changes in CDC25C, MAPK10 and ErbB3 expression approached significance (p≤0.08) ([113]Table 4). These final selected genes represent a sub-network of consistently-observed differences in gene expression of muscle-invasive and non-invasive bladder tumors that are associated with the activity of Ras/MAPK, PI3K and other signaling pathways ([114]Figure 4). Table 4. Average normalized median expression of selected genes differentially expressed in non-invasive and muscle-invasive TCC in at least 3 datasets within the Oncomine Collection. Gene Average Change in Median (Fold Increase vs Superficial) St.Dev. Mann-Whitney Rank-Sum p =  References