Abstract

Background

   Prostate cancer is one of the most common complex diseases and a
   leading cause of death in men. Identifying prostate cancer-associated
   genes and biomarkers is therefore essential, as they can provide
   insights into the mechanisms underlying disease progression and help
   advance early diagnosis and the development of effective therapies.

Methods

   In this study, we present an integrative analysis of gene expression
   profiling and protein interaction networks at a systematic level to
   reveal candidate disease-associated genes and biomarkers for prostate
   cancer progression. First, we reconstructed the human prostate cancer
   protein-protein interaction network (HPC-PPIN), and the network was
   then integrated with prostate cancer gene expression data to identify
   modules related to different phases of prostate cancer. Finally, the
   candidate module biomarkers were validated by their ability to
   predict prostate cancer progression.

Results

   Phase-specific modules were identified for prostate cancer. Among
   these modules, the transcriptional Androgen Receptor (AR) nuclear
   signaling pathway and the Epidermal Growth Factor Receptor (EGFR)
   signaling pathway were shown to be pathway targets for prostate cancer
   progression. The identified candidate disease-associated genes showed
   better predictive ability for prostate cancer progression than
   published biomarkers. Interestingly, functional enrichment analysis
   showed that the candidate disease-associated genes were enriched in
   the nucleus and encoded potential transcription factors with different
   functions, for example, key players such as AR, Myc and ESR1, and a
   hidden player, Sp1, which was considered a potential novel biomarker
   for prostate cancer.

Conclusions

   The successful results on prostate cancer samples demonstrated that the
   integrative analysis is a powerful and useful approach for detecting
   candidate disease-associated genes and modules that can be used as
   potential biomarkers for prostate cancer progression. The data, tools
   and supplementary files for this integrative analysis are deposited at
   http://www.ibio-cn.org/HPC-PPIN/.

   Keywords: Biomarker, Disease-associated Genes, Integrative analysis,
   Prostate cancer, Transcription factor

Background

   Prostate cancer is the second leading cause of morbidity and mortality
   in men [1,2]. In recent years, the incidence of prostate cancer has
   increased dramatically [3], largely owing to a lack of diagnosis and
   treatment of the disease at an early stage [4]. Thus, clinical
   biomarkers that can successfully detect the presence of prostate
   cancer early are urgently needed to reduce the risk of death from the
   disease [5,6].

   In the post-genomics era, high-throughput technologies have generated
   an explosion of biological data and information, rapidly providing
   unprecedented multi-level omics data [7]. Transcriptomics, also
   referred to as gene expression profiling, can now comprehensively
   survey the entire human genome. Moreover, enormous efforts have been
   made to identify biomarkers for various cancers by analysing different
   transcriptomics data [8-12]. For example, our previous study showed
   that integrated transcriptomics data could be used to identify putative
   novel prostate cancer-associated pathways, such as the
   Endothelin-1/EDNRA trans-activation of EGFR pathway, which could
   provide essential information for the development of network
   biomarkers and individualized therapy strategies for prostate cancer
   [11-13]. Among other cancer transcriptomics studies, a large-scale
   expression study by Wang et al. identified a set of gene markers for
   predicting breast cancer metastasis [14], and Chari et al.
   subsequently demonstrated an approach based on multiple concerted
   disruptions (MCD) analysis to identify genes and pathways in cancer
   [15]. Furthermore, transcriptomics can be used to identify metabolic
   biomarkers through altered metabolic pathways at different cancer
   phases [16].

   At other omics levels, proteomics in the context of protein-protein
   interaction networks can also be used to characterize and diagnose
   pathological processes [17]. As reported by Ideker and Sharan [18],
   genes that serve as biomarkers in complex diseases tend to cluster
   together in well-connected protein interaction sub-networks.
   Subsequently, Chuang et al. showed that extracting co-expressed
   functional sub-networks for breast cancer metastasis by integrating
   transcriptomics data with protein-protein interactions yielded higher
   classification accuracy [19]. Later, Taylor et al. used altered
   protein interaction modularity to predict breast cancer progression
   by examining the biochemical structure of the interactome [20].
   Similar analyses of sub-networks and/or hub proteins have also helped
   to explain cancer metastasis at the molecular level [18].

   For prostate cancer specifically, several studies have identified
   disease-related gene modules, sub-networks or dysfunctional pathways
   by combining global characteristics of the interactome with gene
   expression data, using a variety of newly developed algorithms and
   methods [21-23]. Nonetheless, few studies have addressed the
   identification of prostate cancer biomarkers for early detection and
   for disease progression [20]. The relationships among potential
   prostate cancer genes and their associated functions and pathways
   remain poorly characterized, for example, how these genes interact
   with and regulate each other, and what roles they play within network
   modules. Such investigations are needed for a comprehensive
   understanding of the molecular mechanisms underlying prostate cancer
   progression. A key challenge is therefore to perform an integrative
   analysis of different types of data, such as gene expression profiles,
   protein-protein interaction (PPI) data, pathway information and
   clinical information, which can offer different perspectives on the
   biological problems in prostate cancer and support the identification
   of potential biomarkers [24,25].

   In this study, we therefore aim to reveal candidate disease-associated
   genes and biomarkers for prostate cancer progression by integrating
   gene expression profiling and network analysis at a systematic level.
   We first reconstructed the human prostate cancer protein-protein
   interaction network and used this network as a scaffold for
   integrative analysis with prostate cancer gene expression data, which
   were analysed at different disease phases. Through modular analysis,
   modules associated with the different disease phases were then
   identified. Finally, we identified significant genes within these
   modules, which were expected to form gene expression signatures highly
   relevant to specific phases of prostate cancer. By overlapping the
   genes identified in the different modules, we obtained common genes
   that were expected to help uncover novel prostate cancer-related
   pathways and transcription factors, which could serve as candidate
   biomarkers for prostate cancer progression. Our study thereby
   demonstrates a practical workflow for the integrative analysis of
   prostate cancer at the systematic level. For genome-wide studies, this
   work provides a basis for future developments in translational
   biomedical informatics, which ultimately aims to improve diagnostics
   and patient outcomes using omics datasets through integrative systems
   biology [26].
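   The overlap step described above amounts to counting how many
   phase-specific modules each gene appears in. The Python sketch below
   assumes each module has already been reduced to a set of gene symbols;
   the function name `common_module_genes`, the optional `min_modules`
   relaxation and the gene names are hypothetical and only illustrate the
   idea, not the exact procedure used in this study.

```python
from collections import Counter


def common_module_genes(modules, min_modules=None):
    """Return genes that occur in at least `min_modules` of the modules.

    modules: dict mapping a phase-comparison label to an iterable of genes.
    min_modules: defaults to the number of modules (a strict intersection).
    """
    if min_modules is None:
        min_modules = len(modules)
    # Count each gene once per module, then keep the sufficiently shared ones.
    counts = Counter(g for genes in modules.values() for g in set(genes))
    return {gene for gene, n in counts.items() if n >= min_modules}


# Hypothetical module gene sets for the three pairwise phase comparisons.
modules = {
    "early_vs_middle": {"GENE_A", "GENE_B", "GENE_C", "GENE_D"},
    "middle_vs_late": {"GENE_A", "GENE_B", "GENE_E"},
    "early_vs_late": {"GENE_A", "GENE_C", "GENE_E"},
}
print(sorted(common_module_genes(modules)))  # ['GENE_A']
print(sorted(common_module_genes(modules, min_modules=2)))  # ['GENE_A', 'GENE_B', 'GENE_C', 'GENE_E']
```

   A strict intersection keeps only genes shared by every comparison;
   lowering `min_modules` trades specificity for a larger candidate list.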

Methods

Human prostate cancer protein interaction network reconstruction and
annotation

   The human prostate cancer protein-protein interaction network
   (HPC-PPIN) was first reconstructed for use in the subsequent
   integrative analysis, as illustrated in Figure 1. Two different types
   of datasets were used to reconstruct the HPC-PPIN. The first dataset
   consisted of genes associated with prostate cancer, derived from a
   collection of prostate cancer databases and other relevant resources
   (e.g. Dragon Database of Genes associated with Prostate Cancer (DDPC)
   [27], GeneGo [28], OMIM [29], KEGG [30], PGDB [31], CCDB [32], and
   Gene Ontology (GO) [33]).
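   The text does not spell out how the prostate cancer gene list is
   combined with the BioGRID interactions (the second dataset, described
   below). One plausible reading, sketched here in Python purely as an
   assumption, is to keep the BioGRID interactions that involve the
   prostate cancer genes; the hypothetical `require_both` switch chooses
   between edges among those genes only and edges touching at least one
   of them. The function name and identifiers are illustrative, and
   parsing of the BioGRID file is not shown.

```python
def build_hpc_ppin(pc_genes, interactions, require_both=False):
    """Filter a protein-protein interaction edge list around a gene set.

    pc_genes: prostate cancer-associated gene symbols.
    interactions: (gene_a, gene_b) pairs, e.g. parsed from a BioGRID export.
    require_both: True keeps only edges between two listed genes; False keeps
        every edge with at least one listed gene.
    Returns an undirected adjacency dict {gene: set of neighbours}.
    """
    pc = set(pc_genes)
    adj = {}
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions
        hits = (a in pc) + (b in pc)
        if hits == 2 or (hits == 1 and not require_both):
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


# Hypothetical inputs; repeated records for the same pair collapse naturally.
pc_genes = {"GENE_A", "GENE_B"}
interactions = [
    ("GENE_A", "GENE_B"),
    ("GENE_B", "GENE_A"),
    ("GENE_B", "GENE_C"),
    ("GENE_C", "GENE_D"),
    ("GENE_A", "GENE_A"),
]
print(build_hpc_ppin(pc_genes, interactions, require_both=True))
```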

Figure 1.


   The modular analysis pipeline. The diagram shows the identification of
   candidate disease-associated genes as potential module biomarkers,
   based on integrative analysis of the reconstructed human prostate
   cancer protein-protein interaction network (HPC-PPIN) and the gene
   expression profiles of the different phases of prostate cancer. For
   the greedy algorithm of the Cytoscape jActiveModules (jAM) plugin,
   used to find the most significant core sub-networks in each gene
   expression profile, the threshold was set to three iterations and the
   top ten ranks.
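   jActiveModules scores a candidate sub-network by converting each
   gene's p-value into a z-score and aggregating these scores. As a rough
   illustration of that idea, the Python sketch below computes
   z_i = Φ^-1(1 - p_i), the aggregate score z_A = (z_1 + ... + z_k) / √k
   for a module of k genes, and grows a module greedily from a seed gene.
   It is a simplified stand-in, not the plugin's implementation: it omits
   jAM's correction of z_A against randomly sampled sub-networks of the
   same size and does not model the iteration and rank thresholds given
   in the caption. The function names, the `max_size` parameter and the
   toy network are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def node_z(p):
    """Map a p-value to a z-score, z = inverse standard normal CDF of (1 - p)."""
    p = min(max(p, 1e-15), 1 - 1e-15)  # keep the argument strictly inside (0, 1)
    return _STD_NORMAL.inv_cdf(1 - p)


def aggregate_z(nodes, z):
    """Aggregate score of a sub-network: sum of node z-scores divided by sqrt(k)."""
    # Genes without expression data contribute 0 (an assumption of this sketch).
    return sum(z.get(n, 0.0) for n in nodes) / math.sqrt(len(nodes))


def greedy_module(seed, adj, z, max_size=50):
    """Grow a module from `seed`, each step adding the best-scoring neighbour.

    Stops when no neighbour raises the aggregate score or max_size is reached.
    """
    module = {seed}
    score = aggregate_z(module, z)
    while len(module) < max_size:
        frontier = {nb for n in module for nb in adj.get(n, ()) if nb not in module}
        best, best_score = None, score
        for cand in sorted(frontier):  # sorted for deterministic tie-breaking
            s = aggregate_z(module | {cand}, z)
            if s > best_score:
                best, best_score = cand, s
        if best is None:
            break
        module.add(best)
        score = best_score
    return module, score


# Hypothetical chain network A-B-C-D with made-up p-values.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
pvals = {"A": 0.001, "B": 0.01, "C": 0.5, "D": 0.0001}
z = {g: node_z(p) for g, p in pvals.items()}
module, score = greedy_module("A", adj, z)
print(sorted(module), round(score, 3))  # ['A', 'B'] 3.83
```

   The example also shows a limitation of a purely greedy search: gene D
   has the smallest p-value but is never reached, because adding the
   intermediate gene C would lower the score.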

   The second dataset consisted of human (Homo sapiens) protein-protein
   interaction data downloaded from the BioGRID database [34]. To
   annotate the HPC-PPIN, we used the Database for Annotation,
   Visualization and Integrated Discovery (DAVID) system [35,36]. First,
   the DAVID functional annotation clustering tool was applied to group
   the annotated genes within the HPC-PPIN across the three GO
   categories: molecular function, biological process and cellular
   component. The tool was then used to identify enriched GO terms within
   each of the three categories. To annotate detailed functions in the
   context of pathways involved in metabolism, cellular processes,
   environmental information processing and genetic information
   processing, the KEGG pathway database was used
   (http://www.genome.jp/kegg/pathway.html).
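   Enrichment of a GO term or KEGG pathway in a gene list is commonly
   assessed with a one-sided hypergeometric (Fisher exact) test. The
   Python sketch below computes that tail probability from four counts.
   It is a generic illustration rather than DAVID's exact procedure,
   which reports a modified Fisher exact p-value (the EASE score) and
   additionally clusters related terms; the function name and the example
   counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """One-sided enrichment p-value P(X >= k), X ~ Hypergeometric(N, K, n).

    N: genes in the background (e.g. all annotated human genes)
    K: background genes annotated with the term or pathway
    n: genes in the list of interest (e.g. the HPC-PPIN genes)
    k: genes in the list annotated with the term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    hits = range(k, min(n, K) + 1)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in hits)
    return tail / comb(N, n)  # exact integers until this final division


# Hypothetical counts: 12 of 300 list genes vs 40 of 15000 background genes.
print(hypergeom_upper_tail(12, 300, 40, 15000))
```

   Using exact integer binomial coefficients avoids underflow in the
   intermediate terms, at the cost of speed for very large backgrounds.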

Prostate cancer gene expression data collection and analysis

   Gene expression profiles from different array platforms, covering the
   different stages of prostate cancer (i.e. disease stages I, II, III
   and IV), were collected from various laboratories. Table 1 lists the
   available information on the collected gene expression profiles (431
   samples) of prostate cancer progression. Because fewer samples were
   available for stage I than for the other disease stages, stages I and
   II were combined into one phase (Table 1). All expression datasets
   were analysed to obtain test statistics. Statistical processing was
   performed with the limma (Linear Models for Microarray Data) package
   [37,38], using scripts written in R version 2.9.0 (R Development Core
   Team). The limma package [37] was applied to perform moderated t-tests
   for all pairwise comparisons of disease phases, i.e. early vs middle,
   middle vs late and early vs late, to determine significantly
   differentially expressed genes. An empirical Bayes method was applied
   to moderate the standard error of each gene, and the
   Benjamini-Hochberg method was then applied to correct for multiple
   testing [39] and to obtain adjusted p-values.
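   The two statistical steps above can be illustrated for a single gene
   and a single two-phase comparison. In limma's empirical Bayes model,
   a gene's residual variance s² (on d degrees of freedom) is shrunk
   towards a prior value s0² (on d0 degrees of freedom),
   s̃² = (d0·s0² + d·s²) / (d0 + d), and the moderated t-statistic is the
   mean difference divided by s̃·√(1/n1 + 1/n2), with d0 + d degrees of
   freedom. The Python sketch below assumes d0 and s0² are already known
   (limma's eBayes estimates them from all genes, which is not reproduced
   here) and then applies the Benjamini-Hochberg step-up adjustment, as
   R's p.adjust(method = "BH") does. Function names and data are
   hypothetical; converting t to a p-value (for example with
   scipy.stats.t.sf) is omitted to keep the sketch dependency-free.

```python
import math
from statistics import fmean


def moderated_t(group1, group2, d0, s0_sq):
    """limma-style moderated t for one gene, two groups (group1 - group2).

    d0, s0_sq: prior degrees of freedom and prior variance, assumed given
    (limma's eBayes estimates them across all genes).
    Returns (t, total degrees of freedom d0 + d).
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = fmean(group1), fmean(group2)
    d = n1 + n2 - 2  # residual degrees of freedom
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    s_sq = ss / d  # pooled residual variance
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)  # shrunken (moderated) variance
    unscaled_se = math.sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / (math.sqrt(s_tilde_sq) * unscaled_se), d0 + d


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical log-expression values for one gene in two phases.
t, df = moderated_t([7.1, 7.4, 7.0, 7.3], [6.2, 6.5, 6.1], d0=4.0, s0_sq=0.05)
print(round(t, 2), df)
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

   With d0 = 0 the moderated statistic reduces to the ordinary pooled
   two-sample t-statistic; larger d0 pulls noisy per-gene variances
   towards the common prior, which stabilises tests with few samples.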

Table 1.

   Gene expression profiles of prostate cancer used for integrative
   analysis#
   No. Exp. Platform No. Probes Samples Series No. Samples of prostate
   cancer stages References