Abstract

   The copy numbers of genes in cancer samples are often highly disrupted
   and form a natural amplification/deletion experiment encompassing
   multiple genes. Matched array comparative genomics and transcriptomics
   datasets from such samples can be used to predict inter-chromosomal
   gene regulatory relationships. Previously we published the database
   METAMATCHED, comprising the results from such an analysis of a large
   number of publicly available cancer datasets. Here we investigate
   genes in the database which are unusual in that their copy number
   exhibits consistent heterogeneous disruption in a high proportion of
   the cancer datasets. We assess the potential relevance of these genes
   to the pathology of the cancer samples, in light of their predicted
   regulatory relationships and enriched biological pathways. A
   network-based method was used to identify enriched pathways from the
   genes’ inferred targets. The analysis predicts both known and new
   regulator-target interactions and pathway memberships. We examine
   examples in detail, in particular the gene POGZ, which is disrupted in
   many of the cancer datasets and has an unusually large number of
   predicted targets, from which the network analysis predicts membership
   of cancer-related pathways. The results suggest close involvement in
   known cancer pathways of genes exhibiting consistent heterogeneous copy
   number disruption. Further experimental work would clarify their
   relevance to tumor biology. The results of the analysis presented in
   the database METAMATCHED, and included here as an R archive file,
   constitute a large number of predicted regulatory relationships and
   pathway memberships which we anticipate will be useful in informing
   such experiments.

Introduction

   Previously we have demonstrated that an analysis of matched array
   comparative genomics and transcriptomics human cancer datasets can
   reveal inter-chromosomal gene regulatory relationships [1–3]. By
   regulatory relationship we mean either a direct relationship, that of
   a transcription factor acting on its target gene, or a very indirect
   one, acting through a pathway containing intermediate regulatory
   steps. We published the database METAMATCHED [4], comprising the
   results from such an analysis of a large number of publicly available
   cancer datasets. Careful data randomisation ensures statistically
   significant predictions. Each dataset originated from samples of a
   particular type of cancer, and the datasets covered a wide range of
   cancer types.

   We noticed that some genes in the database have a highly variable copy
   number amongst the samples within a dataset, and that this occurs
   consistently for the same genes across many of the datasets and
   cancer types. In this paper we investigate these unusual genes: their
   target genes, as predicted by the meta-analysis of publicly available
   cancer datasets, the biological pathways enriched in their lists of
   target genes, and their relevance to the cancer pathology of the
   samples. Later in this introduction we examine why genes with a
   highly variable, inconsistent copy number disruption amongst samples
   within a cancer dataset may, perhaps counter-intuitively, be relevant
   to the cancer pathology. First we discuss the background to the
   meta-analysis and the pathway enrichment analysis.
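
   As a minimal sketch of the idea of consistent heterogeneous copy
   number disruption described above, the following Python code flags, in
   each dataset, the genes whose copy number varies most across samples,
   and then keeps the genes flagged in at least a chosen proportion of
   the datasets. The variance criterion, the thresholds top_fraction and
   min_proportion, the function names and the toy gene names are all
   illustrative assumptions, not the procedure used to build METAMATCHED.

```python
from statistics import pvariance


def heterogeneous_genes(dataset, top_fraction=0.25):
    """Genes whose copy-number variance across samples is in the top fraction.

    `dataset` maps a gene name to its log2 copy-number ratios, one per sample.
    The variance ranking and `top_fraction` are assumptions for illustration.
    """
    variances = {gene: pvariance(values) for gene, values in dataset.items()}
    ranked = sorted(variances, key=variances.get, reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    return set(ranked[:n_top])


def consistently_heterogeneous(datasets, top_fraction=0.25, min_proportion=0.5):
    """Genes flagged as heterogeneous in at least `min_proportion` of datasets."""
    counts = {}
    for dataset in datasets:
        for gene in heterogeneous_genes(dataset, top_fraction):
            counts[gene] = counts.get(gene, 0) + 1
    return {g for g, c in counts.items() if c / len(datasets) >= min_proportion}


# Toy data: three hypothetical datasets, four hypothetical genes, four samples.
toy_datasets = [
    {"GENE_A": [-1.0, 0.9, -0.8, 1.1], "GENE_B": [0.1, 0.0, -0.1, 0.1],
     "GENE_C": [0.5, 0.6, 0.5, 0.4], "GENE_D": [0.0, 0.05, 0.0, -0.05]},
    {"GENE_A": [1.2, -1.0, 0.8, -0.9], "GENE_B": [0.0, 0.1, 0.0, 0.1],
     "GENE_C": [-0.6, -0.5, -0.6, -0.7], "GENE_D": [0.3, -0.3, 0.2, -0.2]},
    {"GENE_A": [0.1, 0.0, 0.1, 0.0], "GENE_B": [0.9, -0.9, 1.0, -1.0],
     "GENE_C": [0.2, 0.3, 0.2, 0.3], "GENE_D": [0.0, 0.1, -0.1, 0.0]},
]

print(sorted(consistently_heterogeneous(toy_datasets)))
```

   In this toy example GENE_A swings between gain and loss in two of the
   three datasets, so it is the only gene retained. A gene that is
   uniformly amplified in every sample, such as GENE_C in the first
   dataset, has low variance and is not flagged.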

   Array comparative genomics (aCGH) microarrays detect gene deletions or
   gene amplifications (extra copies) by comparing gene copy numbers in
   the DNA extracted from test sample cells to the copy numbers in normal
   control cells. Transcriptomics experiments use microarrays that measure
   the abundance of mRNA. In matched experiments the two different types
   of measurement are performed on the same samples. Reviews of matched
   aCGH and transcriptomics experiments, their analysis and uses can be
   found in references [5] and [6].
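
   To make the matched design concrete, the sketch below converts test
   and control signals into a log2 copy number ratio, correlates one
   gene's copy number with another gene's expression across the same
   samples, and estimates significance by shuffling the sample order.
   The Pearson statistic, the permutation scheme, and all names and
   values are assumptions chosen for illustration; they are not taken
   from the analysis behind METAMATCHED.

```python
import math
import random


def log2_ratio(test_signal, control_signal):
    """Log2 ratio of test to control signal: > 0 suggests gain, < 0 loss."""
    return math.log2(test_signal / control_signal)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(copy_number, expression, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the copy number/expression correlation.

    Shuffling the expression values breaks the sample matching, giving a
    null distribution for the correlation (an illustrative scheme only).
    """
    rng = random.Random(seed)
    observed = abs(pearson(copy_number, expression))
    shuffled = list(expression)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(copy_number, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical matched measurements on the same eight samples:
# log2 copy number ratios of a candidate regulator gene ...
copy_number = [-1.0, -0.8, -0.3, 0.0, 0.2, 0.5, 0.9, 1.2]
# ... and expression levels of a candidate target on another chromosome.
expression = [5.1, 5.3, 5.9, 6.2, 6.3, 6.8, 7.1, 7.6]

print(log2_ratio(4.0, 2.0))  # one extra copy relative to a diploid control
print(pearson(copy_number, expression))
print(permutation_p_value(copy_number, expression))
```

   Because both measurements come from the same samples, the index of
   each value identifies the sample, and it is this pairing that the
   permutation step deliberately destroys.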