Abstract

   Data analysis is one of the most critical and challenging steps in drug
   discovery and disease biology. A user-friendly resource to visualize
   and analyse high-throughput data provides a powerful medium for both
   experimental and computational biologists to understand vastly
   different biological data types and obtain a concise, simplified and
   meaningful output for better knowledge discovery. We have previously
   developed TargetMine, an integrated data warehouse optimized for target
   prioritization. Here we describe how upgraded and newly modelled data
   types in TargetMine now make it possible to survey the wider biological
   and chemical data space relevant to drug discovery and development. To enhance the
   scope of TargetMine from target prioritization to broad-based knowledge
   discovery, we have also developed a new auxiliary toolkit to assist
   with data analysis and visualization in TargetMine. This toolkit
   features interactive data analysis tools to query and analyse the
   biological data compiled within the TargetMine data warehouse. The
   enhanced system enables users to discover new hypotheses interactively
   by performing complicated searches with no programming and obtaining
   the results in an easy-to-comprehend output format.

   Database URL: http://targetmine.mizuguchilab.org

Introduction

   The proliferation of high-throughput ‘omics’ experiments has led to a
   surge in the availability of biomedical data that need to be properly
   analysed. Leveraging biological information from different data types
   yields deeper insights into gene function and provides a better
   understanding of the biological process under study, which can be
   further transformed into actionable research. For instance, drug
   repositioning (i.e. new uses for existing drugs) (1) and
   combinatorial drug treatments (2) necessitate a systems-level
   mapping of drug–target interactions and their influence on cellular
   networks. Cellular networks themselves comprise multiple organizational
   layers made up of different types of biomolecular interactions such as
   microRNA (miRNA)–target interactions (MTIs), protein–protein
   interactions (PPIs) and transcription factor (TF)–target gene
   interactions, which together modulate the functioning of living
   systems. The ability to correlate these data with gene expression
   patterns and a priori knowledge of the genetic determinants of various
   diseases is key to a deeper understanding of disease mechanisms and
   development of better therapeutic strategies.

   However, integration of the vast and scattered array of biological
   information is a multifarious scientific challenge, for reasons ranging
   from inconsistencies in data gathering to heterogeneous and often
   incompatible formats used to store and manage biological data in
   different repositories. Despite these obstacles, the immense benefits
   of an integrative approach in disease biology and drug discovery have
   spawned numerous efforts to develop different types of frameworks and
   tools for integrating diverse biological data types (3–10) and for
   functional analysis of large gene sets. Gene set functional enrichment
   relies on a statistical analysis of the relative abundance of
   biological themes associated with a given gene list and identifies
   themes (and associated genes) that are overrepresented and, therefore,
   likely to be more relevant to the biological conditions under study.
   For instance, the DAVID gene functional classification resource employs
   a heuristic approach to grouping genes into modules based on
   similarities in the biological annotations and provides a set of tools
   for functional analysis of user-supplied gene lists (11). On the
   other hand, Enrichr employs pre-defined gene set libraries to assist
   functional enrichment analysis of large gene lists (12). However,
   most of the available tools that facilitate gene set functional
   enrichment provide a standalone web interface and have not been
   integrated into a more general data-mining platform; such a platform is
   often crucial to refine and validate gene set functional enrichment
   results and for further characterization of gene sets in drug discovery
   and related applications.
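
   The overrepresentation statistic described above can be illustrated
   with a minimal sketch. It assumes a one-sided hypergeometric test,
   which is one common choice for this kind of analysis and is not
   necessarily the exact statistic used by any of the tools named here.
   The function name, parameter names and gene counts are hypothetical
   and chosen only for illustration.

```python
from math import comb


def enrichment_pvalue(background_size, theme_in_background,
                      list_size, theme_in_list):
    """Upper-tail hypergeometric p-value for one biological theme.

    Returns the probability of drawing at least `theme_in_list` genes
    annotated with the theme when `list_size` genes are sampled without
    replacement from a background of `background_size` genes, of which
    `theme_in_background` carry the annotation.
    """
    # The overlap can never exceed either the list or the annotated set.
    max_overlap = min(theme_in_background, list_size)
    not_in_theme = background_size - theme_in_background

    # Count the ways of obtaining each overlap size >= the observed one.
    tail = sum(
        comb(theme_in_background, i) * comb(not_in_theme, list_size - i)
        for i in range(theme_in_list, max_overlap + 1)
    )
    # Divide by the number of ways to draw any gene list of this size.
    return tail / comb(background_size, list_size)


# Hypothetical counts: 20,000 background genes, 200 annotated with a
# theme, and a 100-gene list in which 10 genes carry that annotation.
p_value = enrichment_pvalue(20000, 200, 100, 10)
```

   In practice many themes are tested at once, so p-values of this kind
   are normally adjusted for multiple testing before themes are reported
   as overrepresented.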

   We have previously developed TargetMine, an integrated data warehouse
   based on the versatile InterMine framework (8, 10, 13),
   which models biological entities (such as genes and proteins) as
   ‘objects’ and their relationships as ‘references’ to other objects.