Abstract

   High-throughput OMICs experiments generate signals for millions of
   entities (i.e. genes, proteins, metabolites or any measurable
   biological entity) in the cell. In an effort to summarize and explore
   these signals, expression results are examined in the context of known
   pathways and processes, through enrichment analysis to generate a set
   of pathways and processes that is significantly enriched. Due to the
   high redundancy in annotation resources this often results in hundreds
   of sets. To facilitate the analysis of these results, we have developed
   the Enrichment Map app to visualize enrichments as a network. We have
   updated Enrichment Map to support Cytoscape 3, and have added
   additional features including new data formats and command line access.

Introduction

   With the expansion and accessibility of a wide range of experimental
   techniques to accurately identify and measure any known genomics
   feature ranging from proteins, transcripts, genes, microRNAs, copy
   number variations, or DNA methylation in a high-throughput manner,
   signals for thousands of entities are often generated for an individual
   OMICs experiment. In efforts to interpret these results in the context
   of perturbed cellular mechanisms, the entities are often scored and
   examined for enrichment in known pathways and processes.

   Pathway enrichment analysis helps to uncover general trends or themes
   present in the data, instead of focusing on one or a few favorite
   differential genes. Available tools are abundant, designed for varying
   data types and implemented using a range of different statistical
   tests: given a set of biological entities, these OMICs signals are then
   translated into a set of significant pathways and processes (reviewed
   in Khatri et al. ^[29]1, Huang et al. ^[30]2). Due to the high
   redundancy that exists between pathway databases coming from multiple
   functional annotations of gene products, pathway enrichment often
   results in a long list of potentially interesting pathways. To help
   analyze the set of differential pathways, we created the Enrichment Map
   app to display enrichment results as a network, where pathways are
   nodes in the network and edges represent known pathway cross-talk
   defined by the number of genes shared between the pair of pathways and
   where the network layout organizes the map into functional modules
   ^[31]3.

   In this paper, we present the recent implementation of the Enrichment
   Map app for Cytoscape 3 as well as new features.

Implementation

   Although originally designed to support Gene Set Enrichment Analysis
   (GSEA) ^[32]4 the current Enrichment Map app supports multiple
   enrichment results from tools such as DAVID ^[33]5, BiNGO ^[34]6, and
   GREAT ^[35]7 as well as simplified generic input files which one can
   easily create from your own enrichment results. Tools like g:Profiler
   ^[36]8 allow users to download results in an Enrichment Map compatible
   generic format.

   With the ongoing effort to populate gene annotation and pathway
   databases, it is difficult for standalone enrichment tools to keep
   databases up to date. For convenience, we compile gene set files or GMT
   files, a format created for the GSEA software, to describe all the
   genes contained in a specified gene set, monthly, from a comprehensive
   set of annotation and Pathway databases (
   [37]http://download.baderlab.org/EM_Genesets/), including standard
   sources, like MSigDB ^[38]4. Although originally GMT files were
   specific to GSEA, with the expansion of R and Bioconductor it is now
   straightforward to load GMT files into data structures in R using
   packages like GSA ( [39]http://statweb.stanford.edu/~tibs/ftp/GSA.pdf)
   and analyze your OMICs expression data with one of the many different
   gene set enrichment algorithms such as geneSetTest in the Limma package
   ^[40]9, global test ^[41]10, or Camera ^[42]11. Visualizing the
   resulting enrichments is straightforward by exporting to our generic
   format which minimally consists of the geneset name, description and
   associated enrichment p-value. Through this mechanism, no matter what
   the dataset of interest is, gene, protein or metabolite expression, the
   resulting enrichment analysis can be displayed as an enrichment map.

   There are two main ways to input data into Enrichment Map, through the
   user interface ( [43]Figure 1) or the command tool ( [44]Table 1). The
   user interface is an interactive way to specify all the required files
   and parameters based on the analysis type chosen. The command tool
   allows users to automatically create maps directly from the command
   line, other Cytoscape apps or other programs which can include in-house
   enrichment tools.

Figure 1. Enrichment Map app user interface.

   [45]Figure 1.
   [46]Open in a new tab

   Illustration of Enrichment Map user interface which consists of four
   main parts: analysis type, file specifications, node and edge
   filtering. For each analysis type there is a different set of required
   files. For added functionality there are a set of optional files that
   can be included to help annotate and explore results. Tuning parameters
   such as p-value and q-value helps control the number of nodes while
   tuning the similarity coefficient helps control the number of edges.

Table 1. Command tool specification outlined for each of the analysis types.

   There is an additional command optimized for GSEA inputs only.
   Command Required Arguments Optional Arguments
   enrichment map build
   analysistype="GSEA" gmtFile=filepath to geneset file
   enrichmentsDataset1=filepath to enrichments
   enrichments2Dataset1=filepath to enrichments
   pvalue=numerical cutoff, {default : 0.05}
   qvalue=numerical cutoff, {default : 0.1}
   coefficients=one of the following
   [OVERLAP, JACCARD, COMBINED],
   {default:OVERLAP}
   similaritycutoff=numerical cutoff,
   {default : 0.5} expressionDataset 1=filepath to expression file
   ranksDataset 1=filepath to rank file
   classDataset 1=filepath to class file
   phenotype1Dataset 1=Text representing Phenotype
   phenotype2Dataset 1=Text representing Phenotype2
   enrichmentsDataset2=filepath to enrichments
   enrichments2Dataset2=filepath to enrichments
   (Replace 1 for 2 to specify which dataset the file is)
   enrichmentmap build
   analysistype="generic" gmtFile=filepath to geneset file
   enrichmentsDataset 1=filepath to enrichments
   pvalue=numerical cutoff, {default : 0.05}
   qvalue=numerical cutoff, {default : 0.1}
   coefficients=one of the following
   [OVERLAP, JACCARD, COMBINED],
   {default:OVERLAP}
   similaritycutoff=numerical cutoff,
   {default : 0.5} expressionDataset 1=filepath to expression file
   ranksDataset 1=filepath to rank file
   classDataset 1=filepath to class file
   phenotype1Dataset 1=Text representing Phenotype
   phenotype2Dataset 1=Text representing Phenotype2
   enrichmentsDataset2=filepath to enrichments
   (Replace 1 for 2 to specify which dataset the file is)
   enrichmentmap build
   analysistype=
   "David/BiNGO/Great" enrichmentsDataset1=filepath to enrichments
   pvalue=numerical cutoff, {default : 0.05}
   qvalue=numerical cutoff, {default : 0.1}
   coefficients=one of the following
   [OVERLAP, JACCARD, COMBINED],
   {default:OVERLAP}
   similaritycutoff=numerical cutoff,
   {default : 0.5} expressionDataset 1=filepath to expression file
   enrichmentsDataset2=filepath to enrichments
   (Replace 1 for 2 to specify which dataset the file is)
   enrichmentmap
   gseabuild edb=filepath to GSEA results edb directory
   pvalue=numerical cutoff, {default : 0.05}
   qvalue=numerical cutoff, {default : 0.1}
   coefficients=one of the following
   [OVERLAP, JACCARD, COMBINED],
   {default:OVERLAP}
   similaritycutoff=numerical cutoff,
   {default : 0.5} expression=filepath to expression file
   expression2=filepath to expression file
   edbdir2=filepath to edb directory
   [47]Open in a new tab

   Once files and parameters have been specified, the Enrichment Map can
   be created. Unlike a traditional biological network, nodes in an
   Enrichment Map represent a set of genes (e.g. a pathway) and their
   connections the set of genes that two nodes have in common (e.g.
   pathway cross-talk). Every Enrichment Map is associated with a set of
   files, parameters, and a number of datasets (currently limited to two)
   ( [48]Figure 2). Datasets contain gene sets, enrichments, and
   expression all of which is needed to interactively update the map
   through cutoff adjustment sliders found in the legend panel or display
   the genes contained in a given node or edge selection as a heatmap.

Figure 2. Enrichment Map build process overview.

   [49]Figure 2.
   [50]Open in a new tab

   Enrichment Map app was ported to Cytoscape 3 as a bundle app using Open
   Service Gateway initiative (OSGi) services provided through the
   extensive Cytoscape API (version 3.1). The look and feel of the app
   remains similar to the original implementation for Cytoscape 2 with
   user input interfaces and view panels including expression heatmap and
   legend being a direct port from the original source. Given the new
   framework, each panel implements the CytoPanelComponent and is a
   registered service associated with the Enrichment Map app. The main
   enrichment map input panel is registered only once a user opens the
   app. The remaining view panels are only registered once an enrichment
   map is created. Enrichment Map consists of one main taskFactory that
   given an Enrichment Map object populated with a set of input files will
   construct the appropriate task iterator. Depending on the files
   specified different parsing tasks can be added to the iterator.
   Additionally, multiple files of the same type can also be added to the
   queue with distinct instantiations of a parsing task (with different
   files specified on task creation). All parsed files populate fields
   contained in the Enrichment Map object which is then passed to and
   updated by each of the subsequent tasks ( [51]Figure 2).

   The BuildEnrichmentMapTaskFactory is used by both the user interface
   and command tool to construct an enrichment map. Command tool
   functionality for Enrichment Map requires the given task to define its
   variables as tunables. Tunables are user supplied information needed by
   the task. User interfaces can be automatically generated for such tasks
   based on the set of tunable definitions. When implementing the
   Enrichment Map tunable task it was our intention to replace our current
   user interface with the one automatically generated by the task. Given
   the varied data required from the user as well as the interactive
   nature of our current user interface the generated tunable interface
   although functional lacked features that our users are accustomed to.
   For instance, to specify the analysis type or similarity cutoff our
   interface has two sets of radio buttons where all the options are
   visible and only one is selectable. In the tunable interface the same
   choice can only be represented as a single selection list, a drop down
   list the user can choose one option from. Both representations are
   functional but we preferred the radio button implementation therefore,
   we decided to keep our original interface and add the tunable task
   solely for the command tool functionality.

Results

   To illustrate the functionality of Enrichment Map we analyzed and
   visualized an expression dataset from the Gene Expression Omnibus (GEO)
   ^[52]12 for mouse fibroblast cells. The experiment was designed to
   compare gene expression in fibroblast cells in the heart to those in
   the tail to highlight genes that are uniquely expressed in heart
   fibroblasts ^[53]13 ([54]GSE50531). Raw expression data was scored
   using the GEO2R tool available on the GEO website. These expression
   data were input to GSEA along with a recent compilation of mouse
   pathway gene sets (May 14, 2014;
   [55]http://download.baderlab.org/EM_Genesets/May_14_2014/) to calculate
   enrichments. GSEA output files were given to the app with the cutoffs
   p-value < 0.005, q-value < 0.05 and overlap similarity coefficient >
   0.3. The Enrichment Map generated had roughly the same number of
   enriched gene sets specific to heart as to tail with cardiac specific
   sets associated only with the heart phenotype ( [56]Figure 3, red
   nodes).

Figure 3. Enrichment Map of heart fibroblast versus tail fibroblast
expression.

   Figure 3.
   [57]Open in a new tab

   Using the search field you can enter any text to search all attributes
   of the given network. Highlighted nodes, (shown as yellow nodes with
   red edges just left of center) are genesets that contain the gene
   TBX20.

   One of the main genes mentioned in the paper associated with this
   dataset was TBX20 as a specific cardiogenic fibroblast gene found to be
   important for both normal cardiac development and postinfarct repair
   ^[58]13. In Enrichment Map it is easy to find all gene sets that
   contain it by entering the term TBX20 into the search box ( [59]Figure
   3) (this will also highlight any gene sets that have TBX20 in the name
   or any other attribute). Built-in search functionality in Cytoscape 3
   has improved from Cytoscape 2. All attributes associated with a given
   network are indexed so there is no longer the need to specify which
   attribute you would like to search through. Selection of individual or
   sets of nodes and edges creates a view of the genes contained within
   the selection as a heat map ( [60]Figure 4).

Figure 4. Node Heat Map Panel (contained in the Cytoscape table panel)
displayed on selection of “Pericardium development (GO:0060039)” gene set.

   Figure 4.
   [61]Open in a new tab

   If GSEA results are loaded into Enrichment Map, GSEA leading edge
   genes, defined as the set of genes that contribute most to the
   enrichment, are highlighted in yellow.

   Often one of the main challenges after creating an Enrichment Map is
   going from a network in Cytoscape to publication quality figures. We
   format the labels so they are more readable and don’t extend across the
   whole screen, but as a result modules often contain overlapping labels
   that are difficult to read and require hours of manual formatting to
   create networks that can be used for figures. Using the Cytoscape 3
   built-in scaling feature (Layout>Scale), the visualization of clusters
   and networks can be improved.

Conclusions

   The Enrichment Map app allows users to translate large sets of
   enrichment results to a network where highly similar terms cluster
   together to better highlight overall trends and themes of the
   underlying data. The details behind the enrichment can be further
   investigated within the Enrichment Map app using the built-in
   expression viewer to see all the entities associated with a selected
   pathway.

Software availability

   Software available from:
   [62]http://apps.cytoscape.org/apps/enrichmentmap

   Latest source code: [63]https://github.com/BaderLab/EnrichmentMapApp

   Source code as at the time of publication:
   [64]https://github.com/F1000Research/EnrichmentMapApp/releases/tag/V1.0

   Archived source code as at the time of publication:
   [65]http://dx.doi.org/10.5281/zenodo.10542 ^[66]14

   License: Lesser GNU Public License 2.1:
   [67]https://www.gnu.org/licenses/old-licenses/lgpl-2.1.html

   Tutorials
   [68]http://baderlab.org/Software/EnrichmentMap#Tutorials_and_Examples

Funding Statement

   This work was supported by a NRNB grant (U.S. National Institutes of
   Health, National Center for Research Resources grant number P41
   GM103504) to Gary D. Bader.

   The funders had no role in study design, data collection and analysis,
   decision to publish, or preparation of the manuscript.

   v1; ref status: indexed

References