Abstract

Background

   Interpreting large-scale studies from microarrays or next-generation
   sequencing for further experimental testing remains one of the major
   challenges in quantitative biology. Combining expression with physical
   or genetic interaction data has already been successfully applied to
   enhance knowledge from all types of high-throughput studies. Yet,
   toolboxes for navigating and understanding even small gene or protein
   networks are poorly developed.

Results

   We introduce two Cytoscape plug-ins, which support the generation and
   interpretation of experiment-based interaction networks. The virtual
   pathway explorer viPEr creates so-called focus networks by joining a
   list of experimentally determined genes with the interactome of a
   specific organism. viPEr calculates all paths between two or more
   user-selected nodes, or explores the neighborhood of a single selected
   node. Numerical values from expression studies assigned to the nodes
   serve to score identified paths. The pathway enrichment analysis tool
   PEANuT annotates networks with pathway information from various sources
   and calculates enriched pathways between a focus and a background
   network. Using time-series expression data from atorvastatin-treated
   primary hepatocytes from six patients, we demonstrate the handling and
   applicability of viPEr and PEANuT. Based on our investigations using
   viPEr and PEANuT, we suggest a role of the FoxA1/A2/A3 transcriptional
   network in the cellular response to atorvastatin treatment. Moreover,
   we find an enrichment of metabolic and cancer pathways in the Fox
   transcriptional network and demonstrate a patient-specific reaction to
   the drug.

Conclusions

   The Cytoscape plug-in viPEr integrates -omics data with interactome
   data. It supports the interpretation and navigation of large-scale
   datasets by creating focus networks, facilitating mechanistic
   predictions from -omics studies. PEANuT provides an up-front method to
   identify underlying biological principles by calculating enriched
   pathways in focus networks.

Electronic supplementary material

   The online version of this article (doi:10.1186/s12864-015-2017-z)
   contains supplementary material, which is available to authorized
   users.

   Keywords: Focus network, Disease state, Shortest path algorithm, Node
   neighborhood, Pathway enrichment

Background

   The integration and biological interpretation of large-scale datasets
   is currently one of the main challenges in bioinformatics research. How
   can we extract meaningful information from a list of differentially
   regulated genes? One way to understand how (co-)regulated
   genes relate to each other is to view them in the context of their
   physical, genetic or regulatory interactions: network-based analysis
   using data from protein-protein or regulatory interactions can open new
   perspectives for further experimental studies.

   Quantitative values from a functional screen or a list of mutated genes
   identified in a cancer genomics study can be used to generate
   sub-networks from a large biological interaction network. These sub-
   or focus networks can be termed ‘disease’ or ‘state’ networks, as they
   describe the modules in the cell or the organism that are affected by
   a certain experimental condition or by a particular disease. This
   approach has, for instance, been employed by tools such as the database
   and web tool String [1] or the command-line tool Netbox [2].

   Focus networks can also be created based on a specific biological
   question: how are two specific proteins - or two groups of proteins -
   connected with each other? This approach allows an even more
   biologically focused view on the changes in the cellular network under
   different conditions.

   Moreover, focus networks allow us to understand the cross-talk between
   two molecules or pathways, which in this context is defined as all
   paths between two proteins or two groups of proteins.

   Typically, some form of shortest-path algorithm, such as Dijkstra’s
   algorithm [3], is used to create sub-networks between two or more
   nodes. The numeric values from functional genomics studies are used to
   score paths between two nodes. Methods like Pathfinder [4] or the
   Reactome browser [5] have implemented this functionality of
   connecting two molecules with each other within a biological network.
   Both tools also use numeric values to visualize regulatory changes that
   take place during state changes of the cell or organism under study.

   Focus networks can be further enriched using Gene Ontology (GO) terms
   [6] or pathways from different sources to provide additional
   functional information for data interpretation. GO biological processes
   can also be used to explore cross-connections between two or more
   pathways and find missing pathway components. This provides a more
   integrative view of a biological network.

   Statins are currently widely used to lower cholesterol levels in the
   treatment of hypercholesterolemia. Statins, which act as HMG-CoA
   (3-hydroxy-3-methylglutaryl–coenzyme A) reductase (HMGCR) inhibitors,
   prevent the production of cholesterol by inhibiting the biosynthesis of
   isoprenoids and sterols in the mevalonate pathway [7]. However, statins
   are known to have a variety of side effects, including adverse muscle
   effects, liver damage, cognitive impairment, cancer progression and
   diabetes mellitus [8–11]. Functional genomics studies of statin-treated
   cell systems indicate extensive changes in expression levels upon drug
   treatment (see, for instance, [12–20]). A detailed analysis of these
   transcriptional changes should therefore lead to a better
   understanding of the functions and pleiotropic effects of statins.

   In this study, we re-analyzed the time-course expression data from
   atorvastatin-treated primary human hepatocytes from six different
   patients published in a previous study [20]. We focused our
   analysis on determining the regulation of genes downstream of statin
   drug targets as defined in STITCH [21]. We were especially
   interested in addressing the following issues: 1) How do statin targets
   and differentially regulated genes relate to each other? 2) Which
   pathways are affected upon statin treatment? 3) How do the dynamics
   of the neighborhood of specific proteins change after statin treatment?

   In order to answer those questions, we have developed two Cytoscape
   plug-ins that work together: viPEr, the virtual Pathway Explorer,
   creates focus interaction networks by connecting two or more nodes with
   each other. It applies user-provided expression data to score paths
   between two nodes and thus limits the network to functionally relevant
   paths. The Cytoscape plug-in PEANuT (Pathway Enrichment ANalysis Tool)
   upgrades interaction networks with pathway information and identifies
   enriched pathways in focus networks.

   We have applied our toolbox to re-analyze the expression data from
   atorvastatin-treated, primary human hepatocytes and found that the
   transcription factors FOXA1, 2 and 3 are important regulatory players
   in atorvastatin response.

Implementation

viPEr

   viPEr was written in Java as a Cytoscape plug-in. The basis of all
   functions is a recursive method that iterates through the members
   (nodes) of all paths emanating from a selected node. The step depth is
   influenced by two parameters: 1) the maximum number of steps allowed
   (set by the user), and 2) the numerical values of the nodes. We used
   the log2 fold changes in expression of atorvastatin-treated primary
   hepatocytes described in [20] as numerical values.

   viPEr is available at:
   http://sourceforge.net/projects/viperplugin/

   viPEr has three main search options:
    1. ‘A to B’: ‘A to B’ connects two selected nodes with each other. We
       refer to the paths between nodes A and B as cross-talk.
       Mathematically, we define cross-talk as all paths between two nodes
       (x1, x2), where a single node in a path can only be passed once.
       The result is a focus network, which is determined by the maximum
       number of steps allowed between the start and the target node. The
       search is stopped when the target node is reached or the maximum
       number of steps is exceeded. A path is stored, scored and displayed
       in the results tab only if the target has been found. The focus
       network is created from all nodes that occur in any of the
       stored paths. The connecting edges are taken from the original
       network. Therefore, all known interactions between the subset of
       nodes are included in the newly created focus network.
       Scoring of paths is done using the following equation:

       \[
       \mathit{Score} = \frac{\#\ \text{of differentially regulated nodes} \in \text{path}}{(\text{path length})^{2}}
       \]
       The p-values for discovered paths in focus networks are calculated
       based on the cumulative probability of the hypergeometric
       distribution to find k or more differentially expressed genes in a
       path of length n.
    2. ‘connecting in batch’: similarly to the ‘A to B’ search, two groups
       of nodes can be connected using the ‘connecting in batch’ function.
       For every node in the start list A, the recursive search is
       computed towards every node in the target list B. A results tab
       with scored paths is not created in this case.
    3. ‘environment search’: The third option is to explore the regulated
       proximity of a single node using the ‘environment search’. Just one
       starting node is selected in this case. Mathematically, we define
       the environment search as follows: a network is calculated from all
       outgoing paths of length l from x1, where every node is allowed to
       be passed only once per path and all paths with at least two
       consecutive node scores below threshold t have been removed. The
       iteration through emanating paths is carried out until the allowed
       maximum search depth is reached. When exploring the neighborhood of
       a single node, the numerical data are used to select paths
       radiating from the selected node. Paths in which at least two
       consecutive nodes are not differentially expressed are removed
       from the resulting neighbor focus network. Thus, only paths that
       contain differentially regulated nodes are considered for the
       environment search, though single unregulated linker nodes are
       allowed. The resulting network is referred to as a neighbor focus
       network.
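
   The ‘A to B’ search and its scoring can be sketched as follows. This is
   a minimal Python illustration, not the Java code of the plug-in. The
   function names and the toy network are hypothetical, and two points are
   assumptions: a node counts as differentially regulated when the
   absolute value of its expression change reaches a threshold, and the
   path length is counted in steps (edges).

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # undirected network as adjacency sets


def find_paths(graph: Graph, start: str, target: str, max_steps: int) -> List[List[str]]:
    """Collect all simple paths from start to target with at most max_steps edges."""
    if start == target:
        raise ValueError("start and target must differ")
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == target:
            paths.append(list(path))  # store the path, stop extending it
            return
        if len(path) - 1 >= max_steps:
            return  # maximum number of steps reached without finding the target
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in path:  # a node may be passed only once per path
                path.append(neighbor)
                walk(neighbor, path)
                path.pop()

    walk(start, [start])
    return paths


def score_path(path: List[str], values: Dict[str, float], threshold: float) -> float:
    """Score = (# of regulated nodes in path) / (path length)^2."""
    regulated = sum(1 for n in path if abs(values.get(n, 0.0)) >= threshold)
    length = len(path) - 1  # assumption: length is the number of edges
    return regulated / length ** 2


def focus_network(graph: Graph, paths: List[List[str]]) -> Graph:
    """Keep every node seen in any stored path, plus all original edges among them."""
    nodes = {n for p in paths for n in p}
    return {n: graph[n] & nodes for n in nodes}


# Hypothetical toy network with log2 fold changes as node values.
toy_graph: Graph = {
    "A": {"B", "C"}, "B": {"A", "D", "E"}, "C": {"A", "D"},
    "D": {"B", "C"}, "E": {"B"},
}
toy_values = {"A": 1.5, "B": 0.1, "C": -2.0, "D": 1.2, "E": 3.0}

toy_paths = find_paths(toy_graph, "A", "D", max_steps=2)
toy_scores = [score_path(p, toy_values, threshold=1.0) for p in toy_paths]
toy_focus = focus_network(toy_graph, toy_paths)
```

   In this toy network the path through the regulated node C scores
   higher than the path through the unregulated node B, and node E is
   left out of the focus network because it lies on no stored path.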

Using viPEr

   Starting from any existing network supplemented with expression data,
   the user has to select the attribute field containing the expression
   information. A slider is automatically set to the respective range of
   expression values. After adjusting the slider to the desired expression
   range, different options are available in the workflow (see
   Fig. 1).
    1. ‘A to B’
       This function executes the path search algorithm between two
       selected nodes. The result is a focus network of all identified
       paths of a certain length between two nodes. The user selects the
       length (step-size) of the calculated paths. All interconnecting
       edges are added to the focus network. A result list, which includes
       every discovered path between the nodes, is located on the right
       side of the screen. This list shows all paths, their respective
       members and the assigned score as described above. The score can be
       used to further reduce the focus network or simply to visualize
       specific paths.
    2. ‘connecting in batch’
       Two groups of nodes can be connected in the ‘connecting in batch’
       function of viPEr. The same algorithm is used as in the ‘A to B’
       search, except that all paths between all members of a start list
       and a target list are computed. This algorithm can be applied to
       detect cross talk between two pathways, two protein complexes or
       two hit lists from different experiments. Three buttons have to be
       used for the ‘connecting in batch’ search: 1) a start protein list
       has to be defined by selecting all starting nodes and pressing the
       ‘select start protein list’ button; 2) the target protein list has to be
       selected accordingly and confirmed by pressing the ‘select target
       protein list’ button; 3) the button ‘start connection in batch’
       executes the search.
    3. ‘environment search’
       In case only a single protein of interest exists, the algorithm can
       be used to observe the dynamics of expression in the environment of
       this protein using the ‘environment search’. A single node is
       selected and the search is executed with the button ‘environment
       search’. All regulated nodes within a certain step size of the
       selected protein give rise to the neighbor focus network.
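
   The pruning rule of the ‘environment search’ can be illustrated with a
   short Python sketch. It is not the plug-in's implementation: the
   function name and the toy network are hypothetical, the start node is
   assumed not to count as an unregulated node, and an unregulated linker
   is assumed to be kept only when it leads to a regulated node.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # undirected network as adjacency sets


def environment_search(graph: Graph, values: Dict[str, float], start: str,
                       max_depth: int, threshold: float) -> Graph:
    """Build a neighbor focus network around a single start node."""

    def is_regulated(node: str) -> bool:
        return abs(values.get(node, 0.0)) >= threshold

    kept: Set[str] = {start}

    def walk(node: str, path: List[str], previous_unregulated: bool) -> None:
        if len(path) - 1 >= max_depth:
            return  # allowed maximum search depth reached
        for neighbor in sorted(graph.get(node, ())):
            if neighbor in path:
                continue  # each node may be passed only once per path
            unregulated = not is_regulated(neighbor)
            if unregulated and previous_unregulated:
                continue  # two consecutive unregulated nodes: drop this path
            path.append(neighbor)
            if not unregulated:
                kept.update(path)  # keep the path, including a single linker
            walk(neighbor, path, unregulated)
            path.pop()

    walk(start, [start], False)
    return {n: graph[n] & kept for n in kept}


# Hypothetical toy network: L1 is a single linker, U1 and U2 are two
# consecutive unregulated nodes that block the way to R2.
toy_graph: Graph = {
    "S": {"L1", "U1", "R3"}, "L1": {"S", "R1"}, "R1": {"L1"},
    "U1": {"S", "U2"}, "U2": {"U1", "R2"}, "R2": {"U2"}, "R3": {"S"},
}
toy_values = {"R1": 2.0, "R2": 2.0, "R3": -1.5}

neighborhood = environment_search(toy_graph, toy_values, "S",
                                  max_depth=3, threshold=1.0)
```

   Here R1 is reached through the single linker L1, whereas R2 is not
   reached because the path to it contains two consecutive unregulated
   nodes.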

Fig. 1.

   Workflow for creating focus networks. Workflow of viPEr in creating
   focus networks between two nodes/two groups of nodes, or in exploring
   the neighborhood of a single node of interest. The user must select two
   nodes or two groups of nodes for creating a focus network. A single node is
   selected when exploring the neighborhood. Numerical data (for instance
   from an expression screen) must be added to the network for scoring
   paths of a focus network and for creating a neighbor focus network from
   a single node. In both cases, the user selects the search depth. After
   creating the focus network, the network can for instance be explored by
   using and visualizing GO-terms. PEANuT is used to find and visualize
   enriched pathways.

PEANuT

   PEANuT (Pathway Enrichment ANalysis Tool) is a Cytoscape plug-in
   designed to annotate protein interaction networks with biological
   pathway information and to identify enriched pathways in focus
   networks. The interactome of the organism denotes the background
   network. In addition to visualizing enriched pathways in the focus
   networks, PEANuT can export the results as a tab-delimited file.

   PEANuT was implemented in Java and is available at:
   http://sourceforge.net/projects/peanut-cyto

   The user can choose between three databases to annotate the network:
   ConsensusPathDB (http://consensuspathdb.org/, [22]), Pathway Commons
   (http://www.pathwaycommons.org/, [23]) and WikiPathways
   (http://www.wikipathways.org/, [24]).
   While ConsensusPathDB requires Entrez gene IDs as input, Pathway
   Commons and WikiPathways require UniProt accession numbers. Annotation
   of nodes with these IDs can be done within Cytoscape using, for
   instance, the plug-in CyThesaurus [25].

   ConsensusPathDB and Pathway Commons contain pathway data collected from
   publicly available pathway databases (e.g., Reactome [26], KEGG
   [27]; see the respective homepages for more information).
   WikiPathways is a database based on the ‘wiki principle’ and provides
   an open platform dedicated to collaborative registering, reviewing and
   curation of biological pathways.

   While Pathway Commons and WikiPathways cover a wide variety of
   organisms, ConsensusPathDB specializes in human, mouse and yeast
   pathways. Users who annotate their network of interest with
   ConsensusPathDB data can additionally import directed interactions
   from KEGG to increase the degree of the vertices, enabling more
   complex path searches using viPEr.

   Information from Pathway Commons is accessed via its web service.
   Flat files from the ConsensusPathDB and WikiPathways webpages are
   downloaded via the Apache Commons IO library
   (http://commons.apache.org/proper/commons-io/) and Cytoscape
   internal downloader classes.

   Once downloaded, ConsensusPathDB and WikiPathways can be used offline,
   while Pathway Commons requires internet access. Network annotation with
   Pathway Commons is slower, as it depends on the load and availability
   of the host server, as well as internet connection speed.

   The probability value for pathway enrichment in the focus network
   is determined using the Apache Commons Math library
   (http://commons.apache.org/proper/commons-math/) to calculate the
   cumulative probability of a hypergeometric distribution. Multiple
   testing correction is achieved by applying either Bonferroni [28]
   or Benjamini-Hochberg [29] correction.
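
   The enrichment statistics described above can be sketched in a few
   lines of Python. PEANuT itself relies on the Apache Commons Math
   library in Java; the sketch below uses only the Python standard
   library as a stand-in. All function names and example numbers are
   hypothetical, and the focus network is assumed to be a subset of the
   background network.

```python
from math import comb
from typing import Dict, List, Set


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for a hypergeometric draw without replacement."""
    upper = min(successes, draws)
    favorable = sum(comb(successes, i) * comb(population - successes, draws - i)
                    for i in range(k, upper + 1))
    return favorable / comb(population, draws)


def pathway_enrichment(focus: Set[str], background: Set[str],
                       pathways: Dict[str, Set[str]]) -> Dict[str, float]:
    """Raw enrichment p-value per pathway, focus network vs. background network."""
    result = {}
    for name, members in pathways.items():
        in_background = len(members & background)
        in_focus = len(members & focus)
        result[name] = hypergeom_upper_tail(in_focus, len(background),
                                            in_background, len(focus))
    return result


def bonferroni(pvalues: List[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: 10 background genes, 3 of them in a pathway,
# and a focus network of 2 genes containing 1 pathway member.
example_p = hypergeom_upper_tail(1, population=10, successes=3, draws=2)
example_bh = benjamini_hochberg([0.01, 0.04, 0.03])
example_bonf = bonferroni([0.01, 0.04, 0.03])
```

   Bonferroni controls the family-wise error rate and is the stricter of
   the two corrections; Benjamini-Hochberg controls the false discovery
   rate and usually retains more pathways at the same cut-off.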

   PEANuT has three sub-menus:
    1. ‘find pathways’: the ‘find pathways’ sub-menu annotates the networks
       in Cytoscape with pathway data. Networks can be labeled using more
       than one pathway resource by re-using the sub-menu with different
       pathway selections.
    2. ‘show pathway statistics’: the ‘show pathway statistics’ sub-menu
       calculates enriched pathways in a selected focus network. The user
       has to select the focus network of interest, the background network
       and choose a p-value cut-off. Enriched pathways can be selected for
       visualization and downloaded as a tab-delimited file.
    3. ‘download/update dependencies’: this sub-menu is used to download
       pathway information for network annotation. It needs to be run
       before using PEANuT for the first time and should be run regularly to
       update pathway information.

Using PEANuT

   After installing PEANuT in Cytoscape by placing the plug-in in the
   Cytoscape plug-in folder, the tool can be accessed via the plug-in
   menu. The sub-menus are used as follows:
    1. ‘find pathways’
       This sub-menu allows the user to start the software and annotate
       the network(s) of choice with pathway data. In a simple dialog the
       user can select between three different databases: ConsensusPathDB,
       Pathway Commons or WikiPathways. The user can select different
       options for each database depending on preferences (such as import