Abstract
Background
Interpreting large-scale studies from microarrays or next-generation
sequencing for further experimental testing remains one of the major
challenges in quantitative biology. Combining expression with physical
or genetic interaction data has already been successfully applied to
enhance knowledge from all types of high-throughput studies. Yet,
toolboxes for navigating and understanding even small gene or protein
networks are poorly developed.
Results
We introduce two Cytoscape plug-ins that support the generation and
interpretation of experiment-based interaction networks. The virtual
pathway explorer viPEr creates so-called focus networks by joining a
list of experimentally determined genes with the interactome of a
specific organism. viPEr calculates all paths between two or more
user-selected nodes, or explores the neighborhood of a single selected
node. Numerical values from expression studies assigned to the nodes
serve to score identified paths. The pathway enrichment analysis tool
PEANuT annotates networks with pathway information from various sources
and identifies pathways enriched in a focus network relative to a background
network. Using time-series expression data of atorvastatin-treated
primary hepatocytes from six patients, we demonstrate the handling and
applicability of viPEr and PEANuT. Based on our investigations using
viPEr and PEANuT, we suggest a role of the FoxA1/A2/A3 transcriptional
network in the cellular response to atorvastatin treatment. Moreover,
we find an enrichment of metabolic and cancer pathways in the Fox
transcriptional network and demonstrate a patient-specific reaction to
the drug.
Conclusions
The Cytoscape plug-in viPEr integrates –omics data with interactome
data. It supports the interpretation and navigation of large-scale
datasets by creating focus networks, facilitating mechanistic
predictions from –omics studies. PEANuT provides a straightforward method to
identify underlying biological principles by calculating enriched
pathways in focus networks.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-2017-z)
contains supplementary material, which is available to authorized
users.
Keywords: Focus network, Disease state, Shortest path algorithm, Node
neighborhood, Pathway enrichment
Background
The integration and biological interpretation of large-scale datasets
is currently one of the main challenges in bioinformatics research. How
can we extract meaningful information from a list of differentially
regulated genes? One way to understand how (co-)regulated
genes relate to each other is to view them in the context of their
physical, genetic or regulatory interactions: network-based analysis
using protein-protein or regulatory interaction data can open new
perspectives for further experimental studies.
Quantitative values from a functional screen or a list of mutated genes
identified in a cancer genomics study can be used to generate
sub-networks from a large, biological interaction network. These sub-
or focus networks can be termed ‘disease’ or ‘state’ networks, as they
describe the modules in the cell or the organism that are affected by
a certain experimental condition or by a particular disease. This
approach has been employed, for instance, by the database
and web tool STRING [1] and the command-line tool NetBox
[2].
Focus networks can also be created based on a specific biological
question: how are two specific proteins - or two groups of proteins -
connected with each other? This approach allows an even more
biologically focused view on the changes in the cellular network under
different conditions.
Moreover, focus networks help us understand the cross-talk between
two molecules or pathways, which in this context is defined as all
paths between two proteins or two groups of proteins.
Typically, some form of shortest-path algorithm such as Dijkstra’s
algorithm [3] is used to create sub-networks between two or more
nodes. The numeric values from functional genomics studies are used to
score paths between two nodes. Tools such as Pathfinder [4] and the
Reactome browser [5] implement this functionality of
connecting two molecules within a biological network.
Both tools also use numeric values to visualize regulatory changes that
take place during state changes of the cell/organism under study.
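For readers unfamiliar with the technique, the following is a minimal Python sketch of Dijkstra's algorithm on a weighted, directed adjacency map. It returns one lowest-cost path between two nodes. The graph layout, the edge weights and the function name `dijkstra_path` are illustrative assumptions; this is not the implementation used by Pathfinder or the Reactome browser.

```python
import heapq
from typing import Dict, List, Optional

# Weighted, directed adjacency map: node -> {neighbour: edge cost} (assumed layout).
Graph = Dict[str, Dict[str, float]]


def dijkstra_path(graph: Graph, source: str, target: str) -> Optional[List[str]]:
    """Return one lowest-cost path from source to target, or None if unreachable."""
    dist: Dict[str, float] = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break  # the first time the target is popped, its distance is final
        if d > dist[node]:
            continue  # stale heap entry: a cheaper route was already found
        for nxt, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if target not in dist:
        return None
    # Walk the predecessor links back from the target to the source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]
```

For example, with `{"A": {"B": 1.0, "C": 5.0}, "B": {"C": 1.0}}` the cheapest route from A to C runs through B. A single-shortest-path search like this one returns only one route, whereas viPEr enumerates all paths up to a maximum length.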
Focus networks can be further enriched using Gene Ontology (GO) terms
[6] or pathways from different sources to provide additional
functional information for data interpretation. GO biological processes
can also be used to explore cross-connections between two or more
pathways and find missing pathway components. This provides a more
integrative view of a biological network.
Statins are widely used to lower
cholesterol levels in the treatment of hypercholesterolemia. Statins,
which act as HMG-CoA (3-hydroxy-3-methylglutaryl–coenzyme A) reductase
(HMGCR) inhibitors, prevent the production of cholesterol by inhibiting
the biosynthesis of isoprenoids and sterols in the mevalonate pathway
[7]. However, statins are known to have a variety of side effects,
including adverse muscle effects, liver damage, cognitive impairment,
cancer progression and diabetes mellitus [8–11]. Functional
genomics studies of statin-treated cell systems indicate extensive
changes of expression levels upon drug treatment (see for instance
[12–20]). A detailed analysis of these transcriptional
changes should therefore lead to a better understanding of the
functions and pleiotropic effects of statins.
In this study, we re-analyzed the time-course expression data from
atorvastatin-treated primary human hepatocytes from six different
patients published in a previous study [20]. We focused our
analysis on the regulation of genes downstream of statin
drug targets as defined in STITCH [21]. We were especially
interested in the following questions: 1) How do statin targets
and differentially regulated genes relate to each other? 2) Which
pathways are affected upon statin treatment? 3) How do the dynamics
of the neighborhood of specific proteins change after statin treatment?
In order to answer those questions, we have developed two Cytoscape
plug-ins that work together: viPEr, the virtual Pathway Explorer,
creates focus interaction networks by connecting two or more nodes with
each other. It applies user-provided expression data to score paths
between two nodes and thus limits the network to functionally relevant
paths. The Cytoscape plug-in PEANuT (Pathway Enrichment ANalysis Tool)
annotates interaction networks with pathway information and identifies
enriched pathways in focus networks.
We have applied our toolbox to re-analyze the expression data from
atorvastatin-treated primary human hepatocytes and found that the
transcription factors FOXA1, FOXA2 and FOXA3 are important regulatory players
in the atorvastatin response.
Implementation
viPEr
viPEr was written in Java as a Cytoscape plug-in. All of its
functions are based on a recursive method that iterates through the members
(nodes) of all paths emanating from a selected node. The step depth is
influenced by two parameters: 1) the maximum number of steps allowed
(set by the user) and 2) the numerical values of the nodes. As numerical
values, we used the log2 fold changes in expression of atorvastatin-treated
primary hepatocytes described in [20].
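The recursive search described above can be sketched in Python as a depth-first enumeration of simple paths with a step limit. This is a minimal illustration under stated assumptions, not viPEr's Java code: the network is treated as undirected and stored as an adjacency map, path length is counted in edges, and the names `build_graph`, `all_paths`, `connect_in_batch` and `focus_network` are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Undirected network as an adjacency map: node -> set of neighbours (assumed layout).
Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def all_paths(graph: Graph, start: str, target: str, max_steps: int) -> List[List[str]]:
    """All simple paths from start to target that use at most max_steps edges."""
    found: List[List[str]] = []

    def visit(node: str, path: List[str]) -> None:
        if node == target:
            found.append(list(path))  # target reached: store the path, stop extending
            return
        if len(path) - 1 >= max_steps:  # number of edges already used
            return
        for nxt in sorted(graph.get(node, ())):
            if nxt not in path:  # each node may be visited only once per path
                path.append(nxt)
                visit(nxt, path)
                path.pop()

    visit(start, [start])
    return found


def connect_in_batch(graph: Graph, starts: List[str], targets: List[str],
                     max_steps: int) -> List[List[str]]:
    """Run the same search from every start node towards every target node."""
    return [p for s in starts for t in targets for p in all_paths(graph, s, t, max_steps)]


def focus_network(graph: Graph, paths: List[List[str]]) -> Graph:
    """Keep every node on any stored path plus all original edges among those nodes."""
    nodes = {n for p in paths for n in p}
    return {n: graph.get(n, set()) & nodes for n in nodes}
```

For the edges A–B, B–C, A–D, D–C, B–D and C–E, `all_paths(g, "A", "C", 2)` yields `[["A", "B", "C"], ["A", "D", "C"]]`. The resulting focus network also contains the B–D edge, because every original edge among the retained nodes is copied over. Node E is excluded because the search stops at the target.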
viPEr is available at:
http://sourceforge.net/projects/viperplugin/
viPEr has three main search options:
1. ‘A to B’: ‘A to B’ connects two selected nodes with each other. We
   refer to the paths between nodes A and B as cross-talk.
   Mathematically, we define cross-talk as all paths between two nodes
   (x1, x2) in which each node may be visited at most once per path.
   The result is a focus network, which is determined by the maximum
   number of steps allowed between the start and the target node. The
   search along a path stops when the target node is reached or the
   maximum number of steps is exceeded. A path is stored, scored and
   displayed in the results tab only if the target has been found. The
   focus network is created from all nodes that occur in at least one
   stored path. The connecting edges are taken from the original
   network, so all known interactions among this subset of
   nodes are included in the newly created focus network.
   Paths are scored using the following equation:
   $$\mathrm{Score} = \frac{\#\,\text{differentially regulated nodes} \in \text{path}}{(\text{path length})^{2}}$$
   The p-value of a discovered path is calculated from the cumulative
   hypergeometric distribution, i.e. as the probability of finding k or
   more differentially expressed genes in a path of length n.
2. ‘connecting in batch’: similar to the ‘A to B’ search, two groups
   of nodes can be connected using the ‘connecting in batch’ function.
   For every node in the start list A, the recursive search is
   computed towards every node in the target list B. In this case, no
   results tab with scored paths is created.
3. ‘environment search’: The third option is to explore the regulated
   neighborhood of a single node using the ‘environment search’. Only
   one starting node is selected in this case. Mathematically, we define
   the environment search as follows: a network is calculated from all
   outgoing paths of length l from x1, where every node may be passed
   only once per path and all paths with at least two consecutive node
   scores below threshold t have been removed. The iteration through
   emanating paths continues until the allowed maximum search depth is
   reached. In other words, when exploring the neighborhood of a single
   node, the numerical data are used to select paths radiating from the
   selected node: paths in which at least two consecutive nodes are not
   differentially expressed are removed from the resulting network.
   Thus, only paths that contain differentially regulated nodes are
   considered for the environment search, although single unregulated
   linker nodes are allowed. The resulting network is referred to as a
   neighbor focus network.
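The path score and the hypergeometric p-value described above can be illustrated with a short Python sketch. Two details are assumptions that the text does not fix. First, path length is taken as the number of nodes on the path. Second, a node counts as differentially regulated if it belongs to a user-supplied set. The names `path_score` and `path_p_value` are hypothetical.

```python
from math import comb
from typing import List, Set


def path_score(path: List[str], regulated: Set[str]) -> float:
    """(# differentially regulated nodes on the path) / (path length)^2."""
    k = sum(1 for node in path if node in regulated)
    return k / len(path) ** 2


def path_p_value(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: n nodes drawn from N, of which K are regulated."""
    upper = min(n, K)
    # math.comb returns 0 when its second argument exceeds the first,
    # so impossible outcomes contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

For a 4-node path in a 10-node network containing 3 regulated nodes, the probability of seeing 2 or more regulated nodes on the path is (3·21 + 1·7)/210 = 1/3. A 3-node path with 2 regulated nodes scores 2/9.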
Using viPEr
Starting from any existing network supplemented with expression data,
the user has to select the attribute field containing the expression
information. A slider is automatically set to the respective range of
expression values. After adjusting the slider to the desired expression
range, different options are available in the workflow (see
Fig. 1).
1. ‘A to B’
This function executes the path search algorithm between two
selected nodes. The result is a focus network of all identified
paths of a certain length between two nodes. The user selects the
length (step-size) of the calculated paths. All interconnecting
edges are added to the focus network. A result list, which includes
every discovered path between the nodes, is located on the right
side of the screen. This list shows all paths, their respective
members and the assigned score as described above. The score can be
used to further reduce the focus network or simply to visualize
specific paths.
2. ‘connecting in batch’
Two groups of nodes can be connected in the ‘connecting in batch’
function of viPEr. The same algorithm is used as in the ‘A to B’
search, except that all paths between all members of a start list
   and a target list are computed. This algorithm can be applied to
   detect cross-talk between two pathways, two protein complexes or
   two hit lists from different experiments. The ‘connecting in batch’
   search uses three buttons: 1) the start protein list is defined by
   selecting all starting nodes and pressing the ‘select start protein
   list’ button; 2) the target protein list is selected in the same way
   and confirmed by pressing the ‘select target protein list’ button;
   3) the ‘start connection in batch’ button executes the search.
3. ‘environment search’
   If there is only a single protein of interest, the ‘environment
   search’ can be used to observe the dynamics of expression in the
   neighborhood of this protein. A single node is selected and the
   search is executed with the ‘environment search’ button. All
   regulated nodes within a certain step size of the selected protein
   give rise to the neighbor focus network.
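The pruning rule of the environment search can be sketched in Python as follows. The sketch rests on three assumptions that are not taken from viPEr itself. A node counts as regulated when the absolute value of its score (e.g. a log2 fold change) is at least the threshold t. Nodes without a score count as unregulated. Unregulated nodes are kept only as linkers on a path that reaches a regulated node further out. `environment_search` is a hypothetical name.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # undirected adjacency map (assumed layout)


def environment_search(graph: Graph, scores: Dict[str, float], start: str,
                       depth: int, t: float) -> Graph:
    """Neighbor focus network around `start` under the pruning rule described above."""

    def regulated(node: str) -> bool:
        return abs(scores.get(node, 0.0)) >= t  # missing score = unregulated (assumption)

    keep: Set[str] = {start}

    def visit(node: str, path: List[str]) -> None:
        if regulated(node):
            keep.update(path)  # a path ending in a regulated node contributes all its nodes
        if len(path) - 1 >= depth:  # maximum search depth (in edges) reached
            return
        for nxt in sorted(graph.get(node, ())):
            if nxt in path:
                continue  # every node only once per path
            if not regulated(node) and not regulated(nxt):
                continue  # two consecutive unregulated nodes: prune this branch
            path.append(nxt)
            visit(nxt, path)
            path.pop()

    visit(start, [start])
    # Re-attach all original edges among the retained nodes.
    return {n: graph.get(n, set()) & keep for n in keep}
```

With this rule, an unregulated node L between the start node and a regulated node C survives as a linker. A branch that runs into two unregulated nodes in a row is cut off at that point.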
Fig. 1
Workflow for creating focus networks. Workflow of viPEr for creating
focus networks between two nodes or two groups of nodes, or for exploring
the neighborhood of a single node of interest. The user must select two
nodes or two groups of nodes to create a focus network. A single node is
selected when exploring the neighborhood. Numerical data (for instance
from an expression screen) must be added to the network for scoring
paths of a focus network and for creating a neighbor focus network from
a single node. In both cases, the user selects the search depth. After
the focus network has been created, it can be explored further, for
instance by visualizing GO terms. PEANuT is used to find and visualize
enriched pathways.
PEANuT
PEANuT (Pathway Enrichment ANalysis Tool) is a Cytoscape plug-in
designed to annotate protein interaction networks with biological
pathway information and to identify enriched pathways in focus
networks. The interactome of the organism serves as the background
network. In addition to visualizing enriched pathways in the focus
network, PEANuT can export the results as a tab-delimited file.
PEANuT was implemented in Java and is available at:
http://sourceforge.net/projects/peanut-cyto
The user can choose among three databases to annotate the network:
ConsensusPathDB (http://consensuspathdb.org/ [22]), Pathway Commons
(http://www.pathwaycommons.org/ [23]) and WikiPathways
(http://www.wikipathways.org/ [24]). While ConsensusPathDB requires
Entrez gene IDs as input, Pathway Commons and WikiPathways require
UniProt accession numbers. Nodes can be annotated with these IDs within
Cytoscape, for instance using the plug-in CyThesaurus [25].
ConsensusPathDB and Pathway Commons contain pathway data collected from
publicly available pathway databases (e.g., Reactome [26], KEGG
[27]; see the respective homepages for more information).
WikiPathways is a database based on the ‘wiki principle’ and provides
an open platform dedicated to collaborative registering, reviewing and
curation of biological pathways.
While Pathway Commons and WikiPathways cover a wide variety of
organisms, ConsensusPathDB specializes in human, mouse and yeast
pathways. When annotating a network with ConsensusPathDB data, the
user can additionally import directed interactions from KEGG to
increase node degrees, enabling more complex path searches using viPEr.
Information from Pathway Commons is accessed via its web service.
Flat files from the ConsensusPathDB and WikiPathways webpages are
downloaded via the Apache Commons IO library
(http://commons.apache.org/proper/commons-io/) and Cytoscape's
internal downloader classes.
Once downloaded, ConsensusPathDB and WikiPathways can be used offline,
while Pathway Commons requires internet access. Network annotation with
Pathway Commons is slower, as it depends on the load and availability
of the host server, as well as internet connection speed.
The p-value for pathway enrichment in the focus network
is determined using the Apache Commons Math library
(http://commons.apache.org/proper/commons-math/) to calculate the
cumulative probability of a hypergeometric distribution. Multiple
testing is corrected for using either the Bonferroni [28]
or the Benjamini-Hochberg [29] procedure.
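The following is a minimal Python sketch of this enrichment test with both correction methods. It uses only the standard library rather than Apache Commons Math. It assumes that pathways are given as sets of node identifiers and that the focus network's nodes are a subset of the background network's nodes. The names `hypergeom_sf`, `bonferroni`, `benjamini_hochberg` and `enrichment` are hypothetical.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_sf(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): n focus nodes drawn from N background nodes, K of them in the pathway."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / comb(N, n)


def bonferroni(pvals: List[float]) -> List[float]:
    return [min(1.0, p * len(pvals)) for p in pvals]


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Step-up Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrichment(focus: Set[str], background: Set[str], pathways: Dict[str, Set[str]],
               method: str = "bh") -> Dict[str, Tuple[float, float]]:
    """Map each pathway name to (raw p-value, corrected p-value)."""
    focus = focus & background  # assumption: focus nodes belong to the background
    N, n = len(background), len(focus)
    names = sorted(pathways)
    raw = []
    for name in names:
        members = pathways[name] & background  # K: pathway nodes in the background
        raw.append(hypergeom_sf(len(members & focus), n, len(members), N))
    adjusted = benjamini_hochberg(raw) if method == "bh" else bonferroni(raw)
    return {name: (p, q) for name, p, q in zip(names, raw, adjusted)}
```

Consider a 10-node background, a pathway with 3 background members and a 4-node focus network that contains 2 of them. The raw p-value is 1/3. With one additional, non-enriched pathway, Benjamini-Hochberg doubles that p-value to 2/3, since it has rank 1 among m = 2 tests.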
PEANuT has three sub-menus:
1. ‘find pathways’: the find pathways sub-menu annotates the networks
in Cytoscape with pathway data. Networks can be labeled using more
than one pathway resource by re-using the sub-menu with different
pathway selections.
2. ‘show pathway statistics’: the ‘show pathway statistics’ sub-menu
calculates enriched pathways in a selected focus network. The user
has to select the focus network of interest, the background network
and choose a p-value cut-off. Enriched pathways can be selected for
visualization and downloaded as a tab-delimited file.
3. ‘download/update dependencies’: this sub-menu is used to download
   pathway information for network annotation. It must be run
   before using PEANuT for the first time and should be run regularly
   to update pathway information.
Using PEANuT
After installing PEANuT in Cytoscape by placing the plug-in in the
Cytoscape plug-in folder, the tool can be accessed via the plug-in
menu. The sub-menus are used as follows:
1. ‘find pathways’
This sub-menu allows the user to start the software and annotate
the network(s) of choice with pathway data. In a simple dialog the
user can select between three different databases: ConsensusPathDB,
Pathway Commons or WikiPathways. The user can select different
options for each database depending on preferences (such as import