Abstract

Background

   The interplay between metabolic processes and signalling pathways
   remains poorly understood. Global, detailed and comprehensive
   reconstructions of human metabolism and signalling pathways exist in
   the form of molecular maps, but they have never been integrated
   together. We aim to fill this gap by integrating signalling and
   metabolic pathways, allowing visual exploration of multi-level omics
   data and the study of cross-regulatory circuits between these
   processes in health and in disease.

Results

   We combined two comprehensive, manually curated network maps: the
   Atlas of Cancer Signalling Network (ACSN), containing mechanisms
   frequently implicated in cancer, and ReconMap 2.0, a comprehensive
   reconstruction of the human metabolic network. We linked the ACSN and
   ReconMap 2.0 maps via
   common players and represented the two maps as interconnected layers
   using the NaviCell platform for maps exploration
   ([35]https://navicell.curie.fr/pages/maps_ReconMap%202.html).
   Proteins catalysing metabolic reactions in ReconMap 2.0 were not
   previously represented visually on the map canvas, which precluded
   visualisation of omics data in the context of ReconMap 2.0. We
   propose a solution that displays protein nodes on the ReconMap 2.0
   map in the vicinity of the corresponding reaction or process nodes.
   This permits multi-omics data visualisation in the context of both map
   layers. Exploration of, and shuttling between, the two map layers is
   possible using the Google Maps-like features of NaviCell. The
   integrated ACSN-ReconMap 2.0 networks are accessible online and allow
   data visualisation through various modes such as markers, heat maps,
   bar plots, glyphs and map staining. The integrated networks were
   applied to compare immunoreactive and proliferative ovarian cancer
   subtypes using transcriptomic, copy number and mutation multi-omics
   data. Several metabolic and signalling processes specifically
   deregulated in each of the ovarian cancer subtypes were identified.

Conclusions

   As knowledge evolves and new omics data become more heterogeneous,
   gathering existing domains of biology under common platforms is
   essential. We believe that the integrated ACSN-ReconMap 2.0 networks
   will help in understanding various disease mechanisms and in
   discovering new interactions at the intersection of cell signalling
   and metabolism. In addition, the successful integration of metabolic
   and signalling networks allows broader application of systems biology
   approaches to data interpretation and to the retrieval of
   intervention points that simultaneously target the key players
   coordinating signalling and metabolism in human diseases.

Electronic supplementary material

   The online version of this article (10.1186/s12859-019-2682-z) contains
   supplementary material, which is available to authorized users.

   Keywords: Signalling, Metabolism, Networks, Comprehensive map, Systems
   biology, Cancer, Multi-level omics data, Data visualisation

Background

   There is still a gap in understanding the coordination between
   metabolic functions and signalling pathways in mammalian cells.
   Metabolic processes and cell signalling pathways contain a large number
   of molecular species together with their complex relationships. No
   single mind can accurately account for all these molecular interactions
   while drawing conclusions from descriptive reasoning alone. To tackle
   the complexity of these multi-molecular interaction networks, a
   systems biology approach is needed. In addition, large amounts of
   omics data (transcriptome, proteome, metabolome, etc.) have
   accumulated for many human diseases, including age-related disorders
   such as neurodegeneration or cancer. Modelling and interpreting these
   data with metabolic and signalling networks combined can help to
   decipher the mechanisms responsible for deregulation in human
   disorders by considering a broader range of molecular process types.

   Much of the produced high-throughput molecular data in many medical and
   biological applications remain under-explored due to the lack of
   insightful methods for data representation in the context of formally
   represented biological knowledge. Carefully designed maps of complex
   molecular mechanisms such as the whole-cell reconstructions of human
   metabolism in ReconMap 2.0 [[36]1, [37]2] or the global reconstruction
   of cell signalling of cancer in ACSN [[38]3] potentially provide ways
   to better exploit existing and new multi-omics data, by overlaying it
   on top of large molecular maps.

   ACSN is a resource and a web-based environment that contains a
   collection of interconnected signalling network maps
   ([39]https://acsn.curie.fr). Cell signalling mechanisms are depicted on
   the maps at the level of biochemical interactions, forming a large
   network of 4600 reactions covering 1821 proteins and 564 genes and
   connecting several major cellular processes [[40]3]. ACSN is composed
   of 5 interconnected maps of major biological processes implicated in
   cancer. The maps are further divided into functional modules that
   represent signalling pathways collectively responsible for the
   execution of a particular process. In total, there are 52 functional
   modules in the ACSN resource (See Table [41]1 for terms definition).
   Each of these modules can be visualised in the context of the global
   ACSN map or accessed as individual maps. The Atlas is a
   “geographic-like” interactive “world map” of molecular interactions.
   ACSN is supported by the NaviCell platform, which uses the Google
   Maps™ engine for easy navigation of the maps and their annotations.
   The navigation logic (scrolling and zooming) and features such as
   markers, pop-up bubbles and the zoom bar are adapted from Google Maps.
   Finally, NaviCell includes a powerful module for data visualisation.
   Users can map and visualise different types of “omics” data on the
   NaviCell maps [[42]4, [43]5].

Table 1.

   Term definitions used in the paper

   Graphical standards and exchange formats
    XML format: Markup language that defines a set of rules for encoding
      documents used to store and transport data by describing the
      content in terms of what data is being described.
    SBGN: Systems Biology Graphical Notation (SBGN) is a standard
      graphical syntax for representation of biological processes and
      interactions. SBGN is compatible with multiple pathway drawing and
      analytical tools, [44]http://sbgn.github.io/sbgn/
    SBML: Systems Biology Markup Language (SBML) is a representation
      format, based on XML, for communicating and storing computational
      models of biological processes. It is a free and open standard
      language with widespread software support, [45]http://sbml.org
    Standard identifier (ID): Community-accepted nomenclature for
      scientific naming of biomolecules such as genes, proteins,
      chemicals, drugs etc. The sources for standard IDs are repositories
      such as UniProt, ChEBI and HUGO, [46]http://identifiers.org
    Data and models exchange formats: Standard formats for data and
      models to facilitate intercompatibility of networks and software.
      There are two major standard network exchange formats, BioPAX for
      complex networks and SIF for simple binary interactions. The
      CellDesigner XML format is a commonly used exchange format
      compatible with multiple network analysis tools.

   Signalling and metabolic network map
    Map: Diagram of detailed molecular interactions with a meaningful
      layout reflecting a certain biological process, which is
      graphically represented in the CellDesigner tool.
    Map module (in ACSN): Part of the map representing a sequence of
      molecular interactions responsible for execution of a particular
      function.
    Metabolic pathway: Set of reactions forming a metabolic function.
    Subsystem (in ReconMap 2.0): Set of metabolic reactions associated
      with (representing) a specific metabolic pathway.
    Map node: Graphical representation of a molecule on the map.
    Map entity and alias: Unique representation of a molecule on the map.
      As each molecule can be present multiple times on a map, each
      individual representation is called an alias of the entity. This
      definition corresponds to the CellDesigner feature.

   Networks merging procedure
    Voronoi cell: Individual shape allocated to a seed by the Voronoi
      method. Each cell is a space that contains only its seed and can be
      used to generate points inside it without overlapping with nearby
      seeds.
    Voronoi tessellation: A partitioning of a plane into regions based on
      distance to points in a specific subset of the plane. That set of
      points (called seeds) is specified beforehand, and for each seed
      there is a corresponding region consisting of all points closer to
      that seed than to any other. These regions are called Voronoi
      cells. In our case, each seed is a molecule or a reaction’s central
      glyph.
    Centroid: Barycenter of a cluster.
    Merging function of BiNoM: Function that takes two or more
      CellDesigner maps and merges them into one unique map. This
      function modifies each entity’s id and alias but keeps the name,
      coordinates and notes.

   NaviCell
    Semantic zoom: A mechanism providing several map views with
      different levels of detail, achieved by gradual exclusion of
      details while zooming out. It simplifies navigation through large
      maps of molecular interactions by providing several levels of
      detail, resembling navigation through geographical maps. Exploring
      the map from a detailed toward a top-level view is achieved by
      gradual exclusion and modification (simplification and abstraction)
      of details. One of the main principles of semantic zooming is that
      every detail shown on the map at the current zoom level should be
      readable.
    Marker: Symbol indicating the location of chosen objects on the map;
      adapted from Google Maps.
    Pop-up bubble: Small window that opens when a marker is clicked.
      Contains a short description and hyperlinks related to the marked
      entity.
    Annotation post: Detailed map entity annotation created in
      CellDesigner by the map manager. The annotation is converted to an
      Annotation post and displayed in the associated blog by NaviCell.

   The manually curated genome-scale reconstruction Recon2.04 is a
   representation of human metabolism. It accounts for 1733
   enzyme-encoding genes associated with 7440 reactions, which are
   distributed over 100 subsystems corresponding to metabolic pathways.
   Furthermore, Recon2.04 accounts for 2626 unique metabolites
   distributed over eight cellular compartments [[48]2]. Subsequently, to
   visualise the resource, a comprehensive metabolic map termed
   ReconMap 2.0 was generated from Recon2.04 [[49]1]. In
   ReconMap 2.0, reactions (hyper-edges) were manually laid out using the
   biochemical network editor CellDesigner [[50]6]. ReconMap 2.0 is
   currently distributed in a Systems Biology Graphical Notation (SBGN)
   compliant format and its content is also accessible via a web interface
   ([51]https://vmh.uni.lu/#reconmap). All major human metabolic pathways
   are considered and represented as a seamless network where different
   pathways are interconnected via common molecules. There are 96
   subsystems on the ReconMap 2.0, each of them representing a specific
   metabolic pathway (See Table [52]1 for terms definition).

   By integrating these resources, it becomes possible to elucidate the
   crosstalk between metabolic and signalling networks. In addition, the
   integrated networks are provided in a common graphical language and
   in standard exchange formats, which makes them accessible to multiple
   systems biology tools. This opens an opportunity to model the
   coordination between signalling pathways and metabolism using various
   systems biology approaches. Among others, there are several methods
   for multi-level omics data analysis in the context of biological
   network maps that allow defining “hot” areas in molecular mechanisms
   and point to key regulators in physiological or pathological
   situations [[53]7–[54]9].

General workflow for integration of ACSN and ReconMap 2.0 networks

   To integrate signalling and metabolic networks, we need to find
   common players (proteins) that participate in the regulation of
   metabolic processes and are simultaneously involved in signal
   transduction pathways. The networks can then be interconnected via
   these common players. In addition, a solution is needed for
   visualising the proteins that participate in catalysis in
   ReconMap 2.0, since no such representation has existed to date.

   The rationale behind the proposed methodology is to take advantage of
   the CellDesigner SBML format for network representation and to develop
   a robust automated algorithm that efficiently finds coordinates for
   new entities, avoids overlap with existing elements and places these
   entities in the vicinity of the reactions they regulate. The
   integrated networks can be provided as interconnected layers supported
   by the NaviCell platform for navigation and data integration.

   The suggested methodology is applied to the integration of the ACSN
   and ReconMap 2.0 resources. However, it is a generic method applicable
   to the integration of different types of networks prepared in the
   CellDesigner SBML format (Fig. [55]1). In the following sections of
   the paper, we explain the challenges and describe how each step of the
   workflow was addressed.

Fig. 1.

   General workflow for integration of proteins into a metabolic network.
   (1) Extraction of information on the proteins present in metabolic
   reactions from a model and a CellDesigner file. (2) Addition of
   proteins in the vicinity of the catalysed reactions. (3) Merging of
   the obtained proteins with the metabolic map through the BiNoM plugin.
   (4) As a result, a CellDesigner network file containing proteins on
   top of the original metabolic network is obtained. This file can
   later be integrated into NaviCell through the NaviCell Factory tool.

   The workflow includes the following major steps (see Table [58]1 for
   term definitions):
     * Identification of common proteins between ACSN and ReconMap 2.0
       networks
     * Finding metabolic and molecular processes crosstalk between ACSN
       and ReconMap 2.0
     * Displaying protein nodes on the ReconMap 2.0 map
     * ACSN-ReconMap 2.0 networks integration and visualisation using
       NaviCell

Materials and methods

Step-by-step procedure for network integration

Identification of common proteins between ACSN and ReconMap 2.0 networks

   ACSN and ReconMap 2.0 maps contain information on proteins implicated
   in the regulation of reactions. First, we verified that common
   identifiers, namely standard protein names (HUGO), were used
   systematically for all proteins in both resources and corrected any
   inconsistencies. The proteins found in both ACSN and ReconMap 2.0
   were then compared, quantified and visualised. We detected 252
   proteins in common between the two networks (Additional file [59]1).
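
   The comparison described above amounts to a set intersection over
   harmonised identifiers. The following is a minimal sketch, not the
   authors' script: the function names, the toy symbol lists and the
   choice to trim and upper-case symbols before comparing are all
   assumptions made for illustration.

```python
def normalise(symbol):
    """Trim whitespace and upper-case a gene symbol before comparison.

    Assumption: case and stray spaces are the only inconsistencies;
    real curation may also need alias and withdrawn-symbol handling.
    """
    return symbol.strip().upper()


def common_proteins(acsn_symbols, recon_symbols):
    """Return the sorted symbols present in both collections."""
    acsn = {normalise(s) for s in acsn_symbols if s.strip()}
    recon = {normalise(s) for s in recon_symbols if s.strip()}
    return sorted(acsn & recon)


# Toy input for illustration only; these are not the real map contents.
acsn_example = ["AKT1", "HK2 ", "TP53", "pik3ca"]
recon_example = ["HK2", "PIK3CA", "LDHA", ""]
shared = common_proteins(acsn_example, recon_example)
print(shared)
```

   Upper-casing is a simplification; a few official symbols contain
   lower-case letters, so a production comparison would rather resolve
   every name against the HGNC table.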

Displaying protein nodes on the ReconMap 2.0 map

   ACSN and ReconMap 2.0 are both used as visual objects for exploration
   of processes as well as for data integration and visualisation in the
   context of the maps. After identification of the cross-talks between
   the two resources, it is important to ensure that all components of the
   maps are represented in a visual manner suitable for meaningful
   visualisation of omics data.

   Due to the different nature of the networks, protein nodes are
   explicitly visualised on the ACSN map. In ReconMap 2.0, however, the
   standard names (identifiers) of proteins regulating metabolic
   reactions are included in the reaction annotations but are not
   represented visually on the map canvas. This precludes visualisation
   of omics data in the context of the ReconMap 2.0 map. We developed a
   procedure for displaying protein nodes on the ReconMap 2.0 map in the
   vicinity of the corresponding reaction edges, which now permits
   multi-omics data visualisation in the context of both the ACSN and
   ReconMap 2.0 layers.

Extraction of information regarding reactions and implicated genes in the
metabolic network

     * Retrieval of the information from the Recon2.04 model
          + ReconMap 2.0 is the graphical representation of the
            literature-based genome-scale metabolic reconstruction
            Recon2.04, which is freely available at
            ([60]https://vmh.uni.lu/#downloadview). It is stored as a
            MATLAB “.mat” file that contains a direct link between
            metabolic reactions and Entrez gene IDs, specified by
            gene rules. It is therefore possible to generate a direct
            protein-reaction association based on the gene encoding
            the protein. As ACSN uses HUGO standard identifiers, the
            Entrez IDs in ReconMap 2.0 were first converted to HUGO.
          + It is important to stress that this approach is based on the
            simplifying assumption that if a protein is associated with a
            metabolic reaction in ReconMap 2.0, it may have a role in
            the catalysis of that reaction. Biological regulation is,
            however, much more sophisticated than this basic assumption.
            For example, many protein complexes collectively regulate
            the propagation of a metabolic reaction, and only some of
            their members are actual enzymes that execute the catalysis,
            whereas others are co-factors or regulatory sub-units.
            Moreover, the activation states of proteins, which are often
            regulated by post-translational modifications, are not
            taken into account in this simplified approach.
     * Retrieval of entity positions in ReconMap 2.0 from the XML
       network file
          + In the graphical representation of reactions in CellDesigner,
            each reaction contains a central glyph in the form of a
            square. This glyph is normally used to allocate the position
            of the markers (see Table [61]1 for term definitions).
            However, its location is not explicitly saved in the network
            XML file. A specific function of NaviCell Factory can
            calculate the coordinates of these glyphs and extract them to
            a separate file. These coordinates can later be used as
            reference positions to assign protein node positions on the
            ReconMap 2.0 map canvas.
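
   The protein-reaction association described above can be sketched as
   follows. This is an illustrative example under stated assumptions,
   not the authors' code: it assumes that gene rules are Boolean strings
   whose gene tokens look like "3098.1" (an Entrez ID followed by a
   numeric suffix), it ignores the and/or structure in line with the
   simplifying assumption above, and the lookup table, reaction IDs and
   function names are hypothetical.

```python
import re

# Hypothetical Entrez-to-HUGO lookup for illustration; a real table
# should be taken from HGNC rather than typed by hand.
ENTREZ_TO_HUGO = {"3098": "HK1", "3099": "HK2", "2645": "GCK"}


def genes_in_rule(rule):
    """Extract unique Entrez IDs from a gene-rule string, in order.

    Assumes tokens such as '3098.1'; the part after the dot is dropped
    and the Boolean operators are deliberately ignored.
    """
    unique = []
    for entrez in re.findall(r"\b(\d+)(?:\.\d+)?\b", rule):
        if entrez not in unique:
            unique.append(entrez)
    return unique


def proteins_per_reaction(gene_rules, entrez_to_hugo):
    """Map each reaction ID to the HUGO symbols named in its gene rule."""
    result = {}
    for reaction_id, rule in gene_rules.items():
        symbols = [entrez_to_hugo[g] for g in genes_in_rule(rule)
                   if g in entrez_to_hugo]
        if symbols:
            result[reaction_id] = symbols
    return result


# Toy rules; reaction IDs and rules are illustrative, not model content.
example_rules = {
    "HEX1": "(3098.1) or (3099.1) or (2645.1)",
    "EX_glc": "",
}
associations = proteins_per_reaction(example_rules, ENTREZ_TO_HUGO)
print(associations)
```

   Reactions without a gene rule, such as the exchange reaction in the
   toy input, simply receive no protein nodes.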

Automated calculation of proteins coordinates in vicinity of corresponding
reactions at ReconMap 2.0 network

     * Computing Voronoi cells for all elements
          + Using the Voronoi method, each element of the network
            (molecules, reaction glyphs, etc.) is associated with a
            Voronoi cell. This prevents newly added proteins from
            overlapping with entities already existing in the network
            (Fig. [62]2).
     * Creation of randomly distributed points inside each reaction’s
       Voronoi cell
          + Once each entity has a cell assigned, the cells of the
            reactions’ central glyphs are used. A fixed number of points
            is generated at random positions inside each cell. For our
            purpose, 100 points were deemed sufficient (Fig. [63]2).
     * Application of the K-means algorithm to create K clusters
          + Each reaction has a certain number of proteins implicated in
            its catalysis. Using the information from the model, the
            K-means algorithm was applied to the random points with the
            number of clusters K set to the number of protein nodes
            (Fig. [64]2).
     * Assigning protein positions using the centroid coordinates of
       each cluster
          + After the clusters are found, their centroids (see
            Table [65]1 for term definitions) are calculated and saved as
            the coordinates of the proteins tied to the specific reaction
            as catalysts (Fig. [66]2).
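
   The four steps above can be sketched in a few lines of standard
   library Python. This is a minimal illustration under assumptions, not
   the authors' implementation: instead of constructing Voronoi polygons
   explicitly, it uses the equivalent fact that a point belongs to the
   cell of its nearest seed and draws points by rejection sampling from
   a square around the reaction glyph, so only the part of the cell
   inside that square is sampled. All function names and the toy
   coordinates are hypothetical, and node sizes are ignored.

```python
import math
import random


def nearest_seed(point, seeds):
    """Index of the seed closest to point (its Voronoi cell owner)."""
    return min(range(len(seeds)), key=lambda i: math.dist(point, seeds[i]))


def sample_in_cell(index, seeds, n_points, rng):
    """Rejection-sample n_points inside the Voronoi cell of seeds[index].

    Assumption: candidates come from a square whose half-width is the
    distance to the nearest other seed.
    """
    sx, sy = seeds[index]
    half = min(math.dist(seeds[index], s)
               for i, s in enumerate(seeds) if i != index)
    if half == 0:
        raise ValueError("duplicate seed coordinates")
    points = []
    while len(points) < n_points:
        cand = (sx + rng.uniform(-half, half), sy + rng.uniform(-half, half))
        if nearest_seed(cand, seeds) == index:
            points.append(cand)
    return points


def kmeans_centroids(points, k, rng, iterations=50):
    """Plain Lloyd's k-means; returns the k cluster centroids."""
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_seed(p, centroids)].append(p)
        updated = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                updated.append((sum(p[0] for p in cluster) / len(cluster),
                                sum(p[1] for p in cluster) / len(cluster)))
            else:
                updated.append(old)  # keep the centre of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return centroids


def place_proteins(index, seeds, n_proteins, n_points=100, seed=0):
    """Coordinates for n_proteins nodes around the glyph seeds[index]."""
    rng = random.Random(seed)
    points = sample_in_cell(index, seeds, n_points, rng)
    return kmeans_centroids(points, n_proteins, rng)


# Toy layout: four neighbouring entities and one reaction glyph (index 4).
seeds = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, 5.0)]
positions = place_proteins(index=4, seeds=seeds, n_proteins=3)
print(positions)
```

   Because Voronoi cells are convex, the mean of points sampled inside a
   cell also lies inside it, which is why the centroids stay closer to
   their own reaction glyph than to any other entity.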

Fig. 2.

   Illustration of the three steps for automated addition of proteins in
   the vicinity of a reaction. The first step is to generate a Voronoi
   cell for each entity in the map. The second step is to generate
   several randomly placed points in the Voronoi cell of each reaction
   catalysed by proteins. The third step is to use the k-means algorithm
   to generate the required number of clusters and assign the clusters’
   centroid coordinates to the proteins catalysing the reaction in
   question.

Conversion of obtained coordinates into a standard format (SBML)

     * Saving protein positions in the BiNoM Reaction Format
          + Following the previous steps, a file in the BiNoM Reaction
            Format is obtained, containing the names of the proteins as
            well as their coordinates and sizes. This simple file is
            then converted to the standard CellDesigner SBML format to be
            compatible with the original metabolic network. As
            CellDesigner allows the manipulation of “aliases” (multiple
            copies of the same entity), each protein name that occurs
            multiple times has apostrophes appended to it according to
            the number of its repetitions within the network.
     * Conversion of the BiNoM Reaction Format into a CellDesigner map
          + Using a custom Python script, the information stored in the
            BiNoM Reaction Format is transformed into an XML file
            following the SBML format. This file contains each protein’s
            name, ID, alias ID, coordinates and type. For now, only
            simple proteins can be handled.
     * Merging of the ReconMap 2.0 and protein maps using the BiNoM
       merging function
          + Once the file containing the proteins to add to the metabolic
            map is obtained, the two files can be merged using a function
            of the BiNoM plugin, as both are in the same SBML format.
            This function transforms two or more separate maps into a
            single map. The final merged map is then imported into the
            NaviCell environment using the NaviCell Factory package
            ([69]https://github.com/sysbio-curie/NaviCell).
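
   The apostrophe convention for repeated protein names can be
   illustrated as below. This is a sketch of one plausible reading, not
   the authors' script: it assumes that the first occurrence keeps the
   plain name and that the n-th repeat receives n apostrophes; the
   function name and the toy list are hypothetical.

```python
from collections import defaultdict


def apostrophe_names(protein_names):
    """Make repeated names unique by appending apostrophes.

    Assumption: the first occurrence is left unchanged and every later
    occurrence gets one more apostrophe than the previous one.
    """
    seen = defaultdict(int)
    labelled = []
    for name in protein_names:
        labelled.append(name + "'" * seen[name])
        seen[name] += 1
    return labelled


# Toy placement list: HK1 catalyses three reactions, GCK one.
placements = ["HK1", "GCK", "HK1", "HK1"]
labels = apostrophe_names(placements)
print(labels)
```

   Each label is unique, so every placement can be written to the
   intermediate file as a separate entry while the base name still
   identifies the protein.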

   Thus, proteins implicated in the catalysis of a reaction can be seen in
   the vicinity of the corresponding reactions (Additional file [70]2,
   Fig. A). It is important to note that in some cases reactions are
   regulated by many proteins, for example in the case of protein
   families, and the resulting configuration of protein nodes can be very
   dense (Additional file [71]2, Fig. B). This aspect could be improved by
   grouping protein families and visualising them together as a single
   generic entity. However, it is not always appropriate to group all
   proteins sharing a similar name into a “family”, since different family
   members might fulfil distinct or even opposite functions, which would
   lead to misinterpretation of the omics data in the context of the maps.
   Therefore, each protein was kept as a unique and independent entity.

   With this method, 1,550 proteins, corresponding to more than 7,500
   aliases, were placed on the ReconMap 2.0 canvas. The algorithm for
   assigning protein coordinates is robust and scales well: generating
   the 7,500 allocation points takes a matter of seconds.

ACSN and ReconMap 2.0 merging

   Once the protein positions file had been generated, it was converted to
   the CellDesigner [[72]10, [73]11] XML format through a custom Python
   script
   ([74]https://github.com/sysbio-curie/CellDesigner_networks_map_integration_procedure).
   This script produces a file in XML format that follows the CellDesigner
   SBML standard. This ‘map’ contains only the proteins, at the positions
   they should occupy on the final metabolic map. The file was then merged
   with the ReconMap 2.0 network using an existing merging function of
   BiNoM [[75]12, [76]13] to obtain the final network, containing the
   original ReconMap 2.0 as well as the proteins in the vicinity of the
   reactions they catalyse.

Tools, data source and code accessibility

Maps generation tool

   CellDesigner [[77]10, [78]11] was used for the construction of both
   networks, and its standard notation allowed the integration and
   linking across these maps. Both maps are available in an XML format,
   which facilitates their automated manipulation.

Map entity annotation with NaviCell format

   The annotation panel of each entity and reaction of the maps follows
   the NaviCell annotation format and includes the sections
   ‘Identifiers’, ‘Maps_Modules’, ‘References’ and ‘Confidence’, as
   detailed in [[79]3].