Abstract

   Analysis of Pan-omics Data in Human Interactome Network (APODHIN) is a
   platform for integrative analysis of transcriptomics, proteomics,
   genomics, and metabolomics data for identification of key molecular
   players and their interconnections, exemplified here in cancer.
   APODHIN works on a meta-interactome network consisting of human
   protein–protein interactions (PPIs), miRNA-target gene regulatory
   interactions, and transcription factor-target gene regulatory
   relationships. In its first module, APODHIN maps proteins/genes/miRNAs
   from different omics data in its meta-interactome network and extracts
   the network of biomolecules that are differentially altered in the
   given scenario. Using this context-specific, filtered interaction
   network, APODHIN identifies topologically important nodes (TINs) by
   graph-theory-based network topology analysis and further supports
   their role via pathway and disease marker mapping. These TINs
   could be used as prospective diagnostic and/or prognostic biomarkers
   and/or potential therapeutic targets. In its second module, APODHIN
   attempts to identify cross pathway regulatory and PPI links connecting
   signaling proteins, transcription factors (TFs), and miRNAs to
   metabolic enzymes via utilization of single-omics and/or pan-omics data
   and implementation of mathematical modeling. Interconnections between
   regulatory components such as signaling proteins/TFs/miRNAs and
   metabolic pathways need to be elucidated in more detail in order to
   understand the role of oncogenes and tumor suppressors in the
   regulation of metabolic reprogramming during cancer. The APODHIN
   platform contains a web server component where users can upload
   single- or multi-omics data to identify TINs and cross-pathway links.
   Tabular, graphical, and 3D network representations of the identified
   TINs and cross-pathway links are provided. Additionally, the platform
   provides a few example analyses of cancer-specific single- and/or
   multi-omics datasets for cervical, ovarian, and breast cancers, for
   which meta-interactome networks, TINs, and cross-pathway links are
   provided. The APODHIN platform is freely available at
   http://www.hpppi.iicb.res.in/APODHIN/home.html.

   Keywords: meta-interactome, network analysis, pan-omics, multi-omics
   analysis, pathway cross links

Introduction

   Technological advances have made different types of omics data
   accessible on a large scale. These data result from profiling
   different bio-entities, namely RNA (RNA transcriptomics), miRNA
   (miRNA transcriptomics), proteins (proteomics, phosphoproteomics),
   genes (genomics, epigenomics), metabolites (metabolomics), and lipids
   (lipidomics), as well as from pharmacogenomics. These bio-entities are
   functionally inter-related in a complex fashion. Extrapolation from
   single-omics data on one type of bio-entity fails to provide the true
   biological status of the various linked bio-entities (e.g., RNA,
   protein, metabolites). Hence, to investigate the causative phenomena
   underlying the genesis and progression of systemic/genetic diseases,
   an integrative analysis considering the profiles of the
   above-mentioned bio-entities is required. Moreover, because of the
   heterogeneous nature of these diseases, prognosis differs widely even
   when patients with similar pathological features are treated
   similarly. This shows the inadequacy of symptom-based diagnosis and
   calls for patient-specific analysis of omics data. Collective analysis
   of these multi-dimensional omics data is referred to as “pan-omics”
   ([32]Sandhu et al., 2018); such data are also considered “big” data in
   the context of biological data analysis. Pan-omics data enable the
   prediction of novel functional interactions between molecular
   mediators at multiple levels. These data also have the potential to
   uncover crucial biological insights into hallmarks and pathways that
   would otherwise not be obvious through single-omics studies.
   Patient-specific pan-omics data analysis can disclose the genetic,
   epigenetic, and other functional profiles responsible for the disease
   of an individual, which might eventually lead to the development of
   individualized “precision medicine” that provides the right treatment
   to the right patient at the right time.

   Cancer is a leading cause of death worldwide, being responsible for 9.6
   million deaths in 2018 ([33]Bray et al., 2018). Cancer is a
   heterogeneous disease caused by aberrations of genes and proteins.
   “Precision oncology” promises identification of disease subtypes,
   specific biomarkers and subsequently prediction and translation toward
   the development of treatment procedures. Pan-omics or multi-omics
   analysis in breast cancer has revealed significant differences in
   molecular subtype distribution ([34]Kan et al., 2018). Genomics and
   transcriptomics analysis of breast cancer data from Korean and
   Caucasian cohorts showed underlying molecular differences, which are
   responsible for the occurrence of breast cancer at a younger age in
   the Asian population compared to the western population ([35]Kan et
   al., 2018). Multi-omics analysis extended to different types of
   cancers confirms the existence of broadly two types of cancers: those
   caused by recurrent mutations and those caused by copy-number
   variations ([36]Mcgrail et al., 2018). Computational methodologies
   such as artificial intelligence are widely used to extract
   patient-specific information from these big data, as discussed in a
   recent review ([37]Biswas and Chakrabarti, 2020). Machine learning
   based pan-omics analysis of pan-cancer data shows the existence of
   clusters within different types of cancers ([38]Ramazzotti et al.,
   2018) and identifies cell-model selective anti-cancer drug targets
   for breast cancer ([39]Gautam et al., 2019).

   Multiple data portals such as TCGA ([40]TCGA, 2020) and ICGC ([41]Zhang
   et al., 2011) have been developed to make multi-omics data
   conveniently accessible. LinkedOmics contains pan-omics data for
   several types of cancers ([42]Vasaikar et al., 2018). Databases such as
   GliomaDB ([43]Yang et al., 2019) and MOBCdb ([44]Xie et al., 2018) are
   dedicated to integrating multi-omics data for specific types of
   cancer. Standalone software packages and web servers are also being
   developed for the analysis of pan-omics data. [45]Table 1 compares the
   analytical tools currently used by researchers. The R package
   mixOmics ([46]Rohart et al., 2017), based on multivariate analysis, is
   available for the integration of multi-omics data. It finds subsets
   of important features but does not include network analysis. OmicsNet
   provides a web-based platform to create different types of
   interactive molecular interaction networks for single or multiple
   types of omics data ([47]Zhou and Xia, 2018). Network-based
   integration of multi-omics data using iOmicsPASS allows the
   prediction of subnetworks of molecular interactions within a single
   type or multiple types of omics data ([48]Koh et al., 2019). The R
   package Miodin ([49]Ulfenborg, 2019) provides a software
   infrastructure for vertical and horizontal integration of multi-omics
   data but lacks comprehensive network analysis and visualization.
   PaintOmics allows integrated visualization of multiple types of omics
   data in KEGG pathway diagrams ([50]Hern et al., 2018). The software
   package Multi-Omics Factor Analysis (MOFA) ([51]Argelaguet et al.,
   2018) integrates omics data in an unsupervised approach implementing
   generalized principal component analysis (PCA). pathfindR ([52]Ulgen
   et al., 2019) finds active subnetworks for genes in omics data and
   performs pathway enrichment analysis. The R package Mergeomics
   ([53]Shu et al., 2016) provides a pipeline to identify important
   pathways and key drivers in biological systems. However, platforms
   for systematic analysis of the landscape of genetic, epigenetic, and
   metabolomic alterations and of the biological and clinical relevance
   of multi-layer signatures in cancers are still limited.

TABLE 1.

   Comparison of APODHIN with other existing pan-omics data analysis
   tools.

   Feature                                          APODHIN          OmicsNet [14]  mixOmics [13]  iOmicsPASS [15]  Miodin [16]
   Platform                                         Web              Web            Standalone     Standalone       Standalone
   Programming language                             Python, R, Perl  R              R              C++              R
   Types of omics data as input
   mRNA transcriptomics                             Yes              Yes            Yes            Yes              Yes
   miRNA transcriptomics                            Yes              Yes            Yes            No               Yes
   Proteomics                                       Yes              Yes            Yes            Yes              Yes
   Phospho-proteomics                               Yes              No             Yes            No               No
   Genomics                                         Yes              Yes            Yes            Yes              Yes
   Epi-genomics                                     Yes              No             No             No               Yes
   Metabolomics                                     Yes              Yes            Yes            No               No
   Multiple lists of same type of omics data        Yes              No             Yes            Yes              Yes
   Finding deregulated proteins/genes/miRNAs        Yes              No             No             No               No
   Map in meta-interactome                          Yes              No             No             No               No
   3D interactive network                           Yes              Yes            No             No               No
   Network topology analysis                        Yes              No             No             No               No
   Prognostic status of proteins/genes (in cancer)  Yes              No             No             No               No
   Pathway enrichment analysis                      Yes              Yes            Yes            Yes              No
   Analysis for regulatory network protein links    Yes              No             No             No               No

   Different types of omics data carry information on different types of
   bio-entities, e.g., genes, proteins, miRNAs, metabolites, etc. Hence,
   integrative analysis of pan-omics data needs a meta-interactome
   consisting of a protein–protein interaction network (PPIN) as well as
   different regulatory networks. The web server for the Analysis of
   Pan-omics Data in Human Interactome Network (APODHIN) provides a unique
   platform where users can analyze different types of omics data using a
   human cellular meta-interactome network. Graph theory based network
   analysis has become an essential tool for analyzing PPINs and
   extracting proteins important in the construction of, and information
   flow through, the network ([55]Jeong et al., 2001; [56]Barabási and
   Oltvai, 2004; [57]Mistry et al., 2017; [58]Ashtiani et al., 2018).
   APODHIN provides options to identify topologically important nodes
   (TINs) such as hubs, bottlenecks, and central nodes (CNs) and their
   subsequent modules via protein–protein interaction (PPI) and
   regulatory relationship network analyses and pathway enrichment
   analysis. TINs are also evaluated as prospective diagnostic and/or
   prognostic biomarkers. APODHIN can also analyze and compare multiple
   datasets for a single omics layer, such as transcriptomics or
   proteomics data collected from different patient cohorts and/or
   different stages/grades of the same cohort.

   Additionally, utilizing multi-omics data APODHIN calculates
   cross-pathway regulatory and PPI links connecting signaling proteins or
   transcription factors (TFs) or miRNAs to metabolic enzymes and their
   metabolites using network analysis and mathematical modeling. These
   cross-pathway links were shown to play important roles in metabolic
   reprogramming in cancer scenarios such as glioblastoma multiforme in a
   previous work ([59]Bag et al., 2019).

   In addition to the server, APODHIN shares analyses of multi-omics
   data from various cancer cell lines, in which TINs and cross-pathway
   links were identified using publicly available omics datasets
   collected for various gynecological cancers. The APODHIN platform is
   freely available at http://www.hpppi.iicb.res.in/APODHIN/home.html.

Materials and Methods

Server Description

   The APODHIN web server is dedicated to the integration and subsequent
   analysis of single or multiple types of omics data. For a single type
   of omics data, APODHIN can analyze multiple datasets (up to 3), which
   may correspond either to different stages of a disease from a single
   cohort or to datasets collected from multiple patient cohorts and/or
   cell lines.

   For multiple types of omics data, APODHIN allows a single input data
   file for each type of omics data. The following sections briefly
   describe the various analytical parts of the APODHIN server.

Data Collection

   The APODHIN web server is preloaded with a human cellular
   meta-interactome network. This meta-interactome consists of the human
   protein–protein interaction network (HPPIN), the network of human
   miRNAs and their target genes, and the network of human TFs and their
   target genes. The PPI data were collected from the STRING database
   (version 11) ([61]Szklarczyk et al., 2019). Interactions having a
   medium threshold of experimental score ≥700 were considered
   ([62]Ferretti and Cortelezzi, 2011) for construction of the PPIN.
   Target gene information for miRNAs was collected from the TarBase
   ([63]Vergoulis et al., 2012) and miRTarBase ([64]Chou et al., 2016)
   databases. From TarBase (version 6) we took reliable interactions
   supported only by low-throughput experiments (e.g., reporter gene
   assay, western blot, qPCR, etc.), whereas from miRTarBase (version 6)
   we took miRNA-target interactions with strong confidence (i.e.,
   validated by reporter assay, western blot, or qPCR experiments). We
   relied on the more reliable low-throughput experimental data to build
   the parent miRNA-target mRNA interactome network. We found 2492 target
   genes for 544 miRNAs, creating 6917 interactions. TFs and their target
   genes were downloaded from the Human Transcriptional Regulation
   Interactions database (HTRIdb) ([65]Bovolenta et al., 2012). We found
   11887 target genes for 284 TFs, creating 18153 interactions. These
   three networks were merged to form the APODHIN meta-interactome,
   consisting of two types of biomolecular nodes (proteins/genes and
   miRNAs) and three types of interactions (protein–protein, miRNA-target
   gene, and TF-target gene).
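
   The merging step described above can be pictured with a minimal
   Python sketch. It is not the APODHIN implementation: the function
   name build_meta_interactome, the tuple-based edge representation, and
   the placeholder identifiers used when calling it are assumptions made
   purely for illustration.

```python
def build_meta_interactome(ppi_edges, mirna_edges, tf_edges):
    """Merge three edge lists into one typed network (illustrative only).

    Returns (node_types, edges): node_types maps each node to "miRNA" or
    "protein/gene"; edges is a set of (source, target, interaction_type).
    """
    node_types = {}
    edges = set()
    for a, b in ppi_edges:
        # PPIs are undirected: sort the endpoints so A-B and B-A collapse.
        a, b = sorted((a, b))
        edges.add((a, b, "protein-protein"))
    for mirna, target in mirna_edges:
        node_types[mirna] = "miRNA"
        edges.add((mirna, target, "miRNA-target gene"))
    for tf, target in tf_edges:
        edges.add((tf, target, "TF-target gene"))
    # Every node not already flagged as a miRNA is a protein/gene node.
    for source, target, _ in edges:
        node_types.setdefault(source, "protein/gene")
        node_types.setdefault(target, "protein/gene")
    return node_types, edges
```

   Keeping the interaction type on every edge is one simple way to let
   later steps treat PPI and regulatory links differently; the server may
   well store this information in another form.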

   Additionally, we have included in the APODHIN server a network of
   metabolites, as substrates and products, with their corresponding
   metabolic enzymes. To construct this network, we downloaded metabolic
   reactions from the MetaNetX database ([66]Moretti et al., 2016),
   extracted the metabolites along with the corresponding metabolic
   enzymes, and retained those enzymes and metabolites that are listed
   in the Human Metabolome Database (HMDB) ([67]Wishart et al., 2018).

Pan-omics Data Integration and Meta-Interaction Network Extraction

   In the APODHIN web server, users can upload single or multiple types
   of omics data. The server accepts RNA transcriptomics, miRNA
   transcriptomics, proteomics, phosphoproteomics, genomics, epigenomics,
   and metabolomics data. The current version of the server accepts only
   processed omics data, in which the differential expression/abundance
   of the corresponding biomolecules is provided as logFC together with a
   threshold probability or p-value. For RNA transcriptomics, miRNA
   transcriptomics, and proteomics data, the user should select threshold
   values of logFC for defining up and down regulation of
   genes/miRNAs/proteins and the corresponding adjusted p-value. Uploaded
   files should contain a list of genes/miRNAs/proteins along with logFC
   and p-values. Sample file formats for the different omics data are
   provided on the APODHIN help page. For genomics, epigenomics, and
   phosphoproteomics data, genes that are mutated and/or methylated and
   proteins that are phosphorylated are considered, respectively. The
   APODHIN help page also provides guidelines for processing GEO
   ([68]Barrett et al., 2013) transcriptomics data for use in APODHIN,
   and lists packages and tools for GEO series data. For other types of
   omics data, such as proteomics, genomics, and metabolomics, useful
   links for data processing are provided on the APODHIN help page, which
   will be enriched gradually depending on user requirements.

   The APODHIN web server extracts, from the parent meta-interactome, the
   interactome networks for the genes, mRNAs, miRNAs, proteins, and
   metabolites that are either deregulated or altered according to the
   user-supplied single or multiple omics data. It creates a filtered
   meta-interactome network comprising the deregulated or altered nodes
   and their 1st or 2nd level (as chosen by the user) interactors and/or
   regulators. For metabolomics data, the web server finds the proteins
   linked with the metabolites and constructs the network from them.
   These single- or multi-omics data specific meta-interactome networks
   are subsequently displayed in an interactive three-dimensional (3D)
   network viewer within the APODHIN server. When creating the omics
   data mapped network and performing the subsequent network analysis,
   APODHIN does not assign any special weight or score to any type of
   omics data.
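
   As a rough illustration of the extraction step, the sketch below
   collects every node within one or two steps of a deregulated or
   altered "seed" node by breadth-first search. The adjacency-dictionary
   input, the function name extract_subnetwork, and the choice to keep
   all edges among the retained nodes are assumptions; the source does
   not state how the server implements this step.

```python
from collections import deque

def extract_subnetwork(adjacency, seeds, level=1):
    """Return (nodes, edges) within `level` steps of any seed node.

    adjacency: dict mapping node -> set of neighboring nodes.
    seeds:     deregulated/altered nodes taken from the omics data.
    level:     1 for first-level interactors, 2 for second-level.
    """
    # Distance from the nearest seed; seeds absent from the network are skipped.
    distance = {s: 0 for s in seeds if s in adjacency}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] == level:
            continue  # do not expand beyond the requested level
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    nodes = set(distance)
    # Assumption: keep every edge whose two endpoints were both retained.
    edges = {
        tuple(sorted((u, v)))
        for u in nodes
        for v in adjacency.get(u, ())
        if v in nodes
    }
    return nodes, edges
```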

   For the module “pathway connectivity analysis,” RNA transcriptomics,
   miRNA transcriptomics, and proteomics data are considered primary
   data, and submission of at least one of them is mandatory to define
   deregulated miRNAs and/or genes/proteins. In “pathway connectivity
   analysis,” the logFC values of each uploaded omics dataset are
   normalized to the scale of −1 to +1 following Eq. 1,
   \[
   \log FC_{\text{normalized}} = \frac{\log FC}{\left|\log FC\right|_{\text{max}}}
   \tag{1}
   \]

   where positive and negative values indicate up and down regulated
   entities, respectively. If more than one primary omics dataset is
   provided (for example, transcriptomics and proteomics), the APODHIN
   web server sums the normalized logFC values from the different omics
   data for the same node (RNA/protein), and if the sum is non-zero, the
   gene/protein is considered deregulated. Primary omics data thus
   determine whether a gene is deregulated or not. If a gene is not
   altered in the supplied primary omics data, APODHIN does not consider
   it for further analysis, irrespective of its status in the supplied
   secondary omics data. Details of how the normalized omics values are
   used in the mathematical-modeling-based identification of pathway
   connectivity links are provided later. In this module, information on
   the metabolites of any enzyme can be obtained from the associated
   table on selecting the enzyme.
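
   The normalization of Eq. 1 and the summation across primary omics
   datasets can be sketched as follows. The dictionary inputs and the
   function names are hypothetical, and the sketch follows the textual
   description only; it is not taken from the APODHIN code.

```python
def normalize_logfc(logfc):
    """Scale logFC values to [-1, 1] by the largest absolute logFC (Eq. 1)."""
    max_abs = max((abs(v) for v in logfc.values()), default=0.0)
    if max_abs == 0:
        return {node: 0.0 for node in logfc}
    return {node: value / max_abs for node, value in logfc.items()}

def combine_primary_omics(*datasets):
    """Sum the normalized logFC of each node over all primary datasets."""
    totals = {}
    for dataset in datasets:
        for node, value in normalize_logfc(dataset).items():
            totals[node] = totals.get(node, 0.0) + value
    return totals

def deregulated_nodes(totals):
    """A node counts as deregulated when its summed score is non-zero."""
    return {node for node, score in totals.items() if score != 0}
```

   One consequence of summing, under this reading of the text, is that
   equal and opposite normalized changes in two primary datasets cancel,
   so that node would not be flagged as deregulated.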

Network Analysis and Identification of TINs

   Once the context-specific meta-interactome network has been formed
   from the user-supplied single or multiple omics data, the APODHIN web
   server finds three types of TINs, namely hubs, CNs
   ([69]Bhattacharyya and Chakrabarti, 2015), and bottlenecks (BNs)
   ([70]Yu et al., 2007). To find the important nodes, network and node
   indices such as degree, betweenness, closeness, and clustering
   coefficient are calculated from the extracted meta-interactome
   network. These node parameters are calculated using previously
   reported methods and protocols ([71]Bhattacharyya and Chakrabarti,
   2015). For transcriptomics and proteomics data, TINs are identified
   from the expressed nodes only. For phosphoproteomics, genomics,
   epigenomics, and metabolomics data, TINs are identified from
   phosphorylated proteins, mutated genes, methylated genes, and
   metabolic enzymes, respectively.

   Hubs are nodes that have high degrees. Degree distribution is
   normalized following Eq. 2,
   \[
   x_{i,\text{normalized}} = \frac{x_i}{x_{\text{maximum}}}
   \tag{2}
   \]

   where x_i is the degree of node i and x_maximum is the maximum degree
   in the network. The APODHIN web server converts the normalized degree
   distribution to the corresponding z-score distribution. A plot of the
   probability distribution function (PDF) of the z-scores for all nodes
   in the network is sent to the user by email; this email shares
   intermediate results only. From the PDF plot, users are asked to
   provide the threshold value for hub identification. After receiving
   the threshold value, APODHIN starts the hub identification program.
   Nodes whose degree z-score is greater than the threshold value are
   considered hubs. This procedure is also described on the help page.
   Scores for the individual centrality parameters, namely betweenness,
   closeness, and clustering coefficient, are calculated, and the
   cumulative centrality scores (CCS) are estimated by summing over the
   combined scores for first layer interactors ([72]Bhattacharyya and
   Chakrabarti, 2015). CCSs are normalized following Eq. 2, where x is
   the CCS, and the normalized CCSs are converted into z-scores. The PDF
   of the z-scores for all nodes of the network is sent to the user by
   email, and CNs are chosen based on the user-provided z-score
   threshold, following the same procedure as for hubs. Bottleneck nodes
   are characterized by their betweenness values. Normalized betweenness
   values are obtained from Eq. 2, where x is the betweenness, and are
   subsequently converted into z-scores. As for hubs, bottleneck nodes
   are chosen based on the user-provided threshold z-score, selected
   from the PDF plot of z-scores for all nodes.
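
   The hub selection procedure (degree, normalization by the maximum as
   in Eq. 2, conversion to z-scores, then a user-chosen threshold) can be
   sketched as below. The adjacency-dictionary input and the function
   names are hypothetical, and the use of the population standard
   deviation is an assumption, since the text does not specify which
   estimator is used. The same two steps would apply to betweenness or
   CCS values in place of degree.

```python
from statistics import mean, pstdev

def degree_zscores(adjacency):
    """Degree -> max-normalized degree (Eq. 2) -> z-score, for each node."""
    degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
    top = max(degree.values(), default=0)
    if top == 0:
        return {node: 0.0 for node in degree}
    normalized = {node: d / top for node, d in degree.items()}
    mu = mean(normalized.values())
    sigma = pstdev(normalized.values())  # assumption: population SD
    if sigma == 0:
        return {node: 0.0 for node in normalized}
    return {node: (value - mu) / sigma for node, value in normalized.items()}

def select_hubs(adjacency, threshold):
    """Nodes whose degree z-score exceeds the user-supplied threshold."""
    scores = degree_zscores(adjacency)
    return {node for node, z in scores.items() if z > threshold}
```

   For a star-shaped toy network with one center and four leaves, the
   center is the only node with a positive z-score, so any threshold
   between the leaf and center scores selects it as the single hub.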

   Further, a sub-network consisting of the TINs and their first or
   second layer interactors is constructed and displayed in an
   interactive three-dimensional (3D) network viewer.

   The overlap of the TINs, as well as of all nodes of the network, with
   prognostic cancer markers is checked using prognostic marker
   information extracted from the Human Protein Atlas database (version
   19) ([73]Uhlen et al., 2017). The prognostic data were obtained from
   Kaplan-Meier survival analysis. The cancer type for which the
   prognostic status has the minimum p-value is shown in the “Node
   information” table on the “network view of identified important
   nodes” page. Hovering the mouse over the cancer type displays more
   detailed information for other cancer types.

Pathway Mapping and Network of Mapped Pathways

   For each identified TIN, particularly for genes and proteins, APODHIN
   maps the corresponding pathways listed in the KEGG database
   ([74]Kanehisa et al., 2017). APODHIN performs a hypergeometric
   (Fisher's exact) test and selects enriched pathways satisfying
   p-value (p_HGD) ≤0.05 using the following contingency table and
   formula.
   \[
   \begin{bmatrix} a & b - a \\ c & d - c \end{bmatrix}
   \]

   where

   a = Number of genes from the gene list that are in the pathway.

   b = Number of genes in the gene list.

   c = Total number of genes in the pathway.

   d = Total number of genes in all pathways in KEGG.
   \[
   p_{HGD} = \frac{\binom{b}{a}\binom{d}{c}}{\binom{b+d}{a+c}}
   \tag{3}
   \]
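
   Eq. 3 can be evaluated directly with exact binomial coefficients, as
   in the short sketch below. The function names are hypothetical.
   Note that Eq. 3 as written gives the probability of the observed
   table only; whether the server additionally sums over more extreme
   tables is not stated in the text, so the sketch does not do so.

```python
from math import comb

def p_hgd(a, b, c, d):
    """Eq. 3: C(b, a) * C(d, c) / C(b + d, a + c).

    a: genes from the gene list that are in the pathway
    b: genes in the gene list
    c: total genes in the pathway
    d: total genes in all KEGG pathways
    """
    return comb(b, a) * comb(d, c) / comb(b + d, a + c)

def is_enriched(a, b, c, d, cutoff=0.05):
    """Apply the p_HGD <= 0.05 selection rule described in the text."""
    return p_hgd(a, b, c, d) <= cutoff
```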

   Further, a network representation of important nodes along with their
   enriched mapped pathways is displayed in an interactive
   three-dimensional (3D) network viewer. [75]Figure 1A shows the flow
   chart of “pan-omics data mapping and network analysis” module of
   APODHIN.

FIGURE 1.

   Flow charts showing the workflow of the APODHIN web server for the
   modules (A) data mapping and network analysis and (B) pathway
   connectivity analysis.

Pathway Connectivity Analysis and Cross-Pathway Links

   This module of the APODHIN web server constructs regulatory
   interaction networks and subsequently identifies cross-pathway
   interaction links connecting different cellular pathway proteins [e.g.,
   signaling proteins (S)], regulatory proteins [e.g., transcription
   factors (TFs)], or miRNAs with metabolic pathway proteins (M).

   For this purpose, the APODHIN web server is preloaded with
   cross-pathway links or paths in which protein–protein interactors (P)
   connect X nodes (X can be S, a target gene of a TF, or a target gene
   of a miRNA) with M (metabolic) proteins. We have limited the number
   (n) of protein–protein interactors (P) between X and M proteins to a
   maximum of three. This limit gives four types of paths: XM (n = 0),
   XPM (n = 1), XPPM (n = 2), and XPPPM (n = 3). These cross-pathway
   linking paths are filtered and selected based on the expression
   and/or abundance status of the biomolecules in the user-uploaded
   pan-omics data for a given disease or context. A path passes the
   filter when its terminal nodes are deregulated and the remaining
   nodes are at least expressed in the user-provided single- or
   multi-omics datasets.
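
   The path types XM, XPM, XPPM, and XPPPM and the filtering rule can be
   illustrated with a small depth-limited search. This is only a sketch
   under stated assumptions: the PPI network is given as an adjacency
   dictionary, intermediate P nodes are taken to be proteins that are
   neither X nor M nodes, and the function names are hypothetical. The
   server itself is described as being preloaded with such paths.

```python
def cross_pathway_paths(ppi, x_nodes, m_nodes, max_interactors=3):
    """Enumerate simple paths X-(P x n)-M with n <= max_interactors."""
    paths = []

    def extend(path):
        for neighbor in sorted(ppi.get(path[-1], ())):
            if neighbor in path:
                continue  # keep paths simple (no repeated nodes)
            if neighbor in m_nodes:
                paths.append(path + [neighbor])  # path ends at an M protein
            elif neighbor not in x_nodes and len(path) - 1 < max_interactors:
                extend(path + [neighbor])  # add one more interactor P

    for x in sorted(x_nodes):
        extend([x])
    return paths

def keep_path(path, deregulated, expressed):
    """Terminal nodes must be deregulated; inner nodes at least expressed."""
    return (
        path[0] in deregulated
        and path[-1] in deregulated
        and all(p in expressed for p in path[1:-1])
    )
```

   Because len(path) - 1 counts the interactors already placed after the
   X node, the search never produces a path longer than XPPPM when
   max_interactors is left at three.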

   We implemented an established probabilistic approach based on the
   Hidden Markov Model (HMM) ([78]Tuncbag et al., 2013; [79]Vinayagam et
   al., 2014; [80]Bag et al., 2019) that uses experimentally established
   PPIs and gene regulatory information to extract novel paths and
   interconnections between regulatory nodes, such as signaling
   proteins, TFs, and miRNAs, and metabolic pathway proteins (M). To find
   important X-M pairs, weights are assigned to nodes and edges
   depending on network and biological properties. Within these
   important X-M pairs, important cross-pathway connecting paths are
   then scored by considering all filtered paths between the X-M pairs.
   Edge weight is assigned in terms of a normalized interaction
   probability, which is proportional to the product of the expression
   scores of the interacting nodes.

   Two types of node weights are considered: network entropy and
   effect-on-node. Network entropy includes the local entropy of the
   node. The second node weight parameter, “effect-on-node,” considers
   the impact of the interactors of a particular gene in the
   cross-connected network, and takes into account both biological and
   network properties of the node. Biological properties include
   deregulated gene, signaling crosstalk gene, and rate limiting enzyme.
   Network properties include hubs, CNs, and bottlenecks.

   The APODHIN web server allows the user to choose a maximum of four of
   the six weight options. A weight of 1 is assigned to a node for each
   selected weight option that it satisfies. To identify important
   cross-connecting X-M pairs, we evaluate a “path score” (PS) based on
   the HMM implemented within the core mathematical model that
   calculates the significant cross-pathway linking paths. Path scores
   are converted to z-scores, and paths having z-score ≥1 are considered
   important cross-connecting paths. A detailed description of the
   mathematical models and path calculation is available in our previous
   publication ([81]Bag et al., 2019). [82]Figure 1B shows the flow chart
   of the “pathway connectivity analysis” module of APODHIN.
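
   Two of the simpler rules in this section, the per-option node weight
   and the z-score cutoff on path scores, are sketched below. The HMM
   that produces the path scores is not reproduced here; the scores are
   assumed to be given. The option labels, the function names, and the
   use of the population standard deviation are all assumptions made
   for illustration.

```python
from statistics import mean, pstdev

# The six weight options named in the text (labels are illustrative).
WEIGHT_OPTIONS = (
    "deregulated gene", "signaling crosstalk gene", "rate limiting enzyme",
    "hub", "central node", "bottleneck",
)

def node_weight(node_properties, selected_options):
    """Add 1 for every selected option that the node satisfies."""
    selected = set(selected_options)
    if len(selected) > 4 or not selected <= set(WEIGHT_OPTIONS):
        raise ValueError("choose at most four of the six weight options")
    return len(selected & set(node_properties))

def important_paths(path_scores, cutoff=1.0):
    """Keep paths whose path-score z-score is >= cutoff (1 in the text)."""
    mu = mean(path_scores.values())
    sigma = pstdev(path_scores.values())  # assumption: population SD
    if sigma == 0:
        return set()
    return {
        path for path, score in path_scores.items()
        if (score - mu) / sigma >= cutoff
    }
```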

APODHIN Architecture

   The APODHIN web server is built using HTML, PHP, Python, and
   JavaScript. Client/user side scripts are written in HTML, PHP, and
   JavaScript. User-uploaded data are analyzed using Python scripts. For
   network analysis, the Python package NetworkX (version 1.8.1) is
   used. For 3D visualization of networks, open source JavaScript
   technologies (three.js and 3d-force-graph.js) are used.

   APODHIN has two separate parts: (A) the APODHIN server and (B) the
   APODHIN example data analysis.

APODHIN Server

   The APODHIN web server is preloaded with a human interactome network
   containing the PPIN, the target gene network of miRNAs, and the
   target gene network of TFs. Proteins participating in signaling and
   metabolic pathways are marked separately. Metabolites along with
   their target enzymes are also included within APODHIN. This
   meta-interactome network is used as the framework of cellular
   interactions onto which user-supplied single or multiple types of
   “omics” data are mapped to perform the following analyses.
     * Omics data mapping and network analysis: This module has two
       sub-modules. On clicking the first submit button, the web server
       provides the meta-interactome network filtered by the uploaded
       omics data, in which deregulated and/or altered nodes are included
       along with their interactors. Users can then proceed to find
       important interacting nodes in the “pan-omics” data mapped
       interaction network by clicking the second submit button. Tabular,
       graphical, and 3D network representations of the identified TINs
       are provided. The overlap of the TINs is shown in both tabular and
       interactive 3D network form. Additionally, TINs and their enriched
       pathways are shown in tabular and interactive 3D network form.
       Sample input files for each omics data type and example analysis
       output are provided for ease of use and understanding.
     * Pathway connectivity analysis: As mentioned before, this module
       highlights significant PPI and regulatory paths connecting
       signaling proteins/TFs/miRNAs to metabolic proteins. These
       cross-pathway links are thought to be supra-molecular regulatory
       links/signatures connected with the metabolic rearrangement or
       reprogramming events observed during cancer. In APODHIN, these
       cross-pathway regulatory links can be constructed from three
       types of interaction networks:

         1. Integrated network where signaling (S) and metabolic (M)
            pathway proteins are connected through protein–protein
            interactors (P).
         2. Integrated network where target genes of TFs and metabolic
            (M) pathway proteins are connected through protein–protein
            interactors (P).
         3. Integrated network where miRNA target genes and metabolic (M)
            pathway proteins are connected through protein–protein
            interactors (P).

   Cross-pathway linking paths are filtered and selected based on the
   expression and/or abundance status of the biomolecules, as given by
   the user-uploaded single or pan-omics data for a given disease or
   context. These paths are shown both in tabular form and in an
   interactive 3D network visualization.
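
   The three network types share one pattern: a source node X (a
   signaling protein, a TF target gene, or a miRNA target gene) reaches
   a metabolic protein M through protein–protein interactors P. The
   sketch below illustrates that pattern with a breadth-first
   enumeration of simple paths. It is not the APODHIN implementation:
   the function name `linking_paths`, the `max_len` cut-off, and the
   choice to keep direct X-M interactions are assumptions made only for
   illustration.

```python
from collections import deque


def linking_paths(edges, sources, metabolic, max_len=4):
    """Enumerate simple paths X -> P ... P -> M in an undirected network.

    edges     : iterable of (u, v) interaction pairs
    sources   : set of X nodes (e.g. signaling proteins)
    metabolic : set of M nodes (metabolic pathway proteins)
    max_len   : maximum number of nodes per path (assumed cut-off)
    """
    # Build an undirected adjacency map.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    paths = []
    queue = deque([s] for s in sorted(sources) if s in adj)
    while queue:
        path = queue.popleft()
        for nxt in sorted(adj[path[-1]]):
            if nxt in path:
                continue  # keep paths simple (no repeated node)
            if nxt in metabolic:
                paths.append(path + [nxt])  # reached a metabolic protein
            elif nxt not in sources and len(path) + 1 < max_len:
                # nxt acts as an interactor (P); leave room for a final M
                queue.append(path + [nxt])
    return paths
```

   Each returned path starts at an X node, ends at an M node, and
   contains only interactors in between, so it can be passed directly
   to the expression-based filtering described above.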

APODHIN Example Data Analysis

   The Analysis of Pan-omics Data in Human Interactome Network (APODHIN)
   example data analysis page showcases a few example analyses of
   multi-omics data for different cancer cell lines. We used the APODHIN
   web server to construct cancer- and dataset-centric meta-interactome
   networks from cell line specific single and/or multi-omics data
   collected from resources such as GEO ([83]Barrett et al., 2013),
   PRIDE ([84]Perez-Riverol et al., 2019), published reports, and other
   data sources for cervical, ovarian, and breast cancers. These cancer-
   and dataset-specific meta-interactome networks were then analyzed,
   and the important interacting nodes and cross-pathway links that were
   identified are provided within the APODHIN example data analysis
   module. All cancer cell line derived omics data used here are freely
   available from public resources. Users can select single and/or
   multi-omics data to construct the meta-interactome networks and
   analyze them further to identify important interacting nodes and
   cross-pathway links specific to the selected dataset.

Results

Input Options

   The Analysis of Pan-omics Data in Human Interactome Network (APODHIN)
   server provides two different but linked analysis options for users
   who would like to utilize single or multiple types of omics data for
   a given context. The web server accepts seven types of “omics” data:
   mRNA transcriptomics, miRNA transcriptomics, proteomics,
   phosphoproteomics, genomics, epigenomics, and metabolomics. The file
   format for each data type is specified on the “Help” page, and sample
   input files are available on the server input page. Information on
   preparing input files for APODHIN is also given on the “Help” page.
   For transcriptomics and proteomics data, maximum and minimum
   threshold values for the differential expression/abundance (logFC)
   and for its statistical significance (p-value) need to be provided.
   As the calculations are computationally intensive, results are sent
   via email.
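
   As a rough illustration of how such thresholds can separate
   deregulated from unchanged entries, the sketch below labels each
   record by its logFC and p-value. The function name `classify`, the
   default cut-offs, and the rule that a single p-value cut-off applies
   to both directions are assumptions made for this example; they are
   not taken from the APODHIN server.

```python
def classify(records, up_logfc=1.0, down_logfc=-1.0, p_cutoff=0.05):
    """Label entries as 'up', 'down' or 'unchanged'.

    records : iterable of (identifier, logFC, p_value) tuples
    The three thresholds are illustrative defaults, not APODHIN's.
    """
    status = {}
    for name, logfc, p_value in records:
        significant = p_value <= p_cutoff
        if significant and logfc >= up_logfc:
            status[name] = "up"
        elif significant and logfc <= down_logfc:
            status[name] = "down"
        else:
            status[name] = "unchanged"
    return status
```

   Entries labeled "up" or "down" would correspond to the deregulated
   nodes that are mapped onto the meta-interactome network, while a
   large logFC without statistical support is left as "unchanged".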

   Similarly, for cross-pathway connectivity analysis, users need to
   upload single or multiple types of “omics” data for a given context.
   At least one “primary” type (see Methods) of omics data needs to be
   uploaded. In this case, users also need to specify the type of
   connectivity they would like to explore: signaling to metabolic
   proteins, TFs to metabolic proteins, or miRNAs to metabolic proteins.
   Only one type of pathway connectivity can be explored at a time for a
   given set of “omics” data. Additionally, users need to select the
   kind of weights (see section “Materials and Methods”) to be applied
   when calculating the scores of the selected cross-pathway regulatory
   and PPI paths. An e-mail address needs to be supplied so that the
   APODHIN server can send the link to the results for the identified
   cross-pathway connections.

Output Options

   The output of the “Data mapping and network analysis” module has two
   stages. In the first stage ([85]Figure 2A), the context specific
   meta-interactome network (“filtered network”) can be visualized in an
   interactive 3D network viewer, where information on each node and
   edge is provided in graphical as well as tabular view ([86]Figure
   2B). The status of the “omics” data mapping is shown by color codes
   on the nodes, whereas the different relationships (PPI, miRNA-target
   gene interaction, and TF-target gene regulation) are shown in
   distinct color codes. Additional details about the protein nodes can
   be obtained via a GeneCards ([87]Stelzer et al., 2016) link, while
   miRNA details can be found via a miRTarBase ([88]Chou et al., 2016)
   link. The list of metabolites mapped onto the protein nodes is
   provided both in the network viewer and in the adjacent table. If
   network analysis is opted for, APODHIN provides, along with the
   filtered network, the probability distribution functions (PDFs) for
   the selected TINs ([89]Figure 2C). Filtered nodes
   (genes/proteins/miRNAs) that satisfy the selected threshold criteria
   are characterized as TINs and further utilized for meta-interactome
   network construction. If multiple files of a single type of omics
   data are uploaded, users can see the number of TINs (as hubs,
   bottlenecks, and CNs) and their mutual overlap in an interactive Venn
   diagram by clicking the “link for analysis” option for a single or a
   combination of “omics” data ([90]Figure 2D). A combined analysis is
   shown if files for multiple types of omics data are provided. Here,
   the resulting page ([91]Figure 2E) provides three output options.
   First, the regulatory and PPI connectivity specific to the hubs,
   bottlenecks, and CNs can be seen via the corresponding link, where
   networks of deregulated hubs, bottlenecks, and CNs can be viewed
   separately and saved ([92]Figure 2F). Associations of the identified
   TINs with various kinds of cancers as favorable/unfavorable
   prognostic markers are also provided here after mapping the TINs (see
   Methods) to the data provided in the Human Protein Atlas ([93]Uhlen
   et al., 2017). A second option provides the network of common TINs
   ([94]Figure 2G), whereas a third link provides the network of
   pathways enriched with the identified TINs ([95]Figure 2H). Enriched
   pathway networks of deregulated hubs, bottlenecks, and CNs can be
   viewed separately and saved. For all three network output options,
   data can be downloaded in text format for further analysis.
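
   The mutual overlap that the Venn diagram displays can be computed
   from the TIN lists alone. The sketch below assigns every node to the
   exact combination of input sets that contain it, which gives one
   Venn region per combination. The function name `venn_regions` and
   the dictionary-based input are assumptions for illustration; the
   server's own implementation is not described in the text.

```python
def venn_regions(named_sets):
    """Group items by the exact combination of sets they belong to.

    named_sets : dict mapping a label (e.g. a file or cell line name)
                 to a set of node identifiers
    Returns a dict mapping a sorted tuple of labels to the set of items
    found in exactly those sets (one entry per non-empty Venn region).
    """
    regions = {}
    all_items = set().union(*named_sets.values())
    for item in all_items:
        key = tuple(sorted(label for label, members in named_sets.items()
                           if item in members))
        regions.setdefault(key, set()).add(item)
    return regions
```

   For two hub lists, for example, the result contains up to three
   regions: nodes unique to each list and nodes shared by both. The
   same function works unchanged for bottlenecks, CNs, or deregulated
   gene lists.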

FIGURE 2.

   Snapshots of outputs of module “data mapping and network analysis.” (A)
   Page showing link for filtered network and probability distribution
   function. (B) Filtered network. (C) Probability distribution function
   for network analysis. (D) Output page of a single omics data. (E)
   Network analysis page for multi-omics data. (F) Network of important
   interacting nodes. (G) Network of important nodes. (H) Network of
   pathway mapping.

   Similar to “Data mapping and network analysis,” the “Pathway
   connectivity analysis” module also provides a tabular result with a
   summary of the user uploaded data ([98]Figure 3A). Users can see the
   cross-pathway links for multiple types of omics data ([99]Figure 3B).
   For multiple files of a single type of omics data ([100]Figure 3C),
   the comparison ([101]Figure 3D) is shown as a Venn diagram as well as
   in network visualization. In the 3D network visualization window,
   significant PPI and regulatory paths connecting signaling
   proteins/TFs/miRNAs to metabolic proteins are shown in color coded
   fashion. As described before, these cross-pathway links or paths
   connect X nodes (X can be signaling (S) proteins, target genes of
   TFs, or target genes of miRNAs) with metabolic (M) proteins. These
   linking paths are filtered and selected based on the expression
   and/or abundance status of the biomolecules supplied by the users,
   such that for any selected path the terminal nodes are deregulated
   and the remaining nodes are at least expressed. The corresponding
   pathways and biological functions of the proteins are also provided
   in tabular format adjacent to the network viewer. Additionally, the
   metabolites connected to the metabolic proteins that are part of the
   selected cross-pathway links are provided on the same page.
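
   The selection rule stated above (terminal nodes deregulated,
   remaining nodes at least expressed) can be written as a simple
   predicate over a path. This is a minimal sketch: the function name
   `keep_path` and the set-based inputs are hypothetical, and treating a
   deregulated intermediate node as "at least expressed" is an
   assumption of this example.

```python
def keep_path(path, deregulated, expressed):
    """Return True if a linking path passes the expression-based filter.

    path        : list of node identifiers, from X node to M node
    deregulated : set of nodes flagged as deregulated by the omics data
    expressed   : set of nodes detected as expressed
    """
    if len(path) < 2:
        return False  # a path needs two terminal nodes
    # Both terminal nodes must be deregulated.
    ends_ok = path[0] in deregulated and path[-1] in deregulated
    # Intermediate nodes must be at least expressed (assumption:
    # deregulated nodes also count as expressed).
    middle_ok = all(node in expressed or node in deregulated
                    for node in path[1:-1])
    return ends_ok and middle_ok
```

   Applying `keep_path` to every candidate path leaves only those
   supported by the uploaded data, which are the paths shown in the
   network viewer and the accompanying table.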

FIGURE 3.

   Snapshots of outputs of module “pathway connectivity analysis.” (A)
   Output page shows user provided data in tabular form along with link
   for network view. (B) Output page showing network of signaling to
   metabolic proteins connecting paths for multiple types of omics data.
   (C) Output page when multiple files for a single type of omics data
   are provided. (D) Venn diagram showing the overlap of signaling to
   metabolic protein connecting paths for different omics data sets.

Example Data Analysis Option

   The Analysis of Pan-omics Data in Human Interactome Network (APODHIN)
   example data analysis page contains important nodes
   (genes/proteins/miRNAs), pathways, and their networks with
   interacting partners specific for cancers affecting women, such as
   cervical, ovarian, and breast cancer. This section also contains
   important paths linking signaling proteins/TFs/miRNAs to metabolic
   enzymes, which could perhaps be responsible for metabolic
   reprogramming in cancer. The example content was produced by the
   APODHIN web server using publicly available cell line based omics
   data specific to cervical, ovarian, and breast cancer. [104]Figure 4
   summarizes the statistics derived from APODHIN example analyses for
   mRNA transcriptomics data of different cell lines of cervical,
   ovarian, and breast cancer. [105]Figure 4A shows the overlap of
   deregulated genes; it reveals little overlap among deregulated genes
   across cell lines for all cancers. However, there is almost complete
   overlap of the pathways mapped by deregulated genes ([106]Figure 4B).
   Nodes satisfying any two types of TINs are considered important
   interacting nodes (IINs). [107]Figure 4C shows that common IINs are
   observed between cell lines across cancer types. Similarly,
   [108]Figure 4D shows a much higher overlap of common pathways mapped
   by IINs. This demonstrates that IINs and their pathways represent the
   common core genes and processes related to a cancer type better than
   the initial deregulated genes obtained from the omics data. We also
   checked whether the mapped pathways are related to the cancer
   pathways listed in the KEGG database ([109]Kanehisa et al., 2017).
   [110]Figure 5A shows that pathways mapped by IINs are more cancer
   specific than the pathways mapped by deregulated genes for all cell
   lines. [111]Figures 4E,F show the number and overlap of deregulated
   genes and IINs that are prognostic markers of the respective cancer
   type. [112]Figure 5B shows that, compared to the deregulated genes,
   IINs contain higher fractions of prognostic markers for all cancer
   cell lines except MDAMB231. This supports the usefulness of IINs over
   deregulated genes. Moreover, as the number of IINs is much smaller
   than that of deregulated genes, the false discovery rate is also
   expected to be lower.
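
   The IIN definition used here (a node that satisfies any two types of
   TINs) and the prognostic marker fraction compared in Figure 5B both
   reduce to simple set operations. The sketch below is an illustration
   under assumed inputs: the function names and the representation of
   hubs, bottlenecks, CNs, and prognostic markers as plain sets are not
   taken from the APODHIN code.

```python
from collections import Counter


def important_interacting_nodes(hubs, bottlenecks, cns):
    """Return nodes that belong to at least two of the three TIN types."""
    counts = Counter()
    for group in (hubs, bottlenecks, cns):
        counts.update(set(group))  # count each node once per TIN type
    return {node for node, n_types in counts.items() if n_types >= 2}


def prognostic_fraction(nodes, prognostic_markers):
    """Fraction of the given nodes that are known prognostic markers."""
    nodes = set(nodes)
    if not nodes:
        return 0.0
    return len(nodes & set(prognostic_markers)) / len(nodes)
```

   Computing `prognostic_fraction` once for the deregulated genes and
   once for the IINs of the same cell line gives the two values that
   the comparison in the text relies on.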

FIGURE 4.

   Statistics derived from APODHIN database for mRNA transcriptomics data
   derived from different cell lines of cervical (HeLa, SiHa, and CaSki),
   ovarian (IGROV1, SKOV3, OVCAR3), and breast cancer (MCF7 and MDAMB231).
   Transcriptomics data was derived from the GEO datasets [115]GSE9750,
   [116]GSE19352, and [117]GSE71363, respectively. (A) Deregulated genes,
   (B) Overlap of pathways mapped by deregulated genes, (C) Overlap of
   IINs, (D) Overlap of pathways mapped by IINs, (E) Deregulated genes as
   prognostic marker, and (F) IINs as prognostic marker.

FIGURE 5.

   (A) Comparison of number of cancer specific pathways mapped by
   deregulated genes and IINs. (B) Comparison of fraction of prognostic
   markers within the deregulated genes and network analysis derived
   important nodes, such as IIN and various TIN (e.g., Hubs, CN, and BN,
   respectively). Dashed lines are drawn to separate cell lines of
   different cancer types.

   [120]Figure 6 shows the overlap of cross-pathway links or paths
   connecting signaling (S) proteins, TFs, and miRNAs to metabolic (M)
   proteins identified using omics data derived from the cell lines of
   the three types of cancers. For signaling-to-metabolic connections,
   four paths common to the three cervical cancer cell lines were
   observed. However, no such overlap was found for the breast and
   ovarian cancer cell lines.

FIGURE 6.

   Statistics derived from “pathway connectivity analysis” module of
   APODHIN database for mRNA transcriptomics data derived from different
   cell lines of cervical (HeLa, SiHa, and CaSki), ovarian (IGROV1, SKOV3,
   OVCAR3), and breast cancer (MCF7 and MDAMB231). Transcriptomics data
   was derived from the GEO datasets [123]GSE9750, [124]GSE19352, and
   [125]GSE71363, respectively. (A) Overlap of cross-pathway links
   connecting signaling (S) to metabolic (M) proteins, (B) Overlap of
   cross-pathway links connecting target genes of TFs to metabolic (M)
   proteins, and (C) Overlap of cross-pathway links connecting target
   genes of miRNAs to metabolic (M) proteins.

   Analyses of pan-omics data that combine transcriptomics, genomics,
   epigenomics, and metabolomics data in different ways are available
   for different cell lines in the example data analysis section of
   APODHIN.

Discussion

   Large-scale genomics, transcriptomics, and proteomics approaches have
   made it possible to characterize different clinical spectra
   associated with cancers. Use of pan-omics platforms and approaches in
   the analysis of systemic diseases like cancer will not only help to
   identify numerous useful biomarkers but will also expose areas for
   further improvement in therapeutic intervention. Here, we present the
   APODHIN web server, which extracts cellular interactome networks from
   the parent meta-interactome for the genes, mRNAs, miRNAs, proteins,
   and metabolites that are either deregulated or altered according to
   the user supplied single or multiple omics data. These single or
   multi-omics data specific meta-interactome networks are used to
   identify TINs and their sub-modules enriched with PPI and regulatory
   relationships via graph theory based network analyses and biological
   pathway enrichment analysis. Important interacting nodes (IINs;
   proteins and miRNAs) are identified based on the overlap of key nodes
   such as hubs and bottlenecks. Using data from The Human Protein Atlas
   database, APODHIN provides the probable prognostic status of the
   IINs. Also, as observed in our earlier work ([126]Bhattacharyya and
   Chakrabarti, 2015), IINs extracted from network topology could serve
   as prospective diagnostic and/or prognostic biomarkers or even turn
   out to be potential therapeutic targets.

   Understanding the molecular mechanisms of cancer progression, and
   developing therapeutics against these complex diseases, is difficult
   when signaling, TF, miRNA, and metabolic pathways are considered
   independently. Metabolic reprogramming is an essential hallmark of
   cancer ([127]Hanahan and Weinberg, 2011). Understanding the
   coordination among various cellular pathways, such as
   gene-regulatory, signaling, and metabolic pathways, is crucial and
   may provide clues to the molecular mechanism of metabolic adaptation
   in cancer and associated cells. Therefore, there is an urgent need
   for a systems biology model that can coordinate among
   signaling-induced proliferation and growth of tumor cells,
   transcription factor/miRNA based gene regulation, and metabolic
   processes. Hence, we focused on designing a mathematical approach to
   identify significant proteins forming interconnections between
   signaling, regulatory, and metabolic pathways. We have constructed an
   integrated network where signaling (S), regulatory (TFs and miRNAs),
   and metabolic (M) pathway entities are connected through
   protein–protein and gene regulatory interactions. Interconnections
   between regulatory components such as signaling proteins/TFs/miRNAs
   and metabolic pathways need to be elucidated rigorously to understand
   the role of oncogenes and tumor suppressors in the regulation of
   metabolism alongside their normal functions. Analyses of such
   cross-connected networks and linking paths may suggest ways to
   inhibit cancer progression in a more specific manner.

   Considering the growing demand for multi-omics data integration
   followed by systems biology based interpretation of large-scale
   “omics” data, a robust and user-friendly web-based platform is
   overdue. To make better sense of the various “omics” data, it is
   imperative to use them in a way that recapitulates the global
   picture of the complex, multi-layered cellular interactome. Several
   data portals have emerged to make multi-omics data accessible,
   visible and, more importantly, interpretable, and various programs
   and web portals interpret omics data from different perspectives.
   Each of these tools has its own merits and limitations. [128]Table 1
   provides a qualitative comparison of the features and functionalities
   of APODHIN with those of existing omics data analysis tools. OmicsNet
   ([129]Zhou and Xia, 2018) is a technically powerful web-based
   platform specifically meant for better visualization of molecular
   networks. It mainly provides varied and efficient ways of visualizing
   networks that include different components. However, it places
   minimal emphasis on network analysis and on the identification and
   interpretation of important interacting nodes and cross-pathway
   links. Moreover, this server accepts only differential omics data for
   genes/proteins and metabolites; it has no option to include
   epigenetic modification, miRNA expression, or phosphoproteomics data.
   mixOmics ([130]Rohart et al., 2017) is a software package based on
   multivariate analysis. It performs data reduction and then identifies
   combinations of biomarkers. It offers network visualization but
   considers neither network topology nor a meta-interactome. The
   software package iOmicsPASS ([131]Koh et al., 2019) considers a
   meta-interactome by including the PPI network and the TF regulatory
   network with the omics data, but it excludes the miRNA-mRNA
   regulatory network. It considers only three types of omics data
   (transcriptomics, proteomics, and DNA copy number data), which limits
   its applicability. Another R package, Miodin ([132]Ulfenborg, 2019),
   allows users to create a data analysis workflow. It considers
   different omics data, but not metabolomics data, and requires
   pre-installation of several R packages. Miodin provides Venn diagrams
   of differentially expressed genes that overlap between datasets.
   However, Miodin does not consider any meta-interactome and does not
   construct any network. None of these tools performs network topology
   analysis and provides cross-pathway connectivity information for
   proteins. APODHIN is perhaps the only available web-based platform
   that offers to (a) integrate multi-omics data onto an exhaustive
   multi-layered cellular meta-interactome network, (b) extract and
   analyze the context specific networks and sub-networks to identify
   TINs that could serve as potential biomarkers and/or therapeutic
   targets, (c) rationalize the role of the identified TINs in the given
   context via pathway enrichment and prognostic marker correlation, and
   (d) identify cross-pathway interconnections between regulatory
   components such as signaling proteins/TFs/miRNAs and metabolic
   pathways, for a better understanding of the role of oncogenes and
   tumor suppressors in the regulation of metabolic reprogramming during
   cancer. Additionally, being a web-based tool, APODHIN requires no
   software installation, high-end computing system, or technical
   expertise. We believe these features make APODHIN a useful as well as
   user-friendly application.

   However, there is still scope for improvement of the APODHIN server.
   The example data analysis part can be enriched and upgraded into a
   database. In future, we would also like to equip the server to accept
   and process raw “omics” data directly and to generate from them the
   processed data for genetic or epigenetic alterations and for
   differential expression and abundance. We would also like to add
   components for handling large numbers of datasets so that cohort data
   can be analyzed; the current version is mostly aimed at
   patient-specific personalized data. Similarly, the server, along with
   a database, should be enriched in such a way that it can be utilized
   by deep learning and artificial intelligence based tools to predict
   disease outcome, recurrence, and drug resistance.

Data Availability Statement

   The datasets presented in this study can be found in online
   repositories. The names of the repository/repositories and accession
   number(s) can be found in the article.

Author Contributions

   NB and SC designed the web server. NB created the web server. KK, SB,
   and RB provided the data for meta-interactome network. NB and SC
   analyzed the data and drafted the manuscript. All authors contributed
   to the article and approved the submitted version.

Conflict of Interest

   The authors declare that the research was conducted in the absence of
   any commercial or financial relationships that could be construed as a
   potential conflict of interest.

Footnotes

   Funding. The authors acknowledge CSIR-Indian Institute of Chemical
   Biology for infrastructural support. SC acknowledges the Systems
   Medicine Cluster (SyMeC) grant (GAP357), Department of Biotechnology
   (DBT) for funding. NB acknowledges the Systems Medicine Cluster (SyMeC)
   grant (GAP357), Department of Biotechnology (DBT) for fellowship. KK,
   SB, and RB acknowledge Department of Biotechnology (DBT), Council of
   Scientific and Industrial Research (CSIR), respectively for their
   fellowships. The funders had no role in study design, data collection
   and analysis, decision to publish, or preparation of the manuscript.
   This manuscript has been released as a pre-print at bioRxiv
   ([133]Biswas et al., 2020).

References