Abstract

Background

   Functional modules in protein-protein interaction networks (PPIN) are
   defined by maximal sets of functionally associated proteins and are
   vital to understanding cellular mechanisms and identifying disease
   associated proteins. Topological modules of the human proteome have
   been shown to be related to functional modules of PPIN. However, the
   effects of the weights of interactions between protein pairs and the
   integration of physical (direct) interactions with functional (indirect
   expression-based) interactions have not been investigated in the
   detection of functional modules of the human proteome.

Results

   We investigated functional homogeneity and specificity of topological
   modules of the human proteome and validated them with known biological
   and disease pathways. Specifically, we determined the effects on
   functional homogeneity and heterogeneity of topological modules (i)
   with both physical and functional protein-protein interactions; and
   (ii) with incorporation of functional similarities between proteins as
   weights of interactions. With functional enrichment analyses and a
   novel measure for functional specificity, we evaluated functional
   relevance and specificity of topological modules of the human proteome.

Conclusions

   The topological modules ranked using specificity scores show high
   enrichment with gene sets of known functions. Physical interactions in
   PPIN contribute to high specificity of the topological modules of the
   human proteome whereas functional interactions contribute to high
   homogeneity of the modules. Weighted networks yield a larger number of
   topological modules but do not affect their functional propensity.
   Modules of the human proteome are more homogeneous for molecular
   functions than for biological processes.

Electronic supplementary material

   The online version of this article (10.1186/s12859-018-2549-8) contains
   supplementary material, which is available to authorized users.

   Keywords: Topological modules, Functional modules, Physical PPI,
   Functional PPI, Functional enrichment analysis, Protein-protein
   interaction networks

Background

   Even after decades of research on human genes, gene products and their
   functions, our understanding of the genotype-phenotype relationship is
   far from complete. Biomolecules (genes, RNA, proteins, metabolites)
   interact with each other and with environmental factors to accomplish
   various biological processes. Representing these interactions as
   biological networks (metabolic, protein-protein interaction, gene
   regulatory, co-expression) and analysing them provides insights for
   finding genes associated with cellular processes such as immune
   response and signalling pathways, or with a complex disease like cancer
   [[27]1].

   Currently, 20,231 proteins of the human proteome have been identified
   [[28]2], but the landscape of their interactions is only partially
   known. Protein interactions may be physical, when amino acid residues
   of the proteins are in direct contact through forces such as
   electrostatic or hydrophobic interactions, or functional, when a
   protein influences the activity of another protein through regulation,
   co-expression, or some other genetic interaction [[29]3, [30]4]
   (Fig. [31]1). Large scale experiments like yeast two-hybrid and affinity
   purification coupled to mass spectrometry identify physical protein
   interactions [[32]5, [33]6], while high throughput expression
   techniques like microarray and RNA-seq elucidate functional links
   between proteins [[34]7, [35]8].

Fig. 1.


   Illustrations of physical, functional and combined protein-protein
   interaction networks (PPIN)

   Protein-protein interaction networks (PPIN), like most biological
   networks, are believed to be modular in nature [[37]4, [38]9, [39]10],
   and detecting functional modules of PPIN is vital for understanding
   gene-function associations and designing therapeutics. Topological
   modules are sub-networks whose nodes are more densely connected to
   each other than to the nodes of other modules [[40]11]. A functional
   module, on the other hand, is a sub-network whose members contribute
   to similar biological functions [[41]4, [42]9]. Computational methods
   that accurately infer functional and disease modules of the human
   proteome would be of paramount importance for studying cellular and
   disease mechanisms.

   Numerous computational algorithms have been applied to biological
   networks to identify modules from the networks’ topological
   properties based on node neighbours [[43]12], edge weights [[44]13] and
   modularity [[45]14, [46]15]. Other sub-network identifying algorithms,
   including those finding core and loop structures [[47]16, [48]17],
   cliques [[49]18] and frequent graph patterns [[50]19], have also been
   used to find topological modules in biological networks. However,
   only a few studies have compared the functional properties of
   topological modules and their relevance to functional modules
   [[51]20–[52]22]. The usual approach to evaluating the functional
   significance of topological modules is to perform functional
   enrichment analysis and select the significantly enriched biological
   functions [[53]21, [54]23, [55]24]. This approach is, however,
   inconclusive in determining the functional coherence and specificity
   of topological modules [[56]25]. In the present work, we introduce a
   novel functional specificity measure that encompasses both functional
   homogeneity and heterogeneity of the topological modules. Top ranked
   topological modules are thereby identified and validated for their
   functional specificity.

   We combine functional interactions inferred from expression data
   [[57]26, [58]27] and physical interactions of PPIN [[59]6, [60]16] to
   provide holistic functional attributes to protein nodes and
   interactions of the network for the determination of functional modules
   [[61]28–[62]30]. Though several studies have reported characteristics
   of the resulting modules of different biological networks [[63]13,
   [64]17, [65]21], there is a need for a systematic study elucidating the
   effects of using both functional and physical interactions of PPIN on
   detecting topological and functional modules. Previously, Theofilatos
   et al. and Lubovac et al. applied weighted PPIN to predict protein
   complexes using a Markov clustering based approach and a ranking
   measure based on a weighted neighbourhood property, respectively
   [[66]31, [67]32]. Here, in contrast, we investigate the role of edge
   weights derived from gene functional similarities in the detection of
   modules in PPIN.

   Our contributions in this study are (i) evaluation of functional
   coherence and specificity of the topological modules of the human
   proteome by using novel measures, (ii) determination of the effect of
   using both direct physical and indirect functional links of PPIN on
   detection of functional modules, and (iii) systematic analysis of
   incorporating functional context of interactions as edge weights using
   functional similarities of genes. We used three different PPIN
   datasets of the human proteome and the Louvain community detection
   algorithm [[68]14] for module detection. The weighted PPIN were
   generated by calculating functional similarity between interacting
   proteins by using molecular functions, biological processes and
   cellular components of Gene Ontology (GO) [[69]33]. We also elaborate
   on how physical and functional interactions between proteins affect
   functional diversity of topological modules.

Results

Physical and functional PPIN

   The present study considers three types of human PPIN based on
   physical, functional, and combined interactions as given in
   Table [70]1. The strengths or weights of protein-protein interactions
   with respect to their functional context (MF, BP and CC) are calculated
   from functional similarities in the respective GO context, using the
   Wang measure [[71]34]. This led to nine sets of weighted PPIN, whose
   network properties are listed in Table [72]2.

Table 1.

   Properties of different binary PPIN: physical (P), functional (F), and
   combined (C)
   Network Nodes  Edges   Avg. degree Avg. path length Diameter Edge density Clustering coeff. Giant component size
   P       13,269 98,013  14.73       6.95             11       0.0011       0.15              13,177
   F       11,362 613,865 108.06      3.14             11       0.0095       0.26              11,271
   C       15,562 700,640 90.04       6.10             11       0.0057       0.20              15,518

Table 2.

   Properties of weighted PPIN: physical (P), functional (F) and combined
   (C) PPIN weighted by functional contexts: molecular function (MF),
   biological process (BP) or cellular components (CC)
 Network Nodes   Edges  Avg. degree Avg. path length Diameter Edge density Clustering coeff.
 P-MF    13,269 98,013  9.06        1.54             5.73     0.0007       0.133
 P-BP                   5.25        0.60             5.13     0.0004       0.136
 P-CC                   8.85        1.40             5.36     0.0007       0.135
 F-MF    11,362 613,865 43.96       0.42             4.77     0.0038       0.223
 F-BP                   26.12       0.27             5.06     0.0023       0.222
 F-CC                   46.90       0.64             5.45     0.0040       0.226
 C-MF    15,562 700,640 38.82       0.63             5.03     0.0025       0.176
 C-BP                   22.76       0.28             4.12     0.0015       0.178
 C-CC                   40.53       0.66             4.36     0.0026       0.180

   PPIN like other biological networks such as metabolic and
   gene-regulatory networks are characterised by specific interactions
   between proteins (nodes) and functions of proteins and therefore
   demonstrate small world properties (i.e., short path length) and scale
   free characteristics (i.e., few nodes with large number of neighbours)
   (Tables [75]1 and [76]2).
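
   The average degree and edge density of a binary network follow
   directly from its node and edge counts. The sketch below assumes the
   standard definitions for a simple undirected graph (average degree
   2E/N, edge density 2E/(N(N-1))); the function names are illustrative
   and not taken from the study, and small differences from the tabulated
   values may arise from rounding or network preprocessing.

```python
def average_degree(n_nodes: int, n_edges: int) -> float:
    """Mean number of neighbours per node in a simple undirected graph."""
    return 2.0 * n_edges / n_nodes


def edge_density(n_nodes: int, n_edges: int) -> float:
    """Fraction of all possible node pairs that are connected by an edge."""
    return 2.0 * n_edges / (n_nodes * (n_nodes - 1))


# Node and edge counts of the binary networks as reported in Table 1.
networks = {"P": (13269, 98013), "F": (11362, 613865), "C": (15562, 700640)}

for name, (n, e) in networks.items():
    print(f"{name}: avg. degree = {average_degree(n, e):.2f}, "
          f"edge density = {edge_density(n, e):.4f}")
```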

Topological modules

   Topological modules of binary and weighted PPIN were detected using
   Louvain algorithm and analysed to investigate how (i) different
   interactions (physical and functional) and (ii) different biological
   contexts (i.e., MF, BP and CC ontologies) affect the functional
   properties of the modules.

   As shown in Table [77]3, the number of modules predicted for different
   networks varies considerably although the modularity values remain
   almost the same. We note that the number of modules predicted for
   weighted networks (1586 to 2912) is much larger than that for binary
   networks (34 to 64), but only 0.3 to 1.2% of these modules are
   mesoscale (size > 10) as compared to 20–27% for binary networks. A
   closer inspection of Figs. [78]2, [79]3 and [80]4 finds that most of
   the modules are of size two, corresponding to isolated protein pairs
   whose interactions with other proteins are either unknown or weak.
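
   The modularity value that the Louvain algorithm maximizes can be
   computed for any given partition. The following is a minimal sketch of
   the standard Newman modularity for an undirected, optionally weighted
   network; it is not the implementation used in the study, and the
   function name, edge-list format and toy graph are assumptions made for
   illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable, float]


def modularity(edges: Iterable[Edge], membership: Dict[Hashable, int]) -> float:
    """Newman modularity Q of a partition of an undirected weighted graph.

    Q = sum over modules c of [ w_in(c) / m - (deg(c) / (2 m))^2 ],
    where m is the total edge weight, w_in(c) is the weight of edges with
    both ends in module c, and deg(c) is the summed weighted degree of
    the nodes in c.
    """
    total = 0.0
    inside = defaultdict(float)
    degree = defaultdict(float)
    for u, v, w in edges:
        total += w
        degree[membership[u]] += w
        degree[membership[v]] += w
        if membership[u] == membership[v]:
            inside[membership[u]] += w
    if total == 0:
        return 0.0
    return sum(inside[c] / total - (degree[c] / (2 * total)) ** 2 for c in degree)


# Toy binary network (all weights 1.0): two triangles joined by one edge.
toy_edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
             (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
             (2, 3, 1.0)]
two_modules = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(f"Q = {modularity(toy_edges, two_modules):.3f}")
```

   For a weighted PPIN, the third element of each edge would hold the
   functional similarity of the interacting proteins instead of 1.0.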

Table 3.

   Properties of topological modules of different PPIN
   Network Modularity Number of Modules Mesoscale^a modules (%) Largest Module Size Network edge density Module edge density
   P       0.43       64                26.6                    2150                0.0011               0.0014 ± 0.01
   P-MF    0.46       1586              1.2                     1658                0.0007               0.0006 ± 0.0003
   P-BP    0.53       2213              1.0                     1988                0.0004               0.0004 ± 0.0002
   P-CC    0.45       1754              0.9                     2159                0.0007               0.0006 ± 0.0003
   F       0.52       54                20.4                    2730                0.0095               0.007 ± 0.004
   F-MF    0.52       1777              0.6                     2367                0.0038               0.005 ± 0.003
   F-BP    0.55       1999              0.7                     1998                0.0023               0.002 ± 0.001
   F-CC    0.50       1700              0.7                     2565                0.0040               0.009 ± 0.015
   C       0.51       34                23.5                    6882                0.0057               0.004 ± 0.003
   C-MF    0.50       2391              0.3                     6484                0.0025               0.003 ± 0.003
   C-BP    0.53       2912              0.4                     4186                0.0015               0.002 ± 0.001
   C-CC    0.48       2430              0.5                     4869                0.0026               0.002 ± 0.001

   ^aMesoscale modules refer to the modules with size more than 10
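
   The mesoscale percentage reported in Table 3 is a simple count over
   module sizes. A minimal sketch follows, assuming only the size
   threshold stated in the footnote (more than 10 nodes); the function
   name and the toy size list are hypothetical.

```python
from typing import Iterable


def mesoscale_percentage(module_sizes: Iterable[int], min_size: int = 10) -> float:
    """Percentage of modules that contain more than `min_size` nodes."""
    sizes = list(module_sizes)
    if not sizes:
        return 0.0
    meso = sum(1 for s in sizes if s > min_size)
    return 100.0 * meso / len(sizes)


# Hypothetical module sizes: many isolated pairs plus a few larger modules.
toy_sizes = [2] * 40 + [3] * 7 + [11, 25, 300]
print(f"{mesoscale_percentage(toy_sizes):.1f}% of modules are mesoscale")
```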

Fig. 2.


   Size distributions for modules detected using Louvain algorithm in
   physical networks of human proteome: x-axis represents the size of
   modules while y-axis represents the count of meso-modules of size more
   than 10 nodes. P denotes the binary physical network while P-MF, P-BP
   and P-CC denote the weighted networks with edges scored according to
   functional similarity based on molecular functions (MF), biological
   process (BP) and cellular component (CC), respectively

Fig. 3.


   Size distributions of modules detected using Louvain algorithm in
   functional networks of human proteome. x-axis represents the size of
   modules while y-axis represents the count of meso-modules of size more
   than 10 nodes. F denotes the binary functional network while F-MF, F-BP
   and F-CC denote the weighted networks with edges scored according to
   similarity based on molecular functions (MF), biological process (BP),
   and cellular component (CC), respectively

Fig. 4.


   Size distributions for modules detected using Louvain algorithm in
   combined networks (physical and functional) of human proteome. x-axis
   represents the size of modules while y-axis represents the count of
   meso-modules of size more than 10 nodes. C denotes the binary combined
   network while C-MF, C-BP and C-CC denote the weighted networks with
   edges scored according to similarity based on molecular functions
   (MF), biological process (BP), and cellular components (CC),
   respectively

Biological relevance of PPIN modules

   More importantly, proteins in topological modules ought to share the
   same functional profile. To study functional relevance of topological
   modules in the human proteome, mesoscale modules from all networks were
   tested for their biological relevance by using functional enrichment
   analysis. The enriched function set F is given by the union of all
   significantly enriched functions across topological modules and
   functional specificities of the set of enriched functions were computed
   for each PPIN. Figure [88]5 (and Additional file [89]1: Figure S2)
   shows the distribution of significantly enriched biological functions
   and size of topological modules of binary and weighted physical PPIN.
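
   Functional enrichment of a module for a GO term is commonly assessed
   with a one-sided hypergeometric test. The sketch below assumes that
   test; the test actually used, and any multiple-testing correction, are
   not stated in this section, and the function name, parameter names and
   example numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int,
                       module_size: int, hits: int) -> float:
    """P(X >= hits) for X ~ Hypergeometric(population, annotated, module_size).

    population  : number of annotated proteins in the whole network
    annotated   : proteins in the network carrying the GO term
    module_size : proteins in the module under test
    hits        : proteins in the module carrying the GO term
    """
    upper = min(annotated, module_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, module_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(population, module_size)


# Hypothetical example: 20 of 200 module proteins carry a term that
# annotates 300 of 15,000 proteins in the network.
p = enrichment_p_value(population=15000, annotated=300, module_size=200, hits=20)
print(f"p = {p:.3e}")
```

   In practice the p-values of all tested terms would be corrected for
   multiple testing before a term is called significantly enriched.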

Fig. 5.


   Functional enrichment analyses of topological modules: (a) and (b) show
   distributions of enriched molecular functions in topological modules of
   PPIN networks. X-axis, Y-axis (left) and Y-axis (right) represent the
   modules, number of statistically significant GO terms, and size of
   modules, respectively. See Additional file [92]1: Figure S2 for the set
   of enriched biological processes and cellular locations in the modules

Functional homogeneity and specificity of topological modules

   Functional homogeneity of a module quantifies functional consistency of
   a topological module as defined by the maximal fraction of proteins
   associated with a biological function. The homogeneity ranges from 0 to
   1 where a value of 1 indicates that all genes in the module exhibit
   that function. A module’s heterogeneity value estimates how specific a
   function is for a particular module.
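
   The homogeneity definition above (the maximal fraction of module
   proteins associated with a single function) can be sketched as
   follows. The function name, the annotation dictionary and the toy
   proteins are hypothetical; proteins without annotations are assumed to
   count towards the module size.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Set


def module_homogeneity(module: Iterable[Hashable],
                       annotations: Dict[Hashable, Set[str]]) -> float:
    """Largest fraction of module proteins that share one function."""
    proteins = set(module)
    if not proteins:
        return 0.0
    counts = Counter()
    for protein in proteins:
        # Each protein contributes at most once per function.
        counts.update(set(annotations.get(protein, ())))
    if not counts:
        return 0.0
    return max(counts.values()) / len(proteins)


# Hypothetical module of four proteins with molecular-function labels.
toy_annotations = {
    "A": {"kinase activity", "ATP binding"},
    "B": {"kinase activity"},
    "C": {"kinase activity", "DNA binding"},
    "D": {"DNA binding"},
}
print(module_homogeneity({"A", "B", "C", "D"}, toy_annotations))
```

   A value of 1 is reached only when a single function annotates every
   protein in the module.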

   A recent study of human proteome [[93]35] discussed how most of the
   topological modules are functionally diverse despite high homogeneity
   values. In our study, we further this observation by including
   functional interactions and incorporating the weights to PPIN. As shown
   in Table [94]4, the MF and BP homogeneity values are observed to be
   higher (0.79 and 0.59) for physical networks than functional networks
   (0.64 and 0.57) whereas cellular localizations (~ 0.7) do not vary much
   across different networks. We conclude that functional interactions
   lead to low homogeneity values in networks because they mostly
   represent cross talk between modules, with little variation in
   cellular localization. For example, cross talk in TGF-beta signalling
   is known to be involved in many developmental defects and cancer
   [[95]36]. This observation concurs with homogeneity values derived
   from gene-disease associations (a type of functional interaction) in
   disease networks [[96]24, [97]37].

Table 4.

   Functional homogeneity of mesoscale modules detected by Louvain
   algorithm, evaluated using three ontologies: MF, BP, and CC
          PPIN               MF              BP            CC
                       max  mean  std  max  mean std  max  mean std
   Physical   Binary   0.81 0.79 0.01  0.85 0.59 0.24 0.78 0.75 0.04
              Weighted 0.83 0.80 0.02  0.75 0.42 0.31 0.80 0.77 0.01
   Functional Binary   0.71 0.64 0.12  0.72 0.57 0.25 0.8  0.72 0.25
              Weighted 0.71 0.42 0.22  0.74 0.60 0.25 0.8  0.76 0.07
   Combined   Binary   0.74 0.73 0.01  0.70 0.58 0.25 0.77 0.75 0.01
              Weighted 0.73 0.72 0.002 0.72 0.45 0.29 0.76 0.75 0.01

   Table [99]5 shows heterogeneity values for enriched functions of the
   modules. On average, molecular function homogeneity was observed to be
   higher than bioprocess homogeneity for physical (0.80 > 0.42) and
   combined (0.72 > 0.45) networks except for functional networks
   (0.42 < 0.60). However, homogeneity and heterogeneity values are more
   varied (higher standard deviation) for functional PPIN than for
   physical and combined PPIN. Thus, it is advantageous to integrate
   physical protein interactions with expression based networks for
   functional analyses, as attempted in some reported studies [[100]29,
   [101]38].

Table 5.

   Functional heterogeneity of modules detected by Louvain algorithm,
   calculated for all the enriched functions
          PPIN               MF             BP            CC
                       min  mean std  min  mean std  min  mean std
   Physical   Binary   0.05 0.07 0.05 0.04 0.09 0.06 0.05 0.09 0.08
              Weighted 0.04 0.08 0.12 0.04 0.09 0.05 0.05 0.21 0.20
   Functional Binary   0.09 0.22 0.14 0.09 0.17 0.12 0.09 0.25 0.16
              Weighted 0.08 0.20 0.14 0.07 0.16 0.15 0.09 0.24 0.16
   Combined   Binary   0.14 0.20 0.12 0.14 0.25 0.15 0.14 0.28 0.21
              Weighted 0.13 0.21 0.18 0.07 0.10 0.05 0.09 0.31 0.19

Effect of the resolution limit on module detection in PPIN

   Modularity-based algorithms for module detection often suffer from a
   resolution limit [[103]39], as the scale of modularization depends upon
   the inter-connectedness of the modules. This leads to an inability to
   detect smaller modules in a given network. To study the effect of the
   resolution limit on detecting topological modules, we also implemented
   the Incremental Louvain algorithm [[104]35], which first finds modules
   by maximizing modularity and then incrementally modularizes larger
   modules into smaller sub-networks, thus converging the algorithm for
   modules with size greater than a threshold size.

   Here, we observed on average eight times more mesoscale modules than
   with the Louvain algorithm, the majority of modules being smaller, in
   the size range of 10 to 200 (Additional file [105]1: Figure S4). For
   the smaller modules detected using the Incremental Louvain algorithm,
   an increase in the homogeneity values is observed when indirect
   functional interactions are combined with physical PPI (Tables [106]6
   and [107]7). In contrast, functional homogeneities of modules detected
   with the Louvain algorithm decreased when functional interactions were
   introduced into the PPI network. This difference can be attributed to
   the difference in module sizes. When compared with respect to the three
   ontologies, the homogeneity of modules shows on average a 3.4% decrease
   for MF, a 47.08% increase for BP and a 4.6% decrease for CC.
   Heterogeneity values decreased substantially for these smaller modules
   (85.1, 78.9 and 87% decrease in MF, BP and CC, respectively). Weighting
   interactions in the PPI network improves the homogeneity of these
   modules, but no change in heterogeneity values is observed.

Table 6.

   Functional homogeneity of mesoscale modules detected by Incremental
   Louvain algorithm, evaluated using three ontologies: MF, BP, and CC
          PPIN               MF             BP            CC
                       max  mean std  max  mean std  max mean std
   Physical   Binary   1    0.57 0.28 1    0.67 0.29 1   0.65 0.28
              Weighted 1    0.63 0.27 1    0.81 0.20 1   0.76 0.22
   Functional Binary   1    0.59 0.25 0.97 0.73 0.22 1   0.72 0.23
              Weighted 0.97 0.69 0.25 1    0.85 0.16 1   0.72 0.24
   Combined   Binary   1    0.60 0.26 1    0.72 0.26 1   0.70 0.24
              Weighted 0.98 0.65 0.26 1    0.82 0.19 1   0.74 0.21

Table 7.

   Functional heterogeneity of the modules detected by Incremental Louvain
   algorithm, calculated for all the enriched functions
          PPIN                MF              BP              CC
                        min  mean  std   min  mean std   min  mean std
   Physical   Binary   0.008 0.017 0.02 0.008 0.02 0.03 0.008 0.02 0.03
              Weighted 0.007 0.017 0.02 0.008 0.03 0.04 0.008 0.03 0.04
   Functional Binary   0.01  0.02  0.03 0.01  0.03 0.03 0.01  0.03 0.04
              Weighted 0.01  0.03  0.03 0.01  0.04 0.04 0.01  0.03 0.05
   Combined   Binary   0.007 0.02  0.02 0.007 0.02 0.02 0.007 0.02 0.04
              Weighted 0.008 0.02  0.02 0.007 0.02 0.03 0.01  0.03 0.05

Functionally specific modules

   The specificity of a particular function takes both its homogeneity
   within the module and its diversity across the modules into account.
   The normalized specificity scores for all significantly enriched
   functions across modules are summarized in Fig. [110]6. As seen from
   the patterns of homogeneity and heterogeneity values, physical PPIN
   produce more functionally specific modules (highly homogeneous and less
   diverse) than functional and combined PPIN, underscoring the benefit of
   including proteomics when analysing expression based networks for the
   identification of functional modules.

Fig. 6.


   Functional specificity of significantly enriched molecular functions of
   topological modules. See Additional file [113]1: Figure S3 for
   specificity scores of topological modules for BP and CC enrichment

   Topological modules were ranked using the specificity score, and we
   labelled the modules with normalized specificity greater than 0.90 as
   functionally specific modules and the others as general modules.
   Table [114]8 summarizes the biological functions and Table [115]9
   lists the enriched biological pathways of specific modules. The main
   functions specific to the modules were enzymatic activities such as
   kinase, hydrolase, and transferase, and protein and nucleotide binding
   activities. About 36 to 55% of topological modules in binary networks
   and 14 to 32% in weighted networks were classified as specific modules
   according to the above criterion. More modules are found to be
   functionally specific (55%) in physical PPIN than in functional and
   combined PPIN (Table [116]8). This is in agreement with the
   heterogeneity and homogeneity values of physical networks. This may be
   attributed to the fact that direct interactions between proteins,
   which are elucidated through high throughput screening experiments
   [[117]6, [118]40], are more often studied and more widely annotated
   with functions, and that annotation based functional enrichment
   analysis is affected by missing gene-function annotations. See
   Additional file [119]1: Tables S1 and S2 for specific modules enriched
   in biological processes and cellular locations.
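
   The labelling step described above can be sketched as a threshold on
   normalized scores. How the specificity scores are normalized is not
   stated in this section; the sketch assumes division by the maximum
   score, and the function name, module names and score values are
   hypothetical.

```python
from typing import Dict, List, Tuple


def split_by_specificity(scores: Dict[str, float],
                         threshold: float = 0.90) -> Tuple[List[str], List[str]]:
    """Return (specific, general) module names.

    Assumption: scores are normalized by dividing by the maximum score;
    a module is 'specific' if its normalized score exceeds `threshold`.
    """
    top = max(scores.values(), default=0.0)
    specific: List[str] = []
    general: List[str] = []
    for name, score in scores.items():
        normalised = score / top if top > 0 else 0.0
        (specific if normalised > threshold else general).append(name)
    return specific, general


# Hypothetical raw specificity scores of four modules.
toy_scores = {"M1": 4.0, "M2": 3.7, "M3": 1.2, "M4": 0.4}
specific, general = split_by_specificity(toy_scores)
print("specific:", specific)
print("general:", general)
```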

Table 8.

   The percentage (%), mean size, and summary of molecular functions of
   the specific modules of physical (P), functional (F) and combined (C)
   PPIN
   Network % Mean Size (std.) Specific molecular functions of modules
   P 0.55 1523 (390) Module1: cytoskeletal protein binding; Module2:
   receptor activity; Module3: cation binding, dimerization/transcription
   factor activity; Module4: cyclic compound/RNA/chromatin binding,
   Wnt-activated receptor activity; Module5: DNA binding; Module6:
   pyruvate dehydrogenase (acetyl-transferring) kinase activity
   P-weighted 0.32 1215 (262) Module1: deacetylase activity; Module2:
   cyclic compound/nucleotide/ATP/nucleoside binding, kinase/transferase
   activity; Module3: amide/peptide binding; Module4: transferase
   activity; Module5: protein domain specific binding; Module6: DNA
   binding
   F 0.36 2298 (223) Module1: RNA binding; Module2:
   ssDNA/nucleotide/nucleoside/GTP/Mg ion binding,
   oxidoreductase/transferase/kinase activity, transmembrane transporter
   activity; Module3: ATPase/DNA helicase activity, chromatin binding
   F-weighted 0.42 1036 (978) Module1: hydrolase/transferase activity,
   TF/transcription regulator/transcription coactivator/transcription
   cofactor/transmembrane transporter activity, CCR5 chemokine receptor
   binding; Module2: ATPase/hydrolase activity, transmembrane transporter
   activity; Module3: kinase activity, DNA binding; Module3: ATPase
   activity; Module4: alcohol binding
   C 0.38 4011 (2495) Module1: lamin binding; Module2: cyclic
   compound/ion/DNA/enzyme/small molecule/ATP/chromatin/protein
   kinase/nucleotide/nucleoside binding, transcription regulator/protein
   dimerization/kinase/nucleoside-triphosphatase/phosphotransferase
   activity; Module3: macrolide/FK506 binding
   C-weighted 0.14 6484 (0) Module1: ion/cyclic
   compound/DNA/cation/enzyme/identical protein binding,
   catalytic/transferase activity

Table 9.

   The top enriched protein pathways in the specific modules of physical
   (P), functional (F) and combined (C) PPIN. Pathways are mapped using
   PANTHER Pathway database ([121]http://www.pantherdb.org/pathway/)
   Network Specific pathways of modules
   P Inflammation mediated by chemokine and cytokine signalling pathway;
   gonadotropin-releasing hormone receptor pathway; Wnt signalling
   pathway; Integrin signalling pathway; CCKR signalling map;
   Heterotrimeric G-protein signalling pathway; Angiogenesis; PDGF
   signalling pathway; Apoptosis signalling pathway; EGF receptor
   signalling pathway
   P-weighted Inflammation mediated by chemokine and cytokine signalling
   pathway; gonadotropin-releasing hormone receptor pathway; Wnt
   signalling pathway; Integrin signalling pathway; PDGF signalling
   pathway; Heterotrimeric G-protein signalling pathway; Angiogenesis;
   Apoptosis signalling pathway; CCKR signalling map; Angiogenesis; EGF
   receptor signalling pathway; FGF signalling pathway; Huntington
   disease; Cadherin signalling pathway; Alzheimer disease-presenilin
   pathway
   F Inflammation mediated by chemokine and cytokine signalling pathway;
   gonadotropin-releasing hormone receptor pathway; Wnt signalling
   pathway; Integrin signalling pathway, CCKR signalling map;
   Angiogenesis; EGF receptor signalling pathway; Huntington disease;
   Alzheimer disease-presenilin pathway; TGF-beta signalling pathway; PDGF
   signalling pathway; Heterotrimeric G-protein signalling pathway;
   Nicotinic acetylcholine receptor signalling pathway
   F-weighted Inflammation mediated by chemokine and cytokine signalling
   pathway; gonadotropin-releasing hormone receptor pathway; Wnt
   signalling pathway; Integrin signalling pathway, CCKR signalling map;
   Angiogenesis; EGF receptor signalling pathway; FGF signalling pathway;
   Heterotrimeric G-protein signalling pathway; PDGF signalling pathway;
   Huntington disease; Alzheimer disease-presenilin pathway; B-cell
   activation; Parkinson disease; Insulin/IGF pathway; Interleukin
   signalling pathway; Ionotropic glutamate receptor pathway; Mannose
   metabolism; Pyridoxal-5-phosphate biosynthesis; PDGF signalling pathway
   C Inflammation mediated by chemokine and cytokine signalling pathway;
   gonadotropin-releasing hormone receptor pathway; Wnt signalling
   pathway; Integrin signalling pathway, CCKR signalling map;
   Angiogenesis; Heterotrimeric G-protein signalling pathway; EGF receptor
   signalling pathway; PDGF signalling pathway; Huntington disease; FGF
   signalling pathway; Apoptosis signalling pathway
   C-weighted Inflammation mediated by chemokine and cytokine signalling
   pathway; gonadotropin-releasing hormone receptor pathway; Wnt
   signalling pathway; Integrin signalling pathway, CCKR signalling map;
   Angiogenesis; Heterotrimeric G-protein signalling pathway; EGF receptor
   signalling pathway; FGF signalling pathway; Cadherin signalling pathway

Biological validation by pathway enrichment analysis

   To validate biological relevance of top ranked specific modules, their
   enrichment with genes from experimentally known biological pathways was
   computed. Four gene sets of known pathways were considered: glycolysis,
   transcriptional regulation, lung cancer and breast cancer, and their
   details [[123]40–[124]42] are given in Table [125]10. The breast and
   lung cancer pathway sets have a total of 363 and 300 genes,
   respectively, of which 347 and 286 are present in the physical, 260 and
   219 in the functional, and 349 and 288 in the combined PPIN. Out of 244
   genes from the glycolysis pathway, 158 are present in the physical, 187
   in the functional and 226 in the combined PPIN.

Table 10.

   Details of biological pathways used for validation of functional
   modules
   Biological Pathway         No. of genes Overlap with                                Source
                                           Physical PPIN Functional PPIN Combined PPIN
   Glycolysis                 262          203           187             229           KEGG [[126]41], MSigDB [[127]42]
   Transcriptional regulation 1705         1554          1243            1640          Rolland et al. [[128]40]
   Lung cancer                300          286           219             288           Rolland et al. [[129]40]
   Breast cancer              363          347           260             349           Rolland et al. [[130]40]

   The fractions of known pathway genes that overlap with specific and
   general modules were calculated in order to estimate the validity of
   the topological modules. As shown in Fig. [132]7, specific
   modules from binary combined PPIN retrieved ~ 79% of breast and lung
   cancer genes as compared to 43–45% by modules of weighted PPIN. In a
   similar fashion, for physical and functional PPIN, specific modules of
   binary PPIN were enriched with more cancer pathway genes (69 and 89%
   for breast cancer, 69 and 85% for lung cancer) than respective modules
   from weighted PPIN (56 and 45% for breast cancer, ~ 49% for lung
   cancer). Specific modules of binary networks were also highly enriched
   with 70, 90, and 76% of glycolysis genes and 71, 87 and 77% of
   transcriptional regulation genes in physical, functional and combined
   networks, respectively.
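
   The overlap between a pathway gene set and a group of modules can be
   sketched with set operations. The sketch assumes that the denominator
   is the full list of pathway genes supplied by the caller (for example,
   those present in the PPIN); the function name and the toy gene symbols
   are hypothetical.

```python
from typing import Hashable, Iterable


def pathway_overlap(pathway_genes: Iterable[Hashable],
                    modules: Iterable[Iterable[Hashable]]) -> float:
    """Percentage of pathway genes found in the union of the given modules."""
    pathway = set(pathway_genes)
    if not pathway:
        return 0.0
    covered = set()
    for module in modules:
        covered.update(module)
    return 100.0 * len(pathway & covered) / len(pathway)


# Hypothetical pathway of five genes and two groups of modules.
toy_pathway = {"G1", "G2", "G3", "G4", "G5"}
specific_modules = [{"G1", "G2", "X"}, {"G3", "Y"}]
general_modules = [{"G4", "Z"}]

print(f"specific: {pathway_overlap(toy_pathway, specific_modules):.0f}%")
print(f"general:  {pathway_overlap(toy_pathway, general_modules):.0f}%")
```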

Fig. 7.


   Overlap of specific topological modules (specificity score > 0.9) and
   general topological modules (specificity score < 0.9) with
   experimentally known biological pathways: glycolysis, transcriptional
   regulation, lung cancer and breast cancer. Bars represent overlap of
   genes involved in biologically validated pathways with specific modules
   (brown colour) and general modules (green colour). Topological modules
   are detected via molecular function enrichment for binary and weighted
   physical (P), functional (F) and combined (C) PPINs

Discussion

   The three different PPIN (physical, functional and combined) were
   modularized and their functional relevance was analysed using
   functional enrichment analysis. As observed from Table [135]1, physical
   PPIN are sparser (higher average path length and lower edge density)
   than functional PPIN, owing to the high number of functional
   interactions and the noise in gene expression experiments. For weighted
   networks (Table [136]2), the edges with low functional similarity
   between proteins reduce the average path length to lower values than
   in binary PPIN (0.2 to 1.5 as compared to 3.1 to 6.9). There is a high
   overlap between functional and physical PPIN, with 9069 nodes common
   to the two, underlining that most proteins involved in physical
   interactions also exert functional interactions. However, a small
   number of non-overlapping edges between physical and functional PPIN
   suffices to change the edge density and clustering coefficient of the
   combined network.

   When the three PPIN are modularized using the Louvain algorithm, the
   size distribution of topological modules (Figs. [137]2, [138]3, [139]4
   and Table [140]3) shows that weighting interactions with functional
   similarities of proteins removes weak protein-protein interactions in
   PPIN and leads to a higher number of compact modules.

   Figure [141]5 (and Additional file [142]1: Figure S2) shows the
   functionally enriched GO terms in the PPIN modules. The number of
   enriched cellular functions and processes is observed to be higher for
   the weighted PPIN despite the smaller size of the modules. The number
   of cellular locations, however, decreases with the inclusion of weights
   of protein interactions. Overall, combined PPIN are enriched with more
   GO terms than physical and functional binary PPIN: approximately 1.5 to
   3 times more biological processes, up to 3 times more molecular
   functions and approximately 1.4 times more cellular locations.

   Functional homogeneity analysis (Tables [143]4 and [144]5) shows
   physical PPIN modules to be more specific than functional PPIN modules,
   particularly in the case of molecular functions as compared to
   biological processes and cellular localizations. Overall, homogeneity
   and heterogeneity values are not much different when weighted
   interactions are considered, indicating that topological modules are
   resilient to edge weights as far as functional annotations are
   concerned. We also conclude that topological modules in PPIN are more
   homogeneous and specific in molecular functions, and less homogeneous
   (more diverse) in terms of biological processes. This is in agreement
   with the fact that a biological process may involve multiple sets of
   molecular functions, and thus functional modules map to a number of
   molecular functions but to fewer biological processes. Most
   importantly, the results indicate that the functional modules are more
   homogeneous and specific when direct interactions in PPIN are also
   considered (as seen in the combined network), a fact to be kept in mind
   when identifying biologically relevant modules by using computational
   methods. To study the effect of the resolution limit on functional
   properties of modules, the three PPIN were modularized using the
   Incremental Louvain algorithm, which resulted in eight times more
   modules, smaller in size, than the Louvain algorithm (Additional file
   [145]1: Figure S4). Despite the differences, enrichment analyses of
   modules from both types of algorithm show that physical networks are
   more specific than functional ones (see Tables [146]6 and [147]7). Thus
   topological modules are more specific and homogeneous when direct
   interactions are considered together with indirect functional
   associations (such as those derived from co-expression or microarray
   based experiments).

   A specificity score is introduced in this study that considers both
   functional homogeneity and heterogeneity of a module. Topological
   modules with a specificity score greater than 0.90 were labelled
   functionally specific modules and the others general modules. Table
   [148]8 shows that physical PPIN are more enriched in specific modules
   than functional PPIN. As seen in Fig. [149]6 and Additional file
   [150]1: Figure S3, the modules appear to become smaller and the
   biological functions are redistributed into a larger number of highly
   specific modules when edge weights are introduced to physical and
   functional protein interaction networks. However, combining functional
   interactions with physical interactions led to the formation of fewer
   and larger specific modules. This limitation due to increasing module
   size can be handled in future by optimizing the modularization
   algorithm to detect smaller modules of high functional specificity, and
   is beyond the scope of the present study.

   Biological relevance of top ranked specific modules in physical,
   functional and combined PPIN was evaluated on the basis of their
   enrichment with genes from experimentally known biological pathways
   such as glycolysis, transcriptional regulation, lung cancer and breast
   cancer. As shown in Fig. [151]7, specific modules are overall found to
   be more enriched than general modules for all four biological pathways,
   but the specific modules from binary PPIN were observed to be more
   highly enriched than those of weighted PPIN. This indicates that the
   specific
   modules obtained by using specificity scores of enriched functions are
   highly enriched with known functional and disease pathways. However,
   inclusion of weights did not improve the enrichment of biological and
   disease pathways in physical and functional networks.

Conclusions

   We systematically analysed functional properties of topological modules
   in the human proteome and investigated the effect of physical and
   functional interactions in PPIN on the functional specificity of
   modules. We also studied the effect of weighting edges with functional
   similarities on topological modules. A specificity score was introduced
   to identify biologically relevant and specific modules more accurately.
   Functional homogeneity was earlier used to evaluate the functional
   value of topological modules detected in biological networks [[152]24,
   [153]25] but failed to consider the heterogeneity of functional modules
   that arises because a protein or gene maps to a number of cellular
   processes. Thus, a set of proteins (in a module) can be involved in
   more than one function, and a biological function can be mapped to
   more than one module. To handle this, functional specificity was
   introduced, which considers both functional homogeneity within the
   module and functional heterogeneity across the modules. Functional
   specificity helps in identifying functional modules or specific
   functions of topological modules, and one may use our methods to
   confidently map specific functions to topological modules of PPIN.

   The topological modules detected using physical, functional and
   combined PPIN are found to be homogeneous, highly specific, and
   enriched in a number of significant biological functions, processes and
   cellular localizations (Fig. [154]5 and Additional file [155]1: Figure
   S2). Though weighted edges do not affect the homogeneity and
   heterogeneity of the modules, incorporating functional similarities as
   edge weights does help in identifying compact and highly specific
   functional modules based on topological properties.

   Functional or indirect interactions are generally noisy as they are
   determined using statistical inference from gene expression based
   experiments and vary by tissue and patient sample [[156]40]. However,
   functional interactions encompass the whole interaction profile of
   genes involved in a cellular function or a disease and are thus
   important for systematic analysis and prediction of functional modules.
   The present study provides first-hand insight into the effect of these
   different types of protein-protein interactions on topological modules
   of the human proteome. We conclude that instead of using only
   co-expression based networks in identifying functionally relevant
   topological modules, one should combine the accuracy of physical
   interactions with the larger coverage of the interactome landscape
   provided by functional networks. Though our methodology provides an
   edge over the usual methods (like homogeneity, GO enrichment) for
   functional validation of topological modules and helps in identifying
   specific functions of these modules, it does not identify core
   components of a biological pathway. One limitation of our study is
   that our methods neither handle overlapping modules nor consider the
   overlapping properties of functional modules. It would be interesting
   to study overlapping sub-modules, core modules, and the hierarchical
   organization of functionally specific topological modules as future
   work.

Methods

Datasets

   In the cellular machinery, proteins function as enzymes, transcription
   factors, receptors or structural proteins, and interact with other
   biomolecules. Protein interactions are either physical (direct) or
   functional (indirect). For studying the role of these two types of
   interactions on detection of modules of PPIN, three datasets were used:
   Physical, Functional and Combined (see Fig. [157]1, Table [158]1).
   These three datasets were prepared from HPRD (Human Protein Reference
   Database) (version Release9) [[159]43] and STRING database (version 10)
   [[160]44]; and include experimental information from other well-known
   databases like BIND, DIP, GRID, HPRD, IntAct, MINT and PID (updated
   till 14 May 2017). All the proteins were mapped to their Entrez gene
   ids. Details of data pre-processing are provided in Additional file
   [161]1.
    1. Physical PPIN lists curated binary interactions of proteins,
       representing physical or direct interactions that are determined
       using in vivo (e.g. co-immunoprecipitation), in vitro (e.g. GST
       pull-down assays) or yeast two-hybrid experiments.
    2. Functional PPIN represents functional interactions of proteins,
       i.e., these proteins may or may not physically interact but they do
       participate in a biological function by influencing each other
       genetically through co-regulation or co-expression, which are
       determined using experimental techniques like microarray expression
       data analysis or double mutant analysis.
    3. Combined PPIN is the inclusive set of both the physical and
       functional networks mentioned above.
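
   The relation between the three datasets can be sketched as set
   operations on undirected edge lists. This is a minimal illustration,
   assuming each interaction is an unordered pair of Entrez gene ids with
   self-interactions dropped; the helper names and the ids below are
   hypothetical, and the actual pre-processing is described in the
   Additional file.

```python
def normalise_edge(a, b):
    """Store an undirected interaction with its endpoints in sorted order."""
    return (a, b) if a <= b else (b, a)


def build_network(pairs):
    """Deduplicate interactions and drop self-interactions."""
    return {normalise_edge(a, b) for a, b in pairs if a != b}


# Hypothetical Entrez gene ids, for illustration only.
physical = build_network([(1, 2), (2, 3), (3, 2)])
functional = build_network([(3, 2), (3, 4), (4, 4)])

combined = physical | functional  # inclusive set of both networks
shared = physical & functional    # interactions present in both networks

nodes_physical = {n for edge in physical for n in edge}
nodes_functional = {n for edge in functional for n in edge}
common_nodes = nodes_physical & nodes_functional
```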

Weights for protein-protein interactions

   Weighted PPIN are obtained by assigning functional similarities between
   proteins as edge weights, considering different GO domains: molecular
   function (MF), biological process (BP) and cellular component (CC). We
   used popular Wang’s semantic similarity measure [[162]34, [163]45] to
   evaluate the functional similarity between genes (i.e., weights of
   protein-protein interactions).
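
   To illustrate how such edge weights can be obtained, the sketch below
   computes Wang's semantic similarity on a toy GO-like graph. It rests on
   several assumptions that the text does not state: semantic contribution
   factors of 0.8 for is_a and 0.6 for part_of relations, a best-match
   average to go from term-level to gene-level similarity, and a
   hypothetical four-term graph with made-up gene annotations. It is not
   the implementation used in the study.

```python
# Assumed semantic contribution factors for the two relation types.
EDGE_WEIGHT = {"is_a": 0.8, "part_of": 0.6}


def s_values(term, parents):
    """S-values of `term` and all of its ancestors.

    `parents` maps a term to a list of (parent_term, relation) pairs.
    """
    values = {term: 1.0}
    stack = [term]
    while stack:
        child = stack.pop()
        for parent, relation in parents.get(child, []):
            candidate = EDGE_WEIGHT[relation] * values[child]
            if candidate > values.get(parent, 0.0):
                values[parent] = candidate
                stack.append(parent)  # propagate the improved value upward
    return values


def term_similarity(a, b, parents):
    """Wang similarity between two terms."""
    sa, sb = s_values(a, parents), s_values(b, parents)
    shared = sa.keys() & sb.keys()
    numerator = sum(sa[t] + sb[t] for t in shared)
    return numerator / (sum(sa.values()) + sum(sb.values()))


def gene_similarity(terms_a, terms_b, parents):
    """Best-match average over the term sets of two genes."""
    if not terms_a or not terms_b:
        return 0.0
    best_a = [max(term_similarity(a, b, parents) for b in terms_b) for a in terms_a]
    best_b = [max(term_similarity(a, b, parents) for a in terms_a) for b in terms_b]
    return (sum(best_a) + sum(best_b)) / (len(terms_a) + len(terms_b))


# Hypothetical toy ontology: A and B are children of root R, C is a child of A.
parents = {"A": [("R", "is_a")], "B": [("R", "is_a")], "C": [("A", "is_a")]}
annotations = {"g1": {"C"}, "g2": {"A"}, "g3": {"B"}}
edges = [("g1", "g2"), ("g2", "g3")]

weights = {
    (u, v): gene_similarity(annotations[u], annotations[v], parents)
    for u, v in edges
}
```

   In this toy graph the edge g1-g2 receives a higher weight than g2-g3
   because the terms C and A lie on the same branch, whereas A and B share
   only the root.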

Module detection

   Functional modules of PPIN correspond to communities or sub-networks of
   proteins having specific and similar biological functions [[164]4,
   [165]46]. We chose the Louvain module detection algorithm to find
   topological modules of PPIN because it has demonstrated excellent
   performance and low computational complexity on benchmark networks
   [[166]20] (Lancichinetti & Fortunato, 2009). The Louvain algorithm
   finds the community or modular structure by optimizing the modularity Q
   (the quality function) of the network:
   Q = \sum_{ij} \left( e_{ij} - (a_i)^{2} \right) \qquad (1)

   where e_{ij} is the fraction of edges between modules i and j, and a_i
   is the fraction of edges connected to the nodes in module i. The
   modular structure is found by maximizing the modularity in an
   iterative manner. All the nodes in the network are assigned to
   independent modules in the beginning, and the algorithm progressively
   merges the two communities that best increase the modularity of the
   resulting network structure. Merging of nodes and modules continues
   until there is no further increase in the modularity of the network.
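
   The merging procedure described above can be sketched on a toy network.
   This is a simplified greedy agglomeration, not the Louvain
   implementation used in the study: the full Louvain method also moves
   individual nodes between modules and aggregates the network between
   passes. The sketch assumes an unweighted, undirected graph and the
   usual Newman-Girvan reading of modularity, in which each module
   contributes its within-module edge fraction e_{ii} minus (a_i)^2. The
   function names and the example graph are hypothetical.

```python
from itertools import combinations


def modularity(edges, membership):
    """Sum over modules of (within-module edge fraction - (a_i)^2)."""
    m = len(edges)
    within = {}  # module -> number of edges with both ends inside it
    degree = {}  # module -> summed degree of its nodes
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            within[cu] = within.get(cu, 0) + 1
    return sum(within.get(c, 0) / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def greedy_merge(edges):
    """Merge the pair of modules that best increases Q until none does."""
    nodes = {n for edge in edges for n in edge}
    membership = {n: n for n in nodes}  # every node starts in its own module
    best_q = modularity(edges, membership)
    while True:
        labels = sorted(set(membership.values()))
        best_candidate = None
        for a, b in combinations(labels, 2):
            trial = {n: (a if c == b else c) for n, c in membership.items()}
            q = modularity(edges, trial)
            if q > best_q + 1e-12:
                best_q, best_candidate = q, trial
        if best_candidate is None:
            return membership, best_q
        membership = best_candidate


# Toy network: two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
membership, q = greedy_merge(edges)

groups = {}
for node, label in membership.items():
    groups.setdefault(label, set()).add(node)
modules = sorted(groups.values(), key=min)
```

   On this toy network the procedure stops with the two triangles as
   separate modules, since merging them across the bridge would lower the
   modularity.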

Functional enrichment analysis

   The functional enrichment analysis was performed in order to find the
   GO terms in MF, BP, and CC contexts, which are significantly
   represented (enriched) by the proteins in the predicted topological
   modules. The functional enrichment analysis was implemented using R
   package BioStats [[167]47]. The statistical significance of a GO term
   in a module was estimated by evaluating its overrepresentation using a
   hypergeometric test. A functionally enriched module signifies that the
   number of genes observed to be annotated with a function (i.e., the GO
   term) is more than the expected number of genes annotated to that
   function. The ‘expected value’ for a function is the number of genes in
   the given module that would be expected to carry that function by
   chance, given its frequency in the reference list (the whole list of
   human genes).
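
   The over-representation test can be sketched with the Python standard
   library. This is a minimal illustration of a one-sided hypergeometric
   test, not the R code used in the study; the function names and the
   counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment(k, n, K, N):
    """P(X >= k) for a GO term in a module.

    k: module genes annotated with the term
    n: genes in the module
    K: reference genes annotated with the term
    N: genes in the reference list
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def expected_count(n, K, N):
    """Number of annotated genes expected in the module by chance."""
    return n * K / N


# Hypothetical counts: 3 of 4 module genes carry a term that annotates
# 5 of 20 reference genes.
p_value = hypergeom_enrichment(k=3, n=4, K=5, N=20)
expected = expected_count(n=4, K=5, N=20)
print(f"observed 3, expected {expected:.1f}, p = {p_value:.4f}")
```

   Here one annotated gene is expected by chance, so observing three gives
   a small tail probability (155/4845, about 0.032).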

Functional homogeneity and specificity of topological modules

   In this section, we introduce measures to quantify functional
   homogeneity and heterogeneity of topological modules of PPIN. First,
   functional enrichment analysis is performed on the modules to identify
   biological functions (GO terms) that are significantly enriched
   (p-value < 0.0001) in the modules and the functions are ranked
   according to their significance values. Systematic estimation of
   p-value is done using a set of detailed experiments explained in
   Additional file [168]1. We selected the enriched functions for each
   module and identified the set F of enriched functions in all the
   modules.

   Homogeneity of a module with respect to a particular function is
   computed by the proportion of genes annotated by the function. That is,
   the homogeneity of a function f ∈ F within a module is given by
   \mathit{homogeneity} = \frac{n_f}{N}

   where n_f is the number of genes annotated by the function and N is
   the total number of genes in the module. The functional homogeneity (H)
   of a module is defined as the homogeneity of maximally enriched
   function in the module. The heterogeneity of a function is defined as
   the proportion of the modules where the function f ∈ F is enriched.
   That is,
   \mathit{heterogeneity} = \frac{k_f}{K}

   where k_f is the number of modules enriched with function f and K is
   the total number of modules detected in PPIN.

   Functional homogeneity measures functional coherence of the modules
   while functional heterogeneity indicates how exclusive the modules are
   for the function across all predicted modules. To combine functional
   homogeneity and heterogeneity of a module, functional specificity for
   an enriched function is defined as follows:
   \mathit{specificity} = \mathit{homogeneity} + \frac{1}{\mathit{heterogeneity}} \qquad (2)

   The values of specificity scores across all enriched functions are
   normalized to a range between 0 and 1. The functional specificity value
   measures how exclusively the module is enriched by the specific
   biological function. Modules are ranked using the functional
   specificity score and the top ranked modules are considered as highly
   specific modules.
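
   The three measures can be sketched together as follows. This is a
   minimal illustration under stated assumptions: the text does not say
   how the scores are normalized, so min-max scaling over all enriched
   (module, function) pairs is assumed here, and the module sizes,
   function names and counts are hypothetical.

```python
def specificity_scores(module_sizes, enriched):
    """Normalized specificity for every enriched (module, function) pair.

    module_sizes: module id -> number of genes N in the module
    enriched: module id -> {function: n_f}, where n_f is the number of
              module genes annotated by that enriched function
    """
    K = len(module_sizes)
    k_f = {}  # function -> number of modules in which it is enriched
    for functions in enriched.values():
        for f in functions:
            k_f[f] = k_f.get(f, 0) + 1

    raw = {}
    for module, functions in enriched.items():
        for f, n_f in functions.items():
            homogeneity = n_f / module_sizes[module]
            heterogeneity = k_f[f] / K
            raw[(module, f)] = homogeneity + 1.0 / heterogeneity
    if not raw:
        return {}

    # Assumed min-max normalization to the range [0, 1].
    low, high = min(raw.values()), max(raw.values())
    span = high - low
    return {key: ((value - low) / span if span else 0.0) for key, value in raw.items()}


# Hypothetical modules and enriched functions.
module_sizes = {"M1": 10, "M2": 20, "M3": 10}
enriched = {
    "M1": {"kinase activity": 8},
    "M2": {"DNA binding": 16},
    "M3": {"DNA binding": 5},
}
scores = specificity_scores(module_sizes, enriched)

# Modules whose best score exceeds the 0.9 cut-off used in the text.
specific = sorted({module for (module, _), s in scores.items() if s > 0.9})
```

   In this toy case only M1 is labelled specific: its function is enriched
   in a single module, whereas "DNA binding" is spread over two modules
   and therefore scores lower despite similar homogeneity in M2.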

Additional file

   [169]Additional file 1: (1.5MB, pdf)

   Supplementary information. (PDF 1521 kb)

Acknowledgements