Abstract

Background

   In cancer research, the comparison of gene expression or DNA
   methylation networks inferred from healthy controls and patients can
   lead to the discovery of biological pathways associated to the disease.
   As a cancer progresses, its signalling and control networks are subject
   to some degree of localised re-wiring. Being able to detect disrupted
   interaction patterns induced by the presence or progression of the
   disease can lead to the discovery of novel molecular diagnostic and
   prognostic signatures. Currently there is a lack of scalable
   statistical procedures for two-network comparisons aimed at detecting
   localised topological differences.

Results

   We propose the dGHD algorithm, a methodology for detecting differential
   interaction patterns in two-network comparisons. The algorithm relies
   on a statistic, the Generalised Hamming Distance (GHD), for assessing
   the degree of topological difference between networks and evaluating
   its statistical significance. dGHD builds on a non-parametric
   permutation testing framework but achieves computationally efficiency
   through an asymptotic normal approximation.

Conclusions

   We show that the GHD is able to detect more subtle topological
   differences compared to a standard Hamming distance between networks.
   This results in the dGHD algorithm achieving high performance in
   simulation studies as measured by sensitivity and specificity. An
   application to the problem of detecting differential DNA co-methylation
   subnetworks associated to ovarian cancer demonstrates the potential
   benefits of the proposed methodology for discovering network-derived
   biomarkers associated with a trait of interest.

   Keywords: Network comparisons, Co-methylation networks

Background

   Current efforts at understanding diseases rely on the ability to
   identify differences between healthy and affected tissues. A number of
   high-throughput platforms are now commonly used to compare genome-wide
   molecular profiles collected from large cohorts of healthy and diseased
   subjects in search for patterns that differentiate between them. For
   instance, in cancer research, gene expression and DNA methylation
   profiles from diseased tissues are compared to those extracted from
   normal controls in order to identify groups of genes whose expression
   or methylation levels are significantly different, and consequently
   associated to the trait of interest. From a statistical modelling
   standpoint, the primary interest of these studies lies in detecting
   statistically significant changes in average gene expression or
   methylation values in a two-sample comparison. A number of standard
   statistical tests, which are generally applied in a univariate fashion,
   have been proposed for this task and generate candidate sets of genes
   for further investigation [[29]1]. Statistical methods have also been
   developed to assess whether these candidate genes are over-represented
   in pre-defined biological pathways or subnetworks within protein
   interaction networks [[30]2]. These developments are based upon the
   principle that, in order to understand the roles of genes in complex
   diseases, genes need to be studied in the context of the regulatory
   systems they are involved in [[31]2–[32]4].

   An alternative way of analysing genome-wide expression and methylation
   levels observed in a random sample consists of studying their
   interaction patterns, which are often represented in the form of
   networks [[33]5, [34]6]. Network edges quantify the similarity in
   transcription activity between two genes [[35]7] or in DNA methylation
   between two CpG islands [[36]8], respectively. The notion of similarity
   is usually measured by linear correlation, partial correlation or
   mutual information coefficients estimated from the sample data [[37]7,
   [38]9]. The networks arising in the two-sample setting above can then
   be compared to assess whether there are statistically significant
   differences in network topology that can be associated to the disease.
   The detection of markedly distinct interaction patterns across
   conditions may be indicative of local disturbances within known
   biological pathways, and can be taken as candidate biomarkers. For
   instance, as a cancer progresses, it has been observed that its
   signalling and control networks are subjected to re-arrangments which
   are advantageous for the cancer [[39]10]. Changes in methylation levels
   are believed to be among the earliest and most common alterations in
   human cancers [[40]11, [41]12], and topological differences in healthy
   and diseased networks can reflect significant dysregulations associated
   to the disease [[42]13].

   In this paper we discuss the the problem of comparing two labelled
   biological networks, each one representing a different population or
   condition, with the aim of detecting statistically significant
   differences between them. We approach this problem from a hypothesis
   testing perspective. This is a challenging statistical problem as only
   one random network is observed under each condition. Various
   computational methodologies have been developed to compare networks,
   including graph matching and graph similarity algorithms [[43]14].
   Graph matching algorithms have been used to discover similarities
   between molecular pathways across organisms and functions [[44]15,
   [45]16], but are typically limited to unlabelled graphs, and are not
   concerned with hypothesis testing. Graph similarity algorithms also
   assume that the graphs are unlabelled, and the attention has mostly
   focused on detecting patterns that are most similar between networks
   [[46]17]. For instance, gene modules can be identified separately in
   each network first, and then compared across networks [[47]7, [48]18,
   [49]19]. More closely related work includes inferential methods for
   performing two-sample hypothesis tests where the sampling unit is a
   network, and assess whether the two paired networks come from the same
   assumed model [[50]20].

   We take a non-parametric approach to inference that does not require to
   make assumptions about a specific random network model. Our premise is
   that any true topological differences between the two networks would
   involve only a smaller set of edges, compared to all edges in the
   network, which we aim to detect. Our contributions to this problem are
   as follows. First, we consider the issue of choosing a distance measure
   between two paired networks that is able to capture subtle topological
   differences. Second, we discuss how to establish whether large values
   of this distance can be deemed statistically significant under a null
   hypothesis that the networks are independent. Finally, we ask whether
   it is possible to identify a differential subnetwork, starting from two
   large networks, in a computationally efficient manner.

   The article is organised as follows. In Section [51]Methods we
   introduce a distance for labelled networks, the Generalised Hamming
   Distance (GHD). Building on this distance, a permutation-based test
   statistic for two-sample network comparisons is introduced in Section
   [52]A non-parametric test for network comparison. Conditions for
   asymptotic normality are provided so that p-values can be obtained in
   closed-form without the need to carry out computationally expensive
   permutations. In order to verify these results in special cases, in
   Section [53]Validation of asymptotic normality on scale-free networks
   we argue that the proposed conditions hold true for various random
   network models, and provide a sketch proof for the case of scale free
   networks. In Section [54]Differential subnetwork detection we describe
   an algorithm, dGHD, for the detection of differential subnetworks. In
   Section [55]Results we present a number of simulation experiments that
   highlight the advantages of the proposed methodology under different
   graph models. As an illustrative application of the proposed
   methodology, a case-control study involving DNA co-methylation networks
   in ovarian cancer is presented in Section [56]Application to
   co-methylation networks in ovarian cancer. We conclude with a
   discussion in Section [57]Conclusions.

Methods

   We assume to have observed two paired biological networks, each
   represented by a graph, denoted by
   [MATH: <mi
   mathvariant="script">A</mi><mo>=</mo><mo>(</mo><mi>V</mi><mo>,</mo><msu
   b><mrow><mi>E</mi></mrow><mrow><mi>A</mi></mrow></msub><mo>)</mo>
   :MATH]
   and
   [MATH: <mi
   mathvariant="script">ℬ</mi><mo>=</mo><mo>(</mo><mi>V</mi><mo>,</mo><msu
   b><mrow><mi>E</mi></mrow><mrow><mi>B</mi></mrow></msub><mo>)</mo>
   :MATH]
   , respectively. Both graphs are defined on a common set, V={1,2,…,N}.
   The respective sets E [A] and E [B] of edges indicate the connection
   between the nodes in the two graphs. We also let the matrices A=(A
   [ij]) and B=(B [ij]) denote the two (N×N) adjacency matrices associated
   with graphs
   [MATH: <mi mathvariant="script">A</mi> :MATH]
   and
   [MATH: <mi mathvariant="script">ℬ</mi> :MATH]
   , respectively.

   The Hamming distance (HD) between
   [MATH: <mi mathvariant="script">A</mi> :MATH]
   and
   [MATH: <mi mathvariant="script">ℬ</mi> :MATH]
   provides a commonly used metric to quantify the difference between the
   networks, and is defined by
   [MATH:
   <mfrac><mrow><mn>1</mn></mrow><mrow><mn>2</mn></mrow></mfrac><mtext>tr<
   /mtext><mfenced close="]" open="["
   separators=""><mrow><msup><mrow><mo>(</mo><mi>A</mi><mo>−</mo><mi>B</mi
   ><mo>)</mo></mrow><mrow><mn>2</mn></mrow></msup></mrow></mfenced>
   :MATH]
   , where tr[ ·] denotes the trace of a matrix. This distance takes into
   account the number of edges that are in common between the two
   networks. Here we propose an extension of this metric, which we call
   the Generalised Hamming Distance (GHD), defined as
   [MATH: <mtext>GHD</mtext><mo>(</mo><mi
   mathvariant="script">A</mi><mo>,</mo><mi
   mathvariant="script">ℬ</mi><mo>)</mo><mo>=</mo><mfrac><mrow><mn>1</mn><
   /mrow><mrow><mi>N</mi><mo>(</mo><mi>N</mi><mo>−</mo><mn>1</mn><mo>)</mo
   ></mrow></mfrac><munder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>,</m
   o><mi>j</mi></mrow></munder><msup><mrow><mfenced close=")" open="("
   separators=""><mrow><msubsup><mrow><mi>a</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   <mo>−</mo><msubsup><mrow><mi>b</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   </mrow></mfenced></mrow><mrow><mn>2</mn></mrow></msup><mo>,</mo> :MATH]
   (1)

   where a ij′ and b ij′ are mean-centred edge weights defined as
   [MATH: <mrow><mspace
   width="-14.5pt"></mspace><msubsup><mrow><mi>a</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   <mo>=</mo><msub><mrow><mi>a</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub><mo>−</mo><mfrac><mrow><mn
   >1</mn></mrow><mrow><mi>N</mi><mo>(</mo><mi>N</mi><mo>−</mo><mn>1</mn><
   mo>)</mo></mrow></mfrac><munder><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>i</mi><mo>,</mo><mi>j</mi></mrow>
   </munder><msub><mrow><mi>a</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub><mo>,</mo><mspace
   width="1em"></mspace><msubsup><mrow><mi>b</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   <mo>=</mo><msub><mrow><mi>b</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub><mo>−</mo><mfrac><mrow><mn
   >1</mn></mrow><mrow><mi>N</mi><mo>(</mo><mi>N</mi><mo>−</mo><mn>1</mn><
   mo>)</mo></mrow></mfrac><munder><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>i</mi><mo>,</mo><mi>j</mi></mrow>
   </munder><msub><mrow><mi>b</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub></mrow> :MATH]

   and
   [MATH:
   <munder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>,</mo><mi>j</mi></mr
   ow></munder> :MATH]
   denotes summation over distinct i and j. The edge weights, which depend
   on the topology of the networks, provide a measure of connectivity
   between every pair of nodes i and j in
   [MATH: <mi mathvariant="script">A</mi> :MATH]
   and
   [MATH: <mi mathvariant="script">ℬ</mi> :MATH]
   , respectively. When a [ij] and b [ij] are binary values indicating the
   presence or absence of an edge, i.e. are the elements of A and B,
   [MATH: <mtext>GHD</mtext><mo>(</mo><mi
   mathvariant="script">A</mi><mo>,</mo><mi
   mathvariant="script">ℬ</mi><mo>)</mo> :MATH]
   is related to the HD. The specific node weights we propose here instead
   quantify the topological overlap (TO) between a pair of nodes by taking
   into account the local neighbourhood structure around those nodes
   [[58]21]. In the literature, the TO measure has been successfully
   applied for the detection of communities in biological networks, and
   there is empirical evidence that it carries biological meaning [[59]7,
   [60]22].

   We use the one-step TO between nodes i and j indicating whether they
   share direct connections to other nodes. The weights are obtained from
   the adjacency matrix as follows:
   [MATH: <msub><mrow><mi>a</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub><mo>=</mo><mfrac><mrow><mu
   nder><mrow><mo>∑</mo></mrow><mrow><mi>l</mi><mo>≠</mo><mi>i</mi><mo>,</
   mo><mi>j</mi></mrow></munder><msub><mrow><mi>A</mi></mrow><mrow><mtext
   mathvariant="italic">il</mtext></mrow></msub><msub><mrow><mi>A</mi></mr
   ow><mrow><mtext
   mathvariant="italic">lj</mtext></mrow></msub><mo>+</mo><msub><mrow><mi>
   A</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub></mrow><mrow><mo>min</mo><
   mfenced close=")" open="("
   separators=""><mrow><munder><mrow><mo>∑</mo></mrow><mrow><mi>l</mi><mo>
   ≠</mo><mi>i</mi></mrow></munder><msub><mrow><mi>A</mi></mrow><mrow><mte
   xt
   mathvariant="italic">il</mtext></mrow></msub><mo>−</mo><msub><mrow><mi>
   A</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub><mo>,</mo><munder><mrow><m
   o>∑</mo></mrow><mrow><mi>l</mi><mo>≠</mo><mi>j</mi></mrow></munder><msu
   b><mrow><mi>A</mi></mrow><mrow><mtext
   mathvariant="italic">il</mtext></mrow></msub><mo>−</mo><msub><mrow><mi>
   A</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub></mrow></mfenced><mo>+</mo
   ><mn>1</mn></mrow></mfrac><mo>,</mo> :MATH]
   (2)

   when i≠j, and otherwise a [ij]=1, and analogously for b [ij]. The GHD
   sums the squared differences
   [MATH:
   <msup><mrow><mo>(</mo><msubsup><mrow><mi>a</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   <mo>−</mo><msubsup><mrow><mi>b</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   <mo>)</mo></mrow><mrow><mn>2</mn></mrow></msup> :MATH]
   over all pairs of nodes in the network. Note that the term
   [MATH:
   <munder><mrow><mo>∑</mo></mrow><mrow><mi>l</mi><mo>≠</mo><mi>i</mi><mo>
   ,</mo><mi>j</mi></mrow></munder><msub><mrow><mi>A</mi></mrow><mrow><mte
   xt
   mathvariant="italic">il</mtext></mrow></msub><msub><mrow><mi>A</mi></mr
   ow><mrow><mtext mathvariant="italic">lj</mtext></mrow></msub> :MATH]
   is a count of all vertexes (i,l,j) containing node pair (i,j). This
   term measures the connectivity information of each (i,j) pair plus
   their common one-step neighbours. The denominator in ([61]2) can be
   written as min(d [i],d [j])+1−A [ij], where d [i] and d [j] represent
   the node degrees of i and j, respectively. It is roughly equal to the
   smaller of (d [i],d [j]) and normalises a [ij] such that 0≤a [ij]≤1. A
   large discrepancy between
   [MATH: <msubsup><mrow><mi>a</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   :MATH]
   and
   [MATH: <msubsup><mrow><mi>b</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   :MATH]
   indicates a topological difference localised around that pair of nodes.

   By exploring the neighbourhood of each node, the proposed GHD can
   detect subtle topological changes with higher sensitivity compared to
   the HD. A simple illustration of this is given in Fig. [62]1, where
   four simple networks are shown: the network labelled (a) is taken as
   reference while the three paired networks (b), (c), and (d) have been
   generated by changing the position of a single edge in (a). The two
   distances, HD and GHD, have been computed to quantify the difference
   between (a) and each of the other three networks. It can be observed
   that, whereas the HD is unable to distinguish between the three
   networks, the GHD score is more sensitive to subtle topological
   variations and can discriminate between them.

Fig. 1.

   Fig. 1
   [63]Open in a new tab

   Networks (b), (c) and (d) are generated from the reference network (a)
   by a single edge change. Both HD and GHD between the reference network
   and each modified paired network have been computed in each case

A non-parametric test for network comparison

   For inferential purposes, we require computing the probability that a
   distance as extreme or more extreme than the observed GHD value could
   have been observed by chance only. By treating the GHD as a random
   variable with unknown sampling distribution, this probability can be
   estimated non-parametrically via permutation testing. First, we specify
   the null hypothesis as being
   [MATH: <msub><mrow><mi
   mathvariant="script">ℋ</mi></mrow><mrow><mn>0</mn></mrow></msub><mo>:</
   mo><mtext>networks</mtext><mi
   mathvariant="script">A</mi><mtext>and</mtext><mi
   mathvariant="script">ℬ</mi><mtext>are independent.</mtext> :MATH]
   (3)

   By taking
   [MATH: <mi mathvariant="script">ℬ</mi> :MATH]
   as reference, each permutation consists of shuffling the labels of the
   nodes in
   [MATH: <mi mathvariant="script">A</mi> :MATH]
   while keeping the edges unchanged. This generates a permuted network
   [MATH: <msub><mrow><mi
   mathvariant="script">A</mi></mrow><mrow><mi>π</mi></mrow></msub> :MATH]
   that is isomorphic to
   [MATH: <mi mathvariant="script">A</mi> :MATH]
   , and the exchangeability property holds. In turn, this signifies that
   the original and permuted networks are generated from the same
   underlying, but unspecified, model [[64]23, [65]24]. Since all
   permutation networks are isomorphic, permuting the labels of the
   network is equivalent to shuffling rows and columns of the adjacency
   matrix, an approach that bears some similarity with Mantel’s test
   [[66]25] for the comparison of two distance matrices. All the the N!
   possible permutations are then collected in a set Π, and for each π∈Π a
   permuted GHD value is denoted as
   [MATH:
   <mrow><msub><mrow><mtext>GHD</mtext></mrow><mrow><mi>π</mi></mrow></msu
   b><mo>(</mo><msub><mrow><mi
   mathvariant="script">A</mi></mrow><mrow><mi>π</mi></mrow></msub><mo>,</
   mo><mi
   mathvariant="script">ℬ</mi><mo>)</mo><mo>=</mo><mfrac><mrow><mn>1</mn><
   /mrow><mrow><mi>N</mi><mo>(</mo><mi>N</mi><mo>−</mo><mn>1</mn><mo>)</mo
   ></mrow></mfrac><munder><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>i</mi><mo>,</mo><mi>j</mi></mrow>
   </munder><msup><mrow><mfenced close=")" open="("
   separators=""><mrow><msubsup><mrow><mi>a</mi></mrow><mrow><mi>π</mi><mo
   >(</mo><mi>i</mi><mo>)</mo><mi>π</mi><mo>(</mo><mi>j</mi><mo>)</mo></mr
   ow><mrow><mo>′</mo></mrow></msubsup><mo>−</mo><msubsup><mrow><mi>b</mi>
   </mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   </mrow></mfenced></mrow><mrow><mn>2</mn></mrow></msup><mo>,</mo></mrow>
   :MATH]

   and is calculated from the edge weights
   [MATH:
   <msubsup><mrow><mi>a</mi></mrow><mrow><mi>π</mi><mo>(</mo><mi>i</mi><mo
   >)</mo><mi>π</mi><mo>(</mo><mi>j</mi><mo>)</mo></mrow><mrow><mo>′</mo><
   /mrow></msubsup> :MATH]
   after permutation. The exact permutation distribution is obtained by
   carrying out an exhaustive calculation of all GHD[π] values, and
   p-values can then be evaluated as usual. In practice, however, doing so
   is computationally infeasible because the cardinality of Π is generally
   extremely large, even for relatively small networks. The exhaustive
   evaluation for all permutations in Π could be replaced by a Monte Carlo
   approach whereby only a smaller number of random permutations are
   explored. Nevertheless, the overall computational costs remain high for
   networks of the moderately large sizes observed in applications or when
   this procedure has to be repeated several times, for instance when
   searching for a differential subnetwork as in Section [67]Differential
   subnetwork detection.

   In what follows, we propose an alternative approach that removes the
   need to carry out computationally expensive permutation testing
   altogether. We demonstrate that, under our null hypothesis, the exact
   GHD permutation distribution can be approximated well by a normal
   distribution with moments that can be obtained analytically, in closed
   form. First, we notice that the GHD can be rewritten in an equivalent
   form in terms of a generalised correlation coefficient as follows:
   [MATH: <mtable class="eqnarray" columnalign="left center
   right"><mtr><mtd
   class="eqnarray-1"><msub><mrow><mtext>GHD</mtext></mrow><mrow><mi>π</mi
   ></mrow></msub><mo>(</mo><msub><mrow><mi
   mathvariant="script">A</mi></mrow><mrow><mi>π</mi></mrow></msub><mo>,</
   mo><mi
   mathvariant="script">ℬ</mi><mo>)</mo><mo>=</mo><mi>c</mi><mo>−</mo><mfr
   ac><mrow><mn>2</mn></mrow><mrow><mi>N</mi><mo>(</mo><mi>N</mi><mo>−</mo
   ><mn>1</mn><mo>)</mo></mrow></mfrac><munder><mrow><mo>∑</mo></mrow><mro
   w><mi>i</mi><mo>,</mo><mi>j</mi></mrow></munder><msubsup><mrow><mi>a</m
   i></mrow><mrow><mi>π</mi><mo>(</mo><mi>i</mi><mo>)</mo><mi>π</mi><mo>(<
   /mo><mi>j</mi><mo>)</mo></mrow><mrow><mo>′</mo></mrow></msubsup><msubsu
   p><mrow><mi>b</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   <mo>,</mo></mtd></mtr></mtable> :MATH]
   (4)

   where c is a constant that does not change under permutations. By
   making use of this alternative representation, we are able to exploit
   well-known sufficient conditions for asymptotic normality, which can
   also be easily checked in practice. For a generalised correlation
   coefficient of this form, the exact permutation distribution is
   asymptotically normal under two sufficient conditions [[68]26–[69]28]:
   [MATH:
   <munder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>,</mo><mi>j</mi></mr
   ow></munder><msubsup><mrow><mi>a</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   <mo>=</mo><munder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>,</mo><mi>
   j</mi></mrow></munder><msubsup><mrow><mi>b</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   <mo>=</mo><mn>0</mn><mspace width="1em"></mspace><mtext>and</mtext>
   :MATH]
   (5a)
   [MATH: <msub><mrow><mo>lim</mo></mrow><mrow><mtext
   mathvariant="italic">N</mtext><mo>→</mo><mi>∞</mi></mrow></msub><mfrac>
   <mrow><msup><mrow><mfenced close="]" open="["
   separators=""><mrow><munder><mrow><mo>∑</mo></mrow><mrow><mtext
   mathvariant="italic">ijkl</mtext></mrow></munder><msubsup><mrow><mi>a</
   mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   <msubsup><mrow><mi>a</mi></mrow><mrow><mtext
   mathvariant="italic">ik</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   <msubsup><mrow><mi>a</mi></mrow><mrow><mtext
   mathvariant="italic">il</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   </mrow></mfenced></mrow><mrow><mn>2</mn></mrow></msup></mrow><mrow><msu
   p><mrow><mfenced close="]" open="["
   separators=""><mrow><munder><mrow><mo>∑</mo></mrow><mrow><mtext
   mathvariant="italic">ijk</mtext></mrow></munder><msubsup><mrow><mi>a</m
   i></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   <msubsup><mrow><mi>a</mi></mrow><mrow><mtext
   mathvariant="italic">ik</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   </mrow></mfenced></mrow><mrow><mn>3</mn></mrow></msup></mrow></mfrac><m
   o>=</mo><msub><mrow><mo>lim</mo></mrow><mrow><mtext
   mathvariant="italic">N</mtext><mo>→</mo><mi>∞</mi></mrow></msub><mfrac>
   <mrow><msup><mrow><mfenced close="]" open="["
   separators=""><mrow><munder><mrow><mo>∑</mo></mrow><mrow><mtext
   mathvariant="italic">ijkl</mtext></mrow></munder><msubsup><mrow><mi>b</
   mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   <msubsup><mrow><mi>b</mi></mrow><mrow><mtext
   mathvariant="italic">ik</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   <msubsup><mrow><mi>b</mi></mrow><mrow><mtext
   mathvariant="italic">il</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   </mrow></mfenced></mrow><mrow><mn>2</mn></mrow></msup></mrow><mrow><msu
   p><mrow><mfenced close="]" open="["
   separators=""><mrow><munder><mrow><mo>∑</mo></mrow><mrow><mtext
   mathvariant="italic">ijk</mtext></mrow></munder><msubsup><mrow><mi>b</m
   i></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   <msubsup><mrow><mi>b</mi></mrow><mrow><mtext
   mathvariant="italic">ik</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   </mrow></mfenced></mrow><mrow><mn>3</mn></mrow></msup></mrow></mfrac><m
   o>=</mo><mn>0</mn><mi>.</mi> :MATH]
   (5b)

   Condition ([70]5a) follows directly from the definition of
   [MATH: <msubsup><mrow><mi>a</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   :MATH]
   and
   [MATH: <msubsup><mrow><mi>b</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   :MATH]
   as being mean-centred. In order to gain some insight into the meaning
   of condition ([71]5b) in our context, it is instructive to consider the
   case where a [ij] and b [ij] are elements of the two adjacency
   matrices, i.e. they indicate the presence of an edge. On defining
   [MATH: <msub><mrow><mi>a</mi></mrow><mrow><mtext
   mathvariant="italic">i</mtext><mo>·</mo></mrow></msub><mo>=</mo><munder
   ><mrow><mo>∑</mo></mrow><mrow><mi>j</mi><mo>≠</mo><mi>i</mi></mrow></mu
   nder><msub><mrow><mi>a</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub> :MATH]
   and
   [MATH:
   <mi>ā</mi><mo>=</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mi>N</mi></mro
   w></mfrac><munder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi></mrow></munde
   r><msub><mrow><mi>a</mi></mrow><mrow><mi>i</mi><mo>·</mo></mrow></msub>
   :MATH]
   , we have
   [MATH: <msubsup><mrow><mi>a</mi></mrow><mrow><mtext
   mathvariant="italic">i</mtext><mo>·</mo></mrow><mrow><mo>′</mo></mrow><
   /msubsup><mo>=</mo><munder><mrow><mo>∑</mo></mrow><mrow><mi>j</mi><mo>≠
   </mo><mi>i</mi></mrow></munder><msubsup><mrow><mi>a</mi></mrow><mrow><m
   text
   mathvariant="italic">ij</mtext></mrow><mrow><mo>′</mo></mrow></msubsup>
   <mo>=</mo><msub><mrow><mi>a</mi></mrow><mrow><mtext
   mathvariant="italic">i</mtext><mo>·</mo></mrow></msub><mo>−</mo><mi>ā</
   mi><mo>,</mo> :MATH]
   (6)

   and condition ([72]5b), with reference to network
   [MATH: <mi mathvariant="script">A</mi> :MATH]
   , can be written as
   [MATH: <msub><mrow><mo>lim</mo></mrow><mrow><mtext
   mathvariant="italic">N</mtext><mo>→</mo><mi>∞</mi></mrow></msub><mfrac>
   <mrow><msup><mrow><mfenced close="]" open="["
   separators=""><mrow><munder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi></mr
   ow></munder><msup><mrow><mo>(</mo><mi>N</mi><msubsup><mrow><mi>a</mi></
   mrow><mrow><mtext
   mathvariant="italic">i</mtext><mo>·</mo></mrow><mrow><mo>′</mo></mrow><
   /msubsup><mo>)</mo></mrow><mrow><mn>3</mn></mrow></msup></mrow></mfence
   d></mrow><mrow><mn>2</mn></mrow></msup></mrow><mrow><msup><mrow><mfence
   d close="]" open="["
   separators=""><mrow><munder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi></mr
   ow></munder><msup><mrow><mo>(</mo><mi>N</mi><msubsup><mrow><mi>a</mi></
   mrow><mrow><mtext
   mathvariant="italic">i</mtext><mo>·</mo></mrow><mrow><mo>′</mo></mrow><
   /msubsup><mo>)</mo></mrow><mrow><mn>2</mn></mrow></msup></mrow></mfence
   d></mrow><mrow><mn>3</mn></mrow></msup></mrow></mfrac><mo>=</mo><msub><
   mrow><mo>lim</mo></mrow><mrow><mtext
   mathvariant="italic">N</mtext><mo>→</mo><mi>∞</mi></mrow></msub><mfrac>
   <mrow><msup><mrow><mfenced close="]" open="["
   separators=""><mrow><munder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi></mr
   ow></munder><msup><mrow><mo>(</mo><msub><mrow><mi>a</mi></mrow><mrow><m
   text
   mathvariant="italic">i</mtext><mo>·</mo></mrow></msub><mo>−</mo><mi>ā</
   mi><mo>)</mo></mrow><mrow><mn>3</mn></mrow></msup></mrow></mfenced></mr
   ow><mrow><mn>2</mn></mrow></msup></mrow><mrow><msup><mrow><mfenced
   close="]" open="["
   separators=""><mrow><munder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi></mr
   ow></munder><msup><mrow><mo>(</mo><msub><mrow><mi>a</mi></mrow><mrow><m
   text
   mathvariant="italic">i</mtext><mo>·</mo></mrow></msub><mo>−</mo><mi>ā</
   mi><mo>)</mo></mrow><mrow><mn>2</mn></mrow></msup></mrow></mfenced></mr
   ow><mrow><mn>3</mn></mrow></msup></mrow></mfrac><mo>=</mo><mn>0</mn><mo
   >,</mo> :MATH]
   (7)

   and analogously for
   [MATH: <mi mathvariant="script">ℬ</mi> :MATH]
   . It can be observed that, when using the adjacency matrix, a [i·]
   represents the degree of the i ^th node. An analogous condition also
   applies to
   [MATH: <mi mathvariant="script">ℬ</mi> :MATH]
   . Therefore, checking ([73]5b) amounts to computing the degree of each
   node in the two networks, and assessing the limiting behaviour. When
   the TO measure is used instead, as in the GHD, the coefficient a [i·]
   represents the overall topological overlap information at node i, and
   can also be computed using ([74]6).

   When both ([75]5a) and ([76]5b) hold true, under the null hypothesis,
   the permutation distribution of
   [MATH: <mtext>GHD</mtext><mo>(</mo><mi
   mathvariant="script">A</mi><mo>,</mo><mi
   mathvariant="script">ℬ</mi><mo>)</mo> :MATH]
   is approximately normal. We then standardise the GHD value by
   mean-centring and normalising it, so that it follows a standard normal
   distribution asymptotically,
   [MATH:
   <mfrac><mrow><msub><mrow><mtext>GHD</mtext></mrow><mrow><mi>π</mi></mro
   w></msub><mo>(</mo><msub><mrow><mi
   mathvariant="script">A</mi></mrow><mrow><mi>π</mi></mrow></msub><mo>,</
   mo><mi
   mathvariant="script">ℬ</mi><mo>)</mo><mo>−</mo><msub><mrow><mi>μ</mi></
   mrow><mrow><mi>π</mi></mrow></msub></mrow><mrow><msub><mrow><mi>σ</mi><
   /mrow><mrow><mi>π</mi></mrow></msub></mrow></mfrac><mo>∼</mo><mi>N</mi>
   <mo>(</mo><mn>0</mn><mo>,</mo><mn>1</mn><mo>)</mo> :MATH]
   (8)

   where μ [π] and σ [π] are the mean and standard deviation of GHD under
   the exact permutation distribution, respectively. These two moments can
   be computed precisely and in closed-form by enumerative combinatorics;
   the calculations follow developments described in the context of
   related permutation-based testing procedures [[77]25], and can also be
   found in [[78]29]. Here we provide explicit formula for both μ [π] and
   [MATH:
   <msubsup><mrow><mi>σ</mi></mrow><mrow><mi>π</mi></mrow><mrow><mn>2</mn>
   </mrow></msubsup> :MATH]
   as follows. First, we need to define
   [MATH: <mtable class="eqnarray" columnalign="left center
   right"><mtr><mtd class="eqnarray-1"><mspace
   width="0.3em"></mspace><msup><mrow><mspace
   width="1em"></mspace></mrow><mrow><mi>t</mi></mrow></msup><msub><mrow><
   mi>S</mi></mrow><mrow><mi>a</mi></mrow></msub><mo>=</mo><munderover
   accent="false"
   accentunder="false"><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><m
   n>1</mn></mrow><mrow><mi>N</mi></mrow></munderover><munderover
   accent="false"
   accentunder="false"><mrow><mo>∑</mo></mrow><mrow><mi>j</mi><mo>=</mo><m
   n>1</mn></mrow><mrow><mi>N</mi></mrow></munderover><msubsup><mrow><mi>a
   </mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow><mrow><mi>t</mi></mrow></msubsup>
   <mo>,</mo><mi>t</mi><mo>=</mo><mn>1</mn><mo>,</mo><mn>2</mn></mtd><mtd
   class="eqnarray-2"><mtext>and</mtext></mtd><mtd
   class="eqnarray-3"><msub><mrow><mi>T</mi></mrow><mrow><mi>a</mi></mrow>
   </msub><mo>=</mo><munderover accent="false"
   accentunder="false"><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><m
   n>1</mn></mrow><mrow><mi>N</mi></mrow></munderover><msup><mrow><mfenced
   close=")" open="(" separators=""><mrow><munderover accent="false"
   accentunder="false"><mrow><mo>∑</mo></mrow><mrow><mi>j</mi><mo>=</mo><m
   n>1</mn></mrow><mrow><mi>N</mi></mrow></munderover><msub><mrow><mi>a</m
   i></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub></mrow></mfenced></mrow><m
   row><mn>2</mn></mrow></msup></mtd><mtd><mtext></mtext></mtd></mtr><mtr>
   <mtd class="eqnarray-1"><mspace
   width="0.3em"></mspace><msup><mrow><mspace
   width="1em"></mspace></mrow><mrow><mi>t</mi></mrow></msup><msub><mrow><
   mi>S</mi></mrow><mrow><mi>b</mi></mrow></msub><mo>=</mo><munderover
   accent="false"
   accentunder="false"><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><m
   n>1</mn></mrow><mrow><mi>N</mi></mrow></munderover><munderover
   accent="false"
   accentunder="false"><mrow><mo>∑</mo></mrow><mrow><mi>j</mi><mo>=</mo><m
   n>1</mn></mrow><mrow><mi>N</mi></mrow></munderover><msubsup><mrow><mi>b
   </mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow><mrow><mi>t</mi></mrow></msubsup>
   <mo>,</mo><mi>t</mi><mo>=</mo><mn>1</mn><mo>,</mo><mn>2</mn></mtd><mtd
   class="eqnarray-2"><mtext>and</mtext></mtd><mtd
   class="eqnarray-3"><msub><mrow><mi>T</mi></mrow><mrow><mi>b</mi></mrow>
   </msub><mo>=</mo><munderover accent="false"
   accentunder="false"><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><m
   n>1</mn></mrow><mrow><mi>N</mi></mrow></munderover><msup><mrow><mfenced
   close=")" open="(" separators=""><mrow><munderover accent="false"
   accentunder="false"><mrow><mo>∑</mo></mrow><mrow><mi>j</mi><mo>=</mo><m
   n>1</mn></mrow><mrow><mi>N</mi></mrow></munderover><msub><mrow><mi>b</m
   i></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub></mrow></mfenced></mrow><m
   row><mn>2</mn></mrow></msup></mtd><mtd><mtext></mtext></mtd></mtr></mta
   ble> :MATH]

   where
   [MATH: <msubsup><mrow><mi>a</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow><mrow><mi>t</mi></mrow></msubsup>
   :MATH]
   and
   [MATH: <msubsup><mrow><mi>b</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow><mrow><mi>t</mi></mrow></msubsup>
   :MATH]
   are edge weights with power t. Here
   [MATH: <mfrac><mrow><mspace width="0.3em"></mspace><msup><mrow><mspace
   width="1em"></mspace></mrow><mrow><mn>1</mn></mrow></msup><msub><mrow><
   mi>S</mi></mrow><mrow><mi>a</mi></mrow></msub></mrow><mrow><mi>N</mi><m
   o>(</mo><mi>N</mi><mo>−</mo><mn>1</mn><mo>)</mo></mrow></mfrac> :MATH]
   and
   [MATH: <mfrac><mrow><mspace width="0.3em"></mspace><msup><mrow><mspace
   width="1em"></mspace></mrow><mrow><mn>2</mn></mrow></msup><msub><mrow><
   mi>S</mi></mrow><mrow><mi>a</mi></mrow></msub></mrow><mrow><mi>N</mi><m
   o>(</mo><mi>N</mi><mo>−</mo><mn>1</mn><mo>)</mo></mrow></mfrac> :MATH]
   are empirical raw moment of edge weight a [ij], and analogously for b
   [ij]. Furthermore we need to introduce the following quantities,
   [MATH: <mtable class="eqnarray" columnalign="left center
   right"><mtr><mtd class="eqnarray-1"></mtd><mtd
   class="eqnarray-2"></mtd><mtd
   class="eqnarray-3"><msub><mrow><mi>A</mi></mrow><mrow><mi>a</mi></mrow>
   </msub><mo>=</mo><msup><mrow><mfenced close=")" open="("
   separators=""><mrow><mspace width="0.3em"></mspace><msup><mrow><mspace
   width="1em"></mspace></mrow><mrow><mn>1</mn></mrow></msup><msub><mrow><
   mi>S</mi></mrow><mrow><mi>a</mi></mrow></msub></mrow></mfenced></mrow><
   mrow><mn>2</mn></mrow></msup><mo>,</mo><msub><mrow><mi>B</mi></mrow><mr
   ow><mi>a</mi></mrow></msub><mo>=</mo><msub><mrow><mi>T</mi></mrow><mrow
   ><mi>a</mi></mrow></msub><mo>−</mo><mfenced close=")" open="("
   separators=""><mrow><mspace width="0.3em"></mspace><msup><mrow><mspace
   width="1em"></mspace></mrow><mrow><mn>2</mn></mrow></msup><msub><mrow><
   mi>S</mi></mrow><mrow><mi>a</mi></mrow></msub></mrow></mfenced><mo>,</m
   o></mtd><mtd><mtext></mtext></mtd></mtr><mtr><mtd
   class="eqnarray-1"></mtd><mtd class="eqnarray-2"></mtd><mtd
   class="eqnarray-3"><mspace width="1em"></mspace><mspace
   width="1em"></mspace><mtext>and</mtext><mspace
   width="1em"></mspace><mspace
   width="1em"></mspace><msub><mrow><mi>C</mi></mrow><mrow><mi>a</mi></mro
   w></msub><mo>=</mo><msub><mrow><mi>A</mi></mrow><mrow><mi>a</mi></mrow>
   </msub><mo>+</mo><mn>2</mn><mfenced close=")" open="("
   separators=""><mrow><mspace width="0.3em"></mspace><msup><mrow><mspace
   width="1em"></mspace></mrow><mrow><mn>2</mn></mrow></msup><msub><mrow><
   mi>S</mi></mrow><mrow><mi>a</mi></mrow></msub></mrow></mfenced><mo>−</m
   o><mn>4</mn><msub><mrow><mi>T</mi></mrow><mrow><mi>a</mi></mrow></msub>
   </mtd><mtd><mtext></mtext></mtd></mtr><mtr><mtd
   class="eqnarray-1"></mtd><mtd class="eqnarray-2"></mtd><mtd
   class="eqnarray-3"><msub><mrow><mi>A</mi></mrow><mrow><mi>b</mi></mrow>
   </msub><mo>=</mo><msup><mrow><mfenced close=")" open="("
   separators=""><mrow><mspace width="0.3em"></mspace><msup><mrow><mspace
   width="1em"></mspace></mrow><mrow><mn>1</mn></mrow></msup><msub><mrow><
   mi>S</mi></mrow><mrow><mi>b</mi></mrow></msub></mrow></mfenced></mrow><
   mrow><mn>2</mn></mrow></msup><mo>,</mo><msub><mrow><mi>B</mi></mrow><mr
   ow><mi>b</mi></mrow></msub><mo>=</mo><msub><mrow><mi>T</mi></mrow><mrow
   ><mi>b</mi></mrow></msub><mo>−</mo><mfenced close=")" open="("
   separators=""><mrow><mspace width="0.3em"></mspace><msup><mrow><mspace
   width="1em"></mspace></mrow><mrow><mn>2</mn></mrow></msup><msub><mrow><
   mi>S</mi></mrow><mrow><mi>b</mi></mrow></msub></mrow></mfenced><mo>,</m
   o></mtd><mtd><mtext></mtext></mtd></mtr><mtr><mtd
   class="eqnarray-1"></mtd><mtd class="eqnarray-2"></mtd><mtd
   class="eqnarray-3"><mspace width="1em"></mspace><mspace
   width="1em"></mspace><mtext>and</mtext><mspace
   width="1em"></mspace><mspace
   width="1em"></mspace><msub><mrow><mi>C</mi></mrow><mrow><mi>b</mi></mro
   w></msub><mo>=</mo><msub><mrow><mi>A</mi></mrow><mrow><mi>b</mi></mrow>
   </msub><mo>+</mo><mn>2</mn><mfenced close=")" open="("
   separators=""><mrow><mspace width="0.3em"></mspace><msup><mrow><mspace
   width="1em"></mspace></mrow><mrow><mn>2</mn></mrow></msup><msub><mrow><
   mi>S</mi></mrow><mrow><mi>b</mi></mrow></msub></mrow></mfenced><mo>−</m
   o><mn>4</mn><msub><mrow><mi>T</mi></mrow><mrow><mi>b</mi></mrow></msub>
   </mtd><mtd><mtext></mtext></mtd></mtr></mtable> :MATH]

   Then, closed-form expressions for the mean μ [π] and variance
   [MATH:
   <msubsup><mrow><mi>σ</mi></mrow><mrow><mi>π</mi></mrow><mrow><mn>2</mn>
   </mrow></msubsup> :MATH]
   are,
   [MATH:
   <mrow><mtable><mtr><mtd><msub><mrow><mi>μ</mi></mrow><mrow><mi>π</mi></
   mrow></msub><mo>=</mo><mfrac><mrow><mspace
   width="0.3em"></mspace><msup><mrow><mspace
   width="1em"></mspace></mrow><mrow><mn>2</mn></mrow></msup><msub><mrow><
   mi>S</mi></mrow><mrow><mi>a</mi></mrow></msub><mo>+</mo><mspace
   width="0.3em"></mspace><msup><mrow><mspace
   width="1em"></mspace></mrow><mrow><mn>2</mn></mrow></msup><msub><mrow><
   mi>S</mi></mrow><mrow><mi>b</mi></mrow></msub></mrow><mrow><mi>N</mi><m
   o>(</mo><mi>N</mi><mo>−</mo><mn>1</mn><mo>)</mo></mrow></mfrac><mo>−</m
   o><mfrac><mrow><mn>2</mn><mfenced close=")" open="("
   separators=""><mrow><mspace width="0.3em"></mspace><msup><mrow><mspace
   width="1em"></mspace></mrow><mrow><mn>1</mn></mrow></msup><msub><mrow><
   mi>S</mi></mrow><mrow><mi>a</mi></mrow></msub></mrow></mfenced><mspace
   width="0.3em"></mspace><mspace width="0.3em"></mspace><mfenced
   close=")" open="(" separators=""><mrow><mspace
   width="0.3em"></mspace><msup><mrow><mspace
   width="1em"></mspace></mrow><mrow><mn>1</mn></mrow></msup><msub><mrow><
   mi>S</mi></mrow><mrow><mi>b</mi></mrow></msub></mrow></mfenced></mrow><
   mrow><msup><mrow><mi>N</mi></mrow><mrow><mn>2</mn></mrow></msup><msup><
   mrow><mfenced close=")" open="("
   separators=""><mrow><mi>N</mi><mo>−</mo><mn>1</mn></mrow></mfenced></mr
   ow><mrow><mn>2</mn></mrow></msup></mrow></mfrac><mspace
   width="0.3em"></mspace><mo>,</mo></mtd></mtr><mtr><mtd><msubsup><mrow><
   mi>σ</mi></mrow><mrow><mi>π</mi></mrow><mrow><mn>2</mn></mrow></msubsup
   ><mo>=</mo><mfrac><mrow><mn>4</mn></mrow><mrow><msup><mrow><mi>N</mi></
   mrow><mrow><mn>3</mn></mrow></msup><msup><mrow><mo>(</mo><mi>N</mi><mo>
   −</mo><mn>1</mn><mo>)</mo></mrow><mrow><mn>3</mn></mrow></msup></mrow><
   /mfrac><mfenced close="" open="["
   separators=""><mrow><mn>2</mn><mfenced close=")" open="("
   separators=""><mrow><mspace width="0.3em"></mspace><msup><mrow><mspace
   width="1em"></mspace></mrow><mrow><mn>2</mn></mrow></msup><msub><mrow><
   mi>S</mi></mrow><mrow><mi>a</mi></mrow></msub></mrow></mfenced></mrow><
   /mfenced><mfenced close="" open="" separators=""><mrow><mspace
   width="0.3em"></mspace><mspace width="0.3em"></mspace><mspace
   width="0.3em"></mspace><mfenced close=")" open="("
   separators=""><mrow><mspace width="0.3em"></mspace><msup><mrow><mspace
   width="1em"></mspace></mrow><mrow><mn>2</mn></mrow></msup><msub><mrow><
   mi>S</mi></mrow><mrow><mi>b</mi></mrow></msub></mrow></mfenced><mo>+</m
   o><mfrac><mrow><mn>4</mn><mo>(</mo><msub><mrow><mi>B</mi></mrow><mrow><
   mi>a</mi></mrow></msub><mo>)</mo><mo>(</mo><msub><mrow><mi>B</mi></mrow
   ><mrow><mi>b</mi></mrow></msub><mo>)</mo></mrow><mrow><mi>N</mi><mo>−</
   mo><mn>2</mn></mrow></mfrac></mrow></mfenced></mtd></mtr><mtr><mtd><mfe
   nced close="]" open="" separators=""><mrow><mspace
   width="0.3em"></mspace><mspace width="0.3em"></mspace><mspace
   width="0.3em"></mspace><mspace width="0.3em"></mspace><mspace
   width="0.3em"></mspace><mspace width="0.3em"></mspace><mspace
   width="0.3em"></mspace><mspace width="0.3em"></mspace><mspace
   width="0.3em"></mspace><mspace width="0.3em"></mspace><mspace
   width="0.3em"></mspace><mspace width="0.3em"></mspace><mspace
   width="0.3em"></mspace><mspace
   width="0.3em"></mspace><mo>+</mo><mfrac><mrow><mo>(</mo><msub><mrow><mi
   >C</mi></mrow><mrow><mi>a</mi></mrow></msub><mo>)</mo><mo>(</mo><msub><
   mrow><mi>C</mi></mrow><mrow><mi>b</mi></mrow></msub><mo>)</mo></mrow><m
   row><mo>(</mo><mi>N</mi><mo>−</mo><mn>2</mn><mo>)</mo><mo>(</mo><mi>N</
   mi><mo>−</mo><mn>3</mn><mo>)</mo></mrow></mfrac><mo>−</mo><mfrac><mrow>
   <mo>(</mo><msub><mrow><mi>A</mi></mrow><mrow><mi>a</mi></mrow></msub><m
   o>)</mo><mo>(</mo><msub><mrow><mi>A</mi></mrow><mrow><mi>b</mi></mrow><
   /msub><mo>)</mo></mrow><mrow><mi>N</mi><mo>(</mo><mi>N</mi><mo>−</mo><m
   n>1</mn><mo>)</mo></mrow></mfrac></mrow></mfenced><mi>.</mi></mtd></mtr
   ></mtable></mrow> :MATH]

   With the expressions for the first two exact moments, a corresponding
   p-value can therefore be efficiently computed from the normal
   approximation, even for very large networks. We will exploit the
   computational efficiency gained here in Section [79]Differential
   subnetwork detection, where we apply the test repeatedly on networks of
   increasingly smaller size in order to detect differential subnetworks.

Validation of asymptotic normality on scale-free networks

   The closed-form approximation for the computation of p-values only
   requires that conditions ([80]5a) and ([81]5b) are satisfied, and does
   not need any random network model to be specified. These two conditions
   can also be verified analytically in special case when certain random
   network models are assumed. For instance, in [[82]29] it was proved
   that these conditions hold true for scale-free (SF), random geometric
   (RG) and Erdös-Rényi (ER) network models when using both HD and GHD
   distances. In this section we provide a simplified proof for the case
   of SF networks using the Hamming distance. This proof should serve as
   an illustration of how these derivations can be carried out
   analytically, and as simple validation of the methodology described in
   Section [83]A non-parametric test for network comparison for SF
   networks. An analogous proof using the GHD distance can be found in the
   Supplementary Material, and we refer the reader to [[84]29] for the
   other models.

   A SF network is a network whose node degree distribution follows a
   power law, at least asymptotically, and has often been used to describe
   real biological networks [[85]30–[86]32]. The degree of each node is
   assumed to be an independent and identically distributed (IID) random
   variable with probability mass function defined as
   [MATH:
   <mi>P</mi><mo>(</mo><msub><mrow><mi>d</mi></mrow><mrow><mi>i</mi></mrow
   ></msub><mo>=</mo><mi>k</mi><mo>)</mo><mo>=</mo><mi>c</mi><msup><mrow><
   mi>k</mi></mrow><mrow><mo>−</mo><mi>α</mi></mrow></msup><mo>,</mo><mspa
   ce
   width="1em"></mspace><mi>k</mi><mo>=</mo><mi>m</mi><mo>,</mo><mi>m</mi>
   <mo>+</mo><mn>1</mn><mo>…</mo><mo>,</mo><mi>K</mi><mo>,</mo> :MATH]
   (9)

   where m and K are the lower and upper cut-offs for the node degree,
   respectively, c is a normalising constant, and α represents a power
   exponent. It is generally assumed that α is greater than 1, and the
   lower cut-off m is generally be taken to be 1. The upper cut-off K for
   α>2 is conventionally specified as
   [MATH:
   <mi>K</mi><mo>=</mo><msup><mrow><mi>N</mi></mrow><mrow><mfrac><mrow><mn
   >1</mn></mrow><mrow><mi>α</mi><mo>−</mo><mn>1</mn></mrow></mfrac></mrow
   ></msup> :MATH]
   [[87]33], and generally K=N−1 for 1<α≤2. Values of α for different
   biological networks have been characterised, and mostly vary between
   1.4 to 1.7 [[88]30].

   On defining the weights a [ij] and b [ij] as elements of A and B,
   respectively, ([89]7) becomes
   [MATH: <mtable class="eqnarray" columnalign="left center
   right"><mtr><mtd
   class="eqnarray-1"><msub><mrow><mo>lim</mo></mrow><mrow><mtext
   mathvariant="italic">N</mtext><mo>→</mo><mi>∞</mi></mrow></msub><mfrac>
   <mrow><msup><mrow><mfenced close="]" open="["
   separators=""><mrow><munder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi></mr
   ow></munder><msup><mrow><mo>(</mo><msub><mrow><mi>d</mi></mrow><mrow><m
   i>i</mi></mrow></msub><mo>−</mo><mover
   accent="true"><mrow><mi>d</mi><mspace
   width="0.3em"></mspace></mrow><mo>¯</mo></mover><mo>)</mo></mrow><mrow>
   <mn>3</mn></mrow></msup></mrow></mfenced></mrow><mrow><mn>2</mn></mrow>
   </msup></mrow><mrow><msup><mrow><mfenced close="]" open="["
   separators=""><mrow><munder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi></mr
   ow></munder><msup><mrow><mo>(</mo><msub><mrow><mi>d</mi></mrow><mrow><m
   i>i</mi></mrow></msub><mo>−</mo><mover
   accent="true"><mrow><mi>d</mi><mspace
   width="0.3em"></mspace></mrow><mo>¯</mo></mover><mo>)</mo></mrow><mrow>
   <mn>2</mn></mrow></msup></mrow></mfenced></mrow><mrow><mn>3</mn></mrow>
   </msup></mrow></mfrac><mo>=</mo><mn>0</mn><mo>,</mo></mtd></mtr></mtabl
   e> :MATH]
   (10)

   where
   [MATH: <mover accent="true"><mrow><mi>d</mi><mspace
   width="0.3em"></mspace></mrow><mo>¯</mo></mover> :MATH]
   is the average node degree. In order to study this limiting behaviour,
   we exploit the fact that both numerator and denominator are powers of
   the centralised empirical moments of the node degree distribution. We
   let
   [MATH:
   <msub><mrow><mi>μ</mi></mrow><mrow><mi>s</mi></mrow></msub><mo>=</mo><m
   i>c</mi><munderover><mrow><mo>∑</mo></mrow><mrow><mi>d</mi><mo>=</mo><m
   n>1</mn></mrow><mrow><mi>K</mi></mrow></munderover><msup><mrow><mi>d</m
   i></mrow><mrow><mi>s</mi><mo>−</mo><mi>α</mi></mrow></msup> :MATH]
   denote the s ^th theoretical moment and
   [MATH:
   <msub><mrow><mi>m</mi></mrow><mrow><mi>s</mi></mrow></msub><mo>=</mo><m
   frac><mrow><mn>1</mn></mrow><mrow><mi>N</mi></mrow></mfrac><munderover>
   <mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mrow
   ><mi>N</mi></mrow></munderover><msubsup><mrow><mi>d</mi></mrow><mrow><m
   i>i</mi></mrow><mrow><mi>s</mi></mrow></msubsup> :MATH]
   the corresponding empirical moment of this distribution. In order to
   study the limit above we need to characterise the order of m [s], for
   s=1,2,3, as N increases. Our strategy here consists of first
   characterising the order of μ [s] asymptotically, for the first three
   moments, and establishing a correspondence with m [s].

   We start by examining the order of μ [s], for s=1,2,3, in the limit.
   Since this depends on s, we consider three distinct cases: (a) s−α+1<0,
   (b) s−α+1=0 and (c) s−α+1>0. For (a), the order of μ [s] is
   [MATH:
   <munderover><mrow><mo>∑</mo></mrow><mrow><mi>d</mi><mo>=</mo><mn>1</mn>
   </mrow><mrow><mi>K</mi></mrow></munderover><mfrac><mrow><mn>1</mn></mro
   w><mrow><mi>α</mi><mo>−</mo><mn>1</mn></mrow></mfrac><msup><mrow><mi>d<
   /mi></mrow><mrow><mo>−</mo><mn>1</mn></mrow></msup><mo>=</mo><mi>O</mi>
   <mo>(</mo><mn>1</mn><mo>)</mo> :MATH]
   . For (b), the order of μ [s] is
   [MATH:
   <munderover><mrow><mo>∑</mo></mrow><mrow><mi>d</mi><mo>=</mo><mn>1</mn>
   </mrow><mrow><mi>K</mi></mrow></munderover><msup><mrow><mi>d</mi></mrow
   ><mrow><mo>−</mo><mn>1</mn></mrow></msup><mo>=</mo><mi>O</mi><mo>(</mo>
   <mo>ln</mo><mo>(</mo><mi>K</mi><mo>)</mo><mo>)</mo> :MATH]
   . Finally, for (c), we need to study how μ [s] increases with K. First,
   we apply the Euler-Maclaurin formula,
   [MATH: <mrow><munderover accent="false" accentunder="false"><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>d</mi><mo>=</mo><mn>1</mn></mrow>
   <mrow><mi>K</mi></mrow></munderover><msup><mrow><mi>d</mi></mrow><mrow>
   <mi>s</mi><mo>−</mo><mi>α</mi></mrow></msup><mo>=</mo><msup><mrow><mi>K
   </mi></mrow><mrow><mi>s</mi><mo>−</mo><mi>α</mi><mo>+</mo><mn>1</mn></m
   row></msup><mo>+</mo><mo>(</mo><mi>α</mi><mo>−</mo><mi>s</mi><mo>)</mo>
   <munderover><mrow><mo>∫</mo></mrow><mrow><mn>1</mn></mrow><mrow><mi>K</
   mi></mrow></munderover><mfrac><mrow><mfenced close="⌋" open="⌊"
   separators=""><mrow><mi>x</mi></mrow></mfenced></mrow><mrow><msup><mrow
   ><mi>x</mi></mrow><mrow><mi>α</mi><mo>−</mo><mi>s</mi><mo>+</mo><mn>1</
   mn></mrow></msup></mrow></mfrac><mtext
   mathvariant="italic">dx</mtext><mo>+</mo><mi>O</mi><mo>(</mo><mn>1</mn>
   <mo>)</mo><mo>,</mo></mrow> :MATH]

   where ⌊x⌋ denotes the largest integer that is not greater than x. To
   compute the order of
   [MATH:
   <munderover><mrow><mo>∑</mo></mrow><mrow><mi>d</mi><mo>=</mo><mn>1</mn>
   </mrow><mrow><mi>K</mi></mrow></munderover><msup><mrow><mi>d</mi></mrow
   ><mrow><mi>s</mi><mo>−</mo><mi>α</mi></mrow></msup> :MATH]
   , we need to know which one of the two terms in the sum dominates in
   order. By applying l’Hospital’s rule we have
   [MATH: <mtable class="eqnarray" columnalign="left center
   right"><mtr><mtd
   class="eqnarray-1"><msub><mrow><mo>lim</mo></mrow><mrow><mtext
   mathvariant="italic">K</mtext><mo>→</mo><mi>∞</mi></mrow></msub><mfrac>
   <mrow><mi>s</mi><munderover><mrow><mo>∫</mo></mrow><mrow><mn>1</mn></mr
   ow><mrow><mi>K</mi></mrow></munderover><mfrac><mrow><mfenced close="⌋"
   open="⌊"
   separators=""><mrow><mi>x</mi></mrow></mfenced></mrow><mrow><msup><mrow
   ><mi>x</mi></mrow><mrow><mi>α</mi><mo>−</mo><mi>s</mi><mo>+</mo><mn>1</
   mn></mrow></msup></mrow></mfrac><mtext
   mathvariant="italic">dx</mtext></mrow><mrow><msup><mrow><mi>K</mi></mro
   w><mrow><mi>s</mi><mo>−</mo><mi>α</mi><mo>+</mo><mn>1</mn></mrow></msup
   ></mrow></mfrac></mtd><mtd class="eqnarray-2"><mo>=</mo></mtd><mtd
   class="eqnarray-3"><mfrac><mrow><mi>s</mi></mrow><mrow><mi>s</mi><mo>−<
   /mo><mi>α</mi><mo>+</mo><mn>1</mn></mrow></mfrac><mo>,</mo></mtd><mtd><
   mtext></mtext></mtd></mtr></mtable> :MATH]

   which is a finite constant, and hence μ [s] has the same order as K
   ^s−α+1. For a SF network, the condition for asymptotic normality also
   depends on the values taken by the exponent. In the case where 1<α<2,
   for which K=N−1, the calculation of the s ^th moment falls under case
   (c), hence we conclude that the order of the first three theoretical
   moments are, respectively, O(N ^2−α),O(N ^3−α) and O(N ^4−α).

   We now turn to the direct comparison of the orders of μ [s] and m [s]
   in the limit. Specifically, we assess whether the order of each μ [s]
   established above also holds true for the corresponding m [s]. This can
   be verified by checking that
   [MATH: <mtable class="eqnarray" columnalign="left center
   right"><mtr><mtd
   class="eqnarray-1"><msub><mrow><mo>lim</mo></mrow><mrow><mi>N</mi><mo>→
   </mo><mi>∞</mi></mrow></msub><mfrac><mrow><msub><mrow><mi>m</mi></mrow>
   <mrow><mi>s</mi></mrow></msub></mrow><mrow><msub><mrow><mi>μ</mi></mrow
   ><mrow><mi>s</mi></mrow></msub></mrow></mfrac><mo>=</mo><msub><mrow><mi
   >c</mi></mrow><mrow><mi>s</mi></mrow></msub><mo>,</mo></mtd></mtr></mta
   ble> :MATH]
   (11)

   for s=1,2,3, and for some positive constants c [s]. To study the above
   limit, we apply the Weak Law of Large Numbers (WLLN). For the WLLN to
   hold, μ [s] must be finite. Hence we first transform d [i] so that μ
   [s], after the transformation, is finite. We let
   [MATH:
   <msub><mrow><mi>N</mi></mrow><mrow><mi>s</mi></mrow></msub><mo>=</mo><m
   sup><mrow><mi>N</mi></mrow><mrow><mfrac><mrow><mi>s</mi><mo>+</mo><mn>1
   </mn><mo>−</mo><mi>α</mi></mrow><mrow><mi>s</mi></mrow></mfrac></mrow><
   /msup> :MATH]
   , and define
   [MATH: <msub><mrow><mi>z</mi></mrow><mrow><mtext
   mathvariant="italic">si</mtext></mrow></msub><mo>=</mo><mfrac><mrow><ms
   ub><mrow><mi>d</mi></mrow><mrow><mi>i</mi></mrow></msub></mrow><mrow><m
   sub><mrow><mi>N</mi></mrow><mrow><mi>s</mi></mrow></msub></mrow></mfrac
   > :MATH]
   . The distribution of z [si] is
   [MATH:
   <mrow><mi>P</mi><mo>(</mo><msub><mrow><mi>z</mi></mrow><mrow><mtext
   mathvariant="italic">si</mtext></mrow></msub><mo>=</mo><mi>z</mi><mo>)<
   /mo><mo>=</mo><msup><mrow><mi>c</mi></mrow><mrow><mo>′</mo></mrow></msu
   p><msup><mrow><mi>z</mi></mrow><mrow><mo>−</mo><mi>α</mi></mrow></msup>
   <mo>,</mo><mspace
   width="2em"></mspace><mi>z</mi><mo>=</mo><mfrac><mrow><mn>1</mn></mrow>
   <mrow><msub><mrow><mi>N</mi></mrow><mrow><mi>s</mi></mrow></msub></mrow
   ></mfrac><mo>,</mo><mfrac><mrow><mn>2</mn></mrow><mrow><msub><mrow><mi>
   N</mi></mrow><mrow><mi>s</mi></mrow></msub></mrow></mfrac><mo>,</mo><mi
   >..</mi><mo>,</mo><mfrac><mrow><mi>K</mi></mrow><mrow><msub><mrow><mi>N
   </mi></mrow><mrow><mi>s</mi></mrow></msub></mrow></mfrac><mo>,</mo></mr
   ow> :MATH]

   where c ^′=c N [s]. Thus the s ^th theoretical moment of z [si] is
   [MATH: <mtable class="eqnarray" columnalign="left center
   right"><mtr><mtd
   class="eqnarray-1"><msub><mrow><mi>μ</mi></mrow><mrow><mtext
   mathvariant="italic">zs</mtext></mrow></msub><mo>=</mo><msup><mrow><mi>
   c</mi></mrow><mrow><mo>′</mo></mrow></msup><munder><mrow><mo>∑</mo></mr
   ow><mrow><mi>z</mi></mrow></munder><msup><mrow><mi>z</mi></mrow><mrow><
   mi>s</mi><mo>−</mo><mi>α</mi></mrow></msup><mo>=</mo><msup><mrow><mi>c<
   /mi></mrow><mrow><mo>′</mo></mrow></msup><munder><mrow><mo>∑</mo></mrow
   ><mrow><mi>d</mi></mrow></munder><msup><mrow><mfenced close="]"
   open="["
   separators=""><mrow><mfrac><mrow><mi>d</mi></mrow><mrow><msub><mrow><mi
   >N</mi></mrow><mrow><mi>s</mi></mrow></msub></mrow></mfrac></mrow></mfe
   nced></mrow><mrow><mi>s</mi></mrow></msup><msup><mrow><mi>d</mi></mrow>
   <mrow><mo>−</mo><mi>α</mi></mrow></msup><mo>=</mo><mfrac><mrow><msub><m
   row><mi>μ</mi></mrow><mrow><mi>s</mi></mrow></msub></mrow><mrow><msup><
   mrow><mi>N</mi></mrow><mrow><mi>s</mi><mo>+</mo><mn>1</mn><mo>−</mo><mi
   >α</mi></mrow></msup></mrow></mfrac><mo>,</mo></mtd><mtd
   class="eqnarray-2"></mtd><mtd
   class="eqnarray-3"></mtd><mtd><mtext></mtext></mtd></mtr></mtable>
   :MATH]

   which is finite. Denoting by m [zs] the s ^th empirical moment of z
   [si],i=1,...,N, we have
   [MATH: <mtable class="eqnarray" columnalign="left center
   right"><mtr><mtd
   class="eqnarray-1"><msub><mrow><mi>m</mi></mrow><mrow><mtext
   mathvariant="italic">zs</mtext></mrow></msub><mo>=</mo><mfrac><mrow><mn
   >1</mn></mrow><mrow><mi>N</mi></mrow></mfrac><munderover accent="false"
   accentunder="false"><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><m
   n>1</mn></mrow><mrow><mi>N</mi></mrow></munderover><msubsup><mrow><mi>z
   </mi></mrow><mrow><mtext
   mathvariant="italic">si</mtext></mrow><mrow><mi>s</mi></mrow></msubsup>
   <mo>=</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mi>N</mi></mrow></mfrac>
   <munderover accent="false"
   accentunder="false"><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><m
   n>1</mn></mrow><mrow><mi>N</mi></mrow></munderover><msup><mrow><mfenced
   close=")" open="("
   separators=""><mrow><mfrac><mrow><msub><mrow><mi>d</mi></mrow><mrow><mi
   >i</mi></mrow></msub></mrow><mrow><msub><mrow><mi>N</mi></mrow><mrow><m
   i>s</mi></mrow></msub></mrow></mfrac></mrow></mfenced></mrow><mrow><mi>
   s</mi></mrow></msup><mo>=</mo><mfrac><mrow><msub><mrow><mi>m</mi></mrow
   ><mrow><mi>s</mi></mrow></msub></mrow><mrow><msup><mrow><mi>N</mi></mro
   w><mrow><mi>s</mi><mo>+</mo><mn>1</mn><mo>−</mo><mi>α</mi></mrow></msup
   ></mrow></mfrac><mi>.</mi></mtd><mtd class="eqnarray-2"></mtd><mtd
   class="eqnarray-3"></mtd><mtd><mtext></mtext></mtd></mtr></mtable>
   :MATH]

   Now, since μ [zs] is finite and since d [1],d [2],...,d [N] are assumed
   IID, z [s1],z [s2],...z [sN] are also IID, and according to the WLLN, m
   [zs] converges to μ [zs] in probability. Hence we have
   [MATH: <mtable class="eqnarray" columnalign="left center
   right"><mtr><mtd
   class="eqnarray-1"><mn>1</mn><mo>=</mo><msub><mrow><mo>lim</mo></mrow><
   mrow><mi>N</mi><mo>→</mo><mi>∞</mi></mrow></msub><mfrac><mrow><msub><mr
   ow><mi>m</mi></mrow><mrow><mtext
   mathvariant="italic">zs</mtext></mrow></msub></mrow><mrow><msub><mrow><
   mi>μ</mi></mrow><mrow><mtext
   mathvariant="italic">zs</mtext></mrow></msub></mrow></mfrac><mo>=</mo><
   msub><mrow><mo>lim</mo></mrow><mrow><mi>N</mi><mo>→</mo><mi>∞</mi></mro
   w></msub><mfrac><mrow><mfrac><mrow><msub><mrow><mi>m</mi></mrow><mrow><
   mi>s</mi></mrow></msub></mrow><mrow><msup><mrow><mi>N</mi></mrow><mrow>
   <mi>s</mi><mo>+</mo><mn>1</mn><mo>−</mo><mi>α</mi></mrow></msup></mrow>
   </mfrac></mrow><mrow><mfrac><mrow><msub><mrow><mi>μ</mi></mrow><mrow><m
   i>s</mi></mrow></msub></mrow><mrow><msup><mrow><mi>N</mi></mrow><mrow><
   mi>s</mi><mo>+</mo><mn>1</mn><mo>−</mo><mi>α</mi></mrow></msup></mrow><
   /mfrac></mrow></mfrac><mo>=</mo><msub><mrow><mo>lim</mo></mrow><mrow><m
   i>N</mi><mo>→</mo><mi>∞</mi></mrow></msub><mfrac><mrow><msub><mrow><mi>
   m</mi></mrow><mrow><mi>s</mi></mrow></msub></mrow><mrow><msub><mrow><mi
   >μ</mi></mrow><mrow><mi>s</mi></mrow></msub></mrow></mfrac><mo>,</mo></
   mtd><mtd class="eqnarray-2"></mtd><mtd
   class="eqnarray-3"></mtd><mtd><mtext></mtext></mtd></mtr></mtable>
   :MATH]

   indicating that m [s] and μ [s] are of the same order asymptotically.
   Using this result, we are able to approximate the orders of the
   numerator and denominator of condition ([90]7):
   [MATH:
   <munder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi></mrow></munder><msup><m
   row><mfenced close=")" open="("
   separators=""><mrow><msub><mrow><mi>d</mi></mrow><mrow><mi>i</mi></mrow
   ></msub><mo>−</mo><mover
   accent="true"><mrow><mi>d</mi></mrow><mo>¯</mo></mover></mrow></mfenced
   ></mrow><mrow><mn>3</mn></mrow></msup><mo>=</mo><mi>N</mi><mfenced
   close=")" open="("
   separators=""><mrow><msub><mrow><mi>m</mi></mrow><mrow><mn>3</mn></mrow
   ></msub><mo>−</mo><mn>2</mn><msub><mrow><mi>m</mi></mrow><mrow><mn>2</m
   n></mrow></msub><msub><mrow><mi>m</mi></mrow><mrow><mn>1</mn></mrow></m
   sub><mo>+</mo><mn>2</mn><msubsup><mrow><mi>m</mi></mrow><mrow><mn>1</mn
   ></mrow><mrow><mn>3</mn></mrow></msubsup></mrow></mfenced> :MATH]
   is O(N ^4−α+1), and
   [MATH:
   <munder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi></mrow></munder><msup><m
   row><mfenced close=")" open="("
   separators=""><mrow><msub><mrow><mi>d</mi></mrow><mrow><mi>i</mi></mrow
   ></msub><mo>−</mo><mover
   accent="true"><mrow><mi>d</mi></mrow><mo>¯</mo></mover></mrow></mfenced
   ></mrow><mrow><mn>2</mn></mrow></msup><mo>=</mo><mi>N</mi><mo>(</mo><ms
   ub><mrow><mi>m</mi></mrow><mrow><mn>2</mn></mrow></msub><mo>−</mo><msub
   sup><mrow><mi>m</mi></mrow><mrow><mn>1</mn></mrow><mrow><mn>2</mn></mro
   w></msubsup><mo>)</mo> :MATH]
   is O(N ^3−α+1). Substituting into ([91]7), we see that the numerator is
   of order O(N ^8−2α+2), the denominator is of order O(N ^9−3α+3), and
   therefore the ratio is of order O(N ^α−2). Hence for 1<α<2, the limit
   in ([92]10) is 0. By following a similar procedure, it can be proved
   that the normality condition is also satisfied when α≥3.

Differential subnetwork detection

   In this section we leverage the test statistic of Section [93]A
   non-parametric test for network comparison to detect a differential
   subnetwork. When comparing the two networks, the expectation is that
   only a subset of edges would present altered interaction patterns. This
   task is formulated here as the problem of detecting a subset V ^∗⊂V for
   which there is no sufficient evidence to reject the null hypothesis
   that the corresponding subnetworks
   [MATH: <msup><mrow><mi
   mathvariant="script">A</mi></mrow><mrow><mo>∗</mo></mrow></msup><mo>(</
   mo><msup><mrow><mi>V</mi></mrow><mrow><mo>∗</mo></mrow></msup><mo>,</mo
   ><msub><mrow><mi>E</mi></mrow><mrow><msup><mrow><mi>A</mi></mrow><mrow>
   <mo>∗</mo></mrow></msup></mrow></msub><mo>)</mo> :MATH]
   and
   [MATH: <msup><mrow><mi
   mathvariant="script">ℬ</mi></mrow><mrow><mo>∗</mo></mrow></msup><mo>(</
   mo><msup><mrow><mi>V</mi></mrow><mrow><mo>∗</mo></mrow></msup><mo>,</mo
   ><msub><mrow><mi>E</mi></mrow><mrow><msup><mrow><mi>B</mi></mrow><mrow>
   <mo>∗</mo></mrow></msup></mrow></msub><mo>)</mo> :MATH]
   are statistically independent.

   An algorithm for the detection of V ^∗ should take into account the
   fact that a certain degree of topological difference between
   [MATH: <mi mathvariant="script">A</mi> :MATH]
   and
   [MATH: <mi mathvariant="script">ℬ</mi> :MATH]
   is always bound to be observed, even when the two population networks
   are the same, due to finite sample variability. The GHD test provides
   an efficient way to assess the statistical significance of any observed
   discrepancy between two paired networks, and is used as a building
   block to derive an algorithm that identifies differential subnetworks.

   We indicate by V [K] a subset of V of size K≤N, and define the
   centralised GHD test statistic computed by comparing
   [MATH: <mi
   mathvariant="script">A</mi><mo>=</mo><mo>(</mo><msub><mrow><mi>V</mi></
   mrow><mrow><mi>K</mi></mrow></msub><mo>,</mo><msub><mrow><mi>E</mi></mr
   ow><mrow><mi>A</mi></mrow></msub><mo>)</mo> :MATH]
   and
   [MATH: <mi
   mathvariant="script">ℬ</mi><mo>=</mo><mo>(</mo><msub><mrow><mi>V</mi></
   mrow><mrow><mi>K</mi></mrow></msub><mo>,</mo><msub><mrow><mi>E</mi></mr
   ow><mrow><mi>B</mi></mrow></msub><mo>)</mo> :MATH]
   by
   [MATH:
   <msub><mrow><mi>Δ</mi></mrow><mrow><msub><mrow><mi>V</mi></mrow><mrow><
   mi>K</mi></mrow></msub></mrow></msub><mo>=</mo><mtext>GHD</mtext><mo>(<
   /mo><mi
   mathvariant="script">A</mi><mo>(</mo><msub><mrow><mi>V</mi></mrow><mrow
   ><mi>K</mi></mrow></msub><mo>,</mo><msub><mrow><mi>E</mi></mrow><mrow><
   mi>A</mi></mrow></msub><mo>)</mo><mo>,</mo><mi
   mathvariant="script">ℬ</mi><mo>(</mo><msub><mrow><mi>V</mi></mrow><mrow
   ><mi>K</mi></mrow></msub><mo>,</mo><msub><mrow><mi>E</mi></mrow><mrow><
   mi>B</mi></mrow></msub><mo>)</mo><mo>)</mo><mo>−</mo><msub><mrow><mi>μ<
   /mi></mrow><mrow><msub><mrow><mi>V</mi></mrow><mrow><mi>K</mi></mrow></
   msub><mo>,</mo></mrow></msub> :MATH]
   (12)

   where
   [MATH:
   <msub><mrow><mi>μ</mi></mrow><mrow><msub><mrow><mi>V</mi></mrow><mrow><
   mi>K</mi></mrow></msub></mrow></msub> :MATH]
   is the mean of the permutation distribution for node set V [K].
   Furthermore we define
   [MATH:
   <msub><mrow><mi>Δ</mi></mrow><mrow><msub><mrow><mi>V</mi></mrow><mrow><
   mi>K</mi></mrow></msub><mo>|</mo><mi>i</mi></mrow></msub> :MATH]
   to be the centralised GHD value computed by comparing the two networks
   after removal of node i. The quantity
   [MATH:
   <mrow><msub><mrow><mi>δ</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>=<
   /mo><msub><mrow><mi>Δ</mi></mrow><mrow><msub><mrow><mi>V</mi></mrow><mr
   ow><mi>K</mi></mrow></msub><mo>|</mo><mi>i</mi></mrow></msub><mo>−</mo>
   <msub><mrow><mi>Δ</mi></mrow><mrow><msub><mrow><mi>V</mi></mrow><mrow><
   mi>K</mi></mrow></msub></mrow></msub><mo>,</mo></mrow> :MATH]

   measures the influence that node i has on the mean-centred GHD test
   when comparing two subnetworks defined on set V [K]. We propose an
   iterative procedure which removes a node or set of nodes at each step,
   and generates a sequence of node sets of increasing smaller size, i.e.
   [MATH:
   <mrow><msub><mrow><mi>V</mi></mrow><mrow><mi>N</mi></mrow></msub><mo>⊃<
   /mo><msub><mrow><mi>V</mi></mrow><mrow><mi>N</mi><mo>−</mo><mn>1</mn></
   mrow></msub><mo>⊃</mo><mo>…</mo><mo>⊃</mo><msub><mrow><mi>V</mi></mrow>
   <mrow><msub><mrow><mi>N</mi></mrow><mrow><mtext>min</mtext></mrow></msu
   b></mrow></msub><mo>,</mo></mrow> :MATH]

   where N [min]<N is a constant indicating the smallest allowed size of
   subnetwork. Starting with V [N], the two corresponding networks are
   compared by the GHD test, and a p-value is computed, as described
   previously. For each node indexed by i=1,…,N, the corresponding δ [i]
   is computed, and the node associated with the largest positive δ [i]
   value is removed. Given a new set V [N−1], the process is then repeated
   again, and then again until a specified minimal set size is reached.

   This simple algorithm produces a monotonic sequence of p-values that
   increases as the subnetwork size decreases (e.g. see Fig. [94]2). The
   p-values should be adjusted for multiple testing, e.g. by controlling
   the false discovery rate [[95]34]. In the presence of a differential
   subnetwork, the sequence is expected to feature a peak corresponding to
   the size of the subnetwork. Specifically, for a given desired
   significance level α, the algorithm finds the largest K, with N≥K≥N
   [min], such that the adjusted p-value exceeds α. Clearly the algorithm
   benefits from the fact that p-values at each iteration can be computed
   very quickly in closed-form.

Fig. 2.

   Fig. 2
   [96]Open in a new tab

   Sequence of adjusted p-values produced by the dGHD algorithm as a
   function of subnetwork size. The size of the subnetwork is
   progressively reduced by removing nodes that further increase the
   distance between the subnetworks. For this example we simulated 2D RG
   networks of size 1,000 and subnetworks of size 200

Results

   In this section we report on three different simulation experiments
   that have been carried out to study the properties of the proposed
   methodology. Our simulations make use of RG networks, which are
   plausible models for biological networks [[97]17, [98]35–[99]37].
   Two-dimensional RG networks were generated by first uniformly sampling
   N points on [0,1]^2, each one corresponding to a node in the graph. A
   pair of nodes was connected by an edge if the Euclidean distance
   between the corresponding two-dimensional points was smaller than a
   pre-determined threshold d.

   The purpose of the first simulation study was to confirm the asymptotic
   null sampling distribution of the GHD statistic. In this case we
   randomly generated 10,000 pairs of networks
   [MATH: <mi mathvariant="script">A</mi> :MATH]
   and
   [MATH: <mi mathvariant="script">ℬ</mi> :MATH]
   of size N=250, with parameters d=0.3 and d=0.15. For each d value,
   paired networks were independently generated, and the GHD test was
   computed to detect differences between them. As a result of this
   process, we obtained an empirical distribution of p-values. Under the
   null, this distribution is expected to be uniform on [0,1], and the
   resulting QQ plots confirm that the empirical moments of this
   distribution agree perfectly with the expected theoretical moments for
   a RG model; see Fig. [100]3.

Fig. 3.

   Fig. 3
   [101]Open in a new tab

   QQ plots confirming the asymptotic null distribution of the GHD test.
   For RG model, 10,000 paired networks of size 250 were independently
   generated and the GHD test was applied to detect differences between
   them. An empirical distribution of p-values was obtained through 10,000
   comparison for each model, which under the null is expected to be
   uniformly distributed on [0,1]. The figure shows that empirical and
   theoretical moments agree

   In the second study, we compared the ability of the GHD test to detect
   differential networks against three competing tests: Mean Absolute
   Difference (MAD) [[102]38], Quadratic Assignment Procedure (QAP)
   [[103]39] and Conditional Uniform Graph (CUG) [[104]40]. The MAD test
   counts the number of different edges in the two networks
   [MATH: <mtext>MAD</mtext><mo>(</mo><mi
   mathvariant="script">A</mi><mo>,</mo><mi
   mathvariant="script">ℬ</mi><mo>)</mo><mo>=</mo><mfrac><mrow><mn>1</mn><
   /mrow><mrow><mi>N</mi><mo>(</mo><mi>N</mi><mo>−</mo><mn>1</mn><mo>)</mo
   ></mrow></mfrac><munder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>,</m
   o><mi>j</mi></mrow></munder><mo>|</mo><msub><mrow><mi>a</mi></mrow><mro
   w><mtext
   mathvariant="italic">ij</mtext></mrow></msub><mo>−</mo><msub><mrow><mi>
   b</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub><mo>|</mo><mo>,</mo>
   :MATH]
   (13)

   where a [ij] and b [ij] correspond to the (i,j) elements in the
   adjacency matrices of
   [MATH: <mi mathvariant="script">A</mi> :MATH]
   and
   [MATH: <mi mathvariant="script">ℬ</mi> :MATH]
   , respectively. The QAP uses edge set product statistics to test for
   the independence between networks,
   [MATH: <mtext>QAP</mtext><mo>(</mo><mi
   mathvariant="script">A</mi><mo>,</mo><mi
   mathvariant="script">ℬ</mi><mo>)</mo><mo>=</mo><mfrac><mrow><mn>1</mn><
   /mrow><mrow><mi>N</mi><mo>(</mo><mi>N</mi><mo>−</mo><mn>1</mn><mo>)</mo
   ></mrow></mfrac><munder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>,</m
   o><mi>j</mi></mrow></munder><msub><mrow><mi>a</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub><msub><mrow><mi>b</mi></mr
   ow><mrow><mtext mathvariant="italic">ij</mtext></mrow></msub><mo>,</mo>
   :MATH]
   (14)

   where a [ij] and b [ij] are again elements of the adjacency matrices.
   For both the MAD and QAP tests we also used the traditional permutation
   testing approach. We further included in the study the CUG approach.
   According to this procedure, random networks are generated with
   pre-determined properties, such as size and density, matching the
   properties of the observed networks. For each simulated pair of random
   networks, a measure of correlation between networks is computed, and
   its empirical distribution is built up over many simulations. The
   correlation coefficient is defined as:
   [MATH: <mtable class="eqnarray" columnalign="left center
   right"><mtr><mtd class="eqnarray-1"><mtext>gcor</mtext><mo>(</mo><mi
   mathvariant="script">A</mi><mo>,</mo><mi
   mathvariant="script">ℬ</mi><mo>)</mo><mo>=</mo><munder><mrow><mo>∑</mo>
   </mrow><mrow><mi>i</mi><mo>,</mo><mi>j</mi></mrow></munder><mfenced
   close=")" open="("
   separators=""><mrow><msub><mrow><mi>a</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub><mo>−</mo><mfrac><mrow><mu
   nder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>,</mo><mi>j</mi></mrow>
   </munder><msub><mrow><mi>a</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub></mrow><mrow><mi>N</mi><mo
   >(</mo><mi>N</mi><mo>−</mo><mn>1</mn><mo>)</mo></mrow></mfrac></mrow></
   mfenced><mfenced close=")" open="("
   separators=""><mrow><msub><mrow><mi>b</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub><mo>−</mo><mfrac><mrow><mu
   nder><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>,</mo><mi>j</mi></mrow>
   </munder><msub><mrow><mi>b</mi></mrow><mrow><mtext
   mathvariant="italic">ij</mtext></mrow></msub></mrow><mrow><mi>N</mi><mo
   >(</mo><mi>N</mi><mo>−</mo><mn>1</mn><mo>)</mo></mrow></mfrac></mrow></
   mfenced><mo>,</mo></mtd><mtd class="eqnarray-2"></mtd><mtd
   class="eqnarray-3"></mtd><mtd><mtext></mtext></mtd></mtr></mtable>
   :MATH]

   where a [ij] and b [ij] are elements of the adjacency matrices for
   [MATH: <mi mathvariant="script">A</mi> :MATH]
   and
   [MATH: <mi mathvariant="script">ℬ</mi> :MATH]
   , respectively [[105]41].

   This experiment required the simulation of paired networks with a
   pre-specified degree of topological dissimilarity. This was achieved by
   generating
   [MATH: <mi mathvariant="script">A</mi> :MATH]
   first, using one of the two random models as described above. Network
   [MATH: <mi mathvariant="script">ℬ</mi> :MATH]
   was then obtained by first making an exact copy of
   [MATH: <mi mathvariant="script">A</mi> :MATH]
   , and then randomly shuffling a fixed proportion γ of edges so that, as
   γ increases, the dissimilarity between
   [MATH: <mi mathvariant="script">A</mi> :MATH]
   and
   [MATH: <mi mathvariant="script">ℬ</mi> :MATH]
   increases. For each given value of γ, we generated 1,000 pairs of
   networks, computed the tests and corresponding p-values, and evaluated
   the proportion of tests that rejected the null hypothesis of
   independence at a 5 % significance level. The results of this study are
   summarised in Fig. [106]4 where the “power” is defined as the
   proportion of replications, out of 1,000, when we accept the null
   hypothesis of independence. This rises from zero at γ=0.84, when
   networks are still associated, to close to 1 when a lot of shuffling
   has been carried out, to produce nearly independent networks. This
   figure shows that for noise levels as large as γ=0.93, the tests based
   on HD consider the two networks to be strongly associated. It is only
   when reaching that threshold that their power starts increasing rapidly
   away from zero. This suggests that the tests based on HD may be too
   stringent for real application and miss importance differential
   patterns. By contract, the GHD test is able to detect differences at
   lower noise levels compared to other tests and capture more subtle
   differences. This is not surprising as GHD is more sensitive to
   topological changes, as seen in Fig. [107]1.

Fig. 4.

   Fig. 4
   [108]Open in a new tab

   The “power” is defined as the proportion of replications when the null
   hypothesis of independence is not rejected. As the noise level γ
   increases, the GHD test has more power to detect true structural
   changes compared to competing methods. Simulations are based on 2D RG
   networks

   In the third simulation study, we carried out an investigation to
   assess the behaviour of the differential subnetwork detection
   algorithm, and quantify its performance in comparison with other tests.
   We report on experiments involving RG networks
   [MATH: <mi mathvariant="script">A</mi> :MATH]
   and
   [MATH: <mi mathvariant="script">ℬ</mi> :MATH]
   of size 1,000 and generated as described above using a noise parameter
   γ. Two independent subnetworks, denoted here by
   [MATH: <msup><mrow><mi
   mathvariant="script">A</mi></mrow><mrow><mo>∗</mo></mrow></msup> :MATH]
   and
   [MATH: <msup><mrow><mi
   mathvariant="script">ℬ</mi></mrow><mrow><mo>∗</mo></mrow></msup> :MATH]
   , were introduced by randomly selecting a subset V ^∗⊂V of size 200,
   and replacing the existing edges with connections simulated from two
   independent RG networks. For each value of γ, we generated 100 such
   paired large networks containing smaller differential subnetworks. We
   term a true positive (TP) a node that is correctly identified as
   belonging to the differential subnetwork, and a false negative (FN) a
   node that belongs to the subnetwork but has not been detected by the
   algorithm. Similarly we define false positives (FP) and true negatives
   (TN). In Table [109]1 we report the sensitivity or true positive rate
   (TPR) computed as TP/(TP + FN), and the specificity (SPC) computed as
   TN/(FP + TN). For comparative purposes, we have also implemented an
   alternative algorithm, called dHD, which is similar to dGHD but uses
   the Hamming distance instead for distance calculations. As can be
   observed, both dHD and dGHD maintain high sensitivity and specificity
   up to moderately high noise levels. For noise levels at the top end of
   the spectrum, dHD has slightly higher sensitivity but much smaller
   specificity than dGHD, indicating that it detects a larger number of
   incorrect nodes.

Table 1.

   Sensitivity (TPR) and specificity (SPC) of the subnetwork detection
   algorithms for different values of γ, the noise level. The results are
   based on simulated RG networks
        γ   0.055 0.11  0.23  0.54  0.79  0.95
   dGHD TPR 0.897 0.889 0.855 0.627 0.570 0.789
        SPC 0.987 0.984 0.974 0.912 0.768 0.439
   dHD  TPR 0.914 0.904 0.872 0.725 0.712 0.862
        SPC 0.978 0.971 0.956 0.843 0.567 0.201
   [110]Open in a new tab

   Figure [111]5 provides an example of simulated networks
   [MATH: <mi mathvariant="script">A</mi> :MATH]
   and
   [MATH: <mi mathvariant="script">ℬ</mi> :MATH]
   and ground truth differential subnetworks
   [MATH: <msup><mrow><mi
   mathvariant="script">A</mi></mrow><mrow><mo>∗</mo></mrow></msup> :MATH]
   and
   [MATH: <msup><mrow><mi
   mathvariant="script">ℬ</mi></mrow><mrow><mo>∗</mo></mrow></msup> :MATH]
   as well as the differential subnetworks
   [MATH: <msup><mrow><mover accent="true"><mrow><mi
   mathvariant="script">A</mi></mrow><mo>^</mo></mover></mrow><mrow><mo>∗<
   /mo></mrow></msup> :MATH]
   and
   [MATH: <msup><mrow><mover accent="true"><mrow><mi
   mathvariant="script">ℬ</mi></mrow><mo>^</mo></mover></mrow><mrow><mo>∗<
   /mo></mrow></msup> :MATH]
   detected by dGHD in one of the 100 simulations. The corresponding
   sequence of p-values generated by running the dGHD algorithm in this
   example is shown in Fig. [112]2. It can be noticed how the null
   hypothesis of independence is rejected for all the subnetworks of size
   ranging from 1000 down to 200, at which point there is no evidence to
   reject the null, and the algorithm produced large p-values for all
   sizes smaller than 200.

Fig. 5.

   Fig. 5
   [113]Open in a new tab

   Example of differential subnetworks detected by dGHD using 2D RG
   networks.
   [MATH: <msup><mrow><mi
   mathvariant="script">A</mi></mrow><mrow><mo>∗</mo></mrow></msup> :MATH]
   and
   [MATH: <msup><mrow><mi
   mathvariant="script">ℬ</mi></mrow><mrow><mo>∗</mo></mrow></msup> :MATH]
   are the true simulated independent subnetworks, and
   [MATH: <msup><mrow><mover accent="true"><mrow><mi
   mathvariant="script">A</mi></mrow><mo>^</mo></mover></mrow><mrow><mo>∗<
   /mo></mrow></msup> :MATH]
   and
   [MATH: <msup><mrow><mover accent="true"><mrow><mi
   mathvariant="script">ℬ</mi></mrow><mo>^</mo></mover></mrow><mrow><mo>∗<
   /mo></mrow></msup> :MATH]
   are the subnetworks detected by the algorithm (γ≈0.23). Nodes belongs
   to differential subnetwork are coloured in green, and edges colored
   red. Please refer to Table [114]1 for full results

Application to co-methylation networks in ovarian cancer

   We present an application to a case-control epigenetic study of ovarian
   cancer. The dataset for this study was originally presented in
   [[115]42]. Methylation profiles for 27,578 CpGs islands were obtained
   from whole blood samples in 540 women, of which 266 were samples taken
   from postmenopausal women with ovarian cancer and 274 were from
   age-matched healthy controls. In our analysis we set out to compare
   control and case DNA co-methylation networks in search of a
   differential subnetwork.

   Raw data files were downloaded from GEO (repos. number [116]GSE19711),
   and were obtained from Illumina Infinium 27k Human DNA methylation
   Beadchip v1.2. The raw data was pre-processed by using the lumi package
   in R [[117]43]. After quantile normalization, PCA applied to the beta
   value was used to detect and remove extreme outliers. After quality
   control, 243 control samples and 215 case samples remained for further
   analysis. The networks was inferred by taking each probe as a node.
   Following [[118]44], an adjacency measure was computed as ω
   [ij]=|(1+cor(g [i],g [j]))/2|^b where cor(g [i],g [j]) denotes the
   Pearson’s correlation coefficient between beta values observed at the i
   ^th and j ^th CpG sites. The power exponent b was set to a default
   value of 12 so as to place more emphasis on higher positive
   correlations [[119]7]. Two nodes were linked in the network if ω [ij]
   was higher than 0.2 so that the presence of an edge indicates a strong
   correlation. This value also yields networks that roughly follow a SF
   model (see Fig. [120]6). The number of resulting edges is 48,224 and
   75,913 in the control network
   [MATH: <mi mathvariant="script">A</mi> :MATH]
   and case network
   [MATH: <mi mathvariant="script">ℬ</mi> :MATH]
   , respectively.

Fig. 6.

   Fig. 6
   [121]Open in a new tab

   Node degree distribution for control and case co-methylation networks.
   Both plots shows that the networks roughly follow SF network models

   At a significance level of 5 % and after correction for multiple
   testing, the dGHD algorithm detected a subnetwork of size 1,642, with
   1,954 edges in
   [MATH: <msup><mrow><mi
   mathvariant="script">A</mi></mrow><mrow><mo>∗</mo></mrow></msup> :MATH]
   and 12,556 edges in
   [MATH: <msup><mrow><mi
   mathvariant="script">ℬ</mi></mrow><mrow><mo>∗</mo></mrow></msup> :MATH]
   . The two resulting subnetworks are presented in Fig. [122]7. Although
   the algorithm does not constrain the differential networks to be
   connected, they both comprise a number of connected subgraphs. The
   Walktrap community detection algorithm, as implemented in the R package
   iGraph [[123]45], was used to identify communities in these two
   subnetworks, as shown in the figure. The density of the six largest
   communities, which are denoted
   [MATH: <msub><mrow><mi
   mathvariant="script">C</mi></mrow><mrow><mn>1</mn></mrow></msub><mo>,</
   mo><mo>…</mo><mo>,</mo><msub><mrow><mi
   mathvariant="script">C</mi></mrow><mrow><mn>6</mn></mrow></msub> :MATH]
   , differs quite substantially between control and cancer networks. In
   almost all communities, the density is much higher in
   [MATH: <msup><mrow><mi
   mathvariant="script">ℬ</mi></mrow><mrow><mo>∗</mo></mrow></msup> :MATH]
   , with the exception of
   [MATH: <msub><mrow><mi
   mathvariant="script">C</mi></mrow><mrow><mn>6</mn></mrow></msub> :MATH]
   , where it is higher in
   [MATH: <msup><mrow><mi
   mathvariant="script">A</mi></mrow><mrow><mo>∗</mo></mrow></msup> :MATH]
   .

Fig. 7.

   Fig. 7
   [124]Open in a new tab

   DNA co-methylation networks: differential subnetworks
   [MATH: <msup><mrow><mi
   mathvariant="script">A</mi></mrow><mrow><mo>∗</mo></mrow></msup> :MATH]
   (controls) and
   [MATH: <msup><mrow><mi
   mathvariant="script">ℬ</mi></mrow><mrow><mo>∗</mo></mrow></msup> :MATH]
   (cases) detected by dGHD algorithm. Six main communities within the
   subnetworks are characterised by a much higher network density in
   cancer patients compared to healthy controls. Differential methylation
   is mostly concentrated in
   [MATH: <msub><mrow><mi
   mathvariant="script">C</mi></mrow><mrow><mn>2</mn></mrow></msub> :MATH]
   ,
   [MATH: <msub><mrow><mi
   mathvariant="script">C</mi></mrow><mrow><mn>3</mn></mrow></msub> :MATH]
   ,
   [MATH: <msub><mrow><mi
   mathvariant="script">C</mi></mrow><mrow><mn>5</mn></mrow></msub> :MATH]
   and
   [MATH: <msub><mrow><mi
   mathvariant="script">C</mi></mrow><mrow><mn>6</mn></mrow></msub> :MATH]
   (See also Table [125]2 and Fig. [126]8)

   To gain initial insight into the biological meaning of the subnetworks
   and the communities within them we used the R package GOstat [[127]46]
   to identify enriched Gene Ontology (GO) terms within two broad
   categories, Biological Processes (BP) and Molecular Functions (MF). At
   a 5 % significance level, the hypergeometric test detected 762 BP and
   154 MF statistically significant terms enriched in the subnetworks
   where most of these terms can be found in 6 communities. For instance,
   the top three BPs were response to stimulus, cellular response to
   stimulus and response to chemical stimulus, and the top three MFs were
   protein binding, collagen binding and RNA polymerase II transcription
   cofactor activity. Furthermore, we carried out a pathway enrichment
   analysis to identify any significantly enriched KEGG pathways. At a 5 %
   significance level, 12 pathways were found to be enriched, including
   hematopoietic cell lineage, acute myeloid leukemia, and regulation of
   action cytoskeleton.

   Probes showing statistically significant changes in mean methylation
   levels were detected by a two-sample SAM statistic as implemented in
   the R package samr. After Benjamini & Hochberg correction for multiple
   testing, 2,770 probes were found to be differentially methylated (DM)
   at the 5 % significance level. Of these, 620 were also found in the
   differential subnetworks, 90 % of which are concentrated in communities
   [MATH: <msub><mrow><mi
   mathvariant="script">C</mi></mrow><mrow><mn>2</mn></mrow></msub><mo>,</
   mo><msub><mrow><mi
   mathvariant="script">C</mi></mrow><mrow><mn>3</mn></mrow></msub><mo>,</
   mo><msub><mrow><mi
   mathvariant="script">C</mi></mrow><mrow><mn>5</mn></mrow></msub> :MATH]
   and
   [MATH: <msub><mrow><mi
   mathvariant="script">C</mi></mrow><mrow><mn>6</mn></mrow></msub> :MATH]
   . For example in community
   [MATH: <msub><mrow><mi
   mathvariant="script">C</mi></mrow><mrow><mn>3</mn></mrow></msub> :MATH]
   , there are 109 probes in total, half of which (54) are differentially
   methylated. Figure [128]8 shows the distribution of DM probes in the
   subnetworks. These results suggest that a differential analysis based
   exclusively on detecting mean levels of differential methylation may
   miss important differences that can only be identified by comparing the
   interaction networks.

Fig. 8.

   Fig. 8
   [129]Open in a new tab

   Visualization of the distribution of differential methylated probes
   (red) in differential subnetworks detected by dGHD in the DNA
   co-methylation networks

   Table [130]2 provides a breakdown of the number of probes,
   differentially methylated probes (q [i]), density ratio between control
   and case subnetworks (R [i]), and distribution of enriched GO terms and
   KEGG pathways in the 6 communities (see also Fig. [131]7). Replicated
   GO terms and pathways involved in different communities were excluded
   in the subtotal. In
   [MATH: <msub><mrow><mi
   mathvariant="script">C</mi></mrow><mrow><mn>5</mn></mrow></msub> :MATH]
   we found that all top 6 ranked significant BP terms were related to
   interleukin-3 (IL-3), a cytokine that is made by leukocytes and other
   cells in the body. IL-3 can increase the number of leukocytes,
   neutrophils, and platelets made by the bone marrow [[132]47]. As
   Myelosuppression induced by chemotherapy is closely related to the
   effect of IL-3 in blood cells when suppressing a tumor during the
   therapy [[133]48], this may offer a possible explanation for the
   observed enrichment results. A possible explanation for the observed
   difference in the
   [MATH: <msub><mrow><mi
   mathvariant="script">C</mi></mrow><mrow><mn>6</mn></mrow></msub> :MATH]
   cluster may be related to hypermethylation being linked to cancer
   [[134]49, [135]50].

Table 2.

   DNA co-methylation networks: a summary for different communities
   [MATH: <msub><mrow><mi
   mathvariant="script">C</mi></mrow><mrow><mn>1</mn></mrow></msub> :MATH]
   [MATH: <msub><mrow><mi
   mathvariant="script">C</mi></mrow><mrow><mn>2</mn></mrow></msub> :MATH]
   [MATH: <msub><mrow><mi
   mathvariant="script">C</mi></mrow><mrow><mn>3</mn></mrow></msub> :MATH]
   [MATH: <msub><mrow><mi
   mathvariant="script">C</mi></mrow><mrow><mn>4</mn></mrow></msub> :MATH]
   [MATH: <msub><mrow><mi
   mathvariant="script">C</mi></mrow><mrow><mn>5</mn></mrow></msub> :MATH]
   [MATH: <msub><mrow><mi
   mathvariant="script">C</mi></mrow><mrow><mn>6</mn></mrow></msub> :MATH]
   Subtotal              Overall
   # of probes                              418  66  109 34  347  200 1174 1642
   q [i]                                      4  66   54  1  338   97  560  620
   R [i]                                   .181 .013 .012  0 .002 23.4 .145 .156
   BP                                       320  25   38 22  236   54  568  762
   MF                                        54   4   15  3   43   27  125  154
   KEGG                                       5   0    1  1    0    1    8   12
   [136]Open in a new tab

Conclusions

   The comparison of DNA methylation or gene expression profiles across
   conditions is enabling the discovery of novel biomarkers for diagnosis
   or prognosis, and holds the promise to identify novel targets for
   therapeutical intervention. In this paper we have discussed the problem
   of comparing two labelled networks that are representative of two
   conditions (e.g. healthy and diseased tissues) and detecting
   statistically significant differences in their topology. Identifying
   disrupted interaction patterns in two labelled network comparisons is a
   challenging problem requiring novel statistical tools, and three
   contributions have been made here in this direction. Firstly, we have
   proposed the GHD, a distance between two labelled networks that detects
   more subtle differences compared to the traditional Hamming distance.
   Secondly, we have demonstrated that the GHD can be used as a
   non-parametric test to assess whether two paired networks are
   statistically independent, and have described how p-values can be
   computed in closed-form without requiring computationally expensive
   permutation procedures. The plausibility of the conditions underpinning
   our derivations has been discussed using scale-free random network
   models as an example. Thirdly, we have proposed a fast subnetwork
   detection procedure, the dGHD algorithm, to detect localized
   topological differences between two paired networks. This methodology
   provides a useful addition to standard two-sample tests that are
   commonly used for biomarker discovery. An initial evaluation has been
   carried out by comparing co-methylation networks inferred from healthy
   and cancer patients, and detecting differential subnetworks. Further
   experimental evaluation on independent datasets will be required to
   validate these results. In future work, the methodology could be
   extended to the case of more than two conditions.

Acknowledgements