Abstract

Background

   The advance of omics technologies has made possible to measure several
   data modalities on a system of interest. In this work, we illustrate
   how the Non-Parametric Combination methodology, namely NPC, can be used
   for simultaneously assessing the association of different molecular
   quantities with an outcome of interest. We argue that NPC methods have
   several potential applications in integrating heterogeneous omics
   technologies, as for example identifying genes whose methylation and
   transcriptional levels are jointly deregulated, or finding proteins
   whose abundance shows the same trends of the expression of their
   encoding genes.

Results

   We implemented the NPC methodology within “omicsNPC”, an R function
   specifically tailored for the characteristics of omics data. We compare
   omicsNPC against a range of alternative methods on simulated as well as
   on real data. Comparisons on simulated data point out that omicsNPC
   produces unbiased / calibrated p-values and performs equally or
   significantly better than the other methods included in the study;
   furthermore, the analysis of real data show that omicsNPC (a) exhibits
   higher statistical power than other methods, (b) it is easily
   applicable in a number of different scenarios, and (c) its results have
   improved biological interpretability.

Conclusions

   The omicsNPC function competitively behaves in all comparisons
   conducted in this study. Taking into account that the method (i)
   requires minimal assumptions, (ii) it can be used on different studies
   designs and (iii) it captures the dependences among heterogeneous data
   modalities, omicsNPC provides a flexible and statistically powerful
   solution for the integrative analysis of different omics data.

Introduction

   Recent developments in various high-throughput technologies have
   heightened the need for integrative analysis methods. Nowadays, several
   studies measure heterogeneous data modalities, as for example
   methylation levels, protein abundance, transcriptomics, etc., on the
   same or partially overlapping biological samples/subjects. The key idea
   is to measure several aspects of the same system in order to gain a
   deeper understanding of the underlying biological mechanisms. In such
   settings, a common tasks is identifying molecular quantities that are
   (a) measured by different omics technologies, (b) related to each other
   (e.g., associated to the same gene), and (c) that are conjointly
   affected by the factor(s) under study or associated to a relevant
   outcome, in a statistically significant way. A typical example is the
   identification of differentially expressed genes that are also
   characterized by one or more differentially methylated epigenetic
   markers [[28]1–[29]3]. Other studies investigate factors that
   simultaneously enhance the expression of a given protein and the
   abundance of its related metabolites [[30]4,[31]5]. Another scenario
   (somewhat less common) is the measurement of the same molecular
   quantities with different technologies, as for example when previously
   produced microarray gene expression profiles should be co-analyzed with
   newly produced RNA-seq data [[32]6].

   More in general, the presence of multiple omics data allows the
   identification of “differentially behaving genes”, i.e., genes that are
   affected by the factors under study in one or more of the
   transcription, translation or epigenetic levels.

   In this work we introduce and evaluate a novel application of a known
   statistical methodology, the Non-Parametric Combination (NPC) of
   dependent permutation tests [[33]7], for the integrative analysis of
   heterogeneous omics data. NPC has been described in several scientific
   papers and books [[34]7–[35]9], and it has been applied in the fields
   of industrial production [[36]10], face/expressions analysis [[37]11]
   and neuroimaging [[38]12]. However, to the best of our knowledge, this
   methodology has never been applied in molecular biology. NPC provides a
   theoretically-sound statistical framework for the integrative analysis
   of heterogeneous omics data measured on correlated samples. NPC assumes
   a global null-hypothesis of no association between any of the data
   modalities and an outcome of interested. This global null-hypothesis is
   first broken down in a set of “partial null hypotheses”, one for each
   omics dataset. NPC then uses a permutation procedure that preserves
   correlations among datasets for simultaneously producing a single
   p-value assessing the global null-hypothesis, as well as a partial
   p-value for each partial null-hypothesis.

   NPC has several advantages that make it especially suitable for being
   applied on the analysis of multiple, heterogeneous omics data.
   Particularly, NPC:
     * ▪
       allows the integration of data modalities characterized by
       different encodings, ranges and data distributions
     * ▪
       employs minimal assumptions; particularly, it does not assume
       independence across data modalities, taking in due account
       correlations among datasets
     * ▪
       provides an interpretable metric as final output, a p-value, which
       reflects the overall evidence of rejecting or not the global null
       hypothesis
     * ▪
       identifies changes that are supported by at least one modality,
       assigning a lower p-value to findings which are supported by more
       modalities
     * ▪
       provides the user with the flexibility of weighting differently the
       information from each dataset based on biological knowledge and
       experience
     * ▪
       exhibits higher statistical power than analyzing each data modality
       in isolation and thus increase the number of true findings.

   NPC is a general methodology, and must be tailored on the
   idiosyncrasies of the specific data at hand. In order to allow the
   application of NPC on omics data, we realized the R function omicsNPC,
   freely available in the STATegRa R-Bioconductor package [[39]13]. The
   omicsNPC function is able to process and co-analyze different types of
   omics data, and to combine their results following the NPC principles.

   We characterize the performances of omicsNPC in comparison with other
   methods for the integrative analysis of omics information. To this end
   simulated as well as real data are used. In the simulated data, each
   data modality is first analyzed independently using an appropriate
   statistical approach, and then all data modalities are conjointly
   analyzed using in turn omicsNPC and other, alternative integrative
   methods. The results show that the p-values provided by omicsNPC are
   calibrated, meaning that they follow a uniform distribution when no
   association is present between any data modality and the outcome of
   interest. Leek and Storey [[40]14] showed that a number of procedures
   for controlling the false discovery rate strongly control their
   respective error measure when applied on calibrated p-value. Moreover,
   omicsNPC exhibits increased statistical power in all simulated
   scenarios conducted in this study, by retrieving true findings equally
   well or better than the other methods, especially when correlation
   structures are introduced into the data.

   We further explored NPC/omicsNPC capabilities on three separate, real
   data applications: (a) the identification of gene/protein pairs that
   are deregulated in Glioblastoma, (b) pathway enrichment analysis on
   expression / methylation profiles in Schizophrenia, and (c) the
   integration of RNA-seq and microarray expression profiles measured on
   the same Breast Cancer subjects. The results of these different
   case-studies pointed out that applying omicsNPC provides more
   biological insights than analyzing each modality in isolation.

Related work

   The simplest approach adopted in the literature for assessing the
   conjoint deregulation of multiple, linked molecular quantities consists
   in analysing each data modality separately, using one of the numerous
   available methods [[41]15,[42]16], and then try to combine the results
   in an ad-hoc way. Common integration strategies are verifying whether
   linked molecular quantities are both associated with the outcome of
   interest (see for example [[43]17,[44]18]), or graphically inspecting
   whether their expression / abundance follow similar patterns [[45]19].
   These methods do not provide quantitative scores for assessing the
   overall evidence of conjoint differential behaviour, and the lack of an
   overall evidence makes the user to decide if the observed differences
   are relevant or not.

   Meta-analysis methods are a more theoretically grounded alternative
   [[46]20]. Meta-analysis has emerged an effective methodology for data
   integration, showing that combining information of several studies is
   more powerful than analysing each study/ data-modality separately, and
   results can be reproducible and robust by following well defined
   frameworks [[47]21,[48]22]. Regarding the identification of
   differentially expressed molecular quantities, meta-analysis methods
   can be grouped in the following four categories: combining p-values,
   combining ranks, combining effect sizes and directly merging data
   modalities [[49]23]. Here we further analyse the first two as more
   relevant to the problem under discussion; the latter two are usually
   unfeasible for integrating heterogeneous omics data.

   Combining p-values approaches has become popular in the integrative
   analysis of datasets with common properties [[50]23], where the
   underlying distributions of the data are assumed to be the same.
   According to this method datasets are analysed independently using the
   same statistical test and the resulting p-values, partial p-values, are
   combined using a proper function, usually Fisher or Stouffer method.
   The resulting statistic is transformed into a p-value, overall p-value.
   Partial and overall p-values are generated either parametrically, under
   the assumption that p-values are uniformly distributed or
   not-parametrically by permutation-based analysis [[51]24]. However,
   different data types require different statistical tests, characterized
   by diverse statistical distribution, and the resulting partial p-values
   could be inappropriate to combine, as discussed in the supplementary
   material of a recent study [[52]25].

   Combining ranks approaches require the genes to be ranked in each study
   / data modality according to some measure of differential expression.
   Then a combined rank is computed by taking the product, the mean, or
   the sum of the partial ranks. The statistical significance of the
   combined rank is usually assessed through a permutation-based procedure
   [[53]23]. The initial impression is that combining ranks methods could
   be applied in the integrative analysis of different omics data. On one
   hand the underlying assumptions are minimal: permutation-based testing
   assumes that samples are independent and identically distributed
   [[54]26]. However, several questions remain unanswered. In which cases
   these methods provide calibrated p-values? What is the effect of
   choosing a combination method over the others? Ranking methods return
   findings which are supported by all modalities, but what if the
   researcher is interested in retrieving elements that are supported by
   at least one modality?

   Finally, meta-analysis approaches typically do not consider
   correlations that may be present among the different data modalities
   [[55]24,[56]26]. In a typical meta-analysis schema, datasets are
   assumed to be produced by independent studies. Yet, when different data
   modalities have been measured on samples from the same subjects /
   patients, correlations among datasets are expected and should be taken
   into account. Failure to do so may lead to severe inflation of Type I
   error rates in the final findings [[57]27]. A considerable amount of
   literature has been published on extending combining p-values
   algorithms in order to take correlations into account, which is a
   non-trivial task [[58]27–[59]29]. The problem is even more complex if a
   single dataset is expected to be more informative than the other and
   thus a weighted combining function is desired [[60]29].

Methods

The Non-Parametric Combination methodology

   The omicsNPC function relies heavily on the Non-Parametric Combination
   (NPC) methodology. Here we use an example to overview the method, state
   some important points and show the modifications we employ in the case
   of omicsNPC. Let us assume that we have measured different omics
   modalities (M^1, M^2, …, M^i) over the same biological samples. We also
   make the simplistic assumption that each modality provides exactly one
   measurement for each gene; dropping this assumption is discussed later
   in the text. Our question of interest is which genes (Gs) are
   associated with a given outcome of interest / experimental factor,
   e.g., which genes behave differently between two conditions. NPC is
   applied by following the steps reported below and depicted in [61]Fig
   1.

Fig 1. Combining p-values in the NPC framework.

   [62]Fig 1
   [63]Open in a new tab

   See section “Non-Parametric Combination methodology” for an extensive
   description of the figure.

   Step 1: Divide the question of interest to a set of i partial null
   hypotheses, [64]Fig 1A.

   Partial null hypotheses:

   ▪
   [MATH: <msubsup><mrow><mi
   mathvariant="normal">H</mi></mrow><mrow><mn>0</mn></mrow><mrow><mn>1</m
   n></mrow></msubsup> :MATH]
   : the level of G is NOT associated to the outcome of interest in M^1.

   ▪
   [MATH: <msubsup><mrow><mi
   mathvariant="normal">H</mi></mrow><mrow><mn>0</mn></mrow><mrow><mn>2</m
   n></mrow></msubsup> :MATH]
   : the level of G is NOT associated to the outcome of interest in M^2.

   ▪ …

   ▪
   [MATH: <msubsup><mrow><mi
   mathvariant="normal">H</mi></mrow><mrow><mn>0</mn></mrow><mrow><mi
   mathvariant="normal">i</mi></mrow></msubsup> :MATH]
   : the level of G is NOT associated to the outcome of interest in M^i.

   Global null: all partial null hypotheses are true

   Global alternative: at least ONE of partial null is NOT true

   Step 2: For each partial null hypothesis i, select a test statistic T^i
   sensitive to the alternative and calculate its value in the observed
   data,
   [MATH: <mo>(</mo><msubsup><mrow><mi
   mathvariant="normal">T</mi></mrow><mrow><mi
   mathvariant="normal">b</mi><mo>=</mo><mn>0</mn></mrow><mrow><mn>1</mn><
   /mrow></msubsup><msubsup><mrow><mo>,</mo><mspace
   width="0.25em"></mspace><mtext>T</mtext></mrow><mrow><mtext>b</mtext><m
   o>=</mo><mn>0</mn></mrow><mrow><mn>2</mn></mrow></msubsup><mo>,</mo><mo
   >…</mo><mo>,</mo><msubsup><mrow><mi
   mathvariant="normal">T</mi></mrow><mrow><mi
   mathvariant="normal">b</mi><mo>=</mo><mn>0</mn></mrow><mrow><mtext>i</m
   text></mrow></msubsup><mo>)</mo> :MATH]
   , [65]Fig 1A. For each partial null hypothesis a suitable statistic
   should be employed, depending by the nature of the corresponding omics
   dataset.

   Step 3: Estimate the permutation distribution of each test statistic.
   To this end perform the following B times: a) permute the samples in a
   manner consistent with the global null hypothesis, and b) calculate the
   test statistic in the permuted data,
   [MATH: <mo>(</mo><msubsup><mrow><mi
   mathvariant="normal">T</mi></mrow><mrow><mi
   mathvariant="normal">b</mi></mrow><mrow><mn>1</mn></mrow></msubsup><msu
   bsup><mrow><mo>,</mo><mspace
   width="0.25em"></mspace><mtext>T</mtext></mrow><mrow><mtext>b</mtext></
   mrow><mrow><mn>2</mn></mrow></msubsup><mo>,</mo><mo>…</mo><mo>,</mo><ms
   ubsup><mrow><mi mathvariant="normal">T</mi></mrow><mrow><mi
   mathvariant="normal">b</mi></mrow><mrow><mtext>i</mtext></mrow></msubsu
   p><mo>)</mo> :MATH]
   , b ∈ {1,…,B}. Note that each time the samples must be permuted in the
   same exact way across all data modalities, in order to preserve
   dependencies among the measurements ([66]Fig 1B). This is an essential
   technicality of the NPC permutation schema.

   Step 4: Calculate the permutation p-values of the partial tests, namely
   partial p-values,
   [MATH: <mo>(</mo><mrow><msubsup><mrow><mi
   mathvariant="normal">λ</mi></mrow><mrow><mi
   mathvariant="normal">b</mi><mo>=</mo><mn>0</mn></mrow><mrow><mn>1</mn><
   /mrow></msubsup><msubsup><mrow><mo>,</mo><mi
   mathvariant="normal">λ</mi></mrow><mrow><mtext>b</mtext><mo>=</mo><mn>0
   </mn></mrow><mrow><mn>2</mn></mrow></msubsup><mo>,</mo><mo>…</mo><mo>,<
   /mo><msubsup><mrow><mi
   mathvariant="normal">λ</mi></mrow><mrow><mtext>b</mtext><mo>=</mo><mn>0
   </mn></mrow><mrow><mtext>i</mtext></mrow></msubsup></mrow><mo>)</mo>
   :MATH]
   . For each statistic T^i calculate its partial p-value on the observed
   data, by comparing its values on the observed data,
   [MATH: <msubsup><mrow><mi mathvariant="normal">T</mi></mrow><mrow><mi
   mathvariant="normal">b</mi><mo>=</mo><mn>0</mn></mrow><mrow><mtext>i</m
   text></mrow></msubsup> :MATH]
   , against its empirical distribution approximated through permutation,
   [MATH: <msubsup><mrow><mi mathvariant="normal">T</mi></mrow><mrow><mi
   mathvariant="normal">b</mi></mrow><mrow><mi
   mathvariant="normal">i</mi></mrow></msubsup> :MATH]
   , b ∈ {1,…,B}. Partial p-values are computed as:
   [MATH: <msubsup><mrow><mi mathvariant="normal">λ</mi></mrow><mrow><mi
   mathvariant="normal">b</mi><mo>=</mo><mn>0</mn></mrow><mrow><mi
   mathvariant="normal">i</mi></mrow></msubsup><mo>=</mo><mrow><mrow><mo>[
   </mo><mn>1</mn><mo>+</mo><mi mathvariant="normal">Σ</mi><mi
   mathvariant="normal">I</mi><mo>(</mo><msubsup><mrow><mi
   mathvariant="normal">T</mi></mrow><mrow><mi
   mathvariant="normal">b</mi><mo>≠</mo><mn>0</mn></mrow><mrow><mi
   mathvariant="normal">i</mi></mrow></msubsup><mo>≥</mo><msubsup><mrow><m
   i mathvariant="normal">T</mi></mrow><mrow><mi
   mathvariant="normal">b</mi><mo>=</mo><mn>0</mn></mrow><mrow><mi
   mathvariant="normal">i</mi></mrow></msubsup><mo>)</mo><mo>]</mo></mrow>
   <mo>/</mo><mrow><mo>(</mo><mi
   mathvariant="normal">B</mi></mrow></mrow><mo>+</mo><mn>1</mn><mo>)</mo>
   :MATH]
   (1)

   where the indicator function I( ) assumes value 1 if its argument is
   true. Compute partial “pseudo” p-value
   [MATH: <mo>(</mo><msubsup><mrow><mi
   mathvariant="normal">λ</mi></mrow><mrow><mi
   mathvariant="normal">b</mi></mrow><mrow><mn>1</mn></mrow></msubsup><msu
   bsup><mrow><mo>,</mo><mi
   mathvariant="normal">λ</mi></mrow><mrow><mtext>b</mtext></mrow><mrow><m
   n>2</mn></mrow></msubsup><mo>,</mo><mo>…</mo><mo>,</mo><msubsup><mrow><
   mi
   mathvariant="normal">λ</mi></mrow><mrow><mtext>b</mtext></mrow><mrow><m
   text>i</mtext></mrow></msubsup><mo>)</mo> :MATH]
   , for each permutation b ∈ {1,…,B}, [67]Fig 1C. Note that with respect
   to the standard estimators, both the numerator and denominator of
   formula (1) have been augmented by one unit. This is a subtle yet
   important detail: in a multiple testing context, permutation p-values
   should never be zero in order to avoid serious inflation of type I
   error rate, as discussed by Phipson and Smyth [[68]30].

   Step 5: Use an appropriate convex function to combine p-values across
   data modalities into a single “global” test statistic. A final vector
   of length B + 1 is produced by combining the p-values computed at step
   4. The first element
   [MATH: <msubsup><mrow><mi mathvariant="normal">T</mi></mrow><mrow><mi
   mathvariant="normal">b</mi><mo>=</mo><mn>0</mn></mrow><mrow><mi
   mathvariant="normal">g</mi><mi mathvariant="normal">l</mi><mi
   mathvariant="normal">o</mi><mi mathvariant="normal">b</mi><mi
   mathvariant="normal">a</mi><mi
   mathvariant="normal">l</mi></mrow></msubsup> :MATH]
   summarizes the partial p-values
   [MATH: <mo>(</mo><mrow><msubsup><mrow><mi
   mathvariant="normal">λ</mi></mrow><mrow><mi
   mathvariant="normal">b</mi><mo>=</mo><mn>0</mn></mrow><mrow><mn>1</mn><
   /mrow></msubsup><msubsup><mrow><mo>,</mo><mi
   mathvariant="normal">λ</mi></mrow><mrow><mtext>b</mtext><mo>=</mo><mn>0
   </mn></mrow><mrow><mn>2</mn></mrow></msubsup><mo>,</mo><mo>…</mo><mo>,<
   /mo><msubsup><mrow><mi
   mathvariant="normal">λ</mi></mrow><mrow><mtext>b</mtext><mo>=</mo><mn>0
   </mn></mrow><mrow><mtext>i</mtext></mrow></msubsup></mrow><mo>)</mo>
   :MATH]
   observed on the initial datasets, whereas the remaining elements
   [MATH: <msubsup><mrow><mi mathvariant="normal">T</mi></mrow><mrow><mi
   mathvariant="normal">b</mi></mrow><mrow><mi
   mathvariant="normal">g</mi><mi mathvariant="normal">l</mi><mi
   mathvariant="normal">o</mi><mi mathvariant="normal">b</mi><mi
   mathvariant="normal">a</mi><mi
   mathvariant="normal">l</mi></mrow></msubsup> :MATH]
   , b ∈ {1,…,B} are derived by combining the corresponding “pseudo”
   p-values, [69]Fig 1D.

   Step 6: Calculate a p-value of the global test. Similarly to the
   partial p-values, the global p-value is computed by equation (1),
   [70]Fig 1E. Note that also global p-values are never allowed to assume
   zero value.

   Step 7: Assess the evidence of the global null hypothesis by using the
   global p-value. Each partial hypothesis is evaluated based on its
   partial p-value. If necessary, both global and partial p-values should
   be adjusted for multiple testing.

   An important point of this methodology is the selection of a convex
   function to combine p-values in step 5. Pesarin and Salmaso [[71]7]
   discuss three different functions that are suitable for the NPC
   framework: Tippet, Liptak and Fisher. Each function corresponds to a
   different rejection region for the global null hypothesis, as shown in
   [72]Fig 2.

Fig 2. Combining functions’ rejection regions.

   [73]Fig 2
   [74]Open in a new tab

   Rejection regions at significance level 0.15 for the Tippett (blue,
   dashed), Liptak (green, dotted) and Fisher (red, continuous) functions
   in combining two independent p-values. The rejection regions are the
   areas below the respective lines. The Tippett function rejects the
   global null-hypothesis if at least one of the two partial tests is
   significant (violet and orange areas); in contrast, the Liptak function
   can reject the global null hypothesis if both partial tests are not
   significant, but their combination is (yellow area). However, the
   presence of a single, extremely high p-value will force the Liptak
   function to accept the global null hypothesis (orange areas). The
   Fisher function has an intermediate behavior between the previous two.

   The Tippett function,
   [MATH: <msub><mrow><mi mathvariant="normal">Τ</mi></mrow><mrow><mi
   mathvariant="normal">Τ</mi></mrow></msub><mo>=</mo><mtext>max</mtext><m
   o>(</mo><mrow><mn>1</mn><mo>−</mo><msubsup><mrow><mi
   mathvariant="normal">λ</mi></mrow><mrow><mi
   mathvariant="normal">b</mi></mrow><mrow><mi
   mathvariant="normal">i</mi></mrow></msubsup></mrow><mo>)</mo> :MATH]
   , rejects the null hypothesis as soon as any of the partial p-values
   are below the significance threshold. However, this functions fails in
   summing up the contribution of several partial p-values that,
   considered together, may indicate that the global null-hypothesis
   should be rejected. Thus, the Tippett function yields the most powerful
   global test when only one or few, but not all, sub-alternatives may
   occur.

   In contrast, the normal combining function, Liptak,
   [MATH: <msub><mrow><mi mathvariant="normal">T</mi></mrow><mrow><mi
   mathvariant="normal">L</mi></mrow></msub><mo>=</mo><mo>∑</mo><msup><mro
   w><mi
   mathvariant="normal">Φ</mi></mrow><mrow><mo>−</mo><mn>1</mn></mrow></ms
   up><mo>(</mo><mrow><mn>1</mn><mo>−</mo><msubsup><mrow><mi
   mathvariant="normal">λ</mi></mrow><mrow><mi
   mathvariant="normal">b</mi></mrow><mrow><mi
   mathvariant="normal">i</mi></mrow></msubsup></mrow><mo>)</mo> :MATH]
   , would reject the global null-hypothesis when several partial tests
   are slightly significant, but would accept it if one or more partial
   tests support it. Fisher's function summarizes p-values as follows,
   [MATH: <msub><mrow><mi mathvariant="normal">T</mi></mrow><mrow><mi
   mathvariant="normal">F</mi></mrow></msub><mo>=</mo><mo>−</mo><mn>2</mn>
   <mo>⋅</mo><mo>∑</mo><mtext>log</mtext><mo>(</mo><mrow><msubsup><mrow><m
   i mathvariant="normal">λ</mi></mrow><mrow><mi
   mathvariant="normal">b</mi></mrow><mrow><mi
   mathvariant="normal">i</mi></mrow></msubsup></mrow><mo>)</mo> :MATH]
   , and its behaviour is intermediate between those of Tippett and
   Liptak. Other functions, such as Mahalanobis and Lancaster's distance
   could also be used, however we did not employ them in the experiments
   conducted in this study. Selected combining functions also allow
   weighting differently the contribution of each data modality. Given a
   set of positive weights w^i such that ∑w^i = 1, the Fisher function
   becomes
   [MATH: <msub><mrow><mi mathvariant="normal">T</mi></mrow><mrow><mi
   mathvariant="normal">F</mi></mrow></msub><mo>=</mo><mo>−</mo><mn>2</mn>
   <mo>⋅</mo><mo>∑</mo><msup><mrow><mi
   mathvariant="normal">w</mi></mrow><mrow><mi
   mathvariant="normal">i</mi></mrow></msup><mo>⋅</mo><mtext>log</mtext><m
   o>(</mo><mrow><msubsup><mrow><mi
   mathvariant="normal">λ</mi></mrow><mrow><mi
   mathvariant="normal">b</mi></mrow><mrow><mi
   mathvariant="normal">i</mi></mrow></msubsup></mrow><mo>)</mo> :MATH]
   , while the Tippet and Liptak functions become
   [MATH: <msub><mrow><mi mathvariant="normal">Τ</mi></mrow><mrow><mi
   mathvariant="normal">Τ</mi></mrow></msub><mo>=</mo><mtext>max</mtext><m
   row><mo>(</mo><msup><mrow><mi
   mathvariant="normal">w</mi></mrow><mrow><mi
   mathvariant="normal">i</mi></mrow></msup><mo>⋅</mo><mo>(</mo><mrow><mn>
   1</mn><mo>−</mo><msubsup><mrow><mi
   mathvariant="normal">λ</mi></mrow><mrow><mi
   mathvariant="normal">b</mi></mrow><mrow><mi
   mathvariant="normal">i</mi></mrow></msubsup></mrow><mo>)</mo><mo>)</mo>
   </mrow> :MATH]
   and
   [MATH: <msub><mrow><mi mathvariant="normal">T</mi></mrow><mrow><mi
   mathvariant="normal">L</mi></mrow></msub><mo>=</mo><mo>∑</mo><msup><mro
   w><mi mathvariant="normal">w</mi></mrow><mrow><mi
   mathvariant="normal">i</mi></mrow></msup><mo>⋅</mo><msup><mrow><mi
   mathvariant="normal">Φ</mi></mrow><mrow><mo>−</mo><mn>1</mn></mrow></ms
   up><mo>(</mo><mrow><mn>1</mn><mo>−</mo><msubsup><mrow><mi
   mathvariant="normal">λ</mi></mrow><mrow><mi
   mathvariant="normal">b</mi></mrow><mrow><mi
   mathvariant="normal">i</mi></mrow></msubsup></mrow><mo>)</mo> :MATH]
   , respectively. The weights to use can be selected on the basis of
   biological or technical considerations (e.g., the expected amount of
   noise in each data modality).

Applying the NPC on heterogeneous omics data in practice

Mapping measurements across omics modalities

   In the simplified explanation presented in the “Non-Parametric
   Combination methodology” section, each omics dataset was assumed to
   provide exactly one measurement for each gene. This is often not the
   case in real studies, where each data modality is usually defined over
   a distinct set of measurements, and the relation among these sets is
   often complex and strictly dependent on their biological significance.
   For example, multiple methylation markers can be associated with the
   transcriptional activity of the same gene; furthermore, this
   association is not always certain, especially in case of markers close
   to the genomic region of the gene but not physically laying on it. The
   same applies for genomic variants, for the interplay between genes and
   proteins, or proteins and metabolites.

   The NPC methodology requires features measured by different data
   modalities to be mapped to each other. While the exact mapping strategy
   can be defined only in the context of each specific study, general
   guidelines can be provided for the general case where each measurement
   from data modality M^1 can be mapped to multiple features in data
   modality M^2. A first possible approach is pairing each M^1 measurement
   in turn with all their corresponding M^2 measurements. Another solution
   is summarizing the information from each group of M^2 measurements
   corresponding to a single feature in M^1. If methylation and expression
   data are available, the first approach leads to pairing the expression
   value of each gene with the methylation level of each epigenetic
   markers attributable to that gene, in turn. In contrast, the second
   solution requires to compute a single methylation value for each gene.

   Integrating more than two modalities requires more complicate mapping
   strategies. If suitable, a convenient solution is defining a reference
   set of elements, typically genes, and summarizing all data modalities
   in such a way that there is one measurement for each element and data
   type.

Analysing partially overlapping samples across datasets

   The NPC was originally conceived for co-analyzing datasets measured
   over the exact same samples. This is often too restrictive in biology,
   where different datasets might share only part of the samples. For
   example, proteomics profiles might be available for all the patients
   involved in a study, while expression profiles are available only for a
   subset of them. The simplistic solution of co-analyzing only the
   samples in common clearly discards potentially relevant information. We
   argue that a better solution is co-analyzing all samples, by preserving
   the same permutation schema for the overlapping ones. In such a way the
   correlation among datasets are preserved, while all available
   information is employed in the analysis.

Interpreting global and partial NPC p-values

   A significant global p-value λ^global indicates that not all data
   modalities comply with the global null-hypothesis; however, it does not
   indicate which omics datasets departs from it. To this scope, the
   partial p-values
   [MATH: <mo>(</mo><mrow><msubsup><mrow><mi
   mathvariant="normal">λ</mi></mrow><mrow><mi
   mathvariant="normal">b</mi><mo>=</mo><mn>0</mn></mrow><mrow><mn>1</mn><
   /mrow></msubsup><msubsup><mrow><mo>,</mo><mspace
   width="0.25em"></mspace><mi
   mathvariant="normal">λ</mi></mrow><mrow><mtext>b</mtext><mo>=</mo><mn>0
   </mn></mrow><mrow><mn>2</mn></mrow></msubsup><mo>,</mo><mo>…</mo><mo>,<
   /mo><mspace width="0.25em"></mspace><msubsup><mrow><mi
   mathvariant="normal">λ</mi></mrow><mrow><mtext>b</mtext><mo>=</mo><mn>0
   </mn></mrow><mrow><mtext>i</mtext></mrow></msubsup></mrow><mo>)</mo>
   :MATH]
   can be used for a “post-hoc” analysis. There are a couple of cases of
   particular interest: (i) the global p-value is significant, along with
   only one partial p-value. In this case the data modality corresponding
   to the significant partial p-value drives the results, and should be
   further investigated; (ii) no partial p-value is significant, even if
   the global p-value is. This latter case denotes findings that require
   cumulating information from several omics datasets in order to reach
   significance, and that would have not being retrieved by analyzing each
   data modality in isolation.

The omicsNPC function

   We implemented the “omicsNPC” function within the STATegRa [[75]13]
   Bioconductor R package for facilitating the application of the NPC
   methodology on omics datasets. The function accepts an arbitrary number
   of omics datasets, along with their respective study designs
   represented as data frames. The datasets are expected to be defined
   over the same (or overlapping sets of) samples. Datasets defined over
   disjoint sets of samples can also be analyzed, even though
   between-datasets correlations cannot be defined in this extreme case,
   and the NPC methodology becomes equivalent to a permutation-based,
   non-parametric meta-analysis method. The function internally uses the
   limma function [[76]15] for computing the association between
   measurements and the factor under study / outcome of interest, while
   adjusting for the remaining covariates. The factor under study can be a
   dichotomous variable (as in case-control studies), a multi-class factor
   or a continuous outcome. The limma function is directly applied on
   omics data that can be assumed to be (approximately) normally
   distributed, as for examples microarray-derived measurements,
   proteomics, metabolomics, while Next Generation Sequencing (NGS) data
   (e.g., RNAseq), are pretreated with the voom function [[77]16]. The
   output of the function is a matrix containing, for each measurement,
   the partial p-values, as well the global p-values computed with Fisher,
   Liptak and Tippett functions. A more exhaustive description of the
   function, as well as a detailed explanation regarding its input
   arguments and output, can be found in the STATegRa help pages.

NPC comparative evaluation

   We evaluated the performance of the omicsNPC function in an exhaustive
   comparative study over simulated data. The scope of these analysis is
   to quantify the capability of each method in retrieving the genes that
   show a deregulated behavior in one or more heterogeneous datasets.

Simulated data

   These simulations investigate how omicsNPC and the other methods behave
   when analysing data characterized by encodings and distributions that
   commonly occur in biological studies. Particularly, we simulated count
   data following a negative binomial distribution (representative of
   RNA-seq data), continuous data distributed (approximately) normally
   (microarray data), three-level ordinal data (Single Nucleotide
   Polymorphism, SNP), as well as data generated by the uniform
   distribution (for reference purposes).

   For sake of simplicity, across all experimentations we assume a
   case-control study design, where the factor under study is a
   dichotomous variables and no additional covariates are taken into
   account. The issue of mapping features across different data modalities
   is not addressed in these analysis, and each data modality is assumed
   to provide one measurement for each gene.

   For each simulation we fixed the total number of measurements,
   Num_genes, and Num_samples, the total number of samples per group. The
   RNAseq dataset, namely RNAseq_sim, was simulated using the R package
   compcodeR [[78]31], while the microarray data, Microarray_sim, were
   simulated using the R package OCplus [[79]32]. SNP data SNP_sim were
   simulated by randomly sampling values from the set {0, 1, 2}, while
   Unif_sim data were sampled from the uniform distribution. For each
   dataset a number of measurements were deregulated between the two
   groups, respectively RNAseq_DE for RNAseq data, Microarray_DE for
   microarray data, SNP_DE for SNP_sim and Unif_DE for Unif_sim. Common_DE
   indicates the number of elements deregulated at the same time across
   all modalities, while Cor_level is the correlation among different
   datasets.

   We perform two main experiments using simulated data, named Different
   Modalities and Correlated Modalities. The first investigates the impact
   of combining statistics / p-values computed with different statistical
   tests, while the latter focuses on the impact of correlation among data
   modalities. In both cases the objective is discriminating deregulated
   elements from the others.
     * ▪
       Different Modalities: The purpose of this experiments was to
       identify how the performances of the different integrative methods
       are influenced by a) the number of samples, b) the number of
       modalities and c) the number of permutations. Regarding a), we set
       the simulation parameters as follows: Num_genes = 2000, RNAseq_DE =
       1600, Microarray_DE = 1200, SNP_DE = 800, Unif_DE = 400 and
       Common_DE = 400. The number of samples per group was varied within
       {4, 6, 8, 10}. Regarding b), we set the simulation parameters as in
       a), except for the number of samples that was set to 10 and the
       number of datasets that was varied between 2 and 4. Regarding c),
       we generated data for all four data modalities as in a), with the
       Num_samples set to 10, and the number of permutations varied within
       {100, 500, 1000, 2000, 5000}. We stress that no correlation
       structures were added across data modalities in these experiments.
     * ▪
       Correlated Modalities: In this scenario we simulated two correlated
       microarray datasets. We set Num_genes = 2000, Num_samples = 10,
       Microarray_DE = 1000, Common_DE = 1000. We introduced different
       levels of correlation among the non-differential expressed genes of
       the two datasets, specifically Cor_level = {0.6, 0.7, 0.8, 0.9, 1}.
       We forgo introducing a fixed amount of correlation among
       differentially expressed quantities, since to the best of our
       knowledge there is no procedure able to ensure both a given amount
       of correlation and the desired level of differential expression. We
       also avoid adding within-dataset correlations, since in the NPC
       framework each gene is analysed across datasets independently by
       the others, and thus within-dataset correlations would not affect
       the behaviour of the omicsNPC function.

   We analysed RNAseq_sim, Micr_sim, SNP_sim and Unif_sim either (a)
   independently using respectively voom/limma, limma, the R package
   scrime [[80]33] and the one sided Wilcoxon rank sum test, or (b) by
   employing the integrative analysis methods included in the comparative
   evaluation.

Integrative methods included in the comparison

   We compared omicsNPC against several integrative analyses methods often
   used in the literature:

   NPC^no-correction: we use the standard NPC methodology without
   correction for zero p-values, in order to evaluate the relevance of
   this modification.

   RankSum: this is a standard ranks combining method, where the sum of
   the ranks is used for providing a statistic of global differential
   expression.

   omicsNPC^RankSum: we use the NPC permutation schema for providing
   p-values for the combined rank computed with the RankSum approach. More
   specifically, the sum of the ranks is used as global test statistic and
   its null distribution is generated through permutations, as depicted in
   panels c) and d) of Figure A in [81]S1 File. The global p-value is
   calculated by equation (1), see also panel e) of Figure A in [82]S1
   File.

   RankProd [[83]26]: this approach follows the same steps as RankSum but
   the product of the ranks is used as the global statistic instead of the
   sum.

   omicsNPC^RankProd: as in omicsNPC^RankSum, the omicsNPC permutation
   schema is employed to generate p-values for assessing the significance
   of RankProd statistics.

   Combining P-values (CP): we use the standard meta-analysis process of
   combining p-values. Specifically, here we employ the classical Fisher
   combination method [[84]34]. Benjamini [[85]35]: Benjamini and Heller
   proposed a method for testing whether at least u out of n partial
   null-hypotheses are false. Their method is valid also when some
   specific dependency structures hold among the partial p-values, and the
   global p-value is computed as:
   [MATH: <msup><mrow><mi mathvariant="normal">λ</mi></mrow><mrow><mi
   mathvariant="normal">B</mi><mi mathvariant="normal">e</mi><mi
   mathvariant="normal">n</mi><mi mathvariant="normal">j</mi><mi
   mathvariant="normal">a</mi><mi mathvariant="normal">m</mi><mi
   mathvariant="normal">i</mi><mi mathvariant="normal">n</mi><mi
   mathvariant="normal">i</mi><mo>(</mo><mrow><mi
   mathvariant="normal">u</mi><mo>,</mo><mi
   mathvariant="normal">n</mi></mrow><mo>)</mo></mrow></msup><mo>=</mo><mr
   ow><mrow><munder><mrow><mi
   mathvariant="normal">min</mi></mrow><mrow><mi
   mathvariant="normal">j</mi><mo>=</mo><mn>1</mn><mo>,</mo><mo>…</mo><mo>
   ,</mo><mi mathvariant="normal">n</mi><mo>−</mo><mi
   mathvariant="normal">u</mi><mo>+</mo><mn>1</mn></mrow></munder></mrow><
   mo>⁡</mo><mrow><mo>{</mo><mrow><mfrac><mrow><mo>(</mo><mrow><mi
   mathvariant="normal">n</mi><mo>−</mo><mi
   mathvariant="normal">u</mi><mo>+</mo><mn>1</mn></mrow><mo>)</mo></mrow>
   <mrow><mi mathvariant="normal">j</mi></mrow></mfrac><msub><mrow><mi
   mathvariant="normal">λ</mi></mrow><mrow><mo>(</mo><mrow><mi
   mathvariant="normal">u</mi><mo>−</mo><mn>1</mn><mo>+</mo><mi
   mathvariant="normal">j</mi></mrow><mo>)</mo></mrow></msub></mrow><mo>}<
   /mo></mrow></mrow> :MATH]

   where the partial p-values λ[j] have been ordered so that λ[j] ≤
   λ[j+1]. When u = 1, the null hypotheses of the Benjamini and NPC method
   coincide. Interestingly, for u = 1 the Benjamini equation reduces to
   the Benjamini-Hochberg formula for controlling the false discovery rate
   [[86]36].

The partial AUC metric

   We evaluated the capability of each method in discriminating
   differentially expressed quantities by employing the partial AUC, a
   metric commonly used in Information Retrieval applications. The
   Receiver Operator Characteristic (ROC) Area Under the Curve (AUC)
   [[87]37], is a metric which combines sensitivity and specificity
   information for all possible values of a decision threshold. AUC ranges
   in the interval of [0, 1], where one corresponds to the perfect rank
   (i.e., all true findings receive a low p-value), 0.5 corresponds to
   random ordering and zero to predictions that are perfectly inverted.
   AUC would evaluate the whole list of p-values, providing a measure of
   global performance. However, researchers attempting to identify
   relevant findings often restrict their attention to a few genes, the
   ones deemed more reliable for subsequent in vitro or in vivo
   experimental validation, which are usually too expensive or demanding
   to be performed on all findings. Thus, we are interested in evaluating
   the partial performances of the methods on the genes corresponding to
   the lowest p-values. We used a version of AUC known as partial AUC,
   pAUC [[88]38], which considers a restricted region of the whole
   sensitivity / specificity curve (specificity in [0.9, 1] for our
   experimentations). The McClish formula [[89]38] standardizes pAUC
   values in [0,1], so that the pAUC has the same interpretation of the
   AUC.

   In all experiments conducted in this study, the whole procedure from
   data simulation to pAUC calculation was performed 20 times and the
   median pAUC was calculated. Finally, we applied the two sided Wilcoxon
   test to evaluate the significance of the differences in performances
   between the best method and all the other ones.

The joint null criterion

   We performed a separated analysis for assessing the correctness
   (calibration) of the p-values computed by each integrative method
   through the Joint Null Criterion, JNC [[90]14]. According to the JNC,
   calibrate methods should provide uniformly distributed p-values when
   applied on multiple hypothesis testing tasks where all null-hypothesis
   are true. When the JNC holds, then a number of procedures for multiple
   testing correction are ensured to strongly control the false discovery
   rate. We applied the “double Kolmogorov-Smirnov test”, dks, for
   checking whether the JNC holds, by performing the following process. We
   used the OCplus package for simulating two uncorrelated or two
   perfectly correlated microarray datasets, each containing 1000 genes
   and 20 i.i.d. samples. The samples were randomly and equally subdivided
   in two groups, and the method to evaluate was applied for assessing the
   differential expression of each gene. A two sided Kolmogorov-Smirnov
   test was performed for checking that the resulting p-values follow a
   uniform distribution. The above process was performed 1000 times,
   producing 1000 distinct p-values from the Kolmogorov-Smirnov (KS) test.
   These p-values are also expected to follow a uniform distribution,
   hypothesis that is assessed using a second Kolmogorov-Smirnov test
   (hence the name double KS test).

   Alternatively, the JNC criterion can also be checked by computing at
   each iteration the posterior probability that the p-values come from a
   uniform distribution [[91]14]. Both dks and the posterior probability
   method are implemented in the “dks” R package [[92]39].

Applications on real data

   We further applied omicsNPC in three different case-studies, for
   demonstrating (a) omicsNPC applicability on the integration of
   heterogeneous datasets, and (b) the increase in relevant biological
   findings obtained by integrating several omics datasets.

Integrative analysis of methylation–expression profiles in Schizophrenia

   We co-analyzed a gene expression [[93]40] and a methylation dataset
   [[94]41] measured on whole blood of Schizophrenic patients and healthy
   controls. Part of the subjects are common between the two datasets, and
   the data are publicly available in the Gene Expression Omnibus (GEO)
   repository, GEO id [95]GSE38484 and [96]GSE41037 respectively. First,
   each dataset was pre-processed independently. The gene expression data
   were produced with the Illumina HumanHT-12 V3.0 expression beadchip;
   probesets without annotations and whose coefficient of deviation
   (defined as the ratio between standard deviation and average value) was
   below 0.01 were excluded from the analysis. The remaining probesets
   were collapsed by selecting for each gene the probeset with highest
   average value. Subjects [97]GSM943305 and [98]GSM943331 were deemed
   outliers and excluded after visual inspection of the Principal
   Component Analysis (PCA) plots. Methylation data were measured with the
   Illumina HumanMethylation27 BeadChip, and were adjusted for batch
   effects using the ComBat method [[99]42] (known batches were indicated
   by the original authors in private communications). Methylation
   probesets lacking annotations or whose methylation site was further
   than 500 kilo bases from the Transcription Start Site (TSS) of the
   closest gene were excluded, as well as probesets whose values spanned
   an interval less than 0.1. Following the single-dataset preprocessing,
   each methylation site was linked to the closest gene measured in the
   expression dataset (cpg-to-the-closest-gene mapping, [[100]43]);
   unmatched genes or methylation sites were excluded. The final
   expression and methylation datasets contain 200 (105 cases) and 658
   (293 cases) subjects, respectively, out of which 50 subjects are in
   common. The expression dataset is defined over 4962 genes, while 6428
   methylation sites are reported in the methylation data.

Deregulation of protein and gene expression levels across Glioblastoma
subtypes

   Proteomics and transcriptomics profiles of Glioblastoma patients were
   downloaded from the The Cancer Genome Atlas (TCGA) repository. We used
   the data as preprocessed by the TCGA consortium (“Level 3”
   preprocessing); expression data were measured with the Affymetrix HT
   Human Genome U133 microarray, while proteomics information was produced
   with the reverse phase protein array technique [[101]44]. Both datasets
   were adjusted for known batches with the ComBat method, and 114 protein
   measurements were associated to their corresponding genes accordingly
   to the annotation of the data producer (RPPA Core Facility, MD Anderson
   Cancer Center, University of Texas). We focuses on the comparison
   between ‘Classical” and “Mesenchymal” Glioblastoma. At the end of the
   preprocessing, the transcriptomics and protein datasets contained
   respectively 315 (159 Mesenchymal) and 98 (45 Mesenchymal) samples, out
   of which 97 were in common.

Co-analyzing NGS and microarray expression data in Breast Invasive Carcinoma

   The expression profiles of 16 distinct BReast invasive CArcinoma (BRCA)
   patients were downloaded from TCGA repository. For each patient two
   types of tissues are available, i.e., tumor and healthy ones, and each
   tissue is profiled with three different technologies, “RNAseq”,
   “RNAseqV2” and “Exp-Gene”, leading to a total of six expression
   profiles for each patient. RNAseq and RNAseqV2 are both produced by
   mRNA sequencing but with slightly different pipelines, while the
   Exp-Gene dataset is generated with microarray technology. The selected
   16 patients belong to the same experimental batch, namely batch 93. We
   used the data as pre-processed by the TCGA consortium at “Level 3”,
   meaning that count matrices are provided for the sequencing datasets
   and normalized expression values for the microarray samples. In total,
   16146 genes were in common among all datasets and were subsequently
   analyzed.

   In all case-studies the datasets were first analyzed in isolation with
   limma (voom/limma for RNAseq data). Furthermore, datasets were
   co-analyzed with omicsNPC by setting the number of permutations to
   10000. All Schizophrenia and Glioblastoma analyses were corrected for
   age and gender.

Results

The JNC applied on integrative analysis methods

   We applied the joint null criterion to evaluate whether the p-values
   produced by each method follow the desired uniform distribution when
   the null-hypotheses are all true. To this end, we simulated 1000 times
   either two uncorrelated or two perfectly correlated microarray
   datasets, and we applied the double KS test process.

   The omicsNPC function exhibited the same behavior in both the
   correlated and uncorrelated scenario, generating calibrated p-values
   with all combining functions (Figures B and G in [102]S1 File).
   NPC^no-correction generated slightly non-calibrated p-values, in both
   uncorrelated and correlated scenarios. In the uncorrelated scenario
   (Figure C in [103]S1 File), the dks p-values for the Fisher and Liptak
   methods were one order of magnitude lower than omicsNPC respective dks
   p-values (p-value = 0.08 and 0.07, respectively). NPC^no-correction
   combined with the Tippett method was found to generate non-calibrated
   p-values (dks p-value = 0.012). In the correlated scenario
   NPC^no-correction generated non-calibrated p-values, regardless of the
   combining function used (Figure H in [104]S1 File). Due to these
   findings we deemed the correction in the formula for computing
   permutation-based p-values necessary, and we use only omicsNPC in the
   subsequent experiments. Interestingly, assessing the JNC is not trivial
   when the sum or the product of the ranks are used as combining
   functions in conjunction with the NPC permutation schema. In both
   uncorrelated and correlated scenarios, omicsNPC^RankSum and
   omicsNPC^RankProduct obtain a dks p-value ≈ 0 (Figures D and I in
   [105]S1 File, panels a and b). However, the empirical distribution
   function of the first level KS test p-values is clearly shifted toward
   high p-values, indicating that at each repetition the KS test accepts
   the null-hypothesis of uniformly distributed p-values, as also
   suggested by distribution of the posterior probabilities. Thus,
   assessing the JNC for these two methods seem quite controversial.
   Notably, to the best of our knowledge, this is the first time that the
   distribution of NPC p-values is evaluated through the joint null
   criterion.

   We used an approximate algorithm [[106]45] for estimating RankProd
   p-values, over which we assessed the JNC for this method. The RankProd
   behaves similarly to omicsNPC^RankSum and omicsNPC^Rankroduct in the
   uncorrelated scenario (Figure D in [107]S1 File, panel c). However, in
   the correlated scenario this method produces uncalibrated p-values, as
   evident in the Q-Q plot of Figure I in [108]S1 File, panel c (dks
   p-value ≈ 0). Also the CP method produced calibrated p-values in the
   non-correlated scenario (dks p-value = 0.8, Figure E in [109]S1 File),
   while producing uncalibrated p-values when correlation structures were
   included in the data, dks p-value ≈ 0 (Figure J in [110]S1 File).
   Finally, the Benjamini method produced calibrated p-values in both
   uncorrelated and correlated scenarios (Figures F and K in [111]S1 File,
   respectively).

Results of simulation studies

   We evaluated the capability of each method in retrieving true positive
   findings on simulated data, where we could assess their performances on
   the known ground truth. In the different modalities scenario we
   simulated several types of data (RNAseq_sim, Microarray_sim, SNPs_sim,
   Unif_sim) and analyzed them either independently or by employing
   integrative analysis methods. We evaluated performances by employing
   the pAUC metric. First, we restricted our analysis on genes that have
   the same behavior across all modalities, i.e., they are either
   deregulated in all datasets or in none of them, and examined the effect
   of the sample size on the performances. Overall, integrative approaches
   performed better in identifying differential expressed genes, in
   comparison with methods that analyze each data modality independently
   ([112]Fig 3A, Table A in [113]S1 File).

Fig 3. Median pAUCs in relation to the number of samples in the different
modalities scenario.

   [114]Fig 3
   [115]Open in a new tab

   Each line corresponds to a specific method. The x axis represents the
   number of samples per group used in the simulations, and y axis the
   median pAUCs. Circles suggest that the observed differences from the
   best method were not statistically significant at level ≤ 0.05. A) Only
   genes which exhibited the same behaviour across datasets were taken
   into account. The integrative analysis yielded better results
   regardless the employed method. CP and omicsNPC^Fisher were the best
   method in most cases. B) All genes were considered in this analysis.
   Again, omicsNPC^Fisher and CP achieved better performance than other
   methods in most cases; the Benjamini method was also statistically
   indistinguishable from the best performing method when the sample size
   was 6 or 10.

   This stands also for low sample sizes (samples per group = 4). In all
   cases CP performed equally or slightly better than omicsNPC, however
   the observed differences were often not statistically significant: when
   the sample size was equal to 4, CP performed equally well with
   omicsNPC^RankSum and RankSum, whereas for bigger sample sizes CP was as
   well performing as omicsNPC^Fisher.

   When all genes are taken into account, CP performed better than other
   methods for moderated sample size, samples = 4, while omicsNPC^Fisher
   performed equally well with CP in all other cases ([116]Fig 3B, Table B
   in [117]S1 File). Benjamini was also one of the best methods when the
   number of samples was six and ten. Applying single-dataset methods
   would have not been meaningful in this case, since each dataset has
   different sets of deregulated quantities and results would not be
   comparable.

   Furthermore, we evaluated the effect of the number of modalities on
   integrative methods’ performances. Similarly to above, first we
   considered only the genes that behaved similarly across all modalities.
   As evident from [118]Fig 4A and the Table C in [119]S1 File, CP and
   omicsNPC^Fisher performed equally well. Furthermore, adding more
   modalities increased the performance of omicsNPC^Fisher,
   omicsNPC^Liptak, omicsNPC^RankSum and RankSum. On the other hand when
   we performed the same analysis taking into account all genes ([120]Fig
   4B and Table D in [121]S1 File), omicsNPC^Tippett, omicsNPC^Fisher, CP
   and Benjamini performed equally well in all cases. It is worth noting
   that the performances of all ranking methods and omicsNPC^Liptak were
   slightly decreasing as we were adding more data modalities.

Fig 4. Median pAUCs in relation to the number of modalities analysed in the
different modalities scenario.

   [122]Fig 4
   [123]Open in a new tab

   Each line corresponds to a specific method. The x axis represents the
   number of modalities analysed, and the y axis the median pAUCs. Circles
   suggest that the observed differences from the best method were not
   statistically significant at level ≤ 0.05. A) Only genes which
   exhibited the same behaviour across all datasets were taken into
   account. In this experiment only the integrative approaches were
   evaluated. CP and omicsNPC^Fisher were the best method in all cases. B)
   All genes were considered in this analysis. omicsNPC^Tippett,
   omicsNPC^Fisher, CP and Benjamini achieved better performance than
   other methods.

   Also, we examined the performance and the computational time of the
   algorithms in relation with the number of permutations (Figure L and
   Table E in [124]S1 File). We observed that the number of permutations
   had a relatively low influence on the performance of the ranking
   approaches and omicsNPC^Liptak, while omicsNPC^Tippett, omicsNPC^Fisher
   and CP reached their maximum performances at 1000 permutations. In
   addition, the computational time of omicsNPC presented a linear
   relation with the number of permutations.

   In the correlated modalities scenario we introduced different levels of
   correlation between two microarray datasets. In all cases
   omicsNPC^Fisher and omicsNPC^Liptak performed identically, and they
   were the best performing methods ([125]Fig 5, Table F in [126]S1 File).
   All combining function that employed the NPC permutation framework, as
   well as the Benjamini method, showed a stable performance, as they were
   able to take into account the correlation structures between the
   datasets. On the contrary, the performance of methods which did not
   correct for correlations (CP, RankProd, RankSum), behaved inversely as
   the level of correlation. It is worth noting that after a specific
   correlation threshold, analyzing datasets in isolation with limma
   provides better performances than methods overlooking between-datasets
   correlation structures.

Fig 5. Median pAUCs in relation to correlation structures.

   [127]Fig 5
   [128]Open in a new tab

   Each line corresponds to a specific method. The x axis represents the
   level of correlation introduced between the datasets, and the y axis
   the median pAUCs. Circles suggest that the observed differences from
   the best method were not statistically significant at level 0.05.
   omicsNPC^Fisher and OmicsNPC^Liptak were the best methods in all cases.
   All functions employing omicsNPC framework, as well as Benjamini,
   showed a stable performance regardless the correlation intensity. On
   the other hand methods which did not account for the correlations
   structures, showed decrease in performance as the correlation was
   higher. Note that single-dataset analyses, sky blue lines, performed
   better than these methods when strong correlations are present.

Results on the real-data case-studies

   The results on the three cases-studies underline how the non-parametric
   combination framework is able to provide additional, biological
   insights with respect to analyzing each dataset in isolation.

   In the breast invasive carcinoma, BRCA, two RNA-seq and one microarray
   gene expression dataset were generated from the same cancer patients;
   thus high between-datasets correlation structures should be expected.
   Indeed, the median Spearman correlation between the two sequencing
   dataset is 0.93, while measurements in the Exp-Gene have a median
   Spearman correlation with their respective counterparts in RNAseq and
   RNAseqV2 of 0.6 and 0.57, respectively. This application seems an ideal
   test-bed for omicsNPC, and indeed we observe that applying omicsNPC on
   all data-types allows to retrieve a higher number of differentially
   expressed genes than analyzing each dataset in isolation, and this
   effect is independent by the combining function or significance
   threshold used on the FDR adjusted p-values (Table G in [129]S1 File).
   For example, at 0.05 FDR level the Liptak combination function provides
   7429 findings, versus a maximum of 7116 retrieved analyzing the RNAseq
   dataset alone. Furthermore, 133 out of the 7429 Liptak significant
   genes are not significant in single-dataset analyses, and 24 out of
   these 133 genes were barely significant (adjusted p-value in
   [0.05–0.1]) for both RNA-seq and the microarray data. This indicates
   how integrating several data sources with OmicsNPC allows to retrieve
   findings that would not be identified by analyzing each omics dataset
   in isolation. Interestingly, an enrichment analysis performed on these
   24 genes over the Disease Ontology of the OBO Foundry [[130]46] shows
   that six out of the ten most enriched diseases are ovarian-related
   cancers ([131]Table 1), a class of malignancies known to share similar
   hormonal [[132]47] and genetic bases [[133]48] with breast cancer.

Table 1. Disease Ontology enrichment over the 24 BRCA genes identified solely
by omicsNPC (Liptak combining function).

   ID Description p-value Adjusted p-value
   DOID:3369 peripheral primitive neuroectodermal tumor 0.000550441
   0.131004856
   DOID:3713 ovary adenocarcinoma 0.001650812 0.182356547
   DOID:10534 stomach cancer 0.008751988 0.182356547
   DOID:1856 Cherubism 0.009691775 0.182356547
   DOID:3605 ovarian cystadenocarcinoma 0.009691775 0.182356547
   DOID:2151 malignant ovarian surface epithelial-stromal neoplasm
   0.009845306 0.182356547
   DOID:2152 ovary epithelial cancer 0.009845306 0.182356547
   DOID:4001 ovarian carcinoma 0.009845306 0.182356547
   DOID:5828 endometrioid ovary carcinoma 0.011298616 0.182356547
   DOID:2394 ovarian cancer 0.012482471 0.182356547
   [134]Open in a new tab

   The table reports the enrichment result computed with hypergeometric
   test (function “enrichDO” from the R package “DOSE”) over the 24 genes
   identified by omicsNPC equipped with the Liptak combining function and
   deemed barely significant by the single-dataset analyses.

   Applying omicsNPC on the Schizophrenia data further confirms NPC’s
   capabilities of retrieving relevant biological findings. At an FDR
   level of 0.05, analyzing methylation data alone identifies 3194 genes
   with at least one altered methylation site, against 1892 genes whose
   expression is deregulated. Pairing methylation and expression data with
   omicsNPC and the Fisher combining function leads to the identification
   of 3844 genes deregulated at the methylation / expression level. We
   analyzed the enrichment of these significant genes in KEGG [[135]49],
   Gene Ontology Biological Processes [[136]50] and Reactome [[137]51]
   pathways using the hypergeometric test, and excluding pathways with
   less than 20 or more than 200 elements (3918 pathways to test in
   total). The enrichment analysis identified sixty-eight pathways
   enriched for omicsNPC findings, out of which thirty-nine were not
   significant when the enrichment was performed on each data modality in
   isolation (FDR level 0.05 in all cases, [138]S2 File). [139]Fig 6
   reports the five most significant of them, comparing their level of
   significance according to omicsNPC against methylation and expression
   data alone. The first pathway, the biological process “antigen
   receptor-mediated signaling pathway” (GO id GO:0050851), represents a
   set of molecular signals that are initiated in B or T cells by the
   cross-linking of an antigen receptor. Interestingly, Schizophrenia has
   been found associated to antigen processing and Human Leucocyte Antigen
   (HLA) genes in numerous studies [[140]52–[141]54], findings that
   corroborate an implication of the immune system in the disease
   [[142]55,[143]56]. Similarly several studies have hypothesized a role
   of Chronic Inflammation in Schizophrenia [[144]57–[145]60]. Finally,
   deficiencies in Toll-like receptor-2, a family of pattern recognition
   receptors, have been demonstrated to induce Schizophrenia-like
   behaviors in mice [[146]61].

Fig 6. Pathway enrichment analysis in the Schizophrenia case-study.

   [147]Fig 6
   [148]Open in a new tab

   Some pathways are enriched according to omicsNPC (Fisher combining
   function) but not for the single-dataset analysis (FDR level 0.05 in
   all cases); the 5 most enriched of such pathways are reported on the
   y-axis. Each dot represents the significance (color) and number of
   deregulated genes (size) in the respective pathway according to
   omicsNPC^Fisher (Fisher), methylation (Methy), or transcriptomics data
   (Expr).

   In the Glioblastoma case-study, most of the proteins show a
   deregulation at 0.1 FDR level between Mesenchymal and Classical
   Glioblastoma; 53 deregulated proteins in the proteomics data, 64
   differentially expressed genes in the transcriptomics data, and 95
   deregulated elements in both data types combined (omicsNPC, Fisher
   combining function). Also in this case, omicsNPC retrieves additional
   findings that seem to have biological relevance: proteins Myc, MSH6 and
   STAT3 were significant at 0.1 FDR level for omicsNPC, even if they were
   not significant when proteomics and transcriptomics data were analyzed
   in isolation. Genes encoding Myc, MSH6 and STAT3 are well known for
   their role as oncogenes [[149]62–[150]64]. More importantly, STAT3 has
   been found as an initiator and master regulator of brain tumor
   mesenchymal transformation [[151]65]; MSH6 mutations have reported to
   affect treatment response in Glioblastoma [[152]66]; and Myc has been
   found separately associated to glioma/glioblastoma [[153]67][[154]68]
   and mesenchymal transition of epithelial cells [[155]69].

Discussion

   In this study we have shown that the Non-Parametric Combination, NPC,
   is a valuable methodology for the integration of heterogeneous omics
   datasets. NPC is particularly useful for identifying molecular
   quantities that, considered together, are deregulated / associated to
   an outcome of interested in a statistically significant way.

   We realized the omicsNPC function for facilitating the analysis of
   omics data within the NPC framework. The function is freely available
   in the R Bioconductor package STATegRa, and is able to address a
   variety of different study designs as well as analyzing datasets having
   only part of their samples in common.

   We contrasted omicsNPC against two rank-based and two parametric
   approaches. The results showed that OmicsNPC^Fisher performed better
   than other methods in most scenarios conducted in this study. When no
   correlation structures were present in the data, the CP method was
   slightly more performant than omicsNPC^Fisher, with the observed
   differences often being not statistically significant. In contrast,
   when between-datasets correlation structures were introduced, omicsNPC
   outperformed all other methods, while approaches that did not take
   those correlations into account performed poorly, achieving lower
   performances than single-dataset analyses for moderate to high
   correlation levels ([156]Fig 5).

   Furthermore, we applied the joint null criterion, JNC, on all methods
   under study employing correlated as well as uncorrelated datasets. The
   omicsNPC function always presented the same unbiased behavior, when
   Liptak, Tippett or Fisher combining functions were used (Figures B and
   G in [157]S1 File), and permutation-based p-values were corrected for
   avoiding reaching zero. RankProd and CP produced uncalibrated p-values
   when correlated datasets were considered (Figures I and J in [158]S1
   File), while the Benjamini method showed to comply with the JNC in all
   simulations (Figures F and K in [159]S1 File).

   Furthermore, we explored NPC as a general framework for integrative
   analysis, by using ranking methods as combining functions (Figure A in
   [160]S1 File). The key idea is using the permutation strategy of the
   NPC framework for allowing different integration methods, RankSum and
   RankProd in our analysis, to correctly address between-datasets
   correlations. Indeed, the performances of the omicsNPC^RankProd and
   omicsNPC^RankSum methods were not negatively affected by the level of
   correlation among data modalities, contrarily to the original RankSum
   and RankProd approaches ([161]Fig 5). Also the results of the dks tests
   did not indicated any difference due to the presence of correlation
   (Figures D and I in [162]S1 File), even though it is not clear whether
   the JNC holds for omicsNPC^RankProd and omicsNPC^RankSum.

   Finally, we applied OmicsNPC on real data from three different
   case-studies. Taken together, the results on the three real-data
   case-studies show that omicsNPC is (a) versatile enough to be used for
   the integration of different omics data; (b) able to include
   co-variates into the analysis and process sets of datasets that share
   only part of the samples; (c) finally, omicsNPC have shown, at least in
   the context of our analysis, to produce additional, relevant biological
   insights with respect to analyzing data modalities in isolation.
   Particularly, we concentrated on results deemed statistically
   significant by omicsNPC but discarded by single-dataset analysis,
   retrieving interesting findings both at gene (Myc, STAT3 and MSH6 genes
   in the Mesenchymal vs. Classical Glioblastoma case study), pathway (the
   antigen and pattern recognition signaling pathways in the Schizophrenia
   example) and disease levels (gynecologic cancer diseases in BRCA).

   In conclusion, several important features of the NPC methodology make
   it appropriate for the integrative analysis of omics data. First, the
   underlying assumptions are minimal: NPC permutation schema requires
   observations to be exchangeable under the null hypothesis, a
   requirement common in permutation-based approaches. Furthermore, the
   partial p-values are assumed to be adequate for assessing the partial
   null-hypothesis, marginally unbiased and consistent [[163]70], while
   the combining functions must respect a set of mild assumptions
   [[164]71]. The possibility of equipping NPC methods with different
   combining functions provides the method with great flexibility:
   researchers can select the function that best reflects their
   expectative on the data or that are most suitable for answering their
   specific scientific questions. Last but not least, NPC frees the
   researcher from the necessity to define and model the dependence
   relations among different data modalities and partial tests.

   Finally, we would like to underline the limitations of the present
   study. First, we restricted our simulations to the special case where
   exactly one measurement per gene is provided by each data modality, and
   this is rarely the case in real applications. All analyses were
   performed solely in the context of case-control studies, without taking
   into consideration other types of study designs. The performances of
   the different integrative methods, as well as the joint-null criterion,
   were evaluated only on simulated data, since the ground-truth necessary
   for the evaluation is not available in the case-studies on real data.

Conclusions

   To the best of our knowledge, this is the first study investigating the
   applicability of the Non-Parametric Combination methodology in the
   analysis of heterogeneous omics data. Our results indicate omicsNPC and
   the NPC framework as versatile and valuable tools for performing
   integrative analyses in biological studies.

Supporting Information

   S1 File

   This file contains Figure A, which explain OmicsNPC employing ranking
   methods, Figures B–K which illustrate the diagnostic plots of the joint
   null hypothesis criterion. Further it includes Figure L which
   demonstrates how omicsNPC performances are influenced by the number of
   permutations. Finally it includes Tables A–F, which report the median
   pAUCs values for the experimentations on simulated data and their
   respective significance levels. Table G reports the number of
   deregulated genes in the BRCA case-study.

   (PDF)
   [165]Click here for additional data file.^ (1.2MB, pdf)
   S2 File. Results of the enrichment analysis on the Schizophrenia
   case-study.

   (XLSX)
   [166]Click here for additional data file.^ (1.1MB, xlsx)

Acknowledgments