Abstract

Background

   Detecting and visualizing nonlinear interaction effects of single
   nucleotide polymorphisms (SNPs) or epistatic interactions are important
   topics in bioinformatics since they play an important role in
   unraveling the mystery of “missing heritability”. However, related
   studies are almost limited to pairwise epistatic interactions due to
   their methodological and computational challenges.

Results

   We develop CINOEDV (Co-Information based N-Order Epistasis Detector and
   Visualizer) for the detection and visualization of epistatic
   interactions of their orders from 1 to n (n ≥ 2). CINOEDV is composed
   of two stages, namely, detecting stage and visualizing stage. In
   detecting stage, co-information based measures are employed to quantify
   association effects of n-order SNP combinations to the phenotype, and
   two types of search strategies are introduced to identify n-order
   epistatic interactions: an exhaustive search and a particle swarm
   optimization based search. In visualizing stage, all detected n-order
   epistatic interactions are used to construct a hypergraph, where a real
   vertex represents the main effect of a SNP and a virtual vertex denotes
   the interaction effect of an n-order epistatic interaction. By deeply
   analyzing the constructed hypergraph, some hidden clues for better
   understanding the underlying genetic architecture of complex diseases
   could be revealed.

Conclusions

   Experiments of CINOEDV and its comparison with existing
   state-of-the-art methods are performed on both simulation data sets and
   a real data set of age-related macular degeneration. Results
   demonstrate that CINOEDV is promising in detecting and visualizing
   n-order epistatic interactions. CINOEDV is implemented in R and is
   freely available from R CRAN: [35]http://cran.r-project.org and
   [36]https://sourceforge.net/projects/cinoedv/files/.

Electronic supplementary material

   The online version of this article (doi:10.1186/s12859-016-1076-8)
   contains supplementary material, which is available to authorized
   users.

   Keywords: Epistatic interactions, Co-information, Single nucleotide
   polymorphisms, Particle swarm optimization, Hypergraph

Background

   Following the development of high-throughput sequencing and genotyping
   technologies, there has been a rapid increase in the availability of
   single nucleotide polymorphisms (SNPs). Hence genome-wide association
   studies (GWAS) have become a routine tool in investigating the genetic
   architectures of complex diseases, such as cancer, heart disease,
   diabetes and many others. With these studies, hundreds of thousands of
   SNPs speculated to associate with complex diseases have been
   identified. However, these SNPs have been shown to explain only a small
   proportion of underlying genetic variance of complex diseases, leaving
   the question of “missing heritability” open for further investigation
   [[37]1, [38]2].

   Some plausible explanations should be taken into account to reveal the
   gaps between expectations and realities of GWAS. Firstly, GWAS require
   p-values (or other similar measures) of disease-associated SNPs to
   reach a genome-wide significance level after several stringent multiple
   testing corrections, for example, Bonferroni correction, which may
   exclude many genuinely associated SNPs that have moderate or weak
   association signals [[39]3]. Secondly, rare SNPs (i.e., minor allele
   frequency of each is < 5 %) are difficult to be detected, and sometimes
   even be ignored in GWAS, though they may play an important role in
   explaining “missing heritability” [[40]1, [41]4, [42]5]. Thirdly,
   besides SNPs, other types of biological data, for instance, copy number
   variation, DNA methylation, and gene expression, also provide
   different, partly independent and complementary, views for unraveling
   the mystery of “missing heritability” [[43]6, [44]7]. Fourthly, it is
   widely believed that nonlinear interaction effects of multiple SNPs or
   epistatic interactions could unveil a large portion of unexplained
   heritability of complex diseases [[45]8–[46]11]. In fact, detection of
   epistatic interactions has already been a compelling step in GWAS
   [[47]12].

   In general, detection of epistatic interactions is of great challenge.
   The first challenge is the intensive computational burden mainly
   imposed by the “curse of dimensionality” and the “combinatorial
   explosion”, which has significant implications for GWAS with millions
   of SNPs. For instance, search space of a 100 K SNP data set with
   maximum order of three is an astronomical number ∑^3[k = 1]C^k[100000],
   where the order refers to the number of SNPs in a SNP combination. The
   second challenge is the complexity of genetic architecture of a
   disease. It may involve multiple epistatic interactions interacting
   with other causative factors in a complicated way, each displaying
   strong association with the phenotype as a whole but the contained SNPs
   possibly having small or even no main effects. Limited prior knowledge
   available for a disease, such as the number of epistatic interactions,
   the order and the effect magnitude of an epistatic interaction, makes
   their detection difficult. The third is the association measure that
   determines how well a SNP combination contributes to the phenotype. A
   suitable association measure is required to be efficient in
   computational cost and insensitive to both SNP combination order and
   effect type, and more importantly, it can truly capture causative
   epistatic interactions. Though several association measures have been
   widely used for the detection of epistatic interactions, such as
   permutation test and chi-squared test, developing new association
   measures that can effectively and efficiently capture epistatic
   interactions is still a direction. All the above are the great
   challenges in genome-wide interaction analysis.

   Though methodological and computational perplexities of the detection
   of epistatic interactions have been well recognized, the algorithmic
   development is still ongoing. Exhaustive methods, e.g., MDR [[48]13],
   show their successes on small scale data sets. However, for large scale
   data sets, especially those for GWAS, the detection of epistatic
   interactions becomes a needles-in-a-haystack problem [[49]14] and
   exhaustive methods are no longer feasible. Recently, heuristic methods
   are gaining increasing favor since they can retain as many informative
   SNPs as possible while largely reducing computational complexity. For
   instance, Zhang et al. developed TEAM [[50]15] to identify epistatic
   interactions, which updates contingency tables by utilizing a minimum
   spanning tree. Wan et al. presented an epistatic interaction detection
   method BOOST [[51]16], which involves only Boolean values and allows
   the use of fast logic operations to obtain contingency tables. They
   also proposed another method SNPRuler [[52]17] based on predictive rule
   inference. Wang et al. used AntEpiSeeker [[53]18] to identify epistatic
   interactions, which is a two-stage ant colony optimization algorithm.
   Zhang and Liu developed a Bayesian partition approach BEAM to find
   groups of genotypes with large posterior probability [[54]19]. Tang et
   al. introduced the concept of epistatic module and designed a Gibbs
   sampling approach epiMODE [[55]20] to detect such modules, which is a
   generalization of BEAM.

   Besides these methods, co-information based methods appear promising in
   detecting epistatic interactions since they have a well-developed
   theory, and can measure multivariate dependence without any complex
   modeling. Chanda et al. [[56]21] developed a co-information based
   metric called the interaction index for prioritizing interacting SNPs.
   They also proposed another three co-information based methods: AMBIENCE
   [[57]22] and KWII [[58]23] for detecting epistatic interactions
   associated with the binary phenotype, CHORUS [[59]24] for identifying
   associations with quantitative traits. Sucheston et al. [[60]25]
   demonstrated that co-information based methods are flexible and have
   excellent power to detect epistatic interactions under a variety of
   conditions that characterize complex diseases.

   Although many methods for detecting epistatic interactions have been
   performed, most of them were constrained to pairwise epistatic
   interactions, easily ignoring the broader epistasis landscape [[61]26].
   Furthermore, these methods usually output identified epistatic
   interactions, as well as their significance levels, in flat file
   formats. Hence, the correct understanding of them is sometimes a
   challenge for researchers, especially for biologists and non-expert
   users, who are unfamiliar with the methods. A desirable strategy is to
   develop an effective visualization tool not only to intuitively
   visualize the detected interactions but also to discover several hidden
   patterns [[62]27].

   However, there have been few studies focused on their visualization.
   Moore et al. [[63]28] built an interaction graph to visualize detected
   epistatic interactions. McKinney et al. [[64]29] constructed a genetic
   association interaction network (GAIN) to characterize detailed
   interactions, whose edges quantify the synergy between pair SNPs with
   respect to the phenotype. GAIN has been successfully used for
   identifying modulators of antibody response to smallpox vaccine
   [[65]30], GWAS of bipolar disorder [[66]31], and analyzing exome data
   for systemic lupus erythematous cases and controls [[67]32]. Hu et al.
   [[68]33] proposed a statistical epistasis network (SEN) approach, which
   has been proven to be able to discover pairwise epistatic interactions
   of bladder cancer [[69]34, [70]35] and prostate cancer [[71]36]. They
   also demonstrated that SEN supervised search is able to infer several
   3-order epistatic interactions with significantly high associations at
   a substantially reduced computational cost [[72]37]. Though these
   network-assisted methods can provide a global map of pairwise epistatic
   interactions, can indirectly capture higher order epistatic
   interactions on the basis of observing topology structures of networks,
   and can be exported for visualization in existing tools, such as
   Cytoscape and Graphviz, they could not directly detect and visualize
   n-order epistatic interactions, for example, n = 3 or larger. More
   recently, Hu et al. [[73]38] presented a visualization tool ViSEN,
   which can show both pairwise and 3-order epistatic interactions, in
   addition to main effects, in one network. To the best of our knowledge,
   it is the first visualization tool that shows three orders of effects
   simultaneously. Nevertheless, different orders of effects in ViSEN are
   difficult to be fairly and intuitively compared. Wu et al. [[74]27]
   designed another visualization tool EINVis to analyze and explore
   genetic interactions, which utilizes a tree ring view to simultaneously
   visualize the hierarchical interactions between SNPs, genes, and
   chromosomes. However, EINVis is limited in detecting and visualizing
   high order epistatic interactions.

   In the light of above observations, we develop CINOEDV (Co-Information
   based NOrder Epistasis Detector and Visualizer) for the detection and
   visualization of epistatic interactions of their orders from 1 to n
   (n ≥ 2). CINOEDV is composed of two stages, namely, detecting stage and
   visualizing stage. In detecting stage, co-information based measures
   are employed to quantify association effects of n-order SNP
   combinations to the phenotype, and two types of search strategies are
   introduced to identify n-order epistatic interactions: an exhaustive
   search for lower order epistatic interactions and/or small scale data
   sets, a particle swarm optimization (PSO) based search for higher order
   epistatic interactions and/or large scale data sets. In visualizing
   stage, all detected n-order epistatic interactions are used to
   construct a hypergraph, where a real vertex represents the main effect
   of a SNP and a virtual vertex denotes the interaction effect of an
   n-order epistatic interaction. By deeply analyzing the constructed
   hypergraph, some hidden clues for better understanding the underlying
   genetic architecture of complex diseases could be revealed, for
   instance, higher order epistatic interactions, hub SNPs and connected
   subgraphs. Experiments of CINOEDV and its comparison with
   state-of-the-art methods are performed on lots of simulation data sets
   under the evaluation measures of both detection power and computational
   complexity. Results demonstrate that CINOEDV is promising in detecting
   and visualizing n-order epistatic interactions. In addition, CINOEDV is
   also applied on a real data set of age-related macular degeneration
   (AMD), and results of which provide several new clues for the
   exploration of causative factors of AMD. CINOEDV might be an
   alternative to existing methods for the detection and visualization of
   n-order epistatic interactions.

Methods

Co-information based association measures

   Before introducing the measures, several terms and notations are
   described. At present, the generally accepted way of mapping SNPs is to
   collect them as a matrix, where a row represents genotypes of an
   individual and a column represents a SNP. Genotypes of a SNP are coded
   as {0, 1, 2}, corresponding to homozygous common genotype, heterozygous
   genotype, and homozygous minor genotype. The label of an individual is
   a binary phenotype being either 0 (control) or 1 (case). Based on this
   numerical mapping, let N and M be the number of SNPs and the number of
   individuals in the data respectively. Below we will discuss the
   definitions of co-information based association measures between n SNPs
   S[1], ⋯, S[n], that randomly sampled from N SNPs, and the phenotype C.

   Co-information is one of several generalizations of mutual information,
   and can measure multivariate dependence without any complex modeling
   [[75]39]. Co-information among n SNPs and the phenotype C is defined as
   an alternating sum of the joint entropies of all possible subsets T of
   V using the difference operator notation of Han [[76]40],
   [MATH: <mi>C</mi><mi>I</mi><mfenced close=")" open="("
   separators=";;;"><msub><mi>S</mi><mn>1</mn></msub><mo>⋯</mo><msub><mi>S
   </mi><mi>n</mi></msub><mi>C</mi></mfenced><mo>=</mo><mo>−</mo><msup><ms
   tyle displaystyle="true"><munder><mo
   stretchy="true">∑</mo><mrow><mi>T</mi><mo>⊆</mo><mi>V</mi></mrow></mund
   er><mrow><mfenced close=")"
   open="("><mrow><mo>−</mo><mn>1</mn></mrow></mfenced></mrow></mstyle><mr
   ow><mi>n</mi><mo>+</mo><mn>1</mn><mo>−</mo><mfenced close="|"
   open="|"><mi>T</mi></mfenced></mrow></msup><mi>H</mi><mfenced close=")"
   open="("><mi>T</mi></mfenced><mtext>,</mtext> :MATH]

   where V = {S[1], ⋯, S[n], C}, T represents all possible subsets of V,
   and H(T) is the joint entropy of T, which can be written as
   [MATH: <mi>H</mi><mfenced close=")"
   open="("><mi>T</mi></mfenced><mo>=</mo><mo>−</mo><mstyle
   displaystyle="true"><munder><mo
   stretchy="true">∑</mo><mrow><mi>t</mi><mo>∈</mo><mi>T</mi></mrow></mund
   er><mrow><mi>p</mi><mfenced close=")"
   open="("><mi>t</mi></mfenced></mrow></mstyle><mo>log</mo><mi>p</mi><mfe
   nced close=")" open="("><mi>t</mi></mfenced><mtext>,</mtext> :MATH]

   and p(t) is the probability mass function.

   It is seen that co-information is a parsimonious, multivariate measure
   quantifying interactions that cannot be obtained without observing all
   variables at the same time [[77]24], and it seems promising for
   detecting n-order epistatic interactions. However, it also has two
   confusing properties retarded its wider adoption as an association
   measure.

   The first is its value. In the bivariate case, co-information is
   equivalent to mutual information and its value is always positive. But
   in the multivariate case, its value can be positive or negative, the
   interpretation of which is generally intuitive [[78]26]: a positive
   value is an evidence of interactions among variables; a negative value
   indicates the presence of redundancy; and a value of zero denotes that
   variables are independent or, more likely, interact with a mixture of
   synergy and redundancy. Almost all existing applications of
   co-information depend upon this intuitive explanation [[79]21, [80]24,
   [81]28, [82]29, [83]33, [84]38], and it is also the basis of our
   association measures.

   The second is its sensitivity to the SNP combination order. This
   property leads to difficulty in ranking SNP combinations of different
   orders. As yet, it still lacks the widely accepted normalization
   method. In this study, we make use of the order-fixed averages of
   co-information values to normalize them, defined as n-order interaction
   effect,
   [MATH:
   <mi>N</mi><mi>C</mi><mi>I</mi><mrow><mo>(</mo><msub><mrow><mi>S</mi></m
   row><mn>1</mn></msub><mo>;</mo><mo>⋯</mo><mo>;</mo><msub><mrow><mi>S</m
   i></mrow><mi>n</mi></msub><mo>;</mo><mi>C</mi><mo>)</mo></mrow><mo>=</m
   o><mfrac><mrow><mi>C</mi><mi>I</mi><mrow><mo>(</mo><msub><mrow><mi>S</m
   i></mrow><mn>1</mn></msub><mo>;</mo><mo>⋯</mo><mo>;</mo><msub><mrow><mi
   >S</mi></mrow><mi>n</mi></msub><mo>;</mo><mi>C</mi><mo>)</mo></mrow></m
   row><mrow><mi>H</mi><mo stretchy="false">(</mo><mi>C</mi><mo
   stretchy="false">)</mo></mrow></mfrac><mo>⋅</mo><mfrac><mrow><mover
   accent="true"><mrow><mi>C</mi><msub><mi>I</mi><mn>1</mn></msub></mrow><
   mo stretchy="true">¯</mo></mover></mrow><mrow><mover
   accent="true"><mrow><mi>C</mi><msub><mi>I</mi><mi>n</mi></msub></mrow><
   mo stretchy="true">¯</mo></mover></mrow></mfrac><mo>,</mo> :MATH]

   where H(C) is the entropy of the phenotype,
   [MATH: <mover
   accent="true"><mrow><mi>C</mi><msub><mi>I</mi><mi>n</mi></msub></mrow><
   mo stretchy="true">¯</mo></mover> :MATH]
   and
   [MATH: <mover
   accent="true"><mrow><mi>C</mi><msub><mi>I</mi><mn>1</mn></msub></mrow><
   mo accent="false">¯</mo></mover> :MATH]
   are respective averages of all considered n-order and 1-order
   co-information values. The first part of the formula provides the
   percentage of explaining the phenotype by giving the knowledge of n
   SNPs, and the second part is a coefficient that balances the
   contributions of SNP combinations of different orders to the phenotype.

   However, NCI only measures contribution of a SNP combination itself,
   not containing contributions of its subsets. In fact, the effect of an
   n-order SNP combination to the phenotype consists of main effects of
   all involved SNPs, as well as interaction effects of itself and its all
   subsets. To quantify the total contribution of a SNP combination to the
   phenotype, another co-information based association measure is
   presented, defined as the summation of all involved contributions,
   including its contribution, and contributions of its subsets whose NCI
   values reach the user-specified thresholds. The formula can be written
   as
   [MATH: <mi>C</mi><mi>C</mi><mi>I</mi><mfenced close=")" open="("
   separators=";;;"><msub><mi>S</mi><mn>1</mn></msub><mo>⋯</mo><msub><mi>S
   </mi><mi>n</mi></msub><mi>C</mi></mfenced><mo>=</mo><mstyle
   displaystyle="true"><munder><mo
   stretchy="true">∑</mo><mrow><mi>Z</mi><mo>⊆</mo><mi>C</mi><mi>S</mi><mo
   >∩</mo><mi>Z</mi><mo>⊆</mo><mfenced close="}" open="{"
   separators=";;"><msub><mi>S</mi><mn>1</mn></msub><mo>⋯</mo><msub><mi>S<
   /mi><mi>n</mi></msub></mfenced></mrow></munder><mrow><mi>N</mi><mi>C</m
   i><mi>I</mi><mfenced close=")" open="("
   separators=";"><msup><mi>Z</mi><mo>′</mo></msup><mi>C</mi></mfenced></m
   row></mstyle><mtext>,</mtext> :MATH]

   where Z^′ represents all SNPs in the set Z, C represents the phenotype,
   and CS is a set of SNP combinations that their NCI values pass the
   user-specified thresholds.

Search strategies

   CINOEDV supports two types of search strategies, with NCI as its
   association measure, to simultaneously detect epistatic interactions of
   their orders from 1 to n (n ≥ 2). One is an exhaustive search for lower
   order epistatic interactions and/or small scale data sets. With genome
   wide SNPs from thousands of individuals, it is difficult to search high
   order epistatic interactions exhaustively because of their heavy
   computational burden. CINOEDV provides a PSO based search for higher
   order epistatic interactions and/or large scale data sets.

   The PSO is a popular member of swarm intelligence algorithms inspired
   by the collective behaviors of organisms, like birds (viewed as
   particles), which can jointly perform many complex tasks though each
   individual is very limited in its capability [[85]41]. In PSO, the
   position of a particle represents a possible solution which is adjusted
   according to its velocity, and estimated by a fitness function at each
   generation. A higher fitness value implies a better position. The
   velocity of a particle is updated according to three factors: its
   previous velocity, its individual experience, and the common knowledge
   of the swarm. The individual experience of a particle is the best
   position that it has travelled. The common knowledge of the swarm is
   the best one among individual experiences of all particles. This
   feedback strategy leads the swarm gradually converge to an optimal
   solution [[86]42].

   In our PSO based search, NCI is applied as its fitness function. That
   is to say, a higher NCI value indicates a stronger association between
   the SNP combination and the phenotype. Compared with the PSO, our PSO
   based search has its own highlights: detecting multiple epistatic
   interactions with different orders at the same time, dynamic inertia
   weight, and opposition based learning.

   Suppose
   [MATH: <mi
   mathvariant="italic">Positio</mi><msub><mi>n</mi><mi>g</mi></msub><mfen
   ced close=")" open="("><mi>q</mi></mfenced><mo>=</mo><mfenced close=")"
   open="("
   separators=",,,,"><msubsup><mi>S</mi><mrow><mi>q</mi><mn>1</mn></mrow><
   mi>g</mi></msubsup><mo>⋯</mo><msubsup><mi>S</mi><mrow><mi>q</mi><mi>k</
   mi></mrow><mi>g</mi></msubsup><mo>⋯</mo><msubsup><mi>S</mi><mrow><mi>q<
   /mi><msub><mi>K</mi><mi>q</mi></msub></mrow><mi>g</mi></msubsup></mfenc
   ed> :MATH]
   is the position of the q[th] particle at iteration g, where
   q ∊ {1, ⋯, Q}, g ∊ {1, ⋯, G}, k ∊ {1, ⋯, K[q]}, S[qk]^g is the selected
   k[th] SNP of the q[th] particle at iteration g, Q is the number of
   particles, G is the number of iterations, and K[q] is the considered
   order of epistatic interactions of the q[th] particle, which is
   randomly specified within [1, n] at the initialization stage. The
   velocity of the q[th] particle at iteration g is denoted as
   [MATH: <mi
   mathvariant="italic">Velocit</mi><msub><mi>y</mi><mi>g</mi></msub><mfen
   ced close=")" open="("><mi>q</mi></mfenced><mo>=</mo><mfenced close=")"
   open="("
   separators=",,,,"><msubsup><mi>v</mi><mrow><mi>q</mi><mn>1</mn></mrow><
   mi>g</mi></msubsup><mo>⋯</mo><msubsup><mi>v</mi><mrow><mi>q</mi><mi>k</
   mi></mrow><mi>g</mi></msubsup><mo>⋯</mo><msubsup><mi>v</mi><mrow><mi>q<
   /mi><msub><mi>K</mi><mi>q</mi></msub></mrow><mi>g</mi></msubsup></mfenc
   ed> :MATH]
   , where v[qk]^g is the velocity of S[qk]^g. The individual experience
   of the q[th] particle is written as
   [MATH: <mi
   mathvariant="italic">Pbes</mi><msub><mi>t</mi><mi>g</mi></msub><mfenced
   close=")" open="("><mi>q</mi></mfenced><mo>=</mo><mfenced close=")"
   open="("><mrow><mi>P</mi><msubsup><mi>S</mi><mrow><mi>q</mi><mn>1</mn><
   /mrow><mi>g</mi></msubsup><mo>,</mo><mo>⋯</mo><mo>,</mo><mi>P</mi><msub
   sup><mi>S</mi><mrow><mi>q</mi><mi>k</mi></mrow><mi>g</mi></msubsup><mo>
   ,</mo><mo>⋯</mo><mo>,</mo><mi>P</mi><msubsup><mi>S</mi><mrow><mi>q</mi>
   <msub><mi>K</mi><mi>q</mi></msub></mrow><mi>g</mi></msubsup></mrow></mf
   enced> :MATH]
   . The common knowledge of the swarm is redefined as the best ones among
   individual experiences of particles with the same considered orders,
   i.e., Gbest[g]^K = (GS[1]^g, ⋯, GS[k]^g, ⋯, GS[K]^g), where K ∊ [1, n].
   During the initialization stage, Position[1](q), Velocity[1](q),
   Pbest[1](q) and Gbest[1]^K are randomly initialized in their respective
   domains.

   The PSO based search detects epistatic interactions by continuously
   updating velocity and position of each particle at all iterations. The
   velocity of S[qk]^g is updated according to the following two
   equations,
   [MATH: <msubsup><mover accent="true"><mi>v</mi><mo
   stretchy="true">˜</mo></mover><mrow><mi>q</mi><mi>k</mi></mrow><mrow><m
   i>g</mi><mo>+</mo><mn>1</mn></mrow></msubsup><mo>=</mo><msubsup><mi>W</
   mi><mrow><mi>q</mi><mi>k</mi></mrow><mi>g</mi></msubsup><mo>⋅</mo><msub
   sup><mi>v</mi><mrow><mi>q</mi><mi>k</mi></mrow><mi>g</mi></msubsup><mo>
   +</mo><msub><mi>c</mi><mn>1</mn></msub><mo>⋅</mo><msub><mi>r</mi><mn>1<
   /mn></msub><mo>⋅</mo><mfenced close=")"
   open="("><mrow><mi>P</mi><msubsup><mi>S</mi><mrow><mi>q</mi><mi>k</mi><
   /mrow><mi>g</mi></msubsup><mo>−</mo><msubsup><mi>S</mi><mrow><mi>q</mi>
   <mi>k</mi></mrow><mi>g</mi></msubsup></mrow></mfenced><mo>+</mo><msub><
   mi>c</mi><mn>2</mn></msub><mo>⋅</mo><msub><mi>r</mi><mn>2</mn></msub><m
   o>⋅</mo><mfenced close=")"
   open="("><mrow><mi>G</mi><msubsup><mi>S</mi><mi>k</mi><mi>g</mi></msubs
   up><mo>−</mo><msubsup><mi>S</mi><mrow><mi>q</mi><mi>k</mi></mrow><mi>g<
   /mi></msubsup></mrow></mfenced><mtext>,</mtext> :MATH]
   [MATH:
   <msubsup><mi>v</mi><mrow><mi>q</mi><mi>k</mi></mrow><mrow><mi>g</mi><mo
   >+</mo><mn>1</mn></mrow></msubsup><mo>=</mo><mfenced close=""
   open="{"><mtable columnalign="center"><mtr columnalign="center"><mtd
   columnalign="center"><msubsup><mover accent="true"><mi>v</mi><mo
   stretchy="true">˜</mo></mover><mrow><mi>q</mi><mi>k</mi></mrow><mrow><m
   i>g</mi><mo>+</mo><mn>1</mn></mrow></msubsup></mtd><mtd
   columnalign="center"><msubsup><mover accent="true"><mi>v</mi><mo
   stretchy="true">˜</mo></mover><mrow><mi>q</mi><mi>k</mi></mrow><mrow><m
   i>g</mi><mo>+</mo><mn>1</mn></mrow></msubsup><mo>∈</mo><mfenced
   close="]"
   open="["><mrow><mn>1</mn><mo>−</mo><mi>N</mi><mo>,</mo><mi>N</mi><mo>−<
   /mo><mn>1</mn></mrow></mfenced></mtd></mtr><mtr
   columnalign="center"><mtd columnalign="center"><mi
   mathvariant="italic">rand</mi><mfenced close=")"
   open="("><mrow><mn>1</mn><mo>−</mo><mi>N</mi><mo>,</mo><mi>N</mi><mo>−<
   /mo><mn>1</mn></mrow></mfenced></mtd><mtd
   columnalign="center"><msubsup><mover accent="true"><mi>v</mi><mo
   stretchy="true">˜</mo></mover><mrow><mi>q</mi><mi>k</mi></mrow><mrow><m
   i>g</mi><mo>+</mo><mn>1</mn></mrow></msubsup><mo>∉</mo><mfenced
   close="]"
   open="["><mrow><mn>1</mn><mo>−</mo><mi>N</mi><mo>,</mo><mi>N</mi><mo>−<
   /mo><mn>1</mn></mrow></mfenced></mtd></mtr></mtable></mfenced><mtext>,<
   /mtext> :MATH]

   where acceleration factors c[1] and c[2] control how far a particle
   moves in a single iteration, r[1] and r[2] are random values in (0, 1),
   GS[k]^g is the k[th] SNP of
   [MATH: <mi
   mathvariant="italic">Gbes</mi><msubsup><mi>t</mi><mi>g</mi><msub><mi>K<
   /mi><mi>q</mi></msub></msubsup> :MATH]
   (K = K[q]) at iteration g, W[qk]^g is the inertia weight regulating the
   impact of the previous velocity of a particle on its current velocity.

   For the inertia weight, a large weight facilitates the global
   exploration and thus enables the method to execute a search over
   various regions, while a small weight facilitates the local
   exploitation, which helps to search a promising region. In order to
   balance the global exploration and the local exploitation, a dynamical
   inertia weight is introduced, defined as
   [MATH:
   <msubsup><mi>W</mi><mrow><mi>q</mi><mi>k</mi></mrow><mi>g</mi></msubsup
   ><mo>=</mo><mfrac><mrow><mo>max</mo><mfenced close=")"
   open="("><mrow><mi
   mathvariant="italic">coun</mi><msub><mi>t</mi><mi>g</mi></msub></mrow><
   /mfenced><mo>−</mo><mi
   mathvariant="italic">coun</mi><msub><mi>t</mi><mi>g</mi></msub><mfenced
   close="]"
   open="["><mrow><mi>P</mi><msubsup><mi>S</mi><mrow><mi>q</mi><mi>k</mi><
   /mrow><mi>g</mi></msubsup></mrow></mfenced></mrow><mrow><mo>max</mo><mf
   enced close=")" open="("><mrow><mi
   mathvariant="italic">coun</mi><msub><mi>t</mi><mi>g</mi></msub></mrow><
   /mfenced><mo>−</mo><mo>min</mo><mfenced close=")" open="("><mrow><mi
   mathvariant="italic">coun</mi><msub><mi>t</mi><mi>g</mi></msub></mrow><
   /mfenced></mrow></mfrac><mtext>,</mtext> :MATH]

   where count[g] = (ct[1]^g, ⋯, ct[m]^g, ⋯, ct[N]^g), and ct[m]^g is a
   counter that counts the number of SNP m presented in Pbest from
   iteration 1 to iteration g. This strategy allows particles to cover a
   wider search space while the considered SNP is likely to be a random
   one, and to converge on a promising region of the search space while
   capturing a highly suspected SNP.

   Based on v[qk]^g+1, the position of S[qk]^g is updated to S[qk]^g+1
   using the following two equations,
   [MATH: <msubsup><mover accent="true"><mi>S</mi><mo
   stretchy="true">˜</mo></mover><mrow><mi>q</mi><mi>k</mi></mrow><mrow><m
   i>g</mi><mo>+</mo><mn>1</mn></mrow></msubsup><mo>=</mo><msubsup><mi>S</
   mi><mrow><mi>q</mi><mi>k</mi></mrow><mi>g</mi></msubsup><mo>+</mo><msub
   sup><mi>v</mi><mrow><mi>q</mi><mi>k</mi></mrow><mrow><mi>g</mi><mo>+</m
   o><mn>1</mn></mrow></msubsup><mtext>,</mtext> :MATH]
   [MATH:
   <msubsup><mi>S</mi><mrow><mi>q</mi><mi>k</mi></mrow><mrow><mi>g</mi><mo
   >+</mo><mn>1</mn></mrow></msubsup><mo>=</mo><mfenced close=""
   open="{"><mtable columnalign="center"><mtr columnalign="center"><mtd
   columnalign="center"><mo>int</mo><mfenced close=")"
   open="("><msubsup><mover accent="true"><mi>S</mi><mo
   stretchy="true">˜</mo></mover><mrow><mi>q</mi><mi>k</mi></mrow><mrow><m
   i>g</mi><mo>+</mo><mn>1</mn></mrow></msubsup></mfenced></mtd><mtd
   columnalign="center"><msubsup><mover accent="true"><mi>S</mi><mo
   stretchy="true">˜</mo></mover><mrow><mi>q</mi><mi>k</mi></mrow><mrow><m
   i>g</mi><mo>+</mo><mn>1</mn></mrow></msubsup><mo>∈</mo><mfenced
   close="]" open="["
   separators=","><mn>1</mn><mi>N</mi></mfenced></mtd></mtr><mtr
   columnalign="center"><mtd columnalign="center"><mo>int</mo><mfenced
   close=")" open="("><mrow><mi mathvariant="italic">rand</mi><mfenced
   close=")" open="("
   separators=","><mn>1</mn><mi>N</mi></mfenced></mrow></mfenced></mtd><mt
   d columnalign="center"><msubsup><mover accent="true"><mi>S</mi><mo
   stretchy="true">˜</mo></mover><mrow><mi>q</mi><mi>k</mi></mrow><mrow><m
   i>g</mi><mo>+</mo><mn>1</mn></mrow></msubsup><mo>∉</mo><mfenced
   close="]" open="["
   separators=","><mn>1</mn><mi>N</mi></mfenced></mtd></mtr></mtable></mfe
   nced><mtext>,</mtext> :MATH]

   where rand( ⋅ ) is the random function and int(⋅) is the rounding
   function. Random functions used for updating both v[qk]^g+1 and
   S[qk]^g+1 help to increase the diversity of the search.

   Another highlight introduced to the PSO based search is the opposition
   based learning, basic principle of which is the consideration of a
   solution and its corresponding opposite solution simultaneously to
   approximate the global optima [[87]43]. In our PSO based search, if the
   solution is Position[g](q), its corresponding opposite solution is
   defined as
   [MATH: <mi
   mathvariant="italic">Positio</mi><msubsup><mi>n</mi><mi>g</mi><mo>'</mo
   ></msubsup><mfenced close=")"
   open="("><mi>q</mi></mfenced><mo>=</mo><mn>1</mn><mo>+</mo><mi>N</mi><m
   o>−</mo><mi
   mathvariant="italic">Positio</mi><msub><mi>n</mi><mi>g</mi></msub><mfen
   ced close=")" open="("><mi>q</mi></mfenced><mtext>,</mtext> :MATH]

   which not only expands the search space and enhances the global
   explorative ability, but also accelerates the convergence and avoids
   premature convergence.

   By comparing NCI values of Position[g]^’(q), Position[g](q) and
   Pbest[g](q), the individual experience of the q[th] particle at
   iteration g + 1, i.e., Pbest[g+1](q), is updated to the best one among
   them. Similarly, whether the common knowledge of the swarm at iteration
   g + 1, e.g., Gbest[g+1]^K, is updated or maintained as Gbest[g]^K
   depends on individual experiences of particles with the same order K.
   Specifically, Gbest[g+1]^K is updated to Pbest[g+1](q) while NCI value
   of Pbest[g+1](q) is the highest one among those of individual
   experiences of particles with the order K, and is also higher than that
   of Gbest[g]^K. When completing the iteration process, the PSO based
   search reports the sorted Pbest[G] according to their descending NCI
   values as its detected epistatic interactions.

Epistasis hypergraph and deep analyses

   In visualizing stage, lists of all epistatic interactions identified in
   detecting stage, or similar lists generated by other methods, are
   exported to construct an undirected epistasis hypergraph for refining
   the interpretation of genetic basis of disease susceptibility and
   disease etiology, by capturing and visualizing broader epistasis
   landscape. The hypergraph is composed of weighted vertices and
   unweighted edges. For weighted vertices, two types of them are
   presented, that is, real vertices and virtual vertices. A real vertex
   represents a SNP and its weight is the NCI value between the SNP and
   the phenotype, corresponding to the main effect of the SNP to the
   phenotype. A virtual vertex denotes the n-order interaction effect of
   the combination of linking SNPs to the phenotype, and its weight is the
   NCI value between the SNP combination and the phenotype. In the
   hypergraph, each red circle is a real vertex in which SNP name or index
   is labeled, each non-red circle is a virtual vertex. Sizes of vertices
   are respectively in proportion to their weights. For unweighted edges,
   each of them links a SNP and an effect of this SNP to the phenotype.
   From the hypergraph, effects of different orders, especially high
   orders, as well as topology structures of SNPs, can be intuitively
   visualized and compared.

   For deep analyses of the constructed hypergraph, several useful tools
   are also provided in CINOEDV. For example, epistatic interactions can
   be visually displayed according to their descending effects, either NCI
   values or CCI values; penetrance of an epistatic interaction can be
   estimated and visualized; degree of a real vertex in the hypergraph and
   connectivity of the hypergraph can be further analyzed. More details
   about these tools are available in its user manual. With the help of
   these tools, some hidden clues for better understanding the underlying
   genetic architecture of complex diseases could be revealed.

Results and Discussion

Experiments on simulation data

Detection power analysis for pairwise epistatic interactions

   Six commonly used models of epistatic interactions are simulated for
   the study. Details of these models are given in Fig. [88]1. For each
   model, 200 data sets are generated by the simulator epiSIM [[89]44],
   which describes the simulation steps of SNP data sets as well as true
   epistatic interactions of SNPs in detail, and has been used in several
   references [[90]26, [91]45, [92]46]. Among them, each data set contains