Abstract

   The causes of many complex human diseases are still largely unknown.
   Genetics plays an important role in uncovering the molecular mechanisms
   of complex human diseases. A key step to characterize the genetics of a
   complex human disease is to unbiasedly identify disease-associated gene
   transcripts on a whole-genome scale. Confounding factors could cause
   false positives. A paired design, such as measuring gene expression
   before and after treatment in the same subject, can reduce the effect
   of known confounding factors. However, not all known confounding
   factors can be controlled in a paired/matched design. Model-based
   clustering, such as mixtures of hierarchical models, has been proposed
   to detect gene transcripts differentially expressed between paired
   samples. To the best of our knowledge, no model-based gene clustering
   method yet has the capacity to adjust for the effects of covariates.
   In this article, we propose a novel mixture of hierarchical models
   with covariate adjustment for identifying differentially expressed
   transcripts using high-throughput whole-genome data from a paired
   design. Both a simulation study and a real data analysis show the good
   performance of the proposed method.

Supplementary Information

   The online version contains supplementary material available at
   10.1186/s12859-023-05556-x.

   Keywords: Curse of dimensionality, Confounding, EM algorithm, RNAseq

Introduction

   Genome-wide differential gene expression analysis is widely used for
   the elucidation of the molecular mechanisms of complex human diseases.
   One popular and powerful approach to detecting differentially expressed
   genes is probe-wise linear regression combined with control of
   multiple testing, as implemented in limma [1]. That is, we first
   perform linear regression for each probe and then adjust the p-values
   to control for multiple testing. One advantage of this approach is its
   capacity to adjust for potential confounding factors.

   Another approach for detecting differentially expressed genes is
   model-based clustering via mixtures of Bayesian hierarchical models
   (MBHMs) [2–7], which can borrow information across genes to
   cluster genes. Transcript clustering based on MBHMs treats gene
   transcripts as “samples” and samples as “variables”. Therefore, it has
   a large number of “samples” and a relatively small number of
   “variables”, and hence does not suffer from the curse of
   dimensionality. In addition, unlike transcript-specific tests
   that have several parameters per transcript, transcript clustering
   based on MBHMs has only a few hyperparameters per cluster to be
   estimated and can borrow information across transcripts to estimate
   them. These approaches generally assume that samples
   under two groups are obtained independently. Reference [8] proposed a
   constrained MBHM for genetic outcomes measured from
   paired/matched designs.

   Paired designs are commonly used because they provide a homogeneous
   external environment for comparing measurements under different
   conditions. However, not all known confounding factors can be
   controlled in a paired/matched design. Hence, we might still need to
   adjust for the effects of confounding factors in data from a
   paired/matched design.

   Mixtures of regressions, or mixture-of-experts models [9–11], have
   been proposed in the literature for clustering with the capacity to
   adjust for covariates. To the best of our knowledge, this approach does
   not impose constraints on positive, negative, and constant means and
   has not been applied to detect differentially expressed genes.

   In this article, we propose a novel mixture of hierarchical models
   with covariate adjustment for identifying differentially expressed
   transcripts using high-throughput whole-genome data from a paired
   design.

Method

   We assumed that gene transcripts can be roughly classified into 3
   clusters based on their expression levels in subjects after treatment
   (denoted as condition 1) relative to those before treatment (denoted as
   condition 2):
    1. Transcripts after treatment have higher expression levels than
       those before treatment, i.e., over-expressed (OE) in condition 1;
    2. Transcripts after treatment have lower expression levels than those
       before treatment, i.e., under-expressed (UE) in condition 1;
    3. Transcripts after treatment have the same expression levels as
       those before treatment, i.e., non-differentially expressed (NE)
       between condition 1 and matched condition 2.

   We followed [8] to directly model the marginal distributions of
   gene transcripts in the 3 clusters. The authors of [8] proposed a
   mixture of three-component hierarchical distributions to characterize
   the within-pair difference of gene expression. We extended their model
   by incorporating into the mixture of hierarchical models potential
   confounding factors (such as age and sex) that might affect the
   response of gene expression to drug treatment.

   Note that this extension is non-trivial, just as multiple linear
   regression is not merely a simple extension of simple linear
   regression.

   We assumed that the data have been processed so that the distributions
   of mRNA expression levels are close to normal distributions. For RNAseq
   data, we can apply the VOOM transformation [12] or countTransformers
   [13] before applying the proposed method, eLNNpairedCov.

A mixture of hierarchical models

   For the $g^{th}$ gene transcript, let $x_{gl}$ and $y_{gl}$ denote the
   expression levels of the $l^{th}$ subject under two different
   conditions, e.g., before and after treatment, $g=1,\ldots,G$,
   $l=1,\ldots,n$, where G is the number of transcripts and n is the
   number of subjects (i.e., the number of pairs). Let
   $d_{gl}=\log_2(y_{gl})-\log_2(x_{gl})$ be the log2 difference for the
   $g^{th}$ gene transcript of the $l^{th}$ subject. Denote
   $\mathbf{d}_g=\left(d_{g1},\ldots,d_{gn}\right)^T$. We assumed that
   $\mathbf{d}_g$ is conditionally normally distributed given the mean
   vector and covariance matrix. Let $\boldsymbol{W}^T$ be the
   $n\times(p+1)$ design matrix, where p is the number of covariates. The
   first column of $\boldsymbol{W}^T$ is the vector of ones, indicating
   the intercept. Let $\boldsymbol{\eta}$ be the $(p+1)\times 1$ vector of
   coefficients for the intercept and covariate effects. We assume the
   following mixture of three-component hierarchical models:

   For gene transcripts over-expressed (OE) in post-treatment samples, we
   expect that the mean log2 differences are positive. Hence, we assume

   $$\begin{aligned}
   \mathbf{d}_g \mid \left(\boldsymbol{\mu}_g, \tau_g\right) &\sim N\left(\boldsymbol{\mu}_g,\ \tau_g^{-1}\boldsymbol{I}_n\right)\\
   \boldsymbol{\mu}_g \mid \tau_g &\sim N\left(\exp[\boldsymbol{W}^T\boldsymbol{\eta}_1],\ k_1\tau_g^{-1}\boldsymbol{I}_n\right)\\
   \tau_g &\sim \Gamma\left(\alpha_1, \beta_1\right)
   \end{aligned}$$

   where $k_1>0$, $\alpha_1>0$, and $\beta_1>0$.
   $\Gamma\left(\alpha_1,\beta_1\right)$ denotes the Gamma distribution
   with shape parameter $\alpha_1$ and rate parameter $\beta_1$. That is,
   we assume that (1) the mean vectors $\boldsymbol{\mu}_g$,
   $g=1,\ldots,G$, given the variance $\tau_g^{-1}$, follow a multivariate
   normal distribution with mean vector
   $\exp\left[\boldsymbol{W}^T\boldsymbol{\eta}_1\right]$ and covariance
   matrix $k_1\tau_g^{-1}\boldsymbol{I}_n$; and (2) the precisions
   $\tau_g$ (the reciprocals of the variances $\tau_g^{-1}$),
   $g=1,\ldots,G$, follow a Gamma distribution with shape parameter
   $\alpha_1$ and rate parameter $\beta_1$.
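
   The OE hierarchy can be made concrete by simulating from it. The
   following is a minimal sketch in Python, not the authors'
   implementation: the function name, argument layout, and the example
   parameter values are illustrative assumptions. It draws the precision
   first, then the mean vector, then the log2 differences.

```python
import math
import random

def simulate_oe_cluster(W, eta1, alpha1, beta1, k1, n_genes, seed=0):
    """Draw log2-difference vectors d_g from the OE hierarchical model.

    W      : list of n rows [1, w_1, ..., w_p] (intercept column first)
    eta1   : list of p + 1 coefficients (intercept, covariate effects)
    alpha1 : Gamma shape, beta1 : Gamma rate, k1 : variance scaling (> 0)
    All names here are hypothetical, chosen for illustration only.
    """
    rng = random.Random(seed)
    # Prior mean of mu_g for each subject: exp(w_l^T eta1), always positive.
    prior_mean = [math.exp(sum(w * e for w, e in zip(row, eta1))) for row in W]
    draws = []
    for _ in range(n_genes):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        tau = rng.gammavariate(alpha1, 1.0 / beta1)
        sd_mu = math.sqrt(k1 / tau)   # sd of each element of mu_g given tau_g
        sd_d = math.sqrt(1.0 / tau)   # sd of each element of d_g given mu_g
        mu = [rng.gauss(m, sd_mu) for m in prior_mean]
        draws.append([rng.gauss(m, sd_d) for m in mu])
    return prior_mean, draws
```

   Averaged over many simulated transcripts, the log2 difference of each
   subject should be close to the corresponding element of the prior
   mean, which is positive by construction.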

   Note that the exponential form, e.g., the exponential of the intercept
   $\exp(\eta_{10})$, guarantees that the mean of the log2 differences is
   positive.

   For gene transcripts under-expressed (UE) in post-treatment samples, we
   expect that the mean log2 differences are negative. Hence, we assume

   $$\begin{aligned}
   \mathbf{d}_g \mid \left(\boldsymbol{\mu}_g, \tau_g\right) &\sim N\left(\boldsymbol{\mu}_g,\ \tau_g^{-1}\boldsymbol{I}_n\right)\\
   \boldsymbol{\mu}_g \mid \tau_g &\sim N\left(-\exp[\boldsymbol{W}^T\boldsymbol{\eta}_2],\ k_2\tau_g^{-1}\boldsymbol{I}_n\right)\\
   \tau_g &\sim \Gamma\left(\alpha_2, \beta_2\right)
   \end{aligned}$$

   where $k_2>0$, $\alpha_2>0$, $\beta_2>0$, and $\boldsymbol{W}^T$ is the
   design matrix.

   Note that the negative exponential form, e.g., the negative exponential
   of the intercept $-\exp(\eta_{20})$, guarantees that the mean of the
   log2 differences is negative.

   For gene transcripts non-differentially expressed (NE) between pre- and
   post-treatment samples, we expect the mean log2 differences are zero.
   Hence, we assume

   $$\begin{aligned}
   \mathbf{d}_g \mid \tau_g &\sim N\left(\boldsymbol{U}^T\boldsymbol{\theta}_g,\ \tau_g^{-1}\boldsymbol{I}_n\right)\\
   \boldsymbol{\theta}_g \mid \tau_g &\sim N\left(\boldsymbol{\eta}_3,\ k_3\tau_g^{-1}\boldsymbol{I}_p\right)\\
   \tau_g &\sim \Gamma\left(\alpha_3, \beta_3\right)
   \end{aligned}$$

   where $k_3>0$, $\alpha_3>0$, and $\beta_3>0$. $\boldsymbol{U}^T$ is the
   design matrix without the intercept column. That is, the intercepts
   are zero, because the intercepts represent the mean log2 differences.
   Hence, $\boldsymbol{\eta}_3$ is a $p\times 1$ vector of coefficients
   for the covariates.

   Note that $\boldsymbol{\theta}_g$ measures the effects of confounding
   factors for NE genes. The true treatment effect for NE genes is zero
   (i.e., the intercept of $\boldsymbol{U}^T\boldsymbol{\theta}_g$ is zero
   in the above model).

   The hyperparameters $\alpha_c$ and $\beta_c$ are shape and rate
   parameters for the Gamma distribution, respectively, $c=1,2,3$. As for
   $k_1, k_2$ and $k_3$, the variation of the mean vector
   $\boldsymbol{\mu}_g$ should be smaller than that of the observations
   $\mathbf{d}_g$. So we expect $0<k_c<1$, $c=1,2,3$.

   Note that the marginal distribution for each component of the mixture
   is a multivariate t distribution [14, Section 3.7.6]. However, to
   model differentially expressed genes, the multivariate t distributions
   derived from our models have a special structure of mean vector and
   covariance matrix.
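
   To see where the multivariate t form comes from, the OE component can
   be integrated in two steps. The following is a sketch derived only
   from the model as stated above; the article's own expression may be
   arranged differently, and the symbol $S_g$ is introduced here purely
   for illustration.

```latex
% Step 1: integrate out \mu_g given \tau_g (normal-normal conjugacy)
\mathbf{d}_g \mid \tau_g \sim
  N\left(\exp[\boldsymbol{W}^T\boldsymbol{\eta}_1],\
         (1+k_1)\,\tau_g^{-1}\boldsymbol{I}_n\right)

% Step 2: integrate out \tau_g \sim \Gamma(\alpha_1, \beta_1), with
% S_g = \lVert \mathbf{d}_g - \exp[\boldsymbol{W}^T\boldsymbol{\eta}_1] \rVert^2
f_1(\mathbf{d}_g \mid \psi)
  = \frac{\Gamma(\alpha_1 + n/2)}{\Gamma(\alpha_1)}
    \cdot \frac{\beta_1^{\alpha_1}}{\left[2\pi(1+k_1)\right]^{n/2}}
    \cdot \left[\beta_1 + \frac{S_g}{2(1+k_1)}\right]^{-(\alpha_1 + n/2)}
```

   This is the density of a multivariate t distribution with
   $2\alpha_1$ degrees of freedom, location
   $\exp[\boldsymbol{W}^T\boldsymbol{\eta}_1]$, and scale matrix
   $\{\beta_1(1+k_1)/\alpha_1\}\boldsymbol{I}_n$. The UE component follows
   the same steps with the location replaced by
   $-\exp[\boldsymbol{W}^T\boldsymbol{\eta}_2]$.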

   For continuous covariates, we require that they are standardized so
   that they have mean zero and variance one. Standardizing continuous
   covariates helps keep
   $\exp\left(\boldsymbol{W}^T\boldsymbol{\eta}_1\right)$ and
   $\exp\left(\boldsymbol{W}^T\boldsymbol{\eta}_2\right)$ numerically
   finite.
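
   A minimal sketch of the standardization step, in Python. The function
   name is hypothetical, and the use of the sample standard deviation
   (n - 1 denominator) is an assumption; the text only requires mean zero
   and variance one.

```python
import statistics

def standardize(values):
    """Center a continuous covariate to mean 0 and scale it to variance 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; assumed, see lead-in
    if sd == 0:
        raise ValueError("a constant covariate cannot be standardized")
    return [(v - mean) / sd for v in values]

# Example: ages of five subjects (made-up values for illustration).
age_z = standardize([30, 40, 50, 60, 70])
```

   With the covariate on this scale, the linear predictor stays of
   moderate size, so its exponential is unlikely to overflow.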

   Ideally, we should require $\boldsymbol{\mu}_g>0$
   ($\boldsymbol{\mu}_g<0$) for all transcripts in cluster 1 (cluster 2).
   To do so, we could assume a log-normal prior distribution for
   $\boldsymbol{\mu}_g$ in cluster 1, for instance. However, a log-normal
   distribution is not a conjugate prior for the mean of a normal
   distribution, and using non-conjugate priors would increase the
   computational burden. Alternative models could also be used, such as
   assuming
   $\boldsymbol{\mu}_g \mid \eta_{10} = \exp(\eta_{10}) + \boldsymbol{W}^T\boldsymbol{\eta}_1$,
   where $\eta_{10}$ follows a normal distribution. However, these models
   do not have closed-form marginal densities. Hence, they would
   substantially increase the computational burden. Besides, the empirical
   distribution of the mean log2 difference $\mathbf{d}_g$ of the
   differentially expressed gene probes has shown a right-skewed
   pattern, while that of non-differentially expressed genes demonstrates
   an approximate bell shape (see Additional file 1: Figures A2-A4).
   Hence, we require the mean $E(\boldsymbol{\mu}_g)>0$
   ($E(\boldsymbol{\mu}_g)<0$) for cluster 1 (cluster 2) by assuming
   $E(\boldsymbol{\mu}_g)$ for cluster 1 (cluster 2) to be
   $\exp[\boldsymbol{W}^T\boldsymbol{\eta}_1]$
   ($-\exp[\boldsymbol{W}^T\boldsymbol{\eta}_2]$).

   The proposed mixture models have meaningful biological interpretations
   for mean structures. In particular, for the OE cluster, the
   exponentiated intercept $\exp(\eta_{10})$ can be interpreted as the
   expected average log2 difference of gene transcripts when the values of
   all p covariates are zero; the coefficient $\eta_{1i}$ of covariate i
   can be interpreted as an $\exp(\eta_{1i})$ fold-change in the expected
   log2 difference associated with a one-unit increase in covariate i
   while the values of the remaining $(p-1)$ covariates are fixed. For the
   UE cluster, $-\exp(\eta_{20})$ can be interpreted as the expected
   average log2 difference of gene transcripts when the values of all p
   covariates are zero; the coefficient $\eta_{2i}$ of covariate i can be
   interpreted as an $\exp(\eta_{2i})$ fold-change in the expected log2
   difference associated with a one-unit increase in covariate i while the
   values of the remaining $(p-1)$ covariates are fixed. For the NE
   cluster, the coefficient $\eta_{3i}$ of covariate i can be interpreted
   as an $\eta_{3i}$-unit increase in the expected log2 difference of gene
   transcripts associated with a one-unit increase in covariate i while
   the values of the remaining $(p-1)$ covariates are fixed. The models
   also conveniently yield closed-form marginal densities, so that we can
   use the Expectation-Maximization (EM) algorithm to estimate
   hyperparameters instead of computationally intensive algorithms such as
   Markov chain Monte Carlo (MCMC).
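
   The fold-change reading of the OE coefficients follows directly from
   the exponential mean structure. As a short worked sketch (the symbol
   $w_{li}$ for the value of covariate i in subject l and the value 0.1
   are illustrative assumptions, not taken from the article):

```latex
% Expected log2 difference for subject l in the OE cluster
E(\mu_{gl}) = \exp\left(\eta_{10} + \sum_{i=1}^{p} \eta_{1i}\, w_{li}\right)

% Raising covariate i by one unit, all other covariates held fixed
\frac{\exp\left(\eta_{10} + \eta_{1i}(w_{li}+1) + \sum_{j \ne i} \eta_{1j} w_{lj}\right)}
     {\exp\left(\eta_{10} + \eta_{1i} w_{li} + \sum_{j \ne i} \eta_{1j} w_{lj}\right)}
  = \exp(\eta_{1i})

% Example: \eta_{1i} = 0.1 gives \exp(0.1) \approx 1.105,
% i.e., about a 10.5% larger expected log2 difference.
```

   Note that this ratio applies to the expected log2 difference, not to
   the expression level itself.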

Marginal density functions

   Let $f_1(\mathbf{d}_g|\psi)$, $f_2(\mathbf{d}_g|\psi)$,
   $f_3(\mathbf{d}_g|\psi)$ be the marginal densities of the 3
   hierarchical models, and $\boldsymbol{\pi} = (\pi_1,\pi_2,\pi_3)$ be
   the vector of cluster mixture proportions, where
   $\psi=\left(\alpha_1,\beta_1,k_1,\boldsymbol{\eta}_1^T,\alpha_2,\beta_2,k_2,\boldsymbol{\eta}_2^T,\alpha_3,\beta_3,k_3,\boldsymbol{\eta}_3^T\right)^T$.
   Then the marginal density of $\mathbf{d}_g$ is:

   $$f(\mathbf{d}_g|\psi)=\pi_1 f_1(\mathbf{d}_g|\psi)+\pi_2 f_2(\mathbf{d}_g|\psi)+\pi_3 f_3(\mathbf{d}_g|\psi).$$
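
   For the OE and UE components, integrating out the mean vector and the
   precision in the models above gives a closed-form density that is easy
   to evaluate on the log scale. The following Python sketch is derived
   from the stated model and is not the authors' code; the function name
   and argument layout are assumptions, and the NE component (which needs
   matrix algebra for the covariate block) is omitted.

```python
import math

def log_marginal_de(d, W, eta, alpha, beta, k, sign=1.0):
    """Log marginal density of d_g for the OE (sign=+1) or UE (sign=-1) cluster.

    d    : list of n log2 differences for one transcript
    W    : list of n rows [1, w_1, ..., w_p] (intercept column first)
    eta  : list of p + 1 coefficients
    alpha, beta : Gamma shape and rate; k : variance scaling (> 0)
    """
    n = len(d)
    # Component location: +exp(W^T eta) for OE, -exp(W^T eta) for UE.
    mean = [sign * math.exp(sum(w * e for w, e in zip(row, eta))) for row in W]
    # Squared distance between the observations and the location.
    ss = sum((x - m) ** 2 for x, m in zip(d, mean))
    return (math.lgamma(alpha + n / 2.0) - math.lgamma(alpha)
            + alpha * math.log(beta)
            - (n / 2.0) * math.log(2.0 * math.pi * (1.0 + k))
            - (alpha + n / 2.0) * math.log(beta + ss / (2.0 * (1.0 + k))))
```

   Working with log densities avoids underflow when n is large; the
   mixture density can then be assembled with a log-sum-exp step.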

Determining transcript cluster membership

   The transcript-cluster membership is determined based on the posterior
   probabilities,
   $\zeta_{gc} = Pr(g^{th}\text{ gene transcript in cluster } c \mid \boldsymbol{d}_g)$.
   We can get

   $$\zeta_{gc}=\frac{\pi_c f_c(\mathbf{d}_g|\psi)}{\pi_1 f_1(\mathbf{d}_g|\psi)+\pi_2 f_2(\mathbf{d}_g|\psi)+\pi_3 f_3(\mathbf{d}_g|\psi)},\quad c=1,2,3. \tag{1}$$

   We determine a transcript’s cluster membership as follows: If the
   maximum value among $\zeta_{gi}, i=1,2,3$ is $\zeta_{gc}$, then the
   transcript g belongs to cluster c.

   The true values of $\pi_1$, $\pi_2$, $\pi_3$, and $\psi$ are unknown,
   so we plug in their estimated values to determine each transcript's
   cluster membership.
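
   The membership rule above can be sketched in a few lines. This is
   only an illustration, not the authors' implementation: the function
   names and the numeric inputs are hypothetical, and the cluster
   log-densities $\log f_c(\mathbf{d}_g \mid \psi)$ are assumed to have
   been computed elsewhere. Working on the log scale is a design choice
   that avoids numerical underflow when the densities are tiny.

```python
import math


def posterior_probs(pi, log_f):
    """Return [zeta_g1, zeta_g2, zeta_g3] for one transcript.

    pi    : mixture proportions (pi_1, pi_2, pi_3), each assumed > 0
    log_f : log f_c(d_g | psi) for c = 1, 2, 3 (hypothetical inputs)
    """
    # log(pi_c * f_c) for each cluster.
    log_w = [math.log(p) + lf for p, lf in zip(pi, log_f)]
    # Subtract the maximum before exponentiating so exp() cannot overflow.
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def assign_cluster(zeta):
    """Return the 1-based cluster index c with the largest zeta_gc."""
    return max(range(len(zeta)), key=lambda c: zeta[c]) + 1


# Made-up values for a single transcript, for illustration only.
zeta = posterior_probs([0.1, 0.1, 0.8], [-10.0, -14.0, -13.0])
cluster = assign_cluster(zeta)
```

   In this toy example the first cluster wins even though its mixture
   proportion is small, because its density at the data is much larger.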

Parameter estimation via EM algorithm

   We used the expectation-maximization (EM) algorithm [[41]15] to
   estimate the model parameters
   $\boldsymbol{\pi} = (\pi_1, \pi_2, \pi_3)^T$ and $\psi$.

   Let $\boldsymbol{z}_g = (z_{g1}, z_{g2}, z_{g3})$ be the indicator
   vector for gene transcript g, where $z_{gc} = 1$ if transcript g
   belongs to cluster c and $z_{gc} = 0$ otherwise. To stabilize the
   estimate of $\boldsymbol{\pi}$ when $\pi_c$ is very small, we assume
   that the cluster mixture proportions $\boldsymbol{\pi}$ follow a
   symmetric Dirichlet distribution $D(\mathbf{b})$, i.e.,
   $f(\boldsymbol{\pi}) = \frac{\Gamma(\sum_{c=1}^{3} b_c)}{\prod_{c=1}^{3} \Gamma(b_c)} \prod_{c=1}^{3} \pi_c^{b_c - 1}$.
   Therefore, the likelihood function for the complete data
   $(\mathbf{d}, \boldsymbol{z}, \boldsymbol{\pi})$ is
   \[
   \begin{aligned}
   L(\psi \mid \mathbf{d}, \boldsymbol{z}, \boldsymbol{\pi})
     &= f(\mathbf{d}, \boldsymbol{z}, \boldsymbol{\pi} \mid \psi) \\
     &= f(\mathbf{d}, \boldsymbol{z} \mid \boldsymbol{\pi}, \psi)\, f(\boldsymbol{\pi} \mid \psi) \\
     &= f(\mathbf{d}, \boldsymbol{z} \mid \boldsymbol{\pi}, \psi)\, f(\boldsymbol{\pi}) \\
     &= \left( \prod_{g=1}^{G} f(\mathbf{d}_g, \boldsymbol{z}_g \mid \psi, \boldsymbol{\pi}) \right) Dir(\mathbf{b}) \\
     &= \left( \prod_{g=1}^{G} \left( \pi_1 f_1(\mathbf{d}_g \mid \psi) \right)^{z_{g1}} \left( \pi_2 f_2(\mathbf{d}_g \mid \psi) \right)^{z_{g2}} \left( \pi_3 f_3(\mathbf{d}_g \mid \psi) \right)^{z_{g3}} \right) \\
     &\quad \times \frac{\Gamma(\sum_{c=1}^{3} b_c)}{\prod_{c=1}^{3} \Gamma(b_c)} \prod_{c=1}^{3} \pi_c^{b_c - 1}.
   \end{aligned}
   \]

   Then the log complete-data likelihood function is:
   \[
   \begin{aligned}
   l(\psi \mid \mathbf{d}, \boldsymbol{z}, \boldsymbol{\pi})
     &= \sum_{g=1}^{G} \left( z_{g1} \log f_1(\mathbf{d}_g \mid \psi) + z_{g2} \log f_2(\mathbf{d}_g \mid \psi) + z_{g3} \log f_3(\mathbf{d}_g \mid \psi) \right) \\
     &\quad + \sum_{g=1}^{G} \left( z_{g1} \log \pi_1 + z_{g2} \log \pi_2 + z_{g3} \log \pi_3 \right) \\
     &\quad + \log \left( \frac{\Gamma(\sum_{c=1}^{3} b_c)}{\prod_{c=1}^{3} \Gamma(b_c)} \right) + \sum_{c=1}^{3} (b_c - 1) \log \pi_c.
   \end{aligned}
   \]

   The EM algorithm is used to estimate the parameters
   $\boldsymbol{\pi}$ and $\psi$. Since $\boldsymbol{z}$ is an unknown
   random vector, we integrate it out of the log complete-data likelihood
   function. Here, $\boldsymbol{z}_g = (z_{g1}, z_{g2}, z_{g3})$, and its
   conditional expectations are
   \[
   \begin{aligned}
   \zeta_{g1} &= E(z_{g1} \mid \mathbf{d}_g, \boldsymbol{\pi}, \psi) \\
              &= Pr(z_{g1} = 1 \mid \mathbf{d}_g, \boldsymbol{\pi}, \psi) \\
              &= \frac{\pi_1 f_1(\mathbf{d}_g \mid \psi)}{\pi_1 f_1(\mathbf{d}_g \mid \psi) + \pi_2 f_2(\mathbf{d}_g \mid \psi) + \pi_3 f_3(\mathbf{d}_g \mid \psi)} \\
   \zeta_{g2} &= \frac{\pi_2 f_2(\mathbf{d}_g \mid \psi)}{\pi_1 f_1(\mathbf{d}_g \mid \psi) + \pi_2 f_2(\mathbf{d}_g \mid \psi) + \pi_3 f_3(\mathbf{d}_g \mid \psi)} \\
   \zeta_{g3} &= \frac{\pi_3 f_3(\mathbf{d}_g \mid \psi)}{\pi_1 f_1(\mathbf{d}_g \mid \psi) + \pi_2 f_2(\mathbf{d}_g \mid \psi) + \pi_3 f_3(\mathbf{d}_g \mid \psi)}
   \end{aligned}
   \tag{2}
   \]

   E-step. Let
   $Q^{(t)}\left(\boldsymbol{\pi}, \psi \mid \mathbf{d}, \boldsymbol{z}^{(t)}, \boldsymbol{\pi}^{(t)}\right)$
   denote the expected log complete-data likelihood function at the t-th
   iteration of the EM algorithm. We have
   \[
   \begin{aligned}
   Q^{(t)} &= E_{\boldsymbol{z}} \left[ l\left(\psi \mid \mathbf{d}, \boldsymbol{z}, \boldsymbol{\pi}\right) \mid \boldsymbol{d}, \boldsymbol{z}^{(t)}, \boldsymbol{\pi}^{(t)} \right] \\
     &= \sum_{g=1}^{G} \left( \zeta_{g1}^{(t)} \log f_1(\mathbf{d}_g \mid \psi) + \zeta_{g2}^{(t)} \log f_2(\mathbf{d}_g \mid \psi) + \zeta_{g3}^{(t)} \log f_3(\mathbf{d}_g \mid \psi) \right) \\
     &\quad + \sum_{g=1}^{G} \left( \zeta_{g1}^{(t)} \log \pi_1 + \zeta_{g2}^{(t)} \log \pi_2 + \zeta_{g3}^{(t)} \log \pi_3 \right) \\
     &\quad + \log \left( \frac{\Gamma(\sum_{c=1}^{3} b_c)}{\prod_{c=1}^{3} \Gamma(b_c)} \right) + \sum_{c=1}^{3} (b_c - 1) \log \pi_c,
   \end{aligned}
   \]

   where
   \[
   \begin{aligned}
   \zeta_{gc}^{(t)} &= E\left( z_{gc} \mid \mathbf{d}_g, \boldsymbol{\pi}^{(t)}, \psi^{(t)} \right) \\
     &= \frac{\pi_c^{(t)} f_c(\mathbf{d}_g \mid \psi^{(t)})}{\pi_1^{(t)} f_1(\mathbf{d}_g \mid \psi^{(t)}) + \pi_2^{(t)} f_2(\mathbf{d}_g \mid \psi^{(t)}) + \pi_3^{(t)} f_3(\mathbf{d}_g \mid \psi^{(t)})}, \quad c = 1, 2, 3.
   \end{aligned}
   \tag{3}
   \]

   M-step. Maximize
   $Q^{(t)}\left(\boldsymbol{\pi}, \psi \mid \mathbf{d}, \boldsymbol{z}^{(t)}, \boldsymbol{\pi}^{(t)}\right)$
   with respect to $\boldsymbol{\pi}$ and $\psi$, and use the maximizing
   values as the updated estimates of the parameters $\boldsymbol{\pi}$
   and $\psi$.
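
   For intuition about the M-step, note that only the terms
   $\sum_{g} \zeta_{gc}^{(t)} \log \pi_c$ and $(b_c - 1) \log \pi_c$ of
   $Q^{(t)}$ involve $\boldsymbol{\pi}$. A standard Lagrange-multiplier
   argument under the constraint $\pi_1 + \pi_2 + \pi_3 = 1$ then gives
   $\pi_c = (\sum_{g} \zeta_{gc}^{(t)} + b_c - 1) / (G + \sum_{k} b_k - 3)$,
   provided every numerator is positive. The sketch below illustrates
   this textbook result only; it is not a description of how the authors
   update $\boldsymbol{\pi}$, and the function name and toy inputs are
   hypothetical.

```python
def update_pi(zeta, b):
    """Closed-form maximizer of the pi-dependent part of Q^(t).

    zeta : list of G rows, each [zeta_g1, zeta_g2, zeta_g3] summing to 1
    b    : Dirichlet parameters (b_1, b_2, b_3)
    """
    n_clusters = len(b)
    # Expected number of transcripts in each cluster.
    counts = [sum(row[c] for row in zeta) for c in range(n_clusters)]
    # The Dirichlet prior adds (b_c - 1) pseudo-counts to each cluster.
    numer = [counts[c] + b[c] - 1.0 for c in range(n_clusters)]
    if min(numer) <= 0.0:
        raise ValueError("closed form requires positive numerators")
    total = sum(numer)  # equals G + sum(b) - 3 when each row sums to 1
    return [v / total for v in numer]


# Toy posterior probabilities for G = 4 transcripts (made-up numbers).
toy_zeta = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.3, 0.5],
    [0.0, 0.1, 0.9],
]
new_pi = update_pi(toy_zeta, b=[2.0, 2.0, 2.0])
```

   With $b_c > 1$ the pseudo-counts keep every $\pi_c$ away from zero,
   which matches the stabilizing role that the text assigns to the
   Dirichlet prior.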

   To maximize
   $Q^{(t)}\left(\boldsymbol{\pi}, \psi \mid \mathbf{d}, \boldsymbol{z}^{(t)}, \boldsymbol{\pi}^{(t)}\right)$,
   we use the “L-BFGS-B” method developed by Byrd et al. (1995)
   [[42]16], which uses the first partial derivatives of $Q^{(t)}$ and
   allows box constraints; that is, each variable can be given a lower
   and/or upper bound.

Simulated annealing modification

   The EM algorithm may be trapped in a local maximum because its
   iterations never decrease the objective function. As introduced by
   Celeux and Govaert (1992) [[43]17], simulated annealing (SA) is widely
   used to help the EM algorithm escape from a local maximum by adding
   randomness through a stochastic step. Specifically, the conditional
   expectation in ([44]2) is modified in the SA algorithm as follows
   \[
   \tilde{\zeta}_{gc}^{(t)} = \frac{\left[ \pi_c^{(t)} f_c(\mathbf{d}_g \mid \psi^{(t)}) \right]^{1/m^{(t)}}}{\sum_{k=1}^{3} \left[ \pi_k^{(t)} f_k(\mathbf{d}_g \mid \psi^{(t)}) \right]^{1/m^{(t)}}}, \quad c = 1, 2, 3. \tag{4}
   \]

   where m is the temperature used to control the randomness. Usually, the
   temperature m starts at a relatively high value, since a larger m leads
   to more randomness. At iteration t, the temperature is updated by
   $m^{(t+1)} = r \times m^{(t)}$, where the cooling rate r controls the
   speed of reduction. As suggested in [[45]18, [46]19], we use
   $m^{(0)} = 2$ and $r = 0.9$.
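
   The tempering step and the cooling schedule can be sketched as
   follows. This is a minimal illustration under stated assumptions, not
   the authors' code: the function names and numeric inputs are
   hypothetical, and the cluster log-densities are assumed to be given.
   Raising $\pi_c f_c$ to the power $1/m$ is the same as dividing its
   logarithm by m, which is what the sketch does.

```python
import math


def tempered_probs(pi, log_f, m):
    """Tempered posterior probabilities at temperature m, as in (4).

    pi    : current mixture proportions, each assumed > 0
    log_f : log f_c(d_g | psi) for c = 1, 2, 3 (hypothetical inputs)
    m     : temperature; m = 1 recovers the ordinary E-step probabilities
    """
    # [pi_c * f_c]^(1/m) on the log scale is (log pi_c + log f_c) / m.
    log_w = [(math.log(p) + lf) / m for p, lf in zip(pi, log_f)]
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def temperatures(m0=2.0, r=0.9, n_iter=5):
    """First n_iter temperatures under m(t+1) = r * m(t)."""
    temps = [m0]
    for _ in range(n_iter - 1):
        temps.append(r * temps[-1])
    return temps


pi = [0.1, 0.1, 0.8]
log_f = [-10.0, -14.0, -13.0]
plain = tempered_probs(pi, log_f, m=1.0)
hot = tempered_probs(pi, log_f, m=2.0)
schedule = temperatures()
```

   At m = 2 the probabilities are flatter than at m = 1, so cluster
   assignments are less committed early in the run.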

   We denote by eLNNpairedCov the proposed method that uses the
   traditional EM algorithm to obtain parameter estimates, and by
   eLNNpairedCov.SEM the proposed method that uses the EM algorithm with
   the SA modification to obtain parameter estimates.

   We stop the expectation-maximization iterations based on a proportional
   change: the algorithm stops when the maximum absolute difference
   between the current and previous parameter estimates, relative to the
   absolute value of the previous estimates, is smaller than a small
   constant (e.g., $1.0 \times 10^{-3}$).
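
   One way to read this stopping rule is as a maximum relative change
   across parameters. The sketch below takes that reading as an
   assumption; the function name and example values are hypothetical,
   and previous estimates are assumed to be non-zero so that the ratio
   is defined.

```python
def has_converged(prev, curr, tol=1.0e-3):
    """Return True when the largest relative parameter change is below tol.

    prev, curr : parameter estimates from the previous and current
                 iteration, in the same order; prev entries must be non-zero
    """
    rel_change = [abs(c - p) / abs(p) for p, c in zip(prev, curr)]
    return max(rel_change) < tol


# Made-up estimates: the first pair barely moves, the second moves by 5%.
small_step = has_converged([0.2, 0.3, 0.5, 1.5], [0.2001, 0.3001, 0.4999, 1.5005])
large_step = has_converged([0.2, 0.3], [0.21, 0.3])
```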

   More details about the EM algorithm are shown in Supplementary Document
   [see Additional file [47]1].

A real data study

   We used the dataset [48]GSE24742 [[49]20], which can be downloaded from
   the Gene Expression Omnibus
   [[50]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE24742], to
   evaluate the performance of the proposed model-based clustering methods
   (denoted as eLNNpairedCov and eLNNpairedCov.SEM).

   The dataset is from a study that investigated gene expression before
   and after administering rituximab, a drug for treating anti-TNF
   resistant rheumatoid arthritis (RA). There are 12 subjects, each having
   2 samples (one sample before treatment and the other after
   treatment). Age and sex are also available. Expression levels of 54,675
   gene probes were measured for each of the 24 samples by using the
   Affymetrix Human Genome U133 Plus 2.0 array. The dataset has been
   preprocessed by the dataset contributor. We further kept only 43,505
   gene probes in the autosomal chromosomes (i.e., chromosomes 1 to 22).
   We then performed log2 transformation for gene expression levels. We
   next obtained the within-subject difference of the log2 transformed
   expression levels (log2 expression after-treatment minus log2
   expression before-treatment). By examining the histogram (Figure A1)
   [see Additional file [51]1] of the estimated standard deviations of
   log2 differences of within-subject gene expression for the 43,505 gene
   probes, we found a bimodal distribution. Based on Figure A1 [see
   Additional file [52]1], where the histogram of estimated standard
   deviations exhibits two modes, we choose to exclude gene probes with
   standard deviation
   [MATH: <mrow><mo><</mo><mn>1</mn></mrow> :MATH]
   corresponding to the first mode. It is a common practice to remove
   genes with low variation [[53]21–[54]23]. Finally, 23,948 gene probes
   kept in the down-stream analysis.
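
   The last two preprocessing steps (within-subject log2 difference and
   low-variation filtering) can be sketched as below. The function names
   and the array layout (probes in rows, subjects in columns, columns
   aligned by subject) are assumptions made for illustration, as is the
   use of the sample standard deviation (ddof = 1).

```python
import numpy as np


def within_subject_log2_diff(before, after):
    """log2(after) - log2(before) for (n_probes, n_subjects) arrays.

    Assumes strictly positive expression levels and that column j of
    both arrays belongs to the same subject.
    """
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    return np.log2(after) - np.log2(before)


def filter_low_variation(diff, sd_cutoff=1.0):
    """Drop probes whose across-subject standard deviation is < sd_cutoff."""
    sd = diff.std(axis=1, ddof=1)   # sample SD per probe (assumption)
    keep = sd >= sd_cutoff
    return diff[keep], keep


# Toy example: probe 0 is constant across subjects, probe 1 varies.
before = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]])
after = np.array([[2.0, 2.0, 2.0], [1.0, 4.0, 16.0]])
diff = within_subject_log2_diff(before, after)
filtered, keep = filter_low_variation(diff)
```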

A simulation study

   We performed a simulation study to compare the performance of the
   proposed methods eLNNpairedCov and eLNNpairedCov.SEM with the
   transcript-wise test limma and Li et al.’s [[55]8] method (denoted as
   eLNNpaired). eLNNpairedCov, eLNNpairedCov.SEM and limma adjust for
   covariate effects, while eLNNpaired does not. To make a fair comparison
   between eLNNpaired and the other methods, we first regressed out the
   covariate effects for each gene before applying eLNNpaired.

   The limma approach first performs an empirical-Bayes-based linear
   regression for each transcript. In this linear regression, the
   within-subject log2 difference of transcript expression is the outcome,
   and the intercept indicates whether the transcript is over-expressed
   (OE; intercept > 0), under-expressed (UE; intercept < 0), or
   non-differentially expressed (NE; intercept = 0), adjusting for
   potential confounding factors. A transcript is claimed as OE if its
   intercept estimate is positive and the corresponding FDR-adjusted
   p-value is $< 0.05$, where FDR stands for false discovery rate. A
   transcript is claimed as UE if its intercept estimate is negative and
   the corresponding FDR-adjusted p-value is $< 0.05$. All other
   transcripts are claimed as NE.
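
   The decision rule above can be sketched as follows. The sketch does
   not reproduce limma's empirical-Bayes moderation: it takes the
   intercept estimates and raw p-values as given. The function names are
   hypothetical, and the Benjamini-Hochberg procedure is assumed as the
   FDR adjustment.

```python
import numpy as np


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def classify_transcripts(intercepts, pvalues, fdr=0.05):
    """Label each transcript as 'OE', 'UE' or 'NE' from sign and FDR."""
    b0 = np.asarray(intercepts, dtype=float)
    significant = bh_adjust(pvalues) < fdr
    return np.where(significant & (b0 > 0), "OE",
                    np.where(significant & (b0 < 0), "UE", "NE"))


labels = classify_transcripts([0.8, -0.5, 0.3, -0.1],
                              [0.001, 0.01, 0.2, 0.9])
```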

   The parameter values ($\boldsymbol{\pi}$, $\psi$, and the proportion of
   women) in the simulation study are based on the estimates obtained via
   eLNNpairedCov.SEM from the analysis of the pre-processed real dataset
   [56]GSE24742 described in Subsection “A real data study”.

   In this simulation study, we considered two sets with different
   covariate coefficients for the differentially expressed gene clusters.
   In the first set (Set 1), the parameter values are the estimates
   obtained by the eLNNpairedCov.SEM method from the real dataset. That is,
   $\pi_1 = 0.00246$, $\pi_2 = 0.01470$, $\pi_3 = 0.98284$,
   $\alpha_1 = 3.53$, $\beta_1 = 3.45$, $k_1 = 0.26$,
   $\eta_{10} = 0.18$, $\eta_{11} = 0.00$, $\eta_{12} = -1.05$,
   $\alpha_2 = 3.53$, $\beta_2 = 3.45$, $k_2 = 0.26$,
   $\eta_{20} = 0.18$, $\eta_{21} = 0.00$, $\eta_{22} = -1.05$,
   $\alpha_3 = 2.86$, $\beta_3 = 2.20$, $k_3 = 0.72$,
   $\eta_{31} = -0.01$, $\eta_{32} = 0.00$.
   In the second set (Set 2), we set $\eta_{10} = \eta_{20} = 0.08$
   instead of 0.18. For each set, we considered two scenarios. In the
   first scenario (Scenario 1), the number of subjects is equal to 30. In
   the second scenario (Scenario 2), the number of subjects is equal to
   100.

   For each scenario, we generated 100 datasets. Each simulated dataset
   contains $G = 20{,}000$ gene transcripts. There are two covariates:
   standardized age (denoted as Age.s) and Sex. Age.s follows a normal
   distribution with mean 0 and standard deviation 1. Seventy-five percent
   (75%) of subjects are women.
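
   As an illustration of the covariate part of this design only, the
   sketch below draws Age.s and Sex for one simulated dataset and lists
   the four set/scenario combinations. The function name is hypothetical.
   Two assumptions are made: Sex is drawn as independent Bernoulli(0.75)
   indicators (the text does not say whether the 75% is exact or
   expected), and Sex is coded 1 for women and 0 for men.

```python
import numpy as np


def simulate_covariates(n_subjects, prop_women=0.75, seed=None):
    """Draw standardized age and sex for one simulated dataset.

    Assumptions: Sex ~ Bernoulli(prop_women), coded 1 = woman, 0 = man.
    """
    rng = np.random.default_rng(seed)
    age_s = rng.normal(loc=0.0, scale=1.0, size=n_subjects)
    sex = rng.binomial(1, prop_women, size=n_subjects)
    return age_s, sex


# Intercept used for the OE and UE clusters in each parameter set.
eta_intercept = {"Set 1": 0.18, "Set 2": 0.08}
# Number of subjects in each scenario.
n_subjects = {"Scenario 1": 30, "Scenario 2": 100}
# Four set/scenario combinations, each replicated 100 times in the study.
design = [(s, c) for s in eta_intercept for c in n_subjects]

age_s, sex = simulate_covariates(n_subjects["Scenario 1"], seed=1)
```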

Evaluation criteria

   Two agreement indices and two error rates are used to compare the
   predicted cluster membership and true cluster membership of all genes.
   The two agreement indices are accuracy (i.e., proportion of predicted
   cluster membership equal to the true cluster membership) and Jaccard
   index [[57]24]. For perfect agreement, these indices have a value of
   one. If an index takes a value close to zero, then the agreement
   between the true transcript cluster membership and the estimated
   transcript cluster membership is likely due to chance. The two error
   rates are false positive rate (FPR) and false negative rate (FNR). FPR
   is the percentage of detected DE transcripts among truly NE
   transcripts. FNR is the percentage of detected NE transcripts among
   truly DE transcripts. We also examined the user time and number of EM
   iterations for running each simulated dataset.
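
   The accuracy and the two error rates defined above can be computed as
   in the sketch below, where a transcript counts as DE if it is labelled
   OE or UE. The function name and the label strings are hypothetical.
   The Jaccard index is left out because this section does not spell out
   which variant is used.

```python
import numpy as np


def evaluate_membership(true_labels, pred_labels):
    """Accuracy, FPR and FNR for 'OE' / 'UE' / 'NE' cluster labels.

    Assumes at least one truly NE and one truly DE transcript.
    """
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    true_de = true_labels != "NE"
    pred_de = pred_labels != "NE"
    accuracy = float(np.mean(true_labels == pred_labels))
    # FPR: detected DE among truly NE transcripts.
    fpr = float(np.mean(pred_de[~true_de]))
    # FNR: detected NE among truly DE transcripts.
    fnr = float(np.mean(~pred_de[true_de]))
    return accuracy, fpr, fnr


accuracy, fpr, fnr = evaluate_membership(
    ["OE", "UE", "NE", "NE", "NE", "OE"],
    ["OE", "NE", "NE", "OE", "NE", "UE"],
)
```

   Note that a truly OE transcript predicted as UE lowers the accuracy
   but counts as neither a false positive nor a false negative under
   these definitions.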

Results

Results of the real data analysis

   For the real dataset, we adjusted for standardized age and sex in
   eLNNpairedCov, eLNNpairedCov.SEM, and limma. We standardized age so
   that it has mean zero and variance one. For each transcript, we also
   scaled its expression across subjects so that its variance is equal to
   one. For eLNNpaired, we first regressed out the effect of standardized
   age and sex for each transcript.

   The estimates of parameters in our model are listed in Table [58]1.
   Note that the proposed eLNNpairedCov and eLNNpairedCov.SEM have the
   same estimates for the parameters in these three clusters, except for
   the proportions of three clusters. The proportions of OE and UE
   estimated by eLNNpairedCov method are 0.0376% and 0.346%, respectively.
   The proportions of OE and UE estimated by eLNNpairedCov.SEM method are
   0.246% and 1.47%, respectively.

Table 1.

   Parameter estimates of OE, UE and NE clusters from eLNNpairedCov and
   eLNNpairedCov.SEM
   OE                        UE                        NE
   $\beta_1$     3.445543    $\beta_2$     3.445543    $\beta_3$     3.445543
   $k_1$         0.264565    $k_2$         0.264565    $k_3$         0.264565
   $\eta_{10}$   0.176007    $\eta_{20}$   0.176007
   $\eta_{11}$   −0.000609   $\eta_{21}$   −0.000609   $\eta_{31}$   −0.013796
   $\eta_{12}$   −1.051257   $\eta_{22}$   −1.051257   $\eta_{32}$   −0.000017

   For the OE cluster, $\exp(\eta_{10}) = \exp(0.176007) = 1.192$ can be
   interpreted as the expected log2 difference for a male subject
   ($\mathrm{sex} = 0$) whose age is equal to the mean age
   ($\mathrm{age} = 0$, where age is mean-centered);
   $\eta_{11} = -0.000609$ indicates that a one-unit increase in age leads
   to an $\exp(-0.000609) = 0.999$ fold change in the expected log2
   difference, while $\eta_{12} = -1.051257$ indicates that there is an
   $\exp(-1.051257) = 0.349$ fold change between male subjects and female
   subjects in the expected log2 difference if they are at the same age.
   For the UE cluster, $\eta_{20} = 0.176007$ indicates that the expected
   log2 difference for a male subject ($\mathrm{sex} = 0$) whose age is
   equal to the mean age ($\mathrm{age} = 0$, where age is mean-centered)
   is $-\exp(0.176007) = -1.192$; $\eta_{21} = -0.000609$ indicates that a
   one-unit increase in age leads to an $\exp(-0.000609) = 0.999$ fold
   change in the expected log2 difference, while $\eta_{22} = -1.051257$
   indicates that there is an $\exp(-1.051257) = 0.349$ fold change
   between male subjects and female subjects in the expected log2
   difference if they are at the same age. For the NE cluster,
   $\eta_{31} = -0.013796$ indicates that a one-unit increase in age leads
   to a 0.013796 decrease in the expected log2 difference, and
   $\eta_{32} = -0.000017$ indicates that there is a 0.000017 decrease
   from female subjects to male subjects in the expected log2 difference
   if they are at the same age.
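
   The arithmetic behind these interpretations can be checked with a few
   lines of code. The sketch assumes, as the exponentiated coefficients
   above suggest, that the expected log2 difference is the exponential of
   the linear predictor for the OE cluster and its negative for the UE
   cluster; the function name and the sign argument are hypothetical.

```python
import math

# Estimates for the OE cluster from Table 1 (the UE cluster shares them).
eta = {"intercept": 0.176007, "age_s": -0.000609, "sex": -1.051257}

baseline = math.exp(eta["intercept"])   # about 1.192
age_factor = math.exp(eta["age_s"])     # about 0.999
sex_factor = math.exp(eta["sex"])       # about 0.349


def expected_log2_diff(age_s, sex, eta, sign=1.0):
    """Assumed cluster mean: sign * exp(eta0 + eta1 * age_s + eta2 * sex).

    sign = +1 for the OE cluster, sign = -1 for the UE cluster.
    """
    linear = eta["intercept"] + eta["age_s"] * age_s + eta["sex"] * sex
    return sign * math.exp(linear)


oe_reference = expected_log2_diff(0.0, 0, eta)             # OE, age = 0, sex = 0
ue_reference = expected_log2_diff(0.0, 0, eta, sign=-1.0)  # UE, age = 0, sex = 0
```

   Because the covariates enter through the exponent, their effects are
   multiplicative: changing sex from 0 to 1 multiplies the expected log2
   difference by sex_factor at any fixed age.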

   The number of differentially expressed genes detected by each method is
   listed in Table [60]2.

Table 2.

   Number of differentially expressed genes detected by limma, eLNNpaired,
   eLNNpairedCov and eLNNpairedCov.SEM in [61]GSE24742
      limma eLNNpaired eLNNpairedCov eLNNpairedCov.SEM
   OE 0     0          55            59
   UE 6     0          355           352

   The limma method detected 6 under-expressed gene transcripts
   (Fig. [63]1 and Table S1), while eLNNpaired did not find any positive
   signals (i.e., $\hat{\pi}_3 = 1$). The proposed methods eLNNpairedCov
   and eLNNpairedCov.SEM detected 55 OE transcripts (Table S2) and 59 OE
   transcripts (Table S3), respectively (upper two panels of Fig. [64]2),
   and 355 UE transcripts (Table S4) and 352 UE transcripts (Table S5),
   respectively (lower two panels of Fig. [65]2). The 6 UE transcripts
   detected by limma are also selected as UE transcripts by eLNNpairedCov
   and eLNNpairedCov.SEM. Note that the 55 OE genes detected by
   eLNNpairedCov are also detected by eLNNpairedCov.SEM, and the 352 UE
   genes detected by eLNNpairedCov.SEM are also detected by eLNNpairedCov.

Fig. 1.

   Parallel boxplots of log2 within-subject difference of gene expression
   for 6 UE transcripts detected by limma for pre-processed [67]GSE24742
   dataset. Red horizontal line indicates log2 difference equal to zero

Fig. 2.

   Parallel boxplots of log2 within-subject difference of gene expression
   for differentially expressed transcripts detected by eLNNpairedCov and
   eLNNpairedCov.SEM for pre-processed [70]GSE24742 dataset. Upper two
   panels: 55 OE transcripts and 59 OE transcripts, respectively; Lower
   two panels: 355 UE transcripts and 352 UE transcripts, respectively.
   Red horizontal lines indicate log2 difference equal to zero

   It is reassuring that several genes corresponding to the DE transcripts
   identified by eLNNpairedCov and eLNNpairedCov.SEM have been associated
   with rheumatoid arthritis (RA) in the literature. For example, Humby et
   al. (2019) [[71]25] reported that the genes ZNF365 (OE), IL36RN (OE),
   MRVI1-AS1 (OE), WFDC6 (UE), and UBE2H (UE) are associated with RA.

   We performed pathway enrichment analysis using IPA (QIAGEN Inc.,
   [72]https://www.qiagenbioinformatics.com/products/ingenuitypathway-analysis)
   for 352 UE and 55 OE genes identified by eLNNpairedCov.SEM. The top
   enriched canonical pathways are shown in Tables [73]3 and [74]4.
   Evidence in the literature shows that these pathways are relevant to
   RA. The S100 protein family plays an important role in rheumatoid
   arthritis ([[75]26]). The literature consistently shows a crucial role
   of the PD-1/PD-L pathway in the pathogenesis of rheumatic diseases
   ([[76]27, [77]28]). It has been shown that RA can lead to lung tissue
   damage, resulting in pulmonary fibrosis ([[78]29]). Macrophages are key
   players in the pathogenesis of autoimmune diseases, such as RA
   ([[79]30]). RA and osteoarthritis (OA) are two common forms of
   arthritis with different pathogenesis ([[80]31]). It is interesting to
   see that the Osteoarthritis Pathway is significantly enriched for UE
   genes. This is consistent with the literature, which reports that
   similar focal and systemic alterations exist in RA and OA [[81]32].

Table 3.

   Top canonical pathways for 352 UE genes by eLNNpairedCov.SEM
   Name                                             p-value
   S100 Family Signaling Pathway                    2.97E-06
   PD-1, PD-L1 cancer immunotherapy pathway         7.54E-05
   Pulmonary Fibrosis Idiopathic Signaling pathway  3.45E-04
   Phagosome Formation                              7.56E-04
   Osteoarthritis Pathway                           1.04E-03

Table 4.

   Top canonical pathways for 55 OE genes by eLNNpairedCov.SEM
   Name                                                                      p-value
   Ribonucleotide Reductase Signaling Pathway                                5.34E-03
   Leukocyte Extravasation Signaling                                         7.57E-03
   Cell Cycle: G1/S Checkpoint Regulation                                    8.85E-03
   Tetrahydrofolate Salvage from 5,10-methenyltetrahydrofolate               1.04E-02
   Role of Osteoblasts, Osteoclasts and Chondrocytes in Rheumatoid Arthritis 1.19E-02

   Ribonucleotide Reductase (RNR) is the enzyme providing the precursors
   needed for both synthesis and repair of DNA, and could be a potential
   drug target for RA ([[84]33, [85]34]). Leukocyte extravasation through
   the endothelial barrier is important in the pathogenesis of RA
   ([[86]35]). It has been shown that the limb bud and heart development
   (LBH) gene is a key dysregulated gene in RA and other autoimmune
   diseases, and there is some evidence that LBH could modulate the cell
   cycle [[87]36]. Osteoblasts, osteoclasts and chondrocytes play
   important roles in rheumatoid arthritis ([[88]37–[89]39]). We did not
   find literature linking Tetrahydrofolate Salvage from
   5,10-methenyltetrahydrofolate to RA, indicating that this enrichment
   might be novel.

Results of the simulation study

   For Scenario 1 ($n = 30$), the jittered scatter plots of the
   performance indices versus methods are shown in Fig. [90]3 (Set 1) and
   Fig. [91]5 (Set 2), and the jittered scatter plots of the difference of
   the performance indices versus methods are shown in Fig. [92]4 (Set 1)
   and Fig. [93]6 (Set 2).

Fig. 3.

   Jittered scatter plots of performance indices versus method for Set 1,
   Scenario 1 (number of pairs $= 30$). Red solid horizontal lines indicate
   the median performance indices of eLNNpairedCov.SEM

Fig. 5.

   Jittered scatter plots of performance indices versus method for Set 2,
   Scenario 1 (number of pairs $= 30$). Red solid horizontal lines indicate
   the median performance indices of eLNNpairedCov.SEM

Fig. 4.

   Jittered scatter plots of difference of performance indices versus
   method for Set 1, Scenario 1 (number of pairs $= 30$). Red solid
   horizontal lines indicate y-axis equal to zero

Fig. 6.

   Jittered scatter plots of difference of performance indices versus
   method for Set 2, Scenario 1 (number of pairs $= 30$). Red solid
   horizontal lines indicate y-axis equal to zero

   The differences of performance indices are between eLNNpairedCov.SEM
   and the other three methods (limma, eLNNpaired and eLNNpairedCov). A
   positive difference indicates that the performance index of the other
   method is larger than that of eLNNpairedCov.SEM. A negative difference
   indicates that the performance index of the other method is smaller
   than that of eLNNpairedCov.SEM.

   The upper panels of Figs. [102]3, [103]4, [104]5 and [105]6 show that
   both eLNNpairedCov and eLNNpairedCov.SEM have higher agreement indices
   (Jaccard and accuracy) than limma, which in turn has higher agreement
   indices than eLNNpaired.

   The middle panels of Figs. 3-6 show that the proposed eLNNpairedCov and
   eLNNpairedCov.SEM methods have similar performance. They have lower FPR
   than limma, while eLNNpaired has an exceedingly low FPR (close to 0).
   The middle panels also show that eLNNpairedCov and eLNNpairedCov.SEM
   have smaller FNR than limma, while eLNNpaired has an exceedingly high
   FNR (close to 1). The extreme values in FPR and FNR of eLNNpaired can
   be attributed to the fact that it did not detect any differentially
   expressed genes in this case.

   Additionally, Figs. [106]3, [107]4, [108]5 and [109]6 show that,
   compared with the performances of these methods in Set 1
   ($\eta_{10} = \eta_{20} = 0.18$), those in Set 2
   ($\eta_{10} = \eta_{20} = 0.08$) have lower agreement indices and higher
   error rates, except for eLNNpaired, which fails to detect any
   differentially expressed genes in both Set 1 and Set 2.

   The bottom panels of Figs. [110]3 and [111]5 show that limma runs very
   fast, while eLNNpaired, eLNNpairedCov and eLNNpairedCov.SEM run in
   reasonable time (i.e., less than 30 s per dataset with $G = 20{,}000$
   genes and $n = 30$ subjects). On average, eLNNpairedCov and
   eLNNpairedCov.SEM take slightly more time than eLNNpaired. The bottom
   panels of Figs. [112]3 and [113]5 also show that eLNNpaired uses fewer
   than 5 EM iterations, while eLNNpairedCov and eLNNpairedCov.SEM tend to
   use more EM iterations. In particular, eLNNpairedCov.SEM uses 10 EM
   iterations, which is the maximum number of iterations we set to save
   computing time. Note that limma does not use the EM algorithm to obtain
   parameter estimates; its EM iteration number is set to one.

   The simulation results for Scenario 2 ($n = 100$) are shown in Figures
   A5-A8 [see Additional file [114]1]. They show patterns similar to those
   for Scenario 1 ($n = 30$), except that both eLNNpairedCov and
   eLNNpairedCov.SEM have smaller FPR, close to 0. Note that
   eLNNpairedCov, eLNNpairedCov.SEM and limma have small FNR (close to 0),
   while eLNNpaired still has a very large FNR (close to 1).

Discussion and conclusion

   In this article, we proposed a novel model-based clustering approach to
   detect differentially expressed transcripts between samples before
   treatment and samples after treatment, with the capacity to adjust for
   potential confounding factors. To the best of our knowledge, no
   existing model-based gene clustering method has the capacity to adjust
   for covariates.

   The proposed approach is different from transcript-wise test followed
   by multiplicity adjustment in that it does not involve hypothesis
   testing. Hence, no multiplicity adjustment is needed. The simulation
   study showed that if the difference of gene expression between samples
   before treatment and samples after treatment follows the mixture of
   hierarchical models in Subsection “A mixture of hierarchical models”,
   then the proposed method can outperform limma, which is a fast and
   powerful transcript-wise test method. The real data analysis also
   showed the proposed method eLNNpairedCov can detect more differentially
   expressed gene transcripts, which include the transcripts detected by
   limma.

   Although we classify genes into three distinct clusters, the
   transitions between these clusters could be smooth. This would be
   reflected by a gene whose posterior probability is large for two of
   the three clusters, e.g., 0.49 for cluster 1, 0.01 for cluster 2, and
   0.5 for cluster 3. On the other hand, expression changes could be split
   up into more than 3 clusters, e.g., groups behaving differently. In
   this article, we are only interested in identifying three clusters of
   genes: over-expressed in condition 1, under-expressed in condition 1,
   and non-differentially expressed.
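
   The borderline case described above can be flagged with a sketch like
   the following. It assumes that each gene is assigned to the cluster
   with the largest posterior probability; the function name and the
   ambiguity_margin threshold are hypothetical and chosen only for
   illustration.

```python
import numpy as np


def assign_clusters(posterior, ambiguity_margin=0.1):
    """Assign each gene to its most probable cluster and flag near-ties.

    posterior: (n_genes, 3) array whose rows sum to one.
    A gene is flagged as ambiguous when its two largest posterior
    probabilities differ by less than ambiguity_margin (assumption).
    """
    posterior = np.asarray(posterior, dtype=float)
    labels = posterior.argmax(axis=1)
    top_two = np.sort(posterior, axis=1)[:, -2:]
    ambiguous = (top_two[:, 1] - top_two[:, 0]) < ambiguity_margin
    return labels, ambiguous


labels, ambiguous = assign_clusters([[0.49, 0.01, 0.50],
                                     [0.98, 0.01, 0.01]])
```

   Here labels holds zero-based cluster indices, so the first gene (the
   example from the text) is assigned to cluster 3 but flagged as
   ambiguous.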

   There are other model-based clustering methods in the literature, such
   as [[115]40]. However, they were not designed to detect differentially
   expressed genes. For example, we can set the number K of clusters to 3
   for their model. However, there are no constraints requiring the
   intercepts for the three clusters to be positive, negative, and zero.
   That is, the three clusters identified might not correspond to
   over-expressed, under-expressed, and non-differentially expressed
   genes.

   It is well known in the literature that the EM algorithm might get
   stuck at a locally optimal solution. In this article, we used EM with
   SA-modification to help escape from locally optimal solutions. In the
   future, we plan to try a hybrid of the DPSO (Discrete Particle Swarm
   Optimization) algorithm and the EM approach to improve the global
   search performance [[116]41].

   In our models, the three gene groups are allowed to have different
   covariate coefficients. In the future, we could test whether these
   coefficients are the same. If there is no significant difference, we
   could use a model assuming equal coefficients.

   RNAseq and single-cell RNAseq are cutting-edge tools to investigate
   molecular mechanisms of complex human diseases. However, it is quite
   challenging to analyze these count data with inflated zero counts. In
   the future, we will evaluate whether eLNNpairedCov can be used to
   analyze single-cell RNAseq data by first transforming counts to a
   continuous scale (e.g., via VOOM [[117]12] or countTransformers
   [[118]13]) and then applying eLNNpairedCov to the transformed data.

   We implemented the proposed methods in an R package, eLNNpairedCov,
   which will be freely available to researchers.

Supplementary Information

   Additional file 1. Supplementary Document.
   [119]12859_2023_5556_MOESM1_ESM.pdf (468.4KB, pdf)

   Additional file 2: Table S1. Gene list of 6 UE transcripts detected by
   limma.
   [120]12859_2023_5556_MOESM2_ESM.xlsx (10.8KB, xlsx)

   Additional file 3: Table S2. Gene list of 55 OE transcripts detected by
   eLNNpairedCov.
   [121]12859_2023_5556_MOESM3_ESM.xlsx (12.5KB, xlsx)

   Additional file 4: Table S3. Gene list of 59 OE transcripts detected by
   eLNNpairedCov.SEM.
   [122]12859_2023_5556_MOESM4_ESM.xlsx (12.7KB, xlsx)

   Additional file 5: Table S4. Gene list of 355 UE transcripts detected
   by eLNNpairedCov.
   [123]12859_2023_5556_MOESM5_ESM.xlsx (23.9KB, xlsx)

   Additional file 6: Table S5. Gene list of 352 UE transcripts detected
   by eLNNpairedCov.SEM.
   [124]12859_2023_5556_MOESM6_ESM.xlsx (23.8KB, xlsx)

Acknowledgements