Abstract

   Single-cell multiomics technologies, where the transcriptomic and
   epigenomic profiles are simultaneously measured in the same set of
   single cells, pose significant challenges for effective integrative
   analysis. Here, we propose an unsupervised generative model, iPoLNG,
   for the effective and scalable integration of single-cell multiomics
   data. iPoLNG reconstructs low-dimensional representations of the cells
   and features using computationally efficient stochastic variational
   inference by modelling the discrete counts in single-cell multiomics
   data with latent factors. The low-dimensional representation of cells
   enables the identification of distinct cell types, and the feature by
   factor loading matrices help characterize cell-type specific markers
   and provide rich biological insights on the functional pathway
   enrichment analysis. iPoLNG is also able to handle the setting of
   partial information where certain modality of the cells is missing.
   Taking advantage of GPU and probabilistic programming, iPoLNG is
   scalable to large datasets and it takes less than 15 min to implement
   on datasets with 20,000 cells.

   Keywords: integrative analysis, single-cell multi-omics data,
   probabilistic non-negative matrix factorization, generative model,
   unsupervised learning, stochastic variational inference

1 Introduction

   With the rapid development of single-cell technologies, the abundant
   biological information in the single cell is collected at unprecedented
   resolution. More recently, sequencing methods enable the simultaneous
   measurement of epigenome and transcriptome from a common set of single
   cells. For example, sci-CAR ([28]Cao et al., 2018) jointly profiles
   chromatin accessibility and mRNA (CAR) in each of thousands of single
   cells; SNARE-seq ([29]Chen et al., 2019b), SHARE-seq ([30]Ma et al.,
   2020), Paired-seq ([31]Zhu et al., 2019) can measure chromatin
   accessibility and gene expression in the same single cell; Paired-Tag
   ([32]Zhu et al., 2021) is an ultra-high-throughput method for joint
   profiling of histone modifications and transcriptome in single cells to
   produce cell-type-resolved maps of chromatin state and transcriptome in
   complex tissues.

   The single-cell multiomics datasets generated by these technologies
   pose challenges for effective integrative analysis due to the
   characteristics of the datasets. First, the single-cell data is
   high-dimensional yet very sparse, and high technical variation is
   present in single-cell datasets. Second, the level of noise in
   chromatin accessibility data or histone modification data is usually
   higher than gene expression data in single-cell multiomics datasets,
   which suggests that different data modalities cannot be simply treated
   the same.

   Computational tools for the integrative analysis of single-cell assays
   are essential to provide more comprehensive biological insights at the
   cellular level. Integration problems in single-cell biology can be
   divided into those associated with the integration of unmatched data
   (that is, different modalities profiled from different cells) or
   matched (that is, different modalities profiled from the same cell)
   data ([33]Miao et al., 2021). A few methods have been developed for the
   integrative analysis of unmatched data ([34]Duren et al., 2018;
   [35]Zeng et al., 2019; [36]Cao et al., 2020; [37]Lin et al., 2020;
   [38]Wangwu et al., 2021; [39]Zeng et al., 2021; [40]Zeng and Lin, 2021;
   [41]Cao et al., 2022; [42]Demetci et al., 2022), which are not
   applicable to matched data. Some matched data integration methods
   ([43]Kim et al., 2020; [44]Wang et al., 2020; [45]Gayoso et al., 2021)
   are designed for technologies that jointly profile transcriptomic and
   surface protein data such as CITE-seq ([46]Stoeckius et al., 2017) and
   REAP-seq ([47]Peterson et al., 2017). In this study, we mainly focus on
   single-cell multiomics technologies simultaneously measuring
   transcriptomic and epigenomic profiles in the same individual cells.
   Unsupervised methods have been developed for this type of data,
   including Multi-Omics Factor Analysis (MOFA+) ([48]Argelaguet et al.,
   2018; [49]Argelaguet et al. 2020), single-cell Aggregation and
   Integration (scAI) ([50]Jin et al., 2020) and jointly semi-orthogonal
   non-negative matrix factorization (JSNMF) ([51]Ma et al., 2022). Both
   MOFA+ and scAI infers a low-dimensional representation of the data
   using a small number of latent factors that are expected to capture the
   heterogeneous cellular variability. The key difference between MOFA+
   and scAI is that MOFA+ is based on the Bayesian Group Factor Analysis
   framework, while scAI is based on non-negative matrix factorization.
   JSNMF assumes different latent variables for the two molecular
   modalities, and integrates the information of transcriptomic and
   epigenomic data with consensus graph fusion.

   In this paper, we propose an unsupervised generative model, iPoLNG, for
   the effective and scalable integration of single-cell multiomics data,
   where transcriptomic and epigenomic (chromatin accessibility or histone
   modifications) data were obtained from the same cell. iPoLNG
   reconstructs low-dimensional representations of the cells and features
   using computationally efficient stochastic variational inference by
   modelling the discrete counts in single-cell multiomics data with
   latent factors. The hyperparameters introduced to tackle the difference
   in the levels of noise across different data modalities can be
   estimated automatically through a heuristic procedure. By applying
   iPoLNG to real datasets, we demonstrate that the low-dimensional
   representation of cells leads to improved clustering performance, and
   the feature by factor loading matrices help characterize cell-type
   specific markers and provide rich and consistent biological insights on
   the functional pathway enrichment analysis. iPoLNG is also able to
   handle the setting of partial information where certain modality of the
   cells is missing. We also illustrate the effectiveness of our model in
   the simulation study. Taking advantage of GPU and probabilistic
   programming, iPoLNG is scalable to large datasets and it takes less
   than 15 min to implement on datasets with 20,000 cells.

2 Materials and methods

2.1 PoLNG for one data modality

   We first introduce some notations. Let
   [MATH: <mi mathvariant="bold-italic">W</mi><mo>∈</mo><msub><mrow><mi
   mathvariant="double-struck">N</mi></mrow><mrow><mi>I</mi><mo>×</mo><mi>
   J</mi></mrow></msub> :MATH]
   denote the cell by feature count data for one single-cell data
   modality, I the number of cells, J the number of features,
   [MATH: <msup><mrow><mi
   mathvariant="double-struck">R</mi></mrow><mrow><mo>*</mo></mrow></msup>
   :MATH]
   the notation for non-negative real numbers and K the number of latent
   factors, which is much smaller than I or J.

2.1.1 Model formulation

   The basic idea of the PoLNG model is to model the data matrix W as
   random variables sampled from Poisson distributions, the parameters of
   which are determined by two low-rank non-negative matrices
   [MATH: <mi mathvariant="bold-italic">L</mi><mo>∈</mo><msubsup><mrow><mi
   mathvariant="double-struck">R</mi></mrow><mrow><mi>I</mi><mo>×</mo><mi>
   K</mi></mrow><mrow><mo>*</mo></mrow></msubsup> :MATH]
   sampled from Gamma distributions and
   [MATH: <mi mathvariant="bold">Θ</mi><mo>∈</mo><msubsup><mrow><mi
   mathvariant="double-struck">R</mi></mrow><mrow><mi>K</mi><mo>×</mo><mi>
   J</mi></mrow><mrow><mo>*</mo></mrow></msubsup> :MATH]
   . L can be viewed as the low-dimensional representation of the cells,
   while Θ can be viewed as the loading matrix for the features. More
   specifically, the formulation of the model is proposed as follows:
   [MATH: <mtable
   class="gathered"><mtr><mtd><msub><mrow><mi>l</mi></mrow><mrow><mi>i</mi
   ><mo>,</mo><mi>k</mi></mrow></msub><mo>∼</mo><mi
   mathvariant="normal">G</mi><mi mathvariant="normal">a</mi><mi
   mathvariant="normal">m</mi><mi mathvariant="normal">m</mi><mi
   mathvariant="normal">a</mi><mfenced open="("
   close=")"><mrow><msub><mrow><mi>α</mi></mrow><mrow><mi>i</mi><mo>,</mo>
   <mi>k</mi></mrow></msub><mo>,</mo><msub><mrow><mi>β</mi></mrow><mrow><m
   i>i</mi><mo>,</mo><mi>k</mi></mrow></msub></mrow></mfenced><mo>,</mo></
   mtd></mtr><mtr><mtd><msub><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mrow><mi>k</mi><mo>,</mo><mo>⋅<
   /mo></mrow></msub><mo>=</mo><mi>σ</mi><mfenced open="("
   close=")"><mrow><msub><mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mo>~</mo></mover></mrow><mrow><
   mi>k</mi><mo>,</mo><mo>⋅</mo></mrow></msub></mrow></mfenced><mo>,</mo><
   msub><mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mo>~</mo></mover></mrow><mrow><
   mi>k</mi><mo>,</mo><mo>⋅</mo></mrow></msub><mo>∼</mo><mi
   mathvariant="normal">L</mi><mi mathvariant="normal">o</mi><mi
   mathvariant="normal">g</mi><mi mathvariant="normal">i</mi><mi
   mathvariant="normal">t</mi><mo>−</mo><mi mathvariant="normal">N</mi><mi
   mathvariant="normal">o</mi><mi mathvariant="normal">r</mi><mi
   mathvariant="normal">m</mi><mi mathvariant="normal">a</mi><mi
   mathvariant="normal">l</mi><mfenced open="("
   close=")"><mrow><msub><mrow><mi
   mathvariant="bold-italic">μ</mi></mrow><mrow><mi>k</mi></mrow></msub><m
   o>,</mo><msub><mrow><mi
   mathvariant="bold">Σ</mi></mrow><mrow><mi>k</mi></mrow></msub></mrow></
   mfenced><mo>,</mo></mtd></mtr><mtr><mtd><msub><mrow><mi>w</mi></mrow><m
   row><mi>i</mi><mo>,</mo><mi>j</mi></mrow></msub><mo>∼</mo><mi
   mathvariant="normal">P</mi><mi mathvariant="normal">o</mi><mi
   mathvariant="normal">i</mi><mi mathvariant="normal">s</mi><mi
   mathvariant="normal">s</mi><mi mathvariant="normal">o</mi><mi
   mathvariant="normal">n</mi><mfenced open="("
   close=")"><mrow><msub><mrow><mi>s</mi></mrow><mrow><mi>i</mi></mrow></m
   sub><munderover accentunder="false"
   accent="true"><mrow><mo>∑</mo></mrow><mrow><mi>k</mi><mo>=</mo><mn>1</m
   n></mrow><mrow><mi>K</mi></mrow></munderover><msub><mrow><mi>l</mi></mr
   ow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></mrow></msub><msub><mrow><mi>θ<
   /mi></mrow><mrow><mi>k</mi><mo>,</mo><mi>j</mi></mrow></msub></mrow></m
   fenced><mo>,</mo></mtd></mtr></mtable> :MATH]
   (1)

   where σ(⋅) is the softmax function, the lth element of which is given
   by
   [MATH:
   <msub><mrow><mi>σ</mi></mrow><mrow><mi>l</mi></mrow></msub><mfenced
   open="(" close=")"><mrow><msub><mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mo>~</mo></mover></mrow><mrow><
   mi>k</mi><mo>,</mo><mo>⋅</mo></mrow></msub></mrow></mfenced><mo>=</mo><
   mfrac><mrow><msup><mrow><mi>e</mi></mrow><mrow><msub><mrow><mover
   accent="true"><mrow><mi>θ</mi></mrow><mo>~</mo></mover></mrow><mrow><mi
   >k</mi><mo>,</mo><mi>l</mi></mrow></msub></mrow></msup></mrow><mrow><ms
   ubsup><mrow><mo movablelimits="false"
   form="prefix">∑</mo></mrow><mrow><mi>j</mi><mo>=</mo><mn>1</mn></mrow><
   mrow><mi>J</mi></mrow></msubsup><msup><mrow><mi>e</mi></mrow><mrow><msu
   b><mrow><mover
   accent="true"><mrow><mi>θ</mi></mrow><mo>~</mo></mover></mrow><mrow><mi
   >k</mi><mo>,</mo><mi>j</mi></mrow></msub></mrow></msup></mrow></mfrac><
   mo>,</mo> :MATH]
   (2)

   μ [k ]is a vector of length J serving as the mean of the Logit-Normal
   distribution, Σ [k ]is a J by J diagonal matrix serving as the
   covariance matrix of the Logit-Normal distribution, and s [i ]is the
   scaling factor to take into account the sequencing depth for the ith
   cell.

   The PoLNG model is designed to facilitate the downstream analysis of
   single-cell data. In general, each column of L represents a latent
   factor that can disentangle the heterogeneous cellular information,
   while each row of Θ represents a latent factor for features. Since Θ is
   constrained to have row sum equal to 1, we also impose a soft
   normalization on L by introducing the scaling factor s [i ].

   We further illustrate the choice of s [i ]. Utilizing the simplex
   constraint for each row of Θ, we have
   [MATH: <mi mathvariant="double-struck">E</mi><mfenced open="("
   close=")"><mrow><munderover accentunder="false"
   accent="true"><mrow><mo>∑</mo></mrow><mrow><mi>j</mi><mo>=</mo><mn>1</m
   n></mrow><mrow><mi>J</mi></mrow></munderover><msub><mrow><mi>w</mi></mr
   ow><mrow><mi>i</mi><mo>,</mo><mi>j</mi></mrow></msub><mo
   stretchy="false">|</mo><msub><mrow><mi
   mathvariant="bold-italic">l</mi></mrow><mrow><mi>i</mi><mo>,</mo><mo>⋅<
   /mo></mrow></msub></mrow></mfenced><mo>=</mo><munderover
   accentunder="false"
   accent="true"><mrow><mo>∑</mo></mrow><mrow><mi>j</mi><mo>=</mo><mn>1</m
   n></mrow><mrow><mi>J</mi></mrow></munderover><mi
   mathvariant="double-struck">E</mi><mfenced open="("
   close=")"><mrow><msub><mrow><mi>w</mi></mrow><mrow><mi>i</mi><mo>,</mo>
   <mi>j</mi></mrow></msub><mo stretchy="false">|</mo><msub><mrow><mi
   mathvariant="bold-italic">l</mi></mrow><mrow><mi>i</mi><mo>,</mo><mo>⋅<
   /mo></mrow></msub></mrow></mfenced><mo>=</mo><munderover
   accentunder="false"
   accent="true"><mrow><mo>∑</mo></mrow><mrow><mi>j</mi><mo>=</mo><mn>1</m
   n></mrow><mrow><mi>J</mi></mrow></munderover><msub><mrow><mi>s</mi></mr
   ow><mrow><mi>i</mi></mrow></msub><munderover accentunder="false"
   accent="true"><mrow><mo>∑</mo></mrow><mrow><mi>k</mi><mo>=</mo><mn>1</m
   n></mrow><mrow><mi>K</mi></mrow></munderover><msub><mrow><mi>l</mi></mr
   ow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></mrow></msub><msub><mrow><mi>θ<
   /mi></mrow><mrow><mi>k</mi><mo>,</mo><mi>j</mi></mrow></msub><mo>=</mo>
   <msub><mrow><mi>s</mi></mrow><mrow><mi>i</mi></mrow></msub><munderover
   accentunder="false"
   accent="true"><mrow><mo>∑</mo></mrow><mrow><mi>k</mi><mo>=</mo><mn>1</m
   n></mrow><mrow><mi>K</mi></mrow></munderover><msub><mrow><mi>l</mi></mr
   ow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></mrow></msub><munderover
   accentunder="false"
   accent="true"><mrow><mo>∑</mo></mrow><mrow><mi>j</mi><mo>=</mo><mn>1</m
   n></mrow><mrow><mi>J</mi></mrow></munderover><msub><mrow><mi>θ</mi></mr
   ow><mrow><mi>k</mi><mo>,</mo><mi>j</mi></mrow></msub><mo>=</mo><msub><m
   row><mi>s</mi></mrow><mrow><mi>i</mi></mrow></msub><munderover
   accentunder="false"
   accent="true"><mrow><mo>∑</mo></mrow><mrow><mi>k</mi><mo>=</mo><mn>1</m
   n></mrow><mrow><mi>K</mi></mrow></munderover><msub><mrow><mi>l</mi></mr
   ow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></mrow></msub><mo>,</mo> :MATH]
   (3)

   which suggests that the choice of s [i ]will softly constrain the row
   sum of L . To alleviate the effect of the difference in sequencing
   depth for the cells, we constrain the summation
   [MATH: <msubsup><mrow><mo movablelimits="false"
   form="prefix">∑</mo></mrow><mrow><mi>k</mi><mo>=</mo><mn>1</mn></mrow><
   mrow><mi>K</mi></mrow></msubsup><msub><mrow><mi>l</mi></mrow><mrow><mi>
   i</mi><mo>,</mo><mi>k</mi></mrow></msub> :MATH]
   to be around 1, and set s [i ]as
   [MATH:
   <msub><mrow><mi>s</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>=</mo><m
   underover accentunder="false"
   accent="true"><mrow><mo>∑</mo></mrow><mrow><mi>j</mi><mo>=</mo><mn>1</m
   n></mrow><mrow><mi>J</mi></mrow></munderover><msub><mrow><mi>w</mi></mr
   ow><mrow><mi>i</mi><mo>,</mo><mi>j</mi></mrow></msub><mo>.</mo> :MATH]
   (4)

   To obtain the parameter estimation, we implement the stochastic
   variational inference (SVI) algorithm ([52]Hoffman et al., 2013) with
   the deep universal probabilistic program Pyro ([53]Bingham et al.,
   2019). Conditional on the data W , we assume the independency across
   all l [i,k ], across all
   [MATH: <msub><mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mo>~</mo></mover></mrow><mrow><
   mi>k</mi><mo>,</mo><mo>⋅</mo></mrow></msub> :MATH]
   , and between L and Θ. The variational distributions are set as
   [MATH: <mtable
   class="gathered"><mtr><mtd><msub><mrow><mi>l</mi></mrow><mrow><mi>i</mi
   ><mo>,</mo><mi>k</mi></mrow></msub><mo stretchy="false">|</mo><mi
   mathvariant="bold-italic">W</mi><mo>∼</mo><mi
   mathvariant="normal">G</mi><mi mathvariant="normal">a</mi><mi
   mathvariant="normal">m</mi><mi mathvariant="normal">m</mi><mi
   mathvariant="normal">a</mi><mfenced open="("
   close=")"><mrow><msub><mrow><mi>a</mi></mrow><mrow><mi>i</mi><mo>,</mo>
   <mi>k</mi></mrow></msub><mo>,</mo><msub><mrow><mi>b</mi></mrow><mrow><m
   i>i</mi><mo>,</mo><mi>k</mi></mrow></msub></mrow></mfenced><mo>,</mo></
   mtd></mtr><mtr><mtd><msub><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mrow><mi>k</mi><mo>,</mo><mo>⋅<
   /mo></mrow></msub><mo>=</mo><mi>σ</mi><mfenced open="("
   close=")"><mrow><msub><mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mo>~</mo></mover></mrow><mrow><
   mi>k</mi><mo>,</mo><mo>⋅</mo></mrow></msub></mrow></mfenced><mo>,</mo><
   msub><mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mo>~</mo></mover></mrow><mrow><
   mi>k</mi><mo>,</mo><mo>⋅</mo></mrow></msub><mo
   stretchy="false">|</mo><mi
   mathvariant="bold-italic">W</mi><mo>∼</mo><mi
   mathvariant="normal">L</mi><mi mathvariant="normal">o</mi><mi
   mathvariant="normal">g</mi><mi mathvariant="normal">i</mi><mi
   mathvariant="normal">t</mi><mo>−</mo><mi mathvariant="normal">N</mi><mi
   mathvariant="normal">o</mi><mi mathvariant="normal">r</mi><mi
   mathvariant="normal">m</mi><mi mathvariant="normal">a</mi><mi
   mathvariant="normal">l</mi><mfenced open="("
   close=")"><mrow><msub><mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">μ</mi></mrow><mo>¯</mo></mover></mrow><mrow><
   mi>k</mi></mrow></msub><mo>,</mo><msub><mrow><mover
   accent="true"><mrow><mi
   mathvariant="bold">Σ</mi></mrow><mo>¯</mo></mover></mrow><mrow><mi>k</m
   i></mrow></msub></mrow></mfenced><mo>.</mo></mtd></mtr></mtable> :MATH]
   (5)

   By default, the hyperparameters in the prior in model (1) are set as
   [MATH:
   <msub><mrow><mi>α</mi></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></mrow
   ></msub><mo>=</mo><mn>0.1</mn><mo>,</mo><msub><mrow><mi>β</mi></mrow><m
   row><mi>i</mi><mo>,</mo><mi>k</mi></mrow></msub><mo>=</mo><mi>K</mi><ms
   ub><mrow><mi>α</mi></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></mrow></
   msub><mo>,</mo><msub><mrow><mi
   mathvariant="bold-italic">μ</mi></mrow><mrow><mi>k</mi></mrow></msub><m
   o>=</mo><mn mathvariant="bold">0</mn><mo>,</mo><msub><mrow><mi
   mathvariant="bold">Σ</mi></mrow><mrow><mi>k</mi></mrow></msub><mo>=</mo
   ><mi
   mathvariant="bold">I</mi><mtext> for all </mtext><mi>i</mi><mo>,</mo><m
   i>k</mi><mo>.</mo> :MATH]
   (6)

   The default initial values for the parameters in the variational
   distributions are set as
   [MATH:
   <msub><mrow><mi>a</mi></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></mrow
   ></msub><mo>=</mo><msub><mrow><mi>b</mi></mrow><mrow><mi>i</mi><mo>,</m
   o><mi>k</mi></mrow></msub><mo>=</mo><mn>0.5</mn><mo>,</mo><msub><mrow><
   mover accent="true"><mrow><mi
   mathvariant="bold-italic">μ</mi></mrow><mo>¯</mo></mover></mrow><mrow><
   mi>k</mi></mrow></msub><mo>=</mo><mn
   mathvariant="bold">0</mn><mo>,</mo><msub><mrow><mover
   accent="true"><mrow><mi
   mathvariant="bold">Σ</mi></mrow><mo>¯</mo></mover></mrow><mrow><mi>k</m
   i></mrow></msub><mo>=</mo><mn>0.1</mn><mi
   mathvariant="bold">I</mi><mtext> for all </mtext><mi>i</mi><mo>,</mo><m
   i>k</mi><mo>.</mo> :MATH]
   (7)

   The estimated parameters
   [MATH: <mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">L</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow> :MATH]
   and
   [MATH: <mrow><mover accent="true"><mrow><mi
   mathvariant="bold">Θ</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow> :MATH]
   are computed as the mode of the corresponding variational
   distributions:
   [MATH: <msub><mrow><mover accent="true"><mrow><mi>l</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</
   mi></mrow></msub><mo>=</mo><mfrac><mrow><msub><mrow><mover
   accent="true"><mrow><mi>a</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</
   mi></mrow></msub><mo>−</mo><mn>1</mn></mrow><mrow><msub><mrow><mover
   accent="true"><mrow><mi>b</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</
   mi></mrow></msub></mrow></mfrac><mi>I</mi><mfenced open="("
   close=")"><mrow><msub><mrow><mover
   accent="true"><mrow><mi>a</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</
   mi></mrow></msub><mo>></mo><mn>1</mn></mrow></mfenced><mo>,</mo><mspace
   width="1em"></mspace><msub><mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mi>k</mi><mo>,</mo><mo>⋅</
   mo></mrow></msub><mo>=</mo><mi>σ</mi><mfenced open="("
   close=")"><mrow><msub><mrow><mover accent="true"><mrow><mover
   accent="true"><mrow><mi
   mathvariant="bold-italic">μ</mi></mrow><mo>¯</mo></mover></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mi>k</mi></mrow></msub></m
   row></mfenced><mo>.</mo> :MATH]
   (8)

   Note that the covariance matrix Σ [k ]in the Logit-Normal distribution
   can capture the correlation structure in the features if we do not
   constrain it to be diagonal. However, if we do not impose the diagonal
   constraint on the covariance matrix, the number of free parameters in
   one covariance matrix will increase from J to J(J + 1)/2, which brings
   high computational cost. Therefore, we assume that the covariance
   matrix Σ [k ]is diagonal for efficient and lightweight implementation
   of the model.

2.1.2 Relationship to existing models

   The PoLNG model can be considered as a special case in Poisson Factor
   Analysis ([54]Zhou et al., 2012) under novel priors. One model that is
   closely related to our PoLNG model is called the Gamma-Poisson (GaP)
   model ([55]Canny, 2004), and [56]Buntine and Jakulin (2006) extended
   the GaP model with Dirichlet priors on θ [k,⋅]:
   [MATH:
   <msub><mrow><mi>l</mi></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></mrow
   ></msub><mo>∼</mo><mi mathvariant="normal">G</mi><mi
   mathvariant="normal">a</mi><mi mathvariant="normal">m</mi><mi
   mathvariant="normal">m</mi><mi mathvariant="normal">a</mi><mfenced
   open="("
   close=")"><mrow><msub><mrow><mi>α</mi></mrow><mrow><mi>i</mi><mo>,</mo>
   <mi>k</mi></mrow></msub><mo>,</mo><msub><mrow><mi>β</mi></mrow><mrow><m
   i>i</mi><mo>,</mo><mi>k</mi></mrow></msub></mrow></mfenced><mo>,</mo><m
   sub><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mrow><mi>k</mi><mo>,</mo><mo>⋅<
   /mo></mrow></msub><mo>∼</mo><mi mathvariant="normal">D</mi><mi
   mathvariant="normal">i</mi><mi mathvariant="normal">r</mi><mi
   mathvariant="normal">i</mi><mi mathvariant="normal">c</mi><mi
   mathvariant="normal">h</mi><mi mathvariant="normal">l</mi><mi
   mathvariant="normal">e</mi><mi mathvariant="normal">t</mi><mfenced
   open="(" close=")"><mrow><msub><mrow><mi
   mathvariant="bold-italic">γ</mi></mrow><mrow><mi>k</mi></mrow></msub></
   mrow></mfenced><mo>,</mo><msub><mrow><mi>w</mi></mrow><mrow><mi>i</mi><
   mo>,</mo><mi>j</mi></mrow></msub><mo>∼</mo><mi
   mathvariant="normal">P</mi><mi mathvariant="normal">o</mi><mi
   mathvariant="normal">i</mi><mi mathvariant="normal">s</mi><mi
   mathvariant="normal">s</mi><mi mathvariant="normal">o</mi><mi
   mathvariant="normal">n</mi><mfenced open="("
   close=")"><mrow><munderover accentunder="false"
   accent="true"><mrow><mo>∑</mo></mrow><mrow><mi>k</mi><mo>=</mo><mn>1</m
   n></mrow><mrow><mi>K</mi></mrow></munderover><msub><mrow><mi>l</mi></mr
   ow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></mrow></msub><msub><mrow><mi>θ<
   /mi></mrow><mrow><mi>k</mi><mo>,</mo><mi>j</mi></mrow></msub></mrow></m
   fenced><mo>,</mo> :MATH]
   (9)

   where γ [k ]is the concentration parameter in the Dirichlet
   distribution.

   The difference between the GaP model and the PoLNG model is that the
   Dirichlet prior in the GaP model is replaced by the Logit-Normal
   distribution in the softmax basis in the PoLNG model, as suggested by
   [57]Atchison and Shen (1980). This change keeps the simplex constraint
   for each row of Θ, but it allows for carrying out unconstrained
   optimization of the cost function without the simplex constraints
   ([58]Srivastava and Sutton, 2017). Moreover, it improves computational
   stability in the stochastic variational inference method. When coded in
   the Pyro program, our model (1) with logit-normal distribution is less
   likely to raise numerical errors than model (9) with the Dirichlet
   prior.

   For the parameter estimation in the GaP model, [59]Buntine and Jakulin
   (2006) proposed a mean-field variational inference algorithm and a
   Gibbs sampling algorithm by introducing a latent variable with
   dimension I × J × K. However, because I and J are typically large in
   single-cell data, the computational and memory cost of introducing such
   a 3-dimensional latent variable would be unaffordable for a moderate K.
   In contrast, our SVI algorithm does not introduce memory consuming
   latent variables and enables GPU acceleration when coded in the deep
   universal probabilistic program Pyro.

   The PoLNG model is also related to non-negative matrix factorization
   (NMF) ([60]Lee and Seung, 1999). It can be viewed as a probabilistic
   non-negative matrix factorization model, as it models the expectation
   of the count data as the multiplication of two non-negative matrices,
   i.e.,
   [MATH: <mi mathvariant="double-struck">E</mi><mrow><mo
   stretchy="false">(</mo><mrow><mi
   mathvariant="bold-italic">W</mi></mrow><mo
   stretchy="false">)</mo></mrow><mo>=</mo><msup><mrow><mi
   mathvariant="bold-italic">L</mi></mrow><mrow><mo>*</mo></mrow></msup><m
   i mathvariant="bold">Θ</mi> :MATH]
   , where L * = SL and S is an I by I diagonal matrix with diagonal
   elements {s [1], s [2], … , s [I ]}. To alleviate the model
   identification problem, the prior on Θ ensures each row of Θ is
   normalized to have sum 1, thus avoiding the case where
   [MATH: <mrow><mo stretchy="false">(</mo><mrow><mrow><mover
   accent="true"><mrow><mi
   mathvariant="bold-italic">L</mi></mrow><mo>~</mo></mover></mrow><mo>,</
   mo><mrow><mover accent="true"><mrow><mi
   mathvariant="bold">Θ</mi></mrow><mo>~</mo></mover></mrow></mrow><mo
   stretchy="false">)</mo></mrow><mo>=</mo><mrow><mo
   stretchy="false">(</mo><mrow><mi>a</mi><mi
   mathvariant="bold-italic">L</mi><mo>,</mo><mfrac><mrow><mn>1</mn></mrow
   ><mrow><mi>a</mi></mrow></mfrac><mi mathvariant="bold">Θ</mi></mrow><mo
   stretchy="false">)</mo></mrow> :MATH]
   is also a possible solution for any a > 0, a ≠ 1. However, this kind of
   topic models also typically suffer from the label switching problem.
   For example, if we impose identical priors to all the components in L
   and Θ, switch the k [1]-th and k [2]-th columns in L and switch the k
   [1]-th and k [2]-th rows in Θ at the same time, then we obtain another
   solution that leads to the same data likelihood or evidence lower bound
   (ELBO) in variational inference. But we need not worry about this label
   switching problem as the switching of factor indices has little
   influence on the downstream analysis.

2.2 iPoLNG for multiomics data

   For the single-cell multiomics data, suppose we have two data
   modalities,
   [MATH: <msup><mrow><mi
   mathvariant="bold-italic">W</mi></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msup><mo>∈</mo><msub><mrow><mi
   mathvariant="double-struck">N</mi></mrow><mrow><mi>I</mi><mo>×</mo><msu
   b><mrow><mi>J</mi></mrow><mrow><mi>m</mi></mrow></msub></mrow></msub>
   :MATH]
   for m = 1, 2. Both data modalities measure the information for the same
   set of I cells, but they represent different types of genomic features.
   For example, W ^(1) can be gene expression data, the features being
   genes, and W ^(2) can be chromatin accessibility data, the features
   being peaks.

   To model single-cell multiomics data, we extend the PoLNG model to the
   iPoLNG model. The model overview is presented in [61]Figure 1. In the
   iPoLNG model, we model the expectation of the mth data modality as the
   multiplication of two non-negative matrices, i.e.,
   [MATH: <mi mathvariant="double-struck">E</mi><mrow><mo
   stretchy="false">(</mo><mrow><msup><mrow><mi
   mathvariant="bold-italic">W</mi></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msup></mrow><mo
   stretchy="false">)</mo></mrow><mo>=</mo><msup><mrow><mi
   mathvariant="bold-italic">L</mi></mrow><mrow><mo>*</mo><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msup><msup><mrow><mi
   mathvariant="bold">Θ</mi></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msup> :MATH]
   , where L *^(m) = S ^(m) L ^(m) and S ^(m) is an I by I diagonal matrix
   that takes into account the sequencing depth for the cells in the mth
   data modality. Then we link all L ^(m) to a common non-negative matrix
   L , each element of which follows an inverse gamma distribution.

FIGURE 1.

   [62]FIGURE 1
   [63]Open in a new tab

   Overview of the iPoLNG model. The input consists of two data modalities
   measuring different aspects of biological profiles in the same set of
   cells. Each data modality W ^(m) is approximated by the matrix product
   of a diagonal matrix S ^(m) that takes into account the cell sequencing
   depth, a feature loading matrix Θ ^(m) and a cell loading matrix L ^(m)
   for m = 1, 2. The feature loading matrices can characterize cell-type
   specific markers and facilitate functional pathway enrichment analysis.
   The cell loading matrices have different variances due to the levels of
   noise across different data modalities, but share the same mean, L ,
   which represents the low-dimensional cell embedding and facilitates
   cell clustering.

   More specifically, the iPoLNG model is proposed as follows:
   [MATH: <mtable
   class="gathered"><mtr><mtd><msub><mrow><mi>l</mi></mrow><mrow><mi>i</mi
   ><mo>,</mo><mi>k</mi></mrow></msub><mo>∼</mo><mi
   mathvariant="normal">I</mi><mi mathvariant="normal">n</mi><mi
   mathvariant="normal">v</mi><mi mathvariant="normal">e</mi><mi
   mathvariant="normal">r</mi><mi mathvariant="normal">s</mi><mi
   mathvariant="normal">e</mi><mi mathvariant="normal">G</mi><mi
   mathvariant="normal">a</mi><mi mathvariant="normal">m</mi><mi
   mathvariant="normal">m</mi><mi mathvariant="normal">a</mi><mfenced
   open="("
   close=")"><mrow><msub><mrow><mi>α</mi></mrow><mrow><mi>i</mi><mo>,</mo>
   <mi>k</mi></mrow></msub><mo>,</mo><msub><mrow><mi>β</mi></mrow><mrow><m
   i>i</mi><mo>,</mo><mi>k</mi></mrow></msub></mrow></mfenced><mo>,</mo></
   mtd></mtr><mtr><mtd><msubsup><mrow><mi>l</mi></mrow><mrow><mi>i</mi><mo
   >,</mo><mi>k</mi></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo
   stretchy="false">|</mo><msub><mrow><mi>l</mi></mrow><mrow><mi>i</mi><mo
   >,</mo><mi>k</mi></mrow></msub><mo>∼</mo><mi
   mathvariant="normal">G</mi><mi mathvariant="normal">a</mi><mi
   mathvariant="normal">m</mi><mi mathvariant="normal">m</mi><mi
   mathvariant="normal">a</mi><mfenced open="("
   close=")"><mrow><msubsup><mrow><mi>α</mi></mrow><mrow><mn>0</mn></mrow>
   <mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo>,</mo><
   msubsup><mrow><mi>α</mi></mrow><mrow><mn>0</mn></mrow><mrow><mfenced
   open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><msubsup><m
   row><mi>l</mi></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></mrow><mrow><
   mo>−</mo><mn>1</mn></mrow></msubsup></mrow></mfenced><mtext> for </mtex
   t><mi>m</mi><mo>=</mo><mn>1,2</mn><mo>,</mo></mtd></mtr><mtr><mtd><msub
   sup><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mrow><mi>k</mi><mo>,</mo><mo>⋅<
   /mo></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo>=</mo><
   mi>σ</mi><mfenced open="(" close=")"><mrow><msubsup><mrow><mover
   accent="true"><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mo>~</mo></mover></mrow><mrow><
   mi>k</mi><mo>,</mo><mo>⋅</mo></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup></mrow></mf
   enced><mo>,</mo><msubsup><mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mo>~</mo></mover></mrow><mrow><
   mi>k</mi><mo>,</mo><mo>⋅</mo></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo>∼</mo><
   mi mathvariant="normal">L</mi><mi mathvariant="normal">o</mi><mi
   mathvariant="normal">g</mi><mi mathvariant="normal">i</mi><mi
   mathvariant="normal">t</mi><mo>−</mo><mi mathvariant="normal">N</mi><mi
   mathvariant="normal">o</mi><mi mathvariant="normal">r</mi><mi
   mathvariant="normal">m</mi><mi mathvariant="normal">a</mi><mi
   mathvariant="normal">l</mi><mfenced open="("
   close=")"><mrow><msubsup><mrow><mi
   mathvariant="bold-italic">μ</mi></mrow><mrow><mi>k</mi></mrow><mrow><mf
   enced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo>,</mo><
   msubsup><mrow><mi
   mathvariant="bold">Σ</mi></mrow><mrow><mi>k</mi></mrow><mrow><mfenced
   open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup></mrow></mf
   enced><mtext> for </mtext><mi>m</mi><mo>=</mo><mn>1,2</mn><mo>,</mo></m
   td></mtr><mtr><mtd><msubsup><mrow><mi>w</mi></mrow><mrow><mi>i</mi><mo>
   ,</mo><mi>j</mi></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo>∼</mo><
   mi mathvariant="normal">P</mi><mi mathvariant="normal">o</mi><mi
   mathvariant="normal">i</mi><mi mathvariant="normal">s</mi><mi
   mathvariant="normal">s</mi><mi mathvariant="normal">o</mi><mi
   mathvariant="normal">n</mi><mfenced open="("
   close=")"><mrow><msubsup><mrow><mi>s</mi></mrow><mrow><mi>i</mi></mrow>
   <mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><munderover
   accentunder="false"
   accent="true"><mrow><mo>∑</mo></mrow><mrow><mi>k</mi><mo>=</mo><mn>1</m
   n></mrow><mrow><mi>K</mi></mrow></munderover><msubsup><mrow><mi>l</mi><
   /mrow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></mrow><mrow><mfenced
   open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><msubsup><m
   row><mi>θ</mi></mrow><mrow><mi>k</mi><mo>,</mo><mi>j</mi></mrow><mrow><
   mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup></mrow></mf
   enced><mtext> for </mtext><mi>m</mi><mo>=</mo><mn>1,2</mn><mo>,</mo></m
   td></mtr></mtable> :MATH]
   (10)

   where l [i,k ]is the element in the ith row and the kth column in L , α
   [i,k ], β [i,k ]is the shape and scale parameter in the inverse Gamma
   distribution,
   [MATH:
   <msubsup><mrow><mi>l</mi></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></m
   row><mrow><mrow><mo stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   is the element in the ith row and the kth column in L ^(m),
   [MATH:
   <msubsup><mrow><mi>α</mi></mrow><mrow><mn>0</mn></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   is the hyperparameter that tackles the level of noise in the mth data
   modality,
   [MATH: <msubsup><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mrow><mi>k</mi><mo>,</mo><mo>⋅<
   /mo></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   is the kth row vector in Θ ^(m),
   [MATH: <msubsup><mrow><mi
   mathvariant="bold-italic">μ</mi></mrow><mrow><mi>k</mi></mrow><mrow><mr
   ow><mo stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   is a vector of length J [m ]serving as the mean of the Logit-Normal
   distribution,
   [MATH: <msubsup><mrow><mi
   mathvariant="bold">Σ</mi></mrow><mrow><mi>k</mi></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   is a J [m ]by J [m ]diagonal matrix serving as the covariance matrix of
   the Logit-Normal distribution,
   [MATH:
   <msubsup><mrow><mi>w</mi></mrow><mrow><mi>i</mi><mo>,</mo><mi>j</mi></m
   row><mrow><mrow><mo stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   is the element in the ith row and the jth column in W ^(m), and
   [MATH:
   <msubsup><mrow><mi>s</mi></mrow><mrow><mi>i</mi></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   is the scaling factor that accounts for the sequencing depth for each
   cell in the mth data modality.

   In the iPoLNG model, we use an inverse gamma distribution to model the
   elements in L , such that
   [MATH:
   <msubsup><mrow><mi>l</mi></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></m
   row><mrow><mo>−</mo><mn>1</mn></mrow></msubsup> :MATH]
   follows a gamma distribution, based on the fact that gamma distribution
   is the conjugate prior to the gamma distribution with a known shape
   parameter. To tackle different levels of noise across the data
   modalities, we assume that the expectations of
   [MATH:
   <msubsup><mrow><mi>l</mi></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></m
   row><mrow><mrow><mo stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   given l [i,k ]are identical for all m, but the variances vary according
   to the hyperparameter
   [MATH:
   <msubsup><mrow><mi>α</mi></mrow><mrow><mn>0</mn></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   :
   [MATH: <mi mathvariant="double-struck">E</mi><mfenced open="("
   close=")"><mrow><msubsup><mrow><mi>l</mi></mrow><mrow><mi>i</mi><mo>,</
   mo><mi>k</mi></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo
   stretchy="false">|</mo><msub><mrow><mi>l</mi></mrow><mrow><mi>i</mi><mo
   >,</mo><mi>k</mi></mrow></msub></mrow></mfenced><mo>=</mo><msub><mrow><
   mi>l</mi></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></mrow></msub><mo>,
   </mo><mi mathvariant="normal">v</mi><mi mathvariant="normal">a</mi><mi
   mathvariant="normal">r</mi><mfenced open="("
   close=")"><mrow><msubsup><mrow><mi>l</mi></mrow><mrow><mi>i</mi><mo>,</
   mo><mi>k</mi></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo
   stretchy="false">|</mo><msub><mrow><mi>l</mi></mrow><mrow><mi>i</mi><mo
   >,</mo><mi>k</mi></mrow></msub></mrow></mfenced><mo>=</mo><mfrac><mrow>
   <msubsup><mrow><mi>l</mi></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></m
   row><mrow><mn>2</mn></mrow></msubsup></mrow><mrow><msubsup><mrow><mi>α<
   /mi></mrow><mrow><mn>0</mn></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup></mrow></mf
   rac><mo>.</mo> :MATH]
   (11)

   note that the variance of
   [MATH:
   <msubsup><mrow><mi>l</mi></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></m
   row><mrow><mrow><mo stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   given l [i,k ]will decrease when
   [MATH:
   <msubsup><mrow><mi>α</mi></mrow><mrow><mn>0</mn></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   increases. When
   [MATH:
   <msubsup><mrow><mi>α</mi></mrow><mrow><mn>0</mn></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   is large,
   [MATH:
   <msubsup><mrow><mi>l</mi></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></m
   row><mrow><mrow><mo stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   will tend to be close to l [i,k ], which indicates that the level of
   noise in the mth data modality is low.

   [MATH:
   <msubsup><mrow><mi>s</mi></mrow><mrow><mi>i</mi></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   is set the same way as that in the PoLNG model:
   [MATH:
   <msubsup><mrow><mi>s</mi></mrow><mrow><mi>i</mi></mrow><mrow><mfenced
   open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo>=</mo><
   munderover accentunder="false"
   accent="true"><mrow><mo>∑</mo></mrow><mrow><mi>j</mi><mo>=</mo><mn>1</m
   n></mrow><mrow><msub><mrow><mi>J</mi></mrow><mrow><mi>m</mi></mrow></ms
   ub></mrow></munderover><msubsup><mrow><mi>w</mi></mrow><mrow><mi>i</mi>
   <mo>,</mo><mi>j</mi></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo>.</mo>
   :MATH]
   (12)

   To obtain the parameter estimation in Pyro, conditional on the data W
   ^(1), W ^(2), we assume the independency across all
   [MATH:
   <msub><mrow><mi>l</mi></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></mrow
   ></msub><mo>,</mo><msubsup><mrow><mi>l</mi></mrow><mrow><mi>i</mi><mo>,
   </mo><mi>k</mi></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mn>1</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup><mo>,</mo><msubsup><mrow
   ><mi>l</mi></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></mrow><mrow><mro
   w><mo stretchy="false">(</mo><mrow><mn>2</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   , across all
   [MATH: <msubsup><mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mo>~</mo></mover></mrow><mrow><
   mi>k</mi><mo>,</mo><mo>⋅</mo></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mn>1</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup><mo>,</mo><msubsup><mrow
   ><mover accent="true"><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mo>~</mo></mover></mrow><mrow><
   mi>k</mi><mo>,</mo><mo>⋅</mo></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mn>2</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   , and among L , L ^(1), L ^(2), Θ ^(1), Θ ^(2). The variational
   distributions are set as
   [MATH: <mtable
   class="gathered"><mtr><mtd><msub><mrow><mi>l</mi></mrow><mrow><mi>i</mi
   ><mo>,</mo><mi>k</mi></mrow></msub><mo
   stretchy="false">|</mo><msup><mrow><mi
   mathvariant="bold-italic">W</mi></mrow><mrow><mfenced open="("
   close=")"><mrow><mn>1</mn></mrow></mfenced></mrow></msup><mo>,</mo><msu
   p><mrow><mi mathvariant="bold-italic">W</mi></mrow><mrow><mfenced
   open="("
   close=")"><mrow><mn>2</mn></mrow></mfenced></mrow></msup><mo>∼</mo><mi
   mathvariant="normal">I</mi><mi mathvariant="normal">n</mi><mi
   mathvariant="normal">v</mi><mi mathvariant="normal">e</mi><mi
   mathvariant="normal">r</mi><mi mathvariant="normal">s</mi><mi
   mathvariant="normal">e</mi><mi mathvariant="normal">G</mi><mi
   mathvariant="normal">a</mi><mi mathvariant="normal">m</mi><mi
   mathvariant="normal">m</mi><mi mathvariant="normal">a</mi><mfenced
   open="("
   close=")"><mrow><msub><mrow><mi>a</mi></mrow><mrow><mi>i</mi><mo>,</mo>
   <mi>k</mi></mrow></msub><mo>,</mo><msub><mrow><mi>b</mi></mrow><mrow><m
   i>i</mi><mo>,</mo><mi>k</mi></mrow></msub></mrow></mfenced><mo>,</mo></
   mtd></mtr><mtr><mtd><msubsup><mrow><mi>l</mi></mrow><mrow><mi>i</mi><mo
   >,</mo><mi>k</mi></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo
   stretchy="false">|</mo><msup><mrow><mi
   mathvariant="bold-italic">W</mi></mrow><mrow><mfenced open="("
   close=")"><mrow><mn>1</mn></mrow></mfenced></mrow></msup><mo>,</mo><msu
   p><mrow><mi mathvariant="bold-italic">W</mi></mrow><mrow><mfenced
   open="("
   close=")"><mrow><mn>2</mn></mrow></mfenced></mrow></msup><mo>∼</mo><mi
   mathvariant="normal">G</mi><mi mathvariant="normal">a</mi><mi
   mathvariant="normal">m</mi><mi mathvariant="normal">m</mi><mi
   mathvariant="normal">a</mi><mfenced open="("
   close=")"><mrow><msubsup><mrow><mi>a</mi></mrow><mrow><mi>i</mi><mo>,</
   mo><mi>k</mi></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo>,</mo><
   msubsup><mrow><mi>b</mi></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></mr
   ow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup></mrow></mf
   enced><mtext> for </mtext><mi>m</mi><mo>=</mo><mn>1,2</mn><mo>,</mo></m
   td></mtr><mtr><mtd><msubsup><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mrow><mi>k</mi><mo>,</mo><mo>⋅<
   /mo></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo>=</mo><
   mi>σ</mi><mfenced open="(" close=")"><mrow><msubsup><mrow><mover
   accent="true"><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mo>~</mo></mover></mrow><mrow><
   mi>k</mi><mo>,</mo><mo>⋅</mo></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup></mrow></mf
   enced><mo>,</mo><msubsup><mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mo>~</mo></mover></mrow><mrow><
   mi>k</mi><mo>,</mo><mo>⋅</mo></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo
   stretchy="false">|</mo><msup><mrow><mi
   mathvariant="bold-italic">W</mi></mrow><mrow><mfenced open="("
   close=")"><mrow><mn>1</mn></mrow></mfenced></mrow></msup><mo>,</mo><msu
   p><mrow><mi mathvariant="bold-italic">W</mi></mrow><mrow><mfenced
   open="("
   close=")"><mrow><mn>2</mn></mrow></mfenced></mrow></msup><mo>∼</mo><mi
   mathvariant="normal">L</mi><mi mathvariant="normal">o</mi><mi
   mathvariant="normal">g</mi><mi mathvariant="normal">i</mi><mi
   mathvariant="normal">t</mi><mo>−</mo><mi mathvariant="normal">N</mi><mi
   mathvariant="normal">o</mi><mi mathvariant="normal">r</mi><mi
   mathvariant="normal">m</mi><mi mathvariant="normal">a</mi><mi
   mathvariant="normal">l</mi><mfenced open="("
   close=")"><mrow><msubsup><mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">μ</mi></mrow><mo>¯</mo></mover></mrow><mrow><
   mi>k</mi></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo>,</mo><
   msubsup><mrow><mover accent="true"><mrow><mi
   mathvariant="bold">Σ</mi></mrow><mo>¯</mo></mover></mrow><mrow><mi>k</m
   i></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup></mrow></mf
   enced><mtext> for </mtext><mi>m</mi><mo>=</mo><mn>1,2</mn><mo>.</mo></m
   td></mtr></mtable> :MATH]
   (13)

   by default, the hyperparameters in the prior in model (10) are set as
   [MATH:
   <msub><mrow><mi>α</mi></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></mrow
   ></msub><mo>=</mo><mn>1</mn><mo>,</mo><msub><mrow><mi>β</mi></mrow><mro
   w><mi>i</mi><mo>,</mo><mi>k</mi></mrow></msub><mo>=</mo><mfenced
   open="("
   close=")"><mrow><msub><mrow><mi>α</mi></mrow><mrow><mi>i</mi><mo>,</mo>
   <mi>k</mi></mrow></msub><mo>+</mo><mn>1</mn></mrow></mfenced><mo>/</mo>
   <mi>K</mi><mo>,</mo><msubsup><mrow><mi
   mathvariant="bold-italic">μ</mi></mrow><mrow><mi>k</mi></mrow><mrow><mf
   enced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo>=</mo><
   mn mathvariant="bold">0</mn><mo>,</mo><msubsup><mrow><mi
   mathvariant="bold">Σ</mi></mrow><mrow><mi>k</mi></mrow><mrow><mfenced
   open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo>=</mo><
   mi
   mathvariant="bold">I</mi><mtext> for all </mtext><mi>i</mi><mo>,</mo><m
   i>k</mi><mo>,</mo><mi>m</mi><mo>.</mo> :MATH]
   (14)

   if no initial values are provided, the default initial values for the
   parameters in the variational distributions are set as
   [MATH:
   <msub><mrow><mi>a</mi></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></mrow
   ></msub><mo>=</mo><msub><mrow><mi>b</mi></mrow><mrow><mi>i</mi><mo>,</m
   o><mi>k</mi></mrow></msub><mo>=</mo><msubsup><mrow><mi>a</mi></mrow><mr
   ow><mi>i</mi><mo>,</mo><mi>k</mi></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo>=</mo><
   msubsup><mrow><mi>b</mi></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></mr
   ow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo>=</mo><
   mn>0.5</mn><mo>,</mo><msubsup><mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">μ</mi></mrow><mo>¯</mo></mover></mrow><mrow><
   mi>k</mi></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo>=</mo><
   mn mathvariant="bold">0</mn><mo>,</mo><msubsup><mrow><mover
   accent="true"><mrow><mi
   mathvariant="bold">Σ</mi></mrow><mo>¯</mo></mover></mrow><mrow><mi>k</m
   i></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo>=</mo><
   mn>0.1</mn><mi
   mathvariant="bold">I</mi><mtext> for all </mtext><mi>i</mi><mo>,</mo><m
   i>k</mi><mo>,</mo><mi>m</mi><mo>.</mo> :MATH]
   (15)

   the estimated parameters
   [MATH: <mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">L</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mo>,</mo><msup><mrow><mover
   accent="true"><mrow><mi mathvariant="bold-italic">L</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msup> :MATH]
   and
   [MATH: <msup><mrow><mover accent="true"><mrow><mi
   mathvariant="bold">Θ</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msup> :MATH]
   are computed as the mode of the corresponding variational
   distributions:
   [MATH: <msub><mrow><mover accent="true"><mrow><mi>l</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</
   mi></mrow></msub><mo>=</mo><mfrac><mrow><msub><mrow><mover
   accent="true"><mrow><mi>b</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</
   mi></mrow></msub></mrow><mrow><msub><mrow><mover
   accent="true"><mrow><mi>a</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</
   mi></mrow></msub><mo>+</mo><mn>1</mn></mrow></mfrac><mo>,</mo><msubsup>
   <mrow><mover accent="true"><mrow><mi>l</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</
   mi></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo>=</mo><
   mfrac><mrow><msubsup><mrow><mover
   accent="true"><mrow><mi>a</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</
   mi></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo>−</mo><
   mn>1</mn></mrow><mrow><msubsup><mrow><mover
   accent="true"><mrow><mi>b</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</
   mi></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup></mrow></mf
   rac><mi>I</mi><mfenced open="(" close=")"><mrow><msubsup><mrow><mover
   accent="true"><mrow><mi>a</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</
   mi></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo>></mo><
   mn>1</mn></mrow></mfenced><mo>,</mo><msubsup><mrow><mover
   accent="true"><mrow><mi mathvariant="bold-italic">θ</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mi>k</mi><mo>,</mo><mo>⋅</
   mo></mrow><mrow><mfenced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup><mo>=</mo><
   mi>σ</mi><mfenced open="(" close=")"><mrow><msubsup><mrow><mover
   accent="true"><mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">μ</mi></mrow><mo>¯</mo></mover></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mi>k</mi></mrow><mrow><mfe
   nced open="("
   close=")"><mrow><mi>m</mi></mrow></mfenced></mrow></msubsup></mrow></mf
   enced><mo>.</mo> :MATH]
   (16)

   We propose a heuristic procedure to select
   [MATH:
   <msubsup><mrow><mi>α</mi></mrow><mrow><mn>0</mn></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   . First, we apply the PoLNG model to data W ^(m), m = 1, 2, separately.
   With the estimated variational parameters in the Gamma distribution, we
   obtain the mean and variance of
   [MATH:
   <msubsup><mrow><mi>l</mi></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</mi><mo
   >,</mo><mi mathvariant="normal">P</mi><mi
   mathvariant="normal">o</mi><mi mathvariant="normal">L</mi><mi
   mathvariant="normal">N</mi><mi
   mathvariant="normal">G</mi></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   , denoted as
   [MATH: <mi mathvariant="double-struck">E</mi><mrow><mo
   stretchy="false">(</mo><mrow><msubsup><mrow><mi>l</mi></mrow><mrow><mi>
   i</mi><mo>,</mo><mi>k</mi><mo>,</mo><mi mathvariant="normal">P</mi><mi
   mathvariant="normal">o</mi><mi mathvariant="normal">L</mi><mi
   mathvariant="normal">N</mi><mi
   mathvariant="normal">G</mi></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup></mrow><mo
   stretchy="false">)</mo></mrow> :MATH]
   and
   [MATH: <mi mathvariant="normal">v</mi><mi
   mathvariant="normal">a</mi><mi mathvariant="normal">r</mi><mrow><mo
   stretchy="false">(</mo><mrow><msubsup><mrow><mi>l</mi></mrow><mrow><mi>
   i</mi><mo>,</mo><mi>k</mi><mo>,</mo><mi mathvariant="normal">P</mi><mi
   mathvariant="normal">o</mi><mi mathvariant="normal">L</mi><mi
   mathvariant="normal">N</mi><mi
   mathvariant="normal">G</mi></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup></mrow><mo
   stretchy="false">)</mo></mrow> :MATH]
   , respectively. Next, we fit a quantile regression with 90% quantile
   and no intercept term, with
   [MATH: <mi mathvariant="normal">v</mi><mi
   mathvariant="normal">a</mi><mi mathvariant="normal">r</mi><mrow><mo
   stretchy="false">(</mo><mrow><msubsup><mrow><mi>l</mi></mrow><mrow><mi>
   i</mi><mo>,</mo><mi>k</mi><mo>,</mo><mi mathvariant="normal">P</mi><mi
   mathvariant="normal">o</mi><mi mathvariant="normal">L</mi><mi
   mathvariant="normal">N</mi><mi
   mathvariant="normal">G</mi></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup></mrow><mo
   stretchy="false">)</mo></mrow> :MATH]
   being the dependent variable and
   [MATH: <msup><mrow><mi
   mathvariant="double-struck">E</mi></mrow><mrow><mn>2</mn></mrow></msup>
   <mrow><mo
   stretchy="false">(</mo><mrow><msubsup><mrow><mi>l</mi></mrow><mrow><mi>
   i</mi><mo>,</mo><mi>k</mi><mo>,</mo><mi mathvariant="normal">P</mi><mi
   mathvariant="normal">o</mi><mi mathvariant="normal">L</mi><mi
   mathvariant="normal">N</mi><mi
   mathvariant="normal">G</mi></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup></mrow><mo
   stretchy="false">)</mo></mrow> :MATH]
   being the independent variable. Finally,
   [MATH:
   <msubsup><mrow><mi>α</mi></mrow><mrow><mn>0</mn></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   is computed as the reciprocal of the slope in the quantile regression.

   The idea behind this heuristic procedure is based on Eq. [64]11, while
   the conditional mean and variance are approximated with the variational
   mean and variance. According to Eq. [65]11, there exists a linear
   relationship between
   [MATH: <mi mathvariant="normal">v</mi><mi
   mathvariant="normal">a</mi><mi mathvariant="normal">r</mi><mrow><mo
   stretchy="false">(</mo><mrow><msubsup><mrow><mi>l</mi></mrow><mrow><mi>
   i</mi><mo>,</mo><mi>k</mi></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup><mo
   stretchy="false">|</mo><msub><mrow><mi>l</mi></mrow><mrow><mi>i</mi><mo
   >,</mo><mi>k</mi></mrow></msub></mrow><mo
   stretchy="false">)</mo></mrow> :MATH]
   and
   [MATH:
   <msubsup><mrow><mi>l</mi></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></m
   row><mrow><mn>2</mn></mrow></msubsup> :MATH]
   with slope equal to
   [MATH:
   <mfrac><mrow><mn>1</mn></mrow><mrow><msubsup><mrow><mi>α</mi></mrow><mr
   ow><mn>0</mn></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup></mrow></mfrac> :MATH]
   . By fitting W ^(m) with the PoLNG model, we are able to obtain the
   variational mean and variance, which approximate l [i,k ]and
   [MATH: <mi mathvariant="normal">v</mi><mi
   mathvariant="normal">a</mi><mi mathvariant="normal">r</mi><mrow><mo
   stretchy="false">(</mo><mrow><msubsup><mrow><mi>l</mi></mrow><mrow><mi>
   i</mi><mo>,</mo><mi>k</mi></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup><mo
   stretchy="false">|</mo><msub><mrow><mi>l</mi></mrow><mrow><mi>i</mi><mo
   >,</mo><mi>k</mi></mrow></msub></mrow><mo
   stretchy="false">)</mo></mrow> :MATH]
   , respectively. Considering the fact that the variance of the
   variational distributions is typically underestimated, we perform
   quantile regression with a high quantile rather than linear regression.

   We also use the variational parameters obtained from fitting the PoLNG
   model to individual data modality as the warm start for the iPoLNG
   model. Because a large
   [MATH:
   <msubsup><mrow><mi>α</mi></mrow><mrow><mn>0</mn></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   indicates a small level of noise in data modality W ^(m), we define
   [MATH: <mrow><mover
   accent="true"><mrow><mi>m</mi></mrow><mo>~</mo></mover></mrow><mo>=</mo
   ><mi>arg</mi><msub><mrow><mi>max</mi></mrow><mrow><mi>m</mi></mrow></ms
   ub><msubsup><mrow><mi>α</mi></mrow><mrow><mn>0</mn></mrow><mrow><mrow><
   mo stretchy="false">(</mo><mrow><mi>m</mi></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   and use the variational parameters
   [MATH: <msubsup><mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">μ</mi></mrow><mo>¯</mo></mover></mrow><mrow><
   mi>k</mi></mrow><mrow><mrow><mo stretchy="false">(</mo><mrow><mover
   accent="true"><mrow><mi>m</mi></mrow><mo>~</mo></mover></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup><mo>,</mo><msubsup><mrow
   ><mover accent="true"><mrow><mi
   mathvariant="bold">Σ</mi></mrow><mo>¯</mo></mover></mrow><mrow><mi>k</m
   i></mrow><mrow><mrow><mo stretchy="false">(</mo><mrow><mover
   accent="true"><mrow><mi>m</mi></mrow><mo>~</mo></mover></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   obtained from the PoLNG model as the initial values for the variational
   parameters in the iPoLNG model. Also, to alleviate the effect of
   non-identifiability, we use
   [MATH:
   <msubsup><mrow><mi>a</mi></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></m
   row><mrow><mrow><mo stretchy="false">(</mo><mrow><mover
   accent="true"><mrow><mi>m</mi></mrow><mo>~</mo></mover></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup><mo>,</mo><msubsup><mrow
   ><mi>b</mi></mrow><mrow><mi>i</mi><mo>,</mo><mi>k</mi></mrow><mrow><mro
   w><mo stretchy="false">(</mo><mrow><mover
   accent="true"><mrow><mi>m</mi></mrow><mo>~</mo></mover></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   obtained from the PoLNG model as the initial values for the variational
   parameters for all m in the iPoLNG model. In the following analysis,
   the number of epochs is fixed to 3,000, the learning rate is set as
   0.1, and the Adam optimizer is used in the SVI algorithm for both PoLNG
   and iPoLNG models.

3 Results

3.1 Real data analysis

   To show that our model facilitates downstream analysis, iPoLNG is
   applied to several single-cell multiomics datasets, including one
   dataset generated from SHARE-seq, which measures gene expression and
   chromatin accessibility in the same single cells from a mouse brain,
   one dataset generated from Paired-Tag, which jointly profiles H3K27me3
   histone modification and transcriptome in the same single cells from a
   mouse brain, and two cryopreserved human peripheral blood mononuclear
   (PBMC) datasets generated from 10X Genomics Single Cell Multiome ATAC +
   Gene Expression Sequencing.

   For these datasets, we first filter out the low-quality cells that
   express in less than 500 genes in the gene expression data or in less
   than 200 regions in the epigenomic data. To select the informative
   features, we perform log-normalization with a scaling factor of 10,000
   and select the top 5,000 highly variable genes and top 20,000 highly
   variable regions with selection.method = ‘‘vst” using R package Seurat
   ([66]Stuart et al., 2019). The log normalizaion is merely used for
   selecting the highly variable features and the counts of the features
   are modeled by iPoLNG. Finally we take out the common cells in both
   data modalities as the input of the single cell multiomics data
   analysis.

3.1.1 iPoLNG achieves good clustering performance on datasets from different
technologies

   We evaluate the clustering performance of iPoLNG on these datasets and
   compare our method with several existing methods designed for
   single-cell multiomics data integration, including scAI ([67]Jin et
   al., 2020) and MOFA+ ([68]Argelaguet et al., 2018; [69]Argelaguet et
   al., 2020). scAI is implemented with the default parameters, and MOFA+
   is implemented with the default parameters and two algorithms:
   mean-field variational inference (VI) using CPU and stochastic
   variational inference (SVI) with GPU acceleration. iPoLNG accepts raw
   count data as the input, while scAI and MOFA + accept the
   log-normalized data as the input. All these three methods can infer a
   low-dimensional representation of the data with a user-defined number
   of latent factors. We set the number of latent factors K = 50. After
   obtaining the cell by factor loading matrix, we perform Leiden
   clustering algorithm ([70]Traag et al., 2019) with a binary search for
   the resolution parameter to cluster the data into the specific number
   of clusters. For datasets with given cell-type labels in the
   publications, the number of clusters is set as the number of the unique
   labels. The number of cluster is set as 8 for PBMC3k dataset and 19 for
   PBMC10k dataset.

   For datasets with given cell-type labels, Adjusted Rand Index (ARI)
   ([71]Hubert and Arabie, 1985) is computed to measure the accuracy of
   the clustering results. For PBMC datasets with unknown cell-type
   labels, Residual Average Gini Index (RAGI) score ([72]Chen et al.,
   2019a) is computed based on canonical marker genes and housekeeping
   genes (see [73]Supplementary Materials). A high RAGI score indicates a
   reasonable clustering result where the expression of marker genes is
   high in one or a few clusters, while the expression of housekeeping
   genes is broadly distributed across all the clusters. Considering the
   fact that the given cell-type labels in the original publications are
   also from some computational methods and can be wrong for some of the
   cells, we also compute the RAGI score. As Leiden clustering algorithm
   makes use of greedy search and leads to different clustering results
   with different initialization, we calculate the mean and standard error
   in 10 runs with different random seeds in the clustering step.

   The clustering performance evaluated by ARI or RAGI is presented in
   [74]Figure 2. In the Paired-Tag mouse brain dataset, iPoLNG achieves
   the highest ARI score (0.698), followed by scAI (0.653). In the
   SHARE-seq mouse brain dataset, iPoLNG reaches an ARI score of 0.606,
   which is comparable to the ARI score of 0.607 in scAI, although the
   clustering results of iPoLNG show a relatively high fluctuation.
   Neither VI nor SVI versions of MOFA + performs well in these two
   datasets. The clustering performance measured by RAGI also shows a
   trend similar to that measured by ARI for Paired-Tag and SHARE-seq
   mouse brain datasets. In 10xPBMC3k dataset, iPoLNG has the highest RAGI
   score (0.423). In 10xPBMC10k dataset, the RAGI score of iPoLNG is
   0.426, slightly higher than that of MOFA+ (0.418 for SVI and 0.419 for
   VI), while scAI cannot perform as well as the other methods in this
   dataset.

FIGURE 2.

   [75]FIGURE 2
   [76]Open in a new tab

   Clustering performance of iPoLNG, MOFA+ (SVI), MOFA+ (VI), and scAI on
   real data. (A) Comparison of ARI scores for Paired-Tag and SHARE-seq
   with given cell-type labels. (B) Comparison of RAGI scores for
   Paired-Tag, SHARE-seq, 10xPBMC3k, and 10xPBMC10k. The error bar
   represents the mean and standard error in 10 runs with different random
   seeds in the clustering step.

   In some applications, the cell structure revealed by different
   modalities can be different. We illustrate that iPoLNG is able to
   handle such scenarios by comparing the clustering performance of PoLNG
   (the simplified version of iPoLNG with just one data modality) with
   iPoLNG. In the Paired-Tag mouse brain dataset, the ARI score of running
   PoLNG for the single-cell RNA-seq data is 0.594, while the ARI score of
   running PoLNG for the single-cell histone modification data is very
   close to 0. In the SHARE-seq mouse brain dataset, the ARI score of
   running PoLNG for the single-cell RNA-seq data is 0.500, while the ARI
   score of running PoLNG for the single-cell ATAC-seq data is 0.02. The
   large difference in ARI between the two modalities indicates that the
   cell structure revealed by these modalities are different. When we
   integrate the information of both modalities using iPoLNG, the ARI
   score improves significantly compared to using RNA alone (from 0.594 to
   0.698 in the Paired-Tag mouse brain dataset, and from 0.500 to 0.606 in
   the SHARE-seq mouse brain dataset).

3.1.2 The factor loading matrices in iPoLNG provide rich biological insights

   We inspect the cell by factor loading matrix
   [MATH: <mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">L</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow> :MATH]
   inferred by iPoLNG for the 10xPBMC3k dataset ([77]Figure 3A) and the
   heatmap for the top 8 differentially expressed genes for each cluster
   ([78]Figure 3C). The differentially expressed genes are found by the
   FindAllMarkers() function using R package Seurat. Similarity in the
   factor loading matrix tends to be consistent with the similarity in the
   heatmap of marker genes: for example, clusters 1, 2 and 3 tend to have
   high factor scores for factors 16 and 29 ([79]Figure 3A), and their
   expression pattern for the marker genes tend to be more similar to each
   other ([80]Figure 3C).

FIGURE 3.

   [81]FIGURE 3
   [82]Open in a new tab

   Downstream analysis with iPoLNG for 10xPBMC3k data. (A) The heatmap of
   the cell by factor loading matrix
   [MATH: <mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">L</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow> :MATH]
   . The major factor score of cluster 6 is allocated to factor 28. (B)
   Gene scores for
   [MATH: <msubsup><mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mn>28</mn><mo>,</mo><mo>⋅<
   /mo></mrow><mrow><mo>(</mo><mtext>RNA</mtext><mo>)</mo></mrow></msubsup
   > :MATH]
   , sorted in increasing order. The labelled marker genes of B cells tend
   to have high gene scores. (C) The heatmap for the top 8
   cluster-specific differentially expressed genes for each cluster. This
   heatmap validates the similarity of cells within and across clusters.
   Cells in cluster 6 tend to have high gene expression values in the
   canonical marker genes of B cells.

   Next, we focus on cluster 6, whose major factor score is allocated to
   factor 28. From the heatmap of gene expression, we find that cells in
   cluster 6 tend to have high gene expression values in canonical marker
   genes of B cells, including BANK1, MS4A1, CD79A, and IGHM ([83]Figure
   3C). By plotting
   [MATH: <msubsup><mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mn>28</mn><mo>,</mo><mo>⋅<
   /mo></mrow><mrow><mo>(</mo><mtext>RNA</mtext><mo>)</mo></mrow></msubsup
   > :MATH]
   according to the ranking of gene factor scores ([84]Figure 3B), the
   canonical marker genes of B cells also tend to have large gene factor
   scores, which is consistent with the conclusion from the heatmap of
   gene expression.

   We also perform gene ontology (GO) enrichment analysis using the
   feature by factor loading matrices
   [MATH: <msup><mrow><mover accent="true"><mrow><mi
   mathvariant="bold">Θ</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mo>(</mo><mtext>RNA</mtext
   ><mo>)</mo></mrow></msup> :MATH]
   and
   [MATH: <msup><mrow><mover accent="true"><mrow><mi
   mathvariant="bold">Θ</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mo>(</mo><mtext>ATAC</mtex
   t><mo>)</mo></mrow></msup> :MATH]
   . We still focus on cluster 6 and factor 28 in 10xPBMC3k data. More
   specifically, we select the top 200 genes with large factor scores in
   [MATH: <msubsup><mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mn>28</mn><mo>,</mo><mo>⋅<
   /mo></mrow><mrow><mo>(</mo><mtext>RNA</mtext><mo>)</mo></mrow></msubsup
   > :MATH]
   as the input of Metascape ([85]Zhou et al., 2019), and the top 1,000
   regions with large factor scores in
   [MATH: <msubsup><mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">θ</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mn>28</mn><mo>,</mo><mo>⋅<
   /mo></mrow><mrow><mo>(</mo><mtext>ATAC</mtext><mo>)</mo></mrow></msubsu
   p> :MATH]
   as the input of Genomic Regions Enrichment of Annotations Tool (GREAT)
   ([86]McLean et al., 2010). The results for Metascape and GREAT are
   presented in [87]Supplementary Files S1, S2, respectively. The enriched
   biological processes and pathways with highly significant p-values
   include immune response, regulation of lymphocyte activation and
   pathways that are highly related to B cells. In conclusion, the GO
   enrichment analysis agrees well with the previous analysis of marker
   genes on cluster 6.

3.1.3 iPoLNG is able to handle partial information in the input

   In some applications, we have one dataset that have multiple
   modalities, but the other dataset that only measures one of the
   modalities, and we expect that the dataset with only one modality can
   be jointly trained with the multi-modal dataset so that it can borrow
   some information from the multi-modal dataset. iPoLNG is able to handle
   such partial information in the input by setting the unobserved count
   as 0, which is mathematically equivalent to not including the
   unobserved data in the likelihood function of the data.

   We design a new experiment to illustrate the power of iPoLNG to handle
   partial information and compare the result with Cobolt ([88]Gong et
   al., 2021), which also enables integrating single-modality dataset with
   multi-modal dataset. First, for the epigenomic data modality W ^(2), we
   randomly mask the data matrix for a certain percentage of the cells by
   setting the observed count as 0, i.e.,
   [MATH: <msup><mrow><mi
   mathvariant="bold-italic">W</mi></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mn>2</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msup><mo>=</mo><msup><mrow><mrow
   ><mo stretchy="false">(</mo><mrow><msubsup><mrow><mi
   mathvariant="bold-italic">W</mi></mrow><mrow><mi
   mathvariant="italic">unmasked</mi></mrow><mrow><msup><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mn>2</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow><mrow><mtext>T</mtext></mrow></msu
   p></mrow></msubsup><mo>,</mo><msubsup><mrow><mi
   mathvariant="bold-italic">W</mi></mrow><mrow><mi
   mathvariant="italic">masked</mi></mrow><mrow><msup><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mn>2</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow><mrow><mtext>T</mtext></mrow></msu
   p></mrow></msubsup></mrow><mo
   stretchy="false">)</mo></mrow></mrow><mrow><mtext>T</mtext></mrow></msu
   p> :MATH]
   and
   [MATH: <msubsup><mrow><mi
   mathvariant="bold-italic">W</mi></mrow><mrow><mi
   mathvariant="italic">masked</mi></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mn>2</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup><mo>=</mo><mi
   mathvariant="bold-italic">O</mi> :MATH]
   is a zero matrix. Correspondingly, we denote the transcriptomic data
   modality
   [MATH: <msup><mrow><mi
   mathvariant="bold-italic">W</mi></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mn>1</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msup><mo>=</mo><msup><mrow><mrow
   ><mo stretchy="false">(</mo><mrow><msubsup><mrow><mi
   mathvariant="bold-italic">W</mi></mrow><mrow><mi
   mathvariant="italic">unmasked</mi></mrow><mrow><msup><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mn>1</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow><mrow><mtext>T</mtext></mrow></msu
   p></mrow></msubsup><mo>,</mo><msubsup><mrow><mi
   mathvariant="bold-italic">W</mi></mrow><mrow><mi
   mathvariant="italic">masked</mi></mrow><mrow><msup><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mn>1</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow><mrow><mtext>T</mtext></mrow></msu
   p></mrow></msubsup></mrow><mo
   stretchy="false">)</mo></mrow></mrow><mrow><mtext>T</mtext></mrow></msu
   p> :MATH]
   , where
   [MATH: <msubsup><mrow><mi
   mathvariant="bold-italic">W</mi></mrow><mrow><mi
   mathvariant="italic">masked</mi></mrow><mrow><msup><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mn>1</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow><mrow><mtext>T</mtext></mrow></msu
   p></mrow></msubsup> :MATH]
   represents the transcriptomic data for the cells in which the
   epigenomic modality is masked. Next, we apply PoLNG to the
   transcriptomic data modality of these masked cells,
   [MATH: <msubsup><mrow><mi
   mathvariant="bold-italic">W</mi></mrow><mrow><mi
   mathvariant="italic">masked</mi></mrow><mrow><msup><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mn>1</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow><mrow><mtext>T</mtext></mrow></msu
   p></mrow></msubsup> :MATH]
   . We apply iPoLNG and Cobolt to
   [MATH: <msup><mrow><mrow><mo
   stretchy="false">(</mo><mrow><msubsup><mrow><mi
   mathvariant="bold-italic">W</mi></mrow><mrow><mi
   mathvariant="italic">unmasked</mi></mrow><mrow><msup><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mn>1</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow><mrow><mtext>T</mtext></mrow></msu
   p></mrow></msubsup><mo>,</mo><msubsup><mrow><mi
   mathvariant="bold-italic">W</mi></mrow><mrow><mi
   mathvariant="italic">masked</mi></mrow><mrow><msup><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mn>1</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow><mrow><mtext>T</mtext></mrow></msu
   p></mrow></msubsup></mrow><mo
   stretchy="false">)</mo></mrow></mrow><mrow><mtext>T</mtext></mrow></msu
   p> :MATH]
   and
   [MATH: <msup><mrow><mrow><mo
   stretchy="false">(</mo><mrow><msubsup><mrow><mi
   mathvariant="bold-italic">W</mi></mrow><mrow><mi
   mathvariant="italic">unmasked</mi></mrow><mrow><msup><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mn>2</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow><mrow><mtext>T</mtext></mrow></msu
   p></mrow></msubsup><mo>,</mo><msup><mrow><mi
   mathvariant="bold-italic">O</mi></mrow><mrow><mtext>T</mtext></mrow></m
   sup></mrow><mo
   stretchy="false">)</mo></mrow></mrow><mrow><mtext>T</mtext></mrow></msu
   p> :MATH]
   , where both transcriptomic and epigenomic data are observed for
   unmasked cells, and only transcriptomic data is observed for masked
   cells. Finally, we perform Leiden clustering on the low-dimensional
   embeddings of the masked cells in PoLNG, iPoLNG and Cobolt,
   respectively, and measure the clustering performance by computing their
   ARI scores. We set the percentage of masked cells to be 20%, 40%, 60%
   and 80% of all the cells, and the results for the Paired-Tag mouse
   brain dataset and the SHARE-seq mouse brain dataset are presented in
   [89]Figure 4. The clustering performance of iPoLNG is better than that
   of Cobolt and PoLNG under all settings, showing the power of iPoLNG to
   enable a dataset with single modality to borrow information from a
   larger dataset with two modalities.

FIGURE 4.

   [90]FIGURE 4
   [91]Open in a new tab

   Clustering performance of iPoLNG, Cobolt and PoLNG on the partially
   masked dataset for (A) Paired-Tag mouse brain RNA + H3K27me3 and (B)
   SHARE-seq mouse brain RNA + ATAC. We set the percentage of masked cells
   to be 20%, 40%, 60% and 80% of all the cells.

3.2 Model validation and comparison using simulated data

   We next perform simulation study to demonstrate the effectiveness of
   our proposed method.

   To generate simulated data, we first fit the iPoLNG model with K = 50
   to one dataset from Paired-Tag, where W ^(1) is the transcriptome data
   and W ^(2) is the H3K27me3 histone modification data of a mouse brain,
   and obtain the hyperparameters
   [MATH:
   <msubsup><mrow><mi>α</mi></mrow><mrow><mn>0</mn></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mn>1</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup><mo>,</mo><msubsup><mrow
   ><mi>α</mi></mrow><mrow><mn>0</mn></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mn>2</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   and the fitted variational parameters
   [MATH: <mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">L</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mo>,</mo><msup><mrow><mover
   accent="true"><mrow><mi mathvariant="bold">Θ</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mn>1</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msup><mo>,</mo><msup><mrow><move
   r accent="true"><mrow><mi mathvariant="bold">Θ</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mn>2</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msup> :MATH]
   . Next, we take a subset of this dataset to obtain all cells in the
   following five cell types “HC_ExNeu_CA1” (403 cells), “FC_ExNeu_PT”
   (219 cells), “HC_ExNeu_DG” (396 cells), “BR_InNeu_CGE” (169 cells),
   “HC_ExNeu_CA23” (440 cells), calculate the column mean of
   [MATH: <mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">L</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow> :MATH]
   within each cluster to obtain “cluster centers”
   [MATH: <msub><mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">l</mi></mrow><mo>¯</mo></mover></mrow><mrow><
   mn>1</mn><mo>,</mo><mo>⋅</mo></mrow></msub><mo>,</mo><msub><mrow><mover
   accent="true"><mrow><mi
   mathvariant="bold-italic">l</mi></mrow><mo>¯</mo></mover></mrow><mrow><
   mn>2</mn><mo>,</mo><mo>⋅</mo></mrow></msub><mo>,</mo><msub><mrow><mover
   accent="true"><mrow><mi
   mathvariant="bold-italic">l</mi></mrow><mo>¯</mo></mover></mrow><mrow><
   mn>3</mn><mo>,</mo><mo>⋅</mo></mrow></msub><mo>,</mo><msub><mrow><mover
   accent="true"><mrow><mi
   mathvariant="bold-italic">l</mi></mrow><mo>¯</mo></mover></mrow><mrow><
   mn>4</mn><mo>,</mo><mo>⋅</mo></mrow></msub><mo>,</mo><msub><mrow><mover
   accent="true"><mrow><mi
   mathvariant="bold-italic">l</mi></mrow><mo>¯</mo></mover></mrow><mrow><
   mn>5</mn><mo>,</mo><mo>⋅</mo></mrow></msub> :MATH]
   . In the simulated data, we assume that there are 5 clusters and the
   cells in the ith cluster are generated from
   [MATH: <msub><mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">l</mi></mrow><mo>¯</mo></mover></mrow><mrow><
   mi>i</mi><mo>,</mo><mo>⋅</mo></mrow></msub> :MATH]
   . More specifically, we utilize the hyperparameters
   [MATH:
   <msubsup><mrow><mi>α</mi></mrow><mrow><mn>0</mn></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mn>1</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup><mo>,</mo><msubsup><mrow
   ><mi>α</mi></mrow><mrow><mn>0</mn></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mn>2</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msubsup> :MATH]
   , cluster centers
   [MATH: <msub><mrow><mover accent="true"><mrow><mi
   mathvariant="bold-italic">l</mi></mrow><mo>¯</mo></mover></mrow><mrow><
   mn>1</mn><mo>,</mo><mo>⋅</mo></mrow></msub><mo>,</mo><msub><mrow><mover
   accent="true"><mrow><mi
   mathvariant="bold-italic">l</mi></mrow><mo>¯</mo></mover></mrow><mrow><
   mn>2</mn><mo>,</mo><mo>⋅</mo></mrow></msub><mo>,</mo><msub><mrow><mover
   accent="true"><mrow><mi
   mathvariant="bold-italic">l</mi></mrow><mo>¯</mo></mover></mrow><mrow><
   mn>3</mn><mo>,</mo><mo>⋅</mo></mrow></msub><mo>,</mo><msub><mrow><mover
   accent="true"><mrow><mi
   mathvariant="bold-italic">l</mi></mrow><mo>¯</mo></mover></mrow><mrow><
   mn>4</mn><mo>,</mo><mo>⋅</mo></mrow></msub><mo>,</mo><msub><mrow><mover
   accent="true"><mrow><mi
   mathvariant="bold-italic">l</mi></mrow><mo>¯</mo></mover></mrow><mrow><
   mn>5</mn><mo>,</mo><mo>⋅</mo></mrow></msub> :MATH]
   , the fitted variational parameters
   [MATH: <msup><mrow><mover accent="true"><mrow><mi
   mathvariant="bold">Θ</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mn>1</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msup><mo>,</mo><msup><mrow><move
   r accent="true"><mrow><mi mathvariant="bold">Θ</mi></mrow><mo
   stretchy="false">^</mo></mover></mrow><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mn>2</mn></mrow><mo
   stretchy="false">)</mo></mrow></mrow></msup> :MATH]
   and the sequencing depth obtained from all 1,627 cells in the 5
   clusters to generate simulated data according to our generative model
   (10) (See [92]Supplementary File S3 for the value of fitted variational
   parameters and the sequencing depths for the cells.) In order to
   evaluate the performance of our algorithm for data under different
   levels of noise, we divide the sequencing depth in the transcriptomic
   data by 1,2,5,10 respectively to generate simulated datasets with 4
   different levels of noise. We expect that datasets with small sequecing
   depths tend to have low UMI counts, thus high sparsity and a high level
   of noise. For each setting, 5 datasets are generated with different
   seeds.

   We again evaluate the clustering performance of iPoLNG and compare our
   method with scAI and MOFA+. We varied the number of factors K as 5, 20,
   50 for all methods. The boxplots of ARI values for the simulated
   datasets are presented in [93]Figure 5A. When the level of noise is low
   (sequencing depth divided by 1 or 2), both iPoLNG and scAI can reach
   ARI values of nearly 1, which suggests that they can accurately recover
   the cell types in the simulated data. As the level of noise increases,
   the performance of all methods becomes worse as expected, but iPoLNG
   still remains the best method among all settings. We also note that
   iPoLNG is robust to the choice of the number of factors. When K is
   larger than 5, i.e. the number of cell types, the clustering
   performance of iPoLNG does not decrease significantly under small or
   moderate levels of noise.

FIGURE 5.

   [94]FIGURE 5
   [95]Open in a new tab

   Comparison of clustering performance and running time for simulated
   data. (A) ARI scores for K = 5, 20, 50 and the level of noise is
   adjusted by dividing the sequencing depth by 1, 2, 5, and 10. The
   boxplot represents the ARI scores for 5 simulated datasets under the
   same setting. (B) Computational time for iPoLNG, MOFA+ and scAI. MOFA+
   (VI) and scAI were run on a server with Intel Xeon Gold 6246R CPU and
   120 GB RAM. iPoLNG and MOFA+ (SVI) were run on a server with NVIDIA
   Tesla V100 GPU and 80 GB RAM.

   We also evaluate the running time of the methods ([96]Figure 5B). Cells
   are sampled with replacement from the preprocessed 10xPBMC3k dataset to
   generate simulated data with different numbers of cells. With GPU
   acceleration, the running time of iPoLNG for the simulated dataset with
   20,000 cells is 13.9 min for K = 5, 14.7 min for K = 20 and 14.9 min
   for K = 50, which remains the smallest among all the methods under the
   same setting. MOFA+ (SVI) is the second fastest method, but its running
   time is 2–6 times the running time of iPoLNG. The slight change of
   running time across K also illustrates iPoLNG’s running time is robust
   to the number of factors K. By contrast, the running time of MOFA+ and
   scAI can be significantly affected by the number of factors.

4 Discussion

   Single-cell multiomics technologies generate datasets with multi-modal
   measurements from the same set of cells, thus posing significant
   challenges for integrating and characterizing multiple types of
   measurements in a biologically meaningful way. The single-cell data is
   high-dimensional yet intrinsically sparse, and different layers of
   single-cell multiomics data usually exhibit different levels of noise.

   In this study, we introduced iPoLNG, an unsupervised method for
   integrating single-cell multiomics data to dissect the cellular
   heterogeneity from multiple data modalities. From a biological
   perspective, iPoLNG infers two kinds of low-dimensional representations
   of the high-dimensional single-cell multiomics data: one cell by factor
   loading matrix and two feature by factor loading matrices. The cell by
   factor loading matrix can identify distinct cell types and improve
   clustering accuracy compared to other models that reconstruct the
   latent space of cells, and the feature by factor loading matrices can
   characterize cell-type specific markers and facilitate gene ontology
   (GO) enrichment analysis. From a technical perspective, iPoLNG presents
   several advantages. First, it directly models the unique molecular
   identifiers (UMIs) of single-cell multiomics data and takes into
   account the sequencing depths of cells, which suggests the discrete
   counts without any normalization procedure can directly serve as the
   input of the model. Second, as a scalable algorithm, stochastic
   variational inference with GPU acceleration in iPoLNG potentially
   enables the computation of large-scale single-cell datasets with a
   considerably high speed. Third, the hyperparameters that control the
   levels of noise across different data modalities in iPoLNG are
   automatically learned by fitting the PoLNG model to individual data
   modality, which saves the efforts to tune these hyperparameters.

   iPoLNG also exhibits some limitations. First, modelling the discrete
   counts directly suggests that it lacks the flexibility to fit
   continuous data. Second, this method is tailored specifically for
   multi-modal measurements from the same sample space, contrasting with
   some other methods ([97]Stuart et al., 2019; [98]Welch et al., 2019)
   that aim at integrating cells on the same feature space. Third, iPoLNG
   assumes independence between features by a diagonal covariance matrix
   in the Logit-Normal distribution, but genomic features are known to
   show interaction via gene regulatory networks ([99]Duren et al., 2017;
   [100]Colomé-Tatché and Theis, 2018; [101]Delgado and Gómez-Vela, 2019).

   We speculate the future direction of iPoLNG as follows. We may
   incorporate the idea of Deep Exponential Families ([102]Ranganath et
   al., 2015) to model the complex biological structures by adding
   additional layers for the latent factors. The model may also be
   extended to analyze spatial epigenome-transcriptome co-profiling data
   by modelling the information of spatial coordinates with links
   ([103]Chang and Blei, 2009). Additionally, the model may be extended to
   incorporate the regulatory links between transcriptome and epigenome
   ([104]Colomé-Tatché and Theis, 2018).

Funding Statement

   This work has been supported by The Chinese University of Hong Kong
   startup grant (4930181), The Chinese University of Hong Kong Science
   Faculty’s Collaborative Research Impact Matching Scheme (CRIMS
   4620033), and Hong Kong Research Grant Council (ECS 24301419 and GRF
   14301120).

Data availability statement

   Publicly available datasets were analyzed in this study. This data can
   be found here:
   [105]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE140203
   [106]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE152020
   [107]https://www.10xgenomics.com/resources/datasets/pbmc-from-a-healthy
   -donor-granulocytes-removed-through-cell-sorting-10-k-1-standard-2-0-0.
   The processed data can also be accessed from GitHub repository:
   iPoLNG_source. [108]https://github.com/cuhklinlab/iPoLNG_source. iPoLNG
   is implemented in Python, and it is freely available under the LGPL-3.0
   license on GitHub ([109]https://github.com/cuhklinlab/iPoLNG).

Author contributions

   WZ conducted the research. ZL supervised the research. WZ wrote the
   first draft of the manuscript. All authors contributed to manuscript
   revision, read, and approved the submitted version.

Conflict of interest

   The authors declare that the research was conducted in the absence of
   any commercial or financial relationships that could be construed as a
   potential conflict of interest.

Publisher’s note

   All claims expressed in this article are solely those of the authors
   and do not necessarily represent those of their affiliated
   organizations, or those of the publisher, the editors and the
   reviewers. Any product that may be evaluated in this article, or claim
   that may be made by its manufacturer, is not guaranteed or endorsed by
   the publisher.

Supplementary material

   The Supplementary Material for this article can be found online at:
   [110]https://www.frontiersin.org/articles/10.3389/fgene.2023.998504/ful
   l#supplementary-material
   [111]Click here for additional data file.^ (85.2KB, PDF)
   [112]Click here for additional data file.^ (33KB, XLS)
   [113]Click here for additional data file.^ (13MB, ZIP)
   [114]Click here for additional data file.^ (95.3KB, XLSX)

References