Abstract
Background
Multiple co-inertia analysis (mCIA) is a multivariate analysis method
that can assess relationships and trends in multiple datasets. Recently
it has been used for integrative analysis of multiple high-dimensional
-omics datasets. However, its estimated loading vectors are non-sparse,
which presents challenges for identifying important features and
interpreting analysis results. We propose two new mCIA methods: 1) a
sparse mCIA method that produces sparse loading estimates and 2) a
structured sparse mCIA method that further enables incorporation of
structural information among variables such as those from functional
genomics.
Results
Our extensive simulation studies demonstrate the superior performance
of the sparse mCIA and structured sparse mCIA methods compared to the
existing mCIA in terms of feature selection and estimation accuracy.
Application to the integrative analysis of transcriptomics and
proteomics data from a cancer study identified biomarkers that have
been linked to cancer in the literature.
Conclusion
The proposed sparse mCIA achieves simultaneous model estimation and
feature selection and yields analysis results that are more
interpretable than those of the existing mCIA. Furthermore, the
proposed structured sparse mCIA can effectively incorporate prior
network information among genes, resulting in improved feature
selection and enhanced interpretability.
Keywords: Multiple co-inertia analysis, $\ell_0$ penalty, Network penalty,
Structural information, Gene network information, Integrative analysis,
High-dimensional data, -omics data
Background
Large-scale -omics studies have become common, partly as a result of
rapid advances in technologies. Many of them generate multiple -omics
datasets on the same set of subjects. For example, cancer studies
generate datasets using the NCI-60 cell line panel, a group of 60 human
cancer cell lines used by the National Cancer Institute (NCI). Various
types of -omics datasets, such as gene expression or protein abundance,
from this cell line panel are generated and made available via the web
application CellMiner [32]. Another example can be found in The
Cancer Genome Atlas (TCGA) repository, which contains multiple types of
-omics datasets, such as genotype, mRNA, microRNA, and protein abundance
data, collected from the same set of subjects. The abundance of such
datasets has created an increasing need for advanced methods for
integrative analysis beyond separate analyses. Integrative analysis
enables us not only to understand underlying relationships among
multiple datasets but also to discover more biologically meaningful
results that may not be found by analyzing a single dataset. In
response to this need, there have been continuing efforts to develop
such methods.
Tenenhaus and Tenenhaus [36] reviewed various methods for
integrative analysis of multiple datasets from the same set of
subjects. Canonical correlation analysis (CCA) [17] is one popular
method for integrative analysis of two datasets measured on the same
set of subjects. For each of the two datasets, CCA seeks a linear
transformation such that the correlation between the two transformed
datasets is maximized. It is the prototypical method using a
correlation-based objective function. Based on CCA, various extended
methods have been proposed to integrate more than two datasets into a
single model; some examples are [4, 16, 41], [12], and [15].
Covariance-based criteria are another way to construct an objective
function. Tucker's inter-battery factor analysis [38] is the
seminal work on investigating covariance structures between two
datasets. Various approaches have been proposed to extend the method to
an integrative model for more than two datasets; [39], [13, 19],
and [8] are some examples.
Multiple co-inertia analysis [6] is another integrative analysis
method employing a covariance-based objective function to identify
common relationships and assess concordance among multiple datasets.
This method finds a set of loading vectors for K ≥ 2 datasets
and a so-called "synthetic" center of all datasets such that the sum of
squared covariances between each linearly transformed dataset and the
synthetic center is maximized. Recently it has been applied to
integrative analysis of multiple -omics datasets [24]. However, the
estimated loading vectors of mCIA are nonsparse. That is, if we apply
mCIA to analyze two gene expression datasets, every gene in each
dataset receives a nonzero coefficient, making it difficult to
interpret the results. This has been noted as a weakness of the method
[20, 25]. In the statistical literature, sparse estimation has been
suggested as a remedy for this type of problem and has shown good
performance on genomic and biological data [22, 34].
In this paper, we propose a novel approach that imposes a sparsity
constraint on the mCIA method, called sparse mCIA (smCIA). This model
conducts estimation and variable selection simultaneously. Non-sparsity
poses significant challenges not only for developing an accurate model
but also for interpreting the results. Ultra-high dimensionality is
inherent to -omics datasets, so statistical models for analyzing them
benefit from a feature selection procedure. To address this issue, it
is desirable to build sparsity into the model; however, to the best of
our knowledge, sparsity has not been introduced in the mCIA framework.
The regularized generalized CCA framework [37] encompasses many
integrative methods, including mCIA and a sparse version of generalized
CCA, as special cases, but it does not include a sparsity-constrained
mCIA.
We also propose an extension of smCIA, structured sparse mCIA (ssmCIA),
which incorporates structural information among variables to guide the
model toward more biologically meaningful results. It is well known
that gene expression is controlled by the gene regulatory network (GRN)
[31]. Incorporating such prior structural knowledge among genes is
one potential approach to improving analysis results, and there is
continuing interest in developing statistical methods in this direction
[21, 26, 27]. To incorporate structural knowledge, we employ an
additional penalty term in the objective function of smCIA that guides
the model toward improved feature selection.
Methods
Before introducing the two proposed models, we briefly review the
classical mCIA problem.
Suppose that we have K datasets from n subjects, i.e., K data triplets
$(X_k, D, Q_k)$, $k = 1, \dots, K$, with $X_k \in \mathbb{R}^{n \times p_k}$,
$D \in \mathbb{R}^{n \times n}$, $Q_k \in \mathbb{R}^{p_k \times p_k}$, and
weights $w = (w_1, \dots, w_K)$. Here $D$ is a diagonal weight metric of the
space $\mathbb{R}^n$, $Q_k$ is a diagonal weight metric of the space
$\mathbb{R}^{p_k}$, and $w_k$ is a positive weight for the k-th dataset such
that $\sum_k w_k = 1$. Without loss of generality, assume that each $X_k$ is
column-wise centered and standardized.
There are various ways to construct D. The simplest is to use the
identity matrix, i.e., equal weights for all samples. Alternatively, D
can put stronger emphasis on more reliable samples by assigning them
higher weights. Possible sampling bias or duplicated observations can
also be adjusted for by constructing an appropriate D matrix.
Specifically, we can estimate the probability of selection for each
individual in the sample using available covariates and use the inverse
of the estimated probability as that individual's weight. Later, in our
real data analysis, we use the identity matrix for D.
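As an illustration, here is a minimal R sketch of the inverse-probability
construction of D described above. The data frame `df`, its 0/1 `selected`
indicator, the function name, and the normalization to unit sum are our
assumptions for illustration, not specifications from this article.

```r
make_D_ipw <- function(df) {
  # Selection model: P(selected | covariates), using all other columns of df.
  fit <- glm(selected ~ ., data = df, family = binomial)
  pi_hat <- fitted(fit)
  # Inverse-probability weights for the subjects actually in the sample;
  # normalizing the weights to sum to one is an assumption, not from the text.
  w <- 1 / pi_hat[df$selected == 1]
  diag(w / sum(w))
}
```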
For $Q_k$, we use the proportions defined as the column sums of absolute
values divided by the total sum of absolute values of the k-th dataset,
following similar approaches used in the literature [7, 9, 24, 25].
In this way, we put higher weights on genes with higher variability.
Alternatively, we can construct the Q matrices such that genes known to
be associated with a clinical phenotype of interest receive higher
weights. Another possible approach is to construct Q based on
functional annotation, following recent methods originally proposed for
rare variant tests in integrative analysis [3, 14].
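A minimal R sketch of this default Q construction follows; the function
name is ours, and the sketch assumes the diagonal-metric convention used
throughout.

```r
# Q_k as a diagonal matrix of proportions: column sums of |X_k| divided by
# the grand total of |X_k|, so more variable genes receive higher weight.
make_Q <- function(X) {
  diag(colSums(abs(X)) / sum(abs(X)))
}
```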
Multiple co-inertia analysis (mCIA)
The goal of mCIA is to find a set of vectors $u_k \in \mathbb{R}^{p_k}$,
$k = 1, \dots, K$, and a vector $v \in \mathbb{R}^n$, such that the weighted
sum of $(v^\top D X_k Q_k u_k)^2$ is maximized. The objective function of
the mCIA problem is defined as follows:

$$\max_{v, u_1, \dots, u_K} \sum_{k=1}^{K} w_k \left(v^\top D X_k Q_k u_k\right)^2 \quad \text{s.t.} \quad u_k^\top Q_k u_k = 1,\; k = 1, \dots, K, \quad v^\top D v = 1, \tag{1}$$

where $(u_1, \dots, u_K)$ denotes a set of co-inertia loadings (or
coefficients) and $v$ is a synthetic center [24]. The synthetic center
$v$ can be understood as a reference structure in the sample space. The
loading vectors $(u_1, \dots, u_K)$ are the set of coefficients that
maximizes the objective function.
It has been shown that the vector $v$ of problem (1) can be found by
solving the following eigenvalue problem [6]:

$$X_\dagger Q_\dagger X_\dagger^\top D v = \lambda v,$$

where $X_\dagger = \left(w_1^{1/2} X_1, w_2^{1/2} X_2, \dots, w_K^{1/2} X_K\right) \in \mathbb{R}^{n \times \sum p_k}$
is the merged table of the K weighted datasets and
$Q_\dagger \in \mathbb{R}^{\sum p_k \times \sum p_k}$ is the matrix with
$Q_1, \dots, Q_K$ as its diagonal blocks. Given the reference vector $v$
defined above, the loading vectors $u_k$, $k = 1, \dots, K$, are obtained
by $u_k = X_k^\top D v / \|X_k^\top D v\|_{Q_k}$.

The second set of loadings, orthogonal to the first, can be obtained by
repeating the above procedure on the residual datasets calculated using
a deflation method [10, Chap. 7.1.2].
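To make the eigenvalue solution concrete, here is a minimal R sketch of the
first mCIA axis under the simplifying assumption $D = I_n$ and
$Q_k = I_{p_k}$; the function name is ours, not the authors'
implementation.

```r
# First mCIA axis with identity weight metrics: the eigenproblem reduces to
# an ordinary symmetric eigendecomposition of X_dagger X_dagger^T.
mcia_first_axis <- function(X_list, w = rep(1 / length(X_list), length(X_list))) {
  # Merged table X_dagger = (w_1^{1/2} X_1, ..., w_K^{1/2} X_K)
  X_dagger <- do.call(cbind, Map(function(X, wk) sqrt(wk) * X, X_list, w))
  eig <- eigen(tcrossprod(X_dagger), symmetric = TRUE)  # X_dagger X_dagger^T
  v <- eig$vectors[, 1]                                 # synthetic center
  # Loadings u_k = X_k^T v / ||X_k^T v||
  u_list <- lapply(X_list, function(X) {
    u <- drop(crossprod(X, v))
    u / sqrt(sum(u^2))
  })
  list(v = v, u = u_list, lambda = eig$values[1])
}
```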
We propose a new mCIA approach that enforces sparsity on the loading
vectors of all datasets. Consider the following problem, which is
another representation of (1):

$$\max_{b, a_1, \dots, a_K} \sum_{k=1}^{K} \left(b^\top \tilde{X}_k a_k\right)^2 \quad \text{s.t.} \quad a_k^\top a_k = 1, \quad b^\top b = 1, \tag{2}$$

where $\tilde{X}_k = \sqrt{w_k}\, D^{1/2} X_k Q_k^{1/2} \in \mathbb{R}^{n \times p_k}$,
$a_k = Q_k^{1/2} u_k \in \mathbb{R}^{p_k}$, and
$b = D^{1/2} v \in \mathbb{R}^n$. Problem (2) is multi-convex, i.e., it is
convex with respect to each $a_k$ while the other vectors $a_{k'}$,
$k' \neq k$, and $b$ are fixed. This enables us to apply an iterative
algorithm for finding a solution set $(b, a_1, \dots, a_K)$.
First, for fixed $a_k$, $k = 1, \dots, K$, problem (2) becomes

$$\max_{b} \sum_{k=1}^{K} \left(b^\top \tilde{X}_k a_k\right)^2 \quad \text{s.t.} \quad b^\top b = 1, \tag{3}$$

where the objective function is convex with respect to $b$. Indeed, the
above problem can be solved via an eigenvalue decomposition. Consider
the Lagrangian formulation of (3),
$L(b) = \sum_{k=1}^{K} (b^\top \tilde{X}_k a_k)^2 - \lambda (b^\top b - 1)$,
where $\lambda$ is a Lagrange multiplier. To obtain a solution, we take
the derivative of $L$ with respect to $b$ and set it equal to zero:

$$\frac{\partial L}{\partial b} = 2 \sum_{k=1}^{K} \left(b^\top \tilde{X}_k a_k\right) \tilde{X}_k a_k - 2\lambda b = 2 \sum_{k=1}^{K} M_k b - 2\lambda b = 0,$$

where $M_k = \tilde{X}_k a_k a_k^\top \tilde{X}_k^\top \in \mathbb{R}^{n \times n}$.
The optimal $b$ is the first eigenvector of $\sum_{k=1}^{K} M_k$.
Next, to find a solution for $a_1$, we fix $b$ and $a_k$,
$k = 2, \dots, K$. Then we have

$$\max_{a_1} a_1^\top N_1 a_1 \quad \text{s.t.} \quad a_1^\top a_1 = 1, \tag{4}$$

where $N_1 = \tilde{X}_1^\top b b^\top \tilde{X}_1$. Notice that problem (4)
is an eigenvalue decomposition problem: the first eigenvector of $N_1$ is
the optimal $a_1$, and the corresponding eigenvalue is the objective value
at the optimum. The remaining loading vectors $a_2, \dots, a_K$ are
estimated by the same procedure as $a_1$. From the set of estimated
vectors $(\hat{b}, \hat{a}_1, \dots, \hat{a}_K)$, we recover a solution of
the original mCIA problem, $(\hat{v}, \hat{u}_1, \dots, \hat{u}_K)$, by
premultiplying by $D^{-1/2}, Q_1^{-1/2}, \dots, Q_K^{-1/2}$,
respectively.
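A minimal R sketch of these two alternating eigen-updates, assuming the
transformed tables $\tilde{X}_k$ are supplied as a list `Xt`; note that
$N_1 = \tilde{X}_1^\top b b^\top \tilde{X}_1$ is rank one, so its first
eigenvector is simply the normalized $\tilde{X}_1^\top b$. Function names
are ours.

```r
# b-step (3): first eigenvector of sum_k M_k, M_k = (Xt_k a_k)(Xt_k a_k)^T.
update_b <- function(Xt, a_list) {
  M <- Reduce(`+`, Map(function(X, a) tcrossprod(X %*% a), Xt, a_list))
  eigen(M, symmetric = TRUE)$vectors[, 1]
}

# a-step (4): N_k is rank one, so its leading eigenvector is Xt_k^T b, normalized.
update_a <- function(Xt_k, b) {
  u <- drop(crossprod(Xt_k, b))
  u / sqrt(sum(u^2))
}
```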
The subsequent sets of vectors $(v^{(r)}, u_1^{(r)}, \dots, u_K^{(r)})$,
$r = 2, \dots, \min(n, p_1, \dots, p_K)$, which are orthogonal to all
previously estimated sets, can be estimated by applying the same
procedure to the residual data matrices $X_1^{(r)}, \dots, X_K^{(r)}$,
obtained via a deflation technique with respect to the previously
estimated vectors $(v^{(r')}, u_1^{(r')}, \dots, u_K^{(r')})$,
$r' = 1, \dots, r-1$.
Sparse mCIA
To obtain interpretable results, sparsity of the coefficient loading
vectors $(a_1, \dots, a_K)$ is desirable. To this end, we impose a
sparsity constraint on the transformed loading vectors
$a_1, \dots, a_K$. Note that we do not put a sparsity constraint on the
reference vector $b$ in the sample space. Sparsity of
$(a_1, \dots, a_K)$ transfers to the original loading vectors
$(u_1, \dots, u_K)$ because the weight matrices $Q_1, \dots, Q_K$ are
assumed to be diagonal.

Given $b$ and $a_k$, $k = 2, \dots, K$, we propose to add an
$\ell_0$-sparsity constraint to (4) to obtain a sparse estimate of
$a_1$:

$$\max_{a_1} a_1^\top N_1 a_1 \quad \text{s.t.} \quad a_1^\top a_1 = 1, \quad \|a_1\|_0 \le s_1, \tag{5}$$

where $N_1 = \tilde{X}_1^\top b b^\top \tilde{X}_1$ and $s_1$ is a
pre-defined positive integer less than $p_1$.
To tackle problem (5), we utilize the algorithm recently proposed by
[35], the truncated Rayleigh flow method (Rifle), which solves the
maximization of the $\ell_0$-sparsity-constrained generalized Rayleigh
quotient. It is well known that maximizing the generalized Rayleigh
quotient with respect to $\omega \in \mathbb{R}^p$,

$$f(\omega) = \frac{\omega^\top R_1 \omega}{\omega^\top R_2 \omega}, \tag{6}$$

where $R_1, R_2 \in \mathbb{R}^{p \times p}$ are symmetric real-valued
matrices, is equivalent to the generalized eigenvalue problem. Our
objective criterion is a special case of the generalized eigenvalue
problem with $R_1 = N_1$ and $R_2 = I_{p_1}$, which allows us to use
Rifle to solve our problem. The algorithm is a simple iterative
procedure consisting of a gradient step and a hard-thresholding step.
At each iteration, the $s_1$ largest elements of the solution from the
gradient step are kept as nonzero and the others are set to zero. The
same procedure is applied to estimate the remaining loading vectors
$a_2, \dots, a_K$. The complete pseudo-algorithm for the smCIA problem
is summarized in Algorithm 1.
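A minimal R sketch of a Rifle-style update for (5) with $R_2 = I$ follows;
the stepsize handling, iteration count, and function name are ours, so treat
this as an illustration of the gradient-plus-hard-thresholding idea rather
than the authors' exact implementation.

```r
# One-vector solver for (5): gradient step on the Rayleigh quotient of N
# (here R2 = I), followed by hard-thresholding to the s largest entries.
rifle_l0 <- function(N, s, a0, eta = 0.5, n_iter = 100) {
  a <- a0 / sqrt(sum(a0^2))
  for (i in seq_len(n_iter)) {
    rho <- drop(crossprod(a, N %*% a))                 # current Rayleigh quotient
    a <- drop(a + (eta / rho) * (N %*% a - rho * a))   # gradient step
    keep <- order(abs(a), decreasing = TRUE)[seq_len(s)]
    a[-keep] <- 0                                      # keep s largest in magnitude
    a <- a / sqrt(sum(a^2))                            # renormalize to unit length
  }
  a
}
```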
Structured sparse mCIA
We propose another new model that incorporates prior network
information among features. To this end, we add a Laplacian penalty to
the sparse mCIA model to obtain more biologically meaningful results.

Let $\mathcal{G}_1 = \{C_1, E_1, W_1\}$ denote a weighted, undirected graph
of the variables in $X_1$, where $C_1$ is the set of vertices corresponding
to the $p_1$ features (or nodes), $E_1 = \{i \sim j\}$ is the set of edges
connecting features $i$ and $j$, and $W_1$ contains the edge weights. Given
$\mathcal{G}_1 = \{C_1, E_1, W_1\}$, the $(i,j)$-th element of the
normalized Laplacian matrix $L_1$ of $X_1$ is defined by

$$L_1(i,j) = \begin{cases} 1 - w_1(i,j)/d_i, & \text{if } i = j \text{ and } d_i \neq 0, \\ -\, w_1(i,j)/\sqrt{d_i d_j}, & \text{if } i \text{ and } j \text{ are adjacent}, \\ 0, & \text{otherwise}, \end{cases}$$

where $w_1(i,j)$ is the weight of the edge $e = (i \sim j)$ and $d_i$ is the
degree of vertex $i$, defined as $\sum_{i \sim j} w_1(i,j)$. It is easily
shown that the penalty $p(u_1; L_1) = u_1^\top L_1 u_1$ becomes zero if the
prior network information in $L_1$ agrees with the true network among the
variables of $X_1$.
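A minimal R sketch of this normalized Laplacian, assuming the edge weights
are given as a symmetric nonnegative matrix `W` (zero meaning no edge):

```r
# Normalized graph Laplacian following the piecewise definition above.
normalized_laplacian <- function(W) {
  d <- rowSums(W)                    # weighted degree d_i of each vertex
  L <- -W / sqrt(outer(d, d))        # off-diagonal: -w(i,j)/sqrt(d_i d_j)
  L[is.nan(L)] <- 0                  # isolated vertices (d_i = 0) give 0/0
  diag(L) <- ifelse(d != 0, 1 - diag(W) / d, 0)  # diagonal term
  L
}
```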
For fixed $b$ and $a_k$, $k = 2, \dots, K$, consider the following
optimization problem:

$$\max_{a_1} a_1^\top N_1 a_1 - \lambda_1 a_1^\top \tilde{L}_1 a_1 \quad \text{s.t.} \quad a_1^\top a_1 = 1, \quad \|a_1\|_0 \le s_1, \tag{7}$$

where $N_1 = \tilde{X}_1^\top b b^\top \tilde{X}_1$, $s_1$ is a pre-defined
positive integer less than $p_1$, $\lambda_1$ is a pre-defined network
penalty parameter, and $\tilde{L}_1 = Q_1^{-1/2} L_1 Q_1^{-1/2}$ is a
transformed Laplacian matrix that contains the network information among
the variables of $X_1$. Solving (7) requires the network penalty to be
small, which implies that the penalty encourages the estimate of $a_1$
to agree with the network information contained in $\tilde{L}_1$.
We again employ Rifle to solve (7). The objective function of (7)
becomes $a_1^\top R_1 a_1$, where $R_1 = N_1 - \lambda_1 \tilde{L}_1$.
Rifle requires $R_1$ to be symmetric, and $N_1 - \lambda_1 \tilde{L}_1$
satisfies this condition since both $N_1$ and $\tilde{L}_1$ are
symmetric. As in the smCIA algorithm, the remaining loading vectors
$a_2, \dots, a_K$ are estimated in the same way as $a_1$. The complete
pseudo-algorithm for the ssmCIA problem is summarized in Algorithm 1.
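Under the sketch above, the ssmCIA update for $a_1$ reuses `rifle_l0` with
only the input matrix changed; `N1`, `Lt1`, `lambda1`, `s1`, and `a1_init`
below are assumed to be defined as in the preceding sketches or chosen by
the analyst.

```r
# ssmCIA step (7): the same Rifle-style routine applied to the penalized
# (still symmetric) input matrix R1 = N1 - lambda1 * Lt1.
R1 <- N1 - lambda1 * Lt1
a1_hat <- rifle_l0(R1, s = s1, a0 = a1_init)
```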
[Algorithm 1: pseudo-algorithm for smCIA and ssmCIA (figure)]
Choice of tuning parameters
Our methods require K and 2K tuning parameters for smCIA and ssmCIA,
respectively. Denote the set of tuning parameters as

$$\lambda = \begin{cases} \{s_k,\; k = 1, \dots, K\}, & \text{for smCIA}, \\ \{s_k, \lambda_k,\; k = 1, \dots, K\}, & \text{for ssmCIA}. \end{cases}$$
We employ T-fold cross-validation (CV) to select the best tuning
parameter set. We set the range of grid points for each parameter from
several initial trials. We divide each dataset into T subgroups and
calculate the CV objective value, defined as

$$\mathrm{CV}(\lambda) = \frac{(T-1)\sum_{k=1}^{K}\sum_{t=1}^{T}\mathrm{cv}_{t,k}}{\sqrt{T\sum_{k=1}^{K}\sum_{t=1}^{T}\mathrm{cv}_{t,k}^{2} - \left(\sum_{k=1}^{K}\sum_{t=1}^{T}\mathrm{cv}_{t,k}\right)^{2}}},$$
where $\mathrm{cv}_{t,k} = \left(\hat{b}^{-t}(\lambda)^\top \tilde{X}_k^t\, \hat{a}_k^{-t}(\lambda)\right)^2$,
and $\hat{a}_k^{-t}(\lambda)$ and $\hat{b}^{-t}(\lambda)$, $t = 1, \dots, T$,
are the loading and reference vectors estimated from the training data
$\tilde{X}_k^{-t}$ using the tuning parameter set $\lambda$. This can be
considered a scaled version of the CV objective value used in [40].
Unlike CCA, whose correlation values always lie within $[-1, 1]$,
co-inertia values are not restricted to a certain range. We overcome
this by standardizing all co-inertia values used for cross-validation.
There is another set of parameters in the algorithm, the stepsizes
$\eta_k$ of the gradient step. [35] suggests that $\eta_k$ be less than
the reciprocal of the maximum eigenvalue of $R_2$, the matrix in the
denominator of the Rayleigh quotient (6). Since $R_2$ is the identity
matrix in the smCIA and ssmCIA problems, the maximum value of $\eta_k$
is 1. We also tune this value by exploring multiple values within
(0, 1] and selecting the best via cross-validation.
Lastly, we use the nonsparse solution $(u_1, \dots, u_K, v)$ from mCIA
as a starting point.
Simulation study
Synthetic data generation
We use a latent variable model to generate K related synthetic
datasets. Let $\theta$ be a latent variable with
$\theta \sim N(0, \sigma^2)$ that affects K sets of random variables
$x_k = \theta a_k^\top + \varepsilon_k^\top \in \mathbb{R}^{p_k}$,
$k = 1, \dots, K$, where $\varepsilon_k \sim N(0_{p_k}, \Sigma_k)$ and
$a_k$ is set to be the first eigenvector of the matrix $\Sigma_k$. Then
the following calculation,

$$E\left(X_k^\top b b^\top X_k\right) = \sum_{i}^{n} b_i^2\, E\left[\left(\theta_i a_k + \varepsilon_{k,i}\right)\left(\theta_i a_k + \varepsilon_{k,i}\right)^\top\right] = E\left[\theta^2 a_k a_k^\top + \theta a_k \varepsilon_{k,i}^\top + \theta \varepsilon_{k,i} a_k^\top + \varepsilon_{k,i} \varepsilon_{k,i}^\top\right] = \sigma^2 a_k a_k^\top + \Sigma_k = \left(\sigma^2 + \gamma_1\right) a_k a_k^\top + \sum_{j=2}^{p_k} \gamma_j e_j e_j^\top,$$

verifies that $a_k$ equals $e_1$, the first eigenvector of the matrix
$E\left(X_k^\top b b^\top X_k\right)$, with corresponding eigenvalue
$n\sigma^2 + \gamma_1$, where $(\gamma_j, e_j)$, $j = 1, \dots, p_k$,
are the eigen-pairs of $\Sigma_k$.
The following calculation gives the cross-covariance matrices in the
model:

$$E\left(X_l^\top b b^\top X_m\right) = \sum_{i}^{n} b_i^2\, E\left[\left(\theta_i a_l + \varepsilon_{l,i}\right)\left(\theta_i a_m + \varepsilon_{m,i}\right)^\top\right] = E\left[\theta^2 a_l a_m^\top + \theta a_l \varepsilon_m^\top + \theta \varepsilon_l a_m^\top + \varepsilon_l \varepsilon_m^\top\right] = \sigma^2 a_l a_m^\top.$$
Our complete generative model simulates a concatenated dataset
$X^\top = \left(X_1^\top\, X_2^\top \cdots X_K^\top\right) \in \mathbb{R}^{\sum p_k \times n}$
from the normal distribution with mean $0_{\sum p_k}$ and variance
$\Sigma_T \in \mathbb{R}^{\sum p_k \times \sum p_k}$, where

$$\Sigma_T = \begin{pmatrix} \sigma^2 a_1 a_1^\top + \Sigma_1 & \sigma^2 a_1 a_2^\top & \cdots & \sigma^2 a_1 a_K^\top \\ \sigma^2 a_2 a_1^\top & \sigma^2 a_2 a_2^\top + \Sigma_2 & \cdots & \sigma^2 a_2 a_K^\top \\ \vdots & \vdots & \ddots & \vdots \\ \sigma^2 a_K a_1^\top & \sigma^2 a_K a_2^\top & \cdots & \sigma^2 a_K a_K^\top + \Sigma_K \end{pmatrix}.$$
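A minimal R sketch of this generative model, sampling each block from the
latent-variable form $x_k = \theta a_k^\top + \varepsilon_k^\top$ (equivalent
to drawing from $\Sigma_T$); the dimensions and noise covariances below are
illustrative, not the paper's exact settings.

```r
library(MASS)  # for mvrnorm

simulate_blocks <- function(n, a_list, Sigma_list, sigma2) {
  theta <- rnorm(n, sd = sqrt(sigma2))   # one latent value per sample
  Map(function(a, Sigma) {
    E <- mvrnorm(n, mu = rep(0, length(a)), Sigma = Sigma)
    theta %o% a + E                      # row i: theta_i * a_k^T + eps_i^T
  }, a_list, Sigma_list)
}

# Example: two blocks with AR(1)-type noise covariance; a_k is the first
# eigenvector of Sigma_k, as in the simulation design.
p <- c(30, 40)
Sigma_list <- lapply(p, function(pk) 0.3^abs(outer(1:pk, 1:pk, "-")))
a_list <- lapply(Sigma_list, function(S) eigen(S, symmetric = TRUE)$vectors[, 1])
X <- simulate_blocks(n = 50, a_list = a_list, Sigma_list = Sigma_list, sigma2 = 1.2)
```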
Simulation design
We consider various simulation designs to compare the performance of
smCIA and ssmCIA with mCIA. We compare our methods with mCIA only,
since the objective functions of other integrative methods, such as
generalized CCA or other covariance-based methods, differ from that of
mCIA, so a direct comparison would be inappropriate.
We assume that there exist multiple networks among the genes in each
dataset and that the networks affect the relationships between
datasets. We obtain 8 design scenarios by varying three conditions:
* $\sigma^2$, the variance of the latent variable;
* $n_{el}$, the number of elements in each network;
* $n_{en}$, the number of effective networks among all networks.
We generate 100 Monte Carlo (MC) datasets. For each MC dataset, we
generate n = 200 observations of three random vectors
$x_1 \in \mathbb{R}^{300}$, $x_2 \in \mathbb{R}^{400}$, and
$x_3 \in \mathbb{R}^{500}$. There are 5 networks among each of $x_1$,
$x_2$, and $x_3$, with $n_{el} = 10$ or 20 elements in each network.
Among the $n_{el}$ genes of each network, the first indexed gene is the
hub gene connected to all other genes within the network. This means
that the first indexed gene of each network in the design with
$n_{el} = 20$ has a higher weight than in the design with
$n_{el} = 10$. For the number of effective networks $n_{en}$, we
consider two cases: one assumes that only some networks affect the
relationships among datasets, $n_{en} = (3,4,5)$, while the other
assumes that all existing networks do, $n_{en} = (5,5,5)$. We also
consider two values $\sigma^2 \in \{1.2, 2.5\}$; a higher $\sigma^2$
leads to a larger first eigenvalue of
$E\left(X_k^\top b b^\top X_k\right)$.
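For concreteness, here is a minimal R sketch of one such star-shaped network
block (a hub gene connected to all other members), placed block-diagonally
to build the full adjacency matrix; unit edge weights are our assumption.
The result can be fed to `normalized_laplacian()` above.

```r
# Star-shaped network of n_el genes: gene 1 is the hub connected to the rest.
star_block <- function(n_el) {
  W <- matrix(0, n_el, n_el)
  W[1, -1] <- W[-1, 1] <- 1   # unit weights between hub and members (assumed)
  W
}

# Adjacency for 5 networks of 10 genes among p = 300 genes; genes outside
# the networks remain unconnected.
W_full <- matrix(0, 300, 300)
for (j in 1:5) {
  idx <- ((j - 1) * 10 + 1):(j * 10)
  W_full[idx, idx] <- star_block(10)
}
```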
All true loadings make the network penalty zero. Thus we expect ssmCIA
to perform better than smCIA, since ssmCIA is encouraged to estimate
coefficient loadings that minimize the network penalty. All simulation
scenarios and the corresponding true coefficient loadings are
summarized in Table 1. In addition, we consider incorporating incorrect
network information in the first scenario to show the robustness of
ssmCIA; results of these additional simulation studies can be found in
the supplementary materials.
Table 1.
Simulation designs for each scenario and corresponding true loading
vectors. All true vectors are normalized to have $\ell_2$-norm 1

$n_{en} = (3,4,5)$: scenario 1 ($\sigma^2 = 1.2$, $n_{el} = 10$),
scenario 2 ($\sigma^2 = 1.2$, $n_{el} = 20$), scenario 3
($\sigma^2 = 2.5$, $n_{el} = 10$), scenario 4 ($\sigma^2 = 2.5$,
$n_{el} = 20$). True loadings:
$a_1 = (1_3, 0_{27})^\top \otimes 1_{n_{el}}$,
$a_2 = (1_4, 0_{36})^\top \otimes 1_{n_{el}}$,
$a_3 = (1_5, 0_{45})^\top \otimes 1_{n_{el}}$.

$n_{en} = (5,5,5)$: scenario 5 ($\sigma^2 = 1.2$, $n_{el} = 10$),
scenario 6 ($\sigma^2 = 1.2$, $n_{el} = 20$), scenario 7
($\sigma^2 = 2.5$, $n_{el} = 10$), scenario 8 ($\sigma^2 = 2.5$,
$n_{el} = 20$). True loadings:
$a_1 = (1_5, 0_{25})^\top \otimes 1_{n_{el}}$,
$a_2 = (1_5, 0_{35})^\top \otimes 1_{n_{el}}$,
$a_3 = (1_5, 0_{45})^\top \otimes 1_{n_{el}}$.
Performance measures
To compare the feature selection performance of our methods in the
simulations, we use sensitivity (SENS), specificity (SPEC), and
Matthews correlation coefficient (MCC), defined as follows:

$$\mathrm{SENS} = \frac{TP}{TP + FN}, \quad \mathrm{SPEC} = \frac{TN}{FP + TN}, \quad \mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}},$$

where TP, TN, FP, and FN are the numbers of true positives, true
negatives, false positives, and false negatives, respectively. To
compare the estimation performance of our methods and mCIA, we also
calculate the angle measure between the estimated loading vectors
$\hat{a}_k$ and the true loading vectors $a_k^*$, $k = 1, 2, 3$,
defined as

$$\angle(\hat{a}_k) = \frac{\hat{a}_k^\top a_k^*}{\|\hat{a}_k\|_2 \times \|a_k^*\|_2},$$

i.e., the cosine of the angle between the two vectors. If the two
vectors are exactly the same, this measure equals 1.
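A minimal R sketch of these measures, comparing the support of an estimated
loading against the truth; the function and argument names are ours.

```r
# SENS/SPEC/MCC on the selected support, plus the cosine "angle" measure.
perf_measures <- function(a_hat, a_star) {
  est <- a_hat != 0
  tru <- a_star != 0
  TP <- sum(est & tru);  TN <- sum(!est & !tru)
  FP <- sum(est & !tru); FN <- sum(!est & tru)
  mcc_den <- sqrt(TP + FP) * sqrt(TP + FN) * sqrt(TN + FP) * sqrt(TN + FN)
  c(SENS  = TP / (TP + FN),
    SPEC  = TN / (FP + TN),
    MCC   = (TP * TN - FP * FN) / mcc_den,
    Angle = sum(a_hat * a_star) / (sqrt(sum(a_hat^2)) * sqrt(sum(a_star^2))))
}
```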
Simulation results
Simulation results are summarized in Tables 2 and 3. First, the
estimation performance of our proposed methods is superior to that of
mCIA, as evidenced by the calculated angle values: an angle value is
close to 1 when the estimated loading vector is close to the true
loading vector, and the angle values from our methods are closer to 1
than those from mCIA. Second, ssmCIA performs better than smCIA in
feature selection. Note that, in our simulation scenarios, the true
loadings are designed to follow the pre-defined network structure of
the synthetic data, so we expect better performance from ssmCIA than
from smCIA. In all scenarios, ssmCIA outperforms smCIA in all aspects:
SENS, SPEC, MCC, and even angle.
Table 2.
Simulation results for sparse mCIA. Sensitivity (Sens), specificity
(Spec), and Matthews correlation coefficient (MCC) measure feature
selection performance; Angle measures estimation performance. 5-fold
cross-validation is used to choose the best tuning parameter
combination for each method. Values in parentheses are standard errors
scen | smCIA a[1]: Sens Spec MCC Angle | smCIA a[2]: Sens Spec MCC Angle |
smCIA a[3]: Sens Spec MCC Angle | mCIA: Angle(a[1]) Angle(a[2]) Angle(a[3])
1 0.675 0.991 0.754 0.885 0.74 0.991 0.803 0.901 0.77 0.991 0.82 0.905
0.882 0.847 0.830
(0.285) (0.018) (0.161) (0.081) (0.205) (0.014) (0.102) (0.052) (0.155)
(0.012) (0.071) (0.037) (0.025) (0.028) (0.025)
2 0.754 0.974 0.781 0.901 0.759 0.966 0.762 0.886 0.755 0.96 0.743
0.875 0.879 0.847 0.833
(0.130) (0.032) (0.058) (0.028) (0.089) (0.027) (0.046) (0.024) (0.071)
(0.022) (0.041) (0.021) (0.024) (0.027) (0.023)
3 0.711 0.996 0.794 0.904 0.776 0.996 0.846 0.924 0.813 0.996 0.87
0.933 0.933 0.915 0.897
(0.316) (0.012) (0.200) (0.095) (0.231) (0.009) (0.134) (0.066) (0.177)
(0.007) (0.096) (0.047) (0.011) (0.011) (0.015)
4 0.826 0.982 0.848 0.937 0.846 0.981 0.857 0.936 0.845 0.977 0.845
0.928 0.933 0.915 0.897
(0.145) (0.029) (0.069) (0.033) (0.100) (0.022) (0.040) (0.020) (0.077)
(0.020) (0.040) (0.018) (0.011) (0.011) (0.015)
5 0.771 0.986 0.816 0.908 0.763 0.989 0.812 0.902 0.764 0.991 0.819
0.903 0.882 0.847 0.83
(0.162) (0.020) (0.079) (0.042) (0.159) (0.015) (0.077) (0.040) (0.157)
(0.012) (0.074) (0.039) (0.024) (0.028) (0.025)
6 0.812 0.93 0.757 0.897 0.783 0.944 0.742 0.879 0.767 0.954 0.738
0.871 0.883 0.85 0.825
(0.081) (0.042) (0.046) (0.023) (0.078) (0.031) (0.049) (0.023) (0.078)
(0.024) (0.051) (0.027) (0.023) (0.025) (0.03)
7 0.839 0.99 0.873 0.941 0.836 0.993 0.875 0.938 0.837 0.994 0.878
0.939 0.933 0.912 0.9
(0.161) (0.017) (0.087) (0.043) (0.159) (0.013) (0.083) (0.041) (0.160)
(0.010) (0.082) (0.042) (0.011) (0.014) (0.013)
8 0.88 0.959 0.851 0.942 0.865 0.968 0.847 0.933 0.863 0.975 0.854
0.933 0.933 0.913 0.899
(0.077) (0.039) (0.044) (0.017) (0.076) (0.026) (0.040) (0.017) (0.071)
(0.021) (0.036) (0.015) (0.011) (0.014) (0.013)
Table 3.
Simulation results for structured sparse mCIA. Sensitivity (Sens),
specificity (Spec), and Matthews correlation coefficient (MCC) measure
feature selection performance; Angle measures estimation performance.
5-fold cross-validation is used to choose the best tuning parameter
combination for each method. Values in parentheses are standard errors
scen | ssmCIA a[1]: Sens Spec MCC Angle | ssmCIA a[2]: Sens Spec MCC Angle |
ssmCIA a[3]: Sens Spec MCC Angle | mCIA: Angle(a[1]) Angle(a[2]) Angle(a[3])
1 0.71 0.994 0.786 0.897 0.767 0.993 0.827 0.913 0.79 0.992 0.837 0.915
0.882 0.847 0.830
(0.284) (0.011) (0.166) (0.088) (0.204) (0.009) (0.106) (0.056) (0.154)
(0.008) (0.073) (0.041) (0.025) (0.028) (0.025)
2 0.79 0.979 0.814 0.918 0.787 0.97 0.789 0.901 0.774 0.962 0.761 0.885
0.879 0.847 0.833
(0.127) (0.021) (0.058) (0.030) (0.089) (0.018) (0.046) (0.024) (0.068)
(0.016) (0.041) (0.022) (0.024) (0.027) (0.023)
3 0.748 0.995 0.816 0.915 0.807 0.996 0.863 0.934 0.838 0.996 0.884
0.941 0.933 0.915 0.897
(0.300) (0.010) (0.186) (0.092) (0.221) (0.008) (0.126) (0.064) (0.171)
(0.006) (0.091) (0.047) (0.011) (0.011) (0.015)
4 0.854 0.987 0.875 0.947 0.867 0.984 0.877 0.945 0.862 0.979 0.861
0.937 0.933 0.915 0.897
(0.142) (0.016) (0.072) (0.034) (0.097) (0.014) (0.042) (0.021) (0.074)
(0.013) (0.038) (0.018) (0.011) (0.011) (0.015)
5 0.798 0.986 0.833 0.919 0.791 0.989 0.831 0.913 0.793 0.992 0.838
0.915 0.882 0.847 0.83
(0.162) (0.016) (0.075) (0.042) (0.162) (0.012) (0.076) (0.043) (0.160)
(0.009) (0.073) (0.042) (0.024) (0.028) (0.025)
6 0.83 0.939 0.781 0.911 0.803 0.951 0.768 0.893 0.785 0.959 0.76 0.884
0.883 0.85 0.825
(0.069) (0.029) (0.042) (0.020) (0.069) (0.020) (0.043) (0.021) (0.065)
(0.017) (0.043) (0.024) (0.023) (0.025) (0.03)
7 0.852 0.993 0.887 0.947 0.848 0.994 0.886 0.944 0.849 0.996 0.89
0.945 0.933 0.912 0.9
(0.158) (0.011) (0.087) (0.044) (0.157) (0.008) (0.083) (0.043) (0.156)
(0.006) (0.081) (0.043) (0.011) (0.014) (0.013)
8 0.873 0.968 0.859 0.945 0.861 0.975 0.857 0.938 0.86 0.981 0.864
0.937 0.933 0.913 0.899
(0.076) (0.025) (0.039) (0.018) (0.077) (0.017) (0.039) (0.018) (0.072)
(0.014) (0.035) (0.016) (0.011) (0.014) (0.013)
By comparing the results of different scenarios, we make several
observations driven by the nature of our generative model. First,
performance is better in scenarios 3, 4, 7, and 8 than in scenarios 1,
2, 5, and 6, respectively. This agrees with our expectation: a larger
$\sigma^2$ yields a larger first eigenvalue of the matrix
$X_k^\top b b^\top X_k$, which helps the algorithm detect the
eigenvector that estimates the true loading vector.

Second, in terms of sensitivity, the ssmCIA results from the scenarios
with $n_{en} = (5,5,5)$ are better than those with $n_{en} = (3,4,5)$,
and the results from the scenarios with $n_{el} = 10$ are better than
those with $n_{el} = 20$. Again, this agrees with the nature of our
generative model: the true loading vectors in the scenarios with
$n_{en} = (3,4,5)$ have larger nonzero elements than those with
$n_{en} = (5,5,5)$, and the coefficients of connected variables in the
network are larger in the scenarios with $n_{el} = 10$ than in those
with $n_{el} = 20$.
Data analysis
NCI60 dataset
The NCI60 dataset includes a panel of 60 diverse human cancer cell
lines used by the Developmental Therapeutics Program (DTP) of the U.S.
National Cancer Institute (NCI) to screen over 100,000 chemical
compounds and natural products. It covers 9 cancer types: leukemia,
melanoma, ovarian, renal, breast, prostate, colon, lung, and CNS.
Various -omics datasets have been generated from the cell line panel,
including gene expression datasets from several platforms, protein
abundance datasets, and methylation datasets.
The goal of the analysis is to identify a subset of biomarker genes
that contributes to explaining the common relationships among multiple
datasets. We downloaded three datasets generated using the NCI-60 cell
lines from CellMiner [32], two of which were gene expression datasets
and the third a protein abundance dataset. The two gene expression
datasets were obtained from different technologies: one from the
Affymetrix HG-U133 chips [33] and the other from the Agilent Whole
Human Genome Oligo Microarray [23]. The third dataset was a proteomics
dataset generated using high-density reverse-phase lysate microarrays
[29]. Since one melanoma cell line was not available in the Affymetrix
data, we used the 59 cell lines common to all three datasets. To reduce
the computational burden, we selected the top 5% of genes by variance,
which resulted in 491 genes in the Affymetrix data, 488 genes in the
Agilent data, and 94 proteins in the proteomics data. Pathway graph
information was obtained from the Kyoto Encyclopedia of Genes and
Genomes (KEGG) pathway database [18].
Analysis results
Table 4 shows the number of nonzero elements in each estimated loading
and the percentage of data variability explained by each method. Our
sparse methods show comparable or better performance in terms of
explained variability with a much smaller number of nonzero elements in
the estimated loadings. The percentage of explained variability is
calculated as the ratio of the pseudo-eigenvalues corresponding to the
estimated loading vectors to the sum of the total eigenvalues of the
datasets. We applied the estimated loading vectors to the test dataset
and to the whole dataset to calculate the percentage of explained
variability. Applied to the whole dataset, our sparse methods explain
almost the same amount of variability as mCIA with far fewer selected
genes. Applied to the test dataset, both sparse methods explain a
comparable amount of variability to mCIA for the first estimated
loading vector. Moreover, the first two loading vectors of ssmCIA
explain more variability than those of mCIA with much more sparsely
estimated loadings.
Table 4.
For each method, the first two columns show the number of nonzero
elements in the first two estimated coefficient loadings of the three
datasets (Affymetrix, Agilent, and protein, respectively). The next
four columns contain pseudo-eigenvalues calculated using the
coefficient loadings estimated from the training dataset. The last four
columns give the proportions of the pseudo-eigenvalues to the sum of
the total eigenvalues for each dataset
Method | # of nonzeros (1st; 2nd) | Pseudo-eigenvalues: test dataset
(1st; 1st+2nd), whole dataset (1st; 1st+2nd) | % of variability
explained: test dataset (1st; 1st+2nd), whole dataset (1st; 1st+2nd)
mCIA (491,488,94) (491,488,94) 36065.92 33447.03 282991.70 218372.50
0.088 0.169 0.129 0.229
smCIA (250,30,20) (100,80,15) 31161.89 21283.77 208966.30 157045.80
0.076 0.127 0.095 0.167
ssmCIA (300,80,15) (400,15,30) 34611.11 36793.08 239050.80 239050.80
0.084 0.173 0.109 0.218
Four plots generated using the first two estimated loading vectors from
each method are shown in Fig. 1. The plots in the first column are 3-D
figures in which each point represents one cell line sample; the
coordinates of each point are the scores calculated using the first
estimated loading vectors of the three datasets. The plots in the
second through fourth columns are generated using the first two
estimated loading vectors in the variable space of each dataset.
Fig. 1.
From top to bottom, the rows show the results from the mCIA, smCIA, and
ssmCIA methods, respectively. From left to right, the columns represent
the sample space in $\mathbb{R}^n$, the gene space of the Affymetrix
dataset in $\mathbb{R}^{491}$, the gene space of the Agilent dataset in
$\mathbb{R}^{488}$, and the gene space of the proteomics dataset in
$\mathbb{R}^{94}$. For the three panels in the first column, the
estimates of the first loading vectors are used. Colors indicate cancer
types: breast (BR), melanoma (ME), colon (CO), ovarian (OV), renal
(RE), lung (LC), central nervous system (CNS, glioblastoma), prostate
(PR) cancers, and leukemia (LE). For the remaining plots, the estimates
of the first two loading vectors are used; colored and labeled points
are the 20 genes most distant from the origin, which are more
influential than the other genes. Complete lists of the top 20 genes
for each panel can be found in the supplementary materials
Figure 1 shows that the sparse estimates from our proposed methods
reveal biologically meaningful results consistent with previous studies
[7, 24]. In the 3-D plot, leukemia cells are well separated from the
other cells, and we confirmed that smCIA and ssmCIA select certain
genes related to leukemia. For example, SPARC has a high weight on both
axes of the Affymetrix plot in the mCIA, smCIA, and ssmCIA analyses; a
recent study showed that this gene promotes the growth of leukemic
cells [1]. EPCAM is another example, having a high negative weight on
the second axis of the mCIA and ssmCIA plots for the Affymetrix
dataset; this gene is known to be frequently over-expressed in patients
with acute myeloid leukemia (AML) [42]. The gene EBF1, yet another
example, has a high weight on the second axis of the ssmCIA plot for
the Agilent data, which is supported by recent studies discussing the
relationship between this gene and leukemia [30]. These observations
also imply that the second axis of the ssmCIA analysis may help
separate the dataset into leukemia and non-leukemia cells. Comparing
the results of smCIA and ssmCIA, we notice that the ssmCIA results are
more consistent with those of mCIA, in terms of both the number of
common genes and the estimated coefficients of those common genes.
Among the top 30 genes in each dataset, smCIA selected 40 genes in
common with mCIA, while ssmCIA selected 56. Moreover, for genes shared
with mCIA, the ssmCIA estimates show consistent directions, whereas
some genes from smCIA show directions different from the mCIA results.
From these observations, we confirm that incorporating network
information guides the model toward more biologically meaningful
estimates.
In addition, we conducted a pathway enrichment analysis using the
ToppGene Suite [5] to assess the set of features selected by our
methods; note that we compare the results using only the first
estimated loading vectors. Genes with nonzero values in the estimated
loading vectors are enriched for numerous gene ontology (GO) terms,
pathways, and diseases. For example, the GO term "regulation of cell
proliferation" is highly enriched in our results (GO:0042127;
Bonferroni-adjusted p-values are 5.77e-16 for smCIA and 7.52e-19 for
ssmCIA). Leukemia cell proliferation is a topic of interest to
researchers [2, 28]; recently, [11] reviewed the molecular mechanisms
related to cell proliferation in leukemia. The pathway enrichment
results also confirm that ssmCIA benefits from incorporating network
information. Compared to the results from smCIA, the enrichment results
of ssmCIA often show much smaller Bonferroni-adjusted p-values;
GO:0042127 above is one example. We also obtained more enriched terms
from ssmCIA than from smCIA: 673 enriched GO terms, pathways, human
phenotypes, and diseases for ssmCIA versus 520 for smCIA. These results
indicate that, by incorporating structural information, ssmCIA is more
sensitive in selecting relevant features, so more biologically
meaningful genes can be identified.
Discussion
For integrative analysis of K datasets, the number of tuning parameters
is K for smCIA and 2K for ssmCIA. As such, the computational cost of
the methods can become prohibitive for integrative analysis of a large
number of -omics datasets under the proposed cross-validation strategy
for parameter tuning. One potential solution is to use the same pair of
tuning parameter values for all K datasets. Addressing this limitation
is of interest for future research.
Conclusion
In this article, we propose the smCIA method, which imposes a sparsity
penalty on the mCIA loading vectors, and the ssmCIA method, which
employs an additional network-based penalty to incorporate biological
information represented by a graph. Our numerical studies demonstrate
that both methods are useful for integrative analysis of multiple
high-dimensional datasets. In particular, they yield sparse estimates
of the loading vectors while explaining an amount of data variance
similar to that of mCIA. In the real data analysis, ssmCIA, by
incorporating biological information, is able to select important
pathways contributing to the correspondence among the three datasets,
and hence yields more interpretable results.
Abbreviations
mCIA
Multiple co-inertia analysis
smCIA
Sparse multiple co-inertia analysis
ssmCIA
Structured sparse multiple co-inertia analysis
NCI
National Cancer Institute
CCA
Canonical correlation analysis
Rifle
Truncated Rayleigh flow method
GO
Gene ontology
Authors’ contributions
QL and EJ formulated the ideas and revised the paper. EJ designed the
experiments, performed the experiments, analyzed the data, and wrote
the first draft. Both authors read and approved the final manuscript.
Funding
This work is partly supported by NIH grants P30CA016520, R01GM124111,
and RF1AG063481. The content is the responsibility of the authors and
does not necessarily represent the views of NIH.
Availability of data and materials
Our algorithms are implemented in the free statistical software R and
are freely available at
https://www.med.upenn.edu/long-lab/software.html. The three -omics
datasets used for the real data analysis can be obtained from the
CellMiner webpage, https://discover.nci.nih.gov/cellminer/.
Additional simulation results and the list of top 30 genes from the
NCI60 data analysis can be found in the supplementary materials.
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
References