Abstract

   The rapid development of high-throughput sequencing techniques provides
   an unprecedented opportunity to generate biological insights into
   microbiome-related diseases. However, the relationships among microbes,
   metabolites and human microenvironment are extremely complex, making
   data analysis challenging. Here, we present NMFGOT, which is a
   versatile toolkit for the integrative analysis of microbiome and
   metabolome data from the same samples. NMFGOT is an unsupervised
   learning framework based on nonnegative matrix factorization with graph
   regularized optimal transport, where it utilizes the optimal transport
   plan to measure the probability distance between microbiome samples,
   which better dealt with the nonlinear high-order interactions among
   microbial taxa and metabolites. Moreover, it also includes a spatial
   regularization term to preserve the spatial consistency of samples in
   the embedding space across different data modalities. We implemented
   NMFGOT in several multi-omics microbiome datasets from multiple
   cohorts. The experimental results showed that NMFGOT consistently
   performed well compared with several recently published multi-omics
   integrating methods. Moreover, NMFGOT also facilitates downstream
   biological analysis, including pathway enrichment analysis and
   disease-specific metabolite-microbe association analysis. Using NMFGOT,
   we identified the significantly and stable metabolite-microbe
   associations in GC and ESRD diseases, which improves our understanding
   for the mechanisms of human complex diseases.

   Subject terms: Microbial communities, Metagenomics

Introduction

   With the rapid evolution of multi-omics microbiome technologies, it
   becomes easy to collect heterogeneous data to explore biological
   questions and generate biological hypothesis. 16S ribosomal RNA gene
   amplicon and whole metagenomic shotgun sequencing (WMGS)^[26]1 are used
   to detect the taxonomic composition. Simultaneously, high-throughput
   untargeted and targeted technologies can also be used to estimate the
   abundance of metabolites in the same sample^[27]2. The paired
   metagenomic and metabolomic profiles from the same samples provide an
   unprecedented opportunity to explore biological mechanisms and
   cross-omics feature links in microbiome-related diseases.

   Increasing evidences have shown that the microbial community in the
   human body widely involves in metabolic activities and plays an
   important role in host health and diseases^[28]3–[29]5. The microbial
   metabolic activities break down some indigestible carbohydrates and
   synthesize vitamins that are beneficial to host^[30]6. The bacterial
   metabolites promote gut homeostasis, and may also lead to
   gastrointestinal and systemic diseases^[31]7. However, findings from a
   single data view or modality usually ignore the complementary and
   compatible information from other views. For example, host genes and
   gut microbiota would act in a coordinated way when they are involved in
   common biological functions^[32]8. The metabolites generated in some
   bacterial taxa from the cockroach gut have antagonistic activity
   against certain pathogens^[33]9. These studies provided important
   evidences of metabolite-microbe interactions, but a knowledge gap still
   largely remains for entirety character the landscape of
   metabolite-microbe associations. This gap stems from the incompleteness
   of data, the limited characterization of bacterial genes, the metabolic
   “dark matter” (yet uncharacterized metabolites) and so on. Hence, there
   is an essential need for computational approaches and tools that can
   effectively integrate microbiome and metabolome data to comprehensively
   identify the underlying associations and patterns in the data.

   Recently, a few multivariate integration methods have been developed to
   analyze multi-omics data obtained from the same sample^[34]4,[35]10,
   including SNF^[36]11, DIABLO^[37]12, and MVCPM^[38]13, but these
   methods were not initially designed for microbiome-metabolome joint
   analysis. The integration methods for paired metagenomic and
   metabolomic profiles mainly contain CCA-based framework and its
   variants: SCCA^[39]14,[40]15 and DCCAE^[41]16. SCCA and DCCAE assume
   that the projections of two sets of observations are lineally
   correlated and try to establish confident microbe-metabolite
   associations. DCCAE introduced two autoencoders and minimized the
   combination of CCA objective and the reconstruction error of the
   autoencoders. SNF and MVCPM are graph-based multi-omics integration
   methods, and assume that there exists a consensus sample similarity
   network obtained by iteratively fusing operation. However, these
   assumptions may not be realistic, because the interplay between
   bacteria, metabolites, and other factors in the complex ecosystem,
   renders it difficult to character the landscape of metabolite-microbe
   interactions with linear mapping.

   In this work, we propose NMFGOT for the integrative analysis of
   multi-omics data, where microbiome and metabolome data are parallelly
   profiled from the same sample. NMFGOT is a versatile toolkit that
   enables clustering of the samples and facilitates downstream biological
   analysis, including pathway enrichment analysis and metabolite-microbe
   association analysis. NMFGOT is a novel unsupervised learning framework
   based on nonnegative matrix factorization with graph regularized
   optimal transport. Unlike CCA-based methods^[42]12,[43]14,[44]16,[45]17
   which assume linear projection of two sets of observations, NMFGOT uses
   the optimal transport plan to measure the probability distance between
   microbiome samples, which better deals with the nonlinear high-order
   interactions between microbial taxa and metabolites. NMFGOT not only
   integrates the complementary information from different data
   modalities, but also includes a spatial regularization term to preserve
   the spatial consistency of samples in the embedding space. Through
   analyzing on three microbiome-related multi-omics datasets from
   different tissues (including gastric and gut) and diseases (including
   gastric cancer, end-stage renal disease, and inflammatory bowel
   disease), we show that NMFGOT is effective in distinguishing sample
   types: NMFGOT achieves superior performance in clustering and
   visualization. The factors for metabolome obtained in NMFGOT provide
   rich biological significances: they are enriched for disease-specific
   biological pathways, which are directly related to disease development.
   NMFGOT also includes a model that infers disease-specific
   metabolite-microbe associations, which is based on lasso penalized
   regression model^[46]18 and stability selection approach^[47]19. An
   overview of NMFGOT is shown in Fig. [48]1.

Fig. 1. An illustration of NMFGOT.

   [49]Fig. 1
   [50]Open in a new tab

   NMFGOT is designed for joint analysis of microbiome and metabolome data
   obtained from the same sample. a The microbial abundance and metabolite
   abundance profile matrices
   [MATH: <msup><mrow><mi>X</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup> :MATH]
   and
   [MATH: <msup><mrow><mi>X</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup> :MATH]
   are the inputs of NMFGOT (row represents microbial taxa or metabolite,
   and column represents sample). b By iteration fusion, NMFGOT learns the
   microbe loading matrix
   [MATH: <msup><mrow><mi>W</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup> :MATH]
   , the metabolite loading matrix
   [MATH: <msup><mrow><mi>W</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup> :MATH]
   , the clustering indicator matrices
   [MATH: <msup><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup> :MATH]
   and
   [MATH: <msup><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup> :MATH]
   , and the sample-sample similarity matrix
   [MATH: <mi>S</mi> :MATH]
   . c The enrichment analysis provides biological insights of the top
   metabolites identified in the matrix
   [MATH: <msup><mrow><mi>W</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup> :MATH]
   . d Lasso penalized regression and stability selection are performed to
   identify significant and stable metabolite-microbe pairs, which provide
   an avenue to understand complicated disease mechanisms. e The
   sample-sample similarity matrix
   [MATH: <mi>S</mi> :MATH]
   facilitates visualization and data clustering.

Results

NMFGOT achieves good clustering performance on multiple disease-specific
microbiome-metabolome datasets

   We evaluated the performance of NMFGOT on three microbiome-metabolome
   datasets: Gastric cancer (GC) dataset, End-stage renal disease (ESRD)
   dataset and Inflammatory bowel disease (IBD) dataset, where microbial
   and metabolite abundance are simultaneously profiled for the same
   samples.

   We compared NMFGOT with several baselines and state-of-the-art methods
   for microbiome-metabolome data integration, including SCCA^[51]14,
   SNF^[52]20, MVCPM^[53]21 and a deep learning algorithm DCCAE^[54]16.
   For SCCA, we utilized grid search to determine the optimal
   regularization parameters and implemented k-means clustering on the
   canonical variables for microbiome and metabolome data. For SNF, we
   evaluated its performance with its default parameters and clustering
   method. For MVCPM, k-means was implemented on the low-dimensional
   representations of samples. DCCAE consists of two autoencoders and
   minimizes the combination of canonical correlation objective and the
   reconstruction error of the autoencoders. We implemented DCCAE with the
   defaulted parameters and used k-means on the low-dimensional
   representations of samples. For NMFGOT, the k-nearest neighbor (KNN)
   graph with k = 20 was first established, then Louvain clustering was
   implemented on the k-nearest neighbor graph to obtain the final
   clustering assignment.

   The clustering performance assessed by AC, ARI and silhouette score are
   presented in Fig. [55]2. As shown in Fig. [56]2, our proposed NMFGOT
   algorithm achieves the best performance among all the methods on the
   three datasets in terms of AC and ARI. For the average silhouette
   coefficient criterion, SCCA also performs well in IBD data. The numeric
   values of the clustering performance metrics are presented in
   Supplementary Table [57]1.

Fig. 2. Assessment of clustering performance.

   [58]Fig. 2
   [59]Open in a new tab

   a Evaluation of the clustering performance in terms of AC. b Evaluation
   of the clustering performance in terms of the average silhouette score.
   Silhouette score measures the clustering quality by comparing the
   distances between each sample and its assigned cluster and its
   neighboring cluster. Finally, the average silhouette score was
   presented above. c Evaluation of the clustering performance in terms of
   ARI.

   To further validate the effectiveness and efficiency of NMFGOT, we also
   conducted comparison experiments with a simplified variant of model
   (5): in NMFGL, we replace Laplacian graph L[opt] with traditional graph
   (based on similarity computation, such as Gaussian kernel function).
   The performance of NMFGL was not as good as model (5), which suggests
   that it is beneficial to model microbial data using optimal transport
   distance (Supplementary Table [60]2). Ablation studies were also
   implemented to investigate the roles of the spatial regularization
   technique and optimal transport, where we set β and γ equal to 0 in
   turn. The experimental results show that NMFGOT performs well in most
   cases. More details were presented in Supplementary Table [61]3.

NMFGOT facilitates data visualization

   We next implement UMAP visualization^[62]22 to further evaluate their
   performance. For SNF and NMFGOT, the learned sample similarity matrix S
   were used as input of UMAP. For other methods, the low-dimensional
   representation matrices were used to perform UMAP. The results are
   presented in Fig. [63]3. As Fig. [64]3 has shown, NMFGOT identified
   more clear cluster structure on GC (Fig. [65]3a) and ESRD data (Fig.
   [66]3b).

Fig. 3. Comparison of the visualization results.

   [67]Fig. 3
   [68]Open in a new tab

   SCCA, SNF, MVCPM, DCCAE and NMFGOT are compared by implementing UMAP on
   two multi-omics datasets (microbiome + metabolome). a GC data. b IBD
   data. Samples are colored based on the true labels provided in their
   original publications.

   Interesting, on GC datasets NMFGOT identifies what appears to be a
   transitional state from gastrectomy to healthy patients (left bottom of
   Fig. [69]3a). The similar situation can be also found in IBD (Fig.
   [70]3b, sample nodes in box; Supplementary Fig. [71]1).

NMFGOT identifies host pathways associated with disease-specific metabolites
on different datasets

   The factors obtained from NMFGOT provide rich biological insights, and
   are easy to interpret. Specifically, for metabolite profile data, we
   selected the top 50 metabolites with large magnitudes in each column of
   the metabolite loading matrix
   [MATH: <msup><mrow><mi>W</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup> :MATH]
   , and implemented pathway enrichment analysis by using
   MetaboAnalyst^[72]23. The enriched metabolite sets for metabolome data
   agree with the biological function of the underlying sample types that
   the factors represent (Supplementary Table [73]4). In GC data, factor 2
   corresponds to GC type, which is inferred by inspecting the sample type
   label of the samples with large values in
   [MATH: <msup><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup> :MATH]
   . The second column in the metabolite loading matrix
   [MATH: <msup><mrow><mi>W</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup> :MATH]
   is enriched for “valine, leucine and isoleucine biosynthesis”
   (log10(q-value) = −14.80), “Glutathione metabolism”
   (log10(q-value) = −7.70), and “Phenylalanine, tyrosine and tryptophan
   biosynthesis” (log10(q-value) = −5.67). The enriched pathways are
   consistent with the previous studies^[74]24–[75]26. In ESRD data,
   factor 1 corresponds ESRD sample type. The first column in the
   metabolite loading matrix
   [MATH: <msup><mrow><mi>W</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup> :MATH]
   is enriched for “Pyruvate metabolism” (log10(q-value) = −6.68),
   “Glycolysis / Gluconeogenesis” (log10(q-value) = −6.68), These results
   are also consistent with the previous studies^[76]27,[77]28 (:
   Supplementary Table [78]4).

   We further validated the metabolite markers by retrieving related
   literatures from PubMed. The results were presented in Supplementary
   Table [79]5-[80]6. Most of enriched metabolites included in these
   pathways were found to be related to GC and ESRD.

   Fig. [81]4 shows the top enriched metabolite sets in each of these two
   diseases.

Fig. 4. Enrichment analysis for GC dataset and ESRD dataset.

   [82]Fig. 4
   [83]Open in a new tab

   Enrichment analysis for GC data (a) and End-stage renal disease ESRD
   data (b). The enriched metabolite sets are sorted by p-value
   (FDR < 0.5).

   To summarize, NMFGOT facilitates the identification of host pathways
   associated with disease-specific metabolites. The enrichment analyses
   for the metabolite loading matrix obtained from NMFGOT provide rich and
   consistent biological insights on the identified sample types.

NMFGOT identifies disease-specific metabolite-microbe association

   Microbial taxa and metabolites involved in common biological functions
   may act in a coordinated manner. Based on this hypothesis, we used the
   factors obtained from NMFGOT to character molecular-level associations
   between microbiome and metabolome in each of these three diseases. More
   specifically, by inspecting the sample labels with large values in
   [MATH: <msup><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup> :MATH]
   , sample types can be assigned to the factors in NMFGOT. We selected
   top 100 microbial taxa for microbial abundance data and top 50
   metabolites for metabolite abundance data, and then computed the
   spearman correlation coefficients between microbial taxa and
   metabolites. Fig. [84]5a represents the overall pattern of correlation
   between microbial taxa and metabolites identified by NMFGOT in GC, ESRD
   and IBD (p-value < 0.05).

Fig. 5. Associations between specific microbes and individual metabolites in
each disease.

   [85]Fig. 5
   [86]Open in a new tab

   a Heatmap showing the overall pattern of correlation between top
   microbial taxa (rows) and metabolites (columns) in factors identified
   by NMFGOT for GC, ESRD and IBD (p-value < 0.05). b Correlation network
   for significantly and stability-selected metabolites and microbial taxa
   in GC and ESRD. The red nodes represent metabolites, the blue nodes
   represent microbes. Edges represent metabolite-microbe associations.
   The red line indicates positive correlation, and green line indicates
   negative correlation.

   Inferring disease-specific metabolite-microbe associations may be
   valuable for understanding the mechanisms of complex diseases and
   functions of microbes. Next, we further explored associations between
   metabolites and microbes based on factors obtained from NMFGOT in GC
   and ERSD. To do so, we firstly used lasso penalized regression model to
   identify specific microbes whose abundances are associated with the
   abundance of certain metabolite^[87]18. Specifically, we used the
   abundances of the microbes as predictors and the abundance of
   metabolite as response variable to fit the model. Then, stability
   selection was applied to select robust associations^[88]19. Finally, an
   intersection between associations identified by the lasso penalized
   regression model and stability selection above was performed to retain
   significant and stability-selected metabolite-microbe associations.
   Using this way, we found 40 and 54 robust metabolite-microbe
   associations in GC and ESRD, respectively (Fig. [89]5b). In GC data,
   these associations consist of 17 microbes and 35metabolites, 33
   microbes and 35 metabolites in ESRD data.

   Next, we also implemented the additional experiments on holdout dataset
   to validate the effectiveness and reproducibility of some markers.
   Specifically, we first split ESRD data into two parts, the one is used
   to train (50%), the other is used as holdout dataset. Then, on the
   train data we implemented NMFGOT algorithm and detected the significant
   and stability-selected metabolite-microbe associations. Meanwhile, we
   also implemented the same operation on the holdout dataset to identify
   the significant and stability-selected metabolite-microbe associations.
   Finally, we compared the results obtained from these two datasets and
   obtained the shared metabolite-microbe associations. 16 significant and
   stability-selected metabolite-microbe associations were supported by
   these two parts of data (Supplementary Table [90]7).

   Taken together, these findings demonstrate the effectiveness of NMFGOT
   in identifying the latent metabolite-microbe associations.

Discussion

   Advances in multi-omics sequencing technologies provide an
   unprecedented opportunity to explore metabolite-microbe associations
   and understand the mechanism in human complex diseases. To this end, we
   proposed NMFGOT, which integrates microbiome and metabolome data from
   the same samples. CCA based methods used in multi-omics data analysis
   (including SCCA and DCCAE) typically assume that the projections of two
   sets of observations are lineally correlated. Unlike these approaches,
   NMFGOT assumes that there exist complicated nonlinear interactions
   among microbial taxa, metabolites, and human gut environment, and uses
   OT plan to encode the complex relationships between samples. We
   demonstrate through three multi-omics microbiome datasets that NMFGOT
   consistently performs well when benchmarked with several recently
   published multi-omics integrating methods. NMFGOT also takes advantage
   of OT, which characters the probability distance between samples, as
   well as integrates spatial regularizations to preserve the spatial
   consistency of samples. Moreover, NMFGOT facilitates downstream
   biological analysis, including pathway enrichment analysis and
   disease-specific metabolite-microbe association analysis. With NMFGOT,
   we identified significantly and stable metabolite-microbe associations
   in GC and ESRD diseases. The results further show disease-specific
   microbial taxa can regulate synthesis of host metabolites.

   In the whole experiments, we set the number of factors in NMFGOT equal
   to the number of sample types:
   [MATH: <mrow><mi>k</mi><mo>=</mo><mn>2</mn></mrow> :MATH]
   in each of the three datasets. Experimental results show that NMFGOT
   achieves the best performance. we also tested the robustness of each
   method on the values of k, where we varied the values of k in the range
   {2,3,4,5,6} on different datasets. However, we found that the
   clustering performance of these methods evaluated by silhouette score
   are sensitivities for these three datasets, which is likely because
   microbiome multi-omics data tends to have high level of noise. For
   datasets with multiple groups of samples, NMFGOT also performs well in
   terms of silhouette score. The experimental results are also presented
   in Supplementary Fig. [91]2.

   We also implemented the side-by-side comparison experiments in which
   single modality compositional data (microbial abundance data or
   metabolite abundance data) and multi-omics data are used to test the
   effectiveness of NMFGOT. The experimental results show that NMFGOT has
   better performance on AC, ARI, and silhouette score metrics
   (Supplementary Table [92]8). We further analyzed a colorectal cancer
   data, where microbial abundance profile and metabolite abundance data
   are simultaneously profiled in the same samples^[93]4. The 70 percent
   of samples are used to train, and 30 percent of samples are used as
   validation. The experimental results showed that the samples were
   reasonably separated by NMFGOT, and it obtained the high average
   silhouette score of samples (0.9310). The results were presented in
   Supplementary Fig. [94]3.

   In the future, we will extend NMFGOT to integrate more molecular
   modalities to capture complementary biological insights into complex
   mechanisms underlying multi-omics crosstalk^[95]15. In addition,
   extending NMFGOT to analyze gene expression data may also be another
   interesting direction: gene abundance can be considered as a modality
   and using the gene loading and microbe loading matrices in NMFGOT to
   analyze gene-microbe association.

Methods

Datasets description and data preprocessing

   Gastric cancer (GC) data^[96]29: This dataset used in this manuscript
   was downloaded from
   ([97]https://github.com/borenstein-lab/microbiome-metabolome-curated-da
   ta)^[98]30. 96 faecal samples from 54 healthy individuals and 42
   patients with gastrectomy for gastric cancer were collected. Shotgun
   metagenomic sequencing and targeted metabolites quantification for the
   same faecal samples are parallelly profiled.

   End-stage renal disease (ESRD) data^[99]28: This data downloaded from
   the literature^[100]28 includes 287 samples from 223 haemodialysis
   patients with ESRD and 69 healthy volunteers. Microbial abundance and
   metabolite abundance are simultaneously profiled by shotgun
   metagenomics sequencing and a headspace solid phase microextraction–gas
   chromatography-MS (GC-MS) method, respectively.

   Inflammatory bowel disease (IBD) data^[101]3: The dataset were taken
   from a published study of patients with IBD. It includes 121 samples
   with IBD and 34 controls. Microbial abundance and metabolite abundance
   data are parallelly profiled for the same sample. The statistical
   information of datasets is presented in Supplementary Table [102]9.

   The original microbial abundance data were normalized such that the
   total counts of all species in each sample equal to 1. The metabolite
   abundance data were log-transformed with a pseudo-count of 1. For the
   microbial abundance data, taxa appear in less than 3 samples were
   removed. For the metabolite abundance data, metabolites appear in less
   than 2 samples were removed.

Overview of NMF

   Given a nonnegative data matrix
   [MATH:
   <mrow><mi>X</mi><mo>∈</mo><msubsup><mrow><mi>R</mi></mrow><mrow><mo>+</
   mo></mrow><mrow><mi>p</mi><mo>×</mo><mi>n</mi></mrow></msubsup></mrow>
   :MATH]
   , NMF factorizes
   [MATH: <mi>X</mi> :MATH]
   into two low-rank matrices
   [MATH:
   <mrow><mi>W</mi><mo>∈</mo><msubsup><mrow><mi>R</mi></mrow><mrow><mo>+</
   mo></mrow><mrow><mi>p</mi><mo>×</mo><mi>k</mi></mrow></msubsup></mrow>
   :MATH]
   and
   [MATH:
   <mrow><mi>H</mi><mo>∈</mo><msubsup><mrow><mi>R</mi></mrow><mrow><mo>+</
   mo></mrow><mrow><mi>k</mi><mo>×</mo><mi>n</mi></mrow></msubsup></mrow>
   :MATH]
   , where p is the number of features, n is the number of observations
   and
   [MATH:
   <mrow><mi>k</mi><mo>≪</mo><mi>min</mi><mo>(</mo><mi>p</mi><mo>,</mo><mi
   >n</mi><mo>)</mo></mrow> :MATH]
   is the number of factors^[103]31. The objective function of NMF is
   written as follows.
   [MATH:
   <mrow><munder><mrow><mi>min</mi></mrow><mrow><mi>W</mi><mo>,</mo><mi>H<
   /mi><mo>≥</mo><mn>0</mn></mrow></munder><msubsup><mrow><mi>||</mi><mi>X
   </mi><mo>−</mo><mi
   mathvariant="italic">WH</mi><mi>||</mi></mrow><mrow><mi>F</mi></mrow><m
   row><mn>2</mn></mrow></msubsup><mo>,</mo></mrow> :MATH]
   1

   where
   [MATH: <mi>W</mi> :MATH]
   is basis matrix,
   [MATH: <mi>H</mi> :MATH]
   is coefficient matrix and can be used as clustering indicator.
   [MATH:
   <msub><mrow><mo>∣</mo><mo>∣</mo><mo>⋅</mo><mo>∣</mo><mo>∣</mo></mrow><m
   row><mi>F</mi></mrow></msub> :MATH]
   indicates the Frobenius norm of a matrix.

Optimal transport - Earth mover’s distance

   Optimal transport plan has been successfully applied to some fields,
   including cell-cell communication^[104]32,[105]33, domain
   adaptation^[106]34,[107]35 and single-cell multi-omics data
   integration^[108]36. Given a cost matrix
   [MATH:
   <mrow><mi>M</mi><mo>∈</mo><msup><mrow><mi>R</mi></mrow><mrow><mi>d</mi>
   <mo>×</mo><mi>d</mi></mrow></msup></mrow> :MATH]
   , probability vectors
   [MATH: <mi>r</mi> :MATH]
   and
   [MATH: <mi>c</mi> :MATH]
   belong to the simplex
   [MATH:
   <mrow><msub><mrow><mo>∑</mo></mrow><mrow><mi>d</mi></mrow></msub><mrow>
   <mi>≔</mi></mrow><mrow><mfenced close="}"
   open="{"><mrow><mi>x</mi><mo>∈</mo><msubsup><mrow><mi>R</mi></mrow><mro
   w><mo>+</mo></mrow><mrow><mi>d</mi></mrow></msubsup><mo>:</mo><msup><mr
   ow><mi>x</mi></mrow><mrow><mi>T</mi></mrow></msup><msub><mrow><mn
   mathvariant="bold-italic">1</mn></mrow><mrow><mi>d</mi></mrow></msub><m
   o>=</mo><mn>1</mn></mrow></mfenced></mrow></mrow> :MATH]
   , where
   [MATH: <msub><mrow><mn
   mathvariant="bold-italic">1</mn></mrow><mrow><mi>d</mi></mrow></msub>
   :MATH]
   is the
   [MATH: <mi>d</mi> :MATH]
   dimensional vector with all its elements to be 1 s, the optimal
   transport plan aims to find a transport matrix
   [MATH: <mi>P</mi> :MATH]
   that maps
   [MATH: <mi>r</mi> :MATH]
   to
   [MATH: <mi>c</mi> :MATH]
   . The optimal transport problem can be defined as follows^[109]37.
   [MATH:
   <mrow><munder><mrow><mi>min</mi></mrow><mrow><mi>P</mi></mrow></munder>
   <mfenced close="⟩"
   open="⟨"><mrow><mi>P</mi><mo>,</mo><mi>M</mi></mrow></mfenced><mo>,</mo
   ><mspace width="0.25em"></mspace><mi
   mathvariant="normal">s</mi><mo>.</mo><mi
   mathvariant="normal">t</mi><mo>.</mo><mi>P</mi><mo>∈</mo><mi>U</mi><mfe
   nced close=")"
   open="("><mrow><mi>r</mi><mo>,</mo><mi>c</mi></mrow></mfenced><mo>,</mo
   ></mrow> :MATH]
   2

   Here,
   [MATH: <mfenced close="⟩" open="⟨"><mrow><mo>⋅</mo><mo>,</mo><mspace
   width="0.25em"></mspace><mo>⋅</mo></mrow></mfenced> :MATH]
   is Frobenius dot-product,
   [MATH:
   <mrow><mi>U</mi><mo>(</mo><mi>r</mi><mo>,</mo><mi>c</mi><mo>)</mo></mro
   w> :MATH]
   denotes the transport polytope for
   [MATH: <mi>r</mi> :MATH]
   and
   [MATH: <mi>c</mi> :MATH]
   .
   [MATH:
   <mrow><mi>U</mi><mo>(</mo><mi>r</mi><mo>,</mo><mi>c</mi><mo>)</mo></mro
   w> :MATH]
   can be defined as the follows.
   [MATH: <mrow><mi>U</mi><mfenced close=")"
   open="("><mrow><mi>r</mi><mo>,</mo><mi>c</mi></mrow></mfenced><mspace
   width="0.25em"></mspace><mi mathvariant="normal">:=</mi><mfenced
   close="}"
   open="{"><mrow><mi>P</mi><mo>∈</mo><msubsup><mrow><mi>R</mi></mrow><mro
   w><mo>+</mo></mrow><mrow><mi>d</mi><mo>×</mo><mi>d</mi></mrow></msubsup
   ><mo>∣</mo><mi>P</mi><msub><mrow><mi
   mathvariant="bold">1</mi></mrow><mrow><mi>d</mi></mrow></msub><mo>=</mo
   ><mi>r</mi><mo>,</mo><msup><mrow><mi>P</mi></mrow><mrow><mi>T</mi></mro
   w></msup><msub><mrow><mi
   mathvariant="bold">1</mi></mrow><mrow><mi>d</mi></mrow></msub><mo>=</mo
   ><mi>c</mi></mrow></mfenced><mo>.</mo></mrow> :MATH]
   3

   The optimal transport distance between
   [MATH: <mi>r</mi> :MATH]
   and
   [MATH: <mi>c</mi> :MATH]
   is defined as follows.
   [MATH:
   <mrow><msub><mrow><mi>d</mi></mrow><mrow><mi>M</mi></mrow></msub><mrow>
   <mfenced close=")"
   open="("><mrow><mi>r</mi><mo>,</mo><mi>c</mi></mrow></mfenced></mrow><m
   o>=</mo><munder><mrow><mi>min</mi></mrow><mrow><mi>P</mi><mo>∈</mo><mi>
   U</mi><mrow><mfenced close=")"
   open="("><mrow><mi>r</mi><mo>,</mo><mi>c</mi></mrow></mfenced></mrow></
   mrow></munder><mrow><mfenced close="〉"
   open="〈"><mrow><mi>P</mi><mo>,</mo><mi>M</mi></mrow></mfenced></mrow><m
   o>.</mo></mrow> :MATH]
   4

   In this paper,
   [MATH:
   <mrow><msub><mrow><mi>d</mi></mrow><mrow><mi>M</mi></mrow></msub><mrow>
   <mfenced close=")"
   open="("><mrow><mi>r</mi><mo>,</mo><mi>c</mi></mrow></mfenced></mrow></
   mrow> :MATH]
   is used to compute the probability distance between samples from
   different conditions.

Multi-view learning with graph regularized optimal transport plan

   To dissect heterogeneity of samples from both microbiome abundance and
   metabolite abundance layers, we introduce an unsupervised learning
   framework, named nonnegative matrix factorization with graph
   regularized optimal transport (NMFGOT). Considering a multi-view
   dataset which consists of microbial abundance profile matrix
   [MATH: <mrow><msup><mrow><mi>X</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><mo>∈</mo><msub
   sup><mrow><mi>R</mi></mrow><mrow><mo>+</mo></mrow><mrow><mi>l</mi><mo>×
   </mo><mi>n</mi></mrow></msubsup></mrow> :MATH]
   (l microbial species in n samples) and metabolite profile matrix
   [MATH: <mrow><msup><mrow><mi>X</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><mo>∈</mo><msub
   sup><mrow><mi>R</mi></mrow><mrow><mo>+</mo></mrow><mrow><mi>m</mi><mo>×
   </mo><mi>n</mi></mrow></msubsup></mrow> :MATH]
   (m metabolites in n samples), the objective function of NMFGOT is
   defined as follows:
   [MATH: <mtable><mtr><mtd
   columnalign="left"><munder><mrow><mi>min</mi></mrow><mrow><msup><mrow><
   msup><mrow><mi>W</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup><mo>,</mo><mi>H
   </mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup><mo>,</mo><mi>S
   </mi></mrow></munder><mi>J</mi><mo>=</mo><munderover accent="false"
   accentunder="false"><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><m
   n>1</mn></mrow><mrow><mn>2</mn></mrow></munderover><msubsup><mrow><mfen
   ced close="∥"
   open="∥"><mrow><msup><mrow><mi>X</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup><mo>−</mo><msup
   ><mrow><mi>W</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup><msup><mrow><mi
   >H</mi></mrow><mrow><mrow><mo>(</mo><mrow><mi>i</mi></mrow><mo>)</mo></
   mrow></mrow></msup></mrow></mfenced></mrow><mrow><mi>F</mi></mrow><mrow
   ><mn>2</mn></mrow></msubsup><mo>+</mo><mi>α</mi><munderover
   accent="false"
   accentunder="false"><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><m
   n>1</mn></mrow><mrow><mn>2</mn></mrow></munderover><msubsup><mrow><mfen
   ced close="∥"
   open="∥"><mrow><mi>S</mi><mo>−</mo><msup><mrow><mi>H</mi></mrow><mrow><
   msup><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup><msup><mrow><mi>H</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup></mrow></mfence
   d></mrow><mrow><mi>F</mi></mrow><mrow><mn>2</mn></mrow></msubsup></mtd>
   </mtr><mtr><mtd columnalign="left"><mspace
   width="2.0em"></mspace><mspace width="2.0em"></mspace><mspace
   width="0.16em"></mspace><mspace
   width="0.16em"></mspace><mo>+</mo><mspace
   width="0.25em"></mspace><mi>β</mi><mi>t</mi><mi>r</mi><mrow><mo>(</mo><
   mrow><msup><mrow><mi>H</mi></mrow><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup><msup><mrow><mi>H</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup><mrow><mo>(</mo
   ><mrow><msub><mrow><mn>1</mn></mrow><mrow><mi>n</mi><mo>×</mo><mi>n</mi
   ></mrow></msub><mo>−</mo><msup><mrow><mi>H</mi></mrow><mrow><msup><mrow
   ><mfenced close=")"
   open="("><mrow><mi>j</mi></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup><msup><mrow><mi>H</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mi>j</mi></mrow></mfenced></mrow></msup></mrow><mo>)</m
   o></mrow></mrow><mo>)</mo></mrow><mo>+</mo><mi>φ</mi><msubsup><mrow><mf
   enced close="∥" open="∥"><mrow><mi>S</mi><mi
   mathvariant="bold">1</mi><mo>−</mo><mi
   mathvariant="bold">1</mi></mrow></mfenced></mrow><mrow><mi>F</mi></mrow
   ><mrow><mn>2</mn></mrow></msubsup></mtd></mtr><mtr><mtd
   columnalign="left"><mspace width="2.0em"></mspace><mspace
   width="2.0em"></mspace><mspace width="0.16em"></mspace><mspace
   width="0.16em"></mspace><mo>+</mo><mspace
   width="0.25em"></mspace><mi>γ</mi><munderover accent="false"
   accentunder="false"><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><m
   n>1</mn></mrow><mrow><mn>2</mn></mrow></munderover><mi>t</mi><mi>r</mi>
   <mfenced close=")"
   open="("><mrow><msubsup><mrow><msup><mrow><mi>H</mi></mrow><mrow><mfenc
   ed close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup><mi>L</mi></mro
   w><mrow><mi>o</mi><mi>p</mi><mi>t</mi></mrow><mrow><mrow><mo>(</mo><mro
   w><mi>i</mi></mrow><mo>)</mo></mrow></mrow></msubsup><msup><mrow><mi>H<
   /mi></mrow><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup></mrow></mfenced></mtd></mtr></mtable> :MATH]
   5
   [MATH: <mrow><mi>s</mi><mo>.</mo><mi>t</mi><mo>.</mo><mspace
   width="1em"></mspace><msup><mrow><mi>W</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup><mo>,</mo><msup
   ><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup><mo>,</mo><mi>S
   </mi><mo>,</mo><mi>α</mi><mo>,</mo><mi>β</mi><mo>,</mo><mi>φ</mi><mo>,<
   /mo><mi>γ</mi><mo>≥</mo><mn>0</mn><mo>.</mo></mrow> :MATH]

   where
   [MATH: <msup><mrow><mi>W</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup> :MATH]
   ,
   [MATH:
   <msup><mrow><mi>H</mi></mrow><mrow><mo>(</mo><mi>i</mi><mo>)</mo></mrow
   ></msup> :MATH]
   indicate the basis matrix and coefficient matrix for the ith data
   modality, respectively.
   [MATH: <mrow><msubsup><mrow><mi>L</mi></mrow><mrow><mi
   mathvariant="italic">opt</mi></mrow><mrow><mo>(</mo><mi>i</mi><mo>)</mo
   ></mrow></msubsup><mo>∈</mo><msup><mrow><mi>R</mi></mrow><mrow><mi>n</m
   i><mo>×</mo><mi>n</mi></mrow></msup></mrow> :MATH]
   is the Laplacian matrix for ith data modality. In this manuscript, we
   used the optimal transport distance defined in subsection 2.2 to
   compute
   [MATH: <msub><mrow><mi>L</mi></mrow><mrow><mi
   mathvariant="italic">opt</mi></mrow></msub> :MATH]
   , i.e.,
   [MATH: <mrow><msub><mrow><mi>L</mi></mrow><mrow><mi
   mathvariant="italic">opt</mi></mrow></msub><mo>=</mo><mi>D</mi><mo>−</m
   o><mi>A</mi></mrow> :MATH]
   ,
   [MATH: <mrow><msub><mrow><mi>D</mi></mrow><mrow><mi
   mathvariant="italic">ii</mi></mrow></msub><mo>=</mo><munder><mo>∑</mo><
   mrow><mi>j</mi></mrow></munder><mrow><msub><mrow><mi>A</mi></mrow><mrow
   ><mi mathvariant="italic">ij</mi></mrow></msub></mrow></mrow> :MATH]
   , where D is a diagonal matrix, A is obtained via a Gaussian kernel
   function based optimal transport distance. In manifold learning,
   Laplacian graph is usually used to capture the high-order geometrical
   structure relationships in original data^[110]38–[111]40.
   [MATH:
   <mrow><mi>S</mi><mo>∈</mo><msubsup><mrow><mi>R</mi></mrow><mrow><mo>+</
   mo></mrow><mrow><mi>n</mi><mo>×</mo><mi>n</mi></mrow></msubsup></mrow>
   :MATH]
   represents the learned sample-sample similarity matrix.
   [MATH: <mn mathvariant="bold-italic">1</mn> :MATH]
   is a column vector with all its elements to be 1 s.
   [MATH:
   <msub><mrow><mn>1</mn></mrow><mrow><mi>n</mi><mo>×</mo><mi>n</mi></mrow
   ></msub> :MATH]
   represents a matrix of 1 s.
   [MATH: <mi>φ</mi> :MATH]
   is a parameter that is used to control the strength of the constraint
   [MATH: <mrow><mi>S</mi><mn
   mathvariant="bold-italic">1</mn><mo>−</mo><mn
   mathvariant="bold-italic">1</mn></mrow> :MATH]
   .
   [MATH: <mi>α</mi> :MATH]
   and
   [MATH: <mi>γ</mi> :MATH]
   are graph regularization parameters.
   [MATH: <mi>β</mi> :MATH]
   is spatial regularization parameter that is used to control the
   strength of spatial embeddings consistence, and is set
   [MATH: <mrow><mi>β</mi><mo>=</mo><mn>0.1</mn></mrow> :MATH]
   for all datasets. We will discuss how to select
   [MATH: <mi>α</mi> :MATH]
   and
   [MATH: <mi>γ</mi> :MATH]
   parameters in the following section.

   In the objective function of NMFGOT (Eq. [112]5), the first term,
   [MATH:
   <mrow><msubsup><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><mn>1</
   mn></mrow><mrow><mn>2</mn></mrow></msubsup><msubsup><mrow><mo>∣</mo><mo
   >∣</mo><msup><mrow><mi>X</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup><mo>−</mo><msup
   ><mrow><mi>W</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup><msup><mrow><mi
   >H</mi></mrow><mrow><mrow><mo>(</mo><mrow><mi>i</mi></mrow><mo>)</mo></
   mrow></mrow></msup><mo>∣</mo><mo>∣</mo></mrow><mrow><mi>F</mi></mrow><m
   row><mn>2</mn></mrow></msubsup></mrow> :MATH]
   is the NMF loss function for microbial abundance data and metabolite
   abundance data. The second term,
   [MATH:
   <mrow><msubsup><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><mn>1</
   mn></mrow><mrow><mn>2</mn></mrow></msubsup><msubsup><mrow><mo>∣</mo><mo
   >∣</mo><mi>S</mi><mo>−</mo><msup><mrow><mi>H</mi></mrow><mrow><msup><mr
   ow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup><msup><mrow><mi>H</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup><mo>∣</mo><mo>∣
   </mo></mrow><mrow><mi>F</mi></mrow><mrow><mn>2</mn></mrow></msubsup></m
   row> :MATH]
   , is a consensus graph fusion strategy which aims to learn a sample
   similarity matrix S. Through iteratively training, generated kernel
   [MATH: <mrow><msup><mrow><mi>H</mi></mrow><mrow><msup><mrow><mfenced
   close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup><msup><mrow><mi>H</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup></mrow> :MATH]
   from each data modality was regularized towards a consensus graph S. In
   the third term of the objective function,
   [MATH: <mrow><mi
   mathvariant="italic">tr</mi><mo>(</mo><msup><mrow><mi>H</mi></mrow><mro
   w><msup><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup><msup><mrow><mi>H</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup><mo>(</mo><msub
   ><mrow><mn>1</mn></mrow><mrow><mi>n</mi><mo>×</mo><mi>n</mi></mrow></ms
   ub><mo>−</mo><msup><mrow><mi>H</mi></mrow><mrow><msup><mrow><mfenced
   close=")"
   open="("><mrow><mi>j</mi></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup><msup><mrow><mi>H</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mi>j</mi></mrow></mfenced></mrow></msup><mo>)</mo><mo>)
   </mo></mrow> :MATH]
   , we adopt a spatial regularization technique to preserve the spatial
   consistency of samples. For low-dimensional sample representation
   matrices
   [MATH: <msup><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup> :MATH]
   and
   [MATH: <msup><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup> :MATH]
   obtained from microbial abundance and metabolite abundance data, we
   assume that samples that are spatially distant in the one embedding
   space, should be also pushed further in the other embedding space.
   Meanwhile, this strategy also introduces more flexibility and allows
   for specificity across different molecular modalities. The fourth term,
   [MATH: <mrow><mi mathvariant="italic">||</mi><mi>S</mi><mn
   mathvariant="bold-italic">1</mn><mo>−</mo><mn
   mathvariant="bold-italic">1</mn><mi mathvariant="italic">||</mi></mrow>
   :MATH]
   , encourages each row in S to have summation close to 1.

   The first four terms in Eq. [113]5 can learn the low-dimensional
   representation matrices
   [MATH: <msup><mrow><mi>W</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup> :MATH]
   and
   [MATH:
   <msup><mrow><mi>H</mi></mrow><mrow><mo>(</mo><mi>i</mi><mo>)</mo></mrow
   ></msup> :MATH]
   for multi-omics microbiome data, but they may lose the high-dimensional
   geometrical structure information in the original data
   space^[114]38,[115]40. To solve this problem, we add the fifth term
   [MATH: <mrow><munderover accent="false"
   accentunder="false"><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><m
   n>1</mn></mrow><mrow><mn>2</mn></mrow></munderover><mi>t</mi><mi>r</mi>
   <mrow><mo>(</mo><mrow><msubsup><mrow><msup><mrow><mi>H</mi></mrow><mrow
   ><mrow><mo>(</mo><mrow><mi>i</mi></mrow><mo>)</mo></mrow></mrow></msup>
   <mi>L</mi></mrow><mrow><mi>o</mi><mi>p</mi><mi>t</mi></mrow><mrow><mrow
   ><mo>(</mo><mrow><mi>i</mi></mrow><mo>)</mo></mrow></mrow></msubsup><ms
   up><mrow><mi>H</mi></mrow><mrow><msup><mrow><mrow><mo>(</mo><mrow><mi>i
   </mi></mrow><mo>)</mo></mrow></mrow><mrow><mi>T</mi></mrow></msup></mro
   w></msup></mrow><mo>)</mo></mrow></mrow> :MATH]
   in the object of NMFGOT. The Laplacian graph is established based on
   the optimal transport distance between samples (see subsection 2.3),
   and it can well capture the intrinsic geometry structure of feature
   spaces in unsupervised learning environment^[116]41.

   The details of constructing Laplacian graph
   [MATH: <msub><mrow><mi>L</mi></mrow><mrow><mi
   mathvariant="italic">opt</mi></mrow></msub> :MATH]
   are presented as follows:

   We first used the optimal transport distance described above to
   construct the Laplacian matrix
   [MATH: <msub><mrow><mi>L</mi></mrow><mrow><mi
   mathvariant="italic">opt</mi></mrow></msub> :MATH]
   . Given the optimal transport distance matrix
   [MATH:
   <msup><mrow><mi>D</mi></mrow><mrow><mo>(</mo><mi>i</mi><mo>)</mo></mrow
   ></msup> :MATH]
   obtained from the ith compositional profile, the sample-sample
   similarity matrix
   [MATH:
   <msup><mrow><mi>A</mi></mrow><mrow><mo>(</mo><mi>i</mi><mo>)</mo></mrow
   ></msup> :MATH]
   are defined as follows:
   [MATH: <mrow><msubsup><mrow><mi>A</mi></mrow><mrow><mi
   mathvariant="italic">jl</mi></mrow><mrow><mo>(</mo><mi>i</mi><mo>)</mo>
   </mrow></msubsup><mo>=</mo><mi>exp</mi><mrow><mfenced close=")"
   open="("><mrow><mo>−</mo><mfrac><mrow><msubsup><mrow><mi>D</mi></mrow><
   mrow><mi
   mathvariant="italic">jl</mi></mrow><mrow><mo>(</mo><mi>i</mi><mo>)</mo>
   </mrow></msubsup></mrow><mrow><mi>μ</mi><msup><mrow><msubsup><mrow><mo>
   (</mo><mi>σ</mi></mrow><mrow><mi
   mathvariant="italic">jl</mi></mrow><mrow><mo>(</mo><mi>i</mi><mo>)</mo>
   </mrow></msubsup><mo>)</mo></mrow><mrow><mn>2</mn></mrow></msup></mrow>
   </mfrac></mrow></mfenced></mrow><mo>,</mo></mrow> :MATH]
   6
   [MATH: <mrow><msubsup><mrow><mi>σ</mi></mrow><mrow><mi
   mathvariant="italic">jl</mi></mrow><mrow><mo>(</mo><mi>i</mi><mo>)</mo>
   </mrow></msubsup><mo>=</mo><mfrac><mrow><msub><mrow><mi>E</mi></mrow><m
   row><mi mathvariant="italic">jl</mi></mrow></msub><mo>+</mo><mi
   mathvariant="italic">mean</mi><mrow><mfenced close=")"
   open="("><mrow><mi>E</mi><mrow><mfenced close=")"
   open="("><mrow><mi>j</mi><mo>,</mo><msub><mrow><mi>N</mi></mrow><mrow><
   mi>j</mi></mrow></msub></mrow></mfenced></mrow></mrow></mfenced></mrow>
   <mo>+</mo><mi mathvariant="italic">mean</mi><mrow><mfenced close=")"
   open="("><mrow><mi>E</mi><mrow><mfenced close=")"
   open="("><mrow><mi>k</mi><mo>,</mo><msub><mrow><mi>N</mi></mrow><mrow><
   mi>l</mi></mrow></msub></mrow></mfenced></mrow></mrow></mfenced></mrow>
   </mrow><mrow><mn>3</mn></mrow></mfrac><mo>.</mo></mrow> :MATH]
   7

   where
   [MATH: <mi>μ</mi> :MATH]
   is a parameter that can be empirically set. In this study, we set
   [MATH: <mrow><mi>μ</mi><mo>=</mo><mn>0.5</mn></mrow> :MATH]
   for all datasets.
   [MATH: <msubsup><mrow><mi>σ</mi></mrow><mrow><mi
   mathvariant="italic">jl</mi></mrow><mrow><mo>(</mo><mi>i</mi><mo>)</mo>
   </mrow></msubsup> :MATH]
   is the bandwidth parameter that can be used to eliminate the scaling
   problem.
   [MATH: <msub><mrow><mi>E</mi></mrow><mrow><mi
   mathvariant="italic">jl</mi></mrow></msub> :MATH]
   denotes the squared Euclidean distance between sample
   [MATH: <mi>j</mi> :MATH]
   and
   [MATH: <mi>l</mi> :MATH]
   .
   [MATH: <msub><mrow><mi>N</mi></mrow><mrow><mi>j</mi></mrow></msub>
   :MATH]
   is the set of nearest neighbors of the
   [MATH: <mrow><mi>j</mi><mi mathvariant="normal">th</mi></mrow> :MATH]
   sample where
   [MATH: <mrow><mfenced close="|"
   open="|"><mrow><msub><mrow><mi>N</mi></mrow><mrow><mi>j</mi></mrow></ms
   ub></mrow></mfenced></mrow><mo>=</mo><mn>20</mn> :MATH]
   .
   [MATH:
   <mrow><mi>m</mi><mi>e</mi><mi>a</mi><mi>n</mi><mrow><mo>(</mo><mrow><mi
   >E</mi><mrow><mo>(</mo><mrow><mi>j</mi><mo>,</mo><msub><mrow><mi>N</mi>
   </mrow><mrow><mi>j</mi></mrow></msub></mrow><mo>)</mo></mrow></mrow><mo
   >)</mo></mrow></mrow> :MATH]
   denotes the average of the squared Euclidean distances between the
   [MATH: <mrow><mi>j</mi><mi mathvariant="normal">th</mi></mrow> :MATH]
   sample and its neighbors.

   Then, the Laplacian matrix
   [MATH: <msub><mrow><mi>L</mi></mrow><mrow><mi
   mathvariant="italic">opt</mi></mrow></msub> :MATH]
   cab be defined as follows:
   [MATH: <mrow><msubsup><mrow><mi>L</mi></mrow><mrow><mi
   mathvariant="italic">opt</mi></mrow><mrow><mo>(</mo><mi>i</mi><mo>)</mo
   ></mrow></msubsup><mo>=</mo><msubsup><mrow><mi>D</mi></mrow><mrow><mi
   mathvariant="italic">opt</mi></mrow><mrow><mo>(</mo><mi>i</mi><mo>)</mo
   ></mrow></msubsup><mo>−</mo><msup><mrow><mi>A</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup><mo>.</mo></mro
   w> :MATH]
   8

   Here,
   [MATH: <msubsup><mrow><mi>D</mi></mrow><mrow><mi
   mathvariant="italic">opt</mi></mrow><mrow><mo>(</mo><mi>i</mi><mo>)</mo
   ></mrow></msubsup> :MATH]
   is a diagonal matrix with entries
   [MATH: <mrow><msub><mrow><msubsup><mrow><mi>D</mi></mrow><mrow><mi
   mathvariant="italic">opt</mi></mrow><mrow><mo>(</mo><mi>i</mi><mo>)</mo
   ></mrow></msubsup></mrow><mrow><mi
   mathvariant="italic">jj</mi></mrow></msub><mo>=</mo><msubsup><mrow><mo>
   ∑</mo></mrow><mrow><mi>l</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>n</mi
   ></mrow></msubsup><mrow><msubsup><mrow><mi>A</mi></mrow><mrow><mi
   mathvariant="italic">jl</mi></mrow><mrow><mo>(</mo><mi>i</mi><mo>)</mo>
   </mrow></msubsup></mrow></mrow> :MATH]
   .

   We note that canonical correlations analysis (CCA) is also used to
   integrate multi-omics microbiome data^[117]8. Our proposed NMFGOT
   framework differs from CCA or its variants (sparse CCA) in the
   following three aspects. 1) CCA-based methods assume that there exists
   linear projection of two sets of observations, and maximize correlation
   between these two data modalities. However, these methods do not
   consider the complicated nonlinear relationships among microbial
   species^[118]42,[119]43. In NMFGOT, we used optimal transport distance
   to measure the relationships between microbial samples, which better
   dealt with the high-order interactions (more than two species) among
   microbial taxa or metabolites. 2) NMFGOT includes a spatial
   regularization term to preserve the spatial consistency of samples in
   the embedding space across different data modalities, and to some
   extent it tolerates modality specificity. Obviously, spatial
   regularization leads to better clustering solutions and
   interpretability: the elements in
   [MATH: <msup><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup> :MATH]
   will tend to be consistent with the ones in
   [MATH: <msup><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup> :MATH]
   . 3) NMFGOT utilizes optimal transport Laplacian to encode the
   geometrical structure relationships of microbial samples in the
   original data space, and enhances the representation ability of
   low-dimensional sample factor matrices.

The optimization algorithm for NMFGOT

   We used the alternative iteration algorithm to solve the optimization
   problem of NMFGOT model. The updating rules for
   [MATH: <msup><mrow><mi>W</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup> :MATH]
   ,
   [MATH: <mrow><msup><mrow><mi>W</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><mo>,</mo></mro
   w> :MATH]
   [MATH: <msup><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup> :MATH]
   ,
   [MATH: <msup><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup> :MATH]
   and
   [MATH: <mi>S</mi> :MATH]
   can be obtained as follows.
   [MATH: <mrow><msubsup><mrow><mi>W</mi></mrow><mrow><mi
   mathvariant="italic">ij</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msubsup><mo>⟵</mo><m
   subsup><mrow><mi>W</mi></mrow><mrow><mi
   mathvariant="italic">ij</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msubsup><mfrac><mrow
   ><msub><mrow><mfenced close=")"
   open="("><mrow><msup><mrow><mi>X</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><msup><mrow><mi
   >H</mi></mrow><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup></mrow></mfenced></mrow><mrow><mi
   mathvariant="italic">ij</mi></mrow></msub></mrow><mrow><msub><mrow><mfe
   nced close=")"
   open="("><mrow><msup><mrow><mi>W</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><msup><mrow><ms
   up><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><mi>H</mi></mro
   w><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup></mrow></mfenced></mrow><mrow><mi
   mathvariant="italic">ij</mi></mrow></msub></mrow></mfrac><mo>.</mo></mr
   ow> :MATH]
   9
   [MATH: <mrow><msubsup><mrow><mi>W</mi></mrow><mrow><mi
   mathvariant="italic">ij</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msubsup><mo>⟵</mo><m
   subsup><mrow><mi>W</mi></mrow><mrow><mi
   mathvariant="italic">ij</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msubsup><mfrac><mrow
   ><msub><mrow><mfenced close=")"
   open="("><mrow><msup><mrow><mi>X</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><msup><mrow><mi
   >H</mi></mrow><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup></mrow></mfenced></mrow><mrow><mi
   mathvariant="italic">ij</mi></mrow></msub></mrow><mrow><msub><mrow><mfe
   nced close=")"
   open="("><mrow><msup><mrow><mi>W</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><msup><mrow><mi
   >H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><msup><mrow><mi
   >H</mi></mrow><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup></mrow></mfenced></mrow><mrow><mi
   mathvariant="italic">ij</mi></mrow></msub></mrow></mfrac><mo>.</mo></mr
   ow> :MATH]
   10
   [MATH: <mrow><msubsup><mrow><mi>H</mi></mrow><mrow><mi
   mathvariant="italic">ij</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msubsup><mo>⟵</mo><m
   subsup><mrow><mi>H</mi></mrow><mrow><mi
   mathvariant="italic">ij</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msubsup><mfrac><mrow
   ><msub><mrow><mfenced close=")"
   open="("><mrow><msup><mrow><mi>W</mi></mrow><mrow><msup><mrow><mfenced
   close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup><msup><mrow><mi>X</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><mo>+</mo><mi>α
   </mi><msup><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><msup><mrow><mi
   >S</mi></mrow><mrow><mi>T</mi></mrow></msup><mo>+</mo><mi>γ</mi><msup><
   mrow><msup><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><mi>A</mi></mro
   w><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><mo>+</mo><mi>β
   </mi><msup><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><msup><mrow><mi
   >H</mi></mrow><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup><msup><mrow><mi>H</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup></mrow></mfence
   d></mrow><mrow><mi
   mathvariant="italic">ij</mi></mrow></msub></mrow><mrow><msub><mrow><mfe
   nced close=")"
   open="("><mrow><msup><mrow><mi>W</mi></mrow><mrow><msup><mrow><mfenced
   close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup><msup><mrow><mi>W</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><msup><mrow><mi
   >H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><mo>+</mo><mi>α
   </mi><msup><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><msup><mrow><mi
   >H</mi></mrow><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup><msup><mrow><mi>H</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><mo>+</mo><mi>γ
   </mi><msup><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><msup><mrow><mi
   >D</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup></mrow></mfence
   d></mrow><mrow><mi
   mathvariant="italic">ij</mi></mrow></msub></mrow></mfrac><mo>.</mo></mr
   ow> :MATH]
   11
   [MATH: <mrow><msubsup><mrow><mi>H</mi></mrow><mrow><mi
   mathvariant="italic">ij</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msubsup><mo>⟵</mo><m
   subsup><mrow><mi>H</mi></mrow><mrow><mi
   mathvariant="italic">ij</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msubsup><mfrac><mrow
   ><msub><mrow><mfenced close=")"
   open="("><mrow><msup><mrow><mi>W</mi></mrow><mrow><msup><mrow><mfenced
   close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup><msup><mrow><mi>X</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><mo>+</mo><mi>α
   </mi><msup><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><msup><mrow><mi
   >S</mi></mrow><mrow><mi>T</mi></mrow></msup><mo>+</mo><mi>γ</mi><msup><
   mrow><msup><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><mi>A</mi></mro
   w><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><mo>+</mo><mi>β
   </mi><msup><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><msup><mrow><mi
   >H</mi></mrow><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup><msup><mrow><mi>H</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup></mrow></mfence
   d></mrow><mrow><mi
   mathvariant="italic">ij</mi></mrow></msub></mrow><mrow><msub><mrow><mfe
   nced close=")"
   open="("><mrow><msup><mrow><mi>W</mi></mrow><mrow><msup><mrow><mfenced
   close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup><msup><mrow><mi>W</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><msup><mrow><mi
   >H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><mo>+</mo><mi>α
   </mi><msup><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><msup><mrow><mi
   >H</mi></mrow><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup><msup><mrow><mi>H</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><mo>+</mo><mi>γ
   </mi><msup><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><msup><mrow><mi
   >D</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup></mrow></mfence
   d></mrow><mrow><mi
   mathvariant="italic">ij</mi></mrow></msub></mrow></mfrac><mo>.</mo></mr
   ow> :MATH]
   12
   [MATH: <mrow><msub><mrow><mi>S</mi></mrow><mrow><mi
   mathvariant="italic">ij</mi></mrow></msub><mo>⟵</mo><msub><mrow><mi>S</
   mi></mrow><mrow><mi
   mathvariant="italic">ij</mi></mrow></msub><mfrac><mrow><msub><mrow><mfe
   nced close=")"
   open="("><mrow><mi>α</mi><msubsup><mrow><mo>∑</mo></mrow><mrow><mi>l</m
   i><mo>=</mo><mn>1</mn></mrow><mrow><mn>2</mn></mrow></msubsup><mrow><ms
   up><mrow><mi>H</mi></mrow><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><mi>l</mi></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup></mrow><msup><mrow><mi>H</mi></mrow><mrow><mfence
   d close=")"
   open="("><mrow><mi>l</mi></mrow></mfenced></mrow></msup><mo>+</mo><mn>2
   </mn><mi>η</mi><msup><mrow><mi
   mathvariant="italic">ee</mi></mrow><mrow><mi>T</mi></mrow></msup></mrow
   ></mfenced></mrow><mrow><mi
   mathvariant="italic">ij</mi></mrow></msub></mrow><mrow><msub><mrow><mn>
   2</mn><mrow><mfenced close=")"
   open="("><mrow><mi>α</mi><mi>S</mi><mo>+</mo><mi>η</mi><msup><mrow><mi
   mathvariant="italic">ee</mi></mrow><mrow><mi>T</mi></mrow></msup><mi>S<
   /mi></mrow></mfenced></mrow></mrow><mrow><mi
   mathvariant="italic">ij</mi></mrow></msub></mrow></mfrac><mo>.</mo></mr
   ow> :MATH]
   13

Selection of parameters α,
[MATH: <mi>φ</mi> :MATH]
and γ

   In NMFGOT, there are three parameters α,
   [MATH: <mi>φ</mi> :MATH]
   and γ that need to be tuned, and they are determined as the following.
   The optimization problems
   [MATH:
   <mrow><munder><mrow><mi>min</mi></mrow><mrow><msup><mrow><mi>W</mi></mr
   ow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><mo>,</mo><msup
   ><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><mo>≥</mo><mn>0
   </mn></mrow></munder><mrow><msubsup><mrow><mi
   mathvariant="italic">||</mi><msup><mrow><mi>X</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><mo>−</mo><msup
   ><mrow><mi>W</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><msup><mrow><mi
   >H</mi></mrow><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup><mi
   mathvariant="italic">||</mi></mrow><mrow><mi>F</mi></mrow><mrow><mn>2</
   mn></mrow></msubsup></mrow></mrow> :MATH]
   ,
   [MATH:
   <mrow><munder><mrow><mi>min</mi></mrow><mrow><msup><mrow><mi>W</mi></mr
   ow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><mo>,</mo><msup
   ><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><mo>≥</mo><mn>0
   </mn></mrow></munder><mrow><msubsup><mrow><mi
   mathvariant="italic">||</mi><msup><mrow><mi>X</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><mo>−</mo><msup
   ><mrow><mi>W</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><msup><mrow><mi
   >H</mi></mrow><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup><mi
   mathvariant="italic">||</mi></mrow><mrow><mi>F</mi></mrow><mrow><mn>2</
   mn></mrow></msubsup></mrow></mrow> :MATH]
   were first solved by NNDSVD^[120]44 and obtain the solutions
   [MATH: <mrow><msup><mrow><mover
   accent="true"><mrow><mi>W</mi></mrow><mo>ˆ</mo></mover></mrow><mrow><mf
   enced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><mo>,</mo><mspa
   ce width="0.25em"></mspace><msup><mrow><mover
   accent="true"><mrow><mi>H</mi></mrow><mo>ˆ</mo></mover></mrow><mrow><mf
   enced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><mo>,</mo><mspa
   ce width="0.25em"></mspace><msup><mrow><mover
   accent="true"><mrow><mi>W</mi></mrow><mo>ˆ</mo></mover></mrow><mrow><mf
   enced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup></mrow> :MATH]
   and
   [MATH: <msup><mrow><mover
   accent="true"><mrow><mi>H</mi></mrow><mo>ˆ</mo></mover></mrow><mrow><mf
   enced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup> :MATH]
   . Then,
   [MATH: <mover accent="true"><mrow><mi>S</mi></mrow><mo>ˆ</mo></mover>
   :MATH]
   can be obtained using SNF^[121]20, and set α,
   [MATH: <mi>φ</mi> :MATH]
   and γ as follows.
   [MATH: <mtable><mtr><mtd
   columnalign="left"><mi>α</mi><mo>=</mo><mfenced close=")"
   open="("><mrow><msubsup><mrow><mfenced close="∥"
   open="∥"><mrow><msup><mrow><mi>X</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><mo>−</mo><msup
   ><mrow><mover
   accent="true"><mrow><mi>W</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><msup><mrow><mo
   ver
   accent="true"><mrow><mi>H</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup></mrow></mfenced></mrow><mrow><mi>F</mi></mrow><m
   row><mn>2</mn></mrow></msubsup><mo>+</mo><msubsup><mrow><mfenced
   close="∥" open="∥"><mrow><msup><mrow><mi>X</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><mo>−</mo><msup
   ><mrow><mover
   accent="true"><mrow><mi>W</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><msup><mrow><mo
   ver
   accent="true"><mrow><mi>H</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup></mrow></mfenced></mrow><mrow><mi>F</mi></mrow><m
   row><mn>2</mn></mrow></msubsup></mrow></mfenced><mo>/</mo><mfenced
   close=")" open="("><mrow><mn>5</mn><munderover accent="false"
   accentunder="false"><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><m
   n>1</mn></mrow><mrow><mn>2</mn></mrow></munderover><msubsup><mrow><mfen
   ced close="∥" open="∥"><mrow><mover
   accent="true"><mrow><mi>S</mi></mrow><mrow><mo>^</mo></mrow></mover><mo
   >−</mo><msup><mrow><mover
   accent="true"><mrow><mi>H</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup><msup><mrow><mo
   ver
   accent="true"><mrow><mi>H</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup></mrow></mfenced></mrow><mrow><mi>F</mi></mrow><m
   row><mn>2</mn></mrow></msubsup></mrow></mfenced></mtd></mtr><mtr><mtd
   columnalign="left"><mi>φ</mi><mo>=</mo><mfenced close=")"
   open="("><mrow><msubsup><mrow><mfenced close="∥"
   open="∥"><mrow><msup><mrow><mi>X</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><mo>−</mo><msup
   ><mrow><mover
   accent="true"><mrow><mi>W</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><msup><mrow><mo
   ver
   accent="true"><mrow><mi>H</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup></mrow></mfenced></mrow><mrow><mi>F</mi></mrow><m
   row><mn>2</mn></mrow></msubsup><mo>+</mo><msubsup><mrow><mfenced
   close="∥" open="∥"><mrow><msup><mrow><mi>X</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><mo>−</mo><msup
   ><mrow><mover
   accent="true"><mrow><mi>W</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><msup><mrow><mo
   ver
   accent="true"><mrow><mi>H</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup></mrow></mfenced></mrow><mrow><mi>F</mi></mrow><m
   row><mn>2</mn></mrow></msubsup></mrow></mfenced><mo>/</mo><mfenced
   close=")" open="("><mrow><mn>1000</mn><msubsup><mrow><mfenced close="∥"
   open="∥"><mrow><mover
   accent="true"><mrow><mi>S</mi></mrow><mrow><mo>^</mo></mrow></mover><mi
   mathvariant="bold">1</mi><mo>−</mo><mi
   mathvariant="bold">1</mi></mrow></mfenced></mrow><mrow><mi>F</mi></mrow
   ><mrow><mn>2</mn></mrow></msubsup></mrow></mfenced><mo>,</mo><mi
   mathvariant="normal">and</mi></mtd></mtr><mtr><mtd
   columnalign="left"><mi>γ</mi><mo>=</mo><mfenced close=")"
   open="("><mrow><msubsup><mrow><mfenced close="∥"
   open="∥"><mrow><msup><mrow><mi>X</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><mo>−</mo><msup
   ><mrow><mover
   accent="true"><mrow><mi>W</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup><msup><mrow><mo
   ver
   accent="true"><mrow><mi>H</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup></mrow></mfenced></mrow><mrow><mi>F</mi></mrow><m
   row><mn>2</mn></mrow></msubsup><mo>+</mo><msubsup><mrow><mfenced
   close="∥" open="∥"><mrow><msup><mrow><mi>X</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><mo>−</mo><msup
   ><mrow><mover
   accent="true"><mrow><mi>W</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup><msup><mrow><mo
   ver
   accent="true"><mrow><mi>H</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup></mrow></mfenced></mrow><mrow><mi>F</mi></mrow><m
   row><mn>2</mn></mrow></msubsup></mrow></mfenced><mo>/</mo><mfenced
   close=")" open="("><mrow><mn>100</mn><munderover accent="false"
   accentunder="false"><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><m
   n>1</mn></mrow><mrow><mn>2</mn></mrow></munderover><mi>t</mi><mi>r</mi>
   <mfenced close=")" open="("><mrow><msup><mrow><mover
   accent="true"><mrow><mi>H</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow><mrow><mi>T</mi></mrow
   ></msup></mrow></msup><msup><mrow><mi>L</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup><msup><mrow><mo
   ver
   accent="true"><mrow><mi>H</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup></mrow></mfence
   d></mrow></mfenced></mtd></mtr></mtable> :MATH]
   14

   The sensitive analyses of
   [MATH: <mi>α</mi> :MATH]
   and
   [MATH: <mi>γ</mi> :MATH]
   are presented in Supplementary Fig. [122]4.

Evaluation metrics

   Accuracy(AC), adjusted rand index (ARI)^[123]45 and silhouette
   coefficient^[124]46 are used to assess the performance of the
   clustering methods. For unlabeled dataset, we use an unsupervised
   metric, silhouette coefficient^[125]47,[126]48, to evaluate the
   clustering performance of each method. High silhouette coefficient
   score indicates that the sample is close to other samples in the same
   cluster, and distant from samples in other clusters. The average value
   of silhouette scores is used as the final evaluation.

The robustness analysis of NMFGOT on the hyperparameters

   To test the robustness of NMFGOT on the hyperparameters, we varied α
   and γ in the range
   [MATH: <mrow><mfenced close="}" open="{"><mrow><msup><mrow><mi
   mathvariant="normal">α</mi></mrow><mrow><mo>*</mo></mrow></msup><mo>/</
   mo><mn>10</mn><mo>,</mo><msup><mrow><mi
   mathvariant="normal">α</mi></mrow><mrow><mo>*</mo></mrow></msup><mo>/</
   mo><mn>5</mn><mo>,</mo><mspace width="1em"></mspace><msup><mrow><mi
   mathvariant="normal">α</mi></mrow><mrow><mo>*</mo></mrow></msup><mo>/</
   mo><mn>2</mn><mo>,</mo><msup><mrow><mi
   mathvariant="normal">α</mi></mrow><mrow><mo>*</mo></mrow></msup><mo>,</
   mo><mspace width="1em"></mspace><mn>2</mn><msup><mrow><mi
   mathvariant="normal">α</mi></mrow><mrow><mo>*</mo></mrow></msup><mo>,</
   mo><mn>5</mn><msup><mrow><mi
   mathvariant="normal">α</mi></mrow><mrow><mo>*</mo></mrow></msup><mo>,</
   mo><mspace width="1em"></mspace><mn>10</mn><msup><mrow><mi
   mathvariant="normal">α</mi></mrow><mrow><mo>*</mo></mrow></msup><mo>,</
   mo><mspace width="1em"></mspace></mrow></mfenced></mrow> :MATH]
   and
   [MATH: <mrow><mfenced close="}" open="{"><mrow><msup><mrow><mi
   mathvariant="normal">γ</mi></mrow><mrow><mo>*</mo></mrow></msup><mo>/</
   mo><mn>10</mn><mo>,</mo><msup><mrow><mi
   mathvariant="normal">γ</mi></mrow><mrow><mo>*</mo></mrow></msup><mo>/</
   mo><mn>5</mn><mo>,</mo><msup><mrow><mi
   mathvariant="normal">γ</mi></mrow><mrow><mo>*</mo></mrow></msup><mo>/</
   mo><mn>2</mn><mo>,</mo><msup><mrow><mi
   mathvariant="normal">γ</mi></mrow><mrow><mo>*</mo></mrow></msup><mo>,</
   mo><mspace width="1em"></mspace><mn>2</mn><msup><mrow><mi
   mathvariant="normal">γ</mi></mrow><mrow><mo>*</mo></mrow></msup><mo>,</
   mo><mn>5</mn><msup><mrow><mi
   mathvariant="normal">γ</mi></mrow><mrow><mo>*</mo></mrow></msup><mo>,</
   mo><mspace width="1em"></mspace><mn>10</mn><msup><mrow><mi
   mathvariant="normal">γ</mi></mrow><mrow><mo>*</mo></mrow></msup></mrow>
   </mfenced></mrow> :MATH]
   , respectively. Here
   [MATH: <msup><mrow><mi
   mathvariant="normal">α</mi></mrow><mrow><mo>*</mo></mrow></msup> :MATH]
   and
   [MATH: <msup><mrow><mi
   mathvariant="normal">γ</mi></mrow><mrow><mo>*</mo></mrow></msup> :MATH]
   are the hyperparameters chosen by the rules described in the main text.
   The results are presented in Supplementary Fig. [127]4. For IBD
   dataset, the silhouette score is relatively stable when the
   hyperparameters vary. For GC dataset, the AC is stable when the
   hyperparameters vary.

Extension of NMFGOT to unseen or holdout data

   we extended NMFGOT to analyze the unseen or holdout data. Given the
   unseen or holdout data
   [MATH: <mover accent="true"><mrow><mi>X</mi></mrow><mo>ˆ</mo></mover>
   :MATH]
   , the transformation objective function can be defined as follows:
   [MATH: <mrow><munder><mrow><mi>min</mi></mrow><mrow><msup><mrow><mover
   accent="true"><mrow><mi>H</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup><mo>,</mo><mi>Š
   </mi></mrow></munder><mi>J</mi><mo>=</mo><munderover accent="false"
   accentunder="false"><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><m
   n>1</mn></mrow><mrow><mn>2</mn></mrow></munderover><msubsup><mrow><mfen
   ced close="∥" open="∥"><mrow><msup><mrow><mover
   accent="true"><mrow><mi>X</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup><mo>−</mo><msup
   ><mrow><mi>W</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup><msup><mrow><mo
   ver
   accent="true"><mrow><mi>H</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><mrow><mo>(</mo><mrow><mi>i</mi></mrow><mo>)</mo></mrow></mro
   w></msup></mrow></mfenced></mrow><mrow><mi>F</mi></mrow><mrow><mn>2</mn
   ></mrow></msubsup><mo>+</mo><mi>α</mi><munderover accent="false"
   accentunder="false"><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><m
   n>1</mn></mrow><mrow><mn>2</mn></mrow></munderover><msubsup><mrow><mfen
   ced close="∥" open="∥"><mrow><mi>Š</mi><mo>−</mo><msup><mrow><mfenced
   close="]" open="["><mrow><msup><mrow><mover
   accent="true"><mrow><mi>H</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup><mfenced
   open="∥"><mrow></mrow></mfenced><msup><mrow><mi>H</mi></mrow><mrow><mfe
   nced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup></mrow></mfence
   d></mrow><mrow><mi>T</mi></mrow></msup><mfenced close="]"
   open="["><mrow><msup><mrow><mover
   accent="true"><mrow><mi>H</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup><mfenced
   open="∥"><mrow></mrow></mfenced><msup><mrow><mi>H</mi></mrow><mrow><mfe
   nced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup></mrow></mfence
   d></mrow></mfenced></mrow><mrow><mi>F</mi></mrow><mrow><mn>2</mn></mrow
   ></msubsup><mo>+</mo><mi>β</mi><msubsup><mrow><mfenced close="∥"
   open="∥"><mrow><mi>Š</mi><mi mathvariant="bold">1</mi><mo>−</mo><mi
   mathvariant="bold">1</mi></mrow></mfenced></mrow><mrow><mi>F</mi></mrow
   ><mrow><mn>2</mn></mrow></msubsup><mo>,</mo></mrow> :MATH]
   15
   [MATH: <mrow><mi>s</mi><mo>.</mo><mi>t</mi><mo>.</mo><mspace
   width="0.25em"></mspace><msup><mrow><mover
   accent="true"><mrow><mi>H</mi></mrow><mo>^</mo></mover></mrow><mrow><mr
   ow><mo>(</mo><mrow><mi>i</mi></mrow><mo>)</mo></mrow></mrow></msup><mo>
   ,</mo><mi>Š</mi><mo>,</mo><mi>α</mi><mo>,</mo><mi>β</mi><mo>≥</mo><mn>0
   </mn></mrow> :MATH]

   where
   [MATH: <msup><mrow><mover
   accent="true"><mrow><mi>H</mi></mrow><mo>^</mo></mover></mrow><mrow><mr
   ow><mo>(</mo><mrow><mi>i</mi></mrow><mo>)</mo></mrow></mrow></msup>
   :MATH]
   represents the low-dimensional representation of unseen data,
   [MATH: <mrow><mover
   accent="true"><mrow><mi>S</mi></mrow><mo>ˇ</mo></mover><mo>=</mo><mrow>
   <mfenced close="]" open="["><mrow><mtable><mtr><mtd><mover
   accent="true"><mrow><mi>S</mi></mrow><mo>ˆ</mo></mover></mtd><mtd><msup
   ><mrow><mi>A</mi></mrow><mrow><mi>T</mi></mrow></msup></mtd></mtr><mtr>
   <mtd><mi>A</mi></mtd><mtd><mi>S</mi></mtd></mtr></mtable></mrow></mfenc
   ed></mrow></mrow> :MATH]
   is the fused similarity matrix.
   [MATH: <mover accent="true"><mrow><mi>S</mi></mrow><mo>ˆ</mo></mover>
   :MATH]
   is the similarity matrix between the unseen data, and
   [MATH: <mi>A</mi> :MATH]
   is the similarity matrix between the unseen data and the training data.
   We used the consensus graph fusion,
   [MATH:
   <mrow><msubsup><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><mn>1</
   mn></mrow><mrow><mn>2</mn></mrow></msubsup><msubsup><mrow><mo>∣</mo><mo
   >∣</mo><mi>Š</mi><mo>−</mo><msup><mrow><mrow><mo>[</mo><mrow><msup><mro
   w><mover
   accent="true"><mrow><mi>H</mi></mrow><mo>^</mo></mover></mrow><mrow><mf
   enced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup><mo>∣</mo><mo>∣
   </mo><msup><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup></mrow><mo>]</m
   o></mrow></mrow><mrow><mi>T</mi></mrow></msup><mrow><mo>[</mo><mrow><ms
   up><mrow><mover
   accent="true"><mrow><mi>H</mi></mrow><mo>^</mo></mover></mrow><mrow><mf
   enced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup><mo>∣</mo><mo>∣
   </mo><msup><mrow><mi>H</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup></mrow><mo>]</m
   o></mrow><mo>∣</mo><mo>∣</mo></mrow><mrow><mi>F</mi></mrow><mrow><mn>2<
   /mn></mrow></msubsup></mrow> :MATH]
   , to transform the unseen data into the latent space.

   We analyzed a colorectal cancer data, where microbial abundance profile
   and metabolite abundance data are simultaneously profiled in the same
   samples. The experimental results were presented in Supplementary Fig.
   [128]3.

Lasso regression and stability selection approach for metabolite-microbe
associations

   After obtaining the factors in our NMFGOT model, the lasso penalized
   regression model was used to identify the association between a
   metabolite and a set of microbial taxa^[129]18:
   [MATH: <mrow><munderover accent="false"
   accentunder="false"><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><m
   n>1</mn></mrow><mrow><mi>n</mi></mrow></munderover><msup><mrow><mfenced
   close=")"
   open="("><mrow><msub><mrow><mi>y</mi></mrow><mrow><mi>i</mi></mrow></ms
   ub><mo>−</mo><msub><mrow><mi>β</mi></mrow><mrow><mn>0</mn></mrow></msub
   ><mo>−</mo><munderover accent="false"
   accentunder="false"><mrow><mo>∑</mo></mrow><mrow><mi>j</mi><mo>=</mo><m
   n>1</mn></mrow><mrow><mi>p</mi></mrow></munderover><msub><mrow><mi>β</m
   i></mrow><mrow><mi>j</mi></mrow></msub><msub><mrow><mi>x</mi></mrow><mr
   ow><mi>i</mi><mi>j</mi></mrow></msub></mrow></mfenced></mrow><mrow><mn>
   2</mn></mrow></msup><mo>+</mo><mi>λ</mi><munderover accent="false"
   accentunder="false"><mrow><mo>∑</mo></mrow><mrow><mi>j</mi><mo>=</mo><m
   n>1</mn></mrow><mrow><mi>p</mi></mrow></munderover><mfenced close="∣"
   open="∣"><mrow><msub><mrow><mi>β</mi></mrow><mrow><mi>j</mi></mrow></ms
   ub></mrow></mfenced><mo>,</mo></mrow> :MATH]
   16

   where
   [MATH: <mi>n</mi> :MATH]
   is the number of samples,
   [MATH: <mi>p</mi> :MATH]
   is the number of microbial taxa.
   [MATH: <mi>y</mi> :MATH]
   is the response variable (metabolite abundance),
   [MATH: <mi>x</mi> :MATH]
   is the predictor (taxa abundance).
   [MATH: <mi>λ</mi> :MATH]
   is the parameter that controls the sparseness of models.

   Due to the sensitive of the lasso model, we also used stability
   selection approach to choose robust microbial taxa associated with a
   metabolite^[130]8,[131]19. In this manuscript, we used stability
   selection with lasso to select stable microbes. The process is
   demonstrated as follows:

   Step 1. Select a random subset of the analyzed data. We used top 100
   microbes with large entries in the column of the microbe loading matrix
   [MATH: <msup><mrow><mi>W</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>1</mn></mrow></mfenced></mrow></msup> :MATH]
   and top 50 metabolites with large entries in the column of the
   metabolite loading matrix
   [MATH: <msup><mrow><mi>W</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>2</mn></mrow></mfenced></mrow></msup> :MATH]
   .

   Step 2. Fit the lasso model with
   [MATH: <mover accent="true"><mrow><mi>λ</mi></mrow><mo>ˆ</mo></mover>
   :MATH]
   that is about the best penalty
   [MATH: <mi>λ</mi> :MATH]
   , and record the set of selected microbes.

   Step 3. Repeat steps 1 and 2 t times.

   Step 4. Compute the frequency
   [MATH: <msub><mrow><mi>f</mi></mrow><mrow><mi>i</mi></mrow></msub>
   :MATH]
   of each microbe that was selected across all trials.

   Step 5. Pick out the microbes with its
   [MATH:
   <mrow><msub><mrow><mi>f</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>≥<
   /mo><msub><mrow><mi>f</mi></mrow><mrow><mi
   mathvariant="italic">thr</mi></mrow></msub></mrow> :MATH]
   .
   [MATH: <msub><mrow><mi>f</mi></mrow><mrow><mi
   mathvariant="italic">thr</mi></mrow></msub> :MATH]
   is a prespecified threshold.

   In this manuscript, we set the size of a random subset as
   [MATH:
   <mrow><mrow><mi>n</mi></mrow><mo>/</mo><mrow><mn>2</mn></mrow></mrow>
   :MATH]
   data,
   [MATH: <mrow><mi>t</mi><mo>=</mo><mn>100</mn></mrow> :MATH]
   and
   [MATH: <mrow><msub><mrow><mi>f</mi></mrow><mrow><mi
   mathvariant="italic">thr</mi></mrow></msub><mo>=</mo><mn>0.6</mn></mrow
   > :MATH]
   .

Supplementary information

   [132]Supplementary material^ (486.8KB, pdf)

Acknowledgements