Abstract
Simultaneous profiling of spatial transcriptomics (ST) and spatial
metabolomics (SM) on the same or adjacent tissue sections offers a
revolutionary approach to decode tissue microenvironment and identify
potential therapeutic targets for cancer immunotherapy. Unlike other
spatial omics, cross-modal integration of ST and SM data is challenging
due to differences in feature distributions of transcript counts and
metabolite intensities, and inherent disparities in spatial morphology
and resolution. Furthermore, cross-sample integration is essential for
capturing spatial consensus and heterogeneous patterns but is often
complicated by batch effects. Here, we introduce SpatialMETA, a
conditional variational autoencoder (CVAE)-based framework for
cross-modal and cross-sample integration of ST and SM data. SpatialMETA
employs tailored decoders and loss functions to enhance modality
fusion, batch effect correction and biological conservation, enabling
interpretable integration of spatially correlated ST-SM patterns and
downstream analysis. SpatialMETA identifies immune spatial clusters
with distinct metabolic features in cancer, revealing insights that
extend beyond the original study. Compared to existing tools,
SpatialMETA demonstrates superior reconstruction capability and fused
modality representation, accurately capturing ST and SM feature
distributions. In summary, SpatialMETA offers a powerful platform for
advancing spatial multi-omics research and refining the understanding
of metabolic heterogeneity within the tissue microenvironment.
Subject terms: Computational models, Data integration, Machine learning
__________________________________________________________________
Simultaneous profiling of spatial transcriptomics (ST) and metabolomics
(SM) offers a novel way to decode tissue microenvironment
heterogeneity. Here, the authors present SpatialMETA, a conditional
variational autoencoder-based framework designed for the integration of
ST and SM data.
Introduction
In recent years, spatial omics technologies have advanced rapidly,
particularly spatial transcriptomics (ST), which systematically
measures gene expression levels within the spatial context of tissues.
ST technologies are categorized into imaging-based ST (iST) and
sequencing-based ST (sST)^[46]1–[47]3. iST technologies include
MERFISH^[48]4, seqFISH^[49]5, CosMx^[50]6, Xenium^[51]7, and
STARMap^[52]8, while sST technologies encompass 10X Visium, Visium
HD^[53]9, Slide-seq^[54]10, and Stereo-seq^[55]11. ST facilitates the
study of cell-cell interactions^[56]12–[57]14 and spatial pattern
identification^[58]15–[59]17, offering profound insights into fields
such as cancer research, neuroscience, developmental biology, and plant
biology^[60]1–[61]3. Spatial metabolomics (SM) employs mass
spectrometry imaging to measure metabolite intensities and their
spatial position^[62]18,[63]19. Representative SM technologies include
Desorption Electrospray Ionization (DESI)^[64]20 and Matrix-Assisted
Laser Desorption/Ionization (MALDI)^[65]21. These techniques are
primarily applied to investigate cancer-associated metabolic
reprogramming, enabling the identification of crucial biomarkers with
potential diagnostic or therapeutic applications^[66]19,[67]22.
Combining ST and SM allows for a comprehensive characterization of
tissue microenvironment across spatial, transcriptomic, and metabolomic
modalities. Particularly in cancer research, the tumor microenvironment
(TME) presents unique properties including hypoxia, acidity, and
nutrient deprivation, which suppress anticancer immune responses and
diminish the effectiveness of immunotherapy in solid
tumors^[68]23,[69]24. Therefore, integrating transcriptomic and
metabolomic modalities is crucial for studying immune cell functional
plasticity and metabolic state heterogeneity. Spatially resolved data
can unveil intricate heterogeneities of tissue morphology, cell type
composition, and cellular interactions, offering critical insights into
tumor progression and therapeutic responses^[70]25. Simultaneous ST and
SM profiling has been widely applied in studies of prostate cancer
(PC)^[71]26, gastric cancer^[72]27, clear cell renal cell carcinoma
(ccRCC)^[73]28, glioblastoma (GBM)^[74]29, as well as human and mouse
brains^[75]30,[76]31, providing valuable insights into metabolic
reprogramming and metabolic heterogeneity under these contexts. Even
though measurement of ST and SM on the same tissue section has been
demonstrated^[77]31, most current spatial multi-omics practices
typically use adjacent or serial sections from the same tissue
sample^[78]27–[79]32.
The cross-modal (vertical) and cross-sample (horizontal) integration of
non-spatial single-cell multi-omics data has been advanced through the
application of machine learning and deep learning frameworks. Notable
approaches for vertical integration include Seurat^[80]33,[81]34,
totalVI^[82]35, and Stabmap^[83]36, methods such as Seurat
CCA/RPCA^[84]34, scVI^[85]37, scANVI^[86]38, SCALEX^[87]39,
scPoli^[88]40, and scAtlasVAE^[89]41 are designed for horizontal
integration. Recent advances in multi-omics integration have extended
into spatial omics, with methods such as spaVAE^[90]42,
spaMultiVAE^[91]42, SpatialGLUE^[92]43, and MISO^[93]44. Among them,
spaVAE, spaMultiVAE, and SpatialGLUE are primarily designed for
integrating ST with spatial proteomics (SP) or spatial epigenomics
(SE), where both modalities are derived from the same tissue section.
Recently, MISO was developed to integrate ST, SP, SE, and SM data
derived from the same tissue sections, utilizing an algorithm
specifically designed to incorporate histological images. However, most
ST and SM data are obtained from adjacent tissue sections, which can
create discrepancies in spatial morphology and resolution. This
necessitates additional preprocessing steps to unify spatial
coordinates and resolution. Moreover, SM data is represented as
continuous mass spectrometry intensity, rather than sequencing counts
as in ST, SP, or SE. This difference results in distinct feature
distributions and necessitates specialized model design. In addition,
capturing spatial consensus and heterogeneous patterns in ST and SM
data across samples is crucial. Hence, it is essential to develop
methods for the simultaneous vertical and horizontal integration of ST
and SM data that effectively balance modality fusion, batch effect
correction, and biological conservation, to accurately characterize the
complexities of tissue microenvironments.
Here, we present SpatialMETA (Spatial Metabolomics and Transcriptomics
Analysis), a method designed for the vertical and horizontal
integration of ST and SM data. First, SpatialMETA aligns and reassigns
ST and SM data to achieve a unified spatial resolution. It then employs
a Conditional Variational Autoencoder (CVAE) framework to learn the
joint latent embedding for vertical and horizontal integration. To
correct batch effects in the latent space, conditioning is incorporated
into the decoder. To account for modality-specific features, the
reconstruction loss of ST data is modeled using a zero-inflated
negative binomial (ZINB) distribution, while SM data is modeled with a
Gaussian distribution. Additionally, shared decoders are applied
simultaneously to both single-modality and joint embeddings,
facilitating modality fusion and model interpretability. We also
demonstrated the efficacy of SpatialMETA using previously published
human ccRCC ST and SM data, revealing a spatial immune cluster enriched
with lipid metabolism-associated features. SpatialMETA outperforms
existing methods in both vertical and horizontal integration on human
ccRCC^[94]28, GBM^[95]29, and mouse brain^[96]31 datasets, assessed by
a series of designed metrics^[97]45,[98]46. SpatialMETA provides a
comprehensive suite of analytical and visualization capabilities to
facilitate in-depth data interpretation.
Results
SpatialMETA model architecture
SpatialMETA comprises four key modules: alignment, reassignment,
integration, and analysis (Fig. [99]1). To align the morphology of ST
and SM data obtained from adjacent tissue sections, SpatialMETA
incorporates rotation, translation, and non-linear distortion
(Fig. [100]1a). Publicly available ST datasets employed in this study
generally exhibit lower resolution compared to SM data. To address this
discrepancy, SpatialMETA employs a K-nearest neighbor (KNN) approach,
reassigning SM data to ensure consistency with the spots and resolution
of ST data (Fig. [101]1b).
Fig. 1. Schematic representation of the integrated analysis of ST and SM with
SpatialMETA.
[102]Fig. 1
[103]Open in a new tab
a Schematic illustration for the alignment process between ST and SM
data. SM spots align with ST spots or histology image derived from ST
through rotation, translation, and non-linear distortion. b Schematic
illustration for the reassignment process, in which SM data is adjusted
to a unified resolution with ST data by k-nearest neighbors (KNN)
computation. c Schematic illustration for spatially variable features
(SVFs) identification before vertical or horizontal integration. d
Schematic illustration for vertical and horizontal integration of ST-SM
data using a conditional variational autoencoder (CVAE). e Schematic
overview for downstream analysis and visualization capabilities of
SpatialMETA. f Schematic illustration for different scenarios supported
by SpatialMETA. SVGs spatially variable genes, SVMs spatially variable
metabolites, TOI trajectories of interest, ROI spatial regions of
interest.
SpatialMETA calculates the spatially variable features (SVFs) and
adopts a CVAE framework to vertically and horizontally integrate the
aligned ST and SM data (Fig. [104]1c, d). For vertical integration,
separate encoders extract feature representations (ST-only and SM-only
embeddings) from the original gene expression matrix (GEM) and
normalized metabolite intensity matrix (MIM), respectively. These
representations are concatenated and fed into a linear projection layer
to generate a joint embedding, which is then utilized to reconstruct
GEM and MIM data via ST or SM decoder. Additionally, the same ST or SM
decoders are applied to the ST-only or SM-only embedding. This unique
model design incorporates corresponding reconstruction losses that
constrain the ST- and SM-only embedding to closely resemble the joint
embedding within the latent space. Furthermore, angular similarity
between the single-modality embedding and joint embedding quantifies
the contribution of each modality, enhancing model interpretability.
For simultaneous vertical and horizontal integration, SpatialMETA
employs a batch-invariant encoder and a batch-variant decoder. While
batch labels are not utilized during encoding, they are incorporated
into the joint, ST-only, or SM-only embeddings during decoding,
facilitating effective batch effects correction across multiple
samples. Additionally, Maximum Mean Discrepancy (MMD) losses are
applied to ensure the comparability of joint embeddings across
different samples^[105]47. To model the feature distributions,
SpatialMETA adopts a ZINB distribution for GEM, which is widely used in
single-cell transcriptomics computational methods^[106]37,[107]48. MIM
is modeled using a Gaussian negative log-likelihood loss (hereafter
referred to as Gaussian loss), which reflects the typical intensity
distribution of most spatially variable metabolites (SVMs)
(Supplementary Fig. [108]1a).
SpatialMETA also offers a suite of analysis and visualization
functionalities (Fig. [109]1e), including spatial clustering,
identification of marker genes and metabolites, metabolite annotation
and enrichment analysis, gene-metabolite correlation analysis,
user-defined spatial trajectories of interest (TOI) for gene expression
and metabolite intensity, interactive analysis for user-defined spatial
regions of interest (ROI), quantification of modality contribution and
network analysis for spatial clusters. SpatialMETA supports various
scenarios, including vertical integration for ST and SM data from
single sample, simultaneous vertical and horizontal integration of ST
and SM data, as well as horizontal integration for SM data only
(Fig. [110]1f).
Alignment and reassignment of ST and SM data
To demonstrate the performance of SpatialMETA in aligning and
reassigning ST and SM data, we applied the method to publicly available
datasets from ccRCC (Y7_T), ccRCC (R29_T), mouse brain (m3_FMP), and
GBM (248_T)^[111]28,[112]29,[113]31. Since these ST and SM datasets are
obtained from adjacent tissue sections, they share overall
morphological similarities but also exhibit variations in tissue
morphology and misalignment in orientation. Additionally, differences
in spatial resolution arise due to the use of different platforms and
technologies. The raw ST and SM data, as initially presented, reflect
these discrepancies in morphology and resolution (Supplementary
Fig. [114]1b–e). While SM data exhibit morphological similarities to ST
data and their corresponding ST histology images, positional
discrepancies between SM and ST spots are evident.
SpatialMETA’s alignment module addresses these discrepancies by
applying a series of transformations to the SM spot coordinates,
including rotation, translation, non-linear distortion, and,
optionally, diffeomorphic metric mapping. These transformations are
optimized using gradient descent, adapted from the STalign
method^[115]49. Additionally, SpatialMETA provides an option to align
the latent features of the SM data with either latent features
derived from ST data or features from ST histology, utilizing
rasterization to convert the spots into an image with a specified
resolution^[116]49. In most instances, the transformation is optimized
by aligning the outline of SM spots with that of the ST histology
image, given their high degree of morphological similarity. However, it
is also feasible to align the outline of SM spots with the ST spots.
This process enables the projection of the SM data, which lacks
histology images, onto histology images derived from ST data in various
datasets, including ccRCC (R29_T), mouse brain (m3_FMP), and GBM
(248_T) (Supplementary Fig. [117]1f–m).
After the alignment of spatial coordinates for ST and SM data,
SpatialMETA performs a reassignment step. In most datasets employed in
this study, ST data generally exhibit lower resolution than SM data,
except for the mouse brain (m3_FMP) sample, where their resolutions are
comparable. By default, SpatialMETA reassigns SM data to ST coordinates
to achieve a consistent spatial resolution (Supplementary
Fig. [118]1n–q). For instance, in the ccRCC Y7_T sample, the raw ST
data contain 2,018 spots, whereas the SM data have 10,145 spots
(Supplementary Fig. [119]1b). SpatialMETA utilizes ST coordinates,
identifies KNN spots in the SM data, and averages their MIM values to
generate an aggregated SM dataset with 2,018 spots (Supplementary
Fig. [120]1n). Furthermore, SpatialMETA offers the flexibility to
reassign ST data to SM coordinates when ST data have higher spatial
resolution, ensuring adaptability to emerging high-resolution ST
technologies.
Vertical integration of ST and SM data and benchmark analysis
To evaluate the vertical and horizontal integration performance of
SpatialMETA, we compared it with other state-of-the-art methods for
non-spatial single-cell and spatial multi-omics data integration. The
methods include deep learning-based approaches such as scVI^[121]37,
scANVI^[122]38, totalVI^[123]35, scPoli^[124]40, spaVAE^[125]42,
spaMultiVAE^[126]42, SpatialGLUE^[127]43, and MISO^[128]44 as well as
traditional machine learning techniques, including Seurat
RPCA/CCA/BNN^[129]33,[130]34, Stabmap^[131]36,and principal component
analysis (PCA) (Supplementary Fig. [132]2a, Supplementary Data [133]1).
Among these methods, only MISO supports the incorporation of
histological image data. We evaluated MISO under two configurations:
MISO_2m, which integrates the ST and SM modalities; MISO_3m, which
integrates ST, SM, and histological image embeddings. Additionally, we
performed PCA on raw ST data, raw SM data, reassigned SM data, and
reassigned SM combined with raw ST data, revealing a high degree of
similarity in spatial patterns before and after the alignment and
reassignment preprocessing (Supplementary Fig. [134]2b–d).
SpatialMETA facilitates the vertical integration of ST and SM data and
provides interpretable quantification of modality contribution for each
spatial spot (Fig. [135]2a). To evaluate its performance for vertical
integration, we introduced four key metrics for benchmark analysis:
reconstruction accuracy, continuity score, marker score, and biological
conservation score (Fig. [136]2b, Supplementary Fig. [137]2e–i).
Reconstruction accuracy is assessed using the Pearson Correlation
Coefficient (PCC) and Cosine Similarity (CS), with higher values
indicating better feature representation (Supplementary Fig. [138]2e).
Continuity score, evaluating smoothness and consistency, is measured
using CHAOS, and the percentage of abnormal spots (PAS), as proposed in
SDMBench^[139]46 (Supplementary Fig. [140]2f). Marker score, assessing
the efficiency of modality fusion and the preservation of biologically
meaningful spatial patterns, are quantified using Moran’s I and Geary’s
C from SDMBench^[141]46, along with specificity, logistic, and Mutual
Information (MI) scores proposed in this study (Supplementary
Fig. [142]2g, and see “Methods” for details). Biological
conservation score was evaluated by measuring the preservation of
cell-type level variation, using Adjusted Rand Index (ARI), Normalized
Mutual Information (NMI), the Average Silhouette Width (ASW) of cell
types, the cell-type separation Local Inverse Simpson’s Index (cLISI),
and the Isolated label ASW, as introduced by scIB^[143]45
(Supplementary Fig. [144]2h). These metrics were averaged as overall
score, allowing a comprehensive benchmark against other vertical
integration methods across eleven samples from ccRCC, mouse brain, and
GBM. Overall, our benchmark results suggested that SpatialMETA
outperformed existing methods, demonstrating superior performance in
reconstruction and modality fusion (Fig. [145]2b). Additionally, we
quantified the computational efficiency in terms of time and memory
consumption, with SpatialMETA exhibiting reasonable memory usage and
efficient processing time (generally less than 2 gigabytes (GB) and
100 s for memory and time consumption) (Supplementary Fig. [146]2j). To
further evaluate the scalability of SpatialMETA, we simulated ST and SM
datasets with varying degrees of spot numbers ranging from 10,000 to
100,000. The results showed approximately linear increases in both time
and memory usage, with memory consumption primarily driven by data
scale rather than model complexity (Supplementary Fig. [147]2k).
Fig. 2. Benchmarking and application of vertical integration using
SpatialMETA.
[148]Fig. 2
[149]Open in a new tab
a Schematic diagram illustrating the vertical integration process of
SpatialMETA. b Summary of benchmarking metrics assessing the vertical
integration performance of different tools on ST and SM data. The tools
are arranged in descending order based on their overall scores. Source
data are provided as a [150]Source Data file. c–e Spatial plots
comparing the original (upper left and middle panels) and denoised
(lower left and middle panels) gene expression (left panels) and
metabolite intensity (middle panels) data processed by SpatialMETA. Bar
plots display the Pearson correlation coefficient (PCC) and cosine
similarity between the original and denoised data for each method
(right panels). Spatial plots visualizing Leiden clustering results
derived from different tools for ccRCC (f), GBM (g), and mouse brain
(h) datasets. Bar plots depicting spatial continuity scores for ccRCC
(i), GBM (j), and mouse brain (k) datasets across tools, bars are
color-coded by method. Stacked bar plots showing marker scores for ST
(green) and SM (yellow) data across different tools for RCC (l), GBM
(m), and mouse brain (n) samples. Imm immune clusters, Endo endothelial
clusters, Stro stromal clusters, Mal malignant clusters. Note that the
color legend is placed at the bottom of the figure, and the colors of
clusters do not directly correspond to the same captured structures
across different methods.
Compared to totalVI and spaMultiVAE, SpatialMETA generally achieves the
highest PCC and CS scores, demonstrating superior reconstruction
accuracy (Supplementary Fig. [151]2e). Furthermore, SpatialMETA
effectively denoises both modalities in the reconstructed GEM and MIM,
as represented by the selected spatially variable genes (SVGs) or SVMs
(Fig. [152]2c–e). Given that the spot size of ST data employed in this
study are larger than cell size, spot annotation could be performed by
cell type deconvolution using scRNA-seq data as reference. For example,
in the case of ccRCC (Y7_T), we conducted deconvolution and automatic
cell type annotations via DestVI^[153]50 using the original and
reconstructed GEM along with the ccRCC scRNA-seq reference^[154]28
(Supplementary Fig. [155]3a). The reconstructed GEM visually presents a
more spatially homogeneous distribution in cell type deconvolution
(Supplementary Fig. [156]3b–e).
To evaluate SpatialMETA’s ability to generate coherent spatial patterns
through vertical integration, we benchmarked it against
SeuratV5_BNN^[157]33,[158]34, totalVI^[159]1, Stabmap^[160]36,
spaMultiVAE^[161]42, SpatialGLUE^[162]43, and MISO^[163]44. Leiden
clustering was then performed using default parameters, and the
continuity score was evaluated. For ccRCC Y7_T sample, the Leiden
clustering results obtained from SpatialMETA were manually annotated
(referred to as cluster annotation) into four main spatial clusters:
immune clusters (Imm), endothelial clusters (Endo), stromal clusters
(Stro), and malignant clusters (Mal), which were subsequently used for
downstream analyses (Supplementary Fig. [164]4a–e). The results
indicated that SpatialMETA, MISO, and SeuratV5_BNN consistently
generated visually coherent spatial clusters evaluated with CHAOS and
PAS scores (Fig. [165]2f–k, and Supplementary Fig. [166]2f). TotalVI
exhibited variable performance across datasets. For example, it
performed well on the mouse brain and GBM samples but was less
effective on the ccRCC datasets (Fig. [167]2f–k). Moreover, the Geary’s
C and Moran’s I scores demonstrated that SpatialMETA effectively
integrates ST and SM features compared to other methods
(Fig. [168]2l–n, and Supplementary Fig. [169]2g). Notably, MISO_3m
achieved the best performance in terms of biological conservation. This
enhancement compared to MISO_2m may be attributed to the incorporation
of histological image information in MISO_3m (Fig. [170]2b, and
Supplementary Fig. [171]2h).
To evaluate the effectiveness of the SpatialMETA model design, we
systematically analyzed the impact of key model components on the
performance of vertical integration (Supplementary Fig. [172]4f, g).
Specifically, we compared different configurations of SM loss
functions, including Mean Squared Error (MSE) and Gaussian, as well as
the effect of SM data normalization, with or without normalization.
Additionally, we evaluated decoder architectures under two scenarios:
one utilizing the designed shared decoders and the other employing
independent decoders, which directly input concatenated joint
embeddings into separate ST and SM decoders (Supplementary
Fig. [173]4f). Their performance was assessed using metrics such as
reconstruction accuracy, continuity score, marker score, and biological
conservation score. The benchmarking results highlighted the critical
importance of Gaussian loss function and normalization for SM data in
achieving effective vertical integration. Furthermore, the shared
decoder architecture led to improved performance in modality fusion,
outperforming the independent decoder model (Supplementary
Fig. [174]4g).
Comprehensive downstream analyses for ST and SM data with SpatialMETA
SpatialMETA’s quantification of ST and SM contributions allows the
identification of spatial clusters driven by either or both modalities.
In the ccRCC Y7 sample, all five immune clusters (Imm_1 to 5) and
stromal type 1 cluster (Stro_1) exhibit a balanced contribution from
both ST and SM (Fig. [175]3a). Notably, among the immune subclusters,
Imm_3 and Imm_5 showed a stronger reliance on ST compared to the other
immune subtypes (Fig. [176]3b). In contrast, other spatial clusters,
particularly the endothelial (Endo) and malignant (Mal) clusters, are
predominantly driven by metabolomics (Fig. [177]3a, b). This pattern
could be attributed to the unique metabolic characteristics of cancer
cells, which adapt to survive in the hypoxic and nutrient-deprived
TME^[178]51,[179]52. Additionally, endothelial cells undergo metabolic
reprogramming to promote angiogenesis, supporting tumor growth within
the TME by supplying oxygen and nutrients^[180]53–[181]55. In the mouse
brain (m3_FMP) sample, most spatial clusters are primarily represented
by ST, while SM plays a more prominent role in the cerebral
cortex^[182]56 (Fig. [183]3c, d). Most spatial clusters in the GBM
(248_T) sample displayed a balanced contribution from both ST and SM,
highlighting the importance of both modalities (Supplementary
Fig. [184]5a, b). While SpatialGLUE’s model design incorporates the
calculation of modality weights, it may not be ideally suited for
integrating ST and SM data. This is evidenced in the integration
results for the three samples, which are all dominated by SM
(Supplementary Fig. [185]5c–e). Originally, SpatialGLUE was designed
for integrating ST with SP or ST with SE, where these modalities differ
from SM in terms of feature distribution, which may explain the
suboptimal performance of SpatialGLUE in ST and SM integration.
Fig. 3. Downstream analysis of ST and SM contributions and immune-enriched
spatial clusters using SpatialMETA.
[186]Fig. 3
[187]Open in a new tab
a Violin plots showing the ST and SM contributions across different
Leiden clusters identified by SpatialMETA for ccRCC (Y7_T). b Spatial
plots visualizing the ST and SM contributions generated by SpatialMETA
for ccRCC (Y7_T). Colored dashed outlines correspond to the clusters
identified by SpatialMETA in Fig. [188]2f. c Violin plots showing the
ST and SM contributions across different Leiden clusters derived by
SpatialMETA for mouse brain (m3_FMP). d Spatial plots visualizing the
ST and SM contributions generated by SpatialMETA for mouse brain
(m3_FMP), with the black dashed line outlining the cerebral cortex
region. e Spatial plot depicting immune-cell-enriched subclusters.
Spots are colored by Leiden clusters as defined by SpatialMETA,
consistent with those in Fig. [189]2f. f Scatter plot illustrating the
upregulated differentially expressed genes/metabolites in each
immune-cell-enriched spatial subcluster. Genes/metabolites with a
log2FoldChange > 0.25 and false discovery rate (FDR) < 0.05 in each
cluster are highlighted, with the top 5 genes and top 3 metabolites
labeled. Dot size indicates the proportion of spots expressing the
respective gene or metabolite in each subcluster. g Spatial plot
depicting the expression of marker genes CYP2J2 and ACSM2A, as well as
marker metabolites with m/z values of 267.1356 and 872.7177 in the
Imm_4 subcluster (Wilcoxon test with adjusted p values). Spots are
colored based on scaled gene expression or scaled metabolite intensity.
h Dot plot displaying gene ontology (GO) term enrichment specific to
the Imm_4 subcluster (hypergeometric test with adjusted p values). i
Bar plot displaying enrichment of metabolite groups specific to the
Imm_4 subcluster, assessed by hypergeometric test without multiple
comparison adjustment. As detailed in the “Methods” section. Source
data for (h, i) are provided as a [190]Source Data file.
The transcriptional and metabolic characteristics identified by
SpatialMETA can provide further insights into metabolic reprogramming
with therapeutic potential. To demonstrate this, we focused on immune
clusters for further analysis using SpatialMETA (Fig. [191]3e, f). The
original ccRCC study divided tumor samples into four groups, with Y7_T
sample defined as IM2 (immune subtypes 2) group, showing enriched fatty
acid and amino acid metabolism^[192]28. Here, we further analyzed
sub-spatial clusters. Notably, the Imm_4 sub-cluster prominently
expresses lipid synthesis-related genes (CYP2J2 and ACSM2A) and
metabolites (Phosphatidylglycerol and PA (24:0/24:1(15Z)))
(Fig. [193]3g). Pathway enrichment analysis in SpatialMETA revealed the
upregulated lipid metabolism at both transcriptional and metabolic
levels in Imm_4 (Fig. [194]3h, i). Notably, CYP2J2 has been reported as
a diagnostic and prognostic biomarker associated with immune cell
infiltration in RCC^[195]57,[196]58. Moreover, previous studies have
highlighted lipid metabolic reprogramming as a key metabolic marker of
RCC^[197]59–[198]61.
SpatialMETA also integrates several user-friendly analysis modules,
including automatic annotation of metabolites based on the m/z values
(Supplementary Fig. [199]5f), characterization of marker genes and
metabolites for each spatial cluster (Supplementary Fig. [200]5g), and
correlation analysis for user-defined genes and metabolites
(Supplementary Fig. [201]5h, i). For metabolites of interest, users are
recommended to perform additional experimental validation, such as
liquid chromatography-mass spectrometry (LC-MS), to confirm their
identities. In addition, SpatialMETA provides a user-interactive
interface, as exemplified by its application to the Y7_T sample
(Supplementary Fig. [202]6a). Users can draw a TOI from the malignant
clusters to the immune clusters (Supplementary Fig. [203]6b), enabling
the calculation and visualization of gene expression and metabolite
intensity gradients along the trajectory (Supplementary Fig. [204]6c,
d). Additionally, users can draw ROI, such as tumor core and immune
infiltrated regions (Supplementary Fig. [205]6e), to calculate marker
genes and metabolites within the ROIs (Supplementary Fig. [206]6f).
Simultaneous vertical and horizontal integration of ST and SM data and
benchmark analysis
During vertical integration, SpatialMETA also supports horizontal
integration to perform batch effect correction across samples
(Fig. [207]4a). We utilized five publicly available ccRCC samples
classified as the IM2 group in the original study to demonstrate this
feature. The process begins by aligning and reassigning the ST and SM
data from adjacent tissue sections derived from the same sample.
Subsequently, batch-biased SVGs and SVMs were excluded from the list of
SVFs and used as the input for the SpatialMETA model. This
preprocessing step, compared to relying solely on highly variable genes
and metabolites, alleviates batch effects (Supplementary Fig. [208]7a,
b). Following this, the CVAE model within SpatialMETA further corrected
the batch effect across samples, enabling the identification of unified
spatial clusters characterized by unique transcriptional and metabolic
states (Fig. [209]4b, and Supplementary Fig. [210]7c, d).
Fig. 4. Application of SpatialMETA to multiple ccRCC samples for vertical and
horizontal integration of ST and SM data.
[211]Fig. 4
[212]Open in a new tab
a Schematic representation for the vertical and horizontal integration
of ST and SM data from multiple samples using SpatialMETA. b UMAP
visualization of integrated ST-SM data from five ccRCC samples using
SpatialMETA for dimensionality reduction, colored by samples (left
panel), annotated Leiden clusters identified by SpatialMETA (right
panel). c Network plot illustrating spatial distances between different
spatial clusters. Node size represents the number of spots scaled using
min-max normalization, and node color indicates Leiden clusters
identified by SpatialMETA. Edge weights indicate reciprocal spatial
distances between nodes, calculated as average distances among all
spots within each spatial cluster and scaled using min-max
normalization. Only the top 4 weighted edges for each node are
retained. d Summary of benchmarking metrics evaluating the performance
of vertical and horizontal integration of ST and SM data by different
tools. The tools are arranged in descending order based on their
overall scores. Source data of the benchmark are provided as a
[213]Source Data file. e Spatial plots for individual ccRCC samples,
with spots colored by annotated Leiden clusters derived from
SpatialMETA. The color legend is consistent with (b) (right). Imm
immune clusters, Endo endothelial clusters, Stro stromal clusters, Mal
malignant clusters.
Spatial network analysis for clusters in integrated ccRCC IM2 samples
showed that endothelial clusters consistently exhibited closer spatial
proximity to malignant clusters, indicating frequent co-localization
(Fig. [214]4c, and Supplementary Fig. [215]7e)^[216]62–[217]64. This
finding aligns with the original study, in which they reported enriched
endothelial signatures in IM2 tumors^[218]28. Notably, heterogeneity
was also observed across samples. In sample R114_T, S15_T and Y27_T,
the Imm_2 and malignant clusters co-localized, whereas in sample X49_T,
no immune clusters co-localized with malignant clusters (Supplementary
Fig. [219]7e).
We next conducted a benchmark analysis to evaluate the vertical and
horizontal integration performance of SpatialMETA compared to existing
methods, including Seurat RPCA/CCA^[220]33,[221]34, scVI^[222]37,
scANVI^[223]38, totalVI^[224]35, scPoli^[225]40, spaVAE^[226]42, and
PCA (Fig. [227]4d, and Supplementary Fig. [228]7f–k). The performance
for vertical integration was assessed using metrics such as
reconstruction accuracy, continuity score, marker score, as described
previously. To assess the performance of horizontal integration, we
employed batch ASW, Graph Integration LISI (iLISI), and Principal
component regression (PCR) batch for batch correction, and ARI, NMI,
cell type ASW, Isolated label ASW, and cLISI for biological
conservation, as introduced by scIB^[229]45. The overall integration
score, which reflects both vertical and horizontal integration, was
calculated as the mean of the cross-modality and cross-sample scores.
Achieving a proper balance between modality fusion, batch correction,
and biological conservation is crucial for effective vertical and
horizontal integration of SM and ST data. For all five ccRCC samples,
we performed both vertical and horizontal integration using the
aforementioned methods. SpatialMETA achieved the highest overall
scores, demonstrating superior performance in vertical integration,
while maintaining a reasonable balance between batch correction and
biological conservation (Fig. [230]4d). We visualized the unified
spatial clusters in each sample, noting that other methods often
exhibit biases toward vertical or horizontal integration (Fig. [231]4e,
and Supplementary Fig. [232]8). For instance, while scPoli excelled at
vertical integration, achieving high cross-modal overall score and
providing smoother results, it failed to effectively remove batch
effects, as evidenced by its lower cross-sample overall score and
visual representation (Fig. [233]4d, Supplementary Fig. [234]8). In
contrast, Seurat CCA/RPCA and spaVAE performed well in horizontal
integration, but their vertical integration results were suboptimal
(Fig. [235]4d, Supplementary Fig. [236]8). SpatialMETA outperformed
other methods in both vertical and horizontal integration of the mouse
brain samples, achieving the highest overall score and further
demonstrating its robustness (Supplementary Fig. [237]9, [238]10).
Additionally, when only SM data are available, SpatialMETA supports
horizontal integration and analysis of SM modality independently. Using
the SM data of ccRCC samples as examples, we further benchmarked
SpatialMETA against other horizontal integration methods, including
Seurat CCA/RPCA, scVI, scANVI, scPoli, spaVAE, and PCA (Supplementary
Fig. [239]11a). Overall, scPoli, SpatialMETA, and Seurat CCA/RPCA
demonstrated superior performance compared to the other methods across
continuity score, marker score, and batch correction metrics
(Supplementary Fig. [240]11). The result indicated that while some
horizontal integration tools not specifically designed for SM data,
such as scPoli and Seurat CCA/RPCA, can still effectively applied for
SM-only integration.
Discussion
In this study, we introduced SpatialMETA, a robust deep learning-based
framework for the vertical and horizontal integration of ST and SM
data. With the specialized model design for decoders and loss
functions, SpatialMETA achieves a balanced performance across modality
fusion, batch effect correction and biological conservation. This
approach offers the potential to study the microenvironmental spatial
organization and metabolic heterogeneity of tissues. When applied to
the vertical integration of ccRCC ST and SM data, SpatialMETA
characterizes a sub-spatial cluster of tumor-infiltrating immune cells
with upregulated lipid metabolic pathways. Additionally, the horizontal
integration of ST and SM data across samples enables the identification
of unified spatial clusters and their consistent interactions.
Benchmark analyses demonstrate that, compared to previously published
non-spatial single-cell or spatial multi-omics tools, SpatialMETA
achieved superior overall performance in vertical and horizontal
integration of ST and SM data.
Current spatial multi-omics strategies can be generally categorized
into two main approaches. The first approach involves multimodal
profiling on the same tissue section, generating spatial multi-omics
data that retain identical spatial morphology. Techniques following
this approach either utilize the same spatial spots, such as
Stereo-CITE-seq^[241]65, spatial ATAC-RNA-seq^[242]66, and
CUT&Tag-RNA-seq^[243]66, or employ different spatial spots with varying
resolutions, as demonstrated in a recent method by Vicari et
al.^[244]27. However, these methods face experimental challenges,
including conflicting tissue embedding requirements for different
modalities, and technical limitations that may compromise the quality
of one modality during the acquisition of another^[245]32. The second
approach utilizes adjacent tissue sections for multimodal profiling,
enabling the independent acquisition of each modality, as observed in
the majority of ST and SM joint profiling studies^[246]27–[247]30.
While this method bypasses the experimental constraints of the first
approach, it introduces computational challenges due to morphological
and resolution discrepancies between adjacent multi-modal sections,
which necessitate alignment and reassignment during preprocessing.
Several single-modality spatial alignment algorithms for ST have been
rapidly developed, such as STalign^[248]49, PASTE^[249]67,
GPSA^[250]68, and Spateo^[251]69, which consider both spot coordinates
and transcriptomic similarity to align series of sections from the same
sample. Nevertheless, multi-modal integration methods like spaVAE,
spaMultiVAE, and SpatialGLUE do not incorporate alignment, as they are
designed for spatial multi-modal integration within the same tissue
section and spot coordinates. Currently, most ST and SM techniques rely
on adjacent tissue sections, resulting in different spatial morphology
and resolution, thereby necessitating alignment and reassignment. In
SpatialMETA, we align the SM to the ST using rotation, translation, and
non-linear distortion, and perform reassignment using KNN. This process
alters only the spatial coordinates and resolution of the SM data with
minor influence on biological conclusions, such as the identification
of marker metabolites within specific spatial clusters. Recently, MISO
has incorporated the iStar^[252]70 method that utilizes histological
images to enhance resolution alignment when ST and SM data are
generated from the same tissue section. Such approaches may become
increasingly important as emerging technologies enable multi-modal data
acquisition from the same tissue section, preserving spatial morphology
during resolution unification. Notably, the alignment and reassignment
are currently applied only between two adjacent sections within a
single sample. However, for future applications in 3D spatial omics,
alignment across multiple sections may be necessary to capture the full
spatial features. Looking ahead, we propose that, beyond morphological
similarity, shared spatial patterns across modalities could further
enhance alignment accuracy.
Furthermore, recent advancements in high-resolution spatial
transcriptomics and proteomics technologies, such as CosMx^[253]6,
Xenium^[254]7, Visium HD^[255]9, Stereo-seq^[256]11, Slide-seq^[257]10,
and CODEX^[258]71, when combined with SM modalities, present new
opportunities and challenges for spatial multi-omics integration.
High-resolution spatial omics data, which can profile millions of
individual spots or cells within each tissue section, poses significant
challenges for the computational scalability of analysis. SpatialMETA
demonstrates efficient memory usage, exhibiting approximately linear
growth with an increasing of spots number. This scalability highlights
SpatialMETA’s potential applicability for high-resolution multi-omics
data integration. Additionally, high-resolution ST datasets often
exhibit low sequencing depth per spot, which underscores the importance
of incorporating histological image data. In this context, methods like
MISO, which integrate image-derived features, provide valuable
enhancements for modality fusion^[259]44. High-resolution spatial omics
facilitates the spatial profiling at the single-cell level. Therefore,
incorporating single-cell segmentation algorithms such as
Cellpose3^[260]72, and Proseg^[261]73 would enable the reassignment of
ST and SM data based on individual cells, which may further enhance the
multi-modality integration.
Multi-omics captures cellular identity from different molecular layers,
and the integration of these datasets provides a more comprehensive
understanding of cell types and states. An optimal vertical integration
approach entails deriving a joint embedding that simultaneously
extracts features from multi-modal data. Similar to existing methods
for vertical integration of non-spatial single-cell multi-omics data,
such as MultiVI^[262]74, totalVI^[263]35, Cobolt^[264]75, GLUE^[265]76
and scMVP^[266]77, SpatialMETA employs an encoder-decoder framework to
obtain a joint embedding and reconstruct input matrices. SpatialMETA
further utilizes shared decoders for both the joint embedding and the
ST- or SM-only embeddings, ensuring consistency between the joint
embedding and individual modality embeddings. This approach facilitates
effective modality fusion and enhances the accuracy of ST and SM data
reconstruction. Additionally, the model design enables the calculation
of the angular similarity between embeddings, improving the
interpretability of modality contributions. Additional ablation studies
on model components further emphasizes the importance of shared
decoders, the selection of appropriate SM loss function, and
SM-specific normalization to achieve optimal integration performance.
For vertical integration of spatial multi-omics, Tian et al. developed
spaVAE, which uses a joint embedding for vertical integration, while
horizontal integration is achieved by incorporating the batch
information using another VAE framework, spaMultiVAE^[267]42. In
contrast, SpatialMETA simultaneously integrates both batch information
and the joint embedding, enabling vertical and horizontal integration
within a single VAE framework. SpatialGLUE incorporates spatial
position information through graph neural network (GNN)
framework^[268]43, while MISO adopts Vision Transformers for
histological image representation learning^[269]44. In summary, future
computational methods for spatial multi-omics integration should
consider the incorporation of spatial or image features to enhance the
vertical integration performance.
In both single-cell and spatial omics horizontal integration, batch
variations arise from both technological and biological
factors^[270]32. It is crucial to avoid overcorrecting true biological
variation while effectively removing technological variation during
horizontal integration across different samples. Similar to
SCALEX^[271]39 and scAtlasVAE^[272]78, we introduce a batch-variant
decoder to ensure biological conservation, where the latent embeddings
are derived solely from the original matrix and remains independent of
batch information. To facilitate batch correction and inspired by
scArches^[273]47, we incorporated the MMD loss function to ensure that
the latent embeddings from different batches are similar. Additionally,
achieving an optimal balance between batch correction and biological
conservation is critical when evaluating horizontal integration.
Therefore, we adopted previously established metrics to comprehensively
assess the effectiveness of SpatialMETA’s horizontal
integration^[274]45,[275]46. Notably, for the SM-only horizontal
integration, some existing methods not specifically designed for SM
data may still be applicable, offering additional options for
integrating SM data across batches.
SpatialMETA demonstrates robust performance in vertical and horizontal
integration of ST and SM data, however, several limitations remain.
Firstly, SpatialMETA is designed for low-resolution ST and SM data.
With the rapid development of single-cell resolution spatial omics
technologies, alignment and reassignment could be achieved on cellular
or subcellular levels, necessitating computational advancements for
precise cell segmentation. Secondly, SpatialMETA primarily focuses on
the integration of ST and SM data. As spatial multi-omics technologies
continue to evolve, future iterations of SpatialMETA will need to
accommodate vertical integration involving more than two modalities.
Thirdly, the horizontal integration capabilities of SpatialMETA have
been demonstrated on a limited number of samples from a single study.
With the exponential growth of spatial omics datasets, we foresee that
the horizontal integration of SpatialMETA will be instrumental in
constructing comprehensive spatial multi-omics atlases. Lastly,
collaborative efforts between experimental and computational biologists
are crucial for validating the practical applications and biological
significance of spatial multi-omics data integration.
Methods
Datasets and preprocessing
The ST and SM datasets used in this study were preprocessed into GEM
and MIM formats, respectively. These datasets were derived from human
ccRCC, GBM, and mouse brain samples, obtained from previously published
studies. Detailed information of the datasets and their respective
sources are provided in Supplementary Data [276]2. The ST datasets were
generated using the 10x Genomics Visium platform, while the SM datasets
were produced using both the MALDI and DESI platforms. The raw GEM and
MIM data, as originally provided in the respective publications, were
processed according to the SpatialMETA tutorial for subsequent
integration and analysis. Additionally, all Leiden clusters in this
study were performed using the default parameters of the
‘scanpy.tl.leiden’ function.
Overview of SpatialMETA
SpatialMETA consists of four key components: alignment between ST and
SM with different morphologies, reassignment between ST and SM with
different resolutions, vertical and horizontal integration between ST
and SM from different samples, extensive analysis and visualization
capabilities.
Alignment between ST and SM
We implement an alignment module (
[MATH: ϕ :MATH]
) to perform biological meaningful alignment between ST and SM, which
requires considerations on registration of two modalities: spot
coordinate from ST (
[MATH: CoordST∈R2
:MATH]
) and SM (
[MATH: CoordSM∈R2
:MATH]
), histological image from ST (
[MATH: ImageST∈R3
:MATH]
) and SM (
[MATH: ImageSM∈R3
:MATH]
), and transcriptomic features from ST and metabolomic features from
SM. An alignment between ST and SM requires a transformation
[MATH: ϕ:R2
→R2
:MATH]
, which can be represented by an affine transformation matrix
[MATH: A :MATH]
which include a rotation matrix
[MATH: R∈R2×
2 :MATH]
, a translation matrix
[MATH: t∈R2×
1 :MATH]
, a scaling matrix
[MATH: S∈R2×
2 :MATH]
for non-linear distortion, and a diffeomorphism term
[MATH: φ :MATH]
, to transform
[MATH: CoordSM :MATH]
to the
[MATH: CoordSM′
:MATH]
within same coordinate system with
[MATH: CoordST :MATH]
:
[MATH: ϕCoordSM′=R ⋅ S⋅φCoordSM+t :MATH]
1
where
[MATH: S :MATH]
is a diagonal matrix constructed from two scaling factor
[MATH: s1
:MATH]
and
[MATH: s2
:MATH]
:
[MATH: S=s100s2 :MATH]
2
As ST and SM are often measured on adjacent or serial sections from the
same tissue sample, and the outlines from each slide are similar. We
transform
[MATH: ImageST :MATH]
into 2-dimensional point set with the same coordinate system as
[MATH: CoordST :MATH]
, as registration between
[MATH: ImageST :MATH]
and
[MATH: CoordST :MATH]
is provided by common ST sequencing platforms. As
[MATH: ImageSM :MATH]
is often unavailable, we use the outline from the SM spot coordinates
[MATH: CoordSM :MATH]
using the concave hull algorithm to represent the shapes of the SM
tissue sections. To model the transcriptomic features and metabolomic
features, we use two independent variational autoencoders to learn the
latent representations ST (
[MATH: LatentST∈RL
:MATH]
) and SM (
[MATH: LatentSM∈RL
:MATH]
) at spot level, and then rasterized into the image space as described
in STalign^[277]49.
Optionally, a linear projection layer
[MATH: FSM→Image:RL
→R3
:MATH]
was trained to learn the relationship between histological features and
metabolomic features. A linear projection layer
[MATH: FSM→ST:RL
→RL
:MATH]
was trained to learn the relationship between transcriptomic features
and metabolomic features.
Our alignment module computes
[MATH: A :MATH]
,
[MATH: φ :MATH]
, and
[MATH: S :MATH]
by adopting the stochastic gradient descent of Large Deformation
Diffeomorphic Metric Mapping adapted from STalign^[278]49, with
modifications on the default weight parameters and optimization
objective functions:
[MATH: Lalignment=
λ1×Lpoint+λ2×Lhisto+
λ3×Lfeatures :MATH]
3
Where
[MATH:
λ1(<
/mo>default=1)<
/mrow> :MATH]
,
[MATH: λ2
:MATH]
(default = 0), and
[MATH: λ3
:MATH]
(default = 0) are weight parameters, and
[MATH: Lpoint=PointDistanceAlphaShapeCoordST,AlphaShapeImageST :MATH]
4
[MATH: Lhisto=MSEImageST,FSM→Image(LatentSM) :MATH]
5
[MATH: Lfea<
mi>tures=MSELatentST,FSM→ST(<
mrow>LatentSM) :MATH]
6
are objective functions for different measurement.
[MATH: MSE :MATH]
calculates the mean squared error between two sets of latent
representation.
[MATH: PointDistance(P
,Q) :MATH]
calculates the pairwise Euclidean distance between two sets of points
[MATH: P={p1,p2,⋯,pn}
:MATH]
and
[MATH: Q={q1,q2,⋯,qm}
:MATH]
. First, for each point
[MATH: pi∈P :MATH]
, we find the closest point
[MATH: qj∈Q :MATH]
, and vise versa. Let
[MATH:
NN(i)
mrow>=j :MATH]
denote the index of closest point in
[MATH: Q :MATH]
to
[MATH: P :MATH]
, and
[MATH:
NN(j)
mrow>=i :MATH]
denote the index of closest point in
[MATH: P :MATH]
to
[MATH: Q :MATH]
. Then,
[MATH: PointDistanceP,Q=1N∑i=1Ndistpi,qNNi⋅wI
+∑j=1Mdistqj,pNNj⋅wJ
:MATH]
7
where
[MATH: wI
:MATH]
and
[MATH: wJ
:MATH]
are the weights for the two sets of points, and
[MATH: dist :MATH]
calculates the Euclidean distance between two points.
[MATH: AlphaShape(P) :MATH]
finds the edge of a set of points
[MATH: P={p1,p2,⋯,pn
mrow>} :MATH]
, where each point
[MATH: pi=xi
,yi
:MATH]
is a point in a 2D plane, the goal is to find the concave hull of the
points using Delaunay triangulation and edge detection based on the
circumradius of the triangles. The Delaunay triangulation divides
[MATH: P :MATH]
into a set of triangles
[MATH:
T={t
1,t2,⋯,tk} :MATH]
, and each triangle
[MATH: tk
:MATH]
has three vertices
[MATH:
tk=(va,vb<
/msub>,vc) :MATH]
. The circumradius
[MATH:
Rk=abc4A
mi> :MATH]
, where
[MATH: a,b,c :MATH]
are side lengths and
[MATH: A :MATH]
is the area of the triangle. The triangle edges would be considered as
bounardy edges if
[MATH:
Rk>α :MATH]
, where
[MATH: α :MATH]
(default = 1) is a user-defined parameter. Vertices from boundry edges
are used as outline spots.
In addition, if there is a significant difference between the SM and ST
data positions, the SM image needs to be manually flipped and rotated
by a large angle to ensure more effective alignment before using the
automatic SpatialMETA Alignment Module.
Reassignment between ST and SM
After aligning the ST and SM data, the spatial positions of the ST
spots are extracted and used as the new reassigned SM coordinates.
Then, we perform the KNN calculation on the raw SM data to establish a
correspondence between the new SM spots and the raw SM data. The
process is detailed as follows:
1. KNN Calculation: For each new SM spatial position (aligned with the
ST spots), the
[MATH: K :MATH]
nearest raw SM spots are identified based on their Euclidean
distance, with a default setting of
[MATH: K=5 :MATH]
. Users can adjust this value as needed for their specific
analysis. An increased
[MATH: K :MATH]
value results in each new SM spot being derived from a larger
number of neighboring raw SM spots, which in turn produces a
smoother distribution of the MIM in the newly generated SM data.
2. Distance Threshold: To ensure that only nearby SM spots are
included in the KNN calculation, a distance threshold is applied,
filtering out those too far from the reassigned SM spot. This
threshold is determined as the product of
[MATH:
dmin
:MATH]
and
[MATH: fdist :MATH]
, where
[MATH:
dmin
:MATH]
represents the minimum inter-spot distance in the ST data, computed
via the “calculate_min_dist” function (recommended) or manually
defined by the user.
[MATH: fdist :MATH]
is a scaling factor with a default value of 1.5, which can also be
user-defined.
3. Average Calculation: After identifying the
[MATH: K :MATH]
nearest raw SM spots, the MIM for the new SM spot is computed by
averaging the values of the
[MATH: K :MATH]
nearest raw SM spots.
Vertical and horizontal integration among ST and SM
The vertical and horizontal integration of ST and SM in SpatialMETA is
based on a CVAE. We denote the gene expression counts of each spot as
[MATH: Xst={X1st,…,<
/mo>Xnst}∈RG
:MATH]
, where
[MATH: G :MATH]
is the number of selected SVGs and n is the number of spots. Similarly,
the metabolite intensity values of each spot are denoted as
[MATH: Xsm={X1sm,…,<
/mo>Xnsm}∈RM
:MATH]
, with
[MATH: M :MATH]
indicating the number of selected SVMs and n is the number of spots.
Encoding
SpatialMETA utilizes two batch-invariant encoders,
[MATH: Fencoderst :MATH]
for ST data and
[MATH: Fencodersm :MATH]
for SM data, to obtain their respective embeddings:
[MATH: qst=FencoderstXst :MATH]
8
[MATH: qsm=FencodersmXsm :MATH]
9
Vertical integration
In the vertical integration process, the ST and SM representation
[MATH: qst :MATH]
and
[MATH: qsm :MATH]
are concatenated to form the joint representation
[MATH: qjoint :MATH]
. The joint representation is passed through a Multivariate Normal
distribution
[MATH: N :MATH]
, where the mean (
[MATH: μjoint :MATH]
) and variance (
[MATH: σjoint2 :MATH]
) are computed as follows:
[MATH: μjoint=Fmeanqjoint :MATH]
10
[MATH: σjoint=Fexpqjoint+ϵ :MATH]
11
[MATH: zjoint=Nμjoint,σjoint2I :MATH]
12
where
[MATH: ϵ :MATH]
is typically a small number (default is 1e-4) added to prevent
[MATH: σ :MATH]
from being zero.
Latent embeddings for each modality
Additionally, the means of the individual ST and SM embeddings are
computed as:
[MATH: μst=Fmeanqst :MATH]
13
[MATH: μsm=Fmeanqsm :MATH]
14
Decoding for joint embedding
To decode the joint embedding
[MATH: zjoint :MATH]
, SpatialMETA employs two batch-variant neural networks:
[MATH: Fdecoderst :MATH]
serves as the ST decoder, and
[MATH: Fdecodersm :MATH]
serves as the SM decoder. These decoders reconstruct the respective
data for ST and SM as follows:
[MATH: rmeanst,<
mi mathvariant="bold">rvarst,<
mi mathvariant="bold">rgatest=Fdecoderstzjoint,B :MATH]
15
[MATH: rmeansm,<
mi mathvariant="bold-italic">rvarsm=Fdecodersmzjoint,B :MATH]
16
where
[MATH: B :MATH]
represents the batch information, which could be samples, studies and
sequencing platforms. When SpatialMETA is applied to a single sample
for vertical integration only,
[MATH: B :MATH]
is set to None. When SpatialMETA is applied to multiple samples for
both vertical and horizontal integration,
[MATH: B :MATH]
is included to account for batch effects.
Batch embeddings
The batch embeddings are computed as:
[MATH: FembB=femb1,…,fembH :MATH]
17
where the batch index for each spot is denoted as
[MATH: B={B1,…,Bn}∈<
/mo>RH
:MATH]
, with
[MATH: H :MATH]
representing the number of batch levels and
[MATH: n :MATH]
representing the number of spots. Each batch index
[MATH: Bn :MATH]
is defined as
[MATH: Bn={bn,1,…,bn,H} :MATH]
. These batch embeddings are trainable parameters optimized through the
reconstruction loss to capture batch-specific gene expression and
metabolite intensity patterns, thereby enabling batch correction. In
this study, only a single batch level, “sample” was used due to the
limited number of samples. However, for datasets integrating samples
from different projects, multiple batch levels (e.g., [“sample”,
“project”]) can be incorporated to achieve more comprehensive batch
correction.
Decoding for ST and SM embedding
For the corresponding ST and SM reconstruction, the shared decoder
[MATH: Fdecoderst :MATH]
and
[MATH: Fdecodersm :MATH]
are used as follows:
[MATH: rmeanstcor,<
msubsup>rvarstcor,<
msubsup>rgatestcor=Fdecoderstμst,B :MATH]
18
[MATH: rmeansmcor,<
msubsup>rvarsmcor=Fdecodersmμsm,B :MATH]
19
ST and SM contribution calculation
To assess the influence of each modality on the joint embedding for
each spot, the contribution of input
[MATH: Xst :MATH]
and
[MATH: Xsm :MATH]
to joint embedding
[MATH: zjoint :MATH]
was determined by calculating the the angular distances as the
following:
[MATH: θsti=μsti⋅zjointi∣μsti*∣zjointi :MATH]
20
[MATH: θsmi=μsmi⋅zjointi∣μsmi*∣zjointi :MATH]
21
where
[MATH:
i=1,…,n
:MATH]
,
[MATH: n :MATH]
is the number of spots. The final modality contribution is defined as
the difference between
[MATH: θsti
:MATH]
and
[MATH: θsmi
:MATH]
shifted by the factor 0.5 to scale the values between 0 and 1.
ST and SM decoder outputs
The ST decoder outputs the mean (
[MATH: rmeanst :MATH]
and
[MATH: rmeanst_cor :MATH]
), variance (
[MATH: rvarst :MATH]
and
[MATH: rvarst_cor :MATH]
), and dropout probability (
[MATH: rgatest :MATH]
and
[MATH: rgatest_cor :MATH]
) of the gene expression counts to parameterize the zero-inflated
negative-binomial distribution (ZINB) and model the raw GEM. Meanwhile,
the SM decoder outputs the mean (
[MATH: rmeansm :MATH]
and
[MATH: rmeansm_cor :MATH]
), and variance (
[MATH: rvarsm :MATH]
and
[MATH: rvarsm_cor :MATH]
) of the metabolite intensity values to parameterize the Gaussian
distribution and model the raw MIM.
Total loss function in SpatialMETA
Let
[MATH: LreconsmXi<
mrow>sm
:MATH]
denote the reconstruction loss for SM from
[MATH: zjoint :MATH]
and
[MATH: LreconstXi<
mrow>st
:MATH]
denote the reconstruction loss for ST from
[MATH: zjoint :MATH]
. Similarly,
[MATH: LreconsmcorXi<
mrow>sm
:MATH]
represents the reconstruction loss for SM from
[MATH: μsm :MATH]
, and
[MATH: Lreconstcor(
Xist)
:MATH]
represents the reconstruction loss for ST from
[MATH: μst :MATH]
. Additionally,
[MATH: LMMD(
qi)
:MATH]
denotes the MMD to ensure comparable joint embeddings across samples,
as defined in scArches^[279]47.
[MATH: LKL(<
mrow>qi)
:MATH]
represents the KL-divergence loss for the variational distribution. The
total loss function for SpatialMETA combines the total reconstruction
losses for ST and SM, the MMD loss, and the KL-divergence loss. It is
expressed as:
[MATH: Ltotal=1N∑i=1
NLreconsmXi<
mrow>sm+LreconsmcorXi<
mrow>sm+LreconstXi<
mrow>st+LreconstcorXi<
mrow>st+LMMDqi+LKLqi :MATH]
22
where
[MATH:
i=1,…,N
:MATH]
,
[MATH: N :MATH]
is the number of spots. The weights for each loss component can be
defined by the user to emphasize specific objectives.
Horizontal integration for SM data
Similar to the horizontal integration among ST and SM, we denote the
metabolite intensity values of each spot as
[MATH: Xsm={X1sm,…,<
/mo>Xnsm}∈RM
:MATH]
, where
[MATH: M :MATH]
represents the selected SVMs and
[MATH: n :MATH]
representing the number of spots. If batch information is available,
the batch index of each spot is annotated as
[MATH: B={B1,…,Bn}∈RH
:MATH]
, where
[MATH: H :MATH]
denotes the number of batch levels, and
[MATH: Bn=bn,1,…,bn,H :MATH]
} with
[MATH: H :MATH]
being the number of batch levels.
Then, a batch-invariant neural network
[MATH: FencodersmXsm→(qsm)
:MATH]
serving as the encoder for SM. Additionally, a batch-variant neural
network
[MATH: Fdecodersmz,B→(rmeansm,<
mi mathvariant="bold">rvarsm)
:MATH]
acts as the SM decoder. The horizontal integration process mirrors the
aforementioned approach, with the SM decoder outputting the mean (
[MATH: rmeansm :MATH]
), and variance (
[MATH: rvarsm :MATH]
) of the metabolite intensity values to parameterize the Gaussian
distribution and model the raw MIM.
Moreover, the total loss function, which incorporates the average
reconstruction loss for SM, the MMD loss, and the average KL-divergence
loss, is expressed as:
[MATH: Ltotalsm=1N∑i=1
NLreconsmXi<
mrow>sm+LMMDqi+LKLqi :MATH]
23
where
[MATH:
i=1,…,N
:MATH]
,
[MATH: N :MATH]
is the number of spots. The weights for each loss component can be
defined by the user to emphasize specific objectives.
Benchmark metrics calculation
For the benchmarking of vertical integration on individual samples, we
categorized the metrics into four components: (1) Reconstruction
accuracy, (2) Continuity scores, and (3) Marker scores, (4) Biology
conservation.
The overall score for vertical integration was calculated as the mean
of the four component scores:
[MATH: Overallscore=Reconstructionaccuracy+Continuityscore+ Markerscore+ Biologyconservation4 :MATH]
24
1. Reconstruction accuracy: To evaluate the reconstruction accuracy of
SpatialMETA, two quantitative metrics were utilized: Pearson
Correlation Coefficient (PCC) and Cosine Similarity (CS). These
metrics were computed between the original data matrix
[MATH: Xoriginal :MATH]
and
[MATH: Xreconstructed :MATH]
for each variable (e.g., gene expression or metabolite intensity)
independently. The Reconstruction Accuracy score was defined as the
mean of PCC and CS.
[MATH: Reconstructionaccuracy=PCC+CS2
:MATH]
25
Pearson Correlation Coefficient (PCC)
The PCC was calculated for each feature
[MATH: i :MATH]
as follows:
[MATH: PCCi=
Covxioriginal,<
mrow>xireconstructedσxioriginal⋅σxireconstructed :MATH]
26
[MATH: MeanPCC=1N∑i=1
NPCCi
:MATH]
27
Where
[MATH: xioriginal :MATH]
and
[MATH: xireconstructed :MATH]
represent the
[MATH: i :MATH]
-th feature of
[MATH: Xoriginal :MATH]
and
[MATH: Xreconstructed :MATH]
, repectively.
[MATH: Cov :MATH]
denotes covariance, and
[MATH: σ :MATH]
denotes the standard deviation, and
[MATH:
i=1,…,N
:MATH]
, where
[MATH: N :MATH]
is the number of features.
Cosine similarity (CS)
The cosine similarity for each feature
[MATH: i :MATH]
was calculated as:
[MATH: CSi=<
/mo>xioriginal⋅
xireconstructed∣xioriginal∣*
∣xireconstructed∣ :MATH]
28
[MATH: MeanCS=1N∑i=1NCSi :MATH]
29
Where
[MATH: xioriginal :MATH]
and
[MATH: xireconstructed :MATH]
represent the
[MATH: i :MATH]
-th feature of
[MATH: Xoriginal :MATH]
and
[MATH: Xreconstructed :MATH]
, repectively.
[MATH:
i=1,…,N
:MATH]
, where
[MATH: N :MATH]
is the number of features.
* (2).
Continuity scores: These metrics assess the smoothness and
consistency of the data embedding. This is evaluated using CHAOS
and PAS, as defined in SDMBench^[280]46. Since lower CHAOS and PAS
scores indicate better performance, they were transformed as
[MATH: 1−CHAOS
:MATH]
or
[MATH: 1−PAS :MATH]
before aggregation. The overall continuity score was then
calculated as the mean of the transformed scores:
[MATH: Continuityscore=1−CHAOS+1−PAS2
mrow> :MATH]
30
* (3).
Marker scores: Marker scores evaluate the accuracy of modality
fusion and the preservation of biological spatial patterns. This is
evaluated using Moran’s I and Geary’s C, as defined in
SDMBench^[281]46, along with Specificity, Logistic, and Mutual
Information(MI) scores. Similarly, Geary’s C was transformed as
[MATH: 1−Geary′sC :MATH]
. The marker score was calculated as:
[MATH: Markerscore=Moran′sI+1−Geary′sC+Specificity+Logistic+MI5 :MATH]
31
Specificity score
The specificity score quantifies how specific the expression of a gene,
or the intensity of a metabolite is specific to particular spatial
clusters within a dataset. The calculation involves the following
steps:
1. Feature filtering: To mitigate the influence of lowly expressed or
sparse features, an initial filtering step was applied. Features
with an excessively high proportion of zero expression across all
spatial spots were excluded. Specifically, features were excluded
if their proportion of zero values exceeded a predefined threshold
(default: 99%, i.e., features with zero expression in ≥99% of
spots). To ensure robustness, filtering was performed at multiple
zero-percentage thresholds (100%, 99%, 98%, 95%), and the final
specificity score was computed as the mean across these thresholds.
ST and SM datasets were processed independently.
2. Specificity score calculation: For each feature
[MATH: i :MATH]
, its specificity to a spatial cluster
[MATH: c :MATH]
is calculated as:
[MATH: Specificityi=MeanExpressioni,cMeanExpressioni :MATH]
32
[MATH: Specificity
Score=1N∑i=1Nma
xcSpecificityg,c :MATH]
33
where
[MATH: Mean
Expressioni,c :MATH]
is the average expression of feature
[MATH: i :MATH]
in spatial cluster
[MATH: c :MATH]
, and
[MATH: Mean
Expressioni :MATH]
is the overall average expression of feature
[MATH: i :MATH]
across all spatial clusters and
[MATH: N :MATH]
is the number of features.
Logistic score
An effective spatial clustering method should consider feature
characteristics, enabling accurate prediction of spatial clusters based
on feature expression. The logistic score evaluates this predictive
capability as follows:
1. The dataset is divided into training and testing sets.
2. A logistic regression model is trained on the training data using
feature expression values as input and spatial cluster labels as
the target.
3. The trained model is then used to predict cluster labels on the
testing set (
[MATH:
Xtest :MATH]
).
4. The accuracy of the predictions is calculated as:
[MATH: Accuracy=NumberofcorrectlypredictedclustersTotalnumberofclustersinthetestset :MATH]
33
Mutual information score
The Mutual Information Score measures the dependency between feature
expression levels and spatial cluster labels. The mutual information
(MI) for each feature
[MATH: i :MATH]
is computed as:
[MATH: MIi,c=
∑x,yP<
/mi>x,ylog
mi>Px,yPxPy :MATH]
34
[MATH: MeanMI=1N∑i=1
NMIi,c :MATH]
35
Where
[MATH:
P(x,y) :MATH]
is joint probability distribution of the feature values and cluster
labels, and
[MATH: Px,P(y) :MATH]
are their marginal probabilities, and
[MATH:
i=1,…,N
:MATH]
,
[MATH: N :MATH]
is the number of features.
* (4).
Biology conservation: To assess the conservation of biological
information, metrics such as ARI, NMI, Cell type ASW, Isolated
label silhouette, and Graph cLISI score, as defined in
scIB^[282]45, were utilized. The ground-truth labels used for
evaluation (specified via the “label key” parameter in scIB) were
manually annotated based on histological features identified
through H&E staining (Supplementary Fig. [283]2i). The biology
conservation score was computed as:
[MATH: Biologyconservationscore=ARI+NMI+CelltypeASW+Isolatedlablesilhouette+ GraphcLISI5<
/mn> :MATH]
36
For benchmarking simultaneous vertical and horizontal integration
across samples, we categorized the evaluation metrics into five broad
groups: (1) Reconstruction accuracy, (2) Continuity scores, and (3)
Marker scores, (4) Biology conservation, (5) Batch correction.
The definitions for Reconstruction Accuracy, Continuity Scores, Marker
Scores, and Biological Conservation were identical to those described
in the vertical integration benchmarking (see above). Additional
metrics were introduced to assess Batch Correction performance in the
horizontal integration context.
Batch correction
Batch correction performance was evaluated using metrics including
Batch ASW, Graph iLISI score, and PCR batch score, as defined in
scIB^[284]45. The batch correction was computed as:
[MATH: Batchcorrectionscore=BatchASW+GraphiLISI+PCRbatch3
:MATH]
37
In addition, the cross modality overall scores are calculated:
[MATH: crossmodalityscore=Continuityscore+Markerscore+Reconstruction3<
/mfrac> :MATH]
38
The cross sample overall scores are calculated:
[MATH: crosssamplescore=Biologyconservation+Batchcorrection2 :MATH]
39
The overall scores are calculated:
[MATH: overallscore=crossmodalityscore+crosssamplescore2
:MATH]
40
Metrics from SDMBench and scIB were used with their default scaling
(normalized to [0,1]). The additional metrics defined in this study
(PCC, CS, Specificity, Logistic, and MI) were also scaled to the [0,1]
range across all metrics.
The full Python code of the benchmark metrics and the visualization
Jupyter notebook are available at:
[285]https://github.com/WanluLiuLab/SpatialMETA/tree/master/benchmark.
Detailed benchmark metric values are provided in [286]source data
files.
Extensive analysis and visualization
The joint data of ST and SM in the AnnData format^[287]79. In this
structure, the spots information and annotation are contained with
DataFrames “obs”. Similarly, the genes and metabolites information and
annotation are stored within DataFrames “var”, and the distinction
between genes and metabolites is made using the “var.type” attribute,
which categorizes entries as either ST or SM.
To annotate metabolites, we first reference the Human Metabolome
Database (HMDB)^[288]80. All metabolites from HMDB undergo user-defined
adduct addition to generate new m/z values. A user-defined tolerance
(ppm) is then applied to calculate the m/z range for each metabolite.
Each metabolite is individually searched within its respective m/z
range to identify matching annotations (Supplementary Fig. [289]5f).
To determine the enrichment of the annotated metabolites, we perform a
hypergeometric distribution test. This statistical test assesses
whether the number of annotated metabolites associated with a specific
pathway is greater than expected by chance. The hypergeometric test is
defined by the following formula:
[MATH: Pk;n,K,<
/mo>N=(<
mrow>Kk)
mrow>(N−Kn−k)(Nn)
mrow> :MATH]
, where
[MATH: N :MATH]
is the total number of metabolites,
[MATH: K :MATH]
is the number of metabolites annotated to the pathway of interest,
[MATH: n :MATH]
is the number of annotated metabolites in the study, and
[MATH: k :MATH]
is the number of annotated metabolites in the study that are associated
with the pathway.
The interactive analysis and visualization of SpatialMETA is built by
Python Graphing Library, Plotly Dash. SpatialMETA loads spatial data in
AnnData format along with histology images. Users can interact with the
spatial data through various functionality such as box select, lasso
select, draw trajectory, and draw closed freeform. These interactive
operations are recorded in AnnData object.
Calculation of spatially variable genes and metabolites
The SVGs and SVMs for each sample were calculated using Moran’s I
method, as provided by Squidpy^[290]81. To remove batch-biased SVGs and
SVMs, we first calculated the shared sample numbers for each gene and
metabolite, setting a threshold to ensure they are shared among
multiple samples (“min_samples”, default is 2). For each sample, we
identified batch-biased SVGs and SVMs by calculating the fold change of
each gene or metabolite between one sample and the others. Genes or
metabolites with a fold change greater than the threshold (“min_logfc”,
default is 3) were defined as batch-biased SVGs and SVMs. Additionally,
these genes or metabolites had to be expressed in a certain proportion
of spots within the sample (“min_frac”, default is 0.8). The
batch-biased SVGs and SVMs were removed to facilitate better vertical
integration.
Annotation of spatial clusters
We employed the ccRCC scRNA-seq reference (Supplementary Fig. [291]3a)
for spatial transcriptomic deconvolution using DestVI^[292]50. For the
ST data, we calculated the top 2000 SVGs, and for the scRNA-seq data,
we identified the top 2000 highly variable genes. We then intersected
these gene sets, retaining only the overlapping genes. For the
scRNA-seq data, we used the “CondSCVI” model, while for the ST data, we
utilized the “DestVI.from_rna_model”. Referring to the deconvolution
prediction results, we manually annotated the spatial clusters
according to marker gene expression: Imm (immune clusters; CD8A, CD3D,
CD74), Endo (endothelial clusters; CD34, PECAM1, APLN), Stro (stromal
clusters; COL3A1, ACTA2, COL1A1), and Mal (malignant clusters;
NDUFA4L2, CNDP2, EGFR).
Reporting summary
Further information on research design is available in the [293]Nature
Portfolio Reporting Summary linked to this article.
Supplementary information
[294]Supplementary Information^ (27.3MB, pdf)
[295]41467_2025_63915_MOESM2_ESM.pdf^ (29.7KB, pdf)
Description of Additional Supplementary Files
[296]Supplementary Data 1^ (14.9KB, xlsx)
[297]Supplementary Data 2^ (11KB, xlsx)
[298]Reporting Summary^ (73.2KB, pdf)
[299]Transparent Peer review file^ (9.3MB, pdf)
Source data
[300]Source data Files^ (208.4KB, zip)
Acknowledgements