Abstract
Spatial transcriptomics (ST) technologies allow for comprehensive
characterization of gene expression patterns in the context of tissue
microenvironment. However, accurately identifying domains with spatial
coherence in both gene expression and histology in situ and effectively
integrating data from multi-sample remains challenging. Here, we
propose ResST, a graph self-supervised residual learning model based on
graph neural network and Margin Disparity Discrepancy (MDD) theory.
ResST aggregates gene expression, biological effects, spatial location,
and morphological information to capture nonlinear relationships
between a cell and surrounding cells for spatial domain identification.
Also, ResST integrates multiple ST datasets and aligns latent
embeddings based on MDD theory for correcting batch effects. Results
show that ResST identifies continuous spatial domains at a finer scale
in ten ST datasets acquired with different technologies. Moreover,
ResST efficiently integrated data from multiple tissue sections
vertically or horizontally while correcting batch effects. Overall,
ResST demonstrates exceptional performance in analyzing ST datasets.
Subject terms: Computational models, Machine learning
__________________________________________________________________
A graph self-supervised residual learning framework (ResST) for natural
domain identification and data integration of spatial transcriptomics.
Introduction
The complex structures and functions of multicellularity require the
cooperation of thousands to billions of cells. The gene expressions of
a cell are influenced by the surrounding cell and tissue
microenvironment. Knowledge of the relative locations of different
cells in a tissue is critical for understanding tissue functionality
and pathological changes. Spatial transcriptomics (ST) technologies
provide quantitative gene expression profiling with spatial information
in tissues and have brought unprecedented opportunities for insights
into the molecular organization of tissues. ST technologies can be
mainly classified into two categories. The first category is in situ
hybridization or sequencing-based technologies, such as
seqFISH^[38]1,[39]2, seqFISH+^[40]3, MERFISH^[41]4,[42]5, etc, which
measure the expression levels of hundreds to thousands of genes in
cells within tissue microenvironments. This type of method can achieve
gene expression quantification with high spatial resolution at the
single-cell resolution level. The second category is based on in situ
capturing-based technologies, including ST^[43]6, SLIDE-seq^[44]7,
SLIDE-seqV2^[45]8, Stereoseq^[46]9, 10× Visium^[47]10,[48]11, etc,
which first label spatial spots on histological tissue sections with
unique barcodes and then perform sequencing. These methods can profile
the whole transcriptome information of tissue sections of any size.
In ST studies, a critical step is accurate clustering for the
identification of spatial domains. Traditional non-spatial clustering
methods only utilize gene expression for cell clustering, such as
Kmeans and Louvain’s method, which were extensively employed in
single-cell transcriptomics studies^[49]12. However, these methods
ignore spatial location and morphological information, resulting in
clustering results that are discontinuous in tissue sections. Several
spatial clustering methods have been developed to analyze ST data
derived from tissue samples and interpret the spatial dependence of
genes by integrating spatial location or morphological information. For
example, BayesSpace^[50]13 enables spatial clustering by modeling a
low-dimensional representation of gene expression matrix and
encouraging neighboring spots to belong to the same cluster via a
spatial prior. stLearn^[51]14 extracts features and location
information from histological images and combines gene expression
features from adjacent nodes to normalize gene expression data.
SpaGCN^[52]15 utilizes graph convolutional neural networks to identify
spatial domains and spatially differential genes by combining gene
expression, spatial location, and histology. SEDR^[53]16 embeds spatial
information using deep autoencoders and graph autoencoders to map gene
expression patterns to low-dimensional space. Recently, Cell Clustering
for Spatial Transcriptomics data (CCST)^[54]17 performs cell clustering
by extending an unsupervised node embedding method based on graph
convolutional networks. These existing methods enhance clustering
accuracy by incorporating spatial location or morphological
information. Therefore, the comprehensive utilization of multimodal
data including gene expression, spatial location, and morphological
information, effectively improves the clustering performance. However,
the aforementioned methods do not consider the impact of biological
effects on spatial domain identification. It has been demonstrated that
biological effects are strongly linked to gene expression in tissue
cells owing to gene expression characteristics influenced by biological
effects, such as cell properties, surrounding cells, and tissue
microenvironment^[55]18. Also, the neural network architectures
developed by these deep learning methods mainly rely on linear
principal component analysis (PCA) to extract features from normalized
gene expression and are relatively simple, lacking the ability to learn
more nonlinear information from gene expression, biological effects,
spatial information, and morphological information. It is desirable to
develop efficient computational approach that aggregates gene
expression, biological effects, spatial location, and morphological
information to capture nonlinear relationships between a cell and
surrounding cells for natural domain identification.
Another critical task in ST studies is to integrate data from multiple
sections while correcting batch effects. In order to obtain the
real-world gene expression patterns of tissue organization for fully
understanding tissue functionality and pathological changes, multiple
studies have performed the analysis of ST data from multiple tissue
sections equally distributed along vertical or horizontal axis of the
tissue. Although a variety of data integration methods have been
developed, such as Harmony^[56]19 and Scanorama^[57]20, most of them
are for single-cell datasets without consideration of spatial
information. Recently, PASTE^[58]21 and PRECAST^[59]22 were proposed to
analyze ST datasets with consideration of spatial information. However,
these methods only apply to integrate ST data from multiple tissue
sections uniformly distributed along the anteroposterior axis of the
same area of interest (vertical integration). They also show limited
flexibility for different ST platforms, resulting in poor clustering
performance in broad-spectrum data^[60]23,[61]24. More recently,
SpaGCN^[62]15 and STAGATE^[63]25 can jointly analyze data from multiple
adjacent sections (horizontal integration) containing tissue
compartments within the whole organ, but they are ineffective in
correcting batch effects and inapplicable in integrating data from
multiple tissue sections vertically. Also, most existing methods map
integrated data into low-dimensional space using the principal
components of conventional dimension reduction without considering the
consistent loss functions of dimension reduction, alignment across
samples, or spatial clustering. Therefore, there is a need for a novel
method to integrate ST data from multiple sections horizontally or
vertically while correcting batch effects.
To address the aforementioned challenges, we developed ResST, a graph
self-supervised residual learning model based on graph neural networks
and Margin Disparity Discrepancy (MDD) theory, to perform spatially
natural clustering and integrate data horizontally or vertically while
correcting batch effects. For clustering, ResST constructs gene
expression correlation matrix to represent the impact of biological
effects on gene expression. Then ResST learns the spatial embedding
(residual) and low-dimensional gene representation. Subsequently,
low-dimensional gene concatenates with the residual for composing
latent embedding that captures multimodal information, including gene
expression, biological effects, spatial location, and morphological
information, significantly enhancing clustering accuracy. Moreover,
ResST achieved algorithm design of data integration based on MDD theory
for domain adaptation. MDD is a measurement with rigorous
generalization bounds, tailored to the distribution comparison with the
asymmetric margin loss^[64]26. It promotes the proximity between the
class mean of the two domains for each class and pushes apart the mean
of different classes, significantly improving prediction accuracy in
unsupervised domain adaptation^[65]26,[66]27. In this study, ResST
possesses the ability to integrate ST data from multiple tissue
sections horizontally or vertically. Moreover, ResST aligns the latent
embeddings across multiple tissue sections by incorporating MDD
measurement to correct batch effects.
We extensively tested ResST for the two key tasks on analysis of ST
data from different platforms (e.g., 10× Visium, SLIDE-seqV2^[67]8,
Stereoseq^[68]9, seqFISH^[69]1,[70]2, MERFISH^[71]4,[72]5, 4i^[73]24,
and MIBITOF^[74]23) and compared it with current methods. The results
showed that ResST achieved better clustering performance on the human
dorsolateral prefrontal cortex (DLPFC) dataset than current
state-of-the-art methods. Moreover, ResST identified much more refined
heterogenous subregions in breast cancer ST data and sagittal section
seqFISH data of mouse embryos. Interestingly, ResST effectively reduced
batch effects in the human DLPFC dataset and mouse posterior brain 10×
Visium data, and more clearly identified the dentate gyrus of mouse
posterior brain compared to other mainstream batch processing methods.
In summary, our results demonstrated that ResST has a powerful ability
to accurately identify spatial domains and efficiently integrate data
from multiple tissue samples horizontally or vertically while
correcting batch effects, making it highly scalable for different ST
platforms.
Results
Overview of ResST workflow
We explain the workflow of ResST using in situ capturing-based ST data
as an example, but the model can be easily modified to different ST
platforms. ResST couples gene expression with biological effects,
spatial location, and morphological information to accurately identify
spatial domains (Fig. [75]1a). ResST first constructs an enhanced gene
expression matrix of all spots by combining gene expression, spatial
location, and morphological information, and calculates an adjacent
matrix using k-nearest neighbors algorithm based on spatial coordinates
(Fig. [76]1b). Then, ResST feeds enhanced gene expression matrix into
encoder to generate low-dimensional gene representation. Next, ResST
utilizes two graph convolutional layers to map low-dimensional gene
representation into spatial embedding (residual). Subsequently,
low-dimensional gene representation from the encoder is mapped through
identity shortcuts and concatenates with the residual for composing
latent embedding. Finally, the learned latent embedding can be applied
to spatial domain identification and cell type or subtype
identification (Fig. [77]1c). When integrating data from multiple
sections, we design a loss function for domain classifier based on MDD
measurement to calculate loss between domains (tissue sections). ResST
integrates enhanced gene expression matrixes from multiple datasets to
generate latent embeddings, which are fed into a domain classifier
connected by a gradient reversal layer for correcting batch effects
(Fig. [78]1c, d). During the learning process, the domain classifier
can map the source and target domains of different distributions into
the same feature space, which is used to integrate spatial data from
various distributions. In addition, ResST applies domain classifier to
maintain the correspondence between cells and tissue sections,
preserving layer-specific gene expression variations.
Fig. 1. Overview of ResST workflow.
[79]Fig. 1
[80]Open in a new tab
a ResST workflow begins with gene expression, spatial location, and
morphological information in ST data. b ResST constructs an enhanced
gene expression matrix from morphological information, gene expression,
and spatial location. ResST initially segments the histological image
based on the coordinate of each spot and extracts the image features by
a pre-trained deep learning model. The morphological similarity matrix
is calculated to assess the influence of morphological information on
clustering based on the image features. The gene expression correlation
matrix is calculated to assess the influence of biological effects on
clustering. The spatial correlation matrix is calculated to quantify
the influence of spatial information on determining similar cellular
states. c ResST is a graph self-supervised residual learning model
based on graph neural network and Margin Disparity Discrepancy (MDD)
theory. The deep autoencoder consists of an encoder, a graph
autoencoder, and a decoder. d ResST can integrate ST data from
different sections while correcting batch effects and accurately
assigning spots to the correct spatial domains. N represents the number
of spots, while M[1] and M[2] correspond to the dimensions of the final
layer of encoder and the final layer of GNN, respectively.
ResST exhibits superiority in identifying spatial domains in the human
dorsolateral prefrontal cortex and mouse brain data
We first assessed the superiority of ResST’s spatial clustering
performance on the human DLPFC^[81]20 dataset with 12 sections from
three distinct individuals acquired with 10× Visium. It includes
cortical layers (L1-L6) and white matter (WM) labeled with marker genes
and histological annotations of tissue structure, which enabled
evaluation of the accuracy of spatial domain detected by each method.
The spatial domain identified by these methods for the representative
tissue section 151673 (3639 locations and 33538 genes) was displayed in
Fig. [82]2b. The visual comparison demonstrated that Kmeans spatial
clustering had the poorest performance with mixed cortical layers
(L1-L6). Seurat and stLearn recovered part of the cortical layers and
WM, but they did not achieve clean separation and clear boundaries of
the cortical layers (L1-L6). SEDR and CCST could identify well-refined
layers by utilizing histological information or spatial location for
spatial clustering, but they only divided cortical layers into five
layers and most of the layers were inconsistent with the annotations.
SpaGCN and BayesSpace accurately delineated all the cortical layers
(L1-L6) and WM, but they generated incorrect layer thickness of L1 and
L2. ResST delineated finer-grained layers by integrating multimodal
information including gene expression, biological effects, spatial
location, and morphological information, and distinguished all the
cortical layers (L1-L6) and WM that were most consistent with manual
annotations of DLPFC and neuroscientific definitions of cortical
layers.
Fig. 2. ResST exhibits superiority in identifying spatial domains.
[83]Fig. 2
[84]Open in a new tab
a Manually annotated layer structure for slice 151673 from ref.
^[85]57. b Spatial domains identified by ResST and existing
state-of-the-art algorithms (Kmeans, Seurat, SpaGCN, stLearn, SEDR,
CCST, and BayesSpace). c Boxplot of the performance of ResST and other
algorithms for all 12 DLPFCs. The x-axis shows ARI and FMI, which were
used to compare the similarity of the predicted spatial layers and the
manually annotated layers for each algorithm. d UMAP visualizations and
PAGA graphs were generated for slide 151673 with Seurat-derived
principal components (left) and ResST-derived embeddings (right). e
Clustering results of mouse brain coronal region (top); The
corresponding anatomical Allen Mouse Brain Atlas (bottom,
[86]https://atlas.brain-map.org/). f Clustering results of mouse brain
sagittal anterior section (section 2); The corresponding anatomical
Allen Mouse Brain Atlas (bottom). g Clustering results of mouse brain
sagittal posterior section (section 1); The corresponding anatomical
Allen Mouse Brain Atlas (bottom).
We further evaluated the quality of clustering performances for each
method using external measures, namely the Adjusted Rand Index
(ARI)^[87]28 and the Fowlkes-Mallows Index (FMI)^[88]29. ResST achieved
much higher ARI and FMI than the current best method (BayesSpace). The
mean ARI was 0.694 ± 0.062 for ResST and 0.526 ± 0.128 for BayesSpace
(Wilcoxon test, p = 0.0007; Supplementary Table [89]1). The FMI was
0.752 ± 0.064 for ResST and 0.635 ± 0.099 for BayesSpace (Wilcoxon
test, p = 0.0061; Supplementary Table [90]2). The mean ARI was lower
than 0.426 ± 0.094 for other methods. ResST achieved the highest
clustering accuracy in tissue section 151509 (ARI = 0.792;
Supplementary Fig. [91]3) and lower variability in performance across
the slices by considering complex global interactions than most of the
other methods (Fig. [92]2c and Supplementary Figs. [93]1–[94]11).
Moreover, compared to Seurat, ResST provided more orderly cortical
layers from L1 to L6 and WM illustrated by uniform manifold
approximation and projection (UMAP) and partition-based graph
abstraction (PAGA) (Fig. [95]2d). Therefore, ResST exhibited the most
outstanding performance in DLPFC dataset.
Also, we applied ResST to 10× Visium data from the sagittal posterior,
sagittal anterior, and coronal regions of the mouse brain with highly
intricate tissue architectures. Although Kmeans was able to outline the
major anatomical regions, many clusters were intermixed. Seurat,
stLearn, and SpaGCN managed to achieve more distinct clusters, but the
boundaries between these clusters remained ambiguous. CCST generated
clean separation and clear boundaries, but clusters were inconsistent
with histological images (Supplementary Figs. [96]12, [97]13). ResST,
BayesSpace, and SEDR showed outstanding clustering performance and
clearly identified tissue structures closer to histological images.
Moreover, only ResST revealed finer-grained tissue structures on all
three areas of mouse brain ST data, including the cornu ammonia and
dentate gyrus in the hippocampus of mouse brain (Fig. [98]2e), the
layers 2/3, 4/5, and 6 of the cortex in the sagittal anterior (Fig.
[99]2f and Supplementary Figs. [100]12, [101]13), as well as the
cerebellar cortex and dorsal horn of hippocampus region in the sagittal
posterior (Fig. [102]2g and Supplementary Figs. [103]14, [104]15).
We quantified the clustering performance with internal measures of the
Silhouette Coefficient (SC)^[105]30, Calinski-Harabasz Index
(CH)^[106]31, and Davies-Bouldin index (DB)^[107]32, which is only
generated from SEDR, CCST, and stLearn, and cannot be obtained from
other methods because of no embeddings. Clustering analysis of the
sagittal posterior data showed that ResST had higher SC (SC = 0.122)
and CH (CH = 137.031), and lower DB (DB = 1.920) than those of SEDR,
CCST, and stLearn (Fig. [108]2g and Supplementary Fig. [109]14).
Similarly, ResST achieved better clustering performance than most of
the test methods on the coronal and sagittal anterior data
(Supplementary Figs. [110]12, [111]13). These results demonstrated that
ResST exhibited superiority in identifying spatial domains by
considering gene expression, biological effects, spatial location, and
morphological information in the human DLPFC and mouse brain data.
ResST reveals finer-grained tissue structures for improved heterogeneity
analysis in breast cancer data
We assessed the clustering performance of ResST on the 10× Visium
dataset of human breast cancer (invasive ductal carcinoma). This
dataset comprises 3789 spatial spots and 36601 genes, manually
annotated by pathologists based on H&E images and reported breast
cancer marker genes with spatial expression characteristics (Fig.
[112]3a). Here, we tested the ability of ResST, compared to other
competing methods, to recover the detailed pathological structures.
Among all the methods, ResST achieved the highest ARI (ARI = 0.728),
followed by BayesSpace (ARI = 0.668), while the remaining methods had
ARI values below 0.56 (Fig. [113]3b, Supplementary Fig. [114]16). The
visual comparison clearly showed that the computed domains from Kmeans,
Seurat, stLearn, and SpaGCN were fragmented and discontinuous, while
SEDR, CCST, BayesSpace, and ResST produced more jointed domains. Kmeans
was only able to recover the Healthy 1 domain with the remaining
domains mixed among the others, indicating the poorest performance. The
remaining methods achieved improved spatial domain identifying, yet the
boundaries between domains performed by Seurat, stLearn, and SpaGCN
appeared ragged and lacked clean separation. Although CCST and
BayesSpace performed better, they inaccurately divided Health 1 region
into multiple sub-regions which also existed in Seurat and stLearn.
SEDR exhibited clear boundaries, but the reconstructed tissue
structures did not align perfectly with the manual annotations. Only
Seurat, BayesSpace, and ResST recognized the sub-domain (domain 13)
within domain 2. Furthermore, ResST was the sole method that overcame
all the aforementioned limitations that existed in other methods,
producing the domains that were closest to the annotations, and
achieving finer-grained pathological structures in cancer slices. Based
on the clustering results of ResST, we delved deeper into the
significant differentially expressed genes (DEGs) between domain 2 and
its sub-domain, domain 13. We conducted differential expression
analysis on domains 2 and 13 and detected 349 significant DEGs
(log[2]FC ≥ 2; p.adjust < 0.05) between the two domains. Some of these
DEGs, such as PTGES and VTCN1, were significantly overexpressed in
domain 13 and could distinguish domains 2 and 13 (Fig. [115]3c,
Supplementary Fig. [116]17, and Supplementary Data [117]1). It has been
demonstrated that several identified DEGs were regarded as potential
targets in cancer therapy. For example, VTCN1^[118]33 is associated
with immunosuppression, and PTGES^[119]34 plays an important role in
inflammation, immune regulation, cellular proliferation, and apoptosis
(Fig. [120]3c). These results suggested that ResST displayed superior
spatial clustering performance and provided more important DEGs for
understanding cancer pathogenesis and uncovering therapeutic targets.
Fig. 3. ResST identified spatial domains in human breast cancer at a finer
level.
[121]Fig. 3
[122]Open in a new tab
a 10× Visium ST data of a breast cancer sample annotated by
pathologists containing invasive ductal carcinoma (IDC), ductal/lobular
carcinoma (DCIS/LCIS), tumor edge, and healthy regions. b Spatial
domains identified by ResST, Seurat, and BayesSpace. c Volcano graph of
DEGs between domains 2 and 13 (left); Gene expression with regional
annotation and violin plots of PTGES and VTCN1 (right). d Heatmap of
Pearson correlation coefficient among domains. e LYZ and SLITRK6
express differentially between domains 1 and 13. f Cell type
annotations from Seurat (left); Domains with highest ligand-receptor
interaction activity (LR_score < 7000) (middle); Spatial location of
domains 5, 7, and 13 (right). g Gene ontology enrichment analysis of
the DEGs between domains 1 and 13. h H&E of human breast cancer sample
annotated by Agoko’s telepathology platform (left); Spatial domains
identified by ResST, SEDR, CCST, and stLearn (right).
To further explore the spatial heterogeneity within tumor tissues, we
conducted domain correlation analysis using the cluster labels and the
recovered pathological structures from ResST (Fig. [123]3d and
Supplementary Data [124]2). We found significant heterogeneity between
region 1 and regions 2, 3, and 13. Region 1 corresponds to DCIS/LCIS,
while regions 2, 3, and 13 correspond to IDC in the breast tissue
slices, respectively (Fig. [125]3d). We compared transcriptional
differences between domain 1 and 13 by performing differential
expression analysis followed by pathway enrichment analysis and
ligand-receptor interactions analysis. A total of 413 significant DEGs
(log[2]FC ≥ 2; p.adjust < 0.05) were detected and the top ten DEGs were
selected in each domain for evaluating the transcriptome heterogeneity
(Supplementary Data [126]3). CRISP3, SLITRK6, C6orf141, VTCN1, ABCC11,
EFHD1, PDLIM1, ECM1, PIP, and PTGES were upregulated in domain 13 and
could be used to accurately distinguish DCIS/LCIS and IDC. CRISP3 is
associated with immune response modulation and tumor microenvironment,
tumor-associated macrophage (TAM) infiltration^[127]35. VTCN1 is a
negative co-stimulatory molecule that inhibits the activation and
proliferation of T cells, which is associated with immune escape, TAM
infiltration, and poor prognosis in various cancers^[128]36. PIP is
linked to immune escape mechanisms in breast cancer^[129]37.
ECM1^[130]38 and PTGES^[131]34 are associated with tumor proliferation,
migration and invasion, chemoresistance, and poor prognosis (Fig.
[132]3e and Supplementary Fig. [133]18). Moreover, ResST identified
SCGB1D2, SCGB2A2, FCGR3B, LYZ, CXCL9, ABHD2, FCGRT, LINC02224, KLHDC7B,
and HLA-DPA1, which were highly expressed in spatial domain 1. Previous
studies have demonstrated that SCGB2A2 and SCGB1D2 may be new
independent prognostic markers in breast cancer^[134]39. FCGR3B is
considered as a marker gene of Natural killer cells^[135]40, and
CXCL9^[136]41, KLHDC7B^[137]42, and LYZ^[138]43 are marker genes of
lymphocytes, indicating abundant immune cell infiltration in domain 1
(Fig. [139]3e and Supplementary Fig. [140]18). We conducted
differential expression analysis for region 1 compared to regions 2 and
3, respectively. The genes highly expressed in region 1 (Supplementary
Data [141]3) are associated with immune response and extracellular
matrix organization. The genes highly expressed in regions 2 and 13
(Supplementary Data [142]3) are related to tumor invasion, immune
evasion, and cytoskeletal remodeling.
Additionally, we observed multiple pathways upregulated in domain 1,
such as phagosome, lymphocyte-mediated immunity, and innate immune
response (Fig. [143]3g and Supplementary Data [144]4). These pathways
participate in anti-tumorigenic or anti-metastatic
effects^[145]44–[146]46, indicating a low potential for tumor
metastasis of DCIS/LCIS, consistent with previous studies^[147]47.
Moreover, ResST performed spatial cluster by considering gene
expression, spatial location, complex biological effects, and
morphological information and identified domains 5, 7, and 15 which was
consistent with locations in the tissue with high ligand-receptor
interaction activity performed by stLearn (Fig. [148]3f). In
conclusion, the results demonstrated ResST was capable of recovering
detailed pathological structures in cancer tissues and provided immense
assistance for exploring the heterogeneity present within tumor
tissues.
We also examined another ST data of human breast cancer (Ductal
Carcinoma in Situ) with manual annotations (Fig. [149]3h). ResST, along
with other methods (Fig. [150]3h and Supplementary Fig. [151]19), was
employed to assess the ability to detect spatial structures. The
clustering results showed that SEDR and stLearn exhibited poorer
performance compared to CCST and ResST, with lower SC and CH and higher
DB. Although ResST (SC = 0.132, DB = 2.433, CH = 71.518) provided
higher DB and lower CH compared to CCST (SC = 0.092, DB = 1.793,
CH = 199.572), ResST exhibited higher SC by performing stronger
coherence within their respective spatial positions and generated clear
boundaries and continuous domains consistent with the annotations which
are similar to those of CCST. The results indicated that ResST enabled
detailed profiles of spatial transcriptome and facilitated the
discovery of tumor heterogeneity.
ResST corrects batch effects for vertical and horizontal integration of
multiple tissue sections
An obstacle to integrating data from multiple sections vertically or
horizontally is the presence of batch effects. We assessed the
remarkable capabilities of ResST in correcting batch effects and
revealing finer-grained tissue structures. This performance was
conducted on the 10× Visium dataset of the posterior sagittal mouse
brain, which comprises two sections along the same vertical axis and
exhibits intricate tissue characteristics. ResST incorporated a
measurement with rigorous generalization bounds, MDD, in the workflow
for integrating data from multiple tissue sections. We compared the
performance of the pipeline integrated without (Method 1) or with
(Method 2) MDD measurement for evaluating the effect of MDD measurement
on spatial clustering results. The results showed that Method 2, ResST
(Fig. [152]4b; SC = 0.161, DB = 1.676, CH = 284.062), provided more
finer-grained clustering results than Method 1 (Fig. [153]4b;
SC = 0.152, DB = 2.054, CH = 145.330). As shown in Fig. [154]4b, we
observed that ResST accurately identified the dorsal and the ventral
horn of the hippocampus regions, which was not identified by Method 1.
Current methods used for data integration, Harmony and Scanorama, only
identified the ventral horn of the hippocampus region. These results
suggested that the spots obtained by ResST retained maximal biological
content from multiple samples and MDD measurement improved spatial
clusters on data integration of multiple tissue sections.
Fig. 4. ResST corrects batch effects for vertical and horizontal data
integration of multiple tissue sections.
[155]Fig. 4
[156]Open in a new tab
a H&E image of posterior mouse brain from the original study. b The
spatial clustering results from Method 2 (with MDD), Method 1 (without
MDD), Harmony, and Scanorama (from left to right). c UMAP plots before
batch effects correction and spatial location of sections 151673 and
151675 in DLPFC. d Vertical integration results with two DLPFC
sections. UMAP plots after batch effects correction and spatial
clustering from ResST, Harmony, and Scanorama (left). Spots are colored
according to the spatial domains identified by the respective
clustering methods (right). e Horizontal integration results from ResST
(top) and SpaGCN (bottom) with two mouse brain samples which consist of
sagittal anterior and posterior regions.
We further evaluated the ability of ResST to correct batch effects on
the DLPFC ST dataset, which originates from a contiguous region within
the tissue obtained along a vertical axis, serving as a benchmark
dataset for evaluating vertical integration. ResST was compared with
Harmony and Scanorama, which are commonly used to overcome batch
effects in single-cell genomics data. We first assessed the integration
results visually with UMAP. There was a slight offset between sections
prior to batch correction (Fig. [157]4c). After integration, the UMAP
plots for ResST, Harmony, and Scanorama showed that spots from
different sections were all well mixed, while the domain clusters were
well segregated. In the subsequent clustering, the spatial domains from
ResST, Harmony, and Scanorama were all highly overlapping, but the
domains from Harmony and Scanorama were only able to recover the WM
with remaining clusters mixed among the other layers (Fig. [158]4d).
Moreover, the visualization demonstrated that ResST more distinctly
revealed the laminar structure of the human cerebral cortex with clean
separations and clear boundaries compared to other methods (Fig.
[159]4d).
We next tested the performance of ResST on data integration and batch
effects correction using a mouse brain 10× Visium dataset. The mouse
brain sections consist of anterior and posterior regions (horizontal
sections) and include two slices for each tissue region (vertical
sections) with complex structural characteristics. On the UMAP plots,
ResST, Harmony, and Scanorama all reduced batch effects and spots from
different sections were all well mixed. Moreover, ResST performed
better and well-segregated clusters, which were not achieved by Harmony
and Scanorama (Supplementary Fig. [160]20). Subsequently, the clusters
from ResST aligned well across the two sections, and ResST generated
clusters that were more consistent with known anatomy (Fig. [161]4a)
than other methods. In contrast, the clusters generated by Harmony and
Scanorama were fragmented and incompletely in agreement with known
anatomy (Supplementary Fig. [162]20). We further compared the internal
measures of ResST with those of Harmony and Scanorama. ResST exhibited
a higher SC and similar DB values compared to Harmony (Fig. [163]4b;
SC = 0.125, DB = 1.705, CH = 426.953) and Scanorama (Fig. [164]4b;
SC = 0.125, DB = 1.509, CH = 675.083), generating greater continuous
spatial domains with lower noise levels. Also, ResST performed better
on two sections from the 10× Visium dataset of the anterior sagittal
mouse brain (Supplementary Figs. [165]12, [166]13). These results
demonstrated that ResST efficiently integrated data from multiple
sections and corrected batch effects, thereby improving the performance
of identification of spatial domain.
Then, we compared SpaGCN and ResST in horizontal integration using two
sections of anterior and posterior from the mouse brain 10× Visium data
and employed the annotated Allen mouse brain Atlas^[167]48 as the
ground truth. The results showed that most of the spatial clusters
identified by SpaGCN and ResST demonstrated remarkable concordance with
known anatomical structures. However, SpaGCN did not capture certain
pivotal regions within the tissue and many clusters obtained by this
method were fragmented and failed to align along the shared edges
between the two tissue sections (Fig. [168]4e). In contrast, ResST
exhibits superior performance and clearly identified the cerebral
cortex, cerebellum, and corpus callosum. Moreover, ResST showcased
enhanced alignment along the edges of both anterior and posterior
sections, leading to outstanding performance on data integration and
batch effects correction.
ResST works well on various spatial omics data independent of platforms
Aside from the 10× Visium platform, we investigated the generalization
ability of ResST on image-based molecular data (seqFISH, MERFISH, 4i,
and MIBITOF) and high-resolution ST data (Stereoseq and SlideseqV2). We
first applied ResST to 4i^[169]24 (iterative indirect
immunofluorescence imaging) data, which measured 40 protein reads
(~270,000 observations/pixel) in high-throughput biological samples
from millimeter to nanometer scales. Here, we only used partial
molecular data for spatial domain identification (Fig. [170]5a and
Supplementary Fig. [171]21a, pixel coordinates (x[1], y[1])- (x[2],
y[2]): (1100, 950)- (1300, 1150)). The results showed that ResST can
depict detailed subcellular distribution in local regions, including
various compartments, organelles, and cell structures within each cell
(Fig. [172]5a). Also, we applied ResST to another image-based molecular
MIBI-TOF dataset^[173]23, which imaged 36 labeled antibodies with
histological staining and endogenous elements (3309 pixels). Similarly,
ResST revealed partially continuous regions and locally fused elements
in four imaging results (Fig. [174]5b and Supplementary Fig. [175]21b),
which was almost consistent with the original annotations^[176]23.
Fig. 5. ResST works well on various spatial omics data.
[177]Fig. 5
[178]Open in a new tab
a Visualization of subcellular molecular maps using 4i (Iterative
Indirect Immunofluorescence Imaging) (top; partial data, coordinates
(x[1], y[1])–(x[2], y[2]): (1100, 950)–(1300, 1150) and spatial domain
detection using ResST (bottom). b Visualization of imaging-based
molecular MIBI-TOF dataset (3309 pixels and 36 proteins) with
annotation (top, the point16 section) and spatial domain detection
using ResST (right, ARI = 0.524). c Visualization and annotation of
SlideqV2 data in the mouse hippocampus (left); spatial domain detection
using ResST (middle); distribution of gene Tmsb4x expression in the
data. d Visualization and annotation of MERFISH data in the mouse
prefrontal hypothalamus (top) and spatial domain detection using ResST
(bottom). e Visualization and annotation of adult Macaque cortex
Stereoseq data (leftmost); spatial domain partition using ResST (left
middle); expression distribution of the marker genes GFAP, GPR83, and
SYT6 in tissue sections of the cortex (right). f Visualization and
annotation of seqFISH data in sagittal slices of mouse embryos (left);
spatial domain partition using ResST (right).
Subsequently, we evaluated the performance of ResST on ST data at an
approximate single-cell resolution. We assessed the ability to recover
the annotated structures of ResST on a mouse hippocampus dataset
acquired with Slid-seqV2^[179]8 (comprising 41,886 subcellular regions
and 4000 genes). ResST was able to produce spatially consistent
clustering and outline the major anatomical regions such as the
“DenatePyramids” and “CA1_CA2_CA3_subiculum”. Interestingly, ResST
divided “CA1_CA2_CA3_subiculum” into two domains (domains 5 and 6)
(Fig. [180]5c) and we detected significant spatial DEGs between domains
5 and 6, such as Tmsb4x. Also, ResST demarcated several subareas,
including the third ventricle (domain 13), medial habenula (domain 12),
and lateral habenula (domain 5), which were not clarified in the
original annotation (Supplementary Fig. [181]21d). Furthermore, ResST
successfully processed MERFISH^[182]4 data from mouse hypothalamus, and
clearly identified both “Ependymal” and “OD Mature” structural domains
(Fig. [183]5d).
We also validated the performance of ResST on adult Macaque cortex
Stereoseq chips^[184]49. ResST accurately identified the L2-L6 regions
in the cortex layer, consistent with histological staining definitions.
In addition, we identified marker genes through a more in-depth
analysis, such as GFAP, GPR83, and SYT6, which were consistent with
spatial clustering results (Fig. [185]5e and Supplementary
Fig. [186]21c). Finally, we validated the generalization ability of
ResST on seqFISH data from sagittal slices of mouse embryos^[187]50.
ResST can accurately identify tissue structures in the brain, heart,
and intestinal tube among the ectoderm, mesoderm, and endoderm,
respectively. Moreover, ResST divided “Forebrain/midbrain/hindbrain”
into finer-grained tissue structures, forebrain, midbrain, and
hindbrain, which were not clarified in the original annotation (Fig.
[188]5f). These results suggested that ResST can make detailed
structural domain divisions for ST data from different platforms.
Discussion
ST technologies are powerful experimental methods for measuring gene
expression while preserving the relevant spatial context. Spatial
clustering and data integration are the most important tasks for
analyzing ST datasets. Previous studies showed that accurate clustering
is dependent on the comprehensive utilization of multimodal data (gene
expression and spatial location) derived from ST technologies and
morphological information^[189]12–[190]17. It has been demonstrated
that gene expression characteristics in tissue cells are related to
biological effects, such as cell properties, surrounding cells, and
tissue microenvironment^[191]18. However, current ST processing tools
ignore the impact of biological effects on spatial clustering.
Moreover, current ST datasets are obtained from a single section or
multiple tissue sections. Integration of these data includes horizontal
or vertical integration. Previous studies have revealed that the batch
effects during data generation processes decrease the accuracy of
downstream analysis and must be removed while preserving the biological
meaning of datasets^[192]51. However, few ST processing tools are
applicable to simultaneously integrate data horizontally and vertically
while remove batch effects. In this study, we developed a novel ST
processing method, ResST, which not only fully utilized gene
expression, biological effects, spatial location, and morphological
information for improving clustering accuracy, but also simultaneously
integrated data horizontally and vertically while removed batch
effects.
Gene expression derived from ST technologies encapsulates complex
biological information including cell properties, surrounding cells,
tissue microenvironment, cell communication, receptor-ligand
interactions, etc, which are biological effects objectively existing in
the tissues and organs^[193]18. Current deep learning methods used for
ST clustering analysis are based on simplistic neural network
architectures^[194]14,[195]15,[196]17. These methods simply aggregate
gene expression, spatial information, or morphological information from
neighboring spots, but they fail to capture more effective multimodal
information, such as biological effects. In this study, ResST computed
the gene expression correlation matrix using gene expression matrix
based on a linear approach. The gene expression correlation matrix, a
n × n matrix, quantified the correlation between cells/spots,
representing biological effects. The clustering performance was
evaluated with or without biological effects on breast cancer datasets.
The results showed that the ARI values significantly increased when
integrating biological effects (Supplementary Fig. [197]22), indicating
that ResST improved spatial domain clustering performance by
integrating biological effects. Previous reports showed that the
residual neural network is a classical convolutional neural network
that introduces a shortcut connection that enhances the extraction of
effective information and effectively prevents network performance
degradation^[198]52,[199]53. In this study, we applied the residual
learning model to fully capture multimodal information, such as
biological effects. The unique residual design of ResST allows the
encoder’s output to be used as input for a graph autoencoder, and
concatenate with the corresponding output of the autoencoder for
generating latent embeddings that capture nonlinear relationships
between a cell and surrounding cells for spatial domain identification.
Therefore, ResST enables accurate and continuous spatial clustering,
bringing ST analysis profiles closer to natural and realistic tissue
structures and achieving more accurate clustering than that of existing
models.
MDD theory, a measurement with rigorous generalization bounds, can
seamlessly be transformed into an adversarial learning algorithm for
domain adaptation^[200]26. This algorithm promotes the proximity
between the class mean of the two domains for each class and pushes
apart the mean of different classes, significantly improving prediction
accuracy in unsupervised domain adaptation^[201]26,[202]27. Notably,
data integration tasks in ST studies include batch effects correction
and clustering of mixed data, which essentially belong to the
classification tasks of domain adaptation. In the present study, ResST
achieved an algorithm design for data integration based on the MDD
theory. The results showed that the designed algorithm can effectively
extract the features of data from different sections and make the
feature distributions of different sections more similar, removing the
batch effects in data integration tasks.
Another advantage of ResST is that it integrates ST data from multiple
sections while preserving layer-specific gene expression variation.
Current data integration methods mix data from multiple batches before
performing downstream analysis^[203]19–[204]22. For example,
PASTE^[205]21 stacks and/or integrates ST data from multiple adjacent
tissue sections into a single slide, failing to capture cellular
biological variations across sections. Based on the MDD theory, ResST
feeds latent embeddings into a domain classifier through a gradient
reversal layer for maintaining the correspondence between cells and
tissue sections, capturing the cellular biological variations across
sections. Therefore, ResST integrates ST data from multiple sections
horizontally or vertically while correcting batch effects and
preserving layer-specific gene expression variation.
We designed ResST to process data acquired from different spatial
transcriptomic platforms. When applied to different complexity levels
of 10× Visium data, ResST detected more continuous spatial domains with
reduced noise levels, which was consistent with manual annotations.
Analysis of breast cancer ST data, ResST offered more reasonable
partitioning of spatial domains and provided a more detailed
delineation of substructural regions within the tissue, giving great
assistance in understanding cancer heterogeneity and molecular
mechanism of cancer occurrence and metastasis. ResST also identified
fine spatial domains for seqFISH data of sagittal sections of mouse
embryos. When analyzing ST data of human DLPFC and sagittal sections of
mouse brain, ResST identified more continuous and accurate spatial
domains by integrating data horizontally and detected biologically
coherent spatial domains that aligned across samples while removing
batch effects by integrating data vertically. Also, ResST not only
performed well on mainstream ST platforms, such as 10× Visium, MERFISH,
and Slide-seq but also showed great potential in other ST data (4i and
MIBI-TOF).
ST provides spatially resolved gene expression of diverse cells in
complex tissues. However, uncovering a more detailed picture of complex
biological processes in tissues remains a challenge due to limited gene
recovery or low spatial resolution in recent ST technologies. The
integration of ST with multi-omics data is an inevitable trend for
comprehensively understanding complex biological processes (the
fine-grained panorama of a heterogeneous tissue/molecular
architecture). ResST offers opportunities for new research issues in
data integration of ST and multi-omics. For instance, the scRNA-seq
data can be applied to deconvolute ST data, generating clearer
cell-type compositions and enabling tissue cartography at single-cell
resolution. Furthermore, when single-cell resolution of ST datasets is
available, they can be directly integrated with multi-omics data. For
example, integrating CITE-seq will deeply differentiate cellular
heterogeneity and accurately explore specific cell types in the spatial
context of tissues, which can help us to investigate the molecular
mechanisms of drug resistance in cancer therapies. Additionally,
combining ST datasets with scNMT-seq data will profile multiple
epigenetic features in conjunction with gene expression, at single-cell
resolution in the spatial context of tissues, enabling a more complete
understanding of epigenetic dependencies and their associations with
transcription and cell states.
Methods
Normalization of gene expression
In ST studies, gene expression and spatial information can be utilized
to identify cells in close spatial positions that exhibit similar
states. The histology image shows a clear difference among tissue
substructures with different functions, suggesting histology is
informative for clustering. The gene expression of each spot can be
enhanced by combining with morphological information and spatial
information, facilitating natural domain identification. Additionally,
because of the strong link between biological effects and gene
expression, we take into account the impact of biological effects on
the clustering. Taking the in situ capturing-based technologies as an
example, the spatial gene expression stores the spatial two-dimensional
coordinates
[MATH:
(x,y)
mrow> :MATH]
of n spots, as well as a raw gene expression matrix
[MATH: GE∈Rn×
d :MATH]
with n spots and d genes. The matrix stores the unique molecular
identifier count (UMI; an n×d matrix of UMI counts) of each gene.
[MATH: GEi∈R1×
d :MATH]
is a raw gene expression vector for spot s[i], where
[MATH:
s∈(s
1,…,si,…,sj,
…,sn<
/mrow>) :MATH]
. ResST constructs enhanced gene expression matrix of all spots by
combining morphological information, gene expression, and spatial
location.
1. Calculation of morphological similarity matrix. To effectively
utilize morphological information, we accurately evaluate the
morphological similarity matrix by measuring the morphological
distance between each spot and others. For the ST data with
morphological information, ResST segments the histological images
based on the spatial coordinates of each spot and employs a
pre-trained convolutional neural network (optional; default
ResNet^[206]52) to extract image features from the segmented
subgraphs as the morphological feature vectors of each spot. This
model can convert each spot image into a 2048-dimensional latent
variable. To better represent the morphological features of each
spot, we apply principal component analysis (PCA) to extract the
first 50 principal components as latent characteristics and
calculated the cumulative explained variance (more than 0.99999).
Cosine distance is used to calculate the morphological similarity
weight
[MATH: MWij<
/msub> :MATH]
between s[i] and s[j], as
[MATH: MW(s<
/mrow>i,sj)=MWij=Mi⋅Mj∣∣Mi∣∣2∣<
mo>∣Mj∣∣2, :MATH]
1
where
[MATH: MW :MATH]
is a n×n morphological similarity matrix,
[MATH: MWij<
/msub> :MATH]
is the morphological similarity weight between s[i] and s[j],
[MATH: Mi
:MATH]
and
[MATH: Mj
:MATH]
represent the latent characteristics after performing PCA on
morphological feature vectors for spots s[i] and s[j],
respectively.
2. Calculation of gene expression correlation matrix. The gene
expression of a cell is influenced by its neighboring cells and
intracellular or extracellular states, suggesting biological
effects are nonnegligible for clustering. We explore biological
effects by calculating the gene expression correlation
[MATH: GWij<
/msub> :MATH]
, using the raw gene expression vectors
[MATH: GEi :MATH]
and
[MATH: GEj :MATH]
from spots s[i] and s[j]. The characteristics of
[MATH: GWij<
/msub> :MATH]
are:
[MATH: GW(s<
/mrow>i,sj)=GWij=GEi−GEi<
mo accent="true">¯⋅GEj−GEj<
mo
accent="true">¯<
mo>∣∣GEi−GEi<
mo
accent="true">¯∣∣2
∣∣GEj−GEj<
mo
accent="true">¯∣∣2
, :MATH]
2
where
[MATH: GW
:MATH]
is a n×n gene expression correlation matrix,
[MATH: GWij<
/msub> :MATH]
is the gene expression correlation between s[i] and s[j],
[MATH: GEi
:MATH]
and
[MATH: GEj
:MATH]
represent the raw gene expression vectors for spots s[i] and s[j].
3. Calculation of spatial correlation matrix. We initially calculate
the Euclidean distance between each spot and all other spots based
on spatial coordinates and normalize the distances for neighboring
spots to quantify the influence of spatial information on
determining similar cellular states. For given spots s[i] and s[j],
if the distance between the two spots is less than radius r, s[i]
and s[j] are considered to be adjacent, and
[MATH: SWij<
/msub>=1 :MATH]
, otherwise
[MATH: SWij<
/msub>=0 :MATH]
. The characteristics of
[MATH: SWij
msub> :MATH]
are:
[MATH: SWij=1,&SWij<r0,&SWij≥r,
:MATH]
3
where
[MATH: SW
:MATH]
is an n×n spatial correlation matrix and the value of
[MATH: SWij<
/msub> :MATH]
represents whether s[j] is a neighboring spot of s[i]. We
calculated Euclidean distance of the coordinate correlations by
establishing a mapping relationship of each sequencing spot’s
position between the experimental platform and the histological
image. The threshold r was obtained by multiplying an adjustment
factor by Euclidean distance to determine whether spots have
spatial correlations.
4. The enhanced gene weight
[MATH: EGWij<
/msub> :MATH]
between s[i] and s[j] is calculated as follows:
[MATH: EGW(s
i,s
mrow>j)=EGWij=SWij⋅MWij⋅GWij,
:MATH]
4
if in 10× Visium, otherwise as:
[MATH: EGW(s
i,s
mrow>j)=EGWij=SWij⋅GWij, :MATH]
5
where
[MATH: EGW
:MATH]
is an n×n enhanced gene weight matrix,
[MATH: EGWij :MATH]
is the enhanced gene weight between s[i] and s[j].
ResST then normalizes gene expression vector
[MATH: GEi′ :MATH]
of each spot s[i], as shown in Eq. ([207]6):
[MATH: GEi′=GEi+∑j=1
mGEj⋅EGWijm :MATH]
6
where
[MATH: GEi′ :MATH]
represents normalized gene expression vector for a center spot s[i],
[MATH: GEi :MATH]
and
[MATH: GEj :MATH]
are the raw gene expression vectors for spot s[i] and m adjacent spots
s[j].
Graph construction for ST data
To make full use of spatial information, we first construct a cell-cell
spatial relationship graph G = (V, E) by calculating the distances
between spots based on spatial location and selecting the k nearest
spots as the current spot’s neighbors. In graph G, V represents the set
of spots and E is the set of connected edges between spots. That is, A
is defined as the adjacency matrix of graph G, and if spot
[MATH:
sj∈V :MATH]
is the neighbor of spot
[MATH:
si∈V :MATH]
,
[MATH: Aij=Aji=1 :MATH]
, otherwise
[MATH: Aij=Aji=0 :MATH]
. Additionally, we add a self-loop to this matrix. For a given spot,
its neighbors are determined by calculating the distance between spots
using spatial locations (optional; default is Euclidean), and we select
k spots as its neighbors from the top nearest neighbors. In our tests,
ResST achieves its best performance in most of the tested datasets with
k = 12 (Supplementary Data [208]5). Finally, we perform normalization
preprocessing on the adjacency matrix, as shown in Eq. ([209]7):
[MATH: A′=D−1/2⋅A⋅D−1/2, :MATH]
7
where
[MATH: A′ :MATH]
is the symmetrically normalized adjacency matrix and
[MATH: D :MATH]
is the degree of adjacency matrix
[MATH: A :MATH]
.
Graph self-supervised residual learning
We propose a graph self-supervised residual learning model based on a
graph neural network to learn the latent embedding from the normalized
gene expression
[MATH: GE′ :MATH]
and the neighborhood graph G. This process comprises three major steps:
(1) data preprocessing, (2) Graph autoencoder for spatial embedding,
and (3) Self-supervised residual learning for latent embedding.
Subsequently, the latent embedding is used for spatial clustering and
multiple ST data integration.
Data preprocessing
In the aforementioned steps, ResST calculated normalized gene
expression using the complete raw gene expression counts. Before model
training, the normalized gene expression matrix
[MATH: GE′ :MATH]
is initially logarithmically transformed and normalized by library size
and scaled to unit variance and zero mean using the Scanpy package. In
this paper, all 10× Visium datasets are subjected to PCA to extract the
top 200 principal components (optional) for generating gene expression
vectors X as input for model training. For datasets obtained from other
platforms, the number of PCA components for datasets with fewer
measured genes is equal to the number of genes, such as 4i, MOBI-TOF,
MERFISH, seqFISH, and stereoseq. For Slide-seqV2 data, which has a
higher number of measured genes, the top 200 principal components were
selected.
To enhance the model’s robustness, we randomly corrupt the gene
expression matrix using a noise function while preserving the original
graph’s topological structure unchanged.
Graph autoencoder for spatial embedding
We devise a graph autoencoder that imbues spatial information from the
cell-cell spatial relationship graph into gene expression vectors
through iterative model training. Graph autoencoder aggregates the
spatial information of neighboring cells or spots by improving the
variational graph autoencoder. Spatial embedding (the residual) is
computed by posterior probabilities and is incorporated into the
learning of latent embedding, as shown in the expression below:
[MATH: qZg∣A,Ze=∏i=1
nqzi
mfenced>A,Ze,
:MATH]
8
the solution for the posterior probability is shown in the equation
below:
[MATH: qzi
mrow>A,Ze=Nzi
mfenced>μi
,diagσi<
mrow>2,
:MATH]
9
where n is the number of cells or spots,
[MATH: A :MATH]
is the adjacency matrix,
[MATH: N :MATH]
is the normal distribution,
[MATH: zi=GNNμ(A,Ze) :MATH]
is the means of adjacency matrix,
[MATH: Ze :MATH]
and
[MATH: Zg :MATH]
are introduced in the section of self-supervised residual learning for
latent embedding,
[MATH:
logσ=GNN
mi>σ(A,Ze) :MATH]
is the variance between the node vectors. The two-layer GNN is defined
as:
[MATH: GNNZe,A=A′ReLUA′ZeWoW1, :MATH]
10
where
[MATH:
GNNμ(Ze,A) :MATH]
and
[MATH:
GNNσ(Ze,A) :MATH]
share the same weight
[MATH: W0 :MATH]
, but
[MATH: W1 :MATH]
is different.
[MATH: A′ :MATH]
is the symmetrically normalized adjacency matrix, as shown in Eq.
([210]7).
Self-supervised residual learning for latent embedding
We propose a deep autoencoder, a graph self-supervised residual
learning model to aggregate gene expression, biological effects,
spatial location, and morphological information for capturing nonlinear
relationships between a spot and surrounding spots. The deep
autoencoder consists of an encoder, a graph autoencoder, and a decoder.
As shown in Fig. [211]1c, the deep autoencoder learns low-dimensional
gene representation by encoder. Next, low-dimensional gene
representation is mapped to spatial embedding (residual) by the graph
autoencoder. Finally, low-dimensional gene representation from the
encoder is mapped through identity shortcuts and concatenates with the
residual for composing latent embedding. Identity shortcuts are never
closed, and low-dimensional gene representation is always passed
through, with additional residual functions to be learned.
We feed the gene expression vector X into the encoder E after data
preprocessing. The encoder
[MATH: E :MATH]
is composed of multiple fully connected layers, which can denoise the
input data and convert the input
[MATH: X∈Rn×
200 :MATH]
into a low-dimensional gene representation
[MATH: Ze∈Rn×He
:MATH]
, where
[MATH: He
:MATH]
= 20 is the dimension of the last encoder layer. The number and
dimension of the fully connected layers can be defined by the user and
can be expressed as:
[MATH: EX=Ze. :MATH]
11
Additionally, the low-dimensional gene representation and the cell-cell
spatial relationship graph G are spatially embedded by two graph neural
network (GNN) layers, resulting in a spatial embedding
[MATH: Zg∈Rn×Hg
:MATH]
, which can be expressed as:
[MATH: GNNZe=Zg, :MATH]
12
where
[MATH: Hg
:MATH]
= 8 is the dimension of the last GNN layer.
Finally, the decoder
[MATH: D :MATH]
is composed of multiple fully connected layers that decode the latent
embedding
[MATH: Z∈Rn×
H :MATH]
and reconstruct the gene expression matrix
[MATH: X′∈Rn×
200 :MATH]
. The latent embedding is composed of the low-dimensional gene
representation and spatial embedding with “shortcut connections”. This
residual learning structure can facilitate the learning of more
nonlinear information from gene expression, biological effects, spatial
information, and morphological information, and it can be expressed as
follows:
[MATH: DZ=X′, :MATH]
13
[MATH:
H=He+Hg<
/msub>, :MATH]
14
where n is the number of cells or spots.
The loss function L includes the reconstruction loss between the
reconstructed gene expression matrix and the original gene expression
matrix, and the Kullback–Leibler divergence of the node representation
vector distribution and the normal distribution, as shown below:
[MATH:
L=1n∑i=1
n∣∣
Xi−X′i∣∣2
+KL[q
(Zg∣A,Ze)∣∣p(Zg)], :MATH]
15
where n,
[MATH: Xi :MATH]
and
[MATH: X′i :MATH]
are the same as mentioned above, q is posterior probability the same as
Eq. ([212]8),
[MATH: p(Zg)=∏i<
/mrow>N(zi∣0,I) :MATH]
.
Training strategies
The training of ResST model involves several key strategies. Initially,
ST data are processed to produce the input matrix X and adjacency
matrix A for the model.
Firstly, we initialize ResST and configure the Adam optimizer along
with specific hyperparameters, such as learning rate and weight decay.
In this study, the Adam optimizer’s learning rate is 0.0005, and weight
decay is 0.0004. During the pre-training phase, we employ a noise
function to augment the input data with a masking ratio of 0.001.
Concurrently, we calculate the loss function, which comprises the
reconstruction error and KL divergence. Following the pre-training
phase, the Kmeans algorithm is utilized to perform initial clustering
on the pre-trained latent embeddings. The model then progresses to the
formal training phase. In this phase, we periodically recalculate the
target distribution and update the clustering labels, while optimize
the model parameters by minimizing the loss function. The loss function
now includes reconstruction error, KL divergence, and adversarial loss.
Within the optimize_cluster function, the CH index is used to assess
clustering performance, and the resolution with the highest CH score is
selected as the optimal resolution. During training, KL divergence is
employed to measure the difference between the predicted distribution
and the target distribution for preventing overfitting.
Throughout the training, we monitor the total loss (including
reconstruction error, KL divergence, and adversarial loss) and stop
training when the total loss reaches a preset tolerance (0.001). Other
hyperparameters include KL divergence weight (0.1), mean squared error
weight (10), and binary cross-entropy weight (1) to balance different
loss terms in the total loss.
Batch effects correction for spatial transcriptomics
Integrated analysis of multiple tissue sections can provide deeper
insights into a comprehensive understanding of biological functions.
Integration of ST datasets is typically categorized into two scenarios:
vertical integration and horizontal integration. The major challenges
for integrated analyses are the presence of batch effects among
different sections and accurately assigning spots to the correct
spatial domains. To overcome these challenges, we extend ResST model,
which applies MDD to correct batch effects and align gene expression
from different batches into the same feature space. Compared to other
methods, ResST efficiently conducts vertical or horizontal integration
from multiple tissue sections while correcting batch effects and
preserving layer-specific gene expression variations.
In ST data from different batches, there are no direct spatial
relationships among spots from different sections. For vertical
integration, we align the spatial locations from ST data based on H&E
images, resulting in adjacent space among different sections.
Subsequently, we construct the cell-cell spatial relationship graph
using the aligned spatial coordinates. For horizontal integration, we
use one of the sections as a reference and transform the spatial
coordinates of the other sections to align the domains straddling the
jointing edges. The model uses A^m to represent the adjacency matrix of
the mth batch and X^m to represent the gene expression matrix of the
mth batch. This study creates a block diagonal matrix
[MATH: A :MATH]
, and concatenates the gene expression matrix
[MATH: X :MATH]
with it, as shown below:
[MATH: A=A1⋯0⋮⋱⋮0⋯Am,X=X1⋮Xm, :MATH]
16
where m is the number of ST batches.
Next, we apply MDD to reduce batch effects and map the gene expression
from different batches to the same feature space, resulting in maximal
spatial proximity of different batches. MDD consists of a feature
extractor, domain classifier, and label classifier. The feature
extractor and label classifier share the encoder and decoder parts of
the deep autoencoder, which are composed of joint linear layers and
graphical neural networks. The domain classifier is composed of joint
linear layers, with a user-defined number of hidden layers (default is
2). A gradient reversal layer (GRL) is added before the domain
classifier to blur the boundary between the source and target domain.
We train the feature extractor and extract shared features from the
source and target domain using Eq. ([213]15). The domain classifier
further maps the latent embeddings Z outputted by the feature extractor
to a new D-dimensional representation:
[MATH: Z→RD
:MATH]
. We train the domain classifier using binary cross-entropy loss
function to maximize the adversarial loss, thus increasing the
dissimilarity between the source and target domain, as shown in the
equation below:
[MATH:
L=−1n
∑i=1
nyi⋅logpyi+1−y<
mi>i⋅log1−pyi, :MATH]
17
where
[MATH: yi
:MATH]
is the sample
[MATH: i :MATH]
, if the prediction label is equal to the true label,
[MATH:
yi=1 :MATH]
, otherwise
[MATH:
yi=0 :MATH]
;
[MATH: pyi :MATH]
represents the probability that the observed sample
[MATH: i :MATH]
predicts correctly.
Spatial domain identification and visualization by clustering
After model training, the Leiden algorithm in the Scanpy^[214]54
package is used to identify the spots into different spatial domains
based on latent embeddings obtained by ResST. ResST achieves the
optimal resolution in two ways. If the number of spatial domains is
known, a grid search is conducted over resolutions ranging from 0.1 to
2.5 with a step size of 0.001. If the number of spatial domains is
unknown, the optimal resolution is determined by conducting a grid
search over resolutions ranging from 0.1 to 2.5 with a step size of
0.001 and simultaneously calculating the CH score. The resolution with
the highest CH score is selected as the optimal resolution. The Leiden
clustering algorithm is used to form clusters based on optimal
resolution, and each cell or spot is assigned a label. Subsequently,
the clustering results are visualized on tissue sections using the
Scanpy package.
Although the latent embedding is obtained using gene expression,
spatial locations, complex biological effects, and morphological
information, some spots may be erroneously assigned to different
spatial domains when utilizing latent embedding for clustering. We
consider these occurrences as noise and acknowledge their potential
impact on downstream biological analysis. To address this, we reassign
incorrectly allocated spots to the most frequently adjacent region.
During vertical integration, we perform UMAP analysis using the RunUMAP
function from Seurat, with default parameters of min_dist set to 0.5
and n_neighbors set to 15. The feature space derived from ResST,
Harmony, and Scanorama integrate multiple ST data sections and
extracted features for UMAP analysis. To correct the UMAP results, we
employ scDEED^[215]55 tool to detect dubious cells and optimize the
UMAP analysis hyperparameters. We run scDEED with candidate n_neighbors
values of 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 80, 160, and 240, and
min_dist values of 0.0125, 0.05, 0.1, 0.3, 0.5, 0.7, and 0.9. After
removing the dubious cells, we visualize the UMAP results.
Identification of spatially variable genes
The Wilcoxon rank-sum test in Scanpy package is used to confirm highly
variable genes for each spatial domain as well as differentially
expressed genes between spatial domains. We then use rule^[216]15 in
SpaGCN package to filter the differentially expressed genes: (1) the
percentage of spots expressing the gene in the target domain, that is,
in-fraction, is >80%; (2) for each neighboring domain, the ratio of the
percentages of spots expressing the gene in the target domain and the
neighboring domain(s), that is, in/out fraction ratio, is >1; and (3)
the expression fold change between the target and neighboring domain(s)
is >1.5. Genes with log[2]FC ≥ 2 are selected as input, and the gene
enrichment analysis is performed using the online tool
Metascape^[217]56.
Determination of ligand-receptor interactions between cells
For human breast cancer samples, we analyze the ligand-receptor
interactions by stLearn package. For each cell or spot, the expression
levels of their receptors and ligands are calculated, and a
receptor-ligand expression matrix is formed. Then, we calculate the
expression scores of ligands or receptors in each cell or spot and
display regions with scores less than 7000.
Statistics & reproducibility
In our case analysis, we utilized the scipy package, a toolkit for
scientific computing, to compare paired samples or two independent
samples within the study. To evaluate the stability and reproducibility
of ResST, we conducted 15 tests on different ST datasets, utilizing the
default parameters. Subsequently, we calculated the variance of several
key metrics to assess the performance of ResST, including the ARI, FMI,
SC, DB, and CH.
Evaluation metrics
In this study, we utilized several well-established metrics to evaluate
the clustering results, providing a holistic view of the algorithm’s
performance. The ARI was used to measure the similarity between the
clustering outcome and the ground truth, with a focus on correcting for
the chance grouping. The FMI, known for its effectiveness in scenarios
with a large number of clusters, gauged the clustering quality based on
precision and recall. To assess the cohesion and separation of the
clusters, we employed the SC, which offers insights into how well each
object lies within its cluster. Additionally, the DB was utilized to
evaluate the clustering compactness and separation, where a lower value
indicates better clustering. Finally, the CH was used to measure the
variance ratio between clusters, with higher values signifying more
distinct clustering. This combination of metrics provided a robust
framework for evaluating and validating our clustering approach.
Comparison with other methods
To evaluate the clustering performance of ResST in identifying spatial
domains, we compared it with Kmeans, Seurat, SpaGCN, stLearn, SEDR,
CCST, and BayesSpace. (1) Kmeans and Seurat implemented in the R
package Seurat. (2) SpaGCN implemented in the Python package SpaGCN.
(3) stLearn implemented in the Python package stLearn. (4) SEDR
implemented in the Python package SEDR. (5) CCST implemented in the
Python package CCST. (6) BayesSpace implemented in the Python package
BayesSpace. The default parameters were used for all methods.
To evaluate the data integration performance of ResST, we compared it
with Harmony and Scanorama which are implemented in the Python package
Harmony and Scanorama.
Reporting summary
Further information on research design is available in the [218]Nature
Portfolio Reporting Summary linked to this article.
Supplementary information
[219]Peer Review File^ (4.4MB, pdf)
[220]Supplementary Information^ (11.1MB, pdf)
[221]42003_2024_6814_MOESM3_ESM.pdf^ (40.9KB, pdf)
Description of Additional Supplementary Files
[222]Supplementary Data 1^ (37.5KB, xlsx)
[223]Supplementary Data 2^ (12.9KB, xlsx)
[224]Supplementary Data 3^ (123.3KB, xlsx)
[225]Supplementary Data 4^ (58.4KB, xlsx)
[226]Supplementary Data 5^ (11.8KB, xlsx)
[227]Reporting Summary^ (1.6MB, pdf)
Acknowledgements