Abstract

   Spatial transcriptomics (ST) technologies allow for comprehensive
   characterization of gene expression patterns in the context of tissue
   microenvironment. However, accurately identifying domains with spatial
   coherence in both gene expression and histology in situ and effectively
   integrating data from multi-sample remains challenging. Here, we
   propose ResST, a graph self-supervised residual learning model based on
   graph neural network and Margin Disparity Discrepancy (MDD) theory.
   ResST aggregates gene expression, biological effects, spatial location,
   and morphological information to capture nonlinear relationships
   between a cell and surrounding cells for spatial domain identification.
   Also, ResST integrates multiple ST datasets and aligns latent
   embeddings based on MDD theory for correcting batch effects. Results
   show that ResST identifies continuous spatial domains at a finer scale
   in ten ST datasets acquired with different technologies. Moreover,
   ResST efficiently integrated data from multiple tissue sections
   vertically or horizontally while correcting batch effects. Overall,
   ResST demonstrates exceptional performance in analyzing ST datasets.

   Subject terms: Computational models, Machine learning
     __________________________________________________________________

   A graph self-supervised residual learning framework (ResST) for natural
   domain identification and data integration of spatial transcriptomics.

Introduction

   The complex structures and functions of multicellularity require the
   cooperation of thousands to billions of cells. The gene expressions of
   a cell are influenced by the surrounding cell and tissue
   microenvironment. Knowledge of the relative locations of different
   cells in a tissue is critical for understanding tissue functionality
   and pathological changes. Spatial transcriptomics (ST) technologies
   provide quantitative gene expression profiling with spatial information
   in tissues and have brought unprecedented opportunities for insights
   into the molecular organization of tissues. ST technologies can be
   mainly classified into two categories. The first category is in situ
   hybridization or sequencing-based technologies, such as
   seqFISH^[38]1,[39]2, seqFISH+^[40]3, MERFISH^[41]4,[42]5, etc, which
   measure the expression levels of hundreds to thousands of genes in
   cells within tissue microenvironments. This type of method can achieve
   gene expression quantification with high spatial resolution at the
   single-cell resolution level. The second category is based on in situ
   capturing-based technologies, including ST^[43]6, SLIDE-seq^[44]7,
   SLIDE-seqV2^[45]8, Stereoseq^[46]9, 10× Visium^[47]10,[48]11, etc,
   which first label spatial spots on histological tissue sections with
   unique barcodes and then perform sequencing. These methods can profile
   the whole transcriptome information of tissue sections of any size.

   In ST studies, a critical step is accurate clustering for the
   identification of spatial domains. Traditional non-spatial clustering
   methods only utilize gene expression for cell clustering, such as
   Kmeans and Louvain’s method, which were extensively employed in
   single-cell transcriptomics studies^[49]12. However, these methods
   ignore spatial location and morphological information, resulting in
   clustering results that are discontinuous in tissue sections. Several
   spatial clustering methods have been developed to analyze ST data
   derived from tissue samples and interpret the spatial dependence of
   genes by integrating spatial location or morphological information. For
   example, BayesSpace^[50]13 enables spatial clustering by modeling a
   low-dimensional representation of gene expression matrix and
   encouraging neighboring spots to belong to the same cluster via a
   spatial prior. stLearn^[51]14 extracts features and location
   information from histological images and combines gene expression
   features from adjacent nodes to normalize gene expression data.
   SpaGCN^[52]15 utilizes graph convolutional neural networks to identify
   spatial domains and spatially differential genes by combining gene
   expression, spatial location, and histology. SEDR^[53]16 embeds spatial
   information using deep autoencoders and graph autoencoders to map gene
   expression patterns to low-dimensional space. Recently, Cell Clustering
   for Spatial Transcriptomics data (CCST)^[54]17 performs cell clustering
   by extending an unsupervised node embedding method based on graph
   convolutional networks. These existing methods enhance clustering
   accuracy by incorporating spatial location or morphological
   information. Therefore, the comprehensive utilization of multimodal
   data including gene expression, spatial location, and morphological
   information, effectively improves the clustering performance. However,
   the aforementioned methods do not consider the impact of biological
   effects on spatial domain identification. It has been demonstrated that
   biological effects are strongly linked to gene expression in tissue
   cells owing to gene expression characteristics influenced by biological
   effects, such as cell properties, surrounding cells, and tissue
   microenvironment^[55]18. Also, the neural network architectures
   developed by these deep learning methods mainly rely on linear
   principal component analysis (PCA) to extract features from normalized
   gene expression and are relatively simple, lacking the ability to learn
   more nonlinear information from gene expression, biological effects,
   spatial information, and morphological information. It is desirable to
   develop efficient computational approach that aggregates gene
   expression, biological effects, spatial location, and morphological
   information to capture nonlinear relationships between a cell and
   surrounding cells for natural domain identification.

   Another critical task in ST studies is to integrate data from multiple
   sections while correcting batch effects. In order to obtain the
   real-world gene expression patterns of tissue organization for fully
   understanding tissue functionality and pathological changes, multiple
   studies have performed the analysis of ST data from multiple tissue
   sections equally distributed along vertical or horizontal axis of the
   tissue. Although a variety of data integration methods have been
   developed, such as Harmony^[56]19 and Scanorama^[57]20, most of them
   are for single-cell datasets without consideration of spatial
   information. Recently, PASTE^[58]21 and PRECAST^[59]22 were proposed to
   analyze ST datasets with consideration of spatial information. However,
   these methods only apply to integrate ST data from multiple tissue
   sections uniformly distributed along the anteroposterior axis of the
   same area of interest (vertical integration). They also show limited
   flexibility for different ST platforms, resulting in poor clustering
   performance in broad-spectrum data^[60]23,[61]24. More recently,
   SpaGCN^[62]15 and STAGATE^[63]25 can jointly analyze data from multiple
   adjacent sections (horizontal integration) containing tissue
   compartments within the whole organ, but they are ineffective in
   correcting batch effects and inapplicable in integrating data from
   multiple tissue sections vertically. Also, most existing methods map
   integrated data into low-dimensional space using the principal
   components of conventional dimension reduction without considering the
   consistent loss functions of dimension reduction, alignment across
   samples, or spatial clustering. Therefore, there is a need for a novel
   method to integrate ST data from multiple sections horizontally or
   vertically while correcting batch effects.

   To address the aforementioned challenges, we developed ResST, a graph
   self-supervised residual learning model based on graph neural networks
   and Margin Disparity Discrepancy (MDD) theory, to perform spatially
   natural clustering and integrate data horizontally or vertically while
   correcting batch effects. For clustering, ResST constructs gene
   expression correlation matrix to represent the impact of biological
   effects on gene expression. Then ResST learns the spatial embedding
   (residual) and low-dimensional gene representation. Subsequently,
   low-dimensional gene concatenates with the residual for composing
   latent embedding that captures multimodal information, including gene
   expression, biological effects, spatial location, and morphological
   information, significantly enhancing clustering accuracy. Moreover,
   ResST achieved algorithm design of data integration based on MDD theory
   for domain adaptation. MDD is a measurement with rigorous
   generalization bounds, tailored to the distribution comparison with the
   asymmetric margin loss^[64]26. It promotes the proximity between the
   class mean of the two domains for each class and pushes apart the mean
   of different classes, significantly improving prediction accuracy in
   unsupervised domain adaptation^[65]26,[66]27. In this study, ResST
   possesses the ability to integrate ST data from multiple tissue
   sections horizontally or vertically. Moreover, ResST aligns the latent
   embeddings across multiple tissue sections by incorporating MDD
   measurement to correct batch effects.

   We extensively tested ResST for the two key tasks on analysis of ST
   data from different platforms (e.g., 10× Visium, SLIDE-seqV2^[67]8,
   Stereoseq^[68]9, seqFISH^[69]1,[70]2, MERFISH^[71]4,[72]5, 4i^[73]24,
   and MIBITOF^[74]23) and compared it with current methods. The results
   showed that ResST achieved better clustering performance on the human
   dorsolateral prefrontal cortex (DLPFC) dataset than current
   state-of-the-art methods. Moreover, ResST identified much more refined
   heterogenous subregions in breast cancer ST data and sagittal section
   seqFISH data of mouse embryos. Interestingly, ResST effectively reduced
   batch effects in the human DLPFC dataset and mouse posterior brain 10×
   Visium data, and more clearly identified the dentate gyrus of mouse
   posterior brain compared to other mainstream batch processing methods.
   In summary, our results demonstrated that ResST has a powerful ability
   to accurately identify spatial domains and efficiently integrate data
   from multiple tissue samples horizontally or vertically while
   correcting batch effects, making it highly scalable for different ST
   platforms.

Results

Overview of ResST workflow

   We explain the workflow of ResST using in situ capturing-based ST data
   as an example, but the model can be easily modified to different ST
   platforms. ResST couples gene expression with biological effects,
   spatial location, and morphological information to accurately identify
   spatial domains (Fig. [75]1a). ResST first constructs an enhanced gene
   expression matrix of all spots by combining gene expression, spatial
   location, and morphological information, and calculates an adjacent
   matrix using k-nearest neighbors algorithm based on spatial coordinates
   (Fig. [76]1b). Then, ResST feeds enhanced gene expression matrix into
   encoder to generate low-dimensional gene representation. Next, ResST
   utilizes two graph convolutional layers to map low-dimensional gene
   representation into spatial embedding (residual). Subsequently,
   low-dimensional gene representation from the encoder is mapped through
   identity shortcuts and concatenates with the residual for composing
   latent embedding. Finally, the learned latent embedding can be applied
   to spatial domain identification and cell type or subtype
   identification (Fig. [77]1c). When integrating data from multiple
   sections, we design a loss function for domain classifier based on MDD
   measurement to calculate loss between domains (tissue sections). ResST
   integrates enhanced gene expression matrixes from multiple datasets to
   generate latent embeddings, which are fed into a domain classifier
   connected by a gradient reversal layer for correcting batch effects
   (Fig. [78]1c, d). During the learning process, the domain classifier
   can map the source and target domains of different distributions into
   the same feature space, which is used to integrate spatial data from
   various distributions. In addition, ResST applies domain classifier to
   maintain the correspondence between cells and tissue sections,
   preserving layer-specific gene expression variations.

Fig. 1. Overview of ResST workflow.

   [79]Fig. 1
   [80]Open in a new tab

   a ResST workflow begins with gene expression, spatial location, and
   morphological information in ST data. b ResST constructs an enhanced
   gene expression matrix from morphological information, gene expression,
   and spatial location. ResST initially segments the histological image
   based on the coordinate of each spot and extracts the image features by
   a pre-trained deep learning model. The morphological similarity matrix
   is calculated to assess the influence of morphological information on
   clustering based on the image features. The gene expression correlation
   matrix is calculated to assess the influence of biological effects on
   clustering. The spatial correlation matrix is calculated to quantify
   the influence of spatial information on determining similar cellular
   states. c ResST is a graph self-supervised residual learning model
   based on graph neural network and Margin Disparity Discrepancy (MDD)
   theory. The deep autoencoder consists of an encoder, a graph
   autoencoder, and a decoder. d ResST can integrate ST data from
   different sections while correcting batch effects and accurately
   assigning spots to the correct spatial domains. N represents the number
   of spots, while M[1] and M[2] correspond to the dimensions of the final
   layer of encoder and the final layer of GNN, respectively.

ResST exhibits superiority in identifying spatial domains in the human
dorsolateral prefrontal cortex and mouse brain data

   We first assessed the superiority of ResST’s spatial clustering
   performance on the human DLPFC^[81]20 dataset with 12 sections from
   three distinct individuals acquired with 10× Visium. It includes
   cortical layers (L1-L6) and white matter (WM) labeled with marker genes
   and histological annotations of tissue structure, which enabled
   evaluation of the accuracy of spatial domain detected by each method.
   The spatial domain identified by these methods for the representative
   tissue section 151673 (3639 locations and 33538 genes) was displayed in
   Fig. [82]2b. The visual comparison demonstrated that Kmeans spatial
   clustering had the poorest performance with mixed cortical layers
   (L1-L6). Seurat and stLearn recovered part of the cortical layers and
   WM, but they did not achieve clean separation and clear boundaries of
   the cortical layers (L1-L6). SEDR and CCST could identify well-refined
   layers by utilizing histological information or spatial location for
   spatial clustering, but they only divided cortical layers into five
   layers and most of the layers were inconsistent with the annotations.
   SpaGCN and BayesSpace accurately delineated all the cortical layers
   (L1-L6) and WM, but they generated incorrect layer thickness of L1 and
   L2. ResST delineated finer-grained layers by integrating multimodal
   information including gene expression, biological effects, spatial
   location, and morphological information, and distinguished all the
   cortical layers (L1-L6) and WM that were most consistent with manual
   annotations of DLPFC and neuroscientific definitions of cortical
   layers.

Fig. 2. ResST exhibits superiority in identifying spatial domains.

   [83]Fig. 2
   [84]Open in a new tab

   a Manually annotated layer structure for slice 151673 from ref.
   ^[85]57. b Spatial domains identified by ResST and existing
   state-of-the-art algorithms (Kmeans, Seurat, SpaGCN, stLearn, SEDR,
   CCST, and BayesSpace). c Boxplot of the performance of ResST and other
   algorithms for all 12 DLPFCs. The x-axis shows ARI and FMI, which were
   used to compare the similarity of the predicted spatial layers and the
   manually annotated layers for each algorithm. d UMAP visualizations and
   PAGA graphs were generated for slide 151673 with Seurat-derived
   principal components (left) and ResST-derived embeddings (right). e
   Clustering results of mouse brain coronal region (top); The
   corresponding anatomical Allen Mouse Brain Atlas (bottom,
   [86]https://atlas.brain-map.org/). f Clustering results of mouse brain
   sagittal anterior section (section 2); The corresponding anatomical
   Allen Mouse Brain Atlas (bottom). g Clustering results of mouse brain
   sagittal posterior section (section 1); The corresponding anatomical
   Allen Mouse Brain Atlas (bottom).

   We further evaluated the quality of clustering performances for each
   method using external measures, namely the Adjusted Rand Index
   (ARI)^[87]28 and the Fowlkes-Mallows Index (FMI)^[88]29. ResST achieved
   much higher ARI and FMI than the current best method (BayesSpace). The
   mean ARI was 0.694 ± 0.062 for ResST and 0.526 ± 0.128 for BayesSpace
   (Wilcoxon test, p = 0.0007; Supplementary Table [89]1). The FMI was
   0.752 ± 0.064 for ResST and 0.635 ± 0.099 for BayesSpace (Wilcoxon
   test, p = 0.0061; Supplementary Table [90]2). The mean ARI was lower
   than 0.426 ± 0.094 for other methods. ResST achieved the highest
   clustering accuracy in tissue section 151509 (ARI = 0.792;
   Supplementary Fig. [91]3) and lower variability in performance across
   the slices by considering complex global interactions than most of the
   other methods (Fig. [92]2c and Supplementary Figs. [93]1–[94]11).
   Moreover, compared to Seurat, ResST provided more orderly cortical
   layers from L1 to L6 and WM illustrated by uniform manifold
   approximation and projection (UMAP) and partition-based graph
   abstraction (PAGA) (Fig. [95]2d). Therefore, ResST exhibited the most
   outstanding performance in DLPFC dataset.

   Also, we applied ResST to 10× Visium data from the sagittal posterior,
   sagittal anterior, and coronal regions of the mouse brain with highly
   intricate tissue architectures. Although Kmeans was able to outline the
   major anatomical regions, many clusters were intermixed. Seurat,
   stLearn, and SpaGCN managed to achieve more distinct clusters, but the
   boundaries between these clusters remained ambiguous. CCST generated
   clean separation and clear boundaries, but clusters were inconsistent
   with histological images (Supplementary Figs. [96]12, [97]13). ResST,
   BayesSpace, and SEDR showed outstanding clustering performance and
   clearly identified tissue structures closer to histological images.
   Moreover, only ResST revealed finer-grained tissue structures on all
   three areas of mouse brain ST data, including the cornu ammonia and
   dentate gyrus in the hippocampus of mouse brain (Fig. [98]2e), the
   layers 2/3, 4/5, and 6 of the cortex in the sagittal anterior (Fig.
   [99]2f and Supplementary Figs. [100]12, [101]13), as well as the
   cerebellar cortex and dorsal horn of hippocampus region in the sagittal
   posterior (Fig. [102]2g and Supplementary Figs. [103]14, [104]15).

   We quantified the clustering performance with internal measures of the
   Silhouette Coefficient (SC)^[105]30, Calinski-Harabasz Index
   (CH)^[106]31, and Davies-Bouldin index (DB)^[107]32, which is only
   generated from SEDR, CCST, and stLearn, and cannot be obtained from
   other methods because of no embeddings. Clustering analysis of the
   sagittal posterior data showed that ResST had higher SC (SC = 0.122)
   and CH (CH = 137.031), and lower DB (DB = 1.920) than those of SEDR,
   CCST, and stLearn (Fig. [108]2g and Supplementary Fig. [109]14).
   Similarly, ResST achieved better clustering performance than most of
   the test methods on the coronal and sagittal anterior data
   (Supplementary Figs. [110]12, [111]13). These results demonstrated that
   ResST exhibited superiority in identifying spatial domains by
   considering gene expression, biological effects, spatial location, and
   morphological information in the human DLPFC and mouse brain data.

ResST reveals finer-grained tissue structures for improved heterogeneity
analysis in breast cancer data

   We assessed the clustering performance of ResST on the 10× Visium
   dataset of human breast cancer (invasive ductal carcinoma). This
   dataset comprises 3789 spatial spots and 36601 genes, manually
   annotated by pathologists based on H&E images and reported breast
   cancer marker genes with spatial expression characteristics (Fig.
   [112]3a). Here, we tested the ability of ResST, compared to other
   competing methods, to recover the detailed pathological structures.
   Among all the methods, ResST achieved the highest ARI (ARI = 0.728),
   followed by BayesSpace (ARI = 0.668), while the remaining methods had
   ARI values below 0.56 (Fig. [113]3b, Supplementary Fig. [114]16). The
   visual comparison clearly showed that the computed domains from Kmeans,
   Seurat, stLearn, and SpaGCN were fragmented and discontinuous, while
   SEDR, CCST, BayesSpace, and ResST produced more jointed domains. Kmeans
   was only able to recover the Healthy 1 domain with the remaining
   domains mixed among the others, indicating the poorest performance. The
   remaining methods achieved improved spatial domain identifying, yet the
   boundaries between domains performed by Seurat, stLearn, and SpaGCN
   appeared ragged and lacked clean separation. Although CCST and
   BayesSpace performed better, they inaccurately divided Health 1 region
   into multiple sub-regions which also existed in Seurat and stLearn.
   SEDR exhibited clear boundaries, but the reconstructed tissue
   structures did not align perfectly with the manual annotations. Only
   Seurat, BayesSpace, and ResST recognized the sub-domain (domain 13)
   within domain 2. Furthermore, ResST was the sole method that overcame
   all the aforementioned limitations that existed in other methods,
   producing the domains that were closest to the annotations, and
   achieving finer-grained pathological structures in cancer slices. Based
   on the clustering results of ResST, we delved deeper into the
   significant differentially expressed genes (DEGs) between domain 2 and
   its sub-domain, domain 13. We conducted differential expression
   analysis on domains 2 and 13 and detected 349 significant DEGs
   (log[2]FC ≥ 2; p.adjust < 0.05) between the two domains. Some of these
   DEGs, such as PTGES and VTCN1, were significantly overexpressed in
   domain 13 and could distinguish domains 2 and 13 (Fig. [115]3c,
   Supplementary Fig. [116]17, and Supplementary Data [117]1). It has been
   demonstrated that several identified DEGs were regarded as potential
   targets in cancer therapy. For example, VTCN1^[118]33 is associated
   with immunosuppression, and PTGES^[119]34 plays an important role in
   inflammation, immune regulation, cellular proliferation, and apoptosis
   (Fig. [120]3c). These results suggested that ResST displayed superior
   spatial clustering performance and provided more important DEGs for
   understanding cancer pathogenesis and uncovering therapeutic targets.

Fig. 3. ResST identified spatial domains in human breast cancer at a finer
level.

   [121]Fig. 3
   [122]Open in a new tab

   a 10× Visium ST data of a breast cancer sample annotated by
   pathologists containing invasive ductal carcinoma (IDC), ductal/lobular
   carcinoma (DCIS/LCIS), tumor edge, and healthy regions. b Spatial
   domains identified by ResST, Seurat, and BayesSpace. c Volcano graph of
   DEGs between domains 2 and 13 (left); Gene expression with regional
   annotation and violin plots of PTGES and VTCN1 (right). d Heatmap of
   Pearson correlation coefficient among domains. e LYZ and SLITRK6
   express differentially between domains 1 and 13. f Cell type
   annotations from Seurat (left); Domains with highest ligand-receptor
   interaction activity (LR_score < 7000) (middle); Spatial location of
   domains 5, 7, and 13 (right). g Gene ontology enrichment analysis of
   the DEGs between domains 1 and 13. h H&E of human breast cancer sample
   annotated by Agoko’s telepathology platform (left); Spatial domains
   identified by ResST, SEDR, CCST, and stLearn (right).

   To further explore the spatial heterogeneity within tumor tissues, we
   conducted domain correlation analysis using the cluster labels and the
   recovered pathological structures from ResST (Fig. [123]3d and
   Supplementary Data [124]2). We found significant heterogeneity between
   region 1 and regions 2, 3, and 13. Region 1 corresponds to DCIS/LCIS,
   while regions 2, 3, and 13 correspond to IDC in the breast tissue
   slices, respectively (Fig. [125]3d). We compared transcriptional
   differences between domain 1 and 13 by performing differential
   expression analysis followed by pathway enrichment analysis and
   ligand-receptor interactions analysis. A total of 413 significant DEGs
   (log[2]FC ≥ 2; p.adjust < 0.05) were detected and the top ten DEGs were
   selected in each domain for evaluating the transcriptome heterogeneity
   (Supplementary Data [126]3). CRISP3, SLITRK6, C6orf141, VTCN1, ABCC11,
   EFHD1, PDLIM1, ECM1, PIP, and PTGES were upregulated in domain 13 and
   could be used to accurately distinguish DCIS/LCIS and IDC. CRISP3 is
   associated with immune response modulation and tumor microenvironment,
   tumor-associated macrophage (TAM) infiltration^[127]35. VTCN1 is a
   negative co-stimulatory molecule that inhibits the activation and
   proliferation of T cells, which is associated with immune escape, TAM
   infiltration, and poor prognosis in various cancers^[128]36. PIP is
   linked to immune escape mechanisms in breast cancer^[129]37.
   ECM1^[130]38 and PTGES^[131]34 are associated with tumor proliferation,
   migration and invasion, chemoresistance, and poor prognosis (Fig.
   [132]3e and Supplementary Fig. [133]18). Moreover, ResST identified
   SCGB1D2, SCGB2A2, FCGR3B, LYZ, CXCL9, ABHD2, FCGRT, LINC02224, KLHDC7B,
   and HLA-DPA1, which were highly expressed in spatial domain 1. Previous
   studies have demonstrated that SCGB2A2 and SCGB1D2 may be new
   independent prognostic markers in breast cancer^[134]39. FCGR3B is
   considered as a marker gene of Natural killer cells^[135]40, and
   CXCL9^[136]41, KLHDC7B^[137]42, and LYZ^[138]43 are marker genes of
   lymphocytes, indicating abundant immune cell infiltration in domain 1
   (Fig. [139]3e and Supplementary Fig. [140]18). We conducted
   differential expression analysis for region 1 compared to regions 2 and
   3, respectively. The genes highly expressed in region 1 (Supplementary
   Data [141]3) are associated with immune response and extracellular
   matrix organization. The genes highly expressed in regions 2 and 13
   (Supplementary Data [142]3) are related to tumor invasion, immune
   evasion, and cytoskeletal remodeling.

   Additionally, we observed multiple pathways upregulated in domain 1,
   such as phagosome, lymphocyte-mediated immunity, and innate immune
   response (Fig. [143]3g and Supplementary Data [144]4). These pathways
   participate in anti-tumorigenic or anti-metastatic
   effects^[145]44–[146]46, indicating a low potential for tumor
   metastasis of DCIS/LCIS, consistent with previous studies^[147]47.
   Moreover, ResST performed spatial cluster by considering gene
   expression, spatial location, complex biological effects, and
   morphological information and identified domains 5, 7, and 15 which was
   consistent with locations in the tissue with high ligand-receptor
   interaction activity performed by stLearn (Fig. [148]3f). In
   conclusion, the results demonstrated ResST was capable of recovering
   detailed pathological structures in cancer tissues and provided immense
   assistance for exploring the heterogeneity present within tumor
   tissues.

   We also examined another ST data of human breast cancer (Ductal
   Carcinoma in Situ) with manual annotations (Fig. [149]3h). ResST, along
   with other methods (Fig. [150]3h and Supplementary Fig. [151]19), was
   employed to assess the ability to detect spatial structures. The
   clustering results showed that SEDR and stLearn exhibited poorer
   performance compared to CCST and ResST, with lower SC and CH and higher
   DB. Although ResST (SC = 0.132, DB = 2.433, CH = 71.518) provided
   higher DB and lower CH compared to CCST (SC = 0.092, DB = 1.793,
   CH = 199.572), ResST exhibited higher SC by performing stronger
   coherence within their respective spatial positions and generated clear
   boundaries and continuous domains consistent with the annotations which
   are similar to those of CCST. The results indicated that ResST enabled
   detailed profiles of spatial transcriptome and facilitated the
   discovery of tumor heterogeneity.

ResST corrects batch effects for vertical and horizontal integration of
multiple tissue sections

   An obstacle to integrating data from multiple sections vertically or
   horizontally is the presence of batch effects. We assessed the
   remarkable capabilities of ResST in correcting batch effects and
   revealing finer-grained tissue structures. This performance was
   conducted on the 10× Visium dataset of the posterior sagittal mouse
   brain, which comprises two sections along the same vertical axis and
   exhibits intricate tissue characteristics. ResST incorporated a
   measurement with rigorous generalization bounds, MDD, in the workflow
   for integrating data from multiple tissue sections. We compared the
   performance of the pipeline integrated without (Method 1) or with
   (Method 2) MDD measurement for evaluating the effect of MDD measurement
   on spatial clustering results. The results showed that Method 2, ResST
   (Fig. [152]4b; SC = 0.161, DB = 1.676, CH = 284.062), provided more
   finer-grained clustering results than Method 1 (Fig. [153]4b;
   SC = 0.152, DB = 2.054, CH = 145.330). As shown in Fig. [154]4b, we
   observed that ResST accurately identified the dorsal and the ventral
   horn of the hippocampus regions, which was not identified by Method 1.
   Current methods used for data integration, Harmony and Scanorama, only
   identified the ventral horn of the hippocampus region. These results
   suggested that the spots obtained by ResST retained maximal biological
   content from multiple samples and MDD measurement improved spatial
   clusters on data integration of multiple tissue sections.

Fig. 4. ResST corrects batch effects for vertical and horizontal data
integration of multiple tissue sections.

   [155]Fig. 4
   [156]Open in a new tab

   a H&E image of posterior mouse brain from the original study. b The
   spatial clustering results from Method 2 (with MDD), Method 1 (without
   MDD), Harmony, and Scanorama (from left to right). c UMAP plots before
   batch effects correction and spatial location of sections 151673 and
   151675 in DLPFC. d Vertical integration results with two DLPFC
   sections. UMAP plots after batch effects correction and spatial
   clustering from ResST, Harmony, and Scanorama (left). Spots are colored
   according to the spatial domains identified by the respective
   clustering methods (right). e Horizontal integration results from ResST
   (top) and SpaGCN (bottom) with two mouse brain samples which consist of
   sagittal anterior and posterior regions.

   We further evaluated the ability of ResST to correct batch effects on
   the DLPFC ST dataset, which originates from a contiguous region within
   the tissue obtained along a vertical axis, serving as a benchmark
   dataset for evaluating vertical integration. ResST was compared with
   Harmony and Scanorama, which are commonly used to overcome batch
   effects in single-cell genomics data. We first assessed the integration
   results visually with UMAP. There was a slight offset between sections
   prior to batch correction (Fig. [157]4c). After integration, the UMAP
   plots for ResST, Harmony, and Scanorama showed that spots from
   different sections were all well mixed, while the domain clusters were
   well segregated. In the subsequent clustering, the spatial domains from
   ResST, Harmony, and Scanorama were all highly overlapping, but the
   domains from Harmony and Scanorama were only able to recover the WM
   with remaining clusters mixed among the other layers (Fig. [158]4d).
   Moreover, the visualization demonstrated that ResST more distinctly
   revealed the laminar structure of the human cerebral cortex with clean
   separations and clear boundaries compared to other methods (Fig.
   [159]4d).

   We next tested the performance of ResST on data integration and batch
   effects correction using a mouse brain 10× Visium dataset. The mouse
   brain sections consist of anterior and posterior regions (horizontal
   sections) and include two slices for each tissue region (vertical
   sections) with complex structural characteristics. On the UMAP plots,
   ResST, Harmony, and Scanorama all reduced batch effects and spots from
   different sections were all well mixed. Moreover, ResST performed
   better and well-segregated clusters, which were not achieved by Harmony
   and Scanorama (Supplementary Fig. [160]20). Subsequently, the clusters
   from ResST aligned well across the two sections, and ResST generated
   clusters that were more consistent with known anatomy (Fig. [161]4a)
   than other methods. In contrast, the clusters generated by Harmony and
   Scanorama were fragmented and incompletely in agreement with known
   anatomy (Supplementary Fig. [162]20). We further compared the internal
   measures of ResST with those of Harmony and Scanorama. ResST exhibited
   a higher SC and similar DB values compared to Harmony (Fig. [163]4b;
   SC = 0.125, DB = 1.705, CH = 426.953) and Scanorama (Fig. [164]4b;
   SC = 0.125, DB = 1.509, CH = 675.083), generating greater continuous
   spatial domains with lower noise levels. Also, ResST performed better
   on two sections from the 10× Visium dataset of the anterior sagittal
   mouse brain (Supplementary Figs. [165]12, [166]13). These results
   demonstrated that ResST efficiently integrated data from multiple
   sections and corrected batch effects, thereby improving the performance
   of identification of spatial domain.

   Then, we compared SpaGCN and ResST in horizontal integration using two
   sections of anterior and posterior from the mouse brain 10× Visium data
   and employed the annotated Allen mouse brain Atlas^[167]48 as the
   ground truth. The results showed that most of the spatial clusters
   identified by SpaGCN and ResST demonstrated remarkable concordance with
   known anatomical structures. However, SpaGCN did not capture certain
   pivotal regions within the tissue and many clusters obtained by this
   method were fragmented and failed to align along the shared edges
   between the two tissue sections (Fig. [168]4e). In contrast, ResST
   exhibits superior performance and clearly identified the cerebral
   cortex, cerebellum, and corpus callosum. Moreover, ResST showcased
   enhanced alignment along the edges of both anterior and posterior
   sections, leading to outstanding performance on data integration and
   batch effects correction.

ResST works well on various spatial omics data independent of platforms

   Aside from the 10× Visium platform, we investigated the generalization
   ability of ResST on image-based molecular data (seqFISH, MERFISH, 4i,
   and MIBITOF) and high-resolution ST data (Stereoseq and SlideseqV2). We
   first applied ResST to 4i^[169]24 (iterative indirect
   immunofluorescence imaging) data, which measured 40 protein reads
   (~270,000 observations/pixel) in high-throughput biological samples
   from millimeter to nanometer scales. Here, we only used partial
   molecular data for spatial domain identification (Fig. [170]5a and
   Supplementary Fig. [171]21a, pixel coordinates (x[1], y[1])- (x[2],
   y[2]): (1100, 950)- (1300, 1150)). The results showed that ResST can
   depict detailed subcellular distribution in local regions, including
   various compartments, organelles, and cell structures within each cell
   (Fig. [172]5a). Also, we applied ResST to another image-based molecular
   MIBI-TOF dataset^[173]23, which imaged 36 labeled antibodies with
   histological staining and endogenous elements (3309 pixels). Similarly,
   ResST revealed partially continuous regions and locally fused elements
   in four imaging results (Fig. [174]5b and Supplementary Fig. [175]21b),
   which was almost consistent with the original annotations^[176]23.

Fig. 5. ResST works well on various spatial omics data.

   [177]Fig. 5
   [178]Open in a new tab

   a Visualization of subcellular molecular maps using 4i (Iterative
   Indirect Immunofluorescence Imaging) (top; partial data, coordinates
   (x[1], y[1])–(x[2], y[2]): (1100, 950)–(1300, 1150) and spatial domain
   detection using ResST (bottom). b Visualization of imaging-based
   molecular MIBI-TOF dataset (3309 pixels and 36 proteins) with
   annotation (top, the point16 section) and spatial domain detection
   using ResST (right, ARI = 0.524). c Visualization and annotation of
   SlideqV2 data in the mouse hippocampus (left); spatial domain detection
   using ResST (middle); distribution of gene Tmsb4x expression in the
   data. d Visualization and annotation of MERFISH data in the mouse
   prefrontal hypothalamus (top) and spatial domain detection using ResST
   (bottom). e Visualization and annotation of adult Macaque cortex
   Stereoseq data (leftmost); spatial domain partition using ResST (left
   middle); expression distribution of the marker genes GFAP, GPR83, and
   SYT6 in tissue sections of the cortex (right). f Visualization and
   annotation of seqFISH data in sagittal slices of mouse embryos (left);
   spatial domain partition using ResST (right).

   Subsequently, we evaluated the performance of ResST on ST data at an
   approximate single-cell resolution. We assessed the ability to recover
   the annotated structures of ResST on a mouse hippocampus dataset
   acquired with Slid-seqV2^[179]8 (comprising 41,886 subcellular regions
   and 4000 genes). ResST was able to produce spatially consistent
   clustering and outline the major anatomical regions such as the
   “DenatePyramids” and “CA1_CA2_CA3_subiculum”. Interestingly, ResST
   divided “CA1_CA2_CA3_subiculum” into two domains (domains 5 and 6)
   (Fig. [180]5c) and we detected significant spatial DEGs between domains
   5 and 6, such as Tmsb4x. Also, ResST demarcated several subareas,
   including the third ventricle (domain 13), medial habenula (domain 12),
   and lateral habenula (domain 5), which were not clarified in the
   original annotation (Supplementary Fig. [181]21d). Furthermore, ResST
   successfully processed MERFISH^[182]4 data from mouse hypothalamus, and
   clearly identified both “Ependymal” and “OD Mature” structural domains
   (Fig. [183]5d).

   We also validated the performance of ResST on adult Macaque cortex
   Stereoseq chips^[184]49. ResST accurately identified the L2-L6 regions
   in the cortex layer, consistent with histological staining definitions.
   In addition, we identified marker genes through a more in-depth
   analysis, such as GFAP, GPR83, and SYT6, which were consistent with
   spatial clustering results (Fig. [185]5e and Supplementary
   Fig. [186]21c). Finally, we validated the generalization ability of
   ResST on seqFISH data from sagittal slices of mouse embryos^[187]50.
   ResST can accurately identify tissue structures in the brain, heart,
   and intestinal tube among the ectoderm, mesoderm, and endoderm,
   respectively. Moreover, ResST divided “Forebrain/midbrain/hindbrain”
   into finer-grained tissue structures, forebrain, midbrain, and
   hindbrain, which were not clarified in the original annotation (Fig.
   [188]5f). These results suggested that ResST can make detailed
   structural domain divisions for ST data from different platforms.

Discussion

   ST technologies are powerful experimental methods for measuring gene
   expression while preserving the relevant spatial context. Spatial
   clustering and data integration are the most important tasks for
   analyzing ST datasets. Previous studies showed that accurate clustering
   is dependent on the comprehensive utilization of multimodal data (gene
   expression and spatial location) derived from ST technologies and
   morphological information^[189]12–[190]17. It has been demonstrated
   that gene expression characteristics in tissue cells are related to
   biological effects, such as cell properties, surrounding cells, and
   tissue microenvironment^[191]18. However, current ST processing tools
   ignore the impact of biological effects on spatial clustering.
   Moreover, current ST datasets are obtained from a single section or
   multiple tissue sections. Integration of these data includes horizontal
   or vertical integration. Previous studies have revealed that the batch
   effects during data generation processes decrease the accuracy of
   downstream analysis and must be removed while preserving the biological
   meaning of datasets^[192]51. However, few ST processing tools are
   applicable to simultaneously integrate data horizontally and vertically
   while remove batch effects. In this study, we developed a novel ST
   processing method, ResST, which not only fully utilized gene
   expression, biological effects, spatial location, and morphological
   information for improving clustering accuracy, but also simultaneously
   integrated data horizontally and vertically while removed batch
   effects.

   Gene expression derived from ST technologies encapsulates complex
   biological information including cell properties, surrounding cells,
   tissue microenvironment, cell communication, receptor-ligand
   interactions, etc, which are biological effects objectively existing in
   the tissues and organs^[193]18. Current deep learning methods used for
   ST clustering analysis are based on simplistic neural network
   architectures^[194]14,[195]15,[196]17. These methods simply aggregate
   gene expression, spatial information, or morphological information from
   neighboring spots, but they fail to capture more effective multimodal
   information, such as biological effects. In this study, ResST computed
   the gene expression correlation matrix using gene expression matrix
   based on a linear approach. The gene expression correlation matrix, a
   n × n matrix, quantified the correlation between cells/spots,
   representing biological effects. The clustering performance was
   evaluated with or without biological effects on breast cancer datasets.
   The results showed that the ARI values significantly increased when
   integrating biological effects (Supplementary Fig. [197]22), indicating
   that ResST improved spatial domain clustering performance by
   integrating biological effects. Previous reports showed that the
   residual neural network is a classical convolutional neural network
   that introduces a shortcut connection that enhances the extraction of
   effective information and effectively prevents network performance
   degradation^[198]52,[199]53. In this study, we applied the residual
   learning model to fully capture multimodal information, such as
   biological effects. The unique residual design of ResST allows the
   encoder’s output to be used as input for a graph autoencoder, and
   concatenate with the corresponding output of the autoencoder for
   generating latent embeddings that capture nonlinear relationships
   between a cell and surrounding cells for spatial domain identification.
   Therefore, ResST enables accurate and continuous spatial clustering,
   bringing ST analysis profiles closer to natural and realistic tissue
   structures and achieving more accurate clustering than that of existing
   models.

   MDD theory, a measurement with rigorous generalization bounds, can
   seamlessly be transformed into an adversarial learning algorithm for
   domain adaptation^[200]26. This algorithm promotes the proximity
   between the class mean of the two domains for each class and pushes
   apart the mean of different classes, significantly improving prediction
   accuracy in unsupervised domain adaptation^[201]26,[202]27. Notably,
   data integration tasks in ST studies include batch effects correction
   and clustering of mixed data, which essentially belong to the
   classification tasks of domain adaptation. In the present study, ResST
   achieved an algorithm design for data integration based on the MDD
   theory. The results showed that the designed algorithm can effectively
   extract the features of data from different sections and make the
   feature distributions of different sections more similar, removing the
   batch effects in data integration tasks.

   Another advantage of ResST is that it integrates ST data from multiple
   sections while preserving layer-specific gene expression variation.
   Current data integration methods mix data from multiple batches before
   performing downstream analysis^[203]19–[204]22. For example,
   PASTE^[205]21 stacks and/or integrates ST data from multiple adjacent
   tissue sections into a single slide, failing to capture cellular
   biological variations across sections. Based on the MDD theory, ResST
   feeds latent embeddings into a domain classifier through a gradient
   reversal layer for maintaining the correspondence between cells and
   tissue sections, capturing the cellular biological variations across
   sections. Therefore, ResST integrates ST data from multiple sections
   horizontally or vertically while correcting batch effects and
   preserving layer-specific gene expression variation.

   We designed ResST to process data acquired from different spatial
   transcriptomic platforms. When applied to different complexity levels
   of 10× Visium data, ResST detected more continuous spatial domains with
   reduced noise levels, which was consistent with manual annotations.
   Analysis of breast cancer ST data, ResST offered more reasonable
   partitioning of spatial domains and provided a more detailed
   delineation of substructural regions within the tissue, giving great
   assistance in understanding cancer heterogeneity and molecular
   mechanism of cancer occurrence and metastasis. ResST also identified
   fine spatial domains for seqFISH data of sagittal sections of mouse
   embryos. When analyzing ST data of human DLPFC and sagittal sections of
   mouse brain, ResST identified more continuous and accurate spatial
   domains by integrating data horizontally and detected biologically
   coherent spatial domains that aligned across samples while removing
   batch effects by integrating data vertically. Also, ResST not only
   performed well on mainstream ST platforms, such as 10× Visium, MERFISH,
   and Slide-seq but also showed great potential in other ST data (4i and
   MIBI-TOF).

   ST provides spatially resolved gene expression of diverse cells in
   complex tissues. However, uncovering a more detailed picture of complex
   biological processes in tissues remains a challenge due to limited gene
   recovery or low spatial resolution in recent ST technologies. The
   integration of ST with multi-omics data is an inevitable trend for
   comprehensively understanding complex biological processes (the
   fine-grained panorama of a heterogeneous tissue/molecular
   architecture). ResST offers opportunities for new research issues in
   data integration of ST and multi-omics. For instance, the scRNA-seq
   data can be applied to deconvolute ST data, generating clearer
   cell-type compositions and enabling tissue cartography at single-cell
   resolution. Furthermore, when single-cell resolution of ST datasets is
   available, they can be directly integrated with multi-omics data. For
   example, integrating CITE-seq will deeply differentiate cellular
   heterogeneity and accurately explore specific cell types in the spatial
   context of tissues, which can help us to investigate the molecular
   mechanisms of drug resistance in cancer therapies. Additionally,
   combining ST datasets with scNMT-seq data will profile multiple
   epigenetic features in conjunction with gene expression, at single-cell
   resolution in the spatial context of tissues, enabling a more complete
   understanding of epigenetic dependencies and their associations with
   transcription and cell states.

Methods

Normalization of gene expression

   In ST studies, gene expression and spatial information can be utilized
   to identify cells in close spatial positions that exhibit similar
   states. The histology image shows a clear difference among tissue
   substructures with different functions, suggesting histology is
   informative for clustering. The gene expression of each spot can be
   enhanced by combining with morphological information and spatial
   information, facilitating natural domain identification. Additionally,
   because of the strong link between biological effects and gene
   expression, we take into account the impact of biological effects on
   the clustering. Taking the in situ capturing-based technologies as an
   example, the spatial gene expression stores the spatial two-dimensional
   coordinates
   [MATH:
   <mrow><mo>(</mo><mrow><mi>x</mi><mo>,</mo><mi>y</mi></mrow><mo>)</mo></
   mrow> :MATH]
   of n spots, as well as a raw gene expression matrix
   [MATH: <mi mathvariant="bold">GE</mi><mo>∈</mo><msup><mrow><mi
   mathvariant="double-struck">R</mi></mrow><mrow><mi>n</mi><mo>×</mo><mi>
   d</mi></mrow></msup> :MATH]
   with n spots and d genes. The matrix stores the unique molecular
   identifier count (UMI; an n×d matrix of UMI counts) of each gene.
   [MATH: <msub><mrow><mi
   mathvariant="bold">GE</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>∈</m
   o><msup><mrow><mi
   mathvariant="double-struck">R</mi></mrow><mrow><mn>1</mn><mo>×</mo><mi>
   d</mi></mrow></msup> :MATH]
   is a raw gene expression vector for spot s[i], where
   [MATH:
   <mi>s</mi><mo>∈</mo><mrow><mo>(</mo><mrow><msub><mrow><mi>s</mi></mrow>
   <mrow><mn>1</mn></mrow></msub><mo>,</mo><mo>…</mo><mo>,</mo><msub><mrow
   ><mi>s</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>,</mo><mo>…</mo><mo
   >,</mo><msub><mrow><mi>s</mi></mrow><mrow><mi>j</mi></mrow></msub><mo>,
   </mo><mo>…</mo><mo>,</mo><msub><mrow><mi>s</mi></mrow><mrow><mi>n</mi><
   /mrow></msub></mrow><mo>)</mo></mrow> :MATH]
   . ResST constructs enhanced gene expression matrix of all spots by
   combining morphological information, gene expression, and spatial
   location.
    1. Calculation of morphological similarity matrix. To effectively
       utilize morphological information, we accurately evaluate the
       morphological similarity matrix by measuring the morphological
       distance between each spot and others. For the ST data with
       morphological information, ResST segments the histological images
       based on the spatial coordinates of each spot and employs a
       pre-trained convolutional neural network (optional; default
       ResNet^[206]52) to extract image features from the segmented
       subgraphs as the morphological feature vectors of each spot. This
       model can convert each spot image into a 2048-dimensional latent
       variable. To better represent the morphological features of each
       spot, we apply principal component analysis (PCA) to extract the
       first 50 principal components as latent characteristics and
       calculated the cumulative explained variance (more than 0.99999).
       Cosine distance is used to calculate the morphological similarity
       weight
       [MATH: <msub><mrow><mi
       mathvariant="bold">MW</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow><
       /msub> :MATH]
       between s[i] and s[j], as

   [MATH: <mi
   mathvariant="bold">MW</mi><mrow><mo>(</mo><mrow><msub><mrow><mi>s</mi><
   /mrow><mrow><mi>i</mi></mrow></msub><mo>,</mo><msub><mrow><mi>s</mi></m
   row><mrow><mi>j</mi></mrow></msub></mrow><mo>)</mo></mrow><mo>=</mo><ms
   ub><mrow><mi
   mathvariant="bold">MW</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msu
   b><mo>=</mo><mfrac><mrow><msub><mrow><mi
   mathvariant="bold">M</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>⋅</mo
   ><msub><mrow><mi
   mathvariant="bold">M</mi></mrow><mrow><mi>j</mi></mrow></msub></mrow><m
   row><msub><mrow><mo>∣</mo><mo>∣</mo><msub><mrow><mi
   mathvariant="bold">M</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>∣</mo
   ><mo>∣</mo></mrow><mrow><mn>2</mn></mrow></msub><msub><mrow><mo>∣</mo><
   mo>∣</mo><msub><mrow><mi
   mathvariant="bold">M</mi></mrow><mrow><mi>j</mi></mrow></msub><mo>∣</mo
   ><mo>∣</mo></mrow><mrow><mn>2</mn></mrow></msub></mrow></mfrac><mo>,</m
   o> :MATH]
       1
       where
       [MATH: <mi mathvariant="bold">MW</mi> :MATH]
       is a n×n morphological similarity matrix,
       [MATH: <msub><mrow><mi
       mathvariant="bold">MW</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow><
       /msub> :MATH]
       is the morphological similarity weight between s[i] and s[j],
       [MATH: <msub><mrow><mi
       mathvariant="bold">M</mi></mrow><mrow><mi>i</mi></mrow></msub>
       :MATH]
       and
       [MATH: <msub><mrow><mi
       mathvariant="bold">M</mi></mrow><mrow><mi>j</mi></mrow></msub>
       :MATH]
       represent the latent characteristics after performing PCA on
       morphological feature vectors for spots s[i] and s[j],
       respectively.
    2. Calculation of gene expression correlation matrix. The gene
       expression of a cell is influenced by its neighboring cells and
       intracellular or extracellular states, suggesting biological
       effects are nonnegligible for clustering. We explore biological
       effects by calculating the gene expression correlation
       [MATH: <msub><mrow><mi
       mathvariant="bold">GW</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow><
       /msub> :MATH]
       , using the raw gene expression vectors
       [MATH: <msub><mrow><mi mathvariant="bold">GE</mi></mrow><mrow><mi
       mathvariant="bold-italic">i</mi></mrow></msub> :MATH]
       and
       [MATH: <msub><mrow><mi mathvariant="bold">GE</mi></mrow><mrow><mi
       mathvariant="bold-italic">j</mi></mrow></msub> :MATH]
       from spots s[i] and s[j]. The characteristics of
       [MATH: <msub><mrow><mi
       mathvariant="bold">GW</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow><
       /msub> :MATH]
       are:

   [MATH: <mi
   mathvariant="bold">GW</mi><mrow><mo>(</mo><mrow><msub><mrow><mi>s</mi><
   /mrow><mrow><mi>i</mi></mrow></msub><mo>,</mo><msub><mrow><mi>s</mi></m
   row><mrow><mi>j</mi></mrow></msub></mrow><mo>)</mo></mrow><mo>=</mo><ms
   ub><mrow><mi mathvariant="bold">G</mi><mi
   mathvariant="bold">W</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msub
   ><mo>=</mo><mfrac><mrow><mfenced close=")"
   open="("><mrow><msub><mrow><mi
   mathvariant="bold">GE</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>−</m
   o><mover accent="true"><mrow><msub><mrow><mi
   mathvariant="bold">GE</mi></mrow><mrow><mi>i</mi></mrow></msub></mrow><
   mo accent="true">¯</mo></mover></mrow></mfenced><mo>⋅</mo><mfenced
   close=")" open="("><mrow><msub><mrow><mi
   mathvariant="bold">GE</mi></mrow><mrow><mi>j</mi></mrow></msub><mo>−</m
   o><mover accent="true"><mrow><msub><mrow><mi
   mathvariant="bold">GE</mi></mrow><mrow><mi>j</mi></mrow></msub></mrow><
   mo
   accent="true">¯</mo></mover></mrow></mfenced></mrow><mrow><msub><mrow><
   mo>∣</mo><mo>∣</mo><msub><mrow><mi
   mathvariant="bold">GE</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>−</m
   o><mover accent="true"><mrow><msub><mrow><mi
   mathvariant="bold">GE</mi></mrow><mrow><mi>i</mi></mrow></msub></mrow><
   mo
   accent="true">¯</mo></mover><mo>∣</mo><mo>∣</mo></mrow><mrow><mn>2</mn>
   </mrow></msub><msub><mrow><mo>∣</mo><mo>∣</mo><msub><mrow><mi
   mathvariant="bold">GE</mi></mrow><mrow><mi>j</mi></mrow></msub><mo>−</m
   o><mover accent="true"><mrow><msub><mrow><mi
   mathvariant="bold">GE</mi></mrow><mrow><mi>j</mi></mrow></msub></mrow><
   mo
   accent="true">¯</mo></mover><mo>∣</mo><mo>∣</mo></mrow><mrow><mn>2</mn>
   </mrow></msub></mrow></mfrac><mo>,</mo> :MATH]
       2
       where
       [MATH: <mi mathvariant="bold">G</mi><mi mathvariant="bold">W</mi>
       :MATH]
       is a n×n gene expression correlation matrix,
       [MATH: <msub><mrow><mi
       mathvariant="bold">GW</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow><
       /msub> :MATH]
       is the gene expression correlation between s[i] and s[j],
       [MATH: <msub><mrow><mi mathvariant="bold">G</mi><mi
       mathvariant="bold">E</mi></mrow><mrow><mi>i</mi></mrow></msub>
       :MATH]
       and
       [MATH: <msub><mrow><mi mathvariant="bold">G</mi><mi
       mathvariant="bold">E</mi></mrow><mrow><mi>j</mi></mrow></msub>
       :MATH]
       represent the raw gene expression vectors for spots s[i] and s[j].
    3. Calculation of spatial correlation matrix. We initially calculate
       the Euclidean distance between each spot and all other spots based
       on spatial coordinates and normalize the distances for neighboring
       spots to quantify the influence of spatial information on
       determining similar cellular states. For given spots s[i] and s[j],
       if the distance between the two spots is less than radius r, s[i]
       and s[j] are considered to be adjacent, and
       [MATH: <msub><mrow><mi
       mathvariant="bold">SW</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow><
       /msub><mo>=</mo><mn>1</mn> :MATH]
       , otherwise
       [MATH: <msub><mrow><mi
       mathvariant="bold">SW</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow><
       /msub><mo>=</mo><mn>0</mn> :MATH]
       . The characteristics of
       [MATH: <msub><mrow><mi mathvariant="bold">S</mi><mi
       mathvariant="bold">W</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></
       msub> :MATH]
       are:

   [MATH: <msub><mrow><mi mathvariant="bold">S</mi><mi
   mathvariant="bold">W</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msub
   ><mo>=</mo><mfenced open="{"><mrow><mtable><mtr><mtd
   columnalign="center"><mn>1</mn><mo>,</mo><mi
   mathvariant="normal">&</mi><msub><mrow><mi mathvariant="bold">S</mi><mi
   mathvariant="bold">W</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msub
   ><mspace width="0.25em"></mspace><mo><</mo><mspace
   width="0.25em"></mspace><mi>r</mi></mtd></mtr><mtr><mtd
   columnalign="center"><mn>0</mn><mo>,</mo><mi
   mathvariant="normal">&</mi><msub><mrow><mi mathvariant="bold">S</mi><mi
   mathvariant="bold">W</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msub
   ><mo>≥</mo><mi>r</mi></mtd></mtr></mtable></mrow></mfenced><mo>,</mo>
   :MATH]
       3
       where
       [MATH: <mi mathvariant="bold">S</mi><mi mathvariant="bold">W</mi>
       :MATH]
       is an n×n spatial correlation matrix and the value of
       [MATH: <msub><mrow><mi
       mathvariant="bold">SW</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow><
       /msub> :MATH]
       represents whether s[j] is a neighboring spot of s[i]. We
       calculated Euclidean distance of the coordinate correlations by
       establishing a mapping relationship of each sequencing spot’s
       position between the experimental platform and the histological
       image. The threshold r was obtained by multiplying an adjustment
       factor by Euclidean distance to determine whether spots have
       spatial correlations.
    4. The enhanced gene weight
       [MATH: <msub><mrow><mi mathvariant="bold">E</mi><mi
       mathvariant="bold">GW</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow><
       /msub> :MATH]
       between s[i] and s[j] is calculated as follows:

   [MATH: <mi
   mathvariant="bold">EGW</mi><mrow><mo>(</mo><mrow><msub><mrow><mi>s</mi>
   </mrow><mrow><mi>i</mi></mrow></msub><msub><mrow><mo>,</mo><mi>s</mi></
   mrow><mrow><mi>j</mi></mrow></msub></mrow><mo>)</mo></mrow><mo>=</mo><m
   sub><mrow><mi mathvariant="bold">EGW</mi></mrow><mrow><mspace
   width="0.3em"></mspace><mi>i</mi><mi>j</mi></mrow></msub><mo>=</mo><msu
   b><mrow><mi mathvariant="bold">SW</mi></mrow><mrow><mspace
   width="0.3em"></mspace><mi>i</mi><mi>j</mi></mrow></msub><mo>⋅</mo><msu
   b><mrow><mi mathvariant="bold">MW</mi></mrow><mrow><mspace
   width="0.3em"></mspace><mi>i</mi><mi>j</mi></mrow></msub><mo>⋅</mo><msu
   b><mrow><mi mathvariant="bold">GW</mi></mrow><mrow><mspace
   width="0.3em"></mspace><mi>i</mi><mi>j</mi></mrow></msub><mo>,</mo>
   :MATH]
   4

   if in 10× Visium, otherwise as:
   [MATH: <mi
   mathvariant="bold">EGW</mi><mrow><mo>(</mo><mrow><msub><mrow><mi>s</mi>
   </mrow><mrow><mi>i</mi></mrow></msub><msub><mrow><mo>,</mo><mi>s</mi></
   mrow><mrow><mi>j</mi></mrow></msub></mrow><mo>)</mo></mrow><mo>=</mo><m
   sub><mrow><mi mathvariant="bold">EGW</mi></mrow><mrow><mspace
   width="0.3em"></mspace><mi>i</mi><mi>j</mi></mrow></msub><mo>=</mo><msu
   b><mrow><mi mathvariant="bold">SW</mi></mrow><mrow><mspace
   width="0.3em"></mspace><mi>i</mi><mi>j</mi></mrow></msub><mo>⋅</mo><msu
   b><mrow><mi
   mathvariant="bold">GW</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msu
   b><mo>,</mo> :MATH]
   5

   where
   [MATH: <mi mathvariant="bold">EG</mi><mi mathvariant="bold">W</mi>
   :MATH]
   is an n×n enhanced gene weight matrix,
   [MATH: <mi mathvariant="bold">E</mi><msub><mrow><mi
   mathvariant="bold">GW</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msu
   b> :MATH]
   is the enhanced gene weight between s[i] and s[j].

   ResST then normalizes gene expression vector
   [MATH: <msubsup><mrow><mi
   mathvariant="bold">GE</mi></mrow><mrow><mi>i</mi></mrow><mrow><mo>′</mo
   ></mrow></msubsup> :MATH]
   of each spot s[i], as shown in Eq. ([207]6):
   [MATH: <msubsup><mrow><mi
   mathvariant="bold">GE</mi></mrow><mrow><mi>i</mi></mrow><mrow><mo>′</mo
   ></mrow></msubsup><mo>=</mo><msub><mrow><mi
   mathvariant="bold">GE</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>+</m
   o><mfrac><mrow><msubsup><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>j</mi><mo>=</mo><mn>1</mn></mrow>
   <mrow><mi>m</mi></mrow></msubsup><msub><mrow><msub><mrow><mi
   mathvariant="bold">GE</mi></mrow><mrow><mi>j</mi></mrow></msub><mo>⋅</m
   o><mi
   mathvariant="bold">EGW</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></ms
   ub></mrow><mrow><mi>m</mi></mrow></mfrac> :MATH]
   6

   where
   [MATH: <msubsup><mrow><mi
   mathvariant="bold">GE</mi></mrow><mrow><mi>i</mi></mrow><mrow><mo>′</mo
   ></mrow></msubsup> :MATH]
   represents normalized gene expression vector for a center spot s[i],
   [MATH: <msub><mrow><mi
   mathvariant="bold">GE</mi></mrow><mrow><mi>i</mi></mrow></msub> :MATH]
   and
   [MATH: <msub><mrow><mi
   mathvariant="bold">GE</mi></mrow><mrow><mi>j</mi></mrow></msub> :MATH]
   are the raw gene expression vectors for spot s[i] and m adjacent spots
   s[j].

Graph construction for ST data

   To make full use of spatial information, we first construct a cell-cell
   spatial relationship graph G = (V, E) by calculating the distances
   between spots based on spatial location and selecting the k nearest
   spots as the current spot’s neighbors. In graph G, V represents the set
   of spots and E is the set of connected edges between spots. That is, A
   is defined as the adjacency matrix of graph G, and if spot
   [MATH:
   <msub><mrow><mi>s</mi></mrow><mrow><mi>j</mi></mrow></msub><mo>∈</mo><m
   i>V</mi> :MATH]
   is the neighbor of spot
   [MATH:
   <msub><mrow><mi>s</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>∈</mo><m
   i>V</mi> :MATH]
   ,
   [MATH: <msub><mrow><mi
   mathvariant="bold">A</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msub
   ><mo>=</mo><msub><mrow><mi
   mathvariant="bold">A</mi></mrow><mrow><mi>j</mi><mi>i</mi></mrow></msub
   ><mo>=</mo><mn>1</mn> :MATH]
   , otherwise
   [MATH: <msub><mrow><mi
   mathvariant="bold">A</mi></mrow><mrow><mi>i</mi><mi>j</mi></mrow></msub
   ><mo>=</mo><msub><mrow><mi
   mathvariant="bold">A</mi></mrow><mrow><mi>j</mi><mi>i</mi></mrow></msub
   ><mo>=</mo><mn>0</mn> :MATH]
   . Additionally, we add a self-loop to this matrix. For a given spot,
   its neighbors are determined by calculating the distance between spots
   using spatial locations (optional; default is Euclidean), and we select
   k spots as its neighbors from the top nearest neighbors. In our tests,
   ResST achieves its best performance in most of the tested datasets with
   k = 12 (Supplementary Data [208]5). Finally, we perform normalization
   preprocessing on the adjacency matrix, as shown in Eq. ([209]7):
   [MATH: <msup><mrow><mi
   mathvariant="bold">A</mi></mrow><mrow><mo>′</mo></mrow></msup><mo>=</mo
   ><msup><mrow><mi
   mathvariant="bold">D</mi></mrow><mrow><mo>−</mo><mn>1</mn><mo>/</mo><mn
   >2</mn></mrow></msup><mo>⋅</mo><mi
   mathvariant="bold">A</mi><mo>⋅</mo><msup><mrow><mi
   mathvariant="bold">D</mi></mrow><mrow><mo>−</mo><mn>1</mn><mo>/</mo><mn
   >2</mn></mrow></msup><mo>,</mo> :MATH]
   7

   where
   [MATH: <msup><mrow><mi
   mathvariant="bold">A</mi></mrow><mrow><mo>′</mo></mrow></msup> :MATH]
   is the symmetrically normalized adjacency matrix and
   [MATH: <mi mathvariant="bold">D</mi> :MATH]
   is the degree of adjacency matrix
   [MATH: <mi mathvariant="bold">A</mi> :MATH]
   .

Graph self-supervised residual learning

   We propose a graph self-supervised residual learning model based on a
   graph neural network to learn the latent embedding from the normalized
   gene expression
   [MATH: <msup><mrow><mi
   mathvariant="bold">GE</mi></mrow><mrow><mo>′</mo></mrow></msup> :MATH]
   and the neighborhood graph G. This process comprises three major steps:
   (1) data preprocessing, (2) Graph autoencoder for spatial embedding,
   and (3) Self-supervised residual learning for latent embedding.
   Subsequently, the latent embedding is used for spatial clustering and
   multiple ST data integration.

Data preprocessing

   In the aforementioned steps, ResST calculated normalized gene
   expression using the complete raw gene expression counts. Before model
   training, the normalized gene expression matrix
   [MATH: <msup><mrow><mi
   mathvariant="bold">GE</mi></mrow><mrow><mo>′</mo></mrow></msup> :MATH]
   is initially logarithmically transformed and normalized by library size
   and scaled to unit variance and zero mean using the Scanpy package. In
   this paper, all 10× Visium datasets are subjected to PCA to extract the
   top 200 principal components (optional) for generating gene expression
   vectors X as input for model training. For datasets obtained from other
   platforms, the number of PCA components for datasets with fewer
   measured genes is equal to the number of genes, such as 4i, MOBI-TOF,
   MERFISH, seqFISH, and stereoseq. For Slide-seqV2 data, which has a
   higher number of measured genes, the top 200 principal components were
   selected.

   To enhance the model’s robustness, we randomly corrupt the gene
   expression matrix using a noise function while preserving the original
   graph’s topological structure unchanged.

Graph autoencoder for spatial embedding

   We devise a graph autoencoder that imbues spatial information from the
   cell-cell spatial relationship graph into gene expression vectors
   through iterative model training. Graph autoencoder aggregates the
   spatial information of neighboring cells or spots by improving the
   variational graph autoencoder. Spatial embedding (the residual) is
   computed by posterior probabilities and is incorporated into the
   learning of latent embedding, as shown in the expression below:
   [MATH: <mi>q</mi><mfenced close=")" open="("><mrow><msub><mrow><mi
   mathvariant="bold">Z</mi></mrow><mrow><mi
   mathvariant="bold">g</mi></mrow></msub><mo>∣</mo><mi
   mathvariant="bold">A</mi><mo>,</mo><msub><mrow><mi
   mathvariant="bold">Z</mi></mrow><mrow><mi
   mathvariant="bold">e</mi></mrow></msub></mrow></mfenced><mo>=</mo><msub
   sup><mrow><mo
   mathsize="big">∏</mo></mrow><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow>
   <mrow><mi>n</mi></mrow></msubsup><mi>q</mi><mfenced close=")"
   open="("><mrow><mfenced close="∣"><mrow><msub><mrow><mi
   mathvariant="bold">z</mi></mrow><mrow><mi>i</mi></mrow></msub></mrow></
   mfenced><mi mathvariant="bold">A</mi><mo>,</mo><msub><mrow><mi
   mathvariant="bold">Z</mi></mrow><mrow><mi
   mathvariant="bold">e</mi></mrow></msub></mrow></mfenced><mo>,</mo>
   :MATH]
   8

   the solution for the posterior probability is shown in the equation
   below:
   [MATH: <mi>q</mi><mfenced close=")" open="("><mrow><mfenced
   close="∣"><mrow><msub><mrow><mi
   mathvariant="bold-italic">z</mi></mrow><mrow><mi>i</mi></mrow></msub></
   mrow></mfenced><mi mathvariant="bold">A</mi><mo>,</mo><msub><mrow><mi
   mathvariant="bold">Z</mi></mrow><mrow><mi
   mathvariant="bold">e</mi></mrow></msub></mrow></mfenced><mo>=</mo><mi
   mathvariant="script">N</mi><mfenced close=")" open="("><mrow><mfenced
   close="∣"><mrow><msub><mrow><mi
   mathvariant="bold">z</mi></mrow><mrow><mi>i</mi></mrow></msub></mrow></
   mfenced><msub><mrow><mi>μ</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>
   ,</mo><mi>d</mi><mi>i</mi><mi>a</mi><mi>g</mi><mfenced close=")"
   open="("><mrow><msubsup><mrow><mi>σ</mi></mrow><mrow><mi>i</mi></mrow><
   mrow><mn>2</mn></mrow></msubsup></mrow></mfenced></mrow></mfenced><mo>,
   </mo> :MATH]
   9

   where n is the number of cells or spots,
   [MATH: <mi mathvariant="bold">A</mi> :MATH]
   is the adjacency matrix,
   [MATH: <mi mathvariant="script">N</mi> :MATH]
   is the normal distribution,
   [MATH: <msub><mrow><mi
   mathvariant="bold">z</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>=</mo
   ><msub><mrow><mi>G</mi><mi>N</mi><mi>N</mi></mrow><mrow><mi>μ</mi></mro
   w></msub><mrow><mo>(</mo><mrow><mi
   mathvariant="bold">A</mi><mo>,</mo><msub><mrow><mi
   mathvariant="bold">Z</mi></mrow><mrow><mi
   mathvariant="bold">e</mi></mrow></msub></mrow><mo>)</mo></mrow> :MATH]
   is the means of adjacency matrix,
   [MATH: <msub><mrow><mi mathvariant="bold">Z</mi></mrow><mrow><mi
   mathvariant="bold">e</mi></mrow></msub> :MATH]
   and
   [MATH: <msub><mrow><mi mathvariant="bold">Z</mi></mrow><mrow><mi
   mathvariant="bold">g</mi></mrow></msub> :MATH]
   are introduced in the section of self-supervised residual learning for
   latent embedding,
   [MATH:
   <mi>log</mi><mi>σ</mi><mo>=</mo><msub><mrow><mi>G</mi><mi>N</mi><mi>N</
   mi></mrow><mrow><mi>σ</mi></mrow></msub><mrow><mo>(</mo><mrow><mi
   mathvariant="bold">A</mi><mo>,</mo><msub><mrow><mi
   mathvariant="bold">Z</mi></mrow><mrow><mi
   mathvariant="bold">e</mi></mrow></msub></mrow><mo>)</mo></mrow> :MATH]
   is the variance between the node vectors. The two-layer GNN is defined
   as:
   [MATH: <mi>G</mi><mi>N</mi><mi>N</mi><mfenced close=")"
   open="("><mrow><msub><mrow><mi
   mathvariant="bold">Z</mi></mrow><mrow><mi
   mathvariant="bold">e</mi></mrow></msub><mo>,</mo><mi
   mathvariant="bold">A</mi></mrow></mfenced><mo>=</mo><msup><mrow><mi
   mathvariant="bold">A</mi></mrow><mrow><mo>′</mo></mrow></msup><mi>R</mi
   ><mi>e</mi><mi>L</mi><mi>U</mi><mfenced close=")"
   open="("><mrow><msup><mrow><mi
   mathvariant="bold">A</mi></mrow><mrow><mo>′</mo></mrow></msup><msub><mr
   ow><mi mathvariant="bold">Z</mi></mrow><mrow><mi
   mathvariant="bold">e</mi></mrow></msub><msub><mrow><mi
   mathvariant="bold">W</mi></mrow><mrow><mi
   mathvariant="bold">o</mi></mrow></msub></mrow></mfenced><msub><mrow><mi
   mathvariant="bold">W</mi></mrow><mrow><mi
   mathvariant="bold">1</mi></mrow></msub><mo>,</mo> :MATH]
   10

   where
   [MATH:
   <msub><mrow><mi>G</mi><mi>N</mi><mi>N</mi></mrow><mrow><mi>μ</mi></mrow
   ></msub><mrow><mo>(</mo><mrow><msub><mrow><mi
   mathvariant="bold">Z</mi></mrow><mrow><mi
   mathvariant="bold">e</mi></mrow></msub><mo>,</mo><mi
   mathvariant="bold">A</mi></mrow><mo>)</mo></mrow> :MATH]
   and
   [MATH:
   <msub><mrow><mi>G</mi><mi>N</mi><mi>N</mi></mrow><mrow><mi>σ</mi></mrow
   ></msub><mrow><mo>(</mo><mrow><msub><mrow><mi
   mathvariant="bold">Z</mi></mrow><mrow><mi
   mathvariant="bold">e</mi></mrow></msub><mo>,</mo><mi
   mathvariant="bold">A</mi></mrow><mo>)</mo></mrow> :MATH]
   share the same weight
   [MATH: <msub><mrow><mi mathvariant="bold">W</mi></mrow><mrow><mi
   mathvariant="bold-italic">0</mi></mrow></msub> :MATH]
   , but
   [MATH: <msub><mrow><mi mathvariant="bold">W</mi></mrow><mrow><mi
   mathvariant="bold-italic">1</mi></mrow></msub> :MATH]
   is different.
   [MATH: <msup><mrow><mi
   mathvariant="bold">A</mi></mrow><mrow><mo>′</mo></mrow></msup> :MATH]
   is the symmetrically normalized adjacency matrix, as shown in Eq.
   ([210]7).

Self-supervised residual learning for latent embedding

   We propose a deep autoencoder, a graph self-supervised residual
   learning model to aggregate gene expression, biological effects,
   spatial location, and morphological information for capturing nonlinear
   relationships between a spot and surrounding spots. The deep
   autoencoder consists of an encoder, a graph autoencoder, and a decoder.
   As shown in Fig. [211]1c, the deep autoencoder learns low-dimensional
   gene representation by encoder. Next, low-dimensional gene
   representation is mapped to spatial embedding (residual) by the graph
   autoencoder. Finally, low-dimensional gene representation from the
   encoder is mapped through identity shortcuts and concatenates with the
   residual for composing latent embedding. Identity shortcuts are never
   closed, and low-dimensional gene representation is always passed
   through, with additional residual functions to be learned.

   We feed the gene expression vector X into the encoder E after data
   preprocessing. The encoder
   [MATH: <mi>E</mi> :MATH]
   is composed of multiple fully connected layers, which can denoise the
   input data and convert the input
   [MATH: <mi mathvariant="bold">X</mi><mo>∈</mo><msup><mrow><mi
   mathvariant="double-struck">R</mi></mrow><mrow><mi>n</mi><mo>×</mo><mn>
   200</mn></mrow></msup> :MATH]
   into a low-dimensional gene representation
   [MATH: <msub><mrow><mi mathvariant="bold">Z</mi></mrow><mrow><mi
   mathvariant="bold">e</mi></mrow></msub><mo>∈</mo><msup><mrow><mi
   mathvariant="double-struck">R</mi></mrow><mrow><mi>n</mi><mo>×</mo><msu
   b><mrow><mi>H</mi></mrow><mrow><mi>e</mi></mrow></msub></mrow></msup>
   :MATH]
   , where
   [MATH: <msub><mrow><mi>H</mi></mrow><mrow><mi>e</mi></mrow></msub>
   :MATH]
    = 20 is the dimension of the last encoder layer. The number and
   dimension of the fully connected layers can be defined by the user and
   can be expressed as:
   [MATH: <mi>E</mi><mfenced close=")" open="("><mrow><mi
   mathvariant="bold">X</mi></mrow></mfenced><mo>=</mo><msub><mrow><mi
   mathvariant="bold">Z</mi></mrow><mrow><mi
   mathvariant="bold">e</mi></mrow></msub><mo>.</mo> :MATH]
   11

   Additionally, the low-dimensional gene representation and the cell-cell
   spatial relationship graph G are spatially embedded by two graph neural
   network (GNN) layers, resulting in a spatial embedding
   [MATH: <msub><mrow><mi mathvariant="bold">Z</mi></mrow><mrow><mi
   mathvariant="bold">g</mi></mrow></msub><mo>∈</mo><msup><mrow><mi
   mathvariant="double-struck">R</mi></mrow><mrow><mi>n</mi><mo>×</mo><msu
   b><mrow><mi>H</mi></mrow><mrow><mi>g</mi></mrow></msub></mrow></msup>
   :MATH]
   , which can be expressed as:
   [MATH: <mi>G</mi><mi>N</mi><mi>N</mi><mfenced close=")"
   open="("><mrow><msub><mrow><mi
   mathvariant="bold">Z</mi></mrow><mrow><mi
   mathvariant="bold">e</mi></mrow></msub></mrow></mfenced><mo>=</mo><msub
   ><mrow><mi mathvariant="bold">Z</mi></mrow><mrow><mi
   mathvariant="bold">g</mi></mrow></msub><mo>,</mo> :MATH]
   12

   where
   [MATH: <msub><mrow><mi>H</mi></mrow><mrow><mi>g</mi></mrow></msub>
   :MATH]
    = 8 is the dimension of the last GNN layer.

   Finally, the decoder
   [MATH: <mi>D</mi> :MATH]
   is composed of multiple fully connected layers that decode the latent
   embedding
   [MATH: <mi mathvariant="bold">Z</mi><mo>∈</mo><msup><mrow><mi
   mathvariant="double-struck">R</mi></mrow><mrow><mi>n</mi><mo>×</mo><mi>
   H</mi></mrow></msup> :MATH]
   and reconstruct the gene expression matrix
   [MATH: <msup><mrow><mi
   mathvariant="bold">X</mi></mrow><mrow><mo>′</mo></mrow></msup><mo>∈</mo
   ><msup><mrow><mi
   mathvariant="double-struck">R</mi></mrow><mrow><mi>n</mi><mo>×</mo><mn>
   200</mn></mrow></msup> :MATH]
   . The latent embedding is composed of the low-dimensional gene
   representation and spatial embedding with “shortcut connections”. This
   residual learning structure can facilitate the learning of more
   nonlinear information from gene expression, biological effects, spatial
   information, and morphological information, and it can be expressed as
   follows:
   [MATH: <mi>D</mi><mfenced close=")" open="("><mrow><mi
   mathvariant="bold">Z</mi></mrow></mfenced><mo>=</mo><msup><mrow><mi
   mathvariant="bold">X</mi></mrow><mrow><mo>′</mo></mrow></msup><mo>,</mo
   > :MATH]
   13
   [MATH:
   <mi>H</mi><mo>=</mo><msub><mrow><mi>H</mi></mrow><mrow><mi>e</mi></mrow
   ></msub><mo>+</mo><msub><mrow><mi>H</mi></mrow><mrow><mi>g</mi></mrow><
   /msub><mo>,</mo> :MATH]
   14

   where n is the number of cells or spots.

   The loss function L includes the reconstruction loss between the
   reconstructed gene expression matrix and the original gene expression
   matrix, and the Kullback–Leibler divergence of the node representation
   vector distribution and the normal distribution, as shown below:
   [MATH:
   <mi>L</mi><mo>=</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mi>n</mi></mro
   w></mfrac><msubsup><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow>
   <mrow><mi>n</mi></mrow></msubsup><msup><mrow><mo>∣</mo><mo>∣</mo><msub>
   <mrow><mi
   mathvariant="bold">X</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>−</mo
   ><msub><mrow><msup><mrow><mi
   mathvariant="bold">X</mi></mrow><mrow><mo>′</mo></mrow></msup></mrow><m
   row><mi>i</mi></mrow></msub><mo>∣</mo><mo>∣</mo></mrow><mrow><mn>2</mn>
   </mrow></msup><mo>+</mo><mi>K</mi><mi>L</mi><mrow><mo>[</mo><mrow><mi>q
   </mi><mrow><mo>(</mo><mrow><msub><mrow><mi
   mathvariant="bold">Z</mi></mrow><mrow><mi>g</mi></mrow></msub><mo>∣</mo
   ><mi mathvariant="bold">A</mi><mo>,</mo><msub><mrow><mi
   mathvariant="bold">Z</mi></mrow><mrow><mi
   mathvariant="bold">e</mi></mrow></msub></mrow><mo>)</mo></mrow><mo>∣</m
   o><mo>∣</mo><mi>p</mi><mrow><mo>(</mo><mrow><msub><mrow><mi
   mathvariant="bold">Z</mi></mrow><mrow><mi>g</mi></mrow></msub></mrow><m
   o>)</mo></mrow></mrow><mo>]</mo></mrow><mo>,</mo> :MATH]
   15

   where n,
   [MATH: <msub><mrow><mi
   mathvariant="bold">X</mi></mrow><mrow><mi>i</mi></mrow></msub> :MATH]
   and
   [MATH: <msub><mrow><msup><mrow><mi
   mathvariant="bold">X</mi></mrow><mrow><mo>′</mo></mrow></msup></mrow><m
   row><mi>i</mi></mrow></msub> :MATH]
   are the same as mentioned above, q is posterior probability the same as
   Eq. ([212]8),
   [MATH: <mi>p</mi><mrow><mo>(</mo><mrow><msub><mrow><mi
   mathvariant="bold">Z</mi></mrow><mrow><mi>g</mi></mrow></msub></mrow><m
   o>)</mo></mrow><mo>=</mo><msub><mrow><mo>∏</mo></mrow><mrow><mi>i</mi><
   /mrow></msub><mi
   mathvariant="script">N</mi><mrow><mo>(</mo><mrow><msub><mrow><mi
   mathvariant="bold">z</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>∣</mo
   ><mn>0</mn><mo>,</mo><mi>I</mi></mrow><mo>)</mo></mrow> :MATH]
   .

Training strategies

   The training of ResST model involves several key strategies. Initially,
   ST data are processed to produce the input matrix X and adjacency
   matrix A for the model.

   Firstly, we initialize ResST and configure the Adam optimizer along
   with specific hyperparameters, such as learning rate and weight decay.
   In this study, the Adam optimizer’s learning rate is 0.0005, and weight
   decay is 0.0004. During the pre-training phase, we employ a noise
   function to augment the input data with a masking ratio of 0.001.
   Concurrently, we calculate the loss function, which comprises the
   reconstruction error and KL divergence. Following the pre-training
   phase, the Kmeans algorithm is utilized to perform initial clustering
   on the pre-trained latent embeddings. The model then progresses to the
   formal training phase. In this phase, we periodically recalculate the
   target distribution and update the clustering labels, while optimize
   the model parameters by minimizing the loss function. The loss function
   now includes reconstruction error, KL divergence, and adversarial loss.
   Within the optimize_cluster function, the CH index is used to assess
   clustering performance, and the resolution with the highest CH score is
   selected as the optimal resolution. During training, KL divergence is
   employed to measure the difference between the predicted distribution
   and the target distribution for preventing overfitting.

   Throughout the training, we monitor the total loss (including
   reconstruction error, KL divergence, and adversarial loss) and stop
   training when the total loss reaches a preset tolerance (0.001). Other
   hyperparameters include KL divergence weight (0.1), mean squared error
   weight (10), and binary cross-entropy weight (1) to balance different
   loss terms in the total loss.

Batch effects correction for spatial transcriptomics

   Integrated analysis of multiple tissue sections can provide deeper
   insights into a comprehensive understanding of biological functions.
   Integration of ST datasets is typically categorized into two scenarios:
   vertical integration and horizontal integration. The major challenges
   for integrated analyses are the presence of batch effects among
   different sections and accurately assigning spots to the correct
   spatial domains. To overcome these challenges, we extend ResST model,
   which applies MDD to correct batch effects and align gene expression
   from different batches into the same feature space. Compared to other
   methods, ResST efficiently conducts vertical or horizontal integration
   from multiple tissue sections while correcting batch effects and
   preserving layer-specific gene expression variations.

   In ST data from different batches, there are no direct spatial
   relationships among spots from different sections. For vertical
   integration, we align the spatial locations from ST data based on H&E
   images, resulting in adjacent space among different sections.
   Subsequently, we construct the cell-cell spatial relationship graph
   using the aligned spatial coordinates. For horizontal integration, we
   use one of the sections as a reference and transform the spatial
   coordinates of the other sections to align the domains straddling the
   jointing edges. The model uses A^m to represent the adjacency matrix of
   the mth batch and X^m to represent the gene expression matrix of the
   mth batch. This study creates a block diagonal matrix
   [MATH: <mi mathvariant="bold">A</mi> :MATH]
   , and concatenates the gene expression matrix
   [MATH: <mi mathvariant="bold">X</mi> :MATH]
   with it, as shown below:
   [MATH: <mi mathvariant="bold">A</mi><mo>=</mo><mfenced close="]"
   open="["><mrow><mtable><mtr><mtd columnalign="center"><msup><mrow><mi
   mathvariant="bold">A</mi></mrow><mrow><mn>1</mn></mrow></msup></mtd><mt
   d columnalign="center"><mo>⋯</mo></mtd><mtd
   columnalign="center"><mn>0</mn></mtd></mtr><mtr><mtd
   columnalign="center"><mo>⋮</mo></mtd><mtd
   columnalign="center"><mo>⋱</mo></mtd><mtd
   columnalign="center"><mo>⋮</mo></mtd></mtr><mtr><mtd
   columnalign="center"><mn>0</mn></mtd><mtd
   columnalign="center"><mo>⋯</mo></mtd><mtd
   columnalign="center"><msup><mrow><mi
   mathvariant="bold">A</mi></mrow><mrow><mi>m</mi></mrow></msup></mtd></m
   tr></mtable></mrow></mfenced><mo>,</mo><mi
   mathvariant="bold">X</mi><mo>=</mo><mfenced close="]"
   open="["><mrow><mtable><mtr><mtd columnalign="center"><msup><mrow><mi
   mathvariant="bold">X</mi></mrow><mrow><mn>1</mn></mrow></msup></mtd></m
   tr><mtr><mtd columnalign="center"><mo>⋮</mo></mtd></mtr><mtr><mtd
   columnalign="center"><msup><mrow><mi
   mathvariant="bold">X</mi></mrow><mrow><mi>m</mi></mrow></msup></mtd></m
   tr></mtable></mrow></mfenced><mo>,</mo> :MATH]
   16

   where m is the number of ST batches.

   Next, we apply MDD to reduce batch effects and map the gene expression
   from different batches to the same feature space, resulting in maximal
   spatial proximity of different batches. MDD consists of a feature
   extractor, domain classifier, and label classifier. The feature
   extractor and label classifier share the encoder and decoder parts of
   the deep autoencoder, which are composed of joint linear layers and
   graphical neural networks. The domain classifier is composed of joint
   linear layers, with a user-defined number of hidden layers (default is
   2). A gradient reversal layer (GRL) is added before the domain
   classifier to blur the boundary between the source and target domain.
   We train the feature extractor and extract shared features from the
   source and target domain using Eq. ([213]15). The domain classifier
   further maps the latent embeddings Z outputted by the feature extractor
   to a new D-dimensional representation:
   [MATH: <mi mathvariant="bold">Z</mi><mo>→</mo><msup><mrow><mi
   mathvariant="double-struck">R</mi></mrow><mrow><mi>D</mi></mrow></msup>
   :MATH]
   . We train the domain classifier using binary cross-entropy loss
   function to maximize the adversarial loss, thus increasing the
   dissimilarity between the source and target domain, as shown in the
   equation below:
   [MATH:
   <mi>L</mi><mo>=</mo><mo>−</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mi>n
   </mi></mrow></mfrac><msubsup><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow>
   <mrow><mi>n</mi></mrow></msubsup><msub><mrow><mi>y</mi></mrow><mrow><mi
   >i</mi></mrow></msub><mo>⋅</mo><mi>log</mi><mfenced close=")"
   open="("><mrow><mi>p</mi><mfenced close=")"
   open="("><mrow><msub><mrow><mi>y</mi></mrow><mrow><mi>i</mi></mrow></ms
   ub></mrow></mfenced></mrow></mfenced><mo>+</mo><mfenced close=")"
   open="("><mrow><mn>1</mn><mo>−</mo><msub><mrow><mi>y</mi></mrow><mrow><
   mi>i</mi></mrow></msub></mrow></mfenced><mo>⋅</mo><mi>log</mi><mfenced
   close=")" open="("><mrow><mn>1</mn><mo>−</mo><mi>p</mi><mfenced
   close=")"
   open="("><mrow><msub><mrow><mi>y</mi></mrow><mrow><mi>i</mi></mrow></ms
   ub></mrow></mfenced></mrow></mfenced><mo>,</mo> :MATH]
   17

   where
   [MATH: <msub><mrow><mi>y</mi></mrow><mrow><mi>i</mi></mrow></msub>
   :MATH]
   is the sample
   [MATH: <mi>i</mi> :MATH]
   , if the prediction label is equal to the true label,
   [MATH:
   <msub><mrow><mi>y</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>=</mo><m
   n>1</mn> :MATH]
   , otherwise
   [MATH:
   <msub><mrow><mi>y</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>=</mo><m
   n>0</mn> :MATH]
   ;
   [MATH: <mi>p</mi><mfenced close=")"
   open="("><mrow><msub><mrow><mi>y</mi></mrow><mrow><mi>i</mi></mrow></ms
   ub></mrow></mfenced> :MATH]
   represents the probability that the observed sample
   [MATH: <mi>i</mi> :MATH]
   predicts correctly.

Spatial domain identification and visualization by clustering

   After model training, the Leiden algorithm in the Scanpy^[214]54
   package is used to identify the spots into different spatial domains
   based on latent embeddings obtained by ResST. ResST achieves the
   optimal resolution in two ways. If the number of spatial domains is
   known, a grid search is conducted over resolutions ranging from 0.1 to
   2.5 with a step size of 0.001. If the number of spatial domains is
   unknown, the optimal resolution is determined by conducting a grid
   search over resolutions ranging from 0.1 to 2.5 with a step size of
   0.001 and simultaneously calculating the CH score. The resolution with
   the highest CH score is selected as the optimal resolution. The Leiden
   clustering algorithm is used to form clusters based on optimal
   resolution, and each cell or spot is assigned a label. Subsequently,
   the clustering results are visualized on tissue sections using the
   Scanpy package.

   Although the latent embedding is obtained using gene expression,
   spatial locations, complex biological effects, and morphological
   information, some spots may be erroneously assigned to different
   spatial domains when utilizing latent embedding for clustering. We
   consider these occurrences as noise and acknowledge their potential
   impact on downstream biological analysis. To address this, we reassign
   incorrectly allocated spots to the most frequently adjacent region.

   During vertical integration, we perform UMAP analysis using the RunUMAP
   function from Seurat, with default parameters of min_dist set to 0.5
   and n_neighbors set to 15. The feature space derived from ResST,
   Harmony, and Scanorama integrate multiple ST data sections and
   extracted features for UMAP analysis. To correct the UMAP results, we
   employ scDEED^[215]55 tool to detect dubious cells and optimize the
   UMAP analysis hyperparameters. We run scDEED with candidate n_neighbors
   values of 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 80, 160, and 240, and
   min_dist values of 0.0125, 0.05, 0.1, 0.3, 0.5, 0.7, and 0.9. After
   removing the dubious cells, we visualize the UMAP results.

Identification of spatially variable genes

   The Wilcoxon rank-sum test in Scanpy package is used to confirm highly
   variable genes for each spatial domain as well as differentially
   expressed genes between spatial domains. We then use rule^[216]15 in
   SpaGCN package to filter the differentially expressed genes: (1) the
   percentage of spots expressing the gene in the target domain, that is,
   in-fraction, is >80%; (2) for each neighboring domain, the ratio of the
   percentages of spots expressing the gene in the target domain and the
   neighboring domain(s), that is, in/out fraction ratio, is >1; and (3)
   the expression fold change between the target and neighboring domain(s)
   is >1.5. Genes with log[2]FC ≥ 2 are selected as input, and the gene
   enrichment analysis is performed using the online tool
   Metascape^[217]56.

Determination of ligand-receptor interactions between cells

   For human breast cancer samples, we analyze the ligand-receptor
   interactions by stLearn package. For each cell or spot, the expression
   levels of their receptors and ligands are calculated, and a
   receptor-ligand expression matrix is formed. Then, we calculate the
   expression scores of ligands or receptors in each cell or spot and
   display regions with scores less than 7000.

Statistics & reproducibility

   In our case analysis, we utilized the scipy package, a toolkit for
   scientific computing, to compare paired samples or two independent
   samples within the study. To evaluate the stability and reproducibility
   of ResST, we conducted 15 tests on different ST datasets, utilizing the
   default parameters. Subsequently, we calculated the variance of several
   key metrics to assess the performance of ResST, including the ARI, FMI,
   SC, DB, and CH.

Evaluation metrics

   In this study, we utilized several well-established metrics to evaluate
   the clustering results, providing a holistic view of the algorithm’s
   performance. The ARI was used to measure the similarity between the
   clustering outcome and the ground truth, with a focus on correcting for
   the chance grouping. The FMI, known for its effectiveness in scenarios
   with a large number of clusters, gauged the clustering quality based on
   precision and recall. To assess the cohesion and separation of the
   clusters, we employed the SC, which offers insights into how well each
   object lies within its cluster. Additionally, the DB was utilized to
   evaluate the clustering compactness and separation, where a lower value
   indicates better clustering. Finally, the CH was used to measure the
   variance ratio between clusters, with higher values signifying more
   distinct clustering. This combination of metrics provided a robust
   framework for evaluating and validating our clustering approach.

Comparison with other methods

   To evaluate the clustering performance of ResST in identifying spatial
   domains, we compared it with Kmeans, Seurat, SpaGCN, stLearn, SEDR,
   CCST, and BayesSpace. (1) Kmeans and Seurat implemented in the R
   package Seurat. (2) SpaGCN implemented in the Python package SpaGCN.
   (3) stLearn implemented in the Python package stLearn. (4) SEDR
   implemented in the Python package SEDR. (5) CCST implemented in the
   Python package CCST. (6) BayesSpace implemented in the Python package
   BayesSpace. The default parameters were used for all methods.

   To evaluate the data integration performance of ResST, we compared it
   with Harmony and Scanorama which are implemented in the Python package
   Harmony and Scanorama.

Reporting summary

   Further information on research design is available in the [218]Nature
   Portfolio Reporting Summary linked to this article.

Supplementary information

   [219]Peer Review File^ (4.4MB, pdf)
   [220]Supplementary Information^ (11.1MB, pdf)
   [221]42003_2024_6814_MOESM3_ESM.pdf^ (40.9KB, pdf)

   Description of Additional Supplementary Files
   [222]Supplementary Data 1^ (37.5KB, xlsx)
   [223]Supplementary Data 2^ (12.9KB, xlsx)
   [224]Supplementary Data 3^ (123.3KB, xlsx)
   [225]Supplementary Data 4^ (58.4KB, xlsx)
   [226]Supplementary Data 5^ (11.8KB, xlsx)
   [227]Reporting Summary^ (1.6MB, pdf)

Acknowledgements