Abstract Spatial transcriptomics (ST) technologies allow for comprehensive characterization of gene expression patterns in the context of tissue microenvironment. However, accurately identifying domains with spatial coherence in both gene expression and histology in situ and effectively integrating data from multi-sample remains challenging. Here, we propose ResST, a graph self-supervised residual learning model based on graph neural network and Margin Disparity Discrepancy (MDD) theory. ResST aggregates gene expression, biological effects, spatial location, and morphological information to capture nonlinear relationships between a cell and surrounding cells for spatial domain identification. Also, ResST integrates multiple ST datasets and aligns latent embeddings based on MDD theory for correcting batch effects. Results show that ResST identifies continuous spatial domains at a finer scale in ten ST datasets acquired with different technologies. Moreover, ResST efficiently integrated data from multiple tissue sections vertically or horizontally while correcting batch effects. Overall, ResST demonstrates exceptional performance in analyzing ST datasets. Subject terms: Computational models, Machine learning __________________________________________________________________ A graph self-supervised residual learning framework (ResST) for natural domain identification and data integration of spatial transcriptomics. Introduction The complex structures and functions of multicellularity require the cooperation of thousands to billions of cells. The gene expressions of a cell are influenced by the surrounding cell and tissue microenvironment. Knowledge of the relative locations of different cells in a tissue is critical for understanding tissue functionality and pathological changes. Spatial transcriptomics (ST) technologies provide quantitative gene expression profiling with spatial information in tissues and have brought unprecedented opportunities for insights into the molecular organization of tissues. ST technologies can be mainly classified into two categories. The first category is in situ hybridization or sequencing-based technologies, such as seqFISH^[38]1,[39]2, seqFISH+^[40]3, MERFISH^[41]4,[42]5, etc, which measure the expression levels of hundreds to thousands of genes in cells within tissue microenvironments. This type of method can achieve gene expression quantification with high spatial resolution at the single-cell resolution level. The second category is based on in situ capturing-based technologies, including ST^[43]6, SLIDE-seq^[44]7, SLIDE-seqV2^[45]8, Stereoseq^[46]9, 10× Visium^[47]10,[48]11, etc, which first label spatial spots on histological tissue sections with unique barcodes and then perform sequencing. These methods can profile the whole transcriptome information of tissue sections of any size. In ST studies, a critical step is accurate clustering for the identification of spatial domains. Traditional non-spatial clustering methods only utilize gene expression for cell clustering, such as Kmeans and Louvain’s method, which were extensively employed in single-cell transcriptomics studies^[49]12. However, these methods ignore spatial location and morphological information, resulting in clustering results that are discontinuous in tissue sections. Several spatial clustering methods have been developed to analyze ST data derived from tissue samples and interpret the spatial dependence of genes by integrating spatial location or morphological information. For example, BayesSpace^[50]13 enables spatial clustering by modeling a low-dimensional representation of gene expression matrix and encouraging neighboring spots to belong to the same cluster via a spatial prior. stLearn^[51]14 extracts features and location information from histological images and combines gene expression features from adjacent nodes to normalize gene expression data. SpaGCN^[52]15 utilizes graph convolutional neural networks to identify spatial domains and spatially differential genes by combining gene expression, spatial location, and histology. SEDR^[53]16 embeds spatial information using deep autoencoders and graph autoencoders to map gene expression patterns to low-dimensional space. Recently, Cell Clustering for Spatial Transcriptomics data (CCST)^[54]17 performs cell clustering by extending an unsupervised node embedding method based on graph convolutional networks. These existing methods enhance clustering accuracy by incorporating spatial location or morphological information. Therefore, the comprehensive utilization of multimodal data including gene expression, spatial location, and morphological information, effectively improves the clustering performance. However, the aforementioned methods do not consider the impact of biological effects on spatial domain identification. It has been demonstrated that biological effects are strongly linked to gene expression in tissue cells owing to gene expression characteristics influenced by biological effects, such as cell properties, surrounding cells, and tissue microenvironment^[55]18. Also, the neural network architectures developed by these deep learning methods mainly rely on linear principal component analysis (PCA) to extract features from normalized gene expression and are relatively simple, lacking the ability to learn more nonlinear information from gene expression, biological effects, spatial information, and morphological information. It is desirable to develop efficient computational approach that aggregates gene expression, biological effects, spatial location, and morphological information to capture nonlinear relationships between a cell and surrounding cells for natural domain identification. Another critical task in ST studies is to integrate data from multiple sections while correcting batch effects. In order to obtain the real-world gene expression patterns of tissue organization for fully understanding tissue functionality and pathological changes, multiple studies have performed the analysis of ST data from multiple tissue sections equally distributed along vertical or horizontal axis of the tissue. Although a variety of data integration methods have been developed, such as Harmony^[56]19 and Scanorama^[57]20, most of them are for single-cell datasets without consideration of spatial information. Recently, PASTE^[58]21 and PRECAST^[59]22 were proposed to analyze ST datasets with consideration of spatial information. However, these methods only apply to integrate ST data from multiple tissue sections uniformly distributed along the anteroposterior axis of the same area of interest (vertical integration). They also show limited flexibility for different ST platforms, resulting in poor clustering performance in broad-spectrum data^[60]23,[61]24. More recently, SpaGCN^[62]15 and STAGATE^[63]25 can jointly analyze data from multiple adjacent sections (horizontal integration) containing tissue compartments within the whole organ, but they are ineffective in correcting batch effects and inapplicable in integrating data from multiple tissue sections vertically. Also, most existing methods map integrated data into low-dimensional space using the principal components of conventional dimension reduction without considering the consistent loss functions of dimension reduction, alignment across samples, or spatial clustering. Therefore, there is a need for a novel method to integrate ST data from multiple sections horizontally or vertically while correcting batch effects. To address the aforementioned challenges, we developed ResST, a graph self-supervised residual learning model based on graph neural networks and Margin Disparity Discrepancy (MDD) theory, to perform spatially natural clustering and integrate data horizontally or vertically while correcting batch effects. For clustering, ResST constructs gene expression correlation matrix to represent the impact of biological effects on gene expression. Then ResST learns the spatial embedding (residual) and low-dimensional gene representation. Subsequently, low-dimensional gene concatenates with the residual for composing latent embedding that captures multimodal information, including gene expression, biological effects, spatial location, and morphological information, significantly enhancing clustering accuracy. Moreover, ResST achieved algorithm design of data integration based on MDD theory for domain adaptation. MDD is a measurement with rigorous generalization bounds, tailored to the distribution comparison with the asymmetric margin loss^[64]26. It promotes the proximity between the class mean of the two domains for each class and pushes apart the mean of different classes, significantly improving prediction accuracy in unsupervised domain adaptation^[65]26,[66]27. In this study, ResST possesses the ability to integrate ST data from multiple tissue sections horizontally or vertically. Moreover, ResST aligns the latent embeddings across multiple tissue sections by incorporating MDD measurement to correct batch effects. We extensively tested ResST for the two key tasks on analysis of ST data from different platforms (e.g., 10× Visium, SLIDE-seqV2^[67]8, Stereoseq^[68]9, seqFISH^[69]1,[70]2, MERFISH^[71]4,[72]5, 4i^[73]24, and MIBITOF^[74]23) and compared it with current methods. The results showed that ResST achieved better clustering performance on the human dorsolateral prefrontal cortex (DLPFC) dataset than current state-of-the-art methods. Moreover, ResST identified much more refined heterogenous subregions in breast cancer ST data and sagittal section seqFISH data of mouse embryos. Interestingly, ResST effectively reduced batch effects in the human DLPFC dataset and mouse posterior brain 10× Visium data, and more clearly identified the dentate gyrus of mouse posterior brain compared to other mainstream batch processing methods. In summary, our results demonstrated that ResST has a powerful ability to accurately identify spatial domains and efficiently integrate data from multiple tissue samples horizontally or vertically while correcting batch effects, making it highly scalable for different ST platforms. Results Overview of ResST workflow We explain the workflow of ResST using in situ capturing-based ST data as an example, but the model can be easily modified to different ST platforms. ResST couples gene expression with biological effects, spatial location, and morphological information to accurately identify spatial domains (Fig. [75]1a). ResST first constructs an enhanced gene expression matrix of all spots by combining gene expression, spatial location, and morphological information, and calculates an adjacent matrix using k-nearest neighbors algorithm based on spatial coordinates (Fig. [76]1b). Then, ResST feeds enhanced gene expression matrix into encoder to generate low-dimensional gene representation. Next, ResST utilizes two graph convolutional layers to map low-dimensional gene representation into spatial embedding (residual). Subsequently, low-dimensional gene representation from the encoder is mapped through identity shortcuts and concatenates with the residual for composing latent embedding. Finally, the learned latent embedding can be applied to spatial domain identification and cell type or subtype identification (Fig. [77]1c). When integrating data from multiple sections, we design a loss function for domain classifier based on MDD measurement to calculate loss between domains (tissue sections). ResST integrates enhanced gene expression matrixes from multiple datasets to generate latent embeddings, which are fed into a domain classifier connected by a gradient reversal layer for correcting batch effects (Fig. [78]1c, d). During the learning process, the domain classifier can map the source and target domains of different distributions into the same feature space, which is used to integrate spatial data from various distributions. In addition, ResST applies domain classifier to maintain the correspondence between cells and tissue sections, preserving layer-specific gene expression variations. Fig. 1. Overview of ResST workflow. [79]Fig. 1 [80]Open in a new tab a ResST workflow begins with gene expression, spatial location, and morphological information in ST data. b ResST constructs an enhanced gene expression matrix from morphological information, gene expression, and spatial location. ResST initially segments the histological image based on the coordinate of each spot and extracts the image features by a pre-trained deep learning model. The morphological similarity matrix is calculated to assess the influence of morphological information on clustering based on the image features. The gene expression correlation matrix is calculated to assess the influence of biological effects on clustering. The spatial correlation matrix is calculated to quantify the influence of spatial information on determining similar cellular states. c ResST is a graph self-supervised residual learning model based on graph neural network and Margin Disparity Discrepancy (MDD) theory. The deep autoencoder consists of an encoder, a graph autoencoder, and a decoder. d ResST can integrate ST data from different sections while correcting batch effects and accurately assigning spots to the correct spatial domains. N represents the number of spots, while M[1] and M[2] correspond to the dimensions of the final layer of encoder and the final layer of GNN, respectively. ResST exhibits superiority in identifying spatial domains in the human dorsolateral prefrontal cortex and mouse brain data We first assessed the superiority of ResST’s spatial clustering performance on the human DLPFC^[81]20 dataset with 12 sections from three distinct individuals acquired with 10× Visium. It includes cortical layers (L1-L6) and white matter (WM) labeled with marker genes and histological annotations of tissue structure, which enabled evaluation of the accuracy of spatial domain detected by each method. The spatial domain identified by these methods for the representative tissue section 151673 (3639 locations and 33538 genes) was displayed in Fig. [82]2b. The visual comparison demonstrated that Kmeans spatial clustering had the poorest performance with mixed cortical layers (L1-L6). Seurat and stLearn recovered part of the cortical layers and WM, but they did not achieve clean separation and clear boundaries of the cortical layers (L1-L6). SEDR and CCST could identify well-refined layers by utilizing histological information or spatial location for spatial clustering, but they only divided cortical layers into five layers and most of the layers were inconsistent with the annotations. SpaGCN and BayesSpace accurately delineated all the cortical layers (L1-L6) and WM, but they generated incorrect layer thickness of L1 and L2. ResST delineated finer-grained layers by integrating multimodal information including gene expression, biological effects, spatial location, and morphological information, and distinguished all the cortical layers (L1-L6) and WM that were most consistent with manual annotations of DLPFC and neuroscientific definitions of cortical layers. Fig. 2. ResST exhibits superiority in identifying spatial domains. [83]Fig. 2 [84]Open in a new tab a Manually annotated layer structure for slice 151673 from ref. ^[85]57. b Spatial domains identified by ResST and existing state-of-the-art algorithms (Kmeans, Seurat, SpaGCN, stLearn, SEDR, CCST, and BayesSpace). c Boxplot of the performance of ResST and other algorithms for all 12 DLPFCs. The x-axis shows ARI and FMI, which were used to compare the similarity of the predicted spatial layers and the manually annotated layers for each algorithm. d UMAP visualizations and PAGA graphs were generated for slide 151673 with Seurat-derived principal components (left) and ResST-derived embeddings (right). e Clustering results of mouse brain coronal region (top); The corresponding anatomical Allen Mouse Brain Atlas (bottom, [86]https://atlas.brain-map.org/). f Clustering results of mouse brain sagittal anterior section (section 2); The corresponding anatomical Allen Mouse Brain Atlas (bottom). g Clustering results of mouse brain sagittal posterior section (section 1); The corresponding anatomical Allen Mouse Brain Atlas (bottom). We further evaluated the quality of clustering performances for each method using external measures, namely the Adjusted Rand Index (ARI)^[87]28 and the Fowlkes-Mallows Index (FMI)^[88]29. ResST achieved much higher ARI and FMI than the current best method (BayesSpace). The mean ARI was 0.694 ± 0.062 for ResST and 0.526 ± 0.128 for BayesSpace (Wilcoxon test, p = 0.0007; Supplementary Table [89]1). The FMI was 0.752 ± 0.064 for ResST and 0.635 ± 0.099 for BayesSpace (Wilcoxon test, p = 0.0061; Supplementary Table [90]2). The mean ARI was lower than 0.426 ± 0.094 for other methods. ResST achieved the highest clustering accuracy in tissue section 151509 (ARI = 0.792; Supplementary Fig. [91]3) and lower variability in performance across the slices by considering complex global interactions than most of the other methods (Fig. [92]2c and Supplementary Figs. [93]1–[94]11). Moreover, compared to Seurat, ResST provided more orderly cortical layers from L1 to L6 and WM illustrated by uniform manifold approximation and projection (UMAP) and partition-based graph abstraction (PAGA) (Fig. [95]2d). Therefore, ResST exhibited the most outstanding performance in DLPFC dataset. Also, we applied ResST to 10× Visium data from the sagittal posterior, sagittal anterior, and coronal regions of the mouse brain with highly intricate tissue architectures. Although Kmeans was able to outline the major anatomical regions, many clusters were intermixed. Seurat, stLearn, and SpaGCN managed to achieve more distinct clusters, but the boundaries between these clusters remained ambiguous. CCST generated clean separation and clear boundaries, but clusters were inconsistent with histological images (Supplementary Figs. [96]12, [97]13). ResST, BayesSpace, and SEDR showed outstanding clustering performance and clearly identified tissue structures closer to histological images. Moreover, only ResST revealed finer-grained tissue structures on all three areas of mouse brain ST data, including the cornu ammonia and dentate gyrus in the hippocampus of mouse brain (Fig. [98]2e), the layers 2/3, 4/5, and 6 of the cortex in the sagittal anterior (Fig. [99]2f and Supplementary Figs. [100]12, [101]13), as well as the cerebellar cortex and dorsal horn of hippocampus region in the sagittal posterior (Fig. [102]2g and Supplementary Figs. [103]14, [104]15). We quantified the clustering performance with internal measures of the Silhouette Coefficient (SC)^[105]30, Calinski-Harabasz Index (CH)^[106]31, and Davies-Bouldin index (DB)^[107]32, which is only generated from SEDR, CCST, and stLearn, and cannot be obtained from other methods because of no embeddings. Clustering analysis of the sagittal posterior data showed that ResST had higher SC (SC = 0.122) and CH (CH = 137.031), and lower DB (DB = 1.920) than those of SEDR, CCST, and stLearn (Fig. [108]2g and Supplementary Fig. [109]14). Similarly, ResST achieved better clustering performance than most of the test methods on the coronal and sagittal anterior data (Supplementary Figs. [110]12, [111]13). These results demonstrated that ResST exhibited superiority in identifying spatial domains by considering gene expression, biological effects, spatial location, and morphological information in the human DLPFC and mouse brain data. ResST reveals finer-grained tissue structures for improved heterogeneity analysis in breast cancer data We assessed the clustering performance of ResST on the 10× Visium dataset of human breast cancer (invasive ductal carcinoma). This dataset comprises 3789 spatial spots and 36601 genes, manually annotated by pathologists based on H&E images and reported breast cancer marker genes with spatial expression characteristics (Fig. [112]3a). Here, we tested the ability of ResST, compared to other competing methods, to recover the detailed pathological structures. Among all the methods, ResST achieved the highest ARI (ARI = 0.728), followed by BayesSpace (ARI = 0.668), while the remaining methods had ARI values below 0.56 (Fig. [113]3b, Supplementary Fig. [114]16). The visual comparison clearly showed that the computed domains from Kmeans, Seurat, stLearn, and SpaGCN were fragmented and discontinuous, while SEDR, CCST, BayesSpace, and ResST produced more jointed domains. Kmeans was only able to recover the Healthy 1 domain with the remaining domains mixed among the others, indicating the poorest performance. The remaining methods achieved improved spatial domain identifying, yet the boundaries between domains performed by Seurat, stLearn, and SpaGCN appeared ragged and lacked clean separation. Although CCST and BayesSpace performed better, they inaccurately divided Health 1 region into multiple sub-regions which also existed in Seurat and stLearn. SEDR exhibited clear boundaries, but the reconstructed tissue structures did not align perfectly with the manual annotations. Only Seurat, BayesSpace, and ResST recognized the sub-domain (domain 13) within domain 2. Furthermore, ResST was the sole method that overcame all the aforementioned limitations that existed in other methods, producing the domains that were closest to the annotations, and achieving finer-grained pathological structures in cancer slices. Based on the clustering results of ResST, we delved deeper into the significant differentially expressed genes (DEGs) between domain 2 and its sub-domain, domain 13. We conducted differential expression analysis on domains 2 and 13 and detected 349 significant DEGs (log[2]FC ≥ 2; p.adjust < 0.05) between the two domains. Some of these DEGs, such as PTGES and VTCN1, were significantly overexpressed in domain 13 and could distinguish domains 2 and 13 (Fig. [115]3c, Supplementary Fig. [116]17, and Supplementary Data [117]1). It has been demonstrated that several identified DEGs were regarded as potential targets in cancer therapy. For example, VTCN1^[118]33 is associated with immunosuppression, and PTGES^[119]34 plays an important role in inflammation, immune regulation, cellular proliferation, and apoptosis (Fig. [120]3c). These results suggested that ResST displayed superior spatial clustering performance and provided more important DEGs for understanding cancer pathogenesis and uncovering therapeutic targets. Fig. 3. ResST identified spatial domains in human breast cancer at a finer level. [121]Fig. 3 [122]Open in a new tab a 10× Visium ST data of a breast cancer sample annotated by pathologists containing invasive ductal carcinoma (IDC), ductal/lobular carcinoma (DCIS/LCIS), tumor edge, and healthy regions. b Spatial domains identified by ResST, Seurat, and BayesSpace. c Volcano graph of DEGs between domains 2 and 13 (left); Gene expression with regional annotation and violin plots of PTGES and VTCN1 (right). d Heatmap of Pearson correlation coefficient among domains. e LYZ and SLITRK6 express differentially between domains 1 and 13. f Cell type annotations from Seurat (left); Domains with highest ligand-receptor interaction activity (LR_score < 7000) (middle); Spatial location of domains 5, 7, and 13 (right). g Gene ontology enrichment analysis of the DEGs between domains 1 and 13. h H&E of human breast cancer sample annotated by Agoko’s telepathology platform (left); Spatial domains identified by ResST, SEDR, CCST, and stLearn (right). To further explore the spatial heterogeneity within tumor tissues, we conducted domain correlation analysis using the cluster labels and the recovered pathological structures from ResST (Fig. [123]3d and Supplementary Data [124]2). We found significant heterogeneity between region 1 and regions 2, 3, and 13. Region 1 corresponds to DCIS/LCIS, while regions 2, 3, and 13 correspond to IDC in the breast tissue slices, respectively (Fig. [125]3d). We compared transcriptional differences between domain 1 and 13 by performing differential expression analysis followed by pathway enrichment analysis and ligand-receptor interactions analysis. A total of 413 significant DEGs (log[2]FC ≥ 2; p.adjust < 0.05) were detected and the top ten DEGs were selected in each domain for evaluating the transcriptome heterogeneity (Supplementary Data [126]3). CRISP3, SLITRK6, C6orf141, VTCN1, ABCC11, EFHD1, PDLIM1, ECM1, PIP, and PTGES were upregulated in domain 13 and could be used to accurately distinguish DCIS/LCIS and IDC. CRISP3 is associated with immune response modulation and tumor microenvironment, tumor-associated macrophage (TAM) infiltration^[127]35. VTCN1 is a negative co-stimulatory molecule that inhibits the activation and proliferation of T cells, which is associated with immune escape, TAM infiltration, and poor prognosis in various cancers^[128]36. PIP is linked to immune escape mechanisms in breast cancer^[129]37. ECM1^[130]38 and PTGES^[131]34 are associated with tumor proliferation, migration and invasion, chemoresistance, and poor prognosis (Fig. [132]3e and Supplementary Fig. [133]18). Moreover, ResST identified SCGB1D2, SCGB2A2, FCGR3B, LYZ, CXCL9, ABHD2, FCGRT, LINC02224, KLHDC7B, and HLA-DPA1, which were highly expressed in spatial domain 1. Previous studies have demonstrated that SCGB2A2 and SCGB1D2 may be new independent prognostic markers in breast cancer^[134]39. FCGR3B is considered as a marker gene of Natural killer cells^[135]40, and CXCL9^[136]41, KLHDC7B^[137]42, and LYZ^[138]43 are marker genes of lymphocytes, indicating abundant immune cell infiltration in domain 1 (Fig. [139]3e and Supplementary Fig. [140]18). We conducted differential expression analysis for region 1 compared to regions 2 and 3, respectively. The genes highly expressed in region 1 (Supplementary Data [141]3) are associated with immune response and extracellular matrix organization. The genes highly expressed in regions 2 and 13 (Supplementary Data [142]3) are related to tumor invasion, immune evasion, and cytoskeletal remodeling. Additionally, we observed multiple pathways upregulated in domain 1, such as phagosome, lymphocyte-mediated immunity, and innate immune response (Fig. [143]3g and Supplementary Data [144]4). These pathways participate in anti-tumorigenic or anti-metastatic effects^[145]44–[146]46, indicating a low potential for tumor metastasis of DCIS/LCIS, consistent with previous studies^[147]47. Moreover, ResST performed spatial cluster by considering gene expression, spatial location, complex biological effects, and morphological information and identified domains 5, 7, and 15 which was consistent with locations in the tissue with high ligand-receptor interaction activity performed by stLearn (Fig. [148]3f). In conclusion, the results demonstrated ResST was capable of recovering detailed pathological structures in cancer tissues and provided immense assistance for exploring the heterogeneity present within tumor tissues. We also examined another ST data of human breast cancer (Ductal Carcinoma in Situ) with manual annotations (Fig. [149]3h). ResST, along with other methods (Fig. [150]3h and Supplementary Fig. [151]19), was employed to assess the ability to detect spatial structures. The clustering results showed that SEDR and stLearn exhibited poorer performance compared to CCST and ResST, with lower SC and CH and higher DB. Although ResST (SC = 0.132, DB = 2.433, CH = 71.518) provided higher DB and lower CH compared to CCST (SC = 0.092, DB = 1.793, CH = 199.572), ResST exhibited higher SC by performing stronger coherence within their respective spatial positions and generated clear boundaries and continuous domains consistent with the annotations which are similar to those of CCST. The results indicated that ResST enabled detailed profiles of spatial transcriptome and facilitated the discovery of tumor heterogeneity. ResST corrects batch effects for vertical and horizontal integration of multiple tissue sections An obstacle to integrating data from multiple sections vertically or horizontally is the presence of batch effects. We assessed the remarkable capabilities of ResST in correcting batch effects and revealing finer-grained tissue structures. This performance was conducted on the 10× Visium dataset of the posterior sagittal mouse brain, which comprises two sections along the same vertical axis and exhibits intricate tissue characteristics. ResST incorporated a measurement with rigorous generalization bounds, MDD, in the workflow for integrating data from multiple tissue sections. We compared the performance of the pipeline integrated without (Method 1) or with (Method 2) MDD measurement for evaluating the effect of MDD measurement on spatial clustering results. The results showed that Method 2, ResST (Fig. [152]4b; SC = 0.161, DB = 1.676, CH = 284.062), provided more finer-grained clustering results than Method 1 (Fig. [153]4b; SC = 0.152, DB = 2.054, CH = 145.330). As shown in Fig. [154]4b, we observed that ResST accurately identified the dorsal and the ventral horn of the hippocampus regions, which was not identified by Method 1. Current methods used for data integration, Harmony and Scanorama, only identified the ventral horn of the hippocampus region. These results suggested that the spots obtained by ResST retained maximal biological content from multiple samples and MDD measurement improved spatial clusters on data integration of multiple tissue sections. Fig. 4. ResST corrects batch effects for vertical and horizontal data integration of multiple tissue sections. [155]Fig. 4 [156]Open in a new tab a H&E image of posterior mouse brain from the original study. b The spatial clustering results from Method 2 (with MDD), Method 1 (without MDD), Harmony, and Scanorama (from left to right). c UMAP plots before batch effects correction and spatial location of sections 151673 and 151675 in DLPFC. d Vertical integration results with two DLPFC sections. UMAP plots after batch effects correction and spatial clustering from ResST, Harmony, and Scanorama (left). Spots are colored according to the spatial domains identified by the respective clustering methods (right). e Horizontal integration results from ResST (top) and SpaGCN (bottom) with two mouse brain samples which consist of sagittal anterior and posterior regions. We further evaluated the ability of ResST to correct batch effects on the DLPFC ST dataset, which originates from a contiguous region within the tissue obtained along a vertical axis, serving as a benchmark dataset for evaluating vertical integration. ResST was compared with Harmony and Scanorama, which are commonly used to overcome batch effects in single-cell genomics data. We first assessed the integration results visually with UMAP. There was a slight offset between sections prior to batch correction (Fig. [157]4c). After integration, the UMAP plots for ResST, Harmony, and Scanorama showed that spots from different sections were all well mixed, while the domain clusters were well segregated. In the subsequent clustering, the spatial domains from ResST, Harmony, and Scanorama were all highly overlapping, but the domains from Harmony and Scanorama were only able to recover the WM with remaining clusters mixed among the other layers (Fig. [158]4d). Moreover, the visualization demonstrated that ResST more distinctly revealed the laminar structure of the human cerebral cortex with clean separations and clear boundaries compared to other methods (Fig. [159]4d). We next tested the performance of ResST on data integration and batch effects correction using a mouse brain 10× Visium dataset. The mouse brain sections consist of anterior and posterior regions (horizontal sections) and include two slices for each tissue region (vertical sections) with complex structural characteristics. On the UMAP plots, ResST, Harmony, and Scanorama all reduced batch effects and spots from different sections were all well mixed. Moreover, ResST performed better and well-segregated clusters, which were not achieved by Harmony and Scanorama (Supplementary Fig. [160]20). Subsequently, the clusters from ResST aligned well across the two sections, and ResST generated clusters that were more consistent with known anatomy (Fig. [161]4a) than other methods. In contrast, the clusters generated by Harmony and Scanorama were fragmented and incompletely in agreement with known anatomy (Supplementary Fig. [162]20). We further compared the internal measures of ResST with those of Harmony and Scanorama. ResST exhibited a higher SC and similar DB values compared to Harmony (Fig. [163]4b; SC = 0.125, DB = 1.705, CH = 426.953) and Scanorama (Fig. [164]4b; SC = 0.125, DB = 1.509, CH = 675.083), generating greater continuous spatial domains with lower noise levels. Also, ResST performed better on two sections from the 10× Visium dataset of the anterior sagittal mouse brain (Supplementary Figs. [165]12, [166]13). These results demonstrated that ResST efficiently integrated data from multiple sections and corrected batch effects, thereby improving the performance of identification of spatial domain. Then, we compared SpaGCN and ResST in horizontal integration using two sections of anterior and posterior from the mouse brain 10× Visium data and employed the annotated Allen mouse brain Atlas^[167]48 as the ground truth. The results showed that most of the spatial clusters identified by SpaGCN and ResST demonstrated remarkable concordance with known anatomical structures. However, SpaGCN did not capture certain pivotal regions within the tissue and many clusters obtained by this method were fragmented and failed to align along the shared edges between the two tissue sections (Fig. [168]4e). In contrast, ResST exhibits superior performance and clearly identified the cerebral cortex, cerebellum, and corpus callosum. Moreover, ResST showcased enhanced alignment along the edges of both anterior and posterior sections, leading to outstanding performance on data integration and batch effects correction. ResST works well on various spatial omics data independent of platforms Aside from the 10× Visium platform, we investigated the generalization ability of ResST on image-based molecular data (seqFISH, MERFISH, 4i, and MIBITOF) and high-resolution ST data (Stereoseq and SlideseqV2). We first applied ResST to 4i^[169]24 (iterative indirect immunofluorescence imaging) data, which measured 40 protein reads (~270,000 observations/pixel) in high-throughput biological samples from millimeter to nanometer scales. Here, we only used partial molecular data for spatial domain identification (Fig. [170]5a and Supplementary Fig. [171]21a, pixel coordinates (x[1], y[1])- (x[2], y[2]): (1100, 950)- (1300, 1150)). The results showed that ResST can depict detailed subcellular distribution in local regions, including various compartments, organelles, and cell structures within each cell (Fig. [172]5a). Also, we applied ResST to another image-based molecular MIBI-TOF dataset^[173]23, which imaged 36 labeled antibodies with histological staining and endogenous elements (3309 pixels). Similarly, ResST revealed partially continuous regions and locally fused elements in four imaging results (Fig. [174]5b and Supplementary Fig. [175]21b), which was almost consistent with the original annotations^[176]23. Fig. 5. ResST works well on various spatial omics data. [177]Fig. 5 [178]Open in a new tab a Visualization of subcellular molecular maps using 4i (Iterative Indirect Immunofluorescence Imaging) (top; partial data, coordinates (x[1], y[1])–(x[2], y[2]): (1100, 950)–(1300, 1150) and spatial domain detection using ResST (bottom). b Visualization of imaging-based molecular MIBI-TOF dataset (3309 pixels and 36 proteins) with annotation (top, the point16 section) and spatial domain detection using ResST (right, ARI = 0.524). c Visualization and annotation of SlideqV2 data in the mouse hippocampus (left); spatial domain detection using ResST (middle); distribution of gene Tmsb4x expression in the data. d Visualization and annotation of MERFISH data in the mouse prefrontal hypothalamus (top) and spatial domain detection using ResST (bottom). e Visualization and annotation of adult Macaque cortex Stereoseq data (leftmost); spatial domain partition using ResST (left middle); expression distribution of the marker genes GFAP, GPR83, and SYT6 in tissue sections of the cortex (right). f Visualization and annotation of seqFISH data in sagittal slices of mouse embryos (left); spatial domain partition using ResST (right). Subsequently, we evaluated the performance of ResST on ST data at an approximate single-cell resolution. We assessed the ability to recover the annotated structures of ResST on a mouse hippocampus dataset acquired with Slid-seqV2^[179]8 (comprising 41,886 subcellular regions and 4000 genes). ResST was able to produce spatially consistent clustering and outline the major anatomical regions such as the “DenatePyramids” and “CA1_CA2_CA3_subiculum”. Interestingly, ResST divided “CA1_CA2_CA3_subiculum” into two domains (domains 5 and 6) (Fig. [180]5c) and we detected significant spatial DEGs between domains 5 and 6, such as Tmsb4x. Also, ResST demarcated several subareas, including the third ventricle (domain 13), medial habenula (domain 12), and lateral habenula (domain 5), which were not clarified in the original annotation (Supplementary Fig. [181]21d). Furthermore, ResST successfully processed MERFISH^[182]4 data from mouse hypothalamus, and clearly identified both “Ependymal” and “OD Mature” structural domains (Fig. [183]5d). We also validated the performance of ResST on adult Macaque cortex Stereoseq chips^[184]49. ResST accurately identified the L2-L6 regions in the cortex layer, consistent with histological staining definitions. In addition, we identified marker genes through a more in-depth analysis, such as GFAP, GPR83, and SYT6, which were consistent with spatial clustering results (Fig. [185]5e and Supplementary Fig. [186]21c). Finally, we validated the generalization ability of ResST on seqFISH data from sagittal slices of mouse embryos^[187]50. ResST can accurately identify tissue structures in the brain, heart, and intestinal tube among the ectoderm, mesoderm, and endoderm, respectively. Moreover, ResST divided “Forebrain/midbrain/hindbrain” into finer-grained tissue structures, forebrain, midbrain, and hindbrain, which were not clarified in the original annotation (Fig. [188]5f). These results suggested that ResST can make detailed structural domain divisions for ST data from different platforms. Discussion ST technologies are powerful experimental methods for measuring gene expression while preserving the relevant spatial context. Spatial clustering and data integration are the most important tasks for analyzing ST datasets. Previous studies showed that accurate clustering is dependent on the comprehensive utilization of multimodal data (gene expression and spatial location) derived from ST technologies and morphological information^[189]12–[190]17. It has been demonstrated that gene expression characteristics in tissue cells are related to biological effects, such as cell properties, surrounding cells, and tissue microenvironment^[191]18. However, current ST processing tools ignore the impact of biological effects on spatial clustering. Moreover, current ST datasets are obtained from a single section or multiple tissue sections. Integration of these data includes horizontal or vertical integration. Previous studies have revealed that the batch effects during data generation processes decrease the accuracy of downstream analysis and must be removed while preserving the biological meaning of datasets^[192]51. However, few ST processing tools are applicable to simultaneously integrate data horizontally and vertically while remove batch effects. In this study, we developed a novel ST processing method, ResST, which not only fully utilized gene expression, biological effects, spatial location, and morphological information for improving clustering accuracy, but also simultaneously integrated data horizontally and vertically while removed batch effects. Gene expression derived from ST technologies encapsulates complex biological information including cell properties, surrounding cells, tissue microenvironment, cell communication, receptor-ligand interactions, etc, which are biological effects objectively existing in the tissues and organs^[193]18. Current deep learning methods used for ST clustering analysis are based on simplistic neural network architectures^[194]14,[195]15,[196]17. These methods simply aggregate gene expression, spatial information, or morphological information from neighboring spots, but they fail to capture more effective multimodal information, such as biological effects. In this study, ResST computed the gene expression correlation matrix using gene expression matrix based on a linear approach. The gene expression correlation matrix, a n × n matrix, quantified the correlation between cells/spots, representing biological effects. The clustering performance was evaluated with or without biological effects on breast cancer datasets. The results showed that the ARI values significantly increased when integrating biological effects (Supplementary Fig. [197]22), indicating that ResST improved spatial domain clustering performance by integrating biological effects. Previous reports showed that the residual neural network is a classical convolutional neural network that introduces a shortcut connection that enhances the extraction of effective information and effectively prevents network performance degradation^[198]52,[199]53. In this study, we applied the residual learning model to fully capture multimodal information, such as biological effects. The unique residual design of ResST allows the encoder’s output to be used as input for a graph autoencoder, and concatenate with the corresponding output of the autoencoder for generating latent embeddings that capture nonlinear relationships between a cell and surrounding cells for spatial domain identification. Therefore, ResST enables accurate and continuous spatial clustering, bringing ST analysis profiles closer to natural and realistic tissue structures and achieving more accurate clustering than that of existing models. MDD theory, a measurement with rigorous generalization bounds, can seamlessly be transformed into an adversarial learning algorithm for domain adaptation^[200]26. This algorithm promotes the proximity between the class mean of the two domains for each class and pushes apart the mean of different classes, significantly improving prediction accuracy in unsupervised domain adaptation^[201]26,[202]27. Notably, data integration tasks in ST studies include batch effects correction and clustering of mixed data, which essentially belong to the classification tasks of domain adaptation. In the present study, ResST achieved an algorithm design for data integration based on the MDD theory. The results showed that the designed algorithm can effectively extract the features of data from different sections and make the feature distributions of different sections more similar, removing the batch effects in data integration tasks. Another advantage of ResST is that it integrates ST data from multiple sections while preserving layer-specific gene expression variation. Current data integration methods mix data from multiple batches before performing downstream analysis^[203]19–[204]22. For example, PASTE^[205]21 stacks and/or integrates ST data from multiple adjacent tissue sections into a single slide, failing to capture cellular biological variations across sections. Based on the MDD theory, ResST feeds latent embeddings into a domain classifier through a gradient reversal layer for maintaining the correspondence between cells and tissue sections, capturing the cellular biological variations across sections. Therefore, ResST integrates ST data from multiple sections horizontally or vertically while correcting batch effects and preserving layer-specific gene expression variation. We designed ResST to process data acquired from different spatial transcriptomic platforms. When applied to different complexity levels of 10× Visium data, ResST detected more continuous spatial domains with reduced noise levels, which was consistent with manual annotations. Analysis of breast cancer ST data, ResST offered more reasonable partitioning of spatial domains and provided a more detailed delineation of substructural regions within the tissue, giving great assistance in understanding cancer heterogeneity and molecular mechanism of cancer occurrence and metastasis. ResST also identified fine spatial domains for seqFISH data of sagittal sections of mouse embryos. When analyzing ST data of human DLPFC and sagittal sections of mouse brain, ResST identified more continuous and accurate spatial domains by integrating data horizontally and detected biologically coherent spatial domains that aligned across samples while removing batch effects by integrating data vertically. Also, ResST not only performed well on mainstream ST platforms, such as 10× Visium, MERFISH, and Slide-seq but also showed great potential in other ST data (4i and MIBI-TOF). ST provides spatially resolved gene expression of diverse cells in complex tissues. However, uncovering a more detailed picture of complex biological processes in tissues remains a challenge due to limited gene recovery or low spatial resolution in recent ST technologies. The integration of ST with multi-omics data is an inevitable trend for comprehensively understanding complex biological processes (the fine-grained panorama of a heterogeneous tissue/molecular architecture). ResST offers opportunities for new research issues in data integration of ST and multi-omics. For instance, the scRNA-seq data can be applied to deconvolute ST data, generating clearer cell-type compositions and enabling tissue cartography at single-cell resolution. Furthermore, when single-cell resolution of ST datasets is available, they can be directly integrated with multi-omics data. For example, integrating CITE-seq will deeply differentiate cellular heterogeneity and accurately explore specific cell types in the spatial context of tissues, which can help us to investigate the molecular mechanisms of drug resistance in cancer therapies. Additionally, combining ST datasets with scNMT-seq data will profile multiple epigenetic features in conjunction with gene expression, at single-cell resolution in the spatial context of tissues, enabling a more complete understanding of epigenetic dependencies and their associations with transcription and cell states. Methods Normalization of gene expression In ST studies, gene expression and spatial information can be utilized to identify cells in close spatial positions that exhibit similar states. The histology image shows a clear difference among tissue substructures with different functions, suggesting histology is informative for clustering. The gene expression of each spot can be enhanced by combining with morphological information and spatial information, facilitating natural domain identification. Additionally, because of the strong link between biological effects and gene expression, we take into account the impact of biological effects on the clustering. Taking the in situ capturing-based technologies as an example, the spatial gene expression stores the spatial two-dimensional coordinates [MATH: (x,y) :MATH] of n spots, as well as a raw gene expression matrix [MATH: GERn× d :MATH] with n spots and d genes. The matrix stores the unique molecular identifier count (UMI; an n×d matrix of UMI counts) of each gene. [MATH: GEiR1× d :MATH] is a raw gene expression vector for spot s[i], where [MATH: s(s 1,,si,,sj, ,sn< /mrow>) :MATH] . ResST constructs enhanced gene expression matrix of all spots by combining morphological information, gene expression, and spatial location. 1. Calculation of morphological similarity matrix. To effectively utilize morphological information, we accurately evaluate the morphological similarity matrix by measuring the morphological distance between each spot and others. For the ST data with morphological information, ResST segments the histological images based on the spatial coordinates of each spot and employs a pre-trained convolutional neural network (optional; default ResNet^[206]52) to extract image features from the segmented subgraphs as the morphological feature vectors of each spot. This model can convert each spot image into a 2048-dimensional latent variable. To better represent the morphological features of each spot, we apply principal component analysis (PCA) to extract the first 50 principal components as latent characteristics and calculated the cumulative explained variance (more than 0.99999). Cosine distance is used to calculate the morphological similarity weight [MATH: MWij< /msub> :MATH] between s[i] and s[j], as [MATH: MW(s< /mrow>i,sj)=MWij=MiMjMi2< mo>∣Mj2, :MATH] 1 where [MATH: MW :MATH] is a n×n morphological similarity matrix, [MATH: MWij< /msub> :MATH] is the morphological similarity weight between s[i] and s[j], [MATH: Mi :MATH] and [MATH: Mj :MATH] represent the latent characteristics after performing PCA on morphological feature vectors for spots s[i] and s[j], respectively. 2. Calculation of gene expression correlation matrix. The gene expression of a cell is influenced by its neighboring cells and intracellular or extracellular states, suggesting biological effects are nonnegligible for clustering. We explore biological effects by calculating the gene expression correlation [MATH: GWij< /msub> :MATH] , using the raw gene expression vectors [MATH: GEi :MATH] and [MATH: GEj :MATH] from spots s[i] and s[j]. The characteristics of [MATH: GWij< /msub> :MATH] are: [MATH: GW(s< /mrow>i,sj)=GWij=GEiGEi< mo accent="true">¯GEjGEj< mo accent="true">¯< mo>∣GEiGEi< mo accent="true">¯2 GEjGEj< mo accent="true">¯2 , :MATH] 2 where [MATH: GW :MATH] is a n×n gene expression correlation matrix, [MATH: GWij< /msub> :MATH] is the gene expression correlation between s[i] and s[j], [MATH: GEi :MATH] and [MATH: GEj :MATH] represent the raw gene expression vectors for spots s[i] and s[j]. 3. Calculation of spatial correlation matrix. We initially calculate the Euclidean distance between each spot and all other spots based on spatial coordinates and normalize the distances for neighboring spots to quantify the influence of spatial information on determining similar cellular states. For given spots s[i] and s[j], if the distance between the two spots is less than radius r, s[i] and s[j] are considered to be adjacent, and [MATH: SWij< /msub>=1 :MATH] , otherwise [MATH: SWij< /msub>=0 :MATH] . The characteristics of [MATH: SWij :MATH] are: [MATH: SWij=1,&SWij<r0,&SWijr, :MATH] 3 where [MATH: SW :MATH] is an n×n spatial correlation matrix and the value of [MATH: SWij< /msub> :MATH] represents whether s[j] is a neighboring spot of s[i]. We calculated Euclidean distance of the coordinate correlations by establishing a mapping relationship of each sequencing spot’s position between the experimental platform and the histological image. The threshold r was obtained by multiplying an adjustment factor by Euclidean distance to determine whether spots have spatial correlations. 4. The enhanced gene weight [MATH: EGWij< /msub> :MATH] between s[i] and s[j] is calculated as follows: [MATH: EGW(s i,sj)=EGWij=SWijMWijGWij, :MATH] 4 if in 10× Visium, otherwise as: [MATH: EGW(s i,sj)=EGWij=SWijGWij, :MATH] 5 where [MATH: EGW :MATH] is an n×n enhanced gene weight matrix, [MATH: EGWij :MATH] is the enhanced gene weight between s[i] and s[j]. ResST then normalizes gene expression vector [MATH: GEi :MATH] of each spot s[i], as shown in Eq. ([207]6): [MATH: GEi=GEi+j=1 mGEjEGWijm :MATH] 6 where [MATH: GEi :MATH] represents normalized gene expression vector for a center spot s[i], [MATH: GEi :MATH] and [MATH: GEj :MATH] are the raw gene expression vectors for spot s[i] and m adjacent spots s[j]. Graph construction for ST data To make full use of spatial information, we first construct a cell-cell spatial relationship graph G = (V, E) by calculating the distances between spots based on spatial location and selecting the k nearest spots as the current spot’s neighbors. In graph G, V represents the set of spots and E is the set of connected edges between spots. That is, A is defined as the adjacency matrix of graph G, and if spot [MATH: sjV :MATH] is the neighbor of spot [MATH: siV :MATH] , [MATH: Aij=Aji=1 :MATH] , otherwise [MATH: Aij=Aji=0 :MATH] . Additionally, we add a self-loop to this matrix. For a given spot, its neighbors are determined by calculating the distance between spots using spatial locations (optional; default is Euclidean), and we select k spots as its neighbors from the top nearest neighbors. In our tests, ResST achieves its best performance in most of the tested datasets with k = 12 (Supplementary Data [208]5). Finally, we perform normalization preprocessing on the adjacency matrix, as shown in Eq. ([209]7): [MATH: A=D1/2AD1/2, :MATH] 7 where [MATH: A :MATH] is the symmetrically normalized adjacency matrix and [MATH: D :MATH] is the degree of adjacency matrix [MATH: A :MATH] . Graph self-supervised residual learning We propose a graph self-supervised residual learning model based on a graph neural network to learn the latent embedding from the normalized gene expression [MATH: GE :MATH] and the neighborhood graph G. This process comprises three major steps: (1) data preprocessing, (2) Graph autoencoder for spatial embedding, and (3) Self-supervised residual learning for latent embedding. Subsequently, the latent embedding is used for spatial clustering and multiple ST data integration. Data preprocessing In the aforementioned steps, ResST calculated normalized gene expression using the complete raw gene expression counts. Before model training, the normalized gene expression matrix [MATH: GE :MATH] is initially logarithmically transformed and normalized by library size and scaled to unit variance and zero mean using the Scanpy package. In this paper, all 10× Visium datasets are subjected to PCA to extract the top 200 principal components (optional) for generating gene expression vectors X as input for model training. For datasets obtained from other platforms, the number of PCA components for datasets with fewer measured genes is equal to the number of genes, such as 4i, MOBI-TOF, MERFISH, seqFISH, and stereoseq. For Slide-seqV2 data, which has a higher number of measured genes, the top 200 principal components were selected. To enhance the model’s robustness, we randomly corrupt the gene expression matrix using a noise function while preserving the original graph’s topological structure unchanged. Graph autoencoder for spatial embedding We devise a graph autoencoder that imbues spatial information from the cell-cell spatial relationship graph into gene expression vectors through iterative model training. Graph autoencoder aggregates the spatial information of neighboring cells or spots by improving the variational graph autoencoder. Spatial embedding (the residual) is computed by posterior probabilities and is incorporated into the learning of latent embedding, as shown in the expression below: [MATH: qZgA,Ze=i=1 nqziA,Ze, :MATH] 8 the solution for the posterior probability is shown in the equation below: [MATH: qziA,Ze=Nziμi ,diagσi< mrow>2, :MATH] 9 where n is the number of cells or spots, [MATH: A :MATH] is the adjacency matrix, [MATH: N :MATH] is the normal distribution, [MATH: zi=GNNμ(A,Ze) :MATH] is the means of adjacency matrix, [MATH: Ze :MATH] and [MATH: Zg :MATH] are introduced in the section of self-supervised residual learning for latent embedding, [MATH: logσ=GNNσ(A,Ze) :MATH] is the variance between the node vectors. The two-layer GNN is defined as: [MATH: GNNZe,A=AReLUAZeWoW1, :MATH] 10 where [MATH: GNNμ(Ze,A) :MATH] and [MATH: GNNσ(Ze,A) :MATH] share the same weight [MATH: W0 :MATH] , but [MATH: W1 :MATH] is different. [MATH: A :MATH] is the symmetrically normalized adjacency matrix, as shown in Eq. ([210]7). Self-supervised residual learning for latent embedding We propose a deep autoencoder, a graph self-supervised residual learning model to aggregate gene expression, biological effects, spatial location, and morphological information for capturing nonlinear relationships between a spot and surrounding spots. The deep autoencoder consists of an encoder, a graph autoencoder, and a decoder. As shown in Fig. [211]1c, the deep autoencoder learns low-dimensional gene representation by encoder. Next, low-dimensional gene representation is mapped to spatial embedding (residual) by the graph autoencoder. Finally, low-dimensional gene representation from the encoder is mapped through identity shortcuts and concatenates with the residual for composing latent embedding. Identity shortcuts are never closed, and low-dimensional gene representation is always passed through, with additional residual functions to be learned. We feed the gene expression vector X into the encoder E after data preprocessing. The encoder [MATH: E :MATH] is composed of multiple fully connected layers, which can denoise the input data and convert the input [MATH: XRn× 200 :MATH] into a low-dimensional gene representation [MATH: ZeRn×He :MATH] , where [MATH: He :MATH]  = 20 is the dimension of the last encoder layer. The number and dimension of the fully connected layers can be defined by the user and can be expressed as: [MATH: EX=Ze. :MATH] 11 Additionally, the low-dimensional gene representation and the cell-cell spatial relationship graph G are spatially embedded by two graph neural network (GNN) layers, resulting in a spatial embedding [MATH: ZgRn×Hg :MATH] , which can be expressed as: [MATH: GNNZe=Zg, :MATH] 12 where [MATH: Hg :MATH]  = 8 is the dimension of the last GNN layer. Finally, the decoder [MATH: D :MATH] is composed of multiple fully connected layers that decode the latent embedding [MATH: ZRn× H :MATH] and reconstruct the gene expression matrix [MATH: XRn× 200 :MATH] . The latent embedding is composed of the low-dimensional gene representation and spatial embedding with “shortcut connections”. This residual learning structure can facilitate the learning of more nonlinear information from gene expression, biological effects, spatial information, and morphological information, and it can be expressed as follows: [MATH: DZ=X, :MATH] 13 [MATH: H=He+Hg< /msub>, :MATH] 14 where n is the number of cells or spots. The loss function L includes the reconstruction loss between the reconstructed gene expression matrix and the original gene expression matrix, and the Kullback–Leibler divergence of the node representation vector distribution and the normal distribution, as shown below: [MATH: L=1ni=1 n XiXi2 +KL[q (ZgA,Ze)p(Zg)], :MATH] 15 where n, [MATH: Xi :MATH] and [MATH: Xi :MATH] are the same as mentioned above, q is posterior probability the same as Eq. ([212]8), [MATH: p(Zg)=i< /mrow>N(zi0,I) :MATH] . Training strategies The training of ResST model involves several key strategies. Initially, ST data are processed to produce the input matrix X and adjacency matrix A for the model. Firstly, we initialize ResST and configure the Adam optimizer along with specific hyperparameters, such as learning rate and weight decay. In this study, the Adam optimizer’s learning rate is 0.0005, and weight decay is 0.0004. During the pre-training phase, we employ a noise function to augment the input data with a masking ratio of 0.001. Concurrently, we calculate the loss function, which comprises the reconstruction error and KL divergence. Following the pre-training phase, the Kmeans algorithm is utilized to perform initial clustering on the pre-trained latent embeddings. The model then progresses to the formal training phase. In this phase, we periodically recalculate the target distribution and update the clustering labels, while optimize the model parameters by minimizing the loss function. The loss function now includes reconstruction error, KL divergence, and adversarial loss. Within the optimize_cluster function, the CH index is used to assess clustering performance, and the resolution with the highest CH score is selected as the optimal resolution. During training, KL divergence is employed to measure the difference between the predicted distribution and the target distribution for preventing overfitting. Throughout the training, we monitor the total loss (including reconstruction error, KL divergence, and adversarial loss) and stop training when the total loss reaches a preset tolerance (0.001). Other hyperparameters include KL divergence weight (0.1), mean squared error weight (10), and binary cross-entropy weight (1) to balance different loss terms in the total loss. Batch effects correction for spatial transcriptomics Integrated analysis of multiple tissue sections can provide deeper insights into a comprehensive understanding of biological functions. Integration of ST datasets is typically categorized into two scenarios: vertical integration and horizontal integration. The major challenges for integrated analyses are the presence of batch effects among different sections and accurately assigning spots to the correct spatial domains. To overcome these challenges, we extend ResST model, which applies MDD to correct batch effects and align gene expression from different batches into the same feature space. Compared to other methods, ResST efficiently conducts vertical or horizontal integration from multiple tissue sections while correcting batch effects and preserving layer-specific gene expression variations. In ST data from different batches, there are no direct spatial relationships among spots from different sections. For vertical integration, we align the spatial locations from ST data based on H&E images, resulting in adjacent space among different sections. Subsequently, we construct the cell-cell spatial relationship graph using the aligned spatial coordinates. For horizontal integration, we use one of the sections as a reference and transform the spatial coordinates of the other sections to align the domains straddling the jointing edges. The model uses A^m to represent the adjacency matrix of the mth batch and X^m to represent the gene expression matrix of the mth batch. This study creates a block diagonal matrix [MATH: A :MATH] , and concatenates the gene expression matrix [MATH: X :MATH] with it, as shown below: [MATH: A=A100Am,X=X1Xm, :MATH] 16 where m is the number of ST batches. Next, we apply MDD to reduce batch effects and map the gene expression from different batches to the same feature space, resulting in maximal spatial proximity of different batches. MDD consists of a feature extractor, domain classifier, and label classifier. The feature extractor and label classifier share the encoder and decoder parts of the deep autoencoder, which are composed of joint linear layers and graphical neural networks. The domain classifier is composed of joint linear layers, with a user-defined number of hidden layers (default is 2). A gradient reversal layer (GRL) is added before the domain classifier to blur the boundary between the source and target domain. We train the feature extractor and extract shared features from the source and target domain using Eq. ([213]15). The domain classifier further maps the latent embeddings Z outputted by the feature extractor to a new D-dimensional representation: [MATH: ZRD :MATH] . We train the domain classifier using binary cross-entropy loss function to maximize the adversarial loss, thus increasing the dissimilarity between the source and target domain, as shown in the equation below: [MATH: L=1n i=1 nyilogpyi+1y< mi>ilog1pyi, :MATH] 17 where [MATH: yi :MATH] is the sample [MATH: i :MATH] , if the prediction label is equal to the true label, [MATH: yi=1 :MATH] , otherwise [MATH: yi=0 :MATH] ; [MATH: pyi :MATH] represents the probability that the observed sample [MATH: i :MATH] predicts correctly. Spatial domain identification and visualization by clustering After model training, the Leiden algorithm in the Scanpy^[214]54 package is used to identify the spots into different spatial domains based on latent embeddings obtained by ResST. ResST achieves the optimal resolution in two ways. If the number of spatial domains is known, a grid search is conducted over resolutions ranging from 0.1 to 2.5 with a step size of 0.001. If the number of spatial domains is unknown, the optimal resolution is determined by conducting a grid search over resolutions ranging from 0.1 to 2.5 with a step size of 0.001 and simultaneously calculating the CH score. The resolution with the highest CH score is selected as the optimal resolution. The Leiden clustering algorithm is used to form clusters based on optimal resolution, and each cell or spot is assigned a label. Subsequently, the clustering results are visualized on tissue sections using the Scanpy package. Although the latent embedding is obtained using gene expression, spatial locations, complex biological effects, and morphological information, some spots may be erroneously assigned to different spatial domains when utilizing latent embedding for clustering. We consider these occurrences as noise and acknowledge their potential impact on downstream biological analysis. To address this, we reassign incorrectly allocated spots to the most frequently adjacent region. During vertical integration, we perform UMAP analysis using the RunUMAP function from Seurat, with default parameters of min_dist set to 0.5 and n_neighbors set to 15. The feature space derived from ResST, Harmony, and Scanorama integrate multiple ST data sections and extracted features for UMAP analysis. To correct the UMAP results, we employ scDEED^[215]55 tool to detect dubious cells and optimize the UMAP analysis hyperparameters. We run scDEED with candidate n_neighbors values of 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 80, 160, and 240, and min_dist values of 0.0125, 0.05, 0.1, 0.3, 0.5, 0.7, and 0.9. After removing the dubious cells, we visualize the UMAP results. Identification of spatially variable genes The Wilcoxon rank-sum test in Scanpy package is used to confirm highly variable genes for each spatial domain as well as differentially expressed genes between spatial domains. We then use rule^[216]15 in SpaGCN package to filter the differentially expressed genes: (1) the percentage of spots expressing the gene in the target domain, that is, in-fraction, is >80%; (2) for each neighboring domain, the ratio of the percentages of spots expressing the gene in the target domain and the neighboring domain(s), that is, in/out fraction ratio, is >1; and (3) the expression fold change between the target and neighboring domain(s) is >1.5. Genes with log[2]FC ≥ 2 are selected as input, and the gene enrichment analysis is performed using the online tool Metascape^[217]56. Determination of ligand-receptor interactions between cells For human breast cancer samples, we analyze the ligand-receptor interactions by stLearn package. For each cell or spot, the expression levels of their receptors and ligands are calculated, and a receptor-ligand expression matrix is formed. Then, we calculate the expression scores of ligands or receptors in each cell or spot and display regions with scores less than 7000. Statistics & reproducibility In our case analysis, we utilized the scipy package, a toolkit for scientific computing, to compare paired samples or two independent samples within the study. To evaluate the stability and reproducibility of ResST, we conducted 15 tests on different ST datasets, utilizing the default parameters. Subsequently, we calculated the variance of several key metrics to assess the performance of ResST, including the ARI, FMI, SC, DB, and CH. Evaluation metrics In this study, we utilized several well-established metrics to evaluate the clustering results, providing a holistic view of the algorithm’s performance. The ARI was used to measure the similarity between the clustering outcome and the ground truth, with a focus on correcting for the chance grouping. The FMI, known for its effectiveness in scenarios with a large number of clusters, gauged the clustering quality based on precision and recall. To assess the cohesion and separation of the clusters, we employed the SC, which offers insights into how well each object lies within its cluster. Additionally, the DB was utilized to evaluate the clustering compactness and separation, where a lower value indicates better clustering. Finally, the CH was used to measure the variance ratio between clusters, with higher values signifying more distinct clustering. This combination of metrics provided a robust framework for evaluating and validating our clustering approach. Comparison with other methods To evaluate the clustering performance of ResST in identifying spatial domains, we compared it with Kmeans, Seurat, SpaGCN, stLearn, SEDR, CCST, and BayesSpace. (1) Kmeans and Seurat implemented in the R package Seurat. (2) SpaGCN implemented in the Python package SpaGCN. (3) stLearn implemented in the Python package stLearn. (4) SEDR implemented in the Python package SEDR. (5) CCST implemented in the Python package CCST. (6) BayesSpace implemented in the Python package BayesSpace. The default parameters were used for all methods. To evaluate the data integration performance of ResST, we compared it with Harmony and Scanorama which are implemented in the Python package Harmony and Scanorama. Reporting summary Further information on research design is available in the [218]Nature Portfolio Reporting Summary linked to this article. Supplementary information [219]Peer Review File^ (4.4MB, pdf) [220]Supplementary Information^ (11.1MB, pdf) [221]42003_2024_6814_MOESM3_ESM.pdf^ (40.9KB, pdf) Description of Additional Supplementary Files [222]Supplementary Data 1^ (37.5KB, xlsx) [223]Supplementary Data 2^ (12.9KB, xlsx) [224]Supplementary Data 3^ (123.3KB, xlsx) [225]Supplementary Data 4^ (58.4KB, xlsx) [226]Supplementary Data 5^ (11.8KB, xlsx) [227]Reporting Summary^ (1.6MB, pdf) Acknowledgements