Abstract

   Recent advances in spatial transcriptomics have enabled simultaneous
   preservation of high‐throughput gene expression profiles and the
   spatial context, enabling high‐resolution exploration of distinct
   regional characterization in tissue. To effectively understand the
   underlying biological mechanisms within tissue microenvironments, there
   is a requisite for methods that can accurately capture external spatial
   heterogeneity and interpret internal gene regulation from spatial
   transcriptomics data. However, current methods for region
   identification often lack the simultaneous characterizing of spatial
   structure and gene regulation, thereby limiting the ability of spatial
   dissection and gene interpretation. Here, stDCL is developed, a dual
   graph contrastive learning method to identify spatial domains and
   interpret gene regulation in spatial transcriptomics data. stDCL
   adaptively incorporates gene expression data and spatial information
   via a graph embedding autoencoder, thereby preserving critical
   information within the latent embedding representations. In addition,
   dual graph contrastive learning is proposed to train the model,
   ensuring that the latent embedding representation closely resembles the
   actual spatial distribution and exhibits cluster similarity.
   Benchmarking stDCL against other state‐of‐the‐art clustering methods
   using complex cortex datasets demonstrates its superior accuracy and
   effectiveness in identifying spatial domains. Our analysis of the
   imputation matrices generated by stDCL reveals its capability to
   reconstruct spatial hierarchical structures and refine differential
   expression assessment. Furthermore, it is demonstrated that the
   versatility of stDCL in interpretability of gene regulation, spatial
   heterogeneity at high resolution, and embryonic developmental patterns.
   In addition, it is also showed that stDCL can successfully annotate
   disease‐associated astrocyte subtypes in Alzheimer's disease and
   unravel multiple relevant pathways and regulatory mechanisms.

   Keywords: dual graph contrastive learning, gene regulation, graph
   contrastive learning, spatial heterogeneity
     __________________________________________________________________

   stDCL, a dual graph contrastive learning method, captures spatial
   heterogeneity and interprets gene regulation in spatial transcriptomics
   data. Integrating spatial and gene expression data through graph
   embeddings, stDCL provides robust spatial characterization and accurate
   region identification, reconstructs spatial hierarchies, and identifies
   disease‐associated cell subtypes, unveiling new insights into tissue
   microenvironments and disease mechanisms.

   graphic file with name ADVS-12-2410081-g009.jpg

1. Introduction

   Spatial transcriptomics involves a series of emerging genomic
   technologies that reveal gene expression in tissues with spatial
   context, thereby providing a deeper understanding of the structure and
   function of tissues at the spatial level.^[ [36]^1 ^] Several spatial
   transcriptomics technologies have been developed, including RNA
   sequencing‐based technologies such as 10x Visium,^[ [37]^2 ^]
   Slide‐seq,^[ [38]^3 ^] and spatial transcriptomics (ST),^[ [39]^4 ^]
   which enable genome‐wide analysis of gene expression by thousands
   across tissue locations. In addition, single‐molecule fluorescence in
   situ hybridization (smFISH)‐based technologies, such as MERFISH,^[
   [40]^5 , [41]^6 , [42]^7 ^] seqFISH,^[ [43]^8 , [44]^9 ^] and seqFISH
   plus,^[ [45]^10 ^] offer single‐cell resolution, bridging gene
   expression and spatial position. These advancements allow studying
   spatial gene expression patterns,^[ [46]^11 , [47]^12 ^] exploring the
   transcriptomic landscape of tissues, and inferring cell‐to‐cell
   communication.^[ [48]^13 ^] In general, spatial transcriptomics has
   facilitated new discoveries in many areas of biology.^[ [49]^14 ,
   [50]^15 ^]

   The spatial heterogeneity of complex tissue architecture is
   fundamentally related to the spatial domains of different regions.
   These domains, characterized by their distinct gene expression patterns
   and histological characteristics, underlie various biological functions
   and regulatory mechanisms. They are characterized by distinct cellular
   compositions, transcriptomic diversity, and cell‐to‐cell interactions,
   all of which are critical to understanding physiology and pathology.^[
   [51]^16 , [52]^17 , [53]^18 ^] Therefore, it is crucial to identify and
   determine the spatial domains in tissues. Traditional clustering
   algorithms, such as K‐means and Louvain, rely primarily on gene
   expression data and ignore critical spatial information, resulting in
   clustering results that fail to accurately represent the intricacies of
   the actual tissue. To mitigate this issue, several methodologies
   incorporating spatial information for enhanced identification of
   spatial domains have been developed. Among these, stLearn^[ [54]^19 ^]
   extracts morphological features from morphological images and spatial
   locations using deep learning models and normalizes gene expression
   using morphological distances. BayesSpace,^[ [55]^20 ^] a Bayesian
   statistical clustering method, imposes a prior for ST data and
   encourages spatially adjacent spots to belong to the same cluster.
   SpaGCN^[ [56]^21 ^] employs graph convolution networks to integrate
   gene expression with weighted graphs constructed from spatial distance
   and histology to identify spatial domains. STAGATE^[ [57]^22 ^]
   develops an adaptive graph attention autoencoder to learn
   low‐dimensional latent embeddings of ST data and identify spatial
   domains. CCST^[ [58]^23 ^] uses unsupervised GCNs to learn a cell
   embedding representation based on graphs extracted from spatial
   transcriptomics data. However, although these methods combine gene
   expression and spatial information to identify spatial domains, they
   are all based entirely on unsupervised learning, resulting in the
   learned embedding representations often lacking the fidelity required
   to accurately segment spatial domains, making it prone to generate
   discrepancies between identified domains and pathology annotations.

   In recent years, various methods rooted in self‐supervised learning
   have emerged to enhance the optimization of embedding representations
   and elevate the efficacy of spatial domain identification. SpaceFlow^[
   [59]^24 ^] uses a spatially regularized deep graph network to combine
   gene expression similarity with spatial information through a
   contrastive learning strategy. conST^[ [60]^25 ^] explores multiple
   patterns of ST data through a contrastive learning framework in two
   training stages and learns to embed representations beneficial for
   diverse downstream tasks. Similarly, GraphST^[ [61]^26 ^] introduces
   self‐supervised graph contrastive learning to learn informative
   representations of gene expression profiles alongside their spatial
   coordinates, thus enriching latent representation learning. ConGI^[
   [62]^27 ^] deciphers the spatial domains by adapting gene expression to
   histopathology images through contrastive learning. However, these
   methods typically lack ability to fully utilize the spatial information
   within their self‐supervised contrastive learning designs, resulting in
   inadequate guidance of model training by spatial information, thus
   perpetuating the difficulty of accurately identifying spatial domains.
   Moreover, existing methods are unable to optimize the latent embedding
   representations at both spatial and gene levels, thereby ignoring many
   key gene regulations.

   Here, we develop stDCL, a graph convolutional autoencoder framework
   that uses dual contrastive learning to identify spatial domains from
   spatial transcriptomics data. stDCL integrates spatial information and
   gene expression information through graph convolutional networks (GCNs)
   to learn the latent embedding representations and allow imputation of
   ST data. Then, stDCL employs dual graph contrastive learning
   (space‐aware contrastive learning and cluster‐level feature contrastive
   learning) to enable the embedding representation and imputation matrix
   to preserve the important spatial heterogeneity and gene regulation.
   The space‐aware contrastive learning improves the similarity between
   spots of close spatial location while reducing the similarity between
   spots of distant spatial location within the representation space. The
   cluster‐level feature contrastive learning brings the intra‐cluster
   spots closer and further away from the inter‐cluster spots in terms of
   feature dimensionality. Evaluation of stDCL's clustering performance on
   the human dorsolateral prefrontal cortex (DLPFC) 10x Visium dataset and
   frontal cortex MERFISH dataset demonstrated its superiority over eight
   state‐of‐the‐art methods for identifying spatial domains. In
   particular, stDCL exhibited robust performance even on complex tissue
   structures and detected substructures not detected by other methods.
   Additionally, we analyzed the interpretable functional genes in the
   latent embedding representation of stDCL and aggregated their spatial
   expression information to provide a reliable comprehension of multiple
   regions. Furthermore, the imputation of stDCL could effectively
   reconstruct the hierarchical structure of the spatial transcriptomics
   data, enhancing layer‐specific differential gene identification while
   mitigating noise. We also demonstrated that stDCL can reveal spatial
   heterogeneity between different regions at high spatial resolution and
   unravel patterns of brain development in mouse embryos. Moreover, stDCL
   proved valuable in identifying cellular phenotypes associated with
   Alzheimer's disease, elucidating the complex regulatory processes
   underlying its pathology.

2. Results

2.1. Overview of stDCL

   stDCL learns the low‐dimensional latent embedding representation from
   spatial information and gene expression information by dual graph
   contrastive losses. As depicted in Figure [63] 1 , the workflow of
   stDCL comprises four principal stages: (1) constructing a
   transcriptomics profile‐based spatial graph that merges gene expression
   and spatial data from spatial transcriptomics (ST) data; (2) deriving
   the low‐dimensional latent representations using a graph convolutional
   autoencoder; (3) bringing spatially neighboring spots closer and
   separating non‐neighboring spots via space‐aware contrastive learning;
   and (4) utilizing cluster‐level feature contrastive learning to augment
   the correlation between homologous dimensional genes based on
   cluster‐level representations derived from the readout function.

Figure 1.

   Figure 1
   [64]Open in a new tab

   Overview of stDCL. a) The graph construction process for spatial
   transcriptomics data via spatial location and gene expression. b) stDCL
   learns the low‐dimensional latent representation and the imputation
   matrix with spatial information and gene expression information via the
   graph convolutional autoencoder. c) stDCL uses a weight‐shared GCN
   encoder to encode the corrupted gene expression matrix generated by
   data augmentation and obtains the latent embedding representations in
   different views. The space‐aware contrastive learning is used to
   further learn the relationships between spots on the latent
   representation. d) stDCL uses the cluster‐level feature contrastive
   learning to strengthen the correlation of the same dimension features
   in two views and improve the clustering performance of the model. e)
   The application capabilities of latent embedding representations
   trained and imputation matrices by stDCL.

   stDCL first constructs a transcriptomics profile‐based spatial graph by
   integrating gene expression information and spatial information to
   jointly characterize and balance the relationship between spots at
   gene‐specific and spatial levels (Figure [65]1a). Subsequently, the
   process continues with the use of a preprocessed gene expression matrix
   and the neighborhood graph as inputs into graph convolutional networks
   (GCNs), serving both as encoder and decoder to effectively preserve the
   underlying topological structure inherent in the latent embedding
   representations (Figure [66]1b). At the end, through optimizing the
   expression matrix and the loss function associated with graph
   reconstruction, stDCL learns to derive low‐dimensional latent embedding
   representations and imputation matrices, leveraging the spatial and
   gene expression information.

   In addition, we propose a dual graph contrastive learning model in
   stDCL to learn graph‐level discriminative representations efficiently.
   We first generate a corrupted gene expression matrix by adding Gaussian
   noise to the original gene expression matrix to obtain different views
   of genetic characterization for data augmentation. Then, stDCL feeds
   the corrupted gene expression matrix into the shared encoder, yielding
   cross‐view latent characterizations. Further, to preserve the
   underlying spatial similarities and topological relationships between
   cellular spots, we propose a space‐aware contrastive learning approach
   that harnesses spatial information to ensure alignment between the
   spots in the embedding representations and their true spatial
   distribution (Figure [67]1c). Specifically, we calculate the mean
   squared error (MSE) of the cosine similarity matrix derived from the
   cross‐view representations utilizing the neighborhood graph,
   effectively assigning higher similarity scores to spots in close
   spatial proximity and lower scores to those in distant spatial
   locations during the training process. On this basis, we also propose a
   cluster‐level feature contrastive learning strategy to enhance the
   spatial clustering proficiency of our model (Figure [68]1d), which
   utilizes a readout function to calculate cluster‐level embedding
   representations, achieved by averaging the cross‐view representations
   for each cluster. Subsequently, we compute the mean squared error (MSE)
   between the identity matrix and the cosine similarity matrix of the
   cluster‐level embedding representations to draw representations of the
   identical dimensional features closer together. Finally, the
   cluster‐level feature contrastive learning approach is proposed to
   strengthen intra‐cluster connectivity and weakens inter‐cluster
   connectivity from the perspective of gene dimension.

2.2. stDCL Accurately Identifies Spatial Heterogeneity from Spatial
Transcriptomic Profiling of DLPFC

   To evaluate the spatial clustering performance of stDCL, we used it on
   the human dorsolateral prefrontal cortex (DLPFC) 10x Visium dataset^[
   [69]^28 ^] from brain tissue, which contains spatially resolved
   transcriptomic profiles of 12 DLPFC slices. The cortical layers (L1‐L6)
   and the white matter (WM) regions within each slice have been carefully
   identified by manual annotation, relying on morphological
   characteristics and genetic markers, as described by Maynard et al. ^[
   [70]^28 ^] To ensure a comprehensive comparison, we conducted a
   comparative analysis of stDCL against eight state‐of‐the‐art methods
   (Seurat,^[ [71]^29 ^] SpaGCN,^[ [72]^21 ^] stLearn,^[ [73]^19 ^]
   BayesSpace,^[ [74]^20 ^] SEDR,^[ [75]^30 ^] CCST,^[ [76]^23 ^]
   STAGATE,^[ [77]^22 ^] and GraphST^[ [78]^26 ^]) on the DLPFC dataset.
   Figure [79] 2a shows the results using the Adjusted Rand Index (ARI)
   metric to compare performance between stDCL and the other methods on
   the 12 tissue slices. We observe that stDCL achieved the highest ARI
   score on 8 of the 12 datasets. We depict the overall clustering
   performance of the different methods on the 12 tissue slices using
   boxplots, where the middle line and white diamond represent the median
   and mean ARI values, respectively. On all 12 slices, stDCL showed
   superior performance with the highest mean (ARI = 0.61) and median (ARI
   = 0.61) scores, exhibiting the least variance among the methods
   evaluated, as depicted in Figure [80]2b. GraphST also demonstrated
   remarkable spatial clustering performance, with a median score reaching
   0.59. In contrast, the mean and median scores of the other methods were
   not above 0.50, with both BayesSpace and CCST displaying high variance,
   with significant performance fluctuations across the several tissue
   slices. For a more detailed assessment, we further utilized the
   normalized mutual information (NMI) and homogeneity score (HS) metrics
   to evaluate the clustering performance of stDCL (Figure [81]S1,
   Supporting Information). Notably, the experimental results indicated
   that stDCL outperformed the other methods with both metrics.

Figure 2.

   Figure 2
   [82]Open in a new tab

   stDCL accurately identifies spatial heterogeneity from spatial
   transcriptomic profiling of DLPFC. a) Comparison of ARI values between
   stDCL and 8 spatial transcriptomics clustering methods on 12 DLPFC
   slices (n = 1 in each group). b) The box plot of clustering accuracy on
   all 12 DLPFC slice datasets in terms of ARI score for 9 methods (n = 12
   in each group; center line, median; box limits, upper and lower
   quartiles; whiskers, 1.5× interquartile range). c) The manually
   annotated layer structure compared to spatial domains detected by
   Seurat, SpaGCN, stLearn, BayesSpace, SEDR, CCST, STAGATE, GraphST, and
   stDCL on slice 151508.

   We next take tissue slice 151508 as an example, which contains 4384
   spots, both stDCL and STAGATE achieved superior clustering performance,
   and successfully identified the spatial domains that closely match the
   manually annotated tissue layers (Figure [83]2c). In contrast, methods
   such as Seurat and SEDR encountered difficulty in clearly defining
   layer boundaries, leading to numerous outliers that compromised the
   spatial structuring in the brain region. SpaGCN failed to separate the
   spots within layers L4‐L6 based on the hierarchical structure, whereas
   for stLearn and CCST, cluster most spots in layers L4‐L6 into a single
   expansive cluster. Overall, stDCL can effectively identify the cortical
   layer structures in the DLPFC dataset with stability and robustness
   across several slices. Figures [84]S2– [85]S4 (Supporting Information)
   detail the outcomes for the additional slices.

2.3. stDCL Provides Better Clustering Performance on Complex Spatial Domains
and Tissue Structures

   In the next benchmark experiment, we applied stDCL to two 10x Visium
   datasets with more complex tissue structures (the mouse anterior brain
   dataset and the human breast cancer dataset), and compared it with
   other state‐of‐the‐art clustering methods. Long et al. ^[ [86]^26 ^]
   provided manual annotations for this mouse dataset in accordance with
   the Allen Mouse Brain Reference Atlas, identifying 52 clusters, as
   depicted in Figure [87] 3a. The experimental results revealed that
   stDCL outperformed the other methods in terms of clustering performance
   on this dataset (Figure [88]3b; Figure [89]S5, Supporting Information).
   From the spatial clustering results, it becomes evident that only stDCL
   succeeded in accurately capturing the ‘CPu’ domain and fully
   identifying the “St” domain, while other methods produced aberrant
   tissue structures or disorganized spot distributions. Furthermore, the
   clusters identified by Seurat, SpaGCN, and SEDR showed spatial
   fragmentation, and the boundaries defined by stLearn and BayesSpace
   were indistinct. Of the remaining three methods, CCST and STAGATE
   introduced unnecessary layer structures, while GraphST formed
   discontinuous clusters. In summary, on spatial transcriptomics datasets
   with intricate organizational structures, stDCL demonstrated its
   capacity to effectively cluster spatial domains that align with
   biological annotations and identify several substructures.

Figure 3.

   Figure 3
   [90]Open in a new tab

   Spatial clustering performance on datasets with complex tissue
   structures.) The spatial domains of the mouse anterior brain dataset
   manually annotated by the Allen Mouse Brain Reference Atlas. b) The
   spatial domains detected by different methods on the mouse brain
   dataset. c) H&E images and manual annotation on the human breast cancer
   dataset. d) Comparisons of ARI values between stDCL and 8 spatial
   transcriptomics clustering methods on the human breast cancer dataset
   (n = 1 in each group). e) Spatial domains detected by different methods
   on the human breast cancer dataset.

   For the human breast cancer dataset, the dataset was manually annotated
   with 20 regions by pathologists based on the H&E image and reported
   breast cancer marker genes (Figure [91]3c). stDCL still maintained the
   best clustering performance on this dataset. The ARI value of stDCL
   improved by 7% compared with other methods (Figure [92]3d), and it also
   outperformed other methods in the evaluation metrics of NMI and HS
   (Figure [93]S6, Supporting Information). The spatial domains identified
   by stDCL were the closest to manual annotation (Figure [94]3e).
   Specially, for regions “IDC 5” and “Healthy 1,” only stDCL clearly
   captures most of the spots as a cluster, while other methods failed.
   Besides stDCL, CCST also had higher ARI value and better spatial
   clustering performance on the human breast cancer dataset, while
   Seurat, SpaGCN and STAGATE performed poorer clustering results on this
   dataset. In summary, on spatial transcriptomics datasets with complex
   organizational structures, scDCL was able to cluster spatial domains
   consistent with biological annotations and identify some substructures.

2.4. stDCL Efficiently Demarcates Tissue Structures in Spatial Transcriptomic
Profiling Across Different Platforms

   To test the spatial clustering performance of stDCL using various
   spatial transcriptomics technologies, we used it on different platforms
   to analyze the MERFISH dataset,^[ [95]^31 ^] with 9 tissue slices from
   the frontal cortex and striatum of mouse brain of diverse ages. From
   the 9 slices, slices 4‐0, 4‐1, 4‐2, 6‐0, 6‐1, and 6‐2 correspond to
   mouse brain at 4 weeks, and slices 8‐0, 8‐1, and 8‐2 are mouse brain at
   90 weeks. Distinct tissue regions were manually delineated on each
   slice, indicating the corpus callosum, pia mater, striatum, olfactory
   region, brain ventricle, cortical layer V, cortical layer VI, and
   cortical layer II. We evaluated the performance of the algorithm on
   different platforms and further conducted a comparative analysis
   comparing stDCL with four well‐known spatial clustering methods,
   including GraphST, STAGATE, CCST, and SEDR. As depicted in Figure [96]
   4a, stDCL demonstrated superior spatial clustering performance on all 9
   tissue slices compared to other methods, with stDCL achieving the
   highest ARI on 7 of them. In terms of overall clustering performance,
   stDCL exhibited the highest mean score (ARI = 0.63) and median (ARI =
   0.62) of all methods across all 9 slices (Figure [97]4b). Notably,
   stDCL attained an ARI of 0.78 on slice 8‐0, significantly outperforming
   the other methods. The mean and median scores of the other methods were
   below 0.55, with SEDR proving ineffective for spatial clustering on the
   MERFISH dataset. Additionally, we employed NMI and HS metrics to
   provide a comprehensive assessment of the performance of stDCL on the
   MERFISH dataset (Figure [98]S7, Supporting Information). The
   experimental findings indicated that stDCL surpasses the other methods
   measured by both metrics, and notably, stDCL achieved the highest HS
   scores on all slices.

Figure 4.

   Figure 4
   [99]Open in a new tab

   stDCL efficiently demarcates tissue structures in spatial
   transcriptomic profiling across different platforms. a) Comparisons of
   ARI values between stDCL and 4 spatial transcriptomics clustering
   methods on 9 slices from the MERFISH dataset (n = 1 in each group). b)
   The box plot of clustering accuracy on all 9 slice datasets in terms of
   ARI score (n = 9 in each group; center line, median; box limits, upper
   and lower quartiles; whiskers, 1.5× interquartile range). The manually
   annotated tissue structure compared to spatial domains detected by
   stDCL, GraphST, STAGATE, CCST and SEDR on c slice 4‐0, d slice 6‐0 and
   e slice 8‐0.

   In the young mouse brain slices 4‐0 and 6‐0, and in an old brain slice
   8‐0, stDCL accurately identified the spatial domains, aligning closely
   with the manually annotated tissue structures (Figure [100]4c–e),
   distinguishing regions like the corpus callosum, pia mater, and
   striatum in both ages, which was challenging for the other methods.
   Specifically, the pia mater identified by GraphST tended to be
   conflated with other tissue structures, and the clarity of the striatum
   identification in old slice 8‐0 was markedly compromised compared to
   stDCL. STAGATE and CCST, failed to identify the pia mater, and the
   corpus callosum identified in young brain slice 6‐0 and striatum in old
   brain slice 8‐0 were incomplete. The spatial domains found by SEDR are
   relatively disordered and the spatial clustering performance poor.
   Moreover, Figures [101]S8–[102]S10 (Supporting Information) provide
   results for all other slices for an exhaustive evaluation. We aslo
   applied stDCL to the slices from Adult mouse central nervous system,
   obtained using the STARMap Plus technique.^[ [103]^32 ^] Distinct
   tissue regions were manually delineated on three slice, indicating the
   CB 1, CB 2, CTX 1, CTX 2, DG, and so on. As depicted in
   Figure [104]S11a (Supporting Information), stDCL demonstrated superior
   spatial clustering performance on all 3 tissue slices compared to other
   methods. In mouse central nervous system slices, stDCL accurately
   identified spatial domains that were tightly aligned with manually
   annotated tissue structures (Figure [105]S11b, Supporting Information).
   Notably, stDCL is able to successfully detect HY regions, whereas other
   methods either failed to detect or incorporated HY regions into other
   regions. For the other methods, GraphST is also able to identify most
   of the spatial domains, while CCST and SEDR detected regions that are
   more ambiguous. In conclusion, stDCL consistently exhibits effective
   performance in spatial clustering on several datasets utilizing various
   spatial transcriptomics technologies, and maintaining robust
   performance with different age of mouse brain.

   We also compared the runtime of stDCL with other clustering algorithms.
   For the test datasets, we adopted 19 datasets with 1200–30 000 cells
   (Figure [106]S12, Supporting Information). As shown in Figure [107]S12
   (Supporting Information), stDCL, GraphST, and SEDR demonstrate
   comparable execution times, while CCST exhibits the longest runtime
   among the methods. Notably, CCST's runtime escalates sharply once the
   cell count surpasses 5000, indicating a steep increase in computational
   demand at higher cell densities. For STAGATE, the growth rate in
   running time is particularly pronounced between 10 000 and 20 000
   cells, highlighting its sensitivity to increased cell numbers. Although
   stDCL's runtime is generally longer than that of GraphST, this is
   attributed to the added computational complexity introduced by its dual
   graph contrastive learning mechanism, which enhances model performance
   at the cost of increased processing time. In addition, we compared the
   memory usage of these methods at different numbers of cells
   (Figure [108]S13, Supporting Information). In Figure [109]S13
   (Supporting Information), we can observe that stDCL and GraphST
   maintain relatively consistent memory usage, even as the number of
   cells scales up, suggesting their efficiency in managing computational
   resources. However, CCST has significantly higher memory requirements,
   especially when the number of cells exceeds 5000, suggesting that its
   algorithmic structure may need to allocate a large amount of memory for
   larger datasets. SEDR maintains a lower memory usage. As the number of
   cells increases from 20 000 to 25 000, STAGATE's memory usage also
   increases significantly, reflecting the method's growing demand on
   resources for larger data sizes. In summary, despite incorporating dual
   graph contrastive learning, stDCL maintains runtime and memory usage
   within a manageable range.

2.5. The Latent Embedding Characterization of stDCL Aggregates Interpretable
Spatial Expression Information

   To investigate whether stDCL can capture critical spatial
   transcriptomics information, we conducted genome interpretability
   analysis on the latent embedding representation of stDCL using the
   mouse forebrain 10x Visium dataset. Initially, we identified two
   regions based on the manual annotation, namely MOB::ONL and VL
   (Figure [110] 5a). We then utilized an interpretable gene selection
   method^[ [111]^33 ^] informed by the standard deviation ranking of GNN
   weight matrices to select the top 1000 most highly expressed genes from
   the latent embedding representations of stDCL, GraphST, and STAGATE.
   Subsequently, we visualized the top 5 most highly expressed genes in
   regions MOB::ONL and VL using the spatial clustering methods and
   compared with the highly expressed genes obtained by manual annotation
   (Figure [112]5b). It can be observed that stDCL correctly identified
   the highly expressed genes in the VL region without mistaking genes
   from other spatial domains. Furthermore, stDCL detected highly
   expressed genes, such as Igf2, which were not identified by the other
   methods or even the manual annotation. For the olfactory bulb (MOB)
   olfactory nerve layer (ONL) MOB::ONL region, both stDCL and manual
   annotation found the highly expressed gene S100a5, while the other
   methods failed. To verify the significance of S100a5 expression in the
   MOB::ONL region, we visualized the expression of S100a5 in the mouse
   olfactory bulb datasets profiled by Stereo‐seq^[ [113]^34 ^] and
   Slide‐seqV2,^[ [114]^35 ^] respectively (Figure [115]5c). As
   illustrated in Figure [116]5c, the S100a5 gene is clearly highly
   expressed in the olfactory nerve layer. Recent studies have also
   demonstrated that S100a5 is mainly expressed in the olfactory bulb and
   olfactory sensory neurons (OSNs), with expression significantly
   upregulated in response to olfactory stimulation.^[ [117]^36 , [118]^37
   ^]

Figure 5.

   Figure 5
   [119]Open in a new tab

   The latent embedding characterization of stDCL aggregates interpretable
   spatial expression information. a) The spatial domains of the mouse
   anterior brain dataset manually annotated by the Allen Mouse Brain
   Reference Atlas (left). b) The dotplot of the top 5 highly expressed
   genes in MOB::ONL and VL in manual annotation, stDCL, GraphST and
   STAGATE. c) The visualization of expression of S100a5 on mouse
   olfactory bulb datasets, profiled by Stereo‐seq and Slide‐seqV2,
   respectively. d) Visualization of the spatial expression of genes Igf2
   and S100a5 denoised by stDCL.

   To further evaluate the ability of stDCL to aggregate and characterize
   the spatial expression information of the data, we computed the 15
   nearest neighbors for each spot based on the Euclidean distance in the
   latent embedding representation of stDCL and then averaged the gene
   expression data for these neighbors to yield a denoised gene expression
   matrix. As depicted in Figure [120]5d, compared to the raw gene
   expression data, the expression levels of the interpretable genes Igf2
   and S100a5, identified through the latent embedding representation,
   were elevated in their respective regions, and their enrichment levels
   were higher. Overall, these experimental results demonstrate that the
   latent embedding characterization of stDCL can uncover important
   interpretable genes and aggregate their spatial expression information,
   offering insights into the molecular architecture and function
   of tissue.

2.6. stDCL Efficiently Recovers Gene Expression to Characterize Spatial
Hierarchy and Differential Gene Expression in the Mouse Visual Cortex

   To investigate the reconstruction performance of stDCL in spatial
   hierarchy structure and denoising and enhancement of differential gene
   expression, we used mouse primary visual cortex (V1) profiled by
   STARmap.^[ [121]^38 ^] The dataset comprises 1,207 cells and 1,020
   genes, and the layer structure and cell types of the tissue slices are
   depicted in Figure [122] 6a. We first applied STAGATE, GraphST, and
   stDCL to the STARmap dataset and compared their clustering performance
   (Figure [123]S14, Supporting Information). The results demonstrated
   that stDCL surpassed the other methods. To evaluate the performance on
   reconstructing the hierarchical structure through the original spatial
   information, we first projected the imputation matrix generated by the
   three methods onto two dimensions as the imputation space using the
   scanpy package. We proceeded to compare the correlation of pairwise
   distances in the imputation space generated by stDCL, GraphST, and
   STAGATE with the original space, as depicted in Figure [124]6b. We see
   that the imputation space of stDCL exhibited a significantly higher
   correlation with the original space compared to the other two methods,
   with a Pearson correlation coefficient (PCC) of 0.502, while STAGATE
   and GraphST achieved PCC values of 0.298 and 0.345, respectively. We
   also used other correlation metrics for systematic comparison, and
   stDCL is the best in all four metrics (Figure [125]S15, Supporting
   Information). In addition, the imputation space of stDCL preserved the
   relative position between the L1‐L6 layers, that is, the distance to
   the L1 layer gradually increased from the inside to the outside,
   whereas the hierarchies reconstructed by STAGATE and GraphST did not
   show this tendency (Figure [126]S16, Supporting Information).

Figure 6.

   Figure 6
   [127]Open in a new tab

   stDCL can efficiently recover gene expression to characterize spatial
   hierarchy and differential gene expression in the mouse visual cortex.
   a) The layer structure and cell types of the tissue slices of the mouse
   primary visual cortex (V1) dataset. b) The density plot of the
   correlation of pairwise distances between spots in the original space
   and the imputation space generated by STAGATE, GraphST and stDCL. c)
   The spatial expression of layer‐specific genes. d) The expression trend
   of differential genes in each layer along the X‐axis in the original
   space and the imputation spaces generated by STAGATE, GraphST and
   stDCL, respectively.

   We also assessed the performance of stDCL imputation on denoising and
   enhancing differential gene expression results. Specifically, we
   focused on four layer‐specific differential genes, Pcp4 (L6), Rprm
   (L6), Tmsb10 (L5) and Mapk3 (L1), and compared their spatial expression
   levels in both the raw and imputed expression matrices (Figure [128]6c;
   Figure [129]S17, Supporting Information). These genes were specifically
   expressed at the corresponding spatial positions of the imputation
   matrix, whereas their raw spatial expression was inconspicuous and
   dispersed. We further investigated the expression trends of these
   layer‐specific genes along the X‐axis in the imputation space and the
   original space. As illustrated in Figure [130]6d, the imputation space
   of stDCL effectively preserved most of the expression patterns of genes
   along the layer axis, closely resembling the patterns observed in the
   original space. For other methods, GraphST retained only a part of the
   gene expression trend, while STAGATE displayed a broader trend. In
   particular, all three methods substantially preserved the expression
   pattern of the gene Tmsb10. Overall, imputation of stDCL not only
   successfully reconstructs the hierarchical structure of the spatial
   transcriptomics data, but also clarifies and strengthens the
   identification of layer‐specific differential genes to unravel spatial
   intricacies in the mouse visual cortex.

2.7. Evaluation of Hyperparameter Selection and Ablation Analysis

   We evaluated the effect of the number of selected genes, the number of
   neighbors and their metric distance used to construct the neighborhood
   graph, the loss weights, the network framework and other parameters on
   the clustering performance of stDCL on 12 DLPFC slice datasets. In
   addition, we investigated the effectiveness of each component of stDCL
   on the same dataset.

   The clustering performance of stDCL was evaluated using varying numbers
   of selected genes and different numbers of neighbors, as depicted in
   Figure [131] 7a and Figure [132]S18a (Supporting Information). The
   results revealed that stDCL achieved optimal clustering when utilizing
   3000 genes as input and constructing the neighborhood graph with three
   neighbors. Subsequently, stDCL was trained by three loss functions
   simultaneously, namely reconstruction loss, space‐aware contrastive
   loss, and cluster‐level feature contrastive loss. To ensure a
   harmonious balance of the overall loss function, we assigned weights to
   three parameters (γ[1], γ[2], and γ[3]) and explored various
   combinations of weights. Specifically, in accordance with our previous
   experiences, we selected γ[1], γ[2], and γ[3] in the range [1.0, 5.0,
   10.0], [0.2, 0.5, 0.8], or [0.2, 0.5, 0.8]. On this basis, we
   enumerated weights to obtain 27 distinct loss weight assignments and
   compared the clustering performance in each case (Figure [133]7b).
   Notably, among the 27 configurations, composition 24 yielded the
   highest average ARI value, where γ[1], γ[2], and γ[3] were set to [10,
   0.5, 0.8], respectively. We also employed NMI and HS metrics for
   further comparisons (Figure [134]S19, Supporting Information). We also
   systematically analyze the parameters alpha [1] and alpha [2] of the
   reconstruction loss and give the best settings as [1.0, 0.5]
   (Figure [135]S20, Supporting Information). To construct the
   neighborhood graph, we tested the clustering performance using
   different methods and distance metrics. As shown in Figure [136]7c and
   Figure [137]S18b (Supporting Information), stDCL achieved optimal
   clustering performance when constructing a KNN graph using the
   Euclidean distance metric. Regarding the network architecture, we
   evaluated the clustering performance of stDCL with varying numbers of
   hidden layers, as depicted in Figure [138]7d and Figure [139]S18c
   (Supporting Information). Experimental results indicated that stDCL was
   most effective in clustering with a single hidden layer. In addition,
   we also analyzed the number of nodes in the hidden layer, as
   illustrated in Figure [140]S21 (Supporting Information).

Figure 7.

   Figure 7
   [141]Open in a new tab

   Evaluation of hyperparameter selection and ablation analysis on 12
   DLPFC slices. a) The clustering performance was evaluated using ARI
   metric for different numbers of genes and numbers of neighbors (n = 12
   in each group; center line, median; box limits, upper and lower
   quartiles; whiskers, 1.5× interquartile range). b) The ARI bubble plot
   under different loss weight assignments. c) The radar plot of
   clustering performance for different ways and distance metrics of
   constructing the neighborhood graph. d) The ridgeline plot of
   clustering performance for different numbers of hidden layers in the
   graph autocoder (n = 12 in each group). e) The ablation study for the
   neighborhood graph and dual graph contrastive learning (n = 1 in each
   group).

   We conducted an ablation analysis on stDCL to assess the effectiveness
   of the neighborhood graph and dual graph contrastive learning
   mechanisms. Specifically, we compared the spatial clustering
   performance of stDCL with its partially ablated counterpart using the
   ARI metric. In the neighborhood graph module construction, we removed
   two components: spatial location (“Without Spatial”) and the gene
   expression matrix (“Without Gene”). In the contrastive learning module,
   we evaluated three ablation cases: (1) replacing the space‐aware
   contrastive loss function with the InfoNCE loss function (“InfoNCE”);
   (2) removing the cluster‐level feature contrastive loss function
   (“Without Clu”): (3) using only the InfoNCE loss function (“Without
   Clu(Info)”). As shown in Figure [142]7e and Figure [143]S22 (Supporting
   Information), the spatial clustering performance of stDCL outperformed
   the other arrangements. In addition, the clustering performance of
   stDCL exhibited greater stability, whereas the other arrangements
   displayed some degree of fluctuation. In particular, the spatial
   clustering performance of stDCL after removing gene expression
   information during the construction of the neighborhood graph was
   significantly higher than that of the other arrangements on slices
   151671 and 151672. This implies that spatial information plays a
   predominant role in the correlation of spots on these two slices, and
   the addition of gene expression information has an adverse impact on
   clustering performance. In summary, these experimental results
   highlight the effectiveness of both the neighborhood graph construction
   and the dual graph contrastive learning components in stDCL. To provide
   a more comprehensive analysis, we investigated how dual graph
   contrastive learning with stDCL contributes to the performance of
   various tasks, including spatial heterogeneity dissection, spatial
   hierarchy characterization and gene expression recovery. We evaluated
   three ablation cases: (1) using only the space‐aware contrastive
   learning (stDCL(Spa)); (2) using only the cluster‐level feature
   contrastive learning (stDCL(Clu)): (3) using only the InfoNCE loss
   function (stDCL(Info)). We study the performance of these cases on
   various tasks in more detail and provided the analysis and experimental
   results in Note [144]S8 (Supporting Information).

2.8. stDCL Reveals Molecular and Spatial Heterogeneity in Mouse Brain at
Single‐Cell Resolution

   To evaluate whether stDCL can explore molecular and spatial
   heterogeneity, we applied stDCL to analyze spatial transcriptomics data
   from the mouse brain at single‐cell resolution. We collected a spatial
   dataset obtained from the mouse somatosensory cortex acquired with
   osmFISH,^[ [145]^39 ^] which contains 4839 cells and 33 genes. The real
   annotations of this dataset, as illustrated in Figure [146] 8a, exhibit
   its origin at the Pia Layer 1 and encompass 11 cell‐wise regions. We
   compared the spatial clustering performance of STAGATE, GraphST and
   stDCL on this dataset using NMI and ARI metrics, and then visualized
   the clustering results (Figure [147]8b,c, Supporting Information). The
   experimental results indicate that stDCL significantly outperformed
   other methods, and the identified spatial domains are closer to the
   real annotation. To elaborate, in comparison to the other two methods,
   stDCL could distinguish Layer 4 and White matter more effectively,
   rendering the layer boundaries more distinctly defined. Furthermore, we
   validated that stDCL revealed heterogeneity in these spatial layers
   through the expression of marker genes^[ [148]^39 ^] native to osmFISH
   data (Figure [149]8d). For instance, Rorb, a marker gene for Layer 4
   pyramidal neurons, exhibited the expected high expression within Layer
   4 identified by stDCL. Plp1, a marker gene for oligodendrocytes,
   displayed elevated expression within the white matter, where various
   types of oligodendrocytes are known to reside. In addition, Flt1, a
   marker gene for endothelial cells in Pia Layer 1, and Slc32a1, a marker
   gene for interneurons in IC CP, were also found to be expressed as
   anticipated within their respective spatial locations. These findings
   collectively emphasize the capacity of stDCL to elucidate the molecular
   and spatial heterogeneity, further revealing spatial gene expression
   patterns. In addition, we also conducted trajectory inference on the
   relatively well‐defined domains identified by stDCL, such as Layer 4,
   Layer 6, and white matter (Figure [150]8e). The inferred trajectory
   projects from White matter to Layer 4, aligning with the established
   pattern of cortical development from the inside out,^[ [151]^40 ,
   [152]^41 ^] where new neurons migrate vertically along the radial glial
   fibers to the marginal area of the cortical periphery to form a new
   cortex on top of the existing layers.

Figure 8.

   Figure 8
   [153]Open in a new tab

   stDCL reveals molecular and spatial heterogeneity in mouse brain at
   single‐cell resolution. a) The ground truth of regional annotation for
   the osmFISH dataset. b) The spatial domains identified by STAGATE,
   GraphST and stDCL. c) The clustering performance comparison of the
   three methods. d) The visualization of domains identified by stDCL and
   the corresponding marker genes. e) The visualization of the trajectory
   inferred by stDCL.

   Subsequently, we conducted an analysis of mouse hypothalamic preoptic
   area tissue slices (Bregma‐0.14) obtained using the MERFISH Spatial
   Profiling Technology.^[ [154]^42 ^] This dataset contains 5926 cells
   and 155 genes, annotated across eight structural regions: bed nuclei of
   the strata terminalis (BST), medial preoptic area (MPA), medial
   preoptic nucleus (MPN), periventricular hypothalamic nucleus (PV),
   paraventricular hypothalamic nucleus (PVH), and paraventricular nucleus
   of the thalamus (PVT), third ventricle (V3), and columns of the fornix
   (fx) (Figure [155] 9a). We examined the spatial domain clustering
   results of STAGATE, GraphST, and stDCL (Figure [156]9b). It can be
   observed that stDCL surpassed the other methods in its ability to
   detect the majority of spatial domains, exhibiting a closer resemblance
   to the histological annotations. Specifically, stDCL was adept at
   identifying and delineating the BST region, achieving smoother
   boundaries than those produced by alternative approaches. In addition,
   stDCL was able to successfully identify the V3 region, a task that
   STAGATE failed to perform in the lower part of the V3 region, and
   GraphST incorrectly merged some V3 cells with other regions. Then, we
   performed enrichment analysis on the top 20 differentially expressed
   genes detected in the V3 region identified by stDCL (Figure [157]9c)
   and detected marker genes for this region (Figure [158]S26, Supporting
   Information). These genes are mainly enriched in response to glucose
   and monosaccharide; the reason being there are basically ependymal
   cells in the V3 region (Figure [159]9d), suggesting their potential
   involvement in glucose and monosaccharide transport.^[ [160]^43 ,
   [161]^44 ^] Moreover, ependymal cells, as a type of glial cell found in
   the central nervous system, are an indispensable component of the
   central nervous system.^[ [162]^45 ^] Therefore, the V3 region is also
   enriched for Gene Ontology (GO) terms associated with glial cells and
   the nervous system, such as glial cell projections and regulation of
   nervous system processes. In summary, stDCL demonstrated an exceptional
   capacity to accurately identify and characterize the complex molecular
   and spatial heterogeneity embedding within the mouse brain at the
   single‐cell resolution.

Figure 9.

   Figure 9
   [163]Open in a new tab

   stDCL reveals molecular and spatial heterogeneity in mouse brain at
   single‐cell resolution. a) The manual annotation of mouse hypothalamic
   regions with reference to the Allen Mouse Brain Atlas. b) The
   identified spatial domains are shown for STAGATE, GraphST and stDCL. c)
   The GO enrichment analysis (two‐sided Wilcoxon test and adopt BH to
   adjust p‐values for multiple comparisons). d) The true cell annotation
   of mouse hypothalamic MERFISH dataset.

2.9. stDCL Unravels Developmental Patterns in the Mouse Embryo Brain

   To investigate developmental patterns in the mouse embryonic brain, we
   applied stDCL to analyze multiple mouse embryo Stereo‐seq datasets^[
   [164]^34 ^] acquired at the E9.5 and E10.5 time points. We first
   performed clustering analysis on the E1S1 slices of E9.5 embryos using
   STAGATE, GraphST, and stDCL, followed by a comparison with the ground
   truth annotations (Figure [165] 10a,b). The reason for setting the
   number of clusters higher than the number of annotated tissue regions
   was to obtain higher resolution identification of tissue regions. As
   depicted in Figure [166]10b, the clustering results of stDCL exhibited
   a closer alignment with the annotations, enabling the clear detection
   of specific regions such as heart, liver, and dermatomyositis. In
   addition, we analyzed the correlation between the clusters identified
   by stDCL. Notably, stDCL uncovered a group of highly correlated
   clusters concentrated in the brain region (Figure [167]10c). To explore
   developmental patterns in the mouse embryonic brain, we integrated this
   group of clusters as a significant brain region detected by stDCL and
   investigated its distinctions from other regions. We first calculated
   the differentially expressed genes within this brain region and
   visualized the expression levels of the top 15 up‐regulated genes
   (Figure [168]10d,e; Figure [169]S27, Supporting Information). These
   genes are clearly highly expressed in the brain region compared to
   other regions. We further constructed a Protein–Protein Interaction
   (PPI) network for the up‐regulated differentially expressed genes via
   STRING^[ [170]^46 ^] and extracted the top‐scoring gene modules from
   the network using Molecular Complex Detection (MCODE)^[ [171]^47 ^]
   (Figure [172]10f). Additionally, we conducted Gene Ontology (GO)
   enrichment analysis, revealing that the highest‐scoring gene modules
   were associated with neurodevelopment (Figure [173]10g). In the early
   stages of mouse brain development, many neurons are generated, thus
   nervous system development plays an important role at this time.^[
   [174]^48 ^] Moreover, several pathways related to brain development,
   including neuronal development, neuronal differentiation, and forebrain
   development, were also enriched. These findings indicate that stDCL
   reveals the developmental patterns inherent in the mouse embryo.

Figure 10.

   Figure 10
   [175]Open in a new tab

   stDCL unravels developmental patterns in the mouse embryo brain. a) The
   ground truth of regional annotation for the E1S1 section of E9.5
   embryos. b) The spatial domains identified by STAGATE, GraphST and
   stDCL. c) The correlation matrix between clusters identified by stDCL.
   A group of highly correlated clusters concentrated in the brain region
   is marked by a black box. d) The violin plot of expression levels of
   the top 15 up‐regulated genes. e) The visualization of expression
   levels of the top up‐regulated genes in the brain region identified by
   stDCL. f) The PPI network of up‐regulated differential genes. The
   purple marked part is the highest scoring gene module in the network
   calculated by MCODE. g) The heat map of GO enrichment analysis.

2.10. stDCL Uncovers Biological Connections Across Multiple Mouse Embryo
Slices

   We performed an extensive clustering analysis on four distinct
   embryonic slices, denoted as E2S1, E2S2, E2S3, and E2S4, from E9.5
   embryos (Figure [176] 11a; Figure [177]S28, Supporting Information).
   The experimental results show that the clustering performance of stDCL
   was better than other clustering methods overall and can correctly
   identify most regions. On this basis, we examined the correlation
   between the clusters identified by stDCL on these embryo
   sections (Figure [178]11b). Remarkably, stDCL revealed a consistent
   correlation between the regions identified within the brain and those
   present within the spinal cord on all four slices (Figure [179]11c). To
   investigate the underlying biological significance, we aggregated these
   regions for KEGG pathway analysis of differentially expressed genes
   within the integrated regions (Figure [180]11d; Figure [181]S29,
   Supporting Information). It can be observed that up‐regulated
   differentially expressed genes are enriched for axon guidance and the
   Wnt signaling pathway in all four sections. This observation highlights
   their pivotal roles in axon guidance and Wnt signaling in the complex
   process of neuronal development, particularly in forming commissural
   circuits connecting the brain and spinal cord within the CNS.^[
   [182]^49 ^] The axon guidance mechanism is key to accomplishing this
   task.^[ [183]^50 , [184]^51 ^] In addition, the Wnt signaling pathway
   is also essential for the formation of neuronal circuits.^[ [185]^52 ^]
   To elucidate further, the Wnt signaling controls neuronal polarity,
   promotes axon growth, and regulates axon navigation to the final
   synaptic target.^[ [186]^53 , [187]^54 ^]

Figure 11.

   Figure 11
   [188]Open in a new tab

   stDCL uncovers biological connections across multiple mouse embryo
   slices. a The ground truth of regional annotation for the E1S1, E2S1,
   E2S2, E2S3, and E2S4 section of E9.5 embryos. The spatial domains
   identified by stDCL, GraphST, STAGATE, CCST and SEDR. b The correlation
   matrix between clusters identified by stDCL. Marked by the black box is
   a group of highly correlated clusters concentrated in the brain region.
   c The spatial domains identified by stDCL on four sections of E2S1,
   E2S2, E2S3, and E2S4 in E9.5 embryos (left). The regions of correlation
   between the brain and spinal cord identified by stDCL on all four
   sections (right). d The KEGG pathway enrichment analysis of the
   selected region on the E2S1 section. e The ground truth of regional
   annotation for the E1S1 section of E10.5 embryos (left). The regions of
   correlation between the brain and spinal cord identified by stDCL on
   the E1S1 section (right). f The GSVA enrichment pathway diagram of the
   selected region.

   To provide a more comprehensive analysis, we extended our investigation
   by conducting analogous clustering and correlation analyses on the E1S1
   section of E10.5 embryos, which has a denser spinal cord region
   distribution (Figure [189]11e; Figure [190]S30, Supporting
   Information). We also found the consistent correlation between brain
   and spinal cord regions on this slice, and the area of the correlated
   region was larger. We performed gene set variation analysis (GSVA) on
   the integrated regions and found that the Wnt signaling pathway was
   predominantly enriched (Figure [191]11f). This demonstrates that
   embryos at this stage, which have a denser spinal cord region, are
   important for the formation of commissural circuits connecting the
   brain and spinal cord within the CNS. In particular, we also enriched
   for various Interleukin (IL) signaling pathways, including IL‐2, IL‐3,
   IL‐6, and IL‐9. The IL signaling pathways highlight the importance of
   immune‐neural interactions, especially in response to injury or
   inflammation. This is particularly relevant for areas such as the
   hypothalamus and spinal cord, which are involved in both neuroendocrine
   regulation and immune responses. For example, in the spinal cord, IL‐6
   is involved in the regulation of chronic pain, especially neuropathic
   pain. It can promote inflammatory responses in the spinal cord by
   activating microglia and astrocytes, thereby affecting pain
   transmission.^[ [192]^55 ^] Collectively, these findings revealed that
   stDCL uncovers the biological connections between the brain and spinal
   cord in the mouse embryo.

2.11. stDCL Illustrates Underlying Regulatory Mechanisms in Alzheimer's
Disease

   Alzheimer's disease (AD), a progressive neurodegenerative disorder, is
   marked by distinct pathologies in brain tissue, including reactive
   astrocyte changes. To illustrate the underlying regulatory mechanisms,
   we applied stDCL to the STARmap PLUS dataset,^[ [193]^56 ^] which
   comprises AD and control mouse brains from 8‐month and 13‐month
   specimens (Figure [194] 12a). stDCL identifies several cell clusters,
   including astrocytes, endothelial cells, microglia, oligodendrocytes,
   etc (Figure [195]12b and Figure [196]S31, Supporting Information). We
   integrated all astrocytes identified by stDCL and performed
   subclustering analysis to obtain three distinct subpopulations Astro1,
   Astro2, and Astro3 (Figure [197]12c). Notably, the proportion of Astro3
   was higher in AD than in controls, and Astro3 was significantly
   increased in AD from 8 to 13 months, suggesting a dynamic cellular
   response in AD progression (Figure [198]12d). Compared to other
   astrocyte subtypes, Gfap, Vim, Cd9, and Igfbp5 were significantly
   upregulated in Astro3, which aligns with the disease‐associated
   astrocyte (DAA) phenotype reported in literature.^[ [199]^57 , [200]^58
   ^] Gfap is primarily expressed in astrocytes and significantly elevated
   in the brains of Alzheimer's disease (AD) patients, where it correlates
   with gliosis and the presence of A β; plaques.^[ [201]^59 , [202]^60 ^]
   As a potential blood biomarker for AD,^[ [203]^61 ^] Gfap could aid in
   diagnosing patients and tracking disease progression and therapeutic
   efficacy.^[ [204]^62 ^] Vim is expressed in response to damage in
   brains, and its upregulation in Alzheimer's disease may play a role in
   neuronal repair following synaptic disruption and dendrite
   retraction,^[ [205]^63 ^] although its precise function in this context
   remains to be clarified. To further validate the association of Astro3
   with the DAA‐like cell population, we introduced single nuclei RNA
   sequencing (sNuc‐seq) datasets of 10‐month‐old WT and AD (5xFAD) mouse
   brains ([206]GSE143758),^[ [207]^64 ^] including 1179 WT astrocytes and
   2079 disease‐associated astrocytes. We performed multimodal
   intersection analysis (MIA)^[ [208]^64 ^] using astrocytes from the
   sNuc‐seq dataset with astrocytes identified by stDCL in the STARmap
   PLUS dataset. MIA showed that these AD astrocytes were significantly
   enriched in the Astro3 subpopulation, WT astrocytes were predominantly
   enriched in the Astro2 subpopulation, and these cells were not
   significantly enriched in the Astro1 subpopulation (Figure [209]12g).
   In addition, gene expression of Astro3 was also more relevant to AD
   astrocytes compared to WT astrocytes. These results demonstrate the
   strong association of the Astro3 subpopulation with AD, affirming its
   alignment with the DAA‐like population and suggesting its proliferation
   within AD cells as the pathology advances.

Figure 12.

   Figure 12
   [210]Open in a new tab

   stDCL illustrates the underlying regulatory mechanisms in Alzheimer's
   disease. a) The spatial domains identified by stDCL on the STARmap PLUS
   dataset. b) The dot plot of marker gene expression levels in different
   cell populations identified by stDCL. c) The UMAP visualization of
   astrocyte subtypes. d) The comparison of different astrocyte subtypes
   proportions in AD and control samples at different times. e) The violin
   plot of expression levels of top marker genes in Astro3. f) The UMAP
   visualization of expression levels of top marker genes in Astro3. g)
   The MIA map of WT and AD astrocytes in sNuc‐seq and stDCL‐identified
   astrocyte subtypes in ST (left). The violin plot of expression levels
   of Astro3 gene in WT and AD astrocytes (right). h) The gene ontology
   enrichment analysis results of DAA‐like populations in astrocytes. i)
   The visualization of pseudotime trajectories of different astrocyte
   subtypes.

   Then, to further explore the AD‐related gene regulation in the Astro3
   subpopulation, we performed GO enrichment analysis (Figure [211]12h) on
   the differentially expressed genes. It can be observed that
   differentially expressed genes in Astro3 were involved in glial cell
   development and proliferation, amyloid fibril formation, and protein
   import, all of which are relevant to pathological processes. In
   addition, these genes were mainly enriched in the regulation of tau
   protein kinase activity, the imbalance of tau kinase and phosphatase
   activity is believed to lead to tau hyperphosphorylation in the
   disease,^[ [212]^65 ^] which is a major pathological characteristic of
   AD. Additionally, we also performed pseudotime trajectory inference
   across the three astrocyte subtypes, unveiling that Astro1 and Astro2,
   which are not associated with disease, represent the initial state of
   astrocytes, and progressively merge into the branch of the DAA‐like
   subtype Astro3, which is consistent with the trend of disease
   progression (Figure [213]12i).

3. Discussion

   Spatial transcriptomics enables the quantification of gene expression
   while simultaneously capturing spatial gene locations. The integration
   of gene expression and spatial information aids in the identification
   of spatial gene expression clusters, offering a pathway to elucidate
   spatial heterogeneity. Nevertheless, the effective integration of
   spatial data remains an ongoing challenge. Here, we have developed a
   graph‐based contrastive learning method called stDCL to identify
   spatial domains and uncover spatial heterogeneity. In particular, dual
   graph contrastive learning has been proposed to synergize both the
   space‐aware contrastive learning and cluster‐level feature contrastive
   learning for the single‐cell spatial dissections of tissues.

   Specifically, stDCL accurately identifies spatial domains in the human
   dorsolateral prefrontal cortex tissue, outperforming eight other
   state‐of‐the‐art methods in spatial clustering performance. In
   particular, stDCL can identify spatial domains that closely match with
   human annotations consistently on the datasets generated from different
   spatial transcriptomics techniques, while maintaining stable
   performance across different age groups. Based on the latent embedding
   representation, stDCL can uncover important interpretable gene
   candidates in tissue and aggregate their spatial expression
   information. Additionally, we have demonstrated stDCL's effectiveness
   in reconstructing hierarchical structures and enhancing differential
   expression analysis on the STARmap dataset. Compared to STAGATE and
   GraphST, the imputation matrix generated by stDCL preserves the
   relative position between layers in the original space and accurately
   reconstructs the expression patterns of individual layer‐specific
   differential genes. Furthermore, stDCL successfully characterized the
   complicated molecular and spatial heterogeneity within the mouse brain
   at single‐cell resolution.

   Notably, stDCL can unravel developmental patterns in the brain of mouse
   embryos. We applied stDCL to identify spatial domains in the E1S1 slice
   of E9.5 embryos. Interestingly, stDCL found a cluster of highly
   correlated regions concentrated within the brain area, elucidating gene
   regulation relevant to early stages of mouse brain development. In the
   downstream cluster analysis of additional slices, stDCL revealed the
   correlations between identified regions within the brain and spinal
   cord, highlighting the co‐enrichment of crucial axon guidance
   mechanisms involved in forming commissural circuits connecting these
   two regions. To provide a comprehensive analysis, we employed the E1S1
   slice of E10.5 embryos for validation, confirming the significant
   pathways implicated in the formation of neuronal circuits.

   Furthermore, stDCL can reveal the underlying regulatory mechanisms in
   Alzheimer's disease. We used stDCL to perform spatial clustering on
   Alzheimer's disease (AD) datasets, annotating astrocytes using marker
   genes. From the analysis of the astrocyte subgroup, it was found that
   the Astro3 subtype represented a higher proportion of diseased cells
   compared to control cells. Therefore, we further verified that subtype
   Astro3 was a DAA‐like population through the sNuc‐seq dataset and
   explored its AD‐related biological processes. In addition, we conducted
   pseudotime trajectory analysis to infer trends in astrocyte changes
   during disease progression.

   In summary, stDCL is a versatile approach based on dual graph
   contrastive learning to jointly analyze spatial transcriptomics
   datasets across various regions, considering both gene expression and
   spatial distribution. In the complex spatial structure, stDCL
   demonstrates the ability to identify spatial domains, enhance the
   interpretability of genes, recover spatial hierarchy, elucidate spatial
   heterogeneity, unveil developmental patterns, and elucidate disease
   regulatory mechanisms. We will continue to advance and refine the
   capabilities of stDCL with open‐source availability and reproducibility
   in the future.

4. Experimental Section

Data processing

   In this study, stDCL performs spatial clustering by utilizing both the
   gene expression matrix and spatial location information derived from
   spatial transcriptomics data. To elucidate this methodology further, a
   logarithmic transformation was initially applied to the raw gene
   expression matrix and subsequently normalized it based on library size
   using the SCANPY software package.^[ [214]^66 ^] After normalization,
   the gene expression counts were further scaled to unit variance and
   zero mean. For 10x Genomics and Stereo‐seq datasets, the top 3000
   highly variable genes as input were carefully selected. In the case of
   datasets with a smaller number of genes, such as STARmap, osmFISH, and
   MERFISH, it was opted to utilize all available genes as input without
   any additional selection process. This comprehensive approach ensured
   the suitability of this analysis for a wide range of spatial
   transcriptomic datasets.

Transcriptomics Profile‐Based Spatial Graph Construction

   A transcriptomics profile‐based spatial graph was developed to
   characterize the relationships between spots. In previous studies, most
   methodologies relied either on spatial information alone or on prior
   clustering outcomes to capture spot‐to‐spot relationships. However,
   this kind of graph constructed only through spatial location distance
   and pre‐trained information cannot effectively describe the
   relationship between spots, especially when they were spatially distant
   but belong to the same domain. To comprehensively characterize the
   relationships between spots, an undirected graph was designed that
   integrates both gene expression information and spatial information.
   Specifically, the Euclidean distance between each spot and other spots
   was calculated through their spatial location information and gene
   expression profiles, respectively. For spatial location information,
   the top k [1] nearest spots were selected to each spot as its
   neighbors, while opting for the top k [2] nearest spots for gene
   expression information. Then, connections (edges) between spots that
   share neighborhood status were established, yielding two adjacency
   matrices A ^spatial and A ^gene respectively, which is defined as
   follows:
   [MATH:
   <mrow><mrow><msubsup><mi>A</mi><mrow><mi>i</mi><mi>j</mi></mrow><mrow><
   mi>s</mi><mi>p</mi><mi>a</mi><mi>t</mi><mi>i</mi><mi>a</mi><mi>l</mi></
   mrow></msubsup><mo linebreak="badbreak">=</mo><mfenced separators=""
   open="{" close=""><mtable><mtr><mtd
   columnalign="left"><mrow><mn>1</mn><mo>,</mo></mrow></mtd><mtd
   columnalign="left"><mrow><mi>j</mi><mo>∈</mo><msub><mi>N</mi><msub><mi>
   k</mi><mn>1</mn></msub></msub><mrow><mo
   stretchy="false">(</mo><mi>i</mi><mo
   stretchy="false">)</mo></mrow></mrow></mtd></mtr><mtr><mtd
   columnalign="left"><mrow><mn>0</mn><mo>,</mo></mrow></mtd><mtd
   columnalign="left"><mrow><mi>j</mi><mo>∉</mo><msub><mi>N</mi><msub><mi>
   k</mi><mn>1</mn></msub></msub><mrow><mo
   stretchy="false">(</mo><mi>i</mi><mo
   stretchy="false">)</mo></mrow></mrow></mtd></mtr></mtable></mfenced><ms
   pace
   width="2em"></mspace><msubsup><mi>A</mi><mrow><mi>i</mi><mi>j</mi></mro
   w><mrow><mi>g</mi><mi>e</mi><mi>n</mi><mi>e</mi></mrow></msubsup><mo
   linebreak="goodbreak">=</mo><mfenced separators="" open="{"
   close=""><mtable><mtr><mtd
   columnalign="left"><mrow><mn>1</mn><mo>,</mo></mrow></mtd><mtd
   columnalign="left"><mrow><mi>j</mi><mo>∈</mo><msub><mi>N</mi><msub><mi>
   k</mi><mn>2</mn></msub></msub><mrow><mo
   stretchy="false">(</mo><mi>i</mi><mo
   stretchy="false">)</mo></mrow></mrow></mtd></mtr><mtr><mtd
   columnalign="left"><mrow><mn>0</mn><mo>,</mo></mrow></mtd><mtd
   columnalign="left"><mrow><mi>j</mi><mo>∉</mo><msub><mi>N</mi><msub><mi>
   k</mi><mn>2</mn></msub></msub><mrow><mo
   stretchy="false">(</mo><mi>i</mi><mo
   stretchy="false">)</mo></mrow></mrow></mtd></mtr></mtable></mfenced></m
   row></mrow> :MATH]
   (1)

   where
   [MATH:
   <mrow><mrow><msub><mi>N</mi><msub><mi>k</mi><mn>1</mn></msub></msub><mr
   ow><mo stretchy="false">(</mo><mi>i</mi><mo
   stretchy="false">)</mo></mrow></mrow></mrow> :MATH]
   and
   [MATH:
   <mrow><mrow><msub><mi>N</mi><msub><mi>k</mi><mn>2</mn></msub></msub><mr
   ow><mo stretchy="false">(</mo><mi>i</mi><mo
   stretchy="false">)</mo></mrow></mrow></mrow> :MATH]
   represent the neighborhood of the spot i for spatial information and
   gene expression information, respectively. Afterwards, the parameter α
   ∈ (0, 1) was introduced to adjust the weight between spots within A
   ^gene . Its primary purpose is to establish a relationship framework
   where spatial location information takes precedence, is supplemented,
   and refined by gene expression information. Finally, A ^spatial and A
   ^gene were integrated to yield a neighborhood graph A as follows:
   [MATH: <mrow><mrow><mi mathvariant="bold">A</mi><mo
   linebreak="badbreak">=</mo><mi>min</mi><mfenced separators="" open="("
   close=")"><msup><mi
   mathvariant="bold">A</mi><mrow><mi>s</mi><mi>p</mi><mi>a</mi><mi>t</mi>
   <mi>i</mi><mi>a</mi><mi>l</mi></mrow></msup><mo
   linebreak="goodbreak">+</mo><mi>α</mi><msup><mi
   mathvariant="bold">A</mi><mrow><mi>g</mi><mi>e</mi><mi>n</mi><mi>e</mi>
   </mrow></msup><mo>,</mo><mn>1</mn></mfenced></mrow></mrow> :MATH]
   (2)

   Among them, the reason for limiting the weights between spots to 1 is
   to prevent over‐correction of gene expression information.

Graph Convolutional Autoencoder

   To better integrate and capture gene expression information and spatial
   location information, a graph convolutional autoencoder was employed to
   learn the latent embedding representation for spatial transcriptomics
   data. Specifically, both the encoder and the decoder utilize a
   single‐layer graph convolutional network (GCN), taking the normalized
   gene expression matrix X and the adjacency matrix A constructed from
   spatial information as input. Therefore, for spatial transcriptomics
   data, the latent embedding representation Z generated by the encoder
   and the reconstructed gene expression matrix H generated by the decoder
   can be defined as:
   [MATH: <mrow><mrow><mi mathvariant="bold">Z</mi><mo
   linebreak="badbreak">=</mo><mi>σ</mi><mo
   stretchy="false">(</mo><msup><mi
   mathvariant="bold">D</mi><mrow><mo>−</mo><mfrac><mn>1</mn><mn>2</mn></m
   frac></mrow></msup><mi mathvariant="bold">A</mi><msup><mi
   mathvariant="bold">D</mi><mrow><mo>−</mo><mfrac><mn>1</mn><mn>2</mn></m
   frac></mrow></msup><mi mathvariant="bold">X</mi><msub><mi
   mathvariant="bold">W</mi><mi>e</mi></msub><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   (3)
   [MATH: <mrow><mrow><mi mathvariant="bold">H</mi><mo
   linebreak="badbreak">=</mo><mi>σ</mi><mo
   stretchy="false">(</mo><msup><mi
   mathvariant="bold">D</mi><mrow><mo>−</mo><mfrac><mn>1</mn><mn>2</mn></m
   frac></mrow></msup><mi mathvariant="bold">A</mi><msup><mi
   mathvariant="bold">D</mi><mrow><mo>−</mo><mfrac><mn>1</mn><mn>2</mn></m
   frac></mrow></msup><mi mathvariant="bold">Z</mi><msub><mi
   mathvariant="bold">W</mi><mi>d</mi></msub><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   (4)

   where σ(·) denotes the nonlinear activation function; D = diag{(I + A)1
   [n ]} (n is the total number of spots) represents the degree matrix; W
   [e ]and W [d ]are the parameter matrices in the encoder and decoder,
   respectively. Further, an inner product decoder was designed to compute
   the reconstructed adjacency matrix
   [MATH: <mrow><mover accent="true"><mi
   mathvariant="bold">A</mi><mo>^</mo></mover></mrow> :MATH]
   , which is defined as follows:
   [MATH: <mrow><mrow><mover accent="true"><mi
   mathvariant="bold">A</mi><mo>^</mo></mover><mo
   linebreak="badbreak">=</mo><mi>sigmoid</mi><mrow><mo
   stretchy="false">(</mo><msup><mi
   mathvariant="bold">Z</mi><mi>T</mi></msup><mi
   mathvariant="bold">Z</mi><mo
   stretchy="false">)</mo></mrow></mrow></mrow> :MATH]
   (5)

   where sigmoid(x) = 1/(1 + e ^−x ) represents the nonlinear activation
   function. To jointly extract critical gene expression information and
   spatial location information, the reconstruction loss of the gene
   expression matrix was minimized and the adjacency matrix as follows:
   [MATH: <mrow><mrow><msub><mi
   mathvariant="script">L</mi><mrow><mi>r</mi><mi>e</mi><mi>c</mi></mrow><
   /msub><mo
   linebreak="badbreak">=</mo><mfrac><mn>1</mn><mi>n</mi></mfrac><munderov
   er><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></mun
   derover><mfenced separators="" open="("
   close=")"><msub><mi>α</mi><mn>1</mn></msub><mrow><mo
   stretchy="false">|</mo><mo stretchy="false">|</mo></mrow><msub><mi
   mathvariant="bold">H</mi><mi>i</mi></msub><mo
   linebreak="goodbreak">−</mo><msub><mi
   mathvariant="bold">X</mi><mi>i</mi></msub><msup><mrow><mo
   stretchy="false">|</mo><mo
   stretchy="false">|</mo></mrow><mn>2</mn></msup><mo
   linebreak="goodbreak">+</mo><msub><mi>α</mi><mn>2</mn></msub><mrow><mo
   stretchy="false">|</mo><mo stretchy="false">|</mo></mrow><msub><mover
   accent="true"><mi
   mathvariant="bold">A</mi><mo>^</mo></mover><mi>i</mi></msub><mo>−</mo><
   msub><mi mathvariant="bold">A</mi><mi>i</mi></msub><msup><mrow><mo
   stretchy="false">|</mo><mo
   stretchy="false">|</mo></mrow><mn>2</mn></msup></mfenced></mrow></mrow>
   :MATH]
   (6)

   where i denotes the i‐th spot, α[1] and α[2] are weight coefficients.

Space‐Aware Contrastive Learning

   A space‐aware contrastive learning framework that leverages spatial
   information was proposed to ensure alignment between the spots in the
   embedding representations and their true spatial distribution. A
   space‐aware contrastive learning framework was proposed to improve the
   quality of the latent embedding representation through spatial
   information. First, data augmentation was performed to generate a
   corrupted gene expression matrix by adding Gaussian noise, which
   defined as:
   [MATH: <mrow><mrow><mover accent="true"><mi
   mathvariant="bold">Z</mi><mo>^</mo></mover><mo
   linebreak="badbreak">=</mo><mi
   mathvariant="bold">X</mi><mo>⊙</mo><msub><mi
   mathvariant="bold">N</mi><mi>g</mi></msub></mrow></mrow> :MATH]
   (7)

   where N [g ]represents the random noise matrices obeying the Gaussian
   distribution, ⊙ is the Hadamard product.
   [MATH: <mrow><mover accent="true"><mi
   mathvariant="bold">X</mi><mo>^</mo></mover></mrow> :MATH]
   was inserted into the encoder that shares parameters with X to generate
   the latent embedding representation
   [MATH: <mrow><mover accent="true"><mi
   mathvariant="bold">Z</mi><mo>^</mo></mover></mrow> :MATH]
   under a different perspective as follows:
   [MATH: <mrow><mrow><mover accent="true"><mi
   mathvariant="bold">Z</mi><mo>^</mo></mover><mo
   linebreak="badbreak">=</mo><mi>σ</mi><mrow><mo
   stretchy="false">(</mo><msup><mi
   mathvariant="bold">D</mi><mrow><mo>−</mo><mfrac><mn>1</mn><mn>2</mn></m
   frac></mrow></msup><mi mathvariant="bold">A</mi><msup><mi
   mathvariant="bold">D</mi><mrow><mo>−</mo><mfrac><mn>1</mn><mn>2</mn></m
   frac></mrow></msup><mover accent="true"><mi
   mathvariant="bold">X</mi><mo>^</mo></mover><msub><mi
   mathvariant="bold">W</mi><mi>e</mi></msub><mo
   stretchy="false">)</mo></mrow></mrow></mrow> :MATH]
   (8)

   Contrastive learning endeavors to optimize the likeness between
   positive pairs while concurrently minimizing the resemblance among
   negative pairs. Inspired by this principle, a space‐aware contrastive
   learning approach tailored was proposed to spatial transcriptomics
   data. The space‐aware contrastive loss is established on a reasonable
   assumption that adjacent spots are similar and non‐adjacent spots are
   dissimilar. Therefore, the key idea is to enhance the similarity of
   spatially adjacent spots and weaken the similarity of non‐adjacent
   spots. Specifically, the cosine similarity matrix S of cross‐view
   latent embedding representations was calculated, which is defined as
   follows:
   [MATH:
   <mrow><mrow><msub><mi>S</mi><mrow><mi>i</mi><mi>j</mi></mrow></msub><mo
   linebreak="badbreak">=</mo><mfrac><mrow><mrow><mo
   stretchy="false">(</mo><msub><mi
   mathvariant="bold">Z</mi><mi>i</mi></msub><mo
   stretchy="false">)</mo></mrow><msup><mrow><mo
   stretchy="false">(</mo><msub><mover accent="true"><mi
   mathvariant="bold">Z</mi><mo>^</mo></mover><mi>j</mi></msub><mo
   stretchy="false">)</mo></mrow><mi>T</mi></msup></mrow><mrow><mrow><mo
   stretchy="false">|</mo><mo stretchy="false">|</mo></mrow><msub><mi
   mathvariant="bold">Z</mi><mi>i</mi></msub><mrow><mo
   stretchy="false">|</mo><mo stretchy="false">|</mo><mo
   stretchy="false">|</mo><mo stretchy="false">|</mo></mrow><msub><mover
   accent="true"><mi
   mathvariant="bold">Z</mi><mo>^</mo></mover><mi>j</mi></msub><mrow><mo
   stretchy="false">|</mo><mo
   stretchy="false">|</mo></mrow></mrow></mfrac></mrow></mrow> :MATH]
   (9)

   where S [ij ]denotes the cosine similarity between the embedding
   representation of the i‐th spot in the first view and the embedding
   representation of the j‐th spot in the second view. To align the spot
   similarities with the spatial coordinate distribution, the mean square
   error (MSE) was employed between S and the adjacency matrix A as the
   contrastive loss, formulated as follows:
   [MATH: <mrow><mrow><msub><mi
   mathvariant="script">L</mi><mrow><mi>c</mi><mi>o</mi><mi>n</mi></mrow><
   /msub><mo
   linebreak="badbreak">=</mo><mfrac><mn>1</mn><msup><mi>n</mi><mn>2</mn><
   /msup></mfrac><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn
   ></mrow><mi>n</mi></munderover><munderover><mo>∑</mo><mrow><mi>j</mi><m
   o>=</mo><mn>1</mn></mrow><mi>n</mi></munderover><msup><mfenced
   separators="" open="("
   close=")"><msub><mi>S</mi><mrow><mi>i</mi><mi>j</mi></mrow></msub><mo>−
   </mo><msub><mi>A</mi><mrow><mi>i</mi><mi>j</mi></mrow></msub></mfenced>
   <mn>2</mn></msup></mrow></mrow> :MATH]
   (10)

   It is important to emphasize that during model training, as we
   minimized the contrastive loss, the similarity of spot pairs
   characterized by high weight in matrix A is increased while
   simultaneously observing a reduction in the similarity of pairs with
   lower weights. Obviously, this process clearly results in our
   space‐aware contrastive loss effectively bringing spatially adjacent
   spots into closer proximity and pushing non‐adjacent spots further
   apartx, ensuring that the embedding representation can effectively
   capture the spatial similarity inherent in the underlying
   organizational structure.

Cluster‐Level Feature Contrastive Learning

   Apart from adjusting the correlation of spots across views through
   spatial information, the correlation of features in the cluster‐level
   embedding representation was further optimized. The embedding
   representations of different views were equally divided into k groups
   (k is the number of clusters), calculated the average embedding within
   each group, and finally combined them to obtain the cluster‐level
   embedding representations. Specifically, the readout function was
   introduced
   [MATH: <mrow><mrow><mi mathvariant="script">R</mi><mrow><mo
   stretchy="false">(</mo><mo>·</mo><mo
   stretchy="false">)</mo></mrow><mo>:</mo><msup><mi
   mathvariant="bold">R</mi><mrow><mi>n</mi><mo>×</mo><mi>d</mi></mrow></m
   sup><mo>→</mo><msup><mi
   mathvariant="bold">R</mi><mrow><mi>k</mi><mo>×</mo><mi>d</mi></mrow></m
   sup></mrow></mrow> :MATH]
   (d is the latent embedding dimension) formulated as:
   [MATH: <mrow><mrow><mi mathvariant="script">R</mi><mrow><mo
   stretchy="false">(</mo><mi mathvariant="bold">Z</mi><mo
   stretchy="false">)</mo></mrow><mo linebreak="badbreak">=</mo><mfenced
   separators="" open="[" close="]"><msub><mover><mi
   mathvariant="bold">Z</mi><mo>¯</mo></mover><mn>1</mn></msub><mo>,</mo><
   msub><mover><mi
   mathvariant="bold">Z</mi><mo>¯</mo></mover><mn>2</mn></msub><mo>,</mo><
   mtext>…</mtext><mo>,</mo><msub><mover><mi
   mathvariant="bold">Z</mi><mo>¯</mo></mover><mi>k</mi></msub></mfenced><
   /mrow></mrow> :MATH]
   (11)
   [MATH: <mrow><mrow><msub><mover><mi
   mathvariant="bold">Z</mi><mo>¯</mo></mover><mi>i</mi></msub><mo
   linebreak="badbreak">=</mo><mfrac><mn>1</mn><mrow><mrow><mo
   stretchy="false">|</mo></mrow><msub><mi>n</mi><mi>i</mi></msub><mrow><m
   o
   stretchy="false">|</mo></mrow></mrow></mfrac><munder><mo>∑</mo><mrow><m
   i>j</mi><mo>∈</mo><msub><mi>n</mi><mi>i</mi></msub></mrow></munder><msu
   b><mi mathvariant="bold">Z</mi><mi>j</mi></msub><mo>,</mo><mspace
   width="1em"></mspace><mi>i</mi><mo>∈</mo><mfenced separators=""
   open="{"
   close="}"><mn>1</mn><mo>,</mo><mn>2</mn><mo>,</mo><mtext>…</mtext><mo>,
   </mo><mi>k</mi></mfenced></mrow></mrow> :MATH]
   (12)

   where
   [MATH: <mrow><msub><mover><mi
   mathvariant="bold">Z</mi><mo>¯</mo></mover><mi>i</mi></msub></mrow>
   :MATH]
   denotes the average embedding of group i; n [i ]represents the set of
   spots in group i; |n [i ]| is the number of spots in group i. Likewise,
   still calculated the cosine similarity matrix
   [MATH: <mrow><mrow><mover accent="true"><mi
   mathvariant="bold">S</mi><mo>∼</mo></mover><mo>∈</mo><msup><mi
   mathvariant="bold">R</mi><mrow><mi>d</mi><mo>×</mo><mi>d</mi></mrow></m
   sup></mrow></mrow> :MATH]
   of cross‐view cluster‐level embedding representations from the aspect
   of feature dimension, which is defined as follows:
   [MATH: <mrow><mrow><msub><mover
   accent="true"><mi>S</mi><mo>∼</mo></mover><mrow><mi>i</mi><mi>j</mi></m
   row></msub><mo linebreak="badbreak">=</mo><mfrac><mrow><msup><mrow><mo
   stretchy="false">(</mo><mi mathvariant="script">R</mi><msub><mrow><mo
   stretchy="false">(</mo><mi mathvariant="bold">Z</mi><mo
   stretchy="false">)</mo></mrow><mi>i</mi></msub><mo
   stretchy="false">)</mo></mrow><mi>T</mi></msup><mrow><mo
   stretchy="false">(</mo><mi mathvariant="script">R</mi><msub><mrow><mo
   stretchy="false">(</mo><mover accent="true"><mi
   mathvariant="bold">Z</mi><mo>^</mo></mover><mo
   stretchy="false">)</mo></mrow><mi>j</mi></msub><mo
   stretchy="false">)</mo></mrow></mrow><mrow><msub><mrow><mo
   stretchy="false">|</mo><mo stretchy="false">|</mo><mi
   mathvariant="script">R</mi><mrow><mo stretchy="false">(</mo><mi
   mathvariant="bold">Z</mi><mo
   stretchy="false">)</mo></mrow></mrow><mi>i</mi></msub><mrow><mo
   stretchy="false">|</mo><mo stretchy="false">|</mo><mo
   stretchy="false">|</mo><mo stretchy="false">|</mo><mi
   mathvariant="script">R</mi></mrow><msub><mrow><mo
   stretchy="false">(</mo><mover accent="true"><mi
   mathvariant="bold">Z</mi><mo>^</mo></mover><mo
   stretchy="false">)</mo></mrow><mi>j</mi></msub><mrow><mo
   stretchy="false">|</mo><mo
   stretchy="false">|</mo></mrow></mrow></mfrac></mrow></mrow> :MATH]
   (13)

   where
   [MATH: <mrow><msub><mover
   accent="true"><mi>S</mi><mo>∼</mo></mover><mrow><mi>i</mi><mi>j</mi></m
   row></msub></mrow> :MATH]
   denotes the cosine similarity between the average embedding of the i‐th
   dimension feature in all groups in the first view and the average
   embedding of the j‐th dimension feature in all groups in the second
   view. Representations of the same dimension feature across views were
   taken as positive pairs and maximize their similarity. Therefore, the
   MSE of
   [MATH: <mrow><mover accent="true"><mi
   mathvariant="bold">S</mi><mo>∼</mo></mover></mrow> :MATH]
   and an identity matrix I ∈ R ^d × d were used as the cluster‐level
   contrastive loss, which is defined as follows:
   [MATH: <mrow><mtable displaystyle="true"><mtr><mtd
   columnalign="right"><msub><mi
   mathvariant="script">L</mi><mrow><mi>c</mi><mi>l</mi><mi>u</mi></mrow><
   /msub></mtd><mtd columnalign="left"><mrow><mo
   linebreak="badbreak">=</mo><mfrac><mn>1</mn><msup><mi>d</mi><mn>2</mn><
   /msup></mfrac><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn
   ></mrow><mi>d</mi></munderover><munderover><mo>∑</mo><mrow><mi>j</mi><m
   o>=</mo><mn>1</mn></mrow><mi>d</mi></munderover><msup><mfenced
   separators="" open="(" close=")"><msub><mover
   accent="true"><mi>S</mi><mo>∼</mo></mover><mrow><mi>i</mi><mi>j</mi></m
   row></msub><mo>−</mo><msub><mi>I</mi><mrow><mi>i</mi><mi>j</mi></mrow><
   /msub></mfenced><mn>2</mn></msup></mrow></mtd></mtr><mtr><mtd></mtd><mt
   d columnalign="left"><mrow><mo
   linebreak="badbreak">=</mo><mfrac><mn>1</mn><msup><mi>d</mi><mn>2</mn><
   /msup></mfrac><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn
   ></mrow><mi>d</mi></munderover><msup><mfenced separators="" open="("
   close=")"><msub><mover
   accent="true"><mi>S</mi><mo>∼</mo></mover><mrow><mi>i</mi><mi>i</mi></m
   row></msub><mo>−</mo><mn>1</mn></mfenced><mn>2</mn></msup><mo
   linebreak="goodbreak">+</mo><mfrac><mn>1</mn><mrow><msup><mi>d</mi><mn>
   2</mn></msup><mo>−</mo><mi>d</mi></mrow></mfrac><munderover><mo>∑</mo><
   mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>d</mi></munderover><munde
   r><mo>∑</mo><mrow><mi>j</mi><mo
   linebreak="badbreak">≠</mo><mi>i</mi></mrow></munder><msup><mfenced
   separators="" open="(" close=")"><msub><mover
   accent="true"><mi>S</mi><mo>∼</mo></mover><mrow><mi>i</mi><mi>j</mi></m
   row></msub></mfenced><mn>2</mn></msup></mrow></mtd></mtr></mtable></mro
   w> :MATH]
   (14)

   The cluster‐level feature contrastive learning strengthens the
   correlation between the same dimension features in two views and
   reduces the correlation between different dimension features. This can
   filter out the redundant information of latent embedding
   representation, preserve more discriminative features well, and improve
   the clustering performance of the model.

Overall Loss Function

   stDCL learns the latent embedding representation by minimizing the
   reconstruction loss
   [MATH: <mrow><msub><mi
   mathvariant="script">L</mi><mrow><mi>r</mi><mi>e</mi><mi>c</mi></mrow><
   /msub></mrow> :MATH]
   , the space‐aware contrastive loss
   [MATH: <mrow><msub><mi
   mathvariant="script">L</mi><mrow><mi>c</mi><mi>o</mi><mi>n</mi></mrow><
   /msub></mrow> :MATH]
   , and the cluster‐level feature contrastive loss
   [MATH: <mrow><msub><mi
   mathvariant="script">L</mi><mrow><mi>c</mi><mi>l</mi><mi>u</mi></mrow><
   /msub></mrow> :MATH]
   , which can be defined as:
   [MATH: <mrow><mrow><mi mathvariant="script">L</mi><mo
   linebreak="badbreak">=</mo><msub><mi>γ</mi><mn>1</mn></msub><msub><mi
   mathvariant="script">L</mi><mrow><mi>r</mi><mi>e</mi><mi>c</mi></mrow><
   /msub><mo
   linebreak="goodbreak">+</mo><msub><mi>γ</mi><mn>2</mn></msub><msub><mi
   mathvariant="script">L</mi><mrow><mi>c</mi><mi>o</mi><mi>n</mi></mrow><
   /msub><mo
   linebreak="goodbreak">+</mo><msub><mi>γ</mi><mn>3</mn></msub><msub><mi
   mathvariant="script">L</mi><mrow><mi>c</mi><mi>l</mi><mi>u</mi></mrow><
   /msub></mrow></mrow> :MATH]
   (15)

   where γ[1], γ[2] and γ[3] are weight coefficients assigned to each loss
   to control the balance of the overall loss function. γ[1], γ[2] and
   γ[3]was set to [10, 0.5, 0.8] after parametric analysis. For model
   training, it was divided it into two parts: pre‐training and training,
   and both are optimized using the Adam optimizer. The learning rate and
   epoch for these two parts were 0.001, 0.005 and 500,
   1000, respectively.

Clustering and Refinement

   To identify the spatial domains in our study, clustering techniques
   were employed on the latent embedding representations generated by
   stDCL. First, the dimension of the learned embedding representations of
   stDCL was reduced to 20 by PCA, and then used clustering methods to
   perform clustering. When the data is manually annotated, the mclust
   clustering algorithm was employed. For cases without manual annotation,
   the Leiden algorithm was used. To enhance the clustering performance,
   data augmentation were employed with two different Gaussian
   distributions and selected clustering results with relatively low
   Davies–Bouldin scores. Furthermore, stDCL incorporates a refinement
   process for the clustering results. Specifically, the neighborhood of
   each spot was defined as a circle with a radius of r (r is set to 50)
   and reassigned the spots to domains with the most common labels within
   their respective neighborhoods.

Implementation Details and Comparisons with Baseline Methods

   stDCL was implemented in Python, and its core model is built on the
   PyTorch framework, which is publicly available at
   [215]https://github.com/Philyzh8/stDCL. To evaluate the performance of
   stDCL in spatial clustering, stDCL was compared with the following
   methods:
     * Seurat. Seurat is a comprehensive software package written in R
       that facilitates the analysis of single cell data, including
       modules dedicated to processing spatial transcriptomics data. For
       data preprocessing, the default parameters in Seurat were utilized
       . To perform clustering, the appropriate resolution based on the
       desired number of clusters was determined and the “FindClusters”
       function was employed to assign cells to their respective clusters.
     * SpaGCN. SpaGCN is a method based on graph convolutional networks
       that integrates three types of information including gene
       expression, spatial location, and histological images to identify
       spatial domains. The adjacency matrix in SpaGCN is constructed with
       parameters alpha' and beta' set to 1 and 49, respectively. During
       the training process, we set the learning rate to 0.05 and trained
       the model for 200 epochs.
     * stLearn. stLearn uses the deep learning model to extract
       information from tissue morphology to normalize gene expression and
       improve spatial clustering. In this study, stLearn was employed for
       data normalization and spatial clustering, using the default
       parameter settings. The clustering method used in stLearn is the
       K‐means method.
     * BayesSpace. BayesSpace is a Bayesian statistical clustering method
       that leverages information about spatial neighborhoods to enhance
       the resolution of spatial transcriptomics data and perform
       clustering analysis. The parameters were configured as follows: “d”
       was set to 15, “nrep” to 50 000, and “gamma” to 3.
     * SEDR. SEDR (Spatial Embedding with Deep Representation) combines
       both an autoencoder and a variational autoencoder to generate
       embedding representations of transcripts and spatial information.
       During training, the number of epochs were set to 200, the learning
       rate to 0.01, and the dropout rate to 0.2.
     * CCST. CCST is an unsupervised cell clustering method that uses GCNs
       to learn cell embedding representations from graphs derived from
       spatial transcriptomics data. For parameter settings, the default
       parameter settings were used in their code. The training process
       consisted of 5000 epochs, and the number of hidden layer nodes was
       set to 256. Additionally, the parameter “lambda_I” was set to 0.3.
     * STAGATE. STAGATE is a graph attention autoencoder framework
       designed to learn a low‐dimensional latent embedding by combining
       spatial information and gene expression profiles. For data
       preprocessing, 3000 highly variable genes were specifically
       selected as input for analysis. The parameters “rad_cutoff” and
       “alpha” were set to 150 and 0, respectively.
     * GraphST. GraphST is a graph self‐supervised comparative learning
       method that effectively utilizes spatial information and gene
       expression profiles for spatial clustering. The parameters set for
       model training were adopted from the recommended parameters in
       their code. For loss function weight assignment, the parameters
       “alpha” and “beta” were set to 10 and 1, respectively. The training
       process consisted of 600 epochs, with a learning rate of 0.001 and
       a dropout rate of 0.2.

Benchmarking Metrics for stDCL Clustering

   Three common external metrics were adopted to evaluate clustering
   performance, namely adjusted rand index (ARI), normalized mutual
   information (NMI), and homogeneity score (HS). ARI can measure the
   similarity between the cluster labels predicted by the algorithm and
   the real cluster labels. NMI quantifies the similarity of between two
   clusters by considering the mutual information between the predicted
   and true cluster labels, and normalizing it based on the entropy of the
   individual clusters. These two metrics can be defined as follows:
   [MATH: <mrow><mrow><mi>A</mi><mi>R</mi><mi>I</mi><mo
   linebreak="badbreak">=</mo><mfrac><mrow><msub><mo>∑</mo><mrow><mi>i</mi
   ><mi>j</mi></mrow></msub><mfenced separators="" open="("
   close=")"><mfrac
   linethickness="0pt"><msub><mi>n</mi><mrow><mi>i</mi><mi>j</mi></mrow></
   msub><mn>2</mn></mfrac></mfenced><mo>−</mo><mfenced separators=""
   open="[" close="]"><msub><mo>∑</mo><mi>i</mi></msub><mfenced
   separators="" open="(" close=")"><mfrac
   linethickness="0pt"><msub><mi>a</mi><mi>i</mi></msub><mn>2</mn></mfrac>
   </mfenced><msub><mo>∑</mo><mi>j</mi></msub><mfenced separators=""
   open="(" close=")"><mfrac
   linethickness="0pt"><msub><mi>b</mi><mi>j</mi></msub><mn>2</mn></mfrac>
   </mfenced></mfenced><mo>/</mo><mfenced separators="" open="("
   close=")"><mfrac
   linethickness="0pt"><mi>n</mi><mn>2</mn></mfrac></mfenced></mrow><mrow>
   <mfrac><mn>1</mn><mn>2</mn></mfrac><mfenced separators="" open="["
   close="]"><msub><mo>∑</mo><mi>i</mi></msub><mfenced separators=""
   open="(" close=")"><mfrac
   linethickness="0pt"><msub><mi>a</mi><mi>i</mi></msub><mn>2</mn></mfrac>
   </mfenced><mo>+</mo><msub><mo>∑</mo><mi>j</mi></msub><mfenced
   separators="" open="(" close=")"><mfrac
   linethickness="0pt"><msub><mi>b</mi><mi>j</mi></msub><mn>2</mn></mfrac>
   </mfenced></mfenced><mo>−</mo><mfenced separators="" open="["
   close="]"><msub><mo>∑</mo><mi>i</mi></msub><mfenced separators=""
   open="(" close=")"><mfrac
   linethickness="0pt"><msub><mi>a</mi><mi>i</mi></msub><mn>2</mn></mfrac>
   </mfenced><msub><mo>∑</mo><mi>j</mi></msub><mfenced separators=""
   open="(" close=")"><mfrac
   linethickness="0pt"><msub><mi>b</mi><mi>j</mi></msub><mn>2</mn></mfrac>
   </mfenced></mfenced><mo>/</mo><mfenced separators="" open="("
   close=")"><mfrac
   linethickness="0pt"><mi>n</mi><mn>2</mn></mfrac></mfenced></mrow></mfra
   c></mrow></mrow> :MATH]
   (16)
   [MATH: <mrow><mrow><mi>N</mi><mi>M</mi><mi>I</mi><mo
   linebreak="badbreak">=</mo><mfrac><mrow><msub><mo>∑</mo><mrow><mi>i</mi
   ><mi>j</mi></mrow></msub><msub><mi>n</mi><mrow><mi>i</mi><mi>j</mi></mr
   ow></msub><mi>log</mi><mfrac><mrow><msub><mi>n</mi><mrow><mi>i</mi><mi>
   j</mi></mrow></msub><mi>n</mi></mrow><mrow><msub><mi>a</mi><mi>i</mi></
   msub><msub><mi>b</mi><mi>j</mi></msub></mrow></mfrac></mrow><msqrt><mro
   w><msub><mo>∑</mo><mi>i</mi></msub><msub><mi>a</mi><mi>i</mi></msub><mi
   >log</mi><mfrac><msub><mi>a</mi><mi>i</mi></msub><mi>n</mi></mfrac><msu
   b><mo>∑</mo><mi>j</mi></msub><msub><mi>b</mi><mi>j</mi></msub><mi>log</
   mi><mfrac><msub><mi>b</mi><mi>j</mi></msub><mi>n</mi></mfrac></mrow></m
   sqrt></mfrac></mrow></mrow> :MATH]
   (17)

   where n is the total number of samples, n [ij ]is the number of samples
   belonging to both the i‐th cluster in the true label and the j‐th
   cluster in the predicted label, a [i ]and b [j ]are the total number of
   samples in the i‐th cluster in the true label and the j‐th cluster in
   the predicted label, respectively.

   HS can measure the homogeneity of the clustering algorithm for sample
   division, which means that only samples of the same category are
   included in the cluster. The HS ranges from 0 to 1, where a higher
   value indicates a higher level of homogeneity in the clusters, which
   can be defined as:
   [MATH: <mrow><mrow><mi>h</mi><mo
   linebreak="badbreak">=</mo><mn>1</mn><mo
   linebreak="goodbreak">−</mo><mfrac><mrow><mi>H</mi><mo
   stretchy="false">(</mo><mi>C</mi><mo
   stretchy="false">|</mo><mi>K</mi><mo
   stretchy="false">)</mo></mrow><mrow><mi>H</mi><mo
   stretchy="false">(</mo><mi>C</mi><mo
   stretchy="false">)</mo></mrow></mfrac></mrow></mrow> :MATH]
   (18)

   where H(C|K) is the conditional entropy of the true label partition
   under the predicted label partition condition.

   In addition, the internal metric Davies‐Bouldin Index (DBI) was also
   used to improve the clustering performance of stDCL. DBI can calculate
   the average of the maximum ratio of the within‐cluster distance and the
   between‐cluster distance for each cluster, which can be formulated as:
   [MATH: <mrow><mrow><mi>D</mi><mi>B</mi><mi>I</mi><mo
   linebreak="badbreak">=</mo><mfrac><mn>1</mn><mi>n</mi></mfrac><munderov
   er><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></mun
   derover><munder><mi>max</mi><mrow><mi>j</mi><mo
   linebreak="badbreak">≠</mo><mi>i</mi></mrow></munder><mfenced
   separators="" open="("
   close=")"><mfrac><mrow><mover><msub><mi>S</mi><mi>i</mi></msub><mo>¯</m
   o></mover><mo>+</mo><mover><msub><mi>S</mi><mi>j</mi></msub><mo>¯</mo><
   /mover></mrow><msub><mi>E</mi><mrow><mi>i</mi><mi>j</mi></mrow></msub><
   /mfrac></mfenced></mrow></mrow> :MATH]
   (19)

   where n is the total number of clusters,
   [MATH:
   <mrow><mover><msub><mi>S</mi><mi>i</mi></msub><mo>¯</mo></mover></mrow>
   :MATH]
   denotes the average distance from the sample in the i‐th cluster to the
   cluster centroid, and E [ij ]represents the Euclidean distance between
   the i‐th cluster and the j‐th cluster. The lower the DBI, the better
   the clustering quality.

Benchmarking Metrics for Stdcl Imputation

   Four common metrics were adopted to evaluate the imputation performance
   of stDCL, namely Pearson correlation coefficient (PCC), Mean Squared
   Error (MSE), Spearman Rank Correlation Coefficient (SRCC), and Cosine
   Similarity (CS). These metrics evaluate the correlation and consistency
   of stDCL's imputation space with the original space. PCC was used to
   measure the linear correlation between the imputed and the true values.
   It captures the strength and direction of a linear relationship, with a
   value ranging from –1 to 1, where 1 indicates a perfect positive linear
   correlation and –1 indicates a perfect negative linear correlation,
   which can be defined as:
   [MATH: <mrow><mrow><mi>P</mi><mi>C</mi><mi>C</mi><mo
   linebreak="badbreak">=</mo><mfrac><mrow><msubsup><mo>∑</mo><mrow><mi>i<
   /mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></msubsup><mrow><mo
   stretchy="false">(</mo><msub><mi>x</mi><mi>i</mi></msub><mo>−</mo><move
   r><mi>x</mi><mo>¯</mo></mover><mo
   stretchy="false">)</mo></mrow><mrow><mo
   stretchy="false">(</mo><msub><mi>y</mi><mi>i</mi></msub><mo>−</mo><move
   r><mi>y</mi><mo>¯</mo></mover><mo
   stretchy="false">)</mo></mrow></mrow><mrow><msqrt><mrow><msubsup><mo>∑<
   /mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></msubsup><msu
   p><mrow><mo
   stretchy="false">(</mo><msub><mi>x</mi><mi>i</mi></msub><mo>−</mo><move
   r><mi>x</mi><mo>¯</mo></mover><mo
   stretchy="false">)</mo></mrow><mn>2</mn></msup></mrow></msqrt><msqrt><m
   row><msubsup><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n
   </mi></msubsup><msup><mrow><mo
   stretchy="false">(</mo><msub><mi>y</mi><mi>i</mi></msub><mo>−</mo><move
   r><mi>y</mi><mo>¯</mo></mover><mo
   stretchy="false">)</mo></mrow><mn>2</mn></msup></mrow></msqrt></mrow></
   mfrac></mrow></mrow> :MATH]
   (20)

   where x [i ]and y [i ]represent the true and imputed values,
   respectively, and
   [MATH: <mrow><mover><mi>x</mi><mo>¯</mo></mover></mrow> :MATH]
   and
   [MATH: <mrow><mover><mi>y</mi><mo>¯</mo></mover></mrow> :MATH]
   denote the mean of the true and imputed values.

   MSE is used to measure the average squared difference between the true
   and imputed values. MSE penalizes larger deviations more heavily, which
   makes it sensitive to outliers. The lower the MSE, the better the
   imputation performance, which can be defined as:
   [MATH: <mrow><mrow><mi>M</mi><mi>S</mi><mi>E</mi><mo
   linebreak="badbreak">=</mo><mfrac><mn>1</mn><mi>n</mi></mfrac><munderov
   er><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></mun
   derover><msup><mrow><mo
   stretchy="false">(</mo><msub><mi>x</mi><mi>i</mi></msub><mo>−</mo><msub
   ><mi>y</mi><mi>i</mi></msub><mo
   stretchy="false">)</mo></mrow><mn>2</mn></msup></mrow></mrow> :MATH]
   (21)

   where n is the number of spots, and x [i ]and y [i ]are the true and
   imputed values, respectively.

   SRCC measures the rank‐based correlation between the true and imputed
   values. Unlike PCC, SRCC is non‐parametric and assesses how well the
   relationship between the two variables can be described by a monotonic
   function, which can be formulated as:
   [MATH: <mrow><mrow><mi>S</mi><mi>R</mi><mi>C</mi><mi>C</mi><mo
   linebreak="badbreak">=</mo><mfrac><mrow><msubsup><mo>∑</mo><mrow><mi>i<
   /mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></msubsup><mrow><mo
   stretchy="false">(</mo><msub><mi>r</mi><msub><mi>x</mi><mi>i</mi></msub
   ></msub><mo>−</mo><mover><msub><mi>r</mi><mi>x</mi></msub><mo>¯</mo></m
   over><mo stretchy="false">)</mo></mrow><mrow><mo
   stretchy="false">(</mo><msub><mi>r</mi><msub><mi>y</mi><mi>i</mi></msub
   ></msub><mo>−</mo><mover><msub><mi>r</mi><mi>y</mi></msub><mo>¯</mo></m
   over><mo
   stretchy="false">)</mo></mrow></mrow><mrow><msqrt><mrow><msubsup><mo>∑<
   /mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></msubsup><msu
   p><mrow><mo
   stretchy="false">(</mo><msub><mi>r</mi><msub><mi>x</mi><mi>i</mi></msub
   ></msub><mo>−</mo><mover><msub><mi>r</mi><mi>x</mi></msub><mo>¯</mo></m
   over><mo
   stretchy="false">)</mo></mrow><mn>2</mn></msup></mrow></msqrt><msqrt><m
   row><msubsup><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n
   </mi></msubsup><msup><mrow><mo
   stretchy="false">(</mo><msub><mi>r</mi><msub><mi>y</mi><mi>i</mi></msub
   ></msub><mo>−</mo><mover><msub><mi>r</mi><mi>y</mi></msub><mo>¯</mo></m
   over><mo
   stretchy="false">)</mo></mrow><mn>2</mn></msup></mrow></msqrt></mrow></
   mfrac></mrow></mrow> :MATH]
   (22)

   where
   [MATH:
   <mrow><msub><mi>r</mi><msub><mi>x</mi><mi>i</mi></msub></msub></mrow>
   :MATH]
   and
   [MATH:
   <mrow><msub><mi>r</mi><msub><mi>y</mi><mi>i</mi></msub></msub></mrow>
   :MATH]
   are the rank values of x [i ]and y [i ], respectively.

   CS quantifies the similarity between two non‐zero vectors by measuring
   the cosine of the angle between them. It is particularly useful when
   comparing high‐dimensional data and ranges from 0 to 1, with 1
   indicating perfect similarity, which can be defined as:
   [MATH: <mrow><mrow><mi>C</mi><mi>S</mi><mo
   linebreak="badbreak">=</mo><mfrac><mrow><msubsup><mo>∑</mo><mrow><mi>i<
   /mi><mo>=</mo><mn>1</mn></mrow><mi>n</mi></msubsup><msub><mi>x</mi><mi>
   i</mi></msub><msub><mi>y</mi><mi>i</mi></msub></mrow><mrow><msqrt><mrow
   ><msubsup><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n</m
   i></msubsup><msubsup><mi>x</mi><mi>i</mi><mn>2</mn></msubsup></mrow></m
   sqrt><msqrt><mrow><msubsup><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</m
   n></mrow><mi>n</mi></msubsup><msubsup><mi>y</mi><mi>i</mi><mn>2</mn></m
   subsup></mrow></msqrt></mrow></mfrac></mrow></mrow> :MATH]
   (23)

   where x [i ]and y [i ]are the true and imputed values.

Differential Gene Expression Analysis

   We performed differential gene expression (DE) analysis using the
   function “FindMarkers” in the R package Seurat (v4.0.2).^[ [216]^29 ^]
   The Wilcoxon rank sum test was applied to identify differentially
   expressed genes, and a Benjamini‐Hochberg correction was used to adjust
   the p‐values. Specifically, we selected the genes with adjusted p‐value
   ⩽ 0.05 as differentially expressed genes in the tested regions.

Functional Enrichment

   Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG)
   enrichment analyses was performed on the differentially expressed genes
   using the R package ClusterProfiler (v3.18.1).^[ [217]^67 ^] The
   function “enrichGO” and “enrichKEGG” were applied to GO analysis and
   KEGG analysis. In both analyses, the parameters were set as follows:
   “pAdjustMethod = BH” and “pvalueCutoff = 0.05”. Subsequently, the
   results of the GO analysis were visually presented using the “heatplot”
   function from the R package enrichplot (v1.14.2). These analyses
   provided valuable insights into the enriched biological processes and
   pathways associated with the differentially expressed genes.

PPI Network

   STRING^[ [218]^46 ^] was utilized to generate a protein–protein
   interaction (PPI) network for the up‐regulated differentially expressed
   genes in the brain regions of mouse embryos. The PPI network was
   constructed using interactions with scores greater than 0.4. To
   visualize the network, Cytoscape was employed,^[ [219]^68 ^] which
   allowed us to gain a visual representation of the network and explore
   its underlying structure. Furthermore, MCODE^[ [220]^47 ^]was adopted
   to identify significant coherent modules within the PPI network and
   showed the highest scoring modules.

Trajectory Inference

   The R package Slingshot (v2.2.1)^[ [221]^69 ^] was used for trajectory
   inference. Slingshot is a powerful tool designed to infer the structure
   and order of cell lineage differentiation, which consists of two main
   stages: inference of lineage structure, and inference of pseudotemporal
   changes in cells of each lineage. In this analysis, Slingshot was
   utilized on the latent representations obtained from stDCL. The
   predicted cluster labels were used as input for the “slingshot”
   function in Slingshot. This allowed us to leverage the learned latent
   representations to infer the trajectory and pseudotime information for
   the cells. Following trajectory inference, Slingshot assigned a
   pseudo‐time value to each location. In addition, Slingshot needed to
   set a domain as the starting domain in trajectory inference, and the
   White matter was set as the starting domain when analyzing the osmFISH
   dataset. This choice determined the initial reference point from which
   the trajectory was inferred, aiding in the interpretation of the
   differentiation process in the context of the specific dataset.
   Moreover, R package Monocle3 (v1.3.6)
   ([222]http://cole‐trapnell‐lab.github.io/monocle‐release/monocle3) was
   used to infer pseudotime trajectories of astrocyte subtypes in the
   STARmap PLUS data. Monocle3 used the “learn_graph” function to learn
   principal graph in the UMAP space. Pseudotime was computed by
   calculating the geodesic distance from the specified root node to the
   cell node along the principal graph, and adopted the function
   “plot_cells” to visualize the trajectory inference.

Gene Set Variation Analysis (GSVA)

   Gene set variation analysis were performed using the R package GSVA
   (v1.38.2). In the analysis, the “gsva” function from the GSVA package
   was utilized. The GSVA enrichment scores were estimated using the
   “method = ssgsea” parameter setting. This method, based on the Single
   Sample Gene Set Enrichment Analysis (ssGSEA) approach, allowed us to
   evaluate the activity or enrichment levels of gene sets or pathways in
   each sample. The related pathway enrichment was used to assess the
   activation of WikiPathways pathways, which is available in the MSigDB
   database.^[ [223]^70 ^]

Conflict of Interest

   The authors declare no conflict of interest.

Supporting information

   Supporting Information
   [224]ADVS-12-2410081-s001.pdf^ (1.5MB, pdf)

Acknowledgements