Abstract

   Embeddings derived from cell graphs hold significant potential for
   exploring spatial transcriptomics (ST) datasets. Nevertheless, existing
   methodologies rely on a graph structure defined by spatial proximity,
   which inadequately represents the diversity inherent in cell‐cell
   interactions (CCIs). This study introduces STAGUE, an innovative
   framework that concurrently learns a cell graph structure and a
   low‐dimensional embedding from ST data. STAGUE employs graph structure
   learning to parameterize and refine a cell graph adjacency matrix,
   enabling the generation of learnable graph views for effective
   contrastive learning. The derived embeddings and cell graph improve
   spatial clustering accuracy and facilitate the discovery of novel CCIs.
   Experimental benchmarks across 86 real and simulated ST datasets show
   that STAGUE outperforms 15 comparison methods in clustering
   performance. Additionally, STAGUE delineates the heterogeneity in human
   breast cancer tissues, revealing the activation of
   epithelial‐to‐mesenchymal transition and PI3K/AKT signaling in specific
   sub‐regions. Furthermore, STAGUE identifies CCIs with greater alignment
   to established biological knowledge than those ascertained by existing
   graph autoencoder‐based methods. STAGUE also reveals the regulatory
   genes that participate in these CCIs, including those enriched in
   neuropeptide signaling and receptor tyrosine kinase signaling pathways,
   thereby providing insights into the underlying biological processes.

   Keywords: graph structure learning, cell‐cell interactions, spatial
   clustering, spatial transcriptomics
     __________________________________________________________________

   STAGUE is an innovative framework that introduces graph structure
   learning into unsupervised representation learning of spatial
   transcriptomics data. It generates a learned adjacency matrix and
   spatially aware cell embeddings for the cell graph. The adjacency
   matrix can discern novel cell‐cell interactions, whereas the cell
   embeddings support multiple downstream applications such as clustering,
   visualization, and trajectory inference.

1. Introduction

   Within the tissues of multicellular organisms, cells often aggregate
   into spatially organized and functionally distinct anatomical
   structures,^[ [32]^1 , [33]^2 ^] and proximal cells frequently
   coordinate their biological functions through cell‐cell interactions
   (CCIs). The dual challenges of pinpointing these spatial clusters,
   i.e., spatial clustering, and decoding the intricate communication
   patterns between cells, i.e., CCI inference, are pivotal for a
   comprehensive understanding of biological mechanisms. The
   groundbreaking spatial transcriptomic (ST) technologies,^[ [34]^3 ^]
   which allow for gene expression profiling while retaining critical
   spatial localization information, present an invaluable opportunity to
   discover spatially continuous clusters and decipher CCIs with greater
   biological fidelity. Recently, there has been great interest in
   applying unsupervised graph representation learning (UGRL) techniques
   to integrate spatial information into the learning of cell embeddings
   and spatial clustering.^[ [35]^4 , [36]^5 , [37]^6 , [38]^7 , [39]^8 ,
   [40]^9 , [41]^10 , [42]^11 ^] These methods involve converting the
   input ST data into a cell graph, where cells or spots are treated as
   nodes. Edges between these nodes are established based on the spatial
   proximity of the corresponding cells. By applying UGRL algorithms, they
   seek to infer discriminative, low‐dimensional representations of cells,
   which are intended to preserve the essential spatial and gene
   expression information in the ST data. The inferred low‐dimensional
   cell representations provide a powerful basis for downstream clustering
   with unsupervised algorithms such as k‐means,^[ [43]^12 ^] mclust,^[
   [44]^13 ^] and Leiden.^[ [45]^14 ^] Among these methods, some propose
   jointly optimizing spatially aware cell representations while inferring
   novel CCIs.^[ [46]^9 , [47]^10 , [48]^11 ^] They build upon graph
   autoencoders (GAE), learning cell representations by reconstructing the
   predefined adjacency matrix of the input cell graph. Subsequently,
   novel CCIs are identified by analyzing the newly generated edges
   according to the reconstructed adjacency matrix.
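
   As an illustration of how such GAE‐based pipelines can flag candidate
   CCIs, the following minimal sketch decodes a reconstructed adjacency
   matrix from cell embeddings and lists the edges that are absent from
   the predefined graph. The inner‐product decoder, the threshold value,
   and the toy embeddings are assumptions made for this example; they are
   not taken from any specific method cited above.

```python
import math


def reconstruct_adjacency(Z):
    """Inner-product decoder (an assumption): A_hat[i][j] = sigmoid(z_i . z_j)."""
    n = len(Z)
    A_hat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(Z[i], Z[j]))
            A_hat[i][j] = 1.0 / (1.0 + math.exp(-dot))
    return A_hat


def novel_edges(A, A_hat, threshold=0.9):
    """Pairs (i, j), i < j, missing from A but scored above `threshold` in A_hat."""
    n = len(A)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if A[i][j] == 0 and A_hat[i][j] > threshold
    ]


# Toy data (hypothetical): four cells with 2-D embeddings and a predefined graph.
Z = [[2.0, 0.0], [2.0, 0.1], [1.9, -0.1], [-2.0, 0.0]]
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
A_hat = reconstruct_adjacency(Z)
candidates = novel_edges(A, A_hat)
print(candidates)
```

   In this toy case, cells 0 and 2 are not linked in the predefined graph
   but have similar embeddings, so the pair is reported as a candidate
   interaction.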

   Recently, graph contrastive learning (GCL) has emerged as an important
   UGRL paradigm for ST studies.^[ [49]^15 ^] Existing GCL‐based methods^[
   [50]^16 , [51]^17 , [52]^18 , [53]^19 , [54]^20 ^] build upon the Deep
   Graph Infomax (DGI) framework.^[ [55]^21 ^] They typically utilize
   ad‐hoc augmentation functions^[ [56]^15 ^] to generate different views
   of the input cell graph stochastically, such as randomly shuffling node
   features or randomly adding/dropping edges of the original cell graph.
   Subsequently, they harness the principle of mutual information
   maximization to bring similar views closer while distancing the
   irrelevant ones. It is crucial to note that in current GCL‐based
   methods, the generation of different cell graph views relies on a
   predefined graph structure based on spatial proximity. When
   constructing the graph topology, these methods often identify
   neighboring cells based on two main criteria: proximity within a
   designated distance or a predetermined quantity of the nearest cells.
   This methodology may possess certain intrinsic limitations, as merely
   factoring in spatial proximity does not adequately capture the
   complexity of intercellular communications that regulate gene
   expression dependencies among cells. For instance, the signal strength
   from cells secreting signaling molecules (e.g., protein ligands) often
   attenuates with increasing distance,^[ [57]^22 ^] and nearby cells
   selectively respond to these molecules via specific receptors.^[
   [58]^23 ^] A smaller distance range for constructing cell graphs can
   reduce the inclusion of irrelevant interactions, known as noise, but it
   may also overlook significant long‐distance relationships. Conversely,
   a larger range might capture these interactions at the cost of reducing
   the signal‐to‐noise ratio. Therefore, depending solely on spatial
   information to define the graph structure is insufficient and
   challenging. Moreover, this trade‐off poses a limitation on existing
   GAE‐based methods for CCI inference, as their reconstructing objective,
   i.e., the predefined cell graph adjacency matrix, may inherently
   contain noise.
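
   The two neighbor criteria and the range trade‐off described above can
   be made concrete with a small example. The sketch below is
   illustrative only: the function names, the symmetrization of the kNN
   graph, and the toy coordinates are assumptions rather than the
   construction used by any particular method.

```python
import math


def knn_adjacency(coords, k):
    """Link each cell to its k nearest cells; the result is symmetrized."""
    n = len(coords)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (math.dist(coords[i], coords[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            A[i][j] = 1
            A[j][i] = 1
    return A


def radius_adjacency(coords, r):
    """Link every pair of cells whose Euclidean distance is at most r."""
    n = len(coords)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= r:
                A[i][j] = A[j][i] = 1
    return A


def n_edges(A):
    """Number of undirected edges in a symmetric 0/1 adjacency matrix."""
    return sum(map(sum, A)) // 2


# Toy coordinates (hypothetical): three clustered cells and one distant cell.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
A_knn = knn_adjacency(coords, k=1)
A_small = radius_adjacency(coords, r=1.5)
A_large = radius_adjacency(coords, r=10.0)
print(n_edges(A_knn), n_edges(A_small), n_edges(A_large))
```

   With the small radius the distant cell has no neighbors at all, with
   the large radius every pair of cells is linked, and the kNN rule
   forces an edge to the distant cell regardless of how far away it is.
   None of the three fixed rules uses gene expression to decide which
   edges are meaningful.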

   To address the above problems, in this study, we investigate a scenario
   where the cell graph structure is dynamically learned and adjusted
   during the contrastive learning process. To this end, we draw
   inspiration from recent developments in graph structure learning
   (GSL),^[ [59]^24 , [60]^25 , [61]^26 ^] and propose STAGUE, an
   unsupervised representation learning model for Spatial Transcriptomics
   with spAtially informed Graph strUcture lEarning. STAGUE infers
   discriminative cell representations by exploiting the contrast between
   multiple views of the input cell graph. By parameterizing and
   optimizing the cell graph structure, it can generate learnable graph
   views to enhance the efficiency of the contrast process. This approach
   also seamlessly integrates the spatial clustering and CCI inference
   tasks into a unified framework. Specifically, STAGUE employs a spatial
   learner module that effectively models a cell graph adjacency matrix by
   utilizing the intrinsic spatial and gene expression relationships among
   cells. The learned adjacency matrix captures the statistical
   dependencies among cells, reflecting their proximities and potential
   interactions. The integration of the normalized temperature‐scaled
   cross‐entropy loss^[ [62]^25 , [63]^27 ^] and triplet loss as joint
   contrastive objectives facilitates the contrast between different
   views. Benchmark results show that STAGUE outperforms 15 comparison
   methods in spatial clustering across 86 ST datasets from different
   platforms. Moreover, STAGUE exhibits the capability to discern
   heterogeneity within both cancerous and paracancerous regions in human
   breast cancer samples. It uncovers the localized initiation of the
   epithelial‐to‐mesenchymal transition as well as the activation of
   PI3K/AKT signaling pathways in particular sub‐regions. For the CCI
   inference task, STAGUE shows superior ability to discern biologically
   meaningful CCIs compared to existing GAE‐based methods, and the
   regulatory genes governing these interactions are revealed.
   Comprehensive parameter analyses and ablation studies validate the
   effectiveness of the proposed components in STAGUE.

2. Results

2.1. Overview of STAGUE

   STAGUE is an unsupervised representation learning model for spatial
   transcriptomics (ST), utilizing spatially informed graph structure
   learning (GSL) to integrate spatial clustering and cell‐cell
   interaction (CCI) inference tasks into a unified framework (Figure [64]
   1 , Experimental Section). Given an ST dataset as input, we can extract
   a gene expression matrix and spatial coordinates of cells. Utilizing
   the coordinates, we derive a raw k‐nearest neighbors (kNN) cell
   adjacency matrix. STAGUE operates mainly on three different views of
   the input ST data. The learner view is derived using a spatial learner
   module that integrates the expression features and spatial information
   to learn a cell adjacency matrix. Compared with the raw kNN cell
   adjacency matrix, the learned adjacency matrix features continuous
   values and is designed to capture the statistical dependencies among
   cells, reflecting their proximities and potential interactions. The
   positive view is generated from the raw kNN cell adjacency matrix and
   serves as the positive pair of the learner view. The negative view is
   formed by randomly shuffling the node features and using an identity
   matrix as the adjacency matrix. Before the Graph Convolutional Network
   (GCN)^[ [65]^28 ^] encoding, the three views are processed using data
   augmentation techniques, including feature masking and edge dropping.
   The encoded embeddings are then mapped to the space where contrastive
   loss is applied through projection heads. The integration of the
   normalized temperature‐scaled cross‐entropy (NT‐Xent) loss^[ [66]^25 ,
   [67]^27 ^] and triplet loss enables an effective contrast between the
   three views. The learned adjacency matrix from the spatial learner
   module can be utilized to discern novel CCIs. Furthermore, the
   embeddings from the learner view can be used for multiple downstream
   analyses, including spatial clustering, data visualization, and
   trajectory inference. Note that various unsupervised clustering
   algorithms^[ [68]^12 , [69]^13 , [70]^14 ^] are applicable to cell
   embeddings; in this study, the standard k‐means was utilized as the
   default clustering algorithm.
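
   The construction of the three views can be sketched as follows. This
   is a simplified illustration under stated assumptions: the learned
   adjacency matrix is passed in as a placeholder instead of being
   produced by the spatial learner module, adjacency matrices are assumed
   symmetric, and the function names, masking scheme, and augmentation
   probabilities are hypothetical. The GCN encoding and projection steps
   are omitted.

```python
import random


def identity(n):
    """n x n identity matrix as nested lists."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def shuffle_rows(X, rng):
    """Randomly permute the cells (rows) of the feature matrix."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    return [X[i][:] for i in idx]


def mask_features(X, p, rng):
    """Zero out each feature column with probability p."""
    keep = [rng.random() >= p for _ in range(len(X[0]))]
    return [[x if k else 0.0 for x, k in zip(row, keep)] for row in X]


def drop_edges(A, p, rng):
    """Remove each off-diagonal edge of a symmetric matrix with probability p."""
    out = [row[:] for row in A]
    n = len(out)
    for i in range(n):
        for j in range(i + 1, n):
            if out[i][j] != 0 and rng.random() < p:
                out[i][j] = out[j][i] = 0
    return out


def build_views(X, knn_adj, learned_adj, p_feat=0.2, p_edge=0.2, seed=0):
    """Return augmented (features, adjacency) pairs for the three views."""
    rng = random.Random(seed)
    raw = {
        "learner": (X, learned_adj),    # learned, continuous-valued graph
        "positive": (X, knn_adj),       # raw kNN graph
        "negative": (shuffle_rows(X, rng), identity(len(X))),
    }
    return {
        name: (mask_features(F, p_feat, rng), drop_edges(G, p_edge, rng))
        for name, (F, G) in raw.items()
    }


# Toy inputs (hypothetical): 3 cells, 3 genes.
X = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0], [4.0, 1.0, 0.0]]
knn_adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
learned_adj = [[0.0, 0.8, 0.1], [0.8, 0.0, 0.6], [0.1, 0.6, 0.0]]
views = build_views(X, knn_adj, learned_adj)
```

   Each entry of `views` would then be passed to a shared encoder. Because
   the negative view uses an identity adjacency matrix, it has no
   off‐diagonal edges to drop, so only feature masking alters it.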

Figure 1.

   Overview of STAGUE. The input ST data comprises gene expression matrix
   X and spatial coordinates C. Utilizing C, a raw kNN cell adjacency
   matrix Γ is derived. STAGUE processes the input ST data through three
   views: the learner view, which combines expression features with
   spatial information to infer a cell adjacency matrix A capturing cell
   dependencies; the positive view, derived from the raw kNN cell
   adjacency matrix Γ; and the negative view, formed by shuffling cell
   features and using an identity adjacency matrix. The three views
   undergo data augmentation and are subsequently encoded using GCN
   encoders with shared weights. The embeddings are then projected to
   compute contrastive loss, utilizing a joint objective composed of the
   NT‐Xent loss $\mathcal{L}_{NT}$ and triplet loss
   $\mathcal{L}_{triplet}$. The low‐dimensional embedding of the learner
   view H can be used for
   multiple downstream tasks, including clustering, visualization, and
   trajectory analysis, whereas the learned adjacency matrix A aids in
   identifying novel CCIs.
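
   To make the joint objective concrete, the sketch below computes one
   common form of the NT‐Xent loss between the learner and positive
   views, plus a margin‐based triplet loss that uses the negative view.
   The exact formulation, the temperature, the margin, and the weighting
   between the two terms are not specified in this section, so the
   choices here (cosine similarity, Euclidean distance in the triplet
   term, and the hypothetical `weight` parameter) are assumptions for
   illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def nt_xent(Z1, Z2, tau=0.5):
    """Mean over cells i of -log softmax_k(sim(z1_i, z2_k) / tau) at k = i."""
    n = len(Z1)
    total = 0.0
    for i in range(n):
        logits = [cosine(Z1[i], Z2[k]) / tau for k in range(n)]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / n


def triplet(anchor, pos, neg, margin=1.0):
    """Mean hinge loss max(0, d(a, p) - d(a, n) + margin), Euclidean d."""
    losses = [
        max(0.0, math.dist(a, p) - math.dist(a, q) + margin)
        for a, p, q in zip(anchor, pos, neg)
    ]
    return sum(losses) / len(losses)


def joint_loss(Z_learner, Z_pos, Z_neg, tau=0.5, margin=1.0, weight=1.0):
    """Sum of the two terms; `weight` is a hypothetical balancing factor."""
    return nt_xent(Z_learner, Z_pos, tau) + weight * triplet(
        Z_learner, Z_pos, Z_neg, margin
    )


# Toy projected embeddings (hypothetical): two cells, two dimensions.
Z_learner = [[1.0, 0.0], [0.0, 1.0]]
Z_pos = [[1.0, 0.0], [0.0, 1.0]]  # agrees with the learner view
Z_neg = [[0.0, 1.0], [1.0, 0.0]]  # shuffled features
loss = joint_loss(Z_learner, Z_pos, Z_neg)
print(round(loss, 4))
```

   In this toy case the learner and positive embeddings coincide and the
   negative embeddings lie farther away than the margin, so the triplet
   term vanishes and only the NT‐Xent term contributes.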

2.2. STAGUE Outperforms State‐of‐the‐Art Methods on Spatial Clustering

2.2.1. Clustering Accuracy

   In this section, we evaluate the spatial clustering accuracy of STAGUE,
   benchmarking it against fifteen recent algorithms on a comprehensive
   collection that includes both real and simulated ST datasets
   (Experimental Section).

   The selected comparison algorithms encompass deep learning methods,
   both with and without the implementation of graph contrastive learning
   (GCL), as well as the latest statistical learning approaches to ensure
   a rigorous assessment. The first group of real datasets (Real#1)
   includes 14 datasets derived from a variety of technological platforms,
   including STARmap, MERFISH, osmFISH, Stereo‐seq, and 10x Visium. Among
   these data, eight are characterized by single‐cell resolution, while
   six offer spot‐level resolution. This selection presents a varied scope
   regarding spatial resolution, cell counts, and gene coverage. The
   second group of real datasets (Real#2) consists of 16 datasets selected
   from a recent benchmark study focused on spatial clustering.^[ [72]^29
   ^] These data involve non‐brain tissues and cover smaller, discrete
   tissue domains such as breast cancer, liver, and pancreatic ductal
   adenocarcinoma. The first simulated dataset group (Simulated#1)
   comprises 28 datasets generated using the spatial pattern preserving
   simulation tool, SRTsim.^[ [73]^30 ^] These simulations utilized the
   Real#1 group as references, incorporating both tissue‐based and