Abstract
Spatial transcriptomics has emerged as a groundbreaking tool for the
study of intercellular ligand-receptor interactions (LRIs) that exhibit
spatial variability. To identify spatially variable LRIs with
activation evidence, we present SPIDER, which constructs cell-cell
interaction interfaces constrained by cellular interaction capacity,
and profiles and identifies spatially variable interaction (SVI)
signals with support from downstream transcript factors via multiple
probabilistic models. SPIDER demonstrates superior performance
regarding accuracy, specificity, and spatial variance relative to
existing methods. Experiments of simulations and real datasets in bulk
and single-cell resolutions validate SPIDER-identified SVIs by spatial
autocorrelation and correlation with downstream target genes, and
reveal their consistency across multiple biological replicates.
Particularly, distinct SVIs on mouse datasets reveal the potential in
representing regional and inter-cell type interactions. SPIDER groups
SVIs with similar spatial distributions into SVI patterns that are
supported by strong Pearson correlations on spot annotations,
generating interaction-based sub-clusters within cell-type regions, and
deriving enriched pathways.
Subject terms: Computational models, Cellular signalling networks,
Cancer genomics, Software, Machine learning
__________________________________________________________________
Detecting spatial variance across cell-cell interactions remains
computationally challenging. Here, the authors develop SPIDER, a
statistical and machine learning tool to detect functionally-supported
spatially variable ligand-receptor interactions from spatial
transcriptomics data at bulk and single-cell resolutions.
Introduction
Cell-cell interactions (CCIs) play a vital role in cellular functions,
organogenesis, and disease progression, with identified CCIs widely
applied in disease diagnosis and therapeutic strategies^[30]1–[31]3.
These highly complex interactions involve multiple signaling pathways
and crosstalk between different cell types, some facilitated by
ligand-receptor interactions (LRIs). Single-cell RNA sequencing
(scRNA-seq) technology facilitates LRI inference from the co-expression
of signaling molecules, such as ligands, receptors, and downstream
transcription factors^[32]4.
Recently, the advance of spatially resolved transcriptome (ST)
technologies^[33]5–[34]8 has improved the accuracy of LRI inference by
applying spatial proximity to LRI signal detection^[35]9. For example,
Giotto and SpaTalk infer LRIs from neighboring cells defined by
Delaunay triangulation, and recent COMMOT limits the signaling range of
LRIs by collective optimal transport (COT)^[36]10–[37]12. The inferred
LRI signals are subject to differential expression tests given cluster
labels such as cell types. For example, SpaTalk identifies enriched
LRIs between any two clusters with a permutation test of randomly
shuffling cell labels to reconstruct interacting cell pairs^[38]11.
COMMOT summarizes LRI signals into a cluster-by-cluster signaling
matrix and applies label permutation tests to identify the interaction
significance between any pair of clusters^[39]12. However, examining
LRIs with predefined spot clusters neglects possible regional
interactions among mixed clusters or sub-clusters^[40]13.
Given the regulatory role of LRIs for tissue organization and
homeostasis^[41]14, LRI signals could exhibit spatial heterogeneity
across spatial locations, which we refer to as spatially variable LRIs
(SVIs). Spatial variance in LRI signals can reflect cell states
independent of gene-based annotations - cells from multiple clusters or
a subcluster could exhibit homogeneous LRI signals^[42]15. Furthermore,
similar to spatially variable genes (SVG), SVIs preserve the spatial
relationships of cells and capture cellular heterogeneity concerning
interactions^[43]16. Therefore, the identification of SVIs fills the
gap in existing CCI studies.
Statistical models for assessing the significance of the dependence
between signals and spatial coordinates have been proposed and
reviewed^[44]17. Spatial autocorrelation statistics, such as Moran’s I
and Geary’s C, serve as the baseline measurements for spatial
variance^[45]18,[46]19. For more sophisticated methods, SOMDE,
SpatialDE, SpatialDE2, and nnSVG are based on the Gaussian process,
SPARK-X is based on nonparametric covariance tests, and scGCO is based
on hidden Markov random fields and graph cut^[47]20–[48]24. However,
the above methods are not directly applicable to identifying SVIs due
to two obstacles. First, the LRI signal is inferred between two
neighboring cells without specific locations, which is a prerequisite
for spatial variance models. Furthermore, the other challenge is the
computational scalability^[49]17. For example, SPARK-X and nnSVG scale
linearly with the number of spatial locations^[50]22,[51]23, while
SpatialDE2 scales quadratically^[52]21. Detecting the spatial variance
of interaction signals is more computationally intensive, as the number
of spot pairs producing the signal doubles or triples the number of
spots. This introduces the need for current SVG methods to adapt to the
increasingly larger data size, as ST platforms move towards even higher
resolution.
In addition to detecting spatial variance for a single signal, methods
also exist for detecting the spatial co-occurrence between two signals.
SVCA is a Gaussian process-based framework that decomposes gene
expression variation into intrinsic effects, environmental effects, and
gene interaction effects, identifying genes with a high proportion of
variance explained by the interaction term^[53]25. Similarly,
SpatialCorr employs a likelihood ratio test statistic to detect
spatially varying gene correlations derived from the Gaussian kernel
and assesses statistical significance through sequential Monte Carlo
permutation^[54]26. However, the above methods do not consider if the
spatially co-occurring genes form any spatial patterns. Recently,
SpatialDM proposed to identify SVIs by introducing Moran’s R, a
bivariate extension of the aforementioned spatial autocorrelation
statistic Moran’s I, to detect spatial co-expression with spatial
patterns^[55]27. However, SpatialDM fails to consider any functional
support for identified SVIs, for example, whether the SVIs trigger any
downstream targets in receivers. Without functional validation, the
reported SVIs could contain false positives that should not be
integrated into downstream analyses. Therefore, a method that removes
such false positives by utilizing the enrichment of the
ligand-receptor-target (LRT) signaling network is still lacking.
In this study, we propose the package, named SPtial Interaction-encoDed
intErface decipheR (SPIDER), to identify SVIs with functional supports.
SPIDER constructs interaction capacity-constrained cell-cell
interaction interfaces using a power diagram. SPIDER then encodes each
interface with the estimated interaction strengths of individual LR
pairs by joining COT and co-expressions. Subsequently, SPIDER seeks
functional support of an LRI by examining the activation of its target
genes. In selecting potentially spatially variable LRIs, SPIDER
recruits a self-organizing map (SOM) neural network to form abstract
interfaces with interfaces in close proximity and identify signals that
are potentially spatially variable across interfaces using six
probabilistic models. Finally, SPIDER screens for SVI candidates with
functional support, and generates interaction patterns by grouping SVIs
with similar spatial distributions. Across all simulated scenarios in
both bulk and single-cell resolutions, SPIDER is able to receive the
highest Area Under the Receiver Operator Curve (AUC) and specificity
values compared to other methods. On real datasets, SPIDER shows
robustness in identifying SVIs from both single-cell and spot-based ST
data generated by different platforms, including 10x Visium, Slide-seq
V2, seqFISH, and Stereo-seq^[56]8,[57]28–[58]30. SPIDER demonstrates
scalability by effectively operating across a range of constructed
interfaces, from hundreds to over one hundred thousand, accommodating
the increasing size of ST data. The difference in resolution also
suggests the robustness of SPIDER against sequencing platforms. The
SPIDER-identified SVIs have been validated based on two criteria: they
exhibit higher scores on spatial autocorrelation metrics than the
excluded LRIs, and they correlate stronger with their downstream target
genes than the excluded LRIs. The biological significance of SVIs and
SVI patterns has been validated through strong Pearson correlations
with spot annotations, indicating biologically relevant spatial
patterns. The analysis has also generated interaction-based
sub-clusters within cell-type regions and clusters characterized by
mixed cell-type interactions. Furthermore, SPIDER derives enriched
pathways from SVIs and their supporting genes and excludes
false-positive SVIs without literature evidence.
Results
Overview of SPIDER
Following the assumption that LRIs are spatially constrained, SPIDER
estimates an interaction interface between any pair of cells given
their interaction capacity and spatial proximity (Fig. [59]1A). SPIDER
first evaluates the interaction capacity for each cell based on the
total expression of ligand and receptor genes. SPIDER then applies a
power diagram to identify interfaces between cells based on varying
interaction capacities^[60]31.
Fig. 1. Schema of SPIDER.
[61]Fig. 1
[62]Open in a new tab
A SPIDER estimates interaction interfaces between neighboring cells
based on interaction capacity and spatial proximity. Interface sizes
are proportional to assigned capacities in a power diagram
representation^[63]76. B SPIDER models ligand-receptor interaction
(LRI) signals across interfaces using a collective optimal transport
approach integrated with co-expression analysis to estimate interaction
profiles and directions. C Interfaces are located on the spatial
transcriptome slice and represented as nodes in a proximity graph.
Self-organizing mapping derives an abstract interface graph joining
adjacent interfaces. D A knowledge graph is used to estimate receptor
activation by analyzing weighted paths reaching spatially variable TFs
(svTFs) using a cell-specific adjacency matrix. The power of this
matrix quantifies path counts at increasing lengths. E Correlating
receptor activation scores with LRI scores provides supporting evidence
for LRIs by analysis of svTF patterns. F Statistical testing identifies
spatially variable interaction (SVI) candidates. G Candidates are
further screened for SVIs supported by correlating receptor activations
with LRI scores. H SVIs with similar distributions are clustered using
a mixture model.
Similar to the commonly used Delaunay triangulation in ST data
analysis, a power diagram generates polygons representing spots.
However, power diagrams directly represent interaction areas, with
polygon sizes proportional to assigned interaction capacities.
Therefore, the capacity-based power diagram outperforms Delaunay
triangulation in its superior adaptability to identify interfaces
according to varying interaction dynamics (detailed comparison in
Supplementary Note [64]1).
In a power diagram, cells are lifted onto a paraboloid surface in 3D
space, with the position of each cell determined by its assigned weight
and 2D positions. The lifted positions shape an overall convex hull
when viewed from above the paraboloid. This convex hull is then
projected back downward onto the original 2D plane. The boundaries
determined by this projection define the sizes and shapes of each
cell’s bounding polygon within the 2D plane. The area of each polygon
is directly proportional to the corresponding cell’s original lifting
weight, with higher weights yielding larger polygons.
Subsequently, SPIDER models LRI signals across interfaces, represented
as interaction profiles, as well as interaction directions, using a COT
approach integrated with co-expression analysis of ligand and receptor
genes (Fig. [65]1B and Supplementary Note [66]2). First, SPIDER applies
COT to estimate the distribution of ligand and receptor expression
across interfaces^[67]12. The COT problem is formulated to minimize the
transport of LR expression across interfaces while penalizing
un-transported expression, with constraints ensuring expression is
transported from source to target spots. The COT solution represents
the optimal distribution plan of ligands and receptors across
interfaces, therefore facilitating the estimation of LRI signals and
directions. Subsequently, the direction of an LRI in an interface is
inferred as the maximum between bi-directional COT scores. For the
interaction profiles, each entry in the profile represents an LR pair
from the LR database. From the optimal transport scores, LRI-specific
expressions are extracted for each spot to calculate LR co-expression.
The interaction strength of a ligand-receptor pair is then calculated
as the maximum of the corresponding COT score and co-expression value.
Subsequently, SPIDER locates the profiled interface on the ST slice at
the center of the connected spots (Fig. [68]1C). As a result, we obtain
the coordinates and expressions for interfaces. In particular, locating
interfaces enables the construction of a spatial proximity graph with
interfaces as nodes, which facilitates calculating the spatial variance
of an LRI signal.
To reduce the number of sites in testing spatial variance, SPIDER
further finds an abstract representation of interfaces based on the
interface spatial proximity graph (Fig. [69]1C). Specifically, SPIDER
utilizes SOM, an unsupervised neural network, to adaptively integrate
adjacent interfaces into an abstract interface. SOM derives the mapping
between interfaces and an abstract interface by both the topological
neighboring relations and the relative interface densities^[70]32. The
proximity-based abstraction of SOM preserves the spatial continuity of
joint interaction profiles. Each abstract interface joins the
interaction profiles of the contracted interfaces into an abstract
interaction profile with a mean-max signal convolution.
After interface construction, SPIDER analyzes spatially variable
transcription factors (svTFs) to provide functional support for LRI
scores. The inclusion of svTF analysis serves two major purposes.
First, the activation of a TF downstream of an SVI provides mechanistic
validation for the detected SVI. If an SVI is functional, its signaling
should propagate through the receptor to downstream gene regulatory
programs, resulting in the spatial activation of specific TFs^[71]11.
Therefore, observing spatially variable expression of a TF that can be
mechanistically linked to an upstream SVI offers strong evidence that
the SVI is not merely a result of spatial co-expression or technical
artifact, but is biologically meaningful and functionally realized in
the tissue context. Second, integrating svTFs into our analysis
enhances the biological interpretability of SVIs, as it allows us to
connect spatial cell-cell communication events with downstream changes
in gene regulatory networks and pathway enrichment analysis. By linking
SVIs to the activation of specific TFs, we can better understand how
spatial signaling events drive functional heterogeneity within the
tissue and how these events may shape spatial patterns of gene
expression and cell states. Consequently, we exclude non-spatially
variable TF genes, as they provide less compelling functional evidence
for SVIs and are less likely to reflect the downstream consequences of
spatial signaling.
SPIDER contains a knowledge graph representing known regulatory
relationships among ligands, receptors, and downstream target TF genes.
Receptor activation is estimated by tracing weighted paths reaching
svTF nodes through this graph topology, using a cell-specific adjacency
matrix representation integrated with cell expressions (Fig. [72]1D).
Powers of the weighted adjacency matrix quantify path counts linking
receptors and TFs at increasing hops, akin to signal propagation
through multiple network steps. Hop matrices extracted at each power
allow the dissection of receptor-TF connectivity over a range of path
lengths. The summation of hop matrices within a length threshold yields
a combined matrix of full activation scores, systematically
characterizing signal flow over the knowledge graph. Correlating the
resulting activation scores with the LRI profiles implicating those
receptors provides functional evidence of receptor-TF collaborations
(Fig. [73]1E).
Subsequently, SPIDER identifies SVI candidates by a multi-model test,
as illustrated in Fig. [74]1F from the abstract interfaces.
Specifically, SPIDER integrates six statistical models and two
benchmark tests for spatial variance and retains LRIs with
statistically significant spatial variances that pass the joint model
tests. SPIDER further screens for SVI candidates that are supported by
svTFs, generalizing the set of svTF-supported SVIs (Fig. [75]1G).
SPIDER then clusters SVIs based on similarities in spatial
distributions using a Gaussian process mixture model (Fig. [76]1H). The
number of clusters is automatically determined by a Dirichlet process
prior^[77]21. Each SVI cluster generates a spatial pattern as the
posterior mixture of the member SVIs modeled as Gaussian process
components, summarizing the spatial distribution of the included SVIs.
We refer to the Gaussian process mixture as the activation strength of
the generated pattern, which provides a quantitative measure of the
summarized SVI profiles within the cluster. To explore the potential
biological processes associated with SVIs that share similar spatial
distributions, SPIDER also identifies significantly enriched pathways
within each SVI cluster from KEGG and Reactome databases using Fisher’s
exact test^[78]33,[79]34.
Evaluation of SPIDER on simulated datasets compared to the SOTA CCI methods
To examine the accuracy of SPIDER in identifying SVIs, we simulated
multiple datasets from the pancreatic ductal adenocarcinoma (PDAC)
dataset in Fig. [80]2A with 428 spots and 498 LR pairs^[81]28. We used
the SVCA package^[82]25 to simulate the interaction strength between
ligands and receptors, as well as the receptor activation of TF genes.
Specifically, SVCA dissected the expression variance of a gene into
distinct Gaussian process components that individually represent the
contributions from intrinsic cell states, spatial proximity, gene
interactions, and residual noise.
Fig. 2. Comparisons between SPIDER and other state-of-the-art CCI methods on
simulated PDAC and DLPFC datasets in single-cell and bulk resolutions.
[83]Fig. 2
[84]Open in a new tab
A–E Comparisons on simulated single-cell PDAC datasets. A PDAC sample
with region annotations. B Receiver Operating Characteristic (ROC) of
the simulation dataset with 99% interaction strength level; Area Under
ROC (AUC) for each method is marked in the legend. C The boxplot of
five SVI evaluation metrics on LRIs detected by SPIDER, SpatialDM,
CellChat, SpatialCorr, SpaTalk, and those excluded by the above methods
(n=490 LRIs). The Geary score, where a lower value indicates higher
spatial clustering, is reversed for visualization consistency. D
Comparison of average AUCs across datasets under four different
interaction strength levels (25, 50, 75, and 99%, respectively) and
median noise level across different fractions. E Comparison of average
specificity scores across null datasets under four different
interaction strength levels (25%, 50%, 75%, and 99%, respectively)
across all three noise levels. F–J Comparisons on simulated single-cell
DLPFC sample 151,673. F DLPFC sample 151,673 with region annotations. G
ROC of the simulation dataset with 99% interaction strength level. H
The boxplot of five SVI evaluation metrics on LRIs detected by SPIDER,
SpatialDM, CellChat, and SpaTalk, and those excluded by the above
methods, with reverted Geary scores (n = 407 LRIs). I Comparison of
average AUCs across datasets under four different interaction strength
levels and median noise level across different fractions. J Comparison
of average specificity across null datasets under four different
interaction strength levels with all three noise levels between SPIDER
and SpatialDM. K–M Comparisons on simulated bulk PDAC datasets. K Bulk
PDAC simulation with annotations. L ROC of the simulation dataset with
99% interaction strength level; Area Under ROC (AUC) for each method is
marked in the legend. M Comparison of AUCs across datasets under four
different interaction strength levels with two resolution settings and
all three noise levels. N–P Comparisons on simulated bulk DLPFC
datasets. N Bulk DLPFC simulation with annotations. O ROC of the
simulation dataset with 99% interaction strength level; AUC for each
method is marked in the legend. P. Comparison of AUCs across datasets
under four different interaction strength levels with two resolution
settings and all three noise levels). PDAC: pancreatic ductal
adenocarcinoma; DLPFC: dorsolateral prefrontal cortex. The boxplots
display the median (center line), the 25 and 75th percentiles (box
bounds), whiskers extending to the most extreme data points within
1.5 × the interquartile range, minima and maxima as the lowest and
highest points within the whiskers, and outliers as individual points
beyond the whiskers. The statistical significance of box plots is
calculated using one-sided Mann-Whitney-Wilcoxon test with
Benjamini-Hochberg correction, with the exact adjusted p-values listed
in Supplementary Table [85]2 and the following significance
annotations: ****: adjusted p-value ≤ 1.00e-04; ***: 1.00e-04 <
adjusted p-value ≤ 1.00e-03; **: 1.00e-03 < adjusted
p-value ≤ 1.00e-02; *: 1.00e-02 < adjusted p-value ≤ 5.00e-02. Source
data are provided as a [86]Source Data file.
By controlling the fraction of expression variance explained by gene
interactions, we simulated ligand gene expression from the
corresponding receptor expression. This fraction, which defines the
interaction strength level, allows heterogeneous interaction strengths
across both cells and LR pairs. Similarly, we simulated the expression
of TF genes by controlling the fraction of intrinsic variance from
receptors. As the spatial pattern of simulated expression is dominated
by that of the given receptor gene, we selected the top 100 SV and
non-SV receptors. Considering that SVCA is a Guassian-based framework,
we further imposed different levels of Poisson noise on the simulated
count matrix.
We constructed various simulation scenarios by combining the simulated
ligand and TF expression data, initially generating 12 simulations by
systematically varying the interaction strength levels–specifically,
setting the fraction of ligand variance explained by receptor
interaction to 99, 75, 50, and 25%–in conjunction with different noise
levels. At the median noise level, we generated an additional 12
simulations by adjusting the fraction of svTF-supported non-SV
receptors, with options for full, partial, or no svTF support. Lastly,
we created four null simulations at the median noise level across
different interaction strength levels, ensuring that all SV receptors
were unsupported by svTFs, while non-SV receptors could have varying
levels of support.
We benchmarked SPIDER’s performance against state-of-the-art (SOTA) CCI
inference methods: cluster-based SpaTalk for ST data and CellChat for
non-spatial single-cell data, as well as spatial correlation-based
SpatialCorr and SpatialDM. To further test the effect of SPIDER
components, we replace the interface scoring step in SPIDER with COMMOT
and stLearn. We also evaluated the individual performances of SPIDER
SVI models, namely SpatialDE, SpatialDE2, scGCO, SPARKX, SOMDE, and
nnSVG.
AUC, or Area Under the Curve, is a widely utilized performance metric
that evaluates the effectiveness of binary classification models by
quantifying the area under the Receiver Operating Characteristic (ROC)
curve. This metric provides a comprehensive measure of a model’s
ability to differentiate between positive and negative classes, with a
value of 1.0 signifying perfect discrimination. When the interaction
strength is set at 99%, SPIDER achieved an average AUC of 0.84 across
all noise levels, surpassing both SpatialDM (AUC = 0.701) and TF-based
SpaTalk (AUC = 0.697), as illustrated in Fig. [87]2B. Additionally,
SPIDER outperformed CCI methods SpatialCorr (AUC = 0.475) and CellChat
(AUC = 0.673), which are not specifically designed for identifying
SVIs. In contrast, the performance of scoring methods stLearn and
COMMOT was less satisfactory, yielding AUCs of 0.713 and 0.562,
respectively, thereby indicating SPIDER’s superiority in LRI scoring.
To further evaluate the selected SVIs identified by SPIDER, we employed
two standard measures of spatial autocorrelation: Moran’s I and Geary’s
C. High values of I and low values of C indicate non-random, clustered
distributions^[88]18,[89]19. We reverse the Geary score for visual and
contextual consistency. In addition to these baseline assessments, we
analyzed specialized metrics from the SOMDE and nnSVG algorithms,
specifically the log-likelihood ratio (LR) and SOMDE’s fraction of
spatial variation (FSV) score, which quantifies spatially explained
interaction variability^[90]20,[91]22. Under conditions of 99%
interaction strength level, the SVI candidates selected by SPIDER and
SpatialDM, and LRIs from SpatialCorr, CellChat, and SpaTalk exhibited
significantly higher scores than those excluded by the above methods
(adjusted p-value ≤ 0.0001), as illustrated in Fig. [92]2C.
Additionally, SPIDER-nominated SVI candidates generally demonstrated
significantly higher scores compared to those identified by SpatialDM,
SpatialCorr, CellChat, and SpaTalk.
In scenarios with lower interaction strength levels of 25, 50, and 75%
and three noise levels, SPIDER consistently outperformed all other
methods (Fig. [93]2D and Supplementary Fig. [94]1A). As summarized in
Supplementary Table [95]1, SPIDER achieved an average AUC of 0.886,
compared to average AUCs of 0.762 and 0.703 for SpatialDM and SpaTalk,
respectively. Similar performance trends were observed for spatial
variance metrics in other simulation cases, where SPIDER often attained
higher scores than SpatialDM, SpatialCorr, CellChat, and SpaTalk,
although not always statistically significant, as demonstrated in
Supplementary Fig. [96]1B. Additionally, SPIDER outperformed all other
methods under different combinations of interaction strength levels and
fractions of svTF-supported non-SV receptors (Supplementary
Fig. [97]1C). SPIDER also demonstrated robust performance in composite
interaction scenarios (details in Supplementary Note [98]3), achieving
mean AUROCs of 0.930 (mixed interaction strength levels) and 0.871
(mixed noise levels), with superior performance in weak interaction
identification (AUROC = 0.931 vs 0.797–0.671) and high-noise spatial
pattern (AUROC = 0.779 vs 0.767–0.669) compared to SpatialDM and
SpaTalk. Systematic validation through variance ranking and spatial
autocorrelation metrics further confirmed SPIDER reliability in
detecting SVI across interaction variance gradients and varying spatial
patterns.
Furthermore, the individual SPIDER SVI model consistently yielded
higher AUCs than SpatialDM across varying interaction and noise levels
(Fig. [99]2D). Notably, SPIDER-SpatialDE and SPIDER-nnSVG, which
utilize only the statistical results from SpatialDE and nnSVG, achieved
the highest AUCs of 0.902 and 0.901, respectively. Among SPIDER SVI
models, Gaussian models, such as SPIDER-SOMDE (average AUC = 0.899) and
SPIDER-SpatialDE2 (average AUC = 0.895) outperformed scGCO (average
AUC = 0.884) and SPARKX (average AUC = 0.872).
To evaluate performance under null scenarios, we conducted tests on
SPIDER and comparable methods using simulated data without
svTF-supported SVIs. Specificity, defined as a method’s ability to
minimize false positives, serves as a critical performance indicator,
with higher values indicating superior outcomes. SPIDER consistently
demonstrated higher specificity compared to SpatialDM and SpaTalk
across various interaction levels, with average specificities of 0.848,
0.392, and 0.418, respectively (Fig. [100]2E). Additionally, SPIDER
outperformed individual statistical models and most other methods in
terms of specificity (Supplementary Fig. [101]1C). It is noteworthy
that some methods achieved higher specificity than SPIDER, primarily
due to their tendency to reject a majority of LRI pairs.
We further test for the effect of TF variability on SPIDER. Across
three TF noise levels, SPIDER consistently outperformed SOTA CCI
methods with the highest mean AUCs of 0.813, compared to average AUCs
of 0.779 and 0.639 for SpatialDM and SpaTalk, respectively
(Supplementary Fig. [102]2A). While both SPIDER and SpaTalk were
affected by the TF noise, SPIDER presented a smaller drop of 0.036 in
AUC value from low to high TF noise levels compared to that of 0.043
for SpaTalk. Additionally, SPIDER under the median TF noise generally
outperformed SOTA CCI methods against varying LRI noise levels and
interaction strength levels (Supplementary Fig. [103]2B, C).
To eliminate dataset-specific influences, we replicated the simulation
using the human dorsolateral prefrontal cortex (DLPFC) sample 151673
shown in Fig. [104]2F^[105]5. When the interaction strength level is
set at 99%, SPIDER achieved an AUC of 0.85 across all noise levels,
surpassing both SpatialDM (AUC = 0.78) and TF-based SpaTalk
(AUC = 0.81), as illustrated in Fig. [106]2G. Additionally, SPIDER
outperformed the CCI method CellChat (AUC = 0.72), with SpatialCorr
failing to run on this larger dataset. In this dataset, we observed a
better but still lesser performance of scoring methods stLearn and
COMMOT, yielding AUCs of 0.81 and 0.74, respectively, again supporting
SPIDER’s superiority in LRI scoring. The spatial variance evaluation
metrics also suggested the higher quality of SVIs from SPIDER compared
with SpatialDM, CellChat, and SpaTalk (Fig. [107]2H). Such accuracy and
quality persist for other combinations of interaction strength levels
and noise levels (Fig. [108]2I and Supplementary Fig. [109]3A, B).
Across all interaction strength and noise level, SPIDER performance is
in alignment with previous findings, exhibiting higher an average AUC
of 0.859 and average specificity of 0.592, compared to SpatialDM, which
recorded an average AUC of 0.803 and average specificity of 0.395, and
SpaTalk, which recorded an average AUC of 0.809 and average specificity
of 0.408 (Fig. [110]2J and Supplementary Fig. [111]3C).
In both simulations, integrating six distinct spatial variance testing
models within SPIDER confers notable advantages over relying on any
single model. As shown in Supplementary Table [112]1, while individual
models such as SPIDER-SpatialDE and SPIDER-nnSVG achieve high AUROC
values (0.902 and 0.901, respectively), and others like SPIDER-SPARKX
and SPIDER-SpatialDE2 exhibit strong specificity (0.835 and 0.833,
respectively), none surpass SPIDER itself in both AUROC and specificity
across both PDAC and DLPFC simulations. This demonstrates that the
integrative framework of SPIDER leverages the complementary strengths
of each method, resulting in consistently robust
performance–particularly in balancing sensitivity and specificity.
Furthermore, by aggregating results from multiple tests, SPIDER
effectively reduces the disagreement commonly observed among existing
spatial variance detection methods^[113]35, thereby enhancing the
reliability of identified SVI candidates. Importantly, SPIDER’s
flexible design allows users to consider a custom group of spatial
variance tests, ensuring adaptability to datasets with varying size and
noise characteristics.
Furthermore, we evaluate SPIDER and comparative methods (SpatialDM and
SpaTalk) on bulk simulations of the PDAC and DLPFC datasets. Across all
simulations, SPIDER consistently demonstrated superior predictive
performance for identifying SVIs supported by svTFs while lowering
resolution differentially influenced certain models’ accuracy. We
generated bulk PDAC dataset with the number of cells per spot as four
and six (Fig. [114]2L and Supplementary Fig. [115]4A). We found SPIDER
still demonstrated satisfactory identification of SVIs with svTF
support compared to SpatialDM and SpaTalk, as indicated by ROCs in
Fig. [116]2M and Supplementary Fig. [117]4B when the number of cells
per spot set to four. Notably, the accuracy of SpatialDE2, scGCO, and
SPARKX was more impacted by the reduced bulk resolution. This pattern
held for both cell-per-spot settings tested (Fig. [118]2N). For the
DLPFC bulk simulations (Fig. [119]2O and Supplementary Fig. [120]4C),
SPIDER again showed superior predictive ability based on the ROCs
(Fig. [121]2P and Supplementary Fig. [122]4D). In this dataset though,
only SpatialDE2 appeared to suffer substantially from the lower bulk
resolution, according to its AUROCs for both cell-per-spot settings
(Fig. [123]2Q). Overall, SPIDER maintained strong predictive
performance in identifying SVIs supported by svTFs under bulk
resolution, outperforming alternative approaches. However, moving to
bulk simulations decreased performance for some methods, with
SpatialDE2 consistently most affected on both datasets.
Using both single-cell and bulk resolution simulations, we examined the
impact of spatial constraints on SPIDER and two comparative methods,
SpatialDM and SpaTalk. When spatial constraints on LRIs are
absent–i.e., when spatial information is disregarded–none of the
methods should identify SVIs. Across repeated random simulations with
varying interaction strength levels, SPIDER and SpatialDM consistently
achieved remarkably accurate specificity scores of 1 (Supplementary
Fig. [124]5). In contrast, SpaTalk performed less effectively, likely
due to its primary focus on identifying cluster-level interactions
rather than SVIs.
To evaluate SPIDER’s robustness under weak spatial constraints, we
permuted spots or cells within blocks of varying sizes while adjusting
interaction strength levels (Supplementary Fig. [125]6A). SPIDER
consistently outperformed both SpatialDM and SpaTalk across three block
size settings, achieving a mean AUROC of 0.8716, compared to 0.7625 for
SpatialDM and 0.6487 for SpaTalk (Supplementary Fig. [126]6B).
Furthermore, we observed that loosening spatial constraints had a less
pronounced effect on bulk resolution datasets, with a reduction in mean
AUROC of only 0.0184 compared to a reduction of 0.0558 in single-cell
resolution datasets (Supplementary Fig. [127]6C). This difference is
likely attributable to the aggregated signals within spots in bulk
resolution datasets.
In real samples, we assessed the interaction types identified by SPIDER
under varying degrees of spatial constraint. Specifically, we permuted
spots or cells within blocks of different sizes in two cancerous and
two normal brain samples. Across all samples, the proportion of
short-distance interactions decreased as spatial constraints were
relaxed, confirming the influence of spatial proximity on interactions
(Supplementary Fig. [128]6D). Notably, cancerous tissues exhibited a
greater reduction in short-distance interactions, consistent with the
predominant role of ECM-receptor interactions in cancer. In contrast,
brain samples, where long-range secreted signaling dominates, showed
less pronounced changes in interaction proportions.
SPIDER involves three hyperparameters for interface abstraction and
svTF filtering of LRIs. We systematically assessed the impact of
interface abstraction as well as alternative settings for these
hyperparameters–the mixture parameter, SOM node number, and svTF
filtering threshold–on model performance using simulation datasets
(detailed in Supplementary Note [129]4). Power analysis demonstrates
that interface abstraction is essential for larger datasets, where it
effectively enhances sparse signals, but is less critical for smaller
datasets. While the mixture parameter had limited overall influence,
adjusting the SOM node number and svTF threshold demonstrated more
prominent effects. Specifically, higher SOM nodes tended to reduce
accuracy for some methods more substantially on the larger DLPFC
dataset, and the stringent svTF threshold of 0.5 showed stronger
dependence on dataset noise profiles than less restrictive cutoffs.
For practical application, we recommend enabling interface abstraction
and using a moderate SOM node number (e.g., 10) for large-scale or
high-complexity datasets, while for smaller datasets with less than
1000 spots or cells, interface abstraction can be omitted or a lower
SOM node number (e.g., 5) is preferred. The mixture parameter can be
set to the default value of 0.3, but increasing it to 0.5 may help
amplify weak signals in sparse data. For svTF filtering, the default
threshold of 0.3 generally balances sensitivity and specificity,
whereas a more stringent threshold of 0.5 may be used when higher
specificity is required, though this may reduce sensitivity in noisy
datasets. In summary, parameter values should be tailored to dataset
size and noise level: larger and noisier datasets benefit from
interface abstraction, moderate SOM node numbers, and careful
adjustment of the svTF threshold. Users are encouraged to iteratively
refine these parameters based on dataset characteristics and the
biological relevance of detected interfaces.
Evaluation of SPIDER on real datasets compared to SOTA CCI methods
We compare SPIDER, SpatialDM, stLearn, and COMMOT on the real PDAC
dataset, where SPIDER identified 42 svTF-supported SVIs, while
SpatialDM, stLearn, and COMMOT proposed 260, 323, and 346 LRIs,
respectively. Similar to the simulation cases, we compare the selected
LRIs with six metrics, as shown in Fig. [130]3A. Statistical tests
annotated in Fig. [131]3A further confirmed that svTF-supported SVIs by
SPIDER received significantly higher metrics values than those
identified by SpatialDM, stLearn, and COMMOT, suggesting that SPIDER is
more strict in identifying SVIs with both spatial variance and
functional support from target TF genes. In addition, such pattern
generally holds for other parameter settings (Supplementary
Fig. [132]7).
Fig. 3. Comparisons between SPIDER and SOTA CCI methods on real PDAC dataset.
[133]Fig. 3
[134]Open in a new tab
A Boxplot of six SVI-evaluation metrics, showing significantly higher
scores of SPIDER-identified SVIs over those by SpatialDM, COMMOT, and
stLearn, with those excluded as a baseline (n = 498 LRIs). The Geary
score is again reversed. B Boxplot of the number of supporting svTFs
per LRI among SPIDER svTF-supported LRIs, SpatialDM SVIs, stLearn and
COMMOT LRIs (n = 374, 260, 346, and 320 LRIs, respectively). C Venn
plot of svTF-supported LRIs, SpatialDM SVIs, and stLearn LRIs. D
TSPAN14/15 signaling LRIs with their best supporting TF genes and
correlations. E Barplot of the average number of ligands per receptor
(left) and the average number of receptors per ligand (right) in three
different LRI sets: SPIDER SVIs, SpatialDM SVIs, COMMOT and stLearn
LRIs, and all LRIs in the database. F Barplot showing the number of TF
supports for the top five SVI ranked by number of TF supports. G Dot
plot of pathways enriched by each SVI and its supporting spatially
variable TF genes. H Deconvoluted cell type annotations and the
corresponding SPIDER SVIs with the five highest Pearson correlation
coefficients. I Interaction patterns generated by LRIs identified by
SPIDER and SpatialDM. J Heatmap of Pearson correlation coefficients
between interaction patterns and PDAC region annotations, from SPIDER
and SpatialDM, respectively. K Heatmap of pattern-indicated interaction
between PDAC region annotations. PDAC: pancreatic ductal
adenocarcinoma; corr: Pearson correlation coefficient. The boxplots
display the median (center line), the 25th and 75th percentiles (box
bounds), whiskers extending to the most extreme data points within
1.5 × the interquartile range, minima and maxima as the lowest and
highest points within the whiskers, and outliers as individual points
beyond the whiskers. The statistical significance of box plots is
calculated using one-sided Mann-Whitney-Wilcoxon test with
Benjamini-Hochberg correction, with the exact adjusted p-values listed
in Supplementary Table [135]3 and the following significance
annotations: ****: adjusted p-value ≤ 1.00e-04; ***: 1.00e-04 <
adjusted p-value ≤ 1.00e-03; **: 1.00e-03 < adjusted
p-value ≤ 1.00e-02; *: 1.00e-02 < adjusted p-value ≤ 5.00e-02. Source
data are provided as a [136]Source Data file.
We first compare the SPIDER-proposed LRIs with svTF gene support with
the SpatialDM SVIs and stLearn and COMMOT LRIs, acknowledging that the
three SOTA methods are not designed for this specific task. The level
of TF support, quantified by the number of supporting svTF genes for an
LRI, is significantly lower for all three SOTA methods, as shown in
Fig. [137]3B. Furthermore, 59 SpatialDM SVIs and 85 stLearn LRIs are
not supported by any svTF genes (Fig. [138]3C). As a result, SpatialDM,
stLearn, and COMMOT received high false-positive rate (FPR) of 0.417,
0.667, and 0.750, respectively, when comparing to LRI svTF support.
Such pattern can be observed across dataset: SpatialDM achieved the
lowest mean FPR of 0.230 among the three methods, followed by 0.781
from stLearn and 0.958 from COMMOT (Supplementary Fig. [139]8A). The
gap between SpatialDM and other two CCI methods also supports the
relation between SVI and svTF support. We further examine the false
positive LRIs from the three SOTA methods that are rejected in SPIDER
for the lack of svTF support (Supplementary Fig. [140]8B). For example,
among three LRIs related to TSPAN14/15 signaling (Fig. [141]3D), only
APP-TSPAN15 are supported by downstream target GAPDH, while the
interaction between ADAM10 and APP-TSPAN15/14 are rejected (Pearson
r = 0.323, 0.256, and 0.122, respectively). However, ADAM10-TSPAN15 is
falsely identified by SpatialDM, COMMOT, and stLearn, and
ADAM10-TSPAN14 by SpatialDM and stLearn, as neither validates svTF gene
support. We first compare the SPIDER-proposed LRIs with svTF gene
support with the SpatialDM SVIs, acknowledging that SpatialDM is not
designed for this specific task. The level of TF support, quantified by
the number of supporting svTF genes for an LRI, is significantly lower
for SpatialDM SVIs, as shown in Fig. [142]3C. Furthermore, 59 SpatialDM
SVIs are not supported by any svTF genes. For example, among three
SpatialDM SVIs related to TSPAN14/15 signaling (Fig. [143]3D), only
APP-TSPAN15 are supported by downstream target GAPDH, while the
interaction between ADAM10 and APP-TSPAN15/14 are rejected (Pearson
r = 0.323, 0.256, and 0.122, respectively). APP has been shown to
influence the phosphorylation of key proteins like ERK within the MAPK
signaling pathway, leading to increased cell migration, invasion, and
epithelial-mesenchymal transition (EMT), promoting aggressive tumor
behavior in various cancers^[144]36. In particular, the ERK pathway can
regulate metabolic enzymes, including GAPDH, a known PDAC therapeutic
target associated with increased metabolic activity in tumor
growth^[145]37. In addition, by keeping only svTF-supported LRIs, we
found a reduced average number of ligands per receptor and receptors
per ligand in SPIDER svTF-supported SVIs, as shown in Fig. [146]3E.
This observation also validates SPIDER’s effectiveness in finding
functional SVIs, with co-expressing but not functioning LRIs excluded
by downstream target genes. Both the higher level of TF support and the
exclusion of less biologically meaningful LRIs suggest the
effectiveness of SPIDER in identifying svTF-supported LRIs compared
with SpatialDM, stLearn, and COMMOT.
Subsequently, we examine the biological insights provided by the
pathways activated jointly by SVI and their supporting svTF genes. We
show the top five ranking SVIs by their number of supporting svTF genes
in Fig. [147]3F, all of which are supported by over twenty target TF
genes. GO analysis on genes implicated in the top five ranking SVIs,
along with their supporting svTF genes, showed diverse enriched
pathways in Fig. [148]3G and Supplementary Table [149]4. We notice
similar enriched pathways for LAMC2-ITGA2 and THBS2-NOTCH3, especially
two cancer-related terms, proteoglycans in cancer and PI3K-Akt
signaling pathway (entries hsa05205 and hsa04151, adjusted p-values ≤
0.0001). The remaining three SVIs all enriched immune-related pathways
such as antigen processing and presentation (entry hsa04612, adjusted
p-value ≤ 0.0001). Here, SPP1-ITGB1 and TGM2-SDC4 and their svTF genes
further enrich T-helper cell-related pathways, while INS-INSR and its
svTF genes enrich the FoxO signaling pathway (hsa04068, adjusted
p-value = 0.0005). Additionally, we compare the biological insights
provided by false positive LRIs and svTF-supported SVIs. The false
positive LRIs from other three CCI methods enriched 22 pathways, while
SPIDER svTF-supported SVIs enriched 47 pathways, with only a small
intersection of nine pathways (Supplementary Fig. [150]8C).
Notably, these biologically enriched SVIs not only highlight key
signaling axes but also offer a practical guide for downstream
experimental validation and therapeutic exploration. For instance,
LAMC2-ITGA2 and THBS2-NOTCH3 along with their supporting svTF genes,
which activate cancer-related pathways, represent promising candidates
for perturbation assays such as ligand-blocking or CRISPR-mediated
disruption, enabling assessment of their spatial functional roles.
Moreover, INS-INSR or TGM2-SDC4 and their svTF genes, involved in
metabolism and immune regulation, may inform targeted modulation
strategies within the tumor microenvironment. By integrating spatial
variance and downstream activation, SPIDER effectively narrows the
search space for biologically meaningful, testable ligand-receptor
interactions with translational potential.
Subsequently, we examine the spatial pattern of SPIDER SVIs against LRI
spatial regions identified by SptialDM and stLearn. The deconvoluted
cell type annotations of the PDAC dataset served as another validation
of the identified SVIs (Supplementary Fig. [151]8D). As shown in
Fig. [152]3H, SVIs identified by SPIDER, such as SERPINE1-PLAUR and
PLAU-ITGA3, captured subregional spatial arrangements of two cancer
clones, A and B, dispersed in the tumor region of the sample (Pearson
r = 0.873 and 0.764, respectively). Both SVIs aid the understanding of
the spatial heterogeneity and the molecular mechanisms underlying PDAC
as they involve interactions between genes that are crucial in the
tumor microenvironment and are implicated in the pathogenesis and
progression of pancreatic duct adenocarcinoma^[153]38. These
clone-specific SVIs also provide experimentally actionable hypotheses:
spatial perturbation of SERPINE1-PLAUR or PLAU-ITGA3 in situ–through
blocking antibodies or local gene silencing–can help elucidate the
functional consequences of disrupting localized tumor subclone
communication. Such targeted validation may clarify the role of these
LRIs in maintaining clonal niches and promoting invasion or immune
escape, guiding precision therapeutic design in spatially heterogeneous
tumors. However, SpatialDM proposed less correlating marker SVIs for
the cancer clones, with the Pearson r of the top marker SVIs are 0.697
and 0.517, respectively (Supplementary Fig. [154]8E). Furthermore,
stLearn LRIs demonstrated insignificant correlations with the two
cancer clone types. Similarly, SPIDER-proposed SVIs also correlated
with major cell types, such as LTF-LRP1 and acinar cells with Pearson
r = 0.850, as well as TGM2-SDC4 and ductal cells with Pearson r=0.818.
Again, we can observe a higher average correlation with SPIDER SVIs
than SpatialDM SVIs and stLearn LRIs, along with other cell types such
as the tuft cells, pDCs, and endocrine cells (Supplementary
Fig. [155]8F).
In addition, SPIDER can reveal interactions among annotated cell types
from identified SVIs. We observed that 27 SPIDER SVIs exhibited
heterotypic signaling between different cell types, while the remaining
15 exhibited homotypic signaling within the same cell type
(Supplementary Fig. [156]8G). Interaction strength is also stronger for
heterotypic signaling (Supplementary Fig. [157]8H). Traditional CCI
analysis using scRNA-seq data finds a larger portion of homotypic
signaling, suggesting that SVIs could reveal more heterotypic
signaling^[158]39.
In particular, we observed heterotypic SVIs representing tumor-TME and
TME-exclusive interactions (Supplementary Fig. [159]8I). Agreeing with
the metastatic role of the aforementioned PLAUR, the SVI FN1-PLAUR
revealed tumor-TME interactions, including the bi-directional
cancer-stroma crosstalk, as well as interaction signals from duct
epithelium to the cancer region. Conversely, SVI INS-INSR, with INSR
being a known regulator of immune responses, showed strong TME
interactions but limited interactions with the tumor region - it mainly
revealed the interaction between duct epithelium and stroma issue. The
biological importance of these SVIs is further validated, with
FN1-PLAUR identified by SpatialDM, COMMOT, and stLearn, and INS-INSR
recognized by SpatialDM and COMMOT.
LRIs can be categorized into three types: long-distance secreted
signaling and short-distance signaling, which encompasses both
ECM-receptor interactions and cell-cell contact signaling^[160]27. We
evaluate SPIDER’s ability to identify both long-distance and
short-distance signaling, using SpatialDM as a benchmark. In the PDAC
sample, the proportion of secreted signaling in SPIDER SVIs is the
highest (0.33 for SPIDER compared to 0.26 for SpatialDM, 0.23 for
COMMOT and 0.32 for stLearn). Additionally, a majority of the samples
analyzed show larger percentages of secreted signaling in SPIDER SVIs
compared to SpatialDM, stLearn, and COMMOT (Supplementary Fig. [161]9).
Besides identifying LRIs with spatial variance, both SPIDER and
SpatialDM can generate interaction patterns from the identified LRIs,
as shown in Fig. [162]3I. The basic evaluations of SPIDER patterns are
illustrated in Supplementary Fig. [163]10A, suggesting SPIDER SVI
patterns are satisfying representations of their member SVIs.
Subsequently, we further compare the biological relevance and region
specificity of the interaction patterns generated by SPIDER and
SpatialDM, highlighting their respective abilities to capture
meaningful tissue interactions.
We anticipate that certain interaction patterns, akin to SVIs, will
reflect intra-cell type interactions, as assessed by the correlation
between these patterns and regional annotations (Fig. [164]3J).
Specifically, SPIDER-generated interaction patterns 4 and 3 exhibited
strong correlations with the cancer and duct epithelium regions, with
Pearson correlation coefficients of 0.822 and 0.648, respectively. In
contrast, pattern 2 displayed a significant negative correlation with
the cancer region, yielding a Pearson coefficient of -0.802.
Conversely, interaction patterns derived from SpatialDM demonstrated
weaker correlations with the annotated regions, with the highest
Pearson coefficient being 0.626. Thus, the SVI patterns produced by
SPIDER more effectively capture the interaction differences within
tissue regions.
SPIDER interaction patterns also demonstrate interactions between cell
types (Supplementary Fig. [165]10B). In particular, pattern 1 and
pattern 2 revealed tumor-TME (Tumor Microenvironment) and TME-exclusive
interactions similar to SVIs (Fig. [166]3K). Here, pattern 1 represents
the interaction between the cancer region and duct epithelium (adjusted
p-value ≤ 0.001, Supplementary Fig. [167]10C). Similarly, pattern 0
represents the interaction between cancer and stroma, while pattern 3
represents the stroma-epithelium interaction (adjusted p-values ≤
0.0001, Supplementary Fig. [168]10C).
We conducted additional analyses comparing SVI identified by SPIDER,
SpatialDM, stLearn, and COMMOT using samples from the DLPFC dataset and
other datasets (Supplementary Note [169]5). SVIs supported by svTFs in
SPIDER generally exhibited higher spatial and TF correlation metrics
than those from SpatialDM, stLearn, and COMMOT. In terms of finding
LRIs supported by svTFs, SPIDER received the highest level of svTF
support, and generally reduced the average number of ligands/receptors
per receptor/ligand. SPIDER SVI patterns also showed stronger
correlations with cell types than the SpatialDM-derived patterns in the
DLPFC samples.
SPIDER identifies svTF-supported spatially variable ligand-receptor
interactions from multiple samples in single-cell resolution
This section serves as a comprehensive assessment of SPIDER’s capacity
to identify svTF-supported SVIs across samples from different platforms
and biological repeats. To test SPIDER’s robustness against biological
repeats, we apply SPIDER on the seqFISH mouse organogenesis dataset
with three organogenesis samples, each with two biological
repeats^[170]29. Furthermore, we showcase SPIDER’s robustness against
sequencing platforms with the Stereo-seq mouse brain dataset almost at
a single-cell resolution^[171]30.
On the seqFISH mouse organogenesis dataset, we first assess the quality
of the identified SVIs using both spatial-variance metrics and svTF
correlations. In terms of the spatial variance of SPIDER-proposed SVIs,
we found significantly higher scores from SVIs than those excluded in
all five spatial variance metrics in all samples (Fig. [172]4A).
Furthermore, Fig. [173]4A shows that the identified SVIs demonstrate
significantly higher svTF correlations than those excluded.
Additionally, SpatialDM and stLearn both proposed interactions lacking
svTF support, receiving mean FPRs of 0.344 and 0.766 when comparing to
LRI svTF support (Supplementary Fig. [174]11A, B). The level of TF
support, quantified by the number of supporting svTF genes for an LRI,
is also lower for SpatialDM and stLearn, as shown in Supplementary
Fig. [175]11C.
Fig. 4. Validation of SPIDER on identifying TF-supported SVIs on multiple
datasets at single-cell resolution.
[176]Fig. 4
[177]Open in a new tab
A–K Validation of SPIDER on identifying TF-supported SVIs on multiple
seqFISH mouse samples at the single-cell resolution. A The boxplot of
five SVI evaluation metrics and one TF correlation metric (n=282 LRIs).
B The stacked bar plot showing TF supports for the top 10 SVIs across
samples, ranked by the number of supports. C The dot plot lists
pathways enriched by each SVI and its supporting spatially variable TF
genes. D SVI related to the Fzd2 receptor, with different supporting TF
genes and correlations. E The location of the annotated erythroid
region in three embryo samples (z2) and the profile of the top SVI with
the highest correlation Kitl-Epor. F The location of the annotated
cardiomyocyte region in three embryo samples (z2) and the profile of
the top SVI with the highest correlation Bmp7-Acvr2a/Acvr1. G The
annotated sample embryo 1 (z2) of the embryo dataset, the location of
the annotated brain region composed of the forebrain, midbrain, and
hindbrain, and the profiles of the regional SVIs Fgf15-Fgfr1 and
Sfrp1-Fzd2. H Clustering of spots based on identified SVIs. I, J
Sub-clusters within the brain region and the spinal cord region,
respectively. K Marker SVIs for the main clusters involved in the brain
region and the spinal cord region, respectively. L–P. Validation of
SPIDER on identifying TF-supported SVIs on the Stereo-seq mouse sample
at approximately single-cell resolution. L Annotation of the Stereo-seq
mouse sample. M The boxplot of five SVI evaluation metrics and one TF
correlation metric (n=1295 LRIs). N The dot plot lists marker SVIs of
inter-cell type interfaces. O Clustering of cells based on identified
SVIs. P Cell types within SVI-defined Cluster 0 and 6, respectively.
The boxplots display the median (center line), the 25 and 75th
percentiles (box bounds), whiskers extending to the most extreme data
points within 1.5 × the interquartile range, minima and maxima as the
lowest and highest points within the whiskers, and outliers as
individual points beyond the whiskers. The statistical significance of
box plots is calculated using one-sided Mann-Whitney-Wilcoxon test with
Benjamini-Hochberg correction, with the exact adjusted p-values listed
in Supplementary Table [178]5 and the following significance
annotations: ****: adjusted p-value ≤ 1.00e-04; ***: 1.00e-04 <
adjusted p-value ≤ 1.00e-03; **: 1.00e-03 < adjusted
p-value ≤ 1.00e-02; *: 1.00e-02 < adjusted p-value ≤ 5.00e-02. For bar
plots, data are presented as mean values ± 95% confidence intervals.
Corr: Pearson correlation coefficient. Inter: interneuron-selective
cells; GABA: long-projecting GABAergic Cell; Astro: astrocyte; CR:
Cajal-Retzius Cell; Oligo: oligodendrocyte. Source data are provided as
a [179]Source Data file.
Aside from the benchmarking metrics, we further evaluate the biological
validity of TF-supported SVI activation. Ranking SVIs by their number
of supporting svTF genes in all six samples allows us to evaluate the
level of support, as shown in Supplementary Fig. [180]11D. The top five
SVIs received support from over twenty target TF genes (Fig. [181]4B),
namely Sftp1-Fzd2, Bmp7-Acvr1, Bmp2-Acvr1, Wnt5a-Fzd2, and Apln-Aplnr.
We apply GO analysis on genes implicated in the above SVIs along with
the corresponding supporting svTF genes (Supplementary Table [182]6).
All five SVIs and svTFs enrich PI3K/Akt signal transduction pathway and
signaling pathways regulating pluripotency of stem cells (entries
mmu04151 and mmu04550, adjusted p-values ≤ 0.05, Fig. [183]4C), in
accordance with the developing state of samples. Additionally, the two
Fzd2-related SVIs and their supporting svTF genes enrich the Wnt
signaling pathways (entry mmu04310, adjusted p-value ≤ 0.001), and the
two Bmp-Acvr SVIs enrich TGF-β signaling pathways (entry mmu04350,
adjusted p-value ≤ 0.01), with both pathways known to participate in
embryonic development.
Furthermore, we examine the difference between active and non-active
SVIs. Specifically, supporting svTF genes vary among LRIs of the same
receptor, potentially providing diverse information for underlying
function activation. Take receptor Fgfr2 in Embryo 1 (z2) as an example
(Fig. [184]4D and Supplementary Fig. [185]11E): while Fgf5-Fgfr2 is not
supported by any svTF genes, Fgf10-Fgfr2 and Fgf17-Fgfr2 are most
supported by Ldhd and Aldob, with Pearson r of 0.396 and 0.410,
respectively. Moreover, Fgf5-Fgfr2 is rejected by svTFs in most
samples. However, Fgf5-Fgfr2 is identified by SpatialDM in two samples,
and stLearn in all six samples.
This concurs with the distinct roles of FGFs in development.
Specifically, Fgf5 interacts primarily with Fgfr1 to regulate hair
cycling^[186]40, without functional evidence in organogenesis,
indicating its limited interaction with Fgfr2 and lack of supporting
transcription factors in that context. Conversely, Fgf10 and Fgf17 are
critical for organ development, with Fgf10 playing a vital role in lung
branching morphogenesis and Fgf17 being implicated in cerebellar
development^[187]41. The presence of supporting svTFs for Fgf10-Fgfr2
and Fgf17-Fgfr2 underscores the necessity of these factors in
facilitating their interactions with Fgfr2, highlighting the importance
of context-dependent regulatory mechanisms in these signaling pathways.
A similar difference can be observed in other Fgf receptors
(Supplementary Fig. [188]11F–J).
Now that we have validated SPIDER’s ability to identify high quality
SVIs from single-cell samples, we can access its robustness as drawing
similar results from bulk samples. To this end, we constructed bulk
samples by aggregating neighboring cells into spots using SOM for the
six single-cell samples. On the aggregated bulk sample, we also
identified high-quality SVIs, with average Jaccard similarity between
the single-cell and bulk SVI lists as 0.7278 for five-cell-per-spot
aggregation and 0.6683 for ten-cell-per-spot aggregation (Supplementary
Fig. [189]12A, B). Across all samples, eight of ten SVIs with the most
supporting svTF genes are consistent in both aggregations, with
Sftp1-Fzd2, Bmp7-Acvr1, Bmp2-Acvr1, Wnt5a-Fzd2, and Apln-Aplnr still
received support from over twenty target TF genes (Supplementary
Fig. [190]12C, D). Therefore, both the quality and level of TF support
for svTF-supported SVIs are rather consistent between single-cell and
bulk samples.
After examining the supporting svTF genes for SVIs, we proceed to the
assessment of SVIs in terms of biological validity, demonstrated by
their regional presence related to annotated cell types. First, SPIDER
identified SVIs that correlate with cell types across all samples.
Figure [191]4E shows relatively strong correlations (highest Pearson
r = 0.917) between erythroid and the SVI Kitl-Epor, in accordance with
the importance of Kitl-Epor interaction in erythropoiesis^[192]42.
Further validating this finding, Kitl-Epor has strong correlations
across all samples as shown in Supplementary Fig. [193]13A. Similarly,
agreeing with bone morphogenetic protein (BMP) signaling’s
participation in heart development, the BMP-related SVIs Bmp7-Acvr2a
and Bmp7-Acvr1 correlate moderately with the spatial distribution of
cardiomyocytes as shown in Fig. [194]4F^[195]43. Again, Bmp2/7-Acvr1/2a
interactions are found across samples (Supplementary Fig. [196]13B)
with the highest Pearson r at 0.729.
Aside from the above cell types, we also observed other cell types with
correlated SVIs. In the endothelium region, we found a constant
presence of Alpn-Alpnr interactions across all six samples, as shown in
Supplementary Fig. [197]13C (highest Pearson r=0.616). Similarly, in
the gut tube region, we found the presence of Cdh-Fgfr1 interactions in
five out of six samples (Supplementary Fig. [198]13D, highest Pearson
r = 0.861).
In addition, SPIDER proposed SVIs marking subregions within cell types.
For example, within the annotated brain region composed of the
forebrain, midbrain, and hindbrain, SPIDER identified SVIs with
regional presence. In Embryo 1 (z2), SPIDER-proposed Fgf15-Fgfr1 and
Sfrp1-Fzd2 are located separately in the midbrain and hindbrain,
moderately correlating with the brain region (Pearson r = 0.519 and
0.505, Fig. [199]4G). Validating our finding, Fgf15 is a known
regulator of neurogenesis in the midbrain^[200]44. Its region presence
in the midbrain can be observed in other samples (Supplementary
Fig. [201]13E). These region-specific SVIs offer direct hypotheses for
functional developmental studies. For example, Fgf15-Fgfr1 and
Fgf17-Fgfr3, enriched in distinct brain subregions, could be targeted
using spatially resolved gene perturbation or reporter assays to study
their role in neuronal differentiation or regional patterning. The
ability of SPIDER to map such localized, functionally supported
interactions enables focused experimental interrogation of
developmental signaling landscapes.
To further investigate the regional feature of SVIs, we used SVI to
perform cell clustering using the Leiden algorithm implemented in
Scanpy^[202]45 as detailed in Methods. To demonstrate the effectiveness
of SVIs in revealing interaction-based clusters, we also performed cell
clustering using genes implicated by the identified SVIs. Based on both
AMI and ARI metrics evaluated against ground truth annotations,
SVI-based clustering achieved significantly higher performance (mean
AMI = 0.5957, mean ARI = 0.3456; see Supplementary Fig. [203]14A)
compared to clustering using only the implicated genes (mean
AMI = 0.2783, mean ARI = 0.1368). These results indicate that while
both gene- and interaction-based clustering capture tissue
heterogeneity, SVI-based clustering provides additional insights into
distinct cell states.
Using Embryo 1 (z2) as an example, we further assess the biological
validity of SVI-based clusters compared to clusters derived from genes
implicated in these SVIs (Fig. [204]4H; Supplementary Fig. [205]14B).
Notably, SVI-based clustering identified major clusters corresponding
to annotated brain and spinal cord regions (Fig. [206]4I, J), which
were not detected by clustering based solely on the implicated genes
(Supplementary Fig. [207]14C). In accordance with the aforementioned
regional SVIs within the brain region, clustering based on SVIs
separated the brain region into clusters 9, 10, 7, and 3, representing
the hindbrain, midbrain, and two sub-regions within the forebrain,
respectively. Similarly, the spinal cord also consisted of four
SVI-defined subregions, which could be related to the separation of the
anterior, dorsal, and ventral regions of the spinal cord.
We also found distinct marker SVIs of the subclusters in the brain
region (Fig. [208]4K). In particular, our findings of marker SVIs
Fgf17-Fgfr3 of the hindbrain and Fgf17-Fgfr2 of the midbrain concord
with prior reports documenting a regional separation of these
receptors^[209]46. Specifically, this separation concurs with the
region-specific functions of these FGF-receptor interactions in
governing neuronal differentiation and spatial patterning during
embryonic brain development.
Subsequently, we test SPIDER on a different ST sequencing platform
Stereo-seq, with near single-cell resolution. Cell type annotations on
this mouse brain dataset^[210]30 are displayed in Fig. [211]4L. SPIDER
again identified high-quality SVI as indicated by five out of the six
metrics shown in Fig. [212]4M. The levels of TF support in results from
SpatialDM and stLearn are significant lower compared to SPIDER
svTF-supported LRIs, as shown in Supplementary Fig. [213]15A.
Additionally, the false positive LRIs from SpatialDM and stLearn
enriched 21 pathways, while SPIDER svTF-supported SVIs enriched 62
pathways, with a small intersection of fourteen pathways (Supplementary
Fig. [214]15B). In pathways uniquely enriched by either gene sets, we
found only one signaling pathway from false positive LRIs, and eleven
from SPIDER svTF-supported SVIs. Therefore, SPIDER is able to provide
more biological insight with TF incorporation.
Further validating the biological importance of SPIDER svTF-supported
SVIs, we found marker SVIs for all cell types (Supplementary
Fig. [215]15C). For example, the correlation between mural cell and
Prg4-Cd44 reaches 0.863, and the correlation between oligodendrocyte
and Trf-Tfrc reaches 0.821. In particular, Trf-Tfrc is known to promote
oligodendrocyte maturation and proliferation^[216]47.
On this dataset, we explore SPIDER SVI’s potential in revealing
cross-cell-type interactions. That is, aside from SVIs marking cell
type and sub-cell type regions, we can also observe interactions
between cell types (Fig. [217]4N). For example, Npy-Npy2r marks most
cross-cell type interactions involving the interneuron-selective cells,
a common neuromodulator for interneurons to inhibit postsynaptic
neurons^[218]48. Conversely, the interaction between
interneuron-selective cells and oligodendrocytes is uniquely marked by
Rtn2-Rtn4r, with Rtn4r being an important regulator of
oligodendrocytes^[219]49.
Furthermore, the dendrogram in Fig. [220]4N suggests distinct
interaction among groups of cell types. Agreeing with the marker SVIs
identified above, one group involves oligodendrocytes and
interneuron-selective cells-related interactions. Conversely, mural
cell-related interactions are closer in the dendrogram.
We further explore cross-cell-type SVIs with interaction regions.
Similar to the previous dataset, we used SVI to perform cell clustering
using the Leiden algorithm, achieving an ARI of 0.5047 and an AMI of
0.5923 with the spot annotation (Fig. [221]4O). This interaction-based
clustering revealed two distinct clusters of multi-cellular
interactions in the brain. Cluster 3 was predominantly defined by
interactions among interneuron-selective cells, long-projecting
GABAergic neurons, and astrocytes (Fig. [222]4P, left). This concurred
with previous hierarchical clustering, especially since cluster 3 is
marked by Lefty1-Acvr2a and Npy-Npy2r (Supplementary Fig. [223]15D) -
marker SVIs of GABAergic-astrocyte and interneuron-GABA/astrocyte
communications, respectively. Similarly, cluster 5 describing the
crosstalk between Cajal-Retzius cells and oligodendrocytes
(Fig. [224]4P, right) is also marked by SVI Efnb3-Ephb1 identified as a
marker for oligodendrocytes-Cajal-Retzius cell interaction.
Under different clustering parameters, we can observe SVI clusters
marking subregions within annotated cell types (Supplementary
Fig. [225]15E). For example, we found the SVI cluster 6 in Fig. [226]4O
consistently revealing a subregion within the annotated oligodendrocyte
region, as shown in Supplementary Fig. [227]15F. Furthermore, cluster 6
is consistently marked by the dopamine signaling SVI Gnal2-Drd2
(Supplementary Fig. [228]15G). This SVI-indicated oligodendrocyte
subregion agrees with the function of dopamine signaling in maintaining
healthy oligodendrocyte function and supporting neuronal
networks^[229]50.
Supplementary Note [230]6 provides a further discussion on SVI patterns
for both datasets. The mouse embryo samples revealed a limited number
of SVIs due to the small gene count, with SPIDER generating 4-6 SVI
patterns per sample. While the SVI patterns captured spatial
similarities within larger organ regions instead of distinct cell
types, they still identified brain regions in embryos. In the SAW mouse
brain sample, although the patterns were also fewer than the cell
types, they effectively marked Cajal-Retzius cells, oligodendrocytes,
and mural cells.
SPIDER SVI pattern clustering reveals sub-cell type clusters on the slide-seq
V2 mouse brain dataset
We demonstrated SPIDER’s ability to cluster SVIs with similar spatial
distributions on the slide-seq V2 mouse brain data with single-cell
resolution^[231]8 as shown in Fig. [232]5A. SPIDER identified 119
high-quality svTF-supported SVI out of 163 LRIs, as indicated by all
six metrics shown in Supplementary Fig. [233]16A. We can again identify
marker SVIs for major cell types (Supplementary Fig. [234]16B) such as
the SVIs Nlgn1-Nrxn3 marking the CA1/CA2/CA3/subiculum and
Calm2-Cacna1c marking dentate pyramids.
Fig. 5. SPIDER SVI-generated pattern results on single-cell-resolution
Slide-SeqV2 mouse brain dataset.
[235]Fig. 5
[236]Open in a new tab
A The cell type annotation of the sample. B The boxplot of the
correlations between the activation strength and the profiles of member
and non-member SVIs in each pattern shows that the activation strength
represents the member SVIs (n=163 LRIs). C Deconvoluted cell types and
their corresponding patterns are displayed, with annotated correlation
coefficients for each pattern. For each pattern, two member SVIs having
the highest correlation coefficients with the activation strength are
plotted. D Top ten enriched pathways for genes contained in each SVI
pattern, under the significance threshold of adjusted p-value ≤ 0.05. E
Cell clustering using SVI patterns. F Cell type compositions in SVI
pattern-based clusters. Clusters having a cell type with a majority of
over 30% are displayed. G–I Clusters involved in the cell types and
their top three marker SVIs. G Oligodendrocytes. H CA1/CA2/CA3
subiculum. I Astrocytes. Corr: Pearson correlation coefficient. The
boxplots display the median (center line), the 25 and 75th percentiles
(box bounds), whiskers extending to the most extreme data points within
1.5 × the interquartile range, minima and maxima as the lowest and
highest points within the whiskers, and outliers as individual points
beyond the whiskers. The statistical significance of box plots is
calculated using one-sided Mann-Whitney-Wilcoxon test with
Benjamini-Hochberg correction, with the exact adjusted p-values listed
in Supplementary Table [237]7 and the following significance
annotations: ****: adjusted p-value ≤ 1.00e-04; ***: 1.00e-04 <
adjusted p-value ≤ 1.00e-03; **: 1.00e-03 < adjusted
p-value ≤ 1.00e-02; *: 1.00e-02 < adjusted p-value ≤ 5.00e-02. Source
data are provided as a [238]Source Data file.
From the identified SVIs, SPIDER generated seven SVI patterns, each
with an activation strength aggregated from the profiles of SVI members
(Supplementary Fig. [239]16C). We first evaluated the quality of the
inferred activation strength of a pattern by their correlations with
profiles of member SVIs, compared to the rest of SVIs. The boxplot in
Fig. [240]5B showed significantly higher correlations between the
activation strength and member SVIs compared to the non-member SVIs.
Correlations between deconvoluted cell types and SVI patterns
(Fig. [241]5C and Supplementary Fig. [242]16D) suggested that some
patterns capture spatial heterogeneity similarly to gene expressions
and SVIs. For example, pattern 2 strongly correlated with
oligodendrocytes (Pearson r = 0.760). Examination of member SVIs in
pattern 2 validated its fidelity in representing member SVI profiles,
with high correlations between activation strengths and the pattern
(Pearson r =0.837 and 0.834). While pattern 3 modestly correlated with
ependymal cells (Pearson r = 0.322), one member SVI, Tac1-Tacr1, has
been experimentally validated to relate to ependymal cell
activation^[243]51, demonstrating the potential biological relevance
even of patterns with weaker cell type correlations.
We also performed gene enrichment analysis with respect to patterns
(Fig. [244]5D). For example, pattern 1 correlates moderately with the
dentate pyramids (Pearson r=0.518), representing the interaction
between neurotrophins Bdnf and Ntfk3, indicating a potential functional
role of neurotrophins in the dentate pyramids^[245]52. Summarizing all
SVI-implicated genes in pattern 1, we found enriched neuron-related
pathways, such as axon guidance, neuroactive ligand-receptor
interaction, and neurotrophin signaling pathway (mmu04360, mmu04080,
and mmu04722, adjusted p-values ≤ 0.01), in accordance with the
abundant neurons in the represented regions. Similarly, for smaller
cell populations such as the endothelial tip cells, SPIDER also found
SVI pattern 5 with similar spatial distributions (Pearson’s r = 0.362).
In pattern 1, we found enriched TGF-β signaling (mmu04350, adjusted
p-value ≤ 0.001), agreeing with the signaling role of TGF-β receptors
for endothelial tip cells within the neural environment^[246]53.
To further demonstrate how the SVI patterns represent spatial
interaction modules, we performed cell clustering with SVI patterns
using the Leiden algorithm. We obtained 23 cell clusters from SVI
patterns, as shown in Fig. [247]5E, among which 11 clusters describe
the major cell types (Fig. [248]5F). For example, given the high
correlation between cell types and patterns, endothelial tip and
dentate pyramids are represented by clusters 17 and 15, respectively.
To further validate that the SVI patterns represent spatially organized
interaction modules, we performed cell clustering using the Leiden
algorithm based on the SVI patterns. This resulted in 23 clusters
(Fig. [249]5E), with 11 associated with major cell types
(Fig. [250]5F). For example, consistent with their high correlations to
patterns, endothelial tip cells, and dentate pyramidal cells were
distinctly represented by clusters 17 and 15, respectively.
Furthermore, the SVI patterns suggest subpopulation structure within
major cell types astrocytes, CA1/CA2/CA3/subiculum, and
oligodendrocytes. Regarding oligodendrocytes, the main regions were
characterized by clusters 14 and 6, while the sparser regions were
associated with cluster 5 (Fig. [251]5G). Cluster 14 was defined by the
MAG interaction, consistent with MAG being a marker of myelination and
mature oligodendrocytes^[252]54. In contrast, cluster 5 was
characterized by EphA-Ephrin signaling, which is required prior to
myelination. These results indicate the SVI patterns capture
interaction variation within oligodendrocyte populations related to
distinct developmental and functional states.
Similarly, three clusters are observed in the CA1/CA2/CA3/subiculum
(Fig. [253]5H). Here, cluster 13 encompasses multiple Calm2-related
SVIs, such as the SVI Calm2-Kcnq3 that is crucial for the proper
assembly and functionality of KCNQ2/KCNQ3 channels. These channels are
vital for stabilizing neuronal membrane potentials and supporting the
cognitive functions associated with the CA1, CA2, CA3, and subiculum
regions^[254]55. Contrarily, cluster 16, possibly representing the CA1
region, is marked by Silt2 interactions that establish synaptic
specificity in hippocampal CA1^[255]56. Such sub-clusters can also be
observed for astrocytes in (Fig. [256]5I). Overall, SPIDER SVI patterns
are able to produce interaction-based clusters that represent
interaction diversity within cell types.
SPIDER identifies svTF-supported SVIs from multiple samples in bulk
resolution
In this section, we show that SPIDER is not restricted to single-cell
resolution datasets by incorporating intra-spot interfaces. We use two
datasets to demonstrate these properties: a HER2-positive breast cancer
dataset with annotated cancerous and non-cancerous regions^[257]57, and
the aforementioned DLPFC dataset with layered structures of spot
clusters. In addition, a previous study showed that the inclusion of
CCI information improves psduotime and trajectory inference in
scRNA-seq data, given the important regulatory function of CCI in cell
differentiation and developmental processes^[258]58. Therefore, we also
explore the application of SVI in revealing diffusion pseudotime and
trajectory compared to gene expression.
SPIDER identified high-quality SVIs across all six samples in the
breast cancer dataset, as shown by the six metrics in Fig. [259]6A.
Furthermore, we can find marker SVIs for cell-type annotations across
samples (Supplementary Fig. [260]17). For biological validation of
SVIS, we take sample G2 as an example, which contains both invasive and
in situ cancer, as well as immune infiltrated regions (Fig. [261]6B).
We find the presence of SVI ZG16B-CXCR4 in the region of immune cell
infiltration (correlation coefficient at 0.431), with CXCR4 has also
been found to relate to T-lymphocyte infiltration and immunotherapy in
metastatic breast cancer^[262]59. Similarly, cancer in situ and
invasive cancer regions are marked by different SVIs HSP90AA1-ERBB2 and
THBS2-ITGB1 with correlation coefficients of 0.824 and 0.697,
respectively. In particular, ERBB2, also known as HER2, is a biomarker
for HER2-positive breast cancer, and THBS2-ITGB1 interaction is known
to regulate tumor invasion in breast cancer^[263]60.
Fig. 6. Validation of SPIDER on identifying TF-supported SVIs on datasets at
bulk resolution.
[264]Fig. 6
[265]Open in a new tab
A–D. Validation of SPIDER on identifying TF-supported SVIs on multiple
breast cancer samples at the bulk resolution. A The boxplot of five SVI
evaluation metrics and one TF correlation metric (n=490, 487, 2117,
944, 563, and 647 LRIs for samples A1 to G2, respectively). B Marker
SVIs for the main clusters involved in sample G2 with correlation
coefficients higher than 0.3. C Marker SVIs for invasive cancer and
cancer in situ that are consistent across samples, shown on samples A1
and G2. D The dot plot lists pathways enriched by genes implicated in
constant cancer marker SVIs. E Pseudotime results based on SVIs (left)
and gene expression (right) shown on samples A1 and G2. F The barplot
of correlations between cancer/TME labels and pseudotime from SVI or
gene expression across all samples. G Boxplots showing pseudotime
distributions with respect to cancer/TME labels (n=307, 140, 20 spots
for TME, Invasive Cancer, and Cancer In Situ, respectively). H–L
Validation of SPIDER on identifying TF-supported SVIs on multiple DLPFC
samples at the bulk resolution. H The boxplot of five SVI evaluation
metrics and one TF correlation metric (n=1584, 1416, and 1454 LRIs for
sample 151673, 151510, and 151672, respectively). I Correlation heatmap
of white matter and layer 3 regions on three samples with top three
white matter marker SVIs (left) and layer 3 marker SVIs (right). The
white color indicates the corresponding SVI is missing in the sample. J
The dot plot lists pathways enriched by genes implicated in constant
region marker SVIs. K Trajectories inferred with gene expression (top)
and SVIs (bottom) on sample 151673. L The barplot showing AUROC scores
of trajectories inferred from gene expression and SVIs on three
samples. Corr: Pearson correlation coefficient; WM: white matter. The
boxplots display the median (center line), the 2 and 75th percentiles
(box bounds), whiskers extending to the most extreme data points within
1.5 × the interquartile range, minima and maxima as the lowest and
highest points within the whiskers, and outliers as individual points
beyond the whiskers. The statistical significance of box plots is
calculated using one-sided Mann-Whitney-Wilcoxon test with
Benjamini-Hochberg correction, with the exact adjusted p-values listed
in Supplementary Table [266]8 and the following significance
annotations: ****: adjusted p-value ≤ 1.00e-04; ***: 1.00e-04 <
adjusted p-value ≤ 1.00e-03; **: 1.00e-03 < adjusted p-value ≤
1.00e-02; *: 1.00e-02 < adjusted p-value ≤ 5.00e-02. Source data are
provided as a Source Data file.
Notably, SPIDER is able to produce distinct marker SVIs that represent
the diversity between invasive cancer and cancer in situ (Supplementary
Fig. [267]18). Furthermore, such cancer marker SVIs are consistent
across samples, as shown in Fig. [268]6C. The six consistent marker
SVIs for invasive cancer have an average correlation coefficient of
0.464, much higher compared to the average correlation coefficient with
cancer in situ region of −0.046. Similarly, we found 28 consistent
marker SVIs for the cancer in situ region. These SVIs have an average
correlation coefficient of 0.492 with cancer in situ region, also
higher than the average correlation coefficient with invasive cancer of
−0.029.
GO analysis on the above consistent cancer marker SVIs further revealed
functional differences between cancer in situ and invasive cancer as
shown in Fig. [269]6D, while sharing the enrichment of pathways in
cancer with adjusted p-values ≤ 0.05. For invasive cancer, the
SVI-implicated genes enrich the pathway of fluid shear stress and
atherosclerosis (entry hsa05418, adjusted p-value ≤ 0.05), which has
been found to promote breast cancer cell proliferation and invasive
potential^[270]61. Similarly, the enrichment of thyroid hormone
signaling pathway (entry hsa04919, adjusted p-value ≤ 0.05) by the
interaction involving ITGAV, suggests the activation of processes
critical for metastasis, such as epithelial-mesenchymal
transition^[271]62. Conversely, genes from SVIs marking cancer in situ
region enrich pathways whose dysregulation often contributes to the
invasive progression of in situ cancer^[272]63.
Subsequently, we inspect if SVI can be used to generate pseudotime
results in a similar manner to gene expression. In cancerous ST
samples, pseudotime represents tumor transition revealed by gene
expression changes^[273]64. Similarly, pseudotime from interaction
information reveals the temporal evolution of intercellular signaling
networks within the tumor microenvironment. Setting invasive cancer as
the root for pseudotime, we obtained pseudotime results from both SVI
and gene expression for all samples (Fig. [274]6E and Supplementary
Fig. [275]19A). We evaluate the quality of the generated pseudotime by
its correlation with spot labels ordered by invasive cancer, TME, and
cancer in situ. Across all samples, SVI pseudotime achieved higher
correlations with ordered labels compared to gene pseudotime, with mean
correlations across samples as 0.5752 and 0.3967, respectively
(Fig. [276]6F). The better performance from SVIs could be justified by
the aforementioned consistent presence of cancer marker SVIs.
Subsequently, we evaluate the biological significance of SVI pseudotime
results. Agreeing with previous findings^[277]65, the pseudotime
between invasive and in situ cancer showed the great divergence
(Fig. [278]6G and Supplementary Fig. [279]19B), suggesting that SVI can
serve as a valid counterpart of gene expression.
To test if SPIDER can identify SVIs with different spatial patterns, we
also apply SPIDER on the DLPFC dataset with clustered and layered
spatial structure, unlike the rather sparse annotations from the breast
cancer dataset. Again, SPIDER identified high-quality SVIs across all
three samples in the DLPFC dataset, as shown by the six metrics in
Fig. [280]6H. We can also observe SVIs marking known brain layers
(Supplementary Fig. [281]20). For example. the neuronal interactions
LPL-LRP1 and PSAP-LRP1 are present in layers 3 and 1 from sample 151673
(Pearson r=0.632 and 0.480, respectively).
Furthermore, we demonstrated SPIDER’s ability to identify SVIs that
differentiate brain regions in multiple samples (Fig. [282]6I).
Specifically, six SVIs consistently correlated with white matter across
all samples (mean Pearson r=0.720). Additionally, 26 SVIs correlated
with layer 3 in two of three samples (mean Pearson r = 0.395). The
higher positive correlations between SVIs and their respective regions,
compared to lower negative correlations with other regions, provide
statistical evidence that SPIDER can delineate discrete brain regions
like white matter and layer 3 based on SVIs.
Again, we conducted GO analysis on the identified marker SVIs, which
revealed functional differences between white matter and layer 3
regions (Fig. [283]6J). Specifically, pathways enriched in white matter
markers primarily supported structural integrity (adjusted p-values ≤
0.01), while layer 3 markers focused on synaptic interactions and
cognitive functions (adjusted p-values ≤ 0.0001). Notably, the
enrichment of neuroactive ligand-receptor interaction and calcium
signaling pathways in layer 3 concurs with its role in synaptic
interactions. Similarly, the enriched long-term potentiation and
circadian entrainment pathways in layer 3 agree with its central role
in cognitive processing.
The DLPFC dataset is characterized by its linear development
trajectory^[284]66. Therefore, we inspect if SVI can be used to reveal
the linear trajectory in a similar manner to gene expression.
Trajectory analysis based on gene expression reveals the gene
expression patterns and functional roles of cells across different
cortical layers, while trajectory analysis based on SVIs uncovers the
communication patterns between these layers, helping us understand
their coordination in maintaining normal brain function. We obtained
trajectory results from both SVI and gene expression for all samples
(Fig. [285]6K and Supplementary Fig. [286]21A). We evaluate the quality
of the generated trajectories with the AUROC score on the edge weights
against ground truth connections between consecutive layers. Across all
samples, SVI trajectory achieved higher AUROC scores, with mean
correlations across samples as 0.9833 and 0.9065, respectively
(Fig. [287]6L). Therefore, SVI can also generate valid trajectory as
gene expression, with a possible advantage in this case coming from the
spatial constraint of interactions.
Supplementary Note [288]7 provides further discussion on the SVI
patterns obtained for both datasets. Similar to our finding based on
SVIs, the SVI patterns in the breast cancer dataset also distinctly
mark regions of invasive and in situ cancer. For the DLPFC dataset, we
observed SVI patterns marking the white matter region across all three
samples.
Discussion
Identifying functionally activated spatially variable interactions is
crucial for understanding the relationship between tissue structure and
cell-talk functions. In this paper, we introduce SPIDER, a
computational method that accurately identifies spatially variable
interactions (SVIs) with the support of svTF genes. SPIDER identifies
capacity-constrained cell-cell interaction interfaces, and scores LRIs
with interface-adapted COT and LR gene co-expression. SPIDER constructs
the abstract interface graph by SOM, and integrates six statistical
models to produce SVI candidates^[289]20–[290]24, leveraging the
Gaussian process with various covariance kernels, nonparametric
covariance tests, and hidden Markov random fields. SPIDER further
screens for SVI candidates that are supported by the activation of
target svTF genes, reducing the number of false positives.
SPIDER performed well at identifying svTF-supported SVIs compared to
other CCI methods, both in simulations and real datasets. In simulation
testing across varying levels of supported and unsupported svTF genes,
interaction strength, and noise, SPIDER generally achieved higher AUC
values among SpatialDM, SpatialCorr, SpaTalk, and CellChat. In addition
to the simulation data, we applied SPIDER on six datasets from four
platforms varying in spot numbers and resolutions (Supplementary
Table [291]9). SPIDER-identified svTF-supported generally exhibited
higher spatial autocorrelation scores and stronger correlations with
downstream target svTF genes than those from SpatialDM. These results
demonstrate SPIDER’s ability to accurately recognize svTF-supported
SVIs over alternative CCI methods in multiple testing scenarios.
The importance of screening for functionally active SVIs becomes
evident in the context of interaction-based analyses, particularly when
assessing the relevance and biological significance of these
interactions. In the PDAC dataset, among LRIs related to TSPAN14/15
signaling, only the interaction APP-TSPAN15 is supported by a
downstream target. Additionally, in a mouse embryo dataset, SPIDER
effectively excluded a false positive interaction Fgf5-Fgfr2 with
limited relevance to organogenesis due to the absence of svTF support.
In contrast, SpatialDM exhibited significantly lower TF support
compared to SPIDER-proposed LRIs, with many lacking supporting svTF
genes. By concentrating on svTF-supported LRIs, SPIDER reduces the
average number of ligands per receptor and receptors per ligand,
demonstrating its ability to identify biologically meaningful
interactions. The top-ranking SVIs, supported by multiple svTF genes,
reveal diverse enriched pathways, highlighting their distinct
biological roles in cooperation with downstream targets. These findings
emphasize SPIDER’s effectiveness in identifying biologically relevant
svTF-supported LRIs and the essential role of TF support in functional
pathway analysis.
SPIDER SVIs demonstrate both heterotypic and homotypic signaling
characteristics, providing insights into cellular interactions across
diverse biological contexts. In a pancreatic ductal adenocarcinoma
(PDAC) dataset, approximately 64% of SPIDER SVIs represented
heterotypic interactions, with stronger interaction strength compared
to homotypic signaling. For homotypic interactions, correlations
between SPIDER SVIs and gene-driven spot annotations validated that
these SVIs effectively capture interactions within specific cell types.
Clustering analyses further confirmed the presence of both signaling
types. In a mouse embryo dataset, SPIDER identified SVIs that
delineated sub-regions of the brain and spinal cord. Similarly, in the
SAW mouse brain dataset, SPIDER-generated SVIs revealed regions
composed of mixed cell types. The identification of both heterotypic
and homotypic signaling is crucial for understanding cell type
coordination, particularly within heterogeneous tissues like tumors.
The SVI patterns generated by SPIDER uncover spatial interaction
modules that enhance our understanding of the cooperative functions of
SVIs and cell types. By grouping SVIs with similar spatial
distributions, we can achieve higher-level functional annotations based
on these interactions. In the slide-seq mouse brain dataset, SPIDER
identified significant interaction modules, such as neurotrophins in
the dentate pyramids and TGF-β signaling in endothelial tip cells.
Additionally, SVI patterns facilitate interaction-based cell
annotation. A comparison with SpatialDM in the PDAC dataset showed that
SPIDER effectively delineates distinct SVI patterns linked to specific
tissue regions, while SpatialDM patterns lacked this regional
specificity. The SVI patterns in the slide-seq dataset also indicated
sub-clusters within gene-based annotations, revealing the presence of
distinct marker SVIs.
The evaluation of SPIDER’s runtime across key steps–scoring interfaces,
scoring TF genes, and identifying SVI candidates–highlights important
insights into computational efficiency. The runtime of the above steps
on a single CPU core is reported for each sample in Supplementary
Table [292]10. As detailed in Supplementary Note [293]8, a strong
linear relationship was found between the number of interfaces and the
runtime for interface scoring and SVI candidate identification. In
contrast, the number of LRIs showed a weaker correlation. For svTF
scoring, the number of LRIs was a more substantial factor, but the
number of interfaces still played a role. This quantitative observation
further supports that interface screening and abstraction can
substantially enhance computational efficiency for SPIDER. However,
interface abstraction presents a trade-off between LRI profile accuracy
and runtime efficiency (Supplementary Note [294]9). While it reduces
the number of interfaces and significantly decreases identification
time, it also smooths out LRI signals, potentially masking important
variations. Evaluation metrics indicate that abstracted interfaces have
lower expression variability than original interfaces. Thus, while
abstraction enhances efficiency, users should consider its impact on
biological accuracy, especially in smaller datasets or specific regions
of interest.
Compared to existing methods, SPIDER offers a more flexible framework
for analyzing SVIs, facilitated by the approaches of interface
identification and svTF validation. Our SPIDER framework allows users
to utilize different interface scoring methods, as the underlying
assumptions may influence the results^[295]67. Additionally, SPIDER
enables customization of spatial variance testing methods, considering
the specific advantages and limitations of each approach in relation to
data size and noise levels. By combining results from multiple spatial
variance tests, SPIDER can reduce discrepancies that often arise when
calling statistically significant spatial variance^[296]35. This
adaptability not only enhances the robustness of SVI identification but
also facilitates the incorporation of advanced CCI methods.
In conclusion, SPIDER offers a powerful approach for analyzing SVIs in
ST data, significantly enhancing our understanding of cellular
interactions within complex tissues. By accurately identifying
functionally activated SVIs supported by svTF genes, SPIDER minimizes
false positives and captures biologically relevant interactions that
are critical for elucidating tissue structure and function. Its ability
to discern both heterotypic and homotypic signaling enriches our
insights into cell type coordination, particularly in heterogeneous
environments like tumors. Furthermore, the identification of distinct
SVI patterns facilitates higher-level functional annotations and
interaction-based cell annotations, allowing researchers to uncover
meaningful biological modules. Overall, SPIDER serves as a valuable
tool for exploring the intricate dynamics of cell communication, paving
the way for insightful discoveries in ST data.
Methods
Identifying cell-cell interaction interfaces constrained by interaction
capacity
The inputs of SPIDER are the spatial gene expression matrix
[MATH: X∈Rm×
n :MATH]
and the spot coordinates
[MATH: S=(s1T
,⋯,smT
)∈Rm×
2 :MATH]
of the ST data, with m denoting the number of spatial spots and n
denoting the number of genes. From the gene set
[MATH: G={
g1,⋯,gn} :MATH]
, we extract the ligand and receptor gene set
[MATH: G′ :MATH]
that contains genes participating in known ligand-receptor
interactions. In this work, we use the LR pairs from
CellTalkDB^[297]68, which contains 3398 human LR pairs and 2033 mouse
LR pairs.
We first evaluate the interaction capacity for spot by the total
expression of ligand and receptor (LR) genes. Given the interaction
capacity per spot, we can then use a power diagram to identify varying
numbers of interfaces for spots with different interaction
capacities^[298]31. Similar to the Delaunay triangulation, the power
diagram generated polygons representing spots, and we consider spots
with adjacent polygons to be potentially interacting. To generate a
power diagram, we lift spots from the Euclidean space
[MATH: E2
:MATH]
onto a paraboloid in
[MATH: E3
:MATH]
. This construction incorporates the interaction capacity c[s] of spot
s with a scaling function f, defining its lifted height h[s] in
[MATH: E3
:MATH]
[MATH:
hs=∣∣s∣∣2F−
cs2
msubsup>,wherecs
mrow>=f∑g∈G′
Xs,g. :MATH]
1
The lifting equation and capacity scaling are detailed in Supplementary
Method [299]1.1. The interaction capacity c[s] modifies the paraboloid,
with larger capacities pulling it downward, resulting in more extensive
downward-facing facets. The convex hull of this adjusted paraboloid
forms these facets, which correspond to the power diagram for the
points. This lifting process determines the capacity-driven paraboloid
and, consequently, the capacity-guided polygons representing the
points.
After obtaining possibly interacting spots from the power diagram, we
further screen for spot pairs based on both their spatial proximity and
interaction capacity. We first keep spot pairs that are in close
proximity. For grid ST datasets, the proximity threshold is defined as
the third quantile of the distance between all adjacent spots. For
non-grid ST datasets, we apply the 99th quantile instead. We then
estimate the ideal number of interfaces for each spot based on their
interaction capacity with a Gaussian mixture model (GMM). For a spot
with more interfaces than its ideal number of interfaces, we prune its
interface with the most distant adjacent spot.
We also adapt the modeling of interfaces to account for ST resolution.
Specifically, for bulk resolution, we consider an interface between two
spots as a regional representation of interaction signals, and we
define an inner interface to represent the interactions within a spot.
Considering that our task is to identify interaction with spatial
variation, the regional interaction signal can also indicate variation
in space.
Finally, we determine the spatial locations of interaction interfaces.
For each neighboring spot pair
[MATH:
(s,s
′) :MATH]
identified and screened above, we define the interaction interface
[MATH:
I=(s,s′) :MATH]
representing the interactions between the two spots. Formally, we
obtain the set of interaction interfaces
[MATH: I
:MATH]
and the location of interfaces Z as
[MATH: I={
I1,⋯,Ik}whereIi
mrow>=(si,si′)
;Z=z1T
,⋯,zkT
∈Rk×
2wherezi=si+si′2. :MATH]
2
with s[i] and
[MATH:
si′
:MATH]
denoting the two neighboring spots on either side of the i-th interface
I[i], and k denoting the number of identified interfaces.
Estimating ligand-receptor interaction strengths and directions on interfaces
with COT and co-expression
We model the ligand-receptor interaction (LRI) signals at each
interaction interface from gene expression X, referred to as the
interaction profile, by incorporating both COT and the co-expression of
ligand-receptor (LR) genes at the interaction interface. From the set
of LR genes
[MATH: G′ :MATH]
, we extract the ligand gene set
[MATH: L
:MATH]
and the receptor gene set
[MATH: R
:MATH]
. The set of LR pairs are represented as
[MATH: P={(<
/mo>l1,r1),⋯,(ln′,rn′
mrow>)} :MATH]
, where
[MATH: n′
:MATH]
denotes the number of LR pairs.
We first utilize COT to estimate the distribution of ligand and
receptor expression across the interfaces. The COT implementation is
based on the approach provided by COMMOT^[300]12, with modifications to
constrain the transport plan between SPIDER interfaces. Specifically,
for the set of interfaces
[MATH: I
:MATH]
, the COT formulation is adjusted as follows:
[MATH: minT∑(l,r)∈L×R(s,s′)∈IT(l,r,s,s′
)+∑l∈LF(μl)+∑r∈RF(νr)s.t.T(l,
r,⋅,⋅)=0∀(l,
r)∉P,
∑T(l,⋅,s,⋅)≤
Xs,l,
∑T(⋅,r,⋅,s′)≤X
s′,r.
:MATH]
3
Here, F is the penalty function associated with the untransported LR
expression, defined as
[MATH:
μl,s=Xs,l−∑
r,s′T(l,
mo>r,s,s′) :MATH]
and
[MATH:
νr,
s′=Xs′
,r−∑l,s
T(l,r,s,s′) :MATH]
. The COT score of the p-th LRI
[MATH:
(l,r)
mrow>∈P :MATH]
at the i-th interface
[MATH:
(s,s
′)∈I :MATH]
is therefore
[MATH: T^i,p=max
{T(l,r,s,s<
/mrow>′),<
mi>T(l,r,
s′,s)}.
:MATH]
4
Furthermore, from the optimal transport plan T, we extract the LR
specific expressions of ligands
[MATH: L∈Rm×n′
:MATH]
and receptors
[MATH: R∈Rm×n′
:MATH]
to compute LR co-expression. Specifically, the co-expression of the
p-th LRI (l, r) at the i-th interface
[MATH:
(s,s
′) :MATH]
is
[MATH: T¯i,p
=max(L
s,pRs′,p)12<
/mrow>,(Rs,p
Ls′
mo>,p)12<
/mfrac>whereLs,p=∑{(s,s′)∈I∣∀s
mrow>′}T(l,r,
mo>s,s′),Rs′,p=∑{(s,s′)∈I∣∀s}
mo>T(l,
r,s,s′).
:MATH]
5
For the ith interface between spots s and
[MATH: s′
:MATH]
, we define the interaction profile y[i], where the pth entry
represents the interaction strength of the pth ligand-receptor pair:
[MATH:
yi,p=max{T¯i,p
,T^i,p}
. :MATH]
6
Then we can present the interaction profiles across k interfaces
between neighboring spots for all
[MATH: n′
:MATH]
ligand-receptor pairs as a two-dimensional matrix
[MATH: Y=(y1T
,⋯,ykT
)∈Rk×n′
:MATH]
.
Additionally, we infer the interaction direction from the optimal
transport plan. The interaction direction can be determined by the
choice made in Equation ([301]4). That is, the interaction pair
[MATH:
(s,s
′) :MATH]
is reversed to
[MATH:
(s′,s) :MATH]
when
[MATH:
T(l,r,s,s′)<T(l,r,s′,s)
mrow> :MATH]
, and kept as
[MATH:
(s,s
′) :MATH]
otherwise.
Scoring the activation of spatially variable downstream genes
The objective of this step is to use transcription factor (TF) genes to
support scored LRIs. We collect a knowledge graph encoding known LR-TF
regulatory relation from SpaTalk^[302]11, which is a directed graph
with LR/TF nodes and regulatory edges. Let V denote all nodes in the
knowledge graph, we use V[R] and V[TF] to denote the receptor and TF
nodes. Using the spatial information of ST data, we can further
partition the TF node set V[TF] into a set of spatially varying TF
(svTF) genes V[svTF] and non-spatially varying TF genes V[nsvTF]. We
calculate activation scores for each receptor by analyzing weighted
paths from V[R] to V[svTF] in the knowledge graph. The directed graph
structure is encoded by the adjacency matrix
[MATH: A∈R∣V
∣×∣V∣ :MATH]
. The power of an adjacency matrix, denoted as A^h, where h is a
positive integer, represents the number of paths of hop h between
vertices in the graph. Specifically, the entry (V[R], V[svTF]) of the
matrix A^h indicates the number of distinct paths of hop h that start
at a receptor vertex V[R] and end at a svTF vertex V[svTF].
We further derive a weighted adjacency matrix by considering the level
of target node activation represented by gene expression. Let x denote
the gene expression vector of a spot, the weighted adjacency matrix
[MATH: A~ :MATH]
are defined as
[MATH: A~ij=A
ijx
mi>j :MATH]
. Its powers
[MATH: A~<
mrow>h :MATH]
capture the number of paths between nodes within h hops weighted by the
co-expression of target nodes. To make sure the weighted adjacency
matrix is comparable among different h, we regularize
[MATH: A~<
mrow>h :MATH]
with the weights as well as the number of paths. The regularized
weighted adjacency matrix
[MATH: A¯h :MATH]
is therefore defined as
[MATH: A¯ijh=A~ijh<
/mrow>1h<
/mi>A
ijh :MATH]
7
We use
[MATH: H∈R∣VR∣×∣<
/mo>VTF∣ :MATH]
to record activation scores between each receptor-svTF pair with a
spot, which is defined as the weighted receptor-svTF path sum within 3
hops
[MATH: H=∑h≤3
HhwhereHh=A¯h[VR,
VTF]
mo>∈R∣VR∣×∣<
/mo>VTF∣. :MATH]
8
Here, the hop matrices H[h] is a subset that keeps only receptor-svTF
paths. We reorganize the H matrices across all cells into a single
activation score matrix
[MATH: S∈Rm×nrt :MATH]
, where n[rt] is the number of receptor-svTF pairs.
Finally, we calculate the correlation between S[r] (activation scores
for receptor r) and LRIs scores implicating r. Correlating svTFs
provides supporting evidence for those LRIs, with a user-defined
threshold on correlation coefficients filtering supported LRIs, with a
threshold of 0.3 set as default.
Building abstract interfaces with self-organizing map
In this section, we build an abstract graph from the interfaces derived
above. From m spots, the number of interfaces is approximately 2m for
square or hex grid ST data, and 3m for non-grid ST data, as shown in
Supplementary Method [303]1.2. Considering that statistical tests for
spatial variance are challenged by computational scalability, we reduce
the number of interfaces by finding an abstract graph of interfaces.
An abstract interface represents a group of spatially adjacent
interfaces; we derive such a mapping with a self-organizing map (SOM)
that preserves the spatial topology of interfaces^[304]20. Formally, we
initialize a K × K square lattice covering the ST slice whose vertices
represent an abstract interface with initial coordinates
[MATH: Z^=(z^<
mrow>1×1T,z^<
mrow>1×2T,⋯,z^<
mrow>K×KT)∈RK
mrow>2×2
:MATH]
. To achieve a level of abstraction that balances between a resolution
that retains detail while still reducing the number of interfaces, we
set K by requiring the number of interfaces represented by a single
abstract interface as
[MATH:
kK
2=5 :MATH]
by default. In the training process, SOM iteratively updates the
mapping between an interface at z and its best-matching abstract
interfaces selected as the one with the smallest distance from z, as
well as the coordinates
[MATH: Z^ :MATH]
of abstract interfaces to better present the spatial topology of
interfaces. Here, we use the batch SOM implemented in SOMDE^[305]20.
After training, we construct the interaction profile of an abstract
interface with a mixed max-average pooling on the represented interface
profiles. The joint of two pooling methods retains the most prominent
LRI signals while avoiding overestimating variance due to the limited
neighborhood size^[306]69. In particular, we set the mixture portion of
average values to 0.7 to avoid extreme values presenting as noise in
calculating the spatial variance of interaction signals. We denote the
profiles of abstract interfaces as
[MATH: Y^=(y^<
mrow>1×1T,y^<
mrow>1×2T,⋯,y^<
mrow>M×MT)∈RM
mrow>2×n′ :MATH]
.
Identifying spatially variable LRI candidates
SPIDER incorporates six state-of-the-art methods for spatial variance
evaluation. From the profile matrix and coordinates of the abstract
interfaces, denoted as Y and Z to ease the notation, we have
[MATH: Y=(y1,y2,⋯,yp,⋯,yn′)
T :MATH]
, where
[MATH: n′
:MATH]
is the number of ligand-receptor pairs and y[p] denotes the profile of
LRI p across all abstract interfaces. SPIDER measures the
spatial-induced variance of y[p] with several formulations below.
Gaussian process (GP) regression is a common approach for spatial
regression; under the GP model, an LRI signal y[p] follows a normal
distribution with the mean of μ ⋅ 1 and a covariance matrix from two
terms - a non-spatial noise term ψI and a spatial term formulated by a
covariance function σ^2Q(ϕ)^[307]70. In the spatial variance term, Q(ϕ)
captures the spatial correlation between abstract interfaces under the
lengthscale ϕ controlling the decay rate of correlation with distance.
The parameters are inferred by maximizing the log-likelihood, with
which we obtain the log-likelihood ratio (LLR) compared to the null
hypothesis of no spatial covariance.
Assuming that the LLR follows a χ^2 distribution, we obtain the
corresponding p-value and q-value for y[p]. We can also rank LRI with
the fraction of spatial variance (FSV) defined in SpatialDE^[308]70.
For the above process, we apply the implementation in SpatialDE and
SOMDE^[309]20 using a squared exponential covariance function.
Furthermore, we incorporate two variations of the GP regression model
(Supplementary Method [310]1.3.1). Replacing the full covariance matrix
with a nearest-neighbor approximation accelerates the model fitting
process to scale linearly with the number of abstract interfaces,
instead of cubically when using the full covariance matrix; the
subsequent estimation of LLR, p-value, and q-value resembles the
previous method. We apply the implementation in nnSVG^[311]22. To
integrate a variety of covariance functions, we incorporate all
possible covariance functions with a generalized linear mixed model.
For an efficient model fitting process, we utilize an extended score
test that only estimates the model under the null hypothesis as
implemented in SpatialDE2^[312]21. Assuming that the sum of the score
vector for the variance components follows a mixture of χ^2
distributions, we obtain the corresponding p-value and q-value for
y[p].
Moreover, we apply a nonparametric covariance test for estimating the
spatial variance (Supplementary Method [313]1.3.2). Assumes that if
y[p]is independent of Z, then the spatial distance between two abstract
interfaces z[i] and z[j] would also be independent of the LRI profile
difference between the two abstract interfaces i and j. With y[p] and
Z, we have the centered profile and coordinate covariance matrices
Q(y[p]) and Q(Z). Subsequently, tr(Q(y[p])Q(Z))/k, a correlation
measurement between y[p] and spatial coordinates Z is computed.
Assuming that the correlation measurement follows a mixture of χ^2
distributions, we obtain the p-value and the corresponding q-value for
y[p]. SPIDER uses the implementation in SPARK-X^[314]23.
Given that LRI profiles are often noisy due to sequencing limitations,
we further denoise the LRI profiles by discretization, that is,
converging the continuous LRI profile to discrete labels. We achieve
spatial neighbor-aware discretization with hidden Markov random field
by assuming that the observable LRI signal y[p] are conditionally
dependent on the underlying discrete labels f = {f[i]} whose states are
not observable. Initially, we estimate f with a Gaussian mixture model,
with which we apply the alpha-expansion graph cut algorithm implemented
in scGCO^[315]24 to find a label vector f of interfaces that minimizes
an energy function (detailed in Supplementary Method [316]1.3.3). The
optimal label f * defines an active region
[MATH: F
:MATH]
for the LRI signal, on which we test for the spatial non-randomness
p-value with the complete spatial randomness framework^[317]24, as well
as the corresponding q-values.
Since we have multiple statistical tests, SPIDER filters the SVIs by
applying a strict threshold of p-value ≤ 0.01 across all tests. If the
resultant SVIs is less than 10, SPIDER combines the resulting p values
using the Cauchy p value combination rule^[318]71 to obtain a single p
value, which is also subject to the threshold of p-value ≤ 0.01. The
threshold is also implemented as a hyperparameter for users to adjust.
SVI pattern generation with a Gaussian process mixture model
We use a Gaussian process mixture model proposed in SpatialDE^[319]70
with a Dirichlet process prior^[320]21 to group SVIs sharing similar
spatial distributions. The nonparametric Dirichlet process prior
automatically determines the number of patterns. From the LRI profile
matrix
[MATH: Y^ :MATH]
, we obtain the latent SVI membership of the SVI pattern and the
distribution of each pattern c, denoted as
[MATH: μ^<
/mrow>c :MATH]
, in the form of a GP mixture model with components as the GP models of
SVI members, solved by variational inference as implemented in
SpatialDE2^[321]21.
We term
[MATH: μ^<
/mrow>c :MATH]
as the activation strength for pattern c. Following the mapping between
z and its best-matching abstract interface
[MATH: z^<
mrow>*(z) :MATH]
, each interface z has the same activation strength as
[MATH: z^<
mrow>*(z) :MATH]
. We use
[MATH: ϒ=(μ1,μ2,⋯,μc
mrow>)T :MATH]
to denote the activation strength matrix of interfaces.
Construction of simulation tests and comparisons with other methods
To examine SPIDER’s accuracy in identifying SVIs, we simulated multiple
ST datasets containing both spatially variable (SV) and non-SV
ligand-receptor interactions (LRIs) as well as target gene supports
using the SVCA package^[322]25. In these simulations, the interaction
strength of each LRI is not uniform but is varied across cells.
Specifically, the level of interaction strength is controlled by the
fraction of ligand expression variance explained by its interaction
with the corresponding receptor, as specified in SVCA. Similarly, SVCA
simulates target gene expressions from a given receptor by controlling
the intrinsic effect, that is, the fraction of target gene variance
explained by receptor activation. Therefore, we use the fraction of the
interaction effect to control the strength level of interaction, and
the fraction of the intrinsic effect to control the activation of
target TF genes by receptors.
We first simulated ligand expression for testing the identification of
SVI candidates. To distinguish SV and non-SV patterns, we selected the
top 100 SV and non-SV receptors from a real dataset. From the
expression of SV receptors, SVCA simulates ligands that present similar
spatial patterns, therefore generating LRIs with spatial patterns.
Similarly, ligands generated from non-SV receptors will result in LRIs
without spatial patterns. We control the level of interaction strength
by supplying a range of fractions of variance to be explained by gene
interactions. We further apply a Poisson noise on the generated ligand
expression, with λ defined as the product of expressions and a scaling
factor adjusting the intensity of the noise, resulting in three noise
levels.
Subsequently, we simulated TF expression for testing svTF support of
LRIs. We constructed a four-hop knowledge graph, where the target TF
genes can be either svTF (spatially variable TF) or nsvTF
(non-spatially variable TF), and their expression can either correlate
or not correlate with the receptor gene. By setting the intrinsic
effect to 0.9, we obtain TF expressions correlating with the input
receptor gene, and the uncorrelated TF expressions take the negative
part of the correlating TF expressions. If the input receptor gene is
spatially variable, then the simulated TF expressions are also
spatially variable. To generate nsvTF for SV receptors, we apply a
Poisson distribution with λ = 1 to disrupt the generated spatial
pattern while maintaining the correlation between TFs and the SV
receptor. On the other hand, if the input receptor is not spatially
variable, we apply a Gaussian model with length-scale 2 to at an
expression hot-spot to simulate spatial variance. When testing the
effect of TF variability on SPIDER, we also apply three levels of
Poisson noise on the generated TF gene expression as above.
Joining the simulated ligand and TF expression, we construct various
simulation cases. Initially, we generated 12 simulations by
systematically varying the interaction strength levels–specifically,
setting the fraction of ligand variance explained by receptor
interaction to 99, 75, 50, and 25%–and applying three noise levels
using scaling factors of 0.5, 0.7, and 0.9, which correspond to low
(0.1), medium (0.3), and high (0.5) noise, respectively. For both SV
and non-SV receptor genes, half of them are supported by svTFs, while
the other half are not. At the median noise level and with four
interaction strength levels, we generated an additional 12 simulations
by adjusting the fraction of svTF-supported non-SV receptors.
Specifically, non-SV receptor genes can be fully supported, partially
supported, or not supported by svTFs to evaluate the TF scoring
function of SPIDER. Lastly, we generated 12 null simulations at the
median noise level across different interaction strength levels,
ensuring that all SV receptors are not supported by any svTFs. In
contrast, non-SV receptor genes can be fully supported, half supported,
or not supported by svTFs.
Several methods have been developed for ST-based LRI analyses.
SpatialDM and SpatialCorr detect spot-level interactions, while SpaTalk
identifies cluster-level interactions. On the other hand, CellChat
detects cluster-level interactions without using spatial knowledge. For
cluster-based LRI methods SpaTalk and CellChat, we kept the lowest
p-value for each ligand-receptor pair between any two clusters. We
compared SPIDER with these methods. However, it should be noted that
these methods, except for SpatialDM, are rather unfavorable in this
simulation setting for having relatively different objectives.
Furthermore, to show the effect of SPIDER’s interaction scoring on the
capacity-constrained interfaces, we incorporated two other methods,
namely stLearn and COMMOT, that provide spot-based interaction
scoring^[323]12,[324]13, and followed with SPIDER’s statistical model
testing. Statistical testing with similar LRIs is used in stLearn to
estimate spot-wise interaction significance scores. On the other hand,
we supply COMMOT’s spot-spot interaction scores onto the constructed
interfaces.
We generated receiver operating characteristic (ROC) curves under each
interaction scenario for all benchmarking methods, SPIDER, and
different models used in SPIDER. The ROC curves were then used to
compare the AUC (Area Under the Receiver Operator Curve) values. For
null cases with no svTF-supported SVIs, we use the specificity metric
with a significance cutoff of 0.01.
To generate bulk simulation datasets, we grouped adjacent cells into
computational spots using SOM on the PDAC and DLPFC simulation
datasets. Due to the smaller number of cells in the PDAC sample, we set
the number of cells per spot to four and six for both the PDAC and
DLPFC samples. The simulated expression values of cells within each
spot were aggregated by taking the mean expression, generating
bulk-level profiles from the underlying single-cell simulations. For
cluster-based SpaTalk, we label a spot by the majority of cluster
labels from its member cells.
Evaluating the effect of spatial constraints on both simulation and real
datasets
We further assessed SPIDER’s robustness under weak and non-existent
spatial constraints by applying permutation tests to both single-cell
and bulk resolution PDAC simulations across four interaction strength
levels. To model weak spatial constraints, we performed block
permutations, randomly shuffling cells or spots within predefined block
sizes of 3, 5, and 10, with larger block sizes representing
progressively weaker spatial constraints. To simulate the complete
absence of spatial constraints, we randomly permuted all spots or cells
without maintaining any block structure, repeating this process five
times for each interaction strength setting.
Decoding cell type interactions with directed and undirected SVI
Given spot annotations such as cell types or cluster labels, SPIDER
decodes the interacting parties from an SVI. Let C = {C[t]} denote the
categories in the given annotation, and c = {c[s]} denote the spot
labels. We define a symmetric matrix K representing cell type
interactions for the pth LRI, where
[MATH:
K(C
t,C
t′)=∑(s,
s′
)ys
,p⋅1{Ct∈{cs<
/msub>,cs′}<
/mrow>}⋅1{Ct′∈{cs,
cs′
mo>}}∑(s,
s′
)1{Ct∈{cs<
/msub>,cs′}<
/mrow>}⋅1{Ct′∈{cs,
cs′
mo>}} :MATH]
9
subject to
[MATH: t≠t′
mrow> :MATH]
. Here
[MATH:
(s,s
′) :MATH]
represents an interface, y[s,p] is the expression level of the pth LRI
on spot s, and
[MATH: 1 :MATH]
is the indicator function. For the diagonal entry, we have
[MATH:
K(C
t,Ct<
/mi>)=
∑(s,
s′
)ys,p⋅1{Ct=cs}⋅1{Ct=cs′
}
∑(s,
s′
)1{Ct=cs}⋅1{Ct=cs′
}
. :MATH]
10
The matrix K reveals the mean SVI expressions of interfaces connecting
different cell types in the sample.
Subsequently, we construct a non-symmetric version of K for directed
cell type interactions, with
[MATH:
K(C
t,C
t′)=∑(s,
s′
)ys,p⋅1{cs=Ct}⋅1{cs′=Ct′}∑(s,
s′
)1{cs=Ct}⋅1{cs′=Ct′}.
mo> :MATH]
11
The diagonal entries remain unchanged as in Equation ([325]10). SPIDER
visualizes the non-symmetric K with a chord diagram, with the option to
exclusively display interactions between different cell types.
Downstream analyses with identified SVIs and SVI patterns
In the above section, we obtain the SVI expression matrix
[MATH: Y′ :MATH]
, subset from the full expression matrix Y, and the SVI pattern matrix
ϒ. Subsequently, we demonstrate that SPIDER populates
[MATH: Y′ :MATH]
and ϒ back to the feature space of the spot.
We retain a mapping between spots and interfaces as B, with B(s)
denoting a set of interfaces connected to spot s. Let X and χ denote
the spot-transformed SVI expression and SVI pattern matrics, defined as
[MATH: Xs=1∣B(s)∣∑i∈B<
mo>(s)Yi′
andχs=1∣B<
mo>(s)∣∑i∈B<
mo>(s)ϒi. :MATH]
12
Furthermore, we calculate Pearson correlation with known spot
annotations and perform SVI or SVI pattern-based spot clustering given
the populated features. We showcase the resemblance between spot
annotations and SVIs or SVI patterns by calculating their Pearson
correlations. Specifically, we calculate the Pearson correlations
between the populated matrices with known spot annotations, such as
one-hot encoded cell type labels or cell type deconvolution results.
For the populated SVI and SVI pattern matrices X and χ, we apply
Scanpy’s clustering [326]pipeline^[327]45. Specifically, we embed the
populated matrices in a two-dimensional space using Scanpy’s UMAP
implementation with default parameters. We then use Scanpy’s Leiden
implementation to extract clusters from the UMAP-reduced features. For
the mouse embryo and mouse brain datasets, we set the resolution of
Leiden as 0.1 and 0.5, respectively, to approximate the number of
annotated clusters. For the SAW dataset, we vary the resolution
parameter to 0.1, 0.5, and 1, respectively. We further find the
differentially expressed SVIs among interaction-based clusters using
Scanpy’s rank_genes_groups with default parameters. The top SVIs were
ranked by integrating log-fold change and t-test p-value. We further
utilized the SVI matrix X for inferring pseudotime and trajectory,
following Scanpy’s [328]trajectory inference pipeline with the number
of PCA components set to 10.
Considering that interfaces could serve as a counterpart of spots or
cells, we also performed clustering, pseudotime and trajectory
inference on interfaces instead of spots. The method and results are
discussed in detail in Supplementary Note [329]10. Similarly, SVIs
could be seen as a counterpart of genes, for which we performed
additional tests to examine the relation between SVIs and genes
(details in Supplementary Note [330]11).
Pathway enrichment for LRIs and svTFs
We perform pathway enrichment analyses on a given set of LRIs, both
utilizing the “enrichR” function implemented in the GSEApy package to
test for enrichment on pathway databases such as KEGG and
Reactome^[331]72. Specifically, for enrichment analysis on genes
implicated in the set of LRIs, we call “enrichR” with default human
KEGG pathway “KEGG_2021_Human” and mouse KEGG pathway
“KEGG_2019_Mouse”. For enrichment on an SVI and its supporting svTFs,
the gene set is the joint set of the LR genes and the supporting svTFs.
Configuration of the used software
We configured several computational software packages for interface
scoring, abstraction, and SVI analysis. For COT scoring of interfaces,
we used the default parameters implemented in COMMOT, except for
adjustments made at interfaces. When applying SOMDE for interface
abstraction, we configured the mixture portion parameter to 0.7 and set
it to represent 10 interfaces, with each abstract interface by default.
For identifying and evaluating SVI candidates, we applied eight
software. Regarding the SpatialDE, SpatialDE2’s omnibus mode, and
SPARKX for detecting SVI candidates, we left them at their default
settings. For SOMDE, we set the default mixture portion and number of
interfaces per abstract interface to four for under 1000 interfaces or
ten otherwise. To evaluate different combinations of edge and node
penalties using the scGCO software, we systematically varied the smooth
factor among 1, 5, 10, and 20 and the unary scale factor between 50 and
100. Across all statistical tests run using various combinations, we
report the smallest q-value to determine significant SVI candidates.
For nnSVG, we try with default parameters, and when nnSVG fails for
some inputs, we include two configuration options - adding a
pseudo-count of 1 to the interface profile matrix or discarding
low-reliability LRI pairs. We obtained Moran’s I and Geary’s C metrics
using the Squidpy package^[332]73 with 1000 permutations set for
significance testing.
We compared SPIDER with several methodologies, including SpatialDM,
SpaTalk, CellChat, stLearn, COMMOT, and spatialcorr. For SpatialDM, we
set the parameters l=1.2 and a cutoff of 0.2, using bin-spot results
for cell-type correlation assessments. SpaTalk was run with default
parameters. In the case of CellChat, we specified the type as
truncatedMean in the computeCommunProb function and set the trim to
0.01. For stLearn, we applied score pval_adj_cutoff=0.05, and set
distances to 8 for DLPFC and 1.3 for PDAC simulations. The resultant
lr_sig_scores were used as the LRI scores. For COMMOT, we applied a
distance threshold (dis_thr) of 2.1 and enabled heteromeric
interactions. Lastly, spatialcorr was evaluated with a bandwidth of 10,
but it took too long to process the DLPFC simulations, leading to its
exclusion from further evaluations. A significance threshold of 0.01
was applied to the final LRI results from all methods, consistent with
the threshold used in SPIDER.
Datasets
An overview of datasets, including cell or spot number, gene number, as
well as the number of interfaces and SVIs, are included in
Supplementary Table [333]9. Preprocessing of count matrices, LR pair
databases, and interfaces are described in Supplementary
Method [334]1.4.
Human pancreatic ductal adenocarcinoma dataset
Raw data of the Spatial Transcriptomics sample PDAC-A of pancreatic
ductal adenocarcinoma is collected from [335]GSE111672, and the spot
annotations are from Fig. [336]2 of the publication^[337]28. Cell type
deconvolution with paired scRNA-seq data is performed by SPOTlight
following the [338]github tutorial.
SAW mouse brain dataset
Raw data and clusters of the stereo-seq mouse brain samples are
collected from [339]CNP0004437^[340]30.
Mouse hippocampus dataset
Raw data and annotations of the slide-seq V2 mouse hippocampus samples
are collected from the Single Cell Portal under project
[341]SCP815^[342]8. Cell type deconvolution with a mouse single-cell
RNA-seq hippocampus dataset^[343]74 is performed by Seurat following
the [344]official tutorial.
Mouse embryo dataset
Raw data and cell types of the seqFISH mouse embryos are collected from
the [345]Spatial Mouse Atlas^[346]29.
Human dorsolateral prefrontal cortex dataset
The raw counts and manual annotations of the human dorsolateral
prefrontal cortex samples are collected from [347]globus. For each
biological sample, one repeat sample is selected, namely samples
151,510, 151,672, and 151,673.
Human breast cancer dataset
From the HER2-positive breast cancer Spatial Transcriptomics samples
provided in the study of Wu et al.^[348]57, we selected six samples
namely A1, B1, C1, D1, G2, and F1, whose count matrices and spot
annotations are collected from [349]zenodo project 3957257.
Reporting summary
Further information on research design is available in the [350]Nature
Portfolio Reporting Summary linked to this article.
Supplementary information
[351]Supplementary Information^ (21.3MB, pdf)
[352]Peer Review File^ (17.1MB, pdf)
[353]41467_2025_62988_MOESM3_ESM.pdf^ (71.7KB, pdf)
Description of Additional Supplementary Files
[354]Supplementary Data 1^ (326.6KB, xlsx)
[355]Reporting Summary^ (1.9MB, pdf)
Source data
[356]Source Data^ (26.4MB, xlsx)
Acknowledgements