Abstract
Background
Cellular communication is vital for the proper functioning of
multicellular organisms. A comprehensive analysis of cellular
communication demands the consideration not only of the binding between
ligands and receptors but also of a series of downstream signal
transduction reactions within cells. Thanks to the advancements in
spatial transcriptomics technology, we are now able to better decipher
the process of cellular communication within the cellular
microenvironment. Nevertheless, the majority of existing spatial
cell–cell communication algorithms fail to take into account the
downstream signals within cells.
Results
In this study, we put forward SpaCcLink, a cell–cell communication
analysis method that takes into account the downstream influence of
individual receptors within cells and systematically investigates the
spatial patterns of communication as well as downstream signal
networks. Analyses conducted on real datasets derived from humans and
mice have demonstrated that SpaCcLink can help in identifying more
relevant ligands and receptors, thereby enabling us to systematically
decode the downstream genes and signaling pathways that are influenced
by cell–cell communication. Comparisons with other methods suggest that
SpaCcLink can identify downstream genes that are more closely
associated with biological processes and can also discover reliable
ligand-receptor relationships.
Conclusions
By means of SpaCcLink, a more profound and all-encompassing
comprehension of the mechanisms underlying cellular communication can
be achieved, which in turn promotes and deepens our understanding of
the intricate complexity within organisms.
Supplementary Information
The online version contains supplementary material available at
10.1186/s12915-025-02141-x.
Keywords: Spatial transcriptome, Cell–cell communication, Downstream
pathways, Communication patterns
Background
In multicellular organisms, cell–cell communication is of fundamental
importance for sustaining the harmonious and coordinated operation of
tissues, organs, or systems. Cellular communication empowers cells to
identify one another, interchange information, and collaborate in a
synchronized manner to fulfill a diverse array of biological tasks,
such as cell differentiation, proliferation, development, and the
orchestration of immune responses [[32]1, [33]2]. Thus, the dissection
of cell–cell interactions holds utmost significance in comprehensively
grasping the structure, functionality, and overall “landscape” of
biological systems. It offers crucial revelations into the elaborate
network of intercellular communication and coordination and also plays
a pivotal part in deciphering disease mechanisms and guiding
therapeutic strategies.
Cell–cell interactions transpire via a multiplicity of mechanisms,
encompassing autocrine, paracrine, juxtacrine, and endocrine signaling
[[34]3, [35]4]. Autocrine and juxtacrine interactions entail
short-range effects, whereas paracrine and endocrine interactions can
exert their influence over more extended distances. Within tissues,
signaling cells can attach to receptors on recipient cells by means of
specific ligands, thereby instigating a sequence of signal transduction
reactions. These signal transduction reactions are capable of either
activating or suppressing particular signaling pathways, ultimately
resulting in alterations in the activity of transcription factors and
the expression of target genes [[36]5–[37]8]. Subsequently, these
modifications in transcription factor activity and target gene
expression further exert an impact on the functionality and
physiological processes of the cells.
In recent times, remarkable headway has been achieved in deciphering
cell–cell communication from single-cell transcriptome data,
attributable to the ceaseless progression of single-cell transcriptomic
sequencing (scRNA-seq) technology [[38]9–[39]12]. A number of methods
have been devised to analyze and fathom cell–cell communication. These
methods can be generally classified into two categories of algorithms:
those involved in deducing intercellular communication between diverse
cell types [[40]13–[41]16] and those considering intracellular
signaling [[42]17–[43]20]. The algorithms utilized for inferring
intercellular signaling depend on pre-existing databases and
communication scores to single out significant ligand-receptor pairs.
Moreover, the methods for considering intracellular signaling can be
further partitioned into two classifications: ones based on established
signal pathway annotations [[44]17, [45]18, [46]20] and ones based on
de novo inference [[47]19]. Notwithstanding the relative maturity of
these methods founded on scRNA-seq data, there remains a necessity to
further probe into how spatial information can be harnessed in the
exploration of cell–cell communication. This is because cell–cell
communication customarily takes place within a specific range, and
incorporating spatial distance information can assist in alleviating
potential false-positive outcomes that might emerge from non-spatial
communication methods [[48]21].
Spatial transcriptome technology offers a powerful tool for
investigating cell–cell communication within the tissue
microenvironment [[49]22, [50]23]. It enables us to obtain a more
comprehensive understanding of cell–cell interactions, uncover the
spatial dependence of such communication in tissues, and further deepen
our insights into cell functions and regulatory mechanisms in the
tissue microenvironment. Recently, multiple methods have been developed
to study spatial cell–cell communication based on spatial transcriptome
data. For instance, SpaTalk [[51]24], Giotto [[52]25], and Niches
[[53]26] are designed to identify neighborhood communication. SpaCI
[[54]27] and GCNG [[55]28] leverage graph convolutional networks to
detect adjacent potential communication relationships. HoloNet [[56]29]
employs a graph network model to analyze the impact of communication on
target gene expression, while COMMOT [[57]30] utilizes the collective
optimal transmission method to analyze the competitive relationships in
communication.
Although there are some methods that do take downstream signaling into
consideration in the context of spatial communication, the current ones
[[58]24, [59]29–[60]32] fail to make full use of gene networks and gene
expression information when dealing with downstream signaling.
Traditional non-spatial communication inference methods also have the
same limitations since they do not adequately incorporate gene
expression information and the pathways of signal transduction. In the
study of cellular communication and regulation, taking the downstream
signaling process within the cell into account can help us understand
the regulatory mechanism of cell communication in a more comprehensive
manner [[61]33].
Consequently, in order to conduct a more precise analysis of the
downstream functions influenced by cell–cell communication and identify
the corresponding spatial communication patterns, we have put forward
SpaCcLink, a spatial cellular communication analysis method that takes
into account the downstream signals of individual cells and
systematically explores both the spatial cellular communication
patterns and downstream signal networks. Leveraging the prior gene
network, the graph attention network is employed to identify distinct
groups of target genes that are highly correlated with the receptors.
By combining the specific expressions of these target genes, the
activity fraction of receptors in individual cells can be quantified to
analyze the communication patterns. Additionally, the Cross Moran index
is incorporated to identify spatially dependent signal molecules.
To assess the performance of SpaCcLink, we compare it with other
methods using real datasets. The results reveal that the downstream
signals identified by our model are of greater significance. Moreover,
the receptor pairs identified through the Cross Moran index also
demonstrate strong reliability. In summary, SpaCcLink represents a
novel approach for the downstream functional analysis of spatial
cell–cell communication. It facilitates the identification of key
signaling molecules and communication patterns, thereby contributing to
a deeper understanding of cell localization and function within
tissues.
Results and discussion
Overview of SpaCcLink
The schematics of SpaCcLink are shown in Fig. [62]1. Firstly, the
downstream influence of each receptor within individual cells is taken
into account. A graph attention network model is employed to pinpoint a
group of target genes that exhibit a high degree of correlation with
the receptor (Fig. [63]1a). Subsequently, the downstream influence
scores of each receptor within every cell are derived by integrating
the gene expression specificity scores (Fig. [64]1c).
Fig. 1.
[65]Fig. 1
[66]Open in a new tab
Overview of SpaCcLink. a Identification of downstream target genes
associated with the receptor within gene networks. The gene network
comprises a comprehensive signaling pathway commencing from the ligand
(L), proceeding through the receptor (R), intermediates (M), impinging
on transcription factors (TF), and ultimately arriving at the target
genes (TG). b Recognition of spatially dependent ligand-receptor pairs.
Initially, the interaction range of ligand-receptor pairs is
partitioned using the Wasserstein distance. Subsequently,
ligand-receptor pairs with both global and local spatial dependencies
are identified based on the cross-Moran’s index. c Statistical analysis
of intercellular communication. Through the integration of
gene-specific expression levels and receptor-target gene interactions,
the downstream impact of receptors is precisely quantified. d
Visualization of the downstream analysis of spatial resolution
cell–cell communication
Secondly, the Wasserstein distance is calculated for all
ligand-receptor pairs. Given that the existing prior ligand-receptor
(LR) databases fail to offer detailed accounts of the interaction
modalities between receptors and ligands, SpaCcLink resorts to an
optimal transport-based approach to categorize LR pairs into
short-range and long-range interactions. By making use of d_ratio and
p-values (as detailed in the “ [67]Methods” section), the interaction
pairs are sorted into significant short-range, significant long-range,
and the default medium-range classifications. Grounded on this
classification, cross-Moran’s I index is applied to single out
ligand-receptor pairs with conspicuous spatial dependencies (as
illustrated in Fig. [68]1b and Fig. [69]1c). These two sequential steps
collaboratively facilitate the identification of ligand-receptor pairs
that exhibit both remarkable spatial dependencies and powerful
downstream effects.
Finally, a downstream analysis is carried out on the identified
ligand-receptor pairs (Fig. [70]1c). This analysis encompasses several
key components: decomposing the receptor impact scores through
non-negative matrix factorization (NMF) to obtain corresponding
communication patterns, dissecting the intercellular communication
among different cell types, identifying the downstream target genes
that are primarily affected by the ligand-receptor pairs with the aid
of Sankey diagrams, conducting Fisher’s exact test to analyze the
crucial transcription factors and visualizing the entire signal
transduction pathways from receptors to target genes.
Identification of communication patterns and significant LR pairs in human
melanoma data
We initially applied our methodology to human melanoma data derived
from the ST platform. This dataset encompasses 293 spots and comprises
7 distinct cell types (as depicted in Fig. [71]2a.). We computed the
downstream influence scores for each receptor and integrated them with
spatial communication analysis. This enabled us to identify receptors
that possess both significant downstream influence and conspicuous
spatial dependence on ligands. Through the execution of standard NMF on
the downstream influence scores of receptors, two patterns of
downstream signaling communication within the melanoma data were
obtained (shown in Fig. [72]2b). It is evident that Pattern 0
predominantly aligns with the distribution area of melanoma cells,
whereas Pattern 1 mainly corresponds to B cell regions and certain
boundary zones between cancer-associated fibroblasts (CAF) and melanoma
(as illustrated in Fig. [73]2a and b). To further validate the efficacy
of identifying communication patterns via the downstream signaling
scores of receptors, we carried out supplementary dimensionality
reduction and Leiden clustering on the outcomes obtained through the
traditional scoring approach (see the “ [74]Methods” section). By
regulating the resolution, we acquired results that were highly
congruent with the communication patterns identified by the downstream
scores (as presented in Fig. [75]2c).
Fig. 2.
[76]Fig. 2
[77]Open in a new tab
Case study illustrating the application of SpaCcLink in human melanoma
data. a The cellular type of each spot within the human melanoma data.
b Identification of patterns pertaining to the downstream influence of
receptors. c Dimensionality reduction and clustering of cell-cell
communication scores. d Enrichment of top receptors in the loading of
downstream communication patterns. e Communication hotspot plot between
the ligand CXCL12 and the receptor CXCR4. f Circle plot depicting the
intercellular communication involving the ligand CXCL12 and the
receptor CXCR4 between different cell types. g Pathway enrichment
analysis for the top 20 ranked receptor genes in pattern1 based on
loading scores
We subsequently acquired the top genes for each pattern by relying on
the decomposed H-matrix (Fig. [78]2d). This matrix delineates the
significance or influence of each feature within the original data,
principally aiming to gauge the extent of contribution to the pattern.
In pattern 1, the genes that rank at the top and exhibit high
specificity are ALOX5, CD79A, and CXCR12. ALOX5 pertains to the human
lipoxygenases family and is robustly correlated with immune cell
populations [[79]34, [80]35].
It orchestrates the signaling pathways or immune responses that impact
these cell populations. Previous investigations have unearthed a
correlation between ALOX5 and the prognosis of melanoma [[81]36,
[82]37]. The expression of CD79a is highly specific to B cells, making
it one of the characteristic molecules of B cells. The CD79a/b
heterodimer formed by the binding of CD79a and CD79b plays a crucial
role in the normal functioning of B cells, including signal
transduction, cell development, and survival [[83]38, [84]39]. The
expression distribution of CXC12 and CXCR4 manifested an evident
spatial dependence (Fig. [85]2e), which intimated that CXCR4
predominantly received signals from CXCL12. The communication between
CXC12 and CXCR4 is chiefly concentrated between B cells and B cells as
well as between B cells and CAF (Fig. [86]2f). The interaction between
CXCL12 and CXCR4 is highly expressed in B cells and is postulated to
play a pivotal role in cell migration and proliferation [[87]40]. These
processes are intimately intertwined with cancer progression and
metastasis, thereby designating CXCL12 and CXCR4 as potential targets
for the treatment of skin malignancies, especially melanoma [[88]41,
[89]42]. In accordance with the GOBP (Gene Ontology Biological Process)
enrichment analysis carried out on the top 20 genes in pattern 1, the
outcomes signified a substantial association between pattern 1 and
biological processes such as leukocyte cell adhesion regulation,
cellular activation, and inflammatory response (Fig. [90]2g). These
discoveries imply that pattern 1 might be implicated in the modulation
of the immune system and inflammatory processes. This holds significant
ramifications for further research into immune cell function and the
disease mechanisms of melanoma.
Identification of prognostic receptor genes and downstream signals in human
breast cancer data
Subsequently, we applied our methodology to human breast cancer data
acquired from the 10X Visium platform. This dataset comprises 3798
spots and encompasses 9 cell types, including four malignant cell
types, namely basal_like_1, basal_like_2, luminal_AR, and mesenchymal
cells (as shown in Fig. [91]3a). In a similar vein, we initially
conducted pattern recognition on the downstream signal scores of
receptor genes that manifested significant effects and exhibited clear
spatial dependency on ligands. Through this process, we identified four
communication patterns, among which patterns 1–3 display distinct
regional specificities. Pattern 1 is mainly associated with the area of
immune cells; pattern 2 is predominantly concentrated between the
malignant cell type basal_like_1 and stroma cells, potentially being
related to the invasiveness and aggressiveness of the tumor; pattern 3
is mostly clustered in the area of malignant cells (illustrated in
Fig. [92]3b). We then visualized the top 10 receptor genes with high
load scores in pattern 1. Among these genes, CD247, CD3D, and CD48
demonstrated the regional specificity of pattern 1. Utilizing TCGA
human breast cancer data, we carried out survival analysis on the top
10 receptor genes. The expression of specific receptor genes in pattern
1 significantly influenced the survival time of patients, whereas
non-specific genes showed no significant correlation (Additional file
[93]1: Fig. S1).
Fig. 3.
[94]Fig. 3
[95]Open in a new tab
Case study on the application of SpaCcLink in human breast cancer data.
a The cellular type of each spot within the human breast cancer data. b
Identification of the downstream impact patterns of receptors
exhibiting regional heterogeneity. c Enrichment Analysis of top
receptors in the loading of downstream communication patterns. d
Communication of HLA_B-CD3D at the cell-type level. e The survival
analysis curves for the specific genes CD247, CD3D, and CD48 were
constructed using the TCGA Human Breast Cancer Dataset. f Sankey
diagram of cellular communication signal from basal_like_1 to
macrophages. g Pathway enrichment of downstream signaling genes of CD3D
Next, we analyzed the communication relationship between CD3D and its
spatially dependent ligand HLA-B across all cell types. The
communication between HLA-B and CD3D primarily takes place within
basal_like_1 cells and between basal_like_1 cells and macrophage cells.
In our analysis, we discovered that the high communication score within
basal_like_1 is mainly attributable to its relatively larger cell
population. Consequently, we focused on the signaling process from
basal_like_1 to macrophages. The signaling from basal_like_1 to
macrophages is primarily mediated by ligand HLA-C and ligand HLA-B.
These molecules act as signaling initiators and interact with receptor
molecules CD3D and LILRB2 on the cell surface.
As signal receivers, CD3D and LILRB2 receive the signals from
basal_like_1 cells and mainly transmit them to downstream target genes
UBAP2L, VAMP3, and VCAN (shown in Fig. [96]3f). UBAP2L is associated
with macrophage polarization [[97]43]. VAMP3 plays a vital role in
macrophages by regulating processes such as adhesion, spreading, and
sustained migration, thereby having a significant impact on their
function [[98]44, [99]45]. Moreover, VAMP3 is involved in membrane
transport pathways related to cytokine secretion and pathogen
engulfment [[100]46]. VCAN is widely expressed during immune
inflammation processes and has been found to regulate macrophage
activation and function [[101]47, [102]48].
We performed Gene Ontology Biological Process enrichment analysis on
the genes involved in the downstream signaling of CD3D and found that
they were mainly enriched in the following functions: immune
response-activating cell surface receptor signaling pathway, cellular
response to cytokine stimulus, and cytokine signaling in the immune
system (as presented in Fig. [103]3g). These results imply that CD3D
may play a significant role in immune response and cytokine signaling.
The study of these signaling pathways is essential for understanding
the interactions between cancer cells and immune cells as well as their
crucial roles in the development of breast cancer and the configuration
of the tumor microenvironment.
Identification of key TFs and complete downstream signaling pathway in mouse
brain data
We also applied our methodology to mouse brain data obtained from the
10X Visium platform. This dataset covered 2688 spots and encompassed 15
cell types (as shown in Fig. [104]4a). In a similar manner, we
initially carried out pattern recognition on the downstream effect
scores of receptors. Four patterns were identified, which were closely
related to cell type distributions. Pattern 0 was predominantly
clustered in the mouse cortex and pyramidal layer regions (as
illustrated in Fig. [105]4a and b). Subsequently, we visualized the top
5 genes with high loading scores for each pattern. In pattern 0, the
specific genes were identified as Robo2, Cckbr, and Cnr1 (as depicted
in Fig. [106]4d). For each specific gene and its ligands that exhibited
significant spatial dependence, we conducted a cell-type communication
network analysis (as shown in Fig. [107]4c). Cckbr and its ligand CcK
primarily communicate within Cortex_3 cells and between Cortex_1 and
Cortex_3, which is in line with previous studies [[108]49].
Cholecystokinin (CCK) is a peptide hormone and one of the most abundant
neuropeptides in the vertebrate brain. CCKBR plays a preponderant role
in the central nervous system, mainly in the neocortex and limbic
structures. CCK has been recognized as a central modulator in neuronal
circuits, and along with its receptors, it participates in the
regulation of neural processes such as feeding, memory, pain
perception, and exploratory behavior [[109]50].
Fig. 4.
[110]Fig. 4
[111]Open in a new tab
Case study on the application of SpaCcLink in mouse brain data. a The
cell type of each spot in the mouse brain data. b Identification of the
downstream impact patterns of receptors exhibiting regional
heterogeneity. c Inter-cell type communication heatmap of Cck-Cckbr. d
Enrichment of top receptors in the loading of downstream communication
patterns. e Downstream signal path of Cckbr. Green symbolizes the
receptor, gray stands for the mediator, orange represents TF, and blue
indicates the target gene. f Pathway enrichment of downstream signaling
genes of Cckbr
Next, we employed Fisher’s exact test to identify the downstream TFs
activated by each receptor gene. Subsequently, we visualized the
complete signal transduction network from the receptor to the
transcription factors and further to the target genes (as shown in
Fig. [112]4e).
FOXO1 and FOXO3 belong to the FOXO family, and they play crucial roles
in neural development, cell survival, and metabolic regulation
[[113]51]. Trp53 plays a vital part in the nervous system, regulating
the process of apoptosis [[114]52]. NFKB1 regulates inflammatory and
immune responses in neuronal cells, and its involvement is associated
with the development of neurological disorders and cognitive
impairments [[115]53, [116]54]. By conducting GOBP enrichment analysis
on downstream genes, we can further confirm the association of the
identified functional transcription factors with cellular responses to
organonitrogen compounds, responses to peptides, and behavior-related
processes. Identifying key transcription factors and studying the
complete signaling pathways can offer a profound understanding of the
functionality and regulatory mechanisms of the mouse brain. This
contributes to deciphering the interactions between cells, the
organization and functionality of neural networks, and the mechanisms
underlying complex physiological processes such as behavior.
Compared with other methods
We compared SpaCcLink with SpaTalk, NicheNet, and CellCall, assessing
their performance in deducing the intracellular downstream signaling
pathways within different cell types. Due to the complex nature of
inputting prior LR data and pathway data from diverse methods, we
decided to employ the specific prior data furnished by each method. It
was discovered that a considerable portion of the LR data we amassed
and arranged overlapped with the data utilized by other methods
(Fig. [117]5a). Although NicheNet presented a more extensive range of
LR pair information, a significant part still did not coincide with the
other methods. We conducted pathway enrichment analysis on the
downstream target genes of the receptors using the KEGG and Reactome
databases. The significance of pathway enrichment was gauged based on
the adjusted q-values acquired from the enrichment analysis. According
to the outcomes presented in Fig. [118]5b, it was found that the
downstream genes identified by SpaCcLink possess a stronger capacity to
embrace receptor-related biological processes or pathways.
Fig. 5.
[119]Fig. 5
[120]Open in a new tab
Comparison of the performance of SpaCcLink with NicheNet, CellCall,
SpaTalk, CellChat Giotto, and COMMOT. a The Venn diagram depicts the
overlap of prior LR databases employed by different methods. b
Performance comparison of SpaCcLink with other methods (NicheNet,
CytoTalk, and CellCall) in inferring downstream signaling on human
melanoma, human breast cancer, and mouse brain datasets. p-values were
utilized to signify the significance of enriched pathways or biological
processes in the KEGG and Reactome databases. c The overlap rate of
cell–cell communication predicted between different methods
Furthermore, we also contrasted the performance of other cell
communication methods in inferring LR interactions, such as CellChat,
Giotto, COMMOT, and SpaTalk. Among these methods, only CellChat does
not make use of spatial information. Giotto and SpaTalk depend on
K-nearest neighbor techniques to infer cell communication with adjacent
neighbors, whereas COMMOT deduces cell communication within a specific
range. We applied a unified prior LR database for all these methods.
Although CellChat inferred the majority of the communications, it had a
restricted overlap with other methods. In contrast, while SpaCcLink
inferred a relatively smaller number of LR pairs, it displayed a larger
overlap with other methods, especially with COMMOT (as shown in
Fig. [121]5c. and Additional file [122]1: Fig. S2). In the human breast
cancer dataset, it was noted that SpaCcLink had a lower overlap rate in
predicting LR interactions in comparison to SpaTalk. This is because
SpaTalk’s predictions mostly overlapped with CellChat, while there was
less coincidence with other methods that take spatial information into
account. Hence, this implies that SpaCcLink demonstrates higher
reliability in inferring LR interactions, as it exhibits a greater
overlap with other methods. This indicates that the predictions from
SpaCcLink might be more dependable, offering valuable perspectives for
understanding cell–cell communication.
Conclusions
Cell–cell communication not only entails the binding of ligands and
receptors but also a cascade of downstream signaling reactions
initiated by the receptors. Moreover, cell–cell communication typically
occurs within a specific range, with a higher probability of
communication between adjacent cells. The progress in spatial
transcriptomics has empowered us to directly decipher cell–cell
communication within the cellular microenvironment. However, among the
existing methods for analyzing communication based on spatial
transcriptomic data, only a few take into account the downstream
signaling reactions within cells. Additionally, the methods that
consider intracellular signaling fail to integrate the topological
information of prior gene networks and gene expression, thereby
restricting the comprehensive analysis of cell communication. Here, we
present SpaCcLink, a spatial cell–cell communication approach that
combines the downstream signaling reactions within cells and the
spatial expression relationships. Case studies on three real datasets
(human melanoma, human breast cancer, and mouse brain) illustrate that
SpaCcLink can effectively assist us in identifying ligands and
receptors with significant downstream impact and spatial expression
dependencies, thereby uncovering their downstream intracellular
signaling. Moreover, comparisons with other methods suggest that
SpaCcLink can identify more reliable ligand-receptor relationships and
more efficiently identify pathways associated with biological
processes.
SpaCcLink initially focuses on the receptor genes that are more
pertinent to downstream reactions and utilizes Moran’s I index for
spatial communication analysis to guarantee that the identified
receptor genes possess stronger downstream influence and display
spatial dependence with ligand expression. Experimental investigations
on human melanoma have shown that the downstream signaling patterns
identified by SpaCcLink are in accordance with the communication
patterns. Thus, analyzing the downstream signaling patterns can aid us
in identifying receptor genes that have communication relationships and
strong downstream influence. Analysis results on human breast cancer
imply that SpaCcLink can help us identify receptor genes that are
significantly correlated with patient survival, providing crucial cues
for disease research and treatment.
Furthermore, we have constructed a complete signaling pathway from
receptors to transcription factors and then to target genes. In the
analysis of mouse brain data, SpaCcLink can identify key transcription
factors in the signaling cascade and visualize the entire signal
transduction process between receptors and target genes. By revealing
the complete pathways of intercellular signaling, we can gain insights
into the patterns of interactions between cells, signal amplification
and cooperation mechanisms, leading to a better comprehension of the
functions and regulatory mechanisms of biological systems.
Owing to the consideration of gene networks, SpaCcLink is indeed more
appropriate for scenarios with a larger number of genes. In cases with
fewer genes, its accuracy and reliability might be constrained.
Additionally, the available prior LR databases do not furnish detailed
descriptions of the interaction modes between receptors and ligands.
Consequently, SpaCcLink adopts an optimal transport-based method to
classify LR pairs into short-range interactions and long-range
interactions. This step demands a longer computation time. If future
research can supply additional information regarding the interaction
modes, this step could potentially be simplified or omitted.
Furthermore, with the continuous advancement of single-cell resolution
spatial transcriptomic sequencing technologies, we can anticipate
exploring the complete process of cell communication at a higher
resolution level. This will enable us to achieve a deeper understanding
of the interactions and signal transduction between cells, resulting in
more accurate analysis outcomes.
Methods
Collection of ligand-receptor pairs and gene signal network data
We have successfully collected ligand-receptor pairs from CellTalkDB
[[123]55], CytoTalk, and Connectome [[124]56], including 3324 known
ligand-receptor pairs for human and 2484 known ligand-receptor pairs
for mouse. TF-TG regulatory data are obtained from TRRUST [[125]57],
HTRIdb [[126]58] and RegNetwork [[127]59]. Furthermore, pathways from
receptors to TFs are gathered from KEGG [[128]60] and Reactome
[[129]61].
To construct the gene signal network, we integrate TF-TG regulatory
data and pathways, and only retain the pathways that goes from
receptors to TG, forming a strongly connected graph.
Construction of training data
To identify target genes highly correlated with receptors within the
genes network, we employ triplet loss [[130]62] for model training. The
triplet loss optimizes the model by triplets, i.e., < anchor, positive,
negative > . Specifically, it brings the anchor and the positive as
close as possible within the manifold space, while pushing the anchor
and the negative as far apart as possible. Additionally, it ensures
that different domains, which are composed of similar samples, remain
as separate as possible. This enables our model to effectively learn
the gene interaction relationships and distinguish between interacting
and non-interacting gene relationships.
Pearson correlation coefficients for all associations in the gene
network are calculated to construct triplets. A default threshold of 5%
(which is learnable) is utilized to identify highly correlated links as
positive samples, denoted as < anchor, positive > . The gene set not
associated with the anchor is taken as negative samples, denoted
as < anchor, negative > . Highly related gene pairs and unrelated gene
pairs together form triplets, such as
[MATH:
a,gp,gn> :MATH]
, where
[MATH:
a,gp>∈P :MATH]
,
[MATH:
a,gn>∈N :MATH]
,
[MATH: ga :MATH]
refers to anchor,
[MATH: gp :MATH]
refers to positive,
[MATH: gn :MATH]
refers to negative. In this manner, the trained model is capable of
identifying gene relationships with stronger correlations within the
collected gene network.
Model architecture
To better identify target genes that are more relevant to the receptor
in terms of the gene signaling network structure and expression
features, we employ a graph attention network (GAT) model to extract
the latent characteristics of gene expression.
We use log-normalized expression data
[MATH:
X∈Rm
×n :MATH]
and gene interaction network
[MATH:
A∈Rm
×m :MATH]
as inputs, where n and m denote the number of cells and genes,
respectively. The model comprises a two-layer graph attention network.
In each layer of GAT, each node is adaptively weighted based on the
feature information of its neighboring nodes to obtain a comprehensive
embedded representation
[MATH: Z :MATH]
.The embedded representation of the gene
[MATH: i :MATH]
at layer k (
[MATH: Zi(k) :MATH]
) can be defined as follows:
[MATH: Zi(k)=σ(∑j∈
NiαijkW
kZj(k-1)) :MATH]
1
[MATH: eij=LeakReLU(a→[WZi‖WZj])
:MATH]
2
[MATH: αij=softmaxj(eij)=exp(eij)∑k∈Niexp(eik) :MATH]
3
where
[MATH: Wk :MATH]
is the weight parameter matrix of the k layer,
[MATH: Ni :MATH]
represents the neighbor of gene i in the signal network,
[MATH: αijk :MATH]
is the correlation coefficient between genes i and j,
[MATH: ‖ :MATH]
is concatenation operation, and
[MATH: a→ :MATH]
is the trainable weight vector. Consequently, each gene is capable of
integrating information from neighboring genes, thereby facilitating a
more accurate capture of the interactions between genes.
Loss function
For a given triplet
[MATH:
<ga,
gp,gn> :MATH]
, the optimization objective of the model adopts triplet loss:
[MATH: L=∑(a,p,n)∈Kmax{d(a,p)-d(a,n)+α,0<
mo stretchy="false">} :MATH]
4
where
[MATH: K :MATH]
is the triplet list;
[MATH: d(a,p) :MATH]
is the absolute value of cosine distance between
[MATH: za :MATH]
and
[MATH: zp :MATH]
;
[MATH: α :MATH]
is the margin parameter. The margin
[MATH: α :MATH]
controls the difference in manifold space between interacting pairs and
non-interacting pairs. Cosine similarity is used to measure the
correlation between gene pairs in the manifold space.
Computation of intracellular influence score
By taking into account the expression of target genes that are highly
correlated with the receptor, we assess the downstream influence of the
receptor
[MATH: r :MATH]
in the cell
[MATH: i :MATH]
using the following definitions:
[MATH:
IntraScoreri=∑Tg∈
mo>TrPE<
msubsup>Mri·
PEMTgi·
|corrr-Tg|/Tr :MATH]
5
where
[MATH: Tr :MATH]
is the set of target genes highly correlated with the receptor
[MATH: r :MATH]
; and
[MATH:
corrr-Tg :MATH]
is the absolute value of cosine similarity between the latent feature
[MATH: Zr :MATH]
and
[MATH: ZTg :MATH]
;
[MATH: ∗
:MATH]
is the cardinality of the set;
[MATH:
PEMji :MATH]
measures the expression specificity of gene
[MATH: j :MATH]
in cell
[MATH: i :MATH]
.The definition of
[MATH:
PEMji :MATH]
is as follows:
[MATH:
PEMji=log2Exprji<
/msubsup>eiA :MATH]
6
[MATH:
eiA=∑m=1MMeanjm·
s∗A
mmultiscripts>∑m=1<
/mn>Ms∗m :MATH]
7
where
[MATH:
Exprj
i :MATH]
is the expression of gene
[MATH: j :MATH]
in cell
[MATH: i :MATH]
;
[MATH: eiA
:MATH]
is the expected expression of gene
[MATH: i :MATH]
in cell type
[MATH: A :MATH]
under the null distribution across all
[MATH: M :MATH]
cell types;
[MATH:
Meanj
m :MATH]
is the average expression of gene
[MATH: j :MATH]
in cell type
[MATH: m :MATH]
;
[MATH:
s∗A
:MATH]
is the total expression of all genes in cell type A. Thus, the
intracellular score takes into account the expression specificity and
association between the receptor and target genes.
Division of the type of ligand-receptor pairs
Cell–cell interaction typically occurs only within a specific range.
Based on the interaction distance, cell–cell interaction can be
categorized into four types: autocrine, paracrine, juxtacrine, and
endocrine. However, it is challenging to assess endocrinology in local
tissue, and the interaction range between autocrine and juxtacrine is
relatively short and difficult to distinguish. Moreover, the physical
distance between cells impacts signal diffusion and intensity, and
different interaction types need to be evaluated at appropriate spatial
scales. The curated LR data that we have collected does not provide
information about the types of interactions. Inspired by previous
research [[131]63], we employ the Wasserstein distance to partition the
types of interactions for the ligand-receptor pairs. The Wasserstein
distance is a metric for quantifying the difference between two
probability distributions. It is founded on optimal transport theory
and computes the minimum cost between two distributions by
contemplating how to transform one distribution into another.
In the classification of ligand-receptor interaction types, the
Wasserstein distance is utilized to measure the minimum transportation
cost between the distributions of ligand expression and receptor
expression. By supposing that the ligand genes are expressed in m cells
and the receptor genes are expressed in n cells, and using the
Euclidean distance matrix between cells as the cost matrix
[MATH: D∈Rm×
n :MATH]
, the Wasserstein distance from the ligand expression distribution to
the receptor expression distribution can be calculated by resolving the
optimal transportation cost problem as follows:
[MATH:
γ′=<
mrow>argminγ∈Γ(L,R)<γ,D> :MATH]
8
And the corresponding Wasserstein distance is obtained as follows.
[MATH: W(L,R)=minγ∈Γ(L,R)<γ,D>=<γ
′,D> :MATH]
9
We then conduct multiple random permutations (with a default of 200
permutations) on the expression distributions of the ligand and
receptor. We recalculate the Wasserstein distance and utilize the
d_ratio (the ratio of the actual distance to the mean of the multiple
perturbations) and the two-sided significance p-value to classify the
types of interactions.
After sorting lr pairs based on d_ratio and p_value, we select the lr
pairs that satisfy the condition d_ratio < 1 and p_value < 0.05 from
the top 10% as short-range interaction pairs. We then select the lr
pairs that satisfy the condition d_ratio > 1 and p_value < 0.05 from
the bottom 10% as long-range interaction pairs, and the remaining LR
pairs are categorized as medium-range interaction pairs.
Spatial communication analysis
To identify ligand-receptor pairs with spatial dependence, we calculate
the cross-Moran’s I index for all interaction pairs and obtain
significant candidate pairs through a permutation test (with a default
of 500 permutations). In the permutation test, we randomly shuffle the
locations of all spots and assess statistical significance by comparing
the observed cross-Moran’s I index to the likelihood of obtaining such
an extremely positive value under the permutation-based random model.
Furthermore, we consider that the communication relationship with a
p-value less than 0.05 is significant.
[MATH: Cross-Moran′I=N
∑iN∑jNW<
/mi>ij<
msubsup>∑iN<
mo>∑jNWij(xi-x¯)(yj-y¯)∑<
/mo>iN(xi-x¯)2∑jN(yj-y¯)2
mfrac> :MATH]
10
where
[MATH: xi :MATH]
represents the expression of ligand
[MATH: x :MATH]
in cell
[MATH: i :MATH]
, while
[MATH: yj :MATH]
represents the expression of receptor gene
[MATH: y :MATH]
in cell
[MATH: j :MATH]
; N is the number of cells;
[MATH: Wij :MATH]
is the distance weight of cells i and j, which is calculated based on
the k-nearest neighbor method. The exact choice of k should reflect the
data. For all data sets analyzed in this study, which are derived from
the 10X Visium platform, according to Squidpy [[132]64], we set k = 6
for short-range interactions and multiplied on the medium-range and
long-range upper parameters to capture different ranges of
interactions. For other data sets, users can visualize the weight
matrix for different parameters and choose the most appropriate scale
for their data.
We further use local cross-Moran’s I index to identify important local
interaction regions for each LR pairs. The local cross-Moran’s I for
spot i is defined as:
[MATH: LocalCross-Moran′<
/msup>Ii=∑j
NWij[(xi-x¯)(yj-y¯)+(yi-y¯)(xj-x¯)]∑<
/mo>iN(xi-x¯)2∑jN(yj-y¯)2
mfrac> :MATH]
11
Similarly, we utilize permutation testing for conducting a significance
test (with a default of 500 permutations). After identifying reliable
communication relationships, we further quantify the cell–cell
communication strength between cell
[MATH: i :MATH]
and cell
[MATH: j :MATH]
for LR pair
[MATH: l-r :MATH]
using the following definition:
[MATH: CCIScorelrij=x
mi>li·x<
/mi>rj·IntraScorerj·(1-dijdthre) :MATH]
12
where
[MATH: xli
:MATH]
is the expression of gene
[MATH: l :MATH]
in cell
[MATH: i :MATH]
;
[MATH: dij :MATH]
is the distance between cell
[MATH: i :MATH]
and
[MATH: j :MATH]
;
[MATH: dthre :MATH]
is the median distance among all local interaction hotspots of the
interaction pair.
Analysis of communication between cell types
To identify ligand-receptor pairs with significant spatial
communication between clusters, we calculate the cross-Moran’s I index
for the interaction pair (ligand
[MATH: x :MATH]
and receptor
[MATH: y :MATH]
) between cell type A and cell type B as follows:
[MATH: CelltypesCross-Moran′<
/msup>I=
N+M2∑
i∈A∑j
∈BWij<
msub>∑i∈A
∑j∈BW
ij(xi-x¯)(yj-y¯)∑i∈A(xi-x¯)2∑j∈B<
mrow>(yj-y¯)2
mfrac> :MATH]
13
where N and M are the number of cells belonging to cell types A and B,
respectively. And we also permute the locations of spots within each
cluster for significance test (with a default of 500 permutations).
After filtering out LR pairs with significant communication
relationships between different cell types through cross-Moran index,
we can further calculate cell–cell communication scores between
different cell types for the LR pair
[MATH: l-r :MATH]
.The cell-type-level signaling strength is quantified as.
[MATH: CelltypeCCIScorel
rAB=∑(k,l)∈IABCCI
ScorelrklIAB
:MATH]
14
where
[MATH: IAB={(k,l)|k∈SA
msub>,l∈SB} :MATH]
,
[MATH: SA :MATH]
is the cluster A.
Identification of downstream key transcription factors and signaling pathways
After identifying the reliable ligand-receptor relationship, we further
identify the signals affected by the ligand-receptor. First, we utilize
the inter-gene co-expression coefficient to further screen the
receptor-target gene relationship for each cell type (with a threshold
of 0.1) and then employ the Fisher’s test to identify the activity
fraction of each transcription factor. The activity score of TF k in
the cell type
[MATH: t :MATH]
is calculated by Fisher’s exact test as follows:
[MATH: P=a+bac+dca+b+c+da+c :MATH]
15
where
[MATH: a :MATH]
is the number of the set of target genes that are highly associated
with the receptor
[MATH: r :MATH]
and TF
[MATH: k :MATH]
;
[MATH: b :MATH]
is the number of the set of target genes that receptor r can act on
through the prior gene network excluding
[MATH: a :MATH]
;
[MATH: c :MATH]
is the number of the set of target genes that is highly correlated with
receptor
[MATH: r :MATH]
but not associated with TF
[MATH: k :MATH]
;
[MATH: d :MATH]
is the number of all genes excluding a, b, and c.
After identifying the key transcription factors, we search for the
optimal signal pathway from the receptor to the target gene based on
the co-expression coefficient between the genes.
Comparison with other methods
In our experiments, we conducted a comparison of our method against
CellChat, COMMOT, SpaTalk, and Giotto using three real datasets. For
each of the methods, our collected data was utilized as the prior
ligand-receptor (LR) database, and default parameters were employed to
predict LR pairs. We regarded the interaction pairs between clusters
with a significant communication p-value of less than 0.05 as being of
importance.
In addition to the aforementioned methods, we also made a comparison
with methods (SpaTalk, CellCall, NicheNet) that take downstream
signaling into account. Given the complexity of their prior databases,
we opted to use their original databases. All methods were implemented
with default parameters. Inspired by SpaTalk, we employed Fisher’s
exact test to compare the significance of predicted downstream
signaling pathways involved in KEGG and Reactome among different
methods.
Supplementary Information
[133]12915_2025_2141_MOESM1_ESM.docx^ (547.4KB, docx)
Additional file 1: Supplementary figures S1-S2.
Acknowledgements