Abstract

Background

   Cellular communication is vital for the proper functioning of
   multicellular organisms. A comprehensive analysis of cellular
   communication demands the consideration not only of the binding between
   ligands and receptors but also of a series of downstream signal
   transduction reactions within cells. Thanks to the advancements in
   spatial transcriptomics technology, we are now able to better decipher
   the process of cellular communication within the cellular
   microenvironment. Nevertheless, the majority of existing spatial
   cell–cell communication algorithms fail to take into account the
   downstream signals within cells.

Results

   In this study, we put forward SpaCcLink, a cell–cell communication
   analysis method that takes into account the downstream influence of
   individual receptors within cells and systematically investigates the
   spatial patterns of communication as well as downstream signal
   networks. Analyses conducted on real datasets derived from humans and
   mice have demonstrated that SpaCcLink can help in identifying more
   relevant ligands and receptors, thereby enabling us to systematically
   decode the downstream genes and signaling pathways that are influenced
   by cell–cell communication. Comparisons with other methods suggest that
   SpaCcLink can identify downstream genes that are more closely
   associated with biological processes and can also discover reliable
   ligand-receptor relationships.

Conclusions

   By means of SpaCcLink, a more profound and all-encompassing
   comprehension of the mechanisms underlying cellular communication can
   be achieved, which in turn promotes and deepens our understanding of
   the intricate complexity within organisms.

Supplementary Information

   The online version contains supplementary material available at
   10.1186/s12915-025-02141-x.

   Keywords: Spatial transcriptome, Cell–cell communication, Downstream
   pathways, Communication patterns

Background

   In multicellular organisms, cell–cell communication is of fundamental
   importance for sustaining the harmonious and coordinated operation of
   tissues, organs, or systems. Cellular communication empowers cells to
   identify one another, interchange information, and collaborate in a
   synchronized manner to fulfill a diverse array of biological tasks,
   such as cell differentiation, proliferation, development, and the
   orchestration of immune responses [[32]1, [33]2]. Thus, the dissection
   of cell–cell interactions holds utmost significance in comprehensively
   grasping the structure, functionality, and overall “landscape” of
   biological systems. It offers crucial revelations into the elaborate
   network of intercellular communication and coordination and also plays
   a pivotal part in deciphering disease mechanisms and guiding
   therapeutic strategies.

   Cell–cell interactions transpire via a multiplicity of mechanisms,
   encompassing autocrine, paracrine, juxtacrine, and endocrine signaling
   [[34]3, [35]4]. Autocrine and juxtacrine interactions entail
   short-range effects, whereas paracrine and endocrine interactions can
   exert their influence over more extended distances. Within tissues,
   signaling cells can attach to receptors on recipient cells by means of
   specific ligands, thereby instigating a sequence of signal transduction
   reactions. These signal transduction reactions are capable of either
   activating or suppressing particular signaling pathways, ultimately
   resulting in alterations in the activity of transcription factors and
   the expression of target genes [[36]5–[37]8]. Subsequently, these
   modifications in transcription factor activity and target gene
   expression further exert an impact on the functionality and
   physiological processes of the cells.

   In recent times, remarkable headway has been achieved in deciphering
   cell–cell communication from single-cell transcriptome data,
   attributable to the ceaseless progression of single-cell transcriptomic
   sequencing (scRNA-seq) technology [[38]9–[39]12]. A number of methods
   have been devised to analyze and fathom cell–cell communication. These
   methods can be generally classified into two categories of algorithms:
   those involved in deducing intercellular communication between diverse
   cell types [[40]13–[41]16] and those considering intracellular
   signaling [[42]17–[43]20]. The algorithms utilized for inferring
   intercellular signaling depend on pre-existing databases and
   communication scores to single out significant ligand-receptor pairs.
   Moreover, the methods for considering intracellular signaling can be
   further partitioned into two classifications: ones based on established
   signal pathway annotations [[44]17, [45]18, [46]20] and ones based on
   de novo inference [[47]19]. Notwithstanding the relative maturity of
   these methods founded on scRNA-seq data, there remains a necessity to
   further probe into how spatial information can be harnessed in the
   exploration of cell–cell communication. This is because cell–cell
   communication customarily takes place within a specific range, and
   incorporating spatial distance information can assist in alleviating
   potential false-positive outcomes that might emerge from non-spatial
   communication methods [[48]21].

   Spatial transcriptome technology offers a powerful tool for
   investigating cell–cell communication within the tissue
   microenvironment [[49]22, [50]23]. It enables us to obtain a more
   comprehensive understanding of cell–cell interactions, uncover the
   spatial dependence of such communication in tissues, and further deepen
   our insights into cell functions and regulatory mechanisms in the
   tissue microenvironment. Recently, multiple methods have been developed
   to study spatial cell–cell communication based on spatial transcriptome
   data. For instance, SpaTalk [[51]24], Giotto [[52]25], and Niches
   [[53]26] are designed to identify neighborhood communication. SpaCI
   [[54]27] and GCNG [[55]28] leverage graph convolutional networks to
   detect adjacent potential communication relationships. HoloNet [[56]29]
   employs a graph network model to analyze the impact of communication on
   target gene expression, while COMMOT [[57]30] utilizes the collective
   optimal transmission method to analyze the competitive relationships in
   communication.

   Although there are some methods that do take downstream signaling into
   consideration in the context of spatial communication, the current ones
   [[58]24, [59]29–[60]32] fail to make full use of gene networks and gene
   expression information when dealing with downstream signaling.
   Traditional non-spatial communication inference methods also have the
   same limitations since they do not adequately incorporate gene
   expression information and the pathways of signal transduction. In the
   study of cellular communication and regulation, taking the downstream
   signaling process within the cell into account can help us understand
   the regulatory mechanism of cell communication in a more comprehensive
   manner [[61]33].

   Consequently, in order to conduct a more precise analysis of the
   downstream functions influenced by cell–cell communication and identify
   the corresponding spatial communication patterns, we have put forward
   SpaCcLink, a spatial cellular communication analysis method that takes
   into account the downstream signals of individual cells and
   systematically explores both the spatial cellular communication
   patterns and downstream signal networks. Leveraging the prior gene
   network, the graph attention network is employed to identify distinct
   groups of target genes that are highly correlated with the receptors.
   By combining the specific expressions of these target genes, the
   activity fraction of receptors in individual cells can be quantified to
   analyze the communication patterns. Additionally, the Cross Moran index
   is incorporated to identify spatially dependent signal molecules.

   To assess the performance of SpaCcLink, we compare it with other
   methods using real datasets. The results reveal that the downstream
   signals identified by our model are of greater significance. Moreover,
   the receptor pairs identified through the Cross Moran index also
   demonstrate strong reliability. In summary, SpaCcLink represents a
   novel approach for the downstream functional analysis of spatial
   cell–cell communication. It facilitates the identification of key
   signaling molecules and communication patterns, thereby contributing to
   a deeper understanding of cell localization and function within
   tissues.

Results and discussion

Overview of SpaCcLink

   The schematics of SpaCcLink are shown in Fig. [62]1. Firstly, the
   downstream influence of each receptor within individual cells is taken
   into account. A graph attention network model is employed to pinpoint a
   group of target genes that exhibit a high degree of correlation with
   the receptor (Fig. [63]1a). Subsequently, the downstream influence
   scores of each receptor within every cell are derived by integrating
   the gene expression specificity scores (Fig. [64]1c).

Fig. 1.

   [65]Fig. 1
   [66]Open in a new tab

   Overview of SpaCcLink. a Identification of downstream target genes
   associated with the receptor within gene networks. The gene network
   comprises a comprehensive signaling pathway commencing from the ligand
   (L), proceeding through the receptor (R), intermediates (M), impinging
   on transcription factors (TF), and ultimately arriving at the target
   genes (TG). b Recognition of spatially dependent ligand-receptor pairs.
   Initially, the interaction range of ligand-receptor pairs is
   partitioned using the Wasserstein distance. Subsequently,
   ligand-receptor pairs with both global and local spatial dependencies
   are identified based on the cross-Moran’s index. c Statistical analysis
   of intercellular communication. Through the integration of
   gene-specific expression levels and receptor-target gene interactions,
   the downstream impact of receptors is precisely quantified. d
   Visualization of the downstream analysis of spatial resolution
   cell–cell communication

   Secondly, the Wasserstein distance is calculated for all
   ligand-receptor pairs. Given that the existing prior ligand-receptor
   (LR) databases fail to offer detailed accounts of the interaction
   modalities between receptors and ligands, SpaCcLink resorts to an
   optimal transport-based approach to categorize LR pairs into
   short-range and long-range interactions. By making use of d_ratio and
   p-values (as detailed in the “ [67]Methods” section), the interaction
   pairs are sorted into significant short-range, significant long-range,
   and the default medium-range classifications. Grounded on this
   classification, cross-Moran’s I index is applied to single out
   ligand-receptor pairs with conspicuous spatial dependencies (as
   illustrated in Fig. [68]1b and Fig. [69]1c). These two sequential steps
   collaboratively facilitate the identification of ligand-receptor pairs
   that exhibit both remarkable spatial dependencies and powerful
   downstream effects.

   Finally, a downstream analysis is carried out on the identified
   ligand-receptor pairs (Fig. [70]1c). This analysis encompasses several
   key components: decomposing the receptor impact scores through
   non-negative matrix factorization (NMF) to obtain corresponding
   communication patterns, dissecting the intercellular communication
   among different cell types, identifying the downstream target genes
   that are primarily affected by the ligand-receptor pairs with the aid
   of Sankey diagrams, conducting Fisher’s exact test to analyze the
   crucial transcription factors and visualizing the entire signal
   transduction pathways from receptors to target genes.

Identification of communication patterns and significant LR pairs in human
melanoma data

   We initially applied our methodology to human melanoma data derived
   from the ST platform. This dataset encompasses 293 spots and comprises
   7 distinct cell types (as depicted in Fig. [71]2a.). We computed the
   downstream influence scores for each receptor and integrated them with
   spatial communication analysis. This enabled us to identify receptors
   that possess both significant downstream influence and conspicuous
   spatial dependence on ligands. Through the execution of standard NMF on
   the downstream influence scores of receptors, two patterns of
   downstream signaling communication within the melanoma data were
   obtained (shown in Fig. [72]2b). It is evident that Pattern 0
   predominantly aligns with the distribution area of melanoma cells,
   whereas Pattern 1 mainly corresponds to B cell regions and certain
   boundary zones between cancer-associated fibroblasts (CAF) and melanoma
   (as illustrated in Fig. [73]2a and b). To further validate the efficacy
   of identifying communication patterns via the downstream signaling
   scores of receptors, we carried out supplementary dimensionality
   reduction and Leiden clustering on the outcomes obtained through the
   traditional scoring approach (see the “ [74]Methods” section). By
   regulating the resolution, we acquired results that were highly
   congruent with the communication patterns identified by the downstream
   scores (as presented in Fig. [75]2c).

Fig. 2.

   [76]Fig. 2
   [77]Open in a new tab

   Case study illustrating the application of SpaCcLink in human melanoma
   data. a The cellular type of each spot within the human melanoma data.
   b Identification of patterns pertaining to the downstream influence of
   receptors. c Dimensionality reduction and clustering of cell-cell
   communication scores. d Enrichment of top receptors in the loading of
   downstream communication patterns. e Communication hotspot plot between
   the ligand CXCL12 and the receptor CXCR4. f Circle plot depicting the
   intercellular communication involving the ligand CXCL12 and the
   receptor CXCR4 between different cell types. g Pathway enrichment
   analysis for the top 20 ranked receptor genes in pattern1 based on
   loading scores

   We subsequently acquired the top genes for each pattern by relying on
   the decomposed H-matrix (Fig. [78]2d). This matrix delineates the
   significance or influence of each feature within the original data,
   principally aiming to gauge the extent of contribution to the pattern.
   In pattern 1, the genes that rank at the top and exhibit high
   specificity are ALOX5, CD79A, and CXCR12. ALOX5 pertains to the human
   lipoxygenases family and is robustly correlated with immune cell
   populations [[79]34, [80]35].

   It orchestrates the signaling pathways or immune responses that impact
   these cell populations. Previous investigations have unearthed a
   correlation between ALOX5 and the prognosis of melanoma [[81]36,
   [82]37]. The expression of CD79a is highly specific to B cells, making
   it one of the characteristic molecules of B cells. The CD79a/b
   heterodimer formed by the binding of CD79a and CD79b plays a crucial
   role in the normal functioning of B cells, including signal
   transduction, cell development, and survival [[83]38, [84]39]. The
   expression distribution of CXC12 and CXCR4 manifested an evident
   spatial dependence (Fig. [85]2e), which intimated that CXCR4
   predominantly received signals from CXCL12. The communication between
   CXC12 and CXCR4 is chiefly concentrated between B cells and B cells as
   well as between B cells and CAF (Fig. [86]2f). The interaction between
   CXCL12 and CXCR4 is highly expressed in B cells and is postulated to
   play a pivotal role in cell migration and proliferation [[87]40]. These
   processes are intimately intertwined with cancer progression and
   metastasis, thereby designating CXCL12 and CXCR4 as potential targets
   for the treatment of skin malignancies, especially melanoma [[88]41,
   [89]42]. In accordance with the GOBP (Gene Ontology Biological Process)
   enrichment analysis carried out on the top 20 genes in pattern 1, the
   outcomes signified a substantial association between pattern 1 and
   biological processes such as leukocyte cell adhesion regulation,
   cellular activation, and inflammatory response (Fig. [90]2g). These
   discoveries imply that pattern 1 might be implicated in the modulation
   of the immune system and inflammatory processes. This holds significant
   ramifications for further research into immune cell function and the
   disease mechanisms of melanoma.

Identification of prognostic receptor genes and downstream signals in human
breast cancer data

   Subsequently, we applied our methodology to human breast cancer data
   acquired from the 10X Visium platform. This dataset comprises 3798
   spots and encompasses 9 cell types, including four malignant cell
   types, namely basal_like_1, basal_like_2, luminal_AR, and mesenchymal
   cells (as shown in Fig. [91]3a). In a similar vein, we initially
   conducted pattern recognition on the downstream signal scores of
   receptor genes that manifested significant effects and exhibited clear
   spatial dependency on ligands. Through this process, we identified four
   communication patterns, among which patterns 1–3 display distinct
   regional specificities. Pattern 1 is mainly associated with the area of
   immune cells; pattern 2 is predominantly concentrated between the
   malignant cell type basal_like_1 and stroma cells, potentially being
   related to the invasiveness and aggressiveness of the tumor; pattern 3
   is mostly clustered in the area of malignant cells (illustrated in
   Fig. [92]3b). We then visualized the top 10 receptor genes with high
   load scores in pattern 1. Among these genes, CD247, CD3D, and CD48
   demonstrated the regional specificity of pattern 1. Utilizing TCGA
   human breast cancer data, we carried out survival analysis on the top
   10 receptor genes. The expression of specific receptor genes in pattern
   1 significantly influenced the survival time of patients, whereas
   non-specific genes showed no significant correlation (Additional file
   [93]1: Fig. S1).

Fig. 3.

   [94]Fig. 3
   [95]Open in a new tab

   Case study on the application of SpaCcLink in human breast cancer data.
   a The cellular type of each spot within the human breast cancer data. b
   Identification of the downstream impact patterns of receptors
   exhibiting regional heterogeneity. c Enrichment Analysis of top
   receptors in the loading of downstream communication patterns. d
   Communication of HLA_B-CD3D at the cell-type level. e The survival
   analysis curves for the specific genes CD247, CD3D, and CD48 were
   constructed using the TCGA Human Breast Cancer Dataset. f Sankey
   diagram of cellular communication signal from basal_like_1 to
   macrophages. g Pathway enrichment of downstream signaling genes of CD3D

   Next, we analyzed the communication relationship between CD3D and its
   spatially dependent ligand HLA-B across all cell types. The
   communication between HLA-B and CD3D primarily takes place within
   basal_like_1 cells and between basal_like_1 cells and macrophage cells.
   In our analysis, we discovered that the high communication score within
   basal_like_1 is mainly attributable to its relatively larger cell
   population. Consequently, we focused on the signaling process from
   basal_like_1 to macrophages. The signaling from basal_like_1 to
   macrophages is primarily mediated by ligand HLA-C and ligand HLA-B.
   These molecules act as signaling initiators and interact with receptor
   molecules CD3D and LILRB2 on the cell surface.

   As signal receivers, CD3D and LILRB2 receive the signals from
   basal_like_1 cells and mainly transmit them to downstream target genes
   UBAP2L, VAMP3, and VCAN (shown in Fig. [96]3f). UBAP2L is associated
   with macrophage polarization [[97]43]. VAMP3 plays a vital role in
   macrophages by regulating processes such as adhesion, spreading, and
   sustained migration, thereby having a significant impact on their
   function [[98]44, [99]45]. Moreover, VAMP3 is involved in membrane
   transport pathways related to cytokine secretion and pathogen
   engulfment [[100]46]. VCAN is widely expressed during immune
   inflammation processes and has been found to regulate macrophage
   activation and function [[101]47, [102]48].

   We performed Gene Ontology Biological Process enrichment analysis on
   the genes involved in the downstream signaling of CD3D and found that
   they were mainly enriched in the following functions: immune
   response-activating cell surface receptor signaling pathway, cellular
   response to cytokine stimulus, and cytokine signaling in the immune
   system (as presented in Fig. [103]3g). These results imply that CD3D
   may play a significant role in immune response and cytokine signaling.
   The study of these signaling pathways is essential for understanding
   the interactions between cancer cells and immune cells as well as their
   crucial roles in the development of breast cancer and the configuration
   of the tumor microenvironment.

Identification of key TFs and complete downstream signaling pathway in mouse
brain data

   We also applied our methodology to mouse brain data obtained from the
   10X Visium platform. This dataset covered 2688 spots and encompassed 15
   cell types (as shown in Fig. [104]4a). In a similar manner, we
   initially carried out pattern recognition on the downstream effect
   scores of receptors. Four patterns were identified, which were closely
   related to cell type distributions. Pattern 0 was predominantly
   clustered in the mouse cortex and pyramidal layer regions (as
   illustrated in Fig. [105]4a and b). Subsequently, we visualized the top
   5 genes with high loading scores for each pattern. In pattern 0, the
   specific genes were identified as Robo2, Cckbr, and Cnr1 (as depicted
   in Fig. [106]4d). For each specific gene and its ligands that exhibited
   significant spatial dependence, we conducted a cell-type communication
   network analysis (as shown in Fig. [107]4c). Cckbr and its ligand CcK
   primarily communicate within Cortex_3 cells and between Cortex_1 and
   Cortex_3, which is in line with previous studies [[108]49].
   Cholecystokinin (CCK) is a peptide hormone and one of the most abundant
   neuropeptides in the vertebrate brain. CCKBR plays a preponderant role
   in the central nervous system, mainly in the neocortex and limbic
   structures. CCK has been recognized as a central modulator in neuronal
   circuits, and along with its receptors, it participates in the
   regulation of neural processes such as feeding, memory, pain
   perception, and exploratory behavior [[109]50].

Fig. 4.

   [110]Fig. 4
   [111]Open in a new tab

   Case study on the application of SpaCcLink in mouse brain data. a The
   cell type of each spot in the mouse brain data. b Identification of the
   downstream impact patterns of receptors exhibiting regional
   heterogeneity. c Inter-cell type communication heatmap of Cck-Cckbr. d
   Enrichment of top receptors in the loading of downstream communication
   patterns. e Downstream signal path of Cckbr. Green symbolizes the
   receptor, gray stands for the mediator, orange represents TF, and blue
   indicates the target gene. f Pathway enrichment of downstream signaling
   genes of Cckbr

   Next, we employed Fisher’s exact test to identify the downstream TFs
   activated by each receptor gene. Subsequently, we visualized the
   complete signal transduction network from the receptor to the
   transcription factors and further to the target genes (as shown in
   Fig. [112]4e).

   FOXO1 and FOXO3 belong to the FOXO family, and they play crucial roles
   in neural development, cell survival, and metabolic regulation
   [[113]51]. Trp53 plays a vital part in the nervous system, regulating
   the process of apoptosis [[114]52]. NFKB1 regulates inflammatory and
   immune responses in neuronal cells, and its involvement is associated
   with the development of neurological disorders and cognitive
   impairments [[115]53, [116]54]. By conducting GOBP enrichment analysis
   on downstream genes, we can further confirm the association of the
   identified functional transcription factors with cellular responses to
   organonitrogen compounds, responses to peptides, and behavior-related
   processes. Identifying key transcription factors and studying the
   complete signaling pathways can offer a profound understanding of the
   functionality and regulatory mechanisms of the mouse brain. This
   contributes to deciphering the interactions between cells, the
   organization and functionality of neural networks, and the mechanisms
   underlying complex physiological processes such as behavior.

Compared with other methods

   We compared SpaCcLink with SpaTalk, NicheNet, and CellCall, assessing
   their performance in deducing the intracellular downstream signaling
   pathways within different cell types. Due to the complex nature of
   inputting prior LR data and pathway data from diverse methods, we
   decided to employ the specific prior data furnished by each method. It
   was discovered that a considerable portion of the LR data we amassed
   and arranged overlapped with the data utilized by other methods
   (Fig. [117]5a). Although NicheNet presented a more extensive range of
   LR pair information, a significant part still did not coincide with the
   other methods. We conducted pathway enrichment analysis on the
   downstream target genes of the receptors using the KEGG and Reactome
   databases. The significance of pathway enrichment was gauged based on
   the adjusted q-values acquired from the enrichment analysis. According
   to the outcomes presented in Fig. [118]5b, it was found that the
   downstream genes identified by SpaCcLink possess a stronger capacity to
   embrace receptor-related biological processes or pathways.

Fig. 5.

   [119]Fig. 5
   [120]Open in a new tab

   Comparison of the performance of SpaCcLink with NicheNet, CellCall,
   SpaTalk, CellChat Giotto, and COMMOT. a The Venn diagram depicts the
   overlap of prior LR databases employed by different methods. b
   Performance comparison of SpaCcLink with other methods (NicheNet,
   CytoTalk, and CellCall) in inferring downstream signaling on human
   melanoma, human breast cancer, and mouse brain datasets. p-values were
   utilized to signify the significance of enriched pathways or biological
   processes in the KEGG and Reactome databases. c The overlap rate of
   cell–cell communication predicted between different methods

   Furthermore, we also contrasted the performance of other cell
   communication methods in inferring LR interactions, such as CellChat,
   Giotto, COMMOT, and SpaTalk. Among these methods, only CellChat does
   not make use of spatial information. Giotto and SpaTalk depend on
   K-nearest neighbor techniques to infer cell communication with adjacent
   neighbors, whereas COMMOT deduces cell communication within a specific
   range. We applied a unified prior LR database for all these methods.
   Although CellChat inferred the majority of the communications, it had a
   restricted overlap with other methods. In contrast, while SpaCcLink
   inferred a relatively smaller number of LR pairs, it displayed a larger
   overlap with other methods, especially with COMMOT (as shown in
   Fig. [121]5c. and Additional file [122]1: Fig. S2). In the human breast
   cancer dataset, it was noted that SpaCcLink had a lower overlap rate in
   predicting LR interactions in comparison to SpaTalk. This is because
   SpaTalk’s predictions mostly overlapped with CellChat, while there was
   less coincidence with other methods that take spatial information into
   account. Hence, this implies that SpaCcLink demonstrates higher
   reliability in inferring LR interactions, as it exhibits a greater
   overlap with other methods. This indicates that the predictions from
   SpaCcLink might be more dependable, offering valuable perspectives for
   understanding cell–cell communication.

Conclusions

   Cell–cell communication not only entails the binding of ligands and
   receptors but also a cascade of downstream signaling reactions
   initiated by the receptors. Moreover, cell–cell communication typically
   occurs within a specific range, with a higher probability of
   communication between adjacent cells. The progress in spatial
   transcriptomics has empowered us to directly decipher cell–cell
   communication within the cellular microenvironment. However, among the
   existing methods for analyzing communication based on spatial
   transcriptomic data, only a few take into account the downstream
   signaling reactions within cells. Additionally, the methods that
   consider intracellular signaling fail to integrate the topological
   information of prior gene networks and gene expression, thereby
   restricting the comprehensive analysis of cell communication. Here, we
   present SpaCcLink, a spatial cell–cell communication approach that
   combines the downstream signaling reactions within cells and the
   spatial expression relationships. Case studies on three real datasets
   (human melanoma, human breast cancer, and mouse brain) illustrate that
   SpaCcLink can effectively assist us in identifying ligands and
   receptors with significant downstream impact and spatial expression
   dependencies, thereby uncovering their downstream intracellular
   signaling. Moreover, comparisons with other methods suggest that
   SpaCcLink can identify more reliable ligand-receptor relationships and
   more efficiently identify pathways associated with biological
   processes.

   SpaCcLink initially focuses on the receptor genes that are more
   pertinent to downstream reactions and utilizes Moran’s I index for
   spatial communication analysis to guarantee that the identified
   receptor genes possess stronger downstream influence and display
   spatial dependence with ligand expression. Experimental investigations
   on human melanoma have shown that the downstream signaling patterns
   identified by SpaCcLink are in accordance with the communication
   patterns. Thus, analyzing the downstream signaling patterns can aid us
   in identifying receptor genes that have communication relationships and
   strong downstream influence. Analysis results on human breast cancer
   imply that SpaCcLink can help us identify receptor genes that are
   significantly correlated with patient survival, providing crucial cues
   for disease research and treatment.

   Furthermore, we have constructed a complete signaling pathway from
   receptors to transcription factors and then to target genes. In the
   analysis of mouse brain data, SpaCcLink can identify key transcription
   factors in the signaling cascade and visualize the entire signal
   transduction process between receptors and target genes. By revealing
   the complete pathways of intercellular signaling, we can gain insights
   into the patterns of interactions between cells, signal amplification
   and cooperation mechanisms, leading to a better comprehension of the
   functions and regulatory mechanisms of biological systems.

   Owing to the consideration of gene networks, SpaCcLink is indeed more
   appropriate for scenarios with a larger number of genes. In cases with
   fewer genes, its accuracy and reliability might be constrained.
   Additionally, the available prior LR databases do not furnish detailed
   descriptions of the interaction modes between receptors and ligands.
   Consequently, SpaCcLink adopts an optimal transport-based method to
   classify LR pairs into short-range interactions and long-range
   interactions. This step demands a longer computation time. If future
   research can supply additional information regarding the interaction
   modes, this step could potentially be simplified or omitted.

   Furthermore, with the continuous advancement of single-cell resolution
   spatial transcriptomic sequencing technologies, we can anticipate
   exploring the complete process of cell communication at a higher
   resolution level. This will enable us to achieve a deeper understanding
   of the interactions and signal transduction between cells, resulting in
   more accurate analysis outcomes.

Methods

Collection of ligand-receptor pairs and gene signal network data

   We have successfully collected ligand-receptor pairs from CellTalkDB
   [[123]55], CytoTalk, and Connectome [[124]56], including 3324 known
   ligand-receptor pairs for human and 2484 known ligand-receptor pairs
   for mouse. TF-TG regulatory data are obtained from TRRUST [[125]57],
   HTRIdb [[126]58] and RegNetwork [[127]59]. Furthermore, pathways from
   receptors to TFs are gathered from KEGG [[128]60] and Reactome
   [[129]61].

   To construct the gene signal network, we integrate TF-TG regulatory
   data and pathways, and only retain the pathways that goes from
   receptors to TG, forming a strongly connected graph.

Construction of training data

   To identify target genes highly correlated with receptors within the
   genes network, we employ triplet loss [[130]62] for model training. The
   triplet loss optimizes the model by triplets, i.e., < anchor, positive,
   negative > . Specifically, it brings the anchor and the positive as
   close as possible within the manifold space, while pushing the anchor
   and the negative as far apart as possible. Additionally, it ensures
   that different domains, which are composed of similar samples, remain
   as separate as possible. This enables our model to effectively learn
   the gene interaction relationships and distinguish between interacting
   and non-interacting gene relationships.

   Pearson correlation coefficients for all associations in the gene
   network are calculated to construct triplets. A default threshold of 5%
   (which is learnable) is utilized to identify highly correlated links as
   positive samples, denoted as < anchor, positive > . The gene set not
   associated with the anchor is taken as negative samples, denoted
   as < anchor, negative > . Highly related gene pairs and unrelated gene
   pairs together form triplets, such as
   [MATH:
   <mrow><msub><mtext><g</mtext><mi>a</mi></msub><msub><mtext>,g</mtext><m
   i>p</mi></msub><msub><mtext>,g</mtext><mi>n</mi></msub><mo>></mo></mrow
   > :MATH]
   , where
   [MATH:
   <mrow><msub><mtext><g</mtext><mi>a</mi></msub><msub><mtext>,g</mtext><m
   i>p</mi></msub><mo>></mo><mo>∈</mo><mi>P</mi></mrow> :MATH]
   ,
   [MATH:
   <mrow><msub><mtext><g</mtext><mi>a</mi></msub><msub><mtext>,g</mtext><m
   i>n</mi></msub><mo>></mo><mo>∈</mo><mi>N</mi></mrow> :MATH]
   ,
   [MATH: <msub><mi>g</mi><mi>a</mi></msub> :MATH]
   refers to anchor,
   [MATH: <msub><mi>g</mi><mi>p</mi></msub> :MATH]
   refers to positive,
   [MATH: <msub><mi>g</mi><mi>n</mi></msub> :MATH]
   refers to negative. In this manner, the trained model is capable of
   identifying gene relationships with stronger correlations within the
   collected gene network.

Model architecture

   To better identify target genes that are more relevant to the receptor
   in terms of the gene signaling network structure and expression
   features, we employ a graph attention network (GAT) model to extract
   the latent characteristics of gene expression.

   We use log-normalized expression data
   [MATH:
   <mrow><mi>X</mi><mo>∈</mo><msup><mrow><mi>R</mi></mrow><mrow><mi>m</mi>
   <mo>×</mo><mi>n</mi></mrow></msup></mrow> :MATH]
   and gene interaction network
   [MATH:
   <mrow><mi>A</mi><mo>∈</mo><msup><mrow><mi>R</mi></mrow><mrow><mi>m</mi>
   <mo>×</mo><mi>m</mi></mrow></msup></mrow> :MATH]
   as inputs, where n and m denote the number of cells and genes,
   respectively. The model comprises a two-layer graph attention network.
   In each layer of GAT, each node is adaptively weighted based on the
   feature information of its neighboring nodes to obtain a comprehensive
   embedded representation
   [MATH: <mi>Z</mi> :MATH]
   .The embedded representation of the gene
   [MATH: <mi>i</mi> :MATH]
   at layer k (
   [MATH: <msubsup><mi>Z</mi><mrow><mi>i</mi></mrow><mrow><mo
   stretchy="false">(</mo><mi>k</mi><mo
   stretchy="false">)</mo></mrow></msubsup> :MATH]
   ) can be defined as follows:
   [MATH: <mrow><msubsup><mi>Z</mi><mi>i</mi><mrow><mo
   stretchy="false">(</mo><mi>k</mi><mo
   stretchy="false">)</mo></mrow></msubsup><mo>=</mo><mi>σ</mi><mspace
   width="0.277778em"></mspace><mrow><mo
   stretchy="false">(</mo><msub><mo>∑</mo><mrow><mi>j</mi><mo>∈</mo><msub>
   <mi>N</mi><mi>i</mi></msub></mrow></msub><msubsup><mi>α</mi><mrow><mi
   mathvariant="italic">ij</mi></mrow><mi>k</mi></msubsup><msub><mi>W</mi>
   <mi>k</mi></msub><msubsup><mi>Z</mi><mi>j</mi><mrow><mo
   stretchy="false">(</mo><mi>k</mi><mo>-</mo><mn>1</mn><mo
   stretchy="false">)</mo></mrow></msubsup><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   1
   [MATH: <mrow><msub><mi>e</mi><mrow><mi
   mathvariant="italic">ij</mi></mrow></msub><mo>=</mo><mi>L</mi><mi>e</mi
   ><mi>a</mi><mi>k</mi><mspace width="0.166667em"></mspace><mrow><mi
   mathvariant="italic">Re</mi></mrow><mspace
   width="0.166667em"></mspace><mi>L</mi><mi>U</mi><mspace
   width="0.166667em"></mspace><mrow><mo stretchy="false">(</mo><mover
   accent="true"><mi>a</mi><mo stretchy="false">→</mo></mover><mrow><mo
   stretchy="false">[</mo><mi>W</mi><msub><mi>Z</mi><mi>i</mi></msub><mo
   stretchy="false">‖</mo><mi>W</mi><msub><mi>Z</mi><mi>j</mi></msub><mo
   stretchy="false">]</mo></mrow><mo stretchy="false">)</mo></mrow></mrow>
   :MATH]
   2
   [MATH: <mrow><msub><mi>α</mi><mrow><mi
   mathvariant="italic">ij</mi></mrow></msub><mo>=</mo><mi>s</mi><mi>o</mi
   ><mi>f</mi><mi>t</mi><msub><mrow><mspace
   width="0.166667em"></mspace><mrow><mi
   mathvariant="italic">max</mi></mrow></mrow><mi>j</mi></msub><mrow><mo
   stretchy="false">(</mo><msub><mi>e</mi><mrow><mi
   mathvariant="italic">ij</mi></mrow></msub><mo
   stretchy="false">)</mo></mrow><mo>=</mo><mfrac><mrow><mi
   mathvariant="italic">exp</mi><mo
   stretchy="false">(</mo><msub><mi>e</mi><mrow><mi
   mathvariant="italic">ij</mi></mrow></msub><mo
   stretchy="false">)</mo></mrow><mrow><msub><mo>∑</mo><mrow><mi>k</mi><mo
   >∈</mo><msub><mi>N</mi><mi>i</mi></msub></mrow></msub><mi
   mathvariant="italic">exp</mi><mrow><mo
   stretchy="false">(</mo><msub><mi>e</mi><mrow><mi
   mathvariant="italic">ik</mi></mrow></msub><mo
   stretchy="false">)</mo></mrow></mrow></mfrac></mrow> :MATH]
   3

   where
   [MATH: <msub><mi>W</mi><mi>k</mi></msub> :MATH]
   is the weight parameter matrix of the k layer,
   [MATH: <msub><mi>N</mi><mi>i</mi></msub> :MATH]
   represents the neighbor of gene i in the signal network,
   [MATH: <msubsup><mi>α</mi><mrow><mi
   mathvariant="italic">ij</mi></mrow><mi>k</mi></msubsup> :MATH]
   is the correlation coefficient between genes i and j,
   [MATH: <mo stretchy="false">‖</mo> :MATH]
   is concatenation operation, and
   [MATH: <mover accent="true"><mi>a</mi><mo
   stretchy="false">→</mo></mover> :MATH]
   is the trainable weight vector. Consequently, each gene is capable of
   integrating information from neighboring genes, thereby facilitating a
   more accurate capture of the interactions between genes.

Loss function

   For a given triplet
   [MATH:
   <mrow><mo><</mo><msub><mtext>g</mtext><mi>a</mi></msub><mtext>,</mtext>
   <mspace
   width="0.333333em"></mspace><msub><mtext>g</mtext><mi>p</mi></msub><msu
   b><mtext>,g</mtext><mi>n</mi></msub><mo>></mo></mrow> :MATH]
   , the optimization objective of the model adopts triplet loss:
   [MATH: <mrow><mi>L</mi><mo>=</mo><msub><mo>∑</mo><mrow><mo
   stretchy="false">(</mo><mi>a</mi><mo>,</mo><mi>p</mi><mo>,</mo><mi>n</m
   i><mo stretchy="false">)</mo><mo>∈</mo><mi
   mathvariant="normal">K</mi></mrow></msub><mi
   mathvariant="italic">max</mi><mrow><mo
   stretchy="false">{</mo><mi>d</mi><mrow><mo
   stretchy="false">(</mo><mi>a</mi><mo>,</mo><mi>p</mi><mo
   stretchy="false">)</mo></mrow><mo>-</mo><mi>d</mi><mrow><mo
   stretchy="false">(</mo><mi>a</mi><mo>,</mo><mi>n</mi><mo
   stretchy="false">)</mo></mrow><mo>+</mo><mi>α</mi><mo>,</mo><mn>0</mn><
   mo stretchy="false">}</mo></mrow></mrow> :MATH]
   4

   where
   [MATH: <mi mathvariant="normal">K</mi> :MATH]
   is the triplet list;
   [MATH: <mrow><mi>d</mi><mo
   stretchy="false">(</mo><mi>a</mi><mo>,</mo><mi>p</mi><mo
   stretchy="false">)</mo></mrow> :MATH]
   is the absolute value of cosine distance between
   [MATH: <msub><mi>z</mi><mi>a</mi></msub> :MATH]
   and
   [MATH: <msub><mi>z</mi><mi>p</mi></msub> :MATH]
   ;
   [MATH: <mi>α</mi> :MATH]
   is the margin parameter. The margin
   [MATH: <mi>α</mi> :MATH]
   controls the difference in manifold space between interacting pairs and
   non-interacting pairs. Cosine similarity is used to measure the
   correlation between gene pairs in the manifold space.

Computation of intracellular influence score

   By taking into account the expression of target genes that are highly
   correlated with the receptor, we assess the downstream influence of the
   receptor
   [MATH: <mi>r</mi> :MATH]
   in the cell
   [MATH: <mi>i</mi> :MATH]
   using the following definitions:
   [MATH:
   <mrow><msubsup><mtext>IntraScore</mtext><mrow><mi>r</mi></mrow><mi>i</m
   i></msubsup><mo>=</mo><msub><mo>∑</mo><mrow><mi>T</mi><mi>g</mi><mo>∈</
   mo><msub><mi>T</mi><mi>r</mi></msub></mrow></msub><mi>P</mi><mi>E</mi><
   msubsup><mi>M</mi><mrow><mi>r</mi></mrow><mi>i</mi></msubsup><mo>·</mo>
   <mi>P</mi><mi>E</mi><msubsup><mi>M</mi><mrow><mi
   mathvariant="italic">Tg</mi></mrow><mi>i</mi></msubsup><mo>·</mo><mrow>
   <mo
   stretchy="false">|</mo><mi>c</mi><mi>o</mi><mi>r</mi><msub><mi>r</mi><m
   row><mi>r</mi><mo>-</mo><mi>T</mi><mi>g</mi></mrow></msub><mo
   stretchy="false">|</mo></mrow><mo stretchy="false">/</mo><mfenced
   close="|" open="|"><msub><mi
   mathvariant="normal">T</mi><mi>r</mi></msub></mfenced></mrow> :MATH]
   5

   where
   [MATH: <msub><mi mathvariant="normal">T</mi><mi>r</mi></msub> :MATH]
   is the set of target genes highly correlated with the receptor
   [MATH: <mi>r</mi> :MATH]
   ; and
   [MATH:
   <mrow><mi>c</mi><mi>o</mi><mi>r</mi><msub><mi>r</mi><mrow><mi>r</mi><mo
   >-</mo><mi>T</mi><mi>g</mi></mrow></msub></mrow> :MATH]
   is the absolute value of cosine similarity between the latent feature
   [MATH: <msub><mi>Z</mi><mi>r</mi></msub> :MATH]
   and
   [MATH: <msub><mi>Z</mi><mrow><mi
   mathvariant="italic">Tg</mi></mrow></msub> :MATH]
   ;
   [MATH: <mfenced close="|" open="|"><mrow></mrow><mo>∗</mo></mfenced>
   :MATH]
   is the cardinality of the set;
   [MATH:
   <mrow><mi>P</mi><mi>E</mi><msubsup><mi>M</mi><mrow><mi>j</mi></mrow><mi
   >i</mi></msubsup></mrow> :MATH]
   measures the expression specificity of gene
   [MATH: <mi>j</mi> :MATH]
   in cell
   [MATH: <mi>i</mi> :MATH]
   .The definition of
   [MATH:
   <mrow><mi>P</mi><mi>E</mi><msubsup><mi>M</mi><mrow><mi>j</mi></mrow><mi
   >i</mi></msubsup></mrow> :MATH]
   is as follows:
   [MATH:
   <mrow><msubsup><mtext>PEM</mtext><mrow><mi>j</mi></mrow><mi>i</mi></msu
   bsup><mo>=</mo><msub><mi
   mathvariant="italic">log</mi><mn>2</mn></msub><mfrac><mrow><mi>E</mi><m
   i>x</mi><mi>p</mi><msubsup><mi>r</mi><mrow><mi>j</mi></mrow><mi>i</mi><
   /msubsup></mrow><msubsup><mi>e</mi><mrow><mi>i</mi></mrow><mi>A</mi></m
   subsup></mfrac></mrow> :MATH]
   6
   [MATH:
   <mrow><msubsup><mi>e</mi><mrow><mi>i</mi></mrow><mi>A</mi></msubsup><mo
   >=</mo><msubsup><mo>∑</mo><mrow><mi>m</mi><mo>=</mo><mn>1</mn></mrow><m
   i>M</mi></msubsup><mi>M</mi><mi>e</mi><mi>a</mi><msubsup><mi>n</mi><mro
   w><mi>j</mi></mrow><mi>m</mi></msubsup><mo>·</mo><mfrac><mmultiscripts>
   <mi>s</mi><mrow><mrow></mrow><mo>∗</mo><mi>A</mi></mrow><mrow></mrow></
   mmultiscripts><mrow><msubsup><mo>∑</mo><mrow><mi>m</mi><mo>=</mo><mn>1<
   /mn></mrow><mi>M</mi></msubsup><mmultiscripts><mi>s</mi><mrow><mrow></m
   row><mo>∗</mo><mi>m</mi></mrow><mrow></mrow></mmultiscripts></mrow></mf
   rac></mrow> :MATH]
   7

   where
   [MATH:
   <mrow><mi>E</mi><mi>x</mi><mi>p</mi><msubsup><mi>r</mi><mrow><mi>j</mi>
   </mrow><mi>i</mi></msubsup></mrow> :MATH]
   is the expression of gene
   [MATH: <mi>j</mi> :MATH]
   in cell
   [MATH: <mi>i</mi> :MATH]
   ;
   [MATH: <msubsup><mi>e</mi><mrow><mi>i</mi></mrow><mi>A</mi></msubsup>
   :MATH]
   is the expected expression of gene
   [MATH: <mi>i</mi> :MATH]
   in cell type
   [MATH: <mi>A</mi> :MATH]
   under the null distribution across all
   [MATH: <mi>M</mi> :MATH]
   cell types;
   [MATH:
   <mrow><mi>M</mi><mi>e</mi><mi>a</mi><msubsup><mi>n</mi><mrow><mi>j</mi>
   </mrow><mi>m</mi></msubsup></mrow> :MATH]
   is the average expression of gene
   [MATH: <mi>j</mi> :MATH]
   in cell type
   [MATH: <mi>m</mi> :MATH]
   ;
   [MATH:
   <mmultiscripts><mi>s</mi><mrow><mrow></mrow><mo>∗</mo><mi>A</mi></mrow>
   <mrow></mrow></mmultiscripts> :MATH]
   is the total expression of all genes in cell type A. Thus, the
   intracellular score takes into account the expression specificity and
   association between the receptor and target genes.

Division of the type of ligand-receptor pairs

   Cell–cell interaction typically occurs only within a specific range.
   Based on the interaction distance, cell–cell interaction can be
   categorized into four types: autocrine, paracrine, juxtacrine, and
   endocrine. However, it is challenging to assess endocrinology in local
   tissue, and the interaction range between autocrine and juxtacrine is
   relatively short and difficult to distinguish. Moreover, the physical
   distance between cells impacts signal diffusion and intensity, and
   different interaction types need to be evaluated at appropriate spatial
   scales. The curated LR data that we have collected does not provide
   information about the types of interactions. Inspired by previous
   research [[131]63], we employ the Wasserstein distance to partition the
   types of interactions for the ligand-receptor pairs. The Wasserstein
   distance is a metric for quantifying the difference between two
   probability distributions. It is founded on optimal transport theory
   and computes the minimum cost between two distributions by
   contemplating how to transform one distribution into another.

   In the classification of ligand-receptor interaction types, the
   Wasserstein distance is utilized to measure the minimum transportation
   cost between the distributions of ligand expression and receptor
   expression. By supposing that the ligand genes are expressed in m cells
   and the receptor genes are expressed in n cells, and using the
   Euclidean distance matrix between cells as the cost matrix
   [MATH: <mrow><mi>D</mi><mo>∈</mo><msup><mrow><mi
   mathvariant="double-struck">R</mi></mrow><mrow><mi>m</mi><mo>×</mo><mi>
   n</mi></mrow></msup></mrow> :MATH]
   , the Wasserstein distance from the ligand expression distribution to
   the receptor expression distribution can be calculated by resolving the
   optimal transportation cost problem as follows:
   [MATH:
   <mrow><msup><mrow><mi>γ</mi></mrow><mo>′</mo></msup><mo>=</mo><munder><
   mrow><mi
   mathvariant="italic">argmin</mi></mrow><mrow><mi>γ</mi><mo>∈</mo><mi
   mathvariant="normal">Γ</mi><mo
   stretchy="false">(</mo><mtext>L,</mtext><mi>R</mi><mo
   stretchy="false">)</mo></mrow></munder><mo><</mo><mi>γ</mi><mo>,</mo><m
   i>D</mi><mo>></mo></mrow> :MATH]
   8

   And the corresponding Wasserstein distance is obtained as follows.
   [MATH: <mrow><mi>W</mi><mrow><mo
   stretchy="false">(</mo><mtext>L,R</mtext><mo
   stretchy="false">)</mo></mrow><mo>=</mo><munder><mrow><mi
   mathvariant="italic">min</mi></mrow><mrow><mi>γ</mi><mo>∈</mo><mi
   mathvariant="normal">Γ</mi><mo
   stretchy="false">(</mo><mi>L</mi><mo>,</mo><mi>R</mi><mo
   stretchy="false">)</mo></mrow></munder><mo><</mo><mi>γ</mi><mo>,</mo><m
   i>D</mi><mo>></mo><mo>=</mo><mo><</mo><msup><mrow><mi>γ</mi></mrow><mo>
   ′</mo></msup><mo>,</mo><mi>D</mi><mo>></mo></mrow> :MATH]
   9

   We then conduct multiple random permutations (with a default of 200
   permutations) on the expression distributions of the ligand and
   receptor. We recalculate the Wasserstein distance and utilize the
   d_ratio (the ratio of the actual distance to the mean of the multiple
   perturbations) and the two-sided significance p-value to classify the
   types of interactions.

   After sorting lr pairs based on d_ratio and p_value, we select the lr
   pairs that satisfy the condition d_ratio < 1 and p_value < 0.05 from
   the top 10% as short-range interaction pairs. We then select the lr
   pairs that satisfy the condition d_ratio > 1 and p_value < 0.05 from
   the bottom 10% as long-range interaction pairs, and the remaining LR
   pairs are categorized as medium-range interaction pairs.

Spatial communication analysis

   To identify ligand-receptor pairs with spatial dependence, we calculate
   the cross-Moran’s I index for all interaction pairs and obtain
   significant candidate pairs through a permutation test (with a default
   of 500 permutations). In the permutation test, we randomly shuffle the
   locations of all spots and assess statistical significance by comparing
   the observed cross-Moran’s I index to the likelihood of obtaining such
   an extremely positive value under the permutation-based random model.
   Furthermore, we consider that the communication relationship with a
   p-value less than 0.05 is significant.
   [MATH: <mrow><msup><mtext>Cross-Moran</mtext><mo>′</mo></msup><mspace
   width="0.166667em"></mspace><mtext>I</mtext><mo>=</mo><mfrac><mi>N</mi>
   <mrow><msubsup><mo>∑</mo><mrow><mi>i</mi></mrow><mi>N</mi></msubsup><ms
   ubsup><mo>∑</mo><mrow><mi>j</mi></mrow><mi>N</mi></msubsup><msub><mi>W<
   /mi><mrow><mi
   mathvariant="italic">ij</mi></mrow></msub></mrow></mfrac><mfrac><mrow><
   msubsup><mo>∑</mo><mrow><mi>i</mi></mrow><mi>N</mi></msubsup><msubsup><
   mo>∑</mo><mrow><mi>j</mi></mrow><mi>N</mi></msubsup><msub><mi>W</mi><mr
   ow><mi mathvariant="italic">ij</mi></mrow></msub><mrow><mo
   stretchy="false">(</mo><msub><mi>x</mi><mi>i</mi></msub><mo>-</mo><move
   r><mi>x</mi><mo>¯</mo></mover><mo
   stretchy="false">)</mo></mrow><mrow><mo
   stretchy="false">(</mo><msub><mi>y</mi><mi>j</mi></msub><mo>-</mo><move
   r><mi>y</mi><mo>¯</mo></mover><mo
   stretchy="false">)</mo></mrow></mrow><mrow><msqrt><mrow><msubsup><mo>∑<
   /mo><mrow><mi>i</mi></mrow><mi>N</mi></msubsup><mrow><mo
   stretchy="false">(</mo></mrow><msub><mi>x</mi><mi>i</mi></msub><mo>-</m
   o><mover><mi>x</mi><mo>¯</mo></mover><msup><mrow><mo
   stretchy="false">)</mo></mrow><mn>2</mn></msup></mrow></msqrt><msqrt><m
   row><msubsup><mo>∑</mo><mrow><mi>j</mi></mrow><mi>N</mi></msubsup><mrow
   ><mo
   stretchy="false">(</mo></mrow><msub><mi>y</mi><mi>j</mi></msub><mo>-</m
   o><mover><mi>y</mi><mo>¯</mo></mover><msup><mrow><mo
   stretchy="false">)</mo></mrow><mn>2</mn></msup></mrow></msqrt></mrow></
   mfrac></mrow> :MATH]
   10

   where
   [MATH: <msub><mi>x</mi><mi>i</mi></msub> :MATH]
   represents the expression of ligand
   [MATH: <mi>x</mi> :MATH]
   in cell
   [MATH: <mi>i</mi> :MATH]
   , while
   [MATH: <msub><mi>y</mi><mi>j</mi></msub> :MATH]
   represents the expression of receptor gene
   [MATH: <mi>y</mi> :MATH]
   in cell
   [MATH: <mi>j</mi> :MATH]
   ; N is the number of cells;
   [MATH: <msub><mi>W</mi><mrow><mi
   mathvariant="italic">ij</mi></mrow></msub> :MATH]
   is the distance weight of cells i and j, which is calculated based on
   the k-nearest neighbor method. The exact choice of k should reflect the
   data. For all data sets analyzed in this study, which are derived from
   the 10X Visium platform, according to Squidpy [[132]64], we set k = 6
   for short-range interactions and multiplied on the medium-range and
   long-range upper parameters to capture different ranges of
   interactions. For other data sets, users can visualize the weight
   matrix for different parameters and choose the most appropriate scale
   for their data.

   We further use local cross-Moran’s I index to identify important local
   interaction regions for each LR pairs. The local cross-Moran’s I for
   spot i is defined as:
   [MATH: <mrow><mtext>Local</mtext><mspace
   width="0.333333em"></mspace><msup><mtext>Cross-Moran</mtext><mo>′</mo><
   /msup><mspace
   width="0.166667em"></mspace><msub><mtext>I</mtext><mi>i</mi></msub><msp
   ace width="0.333333em"></mspace><mtext>=</mtext><mspace
   width="0.333333em"></mspace><mfrac><mrow><msubsup><mo>∑</mo><mrow><mi>j
   </mi></mrow><mi>N</mi></msubsup><msub><mi>W</mi><mrow><mi
   mathvariant="italic">ij</mi></mrow></msub><mrow><mo
   stretchy="false">[</mo><mrow><mo
   stretchy="false">(</mo><msub><mi>x</mi><mi>i</mi></msub><mo>-</mo><move
   r><mi>x</mi><mo>¯</mo></mover><mo
   stretchy="false">)</mo></mrow><mrow><mo
   stretchy="false">(</mo><msub><mi>y</mi><mi>j</mi></msub><mo>-</mo><move
   r><mi>y</mi><mo>¯</mo></mover><mo
   stretchy="false">)</mo></mrow><mo>+</mo><mrow><mo
   stretchy="false">(</mo><msub><mi>y</mi><mi>i</mi></msub><mo>-</mo><move
   r><mi>y</mi><mo>¯</mo></mover><mo
   stretchy="false">)</mo></mrow><mrow><mo
   stretchy="false">(</mo><msub><mi>x</mi><mi>j</mi></msub><mo>-</mo><move
   r><mi>x</mi><mo>¯</mo></mover><mo stretchy="false">)</mo></mrow><mo
   stretchy="false">]</mo></mrow></mrow><mrow><msqrt><mrow><msubsup><mo>∑<
   /mo><mrow><mi>i</mi></mrow><mi>N</mi></msubsup><mrow><mo
   stretchy="false">(</mo></mrow><msub><mi>x</mi><mi>i</mi></msub><mo>-</m
   o><mover><mi>x</mi><mo>¯</mo></mover><msup><mrow><mo
   stretchy="false">)</mo></mrow><mn>2</mn></msup></mrow></msqrt><msqrt><m
   row><msubsup><mo>∑</mo><mrow><mi>j</mi></mrow><mi>N</mi></msubsup><mrow
   ><mo
   stretchy="false">(</mo></mrow><msub><mi>y</mi><mi>j</mi></msub><mo>-</m
   o><mover><mi>y</mi><mo>¯</mo></mover><msup><mrow><mo
   stretchy="false">)</mo></mrow><mn>2</mn></msup></mrow></msqrt></mrow></
   mfrac></mrow> :MATH]
   11

   Similarly, we utilize permutation testing for conducting a significance
   test (with a default of 500 permutations). After identifying reliable
   communication relationships, we further quantify the cell–cell
   communication strength between cell
   [MATH: <mi>i</mi> :MATH]
   and cell
   [MATH: <mi>j</mi> :MATH]
   for LR pair
   [MATH: <mrow><mi>l</mi><mo>-</mo><mi>r</mi></mrow> :MATH]
   using the following definition:
   [MATH: <mrow><mtext>CCI</mtext><mspace
   width="0.333333em"></mspace><msubsup><mtext>Score</mtext><mrow><mi
   mathvariant="italic">lr</mi></mrow><mrow><mi
   mathvariant="italic">ij</mi></mrow></msubsup><mo>=</mo><msubsup><mi>x</
   mi><mrow><mi>l</mi></mrow><mi>i</mi></msubsup><mo>·</mo><msubsup><mi>x<
   /mi><mrow><mi>r</mi></mrow><mi>j</mi></msubsup><mo>·</mo><msubsup><mtex
   t>IntraScore</mtext><mrow><mi>r</mi></mrow><mi>j</mi></msubsup><mo>·</m
   o><mrow><mo
   stretchy="false">(</mo><mn>1</mn><mo>-</mo><mfrac><msub><mi>d</mi><mrow
   ><mi
   mathvariant="italic">ij</mi></mrow></msub><msub><mi>d</mi><mrow><mi
   mathvariant="italic">thre</mi></mrow></msub></mfrac><mo
   stretchy="false">)</mo></mrow></mrow> :MATH]
   12

   where
   [MATH: <msubsup><mi>x</mi><mrow><mi>l</mi></mrow><mi>i</mi></msubsup>
   :MATH]
   is the expression of gene
   [MATH: <mi>l</mi> :MATH]
   in cell
   [MATH: <mi>i</mi> :MATH]
   ;
   [MATH: <msub><mi>d</mi><mrow><mi
   mathvariant="italic">ij</mi></mrow></msub> :MATH]
   is the distance between cell
   [MATH: <mi>i</mi> :MATH]
   and
   [MATH: <mi>j</mi> :MATH]
   ;
   [MATH: <msub><mi>d</mi><mrow><mi
   mathvariant="italic">thre</mi></mrow></msub> :MATH]
   is the median distance among all local interaction hotspots of the
   interaction pair.

Analysis of communication between cell types

   To identify ligand-receptor pairs with significant spatial
   communication between clusters, we calculate the cross-Moran’s I index
   for the interaction pair (ligand
   [MATH: <mi>x</mi> :MATH]
   and receptor
   [MATH: <mi>y</mi> :MATH]
   ) between cell type A and cell type B as follows:
   [MATH: <mrow><mtext>Celltypes</mtext><mspace
   width="0.166667em"></mspace><msup><mtext>Cross-Moran</mtext><mo>′</mo><
   /msup><mspace
   width="0.166667em"></mspace><mtext>I</mtext><mo>=</mo><mfrac><mrow><mi>
   N</mi><mo>+</mo><mi>M</mi></mrow><mrow><mn>2</mn><msub><mo>∑</mo><mrow>
   <mi>i</mi><mo>∈</mo><mi>A</mi></mrow></msub><msub><mo>∑</mo><mrow><mi>j
   </mi><mo>∈</mo><mi>B</mi></mrow></msub><msub><mi>W</mi><mrow><mi
   mathvariant="italic">ij</mi></mrow></msub></mrow></mfrac><mfrac><mrow><
   msub><mo>∑</mo><mrow><mi>i</mi><mo>∈</mo><mi>A</mi></mrow></msub><msub>
   <mo>∑</mo><mrow><mi>j</mi><mo>∈</mo><mi>B</mi></mrow></msub><msub><mi>W
   </mi><mrow><mi mathvariant="italic">ij</mi></mrow></msub><mrow><mo
   stretchy="false">(</mo><msub><mi>x</mi><mi>i</mi></msub><mo>-</mo><move
   r><mi>x</mi><mo>¯</mo></mover><mo
   stretchy="false">)</mo></mrow><mrow><mo
   stretchy="false">(</mo><msub><mi>y</mi><mi>j</mi></msub><mo>-</mo><move
   r><mi>y</mi><mo>¯</mo></mover><mo
   stretchy="false">)</mo></mrow></mrow><mrow><msqrt><mrow><msub><mo>∑</mo
   ><mrow><mi>i</mi><mo>∈</mo><mi>A</mi></mrow></msub><mrow><mo
   stretchy="false">(</mo></mrow><msub><mi>x</mi><mi>i</mi></msub><mo>-</m
   o><mover><mi>x</mi><mo>¯</mo></mover><msup><mrow><mo
   stretchy="false">)</mo></mrow><mn>2</mn></msup></mrow></msqrt><msqrt><m
   row><msub><mo>∑</mo><mrow><mi>j</mi><mo>∈</mo><mi>B</mi></mrow></msub><
   mrow><mo
   stretchy="false">(</mo></mrow><msub><mi>y</mi><mi>j</mi></msub><mo>-</m
   o><mover><mi>y</mi><mo>¯</mo></mover><msup><mrow><mo
   stretchy="false">)</mo></mrow><mn>2</mn></msup></mrow></msqrt></mrow></
   mfrac></mrow> :MATH]
   13

   where N and M are the number of cells belonging to cell types A and B,
   respectively. And we also permute the locations of spots within each
   cluster for significance test (with a default of 500 permutations).

   After filtering out LR pairs with significant communication
   relationships between different cell types through cross-Moran index,
   we can further calculate cell–cell communication scores between
   different cell types for the LR pair
   [MATH: <mrow><mi>l</mi><mo>-</mo><mi>r</mi></mrow> :MATH]
   .The cell-type-level signaling strength is quantified as.
   [MATH: <mrow><mtext>Celltype</mtext><mspace
   width="0.166667em"></mspace><mtext>CCI</mtext><mspace
   width="0.333333em"></mspace><msubsup><mtext>Score</mtext><mrow><mtext>l
   r</mtext></mrow><mtext>AB</mtext></msubsup><mo>=</mo><mfrac><mrow><msub
   ><mo>∑</mo><mrow><mrow><mo
   stretchy="false">(</mo><mi>k</mi><mo>,</mo><mi>l</mi><mo
   stretchy="false">)</mo></mrow><mo>∈</mo><msub><mi>I</mi><mrow><mi
   mathvariant="italic">AB</mi></mrow></msub></mrow></msub><mtext>CCI
   Scor</mtext><msubsup><mtext>e</mtext><mrow><mtext>lr</mtext></mrow><mte
   xt>kl</mtext></msubsup></mrow><mfenced close="|"
   open="|"><msub><mi>I</mi><mrow><mi
   mathvariant="italic">AB</mi></mrow></msub></mfenced></mfrac></mrow>
   :MATH]
   14

   where
   [MATH: <mrow><msub><mi>I</mi><mrow><mi
   mathvariant="italic">AB</mi></mrow></msub><mo>=</mo><mrow><mo
   stretchy="false">{</mo><mrow><mo
   stretchy="false">(</mo><mi>k</mi><mo>,</mo><mi>l</mi><mo
   stretchy="false">)</mo></mrow><mo
   stretchy="false">|</mo><mi>k</mi><mo>∈</mo><msub><mi>S</mi><mi>A</mi></
   msub><mo>,</mo><mi>l</mi><mo>∈</mo><msub><mi>S</mi><mi>B</mi></msub><mo
   stretchy="false">}</mo></mrow></mrow> :MATH]
   ,
   [MATH: <msub><mi>S</mi><mi>A</mi></msub> :MATH]
   is the cluster A.

Identification of downstream key transcription factors and signaling pathways

   After identifying the reliable ligand-receptor relationship, we further
   identify the signals affected by the ligand-receptor. First, we utilize
   the inter-gene co-expression coefficient to further screen the
   receptor-target gene relationship for each cell type (with a threshold
   of 0.1) and then employ the Fisher’s test to identify the activity
   fraction of each transcription factor. The activity score of TF k in
   the cell type
   [MATH: <mi>t</mi> :MATH]
   is calculated by Fisher’s exact test as follows:
   [MATH: <mrow><mi>P</mi><mo>=</mo><mfrac><mrow><mfenced close=")"
   open="("><mrow><mtable><mtr><mtd><mtext>a+b</mtext></mtd></mtr><mtr><mt
   d><mrow><mrow></mrow><mi>a</mi></mrow></mtd></mtr></mtable></mrow></mfe
   nced><mfenced close=")"
   open="("><mrow><mtable><mtr><mtd><mtext>c+d</mtext></mtd></mtr><mtr><mt
   d><mrow><mrow></mrow><mi>c</mi></mrow></mtd></mtr></mtable></mrow></mfe
   nced></mrow><mfenced close=")"
   open="("><mrow><mtable><mtr><mtd><mtext>a+b+c+d</mtext></mtd></mtr><mtr
   ><mtd><mrow><mrow></mrow><mi>a</mi><mo>+</mo><mi>c</mi></mrow></mtd></m
   tr></mtable></mrow></mfenced></mfrac></mrow> :MATH]
   15

   where
   [MATH: <mi>a</mi> :MATH]
   is the number of the set of target genes that are highly associated
   with the receptor
   [MATH: <mi>r</mi> :MATH]
   and TF
   [MATH: <mi>k</mi> :MATH]
   ;
   [MATH: <mi>b</mi> :MATH]
   is the number of the set of target genes that receptor r can act on
   through the prior gene network excluding
   [MATH: <mi>a</mi> :MATH]
   ;
   [MATH: <mi>c</mi> :MATH]
   is the number of the set of target genes that is highly correlated with
   receptor
   [MATH: <mi>r</mi> :MATH]
   but not associated with TF
   [MATH: <mi>k</mi> :MATH]
   ;
   [MATH: <mi>d</mi> :MATH]
   is the number of all genes excluding a, b, and c.

   After identifying the key transcription factors, we search for the
   optimal signal pathway from the receptor to the target gene based on
   the co-expression coefficient between the genes.

Comparison with other methods

   In our experiments, we conducted a comparison of our method against
   CellChat, COMMOT, SpaTalk, and Giotto using three real datasets. For
   each of the methods, our collected data was utilized as the prior
   ligand-receptor (LR) database, and default parameters were employed to
   predict LR pairs. We regarded the interaction pairs between clusters
   with a significant communication p-value of less than 0.05 as being of
   importance.

   In addition to the aforementioned methods, we also made a comparison
   with methods (SpaTalk, CellCall, NicheNet) that take downstream
   signaling into account. Given the complexity of their prior databases,
   we opted to use their original databases. All methods were implemented
   with default parameters. Inspired by SpaTalk, we employed Fisher’s
   exact test to compare the significance of predicted downstream
   signaling pathways involved in KEGG and Reactome among different
   methods.

Supplementary Information

   [133]12915_2025_2141_MOESM1_ESM.docx^ (547.4KB, docx)

   Additional file 1: Supplementary figures S1-S2.

Acknowledgements