Abstract

   Spatially resolved transcriptomics provides genetic information in
   space toward elucidation of the spatial architecture in intact organs
   and the spatially resolved cell-cell communications mediating tissue
   homeostasis, development, and disease. To facilitate inference of
   spatially resolved cell-cell communications, we here present SpaTalk,
   which relies on a graph network and knowledge graph to model and score
   the ligand-receptor-target signaling network between spatially proximal
   cells by dissecting cell-type composition through a non-negative linear
   model and spatial mapping between single-cell transcriptomic and
   spatially resolved transcriptomic data. The benchmarked performance of
   SpaTalk on public single-cell spatial transcriptomic datasets is
   superior to that of existing inference methods. Then we apply SpaTalk
   to STARmap, Slide-seq, and 10X Visium data, revealing the in-depth
   communicative mechanisms underlying normal and disease tissues with
   spatial structure. SpaTalk can uncover spatially resolved cell-cell
   communications for single-cell and spot-based spatially resolved
   transcriptomic data universally, providing valuable insights into
   spatial inter-cellular tissue dynamics.

   Subject terms: Software, Gene expression, Computational models
     __________________________________________________________________

   Cell-cell communication is a vital feature involving numerous
   biological processes. Here, the authors develop SpaTalk, a cell-cell
   communication inference method using knowledge graph for spatially
   resolved transcriptomic data, providing valuable insights into spatial
   intercellular tissue dynamics.

Introduction

   Cell–cell communications via secreting and receiving ligands frequently
   occur in multicellular organisms, which is a vital feature involving
   numerous biological processes^[46]1. Standard algorithms for inferring
   cell–cell communications mediated by ligand–receptor interactions
   (LRIs) primarily incorporate a database of known LRIs and single-cell
   transcriptomic data by delineating cell populations and their lineage
   relationships^[47]2,[48]3. One common strategy is to integrate the
   abundance of ligands and receptors for the inference of signals from
   senders to receivers based on the premise that highly co-expressed
   ligands and receptors are likely to mediate inter-cellular
   communications^[49]4,[50]5. Another strategy applies the downstream
   targets triggered by LRIs in receivers to enrich and score the
   ligand–receptor–target (LRT) signaling network^[51]6–[52]8. Although
   single-cell transcriptomic data can provide information on the genes
   contributing to cell–cell communications, the spatial information of
   cells is inevitably lost when dissociating tissues into single cells,
   thereby hindering the extension of current tools to investigate
   cell–cell communications in tissues with spatial structure^[53]9.

   Recent technological advances in spatially resolved transcriptomics
   (ST) benefiting from spatial barcoding and imaging-based approaches
   have enabled the measurement of whole or mostly whole transcriptomes
   while retaining the spatial information^[54]10, which have been
   increasingly adopted to generate useful insights in the biological and
   biomedical domains, with dramatically improved accuracy and reliability
   in the inference of spatially proximal cell–cell communications^[55]11.
   Given the space-constrained nature of juxtacrine and paracrine
   signaling, such spatial gene expression information is vital to
   understand cell–cell communications mediating tissue homeostasis,
   development, and disease^[56]12,[57]13.

   Several methods have recently emerged to decode the mechanisms of
   cell–cell communications in space^[58]14. For example, Giotto utilizes
   preferential cell neighbors over single-cell ST datasets for each pair
   of cell types with an enrichment test to evaluate the likelihood of a
   given LRI based on proximal co-expressing cells and infer cell–cell
   communication in space^[59]15. SpaOTsc applies structured optimal
   transport mapping between scRNA-seq and ST data to assign a spatial
   position for each cell, resulting in a cell–cell distance as a
   transport cost to infer the ligand–receptor signaling network that
   mediates space-constrained cell–cell communication^[60]16. However,
   Giotto and SpaOTsc are limited to infer inter-cellular communications
   over single-cell ST data rather than the spot-based ST data and between
   paired cell types rather than paired cells. It still lacks of methods
   that can infer and visualize spatially resolved cell–cell
   communications at single-cell resolution over ST data to date, posing a
   great challenge for decoding spatial inter-cellular dynamics underlying
   disease pathology.

   To address this challenge, we herein proposed SpaTalk, a spatially
   resolved cell–cell communication inference method by creatively
   integrating the principles of the ligand–receptor proximity and
   ligand–receptor–target (LRT) co-expression to model and score the LRT
   signaling network between spatially proximal cells relying on the graph
   network and knowledge graph approaches^[61]17,[62]18. The performance
   of SpaTalk was evaluated on benchmarked datasets with remarkable
   superiority over other methods. By applying to STARmap^[63]19,
   Slide-seq^[64]20,[65]21, and 10x Visium^[66]22 datasets, SpaTalk
   revealed the in-depth communicative mechanisms underlying normal and
   disease tissues with spatial structure. Collectively, these results
   demonstrate SpaTalk as a useful and universal method that can help to
   uncover spatially resolved cell–cell communications for both
   single-cell and spot-based ST data, providing insights into the
   understanding of spatial inter-cellular dynamics in tissues.

Results

Overview of the SpaTalk method

   Figure [67]1 provides an overview of the workflow for developing and
   testing SpaTalk, comprising two main components: (1) dissect the
   cell-type composition of ST data and (2) infer spatially resolved
   cell–cell communications over decomposed single-cell ST data
   (Fig. [68]1a). In the first component, the non-negative linear model
   (NNLM)^[69]23–[70]25 was applied to decode the cell-type composition
   for a single-cell or spot-based ST data matrix using the scRNA-seq data
   matrix with k cell types as the underlying reference. By incorporating
   Lee’s multiplicative iteration algorithm and relative entropy
   loss^[71]25, the model was trained with default hyperparameters until
   convergence, producing a weight matrix representing the optimal
   proportion of cell types for each cell/spot. For single-cell ST data,
   the cell type with the maximum weight was assigned to label each cell.
   For spot-based ST data, the cell types with different weights were used
   as the reference to project the cells from scRNA-seq data onto the
   spatial spot (Fig. [72]1b). Through random sampling and deep iteration
   processes, the optimal cellular combination that most resembled the
   spatial spot was refined to reconstruct the single-cell ST data for
   spot-based ST data.

Fig. 1. Workflow of the SpaTalk method and visualization.

   [73]Fig. 1
   [74]Open in a new tab

   a Overview of SpaTalk, including the input, intermediate process of
   decoding spatially resolved cell–cell communications, and output. b
   Conceptual framework of cell-type decomposition with SpaTalk. Five
   different spatial technologies and datasets were selected and analyzed:
   spot-based ST data (Slide-seq and 10x Visium) and single-cell ST data
   (STARmap, MERFISH, and seqFISH+). NNLM was used to dissect the optimal
   proportion of cell types for the projection of cells from scRNA-seq
   reference data onto the spatial cells/spots, generating single-cell ST
   data with known cell types. c Schematic representation of SpaTalk to
   infer spatially resolved cell–cell communications mediated by LRIs. The
   inter-cellular and intracellular scores were obtained and combined from
   the cell–cell graph network and the LRT-knowledge graph (KG),
   respectively, by integrating the KNN, permutation test, and random walk
   algorithms. L, ligand; R, receptor; TF, transcription factor; T,
   target. d Visualization of spatially resolved cell–cell communications,
   including a heatmap, Sankey plot, and diagram of the LRI from senders
   to receivers in space, as well as ligand–receptor–target (LRT)
   signaling pathways over the reconstructed single-cell ST data.

   The second component of SpaTalk is to infer spatially resolved
   cell–cell communications and downstream signal pathways. To identify
   possible communications among cells mediated by LRIs, the principles of
   ligand–receptor proximity and ligand–receptor–target (LRT)
   co-expression were incorporated based on a recent review^[75]11. In
   detail, the KNN algorithm is first applied to each cell in space to
   construct the cell graph network inspired by Giotto^[76]15. For a given
   ligand of the sender (cell type A) and a given receptor of the receiver
   (cell type B), the number of ligand–receptor co-expressed cell–cell
   pairs is obtained from the graph network by counting the 1-hop neighbor
   nodes of receivers for each sender. A permutation test filters and
   scores the significantly enriched LRIs, generating the inter-cellular
   score (Fig. [77]1c).

   The knowledge graph (KG) was then introduced to model the intracellular
   signal propagation process from the receptor to its downstream signals,
   i.e., TFs and target genes. In practice, LRIs from CellTalkDB^[78]26,
   pathways from Kyoto Encyclopedia of Genes and Genomes (KEGG) and
   Reactome, and TFs from AnimalTFDB^[79]27 were integrated to construct
   the LRT-KG, wherein the weight between entities represents the
   co-expressed coefficient. Taking the receptor as the query node, we
   incorporated the random walk algorithm^[80]28 into the LRT-KG to filter
   and score the downstream activated TFs and calculate the intracellular
   score of the LRI from senders to receivers as the LRI that mediates the
   cell–cell communication is supposed to activate at least one TF and its
   target gene in the receiver cell type and the greater number of
   co-expressed TFs and their target genes will lead to a higher score for
   a given LRI pair between the sender cell type and the receiver cell
   type. Inter-cellular and intracellular scores are combined to rank the
   LRIs that mediate spatially resolved cell–cell communications.

   SpaTalk also includes numerous visualization functions to characterize
   the cell-type composition and spatially resolved cell–cell
   communications, such as the diagram of the LRI from senders to
   receivers in space and LRT signaling pathways, over the reconstructed
   single-cell ST data (Fig. [81]1d). Five broad ranges of different
   spatial technologies and corresponding representative datasets were
   analyzed and visualized: spot-based ST data (Slide-seq^[82]20,[83]21
   and 10x Visium^[84]22) and single-cell ST data (STARmap^[85]19,
   MERFISH^[86]29, and seqFISH+^[87]30).

Performance comparison of SpaTalk with other methods

   The cell-type decomposition by SpaTalk is the foundation for subsequent
   analyses. To evaluate its performance, four single-cell ST datasets
   from the mouse cortex, hypothalamus, olfactory bulb (OB), and
   sub-ventricular zone (SVZ) were utilized (Supplementary Fig. [88]1a).
   All cells were split according to the fixed spatial distance and then
   merged into simulated spots as the benchmark datasets (Fig. [89]2a).
   The quality of predicted cell-type decompositions and expression
   profiles was evaluated by Pearson’s correlation coefficient and the
   root mean square error (RMSE) based on the ground truth, wherein
   SpaTalk exhibited fantastic performance over the benchmark datasets
   (Supplementary Fig. [90]1b, c). Although the majority of existing
   cell-type deconvolution methods (RCTD^[91]31, Seurat^[92]32,
   SPOTlight^[93]33, deconvSeq^[94]34, Stereoscope^[95]35, and
   Cell2location^[96]36) can achieve a decent correlation coefficient and
   low RMSE on spot deconvolution, SpaTalk outperformed these methods on
   most benchmark datasets with the top-ranked performance, except for the
   evaluation indices on the MERFISH dataset and the mean RMSE on the
   STARmap dataset (Fig. [97]2b). The MERFISH dataset only includes 155
   genes, whereas the STARmap and seqFISH+ datasets cover 1020 and 10,000
   genes per cell, respectively, suggesting that SpaTalk is potentially
   more effective for spatial data with higher gene coverage.

Fig. 2. Superior performance of SpaTalk over existing methods.

   [98]Fig. 2
   [99]Open in a new tab

   a Schematic diagram for generating simulated spot data. Cells were
   split according to the fixed spatial distance and then merged for the
   single-cell ST data with known cell types. b Performance comparison of
   SpaTalk with other existing cell-type deconvolution methods (RCTD,
   Seurat, SPOTlight, deconvSeq, Stereoscope, Cell2location). The asterisk
   represents the top-ranked method for each dataset. NA, not available. c
   Schematic illustration of the procedure and rationale for single-cell
   ST data to evaluate predicted LRIs that mediate spatially resolved
   cell–cell communications. d and e Performance comparison of SpaTalk
   with existing cell–cell communication inference methods (Giotto,
   SpaOTsc, NicheNet, CytoTalk, CellCall, CellPhoneDB, and CellChat) on
   the STARmap and seqFISH+ datasets. The P-value represents the
   difference of spatial distances between sender-receiver and all
   cell–cell pairs assessed with the Wilcoxon test. For the boxplots
   (minima, 25th percentile, median, 75th percentile, and maxima) from
   SpaTalk to CellChat, the numbers of data points for the STARmap dataset
   are 15, 5, 676, 537, 75, 31, 7, and 166, respectively; the numbers of
   data points for the seqFISH+ OB dataset are 1,559, 38, 3,972, 1,223,
   375, 675, 10, and 404, respectively; the numbers of data points for the
   seqFISH+ SVZ dataset are 9817, 50, 3337, 1424, 1553, 7798, 258, and
   291, respectively. f Schematic illustration of the procedure and
   rationale for single-cell ST data to evaluate predicted downstream
   target and pathways underlying LRIs. g Performance comparison of
   SpaTalk on the inferred downstream targets with other methods
   (NicheNet, CytoTalk, and CellCall) over the STARmap and seqFISH+
   datasets. The P-value represents the significance of enriched pathways
   or biological processes from the KEGG and Reactome databases using
   inferred downstream targets with the Fisher-exact test. For the
   boxplots (minima, 25th percentile, median, 75th percentile, and maxima)
   from SpaTalk to CellCall, the numbers of data points for the STARmap
   dataset are 25, 1203, 66, and 68, respectively; the numbers of data
   points for the seqFISH+ OB dataset are 23,132, 19,209, 5168, and
   27,668, respectively; the numbers of data points for the seqFISH+ SVZ
   dataset are 121,693, 25,216, 15,454, and 225,231, respectively.

   For the inference of cell–cell communication, we demonstrate that
   SpaTalk enables the identification of known LRIs in space as shown in
   the cases based on the STARmap and Slide-seq mouse brain ST datasets
   (Supplementary Fig. [100]2). Thus, we next compared the performance of
   SpaTalk with that of existing cell–cell communication inference methods
   with their default LRI databases (Supplementary Fig. [101]3a).
   Consequently, most methods exhibited a large fraction of overlapped
   predictions with the rest of the methods despite the different number
   of inferred cell–cell communications (Supplementary Fig. [102]3b, c),
   indicating the reproducible inference across these methods. Regarding
   the inferred LRIs (Supplementary Fig. [103]3d), we reasoned that the
   spatial distances of the inferred LRI between sender-receiver pairs
   will be shorter than those between all cell–cell pairs and thus the
   inferred LRI will be more co-expressed in local space as cells that are
   close are more likely to signal (Fig. [104]2c). The one-sided Wilcoxon
   test was performed to evaluate the spatial proximity significance of
   the inferred LRIs, and the co-expressed percentage of the LRI was
   calculated as the co-expression level using the cell–cell graph
   network. Although most LRIs inferred by other methods showed
   significantly closer spatial distances between sender-receiver pairs
   than that between all cell–cell pairs, superior performance of SpaTalk
   was observed, ranking first for both evaluation indices for STARmap
   datasets (Fig. [105]2d). Similarly, SpaTalk obtained a higher median
   –log[10] P-value and co-expression percent on the seqFISH+ OB and SVZ
   datasets but not for SpaOTsc^[106]16, CellPhoneDB^[107]37, and
   CellChat^[108]38 on individual evaluation indices (Fig. [109]2e).

   Considering the different default LRI database utilized in cell–cell
   communication inference methods, we benchmarked the performance of
   SpaTalk and other methods by unifying the LRI database, wherein
   CellTalkDB^[110]17, CellPhoneDB^[111]37, CytoTalkDB^[112]7,
   CellChatDB^[113]38, and CellCallDB^[114]8 were used by turns. As shown
   in Supplementary Fig. [115]3e, several methods showed decent
   performance on some individual LRI databases. For example, Giotto
   obtained the highest median –log[10] P among all methods over the
   STARmap dataset based on CellPhoneDB and CellCallDB, while CytoTalk is
   the top-ranked method over the STARmap dataset based on CellPhoneDB and
   CellChatDB considering the median co-expression percent. Over the
   seqFISH+ OB dataset based on CellChatDB, both of CellPhoneDB and
   CellChatDB perfectly identified the significantly proximal LR pairs in
   space. For SpaOTsc, it exhibited the highest median co-expression
   percent over the seqFISH+ SVZ dataset across all LRI databases.
   Nevertheless, SpaTalk obtained the most times of the first place across
   the benchmarked datasets and the underlying LRI databases,
   outperforming other existing methods for inference of spatially
   proximal LR pairs that mediating cell–cell communication in space.

   Next, we compared the performance of SpaTalk for inference of
   intracellular signal pathways of the receiver cell type triggered by
   the LRI with those of NicheNet, CytoTalk, and CellCall that also infer
   the downstream targets of LRIs. We reasoned that a more accurate method
   would be more likely to enrich the receptor-related biological
   processes or pathways using the inferred downstream target genes in the
   receiver cell type (Fig. [116]2f); hence, the Fisher-exact test was
   adopted for pathway enrichment analysis with the KEGG and Reactome
   databases on target genes in receivers. The target genes inferred by
   all methods enriched the most intracellular pathways or biological
   processes triggered by the inter-cellular LRI (Fig. [117]2g).
   Nevertheless, SpaTalk exhibited the top-ranked performance over three
   benchmarked datasets, exceeding other existing methods in inference of
   the LRT signal network.

   In addition, the computation time for each method was also evaluated.
   For the decomposition step, SpaTalk and other deconvolution six methods
   were compared over four simulated (STARmap, MERFISH, seqFISH+ OB and
   SVZ) and one real spot-based (10x Visium) ST datasets, wherein the
   computation time of SpaTalk was within minutes similar to RCTD and
   Seurat, outperforming SPOTlight, deconvSeq, Stereoscope, and
   Cell2location (Supplementary Table [118]1). For the inferring cell–cell
   communication step, SpaTalk and other seven methods were benchmarked
   over three single-cell ST datasets (STARmap, seqFISH+ OB and SVZ) and
   one reconstructed single-cell ST dataset (10x Visium) with SpaTalk. As
   shown in Supplementary Table [119]2, the computation time of SpaTalk,
   Giotto, SpaOTsc, NicheNet, CellCall, and CellPhoneDB were all within
   minutes, superior to that of CytoTalk and CellChat. In short, the
   present results indicated that the SpaTalk is relatively accurate and
   efficient method for dissecting the cell-type composition of ST data
   and inferring spatially resolved cell–cell communications.

Metabolic modulation of periportal hepatocytes on pericentral hepatocytes

   We first applied SpaTalk to the Slide-seq ST dataset (v1) of the mouse
   liver covering 17,545 unique genes among 25,595 spots in space
   (Fig. [120]3a). To explore the cell-type composition of Slide-seq data,
   a mouse liver scRNA-seq reference integrating the non-parenchymal cells
   from the Mouse Cell Atlas (MCA)^[121]39 and the parenchymal hepatic
   cells from the “[122]GSE125688
   [[123]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE125688]”
   dataset^[124]40 were utilized (Supplementary Fig. [125]4a), containing
   6,029 cells, including major immune cells such as macrophages (Macro),
   and the pericentral and periportal hepatocytes. The reconstructed
   single-cell ST atlas was perfectly accordant with the original outcome
   obtained by Slide-seq (Fig. [126]3b), wherein the expression of known
   marker genes^[127]41 and the percent for each cell type were highly
   correlated across spots (Supplementary Fig. [128]4b–d), such as the
   pericentrally and periportally zonated genes Cyp2e1 and Pck1
   (Fig. [129]3c). Immune cells were hardly observed in each spot, with
   pericentral and periportal hepatocytes accounting for the major
   proportion across spots (Supplementary Fig. [130]4e); the same
   phenomenon was observed in recently published ST data of the healthy
   liver^[131]42.

Fig. 3. Modulation of periportal hepatocytes on the metabolic
microenvironment sensed by pericentral hepatocytes.

   [132]Fig. 3
   [133]Open in a new tab

   a Slide-seq spot-based ST dataset of the mouse liver involving 25,595
   spots and 17,545 genes. b Cell-type decomposition by SpaTalk. PC,
   pericentral; PP, periportal; Hep, hepatocytes. c Scaled Pearson’s
   correlation coefficients between the expression of known marker genes
   and the percent for each cell type. Endo, endothelial cells; DC,
   dendritic cells; Macro, macrophages. d Enriched LRIs that mediate
   cell–cell communications between pericentral and periportal
   hepatocytes. e Significant differentially expressed genes (DEGs)
   between periportal and pericentral hepatocytes assessed with the
   Wilcoxon test and the corresponding significantly enriched biological
   processes and pathways determined with the Metascape web tool.
   Representative DEGs are labeled beside the heatmap. f Significantly
   activated pathways in pericentral (up) and periportal (down)
   hepatocytes determined by Gene Set Enrichment Analysis (GSEA). g
   Communications from periportal hepatocytes to pericentral hepatocytes
   mediated by the Apob-Cd36 interaction in space. h Intracellular signal
   pathway inferred by SpaTalk over Slide-seq data and the significantly
   enriched pathways over the ligand–receptor–target network. i Inferred
   module score of each hepatocyte type over the pathway signatures
   determined with Seurat.

   The cell–cell communications between pericentral and periportal
   hepatocytes were further explored by SpaTalk (Fig. [134]3d). Both
   hepatocyte types secrete and receive multiple ligands for their
   communication, forming spatially distributed metabolic cascades to
   cooperatively optimize the metabolic environment. For instance, with
   the gradient expression of enzymes in sequential lobule layers,
   pericentral hepatocytes perform the primary steroid, alcohol, and lipid
   metabolic processes, while periportal hepatocyte mainly carry out the
   small-molecule and monosaccharide biosynthetic processes, amino acid
   and triglyceride metabolic processes, and gluconeogenesis
   (Fig. [135]3e, f), in line with the variable functions of zonated
   hepatocytes residing in the central and portal veins^[136]43. Notably,
   the periportal hepatocytes substantially expressed more ligands,
   including epidermal growth factor (Egf), transforming growth factor
   alpha (Tgfa), heparin-binding EGF-like growth factor (Hbegf),
   insulin-like growth factor 1 (Igf1), and vascular endothelial growth
   factor A (Vegfa), to promote the growth of pericentral hepatocytes. As
   the blood flows from the portal vein toward the central vein, this
   could reflect the fact that periportal hepatocytes respire most of the
   oxygen, leading to decreased oxygen concentrations along the lobule
   axis; thus, periportal hepatocytes secrete numerous growth factors via
   paracrine signaling to modulate the pericentral hepatocytes for the
   prevention of hypoxia and the maintenance and amelioration of
   pericentrally metabolic functions^[137]43.

   Taking the LRI of Apob-Cd36 as an example, the spatially resolved
   cell–cell communications between periportal and pericentral hepatocytes
   mainly occurred across the mid-lobule layers (Fig. [138]3g). The gene
   product of Apob is an apolipoprotein of chylomicrons and low-density
   lipoproteins highly involved with the regulation of lipids and fatty
   acids metabolism through CD36, indicating the modulation of periportal
   hepatocytes on the metabolic microenvironment sensed by pericentral
   hepatocytes. From the reconstructed intracellular signal propagation
   network triggered by the Apob-Cd36 interaction (Fig. [139]3h),
   sequential target TFs were activated, including Ahr that regulates
   xenobiotic-metabolizing enzymes such as cytochrome P450, and Nr1h4 that
   regulates the expression of genes involved in bile acid synthesis and
   transport, in agreement with the corresponding module score of
   pericentral hepatocytes in space (Fig. [140]3i). The LRT network was
   also remarkably enriched in the AMPK and PPAR signaling pathways, which
   play crucial roles in the regulation of energy and metabolic
   homeostasis, suggesting the spatially fine-tuned cell–cell
   communications along the portal-central lobule axis for minimizing
   risks to pericentral hepatocytes^[141]43.

Identification of signal transmission among glomerular cells in kidney

   SpaTalk was then applied to investigate and visualize the
   intraglomerular communications over the Slide-seq ST dataset (v2) of
   the mouse kidney^[142]44 (Fig. [143]4a), including data of 20,591
   sequenced genes for 27,044 spots in space covering the spatial axis of
   collecting duct intercalated cells (CD-IC), collecting duct principal
   cells (CD-PC), distal convoluted tubules (DCT), endothelial cells
   (Endo), fibroblasts (FB), granular cells (GC), macrophages (Macro),
   mesangial cells (MC), immune cells, proximal convoluted tubules (PCT),
   podocytes (Pod), thick ascending limb (TAL), and vascular smooth muscle
   cells (vSMC) from cortex of kidney to renal medulla. As shown in
   Fig. [144]4b, SpaTalk identified the spatial signal transmission among
   Pod, MC, and Endo in glomerulus. For example, the direct cell–cell
   communication mediated by the pleiotrophin (Ptn) and protein tyrosine
   phosphatase receptor type B (Ptprb) interaction was observed from MC
   and Pod to Endo, wherein Ptn is a secreted growth factor that can bind
   Ptprb, which is known to be involved with adherens junction stimulating
   endothelial cell migration and maintaining proper glomerular
   function^[145]45. Besides, collagen and Notch signaling were also
   identified forming the matrix that provides structural support for the
   glomerulus, which are necessary for proper glomerular basement membrane
   formation and glomerular development^[146]45,[147]46. Concordantly,
   these identified LRIs among glomerular cells are associated with
   multiple biological processes and pathways that play vital roles in the
   regulation of physiological kidney development and glomerular
   filtration function in the urinary system^[148]47,[149]48, including
   tube morphogenesis, positive regulation of cell adhesion, urogenital
   system development, and morphogenesis of a branching epithelium
   (Fig. [150]4c).

Fig. 4. Identification of spatial signal transmission among glomerular cells
in kidney.

   [151]Fig. 4
   [152]Open in a new tab

   a Slide-seq (v2) ST dataset of the mouse kidney involving 27044 spots
   and 20591 genes. CD-IC, collecting duct intercalated cells; CD-PC,
   collecting duct principal cells; DCT, distal convoluted tubules; Endo,
   endothelial cells; FB, fibroblasts; GC, granular cells; Macro,
   macrophages; MC, mesangial cells; PCT, proximal convoluted tubules;
   Pod, podocytes; TAL, thick ascending limb; vSMC, vascular smooth muscle
   cells. b Significantly enriched LRIs that mediate cell–cell
   communications among MC, Endo, and Pod inferred by SpaTalk with
   P < 0.05. The P-value represents the significance of spatial proximity
   of LRIs using the permutation test. Top 20 LRI pairs were plotted for
   Pod-Endo communications. c Significantly enriched Gene ontology (GO)
   biological processes determined with the Metascape web tool for the
   ligands and receptors from MC and Pod to Endo inferred by SpaTalk. d
   Spatial distribution of the Vegfa-Kdr pairs between the Pod senders and
   Endo receivers. e Number of Vegfa-Kdr pairs from Pod to Endo in space
   across other slides of mouse kidney. f Mouse kidney ST dataset
   generated from 10x Visium involving 3124 spots and 19465 genes and the
   reconstructed single-cell ST data by SpaTalk. The percent of MC, Endo,
   and Pod as well as the expression of the corresponding known markers
   were plotted. PT, proximal tubule; CNT, connecting tubule; LHDL, loop
   of Henle descending loop; LHAL, loop of Henle ascending loop. g
   Significantly enriched LRIs that mediate Pod-Endo communications. Top
   20 LRIs (left) and spatial distribution of the Angpt1-Tek pairs between
   the Pod senders and Endo receivers were plotted. h Communications of
   Endo-MC mediated by the Pdgfb-Pdgfrb interaction in space and the
   spatial distances of Pdgfb-Pdgfrb in Endo-MC and all cell–cell pairs,
   respectively. P-value was calculated with the one-sided t-test. The
   numbers of data points (minima, 25th percentile, median, 75th
   percentile, and maxima) for the boxplots from left to right are
   2,237,963 and 405,756, respectively.

   Notably, the Pod-Endo communication mediated by Vegfa signaling was
   significantly enriched in space (Fig. [153]4b), in accordance with the
   fact that Vegfa is vital for the formation and maintenance of select
   microvascular beds within the kidney^[154]49. It is known that Vegfa,
   normally produced by healthy podocytes, has been shown to be a critical
   regulator of glomerular development and function and precise expression
   of the amount of Vegfa is required for adequate barrier
   function^[155]50. As the kinase insert domain receptor (Kdr) of Vegfa,
   Kdr functions as the main mediator of VEGF-induced endothelial
   proliferation, survival, migration, tubular morphogenesis and
   sprouting. Concordantly, the Pod-Endo communication mediated by
   Vegfa-Kdr was also identified in other mouse kidney ST data
   (Fig. [156]4e). In addition, most shared LRIs mediating the
   intraglomerular communications in space were also observed across
   slides of human and mouse kidney (Supplementary Fig. [157]5a),
   suggesting the robustness and universality of the spatially resolved
   cell–cell communications inferred by SpaTalk.

   We then applied SpaTalk to another spot-based ST dataset of the mouse
   kidney (10x Visium), reaching up to 19,465 unique genes among 3,124
   spots in space (Fig. [158]4f). Leveraging previously published adult
   mouse kidney cell taxonomy by single-nucleus RNA-seq data^[159]51
   (Supplementary Fig. [160]5b), SpaTalk reconstructed the spatial
   transcriptomics atlas at single-cell resolution for Slide-seq data,
   which showed consistent spatial localization of glomerular and other
   cells (Fig. [161]4f and Supplementary Fig. [162]5c, d). Compared to
   Slide-seq (v2) data, proximal tubules (PT), connecting tubule (CNT),
   loop of Henle descending loop (LHDL), and loop of Henle ascending loop
   (LHAL) were also observed in the 10x Visium data. Consistently, most
   LRIs mediating intraglomerular communications in Slide-seq (v2) data
   were also found in 10x Visium data (Fig. [163]4g and Supplementary
   Fig. [164]5e). Similar to the Vegfa paracrine system, Angiopoietin-1
   (Angpt1) is expressed by podocytes and its cognate tyrosine kinase
   receptor (Tek) is expressed by the glomerular endothelial cells, which
   plays an indispensable role in glomerular health and maintenance of the
   filtration barrier in physiological conditions^[165]47. Moreover, the
   direct communication between Endo to MC mediated by platelet-derived
   growth factor B and its receptor (Pdgfb-Pdgfrb) was significantly
   enriched (Fig. [166]4h), in accordance with the classical studies that
   have delineated a key role for the Pdgfb in communication between the
   glomerular endothelium and nearby mesangial cells^[167]45,[168]47.

Spatial characterization of cell types over 10x Visium data

   Given the widely used 10x Visium tool in ST studies, we applied SpaTalk
   to another human skin squamous cell carcinoma (SCC) ST dataset
   published by Ji et al.^[169]22, who profiled SCC and matched normal
   tissues via 10x scRNA-seq and used Visium to identify a tumor-specific
   keratinocyte (TSK) in the tumor (Fig. [170]5a). Using the matched SCC
   scRNA-seq data of patient 2 as reference, the optimal cell-type
   composition for each spot was deconvoluted by SpaTalk, which exhibited
   a similar characterization with that histologically assessed from
   hematoxylin and eosin-stained frozen sections (Fig. [171]5b). The
   percent of TSK inferred by our method was compared to the TSK score
   based on markers (e.g., MMP10, PTHLH, LAMC2, and IL24) defined by Ji et
   al.^[172]22 across 646 spots (Fig. [173]5c). A high correlation between
   the TSK percent and score was observed (Fig. [174]5d). Moreover, the
   percent of inferred cell types was prominently associated with the
   expression of known marker genes, indicating the accuracy of SpaTalk
   for cell-type decomposition (Supplementary Fig. [175]6a, b).

Fig. 5. Spatial characterization of tumor and stromal cells in human squamous
cell carcinoma Visium data with SpaTalk.

   [176]Fig. 5
   [177]Open in a new tab

   a Visium spot-based ST dataset of human skin SCC in patient 2 with the
   matched scRNA-seq dataset involving the main keratinocytes (KC),
   stromal cells, and immune cells. b Cell-type decomposition by SpaTalk.
   Cyc, cycling; Diff, differentiating; NK, natural killer; FB,
   fibroblasts; TSK, tumor-specific keratinocytes. c TSK percent and TSK
   score across spatial spots. The expression of known TSK markers is
   plotted. d Pearson’s correlation coefficient between the TSK percent
   and TSK score. The gray band represents the 95% confidence interval of
   the mean value for the fitting straight line. e Cell-type decomposition
   by SpaTalk at single-cell resolution for the spot-based human skin SCC
   ST data. f Contour plot of TSK, FB, and Endo based on the reconstructed
   single-cell ST atlas by SpaTalk. g TSK leading spots with a TSK score
   >0.8 in space. The bar chart represents the number of different cell
   types and the line chart represents the number of neighbors adjacent to
   TSKs among the TSK leading spots. h Visium spot-based ST dataset of
   human skin SCC in patient 10 and the cell-type decomposition by SpaTalk
   showing the percent of TSKs across 621 spatial spots.

   Next, we reconstructed the single-cell ST profile by assuming a total
   of 30 cells in each spot according to a recent review^[178]11, which
   covered the main epithelial cells, including differentiating, cycling,
   and basal keratinocytes; melanocytes; fibroblasts (FB); Endo; natural
   killer (NK) cells; and T cells (Fig. [179]5e). Despite the asymmetrical
   distribution for most cell types, TSK, FB, and Endo showed specific
   patterns of locations in space, which were highly adjacent in some
   tumor areas (Fig. [180]5f), forming direct cell–cell communications in
   the tumor microenvironment (TME). By filtering cells from the TSK
   leading spots (score ≥ 0.8), we found that TSKs reside within a
   fibrovascular niche, resulting in high co-localization of TSKs, FB, and
   Endo at the TSK leading spots (Fig. [181]5g), in line with the previous
   findings^[182]22. Moreover, we used SpaTalk to investigate the
   cell-type composition over the ST data of another SCC patient. Despite
   a low percentage across 621 spatial spots, most TSKs centered on a
   handful of corner spots in space, exhibiting a highly consistent
   distribution with TSK scores (Fig. [183]5h and Supplementary
   Fig. [184]6c). Unsurprisingly, the fibrovascular niche was also
   observed in the TSK leading spots of patient 10 with clear spatial
   co-localization (Supplementary Fig. [185]6d), indicating the close
   cell–cell communication among TSKs, FB, and Endo in the TME underlying
   the occurrence and development of SCC.

Reconstruction of TSK-stroma communications in space

   To dissect the underlying LRI mediating the spatially resolved
   cell–cell communications between TSKs and stromal cells of the
   fibrovascular niche in the TME, we applied SpaTalk to infer the
   communications between TSK-FB and TSK-Endo pairs over the decomposed
   single-cell ST data of SCC in patient 2, including the top-ranked 20
   LRIs based on the integrated inter-cellular and intracellular scores
   (Fig. [186]6a). Consistent with a TSK-fibrovascular niche, prominent
   TSK signaling to FB and Endo was mediated by several common
   ligand–receptor pairs, including VEGFA-NPR1, VEGFB-NPR1, PGF-SDC1, and
   CDH1-ITGAE, associated with tumor angiogenesis. Additionally, TSKs
   modulate FB through secreting matrix metallopeptidase (MMP)1 and MMP9,
   which are linked to tumor metastasis via cellular movement and
   extracellular matrix (ECM) disassembly. Conversely, FB and Endo
   prominently co-expressed numerous ligands such as MDK, HGF, HMGB1, and
   THBS1, matching TSK receptors that promote the proliferation and
   differentiation of TSKs (Fig. [187]6b and Supplementary Table [188]3).
   Further supporting TSKs as an epithelial mesenchymal transition
   (EMT)-like population, SpaTalk predicted that the widely expressed
   TGFB1 regulates TSKs. TSK receptors corresponding to additional ligands
   from FB and Endo included several integrins (e.g., ITGA5 and ITGB6) and
   nectins (e.g., NECTIN1 and NECTIN2), highlighting other pathways
   associated with EMT and epithelial tumor invasion^[189]52,[190]53.
   Notably, similar LRIs mediating the TSK-stroma communications in space
   were also observed in another SCC patient (Supplementary Fig. [191]7a).

Fig. 6. Reconstruction of cell–cell communications between TSK subpopulations
and stromal cells in space with SpaTalk.

   [192]Fig. 6
   [193]Open in a new tab

   a Top 20 inferred LRIs that mediate cell–cell communication from the
   TSK senders to the FB and Endo receivers. Colored blocks represent the
   known ligand–receptor pairs in CellTalkDB. The asterisk represents the
   significantly enriched LRIs determined by SpaTalk. b Top 20 inferred
   LRIs that mediate cell–cell communication from the FB and Endo senders
   to TSK receivers. c Region of interest (ROI) covering 42 spatial spots
   in space with a high total score of TSK, FB, and Endo according to
   their signature genes. The point plot shows the decomposed single-cell
   ST atlas determined by SpaTalk. The number sign represents the receptor
   contributing to the proliferation and differentiation of TSK. d
   Expression of known cancer-associated fibroblast (CAF) markers across
   TSKs, FB, Endo, and other cells. e Differentially expressed genes
   (DEGs) between EMT-like and EMT-unlike TSKs and the corresponding
   enriched biological processes and pathways. Representative DEGs are
   labeled beside the point. f Number of cell–cell pairs over the LRIs
   from the EMT-like and EMT-unlike TSKs to CAFs. g Communications from
   EMT-like and EMT-unlike TSKs to CAFs mediated by the MMP1-CD44
   interaction in space. h Comparison of communications among CAFs, Endo,
   EMT-like TSKs, and EMT-unlike TSKs with respect to the number of
   cell–cell pairs in space over the matched LRIs evaluated with a paired
   two-sided Wilcoxon test.

   Next, we focused on a region of interest (ROI) covering 42 spatial
   spots in space for in-depth exploration of TSK-stroma communications,
   which exhibited a high total score of TSK, FB, and Endo according to
   their signature genes, occupying the major part in the ROI
   (Fig. [194]6c). By mapping cancer-associated fibroblast (CAF) markers
   to the cells in space, the majority of FB in the ROI highly expressed
   the known CAF marker genes (e.g., VIM, FAP, POSTIN, and SPARC)
   (Fig. [195]6d), hinting at the transformation of FB to CAFs induced by
   the adjacent TSKs and conversely supporting the stemness of TSKs via
   direct cell–cell communication in space (Supplementary Fig. [196]7b).
   Notably, the TSKs appear to be heterogeneous with respect to the broad
   range of EMT scores in the ROI; thus, TSKs were further classified into
   268 EMT-like and 268 EMT-unlike populations (Supplementary
   Fig. [197]7c). By comparing their differentially expressed genes,
   EMT-like TSKs were dramatically enriched with ECM organization,
   proteoglycans in cancer, regulation of cell adhesion, and the
   VEGFA-VEGFR2 signaling pathway, representing more invasive properties
   compared with EMT-unlike TSKs (Fig. [198]6e).

   Additionally, EMT-like TSKs appear to be more communicative with
   surrounding CAFs in the TME in light of the greater number of cell–cell
   pairs over the LRIs that prominently mediate TSK-stroma communications
   in space, such as LAMB3-ITGB1, LAMA3-ITGB1, LAMC2-ITGB1, and MMP1-CD44
   (Fig. [199]6f and Supplementary Fig. [200]7d–f). Metastasis-related
   laminins are essential for formation and function of the basement
   membrane^[201]54, whereas MMPs are involved in ECM breakdown, both
   contributing to the aggravated malignancy of tumors. Moreover, CD44
   expression on CAFs plays a supporting role in the induction of cellular
   stemness^[202]55, wherein CAFs have a preference of cell–cell
   communications with EMT-like TSKs in space (Fig. [203]6g).
   Interestingly, EMT-unlike TSKs notably exhibited more communicative
   cell–cell pairs with Endo, whereas EMT-like TSKs exhibited
   significantly more communicative cell–cell pairs with CAFs over the
   matched LRIs (Fig. [204]6h), consistent with the observed contribution
   of CAFs to EMT in a broad range of tumors^[205]56,[206]57.

Discussion

   We have demonstrated the capabilities of SpaTalk to infer and visualize
   spatially resolved cell–cell communications mediated by significantly
   enriched LRIs under normal and disease states over existing
   representative datasets, including the single-cell ST data generated
   from STARmap, MERFISH, and seqFISH + , and the spot-based ST data
   obtained via Slide-seq and 10x Visium.

   There are two principles to decode the mechanisms of cell–cell
   communications: ligand–receptor proximity and LRT
   co-expression^[207]11. In a given tissue niche, cells are more likely
   to communicate with each other when they are spatially adjacent and
   activate downstream target genes in the receiving cell triggered by the
   LRI in proximal cells; thus, ST data are well suited to apply the two
   principles for inferring inter-cellular communications. Accordingly,
   our proposed SpaTalk realizes the integration of these two principles
   by incorporating the KNN and cell–cell graph network to filter
   spatially proximal cell pairs and corresponding LRIs, followed by
   utilizing the knowledge graph algorithm to model the LRT signal
   propagation process. Consequently, the performance of SpaTalk was
   superior to that of other methods over the benchmarked ST datasets with
   respect to several evaluation indices, demonstrating the reliability of
   the two principles in decoding cellular cross-talk, especially for
   juxtracrine and paracrine communication.

   Importantly, SpaTalk is applicable to either single-cell or spot-based
   ST datasets generated from mainstream ST technologies. For the former,
   SpaTalk assigns a label to each cell by selecting similar cell types
   with the top-ranked weight via NNLM for single-cell ST data, generating
   the ST atlas at single-cell resolution with known cell types for the
   subsequent inference of cell–cell communications. For spot-based ST
   data, SpaTalk selects and maps the optimal combination of cells in
   accordance with the decomposed optimal weight/percent of cell types via
   NNLM and the transcriptome profiles of spatial spots to reconstruct the
   ST atlas at single-cell resolution with known cell types. Notably,
   applications of SpaTalk to the mouse cortex and liver datasets
   sequenced by STARmap and Slide-seq, respectively, revealed the
   evidential LRIs in space that mediate the spatially resolved cell–cell
   communications contributing to normal physiological processes.
   Moreover, exploration of SpaTalk on the human skin SCC dataset obtained
   from 10x Visium identified the variable preference of communication
   among tumor subpopulations, CAF, and Endo. These cases convincingly
   demonstrate the universality of SpaTalk in decoding the mechanism of
   cell–cell communications in space underlying normal and disease tissues
   for single-cell and spot-based ST data.

   As unmatched scRNA-seq and ST data would directly influence the
   cell-type decomposition, an important feature of SpaTalk is the ability
   to assign a spot/cell into an unsure category considering the unseen
   cell types in the scRNA-seq reference. For example, the scRNA-seq
   reference for the Slide-seq mouse cortex ST data was obtained from
   another repository for the same tissue, resulting in numerous unsure
   cells by assuming one cell in each spot in terms of the high resolution
   (10 μm) of Slide-seq technology that almost approaches single-cell
   resolution. Additionally, the low gene coverage of several spatial
   spots severely affects the regression model, which were regarded as the
   unsure type by SpaTalk. However, with respect to the human skin SCC
   datasets, SpaTalk removed the unsure type for the matched scRNA-seq and
   ST data. As the matched multi-modal datasets will undoubtedly become
   greater in number, application of SpaTalk and similar methods will be
   required for accurate inference of spatially resolved cell–cell
   communications.

   Additionally, SpaTalk characterizes the spatial distribution for each
   cell type within the reconstructed ST at single-cell resolution through
   the contour plot of cellular density in space, which enables analyzing
   the proximal relationship between paired cell types. Moreover, SpaTalk
   enables the statistical analyses and visualization of spatially
   proximal LRIs in space, forming a dynamic cell–cell communication
   network. Currently, it is hard to analyze and visualize the LRI at
   single-cell resolution for scRNA-seq data, wherein the common practice
   is to interpret the LRI for paired cell types. By incorporating spatial
   information, SpaTalk displays the enriched LRI at single-cell
   resolution via the spatially proximal co-expressed cell pairs, offering
   an informative approach for the analysis and visualization of the LRI
   and its mediated cell–cell communication underlying the disease
   pathology from a different perspective, as shown in the application of
   SpaTalk to the human skin SCC datasets.

   Adding spatial constraints in cell–cell communication inference is
   critical to the spatial analysis of juxtracrine and paracrine
   communications. However, this constraint inevitably causes the failure
   of inferring long-range communications such as endocrine and telecrine
   signaling. Classification of LRIs into short-range and long-range
   communications with prior knowledge might be helpful to infer the
   comprehensive communication categories computationally. Moreover, it is
   potentially beneficial to include other omics data with the increasing
   multi-modal datasets generated from state-of-the-art technologies such
   as 10x Multiome and Digital Spatial Profiling^[208]58 in studying
   spatially regulated cell–cell communications. Thus, more reliable
   computational models might be needed for more accurate integration of
   multi-modal data and inference.

Methods

Datasets

   For STARmap^[209]19, the single-cell ST data of the mouse cortex
   (20180410-BY3_1kgenes) was obtained from a “public data portal
   [[210]https://www.dropbox.com/sh/f7ebheru1lbz91s/AABYSSjSTppBmVmWl2H4s_
   K-a?dl=0]”. For MERFISH^[211]29, the single-cell ST data of the naïve
   female mouse (Animal_ID: 1, Bregma: 0.26) hypothalamic preoptic region
   was downloaded from “Dryad
   [[212]https://datadryad.org/stash/dataset/doi:10.5061/dryad.8t8s248]”.
   For seqFISH+^[213]30, the single-cell ST data of the mouse cortex and
   olfactory bulb were retrieved from the “Github repository
   [[214]https://github.com/CaiGroup/seqFISH-PLUS]”. For
   Slide-seq^[215]20,[216]21, the spot-based ST data of the mouse liver
   (Puck_180803_8) and somatosensory cortex (Puck_200306_03) were obtained
   from the Broad Institute Single Cell Portal “SCP354
   [[217]https://singlecell.broadinstitute.org/single_cell/study/SCP354/sl
   ide-seq-study]” and “SCP815
   [[218]https://singlecell.broadinstitute.org/single_cell/study/SCP815/hi
   ghly-sensitive-spatial-transcriptomics-at-near-cellular-resolution-with
   -slide-seqv2]”, respectively, and the human and mouse kidney ST
   datasets^[219]44 were collected “here
   [[220]https://cellxgene.cziscience.com/collections/8e880741-bf9a-4c8e-9
   227-934204631d2a]”. For 10x Visium, the ST data and scRNA-seq data of
   human SCC were downloaded from the Gene Expression Omnibus (GEO)
   repository: “[221]GSE144240”^[222]22, and the mouse kidney ST and
   single-nucleus RNA-sequencing data were obtained from 10x Visium
   Spatial Gene Expression of “‘Adult Mouse Kidney (FFPE)’
   [[223]https://www.10xgenomics.com/resources/datasets]” and
   “[224]GSE119531”^[225]51, respectively. The mouse liver scRNA-seq data
   of non-parenchymal and parenchymal hepatic cells were refined from the
   “MCA [[226]https://figshare.com/articles/MCA_DGE_Data/5435866]”^[227]39
   and “[228]GSE125688”^[229]40, respectively. The mouse cortex scRNA-seq
   data were obtained from “[230]GSE71585”^[231]59.

Data processing

   For the liver scRNA-seq datasets, the non-parenchymal cells and
   parenchymal hepatic cells were collected from the MCA^[232]39 and
   “[233]GSE125688”^[234]40, respectively, wherein hepatocytes were
   classified into pericentral and periportal hepatocytes with principal
   component analyses and clustering analysis. For the MERFISH dataset,
   ependymal cells were excluded due to the limited cell number in the
   section (<2). For human and mouse kidney ST data sequenced by Slide-seq
   (v2), datasets with at least 15 mesangial cells were included. For
   other datasets, all cells were included in the filtered matrices. Human
   and mouse gene symbols were revised in accordance with “NCBI gene data
   [[235]https://www.ncbi.nlm.nih.gov/gene/]” updated on June 30, 2021,
   wherein unmatched genes and duplicated genes were removed. For all ST
   and scRNA-seq datasets, the raw counts were normalized via the
   global-scaling normalization method LogNormalize in preparation for
   running the subsequent scDeepSort pipeline.

SpaTalk algorithm

   The SpaTalk model consists of two components: cell-type decomposition
   and spatial LRI enrichment. The first component is to infer cell-type
   composition for single-cell or spot-based ST data, and the second
   component is to infer spatially proximal ligand–receptor interactions
   that mediate cell–cell communications in space.

Cell-type decomposition

   To dissect the cell-type composition for the ST data matrix T [n × s]
   (n genes and s spots/cells), NNLM was first applied to obtain the
   optimal proportion of cell types using the scRNA-seq data matrix
   S[n × c] (n genes and c cells) as the reference with k cell types. Let
   [MATH:
   <mi>Y</mi><mo>=</mo><mrow><mo>{</mo><mrow><msub><mrow><mi>y</mi></mrow>
   <mrow><mn>1</mn></mrow></msub><mo>,</mo><mspace
   width="0.25em"></mspace><msub><mrow><mi>y</mi></mrow><mrow><mn>2</mn></
   mrow></msub><mo>,</mo><mo>…</mo><mo>,</mo><mspace
   width="0.25em"></mspace><msub><mrow><mi>y</mi></mrow><mrow><mi>n</mi></
   mrow></msub></mrow><mo>}</mo></mrow> :MATH]
   be the expression profile for each spot/cell to establish the following
   linear model:
   [MATH: <mi>Y</mi><mo>=</mo><mi>X</mi><mi
   mathvariant="normal">β</mi><mo>+</mo><mi mathvariant="normal">ε</mi>
   :MATH]
   1

   where X = [n × k] is the average expression profile generated from S
   and ε represents random error. Mean relative entropy loss was then used
   to measure the difference between the predicted and observed values.
   Therefore, the objective function can be written as:
   [MATH: <mi>arg</mi><mi>min</mi><mfenced close="}" open="{"><mrow><mi
   mathvariant="normal">β</mi><mo>≥</mo><mn>0</mn></mrow></mfenced><mi
   mathvariant="normal">L</mi><mfenced close=")"
   open="("><mrow><mi>Y</mi><mo>−</mo><mi>X</mi><mi
   mathvariant="normal">β</mi></mrow></mfenced><mo>+</mo><msub><mrow><mi
   mathvariant="normal">λ</mi></mrow><mrow><mn>1</mn></mrow></msub><msub><
   mrow><mi
   mathvariant="normal">R</mi></mrow><mrow><mn>1</mn></mrow></msub><mrow><
   mo>(</mo><mrow><mi
   mathvariant="normal">β</mi></mrow><mo>)</mo></mrow><mo>+</mo><msub><mro
   w><mi mathvariant="normal">λ</mi></mrow><mrow><mi
   mathvariant="normal">α</mi></mrow></msub><msub><mrow><mi
   mathvariant="normal">R</mi></mrow><mrow><mi
   mathvariant="normal">α</mi></mrow></msub><mrow><mo>(</mo><mrow><mi
   mathvariant="normal">β</mi></mrow><mo>)</mo></mrow><mo>+</mo><msub><mro
   w><mi
   mathvariant="normal">λ</mi></mrow><mrow><mn>2</mn></mrow></msub><msub><
   mrow><mi
   mathvariant="normal">R</mi></mrow><mrow><mn>2</mn></mrow></msub><mrow><
   mo>(</mo><mrow><mi mathvariant="normal">β</mi></mrow><mo>)</mo></mrow>
   :MATH]
   2

   where R[1], R[α], and R[2] represent the L1, angle, and L2
   regularization with non-negative λ[1], λ[α], and λ[2] initialized by
   zero^[236]23,[237]24. The model was trained with the above objective
   function using Lee’s multiplicative iteration algorithm^[238]25 with
   default hyperparameters until convergence or after 10,000 iterations to
   generate the coefficient matrix C[k × s].

   For single-cell ST data, the cell type with the maximum coefficient was
   assigned to each cell. For spot-based ST data, let M be the maximum
   cell number for each spot, which was set to 30 for 10x Visium data and
   was set to 1 for Slide-seq data according to a recent review^[239]12.
   In practice, the optimal cellular combination ω for each spot was
   determined by the following function:
   [MATH: <msub><mrow><mi
   mathvariant="normal">ω</mi></mrow><mrow><mi>i</mi></mrow></msub><mfence
   d close=")" open="("><mrow><mi>i</mi><mo>∈</mo><mfenced close="}"
   open="{"><mrow><mn>1</mn><mo>,</mo><mspace
   width="0.25em"></mspace><mn>2</mn><mo>,</mo><mo>…</mo><mo>,</mo><mspace
   width="0.25em"></mspace><mi>k</mi></mrow></mfenced></mrow></mfenced><mo
   >=</mo><mfenced open="{"><mrow><mtable><mtr><mtd
   columnalign="center"><mfenced close="]"
   open="["><mrow><msub><mrow><mi>M</mi><mi
   mathvariant="normal">β</mi></mrow><mrow><mi>i</mi></mrow></msub></mrow>
   </mfenced><mo>+</mo><mn>1</mn></mtd><mtd columnalign="right"><mfenced
   close=")" open="("><mrow><mfenced close="}"
   open="{"><mrow><msub><mrow><mi>M</mi><mi>β</mi></mrow><mrow><mi>i</mi><
   /mrow></msub></mrow></mfenced><mo>≥</mo><mn>0.5</mn></mrow></mfenced></
   mtd></mtr><mtr><mtd columnalign="center"><mfenced close="]"
   open="["><mrow><msub><mrow><mi>M</mi><mi
   mathvariant="normal">β</mi></mrow><mrow><mi>i</mi></mrow></msub></mrow>
   </mfenced></mtd><mtd columnalign="right"><mfenced close=")"
   open="("><mrow><mfenced close="}"
   open="{"><mrow><msub><mrow><mi>M</mi><mi>β</mi></mrow><mrow><mi>i</mi><
   /mrow></msub></mrow></mfenced><mspace
   width="0.25em"></mspace><mo><</mo><mspace
   width="0.25em"></mspace><mn>0.5</mn></mrow></mfenced></mtd></mtr></mtab
   le></mrow></mfenced> :MATH]
   3

   wherein
   [MATH: <mrow><mo>[</mo><mrow><msub><mrow><mi>M</mi><mi
   mathvariant="normal">β</mi></mrow><mrow><mi>i</mi></mrow></msub></mrow>
   <mo>]</mo></mrow> :MATH]
   and
   [MATH: <mrow><mo>{</mo><mrow><mi>M</mi><msub><mrow><mi
   mathvariant="normal">β</mi></mrow><mrow><mi>i</mi></mrow></msub></mrow>
   <mo>}</mo></mrow> :MATH]
   represent the integer and fractional parts of
   [MATH: <msub><mrow><mi>M</mi><mi
   mathvariant="normal">β</mi></mrow><mrow><mi>i</mi></mrow></msub> :MATH]
   , respectively. For each spot, we randomly selected m (
   [MATH:
   <mi>m</mi><mo>=</mo><msubsup><mrow><mo>∑</mo></mrow><mrow><mi>i</mi><mo
   >=</mo><mn>1</mn></mrow><mrow><mi>k</mi></mrow></msubsup><msub><mrow><m
   i mathvariant="normal">ω</mi></mrow><mrow><mi>i</mi></mrow></msub>
   :MATH]
   ) cells from S to compare their merged expression profile ϵ with the
   ground truth according to the following function:
   [MATH: <mi
   mathvariant="normal">argmin</mi><mrow><mo>{</mo><mrow><mi>m</mi><mo>≤</
   mo><mi>M</mi></mrow><mo>}</mo></mrow><msubsup><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow>
   <mrow><mi>n</mi></mrow></msubsup><msup><mrow><mrow><mo>(</mo><mrow><msu
   b><mrow><mi>Y</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>−</mo><msubs
   up><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>j</mi><mo>=</mo><mn>1</mn></mrow>
   <mrow><mi>m</mi></mrow></msubsup><msubsup><mrow><mi
   mathvariant="normal">ϵ</mi></mrow><mrow><mi>i</mi></mrow><mrow><mi>j</m
   i></mrow></msubsup></mrow><mo>)</mo></mrow></mrow><mrow><mn>2</mn></mro
   w></msup> :MATH]
   4

   To assign a coordinate
   [MATH: <mrow><mo>(</mo><mrow><mover
   accent="true"><mrow><mi>x</mi></mrow><mrow><mo>^</mo></mrow></mover><mo
   >,</mo><mover
   accent="true"><mrow><mi>y</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mo>)</mo></mrow> :MATH]
   for each sampled cell, we proposed a probabilistic distribution for a
   given cell in each spot (x[0], y[0]) by considering the ratio (R) of
   the same cell type in Q neighbor spots as the probability to locate the
   cell into the space with the following function:
   [MATH: <mtable><mtr><mtd columnalign="center"><mover
   accent="true"><mrow><mi>x</mi></mrow><mrow><mo>^</mo></mrow></mover><mo
   >=</mo><msub><mrow><mi>x</mi></mrow><mrow><mn>0</mn></mrow></msub><mo>+
   </mo><mi
   mathvariant="normal">α</mi><msub><mrow><mi>d</mi></mrow><mrow><mi>min</
   mi></mrow></msub><mi>cos</mi><mrow><mo>(</mo><mrow><mi
   mathvariant="normal">θ</mi><mi
   mathvariant="normal">π</mi><mo>/</mo><mn>180</mn></mrow><mo>)</mo></mro
   w><mo>/</mo><mn>2</mn></mtd></mtr><mtr><mtd columnalign="center"><mover
   accent="true"><mrow><mi>y</mi></mrow><mrow><mo>^</mo></mrow></mover><mo
   >=</mo><msub><mrow><mi>y</mi></mrow><mrow><mn>0</mn></mrow></msub><mo>+
   </mo><msub><mrow><mi
   mathvariant="normal">α</mi><mi>d</mi></mrow><mrow><mi>min</mi></mrow></
   msub><mi>sin</mi><mrow><mo>(</mo><mrow><mi
   mathvariant="normal">θ</mi><mi
   mathvariant="normal">π</mi><mo>/</mo><mn>180</mn></mrow><mo>)</mo></mro
   w><mo>/</mo><mn>2</mn></mtd></mtr></mtable> :MATH]
   5

   where d[min] represents the spatial distance of the closest neighbor
   spot, and α
   [MATH: <mo>∈</mo> :MATH]
   (0, 1] and θ∈(0, 360] mean the weight for d[min] and the angle towards
   the spot center (x[0], y[0]), respectively. In detail, the θ were first
   determined by the following probability equation:
   [MATH: <mover
   accent="true"><mrow><mi>P</mi></mrow><mrow><mo>^</mo></mrow></mover><mf
   enced close=")" open="("><mrow><mi
   mathvariant="normal">θ</mi></mrow></mfenced><mo>=</mo><mfrac><mrow><msu
   b><mrow><mi>R</mi></mrow><mrow><mi>q</mi></mrow></msub><mo>+</mo><mn>1<
   /mn></mrow><mrow><msubsup><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow>
   <mrow><mi>Q</mi></mrow></msubsup><mfenced close=")"
   open="("><mrow><msub><mrow><mi>R</mi></mrow><mrow><mi>i</mi></mrow></ms
   ub><mo>+</mo><mn>1</mn></mrow></mfenced></mrow></mfrac><mo>,</mo><mspac
   e width="0.25em"></mspace><mi
   mathvariant="normal">θ</mi><mo>∈</mo><mrow><mo>(</mo><mrow><mn>90</mn><
   mi>q</mi><mo>−</mo><mn>90</mn><mo>,</mo><mspace
   width="0.25em"></mspace><mn>90</mn><mi>q</mi></mrow><mo>]</mo></mrow>
   :MATH]
   6

   where q represents the q[th] neighbor spot in Q, which was set to 4 in
   practice denoting that the space centered the spot were split into four
   areas and the nearest neighbor in each area was filtered. After
   determining the θ, the corresponding neighbor spot (x[θ], y[θ]) was
   selected to determine the probabilistic distribution of α by the
   following equation:
   [MATH: <mover
   accent="true"><mrow><mi>P</mi></mrow><mrow><mo>^</mo></mrow></mover><mf
   enced close=")" open="("><mrow><mi
   mathvariant="normal">α</mi></mrow></mfenced><mo>=</mo><mfenced
   open="{"><mrow><mtable><mtr><mtd
   columnalign="center"><mrow><mo>(</mo><mrow><msub><mrow><mi>R</mi></mrow
   ><mrow><msub><mrow><mi>x</mi></mrow><mrow><mn>0</mn></mrow></msub><mo>,
   </mo><msub><mrow><mi>y</mi></mrow><mrow><mn>0</mn></mrow></msub></mrow>
   </msub><mo>+</mo><mn>1</mn></mrow><mo>)</mo></mrow><mo>/</mo><mrow><mo>
   (</mo><mrow><msub><mrow><mi>R</mi></mrow><mrow><msub><mrow><mi>x</mi></
   mrow><mrow><mn>0</mn></mrow></msub><mo>,</mo><msub><mrow><mi>y</mi></mr
   ow><mrow><mn>0</mn></mrow></msub></mrow></msub><mo>+</mo><msub><mrow><m
   i>R</mi></mrow><mrow><msub><mrow><mi>x</mi></mrow><mrow><mi
   mathvariant="normal">θ</mi></mrow></msub><mo>,</mo><msub><mrow><mi>y</m
   i></mrow><mrow><mi
   mathvariant="normal">θ</mi></mrow></msub></mrow></msub><mo>+</mo><mn>2<
   /mn></mrow><mo>)</mo></mrow><mo>,</mo><mspace
   width="0.25em"></mspace><mi
   mathvariant="normal">α</mi><mo>∈</mo><mrow><mo>(</mo><mrow><mn>0</mn><m
   o>,</mo><mspace
   width="0.25em"></mspace><mn>0.5</mn></mrow><mo>]</mo></mrow></mtd></mtr
   ><mtr><mtd
   columnalign="center"><mrow><mo>(</mo><mrow><msub><mrow><mi>R</mi></mrow
   ><mrow><msub><mrow><mi>x</mi></mrow><mrow><mi
   mathvariant="normal">θ</mi></mrow></msub><mo>,</mo><msub><mrow><mi>y</m
   i></mrow><mrow><mi
   mathvariant="normal">θ</mi></mrow></msub></mrow></msub><mo>+</mo><mn>1<
   /mn></mrow><mo>)</mo></mrow><mo>/</mo><mrow><mo>(</mo><mrow><msub><mrow
   ><mi>R</mi></mrow><mrow><msub><mrow><mi>x</mi></mrow><mrow><mn>0</mn></
   mrow></msub><mo>,</mo><msub><mrow><mi>y</mi></mrow><mrow><mn>0</mn></mr
   ow></msub></mrow></msub><mo>+</mo><msub><mrow><mi>R</mi></mrow><mrow><m
   sub><mrow><mi>x</mi></mrow><mrow><mi
   mathvariant="normal">θ</mi></mrow></msub><mo>,</mo><msub><mrow><mi>y</m
   i></mrow><mrow><mi
   mathvariant="normal">θ</mi></mrow></msub></mrow></msub><mo>+</mo><mn>2<
   /mn></mrow><mo>)</mo></mrow><mo>,</mo><mspace
   width="0.25em"></mspace><mi
   mathvariant="normal">α</mi><mo>∈</mo><mrow><mo>(</mo><mrow><mn>0.5</mn>
   <mo>,</mo><mspace
   width="0.25em"></mspace><mn>1</mn></mrow><mo>]</mo></mrow></mtd></mtr><
   /mtable></mrow></mfenced> :MATH]
   7

   where
   [MATH:
   <msub><mrow><mi>R</mi></mrow><mrow><msub><mrow><mi>x</mi></mrow><mrow><
   mn>0</mn></mrow></msub><mo>,</mo><msub><mrow><mi>y</mi></mrow><mrow><mn
   >0</mn></mrow></msub></mrow></msub> :MATH]
   and
   [MATH:
   <msub><mrow><mi>R</mi></mrow><mrow><msub><mrow><mi>x</mi></mrow><mrow><
   mi
   mathvariant="normal">θ</mi></mrow></msub><mo>,</mo><msub><mrow><mi>y</m
   i></mrow><mrow><mi
   mathvariant="normal">θ</mi></mrow></msub></mrow></msub> :MATH]
   represent the ratio of the given cell type in each spot and its
   neighbor spot. By integrating the optimal cellular combinations for all
   spots, ST data at single-cell resolution were reconstructed for the
   spot-based ST data.

Spatial LRI enrichment

   To generate the cell–cell distance matrix D, the Euclidean distance
   between cells was calculated using the single-cell spatial coordinates
   of ST data. Inspired by Giotto, the KNN algorithm was then applied to
   each cell to select the K nearest cells from D to construct the cell
   graph network, wherein K was set to 10 by default. For a given ligand i
   of the sender (cell type A) and a given receptor j of the receiver
   (cell type B), the number of ligand–receptor co-expressed cell–cell
   pairs (
   [MATH:
   <msubsup><mrow><mi>C</mi></mrow><mrow><mi>A</mi><mi>i</mi><mo>,</mo><mi
   >B</mi><mi>j</mi></mrow><mrow><mn>0</mn></mrow></msubsup> :MATH]
   ) was obtained from the graph network by counting the 1-hop neighbor
   nodes of receivers for each sender, resulting in the different number
   of cell–cell pairs for a given LRI pair between the sender cell type A
   and the receiver cell type B. The permutation test was then performed
   by randomly shuffling cell labels to recalculate the number of LRI
   pairs. By repeating this step Z times, a background distribution
   [MATH:
   <mi>C</mi><mo>=</mo><mrow><mo>{</mo><mrow><msubsup><mrow><mi>C</mi></mr
   ow><mrow><mi>A</mi><mi>i</mi><mo>,</mo><mi>B</mi><mi>j</mi></mrow><mrow
   ><mn>1</mn></mrow></msubsup><mo>,</mo><mspace
   width="0.25em"></mspace><msubsup><mrow><mi>C</mi></mrow><mrow><mi>A</mi
   ><mi>i</mi><mo>,</mo><mi>B</mi><mi>j</mi></mrow><mrow><mn>2</mn></mrow>
   </msubsup><mo>,</mo><mo>…</mo><mo>,</mo><mspace
   width="0.25em"></mspace><msubsup><mrow><mi>C</mi></mrow><mrow><mi>A</mi
   ><mi>i</mi><mo>,</mo><mi>B</mi><mi>j</mi></mrow><mrow><mi>Z</mi></mrow>
   </msubsup></mrow><mo>}</mo></mrow> :MATH]
   was obtained for comparison with the real interacting score, and the
   P-value was calculated as follows:
   [MATH:
   <msub><mrow><mi>P</mi></mrow><mrow><mi>A</mi><mi>i</mi><mo>,</mo><mi>B<
   /mi><mi>j</mi></mrow></msub><mo>=</mo><mi
   mathvariant="normal">crad</mi><mrow><mo>{</mo><mrow><mi>x</mi><mo>∈</mo
   ><mi>C</mi><mo>∣</mo><mi>x</mi><mo>≥</mo><msubsup><mrow><mi>C</mi></mro
   w><mrow><mi>A</mi><mi>i</mi><mo>,</mo><mi>B</mi><mi>j</mi></mrow><mrow>
   <mn>0</mn></mrow></msubsup></mrow><mo>}</mo></mrow><mo>/</mo><mi>Z</mi>
   :MATH]
   8

   where
   [MATH:
   <msub><mrow><mi>P</mi></mrow><mrow><mi>A</mi><mi>i</mi><mo>,</mo><mi>B<
   /mi><mi>j</mi></mrow></msub> :MATH]
   values less than 0.05 were filtered to calculate the inter-cellular
   score of LRI from senders to receivers (
   [MATH:
   <msubsup><mrow><mi>S</mi></mrow><mrow><mi>A</mi><mi>i</mi><mo>,</mo><mi
   >B</mi><mi>j</mi></mrow><mrow><mi
   mathvariant="normal">inter</mi></mrow></msubsup><mo>=</mo><mn>1</mn><mo
   >−</mo><msub><mrow><mi>P</mi></mrow><mrow><mi>A</mi><mi>i</mi><mo>,</mo
   ><mi>B</mi><mi>j</mi></mrow></msub> :MATH]
   ). To further enrich the LRIs that activate downstream TFs, target
   genes, and the related pathways of receivers, the knowledge graph was
   introduced to model the intracellular signal propagation process. In
   practice, LRIs from CellTalkDB, pathways from KEGG and Reactome, and
   TFs from AnimalTFDB were integrated to construct the ligand–receptor–TF
   knowledge graph (LRT-KG), wherein the weight between entities
   represents the co-expressed coefficient. Taking the receptor as the
   query node, we incorporated the random-walk algorithm into the LRT-KG
   to filter and score the downstream activated t TFs with no more than 10
   steps and Z iterations; thus, the probability p for each TF can be
   calculated with the ratio of successful hits from the query node to the
   target TF during the Z random walks. By integrating the co-expressed
   TFs and the corresponding target genes from the LRT-KG, the
   intracellular score of LRI from senders to receivers can be written as:
   [MATH:
   <msubsup><mrow><mi>S</mi></mrow><mrow><mi>A</mi><mi>i</mi><mo>,</mo><ms
   pace width="0.25em"></mspace><mi>B</mi><mi>j</mi></mrow><mrow><mi
   mathvariant="normal">intra</mi></mrow></msubsup><mo>=</mo><msubsup><mro
   w><mo
   mathsize="big">∑</mo></mrow><mrow><mi>k</mi><mo>=</mo><mn>1</mn></mrow>
   <mrow><mi>t</mi></mrow></msubsup><msub><mrow><mi
   mathvariant="normal">θ</mi></mrow><mrow><mi>k</mi></mrow></msub><mo>×</
   mo><msub><mrow><mi>p</mi></mrow><mrow><mi>k</mi></mrow></msub><mo>/</mo
   ><msub><mrow><mi
   mathvariant="normal">η</mi></mrow><mrow><mi>k</mi></mrow></msub> :MATH]
   9

   where θ represents the number of targeted genes, η represents the step
   from the receptor to the TF in the LRT-KG. By the sigmoid
   transformation for
   [MATH:
   <msubsup><mrow><mi>S</mi></mrow><mrow><mi>A</mi><mi>i</mi><mo>,</mo><mi
   >B</mi><mi>j</mi></mrow><mrow><mi
   mathvariant="normal">intra</mi></mrow></msubsup> :MATH]
   , the final score of the LRI from cell type A to cell type B can be
   written as:
   [MATH:
   <msub><mrow><mi>S</mi></mrow><mrow><mi>A</mi><mi>i</mi><mo>,</mo><mi>B<
   /mi><mi>j</mi></mrow></msub><mo>=</mo><msqrt><mrow><msubsup><mrow><mi>S
   </mi></mrow><mrow><mi>A</mi><mi>i</mi><mo>,</mo><mspace
   width="0.25em"></mspace><mi>B</mi><mi>j</mi></mrow><mrow><mi
   mathvariant="normal">inter</mi></mrow></msubsup><mo>×</mo><msubsup><mro
   w><mi>S</mi></mrow><mrow><mi>A</mi><mi>i</mi><mo>,</mo><mspace
   width="0.25em"></mspace><mi>B</mi><mi>j</mi></mrow><mrow><mi
   mathvariant="normal">intra</mi></mrow></msubsup></mrow></msqrt> :MATH]
   10

Comparison with other methods

   STARmap, MERFISH, and seqFISH+ ST data were used to compare the
   performance of SpaTalk with other existing cell-type decomposition
   methods. For these single-cell ST data, all cells were split according
   to the fixed spatial distance and then merged into simulated spots as
   the benchmark datasets. RCTD^[240]31, Seurat^[241]32,
   SPOTlight^[242]33, deconvSeq^[243]34, Stereoscope^[244]35, and
   Cell2location^[245]36 were benchmarked with the default parameters and
   evaluated with Pearson’s correlation coefficient and RMSE over the
   predicted and real cell-type composition for each spot.

   Considering the limited genes of MERFISH ST data, STARmap and seqFISH+
   single-cell ST data (including 1020 and 10,000 genes, respectively)
   were used as the benchmark datasets to compare the performance of
   SpaTalk with other cell–cell communication inference methods
   (Giotto^[246]15, SpaOTsc^[247]16, NicheNet^[248]6, CytoTalk^[249]7,
   CellChat^[250]38, CellPhoneDB^[251]37, and CellCall^[252]8). The
   one-sided Wilcoxon test was performed to evaluate the spatial proximity
   significance of the inferred LRIs by comparing the number of expressed
   LRIs between sender-receiver pairs and all cell–cell pairs, and the
   co-expressed percent of the LRI was calculated to evaluate the
   co-expression level by counting the number of expressed LRIs from
   senders to receivers from the cell–cell graph network. All methods were
   benchmarked with the default parameters and default LRI database. All
   inferred LRIs were unbiasedly evaluated with the above criteria except
   for the LRIs from SpaOTsc since the number of inferred LRIs was much
   larger than that of the other methods; thus, the top 1000 LRIs for each
   cell–cell communication were selected from SpaOTsc according to the
   final score. In addition, we benchmarked the performance of SpaTalk and
   other methods by unifying the LRI database, wherein the LRI pairs in
   CellTalkDB, CellPhoneDB, CytoTalkDB, CellChatDB, and CellCallDB were
   used by turns with the default parameters.

   Given the significantly enriched biological processes or pathways in
   the receiver cell type, the Fisher’s exact test was adopted for pathway
   enrichment analysis with the KEGG and Reactome databases on the
   activated genes in receivers using the following function:
   [MATH: <mi>P</mi><mo>=</mo><mfenced close=")"
   open="("><mrow><mfrac><mrow><mi>a</mi><mo>+</mo><mi>b</mi></mrow><mrow>
   <mi>a</mi></mrow></mfrac></mrow></mfenced><mfenced close=")"
   open="("><mrow><mfrac><mrow><mi>c</mi><mo>+</mo><mi>d</mi></mrow><mrow>
   <mi>c</mi></mrow></mfrac></mrow></mfenced><mstyle
   mathsize="2.03em"><mfenced
   open="/"><mrow></mrow></mfenced></mstyle><mfenced close=")"
   open="("><mrow><mfrac><mrow><mi>n</mi></mrow><mrow><mi>a</mi><mo>+</mo>
   <mi>c</mi></mrow></mfrac></mrow></mfenced> :MATH]
   11
                                Interested genes Uninterested genes
   Genes matching the pathway   a                b
   Genes unmatching the pathway c                d
   [253]Open in a new tab

   where n = a + b + c + d; a is the number of inferred target genes
   (interested genes) that match the given pathway; b is the number of a
   given pathway’s genes that exclude a, namely the uninterested genes
   that match the given pathway; c is the number of inferred target genes
   (interested genes) that unmatch the given pathway; d is the number of
   all genes excluding a, b, and c, namely the uninterested genes that
   unmatch the given pathway. NicheNet, CytoTalk, and CellCall were
   benchmarked with the default parameters and the inferred target genes
   for each LRI were evaluated according to the significance of pathway
   enrichment.

Pathway and biological process enrichment

   The “Metascape web tool [[254]https://metascape.org/]”^[255]60 was used
   to perform the enrichment analysis of pathways and biological
   processes, wherein the top 100 highly expressed genes were selected
   according to the fold change of the average gene expression. Gene Set
   Enrichment Analysis (GSEA)^[256]61 was performed using the ranked gene
   list with the clusterprofiler tool to enrich the significantly
   activated pathways and biological processes, whose signatures were
   obtained from the Molecular Signatures Database v7.4 (“MSigDB
   [[257]http://www.gsea-msigdb.org/gsea/msigdb]” ^[258]62, including the
   gene sets from Gene Ontology (GO) and the canonical pathway gene sets
   derived from the KEGG, Reactome, and WikiPathways pathway databases.

Module scoring of hallmarks and signatures

   Hallmark scoring of metabolism of xenobiotics by cytochrome P450,
   synthesis of bile acids and bile salts, TSK, and EMT was performed
   using the “AddModuleScore” function in Seurat with default parameters.
   Hallmark pathways and EMT were obtained from MSigDB^[259]62, and the
   signature genes of the TSK were download from the original publication
   by Jin et al.^[260]22.

Statistics

   R (version 4.1.1), GraphPad Prism 8, and Python 3.9 were used for all
   statistical analyses.

Reporting summary

   Further information on research design is available in the [261]Nature
   Research Reporting Summary linked to this article.

Supplementary information

   [262]Supplementary information^ (16.1MB, pdf)
   [263]Peer Review File^ (12.2MB, pdf)
   [264]Reporting Summary^ (367.3KB, pdf)

Acknowledgements