Abstract

   Deciphering the features, structure, and functions of the cell niche in
   tissues remains a major challenge. Here, we present scNiche, a
   computational framework to identify and characterize cell niches from
   spatial omics data at single-cell resolution. We benchmark scNiche with
   both simulated and biological datasets, and demonstrate that scNiche
   can effectively and robustly identify cell niches while outperforming
   other existing methods. In spatial proteomics data from human
   triple-negative breast cancer, scNiche reveals the influence of the
   microenvironment on cellular phenotypes, and further dissects
   patient-specific niches with distinct cellular compositions or
   phenotypic characteristics. By analyzing mouse liver spatial
   transcriptomics data across normal and early-onset liver failure
   donors, scNiche uncovers disease-specific liver injury niches, and
   further delineates the niche remodeling from normal liver to liver
   failure. Overall, scNiche enables decoding the cellular
   microenvironment in tissues from single-cell spatial omics data.

   Subject terms: Computational biology and bioinformatics, RNA
   sequencing, Computational models, Software
     __________________________________________________________________

   Deciphering the features, structure, and functions of the cell niche in
   tissues remains a major challenge. Here, the authors develop scNiche, a
   computational framework to identify and characterise cell niches from
   spatial omics data at single-cell resolution.

Introduction

   The cell niche, also referred to as the cellular microenvironment or
   spatial domain, is defined as the local environment or communities
   surrounding cells and plays a critical role in determining various
   biological processes, such as maintaining tissue
   homeostasis^[40]1–[41]3 and shaping disease progression^[42]4–[43]6.
   Recent advances in spatial omics technologies^[44]7–[45]15 provide
   molecular profiles at single-cell resolution, allowing systematic
   exploration of cellular states, functions, and interactions in the
   tissue context. However, while these advances have offered extensive
   spatial atlases, it remains a challenge to decipher the latent cell
   niche information within these data accurately.

   Various computational methods have been developed to identify cell
   niches by integrating molecular profiles of the cell with spatial
   information. Early methods such as HMRF^[46]16, BayesSpace^[47]17, and
   DR-SC^[48]18 employ a Potts model to encourage physically proximal
   cells to have the same label. This strategy assumes that a cell niche
   is a region with homogeneous gene expression and models the gene
   expression of all cells with the same distribution, which cannot
   accurately capture the gene expression heterogeneity of different cell
   types within the same niche^[49]19. As an improvement, BASS^[50]19
   introduces additional hierarchical modeling structures on top of the
   Potts model to explicitly model heterogeneous gene expression of
   different cell types, thus enabling more flexible and effective
   modeling of spatial omics data. SCGP^[51]20, as another class of
   method, constructs spatial cellular graphs by computing spatial edges
   and feature edges between cells separately, enabling traditional graph
   community detection algorithms to identify cell niches. Most other
   subsequent methods, on the other hand, tend to combine the molecular
   profiles of the cell itself with that of its neighbors in different
   ways to generate new features that are more representative of the cell
   niche. Specifically, UTAG^[52]21 and CellCharter^[53]22 integrate the
   molecular profiles of neighbors into the cell’s own molecular profiles
   using linear weighting and neighborhood aggregation, respectively.
   BANKSY^[54]23 generates neighbor-augmented features by combining the
   molecular profiles of the cell itself with that of its neighbors, and
   provides a specific hyperparameter to tune the contributions of the
   cells and their local microenvironments. Deep learning-based methods
   such as SpaGCN^[55]24, STAGATE^[56]25, GraphST^[57]26, and
   SpaceFlow^[58]27, learn better latent features through graph neural
   networks. In addition, there are also some methods primarily used for
   spatial proteomics data such as CytoCommunity^[59]28 and
   Spatial-LDA^[60]29, which rely on well-annotated cell type information
   and utilize only the neighborhood composition features of cells to
   identify cell niches. However, these methods may not reveal some niches
   located in spatially specific regions, such as the tumor-immune
   interface, where both tumor and immune cells exhibit altered molecular
   profiles^[61]13. Overall, the effectiveness of these methods suggests
   that the various features of cells and their microenvironments may both
   be potentially helpful in accurately identifying cell niches. However,
   current methods are generally designed based on a fixed architecture of
   feature combinations, which may have limitations when users want to
   integrate specific combinations of features they only have. In
   addition, except for a few methods such as BANKSY and CellCharter, most
   methods are primarily demonstrated on small datasets (such as the
   spatial transcriptomics data of individual tissue slices). Scaling to
   large datasets with dozens or hundreds of tissue slices and
   simultaneously identifying conserved or specific cell niches across
   these slices remains a prominent challenge in the field.

   In this study, we define the features of the cell itself (e.g., the
   molecular profiles of the cell) and various features of its
   microenvironment (e.g., the cellular compositions or molecular profiles
   of neighborhoods of the cell) in a unified way as features from
   different “views” of the cell, and introduce scNiche, a computational
   method that leverages these multi-views features of the cell to
   identify and characterize cell niches in tissue. We highlight the
   novelty and strengths of scNiche over other existing methods: 1) unlike
   most previous deep learning-based methods, which typically run graph
   neural networks on the spatial graph to integrate molecular profiles,
   scNiche first constructs separate graphs for features from different
   views of the cell, and then utilizes the graph neural networks to
   integrate these multi-views features of the cell into a meaningful
   joint representation of niches. This unique model framework allows the
   flexibility to dynamically replace or add features from other views of
   the cell in practice, and as such can be used as a model paradigm to
   comprehensively consider and investigate the optimal combination of
   multi-views features of the cell for niche modeling; 2) through the
   batch training strategy, scNiche can scale to large datasets containing
   millions of cells from a series of tissue slices, holding the potential
   to simultaneously identify conserved or specific cell niches across
   multiple slices or samples.

   We first benchmarked the performance of scNiche with existing methods
   using simulated and biological datasets. We then applied scNiche to a
   variety of spatial omics datasets from different tissues, including
   human triple-negative breast cancer across two archetypical subtypes
   (mixed and compartmentalized)^[62]13 as well as mouse liver under
   normal and early-onset liver failure states^[63]10, to identify
   patient- or disease-specific cell niches and to further provide
   comprehensive characterization and interpretation of these niches from
   both the cellular composition and molecular expression perspectives.

Results

Design concept of scNiche

   scNiche is designed to leverage and integrate multi-view features of
   the cell from both itself and its microenvironment to identify cell
   niches. By default, scNiche takes single-cell spatial omics data as
   input and first extracts the following three-view features of each cell
   within a pre-defined neighborhood range: the molecular profiles of the
   cell, the molecular profiles of its neighborhoods, and the cellular
   compositions of its neighborhoods (Fig. [64]1a). Notably, when applied
   to spatial transcriptomics datasets containing multiple tissue slices,
   dimensionality reduction and batch correction on the features of the
   first two views are usually necessary to balance the dimensionality of
   different views while eliminating potential batch effects across
   different slices (Methods). On the other hand, in addition to the
   default three views, features from other views (such as the
   histological information of cells or the deconvoluted cellular
   compositions of spots in the low-resolution spatial transcriptomics
   data) can also be added or replaced conveniently, allowing for a more
   flexible investigation of the optimal combination of multi-view
   features of the cell for niche modeling. Subsequently, scNiche applies
   a neural network architecture of the multiple graph autoencoder (M-GAE)
   coupled with a graph fusion network (GFN) to integrate the multi-view
   features of the cell into a joint representation (z). Specifically, the
   M-GAE model encodes the complementary information of multi-view data,
   and the GFN captures the relationships among graphs from different
   views and generates a consensus graph that contains a global node
   relationship across all views, which is then input back into the M-GAE
   model. scNiche also applies a multi-view mutual information
   maximization (MMIM) module to guide the joint representation (z) to be
   more clustering-friendly by boosting the similarity between
   representations of neighboring samples within any view (Fig. [65]1a and
   Supplementary Fig. [66]S1). The training process is guided by
   minimizing the combined loss function comprising the M-GAE
   reconstruction, graph reconstruction, and mutual information loss
   (Fig. [67]1a and Methods). Additionally, a batch training strategy is
   developed to enable scNiche to efficiently handle large datasets
   (Methods). After model training, the learned joint representation (z)
   can be clustered using any unsupervised clustering algorithms such as
   k-means or Leiden^[68]30 to identify the cell niches. Finally, scNiche
   also implements an integrated downstream analytical framework for the
   comprehensive characterization of identified cell niches (Fig. [69]1b
   and Methods).

Fig. 1. Overview of scNiche.

   [70]Fig. 1
   [71]Open in a new tab

   a Schematic workflow of scNiche. Given the spatial omics data, scNiche
   first extracts the multi-view features of cells within a pre-defined
   neighborhood range. Subsequently, scNiche combined features from
   different views into a joint representation (z). The combined loss
   function comprising the M-GAE reconstruction loss (
   [MATH:
   <msub><mrow><mi>L</mi></mrow><mrow><mi>r</mi><mi>e</mi><mi>c</mi></mrow
   ></msub> :MATH]
   ), graph reconstruction loss (
   [MATH:
   <msub><mrow><mi>L</mi></mrow><mrow><mi>g</mi><mi>r</mi><mi>e</mi></mrow
   ></msub> :MATH]
   ), and mutual information loss (
   [MATH:
   <msub><mrow><mi>L</mi></mrow><mrow><mi>m</mi><mi>i</mi><mi>m</mi></mrow
   ></msub> :MATH]
   ) is applied to guide the training process of the model. scNiche
   finally performs the unsupervised clustering step on the learned joint
   representation (z) to identify cell niches. b Downstream analytical
   framework of scNiche for the comprehensive characterization and
   interpretation of identified cell niches.

Multi-view feature fusion improves the accuracy of cell niche identifications

   We first evaluated the performance of scNiche using the simulated
   datasets generated by scCube^[72]31, where the heterogeneity in both
   the cellular composition and gene expression of cell niches was
   considered. Furthermore, the cell niches in each simulated dataset
   exhibited variations in spatial continuity and compositional
   complexity, aiming to simulate the cellular microenvironment across
   different tissues (Supplementary Fig. [73]S2a and Methods). Ten
   existing methods (DC-SC^[74]18, BASS^[75]19, UTAG^[76]21,
   CellCharter^[77]22, BANKSY^[78]23, SpaGCN^[79]24, STAGATE^[80]25,
   GraphST^[81]26, SpaceFlow^[82]27, and CytoCommunity^[83]28) were
   selected for comparison. Two evaluation metrics, the adjusted Rand
   index (ARI) and the macro-F1 score, were used to assess the accuracy of
   identifying true cell niches. As shown in Supplementary Fig. [84]S2b,
   scNiche outperformed other methods in accurately identifying cell
   niches, with its performance being nearly unaffected by the spatial
   continuity or compositional complexity of cell niches (Supplementary
   Fig. [85]S2c–e).

   We next assessed the performance of scNiche when the data quality
   degrades through two simulation scenarios. Specifically, in one
   scenario, we randomly set the expression values of a certain proportion
   of genes to 0 (the gene expression dropout), and in another scenario,
   we randomly altered the cell annotation labels of a certain proportion
   of cells to “ambiguous” (the cell annotation dropout) (Methods). As
   expected, the accuracy of all methods dropped as the data quality
   degraded (Supplementary Fig. [86]S3a, b). In the former simulation
   scenario, scNiche exhibited relatively stable performance at lower
   dropout rates of gene expression. However, for higher dropout rates of
   gene expression, the performance of scNiche and all other methods
   declined dramatically (Supplementary Fig. [87]S3a). In the latter
   simulation scenario, we found that the performance of scNiche was more
   robust compared to CytoCommunity as the dropout rates of cell
   annotation increased, suggesting that scNiche’s strategy of multi-view
   feature fusion may effectively mitigate the impact of ambiguous cell
   annotations by considering the cell gene expression information
   (Supplementary Fig. [88]S3b).

   We also conducted ablation studies on each view of the three default
   inputs as well as on each model component of scNiche respectively to
   assess their individual contributions. For the former, as shown in
   Supplementary Table [89]1–[90]2, scNiche outperformed all its
   derivatives, each of which excludes the fusion of features from a
   specific view, indicating that features from all three views contribute
   to the accurate identification of cell niches. Furthermore, scNiche
   also performed better than using the features from a single view alone,
   and its model-based feature fusion strategy was superior to the simple
   concatenation of features from different views. For the latter, as
   expected, scNiche was unable to effectively encode the complementary
   information from multiple views of cells when the M-GAE or GFN
   component was removed, resulting in an inability to accurately model
   the cellular microenvironment (Supplementary Table [91]3). In addition,
   the performance of scNiche-w/o MMIM also declined compared to scNiche,
   suggesting that the MMIM component contributes to the learning of more
   discriminative joint representations (Supplementary Table [92]3).

   In summary, the performance evaluation on the simulated datasets
   demonstrated that scNiche can effectively integrate information from
   different views of the cell and holds the potential to identify cell
   niches accurately.

Performance evaluation of scNiche on mouse spleen CODEX dataset

   The real spatial omics data are better examples than simulated data for
   evaluating the performance of scNiche because the cell niches therein
   are objectively present and biologically meaningful. Therefore, we
   first applied scNiche to a mouse spleen spatial proteomics dataset
   generated by the co-detection by indexing (CODEX) technology^[93]12.
   The compartment label of each cell from three wild-type spleen samples
   (BALBc-1, BALBc-2, and BALBc-3) was provided and can be regarded as the
   ground truth of niches (Fig. [94]2a). We first evaluated the
   performance of scNiche’s batch training strategy on this dataset. As
   shown in Supplementary Fig. [95]S4, our results indicated that scNiche
   maintained a relatively stable performance under different batch number
   settings without requiring additional training epochs. Furthermore, the
   performance of scNiche in identifying cell niches across multiple
   slices is comparable to that of using only a single slice, which was
   consistent with previous findings^[96]32 (Supplementary Fig. [97]S5).
   Benchmarking results showed that scNiche outperformed other methods on
   both evaluation metrics, and accurately identified the marginal zone (a
   unique cell niche located on the periphery of the B cell follicle) in
   all three samples (Fig. [98]2b, c).

Fig. 2. Performance evaluation of scNiche on spatial proteomics datasets.

   [99]Fig. 2
   [100]Open in a new tab

   a The mouse spleen CODEX data from three wild-type spleen samples, each
   cell is colored by the cell type labels (left) and the niche labels
   (right). b The cell niches identified by each method on the mouse
   spleen CODEX data. c, Performance comparison of scNiche with other
   methods across three samples using the adjusted Rand index (ARI) (left)
   and the macro-F1 score (right) metrics on mouse spleen CODEX dataset.
   The ARI and macro-F1 score metrics relative to the niche annotation are
   calculated based on the result of each method. Data are presented as
   boxplots (minima, 25th percentile, median, 75th percentile, and
   maxima). n = 3 samples for each method. d The human UTUC IMC data from
   one representative sample, each cell is colored by the cell type labels
   (top) and the niche labels (bottom). e The cell niches identified by
   each method on the human UTUC IMC data. f Performance comparison of
   scNiche with other methods across 16 samples using the adjusted Rand
   index (ARI) (left) and the macro-F1 score (right) metrics on human UTUC
   IMC dataset. The ARI and macro-F1 score metrics relative to the niche
   annotation are calculated based on the result of each method. Data are
   presented as boxplots (minima, 25th percentile, median, 75th
   percentile, and maxima). n = 16 samples for each method. Source data
   are provided as a Source Data file.

Performance evaluation of scNiche on human upper tract urothelial carcinoma
IMC dataset

   We also applied scNiche to another spatial proteomics dataset from
   human upper tract urothelial carcinoma (UTUC) generated by the Imaging
   Mass Cytometry (IMC) technology^[101]33 to further evaluate its
   performance. In this dataset, 16 images had been manually annotated
   with boundaries of tumor and stroma, which can be regarded as the
   ground truth of niches^[102]21 (Fig. [103]2d and Supplementary
   Fig. [104]S6). As shown in Fig. [105]2e, f and Supplementary
   Fig. [106]S6-[107]7, although distinguishing tumor and stroma niches in
   this dataset is a relatively simple task and all methods achieved
   comparatively good performance in most samples, scNiche still
   demonstrated the best overall performance across all 16 samples, and
   successfully resolved the fine structure of the boundaries of tumor and
   stroma niches in some samples such as PM57_B8-01. Furthermore, scNiche
   with higher clustering granularity identified more fine-grained niches,
   including different immune-enriched niches and tumor-enriched niches
   (Supplementary Fig. [108]S8).

   Additionally, since the finer subpopulation annotation of each cell was
   also provided by the original authors, we thus further evaluated the
   robustness of scNiche to the granularity of cell population annotation.
   As illustrated in Supplementary Fig. [109]S9, scNiche continued to
   accurately identify tumor and stromal niches when using the refined
   cell population annotation of cells and consistently outperformed
   CytoCommunity, which also utilized the refined cellular annotation
   information.

Performance evaluation of scNiche on mouse brain spatial transcriptomics
datasets

   We further evaluated the performance of scNiche on two additional mouse
   brain single-cell spatial transcriptomics datasets with more complex
   niche structures generated by different technologies. Specifically, the
   STARmap dataset^[110]34 contains one tissue slice from the mouse V1
   neocortex and the MERFISH dataset^[111]35 contains 31 tissue slices
   from the mouse frontal cortex and striatum. The brain region label of
   each cell was manually annotated in both datasets and can be regarded
   as the ground truth of niches (Fig. [112]3a, d). Consistent with the
   results on spatial proteomics datasets, scNiche also demonstrated
   superior overall performance compared with other methods on these two
   additional spatial transcriptomics datasets (Fig. [113]3b, c, e, f and
   Supplementary Fig. [114]S10-[115]11), suggesting the general
   applicability of scNiche in accurately identifying cell niches from
   different spatial omics data.

Fig. 3. Performance evaluation of scNiche on spatial transcriptomics
datasets.

   [116]Fig. 3
   [117]Open in a new tab

   a The mouse V1 neocortex STARmap data from one slice, each cell is
   colored by the cell type labels (left) and the niche labels (right). b
   The cell niches identified by each method on the mouse V1 neocortex
   STARmap data. c Performance comparison of scNiche with other methods
   using the adjusted Rand index (ARI) (top) and the macro-F1 score
   (bottom) metrics on mouse V1 neocortex STARmap dataset. The ARI and
   macro-F1 score metrics relative to the niche annotation are calculated
   based on the result of each method. d, The mouse frontal cortex and
   striatum MERFISH data from one representative slice, each cell is
   colored by the cell type labels (top) and the niche labels (bottom). e
   The cell niches identified by each method on the mouse frontal cortex
   and striatum MERFISH data. f Performance comparison of scNiche with
   other methods across 31 slices using the adjusted Rand index (ARI)
   (left) and the macro-F1 score (right) metrics on mouse frontal cortex
   and striatum MERFISH dataset. The ARI and macro-F1 score metrics
   relative to the niche annotation are calculated based on the result of
   each method. Data are presented as boxplots (minima, 25th percentile,
   median, 75th percentile, and maxima). n = 31 slices for each method.
   Source data are provided as a Source Data file.

Performance evaluation of scNiche on low-resolution spatial transcriptomics
dataset

   We explored the potential applicability of scNiche to the spatial
   transcriptomics data generated by platforms with a lower resolution
   such as ST^[118]36 and 10X Visium on the human DLPFC 10X Visium
   dataset^[119]37. Specifically, we first used the human middle temporal
   gyrus (MTG) scRNA-seq data by Hodge et al.^[120]38, which contains 75
   transcriptomically distinct cell types, as the single-cell reference,
   and deconvoluted the spots of each DLPFC slice data using
   Cell2location^[121]39 (Supplementary Fig. [122]S12a). The deconvolution
   results were subsequently used to replace the feature of cellular
   compositions of neighborhoods utilized in the single-cell spatial omics
   data, and were input into scNiche along with features from the
   remaining two views (the molecular profiles of the spot and its
   neighborhoods). Four tissue slices from the same donor were selected,
   where the cortical layer label of each spot was manually annotated and
   can be regarded as the ground truth of niches (Supplementary
   Fig. [123]S12b). As illustrated in Supplementary Fig. [124]S12c, d,
   although scNiche was not originally designed for spot-based spatial
   transcriptomics data, its modified version still performed comparably
   to some state-of-the-art methods on the four DLPFC slices.

Scalability analysis of scNiche to large datasets

   In addition to the spatial transcriptomics and spatial proteomics
   datasets used in the benchmarking studies, we further tested the
   scalability of scNiche and other methods on a much larger mouse whole
   brain MERFISH dataset generated by Zhang et al.^[125]40. As shown in
   Supplementary Fig. [126]S13, scNiche, BANKSY, UTAG, and CellCharter are
   the only four methods that can scale to the dataset with more than 3
   million cells. For the mouse whole brain MERFISH dataset, scNiche
   identified 14 cell niches according to the cluster stability, which
   were aligned across sequential tissue sections (Supplementary
   Fig. [127]S14). Moreover, we also selected four representative sections
   corresponding to different regions of the brain: C57BL6J-1.032,
   C57BL6J-1.056, C57BL6J-1.081, and C57BL6J-1.136, and compared the cell
   niches identified by each method with the anatomical regions from the
   Allen Mouse Brain Reference Atlas^[128]41. As shown in Fig. [129]4a, b,
   the niches identified by scNiche accurately correspond to different
   structures in the mouse brain. In contrast, the niches identified by
   UTAG and the nonspatial method lack clear spatial separation, while
   BANKSY and CellCharter failed to distinguish certain brain regions,
   such as the hypothalamus and striatum in the C57BL6J-1.056 section
   (Fig. [130]4c). In summary, these results suggest that scNiche can
   scale to large datasets while maintaining good performance.

Fig. 4. Performance evaluation of scNiche on mouse whole brain MERFISH
dataset.

   [131]Fig. 4
   [132]Open in a new tab

   a Annotated mouse brain coronal section images from the Allen Mouse
   Brain Atlas. The cell niches identified by scNiche (b) and other
   methods (c) on C57BL6J-1.032, C57BL6J-1.056, C57BL6J-1.081, and
   C57BL6J-1.136 sections.

Robustness analysis of scNiche

   We first evaluated the robustness of scNiche to the size of the
   pre-defined neighborhood range on four different datasets. As shown in
   Supplementary Fig. [133]S15-[134]16, the performance of scNiche was
   stable when different numbers of k-nearest neighbors were selected.
   Indeed, previous studies have shown that moderately changing the number
   of k-nearest neighbors does not lead to a significant degradation in
   method accuracy. Nevertheless, given the complexity of tissues, it may
   be still necessary to empirically determine an appropriate size of the
   neighborhood range in practical analyses^[135]28,[136]42–[137]44.
   Furthermore, we also evaluated the sensitivity of scNiche to different
   random seed choices, and the results indicated that scNiche was also
   relatively robust to different random seed choices compared to other
   methods such as UTAG, SpaGCN, and DR-SC (Supplementary Fig. [138]S17).

scNiche deciphers the cell niches in different subtypes and patients of human
triple-negative breast cancer

   The tumor microenvironment has been demonstrated by mounting evidence
   and new therapeutic strategies to play a pivotal role in the initiation
   and progression of cancer, opening new opportunities for diagnosis and
   therapy^[139]45–[140]49. Here, we applied scNiche to a human
   triple-negative breast cancer (TNBC) dataset generated by the
   multiplexed ion beam imaging by time-of-flight (MIBI-TOF)
   technology^[141]13. Studies have shown that TNBC exhibits three
   archetypical subtypes based on different tumor-immune interactions:
   cold, mixed, and compartmentalized. Among these, the cold subtype is
   characterized by low immune infiltration and is easily distinguished,
   while the mixed and compartmentalized subtypes may contain similar
   numbers of immune cells. Furthermore, the spatial organization and
   degree of mixing of tumor and immune cells may differ in the mixed and
   compartmentalized subtypes^[142]13. Therefore, we used scNiche to
   decipher the tumor microenvironment of these two TNBC subtypes. A total
   of 173,205 cells from 19 mixed subtype samples and 15 compartmentalized
   subtype samples were analyzed.

   According to the cluster stability proposed by Varrone et al.^[143]22
   (Supplementary Fig. [144]S18), scNiche identified 13 cell niches, which
   broadly manifested as tumor-enriched niches (Niche 1, 10, 2, 9, 3, 5,
   and 11) and other immune-enriched niches (Niche 7, 6, 4, 0, 8, and 12)
   characterized by distinct combinations of immune and stroma cell types
   (Fig. [145]5a and Supplementary Fig. [146]S19, [147]20). By comparing
   the enriched cell niches in the two subtypes of TNBC samples, we found
   that the tumor-enriched niches were predominantly enriched in the mixed
   subtype samples. In contrast, other immune-niches were more prevalent
   in the compartmentalized subtype samples (Fig. [148]5a). This finding
   of scNiche was supported by the previous
   studies^[149]13,[150]50,[151]51 that immune cells in the mixed subtype
   demonstrated a higher degree of mixing with tumor cells compared to the
   compartmentalized subtype and were therefore less likely to form
   spatially separate regions. Furthermore, different niches tended to be
   shared only among a small number of specific patients (Supplementary
   Fig. [152]S21a), reflecting the inter-patient heterogeneity of tumor
   microenvironment^[153]52.

Fig. 5. scNiche deciphers the cellular compositional and phenotypic
heterogeneity between immune niches from human TBNC dataset.

   [154]Fig. 5
   [155]Open in a new tab

   a The proportion of immune, stromal, and tumor cells in each niche
   (top) and the enrichment scores of each cell type in each niche
   (bottom) across 19 mixed subtype samples and 15 compartmentalized
   subtype samples. The enrichment scores are calculated among all 13
   niches, and the p-value is calculated with the one-sided Mann-Whitney U
   test (*p ≤ 0.05; **p ≤ 0.01; ***p ≤ 0.001; ****p ≤ 0.0001). b The two
   spatially adjacent cell niches are enriched with immune cells from
   different lineages respectively, corresponding to different immune
   microenvironments. Each cell is colored by the niche labels and the
   cell type labels, and the proportions of CD4 T cells and other immune
   cells within the neighbors of each cell are displayed. c The
   proportions of Niche 6 and Niche 12 in each sample. Only samples with a
   proportion of Niche 6 (or Niche 12) exceeding 5% are shown. The average
   expression levels of monocyte markers, immunoregulatory proteins, and
   antigen presentation proteins of macrophages in Niche 6 and Niche 12 in
   each sample (d) and the proportions of macrophages, immune cells, and
   tumor cells in Niche 6 and Niche 12 in each sample (e). For Niche 6 (or
   Niche 12), only samples with a proportion of that niche exceeding 5%
   are considered. The p-value is calculated with the two-sided
   Mann-Whitney U test. Data are presented as boxplots (minima, 25th
   percentile, median, 75th percentile, and maxima). n = 6 (10) samples
   for Niche 6 (Niche 12). f Niche 6 in Patient 9 (top) and Patient 10
   (bottom) is located at the tumor-immune border. g The spatial
   expression level of CD11b, CD11c, IDO, and PD-L1 in Patient 9 (top) and
   Patient 10 (bottom). Source data are provided as a Source Data file.

   The 6 immune-enriched niches showed differential cellular composition,
   corresponding to distinct microenvironments. Niche 7 exhibited
   significant enrichment of B cells, CD4 T cells, CD8 T cells, dendritic
   cells, and NK cells, which may represent the tertiary lymphoid
   structure (TLS)^[156]53,[157]54; Niche 8 was enriched with endothelial
   cells and mesenchymal-like cells, potentially representing the stromal
   microenvironment in tumor (Supplementary Fig. [158]S21b). The two
   spatially adjacent cell niches, Niche 4 and Niche 0, co-existed in a
   specific subset of patients (Patient 3, 4, 5, 9, and 27) and were
   enriched with lymphoid immune cells (such as CD4 T cells) and other
   immune cells, respectively (Fig. [159]5b and Supplementary
   Fig. [160]S21c). It has been reported that cells from the same lineage
   tended to be more spatially proximate, thus, these two cell niches may
   reflect the diversity of the immune responses to tumors, whereby
   specific immune cells were recruited to the tumor sites via specific
   mechanisms or local environments^[161]13,[162]45,[163]55,[164]56.

   We also noticed that the two macrophage-enriched niches (Niche 6 and
   Niche 12) resolved by scNiche were mainly present in different TNBC
   subtypes and consisted of macrophages with distinct phenotypes
   (Fig. [165]5c). Specifically, while macrophages within these two niches
   exhibited consistent expression of classical monocyte markers such as
   CD68 and CD63, those in Niche 6 also displayed increased expression of
   both CD11b, CD11c, immune regulation proteins, and antigen presentation
   proteins, suggesting they were myeloid derived suppressor cells^[166]57
   (Fig. [167]5d). This inconsistency in the phenotype exhibited by cells
   from the same lineage was observed across the entire patient cohort
   scales, which may be related to the microenvironments the cells reside
   in. We found that the macrophages in Niche 6 were likely to exhibit
   more pronounced spatial proximity to other immune cell types, whereas
   macrophages in Niche 12 were more likely to co-localize with tumor
   cells (Fig. [168]5e). Indeed, Niche 6 may represent a unique niche at
   the tumor-immune border^[169]13, with its altered phenotypes compared
   with Niche 12 arising from changes in the expression profiles across
   all cell types, rather than being specific to a particular cell
   population (Fig. [170]5f, g and Supplementary Fig. [171]S22).

   Further cell population enrichment analysis of 7 tumor-enriched niches
   revealed more subtle compositional differences among them. For
   instance, Niche 5 and Niche 11, characterized by the enrichment of
   Keratin^+ tumor cells, exhibited extremely low immune or stromal cell
   infiltration. In contrast, other tumor-enriched niches exhibited
   heterogeneity in the type of cells infiltrated and the degree of
   infiltration (Fig. [172]6a, b). Although the cellular compositions of
   the immune-exhausted tumor niches (Niche 5 and Niche 11) were similar,
   the tumor cells within these niches exhibited differential expression
   of the tumor-related proteins, including cytokeratin 6 (CK6) and CK17
   (Fig. [173]6c). Interestingly, Niche 5 and Niche 11 were identified in
   different patients, potentially representing patient-specific niches
   composed of cells in distinct cellular states^[174]58 (Fig. [175]6d).
   Furthermore, survival analysis results on public datasets suggested
   that the differences between these two niches may also reflect
   phenotypic differences between patients (Supplementary Fig. [176]S23).
   On the other hand, the tumor cells within other infiltrating
   tumor-enriched niches exhibited variation in the expression of other
   types of proteins, which may be associated with the specific
   infiltrating cell types. For instance, the tumor cells in Niche 9
   showed increased expression of antigen presentation proteins such as
   HLA1 and HLA-DR, suggesting the localized production of cytokines like
   IFN-γ induced by the extensive infiltration of immune
   cells^[177]59,[178]60 (Fig. [179]6e, f). Similarly, we observed that
   the tumor cells in Niche 3, the tumor-enriched niche infiltrated by
   stromal cells, displayed high expression of stromal-related proteins
   such as SMA and vimentin, which may indicate invasion and metastasis,
   and was often associated with poor prognosis^[180]61–[181]63
   (Fig. [182]6g, h).

Fig. 6. scNiche reveals tumor niches characterized with distinct phenotypes
from human TBNC dataset.

   [183]Fig. 6
   [184]Open in a new tab

   a The enrichment scores of each cell type in each tumor-enriched niche
   across 19 mixed subtype samples and 15 compartmentalized subtype
   samples. The enrichment scores are calculated among 7 tumor-enriched
   niches, and the p-value is calculated with the one-sided Mann-Whitney U
   test (*p ≤ 0.05; **p ≤ 0.01; ***p ≤ 0.001; ****p ≤ 0.0001). b Three
   representative tumor-enriched niches exhibit heterogeneity in the type
   of cells infiltrated and the degree of infiltration. Each cell is
   colored by the niche labels (left) and the cell type labels (right). c
   The average expression levels of tumor-related proteins of tumor cells
   in Niche 5 and Niche 11 in each sample. For Niche 5 (or Niche 11), only
   samples with a proportion of that niche exceeding 5% are considered.
   The p-value is calculated with the two-sided Mann-Whitney U test. Data
   are presented as boxplots (minima, 25th percentile, median, 75th
   percentile, and maxima). n = 7 (8) samples for Niche 5 (Niche 11). d
   The proportions of Niche 5 and Niche 11 in each sample. Only samples
   with a proportion of Niche 5 (or Niche 11) exceeding 5% are shown. The
   average expression levels of HLA1 and HLA-DR of tumor cells in Niche 9
   and other tumor-enriched niches in each sample (e) and the proportions
   of tumor cells, immune cells, and stromal cells in Niche 9 and other
   tumor-enriched niches in each sample (f). For Niche 9 (or other
   tumor-enriched niches), only samples with a proportion of that niche
   exceeding 5% are considered. The p-value is calculated with the
   two-sided Mann-Whitney U test. Data are presented as boxplots (minima,
   25th percentile, median, 75th percentile, and maxima). n = 8 (32)
   samples for Niche 9 (other tumor-enriched niches). The average
   expression levels of SMA and Vimentin of tumor cells in Niche 3 and
   other tumor-enriched niches in each sample (g) and the proportions of
   tumor cells, immune cells, and stromal cells in Niche 3 and other
   tumor-enriched niches in each sample (h). For Niche 3 (or other
   tumor-enriched niches), only samples with a proportion of that niche
   exceeding 5% are considered. The p-value is calculated with the
   two-sided Mann-Whitney U test. Data are presented as boxplots (minima,
   25th percentile, median, 75th percentile, and maxima). n = 10 (29)
   samples for Niche 3 (other tumor-enriched niches). Source data are
   provided as a Source Data file.

   Together, these results effectively highlight the important influence
   of the microenvironment on cellular phenotypes, while also
   demonstrating the accuracy of scNiche in revealing both compositional
   and phenotypic heterogeneity of cell niches.

scNiche characterizes the cell niches in normal and disease mouse livers

   To further demonstrate the applicability of scNiche on other types of
   spatial omics data, we next applied scNiche to a mouse liver spatial
   transcriptomics dataset generated by Cho et al.^[185]10 with the
   Seq-Scope technology. A total of 37,505 cells from 6 normal donors and
   4 early-onset liver failure induced by excessive mTORC1
   signaling^[186]64 (Tsc1^Δhep/Depdc5^Δhep or TD model) donors were
   analyzed. Considering the significant batch effects presented in the
   high-dimensional spatial transcriptomics data between normal and TD
   livers, we first used scVI^[187]65 for dimensionality reduction and
   batch effect removal before applying scNiche (Supplementary
   Fig. [188]S24).

   According to the cluster stability (Supplementary Fig. [189]S25),
   scNiche identified 15 cell niches, with the majority of them showing
   specific enrichment in either normal or TD livers, potentially
   revealing distinct physiological states (Fig. [190]7a and Supplementary
   Fig. [191]S26, [192]27a). Specifically, we found that the 7 cell niches
   (Niche 0, 3, 12, 5, 14, 11, and 1) enriched in normal livers exhibited
   spatial continuity, encompassing the zonation patterns from the central
   vein to the portal node (Fig. [193]7b). For example, Niche 0 (enriched
   with pericentral hepatocytes) and Niche 1 (enriched with periportal
   hepatocytes) were located in the pericentral and periportal zones,
   respectively, whereas the other 5 niches were situated in the
   transition zones between pericentral and periportal zones, and
   characterized by various enrichment patterns of different hepatocyte
   subtypes and other non-parenchymal cell types (Fig. [194]7c). Moreover,
   differentially expressed genes (adjusted p-value < 0.05) within these 7
   niches also showed a pronounced spatial expression pattern of zones
   (Fig. [195]7d). From Niche 0 to Niche 1, we observed a gradual decrease
   in the expression of the pericentral genes^[196]66–[197]69 such as
   Cyp2e1, Cyp1a2, Mup17, Gsta3, and Gulo. Conversely, the expression of
   the periportal genes^[198]66–[199]70 such as Ass1, Alb, Cyp2f2, Sds,
   Hsd17b13, and Mup20 exhibited a gradual increase (Fig. [200]7e and
   Supplementary Fig. [201]S27b). Meanwhile, we also found that some
   specific genes exhibited the non-monotonic expression patterns across
   the 7 consecutive niches. For example, the hepcidin-encoding genes,
   Hamp and Hamp2^[202]66, which demonstrated non-monotonic zonation
   expression patterns that peak at intermediate lobule layers, were
   highly expressed in Niche 5 and Niche 12, respectively (Fig. [203]7e).
   Additional non-monotonic genes such as Cyp8b1^[204]66 and
   Apoc1^[205]66,[206]71, were highly expressed in Niche 3 and Niche 5,
   respectively (Supplementary Fig. [207]S27b). The gene expression
   signature scores of KEGG pathways for the 7 niches also largely
   recapitulated previous zonation studies^[208]66,[209]67. Cells in Niche
   0 had higher scores of drug metabolism, primary bile acid biosynthesis,
   and metabolism of xenobiotics pathways, while cells in Niche 1 had
   higher scores of oxidative phosphorylation, gluconeogenesis, and
   complement and coagulation cascades pathways (Fig. [210]7f). Overall,
   these results suggested that scNiche can precisely reveal the zonation
   profiles of normal livers.

Fig. 7. scNiche accurately identifies the zonation patterns from the central
vein to the portal node in the normal mouse liver.

   [211]Fig. 7
   [212]Open in a new tab

   a The proportion of each donor in each niche (top) and the enrichment
   scores of each cell type in each niche (bottom) across 6 normal donors
   and 4 TD donors. The enrichment scores are calculated among all 15
   niches, and the p-value is calculated with the one-sided Mann-Whitney U
   test (*p ≤ 0.05; **p ≤ 0.01; ***p ≤ 0.001; ****p ≤ 0.0001). b The cell
   niches identified by scNiche in tile 2103, tile 2104, and tile 2105,
   each cell is colored by the niche labels (top) and the cell type labels
   (bottom). c The proportion of each cell type in Niche 0, 3, 12, 5, 14,
   11, and 1. d, The normalized average expression levels of 216
   differentially expressed genes (adjusted p-value < 0.05) in Niche 0, 3,
   12, 5, 14, 11, and 1. The p-value is calculated with the two-sided
   Wilcoxon rank-sum test and is adjusted with the Benjamini-Hochberg
   correction. e The average expression levels of Cyp2e1, Gulo, Hamp,
   Hamp2, Alb, and Cyp2f2 in Niche 0, 3, 12, 5, 14, 11, and 1. For each
   niche, only donors with a proportion of that niche exceeding 5% are
   considered. Data are presented as mean values with a 95% confidence
   interval. f The gene expression signature scores of Niche 0, 3, 12, 5,
   14, 11, and 1 for six different KEGG pathways. Data are presented as
   mean values with a 95% confidence interval. Source data are provided as
   a Source Data file.

   To further appreciate the difference in niches between normal and TD
   livers, we also investigated the cell niches enriched in TD livers. The
   scNiche results revealed three unique niches in TD livers: Niche 4,
   Niche 9, and Niche 7. These niches were spatially distributed from the
   core to the periphery of the injury and inflammation sites, and were
   characterized by the enrichment of a series of emerging cell
   populations, including inflamed macrophages, hepatic progenitor cells
   (HPC), activated hepatic stellate cells (HSC-A), and injured
   hepatocytes (Figs. [213]7a, [214]8a). Differential expression analysis
   showed that these three niches upregulated a range of injury-associated
   genes that were individually induced by different cell populations,
   possibly reflecting the unique response of different cell populations
   to liver injury as previously reported^[215]10 (Fig. [216]8b and
   Supplementary Fig. [217]S27c). For example, injured hepatocytes and HPC
   highly expressed serum amyloid protein-encoding genes (Saa1 and Saa2)
   and Spp1, respectively, which have been reported to be associated with
   injury response^[218]72,[219]73, whereas inflamed macrophages and HSC-A
   exhibited high expression levels of pro-inflammatory markers (Cd74 and
   MHC-II components) and fibrosis markers (Acta2 and collagens),
   respectively. Consistent with these up-regulated markers, the cellular
   inflammatory infiltration and fibrosis^[220]74 signature scores in
   Niche 4, Niche 9, and Niche 7 were also higher than in other niches
   (Fig. [221]8c). Overall, these results suggested that these three
   niches identified by scNiche reflected the specific microenvironment
   associated with liver injury.

Fig. 8. scNiche uncovers disease-specific liver injury niches and delineates
the niche remodeling from normal liver to liver failure.

   [222]Fig. 8
   [223]Open in a new tab

   a Highlighting of the injury-associated cell populations and niches
   (Niche 4, 9 and 7) in tile 2117, 2118, and 2119. each cell is colored
   by the niche labels (top) and the cell type labels (bottom). b The
   normalized average expression levels of differentially expressed genes
   in the injury-associated niches (Niche 4, 9, 7) and other niches. The
   top 20 genes highly expressed in the injury-associated niches are
   shown. c The gene expression signature scores of the injury-associated
   niches (Niche 4, 9, 7) and other niches for cellular inflammatory
   infiltration (top) and fibrosis (bottom). The p-value is calculated
   with the two-sided Mann-Whitney U test. Data are presented as boxplots
   (minima, 25th percentile, median, 75th percentile, and maxima).
   n = 1183, 482, 1612, and 34228 cells for Niche 4, Niche 9, Niche 7, and
   other niches. d The number of links among niches on the spatial
   connectivity graph in normal (left) and TD (right) mouse livers. CV,
   central vein; PN, portal node. e Gene set enrichment analysis (GSEA)
   results of differentially expressed genes of Niche 10. f Comparisons of
   the trends in spatial expression of genes across niches from the
   central vein to the portal node in normal and TD mouse livers. Data are
   presented as mean values with a 95% confidence interval. Source data
   are provided as a Source Data file.

   Furthermore, scNiche uncovered the partial remodeling of the zonation
   patterns from the central vein to the portal node in TD livers compared
   to normal livers. Specifically, we found that niches proximal to the
   central vein observed in normal livers (Niche 0, 3 and 12) were also
   present in TD liver, whereas niches proximal to the portal node were
   altered, despite no significant changes in their cellular composition
   (Fig. [224]8d and Supplementary Fig. [225]S27d). These results
   indicated specific phenotypic changes in the niches located proximal to
   the portal node during liver injury, which were precisely captured by
   scNiche. We further performed the differential expression analysis
   between Niche 1 and Niche 10, which were located at the portal node in
   normal and TD livers, respectively, and the results showed that
   up-regulated genes in Niche 10 compared with Niche 1 comprised some
   antioxidant genes such as Gpx3, Gsta1, Gsta2, and Gsto1; the
   cathepsins-encoding genes like Ctsl and Ctsb, which have been reported
   to potentially mediate liver fibrosis^[226]75–[227]77; and several
   specific cytochrome P450 genes whose expression were increased during
   liver injury and fibrosis, including Cyp17a1^[228]78,
   Cyp2b9^[229]79,[230]80, and Cyp2b10^[231]64,[232]79. In contrast, major
   urinary protein-encoding genes, carboxylesterase-encoding genes, and
   Traf5, a protective factor in liver inflammation and hepatic
   steatosis^[233]81,[234]82, were downregulated in Niche 10
   (Supplementary Fig. [235]S27e). Consistent with differential expression
   analysis results, gene set enrichment analysis (GSEA) confirmed that
   the up-regulated genes in Niche 10 were enriched in mTORC1 signaling,
   interferon response, inflammatory response, fibrosis-related, and
   apoptosis pathways (Fig. [236]8e). In particular, the mTORC1 signaling
   pathway, known as the vital pathway for homeostasis, metabolism,
   transplantation, and regeneration in the liver^[237]83–[238]85, has
   been reported to induce pronounced and extensive hepatocyte damage when
   activated^[239]64,[240]86,[241]87. The results of CellChat^[242]88 also
   confirmed the enhanced interactions related to inflammation and
   fibrosis between cells in Niche 10 compared to Niche 1 (Supplementary
   Fig. [243]S28). Interestingly, we further found that these differential
   expression genes reflecting phenotypic changes during liver injury
   between Niche 1 and Niche 10 exhibited spatial expression gradients
   similar to those of pericentral or periportal markers (Fig. [244]8f),
   suggesting that scNiche is also capable of precisely deciphering subtle
   spatial variation trends among different niches in either normal or
   disease states.

Discussion

   In this study, we have presented a computational framework, scNiche,
   for identifying and characterizing cell niches from spatial omics data
   at single-cell resolution. scNiche employs a different approach to
   utilizing graph neural networks compared to previous deep
   learning-based methods, which typically run graph neural networks on
   the spatial graph constructed by integrating molecular profiles of the
   cell with spatial information. Specifically, scNiche first constructs
   the separate graph for features from each view of the cell, and then
   integrates them by a multiple graph autoencoder model coupled with a
   graph fusion network. This approach provides greater flexibility in
   niche modeling while more comprehensively considering the common and
   complementary information from multiple views of cells. Additionally,
   scNiche applies a multi-view mutual information maximization module to
   guide the learning of more discriminative and clustering-friendly joint
   representations. Benchmarking studies demonstrated the superior
   performance of scNiche compared to other existing methods on different
   spatial omics datasets, including spatial transcriptomics and spatial
   proteomics.

   The batch training strategy of scNiche enables its scalability to
   large-scale spatial omics datasets containing multiple samples under
   different conditions to identify homogeneous or heterogeneous cell
   niches across multiple samples, without compromising on accuracy. Our
   results on the mouse whole brain dataset containing over 3 million
   cells effectively indicate the potential of scNiche in this regard.
   Furthermore, the results on the human TNBC dataset and the mouse liver
   dataset also convincingly demonstrate the universality of scNiche in
   identifying refined patient- or disease-specific cell niches. In the
   former analysis, we deciphered the heterogeneity of cell niches within
   different TNBC subtypes, and identified patient-specific niches that
   exhibited distinct phenotypic characteristics. In the latter analysis,
   we discovered three liver injury-associated niches characterized by the
   enrichment and co-localization of inflamed macrophages, HSC-A, HPC, and
   injured hepatocytes. Furthermore, we also revealed the specific
   remodeling of niches located proximal to the portal node during liver
   injury.

   scNiche also implements an integrated downstream analytical framework
   for the comprehensive characterization and interpretation of identified
   cell niches. The enrichment analysis framework of scNiche allows for
   the comprehensive characterization of identified cell niches from
   various perspectives (including cellular compositions, conditions,
   samples, etc.). The multi-sample analysis framework of scNiche allows
   for differential analyses at the sample scale, such as the comparison
   of specific niches across different conditions, or the comparison of
   specific cell populations across different niches, which holds the
   promise of identifying clinically relevant key niches or cell
   populations from large-scale datasets while avoiding the influence of
   individual outliers. On the other hand, benefiting from its modular
   architecture, scNiche can be conveniently compatible and integrated
   with other computational tools. For example, in the analysis of the
   mouse liver dataset in this study, we first applied scVI^[245]65 to
   remove batch effects before employing scNiche to identify cell niches.
   Similarly, in subsequent downstream analysis, we also performed the
   spatial connectivity analysis among different niches facilitated by
   Squidpy^[246]89 apart from the workflow provided by scNiche itself.

   We also have some additional concerns and discussions about the
   “cellular compositions of neighborhoods” view. First, the ablation
   results of each view on the simulated datasets indicated that this view
   seemed to contribute less to the accurate identification of cell niches
   compared to the other two views (Supplementary Table [247]1). This is
   expected, as cell types are typically inferred from the molecular
   profiles of cells; therefore, the “cellular compositions of
   neighborhoods” view may be just a coarser version of the “molecular
   profiles of neighborhoods” view. Nevertheless, we found that the
   performance of scNiche consistently declined across the simulated
   datasets as well as other biological datasets when this view was
   removed (Supplementary Fig. [248]S29), suggesting that this view, as an
   expert-based feature, may help to identify niches more accurately to a
   certain extent by reducing the potential noise that exist in the
   original molecular profiles. Second, although our results on both the
   simulated datasets and the human UTUC IMC dataset showed that scNiche
   was relatively robust to the dropout and granularity of cell types
   (Supplementary Fig. [249]S3b, [250]9b), the quality of the cell type
   labels still needs to be assessed to avoid introducing additional noise
   during the subsequent integration of multi-view features. For example,
   accurate expert-annotated or expert-verified cell type labels typically
   provide more useful information compared with annotations that are just
   derived from a clustering algorithm. Finally, cell type labels are
   usually unavailable for the spatial transcriptomics data generated by
   platforms with a lower resolution such as ST^[251]36 and 10X Visium. To
   address this issue, features from other views can be used as
   substitutes, such as the cell type deconvolution results of spots
   (which can be inferred through a series of spot deconvolution
   methods^[252]39,[253]90,[254]91) or the histological information
   extracted from H&E staining images. Benchmarking studies on the human
   DLPFC 10X Visium dataset suggested that scNiche, using the
   deconvolution results of spots inferred by Cell2location^[255]39 as
   alternative inputs, performed comparably to other state-of-the-art
   methods (Supplementary Fig. [256]S12). However, users are supposed to
   test different deconvolution methods to obtain optimal results of niche
   identification in practice. Additionally, it is worth noting that
   scNiche may still be limited in accurately resolving sufficiently
   fine-grained cellular microenvironments on low-resolution spatial
   transcriptomics data due to the resolution constraints of the spot
   arrays. Another alternative strategy is to first employ single-cell
   spatial mapping^[257]92–[258]94 or reconstruction^[259]95,[260]96
   methods to generate spatial coordinates for cells, and subsequently
   apply scNiche to the reconstructed spatially resolved single-cell data,
   which may effectively overcome the inherent limitations of technical
   platforms.

   Finally, the strategy of scNiche for modeling features from different
   views of the cell offers more possible avenues for expansion, such as
   application to spatial multi-omics data. We tested this on a postnatal
   day (P)22 mouse brain coronal section dataset generated by Zhang et
   al.^[261]97, which includes RNA-seq and CUT&Tag (acetylated histone H3
   Lys27 (H3K27ac) histone modification) modalities. As shown in
   Supplementary Fig. [262]S30, scNiche achieved clearer brain region
   identification compared to the single-modality results provided by the
   original authors. In summary, scNiche offers an accurate and scalable
   approach to identify and characterize the cell niches in tissues, with
   great potential for expanding to larger and more complex datasets.

Methods

Data collection and preprocessing

   Two multi-condition scRNA-seq datasets used for constructing simulated
   data by scCube: The human PBMCs dataset^[263]98 (eight control vs.
   eight IFN-β treated samples) and the mouse cortex dataset^[264]99 (four
   vehicle vs. four LPS treated samples) were downloaded using the
   ExperimentHub^[265]100 R package. We performed normalization and
   principal components analysis (PCA) dimensionality reduction steps on
   the data using scanpy^[266]101 Python package (version 1.9.1) before
   running scNiche.

   Mouse spleen CODEX dataset^[267]12: Raw data were downloaded from
   [268]https://data.mendeley.com/datasets/zjnpwh8m5b/1. The compartment
   labels of all cells from three wild-type spleen samples (BALBc-1,
   BALBc-2, and BALBc-3) were downloaded from
   [269]https://github.com/huBioinfo/CytoCommunity. We did not perform the
   dimensionality reduction step and retained all proteins for running
   scNiche.

   Human upper tract urothelial carcinoma (UTUC) IMC dataset^[270]33:
   Processed h5ad files that contain raw data were downloaded from
   10.5281/zenodo.6376766. A total of 16 images with manually annotated
   topological domain labels were utilized. We did not perform the
   dimensionality reduction step and retained all proteins for running
   scNiche.

   Mouse V1 neocortex STARmap dataset^[271]34: Raw data were downloaded
   from [272]https://zenodo.org/record/7830764#.ZDpObi-1HUI. A total of
   one slice replicate was utilized. We performed normalization and PCA
   dimensionality reduction steps on the data using scanpy^[273]101 Python
   package (version 1.9.1) before running scNiche.

   Mouse frontal cortex and striatum MERFISH dataset^[274]35: Processed
   h5ad files were downloaded from CELLxGENE
   ([275]https://cellxgene.cziscience.com/collections/31937775-0602-4e52-a
   799-b6acdd2bac2e). A total of 31 tissue slices were utilized. We
   retained 7 major niches shared across all slices: striatum, cortical
   layer VI, cortical layer V, corpus callosum, cortical layer II/III,
   olfactory region, and pia mater. In addition, since the data have been
   normalized by the original authors, we directly performed the PCA
   dimensionality reduction step using scanpy^[276]101 Python package
   (version 1.9.1) before running scNiche.

   Human middle temporal gyrus (MTG) snRNA-seq dataset^[277]38: Raw data
   were downloaded from
   [278]https://portal.brain-map.org/atlases-and-data/rnaseq/human-mtg-sma
   rt-seq, containing 15,928 cells of 75 transcriptomically distinct cell
   types.

   Human dorsolateral prefrontal cortex (DLPFC) 10X Visium
   dataset^[279]37: Raw data were downloaded from
   [280]http://spatial.libd.org/spatialLIBD/. A total of 4 tissue slices
   from the same donor (slice 151673, 151674, 151675, 151676) were
   utilized. Before running scNiche, we performed the normalization step
   using scanpy^[281]101 Python package (version 1.9.1) and subset the
   data based on the top 2000 highly variable genes (HVGs). Subsequently,
   we performed the dimensionality reduction and batch effect removal
   steps using scvi^[282]65 Python package (version 1.1.2) on the data.

   Mouse whole brain (ABCA-1) MERFISH dataset^[283]40: Processed h5ad
   files were downloaded from CELLxGENE
   ([284]https://cellxgene.cziscience.com/collections/0cca8620-8dee-45d0-a
   ef5-23f032a5cf09). A total of 129 coronal sections were utilized after
   removing the sections that were not registered to the Allen
   CCFv3^[285]41. Since the data have been normalized by the original
   authors, we directly performed the PCA dimensionality reduction step
   using scanpy^[286]101 Python package (version 1.9.1) before running
   scNiche.

   Human triple-negative breast cancer (TNBC) MIBI-TOF dataset^[287]13:
   Raw data were downloaded from Spatial Omics DataBase^[288]102
   ([289]https://gene.ai.tencent.com/SpatialOmics/dataset?datasetID=47). A
   total of 19 mixed subtype samples and 15 compartmentalized subtype
   samples were utilized. We did not perform the dimensionality reduction
   step and retained all proteins for running scNiche.

   Mouse liver Seq-Scope dataset^[290]10: Processed RDS files that contain
   raw gene expression matrix and cell type annotation information were
   downloaded from Deep Blue Data (10.7302/cjfe-wa35). A total of 6 normal
   donors and 4 early-onset liver failure donors were utilized. Before
   running scNiche, we performed the normalization step using
   scanpy^[291]101 Python package (version 1.9.1) and subset the data
   based on the top 2000 highly variable genes (HVGs). Subsequently, we
   performed the dimensionality reduction and batch effect removal steps
   using scvi^[292]65 Python package (version 1.1.2) on the data.

   Mouse brain spatial CUT&Tag–RNA-seq dataset^[293]97: Processed h5ad
   files were downloaded from [294]https://zenodo.org/records/10362607.
   Since the data have been processed by the authors, we directly used the
   low-dimensional representations of RNA (reduced by PCA) and CUT&Tag
   (reduced by latent semantic indexing) modalities.

   The detailed description of each dataset is summarized in Supplementary
   Data [295]1.

Design of scNiche

Model overview

   The scNiche model is developed based on the multi-view clustering
   method proposed by Wang et al.^[296]103 and consists of three
   components: M-GAE, GFN, and MMIM. Importantly, we innovate the original
   framework in the following ways to better adapt to the spatial omics
   data: (1) we expand the M-GAE model to allow the creation of graph
   convolutional layers for each view in an extensible manner, replacing
   the design of a fixed number of views in the original framework. This
   optimization greatly enhances the flexibility of niche modeling with
   different numbers or combinations of views; (2) unlike the sequential
   architecture of M-GAE and GFN in the original framework, we develop an
   improved model architecture that couples these two components, and
   optimize the corresponding training process so that the parameters of
   both M-GAE and GFN can be updated simultaneously during training,
   enhancing the synergy between them; and (3) we develop a subgraph-based
   batch training strategy to adapt to the increasing size of spatial
   omics data, enabling scNiche to scale to large datasets.

   scNiche initially extracts the multi-view features of cells within a
   pre-defined neighborhood range from the given spatial omics data and
   constructs graphs corresponding to each view. Notably, the extracted
   features can be obtained with dimensionality reduction and batch effect
   removal steps as needed before graph construction. Subsequently,
   scNiche applies this coupled neural network architecture of M-GAE and
   GFN to integrate information from multiple views and learn a joint
   representation. Furthermore, the MMIM module is also introduced to the
   learning of more clustering-friendly joint representation. Finally,
   scNiche uses k-means algorithm on the learned joint representation to
   identify cell niches, although other unsupervised clustering methods
   such as Leiden^[297]30 algorithm are also provided. Below we describe
   each step of the workflow of scNiche in detail.

Cellular neighborhoods determination

   scNiche applies the k-nearest neighbors algorithm to determine the size
   of cellular neighborhoods. We evaluated the robustness of scNiche to
   the different values of k on the simulated and biological datasets
   (Supplementary Fig. [298]S15-[299]16).

Dimensionality reduction and batch effect removal

   In contrast to spatial proteomics data, which usually contain only a
   few dozen proteins, spatial transcriptomics data can often measure
   hundreds to thousands of genes, with potential batch effects commonly
   present across tissue slices from different samples. Therefore,
   dimensionality reduction and batch effect removal need to be performed
   on the molecular profiles of the cells and their neighborhoods before
   multi-view feature fusion. Additionally, considering that the number of
   genes measured in spatial transcriptomics data usually far exceeds the
   number of cell types that exist, this preprocessing step also helps
   balance the dimensionality of features across different views, allowing
   for more accurate niche identification (Supplementary Fig. [300]S31).
   We use scVI^[301]65 by default to perform dimensionality reduction and
   batch effect removal. However, simple PCA dimensionality reduction or
   other deep learning-based integration methods like scArches^[302]104
   are also applicable.

Multiple graph auto-encoder

   To learn the joint representation that combines the common and
   complementary information from multiple views, scNiche applies a
   multiple graph autoencoder (M-GAE) model consisting of a multi-graph
   attention fusion encoder base on the GCN^[303]105 and view-specific
   decoders.

   In the multi-graph attention fusion encoder, we use V view-specific GCN
   layers as the first layer. Given the multi-view features
   [MATH: <mi
   mathvariant="script">X</mi><mo>=</mo><msubsup><mrow><mrow><mo>{</mo><mr
   ow><msup><mrow><mi>X</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>v</mi></mrow></mfenced></mrow></msup></mrow><mo>}</m
   o></mrow></mrow><mrow><mi>v</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>V<
   /mi></mrow></msubsup> :MATH]
   and the corresponding graphs
   [MATH: <mi
   mathvariant="script">A</mi><mo>=</mo><msubsup><mrow><mrow><mo>{</mo><mr
   ow><msup><mrow><mi>A</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>v</mi></mrow></mfenced></mrow></msup></mrow><mo>}</m
   o></mrow></mrow><mrow><mi>v</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>V<
   /mi></mrow></msubsup> :MATH]
   , where
   [MATH: <msup><mrow><mi>X</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>v</mi></mrow></mfenced></mrow></msup><msup><mrow><mi
   mathvariant="double-struck">\in</mi><mi
   mathvariant="double-struck">R</mi></mrow><mrow><mi>N</mi><mo>×</mo><mi>
   F</mi></mrow></msup> :MATH]
   is the feature matrix of the v-th view with N nodes and F features, and
   [MATH: <msup><mrow><mi>A</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>v</mi></mrow></mfenced></mrow></msup><msup><mrow><mi
   mathvariant="double-struck">\in</mi><mi
   mathvariant="double-struck">R</mi></mrow><mrow><mi>N</mi><mo>×</mo><mi>
   N</mi></mrow></msup> :MATH]
   is the graph of the v-th view, the v-th view-specific representations
   [MATH:
   <msubsup><mrow><mi>Z</mi></mrow><mrow><mrow><mo>(</mo><mrow><mn>1</mn><
   /mrow><mo>)</mo></mrow></mrow><mrow><mrow><mo>(</mo><mrow><mi>v</mi></m
   row><mo>)</mo></mrow></mrow></msubsup> :MATH]
   learned by the first layer can be obtained as follow:
   [MATH:
   <msubsup><mrow><mi>Z</mi></mrow><mrow><mrow><mo>(</mo><mrow><mn>1</mn><
   /mrow><mo>)</mo></mrow></mrow><mrow><mrow><mo>(</mo><mrow><mi>v</mi></m
   row><mo>)</mo></mrow></mrow></msubsup><mo>=</mo><mi>δ</mi><mfenced
   close=")" open="("><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><msup><mrow><mover
   accent="true"><mrow><mi>D</mi></mrow><mo>~</mo></mover></mrow><mrow><mr
   ow><mo>(</mo><mrow><mi>v</mi></mrow><mo>)</mo></mrow></mrow></msup></mr
   ow></mfenced></mrow><mrow><mo>−</mo><mfrac><mrow><mn>1</mn></mrow><mrow
   ><mn>2</mn></mrow></mfrac></mrow></msup><msup><mrow><mover
   accent="true"><mrow><mi>A</mi></mrow><mo>~</mo></mover></mrow><mrow><mr
   ow><mo>(</mo><mrow><mi>v</mi></mrow><mo>)</mo></mrow></mrow></msup><msu
   p><mrow><mfenced close=")" open="("><mrow><msup><mrow><mover
   accent="true"><mrow><mi>D</mi></mrow><mo>~</mo></mover></mrow><mrow><mr
   ow><mo>(</mo><mrow><mi>v</mi></mrow><mo>)</mo></mrow></mrow></msup></mr
   ow></mfenced></mrow><mrow><mo>−</mo><mfrac><mrow><mn>1</mn></mrow><mrow
   ><mn>2</mn></mrow></mfrac></mrow></msup><msup><mrow><mi>X</mi></mrow><m
   row><mrow><mo>(</mo><mrow><mi>v</mi></mrow><mo>)</mo></mrow></mrow></ms
   up><msubsup><mrow><mi>W</mi></mrow><mrow><mrow><mo>(</mo><mrow><mn>1</m
   n></mrow><mo>)</mo></mrow></mrow><mrow><mrow><mo>(</mo><mrow><mi>v</mi>
   </mrow><mo>)</mo></mrow></mrow></msubsup></mrow></mfenced> :MATH]
   1

   where δ(·) is the activation function.
   [MATH: <mover
   accent="true"><mrow><mi>A</mi></mrow><mo>~</mo></mover><mo>=</mo><mi>A<
   /mi><mo>+</mo><mi>I</mi> :MATH]
   , I is the identity diagonal matrix.
   [MATH: <mover
   accent="true"><mrow><mi>D</mi></mrow><mo>~</mo></mover><mo>=</mo><mi
   mathvariant="normal">diag</mi><mrow><mo>(</mo><mrow><msub><mrow><mo>∑</
   mo></mrow><mrow><mi>j</mi></mrow></msub><msub><mrow><mover
   accent="true"><mrow><mi>A</mi></mrow><mo>~</mo></mover></mrow><mrow><mi
   >i</mi><mi>j</mi></mrow></msub></mrow><mo>)</mo></mrow> :MATH]
   is the degree matrix of
   [MATH: <mover accent="true"><mrow><mi>A</mi></mrow><mo>~</mo></mover>
   :MATH]
   .
   [MATH:
   <msubsup><mrow><mi>W</mi></mrow><mrow><mrow><mo>(</mo><mrow><mn>1</mn><
   /mrow><mo>)</mo></mrow></mrow><mrow><mrow><mo>(</mo><mrow><mi>v</mi></m
   row><mo>)</mo></mrow></mrow></msubsup><msup><mrow><mi
   mathvariant="double-struck">\in</mi><mspace width="0.25em"></mspace><mi
   mathvariant="double-struck">R</mi></mrow><mrow><mi>F</mi><mo>×</mo><msu
   b><mrow><mi>d</mi></mrow><mrow><mn>1</mn></mrow></msub></mrow></msup>
   :MATH]
   is the parameter matrix of the v-th view learned by GCN layers, with
   [MATH: <msub><mrow><mi>d</mi></mrow><mrow><mn>1</mn></mrow></msub>
   :MATH]
   being the output dimension for GCN layers.

   To adaptively fuse the representations of a sample across different
   views, we introduce an attention coefficient matrix
   [MATH:
   <msubsup><mrow><mi>W</mi></mrow><mrow><mi>a</mi></mrow><mrow><mrow><mo>
   (</mo><mrow><mi>v</mi></mrow><mo>)</mo></mrow></mrow></msubsup> :MATH]
   to learn the importance of each view. This allows for a weighted
   combination of view-specific representations, leading to a more
   informative common representation. The operation to compute this joint
   representation, denoted as
   [MATH:
   <msub><mrow><mi>Z</mi></mrow><mrow><mrow><mo>(</mo><mrow><mn>2</mn></mr
   ow><mo>)</mo></mrow></mrow></msub> :MATH]
   , is defined by the following equation:
   [MATH:
   <msub><mrow><mi>Z</mi></mrow><mrow><mrow><mo>(</mo><mrow><mn>2</mn></mr
   ow><mo>)</mo></mrow></mrow></msub><mo>=</mo><mi>δ</mi><mfenced
   close=")" open="("><mrow><msubsup><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>v</mi><mo>=</mo><mn>1</mn></mrow>
   <mrow><mi>V</mi></mrow></msubsup><msubsup><mrow><mi>W</mi></mrow><mrow>
   <mi>a</mi></mrow><mrow><mrow><mo>(</mo><mrow><mi>v</mi></mrow><mo>)</mo
   ></mrow></mrow></msubsup><mfenced close=")"
   open="("><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><msup><mrow><mover
   accent="true"><mrow><mi>D</mi></mrow><mo>~</mo></mover></mrow><mrow><mr
   ow><mo>(</mo><mrow><mi>v</mi></mrow><mo>)</mo></mrow></mrow></msup></mr
   ow></mfenced></mrow><mrow><mo>−</mo><mfrac><mrow><mn>1</mn></mrow><mrow
   ><mn>2</mn></mrow></mfrac></mrow></msup><msup><mrow><mover
   accent="true"><mrow><mi>A</mi></mrow><mo>~</mo></mover></mrow><mrow><mr
   ow><mo>(</mo><mrow><mi>v</mi></mrow><mo>)</mo></mrow></mrow></msup><msu
   p><mrow><mfenced close=")" open="("><mrow><msup><mrow><mover
   accent="true"><mrow><mi>D</mi></mrow><mo>~</mo></mover></mrow><mrow><mr
   ow><mo>(</mo><mrow><mi>v</mi></mrow><mo>)</mo></mrow></mrow></msup></mr
   ow></mfenced></mrow><mrow><mo>−</mo><mfrac><mrow><mn>1</mn></mrow><mrow
   ><mn>2</mn></mrow></mfrac></mrow></msup><msubsup><mrow><mi>Z</mi></mrow
   ><mrow><mrow><mo>(</mo><mrow><mn>1</mn></mrow><mo>)</mo></mrow></mrow><
   mrow><mrow><mo>(</mo><mrow><mi>v</mi></mrow><mo>)</mo></mrow></mrow></m
   subsup><msubsup><mrow><mi>W</mi></mrow><mrow><mrow><mo>(</mo><mrow><mn>
   2</mn></mrow><mo>)</mo></mrow></mrow><mrow><mrow><mo>(</mo><mrow><mi>v<
   /mi></mrow><mo>)</mo></mrow></mrow></msubsup></mrow></mfenced></mrow></
   mfenced> :MATH]
   2

   where
   [MATH:
   <msubsup><mrow><mi>W</mi></mrow><mrow><mrow><mo>(</mo><mrow><mn>2</mn><
   /mrow><mo>)</mo></mrow></mrow><mrow><mrow><mo>(</mo><mrow><mi>v</mi></m
   row><mo>)</mo></mrow></mrow></msubsup><msup><mrow><mi
   mathvariant="double-struck">\in</mi><mspace width="0.25em"></mspace><mi
   mathvariant="double-struck">R</mi></mrow><mrow><msub><mrow><mi>d</mi></
   mrow><mrow><mn>1</mn></mrow></msub><mo>×</mo><msub><mrow><mi>d</mi></mr
   ow><mrow><mn>2</mn></mrow></msub></mrow></msup> :MATH]
   is the parameter matrix learned by GCN layers, the
   [MATH:
   <msubsup><mrow><mi>W</mi></mrow><mrow><mi>a</mi></mrow><mrow><mrow><mo>
   (</mo><mrow><mi>v</mi></mrow><mo>)</mo></mrow></mrow></msubsup><msup><m
   row><mi mathvariant="double-struck">\in</mi><mspace
   width="0.25em"></mspace><mi
   mathvariant="double-struck">R</mi></mrow><mrow><msub><mrow><mi>d</mi></
   mrow><mrow><mn>2</mn></mrow></msub><mo>×</mo><msub><mrow><mi>d</mi></mr
   ow><mrow><mn>2</mn></mrow></msub></mrow></msup> :MATH]
   is the attention coefficient matrix, with d[2] being the output
   dimension for GCN layers.

   We then continue to use the GCN layers to apply convolution over the
   obtained joint representation
   [MATH:
   <msub><mrow><mi>Z</mi></mrow><mrow><mrow><mo>(</mo><mrow><mn>2</mn></mr
   ow><mo>)</mo></mrow></mrow></msub> :MATH]
   and the consensus graph
   [MATH: <msup><mrow><mi>A</mi></mrow><mrow><mo>*</mo></mrow></msup>
   :MATH]
   learned by the graph fusion network, and the final joint representation
   Z can be obtained as follow:
   [MATH: <mi>Z</mi><mo>=</mo><mi>δ</mi><mfenced close=")"
   open="("><mrow><msup><mrow><mfenced close=")"
   open="("><mrow><msup><mrow><mover
   accent="true"><mrow><mi>D</mi></mrow><mo>~</mo></mover></mrow><mrow><mo
   >*</mo></mrow></msup></mrow></mfenced></mrow><mrow><mo>−</mo><mfrac><mr
   ow><mn>1</mn></mrow><mrow><mn>2</mn></mrow></mfrac></mrow></msup><msup>
   <mrow><mover
   accent="true"><mrow><mi>A</mi></mrow><mo>~</mo></mover></mrow><mrow><mo
   >*</mo></mrow></msup><msup><mrow><mfenced close=")"
   open="("><mrow><msup><mrow><mover
   accent="true"><mrow><mi>D</mi></mrow><mo>~</mo></mover></mrow><mrow><mo
   >*</mo></mrow></msup></mrow></mfenced></mrow><mrow><mo>−</mo><mfrac><mr
   ow><mn>1</mn></mrow><mrow><mn>2</mn></mrow></mfrac></mrow></msup><msub>
   <mrow><mi>Z</mi></mrow><mrow><mrow><mo>(</mo><mrow><mn>2</mn></mrow><mo
   >)</mo></mrow></mrow></msub><msub><mrow><mi>W</mi></mrow><mrow><mrow><m
   o>(</mo><mrow><mn>3</mn></mrow><mo>)</mo></mrow></mrow></msub></mrow></
   mfenced> :MATH]
   3

   where
   [MATH:
   <msub><mrow><mi>W</mi></mrow><mrow><mrow><mo>(</mo><mrow><mn>3</mn></mr
   ow><mo>)</mo></mrow></mrow></msub><msup><mrow><mi
   mathvariant="double-struck">\in</mi><mspace width="0.25em"></mspace><mi
   mathvariant="double-struck">R</mi></mrow><mrow><msub><mrow><mi>d</mi></
   mrow><mrow><mn>2</mn></mrow></msub><mo>×</mo><msub><mrow><mi>d</mi></mr
   ow><mrow><mn>3</mn></mrow></msub></mrow></msup> :MATH]
   is the parameter matrix learned by GCN layers, with
   [MATH: <msub><mrow><mi>d</mi></mrow><mrow><mn>3</mn></mrow></msub>
   :MATH]
   being the output dimension for GCN layers.

   In the view-specific decoders, we use the inner-product as the decoder
   to reconstruct the multi-view graphs from the joint representation Z:
   [MATH: <msup><mrow><mover
   accent="true"><mrow><mi>A</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><mrow><mo>(</mo><mrow><mi>v</mi></mrow><mo>)</mo></mrow></mro
   w></msup><mo>=</mo><mi mathvariant="normal">sigmoid</mi><mfenced
   close=")"
   open="("><mrow><mi>Z</mi><mo>⋅</mo><msup><mrow><mi>W</mi></mrow><mrow><
   mrow><mo>(</mo><mrow><mi>v</mi></mrow><mo>)</mo></mrow></mrow></msup><m
   o>⋅</mo><msup><mrow><mi>Z</mi></mrow><mrow><mi>T</mi></mrow></msup></mr
   ow></mfenced> :MATH]
   4

   where
   [MATH:
   <msup><mrow><mi>W</mi></mrow><mrow><mrow><mo>(</mo><mrow><mi>v</mi></mr
   ow><mo>)</mo></mrow></mrow></msup><msup><mrow><mi
   mathvariant="double-struck">\in</mi><mspace width="0.25em"></mspace><mi
   mathvariant="double-struck">R</mi></mrow><mrow><msub><mrow><mi>d</mi></
   mrow><mrow><mn>3</mn></mrow></msub><mo>×</mo><msub><mrow><mi>d</mi></mr
   ow><mrow><mn>3</mn></mrow></msub></mrow></msup> :MATH]
   is the parameter matrix learned by the v-th view-specific decoder.

   In order to minimize the reconstruction error between the original
   graph
   [MATH: <msup><mrow><mi>A</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>v</mi></mrow></mfenced></mrow></msup> :MATH]
   and the reconstruction graph
   [MATH: <msup><mrow><mover
   accent="true"><mrow><mi>A</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><mrow><mo>(</mo><mrow><mi>v</mi></mrow><mo>)</mo></mrow></mro
   w></msup> :MATH]
   of each view, the loss of the multiple graph autoencoder is defined as:
   [MATH:
   <msub><mrow><mi>L</mi></mrow><mrow><mi>r</mi><mi>e</mi><mi>c</mi></mrow
   ></msub><mo>=</mo><msubsup><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>v</mi><mo>=</mo><mn>1</mn></mrow>
   <mrow><mi>V</mi></mrow></msubsup><msubsup><mrow><mi>L</mi></mrow><mrow>
   <mi>r</mi><mi>e</mi><mi>c</mi></mrow><mrow><mrow><mo>(</mo><mrow><mi>v<
   /mi></mrow><mo>)</mo></mrow></mrow></msubsup><mo>=</mo><msubsup><mrow><
   mo
   mathsize="big">∑</mo></mrow><mrow><mi>v</mi><mo>=</mo><mn>1</mn></mrow>
   <mrow><mi>V</mi></mrow></msubsup><mi>l</mi><mi>o</mi><mi>s</mi><mi>s</m
   i><mfenced close=")"
   open="("><mrow><msup><mrow><mi>A</mi></mrow><mrow><mrow><mo>(</mo><mrow
   ><mi>v</mi></mrow><mo>)</mo></mrow></mrow></msup><mo>,</mo><msup><mrow>
   <mover
   accent="true"><mrow><mi>A</mi></mrow><mrow><mo>^</mo></mrow></mover></m
   row><mrow><mrow><mo>(</mo><mrow><mi>v</mi></mrow><mo>)</mo></mrow></mro
   w></msup></mrow></mfenced> :MATH]
   5

   The loss function to be optimized is binary cross entropy (BCE) loss.

Graph fusion network

   In the scNiche framework, we introduce an additional graph fusion
   network (GFN) in the M-GAE model to learn the consensus graph
   [MATH: <msup><mrow><mi>A</mi></mrow><mrow><mo>*</mo></mrow></msup>
   :MATH]
   , which contains the global adjacency relations of graphs from
   different views (i.e., the global node relationships). Notably, the
   information is shared between M-GAE and GFN during training. The GFN is
   a two-layer fully connected model, where the first layer is followed by
   a ReLU activation. The consensus graph learned by l-th layer can be
   described as:
   [MATH: <msub><mrow><mi>G</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>l</mi></mrow></mfenced></mrow></msub><mo>=</mo><mi>δ
   </mi><mfenced close=")"
   open="("><mrow><msub><mrow><mi>W</mi></mrow><mrow><mi>G</mi><mi>F</mi><
   mi>N</mi><mfenced close=")"
   open="("><mrow><mi>l</mi></mrow></mfenced></mrow></msub><msub><mrow><mi
   >G</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>l</mi><mo>−</mo><mn>1</mn></mrow></mfenced></mrow></
   msub><mo>+</mo><msub><mrow><mi>b</mi></mrow><mrow><mi>G</mi><mi>F</mi><
   mi>N</mi><mfenced close=")"
   open="("><mrow><mi>l</mi></mrow></mfenced></mrow></msub></mrow></mfence
   d> :MATH]
   6

   where δ(·) is the activation function,
   [MATH:
   <msub><mrow><mi>W</mi></mrow><mrow><mi>G</mi><mi>F</mi><mi>N</mi><mfenc
   ed close=")"
   open="("><mrow><mi>l</mi></mrow></mfenced></mrow></msub><msup><mrow><mi
   mathvariant="double-struck">\in</mi><mspace width="0.25em"></mspace><mi
   mathvariant="double-struck">R</mi></mrow><mrow><msub><mrow><mi>d</mi></
   mrow><mrow><mi>l</mi></mrow></msub><mo>×</mo><msub><mrow><mi>d</mi></mr
   ow><mrow><mi>l</mi><mo>−</mo><mn>1</mn></mrow></msub></mrow></msup>
   :MATH]
   and
   [MATH:
   <msub><mrow><mi>b</mi></mrow><mrow><mi>G</mi><mi>F</mi><mi>N</mi><mfenc
   ed close=")"
   open="("><mrow><mi>l</mi></mrow></mfenced></mrow></msub><mo>∈</mo><msup
   ><mrow><mi
   mathvariant="double-struck">R</mi></mrow><mrow><msub><mrow><mi>d</mi></
   mrow><mrow><mi>l</mi></mrow></msub></mrow></msup> :MATH]
   are the weight matrix and bias of the l-th layer, respectively, with
   [MATH: <msub><mrow><mi>d</mi></mrow><mrow><mi>l</mi></mrow></msub>
   :MATH]
   being the output dimension for layer l. The initial input
   [MATH: <msub><mrow><mi>G</mi></mrow><mrow><mn>0</mn></mrow></msub>
   :MATH]
   to the GFN is defined as:
   [MATH: <msub><mrow><mi>G</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mn>0</mn></mrow></mfenced></mrow></msub><mo>=</mo><msub
   sup><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>v</mi><mo>=</mo><mn>1</mn></mrow>
   <mrow><mi>V</mi></mrow></msubsup><msup><mrow><mi>A</mi></mrow><mrow><mr
   ow><mo>(</mo><mrow><mi>v</mi></mrow><mo>)</mo></mrow></mrow></msup>
   :MATH]
   7

   where
   [MATH:
   <msup><mrow><mi>A</mi></mrow><mrow><mrow><mo>(</mo><mrow><mi>v</mi></mr
   ow><mo>)</mo></mrow></mrow></msup> :MATH]
   is the graphs from v-th view, and V is the number of views.

   The GFN’s goal is to ensure that the final consensus graph
   [MATH: <msup><mrow><mi>A</mi></mrow><mrow><mo>*</mo></mrow></msup>
   :MATH]
   integrates the information from each individual graph
   [MATH:
   <msup><mrow><mi>A</mi></mrow><mrow><mrow><mo>(</mo><mrow><mi>v</mi></mr
   ow><mo>)</mo></mrow></mrow></msup> :MATH]
   comprehensively. The optimization of the GFN involves a loss function
   that minimizes the discrepancy between the individual graphs from each
   view
   [MATH:
   <msup><mrow><mi>A</mi></mrow><mrow><mrow><mo>(</mo><mrow><mi>v</mi></mr
   ow><mo>)</mo></mrow></mrow></msup> :MATH]
   and the consensus graph
   [MATH: <msup><mrow><mi>A</mi></mrow><mrow><mo>*</mo></mrow></msup>
   :MATH]
   :
   [MATH:
   <msub><mrow><mi>L</mi></mrow><mrow><mi>g</mi><mi>r</mi><mi>e</mi></mrow
   ></msub><mo>=</mo><msubsup><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>v</mi><mo>=</mo><mn>1</mn></mrow>
   <mrow><mi>V</mi></mrow></msubsup><mi>l</mi><mi>o</mi><mi>s</mi><mi>s</m
   i><mfenced close=")"
   open="("><mrow><msup><mrow><mi>A</mi></mrow><mrow><mrow><mo>(</mo><mrow
   ><mi>v</mi></mrow><mo>)</mo></mrow></mrow></msup><mo>,</mo><msup><mrow>
   <mi>A</mi></mrow><mrow><mo>*</mo></mrow></msup></mrow></mfenced> :MATH]
   8

   This loss function is typically a mean squared error (MSE) loss,
   focusing on reducing errors between the graphs from each view and the
   synthesized consensus graph.

Multi-view mutual information maximization

   Mutual information is a Shannon entropy-based measure of dependence
   between random variables^[304]106. Recent studies have revealed that
   maximizing the mutual information between input samples and learned
   latent representations contributes to the learning of useful
   representations by the models (such as the encoder)^[305]107. Given the
   input samples
   [MATH: <mi>X</mi><mo>=</mo><msubsup><mrow><mfenced close="}"
   open="{"><mrow><msup><mrow><mi>x</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup></mrow></mfence
   d></mrow><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>N</mi></m
   row></msubsup> :MATH]
   and the corresponding representations
   [MATH: <mi>Y</mi><mo>=</mo><msubsup><mrow><mfenced close="}"
   open="{"><mrow><msup><mrow><mi>y</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup></mrow></mfence
   d></mrow><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>N</mi></m
   row></msubsup> :MATH]
   , the mutual information between X and Y can be expressed as:
   [MATH: <mi>I</mi><mfenced close=")"
   open="("><mrow><mi>X</mi><mo>,</mo><mi>Y</mi></mrow></mfenced><mo>=</mo
   ><mi>∬</mi><mi>p</mi><mfenced close=")"
   open="("><mrow><mi>y</mi><mo>∣</mo><mi>x</mi></mrow></mfenced><mi>p</mi
   ><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced><mi>log</mi><mfrac><mrow><mi>
   p</mi><mfenced close=")"
   open="("><mrow><mi>y</mi><mo>∣</mo><mi>x</mi></mrow></mfenced></mrow><m
   row><mi>p</mi><mfenced close=")"
   open="("><mrow><mi>y</mi></mrow></mfenced></mrow></mfrac><mi>d</mi><mi>
   x</mi><mspace
   width="0.25em"></mspace><mi>d</mi><mi>y</mi><mo>=</mo><msub><mrow><mi
   class="MJX-tex-caligraphic"
   mathvariant="script">D</mi></mrow><mrow><mi>K</mi><mi>L</mi></mrow></ms
   ub><mrow><mo>(</mo><mrow><mi>p</mi><mfenced close=")"
   open="("><mrow><mi>y</mi><mo>∣</mo><mi>x</mi></mrow></mfenced><mi>p</mi
   ><mfenced close=")" open="("><mrow><mi>x</mi></mrow></mfenced><mi
   mathvariant="normal">∣∣</mi><mi>p</mi><mfenced close=")"
   open="("><mrow><mi>y</mi></mrow></mfenced><mi>p</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced></mrow><mo>)</mo></mrow>
   :MATH]
   9

   Based on the assumption by Wang et al.^[306]103 that if two samples x
   and
   [MATH: <msup><mrow><mi>x</mi></mrow><mrow><mo>′</mo></mrow></msup>
   :MATH]
   are close in any view, their corresponding representations z and
   [MATH: <msup><mrow><mi>z</mi></mrow><mrow><mo>′</mo></mrow></msup>
   :MATH]
   should also be close in the common latent view, we here applied their
   multi-view mutual information maximization (MMIM) module to guide the
   learning of the clustering-friendly joint representations.
   Specifically, the MMIM module aims to guide the coupled model of M-GAE
   and GFN to ultimately generate more useful joint representations for
   each cell by boosting the similarity of the multi-view joint
   representations of two cells that are similar to each other in any
   view, as a way to make subsequent cell niche identification more
   accurate. According to the relevant properties of mutual information,
   larger mutual information denotes the representations are more similar,
   thus the optimization objective of the MMIM module can be expressed as:
   [MATH: <mi>max</mi><mfenced close="}" open="{"><mrow><mi>I</mi><mfenced
   close=")"
   open="("><mrow><mi>Z</mi><mo>,</mo><msup><mrow><mi>Z</mi></mrow><mrow><
   mo>′</mo></mrow></msup></mrow></mfenced></mrow></mfenced> :MATH]
   10

   Where Z and
   [MATH: <msup><mrow><mi>Z</mi></mrow><mrow><mo>′</mo></mrow></msup>
   :MATH]
   are the corresponding representations of the samples X and their
   nearest neighbors
   [MATH: <msup><mrow><mi>X</mi></mrow><mrow><mo>′</mo></mrow></msup>
   :MATH]
   , respectively.

   According to the Eq. ([307]9) and Eq. ([308]10), the loss of the MMIM
   module can be written as:
   [MATH:
   <msub><mrow><mi>L</mi></mrow><mrow><mi>m</mi><mi>i</mi><mi>m</mi></mrow
   ></msub><mo>=</mo><mo>−</mo><msub><mrow><mi class="MJX-tex-caligraphic"
   mathvariant="script">D</mi></mrow><mrow><mi>K</mi><mi>L</mi></mrow></ms
   ub><mrow><mo>(</mo><mrow><mi>p</mi><mfenced close=")"
   open="("><mrow><msup><mrow><mi>z</mi></mrow><mrow><mo>′</mo></mrow></ms
   up><mo>∣</mo><mi>z</mi></mrow></mfenced><mi>p</mi><mfenced close=")"
   open="("><mrow><mi>z</mi></mrow></mfenced><mo>∣</mo><mo>∣</mo><mi>p</mi
   ><mfenced close=")"
   open="("><mrow><msup><mrow><mi>z</mi></mrow><mrow><mo>′</mo></mrow></ms
   up></mrow></mfenced><mi>p</mi><mfenced close=")"
   open="("><mrow><mi>z</mi></mrow></mfenced></mrow><mo>)</mo></mrow>
   :MATH]
   11

   Since the KL divergence is unbounded, we use JS divergence instead of
   KL divergence in mutual information and Eq. ([309]11) can be converted
   to:
   [MATH:
   <msub><mrow><mi>L</mi></mrow><mrow><mi>m</mi><mi>i</mi><mi>m</mi></mrow
   ></msub><mo>=</mo><mo>−</mo><msub><mrow><mi class="MJX-tex-caligraphic"
   mathvariant="script">D</mi></mrow><mrow><mi>J</mi><mi>S</mi></mrow></ms
   ub><mrow><mo>(</mo><mrow><mi>p</mi><mfenced close=")"
   open="("><mrow><msup><mrow><mi>z</mi></mrow><mrow><mo>′</mo></mrow></ms
   up><mo>∣</mo><mi>z</mi></mrow></mfenced><mi>p</mi><mfenced close=")"
   open="("><mrow><mi>z</mi></mrow></mfenced><mo>∣</mo><mo>∣</mo><mi>p</mi
   ><mfenced close=")"
   open="("><mrow><mi>z</mi><mo>′</mo></mrow></mfenced><mi>p</mi><mfenced
   close=")"
   open="("><mrow><mi>z</mi></mrow></mfenced></mrow><mo>)</mo></mrow>
   :MATH]
   12

   The JS divergence, as a specific case of the f-divergences^[310]108, is
   challenging to compute directly in practice. We thus utilize the
   variational lower bound on the f-divergences
   [MATH: <msub><mrow><mi class="MJX-tex-caligraphic"
   mathvariant="script">D</mi></mrow><mrow><mi>f</mi></mrow></msub><mfence
   d close=")"
   open="("><mrow><mi>P</mi><mo>∣</mo><mo>∣</mo><mi>Q</mi></mrow></mfenced
   > :MATH]
   ^[311]108 to estimate a generative model Q given the true distribution
   P. In this approach, we adopt the generative-adversarial network
   methodology, employing two neural networks: Q and T. Here, Q is our
   generative model that outputs a sample of interest from a random vector
   input, and T is the variational function that evaluates these samples.
   The variational estimation of f-divergences is defined as:
   [MATH: <msub><mrow><mi class="MJX-tex-caligraphic"
   mathvariant="script">D</mi></mrow><mrow><mi>f</mi></mrow></msub><mfence
   d close=")"
   open="("><mrow><mi>P</mi><mo>∣</mo><mo>∣</mo><mi>Q</mi></mrow></mfenced
   ><mo>=</mo><msub><mrow><mi>max</mi></mrow><mrow><mi>T</mi></mrow></msub
   ><mrow><mo>(</mo><mrow><msub><mrow><mi
   mathvariant="double-struck">E</mi></mrow><mrow><mi>x</mi><mo>~</mo><mi>
   p</mi><mrow><mo>(</mo><mrow><mi>x</mi></mrow><mo>)</mo></mrow></mrow></
   msub><mfenced close="]"
   open="["><mrow><mi>T</mi><mrow><mo>(</mo><mrow><mi>x</mi></mrow><mo>)</
   mo></mrow></mrow></mfenced><mo>−</mo><msub><mrow><mi
   mathvariant="double-struck">E</mi></mrow><mrow><mi>x</mi><mo>~</mo><mi>
   q</mi><mrow><mo>(</mo><mrow><mi>x</mi></mrow><mo>)</mo></mrow></mrow></
   msub><mfenced close="]"
   open="["><mrow><mi>g</mi><mrow><mo>(</mo><mrow><mi>T</mi><mrow><mo>(</m
   o><mrow><mi>x</mi></mrow><mo>)</mo></mrow></mrow><mo>)</mo></mrow></mro
   w></mfenced></mrow><mo>)</mo></mrow> :MATH]
   13

   where
   [MATH:
   <mi>p</mi><mrow><mo>(</mo><mrow><mi>x</mi></mrow><mo>)</mo></mrow>
   :MATH]
   and
   [MATH:
   <mi>q</mi><mrow><mo>(</mo><mrow><mi>x</mi></mrow><mo>)</mo></mrow>
   :MATH]
   are the probability density functions of the true distribution P and
   the estimated distribution Q respectively, with Q being parameterized
   by the generative model in the GAN framework. The functions
   [MATH:
   <mi>f</mi><mrow><mo>(</mo><mrow><mi>u</mi></mrow><mo>)</mo></mrow>
   :MATH]
   and the its conjugate
   [MATH:
   <mi>g</mi><mrow><mo>(</mo><mrow><mi>t</mi></mrow><mo>)</mo></mrow>
   :MATH]
   dictate the specific form of divergence being measured^[312]108:
   [MATH:
   <mi>f</mi><mrow><mo>(</mo><mrow><mi>u</mi></mrow><mo>)</mo></mrow><mo>=
   </mo><mo>−</mo><mfenced close=")"
   open="("><mrow><mi>u</mi><mo>+</mo><mn>1</mn></mrow></mfenced><mi>log</
   mi><mfrac><mrow><mi>u</mi><mo>+</mo><mn>1</mn></mrow><mrow><mn>2</mn></
   mrow></mfrac><mo>+</mo><mi>u</mi><mi>log</mi><mi>u</mi><mi>g</mi><mrow>
   <mo>(</mo><mrow><mi>t</mi></mrow><mo>)</mo></mrow><mo>=</mo><mo>−</mo><
   mi>log</mi><mrow><mo>(</mo><mrow><mn>2</mn><mo>−</mo><msup><mrow><mi>e<
   /mi></mrow><mrow><mi>t</mi></mrow></msup></mrow><mo>)</mo></mrow>
   :MATH]
   14

   This framework facilitates the calculation of the JS divergence as
   follows:
   [MATH: <msub><mrow><mi class="MJX-tex-caligraphic"
   mathvariant="script">D</mi></mrow><mrow><mi>J</mi><mi>S</mi></mrow></ms
   ub><mfenced close=")"
   open="("><mrow><mi>P</mi><mo>∣</mo><mo>∣</mo><mi>Q</mi></mrow></mfenced
   ><mo>=</mo><msub><mrow><mi>max</mi></mrow><mrow><mi>T</mi></mrow></msub
   ><mrow><mo>(</mo><mrow><msub><mrow><mi
   mathvariant="double-struck">E</mi></mrow><mrow><mi>x</mi><mo>~</mo><mi>
   p</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced></mrow></msub><mfenced
   close="]" open="["><mrow><mi>T</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced></mrow></mfenced><mo>−</mo><m
   sub><mrow><mi
   mathvariant="double-struck">E</mi></mrow><mrow><mi>x</mi><mo>~</mo><mi>
   q</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced></mrow></msub><mfenced
   close="]" open="["><mrow><mi>g</mi><mfenced close=")"
   open="("><mrow><mi>T</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced></mrow></mfenced></mrow></mfe
   nced></mrow><mo>)</mo></mrow><mo>=</mo><msub><mrow><mi>max</mi></mrow><
   mrow><mi>T</mi></mrow></msub><mfenced close=")"
   open="("><mrow><mo>∫</mo><mi>p</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced><mi>T</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced><mo>−</mo><mi>q</mi><mfenced
   close=")" open="("><mrow><mi>x</mi></mrow></mfenced><mi>g</mi><mfenced
   close=")" open="("><mrow><mi>T</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced></mrow></mfenced><mi>d</mi><m
   i>x</mi></mrow></mfenced><mo>=</mo><msub><mrow><mi>max</mi></mrow><mrow
   ><mi>T</mi></mrow></msub><mfenced close=")"
   open="("><mrow><mo>∫</mo><mi>p</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced><mi>T</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced><mo>−</mo><mi>q</mi><mfenced
   close=")" open="("><mrow><mi>x</mi></mrow></mfenced><mfenced close="]"
   open="["><mrow><mo>−</mo><mi>log</mi><mfenced close=")"
   open="("><mrow><mn>2</mn><mo>−</mo><msup><mrow><mi>e</mi></mrow><mrow><
   mi>T</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced></mrow></msup></mrow></mfence
   d></mrow></mfenced><mi>d</mi><mi>x</mi></mrow></mfenced> :MATH]
   15

   Let
   [MATH: <mi>T</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced><mo>=</mo><mi>log</mi><mrow><
   mo>[</mo><mrow><mn>2</mn><mi>D</mi><mrow><mo>(</mo><mrow><mi>x</mi></mr
   ow><mo>)</mo></mrow></mrow><mo>]</mo></mrow> :MATH]
   (Here,
   [MATH:
   <mi>D</mi><mrow><mo>(</mo><mrow><mi>x</mi></mrow><mo>)</mo></mrow>
   :MATH]
   is a variational function that can be related to
   [MATH:
   <mi>T</mi><mrow><mo>(</mo><mrow><mi>x</mi></mrow><mo>)</mo></mrow>
   :MATH]
   via a simple transformation. The purpose of this transformation is to
   simplify the optimization process, enabling more tractable gradients
   for optimization. Importantly, this transformation does not alter the
   intrinsic nature of
   [MATH: <mi>T</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced><mo>.</mo> :MATH]
   ), then Eq. ([313]15) can be converted to:
   [MATH: <msub><mrow><mi class="MJX-tex-caligraphic"
   mathvariant="script">D</mi></mrow><mrow><mi>J</mi><mi>S</mi></mrow></ms
   ub><mfenced close=")"
   open="("><mrow><mi>P</mi><mo>∣</mo><mo>∣</mo><mi>Q</mi></mrow></mfenced
   ><mo>=</mo><msub><mrow><mi>max</mi></mrow><mrow><mi>D</mi></mrow></msub
   ><mfenced close=")" open="("><mrow><mo>∫</mo><mi>p</mi><mfenced
   close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced><mi>log</mi><mrow><mo>[</mo><
   mrow><mn>2</mn><mi>D</mi><mrow><mo>(</mo><mrow><mi>x</mi></mrow><mo>)</
   mo></mrow></mrow><mo>]</mo></mrow><mo>−</mo><mi>q</mi><mfenced
   close=")" open="("><mrow><mi>x</mi></mrow></mfenced><mfenced close="]"
   open="["><mrow><mo>−</mo><mi>log</mi><mfenced close=")"
   open="("><mrow><mn>2</mn><mo>−</mo><msup><mrow><mi>e</mi></mrow><mrow><
   mi>log</mi><mrow><mo>[</mo><mrow><mn>2</mn><mi>D</mi><mrow><mo>(</mo><m
   row><mi>x</mi></mrow><mo>)</mo></mrow></mrow><mo>]</mo></mrow></mrow></
   msup></mrow></mfenced></mrow></mfenced><mi>d</mi><mi>x</mi></mrow></mfe
   nced><mo>=</mo><msub><mrow><mi>max</mi></mrow><mrow><mi>D</mi></mrow></
   msub><mstyle mathsize="2.03em"><mfenced
   open="("><mrow></mrow></mfenced></mstyle><mo>∫</mo><mi>p</mi><mfenced
   close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced><mi>log</mi><mfenced
   close="]" open="["><mrow><mi>D</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced></mrow></mfenced><mo>+</mo><m
   i>p</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced><mi>log</mi><mn>2</mn><mspace
   width="1.0em"></mspace><mo>+</mo><mi>q</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced><mi>log</mi><mfenced
   close="]" open="["><mrow><mn>1</mn><mo>−</mo><mi>D</mi><mfenced
   close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced></mrow></mfenced><mo>+</mo><m
   i>q</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced><mi>log</mi><mn>2</mn><mi>d</
   mi><mi>x</mi><mstyle mathsize="2.03em"><mfenced
   open=")"><mrow></mrow></mfenced></mstyle><mo>=</mo><msub><mrow><mi>max<
   /mi></mrow><mrow><mi>D</mi></mrow></msub><mfenced close=")"
   open="("><mrow><mo>∫</mo><mi>p</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced><mi>log</mi><mfenced
   close="]" open="["><mrow><mi>D</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced></mrow></mfenced><mo>+</mo><m
   i>q</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced><mi>log</mi><mfenced
   close="]" open="["><mrow><mn>1</mn><mo>−</mo><mi>D</mi><mfenced
   close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced></mrow></mfenced><mi>d</mi><m
   i>x</mi><mo>+</mo><mi>log</mi><mn>4</mn></mrow></mfenced><mi>⇌</mi><msu
   b><mrow><mi>max</mi></mrow><mrow><mi>D</mi></mrow></msub><mfenced
   close=")" open="("><mrow><mo>∫</mo><mi>p</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced><mi>log</mi><mfenced
   close="]" open="["><mrow><mi>D</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced></mrow></mfenced><mo>+</mo><m
   i>q</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced><mi>log</mi><mfenced
   close="]" open="["><mrow><mn>1</mn><mo>−</mo><mi>D</mi><mfenced
   close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced></mrow></mfenced><mi>d</mi><m
   i>x</mi></mrow></mfenced><mo>=</mo><msub><mrow><mi>max</mi></mrow><mrow
   ><mi>D</mi></mrow></msub><mfenced close=")"
   open="("><mrow><msub><mrow><mi
   mathvariant="double-struck">E</mi></mrow><mrow><mi>x</mi><mo>~</mo><mi>
   p</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced></mrow></msub><mfenced
   close="]" open="["><mrow><mi>log</mi><mi>D</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced></mrow></mfenced><mo>+</mo><m
   sub><mrow><mi
   mathvariant="double-struck">E</mi></mrow><mrow><mi>x</mi><mo>~</mo><mi>
   q</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced></mrow></msub><mfenced
   close="]"
   open="["><mrow><mi>log</mi><mrow><mo>(</mo><mrow><mn>1</mn><mo>−</mo><m
   i>D</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced></mrow><mo>)</mo></mrow></mro
   w></mfenced></mrow></mfenced> :MATH]
   16

   In our loss function,
   [MATH: <mi>p</mi><mfenced close=")"
   open="("><mrow><msup><mrow><mi>z</mi></mrow><mrow><mo>′</mo></mrow></ms
   up><mo>∣</mo><mi>z</mi></mrow></mfenced><mi>p</mi><mfenced close=")"
   open="("><mrow><mi>z</mi></mrow></mfenced> :MATH]
   and
   [MATH: <mi>p</mi><mfenced close=")"
   open="("><mrow><msup><mrow><mi>z</mi></mrow><mrow><mo>′</mo></mrow></ms
   up></mrow></mfenced><mi>p</mi><mfenced close=")"
   open="("><mrow><mi>z</mi></mrow></mfenced> :MATH]
   are used to replace
   [MATH: <mi>p</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced> :MATH]
   and
   [MATH: <mi>q</mi><mfenced close=")"
   open="("><mrow><mi>x</mi></mrow></mfenced> :MATH]
   , and the loss of the MMIM module can be rewritten as:
   [MATH:
   <msub><mrow><mi>L</mi></mrow><mrow><mi>m</mi><mi>i</mi><mi>m</mi></mrow
   ></msub><mo>=</mo><mo>−</mo><msub><mrow><mi
   mathvariant="double-struck">E</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>z</mi><mo>,</mo><msup><mrow><mi>z</mi></mrow><mrow><
   mo>′</mo></mrow></msup></mrow></mfenced><mo>~</mo><mi>p</mi><mfenced
   close=")"
   open="("><mrow><msup><mrow><mi>z</mi></mrow><mrow><mo>′</mo></mrow></ms
   up><mo>∣</mo><mi>z</mi></mrow></mfenced><mi>p</mi><mfenced close=")"
   open="("><mrow><mi>z</mi></mrow></mfenced></mrow></msub><mfenced
   close="]" open="["><mrow><mi>log</mi><mi>D</mi><mfenced close=")"
   open="("><mrow><mi>z</mi><mo>,</mo><msup><mrow><mi>z</mi></mrow><mrow><
   mo>′</mo></mrow></msup></mrow></mfenced></mrow></mfenced><mo>−</mo><msu
   b><mrow><mi mathvariant="double-struck">E</mi></mrow><mrow><mfenced
   close=")"
   open="("><mrow><mi>z</mi><mo>,</mo><msup><mrow><mi>z</mi></mrow><mrow><
   mo>′</mo></mrow></msup></mrow></mfenced><mo>~</mo><mi>p</mi><mfenced
   close=")"
   open="("><mrow><msup><mrow><mi>z</mi></mrow><mrow><mo>′</mo></mrow></ms
   up></mrow></mfenced><mi>p</mi><mfenced close=")"
   open="("><mrow><mi>z</mi></mrow></mfenced></mrow></msub><mfenced
   close="]" open="["><mrow><mi>log</mi><mfenced close=")"
   open="("><mrow><mn>1</mn><mo>−</mo><mi>D</mi><mfenced close=")"
   open="("><mrow><mi>z</mi><mo>,</mo><msup><mrow><mi>z</mi></mrow><mrow><
   mo>′</mo></mrow></msup></mrow></mfenced></mrow></mfenced></mrow></mfenc
   ed> :MATH]
   17

   The problem in Eq. ([314]17) can be solved using the negative sample
   estimation^[315]107.
   [MATH: <mi>D</mi><mfenced close=")"
   open="("><mrow><mi>z</mi><mo>,</mo><msup><mrow><mi>z</mi></mrow><mrow><
   mo>′</mo></mrow></msup></mrow></mfenced> :MATH]
   is a discriminator to distinguish the negative sample pairs and
   positive sample pairs to estimate the distribution of positive samples.
   Positive sample pairs are composed by the latent representations of the
   sample x and its nearest neighbor
   [MATH: <mi>x</mi><mo>′</mo> :MATH]
   in any view, and negative sample pairs are composed by the latent
   representations of the sample x and random samples outside its nearest
   neighbors. The nearest neighbors of each sample are identified by the
   k-nearest neighbors algorithm.

Loss function of scNiche

   The total loss function of scNiche is defined as:
   [MATH:
   <mi>L</mi><mo>=</mo><msub><mrow><mi>L</mi></mrow><mrow><mi>g</mi><mi>r<
   /mi><mi>e</mi></mrow></msub><mo>+</mo><msub><mrow><mi>λ</mi></mrow><mro
   w><mn>1</mn></mrow></msub><msub><mrow><mi>L</mi></mrow><mrow><mi>r</mi>
   <mi>e</mi><mi>c</mi></mrow></msub><mo>+</mo><msub><mrow><msub><mrow><mi
   >λ</mi></mrow><mrow><mn>2</mn></mrow></msub><mi>L</mi></mrow><mrow><mi>
   m</mi><mi>i</mi><mi>m</mi></mrow></msub> :MATH]
   18

   where
   [MATH: <msub><mrow><mi>λ</mi></mrow><mrow><mn>1</mn></mrow></msub>
   :MATH]
   and
   [MATH: <msub><mrow><mi>λ</mi></mrow><mrow><mn>2</mn></mrow></msub>
   :MATH]
   are hyperparameters that balances three parts of the loss function. By
   default,
   [MATH:
   <msub><mrow><mi>λ</mi></mrow><mrow><mn>1</mn></mrow></msub><mo>=</mo><m
   sub><mrow><mi>λ</mi></mrow><mrow><mn>2</mn></mrow></msub><mo>=</mo><mn>
   1</mn> :MATH]
   .

Batch training strategy

   We develop a subgraph-based batch training strategy that enables
   scNiche to scale to large datasets and multiple slices. Specifically,
   after extracting multi-view features of cells, we do not directly
   construct the corresponding graphs with all cells, which would result
   in insufficient memory due to the excessive number of nodes and edges
   on the entire graph. As an alternative, we initially divide the entire
   dataset into several non-overlapping subsets using a random sampling
   strategy, and subsequently construct corresponding graphs for each
   subset, which are referred to as subgraphs. Next, we employ the batched
   graph data loader facilitated by DGL^[316]109 for batch-iterating over
   this set of subgraphs to generate the batched graph of each batch for
   model training. Considering the sharp reduction in the number of nodes
   and edges on each subgraph compared to the entire graph, this batch
   training strategy effectively avoids the out-of-memory limitation. We
   evaluated the robustness of scNiche’s batch training strategy to the
   different batch number settings on the mouse spleen CODEX dataset
   (Supplementary Fig. [317]S4).

Clustering

   We employ the unsupervised clustering algorithm k-means by default on
   the learned joint representation to identify the cell niches.
   Additionally, if the target number of clusters is not provided, we
   identify the optimal candidates for K based on the stability of the
   clustering proposed by Varrone et al.^[318]22. In brief, for each K
   within the specified range, we execute a single clustering run with K
   clusters. Subsequently, we calculated the average Fowlkes–Mallows
   Index^[319]110 (FMI) between the clusters at K-1 and K, and between K
   and K+1. A higher average FMI indicates a higher similarity between
   clustering solutions of the continuous cluster number, i.e., the
   clustering results are more stable.

Enrichment analysis of cell niches

   We apply a general enrichment analysis framework that can characterize
   the identified cell niches from various perspectives (including
   cellular compositions, conditions, and samples, etc.) and compute the
   corresponding enrichment scores. Taking cellular composition as an
   example, given cells belonging to S samples, N identified cell niches,
   and M cell populations
   [MATH: <mi>C</mi><mo>=</mo><mfenced close="}"
   open="{"><mrow><msup><mrow><mi>c</mi></mrow><mrow><mrow><mo>(</mo><mrow
   ><mi>s</mi><mo>,</mo><mi>n</mi><mo>,</mo><mi>m</mi></mrow><mo>)</mo></m
   row></mrow></msup><mo>∣</mo><mn>1</mn><mo>≤</mo><mi>s</mi><mo>≤</mo><mi
   >S</mi><mo>,</mo><mn>1</mn><mo>≤</mo><mi>n</mi><mo>≤</mo><mi>N</mi><mo>
   ,</mo><mn>1</mn><mo>≤</mo><mi>m</mi><mo>≤</mo><mi>M</mi></mrow></mfence
   d> :MATH]
   , we first compute the observed value of the proportion of the cell
   population m within the cell niche n in each sample s:
   [MATH:
   <msubsup><mrow><mi>P</mi><mi>r</mi><mi>o</mi><mi>p</mi></mrow><mrow><mi
   >o</mi><mi>b</mi><mi>s</mi></mrow><mrow><mrow><mo>(</mo><mrow><mi>s</mi
   ><mo>,</mo><mi>n</mi><mo>,</mo><mi>m</mi></mrow><mo>)</mo></mrow></mrow
   ></msubsup><mo>=</mo><mfrac><mrow><msup><mrow><mi>c</mi></mrow><mrow><m
   row><mo>(</mo><mrow><mi>s</mi><mo>,</mo><mi>n</mi><mo>,</mo><mi>m</mi><
   /mrow><mo>)</mo></mrow></mrow></msup></mrow><mrow><msubsup><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow>
   <mrow><mi>M</mi></mrow></msubsup><msup><mrow><mi>c</mi></mrow><mrow><mr
   ow><mo>(</mo><mrow><mi>s</mi><mo>,</mo><mi>n</mi><mo>,</mo><mi>i</mi></
   mrow><mo>)</mo></mrow></mrow></msup></mrow></mfrac> :MATH]
   19

   and the expected value of the proportion of the cell population m
   within the cell niche n in each sample s is defined as:
   [MATH:
   <msubsup><mrow><mi>P</mi><mi>r</mi><mi>o</mi><mi>p</mi></mrow><mrow><mi
   >exp</mi></mrow><mrow><mrow><mo>(</mo><mrow><mi>s</mi><mo>,</mo><mi>n</
   mi><mo>,</mo><mi>m</mi></mrow><mo>)</mo></mrow></mrow></msubsup><mo>=</
   mo><mfrac><mrow><msubsup><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mtable><mtr><mtd
   columnalign="center"><mi>i</mi><mo>=</mo><mn>1</mn></mtd></mtr><mtr><mt
   d
   columnalign="center"><mi>i</mi><mo>≠</mo><mi>n</mi></mtd></mtr></mtable
   ></mrow><mrow><mi>N</mi></mrow></msubsup><msubsup><mrow><mi>P</mi><mi>r
   </mi><mi>o</mi><mi>p</mi></mrow><mrow><mi>o</mi><mi>b</mi><mi>s</mi></m
   row><mrow><mrow><mo>(</mo><mrow><mi>s</mi><mo>,</mo><mi>i</mi><mo>,</mo
   ><mi>m</mi></mrow><mo>)</mo></mrow></mrow></msubsup></mrow><mrow><mi>N<
   /mi><mo>−</mo><mn>1</mn></mrow></mfrac> :MATH]
   20

   we then define the enrichment score of cell population m within the
   cell niche n across S samples as follow:
   [MATH:
   <msup><mrow><mi>E</mi><mi>S</mi></mrow><mrow><mrow><mo>(</mo><mrow><mi>
   n</mi><mo>,</mo><mi>m</mi></mrow><mo>)</mo></mrow></mrow></msup><mo>=</
   mo><msub><mrow><mi>log</mi></mrow><mrow><mn>2</mn></mrow></msub><mfence
   d close=")" open="("><mrow><mfrac><mrow><mfrac><mrow><msubsup><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow>
   <mrow><mi>S</mi></mrow></msubsup><msubsup><mrow><mi>P</mi><mi>r</mi><mi
   >o</mi><mi>p</mi></mrow><mrow><mi>o</mi><mi>b</mi><mi>s</mi></mrow><mro
   w><mrow><mo>(</mo><mrow><mi>i</mi><mo>,</mo><mi>n</mi><mo>,</mo><mi>m</
   mi></mrow><mo>)</mo></mrow></mrow></msubsup></mrow><mrow><mi>S</mi></mr
   ow></mfrac></mrow><mrow><mfrac><mrow><msubsup><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow>
   <mrow><mi>S</mi></mrow></msubsup><msubsup><mrow><mi>P</mi><mi>r</mi><mi
   >o</mi><mi>p</mi></mrow><mrow><mi>exp</mi></mrow><mrow><mrow><mo>(</mo>
   <mrow><mi>i</mi><mo>,</mo><mi>n</mi><mo>,</mo><mi>m</mi></mrow><mo>)</m
   o></mrow></mrow></msubsup></mrow><mrow><mi>S</mi></mrow></mfrac></mrow>
   </mfrac></mrow></mfenced> :MATH]
   21

   The P-value of
   [MATH:
   <msup><mrow><mi>E</mi><mi>S</mi></mrow><mrow><mrow><mo>(</mo><mrow><mi>
   n</mi><mo>,</mo><mi>m</mi></mrow><mo>)</mo></mrow></mrow></msup> :MATH]
   can be computed with the one-sided Mann-Whitney U test if requested.

Multi-sample analysis framework

   For large-scale datasets containing multiple samples under different
   conditions, scNiche provides a multi-sample analysis framework that
   enables niche comparisons at the sample scale. Specifically, for each
   niche within each sample, we compute its cell composition and
   phenotypic characteristics (defined as the average expression value of
   all cells belonging to this niche), as well as the proportion and
   phenotypic characteristics (defined as the average expression value of
   all cells belonging to this cell population) of specific cell
   populations within this niche. Subsequently, we can perform
   differential analyses across the entire sample series, including the
   comparison of compositions or phenotypes for specific niches between
   conditions, as well as the comparison of proportions or phenotypes for
   specific cell populations between niches. The p-value is calculated
   with the two-sided Mann-Whitney U test. Furthermore, to avoid the
   effect of outliers, for each niche, only samples with a proportion of
   that niche exceeding a set threshold (5% by default) are considered.

Simulation experiment setup

   We simulated the situation in which the cell niches exhibit
   heterogeneity in both gene expression and cellular composition among
   each other (Supplementary Fig. [320]S2a). To achieve this, we generated
   the simulated data from the multi-condition scRNA-seq datasets
   following the simulation framework used in scCube^[321]31.
   Specifically, we first generate the proportion and cellular composition
   of each niche in a randomized manner. If two cell niches were
   designated to exhibit heterogeneity in gene expression, the cells
   within these two cell niches were derived from different conditions of
   scRNA-seq data with similar composition proportions. Conversely, if two
   cell niches were designated to exhibit heterogeneity in cellular
   composition, the cells within these two cell niches were derived from
   the same condition of scRNA-seq data but with different composition
   proportions. Subsequently, we generated the random spatial patterns for
   each cell niche with the reference-free strategy of scCube.

   We considered four variabilities in the simulated data: continuity of
   spatial patterns, complexity of niche composition, gene expression
   dropout, and cell annotation dropout. For the first variability, we
   generated cell niches with different levels of spatial pattern
   continuity by setting the parameter δ in scCube to 10, 20, 30, and 50.
   For the second variability, we generated cell niches with different
   composition complexity by randomly selecting 2, 3, and 4 cell
   populations for each cell niche. The last two variabilities
   corresponded to scenarios of degraded data quality. For the dropout of
   gene expression, we randomly set the expression values of genes to 0
   with the proportions of 0.1, 0.2, 0.4, and 0.8. For the dropout of cell
   annotation, we randomly altered the cell annotation labels to
   “ambiguous” with the proportions of 0.1, 0.2, 0.4, and 0.8.

Benchmarking analysis

   We compared the performance of scNiche with other existing methods
   using simulated and biological datasets. The target number of clusters
   was kept consistent across all methods for each benchmarking dataset,
   as determined by the ground truth. Furthermore, for methods that
   require specifying the range of neighborhoods based on the k-nearest
   neighbors algorithm first, such as scNiche, BANKSY, STAGATE, GraphST,
   SpaceFlow, and CytoCommunity, we set a consistent value of k for each
   method (20 for the simulated datasets, 30 for the single cell spatial
   omics datasets, and 6 for the human DLPFC 10X Visium dataset). Below we
   describe the application of each method.

scNiche

   The preprocessing step of each dataset is described in the “Data
   collection and preprocessing” section above. For the mouse spleen CODEX
   dataset, we applied the batch training strategy to run scNiche on both
   single and multiple slices with the number of batches set to 30 and
   100, respectively. For the human UTUC IMC dataset, mouse frontal cortex
   and striatum MERFISH dataset, and the human DLPFC 10X Visium dataset,
   we applied the batch training strategy to run scNiche on multiple
   slices with the number of batches set to 100, 20, and 2, respectively.
   For other datasets, we directly ran scNiche with the full-graph-based
   training. The parameter ‘epochs’ was set to 200 for the mouse spleen
   CODEX dataset and 100 for other datasets; the parameter ‘lr’ was set to
   0.01 for all datasets.

DR-SC

   DR-SC^[322]18 is implemented in the R package DR.SC (version 3.3). The
   parameter ‘platform’ was set to ‘Visium’ for the human DLPFC 10X Visium
   dataset and ‘Other_SRT’ for other datasets. All other parameters were
   set with default values.

BASS

   BASS^[323]19 is implemented in the R package BASS (version 1.1.0.016).
   We ran BASS with default parameters except for the parameter ‘C’, which
   was determined by the number of cell types in each benchmarking
   dataset.

CellCharter

   CellCharter^[324]22 is implemented in the Python package cellcharter
   (version 0.1.2). We followed the instructions in the original article
   for dimensionality reduction and batch effect removal. Specifically,
   for the simulated datasets and spatial proteomics datasets, we applied
   the TRVAE model implemented in the Python package scArches^[325]104
   (version 0.5.9). For the human DLPFC 10X Visium dataset, we applied the
   scVI model implemented in the Python package scvi^[326]65 (version
   1.1.2). For the mouse V1 neocortex STARmap dataset, we applied the PCA
   dimensionality reduction directly since this dataset only contains one
   tissue slice. For the mouse frontal cortex and striatum MERFISH
   dataset, we also applied the PCA dimensionality reduction since this
   dataset has been normalized by the original authors. All other
   parameters were set with default values.

SpaGCN

   SpaGCN^[327]24 is implemented in the Python package SpaGCN (version
   1.2.7). We ran SpaGCN with default parameters. The refinement step was
   also performed.

UTAG

   UTAG^[328]21 is implemented in the Python package utag (version 0.1.1).
   We ran UTAG with default parameters except for the parameter
   ‘max_dist’, which was set to be consistent with the value of k in
   methods that require specifying the range of neighborhoods based on the
   k-nearest neighbors algorithm first (20 for the simulated datasets, 30
   for the single cell spatial omics datasets, and 6 for the human DLPFC
   10X Visium dataset). Additionally, since the Leiden clustering
   algorithm employed in UTAG does not directly allow for setting the
   number of clusters, we modified the clustering process by introducing
   the ‘search_res’ function from SpaGCN^[329]24 to determine the
   clustering resolution first, and subsequently performed Leiden
   clustering with this resolution.

BANKSY

   BANKSY^[330]23 is implemented in the R package Banksy (version
   0.99.13). We followed the tutorials provided by the original authors to
   set recommended values of the parameter ‘lambda’. Specifically, for the
   human DLPFC 10X Visium dataset, the parameter ‘lambda’ was set to 0.2.
   For other datasets, the parameter ‘lambda’ was set to 0.8. All other
   parameters were set with default values. Additionally, for the human
   DLPFC 10X Visium dataset, we applied the multi-sample analysis followed
   by the tutorial. The clustering process was also modified as described
   above to determine the clustering resolution first.

STAGATE

   STAGATE^[331]25 is implemented in the Python package stagate-pyg
   (version 1.0.0). For the mouse spleen CODEX dataset, we ran STAGATE
   using the batch training strategy with the parameters ‘num_batch_x’ =
   4, ‘num_batch_y’ = 6, and ‘n_epochs’ = 500, and all other parameters
   were set with default values. For other datasets, we ran STAGATE with
   default parameters. Additionally, for the mouse V1 neocortex STARmap
   dataset and the human DLPFC 10X Visium dataset, we applied the ‘mclust’
   algorithm in the clustering step as recommended, and for other
   datasets, we applied the Louvain algorithm with the clustering
   resolution determined by the modified clustering process described
   above.

GraphST

   GraphST^[332]26 is implemented in the Python package GraphST (version
   1.1.1). We ran GraphST with default parameters. The refinement step was
   also performed. Notably, GraphST raised a “CUDA out of memory” error on
   the mouse spleen CODEX dataset.

SpaceFlow

   SpaceFlow^[333]27 is implemented in the Python package SpaceFlow
   (version 1.0.3). We ran SpaceFlow with default parameters. The
   clustering process was also modified as described above to determine
   the clustering resolution first.

CytoCommunity

   CytoCommunity^[334]28 is implemented in the Python package
   CytoCommunity (version 1.1.0). We applied the unsupervised version of
   CytoCommunity. For the simulated datasets and the mouse V1 neocortex
   STARmap dataset, we ran CytoCommunity with default parameters. For the
   mouse spleen CODEX dataset, the human UTUC IMC dataset, and the mouse
   frontal cortex and striatum MERFISH dataset, we set the parameter
   ‘Num_Epoch’ to 100, 500, and 500, respectively, to reduce the training
   time. For the mouse spleen CODEX dataset, we further set the parameter
   ‘Loss_Cutoff’ to -0.3 to reduce the training time. All other parameters
   were set with default values. In addition, we did not run CytoCommunity
   on the human DLPFC 10X Visium dataset because the current version of
   CytoCommunity does not support data at non-single-cell resolution.

Evaluation metrics

   We used the adjusted Rand index (ARI) and the macro-F1 score to
   evaluate the performance of each method. The benchmarking was conducted
   on a computing cluster with 2 AMD EPYC 7K62 CPUs (48 cores each), with
   approximately 503.65 GB of usable system memory. For GPU-compatible
   methods, an NVIDIA A10 GPU with 24 GB total memory (approximately 22.5
   GB usable) was used.

Scalability analysis of each method on the mouse whole brain MERFISH dataset

   We tested the scalability of scNiche and other methods on the processed
   mouse whole brain MERFISH dataset containing 3,698,530 cells from 129
   coronal sections. For scNiche, we performed the PCA dimensionality
   reduction on the data first and then applied the batch training
   strategy on multiple sections with the number of batches set to 500. We
   set other parameters ‘k_cutoff’ = 30, ‘epochs’ = 25, and ‘lr’ = 0.01.
   The target number of clusters was set to 14 based on the cluster
   stability result. For CellCharter, we performed the PCA dimensionality
   reduction on the data first, and ran it with default parameters. For
   UTAG, we applied the batch mode provided in the tutorial with the
   parameter ‘max_dist’ = 30. For BANKSY, we applied the multi-sample
   analysis with the parameters ‘k_geom’ = 30 and ‘lambda’ = 0.8. The
   number of clusters was also set to 14 to be consistent with scNiche. In
   addition, we performed the k-means algorithm on the principal
   components of the data directly as the nonspatial clustering for
   comparison. The annotated mouse brain coronal section images were
   downloaded from the Allen Mouse Brain Atlas^[335]41
   [[336]https://mouse.brain-map.org/experiment/thumbnails/100048576?image
   _type=atlas]. The scalability benchmarking was conducted on the same
   computing cluster as the other benchmarking studies.

Analysis of the human TNBC MIBI-TOF dataset and the mouse liver Seq-Scope
dataset

   For both the human TNBC MIBI-TOF dataset and the mouse liver Seq-Scope
   dataset, we applied the batch training strategy to run scNiche on
   multiple slices with the number of batches set to 20. Notably, for the
   Seq-Scope data, despite its ability to achieve subcellular resolution,
   the UMI information for each high-definition map coordinate identifier
   (HDMI) needs to be aggregated to produce interpretable results^[337]10.
   We therefore used the data binning with 10 μm grids provided by the
   original authors for our analysis, as the resolution of this data was
   already close to the single-cell level and there was no much noise in
   cell type identification^[338]10 (Supplementary Fig. [339]S32). We set
   the parameters ‘k_cutoff’ = 30, ‘epochs’ = 100, and ‘lr’ = 0.01 for
   both datasets. The target number of clusters was determined by the
   cluster stability results.

Spatial connectivity analysis of cell niches

   Given cells belonging to N identified cell niches
   [MATH: <mi>C</mi><mo>=</mo><msubsup><mrow><mfenced close="}"
   open="{"><mrow><msup><mrow><mi>c</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>n</mi></mrow></mfenced></mrow></msup></mrow></mfence
   d></mrow><mrow><mi>n</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>N</mi></m
   row></msubsup> :MATH]
   , we first computed the spatial graph of cells based on Delaunay
   triangulation using the ‘spatial_neighbors’ function in squidpy^[340]89
   Python package (version 1.2.3) and get the adjacency matrix A. The
   number of spatial links between cell niche i and j then can be defined
   as:
   [MATH:
   <msup><mrow><mi>L</mi><mi>i</mi><mi>n</mi><mi>k</mi></mrow><mrow><mrow>
   <mo>(</mo><mrow><mi>i</mi><mo>,</mo><mi>j</mi></mrow><mo>)</mo></mrow><
   /mrow></msup><mo>=</mo><msub><mrow><mo
   mathsize="big">∑</mo></mrow><mrow><mi>v</mi><mo>∈</mo><msup><mrow><mi>c
   </mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>i</mi></mrow></mfenced></mrow></msup><mo>,</mo><mi>w
   </mi><mo>∈</mo><msup><mrow><mi>c</mi></mrow><mrow><mfenced close=")"
   open="("><mrow><mi>j</mi></mrow></mfenced></mrow></msup></mrow></msub><
   msub><mrow><mi>A</mi></mrow><mrow><mi>v</mi><mi>w</mi></mrow></msub>
   :MATH]
   22

   Larger values indicate a stronger spatial connectivity between cell
   niches.

Differential gene expression analysis

   The differentially expressed genes were calculated using the
   ‘rank_genes_groups’ function in scanpy^[341]101 Python package (version
   1.9.1) with the Wilcoxon rank sum test (adjusted p-value < 0.05).

Gene expression signature score calculation

   The gene expression signature scores were calculated using the
   ‘score_genes’ function in scanpy^[342]101 Python package (version
   1.9.1) with default parameters. The signature gene sets of
   KEGG^[343]111 pathways were downloaded using the ‘get_library’ function
   in gseapy^[344]112 Python package (version 1.1.2). The signature gene
   sets of the cellular inflammatory infiltration and fibrosis were
   obtained from the original publication by Te et al.^[345]74.

Pathway enrichment analysis

   The gene set enrichment analysis^[346]113 (GSEA) was performed using
   the gseapy^[347]112 Python package (version 1.1.2) with default
   parameters, whose hallmark gene sets were downloaded from the Molecular
   Signatures Database^[348]114,[349]115 using the ‘get_library’ function
   in gseapy^[350]112 Python package (version 1.1.2).

Statistics

   Python (version 3.9.19) and R (version 4.2.1 and 4.3.1) are used for
   the statistical analysis.

Reporting summary

   Further information on research design is available in the [351]Nature
   Portfolio Reporting Summary linked to this article.

Supplementary information

   [352]Supplementary Information^ (24.7MB, pdf)
   [353]41467_2025_57029_MOESM2_ESM.pdf^ (27.6KB, pdf)

   Description of Additional Supplementary Files
   [354]Supplementary Data 1^ (19.1KB, xlsx)
   [355]Reporting Summary^ (270.8KB, pdf)
   [356]Transparent Peer Review file^ (4.5MB, pdf)

Source data

   [357]Source Data^ (5.5MB, xlsx)

Acknowledgements