Abstract

   The spatial organization of cells plays a pivotal role in shaping
   tissue functions and phenotypes in various biological systems and
   diseased microenvironments. However, the topological principles
   governing interactions among cell types within spatial patterns remain
   poorly understood. Here, we present the triangulation cellular
   community motif neural network (TrimNN), a graph-based deep learning
   framework designed to identify conserved spatial cell organization
   patterns, termed cellular community (CC) motifs, from spatial
   transcriptomics and proteomics data. TrimNN employs a
   semi–divide-and-conquer approach to efficiently detect overrepresented
   topological motifs of varying sizes in a triangulated space. By
   uncovering CC motifs, TrimNN reveals key associations between spatially
   distributed cell-type patterns and diverse phenotypes. These insights
   provide a foundation for understanding biological and disease
   mechanisms and offer potential biomarkers for diagnosis and therapeutic
   interventions.

   Subject terms: Computational models, Machine learning, Data mining,
   Network topology
     __________________________________________________________________

   Cellular spatial organisation is crucial for shaping tissue functions
   and phenotypes. Here, authors present TrimNN, a graph-based deep
   learning framework to identify conserved cellular community motifs,
   revealing links between spatially distributed cell-type patterns and
   diverse phenotypes.

Introduction

   Various cells work together within spatial arrangements in tissue to
   support organ homeostasis and function^[42]1. Deciphering the
   multicellular organization is key to understanding the relationship
   between spatial structure and tissue biological and pathological
   functions^[43]2. Emerging spatial omics approaches, including spatially
   resolved transcriptomics^[44]3 and spatial proteomics^[45]4, enable
   investigation of the mechanisms governing the spatial organization of
   different cell types in a specific tissue. Within a region of interest
   (ROI) in spatial omics, cellular neighborhoods (CNs) define local cell
   type enrichment patterns in cellular communities (CCs), and decoding
   function-related conserved spatial features in CNs is one of the
   primary spatial omics data analysis tasks^[46]4.

   Most existing data analysis approaches adopt a top-down strategy to
   describe cell organization. This strategy relies mainly on clustering
   to identify cell-type compositions as common patterns. Deep learning
   approaches, including SPACE-GM^[47]5, CytoCommunity^[48]6,
   CellCharter^[49]7, and BANKSY^[50]8, typically learn low-dimensional
   embeddings of the nodes in corresponding CNs and then cluster these
   embeddings. However, clustering approaches face the following
   challenges in dissecting and interpreting highly heterogeneous,
   dynamically evolving cell systems^[51]9. First, clustering results
   usually become less stable when samples contain cells undergoing
   active state transitions, which are common in disease or developmental
   processes^[52]10. Second, clusters identified by these top-down
   approaches are often described as percentages of cell-type
   compositions. Such representations do not topologically represent
   geometric cell-type interactions or are difficult to interpret
   biologically. Last, these top-down results are susceptible to batch
   effects, where CNs separate primarily by sample as a technical
   covariate rather than by biological features^[53]3. These batch effects
   make it easy to overfit the models but difficult to validate them
   across different datasets^[54]5.

   Considering the preceding limitations of top-down strategies, we
   instead use a bottom-up strategy to identify CC motifs as recurring
   significant interconnections between cells. In the spatial
   omics–derived CC, we hypothesize that CC motifs can be represented as
   topological building blocks of multicellular organization consistent
   across different samples and associated with key biological processes
   and functions. CC motifs are biologically interpretable spatial
   patterns of the combined cell types, which provide topological
   information beyond clusters and explicitly link to the biological and
   pathological mechanisms through distinct cell–cell communications,
   highly expressed genes and pathways^[55]11. This concept is related to
   functional tissue units (FTUs)^[56]12, but CC motifs are smaller in
   the scale of cell locations and cell types, which provides more detail
   for understanding and modeling the healthy physiological function of
   the organ and function-related changes during disease
   states. Currently, size 1–3 motif analysis^[57]4 makes up most of the
   spatial omics studies, where size-1 motifs are single nodes that can be
   treated as cell-type compositions, size-2 motifs are double nodes
   linked by edges, and size-3 motifs are triple nodes within triangles.
   Nevertheless, biologists have found that sizable CC motifs with more
   nodes than triangles substantially correlate with patient survival and
   phenotypical features in colorectal cancer (CRC)^[58]13, kidney
   diseases^[59]14, maternal–fetal interface^[60]15, and many other
   biological contexts.

   In practice, identifying the most overrepresented CC motifs composing
   multicellular organization is still computationally expensive with (i)
   subgraph matching^[61]16, which counts the occurrence of a given motif
   on the query graph, and (ii) pattern growth^[62]17, which finds the
   motifs with the most significant occurrence. It is known that subgraph
   matching is NP-complete^[63]16, which makes the node type combination
   alone super-exponential. Existing approaches include
   permutation^[64]11, edge sampling (e.g., mfinder^[65]18), node sampling
   (e.g., FANMOD^[66]19), and global pruning (e.g., Ullmann^[67]20 and
   VF2^[68]21). A computationally feasible approach is still lacking to
   analytically identify conserved, interpretable, and generalizable
   spatial rules of cellular organization at different sizes across
   different samples of spatial omics.

   Here, we propose the triangulation cellular community motif neural
   network (TrimNN), a graph-based deep learning approach to analyze
   spatial transcriptomics and proteomics data using a bottom-up strategy
   (Supplementary Fig. [69]1). Within the input spatial omics samples, CC
   is defined as a graph in which cells are nodes, node types represent
   different cell types, and edges encode physical proximity, inferred as
   undirected spatial cell–cell relations from Delaunay
   triangulation^[70]22 of the node coordinates in the ROI. TrimNN
   estimates overrepresented size-$K$ CC motifs in the CC of spatial
   omics using graph isomorphism
   networks^[71]23 (GIN) empowered by positional encoding^[72]24 (PE). In
   various spatial transcriptomics and spatial proteomics case studies,
   TrimNN identifies computationally significant and biologically
   meaningful CC motifs to differentiate patient survival in CRC studies
   and represents pathologically related cell type organization in
   neurodegenerative diseases and colorectal carcinoma studies. Notably,
   the identified sizable CC motifs demonstrate their potential as
   interpretable topological prognostic biomarkers linking the topological
   structural organization of cell types at microscopic levels to
   phenotypes at macroscopic levels, which cannot be inferred by other
   existing tools. The source code of TrimNN is publicly available at
   [73]https://github.com/yuyang-0825/TrimNN.
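
   The graph construction described above can be sketched as follows. This
   is a minimal illustration, not the TrimNN implementation: it assumes the
   Delaunay triangles have already been computed (for example, the
   `simplices` array of `scipy.spatial.Delaunay`), and the function name
   `triangles_to_graph` and the toy data are hypothetical.

```python
from itertools import combinations


def triangles_to_graph(triangles, cell_types):
    """Build an undirected cell graph from precomputed Delaunay triangles.

    triangles  : iterable of 3-tuples of cell indices (assumed precomputed)
    cell_types : list mapping cell index -> cell-type label (node type)
    Returns an adjacency dict: cell index -> set of neighboring cell indices.
    """
    adjacency = {i: set() for i in range(len(cell_types))}
    for tri in triangles:
        # Every pair of cells sharing a triangle is joined by an edge.
        for u, v in combinations(tri, 2):
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


# Toy example (hypothetical data): four cells, two triangles sharing an edge.
cell_types = ["A", "A", "B", "B"]
triangles = [(0, 1, 2), (1, 2, 3)]
graph = triangles_to_graph(triangles, cell_types)
n_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
```

   Storing neighbors as sets keeps the graph undirected and removes the
   duplicate edges that adjacent triangles share.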

Results

TrimNN quantifies multicellular organization with sizable CC motifs

   A schematic diagram of the proposed TrimNN and its analytic workflow is
   shown in Fig. [74]1A. These identified CC motifs are biologically
   interpretable through a set of downstream analyses, including motif
   visualization, cellular-level interpretation within cell–cell
   communication analysis, gene-level interpretation within differentially
   expressed gene and pathway analysis, and phenotypical analysis within
   the availability of phenotypical information (Fig. [75]1B). TrimNN is
   constructed on an empowered GIN to estimate the occurrence of the query
   on the target graph. On the CC as a triangulated graph built from
   spatial omics, TrimNN builds a supervised graph learning model by
   simplifying the graph constraints and incorporating the inductive bias
   within triangles derived from Delaunay triangulation. TrimNN decomposes
   the regression task of counting query-graph occurrences into
   many tractable binary classification tasks modeled by the sub-TrimNN
   module. Inspired by the idea of NSIC^[76]25, this method is trained on
   representative pairs of the predefined query subgraphs and the target
   triangulated cell graphs as a binary classification task. This graph
   representation framework builds upon GIN and adopts a shortest
   distance–based PE^[77]24, modeling the symmetric space to increase the
   expressive power. Additionally, TrimNN adopts a semi–divide-and-conquer
   strategy to estimate the abundance of the query by aggregating the
   individual classification results that a sub-TrimNN module produces on
   each node’s enclosed graph. Given the size of the query subgraph, our
   framework enumerates candidate motifs over possible cell types and
   topologies to estimate the most overrepresented CC motifs. We then
   search incrementally to infer CC motifs of different sizes. The
   details of the architecture of TrimNN are shown in Supplementary
   Fig. [78]2.
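
   The decomposition of occurrence counting into per-node existence checks
   can be illustrated with a small sketch. Here an exact brute-force check
   stands in for the learned sub-TrimNN classifier, and the hop radius of
   the enclosed graph, the function names, and the toy graph are all
   assumptions made for illustration. The sum of per-node decisions is an
   abundance score and is not, in general, equal to the exact number of
   occurrences.

```python
from itertools import permutations


def enclosed_nodes(adjacency, center, hops):
    """Nodes within `hops` edges of `center` (breadth-first expansion)."""
    seen, frontier = {center}, {center}
    for _ in range(hops):
        frontier = {n for f in frontier for n in adjacency[f]} - seen
        seen |= frontier
    return seen


def motif_exists(adjacency, types, nodes, motif_types, motif_edges):
    """Exact check standing in for the learned binary classifier."""
    for mapping in permutations(sorted(nodes), len(motif_types)):
        types_match = all(types[n] == t for n, t in zip(mapping, motif_types))
        edges_match = all(mapping[v] in adjacency[mapping[u]] for u, v in motif_edges)
        if types_match and edges_match:
            return True
    return False


def abundance_score(adjacency, types, motif_types, motif_edges, hops=1):
    """Sum the per-node existence decisions over all enclosed graphs."""
    return sum(
        motif_exists(adjacency, types, enclosed_nodes(adjacency, c, hops),
                     motif_types, motif_edges)
        for c in adjacency
    )


# Toy graph (hypothetical): one A-A-B triangle plus a pendant B cell.
adjacency = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
types = ["A", "A", "B", "B"]
triangle = (("A", "A", "B"), [(0, 1), (1, 2), (0, 2)])
score = abundance_score(adjacency, types, *triangle)
```

   In this toy case the score is 3 although the triangle occurs once,
   because three enclosed graphs contain it; the score is therefore best
   read as a relative abundance measure across motifs.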

Fig. 1. TrimNN analysis workflow.

   A Spatially resolved transcriptomics (e.g., STARmap PLUS and 10X
   Xenium) and spatial proteomics data (e.g., MIBI-TOF and CODEX) are used
   as input to generate corresponding CCs with spatial coordinates and
   Delaunay triangulation. TrimNN is trained on representative pairs of
   query motifs and target triangulated graphs at scale. Given a specific
   query, TrimNN identifies its occurrence in the target CC in the
   subgraph matching process by decomposing this regression task into many
   binary classification problems, where each classification predicts
   whether the query exists in the target graph as the enclosed graph of
   each node. Enumerating possible motifs at size-$K$, TrimNN identifies
   the most overrepresented motifs. Then, the pattern growth process
   adopts a heuristic search for their successor size-$(K+1)$ motifs.
   Here, we take size-3 CC motifs as an example. After subgraph
   matching and pattern growth, TrimNN estimates overrepresented CC
   motifs. Created in BioRender. Yu, Y. (2025) [81]BioRender.com/mfm4ta4.
   B These CC motifs can be biologically interpreted in the downstream
   analysis, including visualization, cellular-level interpretation within
   cell–cell communication analysis, gene-level interpretation within
   differentially expressed gene analysis (e.g., GO enrichment analysis
   and pathway enrichment analysis), and phenotypical analysis within the
   availability of phenotypical information (e.g., survival curve and
   phenotypic classification analysis). CC: cellular community.
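
   One way to picture the pattern growth step is the sketch below. It is a
   hypothetical greedy search, not the heuristic used by TrimNN: candidates
   are formed by attaching one new typed node to an existing motif node, and
   the `score` callable is a placeholder for an abundance estimate. The
   motif encoding and all names are assumptions.

```python
def grow_candidates(motif_types, motif_edges, cell_types):
    """Yield size-(K+1) candidates by attaching one new typed node.

    motif_types : tuple of cell-type labels, one per motif node
    motif_edges : frozenset of (u, v) index pairs
    cell_types  : iterable of all cell-type labels
    Isomorphic duplicates are not removed in this sketch.
    """
    k = len(motif_types)
    for new_type in cell_types:
        for anchor in range(k):
            yield motif_types + (new_type,), motif_edges | {(anchor, k)}


def grow(top_motifs, cell_types, score, keep=2):
    """Keep the `keep` best-scoring successors of the current top motifs."""
    candidates = [
        cand
        for types, edges in top_motifs
        for cand in grow_candidates(types, edges, cell_types)
    ]
    return sorted(candidates, key=score, reverse=True)[:keep]


# Toy usage: the scoring function below is only a placeholder.
seed = (("A", "B"), frozenset({(0, 1)}))
successors = grow([seed], ["A", "B"], score=lambda m: m[0].count("A"), keep=2)
```

   Restricting the search to successors of the current top motifs avoids
   enumerating every possible size-(K+1) motif.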

   We hypothesize that CC motifs, as countable recurring spatial patterns
   of various cell types, are robust to noise in representing and
   quantifying multicellular organization. We performed simulations to
   mimic different levels of noise, including missing cells from
   imperfect cell capture by the sequencing technology (Fig. [82]2A), cell
   coordinate shifts from technological errors (Fig. [83]2B), and cell
   type misclassification from annotation errors in data analytics
   (Fig. [84]2C). We found that these noise sources did not influence the
   relative ranking of CC motif abundance, which remained consistent in
   most scenarios (Fig. [85]2A and Supplementary Fig. [86]3). Even under
   extreme cases with noise ratios of 0.4 and 0.5, the Spearman
   correlation between abundance rankings before and after the imposed
   noise remained relatively stable for cell dropout and cell coordinate
   perturbations. When the noise level is high, the correlation values
   deteriorate at large motif sizes for cell type misclassification, but
   such high misclassification rates are unlikely to occur in practical
   scenarios.
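
   The robustness measure used here, the Spearman correlation between motif
   abundance rankings before and after noise, can be sketched in plain
   Python. The helper names and the seeded cell-dropout routine are
   illustrative assumptions and do not reproduce the simulation protocol
   of the study.

```python
import random


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def drop_cells(cell_ids, proportion, seed=0):
    """Simulate missing cells by removing a random proportion of them."""
    rng = random.Random(seed)
    ids = list(cell_ids)
    dropped = set(rng.sample(ids, int(round(proportion * len(ids)))))
    return [c for c in ids if c not in dropped]


remaining = drop_cells(range(100), 0.2)
```

   Motif abundances would be recomputed on the perturbed graph and compared
   with the original abundances through `spearman`.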

Fig. 2. The performance of TrimNN on spatial omics.

   A Simulations of missing cell effects on CC motifs, represented as the
   Spearman correlation between abundance rankings of all the possible
   motifs before and after simulated noises at cell proportions of 0.01,
   0.05, 0.1, 0.2, 0.3, 0.4, and 0.5 within CC motifs in size-1, size-2,
   size-3, size-4, and size-5 (n = 100). B Simulations of cell coordinate
   shifting effects on CC motifs, represented as the Spearman correlation
   between abundance rankings of all the possible motifs before and after
   simulated noises with different levels of noises of 0.01, 0.05, 0.1,
   0.2, 0.3, 0.4, and 0.5 at cell proportions of 0.01, 0.05, 0.1, 0.2,
   0.3, 0.4, and 0.5 within CC motifs in size-1, size-2, size-3, size-4,
   and size-5 (n = 100). C Simulations of cell-type misclassification
   effects on CC motifs, represented as the Spearman correlation between
   abundance rankings of all the possible motifs before and after
   simulated noises at cell proportions of 0.01, 0.05, 0.1, 0.2, 0.3, 0.4,
   and 0.5 within CC motifs in size-1, size-2, size-3, size-4, and size-5
   (n = 100). D Benchmarking the performance of TrimNN, TrimNN-RGIN, and
   NSIC on independent simulated data for subgraph matching (n = 3000).
   The X-axis represents different sizes of CC motifs, and the Y-axis
   indicates the MCC (Matthews Correlation Coefficient) values. E
   Performance comparison of TrimNN, TrimNN-RGIN, and NSIC in identifying
   occurrences of CC motifs in diverse simulated datasets. The Y-axis is
   the RMSE (Root Mean Square Error) value. F Scalability of TrimNN. The
   X-axis represents the size of the triangulated graph, and the Y-axis
   indicates the runtime on a workstation equipped with an Intel Xeon Gold
   6338 CPU and 80 GB of RAM. G Ablation tests comparing performance with
   and without the positional encoding in the TrimNN model (n = 3000). The X-axis
   represents different sizes of CC motifs, and the Y-axis indicates the
   MCC values. CC: cellular community. On each box, the central mark
   indicates the median, and the bottom and top edges of the box indicate
   the 25th and 75th percentiles. The whiskers extend to the most extreme
   data points without outliers, and the outliers are plotted individually
   as circles. Source data are provided as a Source Data file.

TrimNN accurately identifies overrepresented CC motifs in Cellular
Neighborhoods

   On a modified subgraph matching task, formulated as binary
   classification of motif existence in a triangulated graph, TrimNN
   outperformed the competing methods on most criteria across all
   scenarios in synthetic spatial omics data. The competing methods
   included VF2, the original regression-based neural network method
   NSIC^[89]25, and TrimNN-RGIN, which uses the proposed formulation with
   NSIC’s RGIN network architecture. Especially on large-size CC motifs,
   TrimNN demonstrated significant performance improvements over
   TrimNN-RGIN, highlighting its architecture’s capacity
   (Fig. [90]2D and Supplementary Data [91]1).
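
   For reference, the Matthews correlation coefficient used to score the
   binary motif-existence task can be computed from the confusion matrix as
   in the sketch below. The function name `mcc` and the toy labels are
   illustrative, and returning 0.0 for an undefined denominator is a common
   convention assumed here.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels: 1 = motif exists in the enclosed graph, 0 = it does not.
example = mcc([1, 1, 0, 0], [1, 0, 0, 0])
```

   MCC uses all four confusion-matrix cells, so it remains informative when
   motif-present and motif-absent cases are imbalanced.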

   TrimNN accurately identified the top overrepresented CC motifs. On a
   pattern growth challenge to determine the ranking of CC motif
   abundance, TrimNN outperformed competitive methods consistently in
   different sizes and cell types in synthetic spatial omics data. Both
   TrimNN and TrimNN-RGIN outperformed NSIC by a large margin in most
   scenarios and criteria, which highlights the capability of the proposed
   problem formulation. Notably, TrimNN achieved an average improvement
   over NSIC of approximately 20- to 60-fold in root mean square error
   (RMSE) (Fig. [92]2E and Supplementary Data [93]2). Beyond the absolute
   occurrence values, the relative ranking index also supported TrimNN’s
   capacity (Supplementary Fig. [94]4A and Supplementary Data [95]3).
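
   The RMSE criterion on occurrence counts is the square root of the mean
   squared difference between true and predicted counts. A minimal sketch
   with illustrative names and toy values:

```python
import math


def rmse(true_counts, predicted_counts):
    """Root mean square error between true and predicted occurrence counts."""
    squared = [(t - p) ** 2 for t, p in zip(true_counts, predicted_counts)]
    return math.sqrt(sum(squared) / len(squared))


# Toy values: errors of 3 and 4 give sqrt((9 + 16) / 2).
example_rmse = rmse([0, 0], [3, 4])
```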

   TrimNN is highly scalable in identifying large-size CC motifs. Because
   scalability plays a vital role in the study, we compared the
   computational time on target-triangulated graphs with varying node
   sizes. We observed that TrimNN, TrimNN-RGIN, and NSIC exhibit linear
   scalability with increasing node sizes, while TrimNN continuously
   consumed less computational time (Fig. [96]2F). TrimNN, with its
   simpler network architecture, was notably more efficient than
   TrimNN-RGIN under the same problem setting. In contrast, the runtime
   of the classical enumeration-based VF2 method grew exponentially,
   making it impractical in most scenarios. In practical usage, on typical
   spatial omics data with thousands of cells of dozens of cell types,
   TrimNN robustly infers large-size CC motifs accurately in seconds,
   which is unattainable through conventional methods.

   Together with GIN, PE increases the expressive power of TrimNN. In
   challenging tasks with larger-sized motifs, ablation tests showed that
   integrating PE improved GNN (graph neural network) performance compared
   with TrimNN-RGIN, which lacks PE and uses a complex GRU module
   (Fig. [97]2G and Supplementary Data [98]4). In addition, the
   effectiveness of GIN, the critical component of TrimNN, was confirmed
   by replacing it with other graph neural network models, including
   Graph Convolutional Networks and Graph Transformer^[99]26, while
   keeping other components and parameters constant
   (Supplementary Fig. [100]4B and Supplementary Data [101]5). This result
   aligned with theoretical analyses that GIN is a powerful 1-order graph
   neural network^[102]23. We also found that TrimNN requires
   sufficient training data to learn the complex relationships
   (Supplementary Fig. [103]4C and Supplementary Data [104]6).

TrimNN identifies representative CC motifs that accurately differentiate the
severity of colorectal cancer patients

   In addition to the above simulation studies, we showed that the CC
   motifs inferred by TrimNN are intrinsic representations that
   differentiate phenotypes of the CC. In a proteomics study comprising 17
   low-risk (Crohn’s-like lymphoid reaction, CLR) and 18 high-risk
   (diffuse inflammatory infiltration, DII) patients, using Co-Detection
   by Indexing (CODEX)^[105]13 on CRC, we performed a CC motif analysis
   using TrimNN on 140 tissue regions and identified the most abundant CC
   motifs in size-1 to size-4. Traditional machine learning approaches,
   such as logistic regression (LR), were adopted using relative ranking
   indices to quantify motif occurrence as features. Because the original
   publication annotated 29 cell types, we chose 29 as the fixed number of
   features in supervised learning to classify CLR and DII. Within tenfold
   cross-validation following the same protocol as CytoCommunity^[106]6,
   the ROC-AUC results of LR were 0.77, 0.76, 0.79, and 0.76
   (Fig. [107]3A) for size-1 to size-4 CC motifs, respectively. This LR
   model with 29 CC motif features outperformed CytoCommunity’s extensive
   GNN computation performance using a default of 512 dimensions of
   embeddings as features (ROC-AUC: 0.71). Notably, if the feature number
   increased to the top 100, the LR model on size-3 motifs achieved an
   ROC-AUC of 0.81. To assess the robustness of the model against
   potential overfitting, additional experiments were performed using ten
   repetitions of tenfold cross-validation on the LR model with the top
   5, 10, 15, and 20 size-3 CC motif features, as well as CytoCommunity
   using a reduced 29-dimensional embedding (Supplementary Fig. [108]5 and
   Supplementary Data [109]7). The performance of the LR model with
   these small numbers of CC motif features was relatively stable,
   suggesting that overfitting has limited influence. In addition,
   other classical machine learning models, such as Random Forest and
   Support Vector Machine, were applied to the same classification tasks
   with the same settings. These models performed similarly to LR, further
   supporting the representational power of CC motifs (Supplementary
   Data [110]8-[111]10). The evaluation of the comprehensive performance
   comparison to CytoCommunity and SPACE-GM is provided in Supplementary
   Data [112]11.
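
   The feature construction and the evaluation metric can be sketched as
   follows. This sketch follows the Fig. 3A description of motif counts
   scaled between 0 and 1, selects the top motifs by total count (an
   assumption), and computes ROC-AUC as the fraction of correctly ordered
   positive–negative pairs. Fitting the LR model itself is omitted, and all
   names and toy values are hypothetical.

```python
def top_motif_features(counts, n_features):
    """counts: list of per-sample dicts mapping motif label -> occurrence."""
    totals = {}
    for sample in counts:
        for motif, c in sample.items():
            totals[motif] = totals.get(motif, 0) + c
    # Assumed selection rule: most abundant motifs overall, ties by label.
    selected = sorted(totals, key=lambda m: (-totals[m], m))[:n_features]
    matrix = [[sample.get(m, 0) for m in selected] for sample in counts]
    for j in range(len(selected)):
        column = [row[j] for row in matrix]
        lo, hi = min(column), max(column)
        for row in matrix:
            # Min-max scale each feature to the range [0, 1].
            row[j] = (row[j] - lo) / (hi - lo) if hi > lo else 0.0
    return selected, matrix


def roc_auc(labels, scores):
    """ROC-AUC as the share of positive-negative pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two samples with size-2 motif counts.
toy_counts = [{"A&B": 4, "A&A": 1}, {"A&B": 2, "B&B": 5}]
selected, features = top_motif_features(toy_counts, n_features=2)
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

   The scaled feature matrix would then be passed to any standard
   classifier within the cross-validation loop.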

Fig. 3. TrimNN analysis in a colorectal cancer study using CODEX.

   A ROC curves of the LR model classifying CLR and DII patients using
   the top CC motifs of size-1 to size-4 as features, and of the competing
   method CytoCommunity using its learned embedding dimensions. The LR
   model uses motif counts from TrimNN, scaled between 0 and 1, as
   features. B Visualization
   of all the samples using the top two principal components from the 29
   top size-2 CC motifs. Blue spots denote the CLR patient group and red
   spots denote the DII patient group. C Generalizability of the trained
   model testing on random cropping of ROI in the samples (n = 100). The
   X-axis is the ratio of width and height of the original ROI, and the
   Y-axis is ROC-AUC. On each box, the central mark indicates the median,
   and the bottom and top edges of the box indicate the 25th and 75th
   percentiles. The whiskers extend to the most extreme data points
   without outliers, and the outliers are plotted individually as circles.
   Survival curves of DII patients with and without enriched motifs,
   including D size-2 “A & B”. Here, cell type CD68+CD163+ macrophages are
   denoted as “A” and smooth muscles are denoted as “B”. E size-3 “A & A &
   B” and F size-4 “A & A & B & B”. The visualization of spatial
   localization of size-2 CC motif “A & B” on the CC in G patient 3 (DII)
   on spot 5A and H patient 8 (CLR) on spot 16A. The visualization of the
   spatial locations of the size-3 motif “A & A & B” in I DII spot and J
   CLR spot (same spots as G and H). The visualization of spatial
   localization of the size-4 motif “A & A & B & B” in K DII spot and L
   CLR spot (same spots as G and H). All motifs are marked as blue, nodes
   of cell type “A” are red, and nodes of cell type “B” are orange. The
   plot of LR coefficients ranked by Cox PH p-value of the top 29 CC
   motifs in M size-2, N size-3, and O size-4. The extent of the blue
   color represents the Cox PH p-value. The p-values were derived from the
   two-sided Wald test. * marks the highlighted motif. LR: logistic
   regression, CC: cellular community. Source data are provided as a
   Source Data file.

   Besides supervised learning, these CC motifs seemed to capture some
   intrinsic characteristics in CNs, where CLR and DII demonstrated good
   visual separation using the top two principal components inferred from
   the top 29 size-2 motifs (Fig. [115]3B). Further unsupervised
   hierarchical clustering showed the different distributions of the top
   29 motif abundances among the CLR and DII groups in Supplementary
   Fig. [116]6A and [117]6B (size-2), Supplementary Fig. [118]6C and
   [119]6D (size-3), and Supplementary Fig. [120]6E and [121]6F (size-4).

   CC motifs, used with simpler models and fewer features, showed better
   generalizability across multiple samples in machine learning. When only
   parts of the CC were available after randomly cropping the samples, CC
   motif–based LR methods generalized more robustly than competing
   methods (Fig. [122]3C). The same trends were observed in
   distorted samples with simulated noises in cell missing, cell
   coordinate shifting, and cell type misclassification (Supplementary
   Fig. [123]7).

   In addition, the enrichment of sizable CC motifs can be used to
   differentiate patient survival. We identified several size-2, size-3,
   and size-4 CC motifs that significantly differentiate survival (Cox PH
   p < 0.05) between enriched and non-enriched DII patients, while cell
   type composition (size-1 motifs) may not necessarily succeed
   (Supplementary Data [124]12). With cell type “CD68+CD163+ macrophages”
   (denoted by “A”) and cell type “smooth muscle” (denoted by “B”), the
   survival curves showed that the size-2 motif “A & B” did not
   adequately separate survival in the DII patient group (Cox PH
   p = 0.63, shown in Fig. [125]3D), but when more adjacent nodes of the
   same cell types were included, patients with enrichment of size-3 (“A
   & A & B”) and size-4 (“A & A & B & B”) CC motifs showed significantly
   lower survival rates (Cox PH p = 0.016, shown in Fig. [126]3E, and Cox
   PH p = 0.0093, shown in Fig. [127]3F, respectively). To validate the
   results, we performed additional survival analyses in both COAD and
   READ cohorts of The Cancer Genome Atlas (TCGA) associated with
   CD68+CD163+ macrophage marker genes: CD68, CD163, CD14, and
   ITGAM^[128]27, and smooth muscle marker genes: ACTA2, MYH11, and
   MYL9^[129]28. We observed no significant differences in survival
   between patients associated with either cell type (Cox PH p > 0.05,
   Supplementary Figs. [130]8 and [131]9). This independent analysis
   showed consistent results, indicating that cell type compositions, such
   as size-1 CC motifs, have limited effectiveness in differentiating
   patient survival in CRC. In addition, the occurrence numbers of these
   CC motifs among DII and CLR patients were 14,415 and 7004 (ratio 2.06)
   for size-2 “A & B”; 4176 and 1548 (ratio 2.70) for size-3 “A & A & B”;
   and 6946 and 2276 (ratio 3.05) for size-4 “A & A & B & B,”
   respectively. All were found to be significant by the
   Benjamini–Hochberg adjusted Fisher’s exact test. The different
   distribution of these CC motifs on CCs among DII and CLR spots can be
   visualized in Fig. [132]3G and H (size-2), Fig. [133]3I and J (size-3),
   and Fig. [134]3K, L (size-4).
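
   The DII-to-CLR ratios quoted above follow directly from the reported
   occurrence counts, and the Benjamini–Hochberg adjustment can be written
   in a few lines. The sketch below uses the counts from the text; the
   p-values passed to `benjamini_hochberg` are arbitrary toy values, since
   the contingency tables behind the Fisher's exact tests are not given
   here.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Occurrence counts (DII, CLR) as reported in the text.
occurrences = {
    "A & B": (14415, 7004),
    "A & A & B": (4176, 1548),
    "A & A & B & B": (6946, 2276),
}
ratios = {motif: round(dii / clr, 2) for motif, (dii, clr) in occurrences.items()}

# Toy p-values only, to show the adjustment.
toy_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

   The ratio grows with motif size, from 2.06 to 2.70 to 3.05, which is the
   trend the paragraph above describes.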

   Notably, spatial topology plays a crucial role in linking phenotypes
   and survival. There were two types of size-3 motifs with cell types
   CD68+CD163+ macrophages (“A”) and smooth muscle (“B”). Compared with “A
   & A & B”, the alternative motif “A & B & B” occurred 4602 and 2255
   times among DII and CLR patients, respectively, with a lower ratio of
   2.04, and it did not differentiate survival well (Cox PH p = 0.2975).
   These topological differences in the spatial localization of cells of
   different types evidently played different biological and pathological
   roles, which conventional top-down approaches based on cell type
   composition failed to distinguish (Supplementary
   Fig. [135]10 and Supplementary Data [136]13–[137]16).

   Furthermore, an LR model provides intrinsic interpretability when
   differentiating phenotypes. The coefficients of each feature from the
   LR model demonstrated the importance of CC motifs quantitatively,
   making the model interpretable (Fig. [138]3M–O). Notably, all
   macrophage-related, muscle-related, and significant Cox PH p-value
   motifs in different sizes tended to have high absolute coefficient
   values. These interpretations were corroborated by Shapley
   values^[139]29 in Supplementary Fig. [140]6G–I, showing that
   these macrophage-related and muscle-related CC motifs were essential to
   differentiating DII patients from CLR patients. Biologically, there is
   evidence that macrophages help pancreatic cancer induce muscle wasting
   by promoting TWEAK (TNF-like weak inducer of apoptosis)
   secretion from the tumor^[141]30. After carefully checking these top
   motifs in different sizes, we also identified biologically meaningful
   tumor cells and B cells, which were known to be related to the severity
   of CRC^[142]31. Representative tumor and B cell enrichments in DII and
   CLR samples are shown in Supplementary Fig. [143]6J, [144]6K, [145]6L,
   and 6M. Our analysis validated the crosstalk between macrophages,
   muscle wasting, and cancer cachexia through an independent spatial
   omics study, and TrimNN identified CC motifs in a data-driven approach
   as robust interpretable representations in CNs.

TrimNN identifies CC motifs revealing diverse roles in Alzheimer’s disease
using spatial transcriptomics data

   Next, we showed TrimNN’s capability to identify diverse spatially
   distributed CC motifs corresponding to multiple biological and
   pathological mechanisms in complex diseases. It is known that the
   interaction between CTX (Cortex) excitatory neurons and Microglia is
   significantly disrupted by neuroinflammation in Alzheimer’s disease
   (AD)^[146]32. However, their topological combinations, particularly
   their relationship with amyloid-β on the cellular level, are still
   unknown^[147]33. We applied TrimNN to AD mouse brain samples from
   8-month-old and 13-month-old mice, profiled by STARmap PLUS
   spatially resolved transcriptomics^[148]34. There were two replicates
   for both disease and control conditions at each time point. The
   transcriptomics data included 2,766 genes and two proteomics channels
   representing AD markers of amyloid-β and tau pathologies at subcellular
   resolution.

   On the derived CCs, size-3 triangle-like CC motifs composed of cell
   types CTX excitatory neurons and Microglia were identified as
   significantly different between AD (Fig. [149]4A) and control
   (Fig. [150]4B). These significant
   CC motifs included CTX excitatory neurons-CTX excitatory neurons-CTX
   excitatory neurons (CCC), CTX excitatory neurons-CTX excitatory
   neurons-Microglia (CCM), CTX excitatory neurons-Microglia-Microglia
   (CMM), and Microglia-Microglia-Microglia (MMM) (Benjamini–Hochberg
   adjusted Fisher’s exact test in 8-month-old replicate 1 with
   p = 6.12e–32, p = 3.23e–20, p = 4.30e–34, and p = 9.74e–07,
   respectively). Visualization of an exemplary CC motif “MMM”
   demonstrated uneven spatial distribution that differed in AD
   (Fig. [151]4C) and control (Fig. [152]4D). Supplementary Fig. [153]11
   shows the spatial occurrence distribution of the other three motifs.
   The CCs inferred from all samples are shown in Supplementary
   Fig. [154]12 and Supplementary Data [155]17–[156]28.
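
   The four size-3 compositions named above (CCC, CCM, CMM, and MMM) can be
   tallied on a triangulated graph by exact enumeration, as in the sketch
   below. This brute-force count is for illustration only and is not the
   learned estimate produced by TrimNN; the function name and toy graph are
   hypothetical, with “C” and “M” standing for CTX excitatory neurons and
   Microglia.

```python
from collections import Counter
from itertools import combinations


def count_triangle_motifs(adjacency, types):
    """Count triangles grouped by their sorted cell-type composition."""
    counts = Counter()
    for u in adjacency:
        # Only look at higher-indexed neighbors so each triangle is seen once.
        higher = sorted(n for n in adjacency[u] if n > u)
        for v, w in combinations(higher, 2):
            if w in adjacency[v]:
                label = "".join(sorted(types[n] for n in (u, v, w)))
                counts[label] += 1
    return counts


# Toy graph (hypothetical): two triangles sharing the edge between cells 1 and 2.
toy_adjacency = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
toy_types = ["C", "C", "M", "M"]
motif_counts = count_triangle_motifs(toy_adjacency, toy_types)
```

   Per-composition counts from disease and control samples could then be
   compared with a contingency-table test such as Fisher's exact test.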

Fig. 4. TrimNN analysis in an AD mouse study sequenced by STARmap PLUS.

   CCs of the 13-month-old A AD and B control samples (replicate 1),
   obtained using Delaunay triangulation, where black spots are amyloid-β
   in the AD sample. The spatial locations of the identified motif with
   all Microglia cells (“MMM” motif, where “M” denotes cell type
   Microglia) in 13-month-old replicate 1 of C AD and D control mouse
   samples. “MMM” motifs are marked as purple. Cell–cell communication
   analysis demonstrates the ligand–receptor differences between motif
   regions and non-motif regions as river plots, including E cell type CTX
   excitatory neurons (denoted as “C”) as source (left) and target
   (right), F cell type Microglia as source (left) and target (right) in
   regions with and without “CCC” motif. Similarly, G and H are cell types
   of CTX excitatory neurons and Microglia as source and target in regions
   with and without the “MMM” motif. I GO enrichment analysis of
   Biological Processes and J Pathway enrichment analysis on DEGs between
   regions containing and not containing the “MMM” motif in 13-month-old
   AD samples. K Expression of marker genes for cell types “C” and “M”
   related size-3 and size-4 motifs. L Spatial co-occurrence of different
   CC motifs with respect to amyloid-β as computed using Squidpy.
   Microglia-related motifs have a higher spatial co-occurrence
   probability with amyloid-β plaques, whereas CTX excitatory
   neuron–related motifs have a lower spatial co-occurrence probability
   with the plaques. CC: cellular community. Source data
   are provided as a Source Data file.

   Here, we defined motif-enriched regions as the regions within three
   hops of CC motifs in the CC. From the perspective of cell–cell
   communications, comparing motif-enriched and complementary regions in
   13-month-old samples using CellChat^[161]35 revealed distinct
   ligand–receptor signaling pathway patterns for “CCC” (Fig. [159]4E, F)
   and “MMM” (Fig. [160]4G, H). For cell type Microglia, the dominant
   ligand–receptor pairs distinguishing motif regions from complementary
   regions for motifs “CCC” and “MMM” were GRN^[162]21 and PMCH. AD-related
   ligand–receptors, including GRN, VEGF, PDGF, CCL, VIP, NRG, and SEMA3,
   were significantly enriched in CC motifs associated with CTX excitatory
   neurons or Microglia (Supplementary Data [163]29–[164]32). All the
   cell–cell communication results on CC motifs are detailed in
   Supplementary Figs. [165]13–[166]21 and Supplementary
   Data [167]33–[168]40. These differences in cell–cell communications
   were independently validated by DeepTalk^[169]36, incorporating
   long-range cellular interactions. In particular, CSF and VEGF pathways
   exhibited significant differences between “CCC” and “non-CCC” regions
   in Microglia-to-CTX excitatory neurons cell–cell communication
   (Mann–Whitney U test, p = 0.033 and p = 0.003, Supplementary
   Figs. [170]22–[171]25). Considering cell–cell communication among all
   the cell types, all ligand–receptor pairs identified by CellChat showed
   significant differences between regions inside and outside the
   identified motif regions (Mann–Whitney U test, p-value < 0.001,
   Supplementary Figs. [172]26 and [173]27).
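
   The three-hop definition of motif-enriched regions can be sketched as
   a breadth-first expansion over the CC graph. This is a minimal
   illustration that assumes the CC is available as an adjacency mapping
   and that the motif cells themselves belong to the region; the function
   and variable names are hypothetical.

```python
from collections import deque


def expand_region(adjacency, seed_cells, max_hops=3):
    """Return every cell within `max_hops` graph steps of any seed cell.

    adjacency:  dict mapping a cell id to an iterable of neighboring ids
                (for example, edges of a Delaunay triangulation).
    seed_cells: cells that take part in at least one motif occurrence.
    """
    distance = {cell: 0 for cell in seed_cells}
    queue = deque(seed_cells)
    while queue:
        cell = queue.popleft()
        if distance[cell] == max_hops:
            continue  # Do not expand beyond the hop limit.
        for neighbor in adjacency.get(cell, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[cell] + 1
                queue.append(neighbor)
    return set(distance)


def complementary_region(adjacency, region):
    """Cells of the graph that fall outside the motif-enriched region."""
    return set(adjacency) - set(region)
```

   The two returned sets correspond to the motif-enriched and
   complementary regions that are then compared in downstream analyses.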

   Then we analyzed the gene-level characteristics of identified CC
   motifs. Comparing “MMM” motif-enriched and complementary regions,
   differentially expressed genes (DEGs) were identified as significant
   (p-value < 0.05) using the Wilcoxon rank-sum test, including Plekha1,
   Ctsb, and Sort1 in 8-month-old samples (Supplementary Data [174]44),
   and App, Plekha1, Clu, Ptk2b, Sort1, Bin1, and Ctsb in 13-month-old
   samples (Supplementary Data [175]48). For the DEGs in 13-month-old
   samples, Gene Ontology (GO) enrichment analysis showed significant
   enrichment of vesicle-mediated transport in synapse
   (q-value = 1.68e–107), regulation of synapse structure or activity
   (q-value = 6.45e–106), learning or memory (q-value = 1.20e–56), and
   cognition (q-value = 5.54e–56) (Fig. [176]4I). Neural systems
   (q-value = 3.63e–51), transmission across chemical synapses
   (q-value = 2.05e–33), neurotransmitter receptors and postsynaptic
   signal transmission (q-value = 8.68e–20), and nervous system
   development (q-value = 1.33e–15) were enriched in the pathway
   enrichment analysis (Fig. [177]4J). A similar analysis, including both
   GO and pathway enrichment analyses (Supplementary Figs. [180]28 and
   [181]29), was performed on the DEGs of CC motifs “CCC”, “CCM”, “CMM”,
   and “MMM” (Supplementary Data [178]41–[179]48).

   To validate their relations with AD, we compared these motif-related
   DEGs with 77 AD-associated genes identified from large-scale GWAS
   analysis^[182]37. For CC motif “CCC”, the Trem2 gene was observed
   exclusively in the replicates of the 13-month-old but not the
   8-month-old AD mouse model, consistent with its role in late-onset
   AD^[183]38.
   Similarly, for the motif “MMM”, the Clu gene was highlighted only in
   the 13-month-old mouse model, aligning with its direct involvement in
   the formation process of amyloid-β^[184]39 (Supplementary
   Fig. [185]30).

   Based on the identified size-3 motifs, we performed pattern growth to
   identify size-4 motifs using TrimNN (Supplementary Notes [186]1). Among
   all the size-4 “CCC” expanded motifs, “CCCM” showed the most
   significant difference between AD and control samples, while “MMMM” was
   the most significant size-4 motif expanded from “MMM”. As in the
   analysis of size-3 motifs, the significantly enriched ligand–receptors
   (Supplementary Data [187]49–[188]51) and DEGs (Supplementary
   Data [189]52–[190]57) indicated that these size-4 motifs were related
   to AD in cell–cell communication analysis (Supplementary
   Data [191]58–[192]63), GO enrichment analysis (Supplementary
   Fig. [193]31), and pathway enrichment analysis (Supplementary
   Fig. [194]32).
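
   To make the idea of growing size-3 motifs into size-4 candidates
   concrete, the following sketch enumerates triangles with a given
   cell-type composition and then adds one adjacent cell. This is only one
   simple reading of pattern growth, assumed here for illustration; the
   actual TrimNN procedure is described in the Supplementary Notes and may
   differ. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def triangle_occurrences(adjacency, labels, motif):
    """List triangles whose cell-type labels match `motif` (order ignored).

    adjacency: dict mapping a cell id to the set of its neighbors
               (symmetric, with mutually comparable ids).
    labels:    dict mapping a cell id to a one-letter cell-type code.
    """
    target = tuple(sorted(motif))
    found = []
    for u in adjacency:
        larger = sorted(n for n in adjacency[u] if n > u)
        # Requiring u < v < w reports each triangle exactly once.
        for v, w in combinations(larger, 2):
            if w in adjacency[v]:
                if tuple(sorted(labels[x] for x in (u, v, w))) == target:
                    found.append((u, v, w))
    return found


def grow_to_size4(adjacency, labels, triangles):
    """Count size-4 candidates formed by adding one neighbor to a triangle.

    Assumption: a candidate is any cell adjacent to at least one cell of
    the triangle; candidates are keyed by their sorted label string.
    """
    seen = set()
    counts = Counter()
    for triangle in triangles:
        neighbors = set().union(*(adjacency[x] for x in triangle))
        for extra in neighbors - set(triangle):
            nodes = frozenset(triangle + (extra,))
            if nodes not in seen:  # Avoid double-counting a node set.
                seen.add(nodes)
                counts["".join(sorted(labels[x] for x in nodes))] += 1
    return counts
```

   The resulting counts per size-4 composition could then be compared
   between AD and control samples in the same way as the size-3 motifs.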

   Further investigation on DEGs showed diverse groups of CC motifs with
   expressed markers (Fig. [195]4K). Size-3 “MMM” and size-4 “MMMM” motifs
   with homogeneous Microglia had divergent expression patterns. For
   example, Hexb had a higher average expression in these motifs than in
   the other CC motifs.
   Hexb is known to induce toxic and progressive neuronal damage, which
   may relate to neurodegenerative dementia^[196]40.

   In addition to examining the diversity of CC motifs at the gene level,
   we investigated whether the identified CC motifs were spatially
   co-localized with amyloid-β by computing their co-occurrence
   probabilities using Squidpy^[197]41. The results showed that
   Microglia-related CC motifs had a higher co-occurrence probability
   with amyloid-β than the spatial expectation, distinguishing them from
   other CC motifs associated with CTX excitatory neurons
   (Fig. [198]4L). Interestingly, greater homogeneity of Microglia
   regions seemed to correspond to a higher co-occurrence probability
   with amyloid-β, whereas greater homogeneity of CTX excitatory neurons
   corresponded to a lower co-occurrence probability. This trend held
   across the whole spectrum of CC motifs composed of Microglia and CTX
   excitatory neurons in multiple sizes, from a very high ratio for
   size-4 “MMMM” to a very low ratio for size-3 “CCC”.
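
   The notion of a co-occurrence ratio relative to the spatial expectation
   can be illustrated with a simplified, single-radius sketch. It is not
   Squidpy's implementation, which evaluates the ratio over a series of
   distance intervals; the function name, the label scheme, and the choice
   of a single radius are assumptions made for illustration.

```python
from math import dist


def co_occurrence_ratio(coords, labels, target, condition, radius):
    """Ratio P(target | near a `condition` point) / P(target).

    coords:    list of (x, y) positions.
    labels:    list of labels, one per position (e.g. a motif code or
               a hypothetical "plaque" label for amyloid-beta).
    A ratio above 1 means `target` is found near `condition` points more
    often than expected from its overall frequency.
    """
    condition_points = [p for p, l in zip(coords, labels) if l == condition]
    others = [(p, l) for p, l in zip(coords, labels) if l != condition]
    # Labels of all non-condition points within `radius` of any condition point.
    near = [
        l for p, l in others
        if any(dist(p, q) <= radius for q in condition_points)
    ]
    if not near:
        return float("nan")
    p_conditional = sum(l == target for l in near) / len(near)
    p_overall = sum(l == target for _, l in others) / len(others)
    return p_conditional / p_overall if p_overall else float("nan")
```

   Under this simplification, a Microglia-rich motif clustered around
   plaques would give a ratio above 1, and a motif located away from
   plaques would give a ratio below 1.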

   Differences in both DEGs and spatial co-occurrence suggest the presence
   of two distinct types of CC motifs related to amyloid-β in AD. One type
   of CC motif (i.e., “CCC”, “CCM”, “CMM”, and “CCCM”) tended not to
   co-localize with amyloid-β. The other type (i.e., “MMM” and “MMMM”)
   was closely co-localized with amyloid-β. These results were
   consistent with the observation that Microglia, as key mediators in the
   brain, activate inflammation in the vicinity of amyloid-β deposits,
   which are directly toxic to the adjacent neurons^[199]42. Activated
   Microglia release pro-inflammatory cytokines, such as tumor necrosis
   factor-alpha (TNF-α) and interleukin-1 beta (IL-1β), which can damage
   excitatory neurons or alter their function^[200]43.

   In this case study, TrimNN confirmed existing knowledge of AD-related
   cell types and provided some new insights into spatial biology. As an
   unbiased data-driven approach, TrimNN independently identified
   pathologically related spatial characteristics of Microglia and CTX
   excitatory neurons, along with their topological relations with diverse
   cell types, as two distinct groups of CC motifs that differ at the
   levels of cell type, gene, and cell–cell communication. Analysis
   enabled by CC motifs
   demonstrates an unprecedented spectrum of the spatial relationships
   between the homogeneity of CTX excitatory neurons/Microglia cell types
   and the location of amyloid-β. TrimNN accurately captured these spatial
   co-localization patterns with amyloid-β deposits, providing insights
   into the onset of AD as the result of interactions between multiple
   cell types^[201]44, which clustering-based tools may overlook
   (Supplementary Figs. [202]33 and [203]34).

TrimNN identifies cell type–specific spatial tendencies in a colorectal
carcinoma study on spatial proteomics data

   Besides the AD study, we also performed TrimNN analysis to explore cell
   type–specific spatial tendencies in a colorectal carcinoma study. It
   is known that the tumor microenvironment can significantly influence
   the interactions between T-cells and epithelial cells through antigen
   presentation, T-cell activation, and modulation of the tumor
   microenvironment. However, it is still unknown how the spatial
   arrangement of these cells is related to effective immune surveillance
   and the potential for therapeutic interventions^[204]45. The adopted
   colorectal carcinoma study investigated 40 ROIs in two colorectal
   cancer patients and 18 ROIs in two healthy controls using spatial
   proteomics based on multiplexed ion beam imaging by time of flight
   (MIBI-TOF)^[205]46.

   After a comprehensive analysis of size-3 and their related size-4 CC
   motifs with TrimNN, we defined two types of CC motifs: Shifted
   Interaction Motifs and Homeostatic Interaction Motifs (Fig. [206]5A).
   Shifted Interaction Motifs demonstrated a shift of CC motif abundance
   from control-enriched (more occurrence in control than disease samples)
   to disease-enriched (more occurrence in disease than control samples)
   when expanding from size-3 to size-4. The exemplary size-3 motif “ABC”
   (Fig. [207]5B, C) suggested disease progression when involving other
   immune cells to form a size-4 motif “ABCD” (Fig. [208]5D, E), where “A”
   denotes CD4 T-cells, “B” denotes CD8 T-cells, “C” denotes epithelial,
   and “D” denotes other immune cells (other CD45+) annotated by the
   original publication. Proportion tests showed that this size-4 “ABCD”
   motif significantly differed from the size-3 “ABC” motif in abundance
   between disease and control samples (p = 2.58e–12). In contrast,
   Homeostatic Interaction Motifs remained consistent in abundance
   between disease and control groups when expanding in size. For
   example, the size-3 motif “AEC” (Fig. [209]5F, G), where “E” denotes
   endothelial, was extended with another epithelial cell (“C”) to form
   the size-4 motif “AECC” (Fig. [210]5H, I). Proportion tests showed
   consistency between disease and control ratios among this pair of
   size-3 and size-4 motifs (p = 0.55). The expression
   level of antibodies also confirmed differences between these two groups
   of CC motifs (Fig. [211]5J, Supplementary Fig. [212]35A and [213]35B).
   A similar analysis demonstrated that these two groups of CC motifs also
   existed in AD studies (Supplementary Notes [214]2).
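
   The comparison of disease/control ratios between a size-3 motif and its
   size-4 successor can be sketched with a pooled two-proportion z-test.
   The text does not state which proportion test was used, so this choice,
   the function name, and the way counts are arranged are assumptions made
   for illustration only.

```python
from math import erfc, sqrt


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for the difference between two proportions.

    Assumed usage (hypothetical): x1 of n1 size-3 motif occurrences and
    x2 of n2 size-4 motif occurrences fall in disease samples.
    Returns (z statistic, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if standard_error == 0:
        return 0.0, 1.0  # Both proportions are identically 0 or 1.
    z = (p1 - p2) / standard_error
    # Two-sided tail of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value
```

   A small p-value would indicate that the disease share changes when the
   motif is expanded, as described for Shifted Interaction Motifs, whereas
   a large p-value is what would be expected for Homeostatic Interaction
   Motifs.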

Fig. 5. TrimNN analysis in a colorectal carcinoma study using MIBI-TOF.


   A Schematic of Shifted Interaction Motif and Homeostatic Interaction
   Motif as two types of size-4 motifs. Shifted Interaction Motifs: size-3
   motif “ABC” (purple) in exemplary B spot 33 (disease) and C spot 56
   (control), the successor size-4 motif “ABCD” (red) in the same D spot
   33 (disease) and E spot 56 (control). Homeostatic Interaction Motifs:
   size-3 motif “AEC” (purple) in exemplary F spot 8 (disease) and G spot
   50 (control), and the successor size-4 motif “AECC” (red) in the same
   H spot 8 (disease) and I spot 50 (control). J Heatmap of antibody
   expression ratio between disease and control samples in Shifted
   Interaction Motif and Homeostatic Interaction Motif. K Ranking of
   effective size between all cell types in colon tissue samples.
   Abundance of size-2 CC motifs as occurrences in L colorectal carcinoma
   and M healthy control samples. N P-value of size-2 CC motifs between
   disease and control by the two-sided Benjamini-Hochberg adjusted
   Fisher’s exact test. O Heatmap on sender rank from NCEM-type coupling
   analysis in colon tissue samples. Heatmap of NCEM-type coupling
   analysis in P colorectal carcinoma and Q healthy control samples. R
   Difference values from NCEM-type coupling analysis between disease and
   control samples. Cell type “A” denotes CD4 T-cells, “B” denotes CD8
   T-cells, “C” denotes epithelial, “D” denotes other immune cells
   annotated by the original publication, and “E” denotes endothelial. The
   star symbol marks the paired cell type composition of the “AEC” motif.
   CC: cellular community. Source data are provided as a Source Data file.

   To better explain the biological relevance of these patterns, we
   analyzed the types of proteins showing increased expression in the
   expanded motifs. Interestingly, the increased disease/control
   fold-change observed in “ABCD” compared to “ABC” suggests that the
   involvement of the fourth node (other immune cells) is associated with
   heightened immune activity or immune cell engagement in the tumor
   microenvironment. Many of the upregulated proteins in this motif, such
   as CD39, CD11C, CD45, CD14, and H3, are related to immune infiltration,
   antigen presentation, or immune cell function. However, the presence of
   CD39 (involved in immunosuppressive adenosine signaling) and a
   concurrent rise in metabolic enzymes (e.g., HK1, G6PD, and VDAC1) and
   proliferation markers (e.g., Ki67) suggests that this immune activity
   may be skewed or dysfunctional. Rather than a classical
   immune-activated phenotype, the tumor microenvironment in “ABCD” may
   represent an immune-infiltrated but metabolically constrained
   immunoregulatory niche.

   In contrast, although the Homeostatic Interaction Motifs “AEC” and
   “AECC” do not change in abundance across conditions, their protein
   expression reveals different biological characteristics. Notably, PD1
   and GLUT1 are upregulated in the size-4 motif “AECC”, suggesting that
   immune checkpoint pathways and metabolic competition may play important
   roles even in spatially stable interaction contexts. PD1 upregulation
   indicates a checkpoint-mediated immune inhibitory signal, while GLUT1
   elevation supports increased glycolytic activity, both of which can
   further restrict immune function through resource competition. We also
   observed that motif-specific differences in PD1 and CD3 did not change
   between “ABC” and “ABCD”, but both showed increased expression in
   “AECC” versus “AEC”. This implies that the involvement of endothelial
   cells may contribute to a niche promoting immune checkpoint activation
   or T cell exhaustion. CD36, a lipid metabolism regulator, showed a
   moderate decrease in the “ABCD” motif, suggesting lipid metabolic
   shifts associated with certain spatial configurations.

   Together, these comparisons provide evidence that different motif
   expansions not only reflect structural reorganization but also
   correspond to distinct biological states. The Shifted Interaction Motif
   represents a transition toward a metabolically active,
   immune-suppressive tumor niche, consistent with tumor progression and
   immune evasion. The Homeostatic Interaction Motif, while stable in
   structure, still shows signs of immune checkpoint engagement,
   potentially limiting effective immune surveillance.

   Next, we explored cell-type preferences using TrimNN analysis and