Abstract

The spatial organization of cells plays a pivotal role in shaping tissue functions and phenotypes in various biological systems and diseased microenvironments. However, the topological principles governing interactions among cell types within spatial patterns remain poorly understood. Here, we present the triangulation cellular community motif neural network (TrimNN), a graph-based deep learning framework designed to identify conserved spatial cell organization patterns, termed cellular community (CC) motifs, from spatial transcriptomics and proteomics data. TrimNN employs a semi–divide-and-conquer approach to efficiently detect overrepresented topological motifs of varying sizes in a triangulated space. By uncovering CC motifs, TrimNN reveals key associations between spatially distributed cell-type patterns and diverse phenotypes. These insights provide a foundation for understanding biological and disease mechanisms and offer potential biomarkers for diagnosis and therapeutic interventions.

Subject terms: Computational models, Machine learning, Data mining, Network topology

Cellular spatial organisation is crucial for shaping tissue functions and phenotypes. Here, the authors present TrimNN, a graph-based deep learning framework to identify conserved cellular community motifs, revealing links between spatially distributed cell-type patterns and diverse phenotypes.

Introduction

Various cells work together within spatial arrangements in tissue to support organ homeostasis and function^[42]1. Deciphering the multicellular organization is key to understanding the relationship between spatial structure and tissue biological and pathological functions^[43]2. Emerging spatial omics approaches, including spatially resolved transcriptomics^[44]3 and spatial proteomics^[45]4, enable investigation of the mechanisms governing the spatial organization of different cell types in a specific tissue.
Within a region of interest (ROI) in spatial omics, cellular neighborhoods (CNs) define local cell type enrichment patterns in cellular communities (CCs), and decoding function-related conservative spatial features in CNs is one of the primary spatial omics data analysis tasks^[46]4. Most existing data analysis approaches adopt the top-down strategy to describe the cell organizations. This strategy mainly relies on clustering strategies to identify the cell type compositions as common patterns. Deep learning approaches, including SPACE-GM^[47]5, CytoCommunity^[48]6, CellCharter^[49]7, and BANKSY^[50]8, typically learn low-dimensional embeddings of the nodes in corresponding CNs and then apply clustering approaches to these embeddings. However, clustering approaches suffer the following challenges in dissecting and interpreting highly heterogeneous, dynamically evolving cell systems^[51]9. First, clustering results usually become less stable when samples contain cells under active state transition, which is common in disease or developmental processes^[52]10. Second, clusters identified by these top-down approaches are often described as percentages of cell-type compositions. These clustering presentations lack formulations in topologically representing the geometrical cell-type interactions or are difficult to interpret biologically. Last, these top-down results essentially depend on the presence of batch effects, where CNs separate primarily by samples as technical covariates rather than biological features^[53]3. These batch effects make it easy to overfit the models but difficult to validate across different datasets^[54]5. Considering the preceding limitations of top-down strategies, we instead use a bottom-up strategy to identify CC motifs as recurring significant interconnections between cells. 
In the spatial omics–derived CC, we hypothesize that CC motifs can be represented as topological building blocks of multicellular organization consistent across different samples and associated with key biological processes and functions. CC motifs are biologically interpretable spatial patterns of the combined cell types, which provide topological information beyond clusters and explicitly link to the biological and pathological mechanisms through distinct cell–cell communications, highly expressed genes and pathways^[55]11. This concept is related to the functional tissue units (FTUs)^[56]12, but CC motifs are even smaller in the scale of cell locations and cell types, which provides more details for understanding and modeling the healthy physiological function of the organ and functional-related changes during disease states. Currently, size 1–3 motif analysis^[57]4 makes up most of the spatial omics studies, where size-1 motifs are single nodes that can be treated as cell-type compositions, size-2 motifs are double nodes linked by edges, and size-3 motifs are triple nodes within triangles. Nevertheless, biologists have found that sizable CC motifs with more nodes than triangles substantially correlate with patient survival and phenotypical features in colorectal cancer (CRC)^[58]13, kidney diseases^[59]14, maternal–fetal interface^[60]15, and many other biological contexts. In practice, identifying the most overrepresented CC motifs composing multicellular organization is still computationally expensive with (i) subgraph matching^[61]16, which counts the occurrence of a given motif on the query graph, and (ii) pattern growth^[62]17, which finds the motifs with the most significant occurrence. It is known that subgraph matching is NP-complete^[63]16, which makes the node type combination alone super-exponential. 
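The size-1 to size-3 motifs described above can be counted exactly on a small cell graph. The following is a minimal sketch, assuming a hypothetical toy graph given as a list of cell types and an undirected edge list; the function name `count_small_motifs` and the inputs are illustrative and are not part of TrimNN. It uses plain enumeration, which is only practical for such small motif sizes.

```python
from collections import Counter

def count_small_motifs(cell_types, edges):
    """Count size-1 (nodes), size-2 (edges), and size-3 (triangles) motifs,
    grouped by their sorted cell-type composition."""
    adj = {i: set() for i in range(len(cell_types))}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    size1 = Counter((t,) for t in cell_types)
    size2, size3 = Counter(), Counter()
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                size2[tuple(sorted((cell_types[u], cell_types[v])))] += 1
                for w in adj[u] & adj[v]:
                    if v < w:  # visit each triangle once (u < v < w)
                        key = tuple(sorted((cell_types[u], cell_types[v], cell_types[w])))
                        size3[key] += 1
    return size1, size2, size3

# Hypothetical toy graph: two cells of type "A" and two of type "B".
toy_types = ["A", "A", "B", "B"]
toy_edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
size1, size2, size3 = count_small_motifs(toy_types, toy_edges)
print(size1, size2, size3)
```

Exact enumeration of this kind grows quickly with motif size and the number of cell types, which is the cost that motivates approximate approaches for larger motifs.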
Existing approaches include permutation^[64]11, edge sampling (e.g., mfinder^[65]18), node sampling (e.g., FANMOD^[66]19), and global pruning (e.g., Ullmann^[67]20 and VF2^[68]21). A computationally feasible approach is still lacking to analytically identify conservative, interpretable, and generalizable spatial rules of cellular organization in different sizes across different samples of spatial omics.

Here, we propose the triangulation cellular community motif neural network (TrimNN), a graph-based deep learning approach to analyze spatial transcriptomics and proteomics data using a bottom-up strategy (Supplementary Fig. [69]1). Within the input spatial omics samples, a CC is defined with cells as nodes, node types representing different cell types, and edges encoding physical proximity, inferred as a unidirectional spatial cell-cell relation from Delaunay triangulation^[70]22 based on node coordinates from the ROI. TrimNN estimates overrepresented size-$k$ CC motifs in the CC of spatial omics using graph isomorphism networks^[71]23 (GIN) empowered by positional encoding^[72]24 (PE). In various spatial transcriptomics and spatial proteomics case studies, TrimNN identifies computationally significant and biologically meaningful CC motifs to differentiate patient survival in CRC studies and represents pathologically related cell type organization in neurodegenerative diseases and colorectal carcinoma studies. Notably, the identified sizable CC motifs demonstrate their potential as interpretable topological prognostic biomarkers linking the topological structural organization of cell types at microscopic levels to phenotypes at macroscopic levels, which cannot be inferred by other existing tools. The source code of TrimNN is publicly available at [73]https://github.com/yuyang-0825/TrimNN.
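To make the graph construction concrete, the sketch below derives CC edges from cell coordinates with a Delaunay triangulation. It is an illustration under stated assumptions, not TrimNN's implementation: it uses a brute-force empty-circumcircle test that is only suitable for a handful of points, it returns edges as unordered index pairs, and the function names and coordinates are hypothetical. In practice a library routine (for example `scipy.spatial.Delaunay`) would be the usual choice.

```python
from itertools import combinations

def _orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def _incircle_det(a, b, c, d):
    """In-circle determinant; positive if d lies inside the circumcircle
    of a counter-clockwise triangle abc."""
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    return ((ax * ax + ay * ay) * (bx * cy - by * cx)
            - (bx * bx + by * by) * (ax * cy - ay * cx)
            + (cx * cx + cy * cy) * (ax * by - ay * bx))

def delaunay_edges(points):
    """Brute-force Delaunay edges: keep every triangle whose circumcircle
    contains no other point, then collect its three edges."""
    n = len(points)
    edges = set()
    for i, j, k in combinations(range(n), 3):
        a, b, c = points[i], points[j], points[k]
        o = _orient(a, b, c)
        if o == 0:  # skip collinear triples
            continue
        others = (m for m in range(n) if m not in (i, j, k))
        if any(_incircle_det(a, b, c, points[m]) * o > 0 for m in others):
            continue
        edges.update({(i, j), (i, k), (j, k)})
    return edges

# Hypothetical cell coordinates within an ROI.
cells = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.2, 1.1)]
print(sorted(delaunay_edges(cells)))
```

Each node would additionally carry its cell-type label, so that motifs can be described by both topology and cell-type composition.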
Results

TrimNN quantifies multicellular organization with sizable CC motifs

A schematic diagram of the proposed TrimNN and its analytic workflow is shown in Fig. [74]1A. These identified CC motifs are biologically interpretable through a set of downstream analyses, including motif visualization, cellular-level interpretation within cell–cell communication analysis, gene-level interpretation within differentially expressed gene and pathway analysis, and phenotypical analysis within the availability of phenotypical information (Fig. [75]1B). TrimNN is constructed on an empowered GIN to estimate the occurrence of the query on the target graph. On the CC as a triangulated graph built from spatial omics, TrimNN builds a supervised graph learning model by simplifying the graph constraints and incorporating the inductive bias within triangles derived from Delaunay triangulation. TrimNN decomposes the regression task of counting occurrences of the query graphs into many tractable binary classification tasks modeled by the sub-TrimNN module. Inspired by the idea of NSIC^[76]25, this method is trained on representative pairs of the predefined query subgraphs and the target triangulated cell graphs as a binary classification task. This graph representation framework builds upon GIN and adopts a shortest distance–based PE^[77]24, modeling the symmetric space to increase the expressive power. Additionally, TrimNN adopts a semi–divide-and-conquer strategy to estimate the abundance of the query by summarizing the enumeration of single classification tasks by a sub-TrimNN module on each node’s enclosed graph. Given the size of the query subgraph, our framework uses an enumeration approach to estimate the most overrepresented CC motifs with possible cell types and topology. Then, we search to infer CC motifs in different sizes incrementally. The details of the architecture of TrimNN are shown in Supplementary Fig. [78]2.

Fig. 1. TrimNN analysis workflow.

A Spatially resolved transcriptomics (e.g., STARmap PLUS and 10X Xenium) and spatial proteomics data (e.g., MIBI-TOF and CODEX) are used as input to generate corresponding CCs with spatial coordinates and Delaunay triangulation. TrimNN is trained on representative pairs of query motifs and target triangulated graphs at scale. Given a specific query, TrimNN identifies its occurrence in the target CC in the subgraph matching process by decomposing this regression task into many binary classification problems, where each classification predicts whether the query exists in the target graph as the enclosed graph of each node. Enumerating possible motifs at size-$k$, TrimNN identifies the most overrepresented motifs. Then, the pattern growth process adopts a heuristic search for their successor size-$(k+1)$ motifs. Here, we take size-3 CC motifs as an example. After subgraph matching and pattern growth, TrimNN estimates overrepresented CC motifs. Created in BioRender. Yu, Y. (2025) [81]BioRender.com/mfm4ta4. B These CC motifs can be biologically interpreted in the downstream analysis, including visualization, cellular-level interpretation within cell–cell communication analysis, gene-level interpretation within differentially expressed gene analysis (e.g., GO enrichment analysis and pathway enrichment analysis), and phenotypical analysis within the availability of phenotypical information (e.g., survival curve and phenotypic classification analysis). CC: cellular community.

We hypothesize that CC motifs, as countable recurring spatial patterns of various cell types, are robust to noise in representing and quantifying multicellular organization. We performed simulations to mimic different levels of noise, including missing cells due to imperfect cell capture by sequencing technology (Fig. [82]2A), cell coordinate shifting due to technological errors (Fig.
[83]2B), and cell type misclassification due to annotation errors in data analytics (Fig. [84]2C). We noticed that these diverse types of noise do not influence the relative ranking of CC motif abundance, which remained robustly consistent in most scenarios (Fig. [85]2A and Supplementary Fig. [86]3). Even under extreme cases with noise ratios of 0.4 and 0.5, the Spearman correlation between abundance rankings before and after noise was introduced remained relatively stable for cell dropout and cell coordinate perturbations. When the noise level is high, the correlation values deteriorate for large motif sizes under cell type misclassification, which is unlikely to occur in practical scenarios.

Fig. 2. The performance of TrimNN on spatial omics.

A Simulations of missing cell effects on CC motifs, represented as the Spearman correlation between abundance rankings of all the possible motifs before and after simulated noises at cell proportions of 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, and 0.5 within CC motifs in size-1, size-2, size-3, size-4, and size-5 (n = 100). B Simulations of cell coordinate shifting effects on CC motifs, represented as the Spearman correlation between abundance rankings of all the possible motifs before and after simulated noises with different levels of noises of 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, and 0.5 at cell proportions of 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, and 0.5 within CC motifs in size-1, size-2, size-3, size-4, and size-5 (n = 100). C Simulations of cell-type misclassification effects on CC motifs, represented as the Spearman correlation between abundance rankings of all the possible motifs before and after simulated noises at cell proportions of 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, and 0.5 within CC motifs in size-1, size-2, size-3, size-4, and size-5 (n = 100). D Benchmarking the performance of TrimNN, TrimNN-RGIN, and NSIC on independent simulated data for subgraph matching (n = 3000).
The X-axis represents different sizes of CC motifs, and the Y-axis indicates the MCC (Matthews Correlation Coefficient) values. E Performance comparison of TrimNN, TrimNN-RGIN, and NSIC in identifying occurrences of CC motifs in diverse simulated datasets. The Y-axis is the RMSE (Root Mean Square Error) value. F Scalability of TrimNN. The X-axis represents the size of the triangulated graph, and the Y-axis indicates the runtime on a workstation equipped with an Intel Xeon Gold 6338 CPU and 80 GB RAM. G Ablation tests comparing performance when adding the positional encoding to the TrimNN model (n = 3000). The X-axis represents different sizes of CC motifs, and the Y-axis indicates the MCC values. CC: cellular community. On each box, the central mark indicates the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles. The whiskers extend to the most extreme data points without outliers, and the outliers are plotted individually as circles. Source data are provided as a Source Data file.

TrimNN accurately identifies overrepresented CC motifs in Cellular Neighborhoods

On a modified subgraph matching task, formulated as a binary classification of motif existence in a triangulated graph, TrimNN outperformed the competitive methods in all scenarios on most criteria in synthetic spatial omics data, including VF2, the original regression-based neural network method NSIC^[89]25, and TrimNN-RGIN, which uses the proposed formulation with NSIC’s RGIN network architecture. Especially on large-size CC motifs, TrimNN demonstrated significant performance improvements over TrimNN-RGIN, highlighting its architecture’s capacity (Fig. [90]2D and Supplementary Data [91]1). TrimNN accurately identified the top overrepresented CC motifs. On a pattern growth challenge to determine the ranking of CC motif abundance, TrimNN consistently outperformed competitive methods across different sizes and cell types in synthetic spatial omics data.
Both TrimNN and TrimNN-RGIN outperformed NSIC by a large margin in most scenarios and criteria, which highlights the capability of the proposed problem formulation. Notably, TrimNN demonstrated an average improvement over NSIC by approximately 20 to 60 times in root mean square error (RMSE) (Fig. [92]2E and Supplementary Data [93]2). Besides the criteria in absolute occurrence value, the relative value of the ranking index also supported TrimNN’s capacity in Supplementary Fig. [94]4A and Supplementary Data [95]3. TrimNN is highly scalable in identifying large-size CC motifs. Because scalability plays a vital role in the study, we compared the computational time on target-triangulated graphs with varying node sizes. We observed that TrimNN, TrimNN-RGIN, and NSIC exhibit linear scalability with increasing node sizes, while TrimNN continuously consumed less computational time (Fig. [96]2F). TrimNN was especially more efficient than TrimNN-RGIN with a simpler network architecture using the same problem setting. In contrast, the classical enumeration-based VF2 method grew exponentially, where its runtime made it unacceptable in most scenarios. In practical usage, on typical spatial omics data with thousands of cells of dozens of cell types, TrimNN robustly infers large-size CC motifs accurately in seconds, which is unattainable through conventional methods. Together with GIN, PE increases the expressive power of TrimNN. In challenging tasks with larger-sized motifs, ablation tests showed that integrating PE improved GNN (Graph Neural Network) performance compared with TrimNN-RGIN without PE and a complex GRU module (Fig. [97]2G and Supplementary Data [98]4). In addition, GIN, as the critical component in TrimNN, was effective by replacing it with other graph neural network models, including Graph Convolutional Networks and Graph Transformer^[99]26, keeping other components and parameters constant (Supplementary Fig. [100]4B and Supplementary Data [101]5). 
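The two benchmark criteria used above, MCC for the binary motif-existence task and RMSE for occurrence counts, follow their standard definitions. A minimal sketch with hypothetical inputs (the function names and example values are illustrative and are not results from the study):

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def rmse(true_counts, predicted_counts):
    """Root mean square error between true and estimated occurrence counts."""
    squared = [(t - p) ** 2 for t, p in zip(true_counts, predicted_counts)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical example: motif present (1) or absent (0) in four target graphs.
print(mcc([1, 1, 0, 0], [1, 0, 0, 0]))
print(rmse([10, 20, 30], [12, 18, 33]))
```

MCC ranges from -1 to 1 and stays informative when present and absent cases are imbalanced, whereas RMSE is reported in units of motif occurrences.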
This result aligned with theoretical analyses that GIN is a powerful 1-order graph neural network^[102]23. Meanwhile, it was shown that TrimNN requires sufficient training data to learn the complex relationships (Supplementary Fig. [103]4C and Supplementary Data [104]6).

TrimNN identifies representative CC motifs that accurately differentiate the severity of colorectal cancer patients

In addition to the above simulation studies, we showed that the CC motifs inferred by TrimNN are intrinsic representations to differentiate phenotypes of the CC. In a proteomics study comprising 17 low-risk (Crohn’s-like lymphoid reaction, CLR) and 18 high-risk (diffuse inflammatory infiltration, DII) patients, using Co-Detection by Indexing (CODEX)^[105]13 on CRC, we performed a CC motif analysis using TrimNN on 140 tissue regions and identified the most abundant CC motifs in size-1 to size-4. Traditional machine learning approaches, such as logistic regression (LR), were adopted using relative ranking indices to quantify motif occurrence as features. Because the original publication annotated 29 cell types, we chose 29 as the fixed number of features in supervised learning to classify CLR and DII. With tenfold cross-validation following the same protocol as CytoCommunity^[106]6, the ROC-AUC results of LR were 0.77, 0.76, 0.79, and 0.76 (Fig. [107]3A) for size-1 to size-4 CC motifs, respectively. This LR model with 29 CC motif features outperformed CytoCommunity, which relies on extensive GNN computation and uses 512-dimensional embeddings as features by default (ROC-AUC: 0.71). Notably, if the feature number increased to the top 100, the LR model on size-3 motifs achieved an ROC-AUC of 0.81.
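The ROC-AUC values above can be read as the probability that a randomly chosen sample from one class receives a higher classifier score than a randomly chosen sample from the other class. The sketch below computes this pairwise definition directly; the labels and scores are hypothetical, and which patient group is treated as the positive class is an assumption made only for illustration.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly;
    tied scores count as half a correct pair."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores for four tissue regions (1 = positive class).
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(example_labels, example_scores))  # 3 of 4 pairs ranked correctly
```

In a cross-validation setting, this quantity would be computed on held-out folds only, so that the score reflects generalization rather than fit to the training samples.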
To investigate the robustness of the model against potential overfitting, additional experiments were performed using 10 repetitions of 10-fold cross-validation on the LR model with the top 5, 10, 15, and 20 size-3 CC motif features, as well as CytoCommunity using reduced 29-dimensional embedding (Supplementary Fig. [108]5 and Supplementary Data [109]7). The performance of the LR model with diverse small numbers of CC motif features is relatively stable, suggesting that overfitting has limited influence on the LR model. In addition, other classical machine learning models, such as Random Forest and Support Vector Machine, were applied to the same classification tasks with the same settings. These models performed similarly to LR, further supporting the representational power of CC motifs (Supplementary Data [110]8–[111]10). The evaluation of the comprehensive performance comparison to CytoCommunity and SPACE-GM is provided in Supplementary Data [112]11.

Fig. 3. TrimNN analysis in a colorectal cancer study using CODEX.

A The ROC curves of the LR model classifying CLR and DII patients using the top CC motifs of size-1 to size-4 as features, and of the competitive method CytoCommunity using learned dimensions. The LR model uses motif counts from TrimNN as features, scaled between 0 and 1. B Visualization of all the samples using the top two principal components from the 29 top size-2 CC motifs. Blue spots denote the CLR patient group and red spots denote the DII patient group. C Generalizability of the trained model testing on random cropping of ROI in the samples (n = 100). The X-axis is the ratio of width and height of the original ROI, and the Y-axis is ROC-AUC. On each box, the central mark indicates the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles. The whiskers extend to the most extreme data points without outliers, and the outliers are plotted individually as circles.
Survival curves of DII patients with and without enriched motifs, including D size-2 “A & B”. Here, cell type CD68+CD163+ macrophages are denoted as “A” and smooth muscles are denoted as “B”. E size-3 “A & A & B” and F size-4 “A & A & B & B”. The visualization of spatial localization of size-2 CC motif “A & B” on the CC in G patient 3 (DII) on spot 5A and H patient 8 (CLR) on spot 16A. The visualization of the spatial locations of the size-3 motif “A & A & B” in I DII spot and J CLR spot (same spots as G and H). The visualization of spatial localization of the size-4 motif “A & A & B & B” in K DII spot and L CLR spot (same spots as G and H). All motifs are marked as blue, nodes of cell type “A” are red, and nodes of cell type “B” are orange. The plot of LR coefficients ranked by Cox PH p-value of the top 29 CC motifs in M size-2, N size-3, and O size-4. The extent of the blue color represents the Cox PH p-value. The p-values were derived from the two-sided Wald test. * marks the highlighted motif. LR: logistic regression, CC: cellular community. Source data are provided as a Source Data file.

Besides supervised learning, these CC motifs seemed to capture some intrinsic characteristics in CNs, where CLR and DII demonstrated good visual separation using the top two principal components inferred from the top 29 size-2 motifs (Fig. [115]3B). Further unsupervised hierarchical clustering showed the different distributions of the top 29 motif abundances among the CLR and DII groups in Supplementary Fig. [116]6A and [117]6B (size-2), Supplementary Fig. [118]6C and [119]6D (size-3), and Supplementary Fig. [120]6E and [121]6F (size-4). CC motifs used with simpler models and fewer features showed better generalizability across multiple samples in machine learning. When only parts of the CC were available after randomly cropping the samples, CC motif–based LR methods generalized robustly compared to competitive methods (Fig. [122]3C).
The same trends were observed in distorted samples with simulated noises in cell missing, cell coordinate shifting, and cell type misclassification (Supplementary Fig. [123]7). In addition, the enrichment of sizable CC motifs can be used to differentiate patient survival. We identified several size-2, size-3, and size-4 CC motifs that significantly differentiate survival (Cox PH p < 0.05) between enriched and non-enriched DII patients, while cell type composition (size-1 motifs) may not necessarily succeed (Supplementary Data [124]12). With cell type “CD68+CD163+ macrophages” (denoted by “A”) and cell type “smooth muscle” (denoted by “B”), the survival curves showed that the size-2 motif “A & B” may not have adequately separated survival in the DII patient group (Cox PH p = 0.63, shown in Fig. [125]3D). However, when more adjacent nodes of the same cell types were included, patients with enrichment of size-3 (“A & A & B”) and size-4 (“A & A & B & B”) CC motifs showed significantly lower survival rates (Cox PH p = 0.016, shown in Fig. [126]3E, and Cox PH p = 0.0093, shown in Fig. [127]3F, respectively). To validate the results, we performed additional survival analyses in both COAD and READ cohorts of The Cancer Genome Atlas (TCGA) associated with CD68+CD163+ macrophage marker genes: CD68, CD163, CD14, and ITGAM^[128]27, and smooth muscle marker genes: ACTA2, MYH11, and MYL9^[129]28. We observed no significant differences in survival between patients associated with either cell type (Cox PH p > 0.05, Supplementary Figs. [130]8 and [131]9). This independent analysis showed consistent results, indicating that cell type compositions, such as size-1 CC motifs, have limited effectiveness in differentiating patient survival in CRC.
In addition, the occurrence numbers of these CC motifs among DII and CLR patients were 14,415 and 7004 (ratio 2.06) for size-2 “A & B”; 4176 and 1548 (ratio 2.70) for size-3 “A & A & B”; and 6946 and 2276 (ratio 3.05) for size-4 “A & A & B & B,” respectively in each case. All were inferred as significant through the Benjamini-Hochberg adjusted Fisher’s exact test. The different distribution of these CC motifs on CCs among DII and CLR spots can be visualized in Fig. [132]3G and H (size-2), Fig. [133]3I and J (size-3), and Fig. [134]3K, L (size-4). Notably, spatial topology plays a crucial role in linking phenotypes and survival. There were two types of size-3 motifs with cell types CD68+CD163+macrophages (“A”) and smooth muscle (“B”). Compared with “A & A & B”, the alternative motif “A & B & B” occurred 4602 and 2255 times among DII and CLR patients, respectively, with a lower ratio of 2.04, and it cannot differentiate survival well (Cox PH p = 0.2975). Apparently, these topological differences among the spatial localization of cells in different cell types played different roles biologically and pathologically, where conventional top-down approaches with cell type composition failed to distinguish (Supplementary Fig. [135]10 and Supplementary Data [136]13–[137]16). Furthermore, an LR model provides intrinsic interpretability when differentiating phenotypes. The coefficients of each feature from the LR model demonstrated the importance of CC motifs quantitatively, making the model interpretable (Fig. [138]3M–O). Notably, all macrophage-related, muscle-related, and significant Cox PH p-value motifs in different sizes tended to have high absolute coefficient values. The same interpretable results can also be cross-validated by Shapley value^[139]29 in Supplementary Fig. [140]6G–I, showing that these macrophage-related and muscle-related CC motifs were essential to differentiating DII patients from CLR patients. 
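The DII-to-CLR ratios quoted above follow directly from the reported occurrence counts. The short check below reproduces the arithmetic; the counts are copied from the text, while the variable names are illustrative.

```python
# (DII occurrences, CLR occurrences) for each motif, as reported in the text.
motif_counts = {
    "A & B": (14415, 7004),
    "A & A & B": (4176, 1548),
    "A & A & B & B": (6946, 2276),
    "A & B & B": (4602, 2255),
}

# Ratio of DII to CLR occurrences, rounded to two decimals as in the text.
ratios = {motif: round(dii / clr, 2) for motif, (dii, clr) in motif_counts.items()}
for motif, ratio in ratios.items():
    print(f"{motif}: {ratio:.2f}")
```

The two size-3 motifs share the same cell types but differ in composition, and their ratios (2.70 versus 2.04) differ accordingly, in line with the point about topology made above.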
Biologically, there is evidence that macrophages facilitate pancreatic cancer to induce muscle wasting via promoting TWEAK (TNF-like weak inducer of apoptosis) secretion from the tumor^[141]30. After carefully checking these top motifs in different sizes, we also identified biologically meaningful tumor cells and B cells, which were known to be related to the severity of CRC^[142]31. Representative tumor and B cell enrichments in DII and CLR samples are shown in Supplementary Fig. [143]6J, [144]6K, [145]6L, and 6M. Our analysis validated the crosstalk between macrophages, muscle wasting, and cancer cachexia through an independent spatial omics study, and TrimNN identified CC motifs in a data-driven approach as robust interpretable representations in CNs.

TrimNN identifies CC motifs revealing diverse roles in Alzheimer’s disease using spatial transcriptomics data

Next, we showed TrimNN’s capability to identify diverse spatially distributed CC motifs corresponding to multiple biological and pathological mechanisms in complex diseases. It is known that the interaction between CTX (Cortex) excitatory neurons and Microglia is significantly disrupted by neuroinflammation in Alzheimer’s disease (AD)^[146]32. However, their topological combinations, particularly their relationship with amyloid-β on the cellular level, are still unknown^[147]33. We applied TrimNN to an AD mouse brain study with 8-month-old and 13-month-old samples sequenced by STARmap PLUS spatially resolved transcriptomics^[148]34. There were two replicates for both disease and control conditions at each time point. The transcriptomics data included 2,766 genes and two proteomics channels representing AD markers of amyloid-β and tau pathologies at subcellular resolution. On the derived CCs, size-3 triangle-like CC motifs composed of cell types CTX excitatory neurons and Microglia were identified as significantly different between AD (Fig. [149]4A) and control (Fig. [150]4B).
These significant CC motifs included CTX excitatory neurons-CTX excitatory neurons-CTX excitatory neurons (CCC), CTX excitatory neurons-CTX excitatory neurons-Microglia (CCM), CTX excitatory neurons-Microglia-Microglia (CMM), and Microglia-Microglia-Microglia (MMM) (Benjamini–Hochberg adjusted Fisher’s exact test in 8-month-old replicate 1 with p = 6.12e–32, p = 3.23e–20, p = 4.30e–34, and p = 9.74e–07, respectively). Visualization of an exemplary CC motif “MMM” demonstrated uneven spatial distribution that differed in AD (Fig. [151]4C) and control (Fig. [152]4D). Supplementary Fig. [153]11 shows the spatial occurrence distribution of the other three motifs. The CCs inferred from all samples are shown in Supplementary Fig. [154]12 and Supplementary Data [155]17–[156]28.

Fig. 4. TrimNN analysis in an AD mouse study sequenced by STARmap PLUS.

CCs of the 13-month-old A AD and B control samples (replicate 1), obtained using Delaunay triangulation; black spots are amyloid-β in the AD sample. The spatial locations of the identified motif with all Microglia cells (“MMM” motif, where “M” denotes cell type Microglia) in 13-month-old replicate 1 of C AD and D control mouse samples. “MMM” motifs are marked as purple. Cell–cell communication analysis demonstrates the ligand–receptor differences between motif regions and non-motif regions as river plots, including E cell type CTX excitatory neurons (denoted as “C”) as source (left) and target (right), F cell type Microglia as source (left) and target (right) in regions with and without “CCC” motif. Similarly, G and H are cell types of CTX excitatory neurons and Microglia as source and target in regions with and without the “MMM” motif. I GO enrichment analysis of Biological Processes and J Pathway enrichment analysis on DEGs between regions containing and not containing the “MMM” motif in 13-month-old AD samples.
K Expression of marker genes for cell types “C” and “M” related size-3 and size-4 motifs. L Spatial co-occurrence of different CC motifs with respect to amyloid-β as computed using Squidpy. Microglia-related motifs have an even higher spatial co-occurrence probability compared to the amyloid-β plaque, and CTX excitatory neurons–related motifs have lower spatial co-occurrence probabilities compared to the amyloid-β plaque. CC: cellular community. Source data are provided as a Source Data file.

Here, we defined motif-enriched regions as expanded regions within three hops of CC motifs in the CC. From the perspective of cell–cell communications, unique ligand–receptor signaling pathway patterns between “CCC” (Fig. [159]4E, F) and “MMM” (Fig. [160]4G and H) were identified by motif-enriched and complementary regions on 13-month-old samples using CellChat^[161]35. Specific to cell type Microglia, motifs “CCC” and “MMM” had dominant ligand–receptor pairs GRN^[162]21 and PMCH to distinguish motif regions from the complementary regions. AD-related ligand–receptors, including GRN, VEGF, PDGF, CCL, VIP, NRG, and SEMA3, were significantly enriched in CC motifs associated with CTX excitatory neurons or Microglia (Supplementary Data [163]29–[164]32). All the cell–cell communication results on CC motifs are detailed in Supplementary Figs. [165]13–[166]21 and Supplementary Data [167]33–[168]40. These differences in cell–cell communications were independently validated by DeepTalk^[169]36, incorporating long-range cellular interactions. Particularly, CSF and VEGF pathways exhibited significant differences between “CCC” and “non-CCC” regions in Microglia-to-CTX excitatory neurons cell–cell communication (Mann–Whitney U test, p = 0.033, and p = 0.003, Supplementary Figs. [170]22–[171]25).
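The motif-enriched regions used in this analysis are defined as the cells within three hops of a CC motif on the CC. One way to obtain such a region is a breadth-first search that stops at a hop limit. The sketch below assumes a hypothetical adjacency dictionary and function name and is not taken from the TrimNN code base.

```python
from collections import deque

def motif_enriched_region(adjacency, motif_nodes, max_hops=3):
    """Return all nodes within `max_hops` edges of any motif node,
    using breadth-first search from the motif nodes."""
    hops = {node: 0 for node in motif_nodes}
    queue = deque(motif_nodes)
    while queue:
        u = queue.popleft()
        if hops[u] == max_hops:  # do not expand beyond the hop limit
            continue
        for v in adjacency.get(u, ()):
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return set(hops)

# Hypothetical chain of six cells: 0 - 1 - 2 - 3 - 4 - 5.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
region = motif_enriched_region(chain, [0])
complement = set(chain) - region
print(sorted(region), sorted(complement))
```

The complementary region is then simply the set of remaining cells, which gives the two groups compared in the cell–cell communication and differential expression analyses.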
Considering cell–cell communication among all the cell types, all ligand–receptor pairs identified by CellChat showed significant differences within versus outside the identified motif regions (Mann–Whitney U test, p-value < 0.001, Supplementary Figs. [172]26 and [173]27). Then we analyzed the gene-level characteristics of identified CC motifs. Comparing “MMM” motif-enriched and complementary regions, differentially expressed genes (DEGs) were identified as significant (p-value < 0.05) using the Wilcoxon rank-sum test, including Plekha1, Ctsb, and Sort1 in 8-month-old samples (Supplementary Data [174]44), and App, Plekha1, Clu, Ptk2b, Sort1, Bin1, and Ctsb in 13-month-old samples (Supplementary Data [175]48). On DEGs in 13-month-old samples, Gene Ontology (GO) enrichment analysis showed significant vesicle-mediated transport in synapse (q-value = 1.68e–107), regulation of synapse structure or activity (q-value = 6.45e–106), learning or memory (q-value = 1.20e–56), and cognition (q-value = 5.54e–56) (Fig. [176]4I). Neural systems (q-value = 3.63e–51), transmission across chemical synapses (q-value = 2.05e–33), neurotransmitter receptors and postsynaptic signal transmission (q-value = 8.68e–20), and nervous system development (q-value = 1.33e–15) were enriched with pathway enrichment analysis (Fig. [177]4J). A similar analysis was performed for DEGs of CC motifs “CCC”, “CCM”, “CMM”, and “MMM” (Supplementary Data [178]41–[179]48), including both GO and pathway enrichment analyses (Supplementary Figs. [180]28 and [181]29). To validate their relations with AD, we compared these motif-related DEGs with 77 AD-associated genes identified from large-scale GWAS analysis^[182]37. On CC motif “CCC”, the Trem2 gene was exclusively observed in the replicates of the 13-month-old but not 8-month-old AD mouse model, consistent with its role in late-onset AD^[183]38. 
Similarly, for the motif “MMM”, the Clu gene was highlighted only in the 13-month-old mouse model, aligning with its direct involvement in the formation process of amyloid-β^[184]39 (Supplementary Fig. [185]30). Based on the identified size-3 motifs, we performed pattern growth to identify size-4 motifs using TrimNN (Supplementary Notes [186]1). Among all the size-4 “CCC” expanded motifs, “CCCM” showed the most significant difference between AD and control samples, while “MMMM” was the most significant size-4 motif expanded from “MMM”. Similar to the analysis on size-3 motifs, checking significantly enriched ligand–receptors (Supplementary Data [187]49–[188]51) and DEGs (Supplementary Data [189]52–[190]57), these size-4 motifs were related to AD in cell–cell communication analysis (Supplementary Data [191]58–[192]63), GO enrichment analysis (Supplementary Fig. [193]31), and pathway enrichment analysis (Supplementary Fig. [194]32). Further investigation on DEGs showed diverse groups of CC motifs with expressed markers (Fig. [195]4K). Size-3 “MMM” and size-4 “MMMM” motifs with homogeneous Microglia had divergent expression patterns. For example, Hexb had a higher average expression than the other CC motifs. Hexb is known to induce toxic and progressive neuronal damage, which may relate to neurodegenerative dementia^[196]40. In addition to examining the diversity of CC motifs at the gene level, we investigated whether the identified CC motifs were spatially co-localized with amyloid-β by computing their co-occurrence probabilities using Squidpy^[197]41. The results showed that Microglia-related CC motifs had an even higher co-occurrence probability with amyloid-β than the spatial expectation, distinguishing them from other CC motifs associated with CTX excitatory neurons (Fig. [198]4L). Interestingly, the extent of homogeneity of Microglia regions seemed to correspond to a larger co-occurrence probability of amyloid-β. 
In contrast, the extent of homogeneity of CTX excitatory neurons corresponded to a lower co-occurrence probability. This trend prevailed across the whole spectrum of CC motifs composed of Microglia and CTX excitatory neurons in multiple sizes, from a very high ratio of size-4 “MMMM” to a very low ratio of size-3 “CCC”. Differences in both DEGs and spatial co-occurrence suggest the presence of two distinct types of CC motifs related to amyloid-β in AD. One type of CC motif (i.e., “CCC”, “CCM”, “CMM”, and “CCCM”) tended not to co-localize with amyloid-β. The other type of CC motif (i.e., “MMM” and “MMMM”) was closely co-localized with amyloid-β. These results were consistent with the observation that Microglia, as key mediators in the brain, activate inflammation in the vicinity of amyloid-β deposits, which are directly toxic to the adjacent neurons^[199]42. Activated Microglia release pro-inflammatory cytokines, such as tumor necrosis factor-alpha (TNF-α) and interleukin-1 beta (IL-1β), which can damage excitatory neurons or alter their function^[200]43. In this case study, TrimNN confirmed existing knowledge of AD-related cell types and provided some new insights into spatial biology. As an unbiased data-driven approach, TrimNN independently identified pathologically related spatial characteristics of Microglia and CTX excitatory neurons, along with their topological relations with diverse cell types, as two distinct types of CC motifs that differ at the levels of cell type, gene, and cell–cell communication. Analysis enabled by CC motifs demonstrates an unprecedented spectrum of the spatial relationships between the homogeneity of CTX excitatory neurons/Microglia cell types and the location of amyloid-β. TrimNN accurately captured these spatial co-localization patterns with amyloid-β deposits, providing insights into the onset of AD as the result of interactions between multiple cell types^[201]44, which clustering-based tools may overlook (Supplementary Figs. [202]33 and [203]34). 
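The motif-level significance test reported in this case study (Benjamini–Hochberg adjusted Fisher’s exact test) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: the 2×2 table layout (motif versus non-motif occurrences in AD and control samples), the function names, and all counts are hypothetical and are not taken from the study.

```python
import numpy as np
from scipy.stats import fisher_exact


def motif_fisher_pvalue(motif_case, total_case, motif_ctrl, total_ctrl):
    """Two-sided Fisher's exact test on motif vs. non-motif counts.

    Assumed table layout: rows are conditions (case, control),
    columns are (motif occurrences, all other occurrences).
    """
    table = [
        [motif_case, total_case - motif_case],
        [motif_ctrl, total_ctrl - motif_ctrl],
    ]
    _, p_value = fisher_exact(table, alternative="two-sided")
    return p_value


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


# Hypothetical counts: (motif in AD, total in AD, motif in control, total in control).
counts = {
    "CCC": (30, 100, 10, 100),
    "CCM": (12, 100, 10, 100),
    "MMM": (25, 100, 8, 100),
}
raw = [motif_fisher_pvalue(*c) for c in counts.values()]
adjusted = dict(zip(counts, benjamini_hochberg(raw)))
```

Motifs whose adjusted p-value falls below a chosen threshold would be reported as differing in abundance between conditions; the threshold itself is a further analysis choice.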
TrimNN identifies cell type–specific spatial tendencies in a colorectal carcinoma study on spatial proteomics data Besides the AD study, we also performed TrimNN analysis to explore cell type–specific spatial tendencies in one colorectal carcinoma study. It is known that the tumor microenvironment can significantly influence the interactions between T-cells and epithelial cells through antigen presentation, T-cell activation, and modulation of the tumor microenvironment. However, it is still unknown how the spatial arrangement of these cells is related to effective immune surveillance and the potential for therapeutic interventions^[204]45. The adopted colorectal carcinoma study investigated 40 ROIs in two colorectal cancer patients and 18 ROIs in two healthy controls using spatial proteomics of multiplexed ion beam imaging using time of flight (MIBI-TOF)^[205]46. After a comprehensive analysis of size-3 and their related size-4 CC motifs with TrimNN, we defined two types of CC motifs: Shifted Interaction Motifs and Homeostatic Interaction Motifs (Fig. [206]5A). Shifted Interaction Motifs demonstrated a shift of CC motif abundance from control-enriched (more occurrence in control than disease samples) to disease-enriched (more occurrence in disease than control samples) when expanding from size-3 to size-4. The exemplary size-3 motif “ABC” (Fig. [207]5B, C) suggested disease progression when involving other immune cells to form a size-4 motif “ABCD” (Fig. [208]5D, E), where “A” denotes CD4 T-cells, “B” denotes CD8 T-cells, “C” denotes epithelial, and “D” denotes other immune cells (other CD45+) annotated by the original publication. Proportion tests showed that this size-4 “ABCD” motif significantly differed from the size-3 “ABC” motif in abundance between disease and control samples (p = 2.58e–12). In contrast, Homeostatic Interaction Motif remained consistent in abundance between disease and control groups when expanding its sizes (e.g., size-3 motif “AEC”) (Fig. 
[209]5F, G), where “E” denotes endothelial, concatenated with another epithelial (“C”) to form the size-4 motif “AECC” (Fig. [210]5H, I). Proportion tests showed consistency between disease and control ratios among this pair of size-3 and size-4 motifs (p = 0.55). The expression level of antibodies also confirmed differences between these two groups of CC motifs (Fig. [211]5J, Supplementary Fig. [212]35A and [213]35B). A similar analysis demonstrated that these two groups of CC motifs also existed in AD studies (Supplementary Notes [214]2).

Fig. 5. TrimNN analysis in a colorectal carcinoma study using MIBI-TOF. A Schematic of Shifted Interaction Motif and Homeostatic Interaction Motif as two types of size-4 motifs. Shifted Interaction Motifs: size-3 motif “ABC” (purple) in exemplary B spot 33 (disease) and C spot 56 (control), and the successor size-4 motif “ABCD” (red) in the same D spot 33 (disease) and E spot 56 (control). Homeostatic Interaction Motifs: size-3 motif “AEC” (purple) in exemplary F spot 8 (disease) and G spot 50 (control), and the successor size-4 motif “AECC” (red) in the same H spot 8 (disease) and I spot 50 (control). J Heatmap of antibody expression ratio between disease and control samples in Shifted Interaction Motif and Homeostatic Interaction Motif. K Ranking of effective size between all cell types in colon tissue samples. Abundance of size-2 CC motifs as occurrences in L colorectal carcinoma and M healthy control samples. N P-value of size-2 CC motifs between disease and control by the two-sided Benjamini–Hochberg adjusted Fisher’s exact test. O Heatmap on sender rank from NCEM-type coupling analysis in colon tissue samples. Heatmap of NCEM-type coupling analysis in P colorectal carcinoma and Q healthy control samples. R Difference values from NCEM-type coupling analysis between disease and control samples. 
Cell type “A” denotes CD4 T-cells, “B” denotes CD8 T-cells, “C” denotes epithelial, “D” denotes other immune cells annotated by the original publication, and “E” denotes endothelial. The star symbol marks the paired cell type composition of the “AEC” motif. CC: cellular community. Source data are provided as a Source Data file.

To better explain the biological relevance of these patterns, we analyzed the types of proteins showing increased expression in the expanded motifs. Interestingly, the increased disease/control fold-change observed in “ABCD” compared to “ABC” suggests that the involvement of the fourth node (other immune cells) is associated with heightened immune activity or immune cell engagement in the tumor microenvironment. Many of the upregulated proteins in this motif, such as CD39, CD11C, CD45, CD14, and H3, are related to immune infiltration, antigen presentation, or immune cell function. However, the presence of CD39 (involved in immunosuppressive adenosine signaling) and a concurrent rise in metabolic enzymes (e.g., HK1, G6PD, and VDAC1) and proliferation markers (e.g., Ki67) suggests that this immune activity may be skewed or dysfunctional. Rather than a classical immune-activated phenotype, the tumor microenvironment in “ABCD” may represent an immune-infiltrated but metabolically constrained immunoregulatory niche. In contrast, although the Homeostatic Interaction Motifs “AEC” and “AECC” do not change in abundance across conditions, their protein expression reveals different biological characteristics. Notably, PD1 and GLUT1 are upregulated in the size-4 motif “AECC”, suggesting that immune checkpoint pathways and metabolic competition may play important roles even in spatially stable interaction contexts. PD1 upregulation indicates a checkpoint-mediated immune inhibitory signal, while GLUT1 elevation supports increased glycolytic activity, both of which can further restrict immune function through resource competition. 
We also observed that motif-specific differences in PD1 and CD3 did not change between “ABC” and “ABCD”, but both showed increased expression in “AECC” versus “AEC”. This implies that the involvement of endothelial cells may contribute to a niche promoting immune checkpoint activation or T cell exhaustion. CD36, a lipid metabolism regulator, showed a moderate decrease in the “ABCD” motif, suggesting lipid metabolic shifts associated with certain spatial configurations. Together, these comparisons provide evidence that different motif expansions not only reflect structural reorganization but also correspond to distinct biological states. The Shifted Interaction Motif represents a transition toward a metabolically active, immune-suppressive tumor niche, consistent with tumor progression and immune evasion. The Homeostatic Interaction Motif, while stable in structure, still shows signs of immune checkpoint engagement, potentially limiting effective immune surveillance. Next, we explored cell-type preferences using TrimNN analysis and