Abstract
The spatial organization of cells plays a pivotal role in shaping
tissue functions and phenotypes in various biological systems and
diseased microenvironments. However, the topological principles
governing interactions among cell types within spatial patterns remain
poorly understood. Here, we present the triangulation cellular
community motif neural network (TrimNN), a graph-based deep learning
framework designed to identify conserved spatial cell organization
patterns, termed cellular community (CC) motifs, from spatial
transcriptomics and proteomics data. TrimNN employs a
semi–divide-and-conquer approach to efficiently detect overrepresented
topological motifs of varying sizes in a triangulated space. By
uncovering CC motifs, TrimNN reveals key associations between spatially
distributed cell-type patterns and diverse phenotypes. These insights
provide a foundation for understanding biological and disease
mechanisms and offer potential biomarkers for diagnosis and therapeutic
interventions.
Subject terms: Computational models, Machine learning, Data mining,
Network topology
__________________________________________________________________
Cellular spatial organisation is crucial for shaping tissue functions
and phenotypes. Here, authors present TrimNN, a graph-based deep
learning framework to identify conserved cellular community motifs,
revealing links between spatially distributed cell-type patterns and
diverse phenotypes.
Introduction
Various cells work together within spatial arrangements in tissue to
support organ homeostasis and function^[42]1. Deciphering the
multicellular organization is key to understanding the relationship
between spatial structure and tissue biological and pathological
functions^[43]2. Emerging spatial omics approaches, including spatially
resolved transcriptomics^[44]3 and spatial proteomics^[45]4, enable
investigation of the mechanisms governing the spatial organization of
different cell types in a specific tissue. Within a region of interest
(ROI) in spatial omics, cellular neighborhoods (CNs) define local cell
type enrichment patterns in cellular communities (CCs), and decoding
function-related conserved spatial features in CNs is one of the
primary spatial omics data analysis tasks^[46]4.
Most existing data analysis approaches adopt a top-down strategy to
describe cell organization. This strategy mainly relies on clustering
to identify cell-type compositions as common
patterns. Deep learning approaches, including SPACE-GM^[47]5,
CytoCommunity^[48]6, CellCharter^[49]7, and BANKSY^[50]8, typically
learn low-dimensional embeddings of the nodes in corresponding CNs and
then apply clustering approaches to these embeddings. However,
clustering approaches face the following challenges in dissecting and
interpreting highly heterogeneous, dynamically evolving cell
systems^[51]9. First, clustering results usually become less stable
when samples contain cells undergoing active state transitions, which
are common in disease and developmental processes^[52]10. Second,
clusters identified by these top-down approaches are often described as
percentages of cell-type compositions. Such representations lack a
topological formulation of the geometric cell-type interactions or are
difficult to interpret biologically. Last, these top-down results are
strongly affected by batch effects, where CNs separate primarily by
sample as a technical covariate rather than by biological
features^[53]3. These batch effects make it easy to overfit the models
but difficult to validate them across different datasets^[54]5.
Considering the preceding limitations of top-down strategies, we
instead use a bottom-up strategy to identify CC motifs as recurring
significant interconnections between cells. In the spatial
omics–derived CC, we hypothesize that CC motifs can be represented as
topological building blocks of multicellular organization consistent
across different samples and associated with key biological processes
and functions. CC motifs are biologically interpretable spatial
patterns of the combined cell types, which provide topological
information beyond clusters and explicitly link to the biological and
pathological mechanisms through distinct cell–cell communications,
highly expressed genes and pathways^[55]11. This concept is related to
the functional tissue units (FTUs)^[56]12, but CC motifs are even
smaller in the scale of cell locations and cell types, which provides
more details for understanding and modeling the healthy physiological
function of the organ and function-related changes during disease
states. Currently, size 1–3 motif analysis^[57]4 makes up most of the
spatial omics studies, where size-1 motifs are single nodes that can be
treated as cell-type compositions, size-2 motifs are double nodes
linked by edges, and size-3 motifs are triple nodes within triangles.
Nevertheless, biologists have found that sizable CC motifs with more
nodes than triangles substantially correlate with patient survival and
phenotypical features in colorectal cancer (CRC)^[58]13, kidney
diseases^[59]14, maternal–fetal interface^[60]15, and many other
biological contexts.
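The size-1 to size-3 definitions above can be illustrated with a
minimal sketch. The toy graph, its cell-type labels, and the helper
names below are hypothetical and are not taken from TrimNN; the code
simply counts single nodes, typed edges, and typed triangles by brute
force.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy cell graph: node id -> cell type, plus undirected edges.
cell_type = {0: "A", 1: "A", 2: "B", 3: "B", 4: "C"}
edges = {(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)}


def has_edge(u, v):
    """Return True if u and v are linked, regardless of edge direction."""
    return (u, v) in edges or (v, u) in edges


def motif_key(nodes):
    """Label a motif by the sorted cell types of its nodes, e.g. 'A & A & B'."""
    return " & ".join(sorted(cell_type[n] for n in nodes))


# Size-1 motifs: single nodes, i.e. the cell-type composition.
size1 = Counter(motif_key([n]) for n in cell_type)
# Size-2 motifs: pairs of nodes linked by an edge.
size2 = Counter(motif_key(e) for e in edges)
# Size-3 motifs: triples of nodes that are pairwise linked (triangles).
size3 = Counter(
    motif_key(t)
    for t in combinations(sorted(cell_type), 3)
    if all(has_edge(u, v) for u, v in combinations(t, 2))
)

print(size1, size2, size3)
```

Brute-force counting of this kind is only practical for very small
motifs, which is part of the motivation for the learned approach
described in this paper.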
In practice, identifying the most overrepresented CC motifs composing
multicellular organization is still computationally expensive with (i)
subgraph matching^[61]16, which counts the occurrence of a given motif
on the query graph, and (ii) pattern growth^[62]17, which finds the
motifs with the most significant occurrence. Subgraph matching is known
to be NP-complete^[63]16, and the number of node-type combinations
alone grows super-exponentially. Existing approaches include
permutation^[64]11, edge sampling (e.g., mfinder^[65]18), node sampling
(e.g., FANMOD^[66]19), and global pruning (e.g., Ullmann^[67]20 and
VF2^[68]21). A computationally feasible approach to analytically
identify conserved, interpretable, and generalizable spatial rules of
cellular organization of different sizes across different spatial omics
samples is still lacking.
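To give a rough sense of the search space, the sketch below counts only
the possible cell-type compositions of a size-K motif, that is,
multisets of K types drawn from T cell types, while ignoring topology.
The function name is hypothetical, and T = 29 is used only because 29
annotated cell types appear in the CRC dataset analyzed later; this is
a back-of-the-envelope count, not TrimNN's enumeration procedure.

```python
from math import comb


def type_compositions(num_types, motif_size):
    """Number of multisets of `motif_size` cell types from `num_types` types."""
    return comb(num_types + motif_size - 1, motif_size)


# Illustrative assumption: 29 cell types, motif sizes 1 to 6.
for k in range(1, 7):
    print(k, type_compositions(29, k))
```

For motifs with more than three nodes, several topologies can share
the same composition, so the number of candidate motifs is larger than
this count.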
Here, we propose the triangulation cellular community motif neural
network (TrimNN), a graph-based deep learning approach to analyze
spatial transcriptomics and proteomics data using a bottom-up strategy
(Supplementary Fig. [69]1). Within the input spatial omics samples, the
CC is defined as a graph in which cells are nodes, node types represent
cell types, and edges encode physical proximity, inferred as
unidirectional spatial cell–cell relations from Delaunay
triangulation^[70]22 of the node coordinates in the ROI. TrimNN
estimates overrepresented size-K CC motifs in the CC of spatial omics
using graph isomorphism networks^[71]23 (GIN) empowered by positional
encoding^[72]24 (PE). In
various spatial transcriptomics and spatial proteomics case studies,
TrimNN identifies computationally significant and biologically
meaningful CC motifs to differentiate patient survival in CRC studies
and represents pathologically related cell type organization in
neurodegenerative diseases and colorectal carcinoma studies. Notably,
the identified sizable CC motifs demonstrate their potential as
interpretable topological prognostic biomarkers linking the topological
structural organization of cell types at microscopic levels to
phenotypes at macroscopic levels, which cannot be inferred by other
existing tools. The source code of TrimNN is publicly available at
[73]https://github.com/yuyang-0825/TrimNN.
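The graph construction described above, with cells as nodes and
Delaunay edges as spatial proximity, can be sketched as follows. This
is a brute-force, standard-library illustration that assumes points in
general position (no four cocircular points); the coordinates and
function names are hypothetical, and a production pipeline would
normally rely on an optimized computational geometry library instead.

```python
from itertools import combinations


def orient(a, b, c):
    """Twice the signed area of triangle abc (> 0 if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of ccw triangle abc."""
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    det = (
        (ax * ax + ay * ay) * (bx * cy - cx * by)
        - (bx * bx + by * by) * (ax * cy - cx * ay)
        + (cx * cx + cy * cy) * (ax * by - bx * ay)
    )
    return det > 0


def delaunay_edges(points):
    """Edges of all triangles whose circumcircle contains no other point."""
    edges = set()
    for i, j, k in combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        o = orient(a, b, c)
        if o == 0:
            continue  # skip collinear triples
        if o < 0:
            b, c = c, b  # enforce counter-clockwise order
        others = (m for m in range(len(points)) if m not in (i, j, k))
        if not any(in_circumcircle(a, b, c, points[m]) for m in others):
            edges.update({(i, j), (i, k), (j, k)})
    return edges


# Hypothetical cell centroids; each index would also carry a cell-type label.
cells = [(0, 0), (4, 0), (5, 3), (0, 3)]
print(sorted(delaunay_edges(cells)))
```

The check of every triple against every other point scales as O(n^4),
so this sketch is meant only to make the empty-circumcircle criterion
concrete.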
Results
TrimNN quantifies multicellular organization with sizable CC motifs
A schematic diagram of the proposed TrimNN and its analytic workflow is
shown in Fig. [74]1A. These identified CC motifs are biologically
interpretable through a set of downstream analyses, including motif
visualization, cellular-level interpretation within cell–cell
communication analysis, gene-level interpretation within differentially
expressed gene and pathway analysis, and phenotypical analysis within
the availability of phenotypical information (Fig. [75]1B). TrimNN is
constructed on an empowered GIN to estimate the occurrence of the query
on the target graph. On the CC as a triangulated graph built from
spatial omics, TrimNN builds a supervised graph learning model by
simplifying the graph constraints and incorporating the inductive bias
within triangles derived from Delaunay triangulation. TrimNN decomposes
the regression task of counting occurrences of the query graph into
many tractable binary classification tasks, each modeled by the
sub-TrimNN module. Inspired by the idea of NSIC^[76]25, this method is
trained on representative pairs of the predefined query subgraphs and
the target triangulated cell graphs as a binary classification task.
This graph
representation framework builds upon GIN and adopts a shortest
distance–based PE^[77]24, modeling the symmetric space to increase the
expressive power. Additionally, TrimNN adopts a semi–divide-and-conquer
strategy to estimate the abundance of the query by summing the results
of the single classification tasks performed by a sub-TrimNN module on
each node’s enclosed graph. Given the size of the query subgraph, our
framework uses an enumeration approach to estimate the most
overrepresented CC motifs over the possible cell types and topologies.
Then, we incrementally search for CC motifs of larger sizes. The
details of the architecture of TrimNN are shown in Supplementary
Fig. [78]2.
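The semi–divide-and-conquer idea described above, one binary decision
per node's enclosed graph followed by a sum, can be sketched as below.
Several parts are assumptions made for illustration: the enclosed graph
is taken to be a hop-limited neighborhood, the query is restricted to
triangle motifs, and an exact check stands in for the trained
sub-TrimNN classifier. All names and the toy graph are hypothetical.

```python
from collections import deque

# Hypothetical triangulated cell graph: node -> cell type, node -> neighbors.
cell_type = {0: "A", 1: "A", 2: "B", 3: "B", 4: "A"}
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}


def enclosed_nodes(center, hops):
    """Nodes within `hops` steps of `center` (assumed 'enclosed graph')."""
    seen = {center}
    frontier = deque([(center, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == hops:
            continue
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, dist + 1))
    return seen


def classify(center, nodes, query_types):
    """Stand-in for the learned binary classifier: 1 if a triangle through
    `center` inside `nodes` matches the queried cell types, else 0."""
    target = sorted(query_types)
    nbrs = [n for n in adj[center] if n in nodes]
    for idx, u in enumerate(nbrs):
        for v in nbrs[idx + 1:]:
            types = sorted(cell_type[x] for x in (center, u, v))
            if v in adj[u] and types == target:
                return 1
    return 0


def estimate_abundance(query_types, hops=1):
    """Sum the per-node binary decisions into an abundance score."""
    return sum(
        classify(n, enclosed_nodes(n, hops), query_types) for n in cell_type
    )


print(estimate_abundance(("A", "A", "B")))
```

In this sketch the score equals the number of nodes that take part in
at least one matching triangle, which is a proxy for abundance rather
than an exact occurrence count.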
Fig. 1. TrimNN analysis workflow.
A Spatially resolved transcriptomics (e.g., STARmap PLUS and 10X
Xenium) and spatial proteomics data (e.g., MIBI-TOF and CODEX) are used
as input to generate corresponding CCs with spatial coordinates and
Delaunay triangulation. TrimNN is trained on representative pairs of
query motifs and target triangulated graphs at scale. Given a specific
query, TrimNN identifies its occurrence in the target CC in the
subgraph matching process by decomposing this regression task into many
binary classification problems, where each classification predicts
whether the query exists in the target graph as the enclosed graph of
each node. Enumerating possible motifs at size-K, TrimNN identifies the
most overrepresented motifs. Then, the pattern growth process adopts a
heuristic search for their successor size-(K+1) motifs. Here, we take
size-3 CC motifs as an example. After subgraph
matching and pattern growth, TrimNN estimates overrepresented CC
motifs. Created in BioRender. Yu, Y. (2025) [81]BioRender.com/mfm4ta4.
B These CC motifs can be biologically interpreted in the downstream
analysis, including visualization, cellular-level interpretation within
cell–cell communication analysis, gene-level interpretation within
differentially expressed gene analysis (e.g., GO enrichment analysis
and pathway enrichment analysis), and phenotypical analysis within the
availability of phenotypical information (e.g., survival curve and
phenotypic classification analysis). CC: cellular community.
We hypothesize that CC motifs, as countable recurring spatial patterns
of various cell types, are robust to noise when representing and
quantifying multicellular organization. We performed simulations to
mimic different types of noise, including missing cells from imperfect
cell capture by sequencing technology (Fig. [82]2A), cell coordinate
shifts from technological errors (Fig. [83]2B), and cell type
misclassification from annotation errors in data analysis
(Fig. [84]2C). We observed that these types of noise did not influence
the relative ranking of CC motif abundance, which remained consistent
in most scenarios (Fig. [85]2A and Supplementary Fig. [86]3). Even in
extreme cases with noise ratios of 0.4 and 0.5, the Spearman
correlation between abundance rankings before and after adding noise
remained relatively stable under cell dropout and cell coordinate
perturbations. At high noise levels, the correlation values deteriorate
for large motif sizes under cell type misclassification, a situation
that is unlikely to occur in practical scenarios.
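The robustness measure used here, the Spearman correlation between
motif abundance rankings before and after noise, can be sketched with
the standard library alone. The lattice graph, the random cell types,
the 10% dropout rate, and all function names are illustrative
assumptions; the lattice is a stand-in for a triangulated cell graph,
and only size-2 motifs are counted.

```python
import random
from collections import Counter
from itertools import combinations_with_replacement


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def size2_counts(types, edges, keys):
    """Count typed edges whose two endpoints are both still present."""
    counts = Counter(
        tuple(sorted((types[u], types[v])))
        for u, v in edges
        if u in types and v in types
    )
    return [counts[k] for k in keys]


rng = random.Random(0)
side = 12
types = {(r, c): rng.choice("ABC") for r in range(side) for c in range(side)}
edges = [((r, c), (r, c + 1)) for r in range(side) for c in range(side - 1)]
edges += [((r, c), (r + 1, c)) for r in range(side - 1) for c in range(side)]
keys = list(combinations_with_replacement("ABC", 2))

before = size2_counts(types, edges, keys)
kept = {n: t for n, t in types.items() if rng.random() >= 0.1}  # cell dropout
after = size2_counts(kept, edges, keys)
rho = spearman(before, after)
print(rho)
```

Coordinate shifts and cell-type misclassification could be simulated in
the same way by perturbing the node positions or labels before
recounting.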
Fig. 2. The performance of TrimNN on spatial omics.
A Simulations of missing cell effects on CC motifs, represented as the
Spearman correlation between abundance rankings of all the possible
motifs before and after simulated noises at cell proportions of 0.01,
0.05, 0.1, 0.2, 0.3, 0.4, and 0.5 within CC motifs in size-1, size-2,
size-3, size-4, and size-5 (n = 100). B Simulations of cell coordinate
shifting effects on CC motifs, represented as the Spearman correlation
between abundance rankings of all the possible motifs before and after
simulated noises with different levels of noises of 0.01, 0.05, 0.1,
0.2, 0.3, 0.4, and 0.5 at cell proportions of 0.01, 0.05, 0.1, 0.2,
0.3, 0.4, and 0.5 within CC motifs in size-1, size-2, size-3, size-4,
and size-5 (n = 100). C Simulations of cell-type misclassification
effects on CC motifs, represented as the Spearman correlation between
abundance rankings of all the possible motifs before and after
simulated noises at cell proportions of 0.01, 0.05, 0.1, 0.2, 0.3, 0.4,
and 0.5 within CC motifs in size-1, size-2, size-3, size-4, and size-5
(n = 100). D Benchmarking the performance of TrimNN, TrimNN-RGIN, and
NSIC on independent simulated data for subgraph matching (n = 3000).
The X-axis represents different sizes of CC motifs, and the Y-axis
indicates the MCC (Matthews Correlation Coefficient) values. E
Performance comparison of TrimNN, TrimNN-RGIN, and NSIC in identifying
occurrences of CC motifs in diverse simulated datasets. The Y-axis is
the RMSE (Root Mean Square Error) value. F Scalability of TrimNN. The
X-axis represents the size of the triangulated graph, and the Y-axis
indicates the runtime on a workstation equipped with an Intel Xeon Gold
6338 CPU and 80 GB of RAM. G Ablation tests on the performance gain
from adding positional encoding to the TrimNN model (n = 3000). The X-axis
represents different sizes of CC motifs, and the Y-axis indicates the
MCC values. CC: cellular community. On each box, the central mark
indicates the median, and the bottom and top edges of the box indicate
the 25th and 75th percentiles. The whiskers extend to the most extreme
data points without outliers, and the outliers are plotted individually
as circles. Source data are provided as a Source Data file.
TrimNN accurately identifies overrepresented CC motifs in Cellular
Neighborhoods
On a modified subgraph matching task, formulated as a binary
classification of motif existence in a triangulated graph, TrimNN
outperformed the competing methods in all scenarios on most criteria on
synthetic spatial omics data. The competing methods included VF2, the
original regression-based neural network method NSIC^[89]25, and
TrimNN-RGIN, which uses the proposed formulation with NSIC’s RGIN
network architecture. Especially on large-size CC motifs, TrimNN
demonstrated significant performance improvements over TrimNN-RGIN,
highlighting its architecture’s capacity
(Fig. [90]2D and Supplementary Data [91]1).
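For reference, the MCC used to score the binary motif-existence task
can be computed from the confusion matrix as in the sketch below. The
labels and predictions are made-up values, and the function is a plain
restatement of the standard formula rather than the evaluation code of
this study.

```python
from math import sqrt


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical ground truth (motif present = 1) and predictions.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
print(matthews_corrcoef(y_true, y_pred))
```

MCC ranges from -1 to 1 and stays informative when the positive and
negative classes are imbalanced, which makes it a reasonable choice
for motif-existence labels.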
TrimNN accurately identified the top overrepresented CC motifs. On a
pattern growth challenge to determine the ranking of CC motif
abundance, TrimNN outperformed competitive methods consistently in
different sizes and cell types in synthetic spatial omics data. Both
TrimNN and TrimNN-RGIN outperformed NSIC by a large margin in most
scenarios and criteria, which highlights the capability of the proposed
problem formulation. Notably, TrimNN demonstrated an average
improvement over NSIC by approximately 20 to 60 times in root mean
square error (RMSE) (Fig. [92]2E and Supplementary Data [93]2). Besides
the criteria in absolute occurrence value, the relative value of the
ranking index also supported TrimNN’s capacity in Supplementary
Fig. [94]4A and Supplementary Data [95]3.
TrimNN is highly scalable in identifying large-size CC motifs. Because
scalability plays a vital role in the study, we compared the
computational time on target-triangulated graphs with varying node
sizes. We observed that TrimNN, TrimNN-RGIN, and NSIC exhibit linear
scalability with increasing node sizes, while TrimNN consistently
consumed less computational time (Fig. [96]2F). In particular, TrimNN,
with its simpler network architecture, was more efficient than
TrimNN-RGIN under the same problem setting. In contrast, the runtime of
the classical enumeration-based VF2 method grew exponentially, making
it impractical in most scenarios. In practical usage, on typical
spatial omics data with thousands of cells of dozens of cell types,
TrimNN robustly infers large-size CC motifs accurately in seconds,
which is unattainable through conventional methods.
Together with GIN, PE increases the expressive power of TrimNN. In
challenging tasks with larger-sized motifs, ablation tests showed that
integrating PE improved GNN (Graph Neural Network) performance compared
with TrimNN-RGIN, which has no PE but includes a complex GRU module
(Fig. [97]2G and Supplementary Data [98]4). In addition, the
effectiveness of GIN as the critical component in TrimNN was confirmed
by replacing it with other graph neural network models, including Graph
Convolutional Networks and Graph Transformer^[99]26, while keeping
other components and parameters constant
(Supplementary Fig. [100]4B and Supplementary Data [101]5). This result
aligned with theoretical analyses that GIN is a powerful 1-order graph
neural network^[102]23. Meanwhile, it was shown that TrimNN requires
sufficient training data to learn the complex relationships
(Supplementary Fig. [103]4C and Supplementary Data [104]6).
TrimNN identifies representative CC motifs that accurately differentiate the
severity of colorectal cancer patients
In addition to the above simulation studies, we showed that the CC
motifs inferred by TrimNN are intrinsic representations to
differentiate phenotypes of the CC. In a proteomics study comprising 17
low-risk (Crohn’s-like lymphoid reaction, CLR) and 18 high-risk
(diffuse inflammatory infiltration, DII) patients, using Co-Detection
by Indexing (CODEX)^[105]13 on CRC, we performed a CC motif analysis
using TrimNN on 140 tissue regions and identified the most abundant CC
motifs in size-1 to size-4. Traditional machine learning approaches,
such as logistic regression (LR), were adopted using relative ranking
indices to quantify motif occurrence as features. Because the original
publication annotated 29 cell types, we chose 29 as the fixed number of
features in supervised learning to classify CLR and DII. Within tenfold
cross-validation following the same protocol as CytoCommunity^[106]6,
the ROC-AUC results of LR were 0.77, 0.76, 0.79, and 0.76
(Fig. [107]3A) for size-1 to size-4 CC motifs, respectively. This LR
model with 29 CC motif features outperformed CytoCommunity, a
computationally intensive GNN that uses 512-dimensional embeddings as
features by default (ROC-AUC: 0.71). Notably, when the number of
features increased to the top 100, the LR model on size-3 motifs
achieved an ROC-AUC of 0.81. To investigate the robustness of the model
against potential overfitting, additional experiments were performed
using 10 repetitions of 10-fold cross-validation on the LR model with
the top 5, 10, 15, and 20 size-3 CC motif features, as well as on
CytoCommunity using a reduced 29-dimensional embedding (Supplementary
Fig. [108]5 and Supplementary Data [109]7). The performance of the LR
model with these small numbers of CC motif features was relatively
stable, suggesting that overfitting has limited influence on the LR
model. In addition,
other classical machine learning models, such as Random Forest and
Support Vector Machine, were applied to the same classification tasks
with the same settings. These models performed similarly to LR, further
supporting the representational power of CC motifs (Supplementary
Data [110]8-[111]10). The evaluation of the comprehensive performance
comparison to CytoCommunity and SPACE-GM is provided in Supplementary
Data [112]11.
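As a small illustration of the evaluation described above, the sketch
below scales a motif-count feature to the range 0 to 1 and computes
ROC-AUC from classifier scores using the pairwise-ranking definition.
The counts, labels, and scores are made-up values and the function
names are hypothetical; the LR training and cross-validation steps of
the study are not reproduced here.

```python
def min_max_scale(values):
    """Scale values linearly to [0, 1]; constant inputs map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative,
    counting ties as one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical motif counts for one feature across three tissue regions.
print(min_max_scale([2, 4, 6]))

# Hypothetical labels (1 = DII, 0 = CLR) and predicted probabilities.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))
```

The pairwise form is quadratic in the number of samples, which is
acceptable for a cohort of this size but would be replaced by a
rank-based computation on larger datasets.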
Fig. 3. TrimNN analysis in a colorectal cancer study using CODEX.
A ROC curves of the LR model classifying CLR and DII patients using
the top CC motifs of size-1 to size-4 as features, and of the competing
method CytoCommunity using learned embedding dimensions. The LR model
uses motif counts from TrimNN, scaled between 0 and 1, as features. B Visualization
of all the samples using the top two principal components from the 29
top size-2 CC motifs. Blue spots denote the CLR patient group and red
spots denote the DII patient group. C Generalizability of the trained
model testing on random cropping of ROI in the samples (n = 100). The
X-axis is the ratio of width and height of the original ROI, and the
Y-axis is ROC-AUC. On each box, the central mark indicates the median,
and the bottom and top edges of the box indicate the 25th and 75th
percentiles. The whiskers extend to the most extreme data points
without outliers, and the outliers are plotted individually as circles.
Survival curves of DII patients with and without enriched motifs,
including D size-2 “A & B”. Here, cell type CD68+CD163+ macrophages are
denoted as “A” and smooth muscles are denoted as “B”. E size-3 “A & A &
B” and F size-4 “A & A & B & B”. The visualization of spatial
localization of size-2 CC motif “A & B” on the CC in G patient 3 (DII)
on spot 5A and H patient 8 (CLR) on spot 16A. The visualization of the
spatial locations of the size-3 motif “A & A & B” in I DII spot and J
CLR spot (same spots as G and H). The visualization of spatial
localization of the size-4 motif “A & A & B & B” in K DII spot and L
CLR spot (same spots as G and H). All motifs are marked as blue, nodes
of cell type “A” are red, and nodes of cell type “B” are orange. The
plot of LR coefficients ranked by Cox PH p-value of the top 29 CC
motifs in M size-2, N size-3, and O size-4. The extent of the blue
color represents the Cox PH p-value. The p-values were derived from the
two-sided Wald test. * marks the highlighted motif. LR: logistic
regression, CC: cellular community. Source data are provided as a
Source Data file.
Besides supervised learning, these CC motifs seemed to capture some
intrinsic characteristics in CNs, where CLR and DII demonstrated good
visual separation using the top two principal components inferred from
the top 29 size-2 motifs (Fig. [115]3B). Further unsupervised
hierarchical clustering showed the different distributions of the top
29 motif abundances among the CLR and DII groups in Supplementary
Fig. [116]6A and [117]6B (size-2), Supplementary Fig. [118]6C and
[119]6D (size-3), and Supplementary Fig. [120]6E and [121]6F (size-4).
CC motifs used in simpler models with fewer features showed better
generalizability across multiple samples in machine learning. When only
parts of the CC were available after randomly cropping the samples, CC
motif–based LR methods were more robust in generalizability than
competing methods (Fig. [122]3C). The same trends were observed in
distorted samples with simulated noises in cell missing, cell
coordinate shifting, and cell type misclassification (Supplementary
Fig. [123]7).
In addition, the enrichment of sizable CC motifs can be used to
differentiate patient survival. We identified several size-2, size-3,
and size-4 CC motifs that significantly differentiate survival (Cox PH
p < 0.05) between enriched and non-enriched DII patients, while cell
type composition (size-1 motifs) may not necessarily succeed
(Supplementary Data [124]12). With cell type “CD68+CD163+ macrophages”
(denoted by “A”) and cell type “smooth muscle” (denoted by “B”), the
survival curves showed that size-2 motif “A & B” may not have
adequately separated survival in the DII patient group (Cox PH
p = 0.63, shown in Fig. [125]3D), but including more adjacent nodes
with the same cell types, patients with enrichment of size-3 (“A & A &
B”) and size-4 (“A & A & B & B”) CC motifs showed significantly lower
survival rates (Cox PH p = 0.016, shown in Fig. [126]3E, and Cox PH
p = 0.0093, shown in Fig. [127]3F, respectively). To validate the
results, we performed additional survival analyses in both COAD and
READ cohorts of The Cancer Genome Atlas (TCGA) associated with
CD68+CD163+ macrophage marker genes: CD68, CD163, CD14, and
ITGAM^[128]27, and smooth muscle marker genes: ACTA2, MYH11, and
MYL9^[129]28. We observed no significant differences in survival
between patients associated with either cell type (Cox PH p > 0.05,
Supplementary Figs. [130]8 and [131]9). This independent analysis
showed consistent results, indicating that cell type compositions, such
as size-1 CC motifs, have limited effectiveness in differentiating
patient survival in CRC. In addition, the occurrence numbers of these
CC motifs among DII and CLR patients were 14,415 and 7004 (ratio 2.06)
for size-2 “A & B”; 4176 and 1548 (ratio 2.70) for size-3 “A & A & B”;
and 6946 and 2276 (ratio 3.05) for size-4 “A & A & B & B”, respectively.
All were inferred as significant through the
Benjamini-Hochberg adjusted Fisher’s exact test. The different
distribution of these CC motifs on CCs among DII and CLR spots can be
visualized in Fig. [132]3G and H (size-2), Fig. [133]3I and J (size-3),
and Fig. [134]3K, L (size-4).
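The DII-to-CLR occurrence ratios reported above can be reproduced
directly from the counts, and the Benjamini-Hochberg adjustment can be
sketched in a few lines. The counts below are those stated in the
text; the raw p-values passed to the adjustment are made-up
placeholders, since the unadjusted Fisher's exact test results are not
given here, and the function name is hypothetical.

```python
# Occurrence counts (DII, CLR) as reported in the text.
counts = {
    "A & B": (14415, 7004),
    "A & A & B": (4176, 1548),
    "A & A & B & B": (6946, 2276),
}
ratios = {motif: round(dii / clr, 2) for motif, (dii, clr) in counts.items()}
print(ratios)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Placeholder raw p-values, for illustration only.
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

The ratio grows with motif size for this pair of cell types, which is
consistent with the observation that larger motifs separate the two
patient groups more clearly.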
Notably, spatial topology plays a crucial role in linking phenotypes
and survival. There were two types of size-3 motifs with cell types
CD68+CD163+ macrophages (“A”) and smooth muscle (“B”). Compared with “A
& A & B”, the alternative motif “A & B & B” occurred 4602 and 2255
times among DII and CLR patients, respectively, with a lower ratio of
2.04, and it did not differentiate survival well (Cox PH p = 0.2975).
Apparently, these topological differences in the spatial localization
of cells of different cell types played different biological and
pathological roles, which conventional top-down approaches based on
cell type composition failed to distinguish (Supplementary
Fig. [135]10 and Supplementary Data [136]13–[137]16).
Furthermore, an LR model provides intrinsic interpretability when
differentiating phenotypes. The coefficients of each feature from the
LR model demonstrated the importance of CC motifs quantitatively,
making the model interpretable (Fig. [138]3M–O). Notably, all
macrophage-related, muscle-related, and significant Cox PH p-value
motifs in different sizes tended to have high absolute coefficient
values. The same interpretable results can also be cross-validated by
Shapley value^[139]29 in Supplementary Fig. [140]6G–I, showing that
these macrophage-related and muscle-related CC motifs were essential to
differentiating DII patients from CLR patients. Biologically, there is
evidence that macrophages help pancreatic cancer induce muscle wasting
by promoting TWEAK (TNF-like weak inducer of apoptosis)
secretion from the tumor^[141]30. After carefully checking these top
motifs in different sizes, we also identified biologically meaningful
tumor cells and B cells, which were known to be related to the severity
of CRC^[142]31. Representative tumor and B cell enrichments in DII and
CLR samples are shown in Supplementary Fig. [143]6J, [144]6K, [145]6L,
and 6M. Our analysis validated the crosstalk between macrophages,
muscle wasting, and cancer cachexia through an independent spatial
omics study, and TrimNN identified CC motifs in a data-driven approach
as robust interpretable representations in CNs.
TrimNN identifies CC motifs revealing diverse roles in Alzheimer’s disease
using spatial transcriptomics data
Next, we showed TrimNN’s capability to identify diverse spatially
distributed CC motifs corresponding to multiple biological and
pathological mechanisms in complex diseases. It is known that the
interaction between CTX (Cortex) excitatory neurons and Microglia is
significantly disrupted by neuroinflammation in Alzheimer’s disease
(AD)^[146]32. However, their topological combinations, particularly
their relationship with amyloid-β on the cellular level, are still
unknown^[147]33. We applied TrimNN to AD mouse brain samples from
8-month-old and 13-month-old mice sequenced by STARmap PLUS
spatially resolved transcriptomics^[148]34. There were two replicates
for both disease and control conditions at each time point. The
transcriptomics data included 2,766 genes and two proteomics channels
representing AD markers of amyloid-β and tau pathologies at subcellular
resolution.
On the derived CCs, size-3 triangle-like CC motifs composed of cell
types CTX excitatory neurons and Microglia were identified as
significantly different between AD (Fig. [149]4A) and control
(Fig. [150]4B). These significant
CC motifs included CTX excitatory neurons-CTX excitatory neurons-CTX
excitatory neurons (CCC), CTX excitatory neurons-CTX excitatory
neurons-Microglia (CCM), CTX excitatory neurons-Microglia-Microglia
(CMM), and Microglia-Microglia-Microglia (MMM) (Benjamini–Hochberg
adjusted Fisher’s exact test in 8-month-old replicate 1 with
p = 6.12e–32, p = 3.23e–20, p = 4.30e–34, and p = 9.74e–07,
respectively). Visualization of an exemplary CC motif “MMM”
demonstrated uneven spatial distribution that differed in AD
(Fig. [151]4C) and control (Fig. [152]4D). Supplementary Fig. [153]11
shows the spatial occurrence distribution of the other three motifs.
The CCs inferred from all samples are shown in Supplementary
Fig. [154]12 and Supplementary Data [155]17–[156]28.
Fig. 4. TrimNN analysis in an AD mouse study sequenced by STARmap PLUS.
CCs of the 13-month-old A AD and B control samples (replicate 1) are
obtained using Delaunay triangulation, where black spots are amyloid-β
in the AD sample. The spatial locations of the identified motif with
all Microglia cells (“MMM” motif, where “M” denotes cell type
Microglia) in 13-month-old replicate 1 of C AD and D control mouse
samples. “MMM” motifs are marked as purple. Cell–cell communication
analysis demonstrates the ligand–receptor differences between motif
regions and non-motif regions as river plots, including E cell type CTX
excitatory neurons (denoted as “C”) as source (left) and target
(right), F cell type Microglia as source (left) and target (right) in
regions with and without “CCC” motif. Similarly, G and H are cell types
of CTX excitatory neurons and Microglia as source and target in regions
with and without the “MMM” motif. I GO enrichment analysis of
Biological Processes and J Pathway enrichment analysis on DEGs between
regions containing and not containing the “MMM” motif in 13-month-old
AD samples. K Expression of marker genes for cell types “C” and “M”
related size-3 and size-4 motifs. L Spatial co-occurrence of different
CC motifs with respect to amyloid-β as computed using Squidpy.
Microglia-related motifs have an even higher spatial co-occurrence
probability compared to the amyloid-β plaque, and CTX excitatory
neurons–related motifs have lower spatial co-occurrence probabilities
compared to the amyloid-β plaque. CC: cellular community. Source data
are provided as a Source Data file.
Here, we defined motif-enriched regions as expanded regions within
three hops of CC motifs in the CC. From the perspective of cell–cell
communications, unique ligand–receptor signaling pathway patterns
for “CCC” (Fig. [159]4E, F) and “MMM” (Fig. [160]4G and H) were
identified by comparing motif-enriched and complementary regions in
13-month-old samples using CellChat^[161]35. Specific to cell type
Microglia, motifs
“CCC” and “MMM” had dominant ligand–receptor pairs GRN^[162]21 and PMCH
to distinguish motif regions from the complementary regions. AD-related
ligand–receptors, including GRN, VEGF, PDGF, CCL, VIP, NRG, and SEMA3,
were significantly enriched in CC motifs associated with CTX excitatory
neurons or Microglia (Supplementary Data [163]29–[164]32). All the
cell–cell communication results on CC motifs are detailed in
Supplementary Figs. [165]13–[166]21 and Supplementary
Data [167]33–[168]40. These differences in cell–cell communications
were independently validated by DeepTalk^[169]36, incorporating
long-range cellular interactions. Particularly, CSF and VEGF pathways
exhibited significant differences between “CCC” and “non-CCC” regions
in Microglia-to-CTX excitatory neurons cell–cell communication
(Mann–Whitney U test, p = 0.033, and p = 0.003, Supplementary
Figs. [170]22–[171]25). Considering cell–cell communication among all
cell types, all ligand–receptor pairs identified by CellChat showed
significant differences between regions within and outside the
identified motifs (Mann–Whitney U test, p-value < 0.001, Supplementary
Figs. [172]26 and [173]27).
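As an illustration of the region definition used above, the following
minimal sketch expands a set of motif cells to everything within three
hops on a cell graph and takes the remainder as the complementary
region. It is not the TrimNN implementation: the function name
`k_hop_region`, the adjacency-dictionary representation, and the toy
graph are assumptions made for this example only.

```python
from collections import deque


def k_hop_region(adjacency, seed_cells, k=3):
    """Return all cells within k hops of any seed cell (seeds included).

    `adjacency` maps each cell id to an iterable of neighboring cell ids,
    e.g. edges of a triangulated cell graph (an assumed representation).
    """
    hops = {cell: 0 for cell in seed_cells}  # shortest hop count found so far
    queue = deque(hops)
    while queue:
        cell = queue.popleft()
        if hops[cell] == k:
            continue  # do not expand beyond k hops
        for neighbor in adjacency.get(cell, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[cell] + 1
                queue.append(neighbor)
    return set(hops)


# Hypothetical toy graph: a chain of six cells 0-1-2-3-4-5.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
motif_cells = [0]  # cells that belong to a detected motif instance

enriched = k_hop_region(adjacency, motif_cells, k=3)
complementary = set(adjacency) - enriched
print(sorted(enriched), sorted(complementary))
```

A multi-source breadth-first search is used so that each cell is
assigned its shortest hop distance to the nearest motif cell before the
three-hop cutoff is applied.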
Then we analyzed the gene-level characteristics of identified CC
motifs. Comparing “MMM” motif-enriched and complementary regions,
differentially expressed genes (DEGs) were identified as significant
(p-value < 0.05) using the Wilcoxon rank-sum test, including Plekha1,
Ctsb, and Sort1 in 8-month-old samples (Supplementary Data [174]44),
and App, Plekha1, Clu, Ptk2b, Sort1, Bin1, and Ctsb in 13-month-old
samples (Supplementary Data [175]48). On DEGs in 13-month-old samples,
Gene Ontology (GO) enrichment analysis showed significant
vesicle-mediated transport in synapse (q-value = 1.68e–107), regulation
of synapse structure or activity (q-value = 6.45e–106), learning or
memory (q-value = 1.20e–56), and cognition (q-value = 5.54e–56)
(Fig. [176]4I). Neural systems (q-value = 3.63e–51), transmission
across chemical synapses (q-value = 2.05e–33), neurotransmitter
receptors and postsynaptic signal transmission (q-value = 8.68e–20),
and nervous system development (q-value = 1.33e–15) were enriched with
pathway enrichment analysis (Fig. [177]4J). For all the detailed
results of CC motifs “CCC”, “CCM”, “CMM”, and “MMM”, a similar analysis
was performed for DEGs (Supplementary Data [178]41–[179]48), including
both GO and pathway enrichment analyses (Supplementary Figs. [180]28
and [181]29).
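For readers unfamiliar with the test used in the DEG comparison, the
sketch below implements a two-sided Wilcoxon rank-sum (Mann–Whitney U)
test with a normal approximation and tie correction, using only the
Python standard library. It is an illustrative assumption of how one
gene could be compared between motif-enriched and complementary
regions; the function name `rank_sum_test` and the expression values
are hypothetical, no continuity correction is applied, and statistical
packages may report slightly different p-values for small samples.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (U statistic of sample x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1  # pooled[i:j] is one group of tied values
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        t = j - i
        tie_term += t**3 - t
        i = j

    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value


# Hypothetical normalized expression of one gene in the two region types.
motif_region = [2.1, 2.4, 3.0, 2.8, 3.3, 2.9]
complementary_region = [1.2, 1.9, 1.5, 2.0, 1.1, 1.7]
u_stat, p_val = rank_sum_test(motif_region, complementary_region)
print(u_stat, p_val)
```

In a full analysis this test would be repeated per gene, with the
resulting p-values adjusted for multiple testing.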
To validate their relations with AD, we compared these motif-related
DEGs with 77 AD-associated genes identified from large-scale GWAS
analysis^[182]37. On CC motif “CCC”, the Trem2 gene was exclusively
observed in the replicates of the 13-month-old but not 8-month-old AD
mouse model, consistent with its role in late-onset AD^[183]38.
Similarly, for the motif “MMM”, the Clu gene was highlighted only in
the 13-month-old mouse model, aligning with its direct involvement in
the formation process of amyloid-β^[184]39 (Supplementary
Fig. [185]30).
Based on the identified size-3 motifs, we performed pattern growth to
identify size-4 motifs using TrimNN (Supplementary Notes [186]1). Among
all the size-4 “CCC” expanded motifs, “CCCM” showed the most
significant difference between AD and control samples, while “MMMM” was
the most significant size-4 motif expanded from “MMM”. Similar to the
analysis on size-3 motifs, checking significantly enriched
ligand–receptors (Supplementary Data [187]49–[188]51) and DEGs
(Supplementary Data [189]52–[190]57), these size-4 motifs were related
to AD in cell–cell communication analysis (Supplementary
Data [191]58–[192]63), GO enrichment analysis (Supplementary
Fig. [193]31), and pathway enrichment analysis (Supplementary
Fig. [194]32).
Further investigation on DEGs showed diverse groups of CC motifs with
expressed markers (Fig. [195]4K). Size-3 “MMM” and size-4 “MMMM” motifs
with homogeneous Microglia had divergent expression patterns. For
example, Hexb had a higher average expression than the other CC motifs.
Hexb is known to induce toxic and progressive neuronal damage, which
may relate to neurodegenerative dementia^[196]40.
In addition to examining the diversity of CC motifs at the gene level,
we investigated whether the identified CC motifs were spatially
co-localized with amyloid-β by computing their co-occurrence
probabilities using Squidpy^[197]41. The results showed that
Microglia-related CC motifs had an even higher co-occurrence
probability with amyloid-β than the spatial expectation, distinguishing
them from other CC motifs associated with CTX excitatory neurons
(Fig. [198]4L). Interestingly, greater homogeneity of Microglia
regions seemed to correspond to a higher co-occurrence probability
with amyloid-β. In contrast, greater homogeneity of CTX excitatory
neurons corresponded to a lower co-occurrence probability. This trend
prevailed across the whole spectrum of CC motifs composed of Microglia
and CTX excitatory neurons in multiple sizes, from a very high ratio of
size-4 “MMMM” to a very low ratio of size-3 “CCC”.
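To make the co-occurrence comparison concrete, the sketch below
computes a simplified co-occurrence ratio at a single radius: the
probability of observing a target label near a conditioning label,
divided by the overall probability of the target label. This only
approximates the idea behind the Squidpy score, which is evaluated over
a series of distance intervals; the function name
`co_occurrence_ratio`, the labels, and the coordinates are hypothetical.

```python
import math


def co_occurrence_ratio(coords, labels, target, cond, radius):
    """Simplified co-occurrence: P(target | within radius of cond) / P(target).

    A value above 1 means `target` is found near `cond` more often than
    expected from its overall frequency; below 1 means less often.
    """
    n = len(coords)
    p_target = sum(1 for lab in labels if lab == target) / n

    near_total = 0   # neighbors of any `cond` point within the radius
    near_target = 0  # those neighbors that carry the `target` label
    for i in range(n):
        if labels[i] != cond:
            continue
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= radius:
                near_total += 1
                if labels[j] == target:
                    near_target += 1

    if near_total == 0 or p_target == 0:
        return float("nan")  # ratio undefined without neighbors or targets
    return (near_target / near_total) / p_target


# Hypothetical layout: one plaque, two nearby "M" motifs, three distant "C" motifs.
coords = [(0, 0), (1, 0), (0, 1), (10, 0), (10, 1), (11, 0)]
labels = ["Abeta", "M", "M", "C", "C", "C"]

ratio_m = co_occurrence_ratio(coords, labels, target="M", cond="Abeta", radius=2)
ratio_c = co_occurrence_ratio(coords, labels, target="C", cond="Abeta", radius=2)
print(ratio_m, ratio_c)
```

In this toy layout the "M" label scores above 1 and the "C" label
scores 0, mirroring the direction of the contrast described above
between Microglia-related and CTX excitatory neuron–related motifs.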
Differences in both DEGs and spatial co-occurrence suggest the presence
of two distinct types of CC motifs related to amyloid-β in AD. One type
of CC motif (i.e., “CCC”, “CCM”, “CMM”, and “CCCM”) was reluctant to
co-localize with amyloid-β. Another kind of CC motif (i.e., “MMM” and
“MMMM”) was closely co-localized with amyloid-β. These results were
consistent with the observation that Microglia, as key mediators in the
brain, activate inflammation in the vicinity of amyloid-β deposits,
which are directly toxic to the adjacent neurons^[199]42. Activated
Microglia release pro-inflammatory cytokines, such as tumor necrosis
factor-alpha (TNF-α) and interleukin-1 beta (IL-1β), which can damage
excitatory neurons or alter their function^[200]43.
In this case study, TrimNN confirmed existing knowledge of AD-related
cell types and provided new insights into spatial biology. As an
unbiased data-driven approach, TrimNN independently identified
pathologically related spatial characteristics of Microglia and CTX
excitatory neurons, along with their topological relations with diverse
cell types, as two distinct groups of CC motifs that differ at the
levels of cell type, gene, and cell–cell communication. Analysis
enabled by CC motifs
demonstrates an unprecedented spectrum of the spatial relationships
between the homogeneity of CTX excitatory neurons/Microglia cell types
and the location of amyloid-β. TrimNN accurately captured these spatial
co-localization patterns with amyloid-β deposits, providing insights
into the onset of AD as the result of interactions between multiple
cell types^[201]44, which clustering-based tools may overlook
(Supplementary Figs. [202]33 and [203]34).
TrimNN identifies cell type–specific spatial tendencies in a colorectal
carcinoma study on spatial proteomics data
Besides the AD study, we also performed TrimNN analysis to explore cell
type–specific spatial tendencies in one colorectal carcinoma study. It
is known that the tumor microenvironment can significantly influence
the interactions between T-cells and epithelial cells through antigen
presentation, T-cell activation, and modulation of the tumor
microenvironment. However, it is still unknown how the spatial
arrangement of these cells is related to effective immune surveillance
and the potential for therapeutic interventions^[204]45. The adopted
colorectal carcinoma study investigated 40 ROIs in two colorectal
cancer patients and 18 ROIs in two healthy controls using spatial
proteomics data generated by multiplexed ion beam imaging by time of
flight (MIBI-TOF)^[205]46.
After a comprehensive analysis of size-3 and their related size-4 CC
motifs with TrimNN, we defined two types of CC motifs: Shifted
Interaction Motifs and Homeostatic Interaction Motifs (Fig. [206]5A).
Shifted Interaction Motifs demonstrated a shift of CC motif abundance
from control-enriched (more occurrence in control than disease samples)
to disease-enriched (more occurrence in disease than control samples)
when expanding from size-3 to size-4. The exemplary size-3 motif “ABC”
(Fig. [207]5B, C) suggested disease progression when involving other
immune cells to form a size-4 motif “ABCD” (Fig. [208]5D, E), where “A”
denotes CD4 T-cells, “B” denotes CD8 T-cells, “C” denotes epithelial,
and “D” denotes other immune cells (other CD45+) annotated by the
original publication. Proportion tests showed that this size-4 “ABCD”
motif significantly differed from the size-3 “ABC” motif in abundance
between disease and control samples (p = 2.58e–12). In contrast,
Homeostatic Interaction Motifs remained consistent in abundance between
disease and control groups when expanding in size. For example, the
size-3 motif “AEC” (Fig. [209]5F, G), where “E” denotes endothelial
cells, was extended with another epithelial cell (“C”) to form the
size-4 motif “AECC” (Fig. [210]5H, I).
Proportion tests showed consistency between disease and control ratios
among this pair of size-3 and size-4 motifs (p = 0.55). The expression
level of antibodies also confirmed differences between these two groups
of CC motifs (Fig. [211]5J, Supplementary Fig. [212]35A and [213]35B).
A similar analysis demonstrated that these two groups of CC motifs also
existed in AD studies (Supplementary Notes [214]2).
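The proportion tests above compare how a motif's occurrences split
between disease and control samples before and after expansion. The
text does not specify the exact form of the test, so the sketch below
assumes a standard pooled two-proportion z-test; the function name
`two_proportion_z_test` and all counts are hypothetical and are not
taken from the study.

```python
import math


def two_proportion_z_test(success1, total1, success2, total2):
    """Two-sided pooled two-proportion z-test.

    Returns (z statistic, two-sided p-value) for H0: both proportions are equal.
    """
    p1 = success1 / total1
    p2 = success2 / total2
    pooled = (success1 + success2) / (total1 + total2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total1 + 1 / total2))
    if se == 0:
        return 0.0, 1.0  # every observation falls in the same category
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: occurrences in disease samples out of all occurrences.
size3_disease, size3_total = 40, 100  # size-3 motif, control-enriched
size4_disease, size4_total = 70, 100  # size-4 motif, disease-enriched

z_shift, p_shift = two_proportion_z_test(
    size3_disease, size3_total, size4_disease, size4_total
)
print(z_shift, p_shift)
```

Under this reading, a small p-value corresponds to a Shifted
Interaction Motif (the disease fraction changes on expansion), while a
large p-value corresponds to a Homeostatic Interaction Motif.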
Fig. 5. TrimNN analysis in a colorectal carcinoma study using MIBI-TOF.
A Schematic of Shifted Interaction Motif and Homeostatic Interaction
Motif as two types of size-4 motifs. Shifted Interaction Motifs: size-3
motif “ABC” (purple) in exemplary B spot 33 (disease) and C spot 56
(control), the successor size-4 motif “ABCD” (red) in the same D spot
33 (disease) and E spot 56 (control). Homeostatic Interaction Motifs:
exemplary size-3 motif “AEC” (purple) in exemplary F spot 8 (disease)
and G spot 50 (control), the successor size-4 motif “AECC” (red) in the
same H spot 8 (disease) and I spot 50 (control). J Heatmap of antibody
expression ratio between disease and control samples in Shifted
Interaction Motif and Homeostatic Interaction Motif. K Ranking of
effective size between all cell types in colon tissue samples.
Abundance of size-2 CC motifs as occurrences in L colorectal carcinoma
and M healthy control samples. N P-value of size-2 CC motifs between
disease and control by the two-sided Benjamini-Hochberg adjusted
Fisher’s exact test. O Heatmap on sender rank from NCEM-type coupling
analysis in colon tissue samples. Heatmap of NCEM-type coupling
analysis in P colorectal carcinoma and Q healthy control samples. R
Difference values from NCEM-type coupling analysis between disease and
control samples. Cell type “A” denotes CD4 T-cells, “B” denotes CD8
T-cells, “C” denotes epithelial, “D” denotes other immune cells
annotated by the original publication, and “E” denotes endothelial. The
star symbol marks the paired cell type composition of the “AEC” motif.
CC: cellular community. Source data are provided as a Source Data file.
To better explain the biological relevance of these patterns, we
analyzed the types of proteins showing increased expression in the
expanded motifs. Interestingly, the increased disease/control
fold-change observed in “ABCD” compared to “ABC” suggests that the
involvement of the fourth node (other immune cells) is associated with
heightened immune activity or immune cell engagement in the tumor
microenvironment. Many of the upregulated proteins in this motif, such
as CD39, CD11C, CD45, CD14, and H3, are related to immune infiltration,
antigen presentation, or immune cell function. However, the presence of
CD39 (involved in immunosuppressive adenosine signaling) and a
concurrent rise in metabolic enzymes (e.g., HK1, G6PD, and VDAC1) and
proliferation markers (e.g., Ki67) suggests that this immune activity
may be skewed or dysfunctional. Rather than a classical
immune-activated phenotype, the tumor microenvironment in “ABCD” may
represent an immune-infiltrated but metabolically constrained
immunoregulatory niche.
In contrast, although the Homeostatic Interaction Motifs “AEC” and
“AECC” do not change in abundance across conditions, their protein
expression reveals different biological characteristics. Notably, PD1
and GLUT1 are upregulated in the size-4 motif “AECC”, suggesting that
immune checkpoint pathways and metabolic competition may play important
roles even in spatially stable interaction contexts. PD1 upregulation
indicates a checkpoint-mediated immune inhibitory signal, while GLUT1
elevation supports increased glycolytic activity, both of which can
further restrict immune function through resource competition. We also
observed that motif-specific differences in PD1 and CD3 did not change
between “ABC” and “ABCD”, but both showed increased expression in
“AECC” versus “AEC”. This implies that the involvement of endothelial
cells may contribute to a niche promoting immune checkpoint activation
or T cell exhaustion. CD36, a lipid metabolism regulator, showed a
moderate decrease in the “ABCD” motif, suggesting lipid metabolic
shifts associated with certain spatial configurations.
Together, these comparisons provide evidence that different motif
expansions not only reflect structural reorganization but also
correspond to distinct biological states. The Shifted Interaction Motif
represents a transition toward a metabolically active,
immune-suppressive tumor niche, consistent with tumor progression and
immune evasion. The Homeostatic Interaction Motif, while stable in
structure, still shows signs of immune checkpoint engagement,
potentially limiting effective immune surveillance.
Next, we explored cell-type preferences using TrimNN analysis and