Abstract

Background

   Ulcerative colitis (UC) is a chronic inflammatory bowel disease
   characterized by persistent inflammation of the colon. The specific
   cause of UC is still not fully understood, but this condition is
   believed to arise from a combination of environmental, genetic,
   microbial, and immune factors. This study aimed to explore the specific
   roles of macrophages and fibroblasts in UC pathogenesis, focusing on
   their interactions and contributions to disease progression.

Methods

   We utilized single-cell RNA sequencing (scRNA-seq) to analyze
   macrophages and fibroblasts in peripheral blood and colon biopsy
   samples from UC patients. Bulk RNA sequencing and spatial
   transcriptomic data from the Gene Expression Omnibus (GEO) database,
   together with flow cytometry and multiplex immunohistochemistry (mIHC)
   data, were used for validation. Statistical analyses were performed to
   assess the correlation between cell abundance and disease severity.

Results

   Macrophages and fibroblasts were identified as key communication hubs
   in UC; specifically, SPP1+ macrophages and CHI3L1+ fibroblasts were
   significantly enriched at the sites of inflammation. These cells are
   strongly correlated with disease severity and orchestrate inflammatory
   responses within the intestinal immune microenvironment, contributing
   to UC-associated colorectal cancer.

Conclusions

   Our study identified SPP1+ macrophages and CHI3L1+ fibroblasts as key
   contributors to UC pathogenesis. These cells are enriched in
   inflammatory sites, are correlated with disease severity, and play a
   role in UC-associated colorectal cancer, providing new insights into UC
   mechanisms.

Supplementary Information

   The online version contains supplementary material available at
   10.1186/s12967-025-06565-5.

   Keywords: Ulcerative colitis, SPP1+ macrophages, CHI3L1+ fibroblasts,
   Regulatory networks, scRNA-seq

Introduction

   Ulcerative colitis (UC) is a type of inflammatory bowel disease (IBD)
   that is more common than Crohn’s disease [1]. The incidence of
   UC is increasing annually in Asia, with a fourfold increase in
   prevalence expected by 2035 [2]. The most severe complication of
   UC is colorectal cancer (CRC). Indeed, studies have shown that
   individuals with UC have a greater risk of developing and dying from
   CRC than do those without UC [3–5]. Risk factors for
   UC-associated tumors include a long disease duration, extensive colonic
   involvement, and primary sclerosing cholangitis. Therefore, exploring
   the pathogenesis of UC, controlling the progression of this disease,
   shortening the course of UC, and reducing the incidence of UC-CRC are
   important research topics.

   The exact pathogenic mechanism of UC is still under investigation. UC
   is hypothesized to be caused by a combination of environmental factors,
   host genetic factors, intestinal microbial infections and immune
   factors [6, 7]. In other autoimmune diseases, such as psoriasis
   and rheumatoid arthritis, abnormalities in immune system regulation are
   strongly correlated with the disease [8]. Because the intestine harbors
   a microbial community, the immune response to the intestinal flora is
   strictly regulated and determines whether immune tolerance or a
   defensive inflammatory response occurs. Disturbances in
   the balance of intestinal immune reactions can lead to UC [9, 10].

   Single-cell RNA sequencing (scRNA-seq) is used to accurately determine
   the transcriptional characteristics of individual cells. Multiple
   studies have described the diverse cell types in the intestinal mucosa
   of UC patients, highlighting the imbalance in cellular populations in
   the context of intestinal inflammation [11, 12]. These findings
   provide valuable insights into the mechanisms underlying UC
   pathogenesis and will aid in the identification of potential
   therapeutic targets.

   Macrophages are innate immune cells that play important roles in both
   the inflammatory response and tumor immunity. Various inflammatory
   cytokines secreted by macrophages play crucial roles in UC [12].
   Tumor-associated macrophages, specifically SPP1+ macrophages, have
   been found in multiple types of cancer. These cells are highly enriched
   in tumor tissues and are closely associated with prognosis
   [13–15].

   In IBD, disruption of the intestinal mucosal barrier is a key cause of
   disrupted immune homeostasis. In chronic infections, inflammation, and
   cancer, the tissue microenvironment regulates the behavior of local
   immune cells. Among the cells in the tissue microenvironment,
   fibroblasts are key modulators of immune responses, either activating
   or suppressing them [16]. Intestinal fibroblasts can undergo
   phenotypic polarization in response to microbial stimuli, shifting
   toward a proinflammatory state and acting as a central feedback hub to
   further facilitate immune cell recruitment [17]. In addition,
   tumor-associated fibroblasts are enriched in CRC and serve as key
   components of the tumor microenvironment that promote cancer cell
   invasion and reshape immune cell infiltration pathways [18].

   Both macrophages and fibroblasts have been reported as key nodes in
   intercellular communication within the immune microenvironment of IBD,
   with each cell type playing a critical role in disease pathogenesis
   [19–21]. Furthermore, interactions between macrophages and
   fibroblasts have been well documented in various cancers and
   inflammatory diseases, where they are closely linked to disease
   progression [14, 22–24]. However, despite these findings in other
   conditions, research on the interactions between macrophages and
   fibroblasts in UC remains limited. Therefore, this
   study focused on the interactions between macrophages and fibroblasts
   in UC patients and, for the first time, identified tumor-associated
   SPP1+ macrophages in UC patients. We observed enrichment of
   SPP1+ inflammation-associated macrophages and
   CHI3L1+ inflammation-associated fibroblasts at inflammatory sites in
   UC patients and found that the infiltration level of these cells is
   strongly correlated with the severity of the disease in these patients.
   Immunofluorescence staining confirmed this interaction. Overall, this
   work reveals interactions between CHI3L1+ fibroblasts and
   SPP1+ macrophages, which may provide new insights for the diagnosis
   and treatment of UC.

Methods

Clinical sample collection from patients

   Normal mucosal and inflammatory tissue samples as well as fresh
   peripheral blood samples were collected from UC patients (n = 7). Fresh
   tissue samples were kept on ice in RPMI 1640 medium supplemented with
   10% FBS and prepared for transport. Specific clinical information is
   provided in Supplementary Table 1.

Primary analysis of the raw read data (scRNA-seq)

   The raw reads were processed to generate gene expression profiles via
   CeleScope v1.12 (Singleron Biotechnologies) with default parameters.
   Briefly, barcodes and unique molecular identifiers (UMIs) were
   extracted from R1 reads and corrected. Adapter sequences and poly A
   tails were trimmed from R2 reads, and the trimmed R2 reads were aligned
   against the GRCh38 (hg38) transcriptome via STAR (v2.6.1b). Uniquely
   mapped reads were subsequently assigned to exons with featureCounts
   (v2.0.1). Successfully assigned reads with the same cell barcode, UMI
   and gene were grouped together to generate a gene expression matrix for
   further analysis.

Quality control, dimension reduction and clustering (Scanpy)

   Scanpy v1.8.2 was used for quality control, dimensionality reduction
   and clustering in Python 3.7. For each sample dataset, we filtered the
   expression matrix according to the following criteria: (1) cells with a
   gene count less than 200 or with a gene count in the top 2% were
   excluded; (2) cells with a UMI count in the top 2% were excluded; (3)
   cells with a
   mitochondrial content > 30% were excluded; and (4) genes expressed in
   fewer than 5 cells were excluded. After filtering, 123,092 cells were
   retained for downstream analyses, with an average of 896 genes and 2362
   UMIs per cell. The raw count matrix was normalized by total counts per
   cell and logarithmically transformed into a normalized data matrix. The
   top 2000 variable genes were selected by setting flavor = ‘seurat’.
   Principal component analysis (PCA) was performed on the scaled variable
   gene matrix, and the top 23 principal components were used for
   clustering and dimensionality reduction. The cells were separated into
   23 clusters via the Louvain algorithm, and the resolution parameter was
   set to 1.2. The cell clusters were visualized via uniform manifold
   approximation and projection (UMAP) [25].
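
   The filtering criteria and the normalization step described above can
   be sketched in plain Python. This is a minimal illustration on a toy
   count matrix, not the Scanpy pipeline used in the study; the function
   names, the dictionary layout of the matrix, the "MT-" prefix for
   mitochondrial genes, the choice to count gene prevalence after cell
   filtering, and the target_sum value are all assumptions made for the
   example.

```python
import math

def top_cutoff(values, top_percent):
    """Smallest value inside the top `top_percent` % of `values` (None if that slice is empty)."""
    n_drop = len(values) * top_percent // 100  # integer arithmetic avoids float rounding
    if n_drop == 0:
        return None
    return sorted(values, reverse=True)[n_drop - 1]

def qc_filter(counts, min_genes=200, top_percent=2, max_mito_pct=30.0, min_cells=5):
    """Filter a toy matrix given as {cell: {gene: UMI count}}."""
    n_genes = {c: sum(1 for v in g.values() if v > 0) for c, g in counts.items()}
    n_umis = {c: sum(g.values()) for c, g in counts.items()}
    gene_cut = top_cutoff(list(n_genes.values()), top_percent)
    umi_cut = top_cutoff(list(n_umis.values()), top_percent)

    kept = {}
    for cell, genes in counts.items():
        if n_umis[cell] == 0:
            continue  # empty barcode, nothing to evaluate
        mito = sum(v for g, v in genes.items() if g.startswith("MT-"))
        mito_pct = 100.0 * mito / n_umis[cell]
        if n_genes[cell] < min_genes:                            # criterion (1), lower bound
            continue
        if gene_cut is not None and n_genes[cell] >= gene_cut:   # criterion (1), top 2%
            continue
        if umi_cut is not None and n_umis[cell] >= umi_cut:      # criterion (2)
            continue
        if mito_pct > max_mito_pct:                              # criterion (3)
            continue
        kept[cell] = genes

    # Criterion (4). Assumption: prevalence is counted over the retained cells.
    cells_per_gene = {}
    for genes in kept.values():
        for gene, value in genes.items():
            if value > 0:
                cells_per_gene[gene] = cells_per_gene.get(gene, 0) + 1
    return {
        cell: {g: v for g, v in genes.items() if cells_per_gene.get(g, 0) >= min_cells}
        for cell, genes in kept.items()
    }

def normalize_log1p(cell_counts, target_sum=10_000):
    """Scale one cell to `target_sum` total counts, then apply log(1 + x)."""
    total = sum(cell_counts.values())
    return {g: math.log1p(v / total * target_sum) for g, v in cell_counts.items()}
```

   With only a handful of cells the top 2% slice is empty, so only the
   lower gene-count bound, the mitochondrial cutoff and the gene
   prevalence filter take effect in a small toy example.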

Batch effect removal

   The batch effect between samples was removed with Harmony v1.0 using
   the top 20 principal components from the PCA [26].

Differentially expressed gene (DEG) analysis (Scanpy)

   To identify DEGs, we used the scanpy.tl.rank_genes_groups() function
   based on the Wilcoxon rank-sum test with default parameters; genes
   expressed in more than 10% of the cells in either of the compared
   groups of cells and with an average log(fold change) value greater than
   1 were identified as DEGs. The adjusted P value was calculated via
   Benjamini‒Hochberg correction, and a value of 0.05 was used as the
   criterion to evaluate statistical significance.
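
   The selection rule above combines three thresholds with a
   Benjamini‒Hochberg adjustment, which can be illustrated as follows.
   This is a sketch under stated assumptions rather than the Scanpy
   implementation: the record fields (pval, logfc, pct_a, pct_b) and the
   function names are hypothetical, and the Wilcoxon P values are taken
   as given inputs.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_degs(stats, min_pct=0.10, min_logfc=1.0, alpha=0.05):
    """Keep genes passing the expression-fraction, fold-change and adjusted-P cutoffs.

    `stats` is a list of dicts with hypothetical keys:
    gene, pval, logfc, pct_a, pct_b (fraction of expressing cells per group).
    """
    padj = benjamini_hochberg([s["pval"] for s in stats])
    return [
        s["gene"]
        for s, q in zip(stats, padj)
        if max(s["pct_a"], s["pct_b"]) > min_pct  # expressed in >10% of either group
        and s["logfc"] > min_logfc                # average log fold change > 1
        and q < alpha                             # adjusted P value < 0.05
    ]
```

   Adjusting before filtering, as done here, corrects over all tested
   genes; whether the original pipeline filtered before or after the
   adjustment is not stated in the text.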

Pathway enrichment analysis

   To investigate the potential functions of DEGs between inflammatory or
   ulcerated (IFM) tissue and normal or noninflammatory (non-IFM) tissues,
   we performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and
   Genomes (KEGG) analyses via the “clusterProfiler” R package v4.0.
   Pathways with p_adj values less than 0.05 were considered significantly
   enriched. Data for selected significantly enriched pathways were
   plotted in bar plots. Gene set enrichment analysis (GSEA) was performed
   on the DEGs. For gene set variation analysis (GSVA), average gene
   expression levels for each cell type were used as input data [27].
   GO gene sets in the molecular function (MF), biological process (BP),
   and cellular component (CC) categories were used.

Cell type annotation

Cell type recognition with Cell-ID

   Cell-ID is a multivariate approach that extracts gene signatures for
   each individual cell and performs cell identity recognition using
   hypergeometric tests (HGTs). Dimensionality reduction was performed on
   a normalized gene expression matrix through multiple correspondence
   analysis, in which both cells and genes were projected in the same
   low-dimensional space. Then, a gene ranking was calculated for each
   cell to obtain the most featured gene sets of that cell. HGTs were
   performed on these gene sets against the intestinal reference from the
   SynEcoSys database, which contains all cell type-specific genes. The
   identity of each cell was determined as the cell type with the minimal
   HGT P value. For cluster annotation, the frequency of each cell type
   was calculated for each cluster, and the cell type with the highest
   frequency was used to determine cluster identity.
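
   The per-cell hypergeometric test and the majority vote per cluster
   can be sketched as below. This is not the Cell-ID implementation: the
   multiple correspondence analysis that produces each cell's gene
   signature is omitted, the function names are hypothetical, and the
   marker lists in the usage note are illustrative rather than taken from
   SynEcoSys.

```python
from collections import Counter
from math import comb

def hypergeom_sf(k, n_background, n_markers, n_signature):
    """P(X >= k) when drawing n_signature genes from n_background, n_markers of which are markers."""
    upper = min(n_markers, n_signature)
    favorable = sum(
        comb(n_markers, i) * comb(n_background - n_markers, n_signature - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(n_background, n_signature)

def assign_cell_type(signature, reference, n_background):
    """Return the reference cell type with the minimal hypergeometric P value."""
    sig = set(signature)
    p_values = {}
    for cell_type, markers in reference.items():
        marker_set = set(markers)
        overlap = len(sig & marker_set)
        p_values[cell_type] = hypergeom_sf(overlap, n_background, len(marker_set), len(sig))
    best = min(p_values, key=p_values.get)
    return best, p_values

def annotate_cluster(cell_labels):
    """Cluster identity = most frequent per-cell label in that cluster."""
    return Counter(cell_labels).most_common(1)[0][0]
```

   For example, a cell whose signature is CD68, C1QA and ACTB would be
   assigned to a macrophage reference list containing CD68, CD14 and C1QA
   rather than to a fibroblast list containing COL1A1, COL1A2 and DCN,
   because only the former overlaps the signature.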

   The identity of each cell cluster was determined according to the
   expression of canonical markers from the reference database SynEcoSys™
   (Singleron Biotechnologies). SynEcoSys™ contains collections of
   canonical cell type marker data for single-cell sequencing data from
   CellMarkerDB, PanglaoDB and recently published literature.

Filtering of cell doublets

   The number of cell doublets was estimated on the basis of the
   expression patterns of canonical cell markers. Any clusters enriched
   with multiple cell type-specific markers were excluded from the
   downstream analysis.

Cell‒cell interaction (CCI) analysis via CellChat

   CellChat (version 1.6.1) was used to analyze intercellular
   communication networks according to the scRNA-seq data. A CellChat
   object was created using the corresponding R package, and cell
   annotations were added to the meta slot of the object. The
   ligand‒receptor interaction database was used to infer interactions
   between ligands and their matching receptors [28].

CCI analysis: CellPhoneDB

   CCIs between fibroblasts and macrophages were predicted on the basis of
   known ligand–receptor pairs via CellPhoneDB (v2.1.7) [29]. The
   permutation number for calculating the null distribution of average
   ligand‒receptor pair expression in randomized cell identities was set
   to 1000. Individual ligand or receptor expression levels were
   thresholded according to a cutoff on the basis of the average log gene
   expression distribution for all genes across each cell type. Predicted
   interaction pairs with a P value < 0.05 and an average log
   expression > 0.1 were considered significant and visualized with
   heatmap and dot plots in CellPhoneDB.
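
   The permutation test behind these P values can be illustrated with a
   short sketch. It is a simplified stand-in, not CellPhoneDB code: it
   scores a single ligand‒receptor pair for one sender and one receiver
   cell type, omits the per-gene expression thresholds mentioned above,
   and uses hypothetical function and argument names.

```python
import random
from statistics import mean

def lr_mean(ligand, receptor, labels, sender, receiver):
    """Average of mean ligand expression in `sender` and mean receptor expression in `receiver`."""
    lig = mean(x for x, lab in zip(ligand, labels) if lab == sender)
    rec = mean(x for x, lab in zip(receptor, labels) if lab == receiver)
    return (lig + rec) / 2

def lr_permutation_pvalue(ligand, receptor, labels, sender, receiver, n_perm=1000, seed=0):
    """Empirical P value: share of label shuffles whose pair mean reaches the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = lr_mean(ligand, receptor, labels, sender, receiver)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomize cell identities, keep expression fixed
        if lr_mean(ligand, receptor, shuffled, sender, receiver) >= observed:
            hits += 1
    return observed, hits / n_perm

def is_significant(mean_expression, p_value, alpha=0.05, min_mean=0.1):
    """Apply the reporting rule from the text: P < 0.05 and average log expression > 0.1."""
    return p_value < alpha and mean_expression > min_mean
```

   Shuffling labels keeps the size of every cell type constant, so each
   permuted mean is computed over the same number of cells as the
   observed one.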

Pseudotime trajectory analysis with Monocle2

   The cell differentiation trajectory of monocyte subtypes was
   reconstructed with Monocle2 v2.22.0. For construction of the
   trajectory, the top 2000 highly variable genes were selected with the
   Seurat (v4.1.2) FindVariableFeatures() function, and dimension
   reduction was performed with DDRTree(). The trajectory was visualized
   with the plot_cell_trajectory() function in Monocle2 [30].

RNA velocity

   For RNA velocity, BAM files containing fibroblasts and macrophages and
   the reference genome GRCh38 (hg38) were used for analysis with velocyto
   (v0.17.17) [31] and scVelo (v0.2.4) in Python with default
   parameters. The results were projected onto the UMAP plot from the
   Seurat clustering analysis for visual consistency.

UCell gene set scoring

   Gene set scoring was performed using the R package UCell v2.2.0
   [32]. UCell scores were determined on the basis of the Mann‒Whitney
   U statistic by ranking query genes in order of their expression levels
   in individual cells. Because UCell is a rank-based scoring method, it
   is suitable for use in large datasets containing multiple samples and
   batches.
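
   A rank-based score of this kind can be written in a few lines. The
   sketch below follows the published description of the UCell score
   (Mann‒Whitney U computed from capped within-cell ranks and rescaled to
   the unit interval) as we understand it; it is not the package's code.
   The average-rank handling of ties and the decision to ignore signature
   genes absent from the cell are assumptions, and the function name is
   hypothetical.

```python
def ucell_score(expression, gene_set, max_rank=1500):
    """Rank-based signature score for one cell.

    expression: {gene: value} for a single cell.
    Genes are ranked from highest (rank 1) to lowest expression; ranks above
    `max_rank` are capped at max_rank + 1 before the U statistic is computed.
    """
    genes = sorted(expression, key=lambda g: -expression[g])
    ranks = {}
    i = 0
    while i < len(genes):
        j = i
        # Extend j over a run of tied expression values.
        while j + 1 < len(genes) and expression[genes[j + 1]] == expression[genes[i]]:
            j += 1
        average_rank = (i + 1 + j + 1) / 2
        for gene in genes[i:j + 1]:
            ranks[gene] = average_rank
        i = j + 1

    present = [g for g in gene_set if g in ranks]  # assumption: missing genes are ignored
    n = len(present)
    if n == 0:
        return 0.0
    capped = [min(ranks[g], max_rank + 1) for g in present]
    u_stat = sum(capped) - n * (n + 1) / 2
    return 1 - u_stat / (n * max_rank)
```

   Because only the within-cell ordering of genes enters the score, it
   does not change when a cell's counts are rescaled, which is what makes
   the method robust across samples and batches.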

scGSVA

   To perform GSVA for single-cell data, we used scGSVA
   (https://github.com/guokai8/scGSVA), which uses ssGSEA methods to
   score individual cells to generate multiple pathway enrichment score
   matrices. The limma package was used to calculate the differential
   enrichment scores for pathways; an absolute value of t greater than
   1.96 indicated a significant difference in these scores among cell
   types.

Transcription factor (TF) regulatory network analysis (pySCENIC)

   A TF network was constructed with pySCENIC (v0.11.0) [33] using the
   scRNA-seq expression matrix and TFs in AnimalTFDB. First, GRNBoost2 was
   used to construct a regulatory network on the basis of the coexpression
   of regulators and targets. CisTarget was subsequently used to exclude
   indirect targets and to search for TF binding motifs. Afterward, AUCell
   was used for regulon activity quantification for every cell.
   Cluster-specific TF regulons were identified according to regulon
   specificity scores (RSSs), and the activity of these TF regulons was
   visualized in heatmaps.
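
   One common definition of the regulon specificity score compares a
   regulon's activity distribution across cells with the membership
   indicator of a cluster by means of the Jensen‒Shannon divergence. The
   sketch below assumes that definition with a base-2 logarithm, so the
   score lies between 0 and 1; the pySCENIC implementation may differ in
   details such as the logarithm base, and the function names here are
   hypothetical.

```python
from math import log2, sqrt

def _kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def regulon_specificity_score(activity, labels, cluster):
    """RSS = 1 - sqrt(JSD(P, Q)).

    activity: per-cell regulon activity (e.g. AUCell values), not all zero.
    labels:   per-cell cluster labels, same length as `activity`.
    cluster:  the cluster for which specificity is evaluated.
    """
    total = sum(activity)
    p = [a / total for a in activity]                      # activity as a distribution
    member = [1.0 if lab == cluster else 0.0 for lab in labels]
    n_member = sum(member)
    q = [m / n_member for m in member]                     # uniform over cluster cells
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = (_kl_divergence(p, mix) + _kl_divergence(q, mix)) / 2
    return 1 - sqrt(max(jsd, 0.0))                         # guard against tiny negatives
```

   A regulon active only in the cells of one cluster scores 1 for that
   cluster, and a regulon active only outside the cluster scores 0.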

MuSiC

   Cell type deconvolution of the bulk RNA-seq data with single-cell
   references was performed with the R package MuSiC (v1.0.0). The bulk