Abstract

Background
Ulcerative colitis (UC) is a chronic inflammatory bowel disease characterized by persistent inflammation of the colon. The specific cause of UC is still not fully understood, but the condition is believed to arise from a combination of environmental, genetic, microbial, and immune factors. This study aimed to explore the specific roles of macrophages and fibroblasts in UC pathogenesis, focusing on their interactions and contributions to disease progression.

Methods
We used single-cell RNA sequencing (scRNA-seq) to analyze macrophages and fibroblasts in peripheral blood and colon biopsy samples from UC patients. Bulk RNA sequencing and spatial transcriptomic data from the Gene Expression Omnibus (GEO) database, together with flow cytometry and multiplex immunohistochemistry (mIHC) data, were used for validation. Statistical analyses were performed to assess the correlation between cell abundance and disease severity.

Results
Macrophages and fibroblasts were identified as key communication hubs in UC; specifically, SPP1+ macrophages and CHI3L1+ fibroblasts were significantly enriched at the sites of inflammation. These cells are strongly correlated with disease severity and orchestrate inflammatory responses within the intestinal immune microenvironment, contributing to UC-associated colorectal cancer.

Conclusions
Our study identified SPP1+ macrophages and CHI3L1+ fibroblasts as key contributors to UC pathogenesis. These cells are enriched in inflammatory sites, are correlated with disease severity, and play a role in UC-associated colorectal cancer, providing new insights into UC mechanisms.

Supplementary Information
The online version contains supplementary material available at 10.1186/s12967-025-06565-5.

Keywords: Ulcerative colitis, SPP1+ macrophages, CHI3L1+ fibroblasts, Regulatory networks, scRNA-seq

Introduction
Ulcerative colitis (UC) is a type of inflammatory bowel disease (IBD) that is more common than Crohn’s disease [1].
The incidence of UC is increasing annually in Asia, and a fourfold increase in prevalence is expected by 2035 [2]. The most severe complication of UC is colorectal cancer (CRC). Indeed, studies have shown that individuals with UC have a greater risk of developing and dying from CRC than those without UC [3–5]. Risk factors for UC-associated tumors include a long disease duration, extensive tumor involvement, and primary sclerosing cholangitis. Therefore, exploring the pathogenesis of UC, controlling the progression of this disease, shortening the course of UC, and reducing the incidence of UC-CRC are important research topics.

The exact pathogenic mechanism of UC is still under investigation. UC is hypothesized to be caused by a combination of environmental factors, host genetic factors, intestinal microbial infections and immune factors [6, 7]. In other autoimmune diseases, such as psoriasis and rheumatoid arthritis, abnormalities in immune system regulation are strongly correlated with the disease [8]. Because the intestinal flora is a microbial community, the intestinal immune response to it is strictly regulated and determines whether immune tolerance or a defensive inflammatory response occurs. Disturbances in the balance of intestinal immune reactions can lead to UC [9, 10].

Single-cell RNA sequencing (scRNA-seq) is used to accurately determine the transcriptional characteristics of individual cells. Multiple studies have described the diverse cell types in the intestinal mucosa of UC patients, highlighting the imbalance in cellular populations in the context of intestinal inflammation [11, 12]. These findings provide valuable insights into the mechanisms underlying UC pathogenesis and will aid in the identification of potential therapeutic targets.

Macrophages are innate immune cells that play important roles in both the inflammatory response and tumor immunity.
Various inflammatory cytokines secreted by macrophages play crucial roles in UC [12]. Tumor-associated macrophages, specifically SPP1+ macrophages, have been found in multiple types of cancer. These cells are highly enriched in tumor tissues and are closely associated with prognosis [13–15].

In IBD, disruption of the intestinal mucosal barrier is a key cause of disrupted immune homeostasis. In chronic infections, inflammation, and cancer, the tissue microenvironment regulates the behavior of local immune cells. Fibroblasts are a key cell type in this microenvironment and modulate immune responses by either activating or suppressing them [16]. Intestinal fibroblasts can undergo phenotypic polarization in response to microbial stimuli, shifting toward a proinflammatory state and acting as a central feedback hub to further facilitate immune cell recruitment [17]. In addition, tumor-associated fibroblasts are enriched in CRC and serve as key components of the tumor microenvironment that promote cancer cell invasion and reshape immune cell infiltration pathways [18].

Both macrophages and fibroblasts have been reported as key nodes in intercellular communication within the immune microenvironment of IBD, with each cell type playing a critical role in disease pathogenesis [19–21]. Furthermore, interactions between macrophages and fibroblasts have been well documented in various cancers and inflammatory diseases, where they are closely linked to disease progression [14, 22–24]. However, despite these findings in other conditions, research on the interactions between macrophages and fibroblasts in UC remains limited. Therefore, this study focused on the interactions between macrophages and fibroblasts in UC patients and, for the first time, identified tumor-associated SPP1+ macrophages in UC patients.
We observed enrichment of SPP1+ inflammation-associated macrophages and CHI3L1+ inflammation-associated fibroblasts at inflammatory sites in UC patients and found that the infiltration level of these cells is strongly correlated with the severity of the disease in these patients. Immunofluorescence staining confirmed this interaction. Overall, this work reveals interactions between CHI3L1+ fibroblasts and SPP1+ macrophages, which may provide new insights for the diagnosis and treatment of UC.

Methods

Clinical sample collection from patients
Normal mucosal and inflammatory tissue samples as well as fresh peripheral blood samples were collected from UC patients (n = 7). Fresh tissue samples were kept on ice in RPMI 1640 medium supplemented with 10% FBS and prepared for transport. Specific clinical information is provided in Supplementary Table 1.

Primary analysis of the raw read data (scRNA-seq)
The raw reads were processed to generate gene expression profiles via CeleScope v1.12 (Singleron Biotechnologies) with default parameters. Briefly, barcodes and unique molecular identifiers (UMIs) were extracted from R1 reads and corrected. Adapter sequences and poly(A) tails were trimmed from R2 reads, and the trimmed R2 reads were aligned against the GRCh38 (hg38) transcriptome via STAR (v2.6.1b). Uniquely mapped reads were subsequently assigned to exons with featureCounts (v2.0.1). Successfully assigned reads with the same cell barcode, UMI and gene were grouped together to generate a gene expression matrix for further analysis.

Quality control, dimension reduction and clustering (Scanpy)
Scanpy v1.8.2 was used for quality control, dimensionality reduction and clustering in Python 3.7.
For each sample dataset, we filtered the expression matrix according to the following criteria: (1) cells with fewer than 200 detected genes or with a gene count in the top 2% were excluded; (2) cells with a UMI count in the top 2% were excluded; (3) cells with a mitochondrial content > 30% were excluded; and (4) genes expressed in fewer than 5 cells were excluded. After filtering, 123,092 cells were retained for downstream analyses, with an average of 896 genes and 2362 UMIs per cell. The raw count matrix was normalized by total counts per cell and logarithmically transformed into a normalized data matrix. The top 2000 variable genes were selected by setting flavor = 'seurat'. Principal component analysis (PCA) was performed on the scaled variable gene matrix, and the top 23 principal components were used for clustering and dimensionality reduction. The cells were separated into 23 clusters via the Louvain algorithm with the resolution parameter set to 1.2. The cell clusters were visualized via uniform manifold approximation and projection (UMAP) [25].

Batch effect removal
The batch effect between samples was removed with Harmony v1.0 using the top 20 principal components from the PCA [26].

Differentially expressed gene (DEG) analysis (Scanpy)
To identify DEGs, we used the scanpy.tl.rank_genes_groups() function based on the Wilcoxon rank-sum test with default parameters; genes expressed in more than 10% of the cells in either of the compared groups of cells and with an average log(fold change) value greater than 1 were identified as DEGs. The adjusted P value was calculated via Benjamini–Hochberg correction, and a value of 0.05 was used as the threshold for statistical significance.
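The DEG selection rule described above can be sketched in plain Python. This is an illustration only, not the Scanpy internals: the per-gene dictionary layout, the field names (`pval`, `logfc`, `pct_group`, `pct_rest`), the function names, and the placeholder genes are all assumptions made for this example. It assumes raw P values from a rank-sum test are already available, applies a Benjamini-Hochberg adjustment, and then applies the three thresholds from the text.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(stats, min_pct=0.10, min_logfc=1.0, alpha=0.05):
    """Keep genes that pass the expression, fold-change and adjusted-P cutoffs.

    `stats` is a list of dicts with the hypothetical keys
    gene, pval, logfc, pct_group and pct_rest.
    """
    padj = benjamini_hochberg([s["pval"] for s in stats])
    selected = []
    for s, q in zip(stats, padj):
        expressed = max(s["pct_group"], s["pct_rest"]) > min_pct
        if expressed and s["logfc"] > min_logfc and q < alpha:
            selected.append(s["gene"])
    return selected


# Toy input with placeholder gene names (not data from the study).
toy_stats = [
    {"gene": "GENE_A", "pval": 0.001, "logfc": 2.0, "pct_group": 0.50, "pct_rest": 0.10},
    {"gene": "GENE_B", "pval": 0.001, "logfc": 0.5, "pct_group": 0.60, "pct_rest": 0.40},
    {"gene": "GENE_C", "pval": 0.600, "logfc": 3.0, "pct_group": 0.40, "pct_rest": 0.30},
    {"gene": "GENE_D", "pval": 0.002, "logfc": 1.5, "pct_group": 0.05, "pct_rest": 0.02},
]
toy_degs = select_degs(toy_stats)
```

In the toy input, only the first gene satisfies all three criteria; the others fail on fold change, adjusted P value, or fraction of expressing cells, respectively.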

Pathway enrichment analysis
To investigate the potential functions of DEGs between inflammatory or ulcerated (IFM) tissue and normal or noninflammatory (non-IFM) tissue, we performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses via the "clusterProfiler" R package v4.0. Pathways with p_adj values less than 0.05 were considered significantly enriched. Data for selected significantly enriched pathways were plotted in bar plots. Gene set enrichment analysis (GSEA) was performed on the DEGs. For gene set variation analysis (GSVA), average gene expression levels for each cell type were used as input data [27]. GO gene sets in the molecular function (MF), biological process (BP), and cellular component (CC) categories were used.

Cell type annotation

Cell type recognition with Cell-ID
Cell-ID is a multivariate approach that extracts gene signatures for each individual cell and performs cell identity recognition using hypergeometric tests (HGTs). Dimensionality reduction was performed on a normalized gene expression matrix through multiple correspondence analysis, in which both cells and genes were projected into the same low-dimensional space. Then, a gene ranking was calculated for each cell to obtain the most characteristic gene set of that cell. HGTs were performed on these gene sets against the intestinal reference from the SynEcoSys database, which contains all cell type-specific genes. The identity of each cell was determined as the cell type with the minimal HGT P value. For cluster annotation, the frequency of each cell type was calculated for each cluster, and the cell type with the highest frequency was used to determine cluster identity. The identity of each cell cluster was determined according to the expression of canonical markers from the reference database SynEcoSys™ (Singleron Biotechnologies).
SynEcoSys™ contains collections of canonical cell type marker data for single-cell sequencing data from CellMarkerDB, PanglaoDB and recently published literature.

Filtering of cell doublets
Cell doublets were estimated on the basis of the expression patterns of canonical cell markers. Any clusters enriched with multiple cell type-specific markers were excluded from the downstream analysis.

Cell–cell interaction (CCI) analysis via CellChat
CellChat (version 1.6.1) was used to analyze intercellular communication networks according to the scRNA-seq data. A CellChat object was created using the corresponding R package, and cell information was added to the meta slot of the object. The ligand–receptor interaction database was used, and matching receptor inference calculations were performed [28].

CCI analysis: CellPhoneDB
CCIs between fibroblasts and macrophages were predicted on the basis of known ligand–receptor pairs via CellPhoneDB (v2.1.7) [29]. The permutation number for calculating the null distribution of average ligand–receptor pair expression in randomized cell identities was set to 1000. Expression levels of individual ligands or receptors were thresholded with a cutoff based on the average log gene expression distribution of all genes across each cell type. Predicted interaction pairs with a P value < 0.05 and an average log expression > 0.1 were considered significant and visualized with heatmap and dot plots in CellPhoneDB.

Pseudotime trajectory analysis with Monocle2
The cell differentiation trajectory of monocyte subtypes was reconstructed with Monocle2 v2.22.0 (ref). For construction of the trajectory, the top 2000 highly variable genes were selected with the Seurat (v4.1.2) FindVariableFeatures() function, and dimension reduction was performed with DDRTree(). The trajectory was visualized with the plot_cell_trajectory() function in Monocle2 [30].
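
The label-permutation test described in the CellPhoneDB subsection above can be illustrated with a minimal sketch. This is not the CellPhoneDB code: the function names, the one-ligand/one-receptor input layout, the fixed random seed, and the toy expression values are assumptions made for this example. The statistic is the average of the mean ligand expression in the sending cell type and the mean receptor expression in the receiving cell type; the P value is the fraction of label shuffles whose statistic is at least as large as the observed one.

```python
import random
from statistics import mean


def lr_mean(ligand, receptor, labels, sender, receiver):
    """Average of mean ligand (sender cells) and mean receptor (receiver cells)."""
    lig = [x for x, lab in zip(ligand, labels) if lab == sender]
    rec = [x for x, lab in zip(receptor, labels) if lab == receiver]
    if not lig or not rec:
        return 0.0
    return (mean(lig) + mean(rec)) / 2.0


def permutation_pvalue(ligand, receptor, labels, sender, receiver,
                       n_perm=1000, seed=0):
    """Return (observed statistic, empirical P value) from shuffled labels."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = lr_mean(ligand, receptor, labels, sender, receiver)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomize cell identities, keep group sizes
        if lr_mean(ligand, receptor, shuffled, sender, receiver) >= observed:
            hits += 1
    return observed, hits / n_perm


# Toy data: 20 cells per group; values are invented for illustration.
toy_labels = ["fibroblast"] * 20 + ["macrophage"] * 20 + ["other"] * 20
toy_ligand = [5.0] * 20 + [0.0] * 40                  # ligand only in fibroblasts
toy_receptor = [0.0] * 20 + [5.0] * 20 + [0.0] * 20   # receptor only in macrophages
toy_obs, toy_p = permutation_pvalue(
    toy_ligand, toy_receptor, toy_labels, "fibroblast", "macrophage"
)
```

With cell type-restricted expression, as in the toy data, shuffled labels almost never reproduce the observed mean, so the empirical P value is small; with uniform expression across all cells, every shuffle ties the observed value and the P value is 1.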

RNA velocity
For RNA velocity, BAM files containing fibroblasts and macrophages and the reference genome GRCh38 (hg38) were used for analysis with velocyto (v0.2.4) [31] and scVelo (v0.17.17) in Python with default parameters. The results were projected onto the UMAP plot from the Seurat clustering analysis for visualization consistency.

UCell gene set scoring
Gene set scoring was performed using the R package UCell v2.2.0 [32]. UCell scores were determined on the basis of the Mann–Whitney U statistic by ranking query genes in order of their expression levels in individual cells. Because UCell is a rank-based scoring method, it is suitable for use in large datasets containing multiple samples and batches.

scGSVA
To perform GSVA for single-cell data, we used scGSVA (https://github.com/guokai8/scGSVA), which uses ssGSEA methods to score individual cells and generate multiple pathway enrichment score matrices. The limma package was used to calculate the differential enrichment scores for pathways; an absolute value of t greater than 1.96 indicated a significant difference in these scores among cell types.

Transcription factor (TF) regulatory network analysis (pySCENIC)
A TF network was constructed with pySCENIC (v0.11.0) [33] using the scRNA expression matrix and TFs in AnimalTFDB. First, GRNBoost2 was used to construct a regulatory network on the basis of the coexpression of regulators and targets. CisTarget was subsequently used to exclude indirect targets and to search for TF binding motifs. Afterward, AUCell was used to quantify regulon activity in every cell. Cluster-specific TF regulons were identified according to regulon specificity scores (RSSs), and the activity of these TF regulons was visualized in heatmaps.

MuSiC
Cell type deconvolution of the bulk RNA-seq data with single-cell references was performed with the R package MuSiC (v1.0.0). The bulk