Abstract Background The heterogeneity of colorectal cancer (CRC) and its complex immune microenvironment pose significant challenges for treatment. Understanding the cellular composition and dynamic changes is essential for uncovering mechanisms of tumour progression. Methods To investigate the cellular heterogeneity and immune microenvironment of CRC, identifying critical subpopulations, functional pathways, and prognostic biomarkers, single-cell transcriptomic data from 41 CRC samples across four datasets were integrated. Bioinformatic analyses identified cellular subpopulations, cell communication networks, and prognostic biomarkers. The expression patterns, clinical significance and biological function of TUBB were validated in vitro. Results A distinct epithelial subpopulation with proliferative and invasive features was identified, promoting tumour progression by resisting apoptosis and remodelling the extracellular matrix. ActMono, a terminal state of myeloid cells, was enriched in tumours and linked to disease progression. Cell communication analysis highlighted galectin signalling in immune regulation. A prognostic model (CRS) based on secretory immune cell-related genes identified TUBB as a key molecule influencing the cell cycle and extracellular matrix remodelling, with its expression patterns, clinical significance and biological effects validated in vitro. Conclusion This study reveals critical subpopulations, signalling pathways, and biomarkers in CRC, providing insights into tumour progression and potential therapeutic strategies. Keywords: single-cell sequencing, colorectal cancer, tumour microenvironment, cellular heterogeneity, immune regulation, TUBB 1. Introduction Colorectal cancer (CRC) ranks as the third most common malignancy globally and represents a leading cause of cancer-related mortality. Despite surgical resection combined with chemotherapy being the standard treatment protocol, approximately one-third of patients experience disease recurrence ([35]1, [36]2). While immune checkpoint inhibitors have shown significant efficacy in microsatellite instability-high (MSI-H) tumours and combined EGFR/BRAF inhibitor therapy has proven effective in BRAF V600E-mutant CRC, these treatments are only applicable to specific patient subgroups ([37]3–[38]5). Large-scale gene expression studies have established molecular classification systems for CRC, most notably the Consensus Molecular Subtypes (CMS), which categorizes CRC into four subtypes: CMS1–4 ([39]6). However, these classifications, which are primarily based on bulk sequencing data, cannot precisely resolve the complex cellular heterogeneity within the tumour microenvironment. The development of CRC involves the accumulation of mutations in multiple oncogenes and tumour suppressor genes (such as APC, KRAS, and PIK3CA) and microsatellite instability caused by DNA mismatch repair gene dysfunction ([40]3, [41]7). Although high tumour mutational burden (TMB) and MSI status can predict the response to immune checkpoint inhibitor therapy, only a minority of patients respond to PD-1 inhibitor treatment ([42]8, [43]9). The complex molecular heterogeneity and microenvironmental characteristics of CRC not only influence disease progression but also present significant challenges for precision medicine, highlighting the importance of understanding the CRC microenvironment in detail. To date, there has not been a comprehensive and systematic characterization of how tumour and TME cells shape the tumoural, stromal, and immune landscapes to form specific CRC subtypes. Recent single-cell studies have revealed cellular heterogeneity in the CRC microenvironment and identified multiple functionally important specific cell subgroups ([44]10–[45]13). While these studies have provided new perspectives for understanding tumour progression mechanisms and immune evasion, their geographical limitations and sample sizes make fully characterizing the shared mechanisms within the CRC microenvironment difficult. Cross-study comparisons are also challenging due to varying cell annotation methods across different studies. In this study, we integrated four datasets from public databases, encompassing 41 samples, to systematically describe the differential cell population distributions and intercellular interaction networks between tumour and normal tissues. We not only revealed the heterogeneous characteristics and transcriptional reprogramming of epithelial cells in tumour tissues but also identified several key cell subgroups potentially involved in the formation of an immunosuppressive microenvironment, providing new insights into CRC progression mechanisms and the immune microenvironment. 2. Materials and methods 2.1. Data collection Data for TCGA-COAD and TCGA-READ were downloaded from the UCSC Xena platform ([46]https://xenabrowser.net/datapages/). Single-cell RNA sequencing (scRNA-seq) datasets were obtained from the GEO database (accession numbers: [47]GSE161277, [48]GSE200997, [49]GSE221575, and [50]GSE231559). The combined dataset included samples from 27 primary colorectal cancer (CRC) patients and 14 normal control samples. After quality control (QC), a total of 88,212 high-quality single cells were retained for further analysis. 2.2. Data processing Single-cell RNA-seq data were preprocessed using the Seurat v4.3.0 R package. Quality control (QC) was performed to remove low-quality cells and potential dying cells. Specifically, we retained cells that expressed at least 400 genes, and excluded cells with >20% mitochondrial gene expression. These thresholds were selected based on the distribution of QC metrics and previous studies, aiming to balance data completeness with the removal of low-quality cells ([51]14–[52]16). To detect and remove potential doublets, we applied DoubletFinder v2.0.3. The expected number of doublets was calculated based on an assumed doublet rate of ~7.5–8%, following 10X Genomics guidelines, and using the formula: nExp_poi = round(0.08 × N × N/10000), where N is the number of cells in the sample. For doublet prediction, we used 20 principal components (PCs = 1:20) and the following parameters: pN = 0.25, pK = 0.09, nExp = nExp_poi, reuse.pANN = FALSE, sct = FALSE. These settings were based on the recommended defaults in the official DoubletFinder tutorial. Predicted doublets were removed from the dataset prior to downstream analysis. Data normalization, identification of highly variable genes, principal component analysis (PCA), and unsupervised clustering were performed using Seurat’s standard pipeline. Harmony v1.2.3 was used for batch correction and data integration, with sample identity specified as the batch variable. The RunHarmony() function was executed with default parameters, and the top 30 PCs were retained for downstream analysis. UMAP was used for dimensionality reduction and visualization. For differential expression analysis, we used the FindAllMarkers() function in Seurat with the Wilcoxon rank-sum test, applying the following thresholds: min.pct = 0.25, logfc.threshold = 0.25, only.pos = FALSE. Significant differentially expressed genes (DEGs) were defined as those with p-value < 0.05. 2.3. Cell type identification To identify cell types, we first performed differential expression analysis across clusters using the FindAllMarkers() function in Seurat. Marker genes were defined as those with an adjusted p-value < 0.05, expression in more than 25% of cells within the cluster (min.pct = 0.25), and an absolute log2 fold change > 0.25. For each cluster, the top-ranked differentially expressed genes were considered cluster-specific highly expressed genes. We then compared these cluster-specific markers against curated reference databases, including CellMarker and PanglaoDB, to determine the most likely cell type identity for each cluster. Annotation was conducted manually based on the expression patterns of canonical lineage markers and known cell-type-specific genes. To support and cross-validate our manual annotations, we additionally employed the SingleR package, which uses reference transcriptomic datasets to infer cell identities. The results from SingleR were used as a secondary reference and were reconciled with our primary marker-based annotation strategy. Furthermore, we calculated Spearman correlation coefficients between the average expression profiles of all clusters to evaluate transcriptional similarity. Clusters with highly correlated expression patterns and overlapping marker gene expression were considered for subtype merging to avoid artificial over-segmentation. Final cell type labels were determined by integrating information from marker gene analysis, database matching, SingleR prediction, and inter-cluster correlation. 2.4. Cell communication analysis Cell-cell communication networks within the tumour microenvironment were inferred using the CellChat v1.1.3 R package based on receptor-ligand interactions ([53]17). Communication probability and the number of interactions were calculated to construct communication networks. The interactions between any two cell populations were visualized, and scatter plots were generated to display the major signalling senders (signal sources) and receivers (targets) in a two-dimensional space, helping to identify the main contributors of outgoing or incoming signals, particularly among immune cell types. A pattern recognition approach was used to identify how multiple immune cell types and signalling pathways coordinate. 2.5. Pseudotime trajectory analysis Pseudotime trajectories were constructed using the Monocle 2 algorithm, an R package designed for single-cell trajectory analysis by Qiu et al. ([54]18). This algorithm reduces high-dimensional gene expression profiles into a low-dimensional space and arranges the cells into trajectories with branching points. Dynamic expression heatmaps were constructed using the plot_pseudotime_heatmap function. 2.6. Machine learning-based feature signature identification A consensus feature signature was derived using a machine learning-based integrative approach that combined ten different algorithms: Random Survival Forest (RSF), Elastic Net (Enet), Lasso, Ridge, Stepwise Cox, CoxBoost, Partial Least Squares for Cox (plsRcox), Supervised Principal Components (SuperPC), Generalized Boosted Regression Model (GBM), and Survival Support Vector Machines (survival-SVM). To ensure robustness, a consensus model was constructed by integrating predictions from these methods. A total of 101 algorithmic combinations were executed within a leave-one-out cross-validation (LOOCV) framework, optimizing feature selection and model performance. Hyperparameters for each algorithm were tuned using grid search/random search (or specify your method), and model performance was evaluated based on the concordance index (C-index) and other relevant metrics (e.g., AUC, log-rank test).The TCGA-READ and COAD datasets were randomly split into a training set (60%) and a validation set (40%), ensuring a balanced distribution of clinical and molecular features. Stratified sampling was applied to maintain consistency between groups. The final model was assessed on the validation set for predictive accuracy and generalizability. 2.7. Immune infiltration evaluation The CIBERSORT algorithm was applied to quantify immune cell infiltration levels in pancreatic adenocarcinoma (PAAD) patients and to explore differences in immune cell abundance between high-risk and low-risk patient groups ([55]19). Pearson correlation analysis was conducted to assess the relationship between immune cell abundance and risk scores. To further investigate potential differences in immune function, single-sample gene set enrichment analysis (ssGSEA) was employed to obtain enrichment scores, and the Wilcoxon test was used to compare immune function between high-risk and low-risk groups ([56]20). 2.8. Quantitative real-time PCR and western blotting Total RNA extracted from paired colorectal cancer tissues was collected. Then the RNA was reversely transcribed into cDNA with the kit (Takara, Dalian, China) and amplified. The sequences of the primers are detailed as below: TUBB: 5’-ATTTCTTTATGCCTGGCTTTG-3’ and 5’-GACCTGCTGGGTGAGTTCC’; GAPDH: 5’-ACACCCACTCCTCCACCTTT-3’ and 5’-TTACTCCTTGGAGGCCATGT-3’. Total cellular protein from clinical samples were lysed with RIPA lysis buffer (Solarbio, China) and added with protease inhibitor at a ratio of 1:100 (Thermo Scientific, United States). The information of primary antibodies was as follows: TUBB (1:1,000, Abmart, TA7011M) and GAPDH (1:4,000, Abmart, P30008S). 2.9. Kaplan–Meier plotter database analysis We analysed the predictive value of TUBB in CRC using Kaplan–Meier (KM) Plotter ([57]https://kmplot.com) ([58]21). Based on the median expression (high and low expression), patients were divided into two groups to analyse the overall survival (OS) and recurrence-free survival (RFS). 2.10. Patients and clinical samples 59 paired colorectal cancer tissues were obtained from patients who underwent colorectal cancer surgery at Yangpu Hospital of Tongji University between November 2018 and November 2019. The study got approval from Ethics Committee of the Yangpu Hospital (LL-2023-LW-012). CRC tissues and paracancerous tissues were collected during surgery and immediately frozen in liquid nitrogen to assess the expression levels of specific genes and proteins respectively. 2.11. Cell culture and transfection Human colorectal (CRC) cell lines (HCT116 and SW620) were acquired from Shanghai Institute of Biochemistry and Cell Biology. All cell lines were cultured in DMEM medium (Gibco, Carlsbad, CA, USA) which contains 10% foetal bovine serum (FBS; Gibco) at 37°C with 5% CO2. Lipofectamine 3000 (Invitrogen, Carlsbad, CA, USA) was used to transfect cells with an siRNA specific for TUBB and a control construct purchased from GeneChem (Shanghai, China). Cells were utilized for downstream assays at 48h post-transfection. Analyses were conducted in triplicate. TUBB overexpression plasmid was customized from GenePharma (Shanghai, China). 2.12. Transwell assays and wound healing assay Cells were suspended in 250 μL of serum-free medium and seeded into the upper chamber of a 24-well Transwell plate (Nest, China). The lower chamber was filled with culture medium containing 10% FBS. For invasion assays, the Transwell chambers were coated with Matrigel (2 mg/mL) and DMEM, whereas for migration assays, they were left uncoated. After 24 hours of incubation, the invaded cells were fixed with 4% paraformaldehyde for 30 minutes and stained with crystal violet for 10 additional minutes, both at room temperature. Cells were counted in five random optical fields of view under a light microscope (Nikon Corporation, Japan). For the wound healing assay, cells were cultured without FBS in 6-well plates for 24 hours. Linear wounds were created by scratching with a 10 μL pipette tip. Wound closure was monitored and photographed at 0 and 24 hours using a microscope (Nikon Corporation, Japan). 2.13. Assays of cell proliferation and apoptosis To assess the rate of DNA synthesis, CRC cell lines were subjected to treatment with 5-ethynyl-2’-deoxyuridine (EDU) at a concentration of 50 μM, which was subsequently added to the cell culture plates. Following a 30-minute incubation, DNA was stained using Hoechst 33342, allowing for the visualization of positively stained cells under a microscope. HCT116 and SW620 cells, characterized by either TUBB overexpression or knockdown, were dissociated into single-cell suspensions using 0.25% trypsin. These cells were then stained with Annexin V-APC and 7-Aminoactinomycin D (7-AAD) to evaluate apoptosis rates, which were quantified through flow cytometry analysis. 2.14. Statistical analysis All statistical analyses and data visualizations were performed using R software (version 4.1.3). Pearson correlation coefficients were used to evaluate the correlation between continuous variables. For quantitative data, a two-tailed unpaired Student’s t-test or one-way analysis of variance (ANOVA) with Tukey’s multiple comparison test was performed to compare values between subgroups. When multiple comparisons were conducted, p-values were adjusted using the Benjamini-Hochberg (BH) method to control the false discovery rate (FDR). A p-value or adjusted p-value < 0.05 was considered statistically significant. 3. Results 3.1. Integrated analysis reveals cell type composition and functional remodelling in the CRC microenvironment In this study, we performed an integrated analysis of 41 samples from four public datasets ([59]GSE161277, [60]GSE200997, [61]GSE221575, and [62]GSE231559) ([63]2, [64]22–[65]24), comprising 27 primary CRC tumour samples and 14 normal control samples. After rigorous quality control and doublet removal, we obtained 88,212 valid cells for analysis ([66] Supplementary Figures S1A, B ). Through single-cell transcriptome analysis, which combines specific gene expression profiles and classical cell markers, we classified these cells into 32 distinct clusters. On the basis of intercluster Spearman correlation analysis and previously reported cell markers ([67]14, [68]25), we identified 8 different cell types ([69] Figures 1A, B , [70]Supplementary Figure S1C ), including T cells expressing high levels of CD3D, CD3E, and TRAC; B cells expressing CD79A, MS4A1, and CD79B; epithelial cells expressing EPCAM, KRT8, and KRT18; plasma cells expressing JCHAIN and SDC1; myeloid cells expressing LYZ, MNDA, and C1QA; fibroblasts expressing COL1A1, DCN, and COL1A2; endothelial cells expressing CLDN5, CDH5, PECAM1, and VWF; and mast cells expressing TPSB2 and TPSAB1 ([71] Figure 1C , [72]Supplementary Figures S1D–G ). We calculated the top 50 highly expressed genes for each cell type to confirm cluster specificity and used AUCell to score activated pathways in each subgroup, further validating cell identities ([73] Figure 1D , [74]Supplementary Figure S2A ). Figure 1. [75]Figure 1 [76]Open in a new tab Single-cell transcriptomic analysis reveals cell type composition and functional remodelling in colorectal cancer. (A) UMAP dimensionality reduction showing the integrated cell distribution map. A total of 32 cell clusters were identified, classified into 8 major cell types, with different colours representing distinct cell clusters. (B) Spearman correlation heatmap based on gene expression across cell clusters, revealing transcriptional similarities between clusters. (C) Expression profiles of characteristic marker genes across different cell types. (D) Heatmap displaying the top 50 highly expressed genes specific to each cell type, illustrating the transcriptional characteristics of different cell types. (E) Density distribution plot of each cell type in tumour and normal groups, showing differences in cell abundance between groups. (F) PHATE dimensionality reduction plot displaying cell distribution, with the left panel coloured by cell type and the right panel coloured by sample group, revealing spatial distribution patterns of cell types and sample groups. (G) Left panel: Stacked bar plot showing the proportional distribution of different cell types across samples; right panel: Composition ratio of tumour and normal groups within each cell type. * indicates statistical significance at p < 0.05. (H) Statistical summary of the number of differentially expressed genes (DEGs) between tumour and normal groups for each cell type. (I) Representative GO functional pathways enriched in upregulated DEGs in each cell type. For the cell composition analysis, we first calculated the cell type abundance in the tumour and normal groups and found significant enrichment of epithelial cells in the tumour group ([77] Figure 1E ). Using the PHATE algorithm ([78]26), which captures both local and global nonlinear structures, we identified significant differences between tumour and normal epithelial cells that were not fully revealed by conventional UMAP analysis ([79] Figure 1F , [80]Supplementary Figure S2B ). The cell proportion statistics revealed significant intergroup heterogeneity among the samples. Additionally, the proportions of epithelial cells, and myeloid cells increased in the tumour group, whereas the proportions of B cells and plasma cells decreased, suggesting remodelling of the tumour immune microenvironment ([81] Figure 1G , [82]Supplementary Figures S1H, I ), with cell abundance calculations providing more intuitive visualization of these results ([83] Supplementary Figures S2C, D ). We calculated differentially expressed genes (DEGs) between the tumour and normal groups for each cell type ([84] Figure 1H ), with epithelial cells showing the most DEGs, followed by fibroblasts, indicating that these two cell types may undergo the most significant phenotypic changes during tumour progression. GO analysis of the upregulated DEGs revealed that myeloid cells were significantly enriched in cytokine production and apoptotic signalling pathways, whereas T cells were enriched mainly in NF-κB signalling and cytokine production-related pathways. Notably, multiple cell types respond to oxidative stress and hypoxia, particularly endothelial cells and fibroblasts, which are actively involved in angiogenesis-related pathways, reflecting the complex intercellular interaction network in the tumour microenvironment ([85] Figure 1I ). 3.2. Multiple cell types exhibit coordinated patterns of metabolic activation and matrix remodelling We performed a systematic analysis of transcriptional characteristics across cell types in tumour and normal tissues. First, through Wilcoxon rank-sum test analysis of DEGs, we identified changes in several key molecules: the chemokine CCL5 was significantly downregulated in multiple immune cells (including T cells and myeloid cells), whereas CCL20 was upregulated, suggesting a shift in the immune microenvironment from an antitumour status to a protumour status ([86]27). Moreover, the concurrent upregulation of COL4A1/COL4A2 across multiple cell types, particularly in fibroblasts and endothelial cells, reflected extracellular matrix remodelling ([87] Figure 2A ). To comprehensively understand the changes in the cellular composition during disease states, we employed a multilayered analytical strategy. Differential abundance analysis based on k-nearest neighbour statistics ([88]28) revealed significantly increased proportions of epithelial cells, endothelial cells, and myeloid cells in the tumour group ([89] Figures 2B, C ), suggesting expansion of the tumour parenchyma and active angiogenesis. Furthermore, using the random forest-based Augur algorithm ([90]29) to assess transcriptional perturbation levels across cell types, we found that endothelial cells, mast cells, fibroblasts, and epithelial cells presented the most significant transcriptional changes ([91] Figure 2D ). Using AUCell, we calculated the pathway involvement of endothelial cells and mast cells in the tumour group and found that endothelial cells were involved primarily in extracellular matrix remodelling, whereas mast cells were involved mainly in lipid metabolism ([92] Supplementary Figures S2E, F ). Figure 2. [93]Figure 2 [94]Open in a new tab Multi-dimensional analysis reveals transcriptomic remodelling and functional changes in cell types in colorectal cancer. (A) Differentially expressed genes (DEGs) between tumour and normal groups across cell types, analysed using the Wilcoxon rank-sum test. Highly significant DEGs are defined as those with an adjusted p-value (adj.pval) < 0.05, while lowly significant DEGs are defined as those with adj.pval < 0.1. “Upregulated”refers to genes with higher expression in the tumour group relative to the normal group. (B) Differential abundance analysis using the Milo k-nearest neighbour (kNN) algorithm. Each node represents a local cellular neighbourhood, with colour intensity representing the log fold change of tumour relative to normal. White nodes indicate non-significant differences (FDR > 10%). The node layout is based on UMAP dimensionality reduction. (C) Statistical results of the Milo kNN differential abundance analysis. (D) Augur analysis framework assessing the degree of transcriptomic perturbation in each cell subtype between biological states. A higher AUC value indicates more significant transcriptomic alterations. (E) Volcano plot of DEGs in epithelial cells between tumour and normal groups. DC values represent protein-protein interaction network strength, calculated using the STRING database. (F-G) GO functional enrichment analysis of DEGs in epithelial cells. (F) Functional annotation of upregulated genes and (G) downregulated genes. (H) GSEA pathway enrichment analysis of DEGs in epithelial cells, showing representative signalling pathways. NES represents the Normalized Enrichment Score. Given the dominant position of epithelial cells in samples and their significant abundance and transcriptional changes, we conducted in-depth functional analysis. GO enrichment analysis of the DEGs revealed that the upregulated genes were enriched mainly in protein synthesis- and ribosome biogenesis-related pathways, reflecting robust growth demands and stress adaptation responses of tumour cells, whereas the downregulated genes were enriched in cell polarity- and structure-related pathways, suggesting phenotypic alterations ([95] Figures 2E–G ). GSEA further confirmed the significant activation of three key pathways: RNA processing, translation, and extracellular vesicles. These changes not only reflect cancer cell proliferative activity and stress adaptation but also indicate tumour microenvironment remodelling and potential metastatic tendencies suggesting that systematic changes occur across multiple cell types in CRC ([96] Figure 2H ). 3.3. Epithelial cell heterogeneity and copy number variation-driven malignant progression in CRC Considering that epithelial cells are the primary source of malignant tumour cells in CRC and their significant perturbations at both the abundance and transcriptional levels, we conducted an in-depth analysis of epithelial cells. Through reclustering, we identified 8 distinct epithelial cell subgroups ([97] Figure 3A , [98]Supplementary Figures S3A, B ), including stem/progenitor cells (SPCs), secretory transit amplifying cells (SecTA), absorptive enterocytes (AEs), goblet cells (GCs), cycling transit amplifying cells (CycTA), infiltrating immune-like cells (IIIC), enteroendocrine cells (EECs), and BEST4+ enterocytes (BEST4-ECs). On the basis of previous studies and analyses of highly expressed genes in each subgroup ([99]30, [100]31), we validated these subgroup annotations ([101] Figures 3C, D ). By calculating subgroup-specific highly expressed genes, we not only confirmed cell identity specificity but also identified representative marker genes, such as the stem cell characteristic factors SOX4, ELF3, and KLF5 in the SPCs and VIM and CCL5 in infiltrating immune-like cells ([102] Figure 3D ). Additionally, we performed functional enrichment analysis on the genes that were specifically expressed in each subgroup to validate their functions ([103] Supplementary Figures S3C, D ). When comparing tumour and normal tissues, we detected significantly increased abundances of SPCs, SecTA, and CycTA in the tumour group, indicating the activation of stem cell properties and inflammatory responses, whereas decreased abundances of AEs, EECs, and BEST4-ECs reflected impaired normal absorption and endocrine functions, revealing significantly different epithelial cell states between the two groups ([104]32, [105]33) ([106] Figures 3B, E ). Augur framework analysis revealed that EECs and SecTA were the two subgroups with the most significant transcriptional perturbations ([107] Figure 3F ). GO enrichment analysis of DEGs revealed that EEC-related genes were enriched mainly in cytoskeleton- and cell junction-related pathways, such as cell migration, cadherin binding, and cell–cell junctions, changes that might affect their paracrine regulatory function and hormone secretion polarity, potentially leading to invasive characteristics. SecTA DEGs were enriched mainly in protein synthesis and cell adhesion-related pathways, such as cytoplasmic translation and cytosolic ribosomes, reflecting significantly increased protein synthesis activity and metabolic levels in SecTA within tumour tissue, suggesting possible increased secretory protein production ([108] Figure 3G ). Figure 3. [109]Figure 3 [110]Open in a new tab Copy number variation (CNV) analysis reveals genomic instability features in epithelial cell subtypes of colorectal cancer. (A) UMAP showing subtypes of stem/progenitor cells (SPCs), secretory transit amplifying cells (SecTA), absorptive enterocytes (AEs), goblet cells (GCs), cycling transit amplifying cells (CycTA), infiltrating immune-like cells (IIIC), enteroendocrine cells (EECs), and BEST4+ enterocytes (BEST4-ECs). (B) Stacked bar plot representing the proportional distribution of cell types across different groups. (C) Expression of marker genes used for the identification of each cluster. (D) Heatmap showing the average expression of genes in each epithelial cell subtype, with the left panel displaying clustering of subtype-specific genes and representative markers. (E) Density plot showing the enrichment of cell counts in each group. (F) Augur framework displaying transcriptomic perturbation across subclusters under two biological conditions, where a higher AUC represents greater transcriptomic changes. (G) GO enrichment analysis of differentially expressed genes (DEGs) in EEC and SecTA subtypes. (H) InferCNV analysis predicting copy number variations in each epithelial cell subtype using T cells, B cells, and NK cells as references. (I) CNV scores mapped