Abstract Precancerous lesions typically precede gastric cancer (GC), but the molecular mechanisms underlying the transition from these lesions to GC remain unclear. Therefore, it is urgent to understand this transition from precancerous lesions to GC, which is crucial for the early diagnosis and treatment of GC. In this study, we merged multiple single-cell RNA sequencing datasets to investigate the molecular changes in distinct cell types associated with the progression of GC. First, we observed an increasing abundance of immune cells and a decrease in non-immune cells from non-atrophic gastritis to GC. Five immune cell types were significantly enriched in GC compared to precancerous lesions. Moreover, we found that the interleukin (IL)-17 signaling pathway and Th17 cell differentiation were significantly up-regulated in immune cell subsets during GC progression. Some genes in these processes were predominantly expressed at the GC stage, highlighting their potential as diagnostic markers. Furthermore, we validated our findings using bulk RNA sequencing data from The Cancer Genome Atlas and confirmed consistent immune cell changes during GC progression. Our study provides insights into the immune infiltration and signaling pathways involved in the development of GC, contributing to the development of early diagnosis and targeted treatment strategies for this malignancy. Keywords: gastric cancer, tumor microenvironment, inflammation, single-cell RNA sequencing, IL-17 signaling pathway Graphical abstract graphic file with name fx1.jpg [41]Open in a new tab __________________________________________________________________ Lv and colleagues reveal that the immune microenvironment undergoes significant changes during the progression from inflammation-induced precancerous lesions to gastric cancer (GC). They identify an increase in immune cell populations and up-regulation of the IL-17 signaling pathway, highlighting potential biomarkers and therapeutic targets for early GC diagnosis and treatment. Introduction Nowadays, cancer is one of the serious threats to public health. The number of new cancer cases and deaths is increasing year by year.[42]^1 There are many hypotheses about the cause of cancer, including smoking, radiation, viruses, cancer-causing chemicals (carcinogens), and chronic inflammation.[43]^2^,[44]^3 As early as 1863, Balkwill et al. discovered that tumor tissue contains a large amount of inflammatory cell infiltration and put forward the hypothesis that tumors are originated from chronic inflammation.[45]^4 Inflammation is thought to be the body’s local protective response to injury or infection.[46]^5 In recent years, studies have found that chronic or excessive inflammation can lead to cancer. More than 20% of cancers are caused by inflammation, such as gastritis for gastric cancer (GC),[47]^6 pneumonia for lung cancer,[48]^7 and colitis for colorectal cancer.[49]^8 It is crucial to analyze the molecular mechanisms under this transformation to systematically study the transformation process from chronic inflammation into cancer and discover the important factors involved in this malignant transformation. Breakthroughs in this field will contribute to cancer prevention as well as early diagnosis and treatment. GC is one of the most common cancers in clinical practice, with a high incidence and mortality worldwide, especially in Asia.[50]^9 According to the World Health Organization Global Cancer Statistics Report 2020, it is estimated that the number of new patients with GC worldwide exceeds 1 million every year, and the main age of onset is between 60 and 84 years old.[51]^10^,[52]^11 The link between intestinal metaplasia (IM) and GC was revealed in 1938.[53]^12 In 1955, Morson reported that GC occurred in IM.[54]^13 Intestinal GC is preceded by precancerous lesions, including non-atrophic gastritis (NAG), chronic atrophic gastritis (CAG), and IM.[55]^14 Patients with these precancerous lesions have an increased risk of developing GC.[56]^14 Identifying the genes and pathways that drive cancer formation has been the focus of large-scale genomics research. However, most research has focused on the overall characteristics of advanced gastric tumors, largely ignoring related studies on precancerous lesions.[57]^8 Our understanding of the phenotypic changes and molecular drivers of the transition from precancerous lesions to GC remains inadequate.[58]^8 The identification of the biomarkers in this transformation process could help in the early diagnosis of GC so that most patients with GC can get a radical cure to improve their health. In this study, we collected multiple single-cell RNA sequencing (scRNA-seq) datasets from the Gene Expression Omnibus (GEO). These include NAG, CAG, IM, GC tumor samples, and adjacent normal tissues (ANTs). By comparing the samples from gastritis to IM and then to GC, we found that the abundances of the five immune cell types vary identically. Meanwhile, all of them are significantly enriched in the GC. From IM to GC, interleukin (IL)-17 signaling pathway is up-regulated in the tumor, and its related genes are significantly differentially expressed in the tumor, such as TNF, IL17RA, IKBKG, TAB2, IL1B, and CASP8. These differentially expressed genes (DEGs) can be used as clinical diagnostic markers for the early diagnosis and treatment of GC. We have identified a set of genes that demonstrate up-regulation in immune cell subsets, suggesting their potential involvement in the disease process. While these genes show promise as biomarkers for the early diagnosis and treatment of GC, we acknowledge that the up-regulation of these genes is not exclusive to GC but is also observed in various immune-dependent disorders such as infections, allergies, and autoimmune diseases. This overlap in gene expression profiles complicates the specificity of these markers for GC and may lead to potential misdiagnosis in patients with concurrent diseases. Results Immune modifications during gastric carcinogenesis To identify markers of dynamic changes in the tumor immune microenvironment that are associated with the development and progression of GC from precancerous lesions, we obtained high-quality single-cell expression profiles of 211,056 high-quality cells from 58 samples ([59]Figure S1A; [60]materials and methods). These included 3 NAG samples, 3 CAG samples, 7 IM samples, 35 primary tumor samples, and 10 adjacent normal samples ([61]Figure 1A). The tumor samples included one early GC sample, and the tumor samples ranged from stage I to stage IV ([62]Figure S1B). In order to identify different cell populations based on the expression of marker genes, we performed dimension reduction and unsupervised cell clustering using the method implemented in the Scanpy software suite[63]^15^,[64]^16^,[65]^17 and ultimately identified a total of 12 major cell clusters ([66]Figure 1B; [67]materials and methods). Based on the expression of known marker genes ([68]Figure 1C), all cell types in the gastric samples including immune cells, epithelial cells, fibroblasts, endothelial cells, enteroendocrine cells, and smooth muscle cells were revealed. Among them, immune cells consist of T cells, B cells, natural killer (NK) cells, mast cells, and myeloid cells. Epithelial cells have three subpopulations, S100P+ epithelial cells, MZB1+ epithelial cells, and CXCL17+ epithelial cells ([69]materials and methods). Figure 1. [70]Figure 1 [71]Open in a new tab Single-cell atlas of expression and changes in cellular composition during the development of GC (A) Uniform manifold approximation and projection (UMAP) representations of all scRNA-seq cells colored by the original tissue types. (B) UMAP representations and annotations of all cells. (C) Dot plot showing the expression of the marker genes of each cell type. (D) Boxplots of the cellular proportions of immune cells and non-immune cells in different samples. Each box represents data for a single tissue type. The boxplots and statistics are derived from 3 NAG samples, 3 CAG samples, 7 IM samples, 10 ANT samples, and 35 GC samples. Boxes represent the median, 25th percentile, and 75th percentile of the data, whiskers represent the highest and lowest values within 1.5 times the interquartile range of the boxplot, and all points are plotted. (E) Boxplots of immune cell type as in (D). t tests were performed for the proportion of adjacent tissue types and the proportion between IM and GC. p values smaller than 0.05 are shown in the plot. (F) Milo analysis of differential abundance changes between IM and GC samples. Neighborhoods that are significantly differentially abundant in IM samples are colored in red, and neighborhoods that are differentially abundant in GC samples are colored in blue. By tracking the changes in the proportion of these cell types in each sample, we found that the proportions of immune cell subsets gradually increase and the proportions of non-immune cell subsets gradually decrease from NAG to GC ([72]Figure 1D). Immune cells are significantly enriched at GC and ANT stages (t test, p < 0.05). Next, we captured the changes in the relative abundances of the five immune cell types separately. Their relative abundances show basically the same upward trend along the development of GC. We then tested whether the abundance of every immune cell type is significantly different between adjacent stages and found that the abundances of all five immune cell types were significantly different between IM and GC stages ([73]Figure 1E, t test, p < 0.05). We used another generalized linear model-based method (Milo) to test the difference in the abundances of cell types.[74]^18 Milo’s results show that, relative to IM, all five immune cell types we identified were enriched at the GC stage, which is consistent with the previous observations ([75]Figure 1F). Taken together, these data suggest that the turning point in the trajectory of immune evasion of GC is at the IM stage. Activated signaling pathways during GC progression Since the turning point in the trajectory of immune evasion is at the IM stage, we next explored the altered biological processes during this turning point. The NEBULA algorithm[76]^19 was used to obtain the DEGs within five immune cell subsets between IM and GC ([77]Figure 2A; [78]materials and methods). We then performed enrichment analysis on these DEGs within each immune cell subset separately and found that Th17 cell differentiation is significantly up-regulated at the GC stage compared with IM for all five immune cell subsets ([79]Figures S2–S4; [80]materials and methods). Th17 cells are a subset of effector T helper cells that produce IL-17 pro-inflammatory factors.[81]^20 Besides, the IL-17 signaling pathway is significantly up-regulated in subsets of T cells, B cells, mast cells, and myeloid cells at the GC stage compared with IM ([82]Figures S2–S4). IL-17 is an effective pro-inflammatory cytokine, which has been confirmed to be closely related to the formation, growth, and metastasis of tumors in a variety of cancer types.[83]^21 IL-17 binds to its receptor, IL-17R, to form a heterodimeric receptor complex, which activates downstream pathways such as nuclear factor κB (NF-κB), Microtubule Affinity-Regulating Kinase (MAPKs), and CCAAT/Enhancer Binding Protein (C/EBPs) and induces the expression of antimicrobial peptides, cytokines, and chemokines.[84]^21 This is consistent with our observation that the NF-κB signaling pathway and MAPK signaling pathway are significantly up-regulated in all subsets of immune cells during GC ([85]Figures S2–S4). Figure 2. [86]Figure 2 [87]Open in a new tab Gene expression in GC development (A) Volcano plots of the results of differential expression analysis in immune cells between IM and GC by using the NEBULA algorithm. Red dots represent genes up-regulated (log FC ≥ 1 and p < 0.05) in GC, and blue dots represent genes down-regulated (log FC ≤ −1 and p < 0.05) in GC. (B and C) Dot plots of representative genes in the IL-17 signaling pathway mapped onto tissue types and cell types. (D and E) Dot plots of representative genes in Th17 cell differentiation mapped onto tissue types and cell types. Since Th17 cell-associated activities are the common event in altered immune cells in GC, we further examined the expression of specific genes involved in the IL-17 signaling pathway and Th17 differentiation. In the IL-17 signaling pathway, some genes were only expressed at the GC stage or up-regulated at the GC stage, such as TNF, IL17RA, IKBKG, TAB2, IL1B, and CASP8, among others ([88]Figure 2B). As shown in [89]Figure 2C, IL1B is mainly expressed by myeloid cells, CASP8 is mainly expressed by T and NK cells, and TNF is mainly expressed by myeloid cells, but TNF is also expressed by other immune cells. These results suggest that the IL-17 signaling pathways in GC require coordination between diverse immune cell types. Genes that contribute to Th17 cell differentiation are IL1B, IL2RB, ZAP70, HLA-DQA1, LAT, and IL4R. We found that many genes related to Th17 cell differentiation were expressed only at the GC stage or up-regulated at the GC stage, whereas none of these genes were found to be expressed in precancerous lesion samples ([90]Figure 2D). Meanwhile, we found that IL1B is mainly expressed by myeloid cells. Additionally, some genes are mainly expressed by two or three immune cells, such as ZAP70, IL2RB, LCK, and CD247 are mainly expressed by T and NK cells, HLA-DQA1 is mainly expressed by myeloid cells and B cells, and LAT is mainly expressed by T, NK, and mast cells. In addition, some genes are expressed in all immune cells, for instance, IL4R and RARA ([91]Figure 2E). These results further confirm that the IL-17 signaling pathway and Th17 cell differentiation in GC require the coordination between diverse immune cell types. Immune infiltration of GC samples in TCGA To track the changes in immune cell states along GC progression, we obtained the bulk RNA-seq data of GC from The Cancer Genome Atlas (TCGA)[92]^22 and used the CIBERSORTx[93]^23 platform to deconvolute these GC bulk RNA-seq data to infer the cellular proportion of each sample of the TCGA GC data. Due to the limited storage space on the CIBERSORTx platform (1,000 megabytes), we used the Metacell-2 algorithm to reduce the number of cells in the merged scRNA-seq dataset.[94]^24 Finally, the derived metacell dataset contains 11,337 metacells ([95]materials and methods). We identified 10 major cell clusters based on marker gene expression ([96]Figures 3A and 3B; [97]materials and methods). Although the granularity of metacell clusters is slightly different from those of single-cell clusters identified by single-cell expression profile, the major cell types in gastric tissues are all revealed, and the five immune cell types that we focus on in this study are consistent. Since the metacell dataset is still bigger than 1,000 megabytes, we downsampled approximately 75% of the metacells multiple times to ensure that the deconvolution results are stable. Figure 3. [98]Figure 3 [99]Open in a new tab Immune cell composition of GC samples in TCGA (A) UMAP representations and annotations of metacells. (B) Dot plot of the expression of the marker genes for each cell type. (C) Violin plots of the proportion of each immune cell type in different samples from bulk RNA-seq dataset. Each violin represents data for a single GC stage. t test p values between any two stages less than 0.05 are shown in the plot. (D) Boxplots of the proportion of each immune cell type in different samples from scRNA-seq data (the first row) and bulk RNA-seq data (the second row). Each box represents data for a single tissue type or GC stage. The boxplots and statistics are derived from 3 NAG samples, 3 CAG samples, 7 IM samples, 1 early GC samples, 4 stage I GC samples, 6 stage II GC samples, 17 stage III GC samples, and 7 stage IV GC samples in the scRNA-seq data and 14 stage I GC samples, 70 stage II GC samples, 157 stage III GC samples, and 95 stage IV GC samples in the bulk RNA-seq data. T0, early GC; T1–T4, stage I–IV GC. Using metacells as the reference gene expression profiles, we deconvolved the GC TCGA data and obtained the inferred proportion of each cell type in each sample ([100]materials and methods). According to the clinical information, we grouped the samples by the stage of GC. As shown in [101]Figure 3C, the proportions of five immune cells show an overall upward trend with the progression of GC, and the proportions of T cells, B cells, and myeloid cells showed significant differences in different stages. Next, we compared the changes in the proportion of each immune cell in the four tumor stages of GC in the single-cell data and TCGA data. We found that the proportion of immune cells in the two data showed a similar trend during tumor progression ([102]Figure 3D). These data validate the changes in the abundance of immune cell subsets found in our single-cell data. Signaling pathway analysis of subsets of T cells and myeloid cells Next, we further subdivided T cells and myeloid cells into subtypes and examined changes in the abundance of these two immune cell subtypes as well as their differential gene expression. We used the Scanpy software suite to perform dimension reduction and unsupervised cell clustering for both T cells and myeloid cells.[103]^15^,[104]^16^,[105]^17 We then annotated the subtypes based on the expression of known marker genes ([106]materials and methods). Ultimately, three T cell subtypes were identified, including regulatory T cells, cytotoxic T cells, and naive T cells ([107]Figure 4A). Meanwhile, five subtypes of myeloid cells were identified, including intermediate monocytes, monocytes/macrophages, non-classical monocytes, conventional dendritic cells (DCs), and plasmacytoid DCs (pDCs) ([108]Figure 4B). Figure 4. [109]Figure 4 [110]Open in a new tab The composition of subtypes of T cells and myeloid cells (A and B) UMAP representations and annotations of subtypes in T cells and myeloid cells. (C and D) Boxplots of the proportion of each T cell subtype and each myeloid cell subtype in different samples. (E) Dot plots of representative genes in the IL-17 signaling pathway mapped onto cell types. (F) Dot plots of representative genes in Th17 cell differentiation mapped onto cell types. Next, we examined the changes in abundance of the eight immune cell subtypes obtained above and used the t test to determine the significance of differences in abundance between the adjacent stages. We found the abundance of T cell and myeloid cell subtypes gradually increased from NAG to GC ([111]Figures 4C and 4D). The common feature among the changes in the abundance of the eight subtypes is that the difference in abundance between IM and GC was significant, and all eight subtypes were relatively significantly enriched in the GC and ANT stages (t test, p < 0.05). This is consistent with our previous conclusion that the turning point in the trajectory of immune evasion of GC is at the IM stage. From IM to GC, we used the NEBULA algorithm to obtain DEGs within subsets of T cells and myeloid cells. Then, we examined the DEGs within each subset and performed Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis using these genes. In the up-regulated gene enrichment analysis, we found that compared to IM, the IL-17 signaling pathway was significantly up-regulated only in regulatory T cells, intermediate monocytes, and monocytes/macrophages during the GC stage. Additionally, we found that Th17 cell differentiation was significantly up-regulated only in regulatory T cells, cytotoxic T cells, naive T cells, intermediate monocytes, and pDCs during the GC stage. At the same time, we examined the expression of genes involved in the IL-17 signaling pathway that were up-regulated at the GC stage in regulatory T cells, intermediate monocytes, and monocytes/macrophages. We found that genes expressed or up-regulated only at the GC stage were the same as when T cells and myeloid cells were not divided. We also examined the expression of genes involved in Th17 cell differentiation that were up-regulated at the GC stage in regulatory T cells, cytotoxic T cells, naive T cells, intermediate monocytes, and pDCs. We found that genes expressed or up-regulated only at the GC stage were almost the same as when T cells and myeloid cells were not divided, and only IL4R was not found. Finally, we examined the expression of these genes in different cell types and did not find any genes that were specifically expressed only in a subset ([112]Figures 4E and 4F). In summary, these data and results suggest that only some, but not all, immune subsets are involved in the activities related to the IL-17 signaling pathway and Th17 cell differentiation. This also provides further evidence that the IL-17 signaling pathways in GC require coordination between diverse immune cell types. Discussion Cancer is a growing threat to public health, with increasing incidence and mortality rates. Chronic inflammation has been recognized as a contributing factor to cancer development, and it has been observed that chronic or excessive inflammation can lead to a significant number of cancer types. In this study, we aimed to investigate the molecular mechanisms involved in the transformation of inflammation into GC and identify potential biomarkers for early diagnosis and treatment. Our study revealed the immune modifications during gastric carcinogenesis. By analyzing scRNA-seq data from multiple samples of NAG, CAG, IM, GC at different stages, and ANTs, we identified distinct immune cell populations and their dynamic changes throughout the progression of GC. We observed a gradual increase in the proportion of immune cell subsets and a decrease in non-immune cell subsets from NAG to GC, indicating a shift toward immune cell enrichment in the tumor microenvironment. Furthermore, we found that the abundances of all five immune cell types (T cells, B cells, NK cells, mast cells, and myeloid cells) were significantly different between IM and GC stages, with enrichment in the GC samples. This suggests that the turning point in the trajectory of immune evasion in GC occurs at the IM stage. These observations highlight the importance of understanding the immune cell dynamics and their functional changes during the progression of GC. We then focused on the IL-17 signaling pathway, which plays a crucial role in inflammation and has been implicated in various cancers. Our analysis revealed the up-regulation of the IL-17 signaling pathway in subsets of T cells, B cells, mast cells, and myeloid cells at the GC stages compared to IM. The IL-17 signaling pathway is known to activate downstream pathways, such as NF-κB and MAPKs, and induce the expression of pro-inflammatory cytokines and chemokines. The significant up-regulation of the NF-κB and MAPK signaling pathways in immune cells during GC further supports the involvement of the IL-17 pathway in gastric carcinogenesis. Moreover, we identified several specific genes involved in the IL-17 signaling pathway and Th17 cell differentiation that were expressed or up-regulated in GC samples, including TNF, IL17RA, IKBKG, TAB2, IL1B, and CASP8. These genes may serve as potential clinical diagnostic markers for early detection and treatment of GC. The coordinated expression of specific genes involved in the IL-17 signaling pathway and Th17 cell differentiation was observed across diverse immune cell types, emphasizing the interplay and cooperation between different immune subsets in GC. Meanwhile, in-depth analysis of immune cell subtypes, specifically T cells and myeloid cells, revealed that only a portion of immune subsets are involved in the activities related to the IL-17 signaling pathway and Th17 cell differentiation. To validate our findings, we analyzed bulk RNA-seq data from TCGA and confirmed the changes in immune cell proportions observed in our scRNA-seq data. The similarities in immune cell dynamics between the two datasets further support the reliability of our findings. In conclusion, our study provides comprehensive insights into the immune modifications and signaling pathways involved in gastric carcinogenesis. The identification of immune cell dynamics and the up-regulation of the IL-17 signaling pathway during GC progression highlight potential biomarkers for early diagnosis and treatment. Some genes involved in the IL-17 pathway and Th17 cell differentiation serve as potential drug targets for clinical applications. Further research and validation of these findings could contribute to improved cancer prevention, early detection, and therapeutic strategies for GC. Materials and methods Collection of scRNA-seq and bulk RNA-seq data scRNA-seq data were obtained from the GEO database under GEO: [113]GSE183904 [114]^9 and [115]GSE134520,[116]^14 and from the Database of Genotypes and Phenotypes (dbGaP) under the identifier phs001818.v1.p1.[117]^25 In total, 58 samples from 43 patients were selected for analysis. These included 3 NAG samples, 3 CAG samples, 7 IM samples, 35 primary tumor samples, and 10 adjacent normal samples. Bulk RNA-seq data of patients with GC were downloaded from TCGA database ([118]https://portal.gdc.cancer.gov/). Quality control and preprocessing of scRNA-seq data We performed quality control on each dataset using the method implemented in the Scanpy software suite[119]^17 before merging them. Genes detected in fewer than three cells and cells expressing less than 200 detected genes were filtered out and excluded from downstream analysis. Next, cells would be labeled as poor-quality ones if they presented with one of the following conditions: (1) the number of Unique Molecular Identifier (UMIs) is lower than 600 or larger than 120,000, (2) the number of expressed genes is lower than 400 or larger than 7,000, or (3) 20% or more of the UMIs were mapped to mitochondrial genes.[120]^14 Then, we continued to utilize the functions in the Scanpy to normalize the gene expression level to UMIs Per Million (UPM) and log transform the single-cell gene expression data. Finally, after merging these datasets, we obtained 211,056 high-quality cells, and 15,569 genes were kept in the merged dataset, which was then used for subsequent analysis. Dimension reduction, unsupervised clustering, and annotation of scRNA-seq data Dimension reduction, unsupervised clustering, and annotation of scRNA-seq data were done using functions in the Scanpy software package (v.1.9.1). We identified a subset of highly variable genes (HVGs) in the preprocessed expression matrix and then scaled gene expression values before performing dimension reduction and clustering on them. The selection of HVGs in scRNA-seq data was implemented by the “pp.highly_variable_genes” function in the Scanpy package. The valid value of average expression was set to the range from 0.05 to 5, dispersion was set to no less than 0.5, and the default values were used for other parameters. Finally, the 4,000 HVGs were selected for further analysis. Next, we used the “pp.pca” function to perform the principal-component analysis on the single-cell expression matrix with genes restricted to HVGs and retained the top 50 principal components for subsequent analysis. Using these data, we performed Harmony and BBKNN to construct the K-Nearest Neighbors (KNN) graph of high-quality cells. In the same graph, we used the “tl.umap” function to project cells to 2D space for visualization. Meanwhile, we utilized the “tl.leiden” function to conduct an unsupervised graph-based clustering algorithm called Leiden to cluster cells. Due to the large number of cells in our study, we set the parameter resolution as 4. In this way, a total of 12 major cell clusters were identified. We annotated these cell clusters using the expression of known marker genes. Marker genes include CD3E and CD3D for T cells; MS4A1 for B cells; NKG7 and GZMB for NK cells; CPA3 for mast cells; AIF1 for myeloid cells; S100P and AGR2 for S100P+ epithelial cells; MZB1 and CD79A for MZB1+ epithelial cells; MUC6 for CXCL17+ epithelial cells; DCN and MMP2 for fibroblasts; RAMP2 and PLVAP for endothelial cells; CHGA and SCG5 for enteroendocrine cells; and RGS5 for smooth muscle cells. Differential gene expression analysis Differential gene expression analysis was performed using the tool NEBULA, which performs differential expression based on a negative binomial mixed model for two annotated cell groups.[121]^19 In R (v.4.2.2), we performed differential gene expression analysis by using the NEBULA software package (v.1.2.2) to find DEGs in each immune cell type between IM and GC states. If the log-transformed fold change (log FC) is ≥1 and the p value is <0.05, then the gene was defined as up-regulated. If the log FC is ≤ −1 and the p is <0.05, then the corresponding gene was defined as down-regulated. Gene set enrichment analysis We performed the gene set enrichment analysis of DEGs by using the online platform WebGestalt ([122]http://www.webgestalt.org/), with which the enriched KEGG pathways were derived, and we defined the pathways with a false discovery rate ≤ 0.05 as the significantly enriched pathways. Metacell-2 for scRNA-seq data We loaded the Metacells package (v.0.8.0)[123]^24 into python (v.3.9.12) and ran the Metacell-2 algorithm on scRNA-seq data with default parameters. In the end, we obtained 11,337 metacells for subsequent analysis. Annotation of metacells We applied the same methods for dimension reduction, clustering, and annotation to metacells. First, we extracted the gene expression profiles of metacells of the top 4,000 HVGs. Then we ran “pp.pca” and “tl.umap” functions on this expression matrix in turn to achieve the purpose of dimension reduction. Finally, we used the “tl.leiden” function to cluster metacells and set the parameter resolution as 0.05. In this way, a total of 10 metacell clusters were identified. We annotated these metacell clusters by using the expression of known marker genes. Marker genes included CD3E and CD3D for T cells; MS4A1 for B cells; TXK for NK cells; CPA3 and TPSAB1 for mast cells; AIF1 for myeloid cells; MZB1 and CD79A for epithelial cells; DCN for fibroblasts; ESAM for endothelial cells; CHGA and SCG5 for enteroendocrine cells; and TSPAN8, SMIM22, and ELF3 for enterocytes. Bulk RNA-seq data analysis We used the online platform CIBERSORTx ([124]https://cibersortx.stanford.edu/) to estimate the composition of various cell populations in bulk RNA-seq data. Signature gene matrices were created using the expression profiles of metacells as the reference gene expression profiles. We ran the “Impute Cell Fractions” module with default parameters and enabled S-mode batch correction. In this way, we obtained the proportions of each cell type for each bulk RNA-seq sample. Division of T cells and myeloid cells First, we extracted the expression matrices of the top 4,000 HVGs of T cells and myeloid cells, respectively. Then, we used the “pp.pca” and “tl.umap” functions to reduce the dimensions of the two expression matrices. Finally, we used the “tl.leiden” function to re-cluster T cells and myeloid cells, respectively, and set the parameter resolution as 0.25 for both. By this means, three subtypes in T cells and five subtypes myeloid cells were identified. We annotated these subtypes by using the expression of known marker genes. Marker genes included GZMK and CD8 for cytotoxic T cells; CCR7 for naive T cells; FOXP3 for regulatory T cells; FCGR3A for intermediate monocytes; FCN1 and VCAN for monocytes/macrophages; IFITF1 for non-classical monocytes; IDO1 and CLIC2 for conventional dendritic cells; and PLD4 and IRF8 for pDCs. Data and code availability This study did not generate new unique reagents. Raw data of this study are available on the GEO with GEO: [125]GSE183904 and [126]GSE134520 and the dbGaP with the identifier phs001818.v1.p1. Acknowledgments