Abstract

   Precancerous lesions typically precede gastric cancer (GC), but the
   molecular mechanisms underlying the transition from these lesions to GC
   remain unclear. Therefore, it is urgent to understand this transition
   from precancerous lesions to GC, which is crucial for the early
   diagnosis and treatment of GC. In this study, we merged multiple
   single-cell RNA sequencing datasets to investigate the molecular
   changes in distinct cell types associated with the progression of GC.
   First, we observed an increasing abundance of immune cells and a
   decrease in non-immune cells from non-atrophic gastritis to GC. Five
   immune cell types were significantly enriched in GC compared to
   precancerous lesions. Moreover, we found that the interleukin (IL)-17
   signaling pathway and Th17 cell differentiation were significantly
   up-regulated in immune cell subsets during GC progression. Some genes
   in these processes were predominantly expressed at the GC stage,
   highlighting their potential as diagnostic markers. Furthermore, we
   validated our findings using bulk RNA sequencing data from The Cancer
   Genome Atlas and confirmed consistent immune cell changes during GC
   progression. Our study provides insights into the immune infiltration
   and signaling pathways involved in the development of GC, contributing
   to the development of early diagnosis and targeted treatment strategies
   for this malignancy.

   Keywords: gastric cancer, tumor microenvironment, inflammation,
   single-cell RNA sequencing, IL-17 signaling pathway

Graphical abstract

   graphic file with name fx1.jpg
   [41]Open in a new tab
     __________________________________________________________________

   Lv and colleagues reveal that the immune microenvironment undergoes
   significant changes during the progression from inflammation-induced
   precancerous lesions to gastric cancer (GC). They identify an increase
   in immune cell populations and up-regulation of the IL-17 signaling
   pathway, highlighting potential biomarkers and therapeutic targets for
   early GC diagnosis and treatment.

Introduction

   Nowadays, cancer is one of the serious threats to public health. The
   number of new cancer cases and deaths is increasing year by year.[42]^1
   There are many hypotheses about the cause of cancer, including smoking,
   radiation, viruses, cancer-causing chemicals (carcinogens), and chronic
   inflammation.[43]^2^,[44]^3 As early as 1863, Balkwill et al.
   discovered that tumor tissue contains a large amount of inflammatory
   cell infiltration and put forward the hypothesis that tumors are
   originated from chronic inflammation.[45]^4 Inflammation is thought to
   be the body’s local protective response to injury or infection.[46]^5
   In recent years, studies have found that chronic or excessive
   inflammation can lead to cancer. More than 20% of cancers are caused by
   inflammation, such as gastritis for gastric cancer (GC),[47]^6
   pneumonia for lung cancer,[48]^7 and colitis for colorectal
   cancer.[49]^8 It is crucial to analyze the molecular mechanisms under
   this transformation to systematically study the transformation process
   from chronic inflammation into cancer and discover the important
   factors involved in this malignant transformation. Breakthroughs in
   this field will contribute to cancer prevention as well as early
   diagnosis and treatment.

   GC is one of the most common cancers in clinical practice, with a high
   incidence and mortality worldwide, especially in Asia.[50]^9 According
   to the World Health Organization Global Cancer Statistics Report 2020,
   it is estimated that the number of new patients with GC worldwide
   exceeds 1 million every year, and the main age of onset is between 60
   and 84 years old.[51]^10^,[52]^11 The link between intestinal
   metaplasia (IM) and GC was revealed in 1938.[53]^12 In 1955, Morson
   reported that GC occurred in IM.[54]^13 Intestinal GC is preceded by
   precancerous lesions, including non-atrophic gastritis (NAG), chronic
   atrophic gastritis (CAG), and IM.[55]^14 Patients with these
   precancerous lesions have an increased risk of developing GC.[56]^14

   Identifying the genes and pathways that drive cancer formation has been
   the focus of large-scale genomics research. However, most research has
   focused on the overall characteristics of advanced gastric tumors,
   largely ignoring related studies on precancerous lesions.[57]^8 Our
   understanding of the phenotypic changes and molecular drivers of the
   transition from precancerous lesions to GC remains inadequate.[58]^8
   The identification of the biomarkers in this transformation process
   could help in the early diagnosis of GC so that most patients with GC
   can get a radical cure to improve their health.

   In this study, we collected multiple single-cell RNA sequencing
   (scRNA-seq) datasets from the Gene Expression Omnibus (GEO). These
   include NAG, CAG, IM, GC tumor samples, and adjacent normal tissues
   (ANTs). By comparing the samples from gastritis to IM and then to GC,
   we found that the abundances of the five immune cell types vary
   identically. Meanwhile, all of them are significantly enriched in the
   GC. From IM to GC, interleukin (IL)-17 signaling pathway is
   up-regulated in the tumor, and its related genes are significantly
   differentially expressed in the tumor, such as TNF, IL17RA, IKBKG,
   TAB2, IL1B, and CASP8. These differentially expressed genes (DEGs) can
   be used as clinical diagnostic markers for the early diagnosis and
   treatment of GC. We have identified a set of genes that demonstrate
   up-regulation in immune cell subsets, suggesting their potential
   involvement in the disease process. While these genes show promise as
   biomarkers for the early diagnosis and treatment of GC, we acknowledge
   that the up-regulation of these genes is not exclusive to GC but is
   also observed in various immune-dependent disorders such as infections,
   allergies, and autoimmune diseases. This overlap in gene expression
   profiles complicates the specificity of these markers for GC and may
   lead to potential misdiagnosis in patients with concurrent diseases.

Results

Immune modifications during gastric carcinogenesis

   To identify markers of dynamic changes in the tumor immune
   microenvironment that are associated with the development and
   progression of GC from precancerous lesions, we obtained high-quality
   single-cell expression profiles of 211,056 high-quality cells from 58
   samples ([59]Figure S1A; [60]materials and methods). These included 3
   NAG samples, 3 CAG samples, 7 IM samples, 35 primary tumor samples, and
   10 adjacent normal samples ([61]Figure 1A). The tumor samples included
   one early GC sample, and the tumor samples ranged from stage I to stage
   IV ([62]Figure S1B). In order to identify different cell populations
   based on the expression of marker genes, we performed dimension
   reduction and unsupervised cell clustering using the method implemented
   in the Scanpy software suite[63]^15^,[64]^16^,[65]^17 and ultimately
   identified a total of 12 major cell clusters ([66]Figure 1B;
   [67]materials and methods). Based on the expression of known marker
   genes ([68]Figure 1C), all cell types in the gastric samples including
   immune cells, epithelial cells, fibroblasts, endothelial cells,
   enteroendocrine cells, and smooth muscle cells were revealed. Among
   them, immune cells consist of T cells, B cells, natural killer (NK)
   cells, mast cells, and myeloid cells. Epithelial cells have three
   subpopulations, S100P+ epithelial cells, MZB1+ epithelial cells, and
   CXCL17+ epithelial cells ([69]materials and methods).

Figure 1.

   [70]Figure 1
   [71]Open in a new tab

   Single-cell atlas of expression and changes in cellular composition
   during the development of GC

   (A) Uniform manifold approximation and projection (UMAP)
   representations of all scRNA-seq cells colored by the original tissue
   types.

   (B) UMAP representations and annotations of all cells.

   (C) Dot plot showing the expression of the marker genes of each cell
   type.

   (D) Boxplots of the cellular proportions of immune cells and non-immune
   cells in different samples. Each box represents data for a single
   tissue type. The boxplots and statistics are derived from 3 NAG
   samples, 3 CAG samples, 7 IM samples, 10 ANT samples, and 35 GC
   samples. Boxes represent the median, 25th percentile, and 75th
   percentile of the data, whiskers represent the highest and lowest
   values within 1.5 times the interquartile range of the boxplot, and all
   points are plotted.

   (E) Boxplots of immune cell type as in (D). t tests were performed for
   the proportion of adjacent tissue types and the proportion between IM
   and GC. p values smaller than 0.05 are shown in the plot.

   (F) Milo analysis of differential abundance changes between IM and GC
   samples. Neighborhoods that are significantly differentially abundant
   in IM samples are colored in red, and neighborhoods that are
   differentially abundant in GC samples are colored in blue.

   By tracking the changes in the proportion of these cell types in each
   sample, we found that the proportions of immune cell subsets gradually
   increase and the proportions of non-immune cell subsets gradually
   decrease from NAG to GC ([72]Figure 1D). Immune cells are significantly
   enriched at GC and ANT stages (t test, p < 0.05). Next, we captured the
   changes in the relative abundances of the five immune cell types
   separately. Their relative abundances show basically the same upward
   trend along the development of GC. We then tested whether the abundance
   of every immune cell type is significantly different between adjacent
   stages and found that the abundances of all five immune cell types were
   significantly different between IM and GC stages ([73]Figure 1E, t
   test, p < 0.05). We used another generalized linear model-based method
   (Milo) to test the difference in the abundances of cell types.[74]^18
   Milo’s results show that, relative to IM, all five immune cell types we
   identified were enriched at the GC stage, which is consistent with the
   previous observations ([75]Figure 1F). Taken together, these data
   suggest that the turning point in the trajectory of immune evasion of
   GC is at the IM stage.

Activated signaling pathways during GC progression

   Since the turning point in the trajectory of immune evasion is at the
   IM stage, we next explored the altered biological processes during this
   turning point. The NEBULA algorithm[76]^19 was used to obtain the DEGs
   within five immune cell subsets between IM and GC ([77]Figure 2A;
   [78]materials and methods). We then performed enrichment analysis on
   these DEGs within each immune cell subset separately and found that
   Th17 cell differentiation is significantly up-regulated at the GC stage
   compared with IM for all five immune cell subsets ([79]Figures S2–S4;
   [80]materials and methods). Th17 cells are a subset of effector T
   helper cells that produce IL-17 pro-inflammatory factors.[81]^20
   Besides, the IL-17 signaling pathway is significantly up-regulated in
   subsets of T cells, B cells, mast cells, and myeloid cells at the GC
   stage compared with IM ([82]Figures S2–S4). IL-17 is an effective
   pro-inflammatory cytokine, which has been confirmed to be closely
   related to the formation, growth, and metastasis of tumors in a variety
   of cancer types.[83]^21 IL-17 binds to its receptor, IL-17R, to form a
   heterodimeric receptor complex, which activates downstream pathways
   such as nuclear factor κB (NF-κB), Microtubule Affinity-Regulating
   Kinase (MAPKs), and CCAAT/Enhancer Binding Protein (C/EBPs) and induces
   the expression of antimicrobial peptides, cytokines, and
   chemokines.[84]^21 This is consistent with our observation that the
   NF-κB signaling pathway and MAPK signaling pathway are significantly
   up-regulated in all subsets of immune cells during GC
   ([85]Figures S2–S4).

Figure 2.

   [86]Figure 2
   [87]Open in a new tab

   Gene expression in GC development

   (A) Volcano plots of the results of differential expression analysis in
   immune cells between IM and GC by using the NEBULA algorithm. Red dots
   represent genes up-regulated (log FC ≥ 1 and p < 0.05) in GC, and blue
   dots represent genes down-regulated (log FC ≤ −1 and p < 0.05) in GC.

   (B and C) Dot plots of representative genes in the IL-17 signaling
   pathway mapped onto tissue types and cell types.

   (D and E) Dot plots of representative genes in Th17 cell
   differentiation mapped onto tissue types and cell types.

   Since Th17 cell-associated activities are the common event in altered
   immune cells in GC, we further examined the expression of specific
   genes involved in the IL-17 signaling pathway and Th17 differentiation.
   In the IL-17 signaling pathway, some genes were only expressed at the
   GC stage or up-regulated at the GC stage, such as TNF, IL17RA, IKBKG,
   TAB2, IL1B, and CASP8, among others ([88]Figure 2B). As shown in
   [89]Figure 2C, IL1B is mainly expressed by myeloid cells, CASP8 is
   mainly expressed by T and NK cells, and TNF is mainly expressed by
   myeloid cells, but TNF is also expressed by other immune cells. These
   results suggest that the IL-17 signaling pathways in GC require
   coordination between diverse immune cell types. Genes that contribute
   to Th17 cell differentiation are IL1B, IL2RB, ZAP70, HLA-DQA1, LAT, and
   IL4R. We found that many genes related to Th17 cell differentiation
   were expressed only at the GC stage or up-regulated at the GC stage,
   whereas none of these genes were found to be expressed in precancerous
   lesion samples ([90]Figure 2D). Meanwhile, we found that IL1B is mainly
   expressed by myeloid cells. Additionally, some genes are mainly
   expressed by two or three immune cells, such as ZAP70, IL2RB, LCK, and
   CD247 are mainly expressed by T and NK cells, HLA-DQA1 is mainly
   expressed by myeloid cells and B cells, and LAT is mainly expressed by
   T, NK, and mast cells. In addition, some genes are expressed in all
   immune cells, for instance, IL4R and RARA ([91]Figure 2E). These
   results further confirm that the IL-17 signaling pathway and Th17 cell
   differentiation in GC require the coordination between diverse immune
   cell types.

Immune infiltration of GC samples in TCGA

   To track the changes in immune cell states along GC progression, we
   obtained the bulk RNA-seq data of GC from The Cancer Genome Atlas
   (TCGA)[92]^22 and used the CIBERSORTx[93]^23 platform to deconvolute
   these GC bulk RNA-seq data to infer the cellular proportion of each
   sample of the TCGA GC data. Due to the limited storage space on the
   CIBERSORTx platform (1,000 megabytes), we used the Metacell-2 algorithm
   to reduce the number of cells in the merged scRNA-seq dataset.[94]^24
   Finally, the derived metacell dataset contains 11,337 metacells
   ([95]materials and methods). We identified 10 major cell clusters based
   on marker gene expression ([96]Figures 3A and 3B; [97]materials and
   methods). Although the granularity of metacell clusters is slightly
   different from those of single-cell clusters identified by single-cell
   expression profile, the major cell types in gastric tissues are all
   revealed, and the five immune cell types that we focus on in this study
   are consistent. Since the metacell dataset is still bigger than 1,000
   megabytes, we downsampled approximately 75% of the metacells multiple
   times to ensure that the deconvolution results are stable.

Figure 3.

   [98]Figure 3
   [99]Open in a new tab

   Immune cell composition of GC samples in TCGA

   (A) UMAP representations and annotations of metacells.

   (B) Dot plot of the expression of the marker genes for each cell type.

   (C) Violin plots of the proportion of each immune cell type in
   different samples from bulk RNA-seq dataset. Each violin represents
   data for a single GC stage. t test p values between any two stages less
   than 0.05 are shown in the plot.

   (D) Boxplots of the proportion of each immune cell type in different
   samples from scRNA-seq data (the first row) and bulk RNA-seq data (the
   second row). Each box represents data for a single tissue type or GC
   stage. The boxplots and statistics are derived from 3 NAG samples, 3
   CAG samples, 7 IM samples, 1 early GC samples, 4 stage I GC samples, 6
   stage II GC samples, 17 stage III GC samples, and 7 stage IV GC samples
   in the scRNA-seq data and 14 stage I GC samples, 70 stage II GC
   samples, 157 stage III GC samples, and 95 stage IV GC samples in the
   bulk RNA-seq data. T0, early GC; T1–T4, stage I–IV GC.

   Using metacells as the reference gene expression profiles, we
   deconvolved the GC TCGA data and obtained the inferred proportion of
   each cell type in each sample ([100]materials and methods). According
   to the clinical information, we grouped the samples by the stage of GC.
   As shown in [101]Figure 3C, the proportions of five immune cells show
   an overall upward trend with the progression of GC, and the proportions
   of T cells, B cells, and myeloid cells showed significant differences
   in different stages. Next, we compared the changes in the proportion of
   each immune cell in the four tumor stages of GC in the single-cell data
   and TCGA data. We found that the proportion of immune cells in the two
   data showed a similar trend during tumor progression ([102]Figure 3D).
   These data validate the changes in the abundance of immune cell subsets
   found in our single-cell data.

Signaling pathway analysis of subsets of T cells and myeloid cells

   Next, we further subdivided T cells and myeloid cells into subtypes and
   examined changes in the abundance of these two immune cell subtypes as
   well as their differential gene expression. We used the Scanpy software
   suite to perform dimension reduction and unsupervised cell clustering
   for both T cells and myeloid cells.[103]^15^,[104]^16^,[105]^17 We then
   annotated the subtypes based on the expression of known marker genes
   ([106]materials and methods). Ultimately, three T cell subtypes were
   identified, including regulatory T cells, cytotoxic T cells, and naive
   T cells ([107]Figure 4A). Meanwhile, five subtypes of myeloid cells
   were identified, including intermediate monocytes,
   monocytes/macrophages, non-classical monocytes, conventional dendritic
   cells (DCs), and plasmacytoid DCs (pDCs) ([108]Figure 4B).

Figure 4.

   [109]Figure 4
   [110]Open in a new tab

   The composition of subtypes of T cells and myeloid cells

   (A and B) UMAP representations and annotations of subtypes in T cells
   and myeloid cells.

   (C and D) Boxplots of the proportion of each T cell subtype and each
   myeloid cell subtype in different samples.

   (E) Dot plots of representative genes in the IL-17 signaling pathway
   mapped onto cell types.

   (F) Dot plots of representative genes in Th17 cell differentiation
   mapped onto cell types.

   Next, we examined the changes in abundance of the eight immune cell
   subtypes obtained above and used the t test to determine the
   significance of differences in abundance between the adjacent stages.
   We found the abundance of T cell and myeloid cell subtypes gradually
   increased from NAG to GC ([111]Figures 4C and 4D). The common feature
   among the changes in the abundance of the eight subtypes is that the
   difference in abundance between IM and GC was significant, and all
   eight subtypes were relatively significantly enriched in the GC and ANT
   stages (t test, p < 0.05). This is consistent with our previous
   conclusion that the turning point in the trajectory of immune evasion
   of GC is at the IM stage.

   From IM to GC, we used the NEBULA algorithm to obtain DEGs within
   subsets of T cells and myeloid cells. Then, we examined the DEGs within
   each subset and performed Kyoto Encyclopedia of Genes and Genomes
   (KEGG) pathway enrichment analysis using these genes. In the
   up-regulated gene enrichment analysis, we found that compared to IM,
   the IL-17 signaling pathway was significantly up-regulated only in
   regulatory T cells, intermediate monocytes, and monocytes/macrophages
   during the GC stage. Additionally, we found that Th17 cell
   differentiation was significantly up-regulated only in regulatory
   T cells, cytotoxic T cells, naive T cells, intermediate monocytes, and
   pDCs during the GC stage. At the same time, we examined the expression
   of genes involved in the IL-17 signaling pathway that were up-regulated
   at the GC stage in regulatory T cells, intermediate monocytes, and
   monocytes/macrophages. We found that genes expressed or up-regulated
   only at the GC stage were the same as when T cells and myeloid cells
   were not divided. We also examined the expression of genes involved in
   Th17 cell differentiation that were up-regulated at the GC stage in
   regulatory T cells, cytotoxic T cells, naive T cells, intermediate
   monocytes, and pDCs. We found that genes expressed or up-regulated only
   at the GC stage were almost the same as when T cells and myeloid cells
   were not divided, and only IL4R was not found. Finally, we examined the
   expression of these genes in different cell types and did not find any
   genes that were specifically expressed only in a subset
   ([112]Figures 4E and 4F).

   In summary, these data and results suggest that only some, but not all,
   immune subsets are involved in the activities related to the IL-17
   signaling pathway and Th17 cell differentiation. This also provides
   further evidence that the IL-17 signaling pathways in GC require
   coordination between diverse immune cell types.

Discussion

   Cancer is a growing threat to public health, with increasing incidence
   and mortality rates. Chronic inflammation has been recognized as a
   contributing factor to cancer development, and it has been observed
   that chronic or excessive inflammation can lead to a significant number
   of cancer types. In this study, we aimed to investigate the molecular
   mechanisms involved in the transformation of inflammation into GC and
   identify potential biomarkers for early diagnosis and treatment.

   Our study revealed the immune modifications during gastric
   carcinogenesis. By analyzing scRNA-seq data from multiple samples of
   NAG, CAG, IM, GC at different stages, and ANTs, we identified distinct
   immune cell populations and their dynamic changes throughout the
   progression of GC. We observed a gradual increase in the proportion of
   immune cell subsets and a decrease in non-immune cell subsets from NAG
   to GC, indicating a shift toward immune cell enrichment in the tumor
   microenvironment. Furthermore, we found that the abundances of all five
   immune cell types (T cells, B cells, NK cells, mast cells, and myeloid
   cells) were significantly different between IM and GC stages, with
   enrichment in the GC samples. This suggests that the turning point in
   the trajectory of immune evasion in GC occurs at the IM stage. These
   observations highlight the importance of understanding the immune cell
   dynamics and their functional changes during the progression of GC.

   We then focused on the IL-17 signaling pathway, which plays a crucial
   role in inflammation and has been implicated in various cancers. Our
   analysis revealed the up-regulation of the IL-17 signaling pathway in
   subsets of T cells, B cells, mast cells, and myeloid cells at the GC
   stages compared to IM. The IL-17 signaling pathway is known to activate
   downstream pathways, such as NF-κB and MAPKs, and induce the expression
   of pro-inflammatory cytokines and chemokines. The significant
   up-regulation of the NF-κB and MAPK signaling pathways in immune cells
   during GC further supports the involvement of the IL-17 pathway in
   gastric carcinogenesis. Moreover, we identified several specific genes
   involved in the IL-17 signaling pathway and Th17 cell differentiation
   that were expressed or up-regulated in GC samples, including TNF,
   IL17RA, IKBKG, TAB2, IL1B, and CASP8. These genes may serve as
   potential clinical diagnostic markers for early detection and treatment
   of GC. The coordinated expression of specific genes involved in the
   IL-17 signaling pathway and Th17 cell differentiation was observed
   across diverse immune cell types, emphasizing the interplay and
   cooperation between different immune subsets in GC. Meanwhile, in-depth
   analysis of immune cell subtypes, specifically T cells and myeloid
   cells, revealed that only a portion of immune subsets are involved in
   the activities related to the IL-17 signaling pathway and Th17 cell
   differentiation.

   To validate our findings, we analyzed bulk RNA-seq data from TCGA and
   confirmed the changes in immune cell proportions observed in our
   scRNA-seq data. The similarities in immune cell dynamics between the
   two datasets further support the reliability of our findings.

   In conclusion, our study provides comprehensive insights into the
   immune modifications and signaling pathways involved in gastric
   carcinogenesis. The identification of immune cell dynamics and the
   up-regulation of the IL-17 signaling pathway during GC progression
   highlight potential biomarkers for early diagnosis and treatment. Some
   genes involved in the IL-17 pathway and Th17 cell differentiation serve
   as potential drug targets for clinical applications. Further research
   and validation of these findings could contribute to improved cancer
   prevention, early detection, and therapeutic strategies for GC.

Materials and methods

Collection of scRNA-seq and bulk RNA-seq data

   scRNA-seq data were obtained from the GEO database under GEO:
   [113]GSE183904
   [114]^9 and [115]GSE134520,[116]^14 and from the Database of Genotypes
   and Phenotypes (dbGaP) under the identifier phs001818.v1.p1.[117]^25 In
   total, 58 samples from 43 patients were selected for analysis. These
   included 3 NAG samples, 3 CAG samples, 7 IM samples, 35 primary tumor
   samples, and 10 adjacent normal samples.

   Bulk RNA-seq data of patients with GC were downloaded from TCGA
   database ([118]https://portal.gdc.cancer.gov/).

Quality control and preprocessing of scRNA-seq data

   We performed quality control on each dataset using the method
   implemented in the Scanpy software suite[119]^17 before merging them.
   Genes detected in fewer than three cells and cells expressing less than
   200 detected genes were filtered out and excluded from downstream
   analysis. Next, cells would be labeled as poor-quality ones if they
   presented with one of the following conditions: (1) the number of
   Unique Molecular Identifier (UMIs) is lower than 600 or larger than
   120,000, (2) the number of expressed genes is lower than 400 or larger
   than 7,000, or (3) 20% or more of the UMIs were mapped to mitochondrial
   genes.[120]^14 Then, we continued to utilize the functions in the
   Scanpy to normalize the gene expression level to UMIs Per Million (UPM)
   and log transform the single-cell gene expression data. Finally, after
   merging these datasets, we obtained 211,056 high-quality cells, and
   15,569 genes were kept in the merged dataset, which was then used for
   subsequent analysis.

Dimension reduction, unsupervised clustering, and annotation of scRNA-seq
data

   Dimension reduction, unsupervised clustering, and annotation of
   scRNA-seq data were done using functions in the Scanpy software package
   (v.1.9.1).

   We identified a subset of highly variable genes (HVGs) in the
   preprocessed expression matrix and then scaled gene expression values
   before performing dimension reduction and clustering on them. The
   selection of HVGs in scRNA-seq data was implemented by the
   “pp.highly_variable_genes” function in the Scanpy package. The valid
   value of average expression was set to the range from 0.05 to 5,
   dispersion was set to no less than 0.5, and the default values were
   used for other parameters. Finally, the 4,000 HVGs were selected for
   further analysis.

   Next, we used the “pp.pca” function to perform the principal-component
   analysis on the single-cell expression matrix with genes restricted to
   HVGs and retained the top 50 principal components for subsequent
   analysis. Using these data, we performed Harmony and BBKNN to construct
   the K-Nearest Neighbors (KNN) graph of high-quality cells. In the same
   graph, we used the “tl.umap” function to project cells to 2D space for
   visualization. Meanwhile, we utilized the “tl.leiden” function to
   conduct an unsupervised graph-based clustering algorithm called Leiden
   to cluster cells. Due to the large number of cells in our study, we set
   the parameter resolution as 4. In this way, a total of 12 major cell
   clusters were identified.

   We annotated these cell clusters using the expression of known marker
   genes. Marker genes include CD3E and CD3D for T cells; MS4A1 for B
   cells; NKG7 and GZMB for NK cells; CPA3 for mast cells; AIF1 for
   myeloid cells; S100P and AGR2 for S100P+ epithelial cells; MZB1 and
   CD79A for MZB1+ epithelial cells; MUC6 for CXCL17+ epithelial cells;
   DCN and MMP2 for fibroblasts; RAMP2 and PLVAP for endothelial cells;
   CHGA and SCG5 for enteroendocrine cells; and RGS5 for smooth muscle
   cells.

Differential gene expression analysis

   Differential gene expression analysis was performed using the tool
   NEBULA, which performs differential expression based on a negative
   binomial mixed model for two annotated cell groups.[121]^19 In R
   (v.4.2.2), we performed differential gene expression analysis by using
   the NEBULA software package (v.1.2.2) to find DEGs in each immune cell
   type between IM and GC states. If the log-transformed fold change (log
   FC) is ≥1 and the p value is <0.05, then the gene was defined as
   up-regulated. If the log FC is ≤ −1 and the p is <0.05, then the
   corresponding gene was defined as down-regulated.

Gene set enrichment analysis

   We performed the gene set enrichment analysis of DEGs by using the
   online platform WebGestalt ([122]http://www.webgestalt.org/), with
   which the enriched KEGG pathways were derived, and we defined the
   pathways with a false discovery rate ≤ 0.05 as the significantly
   enriched pathways.

Metacell-2 for scRNA-seq data

   We loaded the Metacells package (v.0.8.0)[123]^24 into python
   (v.3.9.12) and ran the Metacell-2 algorithm on scRNA-seq data with
   default parameters. In the end, we obtained 11,337 metacells for
   subsequent analysis.

Annotation of metacells

   We applied the same methods for dimension reduction, clustering, and
   annotation to metacells. First, we extracted the gene expression
   profiles of metacells of the top 4,000 HVGs. Then we ran “pp.pca” and
   “tl.umap” functions on this expression matrix in turn to achieve the
   purpose of dimension reduction. Finally, we used the “tl.leiden”
   function to cluster metacells and set the parameter resolution as 0.05.
   In this way, a total of 10 metacell clusters were identified.

   We annotated these metacell clusters by using the expression of known
   marker genes. Marker genes included CD3E and CD3D for T cells; MS4A1
   for B cells; TXK for NK cells; CPA3 and TPSAB1 for mast cells; AIF1 for
   myeloid cells; MZB1 and CD79A for epithelial cells; DCN for
   fibroblasts; ESAM for endothelial cells; CHGA and SCG5 for
   enteroendocrine cells; and TSPAN8, SMIM22, and ELF3 for enterocytes.

Bulk RNA-seq data analysis

   We used the online platform CIBERSORTx
   ([124]https://cibersortx.stanford.edu/) to estimate the composition of
   various cell populations in bulk RNA-seq data. Signature gene matrices
   were created using the expression profiles of metacells as the
   reference gene expression profiles. We ran the “Impute Cell Fractions”
   module with default parameters and enabled S-mode batch correction. In
   this way, we obtained the proportions of each cell type for each bulk
   RNA-seq sample.

Division of T cells and myeloid cells

   First, we extracted the expression matrices of the top 4,000 HVGs of
   T cells and myeloid cells, respectively. Then, we used the “pp.pca” and
   “tl.umap” functions to reduce the dimensions of the two expression
   matrices. Finally, we used the “tl.leiden” function to re-cluster
   T cells and myeloid cells, respectively, and set the parameter
   resolution as 0.25 for both. By this means, three subtypes in T cells
   and five subtypes myeloid cells were identified.

   We annotated these subtypes by using the expression of known marker
   genes. Marker genes included GZMK and CD8 for cytotoxic T cells; CCR7
   for naive T cells; FOXP3 for regulatory T cells; FCGR3A for
   intermediate monocytes; FCN1 and VCAN for monocytes/macrophages; IFITF1
   for non-classical monocytes; IDO1 and CLIC2 for conventional dendritic
   cells; and PLD4 and IRF8 for pDCs.

Data and code availability

   This study did not generate new unique reagents. Raw data of this study
   are available on the GEO with GEO: [125]GSE183904
   and [126]GSE134520 and the dbGaP with the identifier phs001818.v1.p1.

Acknowledgments