Abstract In this study, we comprehensively charted the cellular landscape of colorectal cancer (CRC) and well-matched liver metastatic CRC using single-cell and spatial transcriptome RNA sequencing. We generated 41,892 CD45^− nonimmune cells and 196,473 CD45^+ immune cells from 27 samples of six CRC patients, and found that CD8_CXCL13 and CD4_CXCL13 subsets increased significantly in liver metastatic samples that exhibited high proliferation ability and tumor-activating characterization, contributing to better prognosis of patients. Distinct fibroblast profiles were observed in primary and liver metastatic tumors. F3^+ fibroblasts enriched in primary tumors contributed to worse overall survival by expressing protumor factors. However, MCAM^+ fibroblasts enriched in liver metastatic tumors might promote generation of CD8_CXCL13 cells through Notch signaling. In summary, we extensively analyzed the transcriptional differences of cell atlas between primary and liver metastatic tumors of CRC by single-cell and spatial transcriptome RNA sequencing, providing different dimensions of the development of liver metastasis in CRC. __________________________________________________________________ Different cellular maps exist in CRC primary and liver metastatic sites, indicating the heterogeneity in different settings. INTRODUCTION Colorectal cancer (CRC) is the third most common malignant tumor worldwide ([48]1). About 21 to 26% of patients present with synchronous metastatic diseases ([49]2), in which 14.5 to 17.5% of patients develop liver metastases ([50]3, [51]4). Metastatic CRCs are associated with poor survival rates, and CRC-derived liver metastasis (CRCLM) leads to a great clinical challenge ([52]3). The metastatic process is a multistep event that entails cancer cells to escape from the primary site, survive in circulation, seed at distant sites, and grow ([53]5, [54]6). It is well established that metastatic spread is promoted by communication between cancer cells and stromal cells via secretion of cytokines, growth factors, and proteases that remodel the tumor microenvironment (TME) ([55]7). The cellular compositions of TME is highly complex, including diverse populations of fibroblasts and immune cells that play important roles in cancer evasion, metastasis, and responses to treatment ([56]8, [57]9). Cancer-associated fibroblast (CAF) is a predominant stromal cellular component in many types of malignancies including CRCs, breast cancers, and pancreatic cancers ([58]10–[59]12). Several subsets of CAFs with different functions have been identified in a variety of tumors. CD10^+GPR77^+ CAFs can promote breast and lung cancer chemoresistance and tumor formation by providing a survival niche for cancer stem cells ([60]13). There are also antigen-presenting CAFs, which can activate CD4^+ T cells in an antigen-specific fashion ([61]14). The tumor immune microenvironment (TIME) is composed of diverse populations of lymphocytes and myeloid cells. CD8^+ T cells have been proven as formidable “soldiers” in antitumor immune responses. The infiltration of CD8^+ T cells in solid tumors predicts a favorable prognosis ([62]15). The immune milieu of primary CRCs has been described by single-cell RNA sequencing (scRNA-seq) ([63]16, [64]17), but the global cellular landscape of CRCLM and the interaction network differences between primary and metastatic CRC are rarely reported. To comprehensively chart the cellular landscape of CRC and matched liver metastasis, in this study, we generated high-resolution cell maps using scRNA-seq in conjunction with spatial transcriptome (ST) and immunohistochemistry (IHC) staining as well as flow cytometry. Different phenotypic profiles and highly variable frequencies of major cell types, such as CAF, CD8^+ T cells, and tumor-associated macrophages, were observed between CRC primary tumor and liver metastasis, which would indicate the molecular heterogeneity within TME and provide potential targets for future cancer therapy. RESULTS Global single-cell transcriptome map of CRC primary tumors and CRC-derived liver metastatic tumors To elucidate the cellular heterogeneity of CRC primary tumors and CRC-derived liver metastatic tumors, CD45^− nonimmune cells and CD45^+ immune cells were collected from primary colorectal cancer (CC), adjacent normal colorectal mucosa (CN), liver metastasis (LM), adjacent normal liver tissue (LN), and peripheral blood (PB) from six patients with CRC for single-cell transcriptome analysis ([65]Fig. 1A). After quality control and filtration, we obtained 41,892 CD45^− nonimmune cells and 196,473 CD45^+ immune cells from 27 samples and identified 23 clusters of nonimmune cells and 41 clusters of immune cells. With marker-based annotations, tumor cells (EPCAM and SOX9), fibroblasts (COL1A1 and COL1A2), and endothelial cells (PECAM1 and CD34) were identified from nonimmune cells (fig. S1A), while T cells (CD3. pAbO), natural killer (NK) cells (CD56. pAbO), B cells/plasma cells (CD19. pAbO), monocytes/macrophages (CD14. pAbO), dendritic cells (DCs) (HLA.DRA), and mast cells (TPSAB1) were identified from immune cells (fig. S1B). Each subset had a distinct gene expression pattern (fig. S1, C and D). Fig. 1. Global cellular landscape in CRC liver metastasis. [66]Fig. 1. [67]Open in a new tab (A) Schematic overview of the experimental design and analytical workflow. (B) UMAP visualization of nonimmune cell clusters. (C) Volcano plot comparing cell type relative abundance of nonimmune cell clusters in CC versus LM (n = 5 patients). The x axis represents the log[2] fold change, and the y axis represents the −log[10] P value according to Mann-Whitney U test. Each dot represents a cell type. (D) UMAP visualization of immune cell clusters. (E) Volcano plot comparing cell type relative abundance of immune cell clusters in the tumors versus the paratumors (n = 6 patients). x and y axes represent the log[2] fold change according to Mann-Whitney U test. Each dot represents a cell type. Further unsupervised clustering in the major population of nonimmune cells gave rise to 11 tumor cell clusters, 8 fibroblast clusters, and 6 endothelial cell clusters ([68]Fig. 1B). CA2 and F2_MCAM were enriched in LM. F4_F3 and F5_CCL11 were significantly enriched in CC ([69]Fig. 1C and fig. S1E). On the basis of the expression of known markers, the immune cells were divided into 41 populations ([70]Fig. 1D). In the primary TME, the percentages of regulatory T cells (T[regs]) and macrophages were higher compared with those in the CN control. Although monocytes, NK cells, and mucosal-associated invariant T (MAIT) cells were predominant in LN, TIME was remodeled by metastatic tumor cells with significant up-regulation of T[regs] and macrophages ([71]Fig. 1E and fig. S1F). In summary, we identified multiple cell populations with distinct distribution patterns in different tissues of CRCLM. Spatial distribution profile of CRC primary tumors and CRC-derived liver metastatic tumors To comprehensively analyze the spatial distribution profile of CRC primary tumors and CRC-derived liver metastatic tumors, we collected six tissue specimens from six patients including four cases of CRC primary tumors (C1 to C4) and two cases of liver metastatic tumors (L1 and L2). The tumor area (T) and paratumor area (PT) were identified by hematoxylin and eosin (H&E) staining and gene expression features of each sample ([72]Fig. 2A and fig. S2). Compared with paratumor tissues, tumor tissues were enriched with cell cycle–related pathways, such as G[2]-M checkpoint, E2F targets, and mitotic spindle ([73]Fig. 2B). According to the gene expression profiles, tumor tissues and paratumor tissues were divided into different regions ([74]Fig. 2A). Considering that each spot contained multiple cells, we adopted a signature-based strategy, which integrated the ST and scRNA-seq data to estimate the proportions of different cell types for each captured spot ([75]18). B cells, T cells, NK cells, plasma cells, myeloid cells, tumor cells, fibroblasts, and endothelial cells were identified in the ST tissues. The proportions and scores of these cell clusters varied in each region. The ratios of plasma cells decreased in tumor tissues compared with those in paratumor tissues of C1, C3, and C4 ([76]Fig. 2, C and D, and fig. S3). Fig. 2. Cellular identification in spatial transcriptomic samples. [77]Fig. 2. [78]Open in a new tab (A) Overview of the spatial transcriptomic sections. H&E staining of spatial transcriptomic sections (left). Tumor tissue and paratumor tissue identification of each section (middle). Spatial cluster distribution of each section (right). (B) Heatmap showing the enrichment scores of hallmark gene sets of tumor tissue and paratumor tissue in each section. (C) Cluster identification combined with single-cell RNA expression profiles in primary CRC sections. Left: SPOTlight deconvolution results, which show the cell cluster proportions in each spot. Middle: Cluster definition of each spot, which is identified as the most dominant cell cluster there. Right: Proportions of identified cell clusters in different regions, which represent the mean proportions of cell clusters in all spots of each region. (D) Cluster identification combined with single-cell RNA expression profiles in liver metastatic sections. Transcriptomic heterogeneity of tumor cells between primary and liver metastatic tumors Studies have revealed that human colorectal tumor cells contain multiple cell types whose transcriptional characteristics mirror that of the normal colorectal epithelium ([79]19). Stem-like/transit amplifying cells, colonocytes, goblet cells, and tuft cells have been identified in the normal epithelium ([80]16). Thus, subclustering analysis was performed and revealed divergent differentiation lineages of tumor cells, namely, CA1 to CA11 (fig. S4A). The differentiation marker genes of tuft cells, such as LRMP and TRPM5, were identified in CA8. CA9 showed higher expression of MUC2, a marker for goblet cells (fig. S4B). The heterogeneity of tumor cell populations suggests the differentiation potential of tumor cells. It has been reported that tumor tissues generated from a single tumor cell can recapitulate the cellular diversity of the parental tumors ([81]19). The population diversity of tumor cells in LM was similar to that of CC ([82]Fig. 1C and fig. S4C). To further investigate the functional pathways and transcriptional programs of different tumor cell clusters, we performed gene set variation analysis (GSVA) and transcription factor (TF) analysis. The results showed that CA2 was enriched with Wnt–β-catenin signaling, and the TF LEF1, which plays an important role in the Wnt–β-catenin signaling pathway, was up-regulated in CA2. Furthermore, CA2 exhibited the up-regulation of the somatic stem cell division pathway (fig. S4D). Consistently, CA2 had the highest expression level of LGR5, a marker of intestinal stem cells. These results indicated that CA2 might be the stem cell subset in TME. In addition, CA2 had high expression levels of EPCAM, CDH1, and PRSS2, facilitating the adhesion and colonization of tumor cells (fig. S4, B and E). Collectively, cluster identification and characterization analysis demonstrated the functional diversification of tumor cells, and metastatic tumors can recapitulate the cellular diversity of the parental tumors. TME reshapes the composition of myeloid cells Myeloid cells are heterogeneous subsets in TME, and three kinds of myeloid cells were identified: monocytes/macrophages, DCs, and mast cells. Using subclustering analysis, we identified eight clusters of monocytes/macrophages and three clusters of conventional DCs (cDCs) ([83]Fig. 3A), and each subset had a distinct gene expression pattern (fig. S5, A and B). There were distinct distribution patterns of myeloid cells in each site ([84]Fig. 3B), demonstrating the organ-specific characterization. CD14^+ monocyte was the most enriched subset in PB (fig. S5C). Compared with those in the paratumors, the Mac_SPP1 subset was enriched in the tumors, and most of the cells in the Mac_SPP1 subset were from the primary tumors. In addition, the proportion of Mac_CXCL9 was significantly increased in LM ([85]Fig. 3C and fig. S5D), which had high expression level of CXCL9, suggesting that this subset could recruit CXCR3-positive effector T cells into tumors. Genes involved in recruiting myeloid cells especially granulocytes, such as CXCL3, were enriched in the Mac_SPP1 subset ([86]Fig. 3D). Myeloid-derived suppressive cells can express high level of CXCR2, which is the receptor of CXCL3 ([87]20) and can suppress the CD8^+ T cells by expressing arginase and inducible nitric oxide synthase (iNOS) ([88]21). GSVA analysis showed that the Mac_SPP1 subset was engaged in the pathway related to inflammatory response. However, the Mac_CXCL9 subset was enriched in the interferon-γ (IFN-γ) response and T cell activation pathways ([89]Fig. 3E). The trajectory analysis showed that both the Mac_CXCL9 and Mac_SPP1 subsets were terminally differentiated, indicating that they were tumor-activated subsets ([90]Fig. 3F). Fig. 3. Immune landscape of myeloid cells. [91]Fig. 3. [92]Open in a new tab (A) UMAP visualization of myeloid cell clusters. (B) Proportions of the myeloid cell clusters in CN, CC, LN, LM, and PB. (C) Proportions of MAC_SPP1, MAC_CXCL9, and cDC_LAMP3 subsets in myeloid cells. P values were determined by the paired nonparameter test. (D) Volcano plot showing differentially expressed genes between MAC_CXCL9 and MAC_SPP1 subsets. (E) GSVA analysis showing pathways enriched in MAC_CXCL9 and MAC_SPP1 subsets. (F) Monocle analysis showing the developmental trajectory of myeloid cells. (G) Feature plots showing expressions of LAMP3 and CCR7 in myeloid cells. (H) Violin plots showing expressions of CD40, CD80, and CD86 in myeloid cells. DCs can be divided into plasmacytoid DCs (pDCs) and cDCs according to their cytokine expression and function. Different subsets of cDCs, including cDC1 and cDC2, have been identified in TME ([93]22). In our data, both cDC1 (cDC_CPNE3) and cDC2 (cDC_CD1c) were identified in CC and LM ([94]Fig. 3A). Moreover, cDC_LAMP3 was also identified and enriched in LM ([95]Fig. 3C and fig. S5D). cDC_LAMP3 expressed high levels of CCR7 ([96]Fig. 3G), which is the receptor of CCL19 and CCL21, indicating that it had migration ability to the lymph node. Furthermore, the cDC_LAMP3 subset exhibited high expression levels of costimulatory molecules CD40, CD80, and CD86, markers for DC maturation ([97]Fig. 3H). The percentage of the cDC_LAMP3 subset was increased significantly in LM, indicating the education of tumor cells on TIME. CXCL13^+ T cells are enriched in liver metastatic tumors T cells, especially CD8^+ T cells and CD4^+ T cells, are predominant in the adaptive immune response. Among T cells, seven clusters of CD8^+ T cells, five clusters of conventional CD4^+ T (cCD4) cells, and two clusters of T[regs] were identified. Both CD8^+ T cells and CD4^+ T cells included a cluster expressing CXCL13, a chemokine for CXCR5 ([98]Fig. 4, A and B, and fig. S6, A to D). The percentage of the CD8_CXCL13 cells was significantly increased in LM compared with those in LN; however, this subset was rarely detected in CN, LN, and PB ([99]Fig. 4, C and D). In the steady state of the colon, about 3 to 12% of CD4 cells were CXCL13 positive, which is denoted as follicular T helper (T[FH]). In CC, the percentage of CD4_CXCL13 was decreased, but the percentage of this subset was increased in LM compared with that in LN ([100]Fig. 4E). Fig. 4. Transcriptional reprogramming of tumor-infiltrated T cells. [101]Fig. 4. [102]Open in a new tab (A) Proportions of the CD8^+ T cell clusters in CN, CC, LN, LM, and PB. (B) Proportions of the CD4^+ T cell clusters in CN, CC, LN, LM, and PB. (C) Volcano plot comparing cell type relative abundance of T cell clusters in the tumors versus the paratumors (n = 6 patients). X and y axes represent the log[2] fold change according to Mann-Whitney U test. Each dot represents a cell type. (D) Proportions of CD8_CXCL13 subsets in CD8^+ T cells. (E) Proportions of CD4_CXCL13 subsets in CD4^+ T cells. (F) GSVA pathway analysis of CD8^+ T cell clusters. (G) GSVA pathway analysis of CD4^+ T cell clusters. (H) Ki67 expression levels in different cell subsets in CN, CC, LN, and LM by flow cytometry. Gated on tumor-infiltrated CD8^+CD45RO^+ T cells. To further investigate the characteristics of the CXCL13^+ T subset, we analyzed the pathways up-regulated in each subset. GSVA analysis revealed that the CD8_CXCL13 subset was enriched with T cell proliferation pathway, and the CD4_CXCL13 subset was enriched with G[2]-M checkpoint pathway, showing the proliferative properties of these two subsets ([103]Fig. 4, F and G). The gene analysis of each cluster showed that CD8_CXCL13 cells expressed high level of ITGAE, which is a marker for tissue-resident memory T (T[RM]) cells. Considering that the T[RM] cells were also CD69 positive (fig. S6E), we identified CD69^+CD103^+CD8^+ T cells by flow cytometry. Most CD103^+ cells were CD69-positive cells, and the CD69^+CD103^+CD8^+ T cells were rarely presented in LN (fig. S6F). Consistently, the expression levels of Ki67 in CD69^+CD103^+CD8^+ T cells from both CC and LM were higher than those in other CD8^+ T cells. However, the proliferative properties of different CD8^+ T cell subsets from CN and LN were almost identical ([104]Fig. 4H). Together, these results indicated that both CD8_CXCL13 and CD4_CXCL13 cells were up-regulated in LM of CRC because of their high proliferation abilities. The CXCL13^+ T cells are associated with good prognosis of CRC patients In aforementioned results, we showed that CD8_CXCL13 was enriched in CC and LM compared with the CD4_CXCL13 subset ([105]Fig. 4, D and E), indicating that the CD8_CXCL13 cells might be a tumor-activating subset. Thus, we further investigated the characteristics of the CD8_CXCL13 subset. High levels of exhausted markers, such as PDCD1, HAVCR2, LAG3, CTLA4, and TIGIT, were observed in the CD8_CXCL13 subset compared with other subsets ([106]Fig. 5, A and B). Previous studies showed that tumor-reactive T cells exhibited an exhaustion phenotype due to the persistent tumor antigen stimulation ([107]23–[108]25), and the exhaustion phenotype indicated their tumor reactivity. The trajectory analysis showed that the CD8_CXCL13 cells were terminally differentiated ([109]Fig. 5C). Furthermore, CXCL13^+ T cell subsets showed remarkable clonal expansion ability in TME, further indicating their antigen experience properties ([110]26). In addition, the CD8_CXCL13 subset also expressed high levels of effector molecules, such as GZMB ([111]Fig. 5, A and B), which indicated that this subset might preserve partial antitumor function. Because it was difficult to detect CXCL13 in the tissues, we attempted to use CD69 and CD103 to label CXCL13^+ cells, and the IHC results showed that CD69^+CD103^+CD8^+ T cells were more adjacent to the CK19^+ tumor cells than other CD8^+ T cells in the tumor tissues ([112]Fig. 5, D and E), facilitating their tumor-killing function. Notably, CD103^+CD8^+ T cells were present in the tertiary lymphoid structures (TLSs) of CRC tissue, and patients with a higher TLS score had a higher CD8_CXCL13 score, which indicated that this subset might participate in the formation of TLS ([113]Fig. 5, F and G). Fig. 5. Characterization and prognostic effect of CXCL13^+ T cells. [114]Fig. 5. [115]Open in a new tab (A) Volcano plot showing differentially expressed genes between CD8_CXCL13 and other CD8^+ T cells. (B) Feature plots showing the checkpoint and effector molecules in CD8^+ T cells. (C) Monocle analysis showing the developmental trajectory of CD8^+ T cells. (D) Multicolor IHC staining of CD69^+CD103^+CD8^+ T cells in one representative CRC tumor. (E) Distances of CD69^+CD103^+CD8^+ T cells and other CD8^+ T cells to cancer cells. (F) Multicolor IHC staining of CD103^+CD8^+ T cells in CRC tertiary lymphoid structure. (G) Relationship of tertiary lymphoid structure score and CD8_CXCL13 infiltration score in the [116]GSE39582 dataset. (H) Overall survival of CXCL13^high and CXCL13^low patients from the [117]GSE39582 dataset. To explore the prognostic value of CXCL13^+ T cells in CRC, we divided CRC patients from the Gene Expression Omnibus (GEO) cohort into CXCL13^high and CXCL13^low groups and found that the higher expression of CXCL13 in CC predicted better overall survival ([118]Fig. 5H). To summarize, CXCL13^+ T cells enriched in TME were a tumor-reactive subset and contributed to a better prognosis of CRC patients. Distinct subsets of fibroblasts exist in the primary and liver metastatic tumors of CRC Fibroblasts are the major types of stromal nonimmune cells in TME ([119]27). Various fibroblast subsets have been identified in a wide range of tumors ([120]10–[121]12). Thus, we further characterized the fibroblasts to explore their heterogeneity in CRC primary and liver metastatic tumors. Eight clusters of fibroblasts with distinct gene expression patterns were identified in TME of CRC, which were designated as F1_PRELP, F2_MCAM, F3_HAS1, F4_F3, F5_CCL11, F6_IGFL2, F7_COCH, and F8_MKI67 ([122]Fig. 6A and fig. S7, A and B). All the fibroblast highly expressed ACTA2 (fig. S7B), which was widely reported as an important marker of CAF. The percentage of the F2_MCAM cluster was higher in LM compared with that in CC. The F4_F3 and F5_CCL11 clusters mostly contained cells derived from CC ([123]Fig. 6B and fig. S7C). The infiltration of F4_F3 was increased in CC compared with that in CN according to the bulk RNA-sequencing data. However, the percentage of the F5_CCL11 cluster was decreased in CC, which might turn into the F4_F3 cluster based on the trajectory analysis ([124]Fig. 6C and fig. S7, D and E). Trajectory analysis also predicted that F2_MCAM and F4_F3 were two distinct terminally differentiated clusters ([125]Fig. 6C). IHC analysis verified that F2_MCAM existed in CC and LM ([126]Fig. 6D). According to the IHC result, the MCAM^+ CAF is very rare in LN. Most of the MCAM^+ cells were endothelial cells (fig. S7F). However, consistent with the single-cell analysis, the F4_F3 subset only existed in CC, but not in LM, and this phenomenon might be caused by the absence of F4_F3 in LN and the presence in CN ([127]Fig. 6, E and F). ST analysis also confirmed that there was more infiltration of F4_F3 subset in CC than in LM ([128]Fig. 6, G and H). The F4_F3 subset closely laid around the epithelial cells in both CN and CC, facilitating its interaction with the epithelial cells ([129]Fig. 6I). Furthermore, the increased infiltration of F4_F3 in CC might lead to worse prognosis ([130]Fig. 6J). In summary, different phenotypic profiles and highly variable frequencies of fibroblasts were observed between primary and liver metastatic tumors of CRC, which indicated the cellular heterogeneity within TMEs in different cancer settings. Fig. 6. Transcriptome signatures and heterogeneity of fibroblasts in primary and metastatic tumors. [131]Fig. 6. [132]Open in a new tab (A) UMAP visualization of fibroblast clusters. (B) Proportions of the F2_MCAM, F4_F3, and F5_CCL11 clusters in CC and LM (P values were determined by the paired nonparameter test). (C) Monocle analysis showing the developmental trajectory of fibroblasts. (D) IHC staining of F2_MCAM in CC and LM. (E) IHC staining of F4_F3 in CC and LM. (F) IHC staining of F4_F3 in CN and LN. (G) F4_F3 fibroblast distributions in CC (C1 to C4) by ST. (H) F4_F3 fibroblast distributions in LM (L1 and L2) by ST. (I) Multicolor IHC staining of F4_F3 fibroblasts and tumor cells (CK19). (J) Overall survival of F3^high and F3^low patients from the [133]GSE39582 dataset. The F3-expressing fibroblast subset enriched in the primary tumors secretes protumor factors and is associated with poor prognosis of CRC patients To investigate the remodeling of TME by different fibroblasts enriched in CC and LM, we further analyzed the characteristics of F2_MCAM and F4_F3. F2_MCAM was enriched with JAG1 and NOTCH3, which participated in the NOTCH signaling pathway. F4_F3 was enriched with C3 and CXCL1, indicating that this subset was involved in the complement and inflammatory response pathway ([134]Fig. 7, A and B). F4_F3 also expressed high levels of MMP2 and MMP3, which might contribute to the extracellular matrix organization ([135]Fig. 7A and fig. S7G). In addition, protumor factors participating in angiogenesis and tumor invasiveness, such as VEGFA, NRG1, HGF, GDF15, AREG, and BMP2, were enriched in F4_F3 ([136]Fig. 7C and fig. S7G). The interaction analysis between F4_F3 and tumor cells further revealed their cross-talk through the NRG1 and Erb-B2 receptor tyrosine kinase 3 (ERBB3) pathway, and ERBB3 was almost enriched in the tumor cells and could form heterodimer with ERBB2, promoting tumor cell proliferation and resistance to cetuximab of CRC patients ([137]Fig. 7D and fig. S7, H and I) ([138]28, [139]29). ST analysis also indicated the colocalization of F3 with NRG1, which surrounded the ERBB3^+ tumor cells [140](Figs. 6G and [141]7, E to I). Transwell migration assay showed that recombinant human NRG1 (rNRG1) can promote the migration of RKO and SW620 cells ([142]Fig. 7J and fig. S7J). CRC patients with high expression levels of F3 and NRG1 had worse overall survival ([143]Fig. 7K). These results demonstrated that the CC-enriched fibroblast F4_F3 could lead to poor prognosis of CRC patients by secreting protumor factors. Fig. 7. Characterization and prognostic effect of F3^+ fibroblasts. [144]Fig. 7. [145]Open in a new tab (A) Volcano plot showing differentially expressed genes between F2_MCAM and F4_F3. (B) GSVA analysis showing pathways enriched in F2_MCAM and F4_F3 subsets. (C) Violin plots showing expressions of VEGFA, NRG1, HGF, GDF15, AREG, and BMP2 in fibroblasts. (D) Communication network between tumor cells and F4_F3 fibroblasts by CellPhone DB. (E) NRG1 and ERBB3 distributions in C1 sample. (F) NRG1 and ERBB3 distributions in C2 sample. (G) NRG1 and ERBB3 distributions in C3 sample. (H) NRG1 and ERBB3 distributions in C4 sample. (I) Pearson correlation of F3 (x axis) and NRG1 (y axis). (J) Effects of rNRG1 on migration of the RKO cells. The migrated cell number in six different fields was counted in the experiment, and the values were averaged. Three independent experiments were performed. The bars represent means ± SD (***P < 0.001). (K) Overall survival of F3^high NRG1^high and F3^low NRG1^low CRC patients from the [146]GSE39582 dataset. The MCAM-expressing fibroblast in TME of LM modulates the generation of CD8_CXCL13 cells through the Notch signaling pathway The Notch signaling TF RBPJ was preferentially expressed in both the CD8_CXCL13 and CD4_CXCL13 subsets ([147]Fig. 5A and fig. S6, C and D). It has been reported that the Notch signaling pathway can be activated by the ligand and receptor interaction ([148]30). Following the interaction with its ligands, the intracellular domain of Notch can be cleaved and translocate into the nucleus, where it modulates the transcription of its target genes. The RBPJ expression in LM was positively correlated with CXCL13 and ITGAE, which was not observed in CC ([149]Fig. 8A and fig. S8A). The CD8_CXCL13 and CD4_CXCL13 subsets mainly expressed the NOTCH1 receptor ([150]Fig. 8B). To further gain insights into what types of cells modulate Notch signaling in the CD8_CXCL13 and CD4_CXCL13 subsets, we analyzed the expression levels of Notch ligands, including DLL1, DLL3, DLL4, JAG1, and JAG2, and we found that Notch ligands were predominantly expressed in the fibroblasts and endothelial cells (fig. S8B). The F2_MCAM subset was enriched with JAG1, and the F5_COCH subset with DLL1, while the E2_DLL4 subset was enriched with DLL4, JAG1, and JAG2 ([151]Fig. 8C). The interaction analysis of Notch and its ligands using the CellPhone DB showed that among the endothelial cells, the E2_DLL4 subset exhibited the strongest interaction with CXCL13^+ T cells, and among the fibroblasts, the F2_MCAM cluster interacted with the CD8_CXCL13 and CD4_CXCL13 subsets by JAG1-NOTCH1 ([152]Fig. 8D and fig. S8C). Because of the scatter distribution pattern of the fibroblasts in TME, we speculated that the F2_MCAM subset contributed to the activation of Notch signaling in CXCL13^+ T cells. Fig. 8. NOTCH signaling in TME modulates the generation of CD8_CXCL13 cells. [153]Fig. 8. [154]Open in a new tab (A) Scatterplots showing correlations of CXCL13, ITGAE, and RBPJ in LM using the [155]GSE50760 dataset. (B) Dot plot showing the expressions of NOTCH1, NOTCH2, NOTCH3, and NOTCH4 in T cell clusters. (C) Dot plot showing the expressions of DLL1, DLL3, DLL4, JAG1, and JAG2 in nonimmune cell clusters. (D) Communication network of the Notch signaling pathway between CD8_CXCL13 cluster and nonimmune cells by CellPhone DB. (E) Percentages of CD8_CXCL13 in F2_MCAM infiltration high and low groups in LM of scRNA-seq data. (F) Infiltration scores of CD8_CXCL13 in F2_MCAM infiltration high and low groups in LM of [156]GSE50760 dataset. (G) Infiltration of F2_MCAM and CD8_CXCL13 subsets in L1 and L2 by ST. (H) Pearson correlation of signature score of CD8_CXCL13 (x axis) and F2_MCAM (y axis) in L1 and L2. Next, we divided the patients into two groups according to the ratio of F2_MCAM from LM and found that patients in the F2_MCAM-high group had higher proportion of CD8_CXCL13 subset in LM ([157]Fig. 8E). GEO dataset analysis showed that patients with a higher F2_MCAM infiltration score had a higher CD8_CXCL13 subset infiltration score in LM, but not the CD4_CXCL13 subset, and not in CC ([158]Fig. 8F and fig. S8, D and E). Furthermore, we detected the locations of F2_MCAM and CD8_CXCL13 in LM through ST. Consistent with the single-cell results, regions with a higher F2_MCAM infiltration score had a higher CD8_CXCL13 infiltration score in the ST samples ([159]Fig. 8, G and H). Moreover, the strength of the Notch signaling interaction in LM was much stronger than that in CC (fig. S8F), which might be due to the higher proportion of F2_MCAM subset in LM ([160]Fig. 6B). The Notch signaling pathway has been reported to modulate the antitumor immunity of CD8^+ T cells ([161]31); however, little is known about whether the Notch signaling pathway can modulate the expression of CXCL13. To explore such modulation, we predicted the binding sites of RBPJ on the CXCL13 promoter using JASPAR ([162]http://jaspar.genereg.net/). Several potential binding sites were detected (table S1), indicating that RBPJ might act as a TF influencing the expression of CXCL13. Intercellular interaction network in ST tissues We also investigated the cellular interactions between different clusters in ST of the primary tumors and liver metastatic tumors. It showed that the VEGFA-NRP1 and VEGFA-NRP2 ligand-receptor pairs were enriched in both the primary and liver metastatic tumors. We also found there were various different enriched ligand-receptor pairs between CC and LM. The ERBB3-NRG1 pair was enriched in both C1 and C3, but absent in both L1 and L2 ([163]Fig. 9A), indicating the potential role of the ERBB3-NRG1 interaction in the development and metastasis in the primary tumors. Fig. 9. Cellular ligand and receptor interactions in ST tissues. [164]Fig. 9. [165]Open in a new tab (A) Bubble heatmap showing the mean interaction strength between two clusters for ligand-receptor pairs in C1, C3, L1, and L2. Dot size indicated the statistical significances by permutation test. Dot color indicated the mean interaction strength levels. The size factor used here is 10^−4. (B) Cross-talk model between the fibroblasts and other cells in CC and LM. Together, we found that F3^+ fibroblast enriched in the CRC primary tumor could modulate the development and/or migration by producing various protumor factors, including NRG1, which interacted with ERBB3 expressed on tumor cells to exert their protumor functions. However, the MCAM^+ fibroblast enriched in LM could modulate the generation of CD8_CXCL13 cells through the Notch signaling pathway, different from the protumor subset F3^+ fibroblast in CC, indicating the cellular heterogeneity of stromal cells in different cancer settings ([166]Fig. 9B). DISCUSSION CRCLM is a multistep process and fatal in most of the cases ([167]3). In this study, we presented a comprehensive single-cell transcriptomic characterization of the primary and liver metastatic tumors of CRC, covering 41,892 nonimmune cells and 196,473 immune cells from six patients. We observed that colorectal tumor cells could recapitulate the multilineage differentiation processes of normal colon epithelium. Distinct cellular profiles of CAF were identified from CC and LM. The CC-enriched CAF subset F4_F3 was involved in promoting tumor invasiveness, while the LM-enriched CAF subset could promote the generation of tumor antigen–specific CXCL13^+CD8^+ T cells through the Notch signaling pathway. It has been reported that tumor tissues generated from a single tumor cell can recapitulate the cellular diversity of the parental tumors ([168]19). In this study, we found that tumor cells in LM contained multiple cell types whose transcriptome profiles mirror those in the primary tumors. The intratumoral heterogeneity of tumor cells with different biological processes may contribute to the chemotherapy failure of CRC, thus investigating the cellular components of each patient may facilitate personal precision treatment. Fibroblasts are critical to normal tissue homeostasis. However, they are functionally subverted in TME ([169]32). Different types of fibroblasts have been identified from a wide range of malignant tumors, including breast cancer ([170]9), pancreatic cancer ([171]33), CRC ([172]34), lung cancer ([173]35), and gastric cancer ([174]8). CAFs have been ascribed pleiotropic protumorigenic functions, such as extracellular matrix remodeling, tissue stiffening, escape from immune surveillance, and promotion of therapeutic resistance ([175]36, [176]37). Tumor-specific FAP^+ fibroblast was identified in primary CRC and proved to be positively correlated with SPP1^+ macrophage, which would stimulate the formation of immune-excluded desmoplastic structure and limit the T cell infiltration ([177]38). In this study, we identified eight subpopulations of fibroblasts with distinct gene expression profiles in CC and LM. The F4_F3 subset expressing protumor factors was enriched in the primary tumors and probably contributed to tumor invasiveness through the NRG1-ERBB3 pathway. Considering the characterized genes, the F4_F3 subset was distinct from FAP^+ fibroblast ([178]38). It has been reported that NRG1 fusion can be detected in CRC, and NRG1 amplification was found in anti–epidermal growth factor receptor (EGFR) therapy intrinsic-resistant CRC patients. Bone marrow–derived mesenchymal stem cells stimulate invasion, survival, and tumorigenesis of CRC through the release of soluble NRG1, activating the HER2/HER3-dependent phosphatidylinositol 3-kinase (PI3K)/AKT signaling cascade in CRC cells ([179]39–[180]41). In our study, rNRG1 can promote the migration of CRC cell lines in vitro. These results hint that NRG1-ERBB3 signaling can promote the progression of CRC. However, the F2_MCAM fibroblasts enriched in LM expressed Notch ligands, revealing that different organs had distinct stromal microenvironment. It is reported that the NOTCH-RBPJ regulatory network is vital for persistence of T[RM] state in non–small cell lung cancer ([181]42), which expressed high level of CXCL13, indicating that Notch signaling is vital for the generation of CXCL13^+CD8^+ T cells. By analyzing the ligand-receptor pair communication, we found that the LM-enriched F2_MCAM subset could interact with tumor-specific CD8_CXCL13 cells through JAG1-NOTCH1, which was confirmed by the colocalization of F2_MCAM cells with CD8_CXCL13 cells using the ST. The CD8_CXCL13 cell subset has been identified in various tumors, such as breast cancer ([182]26) and uterine cancer ([183]43), and proven to be associated with increased efficacy of immunotherapy ([184]26). It has been proven that CXCL13^+ T cells can attract CXCR5^+ B cells into tissues, participating in the formation of TLS ([185]43), which was also verified in our study. TLS can promote antitumor immunity, contribute to the efficacy of immunotherapy, and improve prognosis ([186]44, [187]45). Our data also showed that high infiltration of CXCL13^+ T cells predicted better prognosis in CRC patients. Despite the universal existence of this subset in many kinds of tumors, the mechanism of CD8_CXCL13 cells activation remains unknown. In our study, both CD8_CXCL13 cells and CD4_CXCL13 cells expressed high levels of RBPJ, the characteristic TF of the Notch signaling pathway. In addition, RBPJ was correlated with the expression of CXCL13 and ITGAE, which was highly enriched in the CD8_CXCL13 cell subset according to the analysis of bulk RNA-sequencing dataset of CRCLM, demonstrating that the Notch signaling pathway could influence the generation of tumor-specific T cells. Furthermore, we also identified the possible binding sites of RBPJ on the promoter of CXCL13, indicating the role of Notch signaling in the antitumor immunity of T cells. Because Notch signaling mutations contribute to the antitumor immune responses in CRC ([188]46, [189]47), further studies should be performed to address the interaction of Notch signaling in TME and the possibility of targeting this signaling pathway. There were some limitations of our study. First, all the patients included have undergone neoadjuvant therapy and were unable to reflect the original state of CRC. Second, because the subjects included received different treatments, it was difficult to value the influence of each kind of treatment. Third, the characterizations of stromal cells from the paratumors were unavailable, and the cause of the divergent atlas of stromal cells in CC and LM remains to be elucidated. Last but not least, because our study used a relatively small sample size, a larger population cohort is needed to verify the conclusion. In conclusion, we unveiled the cellular profiles of TME from tumors and paratumor tissues of CRCLM. Our analysis uncovered the different lineages and functions of fibroblasts as well as the dynamic nature of immune cells in different cancer settings, which can be used for further identification of regulatory mechanisms and for potential therapeutic targets. MATERIALS AND METHODS Patients The study was approved by the Ethical Committees of the Guangzhou First People’s Hospital and the Sixth Affiliated Hospital of Sun Yat-sen University (no. KY2020-361-01-01). For scRNA-seq, six CRC patients with liver metastasis were enrolled after written informed consent was obtained. Primary CC, CN (at least 2 cm from matched tumors), LM, LN (at least 2 cm from matched tumors), and PB of the enrolled patients were collected following resection. All the patients received preoperative chemotherapy and/or radiotherapy. For ST sequencing, formalin-fixed paraffin-embedded (FFPE) tissues of six CRC patients were prepared. The clinical and demographic information of all the patients were summarized in table S2, and the neoadjuvant regimens of the patients for scRNA-seq were listed in table S3. Processing of human tissues and cell sorting Freshly resected tumors were rinsed with 1× phosphate-buffered saline (PBS) for three times, cut into small pieces, and transferred to 15 ml of digestion medium containing collagenase IV (1 mg/ml, C5138, Sigma-Aldrich) in Dulbecco’s modified Eagle’s medium (DMEM)/F-12 1:1 (70100300, Biosharp). Tissues were digested for 45 min at 37°C under agitation at 120 rpm; afterward, the samples were filtered using a 150-μm nylon mesh. Following the centrifugation at 450g for 5 min, the supernatants were discarded and the cell pellets were resuspended in 3 ml of red blood cell lysis buffer (C3702, Beyotime) at 4°C for 10 min. After washing with 1× PBS, the cell pellets were suspended in 1× PBS containing 0.2% bovine serum albumin (4240GR500, BioFroxx) and filtered using a 74-μm nylon mesh. Single-cell suspensions were stained with allophycocyanin (APC)/Cy7-CD45 (HI30, BioLegend) and 7-aminoactinomycin D (7-AAD; B226396, BioLegend) for fluorescence-activated cell sorting using BD Aria III (BD Biosciences). Nonimmune cells were sorted as 7-AAD^−CD45^−, and immune cells were sorted as 7-AAD^−CD45^+. The information of cells obtained from different sites of all the patients was listed in table S4. Single-cell RNA library preparation and sequencing After sorting, cells from different sites were labeled with sample tags (633781, BD Single Cell Multiplexing Kit, BD Biosciences) at 4°C for 45 min and washed by cell staining buffer (B302741, BioLegend) for three times. For immune cells, the expressions of seven proteins on cell surface were detected simultaneously using BD AbSeq Oligo-Conjugated Antibodies (BD Biosciences). The information of AbSeq Ab-Oligos was listed in table S5. After washing and centrifugation at 400g, 4°C for 5 min, the cell pellet was resuspended using BD stain buffer (554656, BD Biosciences) to a density of 700 to 1200 cells/μl. Single-cell library preparation was carried out according to the BD Rhapsody Single-Cell Whole Transcriptome Analysis alpha protocol ([190]https://bdbiosciences.com/). Before loading cells to BD Rhapsody Cartridge, nonimmune cell suspensions from primary and metastatic sites were pooled together, so were immune cells from primary and metastatic sites. About 20,000 cells from the cell suspension mixture were loaded onto a microwell array, followed by adding barcoded magnetic BD Rhapsody Cell Capture Beads and cell lysis buffer. Upon cell lysis, the mRNAs and sample tag barcodes of each cell were captured by probes via polyA/polyT hybridization. Beads were subsequently retrieved from the microwells by magnets and pooled into a single tube for reverse transcription to synthesize complementary DNAs (cDNAs). Then, cDNAs were amplified following multiple amplification schemes ([191]https://bdbiosciences.com/). Amplified cDNAs were purified using SPRIselect beads ([192]B23318, Beckman Coulter Life Sciences). Last, the whole transcriptome libraries, AbSeq libraries, and sample tag libraries were obtained for sequencing. Libraries were sequenced on NovaSeq 6000 (Illumina) after quality examination by Agilent 2100 Bioanalyzer (G2940CA, Agilent Technologies). scRNA-seq data quality control and preprocessing The sequencing data were mapped to the human reference sequence (GRCh38). The raw gene expression matrix from each sample was aggregated and converted into a Seurat object via Seurat R package (V4.0). Nonimmune cells with >6000 or <200 genes or >40% mitochondrial genes were discarded. Immune cells with >4000 or <200 genes or >25% mitochondrial genes were filtered out. To further eliminate the data of doublets, we performed the scrublet pipeline ([193]48) for each batch of our scRNA-seq data, which were expected objectively to exclude doublets, and the expected_doublet_rate was set at 0.05. We got 238,365 cells for further analysis, including 41,892 nonimmune cells and 196,473 immune cells. The genes expressed in more than 50 cells were selected. Last, 17,515 genes from nonimmune cells and 17,066 genes from immune cells met the criteria. The gene expression matrices were normalized to the total unique molecular identifiers (UMI) counts per cell and transformed to the natural log scale. To correct the technical and biological variations and increase the accuracy of cell type designation, we applied canonical correlation analysis implemented in Seurat to all the samples before cell type identification. After selecting 2500 highly variable genes by the FindVariableFeatures function in Seurat, principal components analysis (PCA) was performed using these genes. The numbers of principal components (PCs) of different kinds of cells were differentially selected according to the knee point of the scree plot for each cell type to accommodate different population complexities based on the ElbowPlot function in Seurat, and we chose the top 20 PCs to run the FindNeighbors and RunUMAP functions. We used the FindClusters function to cluster the cells, and resolutions from 0.1 to 0.8 were explored for better cell clustering, and we set this argument at 0.5 for clustering immune cells and 0.1 for clustering tumor cells. Two-dimensional Uniform Manifold Approximation and Projection (UMAP) was used to represent cell clusters. Unsupervised cell clustering and annotation According to the differentially expressed genes of each lineage, nonimmune cells were divided into tumor cells, fibroblasts, and endothelial cells. Immune cells were divided into T cells, NK cells, B cells, plasma cells, monocytes/macrophages, DCs, and mast cells. To identify subpopulations within these major cell types, second-round UMAP reduction was performed. Differentially expressed genes of each subset were identified using the FindAllMarkers function implemented in Seurat. Clusters were named by the highly expressed genes of each cluster. Single sample gene set enrichment analysis To calculate the subset infiltration score in the bulk RNA-sequencing dataset, we adopted single sample gene set enrichment analysis (ssGSEA) according to Senbabaoglu et al. ([194]49). Marker genes for F2_MCAM, CD8_CXCL13, CD4_CXCL13, and TLS were listed in table S6. Gene set variation analysis To compare signaling pathway enrichment of different cell clusters, GSVA was performed to calculate gene set enrichment scores ([195]50). Briefly, we run the gsva function for each single cell and used the “ssGSEA” method to calculate the enrichment score of indicated gene sets, and then we used limma packages (version 3.52.3) to analyze the difference of enrichment score of every gene set between each cell cluster and all other clusters, respectively. We chose the top five up-regulated gene sets for each cluster, scaled the score among the clusters, and exhibited in heatmap. SCENIC analysis The single-cell regulatory network inference and clustering (SCENIC) package (v1.1.2.1) ([196]51) was used to analyze the TF activity. We generated the super cells, which combined the data of every 20 single cells in each cluster to reduce the computing resource consumption. The mean values of normalized counts of 20 single cells were calculated as the raw input data of SCENIC. The Wilcoxon rank sum test was used to identify the differentially activated TFs for each subcluster, with log[2] fold change > 0.1 and adjusted P < 0.05 as significantly changed. Cell differentiation trajectory inference To infer the differentiation trajectory of intratumor T cells and fibroblasts, we used Monocle2 ([197]52) to infer the pseudotime of each cell. The differentially expressed genes across the clusters were identified by FindMarkers in Seurat for T cells and fibroblasts, respectively. The dimensionality reduction was performed with the DDRTree algorithm, using the most highly variable genes (top 2000) to arrange the cells in order. Flow cytometry The single-cell suspensions were stained to detect the tissue-resident CD8^+ T cells (CD8^+ T[RM]) with the following antibodies, including mouse anti-human BV510-CD45 (304036, BioLegend), mouse anti-human BV711-CD3 (317328, BioLegend), mouse anti-human BV785-CD56 (362550, BioLegend), mouse anti-human BUV563-CD4 (566000, BD Bioscience), mouse anti-human Alexa Fluor 700–CD8A (300920, BioLegend), mouse anti-human BV421-CD45RO (304224, BioLegend), mouse anti-human BV650-CD45RA (304136, BioLegend), mouse anti-human phycoerythrin (PE)–CD69 (310906, BioLegend), mouse anti-human APC-CD103 (350216, BioLegend), and mouse anti-human BV421-Ki67 (350505, BioLegend). After staining with surface markers, cells were fixed and permeabilized using staining buffer set (00-5523-00, eBioscience) to detect the Ki67 expression. Flow cytometry data were obtained using BD Fortessa (BD Biosciences) and analyzed with FlowJo software (V10, BD Biosciences). Molecular interaction network analysis We used CellPhone DB ([198]http://CellPhoneDB.org) to analyze the interactions between the tumor cells and the fibroblasts as well as the communications between the CXCL13^+ T cells and nonimmune cells through the Notch signaling pathway ([199]53). CellChat ([200]http://cellchat.org/) was applied to compare the differences of the ligand-receptor interactions between CC and LM ([201]54). IHC staining IHC was performed on 5-μm FFPE tissue sections prepared from primary sites. The tissues were dewaxed, hydrated, and subjected to antigen retrieval. After blocking with goat serum at room temperature, tissues were stained with primary antibodies, followed by staining with horseradish peroxidase (HRP)–conjugated secondary antibodies. For single-color IHC, [202]3,3′-diaminobenzidine tetrahydrochloride (DAB) was used as substrate for staining. For multiplex IHC, a PANO 7-plex IHC kit (TSA-RM) (0004100100, Panovue) was used. Tyramide signal amplification (TSA) dye was applied to amplify the signal following the secondary antibodies. The mouse anti-human CD8 antibody (70306s, Cell Signaling Technology), rabbit anti-human CD69 antibody (ab233396, Abcam), rabbit anti-human CD103 antibody (ab224202, Abcam), rabbit anti-human CK19 antibody (ab52625, Abcam), rabbit anti-human CD4 antibody (ab133616, Abcam), rabbit anti-human F3 antibody (ab228968, Abcam), and rabbit anti-human MCAM antibody (ab75769, Abcam) were used as primary antibodies. The nuclei were stained with 4′,6-diamidino-2-phenylindole (DAPI) (C1006, Beyotime). Multiplex IHC slides were viewed, and images were recorded with a Vectra Polaris multispectral imaging system (Akoya Biosciences). Phenochart software (v1.0.12, Akoya Biosciences) and HALO software (V3.2, Indica Labs) were used to view images and calculate the distance of different subsets to tumor cells. CRC patient survival analysis The expression information and survival information of CRC patients were downloaded from GEO ([203]https://ncbi.nlm.nih.gov/geo/), and the accession number is [204]GSE39582. The Kaplan-Meier method was used to generate survival curves, and the log-rank test was used to determine the statistical significance of differences in survival. Survival curves were constructed with R packages survival (version 3.4.0) and survminer (version 0.4.9). The “maxstat.test” function of R package maxstat was run to find the cutoff value of gene expressions for maximum rank statistic. Gene expression correlation analysis Gene expression data of CRCLM patients generated by bulk RNA sequencing were downloaded from the GEO public database ([205]https://ncbi.nlm.nih.gov/geo/) under the data series accession number [206]GSE50760. The processed fragments per kilobase per million (FPKM) expression matrices of CXCL13, ITGAE, and RBPJ of liver metastasis were extracted. The Pearson correlation analysis was applied to calculate the relationship of different genes. ST tissue handling, data processing, and cell type infiltration score calculation The tissues were obtained from patients after resection, fixed by formalin, and embedded in paraffin. A 10× FFPE gene expression slide (PN-1000185, 10X Genomics) was used for ST. A slide with 5-μm FFPE section was dewaxed with xylene (214736, Sigma-Aldrich) and stained with hematoxylin (51275, Sigma-Aldrich) and eosin (HT110116, Sigma-Aldrich). After visualizing and scanning the whole slide, decrosslinking was performed using the tris-ethylenediaminetetraacetic acid (TE) buffer (10-0046, GeneMed) to release the RNA. Forward and reverse human transcriptome probes (PN-1000364, 10X Genomics) were used for probe hybridization overnight. After hybridization, the cDNA libraries were constructed according to the 10X protocol (CG000407_VisiumSpatialGeneExpressionforFFPE_UserGuide_RevA). Sequencing was performed on NovaSeq 6000 (Illumina). Raw sequencing reads were processed with CellRanger V3 and aligned to human genome (GRCh38, ENSEMBL). Seurat (v4.0) was used to process the Space Ranger output files. We used SCTransform to normalize the data, ScaleData to scale the data, RunPCA to perform dimension reduction, FindNeighbors and FindClusters to cluster the ST spots, and RunUMAP to visualize the data. Then, we chose the scaled matrix to calculate the mean values of the F2_MCAM and CD8_CXCL13 gene signatures in every spot. Last, we ran the log1p function for the mean values of the F2_MCAM and CD8_CXCL13 gene signatures as the F2_MCAM and CD8_CXCL13 scores, respectively. The reference genes used were listed in table S7. SpatialFeaturePlot was performed to visualize the infiltration score of each subset. Pearson correlation analysis was applied to calculate the infiltration relationship of different subsets in each spot. The functional pathway enrichment analysis was performed to explore the function differences between the tumor and paratumor region. Briefly, we calculated the mean expression levels of each gene in the tumor and paratumor regions and then run the gsva function to calculate the gene set enrichment scores for every hallmark gene sets, which were downloaded from MSigDB database ([207]http://gsea-msigdb.org/gsea/msigdb/). The results were scaled and displayed in the heatmap. To estimate the cell cluster proportions in each captured spot, we performed the SPOTlight deconvolution analysis according to the standard pipeline supplied in [208]https://github.com/MarcElosua/SPOTlight. Briefly, we extracted the marker gene matrix for each cell cluster in our scRNA-seq data by running the FindAllMarkers function in Seurat package, setting logfc.threshold = 1 and min.pct = 0.8. Then, the ST data and marker gene matrix were used to run the spotlight_deconvolution function with the default parameters. The deconvolution results were displayed in the histology image by running the spatial_scatterpie function, and each captured spot was renamed using the most dominant cluster there. Then, to infer the cell-cell communication in ST data, the renamed spatial object was used to perform the cellphoneDB analysis pipeline ([209]https://github.com/Teichlab/cellphonedb). Cell lines and culture Human CRC cell line RKO was a gift from H. Zhang (Guangdong Provincial People’s Hospital). Human CRC cell line SW620 was a gift from Y. Yao (Guangzhou First People’s Hospital). Both RKO and SW620 were maintained in DMEM (Gibco) supplemented with 10% fetal bovine serum (FBS) (ExCell Bio) and antibiotics (Gibco). Transwell migration assay Transwell migration assay was performed to determine the influence of rNRG1 on the cell migration of CRC cell lines. Cells were seeded in 24-well Transwell inserting chambers (BD Biosciences) that contained serum-low medium. FBS (10%) with (50 ng/ml) or without (0 ng/ml) rNRG1 (R&D Systems, 396-HB-050) was added to the lower chamber as a source of chemoattractant. After 24 hours, the cells having not migrated through the chamber membrane were removed, and those having passed through the membrane were subjected to fixation and staining with crystal violet. They were subsequently detected and photographed using a microscope. Statistical analysis Measurement data were presented as mean ± SD. The Wilcoxon paired nonparametric test was used to compare percentages of different cell clusters among CN, CC, LN, LM, and PB. The infiltration scores of subsets were calculated by unpaired nonparametric test. The analysis was performed by SPSS 21 (IBM). Two-sided P < 0.05 was considered significant. Acknowledgments