Abstract Colorectal cancer (CRC), also known as colon cancer, is the third most common cancer and the fourth most cause of cancer-related death in the world. CRC can be classified into two major subtypes, including microsatellite instability (MSI) and microsatellite stability (MSS), which showed different characteristics in immunotherapy. Low sensitivity of diagnostic biomarkers and metastasis are still the principal cause of mortality, especially in MSI. Here, applying computational programs, we identified recurring expression programs based on single cell RNA sequencing (scRNA-Seq) data of CRC cell lines. Notably, three MSI specific recurring modules were identified by non-negative matrix factorization (NMF). High NMF score genes enriched in the function of metabolism and inflammatory response. Focusing on top specific active transcription factor (TF), RUNX3 (Runt-related transcription factor 3), our results suggest that T cell infiltration was increased in RUNX3 high MSI CRC samples. Unbiased Gene Set Enrichment Analysis (GSEA) showed that RUNX3 was strongly associated with immune and metastasis related functions, such as Interferon Gamma (IFN-γ) and EPITHELIAL MESENCHYMAL TRANSITION (EMT). In addition, RUNX3 shows specific highly activated at epigenetic level in MSI compared with other gastrointestinal carcinomas. Positive correlation between RUNX3 and most immune checkpoints further confirmed RUNX3 might have crucial roles in MSI cancer progression and immunotherapy. Taken together, these results indicate significant tumor heterogeneity of two CRC subtypes at single-cell level and epigenetic modification level. These results also linked transcriptional dysregulation with immune infiltration at single-cell level in MSI, which may advance the application of scRNA-Seq technology in immunotherapy and contribute to developing novel biomarkers of this malignancy. Keywords: Colorectal cancer, Microsatellite instability, Single cell RNA-Seq, RUNX3, Immune infiltration, EPITHELIAL MESENCHYMAL TRANSITION 1. Introduction Colorectal cancer (CRC), also known as colon cancer, which occurs in the colon or rectum, which is the third most common carcinoma and fourth most cause of cancer-related death in the world (reproduced from [45]http://globocan.iarc.fr/) [[46]1]. In clinical, the most widely applied diagnostic biomarkers, such as FOBT and CEA, shows high specificity but very low sensitivity [[47]2]. In addition, metastasis is the principal cause of mortality in CRC so far [[48]3]. Thus, identification of new molecular biomarkers in CRC prognosis has become an essential and urgent task. Based on the genomic instability, CRC can be classified into two major subtypes, one is microsatellite instability (MSI) and the other is microsatellite stability (MSS) [[49]4]. The first subtype, MSI, is a hypermutable phenotype without DNA mismatch repair activity, and DNA mismatch repair system deficiency can lead to the accumulation of multiple mutations in CRC [[50]5]. In CRC, MSI patients account for about 15%, which exhibit the mutations of dominating genes, MLH1 (MutL Homolog 1), MLH3 (MutL Homolog 3), MSH3 (MutS Homolog 3), MSH2 (MutS Homolog 2), MSH6 (MutS Homolog 6), and PMS2 (PMS1 Homolog 2, Mismatch Repair System Component), in DNA mismatch repair pathway [[51]6]. In clinical, MSI patients almost always accompany by the following features: hereditary form (patients are younger than 50 years) or sporadic form (elderly patients), localization in the proximal colon and synchronous occurrence with other tumors [[52]7]. Immune checkpoint blockade (ICB) therapy has showed promising efficacies in MSI CRC patients with high immune cell infiltration, which are more effective than patients with MSS status; however, the underlying mechanism of immune cell infiltration in MSI is still unclear; a part of MSI patients still present no response after 20 weeks treatment; The lack of recruitment of immune cells in no response MSI patients seems to be the fundamental obstacle to efficacy [[53]4]. Thus, clarity the potential mechanism of immune response will enhance the response rate and identify more sensitive biomarkers in MSI CRC patients. Single-cell RNA sequencing has been an emerging tool to explore the characterization of cellular heterogeneity, differentiation states, cell-cell interactions through ligand-receptor at the single-cell resolution in an unbiased way. Single cell RNA sequencing (scRNA-Seq) can be widely applied to detect multi-omics at single-cell level [[54]8]. Several studies have applied scRNA-Seq data to reveal tumor heterogeneity in multiple cancer types, such as lung cancer [[55]8], breast cancer [[56]9] and esophageal squamous cell carcinoma (ESCC) [[57]10]. To clarify the potential underlying mechanism of immune evasion and identify new molecular biomarkers in MSI CRC prognosis, we did the following analysis (The flow chart showed in [58]Fig. 1A). At first, we analyzed the scRNA-Seq data of CRC cell lines from CCLE (Cancer Cell Line Encyclopedia) database. Distinct cell populations were uncovered, and we identified recurring expression programs in CRC cancer cells. In addition, the difference of two major CRC subtypes were present at single cell RNA expression level, which might explain why most MSI patients have better immunotherapy response compared with MSS patients. Finally, RUNX3 was identified as an immune infiltration related core transcript factor (core TF in regulatory network), which not only has positive correlation with IFN and EMT signals, also co-expressed with most immune checkpoint related genes in CRC microenvironment. These results provide insights into different subtypes of CRC microenvironment and contribute to developing effective prognosis biomarkers and therapeutic intervention in clinical for this malignancy. Fig. 1. [59]Fig. 1 [60]Open in a new tab Recurring expression and regulons patterns in colorectal cancer cells. A. The flow chart of this study. B. Clustering of expression programs in MSI and MSS CRC cancer cells. C. Functions of different recurring expression patterns. D. Differentially existed core TFs in each regulon (one module compared with other three modules). E. Regulons of RUNX3 and BACH1. 2. Results 2.1. Recurring programs and activated functions in two colorectal cancer subtypes Colorectal cancer has been divided into two main subtypes based on the microsatellite stability [[61]4]. Recently, non-negative matrix factorization (NMF) algorithm was applied to identify multiple recurring expression patterns of genes in pan-cancer cell lines, which are present in multiple cell lines [[62]11]. More details about NMF approach were listed in Methods. To identify different recurring expression patterns in all the colorectal cancer cells, we obtained four groups ([63]Fig. 1B) by clustering all the cells based on the top 50 genes (ranked by NMF score) in each recurring expression pattern. As expected, multiple functions have been discovered in colorectal cancer at single-cell resolution. Most of the recurring expression patterns indicated high heterogeneity, while MSI and MSS mixed groups revealed the common characteristics in these two subtypes ([64]Fig. 1B, module 2). In addition, MSI cancer cells have three different recurring expression patterns, which indicated high heterogeneity in cancer cells of MSI. To decipher the functions of each recurrent pattern, we did the functional enrichment analysis based on the overlap top 50 genes in each group ([65]Fig. 1C, red to white present q-value from 0 to 0.05). R package, clusterProfiler v3.14.3 was applied. E2F targets and G2M checkpoint have been reported participate in the regulation of cell cycle and also DNA synthesis in multiple cancer types, including colorectal cancer [[66]12,[67]13]. The dysfunction of cell cycle and DNA synthesis are key mechanism in colorectal cancer progression [[68]14]. Here, we found these two hallmarks were all activated in module 2 and module 4, which reveals the dysfunction of cell cycle and DNA synthesis in colorectal cancer cells at single-cell level. MITOTIC_SPINDLE, as a crucial part of cell cycle, forms during cell division and separates duplicated chromosomes [[69]15]. In colorectal cancer, especially MSI subtype, researchers already found mitotic defects can promote chromosomal instability in colorectal cancer. In our results, MITOTIC_SPINDLE enriched in module 4, and all cell lines in module 4 were MSI subtypes. These results indicate that chromosomal instability of MSI colorectal cancer can be reflected by recurrent expression patterns at single-cell resolution as well. Notably, compared with other modules, module 3 specific hallmarks were all metabolism related, such as OXIDATIVE PHOSPHORYLATION, FATTY ACID METABOLISM and REACTIVE OXYGEN SPECIES PATHWAY. OXIDATIVE PHOSPHORYLATION (OXPHOS) is an irreplaceable metabolic pathway for reforming ATP by their structure, enzymes, and releasing energy in cells, which has been discovered as an emerging target in cancer therapy. In addition, inhibition of REACTIVE OXYGEN SPECIES PATHWAY and FATTY ACID METABOLISM has been raised as potential therapeutic strategies in recent studies [[70]16,[71]17], while the therapeutic efficacy on MSI colorectal cancer is unknown. Recent studies found tumor cells can display reprogramming activity in metabolic process and then promote cancer progression in several cancer types [[72]18]. Therefore, these results hint that a part of cells showed aberrant metabolomic status in MSI, intervening these processes at single-cell level could extremely improve the therapeutic efficacy. On the other hand, module 1 is also a MSI specific group, which has activated signaling pathways and immune related pathways, such as KRAS SIGNALING and INFLAMMATORY RESPONSE. Different functional groups identified in MSI indicate that multiple cells play diverse functions and lead to the occurrence of cancer collectively. In module 2, only two cell cycle related hallmarks were significant, which suggested that although MSI and MSS subtypes have some common features in gene level, no obvious similar feature in pathway level. These results showed that high expressed genes in MSI and MSS were different in single-cell level, which lead to different abnormal biological functions. Therefore, excavating colorectal cancer single-cell RNA expression data could contribute to better understanding and revealing the crucial mechanism of colorectal cancer subtypes, and provide more constructive therapeutic strategies. More details about the significant enrichment analysis functions in four modules were listed in [73]Supplementary Table 1. 2.2. Gene expression pattern-enriched transcriptional regulatory network and epigenetic modification level of MSI core TFs To infer the activity of TF in colorectal cancer scRNA expression data, SCENIC algorithm was applied based on the motif's information of each TF and the target genes expression in scRNA-Seq data. Then, TF regulons (TF and their target genes) were also identified by SCENIC. Finally, module specific regulons were extracted by comparing one module verses other modules ([74]Fig. 1D). As we can see, the TF regulons in MSI and MSS were different. Module 2, MSI and MSS cell lines mixed module, shows high YY1 (Yin Yang 1), TFDP1 (Transcription factor Dp-1), E2F8 (E2F Transcription Factor 8) and E2F1 (E2F Transcription Factor 1) regulons activity, which was consistent with the results we described in the function enrichment analysis; E2F TARGETS and G2M CHECKPOINT were activated in Module 2. For the other three modules, although they are all MSI cell lines, they still have three diverse TF regulons, which indicated that multiple pathogenic mechanisms were existed in MSI cells. Notably, RUNX3 (12 g) and BACH1 (BTB Domain and CNC Homolog 1) (18 g) in module 1 have higher regulon scores compared with other modules. TFs with higher regulon scores means more active in this module. RUNX3, as a tumor suppressor, can frequently delete or transcriptionally silence in cancer [[75]19]. In colorectal cancer, RUNX3 was identified as a suppressor TF, which was reduced in advanced colorectal cancer and the expression of RUNX3 in cytoplasmic was associated with poor patient outcome [[76]20]. However, the activity of RUNX3 in MSI colorectal cancer has not been revealed yet. At single-cell level, our results identified RUNX3, as a core TF, can form a regulatory network in a part of MSI colorectal cancer cells with highest regulon score in all MSI cell lines, which reveals the activity of RUNX3 in different subtypes of colorectal cancer. Notably, one of the RUNX3 regulon genes, CD276 (Cluster of Differentiation 276), is an important immune checkpoint molecule, which plays a crucial role in cancer cell proliferation, invasion, and migration [[77]21]. The transcription factor, BACH1 has been found to promote the progression of human colorectal cancer through BACH1/CXCR4 pathway [[78]22]. BACH1, as a master regulator, was identified in breast cancer [[79]23]. These results indicated that the aberrant of RUNX3 and BACH1 might lead to the disorder of immune response and then increase tumor proliferation and invasion. The regulons of RUNX3 and BACH1 were showed in [80]Fig. 1E. To further explore the activation of RUNX3 and BACH1 at epigenetic level in CRC, ChIP-Seq data (H3K27ac) of colorectal primary cell lines (n = 35) were downloaded from published study [[81]24], which includes 4 normal colorectal primary cell lines, 6 MSI CRC primary cell lines and 25 MSS CRC primary cell lines. H3K27ac is an epigenetic modification marker, which is associated with the higher activation of transcription ([82]https://en.wikipedia.org/wiki/H3K27ac). H3K27ac signal has been found at both proximal and distal regions of transcription start site (TSS) ([83]https://en.wikipedia.org/wiki/H3K27ac). To compare the activation of RUNX3 in CRC with that in other gastrointestinal cancer, we also downloaded the ChIP-Seq data in gastric and esophageal cancers from published studies [[84][25], [85][26], [86][27], [87][28], [88][29]]. More details about these public datasets were described in Method. We re-analyzed these H3K27ac data based on Hg19 reference genome. Compared with Gastric cancer and esophageal cancer, H3K27ac signal at RUNX3 loci is highly activated in CRC (including promoter region (TSS ± 2 kb) and gene body) ([89]Fig. 2A). While BACH1 is activated in all GI cancer types ([90]Fig. 2B). These results indicate that RUNX3 is only activated in CRC samples compared with normal colon samples as well as other GI cancer types at epigenetic level (H3K27ac signal). Fig. 2. [91]Fig. 2 [92]Open in a new tab Epigenetic modification of RUNX3 and BACH1 in GI cancer samples and Correlation analysis between TFs and T cell infiltration in TCGA COAD cohort. A. H3K27ac signal of RUNX3 loci in multiple GI cancer samples. B. H3K27ac signal of BACH1 loci in multiple GI cancer samples. C. Correlation analysis between RUNX3 (BACH1) and canonical T cell-associated genes. D. Correlation analysis between RUNX3 and CD8^+ scores. E. Correlation analysis between RUNX3 and CYT scores. F. TIL scores and RUNX3 in MSI (left) and MSS (right) samples. The x-axis is TIL score and y-axis is RUNX3 expression. 2.3. MSI-related TF, RUNX3, correlated with immune infiltration The pathway enrichment analysis results suggested that RUNX3 is associated with immune checkpoint molecule in MSI cell lines. In recent study, Runx3 can reinforce CD8 + T-cell function and improve the clinical response to immunotherapy by epigenetic reprogramming mechanism in many cancer types [[93]30]. To further analyze the relationship between RUNX3 and immune system in MSI subtype, we investigated the correlation between RUNX3 and the well-known T cell infiltration-associated gene signatures, which include T cell average, immune cytolytic activity (CYT), and CD8^+ scores. Here, TCGA COAD (Colon adenocarcinoma) bulk RNA-Seq data was used as an independent dataset. More details were listed in see Methods. In gene signatures, the expression of RUNX3 was highly positively correlated with canonical T cell infiltration-associated genes (R = 0.6606 and P = 0) ([94]Fig. 2C). More information about the twelve gene signature was showed in Method section. In addition, BACH1 also showed highly positive correlation with this T cell infiltration-associated signature ([95]Fig. 2C) but no correlation with CD8^+ and CYT scores signatures. Since the amount of CD8^+ T cells reflects the T cell activity and plays an important role in immune response. We evaluated the CD8^+ T cell score of TCGA COAD samples in TIMER database. Prominent correlation between RUNX3 and CD8^+ T cell score were observed (R = 0.6991 and P = 0) ([96]Fig. 2D), which indicated that increased RUNX3 might induce the secrete of CD8^+ T cells in MSI sample tumor microenvironment and lead to better immunotherapy response compared with MSS patients. Another gene signature, GZMA and PRF1, has been verified significantly upregulated upon CD8^+ T cell activation, which also showed productive clinical responses to the anti-PD-L1 immunotherapies [[97]31]. Here, we evaluated the immune cytolytic activity and evasion of patients based on the average of these two genes [[98]32]. Consistent with the previous results, RUNX3 are significantly positively associated with the CYT score in TCGA COAD MSI patients compared with MSS patients (R = 0.5281 and P = 0) ([99]Fig. 2E). In addition, a pathology-based measure of T cell infiltration (TIL score) (429 histology slides from TCGA COAD patients [[100]33]) was considered. Consistent with above results, higher TIL scores of the sample has higher expression of RUNX3 in MSI CRC patients ([101]Fig. 2F, left), while no such positive correlation was observed in MSS CRC patients ([102]Fig. 2F, right). Taken together, we found the expression of RUNX3 had positive correlation with T-cell infiltration in MSI colorectal tumor microenvironment, suggesting that high expression of RUNX3 might induce the T cell infiltration and immune response in partial cells in MSI colorectal cancer. These results also explain why some MSI patients have a better response after immunotherapies compared with MSS patients at the single-cell level. 2.4. RUNX3 positively correlated with EMT and IFN signal in two MSI CRC cohorts To explore the downstream genes affected by RUNX3, we identified differentially expressed genes (DEGs) and performed an unbiased GSEA analysis using MSI CRC samples in published database, TCGA. First, we classified MSI samples into two groups, RUNX3-high (RUNX3 expressed top 30% samples) and RUNX3-low (RUNX3 expressed bottom 30% samples) groups, based on the expression of RUNX3. Then, limma R package was used to identify DEGs ([103]Fig. 3A). Most of the top-ranked (Fold change) genes are IGH and IGL related in RUNX3 high group. Recent studies have found immunoglobulin (Ig) not only can be produced by B-lineage cells, also can be cancer-derived [[104]34]. Cancer-derived Ig exerts at promoting cancer cells malignant behaviors and mediating tumor via multiple mechanisms [[105]34]. These results hint that high expressed RUNX3 might promote the secretion of cancer-derived Ig and present malignant behaviors of cancer cells. Fig. 3. [106]Fig. 3 [107]Open in a new tab RUNX3 downstream genes and RUNX3-related hallmarks in two independent datasets. A. DEGs between RUNX3-high and RUNX3-low groups in TCGA MSI COAD patients. B. RUNX3-induced negative and positive hallmarks in TCGA MSI COAD patient samples. C. GSEA enrichment plot of Epithelial Mesenchymal Transition and INTERFERON GAMMA RESPONSE in RUNX3 classified analysis (based on [108]Fig. 3B). D. Differential expressed genes between RUNX3-high and RUNX3-low groups in colorectal MSI cohort ([109]GSE13294). E. Positive significant hallmarks in RUNX3-high group in MSI patients ([110]GSE13294). F. GSEA enrichment plot of Epithelial Mesenchymal Transition and INTERFERON GAMMA RESPONSE in RUNX3 classified analysis (based on [111]Fig. 3E). G. Correlation analysis of RUNX3, BACH1 and 20 well-known immune related genes. Next, fold change of all genes from RUNX3-high and RUNX3-low group comparison were used to perform GSEAPreranked analysis. The results show that EPITHELIAL MESENCHYMAL TRANSITION (EMT) was the top one with significantly enriched P-value ([112]Fig. 3B and C). EMT is a complex program in cancer, which is necessary for tumor cells to obtain the capability of migratory and invasive [[113]35]. Recent studies found EMT is an essential step in cancer metastasis and motility [[114]35]. Decreased RUNX3 can repress metastasis in prostate cancer and renal cell carcinoma [[115]36,[116]37]. In colorectal cancer, although immunotherapy benefits most of MSI patients, still part of MSI patients obtain immune evasion and metastasis. Based on the bioinformatics analysis we described above, over-expressed RUNX3 might induce the activation of EMT and lead to cancer metastasis in a subset of MSI CRC patients. In addition, the top-enriched positive pathways include several immune-related functions, such as inflammatory response and IFN-gamma response ([117]Fig. 3B and C). We also collected independent CRC MSI cohort ([118]GSE13294) and did the same analysis (identified DEGs ([119]Fig. 3D)) used in TCGA COAD MSI dataset. The results are very consistent with what we identified in TCGA COAD MSI dataset. Top activated hallmarks are EMT and immune response related processes ([120]Fig. 3E and F). One of the interferon type II cytokine, IFN-gamma, is a crucial activator of macrophages, which can induce the expression of class II major histocompatibility complex (MHC) related genes in tumor microenvironment [[121]38]. Recent research shows that IFNs can regulate tumor cells directly or indirectly via multiple mechanism in cancer cells and protect against disease by direct effects on target cells and activate immune responses [[122]39]. Thus, the activation of IFN can promote cancer immunotherapy. The results in this part indicated that overexpression of RUNX3 might induce IFN signals in MSI CRC patients and activate immune response in immunotherapy, which will explain why MSI patients can get benefit from immunotherapy. The downstream genes of RUNX3 suggested that cancer-derived Ig might lead to the aggressiveness of MSI patients in RUNX3 high group. Top-enriched negative pathways were cell-cycle related, such as MYC and E2F, which is consistent with known findings that cell cycle activity has higher anti‐tumor immunity in diversity cancer [[123]40] ([124]Fig. 3B). To further characterize the relationships between RUNX3 and immune response in MSI CRC, 20 immunotherapy related genes were collected. As we can see, RUNX3 has strong positive correlation with most of the immunotherapy related genes, such as CTLA4 (cytotoxic T-lymphocyte-associated protein 4), TIGIT (T cell immunoreceptor with Ig and ITIM domains) and so on ([125]Fig. 3G), while BACH1 shows weak correlation with these genes ([126]Fig. 3G) in TCGA COAD MSI samples. The above results suggest that although RUNX3 has positively correlation with immune infiltration, highly expressed RUNX3 might activate the EMT and cancer metastasis process at the same time in MSI CRC patients, which indicate the potential progress of cancer development. As RUNX3 high samples with active EMT, how to increase immune infiltration and avoid metastasis could be a potential obstacle in CRC immunotherapy. 3. Discussion Although CRC patients with MSI status have benefit immunotherapy response [[127]41,[128]42], a subset of MSI patients still have no benefit after immunotherapy [[129]4,[130]43]. Most of the widely applied diagnostic biomarkers always show high specificity but low sensitivity [[131]2], and metastasis is still the dominating cause of mortality in CRC [[132]3]. Considering these obstacles, identification of effective biomarkers in CRC prognosis and therapies has become an urgent task. Recently, a subset of core TFs have been identified to orchestrate gene expression programs and effect the immune infiltration extent in multiple cancer types [[133]25,[134]29,[135]44,[136]45]. For example, MYC (MYC Proto-Oncogene, BHLH Transcription Factor) can involve in the recruitment of T cells and macrophages by directly binding to the promoter of CD47 (Cluster of Differentiation 47) and PDL1 (Programmed death-ligand 1), and then modify the microenvironment in tumor [[137]44]. Moreover, single cell RNA sequencing has emerged as a promising tool to depict the characterization of tumor heterogeneity, which has been widely applied in several studies [[138]9,[139]46,[140]47]. Here, we explored the tumor heterogeneity of CRC at single cell level based on CRC cell lines from published paper. First, we identified distinct cell populations in each CRC subtype. Next, we filtered the intra-tumoral recurring expression programs in all CRC cell lines, which might contribute to filtrating similar function cells in different subtypes. Then, unbiased Gene Set Enrichment Analysis of top 50 genes in each distinct cell population discovered different functions among populations, which might help to explain why a part of MSI patients shows low immunotherapy response. In addition, significant regulons and core transcription factors (TFs) in different intra-tumoral recurring expression programs were identified, which inferred the activity of hundreds of TFs in CRC at the single-cell level. In MSI cancer cell lines, RUNX3 was identified as a core TF, which has positive correlation with IFN signal and EMT pathways. The correlation analysis showed higher expression of RUNX3, higher immune infiltration in MSI CRC patients. Although RUNX3 has been identified in cancers for several years, the potential immune response related mechanism of RUNX3 is still obscure. This is the first time to evaluate the activity of RUNX3 in MSI colorectal cancer at the single-cell level and validated the results using bulk RNA-Seq data. Enriched H3K27ac signal around RUNX3 promoter regions also indicates the specific activation of RUNX3 in MSI patient samples. Notably, correlation analysis between RUNX3 expression and immune infiltration related gene signatures first revealed the potential function of RUNX3 in immune response. These results indicated that over-expressed RUNX3 might increase the immune infiltration in MSI CRC patients, while tumor cells can trigger other mechanism (activate EMT process) at the same time to insure being alive. Thus, RUNX3 expression level is critical for MSI CRC patients’ immunotherapy. Block the activation of RUNX3 downstream genes and over-expressed RUNX3 might help to increase tumor infiltration and enhance the efficacy of immunotherapy in MSI CRC patients. 4. Conclusions In summary, we provide insights in tumor microenvironment of two CRC subtypes, MSI and MSS, which first reveals the specific and common expression characteristics at the single-cell level. Moreover, RUNX3, first identified as an immune infiltration related core transcript factor, which also shows positive correlation with IFN and EMT signal. RUNX3 co-expressed with most of the immune checkpoint related genes in a subset of MSI status CRC cells ([141]Fig. 4). These results hint that blocking the activation of RUNX3 downstream genes (EMT related) and over-expressed RUNX3 at the same time might contribute to increasing tumor immune infiltration and then enhance the efficacy of immunotherapy in MSI CRC patients. All the results in this study might reveal the tumor heterogeneity and contribute to developing effective prognosis biomarkers and therapeutic intervention in clinical for this malignancy. In the meanwhile, this study explored the tumor heterogeneity of CRC applying bioinformatic methods and pipelines based on the published datasets (scRNA-Seq data, TCGA COAD RNA-Seq data and several immune infiltration related gene signatures). However, we acknowledge that there are two limitations in this study, one is no validation of these findings in vivo and vitro. The other is scRNA-Seq data from CRC patient cells need to be considered to validate our findings. We will definitely do that in the further study. Fig. 4. [142]Fig. 4 [143]Open in a new tab Potential mechanism of immune evasion in MSI CRC patients. 5. Methods 5.1. Sample collection and single cell RNA sequencing data Single cell RNA (scRNA) expression data of CRC cancer cell lines was downloaded from Gene Expression Omnibus (accession number [144]GSE157220 ) [[145]11], which performed scRNA-Seq using all the cell lines in CCLE database with 10× Genomics Chromium system. Meta information of all CRC cell lines was downloaded from CCLE portal ([146]https://portals.broadinstitute.org/ccle). Then, 2089 CRC cancer cells with MSI or MSS status were obtained, which includes 1322 MSI and 767 MSS CRC cancer cells, respectively. Finally, a sing-cell RNA sequencing expression matrix of CRC cells (rows are genes and columns are cells) was obtained for the identification of recurrent patterns. Bulk RNA expression data of colorectal cancer patients (n = 451) was downloaded from TCGA (The Cancer Genome Atlas Program) by GDC data portal. Gastrointestinal cancer related ChIP-Seq data of H3K27ac was downloaded from published studies [[147][26], [148][27], [149][28]]. 5.2. Epigenetic modification data (H3K27ac ChIP-Seq data) analysis To compare the H3K27ac histone modifications of all samples at the same criterions, we downloaded all the raw data of these samples and did the same process analysis. Briefly, quality control was done in all samples, and then ChIP-Seq data of H3K27ac was aligned to the reference genome (HG19) by using Bowtie 2 (v2.2.6) with default parameters. Next, to mark the PCR duplicates, we applied Picard MarkDuplicates tool and identified duplicates in each sample, and then ENCODE blacklisted regions in HG19 reference genome ([150]https://sites.google.com/site/anshulkundaje/projects/blacklists) were removed before the peak calling analysis. Then, well-known peak calling method, Macs2 (v2.2.7.1), was used to call peaks with the default parameters: -bdg --SPMR --nomodel --extsize 200 -q 0.01 [[151]48]. To obtain bigwig file for visualization, bamCompare Function in DeepTools (v3.1.3) was utilized to convert all bam files to bigwig files. The bigwig files can be used to visualize the H3K27ac modification level in either whole genome or certain regions. Finally, we visualized the signal in Integrative Genomics Viewer (IGV) software by importing the corresponding bigwig files of each sample [[152]49]. 5.3. Identification of recurrent patterns and expression groups in colorectal cancer To identify multiple expression programs of colorectal cancer cell, we applied NMF algorithm in the scRNA-Seq data of each cancer cell line [[153]50]. Non-negative matrix factorization (NMF) is one of an algorithms in multivariate analysis with a matrix V and two matrices W and H. All three matrices should have no negative values. Recently, NMF has been widely applied in bioinformatics to identify gene expression patterns and obtain representative genes in each cluster ([154]https://en.wikipedia.org/wiki/Non-negative_matrix_factorization). In this study, NMF function was utilized to calculate the relative expression values (Er), which can transform all negative values to zero. Because NMF avoids the exact normalized values of undetected genes, it may be helpful to analyze scRNA-seq [[155]50]. Next, distinct parameters (k from 6 to 9) were used to identify robust expression programs. Then top 50 genes (selected by NMF score) defined the expression programs in each k. Then, we defined the robust expression programs which has an overlap of at least 35 genes (70%) with a program obtained from different k value in each cell line. At last, the programs with highest overlap with a NMF program identified in another cell lines were remained. Finally, Pheatmap R package was used to show the common expression programs. 5.4. Identification of single-cell rEgulatory network in different groups SCENIC ([156]https://github.com/aertslab/SCENIC) is a well-known method to infer the activity of transcription factors (TF) and has been widely used in many published papers [[157]46]. Here, we utilized SCENIC to infer the activity of TFs in each colorectal cancer cell. In brief, SCENIC embeds three functions to infer the regulatory network. Firstly, GENIE3 function can identify the potential TF targets based on the expression in each cell types. Then, RcisTarget function performs the analysis of TF-motif enrichment and extract the direct targets (regulons). A regulon contains at least one TF and its directly target genes. In the last part, AUCell function calculated the score of each regulon. Here, by applying SCENIC, we inferred TF activity and the corresponding regulons in four modules. Then, differentially regulons between one module and the other three modules were identified. 5.5. Tumor infiltrating lymphocytes (TILs) scores and T cell infiltration-related gene signatures Tumor infiltrating lymphocytes (TILs) have been mentioned in multiple solid tumor types, which are becoming a pivotal biomarker for predicting response and efficacy [[158]51]. TIL score of colorectal cancer samples was downloaded from a pathology-based study, which measured the T cell infiltration in 429 histology pathology slides in TCGA COAD samples [[159]33]. Several studies applied TIL score to evaluate the immune filtration in colorectal cancer [[160]45]. Gene signatures of T cell infiltration-related were collected from established studies. Twelve canonical T cell-associated genes, CCL2 (chemokine (C–C motif) ligand 2), CCL3 (chemokine (C–C motif) ligand 3), CCL4 (chemokine (C–C motif) ligand 4), CXCL9 (Chemokine (C-X-C motif) ligand 9), CXCL10 (Chemokine (C-X-C motif) ligand 10), CD8A (Cluster of Differentiation 8a), HLA-DOB (Major Histocompatibility Complex, Class II, DO Beta), HLA-DMB (Major Histocompatibility Complex, Class II, DM Beta), HLA-DOA (Major Histocompatibility Complex, Class I, DO Beta), GZMK (Granzyme K), ICOS (Inducible T Cell Costimulator), and IRF1 (Interferon Regulatory Factor 1), were identified based on about 1000 colorectal cancer samples by exploring the genetic drivers related to immune evasion [[161]52]. Here, “T cell average” score was defined by the average log expression of these twelve genes in each sample [[162]52]. Immune Deconvolution analysis was characterized in Tumor IMmune Estimation Resource (TIMER) database, which can evaluate the abundance of six immune cell types (B cell, CD4 T cell, CD8 T cell, neutrophil, macrophage, and dendritic cell) based on gene expression matrix [[163]28]. Then, CD8^+ T cell score was used as the second gene signature. Colorectal cancer MSI and MSS patients were considered as two different datasets and analyzed separately. The last gene signature reflects the immune cytolytic activity (CYT), which was evaluated by the average expression of two genes, GZMA (Granzyme A) and PRF1 (Perforin 1), in each sample [[164]32]. Immune system is composed of multiple cell types and each element performs specific task, which can recognize and/or react against foreign materials [[165]53]. T cell, as one of the subtype in white blood cells, which plays a crucial role in the immune response [[166]53]. Here, we mainly applied T cells related gene signature as immune infiltration criteria. 5.6. Unbiased Gene Set Enrichment Analysis (GSEA) Bulk RNA expression data of colorectal cancer (TCGA COAD) was downloaded from TCGA using TCGAbiolinks R package [[167]54]. MSI and MSS samples were considered as two different datasets and analyzed separately. Firstly, samples in MSI were ranked by the expression of certain TF and then the top 30% TF high expressed samples and top 30% TF low expressed samples were extracted as two groups. Secondly, R package, limma, was applied to determine differentially expressed genes (DEGs) in TF high and TF low groups [[168]55]. At last, GSEAPreranked function in GSEA software was applied to identify the relevant functions of the certain TF (hallmarks database was used here) [[169]56]. 5.7. Immunotherapy related markers We collected immunotherapy related markers form published paper [[170]57], including CD8 T cell related genes, CD8A; Granzyme related genes, GZMA (Granzyme A) and GZMB (Granzyme B). Other markers were showed in [171]Fig. 3G. 5.8. Statistical analysis The cor.test(.) function was used to calculate Pearson correlation in R software (3.6.3). The significance among different groups was calculated by t‐test(.) function in the boxplots. Colorectal cancer scRNA-Seq data was downloaded on March 1, 2021. The RNA-Seq data used in this study was download from TCGA database (Level 3, released on March 26, 2019 (GDC V16.0)). Funding This work was Supported by Sanming Project of Medicine in Shenzhen (No.SZSM201911012). Consent for publication Not applicable. Data availability scRNA-Seq and bulk RNA-Seq data used in this study are all published data, which can be downloaded in GEO ([172]GSE157220, [173]GSE13294) ([174]https://www.ncbi.nlm.nih.gov/geo/) and TCGA (COAD) ([175]https://portal.gdc.cancer.gov/) databases. ChIP-Seq data of H3K27ac were downloaded from published papers (more details in Method section). CRediT authorship contribution statement Peng Sun: Writing – review & editing, Writing – original draft, Validation, Software, Methodology, Investigation, Data curation. Yusong Luan: Writing – original draft, Software, Formal analysis, Data curation. Xuhao Cai: Validation, Software. Qi Liu: Software, Resources. Peide Ren: Writing – review & editing, Validation, Software. Pengpan Xin: Visualization. Yonggang Yu: Investigation. Bolun Song: Methodology. Yangyang Wang: Software, Resources. Huijing Chang: Data curation. Haoyue Ma: Visualization, Software. Yinggang Chen: Writing – review & editing, Writing – original draft, Supervision, Funding acquisition. Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Acknowledgements