Abstract Simple Summary Cytotoxic T lymphocytes (CTL) are critical for response to therapy and survival in CRC. Additionally, CTL are required for response to immune checkpoint inhibitor (ICI) therapy which does not work for most CRC patients. We utilized 7 omics datasets, integrating clinical and genomic data to determine how DNA methylation may impact survival and CTL function in CRC. Using comprehensive molecular subtype (CMS) 1 patients as reference, we found high TBX21 expression and low methylation had a significant survival advantage. To confirm the role of TBX21 in CTL function, we utilized scRNAseq data, demonstrating the association of TBX21 with markers of enhanced CTL function. Together, this study suggests that targeting epigenetic modification more specifically for therapy and patient stratification may provide improved outcomes in CRC. Abstract Cytotoxic T lymphocyte (CTL) infiltration is associated with survival, recurrence, and therapeutic response in colorectal cancer (CRC). Immune checkpoint inhibitor (ICI) therapy, which requires CTLs for response, does not work for most CRC patients. Therefore, it is critical to improve our understanding of immune resistance in this disease. We utilized 2391 CRC patients and 7 omics datasets, integrating clinical and genomic data to determine how DNA methylation may impact survival and CTL function in CRC. Using comprehensive molecular subtype (CMS) 1 patients as reference, we found TBX21 to be the only gene with altered expression and methylation that was associated with CTL infiltration. We found that CMS1 patients with high TBX21 expression and low methylation had a significant survival advantage. To confirm the role of Tbx21 in CTL function, we utilized scRNAseq data, demonstrating the association of TBX21 with markers of enhanced CTL function. Further analysis using pathway enrichment found that the genes TBX21, MX1, and SP140 had altered expression and methylation, suggesting that the TP53/P53 pathway may modify TBX21 methylation to upregulate TBX21 expression. Together, this suggests that targeting epigenetic modification more specifically for therapy and patient stratification may provide improved outcomes in CRC. Keywords: TCGA, scRNAseq, epigenetic, CD8+ T[EX], TBX21, colorectal cancer, consensus molecular subtypes 1. Introduction Colorectal cancer (CRC) is a heterogeneous disease characterized by distinct genome-wide changes with the third-highest incidence rate and the second-highest rate of cancer-related deaths worldwide [[44]1]. To better design treatments for CRC, it is crucial to understand tumor heterogeneity and how this contributes to therapeutic resistance and disease progression. Genomic instability and epigenetic abnormalities with resultant dysregulation of gene expression are hallmarks of CRC. The high frequencies of DNA somatic copy number alterations (SCNAs) and APC tumor suppressor gene loss of function are closely linked with CIN-caused deletions, gains, translocations, and other chromosomal rearrangements, one of the primary pathways of CRC development [[45]2]. Additionally, ~15% of CRC cases demonstrate alterations in DNA mismatch repair (MMR) proteins, which lead to hypermethylation and cancer development [[46]3,[47]4,[48]5]. Importantly, a single factor does not lead to tumorigenesis, and this underlines the importance of understanding the molecular features of each individual tumor to improve therapy using a precision-based approach. Immune-based therapies, such as immune checkpoint inhibition (ICI), have recently made significant advances in difficult-to-treat malignancies, such as non-small cell lung cancer, melanoma, and renal cell cancer [[49]6,[50]7]. However, these treatments are limited to patients with microsatellite instability-high (MSI-H) CRC, as they have not yet demonstrated efficacy in other patient groups [[51]8]. Additionally, even in patients with an indication for use of ICI therapy, response is limited [[52]9]. This is despite data demonstrating that the number of CTLs within the tumor microenvironment (TME) is a critical prognostic marker for CRC [[53]10,[54]11]. This again highlights the importance for improved methods of assessing the TME and understanding therapeutic resistance. Dysregulated methylation impacts signaling pathways associated with apoptosis avoidance, metastasis, and therapeutic resistance and represents an important process in CRC [[55]12]. Additionally, combined treatments with drugs targeting epigenetic modification exploit the dynamic nature of epigenetic changes to potentially modulate responses to immunotherapy [[56]13]. However, current drugs targeting epigenetic modification are globally hypomethylating agents, such as Azacitidine and Decitabine, which are non-selective and may have unexpected effects [[57]14]. Critically, a more specific understanding of epigenetic alterations is necessary to improve therapy and patient stratification. Recently, large-scale public data repositories, such as The Cancer Genome Atlas (TCGA) and cBioPortal, as well as publicly available single-cell sequencing data, have allowed us to perform further in-depth study of cancer patients on a molecular and clinical basis using multi-omics [[58]15,[59]16,[60]17]. To better classify CRC patients, the consensus molecular subtypes (CMSs) were developed by an expert panel using eighteen different CRC mRNAseq and microarray datasets, settling on four different subtypes based on molecular features [[61]18]. Patients classified as CMS1 are characterized by microsatellite instability (MSI) and hypermutation and are considered the “immune” subtype. CMS2 is referred to as “canonical”, characterized by marked Wnt/β-catenin/TCF7L2 pathway activation and APC mutation. Characterized by metabolic deregulation, KRAS mutations, and mixed MSI patients, CMS3 is the least common. Tumors classified as CMS4 are “mesenchymal”, demonstrating prominent TGF-β activation, stromal infiltration, angiogenesis, and epithelial-mesenchymal transition (EMT) [[62]18]. Recognizing the importance of molecular classification, there are ongoing pre-clinical trials utilizing this system for patient stratification and selection of immune-based treatments ([63]NCT03436563, [64]NCT04738214). To address limitations regarding the current knowledge in resistance to immune-based therapy and the role of epigenetic modification of gene expression, in this study we sought to combine multi-omics data with cutting-edge data analytics and comprehensive molecular classification using CMS ([65]Figure 1). Utilizing publicly available data from four data repositories, including 15 datasets, 2391 CRC patients, and 7 omics datasets, we found a difference in survival based on differential expression and methylation of the critical T cell regulatory factor T-bet (TBX21). Additionally, we found that TBX21, MX1, and SP140 may play an important role impacting T cell function via TP53/P53 pathway modification of TBX21 methylation [[66]19]. Together, this suggests that targeting epigenetic modification more specifically for therapy and patient stratification may provide improved outcomes in CRC. Figure 1. [67]Figure 1 [68]Open in a new tab An outline of the methods and organization of this study. This flowchart describes the multi-omics we included and the process of how we breakdown patients by CMS classification. We then compared CMS1 and other CMS subtypes to determine DEGs and DMGs. Then, the MCPcounter score was calculated and survival analysis was undertaken using the identified CD8+ T exhaustion cell maker and immuno-crosstalk gene. Pathway associations were determined for DEGs using the Reactome online browser. Tumor mutation burden and neoantigen were used to evaluate the CMS subtypes. Eleven publicly available datasets were used to validate our partial results. scRNA, single-cell RNA-Seq; DMG, differentially methylated gene; DEG, differentially expressed gene. 2. Results 2.1. CMS1 Subtype Patients Demonstrate Features Consistent with Immune Activation To address the question of the clinical impact of DNA methylation on gene expression and resistance to immune-based therapy, we chose to use TCGA, the largest publicly available multi-omics dataset with substantial clinical annotation. First, we downloaded and integrated the 461 colon cancer (COAD) and 171 rectal cancer (READ) patients from this dataset. We then classified patients in CMS subtypes to use the most comprehensive current molecular classification [[69]18]. Using this system, there were 316 patients with DNA methylation data (450 k) and 509 patients with mRNAseq data labelled with CMS subtypes. The demographic, clinical, and pathologic characteristics of each patient group are summarized by CMS subtype in [70]Table 1. As expected, the microsatellite status composition of patients was significantly different between CMS subtypes (p < 0.0001), with the CMS1 subtype containing more MSI-H patients and the CMS3 subtype containing more MSI-L/MSS patients. The median age of CMS1 patients’ diagnosis was 71, ranging from 58 to 85, significantly higher than patients in the other CMS subtypes (p = 0.0019). Additionally, in the CMS2 and CMS4 subtypes, more patients were diagnosed at stage III and IV than the other subtypes (p < 0.0001). Table 1. Patient characteristics. CMS, consensus molecular subtype; MSS, microsatellite stable; MSI-L, microsatellite instability-low; MSI-H, microsatellite instability-high. Characteristic CMS1 (n = 76) CMS2 (n = 218) CMS3 (n = 72) CMS4 (n = 143) p Value Gender, n Male 37 123 38 75 0.6748 Female 39 95 34 68 MS, n MSS/MSI-L 14 217 60 137 <0.0001 MSI-H 62 1 12 6 Age, Mean ± SD 71 ± 14 66 ± 12 66 ± 13 65 ± 13 0.0019 Pathologic stage, n (%) I 12 (16) 46 (21.8) 21 (30.4) 9 (6.6) <0.0001 II 42 (56) 71 (33.6) 30 (43.5) 49 (36.0) III 17 (22.7) 58 (27.5) 15 (21.7) 54 (39.7) IV 4 (5.3) 36 (17.1) 3 (4.3) 24 (17.6) [71]Open in a new tab Next, we sought to combine CMS classification with stratification by cytotoxic lymphocyte (CTL) infiltration as a marker for anti-tumor immunity [[72]20]. MCP-counter scores were derived for each patient, and patients were then stratified by MCP-counter score quartile and CMS classification ([73]Figure 2) [[74]21,[75]22]. Patients in the CMS1 subtype had the highest median CTL abundance score, and most patients in this subgroup were in the highest quartile of CTL infiltration. Additionally, we also found that, when considering tumor mutational burden (TMB, [76]Figure S1) and neoantigen-predicted peptides ([77]Figure S2), CMS1 subtype patients were again much higher than the other subtypes. These data combined support the idea that CMS1 subtype patients represent an “inflamed” phenotype and are the most immune-active. Figure 2. [78]Figure 2 [79]Open in a new tab Ten tumor microenvironment cell populations for MCP-counter scores. CMS1 patients have significantly higher cytotoxic lymphocyte scores than the others. Each set of three columns represents one CMS subtype, each row represents the cell population. 2.2. TBX21 Is the Only Gene Differentially Expressed, Differentially Methylated, and Highly Correlated with CTL Infiltration across All CMS Subtypes CTL infiltration is critical for anti-tumor immunity and response to immune-based therapy. Therefore, we next sought to understand how gene expression and DNA methylation were associated with CTL infiltration and molecular alteration. Using CMS1 as the reference subtype, given its high level of CTL infiltration and predicted response to ICI, we performed differential gene expression and differential methylated region analysis ([80]Figure 1). Expressions of 20,531 genes and 293,276 methylation loci were determined for each tumor sample from TCGA. There were 2977 differentially expressed genes and 11,541 differentially methylated regions (3500 differentially methylated genes) in the comparison of CMS1 and CMS2. In the comparison of CMS1 and CMS3, there were 2385 differentially expressed genes and 4231 differentially methylated regions (1583 differentially methylated genes). In the comparison of CMS1 and CMS4, there were 3246 differentially expressed genes and 9393 differentially methylated regions (2359 differentially methylated genes). To identify “crosstalk genes”, genes both differentially expressed and differentially methylated, we then performed a correlative analysis plotting differentially expressed (|logFC| > 1 and p-value < 0.05, adjusted for FDR) against differentially methylated genes (|Δβ| > 0.25 and p-value < 0.05, adjusted for FDR, [81]Figure 3a–c). Identified crosstalk genes were predominantly enriched in quadrants “higher methylation and higher expression” and “higher methylation and lower expression” ([82]Figure 3a–c), demonstrating increased methylation (Δβ > 0.25) associated with either increased or decreased differential gene expression. To validate these findings, we analyzed other non-TCGA publicly available gene expression and DNA methylation datasets ([83]Table S1) [[84]23]. Analysis of these datasets confirmed that TBX21 was consistently differentially methylated when comparing CMS1 vs. other CMS subtypes. Specifically, we found that one loci in the coding region of TBX21 (SiteID: cg26281453, Loc: 45810610) was differentially methylated in all datasets ([85]Tables S1 and S2). Figure 3. [86]Figure 3 [87]Open in a new tab The distribution of crosstalk genes. Among the identified crosstalk genes for each comparison, most crosstalk genes were enriched in quadrants 2 and 3. TBX21 was the only gene differentially expressed and differentially methylated across three comparisons. The x-axis is the delta beta value of methylation, the y-axis is the RNAseq LogFC. (a) Distribution of crosstalk genes from CMS2 vs. CMS1 DEG and DMG analysis. (b) Distribution of crosstalk genes from CMS3 vs. CMS1 DEG and DMG analysis. (c) Distribution of crosstalk genes from CMS4 vs. CMS1 DEG and DMG analysis. To explore the relationship of crosstalk genes with CTL infiltration, we correlated crosstalk gene expression with MCP-counter CTL abundance scores. Interestingly, TBX21 was the only gene differentially expressed, differentially methylated, and highly correlated with CTL from each CMS1 comparison (r > |0.7|, [88]Figure 4a–c). Interestingly, we observed that TBX21 was more highly expressed in CMS1 patients but was also more highly methylated ([89]Figure 3a–c). However, the typical expectation is that methylation is an epigenetic modification that leads to decreased gene expression [[90]5]. Moreover, previous studies have suggested that TBX21 is an important transcriptional regulator of tumor-reactive CD8T-cells, which are critical for response and survival [[91]24]. Given these results, we hypothesized that the importance of alterations in the expression of TBX21 via methylation may be highest within CMS1 patients. Figure 4. [92]Figure 4 [93]Open in a new tab TBX21 was highly correlated with CTL. The x-axis is the CTL score, the y-axis is the normalized TBX21 mRNAseq expression. (a) Normalized TBX21 expression was highly correlated with CTL in CMS2. (b) Normalized TBX21 expression was highly correlated with CTL in CMS3. (c) Normalized TBX21 expression was highly correlated with CTL in CMS4. 2.3. CMS1 Patients with High TBX21 Expression and Low TBX21 Methylation Have the Best Survival To better understand the relationship between TBX21 expression and methylation and CMS1 patient outcomes, we next performed a survival analysis. When using TBX21 expression or methylation values as independent variables, there was no difference in survival. Therefore, we further classified CMS1 patients using the median value of TBX21 expression and mean methylation β value to separate these patients into four groups ([94]Figure 5a). Patients were grouped as: high TBX21 expression with low methylation (high-low); high TBX21 expression with high methylation (high-high); low TBX21 expression with high methylation (low-high); and low TBX21 expression with low methylation (low-low). We then repeated the survival analysis with CMS1 patients stratified by these four subgroups ([95]Figure 5b). Patients classified as high-low (high expression, low methylation) demonstrated the best survival, followed by low-high patients. Interestingly, the group of patients classified as high-high appeared to represent a potential intermediate subgroup ([96]Figure 5a) with worse survival ([97]Figure 5b) than both high-low and low-high subgroups. These data suggest that the interaction between TBX21 expression and methylation plays an important role in patient prognosis. Figure 5. [98]Figure 5 [99]Open in a new tab (a) Subtype of CMS1 patients based on the expression and methylation of TBX21. There were 24 patients with higher TBX21 methylation and 17 patients with lower methylation. The x-axis is TBX21 mRNAseq expression, the y-axis is the TBX21 methylation beta value. High-low: high TBX21 mRNA expression, low TBX21 methylation (number of patients (n) = 13); high-high: high TBX21 mRNA expression, high TBX21 methylation (n = 8); low-high: low TBX21 mRNA expression, high TBX21 methylation (n = 16); low-low: low TBX21 mRNA expression, low TBX21 methylation (n = 4). (b) Patients with higher expression and low methylation of TBX21 had the best survival, and the patients with lower expression and higher methylation had worse survival than high-low group and there were also the largest number of patients with this status in this group. The x-axis is patient days elapsed, the y-axis is percent of patients still alive. 2.4. There Were No Significant Clinical Differences between High-Low and Low-High CMS1 Patient Subgroups This result inspired us to investigate the potential mechanisms of the observed difference in survival between CMS1 patient subgroups. Therefore, we retrieved the patient clinical attributes from cBioPortal, including 41 separate attributes. We found no clinical attributes with significant differences when comparing high-low and low-high patients. However, when comparing the high-high group to other groups, we found multiple significant factors. There were eight clinical attributes with significant differences between high-low and high-high patient groups ([100]Table S3). Most importantly, we noted that high-high patients had a significantly higher number of positive lymph nodes than high-low patients (p-value: 0.0442, [101]Figure 6a), a known marker of poorer survival. Additionally, high-high patients had significantly higher rates of lymphovascular invasion (LVI) than low-high patients (p-value: 0.0192, [102]Figure 6b), another important clinical risk factor for survival. These results suggest that the survival difference demonstrated by the high-high group may be driven by clinical factors; however, the survival difference between the high-low and low-high patient groups may be more likely driven by molecular factors. Figure 6. [103]Figure 6 [104]Open in a new tab (a) High-high patients had more positive lymph nodes than high-low patients. (b) high-high patients had more lymphovascular invasions than low-high patients. 2.5. Patients with high TBX21 Expression and Low Methylation Are the Most Highly Immune-Infiltrated To further evaluate the impact of TBX21 on survival and anti-tumor immunity, we looked at other indicators of immune resistance. Tumor mutational burden (TMB) has been shown to predict survival and response to immune checkpoint, so we first derived these scores for each patient [[105]25]. When comparing the subgroups of CMS1 patients, however, we observed no significant differences in TMB ([106]Table S4). Next, we derived neopeptides, a potentially better marker of immune reactive antigen in the tumor microenvironment, and, again, we did not observe any differences between subgroups of CMS1 patients ([107]Table S4). Given the lack of overt differences in these measures, we sought to further investigate other aspects of anti-tumor immunity. To obtain a more in-depth look at immune alterations in CMS1 patient subgroups, we next looked more specifically at different cell populations in the TME using cell marker score analysis [[108]26]. Signature genes for calculating the cell marker score of each cell population were obtained from the TCIA dataset ([109]Table S5) [[110]26]. Using this analysis, we found that high-low patients had significantly higher infiltration of CD8+ T cell subtypes, including activated CD8+ T cell (p value = 0.0008), central memory CD8+ T cell (p value = 0.0027), and effector memory CD8+ T cell (p value = 0.0023) ([111]Figure 7a–c). We also found increased infiltration of T helper cells (Th1, p value = 0.0007) and activated dendritic cells (DC, p value = 0.0007) ([112]Figure 7d,e). However, in addition to the significantly higher infiltration of cell subtypes consistent with anti-tumor immune profiles, we also saw increased infiltration of immune-suppressive cell subtypes, such as Treg (p value = 0.0040) and myeloid-derived suppressor cells (p value = 0.0027) ([113]Figure 7f,g). TH17 and monocyte subsets were not significantly different. Given the evidence of increased immune cell infiltration, both pro-and anti-inflammatory, we sought to further understand the molecular differences that may suggest a mechanism for the observed difference in survival associated with TBX21 in these patients. Figure 7. [114]Figure 7 [115]Open in a new tab Cell marker score analysis for immune cell populations. High-low patients had significantly higher infiltration of CD8+ T cell subtypes, including (a) activated CD8+ T cell, (b) central memory CD8+ T cell, and (c) effector memory CD8+ T cell. We also found increased infiltration of (d) T helper cells (Th1) and (e) activated dendritic cells (DC). We also saw increased infiltration of cell subtypes suggested to be immune-suppressive: (f) Treg and (g) myeloid-derived suppressor cells. 2.6. Epigenetic Modification of Genes in the Regulation of TP53 Activity including TBX21 Are Enriched in High Expression and Low Methylation Patients To look at the question of molecular differences impacting survival in CMS1 patient subgroups, we completed DEG analysis, focusing on the differences between high-low and low-high patients ([116]Table 2). Notably, we found that there were many more genes differentially expressed when comparing high-low and low-high patients than when comparing the other subgroups (1482 DEGs, [117]Table S6), further supporting an important molecular difference in these patient groups. Next, DEGs obtained from comparing high-low and low-high subgroups were analyzed using Reactome gene enrichment analysis. We found 741 DEGs that were enriched in 41 pathways ([118]Table S7). Eighteen CD8+ T[EX] marker genes were enriched in thirteen pathways, all of which were upregulated in high-low patients ([119]Figure 8). After processing DMG analysis between low-high and high-low patients, five crosstalk genes associated with the CD8+ T[EX] signature were identified ([120]Table S8). We next applied Reactome analysis specifically for crosstalk genes. These five crosstalk genes participated in 303 pathways ([121]Table S9). Notably, we observed that genes MX1, SP140, and TBX21 were frequently enriched in the “Regulation of TP53 Activity” and its cascade pathways. Moreover, we identified the function of SP140 from the EpiFactors database as a Zinc finger structure that mainly targets Histone modification read and transcription factor (TF) regions to participate in epigenetic modification [[122]27]. Together, these data suggest a critical role for epigenetic modification of TP53 pathway genes impacting TBX21 and patient outcome in these subgroups. Table 2. Reactome pathway analysis for crosstalk gene with CD8Tex feature from the DEG and DMR analysis of high-low and low-high CMS1 patients. Pathway Identifier Pathway Name Submitted Entities Hit Interactor R-HSA-6803204 TP53 Regulates Transcription of Genes Involved in Cytochrome C Release TBX21 R-HSA-6804760 Regulation of TP53 Activity through Methylation TBX21 R-HSA-6804114 TP53 Regulates Transcription of Genes Involved in G2 Cell Cycle Arrest TBX21 R-HSA-6811555 PI5P Regulates TP53 Acetylation MX1; SP140; TBX21 R-HSA-6804758 Regulation of TP53 Activity through Acetylation MX1; SP140; TBX21 R-HSA-6791312 TP53 Regulates Transcription of Cell Cycle Genes TBX21 R-HSA-5633008 TP53 Regulates Transcription of Cell Death Genes TBX21 R-HSA-5633007 Regulation of TP53 Activity MX1; SP140; TBX21 R-HSA-3700989 Transcriptional Regulation by TP53 MX1; SP140; TBX21 [123]Open in a new tab Figure 8. [124]Figure 8 [125]Open in a new tab Eighteen CD8+ T[EX] marker genes were enriched in thirteen pathways, all of which were upregulated in high-low patients. Longer bars indicate that more DEGs were enriched in the pathway. Bar color from blue to red indicates that the DEG-enriched pathways had higher p-adjusted values. 2.7. TBX21 Is a Key Modulator in CD8+ T Exhausted Cells To validate our hypothesis regarding the importance of TBX21 in T[EX] in CRC, we explored publicly available scRNAseq data from 23 CRC patients with 65,362 matched normal and tumor single cells. Data were first normalized and then, utilizing cell subtypes identified by the original study, we retrieved 47,285 tumor cells labeled as TP [[126]7]. Blueprint/ENCODE references from the SingleR subtype identifier were