Abstract Background Investigations into the function of non-promoter DNA methylation have yielded new insights into the epigenetic regulation of gene expression. However, integrated genome-wide non-promoter DNA methylation and gene expression analyses across a wide number of tumour types and corresponding normal tissues have not been performed. Methods To investigate the impact of non-promoter DNA methylation on cancer pathogenesis, we performed a large-scale analysis of gene expression and DNA methylation profiles, finding enrichment in the 3’UTR DNA methylation positively correlated with gene expression. Filtering for genes in which 3’UTR DNA methylation strongly correlated with gene expression yielded a list of genes enriched for functions involving T cell activation. Findings The important immune checkpoint gene Havcr2 showed a substantial increase in 3’UTR DNA methylation upon T cell activation and subsequent upregulation of gene expression in mice. Furthermore, this increase in Havcr2 gene expression was abrogated by treatment with decitabine. Interpretation These findings indicate that the 3’UTR is a functionally relevant DNA methylation site. Additionally, we show a potential novel mechanism of HAVCR2 regulation in T cells, providing new insights for modulating immune checkpoint blockade. Keywords: 3’UTR, Epigenetics, Immune checkpoint, Methylation __________________________________________________________________ Research in context. Evidence before this study DNA methylation allows for differential expression of a single gene without requiring modification to the DNA sequence. This plays a critical role in both normal cellular processes, as well as in disease. Cancer cells are well known to have radically altered DNA methylation profiles. However, these changes are not isolated to the cancer cells themselves, but rather occur in various cell types within the tumour microenvironment. In particular, DNA methylation has been revealed as a major driver in why T cells become “exhausted” and no longer target cancer cells. Expression of factors that dampen T cell activity has been implicated in this phenotype, and as such, combining demethylating agents with immunotherapy has been shown to increase efficacy. HAVCR2 (TIM-3) is a critical immunoregulatory gene in which expression, especially in conjunction with PDCD1 (PD-1), induces an exhausted T cell state. Much of how DNA methylation functions in regulating HAVCR2 remains to be understood. Added value of this study Understanding how site-specific DNA methylation impacts expression is critical for gaining a more complete control over cellular processes, particularly in the context of cancer. T cells represent a potent tool for eliminating tumour cells, and DNA methylation is a major determinant of T cell function. This study investigates how intragenic site-specific DNA methylation across genes involved in T cell function changes based on T cell activation state. Implications of all the available evidence In this study, we have uncovered a strong positive correlation between 3’UTR DNA methylation of specific genes, and increased gene expression. Genes that play a role in T cell activation are enriched among those exhibiting the most robust correlations. Furthermore, 3’UTR methylation and gene expression of the immune checkpoint gene HAVCR2 increases significantly after T cell activation. Treating activated T cells with the demethylating agent decitabine, or knocking out the DNA methylating enzyme Dnmt3a in mice results in decreased 3’UTR methylation and gene expression of Havcr2. Therefore, the 3’UTR may serve as a functionally relevant site of DNA methylation. Moreover, alterations in the methylation of this region may be involved in activated and exhausted T cell phenotypes. Modulating this region may grant additional control over harnessing the immune system. Alt-text: Unlabelled Box 1. Introduction DNA methylation is found at nearly every region of the genome, with methylation at the gene promoter region being the most comprehensively understood. Promoter DNA methylation, typically occurring within CpG islands, results in powerful repression of transcription, primarily by recruiting repressor proteins or chromatin modifiers that enhance the binding of DNA to histones [[39]1,[40]2]. The interplay between epigenetic modifications on DNA and histones allows for remarkable plasticity among cells that share identical genomes by providing a means of activating or silencing genes whose expression affects the developmental state of specific cell types [[41]3]. However, promoter DNA methylation accounts for only a small portion of the overall DNA methylation of the genome [[42]4]. Much remains to be understood about the diverse functions of site-specific non-promoter DNA methylation on gene regulation. Recent studies have explored the connection between non-promoter DNA methylation and the regulation of gene expression. Enhancer DNA methylation has emerged as a robust predictor of gene expression, with a fraction of genes showing a stronger link between gene expression and enhancer DNA methylation relative to promoter methylation [[43]5,[44]6]. Gene body DNA methylation, like promoter methylation, can repress gene expression through altered binding of regulatory proteins [[45]7], but it is frequently associated with increased gene expression, which runs counter to its role in promoter DNA methylation [[46]8]. Multiple mechanisms to explain this relationship have been uncovered, such as promoting genomic stability by suppressing mobile DNA elements [[47]9,[48]10], modulating patterns of histone methylation that stabilize transcriptional elongation [[49]11], and regulating alternative splicing [[50]12,[51]13]. Importantly, a direct link between gene body DNA methylation and high expression of oncogenes has been demonstrated in colorectal cancer [[52]14], highlighting the functional relevance of this mode of epigenetic modification in both normal and pathogenic processes. Non-promoter DNA methylation undergoes extensive changes in disease, most strikingly in cancer [[53]15]. The DNA methylation of enhancer regions of clinically relevant genes, particularly within super-enhancers, were drastically altered, resulting in pathogenic transcriptional output [[54][16], [55][17], [56][18]]. Global changes to gene body DNA methylation in Burkitt and follicular lymphoma have been observed [[57]19]. Methylation of the ITPKA gene body is associated with increased expression in lung cancer, and the extent of gene body methylation serves as a biomarker for lung cancer progression [[58]20]. In colon cancer, increased gene expression through gene body DNA methylation is enriched for genes activated by c-Myc, and demethylating these regions impairs the ability of tumour cells to survive and proliferate [[59]14]. Modulating the epigenetic profiles of tumours has become a promising new route for treating cancer [[60]21].Recent therapeutic endeavors have focused primarily on re-activation of the genes suppressed by promoter methylation; therefore, cataloging the functions of non-promoter DNA methylation may broaden our understanding of tumourigenesis and tumour progression, while offering new opportunities for clinical intervention. To address this gap in knowledge, a comprehensive analysis of how DNA methylation of regions outside the promoter region may impact cancer pathogenesis was conducted. Here, for the first time, we uncovered a relationship between 3’UTR DNA methylation and gene expression across 10 tumour types, revealing a unique association between these two phenomena that has implications for our understanding of the role of DNA methylation in normal cellular processes and disease. 2. Materials and methods 2.1. Study design Sample size: We used all of the tumour and normal tissue samples available in the TCGA database, which includes data on 11,000 US cancer patients. There are no publication restrictions on these data according to the TCGA data policy ([61]http://cancergenome.nih.gov/publications/publicationguidelines). Rules for stopping data collection: N/A. Data inclusion and exclusion criteria: Three cut-offs for the correlation coefficient were originally set: 0∙7, 0∙5, and 0∙3. The minimum was set at 0∙3. However, because this yielded a very large number of genes, greater specificity was sought. Therefore, 0∙5 was used as the final cut-off. Outliers: No outliers were excluded from this analysis. Selection of endpoints: N/A. Replicates: For in-house experiments, each assay was conducted in triplicate, on the basis of previously designed methods. Each experiment was conducted at least twice, with similar results being achieved each time. 2.2. TCGA tissue selection For this analysis, we used tumour and normal tissue samples from 10 tumour types; Illumina HiSeq RNASeqV2 and Illumina HumanMethylation 450 k data were available at the time of our initial download on March 26, 2013. The tumour types included bladder carcinoma, breast carcinoma, colon and rectal carcinoma, head and neck squamous cell carcinoma, kidney renal cell carcinoma, liver carcinoma, lung adenocarcinoma, lung squamous cell carcinoma, prostate adenocarcinoma, thyroid carcinoma, and uterine carcinoma. The corresponding clinical data used for the survival analysis were downloaded from the TCGA data portal and were current as of January 8, 2014. 2.3. Methylation and gene expression data According to the TCGA description file associated with the Illumina Human Methylation 450 K array data, probes with a SNP within 10 nucleotide base pairs (bp) of the interrogated CpG site or those in which 15 bp of the interrogated CpG site are overlapped with a REPEAT element are masked as NA across all samples. There are 88,058 probes that interrogate such sites (18.3% of all probes). While these beta values are not reported at level 3, the methylated and un-methylated intensity values for these probes are recorded in the level 2 data. Therefore, we used the level 2 data to reconstruct the beta values for all probes as methylated/(methylated+unmethylated); these data are also used by the TCGA. We used the log2-transformed level 3 RNASeqV2 data to analyse gene expression. To avoid errors for RNASeq raw counts of 0, all values were offset by 1 prior to obtaining logs. 2.4. Gene methylation-expression correlation All analyses were performed using R software, version 2∙15∙1. Using the complete set of probes targeting CpG dinucleotides, we performed a genome-wide analysis exploring the relationship between the proportion of methylation at various locations within and up to 1500 bp upstream of a gene and the corresponding log-transformed gene expression. We used the Spearman rank statistic to quantify the correlation for each pair. Because we expected these patterns to vary by the tissue source site, we calculated coefficients individually using each of the 10 tissue types for which we had data from both the 450 k methylation and RNASeq arrays. 2.5. Survival analysis Tests for differences in survival were performed by comparing the overall survival of patients in the top and bottom quartiles of 3’UTR methylation using the “survival” package in R software for all genes; correlations of >0∙5 between gene expression and methylation were used for each tissue type. 2.6. Pathway analysis The pathway enrichment analysis was performed using Netwalker ([62]http://www.netwalkersuite.org) on all genes with correlations of > 0∙5 between gene expression and methylation at the 3’UTR and < 0∙5 between gene body methylation and gene expression for each tissue type. An additional pathway analysis was run using Ingenuity Pathway Analysis ([63]http://qiagenbioinformatics.com/products/ingenuity-pathway-analysi s.com), which yielded similar results. 2.7. Network identification and construction The network nodes were obtained using Netwalker to determine how many nodes exist and how the genes interact with one another. A singular network node was identified using this method; we also isolated genes that have been shown to be associated with one another but were not identified in the Netwalker database. The node and other genes that were associated with it were exported to Cytoscape ([64]https://www.cytoscape.org/). Connections were then manually included in Cytoscape on the basis of results in the published literature [[65]22]. 2.8. T cell isolation, activation, and de-methylation The spleens of 4 healthy female C57Bl/6 mice were excised after cervical dislocation and then ground on a 40-μm filter while being repeatedly washed with RPMI (Sigma-Aldrich) + 10% heat-inactivated FBS (Thermo Fisher Scientific) + 1% PenStrep (Thermo Fisher Scientific) + 0∙01% β-mercaptoethanol (Sigma-Aldrich). The resulting slurry was spun down at 4 °C at 450 ×g for 5 min, the supernatant was removed, and the pellet was washed with 50 mL of PBS (HyClone). The pellet was incubated at ambient temperature for 5 min using 4 mL of ACK lysis buffer (Gibco), washed with 50 mL of PBS, and spun down at 4 °C at 450 ×g. The pellet was then resuspended in the media. To obtain naïve T cells, cells were sent immediately for flow cytometry cell sorting. To activate the T cells, 2 μL/mL of mouse CD3e (BD Biosciences, RRID: [66]AB_2259894) and 3.5 μL/10 mL of mouse CD28 (BioXCell, RRID: [67]AB_1107624) activating antibodies were added to the media, along with 1:10000 ng/μL mouse recombinant IL-2 (R&D Systems). To induce demethylation, activated T cells were treated with 500 nM decitabine (Sigma-Aldrich) for 72 h or vehicle DMSO control. 2.9. Flow cytometry analysis Mouse splenocytes were grown in the conditions outlined above and spun down at 450 ×g at 4 °C for 5 min to pellet the cells. The supernatant was removed and the cells were re-suspended in 2 mL of FACS buffer (PBS + 2% FBS). Cells were then spun at 450 ×g at 4 °C for 5 min. The supernatant was again removed, and the cells were re-suspended in 200 μL of FACS buffer +2 μL of mouse CD16/CD32 blocking antibody (BD Biosciences, RRID: [68]AB_2740348) and incubated on ice for 10 min. After incubation, 2 μL each of mouse CD45-PE, CD4-eFluor 450, and CD8-APC-eFluor 780 (eBiosciences, RRID: [69]AB_469717, [70]AB_467067, [71]AB_11180008) were added, and the cells were incubated on ice for 30 min. All cells that were CD45+ and CD4+ or CD45+ and CD8+ were sorted by flow cytometry analysis and collected for molecular analysis. 2.10. Quantitative polymerase chain reaction Specimens were collected from the flow cytometry analysis, spun down at 450 ×g for 5 min at 4 °C, and then re-suspended in 350 μL of TRIzol (Thermo Fisher). RNA was isolated from specimens using the Direct-zol RNA isolation kit (Zymo) and quantified by NanoDrop. We used 100 ng of RNA for a cDNA template. cDNA was created using the Verso cDNA synthesis kit (Invitrogen). We combined 5 μg of cDNA with 1 μL of 100 μM forward and reverse primers and 5 μL of SYBR Green PCR Master Mix (Thermo Fisher) in each well. The resulting mixture was then run in a real-time PCR machine (Applied Biosystems) using the following program: 50 °C for 2 min, 95 °C for 10 min, 95 °C for 15 s, and 60 °C for 1 min × 40 cycles. The resulting Ct values were compared, and the ΔΔCt was obtained. This was used to quantify the relative change in mRNA across samples. 2.11. Primers Please see [72]Supplementary Table 3 for primer sequences. 2.12. Methylation analysis One microgram of genomic DNA was treated with sodium bisulfite using the EZ DNA Methylation-Gold Kit (Zymo Research, Irvine, CA) according to the manufacturer's protocol. The samples were eluted in 40 μL of M-elution buffer, and 2 μL were used for each PCR reaction. Both bisulfite conversion and a subsequent pyrosequencing analysis were performed at the DNA Methylation Analysis Core at The University of Texas MD Anderson Cancer Center (Houston, Texas). PCR primers for the pyrosequencing methylation analysis of Havcr2, Itk, and Vav1 were designed using Pyromark Assay Design SW 1∙0 software (Qiagen, Germany). In brief, a sequencing primer was identified within 1 to 5 base pairs near the CpG sites of interest, with an annealing temperature of 40 ± 5 °C. Forward and reverse primers were identified upstream and downstream of the sequencing primer. The optimal annealing temperatures for each of these primers were tested using gradient PCR. Controls for high methylation (SssI-treated DNA), low methylation (WGA-amplified DNA), and no-DNA template were included in each reaction. PCR reactions were performed in a total volume of 20 μL, and the entire volume was used for each pyrosequencing reaction, as previously described [[73]23]. In brief, PCR product purification was performed using streptavidin-sepharose high-performance beads (GE Healthcare Life Sciences, Piscataway, NJ), and co-denaturation of the biotinylated PCR products and sequencing primer (3∙6 pmol/reaction) was conducted following the PSQ96 sample preparation guide. Sequencing was performed on a PyroMark Q96 ID instrument with PyroMark Gold Q96 Reagents (Qiagen, Germany), according to the manufacturer's instructions. The degree of methylation for each individual CpG site was calculated using PyroMark Q96 software. The average methylation of all sites and duplicates was reported for each sample. 2.13. Statistical analysis Statistical analyses were performed using unpaired t-tests for comparisons between 2 groups and one-way ANOVA with Tukey's posttest for multiple comparisons for >2 groups (Graphpad Prism). 2.14. Sample size estimation Three replicates for each experiment were prepared in order to identify and exclude any potential outliers; however, there were no outliers excluded from the in vitro experiments. 2.15. Randomization No specific randomization procedure was employed in the in vitro experiments. 2.16. Blinding For all conducted experiments, the experimenter that submitted the samples for methylation analysis was not blinded, but the investigator that carried out the methylation analysis was blinded to the groups, and therefore the submitter had no control or input over the results of the analysis. All other data was generated using in silico approaches. 3. Results 3.1. DNA methylation probes positively correlated with increased gene expression are enriched in 3’UTRs To explore the relationship between poorly understood sites of DNA methylation and gene expression, we used genome-wide RNA-seq and methylation array datasets from the TCGA platform, which includes over 11,000 patient tissue samples, providing a unique opportunity to harness large-scale molecular profiling datasets across numerous tumour types [[74]24].Initially, the proportion of probes achieving Spearman correlations of < −0∙5 or > 0∙5 between gene expression and DNA methylation were examined within each of the 6 gene regions included in the Illumina methylation probe annotation after normalizing for the total number of probes interrogating each region ([75]Supplementary Fig. S1). The sample sizes associated with TCGA are such that these correlations were significant (p < 0∙0004, assuming n = 50 [Spearman correlation]). With sample sizes per tissue type in excess of a few hundred, absolute correlations of < −0∙5 or > 0∙5 were therefore considered highly significant in this analysis. At the probe level, there were 44,309 negative and 29,043 positive associations; at the gene level, there were 6287 negative and 3200 positive associations. The majority (> 75%) of the negatively correlated probes across all 10 tissue types were concentrated within the first exon, 5’-UTR, and upstream of the transcription start site ([76]Fig. 1a). However, approximately a third of probes exhibiting these correlation values were positively correlated with gene expression. Based upon this observation, the specific locations of the positively correlated probes were examined to determine whether any underlying pattern exists. Fig. 1. [77]Fig. 1 [78]Open in a new tab 3’UTR is enriched for methylation probes that are positively correlated with gene expression. Proportion heat map representing the distribution of probes in which the correlation between methylation and gene expression was < −0∙5 (a) or > 0∙5 (b) in each of the 10 tissue types. Areas with high proportions are shaded red, while regions with low proportions are shaded blue. The color bar depicts the gene structure: purple, promoter region; light green, 5’UTR; dark green, first exon; grey, body; red, 3’UTR. The negatively correlated probes were concentrated in the first exon, 5’UTR, and were < 200 bp upstream of the transcriptional start site. The positively correlated probes were found predominantly in the gene body and 3’UTR. (c) The arrows represent the directional change of the percentage of probes with negative correlations < −0∙5 compared with the percentage of probes with positive correlations >0∙5 when evaluating the entire gene. The arrows in the southeast direction indicate an increase in the percentage of positive probes and a simultaneous decrease in the percentage of negative probes. In the case of LUSC, we observed a 5% increase in the number of positively correlated probes and a 3% decrease in the number of negatively correlated probes. There was enrichment in the percentage of positively correlated probes in the 3’UTR for all 10 tissue types. BLCA = bladder carcinoma; BRCA = breast carcinoma; CORE = colon and rectal carcinoma; HNSC = head and neck squamous cell carcinoma; KIRC = kidney renal cell carcinoma; LIHC = liver carcinoma; LUAD = lung adenocarcinoma; LUSC = lung squamous cell carcinoma; PRAD = prostate adenocarcinoma; THCA = thyroid carcinoma; UCEC = uterine carcinoma. (For interpretation of the references to color in this figure legend, the reader is referred to