Example of using mitch package for Infinium methylation analysis

Introduction

Here we are looking at putting together the easiest possible workflow for the package.

Source code: https://github.com/markziemann/gmea/blob/main/example_workflow/example_workflow.Rmd

Requirements

Load packages.

Important: ensure that the mitch version used is 1.15.1 or higher.

suppressPackageStartupMessages({
  library("limma")
  library("eulerr")
  library("AnnotationHub")
  library("IlluminaHumanMethylation450kanno.ilmn12.hg19")
  library("IlluminaHumanMethylationEPICanno.ilm10b4.hg19")
  library("HGNChelper")
  library("tictoc")
  library("mitch")
  library("kableExtra")
  library("beeswarm")
  library("missMethyl")
  library("gridExtra")
  library("png")
})

if( packageVersion("mitch") < "1.15.1") {
  warning("This workflow requires mitch version 1.15.1 or higher")
}

Load methylation data

The limma¹ data table is read in. This dataset is available from GEO (GSE49667)². The limma data was obtained using a previously prepared workflow.³

dm <- read.csv("DMPs.csv.gz",header=TRUE)

head(dm)

##         Name     logFC     AveExpr         t      P.Value   adj.P.Val        B
## 1 cg07499259  3.654104  2.46652171  18.73082 7.258963e-08 0.005062817 7.454265
## 2 cg26992245  4.450696 -0.09180715  18.32680 8.603867e-08 0.005062817 7.360267
## 3 cg09747445 -3.337299 -0.25201484 -18.24369 8.913981e-08 0.005062817 7.340431
## 4 cg18808929 -2.990263  0.77522878 -17.90079 1.033329e-07 0.005062817 7.256713
## 5 cg25015733 -3.054336  0.83280190 -17.32537 1.332317e-07 0.005062817 7.109143
## 6 cg21179654  2.859016  1.32460816  17.27702 1.361568e-07 0.005062817 7.096322

Note that this object has probes with their own column, not simply as row names.

Load pathways

Gene ontologies were downloaded in GMT format from MSigDB on 15th Jan 2024^4,5. The GMT file is read into R using the mitch function gmt_import().

gene_sets <- gmt_import("c5.go.v2023.2.Hs.symbols.gmt")

Curate the annotation

One of the critical parts of this workflow is the establishment of probe-gene relationships. This controls how the probe data is aggregated to make the gene level scores.

In this demonstration we are using all probes on the HM450K chip. Alternative approaches could use only promoters CpGs for example.

As these annotations are several years old, many of the annotated gene names are no longer current. To remedy this, the gene names are screened with the HGNChelper package and any defunct symbols get updated to the newer gene name, so they will be recognised properly in the gene sets.

The process for making a gene table for EPIC data is included down below.

tic()
anno <- getAnnotation(IlluminaHumanMethylation450kanno.ilmn12.hg19)
myann <- data.frame(anno[,c("UCSC_RefGene_Name","UCSC_RefGene_Group","Islands_Name","Relation_to_Island")])
gp <- myann[,"UCSC_RefGene_Name",drop=FALSE]
gp2 <- strsplit(gp$UCSC_RefGene_Name,";")
names(gp2) <- rownames(gp)
gp2 <- lapply(gp2,unique)
gt1 <- stack(gp2)
colnames(gt1) <- c("gene","probe")
gt1$probe <- as.character(gt1$probe)
toc() #6.0s

## 15.969 sec elapsed

dim(gt1)

## [1] 407090      2

str(gt1)

## 'data.frame':    407090 obs. of  2 variables:
##  $ gene : chr  "TSPY4" "FAM197Y2" "TTTY14" "TMSB4Y" ...
##  $ probe: chr  "cg00050873" "cg00050873" "cg00212031" "cg00214611" ...

length(unique(gt1$gene))

## [1] 21231

tic()
#new.hgnc.table <- getCurrentHumanMap()
new.hgnc.table <- readRDS("../figs/new.hgnc.table.rds")
fix <- checkGeneSymbols(gt1$gene,map=new.hgnc.table)

## Warning in checkGeneSymbols(gt1$gene, map = new.hgnc.table): Human gene symbols
## should be all upper-case except for the 'orf' in open reading frames. The case
## of some letters was corrected.

## Warning in checkGeneSymbols(gt1$gene, map = new.hgnc.table): x contains
## non-approved gene symbols

fix2 <- fix[which(fix$x != fix$Suggested.Symbol),]
length(unique(fix2$x))

## [1] 2788

gt1$gene <- fix$Suggested.Symbol
toc() #  36 sec (2788 gene names updated)

## 141.394 sec elapsed

head(gt1)

##       gene      probe
## 1    TSPY4 cg00050873
## 2 FAM197Y2 cg00050873
## 3   TTTY14 cg00212031
## 4   TMSB4Y cg00214611
## 5    TBL1Y cg01707559
## 6   TMSB4Y cg02004872

Mitch pipeline

The first part is to import the data into mitch.

This only works properly when mitch’s mapGeneIds function uses mean to aggregate (line 69 of mitch.R).

tic()
m <- mitch_import(x=dm,DEtype="limma",geneTable=gt1,geneIDcol="Name")

## The input is a single dataframe; one contrast only. Converting
##         it to a list for you.

## Note: Mean no. genes in input = 439918

## Note: no. genes in output = 20092

## Warning in mitch_import(x = dm, DEtype = "limma", geneTable = gt1, geneIDcol = "Name"): Warning: less than half of the input genes are also in the
##         output

toc() #2.5 sec

## 5.143 sec elapsed

head(m) %>%
  kbl(caption = "Example of differential gene methylation scores used for pathway analysis") %>%
  kable_paper("hover", full_width = F)

Example of differential gene methylation scores used for pathway analysis
	x
A1BG	0.7093783
A1BG-AS1	0.5498603
A1CF	1.0133746
A2M	0.2846473
A2ML1	0.1595816
A4GALT	0.0441000

Now run the enrichment analysis.

Note that the results are not sorted by p-value, rather S.distance, an enrichment score. I think this works better for interpretation.

tic()
mres <- mitch_calc(x=m,genesets=gene_sets,minsetsize=5, priority="effect")

## Note: Enrichments with large effect sizes may not be
##             statistically significant.

toc() #7 secs

## 251.525 sec elapsed

mtable <- mres$enrichment_result
up <- subset(mtable,s.dist>0 & p.adjustANOVA<0.05)
dn <- subset(mtable,s.dist<0 & p.adjustANOVA<0.05)
nrow(up)

## [1] 8

nrow(dn)

## [1] 10

head(up,10) %>%
  kbl(caption = "Top significant pathways with higher methylation") %>%
  kable_paper("hover", full_width = F)

Top significant pathways with higher methylation
	set	setSize	pANOVA	s.dist	p.adjustANOVA
10188	GOMF_STRUCTURAL_CONSTITUENT_OF_SKIN_EPIDERMIS	35	1.53e-05	0.4223663	0.0155125
2100	GOBP_KERATINIZATION	78	4.00e-07	0.3329977	0.0019071
2016	GOBP_INTERMEDIATE_FILAMENT_ORGANIZATION	73	1.20e-06	0.3286556	0.0041505
7143	GOBP_STRIATED_MUSCLE_ADAPTATION	55	3.76e-05	0.3212865	0.0222716
8073	GOCC_KERATIN_FILAMENT	85	3.30e-06	0.2919890	0.0052689
2014	GOBP_INTERMEDIATE_FILAMENT_BASED_PROCESS	95	2.13e-05	0.2523268	0.0159130
1964	GOBP_INORGANIC_ION_IMPORT_ACROSS_PLASMA_MEMBRANE	136	2.14e-05	0.2110562	0.0159130
10191	GOMF_STRUCTURAL_MOLECULE_ACTIVITY	769	1.76e-05	0.0911287	0.0155125

head(dn,10) %>%
  kbl(caption = "Top significant pathways with lower methylation") %>%
  kable_paper("hover", full_width = F)

Top significant pathways with lower methylation
	set	setSize	pANOVA	s.dist	p.adjustANOVA
9210	GOMF_GLUTATHIONE_PEROXIDASE_ACTIVITY	22	3.73e-05	-0.5077864	0.0222716
6144	GOBP_REGULATION_OF_PLATELET_AGGREGATION	27	3.86e-05	-0.4575555	0.0222716
7750	GOCC_CENTROSOME	630	1.79e-05	-0.1002339	0.0155125
8135	GOCC_MICROTUBULE_ORGANIZING_CENTER	829	2.30e-06	-0.0967837	0.0052689
8236	GOCC_NUCLEAR_PROTEIN_CONTAINING_COMPLEX	1184	0.00e+00	-0.0944773	0.0004838
4779	GOBP_POSITIVE_REGULATION_OF_TRANSCRIPTION_BY_RNA_POLYMERASE_II	1226	3.50e-06	-0.0789663	0.0052689
1182	GOBP_DNA_METABOLIC_PROCESS	1016	2.82e-05	-0.0778273	0.0195531
8132	GOCC_MICROTUBULE_CYTOSKELETON	1339	3.50e-06	-0.0757001	0.0052689
743	GOBP_CELL_CYCLE_PROCESS	1287	1.43e-05	-0.0721735	0.0155125
736	GOBP_CELL_CYCLE	1791	4.20e-06	-0.0657249	0.0055179

Now make a barplot of these top findings.

top <- rbind(up[1:10,],dn[1:10,])
top <- top[order(top$s.dist),]

barnames <- gsub("_"," ",top$set)
cols <- as.character(((sign(top$s.dist)+1)/2)+1)
cols <- gsub("1","blue",gsub("2","red",cols))

par(mar = c(5.1, 29.1, 4.1, 2.1))
barplot(abs(top$s.dist),horiz=TRUE,las=1,names.arg=barnames, cex.names=0.8,
  cex.axis=0.8,col=cols, xlab="Enrichment score",main="Gene Ontologies")
mtext("whole gene")
grid()

par( mar = c(5.1, 4.1, 4.1, 2.1) )

There are some interesting results there that give hints to the differences between naive T and Treg cells.

Make a html report and some charts.

mitch_report(res=mres,outfile="example_mitchreport.html",overwrite=TRUE)

## Dataset saved as " /tmp/Rtmp85T27l/example_mitchreport.rds ".

## 
## 
## processing file: mitch.Rmd

## 1/34                             
## 2/34 [checklibraries]            
## 3/34                             
## 4/34 [peek]                      
## 5/34                             
## 6/34 [metrics]                   
## 7/34                             
## 8/34 [scatterplot]

## 9/34                             
## 10/34 [contourplot]               
## 11/34                             
## 12/34 [input_geneset_metrics1]    
## 13/34                             
## 14/34 [input_geneset_metrics2]

## 15/34                             
## 16/34 [input_geneset_metrics3]

## 17/34                             
## 18/34 [echart1d]                  
## 19/34 [echart2d]                  
## 20/34                             
## 21/34 [heatmap]                   
## 22/34                             
## 23/34 [effectsize]                
## 24/34                             
## 25/34 [results_table]             
## 26/34                             
## 27/34 [results_table_complete]    
## 28/34                             
## 29/34 [detailed_geneset_reports1d]

## 30/34                             
## 31/34 [detailed_geneset_reports2d]
## 32/34                             
## 33/34 [session_info]              
## 34/34

## output file: /mnt/data/mdz/projects/gmea/example_workflow/mitch.knit.md

## /home/mdz/anaconda3/bin/pandoc +RTS -K512m -RTS /mnt/data/mdz/projects/gmea/example_workflow/mitch.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output /tmp/Rtmp85T27l/mitch_report.html --lua-filter /usr/local/lib/R/site-library/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /usr/local/lib/R/site-library/rmarkdown/rmarkdown/lua/latex-div.lua --self-contained --variable bs3=TRUE --section-divs --template /usr/local/lib/R/site-library/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --variable theme=bootstrap --mathjax --variable 'mathjax-url=https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --include-in-header /tmp/Rtmp85T27l/rmarkdown-str97bf277041e14.html

## 
## Output created: /tmp/Rtmp85T27l/mitch_report.html

## [1] TRUE

mitch_plots(res=mres,outfile="example_mitchcharts.pdf")

## png 
##   2

Make gene table for EPIC array data

In case you are working with EPIC array data, the probe-gene table can be make with this snippet of code.

tic()
anno <- getAnnotation(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)
myann <- data.frame(anno[,c("UCSC_RefGene_Name","UCSC_RefGene_Group","Islands_Name","Relation_to_Island")])
gp <- myann[,"UCSC_RefGene_Name",drop=FALSE]
gp2 <- strsplit(gp$UCSC_RefGene_Name,";")
names(gp2) <- rownames(gp)
gp2 <- lapply(gp2,unique)
gt2 <- stack(gp2)
colnames(gt2) <- c("gene","probe")
gt2$probe <- as.character(gt2$probe)
toc() #9.0s

## 22.065 sec elapsed

dim(gt2)

## [1] 684970      2

str(gt2)

## 'data.frame':    684970 obs. of  2 variables:
##  $ gene : chr  "YTHDF1" "EIF2S3" "PKN3" "CCDC57" ...
##  $ probe: chr  "cg18478105" "cg09835024" "cg14361672" "cg01763666" ...

length(unique(gt2$gene))

## [1] 27364

head(gt2)

##     gene      probe
## 1 YTHDF1 cg18478105
## 2 EIF2S3 cg09835024
## 3   PKN3 cg14361672
## 4 CCDC57 cg01763666
## 5   INF2 cg12950382
## 6  CDC16 cg02115394

tic()
new.hgnc.table <- readRDS("../figs/new.hgnc.table.rds")
fix <- checkGeneSymbols(gt2$gene,map=new.hgnc.table)

## Warning in checkGeneSymbols(gt2$gene, map = new.hgnc.table): Human gene symbols
## should be all upper-case except for the 'orf' in open reading frames. The case
## of some letters was corrected.

## Warning in checkGeneSymbols(gt2$gene, map = new.hgnc.table): x contains
## non-approved gene symbols

fix2 <- fix[which(fix$x != fix$Suggested.Symbol),]
length(unique(fix2$x))

## [1] 3253

gt2$gene <- fix$Suggested.Symbol
toc()

## 183.881 sec elapsed

head(gt2)

##     gene      probe
## 1 YTHDF1 cg18478105
## 2 EIF2S3 cg09835024
## 3   PKN3 cg14361672
## 4 CCDC57 cg01763666
## 5   INF2 cg12950382
## 6  CDC16 cg02115394

Make gene table for the new EPIC version 2 array

tic()
ah <- AnnotationHub()

## snapshotDate(): 2024-04-30

EPICv2manifest <- ah[["AH116484"]]

## loading from cache

myann <- data.frame(EPICv2manifest[,c("UCSC_RefGene_Name","UCSC_RefGene_Group","UCSC_CpG_Islands_Name","Relation_to_UCSC_CpG_Island","posrep_RepNum")])
myann <- subset(myann,posrep_RepNum==1)
gp <- myann[,"UCSC_RefGene_Name",drop=FALSE]
gp2 <- strsplit(gp$UCSC_RefGene_Name,";")
names(gp2) <- rownames(gp)
gp2 <- lapply(gp2,unique)
gt3 <- stack(gp2)
colnames(gt3) <- c("gene","probe")
gt3$probe <- as.character(gt3$probe)
dim(gt3)

## [1] 360697      2

str(gt3)

## 'data.frame':    360697 obs. of  2 variables:
##  $ gene : chr  "RBL2" "ANKFN1" "VDAC3" "ACTN1" ...
##  $ probe: chr  "cg00000029_TC21" "cg00000221_BC21" "cg00000236_TC21" "cg00000289_TC21" ...

toc()

## 58.14 sec elapsed

tic()
new.hgnc.table <- readRDS("../figs/new.hgnc.table.rds")
fix <- checkGeneSymbols(gt3$gene,map=new.hgnc.table)

## Warning in checkGeneSymbols(gt3$gene, map = new.hgnc.table): Human gene symbols
## should be all upper-case except for the 'orf' in open reading frames. The case
## of some letters was corrected.

## Warning in checkGeneSymbols(gt3$gene, map = new.hgnc.table): x contains
## non-approved gene symbols

fix2 <- fix[which(fix$x != fix$Suggested.Symbol),]
length(unique(fix2$x))

## [1] 237

gt3$gene <- fix$Suggested.Symbol
toc()

## 9.701 sec elapsed

head(gt3)

##     gene           probe
## 1   RBL2 cg00000029_TC21
## 2 ANKFN1 cg00000221_BC21
## 3  VDAC3 cg00000236_TC21
## 4  ACTN1 cg00000289_TC21
## 5 ATP2A1 cg00000292_BC21
## 6  SFRP1 cg00000321_BC21

Session information

sessionInfo()

## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
##  [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
##  [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Australia/Melbourne
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] pkgload_1.3.4                                      
##  [2] GGally_2.2.1                                       
##  [3] ggplot2_3.5.1                                      
##  [4] reshape2_1.4.4                                     
##  [5] gplots_3.1.3.1                                     
##  [6] gtools_3.9.5                                       
##  [7] tibble_3.2.1                                       
##  [8] dplyr_1.1.4                                        
##  [9] echarts4r_0.4.5                                    
## [10] png_0.1-8                                          
## [11] gridExtra_2.3                                      
## [12] missMethyl_1.38.0                                  
## [13] beeswarm_0.4.0                                     
## [14] kableExtra_1.4.0                                   
## [15] mitch_1.16.0                                       
## [16] tictoc_1.2.1                                       
## [17] HGNChelper_0.8.14                                  
## [18] IlluminaHumanMethylationEPICanno.ilm10b4.hg19_0.6.0
## [19] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.1 
## [20] minfi_1.50.0                                       
## [21] bumphunter_1.46.0                                  
## [22] locfit_1.5-9.10                                    
## [23] iterators_1.0.14                                   
## [24] foreach_1.5.2                                      
## [25] Biostrings_2.72.1                                  
## [26] XVector_0.44.0                                     
## [27] SummarizedExperiment_1.34.0                        
## [28] Biobase_2.64.0                                     
## [29] MatrixGenerics_1.16.0                              
## [30] matrixStats_1.3.0                                  
## [31] GenomicRanges_1.56.1                               
## [32] GenomeInfoDb_1.40.1                                
## [33] IRanges_2.38.0                                     
## [34] S4Vectors_0.42.0                                   
## [35] eulerr_7.0.2                                       
## [36] limma_3.60.3                                       
## [37] AnnotationHub_3.12.0                               
## [38] BiocFileCache_2.12.0                               
## [39] dbplyr_2.5.0                                       
## [40] BiocGenerics_0.50.0                                
## 
## loaded via a namespace (and not attached):
##   [1] splines_4.4.1             later_1.3.2              
##   [3] BiocIO_1.14.0             bitops_1.0-7             
##   [5] filelock_1.0.3            preprocessCore_1.66.0    
##   [7] XML_3.99-0.17             lifecycle_1.0.4          
##   [9] lattice_0.22-6            MASS_7.3-61              
##  [11] base64_2.0.1              scrime_1.3.5             
##  [13] magrittr_2.0.3            sass_0.4.9               
##  [15] rmarkdown_2.27            jquerylib_0.1.4          
##  [17] yaml_2.3.8                httpuv_1.6.15            
##  [19] doRNG_1.8.6               askpass_1.2.0            
##  [21] DBI_1.2.3                 RColorBrewer_1.1-3       
##  [23] abind_1.4-5               zlibbioc_1.50.0          
##  [25] quadprog_1.5-8            purrr_1.0.2              
##  [27] RCurl_1.98-1.14           rappdirs_0.3.3           
##  [29] GenomeInfoDbData_1.2.12   genefilter_1.86.0        
##  [31] annotate_1.82.0           svglite_2.1.3            
##  [33] DelayedMatrixStats_1.26.0 codetools_0.2-20         
##  [35] DelayedArray_0.30.1       xml2_1.3.6               
##  [37] tidyselect_1.2.1          UCSC.utils_1.0.0         
##  [39] beanplot_1.3.1            illuminaio_0.46.0        
##  [41] GenomicAlignments_1.40.0  jsonlite_1.8.8           
##  [43] multtest_2.60.0           survival_3.7-0           
##  [45] systemfonts_1.1.0         tools_4.4.1              
##  [47] Rcpp_1.0.12               glue_1.7.0               
##  [49] SparseArray_1.4.8         xfun_0.45                
##  [51] HDF5Array_1.32.0          withr_3.0.0              
##  [53] BiocManager_1.30.23       fastmap_1.2.0            
##  [55] rhdf5filters_1.16.0       fansi_1.0.6              
##  [57] openssl_2.2.0             caTools_1.18.2           
##  [59] digest_0.6.36             R6_2.5.1                 
##  [61] mime_0.12                 colorspace_2.1-0         
##  [63] RSQLite_2.3.7             utf8_1.2.4               
##  [65] tidyr_1.3.1               generics_0.1.3           
##  [67] data.table_1.15.4         rtracklayer_1.64.0       
##  [69] httr_1.4.7                htmlwidgets_1.6.4        
##  [71] S4Arrays_1.4.1            ggstats_0.6.0            
##  [73] pkgconfig_2.0.3           gtable_0.3.5             
##  [75] blob_1.2.4                siggenes_1.78.0          
##  [77] htmltools_0.5.8.1         scales_1.3.0             
##  [79] rstudioapi_0.16.0         knitr_1.47               
##  [81] tzdb_0.4.0                rjson_0.2.21             
##  [83] nlme_3.1-165              curl_5.2.1               
##  [85] org.Hs.eg.db_3.19.1       cachem_1.1.0             
##  [87] rhdf5_2.48.0              stringr_1.5.1            
##  [89] KernSmooth_2.23-24        BiocVersion_3.19.1       
##  [91] AnnotationDbi_1.66.0      restfulr_0.0.15          
##  [93] GEOquery_2.72.0           pillar_1.9.0             
##  [95] grid_4.4.1                reshape_0.8.9            
##  [97] vctrs_0.6.5               promises_1.3.0           
##  [99] xtable_1.8-4              evaluate_0.24.0          
## [101] readr_2.1.5               GenomicFeatures_1.56.0   
## [103] cli_3.6.3                 compiler_4.4.1           
## [105] Rsamtools_2.20.0          rlang_1.1.4              
## [107] crayon_1.5.3              rngtools_1.5.2           
## [109] nor1mix_1.3-3             mclust_6.1.1             
## [111] plyr_1.8.9                stringi_1.8.4            
## [113] viridisLite_0.4.2         BiocParallel_1.38.0      
## [115] munsell_0.5.1             Matrix_1.7-0             
## [117] hms_1.1.3                 sparseMatrixStats_1.16.0 
## [119] bit64_4.0.5               Rhdf5lib_1.26.0          
## [121] KEGGREST_1.44.1           statmod_1.5.0            
## [123] shiny_1.8.1.1             highr_0.11               
## [125] memoise_2.0.1             bslib_0.7.0              
## [127] bit_4.0.5                 splitstackshape_1.4.8

Bibliography

Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 2015; 43:e47–7.

Zhang Y, Maksimovic J, Naselli G, Qian J, Chopin M, Blewitt ME, Oshlack A, Harrison LC. Genome-wide DNA methylation analysis identifies hypomethylated genes regulated by FOXP3 in human regulatory T cells. Blood 2013; 122:2823–36.

Maksimovic J, Phipson B, Oshlack A. A cross-package bioconductor workflow for analysing methylation array data. F1000Res 2017; 5:1281.

Liberzon A, Birger C, Thorvaldsdóttir H, Ghandi M, Mesirov JP, Tamayo P. The molecular signatures database hallmark gene set collection. Cell Syst 2015; 1:417–25.

The Gene Ontology Consortium, Aleksander SA, Balhoff J, Carbon S, Cherry JM, Drabkin HJ, Ebert D, Feuermann M, Gaudet P, Harris NL, et al. The gene ontology knowledgebase in 2023. Genetics 2023; 224.