The following code is a sample enrichment analysis of a pre-clustered set of Saccharomyces cerevisiae genes from the DEE2 (Digital Expression Explorer 2) repository, using the clusterProfiler package from Bioconductor.
Other useful links (referred to below as *1 and *2):
1. Prerequisite libraries
suppressPackageStartupMessages({
  library(org.Sc.sgd.db)
  library(clusterProfiler)
  library(enrichplot)
})
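The three packages must be installed before they can be loaded. A minimal sketch, assuming an R session with internet access, that installs only the missing ones through BiocManager (the standard installer for Bioconductor packages, itself available from CRAN):

```r
# Packages loaded in the chunk above
pkgs <- c("org.Sc.sgd.db", "clusterProfiler", "enrichplot")

# Keep only the packages that cannot be loaded yet
to_install <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

if (length(to_install) > 0) {
  # BiocManager comes from CRAN and installs Bioconductor packages
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(to_install)
}
```

Running it a second time does nothing, because every package then loads.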
2. Gene Enrichment Analysis
# Create a character vector of the gene IDs of interest. This example uses Ensembl gene IDs, which for S. cerevisiae are the systematic ORF names (e.g. YDL248W)
genesOfInterest <- c("YDL248W", "YDR542W", "YOL161C", "YOL155C", "YOR032C", "YGL261C", "RDN5-1", "YLR349W", "YPL062W", "YPL021W", "YMR321C", "YMR322C", "YMR323W", "YMR325W", "YBL108C-A", "YBL029W", "YBR090C", "YBR196C-A", "YNL143C", "YNL067W-A", "YJR159W", "YJR160C", "YJR161C", "YKL097C", "YER091C-A", "YHR145C", "YIL169C", "YIR043C", "YIR044C", "YFL062W", "YAL068C", "YAR066W", "YAR068W")
# org.Sc.sgd.db is the organism-specific annotation database (OrgDb) for yeast. You can find the OrgDb for your organism in the list at *1 and load it directly as shown here, or query for it in code with the AnnotationHub package. Tutorials can be found in chapters 4 and 5 of *2 above.
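The AnnotationHub route can look roughly like the sketch below. It assumes the AnnotationHub package is installed and the machine is online; how many yeast OrgDb records the query returns depends on the Bioconductor release.

```r
library(AnnotationHub)

# Connect to the hub; metadata is downloaded and cached on first use
ah <- AnnotationHub()

# Search the hub for OrgDb records matching the organism name
yeast_orgdb_hits <- query(ah, c("OrgDb", "Saccharomyces cerevisiae"))

# Prints the matching records together with their AH identifiers
yeast_orgdb_hits
```

A record can then be loaded with `ah[[id]]`, where `id` is one of the AH identifiers printed by the query. The identifier changes between releases, so none is hard-coded here.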
# List all key types available in the database. The keyType argument of enrichGO must be one of these. Because genesOfInterest uses Ensembl gene IDs, the enrichGO call in the yeast_CP_go line uses keyType = "ENSEMBL"
keytypes(org.Sc.sgd.db)
## [1] "ALIAS" "COMMON" "DESCRIPTION" "ENSEMBL" "ENSEMBLPROT"
## [6] "ENSEMBLTRANS" "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL"
## [11] "GENENAME" "GO" "GOALL" "INTERPRO" "ONTOLOGY"
## [16] "ONTOLOGYALL" "ORF" "PATH" "PFAM" "PMID"
## [21] "REFSEQ" "SGD" "SMART" "UNIPROT"
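Before running the enrichment it can help to confirm that the input IDs are recognised under the chosen key type. A minimal sketch using clusterProfiler's `bitr()`; the short `ids` vector is a hypothetical subset copied from genesOfInterest, and passing genesOfInterest instead checks all 33 IDs:

```r
library(clusterProfiler)
library(org.Sc.sgd.db)

# A few IDs copied from genesOfInterest (hypothetical subset for illustration)
ids <- c("YDL248W", "YJR161C", "YFL062W", "RDN5-1")

# Translate ENSEMBL IDs to gene names and Entrez IDs.
# bitr() drops IDs it cannot map and warns about the fraction that failed.
id_map <- bitr(ids, fromType = "ENSEMBL",
               toType = c("GENENAME", "ENTREZID"),
               OrgDb = org.Sc.sgd.db)
id_map

# Input IDs with no ENSEMBL match in the database (possibly none)
setdiff(ids, id_map$ENSEMBL)
```

Unmapped IDs are silently excluded from the test, which is one reason the GeneRatio denominator can be smaller than the length of the input vector.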
#---------GO analysis using clusterProfiler
yeast_CP_go <- enrichGO(gene = genesOfInterest, OrgDb = org.Sc.sgd.db, keyType = "ENSEMBL", ont = "ALL")
# Convert the enrichResult object into a data frame for easier viewing
yeast_CP_go_df <- as.data.frame(yeast_CP_go)
yeast_CP_go_df
## ONTOLOGY ID
## GO:0043328 BP GO:0043328
## GO:0043162 BP GO:0043162
## GO:0032511 BP GO:0032511
## GO:0032509 BP GO:0032509
## GO:0071985 BP GO:0071985
## GO:0045324 BP GO:0045324
## Description
## GO:0043328 protein transport to vacuole involved in ubiquitin-dependent protein catabolic process via the multivesicular body sorting pathway
## GO:0043162 ubiquitin-dependent protein catabolic process via the multivesicular body sorting pathway
## GO:0032511 late endosome to vacuole transport via multivesicular body sorting pathway
## GO:0032509 endosome transport via multivesicular body sorting pathway
## GO:0071985 multivesicular body sorting pathway
## GO:0045324 late endosome to vacuole transport
## GeneRatio BgRatio pvalue p.adjust qvalue
## GO:0043328 3/23 22/5798 7.996522e-05 0.009595827 0.009090783
## GO:0043162 3/23 33/5798 2.753495e-04 0.016520968 0.015651443
## GO:0032511 3/23 43/5798 6.069027e-04 0.018940948 0.017944056
## GO:0032509 3/23 47/5798 7.892062e-04 0.018940948 0.017944056
## GO:0071985 3/23 47/5798 7.892062e-04 0.018940948 0.017944056
## GO:0045324 3/23 59/5798 1.533898e-03 0.030677951 0.029063322
## geneID Count
## GO:0043328 YDL248W/YJR161C/YFL062W 3
## GO:0043162 YDL248W/YJR161C/YFL062W 3
## GO:0032511 YDL248W/YJR161C/YFL062W 3
## GO:0032509 YDL248W/YJR161C/YFL062W 3
## GO:0071985 YDL248W/YJR161C/YFL062W 3
## GO:0045324 YDL248W/YJR161C/YFL062W 3
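In the table, GeneRatio is k/n (input genes annotated to the term / annotated input genes) and BgRatio is M/N (background genes annotated to the term / annotated background genes). As a sketch, assuming clusterProfiler uses the one-sided hypergeometric test described in its documentation, base R's `phyper()` should reproduce the pvalue column from these two ratios. `parse_ratio()` is a hypothetical helper, not part of clusterProfiler.

```r
# Hypothetical helper: turn ratio strings such as "3/23" into an integer matrix
parse_ratio <- function(x) {
  parts <- strsplit(x, "/", fixed = TRUE)
  t(vapply(parts, as.integer, integer(2)))
}

# GeneRatio and BgRatio of the first two rows printed above
gene_ratio <- parse_ratio(c("3/23", "3/23"))
bg_ratio   <- parse_ratio(c("22/5798", "33/5798"))

k <- gene_ratio[, 1]  # input genes annotated to the term
n <- gene_ratio[, 2]  # annotated input genes
M <- bg_ratio[, 1]    # background genes annotated to the term
N <- bg_ratio[, 2]    # annotated background genes

# P(X >= k) when drawing n genes from N, of which M carry the term
p <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
signif(p, 7)
```

The p.adjust column is these p-values after Benjamini-Hochberg correction (enrichGO's default pAdjustMethod) across every term tested, so it cannot be recomputed from the six rows shown alone.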
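# -------Visualization using an enrichment map
# Recent enrichplot versions require term similarities from pairwise_termsim() before calling emapplot()
yeast_CP_go_sim <- pairwise_termsim(yeast_CP_go)
emapplot(yeast_CP_go_sim)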
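The same results can also be drawn as a dot plot with enrichplot's `dotplot()`. A minimal sketch, assuming `yeast_CP_go` from the enrichGO() call above is still in the session; `showCategory = 10` is an arbitrary choice, and only six terms pass the cutoff here.

```r
library(clusterProfiler)
library(enrichplot)

# Reuses yeast_CP_go from the enrichGO() call above.
# By default dotplot() puts GeneRatio on the x-axis, sizes dots by Count
# and colours them by p.adjust.
go_dotplot <- dotplot(yeast_CP_go, showCategory = 10)
print(go_dotplot)
```

`dotplot()` returns a ggplot object, so titles or themes can be added with the usual ggplot2 `+` syntax.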
