Source: https://github.com/markziemann/enrichment_recipe

Introduction

This guide is a Rmarkdown script that conducts differential expression and enrichment analysis, which are very popular workflows for transcriptome data.

The goal of this work is to understand how ClusterProfiler manages the background list, as we have observed some weird behaviour. In particular we see that when we provide a custom background, those genes do not appear in the final analysis unless they are also included in the gene list.

In the code chunk below called libs, you can add and remove required R library dependancies. Check that the libraries listed here match the Dockerfile, otherwise you might get errors.

suppressPackageStartupMessages({
  library("DESeq2")
  library("kableExtra")
  library("clusterProfiler")
  library("fgsea")
  library("eulerr")
  library("gplots")
})

For this report I will be using proteomics data from Mackmull et al (2015) where they looked at the effect of HDACi on protein expression in nuclear and cytosolic compartments. Here I am using the data from the TSA treated nuclear fractions after 48 hrs exposure. The data were obtained from Supplementary Table 2 and saved to the /data folder.

def <- read.table("../data/tsa_48hr_nuclear_frac.tsv",sep="\t",header=TRUE)

KEGG pathway analysis

In the next sections I will run enrichment analysis with over-representation analysis (ORA) test and compare it to functional class scoring. I will also investigate some strange behviour of the ORA tool clusterprofiler.

ORA with Clusterprofiler custom background with otherwise default analysis

I’ve compiled a reporting checklist:

Reporting criteria	Method/resource used
Origin of gene sets	KEGG (2023-06-16)
Tool used	ClusterProfiler (check version at foot of report)
Statistical test used	hypergeometric test
P-value correction for multiple comparisons	FDR method
Background list	Genes with >=10 reads per sample on average across all samples
Gene Selection Criteria	DESeq2 FDR<0.05
ORA directionality	Separate tests for up- and down-regulation
Data availability	Mackmull et al (2015)
Other parameters	Min gene set size of 10

Here I provide a custom background gene list.

Get the gene sets loaded in R. These are KEGG gene sets

# from https://www.gsea-msigdb.org/gsea/msigdb/human/collections.jsp 16th June 2023
genesets2 <- read.gmt("../ref/c2.cp.kegg.v2023.1.Hs.symbols.gmt")
gsets <- gmtPathways("../ref/c2.cp.kegg.v2023.1.Hs.symbols.gmt")

message(paste("number of genes described in the annotation set:",length(unique(genesets2$gene))))

## number of genes described in the annotation set: 5244

Now filter the gene names into three lists, up-regulated, down-regulated and background. Background is simply all genes that were detected.

defup <- unique(subset(def,adj.P.Val<=0.05 & logFC>0)$GN)

defdn <- unique(subset(def,adj.P.Val<=0.05 & logFC<0)$GN)

bg <- unique(def$GN)

message(paste("number of proteins in background:",length(bg)))

## number of proteins in background: 1619

Enrichment firstly with upregulated genes

n_pw=10

ora_up <- as.data.frame(enricher(gene = defup ,
  universe = bg,  maxGSSize = 5000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

ora_up$geneID <- NULL
ora_up <- subset(ora_up,p.adjust<0.05 & Count >=5)
ora_ups <- rownames(ora_up)

gr <- as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",2))

ora_up$es <- gr/br
ora_up <- ora_up[order(-ora_up$es),]
ora_up$Description=NULL

head(ora_up) %>%
  kbl(row.names = FALSE, caption="Top upregulated pathways in LPS treatment") %>%
  kable_paper("hover", full_width = F)

Top upregulated pathways in LPS treatment
ID	GeneRatio	BgRatio	pvalue	p.adjust	qvalue	Count	es

topup2 <- rev(head(ora_up$es,10))
names(topup2) <- rev(head(ora_up$ID,10))

The reported background size does not match the dataset.

Now repeat with the downregulated genes

ora_dn <- as.data.frame(enricher(gene = defdn ,
  universe = bg,  maxGSSize = 5000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

ora_dn$geneID <- NULL
ora_dn <- subset(ora_dn,p.adjust<0.05 & Count >=5)
ora_dns <- rownames(ora_dn)

gr <- as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",2))

ora_dn$es <- gr/br
ora_dn <- ora_dn[order(-ora_dn$es),]
ora_dn$Description=NULL

head(ora_dn,n_pw) %>%
  kbl(row.names = FALSE, caption="Top downregulated pathways in LPS treatment") %>%
  kable_paper("hover", full_width = F)

Top downregulated pathways in LPS treatment
ID	GeneRatio	BgRatio	pvalue	p.adjust	qvalue	Count	es

topdn2 <- head(ora_dn$es,n_pw)
names(topdn2) <- head(ora_dn$ID,n_pw)

Make a barplot

par(mar=c(5,20,5,1))

cols <- c(rep("blue",n_pw),rep("red",n_pw))

top <- c(topdn2,topup2)

if ( length(top) > 1 ) {
  barplot(c(top),
    horiz=TRUE,las=1,cex.names=0.65,col=cols,
    main="top DE KEGGs",
    xlab="ES",
    cex.main=0.9)
  mtext("ORA test")
}

ORA with Clusterprofiler with custom background and special gene set list to fix the potential bug

I’ve compiled a reporting checklist:

Reporting criteria	Method/resource used
Origin of gene sets	KEGG (2023-06-16)
Tool used	ClusterProfiler (check version at foot of report)
Statistical test used	hypergeometric test
P-value correction for multiple comparisons	FDR method
Background list	Genes with >=10 reads per sample on average across all samples
Gene Selection Criteria	DESeq2 FDR<0.05
ORA directionality	Separate tests for up- and down-regulation
Data availability	Mackmull et al (2015)
Other parameters	Min gene set size of 10 + BUG FIX

Here I provide a background gene list for cluterprofiler to use, like a user should.

I have also done some modification to the KEGG gene sets, to en

Now filter the gene names into three lists, up-regulated, down-regulated and background. Background is simply all genes that were detected.

Adding all detected genes to the background appears to improve results!

bgdf <- data.frame("background",bg)
colnames(bgdf) <- c("term","gene")
genesets2 <- rbind(genesets2,bgdf)

Enrichment firstly with upregulated genes

n_pw=10

orafix_up <- as.data.frame(enricher(gene = defup ,
  universe = bg,  maxGSSize = 5000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

orafix_up$geneID <- NULL
orafix_up <- subset(orafix_up,p.adjust<0.05 & Count >=5)
orafix_ups <- rownames(orafix_up)

gr <- as.numeric(sapply(strsplit(orafix_up$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_up$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(orafix_up$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_up$BgRatio,"/"),"[[",2))

orafix_up$es <- gr/br
orafix_up <- orafix_up[order(-orafix_up$es),]
orafix_up$Description=NULL

head(orafix_up) %>%
  kbl(row.names = FALSE, caption="Top upregulated pathways in LPS treatment") %>%
  kable_paper("hover", full_width = F)

Top upregulated pathways in LPS treatment
ID	GeneRatio	BgRatio	pvalue	p.adjust	qvalue	Count	es

topup2 <- rev(head(orafix_up$es,10))
names(topup2) <- rev(head(orafix_up$ID,10))

Now with the downregulated genes

orafix_dn <- as.data.frame(enricher(gene = defdn ,
  universe = bg,  maxGSSize = 5000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

orafix_dn$geneID <- NULL
orafix_dn <- subset(orafix_dn,p.adjust<0.05 & Count >=5)
orafix_dns <- rownames(orafix_dn)

gr <- as.numeric(sapply(strsplit(orafix_dn$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_dn$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(orafix_dn$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_dn$BgRatio,"/"),"[[",2))

orafix_dn$es <- gr/br
orafix_dn <- orafix_dn[order(-orafix_dn$es),]
orafix_dn$Description=NULL

head(orafix_dn,n_pw) %>%
  kbl(row.names = FALSE, caption="Top downregulated pathways in LPS treatment") %>%
  kable_paper("hover", full_width = F)

Top downregulated pathways in LPS treatment
ID	GeneRatio	BgRatio	pvalue	p.adjust	qvalue	Count	es

topdn2 <- head(orafix_dn$es,n_pw)
names(topdn2) <- head(orafix_dn$ID,n_pw)

Make a barplot

par(mar=c(5,20,5,1))

cols <- c(rep("blue",n_pw),rep("red",n_pw))

top <- c(topdn2,topup2)

if ( length(top) > 1 ) {
  barplot(c(top),
    horiz=TRUE,las=1,cex.names=0.65,col=cols,
    main="top DE KEGGs",
    xlab="ES",
    cex.main=0.9)
  mtext("ORA fix")
}

Compare pathway results

v1 <- list("ORA up"=ora_ups, "ORA fix up"=orafix_ups)
v2 <- list("ORA dn"=ora_dns, "ORA fix dn"=orafix_dns)
v3 <- list("ORA"=union(ora_ups,ora_dns),
  "ORA fix"=union(orafix_ups,orafix_dns))

par(mar=c(10,10,10,10))
par(mfrow=c(2,1))
plot(euler(v1),quantities = list(cex = 2), labels = list(cex = 2),main="upregulated KEGG pathways")

plot(euler(v2),quantities = list(cex = 2), labels = list(cex = 2),main="downregulated KEGG pathways")

plot(euler(v3),quantities = list(cex = 2), labels = list(cex = 2),main="up- and down-regulated KEGG pathways")

png("prot_kegg_euler.png")
plot(euler(v3),quantities = list(cex = 2), labels = list(cex = 2),main="up- and down-regulated KEGG pathways")
dev.off()

## png 
##   2

Jaccard index comparing ORA and ORA fix.

jaccard <- function(a, b) {
    intersection = length(intersect(a, b))
    union = length(a) + length(b) - intersection
    return (intersection/union)
}

ora <- c(ora_ups,ora_dns)
orafix <- c(orafix_ups,orafix_dns)

jaccard(ora,orafix)

## [1] NaN

List gene sets identified only in the “fixed” analysis.

ora <- union(ora_ups,ora_dns)

orafix <- union(orafix_ups,orafix_dns)

setdiff(orafix,ora)

## character(0)

Reactome Gene sets

ORA with Clusterprofiler custom background with otherwise default analysis

I’ve compiled a reporting checklist:

Reporting criteria	Method/resource used
Origin of gene sets	Reactome (2023-03-06)
Tool used	ClusterProfiler (check version at foot of report)
Statistical test used	hypergeometric test
P-value correction for multiple comparisons	FDR method
Background list	Genes with >=10 reads per sample on average across all samples
Gene Selection Criteria	DESeq2 FDR<0.05
ORA directionality	Separate tests for up- and down-regulation
Data availability	Mackmull et al (2015)
Other parameters	Min gene set size of 10

Here I provide a custom background gene list.

Get the gene sets loaded in R. These are Reactome gene sets

genesets2 <- read.gmt("../ref/ReactomePathways_2023-03-06.gmt")
gsets <- gmtPathways("../ref/ReactomePathways_2023-03-06.gmt")

message(paste("number of genes described in the annotation set:",length(unique(genesets2$gene))))

## number of genes described in the annotation set: 11457

Enrichment firstly with upregulated genes

n_pw=10

ora_up <- as.data.frame(enricher(gene = defup ,
  universe = bg,  maxGSSize = 5000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

ora_up$geneID <- NULL
ora_up <- subset(ora_up,p.adjust<0.05 & Count >=5)
ora_ups <- rownames(ora_up)

gr <- as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",2))

ora_up$es <- gr/br
ora_up <- ora_up[order(-ora_up$es),]
ora_up$Description=NULL

head(ora_up) %>%
  kbl(row.names = FALSE, caption="Top upregulated pathways in LPS treatment") %>%
  kable_paper("hover", full_width = F)

Top upregulated pathways in LPS treatment
ID	GeneRatio	BgRatio	pvalue	p.adjust	qvalue	Count	es
Ub-specific processing proteases	8/104	11/645	3.88e-05	0.0104044	0.0091131	8	4.51049

topup2 <- rev(head(ora_up$es,10))
names(topup2) <- rev(head(ora_up$ID,10))

Now repeat with the downregulated genes

ora_dn <- as.data.frame(enricher(gene = defdn ,
  universe = bg,  maxGSSize = 5000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

ora_dn$geneID <- NULL
ora_dn <- subset(ora_dn,p.adjust<0.05 & Count >=5)
ora_dns <- rownames(ora_dn)

gr <- as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",2))

ora_dn$es <- gr/br
ora_dn <- ora_dn[order(-ora_dn$es),]
ora_dn$Description=NULL

head(ora_dn,n_pw) %>%
  kbl(row.names = FALSE, caption="Top downregulated pathways in LPS treatment") %>%
  kable_paper("hover", full_width = F)

Top downregulated pathways in LPS treatment
ID	GeneRatio	BgRatio	pvalue	p.adjust	qvalue	Count	es

topdn2 <- head(ora_dn$es,n_pw)
names(topdn2) <- head(ora_dn$ID,n_pw)

Make a barplot

par(mar=c(5,20,5,1))

cols <- c(rep("blue",n_pw),rep("red",n_pw))

top <- c(topdn2,topup2)

if ( length(top) > 1 ) {
  barplot(c(top),
    horiz=TRUE,las=1,cex.names=0.65,col=cols,
    main="top DE Reactomes",
    xlab="ES",
    cex.main=0.9)
  mtext("ORA test")
}

ORA with Clusterprofiler with custom background and special gene set list to fix the potential bug

I’ve compiled a reporting checklist:

Reporting criteria	Method/resource used
Origin of gene sets	Reactome (2023-03-06)
Tool used	ClusterProfiler (check version at foot of report)
Statistical test used	hypergeometric test
P-value correction for multiple comparisons	FDR method
Background list	Genes with >=10 reads per sample on average across all samples
Gene Selection Criteria	DESeq2 FDR<0.05
ORA directionality	Separate tests for up- and down-regulation
Data availability	Mackmull et al (2015)
Other parameters	Min gene set size of 10 + BUG FIX

Adding all detected genes to the background appears to improve results!

bgdf <- data.frame("background",bg)
colnames(bgdf) <- c("term","gene")
genesets2 <- rbind(genesets2,bgdf)

Enrichment firstly with upregulated genes

orafix_up <- as.data.frame(enricher(gene = defup ,
  universe = bg,  maxGSSize = 5000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

orafix_up$geneID <- NULL
orafix_up <- subset(orafix_up,p.adjust<0.05 & Count >=5)
orafix_ups <- rownames(orafix_up)

gr <- as.numeric(sapply(strsplit(orafix_up$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_up$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(orafix_up$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_up$BgRatio,"/"),"[[",2))

orafix_up$es <- gr/br
orafix_up <- orafix_up[order(-orafix_up$es),]
orafix_up$Description=NULL

head(orafix_up) %>%
  kbl(row.names = FALSE, caption="Top upregulated pathways in LPS treatment") %>%
  kable_paper("hover", full_width = F)

Top upregulated pathways in LPS treatment
ID	GeneRatio	BgRatio	pvalue	p.adjust	qvalue	Count	es
Ub-specific processing proteases	8/292	11/1619	0.0001024	0.0275564	0.0251248	8	4.032379

topup2 <- rev(head(orafix_up$es,10))
names(topup2) <- rev(head(orafix_up$ID,10))

Now with the downregulated genes

n_pw=10

orafix_dn <- as.data.frame(enricher(gene = defdn ,
  universe = bg,  maxGSSize = 5000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

orafix_dn$geneID <- NULL
orafix_dn <- subset(orafix_dn,p.adjust<0.05 & Count >=5)
orafix_dns <- rownames(orafix_dn)

gr <- as.numeric(sapply(strsplit(orafix_dn$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_dn$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(orafix_dn$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_dn$BgRatio,"/"),"[[",2))

orafix_dn$es <- gr/br
orafix_dn <- orafix_dn[order(-orafix_dn$es),]
orafix_dn$Description=NULL

head(orafix_dn,n_pw) %>%
  kbl(row.names = FALSE, caption="Top downregulated pathways in LPS treatment") %>%
  kable_paper("hover", full_width = F)

Top downregulated pathways in LPS treatment
ID	GeneRatio	BgRatio	pvalue	p.adjust	qvalue	Count	es

topdn2 <- head(orafix_dn$es,n_pw)
names(topdn2) <- head(orafix_dn$ID,n_pw)

Make a barplot

par(mar=c(5,20,5,1))

cols <- c(rep("blue",n_pw),rep("red",n_pw))

top <- c(topdn2,topup2)

if ( length(top) > 1 ) {
  barplot(c(top),
    horiz=TRUE,las=1,cex.names=0.65,col=cols,
    main="top DE Reactomes",
    xlab="ES",
    cex.main=0.9)
  mtext("ORA fix")
}

Compare pathway results

v1 <- list("ORA up"=ora_ups, "ORA fix up"=orafix_ups)
v2 <- list("ORA dn"=ora_dns, "ORA fix dn"=orafix_dns)
v3 <- list("ORA"=union(ora_ups,ora_dns),
  "ORA fix"=union(orafix_ups,orafix_dns))

par(mar=c(10,10,10,10))
par(mfrow=c(2,1))
plot(euler(v1),quantities = list(cex = 2), labels = list(cex = 2),main="upregulated Reactomes")

plot(euler(v2),quantities = list(cex = 2), labels = list(cex = 2),main="downregulated Reactomes")

plot(euler(v3),quantities = list(cex = 2), labels = list(cex = 2),main="up- and down-regulated Reactomes")

png("prot_reactome_euler.png")
plot(euler(v3),quantities = list(cex = 2), labels = list(cex = 2),main="up- and down-regulated Reactomes")
dev.off()

## png 
##   2

Jaccard index comparing ORA and ORA fix.

jaccard <- function(a, b) {
    intersection = length(intersect(a, b))
    union = length(a) + length(b) - intersection
    return (intersection/union)
}

ora <- c(ora_ups,ora_dns)
orafix <- c(orafix_ups,orafix_dns)

jaccard(ora,orafix)

## [1] 1

List gene sets identified only in the “fixed” analysis.

ora <- union(ora_ups,ora_dns)

orafix <- union(orafix_ups,orafix_dns)

setdiff(orafix,ora)

## character(0)

Session information

For reproducibility

Click HERE to show session info

sessionInfo()

## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
##  [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
##  [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Australia/Melbourne
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] gplots_3.1.3                eulerr_7.0.0               
##  [3] fgsea_1.26.0                clusterProfiler_4.8.2      
##  [5] kableExtra_1.3.4            DESeq2_1.40.2              
##  [7] SummarizedExperiment_1.30.2 Biobase_2.60.0             
##  [9] MatrixGenerics_1.12.2       matrixStats_1.0.0          
## [11] GenomicRanges_1.52.0        GenomeInfoDb_1.36.1        
## [13] IRanges_2.34.1              S4Vectors_0.38.1           
## [15] BiocGenerics_0.46.0        
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3      rstudioapi_0.15.0       jsonlite_1.8.7         
##   [4] magrittr_2.0.3          farver_2.1.1            rmarkdown_2.23         
##   [7] zlibbioc_1.46.0         vctrs_0.6.3             memoise_2.0.1          
##  [10] RCurl_1.98-1.12         ggtree_3.8.2            webshot_0.5.5          
##  [13] htmltools_0.5.5         S4Arrays_1.0.5          gridGraphics_0.5-1     
##  [16] sass_0.4.7              KernSmooth_2.23-22      bslib_0.5.0            
##  [19] plyr_1.8.8              cachem_1.0.8            igraph_1.5.0.1         
##  [22] lifecycle_1.0.3         pkgconfig_2.0.3         gson_0.1.0             
##  [25] Matrix_1.6-0            R6_2.5.1                fastmap_1.1.1          
##  [28] GenomeInfoDbData_1.2.10 digest_0.6.33           aplot_0.1.10           
##  [31] enrichplot_1.20.0       colorspace_2.1-0        patchwork_1.1.2        
##  [34] AnnotationDbi_1.62.2    RSQLite_2.3.1           fansi_1.0.4            
##  [37] httr_1.4.6              polyclip_1.10-4         abind_1.4-5            
##  [40] compiler_4.3.1          bit64_4.0.5             withr_2.5.0            
##  [43] downloader_0.4          BiocParallel_1.34.2     viridis_0.6.4          
##  [46] DBI_1.1.3               highr_0.10              ggforce_0.4.1          
##  [49] MASS_7.3-60             DelayedArray_0.26.6     HDO.db_0.99.1          
##  [52] caTools_1.18.2          gtools_3.9.4            tools_4.3.1            
##  [55] ape_5.7-1               scatterpie_0.2.1        glue_1.6.2             
##  [58] nlme_3.1-162            GOSemSim_2.26.1         polylabelr_0.2.0       
##  [61] grid_4.3.1              shadowtext_0.1.2        reshape2_1.4.4         
##  [64] generics_0.1.3          gtable_0.3.3            tidyr_1.3.0            
##  [67] data.table_1.14.8       tidygraph_1.2.3         xml2_1.3.5             
##  [70] utf8_1.2.3              XVector_0.40.0          ggrepel_0.9.3          
##  [73] pillar_1.9.0            stringr_1.5.0           yulab.utils_0.0.6      
##  [76] splines_4.3.1           dplyr_1.1.2             tweenr_2.0.2           
##  [79] treeio_1.24.3           lattice_0.21-8          bit_4.0.5              
##  [82] tidyselect_1.2.0        GO.db_3.17.0            locfit_1.5-9.8         
##  [85] Biostrings_2.68.1       knitr_1.43              gridExtra_2.3          
##  [88] svglite_2.1.1           xfun_0.39               graphlayouts_1.0.0     
##  [91] stringi_1.7.12          lazyeval_0.2.2          ggfun_0.1.1            
##  [94] yaml_2.3.7              evaluate_0.21           codetools_0.2-19       
##  [97] ggraph_2.1.0            tibble_3.2.1            qvalue_2.32.0          
## [100] ggplotify_0.1.1         cli_3.6.1               systemfonts_1.0.4      
## [103] munsell_0.5.0           jquerylib_0.1.4         Rcpp_1.0.11            
## [106] png_0.1-8               parallel_4.3.1          ggplot2_3.4.2          
## [109] blob_1.2.4              DOSE_3.26.1             bitops_1.0-7           
## [112] viridisLite_0.4.2       tidytree_0.4.4          scales_1.2.1           
## [115] purrr_1.0.1             crayon_1.5.2            rlang_1.1.1            
## [118] cowplot_1.1.1           fastmatch_1.1-3         KEGGREST_1.40.0        
## [121] rvest_1.0.3

Examining background gene lists in clusterProfiler - proteomics example

Mark Ziemann & Anusuiya Bora

2023-08-30

Introduction

KEGG pathway analysis

ORA with Clusterprofiler custom background with otherwise default analysis

ORA with Clusterprofiler with custom background and special gene set list to fix the potential bug

Compare pathway results

Reactome Gene sets

ORA with Clusterprofiler custom background with otherwise default analysis

ORA with Clusterprofiler with custom background and special gene set list to fix the potential bug

Compare pathway results

Session information