Source: https://github.com/markziemann/enrichment_recipe

Introduction

This guide is a Rmarkdown script that conducts differential expression and enrichment analysis, which are very popular workflows for transcriptome data.

The goal of this work is to understand how ClusterProfiler manages the background list, as we have observed some weird behaviour. In particular we see that when we provide a custom background, those genes do not appear in the final analysis unless they are also included in the gene list.

In the code chunk below called libs, you can add and remove required R library dependancies. Check that the libraries listed here match the Dockerfile, otherwise you might get errors.

suppressPackageStartupMessages({
  library("DESeq2")
  library("kableExtra")
  library("clusterProfiler")
  library("fgsea")
  library("eulerr")
  library("gplots")
})

For this report I will be using proteomics data from Stukalov et al (https://pubmed.ncbi.nlm.nih.gov/33845483/), who profiled the effect of SARS-COV2 infection on A549 cells.

Here we used the 24hr profile, as there were relatively more changes found.

Probable contaminants were removed, as well as viral proteins. This left 5825 proteins, including 165 downregulated and 44 upregulated.

The data were obtained from Supplementary Table 5 and saved to the /data folder.

def <- read.table("../data/sarscov2_prot.tsv",sep="\t",header=TRUE)

KEGG pathway analysis

In the next sections I will run enrichment analysis with over-representation analysis (ORA) test and compare it to functional class scoring. I will also investigate some strange behviour of the ORA tool clusterprofiler.

ORA with Clusterprofiler custom background with otherwise default analysis

I’ve compiled a reporting checklist:

Reporting criteria	Method/resource used
Origin of gene sets	KEGG (2023-06-16)
Tool used	ClusterProfiler (check version at foot of report)
Statistical test used	hypergeometric test
P-value correction for multiple comparisons	FDR method
Background list	Genes with >=10 reads per sample on average across all samples
Gene Selection Criteria	DESeq2 FDR<0.05
ORA directionality	Separate tests for up- and down-regulation
Data availability	Stukalov et al 2021: PMID 33845483
Other parameters	Min gene set size of 10

Here I provide a custom background gene list.

Get the gene sets loaded in R. These are KEGG gene sets

# from https://www.gsea-msigdb.org/gsea/msigdb/human/collections.jsp 16th June 2023
genesets2 <- read.gmt("../ref/c2.cp.kegg.v2023.1.Hs.symbols.gmt")
gsets <- gmtPathways("../ref/c2.cp.kegg.v2023.1.Hs.symbols.gmt")

message(paste("number of genes described in the annotation set:",length(unique(genesets2$gene))))

## number of genes described in the annotation set: 5244

Now filter the gene names into three lists, up-regulated, down-regulated and background. Background is simply all genes that were detected.

defup <- unique(subset(def,change.SARS_CoV2.24h_vs_mock.24h=="+")$gene_name)

defdn <- unique(subset(def,change.SARS_CoV2.24h_vs_mock.24h=="-")$gene_name)

bg <- unique(def$gene_name)

message(paste("number of proteins in background:",length(bg)))

## number of proteins in background: 5804

Enrichment firstly with upregulated genes

n_pw=10

ora_up <- as.data.frame(enricher(gene = defup ,
  universe = bg,  maxGSSize = 5000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

ora_up$geneID <- NULL
ora_up <- subset(ora_up,p.adjust<0.05 & Count >=5)
ora_ups <- rownames(ora_up)

gr <- as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",2))

ora_up$es <- gr/br
ora_up <- ora_up[order(-ora_up$es),]
ora_up$Description=NULL

head(ora_up) %>%
  kbl(row.names = FALSE, caption="Top upregulated pathways in LPS treatment") %>%
  kable_paper("hover", full_width = F)

Top upregulated pathways in LPS treatment
ID	GeneRatio	BgRatio	pvalue	p.adjust	qvalue	Count	es

topup2 <- rev(head(ora_up$es,10))
names(topup2) <- rev(head(ora_up$ID,10))

The reported background size does not match the dataset.

Now repeat with the downregulated genes

ora_dn <- as.data.frame(enricher(gene = defdn ,
  universe = bg,  maxGSSize = 5000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

ora_dn$geneID <- NULL
ora_dn <- subset(ora_dn,p.adjust<0.05 & Count >=5)
ora_dns <- rownames(ora_dn)

gr <- as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",2))

ora_dn$es <- gr/br
ora_dn <- ora_dn[order(-ora_dn$es),]
ora_dn$Description=NULL

head(ora_dn,n_pw) %>%
  kbl(row.names = FALSE, caption="Top downregulated pathways in LPS treatment") %>%
  kable_paper("hover", full_width = F)

Top downregulated pathways in LPS treatment
ID	GeneRatio	BgRatio	pvalue	p.adjust	qvalue	Count	es
KEGG_CELL_ADHESION_MOLECULES_CAMS	9/66	25/1942	0.0e+00	0.0000033	0.0000032	9	10.592727
KEGG_LYSOSOME	11/66	65/1942	6.3e-06	0.0002243	0.0002195	11	4.979487

topdn2 <- head(ora_dn$es,n_pw)
names(topdn2) <- head(ora_dn$ID,n_pw)

Make a barplot

par(mar=c(5,20,5,1))

cols <- c(rep("blue",n_pw),rep("red",n_pw))

top <- c(topdn2,topup2)

if ( length(top) > 1 ) {
  barplot(c(top),
    horiz=TRUE,las=1,cex.names=0.65,col=cols,
    main="top DE KEGGs",
    xlab="ES",
    cex.main=0.9)
  mtext("ORA test")
}

ORA with Clusterprofiler with custom background and special gene set list to fix the potential bug

I’ve compiled a reporting checklist:

Reporting criteria	Method/resource used
Origin of gene sets	KEGG (2023-06-16)
Tool used	ClusterProfiler (check version at foot of report)
Statistical test used	hypergeometric test
P-value correction for multiple comparisons	FDR method
Background list	Genes with >=10 reads per sample on average across all samples
Gene Selection Criteria	DESeq2 FDR<0.05
ORA directionality	Separate tests for up- and down-regulation
Data availability	Data availability
Other parameters	Min gene set size of 10 + BUG FIX

Here I provide a background gene list for cluterprofiler to use, like a user should.

I have also done some modification to the KEGG gene sets, to en

Now filter the gene names into three lists, up-regulated, down-regulated and background. Background is simply all genes that were detected.

Adding all detected genes to the background appears to improve results!

bgdf <- data.frame("background",bg)
colnames(bgdf) <- c("term","gene")
genesets2 <- rbind(genesets2,bgdf)

Enrichment firstly with upregulated genes

n_pw=10

orafix_up <- as.data.frame(enricher(gene = defup ,
  universe = bg,  maxGSSize = 5000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

orafix_up$geneID <- NULL
orafix_up <- subset(orafix_up,p.adjust<0.05 & Count >=5)
orafix_ups <- rownames(orafix_up)

gr <- as.numeric(sapply(strsplit(orafix_up$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_up$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(orafix_up$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_up$BgRatio,"/"),"[[",2))

orafix_up$es <- gr/br
orafix_up <- orafix_up[order(-orafix_up$es),]
orafix_up$Description=NULL

head(orafix_up) %>%
  kbl(row.names = FALSE, caption="Top upregulated pathways in LPS treatment") %>%
  kable_paper("hover", full_width = F)

Top upregulated pathways in LPS treatment
ID	GeneRatio	BgRatio	pvalue	p.adjust	qvalue	Count	es

topup2 <- rev(head(orafix_up$es,10))
names(topup2) <- rev(head(orafix_up$ID,10))

Now with the downregulated genes

orafix_dn <- as.data.frame(enricher(gene = defdn ,
  universe = bg,  maxGSSize = 5000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

orafix_dn$geneID <- NULL
orafix_dn <- subset(orafix_dn,p.adjust<0.05 & Count >=5)
orafix_dns <- rownames(orafix_dn)

gr <- as.numeric(sapply(strsplit(orafix_dn$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_dn$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(orafix_dn$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_dn$BgRatio,"/"),"[[",2))

orafix_dn$es <- gr/br
orafix_dn <- orafix_dn[order(-orafix_dn$es),]
orafix_dn$Description=NULL

head(orafix_dn,n_pw) %>%
  kbl(row.names = FALSE, caption="Top downregulated pathways in LPS treatment") %>%
  kable_paper("hover", full_width = F)

Top downregulated pathways in LPS treatment
ID	GeneRatio	BgRatio	pvalue	p.adjust	qvalue	Count	es
KEGG_CELL_ADHESION_MOLECULES_CAMS	9/165	25/5804	0.0e+00	1.00e-06	9.00e-07	9	12.66327
KEGG_LYSOSOME	11/165	65/5804	1.7e-06	5.91e-05	5.69e-05	11	5.95282

topdn2 <- head(orafix_dn$es,n_pw)
names(topdn2) <- head(orafix_dn$ID,n_pw)

Make a barplot

par(mar=c(5,20,5,1))

cols <- c(rep("blue",n_pw),rep("red",n_pw))

top <- c(topdn2,topup2)

if ( length(top) > 1 ) {
  barplot(c(top),
    horiz=TRUE,las=1,cex.names=0.65,col=cols,
    main="top DE KEGGs",
    xlab="ES",
    cex.main=0.9)
  mtext("ORA fix")
}

Compare pathway results

v1 <- list("ORA up"=ora_ups, "ORA fix up"=orafix_ups)
v2 <- list("ORA dn"=ora_dns, "ORA fix dn"=orafix_dns)
v3 <- list("ORA"=union(ora_ups,ora_dns),
  "ORA fix"=union(orafix_ups,orafix_dns))

par(mar=c(10,10,10,10))
par(mfrow=c(2,1))
plot(euler(v1),quantities = list(cex = 2), labels = list(cex = 2),main="upregulated KEGG pathways")

plot(euler(v2),quantities = list(cex = 2), labels = list(cex = 2),main="downregulated KEGG pathways")

plot(euler(v3),quantities = list(cex = 2), labels = list(cex = 2),main="up- and down-regulated KEGG pathways")

png("prot_kegg_euler.png")
plot(euler(v3),quantities = list(cex = 2), labels = list(cex = 2),main="up- and down-regulated KEGG pathways")
dev.off()

## png 
##   2

Jaccard index comparing ORA and ORA fix.

jaccard <- function(a, b) {
    intersection = length(intersect(a, b))
    union = length(a) + length(b) - intersection
    return (intersection/union)
}

ora <- c(ora_ups,ora_dns)
orafix <- c(orafix_ups,orafix_dns)

jaccard(ora,orafix)

## [1] 1

List gene sets identified only in the “fixed” analysis.

ora <- union(ora_ups,ora_dns)

orafix <- union(orafix_ups,orafix_dns)

setdiff(orafix,ora)

## character(0)

Reactome Gene sets

ORA with Clusterprofiler custom background with otherwise default analysis

I’ve compiled a reporting checklist:

Reporting criteria	Method/resource used
Origin of gene sets	Reactome (2023-03-06)
Tool used	ClusterProfiler (check version at foot of report)
Statistical test used	hypergeometric test
P-value correction for multiple comparisons	FDR method
Background list	Genes with >=10 reads per sample on average across all samples
Gene Selection Criteria	DESeq2 FDR<0.05
ORA directionality	Separate tests for up- and down-regulation
Data availability	Data availability
Other parameters	Min gene set size of 10

Here I provide a custom background gene list.

Get the gene sets loaded in R. These are Reactome gene sets

genesets2 <- read.gmt("../ref/ReactomePathways_2023-03-06.gmt")
gsets <- gmtPathways("../ref/ReactomePathways_2023-03-06.gmt")

message(paste("number of genes described in the annotation set:",length(unique(genesets2$gene))))

## number of genes described in the annotation set: 11457

Enrichment firstly with upregulated genes

n_pw=10

ora_up <- as.data.frame(enricher(gene = defup ,
  universe = bg,  maxGSSize = 5000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

ora_up$geneID <- NULL
ora_up <- subset(ora_up,p.adjust<0.05 & Count >=5)
ora_ups <- rownames(ora_up)

gr <- as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",2))

ora_up$es <- gr/br
ora_up <- ora_up[order(-ora_up$es),]
ora_up$Description=NULL

head(ora_up) %>%
  kbl(row.names = FALSE, caption="Top upregulated pathways in LPS treatment") %>%
  kable_paper("hover", full_width = F)

Top upregulated pathways in LPS treatment
ID	GeneRatio	BgRatio	pvalue	p.adjust	qvalue	Count	es
Signaling by ALK fusions and activated point mutants	5/32	43/3973	0.0000190	0.0023435	0.0019875	5	14.436773
Signaling by ALK in cancer	5/32	43/3973	0.0000190	0.0023435	0.0019875	5	14.436773
Signaling by Interleukins	7/32	181/3973	0.0004631	0.0228754	0.0194000	7	4.801623
Cytokine Signaling in Immune system	8/32	281/3973	0.0013334	0.0478657	0.0405936	8	3.534698

topup2 <- rev(head(ora_up$es,10))
names(topup2) <- rev(head(ora_up$ID,10))

Now repeat with the downregulated genes

ora_dn <- as.data.frame(enricher(gene = defdn ,
  universe = bg,  maxGSSize = 5000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

ora_dn$geneID <- NULL
ora_dn <- subset(ora_dn,p.adjust<0.05 & Count >=5)
ora_dns <- rownames(ora_dn)

gr <- as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",2))

ora_dn$es <- gr/br
ora_dn <- ora_dn[order(-ora_dn$es),]
ora_dn$Description=NULL

head(ora_dn,n_pw) %>%
  kbl(row.names = FALSE, caption="Top downregulated pathways in LPS treatment") %>%
  kable_paper("hover", full_width = F)

Top downregulated pathways in LPS treatment
ID	GeneRatio	BgRatio	pvalue	p.adjust	qvalue	Count	es
ECM proteoglycans	5/131	17/3973	0.0001624	0.0382412	0.0361056	5	8.920072
Glycosphingolipid metabolism	5/131	18/3973	0.0002189	0.0382412	0.0361056	5	8.424512
Extracellular matrix organization	10/131	66/3973	0.0000472	0.0247204	0.0233399	10	4.595188

topdn2 <- head(ora_dn$es,n_pw)
names(topdn2) <- head(ora_dn$ID,n_pw)

Make a barplot

par(mar=c(5,20,5,1))

cols <- c(rep("blue",n_pw),rep("red",n_pw))

top <- c(topdn2,topup2)

if ( length(top) > 1 ) {
  barplot(c(top),
    horiz=TRUE,las=1,cex.names=0.65,col=cols,
    main="top DE Reactomes",
    xlab="ES",
    cex.main=0.9)
  mtext("ORA test")
}

ORA with Clusterprofiler with custom background and special gene set list to fix the potential bug

I’ve compiled a reporting checklist:

Reporting criteria	Method/resource used
Origin of gene sets	Reactome (2023-03-06)
Tool used	ClusterProfiler (check version at foot of report)
Statistical test used	hypergeometric test
P-value correction for multiple comparisons	FDR method
Background list	Genes with >=10 reads per sample on average across all samples
Gene Selection Criteria	DESeq2 FDR<0.05
ORA directionality	Separate tests for up- and down-regulation
Data availability	Data availability
Other parameters	Min gene set size of 10 + BUG FIX

Adding all detected genes to the background appears to improve results!

bgdf <- data.frame("background",bg)
colnames(bgdf) <- c("term","gene")
genesets2 <- rbind(genesets2,bgdf)

Enrichment firstly with upregulated genes

orafix_up <- as.data.frame(enricher(gene = defup ,
  universe = bg,  maxGSSize = 5000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

orafix_up$geneID <- NULL
orafix_up <- subset(orafix_up,p.adjust<0.05 & Count >=5)
orafix_ups <- rownames(orafix_up)

gr <- as.numeric(sapply(strsplit(orafix_up$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_up$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(orafix_up$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_up$BgRatio,"/"),"[[",2))

orafix_up$es <- gr/br
orafix_up <- orafix_up[order(-orafix_up$es),]
orafix_up$Description=NULL

head(orafix_up) %>%
  kbl(row.names = FALSE, caption="Top upregulated pathways in LPS treatment") %>%
  kable_paper("hover", full_width = F)

Top upregulated pathways in LPS treatment
ID	GeneRatio	BgRatio	pvalue	p.adjust	qvalue	Count	es
Signaling by ALK fusions and activated point mutants	5/44	43/5804	0.0000154	0.0019033	0.0015979	5	15.338266
Signaling by ALK in cancer	5/44	43/5804	0.0000154	0.0019033	0.0015979	5	15.338266
Signaling by Interleukins	7/44	181/5804	0.0003684	0.0182006	0.0152803	7	5.101457
Cytokine Signaling in Immune system	8/44	281/5804	0.0010554	0.0411722	0.0345660	8	3.755419

topup2 <- rev(head(orafix_up$es,10))
names(topup2) <- rev(head(orafix_up$ID,10))

Now with the downregulated genes

n_pw=10

orafix_dn <- as.data.frame(enricher(gene = defdn ,
  universe = bg,  maxGSSize = 5000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

orafix_dn$geneID <- NULL
orafix_dn <- subset(orafix_dn,p.adjust<0.05 & Count >=5)
orafix_dns <- rownames(orafix_dn)

gr <- as.numeric(sapply(strsplit(orafix_dn$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_dn$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(orafix_dn$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_dn$BgRatio,"/"),"[[",2))

orafix_dn$es <- gr/br
orafix_dn <- orafix_dn[order(-orafix_dn$es),]
orafix_dn$Description=NULL

head(orafix_dn,n_pw) %>%
  kbl(row.names = FALSE, caption="Top downregulated pathways in LPS treatment") %>%
  kable_paper("hover", full_width = F)

Top downregulated pathways in LPS treatment
ID	GeneRatio	BgRatio	pvalue	p.adjust	qvalue	Count	es
ECM proteoglycans	5/165	17/5804	0.0000820	0.0193713	0.0176669	5	10.345811
Glycosphingolipid metabolism	5/165	18/5804	0.0001109	0.0193713	0.0176669	5	9.771044
Glycosaminoglycan metabolism	6/165	33/5804	0.0002822	0.0258435	0.0235696	6	6.395592
Degradation of the extracellular matrix	5/165	28/5804	0.0010104	0.0441225	0.0402403	5	6.281385
Post-translational protein phosphorylation	6/165	35/5804	0.0003946	0.0258435	0.0235696	6	6.030130
Regulation of Insulin-like Growth Factor (IGF) transport and uptake by Insulin-like Growth Factor Binding Proteins (IGFBPs)	6/165	37/5804	0.0005392	0.0313907	0.0286287	6	5.704177
Sphingolipid metabolism	6/165	38/5804	0.0006254	0.0327698	0.0298865	6	5.554067
Extracellular matrix organization	10/165	66/5804	0.0000140	0.0073459	0.0066996	10	5.329660
Diseases of glycosylation	7/165	47/5804	0.0003186	0.0258435	0.0235696	7	5.238943
Diseases of metabolism	9/165	90/5804	0.0009565	0.0441225	0.0402403	9	3.517576

topdn2 <- head(orafix_dn$es,n_pw)
names(topdn2) <- head(orafix_dn$ID,n_pw)

Make a barplot

par(mar=c(5,20,5,1))

cols <- c(rep("blue",n_pw),rep("red",n_pw))

top <- c(topdn2,topup2)

if ( length(top) > 1 ) {
  barplot(c(top),
    horiz=TRUE,las=1,cex.names=0.65,col=cols,
    main="top DE Reactomes",
    xlab="ES",
    cex.main=0.9)
  mtext("ORA fix")
}

Compare pathway results

v1 <- list("ORA up"=ora_ups, "ORA fix up"=orafix_ups)
v2 <- list("ORA dn"=ora_dns, "ORA fix dn"=orafix_dns)
v3 <- list("ORA"=union(ora_ups,ora_dns),
  "ORA fix"=union(orafix_ups,orafix_dns))

par(mar=c(10,10,10,10))
par(mfrow=c(2,1))
plot(euler(v1),quantities = list(cex = 2), labels = list(cex = 2),main="upregulated Reactomes")

plot(euler(v2),quantities = list(cex = 2), labels = list(cex = 2),main="downregulated Reactomes")

plot(euler(v3),quantities = list(cex = 2), labels = list(cex = 2),main="up- and down-regulated Reactomes")

png("prot_reactome_euler.png")
plot(euler(v3),quantities = list(cex = 2), labels = list(cex = 2),main="up- and down-regulated Reactomes")
dev.off()

## png 
##   2

Jaccard index comparing ORA and ORA fix.

jaccard <- function(a, b) {
    intersection = length(intersect(a, b))
    union = length(a) + length(b) - intersection
    return (intersection/union)
}

ora <- c(ora_ups,ora_dns)
orafix <- c(orafix_ups,orafix_dns)

jaccard(ora,orafix)

## [1] 0.5

List gene sets identified only in the “fixed” analysis.

ora <- union(ora_ups,ora_dns)

orafix <- union(orafix_ups,orafix_dns)

setdiff(orafix,ora)

## [1] "Glycosaminoglycan metabolism"                                                                                               
## [2] "Diseases of glycosylation"                                                                                                  
## [3] "Post-translational protein phosphorylation"                                                                                 
## [4] "Regulation of Insulin-like Growth Factor (IGF) transport and uptake by Insulin-like Growth Factor Binding Proteins (IGFBPs)"
## [5] "Sphingolipid metabolism"                                                                                                    
## [6] "Diseases of metabolism"                                                                                                     
## [7] "Degradation of the extracellular matrix"

Session information

For reproducibility

Click HERE to show session info

sessionInfo()

## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
##  [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
##  [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Australia/Melbourne
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] gplots_3.1.3                eulerr_7.0.0               
##  [3] fgsea_1.26.0                clusterProfiler_4.8.2      
##  [5] kableExtra_1.3.4            DESeq2_1.40.2              
##  [7] SummarizedExperiment_1.30.2 Biobase_2.60.0             
##  [9] MatrixGenerics_1.12.2       matrixStats_1.0.0          
## [11] GenomicRanges_1.52.0        GenomeInfoDb_1.36.1        
## [13] IRanges_2.34.1              S4Vectors_0.38.1           
## [15] BiocGenerics_0.46.0        
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3      rstudioapi_0.15.0       jsonlite_1.8.7         
##   [4] magrittr_2.0.3          farver_2.1.1            rmarkdown_2.23         
##   [7] zlibbioc_1.46.0         vctrs_0.6.3             memoise_2.0.1          
##  [10] RCurl_1.98-1.12         ggtree_3.8.2            webshot_0.5.5          
##  [13] htmltools_0.5.5         S4Arrays_1.0.5          gridGraphics_0.5-1     
##  [16] sass_0.4.7              bslib_0.5.0             KernSmooth_2.23-22     
##  [19] plyr_1.8.8              cachem_1.0.8            igraph_1.5.0.1         
##  [22] lifecycle_1.0.3         pkgconfig_2.0.3         Matrix_1.6-0           
##  [25] R6_2.5.1                fastmap_1.1.1           gson_0.1.0             
##  [28] GenomeInfoDbData_1.2.10 digest_0.6.33           aplot_0.1.10           
##  [31] enrichplot_1.20.0       colorspace_2.1-0        patchwork_1.1.2        
##  [34] AnnotationDbi_1.62.2    RSQLite_2.3.1           fansi_1.0.4            
##  [37] httr_1.4.6              polyclip_1.10-4         abind_1.4-5            
##  [40] compiler_4.3.1          bit64_4.0.5             withr_2.5.0            
##  [43] downloader_0.4          BiocParallel_1.34.2     viridis_0.6.4          
##  [46] DBI_1.1.3               highr_0.10              ggforce_0.4.1          
##  [49] MASS_7.3-60             DelayedArray_0.26.6     HDO.db_0.99.1          
##  [52] caTools_1.18.2          gtools_3.9.4            tools_4.3.1            
##  [55] ape_5.7-1               scatterpie_0.2.1        glue_1.6.2             
##  [58] nlme_3.1-162            GOSemSim_2.26.1         polylabelr_0.2.0       
##  [61] grid_4.3.1              shadowtext_0.1.2        reshape2_1.4.4         
##  [64] generics_0.1.3          gtable_0.3.3            tidyr_1.3.0            
##  [67] data.table_1.14.8       tidygraph_1.2.3         xml2_1.3.5             
##  [70] utf8_1.2.3              XVector_0.40.0          ggrepel_0.9.3          
##  [73] pillar_1.9.0            stringr_1.5.0           yulab.utils_0.0.6      
##  [76] splines_4.3.1           dplyr_1.1.2             tweenr_2.0.2           
##  [79] treeio_1.24.3           lattice_0.21-8          bit_4.0.5              
##  [82] tidyselect_1.2.0        GO.db_3.17.0            locfit_1.5-9.8         
##  [85] Biostrings_2.68.1       knitr_1.43              gridExtra_2.3          
##  [88] svglite_2.1.1           xfun_0.39               graphlayouts_1.0.0     
##  [91] stringi_1.7.12          yaml_2.3.7              lazyeval_0.2.2         
##  [94] ggfun_0.1.1             evaluate_0.21           codetools_0.2-19       
##  [97] ggraph_2.1.0            tibble_3.2.1            qvalue_2.32.0          
## [100] ggplotify_0.1.1         cli_3.6.1               systemfonts_1.0.4      
## [103] jquerylib_0.1.4         munsell_0.5.0           Rcpp_1.0.11            
## [106] png_0.1-8               parallel_4.3.1          ggplot2_3.4.2          
## [109] blob_1.2.4              DOSE_3.26.1             bitops_1.0-7           
## [112] viridisLite_0.4.2       tidytree_0.4.4          scales_1.2.1           
## [115] purrr_1.0.1             crayon_1.5.2            rlang_1.1.1            
## [118] cowplot_1.1.1           fastmatch_1.1-3         KEGGREST_1.40.0        
## [121] rvest_1.0.3

Examining background gene lists in clusterProfiler - proteomics example

Mark Ziemann & Anusuiya Bora

2023-08-30

Introduction

KEGG pathway analysis

ORA with Clusterprofiler custom background with otherwise default analysis

ORA with Clusterprofiler with custom background and special gene set list to fix the potential bug

Compare pathway results

Reactome Gene sets

ORA with Clusterprofiler custom background with otherwise default analysis

ORA with Clusterprofiler with custom background and special gene set list to fix the potential bug

Compare pathway results

Session information