Examining background gene lists in clusterProfiler

Source: https://github.com/markziemann/enrichment_recipe

Introduction

This guide is a Rmarkdown script that conducts differential expression and enrichment analysis, which are very popular workflows for transcriptome data.

The goal of this work is to understand how ClusterProfiler manages the background list, as we have observed some weird behaviour. In particular we see that when we provide a custom background, those genes do not appear in the final analysis unless they are also included in the gene list.

In the code chunk below called libs, you can add and remove required R library dependancies. Check that the libraries listed here match the Dockerfile, otherwise you might get errors.

suppressPackageStartupMessages({
  library("getDEE2")
  library("DESeq2")
  library("kableExtra")
  library("clusterProfiler")
  library("fgsea")
  library("eulerr")
  library("gplots")
})

For this guide I will be using bulk RNA-seq data from a previous study, which is deposited at NCBI GEO and SRA under accession numbers: GSE55123 and SRP038101 (Lund et al, 2014). The experiment is designed to investigate the effect of Azacitidine treatment on AML3 cells.

The raw data have been processed by the DEE2 project, and the summary gene expression counts are available at the dee2.io website, and programmatically with the getDEE2 bioconductor package (Ziemann et al, 2019).

Alternatively, you could fetch data from another resource like NCBI GEO, Zenodo or from the host storage drive.

mdat <- getDEE2Metadata("hsapiens")

# get sample sheet
ss <- subset(mdat,SRP_accession=="SRP038101")

# fetch the whole set of RNA-seq data
x <- getDEE2("hsapiens",ss$SRR_accession , metadata=mdat, legacy=TRUE)

## For more information about DEE2 QC metrics, visit
##     https://github.com/markziemann/dee2/blob/master/qc/qc_metrics.md

mx <- x$GeneCounts
rownames(mx) <- paste(rownames(mx),x$GeneInfo$GeneSymbol)
dim(mx)

## [1] 58302     6

# aza no filtering
ss$trt <- grepl("Treated",ss$Experiment_title)

ss %>%
  kbl(caption="sample sheet for Aza treatment in AML3 cells") %>%
  kable_paper("hover", full_width = F)

sample sheet for Aza treatment in AML3 cells
	SRR_accession	QC_summary	SRX_accession	SRS_accession	SRP_accession	Experiment_title	trt
156267	SRR1171523	PASS	SRX472607	SRS559064	SRP038101	GSM1329859: Untreated.1; Homo sapiens; RNA-Seq	FALSE
156268	SRR1171524	WARN(3,4)	SRX472608	SRS559066	SRP038101	GSM1329860: Untreated.2; Homo sapiens; RNA-Seq	FALSE
156269	SRR1171525	WARN(3,4)	SRX472609	SRS559065	SRP038101	GSM1329861: Untreated.3; Homo sapiens; RNA-Seq	FALSE
156270	SRR1171526	WARN(3,4)	SRX472610	SRS559068	SRP038101	GSM1329862: Treated.1; Homo sapiens; RNA-Seq	TRUE
156271	SRR1171527	WARN(3,4)	SRX472611	SRS559067	SRP038101	GSM1329863: Treated.2; Homo sapiens; RNA-Seq	TRUE
156272	SRR1171528	WARN(3,4)	SRX472612	SRS559069	SRP038101	GSM1329864: Treated.3; Homo sapiens; RNA-Seq	TRUE

Data quality control

QC is important, even if you are using public transcriptome data. For RNA-seq it is a good idea to show the number of reads for each sample.

par(mar=c(5,7,5,1))
barplot(rev(colSums(mx)),horiz=TRUE,las=1,main="number of reads per sample in SRP038101")

Now make a MDS plot.

mds <- cmdscale(dist(t(mx)))

# expand plot area
XMIN=min(mds[,1])*1.3
XMAX=max(mds[,1])*1.3
YMIN=min(mds[,2])*1.3
YMAX=max(mds[,2])*1.3

cols <- as.character(grepl("Treated",ss$Experiment_title))
cols <- gsub("FALSE","lightblue",cols)
cols <- gsub("TRUE","pink",cols)
plot(mds, xlab="Coordinate 1", ylab="Coordinate 2",
  xlim=c(XMIN,XMAX),ylim=c(YMIN,YMAX),
  pch=19,cex=2,col=cols, main="MDS plot")
text(cmdscale(dist(t(mx))), labels=colnames(mx) )

Differential expression analysis

I will be using DESeq2 for DE analysis, the count matrix is prefiltered using a detection threshold of 10 reads per sample across all samples. All genes that meet the detection threshold will comprise the background list. The first 6 rows of the count matrix are shows.

mxf <- mx[which(rowMeans(mx)>=10),]
dim(mxf)

## [1] 13168     6

head(mxf,6) %>%
  kbl(caption="Count matrix format") %>%
  kable_paper("hover", full_width = F)

Count matrix format
	SRR1171523	SRR1171524	SRR1171525	SRR1171526	SRR1171527	SRR1171528
ENSG00000225630 MTND2P28	494	396	340	333	415	418
ENSG00000237973 MTCO1P12	52	39	40	30	37	29
ENSG00000248527 MTATP6P1	853	544	537	582	702	716
ENSG00000228327 AL669831.1	17	13	21	21	22	12
ENSG00000228794 LINC01128	42	27	30	32	40	23
ENSG00000230699 AL645608.3	20	11	13	10	15	22

dds <- DESeqDataSetFromMatrix(countData = mxf , colData = ss, design = ~ trt )
res <- DESeq(dds)

## estimating size factors

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

## final dispersion estimates

## fitting model and testing

z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <-cbind(as.data.frame(z),mxf)
def <-as.data.frame(zz[order(zz$pvalue),])

head(def,10) %>%
  kbl(caption="Top DE genes for Aza treatment") %>%
  kable_paper("hover", full_width = F)

Top DE genes for Aza treatment
	baseMean	log2FoldChange	lfcSE	stat	SRR1171523	SRR1171524	SRR1171525	SRR1171526	SRR1171527	SRR1171528
ENSG00000165949 IFI27	1960.1970	-3.384492	0.0938869	-36.04861	4689	3583	2758	309	384	334
ENSG00000090382 LYZ	7596.0299	-1.650342	0.0561143	-29.41036	14212	10237	10789	3476	3764	3738
ENSG00000115461 IGFBP5	531.2217	-5.071157	0.1795239	-28.24781	1320	823	1025	23	26	43
ENSG00000157601 MX1	827.1511	-2.877795	0.1047823	-27.46450	1732	1501	1206	184	223	186
ENSG00000111331 OAS3	2127.2010	-2.661214	0.0972124	-27.37525	4204	3977	2972	562	614	560
ENSG00000070915 SLC12A3	424.5509	-3.374852	0.1298671	-25.98697	1012	721	653	63	85	76
ENSG00000234745 HLA-B	3197.0159	-1.431566	0.0604169	-23.69479	6085	4256	4023	1590	1872	1719
ENSG00000137965 IFI44	409.0957	-2.978581	0.1319352	-22.57608	829	740	635	76	111	89
ENSG00000204525 HLA-C	1631.6421	-1.461550	0.0660214	-22.13750	3112	2150	2106	791	923	891
ENSG00000110042 DTX4	524.1318	-2.470219	0.1173182	-21.05572	1166	883	688	166	168	145

Make a smear plot.

sigf <- subset(def,padj<=0.05)

DET=nrow(mxf)
NSIG=nrow(sigf)
NUP=nrow(subset(sigf,log2FoldChange>0))
NDN=nrow(subset(sigf,log2FoldChange<0))

HEADER=paste(DET,"detected genes;",NSIG,"w/FDR<0.05;",NUP,"up;",NDN,"down")

plot(log10(def$baseMean) ,def$log2FoldChange,
  cex=0.6,pch=19,col="darkgray",
  main="5-azacitidine treatment in AML3 cells",
  xlab="log10(basemean)",ylab="log2(fold change)")

points(log10(sigf$baseMean) ,sigf$log2FoldChange,
  cex=0.6,pch=19,col="red")

mtext(HEADER)

In the next sections I will run enrichment analysis with over-representation analysis (ORA) test and compare it to functional class scoring. I will also investigate some strange behviour of the ORA tool clusterprofiler.

ORA with Clusterprofiler custom background with otherwise default analysis

I’ve compiled a reporting checklist:

Reporting criteria	Method/resource used
Origin of gene sets	MSigDB Hallmark (2023-06-16)
Tool used	ClusterProfiler (check version at foot of report)
Statistical test used	hypergeometric test
P-value correction for multiple comparisons	FDR method
Background list	Genes with >=10 reads per sample on average across all samples
Gene Selection Criteria	DESeq2 FDR<0.05
ORA directionality	Separate tests for up- and down-regulation
Data availability	via DEE2 at accession SRP038101 (human)
Other parameters	Min gene set size of 10

Here I provide a custom background gene list.

Get the gene sets loaded in R. These are MSigDB Hallmark gene sets

# from https://www.gsea-msigdb.org/gsea/msigdb/human/collections.jsp 16th June 2023
genesets2 <- read.gmt("h.all.v2023.1.Hs.symbols.gmt")
gsets <- gmtPathways("h.all.v2023.1.Hs.symbols.gmt")

message(paste("number of genes described in the annotation set:",length(unique(genesets2$gene))))

## number of genes described in the annotation set: 4384

Now filter the gene names into three lists, up-regulated, down-regulated and background. Background is simply all genes that were detected.

defup <- rownames(subset(def,padj<=0.05 & log2FoldChange>0))
defup <- unique(sapply(strsplit(defup," "),"[[",2))

defdn <- rownames(subset(def,padj<=0.05 & log2FoldChange<0))
defdn <- unique(sapply(strsplit(defdn," "),"[[",2))

bg <- rownames(def)
bg <- unique(sapply(strsplit(bg," "),"[[",2))

message(paste("number of genes in background:",length(bg)))

## number of genes in background: 13164

Enrichment firstly with upregulated genes

ora_up <- as.data.frame(enricher(gene = defup ,
  universe = bg,  maxGSSize = 5000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

ora_up$geneID <- NULL
ora_up <- subset(ora_up,p.adjust<0.05 & Count >=10)
ora_ups <- rownames(ora_up)

gr <- as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",2))

ora_up$es <- gr/br
ora_up <- ora_up[order(-ora_up$es),]
ora_up$Description=NULL

head(ora_up) %>%
  kbl(row.names = FALSE, caption="Top upregulated pathways in Aza treatment") %>%
  kable_paper("hover", full_width = F)

Top upregulated pathways in Aza treatment
ID	GeneRatio	BgRatio	pvalue	p.adjust	qvalue	Count	es
HALLMARK_MYC_TARGETS_V2	31/509	57/3141	0.0000000	0.0000000	0.0000000	31	3.356116
HALLMARK_E2F_TARGETS	101/509	198/3141	0.0000000	0.0000000	0.0000000	101	3.147794
HALLMARK_G2M_CHECKPOINT	97/509	191/3141	0.0000000	0.0000000	0.0000000	97	3.133924
HALLMARK_MYC_TARGETS_V1	90/509	195/3141	0.0000000	0.0000000	0.0000000	90	2.848118
HALLMARK_SPERMATOGENESIS	21/509	72/3141	0.0037495	0.0257111	0.0225536	21	1.799853
HALLMARK_MITOTIC_SPINDLE	50/509	188/3141	0.0001243	0.0011930	0.0010465	50	1.641203

topup2 <- rev(head(ora_up$es,10))
names(topup2) <- rev(head(ora_up$ID,10))

The reported background size does not match the dataset.

Now repeat with the downregulated genes

n_pw=10

ora_dn <- as.data.frame(enricher(gene = defdn ,
  universe = bg,  maxGSSize = 5000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

ora_dn$geneID <- NULL
ora_dn <- subset(ora_dn,p.adjust<0.05 & Count >=10)
ora_dns <- rownames(ora_dn)

gr <- as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",2))

ora_dn$es <- gr/br
ora_dn <- ora_dn[order(-ora_dn$es),]
ora_dn$Description=NULL

head(ora_dn,n_pw) %>%
  kbl(row.names = FALSE, caption="Top downregulated pathways in Aza treatment") %>%
  kable_paper("hover", full_width = F)

Top downregulated pathways in Aza treatment
ID	GeneRatio	BgRatio	pvalue	p.adjust	qvalue	Count	es
HALLMARK_INTERFERON_ALPHA_RESPONSE	67/719	86/3141	0.0000000	0.0000000	0.0000000	67	3.403419
HALLMARK_INTERFERON_GAMMA_RESPONSE	110/719	165/3141	0.0000000	0.0000000	0.0000000	110	2.912378
HALLMARK_ANGIOGENESIS	11/719	17/3141	0.0002632	0.0009401	0.0005542	11	2.826720
HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION	56/719	108/3141	0.0000000	0.0000000	0.0000000	56	2.265183
HALLMARK_INFLAMMATORY_RESPONSE	61/719	125/3141	0.0000000	0.0000000	0.0000000	61	2.131861
HALLMARK_IL6_JAK_STAT3_SIGNALING	29/719	63/3141	0.0000364	0.0001654	0.0000975	29	2.010928
HALLMARK_ALLOGRAFT_REJECTION	55/719	121/3141	0.0000000	0.0000002	0.0000001	55	1.985712
HALLMARK_TNFA_SIGNALING_VIA_NFKB	70/719	155/3141	0.0000000	0.0000000	0.0000000	70	1.972901
HALLMARK_KRAS_SIGNALING_UP	49/719	111/3141	0.0000004	0.0000023	0.0000013	49	1.928467
HALLMARK_COAGULATION	31/719	72/3141	0.0000966	0.0004026	0.0002373	31	1.880911

topdn2 <- head(ora_dn$es,n_pw)
names(topdn2) <- head(ora_dn$ID,n_pw)

Make a barplot

par(mar=c(5,20,5,1))

cols <- c(rep("blue",n_pw),rep("red",n_pw))

barplot(c(topdn2,topup2),
  horiz=TRUE,las=1,cex.names=0.65,col=cols,
  main="top DE MSigDB Hallmark",
  xlab="ES",
  cex.main=0.9)

mtext("ORA test")

ORA with Clusterprofiler with custom background and special gene set list to fix the potential bug

I’ve compiled a reporting checklist:

Reporting criteria	Method/resource used
Origin of gene sets	MSigDB Hallmark (2023-06-16)
Tool used	ClusterProfiler (check version at foot of report)
Statistical test used	hypergeometric test
P-value correction for multiple comparisons	FDR method
Background list	Genes with >=10 reads per sample on average across all samples
Gene Selection Criteria	DESeq2 FDR<0.05
ORA directionality	Separate tests for up- and down-regulation
Data availability	via DEE2 at accession SRP038101 (human)
Other parameters	Min gene set size of 10

Here I provide a background gene list for cluterprofiler to use, like a user should.

I have also done some modification to the MSigDB Hallmark gene sets, to en

Now filter the gene names into three lists, up-regulated, down-regulated and background. Background is simply all genes that were detected.

defup <- rownames(subset(def,padj<=0.05 & log2FoldChange>0))
defup <- unique(sapply(strsplit(defup," "),"[[",2))

defdn <- rownames(subset(def,padj<=0.05 & log2FoldChange<0))
defdn <- unique(sapply(strsplit(defdn," "),"[[",2))

bg <- rownames(def)
bg <- unique(sapply(strsplit(bg," "),"[[",2))
message(paste("number of genes in background:",length(bg)))

## number of genes in background: 13164

Adding all detected genes to the background appears to improve results!

bgdf <- data.frame("background",bg)
colnames(bgdf) <- c("term","gene")
genesets2 <- rbind(genesets2,bgdf)

Enrichment firstly with upregulated genes

orafix_up <- as.data.frame(enricher(gene = defup ,
  universe = bg,  maxGSSize = 5000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

orafix_up$geneID <- NULL
orafix_up <- subset(orafix_up,p.adjust<0.05 & Count >=10)
orafix_ups <- rownames(orafix_up)

gr <- as.numeric(sapply(strsplit(orafix_up$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_up$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(orafix_up$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_up$BgRatio,"/"),"[[",2))

orafix_up$es <- gr/br
orafix_up <- orafix_up[order(-orafix_up$es),]
orafix_up$Description=NULL

head(orafix_up) %>%
  kbl(row.names = FALSE, caption="Top upregulated pathways in Aza treatment") %>%
  kable_paper("hover", full_width = F)

Top upregulated pathways in Aza treatment
ID	GeneRatio	BgRatio	pvalue	p.adjust	qvalue	Count	es
HALLMARK_MYC_TARGETS_V2	31/1672	57/13164	0.0000000	0.0000000	0.0000000	31	4.281919
HALLMARK_E2F_TARGETS	101/1672	198/13164	0.0000000	0.0000000	0.0000000	101	4.016130
HALLMARK_G2M_CHECKPOINT	97/1672	191/13164	0.0000000	0.0000000	0.0000000	97	3.998434
HALLMARK_MYC_TARGETS_V1	90/1672	195/13164	0.0000000	0.0000000	0.0000000	90	3.633787
HALLMARK_SPERMATOGENESIS	21/1672	72/13164	0.0001631	0.0011183	0.0009564	21	2.296352
HALLMARK_MITOTIC_SPINDLE	50/1672	188/13164	0.0000002	0.0000018	0.0000015	50	2.093938

topup2 <- rev(head(orafix_up$es,10))
names(topup2) <- rev(head(orafix_up$ID,10))

Now with the downregulated genes

n_pw=10

orafix_dn <- as.data.frame(enricher(gene = defdn ,
  universe = bg,  maxGSSize = 5000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

orafix_dn$geneID <- NULL
orafix_dn <- subset(orafix_dn,p.adjust<0.05 & Count >=10)
orafix_dns <- rownames(orafix_dn)

gr <- as.numeric(sapply(strsplit(orafix_dn$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_dn$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(orafix_dn$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_dn$BgRatio,"/"),"[[",2))

orafix_dn$es <- gr/br
orafix_dn <- orafix_dn[order(-orafix_dn$es),]
orafix_dn$Description=NULL

head(orafix_dn,n_pw) %>%
  kbl(row.names = FALSE, caption="Top downregulated pathways in Aza treatment") %>%
  kable_paper("hover", full_width = F)

Top downregulated pathways in Aza treatment
ID	GeneRatio	BgRatio	pvalue	p.adjust	qvalue	Count	es
HALLMARK_INTERFERON_ALPHA_RESPONSE	67/1926	86/13164	0.0e+00	0.0e+00	0.0e+00	67	5.324857
HALLMARK_INTERFERON_GAMMA_RESPONSE	110/1926	165/13164	0.0e+00	0.0e+00	0.0e+00	110	4.556594
HALLMARK_ANGIOGENESIS	11/1926	17/13164	3.4e-06	8.9e-06	3.5e-06	11	4.422576
HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION	56/1926	108/13164	0.0e+00	0.0e+00	0.0e+00	56	3.544017
HALLMARK_INFLAMMATORY_RESPONSE	61/1926	125/13164	0.0e+00	0.0e+00	0.0e+00	61	3.335427
HALLMARK_IL6_JAK_STAT3_SIGNALING	29/1926	63/13164	0.0e+00	0.0e+00	0.0e+00	29	3.146220
HALLMARK_ALLOGRAFT_REJECTION	55/1926	121/13164	0.0e+00	0.0e+00	0.0e+00	55	3.106769
HALLMARK_TNFA_SIGNALING_VIA_NFKB	70/1926	155/13164	0.0e+00	0.0e+00	0.0e+00	70	3.086725
HALLMARK_KRAS_SIGNALING_UP	49/1926	111/13164	0.0e+00	0.0e+00	0.0e+00	49	3.017204
HALLMARK_COAGULATION	31/1926	72/13164	0.0e+00	0.0e+00	0.0e+00	31	2.942800

topdn2 <- head(orafix_dn$es,n_pw)
names(topdn2) <- head(orafix_dn$ID,n_pw)

Make a barplot

par(mar=c(5,20,5,1))

cols <- c(rep("blue",n_pw),rep("red",n_pw))

barplot(c(topdn2,topup2),
  horiz=TRUE,las=1,cex.names=0.65,col=cols,
  main="top DE MSigDB Hallmark",
  xlab="ES",
  cex.main=0.9)

mtext("ORA fix test")

Enrichment analysis with functional class scoring

Reporting criteria	Method/resource used
Origin of gene sets	MSigDB Hallmark (2023-06-16)
Tool used	FGSEA (check version at foot of report)
Statistical test used	Kolmogorov-Smirnov test
P-value correction for multiple comparisons	FDR method
Background list	Genes with >=10 reads per sample on average across all samples
Gene Selection Criteria	DESeq2 FDR<0.05
ORA directionality	Separate tests for up- and down-regulation
Data availability	via DEE2 at accession SRP038101 (human)
Other parameters	Min gene set size of 10

def2 <- def
def2$genename <- sapply(strsplit(rownames(def2)," "),"[[",2)
def2 <- def2[,c("stat","genename")]
def2 <- aggregate(. ~ genename, def2, sum)
stat <- def2$stat
names(stat) <- def2$genename
fgseaRes <- fgsea(pathways=gsets, stats=stat, minSize=10, nPermSimple = 10000)
fgseaRes <- fgseaRes[order(fgseaRes$pval),]

fgsea_up <- subset(fgseaRes,padj<0.05 & ES>0)
fgsea_ups <- fgsea_up$pathway
fgsea_dn <- subset(fgseaRes,padj<0.05 & ES<0)
fgsea_dns <- fgsea_dn$pathway

fgsea_up <-  data.frame(fgsea_up[order(-fgsea_up$ES),])

fgsea_dn <-  data.frame(fgsea_dn[order(fgsea_dn$ES),])

head(fgsea_up,n_pw) %>%
  kbl(row.names = FALSE, caption="FGSEA:Top upregulated pathways in Aza treatment") %>%
  kable_paper("hover", full_width = F)

FGSEA:Top upregulated pathways in Aza treatment
pathway	pval	padj	log2err	ES	NES	size	leadingEdge
HALLMARK_MYC_TARGETS_V2	0.0000000	0.0000000	0.8870750	0.6903076	2.669290	57	NOLC1 , HSPD1 , MRTO4 , MYC , NPM1 , PLK4 , PA2G4 , SUPV3L1 , TCOF1 , UTP20 , TFB2M , MCM4 , SORD , WDR43 , IMP4 , NOP16 , DDX18 , MYBBP1A , TMEM97 , PUS1 , PRMT3 , GNL3 , NIP7 , NOP56 , LAS1L , PPRC1 , PLK1 , CDK4 , SLC19A1 , PES1 , UNG , MPHOSPH10, SRM , CBX3
HALLMARK_E2F_TARGETS	0.0000000	0.0000000	1.5763736	0.6737511	3.189024	198	MKI67 , TRIP13 , PRKDC , CDC25A , RRM2 , BUB1B , NOLC1 , KPNA2 , MAD2L1 , RFC3 , GINS1 , CIT , MELK , MCM6 , ATAD2 , CSE1L , GSPT1 , CDKN2C , CNOT9 , SMC4 , PNN , TOP2A , NUP153 , MYBL2 , RAD1 , ORC6 , MYC , SRSF2 , BRCA1 , CTCF , TUBB , PLK4 , CDK1 , CDC20 , PA2G4 , GINS4 , RAD51AP1, DDX39A , SYNCRIP , POLE , CTPS1 , RANBP1 , PCNA , DIAPH3 , BIRC5 , MSH2 , SRSF1 , CENPE , MCM4 , CBX5 , PAICS , UBE2T , MMS22L , SMC3 , TACC3 , ZW10 , SMC1A , KIF18B , XPO1 , EXOSC8 , CDCA8 , HNRNPD , IPO7 , BRCA2 , POP7 , NUP205 , HMMR , HMGB2 , CCNE1 , TP53 , RFC1 , NOP56 , PSIP1 , CCNB2 , RBBP7 , SPC25 , PMS2 , CCP110 , NUDT21 , CKS2 , SHMT1 , PLK1 , ESPL1 , CDK4 , DUT , LYAR , STAG1 , STMN1 , NUP107 , AURKA , BARD1 , POLD3 , PDS5B , NCAPD2 , DSCC1 , RAN , RFC2 , DEPDC1 , HELLS , UNG , SSRP1 , TMPO , CENPM , EIF2S1 , MRE11 , ORC2 , RPA1 , ANP32E , SPC24 , CHEK1 , USP1 , LIG1 , AURKB , TK1 , SPAG5 , TRA2B , TCF19 , WEE1 , KIF22 , RAD51C , XRCC6 , TIPIN , E2F8 , HMGB3 , KIF4A , SLBP , RACGAP1 , RPA3 , GINS3 , CDCA3 , PRIM2 , NAP1L1 , DEK , PPP1R8
HALLMARK_MYC_TARGETS_V1	0.0000000	0.0000000	1.5697917	0.6713794	3.167520	195	CCNA2 , RRM1 , TFDP1 , ODC1 , NOLC1 , KPNA2 , MAD2L1 , HSPD1 , MCM6 , TCP1 , DDX21 , KPNB1 , GSPT1 , CAD , HSP90AB1 , HNRNPA3 , SRPK1 , IFRD1 , ETF1 , PSMD1 , MYC , SRSF2 , NPM1 , POLE3 , NCBP1 , CDC20 , PA2G4 , SYNCRIP , PGK1 , CTPS1 , RANBP1 , PCNA , CDC45 , EIF2S2 , SRSF3 , SET , ERH , SRSF1 , PABPC4 , MCM4 , SRSF7 , TARDBP , HNRNPA1 , DHX15 , HPRT1 , ABCE1 , CCT5 , G3BP1 , XPO1 , GOT2 , VDAC1 , HNRNPD , SERBP1 , TYMS , NOP16 , PSMB2 , DDX18 , GNL3 , PWP1 , UBA2 , NOP56 , XPOT , RAD23B , SF3A1 , BUB3 , SNRPD3 , UBE2L3 , RPL6 , PPIA , PTGES3 , VDAC3 , CDK4 , DUT , CCT2 , YWHAQ , EIF4E , CDK2 , SNRPA , ILF2 , SMARCC1 , RAN , EIF3J , PHB2 , EIF3B , TUFM , EIF1AX , EXOSC7 , CCT7 , CCT4 , EEF1B2 , NCBP2 , EIF2S1 , PSMD14 , SSB , ORC2 , ACP1 , SRM , SF3B3 , RNPS1 , CBX3 , USP1 , PSMC6 , TRA2B , SNRPD1 , HNRNPA2B1, XRCC6 , RPL14 , PSMD3 , SNRPB2 , SLC25A3 , FAM120A , CLNS1A , TXNL4A , C1QBP , CCT3 , SNRPG , CSTF2 , RUVBL2 , NAP1L1 , DEK , PRPF31 , GLO1 , YWHAE , PPM1G , TRIM28 , RSL1D1 , EIF4H , TOMM70
HALLMARK_G2M_CHECKPOINT	0.0000000	0.0000000	1.3267161	0.6180244	2.913002	191	MKI67 , CCNA2 , CENPF , TFDP1 , CDC25A , ODC1 , NOLC1 , KPNA2 , MAD2L1 , KIF23 , TNPO2 , MCM6 , NCL , KPNB1 , CDC6 , POLQ , GSPT1 , PBK , CDKN2C , SMC4 , KIF11 , TOP2A , KIF20B , FBXO5 , AMD1 , MEIS1 , NUSAP1 , MYBL2 , ORC6 , PRC1 , TPX2 , MYC , SRSF2 , CTCF , LIG3 , PLK4 , HMGN2 , RAD54L , CDK1 , KIF15 , CDC20 , SLC7A1 , RBL1 , DDX39A , DKC1 , CUL4A , SYNCRIP , POLE , NDC80 , PRPF4B , SMC2 , CCNT1 , CDC27 , CDC45 , KIF5B , BIRC5 , SRSF1 , CENPE , NUP50 , UPF1 , DR1 , TACC3 , SMC1A , WRN , G3BP1 , XPO1 , HNRNPD , CENPA , BRCA2 , CCNF , HMMR , SRSF10 , EXO1 , RAD23B , CCNB2 , BUB3 , CASP8AP2, KNL1 , CKS2 , CUL5 , PLK1 , ESPL1 , CDK4 , SLC7A5 , TTK , STAG1 , STMN1 , TROAP , AURKA , BARD1 , TOP1 , NUP98 , NSD2 , PDS5B , SMARCC1 , BUB1 , INCENP , TMPO , DBF4 , CHAF1A , CHEK1 , AURKB , ODF2 , TRA2B , DTYMK , SNRPD1 , E2F3 , NEK2 , KIF22 , HMGB3 , RASAL2 , KIF4A , STIL , E2F1 , ATRX
HALLMARK_SPERMATOGENESIS	0.0001349	0.0003158	0.5188481	0.4653055	1.886053	72	SPATA6 , CSNK2A2, SLC2A5 , CDK1 , TSN , TOPBP1 , PSMG1 , MLLT10 , NEFH , NCAPH , CCNB2 , VDAC3 , TTK , PEBP1 , PRKAR2A, AURKA , JAM3 , MAP7 , RAD17 , AGFG1 , BUB1 , IFT88 , DBF4 , LDHC , NF2 , MTOR , NEK2 , CLPB
HALLMARK_MITOTIC_SPINDLE	0.0004600	0.0008847	0.4984931	0.3344765	1.574660	188	CENPF , TBCD , KIF23 , EPB41L2 , ANLN , KNTC1 , CDC42EP1, SMC4 , KIF11 , TOP2A , KIF20B , FBXO5 , NUSAP1 , NET1 , PRC1 , TPX2 , CDK1 , KIF15 , NDC80 , NF1 , ALMS1 , ECT2 , CDC27 , KIF5B , BIRC5 , CENPE , SMC3 , SMC1A , PCNT , BRCA2 , FLNB , TUBGCP3 , RFC1 , LRPPRC , CCNB2 , CKAP5 , EZR , DYNC1H1 , TUBGCP2 , PLK1 , ESPL1 , TTK , AURKA , SASS6 , ARFGEF1 , DST , BUB1 , PPP4R2 , GEMIN4 , INCENP , CEP57

head(fgsea_dn,n_pw) %>%
  kbl(row.names = FALSE, caption="FGSEA:Top downregulated pathways in Aza treatment") %>%
  kable_paper("hover", full_width = F)

FGSEA:Top downregulated pathways in Aza treatment
pathway	pval	padj	log2err	ES	NES	size	leadingEdge
HALLMARK_INTERFERON_ALPHA_RESPONSE	0.0000000	0.0000000	1.5161076	-0.8685815	-3.016285	86	IFI27 , MX1 , IFI44 , HLA-C , OAS1 , IFITM3 , B2M , IFIT3 , PARP9 , TAP1 , SAMD9L , DDX60 , LGALS3BP, IFI44L , UBE2L6 , CMPK2 , PARP14 , PSMB9 , ISG15 , IRF7 , STAT2 , SP110 , HELZ2 , EIF2AK2 , LY6E , PARP12 , IFIH1 , IFIT2 , PLSCR1 , TXNIP , EPSTI1 , SAMD9 , IFI35 , BST2 , HERC6 , IFITM1 , RSAD2 , GBP2 , TRIM25 , NCOA7 , UBA7 , USP18 , ADAR , TRIM14 , LAP3 , DHX58 , OASL , TRIM5 , CNP , PSMB8 , PSME1 , TRIM21 , CASP1 , CMTR1
HALLMARK_INTERFERON_GAMMA_RESPONSE	0.0000000	0.0000000	1.7683043	-0.8145238	-3.062366	165	IFI27 , MX1 , OAS3 , HLA-B , IFI44 , CD86 , OAS2 , IFIT1 , IFITM3 , B2M , IFIT3 , STAT1 , TAP1 , SAMD9L , DDX60 , LGALS3BP, IFI44L , UBE2L6 , CMPK2 , PARP14 , PSMB9 , HLA-A , ISG15 , IRF7 , STAT2 , SP110 , HELZ2 , EIF2AK2 , LY6E , MYD88 , PARP12 , IFIH1 , IFIT2 , PLSCR1 , ITGB7 , TXNIP , EPSTI1 , IFI35 , TOR1B , BST2 , HERC6 , IRF5 , RSAD2 , TRIM25 , XAF1 , USP18 , ADAR , TRIM14 , CDKN1A , LAP3 , MX2 , ARID5B , CFH , DHX58 , BANK1 , OASL , RBCK1 , TNFSF10 , MVP , BTG1 , FGL2 , PSMB8 , RNF213 , APOL6 , PSME1 , PTGS2 , SAMHD1 , TRIM21 , CASP1 , PML , CMTR1 , IL10RA , JAK2 , HLA-G , CASP4 , TAPBP , CXCL11 , NLRC5 , ST8SIA4 , ZNFX1 , GCH1 , PTPN6 , PFKP , ISG20 , PDE4B , IL18BP , TNFAIP3 , METTL7B , EIF4E3 , MT2A , STAT3 , IFITM2 , PLA2G4A , GBP4 , PSME2 , GPR18 , NAMPT , CASP7 , TRIM26 , GZMA , CCL2 , STAT4 , NFKBIA , NMI
HALLMARK_ANGIOGENESIS	0.0015261	0.0027252	0.4550599	-0.7464645	-1.904687	17	VCAN , THBD , FGFR1, LPL , PDGFA, SPP1 , JAG1 , JAG2 , STC1 , NRP1 , TIMP1
HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION	0.0000000	0.0000000	1.0959293	-0.7353253	-2.632093	108	TGFBI , COL1A2 , IGFBP3 , VCAN , TPM2 , GJA1 , JUN , ID2 , ENO2 , TPM4 , ANPEP , MMP2 , GLIPR1 , CAPG , SLC6A8 , LRP1 , MMP1 , TIMP3 , QSOX1 , SPOCK1 , EMP3 , PCOLCE , LGALS1 , FUCA1 , PFN2 , CDH2 , PLOD1 , MYL9 , DAB2 , FLNA , MEST , SPP1 , ACTA2 , FERMT2 , SPARC , IL32 , EDIL3 , SGCB , CTHRC1 , PDLIM4 , PLAUR , GPX7 , FN1 , ITGA5 , TNFAIP3 , COL8A2 , INHBA , EFEMP2 , GAS1 , SDC1 , CD59 , SERPINH1, VEGFC , TGFB1
HALLMARK_COAGULATION	0.0000000	0.0000000	0.7477397	-0.6721676	-2.270109	72	ANXA1 , MMP2 , LRP1 , GSN , C3 , MMP1 , TIMP3 , THBD , CFH , CAPN2 , CTSB , ADAM9 , MMP15 , FURIN , PLAU , SPARC , WDR1 , CTSH , DUSP6 , PECAM1 , CTSO , FN1 , FYN , SERPINA1, KLF7 , PLEK , A2M , SERPINB2, F3 , TIMP1 , SIRT2 , MMP14 , CTSK , CAPN5 , F8 , RAPGEF3 , F12 , SERPING1
HALLMARK_IL6_JAK_STAT3_SIGNALING	0.0000001	0.0000003	0.7049757	-0.6592597	-2.182204	63	STAT1 , STAT2 , JUN , MYD88 , IL17RA , CD36 , HMOX1 , ACVRL1 , TNFRSF1B, PDGFC , IFNGR2 , OSMR , CXCL11 , TYK2 , CD14 , PIK3R5 , CSF1 , TNF , IFNGR1 , IL10RB , TLR2 , STAT3 , LEPR , TGFB1 , A2M , CSF3R , IRF1 , IL4R
HALLMARK_INFLAMMATORY_RESPONSE	0.0000000	0.0000000	0.9101197	-0.6470366	-2.357426	125	IRF7 , EIF2AK2 , LY6E , AXL , PTGER4 , BST2 , IFITM1 , CDKN1A , STAB1 , EMP3 , NOD2 , PTPRE , CLEC5A , TNFSF10 , IL7R , RGS16 , BTG2 , PTAFR , P2RY2 , GPR183 , CD70 , CYBB , IL10RA , CHST2 , TNFRSF1B, RHOG , GPC3 , SPHK1 , IFNGR2 , LPAR1 , OSMR , TAPBP , CXCL11 , PTGER2 , MEFV , GCH1 , PLAUR , CD14 , CD82 , PIK3R5 , CSF1 , ITGA5 , PDE4B , TLR2 , INHBA , GNA15 , IL18 , NAMPT , ATP2C1 , CCL2 , TNFSF9 , NFKBIA , NMI , IRAK2 , CSF3R , LCP2 , IRF1 , F3 , PSEN1 , IL4R , TIMP1
HALLMARK_NOTCH_SIGNALING	0.0030251	0.0052157	0.4317077	-0.6361774	-1.811228	28	DTX4 , NOTCH1 , CCND1 , FZD1 , DLL1 , CUL1 , JAG1 , ST3GAL6, MAML2 , PPARD , FZD7 , SAP30 , DTX2 , NOTCH3 , LFNG , KAT2A , TCF7L2
HALLMARK_ALLOGRAFT_REJECTION	0.0000000	0.0000000	0.8634154	-0.6334832	-2.300022	121	CD86 , B2M , STAT1 , TAP1 , HLA-A , IRF7 , FCGR2B , LY86 , CDKN2A , CAPG , GBP2 , ITGAL , HLA-E , CTSS , STAB1 , FGR , CCR2 , TLR6 , IL16 , FLNA , HDAC9 , JAK2 , CCND3 , APBB1 , HLA-G , NCF4 , IFNGR2 , ITGB2 , TAPBP , ST8SIA4, PTPN6 , CSF1 , CD1D , TNF , ETS1 , IFNGR1 , TLR2 , INHBA , GCNT1 , IL18 , SPI1 , GZMA , ELF4 , CCL2 , STAT4 , F2R , TGFB1 , TAP2 , PRKCB , BCL3 , LCP2 , IL4R , TIMP1 , HCLS1 , CCR1 , LYN , CD47 , FYB1 , AKT1
HALLMARK_TNFA_SIGNALING_VIA_NFKB	0.0000000	0.0000000	0.9101197	-0.6183389	-2.311022	155	TAP1 , JUN , ID2 , IFIH1 , IFIT2 , PTGER4 , BCL6 , CCND1 , MARCKS , SERPINB8, CDKN1A , PLK2 , FOSL2 , PTPRE , TNFAIP8 , BTG1 , IL7R , PHLDA1 , SQSTM1 , DUSP4 , STAT5A , BTG2 , IER5 , TGIF1 , NINJ1 , PTGS2 , GPR183 , ZFP36 , PLAU , LITAF , JUNB , JAG1 , TIPARP , SPHK1 , IFNGR2 , SLC2A6 , CXCL11 , TNIP1 , NFKB2 , BHLHE40 , FOS , CEBPB , GCH1 , PLAUR , TRAF1 , KLF10 , CSF1 , TNF , PDE4B , TNFAIP3 , TLR2 , KLF4 , SGK1 , INHBA , PNRC1 , IL18 , NAMPT , CCL2 , TNFSF9 , NFKBIA , PLEK , PANX1 , SERPINB2, PPP1R15A, BCL3 , KDM6B , DRAM1 , IRF1 , F3 , CLCF1

fgsea_up <- head(fgsea_up,n_pw)
fgsea_up_vec <- fgsea_up$ES
names(fgsea_up_vec) <- fgsea_up$pathway
fgsea_up_vec <- rev(fgsea_up_vec)

fgsea_dn <- head(fgsea_dn,n_pw)
fgsea_dn_vec <- fgsea_dn$ES * -1
names(fgsea_dn_vec) <- fgsea_dn$pathway

Make a barplot

par(mar=c(5,20,5,1))

cols <- c(rep("blue",n_pw),rep("red",n_pw))

barplot(c(fgsea_dn_vec,fgsea_up_vec),
  horiz=TRUE,las=1,cex.names=0.65,col=cols,
  main="top DE MSigDB Hallmark",
  xlab="ES",
  cex.main=0.9)

mtext("FCS test")

Compare pathway results

v1 <- list("ORA up"=ora_ups, "ORA fix up"=orafix_ups, "FCS up"=fgsea_ups)
v2 <- list("ORA dn"=ora_dns, "ORA fix dn"=orafix_dns, "FCS dn"=fgsea_dns)
v3 <- list("ORA"=union(ora_ups,ora_dns),
  "ORA fix"=union(orafix_ups,orafix_dns),
  "FCS"=union(fgsea_ups,fgsea_dns))

par(mar=c(10,10,10,10))
par(mfrow=c(2,1))
plot(euler(v1),quantities = list(cex = 2), labels = list(cex = 2),main="upregulated genes")

plot(euler(v2),quantities = list(cex = 2), labels = list(cex = 2),main="downregulated genes")

plot(euler(v3),quantities = list(cex = 2), labels = list(cex = 2),main="up- and down-regulated genes")

Jaccard index comparing ORA and FCS.

jaccard <- function(a, b) {
    intersection = length(intersect(a, b))
    union = length(a) + length(b) - intersection
    return (intersection/union)
}

ora <- c(ora_ups,ora_dns)
fcs <- c(fgsea_ups,fgsea_dns)

jaccard(ora,fcs)

## [1] 0.7428571

Jaccard index comparing ORA and ORA fix.

ora <- c(ora_ups,ora_dns)
orafix <- c(orafix_ups,orafix_dns)

jaccard(ora,orafix)

## [1] 0.6923077

Jaccard index comparing ORA fix and FCS.

orafix <- c(orafix_ups,orafix_dns)
fcs <- c(fgsea_ups,fgsea_dns)

jaccard(orafix,fcs)

## [1] 0.8717949

Drill in to a specific pathway

In this case the top upregulated and downregulated pathways

upname <- head(fgsea_up,1)$pathway
plotEnrichment(gsets[[upname]], stat, gseaParam = 1, ticksSize = 0.2)

dnname <- head(fgsea_dn,1)$pathway
plotEnrichment(gsets[[dnname]], stat, gseaParam = 1, ticksSize = 0.2)

Now a heatmap of these gene sets.

# reads per million normalisation
rpm <- apply(mxf,2, function(x) { x/sum(x) * 1000000} )
colnames(rpm) <- c("UNTR1","UNTR2","UNTR3","TRT1","TRT2","TRT3")
gnames_up <- gsets[[which(names(gsets) == upname)]]
gnames_dn <- gsets[[which(names(gsets) == dnname)]]
gene_ids <- rownames(mxf)
gene_names <- sapply(strsplit(gene_ids," "),"[[",2)
rpm_up <- rpm[which(gene_names %in% gnames_up),]
rownames(rpm_up) <- sapply(strsplit(rownames(rpm_up)," "),"[[",2)
rpm_dn <- rpm[which(gene_names %in% gnames_dn),]
rownames(rpm_dn) <- sapply(strsplit(rownames(rpm_dn)," "),"[[",2)
colsidecols <- c("blue","blue","blue","red","red","red")
heatmap.2(rpm_up,scale="row",trace="none",margins=c(10,15),main=upname,ColSideColors=colsidecols)

heatmap.2(rpm_dn,scale="row",trace="none",margins=c(10,15),main=dnname,ColSideColors=colsidecols)

Session information

For reproducibility

Click HERE to show session info

sessionInfo()

## R version 4.3.0 (2023-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Australia/Melbourne
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] gplots_3.1.3                eulerr_7.0.0               
##  [3] fgsea_1.22.0                clusterProfiler_4.4.4      
##  [5] kableExtra_1.3.4            DESeq2_1.36.0              
##  [7] SummarizedExperiment_1.26.1 Biobase_2.56.0             
##  [9] MatrixGenerics_1.8.1        matrixStats_0.63.0         
## [11] GenomicRanges_1.48.0        GenomeInfoDb_1.32.2        
## [13] IRanges_2.30.0              S4Vectors_0.34.0           
## [15] BiocGenerics_0.42.0         getDEE2_1.6.0              
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3     rstudioapi_0.14        jsonlite_1.8.4        
##   [4] magrittr_2.0.3         farver_2.1.1           rmarkdown_2.21        
##   [7] zlibbioc_1.42.0        vctrs_0.6.2            memoise_2.0.1         
##  [10] RCurl_1.98-1.12        ggtree_3.4.1           webshot_0.5.4         
##  [13] htmltools_0.5.5        gridGraphics_0.5-1     sass_0.4.5            
##  [16] KernSmooth_2.23-20     bslib_0.4.2            plyr_1.8.8            
##  [19] cachem_1.0.7           igraph_1.4.2           lifecycle_1.0.3       
##  [22] pkgconfig_2.0.3        Matrix_1.5-4           R6_2.5.1              
##  [25] fastmap_1.1.1          GenomeInfoDbData_1.2.8 digest_0.6.31         
##  [28] aplot_0.1.10           enrichplot_1.16.1      colorspace_2.1-0      
##  [31] patchwork_1.1.2        AnnotationDbi_1.58.0   geneplotter_1.74.0    
##  [34] RSQLite_2.3.1          labeling_0.4.2         fansi_1.0.4           
##  [37] httr_1.4.5             polyclip_1.10-4        compiler_4.3.0        
##  [40] bit64_4.0.5            withr_2.5.0            downloader_0.4        
##  [43] BiocParallel_1.30.3    viridis_0.6.2          DBI_1.1.3             
##  [46] highr_0.10             ggforce_0.4.1          MASS_7.3-59           
##  [49] DelayedArray_0.22.0    caTools_1.18.2         gtools_3.9.4          
##  [52] tools_4.3.0            DO.db_2.9              scatterpie_0.1.9      
##  [55] ape_5.7-1              glue_1.6.2             nlme_3.1-162          
##  [58] GOSemSim_2.22.0        polylabelr_0.2.0       shadowtext_0.1.2      
##  [61] grid_4.3.0             reshape2_1.4.4         generics_0.1.3        
##  [64] gtable_0.3.3           tidyr_1.3.0            data.table_1.14.8     
##  [67] tidygraph_1.2.3        xml2_1.3.4             utf8_1.2.3            
##  [70] XVector_0.36.0         ggrepel_0.9.3          pillar_1.9.0          
##  [73] stringr_1.5.0          yulab.utils_0.0.6      genefilter_1.78.0     
##  [76] splines_4.3.0          dplyr_1.1.2            tweenr_2.0.2          
##  [79] treeio_1.20.1          lattice_0.21-8         survival_3.5-5        
##  [82] bit_4.0.5              annotate_1.74.0        tidyselect_1.2.0      
##  [85] GO.db_3.15.0           locfit_1.5-9.7         Biostrings_2.64.0     
##  [88] knitr_1.42             gridExtra_2.3          svglite_2.1.1         
##  [91] xfun_0.39              graphlayouts_0.8.4     stringi_1.7.12        
##  [94] lazyeval_0.2.2         ggfun_0.0.9            yaml_2.3.7            
##  [97] evaluate_0.20          codetools_0.2-19       ggraph_2.1.0          
## [100] tibble_3.2.1           qvalue_2.28.0          ggplotify_0.1.0       
## [103] cli_3.6.1              xtable_1.8-4           systemfonts_1.0.4     
## [106] munsell_0.5.0          jquerylib_0.1.4        Rcpp_1.0.10           
## [109] png_0.1-8              XML_3.99-0.14          parallel_4.3.0        
## [112] ggplot2_3.4.2          blob_1.2.4             DOSE_3.22.0           
## [115] htm2txt_2.2.2          bitops_1.0-7           tidytree_0.4.2        
## [118] viridisLite_0.4.1      scales_1.2.1           purrr_1.0.1           
## [121] crayon_1.5.2           rlang_1.1.1            fastmatch_1.1-3       
## [124] KEGGREST_1.36.3        rvest_1.0.3

Examining background gene lists in clusterProfiler

Mark Ziemann & Anusuiya Bora

2023-06-16

Introduction

Data quality control

Differential expression analysis

ORA with Clusterprofiler custom background with otherwise default analysis

ORA with Clusterprofiler with custom background and special gene set list to fix the potential bug

Enrichment analysis with functional class scoring

Compare pathway results

Drill in to a specific pathway

Session information