Source: https://github.com/markziemann/background

Introduction

This guide is a Rmarkdown script that conducts differential expression and enrichment analysis, which are very popular workflows for transcriptome data.

The goal of this work is to understand how ClusterProfiler manages the background list, as we have observed some weird behaviour. In particular we see that when we provide a custom background, those genes do not appear in the final analysis unless they are also included in the gene list.

In the code chunk below called libs, you can add and remove required R library dependancies. Check that the libraries listed here match the Dockerfile, otherwise you might get errors.

suppressPackageStartupMessages({
  library("getDEE2")
  library("DESeq2")
  library("kableExtra")
  library("clusterProfiler")
  library("fgsea")
  library("eulerr")
  library("gplots")
})

For this guide I will be using bulk RNA-seq data from a previous study, which is deposited at NCBI GEO and SRA under accession numbers: GSE55123 and SRP038101 (Lund et al, 2014). The experiment is designed to investigate the effect of Azacitidine treatment on AML3 cells.

The raw data have been processed by the DEE2 project, and the summary gene expression counts are available at the dee2.io website, and programmatically with the getDEE2 bioconductor package (Ziemann et al, 2019).

Alternatively, you could fetch data from another resource like NCBI GEO, Zenodo or from the host storage drive.

mdat <- getDEE2Metadata("hsapiens")

# get sample sheet
ss <- subset(mdat,SRP_accession=="SRP038101")

# fetch the whole set of RNA-seq data
x <- getDEE2("hsapiens",ss$SRR_accession , metadata=mdat, legacy=TRUE)
## For more information about DEE2 QC metrics, visit
##     https://github.com/markziemann/dee2/blob/master/qc/qc_metrics.md
mx <- x$GeneCounts
rownames(mx) <- paste(rownames(mx),x$GeneInfo$GeneSymbol)
dim(mx)
## [1] 58302     6
# aza no filtering
ss$trt <- grepl("Treated",ss$Experiment_title)

ss %>%
  kbl(caption="sample sheet for Aza treatment in AML3 cells") %>%
  kable_paper("hover", full_width = F)
sample sheet for Aza treatment in AML3 cells
SRR_accession QC_summary SRX_accession SRS_accession SRP_accession Experiment_title GEO_series trt
156267 SRR1171523 PASS SRX472607 SRS559064 SRP038101 GSM1329859: Untreated.1; Homo sapiens; RNA-Seq FALSE
156268 SRR1171524 WARN(3,4) SRX472608 SRS559066 SRP038101 GSM1329860: Untreated.2; Homo sapiens; RNA-Seq FALSE
156269 SRR1171525 WARN(3,4) SRX472609 SRS559065 SRP038101 GSM1329861: Untreated.3; Homo sapiens; RNA-Seq FALSE
156270 SRR1171526 WARN(3,4) SRX472610 SRS559068 SRP038101 GSM1329862: Treated.1; Homo sapiens; RNA-Seq TRUE
156271 SRR1171527 WARN(3,4) SRX472611 SRS559067 SRP038101 GSM1329863: Treated.2; Homo sapiens; RNA-Seq TRUE
156272 SRR1171528 WARN(3,4) SRX472612 SRS559069 SRP038101 GSM1329864: Treated.3; Homo sapiens; RNA-Seq TRUE

Data quality control

QC is important, even if you are using public transcriptome data. For RNA-seq it is a good idea to show the number of reads for each sample.

par(mar=c(5,7,5,1))
barplot(rev(colSums(mx)),horiz=TRUE,las=1,main="number of reads per sample in SRP038101")

Now make a MDS plot.

mds <- cmdscale(dist(t(mx)))

# expand plot area
XMIN=min(mds[,1])*1.3
XMAX=max(mds[,1])*1.3
YMIN=min(mds[,2])*1.3
YMAX=max(mds[,2])*1.3

cols <- as.character(grepl("Treated",ss$Experiment_title))
cols <- gsub("FALSE","lightblue",cols)
cols <- gsub("TRUE","pink",cols)
plot(mds, xlab="Coordinate 1", ylab="Coordinate 2",
  xlim=c(XMIN,XMAX),ylim=c(YMIN,YMAX),
  pch=19,cex=2,col=cols, main="MDS plot")
text(cmdscale(dist(t(mx))), labels=colnames(mx) )

Differential expression analysis

I will be using DESeq2 for DE analysis, the count matrix is prefiltered using a detection threshold of 10 reads per sample across all samples. All genes that meet the detection threshold will comprise the background list. The first 6 rows of the count matrix are shows.

mxf <- mx[which(rowMeans(mx)>=10),]
dim(mxf)
## [1] 13168     6
head(mxf,6) %>%
  kbl(caption="Count matrix format") %>%
  kable_paper("hover", full_width = F)
Count matrix format
SRR1171523 SRR1171524 SRR1171525 SRR1171526 SRR1171527 SRR1171528
ENSG00000225630 MTND2P28 494 396 340 333 415 418
ENSG00000237973 MTCO1P12 52 39 40 30 37 29
ENSG00000248527 MTATP6P1 853 544 537 582 702 716
ENSG00000228327 AL669831.1 17 13 21 21 22 12
ENSG00000228794 LINC01128 42 27 30 32 40 23
ENSG00000230699 AL645608.3 20 11 13 10 15 22
dds <- DESeqDataSetFromMatrix(countData = mxf , colData = ss, design = ~ trt )
res <- DESeq(dds)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <-cbind(as.data.frame(z),mxf)
def <-as.data.frame(zz[order(zz$pvalue),])

head(def,10) %>%
  kbl(caption="Top DE genes for Aza treatment") %>%
  kable_paper("hover", full_width = F)
Top DE genes for Aza treatment
baseMean log2FoldChange lfcSE stat pvalue padj SRR1171523 SRR1171524 SRR1171525 SRR1171526 SRR1171527 SRR1171528
ENSG00000165949 IFI27 1960.1970 -3.384492 0.0938869 -36.04861 0 0 4689 3583 2758 309 384 334
ENSG00000090382 LYZ 7596.0299 -1.650342 0.0561143 -29.41036 0 0 14212 10237 10789 3476 3764 3738
ENSG00000115461 IGFBP5 531.2217 -5.071157 0.1795239 -28.24781 0 0 1320 823 1025 23 26 43
ENSG00000157601 MX1 827.1511 -2.877795 0.1047823 -27.46450 0 0 1732 1501 1206 184 223 186
ENSG00000111331 OAS3 2127.2010 -2.661214 0.0972124 -27.37525 0 0 4204 3977 2972 562 614 560
ENSG00000070915 SLC12A3 424.5509 -3.374852 0.1298671 -25.98697 0 0 1012 721 653 63 85 76
ENSG00000234745 HLA-B 3197.0159 -1.431566 0.0604169 -23.69479 0 0 6085 4256 4023 1590 1872 1719
ENSG00000137965 IFI44 409.0957 -2.978581 0.1319352 -22.57608 0 0 829 740 635 76 111 89
ENSG00000204525 HLA-C 1631.6421 -1.461550 0.0660214 -22.13750 0 0 3112 2150 2106 791 923 891
ENSG00000110042 DTX4 524.1318 -2.470219 0.1173182 -21.05572 0 0 1166 883 688 166 168 145

Make a smear plot.

sigf <- subset(def,padj<=0.05)

DET=nrow(mxf)
NSIG=nrow(sigf)
NUP=nrow(subset(sigf,log2FoldChange>0))
NDN=nrow(subset(sigf,log2FoldChange<0))

HEADER=paste(DET,"detected genes;",NSIG,"w/FDR<0.05;",NUP,"up;",NDN,"down")

plot(log10(def$baseMean) ,def$log2FoldChange,
  cex=0.6,pch=19,col="darkgray",
  main="5-azacitidine treatment in AML3 cells",
  xlab="log10(basemean)",ylab="log2(fold change)")

points(log10(sigf$baseMean) ,sigf$log2FoldChange,
  cex=0.6,pch=19,col="red")

mtext(HEADER)

write.table(def,file="../../data/aza_rnaseq.tsv")

KEGG pathway analysis

In the next sections I will run enrichment analysis with over-representation analysis (ORA) test and compare it to functional class scoring. I will also investigate some strange behviour of the ORA tool clusterprofiler.

Get gene sets

Get the gene sets loaded in R. These are KEGG gene sets

# from https://www.gsea-msigdb.org/gsea/msigdb/human/collections.jsp 16th June 2023
genesets2 <- read.gmt("../../ref/c2.cp.kegg.v2023.1.Hs.symbols.gmt")
gsets <- gmtPathways("../../ref/c2.cp.kegg.v2023.1.Hs.symbols.gmt")

message(paste("number of genes described in the annotation set:",length(unique(genesets2$gene))))
## number of genes described in the annotation set: 5244

Now filter the gene names into three lists, up-regulated, down-regulated and background. Background is simply all genes that were detected.

defup <- rownames(subset(def,padj<=0.05 & log2FoldChange>0))
defup <- unique(sapply(strsplit(defup," "),"[[",2))

defdn <- rownames(subset(def,padj<=0.05 & log2FoldChange<0))
defdn <- unique(sapply(strsplit(defdn," "),"[[",2))

bg <- rownames(def)
bg <- unique(sapply(strsplit(bg," "),"[[",2))

message(paste("number of genes in background:",length(bg)))
## number of genes in background: 13164
writeLines(text=unique(defup),con="aza_up.txt")
writeLines(text=unique(defdn),con="aza_dn.txt")
writeLines(text=unique(bg),con="aza_bg.txt")

ORA with Clusterprofiler custom background with otherwise default analysis

I’ve compiled a reporting checklist:

Reporting criteria Method/resource used
Origin of gene sets KEGG (2023-06-16)
Tool used ClusterProfiler (check version at foot of report)
Statistical test used hypergeometric test
P-value correction for multiple comparisons FDR method
Background list Genes with >=10 reads per sample on average across all samples
Gene Selection Criteria DESeq2 FDR<0.05
ORA directionality Separate tests for up- and down-regulation
Data availability via DEE2 at accession SRP038101 (human)
Other parameters Min gene set size of 10

Here I provide a custom background gene list.

Enrichment firstly with upregulated genes

n_pw=10

ora_up <- as.data.frame(enricher(gene = defup ,
  universe = bg,  maxGSSize = 500000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

ora_up$geneID <- NULL
ora_up <- subset(ora_up,p.adjust<0.05 & Count >=5)
ora_ups <- rownames(ora_up)

gr <- as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",2))

ora_up$es <- gr/br
ora_up <- ora_up[order(-ora_up$es),]
ora_up$Description=NULL

head(ora_up) %>%
  kbl(row.names = FALSE, caption="Top upregulated pathways in Aza treatment") %>%
  kable_paper("hover", full_width = F)
Top upregulated pathways in Aza treatment
ID GeneRatio BgRatio pvalue p.adjust qvalue Count es
KEGG_MISMATCH_REPAIR 10/406 22/3134 0.0001811 0.0044696 0.0042686 10 3.508733
KEGG_DNA_REPLICATION 15/406 35/3134 0.0000104 0.0005222 0.0004987 15 3.308234
KEGG_CELL_CYCLE 44/406 115/3134 0.0000000 0.0000000 0.0000000 44 2.953438
KEGG_NUCLEOTIDE_EXCISION_REPAIR 16/406 42/3134 0.0000314 0.0009481 0.0009054 16 2.940652
KEGG_BASE_EXCISION_REPAIR 12/406 33/3134 0.0005182 0.0086936 0.0083027 12 2.806986
KEGG_RNA_POLYMERASE 9/406 26/3134 0.0038482 0.0484237 0.0462464 9 2.672035
topup2 <- rev(head(ora_up$es,10))
names(topup2) <- rev(head(ora_up$ID,10))

The reported background size does not match the dataset.

Now repeat with the downregulated genes

ora_dn <- as.data.frame(enricher(gene = defdn ,
  universe = bg,  maxGSSize = 500000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

ora_dn$geneID <- NULL
ora_dn <- subset(ora_dn,p.adjust<0.05 & Count >=5)
ora_dns <- rownames(ora_dn)

gr <- as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",2))

ora_dn$es <- gr/br
ora_dn <- ora_dn[order(-ora_dn$es),]
ora_dn$Description=NULL

head(ora_dn,n_pw) %>%
  kbl(row.names = FALSE, caption="Top downregulated pathways in Aza treatment") %>%
  kable_paper("hover", full_width = F)
Top downregulated pathways in Aza treatment
ID GeneRatio BgRatio pvalue p.adjust qvalue Count es
KEGG_ALLOGRAFT_REJECTION 8/608 11/3134 0.0001822 0.0030068 0.0023402 8 3.748804
KEGG_GRAFT_VERSUS_HOST_DISEASE 8/608 11/3134 0.0001822 0.0030068 0.0023402 8 3.748804
KEGG_AUTOIMMUNE_THYROID_DISEASE 7/608 10/3134 0.0006959 0.0073552 0.0057246 7 3.608224
KEGG_TYPE_I_DIABETES_MELLITUS 9/608 14/3134 0.0002893 0.0039782 0.0030963 9 3.313675
KEGG_OTHER_GLYCAN_DEGRADATION 7/608 13/3134 0.0057712 0.0380901 0.0296459 7 2.775557
KEGG_CELL_ADHESION_MOLECULES_CAMS 30/608 59/3134 0.0000000 0.0000037 0.0000028 30 2.620986
KEGG_HEMATOPOIETIC_CELL_LINEAGE 19/608 39/3134 0.0000311 0.0010268 0.0007992 19 2.511218
KEGG_COMPLEMENT_AND_COAGULATION_CASCADES 10/608 21/3134 0.0031211 0.0245228 0.0190863 10 2.454574
KEGG_LEISHMANIA_INFECTION 23/608 49/3134 0.0000100 0.0005478 0.0004264 23 2.419509
KEGG_SYSTEMIC_LUPUS_ERYTHEMATOSUS 9/608 20/3134 0.0079453 0.0485544 0.0377903 9 2.319572
topdn2 <- head(ora_dn$es,n_pw)
names(topdn2) <- head(ora_dn$ID,n_pw)

Make a barplot

par(mar=c(5,20,5,1))

cols <- c(rep("blue",n_pw),rep("red",n_pw))

top <- c(topdn2,topup2)

if ( length(top) > 1 ) {
  barplot(c(top),
    horiz=TRUE,las=1,cex.names=0.65,col=cols,
    main="top DE KEGGs",
    xlab="ES",
    cex.main=0.9)
  mtext("ORA test")
}

ORA with Clusterprofiler with custom background and special gene set list to fix the potential bug

I’ve compiled a reporting checklist:

Reporting criteria Method/resource used
Origin of gene sets KEGG (2023-06-16)
Tool used ClusterProfiler (check version at foot of report)
Statistical test used hypergeometric test
P-value correction for multiple comparisons FDR method
Background list Genes with >=10 reads per sample on average across all samples
Gene Selection Criteria DESeq2 FDR<0.05
ORA directionality Separate tests for up- and down-regulation
Data availability via DEE2 at accession SRP038101 (human)
Other parameters Min gene set size of 10 + BUG FIX

Here I provide a background gene list for cluterprofiler to use, like a user should.

I have also done some modification to the KEGG gene sets, to en

Now filter the gene names into three lists, up-regulated, down-regulated and background. Background is simply all genes that were detected.

defup <- rownames(subset(def,padj<=0.05 & log2FoldChange>0))
defup <- unique(sapply(strsplit(defup," "),"[[",2))

defdn <- rownames(subset(def,padj<=0.05 & log2FoldChange<0))
defdn <- unique(sapply(strsplit(defdn," "),"[[",2))

bg <- rownames(def)
bg <- unique(sapply(strsplit(bg," "),"[[",2))
message(paste("number of genes in background:",length(bg)))
## number of genes in background: 13164

Adding all detected genes to the background appears to improve results!

bgdf <- data.frame("background",bg)
colnames(bgdf) <- c("term","gene")
genesets2 <- rbind(genesets2,bgdf)

Enrichment firstly with upregulated genes

n_pw=10

orafix_up <- as.data.frame(enricher(gene = defup ,
  universe = bg,  maxGSSize = 500000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

orafix_up$geneID <- NULL
orafix_up <- subset(orafix_up,p.adjust<0.05 & Count >=5)
orafix_ups <- rownames(orafix_up)

gr <- as.numeric(sapply(strsplit(orafix_up$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_up$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(orafix_up$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_up$BgRatio,"/"),"[[",2))

orafix_up$es <- gr/br
orafix_up <- orafix_up[order(-orafix_up$es),]
orafix_up$Description=NULL

head(orafix_up) %>%
  kbl(row.names = FALSE, caption="Top upregulated pathways in Aza treatment") %>%
  kable_paper("hover", full_width = F)
Top upregulated pathways in Aza treatment
ID GeneRatio BgRatio pvalue p.adjust qvalue Count es
KEGG_MISMATCH_REPAIR 10/1672 22/13164 0.0001611 0.0039535 0.0037783 10 3.578730
KEGG_DNA_REPLICATION 15/1672 35/13164 0.0000091 0.0004610 0.0004406 15 3.374231
KEGG_CELL_CYCLE 44/1672 115/13164 0.0000000 0.0000000 0.0000000 44 3.012357
KEGG_NUCLEOTIDE_EXCISION_REPAIR 16/1672 42/13164 0.0000275 0.0008364 0.0007993 16 2.999316
KEGG_BASE_EXCISION_REPAIR 12/1672 33/13164 0.0004583 0.0077393 0.0073963 12 2.862984
KEGG_RNA_POLYMERASE 9/1672 26/13164 0.0034659 0.0439015 0.0419557 9 2.725340
topup2 <- rev(head(orafix_up$es,10))
names(topup2) <- rev(head(orafix_up$ID,10))

Now with the downregulated genes

orafix_dn <- as.data.frame(enricher(gene = defdn ,
  universe = bg,  maxGSSize = 500000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

orafix_dn$geneID <- NULL
orafix_dn <- subset(orafix_dn,p.adjust<0.05 & Count >=5)
orafix_dns <- rownames(orafix_dn)

gr <- as.numeric(sapply(strsplit(orafix_dn$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_dn$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(orafix_dn$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_dn$BgRatio,"/"),"[[",2))

orafix_dn$es <- gr/br
orafix_dn <- orafix_dn[order(-orafix_dn$es),]
orafix_dn$Description=NULL

head(orafix_dn,n_pw) %>%
  kbl(row.names = FALSE, caption="Top downregulated pathways in Aza treatment") %>%
  kable_paper("hover", full_width = F)
Top downregulated pathways in Aza treatment
ID GeneRatio BgRatio pvalue p.adjust qvalue Count es
KEGG_ALLOGRAFT_REJECTION 8/1926 11/13164 0.0000226 0.0002255 0.0001444 8 4.970830
KEGG_GRAFT_VERSUS_HOST_DISEASE 8/1926 11/13164 0.0000226 0.0002255 0.0001444 8 4.970830
KEGG_AUTOIMMUNE_THYROID_DISEASE 7/1926 10/13164 0.0001134 0.0008184 0.0005241 7 4.784424
KEGG_TYPE_I_DIABETES_MELLITUS 9/1926 14/13164 0.0000300 0.0002493 0.0001597 9 4.393859
KEGG_OTHER_GLYCAN_DEGRADATION 7/1926 13/13164 0.0010809 0.0058674 0.0037578 7 3.680326
KEGG_CELL_ADHESION_MOLECULES_CAMS 30/1926 59/13164 0.0000000 0.0000000 0.0000000 30 3.475368
KEGG_INTESTINAL_IMMUNE_NETWORK_FOR_IGA_PRODUCTION 7/1926 14/13164 0.0018899 0.0091194 0.0058406 7 3.417445
KEGG_HEMATOPOIETIC_CELL_LINEAGE 19/1926 39/13164 0.0000005 0.0000109 0.0000070 19 3.329819
KEGG_COMPLEMENT_AND_COAGULATION_CASCADES 10/1926 21/13164 0.0003293 0.0021863 0.0014002 10 3.254710
KEGG_LEISHMANIA_INFECTION 23/1926 49/13164 0.0000001 0.0000034 0.0000022 23 3.208214
topdn2 <- head(orafix_dn$es,n_pw)
names(topdn2) <- head(orafix_dn$ID,n_pw)

Make a barplot

par(mar=c(5,20,5,1))

cols <- c(rep("blue",n_pw),rep("red",n_pw))

top <- c(topdn2,topup2)

if ( length(top) > 1 ) {
  barplot(c(top),
    horiz=TRUE,las=1,cex.names=0.65,col=cols,
    main="top DE KEGGs",
    xlab="ES",
    cex.main=0.9)
  mtext("ORA fix")
}

Compare pathway results

v1 <- list("ORA up"=ora_ups, "ORA fix up"=orafix_ups)
v2 <- list("ORA dn"=ora_dns, "ORA fix dn"=orafix_dns)
v3 <- list("ORA"=union(ora_ups,ora_dns),
  "ORA fix"=union(orafix_ups,orafix_dns))

par(mar=c(10,10,10,10))
par(mfrow=c(2,1))
plot(euler(v1),quantities = list(cex = 2), labels = list(cex = 2),main="upregulated KEGG pathways")

plot(euler(v2),quantities = list(cex = 2), labels = list(cex = 2),main="downregulated KEGG pathways")

plot(euler(v3),quantities = list(cex = 2), labels = list(cex = 2),main="up- and down-regulated KEGG pathways")

svg("rna_kegg_euler.svg")
plot(euler(v3),quantities = list(cex = 2), labels = list(cex = 2),main="up- and down-regulated KEGG pathways")
dev.off()
## png 
##   2

Jaccard index comparing ORA and ORA fix.

jaccard <- function(a, b) {
    intersection = length(intersect(a, b))
    union = length(a) + length(b) - intersection
    return (intersection/union)
}

ora <- c(ora_ups,ora_dns)
orafix <- c(orafix_ups,orafix_dns)

jaccard(ora,orafix)
## [1] 0.5909091

List gene sets identified only in the “fixed” analysis.

ora <- union(ora_ups,ora_dns)

orafix <- union(orafix_ups,orafix_dns)

setdiff(orafix,ora)
##  [1] "KEGG_CALCIUM_SIGNALING_PATHWAY"                   
##  [2] "KEGG_GAP_JUNCTION"                                
##  [3] "KEGG_GNRH_SIGNALING_PATHWAY"                      
##  [4] "KEGG_GLYCOLYSIS_GLUCONEOGENESIS"                  
##  [5] "KEGG_FC_GAMMA_R_MEDIATED_PHAGOCYTOSIS"            
##  [6] "KEGG_ACUTE_MYELOID_LEUKEMIA"                      
##  [7] "KEGG_INTESTINAL_IMMUNE_NETWORK_FOR_IGA_PRODUCTION"
##  [8] "KEGG_TOLL_LIKE_RECEPTOR_SIGNALING_PATHWAY"        
##  [9] "KEGG_B_CELL_RECEPTOR_SIGNALING_PATHWAY"           
## [10] "KEGG_ENDOCYTOSIS"                                 
## [11] "KEGG_APOPTOSIS"                                   
## [12] "KEGG_GLUTATHIONE_METABOLISM"                      
## [13] "KEGG_TGF_BETA_SIGNALING_PATHWAY"                  
## [14] "KEGG_GLIOMA"                                      
## [15] "KEGG_NOTCH_SIGNALING_PATHWAY"                     
## [16] "KEGG_MELANOGENESIS"                               
## [17] "KEGG_GLYCOSAMINOGLYCAN_DEGRADATION"               
## [18] "KEGG_RIG_I_LIKE_RECEPTOR_SIGNALING_PATHWAY"       
## [19] "KEGG_AXON_GUIDANCE"                               
## [20] "KEGG_TYPE_II_DIABETES_MELLITUS"                   
## [21] "KEGG_OLFACTORY_TRANSDUCTION"                      
## [22] "KEGG_MELANOMA"                                    
## [23] "KEGG_CHRONIC_MYELOID_LEUKEMIA"                    
## [24] "KEGG_ARGININE_AND_PROLINE_METABOLISM"             
## [25] "KEGG_STEROID_BIOSYNTHESIS"                        
## [26] "KEGG_P53_SIGNALING_PATHWAY"                       
## [27] "KEGG_SPHINGOLIPID_METABOLISM"

Reactome Gene sets

ORA with Clusterprofiler custom background with otherwise default analysis

I’ve compiled a reporting checklist:

Reporting criteria Method/resource used
Origin of gene sets Reactome (2023-03-06)
Tool used ClusterProfiler (check version at foot of report)
Statistical test used hypergeometric test
P-value correction for multiple comparisons FDR method
Background list Genes with >=10 reads per sample on average across all samples
Gene Selection Criteria DESeq2 FDR<0.05
ORA directionality Separate tests for up- and down-regulation
Data availability via DEE2 at accession SRP038101 (human)
Other parameters Min gene set size of 10

Here I provide a custom background gene list.

Get the gene sets loaded in R. These are Reactome gene sets

genesets2 <- read.gmt("../../ref/ReactomePathways_2023-03-06.gmt")
gsets <- gmtPathways("../../ref/ReactomePathways_2023-03-06.gmt")

message(paste("number of genes described in the annotation set:",length(unique(genesets2$gene))))
## number of genes described in the annotation set: 11457

Enrichment firstly with upregulated genes

n_pw=10

ora_up <- as.data.frame(enricher(gene = defup ,
  universe = bg,  maxGSSize = 500000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

ora_up$geneID <- NULL
ora_up <- subset(ora_up,p.adjust<0.05 & Count >=5)
ora_ups <- rownames(ora_up)

gr <- as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",2))

ora_up$es <- gr/br
ora_up <- ora_up[order(-ora_up$es),]
ora_up$Description=NULL

head(ora_up) %>%
  kbl(row.names = FALSE, caption="Top upregulated pathways in Aza treatment") %>%
  kable_paper("hover", full_width = F)
Top upregulated pathways in Aza treatment
ID GeneRatio BgRatio pvalue p.adjust qvalue Count es
Condensation of Prometaphase Chromosomes 9/1050 11/6944 0.0000017 0.0000309 0.0000259 9 5.410909
Folding of actin by CCT/TriC 7/1050 10/6944 0.0001398 0.0016736 0.0014045 7 4.629333
Interactions of Rev with host cellular proteins 24/1050 36/6944 0.0000000 0.0000000 0.0000000 24 4.408889
Nuclear import of Rev protein 22/1050 33/6944 0.0000000 0.0000000 0.0000000 22 4.408889
Postmitotic nuclear pore complex (NPC) reformation 18/1050 27/6944 0.0000000 0.0000001 0.0000001 18 4.408889
Rev-mediated nuclear export of HIV RNA 22/1050 34/6944 0.0000000 0.0000000 0.0000000 22 4.279216
topup2 <- rev(head(ora_up$es,10))
names(topup2) <- rev(head(ora_up$ID,10))

Now repeat with the downregulated genes

ora_dn <- as.data.frame(enricher(gene = defdn ,
  universe = bg,  maxGSSize = 500000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

ora_dn$geneID <- NULL
ora_dn <- subset(ora_dn,p.adjust<0.05 & Count >=5)
ora_dns <- rownames(ora_dn)

gr <- as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",2))

ora_dn$es <- gr/br
ora_dn <- ora_dn[order(-ora_dn$es),]
ora_dn$Description=NULL

head(ora_dn,n_pw) %>%
  kbl(row.names = FALSE, caption="Top downregulated pathways in Aza treatment") %>%
  kable_paper("hover", full_width = F)
Top downregulated pathways in Aza treatment
ID GeneRatio BgRatio pvalue p.adjust qvalue Count es
Interleukin-20 family signaling 10/1178 12/6944 0.0000009 0.0000594 0.0000530 10 4.912281
Endosomal/Vacuolar pathway 9/1178 12/6944 0.0000152 0.0007214 0.0006439 9 4.421053
Interferon alpha/beta signaling 41/1178 55/6944 0.0000000 0.0000000 0.0000000 41 4.394258
Interferon gamma signaling 40/1178 63/6944 0.0000000 0.0000000 0.0000000 40 3.742690
Interleukin-10 signaling 17/1178 27/6944 0.0000001 0.0000104 0.0000093 17 3.711501
Transcriptional regulation of pluripotent stem cells 7/1178 12/6944 0.0014257 0.0337226 0.0301007 7 3.438597
IRAK4 deficiency (TLR2/4) 8/1178 14/6944 0.0007663 0.0229901 0.0205209 8 3.368421
Binding and Uptake of Ligands by Scavenger Receptors 9/1178 16/6944 0.0004129 0.0141085 0.0125932 9 3.315790
Other interleukin signaling 10/1178 18/6944 0.0002230 0.0080676 0.0072011 10 3.274854
Formation of the cornified envelope 8/1178 15/6944 0.0013992 0.0337226 0.0301007 8 3.143860
topdn2 <- head(ora_dn$es,n_pw)
names(topdn2) <- head(ora_dn$ID,n_pw)

Make a barplot

par(mar=c(5,20,5,1))

cols <- c(rep("blue",n_pw),rep("red",n_pw))

top <- c(topdn2,topup2)

if ( length(top) > 1 ) {
  barplot(c(top),
    horiz=TRUE,las=1,cex.names=0.65,col=cols,
    main="top DE Reactomes",
    xlab="ES",
    cex.main=0.9)
  mtext("ORA test")
}

ORA with Clusterprofiler with custom background and special gene set list to fix the potential bug

I’ve compiled a reporting checklist:

Reporting criteria Method/resource used
Origin of gene sets Reactome (2023-03-06)
Tool used ClusterProfiler (check version at foot of report)
Statistical test used hypergeometric test
P-value correction for multiple comparisons FDR method
Background list Genes with >=10 reads per sample on average across all samples
Gene Selection Criteria DESeq2 FDR<0.05
ORA directionality Separate tests for up- and down-regulation
Data availability via DEE2 at accession SRP038101 (human)
Other parameters Min gene set size of 10 + BUG FIX

Adding all detected genes to the background appears to improve results!

bgdf <- data.frame("background",bg)
colnames(bgdf) <- c("term","gene")
genesets2 <- rbind(genesets2,bgdf)

Enrichment firstly with upregulated genes

orafix_up <- as.data.frame(enricher(gene = defup ,
  universe = bg,  maxGSSize = 500000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

orafix_up$geneID <- NULL
orafix_up <- subset(orafix_up,p.adjust<0.05 & Count >=5)
orafix_ups <- rownames(orafix_up)

gr <- as.numeric(sapply(strsplit(orafix_up$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_up$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(orafix_up$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_up$BgRatio,"/"),"[[",2))

orafix_up$es <- gr/br
orafix_up <- orafix_up[order(-orafix_up$es),]
orafix_up$Description=NULL

head(orafix_up) %>%
  kbl(row.names = FALSE, caption="Top upregulated pathways in Aza treatment") %>%
  kable_paper("hover", full_width = F)
Top upregulated pathways in Aza treatment
ID GeneRatio BgRatio pvalue p.adjust qvalue Count es
Condensation of Prometaphase Chromosomes 9/1672 11/13164 4.00e-07 0.0000055 0.0000043 9 6.441714
Folding of actin by CCT/TriC 7/1672 10/13164 4.45e-05 0.0004779 0.0003726 7 5.511244
Interactions of Rev with host cellular proteins 24/1672 36/13164 0.00e+00 0.0000000 0.0000000 24 5.248804
Nuclear import of Rev protein 22/1672 33/13164 0.00e+00 0.0000000 0.0000000 22 5.248804
Postmitotic nuclear pore complex (NPC) reformation 18/1672 27/13164 0.00e+00 0.0000000 0.0000000 18 5.248804
Rev-mediated nuclear export of HIV RNA 22/1672 34/13164 0.00e+00 0.0000000 0.0000000 22 5.094427
topup2 <- rev(head(orafix_up$es,10))
names(topup2) <- rev(head(orafix_up$ID,10))

Now with the downregulated genes

n_pw=10

orafix_dn <- as.data.frame(enricher(gene = defdn ,
  universe = bg,  maxGSSize = 500000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

orafix_dn$geneID <- NULL
orafix_dn <- subset(orafix_dn,p.adjust<0.05 & Count >=5)
orafix_dns <- rownames(orafix_dn)

gr <- as.numeric(sapply(strsplit(orafix_dn$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_dn$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(orafix_dn$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(orafix_dn$BgRatio,"/"),"[[",2))

orafix_dn$es <- gr/br
orafix_dn <- orafix_dn[order(-orafix_dn$es),]
orafix_dn$Description=NULL

head(orafix_dn,n_pw) %>%
  kbl(row.names = FALSE, caption="Top downregulated pathways in Aza treatment") %>%
  kable_paper("hover", full_width = F)
Top downregulated pathways in Aza treatment
ID GeneRatio BgRatio pvalue p.adjust qvalue Count es
Interleukin-20 family signaling 10/1926 12/13164 0.0000002 0.0000108 0.0000089 10 5.695742
Endosomal/Vacuolar pathway 9/1926 12/13164 0.0000044 0.0001919 0.0001589 9 5.126168
Interferon alpha/beta signaling 41/1926 55/13164 0.0000000 0.0000000 0.0000000 41 5.095101
Interferon gamma signaling 40/1926 63/13164 0.0000000 0.0000000 0.0000000 40 4.339613
Interleukin-10 signaling 17/1926 27/13164 0.0000000 0.0000009 0.0000007 17 4.303450
TNFs bind their physiological receptors 6/1926 10/13164 0.0012021 0.0196365 0.0162539 6 4.100935
Transcriptional regulation of pluripotent stem cells 7/1926 12/13164 0.0005709 0.0111549 0.0092334 7 3.987020
IRAK4 deficiency (TLR2/4) 8/1926 14/13164 0.0002719 0.0066947 0.0055415 8 3.905652
Binding and Uptake of Ligands by Scavenger Receptors 9/1926 16/13164 0.0001299 0.0041000 0.0033937 9 3.844626
Other interleukin signaling 10/1926 18/13164 0.0000622 0.0021881 0.0018112 10 3.797162
topdn2 <- head(orafix_dn$es,n_pw)
names(topdn2) <- head(orafix_dn$ID,n_pw)

Make a barplot

par(mar=c(5,20,5,1))

cols <- c(rep("blue",n_pw),rep("red",n_pw))

top <- c(topdn2,topup2)

if ( length(top) > 1 ) {
  barplot(c(top),
    horiz=TRUE,las=1,cex.names=0.65,col=cols,
    main="top DE Reactomes",
    xlab="ES",
    cex.main=0.9)
  mtext("ORA fix")
}

ORA with FORA with custom background

Upregulated genes.

fres_up <- fora(pathways = gsets, genes = defup, universe = bg,
  minSize = 5, maxSize = 500000)

fora_up <- subset(fres_up,padj<0.05)$pathway

Downregulated genes.

fres_dn <- fora(pathways = gsets, genes = defdn, universe = bg,
  minSize = 5, maxSize = 500000)

fora_dn <- subset(fres_dn,padj<0.05)$pathway

Compare pathway results

Compare clusterprofiler approaches.

v1 <- list("ORA up"=ora_ups, "ORA fix up"=orafix_ups)
v2 <- list("ORA dn"=ora_dns, "ORA fix dn"=orafix_dns)
v3 <- list("ORA"=union(ora_ups,ora_dns),
  "ORA fix"=union(orafix_ups,orafix_dns))

par(mar=c(10,10,10,10))
par(mfrow=c(2,1))
plot(euler(v1),quantities = list(cex = 2), labels = list(cex = 2),main="upregulated Reactomes")

plot(euler(v2),quantities = list(cex = 2), labels = list(cex = 2),main="downregulated Reactomes")

plot(euler(v3),quantities = list(cex = 2), labels = list(cex = 2),main="up- and down-regulated Reactomes")

svg("rna_reactome_euler.svg")
plot(euler(v3),quantities = list(cex = 2), labels = list(cex = 2),main="up- and down-regulated Reactomes")
dev.off()
## png 
##   2

Jaccard index comparing ORA and ORA fix.

jaccard <- function(a, b) {
    intersection = length(intersect(a, b))
    union = length(a) + length(b) - intersection
    return (intersection/union)
}

ora <- c(ora_ups,ora_dns)
orafix <- c(orafix_ups,orafix_dns)

jaccard(ora,orafix)
## [1] 0.6104972

List gene sets identified only in the “fixed” analysis.

ora <- union(ora_ups,ora_dns)

orafix <- union(orafix_ups,orafix_dns)

setdiff(orafix,ora)
##   [1] "Signaling by Rho GTPases, Miro GTPases and RHOBTB3"                                   
##   [2] "RNA Polymerase II Transcription"                                                      
##   [3] "Post-translational protein modification"                                              
##   [4] "Signaling by Rho GTPases"                                                             
##   [5] "Metabolism of proteins"                                                               
##   [6] "rRNA processing"                                                                      
##   [7] "rRNA processing in the nucleus and cytosol"                                           
##   [8] "Signaling by Nuclear Receptors"                                                       
##   [9] "SARS-CoV-2-host interactions"                                                         
##  [10] "APC/C-mediated degradation of cell cycle proteins"                                    
##  [11] "Regulation of mitotic cell cycle"                                                     
##  [12] "Disease"                                                                              
##  [13] "SARS-CoV Infections"                                                                  
##  [14] "Major pathway of rRNA processing in the nucleolus and cytosol"                        
##  [15] "Anchoring of the basal body to the plasma membrane"                                   
##  [16] "Generic Transcription Pathway"                                                        
##  [17] "DNA Replication Pre-Initiation"                                                       
##  [18] "Regulation of APC/C activators between G1/S and early anaphase"                       
##  [19] "Translesion synthesis by Y family DNA polymerases bypasses lesions on DNA template"   
##  [20] "Chaperonin-mediated protein folding"                                                  
##  [21] "Positive epigenetic regulation of rRNA expression"                                    
##  [22] "Disorders of transmembrane transporters"                                              
##  [23] "Cellular responses to stress"                                                         
##  [24] "Protein folding"                                                                      
##  [25] "ESR-mediated signaling"                                                               
##  [26] "Cooperation of Prefoldin and TriC/CCT  in actin and tubulin folding"                  
##  [27] "RNA Polymerase I Transcription Termination"                                           
##  [28] "Chromatin modifying enzymes"                                                          
##  [29] "Chromatin organization"                                                               
##  [30] "SARS-CoV-2 Infection"                                                                 
##  [31] "Cellular responses to stimuli"                                                        
##  [32] "RHOBTB GTPase Cycle"                                                                  
##  [33] "Recruitment of NuMA to mitotic centrosomes"                                           
##  [34] "Centrosome maturation"                                                                
##  [35] "Recruitment of mitotic centrosome proteins and complexes"                             
##  [36] "RNA Polymerase III Transcription Initiation From Type 2 Promoter"                     
##  [37] "Loss of Nlp from mitotic centrosomes"                                                 
##  [38] "Loss of proteins required for interphase microtubule organization from the centrosome"
##  [39] "Regulation of mRNA stability by proteins that bind AU-rich elements"                  
##  [40] "Influenza Infection"                                                                  
##  [41] "E2F mediated regulation of DNA replication"                                           
##  [42] "Regulation of PLK1 Activity at G2/M Transition"                                       
##  [43] "Cooperation of PDCL (PhLP1) and TRiC/CCT in G-protein beta folding"                   
##  [44] "RNA Polymerase III Transcription Initiation From Type 1 Promoter"                     
##  [45] "Assembly of the ORC complex at the origin of replication"                             
##  [46] "SLBP independent Processing of Histone Pre-mRNAs"                                     
##  [47] "Nucleotide biosynthesis"                                                              
##  [48] "Processing of SMDT1"                                                                  
##  [49] "Activation of APC/C and APC/C:Cdc20 mediated degradation of mitotic proteins"         
##  [50] "Meiotic recombination"                                                                
##  [51] "B-WICH complex positively regulates rRNA expression"                                  
##  [52] "RNA Polymerase III Abortive And Retractive Initiation"                                
##  [53] "RNA Polymerase III Transcription"                                                     
##  [54] "RND2 GTPase cycle"                                                                    
##  [55] "Regulation of cholesterol biosynthesis by SREBP (SREBF)"                              
##  [56] "Cyclin E associated events during G1/S transition"                                    
##  [57] "Prefoldin mediated transfer of substrate  to CCT/TriC"                                
##  [58] "mRNA 3'-end processing"                                                               
##  [59] "Aberrant regulation of mitotic G1/S transition in cancer due to RB1 defects"          
##  [60] "Defective binding of RB1 mutants to E2F1,(E2F2, E2F3)"                                
##  [61] "Aberrant regulation of mitotic cell cycle due to RB1 defects"                         
##  [62] "Activation of BH3-only proteins"                                                      
##  [63] "Formation of the beta-catenin:TCF transactivating complex"                            
##  [64] "SLBP Dependent Processing of Replication-Dependent Histone Pre-mRNAs"                 
##  [65] "The role of GTSE1 in G2/M progression after G2 checkpoint"                            
##  [66] "Cyclin A:Cdk2-associated events at S phase entry"                                     
##  [67] "DNA Double Strand Break Response"                                                     
##  [68] "Deubiquitination"                                                                     
##  [69] "Developmental Biology"                                                                
##  [70] "Biological oxidations"                                                                
##  [71] "Leishmania infection"                                                                 
##  [72] "Neuronal System"                                                                      
##  [73] "RAC1 GTPase cycle"                                                                    
##  [74] "Toll-like Receptor Cascades"                                                          
##  [75] "Cell-Cell communication"                                                              
##  [76] "PI5P, PP2A and IER3 Regulate PI3K/AKT Signaling"                                      
##  [77] "Muscle contraction"                                                                   
##  [78] "Transmission across Chemical Synapses"                                                
##  [79] "Antigen processing-Cross presentation"                                                
##  [80] "RHOA GTPase cycle"                                                                    
##  [81] "Semaphorin interactions"                                                              
##  [82] "Interleukin-3, Interleukin-5 and GM-CSF signaling"                                    
##  [83] "Collagen biosynthesis and modifying enzymes"                                          
##  [84] "Negative regulation of the PI3K/AKT network"                                          
##  [85] "Attachment and Entry_9694614"                                                         
##  [86] "MyD88 deficiency (TLR2/4)"                                                            
##  [87] "Toll Like Receptor 4 (TLR4) Cascade"                                                  
##  [88] "TNFs bind their physiological receptors"                                              
##  [89] "Constitutive Signaling by Aberrant PI3K in Cancer"                                    
##  [90] "Phase I - Functionalization of compounds"                                             
##  [91] "Glycosaminoglycan metabolism"                                                         
##  [92] "Signaling by cytosolic FGFR1 fusion mutants"                                          
##  [93] "Signaling by SCF-KIT"                                                                 
##  [94] "Non-integrin membrane-ECM interactions"                                               
##  [95] "Metabolism of fat-soluble vitamins"                                                   
##  [96] "Diseases of signal transduction by growth factor receptors and second messengers"     
##  [97] "Ephrin signaling"                                                                     
##  [98] "Sema3A PAK dependent Axon repulsion"                                                  
##  [99] "ER-Phagosome pathway"                                                                 
## [100] "Signaling by PDGF"                                                                    
## [101] "CDC42 GTPase cycle"                                                                   
## [102] "Glycosphingolipid metabolism"                                                         
## [103] "Opioid Signalling"                                                                    
## [104] "Elastic fibre formation"                                                              
## [105] "Interleukin-15 signaling"                                                             
## [106] "Regulation of TLR by endogenous ligand"                                               
## [107] "Assembly and cell surface presentation of NMDA receptors"                             
## [108] "RAF-independent MAPK1/3 activation"                                                   
## [109] "Collagen formation"                                                                   
## [110] "Heparan sulfate/heparin (HS-GAG) metabolism"                                          
## [111] "Signaling by NOTCH3"                                                                  
## [112] "MyD88:MAL(TIRAP) cascade initiated on plasma membrane"                                
## [113] "Toll Like Receptor 2 (TLR2) Cascade"                                                  
## [114] "Toll Like Receptor TLR1:TLR2 Cascade"                                                 
## [115] "Toll Like Receptor TLR6:TLR2 Cascade"                                                 
## [116] "RHO GTPase cycle"                                                                     
## [117] "Nuclear Events (kinase and transcription factor activation)"                          
## [118] "Chondroitin sulfate/dermatan sulfate metabolism"                                      
## [119] "Activation of Matrix Metalloproteinases"                                              
## [120] "Metabolism of steroid hormones"                                                       
## [121] "The NLRP3 inflammasome"                                                               
## [122] "Neurotransmitter receptors and postsynaptic signal transmission"                      
## [123] "Amyloid fiber formation"                                                              
## [124] "Sphingolipid metabolism"                                                              
## [125] "Activated NOTCH1 Transmits Signal to the Nucleus"                                     
## [126] "Sensory processing of sound by outer hair cells of the cochlea"                       
## [127] "Signaling by CSF3 (G-CSF)"                                                            
## [128] "Signaling by FGFR1 in disease"                                                        
## [129] "Inflammasomes"                                                                        
## [130] "Cell recruitment (pro-inflammatory response)"                                         
## [131] "Purinergic signaling in leishmaniasis infection"                                      
## [132] "G alpha (s) signalling events"                                                        
## [133] "Prolactin receptor signaling"                                                         
## [134] "SEMA3A-Plexin repulsion signaling by inhibiting Integrin adhesion"                    
## [135] "Adaptive Immune System"                                                               
## [136] "G alpha (q) signalling events"                                                        
## [137] "Death Receptor Signaling"                                                             
## [138] "Ca-dependent events"                                                                  
## [139] "Antimicrobial peptides"

Compare clusterprofiler to fora.

v1 <- list("CP up"=ora_ups, "CP fix up"=orafix_ups, "FORA up" = fora_up)
v2 <- list("CP dn"=ora_dns, "CP fix dn"=orafix_dns, "FORA dn" = fora_dn)
v3 <- list("CP"=union(ora_ups,ora_dns),
  "CP fix"=union(orafix_ups,orafix_dns),
  "FORA" = union(fora_up,fora_dn) )

par(mar=c(10,10,10,10))
par(mfrow=c(2,1))
plot(euler(v1),quantities = list(cex = 2), labels = list(cex = 2),main="upregulated Reactomes")

plot(euler(v2),quantities = list(cex = 2), labels = list(cex = 2),main="downregulated Reactomes")

plot(euler(v3),quantities = list(cex = 2), labels = list(cex = 2),main="up- and down-regulated Reactomes")

Jaccard index comparing CP to FORA.

ora <- c(ora_ups,ora_dns)
orafix <- c(orafix_ups,orafix_dns)
fora <- c(fora_up,fora_dn)

jaccard(ora,fora)
## [1] 0.6207865
jaccard(orafix,fora)
## [1] 0.8944591

Session information

For reproducibility


Click HERE to show session info

sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
##  [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
##  [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Australia/Melbourne
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] gplots_3.1.3                eulerr_7.0.0               
##  [3] clusterProfiler_4.8.3       kableExtra_1.3.4           
##  [5] DESeq2_1.40.2               SummarizedExperiment_1.30.2
##  [7] Biobase_2.60.0              MatrixGenerics_1.12.3      
##  [9] matrixStats_1.0.0           GenomicRanges_1.52.0       
## [11] GenomeInfoDb_1.36.2         IRanges_2.34.1             
## [13] S4Vectors_0.38.1            BiocGenerics_0.46.0        
## [15] getDEE2_1.10.0              fgsea_1.26.0               
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3      rstudioapi_0.15.0       jsonlite_1.8.7         
##   [4] magrittr_2.0.3          farver_2.1.1            rmarkdown_2.24         
##   [7] zlibbioc_1.46.0         vctrs_0.6.3             memoise_2.0.1          
##  [10] RCurl_1.98-1.12         ggtree_3.8.2            webshot_0.5.5          
##  [13] htmltools_0.5.6         S4Arrays_1.0.6          gridGraphics_0.5-1     
##  [16] sass_0.4.7              KernSmooth_2.23-22      bslib_0.5.1            
##  [19] plyr_1.8.8              cachem_1.0.8            igraph_1.5.1           
##  [22] lifecycle_1.0.3         pkgconfig_2.0.3         gson_0.1.0             
##  [25] Matrix_1.6-1            R6_2.5.1                fastmap_1.1.1          
##  [28] GenomeInfoDbData_1.2.10 digest_0.6.33           aplot_0.2.0            
##  [31] enrichplot_1.20.1       colorspace_2.1-0        patchwork_1.1.3        
##  [34] AnnotationDbi_1.62.2    RSQLite_2.3.1           fansi_1.0.4            
##  [37] httr_1.4.7              polyclip_1.10-4         abind_1.4-5            
##  [40] compiler_4.3.1          bit64_4.0.5             withr_2.5.0            
##  [43] downloader_0.4          BiocParallel_1.34.2     viridis_0.6.4          
##  [46] DBI_1.1.3               highr_0.10              ggforce_0.4.1          
##  [49] MASS_7.3-60             DelayedArray_0.26.7     HDO.db_0.99.1          
##  [52] caTools_1.18.2          gtools_3.9.4            tools_4.3.1            
##  [55] scatterpie_0.2.1        ape_5.7-1               glue_1.6.2             
##  [58] nlme_3.1-163            GOSemSim_2.26.1         polylabelr_0.2.0       
##  [61] shadowtext_0.1.2        grid_4.3.1              reshape2_1.4.4         
##  [64] generics_0.1.3          gtable_0.3.4            tidyr_1.3.0            
##  [67] data.table_1.14.8       tidygraph_1.2.3         xml2_1.3.5             
##  [70] utf8_1.2.3              XVector_0.40.0          ggrepel_0.9.3          
##  [73] pillar_1.9.0            stringr_1.5.0           yulab.utils_0.0.9      
##  [76] splines_4.3.1           dplyr_1.1.3             tweenr_2.0.2           
##  [79] treeio_1.24.3           lattice_0.21-8          bit_4.0.5              
##  [82] tidyselect_1.2.0        GO.db_3.17.0            locfit_1.5-9.8         
##  [85] Biostrings_2.68.1       knitr_1.43              gridExtra_2.3          
##  [88] svglite_2.1.1           xfun_0.40               graphlayouts_1.0.0     
##  [91] stringi_1.7.12          lazyeval_0.2.2          ggfun_0.1.2            
##  [94] yaml_2.3.7              evaluate_0.21           codetools_0.2-19       
##  [97] ggraph_2.1.0            tibble_3.2.1            qvalue_2.32.0          
## [100] ggplotify_0.1.2         cli_3.6.1               systemfonts_1.0.4      
## [103] munsell_0.5.0           jquerylib_0.1.4         Rcpp_1.0.11            
## [106] png_0.1-8               parallel_4.3.1          ggplot2_3.4.3          
## [109] blob_1.2.4              DOSE_3.26.1             htm2txt_2.2.2          
## [112] bitops_1.0-7            viridisLite_0.4.2       tidytree_0.4.5         
## [115] scales_1.2.1            purrr_1.0.2             crayon_1.5.2           
## [118] rlang_1.1.1             cowplot_1.1.1           fastmatch_1.1-4        
## [121] KEGGREST_1.40.0         rvest_1.0.3