Source: https://github.com/markziemann/background
I previously wrote about the importance of a background gene list for accurate and correct enrichment analysis.
Here I will demonstrate how important this is, as well as providing a guide for how to make your own background gene list.
For this guide I will be using data from a previous analysis, which is deposited at NCBI GEO and SRA under accession numbers: GSE55123 and SRP038101.
The data was presented in a paper by Lund et al, 2014
The experiment is designed to investigate the effect of Azacitidine treatment on AML3 cells.
suppressPackageStartupMessages({
library("getDEE2")
library("DESeq2")
library("kableExtra")
library("vioplot")
library("clusterProfiler")
library("eulerr")
})
Using RNAseq data from DEE2, which can be accessed from dee2.io
mdat <- getDEE2Metadata("hsapiens")
# get sample sheet
ss <- subset(mdat,SRP_accession=="SRP038101")
# fetch the whole set of RNA-seq data
x <- getDEE2("hsapiens",ss$SRR_accession , metadata=mdat, legacy=TRUE)
## For more information about DEE2 QC metrics, visit
## https://github.com/markziemann/dee2/blob/master/qc/qc_metrics.md
mx <- x$GeneCounts
rownames(mx) <- paste(rownames(mx),x$GeneInfo$GeneSymbol)
dim(mx)
## [1] 58302 6
# aza no filtering
ss$trt <- grepl("Treated",ss$Experiment_title)
ss %>%
kbl(caption="sample sheet for Aza treatment in AML3 cells") %>%
kable_paper("hover", full_width = F)
SRR_accession | QC_summary | SRX_accession | SRS_accession | SRP_accession | Experiment_title | GEO_series | trt | |
---|---|---|---|---|---|---|---|---|
156267 | SRR1171523 | PASS | SRX472607 | SRS559064 | SRP038101 | GSM1329859: Untreated.1; Homo sapiens; RNA-Seq | FALSE | |
156268 | SRR1171524 | WARN(3,4) | SRX472608 | SRS559066 | SRP038101 | GSM1329860: Untreated.2; Homo sapiens; RNA-Seq | FALSE | |
156269 | SRR1171525 | WARN(3,4) | SRX472609 | SRS559065 | SRP038101 | GSM1329861: Untreated.3; Homo sapiens; RNA-Seq | FALSE | |
156270 | SRR1171526 | WARN(3,4) | SRX472610 | SRS559068 | SRP038101 | GSM1329862: Treated.1; Homo sapiens; RNA-Seq | TRUE | |
156271 | SRR1171527 | WARN(3,4) | SRX472611 | SRS559067 | SRP038101 | GSM1329863: Treated.2; Homo sapiens; RNA-Seq | TRUE | |
156272 | SRR1171528 | WARN(3,4) | SRX472612 | SRS559069 | SRP038101 | GSM1329864: Treated.3; Homo sapiens; RNA-Seq | TRUE |
Although I know this data set is good as I was involved in the project, it is a good idea to show at least the number of reads for each sample.
par(mar=c(5,7,5,1))
barplot(rev(colSums(mx)),horiz=TRUE,las=1,main="number of reads per sample in SRP038101")
Now we can have a look at the read count distribution for one of the samples.
# look at the distribution of counts
l <- lapply(1:ncol(mx),function(i) log10( mx[,i] +1 ))
names(l) <- colnames(mx)
l <- rev(l)
hist(l[[6]],main="Distribution of read counts in SRR1171523",
xlab="log10(readcounts+1)",breaks=30)
abline(v=1,col="red",lty=2)
We can see that most genes have a very low read count, consistent with most genes being undetectable in any sample.
Based on this typical distribution of read counts, I would consider any gene with fewer than 10 reads as not detected. This could be considered a strict detection threshold. Setting a strict threshold like this could be helpful in selecting genes that are likely to validate with qRT-PCR.
I will run DESeq2 with and without this threshold to see how it impacts the results.
To see whether this has any effect on downstream analysis, DESeq2 will be run on the same dataset without any filtering, with light filtering and with strict filtering.
Here I define light filter threshold as the lowest baseline expression level that DE genes start to appear. I know this contrast will yield thousands of DE genes, so this is a reasonable way to set the detection threshold. The DE genes with lowest baseline expression showed mean read counts of about 2.
# DESeq2 without any fitering
dds <- DESeqDataSetFromMatrix(countData = mx , colData = ss, design = ~ trt )
res <- DESeq(dds)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <- cbind(as.data.frame(z),mx)
de <- as.data.frame(zz[order(zz$pvalue),])
sig <- subset(de,padj<0.05)
head(de,10) %>%
kbl(caption="Top DE genes for Aza treatment (unfiltered)") %>%
kable_paper("hover", full_width = F)
baseMean | log2FoldChange | lfcSE | stat | pvalue | padj | SRR1171523 | SRR1171524 | SRR1171525 | SRR1171526 | SRR1171527 | SRR1171528 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ENSG00000165949 IFI27 | 1951.0174 | -3.375822 | 0.0926657 | -36.43012 | 0 | 0 | 4689 | 3583 | 2758 | 309 | 384 | 334 |
ENSG00000090382 LYZ | 7567.9024 | -1.641765 | 0.0531851 | -30.86889 | 0 | 0 | 14212 | 10237 | 10789 | 3476 | 3764 | 3738 |
ENSG00000115461 IGFBP5 | 528.5978 | -5.062576 | 0.1819135 | -27.82958 | 0 | 0 | 1320 | 823 | 1025 | 23 | 26 | 43 |
ENSG00000111331 OAS3 | 2117.9092 | -2.652602 | 0.0958528 | -27.67371 | 0 | 0 | 4204 | 3977 | 2972 | 562 | 614 | 560 |
ENSG00000157601 MX1 | 823.4574 | -2.869205 | 0.1059603 | -27.07810 | 0 | 0 | 1732 | 1501 | 1206 | 184 | 223 | 186 |
ENSG00000070915 SLC12A3 | 422.5759 | -3.366184 | 0.1332557 | -25.26108 | 0 | 0 | 1012 | 721 | 653 | 63 | 85 | 76 |
ENSG00000234745 HLA-B | 3185.6361 | -1.422980 | 0.0581834 | -24.45682 | 0 | 0 | 6085 | 4256 | 4023 | 1590 | 1872 | 1719 |
ENSG00000204525 HLA-C | 1625.8114 | -1.452921 | 0.0652846 | -22.25517 | 0 | 0 | 3112 | 2150 | 2106 | 791 | 923 | 891 |
ENSG00000137965 IFI44 | 407.2593 | -2.970374 | 0.1359830 | -21.84371 | 0 | 0 | 829 | 740 | 635 | 76 | 111 | 89 |
ENSG00000110042 DTX4 | 521.8803 | -2.461375 | 0.1203479 | -20.45217 | 0 | 0 | 1166 | 883 | 688 | 166 | 168 | 145 |
DET=nrow(mx)
NSIG=nrow(sig)
NUP=nrow(subset(sig,log2FoldChange>0))
NDN=nrow(subset(sig,log2FoldChange<0))
HEADER=paste(DET,"genes included;",NSIG,"w/FDR<0.05;",NUP,"up;",NDN,"down")
plot(log10(de$baseMean) ,de$log2FoldChange,
cex=0.6,pch=19,col="darkgray",
main="no detection filter",
xlab="log10(basemean)",ylab="log2(fold change)")
points(log10(sig$baseMean) ,sig$log2FoldChange,
cex=0.6,pch=19,col="red")
mtext(HEADER)
abline(v=1,lty=2)
deup <- rownames(subset(de,padj<=0.05 & log2FoldChange>0))
deup <- unique(sapply(strsplit(deup," "),"[[",2))
dedn <- rownames(subset(de,padj<=0.05 & log2FoldChange<0))
dedn <- unique(sapply(strsplit(dedn," "),"[[",2))
all <- rownames(de)
all <- unique(sapply(strsplit(all," "),"[[",2))
look at the lowest basemean genes
sig <- sig[order(sig$baseMean),]
head(sig,10) %>%
kbl(caption="lowest basemean genes") %>%
kable_paper("hover", full_width = F)
baseMean | log2FoldChange | lfcSE | stat | pvalue | padj | SRR1171523 | SRR1171524 | SRR1171525 | SRR1171526 | SRR1171527 | SRR1171528 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ENSG00000139679 LPAR6 | 3.252287 | -5.137261 | 1.480944 | -3.468909 | 0.0005226 | 0.0040009 | 8 | 7 | 5 | 0 | 0 | 0 |
ENSG00000121101 TEX14 | 3.260139 | -4.088679 | 1.488019 | -2.747733 | 0.0060009 | 0.0304398 | 4 | 6 | 8 | 0 | 1 | 0 |
ENSG00000170374 SP7 | 3.300960 | -4.118996 | 1.504304 | -2.738142 | 0.0061787 | 0.0312202 | 7 | 2 | 10 | 0 | 0 | 1 |
ENSG00000234235 BOK-AS1 | 3.316830 | -5.159547 | 1.487616 | -3.468332 | 0.0005237 | 0.0040036 | 7 | 4 | 9 | 0 | 0 | 0 |
ENSG00000163629 PTPN13 | 3.530105 | -4.214348 | 1.461784 | -2.883016 | 0.0039389 | 0.0216194 | 6 | 5 | 9 | 0 | 0 | 1 |
ENSG00000161896 IP6K3 | 3.549780 | -4.236643 | 1.448891 | -2.924059 | 0.0034550 | 0.0194498 | 9 | 6 | 6 | 0 | 0 | 1 |
ENSG00000204832 ST8SIA6-AS1 | 3.562172 | -5.269982 | 1.465897 | -3.595057 | 0.0003243 | 0.0026594 | 9 | 9 | 4 | 0 | 0 | 0 |
ENSG00000203797 DDO | 3.619810 | -4.275927 | 1.451662 | -2.945539 | 0.0032239 | 0.0183545 | 11 | 7 | 4 | 0 | 1 | 0 |
ENSG00000146592 CREB5 | 3.751125 | -5.342210 | 1.443747 | -3.700240 | 0.0002154 | 0.0018831 | 9 | 8 | 6 | 0 | 0 | 0 |
ENSG00000238222 MKRN4P | 3.807870 | -5.384167 | 1.454818 | -3.700920 | 0.0002148 | 0.0018812 | 15 | 5 | 5 | 0 | 0 | 0 |
The counts look quite low. Not sure that these DEGs could be validated with qRT-PCR.
mxf <- mx[which(rowMeans(mx)>=10),]
dim(mxf)
## [1] 13168 6
dds <- DESeqDataSetFromMatrix(countData = mxf , colData = ss, design = ~ trt )
res <- DESeq(dds)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <-cbind(as.data.frame(z),mxf)
def <-as.data.frame(zz[order(zz$pvalue),])
head(def,10) %>%
kbl(caption="Top DE genes for Aza treatment (filtered)") %>%
kable_paper("hover", full_width = F)
baseMean | log2FoldChange | lfcSE | stat | pvalue | padj | SRR1171523 | SRR1171524 | SRR1171525 | SRR1171526 | SRR1171527 | SRR1171528 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ENSG00000165949 IFI27 | 1960.1970 | -3.384492 | 0.0938869 | -36.04861 | 0 | 0 | 4689 | 3583 | 2758 | 309 | 384 | 334 |
ENSG00000090382 LYZ | 7596.0299 | -1.650342 | 0.0561143 | -29.41036 | 0 | 0 | 14212 | 10237 | 10789 | 3476 | 3764 | 3738 |
ENSG00000115461 IGFBP5 | 531.2217 | -5.071157 | 0.1795239 | -28.24781 | 0 | 0 | 1320 | 823 | 1025 | 23 | 26 | 43 |
ENSG00000157601 MX1 | 827.1511 | -2.877795 | 0.1047823 | -27.46450 | 0 | 0 | 1732 | 1501 | 1206 | 184 | 223 | 186 |
ENSG00000111331 OAS3 | 2127.2010 | -2.661214 | 0.0972124 | -27.37525 | 0 | 0 | 4204 | 3977 | 2972 | 562 | 614 | 560 |
ENSG00000070915 SLC12A3 | 424.5509 | -3.374852 | 0.1298671 | -25.98697 | 0 | 0 | 1012 | 721 | 653 | 63 | 85 | 76 |
ENSG00000234745 HLA-B | 3197.0159 | -1.431566 | 0.0604169 | -23.69479 | 0 | 0 | 6085 | 4256 | 4023 | 1590 | 1872 | 1719 |
ENSG00000137965 IFI44 | 409.0957 | -2.978581 | 0.1319352 | -22.57608 | 0 | 0 | 829 | 740 | 635 | 76 | 111 | 89 |
ENSG00000204525 HLA-C | 1631.6421 | -1.461550 | 0.0660214 | -22.13750 | 0 | 0 | 3112 | 2150 | 2106 | 791 | 923 | 891 |
ENSG00000110042 DTX4 | 524.1318 | -2.470219 | 0.1173182 | -21.05572 | 0 | 0 | 1166 | 883 | 688 | 166 | 168 | 145 |
sigf <- subset(def,padj<=0.05)
DET=nrow(mxf)
NSIG=nrow(sigf)
NUP=nrow(subset(sigf,log2FoldChange>0))
NDN=nrow(subset(sigf,log2FoldChange<0))
HEADER=paste(DET,"detected genes;",NSIG,"w/FDR<0.05;",NUP,"up;",NDN,"down")
plot(log10(def$baseMean) ,def$log2FoldChange,
cex=0.6,pch=19,col="darkgray",
main="strict filter",
xlab="log10(basemean)",ylab="log2(fold change)")
points(log10(sigf$baseMean) ,sigf$log2FoldChange,
cex=0.6,pch=19,col="red")
mtext(HEADER)
defup <- rownames(subset(def,padj<=0.05 & log2FoldChange>0))
defup <- unique(sapply(strsplit(defup," "),"[[",2))
defdn <- rownames(subset(def,padj<=0.05 & log2FoldChange<0))
defdn <- unique(sapply(strsplit(defdn," "),"[[",2))
bg <- rownames(def)
bg <- unique(sapply(strsplit(bg," "),"[[",2))
I will run enrichment analysis using the correct correct background list, and compare the results to the incorrect whole genome background.
if (! file.exists("ReactomePathways.gmt")) {
download.file("https://reactome.org/download/current/ReactomePathways.gmt.zip",
destfile="ReactomePathways.gmt.zip")
unzip("ReactomePathways.gmt.zip")
}
genesets2 <- read.gmt("ReactomePathways.gmt")
Firstly with the upregulated genes
ora2_up <- as.data.frame(enricher(gene = defup ,
universe = bg, maxGSSize = 5000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
ora2_up$geneID <- NULL
ora2_up <- subset(ora2_up,p.adjust<0.05 & Count >=10)
ora2_ups <- rownames(ora2_up)
gr <- as.numeric(sapply(strsplit(ora2_up$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora2_up$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(ora2_up$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora2_up$BgRatio,"/"),"[[",2))
ora2_up$es <- gr/br
ora2_up <- ora2_up[order(-ora2_up$es),]
ora2_up$Description=NULL
head(ora2_up) %>%
kbl(row.names = FALSE, caption="Top upregulated pathways in Aza treatment (filtered)") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
Interactions of Rev with host cellular proteins | 24/1050 | 36/6944 | 0 | 0e+00 | 0e+00 | 24 | 4.408889 |
Nuclear import of Rev protein | 22/1050 | 33/6944 | 0 | 0e+00 | 0e+00 | 22 | 4.408889 |
Postmitotic nuclear pore complex (NPC) reformation | 18/1050 | 27/6944 | 0 | 1e-07 | 1e-07 | 18 | 4.408889 |
Rev-mediated nuclear export of HIV RNA | 22/1050 | 34/6944 | 0 | 0e+00 | 0e+00 | 22 | 4.279216 |
Export of Viral Ribonucleoproteins from Nucleus | 20/1050 | 31/6944 | 0 | 0e+00 | 0e+00 | 20 | 4.266667 |
NEP/NS2 Interacts with the Cellular Export Machinery | 20/1050 | 31/6944 | 0 | 0e+00 | 0e+00 | 20 | 4.266667 |
topup2 <- rev(head(ora2_up$es,20))
names(topup2) <- rev(head(ora2_up$ID,20))
Now with the downregulated genes
ora2_dn <- as.data.frame(enricher(gene = defdn ,
universe = bg, maxGSSize = 5000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
ora2_dn$geneID <- NULL
ora2_dn <- subset(ora2_dn,p.adjust<0.05 & Count >=10)
ora2_dns <- rownames(ora2_dn)
gr <- as.numeric(sapply(strsplit(ora2_dn$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora2_dn$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(ora2_dn$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora2_dn$BgRatio,"/"),"[[",2))
ora2_dn$es <- gr/br
ora2_dn <- ora2_dn[order(-ora2_dn$es),]
ora2_dn$Description=NULL
head(ora2_dn) %>%
kbl(row.names = FALSE, caption="Top downregulated pathways in Aza treatment (filtered)") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
Interleukin-20 family signaling | 10/1178 | 12/6944 | 0.0000009 | 0.0000594 | 0.0000530 | 10 | 4.912281 |
Interferon alpha/beta signaling | 41/1178 | 55/6944 | 0.0000000 | 0.0000000 | 0.0000000 | 41 | 4.394258 |
Interferon gamma signaling | 40/1178 | 63/6944 | 0.0000000 | 0.0000000 | 0.0000000 | 40 | 3.742690 |
Interleukin-10 signaling | 17/1178 | 27/6944 | 0.0000001 | 0.0000104 | 0.0000093 | 17 | 3.711501 |
Other interleukin signaling | 10/1178 | 18/6944 | 0.0002230 | 0.0080676 | 0.0072011 | 10 | 3.274854 |
Growth hormone receptor signaling | 10/1178 | 19/6944 | 0.0003996 | 0.0140422 | 0.0125340 | 10 | 3.102493 |
topdn2 <- head(ora2_dn$es,20)
names(topdn2) <- head(ora2_dn$ID,20)
Make a barplot
par(mar=c(5,20,5,1))
cols <- c(rep("blue",20),rep("red",20))
barplot(c(topdn2,topup2),
horiz=TRUE,las=1,cex.names=0.65,col=cols,
main="top DE Reactomes",
xlab="ES")
mtext("correct background")
Firstly with the upregulated genes
ora1_up <- as.data.frame(enricher(gene = defup ,
universe = all, maxGSSize = 5000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
ora1_up$geneID <- NULL
ora1_up <- subset(ora1_up,p.adjust<0.05 & Count >=10)
ora1_ups <- rownames(ora1_up)
gr <- as.numeric(sapply(strsplit(ora1_up$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora1_up$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(ora1_up$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora1_up$BgRatio,"/"),"[[",2))
ora1_up$es <- gr/br
ora1_up <- ora1_up[order(-ora1_up$es),]
ora1_up$Description=NULL
head(ora1_up) %>%
kbl(row.names = FALSE, caption="Top upregulated pathways in Aza treatment (unfiltered)") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
Interactions of Rev with host cellular proteins | 24/1050 | 36/10922 | 0 | 0 | 0 | 24 | 6.934603 |
Nuclear import of Rev protein | 22/1050 | 33/10922 | 0 | 0 | 0 | 22 | 6.934603 |
Postmitotic nuclear pore complex (NPC) reformation | 18/1050 | 27/10922 | 0 | 0 | 0 | 18 | 6.934603 |
Rev-mediated nuclear export of HIV RNA | 22/1050 | 34/10922 | 0 | 0 | 0 | 22 | 6.730644 |
NEP/NS2 Interacts with the Cellular Export Machinery | 20/1050 | 31/10922 | 0 | 0 | 0 | 20 | 6.710906 |
Transport of Ribonucleoproteins into the Host Nucleus | 20/1050 | 31/10922 | 0 | 0 | 0 | 20 | 6.710906 |
topup1 <- rev(head(ora1_up$es,20))
names(topup1) <- rev(head(ora1_up$ID,20))
Now with the downregulated genes
ora1_dn <- as.data.frame(enricher(gene = defdn ,
universe = all, maxGSSize = 5000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
ora1_dn$geneID <- NULL
ora1_dn <- subset(ora1_dn,p.adjust<0.05 & Count >=10)
ora1_dns <- rownames(ora1_dn)
gr <- as.numeric(sapply(strsplit(ora1_dn$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora1_dn$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(ora1_dn$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora1_dn$BgRatio,"/"),"[[",2))
ora1_dn$es <- gr/br
ora1_dn <- ora1_dn[order(-ora1_dn$es),]
ora1_dn$Description=NULL
head(ora1_dn) %>%
kbl(row.names = FALSE, caption="Top downregulated pathways in Aza treatment (unfiltered)") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
Interferon alpha/beta signaling | 41/1178 | 72/10922 | 0.00e+00 | 0.0000000 | 0.0000000 | 41 | 5.279688 |
Transcriptional regulation of granulopoiesis | 14/1178 | 31/10922 | 1.20e-06 | 0.0001297 | 0.0001109 | 14 | 4.187195 |
Interferon gamma signaling | 40/1178 | 92/10922 | 0.00e+00 | 0.0000000 | 0.0000000 | 40 | 4.031151 |
Antigen Presentation: Folding, assembly and peptide loading of class I MHC | 13/1178 | 30/10922 | 5.20e-06 | 0.0004012 | 0.0003431 | 13 | 4.017714 |
Growth hormone receptor signaling | 10/1178 | 24/10922 | 9.71e-05 | 0.0049763 | 0.0042556 | 10 | 3.863186 |
Other interleukin signaling | 10/1178 | 24/10922 | 9.71e-05 | 0.0049763 | 0.0042556 | 10 | 3.863186 |
topdn1 <- head(ora1_dn$es,20)
names(topdn1) <- head(ora1_dn$ID,20)
Make a barplot
par(mar=c(5,20,5,1))
cols <- c(rep("blue",20),rep("red",20))
barplot(c(topdn1,topup1),
horiz=TRUE,las=1,cex.names=0.65,col=cols,
main="top DE Reactomes",
xlab="ES")
mtext("whole genome background")
v1 <- list("correct up"=ora2_ups, "wg up"=ora1_ups)
v2 <- list("correct dn"=ora2_dns, "wg dn"=ora1_dns)
par(mar=c(10,10,10,10))
par(mfrow=c(2,1))
plot(euler(v1),quantities = TRUE)
plot(euler(v2),quantities = TRUE)
Jaccard index
jaccard <- function(a, b) {
intersection = length(intersect(a, b))
union = length(a) + length(b) - intersection
return (intersection/union)
}
correct <- c(ora2_ups,ora2_dns)
incorrect <- c(ora1_ups,ora1_dns)
jaccard(correct,incorrect)
## [1] 0.5572289
For reproducibility
sessionInfo()
## R version 4.2.2 Patched (2022-11-10 r83330)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
## [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
## [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] eulerr_7.0.0 clusterProfiler_4.4.4
## [3] vioplot_0.4.0 zoo_1.8-11
## [5] sm_2.2-5.7.1 kableExtra_1.3.4
## [7] DESeq2_1.36.0 SummarizedExperiment_1.26.1
## [9] Biobase_2.56.0 MatrixGenerics_1.8.1
## [11] matrixStats_0.63.0 GenomicRanges_1.48.0
## [13] GenomeInfoDb_1.32.2 IRanges_2.30.0
## [15] S4Vectors_0.34.0 BiocGenerics_0.42.0
## [17] getDEE2_1.6.0
##
## loaded via a namespace (and not attached):
## [1] shadowtext_0.1.2 fastmatch_1.1-3 systemfonts_1.0.4
## [4] plyr_1.8.8 igraph_1.3.5 lazyeval_0.2.2
## [7] polylabelr_0.2.0 splines_4.2.2 BiocParallel_1.30.3
## [10] ggplot2_3.4.0 digest_0.6.31 yulab.utils_0.0.5
## [13] htmltools_0.5.4 GOSemSim_2.22.0 viridis_0.6.2
## [16] GO.db_3.15.0 fansi_1.0.3 magrittr_2.0.3
## [19] memoise_2.0.1 Biostrings_2.64.0 annotate_1.74.0
## [22] graphlayouts_0.8.4 svglite_2.1.0 enrichplot_1.16.1
## [25] colorspace_2.0-3 blob_1.2.3 rvest_1.0.3
## [28] ggrepel_0.9.2 xfun_0.35 dplyr_1.0.10
## [31] crayon_1.5.2 RCurl_1.98-1.9 jsonlite_1.8.4
## [34] scatterpie_0.1.8 genefilter_1.78.0 survival_3.4-0
## [37] ape_5.6-2 glue_1.6.2 polyclip_1.10-4
## [40] gtable_0.3.1 zlibbioc_1.42.0 XVector_0.36.0
## [43] webshot_0.5.4 htm2txt_2.2.2 DelayedArray_0.22.0
## [46] scales_1.2.1 DOSE_3.22.0 DBI_1.1.3
## [49] Rcpp_1.0.9 viridisLite_0.4.1 xtable_1.8-4
## [52] gridGraphics_0.5-1 tidytree_0.4.1 bit_4.0.5
## [55] httr_1.4.4 fgsea_1.22.0 RColorBrewer_1.1-3
## [58] pkgconfig_2.0.3 XML_3.99-0.13 farver_2.1.1
## [61] sass_0.4.4 locfit_1.5-9.6 utf8_1.2.2
## [64] ggplotify_0.1.0 tidyselect_1.2.0 rlang_1.0.6
## [67] reshape2_1.4.4 AnnotationDbi_1.58.0 munsell_0.5.0
## [70] tools_4.2.2 cachem_1.0.6 downloader_0.4
## [73] cli_3.4.1 generics_0.1.3 RSQLite_2.2.19
## [76] evaluate_0.19 stringr_1.5.0 fastmap_1.1.0
## [79] yaml_2.3.6 ggtree_3.4.1 knitr_1.41
## [82] bit64_4.0.5 tidygraph_1.2.2 purrr_0.3.5
## [85] KEGGREST_1.36.3 ggraph_2.1.0 nlme_3.1-161
## [88] aplot_0.1.9 DO.db_2.9 xml2_1.3.3
## [91] compiler_4.2.2 rstudioapi_0.14 png_0.1-8
## [94] treeio_1.20.1 tibble_3.1.8 tweenr_2.0.2
## [97] geneplotter_1.74.0 bslib_0.4.1 stringi_1.7.8
## [100] highr_0.9 lattice_0.20-45 Matrix_1.5-3
## [103] vctrs_0.5.1 pillar_1.8.1 lifecycle_1.0.3
## [106] jquerylib_0.1.4 data.table_1.14.6 bitops_1.0-7
## [109] patchwork_1.1.2 qvalue_2.28.0 R6_2.5.1
## [112] gridExtra_2.3 codetools_0.2-19 MASS_7.3-58.2
## [115] assertthat_0.2.1 withr_2.5.0 GenomeInfoDbData_1.2.8
## [118] parallel_4.2.2 grid_4.2.2 ggfun_0.0.9
## [121] tidyr_1.2.1 rmarkdown_2.19 ggforce_0.4.1