Source: https://github.com/markziemann/enrichment_recipe
suppressPackageStartupMessages({
library("getDEE2")
library("DESeq2")
library("kableExtra")
library("clusterProfiler")
library("eulerr")
library("gplots")
library("mitch")
library("fgsea")
library("RhpcBLASctl")
library("msigdb")
library("ExperimentHub")
})
RhpcBLASctl::blas_set_num_threads(1)
For this guide I will be using bulk RNA-seq data from a previous study, which is deposited at NCBI GEO and SRA under accession numbers: GSE55123 and SRP038101 (Lund et al, 2014). The experiment is designed to investigate the effect of Azacitidine treatment on AML3 cells.
The raw data have been processed by the DEE2 project, and the summary gene expression counts are available at the dee2.io website, and programmatically with the getDEE2 bioconductor package (Ziemann et al, 2019).
The gene counts have also been deposited to the /data
folder in the example.Rdata
file in case the DEE2 resource
becomes unavailable. To import it, use the command:
load("data/example.Rdata")
Alternatively, you could fetch data from another resource like NCBI GEO, Zenodo or from the host storage drive.
mdat <- getDEE2Metadata("hsapiens")
# get sample sheet
ss <- subset(mdat,SRP_accession=="SRP038101")
# fetch the whole set of RNA-seq data
x <- getDEE2("hsapiens",ss$SRR_accession , metadata=mdat, legacy=TRUE)
## For more information about DEE2 QC metrics, visit
## https://github.com/markziemann/dee2/blob/master/qc/qc_metrics.md
mx <- x$GeneCounts
rownames(mx) <- paste(rownames(mx),x$GeneInfo$GeneSymbol)
dim(mx)
## [1] 58302 6
# aza no filtering
ss$trt <- grepl("Treated",ss$Experiment_title)
ss %>%
kbl(caption="sample sheet for Aza treatment in AML3 cells") %>%
kable_paper("hover", full_width = F)
SRR_accession | QC_summary | SRX_accession | SRS_accession | SRP_accession | Experiment_title | GEO_series | trt | |
---|---|---|---|---|---|---|---|---|
163028 | SRR1171523 | PASS | SRX472607 | SRS559064 | SRP038101 | GSM1329859: Untreated.1; Homo sapiens; RNA-Seq | GSE55123 | FALSE |
163029 | SRR1171524 | WARN(3,4) | SRX472608 | SRS559066 | SRP038101 | GSM1329860: Untreated.2; Homo sapiens; RNA-Seq | GSE55123 | FALSE |
163030 | SRR1171525 | WARN(3,4) | SRX472609 | SRS559065 | SRP038101 | GSM1329861: Untreated.3; Homo sapiens; RNA-Seq | GSE55123 | FALSE |
163031 | SRR1171526 | WARN(3,4) | SRX472610 | SRS559068 | SRP038101 | GSM1329862: Treated.1; Homo sapiens; RNA-Seq | GSE55123 | TRUE |
163032 | SRR1171527 | WARN(3,4) | SRX472611 | SRS559067 | SRP038101 | GSM1329863: Treated.2; Homo sapiens; RNA-Seq | GSE55123 | TRUE |
163033 | SRR1171528 | WARN(3,4) | SRX472612 | SRS559069 | SRP038101 | GSM1329864: Treated.3; Homo sapiens; RNA-Seq | GSE55123 | TRUE |
QC is important, even if you are using public transcriptome data. For RNA-seq it is a good idea to show the number of reads for each sample.
par(mar=c(5,7,5,1))
barplot(rev(colSums(mx)),horiz=TRUE,las=1,main="number of reads per sample in SRP038101")
Now make a MDS plot.
mds <- cmdscale(dist(t(mx)))
# expand plot area
XMIN=min(mds[,1])*1.3
XMAX=max(mds[,1])*1.3
YMIN=min(mds[,2])*1.3
YMAX=max(mds[,2])*1.3
cols <- as.character(grepl("Treated",ss$Experiment_title))
cols <- gsub("FALSE","lightblue",cols)
cols <- gsub("TRUE","pink",cols)
plot(mds, xlab="Coordinate 1", ylab="Coordinate 2",
xlim=c(XMIN,XMAX),ylim=c(YMIN,YMAX),
pch=19,cex=2,col=cols, main="MDS plot")
text(cmdscale(dist(t(mx))), labels=colnames(mx) )
I will be using DESeq2 for DE analysis, the count matrix is prefiltered using a detection threshold of 10 reads per sample across all samples. All genes that meet the detection threshold will comprise the background list. The first 6 rows of the count matrix are shows.
mxf <- mx[which(rowMeans(mx)>=10),]
dim(mxf)
## [1] 13168 6
head(mxf,6) %>%
kbl(caption="Count matrix format") %>%
kable_paper("hover", full_width = F)
SRR1171523 | SRR1171524 | SRR1171525 | SRR1171526 | SRR1171527 | SRR1171528 | |
---|---|---|---|---|---|---|
ENSG00000225630 MTND2P28 | 494 | 396 | 340 | 333 | 415 | 418 |
ENSG00000237973 MTCO1P12 | 52 | 39 | 40 | 30 | 37 | 29 |
ENSG00000248527 MTATP6P1 | 853 | 544 | 537 | 582 | 702 | 716 |
ENSG00000228327 AL669831.1 | 17 | 13 | 21 | 21 | 22 | 12 |
ENSG00000228794 LINC01128 | 42 | 27 | 30 | 32 | 40 | 23 |
ENSG00000230699 AL645608.3 | 20 | 11 | 13 | 10 | 15 | 22 |
dds <- DESeqDataSetFromMatrix(countData = mxf , colData = ss, design = ~ trt )
res <- DESeq(dds)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <-cbind(as.data.frame(z),mxf)
def <-as.data.frame(zz[order(zz$pvalue),])
head(def,10) %>%
kbl(caption="Top DE genes for Aza treatment") %>%
kable_paper("hover", full_width = F)
baseMean | log2FoldChange | lfcSE | stat | pvalue | padj | SRR1171523 | SRR1171524 | SRR1171525 | SRR1171526 | SRR1171527 | SRR1171528 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ENSG00000165949 IFI27 | 1960.1970 | -3.384492 | 0.0938869 | -36.04861 | 0 | 0 | 4689 | 3583 | 2758 | 309 | 384 | 334 |
ENSG00000090382 LYZ | 7596.0299 | -1.650342 | 0.0561143 | -29.41036 | 0 | 0 | 14212 | 10237 | 10789 | 3476 | 3764 | 3738 |
ENSG00000115461 IGFBP5 | 531.2217 | -5.071157 | 0.1795239 | -28.24781 | 0 | 0 | 1320 | 823 | 1025 | 23 | 26 | 43 |
ENSG00000157601 MX1 | 827.1511 | -2.877795 | 0.1047823 | -27.46450 | 0 | 0 | 1732 | 1501 | 1206 | 184 | 223 | 186 |
ENSG00000111331 OAS3 | 2127.2010 | -2.661214 | 0.0972124 | -27.37525 | 0 | 0 | 4204 | 3977 | 2972 | 562 | 614 | 560 |
ENSG00000070915 SLC12A3 | 424.5509 | -3.374852 | 0.1298671 | -25.98697 | 0 | 0 | 1012 | 721 | 653 | 63 | 85 | 76 |
ENSG00000234745 HLA-B | 3197.0159 | -1.431566 | 0.0604169 | -23.69479 | 0 | 0 | 6085 | 4256 | 4023 | 1590 | 1872 | 1719 |
ENSG00000137965 IFI44 | 409.0957 | -2.978581 | 0.1319352 | -22.57608 | 0 | 0 | 829 | 740 | 635 | 76 | 111 | 89 |
ENSG00000204525 HLA-C | 1631.6421 | -1.461550 | 0.0660214 | -22.13750 | 0 | 0 | 3112 | 2150 | 2106 | 791 | 923 | 891 |
ENSG00000110042 DTX4 | 524.1318 | -2.470219 | 0.1173182 | -21.05572 | 0 | 0 | 1166 | 883 | 688 | 166 | 168 | 145 |
Make a smear plot.
sigf <- subset(def,padj<=0.05)
DET=nrow(mxf)
NSIG=nrow(sigf)
NUP=nrow(subset(sigf,log2FoldChange>0))
NDN=nrow(subset(sigf,log2FoldChange<0))
HEADER=paste(DET,"detected genes;",NSIG,"w/FDR<0.05;",NUP,"up;",NDN,"down")
plot(log10(def$baseMean) ,def$log2FoldChange,
cex=0.6,pch=19,col="darkgray",
main="5-azacitidine treatment in AML3 cells",
xlab="log10(basemean)",ylab="log2(fold change)")
points(log10(sigf$baseMean) ,sigf$log2FoldChange,
cex=0.6,pch=19,col="red")
mtext(HEADER)
In the next sections I will run enrichment analysis with over-representation test and compare it to functional class scoring.
# from https://reactome.org/download/current/ReactomePathways.gmt.zip
genesets2 <- read.gmt("ref/ReactomePathways_2023-03-06.gmt")
gsets <- gmtPathways("ref/ReactomePathways_2023-03-06.gmt")
Will try FORA and mitch.
Reporting criteria | Method/resource used |
---|---|
Origin of gene sets | Reactome (2023-03-06) |
Tool used | FORA (check fgsea version at foot of report) |
Statistical test used | hypergeometric |
P-value correction for multiple comparisons | FDR method |
Background list | Genes with >=10 reads per sample on average across all samples |
Gene Selection | DESeq2 FDR<0.05 and log2FC>0 or log2FC<0 |
Data availability | via DEE2 at accession SRP038101 (human) |
Other parameters | Min gene set size of 5 |
up <- unique(sapply(strsplit(rownames(subset(def, log2FoldChange>0 & padj<0.05))," "),"[[",2))
dn <- unique(sapply(strsplit(rownames(subset(def, log2FoldChange<0 & padj<0.05))," "),"[[",2))
bg <- unique(sapply(strsplit(rownames(def)," "),"[[",2))
gt <- read.table("https://dee2.io/data/hsapiens/hsa_gene_info.tsv",header=TRUE)
head(gt)
## GeneID GeneSymbol mean median longest_isoform merged
## 1 ENSG00000223972 DDX11L1 1144 1144 1657 1735
## 2 ENSG00000227232 WASH7P 1351 1351 1351 1351
## 3 ENSG00000278267 MIR6859-1 68 68 68 68
## 4 ENSG00000243485 MIR1302-2HG 623 623 712 1021
## 5 ENSG00000284332 MIR1302-2 138 138 138 138
## 6 ENSG00000237613 FAM138A 888 888 1187 1219
all <- unique(gt$GeneSymbol)
lapply(list(up,dn,bg,all),length)
## [[1]]
## [1] 1672
##
## [[2]]
## [1] 1926
##
## [[3]]
## [1] 13164
##
## [[4]]
## [1] 56648
fres_up_bg <- fora(pathways=gsets, genes=up, universe=bg, minSize = 5, maxSize = Inf)
fres_up_bg <- subset(fres_up_bg,padj<0.05)$pathway
length(fres_up_bg)
## [1] 228
fres_up_all <- fora(pathways=gsets, genes=up, universe=all, minSize = 5, maxSize = Inf)
fres_up_all <- subset(fres_up_all,padj<0.05)$pathway
length(fres_up_all)
## [1] 701
fres_dn_bg <- fora(pathways=gsets, genes=dn, universe=bg, minSize = 5, maxSize = Inf)
fres_dn_bg <- subset(fres_dn_bg,padj<0.05)$pathway
length(fres_dn_bg)
## [1] 128
fres_dn_all <- fora(pathways=gsets, genes=dn, universe=all, minSize = 5, maxSize = Inf)
fres_dn_all <- subset(fres_dn_all,padj<0.05)$pathway
length(fres_dn_all)
## [1] 763
v1 <- list("up"=fres_up_bg, "up*"=fres_up_all, "dn"=fres_dn_bg, "dn*"=fres_dn_all)
par(mar=c(1,1,1,1))
par(mfrow=c(1,1))
plot(euler(v1),quantities = list(cex = 2), labels = list(cex = 2))
v2 <- list("up"=fres_up_bg, "up*"=fres_up_all)
plot(euler(v2),quantities = list(cex = 2), labels = list(cex = 2))
par(mar=c(5.1,4.1,4.1,2.1))
Reporting criteria | Method/resource used |
---|---|
Origin of gene sets | Reactome (2023-03-06) |
Tool used | mitch (check version at foot of report) |
Statistical test used | Rank ANOVA (Kruskal Wallis) |
P-value correction for multiple comparisons | FDR method |
Background list | Genes with >=10 reads per sample on average across all samples |
Gene Scoring | DESeq2 Wald stat |
Data availability | via DEE2 at accession SRP038101 (human) |
Other parameters | Min gene set size of 5 |
gt <- def
gt$genename <- sapply(strsplit(rownames(gt)," "),"[[",2)
gt <- gt[,"genename",drop=FALSE]
gt$id <- rownames(gt)
m <- mitch_import(x=def,DEtype="deseq2",geneTable=gt)
## The input is a single dataframe; one contrast only. Converting
## it to a list for you.
## Note: Mean no. genes in input = 13168
## Note: no. genes in output = 13164
## Note: estimated proportion of input genes in output = 1
mres <- mitch_calc(x=m,genesets=gsets,minsetsize=5,cores=4,priority="significance")
## Note: When prioritising by significance (ie: small
## p-values), large effect sizes might be missed.
mres_sig_up <- head(subset(mres$enrichment_result,s.dist>0),10)
mres_sig_dn <- head(subset(mres$enrichment_result,s.dist<0),10)
mres_sig_up %>%
kbl(caption="Top upregulated pathways when prioritised by FDR") %>%
kable_paper("hover", full_width = F)
set | setSize | pANOVA | s.dist | p.adjustANOVA | |
---|---|---|---|---|---|
203 | Cell Cycle | 589 | 0 | 0.3441689 | 0 |
205 | Cell Cycle, Mitotic | 477 | 0 | 0.3452345 | 0 |
781 | Metabolism of RNA | 697 | 0 | 0.2865202 | 0 |
204 | Cell Cycle Checkpoints | 250 | 0 | 0.3905566 | 0 |
744 | M Phase | 338 | 0 | 0.3342014 | 0 |
826 | Mitotic Prometaphase | 189 | 0 | 0.4303983 | 0 |
1044 | Processing of Capped Intron-Containing Pre-mRNA | 271 | 0 | 0.3488626 | 0 |
825 | Mitotic Metaphase and Anaphase | 220 | 0 | 0.3660460 | 0 |
822 | Mitotic Anaphase | 219 | 0 | 0.3631920 | 0 |
1248 | Resolution of Sister Chromatid Cohesion | 114 | 0 | 0.5009290 | 0 |
mres_sig_dn %>%
kbl(caption="Top downregulated pathways when prioritised by FDR") %>%
kable_paper("hover", full_width = F)
set | setSize | pANOVA | s.dist | p.adjustANOVA | |
---|---|---|---|---|---|
651 | Innate Immune System | 751 | 0 | -0.2338351 | 0 |
628 | Immune System | 1404 | 0 | -0.1750727 | 0 |
667 | Interferon alpha/beta signaling | 55 | 0 | -0.6884694 | 0 |
904 | Neutrophil degranulation | 379 | 0 | -0.2625552 | 0 |
668 | Interferon gamma signaling | 63 | 0 | -0.6128128 | 0 |
280 | Cytokine Signaling in Immune system | 529 | 0 | -0.2075498 | 0 |
445 | Extracellular matrix organization | 144 | 0 | -0.3484596 | 0 |
1016 | Platelet activation, signaling and aggregation | 173 | 0 | -0.3008010 | 0 |
1391 | Signaling by Interleukins | 338 | 0 | -0.2050024 | 0 |
629 | Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell | 60 | 0 | -0.4644790 | 0 |
mres <- mitch_calc(x=m,genesets=gsets,minsetsize=5,cores=4,priority="effect")
## Note: Enrichments with large effect sizes may not be
## statistically significant.
mres_es_up <- head(subset(mres$enrichment_result,s.dist>0 & p.adjustANOVA < 0.05),10)
mres_es_dn <- head(subset(mres$enrichment_result,s.dist<0 & p.adjustANOVA < 0.05),10)
mres_es_up %>%
kbl(caption="Top upregulated pathways when prioritised by ES (discarding any with FDR>0.05)") %>%
kable_paper("hover", full_width = F)
set | setSize | pANOVA | s.dist | p.adjustANOVA | |
---|---|---|---|---|---|
520 | G2/M DNA replication checkpoint | 5 | 0.0011098 | 0.8420853 | 0.0064467 |
54 | Activation of NOXA and translocation to mitochondria | 5 | 0.0011135 | 0.8418421 | 0.0064467 |
255 | Condensation of Prometaphase Chromosomes | 11 | 0.0000025 | 0.8196333 | 0.0000319 |
1031 | Postmitotic nuclear pore complex (NPC) reformation | 27 | 0.0000000 | 0.7477072 | 0.0000000 |
1007 | Phosphorylation of Emi1 | 6 | 0.0015534 | 0.7459847 | 0.0082652 |
663 | Interactions of Rev with host cellular proteins | 36 | 0.0000000 | 0.7422727 | 0.0000000 |
924 | Nuclear import of Rev protein | 33 | 0.0000000 | 0.7326982 | 0.0000000 |
1259 | Rev-mediated nuclear export of HIV RNA | 34 | 0.0000000 | 0.7286636 | 0.0000000 |
1610 | Transport of Ribonucleoproteins into the Host Nucleus | 31 | 0.0000000 | 0.7202221 | 0.0000000 |
441 | Export of Viral Ribonucleoproteins from Nucleus | 31 | 0.0000000 | 0.7153931 | 0.0000000 |
mres_es_dn %>%
kbl(caption="Top upregulated pathways when prioritised by ES (discarding any with FDR>0.05)") %>%
kable_paper("hover", full_width = F)
set | setSize | pANOVA | s.dist | p.adjustANOVA | |
---|---|---|---|---|---|
1055 | Prostanoid ligand receptors | 5 | 0.0009520 | -0.8532411 | 0.0057662 |
692 | Interleukin-9 signaling | 6 | 0.0005909 | -0.8099002 | 0.0039116 |
680 | Interleukin-2 signaling | 7 | 0.0003426 | -0.7814743 | 0.0025008 |
682 | Interleukin-21 signaling | 8 | 0.0002535 | -0.7469406 | 0.0019502 |
470 | Fatty Acids bound to GPR40 (FFAR1) regulate insulin secretion | 6 | 0.0017660 | -0.7371434 | 0.0091680 |
1323 | Scavenging by Class A Receptors | 5 | 0.0044139 | -0.7351774 | 0.0193308 |
1470 | Synthesis of Lipoxins (LX) | 5 | 0.0067423 | -0.6996428 | 0.0267811 |
681 | Interleukin-20 family signaling | 12 | 0.0000327 | -0.6924295 | 0.0003387 |
427 | Epithelial-Mesenchymal Transition (EMT) during gastrulation | 5 | 0.0076657 | -0.6885782 | 0.0294223 |
667 | Interferon alpha/beta signaling | 55 | 0.0000000 | -0.6884694 | 0.0000000 |
Make a barplot
par(mar=c(2,20,2,0))
mres_sig_up_vals <- -log(mres_sig_up$pANOVA)
names(mres_sig_up_vals) <- mres_sig_up$set
mres_sig_dn_vals <- -log(mres_sig_dn$pANOVA)
names(mres_sig_dn_vals) <- mres_sig_dn$set
mres_es_up_vals <- mres_es_up$s.dist
names(mres_es_up_vals) <- mres_es_up$set
mres_es_dn_vals <- -1 * mres_es_dn$s.dist
names(mres_es_dn_vals) <- mres_es_dn$set
par(mfrow=c(2,1))
barplot( mres_sig_up_vals,
horiz=TRUE,las=1,cex.names=0.65,col="gray",
main="Up Sig (-log10 p-value)",
xlab="-log10 p-value",
cex.main=0.9)
barplot( mres_es_up_vals,
horiz=TRUE,las=1,cex.names=0.65,col="gray",
main="Up ES",
xlab="ES",
cex.main=0.9)
barplot( mres_sig_dn_vals,
horiz=TRUE,las=1,cex.names=0.65,col="gray",
main="Down Sig (-log10 p-value)",
xlab="-log10 p-value",
cex.main=0.9)
barplot( mres_es_dn_vals,
horiz=TRUE,las=1,cex.names=0.65,col="gray",
main="Down ES",
xlab="-ES",
cex.main=0.9)
par(mfrow=c(1,1))
par(mar=c(5.1,4.1,4.1,2.1))
bg <- unique(sapply(strsplit(rownames(def)," "),"[[",2))
nrow(subset(def,padj<0.05 & log2FoldChange>0))
## [1] 1672
nrow(subset(def,padj<0.05 & log2FoldChange<0))
## [1] 1926
nrow(def)
## [1] 13168
sizes <- c(50,100,200,300,400,500,750,1000,1500,2000,2500,3000,4000,5000,6000,7000)
fup <- nrow(subset(fora(pathways=gsets, genes=up, universe=bg, minSize = 5, maxSize = Inf),padj<0.05))
fdn <- nrow(subset(fora(pathways=gsets, genes=dn, universe=bg, minSize = 5, maxSize = Inf),padj<0.05))
nups <- unlist(lapply(sizes, function(i) {
genes <- unique(sapply(strsplit(rownames(head(subset(def, log2FoldChange>0),i))," "),"[[",2))
fres <- fora(pathways=gsets, genes=genes, universe=bg, minSize = 5, maxSize = Inf)
nrow(subset(fres,padj<0.05))
}))
ndns <- unlist(lapply(sizes, function(i) {
genes <- unique(sapply(strsplit(rownames(head(subset(def, log2FoldChange<0),i))," "),"[[",2))
fres <- fora(pathways=gsets, genes=genes, universe=bg, minSize = 5, maxSize = Inf)
nrow(subset(fres,padj<0.05))
}))
df <- data.frame("up"=nups,"dn"=ndns,row.names=sizes)
df$tot <- df$up + df$dn
df
## up dn tot
## 50 0 24 24
## 100 0 34 34
## 200 8 40 48
## 300 44 41 85
## 400 101 45 146
## 500 123 55 178
## 750 198 99 297
## 1000 203 113 316
## 1500 232 114 346
## 2000 259 139 398
## 2500 287 169 456
## 3000 283 159 442
## 4000 269 141 410
## 5000 212 105 317
## 6000 187 100 287
## 7000 166 60 226
plot(sizes,df$up,type="b",col="red",ylab="no. gene sets with FDR<0.05",xlab="gene list length")
lines(sizes,df$dn,type="b",col="blue")
plot(sizes,df$up,type="b",col="red",ylab="no. gene sets with FDR<0.05",xlab="gene list length",
xlim=c(0,500), ylim=c(0,125))
lines(sizes,df$dn,type="b",col="blue")
updn <- unique(c(up,dn))
fres_updn_bg <- fora(pathways=gsets, genes=updn, universe=bg, minSize = 5, maxSize = Inf)
fres_updn_bg <- subset(fres_updn_bg,padj<0.05)$pathway
length(fres_updn_bg)
## [1] 149
length(fres_up_bg)
## [1] 228
length(fres_dn_bg)
## [1] 128
length(unique(c(fres_up_bg,fres_dn_bg)))
## [1] 355
v1 <- list("up"=fres_up_bg,"dn"=fres_dn_bg,"both"=fres_updn_bg)
plot(euler(v1),quantities = list(cex = 2), labels = list(cex = 2))
msdb <- gmtPathways("msigdb.v2025.1.Hs.symbols.gmt")
kegg <- msdb[grep("KEGG",names(msdb))]
kegg_med <- msdb[grep("KEGG_MED",names(msdb))]
kegg <- kegg[grep("KEGG_MED",names(kegg),invert=TRUE)] #separate KEGG LEG from KEGG MED
gobp <- msdb[grep("GOBP",names(msdb))]
reactome <- msdb[grep("REACTOME",names(msdb))]
gslist <- list("KEGG"=kegg,"KEGGM"=kegg_med,"Reactome"=reactome,"GOBP"=gobp,"MSigDB"=msdb)
# no. gene sets
d1 <- unlist( lapply(gslist,length) )
# total number of annotations
d2 <- unlist( lapply(gslist,function(i) { length(unlist(i)) } ) )
# median gene set size
d3 <- unlist(lapply(gslist,function(i) { median(unlist(lapply(i,length))) } ))
# number of genes with one or more functions
d4 <- unlist( lapply(gslist,function(i) { length(unique(unlist(i))) } ))
d5 <- unlist(lapply(gslist,function(i) { median(unlist(lapply(i,length))) } ))
data.frame(d1,d2,d3,d4) %>%
kbl(caption="Gene set metrics") %>%
kable_paper("hover", full_width = F)
d1 | d2 | d3 | d4 | |
---|---|---|---|---|
KEGG | 186 | 12800 | 52.5 | 5245 |
KEGGM | 658 | 9662 | 11.5 | 2788 |
Reactome | 1787 | 97544 | 23.0 | 11369 |
GOBP | 7583 | 616242 | 20.0 | 18000 |
MSigDB | 35134 | 4089406 | 47.0 | 43351 |
For reproducibility
save.image("example_10m.Rdata")
sessionInfo()
## R version 4.2.3 (2023-03-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] GSEABase_1.60.0 graph_1.76.0
## [3] annotate_1.76.0 XML_3.99-0.14
## [5] AnnotationDbi_1.60.2 ExperimentHub_2.6.0
## [7] AnnotationHub_3.6.0 BiocFileCache_2.6.1
## [9] dbplyr_2.3.2 msigdb_1.6.0
## [11] RhpcBLASctl_0.23-42 fgsea_1.24.0
## [13] mitch_1.10.0 gplots_3.1.3
## [15] eulerr_7.0.0 clusterProfiler_4.6.2
## [17] kableExtra_1.3.4 DESeq2_1.38.3
## [19] SummarizedExperiment_1.28.0 Biobase_2.58.0
## [21] MatrixGenerics_1.10.0 matrixStats_0.63.0
## [23] GenomicRanges_1.50.2 GenomeInfoDb_1.34.9
## [25] IRanges_2.32.0 S4Vectors_0.36.2
## [27] BiocGenerics_0.44.0 getDEE2_1.8.0
##
## loaded via a namespace (and not attached):
## [1] shadowtext_0.1.2 fastmatch_1.1-3
## [3] systemfonts_1.0.4 plyr_1.8.8
## [5] igraph_1.4.2 lazyeval_0.2.2
## [7] polylabelr_0.2.0 splines_4.2.3
## [9] BiocParallel_1.32.6 ggplot2_3.4.2
## [11] digest_0.6.31 yulab.utils_0.0.6
## [13] htmltools_0.5.5 GOSemSim_2.24.0
## [15] viridis_0.6.2 GO.db_3.16.0
## [17] fansi_1.0.4 magrittr_2.0.3
## [19] memoise_2.0.1 Biostrings_2.66.0
## [21] graphlayouts_0.8.4 svglite_2.1.1
## [23] enrichplot_1.18.4 colorspace_2.1-0
## [25] rappdirs_0.3.3 blob_1.2.4
## [27] rvest_1.0.3 ggrepel_0.9.3
## [29] xfun_0.38 dplyr_1.1.1
## [31] crayon_1.5.2 RCurl_1.98-1.12
## [33] jsonlite_1.8.4 scatterpie_0.1.8
## [35] ape_5.7-1 glue_1.6.2
## [37] polyclip_1.10-4 gtable_0.3.3
## [39] zlibbioc_1.44.0 XVector_0.38.0
## [41] webshot_0.5.4 htm2txt_2.2.2
## [43] DelayedArray_0.24.0 scales_1.2.1
## [45] DOSE_3.24.2 GGally_2.1.2
## [47] DBI_1.1.3 Rcpp_1.0.10
## [49] viridisLite_0.4.1 xtable_1.8-4
## [51] gridGraphics_0.5-1 tidytree_0.4.2
## [53] bit_4.0.5 htmlwidgets_1.6.2
## [55] httr_1.4.5 RColorBrewer_1.1-3
## [57] ellipsis_0.3.2 reshape_0.8.9
## [59] pkgconfig_2.0.3 farver_2.1.1
## [61] sass_0.4.5 locfit_1.5-9.7
## [63] utf8_1.2.3 later_1.3.0
## [65] ggplotify_0.1.0 tidyselect_1.2.0
## [67] rlang_1.1.0 reshape2_1.4.4
## [69] BiocVersion_3.16.0 munsell_0.5.0
## [71] tools_4.2.3 cachem_1.0.7
## [73] downloader_0.4 cli_3.6.1
## [75] generics_0.1.3 RSQLite_2.3.1
## [77] gson_0.1.0 evaluate_0.20
## [79] stringr_1.5.0 fastmap_1.1.1
## [81] yaml_2.3.7 ggtree_3.6.2
## [83] knitr_1.42 bit64_4.0.5
## [85] tidygraph_1.2.3 caTools_1.18.2
## [87] purrr_1.0.1 KEGGREST_1.38.0
## [89] ggraph_2.1.0 nlme_3.1-162
## [91] mime_0.12 aplot_0.1.10
## [93] xml2_1.3.3 compiler_4.2.3
## [95] rstudioapi_0.14 interactiveDisplayBase_1.36.0
## [97] filelock_1.0.2 curl_5.0.0
## [99] beeswarm_0.4.0 png_0.1-8
## [101] treeio_1.22.0 tibble_3.2.1
## [103] tweenr_2.0.2 geneplotter_1.76.0
## [105] bslib_0.4.2 stringi_1.7.12
## [107] highr_0.10 lattice_0.21-8
## [109] Matrix_1.5-4 vctrs_0.6.2
## [111] pillar_1.9.0 lifecycle_1.0.3
## [113] BiocManager_1.30.20 jquerylib_0.1.4
## [115] data.table_1.14.8 cowplot_1.1.1
## [117] bitops_1.0-7 httpuv_1.6.9
## [119] patchwork_1.1.2 qvalue_2.30.0
## [121] R6_2.5.1 promises_1.2.0.1
## [123] echarts4r_0.4.4 KernSmooth_2.23-20
## [125] gridExtra_2.3 codetools_0.2-19
## [127] MASS_7.3-58.3 gtools_3.9.4
## [129] withr_2.5.0 GenomeInfoDbData_1.2.9
## [131] parallel_4.2.3 grid_4.2.3
## [133] ggfun_0.0.9 tidyr_1.3.0
## [135] HDO.db_0.99.1 rmarkdown_2.21
## [137] ggforce_0.4.1 shiny_1.7.4