Source: https://github.com/markziemann/background
This guide is a Rmarkdown script that conducts differential expression and enrichment analysis, which are very popular workflows for transcriptome data.
The goal of this work is to understand how ClusterProfiler manages the background list, as we have observed some weird behaviour. In particular we see that when we provide a custom background, those genes do not appear in the final analysis unless they are also included in the gene list.
In the code chunk below called libs
, you can add and remove required R library dependancies. Check that the libraries listed here match the Dockerfile, otherwise you might get errors.
suppressPackageStartupMessages({
library("getDEE2")
library("DESeq2")
library("kableExtra")
library("clusterProfiler")
library("fgsea")
library("eulerr")
library("gplots")
})
For this guide I will be using bulk RNA-seq data from a previous study, which is deposited at NCBI GEO and SRA under accession numbers: GSE55123 and SRP038101 (Lund et al, 2014). The experiment is designed to investigate the effect of Azacitidine treatment on AML3 cells.
The raw data have been processed by the DEE2 project, and the summary gene expression counts are available at the dee2.io website, and programmatically with the getDEE2 bioconductor package (Ziemann et al, 2019).
Alternatively, you could fetch data from another resource like NCBI GEO, Zenodo or from the host storage drive.
mdat <- getDEE2Metadata("hsapiens")
# get sample sheet
ss <- subset(mdat,SRP_accession=="SRP038101")
# fetch the whole set of RNA-seq data
x <- getDEE2("hsapiens",ss$SRR_accession , metadata=mdat, legacy=TRUE)
## For more information about DEE2 QC metrics, visit
## https://github.com/markziemann/dee2/blob/master/qc/qc_metrics.md
mx <- x$GeneCounts
rownames(mx) <- paste(rownames(mx),x$GeneInfo$GeneSymbol)
dim(mx)
## [1] 58302 6
# aza no filtering
ss$trt <- grepl("Treated",ss$Experiment_title)
ss %>%
kbl(caption="sample sheet for Aza treatment in AML3 cells") %>%
kable_paper("hover", full_width = F)
SRR_accession | QC_summary | SRX_accession | SRS_accession | SRP_accession | Experiment_title | GEO_series | trt | |
---|---|---|---|---|---|---|---|---|
156267 | SRR1171523 | PASS | SRX472607 | SRS559064 | SRP038101 | GSM1329859: Untreated.1; Homo sapiens; RNA-Seq | FALSE | |
156268 | SRR1171524 | WARN(3,4) | SRX472608 | SRS559066 | SRP038101 | GSM1329860: Untreated.2; Homo sapiens; RNA-Seq | FALSE | |
156269 | SRR1171525 | WARN(3,4) | SRX472609 | SRS559065 | SRP038101 | GSM1329861: Untreated.3; Homo sapiens; RNA-Seq | FALSE | |
156270 | SRR1171526 | WARN(3,4) | SRX472610 | SRS559068 | SRP038101 | GSM1329862: Treated.1; Homo sapiens; RNA-Seq | TRUE | |
156271 | SRR1171527 | WARN(3,4) | SRX472611 | SRS559067 | SRP038101 | GSM1329863: Treated.2; Homo sapiens; RNA-Seq | TRUE | |
156272 | SRR1171528 | WARN(3,4) | SRX472612 | SRS559069 | SRP038101 | GSM1329864: Treated.3; Homo sapiens; RNA-Seq | TRUE |
QC is important, even if you are using public transcriptome data. For RNA-seq it is a good idea to show the number of reads for each sample.
par(mar=c(5,7,5,1))
barplot(rev(colSums(mx)),horiz=TRUE,las=1,main="number of reads per sample in SRP038101")
Now make a MDS plot.
mds <- cmdscale(dist(t(mx)))
# expand plot area
XMIN=min(mds[,1])*1.3
XMAX=max(mds[,1])*1.3
YMIN=min(mds[,2])*1.3
YMAX=max(mds[,2])*1.3
cols <- as.character(grepl("Treated",ss$Experiment_title))
cols <- gsub("FALSE","lightblue",cols)
cols <- gsub("TRUE","pink",cols)
plot(mds, xlab="Coordinate 1", ylab="Coordinate 2",
xlim=c(XMIN,XMAX),ylim=c(YMIN,YMAX),
pch=19,cex=2,col=cols, main="MDS plot")
text(cmdscale(dist(t(mx))), labels=colnames(mx) )
I will be using DESeq2 for DE analysis, the count matrix is prefiltered using a detection threshold of 10 reads per sample across all samples. All genes that meet the detection threshold will comprise the background list. The first 6 rows of the count matrix are shows.
mxf <- mx[which(rowMeans(mx)>=10),]
dim(mxf)
## [1] 13168 6
head(mxf,6) %>%
kbl(caption="Count matrix format") %>%
kable_paper("hover", full_width = F)
SRR1171523 | SRR1171524 | SRR1171525 | SRR1171526 | SRR1171527 | SRR1171528 | |
---|---|---|---|---|---|---|
ENSG00000225630 MTND2P28 | 494 | 396 | 340 | 333 | 415 | 418 |
ENSG00000237973 MTCO1P12 | 52 | 39 | 40 | 30 | 37 | 29 |
ENSG00000248527 MTATP6P1 | 853 | 544 | 537 | 582 | 702 | 716 |
ENSG00000228327 AL669831.1 | 17 | 13 | 21 | 21 | 22 | 12 |
ENSG00000228794 LINC01128 | 42 | 27 | 30 | 32 | 40 | 23 |
ENSG00000230699 AL645608.3 | 20 | 11 | 13 | 10 | 15 | 22 |
dds <- DESeqDataSetFromMatrix(countData = mxf , colData = ss, design = ~ trt )
res <- DESeq(dds)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <-cbind(as.data.frame(z),mxf)
def <-as.data.frame(zz[order(zz$pvalue),])
head(def,10) %>%
kbl(caption="Top DE genes for Aza treatment") %>%
kable_paper("hover", full_width = F)
baseMean | log2FoldChange | lfcSE | stat | pvalue | padj | SRR1171523 | SRR1171524 | SRR1171525 | SRR1171526 | SRR1171527 | SRR1171528 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ENSG00000165949 IFI27 | 1960.1970 | -3.384492 | 0.0938869 | -36.04861 | 0 | 0 | 4689 | 3583 | 2758 | 309 | 384 | 334 |
ENSG00000090382 LYZ | 7596.0299 | -1.650342 | 0.0561143 | -29.41036 | 0 | 0 | 14212 | 10237 | 10789 | 3476 | 3764 | 3738 |
ENSG00000115461 IGFBP5 | 531.2217 | -5.071157 | 0.1795239 | -28.24781 | 0 | 0 | 1320 | 823 | 1025 | 23 | 26 | 43 |
ENSG00000157601 MX1 | 827.1511 | -2.877795 | 0.1047823 | -27.46450 | 0 | 0 | 1732 | 1501 | 1206 | 184 | 223 | 186 |
ENSG00000111331 OAS3 | 2127.2010 | -2.661214 | 0.0972124 | -27.37525 | 0 | 0 | 4204 | 3977 | 2972 | 562 | 614 | 560 |
ENSG00000070915 SLC12A3 | 424.5509 | -3.374852 | 0.1298671 | -25.98697 | 0 | 0 | 1012 | 721 | 653 | 63 | 85 | 76 |
ENSG00000234745 HLA-B | 3197.0159 | -1.431566 | 0.0604169 | -23.69479 | 0 | 0 | 6085 | 4256 | 4023 | 1590 | 1872 | 1719 |
ENSG00000137965 IFI44 | 409.0957 | -2.978581 | 0.1319352 | -22.57608 | 0 | 0 | 829 | 740 | 635 | 76 | 111 | 89 |
ENSG00000204525 HLA-C | 1631.6421 | -1.461550 | 0.0660214 | -22.13750 | 0 | 0 | 3112 | 2150 | 2106 | 791 | 923 | 891 |
ENSG00000110042 DTX4 | 524.1318 | -2.470219 | 0.1173182 | -21.05572 | 0 | 0 | 1166 | 883 | 688 | 166 | 168 | 145 |
Make a smear plot.
sigf <- subset(def,padj<=0.05)
DET=nrow(mxf)
NSIG=nrow(sigf)
NUP=nrow(subset(sigf,log2FoldChange>0))
NDN=nrow(subset(sigf,log2FoldChange<0))
HEADER=paste(DET,"detected genes;",NSIG,"w/FDR<0.05;",NUP,"up;",NDN,"down")
plot(log10(def$baseMean) ,def$log2FoldChange,
cex=0.6,pch=19,col="darkgray",
main="5-azacitidine treatment in AML3 cells",
xlab="log10(basemean)",ylab="log2(fold change)")
points(log10(sigf$baseMean) ,sigf$log2FoldChange,
cex=0.6,pch=19,col="red")
mtext(HEADER)
write.table(def,file="../../data/aza_rnaseq.tsv")
In the next sections I will run enrichment analysis with over-representation analysis (ORA) test and compare it to functional class scoring. I will also investigate some strange behviour of the ORA tool clusterprofiler.
Get the gene sets loaded in R. These are KEGG gene sets
# from https://www.gsea-msigdb.org/gsea/msigdb/human/collections.jsp 16th June 2023
genesets2 <- read.gmt("../../ref/c2.cp.kegg.v2023.1.Hs.symbols.gmt")
gsets <- gmtPathways("../../ref/c2.cp.kegg.v2023.1.Hs.symbols.gmt")
message(paste("number of genes described in the annotation set:",length(unique(genesets2$gene))))
## number of genes described in the annotation set: 5244
Now filter the gene names into three lists, up-regulated, down-regulated and background. Background is simply all genes that were detected.
defup <- rownames(subset(def,padj<=0.05 & log2FoldChange>0))
defup <- unique(sapply(strsplit(defup," "),"[[",2))
defdn <- rownames(subset(def,padj<=0.05 & log2FoldChange<0))
defdn <- unique(sapply(strsplit(defdn," "),"[[",2))
bg <- rownames(def)
bg <- unique(sapply(strsplit(bg," "),"[[",2))
message(paste("number of genes in background:",length(bg)))
## number of genes in background: 13164
writeLines(text=unique(defup),con="aza_up.txt")
writeLines(text=unique(defdn),con="aza_dn.txt")
writeLines(text=unique(bg),con="aza_bg.txt")
I’ve compiled a reporting checklist:
Reporting criteria | Method/resource used |
---|---|
Origin of gene sets | KEGG (2023-06-16) |
Tool used | ClusterProfiler (check version at foot of report) |
Statistical test used | hypergeometric test |
P-value correction for multiple comparisons | FDR method |
Background list | Genes with >=10 reads per sample on average across all samples |
Gene Selection Criteria | DESeq2 FDR<0.05 |
ORA directionality | Separate tests for up- and down-regulation |
Data availability | via DEE2 at accession SRP038101 (human) |
Other parameters | Min gene set size of 10 |
Here I provide a custom background gene list.
Enrichment firstly with upregulated genes
n_pw=10
ora_up <- as.data.frame(enricher(gene = defup ,
universe = bg, maxGSSize = 500000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
ora_up$geneID <- NULL
ora_up <- subset(ora_up,p.adjust<0.05 & Count >=5)
ora_ups <- rownames(ora_up)
gr <- as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",2))
ora_up$es <- gr/br
ora_up <- ora_up[order(-ora_up$es),]
ora_up$Description=NULL
head(ora_up) %>%
kbl(row.names = FALSE, caption="Top upregulated pathways in Aza treatment") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
KEGG_MISMATCH_REPAIR | 10/406 | 22/3134 | 0.0001811 | 0.0044696 | 0.0042686 | 10 | 3.508733 |
KEGG_DNA_REPLICATION | 15/406 | 35/3134 | 0.0000104 | 0.0005222 | 0.0004987 | 15 | 3.308234 |
KEGG_CELL_CYCLE | 44/406 | 115/3134 | 0.0000000 | 0.0000000 | 0.0000000 | 44 | 2.953438 |
KEGG_NUCLEOTIDE_EXCISION_REPAIR | 16/406 | 42/3134 | 0.0000314 | 0.0009481 | 0.0009054 | 16 | 2.940652 |
KEGG_BASE_EXCISION_REPAIR | 12/406 | 33/3134 | 0.0005182 | 0.0086936 | 0.0083027 | 12 | 2.806986 |
KEGG_RNA_POLYMERASE | 9/406 | 26/3134 | 0.0038482 | 0.0484237 | 0.0462464 | 9 | 2.672035 |
topup2 <- rev(head(ora_up$es,10))
names(topup2) <- rev(head(ora_up$ID,10))
The reported background size does not match the dataset.
Now repeat with the downregulated genes
ora_dn <- as.data.frame(enricher(gene = defdn ,
universe = bg, maxGSSize = 500000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
ora_dn$geneID <- NULL
ora_dn <- subset(ora_dn,p.adjust<0.05 & Count >=5)
ora_dns <- rownames(ora_dn)
gr <- as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",2))
ora_dn$es <- gr/br
ora_dn <- ora_dn[order(-ora_dn$es),]
ora_dn$Description=NULL
head(ora_dn,n_pw) %>%
kbl(row.names = FALSE, caption="Top downregulated pathways in Aza treatment") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
KEGG_ALLOGRAFT_REJECTION | 8/608 | 11/3134 | 0.0001822 | 0.0030068 | 0.0023402 | 8 | 3.748804 |
KEGG_GRAFT_VERSUS_HOST_DISEASE | 8/608 | 11/3134 | 0.0001822 | 0.0030068 | 0.0023402 | 8 | 3.748804 |
KEGG_AUTOIMMUNE_THYROID_DISEASE | 7/608 | 10/3134 | 0.0006959 | 0.0073552 | 0.0057246 | 7 | 3.608224 |
KEGG_TYPE_I_DIABETES_MELLITUS | 9/608 | 14/3134 | 0.0002893 | 0.0039782 | 0.0030963 | 9 | 3.313675 |
KEGG_OTHER_GLYCAN_DEGRADATION | 7/608 | 13/3134 | 0.0057712 | 0.0380901 | 0.0296459 | 7 | 2.775557 |
KEGG_CELL_ADHESION_MOLECULES_CAMS | 30/608 | 59/3134 | 0.0000000 | 0.0000037 | 0.0000028 | 30 | 2.620986 |
KEGG_HEMATOPOIETIC_CELL_LINEAGE | 19/608 | 39/3134 | 0.0000311 | 0.0010268 | 0.0007992 | 19 | 2.511218 |
KEGG_COMPLEMENT_AND_COAGULATION_CASCADES | 10/608 | 21/3134 | 0.0031211 | 0.0245228 | 0.0190863 | 10 | 2.454574 |
KEGG_LEISHMANIA_INFECTION | 23/608 | 49/3134 | 0.0000100 | 0.0005478 | 0.0004264 | 23 | 2.419509 |
KEGG_SYSTEMIC_LUPUS_ERYTHEMATOSUS | 9/608 | 20/3134 | 0.0079453 | 0.0485544 | 0.0377903 | 9 | 2.319572 |
topdn2 <- head(ora_dn$es,n_pw)
names(topdn2) <- head(ora_dn$ID,n_pw)
Make a barplot
par(mar=c(5,20,5,1))
cols <- c(rep("blue",n_pw),rep("red",n_pw))
top <- c(topdn2,topup2)
if ( length(top) > 1 ) {
barplot(c(top),
horiz=TRUE,las=1,cex.names=0.65,col=cols,
main="top DE KEGGs",
xlab="ES",
cex.main=0.9)
mtext("ORA test")
}
I’ve compiled a reporting checklist:
Reporting criteria | Method/resource used |
---|---|
Origin of gene sets | KEGG (2023-06-16) |
Tool used | ClusterProfiler (check version at foot of report) |
Statistical test used | hypergeometric test |
P-value correction for multiple comparisons | FDR method |
Background list | Genes with >=10 reads per sample on average across all samples |
Gene Selection Criteria | DESeq2 FDR<0.05 |
ORA directionality | Separate tests for up- and down-regulation |
Data availability | via DEE2 at accession SRP038101 (human) |
Other parameters | Min gene set size of 10 + BUG FIX |
Here I provide a background gene list for cluterprofiler to use, like a user should.
I have also done some modification to the KEGG gene sets, to en
Now filter the gene names into three lists, up-regulated, down-regulated and background. Background is simply all genes that were detected.
defup <- rownames(subset(def,padj<=0.05 & log2FoldChange>0))
defup <- unique(sapply(strsplit(defup," "),"[[",2))
defdn <- rownames(subset(def,padj<=0.05 & log2FoldChange<0))
defdn <- unique(sapply(strsplit(defdn," "),"[[",2))
bg <- rownames(def)
bg <- unique(sapply(strsplit(bg," "),"[[",2))
message(paste("number of genes in background:",length(bg)))
## number of genes in background: 13164
Adding all detected genes to the background appears to improve results!
bgdf <- data.frame("background",bg)
colnames(bgdf) <- c("term","gene")
genesets2 <- rbind(genesets2,bgdf)
Enrichment firstly with upregulated genes
n_pw=10
orafix_up <- as.data.frame(enricher(gene = defup ,
universe = bg, maxGSSize = 500000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
orafix_up$geneID <- NULL
orafix_up <- subset(orafix_up,p.adjust<0.05 & Count >=5)
orafix_ups <- rownames(orafix_up)
gr <- as.numeric(sapply(strsplit(orafix_up$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(orafix_up$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(orafix_up$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(orafix_up$BgRatio,"/"),"[[",2))
orafix_up$es <- gr/br
orafix_up <- orafix_up[order(-orafix_up$es),]
orafix_up$Description=NULL
head(orafix_up) %>%
kbl(row.names = FALSE, caption="Top upregulated pathways in Aza treatment") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
KEGG_MISMATCH_REPAIR | 10/1672 | 22/13164 | 0.0001611 | 0.0039535 | 0.0037783 | 10 | 3.578730 |
KEGG_DNA_REPLICATION | 15/1672 | 35/13164 | 0.0000091 | 0.0004610 | 0.0004406 | 15 | 3.374231 |
KEGG_CELL_CYCLE | 44/1672 | 115/13164 | 0.0000000 | 0.0000000 | 0.0000000 | 44 | 3.012357 |
KEGG_NUCLEOTIDE_EXCISION_REPAIR | 16/1672 | 42/13164 | 0.0000275 | 0.0008364 | 0.0007993 | 16 | 2.999316 |
KEGG_BASE_EXCISION_REPAIR | 12/1672 | 33/13164 | 0.0004583 | 0.0077393 | 0.0073963 | 12 | 2.862984 |
KEGG_RNA_POLYMERASE | 9/1672 | 26/13164 | 0.0034659 | 0.0439015 | 0.0419557 | 9 | 2.725340 |
topup2 <- rev(head(orafix_up$es,10))
names(topup2) <- rev(head(orafix_up$ID,10))
Now with the downregulated genes
orafix_dn <- as.data.frame(enricher(gene = defdn ,
universe = bg, maxGSSize = 500000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
orafix_dn$geneID <- NULL
orafix_dn <- subset(orafix_dn,p.adjust<0.05 & Count >=5)
orafix_dns <- rownames(orafix_dn)
gr <- as.numeric(sapply(strsplit(orafix_dn$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(orafix_dn$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(orafix_dn$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(orafix_dn$BgRatio,"/"),"[[",2))
orafix_dn$es <- gr/br
orafix_dn <- orafix_dn[order(-orafix_dn$es),]
orafix_dn$Description=NULL
head(orafix_dn,n_pw) %>%
kbl(row.names = FALSE, caption="Top downregulated pathways in Aza treatment") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
KEGG_ALLOGRAFT_REJECTION | 8/1926 | 11/13164 | 0.0000226 | 0.0002255 | 0.0001444 | 8 | 4.970830 |
KEGG_GRAFT_VERSUS_HOST_DISEASE | 8/1926 | 11/13164 | 0.0000226 | 0.0002255 | 0.0001444 | 8 | 4.970830 |
KEGG_AUTOIMMUNE_THYROID_DISEASE | 7/1926 | 10/13164 | 0.0001134 | 0.0008184 | 0.0005241 | 7 | 4.784424 |
KEGG_TYPE_I_DIABETES_MELLITUS | 9/1926 | 14/13164 | 0.0000300 | 0.0002493 | 0.0001597 | 9 | 4.393859 |
KEGG_OTHER_GLYCAN_DEGRADATION | 7/1926 | 13/13164 | 0.0010809 | 0.0058674 | 0.0037578 | 7 | 3.680326 |
KEGG_CELL_ADHESION_MOLECULES_CAMS | 30/1926 | 59/13164 | 0.0000000 | 0.0000000 | 0.0000000 | 30 | 3.475368 |
KEGG_INTESTINAL_IMMUNE_NETWORK_FOR_IGA_PRODUCTION | 7/1926 | 14/13164 | 0.0018899 | 0.0091194 | 0.0058406 | 7 | 3.417445 |
KEGG_HEMATOPOIETIC_CELL_LINEAGE | 19/1926 | 39/13164 | 0.0000005 | 0.0000109 | 0.0000070 | 19 | 3.329819 |
KEGG_COMPLEMENT_AND_COAGULATION_CASCADES | 10/1926 | 21/13164 | 0.0003293 | 0.0021863 | 0.0014002 | 10 | 3.254710 |
KEGG_LEISHMANIA_INFECTION | 23/1926 | 49/13164 | 0.0000001 | 0.0000034 | 0.0000022 | 23 | 3.208214 |
topdn2 <- head(orafix_dn$es,n_pw)
names(topdn2) <- head(orafix_dn$ID,n_pw)
Make a barplot
par(mar=c(5,20,5,1))
cols <- c(rep("blue",n_pw),rep("red",n_pw))
top <- c(topdn2,topup2)
if ( length(top) > 1 ) {
barplot(c(top),
horiz=TRUE,las=1,cex.names=0.65,col=cols,
main="top DE KEGGs",
xlab="ES",
cex.main=0.9)
mtext("ORA fix")
}
v1 <- list("ORA up"=ora_ups, "ORA fix up"=orafix_ups)
v2 <- list("ORA dn"=ora_dns, "ORA fix dn"=orafix_dns)
v3 <- list("ORA"=union(ora_ups,ora_dns),
"ORA fix"=union(orafix_ups,orafix_dns))
par(mar=c(10,10,10,10))
par(mfrow=c(2,1))
plot(euler(v1),quantities = list(cex = 2), labels = list(cex = 2),main="upregulated KEGG pathways")
plot(euler(v2),quantities = list(cex = 2), labels = list(cex = 2),main="downregulated KEGG pathways")
plot(euler(v3),quantities = list(cex = 2), labels = list(cex = 2),main="up- and down-regulated KEGG pathways")
svg("rna_kegg_euler.svg")
plot(euler(v3),quantities = list(cex = 2), labels = list(cex = 2),main="up- and down-regulated KEGG pathways")
dev.off()
## png
## 2
Jaccard index comparing ORA and ORA fix.
jaccard <- function(a, b) {
intersection = length(intersect(a, b))
union = length(a) + length(b) - intersection
return (intersection/union)
}
ora <- c(ora_ups,ora_dns)
orafix <- c(orafix_ups,orafix_dns)
jaccard(ora,orafix)
## [1] 0.5909091
List gene sets identified only in the “fixed” analysis.
ora <- union(ora_ups,ora_dns)
orafix <- union(orafix_ups,orafix_dns)
setdiff(orafix,ora)
## [1] "KEGG_CALCIUM_SIGNALING_PATHWAY"
## [2] "KEGG_GAP_JUNCTION"
## [3] "KEGG_GNRH_SIGNALING_PATHWAY"
## [4] "KEGG_GLYCOLYSIS_GLUCONEOGENESIS"
## [5] "KEGG_FC_GAMMA_R_MEDIATED_PHAGOCYTOSIS"
## [6] "KEGG_ACUTE_MYELOID_LEUKEMIA"
## [7] "KEGG_INTESTINAL_IMMUNE_NETWORK_FOR_IGA_PRODUCTION"
## [8] "KEGG_TOLL_LIKE_RECEPTOR_SIGNALING_PATHWAY"
## [9] "KEGG_B_CELL_RECEPTOR_SIGNALING_PATHWAY"
## [10] "KEGG_ENDOCYTOSIS"
## [11] "KEGG_APOPTOSIS"
## [12] "KEGG_GLUTATHIONE_METABOLISM"
## [13] "KEGG_TGF_BETA_SIGNALING_PATHWAY"
## [14] "KEGG_GLIOMA"
## [15] "KEGG_NOTCH_SIGNALING_PATHWAY"
## [16] "KEGG_MELANOGENESIS"
## [17] "KEGG_GLYCOSAMINOGLYCAN_DEGRADATION"
## [18] "KEGG_RIG_I_LIKE_RECEPTOR_SIGNALING_PATHWAY"
## [19] "KEGG_AXON_GUIDANCE"
## [20] "KEGG_TYPE_II_DIABETES_MELLITUS"
## [21] "KEGG_OLFACTORY_TRANSDUCTION"
## [22] "KEGG_MELANOMA"
## [23] "KEGG_CHRONIC_MYELOID_LEUKEMIA"
## [24] "KEGG_ARGININE_AND_PROLINE_METABOLISM"
## [25] "KEGG_STEROID_BIOSYNTHESIS"
## [26] "KEGG_P53_SIGNALING_PATHWAY"
## [27] "KEGG_SPHINGOLIPID_METABOLISM"
I’ve compiled a reporting checklist:
Reporting criteria | Method/resource used |
---|---|
Origin of gene sets | Reactome (2023-03-06) |
Tool used | ClusterProfiler (check version at foot of report) |
Statistical test used | hypergeometric test |
P-value correction for multiple comparisons | FDR method |
Background list | Genes with >=10 reads per sample on average across all samples |
Gene Selection Criteria | DESeq2 FDR<0.05 |
ORA directionality | Separate tests for up- and down-regulation |
Data availability | via DEE2 at accession SRP038101 (human) |
Other parameters | Min gene set size of 10 |
Here I provide a custom background gene list.
Get the gene sets loaded in R. These are Reactome gene sets
genesets2 <- read.gmt("../../ref/ReactomePathways_2023-03-06.gmt")
gsets <- gmtPathways("../../ref/ReactomePathways_2023-03-06.gmt")
message(paste("number of genes described in the annotation set:",length(unique(genesets2$gene))))
## number of genes described in the annotation set: 11457
Enrichment firstly with upregulated genes
n_pw=10
ora_up <- as.data.frame(enricher(gene = defup ,
universe = bg, maxGSSize = 500000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
ora_up$geneID <- NULL
ora_up <- subset(ora_up,p.adjust<0.05 & Count >=5)
ora_ups <- rownames(ora_up)
gr <- as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",2))
ora_up$es <- gr/br
ora_up <- ora_up[order(-ora_up$es),]
ora_up$Description=NULL
head(ora_up) %>%
kbl(row.names = FALSE, caption="Top upregulated pathways in Aza treatment") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
Condensation of Prometaphase Chromosomes | 9/1050 | 11/6944 | 0.0000017 | 0.0000309 | 0.0000259 | 9 | 5.410909 |
Folding of actin by CCT/TriC | 7/1050 | 10/6944 | 0.0001398 | 0.0016736 | 0.0014045 | 7 | 4.629333 |
Interactions of Rev with host cellular proteins | 24/1050 | 36/6944 | 0.0000000 | 0.0000000 | 0.0000000 | 24 | 4.408889 |
Nuclear import of Rev protein | 22/1050 | 33/6944 | 0.0000000 | 0.0000000 | 0.0000000 | 22 | 4.408889 |
Postmitotic nuclear pore complex (NPC) reformation | 18/1050 | 27/6944 | 0.0000000 | 0.0000001 | 0.0000001 | 18 | 4.408889 |
Rev-mediated nuclear export of HIV RNA | 22/1050 | 34/6944 | 0.0000000 | 0.0000000 | 0.0000000 | 22 | 4.279216 |
topup2 <- rev(head(ora_up$es,10))
names(topup2) <- rev(head(ora_up$ID,10))
Now repeat with the downregulated genes
ora_dn <- as.data.frame(enricher(gene = defdn ,
universe = bg, maxGSSize = 500000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
ora_dn$geneID <- NULL
ora_dn <- subset(ora_dn,p.adjust<0.05 & Count >=5)
ora_dns <- rownames(ora_dn)
gr <- as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",2))
ora_dn$es <- gr/br
ora_dn <- ora_dn[order(-ora_dn$es),]
ora_dn$Description=NULL
head(ora_dn,n_pw) %>%
kbl(row.names = FALSE, caption="Top downregulated pathways in Aza treatment") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
Interleukin-20 family signaling | 10/1178 | 12/6944 | 0.0000009 | 0.0000594 | 0.0000530 | 10 | 4.912281 |
Endosomal/Vacuolar pathway | 9/1178 | 12/6944 | 0.0000152 | 0.0007214 | 0.0006439 | 9 | 4.421053 |
Interferon alpha/beta signaling | 41/1178 | 55/6944 | 0.0000000 | 0.0000000 | 0.0000000 | 41 | 4.394258 |
Interferon gamma signaling | 40/1178 | 63/6944 | 0.0000000 | 0.0000000 | 0.0000000 | 40 | 3.742690 |
Interleukin-10 signaling | 17/1178 | 27/6944 | 0.0000001 | 0.0000104 | 0.0000093 | 17 | 3.711501 |
Transcriptional regulation of pluripotent stem cells | 7/1178 | 12/6944 | 0.0014257 | 0.0337226 | 0.0301007 | 7 | 3.438597 |
IRAK4 deficiency (TLR2/4) | 8/1178 | 14/6944 | 0.0007663 | 0.0229901 | 0.0205209 | 8 | 3.368421 |
Binding and Uptake of Ligands by Scavenger Receptors | 9/1178 | 16/6944 | 0.0004129 | 0.0141085 | 0.0125932 | 9 | 3.315790 |
Other interleukin signaling | 10/1178 | 18/6944 | 0.0002230 | 0.0080676 | 0.0072011 | 10 | 3.274854 |
Formation of the cornified envelope | 8/1178 | 15/6944 | 0.0013992 | 0.0337226 | 0.0301007 | 8 | 3.143860 |
topdn2 <- head(ora_dn$es,n_pw)
names(topdn2) <- head(ora_dn$ID,n_pw)
Make a barplot
par(mar=c(5,20,5,1))
cols <- c(rep("blue",n_pw),rep("red",n_pw))
top <- c(topdn2,topup2)
if ( length(top) > 1 ) {
barplot(c(top),
horiz=TRUE,las=1,cex.names=0.65,col=cols,
main="top DE Reactomes",
xlab="ES",
cex.main=0.9)
mtext("ORA test")
}
I’ve compiled a reporting checklist:
Reporting criteria | Method/resource used |
---|---|
Origin of gene sets | Reactome (2023-03-06) |
Tool used | ClusterProfiler (check version at foot of report) |
Statistical test used | hypergeometric test |
P-value correction for multiple comparisons | FDR method |
Background list | Genes with >=10 reads per sample on average across all samples |
Gene Selection Criteria | DESeq2 FDR<0.05 |
ORA directionality | Separate tests for up- and down-regulation |
Data availability | via DEE2 at accession SRP038101 (human) |
Other parameters | Min gene set size of 10 + BUG FIX |
Adding all detected genes to the background appears to improve results!
bgdf <- data.frame("background",bg)
colnames(bgdf) <- c("term","gene")
genesets2 <- rbind(genesets2,bgdf)
Enrichment firstly with upregulated genes
orafix_up <- as.data.frame(enricher(gene = defup ,
universe = bg, maxGSSize = 500000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
orafix_up$geneID <- NULL
orafix_up <- subset(orafix_up,p.adjust<0.05 & Count >=5)
orafix_ups <- rownames(orafix_up)
gr <- as.numeric(sapply(strsplit(orafix_up$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(orafix_up$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(orafix_up$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(orafix_up$BgRatio,"/"),"[[",2))
orafix_up$es <- gr/br
orafix_up <- orafix_up[order(-orafix_up$es),]
orafix_up$Description=NULL
head(orafix_up) %>%
kbl(row.names = FALSE, caption="Top upregulated pathways in Aza treatment") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
Condensation of Prometaphase Chromosomes | 9/1672 | 11/13164 | 4.00e-07 | 0.0000055 | 0.0000043 | 9 | 6.441714 |
Folding of actin by CCT/TriC | 7/1672 | 10/13164 | 4.45e-05 | 0.0004779 | 0.0003726 | 7 | 5.511244 |
Interactions of Rev with host cellular proteins | 24/1672 | 36/13164 | 0.00e+00 | 0.0000000 | 0.0000000 | 24 | 5.248804 |
Nuclear import of Rev protein | 22/1672 | 33/13164 | 0.00e+00 | 0.0000000 | 0.0000000 | 22 | 5.248804 |
Postmitotic nuclear pore complex (NPC) reformation | 18/1672 | 27/13164 | 0.00e+00 | 0.0000000 | 0.0000000 | 18 | 5.248804 |
Rev-mediated nuclear export of HIV RNA | 22/1672 | 34/13164 | 0.00e+00 | 0.0000000 | 0.0000000 | 22 | 5.094427 |
topup2 <- rev(head(orafix_up$es,10))
names(topup2) <- rev(head(orafix_up$ID,10))
Now with the downregulated genes
n_pw=10
orafix_dn <- as.data.frame(enricher(gene = defdn ,
universe = bg, maxGSSize = 500000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
orafix_dn$geneID <- NULL
orafix_dn <- subset(orafix_dn,p.adjust<0.05 & Count >=5)
orafix_dns <- rownames(orafix_dn)
gr <- as.numeric(sapply(strsplit(orafix_dn$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(orafix_dn$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(orafix_dn$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(orafix_dn$BgRatio,"/"),"[[",2))
orafix_dn$es <- gr/br
orafix_dn <- orafix_dn[order(-orafix_dn$es),]
orafix_dn$Description=NULL
head(orafix_dn,n_pw) %>%
kbl(row.names = FALSE, caption="Top downregulated pathways in Aza treatment") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
Interleukin-20 family signaling | 10/1926 | 12/13164 | 0.0000002 | 0.0000108 | 0.0000089 | 10 | 5.695742 |
Endosomal/Vacuolar pathway | 9/1926 | 12/13164 | 0.0000044 | 0.0001919 | 0.0001589 | 9 | 5.126168 |
Interferon alpha/beta signaling | 41/1926 | 55/13164 | 0.0000000 | 0.0000000 | 0.0000000 | 41 | 5.095101 |
Interferon gamma signaling | 40/1926 | 63/13164 | 0.0000000 | 0.0000000 | 0.0000000 | 40 | 4.339613 |
Interleukin-10 signaling | 17/1926 | 27/13164 | 0.0000000 | 0.0000009 | 0.0000007 | 17 | 4.303450 |
TNFs bind their physiological receptors | 6/1926 | 10/13164 | 0.0012021 | 0.0196365 | 0.0162539 | 6 | 4.100935 |
Transcriptional regulation of pluripotent stem cells | 7/1926 | 12/13164 | 0.0005709 | 0.0111549 | 0.0092334 | 7 | 3.987020 |
IRAK4 deficiency (TLR2/4) | 8/1926 | 14/13164 | 0.0002719 | 0.0066947 | 0.0055415 | 8 | 3.905652 |
Binding and Uptake of Ligands by Scavenger Receptors | 9/1926 | 16/13164 | 0.0001299 | 0.0041000 | 0.0033937 | 9 | 3.844626 |
Other interleukin signaling | 10/1926 | 18/13164 | 0.0000622 | 0.0021881 | 0.0018112 | 10 | 3.797162 |
topdn2 <- head(orafix_dn$es,n_pw)
names(topdn2) <- head(orafix_dn$ID,n_pw)
Make a barplot
par(mar=c(5,20,5,1))
cols <- c(rep("blue",n_pw),rep("red",n_pw))
top <- c(topdn2,topup2)
if ( length(top) > 1 ) {
barplot(c(top),
horiz=TRUE,las=1,cex.names=0.65,col=cols,
main="top DE Reactomes",
xlab="ES",
cex.main=0.9)
mtext("ORA fix")
}
Upregulated genes.
fres_up <- fora(pathways = gsets, genes = defup, universe = bg,
minSize = 5, maxSize = 500000)
fora_up <- subset(fres_up,padj<0.05)$pathway
Downregulated genes.
fres_dn <- fora(pathways = gsets, genes = defdn, universe = bg,
minSize = 5, maxSize = 500000)
fora_dn <- subset(fres_dn,padj<0.05)$pathway
Compare clusterprofiler approaches.
v1 <- list("ORA up"=ora_ups, "ORA fix up"=orafix_ups)
v2 <- list("ORA dn"=ora_dns, "ORA fix dn"=orafix_dns)
v3 <- list("ORA"=union(ora_ups,ora_dns),
"ORA fix"=union(orafix_ups,orafix_dns))
par(mar=c(10,10,10,10))
par(mfrow=c(2,1))
plot(euler(v1),quantities = list(cex = 2), labels = list(cex = 2),main="upregulated Reactomes")
plot(euler(v2),quantities = list(cex = 2), labels = list(cex = 2),main="downregulated Reactomes")
plot(euler(v3),quantities = list(cex = 2), labels = list(cex = 2),main="up- and down-regulated Reactomes")
svg("rna_reactome_euler.svg")
plot(euler(v3),quantities = list(cex = 2), labels = list(cex = 2),main="up- and down-regulated Reactomes")
dev.off()
## png
## 2
Jaccard index comparing ORA and ORA fix.
jaccard <- function(a, b) {
intersection = length(intersect(a, b))
union = length(a) + length(b) - intersection
return (intersection/union)
}
ora <- c(ora_ups,ora_dns)
orafix <- c(orafix_ups,orafix_dns)
jaccard(ora,orafix)
## [1] 0.6104972
List gene sets identified only in the “fixed” analysis.
ora <- union(ora_ups,ora_dns)
orafix <- union(orafix_ups,orafix_dns)
setdiff(orafix,ora)
## [1] "Signaling by Rho GTPases, Miro GTPases and RHOBTB3"
## [2] "RNA Polymerase II Transcription"
## [3] "Post-translational protein modification"
## [4] "Signaling by Rho GTPases"
## [5] "Metabolism of proteins"
## [6] "rRNA processing"
## [7] "rRNA processing in the nucleus and cytosol"
## [8] "Signaling by Nuclear Receptors"
## [9] "SARS-CoV-2-host interactions"
## [10] "APC/C-mediated degradation of cell cycle proteins"
## [11] "Regulation of mitotic cell cycle"
## [12] "Disease"
## [13] "SARS-CoV Infections"
## [14] "Major pathway of rRNA processing in the nucleolus and cytosol"
## [15] "Anchoring of the basal body to the plasma membrane"
## [16] "Generic Transcription Pathway"
## [17] "DNA Replication Pre-Initiation"
## [18] "Regulation of APC/C activators between G1/S and early anaphase"
## [19] "Translesion synthesis by Y family DNA polymerases bypasses lesions on DNA template"
## [20] "Chaperonin-mediated protein folding"
## [21] "Positive epigenetic regulation of rRNA expression"
## [22] "Disorders of transmembrane transporters"
## [23] "Cellular responses to stress"
## [24] "Protein folding"
## [25] "ESR-mediated signaling"
## [26] "Cooperation of Prefoldin and TriC/CCT in actin and tubulin folding"
## [27] "RNA Polymerase I Transcription Termination"
## [28] "Chromatin modifying enzymes"
## [29] "Chromatin organization"
## [30] "SARS-CoV-2 Infection"
## [31] "Cellular responses to stimuli"
## [32] "RHOBTB GTPase Cycle"
## [33] "Recruitment of NuMA to mitotic centrosomes"
## [34] "Centrosome maturation"
## [35] "Recruitment of mitotic centrosome proteins and complexes"
## [36] "RNA Polymerase III Transcription Initiation From Type 2 Promoter"
## [37] "Loss of Nlp from mitotic centrosomes"
## [38] "Loss of proteins required for interphase microtubule organization from the centrosome"
## [39] "Regulation of mRNA stability by proteins that bind AU-rich elements"
## [40] "Influenza Infection"
## [41] "E2F mediated regulation of DNA replication"
## [42] "Regulation of PLK1 Activity at G2/M Transition"
## [43] "Cooperation of PDCL (PhLP1) and TRiC/CCT in G-protein beta folding"
## [44] "RNA Polymerase III Transcription Initiation From Type 1 Promoter"
## [45] "Assembly of the ORC complex at the origin of replication"
## [46] "SLBP independent Processing of Histone Pre-mRNAs"
## [47] "Nucleotide biosynthesis"
## [48] "Processing of SMDT1"
## [49] "Activation of APC/C and APC/C:Cdc20 mediated degradation of mitotic proteins"
## [50] "Meiotic recombination"
## [51] "B-WICH complex positively regulates rRNA expression"
## [52] "RNA Polymerase III Abortive And Retractive Initiation"
## [53] "RNA Polymerase III Transcription"
## [54] "RND2 GTPase cycle"
## [55] "Regulation of cholesterol biosynthesis by SREBP (SREBF)"
## [56] "Cyclin E associated events during G1/S transition"
## [57] "Prefoldin mediated transfer of substrate to CCT/TriC"
## [58] "mRNA 3'-end processing"
## [59] "Aberrant regulation of mitotic G1/S transition in cancer due to RB1 defects"
## [60] "Defective binding of RB1 mutants to E2F1,(E2F2, E2F3)"
## [61] "Aberrant regulation of mitotic cell cycle due to RB1 defects"
## [62] "Activation of BH3-only proteins"
## [63] "Formation of the beta-catenin:TCF transactivating complex"
## [64] "SLBP Dependent Processing of Replication-Dependent Histone Pre-mRNAs"
## [65] "The role of GTSE1 in G2/M progression after G2 checkpoint"
## [66] "Cyclin A:Cdk2-associated events at S phase entry"
## [67] "DNA Double Strand Break Response"
## [68] "Deubiquitination"
## [69] "Developmental Biology"
## [70] "Biological oxidations"
## [71] "Leishmania infection"
## [72] "Neuronal System"
## [73] "RAC1 GTPase cycle"
## [74] "Toll-like Receptor Cascades"
## [75] "Cell-Cell communication"
## [76] "PI5P, PP2A and IER3 Regulate PI3K/AKT Signaling"
## [77] "Muscle contraction"
## [78] "Transmission across Chemical Synapses"
## [79] "Antigen processing-Cross presentation"
## [80] "RHOA GTPase cycle"
## [81] "Semaphorin interactions"
## [82] "Interleukin-3, Interleukin-5 and GM-CSF signaling"
## [83] "Collagen biosynthesis and modifying enzymes"
## [84] "Negative regulation of the PI3K/AKT network"
## [85] "Attachment and Entry_9694614"
## [86] "MyD88 deficiency (TLR2/4)"
## [87] "Toll Like Receptor 4 (TLR4) Cascade"
## [88] "TNFs bind their physiological receptors"
## [89] "Constitutive Signaling by Aberrant PI3K in Cancer"
## [90] "Phase I - Functionalization of compounds"
## [91] "Glycosaminoglycan metabolism"
## [92] "Signaling by cytosolic FGFR1 fusion mutants"
## [93] "Signaling by SCF-KIT"
## [94] "Non-integrin membrane-ECM interactions"
## [95] "Metabolism of fat-soluble vitamins"
## [96] "Diseases of signal transduction by growth factor receptors and second messengers"
## [97] "Ephrin signaling"
## [98] "Sema3A PAK dependent Axon repulsion"
## [99] "ER-Phagosome pathway"
## [100] "Signaling by PDGF"
## [101] "CDC42 GTPase cycle"
## [102] "Glycosphingolipid metabolism"
## [103] "Opioid Signalling"
## [104] "Elastic fibre formation"
## [105] "Interleukin-15 signaling"
## [106] "Regulation of TLR by endogenous ligand"
## [107] "Assembly and cell surface presentation of NMDA receptors"
## [108] "RAF-independent MAPK1/3 activation"
## [109] "Collagen formation"
## [110] "Heparan sulfate/heparin (HS-GAG) metabolism"
## [111] "Signaling by NOTCH3"
## [112] "MyD88:MAL(TIRAP) cascade initiated on plasma membrane"
## [113] "Toll Like Receptor 2 (TLR2) Cascade"
## [114] "Toll Like Receptor TLR1:TLR2 Cascade"
## [115] "Toll Like Receptor TLR6:TLR2 Cascade"
## [116] "RHO GTPase cycle"
## [117] "Nuclear Events (kinase and transcription factor activation)"
## [118] "Chondroitin sulfate/dermatan sulfate metabolism"
## [119] "Activation of Matrix Metalloproteinases"
## [120] "Metabolism of steroid hormones"
## [121] "The NLRP3 inflammasome"
## [122] "Neurotransmitter receptors and postsynaptic signal transmission"
## [123] "Amyloid fiber formation"
## [124] "Sphingolipid metabolism"
## [125] "Activated NOTCH1 Transmits Signal to the Nucleus"
## [126] "Sensory processing of sound by outer hair cells of the cochlea"
## [127] "Signaling by CSF3 (G-CSF)"
## [128] "Signaling by FGFR1 in disease"
## [129] "Inflammasomes"
## [130] "Cell recruitment (pro-inflammatory response)"
## [131] "Purinergic signaling in leishmaniasis infection"
## [132] "G alpha (s) signalling events"
## [133] "Prolactin receptor signaling"
## [134] "SEMA3A-Plexin repulsion signaling by inhibiting Integrin adhesion"
## [135] "Adaptive Immune System"
## [136] "G alpha (q) signalling events"
## [137] "Death Receptor Signaling"
## [138] "Ca-dependent events"
## [139] "Antimicrobial peptides"
Compare clusterprofiler to fora.
v1 <- list("CP up"=ora_ups, "CP fix up"=orafix_ups, "FORA up" = fora_up)
v2 <- list("CP dn"=ora_dns, "CP fix dn"=orafix_dns, "FORA dn" = fora_dn)
v3 <- list("CP"=union(ora_ups,ora_dns),
"CP fix"=union(orafix_ups,orafix_dns),
"FORA" = union(fora_up,fora_dn) )
par(mar=c(10,10,10,10))
par(mfrow=c(2,1))
plot(euler(v1),quantities = list(cex = 2), labels = list(cex = 2),main="upregulated Reactomes")
plot(euler(v2),quantities = list(cex = 2), labels = list(cex = 2),main="downregulated Reactomes")
plot(euler(v3),quantities = list(cex = 2), labels = list(cex = 2),main="up- and down-regulated Reactomes")
Jaccard index comparing CP to FORA.
ora <- c(ora_ups,ora_dns)
orafix <- c(orafix_ups,orafix_dns)
fora <- c(fora_up,fora_dn)
jaccard(ora,fora)
## [1] 0.6207865
jaccard(orafix,fora)
## [1] 0.8944591
For reproducibility
sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
## [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
## [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Australia/Melbourne
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] gplots_3.1.3 eulerr_7.0.0
## [3] clusterProfiler_4.8.3 kableExtra_1.3.4
## [5] DESeq2_1.40.2 SummarizedExperiment_1.30.2
## [7] Biobase_2.60.0 MatrixGenerics_1.12.3
## [9] matrixStats_1.0.0 GenomicRanges_1.52.0
## [11] GenomeInfoDb_1.36.2 IRanges_2.34.1
## [13] S4Vectors_0.38.1 BiocGenerics_0.46.0
## [15] getDEE2_1.10.0 fgsea_1.26.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 rstudioapi_0.15.0 jsonlite_1.8.7
## [4] magrittr_2.0.3 farver_2.1.1 rmarkdown_2.24
## [7] zlibbioc_1.46.0 vctrs_0.6.3 memoise_2.0.1
## [10] RCurl_1.98-1.12 ggtree_3.8.2 webshot_0.5.5
## [13] htmltools_0.5.6 S4Arrays_1.0.6 gridGraphics_0.5-1
## [16] sass_0.4.7 KernSmooth_2.23-22 bslib_0.5.1
## [19] plyr_1.8.8 cachem_1.0.8 igraph_1.5.1
## [22] lifecycle_1.0.3 pkgconfig_2.0.3 gson_0.1.0
## [25] Matrix_1.6-1 R6_2.5.1 fastmap_1.1.1
## [28] GenomeInfoDbData_1.2.10 digest_0.6.33 aplot_0.2.0
## [31] enrichplot_1.20.1 colorspace_2.1-0 patchwork_1.1.3
## [34] AnnotationDbi_1.62.2 RSQLite_2.3.1 fansi_1.0.4
## [37] httr_1.4.7 polyclip_1.10-4 abind_1.4-5
## [40] compiler_4.3.1 bit64_4.0.5 withr_2.5.0
## [43] downloader_0.4 BiocParallel_1.34.2 viridis_0.6.4
## [46] DBI_1.1.3 highr_0.10 ggforce_0.4.1
## [49] MASS_7.3-60 DelayedArray_0.26.7 HDO.db_0.99.1
## [52] caTools_1.18.2 gtools_3.9.4 tools_4.3.1
## [55] scatterpie_0.2.1 ape_5.7-1 glue_1.6.2
## [58] nlme_3.1-163 GOSemSim_2.26.1 polylabelr_0.2.0
## [61] shadowtext_0.1.2 grid_4.3.1 reshape2_1.4.4
## [64] generics_0.1.3 gtable_0.3.4 tidyr_1.3.0
## [67] data.table_1.14.8 tidygraph_1.2.3 xml2_1.3.5
## [70] utf8_1.2.3 XVector_0.40.0 ggrepel_0.9.3
## [73] pillar_1.9.0 stringr_1.5.0 yulab.utils_0.0.9
## [76] splines_4.3.1 dplyr_1.1.3 tweenr_2.0.2
## [79] treeio_1.24.3 lattice_0.21-8 bit_4.0.5
## [82] tidyselect_1.2.0 GO.db_3.17.0 locfit_1.5-9.8
## [85] Biostrings_2.68.1 knitr_1.43 gridExtra_2.3
## [88] svglite_2.1.1 xfun_0.40 graphlayouts_1.0.0
## [91] stringi_1.7.12 lazyeval_0.2.2 ggfun_0.1.2
## [94] yaml_2.3.7 evaluate_0.21 codetools_0.2-19
## [97] ggraph_2.1.0 tibble_3.2.1 qvalue_2.32.0
## [100] ggplotify_0.1.2 cli_3.6.1 systemfonts_1.0.4
## [103] munsell_0.5.0 jquerylib_0.1.4 Rcpp_1.0.11
## [106] png_0.1-8 parallel_4.3.1 ggplot2_3.4.3
## [109] blob_1.2.4 DOSE_3.26.1 htm2txt_2.2.2
## [112] bitops_1.0-7 viridisLite_0.4.2 tidytree_0.4.5
## [115] scales_1.2.1 purrr_1.0.2 crayon_1.5.2
## [118] rlang_1.1.1 cowplot_1.1.1 fastmatch_1.1-4
## [121] KEGGREST_1.40.0 rvest_1.0.3