Source: https://github.com/markziemann/background
I previously wrote about the importance of a background gene list for accurate and correct enrichment analysis.
Here I will demonstrate how important this is, as well as providing a guide for how to make your own background gene list.
For this guide I will be using data from a previous analysis, which is deposited at NCBI GEO and SRA under accession numbers: GSE109140 and SRP128998.
The data was presented in a paper by Felisbino et al
library("getDEE2")
library("DESeq2")
library("kableExtra")
library("vioplot")
library("clusterProfiler")
library("eulerr")
Using RNAseq data from DEE2, which can be accessed from dee2.io
mdat <- getDEE2Metadata("hsapiens")
# get sample sheet
ss <- subset(mdat,SRP_accession=="SRP128998")
# fetch the whole set of RNA-seq data
x <- getDEE2("hsapiens",ss$SRR_accession , metadata=mdat, legacy=TRUE)
## For more information about DEE2 QC metrics, visit
## https://github.com/markziemann/dee2/blob/master/qc/qc_metrics.md
mx <- x$GeneCounts
rownames(mx) <- paste(rownames(mx),x$GeneInfo$GeneSymbol)
# vpa subset no filtering
ss1 <- ss[grep("low",ss$Experiment_title),]
ss1$vpa <- grepl("VPA",ss1$Experiment_title)
ss1 %>%
kbl(caption="sample sheet for VPA treatment in hepatocytes") %>%
kable_paper("hover", full_width = F)
SRR_accession | QC_summary | SRX_accession | SRS_accession | SRP_accession | Experiment_title | GEO_series | vpa | |
---|---|---|---|---|---|---|---|---|
384654 | SRR6467485 | PASS | SRX3557434 | SRS2830733 | SRP128998 | GSM2932797: low glucose replicate 1; Homo sapiens; RNA-Seq | FALSE | |
384655 | SRR6467486 | PASS | SRX3557435 | SRS2830734 | SRP128998 | GSM2932798: low glucose replicate 2; Homo sapiens; RNA-Seq | FALSE | |
384656 | SRR6467487 | PASS | SRX3557436 | SRS2830735 | SRP128998 | GSM2932799: low glucose replicate 3; Homo sapiens; RNA-Seq | FALSE | |
384657 | SRR6467488 | PASS | SRX3557437 | SRS2830736 | SRP128998 | GSM2932800: low glucose VPA replicate 1; Homo sapiens; RNA-Seq | TRUE | |
384658 | SRR6467489 | PASS | SRX3557438 | SRS2830737 | SRP128998 | GSM2932801: low glucose VPA replicate 2; Homo sapiens; RNA-Seq | TRUE | |
384659 | SRR6467490 | PASS | SRX3557439 | SRS2830738 | SRP128998 | GSM2932802: low glucose VPA replicate 3; Homo sapiens; RNA-Seq | TRUE |
mx1 <- mx[,which(colnames(mx) %in% ss1$SRR_accession)]
dim(mx1)
## [1] 58302 6
Although I know this data set is good as I was involved in the project, it is a good idea to show at least the number of reads for each sample.
par(mar=c(5,7,5,1))
barplot(rev(colSums(mx1)),horiz=TRUE,las=1,main="number of reads per sample in SRP128998")
Now we can have a look at the read count distribution for one of the samples.
# look at the distribution of counts
l1 <- lapply(1:ncol(mx1),function(i) log10( mx1[,i] +1 ))
names(l1) <- colnames(mx1)
l1 <- rev(l1)
hist(l1[[6]],main="Distribution of read counts in SRR6467485",
xlab="log10(readcounts+1)",breaks=30)
abline(v=1,col="red",lty=2)
We can see that most genes have a very low read count, consistent with most genes being undetectable in any sample.
Based on this typical distribution of read counts, I would consider any gene with fewer than 10 reads as not detected. This could be considered a strict detection threshold. Setting a strict threshold like this could be helpful in selecting genes that are likely to validate with qRT-PCR.
I will run DESeq2 with and without this threshold to see how it impacts the results.
To see whether this has any effect on downstream analysis, DESeq2 will be run on the same dataset without any filtering, with light filtering and with strict filtering.
Here I define light filter threshold as the lowest baseline expression level that DE genes start to appear. I know this contrast will yield thousands of DE genes, so this is a reasonable way to set the detection threshold. The DE genes with lowest baseline expression showed mean read counts of about 2.
# DESeq2 without any fitering
dds <- DESeqDataSetFromMatrix(countData = mx1 , colData = ss1, design = ~ vpa )
res <- DESeq(dds)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
z<- results(res)
vsd <- vst(dds, blind=FALSE)
zz<-cbind(as.data.frame(z),mx1)
dge1<-as.data.frame(zz[order(zz$pvalue),])
head(dge1,10) %>%
kbl(caption="Top DE genes for VPA treatment (unfiltered)") %>%
kable_paper("hover", full_width = F)
baseMean | log2FoldChange | lfcSE | stat | pvalue | padj | SRR6467485 | SRR6467486 | SRR6467487 | SRR6467488 | SRR6467489 | SRR6467490 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ENSG00000116852 KIF21B | 758.8708 | 4.235925 | 0.1637805 | 25.86342 | 0 | 0 | 65 | 81 | 91 | 1370 | 1232 | 1720 |
ENSG00000103034 NDRG4 | 809.8276 | 4.068972 | 0.1680121 | 24.21833 | 0 | 0 | 81 | 93 | 107 | 1699 | 1264 | 1605 |
ENSG00000101210 EEF1A2 | 623.2070 | 4.319276 | 0.1812953 | 23.82453 | 0 | 0 | 52 | 71 | 61 | 1305 | 1067 | 1158 |
ENSG00000205978 NYNRIN | 922.4737 | 4.414294 | 0.1955579 | 22.57282 | 0 | 0 | 79 | 72 | 102 | 1531 | 1788 | 1917 |
ENSG00000168280 KIF5C | 420.9624 | 5.242347 | 0.2324458 | 22.55298 | 0 | 0 | 21 | 20 | 26 | 858 | 675 | 924 |
ENSG00000187764 SEMA4D | 469.4849 | 3.365642 | 0.1621712 | 20.75364 | 0 | 0 | 71 | 86 | 100 | 830 | 784 | 939 |
ENSG00000025039 RRAGD | 544.0335 | 3.627284 | 0.1778770 | 20.39209 | 0 | 0 | 71 | 108 | 73 | 1046 | 921 | 1029 |
ENSG00000039068 CDH1 | 1204.3439 | 2.464853 | 0.1226862 | 20.09071 | 0 | 0 | 317 | 404 | 419 | 1985 | 1865 | 2233 |
ENSG00000182575 NXPH3 | 406.9713 | 5.120785 | 0.2615973 | 19.57507 | 0 | 0 | 11 | 41 | 20 | 697 | 776 | 886 |
ENSG00000100321 SYNGR1 | 324.7426 | 4.448159 | 0.2280782 | 19.50278 | 0 | 0 | 32 | 25 | 30 | 645 | 556 | 648 |
sig1 <- subset(dge1,padj<=0.05)
message("number of significant genes")
## number of significant genes
nrow(sig1)
## [1] 7089
plot(rowMeans( log10(dge1[,7:12]+1) ),dge1$log2FoldChange,
cex=0.6,pch=19,col="darkgray",
xlab="log10(mean read count)",ylab="log2(fold change)")
points(rowMeans( log10(sig1[,7:12]+1) ),sig1$log2FoldChange,
cex=0.6,pch=19,col="red")
abline(v=1,lty=2)
dge1up <- rownames(subset(dge1,padj<=0.05 & log2FoldChange>0))
dge1up <- unique(sapply(strsplit(dge1up," "),"[[",2))
dge1dn <- rownames(subset(dge1,padj<=0.05 & log2FoldChange<0))
dge1dn <- unique(sapply(strsplit(dge1dn," "),"[[",2))
dge1bg <- rownames(dge1)
dge1bg <- unique(sapply(strsplit(dge1bg," "),"[[",2))
look at the lowest basemean genes
sig1 <- sig1[order(sig1$baseMean),]
head(sig1,10) %>%
kbl(caption="lowest basemean genes") %>%
kable_paper("hover", full_width = F)
baseMean | log2FoldChange | lfcSE | stat | pvalue | padj | SRR6467485 | SRR6467486 | SRR6467487 | SRR6467488 | SRR6467489 | SRR6467490 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ENSG00000095917 TPSD1 | 1.955539 | 4.446961 | 1.824035 | 2.437981 | 0.0147696 | 0.0472248 | 0 | 0 | 0 | 4 | 2 | 6 |
ENSG00000144230 GPR17 | 1.970787 | 4.456430 | 1.817191 | 2.452373 | 0.0141917 | 0.0456696 | 0 | 0 | 0 | 5 | 2 | 5 |
ENSG00000066382 MPPED2 | 1.989149 | 4.467431 | 1.791035 | 2.494329 | 0.0126195 | 0.0414081 | 0 | 0 | 0 | 4 | 3 | 5 |
ENSG00000204475 NCR3 | 1.992263 | 4.468833 | 1.821833 | 2.452932 | 0.0141697 | 0.0456337 | 0 | 0 | 0 | 2 | 4 | 6 |
ENSG00000273294 C1QTNF3-AMACR | 1.992837 | -4.416860 | 1.791380 | -2.465619 | 0.0136777 | 0.0443251 | 4 | 4 | 4 | 0 | 0 | 0 |
ENSG00000134533 RERG | 2.034892 | 4.495207 | 1.838488 | 2.445056 | 0.0144830 | 0.0464671 | 0 | 0 | 0 | 7 | 3 | 2 |
ENSG00000140009 ESR2 | 2.034892 | 4.495207 | 1.838488 | 2.445056 | 0.0144830 | 0.0464671 | 0 | 0 | 0 | 7 | 3 | 2 |
ENSG00000137674 MMP20 | 2.038006 | 4.496467 | 1.788898 | 2.513540 | 0.0119526 | 0.0395104 | 0 | 0 | 0 | 5 | 4 | 3 |
ENSG00000106633 GCK | 2.056368 | 4.507043 | 1.788079 | 2.520606 | 0.0117153 | 0.0388516 | 0 | 0 | 0 | 4 | 5 | 3 |
ENSG00000232618 AL355304.1 | 2.077321 | 4.537447 | 1.858690 | 2.441207 | 0.0146382 | 0.0468782 | 0 | 0 | 0 | 2 | 2 | 9 |
The counts look quite low. Not sure that these DEGs could be validated with qRT-PCR.
mx2 <- mx1[which(rowMeans(mx1)>=10),]
dim(mx2)
## [1] 15852 6
dds <- DESeqDataSetFromMatrix(countData = mx2 , colData = ss1, design = ~ vpa )
res <- DESeq(dds)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
z<- results(res)
vsd <- vst(dds, blind=FALSE)
zz<-cbind(as.data.frame(z),mx2)
dge2<-as.data.frame(zz[order(zz$pvalue),])
head(dge2,10) %>%
kbl(caption="Top DE genes for VPA treatment (filtered)") %>%
kable_paper("hover", full_width = F)
baseMean | log2FoldChange | lfcSE | stat | pvalue | padj | SRR6467485 | SRR6467486 | SRR6467487 | SRR6467488 | SRR6467489 | SRR6467490 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ENSG00000116852 KIF21B | 763.8163 | 4.248737 | 0.1718694 | 24.72073 | 0 | 0 | 65 | 81 | 91 | 1370 | 1232 | 1720 |
ENSG00000205978 NYNRIN | 928.0397 | 4.427457 | 0.1887125 | 23.46139 | 0 | 0 | 79 | 72 | 102 | 1531 | 1788 | 1917 |
ENSG00000103034 NDRG4 | 815.8752 | 4.083139 | 0.1748377 | 23.35388 | 0 | 0 | 81 | 93 | 107 | 1699 | 1264 | 1605 |
ENSG00000101210 EEF1A2 | 627.8539 | 4.333653 | 0.1884855 | 22.99197 | 0 | 0 | 52 | 71 | 61 | 1305 | 1067 | 1158 |
ENSG00000168280 KIF5C | 424.0618 | 5.255390 | 0.2402672 | 21.87311 | 0 | 0 | 21 | 20 | 26 | 858 | 675 | 924 |
ENSG00000182575 NXPH3 | 409.5097 | 5.132293 | 0.2538263 | 20.21970 | 0 | 0 | 11 | 41 | 20 | 697 | 776 | 886 |
ENSG00000072071 ADGRL1 | 1409.6105 | 3.197538 | 0.1606012 | 19.90980 | 0 | 0 | 247 | 317 | 290 | 2078 | 2596 | 2839 |
ENSG00000156515 HK1 | 1062.6425 | 3.220757 | 0.1622888 | 19.84583 | 0 | 0 | 159 | 263 | 222 | 1557 | 1898 | 2235 |
ENSG00000025039 RRAGD | 547.7283 | 3.641436 | 0.1840194 | 19.78833 | 0 | 0 | 71 | 108 | 73 | 1046 | 921 | 1029 |
ENSG00000187764 SEMA4D | 472.4305 | 3.378284 | 0.1754224 | 19.25800 | 0 | 0 | 71 | 86 | 100 | 830 | 784 | 939 |
sig2 <- subset(dge2,padj<=0.05)
nrow(sig2)
## [1] 6762
plot(log10(dge2$baseMean),dge2$log2FoldChange,
cex=0.6,pch=19,col="darkgray",
xlab="log10(base mean)",ylab="log2 fold change",
main="strict filter")
points(log10(sig2$baseMean),sig2$log2FoldChange,
cex=0.6,pch=19,col="red")
dge2up <- rownames(subset(dge2,padj<=0.05 & log2FoldChange>0))
dge2up <- unique(sapply(strsplit(dge2up," "),"[[",2))
dge2dn <- rownames(subset(dge2,padj<=0.05 & log2FoldChange<0))
dge2dn <- unique(sapply(strsplit(dge2dn," "),"[[",2))
dge2bg <- rownames(dge2)
dge2bg <- unique(sapply(strsplit(dge2bg," "),"[[",2))
I will run enrichment analysis using the correct correct background list, and compare the results to the incorrect whole genome background.
if (! file.exists("ReactomePathways.gmt")) {
download.file("https://reactome.org/download/current/ReactomePathways.gmt.zip",
destfile="ReactomePathways.gmt.zip")
unzip("ReactomePathways.gmt.zip")
}
genesets2 <- read.gmt("ReactomePathways.gmt")
Firstly with the upregulated genes
ora2_up <- as.data.frame(enricher(gene = dge2up ,
universe = dge2bg, maxGSSize = 5000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
ora2_up$geneID <- NULL
ora2_ups <- rownames(subset(ora2_up,p.adjust<0.05))
gr <- as.numeric(sapply(strsplit(ora2_up$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora2_up$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(ora2_up$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora2_up$BgRatio,"/"),"[[",2))
ora2_up$es <- gr/br
ora2_up <- ora2_up[order(-ora2_up$es),]
head(ora2_up)
## ID
## Adenylate cyclase inhibitory pathway Adenylate cyclase inhibitory pathway
## PKA activation PKA activation
## Long-term potentiation Long-term potentiation
## PKA-mediated phosphorylation of CREB PKA-mediated phosphorylation of CREB
## Activation of GABAB receptors Activation of GABAB receptors
## GABA B receptor activation GABA B receptor activation
## Description
## Adenylate cyclase inhibitory pathway Adenylate cyclase inhibitory pathway
## PKA activation PKA activation
## Long-term potentiation Long-term potentiation
## PKA-mediated phosphorylation of CREB PKA-mediated phosphorylation of CREB
## Activation of GABAB receptors Activation of GABAB receptors
## GABA B receptor activation GABA B receptor activation
## GeneRatio BgRatio pvalue p.adjust
## Adenylate cyclase inhibitory pathway 9/1806 10/7666 1.736468e-05 0.001290774
## PKA activation 11/1806 14/7666 2.137723e-05 0.001449423
## Long-term potentiation 9/1806 12/7666 2.381870e-04 0.007587956
## PKA-mediated phosphorylation of CREB 11/1806 15/7666 6.302994e-05 0.002811135
## Activation of GABAB receptors 14/1806 20/7666 1.384093e-05 0.001157447
## GABA B receptor activation 14/1806 20/7666 1.384093e-05 0.001157447
## qvalue Count es
## Adenylate cyclase inhibitory pathway 0.001169831 9 3.820266
## PKA activation 0.001313614 11 3.335153
## Long-term potentiation 0.006876977 9 3.183555
## PKA-mediated phosphorylation of CREB 0.002547737 11 3.112809
## Activation of GABAB receptors 0.001048997 14 2.971318
## GABA B receptor activation 0.001048997 14 2.971318
topup2 <- rev(head(ora2_up$es,20))
names(topup2) <- rev(head(ora2_up$ID,20))
Now with the downregulated genes
ora2_dn <- as.data.frame(enricher(gene = dge2dn ,
universe = dge2bg, maxGSSize = 5000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
ora2_dn$geneID <- NULL
ora2_dns <- rownames(subset(ora2_dn,p.adjust<0.05))
gr <- as.numeric(sapply(strsplit(ora2_dn$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora2_dn$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(ora2_dn$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora2_dn$BgRatio,"/"),"[[",2))
ora2_dn$es <- gr/br
ora2_dn <- ora2_dn[order(-ora2_dn$es),]
head(ora2_dn)
## ID
## Defects in vitamin and cofactor metabolism Defects in vitamin and cofactor metabolism
## Defects in cobalamin (B12) metabolism Defects in cobalamin (B12) metabolism
## Synthesis of bile acids and bile salts via 27-hydroxycholesterol Synthesis of bile acids and bile salts via 27-hydroxycholesterol
## RMTs methylate histone arginines RMTs methylate histone arginines
## Biotin transport and metabolism Biotin transport and metabolism
## Methylation Methylation
## Description
## Defects in vitamin and cofactor metabolism Defects in vitamin and cofactor metabolism
## Defects in cobalamin (B12) metabolism Defects in cobalamin (B12) metabolism
## Synthesis of bile acids and bile salts via 27-hydroxycholesterol Synthesis of bile acids and bile salts via 27-hydroxycholesterol
## RMTs methylate histone arginines RMTs methylate histone arginines
## Biotin transport and metabolism Biotin transport and metabolism
## Methylation Methylation
## GeneRatio
## Defects in vitamin and cofactor metabolism 15/1855
## Defects in cobalamin (B12) metabolism 9/1855
## Synthesis of bile acids and bile salts via 27-hydroxycholesterol 9/1855
## RMTs methylate histone arginines 20/1855
## Biotin transport and metabolism 7/1855
## Methylation 7/1855
## BgRatio
## Defects in vitamin and cofactor metabolism 20/7666
## Defects in cobalamin (B12) metabolism 12/7666
## Synthesis of bile acids and bile salts via 27-hydroxycholesterol 13/7666
## RMTs methylate histone arginines 30/7666
## Biotin transport and metabolism 11/7666
## Methylation 11/7666
## pvalue
## Defects in vitamin and cofactor metabolism 2.370976e-06
## Defects in cobalamin (B12) metabolism 2.966890e-04
## Synthesis of bile acids and bile salts via 27-hydroxycholesterol 7.567769e-04
## RMTs methylate histone arginines 9.904943e-07
## Biotin transport and metabolism 6.197420e-03
## Methylation 6.197420e-03
## p.adjust
## Defects in vitamin and cofactor metabolism 0.0004051406
## Defects in cobalamin (B12) metabolism 0.0213459940
## Synthesis of bile acids and bile salts via 27-hydroxycholesterol 0.0416170751
## RMTs methylate histone arginines 0.0001934294
## Biotin transport and metabolism 0.2017112715
## Methylation 0.2017112715
## qvalue
## Defects in vitamin and cofactor metabolism 0.0003993223
## Defects in cobalamin (B12) metabolism 0.0210394428
## Synthesis of bile acids and bile salts via 27-hydroxycholesterol 0.0410194095
## RMTs methylate histone arginines 0.0001906515
## Biotin transport and metabolism 0.1988144824
## Methylation 0.1988144824
## Count es
## Defects in vitamin and cofactor metabolism 15 3.099461
## Defects in cobalamin (B12) metabolism 9 3.099461
## Synthesis of bile acids and bile salts via 27-hydroxycholesterol 9 2.861041
## RMTs methylate histone arginines 20 2.755076
## Biotin transport and metabolism 7 2.629846
## Methylation 7 2.629846
topdn2 <- head(ora2_dn$es,20)
names(topdn2) <- head(ora2_dn$ID,20)
Make a barplot
par(mar=c(5,20,5,1))
cols <- c(rep("blue",20),rep("red",20))
barplot(c(topdn2,topup2),
horiz=TRUE,las=1,cex.names=0.65,col=cols,
main="top DE Reactomes",
xlab="ES")
mtext("correct background")
Firstly with the upregulated genes
ora1_up <- as.data.frame(enricher(gene = dge1up ,
universe = dge1bg, maxGSSize = 5000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
ora1_up$geneID <- NULL
ora1_ups <- rownames(subset(ora1_up,p.adjust<0.05))
gr <- as.numeric(sapply(strsplit(ora1_up$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora1_up$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(ora1_up$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora1_up$BgRatio,"/"),"[[",2))
ora1_up$es <- gr/br
ora1_up <- ora1_up[order(-ora1_up$es),]
head(ora1_up)
## ID
## Adenylate cyclase activating pathway Adenylate cyclase activating pathway
## Adenylate cyclase inhibitory pathway Adenylate cyclase inhibitory pathway
## Ephrin signaling Ephrin signaling
## PKA activation PKA activation
## Defective EXT1 causes exostoses 1, TRPS2 and CHDS Defective EXT1 causes exostoses 1, TRPS2 and CHDS
## Defective EXT2 causes exostoses 2 Defective EXT2 causes exostoses 2
## Description
## Adenylate cyclase activating pathway Adenylate cyclase activating pathway
## Adenylate cyclase inhibitory pathway Adenylate cyclase inhibitory pathway
## Ephrin signaling Ephrin signaling
## PKA activation PKA activation
## Defective EXT1 causes exostoses 1, TRPS2 and CHDS Defective EXT1 causes exostoses 1, TRPS2 and CHDS
## Defective EXT2 causes exostoses 2 Defective EXT2 causes exostoses 2
## GeneRatio BgRatio
## Adenylate cyclase activating pathway 7/1907 10/10922
## Adenylate cyclase inhibitory pathway 9/1907 14/10922
## Ephrin signaling 12/1907 19/10922
## PKA activation 11/1907 18/10922
## Defective EXT1 causes exostoses 1, TRPS2 and CHDS 8/1907 14/10922
## Defective EXT2 causes exostoses 2 8/1907 14/10922
## pvalue p.adjust
## Adenylate cyclase activating pathway 3.588639e-04 0.012217685
## Adenylate cyclase inhibitory pathway 1.273383e-04 0.007064916
## Ephrin signaling 1.161813e-05 0.001186937
## PKA activation 4.262818e-05 0.002902591
## Defective EXT1 causes exostoses 1, TRPS2 and CHDS 9.409871e-04 0.023108176
## Defective EXT2 causes exostoses 2 9.409871e-04 0.023108176
## qvalue Count es
## Adenylate cyclase activating pathway 0.010826014 7 4.009124
## Adenylate cyclase inhibitory pathway 0.006260178 9 3.681849
## Ephrin signaling 0.001051738 12 3.617255
## PKA activation 0.002571968 11 3.500029
## Defective EXT1 causes exostoses 1, TRPS2 and CHDS 0.020476010 8 3.272755
## Defective EXT2 causes exostoses 2 0.020476010 8 3.272755
topup1 <- rev(head(ora1_up$es,20))
names(topup1) <- rev(head(ora1_up$ID,20))
Now with the downregulated genes
ora1_dn <- as.data.frame(enricher(gene = dge1dn ,
universe = dge1bg, maxGSSize = 5000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
ora1_dn$geneID <- NULL
ora1_dns <- rownames(subset(ora1_dn,p.adjust<0.05))
gr <- as.numeric(sapply(strsplit(ora1_dn$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora1_dn$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(ora1_dn$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora1_dn$BgRatio,"/"),"[[",2))
ora1_dn$es <- gr/br
ora1_dn <- ora1_dn[order(-ora1_dn$es),]
head(ora1_dn)
## ID
## Defects in vitamin and cofactor metabolism Defects in vitamin and cofactor metabolism
## Defects in cobalamin (B12) metabolism Defects in cobalamin (B12) metabolism
## Beta-oxidation of very long chain fatty acids Beta-oxidation of very long chain fatty acids
## Biotin transport and metabolism Biotin transport and metabolism
## RMTs methylate histone arginines RMTs methylate histone arginines
## Synthesis of bile acids and bile salts via 27-hydroxycholesterol Synthesis of bile acids and bile salts via 27-hydroxycholesterol
## Description
## Defects in vitamin and cofactor metabolism Defects in vitamin and cofactor metabolism
## Defects in cobalamin (B12) metabolism Defects in cobalamin (B12) metabolism
## Beta-oxidation of very long chain fatty acids Beta-oxidation of very long chain fatty acids
## Biotin transport and metabolism Biotin transport and metabolism
## RMTs methylate histone arginines RMTs methylate histone arginines
## Synthesis of bile acids and bile salts via 27-hydroxycholesterol Synthesis of bile acids and bile salts via 27-hydroxycholesterol
## GeneRatio
## Defects in vitamin and cofactor metabolism 15/1860
## Defects in cobalamin (B12) metabolism 9/1860
## Beta-oxidation of very long chain fatty acids 8/1860
## Biotin transport and metabolism 7/1860
## RMTs methylate histone arginines 19/1860
## Synthesis of bile acids and bile salts via 27-hydroxycholesterol 9/1860
## BgRatio
## Defects in vitamin and cofactor metabolism 20/10922
## Defects in cobalamin (B12) metabolism 12/10922
## Beta-oxidation of very long chain fatty acids 11/10922
## Biotin transport and metabolism 11/10922
## RMTs methylate histone arginines 31/10922
## Synthesis of bile acids and bile salts via 27-hydroxycholesterol 15/10922
## pvalue
## Defects in vitamin and cofactor metabolism 1.835693e-08
## Defects in cobalamin (B12) metabolism 1.588286e-05
## Beta-oxidation of very long chain fatty acids 7.066624e-05
## Biotin transport and metabolism 7.158270e-04
## RMTs methylate histone arginines 3.985178e-08
## Synthesis of bile acids and bile salts via 27-hydroxycholesterol 2.206981e-04
## p.adjust
## Defects in vitamin and cofactor metabolism 1.687690e-06
## Defects in cobalamin (B12) metabolism 8.653219e-04
## Beta-oxidation of very long chain fatty acids 3.353227e-03
## Biotin transport and metabolism 2.024964e-02
## RMTs methylate histone arginines 3.256776e-06
## Synthesis of bile acids and bile salts via 27-hydroxycholesterol 8.324278e-03
## qvalue
## Defects in vitamin and cofactor metabolism 1.464931e-06
## Defects in cobalamin (B12) metabolism 7.511077e-04
## Beta-oxidation of very long chain fatty acids 2.910633e-03
## Biotin transport and metabolism 1.757689e-02
## RMTs methylate histone arginines 2.826913e-06
## Synthesis of bile acids and bile salts via 27-hydroxycholesterol 7.225553e-03
## Count es
## Defects in vitamin and cofactor metabolism 15 4.404032
## Defects in cobalamin (B12) metabolism 9 4.404032
## Beta-oxidation of very long chain fatty acids 8 4.270577
## Biotin transport and metabolism 7 3.736755
## RMTs methylate histone arginines 19 3.598994
## Synthesis of bile acids and bile salts via 27-hydroxycholesterol 9 3.523226
topdn1 <- head(ora1_dn$es,20)
names(topdn1) <- head(ora1_dn$ID,20)
Make a barplot
par(mar=c(5,20,5,1))
cols <- c(rep("blue",20),rep("red",20))
barplot(c(topdn1,topup1),
horiz=TRUE,las=1,cex.names=0.65,col=cols,
main="top DE Reactomes",
xlab="ES")
mtext("whole genome background")
v1 <- list("correct up"=ora2_ups, "wg up"=ora1_ups)
v2 <- list("correct dn"=ora2_dns, "wg dn"=ora1_dns)
par(mar=c(10,10,10,10))
par(mfrow=c(2,1))
plot(euler(v1),quantities = TRUE)
plot(euler(v2),quantities = TRUE)
dev.off()
## X11cairo
## 2
sessionInfo()
## R version 4.2.2 Patched (2022-11-10 r83330)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] eulerr_7.0.0 clusterProfiler_4.4.4
## [3] vioplot_0.4.0 zoo_1.8-11
## [5] sm_2.2-5.7.1 kableExtra_1.3.4
## [7] DESeq2_1.36.0 SummarizedExperiment_1.26.1
## [9] Biobase_2.56.0 MatrixGenerics_1.8.1
## [11] matrixStats_0.63.0 GenomicRanges_1.48.0
## [13] GenomeInfoDb_1.32.2 IRanges_2.30.0
## [15] S4Vectors_0.34.0 BiocGenerics_0.42.0
## [17] getDEE2_1.6.0
##
## loaded via a namespace (and not attached):
## [1] shadowtext_0.1.2 fastmatch_1.1-3 systemfonts_1.0.4
## [4] plyr_1.8.8 igraph_1.3.5 lazyeval_0.2.2
## [7] polylabelr_0.2.0 splines_4.2.2 BiocParallel_1.30.3
## [10] ggplot2_3.4.0 digest_0.6.31 yulab.utils_0.0.5
## [13] htmltools_0.5.4 GOSemSim_2.22.0 viridis_0.6.2
## [16] GO.db_3.15.0 fansi_1.0.3 magrittr_2.0.3
## [19] memoise_2.0.1 Biostrings_2.64.0 annotate_1.74.0
## [22] graphlayouts_0.8.4 svglite_2.1.0 enrichplot_1.16.1
## [25] colorspace_2.0-3 blob_1.2.3 rvest_1.0.3
## [28] ggrepel_0.9.2 xfun_0.35 dplyr_1.0.10
## [31] crayon_1.5.2 RCurl_1.98-1.9 jsonlite_1.8.4
## [34] scatterpie_0.1.8 genefilter_1.78.0 survival_3.4-0
## [37] ape_5.6-2 glue_1.6.2 polyclip_1.10-4
## [40] gtable_0.3.1 zlibbioc_1.42.0 XVector_0.36.0
## [43] webshot_0.5.4 htm2txt_2.2.2 DelayedArray_0.22.0
## [46] scales_1.2.1 DOSE_3.22.0 DBI_1.1.3
## [49] Rcpp_1.0.9 viridisLite_0.4.1 xtable_1.8-4
## [52] gridGraphics_0.5-1 tidytree_0.4.1 bit_4.0.5
## [55] httr_1.4.4 fgsea_1.22.0 RColorBrewer_1.1-3
## [58] pkgconfig_2.0.3 XML_3.99-0.13 farver_2.1.1
## [61] sass_0.4.4 locfit_1.5-9.6 utf8_1.2.2
## [64] ggplotify_0.1.0 tidyselect_1.2.0 rlang_1.0.6
## [67] reshape2_1.4.4 AnnotationDbi_1.58.0 munsell_0.5.0
## [70] tools_4.2.2 cachem_1.0.6 downloader_0.4
## [73] cli_3.4.1 generics_0.1.3 RSQLite_2.2.19
## [76] evaluate_0.19 stringr_1.5.0 fastmap_1.1.0
## [79] yaml_2.3.6 ggtree_3.4.1 knitr_1.41
## [82] bit64_4.0.5 tidygraph_1.2.2 purrr_0.3.5
## [85] KEGGREST_1.36.3 ggraph_2.1.0 nlme_3.1-161
## [88] aplot_0.1.9 DO.db_2.9 xml2_1.3.3
## [91] compiler_4.2.2 rstudioapi_0.14 png_0.1-8
## [94] treeio_1.20.1 tibble_3.1.8 tweenr_2.0.2
## [97] geneplotter_1.74.0 bslib_0.4.1 stringi_1.7.8
## [100] highr_0.9 lattice_0.20-45 Matrix_1.5-3
## [103] vctrs_0.5.1 pillar_1.8.1 lifecycle_1.0.3
## [106] jquerylib_0.1.4 data.table_1.14.6 bitops_1.0-7
## [109] patchwork_1.1.2 qvalue_2.28.0 R6_2.5.1
## [112] gridExtra_2.3 codetools_0.2-19 MASS_7.3-58.2
## [115] assertthat_0.2.1 withr_2.5.0 GenomeInfoDbData_1.2.8
## [118] parallel_4.2.2 grid_4.2.2 ggfun_0.0.9
## [121] tidyr_1.2.1 rmarkdown_2.19 ggforce_0.4.1