Source: https://github.com/markziemann/enrichment_recipe
This guide is a Rmarkdown script that conducts differential expression and enrichment analysis, which are very popular workflows for transcriptome data.
The goal of this work is to understand how ClusterProfiler manages the background list, as we have observed some weird behaviour. In particular we see that when we provide a custom background, those genes do not appear in the final analysis unless they are also included in the gene list.
In the code chunk below called libs
, you can add and remove required R library dependancies. Check that the libraries listed here match the Dockerfile, otherwise you might get errors.
suppressPackageStartupMessages({
library("DESeq2")
library("kableExtra")
library("clusterProfiler")
library("fgsea")
library("eulerr")
library("gplots")
})
For this report I will be using proteomics data from Stukalov et al (https://pubmed.ncbi.nlm.nih.gov/33845483/), who profiled the effect of SARS-COV2 infection on A549 cells.
Here we used the 24hr profile, as there were relatively more changes found.
Probable contaminants were removed, as well as viral proteins. This left 5825 proteins, including 165 downregulated and 44 upregulated.
The data were obtained from Supplementary Table 5 and saved to the /data folder.
def <- read.table("../data/sarscov2_prot.tsv",sep="\t",header=TRUE)
In the next sections I will run enrichment analysis with over-representation analysis (ORA) test and compare it to functional class scoring. I will also investigate some strange behviour of the ORA tool clusterprofiler.
I’ve compiled a reporting checklist:
Reporting criteria | Method/resource used |
---|---|
Origin of gene sets | KEGG (2023-06-16) |
Tool used | ClusterProfiler (check version at foot of report) |
Statistical test used | hypergeometric test |
P-value correction for multiple comparisons | FDR method |
Background list | Genes with >=10 reads per sample on average across all samples |
Gene Selection Criteria | DESeq2 FDR<0.05 |
ORA directionality | Separate tests for up- and down-regulation |
Data availability | Stukalov et al 2021: PMID 33845483 |
Other parameters | Min gene set size of 10 |
Here I provide a custom background gene list.
Get the gene sets loaded in R. These are KEGG gene sets
# from https://www.gsea-msigdb.org/gsea/msigdb/human/collections.jsp 16th June 2023
genesets2 <- read.gmt("../ref/c2.cp.kegg.v2023.1.Hs.symbols.gmt")
gsets <- gmtPathways("../ref/c2.cp.kegg.v2023.1.Hs.symbols.gmt")
message(paste("number of genes described in the annotation set:",length(unique(genesets2$gene))))
## number of genes described in the annotation set: 5244
Now filter the gene names into three lists, up-regulated, down-regulated and background. Background is simply all genes that were detected.
defup <- unique(subset(def,change.SARS_CoV2.24h_vs_mock.24h=="+")$gene_name)
defdn <- unique(subset(def,change.SARS_CoV2.24h_vs_mock.24h=="-")$gene_name)
bg <- unique(def$gene_name)
message(paste("number of proteins in background:",length(bg)))
## number of proteins in background: 5804
Enrichment firstly with upregulated genes
n_pw=10
ora_up <- as.data.frame(enricher(gene = defup ,
universe = bg, maxGSSize = 5000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
ora_up$geneID <- NULL
ora_up <- subset(ora_up,p.adjust<0.05 & Count >=5)
ora_ups <- rownames(ora_up)
gr <- as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",2))
ora_up$es <- gr/br
ora_up <- ora_up[order(-ora_up$es),]
ora_up$Description=NULL
head(ora_up) %>%
kbl(row.names = FALSE, caption="Top upregulated pathways in LPS treatment") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
topup2 <- rev(head(ora_up$es,10))
names(topup2) <- rev(head(ora_up$ID,10))
The reported background size does not match the dataset.
Now repeat with the downregulated genes
ora_dn <- as.data.frame(enricher(gene = defdn ,
universe = bg, maxGSSize = 5000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
ora_dn$geneID <- NULL
ora_dn <- subset(ora_dn,p.adjust<0.05 & Count >=5)
ora_dns <- rownames(ora_dn)
gr <- as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",2))
ora_dn$es <- gr/br
ora_dn <- ora_dn[order(-ora_dn$es),]
ora_dn$Description=NULL
head(ora_dn,n_pw) %>%
kbl(row.names = FALSE, caption="Top downregulated pathways in LPS treatment") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
KEGG_CELL_ADHESION_MOLECULES_CAMS | 9/66 | 25/1942 | 0.0e+00 | 0.0000033 | 0.0000032 | 9 | 10.592727 |
KEGG_LYSOSOME | 11/66 | 65/1942 | 6.3e-06 | 0.0002243 | 0.0002195 | 11 | 4.979487 |
topdn2 <- head(ora_dn$es,n_pw)
names(topdn2) <- head(ora_dn$ID,n_pw)
Make a barplot
par(mar=c(5,20,5,1))
cols <- c(rep("blue",n_pw),rep("red",n_pw))
top <- c(topdn2,topup2)
if ( length(top) > 1 ) {
barplot(c(top),
horiz=TRUE,las=1,cex.names=0.65,col=cols,
main="top DE KEGGs",
xlab="ES",
cex.main=0.9)
mtext("ORA test")
}
I’ve compiled a reporting checklist:
Reporting criteria | Method/resource used |
---|---|
Origin of gene sets | KEGG (2023-06-16) |
Tool used | ClusterProfiler (check version at foot of report) |
Statistical test used | hypergeometric test |
P-value correction for multiple comparisons | FDR method |
Background list | Genes with >=10 reads per sample on average across all samples |
Gene Selection Criteria | DESeq2 FDR<0.05 |
ORA directionality | Separate tests for up- and down-regulation |
Data availability | Data availability |
Other parameters | Min gene set size of 10 + BUG FIX |
Here I provide a background gene list for cluterprofiler to use, like a user should.
I have also done some modification to the KEGG gene sets, to en
Now filter the gene names into three lists, up-regulated, down-regulated and background. Background is simply all genes that were detected.
Adding all detected genes to the background appears to improve results!
bgdf <- data.frame("background",bg)
colnames(bgdf) <- c("term","gene")
genesets2 <- rbind(genesets2,bgdf)
Enrichment firstly with upregulated genes
n_pw=10
orafix_up <- as.data.frame(enricher(gene = defup ,
universe = bg, maxGSSize = 5000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
orafix_up$geneID <- NULL
orafix_up <- subset(orafix_up,p.adjust<0.05 & Count >=5)
orafix_ups <- rownames(orafix_up)
gr <- as.numeric(sapply(strsplit(orafix_up$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(orafix_up$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(orafix_up$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(orafix_up$BgRatio,"/"),"[[",2))
orafix_up$es <- gr/br
orafix_up <- orafix_up[order(-orafix_up$es),]
orafix_up$Description=NULL
head(orafix_up) %>%
kbl(row.names = FALSE, caption="Top upregulated pathways in LPS treatment") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
topup2 <- rev(head(orafix_up$es,10))
names(topup2) <- rev(head(orafix_up$ID,10))
Now with the downregulated genes
orafix_dn <- as.data.frame(enricher(gene = defdn ,
universe = bg, maxGSSize = 5000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
orafix_dn$geneID <- NULL
orafix_dn <- subset(orafix_dn,p.adjust<0.05 & Count >=5)
orafix_dns <- rownames(orafix_dn)
gr <- as.numeric(sapply(strsplit(orafix_dn$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(orafix_dn$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(orafix_dn$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(orafix_dn$BgRatio,"/"),"[[",2))
orafix_dn$es <- gr/br
orafix_dn <- orafix_dn[order(-orafix_dn$es),]
orafix_dn$Description=NULL
head(orafix_dn,n_pw) %>%
kbl(row.names = FALSE, caption="Top downregulated pathways in LPS treatment") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
KEGG_CELL_ADHESION_MOLECULES_CAMS | 9/165 | 25/5804 | 0.0e+00 | 1.00e-06 | 9.00e-07 | 9 | 12.66327 |
KEGG_LYSOSOME | 11/165 | 65/5804 | 1.7e-06 | 5.91e-05 | 5.69e-05 | 11 | 5.95282 |
topdn2 <- head(orafix_dn$es,n_pw)
names(topdn2) <- head(orafix_dn$ID,n_pw)
Make a barplot
par(mar=c(5,20,5,1))
cols <- c(rep("blue",n_pw),rep("red",n_pw))
top <- c(topdn2,topup2)
if ( length(top) > 1 ) {
barplot(c(top),
horiz=TRUE,las=1,cex.names=0.65,col=cols,
main="top DE KEGGs",
xlab="ES",
cex.main=0.9)
mtext("ORA fix")
}
v1 <- list("ORA up"=ora_ups, "ORA fix up"=orafix_ups)
v2 <- list("ORA dn"=ora_dns, "ORA fix dn"=orafix_dns)
v3 <- list("ORA"=union(ora_ups,ora_dns),
"ORA fix"=union(orafix_ups,orafix_dns))
par(mar=c(10,10,10,10))
par(mfrow=c(2,1))
plot(euler(v1),quantities = list(cex = 2), labels = list(cex = 2),main="upregulated KEGG pathways")
plot(euler(v2),quantities = list(cex = 2), labels = list(cex = 2),main="downregulated KEGG pathways")
plot(euler(v3),quantities = list(cex = 2), labels = list(cex = 2),main="up- and down-regulated KEGG pathways")
png("prot_kegg_euler.png")
plot(euler(v3),quantities = list(cex = 2), labels = list(cex = 2),main="up- and down-regulated KEGG pathways")
dev.off()
## png
## 2
Jaccard index comparing ORA and ORA fix.
jaccard <- function(a, b) {
intersection = length(intersect(a, b))
union = length(a) + length(b) - intersection
return (intersection/union)
}
ora <- c(ora_ups,ora_dns)
orafix <- c(orafix_ups,orafix_dns)
jaccard(ora,orafix)
## [1] 1
List gene sets identified only in the “fixed” analysis.
ora <- union(ora_ups,ora_dns)
orafix <- union(orafix_ups,orafix_dns)
setdiff(orafix,ora)
## character(0)
I’ve compiled a reporting checklist:
Reporting criteria | Method/resource used |
---|---|
Origin of gene sets | Reactome (2023-03-06) |
Tool used | ClusterProfiler (check version at foot of report) |
Statistical test used | hypergeometric test |
P-value correction for multiple comparisons | FDR method |
Background list | Genes with >=10 reads per sample on average across all samples |
Gene Selection Criteria | DESeq2 FDR<0.05 |
ORA directionality | Separate tests for up- and down-regulation |
Data availability | Data availability |
Other parameters | Min gene set size of 10 |
Here I provide a custom background gene list.
Get the gene sets loaded in R. These are Reactome gene sets
genesets2 <- read.gmt("../ref/ReactomePathways_2023-03-06.gmt")
gsets <- gmtPathways("../ref/ReactomePathways_2023-03-06.gmt")
message(paste("number of genes described in the annotation set:",length(unique(genesets2$gene))))
## number of genes described in the annotation set: 11457
Enrichment firstly with upregulated genes
n_pw=10
ora_up <- as.data.frame(enricher(gene = defup ,
universe = bg, maxGSSize = 5000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
ora_up$geneID <- NULL
ora_up <- subset(ora_up,p.adjust<0.05 & Count >=5)
ora_ups <- rownames(ora_up)
gr <- as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",2))
ora_up$es <- gr/br
ora_up <- ora_up[order(-ora_up$es),]
ora_up$Description=NULL
head(ora_up) %>%
kbl(row.names = FALSE, caption="Top upregulated pathways in LPS treatment") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
Signaling by ALK fusions and activated point mutants | 5/32 | 43/3973 | 0.0000190 | 0.0023435 | 0.0019875 | 5 | 14.436773 |
Signaling by ALK in cancer | 5/32 | 43/3973 | 0.0000190 | 0.0023435 | 0.0019875 | 5 | 14.436773 |
Signaling by Interleukins | 7/32 | 181/3973 | 0.0004631 | 0.0228754 | 0.0194000 | 7 | 4.801623 |
Cytokine Signaling in Immune system | 8/32 | 281/3973 | 0.0013334 | 0.0478657 | 0.0405936 | 8 | 3.534698 |
topup2 <- rev(head(ora_up$es,10))
names(topup2) <- rev(head(ora_up$ID,10))
Now repeat with the downregulated genes
ora_dn <- as.data.frame(enricher(gene = defdn ,
universe = bg, maxGSSize = 5000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
ora_dn$geneID <- NULL
ora_dn <- subset(ora_dn,p.adjust<0.05 & Count >=5)
ora_dns <- rownames(ora_dn)
gr <- as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",2))
ora_dn$es <- gr/br
ora_dn <- ora_dn[order(-ora_dn$es),]
ora_dn$Description=NULL
head(ora_dn,n_pw) %>%
kbl(row.names = FALSE, caption="Top downregulated pathways in LPS treatment") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
ECM proteoglycans | 5/131 | 17/3973 | 0.0001624 | 0.0382412 | 0.0361056 | 5 | 8.920072 |
Glycosphingolipid metabolism | 5/131 | 18/3973 | 0.0002189 | 0.0382412 | 0.0361056 | 5 | 8.424512 |
Extracellular matrix organization | 10/131 | 66/3973 | 0.0000472 | 0.0247204 | 0.0233399 | 10 | 4.595188 |
topdn2 <- head(ora_dn$es,n_pw)
names(topdn2) <- head(ora_dn$ID,n_pw)
Make a barplot
par(mar=c(5,20,5,1))
cols <- c(rep("blue",n_pw),rep("red",n_pw))
top <- c(topdn2,topup2)
if ( length(top) > 1 ) {
barplot(c(top),
horiz=TRUE,las=1,cex.names=0.65,col=cols,
main="top DE Reactomes",
xlab="ES",
cex.main=0.9)
mtext("ORA test")
}
I’ve compiled a reporting checklist:
Reporting criteria | Method/resource used |
---|---|
Origin of gene sets | Reactome (2023-03-06) |
Tool used | ClusterProfiler (check version at foot of report) |
Statistical test used | hypergeometric test |
P-value correction for multiple comparisons | FDR method |
Background list | Genes with >=10 reads per sample on average across all samples |
Gene Selection Criteria | DESeq2 FDR<0.05 |
ORA directionality | Separate tests for up- and down-regulation |
Data availability | Data availability |
Other parameters | Min gene set size of 10 + BUG FIX |
Adding all detected genes to the background appears to improve results!
bgdf <- data.frame("background",bg)
colnames(bgdf) <- c("term","gene")
genesets2 <- rbind(genesets2,bgdf)
Enrichment firstly with upregulated genes
orafix_up <- as.data.frame(enricher(gene = defup ,
universe = bg, maxGSSize = 5000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
orafix_up$geneID <- NULL
orafix_up <- subset(orafix_up,p.adjust<0.05 & Count >=5)
orafix_ups <- rownames(orafix_up)
gr <- as.numeric(sapply(strsplit(orafix_up$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(orafix_up$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(orafix_up$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(orafix_up$BgRatio,"/"),"[[",2))
orafix_up$es <- gr/br
orafix_up <- orafix_up[order(-orafix_up$es),]
orafix_up$Description=NULL
head(orafix_up) %>%
kbl(row.names = FALSE, caption="Top upregulated pathways in LPS treatment") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
Signaling by ALK fusions and activated point mutants | 5/44 | 43/5804 | 0.0000154 | 0.0019033 | 0.0015979 | 5 | 15.338266 |
Signaling by ALK in cancer | 5/44 | 43/5804 | 0.0000154 | 0.0019033 | 0.0015979 | 5 | 15.338266 |
Signaling by Interleukins | 7/44 | 181/5804 | 0.0003684 | 0.0182006 | 0.0152803 | 7 | 5.101457 |
Cytokine Signaling in Immune system | 8/44 | 281/5804 | 0.0010554 | 0.0411722 | 0.0345660 | 8 | 3.755419 |
topup2 <- rev(head(orafix_up$es,10))
names(topup2) <- rev(head(orafix_up$ID,10))
Now with the downregulated genes
n_pw=10
orafix_dn <- as.data.frame(enricher(gene = defdn ,
universe = bg, maxGSSize = 5000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
orafix_dn$geneID <- NULL
orafix_dn <- subset(orafix_dn,p.adjust<0.05 & Count >=5)
orafix_dns <- rownames(orafix_dn)
gr <- as.numeric(sapply(strsplit(orafix_dn$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(orafix_dn$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(orafix_dn$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(orafix_dn$BgRatio,"/"),"[[",2))
orafix_dn$es <- gr/br
orafix_dn <- orafix_dn[order(-orafix_dn$es),]
orafix_dn$Description=NULL
head(orafix_dn,n_pw) %>%
kbl(row.names = FALSE, caption="Top downregulated pathways in LPS treatment") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
ECM proteoglycans | 5/165 | 17/5804 | 0.0000820 | 0.0193713 | 0.0176669 | 5 | 10.345811 |
Glycosphingolipid metabolism | 5/165 | 18/5804 | 0.0001109 | 0.0193713 | 0.0176669 | 5 | 9.771044 |
Glycosaminoglycan metabolism | 6/165 | 33/5804 | 0.0002822 | 0.0258435 | 0.0235696 | 6 | 6.395592 |
Degradation of the extracellular matrix | 5/165 | 28/5804 | 0.0010104 | 0.0441225 | 0.0402403 | 5 | 6.281385 |
Post-translational protein phosphorylation | 6/165 | 35/5804 | 0.0003946 | 0.0258435 | 0.0235696 | 6 | 6.030130 |
Regulation of Insulin-like Growth Factor (IGF) transport and uptake by Insulin-like Growth Factor Binding Proteins (IGFBPs) | 6/165 | 37/5804 | 0.0005392 | 0.0313907 | 0.0286287 | 6 | 5.704177 |
Sphingolipid metabolism | 6/165 | 38/5804 | 0.0006254 | 0.0327698 | 0.0298865 | 6 | 5.554067 |
Extracellular matrix organization | 10/165 | 66/5804 | 0.0000140 | 0.0073459 | 0.0066996 | 10 | 5.329660 |
Diseases of glycosylation | 7/165 | 47/5804 | 0.0003186 | 0.0258435 | 0.0235696 | 7 | 5.238943 |
Diseases of metabolism | 9/165 | 90/5804 | 0.0009565 | 0.0441225 | 0.0402403 | 9 | 3.517576 |
topdn2 <- head(orafix_dn$es,n_pw)
names(topdn2) <- head(orafix_dn$ID,n_pw)
Make a barplot
par(mar=c(5,20,5,1))
cols <- c(rep("blue",n_pw),rep("red",n_pw))
top <- c(topdn2,topup2)
if ( length(top) > 1 ) {
barplot(c(top),
horiz=TRUE,las=1,cex.names=0.65,col=cols,
main="top DE Reactomes",
xlab="ES",
cex.main=0.9)
mtext("ORA fix")
}
v1 <- list("ORA up"=ora_ups, "ORA fix up"=orafix_ups)
v2 <- list("ORA dn"=ora_dns, "ORA fix dn"=orafix_dns)
v3 <- list("ORA"=union(ora_ups,ora_dns),
"ORA fix"=union(orafix_ups,orafix_dns))
par(mar=c(10,10,10,10))
par(mfrow=c(2,1))
plot(euler(v1),quantities = list(cex = 2), labels = list(cex = 2),main="upregulated Reactomes")
plot(euler(v2),quantities = list(cex = 2), labels = list(cex = 2),main="downregulated Reactomes")
plot(euler(v3),quantities = list(cex = 2), labels = list(cex = 2),main="up- and down-regulated Reactomes")
png("prot_reactome_euler.png")
plot(euler(v3),quantities = list(cex = 2), labels = list(cex = 2),main="up- and down-regulated Reactomes")
dev.off()
## png
## 2
Jaccard index comparing ORA and ORA fix.
jaccard <- function(a, b) {
intersection = length(intersect(a, b))
union = length(a) + length(b) - intersection
return (intersection/union)
}
ora <- c(ora_ups,ora_dns)
orafix <- c(orafix_ups,orafix_dns)
jaccard(ora,orafix)
## [1] 0.5
List gene sets identified only in the “fixed” analysis.
ora <- union(ora_ups,ora_dns)
orafix <- union(orafix_ups,orafix_dns)
setdiff(orafix,ora)
## [1] "Glycosaminoglycan metabolism"
## [2] "Diseases of glycosylation"
## [3] "Post-translational protein phosphorylation"
## [4] "Regulation of Insulin-like Growth Factor (IGF) transport and uptake by Insulin-like Growth Factor Binding Proteins (IGFBPs)"
## [5] "Sphingolipid metabolism"
## [6] "Diseases of metabolism"
## [7] "Degradation of the extracellular matrix"
For reproducibility
sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
## [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
## [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Australia/Melbourne
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] gplots_3.1.3 eulerr_7.0.0
## [3] fgsea_1.26.0 clusterProfiler_4.8.2
## [5] kableExtra_1.3.4 DESeq2_1.40.2
## [7] SummarizedExperiment_1.30.2 Biobase_2.60.0
## [9] MatrixGenerics_1.12.2 matrixStats_1.0.0
## [11] GenomicRanges_1.52.0 GenomeInfoDb_1.36.1
## [13] IRanges_2.34.1 S4Vectors_0.38.1
## [15] BiocGenerics_0.46.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 rstudioapi_0.15.0 jsonlite_1.8.7
## [4] magrittr_2.0.3 farver_2.1.1 rmarkdown_2.23
## [7] zlibbioc_1.46.0 vctrs_0.6.3 memoise_2.0.1
## [10] RCurl_1.98-1.12 ggtree_3.8.2 webshot_0.5.5
## [13] htmltools_0.5.5 S4Arrays_1.0.5 gridGraphics_0.5-1
## [16] sass_0.4.7 bslib_0.5.0 KernSmooth_2.23-22
## [19] plyr_1.8.8 cachem_1.0.8 igraph_1.5.0.1
## [22] lifecycle_1.0.3 pkgconfig_2.0.3 Matrix_1.6-0
## [25] R6_2.5.1 fastmap_1.1.1 gson_0.1.0
## [28] GenomeInfoDbData_1.2.10 digest_0.6.33 aplot_0.1.10
## [31] enrichplot_1.20.0 colorspace_2.1-0 patchwork_1.1.2
## [34] AnnotationDbi_1.62.2 RSQLite_2.3.1 fansi_1.0.4
## [37] httr_1.4.6 polyclip_1.10-4 abind_1.4-5
## [40] compiler_4.3.1 bit64_4.0.5 withr_2.5.0
## [43] downloader_0.4 BiocParallel_1.34.2 viridis_0.6.4
## [46] DBI_1.1.3 highr_0.10 ggforce_0.4.1
## [49] MASS_7.3-60 DelayedArray_0.26.6 HDO.db_0.99.1
## [52] caTools_1.18.2 gtools_3.9.4 tools_4.3.1
## [55] ape_5.7-1 scatterpie_0.2.1 glue_1.6.2
## [58] nlme_3.1-162 GOSemSim_2.26.1 polylabelr_0.2.0
## [61] grid_4.3.1 shadowtext_0.1.2 reshape2_1.4.4
## [64] generics_0.1.3 gtable_0.3.3 tidyr_1.3.0
## [67] data.table_1.14.8 tidygraph_1.2.3 xml2_1.3.5
## [70] utf8_1.2.3 XVector_0.40.0 ggrepel_0.9.3
## [73] pillar_1.9.0 stringr_1.5.0 yulab.utils_0.0.6
## [76] splines_4.3.1 dplyr_1.1.2 tweenr_2.0.2
## [79] treeio_1.24.3 lattice_0.21-8 bit_4.0.5
## [82] tidyselect_1.2.0 GO.db_3.17.0 locfit_1.5-9.8
## [85] Biostrings_2.68.1 knitr_1.43 gridExtra_2.3
## [88] svglite_2.1.1 xfun_0.39 graphlayouts_1.0.0
## [91] stringi_1.7.12 yaml_2.3.7 lazyeval_0.2.2
## [94] ggfun_0.1.1 evaluate_0.21 codetools_0.2-19
## [97] ggraph_2.1.0 tibble_3.2.1 qvalue_2.32.0
## [100] ggplotify_0.1.1 cli_3.6.1 systemfonts_1.0.4
## [103] jquerylib_0.1.4 munsell_0.5.0 Rcpp_1.0.11
## [106] png_0.1-8 parallel_4.3.1 ggplot2_3.4.2
## [109] blob_1.2.4 DOSE_3.26.1 bitops_1.0-7
## [112] viridisLite_0.4.2 tidytree_0.4.4 scales_1.2.1
## [115] purrr_1.0.1 crayon_1.5.2 rlang_1.1.1
## [118] cowplot_1.1.1 fastmatch_1.1-3 KEGGREST_1.40.0
## [121] rvest_1.0.3