Source: https://github.com/markziemann/enrichment_recipe
This guide is a Rmarkdown script that conducts differential expression and enrichment analysis, which are very popular workflows for transcriptome data.
This is meant to be a boilerplate template, which you can remix and modify to suit your analytical needs.
In the code chunk below called libs
, you can add and remove required R library dependancies. Check that the libraries listed here match the Dockerfile, otherwise you might get errors.
suppressPackageStartupMessages({
library("getDEE2")
library("DESeq2")
library("kableExtra")
library("clusterProfiler")
library("fgsea")
library("eulerr")
library("gplots")
})
For this guide I will be using bulk RNA-seq data from a previous study, which is deposited at NCBI GEO with accession GSE93236.
The experiment is designed to investigate the effect of knowckdown of the histone methyltransferase set7.
We will be fetching the gene expression count matrix directly from NCBI GEO for downstream analysis. Link: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE93nnn/GSE93236/suppl/GSE93236_Keating_Set7_KD_counts.tsv.gz
download.file("https://ftp.ncbi.nlm.nih.gov/geo/series/GSE93nnn/GSE93236/suppl/GSE93236_Keating_Set7_KD_counts.tsv.gz",
destfile="GSE93236_Keating_Set7_KD_counts.tsv.gz")
x <- read.table("GSE93236_Keating_Set7_KD_counts.tsv.gz")
dim(x)
## [1] 18545 6
# there is a row we need to remove
mx <- x[rownames(x)!="NOTINBED",]
head(mx)
## FC045_3_NTC1 FC045_4_Set7KD1 FC045_5_NTC2
## ENSG00000165195.9_PIGA 303 291 280
## ENSG00000249719.1_RP3-486L4.3 54 51 48
## ENSG00000146966.8_DENND2A 1171 390 1153
## ENSG00000129103.13_SUMF2 1782 1476 1673
## ENSG00000256304.1_RP11-512M8.3 31 35 28
## ENSG00000203945.3_RP11-274K13.2 15 11 13
## FC045_6_Set7KD2 FC045_7_NTC3 FC045_8_Set7KD3
## ENSG00000165195.9_PIGA 294 262 257
## ENSG00000249719.1_RP3-486L4.3 50 49 60
## ENSG00000146966.8_DENND2A 493 1019 309
## ENSG00000129103.13_SUMF2 1473 1700 1366
## ENSG00000256304.1_RP11-512M8.3 43 26 47
## ENSG00000203945.3_RP11-274K13.2 10 11 11
dim(mx)
## [1] 18544 6
The row names contains an underscore that we need to remove.
rownames(mx) <- sub("_"," ",rownames(mx))
head(mx)
## FC045_3_NTC1 FC045_4_Set7KD1 FC045_5_NTC2
## ENSG00000165195.9 PIGA 303 291 280
## ENSG00000249719.1 RP3-486L4.3 54 51 48
## ENSG00000146966.8 DENND2A 1171 390 1153
## ENSG00000129103.13 SUMF2 1782 1476 1673
## ENSG00000256304.1 RP11-512M8.3 31 35 28
## ENSG00000203945.3 RP11-274K13.2 15 11 13
## FC045_6_Set7KD2 FC045_7_NTC3 FC045_8_Set7KD3
## ENSG00000165195.9 PIGA 294 262 257
## ENSG00000249719.1 RP3-486L4.3 50 49 60
## ENSG00000146966.8 DENND2A 493 1019 309
## ENSG00000129103.13 SUMF2 1473 1700 1366
## ENSG00000256304.1 RP11-512M8.3 43 26 47
## ENSG00000203945.3 RP11-274K13.2 10 11 11
Curate sample sheet from the colnames.
ss <- as.data.frame(colnames(x))
ss$trt <- as.numeric(grepl("Set7KD",colnames(x)))
ss %>%
kbl(caption="sample sheet for Set7KD experiment") %>%
kable_paper("hover", full_width = F)
colnames(x) | trt |
---|---|
FC045_3_NTC1 | 0 |
FC045_4_Set7KD1 | 1 |
FC045_5_NTC2 | 0 |
FC045_6_Set7KD2 | 1 |
FC045_7_NTC3 | 0 |
FC045_8_Set7KD3 | 1 |
QC is important, even if you are using public transcriptome data. For RNA-seq it is a good idea to show the number of reads for each sample.
par(mar=c(5,7,5,1))
barplot(rev(colSums(mx)),horiz=TRUE,las=1,main="number of reads per sample in SRP038101")
Now make a MDS plot.
mds <- cmdscale(dist(t(mx)))
# expand plot area
XMIN=min(mds[,1])*1.3
XMAX=max(mds[,1])*1.3
YMIN=min(mds[,2])*1.3
YMAX=max(mds[,2])*1.3
cols <- as.character(ss$trt)
cols <- gsub("0","lightblue",cols)
cols <- gsub("1","pink",cols)
plot(mds, xlab="Coordinate 1", ylab="Coordinate 2",
xlim=c(XMIN,XMAX),ylim=c(YMIN,YMAX),
pch=19,cex=2,col=cols, main="MDS plot")
text(cmdscale(dist(t(mx))), labels=colnames(mx) )
I will be using DESeq2 for DE analysis, the count matrix is prefiltered using a detection threshold of 10 reads per sample across all samples. All genes that meet the detection threshold will comprise the background list. The first 6 rows of the count matrix are shows.
mxf <- mx[which(rowMeans(mx)>=10),]
dim(mxf)
## [1] 18544 6
head(mxf,6) %>%
kbl(caption="Count matrix format") %>%
kable_paper("hover", full_width = F)
FC045_3_NTC1 | FC045_4_Set7KD1 | FC045_5_NTC2 | FC045_6_Set7KD2 | FC045_7_NTC3 | FC045_8_Set7KD3 | |
---|---|---|---|---|---|---|
ENSG00000165195.9 PIGA | 303 | 291 | 280 | 294 | 262 | 257 |
ENSG00000249719.1 RP3-486L4.3 | 54 | 51 | 48 | 50 | 49 | 60 |
ENSG00000146966.8 DENND2A | 1171 | 390 | 1153 | 493 | 1019 | 309 |
ENSG00000129103.13 SUMF2 | 1782 | 1476 | 1673 | 1473 | 1700 | 1366 |
ENSG00000256304.1 RP11-512M8.3 | 31 | 35 | 28 | 43 | 26 | 47 |
ENSG00000203945.3 RP11-274K13.2 | 15 | 11 | 13 | 10 | 11 | 11 |
dds <- DESeqDataSetFromMatrix(countData = mxf , colData = ss, design = ~ trt )
## the design formula contains one or more numeric variables with integer values,
## specifying a model with increasing fold change for higher values.
## did you mean for this to be a factor? if so, first convert
## this variable to a factor using the factor() function
res <- DESeq(dds)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <-cbind(as.data.frame(z),mxf)
def <-as.data.frame(zz[order(zz$pvalue),])
head(def,10) %>%
kbl(caption="Top DE genes for Aza treatment") %>%
kable_paper("hover", full_width = F)
baseMean | log2FoldChange | lfcSE | stat | pvalue | padj | FC045_3_NTC1 | FC045_4_Set7KD1 | FC045_5_NTC2 | FC045_6_Set7KD2 | FC045_7_NTC3 | FC045_8_Set7KD3 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ENSG00000164692.13 COL1A2 | 7325.1780 | 2.3007932 | 0.0872906 | 26.35785 | 0 | 0 | 2653 | 13366 | 2392 | 10048 | 2540 | 12544 |
ENSG00000168542.8 COL3A1 | 814.3688 | 2.7408245 | 0.1068516 | 25.65076 | 0 | 0 | 208 | 1554 | 228 | 1233 | 214 | 1395 |
ENSG00000172531.10 PPP1CA | 1571.0716 | -1.7187849 | 0.0764171 | -22.49216 | 0 | 0 | 2598 | 706 | 2353 | 748 | 2444 | 705 |
ENSG00000106484.8 MEST | 2657.6245 | 1.1789946 | 0.0690145 | 17.08328 | 0 | 0 | 1664 | 3796 | 1696 | 3399 | 1634 | 3683 |
ENSG00000130508.6 PXDN | 8180.4490 | -0.9726259 | 0.0619784 | -15.69298 | 0 | 0 | 11527 | 5249 | 10764 | 5616 | 10961 | 5419 |
ENSG00000125868.10 DSTN | 6067.2939 | 1.0357844 | 0.0664676 | 15.58330 | 0 | 0 | 4020 | 8146 | 3866 | 7524 | 4321 | 8393 |
ENSG00000169439.6 SDC2 | 3002.0236 | 1.3181753 | 0.0865541 | 15.22950 | 0 | 0 | 1892 | 4504 | 1802 | 3728 | 1577 | 4416 |
ENSG00000242265.1 PEG10 | 3171.5304 | 1.0736020 | 0.0708803 | 15.14670 | 0 | 0 | 2023 | 4234 | 1976 | 4070 | 2270 | 4380 |
ENSG00000166734.13 CASC4 | 1107.0779 | -1.4078612 | 0.0930505 | -15.13007 | 0 | 0 | 1620 | 535 | 1685 | 671 | 1626 | 580 |
ENSG00000091986.10 CCDC80 | 6264.0879 | 1.3185353 | 0.0878996 | 15.00047 | 0 | 0 | 3625 | 9486 | 3152 | 8289 | 4233 | 8613 |
Make a smear plot.
sigf <- subset(def,padj<=0.05)
DET=nrow(mxf)
NSIG=nrow(sigf)
NUP=nrow(subset(sigf,log2FoldChange>0))
NDN=nrow(subset(sigf,log2FoldChange<0))
HEADER=paste(DET,"detected genes;",NSIG,"w/FDR<0.05;",NUP,"up;",NDN,"down")
plot(log10(def$baseMean) ,def$log2FoldChange,
cex=0.6,pch=19,col="darkgray",
main="Set7KD in endothelial cells",
xlab="log10(basemean)",ylab="log2(fold change)")
points(log10(sigf$baseMean) ,sigf$log2FoldChange,
cex=0.6,pch=19,col="red")
mtext(HEADER)
In the next sections I will run enrichment analysis with over-representation test and compare it to functional class scoring.
I’ve compiled a reporting checklist:
Reporting criteria | Method/resource used |
---|---|
Origin of gene sets | Reactome (2023-03-06) |
Tool used | ClusterProfiler (check version at foot of report) |
Statistical test used | hypergeometric test |
P-value correction for multiple comparisons | FDR method |
Background list | Genes with >=10 reads per sample on average across all samples |
Gene Selection Criteria | DESeq2 FDR<0.05 |
ORA directionality | Separate tests for up- and down-regulation |
Data availability | GEO under accession GSE93236 |
Other parameters | Min gene set size of 10 |
Get the gene sets loaded in R. They are located in the /ref
folder of the repo.
# from https://reactome.org/download/current/ReactomePathways.gmt.zip
genesets2 <- read.gmt("ref/ReactomePathways_2023-03-06.gmt")
gsets <- gmtPathways("ref/ReactomePathways_2023-03-06.gmt")
Now filter the gene names into three lists, up-regulated, down-regulated and background. Background is simply all genes that were detected.
defup <- rownames(subset(def,padj<=0.05 & log2FoldChange>0))
defup <- unique(sapply(strsplit(defup," "),"[[",2))
defdn <- rownames(subset(def,padj<=0.05 & log2FoldChange<0))
defdn <- unique(sapply(strsplit(defdn," "),"[[",2))
bg <- rownames(def)
bg <- unique(sapply(strsplit(bg," "),"[[",2))
message("number of genes in each group")
## number of genes in each group
lapply(list("background"=bg,"up-regulated"=defup,"down-regulated"=defdn),length)
## $background
## [1] 18504
##
## $`up-regulated`
## [1] 2700
##
## $`down-regulated`
## [1] 2682
One of the quirky things I learned about clusterprofiler is that not all genes specified by the user as background are included in the enrichmentcalculation. It appears as though the genes not included in the annotation set are discarded. I believe this is a subtle but significant problem. To fix it is relatively easy from the user’s point of view. Just create a new gene set that consists of all detected genes and
Adding all detected genes to the background appears to improve results!
bgdf <- data.frame("background",bg)
colnames(bgdf) <- c("term","gene")
genesets2 <- rbind(genesets2,bgdf)
Enrichment firstly with upregulated genes. Note that the BgRatio contains the correct number of genes in the background now.
# show 10 pathways only
n_pw=10
ora_up <- as.data.frame(enricher(gene = defup ,
universe = bg, maxGSSize = 50000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
ora_up$geneID <- NULL
ora_up <- subset(ora_up,p.adjust<0.05 & Count >=10)
ora_ups <- rownames(ora_up)
gr <- as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",2))
ora_up$es <- gr/br
ora_up <- ora_up[order(-ora_up$es),]
ora_up$Description=NULL
head(ora_up,n_pw) %>%
kbl(row.names = FALSE, caption="Top upregulated pathways in Aza treatment") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
Cell-extracellular matrix interactions | 12/2700 | 18/18504 | 7.00e-07 | 0.0000126 | 0.0000072 | 12 | 4.568889 |
Initiation of Nuclear Envelope (NE) Reformation | 11/2700 | 19/18504 | 1.52e-05 | 0.0001297 | 0.0000741 | 11 | 3.967719 |
Formation of tubulin folding intermediates by CCT/TriC | 12/2700 | 21/18504 | 7.40e-06 | 0.0000716 | 0.0000409 | 12 | 3.916190 |
RHOBTB1 GTPase cycle | 12/2700 | 23/18504 | 2.55e-05 | 0.0001958 | 0.0001118 | 12 | 3.575652 |
RHOBTB2 GTPase cycle | 12/2700 | 23/18504 | 2.55e-05 | 0.0001958 | 0.0001118 | 12 | 3.575652 |
Postmitotic nuclear pore complex (NPC) reformation | 13/2700 | 25/18504 | 1.22e-05 | 0.0001093 | 0.0000624 | 13 | 3.563733 |
Syndecan interactions | 14/2700 | 27/18504 | 5.90e-06 | 0.0000639 | 0.0000365 | 14 | 3.553580 |
Transport of Ribonucleoproteins into the Host Nucleus | 15/2700 | 29/18504 | 2.80e-06 | 0.0000346 | 0.0000198 | 15 | 3.544828 |
Transport of the SLBP Dependant Mature mRNA | 17/2700 | 33/18504 | 7.00e-07 | 0.0000119 | 0.0000068 | 17 | 3.530505 |
EPHB-mediated forward signaling | 19/2700 | 37/18504 | 2.00e-07 | 0.0000035 | 0.0000020 | 19 | 3.519279 |
topup2 <- rev(head(ora_up$es,10))
names(topup2) <- rev(head(ora_up$ID,10))
Now with the downregulated genes
ora_dn <- as.data.frame(enricher(gene = defdn ,
universe = bg, maxGSSize = 50000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
ora_dn$geneID <- NULL
ora_dn <- subset(ora_dn,p.adjust<0.05 & Count >=10)
ora_dns <- rownames(ora_dn)
gr <- as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",2))
ora_dn$es <- gr/br
ora_dn <- ora_dn[order(-ora_dn$es),]
ora_dn$Description=NULL
head(ora_dn,n_pw) %>%
kbl(row.names = FALSE, caption="Top downregulated pathways in Aza treatment") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
Interferon alpha/beta signaling | 29/2682 | 55/18504 | 0.0e+00 | 0.0e+00 | 0.00e+00 | 29 | 3.637828 |
Peptide chain elongation | 40/2682 | 87/18504 | 0.0e+00 | 0.0e+00 | 0.00e+00 | 40 | 3.172105 |
Eukaryotic Translation Elongation | 42/2682 | 92/18504 | 0.0e+00 | 0.0e+00 | 0.00e+00 | 42 | 3.149694 |
Response of EIF2AK4 (GCN2) to amino acid deficiency | 44/2682 | 98/18504 | 0.0e+00 | 0.0e+00 | 0.00e+00 | 44 | 3.097658 |
Selenocysteine synthesis | 40/2682 | 90/18504 | 0.0e+00 | 0.0e+00 | 0.00e+00 | 40 | 3.066368 |
Viral mRNA Translation | 38/2682 | 87/18504 | 0.0e+00 | 0.0e+00 | 0.00e+00 | 38 | 3.013500 |
FOXO-mediated transcription | 22/2682 | 52/18504 | 1.1e-06 | 4.1e-05 | 3.56e-05 | 22 | 2.918947 |
Eukaryotic Translation Termination | 38/2682 | 91/18504 | 0.0e+00 | 0.0e+00 | 0.00e+00 | 38 | 2.881038 |
Selenoamino acid metabolism | 43/2682 | 103/18504 | 0.0e+00 | 0.0e+00 | 0.00e+00 | 43 | 2.880302 |
Formation of a pool of free 40S subunits | 41/2682 | 99/18504 | 0.0e+00 | 0.0e+00 | 0.00e+00 | 41 | 2.857298 |
topdn2 <- head(ora_dn$es,n_pw)
names(topdn2) <- head(ora_dn$ID,n_pw)
Make a barplot
par(mar=c(5,20,5,1))
cols <- c(rep("blue",n_pw),rep("red",n_pw))
barplot(c(topdn2,topup2),
horiz=TRUE,las=1,cex.names=0.65,col=cols,
main="top DE Reactomes",
xlab="ES",
cex.main=0.9)
mtext("ORA test")
Reporting criteria | Method/resource used |
---|---|
Origin of gene sets | Reactome (2023-03-06) |
Tool used | FGSEA (check version at foot of report) |
Statistical test used | Kolmogorov-Smirnov test |
P-value correction for multiple comparisons | FDR method |
Background list | Genes with >=10 reads per sample on average across all samples |
Gene Selection Criteria | DESeq2 FDR<0.05 |
ORA directionality | Separate tests for up- and down-regulation |
Data availability | via DEE2 at accession SRP038101 (human) |
Other parameters | Min gene set size of 10 |
def2 <- def
def2$genename <- sapply(strsplit(rownames(def2)," "),"[[",2)
def2 <- def2[,c("stat","genename")]
def2 <- aggregate(. ~ genename, def2, sum)
stat <- def2$stat
names(stat) <- def2$genename
fgseaRes <- fgsea(pathways=gsets, stats=stat, minSize=10, nPermSimple = 10000)
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (0.06% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
fgseaRes <- fgseaRes[order(fgseaRes$pval),]
fgsea_up <- subset(fgseaRes,padj<0.05 & ES>0)
fgsea_ups <- fgsea_up$pathway
fgsea_dn <- subset(fgseaRes,padj<0.05 & ES<0)
fgsea_dns <- fgsea_dn$pathway
fgsea_up <- data.frame(fgsea_up[order(-fgsea_up$ES),])
fgsea_dn <- data.frame(fgsea_dn[order(fgsea_dn$ES),])
head(fgsea_up,n_pw) %>%
kbl(row.names = FALSE, caption="FGSEA:Top upregulated pathways in Aza treatment") %>%
kable_paper("hover", full_width = F)
pathway | pval | padj | log2err | ES | NES | size | leadingEdge |
---|---|---|---|---|---|---|---|
Folding of actin by CCT/TriC | 0.0000049 | 0.0001013 | 0.6105269 | 0.8964312 | 2.140545 | 10 | CCT5 , TCP1 , CCT6A, ACTB , CCT3 , CCT2 , CCT7 , CCT4 |
GP1b-IX-V activation signalling | 0.0001711 | 0.0015309 | 0.5188481 | 0.8341859 | 1.991913 | 10 | COL1A2, PIK3R1, FLNA , COL1A1, RAF1 |
Platelet Adhesion to exposed collagen | 0.0011459 | 0.0078804 | 0.4550599 | 0.7729294 | 1.891480 | 11 | COL1A2, ITGA1 , COL1A1, ITGB1 , ITGA2 |
Cell-extracellular matrix interactions | 0.0000247 | 0.0003444 | 0.5756103 | 0.7577980 | 2.118309 | 18 | RSU1 , FERMT2, ACTN1 , FLNA , FBLIM1, VASP , ACTB , PARVA , ITGB1 , LIMS1 , ILK , FLNC |
Formation of tubulin folding intermediates by CCT/TriC | 0.0000062 | 0.0001184 | 0.6105269 | 0.7545947 | 2.200061 | 21 | TUBB3 , CCT5 , TCP1 , TUBB2A, CCT6A , CCT3 , CCT2 , TUBA1C, CCT7 , CCT4 , TUBA1B |
MASTL Facilitates Mitotic Progression | 0.0080462 | 0.0408024 | 0.2320568 | 0.7262829 | 1.734256 | 10 | PPP2CA , PPP2R1B, PPP2R1A, PPP2CB , ENSA |
Prefoldin mediated transfer of substrate to CCT/TriC | 0.0000056 | 0.0001107 | 0.6105269 | 0.7165120 | 2.199181 | 26 | TUBB3 , CCT5 , TCP1 , TUBB2A, CCT6A , ACTB , CCT3 , CCT2 , TUBA1C, CCT7 , CCT4 , VBP1 |
Cooperation of Prefoldin and TriC/CCT in actin and tubulin folding | 0.0000018 | 0.0000510 | 0.6435518 | 0.7130937 | 2.233069 | 28 | TUBB3 , CCT5 , TCP1 , TUBB2A, CCT6A , ACTB , CCT3 , CCT2 , TUBA1C, CCT7 , CCT4 , TUBA1B, VBP1 |
CTNNB1 S33 mutants aren’t phosphorylated | 0.0018180 | 0.0113301 | 0.4550599 | 0.7026577 | 1.841133 | 14 | PPP2CA , PPP2R1B, CTNNB1 , PPP2R5A, APC , PPP2R1A, PPP2CB |
CTNNB1 S37 mutants aren’t phosphorylated | 0.0018180 | 0.0113301 | 0.4550599 | 0.7026577 | 1.841133 | 14 | PPP2CA , PPP2R1B, CTNNB1 , PPP2R5A, APC , PPP2R1A, PPP2CB |
head(fgsea_dn,n_pw) %>%
kbl(row.names = FALSE, caption="FGSEA:Top downregulated pathways in Aza treatment") %>%
kable_paper("hover", full_width = F)
pathway | pval | padj | log2err | ES | NES | size | leadingEdge |
---|---|---|---|---|---|---|---|
Endosomal/Vacuolar pathway | 0.0000648 | 0.0007066 | 0.5384341 | -0.8564798 | -2.028123 | 10 | HLA-E, CTSS , HLA-A, HLA-C, HLA-H, B2M , HLA-F, HLA-B, HLA-G |
CASP8 activity is inhibited | 0.0046557 | 0.0257910 | 0.2972329 | -0.7534651 | -1.784187 | 10 | TNFRSF10B, TNFRSF10A, FAS , CFLAR , TRADD , TRAF2 , TNFSF10 |
Dimerization of procaspase-8 | 0.0046557 | 0.0257910 | 0.2972329 | -0.7534651 | -1.784187 | 10 | TNFRSF10B, TNFRSF10A, FAS , CFLAR , TRADD , TRAF2 , TNFSF10 |
Regulation by c-FLIP | 0.0046557 | 0.0257910 | 0.2972329 | -0.7534651 | -1.784187 | 10 | TNFRSF10B, TNFRSF10A, FAS , CFLAR , TRADD , TRAF2 , TNFSF10 |
ABC transporters in lipid homeostasis | 0.0087549 | 0.0427471 | 0.2157829 | -0.6668947 | -1.733942 | 14 | ABCA2 , ABCG1 , ABCA3 , ABCA7 , ABCA6 , ABCA9 , ABCA5 , ABCA12 |
Regulation of RUNX1 Expression and Activity | 0.0102121 | 0.0478077 | 0.2005132 | -0.6411896 | -1.693097 | 15 | TNRC6B, PTPN11, CCND2 , CBFB , PML |
Eukaryotic Translation Elongation | 0.0000000 | 0.0000000 | 0.8753251 | -0.6273228 | -2.468077 | 92 | RPL3 , EEF1A2 , RPS14 , RPL13A , RPL19 , RPS11 , RPL13 , RPS13 , EEF2 , RPS16 , RPL28 , RPS20 , RPS9 , RPL4 , RPS8 , RPL10A , RPL22L1, RPL15 , RPS19 , RPS3 , RPL8 , RPL37 , RPL26 , RPL12 , RPS6 , RPL23A , RPL29 , RPL37A , RPL22 , RPL18 , RPS21 , RPL31 , RPL18A , EEF1G , RPLP2 , RPL27A , RPS4X , RPL23 , RPL5 , RPS18 , RPL35A , EEF1A1 , RPL34 , RPL7A , RPS15 , RPS29 , EEF1D , RPLP1 , RPL30 , RPL14 , RPL36 |
Peptide chain elongation | 0.0000000 | 0.0000000 | 0.8634154 | -0.6251293 | -2.436687 | 87 | RPL3 , RPS14 , RPL13A , RPL19 , RPS11 , RPL13 , RPS13 , EEF2 , RPS16 , RPL28 , RPS20 , RPS9 , RPL4 , RPS8 , RPL10A , RPL22L1, RPL15 , RPS19 , RPS3 , RPL8 , RPL37 , RPL26 , RPL12 , RPS6 , RPL23A , RPL29 , RPL37A , RPL22 , RPL18 , RPS21 , RPL31 , RPL18A , RPLP2 , RPL27A , RPS4X , RPL23 , RPL5 , RPS18 , RPL35A , EEF1A1 , RPL34 , RPL7A , RPS15 , RPS29 , RPLP1 , RPL30 , RPL14 , RPL36 , RPL10 , RPL32 , RPS15A , RPL35 , RPS27 , RPLP0 |
Selenocysteine synthesis | 0.0000000 | 0.0000000 | 0.8513391 | -0.6159046 | -2.414271 | 90 | RPL3 , RPS14 , RPL13A , RPL19 , RPS11 , RPL13 , RPS13 , RPS16 , RPL28 , RPS20 , RPS9 , RPL4 , RPS8 , RPL10A , RPL22L1 , RPL15 , SEPSECS , RPS19 , RPS3 , RPL8 , RPL37 , RPL26 , RPL12 , RPS6 , RPL23A , RPL29 , RPL37A , RPL22 , RPL18 , RPS21 , RPL31 , RPL18A , RPLP2 , RPL27A , RPS4X , RPL23 , RPL5 , RPS18 , SECISBP2, RPL35A , RPL34 , RPL7A , RPS15 , RPS29 , RPLP1 , RPL30 , RPL14 , RPL36 , RPL10 , RPL32 , RPS15A , RPL35 , RPS27 , RPLP0 , EEFSEC , RPL11 , RPL38 , RPS28 , RPS4Y1 , RPL36AL |
Interferon alpha/beta signaling | 0.0000007 | 0.0000262 | 0.6594444 | -0.6136285 | -2.205558 | 55 | HLA-E , STAT2 , PTPN11, IFITM3, BST2 , ISG20 , HLA-A , IRF1 , HLA-C , IRF3 , IFITM1, HLA-H , IFITM2, IRF7 , HLA-F , MX2 , USP18 , SOCS3 , IRF9 , ISG15 , IFI27 , HLA-B , EGR1 , IFIT1 , OAS3 , HLA-G , RNASEL, MX1 , IFNAR2 |
fgsea_up <- head(fgsea_up,n_pw)
fgsea_up_vec <- fgsea_up$ES
names(fgsea_up_vec) <- fgsea_up$pathway
fgsea_up_vec <- rev(fgsea_up_vec)
fgsea_dn <- head(fgsea_dn,n_pw)
fgsea_dn_vec <- fgsea_dn$ES * -1
names(fgsea_dn_vec) <- fgsea_dn$pathway
Make a barplot
par(mar=c(5,20,5,1))
cols <- c(rep("blue",n_pw),rep("red",n_pw))
barplot(c(fgsea_dn_vec,fgsea_up_vec),
horiz=TRUE,las=1,cex.names=0.65,col=cols,
main="top DE Reactomes",
xlab="ES",
cex.main=0.9)
mtext("FCS test")
v1 <- list("ORA up"=ora_ups, "FCS up"=fgsea_ups)
v2 <- list("ORA dn"=ora_dns, "FCS dn"=fgsea_dns)
par(mar=c(10,10,10,10))
par(mfrow=c(2,1))
plot(euler(v1),quantities = list(cex = 2), labels = list(cex = 2))
plot(euler(v2),quantities = list(cex = 2), labels = list(cex = 2))
Jaccard index
jaccard <- function(a, b) {
intersection = length(intersect(a, b))
union = length(a) + length(b) - intersection
return (intersection/union)
}
ora <- c(ora_ups,ora_dns)
fcs <- c(fgsea_ups,fgsea_dns)
jaccard(ora,fcs)
## [1] 0.5233463
In this case the top upregulated and downregulated pathways
upname <- head(fgsea_up,1)$pathway
plotEnrichment(gsets[[upname]], stat, gseaParam = 1, ticksSize = 0.2)
dnname <- head(fgsea_dn,1)$pathway
plotEnrichment(gsets[[dnname]], stat, gseaParam = 1, ticksSize = 0.2)
Now a heatmap of these gene sets.
# reads per million normalisation
rpm <- apply(mxf,2, function(x) { x/sum(x) * 1000000} )
colnames(rpm) <- c("UNTR1","UNTR2","UNTR3","TRT1","TRT2","TRT3")
gnames_up <- gsets[[which(names(gsets) == upname)]]
gnames_dn <- gsets[[which(names(gsets) == dnname)]]
gene_ids <- rownames(mxf)
gene_names <- sapply(strsplit(gene_ids," "),"[[",2)
rpm_up <- rpm[which(gene_names %in% gnames_up),]
rownames(rpm_up) <- sapply(strsplit(rownames(rpm_up)," "),"[[",2)
rpm_dn <- rpm[which(gene_names %in% gnames_dn),]
rownames(rpm_dn) <- sapply(strsplit(rownames(rpm_dn)," "),"[[",2)
colsidecols <- c("blue","blue","blue","red","red","red")
heatmap.2(rpm_up,scale="row",trace="none",margins=c(10,15),main=upname,ColSideColors=colsidecols)
heatmap.2(rpm_dn,scale="row",trace="none",margins=c(10,15),main=dnname,ColSideColors=colsidecols)
For reproducibility
sessionInfo()
## R version 4.3.0 (2023-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Australia/Melbourne
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] gplots_3.1.3 eulerr_7.0.0
## [3] fgsea_1.22.0 clusterProfiler_4.4.4
## [5] kableExtra_1.3.4 DESeq2_1.36.0
## [7] SummarizedExperiment_1.26.1 Biobase_2.56.0
## [9] MatrixGenerics_1.8.1 matrixStats_0.63.0
## [11] GenomicRanges_1.48.0 GenomeInfoDb_1.32.2
## [13] IRanges_2.30.0 S4Vectors_0.34.0
## [15] BiocGenerics_0.42.0 getDEE2_1.6.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 jsonlite_1.8.4 rstudioapi_0.14
## [4] magrittr_2.0.3 farver_2.1.1 rmarkdown_2.21
## [7] zlibbioc_1.42.0 vctrs_0.6.2 memoise_2.0.1
## [10] RCurl_1.98-1.12 ggtree_3.4.1 webshot_0.5.4
## [13] htmltools_0.5.5 gridGraphics_0.5-1 sass_0.4.5
## [16] bslib_0.4.2 KernSmooth_2.23-20 plyr_1.8.8
## [19] cachem_1.0.7 igraph_1.4.2 lifecycle_1.0.3
## [22] pkgconfig_2.0.3 Matrix_1.5-4 R6_2.5.1
## [25] fastmap_1.1.1 GenomeInfoDbData_1.2.8 digest_0.6.31
## [28] aplot_0.1.10 enrichplot_1.16.1 colorspace_2.1-0
## [31] patchwork_1.1.2 AnnotationDbi_1.58.0 geneplotter_1.74.0
## [34] RSQLite_2.3.1 labeling_0.4.2 fansi_1.0.4
## [37] httr_1.4.5 polyclip_1.10-4 compiler_4.3.0
## [40] bit64_4.0.5 withr_2.5.0 downloader_0.4
## [43] BiocParallel_1.30.3 viridis_0.6.2 DBI_1.1.3
## [46] highr_0.10 ggforce_0.4.1 MASS_7.3-59
## [49] DelayedArray_0.22.0 caTools_1.18.2 gtools_3.9.4
## [52] tools_4.3.0 DO.db_2.9 scatterpie_0.1.9
## [55] ape_5.7-1 glue_1.6.2 nlme_3.1-162
## [58] GOSemSim_2.22.0 polylabelr_0.2.0 shadowtext_0.1.2
## [61] grid_4.3.0 reshape2_1.4.4 generics_0.1.3
## [64] gtable_0.3.3 tidyr_1.3.0 data.table_1.14.8
## [67] tidygraph_1.2.3 xml2_1.3.4 utf8_1.2.3
## [70] XVector_0.36.0 ggrepel_0.9.3 pillar_1.9.0
## [73] stringr_1.5.0 yulab.utils_0.0.6 genefilter_1.78.0
## [76] splines_4.3.0 dplyr_1.1.2 tweenr_2.0.2
## [79] treeio_1.20.1 lattice_0.21-8 survival_3.5-5
## [82] bit_4.0.5 annotate_1.74.0 tidyselect_1.2.0
## [85] GO.db_3.15.0 locfit_1.5-9.7 Biostrings_2.64.0
## [88] knitr_1.42 gridExtra_2.3 svglite_2.1.1
## [91] xfun_0.39 graphlayouts_0.8.4 stringi_1.7.12
## [94] yaml_2.3.7 lazyeval_0.2.2 ggfun_0.0.9
## [97] evaluate_0.20 codetools_0.2-19 ggraph_2.1.0
## [100] tibble_3.2.1 qvalue_2.28.0 ggplotify_0.1.0
## [103] cli_3.6.1 xtable_1.8-4 systemfonts_1.0.4
## [106] jquerylib_0.1.4 munsell_0.5.0 Rcpp_1.0.10
## [109] png_0.1-8 XML_3.99-0.14 parallel_4.3.0
## [112] ggplot2_3.4.2 blob_1.2.4 DOSE_3.22.0
## [115] htm2txt_2.2.2 bitops_1.0-7 tidytree_0.4.2
## [118] viridisLite_0.4.1 scales_1.2.1 purrr_1.0.1
## [121] crayon_1.5.2 rlang_1.1.1 fastmatch_1.1-3
## [124] KEGGREST_1.36.3 rvest_1.0.3