Source: https://github.com/markziemann/enrichment_recipe

Introduction

This guide is a Rmarkdown script that conducts differential expression and enrichment analysis, which are very popular workflows for transcriptome data.

This is meant to be a boilerplate template, which you can remix and modify to suit your analytical needs.

In the code chunk below called libs, you can add and remove required R library dependancies. Check that the libraries listed here match the Dockerfile, otherwise you might get errors.

suppressPackageStartupMessages({
  library("getDEE2")
  library("DESeq2")
  library("kableExtra")
  library("clusterProfiler")
  library("fgsea")
  library("eulerr")
  library("gplots")
})

For this guide I will be using bulk RNA-seq data from a previous study, which is deposited at NCBI GEO with accession GSE93236.

The experiment is designed to investigate the effect of knowckdown of the histone methyltransferase set7.

We will be fetching the gene expression count matrix directly from NCBI GEO for downstream analysis. Link: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE93nnn/GSE93236/suppl/GSE93236_Keating_Set7_KD_counts.tsv.gz

download.file("https://ftp.ncbi.nlm.nih.gov/geo/series/GSE93nnn/GSE93236/suppl/GSE93236_Keating_Set7_KD_counts.tsv.gz",
  destfile="GSE93236_Keating_Set7_KD_counts.tsv.gz")

x <- read.table("GSE93236_Keating_Set7_KD_counts.tsv.gz")

dim(x)
## [1] 18545     6
# there is a row we need to remove
mx <- x[rownames(x)!="NOTINBED",]

head(mx)
##                                 FC045_3_NTC1 FC045_4_Set7KD1 FC045_5_NTC2
## ENSG00000165195.9_PIGA                   303             291          280
## ENSG00000249719.1_RP3-486L4.3             54              51           48
## ENSG00000146966.8_DENND2A               1171             390         1153
## ENSG00000129103.13_SUMF2                1782            1476         1673
## ENSG00000256304.1_RP11-512M8.3            31              35           28
## ENSG00000203945.3_RP11-274K13.2           15              11           13
##                                 FC045_6_Set7KD2 FC045_7_NTC3 FC045_8_Set7KD3
## ENSG00000165195.9_PIGA                      294          262             257
## ENSG00000249719.1_RP3-486L4.3                50           49              60
## ENSG00000146966.8_DENND2A                   493         1019             309
## ENSG00000129103.13_SUMF2                   1473         1700            1366
## ENSG00000256304.1_RP11-512M8.3               43           26              47
## ENSG00000203945.3_RP11-274K13.2              10           11              11
dim(mx)
## [1] 18544     6

The row names contains an underscore that we need to remove.

rownames(mx) <- sub("_"," ",rownames(mx))

head(mx)
##                                 FC045_3_NTC1 FC045_4_Set7KD1 FC045_5_NTC2
## ENSG00000165195.9 PIGA                   303             291          280
## ENSG00000249719.1 RP3-486L4.3             54              51           48
## ENSG00000146966.8 DENND2A               1171             390         1153
## ENSG00000129103.13 SUMF2                1782            1476         1673
## ENSG00000256304.1 RP11-512M8.3            31              35           28
## ENSG00000203945.3 RP11-274K13.2           15              11           13
##                                 FC045_6_Set7KD2 FC045_7_NTC3 FC045_8_Set7KD3
## ENSG00000165195.9 PIGA                      294          262             257
## ENSG00000249719.1 RP3-486L4.3                50           49              60
## ENSG00000146966.8 DENND2A                   493         1019             309
## ENSG00000129103.13 SUMF2                   1473         1700            1366
## ENSG00000256304.1 RP11-512M8.3               43           26              47
## ENSG00000203945.3 RP11-274K13.2              10           11              11

Curate sample sheet from the colnames.

ss <- as.data.frame(colnames(x))
ss$trt <- as.numeric(grepl("Set7KD",colnames(x)))

ss %>%
  kbl(caption="sample sheet for Set7KD experiment") %>%
  kable_paper("hover", full_width = F)
sample sheet for Set7KD experiment
colnames(x) trt
FC045_3_NTC1 0
FC045_4_Set7KD1 1
FC045_5_NTC2 0
FC045_6_Set7KD2 1
FC045_7_NTC3 0
FC045_8_Set7KD3 1

Data quality control

QC is important, even if you are using public transcriptome data. For RNA-seq it is a good idea to show the number of reads for each sample.

par(mar=c(5,7,5,1))
barplot(rev(colSums(mx)),horiz=TRUE,las=1,main="number of reads per sample in SRP038101")

Now make a MDS plot.

mds <- cmdscale(dist(t(mx)))

# expand plot area
XMIN=min(mds[,1])*1.3
XMAX=max(mds[,1])*1.3
YMIN=min(mds[,2])*1.3
YMAX=max(mds[,2])*1.3

cols <- as.character(ss$trt)
cols <- gsub("0","lightblue",cols)
cols <- gsub("1","pink",cols)
plot(mds, xlab="Coordinate 1", ylab="Coordinate 2",
  xlim=c(XMIN,XMAX),ylim=c(YMIN,YMAX),
  pch=19,cex=2,col=cols, main="MDS plot")
text(cmdscale(dist(t(mx))), labels=colnames(mx) )

Differential expression analysis

I will be using DESeq2 for DE analysis, the count matrix is prefiltered using a detection threshold of 10 reads per sample across all samples. All genes that meet the detection threshold will comprise the background list. The first 6 rows of the count matrix are shows.

mxf <- mx[which(rowMeans(mx)>=10),]
dim(mxf)
## [1] 18544     6
head(mxf,6) %>%
  kbl(caption="Count matrix format") %>%
  kable_paper("hover", full_width = F)
Count matrix format
FC045_3_NTC1 FC045_4_Set7KD1 FC045_5_NTC2 FC045_6_Set7KD2 FC045_7_NTC3 FC045_8_Set7KD3
ENSG00000165195.9 PIGA 303 291 280 294 262 257
ENSG00000249719.1 RP3-486L4.3 54 51 48 50 49 60
ENSG00000146966.8 DENND2A 1171 390 1153 493 1019 309
ENSG00000129103.13 SUMF2 1782 1476 1673 1473 1700 1366
ENSG00000256304.1 RP11-512M8.3 31 35 28 43 26 47
ENSG00000203945.3 RP11-274K13.2 15 11 13 10 11 11
dds <- DESeqDataSetFromMatrix(countData = mxf , colData = ss, design = ~ trt )
##   the design formula contains one or more numeric variables with integer values,
##   specifying a model with increasing fold change for higher values.
##   did you mean for this to be a factor? if so, first convert
##   this variable to a factor using the factor() function
res <- DESeq(dds)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <-cbind(as.data.frame(z),mxf)
def <-as.data.frame(zz[order(zz$pvalue),])

head(def,10) %>%
  kbl(caption="Top DE genes for Aza treatment") %>%
  kable_paper("hover", full_width = F)
Top DE genes for Aza treatment
baseMean log2FoldChange lfcSE stat pvalue padj FC045_3_NTC1 FC045_4_Set7KD1 FC045_5_NTC2 FC045_6_Set7KD2 FC045_7_NTC3 FC045_8_Set7KD3
ENSG00000164692.13 COL1A2 7325.1780 2.3007932 0.0872906 26.35785 0 0 2653 13366 2392 10048 2540 12544
ENSG00000168542.8 COL3A1 814.3688 2.7408245 0.1068516 25.65076 0 0 208 1554 228 1233 214 1395
ENSG00000172531.10 PPP1CA 1571.0716 -1.7187849 0.0764171 -22.49216 0 0 2598 706 2353 748 2444 705
ENSG00000106484.8 MEST 2657.6245 1.1789946 0.0690145 17.08328 0 0 1664 3796 1696 3399 1634 3683
ENSG00000130508.6 PXDN 8180.4490 -0.9726259 0.0619784 -15.69298 0 0 11527 5249 10764 5616 10961 5419
ENSG00000125868.10 DSTN 6067.2939 1.0357844 0.0664676 15.58330 0 0 4020 8146 3866 7524 4321 8393
ENSG00000169439.6 SDC2 3002.0236 1.3181753 0.0865541 15.22950 0 0 1892 4504 1802 3728 1577 4416
ENSG00000242265.1 PEG10 3171.5304 1.0736020 0.0708803 15.14670 0 0 2023 4234 1976 4070 2270 4380
ENSG00000166734.13 CASC4 1107.0779 -1.4078612 0.0930505 -15.13007 0 0 1620 535 1685 671 1626 580
ENSG00000091986.10 CCDC80 6264.0879 1.3185353 0.0878996 15.00047 0 0 3625 9486 3152 8289 4233 8613

Make a smear plot.

sigf <- subset(def,padj<=0.05)

DET=nrow(mxf)
NSIG=nrow(sigf)
NUP=nrow(subset(sigf,log2FoldChange>0))
NDN=nrow(subset(sigf,log2FoldChange<0))

HEADER=paste(DET,"detected genes;",NSIG,"w/FDR<0.05;",NUP,"up;",NDN,"down")

plot(log10(def$baseMean) ,def$log2FoldChange,
  cex=0.6,pch=19,col="darkgray",
  main="Set7KD in endothelial cells",
  xlab="log10(basemean)",ylab="log2(fold change)")

points(log10(sigf$baseMean) ,sigf$log2FoldChange,
  cex=0.6,pch=19,col="red")

mtext(HEADER)

In the next sections I will run enrichment analysis with over-representation test and compare it to functional class scoring.

Enrichment using over-representation test

I’ve compiled a reporting checklist:

Reporting criteria Method/resource used
Origin of gene sets Reactome (2023-03-06)
Tool used ClusterProfiler (check version at foot of report)
Statistical test used hypergeometric test
P-value correction for multiple comparisons FDR method
Background list Genes with >=10 reads per sample on average across all samples
Gene Selection Criteria DESeq2 FDR<0.05
ORA directionality Separate tests for up- and down-regulation
Data availability GEO under accession GSE93236
Other parameters Min gene set size of 10

Get the gene sets loaded in R. They are located in the /ref folder of the repo.

# from https://reactome.org/download/current/ReactomePathways.gmt.zip
genesets2 <- read.gmt("ref/ReactomePathways_2023-03-06.gmt")
gsets <- gmtPathways("ref/ReactomePathways_2023-03-06.gmt")

Now filter the gene names into three lists, up-regulated, down-regulated and background. Background is simply all genes that were detected.

defup <- rownames(subset(def,padj<=0.05 & log2FoldChange>0))
defup <- unique(sapply(strsplit(defup," "),"[[",2))

defdn <- rownames(subset(def,padj<=0.05 & log2FoldChange<0))
defdn <- unique(sapply(strsplit(defdn," "),"[[",2))

bg <- rownames(def)
bg <- unique(sapply(strsplit(bg," "),"[[",2))

message("number of genes in each group")
## number of genes in each group
lapply(list("background"=bg,"up-regulated"=defup,"down-regulated"=defdn),length)
## $background
## [1] 18504
## 
## $`up-regulated`
## [1] 2700
## 
## $`down-regulated`
## [1] 2682

One of the quirky things I learned about clusterprofiler is that not all genes specified by the user as background are included in the enrichmentcalculation. It appears as though the genes not included in the annotation set are discarded. I believe this is a subtle but significant problem. To fix it is relatively easy from the user’s point of view. Just create a new gene set that consists of all detected genes and

Adding all detected genes to the background appears to improve results!

bgdf <- data.frame("background",bg)
colnames(bgdf) <- c("term","gene")
genesets2 <- rbind(genesets2,bgdf)

Enrichment firstly with upregulated genes. Note that the BgRatio contains the correct number of genes in the background now.

# show 10 pathways only
n_pw=10

ora_up <- as.data.frame(enricher(gene = defup ,
  universe = bg,  maxGSSize = 50000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

ora_up$geneID <- NULL
ora_up <- subset(ora_up,p.adjust<0.05 & Count >=10)
ora_ups <- rownames(ora_up)

gr <- as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",2))

ora_up$es <- gr/br
ora_up <- ora_up[order(-ora_up$es),]
ora_up$Description=NULL

head(ora_up,n_pw) %>%
  kbl(row.names = FALSE, caption="Top upregulated pathways in Aza treatment") %>%
  kable_paper("hover", full_width = F)
Top upregulated pathways in Aza treatment
ID GeneRatio BgRatio pvalue p.adjust qvalue Count es
Cell-extracellular matrix interactions 12/2700 18/18504 7.00e-07 0.0000126 0.0000072 12 4.568889
Initiation of Nuclear Envelope (NE) Reformation 11/2700 19/18504 1.52e-05 0.0001297 0.0000741 11 3.967719
Formation of tubulin folding intermediates by CCT/TriC 12/2700 21/18504 7.40e-06 0.0000716 0.0000409 12 3.916190
RHOBTB1 GTPase cycle 12/2700 23/18504 2.55e-05 0.0001958 0.0001118 12 3.575652
RHOBTB2 GTPase cycle 12/2700 23/18504 2.55e-05 0.0001958 0.0001118 12 3.575652
Postmitotic nuclear pore complex (NPC) reformation 13/2700 25/18504 1.22e-05 0.0001093 0.0000624 13 3.563733
Syndecan interactions 14/2700 27/18504 5.90e-06 0.0000639 0.0000365 14 3.553580
Transport of Ribonucleoproteins into the Host Nucleus 15/2700 29/18504 2.80e-06 0.0000346 0.0000198 15 3.544828
Transport of the SLBP Dependant Mature mRNA 17/2700 33/18504 7.00e-07 0.0000119 0.0000068 17 3.530505
EPHB-mediated forward signaling 19/2700 37/18504 2.00e-07 0.0000035 0.0000020 19 3.519279
topup2 <- rev(head(ora_up$es,10))
names(topup2) <- rev(head(ora_up$ID,10))

Now with the downregulated genes

ora_dn <- as.data.frame(enricher(gene = defdn ,
  universe = bg,  maxGSSize = 50000, TERM2GENE = genesets2,
  pAdjustMethod="fdr",  pvalueCutoff = 1, qvalueCutoff = 1  ))

ora_dn$geneID <- NULL
ora_dn <- subset(ora_dn,p.adjust<0.05 & Count >=10)
ora_dns <- rownames(ora_dn)

gr <- as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",2))

br <- as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",1)) /
  as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",2))

ora_dn$es <- gr/br
ora_dn <- ora_dn[order(-ora_dn$es),]
ora_dn$Description=NULL

head(ora_dn,n_pw) %>%
  kbl(row.names = FALSE, caption="Top downregulated pathways in Aza treatment") %>%
  kable_paper("hover", full_width = F)
Top downregulated pathways in Aza treatment
ID GeneRatio BgRatio pvalue p.adjust qvalue Count es
Interferon alpha/beta signaling 29/2682 55/18504 0.0e+00 0.0e+00 0.00e+00 29 3.637828
Peptide chain elongation 40/2682 87/18504 0.0e+00 0.0e+00 0.00e+00 40 3.172105
Eukaryotic Translation Elongation 42/2682 92/18504 0.0e+00 0.0e+00 0.00e+00 42 3.149694
Response of EIF2AK4 (GCN2) to amino acid deficiency 44/2682 98/18504 0.0e+00 0.0e+00 0.00e+00 44 3.097658
Selenocysteine synthesis 40/2682 90/18504 0.0e+00 0.0e+00 0.00e+00 40 3.066368
Viral mRNA Translation 38/2682 87/18504 0.0e+00 0.0e+00 0.00e+00 38 3.013500
FOXO-mediated transcription 22/2682 52/18504 1.1e-06 4.1e-05 3.56e-05 22 2.918947
Eukaryotic Translation Termination 38/2682 91/18504 0.0e+00 0.0e+00 0.00e+00 38 2.881038
Selenoamino acid metabolism 43/2682 103/18504 0.0e+00 0.0e+00 0.00e+00 43 2.880302
Formation of a pool of free 40S subunits 41/2682 99/18504 0.0e+00 0.0e+00 0.00e+00 41 2.857298
topdn2 <- head(ora_dn$es,n_pw)
names(topdn2) <- head(ora_dn$ID,n_pw)

Make a barplot

par(mar=c(5,20,5,1))

cols <- c(rep("blue",n_pw),rep("red",n_pw))

barplot(c(topdn2,topup2),
  horiz=TRUE,las=1,cex.names=0.65,col=cols,
  main="top DE Reactomes",
  xlab="ES",
  cex.main=0.9)

mtext("ORA test")

Enrichment analysis with functional class scoring

Reporting criteria Method/resource used
Origin of gene sets Reactome (2023-03-06)
Tool used FGSEA (check version at foot of report)
Statistical test used Kolmogorov-Smirnov test
P-value correction for multiple comparisons FDR method
Background list Genes with >=10 reads per sample on average across all samples
Gene Selection Criteria DESeq2 FDR<0.05
ORA directionality Separate tests for up- and down-regulation
Data availability via DEE2 at accession SRP038101 (human)
Other parameters Min gene set size of 10
def2 <- def
def2$genename <- sapply(strsplit(rownames(def2)," "),"[[",2)
def2 <- def2[,c("stat","genename")]
def2 <- aggregate(. ~ genename, def2, sum)
stat <- def2$stat
names(stat) <- def2$genename
fgseaRes <- fgsea(pathways=gsets, stats=stat, minSize=10, nPermSimple = 10000)
## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (0.06% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
fgseaRes <- fgseaRes[order(fgseaRes$pval),]
fgsea_up <- subset(fgseaRes,padj<0.05 & ES>0)
fgsea_ups <- fgsea_up$pathway
fgsea_dn <- subset(fgseaRes,padj<0.05 & ES<0)
fgsea_dns <- fgsea_dn$pathway

fgsea_up <-  data.frame(fgsea_up[order(-fgsea_up$ES),])

fgsea_dn <-  data.frame(fgsea_dn[order(fgsea_dn$ES),])

head(fgsea_up,n_pw) %>%
  kbl(row.names = FALSE, caption="FGSEA:Top upregulated pathways in Aza treatment") %>%
  kable_paper("hover", full_width = F)
FGSEA:Top upregulated pathways in Aza treatment
pathway pval padj log2err ES NES size leadingEdge
Folding of actin by CCT/TriC 0.0000049 0.0001013 0.6105269 0.8964312 2.140545 10 CCT5 , TCP1 , CCT6A, ACTB , CCT3 , CCT2 , CCT7 , CCT4
GP1b-IX-V activation signalling 0.0001711 0.0015309 0.5188481 0.8341859 1.991913 10 COL1A2, PIK3R1, FLNA , COL1A1, RAF1
Platelet Adhesion to exposed collagen 0.0011459 0.0078804 0.4550599 0.7729294 1.891480 11 COL1A2, ITGA1 , COL1A1, ITGB1 , ITGA2
Cell-extracellular matrix interactions 0.0000247 0.0003444 0.5756103 0.7577980 2.118309 18 RSU1 , FERMT2, ACTN1 , FLNA , FBLIM1, VASP , ACTB , PARVA , ITGB1 , LIMS1 , ILK , FLNC
Formation of tubulin folding intermediates by CCT/TriC 0.0000062 0.0001184 0.6105269 0.7545947 2.200061 21 TUBB3 , CCT5 , TCP1 , TUBB2A, CCT6A , CCT3 , CCT2 , TUBA1C, CCT7 , CCT4 , TUBA1B
MASTL Facilitates Mitotic Progression 0.0080462 0.0408024 0.2320568 0.7262829 1.734256 10 PPP2CA , PPP2R1B, PPP2R1A, PPP2CB , ENSA
Prefoldin mediated transfer of substrate to CCT/TriC 0.0000056 0.0001107 0.6105269 0.7165120 2.199181 26 TUBB3 , CCT5 , TCP1 , TUBB2A, CCT6A , ACTB , CCT3 , CCT2 , TUBA1C, CCT7 , CCT4 , VBP1
Cooperation of Prefoldin and TriC/CCT in actin and tubulin folding 0.0000018 0.0000510 0.6435518 0.7130937 2.233069 28 TUBB3 , CCT5 , TCP1 , TUBB2A, CCT6A , ACTB , CCT3 , CCT2 , TUBA1C, CCT7 , CCT4 , TUBA1B, VBP1
CTNNB1 S33 mutants aren’t phosphorylated 0.0018180 0.0113301 0.4550599 0.7026577 1.841133 14 PPP2CA , PPP2R1B, CTNNB1 , PPP2R5A, APC , PPP2R1A, PPP2CB
CTNNB1 S37 mutants aren’t phosphorylated 0.0018180 0.0113301 0.4550599 0.7026577 1.841133 14 PPP2CA , PPP2R1B, CTNNB1 , PPP2R5A, APC , PPP2R1A, PPP2CB
head(fgsea_dn,n_pw) %>%
  kbl(row.names = FALSE, caption="FGSEA:Top downregulated pathways in Aza treatment") %>%
  kable_paper("hover", full_width = F)
FGSEA:Top downregulated pathways in Aza treatment
pathway pval padj log2err ES NES size leadingEdge
Endosomal/Vacuolar pathway 0.0000648 0.0007066 0.5384341 -0.8564798 -2.028123 10 HLA-E, CTSS , HLA-A, HLA-C, HLA-H, B2M , HLA-F, HLA-B, HLA-G
CASP8 activity is inhibited 0.0046557 0.0257910 0.2972329 -0.7534651 -1.784187 10 TNFRSF10B, TNFRSF10A, FAS , CFLAR , TRADD , TRAF2 , TNFSF10
Dimerization of procaspase-8 0.0046557 0.0257910 0.2972329 -0.7534651 -1.784187 10 TNFRSF10B, TNFRSF10A, FAS , CFLAR , TRADD , TRAF2 , TNFSF10
Regulation by c-FLIP 0.0046557 0.0257910 0.2972329 -0.7534651 -1.784187 10 TNFRSF10B, TNFRSF10A, FAS , CFLAR , TRADD , TRAF2 , TNFSF10
ABC transporters in lipid homeostasis 0.0087549 0.0427471 0.2157829 -0.6668947 -1.733942 14 ABCA2 , ABCG1 , ABCA3 , ABCA7 , ABCA6 , ABCA9 , ABCA5 , ABCA12
Regulation of RUNX1 Expression and Activity 0.0102121 0.0478077 0.2005132 -0.6411896 -1.693097 15 TNRC6B, PTPN11, CCND2 , CBFB , PML
Eukaryotic Translation Elongation 0.0000000 0.0000000 0.8753251 -0.6273228 -2.468077 92 RPL3 , EEF1A2 , RPS14 , RPL13A , RPL19 , RPS11 , RPL13 , RPS13 , EEF2 , RPS16 , RPL28 , RPS20 , RPS9 , RPL4 , RPS8 , RPL10A , RPL22L1, RPL15 , RPS19 , RPS3 , RPL8 , RPL37 , RPL26 , RPL12 , RPS6 , RPL23A , RPL29 , RPL37A , RPL22 , RPL18 , RPS21 , RPL31 , RPL18A , EEF1G , RPLP2 , RPL27A , RPS4X , RPL23 , RPL5 , RPS18 , RPL35A , EEF1A1 , RPL34 , RPL7A , RPS15 , RPS29 , EEF1D , RPLP1 , RPL30 , RPL14 , RPL36
Peptide chain elongation 0.0000000 0.0000000 0.8634154 -0.6251293 -2.436687 87 RPL3 , RPS14 , RPL13A , RPL19 , RPS11 , RPL13 , RPS13 , EEF2 , RPS16 , RPL28 , RPS20 , RPS9 , RPL4 , RPS8 , RPL10A , RPL22L1, RPL15 , RPS19 , RPS3 , RPL8 , RPL37 , RPL26 , RPL12 , RPS6 , RPL23A , RPL29 , RPL37A , RPL22 , RPL18 , RPS21 , RPL31 , RPL18A , RPLP2 , RPL27A , RPS4X , RPL23 , RPL5 , RPS18 , RPL35A , EEF1A1 , RPL34 , RPL7A , RPS15 , RPS29 , RPLP1 , RPL30 , RPL14 , RPL36 , RPL10 , RPL32 , RPS15A , RPL35 , RPS27 , RPLP0
Selenocysteine synthesis 0.0000000 0.0000000 0.8513391 -0.6159046 -2.414271 90 RPL3 , RPS14 , RPL13A , RPL19 , RPS11 , RPL13 , RPS13 , RPS16 , RPL28 , RPS20 , RPS9 , RPL4 , RPS8 , RPL10A , RPL22L1 , RPL15 , SEPSECS , RPS19 , RPS3 , RPL8 , RPL37 , RPL26 , RPL12 , RPS6 , RPL23A , RPL29 , RPL37A , RPL22 , RPL18 , RPS21 , RPL31 , RPL18A , RPLP2 , RPL27A , RPS4X , RPL23 , RPL5 , RPS18 , SECISBP2, RPL35A , RPL34 , RPL7A , RPS15 , RPS29 , RPLP1 , RPL30 , RPL14 , RPL36 , RPL10 , RPL32 , RPS15A , RPL35 , RPS27 , RPLP0 , EEFSEC , RPL11 , RPL38 , RPS28 , RPS4Y1 , RPL36AL
Interferon alpha/beta signaling 0.0000007 0.0000262 0.6594444 -0.6136285 -2.205558 55 HLA-E , STAT2 , PTPN11, IFITM3, BST2 , ISG20 , HLA-A , IRF1 , HLA-C , IRF3 , IFITM1, HLA-H , IFITM2, IRF7 , HLA-F , MX2 , USP18 , SOCS3 , IRF9 , ISG15 , IFI27 , HLA-B , EGR1 , IFIT1 , OAS3 , HLA-G , RNASEL, MX1 , IFNAR2
fgsea_up <- head(fgsea_up,n_pw)
fgsea_up_vec <- fgsea_up$ES
names(fgsea_up_vec) <- fgsea_up$pathway
fgsea_up_vec <- rev(fgsea_up_vec)

fgsea_dn <- head(fgsea_dn,n_pw)
fgsea_dn_vec <- fgsea_dn$ES * -1
names(fgsea_dn_vec) <- fgsea_dn$pathway

Make a barplot

par(mar=c(5,20,5,1))

cols <- c(rep("blue",n_pw),rep("red",n_pw))

barplot(c(fgsea_dn_vec,fgsea_up_vec),
  horiz=TRUE,las=1,cex.names=0.65,col=cols,
  main="top DE Reactomes",
  xlab="ES",
  cex.main=0.9)

mtext("FCS test")

Compare pathway results

v1 <- list("ORA up"=ora_ups, "FCS up"=fgsea_ups)
v2 <- list("ORA dn"=ora_dns, "FCS dn"=fgsea_dns)
par(mar=c(10,10,10,10))
par(mfrow=c(2,1))
plot(euler(v1),quantities = list(cex = 2), labels = list(cex = 2))

plot(euler(v2),quantities = list(cex = 2), labels = list(cex = 2))

Jaccard index

jaccard <- function(a, b) {
    intersection = length(intersect(a, b))
    union = length(a) + length(b) - intersection
    return (intersection/union)
}

ora <- c(ora_ups,ora_dns)
fcs <- c(fgsea_ups,fgsea_dns)

jaccard(ora,fcs)
## [1] 0.5233463

Drill in to a specific pathway

In this case the top upregulated and downregulated pathways

upname <- head(fgsea_up,1)$pathway
plotEnrichment(gsets[[upname]], stat, gseaParam = 1, ticksSize = 0.2)

dnname <- head(fgsea_dn,1)$pathway
plotEnrichment(gsets[[dnname]], stat, gseaParam = 1, ticksSize = 0.2)

Now a heatmap of these gene sets.

# reads per million normalisation
rpm <- apply(mxf,2, function(x) { x/sum(x) * 1000000} )
colnames(rpm) <- c("UNTR1","UNTR2","UNTR3","TRT1","TRT2","TRT3")
gnames_up <- gsets[[which(names(gsets) == upname)]]
gnames_dn <- gsets[[which(names(gsets) == dnname)]]
gene_ids <- rownames(mxf)
gene_names <- sapply(strsplit(gene_ids," "),"[[",2)
rpm_up <- rpm[which(gene_names %in% gnames_up),]
rownames(rpm_up) <- sapply(strsplit(rownames(rpm_up)," "),"[[",2)
rpm_dn <- rpm[which(gene_names %in% gnames_dn),]
rownames(rpm_dn) <- sapply(strsplit(rownames(rpm_dn)," "),"[[",2)
colsidecols <- c("blue","blue","blue","red","red","red")
heatmap.2(rpm_up,scale="row",trace="none",margins=c(10,15),main=upname,ColSideColors=colsidecols)

heatmap.2(rpm_dn,scale="row",trace="none",margins=c(10,15),main=dnname,ColSideColors=colsidecols)

Session information

For reproducibility


Click HERE to show session info

sessionInfo()
## R version 4.3.0 (2023-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Australia/Melbourne
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] gplots_3.1.3                eulerr_7.0.0               
##  [3] fgsea_1.22.0                clusterProfiler_4.4.4      
##  [5] kableExtra_1.3.4            DESeq2_1.36.0              
##  [7] SummarizedExperiment_1.26.1 Biobase_2.56.0             
##  [9] MatrixGenerics_1.8.1        matrixStats_0.63.0         
## [11] GenomicRanges_1.48.0        GenomeInfoDb_1.32.2        
## [13] IRanges_2.30.0              S4Vectors_0.34.0           
## [15] BiocGenerics_0.42.0         getDEE2_1.6.0              
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3     jsonlite_1.8.4         rstudioapi_0.14       
##   [4] magrittr_2.0.3         farver_2.1.1           rmarkdown_2.21        
##   [7] zlibbioc_1.42.0        vctrs_0.6.2            memoise_2.0.1         
##  [10] RCurl_1.98-1.12        ggtree_3.4.1           webshot_0.5.4         
##  [13] htmltools_0.5.5        gridGraphics_0.5-1     sass_0.4.5            
##  [16] bslib_0.4.2            KernSmooth_2.23-20     plyr_1.8.8            
##  [19] cachem_1.0.7           igraph_1.4.2           lifecycle_1.0.3       
##  [22] pkgconfig_2.0.3        Matrix_1.5-4           R6_2.5.1              
##  [25] fastmap_1.1.1          GenomeInfoDbData_1.2.8 digest_0.6.31         
##  [28] aplot_0.1.10           enrichplot_1.16.1      colorspace_2.1-0      
##  [31] patchwork_1.1.2        AnnotationDbi_1.58.0   geneplotter_1.74.0    
##  [34] RSQLite_2.3.1          labeling_0.4.2         fansi_1.0.4           
##  [37] httr_1.4.5             polyclip_1.10-4        compiler_4.3.0        
##  [40] bit64_4.0.5            withr_2.5.0            downloader_0.4        
##  [43] BiocParallel_1.30.3    viridis_0.6.2          DBI_1.1.3             
##  [46] highr_0.10             ggforce_0.4.1          MASS_7.3-59           
##  [49] DelayedArray_0.22.0    caTools_1.18.2         gtools_3.9.4          
##  [52] tools_4.3.0            DO.db_2.9              scatterpie_0.1.9      
##  [55] ape_5.7-1              glue_1.6.2             nlme_3.1-162          
##  [58] GOSemSim_2.22.0        polylabelr_0.2.0       shadowtext_0.1.2      
##  [61] grid_4.3.0             reshape2_1.4.4         generics_0.1.3        
##  [64] gtable_0.3.3           tidyr_1.3.0            data.table_1.14.8     
##  [67] tidygraph_1.2.3        xml2_1.3.4             utf8_1.2.3            
##  [70] XVector_0.36.0         ggrepel_0.9.3          pillar_1.9.0          
##  [73] stringr_1.5.0          yulab.utils_0.0.6      genefilter_1.78.0     
##  [76] splines_4.3.0          dplyr_1.1.2            tweenr_2.0.2          
##  [79] treeio_1.20.1          lattice_0.21-8         survival_3.5-5        
##  [82] bit_4.0.5              annotate_1.74.0        tidyselect_1.2.0      
##  [85] GO.db_3.15.0           locfit_1.5-9.7         Biostrings_2.64.0     
##  [88] knitr_1.42             gridExtra_2.3          svglite_2.1.1         
##  [91] xfun_0.39              graphlayouts_0.8.4     stringi_1.7.12        
##  [94] yaml_2.3.7             lazyeval_0.2.2         ggfun_0.0.9           
##  [97] evaluate_0.20          codetools_0.2-19       ggraph_2.1.0          
## [100] tibble_3.2.1           qvalue_2.28.0          ggplotify_0.1.0       
## [103] cli_3.6.1              xtable_1.8-4           systemfonts_1.0.4     
## [106] jquerylib_0.1.4        munsell_0.5.0          Rcpp_1.0.10           
## [109] png_0.1-8              XML_3.99-0.14          parallel_4.3.0        
## [112] ggplot2_3.4.2          blob_1.2.4             DOSE_3.22.0           
## [115] htm2txt_2.2.2          bitops_1.0-7           tidytree_0.4.2        
## [118] viridisLite_0.4.1      scales_1.2.1           purrr_1.0.1           
## [121] crayon_1.5.2           rlang_1.1.1            fastmatch_1.1-3       
## [124] KEGGREST_1.36.3        rvest_1.0.3