Source: https://github.com/markziemann/enrichment_recipe
This guide is a Rmarkdown script that conducts differential expression and enrichment analysis, which are very popular workflows for transcriptome data.
This is meant to be a boilerplate template, which you can remix and modify to suit your analytical needs.
In the code chunk below called libs
, you can add and remove required R library dependancies. Check that the libraries listed here match the Dockerfile, otherwise you might get errors.
suppressPackageStartupMessages({
library("getDEE2")
library("DESeq2")
library("kableExtra")
library("clusterProfiler")
library("fgsea")
library("eulerr")
library("gplots")
})
## Warning: replacing previous import 'utils::findMatches' by
## 'S4Vectors::findMatches' when loading 'AnnotationDbi'
For this guide I will be using bulk RNA-seq data from a previous study, which is deposited at NCBI GEO and SRA under accession numbers: GSE55123 and SRP038101 (Lund et al, 2014). The experiment is designed to investigate the effect of Azacitidine treatment on AML3 cells.
The raw data have been processed by the DEE2 project, and the summary gene expression counts are available at the dee2.io website, and programmatically with the getDEE2 bioconductor package (Ziemann et al, 2019).
The gene counts have also been deposited to the /data
folder in the example.Rdata
file in case the DEE2 resource becomes unavailable. To import it, use the command: load("data/example.Rdata")
Alternatively, you could fetch data from another resource like NCBI GEO, Zenodo or from the host storage drive.
mdat <- getDEE2Metadata("hsapiens")
# get sample sheet
ss <- subset(mdat,SRP_accession=="SRP038101")
# fetch the whole set of RNA-seq data
x <- getDEE2("hsapiens",ss$SRR_accession , metadata=mdat, legacy=TRUE)
## For more information about DEE2 QC metrics, visit
## https://github.com/markziemann/dee2/blob/master/qc/qc_metrics.md
mx <- x$GeneCounts
rownames(mx) <- paste(rownames(mx),x$GeneInfo$GeneSymbol)
dim(mx)
## [1] 58302 6
# aza no filtering
ss$trt <- grepl("Treated",ss$Experiment_title)
ss %>%
kbl(caption="sample sheet for Aza treatment in AML3 cells") %>%
kable_paper("hover", full_width = F)
SRR_accession | QC_summary | SRX_accession | SRS_accession | SRP_accession | Experiment_title | GEO_series | trt | |
---|---|---|---|---|---|---|---|---|
156267 | SRR1171523 | PASS | SRX472607 | SRS559064 | SRP038101 | GSM1329859: Untreated.1; Homo sapiens; RNA-Seq | FALSE | |
156268 | SRR1171524 | WARN(3,4) | SRX472608 | SRS559066 | SRP038101 | GSM1329860: Untreated.2; Homo sapiens; RNA-Seq | FALSE | |
156269 | SRR1171525 | WARN(3,4) | SRX472609 | SRS559065 | SRP038101 | GSM1329861: Untreated.3; Homo sapiens; RNA-Seq | FALSE | |
156270 | SRR1171526 | WARN(3,4) | SRX472610 | SRS559068 | SRP038101 | GSM1329862: Treated.1; Homo sapiens; RNA-Seq | TRUE | |
156271 | SRR1171527 | WARN(3,4) | SRX472611 | SRS559067 | SRP038101 | GSM1329863: Treated.2; Homo sapiens; RNA-Seq | TRUE | |
156272 | SRR1171528 | WARN(3,4) | SRX472612 | SRS559069 | SRP038101 | GSM1329864: Treated.3; Homo sapiens; RNA-Seq | TRUE |
QC is important, even if you are using public transcriptome data. For RNA-seq it is a good idea to show the number of reads for each sample.
par(mar=c(5,7,5,1))
barplot(rev(colSums(mx)),horiz=TRUE,las=1,main="number of reads per sample in SRP038101")
Now make a MDS plot.
mds <- cmdscale(dist(t(mx)))
# expand plot area
XMIN=min(mds[,1])*1.3
XMAX=max(mds[,1])*1.3
YMIN=min(mds[,2])*1.3
YMAX=max(mds[,2])*1.3
cols <- as.character(grepl("Treated",ss$Experiment_title))
cols <- gsub("FALSE","lightblue",cols)
cols <- gsub("TRUE","pink",cols)
plot(mds, xlab="Coordinate 1", ylab="Coordinate 2",
xlim=c(XMIN,XMAX),ylim=c(YMIN,YMAX),
pch=19,cex=2,col=cols, main="MDS plot")
text(cmdscale(dist(t(mx))), labels=colnames(mx) )
I will be using DESeq2 for DE analysis, the count matrix is prefiltered using a detection threshold of 10 reads per sample across all samples. All genes that meet the detection threshold will comprise the background list. The first 6 rows of the count matrix are shows.
mxf <- mx[which(rowMeans(mx)>=10),]
dim(mxf)
## [1] 13168 6
head(mxf,6) %>%
kbl(caption="Count matrix format") %>%
kable_paper("hover", full_width = F)
SRR1171523 | SRR1171524 | SRR1171525 | SRR1171526 | SRR1171527 | SRR1171528 | |
---|---|---|---|---|---|---|
ENSG00000225630 MTND2P28 | 494 | 396 | 340 | 333 | 415 | 418 |
ENSG00000237973 MTCO1P12 | 52 | 39 | 40 | 30 | 37 | 29 |
ENSG00000248527 MTATP6P1 | 853 | 544 | 537 | 582 | 702 | 716 |
ENSG00000228327 AL669831.1 | 17 | 13 | 21 | 21 | 22 | 12 |
ENSG00000228794 LINC01128 | 42 | 27 | 30 | 32 | 40 | 23 |
ENSG00000230699 AL645608.3 | 20 | 11 | 13 | 10 | 15 | 22 |
dds <- DESeqDataSetFromMatrix(countData = mxf , colData = ss, design = ~ trt )
res <- DESeq(dds)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
z <- results(res)
vsd <- vst(dds, blind=FALSE)
zz <-cbind(as.data.frame(z),mxf)
def <-as.data.frame(zz[order(zz$pvalue),])
head(def,10) %>%
kbl(caption="Top DE genes for Aza treatment") %>%
kable_paper("hover", full_width = F)
baseMean | log2FoldChange | lfcSE | stat | pvalue | padj | SRR1171523 | SRR1171524 | SRR1171525 | SRR1171526 | SRR1171527 | SRR1171528 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ENSG00000165949 IFI27 | 1960.1970 | -3.384492 | 0.0938869 | -36.04861 | 0 | 0 | 4689 | 3583 | 2758 | 309 | 384 | 334 |
ENSG00000090382 LYZ | 7596.0299 | -1.650342 | 0.0561143 | -29.41036 | 0 | 0 | 14212 | 10237 | 10789 | 3476 | 3764 | 3738 |
ENSG00000115461 IGFBP5 | 531.2217 | -5.071157 | 0.1795239 | -28.24781 | 0 | 0 | 1320 | 823 | 1025 | 23 | 26 | 43 |
ENSG00000157601 MX1 | 827.1511 | -2.877795 | 0.1047823 | -27.46450 | 0 | 0 | 1732 | 1501 | 1206 | 184 | 223 | 186 |
ENSG00000111331 OAS3 | 2127.2010 | -2.661214 | 0.0972124 | -27.37525 | 0 | 0 | 4204 | 3977 | 2972 | 562 | 614 | 560 |
ENSG00000070915 SLC12A3 | 424.5509 | -3.374852 | 0.1298671 | -25.98697 | 0 | 0 | 1012 | 721 | 653 | 63 | 85 | 76 |
ENSG00000234745 HLA-B | 3197.0159 | -1.431566 | 0.0604169 | -23.69479 | 0 | 0 | 6085 | 4256 | 4023 | 1590 | 1872 | 1719 |
ENSG00000137965 IFI44 | 409.0957 | -2.978581 | 0.1319352 | -22.57608 | 0 | 0 | 829 | 740 | 635 | 76 | 111 | 89 |
ENSG00000204525 HLA-C | 1631.6421 | -1.461550 | 0.0660214 | -22.13750 | 0 | 0 | 3112 | 2150 | 2106 | 791 | 923 | 891 |
ENSG00000110042 DTX4 | 524.1318 | -2.470219 | 0.1173182 | -21.05572 | 0 | 0 | 1166 | 883 | 688 | 166 | 168 | 145 |
Make a smear plot.
sigf <- subset(def,padj<=0.05)
DET=nrow(mxf)
NSIG=nrow(sigf)
NUP=nrow(subset(sigf,log2FoldChange>0))
NDN=nrow(subset(sigf,log2FoldChange<0))
HEADER=paste(DET,"detected genes;",NSIG,"w/FDR<0.05;",NUP,"up;",NDN,"down")
plot(log10(def$baseMean) ,def$log2FoldChange,
cex=0.6,pch=19,col="darkgray",
main="5-azacitidine treatment in AML3 cells",
xlab="log10(basemean)",ylab="log2(fold change)")
points(log10(sigf$baseMean) ,sigf$log2FoldChange,
cex=0.6,pch=19,col="red")
mtext(HEADER)
In the next sections I will run enrichment analysis with over-representation test and compare it to functional class scoring.
I’ve compiled a reporting checklist:
Reporting criteria | Method/resource used |
---|---|
Origin of gene sets | Reactome (2023-03-06) |
Tool used | ClusterProfiler (check version at foot of report) |
Statistical test used | hypergeometric test |
P-value correction for multiple comparisons | FDR method |
Background list | Genes with >=10 reads per sample on average across all samples |
Gene Selection Criteria | DESeq2 FDR<0.05 |
ORA directionality | Separate tests for up- and down-regulation |
Data availability | via DEE2 at accession SRP038101 (human) |
Other parameters | Min gene set size of 10 |
Get the gene sets loaded in R. They are located in the /ref
folder of the repo.
# from https://reactome.org/download/current/ReactomePathways.gmt.zip
genesets2 <- read.gmt("ref/ReactomePathways_2023-03-06.gmt")
gsets <- gmtPathways("ref/ReactomePathways_2023-03-06.gmt")
Now filter the gene names into three lists, up-regulated, down-regulated and background. Background is simply all genes that were detected.
defup <- rownames(subset(def,padj<=0.05 & log2FoldChange>0))
defup <- unique(sapply(strsplit(defup," "),"[[",2))
defdn <- rownames(subset(def,padj<=0.05 & log2FoldChange<0))
defdn <- unique(sapply(strsplit(defdn," "),"[[",2))
bg <- rownames(def)
bg <- unique(sapply(strsplit(bg," "),"[[",2))
message("number of genes in each group")
## number of genes in each group
lapply(list("background"=bg,"up-regulated"=defup,"down-regulated"=defdn),length)
## $background
## [1] 13164
##
## $`up-regulated`
## [1] 1672
##
## $`down-regulated`
## [1] 1926
One of the quirky things I learned about clusterprofiler is that not all genes specified by the user as background are included in the enrichmentcalculation. It appears as though the genes not included in the annotation set are discarded. I believe this is a subtle but significant problem. To fix it is relatively easy from the user’s point of view. Just create a new gene set that consists of all detected genes and
Adding all detected genes to the background appears to improve results!
bgdf <- data.frame("background",bg)
colnames(bgdf) <- c("term","gene")
genesets2 <- rbind(genesets2,bgdf)
Enrichment firstly with upregulated genes. Note that the BgRatio contains the correct number of genes in the background now.
# show 10 pathways only
n_pw=10
ora_up <- as.data.frame(enricher(gene = defup ,
universe = bg, maxGSSize = 50000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
ora_up$geneID <- NULL
ora_up <- subset(ora_up,p.adjust<0.05 & Count >=10)
ora_ups <- rownames(ora_up)
gr <- as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_up$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_up$BgRatio,"/"),"[[",2))
ora_up$es <- gr/br
ora_up <- ora_up[order(-ora_up$es),]
ora_up$Description=NULL
head(ora_up,n_pw) %>%
kbl(row.names = FALSE, caption="Top upregulated pathways in Aza treatment") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
Interactions of Rev with host cellular proteins | 24/1672 | 36/13164 | 0 | 0 | 0 | 24 | 5.248804 |
Nuclear import of Rev protein | 22/1672 | 33/13164 | 0 | 0 | 0 | 22 | 5.248804 |
Postmitotic nuclear pore complex (NPC) reformation | 18/1672 | 27/13164 | 0 | 0 | 0 | 18 | 5.248804 |
Rev-mediated nuclear export of HIV RNA | 22/1672 | 34/13164 | 0 | 0 | 0 | 22 | 5.094427 |
Export of Viral Ribonucleoproteins from Nucleus | 20/1672 | 31/13164 | 0 | 0 | 0 | 20 | 5.079488 |
NEP/NS2 Interacts with the Cellular Export Machinery | 20/1672 | 31/13164 | 0 | 0 | 0 | 20 | 5.079488 |
Transport of Ribonucleoproteins into the Host Nucleus | 20/1672 | 31/13164 | 0 | 0 | 0 | 20 | 5.079488 |
Nuclear Pore Complex (NPC) Disassembly | 22/1672 | 35/13164 | 0 | 0 | 0 | 22 | 4.948872 |
Defective TPR may confer susceptibility towards thyroid papillary carcinoma (TPC) | 18/1672 | 29/13164 | 0 | 0 | 0 | 18 | 4.886817 |
Regulation of Glucokinase by Glucokinase Regulatory Protein | 18/1672 | 29/13164 | 0 | 0 | 0 | 18 | 4.886817 |
topup2 <- rev(head(ora_up$es,10))
names(topup2) <- rev(head(ora_up$ID,10))
Now with the downregulated genes
ora_dn <- as.data.frame(enricher(gene = defdn ,
universe = bg, maxGSSize = 50000, TERM2GENE = genesets2,
pAdjustMethod="fdr", pvalueCutoff = 1, qvalueCutoff = 1 ))
ora_dn$geneID <- NULL
ora_dn <- subset(ora_dn,p.adjust<0.05 & Count >=10)
ora_dns <- rownames(ora_dn)
gr <- as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_dn$GeneRatio,"/"),"[[",2))
br <- as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",1)) /
as.numeric(sapply(strsplit(ora_dn$BgRatio,"/"),"[[",2))
ora_dn$es <- gr/br
ora_dn <- ora_dn[order(-ora_dn$es),]
ora_dn$Description=NULL
head(ora_dn,n_pw) %>%
kbl(row.names = FALSE, caption="Top downregulated pathways in Aza treatment") %>%
kable_paper("hover", full_width = F)
ID | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count | es |
---|---|---|---|---|---|---|---|
Interleukin-20 family signaling | 10/1926 | 12/13164 | 0.0000002 | 0.0000108 | 0.0000089 | 10 | 5.695742 |
Interferon alpha/beta signaling | 41/1926 | 55/13164 | 0.0000000 | 0.0000000 | 0.0000000 | 41 | 5.095101 |
Interferon gamma signaling | 40/1926 | 63/13164 | 0.0000000 | 0.0000000 | 0.0000000 | 40 | 4.339613 |
Interleukin-10 signaling | 17/1926 | 27/13164 | 0.0000000 | 0.0000009 | 0.0000007 | 17 | 4.303450 |
Other interleukin signaling | 10/1926 | 18/13164 | 0.0000622 | 0.0021881 | 0.0018112 | 10 | 3.797162 |
Growth hormone receptor signaling | 10/1926 | 19/13164 | 0.0001141 | 0.0036972 | 0.0030603 | 10 | 3.597311 |
FGFR1 mutant receptor activation | 10/1926 | 20/13164 | 0.0001984 | 0.0055504 | 0.0045943 | 10 | 3.417445 |
RHO GTPases Activate NADPH Oxidases | 10/1926 | 20/13164 | 0.0001984 | 0.0055504 | 0.0045943 | 10 | 3.417445 |
Smooth Muscle Contraction | 13/1926 | 26/13164 | 0.0000216 | 0.0008075 | 0.0006684 | 13 | 3.417445 |
Interleukin-4 and Interleukin-13 signaling | 34/1926 | 70/13164 | 0.0000000 | 0.0000000 | 0.0000000 | 34 | 3.319804 |
topdn2 <- head(ora_dn$es,n_pw)
names(topdn2) <- head(ora_dn$ID,n_pw)
Make a barplot
par(mar=c(5,20,5,1))
cols <- c(rep("blue",n_pw),rep("red",n_pw))
barplot(c(topdn2,topup2),
horiz=TRUE,las=1,cex.names=0.65,col=cols,
main="top DE Reactomes",
xlab="ES",
cex.main=0.9)
mtext("ORA test")
Reporting criteria | Method/resource used |
---|---|
Origin of gene sets | Reactome (2023-03-06) |
Tool used | FGSEA (check version at foot of report) |
Statistical test used | Kolmogorov-Smirnov test |
P-value correction for multiple comparisons | FDR method |
Background list | Genes with >=10 reads per sample on average across all samples |
Gene Selection Criteria | DESeq2 FDR<0.05 |
ORA directionality | Separate tests for up- and down-regulation |
Data availability | via DEE2 at accession SRP038101 (human) |
Other parameters | Min gene set size of 10 |
def2 <- def
def2$genename <- sapply(strsplit(rownames(def2)," "),"[[",2)
def2 <- def2[,c("stat","genename")]
def2 <- aggregate(. ~ genename, def2, sum)
stat <- def2$stat
names(stat) <- def2$genename
fgseaRes <- fgsea(pathways=gsets, stats=stat, minSize=10, nPermSimple = 10000)
fgseaRes <- fgseaRes[order(fgseaRes$pval),]
fgsea_up <- subset(fgseaRes,padj<0.05 & ES>0)
fgsea_ups <- fgsea_up$pathway
fgsea_dn <- subset(fgseaRes,padj<0.05 & ES<0)
fgsea_dns <- fgsea_dn$pathway
fgsea_up <- data.frame(fgsea_up[order(-fgsea_up$ES),])
fgsea_dn <- data.frame(fgsea_dn[order(fgsea_dn$ES),])
head(fgsea_up,n_pw) %>%
kbl(row.names = FALSE, caption="FGSEA:Top upregulated pathways in Aza treatment") %>%
kable_paper("hover", full_width = F)
pathway | pval | padj | log2err | ES | NES | size | leadingEdge |
---|---|---|---|---|---|---|---|
Condensation of Prometaphase Chromosomes | 0.0000299 | 0.0002850 | 0.5756103 | 0.8415269 | 2.126444 | 11 | SMC4 , NCAPG , CSNK2A2, CDK1 , CCNB1 , SMC2 , NCAPH , CCNB2 , NCAPD2 |
Postmitotic nuclear pore complex (NPC) reformation | 0.0000000 | 0.0000000 | 0.8012156 | 0.7968889 | 2.585608 | 27 | KPNB1 , NDC1 , AHCTF1 , RCC1 , NUP155 , TNPO1 , NUP160 , POM121 , NUP205 , NUP54 , NUP93 , NUP188 , NUP107 , NUP98 , RAN , NUP85 , SEH1L , NUP43 , SUMO1 , NUP133 , NUP62 , RANGAP1 |
Interactions of Rev with host cellular proteins | 0.0000000 | 0.0000000 | 0.9101197 | 0.7869546 | 2.731408 | 36 | KPNB1 , NDC1 , NUP153 , NPM1 , RCC1 , NUP155 , TPR , RANBP1 , RANBP2 , NUP160 , POM121 , NUP50 , XPO1 , NUP205 , NUP54 , NUP93 , NUP188 , NUP214 , NUP107 , NUP98 , RAN , NUP85 , SEH1L , NUP43 , NUP133 , POM121C, NUP62 , RANGAP1, NUP88 , RAE1 |
Nuclear import of Rev protein | 0.0000000 | 0.0000000 | 0.8390889 | 0.7824653 | 2.667908 | 33 | KPNB1 , NDC1 , NUP153 , NPM1 , RCC1 , NUP155 , TPR , RANBP2 , NUP160 , POM121 , NUP50 , NUP205 , NUP54 , NUP93 , NUP188 , NUP214 , NUP107 , NUP98 , RAN , NUP85 , SEH1L , NUP43 , NUP133 , POM121C, NUP62 , NUP88 , RAE1 |
Rev-mediated nuclear export of HIV RNA | 0.0000000 | 0.0000000 | 0.8753251 | 0.7821972 | 2.685095 | 34 | NDC1 , NUP153 , RCC1 , NUP155 , TPR , RANBP1 , RANBP2 , NUP160 , POM121 , NUP50 , XPO1 , NUP205 , NUP54 , NUP93 , NUP188 , NUP214 , NUP107 , NUP98 , RAN , NUP85 , SEH1L , NUP43 , NUP133 , POM121C, NUP62 , RANGAP1, NUP88 , RAE1 |
SLBP Dependent Processing of Replication-Dependent Histone Pre-mRNAs | 0.0005261 | 0.0034662 | 0.4772708 | 0.7809511 | 1.973376 | 11 | NCBP1 , LSM11 , SNRPD3, SNRPF , ZNF473, NCBP2 , SLBP , SNRPG , SNRPE |
SLBP independent Processing of Histone Pre-mRNAs | 0.0007848 | 0.0048608 | 0.4772708 | 0.7801965 | 1.910005 | 10 | NCBP1 , LSM11 , SNRPD3, SNRPF , ZNF473, NCBP2 , SNRPG , SNRPE |
Transport of Ribonucleoproteins into the Host Nucleus | 0.0000000 | 0.0000000 | 0.8012156 | 0.7776911 | 2.616341 | 31 | KPNB1 , NDC1 , NUP153 , NUP155 , TPR , RANBP2 , NUP160 , POM121 , NUP50 , NUP205 , NUP54 , NUP93 , NUP188 , KPNA1 , NUP214 , NUP107 , NUP98 , NUP85 , SEH1L , NUP43 , NUP133 , POM121C, NUP62 , NUP88 , RAE1 |
Impaired BRCA2 binding to PALB2 | 0.0000000 | 0.0000008 | 0.7195128 | 0.7767850 | 2.449986 | 24 | XRCC2 , BLM , BRIP1 , BRCA1 , RAD51AP1, RAD51D , WRN , BRCA2 , EXO1 , PALB2 , DNA2 , BARD1 , RAD51 , RBBP8 , MRE11 , RAD51C |
Export of Viral Ribonucleoproteins from Nucleus | 0.0000000 | 0.0000000 | 0.8012156 | 0.7760569 | 2.610843 | 31 | NDC1 , NUP153 , NUP155 , TPR , RANBP2 , NUP160 , POM121 , NUP50 , XPO1 , NUP205 , NUP54 , NUP93 , NUP188 , NUP214 , NUP107 , NUP98 , RAN , NUP85 , SEH1L , NUP43 , NUP133 , POM121C, NUP62 , NUP88 , RAE1 |
head(fgsea_dn,n_pw) %>%
kbl(row.names = FALSE, caption="FGSEA:Top downregulated pathways in Aza treatment") %>%
kable_paper("hover", full_width = F)
pathway | pval | padj | log2err | ES | NES | size | leadingEdge |
---|---|---|---|---|---|---|---|
Endosomal/Vacuolar pathway | 0.0000016 | 0.0000200 | 0.6435518 | -0.8981315 | -2.107505 | 12 | HLA-B, HLA-C, B2M , HLA-A, HLA-E, CTSS , HLA-H, HLA-F, HLA-G |
Interferon alpha/beta signaling | 0.0000000 | 0.0000000 | 1.1512205 | -0.8564838 | -2.765692 | 55 | IFI27 , MX1 , OAS3 , HLA-B , HLA-C , OAS2 , OAS1 , IFIT1 , IFITM3, IFIT3 , STAT1 , HLA-A , ISG15 , IRF7 , STAT2 , IFI6 , IFIT5 , IFIT2 , IFI35 , BST2 , IFITM1, IRF5 , RSAD2 , GBP2 , XAF1 , USP18 , HLA-E , ADAR , MX2 , HLA-H , OASL , PSMB8 , SAMHD1, HLA-F , HLA-G , TYK2 , PTPN6 , ISG20 |
TNFs bind their physiological receptors | 0.0023131 | 0.0121912 | 0.4317077 | -0.8166001 | -1.839888 | 10 | TNFSF13B, CD70 , TNFRSF1B, EDA2R , TNFSF13 , TNFSF9 |
Interleukin-20 family signaling | 0.0003977 | 0.0026994 | 0.4984931 | -0.8141392 | -1.910413 | 12 | STAT1 , STAT2 , STAT5A, JAK3 , JAK2 , TYK2 , IL10RB, STAT3 , STAT5B, STAT4 |
Regulation of IFNA/IFNB signaling | 0.0029060 | 0.0147850 | 0.3548195 | -0.8037272 | -1.810884 | 10 | STAT1, STAT2, USP18, TYK2 , PTPN6 |
Maturation of nucleoprotein_9683610 | 0.0017802 | 0.0096755 | 0.4550599 | -0.7952852 | -1.828141 | 11 | PARP9 , PARP14, PARP10 |
Interferon gamma signaling | 0.0000000 | 0.0000000 | 1.0276699 | -0.7934868 | -2.625995 | 63 | OAS3 , HLA-B , HLA-C , OAS2 , OAS1 , B2M , STAT1 , HLA-A , IRF7 , TRIM22, IRF5 , GBP2 , SMAD7 , TRIM25, HLA-E , TRIM14, HLA-H , OASL , TRIM5 , GBP3 , PTAFR , TRIM8 , SP100 , HLA-F , TRIM21, PML , JAK2 , HLA-G , IFNGR2, GBP5 , PTPN6 , IFNGR1, MT2A , NCAM1 , GBP4 , PRKCD , TRIM26 |
MyD88 deficiency (TLR2/4) | 0.0012073 | 0.0069045 | 0.4550599 | -0.7914249 | -1.896202 | 13 | MYD88 , S100A9, S100A8, CD36 , TLR6 , CD14 , TLR2 |
RUNX3 regulates p14-ARF | 0.0085470 | 0.0328838 | 0.2045441 | -0.7641348 | -1.721678 | 10 | CDKN2A, CCND1 , RUNX3 , RUNX1 |
Transcriptional regulation of pluripotent stem cells | 0.0038970 | 0.0184421 | 0.3037794 | -0.7636281 | -1.791887 | 12 | EPAS1, EOMES, PBX1 , NR5A1, STAT3, KLF4 , HHEX |
fgsea_up <- head(fgsea_up,n_pw)
fgsea_up_vec <- fgsea_up$ES
names(fgsea_up_vec) <- fgsea_up$pathway
fgsea_up_vec <- rev(fgsea_up_vec)
fgsea_dn <- head(fgsea_dn,n_pw)
fgsea_dn_vec <- fgsea_dn$ES * -1
names(fgsea_dn_vec) <- fgsea_dn$pathway
Make a barplot
par(mar=c(5,20,5,1))
cols <- c(rep("blue",n_pw),rep("red",n_pw))
barplot(c(fgsea_dn_vec,fgsea_up_vec),
horiz=TRUE,las=1,cex.names=0.65,col=cols,
main="top DE Reactomes",
xlab="ES",
cex.main=0.9)
mtext("FCS test")
v1 <- list("ORA up"=ora_ups, "FCS up"=fgsea_ups)
v2 <- list("ORA dn"=ora_dns, "FCS dn"=fgsea_dns)
par(mar=c(10,10,10,10))
par(mfrow=c(2,1))
plot(euler(v1),quantities = list(cex = 2), labels = list(cex = 2))
plot(euler(v2),quantities = list(cex = 2), labels = list(cex = 2))
Jaccard index
jaccard <- function(a, b) {
intersection = length(intersect(a, b))
union = length(a) + length(b) - intersection
return (intersection/union)
}
ora <- c(ora_ups,ora_dns)
fcs <- c(fgsea_ups,fgsea_dns)
jaccard(ora,fcs)
## [1] 0.5592841
In this case the top upregulated and downregulated pathways
upname <- head(fgsea_up,1)$pathway
plotEnrichment(gsets[[upname]], stat, gseaParam = 1, ticksSize = 0.2)
dnname <- head(fgsea_dn,1)$pathway
plotEnrichment(gsets[[dnname]], stat, gseaParam = 1, ticksSize = 0.2)
Now a heatmap of these gene sets.
# reads per million normalisation
rpm <- apply(mxf,2, function(x) { x/sum(x) * 1000000} )
colnames(rpm) <- c("UNTR1","UNTR2","UNTR3","TRT1","TRT2","TRT3")
gnames_up <- gsets[[which(names(gsets) == upname)]]
gnames_dn <- gsets[[which(names(gsets) == dnname)]]
gene_ids <- rownames(mxf)
gene_names <- sapply(strsplit(gene_ids," "),"[[",2)
rpm_up <- rpm[which(gene_names %in% gnames_up),]
rownames(rpm_up) <- sapply(strsplit(rownames(rpm_up)," "),"[[",2)
rpm_dn <- rpm[which(gene_names %in% gnames_dn),]
rownames(rpm_dn) <- sapply(strsplit(rownames(rpm_dn)," "),"[[",2)
colsidecols <- c("blue","blue","blue","red","red","red")
heatmap.2(rpm_up,scale="row",trace="none",margins=c(10,15),main=upname,ColSideColors=colsidecols)
heatmap.2(rpm_dn,scale="row",trace="none",margins=c(10,15),main=dnname,ColSideColors=colsidecols)
For reproducibility
sessionInfo()
## R version 4.3.0 (2023-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
## [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
## [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Australia/Melbourne
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] gplots_3.1.3 eulerr_7.0.0
## [3] fgsea_1.22.0 clusterProfiler_4.4.4
## [5] kableExtra_1.3.4 DESeq2_1.36.0
## [7] SummarizedExperiment_1.26.1 Biobase_2.56.0
## [9] MatrixGenerics_1.8.1 matrixStats_0.63.0
## [11] GenomicRanges_1.48.0 GenomeInfoDb_1.32.2
## [13] IRanges_2.30.0 S4Vectors_0.34.0
## [15] BiocGenerics_0.42.0 getDEE2_1.6.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 rstudioapi_0.14 jsonlite_1.8.4
## [4] magrittr_2.0.3 farver_2.1.1 rmarkdown_2.21
## [7] zlibbioc_1.42.0 vctrs_0.6.2 memoise_2.0.1
## [10] RCurl_1.98-1.12 ggtree_3.4.1 webshot_0.5.4
## [13] htmltools_0.5.5 gridGraphics_0.5-1 sass_0.4.5
## [16] KernSmooth_2.23-20 bslib_0.4.2 plyr_1.8.8
## [19] cachem_1.0.7 igraph_1.4.2 lifecycle_1.0.3
## [22] pkgconfig_2.0.3 Matrix_1.5-4 R6_2.5.1
## [25] fastmap_1.1.1 GenomeInfoDbData_1.2.8 digest_0.6.31
## [28] aplot_0.1.10 enrichplot_1.16.1 colorspace_2.1-0
## [31] patchwork_1.1.2 AnnotationDbi_1.58.0 geneplotter_1.74.0
## [34] RSQLite_2.3.1 labeling_0.4.2 fansi_1.0.4
## [37] httr_1.4.5 polyclip_1.10-4 compiler_4.3.0
## [40] bit64_4.0.5 withr_2.5.0 downloader_0.4
## [43] BiocParallel_1.30.3 viridis_0.6.2 DBI_1.1.3
## [46] highr_0.10 ggforce_0.4.1 MASS_7.3-59
## [49] DelayedArray_0.22.0 caTools_1.18.2 gtools_3.9.4
## [52] tools_4.3.0 DO.db_2.9 scatterpie_0.1.9
## [55] ape_5.7-1 glue_1.6.2 nlme_3.1-162
## [58] GOSemSim_2.22.0 polylabelr_0.2.0 shadowtext_0.1.2
## [61] grid_4.3.0 reshape2_1.4.4 generics_0.1.3
## [64] gtable_0.3.3 tidyr_1.3.0 data.table_1.14.8
## [67] tidygraph_1.2.3 xml2_1.3.4 utf8_1.2.3
## [70] XVector_0.36.0 ggrepel_0.9.3 pillar_1.9.0
## [73] stringr_1.5.0 yulab.utils_0.0.6 genefilter_1.78.0
## [76] splines_4.3.0 dplyr_1.1.2 tweenr_2.0.2
## [79] treeio_1.20.1 lattice_0.21-8 survival_3.5-5
## [82] bit_4.0.5 annotate_1.74.0 tidyselect_1.2.0
## [85] GO.db_3.15.0 locfit_1.5-9.7 Biostrings_2.64.0
## [88] knitr_1.42 gridExtra_2.3 svglite_2.1.1
## [91] xfun_0.39 graphlayouts_0.8.4 stringi_1.7.12
## [94] lazyeval_0.2.2 ggfun_0.0.9 yaml_2.3.7
## [97] evaluate_0.20 codetools_0.2-19 ggraph_2.1.0
## [100] tibble_3.2.1 qvalue_2.28.0 ggplotify_0.1.0
## [103] cli_3.6.1 xtable_1.8-4 systemfonts_1.0.4
## [106] munsell_0.5.0 jquerylib_0.1.4 Rcpp_1.0.10
## [109] png_0.1-8 XML_3.99-0.14 parallel_4.3.0
## [112] ggplot2_3.4.2 blob_1.2.4 DOSE_3.22.0
## [115] htm2txt_2.2.2 bitops_1.0-7 tidytree_0.4.2
## [118] viridisLite_0.4.1 scales_1.2.1 purrr_1.0.1
## [121] crayon_1.5.2 rlang_1.1.1 fastmatch_1.1-3
## [124] KEGGREST_1.36.3 rvest_1.0.3