Previously, Mandhri and I analysed the B-PROOF 450K data, trying to understand whether vitamin supplementation caused changes in gene methylation. We used Limma and some basic analyses, which showed no specific probes with FDR<0.05, nor any DMRs.
Here we will attempt another approach. First aggregate median signal for probes belonging to a gene, followed by limma and enrichment analysis.
To validate this approach it is possible to cut the dataset in half and see what % of results are consistent.
suppressPackageStartupMessages({
library("limma")
library("parallel")
library("dplyr")
library("kableExtra")
library("eulerr")
library("mitch")
library("IlluminaHumanMethylation450kmanifest")
library("IlluminaHumanMethylation450kanno.ilmn12.hg19")
library("tictoc")
})
source("meth_functions.R")
ann450k <- getAnnotation(IlluminaHumanMethylation450kanno.ilmn12.hg19)
myann <- data.frame(ann450k[,c("UCSC_RefGene_Name","Regulatory_Feature_Group")])
gp <- myann[,"UCSC_RefGene_Name",drop=FALSE]
gp2 <- strsplit(gp$UCSC_RefGene_Name,";")
names(gp2) <- rownames(gp)
sets <- split(rep(names(gp2), lengths(gp2)), unlist(gp2))
promoters <- grep("Prom",myann$Regulatory_Feature_Group)
It is thought that high plasma homocysteine can inhibit DNA methylation. Let’s see whether that is the case, and which genes are most affected. This analysis is conducted at the whole gene level as well as on the level of promoters.
This approach involves performing limma, and then for each gene the median t-score is calculated. This value is then used by mitch.
design <- readRDS(file="hcy_design.rds")
mval <- readRDS(file="hcy_mval.rds")
tic()
res <- main(mval=mval,design=design,sets=sets,cores=4)
res_df <- res[[1]] # enrichment results
res_top <- res[[2]] # limma results
toc() ## 502-522 sec elapsed on 8 cores (8.3-8.7 mins)
## 591.806 sec elapsed
hist(res_top$t)
hist(res_df$median)
head(res_df,30) %>%
kbl(caption="Top GMEA genes") %>%
kable_styling("hover", full_width = F)
nprobes | median | PValue | FDR | |
---|---|---|---|---|
PDZK1IP1 | 8 | -1.859377 | 0.0000935 | 0.589115 |
ASPRV1 | 11 | -1.302865 | 0.0000950 | 0.589115 |
LOC100128023 | 1 | -3.929833 | 0.0001268 | 0.589115 |
CPVL | 18 | -1.425266 | 0.0001567 | 0.589115 |
HAL | 12 | -1.730888 | 0.0001975 | 0.589115 |
SNORD115-22 | 5 | -2.103461 | 0.0002981 | 0.589115 |
C12orf27 | 5 | -1.729725 | 0.0003147 | 0.589115 |
DCUN1D2 | 38 | -1.255057 | 0.0003350 | 0.589115 |
RNASE11 | 5 | -1.623172 | 0.0003891 | 0.589115 |
GPR109B | 5 | -2.057596 | 0.0004102 | 0.589115 |
MIR195 | 6 | -2.102848 | 0.0004515 | 0.589115 |
MIR143 | 5 | -2.373868 | 0.0004586 | 0.589115 |
WBSCR26 | 3 | -3.138159 | 0.0004609 | 0.589115 |
GUCY1B2 | 12 | -2.793912 | 0.0004644 | 0.589115 |
QSOX2 | 24 | -1.344775 | 0.0004850 | 0.589115 |
CEACAM8 | 6 | -2.251108 | 0.0005085 | 0.589115 |
FAM153C | 4 | -2.714201 | 0.0005386 | 0.589115 |
CD177 | 7 | -3.034576 | 0.0006618 | 0.589115 |
MIRLET7A3 | 8 | -2.033064 | 0.0006669 | 0.589115 |
SLC22A17 | 20 | -1.107041 | 0.0006882 | 0.589115 |
ANKRD33B | 49 | -1.001862 | 0.0007299 | 0.589115 |
PLXNB1 | 24 | -1.024144 | 0.0007478 | 0.589115 |
CYP2A13 | 8 | -1.484049 | 0.0007561 | 0.589115 |
CSTA | 3 | -2.351129 | 0.0007616 | 0.589115 |
UBE2MP1 | 8 | -1.790921 | 0.0008419 | 0.589115 |
RIN2 | 18 | -1.539252 | 0.0008706 | 0.589115 |
ENG | 7 | -2.117828 | 0.0008803 | 0.589115 |
LRRC70 | 1 | -3.392764 | 0.0008925 | 0.589115 |
MIR497 | 5 | -1.678467 | 0.0009088 | 0.589115 |
SNORD116-26 | 1 | -3.394603 | 0.0009254 | 0.589115 |
genesets <- gmt_import("ReactomePathways.gmt")
y <- res_df[,"median",drop=FALSE]
colnames(y) <- "meth"
res <- mitch_calc(y, genesets, priority="effect")
## Note: Enrichments with large effect sizes may not be
## statistically significant.
head(res$enrichment_result,30) %>%
kbl(caption="Top GMEA pathways", row.names = F) %>%
kable_styling("hover", full_width = F)
set | setSize | pANOVA | s.dist | p.adjustANOVA |
---|---|---|---|---|
Thyroxine biosynthesis | 10 | 0.0016466 | -0.5747660 | 0.0076537 |
Apoptotic cleavage of cell adhesion proteins | 11 | 0.0011910 | -0.5643081 | 0.0058152 |
Scavenging of heme from plasma | 11 | 0.0022490 | -0.5319605 | 0.0100842 |
TICAM1, RIP1-mediated IKK complex recruitment | 18 | 0.0001915 | 0.5077483 | 0.0011740 |
TP53 regulates transcription of additional cell cycle genes whose exact role in the p53 pathway remain uncertain | 19 | 0.0001404 | 0.5044793 | 0.0009282 |
MASTL Facilitates Mitotic Progression | 10 | 0.0070918 | 0.4916899 | 0.0262276 |
Transcriptional regulation of testis differentiation | 12 | 0.0033665 | -0.4888125 | 0.0139049 |
HDR through MMEJ (alt-NHEJ) | 11 | 0.0050633 | 0.4880519 | 0.0196835 |
Defective HDR through Homologous Recombination (HRR) due to BRCA1 loss-of-function | 20 | 0.0001831 | 0.4831748 | 0.0011316 |
Defective HDR through Homologous Recombination (HRR) due to PALB2 loss of function | 20 | 0.0001831 | 0.4831748 | 0.0011316 |
Defective HDR through Homologous Recombination Repair (HRR) due to PALB2 loss of BRCA1 binding function | 20 | 0.0001831 | 0.4831748 | 0.0011316 |
Defective HDR through Homologous Recombination Repair (HRR) due to PALB2 loss of BRCA2/RAD51/RAD51C binding function | 20 | 0.0001831 | 0.4831748 | 0.0011316 |
Impaired BRCA2 binding to PALB2 | 20 | 0.0001831 | 0.4831748 | 0.0011316 |
SRP-dependent cotranslational protein targeting to membrane | 105 | 0.0000000 | 0.4762477 | 0.0000000 |
PECAM1 interactions | 12 | 0.0047623 | 0.4705707 | 0.0186086 |
Ubiquitin-dependent degradation of Cyclin D | 50 | 0.0000000 | 0.4639473 | 0.0000003 |
Resolution of D-loop Structures through Holliday Junction Intermediates | 25 | 0.0000621 | 0.4626379 | 0.0004392 |
Class I peroxisomal membrane protein import | 18 | 0.0006815 | 0.4624289 | 0.0035751 |
Viral mRNA Translation | 84 | 0.0000000 | 0.4612688 | 0.0000000 |
Regulation of ornithine decarboxylase (ODC) | 49 | 0.0000000 | 0.4610351 | 0.0000005 |
Bicarbonate transporters | 10 | 0.0116949 | -0.4604163 | 0.0390188 |
Autodegradation of the E3 ubiquitin ligase COP1 | 50 | 0.0000000 | 0.4593574 | 0.0000004 |
Stabilization of p53 | 55 | 0.0000000 | 0.4551479 | 0.0000001 |
Peptide chain elongation | 84 | 0.0000000 | 0.4544448 | 0.0000000 |
Eukaryotic Translation Termination | 87 | 0.0000000 | 0.4522337 | 0.0000000 |
Metabolism of polyamines | 54 | 0.0000000 | 0.4497912 | 0.0000003 |
Regulation of activated PAK-2p34 by proteasome mediated degradation | 48 | 0.0000001 | 0.4491669 | 0.0000012 |
Eukaryotic Translation Elongation | 88 | 0.0000000 | 0.4474163 | 0.0000000 |
Deadenylation of mRNA | 24 | 0.0001531 | 0.4463959 | 0.0009903 |
Nonsense Mediated Decay (NMD) enhanced by the Exon Junction Complex (EJC) | 106 | 0.0000000 | 0.4452875 | 0.0000000 |
A slight variation is to use cameraPR instead of mitch to better control for correlation.
stat <- res_df$median
names(stat) <- rownames(res_df)
stat[is.na(stat)] <- 0
tic()
cres <- cameraPR(statistic=stat, index=genesets, use.ranks = FALSE, inter.gene.cor=0.01, sort = TRUE)
cres <- subset(cres,NGenes>4)
cres$FDR <- p.adjust(cres$PValue,method="fdr")
mymedians <- sapply(1:nrow(cres),function(i) {
myset <- genesets[[which(names(genesets) == rownames(cres)[i])]]
mystats <- stat[names(stat) %in% myset]
median(mystats)
})
cres$median <- mymedians
toc() # 1.0 sec
## 2.933 sec elapsed
sig <- subset(cres,FDR<0.05)
head(sig[order(-abs(sig$median)),],20) %>%
kbl(caption = "Top GMEA pathways found with CAMERA after 5% FDR filtering") %>%
kable_styling("hover", full_width = F)
NGenes | Direction | PValue | FDR | median | |
---|---|---|---|---|---|
Defective F9 activation | 6 | Down | 0.0008918 | 0.0151544 | -1.0187475 |
Histidine catabolism | 6 | Down | 0.0009943 | 0.0160688 | -0.8601699 |
Apoptotic cleavage of cell adhesion proteins | 11 | Down | 0.0023466 | 0.0288713 | -0.8503640 |
Scavenging of heme from plasma | 11 | Down | 0.0025557 | 0.0306524 | -0.7851758 |
Platelet Adhesion to exposed collagen | 13 | Down | 0.0049945 | 0.0493503 | -0.6951431 |
Regulation of TLR by endogenous ligand | 17 | Down | 0.0029347 | 0.0340635 | -0.5776825 |
Common Pathway of Fibrin Clot Formation | 22 | Down | 0.0012829 | 0.0194164 | -0.5646874 |
Formation of Fibrin Clot (Clotting Cascade) | 39 | Down | 0.0002462 | 0.0066118 | -0.5643871 |
Paracetamol ADME | 30 | Down | 0.0036389 | 0.0389851 | -0.5092101 |
Aspirin ADME | 43 | Down | 0.0029473 | 0.0340635 | -0.4623278 |
Keratinization | 205 | Down | 0.0047126 | 0.0472935 | -0.4125242 |
Drug ADME | 98 | Down | 0.0029287 | 0.0340635 | -0.3972550 |
ROS and RNS production in phagocytes | 36 | Down | 0.0045759 | 0.0462841 | -0.3605977 |
TP53 regulates transcription of additional cell cycle genes whose exact role in the p53 pathway remain uncertain | 19 | Up | 0.0008198 | 0.0146713 | 0.1882688 |
Deadenylation of mRNA | 24 | Up | 0.0029918 | 0.0343696 | 0.1440051 |
Defective HDR through Homologous Recombination (HRR) due to BRCA1 loss-of-function | 20 | Up | 0.0031821 | 0.0352809 | 0.1182056 |
Defective HDR through Homologous Recombination (HRR) due to PALB2 loss of function | 20 | Up | 0.0031821 | 0.0352809 | 0.1182056 |
Defective HDR through Homologous Recombination Repair (HRR) due to PALB2 loss of BRCA1 binding function | 20 | Up | 0.0031821 | 0.0352809 | 0.1182056 |
Defective HDR through Homologous Recombination Repair (HRR) due to PALB2 loss of BRCA2/RAD51/RAD51C binding function | 20 | Up | 0.0031821 | 0.0352809 | 0.1182056 |
Impaired BRCA2 binding to PALB2 | 20 | Up | 0.0031821 | 0.0352809 | 0.1182056 |
Now try another approach by first aggregating and then performing limma, then pathway enrichment.
Aggregate mval to genes and examine pathway enrichment in blood.
gsmea <- function(mval,design,probesets,genesets,cores=2) {
ag <- mclapply(probesets,function(ps) {
mval <- mval[rownames(mval) %in% ps,,drop=FALSE]
if (nrow(mval>1)) {
o <- apply(mval,2,median)
} else {
o <- mval
}
o
},mc.cores=cores)
ag <- do.call(rbind,ag)
res <- main(mval=ag,design=design,sets=genesets,cores=cores )
}
tic()
hcy_gsmea <- gsmea(mval=mval,design=design,probesets=sets,genesets=genesets,cores=8)
toc()
## 100.042 sec elapsed
hcy_gsmea$res <- subset(hcy_gsmea$res,nprobes>=5)
hcy_gsmea$res$FDR <- p.adjust(hcy_gsmea$res$PValue,method="fdr")
head(hcy_gsmea$res,20) %>%
kbl(caption = "Pathways found using the agg,lim,fry method") %>%
kable_styling("hover", full_width = F)
nprobes | median | PValue | FDR | |
---|---|---|---|---|
RUNX1 regulates transcription of genes involved in WNT signaling | 5 | -2.1901707 | 0.0049213 | 0.9998201 |
Histidine catabolism | 6 | -1.7254211 | 0.0057306 | 0.9998201 |
Metal sequestration by antimicrobial proteins | 6 | -1.7937251 | 0.0068823 | 0.9998201 |
Elevation of cytosolic Ca2+ levels | 16 | -1.1680203 | 0.0081327 | 0.9998201 |
Defective B3GALTL causes PpS | 34 | -0.8239993 | 0.0089175 | 0.9998201 |
O-glycosylation of TSR domain-containing proteins | 35 | -0.8535126 | 0.0092411 | 0.9998201 |
Pyrimidine salvage | 10 | -1.3093735 | 0.0108085 | 0.9998201 |
Paracetamol ADME | 29 | -1.2339871 | 0.0133597 | 0.9998201 |
Cation-coupled Chloride cotransporters | 7 | -0.7799046 | 0.0152959 | 0.9998201 |
DNA methylation | 6 | -0.9357507 | 0.0172100 | 0.9998201 |
Defective CHSY1 causes TPBS | 7 | -1.1704274 | 0.0180292 | 0.9998201 |
Prostanoid ligand receptors | 8 | -1.4453650 | 0.0181647 | 0.9998201 |
Platelet Adhesion to exposed collagen | 13 | -0.7616780 | 0.0183280 | 0.9998201 |
Metabolism of Angiotensinogen to Angiotensins | 15 | -0.9682746 | 0.0196268 | 0.9998201 |
Regulation of localization of FOXO transcription factors | 10 | -1.1349116 | 0.0203682 | 0.9998201 |
Caspase activation via Dependence Receptors in the absence of ligand | 9 | -1.7920408 | 0.0209745 | 0.9998201 |
Proton-coupled monocarboxylate transport | 6 | -1.5858403 | 0.0224858 | 0.9998201 |
ROS and RNS production in phagocytes | 35 | -0.6466300 | 0.0227899 | 0.9998201 |
Regulation of commissural axon pathfinding by SLIT and ROBO | 10 | -0.9774779 | 0.0240911 | 0.9998201 |
Scavenging of heme from plasma | 11 | -1.2249775 | 0.0243507 | 0.9998201 |
# set cores to used for parallel execution
CORES= detectCores()
dm <- read.table("dma3a.tsv.gz")
dm <- dm[,4:9]
dm <- merge(myann,dm,by=0)
rownames(dm) <- dm[,1]
dm[,1] = NULL
calc_sc <- function(dm) {
gn <- unique(unlist(strsplit( dm$UCSC_RefGene_Name ,";")))
gnl <- strsplit( dm$UCSC_RefGene_Name ,";")
gnl <- mclapply(gnl,unique,mc.cores=CORES)
dm$UCSC_RefGene_Name <- gnl
l <- mclapply(1:nrow(dm), function(i) {
a <- dm[i,]
len <- length(a[[1]][[1]])
tvals <- as.numeric(rep(a["t"],len))
genes <- a[[1]][[1]]
data.frame(genes,tvals)
},mc.cores=CORES)
df <- do.call(rbind,l)
gme_res <- mclapply( 1:length(gn), function(i) {
g <- gn[i]
tstats <- df[which(df$genes==g),"tvals"]
myn <- length(tstats)
mymean <- mean(tstats)
mymedian <- median(tstats)
wtselfcont <- wilcox.test(tstats)
res <- c("gene"=g,"nprobes"=myn,"mean"=mymean,"median"=mymedian,
"p-value(sc)"=wtselfcont$p.value)
} , mc.cores=CORES )
gme_res_df <- do.call(rbind, gme_res)
rownames(gme_res_df) <- gme_res_df[,1]
gme_res_df <- gme_res_df[,-1]
tmp <- apply(gme_res_df,2,as.numeric)
rownames(tmp) <- rownames(gme_res_df)
gme_res_df <- as.data.frame(tmp)
gme_res_df$sig <- -log10(gme_res_df[,4])
gme_res_df <- gme_res_df[order(-gme_res_df$sig),]
gme_res_df$`fdr(sc)` <- p.adjust(gme_res_df$`p-value(sc)`)
out <- list("df"=df,"gme_res_df"=gme_res_df)
return(out)
}
tic()
gme_res_wholegene <- calc_sc(dm)
time2 <- toc() #38 44 41 40 44 41 40
## 58.672 sec elapsed
time2
## $tic
## elapsed
## 30287.14
##
## $toc
## elapsed
## 30345.81
##
## $msg
## logical(0)
##
## $callback_msg
## [1] "58.672 sec elapsed"
df <- gme_res_wholegene[[1]]
res <- gme_res_wholegene[[2]]
write.table(res,file="gmea_wholegene.tsv")
head(res,50) %>%
kbl(caption = "Top significant genes with GMEA") %>%
kable_styling("hover", full_width = F)
nprobes | mean | median | p-value(sc) | sig | fdr(sc) | |
---|---|---|---|---|---|---|
TNXB | 531 | 0.3429080 | 0.4078607 | 0e+00 | 14.041475 | 0.0000000 |
PCDHA1 | 162 | -0.7077910 | -0.6245086 | 0e+00 | 13.496797 | 0.0000000 |
NNAT | 49 | -0.9419415 | -0.9392397 | 0e+00 | 13.303312 | 0.0000000 |
PCDHA2 | 149 | -0.7188634 | -0.6267169 | 0e+00 | 12.485679 | 0.0000000 |
PCDHA3 | 141 | -0.7189282 | -0.6267169 | 0e+00 | 11.583810 | 0.0000001 |
KCNQ1DN | 39 | -1.6144397 | -1.7080108 | 0e+00 | 11.439140 | 0.0000001 |
TAP1 | 100 | 0.8373978 | 0.8170222 | 0e+00 | 10.577873 | 0.0000005 |
NCOR2 | 212 | -0.6647509 | -0.5447866 | 0e+00 | 10.345470 | 0.0000009 |
PCDHGA1 | 317 | -0.4141393 | -0.4153141 | 0e+00 | 10.238108 | 0.0000012 |
PCDHA4 | 131 | -0.6779619 | -0.6026401 | 0e+00 | 10.042257 | 0.0000018 |
PCDHGA2 | 309 | -0.4165786 | -0.4153141 | 0e+00 | 9.987553 | 0.0000021 |
PCDHGA3 | 295 | -0.4180524 | -0.3901466 | 0e+00 | 9.576696 | 0.0000053 |
MESTIT1 | 59 | -0.7596278 | -0.7939385 | 0e+00 | 9.544056 | 0.0000057 |
PCDHA5 | 122 | -0.6998042 | -0.6245086 | 0e+00 | 9.542358 | 0.0000058 |
NKX6-2 | 36 | -1.3952706 | -1.4319593 | 0e+00 | 9.536050 | 0.0000058 |
SOX2OT | 85 | -0.7468762 | -0.8104998 | 0e+00 | 9.479982 | 0.0000066 |
PCDHA7 | 103 | -0.7321981 | -0.6267169 | 0e+00 | 9.375369 | 0.0000085 |
C11orf21 | 36 | -1.7668062 | -1.9463963 | 0e+00 | 9.257296 | 0.0000111 |
PITX2 | 64 | -1.0007742 | -0.9208371 | 0e+00 | 9.163509 | 0.0000138 |
PCDHA6 | 114 | -0.6916765 | -0.6245086 | 0e+00 | 8.749552 | 0.0000357 |
TSPAN32 | 42 | -1.6254336 | -1.9412022 | 0e+00 | 8.645175 | 0.0000454 |
MEST | 85 | -0.6582939 | -0.7858506 | 0e+00 | 8.607705 | 0.0000495 |
ASCL2 | 49 | -0.8873371 | -0.9068647 | 0e+00 | 8.538922 | 0.0000580 |
WT1 | 60 | -1.0380025 | -1.0581139 | 0e+00 | 8.537528 | 0.0000582 |
PCDHGB1 | 277 | -0.4040374 | -0.3512600 | 0e+00 | 8.169027 | 0.0001360 |
PCDHA8 | 92 | -0.7250898 | -0.6311017 | 0e+00 | 8.158124 | 0.0001394 |
KIAA1949 | 101 | 0.7427567 | 0.6964985 | 0e+00 | 8.107412 | 0.0001567 |
PCDHA10 | 81 | -0.7546498 | -0.6842488 | 0e+00 | 7.961957 | 0.0002190 |
PCDHA9 | 84 | -0.7351768 | -0.6598677 | 0e+00 | 7.948914 | 0.0002257 |
PSMB8 | 83 | 0.8313260 | 0.6488706 | 0e+00 | 7.945020 | 0.0002277 |
PRDM13 | 41 | -0.9251076 | -0.9595314 | 0e+00 | 7.870850 | 0.0002701 |
GNA12 | 84 | -0.9539754 | -0.7963257 | 0e+00 | 7.622375 | 0.0004787 |
PCDHGA4 | 263 | -0.4030756 | -0.3512600 | 0e+00 | 7.620039 | 0.0004812 |
MAD1L1 | 684 | -0.2827612 | -0.2513758 | 0e+00 | 7.532317 | 0.0005889 |
SFRP2 | 44 | -1.1033457 | -1.2056608 | 0e+00 | 7.518236 | 0.0006083 |
SOX1 | 27 | -1.2956875 | -1.5392701 | 0e+00 | 7.349659 | 0.0008967 |
ZIC1 | 40 | -1.0223609 | -1.1114647 | 1e-07 | 7.295516 | 0.0010157 |
SPON2 | 30 | -1.0371460 | -1.0318217 | 1e-07 | 7.096401 | 0.0016064 |
TLX1 | 36 | -0.7858743 | -0.8711780 | 1e-07 | 7.094355 | 0.0016139 |
PPP1R2P1 | 27 | -1.2635175 | -1.2835267 | 1e-07 | 6.826780 | 0.0029884 |
TBX15 | 57 | -0.8784010 | -0.9985338 | 2e-07 | 6.710128 | 0.0039091 |
HLA-E | 64 | 0.9883267 | 0.9471868 | 2e-07 | 6.621543 | 0.0047933 |
SOX2 | 31 | -0.9356689 | -0.9861379 | 4e-07 | 6.380592 | 0.0083477 |
HLA-J | 60 | -0.9474161 | -0.9835159 | 4e-07 | 6.363271 | 0.0086869 |
TFAP2A | 77 | -0.6823000 | -0.6389762 | 4e-07 | 6.359493 | 0.0087624 |
PCDHGB2 | 249 | -0.3779342 | -0.2994303 | 5e-07 | 6.334383 | 0.0092835 |
TBX2 | 42 | -0.9192182 | -0.9768360 | 5e-07 | 6.283615 | 0.0104341 |
NR2E1 | 55 | -0.7936621 | -0.8098274 | 6e-07 | 6.236670 | 0.0116246 |
DMRTA2 | 32 | -0.8190626 | -0.8833632 | 6e-07 | 6.231215 | 0.0117710 |
LOC100128811 | 24 | -0.9214760 | -1.0762458 | 6e-07 | 6.224720 | 0.0119478 |
# volcano selfcont
sig <- subset(res,`fdr(sc)` < 0.05)
plot(res$median , -log10(res$`p-value(sc)`) ,
xlab="effect size (mean t-stat)", ylab="-log10(p-value)",
pch=19, cex=0.5, col="gray",main="self contained test")
grid()
points(sig$median , -log10(sig$`p-value(sc)`) ,
pch=19, cex=0.5, col="red")
plot(res$nprobes,res$sig,log="x",ylim=c(0,50),pch=19,cex=0.6)
points(sig$nprobes,sig$sig,col="red",pch=19,cex=0.62)
MIN = min(sig$nprobes)
LEFT = nrow(subset(res,nprobes<MIN))
RIGHT = nrow(subset(res,nprobes>MIN))
SIG = nrow(sig)
TOT = nrow(res)
HEADER <- paste(TOT, "genes in total.", SIG, "with FDR<0.05.",
RIGHT, "well covered and", LEFT, "poorly covered")
mtext(HEADER)
abline(v=MIN,lty=2)
Boxplots smallest pvalue.
par(mfrow=c(1,2))
n=50
# self contained
gs <- head(rownames(res),50)
tstats <- lapply(gs, function(g) {
df[which(df$genes==g),"tvals"]
})
names(tstats) <- gs
tstats <- tstats[order(unlist(lapply(tstats,median)))]
boxplot(tstats,horizontal=TRUE,las=1,
main="smallest p-val(selfcont)",cex.axis=0.6,
xlab="t-statistic")
grid()
n=50
# effect size median
sig <- subset(res,`fdr(sc)` < 0.05)
gs <- head(rownames(sig[order(-abs(sig$median)),]),n)
if ( length(gs) >2 ) {
tstats <- lapply(gs, function(g) {
df[which(df$genes==g),"tvals"]
})
names(tstats) <- gs
tstats <- tstats[order(unlist(lapply(tstats,median)))]
boxplot(tstats,horizontal=TRUE,las=1,
main="biggest effect size(median)",cex.axis=0.6,
xlab="t-statistic")
grid()
} else {
plot(1)
mtext("too few significant genes found")
}
dmscore <- data.frame( res$median * res$sig)
rownames(dmscore) <- rownames(res)
colnames(dmscore) <- "metric"
if ( ! file.exists("ReactomePathways.gmt") ) {
download.file("https://reactome.org/download/current/ReactomePathways.gmt.zip",
destfile="ReactomePathways.gmt.zip")
unzip("ReactomePathways.gmt.zip")
}
file.info("ReactomePathways.gmt")
## size isdir mode mtime ctime
## ReactomePathways.gmt 897680 FALSE 664 2022-10-03 21:38:44 2022-10-03 21:38:44
## atime uid gid uname grname
## ReactomePathways.gmt 2022-11-24 16:28:17 1000 1000 mdz mdz
genesets <- gmt_import("ReactomePathways.gmt")
length(genesets)
## [1] 2546
mres <- mitch_calc(x=dmscore, genesets=genesets, priority="effect")
## Note: Enrichments with large effect sizes may not be
## statistically significant.
head(mres$enrichment_result,20) %>%
kbl(caption = "Top enriched gene sets with GMEA-Mitch", row.names = FALSE) %>%
kable_styling("hover", full_width = F)
set | setSize | pANOVA | s.dist | p.adjustANOVA |
---|---|---|---|---|
Negative regulation of activity of TFAP2 (AP-2) family transcription factors | 10 | 0.0001209 | -0.7018024 | 0.0009285 |
Regulation of commissural axon pathfinding by SLIT and ROBO | 10 | 0.0003908 | -0.6474208 | 0.0025714 |
Apoptotic cleavage of cell adhesion proteins | 11 | 0.0002096 | -0.6453083 | 0.0014958 |
Transcriptional regulation of testis differentiation | 12 | 0.0002348 | -0.6130531 | 0.0016520 |
Thyroxine biosynthesis | 10 | 0.0034244 | -0.5343657 | 0.0159178 |
POU5F1 (OCT4), SOX2, NANOG repress genes related to differentiation | 10 | 0.0055806 | -0.5060048 | 0.0228640 |
ERBB2 Activates PTK6 Signaling | 13 | 0.0018315 | -0.4990366 | 0.0092526 |
Negative regulation of TCF-dependent signaling by WNT ligand antagonists | 15 | 0.0009549 | -0.4925046 | 0.0053957 |
Adenylate cyclase activating pathway | 10 | 0.0081256 | -0.4832503 | 0.0312683 |
Acetylcholine Neurotransmitter Release Cycle | 16 | 0.0009281 | -0.4780294 | 0.0052637 |
GABA synthesis, release, reuptake and degradation | 19 | 0.0003437 | -0.4742558 | 0.0022913 |
Platelet sensitization by LDL | 17 | 0.0008191 | 0.4686399 | 0.0047701 |
Defective B3GALTL causes PpS | 34 | 0.0000029 | -0.4632690 | 0.0000521 |
TP53 regulates transcription of additional cell cycle genes whose exact role in the p53 pathway remain uncertain | 19 | 0.0004924 | 0.4616661 | 0.0031316 |
SRP-dependent cotranslational protein targeting to membrane | 105 | 0.0000000 | 0.4519886 | 0.0000000 |
SLBP Dependent Processing of Replication-Dependent Histone Pre-mRNAs | 11 | 0.0094503 | 0.4518113 | 0.0354189 |
TFAP2 (AP-2) family regulates transcription of growth factors and their receptors | 12 | 0.0074999 | -0.4456644 | 0.0295334 |
O-glycosylation of TSR domain-containing proteins | 35 | 0.0000051 | -0.4453420 | 0.0000803 |
Carnitine metabolism | 11 | 0.0106185 | -0.4447950 | 0.0388601 |
Erythrocytes take up carbon dioxide and release oxygen | 12 | 0.0079199 | -0.4426103 | 0.0306439 |
mitch_report(mres,outfile="dma3a_mitch.html",overwrite=TRUE)
## Note: overwriting existing report
## Dataset saved as " /tmp/RtmpV6gDi1/dma3a_mitch.rds ".
##
##
## processing file: mitch.Rmd
##
|
| | 0%
|
|.. | 3%
## inline R code fragments
##
##
|
|.... | 6%
## label: checklibraries (with options)
## List of 1
## $ echo: logi FALSE
##
##
|
|...... | 9%
## ordinary text without R code
##
##
|
|........ | 12%
## label: peek (with options)
## List of 1
## $ echo: logi FALSE
##
##
|
|.......... | 15%
## ordinary text without R code
##
##
|
|............ | 18%
## label: metrics (with options)
## List of 1
## $ echo: logi FALSE
##
##
|
|.............. | 21%
## ordinary text without R code
##
##
|
|................ | 24%
## label: scatterplot (with options)
## List of 5
## $ echo : logi FALSE
## $ fig.height: num 6
## $ fig.width : num 6.5
## $ message : logi FALSE
## $ warning : logi FALSE
##
|
|................... | 26%
## ordinary text without R code
##
##
|
|..................... | 29%
## label: contourplot (with options)
## List of 5
## $ echo : logi FALSE
## $ fig.height: num 6
## $ fig.width : num 6.5
## $ warning : logi FALSE
## $ message : logi FALSE
## Contour plot does not apply to unidimensional analysis.
##
|
|....................... | 32%
## ordinary text without R code
##
##
|
|......................... | 35%
## label: input_geneset_metrics1 (with options)
## List of 2
## $ results: chr "asis"
## $ echo : logi FALSE
##
##
|
|........................... | 38%
## ordinary text without R code
##
##
|
|............................. | 41%
## label: input_geneset_metrics2 (with options)
## List of 5
## $ results : chr "asis"
## $ echo : logi FALSE
## $ fig.height: num 7
## $ fig.width : num 7
## $ fig.show : chr "all"
##
|
|............................... | 44%
## ordinary text without R code
##
##
|
|................................. | 47%
## label: input_geneset_metrics3 (with options)
## List of 5
## $ results : chr "asis"
## $ echo : logi FALSE
## $ message : logi FALSE
## $ fig.height: num 7
## $ fig.width : num 7
##
|
|................................... | 50%
## ordinary text without R code
##
##
|
|..................................... | 53%
## label: echart1d (with options)
## List of 6
## $ results : chr "asis"
## $ echo : logi FALSE
## $ fig.height: num 7
## $ fig.width : num 7
## $ fig.show : chr "all"
## $ message : logi FALSE
##
##
|
|....................................... | 56%
## label: echart2d (with options)
## List of 6
## $ results : chr "asis"
## $ echo : logi FALSE
## $ fig.height: num 7
## $ fig.width : num 7
## $ fig.show : chr "all"
## $ message : logi FALSE
##
##
|
|......................................... | 59%
## ordinary text without R code
##
##
|
|........................................... | 62%
## label: heatmap (with options)
## List of 6
## $ results : chr "asis"
## $ echo : logi FALSE
## $ fig.height: num 10
## $ fig.width : num 7
## $ fig.show : chr "all"
## $ message : logi FALSE
##
##
|
|............................................. | 65%
## ordinary text without R code
##
##
|
|............................................... | 68%
## label: effectsize (with options)
## List of 6
## $ results : chr "asis"
## $ echo : logi FALSE
## $ fig.height: num 7
## $ fig.width : num 7
## $ fig.show : chr "all"
## $ message : logi FALSE
##
##
|
|................................................. | 71%
## ordinary text without R code
##
##
|
|................................................... | 74%
## label: results_table (with options)
## List of 2
## $ results: chr "asis"
## $ echo : logi FALSE
##
##
|
|...................................................... | 76%
## ordinary text without R code
##
##
|
|........................................................ | 79%
## label: results_table_complete (with options)
## List of 2
## $ results: chr "asis"
## $ echo : logi FALSE
##
##
|
|.......................................................... | 82%
## ordinary text without R code
##
##
|
|............................................................ | 85%
## label: detailed_geneset_reports1d (with options)
## List of 7
## $ results : chr "asis"
## $ echo : logi FALSE
## $ fig.height: num 6
## $ fig.width : num 6
## $ out.width : chr "80%"
## $ comment : logi NA
## $ message : logi FALSE
##
|
|.............................................................. | 88%
## ordinary text without R code
##
##
|
|................................................................ | 91%
## label: detailed_geneset_reports2d (with options)
## List of 7
## $ results : chr "asis"
## $ echo : logi FALSE
## $ fig.height: num 5
## $ fig.width : num 6
## $ out.width : chr "80%"
## $ comment : logi NA
## $ message : logi FALSE
##
##
|
|.................................................................. | 94%
## ordinary text without R code
##
##
|
|.................................................................... | 97%
## label: session_info (with options)
## List of 3
## $ include: logi TRUE
## $ echo : logi TRUE
## $ results: chr "markup"
##
##
|
|......................................................................| 100%
## ordinary text without R code
## output file: /home/mdz/projects/gmea/mitch.knit.md
## /usr/bin/pandoc +RTS -K512m -RTS /home/mdz/projects/gmea/mitch.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output /tmp/RtmpV6gDi1/mitch_report.html --lua-filter /usr/local/lib/R/site-library/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /usr/local/lib/R/site-library/rmarkdown/rmarkdown/lua/latex-div.lua --self-contained --variable bs3=TRUE --section-divs --template /usr/local/lib/R/site-library/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --variable theme=bootstrap --mathjax --variable 'mathjax-url=https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --include-in-header /tmp/RtmpV6gDi1/rmarkdown-str112b71e1fa10b.html
##
## Output created: /tmp/RtmpV6gDi1/mitch_report.html
## [1] TRUE
For reproducibility.
sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] grid stats4 parallel stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] pkgload_1.3.0
## [2] GGally_2.1.2
## [3] ggplot2_3.3.6
## [4] gtools_3.9.3
## [5] tibble_3.1.8
## [6] echarts4r_0.4.4
## [7] ENmix_1.32.0
## [8] doParallel_1.0.17
## [9] qqman_0.1.8
## [10] RCircos_1.2.2
## [11] beeswarm_0.4.0
## [12] forestplot_3.1.0
## [13] abind_1.4-5
## [14] checkmate_2.1.0
## [15] reshape2_1.4.4
## [16] gplots_3.1.3
## [17] GEOquery_2.64.2
## [18] RColorBrewer_1.1-3
## [19] topconfects_1.12.0
## [20] DMRcatedata_2.14.0
## [21] ExperimentHub_2.4.0
## [22] AnnotationHub_3.4.0
## [23] BiocFileCache_2.4.0
## [24] dbplyr_2.2.1
## [25] DMRcate_2.10.0
## [26] missMethyl_1.30.0
## [27] IlluminaHumanMethylationEPICanno.ilm10b4.hg19_0.6.0
## [28] R.utils_2.12.0
## [29] R.oo_1.25.0
## [30] R.methodsS3_1.8.2
## [31] plyr_1.8.7
## [32] limma_3.52.4
## [33] tictoc_1.1
## [34] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.1
## [35] IlluminaHumanMethylation450kmanifest_0.4.0
## [36] minfi_1.42.0
## [37] bumphunter_1.38.0
## [38] locfit_1.5-9.6
## [39] iterators_1.0.14
## [40] foreach_1.5.2
## [41] Biostrings_2.64.1
## [42] XVector_0.36.0
## [43] SummarizedExperiment_1.26.1
## [44] Biobase_2.56.0
## [45] MatrixGenerics_1.8.1
## [46] matrixStats_0.62.0
## [47] GenomicRanges_1.48.0
## [48] GenomeInfoDb_1.32.4
## [49] IRanges_2.30.1
## [50] S4Vectors_0.34.0
## [51] BiocGenerics_0.42.0
## [52] mitch_1.8.0
## [53] eulerr_6.1.1
## [54] kableExtra_1.3.4
## [55] dplyr_1.0.10
##
## loaded via a namespace (and not attached):
## [1] rappdirs_0.3.3 rtracklayer_1.56.1
## [3] tidyr_1.2.1 bit64_4.0.5
## [5] knitr_1.40 DelayedArray_0.22.0
## [7] data.table_1.14.4 rpart_4.1.19
## [9] KEGGREST_1.36.3 RCurl_1.98-1.9
## [11] AnnotationFilter_1.20.0 generics_0.1.3
## [13] GenomicFeatures_1.48.4 preprocessCore_1.58.0
## [15] RSQLite_2.2.18 bit_4.0.4
## [17] tzdb_0.3.0 webshot_0.5.4
## [19] xml2_1.3.3 httpuv_1.6.6
## [21] assertthat_0.2.1 xfun_0.34
## [23] jquerylib_0.1.4 hms_1.1.2
## [25] evaluate_0.17 promises_1.2.0.1
## [27] fansi_1.0.3 restfulr_0.0.15
## [29] scrime_1.3.5 progress_1.2.2
## [31] caTools_1.18.2 readxl_1.4.1
## [33] DBI_1.1.3 geneplotter_1.74.0
## [35] htmlwidgets_1.5.4 reshape_0.8.9
## [37] purrr_0.3.5 ellipsis_0.3.2
## [39] backports_1.4.1 permute_0.9-7
## [41] calibrate_1.7.7 annotate_1.74.0
## [43] biomaRt_2.52.0 deldir_1.0-6
## [45] sparseMatrixStats_1.8.0 vctrs_0.5.0
## [47] ensembldb_2.20.2 withr_2.5.0
## [49] cachem_1.0.6 Gviz_1.40.1
## [51] BSgenome_1.64.0 GenomicAlignments_1.32.1
## [53] prettyunits_1.1.1 mclust_5.4.10
## [55] svglite_2.1.0 cluster_2.1.4
## [57] RPMM_1.25 lazyeval_0.2.2
## [59] crayon_1.5.2 genefilter_1.78.0
## [61] edgeR_3.38.4 pkgconfig_2.0.3
## [63] nlme_3.1-160 ProtGenerics_1.28.0
## [65] nnet_7.3-18 rlang_1.0.6
## [67] lifecycle_1.0.3 filelock_1.0.2
## [69] dichromat_2.0-0.1 cellranger_1.1.0
## [71] rngtools_1.5.2 base64_2.0.1
## [73] Matrix_1.5-1 Rhdf5lib_1.18.2
## [75] base64enc_0.1-3 png_0.1-7
## [77] viridisLite_0.4.1 rjson_0.2.21
## [79] bitops_1.0-7 KernSmooth_2.23-20
## [81] rhdf5filters_1.8.0 blob_1.2.3
## [83] DelayedMatrixStats_1.18.2 doRNG_1.8.2
## [85] stringr_1.4.1 nor1mix_1.3-0
## [87] readr_2.1.3 jpeg_0.1-9
## [89] scales_1.2.1 memoise_2.0.1
## [91] magrittr_2.0.3 zlibbioc_1.42.0
## [93] compiler_4.2.1 BiocIO_1.6.0
## [95] illuminaio_0.38.0 Rsamtools_2.12.0
## [97] cli_3.4.1 DSS_2.44.0
## [99] htmlTable_2.4.1 Formula_1.2-4
## [101] MASS_7.3-58.1 tidyselect_1.2.0
## [103] stringi_1.7.8 highr_0.9
## [105] yaml_2.3.6 askpass_1.1
## [107] latticeExtra_0.6-30 sass_0.4.2
## [109] VariantAnnotation_1.42.1 tools_4.2.1
## [111] rstudioapi_0.14 foreign_0.8-83
## [113] bsseq_1.32.0 gridExtra_2.3
## [115] digest_0.6.30 BiocManager_1.30.19
## [117] shiny_1.7.3 quadprog_1.5-8
## [119] Rcpp_1.0.9 siggenes_1.70.0
## [121] BiocVersion_3.15.2 later_1.3.0
## [123] org.Hs.eg.db_3.15.0 httr_1.4.4
## [125] AnnotationDbi_1.58.0 biovizBase_1.44.0
## [127] colorspace_2.0-3 rvest_1.0.3
## [129] XML_3.99-0.11 splines_4.2.1
## [131] statmod_1.4.37 multtest_2.52.0
## [133] systemfonts_1.0.4 xtable_1.8-4
## [135] jsonlite_1.8.3 dynamicTreeCut_1.63-1
## [137] R6_2.5.1 Hmisc_4.7-1
## [139] pillar_1.8.1 htmltools_0.5.3
## [141] mime_0.12 glue_1.6.2
## [143] fastmap_1.1.0 BiocParallel_1.30.4
## [145] interactiveDisplayBase_1.34.0 beanplot_1.3.1
## [147] codetools_0.2-18 utf8_1.2.2
## [149] bslib_0.4.0 lattice_0.20-45
## [151] curl_4.3.3 openssl_2.0.4
## [153] interp_1.1-3 survival_3.4-0
## [155] rmarkdown_2.17 munsell_0.5.0
## [157] rhdf5_2.40.0 GenomeInfoDbData_1.2.8
## [159] impute_1.70.0 HDF5Array_1.24.2
## [161] gtable_0.3.1