Previously, Mandhri and I analysed the B-PROOF 450K data, trying to understand whether vitamin supplementation caused changes in gene methylation. We used Limma and some basic analyses, which showed no specific probes with FDR<0.05, nor any DMRs.
In this analysis we will use the principle of Gene Set Enrichment Analysis, applying it to many probes belonging to genes. If the probes are trending in concert, then we can make some judgement about the enrichment of those probes. The statistical test used is the Wilcox test, which can be applied to competitive or self contained test types.
library("parallel")
library("dplyr")
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library("kableExtra")
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
library("eulerr")
library("mitch")
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
## Registered S3 method overwritten by 'gplots':
## method from
## plot.venn eulerr
library("IlluminaHumanMethylation450kmanifest")
## Loading required package: minfi
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:dplyr':
##
## combine, intersect, setdiff, union
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, basename, cbind, colnames,
## dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
## grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
## order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
## rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
## union, unique, unsplit, which.max, which.min
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:dplyr':
##
## first, rename
## The following objects are masked from 'package:base':
##
## expand.grid, I, unname
## Loading required package: IRanges
##
## Attaching package: 'IRanges'
## The following objects are masked from 'package:dplyr':
##
## collapse, desc, slice
## Loading required package: GenomeInfoDb
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
##
## Attaching package: 'matrixStats'
## The following object is masked from 'package:dplyr':
##
## count
##
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
##
## colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
## colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
## colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
## colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
## colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
## colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
## colWeightedMeans, colWeightedMedians, colWeightedSds,
## colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
## rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
## rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
## rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
## rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
## rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
## rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
## rowWeightedSds, rowWeightedVars
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
##
## rowMedians
## The following objects are masked from 'package:matrixStats':
##
## anyMissing, rowMedians
## Loading required package: Biostrings
## Loading required package: XVector
##
## Attaching package: 'Biostrings'
## The following object is masked from 'package:base':
##
## strsplit
## Loading required package: bumphunter
## Loading required package: foreach
## Loading required package: iterators
## Loading required package: locfit
## locfit 1.5-9.6 2022-07-11
## Setting options('download.file.method.GEOquery'='auto')
## Setting options('GEOquery.inmemory.gpl'=FALSE)
library("IlluminaHumanMethylation450kanno.ilmn12.hg19")
library("tictoc")
##
## Attaching package: 'tictoc'
## The following object is masked from 'package:SummarizedExperiment':
##
## shift
## The following object is masked from 'package:GenomicRanges':
##
## shift
## The following object is masked from 'package:IRanges':
##
## shift
## The following object is masked from 'package:S4Vectors':
##
## List
ann450k <- getAnnotation(IlluminaHumanMethylation450kanno.ilmn12.hg19)
myann <- data.frame(ann450k[,c("UCSC_RefGene_Name","Regulatory_Feature_Group")])
promoters <- grep("Prom",myann$Regulatory_Feature_Group)
It is thought that high plasma homocysteine can inhibit DNA methylation. Let’s see whether that is the case, and which genes are most affected. This analysis is conducted at the whole gene level as well as on the level of promoters.
dm <- read.table("dma3a.tsv.gz")
dm <- dm[,4:9]
dm <- merge(myann,dm,by=0)
rownames(dm) <- dm[,1]
dm[,1] = NULL
# testing
#dm <- head(dm,40000)
head(dm,50) %>%
kbl(caption = "Top significant genes with limma") %>%
kable_paper("hover", full_width = F)
UCSC_RefGene_Name | Regulatory_Feature_Group | logFC | AveExpr | t | P.Value | adj.P.Val | B | |
---|---|---|---|---|---|---|---|---|
cg00000029 | RBL2 | Promoter_Associated | -0.0446959 | 0.4522209 | -0.8638867 | 0.3888420 | 0.9098372 | -5.265779 |
cg00000108 | C3orf35;C3orf35 | -0.0122602 | 3.6607954 | -0.1654468 | 0.8687848 | 0.9886500 | -5.576980 | |
cg00000109 | FNDC3B;FNDC3B | -0.0378447 | 2.0296336 | -0.5100334 | 0.6106758 | 0.9574373 | -5.476065 | |
cg00000165 | -0.0277057 | -1.0289408 | -0.5477061 | 0.5845972 | 0.9528277 | -5.458804 | ||
cg00000236 | VDAC3;VDAC3 | -0.0308775 | 1.8879879 | -0.6273277 | 0.5312685 | 0.9421655 | -5.418295 | |
cg00000289 | ACTN1;ACTN1;ACTN1 | -0.0347776 | 0.2694825 | -0.5508856 | 0.5824203 | 0.9524415 | -5.457291 | |
cg00000292 | ATP2A1;ATP2A1 | 0.0078848 | 2.8508680 | 0.1712753 | 0.8642068 | 0.9883139 | -5.576129 | |
cg00000321 | SFRP1 | -0.0006141 | -0.8769170 | -0.0091726 | 0.9926919 | 0.9994500 | -5.588821 | |
cg00000363 | -0.0207565 | -0.2718208 | -0.3794289 | 0.7048330 | 0.9708901 | -5.526411 | ||
cg00000622 | NIPA2;NIPA2;NIPA2;NIPA2 | Promoter_Associated | 0.0333418 | -4.8495004 | 0.7574773 | 0.4497908 | 0.9250804 | -5.340328 |
cg00000658 | MAN1B1 | 0.0226951 | 2.0750918 | 0.7385729 | 0.4611636 | 0.9276129 | -5.352556 | |
cg00000714 | TSEN34;TSEN34 | -0.0767442 | -1.5409438 | -2.5167547 | 0.0127504 | 0.4684043 | -2.895740 | |
cg00000721 | LRRC16A | Unclassified_Cell_type_specific | -0.0031114 | 2.7846882 | -0.0385133 | 0.9693227 | 0.9977628 | -5.588214 |
cg00000734 | CNBP;CNBP;CNBP;CNBP;CNBP;CNBP | Unclassified | -0.0224123 | -3.0864719 | -0.8730215 | 0.3838572 | 0.9082510 | -5.258927 |
cg00000769 | DDX55 | Promoter_Associated | 0.0044715 | -3.1463101 | 0.1951579 | 0.8454975 | 0.9862198 | -5.572331 |
cg00000884 | TLR2 | -0.0355966 | 2.4344668 | -0.6542490 | 0.5138171 | 0.9388729 | -5.403363 | |
cg00000905 | FAM81A | Unclassified | 0.0269914 | -2.1171455 | 0.5264592 | 0.5992411 | 0.9555164 | -5.468689 |
cg00000924 | KCNQ1;KCNQ1OT1;KCNQ1 | 0.0076655 | 0.4463612 | 0.2133270 | 0.8313222 | 0.9846206 | -5.569112 | |
cg00000948 | -0.0512202 | 2.8596325 | -1.1837137 | 0.2381445 | 0.8535327 | -4.983571 | ||
cg00000957 | NPHP4 | Unclassified_Cell_type_specific | 0.0113656 | 2.7283928 | 0.3477238 | 0.7284688 | 0.9734455 | -5.536407 |
cg00001099 | PSKH2 | -0.0540892 | 2.2699920 | -0.7493275 | 0.4546739 | 0.9262433 | -5.345637 | |
cg00001245 | MRPS25;MRPS25 | Promoter_Associated | 0.0127411 | -4.2195473 | 0.3178478 | 0.7509823 | 0.9762343 | -5.545030 |
cg00001249 | 0.0878332 | 2.2493861 | 1.3410880 | 0.1816436 | 0.8210138 | -4.812929 | ||
cg00001261 | 0.0441601 | -0.2075480 | 0.9709593 | 0.3329193 | 0.8918583 | -5.180991 | ||
cg00001269 | 0.0390867 | 2.9907317 | 1.0911248 | 0.2767298 | 0.8718875 | -5.074204 | ||
cg00001349 | MAEL | 0.0094591 | 1.9174464 | 0.1451613 | 0.8847519 | 0.9899344 | -5.579714 | |
cg00001364 | PROX1 | -0.0859751 | 2.3099396 | -1.4045238 | 0.1619493 | 0.8064409 | -4.738269 | |
cg00001446 | ELOVL1 | 0.0361831 | 2.9141672 | 0.8905659 | 0.3743945 | 0.9051405 | -5.245568 | |
cg00001510 | LILRA6 | 0.0045678 | -0.0996494 | 0.1465904 | 0.8836254 | 0.9898532 | -5.579533 | |
cg00001534 | FAF1 | -0.1186347 | 3.3033444 | -1.5710033 | 0.1180025 | 0.7613262 | -4.526384 | |
cg00001582 | LOC283050;LOC283050;LOC283050;ZMIZ1 | 0.0248102 | -2.9750721 | 0.9292188 | 0.3540658 | 0.8980126 | -5.215208 | |
cg00001583 | NR5A2;NR5A2 | -0.0441798 | -2.8591736 | -0.9330761 | 0.3520765 | 0.8973466 | -5.212108 | |
cg00001594 | ROCK2;ROCK2 | 0.0042517 | -4.7593997 | 0.0853653 | 0.9320693 | 0.9944972 | -5.585695 | |
cg00001687 | CDK10;CDK10;CDK10;CDK10;CDK10;CDK10 | 0.0015161 | 5.0163253 | 0.0361013 | 0.9712431 | 0.9978621 | -5.588292 | |
cg00001747 | Unclassified_Cell_type_specific | -0.1126870 | -2.7076297 | -1.5885362 | 0.1139845 | 0.7562820 | -4.502731 | |
cg00001791 | TMEM182 | 0.0182578 | 2.9978670 | 0.2936189 | 0.7693998 | 0.9781258 | -5.551455 | |
cg00001793 | ETV6 | -0.1098128 | 1.0746323 | -2.0295079 | 0.0439333 | 0.6192038 | -3.825126 | |
cg00001809 | 0.0525778 | 1.7725728 | 1.1597328 | 0.2477500 | 0.8587519 | -5.007741 | ||
cg00001854 | DNAJA2 | 0.0114339 | 2.8722017 | 0.2049001 | 0.8378902 | 0.9853220 | -5.570641 | |
cg00001930 | 0.0632018 | 2.1698165 | 0.8056091 | 0.4215702 | 0.9186297 | -5.307809 | ||
cg00002028 | PINK1;PINK1 | Promoter_Associated | 0.0295245 | -3.5036247 | 0.8499858 | 0.3965035 | 0.9114511 | -5.276068 |
cg00002033 | LRFN1 | 0.0542461 | 1.7907903 | 0.4372262 | 0.6624906 | 0.9647231 | -5.505950 | |
cg00002080 | RWDD2B | Unclassified_Cell_type_specific | 0.0795838 | 2.1256473 | 1.4263346 | 0.1555674 | 0.8004050 | -4.711822 |
cg00002116 | MRPL12;MRPL12 | Promoter_Associated | -0.0045132 | -5.0826553 | -0.0905479 | 0.9279562 | 0.9940693 | -5.585300 |
cg00002145 | COL6A3;COL6A3;COL6A3 | 0.0863898 | 2.7662213 | 1.9901870 | 0.0481398 | 0.6319901 | -3.891945 | |
cg00002190 | -0.0809549 | 2.2105512 | -1.6860704 | 0.0935784 | 0.7287342 | -4.366527 | ||
cg00002224 | C8orf31 | Unclassified_Cell_type_specific | 0.0994866 | 0.1264213 | 1.3473757 | 0.1796150 | 0.8195515 | -4.805679 |
cg00002236 | RTTN | Promoter_Associated | -0.0189864 | -3.5918761 | -0.6113878 | 0.5417426 | 0.9447402 | -5.426842 |
cg00002406 | CD2BP2;CD2BP2 | Promoter_Associated | 0.0073673 | -4.1952587 | 0.2710988 | 0.7866371 | 0.9798899 | -5.556971 |
cg00002426 | SLMAP | 0.0357775 | 3.4533187 | 0.6786596 | 0.4982571 | 0.9355991 | -5.389284 |
# trim down the input dataset
dm <- dm[,c("UCSC_RefGene_Name","t")]
# histogram of t values
hist(dm$t,breaks=seq(from=-6,to=6,by=1))
# set cores to used for parallel execution
CORES= detectCores()
calc_sc <- function(dm) {
gn <- unique(unlist(strsplit( dm$UCSC_RefGene_Name ,";")))
gnl <- strsplit( dm$UCSC_RefGene_Name ,";")
gnl <- mclapply(gnl,unique,mc.cores=CORES)
dm$UCSC_RefGene_Name <- gnl
l <- mclapply(1:nrow(dm), function(i) {
a <- dm[i,]
len <- length(a[[1]][[1]])
tvals <- as.numeric(rep(a["t"],len))
genes <- a[[1]][[1]]
data.frame(genes,tvals)
},mc.cores=CORES)
df <- do.call(rbind,l)
gme_res <- mclapply( 1:length(gn), function(i) {
g <- gn[i]
tstats <- df[which(df$genes==g),"tvals"]
myn <- length(tstats)
mymean <- mean(tstats)
mymedian <- median(tstats)
wtselfcont <- wilcox.test(tstats)
res <- c("gene"=g,"nprobes"=myn,"mean"=mymean,"median"=mymedian,
"p-value(sc)"=wtselfcont$p.value)
} , mc.cores=CORES )
gme_res_df <- do.call(rbind, gme_res)
rownames(gme_res_df) <- gme_res_df[,1]
gme_res_df <- gme_res_df[,-1]
tmp <- apply(gme_res_df,2,as.numeric)
rownames(tmp) <- rownames(gme_res_df)
gme_res_df <- as.data.frame(tmp)
gme_res_df$sig <- -log10(gme_res_df[,4])
gme_res_df <- gme_res_df[order(-gme_res_df$sig),]
gme_res_df$`fdr(sc)` <- p.adjust(gme_res_df$`p-value(sc)`)
out <- list("df"=df,"gme_res_df"=gme_res_df)
return(out)
}
tic()
gme_res_wholegene <- calc_sc(dm)
time2 <- toc() #38 44 41 40 44 41 40
## 52.011 sec elapsed
time2
## $tic
## elapsed
## 40.678
##
## $toc
## elapsed
## 92.689
##
## $msg
## logical(0)
##
## $callback_msg
## [1] "52.011 sec elapsed"
df <- gme_res_wholegene[[1]]
res <- gme_res_wholegene[[2]]
write.table(res,file="gmea_wholegene.tsv")
head(res,50) %>%
kbl(caption = "Top significant genes with GMEA") %>%
kable_paper("hover", full_width = F)
nprobes | mean | median | p-value(sc) | sig | fdr(sc) | |
---|---|---|---|---|---|---|
TNXB | 531 | 0.3429080 | 0.4078607 | 0e+00 | 14.041475 | 0.0000000 |
PCDHA1 | 162 | -0.7077910 | -0.6245086 | 0e+00 | 13.496797 | 0.0000000 |
NNAT | 49 | -0.9419415 | -0.9392397 | 0e+00 | 13.303312 | 0.0000000 |
PCDHA2 | 149 | -0.7188634 | -0.6267169 | 0e+00 | 12.485679 | 0.0000000 |
PCDHA3 | 141 | -0.7189282 | -0.6267169 | 0e+00 | 11.583810 | 0.0000001 |
KCNQ1DN | 39 | -1.6144397 | -1.7080108 | 0e+00 | 11.439140 | 0.0000001 |
TAP1 | 100 | 0.8373978 | 0.8170222 | 0e+00 | 10.577873 | 0.0000005 |
NCOR2 | 212 | -0.6647509 | -0.5447866 | 0e+00 | 10.345470 | 0.0000009 |
PCDHGA1 | 317 | -0.4141393 | -0.4153141 | 0e+00 | 10.238108 | 0.0000012 |
PCDHA4 | 131 | -0.6779619 | -0.6026401 | 0e+00 | 10.042257 | 0.0000018 |
PCDHGA2 | 309 | -0.4165786 | -0.4153141 | 0e+00 | 9.987553 | 0.0000021 |
PCDHGA3 | 295 | -0.4180524 | -0.3901466 | 0e+00 | 9.576696 | 0.0000053 |
MESTIT1 | 59 | -0.7596278 | -0.7939385 | 0e+00 | 9.544056 | 0.0000057 |
PCDHA5 | 122 | -0.6998042 | -0.6245086 | 0e+00 | 9.542358 | 0.0000058 |
NKX6-2 | 36 | -1.3952706 | -1.4319593 | 0e+00 | 9.536050 | 0.0000058 |
SOX2OT | 85 | -0.7468762 | -0.8104998 | 0e+00 | 9.479982 | 0.0000066 |
PCDHA7 | 103 | -0.7321981 | -0.6267169 | 0e+00 | 9.375369 | 0.0000085 |
C11orf21 | 36 | -1.7668062 | -1.9463963 | 0e+00 | 9.257296 | 0.0000111 |
PITX2 | 64 | -1.0007742 | -0.9208371 | 0e+00 | 9.163509 | 0.0000138 |
PCDHA6 | 114 | -0.6916765 | -0.6245086 | 0e+00 | 8.749552 | 0.0000357 |
TSPAN32 | 42 | -1.6254336 | -1.9412022 | 0e+00 | 8.645175 | 0.0000454 |
MEST | 85 | -0.6582939 | -0.7858506 | 0e+00 | 8.607705 | 0.0000495 |
ASCL2 | 49 | -0.8873371 | -0.9068647 | 0e+00 | 8.538922 | 0.0000580 |
WT1 | 60 | -1.0380025 | -1.0581139 | 0e+00 | 8.537528 | 0.0000582 |
PCDHGB1 | 277 | -0.4040374 | -0.3512600 | 0e+00 | 8.169027 | 0.0001360 |
PCDHA8 | 92 | -0.7250898 | -0.6311017 | 0e+00 | 8.158124 | 0.0001394 |
KIAA1949 | 101 | 0.7427567 | 0.6964985 | 0e+00 | 8.107412 | 0.0001567 |
PCDHA10 | 81 | -0.7546498 | -0.6842488 | 0e+00 | 7.961957 | 0.0002190 |
PCDHA9 | 84 | -0.7351768 | -0.6598677 | 0e+00 | 7.948914 | 0.0002257 |
PSMB8 | 83 | 0.8313260 | 0.6488706 | 0e+00 | 7.945020 | 0.0002277 |
PRDM13 | 41 | -0.9251076 | -0.9595314 | 0e+00 | 7.870850 | 0.0002701 |
GNA12 | 84 | -0.9539754 | -0.7963257 | 0e+00 | 7.622375 | 0.0004787 |
PCDHGA4 | 263 | -0.4030756 | -0.3512600 | 0e+00 | 7.620039 | 0.0004812 |
MAD1L1 | 684 | -0.2827612 | -0.2513758 | 0e+00 | 7.532317 | 0.0005889 |
SFRP2 | 44 | -1.1033457 | -1.2056608 | 0e+00 | 7.518236 | 0.0006083 |
SOX1 | 27 | -1.2956875 | -1.5392701 | 0e+00 | 7.349659 | 0.0008967 |
ZIC1 | 40 | -1.0223609 | -1.1114647 | 1e-07 | 7.295516 | 0.0010157 |
SPON2 | 30 | -1.0371460 | -1.0318217 | 1e-07 | 7.096401 | 0.0016064 |
TLX1 | 36 | -0.7858743 | -0.8711780 | 1e-07 | 7.094355 | 0.0016139 |
PPP1R2P1 | 27 | -1.2635175 | -1.2835267 | 1e-07 | 6.826780 | 0.0029884 |
TBX15 | 57 | -0.8784010 | -0.9985338 | 2e-07 | 6.710128 | 0.0039091 |
HLA-E | 64 | 0.9883267 | 0.9471868 | 2e-07 | 6.621543 | 0.0047933 |
SOX2 | 31 | -0.9356689 | -0.9861379 | 4e-07 | 6.380592 | 0.0083477 |
HLA-J | 60 | -0.9474161 | -0.9835159 | 4e-07 | 6.363271 | 0.0086869 |
TFAP2A | 77 | -0.6823000 | -0.6389762 | 4e-07 | 6.359493 | 0.0087624 |
PCDHGB2 | 249 | -0.3779342 | -0.2994303 | 5e-07 | 6.334383 | 0.0092835 |
TBX2 | 42 | -0.9192182 | -0.9768360 | 5e-07 | 6.283615 | 0.0104341 |
NR2E1 | 55 | -0.7936621 | -0.8098274 | 6e-07 | 6.236670 | 0.0116246 |
DMRTA2 | 32 | -0.8190626 | -0.8833632 | 6e-07 | 6.231215 | 0.0117710 |
LOC100128811 | 24 | -0.9214760 | -1.0762458 | 6e-07 | 6.224720 | 0.0119478 |
# volcano selfcont
sig <- subset(res,`fdr(sc)` < 0.05)
plot(res$median , -log10(res$`p-value(sc)`) ,
xlab="effect size (mean t-stat)", ylab="-log10(p-value)",
pch=19, cex=0.5, col="gray",main="self contained test")
grid()
points(sig$median , -log10(sig$`p-value(sc)`) ,
pch=19, cex=0.5, col="red")
plot(res$nprobes,res$sig,log="x",ylim=c(0,50),pch=19,cex=0.6)
points(sig$nprobes,sig$sig,col="red",pch=19,cex=0.62)
MIN = min(sig$nprobes)
LEFT = nrow(subset(res,nprobes<MIN))
RIGHT = nrow(subset(res,nprobes>MIN))
SIG = nrow(sig)
TOT = nrow(res)
HEADER <- paste(TOT, "genes in total.", SIG, "with FDR<0.05.",
RIGHT, "well covered and", LEFT, "poorly covered")
mtext(HEADER)
abline(v=MIN,lty=2)
Boxplots smallest pvalue.
par(mfrow=c(1,2))
n=50
# self contained
gs <- head(rownames(res),50)
tstats <- lapply(gs, function(g) {
df[which(df$genes==g),"tvals"]
})
names(tstats) <- gs
tstats <- tstats[order(unlist(lapply(tstats,median)))]
boxplot(tstats,horizontal=TRUE,las=1,
main="smallest p-val(selfcont)",cex.axis=0.6,
xlab="t-statistic")
grid()
n=50
# effect size median
sig <- subset(res,`fdr(sc)` < 0.05)
gs <- head(rownames(sig[order(-abs(sig$median)),]),n)
if ( length(gs) >2 ) {
tstats <- lapply(gs, function(g) {
df[which(df$genes==g),"tvals"]
})
names(tstats) <- gs
tstats <- tstats[order(unlist(lapply(tstats,median)))]
boxplot(tstats,horizontal=TRUE,las=1,
main="biggest effect size(median)",cex.axis=0.6,
xlab="t-statistic")
grid()
} else {
plot(1)
mtext("too few significant genes found")
}
dmscore <- data.frame( res$median * res$sig)
rownames(dmscore) <- rownames(res)
colnames(dmscore) <- "metric"
if ( ! file.exists("ReactomePathways.gmt") ) {
download.file("https://reactome.org/download/current/ReactomePathways.gmt.zip",
destfile="ReactomePathways.gmt.zip")
unzip("ReactomePathways.gmt.zip")
}
file.info("ReactomePathways.gmt")
## size isdir mode mtime ctime
## ReactomePathways.gmt 897680 FALSE 664 2022-10-03 21:38:44 2022-10-03 21:38:44
## atime uid gid uname grname
## ReactomePathways.gmt 2022-10-13 10:42:43 1000 1000 mdz mdz
genesets <- gmt_import("ReactomePathways.gmt")
length(genesets)
## [1] 2546
mres <- mitch_calc(x=dmscore, genesets=genesets, priority="effect")
## Note: Enrichments with large effect sizes may not be
## statistically significant.
head(mres$enrichment_result,20) %>%
kbl(caption = "Top enriched gene sets with GMEA-Mitch") %>%
kable_paper("hover", full_width = F)
set | setSize | pANOVA | s.dist | p.adjustANOVA | |
---|---|---|---|---|---|
797 | Negative regulation of activity of TFAP2 (AP-2) family transcription factors | 10 | 0.0001209 | -0.7018024 | 0.0009285 |
1099 | Regulation of commissural axon pathfinding by SLIT and ROBO | 10 | 0.0003908 | -0.6474208 | 0.0025714 |
87 | Apoptotic cleavage of cell adhesion proteins | 11 | 0.0002096 | -0.6453083 | 0.0014958 |
1418 | Transcriptional regulation of testis differentiation | 12 | 0.0002348 | -0.6130531 | 0.0016520 |
1386 | Thyroxine biosynthesis | 10 | 0.0034244 | -0.5343657 | 0.0159178 |
879 | POU5F1 (OCT4), SOX2, NANOG repress genes related to differentiation | 10 | 0.0055806 | -0.5060048 | 0.0228640 |
367 | ERBB2 Activates PTK6 Signaling | 13 | 0.0018315 | -0.4990366 | 0.0092526 |
796 | Negative regulation of TCF-dependent signaling by WNT ligand antagonists | 15 | 0.0009549 | -0.4925046 | 0.0053957 |
64 | Adenylate cyclase activating pathway | 10 | 0.0081256 | -0.4832503 | 0.0312683 |
29 | Acetylcholine Neurotransmitter Release Cycle | 16 | 0.0009281 | -0.4780294 | 0.0052637 |
479 | GABA synthesis, release, reuptake and degradation | 19 | 0.0003437 | -0.4742558 | 0.0022913 |
923 | Platelet sensitization by LDL | 17 | 0.0008191 | 0.4686399 | 0.0047701 |
275 | Defective B3GALTL causes PpS | 34 | 0.0000029 | -0.4632690 | 0.0000521 |
1360 | TP53 regulates transcription of additional cell cycle genes whose exact role in the p53 pathway remain uncertain | 19 | 0.0004924 | 0.4616661 | 0.0031316 |
1167 | SRP-dependent cotranslational protein targeting to membrane | 105 | 0.0000000 | 0.4519886 | 0.0000000 |
1162 | SLBP Dependent Processing of Replication-Dependent Histone Pre-mRNAs | 11 | 0.0094503 | 0.4518113 | 0.0354189 |
1340 | TFAP2 (AP-2) family regulates transcription of growth factors and their receptors | 12 | 0.0074999 | -0.4456644 | 0.0295334 |
837 | O-glycosylation of TSR domain-containing proteins | 35 | 0.0000051 | -0.4453420 | 0.0000803 |
168 | Carnitine metabolism | 11 | 0.0106185 | -0.4447950 | 0.0388601 |
385 | Erythrocytes take up carbon dioxide and release oxygen | 12 | 0.0079199 | -0.4426103 | 0.0306439 |
mitch_report(mres,outfile="dma3a_mitch.html",overwrite=TRUE)
## Note: overwriting existing report
## Dataset saved as " /tmp/RtmpAHTfXe/dma3a_mitch.rds ".
##
##
## processing file: mitch.Rmd
##
|
| | 0%
|
|.. | 3%
## inline R code fragments
##
##
|
|.... | 6%
## label: checklibraries (with options)
## List of 1
## $ echo: logi FALSE
##
##
|
|...... | 9%
## ordinary text without R code
##
##
|
|........ | 12%
## label: peek (with options)
## List of 1
## $ echo: logi FALSE
##
##
|
|.......... | 15%
## ordinary text without R code
##
##
|
|............ | 18%
## label: metrics (with options)
## List of 1
## $ echo: logi FALSE
##
##
|
|.............. | 21%
## ordinary text without R code
##
##
|
|................ | 24%
## label: scatterplot (with options)
## List of 5
## $ echo : logi FALSE
## $ fig.height: num 6
## $ fig.width : num 6.5
## $ message : logi FALSE
## $ warning : logi FALSE
##
|
|................... | 26%
## ordinary text without R code
##
##
|
|..................... | 29%
## label: contourplot (with options)
## List of 5
## $ echo : logi FALSE
## $ fig.height: num 6
## $ fig.width : num 6.5
## $ warning : logi FALSE
## $ message : logi FALSE
## Contour plot does not apply to unidimensional analysis.
##
|
|....................... | 32%
## ordinary text without R code
##
##
|
|......................... | 35%
## label: input_geneset_metrics1 (with options)
## List of 2
## $ results: chr "asis"
## $ echo : logi FALSE
##
##
|
|........................... | 38%
## ordinary text without R code
##
##
|
|............................. | 41%
## label: input_geneset_metrics2 (with options)
## List of 5
## $ results : chr "asis"
## $ echo : logi FALSE
## $ fig.height: num 7
## $ fig.width : num 7
## $ fig.show : chr "all"
##
|
|............................... | 44%
## ordinary text without R code
##
##
|
|................................. | 47%
## label: input_geneset_metrics3 (with options)
## List of 5
## $ results : chr "asis"
## $ echo : logi FALSE
## $ message : logi FALSE
## $ fig.height: num 7
## $ fig.width : num 7
##
|
|................................... | 50%
## ordinary text without R code
##
##
|
|..................................... | 53%
## label: echart1d (with options)
## List of 6
## $ results : chr "asis"
## $ echo : logi FALSE
## $ fig.height: num 7
## $ fig.width : num 7
## $ fig.show : chr "all"
## $ message : logi FALSE
##
##
|
|....................................... | 56%
## label: echart2d (with options)
## List of 6
## $ results : chr "asis"
## $ echo : logi FALSE
## $ fig.height: num 7
## $ fig.width : num 7
## $ fig.show : chr "all"
## $ message : logi FALSE
##
##
|
|......................................... | 59%
## ordinary text without R code
##
##
|
|........................................... | 62%
## label: heatmap (with options)
## List of 6
## $ results : chr "asis"
## $ echo : logi FALSE
## $ fig.height: num 10
## $ fig.width : num 7
## $ fig.show : chr "all"
## $ message : logi FALSE
##
##
|
|............................................. | 65%
## ordinary text without R code
##
##
|
|............................................... | 68%
## label: effectsize (with options)
## List of 6
## $ results : chr "asis"
## $ echo : logi FALSE
## $ fig.height: num 7
## $ fig.width : num 7
## $ fig.show : chr "all"
## $ message : logi FALSE
##
##
|
|................................................. | 71%
## ordinary text without R code
##
##
|
|................................................... | 74%
## label: results_table (with options)
## List of 2
## $ results: chr "asis"
## $ echo : logi FALSE
##
##
|
|...................................................... | 76%
## ordinary text without R code
##
##
|
|........................................................ | 79%
## label: results_table_complete (with options)
## List of 2
## $ results: chr "asis"
## $ echo : logi FALSE
##
##
|
|.......................................................... | 82%
## ordinary text without R code
##
##
|
|............................................................ | 85%
## label: detailed_geneset_reports1d (with options)
## List of 7
## $ results : chr "asis"
## $ echo : logi FALSE
## $ fig.height: num 6
## $ fig.width : num 6
## $ out.width : chr "80%"
## $ comment : logi NA
## $ message : logi FALSE
##
|
|.............................................................. | 88%
## ordinary text without R code
##
##
|
|................................................................ | 91%
## label: detailed_geneset_reports2d (with options)
## List of 7
## $ results : chr "asis"
## $ echo : logi FALSE
## $ fig.height: num 5
## $ fig.width : num 6
## $ out.width : chr "80%"
## $ comment : logi NA
## $ message : logi FALSE
##
##
|
|.................................................................. | 94%
## ordinary text without R code
##
##
|
|.................................................................... | 97%
## label: session_info (with options)
## List of 3
## $ include: logi TRUE
## $ echo : logi TRUE
## $ results: chr "markup"
##
##
|
|......................................................................| 100%
## ordinary text without R code
## output file: /home/mdz/projects/gmea/mitch.knit.md
## /usr/bin/pandoc +RTS -K512m -RTS /home/mdz/projects/gmea/mitch.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output /tmp/RtmpAHTfXe/mitch_report.html --lua-filter /usr/local/lib/R/site-library/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /usr/local/lib/R/site-library/rmarkdown/rmarkdown/lua/latex-div.lua --self-contained --variable bs3=TRUE --section-divs --template /usr/local/lib/R/site-library/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --variable theme=bootstrap --mathjax --variable 'mathjax-url=https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --include-in-header /tmp/RtmpAHTfXe/rmarkdown-str1d698318e965f.html
##
## Output created: /tmp/RtmpAHTfXe/mitch_report.html
## [1] TRUE
dm <- read.table("dma3a.tsv.gz")
dm <- dm[,4:9]
dm <- merge(myann,dm,by=0)
rownames(dm) <- dm[,1]
dm[,1] = NULL
dm <- dm[grep("Promoter_Associated",dm$Regulatory_Feature_Group),]
head(dm,50) %>%
kbl(caption = "Top significant genes with limma") %>%
kable_paper("hover", full_width = F)
UCSC_RefGene_Name | Regulatory_Feature_Group | logFC | AveExpr | t | P.Value | adj.P.Val | B | |
---|---|---|---|---|---|---|---|---|
cg00000029 | RBL2 | Promoter_Associated | -0.0446959 | 0.4522209 | -0.8638867 | 0.3888420 | 0.9098372 | -5.265779 |
cg00000622 | NIPA2;NIPA2;NIPA2;NIPA2 | Promoter_Associated | 0.0333418 | -4.8495004 | 0.7574773 | 0.4497908 | 0.9250804 | -5.340328 |
cg00000769 | DDX55 | Promoter_Associated | 0.0044715 | -3.1463101 | 0.1951579 | 0.8454975 | 0.9862198 | -5.572331 |
cg00001245 | MRPS25;MRPS25 | Promoter_Associated | 0.0127411 | -4.2195473 | 0.3178478 | 0.7509823 | 0.9762343 | -5.545030 |
cg00002028 | PINK1;PINK1 | Promoter_Associated | 0.0295245 | -3.5036247 | 0.8499858 | 0.3965035 | 0.9114511 | -5.276068 |
cg00002116 | MRPL12;MRPL12 | Promoter_Associated | -0.0045132 | -5.0826553 | -0.0905479 | 0.9279562 | 0.9940693 | -5.585300 |
cg00002236 | RTTN | Promoter_Associated | -0.0189864 | -3.5918761 | -0.6113878 | 0.5417426 | 0.9447402 | -5.426842 |
cg00002406 | CD2BP2;CD2BP2 | Promoter_Associated | 0.0073673 | -4.1952587 | 0.2710988 | 0.7866371 | 0.9798899 | -5.556971 |
cg00002660 | SMARCC2;SMARCC2;SMARCC2 | Promoter_Associated | -0.0276799 | -3.4149124 | -0.9195186 | 0.3591001 | 0.8995507 | -5.222947 |
cg00002930 | NFKBIL1;NFKBIL1;ATP6V1G2;NFKBIL1;NFKBIL1;NFKBIL1;ATP6V1G2;NFKBIL1 | Promoter_Associated | 0.0431087 | -2.5246196 | 0.9921947 | 0.3224833 | 0.8886832 | -5.163013 |
cg00003173 | CHCHD4;TMEM43;CHCHD4 | Promoter_Associated | 0.0440248 | -3.1942015 | 1.2228554 | 0.2230416 | 0.8453216 | -4.943077 |
cg00003202 | RFX5;RFX5 | Promoter_Associated | 0.0246642 | -4.4899392 | 0.4925218 | 0.6229727 | 0.9593472 | -5.483671 |
cg00003784 | CCDC45;DDX5 | Promoter_Associated | 0.0271078 | -3.7722693 | 0.8931927 | 0.3729904 | 0.9046774 | -5.243545 |
cg00004072 | ZFP36 | Promoter_Associated | 0.0532912 | -3.1321117 | 1.4801258 | 0.1406519 | 0.7861540 | -4.644902 |
cg00004082 | SLC2A9;SLC2A9 | Promoter_Associated | -0.0567563 | -2.1842139 | -0.9558817 | 0.3404610 | 0.8938321 | -5.193522 |
cg00004207 | SFRS7 | Promoter_Associated_Cell_type_specific | -0.0362644 | -3.0355991 | -0.8113672 | 0.4182657 | 0.9177356 | -5.303786 |
cg00005010 | MT1F | Promoter_Associated_Cell_type_specific | 0.0077148 | -4.0442169 | 0.2134724 | 0.8312090 | 0.9845704 | -5.569085 |
cg00005543 | TCTE3;C6orf70;TCTE3 | Promoter_Associated | -0.0049202 | -4.2181795 | -0.1388422 | 0.8897358 | 0.9905448 | -5.580493 |
cg00006032 | GPHN;GPHN;GPHN;GPHN | Promoter_Associated | -0.0762156 | -3.1976445 | -1.3023458 | 0.1945231 | 0.8284346 | -4.856869 |
cg00006122 | C12orf44;C12orf44 | Promoter_Associated | 0.0572183 | -3.0480148 | 2.0292594 | 0.0439589 | 0.6192964 | -3.825553 |
cg00006884 | CCDC126;CCDC126 | Promoter_Associated | 0.0081832 | -3.4410951 | 0.2750624 | 0.7835953 | 0.9796542 | -5.556032 |
cg00007226 | PACS2;PACS2 | Promoter_Associated | 0.0187771 | -2.7369800 | 0.6756715 | 0.5001482 | 0.9360383 | -5.391035 |
cg00007269 | ZNF77 | Promoter_Associated | 0.0243405 | -3.2605321 | 0.9476296 | 0.3446351 | 0.8950468 | -5.200298 |
cg00007898 | DSTYK;DSTYK | Promoter_Associated | -0.0487729 | -4.7627008 | -0.9935158 | 0.3218412 | 0.8884407 | -5.161882 |
cg00008004 | Promoter_Associated | 0.0530924 | -3.1300562 | 1.2249733 | 0.2222445 | 0.8448815 | -4.940849 | |
cg00008033 | ZNF613;ZNF613 | Promoter_Associated | -0.0926061 | -2.3343986 | -1.8015212 | 0.0733558 | 0.6948466 | -4.195215 |
cg00008188 | C14orf181 | Promoter_Associated | 0.0310443 | -3.3313383 | 1.0703318 | 0.2859557 | 0.8754919 | -5.093560 |
cg00008387 | TMEM188 | Promoter_Associated | 0.0187394 | -2.6875103 | 0.7807688 | 0.4360017 | 0.9222170 | -5.324840 |
cg00008665 | C3orf39 | Promoter_Associated | 0.0274437 | 2.6784492 | 0.6050298 | 0.5459492 | 0.9453510 | -5.430190 |
cg00008671 | FAM190B | Promoter_Associated | 0.0367837 | -0.8166720 | 0.9195016 | 0.3591090 | 0.8995507 | -5.222960 |
cg00008713 | IMPA2 | Promoter_Associated | 0.0547502 | -2.6401426 | 1.6624517 | 0.0982265 | 0.7348906 | -4.400228 |
cg00008823 | NRD1;NRD1 | Promoter_Associated | 0.0392469 | -4.1361594 | 1.3306030 | 0.1850646 | 0.8228002 | -4.824945 |
cg00008839 | OAZ1 | Promoter_Associated | -0.0109090 | -5.3145920 | -0.2359452 | 0.8137533 | 0.9829590 | -5.564703 |
cg00009167 | KIAA1324;C1orf194;C1orf194 | Promoter_Associated | 0.0203360 | -2.6887142 | 0.7767049 | 0.4383897 | 0.9227737 | -5.327576 |
cg00009214 | DERA | Promoter_Associated_Cell_type_specific | 0.0165604 | -2.8461860 | 0.7015914 | 0.4838734 | 0.9326330 | -5.375591 |
cg00009407 | TTC8;TTC8;TTC8 | Promoter_Associated | 0.0546805 | -3.7141576 | 1.8150757 | 0.0712380 | 0.6896927 | -4.174389 |
cg00009412 | DDX18 | Promoter_Associated | 0.0556711 | -3.8699150 | 1.5351918 | 0.1265570 | 0.7705896 | -4.573905 |
cg00009970 | MEX3C;MEX3C | Promoter_Associated | 0.0057340 | -3.9394691 | 0.1630548 | 0.8706649 | 0.9887727 | -5.577321 |
cg00010046 | MGC23284;MGC23284;MVD | Promoter_Associated | 0.0578509 | -3.2839645 | 1.7147500 | 0.0881753 | 0.7201932 | -4.324988 |
cg00010168 | MMS19;UBTD1 | Promoter_Associated | 0.0215936 | -2.4492618 | 0.7670533 | 0.4440916 | 0.9238443 | -5.334016 |
cg00010266 | MFSD3 | Promoter_Associated | -0.0110287 | -3.5919353 | -0.3064669 | 0.7596164 | 0.9770348 | -5.548111 |
cg00010659 | TMEM14C;TMEM14C | Promoter_Associated | -0.0108691 | -3.4079584 | -0.3903382 | 0.6967649 | 0.9700739 | -5.522770 |
cg00010853 | KIAA1949;KIAA1949 | Promoter_Associated | 0.0616603 | -0.2660541 | 1.0482193 | 0.2959945 | 0.8797502 | -5.113741 |
cg00010932 | METTL5 | Promoter_Associated | -0.0433476 | 1.7565091 | -0.7089306 | 0.4793184 | 0.9317887 | -5.371113 |
cg00010947 | HELQ;MRPS18C | Promoter_Associated | 0.0352093 | -3.4402376 | 0.6786797 | 0.4982445 | 0.9355984 | -5.389272 |
cg00011122 | C16orf75 | Promoter_Associated | -0.0340731 | -2.7978920 | -0.5123938 | 0.6090266 | 0.9571802 | -5.475019 |
cg00011284 | RBL2 | Promoter_Associated | -0.0490232 | -3.3631727 | -1.2791249 | 0.2025600 | 0.8332300 | -4.882602 |
cg00011578 | PPP5C | Promoter_Associated | 0.0318908 | -2.3697313 | 1.3798035 | 0.1694215 | 0.8125546 | -4.767763 |
cg00011994 | FLOT1;IER3 | Promoter_Associated | -0.0011729 | -3.7442208 | -0.0386883 | 0.9691833 | 0.9977478 | -5.588208 |
cg00012036 | H3F3B | Promoter_Associated | 0.0349406 | -2.9522732 | 1.2434522 | 0.2153776 | 0.8415437 | -4.921250 |
# trim down the input dataset
dm <- dm[,c("UCSC_RefGene_Name","t")]
# histogram of t values
hist(dm$t,breaks=seq(from=-6,to=6,by=1))
# set cores to used for parallel execution
tic()
gme_res_promoter <- calc_sc(dm)
time2 <- toc() #15.7 20.2 20.3 17.5 17.9 17.7 20.6
## 16.344 sec elapsed
time2
## $tic
## elapsed
## 147.699
##
## $toc
## elapsed
## 164.043
##
## $msg
## logical(0)
##
## $callback_msg
## [1] "16.344 sec elapsed"
df <- gme_res_promoter[[1]]
res <- gme_res_promoter[[2]]
write.table(res ,file="gmea_promo.tsv")
head(res,50) %>%
kbl(caption = "Top significant genes with GMEA") %>%
kable_paper("hover", full_width = F)
nprobes | mean | median | p-value(sc) | sig | fdr(sc) | |
---|---|---|---|---|---|---|
PSMB9 | 49 | 0.5761512 | 0.5821938 | 0.0000026 | 5.576796 | 0.0273321 |
LTA | 17 | 2.3851459 | 2.5326143 | 0.0000153 | 4.816480 | 0.1573792 |
ZNF331 | 22 | -0.8964193 | -0.9539272 | 0.0000205 | 4.688161 | 0.2114577 |
HLA-F | 37 | 0.8824152 | 0.8890888 | 0.0000264 | 4.579069 | 0.2718154 |
TAP1 | 52 | 0.5228433 | 0.5670995 | 0.0000274 | 4.561618 | 0.2829321 |
PHTF2 | 29 | 0.6846189 | 0.5747849 | 0.0000318 | 4.497636 | 0.3278101 |
TMEM60 | 29 | 0.6846189 | 0.5747849 | 0.0000318 | 4.497636 | 0.3278101 |
KDM6B | 19 | 0.9253189 | 0.5649357 | 0.0000381 | 4.418540 | 0.3932190 |
C20orf94 | 15 | 0.6004834 | 0.6045686 | 0.0000610 | 4.214420 | 0.6290894 |
KIAA1949 | 67 | 0.5198463 | 0.5612521 | 0.0000614 | 4.211571 | 0.6331675 |
SF3A2 | 17 | 1.0063875 | 0.6782074 | 0.0000763 | 4.117510 | 0.7862091 |
TUBB | 26 | 0.8749918 | 0.9736874 | 0.0000816 | 4.088475 | 0.8404865 |
PSMB8 | 46 | 0.5721948 | 0.4699794 | 0.0001023 | 3.989915 | 1.0000000 |
KIF5B | 14 | 0.7902940 | 0.5348532 | 0.0001221 | 3.913390 | 1.0000000 |
SRP68 | 14 | 0.9111961 | 0.8667275 | 0.0001221 | 3.913390 | 1.0000000 |
PLEKHJ1 | 16 | 1.0268987 | 0.8825641 | 0.0001526 | 3.816480 | 1.0000000 |
HSPE1 | 23 | 0.6167638 | 0.5507955 | 0.0001814 | 3.741275 | 1.0000000 |
HSPD1 | 23 | 0.6167638 | 0.5507955 | 0.0001814 | 3.741275 | 1.0000000 |
TMEM134 | 19 | 0.5859688 | 0.5494877 | 0.0002098 | 3.678177 | 1.0000000 |
HSPA1B | 23 | 0.6806965 | 0.5842477 | 0.0002148 | 3.667935 | 1.0000000 |
IQCH | 13 | 0.8418442 | 0.6822960 | 0.0002441 | 3.612360 | 1.0000000 |
AAGAB | 13 | 0.8418442 | 0.6822960 | 0.0002441 | 3.612360 | 1.0000000 |
RWDD1 | 13 | 0.8209741 | 0.6082937 | 0.0002441 | 3.612360 | 1.0000000 |
SSH3 | 13 | -1.5301230 | -1.4511893 | 0.0002441 | 3.612360 | 1.0000000 |
STAG3L4 | 13 | 0.7675809 | 0.8131921 | 0.0002441 | 3.612360 | 1.0000000 |
PMS2L4 | 13 | 0.7675809 | 0.8131921 | 0.0002441 | 3.612360 | 1.0000000 |
PTPRCAP | 13 | 2.3916698 | 2.4635616 | 0.0002441 | 3.612360 | 1.0000000 |
BCL11B | 14 | 1.3268440 | 1.3321196 | 0.0002441 | 3.612360 | 1.0000000 |
GALK2 | 25 | 0.7274296 | 0.7982154 | 0.0002498 | 3.602402 | 1.0000000 |
PEX10 | 22 | 0.7323587 | 0.8632916 | 0.0002556 | 3.592465 | 1.0000000 |
TRIM27 | 55 | 0.4703276 | 0.4519905 | 0.0003200 | 3.494916 | 1.0000000 |
C5orf36 | 20 | 0.7279515 | 0.8111289 | 0.0003223 | 3.491683 | 1.0000000 |
ANKRD32 | 20 | 0.7279515 | 0.8111289 | 0.0003223 | 3.491683 | 1.0000000 |
IRAK1BP1 | 14 | 0.7353866 | 0.7967438 | 0.0003662 | 3.436269 | 1.0000000 |
C5orf35 | 14 | 1.1025095 | 0.9671847 | 0.0003662 | 3.436269 | 1.0000000 |
PRKRA | 15 | 0.7994224 | 0.7985942 | 0.0004272 | 3.369322 | 1.0000000 |
DFNB59 | 15 | 0.7994224 | 0.7985942 | 0.0004272 | 3.369322 | 1.0000000 |
ZBTB9 | 35 | 0.5068120 | 0.4382240 | 0.0004435 | 3.353107 | 1.0000000 |
HIST1H2AK | 12 | 0.6486225 | 0.6487854 | 0.0004883 | 3.311330 | 1.0000000 |
ZC3H12D | 12 | 1.0249054 | 0.8920786 | 0.0004883 | 3.311330 | 1.0000000 |
TBC1D10C | 12 | 1.2114500 | 0.9331465 | 0.0004883 | 3.311330 | 1.0000000 |
TMEM128 | 12 | 1.0851858 | 0.8574542 | 0.0004883 | 3.311330 | 1.0000000 |
LYSMD1 | 12 | 0.8175382 | 0.6964782 | 0.0004883 | 3.311330 | 1.0000000 |
SCNM1 | 12 | 0.8175382 | 0.6964782 | 0.0004883 | 3.311330 | 1.0000000 |
PPM1M | 13 | 1.3734122 | 1.2332757 | 0.0004883 | 3.311330 | 1.0000000 |
OBFC2A | 12 | 0.4234877 | 0.3051854 | 0.0004883 | 3.311330 | 1.0000000 |
NME1-NME2 | 25 | 0.6867271 | 0.7243284 | 0.0004895 | 3.310271 | 1.0000000 |
CDC7 | 16 | 0.8879637 | 0.8115182 | 0.0005798 | 3.236696 | 1.0000000 |
ARSG | 16 | 0.7496755 | 0.8700503 | 0.0005798 | 3.236696 | 1.0000000 |
UVRAG | 15 | 0.8486591 | 0.9494920 | 0.0006104 | 3.214420 | 1.0000000 |
# volcano selfcont
sig <- subset(res,`fdr(sc)` < 0.05)
plot(res$median , -log10(res$`p-value(sc)`) ,
xlab="effect size (mean t-stat)", ylab="-log10(p-value)",
pch=19, cex=0.5, col="gray",main="self contained test")
grid()
points(sig$median , -log10(sig$`p-value(sc)`) ,
pch=19, cex=0.5, col="red")
plot(res$nprobes,res$sig,log="x",ylim=c(0,50),pch=19,cex=0.6)
points(sig$nprobes,sig$sig,col="red",pch=19,cex=0.62)
MIN = min(sig$nprobes)
LEFT = nrow(subset(res,nprobes<MIN))
RIGHT = nrow(subset(res,nprobes>MIN))
SIG = nrow(sig)
TOT = nrow(res)
HEADER <- paste(TOT, "genes in total.", SIG, "with FDR<0.05.",
RIGHT, "well covered and", LEFT, "poorly covered")
mtext(HEADER)
abline(v=MIN,lty=2)
Boxplots smallest pvalue.
par(mfrow=c(1,2))
n=50
# self contained
gs <- head(rownames(res),50)
tstats <- lapply(gs, function(g) {
df[which(df$genes==g),"tvals"]
})
names(tstats) <- gs
tstats <- tstats[order(unlist(lapply(tstats,median)))]
boxplot(tstats,horizontal=TRUE,las=1,
main="smallest p-val(selfcont)",cex.axis=0.6,
xlab="t-statistic")
grid()
n=50
# effect size median
sig <- subset(res,`fdr(sc)` < 0.05)
gs <- head(rownames(sig[order(-abs(sig$median)),]),n)
if ( length(gs) >2 ) {
tstats <- lapply(gs, function(g) {
df[which(df$genes==g),"tvals"]
})
names(tstats) <- gs
tstats <- tstats[order(unlist(lapply(tstats,median)))]
boxplot(tstats,horizontal=TRUE,las=1,
main="biggest effect size(median)",cex.axis=0.6,
xlab="t-statistic")
grid()
} else {
plot(1)
mtext("too few significant genes found")
}
dmscore <- data.frame( res$median * res$sig)
rownames(dmscore) <- rownames(res)
colnames(dmscore) <- "metric"
genesets <- gmt_import("ReactomePathways.gmt")
mres <- mitch_calc(x=dmscore, genesets=genesets,priority="effect")
## Note: Enrichments with large effect sizes may not be
## statistically significant.
head(mres$enrichment_result,20) %>%
kbl(caption = "Top enriched gene sets with GMEA-Mitch (promoter only)") %>%
kable_paper("hover", full_width = F)
set | setSize | pANOVA | s.dist | p.adjustANOVA | |
---|---|---|---|---|---|
829 | Regulation of FZD by ubiquitination | 12 | 0.0004435 | -0.5854282 | 0.0293696 |
812 | RUNX2 regulates osteoblast differentiation | 13 | 0.0002702 | -0.5832176 | 0.0230084 |
379 | Glutamate Neurotransmitter Release Cycle | 11 | 0.0022989 | -0.5306853 | 0.0830377 |
65 | Assembly of collagen fibrils and other multimeric structures | 10 | 0.0046010 | -0.5173411 | 0.1166405 |
526 | Maturation of nucleoprotein | 11 | 0.0047419 | 0.4916096 | 0.1166405 |
1160 | Voltage gated Potassium channels | 10 | 0.0076020 | -0.4873265 | 0.1372969 |
625 | Neurotransmitter release cycle | 18 | 0.0003600 | -0.4856269 | 0.0266115 |
864 | Regulation of pyruvate dehydrogenase (PDH) complex | 11 | 0.0069827 | -0.4696146 | 0.1328140 |
250 | Dopamine Neurotransmitter Release Cycle | 10 | 0.0142609 | -0.4474236 | 0.1868015 |
508 | Long-term potentiation | 11 | 0.0104895 | -0.4455287 | 0.1615337 |
672 | PKA activation | 10 | 0.0174194 | -0.4341194 | 0.2171792 |
1001 | Signaling by Retinoic Acid | 17 | 0.0020036 | -0.4327568 | 0.0770417 |
673 | PKA-mediated phosphorylation of CREB | 11 | 0.0133879 | -0.4305654 | 0.1793073 |
614 | Negative regulation of NMDA receptor-mediated neuronal transmission | 12 | 0.0105702 | -0.4261380 | 0.1615337 |
811 | RUNX2 regulates bone development | 17 | 0.0025294 | -0.4229719 | 0.0861428 |
384 | Glycogen storage diseases | 10 | 0.0219436 | -0.4183503 | 0.2356464 |
817 | Ras activation upon Ca2+ influx through NMDA receptor | 10 | 0.0272862 | -0.4030277 | 0.2450697 |
1053 | TNF receptor superfamily (TNFSF) members mediating non-canonical NF-kB pathway | 10 | 0.0296084 | 0.3971664 | 0.2468059 |
1034 | Synthesis of Leukotrienes (LT) and Eoxins (EX) | 10 | 0.0334008 | -0.3883940 | 0.2602207 |
835 | Regulation of KIT signaling | 10 | 0.0350228 | 0.3849005 | 0.2642229 |
mitch_report(mres,outfile="dma3a_mitch_promo.html",overwrite=TRUE)
## Note: overwriting existing report
## Dataset saved as " /tmp/RtmpAHTfXe/dma3a_mitch_promo.rds ".
##
##
## processing file: mitch.Rmd
##
|
| | 0%
|
|.. | 3%
## inline R code fragments
##
##
|
|.... | 6%
## label: checklibraries (with options)
## List of 1
## $ echo: logi FALSE
##
##
|
|...... | 9%
## ordinary text without R code
##
##
|
|........ | 12%
## label: peek (with options)
## List of 1
## $ echo: logi FALSE
##
##
|
|.......... | 15%
## ordinary text without R code
##
##
|
|............ | 18%
## label: metrics (with options)
## List of 1
## $ echo: logi FALSE
##
##
|
|.............. | 21%
## ordinary text without R code
##
##
|
|................ | 24%
## label: scatterplot (with options)
## List of 5
## $ echo : logi FALSE
## $ fig.height: num 6
## $ fig.width : num 6.5
## $ message : logi FALSE
## $ warning : logi FALSE
##
|
|................... | 26%
## ordinary text without R code
##
##
|
|..................... | 29%
## label: contourplot (with options)
## List of 5
## $ echo : logi FALSE
## $ fig.height: num 6
## $ fig.width : num 6.5
## $ warning : logi FALSE
## $ message : logi FALSE
## Contour plot does not apply to unidimensional analysis.
##
|
|....................... | 32%
## ordinary text without R code
##
##
|
|......................... | 35%
## label: input_geneset_metrics1 (with options)
## List of 2
## $ results: chr "asis"
## $ echo : logi FALSE
##
##
|
|........................... | 38%
## ordinary text without R code
##
##
|
|............................. | 41%
## label: input_geneset_metrics2 (with options)
## List of 5
## $ results : chr "asis"
## $ echo : logi FALSE
## $ fig.height: num 7
## $ fig.width : num 7
## $ fig.show : chr "all"
##
|
|............................... | 44%
## ordinary text without R code
##
##
|
|................................. | 47%
## label: input_geneset_metrics3 (with options)
## List of 5
## $ results : chr "asis"
## $ echo : logi FALSE
## $ message : logi FALSE
## $ fig.height: num 7
## $ fig.width : num 7
##
|
|................................... | 50%
## ordinary text without R code
##
##
|
|..................................... | 53%
## label: echart1d (with options)
## List of 6
## $ results : chr "asis"
## $ echo : logi FALSE
## $ fig.height: num 7
## $ fig.width : num 7
## $ fig.show : chr "all"
## $ message : logi FALSE
##
##
|
|....................................... | 56%
## label: echart2d (with options)
## List of 6
## $ results : chr "asis"
## $ echo : logi FALSE
## $ fig.height: num 7
## $ fig.width : num 7
## $ fig.show : chr "all"
## $ message : logi FALSE
##
##
|
|......................................... | 59%
## ordinary text without R code
##
##
|
|........................................... | 62%
## label: heatmap (with options)
## List of 6
## $ results : chr "asis"
## $ echo : logi FALSE
## $ fig.height: num 10
## $ fig.width : num 7
## $ fig.show : chr "all"
## $ message : logi FALSE
##
##
|
|............................................. | 65%
## ordinary text without R code
##
##
|
|............................................... | 68%
## label: effectsize (with options)
## List of 6
## $ results : chr "asis"
## $ echo : logi FALSE
## $ fig.height: num 7
## $ fig.width : num 7
## $ fig.show : chr "all"
## $ message : logi FALSE
##
##
|
|................................................. | 71%
## ordinary text without R code
##
##
|
|................................................... | 74%
## label: results_table (with options)
## List of 2
## $ results: chr "asis"
## $ echo : logi FALSE
##
##
|
|...................................................... | 76%
## ordinary text without R code
##
##
|
|........................................................ | 79%
## label: results_table_complete (with options)
## List of 2
## $ results: chr "asis"
## $ echo : logi FALSE
##
##
|
|.......................................................... | 82%
## ordinary text without R code
##
##
|
|............................................................ | 85%
## label: detailed_geneset_reports1d (with options)
## List of 7
## $ results : chr "asis"
## $ echo : logi FALSE
## $ fig.height: num 6
## $ fig.width : num 6
## $ out.width : chr "80%"
## $ comment : logi NA
## $ message : logi FALSE
##
|
|.............................................................. | 88%
## ordinary text without R code
##
##
|
|................................................................ | 91%
## label: detailed_geneset_reports2d (with options)
## List of 7
## $ results : chr "asis"
## $ echo : logi FALSE
## $ fig.height: num 5
## $ fig.width : num 6
## $ out.width : chr "80%"
## $ comment : logi NA
## $ message : logi FALSE
##
##
|
|.................................................................. | 94%
## ordinary text without R code
##
##
|
|.................................................................... | 97%
## label: session_info (with options)
## List of 3
## $ include: logi TRUE
## $ echo : logi TRUE
## $ results: chr "markup"
##
##
|
|......................................................................| 100%
## ordinary text without R code
## output file: /home/mdz/projects/gmea/mitch.knit.md
## /usr/bin/pandoc +RTS -K512m -RTS /home/mdz/projects/gmea/mitch.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output /tmp/RtmpAHTfXe/mitch_report.html --lua-filter /usr/local/lib/R/site-library/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /usr/local/lib/R/site-library/rmarkdown/rmarkdown/lua/latex-div.lua --self-contained --variable bs3=TRUE --section-divs --template /usr/local/lib/R/site-library/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --variable theme=bootstrap --mathjax --variable 'mathjax-url=https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --include-in-header /tmp/RtmpAHTfXe/rmarkdown-str1d6981ffcf082.html
##
## Output created: /tmp/RtmpAHTfXe/mitch_report.html
## [1] TRUE
For reproducibility.
sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] pkgload_1.3.0
## [2] GGally_2.1.2
## [3] ggplot2_3.3.6
## [4] reshape2_1.4.4
## [5] beeswarm_0.4.0
## [6] gplots_3.1.3
## [7] gtools_3.9.3
## [8] tibble_3.1.8
## [9] echarts4r_0.4.4
## [10] tictoc_1.1
## [11] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.1
## [12] IlluminaHumanMethylation450kmanifest_0.4.0
## [13] minfi_1.42.0
## [14] bumphunter_1.38.0
## [15] locfit_1.5-9.6
## [16] iterators_1.0.14
## [17] foreach_1.5.2
## [18] Biostrings_2.64.1
## [19] XVector_0.36.0
## [20] SummarizedExperiment_1.26.1
## [21] Biobase_2.56.0
## [22] MatrixGenerics_1.8.1
## [23] matrixStats_0.62.0
## [24] GenomicRanges_1.48.0
## [25] GenomeInfoDb_1.32.4
## [26] IRanges_2.30.1
## [27] S4Vectors_0.34.0
## [28] BiocGenerics_0.42.0
## [29] mitch_1.8.0
## [30] eulerr_6.1.1
## [31] kableExtra_1.3.4
## [32] dplyr_1.0.10
##
## loaded via a namespace (and not attached):
## [1] BiocFileCache_2.4.0 systemfonts_1.0.4
## [3] plyr_1.8.7 splines_4.2.1
## [5] BiocParallel_1.30.3 digest_0.6.29
## [7] htmltools_0.5.3 fansi_1.0.3
## [9] magrittr_2.0.3 memoise_2.0.1
## [11] tzdb_0.3.0 limma_3.52.4
## [13] readr_2.1.3 annotate_1.74.0
## [15] svglite_2.1.0 askpass_1.1
## [17] siggenes_1.70.0 prettyunits_1.1.1
## [19] colorspace_2.0-3 blob_1.2.3
## [21] rvest_1.0.3 rappdirs_0.3.3
## [23] xfun_0.33 crayon_1.5.2
## [25] RCurl_1.98-1.9 jsonlite_1.8.2
## [27] genefilter_1.78.0 GEOquery_2.64.2
## [29] survival_3.4-0 glue_1.6.2
## [31] gtable_0.3.1 zlibbioc_1.42.0
## [33] webshot_0.5.4 DelayedArray_0.22.0
## [35] Rhdf5lib_1.18.2 HDF5Array_1.24.2
## [37] scales_1.2.1 DBI_1.1.3
## [39] rngtools_1.5.2 Rcpp_1.0.9
## [41] viridisLite_0.4.1 xtable_1.8-4
## [43] progress_1.2.2 bit_4.0.4
## [45] mclust_5.4.10 preprocessCore_1.58.0
## [47] htmlwidgets_1.5.4 httr_1.4.4
## [49] RColorBrewer_1.1-3 ellipsis_0.3.2
## [51] pkgconfig_2.0.3 reshape_0.8.9
## [53] XML_3.99-0.11 sass_0.4.2
## [55] dbplyr_2.2.1 utf8_1.2.2
## [57] tidyselect_1.2.0 rlang_1.0.6
## [59] later_1.3.0 AnnotationDbi_1.58.0
## [61] munsell_0.5.0 tools_4.2.1
## [63] cachem_1.0.6 cli_3.4.1
## [65] generics_0.1.3 RSQLite_2.2.18
## [67] evaluate_0.17 stringr_1.4.1
## [69] fastmap_1.1.0 yaml_2.3.5
## [71] knitr_1.40 bit64_4.0.5
## [73] beanplot_1.3.1 scrime_1.3.5
## [75] caTools_1.18.2 purrr_0.3.5
## [77] KEGGREST_1.36.3 nlme_3.1-160
## [79] doRNG_1.8.2 sparseMatrixStats_1.8.0
## [81] mime_0.12 nor1mix_1.3-0
## [83] xml2_1.3.3 biomaRt_2.52.0
## [85] compiler_4.2.1 rstudioapi_0.14
## [87] filelock_1.0.2 curl_4.3.3
## [89] png_0.1-7 bslib_0.4.0
## [91] stringi_1.7.8 highr_0.9
## [93] GenomicFeatures_1.48.4 lattice_0.20-45
## [95] Matrix_1.5-1 multtest_2.52.0
## [97] vctrs_0.4.2 pillar_1.8.1
## [99] lifecycle_1.0.3 rhdf5filters_1.8.0
## [101] jquerylib_0.1.4 data.table_1.14.2
## [103] bitops_1.0-7 httpuv_1.6.6
## [105] rtracklayer_1.56.1 R6_2.5.1
## [107] BiocIO_1.6.0 promises_1.2.0.1
## [109] KernSmooth_2.23-20 gridExtra_2.3
## [111] codetools_0.2-18 MASS_7.3-58.1
## [113] assertthat_0.2.1 rhdf5_2.40.0
## [115] openssl_2.0.3 rjson_0.2.21
## [117] withr_2.5.0 GenomicAlignments_1.32.1
## [119] Rsamtools_2.12.0 GenomeInfoDbData_1.2.8
## [121] hms_1.1.2 quadprog_1.5-8
## [123] grid_4.2.1 tidyr_1.2.1
## [125] base64_2.0.1 rmarkdown_2.17
## [127] DelayedMatrixStats_1.18.1 illuminaio_0.38.0
## [129] shiny_1.7.2 restfulr_0.0.15