Introduction

In this report we establish a new method for generating simulated data with known ground truth. This will be used to test different gene methylation enrichment approaches systematically.

The general steps are:

Import GSE158422 data corresponding to control (non-tumour tissue).
From the 37 samples, create two groups of 18 samples. One of these will be considered “control” and the other “case”.
Create random gene sets that have similar sized to Reactome pathways.
Some gene sets will be selected to be differentially methylated. Half of these will be hypermethylated and the others will be hypomethylated.
The changes will be incorporated into the “case” samples.
The enrichment analysis will be conducted.
The accuracy will be calculated.

suppressPackageStartupMessages({
  library("stringi")
  library("limma")
  library("missMethyl")
  library("IlluminaHumanMethylation450kmanifest")
  library("IlluminaHumanMethylationEPICanno.ilm10b4.hg19")
  library("HGNChelper")
  library('org.Hs.eg.db')
  library("psych")
  library("mitch")
  library("kableExtra")
})

# optimised for 128 GB sever with 32 threads
CORES=6

Load data

annotations
probe sets
gene sets
design matrix
mval matrix

# get probe-gene mapping
anno <- getAnnotation(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)
myann <- data.frame(anno[,c("UCSC_RefGene_Name","UCSC_RefGene_Group","Islands_Name","Relation_to_Island")])
gp <- myann[,"UCSC_RefGene_Name",drop=FALSE]
gp2 <- strsplit(gp$UCSC_RefGene_Name,";")
names(gp2) <- rownames(gp)
gp2 <- lapply(gp2,unique)
gt <- stack(gp2)
colnames(gt) <- c("gene","probe")
dim(gt)

## [1] 684970      2

str(gt)

## 'data.frame':    684970 obs. of  2 variables:
##  $ gene : chr  "YTHDF1" "EIF2S3" "PKN3" "CCDC57" ...
##  $ probe: Factor w/ 865859 levels "cg18478105","cg09835024",..: 1 2 3 4 5 6 7 8 8 9 ...

head(gt)

##     gene      probe
## 1 YTHDF1 cg18478105
## 2 EIF2S3 cg09835024
## 3   PKN3 cg14361672
## 4 CCDC57 cg01763666
## 5   INF2 cg12950382
## 6  CDC16 cg02115394

#new.hgnc.table <- getCurrentHumanMap()
#new.hgnc.table <- readRDS("new.hgnc.table.rds")
#fix <- checkGeneSymbols(gt$gene,map=new.hgnc.table)
#fix2 <- fix[which(fix$x != fix$Suggested.Symbol),]
#length(unique(fix2$x))
#gt$gene <- fix$Suggested.Symbol

if (!file.exists("GSE158422_design.rds")) {
  download.file("https://ziemann-lab.net/public/gmea_prototype/GSE158422_design.rds", "GSE158422_design.rds")
}
design <- readRDS("GSE158422_design.rds")

if (!file.exists("GSE158422_design.rds")) {
  options(timeout=10000000000000)
  download.file("https://ziemann-lab.net/public/gmea_prototype/GSE158422_mx.rds","GSE158422_mx.rds")
}
mval <- readRDS("GSE158422_mx.rds")

Gene set database

We could use Reactome pathways, however these have a lot of overlapping sets, which could cause inflated false positives. After some testing, I found that the gene set size strongly impacts the accuracy.

For reference, Reactome sets have a mean of 48 and median of 15, while MSigDB has a median of 47 and mean of 116.

Therefore, a reasonable analysis would include small (20), medium (50) and large (100) sets.

To reflect coverage of Reactome and other pathway databases, only half the genes will be included in sets.

gene_catalog <- unique(gt$gene)

# set gene set size here
lengths <- rep(50,1000)

randomGeneSets <- function(gene_catalog, lengths, seed){
  set.seed(seed) ; gene_catalog_half <- sample(gene_catalog,length(gene_catalog)/2)
  num_gsets <- length(lengths)
  set.seed(seed) ; seeds <- sample(1:1e6, num_gsets)
  gsets <- lapply(1:num_gsets,function(i) {
    set.seed(seeds[i]) ; gs <- sample(gene_catalog_half,lengths[i])
    return(gs)
  } )
  names(gsets)<-stri_rand_strings(length(gsets), 15, pattern = "[A-Za-z]")
  return(gsets)
}

gsets <- randomGeneSets(gene_catalog,lengths,seed=100)

Incorporate changes

TODO: to incorporate changes to case samples.

Need to figure out what magnitude to change. Will refer to the cancer/normal comparison.

Select genes and probes to alter.

seed=100
frac_genes=0.5
frac_probes=0.5
delta=1
nsamples=10
normal_mval <- mval[,(1:(ncol(mval)/2)*2)]

incorp_dm <- function(genesets,myann,mval,seed, frac_genes,frac_probes,
  groupsize,delta=1,gene_catalog) {

  # divide gene sets between hyper and hypomethylated
  nset <- floor(length(genesets)/2)
  set.seed(seed) ; gtup <-sample(genesets,nset)
  set.seed(seed) ; gtdn <- sample(setdiff(genesets,gtup),nset)
  gup <- unname(unlist(gtup))
  gdn <- unname(unlist(gtdn))
  # add extra 10% DM genes
  gnon <- setdiff(gene_catalog,c(gup,gdn))
  gextra <- round(length(gnon)*0.1)
  set.seed(seed) ; gup <- c(gup,sample(gnon,gextra))
  gnon <- setdiff(gene_catalog,c(gup,gdn))
  set.seed(seed) ; gdn <- c(gdn,sample(gnon,gextra))
  # make probe-gene vector
  probe2gene <- strsplit(myann$UCSC_RefGene_Name,";")
  names(probe2gene) <- rownames(myann)
  probe2gene <- unlist(probe2gene)
  # select probes hypermethylated
  set.seed(seed) ; gup2 <- sample(gup,floor(length(gup)*frac_genes))
  pup <- names(probe2gene[which(probe2gene %in% gup2)])
  set.seed(seed) ; pup2 <- sample(pup,floor(length(pup)*frac_probes))
  # select probes hypomethylated
  set.seed(seed) ; gdn2 <- sample(gdn,floor(length(gdn)*frac_genes))
  pdn <- names(probe2gene[which(probe2gene %in% gdn2)])
  set.seed(seed) ; pdn2 <- sample(pdn,floor(length(pdn)*frac_probes))
  # add 10% DM probes as well
  probes <- rownames(myann)
  pnon <- setdiff(probes,c(pup,pdn))
  pextra <- round(length(pnon)*0.1)
  set.seed(seed) ; pup <- c(pup,sample(pnon,pextra))
  pnon <- setdiff(probes,c(pup,pdn))
  set.seed(seed) ; pdn <- c(pdn,sample(pnon,pextra))
  # divide samples between ctrl and case
  ncols <- ncol(mval)
  maxgroupsize=floor(ncols/2)
  if ( groupsize > maxgroupsize ) { stop("groupsize cannot be larger than half the ncols of mval") }
  set.seed(seed) ; ctrl <- sample(1:ncols,groupsize)
  set.seed(seed) ; case <- sample(setdiff(1:ncols,ctrl),groupsize)
  mval_ctrl <- mval[,ctrl]
  mval_case <- mval[,case]
  # incorporate altered signals - change by +1 or -1
  mval_case[rownames(mval_case) %in% pup2,] <-  mval_case[rownames(mval_case) %in% pup2,] + delta
  mval_case[rownames(mval_case) %in% pdn2,] <-  mval_case[rownames(mval_case) %in% pdn2,] - delta
  mval2 <- cbind(mval_ctrl,mval_case)
  result <- list("mval"=mval2,"probes up"=pup2,"probes down"=pdn2,
    "genes up"=gup2,"genes down"=gdn2,
    "genesets up"=gtup,"genesets down"=gtdn)
  return(result)
}

GSAMETH function

# limma
runlimma <- function(mval,design,myann) {
  fit.reduced <- lmFit(mval,design)
  fit.reduced <- eBayes(fit.reduced)
  dm <- topTable(fit.reduced,coef=ncol(design), number = Inf)
  dm <- merge(myann,dm,by=0)
  dm <- dm[order(dm$P.Value),]
  rownames(dm) <- dm$Row.names
  dm$Row.names=NULL
  return(dm)
}

This is how to use the function

This could be complicated as it requires translation of symbols to entrez IDs, but could be simplified if the entrez translation is done at the gsameth step.

simgsa <- function(genesetdatabase, myann, mval, seed, frac_genes, frac_probes, groupsize, delta=1, num_dm_sets=50) {
  # generate gene sets
  gene_catalog <- unique(unlist(strsplit(myann$UCSC_RefGene_Name,";")))
  lengths <- unname(unlist(lapply(genesetdatabase,length)))
  gsets <- randomGeneSets(gene_catalog,lengths,seed=seed)
  # select gene sets to alter
  set.seed(seed) ; gset_mod <- sample(gsets,num_dm_sets)
  # incorporate select changes
  sim <- incorp_dm(genesets=gset_mod, myann=myann, mval=mval, seed=seed,
    frac_genes=0.5,frac_probes=0.5,groupsize=groupsize,delta=delta,
    gene_catalog=gene_catalog)
  # set up limma
  mval2 <- sim$mval
  ncols <- ncol(mval2)
  groupsize <- ncols/2
  ss <- data.frame(colnames(mval2))
  colnames(ss) <- "sample"
  ss$case <- c(rep(0,groupsize),rep(1,groupsize))
  d <- model.matrix(~ ss$case )
  dm3 <- runlimma(mval=mval2,design=d,myann=myann)
  pup3 <- rownames(subset(dm3,adj.P.Val<0.05 & logFC>0))
  pdn3 <- rownames(subset(dm3,adj.P.Val<0.05 & logFC<0))
  if ( length(pup3) < 250 ) { pup3 <- head(rownames(subset(dm3, logFC > 0)), 250) }
  if ( length(pdn3) < 250 ) { pdn3 <- head(rownames(subset(dm3, logFC < 0)), 250) }
  # convert gene sets to entrez
  suppressWarnings(suppressMessages({ gene2entrez <- mapIds(org.Hs.eg.db, gene_catalog, 'ENTREZID', 'SYMBOL') }))
  gsets_entrez <- lapply(gsets,function(gs) {
    gs2 <- unique(gene2entrez[names(gene2entrez) %in% gs])
    gs2 <- gs2[!is.na(gs2)]
    return(gs2)
  })
  suppressWarnings(suppressMessages({
    gsaup3 <- gsameth(sig.cpg=pup3, all.cpg=rownames(dm3), collection=gsets_entrez, array.type="EPIC")
    gsadn3 <- gsameth(sig.cpg=pdn3, all.cpg=rownames(dm3), collection=gsets_entrez, array.type="EPIC")
  }))
  gsig_up3 <- rownames(subset(gsaup3,FDR<0.05))
  gsig_dn3 <- rownames(subset(gsadn3,FDR<0.05))
  gtup <- names(sim[[6]])
  gtdn <- names(sim[[7]])
  UPTP=length(intersect(gsig_up3 ,gtup))
  UPFP=length(setdiff(gsig_up3 ,gtup))
  UPFN=length(setdiff(gtup,gsig_up3))
  DNTP=length(intersect(gsig_dn3 ,gtdn))
  DNFP=length(setdiff(gsig_dn3 ,gtdn))
  DNFN=length(setdiff(gtdn,gsig_dn3))
  TP=UPTP+DNTP
  FP=UPFP+DNFP
  FN=UPFN+DNFN
  TN=nrow(gsadn3)-DNTP-DNFP-DNFN-UPTP-UPFP-UPFN
  PREC=TP/(TP+FP)
  REC=TP/(TP+FN)
  F1=TP/(TP+(0.5*(FP+FN)))
  result <- c("TP"=TP,"FP"=FP,"FN"=FN,"TN"=TN,"PREC"=PREC,"REC"=REC)
  return(result)
}

LA function

This process runs limma first and then aggregates the results before doing an enrichment test.

# enrich parametric
ttenrich <- function(m,genesets,cores=1,testtype="selfcontained") {
  res <- mclapply( 1:length(genesets), function(i) {
    scores <- m[,1]
    gs <- genesets[i]
    name <- names(gs)
    n_members <- length(which(rownames(m) %in% gs[[1]]))
    if ( n_members > 4 ) {
      tstats <- m[which(rownames(m) %in% gs[[1]]),]
      myn <- length(tstats)
      mymean <- mean(tstats)
      mymedian <- median(tstats)
      if ( testtype == "selfcontained" ) { wt <- t.test(tstats) }
      if ( testtype == "competitive" ) { wt <- t.test(tstats,scores) }
      res <- c(name,myn,mymean,mymedian,wt$p.value)
    }
  } , mc.cores = cores)
  res_df <- do.call(rbind, res)
  rownames(res_df) <- res_df[,1]
  res_df <- res_df[,-1]
  colnames(res_df) <- c("n_genes","t_mean","t_median","pval")
  tmp <- apply(res_df,2,as.numeric)
  rownames(tmp) <- rownames(res_df)
  res_df <- tmp
  res_df <- as.data.frame(res_df)
  res_df <- res_df[order(res_df$pval),]
  res_df$logp <- -log10(res_df$pval )
  res_df$fdr <- p.adjust(res_df$pval,method="fdr")
  res_df[order(abs(res_df$pval)),]
  return(res_df)
}

# enrich non-parametric
wtenrich <- function(m,genesets,cores=1,testtype="selfcontained") {
  res <- mclapply( 1:length(genesets), function(i) {
    scores <- m[,1]
    gs <- genesets[i]
    name <- names(gs)
    n_members <- length(which(rownames(m) %in% gs[[1]]))
    if ( n_members > 4 ) {
      tstats <- m[which(rownames(m) %in% gs[[1]]),]
      myn <- length(tstats)
      mymean <- mean(tstats)
      mymedian <- median(tstats)
      if ( testtype == "selfcontained" ) { wt <- wilcox.test(tstats) }
      if ( testtype == "competitive" ) { wt <- wilcox.test(tstats,scores) }
      res <- c(name,myn,mymean,mymedian,wt$p.value)
    }
  } , mc.cores = cores)
  res_df <- do.call(rbind, res)
  rownames(res_df) <- res_df[,1]
  res_df <- res_df[,-1]
  colnames(res_df) <- c("n_genes","t_mean","t_median","pval")
  tmp <- apply(res_df,2,as.numeric)
  rownames(tmp) <- rownames(res_df)
  res_df <- tmp
  res_df <- as.data.frame(res_df)
  res_df <- res_df[order(res_df$pval),]
  res_df$logp <- -log10(res_df$pval )
  res_df$fdr <- p.adjust(res_df$pval,method="fdr")
  res_df[order(abs(res_df$pval)),]
  return(res_df)
}

# LA parametric competitive
simlac <- function(genesetdatabase, myann, mval, seed, frac_genes, frac_probes, groupsize, delta=1, num_dm_sets=50) {
  # generate gene sets
  gene_catalog <- unique(gt$gene)
  lengths <- unname(unlist(lapply(genesetdatabase,length)))
  gsets <- randomGeneSets(gene_catalog,lengths,seed=seed)
  # select gene sets to alter
  set.seed(seed) ; gset_mod <- sample(gsets,num_dm_sets)
  # incorporate select changes
  sim <- incorp_dm(genesets=gset_mod, myann=myann, mval=mval, seed=seed,
    frac_genes=0.5,frac_probes=0.5,groupsize=groupsize,delta=delta,
    gene_catalog=gene_catalog)
  # set up limma
  mval2 <- sim$mval
  ncols <- ncol(mval2)
  groupsize <- ncols/2
  ss <- data.frame(colnames(mval2))
  colnames(ss) <- "sample"
  ss$case <- c(rep(0,groupsize),rep(1,groupsize))
  d <- model.matrix(~ ss$case )
  dm3 <- runlimma(mval=mval2,design=d,myann=myann)
  dd <- merge(dm3,gt,by.x=0,by.y="probe")
  m1 <- aggregate(t ~ gene,dd,mean)
  rownames(m1) <- m1$gene
  m1$gene=NULL
  lares1 <- ttenrich(m=m1,genesets=gsets,cores=2,testtype="competitive")
  gsig_up3 <- rownames(subset(lares1, fdr < 0.05 & t_mean > 0))
  gsig_dn3 <- rownames(subset(lares1, fdr < 0.05 & t_mean < 0))
  gtup <- names(sim[[6]])
  gtdn <- names(sim[[7]])
  UPTP=length(intersect(gsig_up3 ,gtup))
  UPFP=length(setdiff(gsig_up3 ,gtup))
  UPFN=length(setdiff(gtup,gsig_up3))
  DNTP=length(intersect(gsig_dn3 ,gtdn))
  DNFP=length(setdiff(gsig_dn3 ,gtdn))
  DNFN=length(setdiff(gtdn,gsig_dn3))
  TP=UPTP+DNTP
  FP=UPFP+DNFP
  FN=UPFN+DNFN
  TN=nrow(lares1)-DNTP-DNFP-DNFN-UPTP-UPFP-UPFN
  PREC=TP/(TP+FP)
  REC=TP/(TP+FN)
  F1=TP/(TP+(0.5*(FP+FN)))
  result <- c("TP"=TP,"FP"=FP,"FN"=FN,"TN"=TN,"PREC"=PREC,"REC"=REC)
  return(result)
}

# LA parametric competitive top
simlactop <- function(genesetdatabase, myann, mval, seed, frac_genes, frac_probes, groupsize, delta=1, num_dm_sets=50) {
  # generate gene sets
  gene_catalog <- unique(gt$gene)
  lengths <- unname(unlist(lapply(genesetdatabase,length)))
  gsets <- randomGeneSets(gene_catalog,lengths,seed=seed)
  # select gene sets to alter
  set.seed(seed) ; gset_mod <- sample(gsets,num_dm_sets)
  # incorporate select changes
  sim <- incorp_dm(genesets=gset_mod, myann=myann, mval=mval, seed=seed,
    frac_genes=0.5,frac_probes=0.5,groupsize=groupsize,delta=delta,
    gene_catalog=gene_catalog)
  # set up limma
  mval2 <- sim$mval
  ncols <- ncol(mval2)
  groupsize <- ncols/2
  ss <- data.frame(colnames(mval2))
  colnames(ss) <- "sample"
  ss$case <- c(rep(0,groupsize),rep(1,groupsize))
  d <- model.matrix(~ ss$case )
  dm3 <- runlimma(mval=mval2,design=d,myann=myann)
  dd <- merge(dm3,gt,by.x=0,by.y="probe")
  m1 <- aggregate(t ~ gene,dd, function(x) {
    if (abs(max(x)) > abs(min(x))) { max(x) } else { min(x) }
  })
  rownames(m1) <- m1$gene
  m1$gene=NULL
  lares1 <- ttenrich(m=m1,genesets=gsets,cores=2,testtype="competitive")
  gsig_up3 <- rownames(subset(lares1, fdr < 0.05 & t_mean > 0))
  gsig_dn3 <- rownames(subset(lares1, fdr < 0.05 & t_mean < 0))
  gtup <- names(sim[[6]])
  gtdn <- names(sim[[7]])
  UPTP=length(intersect(gsig_up3 ,gtup))
  UPFP=length(setdiff(gsig_up3 ,gtup))
  UPFN=length(setdiff(gtup,gsig_up3))
  DNTP=length(intersect(gsig_dn3 ,gtdn))
  DNFP=length(setdiff(gsig_dn3 ,gtdn))
  DNFN=length(setdiff(gtdn,gsig_dn3))
  TP=UPTP+DNTP
  FP=UPFP+DNFP
  FN=UPFN+DNFN
  TN=nrow(lares1)-DNTP-DNFP-DNFN-UPTP-UPFP-UPFN
  PREC=TP/(TP+FP)
  REC=TP/(TP+FN)
  F1=TP/(TP+(0.5*(FP+FN)))
  result <- c("TP"=TP,"FP"=FP,"FN"=FN,"TN"=TN,"PREC"=PREC,"REC"=REC)
  return(result)
}

# LA nonparametric competitive
simnlac <- function(genesetdatabase, myann, mval, seed, frac_genes, frac_probes, groupsize, delta=1, num_dm_sets=50) {
  # generate gene sets
  gene_catalog <- unique(gt$gene)
  lengths <- unname(unlist(lapply(genesetdatabase,length)))
  gsets <- randomGeneSets(gene_catalog,lengths,seed=seed)
  # select gene sets to alter
  set.seed(seed) ; gset_mod <- sample(gsets,num_dm_sets)
  # incorporate select changes
  sim <- incorp_dm(genesets=gset_mod, myann=myann, mval=mval, seed=seed,
    frac_genes=0.5,frac_probes=0.5,groupsize=groupsize,delta=delta,
    gene_catalog=gene_catalog)
  # set up limma
  mval2 <- sim$mval
  ncols <- ncol(mval2)
  groupsize <- ncols/2
  ss <- data.frame(colnames(mval2))
  colnames(ss) <- "sample"
  ss$case <- c(rep(0,groupsize),rep(1,groupsize))
  d <- model.matrix(~ ss$case )
  dm3 <- runlimma(mval=mval2,design=d,myann=myann)
  dd <- merge(dm3,gt,by.x=0,by.y="probe")
  m1 <- aggregate(t ~ gene,dd,mean)
  rownames(m1) <- m1$gene
  m1$gene=NULL
  lares1 <- wtenrich(m=m1,genesets=gsets,cores=2,testtype="competitive")
  gsig_up3 <- rownames(subset(lares1, fdr < 0.05 & t_mean > 0))
  gsig_dn3 <- rownames(subset(lares1, fdr < 0.05 & t_mean < 0))
  gtup <- names(sim[[6]])
  gtdn <- names(sim[[7]])
  UPTP=length(intersect(gsig_up3 ,gtup))
  UPFP=length(setdiff(gsig_up3 ,gtup))
  UPFN=length(setdiff(gtup,gsig_up3))
  DNTP=length(intersect(gsig_dn3 ,gtdn))
  DNFP=length(setdiff(gsig_dn3 ,gtdn))
  DNFN=length(setdiff(gtdn,gsig_dn3))
  TP=UPTP+DNTP
  FP=UPFP+DNFP
  FN=UPFN+DNFN
  TN=nrow(lares1)-DNTP-DNFP-DNFN-UPTP-UPFP-UPFN
  PREC=TP/(TP+FP)
  REC=TP/(TP+FN)
  F1=TP/(TP+(0.5*(FP+FN)))
  result <- c("TP"=TP,"FP"=FP,"FN"=FN,"TN"=TN,"PREC"=PREC,"REC"=REC)
  return(result)
}

# LA competitive mitch
simlacm <- function(genesetdatabase, myann, mval, seed, frac_genes, frac_probes, groupsize, delta=1, num_dm_sets=50) {
  # generate gene sets
  gene_catalog <- unique(gt$gene)
  lengths <- unname(unlist(lapply(genesetdatabase,length)))
  gsets <- randomGeneSets(gene_catalog,lengths,seed=seed)
  # select gene sets to alter
  set.seed(seed) ; gset_mod <- sample(gsets,num_dm_sets)
  # incorporate select changes
  sim <- incorp_dm(genesets=gset_mod, myann=myann, mval=mval, seed=seed,
    frac_genes=0.5,frac_probes=0.5,groupsize=groupsize,delta=delta,
    gene_catalog=gene_catalog)
  # set up limma
  mval2 <- sim$mval
  ncols <- ncol(mval2)
  groupsize <- ncols/2
  ss <- data.frame(colnames(mval2))
  colnames(ss) <- "sample"
  ss$case <- c(rep(0,groupsize),rep(1,groupsize))
  d <- model.matrix(~ ss$case )
  dm3 <- runlimma(mval=mval2,design=d,myann=myann)
  dd <- merge(dm3,gt,by.x=0,by.y="probe")
  m1 <- aggregate(t ~ gene,dd,mean)
  rownames(m1) <- m1$gene
  m1$gene=NULL
  lamres1 <- runmitch(m=m1,genesets=gsets,cores=2)
  gsig_up3 <- rownames( subset( lamres1, p.adjustANOVA < 0.05 & s.dist > 0 ) )
  gsig_dn3 <- rownames( subset( lamres1, p.adjustANOVA < 0.05 & s.dist < 0 ) )
  gtup <- names(sim[[6]])
  gtdn <- names(sim[[7]])
  UPTP=length(intersect(gsig_up3 ,gtup))
  UPFP=length(setdiff(gsig_up3 ,gtup))
  UPFN=length(setdiff(gtup,gsig_up3))
  DNTP=length(intersect(gsig_dn3 ,gtdn))
  DNFP=length(setdiff(gsig_dn3 ,gtdn))
  DNFN=length(setdiff(gtdn,gsig_dn3))
  TP=UPTP+DNTP
  FP=UPFP+DNFP
  FN=UPFN+DNFN
  TN=nrow(lamres1)-DNTP-DNFP-DNFN-UPTP-UPFP-UPFN
  PREC=TP/(TP+FP)
  REC=TP/(TP+FN)
  F1=TP/(TP+(0.5*(FP+FN)))
  result <- c("TP"=TP,"FP"=FP,"FN"=FN,"TN"=TN,"PREC"=PREC,"REC"=REC)
  return(result)
}

# LA rank competitive mitch
simlrm <- function(genesetdatabase, myann, mval, seed, frac_genes, frac_probes, groupsize, delta=1, num_dm_sets=50) {
  # generate gene sets
  gene_catalog <- unique(gt$gene)
  lengths <- unname(unlist(lapply(genesetdatabase,length)))
  gsets <- randomGeneSets(gene_catalog,lengths,seed=seed)
  # select gene sets to alter
  set.seed(seed) ; gset_mod <- sample(gsets,num_dm_sets)
  # incorporate select changes
  sim <- incorp_dm(genesets=gset_mod, myann=myann, mval=mval, seed=seed,
    frac_genes=0.5,frac_probes=0.5,groupsize=groupsize,delta=delta,
    gene_catalog=gene_catalog)
  # set up limma
  mval2 <- sim$mval
  ncols <- ncol(mval2)
  groupsize <- ncols/2
  ss <- data.frame(colnames(mval2))
  colnames(ss) <- "sample"
  ss$case <- c(rep(0,groupsize),rep(1,groupsize))
  d <- model.matrix(~ ss$case )
  dm3 <- runlimma(mval=mval2,design=d,myann=myann)
  # rank probes first
  dm3$rank <-  rank(dm3$t) - nrow(subset(dm3,t<0))
  mm <- merge(dm3,gt,by.x=0,by.y="probe")
  head(mm)
  mma <-aggregate(mm$rank ~ gene,mm,mean)
  rownames(mma) <- mma$gene
  mma$gene = NULL
  colnames(mma) <- "meanrank"
  lrmres <- runmitch(m=mma,genesets=gsets,cores=2)
  gsig_up3 <- rownames( subset( lrmres, p.adjustANOVA < 0.05 & s.dist > 0 ) )
  gsig_dn3 <- rownames( subset( lrmres, p.adjustANOVA < 0.05 & s.dist < 0 ) )
  gtup <- names(sim[[6]])
  gtdn <- names(sim[[7]])
  UPTP=length(intersect(gsig_up3 ,gtup))
  UPFP=length(setdiff(gsig_up3 ,gtup))
  UPFN=length(setdiff(gtup,gsig_up3))
  DNTP=length(intersect(gsig_dn3 ,gtdn))
  DNFP=length(setdiff(gsig_dn3 ,gtdn))
  DNFN=length(setdiff(gtdn,gsig_dn3))
  TP=UPTP+DNTP
  FP=UPFP+DNFP
  FN=UPFN+DNFN
  TN=nrow(lrmres)-DNTP-DNFP-DNFN-UPTP-UPFP-UPFN
  PREC=TP/(TP+FP)
  REC=TP/(TP+FN)
  F1=TP/(TP+(0.5*(FP+FN)))
  result <- c("TP"=TP,"FP"=FP,"FN"=FN,"TN"=TN,"PREC"=PREC,"REC"=REC)
  return(result)
}

AL: Aggregate limma

Functions for aggregate-limma-enrich approach.

# AL approach parametric competitive
simalc <- function(genesetdatabase, myann, mval, seed, frac_genes, frac_probes, groupsize, delta=1, num_dm_sets=50) {
  # generate gene sets
  gene_catalog <- unique(gt$gene)
  lengths <- unname(unlist(lapply(genesetdatabase,length)))
  gsets <- randomGeneSets(gene_catalog,lengths,seed=seed)
  # select gene sets to alter
  set.seed(seed) ; gset_mod <- sample(gsets,num_dm_sets)
  # incorporate select changes
  sim <- incorp_dm(genesets=gset_mod, myann=myann, mval=mval, seed=seed,
    frac_genes=0.5,frac_probes=0.5,groupsize=groupsize,delta=delta,
    gene_catalog=gene_catalog)
  # set up limma
  mval2 <- sim$mval
  ncols <- ncol(mval2)
  groupsize <- ncols/2
  ss <- data.frame(colnames(mval2))
  colnames(ss) <- "sample"
  ss$case <- c(rep(0,groupsize),rep(1,groupsize))
  d <- model.matrix(~ ss$case )
  # al pipeline
  mm <- merge(mval2,gt,by.x=0,by.y="probe")
  mm$Row.names = NULL
  a <- aggregate(. ~ gene,mm,mean)
  rownames(a) <- a$gene
  a$gene=NULL
  fit.reduced <- lmFit(a,d)
  fit.reduced <- eBayes(fit.reduced)
  al <- topTable(fit.reduced,coef=ncol(d), number = Inf)
  m1 <- as.data.frame(al$t)
  rownames(m1) <- rownames(al)
  colnames(m1) <- "t"
  alres1 <- ttenrich(m=m1,genesets=gsets,cores=2,testtype="competitive")
  # summarise results
  gsig_up3 <- rownames(subset(alres1, fdr < 0.05 & t_mean > 0))
  gsig_dn3 <- rownames(subset(alres1, fdr < 0.05 & t_mean < 0))
  gtup <- names(sim[[6]])
  gtdn <- names(sim[[7]])
  UPTP=length(intersect(gsig_up3 ,gtup))
  UPFP=length(setdiff(gsig_up3 ,gtup))
  UPFN=length(setdiff(gtup,gsig_up3))
  DNTP=length(intersect(gsig_dn3 ,gtdn))
  DNFP=length(setdiff(gsig_dn3 ,gtdn))
  DNFN=length(setdiff(gtdn,gsig_dn3))
  TP=UPTP+DNTP
  FP=UPFP+DNFP
  FN=UPFN+DNFN
  TN=nrow(alres1)-DNTP-DNFP-DNFN-UPTP-UPFP-UPFN
  PREC=TP/(TP+FP)
  REC=TP/(TP+FN)
  F1=TP/(TP+(0.5*(FP+FN)))
  result <- c("TP"=TP,"FP"=FP,"FN"=FN,"TN"=TN,"PREC"=PREC,"REC"=REC)
  return(result)
}

# AL approach nonparametric competitive
simnalc <- function(genesetdatabase, myann, mval, seed, frac_genes, frac_probes, groupsize, delta=1, num_dm_sets=50) {
  # generate gene sets
  gene_catalog <- unique(gt$gene)
  lengths <- unname(unlist(lapply(genesetdatabase,length)))
  gsets <- randomGeneSets(gene_catalog,lengths,seed=seed)
  # select gene sets to alter
  set.seed(seed) ; gset_mod <- sample(gsets,num_dm_sets)
  # incorporate select changes
  sim <- incorp_dm(genesets=gset_mod, myann=myann, mval=mval, seed=seed,
    frac_genes=0.5,frac_probes=0.5,groupsize=groupsize,delta=delta,
    gene_catalog=gene_catalog)
  # set up limma
  mval2 <- sim$mval
  ncols <- ncol(mval2)
  groupsize <- ncols/2
  ss <- data.frame(colnames(mval2))
  colnames(ss) <- "sample"
  ss$case <- c(rep(0,groupsize),rep(1,groupsize))
  d <- model.matrix(~ ss$case )
  # al pipeline
  mm <- merge(mval2,gt,by.x=0,by.y="probe")
  mm$Row.names = NULL
  a <- aggregate(. ~ gene,mm,mean)
  rownames(a) <- a$gene
  a$gene=NULL
  fit.reduced <- lmFit(a,d)
  fit.reduced <- eBayes(fit.reduced)
  al <- topTable(fit.reduced,coef=ncol(d), number = Inf)
  m1 <- as.data.frame(al$t)
  rownames(m1) <- rownames(al)
  colnames(m1) <- "t"
  alres1 <- wtenrich(m=m1,genesets=gsets,cores=2,testtype="competitive")
  # summarise results
  gsig_up3 <- rownames(subset(alres1, fdr < 0.05 & t_mean > 0))
  gsig_dn3 <- rownames(subset(alres1, fdr < 0.05 & t_mean < 0))
  gtup <- names(sim[[6]])
  gtdn <- names(sim[[7]])
  UPTP=length(intersect(gsig_up3 ,gtup))
  UPFP=length(setdiff(gsig_up3 ,gtup))
  UPFN=length(setdiff(gtup,gsig_up3))
  DNTP=length(intersect(gsig_dn3 ,gtdn))
  DNFP=length(setdiff(gsig_dn3 ,gtdn))
  DNFN=length(setdiff(gtdn,gsig_dn3))
  TP=UPTP+DNTP
  FP=UPFP+DNFP
  FN=UPFN+DNFN
  TN=nrow(alres1)-DNTP-DNFP-DNFN-UPTP-UPFP-UPFN
  PREC=TP/(TP+FP)
  REC=TP/(TP+FN)
  F1=TP/(TP+(0.5*(FP+FN)))
  result <- c("TP"=TP,"FP"=FP,"FN"=FN,"TN"=TN,"PREC"=PREC,"REC"=REC)
  return(result)
}

Agg-limma-mitch function

This approach uses the aggregated mvals, limma and instead of a 1-sample t-test it uses mitch which is a competitive test and could give more interpretable results.

runmitch <- function(m,genesets,cores=1) {
  suppressMessages({ mres <- mitch_calc(m,genesets,minsetsize=5,cores=cores) })
  mres <- mres$enrichment_result
  rownames(mres) <- mres$set
  mres$set=NULL
  return(mres)
}

simalm <- function(genesetdatabase, myann, mval, seed, frac_genes, frac_probes, groupsize, delta=1, num_dm_sets=50) {
  # generate gene sets
  gene_catalog <- unique(gt$gene)
  lengths <- unname(unlist(lapply(genesetdatabase,length)))
  gsets <- randomGeneSets(gene_catalog,lengths,seed=seed)
  # select gene sets to alter
  set.seed(seed) ; gset_mod <- sample(gsets,num_dm_sets)
  # incorporate select changes
  sim <- incorp_dm(genesets=gset_mod, myann=myann, mval=mval, seed=seed,
    frac_genes=0.5,frac_probes=0.5,groupsize=groupsize,delta=delta,
    gene_catalog=gene_catalog)
  # set up limma
  mval2 <- sim$mval
  ncols <- ncol(mval2)
  groupsize <- ncols/2
  ss <- data.frame(colnames(mval2))
  colnames(ss) <- "sample"
  ss$case <- c(rep(0,groupsize),rep(1,groupsize))
  d <- model.matrix(~ ss$case )
  # alm pipeline
  mm <- merge(mval2,gt,by.x=0,by.y="probe")
  mm$Row.names = NULL
  a <- aggregate(. ~ gene,mm,mean)
  rownames(a) <- a$gene
  a$gene=NULL
  fit.reduced <- lmFit(a,d)
  fit.reduced <- eBayes(fit.reduced)
  al <- topTable(fit.reduced,coef=ncol(d), number = Inf)
  m1 <- as.data.frame(al$t)
  rownames(m1) <- rownames(al)
  colnames(m1) <- "t"
  almres1 <- runmitch(m=m1,genesets=gsets,cores=2)
  # summarise results
  gsig_up3 <- rownames( subset( almres1, p.adjustANOVA < 0.05 & s.dist > 0 ) )
  gsig_dn3 <- rownames( subset( almres1, p.adjustANOVA < 0.05 & s.dist < 0 ) )
  gtup <- names(sim[[6]])
  gtdn <- names(sim[[7]])
  UPTP=length(intersect(gsig_up3 ,gtup))
  UPFP=length(setdiff(gsig_up3 ,gtup))
  UPFN=length(setdiff(gtup,gsig_up3))
  DNTP=length(intersect(gsig_dn3 ,gtdn))
  DNFP=length(setdiff(gsig_dn3 ,gtdn))
  DNFN=length(setdiff(gtdn,gsig_dn3))
  TP=UPTP+DNTP
  FP=UPFP+DNFP
  FN=UPFN+DNFN
  TN=nrow(almres1)-DNTP-DNFP-DNFN-UPTP-UPFP-UPFN
  PREC=TP/(TP+FP)
  REC=TP/(TP+FN)
  F1=TP/(TP+(0.5*(FP+FN)))
  result <- c("TP"=TP,"FP"=FP,"FN"=FN,"TN"=TN,"PREC"=PREC,"REC"=REC)
  return(result)
}

AA Aggregate-aggregate-limma functions

Use mean value works well here.

gsagg <- function(x,genesets,cores=1) {
  meds <- mclapply(1:length(genesets), function(i) {
    gs = genesets[[i]]
    xx <- x[rownames(x) %in% gs,]
    med <- apply(xx,2,mean)
  },mc.cores=cores)
  mymed <- do.call(rbind,meds)
  rownames(mymed) <- names(genesets)
  as.data.frame(mymed)
}

aalimma <- function(agag,design) {
  fit.reduced <- lmFit(agag,design)
  fit.reduced <- eBayes(fit.reduced)
  dmagg <- topTable(fit.reduced,coef=ncol(design), number = Inf)
  return(dmagg)
}

aal <- function(mval,myann,genesets,design,cores=1) {
  medf <- chragg(mval,myann,cores=cores)
  agag <- gsagg(x=medf,genesets=genesets,cores=cores)
  aalres <- aalimma(agag=agag,design=design)
  return(aalres)
}

simaa <- function(genesetdatabase, myann, mval, seed, frac_genes, frac_probes, groupsize, delta=1, num_dm_sets=50) {
  # generate gene sets
  gene_catalog <- unique(gt$gene)
  lengths <- unname(unlist(lapply(genesetdatabase,length)))
  gsets <- randomGeneSets(gene_catalog,lengths,seed=seed)
  # select gene sets to alter
  set.seed(seed) ; gset_mod <- sample(gsets,num_dm_sets)
  # incorporate select changes
  sim <- incorp_dm(genesets=gset_mod, myann=myann, mval=mval, seed=seed,
    frac_genes=0.5,frac_probes=0.5,groupsize=groupsize,delta=delta,
    gene_catalog=gene_catalog)
  # set up limma
  mval2 <- sim$mval
  ncols <- ncol(mval2)
  groupsize <- ncols/2
  ss <- data.frame(colnames(mval2))
  colnames(ss) <- "sample"
  ss$case <- c(rep(0,groupsize),rep(1,groupsize))
  d <- model.matrix(~ ss$case )
  mm <- merge(mval2,gt,by.x=0,by.y="probe")
  mm$Row.names = NULL
  a <- aggregate(. ~ gene,mm,mean)
  rownames(a) <- a$gene
  a$gene=NULL
  mystack <- stack(gsets)
  mmm <- merge(a,mystack,by.x=0,by.y="values")
  mmm$Row.names=NULL
  aa <- aggregate(. ~ ind,mmm,mean)
  rownames(aa) <- aa$ind
  aa$ind=NULL
  fit.reduced <- lmFit(aa,d)
  fit.reduced <- eBayes(fit.reduced)
  aares1 <- topTable(fit.reduced,coef=ncol(d), number = Inf)
  # summarise results
  gsig_up3 <- rownames(subset(aares1, adj.P.Val < 0.05 & logFC > 0))
  gsig_dn3 <- rownames(subset(aares1, adj.P.Val < 0.05 & logFC < 0))
  gtup <- names(sim[[6]])
  gtdn <- names(sim[[7]])
  UPTP=length(intersect(gsig_up3 ,gtup))
  UPFP=length(setdiff(gsig_up3 ,gtup))
  UPFN=length(setdiff(gtup,gsig_up3))
  DNTP=length(intersect(gsig_dn3 ,gtdn))
  DNFP=length(setdiff(gsig_dn3 ,gtdn))
  DNFN=length(setdiff(gtdn,gsig_dn3))
  TP=UPTP+DNTP
  FP=UPFP+DNFP
  FN=UPFN+DNFN
  TN=nrow(aares1)-DNTP-DNFP-DNFN-UPTP-UPFP-UPFN
  PREC=TP/(TP+FP)
  REC=TP/(TP+FN)
  F1=TP/(TP+(0.5*(FP+FN)))
  result <- c("TP"=TP,"FP"=FP,"FN"=FN,"TN"=TN,"PREC"=PREC,"REC"=REC)
  return(result)
}

F1 <- function(x,y) {
  ( 2 * x * y ) / ( x + y )
}

Run analyses

Set assumptions.

num_dm_sets=50
sims=20

groupsizes=c(3,5,8,12)
deltas=c(0.1,0.2,0.3,0.4,0.5)
#groupsizes=5
#deltas=0.4

params <- expand.grid("groupsizes"=groupsizes,"deltas"=deltas)

params %>% kbl(caption="Parameters to test") %>% kable_paper("hover", full_width = F)

Parameters to test
groupsizes	deltas
3	0.1
5	0.1
8	0.1
12	0.1
3	0.2
5	0.2
8	0.2
12	0.2
3	0.3
5	0.3
8	0.3
12	0.3
3	0.4
5	0.4
8	0.4
12	0.4
3	0.5
5	0.5
8	0.5
12	0.5

GSA meth sim

CCannot be run in multicore due to fragility of AnnotationDbi SQLite objects.

gres <- lapply(1:nrow(params) , function(j) {
  groupsize = params[j,1]
  delta = params[j,2]
  res <- lapply(1:sims,function(i) {
    simgsa(genesetdatabase=gsets, myann=myann, mval=normal_mval, seed=i*100, frac_genes=0.5,
      frac_probes=0.5, groupsize=groupsize, delta=delta, num_dm_sets=num_dm_sets)
  })
  res <- do.call(rbind,res)
  return(res)
})

gres2 <- do.call(rbind,lapply(gres,colMeans))
gres2[,"PREC"] <- gres2[,"TP"] / ( gres2[,"TP"] + gres2[,"FP"] )

gres2 %>% kbl(caption="GSAmeth results") %>% kable_paper("hover", full_width = F)

GSAmeth results
TP	FP	FN	TN	PREC	REC
0.00	0.05	50.00	949.95	0.0000000	0.000
0.00	0.00	50.00	950.00	NaN	0.000
0.00	0.00	50.00	950.00	NaN	0.000
0.00	0.00	50.00	950.00	NaN	0.000
0.00	0.05	50.00	949.95	0.0000000	0.000
0.00	0.00	50.00	950.00	NaN	0.000
0.15	0.00	49.85	950.00	1.0000000	0.003
2.05	0.00	47.95	950.00	1.0000000	0.041
0.00	0.00	50.00	950.00	NaN	0.000
0.10	0.00	49.90	950.00	1.0000000	0.002
2.40	0.35	47.60	949.65	0.8727273	0.048
8.15	0.35	41.85	949.65	0.9588235	0.163
0.05	0.00	49.95	950.00	1.0000000	0.001
1.80	0.05	48.20	949.95	0.9729730	0.036
10.40	0.45	39.60	949.55	0.9585253	0.208
45.25	3.10	4.75	946.90	0.9358842	0.905
0.50	0.00	49.50	950.00	1.0000000	0.010
3.20	0.25	46.80	949.75	0.9275362	0.064
41.25	2.90	8.75	947.10	0.9343148	0.825
48.00	3.60	2.00	946.40	0.9302326	0.960

gres3p <- do.call(rbind,lapply(groupsizes, function (g) { gres2[params$groupsizes==g,"PREC"] }))
colnames(gres3p) <- deltas
rownames(gres3p) <- groupsizes
gres3p %>% kbl(caption="GSAmeth precision") %>% kable_paper("hover", full_width = F)

GSAmeth precision
	0.1	0.2	0.3	0.4	0.5
3	0	0	NaN	1.0000000	1.0000000
5	NaN	NaN	1.0000000	0.9729730	0.9275362
8	NaN	1	0.8727273	0.9585253	0.9343148
12	NaN	1	0.9588235	0.9358842	0.9302326

gres3r <- do.call(rbind,lapply(groupsizes, function (g) { gres2[params$groupsizes==g,"REC"] }))
colnames(gres3r) <- deltas
rownames(gres3r) <- groupsizes
gres3r %>% kbl(caption="GSAmeth recall") %>% kable_paper("hover", full_width = F)

GSAmeth recall
	0.2	0.3	0.4	0.5
3	0.000	0.000	0.001	0.010
5	0.000	0.002	0.036	0.064
8	0.003	0.048	0.208	0.825
12	0.041	0.163	0.905	0.960

F1(gres3p,gres3r) %>% kbl(caption="GSAmeth F1") %>% kable_paper("hover", full_width = F)

GSAmeth F1
	0.1	0.2	0.3	0.4	0.5
3	NaN	NaN	NaN	0.0019980	0.0198020
5	NaN	NaN	0.0039920	0.0694311	0.1197381
8	NaN	0.0059821	0.0909953	0.3418242	0.8762613
12	NaN	0.0787704	0.2786325	0.9201830	0.9448819

LA sim

parametric competitive

lacres <- lapply(1:nrow(params) , function(j) {
  groupsize = params[j,1]
  delta = params[j,2]
  res <- mclapply(1:sims,function(i) {
    simlac(genesetdatabase=gsets, myann=myann, mval=normal_mval, seed=i*100, frac_genes=0.5, frac_probes=0.5,
      groupsize=groupsize, delta=delta, num_dm_sets=num_dm_sets)
  },mc.cores=6)
  res <- do.call(rbind,res)
  return(res)
})

lacres2 <- do.call(rbind,lapply(lacres,colMeans))
lacres2[,"PREC"] <- lacres2[,"TP"] / ( lacres2[,"TP"] + lacres2[,"FP"] )

lacres2 %>% kbl(caption="LA parametric results") %>% kable_paper("hover", full_width = F)

LA parametric results
TP	FP	FN	TN	PREC	REC
0.05	0.10	49.95	949.90	0.3333333	0.001
0.00	0.00	50.00	950.00	NaN	0.000
0.10	0.15	49.90	949.85	0.4000000	0.002
0.10	0.05	49.90	949.95	0.6666667	0.002
0.15	0.10	49.85	949.90	0.6000000	0.003
0.60	0.20	49.40	949.80	0.7500000	0.012
1.75	0.40	48.25	949.60	0.8139535	0.035
3.05	0.30	46.95	949.70	0.9104478	0.061
0.40	0.10	49.60	949.90	0.8000000	0.008
2.90	0.15	47.10	949.85	0.9508197	0.058
5.55	0.95	44.45	949.05	0.8538462	0.111
8.85	0.50	41.15	949.50	0.9465241	0.177
2.10	0.20	47.90	949.80	0.9130435	0.042
5.80	0.40	44.20	949.60	0.9354839	0.116
11.15	0.65	38.85	949.35	0.9449153	0.223
15.75	0.55	34.25	949.45	0.9662577	0.315
3.60	0.45	46.40	949.55	0.8888889	0.072
10.15	0.85	39.85	949.15	0.9227273	0.203
16.75	0.65	33.25	949.35	0.9626437	0.335
18.55	0.55	31.45	949.45	0.9712042	0.371

lacres3p <- do.call(rbind,lapply(groupsizes, function (g) { lacres2[params$groupsizes==g,"PREC"] }))
colnames(lacres3p) <- deltas
rownames(lacres3p) <- groupsizes
lacres3p %>% kbl(caption="LA parametric precision") %>% kable_paper("hover", full_width = F)

LA parametric precision
	0.1	0.2	0.3	0.4	0.5
3	0.3333333	0.6000000	0.8000000	0.9130435	0.8888889
5	NaN	0.7500000	0.9508197	0.9354839	0.9227273
8	0.4000000	0.8139535	0.8538462	0.9449153	0.9626437
12	0.6666667	0.9104478	0.9465241	0.9662577	0.9712042

lacres3r <- do.call(rbind,lapply(groupsizes, function (g) { lacres2[params$groupsizes==g,"REC"] }))
colnames(lacres3r) <- deltas
rownames(lacres3r) <- groupsizes
lacres3r %>% kbl(caption="LA parametric recall") %>% kable_paper("hover", full_width = F)

LA parametric recall
	0.1	0.2	0.3	0.4	0.5
3	0.001	0.003	0.008	0.042	0.072
5	0.000	0.012	0.058	0.116	0.203
8	0.002	0.035	0.111	0.223	0.335
12	0.002	0.061	0.177	0.315	0.371

F1(lacres3p,lacres3r) %>% kbl(caption="LA parametric F1") %>% kable_paper("hover", full_width = F)

LA parametric F1
	0.1	0.2	0.3	0.4	0.5
3	0.0019940	0.0059701	0.0158416	0.0803059	0.1332100
5	NaN	0.0236220	0.1093308	0.2064057	0.3327869
8	0.0039801	0.0671141	0.1964602	0.3608414	0.4970326
12	0.0039880	0.1143393	0.2982308	0.4751131	0.5369030

parametric competitive with “top” aggregation

Too many false positives.

lactopres <- lapply(1:nrow(params) , function(j) {
  groupsize = params[j,1]
  delta = params[j,2]
  res <- mclapply(1:sims,function(i) {
    simlactop(genesetdatabase=gsets, myann=myann, mval=normal_mval, seed=i*100, frac_genes=0.5, frac_probes=0.5,
      groupsize=groupsize, delta=delta, num_dm_sets=num_dm_sets)
  },mc.cores=6)
  res <- do.call(rbind,res)
  return(res)
})

lactopres2 <- do.call(rbind,lapply(lactopres,colMeans))
lactopres2[,"PREC"] <- lactopres2[,"TP"] / ( lactopres2[,"TP"] + lactopres2[,"FP"] )

lactopres2 %>% kbl(caption="LA parametric results (top agg)") %>% kable_paper("hover", full_width = F)

LA parametric results (top agg)
TP	FP	FN	TN	PREC	REC
0.05	0.30	49.95	949.70	0.1428571	0.001
0.05	0.20	49.95	949.80	0.2000000	0.001
0.15	0.35	49.85	949.65	0.3000000	0.003
0.40	0.50	49.60	949.50	0.4444444	0.008
0.05	0.20	49.95	949.80	0.2000000	0.001
0.30	0.45	49.70	949.55	0.4000000	0.006
1.45	0.25	48.55	949.75	0.8529412	0.029
4.10	0.90	45.90	949.10	0.8200000	0.082
0.45	0.15	49.55	949.85	0.7500000	0.009
2.20	0.80	47.80	949.20	0.7333333	0.044
8.10	0.55	41.90	949.45	0.9364162	0.162
14.95	1.40	35.05	948.60	0.9143731	0.299
2.20	0.25	47.80	949.75	0.8979592	0.044
7.00	0.90	43.00	949.10	0.8860759	0.140
15.85	0.90	34.15	949.10	0.9462687	0.317
22.80	1.55	27.20	948.45	0.9363450	0.456
5.30	0.50	44.70	949.50	0.9137931	0.106
13.55	1.25	36.45	948.75	0.9155405	0.271
21.25	1.30	28.75	948.70	0.9423503	0.425
28.75	1.55	21.25	948.45	0.9488449	0.575

lactopres3p <- do.call(rbind,lapply(groupsizes, function (g) { lactopres2[params$groupsizes==g,"PREC"] }))
colnames(lactopres3p) <- deltas
rownames(lactopres3p) <- groupsizes
lactopres3p %>% kbl(caption="LA parametric precision (top agg)") %>% kable_paper("hover", full_width = F)

LA parametric precision (top agg)
	0.1	0.2	0.3	0.4	0.5
3	0.1428571	0.2000000	0.7500000	0.8979592	0.9137931
5	0.2000000	0.4000000	0.7333333	0.8860759	0.9155405
8	0.3000000	0.8529412	0.9364162	0.9462687	0.9423503
12	0.4444444	0.8200000	0.9143731	0.9363450	0.9488449

lactopres3r <- do.call(rbind,lapply(groupsizes, function (g) { lactopres2[params$groupsizes==g,"REC"] }))
colnames(lactopres3r) <- deltas
rownames(lactopres3r) <- groupsizes
lactopres3r %>% kbl(caption="LA parametric recall (top agg)") %>% kable_paper("hover", full_width = F)

LA parametric recall (top agg)
	0.1	0.2	0.3	0.4	0.5
3	0.001	0.001	0.009	0.044	0.106
5	0.001	0.006	0.044	0.140	0.271
8	0.003	0.029	0.162	0.317	0.425
12	0.008	0.082	0.299	0.456	0.575

F1(lactopres3p,lactopres3r) %>% kbl(caption="LA parametric F1 (top agg)") %>% kable_paper("hover", full_width = F)

LA parametric F1 (top agg)
	0.1	0.2	0.3	0.4	0.5
3	0.0019861	0.0019900	0.0177866	0.0838894	0.1899642
5	0.0019900	0.0118227	0.0830189	0.2417962	0.4182099
8	0.0059406	0.0560928	0.2762148	0.4749064	0.5858029
12	0.0157171	0.1490909	0.4506405	0.6133154	0.7160648

nonparametric competitive

nlacres <- lapply(1:nrow(params) , function(j) {
  groupsize = params[j,1]
  delta = params[j,2]
  res <- mclapply(1:sims,function(i) {
    simnlac(genesetdatabase=gsets, myann=myann, mval=normal_mval, seed=i*100, frac_genes=0.5, frac_probes=0.5,
      groupsize=groupsize, delta=delta, num_dm_sets=num_dm_sets)
  },mc.cores=6)
  res <- do.call(rbind,res)
  return(res)
})

nlacres2 <- do.call(rbind,lapply(nlacres,colMeans))
nlacres2[,"PREC"] <- nlacres2[,"TP"] / ( nlacres2[,"TP"] + nlacres2[,"FP"] )

nlacres2 %>% kbl(caption="LA nonparametric results") %>% kable_paper("hover", full_width = F)

LA nonparametric results
TP	FP	FN	TN	PREC	REC
0.00	0.10	50.00	949.90	0.0000000	0.000
0.15	0.05	49.85	949.95	0.7500000	0.003
0.20	0.15	49.80	949.85	0.5714286	0.004
0.55	0.15	49.45	949.85	0.7857143	0.011
0.25	0.20	49.75	949.80	0.5555556	0.005
1.80	0.55	48.20	949.45	0.7659574	0.036
4.40	1.00	45.60	949.00	0.8148148	0.088
7.35	1.25	42.65	948.75	0.8546512	0.147
2.15	0.45	47.85	949.55	0.8269231	0.043
6.80	1.00	43.20	949.00	0.8717949	0.136
11.45	2.00	38.55	948.00	0.8513011	0.229
15.15	1.30	34.85	948.70	0.9209726	0.303
5.95	0.75	44.05	949.25	0.8880597	0.119
12.60	1.65	37.40	948.35	0.8842105	0.252
16.80	2.10	33.20	947.90	0.8888889	0.336
20.35	1.30	29.65	948.70	0.9399538	0.407
10.60	1.90	39.40	948.10	0.8480000	0.212
17.10	1.70	32.90	948.30	0.9095745	0.342
20.55	1.40	29.45	948.60	0.9362187	0.411
24.05	1.55	25.95	948.45	0.9394531	0.481

nlacres3p <- do.call(rbind,lapply(groupsizes, function (g) { nlacres2[params$groupsizes==g,"PREC"] }))
colnames(nlacres3p) <- deltas
rownames(nlacres3p) <- groupsizes
nlacres3p %>% kbl(caption="LA nonparametric precision") %>% kable_paper("hover", full_width = F)

LA nonparametric precision
	0.1	0.2	0.3	0.4	0.5
3	0.0000000	0.5555556	0.8269231	0.8880597	0.8480000
5	0.7500000	0.7659574	0.8717949	0.8842105	0.9095745
8	0.5714286	0.8148148	0.8513011	0.8888889	0.9362187
12	0.7857143	0.8546512	0.9209726	0.9399538	0.9394531

nlacres3r <- do.call(rbind,lapply(groupsizes, function (g) { nlacres2[params$groupsizes==g,"REC"] }))
colnames(nlacres3r) <- deltas
rownames(nlacres3r) <- groupsizes
nlacres3r %>% kbl(caption="LA nonparametric recall") %>% kable_paper("hover", full_width = F)

LA nonparametric recall
	0.1	0.2	0.3	0.4	0.5
3	0.000	0.005	0.043	0.119	0.212
5	0.003	0.036	0.136	0.252	0.342
8	0.004	0.088	0.229	0.336	0.411
12	0.011	0.147	0.303	0.407	0.481

F1(nlacres3p,nlacres3r) %>% kbl(caption="LA nonparametric F1") %>% kable_paper("hover", full_width = F)

LA nonparametric F1
	0.1	0.2	0.3	0.4	0.5
3	NaN	0.0099108	0.0817490	0.2098765	0.3392000
5	0.0059761	0.0687679	0.2352941	0.3922179	0.4970930
8	0.0079444	0.1588448	0.3609141	0.4876633	0.5712300
12	0.0216963	0.2508532	0.4559819	0.5680391	0.6362434

LA competitive mitch

lacmres <- lapply(1:nrow(params) , function(j) {
  groupsize = params[j,1]
  delta = params[j,2]
  res <- mclapply(1:sims,function(i) {
    simlacm(genesetdatabase=gsets, myann=myann, mval=normal_mval, seed=i*100, frac_genes=0.5, frac_probes=0.5,
      groupsize=groupsize, delta=delta, num_dm_sets=num_dm_sets)
  },mc.cores=6)
  res <- do.call(rbind,res)
  return(res)
})

lacmres2 <- do.call(rbind,lapply(lacmres,colMeans))
lacmres2[,"PREC"] <- lacmres2[,"TP"] / ( lacmres2[,"TP"] + lacmres2[,"FP"] )

lacmres2 %>% kbl(caption="LA mitch results") %>% kable_paper("hover", full_width = F)

LA mitch results
TP	FP	FN	TN	PREC	REC
0.00	0.10	50.00	949.90	0.0000000	0.000
0.15	0.05	49.85	949.95	0.7500000	0.003
0.30	0.05	49.70	949.95	0.8571429	0.006
0.65	0.05	49.35	949.95	0.9285714	0.013
0.25	0.20	49.75	949.80	0.5555556	0.005
2.20	0.15	47.80	949.85	0.9361702	0.044
5.15	0.35	44.85	949.65	0.9363636	0.103
8.35	0.55	41.65	949.45	0.9382022	0.167
2.30	0.30	47.70	949.70	0.8846154	0.046
7.45	0.40	42.55	949.60	0.9490446	0.149
12.70	0.85	37.30	949.15	0.9372694	0.254
15.50	1.05	34.50	948.95	0.9365559	0.310
6.20	0.70	43.80	949.30	0.8985507	0.124
13.90	0.70	36.10	949.30	0.9520548	0.278
17.75	1.25	32.25	948.75	0.9342105	0.355
20.55	1.40	29.45	948.60	0.9362187	0.411
11.85	1.20	38.15	948.80	0.9080460	0.237
17.95	0.90	32.05	949.10	0.9522546	0.359
21.00	1.30	29.00	948.70	0.9417040	0.420
24.10	1.55	25.90	948.45	0.9395712	0.482

lacmres3p <- do.call(rbind,lapply(groupsizes, function (g) { lacmres2[params$groupsizes==g,"PREC"] }))
colnames(lacmres3p) <- deltas
rownames(lacmres3p) <- groupsizes
lacmres3p %>% kbl(caption="LA mitch precision") %>% kable_paper("hover", full_width = F)

LA mitch precision
	0.1	0.2	0.3	0.4	0.5
3	0.0000000	0.5555556	0.8846154	0.8985507	0.9080460
5	0.7500000	0.9361702	0.9490446	0.9520548	0.9522546
8	0.8571429	0.9363636	0.9372694	0.9342105	0.9417040
12	0.9285714	0.9382022	0.9365559	0.9362187	0.9395712

lacmres3r <- do.call(rbind,lapply(groupsizes, function (g) { lacmres2[params$groupsizes==g,"REC"] }))
colnames(lacmres3r) <- deltas
rownames(lacmres3r) <- groupsizes
lacmres3r %>% kbl(caption="LA mitch recall") %>% kable_paper("hover", full_width = F)

LA mitch recall
	0.1	0.2	0.3	0.4	0.5
3	0.000	0.005	0.046	0.124	0.237
5	0.003	0.044	0.149	0.278	0.359
8	0.006	0.103	0.254	0.355	0.420
12	0.013	0.167	0.310	0.411	0.482

F1(lacmres3p,lacmres3r) %>% kbl(caption="LA mitch F1") %>% kable_paper("hover", full_width = F)

LA mitch F1
	0.1	0.2	0.3	0.4	0.5
3	NaN	0.0099108	0.0874525	0.2179262	0.3758921
5	0.0059761	0.0840497	0.2575627	0.4303406	0.5214234
8	0.0119166	0.1855856	0.3996853	0.5144928	0.5809129
12	0.0256410	0.2835314	0.4658152	0.5712300	0.6371447

LA rank probes

lrmres <- lapply(1:nrow(params) , function(j) {
  groupsize = params[j,1]
  delta = params[j,2]
  res <- mclapply(1:sims,function(i) {
    simlrm(genesetdatabase=gsets, myann=myann, mval=normal_mval, seed=i*100, frac_genes=0.5, frac_probes=0.5,
      groupsize=groupsize, delta=delta, num_dm_sets=num_dm_sets)
  },mc.cores=6)
  res <- do.call(rbind,res)
  return(res)
})

lrmres2 <- do.call(rbind,lapply(lrmres,colMeans))
lrmres2[,"PREC"] <- lrmres2[,"TP"] / ( lrmres2[,"TP"] + lrmres2[,"FP"] )

lrmres2 %>% kbl(caption="LA rank mitch results") %>% kable_paper("hover", full_width = F)

LA rank mitch results
TP	FP	FN	TN	PREC	REC
0.00	0.15	50.00	949.85	0.0000000	0.000
0.05	0.00	49.95	950.00	1.0000000	0.001
0.15	0.05	49.85	949.95	0.7500000	0.003
0.30	0.05	49.70	949.95	0.8571429	0.006
0.20	0.10	49.80	949.90	0.6666667	0.004
1.50	0.10	48.50	949.90	0.9375000	0.030
2.95	0.30	47.05	949.70	0.9076923	0.059
3.85	0.20	46.15	949.80	0.9506173	0.077
1.45	0.25	48.55	949.75	0.8529412	0.029
4.15	0.30	45.85	949.70	0.9325843	0.083
6.40	0.55	43.60	949.45	0.9208633	0.128
6.90	0.35	43.10	949.65	0.9517241	0.138
3.60	0.55	46.40	949.45	0.8674699	0.072
7.15	0.55	42.85	949.45	0.9285714	0.143
8.45	0.95	41.55	949.05	0.8989362	0.169
8.60	0.40	41.40	949.60	0.9555556	0.172
4.95	0.70	45.05	949.30	0.8761062	0.099
8.55	0.55	41.45	949.45	0.9395604	0.171
9.90	1.00	40.10	949.00	0.9082569	0.198
9.25	0.40	40.75	949.60	0.9585492	0.185

lrmres3p <- do.call(rbind,lapply(groupsizes, function (g) { lrmres2[params$groupsizes==g,"PREC"] }))
colnames(lrmres3p) <- deltas
rownames(lrmres3p) <- groupsizes
lrmres3p %>% kbl(caption="LA rank mitch precision") %>% kable_paper("hover", full_width = F)

LA rank mitch precision
	0.1	0.2	0.3	0.4	0.5
3	0.0000000	0.6666667	0.8529412	0.8674699	0.8761062
5	1.0000000	0.9375000	0.9325843	0.9285714	0.9395604
8	0.7500000	0.9076923	0.9208633	0.8989362	0.9082569
12	0.8571429	0.9506173	0.9517241	0.9555556	0.9585492

lrmres3r <- do.call(rbind,lapply(groupsizes, function (g) { lrmres2[params$groupsizes==g,"REC"] }))
colnames(lrmres3r) <- deltas
rownames(lrmres3r) <- groupsizes
lrmres3r %>% kbl(caption="LA rank mitch recall") %>% kable_paper("hover", full_width = F)

LA rank mitch recall
	0.1	0.2	0.3	0.4	0.5
3	0.000	0.004	0.029	0.072	0.099
5	0.001	0.030	0.083	0.143	0.171
8	0.003	0.059	0.128	0.169	0.198
12	0.006	0.077	0.138	0.172	0.185

F1(lrmres3p,lacmres3r) %>% kbl(caption="LA rank mitch F1") %>% kable_paper("hover", full_width = F)

LA rank mitch F1
	0.1	0.2	0.3	0.4	0.5
3	NaN	0.0099256	0.0872922	0.2169834	0.3730770
5	0.0059821	0.0840550	0.2569473	0.4278949	0.5195017
8	0.0119048	0.1850065	0.3981728	0.5089930	0.5743887
12	0.0256116	0.2840920	0.4676688	0.5747784	0.6414508

AL sim

parametric competitive

alcres <- lapply(1:nrow(params) , function(j) {
  groupsize = params[j,1]
  delta = params[j,2]
  res <- mclapply(1:sims,function(i) {
    simalc(genesetdatabase=gsets, myann=myann, mval=normal_mval, seed=i*100, frac_genes=0.5, frac_probes=0.5,
      groupsize=groupsize, delta=delta, num_dm_sets=num_dm_sets)
  },mc.cores=6)
  res <- do.call(rbind,res)
  return(res)
})

alcres2 <- do.call(rbind,lapply(alcres,colMeans))
alcres2[,"PREC"] <- alcres2[,"TP"] / ( alcres2[,"TP"] + alcres2[,"FP"] )

alcres2 %>% kbl(caption="AL parametric results") %>% kable_paper("hover", full_width = F)

AL parametric results
TP	FP	FN	TN	PREC	REC
0.00	0.25	50.00	949.75	0.0000000	0.000
0.00	0.05	50.00	949.95	0.0000000	0.000
0.10	0.10	49.90	949.90	0.5000000	0.002
0.25	0.15	49.75	949.85	0.6250000	0.005
0.05	0.10	49.95	949.90	0.3333333	0.001
0.45	0.10	49.55	949.90	0.8181818	0.009
2.15	0.60	47.85	949.40	0.7818182	0.043
3.20	0.35	46.80	949.65	0.9014085	0.064
0.35	0.10	49.65	949.90	0.7777778	0.007
3.35	0.15	46.65	949.85	0.9571429	0.067
6.05	0.95	43.95	949.05	0.8642857	0.121
9.20	0.45	40.80	949.55	0.9533679	0.184
1.70	0.25	48.30	949.75	0.8717949	0.034
6.25	0.40	43.75	949.60	0.9398496	0.125
13.00	1.05	37.00	948.95	0.9252669	0.260
16.45	0.30	33.55	949.70	0.9820896	0.329
4.30	0.55	45.70	949.45	0.8865979	0.086
10.50	0.70	39.50	949.30	0.9375000	0.210
17.95	0.65	32.05	949.35	0.9650538	0.359
19.90	0.30	30.10	949.70	0.9851485	0.398

alcres3p <- do.call(rbind,lapply(groupsizes, function (g) { alcres2[params$groupsizes==g,"PREC"] }))
colnames(alcres3p) <- deltas
rownames(alcres3p) <- groupsizes
alcres3p %>% kbl(caption="AL parametric precision") %>% kable_paper("hover", full_width = F)

AL parametric precision
	0.1	0.2	0.3	0.4	0.5
3	0.000	0.3333333	0.7777778	0.8717949	0.8865979
5	0.000	0.8181818	0.9571429	0.9398496	0.9375000
8	0.500	0.7818182	0.8642857	0.9252669	0.9650538
12	0.625	0.9014085	0.9533679	0.9820896	0.9851485

alcres3r <- do.call(rbind,lapply(groupsizes, function (g) { alcres2[params$groupsizes==g,"REC"] }))
colnames(alcres3r) <- deltas
rownames(alcres3r) <- groupsizes
alcres3r %>% kbl(caption="AL parametric recall") %>% kable_paper("hover", full_width = F)

AL parametric recall
	0.1	0.2	0.3	0.4	0.5
3	0.000	0.001	0.007	0.034	0.086
5	0.000	0.009	0.067	0.125	0.210
8	0.002	0.043	0.121	0.260	0.359
12	0.005	0.064	0.184	0.329	0.398

F1(alcres3p,alcres3r) %>% kbl(caption="AL parametric F1") %>% kable_paper("hover", full_width = F)

AL parametric F1
	0.1	0.2	0.3	0.4	0.5
3	NaN	0.0019940	0.0138751	0.0654475	0.1567912
5	NaN	0.0178042	0.1252336	0.2206531	0.3431373
8	0.0039841	0.0815166	0.2122807	0.4059329	0.5233236
12	0.0099206	0.1195145	0.3084661	0.4928839	0.5669516

nonparametric competitive

nalcres <- lapply(1:nrow(params) , function(j) {
  groupsize = params[j,1]
  delta = params[j,2]
  res <- mclapply(1:sims,function(i) {
    simnalc(genesetdatabase=gsets, myann=myann, mval=normal_mval, seed=i*100, frac_genes=0.5, frac_probes=0.5,
      groupsize=groupsize, delta=delta, num_dm_sets=num_dm_sets)
  },mc.cores=6)
  res <- do.call(rbind,res)
  return(res)
})

nalcres2 <- do.call(rbind,lapply(nalcres,colMeans))
nalcres2[,"PREC"] <- nalcres2[,"TP"] / ( nalcres2[,"TP"] + nalcres2[,"FP"] )

nalcres2 %>% kbl(caption="AL nonparametric results") %>% kable_paper("hover", full_width = F)

AL nonparametric results
TP	FP	FN	TN	PREC	REC
0.05	0.20	49.95	949.80	0.2000000	0.001
0.00	0.00	50.00	950.00	NaN	0.000
0.15	0.10	49.85	949.90	0.6000000	0.003
0.35	0.10	49.65	949.90	0.7777778	0.007
0.15	0.15	49.85	949.85	0.5000000	0.003
1.00	0.20	49.00	949.80	0.8333333	0.020
3.35	1.00	46.65	949.00	0.7701149	0.067
6.40	1.00	43.60	949.00	0.8648649	0.128
1.90	0.45	48.10	949.55	0.8085106	0.038
5.60	0.95	44.40	949.05	0.8549618	0.112
10.75	1.95	39.25	948.05	0.8464567	0.215
14.95	1.15	35.05	948.85	0.9285714	0.299
4.60	0.85	45.40	949.15	0.8440367	0.092
11.25	1.15	38.75	948.85	0.9072581	0.225
16.70	1.80	33.30	948.20	0.9027027	0.334
20.20	1.35	29.80	948.65	0.9373550	0.404
10.00	1.20	40.00	948.80	0.8928571	0.200
16.05	1.55	33.95	948.45	0.9119318	0.321
20.00	1.25	30.00	948.75	0.9411765	0.400
23.80	1.45	26.20	948.55	0.9425743	0.476

nalcres3p <- do.call(rbind,lapply(groupsizes, function (g) { nalcres2[params$groupsizes==g,"PREC"] }))
colnames(nalcres3p) <- deltas
rownames(nalcres3p) <- groupsizes
nalcres3p %>% kbl(caption="AL nonparametric precision") %>% kable_paper("hover", full_width = F)

AL nonparametric precision
	0.1	0.2	0.3	0.4	0.5
3	0.2000000	0.5000000	0.8085106	0.8440367	0.8928571
5	NaN	0.8333333	0.8549618	0.9072581	0.9119318
8	0.6000000	0.7701149	0.8464567	0.9027027	0.9411765
12	0.7777778	0.8648649	0.9285714	0.9373550	0.9425743

nalcres3r <- do.call(rbind,lapply(groupsizes, function (g) { nalcres2[params$groupsizes==g,"REC"] }))
colnames(nalcres3r) <- deltas
rownames(nalcres3r) <- groupsizes
nalcres3r %>% kbl(caption="AL nonparametric recall") %>% kable_paper("hover", full_width = F)

AL nonparametric recall
	0.1	0.2	0.3	0.4	0.5
3	0.001	0.003	0.038	0.092	0.200
5	0.000	0.020	0.112	0.225	0.321
8	0.003	0.067	0.215	0.334	0.400
12	0.007	0.128	0.299	0.404	0.476

F1(nalcres3p,nalcres3r) %>% kbl(caption="AL nonparametric F1") %>% kable_paper("hover", full_width = F)

AL nonparametric F1
	0.1	0.2	0.3	0.4	0.5
3	0.0019900	0.0059642	0.0725883	0.1659152	0.3267974
5	NaN	0.0390625	0.1980548	0.3605769	0.4748521
8	0.0059701	0.1232751	0.3429027	0.4875912	0.5614035
12	0.0138751	0.2229965	0.4523449	0.5646401	0.6325581

ALM competitive test

almres <- lapply(1:nrow(params) , function(j) {
  groupsize = params[j,1]
  delta = params[j,2]
  res <- mclapply(1:sims,function(i) {
    simalm(genesetdatabase=gsets, myann=myann, mval=normal_mval, seed=i*100, frac_genes=0.5, frac_probes=0.5,
      groupsize=groupsize, delta=delta, num_dm_sets=num_dm_sets)
  },mc.cores=6)
  res <- do.call(rbind,res)
  return(res)
})

almres2 <- do.call(rbind,lapply(almres,colMeans))
almres2[,"PREC"] <- almres2[,"TP"] / ( almres2[,"TP"] + almres2[,"FP"] )

almres2 %>% kbl(caption="ALM results") %>% kable_paper("hover", full_width = F)

ALM results
TP	FP	FN	TN	PREC	REC
0.05	0.20	49.95	949.80	0.2000000	0.001
0.05	0.05	49.95	949.95	0.5000000	0.001
0.20	0.05	49.80	949.95	0.8000000	0.004
0.40	0.05	49.60	949.95	0.8888889	0.008
0.15	0.15	49.85	949.85	0.5000000	0.003
1.20	0.05	48.80	949.95	0.9600000	0.024
4.20	0.30	45.80	949.70	0.9333333	0.084
7.10	0.40	42.90	949.60	0.9466667	0.142
2.15	0.35	47.85	949.65	0.8600000	0.043
6.35	0.25	43.65	949.75	0.9621212	0.127
12.05	0.80	37.95	949.20	0.9377432	0.241
15.45	0.85	34.55	949.15	0.9478528	0.309
5.10	0.50	44.90	949.50	0.9107143	0.102
12.05	0.45	37.95	949.55	0.9640000	0.241
17.50	1.20	32.50	948.80	0.9358289	0.350
20.50	1.35	29.50	948.65	0.9382151	0.410
10.85	0.80	39.15	949.20	0.9313305	0.217
16.85	0.95	33.15	949.05	0.9466292	0.337
20.20	1.25	29.80	948.75	0.9417249	0.404
24.05	1.50	25.95	948.50	0.9412916	0.481

almres3p <- do.call(rbind,lapply(groupsizes, function (g) { almres2[params$groupsizes==g,"PREC"] }))
colnames(almres3p) <- deltas
rownames(almres3p) <- groupsizes
almres3p %>% kbl(caption="ALM precision") %>% kable_paper("hover", full_width = F)

ALM precision
	0.1	0.2	0.3	0.4	0.5
3	0.2000000	0.5000000	0.8600000	0.9107143	0.9313305
5	0.5000000	0.9600000	0.9621212	0.9640000	0.9466292
8	0.8000000	0.9333333	0.9377432	0.9358289	0.9417249
12	0.8888889	0.9466667	0.9478528	0.9382151	0.9412916

almres3r <- do.call(rbind,lapply(groupsizes, function (g) { almres2[params$groupsizes==g,"REC"] }))
colnames(almres3r) <- deltas
rownames(almres3r) <- groupsizes
almres3r %>% kbl(caption="ALM recall") %>% kable_paper("hover", full_width = F)

ALM recall
	0.1	0.2	0.3	0.4	0.5
3	0.001	0.003	0.043	0.102	0.217
5	0.001	0.024	0.127	0.241	0.337
8	0.004	0.084	0.241	0.350	0.404
12	0.008	0.142	0.309	0.410	0.481

F1(almres3p,almres3r) %>% kbl(caption="ALM F1") %>% kable_paper("hover", full_width = F)

ALM F1
	0.1	0.2	0.3	0.4	0.5
3	0.0019900	0.0059642	0.0819048	0.1834532	0.3519870
5	0.0019960	0.0468293	0.2243816	0.3856000	0.4970501
8	0.0079602	0.1541284	0.3834527	0.5094614	0.5654304
12	0.0158573	0.2469565	0.4660633	0.5706333	0.6366645

AA sim

aares <- lapply(1:nrow(params) , function(j) {
  groupsize = params[j,1]
  delta = params[j,2]
  res <- mclapply(1:sims,function(i) {
    simaa(genesetdatabase=gsets, myann=myann, mval=normal_mval, seed=i*100, frac_genes=0.5, frac_probes=0.5,
      groupsize=groupsize, delta=delta, num_dm_sets=num_dm_sets)
  },mc.cores=6)
  res <- do.call(rbind,res)
  return(res)
})

aares2 <- do.call(rbind,lapply(aares,colMeans))
aares2[,"PREC"] <- aares2[,"TP"] / ( aares2[,"TP"] + aares2[,"FP"] )

aares2 %>% kbl(caption="AA results") %>% kable_paper("hover", full_width = F)

AA results
TP	FP	FN	TN	PREC	REC
2.50	94.25	47.50	855.75	0.0258398	0.050
1.25	47.40	48.75	902.60	0.0256937	0.025
0.65	17.20	49.35	932.80	0.0364146	0.013
0.00	0.00	50.00	950.00	NaN	0.000
2.50	93.40	47.50	856.60	0.0260688	0.050
1.25	46.30	48.75	903.70	0.0262881	0.025
0.85	16.55	49.15	933.45	0.0488506	0.017
0.00	0.00	50.00	950.00	NaN	0.000
2.50	92.90	47.50	857.10	0.0262055	0.050
1.25	45.65	48.75	904.35	0.0266525	0.025
1.10	16.75	48.90	933.25	0.0616246	0.022
0.00	0.00	50.00	950.00	NaN	0.000
2.55	92.20	47.45	857.80	0.0269129	0.051
1.70	45.25	48.30	904.75	0.0362087	0.034
1.25	17.60	48.75	932.40	0.0663130	0.025
0.05	0.00	49.95	950.00	1.0000000	0.001
3.15	91.80	46.85	858.20	0.0331754	0.063
2.05	44.75	47.95	905.25	0.0438034	0.041
2.05	18.30	47.95	931.70	0.1007371	0.041
0.50	0.00	49.50	950.00	1.0000000	0.010

aares3p <- do.call(rbind,lapply(groupsizes, function (g) { aares2[params$groupsizes==g,"PREC"] }))
colnames(aares3p) <- deltas
rownames(aares3p) <- groupsizes
aares3p %>% kbl(caption="AA precision") %>% kable_paper("hover", full_width = F)

AA precision
	0.1	0.2	0.3	0.4	0.5
3	0.0258398	0.0260688	0.0262055	0.0269129	0.0331754
5	0.0256937	0.0262881	0.0266525	0.0362087	0.0438034
8	0.0364146	0.0488506	0.0616246	0.0663130	0.1007371
12	NaN	NaN	NaN	1.0000000	1.0000000

aares3r <- do.call(rbind,lapply(groupsizes, function (g) { aares2[params$groupsizes==g,"REC"] }))
colnames(aares3r) <- deltas
rownames(aares3r) <- groupsizes
aares3r %>% kbl(caption="AA recall") %>% kable_paper("hover", full_width = F)

AA recall
	0.1	0.2	0.3	0.4	0.5
3	0.050	0.050	0.050	0.051	0.063
5	0.025	0.025	0.025	0.034	0.041
8	0.013	0.017	0.022	0.025	0.041
12	0.000	0.000	0.000	0.001	0.010

F1(aares3p,almres3r) %>% kbl(caption="AA F1") %>% kable_paper("hover", full_width = F)

AA F1
	0.1	0.2	0.3	0.4	0.5
3	0.0019255	0.0053808	0.0325649	0.0425887	0.0575520
5	0.0019251	0.0250920	0.0440587	0.0629584	0.0775295
8	0.0072082	0.0617754	0.0981516	0.1115005	0.1612633
12	NaN	NaN	NaN	0.5815603	0.6495611

Session information

save.image("GSE158422_simulate050.Rdata")

sessionInfo()

## R version 4.3.2 (2023-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] kableExtra_1.3.4                                   
##  [2] mitch_1.15.0                                       
##  [3] psych_2.3.9                                        
##  [4] org.Hs.eg.db_3.18.0                                
##  [5] AnnotationDbi_1.64.1                               
##  [6] HGNChelper_0.8.1                                   
##  [7] IlluminaHumanMethylation450kmanifest_0.4.0         
##  [8] missMethyl_1.36.0                                  
##  [9] IlluminaHumanMethylationEPICanno.ilm10b4.hg19_0.6.0
## [10] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.1 
## [11] minfi_1.48.0                                       
## [12] bumphunter_1.44.0                                  
## [13] locfit_1.5-9.8                                     
## [14] iterators_1.0.14                                   
## [15] foreach_1.5.2                                      
## [16] Biostrings_2.70.1                                  
## [17] XVector_0.42.0                                     
## [18] SummarizedExperiment_1.32.0                        
## [19] Biobase_2.62.0                                     
## [20] MatrixGenerics_1.14.0                              
## [21] matrixStats_1.2.0                                  
## [22] GenomicRanges_1.54.1                               
## [23] GenomeInfoDb_1.38.2                                
## [24] IRanges_2.36.0                                     
## [25] S4Vectors_0.40.2                                   
## [26] BiocGenerics_0.48.1                                
## [27] limma_3.58.1                                       
## [28] stringi_1.8.3                                      
## 
## loaded via a namespace (and not attached):
##   [1] splines_4.3.2             later_1.3.2              
##   [3] BiocIO_1.12.0             bitops_1.0-7             
##   [5] filelock_1.0.3            BiasedUrn_2.0.11         
##   [7] tibble_3.2.1              preprocessCore_1.64.0    
##   [9] XML_3.99-0.16             lifecycle_1.0.4          
##  [11] lattice_0.22-5            MASS_7.3-60              
##  [13] base64_2.0.1              scrime_1.3.5             
##  [15] magrittr_2.0.3            sass_0.4.8               
##  [17] rmarkdown_2.25            jquerylib_0.1.4          
##  [19] yaml_2.3.8                httpuv_1.6.13            
##  [21] doRNG_1.8.6               askpass_1.2.0            
##  [23] DBI_1.1.3                 RColorBrewer_1.1-3       
##  [25] abind_1.4-5               zlibbioc_1.48.0          
##  [27] rvest_1.0.3               quadprog_1.5-8           
##  [29] purrr_1.0.2               RCurl_1.98-1.13          
##  [31] rappdirs_0.3.3            GenomeInfoDbData_1.2.11  
##  [33] genefilter_1.84.0         annotate_1.80.0          
##  [35] svglite_2.1.3             DelayedMatrixStats_1.24.0
##  [37] codetools_0.2-19          DelayedArray_0.28.0      
##  [39] xml2_1.3.6                tidyselect_1.2.0         
##  [41] beanplot_1.3.1            BiocFileCache_2.10.1     
##  [43] webshot_0.5.5             illuminaio_0.44.0        
##  [45] GenomicAlignments_1.38.0  jsonlite_1.8.8           
##  [47] multtest_2.58.0           ellipsis_0.3.2           
##  [49] survival_3.5-7            systemfonts_1.0.5        
##  [51] tools_4.3.2               progress_1.2.3           
##  [53] Rcpp_1.0.11               glue_1.6.2               
##  [55] gridExtra_2.3             mnormt_2.1.1             
##  [57] SparseArray_1.2.2         xfun_0.41                
##  [59] dplyr_1.1.4               HDF5Array_1.30.0         
##  [61] fastmap_1.1.1             GGally_2.2.0             
##  [63] rhdf5filters_1.14.1       fansi_1.0.6              
##  [65] openssl_2.1.1             caTools_1.18.2           
##  [67] digest_0.6.33             R6_2.5.1                 
##  [69] mime_0.12                 colorspace_2.1-0         
##  [71] gtools_3.9.5              biomaRt_2.58.0           
##  [73] RSQLite_2.3.4             utf8_1.2.4               
##  [75] tidyr_1.3.0               generics_0.1.3           
##  [77] data.table_1.14.10        rtracklayer_1.62.0       
##  [79] prettyunits_1.2.0         httr_1.4.7               
##  [81] htmlwidgets_1.6.4         S4Arrays_1.2.0           
##  [83] ggstats_0.5.1             pkgconfig_2.0.3          
##  [85] gtable_0.3.4              blob_1.2.4               
##  [87] siggenes_1.76.0           htmltools_0.5.7          
##  [89] echarts4r_0.4.5           scales_1.3.0             
##  [91] png_0.1-8                 rstudioapi_0.15.0        
##  [93] knitr_1.45                reshape2_1.4.4           
##  [95] tzdb_0.4.0                rjson_0.2.21             
##  [97] nlme_3.1-163              curl_5.2.0               
##  [99] cachem_1.0.8              rhdf5_2.46.1             
## [101] stringr_1.5.1             KernSmooth_2.23-22       
## [103] restfulr_0.0.15           GEOquery_2.70.0          
## [105] pillar_1.9.0              grid_4.3.2               
## [107] reshape_0.8.9             vctrs_0.6.5              
## [109] gplots_3.1.3              promises_1.2.1           
## [111] dbplyr_2.4.0              xtable_1.8-4             
## [113] beeswarm_0.4.0            evaluate_0.23            
## [115] readr_2.1.4               GenomicFeatures_1.54.1   
## [117] cli_3.6.2                 compiler_4.3.2           
## [119] Rsamtools_2.18.0          rlang_1.1.2              
## [121] crayon_1.5.2              rngtools_1.5.2           
## [123] nor1mix_1.3-2             mclust_6.0.1             
## [125] plyr_1.8.9                viridisLite_0.4.2        
## [127] BiocParallel_1.36.0       munsell_0.5.0            
## [129] Matrix_1.6-1.1            hms_1.1.3                
## [131] sparseMatrixStats_1.14.0  bit64_4.0.5              
## [133] ggplot2_3.4.4             Rhdf5lib_1.24.1          
## [135] KEGGREST_1.42.0           statmod_1.5.0            
## [137] shiny_1.8.0               highr_0.10               
## [139] memoise_2.0.1             bslib_0.6.1              
## [141] bit_4.0.5

Notes

simla: parametric self-contained
simlac: parametric competitive
simnla: nonparametric self-contained
simnlac: nonparametric competitive

simal: parametric self-contained
simalc: parametric competitive
simnal: nonparametric self-contained
simnalc: nonparametric competitive

Testing gene methylation enrichment analysis approaches using simulated data

The GMEA team

2023-12-27

Introduction

Load data

Gene set database

Incorporate changes

GSAMETH function

LA function

AL: Aggregate limma

Agg-limma-mitch function

AA Aggregate-aggregate-limma functions

Run analyses

GSA meth sim

LA sim

parametric competitive

parametric competitive with “top” aggregation

nonparametric competitive

LA competitive mitch

LA rank probes

AL sim

parametric competitive

nonparametric competitive

ALM competitive test

AA sim

Session information

Notes

groupsizes	deltas
3	0.1
5	0.1
8	0.1
12	0.1
3	0.2
5	0.2
8	0.2
12	0.2
3	0.3
5	0.3
8	0.3
12	0.3
3	0.4
5	0.4
8	0.4
12	0.4
3	0.5
5	0.5
8	0.5
12	0.5

groupsizes	deltas
3	0.1
5	0.1
8	0.1
12	0.1
3	0.2
5	0.2
8	0.2
12	0.2
3	0.3
5	0.3
8	0.3
12	0.3
3	0.4
5	0.4
8	0.4
12	0.4
3	0.5
5	0.5
8	0.5
12	0.5

groupsizes	deltas
3	0.1
5	0.1
8	0.1
12	0.1
3	0.2
5	0.2
8	0.2
12	0.2
3	0.3
5	0.3
8	0.3
12	0.3
3	0.4
5	0.4
8	0.4
12	0.4
3	0.5
5	0.5
8	0.5
12	0.5