Introduction

In this report we establish a new method for generating simulated data with known ground truth. This will be used to test different gene methylation enrichment approaches systematically.

The general steps are:

  1. Import GSE158422 data corresponding to control (non-tumour tissue).

  2. From the 37 samples, create two groups of 18 samples. One of these will be considered “control” and the other “case”.

  3. Create random gene sets that have similar sized to Reactome pathways.

  4. Some gene sets will be selected to be differentially methylated. Half of these will be hypermethylated and the others will be hypomethylated.

  5. The changes will be incorporated into the “case” samples.

  6. The enrichment analysis will be conducted.

  7. The accuracy will be calculated.

suppressPackageStartupMessages({
  library("stringi")
  library("limma")
  library("missMethyl")
  library("IlluminaHumanMethylation450kmanifest")
  library("IlluminaHumanMethylationEPICanno.ilm10b4.hg19")
  library("HGNChelper")
  library('org.Hs.eg.db')
  library("psych")
  library("mitch")
  library("kableExtra")
})

# optimised for 128 GB sever with 32 threads
CORES=6

Load data

  • annotations

  • probe sets

  • gene sets

  • design matrix

  • mval matrix

# get probe-gene mapping
anno <- getAnnotation(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)
myann <- data.frame(anno[,c("UCSC_RefGene_Name","UCSC_RefGene_Group","Islands_Name","Relation_to_Island")])
gp <- myann[,"UCSC_RefGene_Name",drop=FALSE]
gp2 <- strsplit(gp$UCSC_RefGene_Name,";")
names(gp2) <- rownames(gp)
gp2 <- lapply(gp2,unique)
gt <- stack(gp2)
colnames(gt) <- c("gene","probe")
dim(gt)
## [1] 684970      2
str(gt)
## 'data.frame':    684970 obs. of  2 variables:
##  $ gene : chr  "YTHDF1" "EIF2S3" "PKN3" "CCDC57" ...
##  $ probe: Factor w/ 865859 levels "cg18478105","cg09835024",..: 1 2 3 4 5 6 7 8 8 9 ...
head(gt)
##     gene      probe
## 1 YTHDF1 cg18478105
## 2 EIF2S3 cg09835024
## 3   PKN3 cg14361672
## 4 CCDC57 cg01763666
## 5   INF2 cg12950382
## 6  CDC16 cg02115394
#new.hgnc.table <- getCurrentHumanMap()
#new.hgnc.table <- readRDS("new.hgnc.table.rds")
#fix <- checkGeneSymbols(gt$gene,map=new.hgnc.table)
#fix2 <- fix[which(fix$x != fix$Suggested.Symbol),]
#length(unique(fix2$x))
#gt$gene <- fix$Suggested.Symbol

if (!file.exists("GSE158422_design.rds")) {
  download.file("https://ziemann-lab.net/public/gmea_prototype/GSE158422_design.rds", "GSE158422_design.rds")
}
design <- readRDS("GSE158422_design.rds")

if (!file.exists("GSE158422_design.rds")) {
  options(timeout=10000000000000)
  download.file("https://ziemann-lab.net/public/gmea_prototype/GSE158422_mx.rds","GSE158422_mx.rds")
}
mval <- readRDS("GSE158422_mx.rds")

Gene set database

We could use Reactome pathways, however these have a lot of overlapping sets, which could cause inflated false positives. After some testing, I found that the gene set size strongly impacts the accuracy.

For reference, Reactome sets have a mean of 48 and median of 15, while MSigDB has a median of 47 and mean of 116.

Therefore, a reasonable analysis would include small (20), medium (50) and large (100) sets.

To reflect coverage of Reactome and other pathway databases, only half the genes will be included in sets.

gene_catalog <- unique(gt$gene)

# set gene set size here
lengths <- rep(100,1000)

randomGeneSets <- function(gene_catalog, lengths, seed){
  set.seed(seed) ; gene_catalog_half <- sample(gene_catalog,length(gene_catalog)/2)
  num_gsets <- length(lengths)
  set.seed(seed) ; seeds <- sample(1:1e6, num_gsets)
  gsets <- lapply(1:num_gsets,function(i) {
    set.seed(seeds[i]) ; gs <- sample(gene_catalog_half,lengths[i])
    return(gs)
  } )
  names(gsets)<-stri_rand_strings(length(gsets), 15, pattern = "[A-Za-z]")
  return(gsets)
}

gsets <- randomGeneSets(gene_catalog,lengths,seed=100)

Incorporate changes

TODO: to incorporate changes to case samples.

Need to figure out what magnitude to change. Will refer to the cancer/normal comparison.

Select genes and probes to alter.

seed=100
frac_genes=0.5
frac_probes=0.5
delta=1
nsamples=10
normal_mval <- mval[,(1:(ncol(mval)/2)*2)]

incorp_dm <- function(genesets,myann,mval,seed, frac_genes,frac_probes,
  groupsize,delta=1,gene_catalog) {

  # divide gene sets between hyper and hypomethylated
  nset <- floor(length(genesets)/2)
  set.seed(seed) ; gtup <-sample(genesets,nset)
  set.seed(seed) ; gtdn <- sample(setdiff(genesets,gtup),nset)
  gup <- unname(unlist(gtup))
  gdn <- unname(unlist(gtdn))
  # add extra 10% DM genes
  gnon <- setdiff(gene_catalog,c(gup,gdn))
  gextra <- round(length(gnon)*0.1)
  set.seed(seed) ; gup <- c(gup,sample(gnon,gextra))
  gnon <- setdiff(gene_catalog,c(gup,gdn))
  set.seed(seed) ; gdn <- c(gdn,sample(gnon,gextra))
  # make probe-gene vector
  probe2gene <- strsplit(myann$UCSC_RefGene_Name,";")
  names(probe2gene) <- rownames(myann)
  probe2gene <- unlist(probe2gene)
  # select probes hypermethylated
  set.seed(seed) ; gup2 <- sample(gup,floor(length(gup)*frac_genes))
  pup <- names(probe2gene[which(probe2gene %in% gup2)])
  set.seed(seed) ; pup2 <- sample(pup,floor(length(pup)*frac_probes))
  # select probes hypomethylated
  set.seed(seed) ; gdn2 <- sample(gdn,floor(length(gdn)*frac_genes))
  pdn <- names(probe2gene[which(probe2gene %in% gdn2)])
  set.seed(seed) ; pdn2 <- sample(pdn,floor(length(pdn)*frac_probes))
  # add 10% DM probes as well
  probes <- rownames(myann)
  pnon <- setdiff(probes,c(pup,pdn))
  pextra <- round(length(pnon)*0.1)
  set.seed(seed) ; pup <- c(pup,sample(pnon,pextra))
  pnon <- setdiff(probes,c(pup,pdn))
  set.seed(seed) ; pdn <- c(pdn,sample(pnon,pextra))
  # divide samples between ctrl and case
  ncols <- ncol(mval)
  maxgroupsize=floor(ncols/2)
  if ( groupsize > maxgroupsize ) { stop("groupsize cannot be larger than half the ncols of mval") }
  set.seed(seed) ; ctrl <- sample(1:ncols,groupsize)
  set.seed(seed) ; case <- sample(setdiff(1:ncols,ctrl),groupsize)
  mval_ctrl <- mval[,ctrl]
  mval_case <- mval[,case]
  # incorporate altered signals - change by +1 or -1
  mval_case[rownames(mval_case) %in% pup2,] <-  mval_case[rownames(mval_case) %in% pup2,] + delta
  mval_case[rownames(mval_case) %in% pdn2,] <-  mval_case[rownames(mval_case) %in% pdn2,] - delta
  mval2 <- cbind(mval_ctrl,mval_case)
  result <- list("mval"=mval2,"probes up"=pup2,"probes down"=pdn2,
    "genes up"=gup2,"genes down"=gdn2,
    "genesets up"=gtup,"genesets down"=gtdn)
  return(result)
}

GSAMETH function

# limma
runlimma <- function(mval,design,myann) {
  fit.reduced <- lmFit(mval,design)
  fit.reduced <- eBayes(fit.reduced)
  dm <- topTable(fit.reduced,coef=ncol(design), number = Inf)
  dm <- merge(myann,dm,by=0)
  dm <- dm[order(dm$P.Value),]
  rownames(dm) <- dm$Row.names
  dm$Row.names=NULL
  return(dm)
}

This is how to use the function

This could be complicated as it requires translation of symbols to entrez IDs, but could be simplified if the entrez translation is done at the gsameth step.

simgsa <- function(genesetdatabase, myann, mval, seed, frac_genes, frac_probes, groupsize, delta=1, num_dm_sets=50) {
  # generate gene sets
  gene_catalog <- unique(unlist(strsplit(myann$UCSC_RefGene_Name,";")))
  lengths <- unname(unlist(lapply(genesetdatabase,length)))
  gsets <- randomGeneSets(gene_catalog,lengths,seed=seed)
  # select gene sets to alter
  set.seed(seed) ; gset_mod <- sample(gsets,num_dm_sets)
  # incorporate select changes
  sim <- incorp_dm(genesets=gset_mod, myann=myann, mval=mval, seed=seed,
    frac_genes=0.5,frac_probes=0.5,groupsize=groupsize,delta=delta,
    gene_catalog=gene_catalog)
  # set up limma
  mval2 <- sim$mval
  ncols <- ncol(mval2)
  groupsize <- ncols/2
  ss <- data.frame(colnames(mval2))
  colnames(ss) <- "sample"
  ss$case <- c(rep(0,groupsize),rep(1,groupsize))
  d <- model.matrix(~ ss$case )
  dm3 <- runlimma(mval=mval2,design=d,myann=myann)
  pup3 <- rownames(subset(dm3,adj.P.Val<0.05 & logFC>0))
  pdn3 <- rownames(subset(dm3,adj.P.Val<0.05 & logFC<0))
  if ( length(pup3) < 250 ) { pup3 <- head(rownames(subset(dm3, logFC > 0)), 250) }
  if ( length(pdn3) < 250 ) { pdn3 <- head(rownames(subset(dm3, logFC < 0)), 250) }
  # convert gene sets to entrez
  suppressWarnings(suppressMessages({ gene2entrez <- mapIds(org.Hs.eg.db, gene_catalog, 'ENTREZID', 'SYMBOL') }))
  gsets_entrez <- lapply(gsets,function(gs) {
    gs2 <- unique(gene2entrez[names(gene2entrez) %in% gs])
    gs2 <- gs2[!is.na(gs2)]
    return(gs2)
  })
  suppressWarnings(suppressMessages({
    gsaup3 <- gsameth(sig.cpg=pup3, all.cpg=rownames(dm3), collection=gsets_entrez, array.type="EPIC")
    gsadn3 <- gsameth(sig.cpg=pdn3, all.cpg=rownames(dm3), collection=gsets_entrez, array.type="EPIC")
  }))
  gsig_up3 <- rownames(subset(gsaup3,FDR<0.05))
  gsig_dn3 <- rownames(subset(gsadn3,FDR<0.05))
  gtup <- names(sim[[6]])
  gtdn <- names(sim[[7]])
  UPTP=length(intersect(gsig_up3 ,gtup))
  UPFP=length(setdiff(gsig_up3 ,gtup))
  UPFN=length(setdiff(gtup,gsig_up3))
  DNTP=length(intersect(gsig_dn3 ,gtdn))
  DNFP=length(setdiff(gsig_dn3 ,gtdn))
  DNFN=length(setdiff(gtdn,gsig_dn3))
  TP=UPTP+DNTP
  FP=UPFP+DNFP
  FN=UPFN+DNFN
  TN=nrow(gsadn3)-DNTP-DNFP-DNFN-UPTP-UPFP-UPFN
  PREC=TP/(TP+FP)
  REC=TP/(TP+FN)
  F1=TP/(TP+(0.5*(FP+FN)))
  result <- c("TP"=TP,"FP"=FP,"FN"=FN,"TN"=TN,"PREC"=PREC,"REC"=REC)
  return(result)
}

LA function

This process runs limma first and then aggregates the results before doing an enrichment test.

# enrich parametric
ttenrich <- function(m,genesets,cores=1,testtype="selfcontained") {
  res <- mclapply( 1:length(genesets), function(i) {
    scores <- m[,1]
    gs <- genesets[i]
    name <- names(gs)
    n_members <- length(which(rownames(m) %in% gs[[1]]))
    if ( n_members > 4 ) {
      tstats <- m[which(rownames(m) %in% gs[[1]]),]
      myn <- length(tstats)
      mymean <- mean(tstats)
      mymedian <- median(tstats)
      if ( testtype == "selfcontained" ) { wt <- t.test(tstats) }
      if ( testtype == "competitive" ) { wt <- t.test(tstats,scores) }
      res <- c(name,myn,mymean,mymedian,wt$p.value)
    }
  } , mc.cores = cores)
  res_df <- do.call(rbind, res)
  rownames(res_df) <- res_df[,1]
  res_df <- res_df[,-1]
  colnames(res_df) <- c("n_genes","t_mean","t_median","pval")
  tmp <- apply(res_df,2,as.numeric)
  rownames(tmp) <- rownames(res_df)
  res_df <- tmp
  res_df <- as.data.frame(res_df)
  res_df <- res_df[order(res_df$pval),]
  res_df$logp <- -log10(res_df$pval )
  res_df$fdr <- p.adjust(res_df$pval,method="fdr")
  res_df[order(abs(res_df$pval)),]
  return(res_df)
}

# enrich non-parametric
wtenrich <- function(m,genesets,cores=1,testtype="selfcontained") {
  res <- mclapply( 1:length(genesets), function(i) {
    scores <- m[,1]
    gs <- genesets[i]
    name <- names(gs)
    n_members <- length(which(rownames(m) %in% gs[[1]]))
    if ( n_members > 4 ) {
      tstats <- m[which(rownames(m) %in% gs[[1]]),]
      myn <- length(tstats)
      mymean <- mean(tstats)
      mymedian <- median(tstats)
      if ( testtype == "selfcontained" ) { wt <- wilcox.test(tstats) }
      if ( testtype == "competitive" ) { wt <- wilcox.test(tstats,scores) }
      res <- c(name,myn,mymean,mymedian,wt$p.value)
    }
  } , mc.cores = cores)
  res_df <- do.call(rbind, res)
  rownames(res_df) <- res_df[,1]
  res_df <- res_df[,-1]
  colnames(res_df) <- c("n_genes","t_mean","t_median","pval")
  tmp <- apply(res_df,2,as.numeric)
  rownames(tmp) <- rownames(res_df)
  res_df <- tmp
  res_df <- as.data.frame(res_df)
  res_df <- res_df[order(res_df$pval),]
  res_df$logp <- -log10(res_df$pval )
  res_df$fdr <- p.adjust(res_df$pval,method="fdr")
  res_df[order(abs(res_df$pval)),]
  return(res_df)
}

# LA parametric competitive
simlac <- function(genesetdatabase, myann, mval, seed, frac_genes, frac_probes, groupsize, delta=1, num_dm_sets=50) {
  # generate gene sets
  gene_catalog <- unique(gt$gene)
  lengths <- unname(unlist(lapply(genesetdatabase,length)))
  gsets <- randomGeneSets(gene_catalog,lengths,seed=seed)
  # select gene sets to alter
  set.seed(seed) ; gset_mod <- sample(gsets,num_dm_sets)
  # incorporate select changes
  sim <- incorp_dm(genesets=gset_mod, myann=myann, mval=mval, seed=seed,
    frac_genes=0.5,frac_probes=0.5,groupsize=groupsize,delta=delta,
    gene_catalog=gene_catalog)
  # set up limma
  mval2 <- sim$mval
  ncols <- ncol(mval2)
  groupsize <- ncols/2
  ss <- data.frame(colnames(mval2))
  colnames(ss) <- "sample"
  ss$case <- c(rep(0,groupsize),rep(1,groupsize))
  d <- model.matrix(~ ss$case )
  dm3 <- runlimma(mval=mval2,design=d,myann=myann)
  dd <- merge(dm3,gt,by.x=0,by.y="probe")
  m1 <- aggregate(t ~ gene,dd,mean)
  rownames(m1) <- m1$gene
  m1$gene=NULL
  lares1 <- ttenrich(m=m1,genesets=gsets,cores=2,testtype="competitive")
  gsig_up3 <- rownames(subset(lares1, fdr < 0.05 & t_mean > 0))
  gsig_dn3 <- rownames(subset(lares1, fdr < 0.05 & t_mean < 0))
  gtup <- names(sim[[6]])
  gtdn <- names(sim[[7]])
  UPTP=length(intersect(gsig_up3 ,gtup))
  UPFP=length(setdiff(gsig_up3 ,gtup))
  UPFN=length(setdiff(gtup,gsig_up3))
  DNTP=length(intersect(gsig_dn3 ,gtdn))
  DNFP=length(setdiff(gsig_dn3 ,gtdn))
  DNFN=length(setdiff(gtdn,gsig_dn3))
  TP=UPTP+DNTP
  FP=UPFP+DNFP
  FN=UPFN+DNFN
  TN=nrow(lares1)-DNTP-DNFP-DNFN-UPTP-UPFP-UPFN
  PREC=TP/(TP+FP)
  REC=TP/(TP+FN)
  F1=TP/(TP+(0.5*(FP+FN)))
  result <- c("TP"=TP,"FP"=FP,"FN"=FN,"TN"=TN,"PREC"=PREC,"REC"=REC)
  return(result)
}

# LA parametric competitive top
simlactop <- function(genesetdatabase, myann, mval, seed, frac_genes, frac_probes, groupsize, delta=1, num_dm_sets=50) {
  # generate gene sets
  gene_catalog <- unique(gt$gene)
  lengths <- unname(unlist(lapply(genesetdatabase,length)))
  gsets <- randomGeneSets(gene_catalog,lengths,seed=seed)
  # select gene sets to alter
  set.seed(seed) ; gset_mod <- sample(gsets,num_dm_sets)
  # incorporate select changes
  sim <- incorp_dm(genesets=gset_mod, myann=myann, mval=mval, seed=seed,
    frac_genes=0.5,frac_probes=0.5,groupsize=groupsize,delta=delta,
    gene_catalog=gene_catalog)
  # set up limma
  mval2 <- sim$mval
  ncols <- ncol(mval2)
  groupsize <- ncols/2
  ss <- data.frame(colnames(mval2))
  colnames(ss) <- "sample"
  ss$case <- c(rep(0,groupsize),rep(1,groupsize))
  d <- model.matrix(~ ss$case )
  dm3 <- runlimma(mval=mval2,design=d,myann=myann)
  dd <- merge(dm3,gt,by.x=0,by.y="probe")
  m1 <- aggregate(t ~ gene,dd, function(x) {
    if (abs(max(x)) > abs(min(x))) { max(x) } else { min(x) }
  })
  rownames(m1) <- m1$gene
  m1$gene=NULL
  lares1 <- ttenrich(m=m1,genesets=gsets,cores=2,testtype="competitive")
  gsig_up3 <- rownames(subset(lares1, fdr < 0.05 & t_mean > 0))
  gsig_dn3 <- rownames(subset(lares1, fdr < 0.05 & t_mean < 0))
  gtup <- names(sim[[6]])
  gtdn <- names(sim[[7]])
  UPTP=length(intersect(gsig_up3 ,gtup))
  UPFP=length(setdiff(gsig_up3 ,gtup))
  UPFN=length(setdiff(gtup,gsig_up3))
  DNTP=length(intersect(gsig_dn3 ,gtdn))
  DNFP=length(setdiff(gsig_dn3 ,gtdn))
  DNFN=length(setdiff(gtdn,gsig_dn3))
  TP=UPTP+DNTP
  FP=UPFP+DNFP
  FN=UPFN+DNFN
  TN=nrow(lares1)-DNTP-DNFP-DNFN-UPTP-UPFP-UPFN
  PREC=TP/(TP+FP)
  REC=TP/(TP+FN)
  F1=TP/(TP+(0.5*(FP+FN)))
  result <- c("TP"=TP,"FP"=FP,"FN"=FN,"TN"=TN,"PREC"=PREC,"REC"=REC)
  return(result)
}

# LA nonparametric competitive
simnlac <- function(genesetdatabase, myann, mval, seed, frac_genes, frac_probes, groupsize, delta=1, num_dm_sets=50) {
  # generate gene sets
  gene_catalog <- unique(gt$gene)
  lengths <- unname(unlist(lapply(genesetdatabase,length)))
  gsets <- randomGeneSets(gene_catalog,lengths,seed=seed)
  # select gene sets to alter
  set.seed(seed) ; gset_mod <- sample(gsets,num_dm_sets)
  # incorporate select changes
  sim <- incorp_dm(genesets=gset_mod, myann=myann, mval=mval, seed=seed,
    frac_genes=0.5,frac_probes=0.5,groupsize=groupsize,delta=delta,
    gene_catalog=gene_catalog)
  # set up limma
  mval2 <- sim$mval
  ncols <- ncol(mval2)
  groupsize <- ncols/2
  ss <- data.frame(colnames(mval2))
  colnames(ss) <- "sample"
  ss$case <- c(rep(0,groupsize),rep(1,groupsize))
  d <- model.matrix(~ ss$case )
  dm3 <- runlimma(mval=mval2,design=d,myann=myann)
  dd <- merge(dm3,gt,by.x=0,by.y="probe")
  m1 <- aggregate(t ~ gene,dd,mean)
  rownames(m1) <- m1$gene
  m1$gene=NULL
  lares1 <- wtenrich(m=m1,genesets=gsets,cores=2,testtype="competitive")
  gsig_up3 <- rownames(subset(lares1, fdr < 0.05 & t_mean > 0))
  gsig_dn3 <- rownames(subset(lares1, fdr < 0.05 & t_mean < 0))
  gtup <- names(sim[[6]])
  gtdn <- names(sim[[7]])
  UPTP=length(intersect(gsig_up3 ,gtup))
  UPFP=length(setdiff(gsig_up3 ,gtup))
  UPFN=length(setdiff(gtup,gsig_up3))
  DNTP=length(intersect(gsig_dn3 ,gtdn))
  DNFP=length(setdiff(gsig_dn3 ,gtdn))
  DNFN=length(setdiff(gtdn,gsig_dn3))
  TP=UPTP+DNTP
  FP=UPFP+DNFP
  FN=UPFN+DNFN
  TN=nrow(lares1)-DNTP-DNFP-DNFN-UPTP-UPFP-UPFN
  PREC=TP/(TP+FP)
  REC=TP/(TP+FN)
  F1=TP/(TP+(0.5*(FP+FN)))
  result <- c("TP"=TP,"FP"=FP,"FN"=FN,"TN"=TN,"PREC"=PREC,"REC"=REC)
  return(result)
}

# LA competitive mitch
simlacm <- function(genesetdatabase, myann, mval, seed, frac_genes, frac_probes, groupsize, delta=1, num_dm_sets=50) {
  # generate gene sets
  gene_catalog <- unique(gt$gene)
  lengths <- unname(unlist(lapply(genesetdatabase,length)))
  gsets <- randomGeneSets(gene_catalog,lengths,seed=seed)
  # select gene sets to alter
  set.seed(seed) ; gset_mod <- sample(gsets,num_dm_sets)
  # incorporate select changes
  sim <- incorp_dm(genesets=gset_mod, myann=myann, mval=mval, seed=seed,
    frac_genes=0.5,frac_probes=0.5,groupsize=groupsize,delta=delta,
    gene_catalog=gene_catalog)
  # set up limma
  mval2 <- sim$mval
  ncols <- ncol(mval2)
  groupsize <- ncols/2
  ss <- data.frame(colnames(mval2))
  colnames(ss) <- "sample"
  ss$case <- c(rep(0,groupsize),rep(1,groupsize))
  d <- model.matrix(~ ss$case )
  dm3 <- runlimma(mval=mval2,design=d,myann=myann)
  dd <- merge(dm3,gt,by.x=0,by.y="probe")
  m1 <- aggregate(t ~ gene,dd,mean)
  rownames(m1) <- m1$gene
  m1$gene=NULL
  lamres1 <- runmitch(m=m1,genesets=gsets,cores=2)
  gsig_up3 <- rownames( subset( lamres1, p.adjustANOVA < 0.05 & s.dist > 0 ) )
  gsig_dn3 <- rownames( subset( lamres1, p.adjustANOVA < 0.05 & s.dist < 0 ) )
  gtup <- names(sim[[6]])
  gtdn <- names(sim[[7]])
  UPTP=length(intersect(gsig_up3 ,gtup))
  UPFP=length(setdiff(gsig_up3 ,gtup))
  UPFN=length(setdiff(gtup,gsig_up3))
  DNTP=length(intersect(gsig_dn3 ,gtdn))
  DNFP=length(setdiff(gsig_dn3 ,gtdn))
  DNFN=length(setdiff(gtdn,gsig_dn3))
  TP=UPTP+DNTP
  FP=UPFP+DNFP
  FN=UPFN+DNFN
  TN=nrow(lamres1)-DNTP-DNFP-DNFN-UPTP-UPFP-UPFN
  PREC=TP/(TP+FP)
  REC=TP/(TP+FN)
  F1=TP/(TP+(0.5*(FP+FN)))
  result <- c("TP"=TP,"FP"=FP,"FN"=FN,"TN"=TN,"PREC"=PREC,"REC"=REC)
  return(result)
}

# LA rank competitive mitch
simlrm <- function(genesetdatabase, myann, mval, seed, frac_genes, frac_probes, groupsize, delta=1, num_dm_sets=50) {
  # generate gene sets
  gene_catalog <- unique(gt$gene)
  lengths <- unname(unlist(lapply(genesetdatabase,length)))
  gsets <- randomGeneSets(gene_catalog,lengths,seed=seed)
  # select gene sets to alter
  set.seed(seed) ; gset_mod <- sample(gsets,num_dm_sets)
  # incorporate select changes
  sim <- incorp_dm(genesets=gset_mod, myann=myann, mval=mval, seed=seed,
    frac_genes=0.5,frac_probes=0.5,groupsize=groupsize,delta=delta,
    gene_catalog=gene_catalog)
  # set up limma
  mval2 <- sim$mval
  ncols <- ncol(mval2)
  groupsize <- ncols/2
  ss <- data.frame(colnames(mval2))
  colnames(ss) <- "sample"
  ss$case <- c(rep(0,groupsize),rep(1,groupsize))
  d <- model.matrix(~ ss$case )
  dm3 <- runlimma(mval=mval2,design=d,myann=myann)
  # rank probes first
  dm3$rank <-  rank(dm3$t) - nrow(subset(dm3,t<0))
  mm <- merge(dm3,gt,by.x=0,by.y="probe")
  head(mm)
  mma <-aggregate(mm$rank ~ gene,mm,mean)
  rownames(mma) <- mma$gene
  mma$gene = NULL
  colnames(mma) <- "meanrank"
  lrmres <- runmitch(m=mma,genesets=gsets,cores=2)
  gsig_up3 <- rownames( subset( lrmres, p.adjustANOVA < 0.05 & s.dist > 0 ) )
  gsig_dn3 <- rownames( subset( lrmres, p.adjustANOVA < 0.05 & s.dist < 0 ) )
  gtup <- names(sim[[6]])
  gtdn <- names(sim[[7]])
  UPTP=length(intersect(gsig_up3 ,gtup))
  UPFP=length(setdiff(gsig_up3 ,gtup))
  UPFN=length(setdiff(gtup,gsig_up3))
  DNTP=length(intersect(gsig_dn3 ,gtdn))
  DNFP=length(setdiff(gsig_dn3 ,gtdn))
  DNFN=length(setdiff(gtdn,gsig_dn3))
  TP=UPTP+DNTP
  FP=UPFP+DNFP
  FN=UPFN+DNFN
  TN=nrow(lrmres)-DNTP-DNFP-DNFN-UPTP-UPFP-UPFN
  PREC=TP/(TP+FP)
  REC=TP/(TP+FN)
  F1=TP/(TP+(0.5*(FP+FN)))
  result <- c("TP"=TP,"FP"=FP,"FN"=FN,"TN"=TN,"PREC"=PREC,"REC"=REC)
  return(result)
}

AL: Aggregate limma

Functions for aggregate-limma-enrich approach.

# AL approach parametric competitive
simalc <- function(genesetdatabase, myann, mval, seed, frac_genes, frac_probes, groupsize, delta=1, num_dm_sets=50) {
  # generate gene sets
  gene_catalog <- unique(gt$gene)
  lengths <- unname(unlist(lapply(genesetdatabase,length)))
  gsets <- randomGeneSets(gene_catalog,lengths,seed=seed)
  # select gene sets to alter
  set.seed(seed) ; gset_mod <- sample(gsets,num_dm_sets)
  # incorporate select changes
  sim <- incorp_dm(genesets=gset_mod, myann=myann, mval=mval, seed=seed,
    frac_genes=0.5,frac_probes=0.5,groupsize=groupsize,delta=delta,
    gene_catalog=gene_catalog)
  # set up limma
  mval2 <- sim$mval
  ncols <- ncol(mval2)
  groupsize <- ncols/2
  ss <- data.frame(colnames(mval2))
  colnames(ss) <- "sample"
  ss$case <- c(rep(0,groupsize),rep(1,groupsize))
  d <- model.matrix(~ ss$case )
  # al pipeline
  mm <- merge(mval2,gt,by.x=0,by.y="probe")
  mm$Row.names = NULL
  a <- aggregate(. ~ gene,mm,mean)
  rownames(a) <- a$gene
  a$gene=NULL
  fit.reduced <- lmFit(a,d)
  fit.reduced <- eBayes(fit.reduced)
  al <- topTable(fit.reduced,coef=ncol(d), number = Inf)
  m1 <- as.data.frame(al$t)
  rownames(m1) <- rownames(al)
  colnames(m1) <- "t"
  alres1 <- ttenrich(m=m1,genesets=gsets,cores=2,testtype="competitive")
  # summarise results
  gsig_up3 <- rownames(subset(alres1, fdr < 0.05 & t_mean > 0))
  gsig_dn3 <- rownames(subset(alres1, fdr < 0.05 & t_mean < 0))
  gtup <- names(sim[[6]])
  gtdn <- names(sim[[7]])
  UPTP=length(intersect(gsig_up3 ,gtup))
  UPFP=length(setdiff(gsig_up3 ,gtup))
  UPFN=length(setdiff(gtup,gsig_up3))
  DNTP=length(intersect(gsig_dn3 ,gtdn))
  DNFP=length(setdiff(gsig_dn3 ,gtdn))
  DNFN=length(setdiff(gtdn,gsig_dn3))
  TP=UPTP+DNTP
  FP=UPFP+DNFP
  FN=UPFN+DNFN
  TN=nrow(alres1)-DNTP-DNFP-DNFN-UPTP-UPFP-UPFN
  PREC=TP/(TP+FP)
  REC=TP/(TP+FN)
  F1=TP/(TP+(0.5*(FP+FN)))
  result <- c("TP"=TP,"FP"=FP,"FN"=FN,"TN"=TN,"PREC"=PREC,"REC"=REC)
  return(result)
}

# AL approach nonparametric competitive
simnalc <- function(genesetdatabase, myann, mval, seed, frac_genes, frac_probes, groupsize, delta=1, num_dm_sets=50) {
  # generate gene sets
  gene_catalog <- unique(gt$gene)
  lengths <- unname(unlist(lapply(genesetdatabase,length)))
  gsets <- randomGeneSets(gene_catalog,lengths,seed=seed)
  # select gene sets to alter
  set.seed(seed) ; gset_mod <- sample(gsets,num_dm_sets)
  # incorporate select changes
  sim <- incorp_dm(genesets=gset_mod, myann=myann, mval=mval, seed=seed,
    frac_genes=0.5,frac_probes=0.5,groupsize=groupsize,delta=delta,
    gene_catalog=gene_catalog)
  # set up limma
  mval2 <- sim$mval
  ncols <- ncol(mval2)
  groupsize <- ncols/2
  ss <- data.frame(colnames(mval2))
  colnames(ss) <- "sample"
  ss$case <- c(rep(0,groupsize),rep(1,groupsize))
  d <- model.matrix(~ ss$case )
  # al pipeline
  mm <- merge(mval2,gt,by.x=0,by.y="probe")
  mm$Row.names = NULL
  a <- aggregate(. ~ gene,mm,mean)
  rownames(a) <- a$gene
  a$gene=NULL
  fit.reduced <- lmFit(a,d)
  fit.reduced <- eBayes(fit.reduced)
  al <- topTable(fit.reduced,coef=ncol(d), number = Inf)
  m1 <- as.data.frame(al$t)
  rownames(m1) <- rownames(al)
  colnames(m1) <- "t"
  alres1 <- wtenrich(m=m1,genesets=gsets,cores=2,testtype="competitive")
  # summarise results
  gsig_up3 <- rownames(subset(alres1, fdr < 0.05 & t_mean > 0))
  gsig_dn3 <- rownames(subset(alres1, fdr < 0.05 & t_mean < 0))
  gtup <- names(sim[[6]])
  gtdn <- names(sim[[7]])
  UPTP=length(intersect(gsig_up3 ,gtup))
  UPFP=length(setdiff(gsig_up3 ,gtup))
  UPFN=length(setdiff(gtup,gsig_up3))
  DNTP=length(intersect(gsig_dn3 ,gtdn))
  DNFP=length(setdiff(gsig_dn3 ,gtdn))
  DNFN=length(setdiff(gtdn,gsig_dn3))
  TP=UPTP+DNTP
  FP=UPFP+DNFP
  FN=UPFN+DNFN
  TN=nrow(alres1)-DNTP-DNFP-DNFN-UPTP-UPFP-UPFN
  PREC=TP/(TP+FP)
  REC=TP/(TP+FN)
  F1=TP/(TP+(0.5*(FP+FN)))
  result <- c("TP"=TP,"FP"=FP,"FN"=FN,"TN"=TN,"PREC"=PREC,"REC"=REC)
  return(result)
}

Agg-limma-mitch function

This approach uses the aggregated mvals, limma and instead of a 1-sample t-test it uses mitch which is a competitive test and could give more interpretable results.

runmitch <- function(m,genesets,cores=1) {
  suppressMessages({ mres <- mitch_calc(m,genesets,minsetsize=5,cores=cores) })
  mres <- mres$enrichment_result
  rownames(mres) <- mres$set
  mres$set=NULL
  return(mres)
}

simalm <- function(genesetdatabase, myann, mval, seed, frac_genes, frac_probes, groupsize, delta=1, num_dm_sets=50) {
  # generate gene sets
  gene_catalog <- unique(gt$gene)
  lengths <- unname(unlist(lapply(genesetdatabase,length)))
  gsets <- randomGeneSets(gene_catalog,lengths,seed=seed)
  # select gene sets to alter
  set.seed(seed) ; gset_mod <- sample(gsets,num_dm_sets)
  # incorporate select changes
  sim <- incorp_dm(genesets=gset_mod, myann=myann, mval=mval, seed=seed,
    frac_genes=0.5,frac_probes=0.5,groupsize=groupsize,delta=delta,
    gene_catalog=gene_catalog)
  # set up limma
  mval2 <- sim$mval
  ncols <- ncol(mval2)
  groupsize <- ncols/2
  ss <- data.frame(colnames(mval2))
  colnames(ss) <- "sample"
  ss$case <- c(rep(0,groupsize),rep(1,groupsize))
  d <- model.matrix(~ ss$case )
  # alm pipeline
  mm <- merge(mval2,gt,by.x=0,by.y="probe")
  mm$Row.names = NULL
  a <- aggregate(. ~ gene,mm,mean)
  rownames(a) <- a$gene
  a$gene=NULL
  fit.reduced <- lmFit(a,d)
  fit.reduced <- eBayes(fit.reduced)
  al <- topTable(fit.reduced,coef=ncol(d), number = Inf)
  m1 <- as.data.frame(al$t)
  rownames(m1) <- rownames(al)
  colnames(m1) <- "t"
  almres1 <- runmitch(m=m1,genesets=gsets,cores=2)
  # summarise results
  gsig_up3 <- rownames( subset( almres1, p.adjustANOVA < 0.05 & s.dist > 0 ) )
  gsig_dn3 <- rownames( subset( almres1, p.adjustANOVA < 0.05 & s.dist < 0 ) )
  gtup <- names(sim[[6]])
  gtdn <- names(sim[[7]])
  UPTP=length(intersect(gsig_up3 ,gtup))
  UPFP=length(setdiff(gsig_up3 ,gtup))
  UPFN=length(setdiff(gtup,gsig_up3))
  DNTP=length(intersect(gsig_dn3 ,gtdn))
  DNFP=length(setdiff(gsig_dn3 ,gtdn))
  DNFN=length(setdiff(gtdn,gsig_dn3))
  TP=UPTP+DNTP
  FP=UPFP+DNFP
  FN=UPFN+DNFN
  TN=nrow(almres1)-DNTP-DNFP-DNFN-UPTP-UPFP-UPFN
  PREC=TP/(TP+FP)
  REC=TP/(TP+FN)
  F1=TP/(TP+(0.5*(FP+FN)))
  result <- c("TP"=TP,"FP"=FP,"FN"=FN,"TN"=TN,"PREC"=PREC,"REC"=REC)
  return(result)
}

AA Aggregate-aggregate-limma functions

Use mean value works well here.

gsagg <- function(x,genesets,cores=1) {
  meds <- mclapply(1:length(genesets), function(i) {
    gs = genesets[[i]]
    xx <- x[rownames(x) %in% gs,]
    med <- apply(xx,2,mean)
  },mc.cores=cores)
  mymed <- do.call(rbind,meds)
  rownames(mymed) <- names(genesets)
  as.data.frame(mymed)
}

aalimma <- function(agag,design) {
  fit.reduced <- lmFit(agag,design)
  fit.reduced <- eBayes(fit.reduced)
  dmagg <- topTable(fit.reduced,coef=ncol(design), number = Inf)
  return(dmagg)
}

aal <- function(mval,myann,genesets,design,cores=1) {
  medf <- chragg(mval,myann,cores=cores)
  agag <- gsagg(x=medf,genesets=genesets,cores=cores)
  aalres <- aalimma(agag=agag,design=design)
  return(aalres)
}

simaa <- function(genesetdatabase, myann, mval, seed, frac_genes, frac_probes, groupsize, delta=1, num_dm_sets=50) {
  # generate gene sets
  gene_catalog <- unique(gt$gene)
  lengths <- unname(unlist(lapply(genesetdatabase,length)))
  gsets <- randomGeneSets(gene_catalog,lengths,seed=seed)
  # select gene sets to alter
  set.seed(seed) ; gset_mod <- sample(gsets,num_dm_sets)
  # incorporate select changes
  sim <- incorp_dm(genesets=gset_mod, myann=myann, mval=mval, seed=seed,
    frac_genes=0.5,frac_probes=0.5,groupsize=groupsize,delta=delta,
    gene_catalog=gene_catalog)
  # set up limma
  mval2 <- sim$mval
  ncols <- ncol(mval2)
  groupsize <- ncols/2
  ss <- data.frame(colnames(mval2))
  colnames(ss) <- "sample"
  ss$case <- c(rep(0,groupsize),rep(1,groupsize))
  d <- model.matrix(~ ss$case )
  mm <- merge(mval2,gt,by.x=0,by.y="probe")
  mm$Row.names = NULL
  a <- aggregate(. ~ gene,mm,mean)
  rownames(a) <- a$gene
  a$gene=NULL
  mystack <- stack(gsets)
  mmm <- merge(a,mystack,by.x=0,by.y="values")
  mmm$Row.names=NULL
  aa <- aggregate(. ~ ind,mmm,mean)
  rownames(aa) <- aa$ind
  aa$ind=NULL
  fit.reduced <- lmFit(aa,d)
  fit.reduced <- eBayes(fit.reduced)
  aares1 <- topTable(fit.reduced,coef=ncol(d), number = Inf)
  # summarise results
  gsig_up3 <- rownames(subset(aares1, adj.P.Val < 0.05 & logFC > 0))
  gsig_dn3 <- rownames(subset(aares1, adj.P.Val < 0.05 & logFC < 0))
  gtup <- names(sim[[6]])
  gtdn <- names(sim[[7]])
  UPTP=length(intersect(gsig_up3 ,gtup))
  UPFP=length(setdiff(gsig_up3 ,gtup))
  UPFN=length(setdiff(gtup,gsig_up3))
  DNTP=length(intersect(gsig_dn3 ,gtdn))
  DNFP=length(setdiff(gsig_dn3 ,gtdn))
  DNFN=length(setdiff(gtdn,gsig_dn3))
  TP=UPTP+DNTP
  FP=UPFP+DNFP
  FN=UPFN+DNFN
  TN=nrow(aares1)-DNTP-DNFP-DNFN-UPTP-UPFP-UPFN
  PREC=TP/(TP+FP)
  REC=TP/(TP+FN)
  F1=TP/(TP+(0.5*(FP+FN)))
  result <- c("TP"=TP,"FP"=FP,"FN"=FN,"TN"=TN,"PREC"=PREC,"REC"=REC)
  return(result)
}

F1 <- function(x,y) {
  ( 2 * x * y ) / ( x + y )
}

Run analyses

Set assumptions.

num_dm_sets=50
sims=50

groupsizes=c(3,5,8,12)
deltas=c(0.1,0.2,0.3,0.4,0.5)
#groupsizes=5
#deltas=0.4

params <- expand.grid("groupsizes"=groupsizes,"deltas"=deltas)

params %>% kbl(caption="Parameters to test") %>% kable_paper("hover", full_width = F)
Parameters to test
groupsizes deltas
3 0.1
5 0.1
8 0.1
12 0.1
3 0.2
5 0.2
8 0.2
12 0.2
3 0.3
5 0.3
8 0.3
12 0.3
3 0.4
5 0.4
8 0.4
12 0.4
3 0.5
5 0.5
8 0.5
12 0.5

GSA meth sim

Cannot be run in multicore due to fragility of AnnotationDbi SQLite objects.

gres <- lapply(1:nrow(params) , function(j) {
  groupsize = params[j,1]
  delta = params[j,2]
  res <- lapply(1:sims,function(i) {
    simgsa(genesetdatabase=gsets, myann=myann, mval=normal_mval, seed=i*100, frac_genes=0.5,
      frac_probes=0.5, groupsize=groupsize, delta=delta, num_dm_sets=num_dm_sets)
  })
  res <- do.call(rbind,res)
  return(res)
})

gres2 <- do.call(rbind,lapply(gres,colMeans))
gres2[,"PREC"] <- gres2[,"TP"] / ( gres2[,"TP"] + gres2[,"FP"] )

gres2 %>% kbl(caption="GSAmeth results") %>% kable_paper("hover", full_width = F)
GSAmeth results
TP FP FN TN PREC REC
0.00 0.02 50.00 949.98 0.0000000 0.0000
0.00 0.00 50.00 950.00 NaN 0.0000
0.00 0.00 50.00 950.00 NaN 0.0000
0.06 0.00 49.94 950.00 1.0000000 0.0012
0.00 0.00 50.00 950.00 NaN 0.0000
0.02 0.00 49.98 950.00 1.0000000 0.0004
0.76 0.12 49.24 949.88 0.8636364 0.0152
5.82 0.64 44.18 949.36 0.9009288 0.1164
0.02 0.02 49.98 949.98 0.5000000 0.0004
0.80 0.12 49.20 949.88 0.8695652 0.0160
7.32 0.64 42.68 949.36 0.9195980 0.1464
26.06 3.68 23.94 946.32 0.8762609 0.5212
0.12 0.04 49.88 949.96 0.7500000 0.0024
4.78 0.30 45.22 949.70 0.9409449 0.0956
22.92 4.22 27.08 945.78 0.8445099 0.4584
49.84 19.16 0.16 930.84 0.7223188 0.9968
0.94 0.12 49.06 949.88 0.8867925 0.0188
9.28 1.00 40.72 949.00 0.9027237 0.1856
49.34 16.02 0.66 933.98 0.7548960 0.9868
50.00 22.48 0.00 927.52 0.6898455 1.0000
gres3p <- do.call(rbind,lapply(groupsizes, function (g) { gres2[params$groupsizes==g,"PREC"] }))
colnames(gres3p) <- deltas
rownames(gres3p) <- groupsizes
gres3p %>% kbl(caption="GSAmeth precision") %>% kable_paper("hover", full_width = F)
GSAmeth precision
0.1 0.2 0.3 0.4 0.5
3 0 NaN 0.5000000 0.7500000 0.8867925
5 NaN 1.0000000 0.8695652 0.9409449 0.9027237
8 NaN 0.8636364 0.9195980 0.8445099 0.7548960
12 1 0.9009288 0.8762609 0.7223188 0.6898455
gres3r <- do.call(rbind,lapply(groupsizes, function (g) { gres2[params$groupsizes==g,"REC"] }))
colnames(gres3r) <- deltas
rownames(gres3r) <- groupsizes
gres3r %>% kbl(caption="GSAmeth recall") %>% kable_paper("hover", full_width = F)
GSAmeth recall
0.1 0.2 0.3 0.4 0.5
3 0.0000 0.0000 0.0004 0.0024 0.0188
5 0.0000 0.0004 0.0160 0.0956 0.1856
8 0.0000 0.0152 0.1464 0.4584 0.9868
12 0.0012 0.1164 0.5212 0.9968 1.0000
F1(gres3p,gres3r) %>% kbl(caption="GSAmeth F1") %>% kable_paper("hover", full_width = F)
GSAmeth F1
0.1 0.2 0.3 0.4 0.5
3 NaN NaN 0.0007994 0.0047847 0.0368194
5 NaN 0.0007997 0.0314218 0.1735657 0.3078965
8 NaN 0.0298742 0.2525880 0.5942442 0.8554092
12 0.0023971 0.2061637 0.6536243 0.8376471 0.8164598

LA sim

parametric competitive

lacres <- lapply(1:nrow(params) , function(j) {
  groupsize = params[j,1]
  delta = params[j,2]
  res <- mclapply(1:sims,function(i) {
    simlac(genesetdatabase=gsets, myann=myann, mval=normal_mval, seed=i*100, frac_genes=0.5, frac_probes=0.5,
      groupsize=groupsize, delta=delta, num_dm_sets=num_dm_sets)
  },mc.cores=6)
  res <- do.call(rbind,res)
  return(res)
})

lacres2 <- do.call(rbind,lapply(lacres,colMeans))
lacres2[,"PREC"] <- lacres2[,"TP"] / ( lacres2[,"TP"] + lacres2[,"FP"] )

lacres2 %>% kbl(caption="LA parametric results") %>% kable_paper("hover", full_width = F)
LA parametric results
TP FP FN TN PREC REC
0.02 0.02 49.98 949.98 0.5000000 0.0004
0.14 0.10 49.86 949.90 0.5833333 0.0028
0.38 0.12 49.62 949.88 0.7600000 0.0076
1.04 0.56 48.96 949.44 0.6500000 0.0208
1.06 0.22 48.94 949.78 0.8281250 0.0212
3.12 1.20 46.88 948.80 0.7222222 0.0624
9.38 2.60 40.62 947.40 0.7829716 0.1876
16.78 4.56 33.22 945.44 0.7863168 0.3356
6.78 1.96 43.22 948.04 0.7757437 0.1356
15.02 4.20 34.98 945.80 0.7814776 0.3004
23.96 4.76 26.04 945.24 0.8342618 0.4792
32.74 3.58 17.26 946.42 0.9014317 0.6548
16.80 3.20 33.20 946.80 0.8400000 0.3360
26.52 4.94 23.48 945.06 0.8429752 0.5304
33.14 4.18 16.86 945.82 0.8879957 0.6628
40.86 2.12 9.14 947.88 0.9506747 0.8172
27.02 3.28 22.98 946.72 0.8917492 0.5404
34.10 4.00 15.90 946.00 0.8950131 0.6820
38.92 3.18 11.08 946.82 0.9244656 0.7784
44.30 1.58 5.70 948.42 0.9655623 0.8860
lacres3p <- do.call(rbind,lapply(groupsizes, function (g) { lacres2[params$groupsizes==g,"PREC"] }))
colnames(lacres3p) <- deltas
rownames(lacres3p) <- groupsizes
lacres3p %>% kbl(caption="LA parametric precision") %>% kable_paper("hover", full_width = F)
LA parametric precision
0.1 0.2 0.3 0.4 0.5
3 0.5000000 0.8281250 0.7757437 0.8400000 0.8917492
5 0.5833333 0.7222222 0.7814776 0.8429752 0.8950131
8 0.7600000 0.7829716 0.8342618 0.8879957 0.9244656
12 0.6500000 0.7863168 0.9014317 0.9506747 0.9655623
lacres3r <- do.call(rbind,lapply(groupsizes, function (g) { lacres2[params$groupsizes==g,"REC"] }))
colnames(lacres3r) <- deltas
rownames(lacres3r) <- groupsizes
lacres3r %>% kbl(caption="LA parametric recall") %>% kable_paper("hover", full_width = F)
LA parametric recall
0.1 0.2 0.3 0.4 0.5
3 0.0004 0.0212 0.1356 0.3360 0.5404
5 0.0028 0.0624 0.3004 0.5304 0.6820
8 0.0076 0.1876 0.4792 0.6628 0.7784
12 0.0208 0.3356 0.6548 0.8172 0.8860
F1(lacres3p,lacres3r) %>% kbl(caption="LA parametric F1") %>% kable_paper("hover", full_width = F)
LA parametric F1
0.1 0.2 0.3 0.4 0.5
3 0.0007994 0.0413417 0.2308478 0.4800000 0.6729763
5 0.0055732 0.1148748 0.4339786 0.6511171 0.7741203
8 0.0150495 0.3026783 0.6087398 0.7590472 0.8451683
12 0.0403101 0.4704233 0.7585728 0.8788987 0.9240718

parametric competitive with “top” aggregation

Too many false positives.

lactopres <- lapply(1:nrow(params) , function(j) {
  groupsize = params[j,1]
  delta = params[j,2]
  res <- mclapply(1:sims,function(i) {
    simlactop(genesetdatabase=gsets, myann=myann, mval=normal_mval, seed=i*100, frac_genes=0.5, frac_probes=0.5,
      groupsize=groupsize, delta=delta, num_dm_sets=num_dm_sets)
  },mc.cores=6)
  res <- do.call(rbind,res)
  return(res)
})

lactopres2 <- do.call(rbind,lapply(lactopres,colMeans))
lactopres2[,"PREC"] <- lactopres2[,"TP"] / ( lactopres2[,"TP"] + lactopres2[,"FP"] )

lactopres2 %>% kbl(caption="LA parametric results (top agg)") %>% kable_paper("hover", full_width = F)
LA parametric results (top agg)
TP FP FN TN PREC REC
0.00 0.16 50.00 949.84 0.0000000 0.0000
0.06 0.22 49.94 949.78 0.2142857 0.0012
0.20 0.24 49.80 949.76 0.4545455 0.0040
0.44 0.26 49.56 949.74 0.6285714 0.0088
0.28 0.18 49.72 949.82 0.6086957 0.0056
1.52 0.38 48.48 949.62 0.8000000 0.0304
5.96 1.28 44.04 948.72 0.8232044 0.1192
17.90 2.80 32.10 947.20 0.8647343 0.3580
2.18 0.50 47.82 949.50 0.8134328 0.0436
12.28 1.66 37.72 948.34 0.8809182 0.2456
24.78 3.48 25.22 946.52 0.8768577 0.4956
36.90 2.84 13.10 947.16 0.9285355 0.7380
10.26 0.92 39.74 949.08 0.9177102 0.2052
26.82 3.26 23.18 946.74 0.8916223 0.5364
37.00 3.02 13.00 946.98 0.9245377 0.7400
43.90 2.34 6.10 947.66 0.9493945 0.8780
21.48 2.08 28.52 947.92 0.9117148 0.4296
36.46 3.40 13.54 946.60 0.9147015 0.7292
43.02 2.60 6.98 947.40 0.9430075 0.8604
46.72 2.14 3.28 947.86 0.9562014 0.9344
lactopres3p <- do.call(rbind,lapply(groupsizes, function (g) { lactopres2[params$groupsizes==g,"PREC"] }))
colnames(lactopres3p) <- deltas
rownames(lactopres3p) <- groupsizes
lactopres3p %>% kbl(caption="LA parametric precision (top agg)") %>% kable_paper("hover", full_width = F)
LA parametric precision (top agg)
0.1 0.2 0.3 0.4 0.5
3 0.0000000 0.6086957 0.8134328 0.9177102 0.9117148
5 0.2142857 0.8000000 0.8809182 0.8916223 0.9147015
8 0.4545455 0.8232044 0.8768577 0.9245377 0.9430075
12 0.6285714 0.8647343 0.9285355 0.9493945 0.9562014
lactopres3r <- do.call(rbind,lapply(groupsizes, function (g) { lactopres2[params$groupsizes==g,"REC"] }))
colnames(lactopres3r) <- deltas
rownames(lactopres3r) <- groupsizes
lactopres3r %>% kbl(caption="LA parametric recall (top agg)") %>% kable_paper("hover", full_width = F)
LA parametric recall (top agg)
0.1 0.2 0.3 0.4 0.5
3 0.0000 0.0056 0.0436 0.2052 0.4296
5 0.0012 0.0304 0.2456 0.5364 0.7292
8 0.0040 0.1192 0.4956 0.7400 0.8604
12 0.0088 0.3580 0.7380 0.8780 0.9344
F1(lactopres3p,lactopres3r) %>% kbl(caption="LA parametric F1 (top agg)") %>% kable_paper("hover", full_width = F)
LA parametric F1 (top agg)
0.1 0.2 0.3 0.4 0.5
3 NaN 0.0110979 0.0827639 0.3354037 0.5840131
5 0.0023866 0.0585742 0.3841101 0.6698302 0.8114845
8 0.0079302 0.2082460 0.6332737 0.8220395 0.8998118
12 0.0173570 0.5063649 0.8223758 0.9123026 0.9451750

nonparametric competitive

nlacres <- lapply(1:nrow(params) , function(j) {
  groupsize = params[j,1]
  delta = params[j,2]
  res <- mclapply(1:sims,function(i) {
    simnlac(genesetdatabase=gsets, myann=myann, mval=normal_mval, seed=i*100, frac_genes=0.5, frac_probes=0.5,
      groupsize=groupsize, delta=delta, num_dm_sets=num_dm_sets)
  },mc.cores=6)
  res <- do.call(rbind,res)
  return(res)
})

nlacres2 <- do.call(rbind,lapply(nlacres,colMeans))
nlacres2[,"PREC"] <- nlacres2[,"TP"] / ( nlacres2[,"TP"] + nlacres2[,"FP"] )

nlacres2 %>% kbl(caption="LA nonparametric results") %>% kable_paper("hover", full_width = F)
LA nonparametric results
TP FP FN TN PREC REC
0.04 0.12 49.96 949.88 0.2500000 0.0008
0.24 0.16 49.76 949.84 0.6000000 0.0048
0.90 0.30 49.10 949.70 0.7500000 0.0180
2.64 1.40 47.36 948.60 0.6534653 0.0528
2.42 1.08 47.58 948.92 0.6914286 0.0484
7.00 3.14 43.00 946.86 0.6903353 0.1400
13.62 4.70 36.38 945.30 0.7434498 0.2724
20.16 6.58 29.84 943.42 0.7539267 0.4032
11.32 3.92 38.68 946.08 0.7427822 0.2264
19.00 6.62 31.00 943.38 0.7416081 0.3800
25.76 6.46 24.24 943.54 0.7995034 0.5152
34.22 5.08 15.78 944.92 0.8707379 0.6844
21.96 5.02 28.04 944.98 0.8139362 0.4392
28.54 6.46 21.46 943.54 0.8154286 0.5708
33.56 5.52 16.44 944.48 0.8587513 0.6712
39.88 3.60 10.12 946.40 0.9172033 0.7976
29.64 4.94 20.36 945.06 0.8571429 0.5928
34.78 5.68 15.22 944.32 0.8596144 0.6956
37.90 4.28 12.10 945.72 0.8985301 0.7580
42.18 3.06 7.82 946.94 0.9323607 0.8436
nlacres3p <- do.call(rbind,lapply(groupsizes, function (g) { nlacres2[params$groupsizes==g,"PREC"] }))
colnames(nlacres3p) <- deltas
rownames(nlacres3p) <- groupsizes
nlacres3p %>% kbl(caption="LA nonparametric precision") %>% kable_paper("hover", full_width = F)
LA nonparametric precision
0.1 0.2 0.3 0.4 0.5
3 0.2500000 0.6914286 0.7427822 0.8139362 0.8571429
5 0.6000000 0.6903353 0.7416081 0.8154286 0.8596144
8 0.7500000 0.7434498 0.7995034 0.8587513 0.8985301
12 0.6534653 0.7539267 0.8707379 0.9172033 0.9323607
nlacres3r <- do.call(rbind,lapply(groupsizes, function (g) { nlacres2[params$groupsizes==g,"REC"] }))
colnames(nlacres3r) <- deltas
rownames(nlacres3r) <- groupsizes
nlacres3r %>% kbl(caption="LA nonparametric recall") %>% kable_paper("hover", full_width = F)
LA nonparametric recall
0.1 0.2 0.3 0.4 0.5
3 0.0008 0.0484 0.2264 0.4392 0.5928
5 0.0048 0.1400 0.3800 0.5708 0.6956
8 0.0180 0.2724 0.5152 0.6712 0.7580
12 0.0528 0.4032 0.6844 0.7976 0.8436
F1(nlacres3p,nlacres3r) %>% kbl(caption="LA nonparametric F1") %>% kable_paper("hover", full_width = F)
LA nonparametric F1
0.1 0.2 0.3 0.4 0.5
3 0.0015949 0.0904673 0.3470264 0.5705378 0.7008749
5 0.0095238 0.2327902 0.5025126 0.6715294 0.7689587
8 0.0351562 0.3987119 0.6266115 0.7534800 0.8223042
12 0.0977054 0.5254105 0.7664054 0.8532306 0.8857623

LA competitive mitch

lacmres <- lapply(1:nrow(params) , function(j) {
  groupsize = params[j,1]
  delta = params[j,2]
  res <- mclapply(1:sims,function(i) {
    simlacm(genesetdatabase=gsets, myann=myann, mval=normal_mval, seed=i*100, frac_genes=0.5, frac_probes=0.5,
      groupsize=groupsize, delta=delta, num_dm_sets=num_dm_sets)
  },mc.cores=6)
  res <- do.call(rbind,res)
  return(res)
})

lacmres2 <- do.call(rbind,lapply(lacmres,colMeans))
lacmres2[,"PREC"] <- lacmres2[,"TP"] / ( lacmres2[,"TP"] + lacmres2[,"FP"] )

lacmres2 %>% kbl(caption="LA mitch results") %>% kable_paper("hover", full_width = F)
LA mitch results
TP FP FN TN PREC REC
0.14 0.10 49.86 949.90 0.5833333 0.0028
0.36 0.06 49.64 949.94 0.8571429 0.0072
1.22 0.08 48.78 949.92 0.9384615 0.0244
3.74 0.40 46.26 949.60 0.9033816 0.0748
3.26 0.38 46.74 949.62 0.8956044 0.0652
9.62 0.78 40.38 949.22 0.9250000 0.1924
17.76 1.08 32.24 948.92 0.9426752 0.3552
25.34 1.70 24.66 948.30 0.9371302 0.5068
14.50 0.96 35.50 949.04 0.9379043 0.2900
24.16 1.72 25.84 948.28 0.9335394 0.4832
30.54 1.90 19.46 948.10 0.9414303 0.6108
37.00 2.56 13.00 947.44 0.9352882 0.7400
26.00 1.62 24.00 948.38 0.9413469 0.5200
32.96 2.48 17.04 947.52 0.9300226 0.6592
37.06 2.26 12.94 947.74 0.9425229 0.7412
40.90 2.92 9.10 947.08 0.9333638 0.8180
32.68 2.30 17.32 947.70 0.9342481 0.6536
37.74 2.90 12.26 947.10 0.9286417 0.7548
39.86 2.44 10.14 947.56 0.9423168 0.7972
42.42 2.98 7.58 947.02 0.9343612 0.8484
lacmres3p <- do.call(rbind,lapply(groupsizes, function (g) { lacmres2[params$groupsizes==g,"PREC"] }))
colnames(lacmres3p) <- deltas
rownames(lacmres3p) <- groupsizes
lacmres3p %>% kbl(caption="LA mitch precision") %>% kable_paper("hover", full_width = F)
LA mitch precision
0.1 0.2 0.3 0.4 0.5
3 0.5833333 0.8956044 0.9379043 0.9413469 0.9342481
5 0.8571429 0.9250000 0.9335394 0.9300226 0.9286417
8 0.9384615 0.9426752 0.9414303 0.9425229 0.9423168
12 0.9033816 0.9371302 0.9352882 0.9333638 0.9343612
lacmres3r <- do.call(rbind,lapply(groupsizes, function (g) { lacmres2[params$groupsizes==g,"REC"] }))
colnames(lacmres3r) <- deltas
rownames(lacmres3r) <- groupsizes
lacmres3r %>% kbl(caption="LA mitch recall") %>% kable_paper("hover", full_width = F)
LA mitch recall
0.1 0.2 0.3 0.4 0.5
3 0.0028 0.0652 0.2900 0.5200 0.6536
5 0.0072 0.1924 0.4832 0.6592 0.7548
8 0.0244 0.3552 0.6108 0.7412 0.7972
12 0.0748 0.5068 0.7400 0.8180 0.8484
F1(lacmres3p,lacmres3r) %>% kbl(caption="LA mitch F1") %>% kable_paper("hover", full_width = F)
LA mitch F1
0.1 0.2 0.3 0.4 0.5
3 0.0055732 0.1215511 0.4430186 0.6699304 0.7691221
5 0.0142800 0.3185430 0.6367949 0.7715356 0.8327449
8 0.0475634 0.5159791 0.7409025 0.8298253 0.8637053
12 0.1381603 0.6578401 0.8262617 0.8718823 0.8893082

LA rank probes

lrmres <- lapply(1:nrow(params) , function(j) {
  groupsize = params[j,1]
  delta = params[j,2]
  res <- mclapply(1:sims,function(i) {
    simlrm(genesetdatabase=gsets, myann=myann, mval=normal_mval, seed=i*100, frac_genes=0.5, frac_probes=0.5,
      groupsize=groupsize, delta=delta, num_dm_sets=num_dm_sets)
  },mc.cores=6)
  res <- do.call(rbind,res)
  return(res)
})

lrmres2 <- do.call(rbind,lapply(lrmres,colMeans))
lrmres2[,"PREC"] <- lrmres2[,"TP"] / ( lrmres2[,"TP"] + lrmres2[,"FP"] )

lrmres2 %>% kbl(caption="LA rank mitch results") %>% kable_paper("hover", full_width = F)
LA rank mitch results
TP FP FN TN PREC REC
0.12 0.12 49.88 949.88 0.5000000 0.0024
0.28 0.02 49.72 949.98 0.9333333 0.0056
0.98 0.04 49.02 949.96 0.9607843 0.0196
2.52 0.30 47.48 949.70 0.8936170 0.0504
2.32 0.18 47.68 949.82 0.9280000 0.0464
6.60 0.58 43.40 949.42 0.9192201 0.1320
11.86 0.58 38.14 949.42 0.9533762 0.2372
17.70 1.26 32.30 948.74 0.9335443 0.3540
10.46 0.62 39.54 949.38 0.9440433 0.2092
16.80 1.22 33.20 948.78 0.9322974 0.3360
21.74 1.34 28.26 948.66 0.9419411 0.4348
25.46 1.68 24.54 948.32 0.9380987 0.5092
17.72 1.02 32.28 948.98 0.9455710 0.3544
23.52 1.72 26.48 948.28 0.9318542 0.4704
25.62 1.50 24.38 948.50 0.9446903 0.5124
28.70 1.92 21.30 948.08 0.9372959 0.5740
23.32 1.44 26.68 948.56 0.9418417 0.4664
27.12 2.04 22.88 947.96 0.9300412 0.5424
27.96 1.78 22.04 948.22 0.9401479 0.5592
29.84 1.98 20.16 948.02 0.9377750 0.5968
lrmres3p <- do.call(rbind,lapply(groupsizes, function (g) { lrmres2[params$groupsizes==g,"PREC"] }))
colnames(lrmres3p) <- deltas
rownames(lrmres3p) <- groupsizes
lrmres3p %>% kbl(caption="LA rank mitch precision") %>% kable_paper("hover", full_width = F)
LA rank mitch precision
0.1 0.2 0.3 0.4 0.5
3 0.5000000 0.9280000 0.9440433 0.9455710 0.9418417
5 0.9333333 0.9192201 0.9322974 0.9318542 0.9300412
8 0.9607843 0.9533762 0.9419411 0.9446903 0.9401479
12 0.8936170 0.9335443 0.9380987 0.9372959 0.9377750
lrmres3r <- do.call(rbind,lapply(groupsizes, function (g) { lrmres2[params$groupsizes==g,"REC"] }))
colnames(lrmres3r) <- deltas
rownames(lrmres3r) <- groupsizes
lrmres3r %>% kbl(caption="LA rank mitch recall") %>% kable_paper("hover", full_width = F)
LA rank mitch recall
0.1 0.2 0.3 0.4 0.5
3 0.0024 0.0464 0.2092 0.3544 0.4664
5 0.0056 0.1320 0.3360 0.4704 0.5424
8 0.0196 0.2372 0.4348 0.5124 0.5592
12 0.0504 0.3540 0.5092 0.5740 0.5968
F1(lrmres3p,lacmres3r) %>% kbl(caption="LA rank mitch F1") %>% kable_paper("hover", full_width = F)
LA rank mitch F1
0.1 0.2 0.3 0.4 0.5
3 0.0055688 0.1218397 0.4437001 0.6709971 0.7716831
5 0.0142898 0.3181985 0.6365057 0.7721651 0.8333071
8 0.0475914 0.5175690 0.7410606 0.8306643 0.8627931
12 0.1380450 0.6569544 0.8273566 0.8735941 0.8908515

AL sim

parametric competitive

alcres <- lapply(1:nrow(params) , function(j) {
  groupsize = params[j,1]
  delta = params[j,2]
  res <- mclapply(1:sims,function(i) {
    simalc(genesetdatabase=gsets, myann=myann, mval=normal_mval, seed=i*100, frac_genes=0.5, frac_probes=0.5,
      groupsize=groupsize, delta=delta, num_dm_sets=num_dm_sets)
  },mc.cores=6)
  res <- do.call(rbind,res)
  return(res)
})

alcres2 <- do.call(rbind,lapply(alcres,colMeans))
alcres2[,"PREC"] <- alcres2[,"TP"] / ( alcres2[,"TP"] + alcres2[,"FP"] )

alcres2 %>% kbl(caption="AL parametric results") %>% kable_paper("hover", full_width = F)
AL parametric results
TP FP FN TN PREC REC
0.06 0.08 49.94 949.92 0.4285714 0.0012
0.08 0.10 49.92 949.90 0.4444444 0.0016
0.42 0.20 49.58 949.80 0.6774194 0.0084
1.44 0.74 48.56 949.26 0.6605505 0.0288
1.14 0.36 48.86 949.64 0.7600000 0.0228
3.98 1.68 46.02 948.32 0.7031802 0.0796
11.60 3.76 38.40 946.24 0.7552083 0.2320
19.12 5.80 30.88 944.20 0.7672552 0.3824
7.40 2.50 42.60 947.50 0.7474747 0.1480
17.80 5.16 32.20 944.84 0.7752613 0.3560
26.36 5.64 23.64 944.36 0.8237500 0.5272
35.48 4.40 14.52 945.60 0.8896690 0.7096
19.68 4.20 30.32 945.80 0.8241206 0.3936
29.26 5.54 20.74 944.46 0.8408046 0.5852
35.10 4.76 14.90 945.24 0.8805820 0.7020
42.34 2.66 7.66 947.34 0.9408889 0.8468
29.48 3.94 20.52 946.06 0.8821065 0.5896
36.88 4.48 13.12 945.52 0.8916828 0.7376
40.42 3.42 9.58 946.58 0.9219891 0.8084
45.76 1.86 4.24 948.14 0.9609408 0.9152
alcres3p <- do.call(rbind,lapply(groupsizes, function (g) { alcres2[params$groupsizes==g,"PREC"] }))
colnames(alcres3p) <- deltas
rownames(alcres3p) <- groupsizes
alcres3p %>% kbl(caption="AL parametric precision") %>% kable_paper("hover", full_width = F)
AL parametric precision
0.1 0.2 0.3 0.4 0.5
3 0.4285714 0.7600000 0.7474747 0.8241206 0.8821065
5 0.4444444 0.7031802 0.7752613 0.8408046 0.8916828
8 0.6774194 0.7552083 0.8237500 0.8805820 0.9219891
12 0.6605505 0.7672552 0.8896690 0.9408889 0.9609408
alcres3r <- do.call(rbind,lapply(groupsizes, function (g) { alcres2[params$groupsizes==g,"REC"] }))
colnames(alcres3r) <- deltas
rownames(alcres3r) <- groupsizes
alcres3r %>% kbl(caption="AL parametric recall") %>% kable_paper("hover", full_width = F)
AL parametric recall
0.1 0.2 0.3 0.4 0.5
3 0.0012 0.0228 0.1480 0.3936 0.5896
5 0.0016 0.0796 0.3560 0.5852 0.7376
8 0.0084 0.2320 0.5272 0.7020 0.8084
12 0.0288 0.3824 0.7096 0.8468 0.9152
F1(alcres3p,alcres3r) %>% kbl(caption="AL parametric F1") %>% kable_paper("hover", full_width = F)
AL parametric F1
0.1 0.2 0.3 0.4 0.5
3 0.0023933 0.0442718 0.2470785 0.5327558 0.7067849
5 0.0031885 0.1430111 0.4879386 0.6900943 0.8073555
8 0.0165942 0.3549572 0.6429268 0.7812152 0.8614663
12 0.0551936 0.5104111 0.7894971 0.8913684 0.9375128

nonparametric competitive

nalcres <- lapply(1:nrow(params) , function(j) {
  groupsize = params[j,1]
  delta = params[j,2]
  res <- mclapply(1:sims,function(i) {
    simnalc(genesetdatabase=gsets, myann=myann, mval=normal_mval, seed=i*100, frac_genes=0.5, frac_probes=0.5,
      groupsize=groupsize, delta=delta, num_dm_sets=num_dm_sets)
  },mc.cores=6)
  res <- do.call(rbind,res)
  return(res)
})

nalcres2 <- do.call(rbind,lapply(nalcres,colMeans))
nalcres2[,"PREC"] <- nalcres2[,"TP"] / ( nalcres2[,"TP"] + nalcres2[,"FP"] )

nalcres2 %>% kbl(caption="AL nonparametric results") %>% kable_paper("hover", full_width = F)
AL nonparametric results
TP FP FN TN PREC REC
0.06 0.12 49.94 949.88 0.3333333 0.0012
0.16 0.14 49.84 949.86 0.5333333 0.0032
0.54 0.22 49.46 949.78 0.7105263 0.0108
1.96 1.16 48.04 948.84 0.6282051 0.0392
1.80 0.68 48.20 949.32 0.7258065 0.0360
6.06 2.56 43.94 947.44 0.7030162 0.1212
12.24 4.38 37.76 945.62 0.7364621 0.2448
19.02 6.16 30.98 943.84 0.7553614 0.3804
10.16 3.34 39.84 946.66 0.7525926 0.2032
18.64 6.18 31.36 943.82 0.7510073 0.3728
25.24 6.26 24.76 943.74 0.8012698 0.5048
33.60 5.34 16.40 944.66 0.8628659 0.6720
21.22 5.06 28.78 944.94 0.8074581 0.4244
28.18 6.12 21.82 943.88 0.8215743 0.5636
33.52 5.80 16.48 944.20 0.8524924 0.6704
39.90 4.04 10.10 945.96 0.9080564 0.7980
29.44 4.96 20.56 945.04 0.8558140 0.5888
34.62 4.80 15.38 945.20 0.8782344 0.6924
38.14 4.58 11.86 945.42 0.8927903 0.7628
42.74 3.34 7.26 946.66 0.9275174 0.8548
nalcres3p <- do.call(rbind,lapply(groupsizes, function (g) { nalcres2[params$groupsizes==g,"PREC"] }))
colnames(nalcres3p) <- deltas
rownames(nalcres3p) <- groupsizes
nalcres3p %>% kbl(caption="AL nonparametric precision") %>% kable_paper("hover", full_width = F)
AL nonparametric precision
0.1 0.2 0.3 0.4 0.5
3 0.3333333 0.7258065 0.7525926 0.8074581 0.8558140
5 0.5333333 0.7030162 0.7510073 0.8215743 0.8782344
8 0.7105263 0.7364621 0.8012698 0.8524924 0.8927903
12 0.6282051 0.7553614 0.8628659 0.9080564 0.9275174
nalcres3r <- do.call(rbind,lapply(groupsizes, function (g) { nalcres2[params$groupsizes==g,"REC"] }))
colnames(nalcres3r) <- deltas
rownames(nalcres3r) <- groupsizes
nalcres3r %>% kbl(caption="AL nonparametric recall") %>% kable_paper("hover", full_width = F)
AL nonparametric recall
0.1 0.2 0.3 0.4 0.5
3 0.0012 0.0360 0.2032 0.4244 0.5888
5 0.0032 0.1212 0.3728 0.5636 0.6924
8 0.0108 0.2448 0.5048 0.6704 0.7628
12 0.0392 0.3804 0.6720 0.7980 0.8548
F1(nalcres3p,nalcres3r) %>% kbl(caption="AL nonparametric F1") %>% kable_paper("hover", full_width = F)
AL nonparametric F1
0.1 0.2 0.3 0.4 0.5
3 0.0023914 0.0685976 0.3200000 0.5563713 0.6976303
5 0.0063618 0.2067554 0.4982625 0.6685647 0.7743234
8 0.0212766 0.3674572 0.6193865 0.7505598 0.8226920
12 0.0737952 0.5059856 0.7555655 0.8494784 0.8896753

ALM competitive test

almres <- lapply(1:nrow(params) , function(j) {
  groupsize = params[j,1]
  delta = params[j,2]
  res <- mclapply(1:sims,function(i) {
    simalm(genesetdatabase=gsets, myann=myann, mval=normal_mval, seed=i*100, frac_genes=0.5, frac_probes=0.5,
      groupsize=groupsize, delta=delta, num_dm_sets=num_dm_sets)
  },mc.cores=6)
  res <- do.call(rbind,res)
  return(res)
})

almres2 <- do.call(rbind,lapply(almres,colMeans))
almres2[,"PREC"] <- almres2[,"TP"] / ( almres2[,"TP"] + almres2[,"FP"] )

almres2 %>% kbl(caption="ALM results") %>% kable_paper("hover", full_width = F)
ALM results
TP FP FN TN PREC REC
0.10 0.10 49.90 949.90 0.5000000 0.0020
0.30 0.06 49.70 949.94 0.8333333 0.0060
0.78 0.04 49.22 949.96 0.9512195 0.0156
3.10 0.28 46.90 949.72 0.9171598 0.0620
2.42 0.18 47.58 949.82 0.9307692 0.0484
8.28 0.56 41.72 949.44 0.9366516 0.1656
16.14 1.02 33.86 948.98 0.9405594 0.3228
23.76 1.68 26.24 948.32 0.9339623 0.4752
13.02 0.82 36.98 949.18 0.9407514 0.2604
23.30 1.66 26.70 948.34 0.9334936 0.4660
29.84 1.98 20.16 948.02 0.9377750 0.5968
36.64 2.62 13.36 947.38 0.9332654 0.7328
25.00 1.64 25.00 948.36 0.9384384 0.5000
32.46 2.28 17.54 947.72 0.9343696 0.6492
37.02 2.52 12.98 947.48 0.9362671 0.7404
41.12 3.16 8.88 946.84 0.9286360 0.8224
32.34 2.26 17.66 947.74 0.9346821 0.6468
37.12 2.54 12.88 947.46 0.9359556 0.7424
40.36 2.72 9.64 947.28 0.9368617 0.8072
42.94 3.34 7.06 946.66 0.9278306 0.8588
almres3p <- do.call(rbind,lapply(groupsizes, function (g) { almres2[params$groupsizes==g,"PREC"] }))
colnames(almres3p) <- deltas
rownames(almres3p) <- groupsizes
almres3p %>% kbl(caption="ALM precision") %>% kable_paper("hover", full_width = F)
ALM precision
0.1 0.2 0.3 0.4 0.5
3 0.5000000 0.9307692 0.9407514 0.9384384 0.9346821
5 0.8333333 0.9366516 0.9334936 0.9343696 0.9359556
8 0.9512195 0.9405594 0.9377750 0.9362671 0.9368617
12 0.9171598 0.9339623 0.9332654 0.9286360 0.9278306
almres3r <- do.call(rbind,lapply(groupsizes, function (g) { almres2[params$groupsizes==g,"REC"] }))
colnames(almres3r) <- deltas
rownames(almres3r) <- groupsizes
almres3r %>% kbl(caption="ALM recall") %>% kable_paper("hover", full_width = F)
ALM recall
0.1 0.2 0.3 0.4 0.5
3 0.0020 0.0484 0.2604 0.5000 0.6468
5 0.0060 0.1656 0.4660 0.6492 0.7424
8 0.0156 0.3228 0.5968 0.7404 0.8072
12 0.0620 0.4752 0.7328 0.8224 0.8588
F1(almres3p,almres3r) %>% kbl(caption="ALM F1") %>% kable_paper("hover", full_width = F)
ALM F1
0.1 0.2 0.3 0.4 0.5
3 0.0039841 0.0920152 0.4078947 0.6524008 0.7645390
5 0.0119142 0.2814412 0.6216649 0.7661081 0.8280170
8 0.0306966 0.4806432 0.7294060 0.8268930 0.8672110
12 0.1161484 0.6299046 0.8209724 0.8722953 0.8919817

AA sim

aares <- lapply(1:nrow(params) , function(j) {
  groupsize = params[j,1]
  delta = params[j,2]
  res <- mclapply(1:sims,function(i) {
    simaa(genesetdatabase=gsets, myann=myann, mval=normal_mval, seed=i*100, frac_genes=0.5, frac_probes=0.5,
      groupsize=groupsize, delta=delta, num_dm_sets=num_dm_sets)
  },mc.cores=6)
  res <- do.call(rbind,res)
  return(res)
})

aares2 <- do.call(rbind,lapply(aares,colMeans))
aares2[,"PREC"] <- aares2[,"TP"] / ( aares2[,"TP"] + aares2[,"FP"] )

aares2 %>% kbl(caption="AA results") %>% kable_paper("hover", full_width = F)
AA results
TP FP FN TN PREC REC
1.92 66.50 48.08 883.50 0.0280620 0.0384
0.50 19.46 49.50 930.54 0.0250501 0.0100
0.00 0.00 50.00 950.00 NaN 0.0000
0.00 0.00 50.00 950.00 NaN 0.0000
1.98 66.34 48.02 883.66 0.0289813 0.0396
0.50 19.30 49.50 930.70 0.0252525 0.0100
0.36 3.28 49.64 946.72 0.0989011 0.0072
0.00 0.00 50.00 950.00 NaN 0.0000
2.00 66.42 48.00 883.58 0.0292312 0.0400
0.50 19.06 49.50 930.94 0.0255624 0.0100
0.44 4.60 49.56 945.40 0.0873016 0.0088
0.00 0.00 50.00 950.00 NaN 0.0000
2.22 65.80 47.78 884.20 0.0326375 0.0444
0.64 18.96 49.36 931.04 0.0326531 0.0128
0.80 6.08 49.20 943.92 0.1162791 0.0160
0.20 0.00 49.80 950.00 1.0000000 0.0040
2.56 65.50 47.44 884.50 0.0376139 0.0512
1.36 18.80 48.64 931.20 0.0674603 0.0272
1.62 6.80 48.38 943.20 0.1923990 0.0324
0.70 0.00 49.30 950.00 1.0000000 0.0140
aares3p <- do.call(rbind,lapply(groupsizes, function (g) { aares2[params$groupsizes==g,"PREC"] }))
colnames(aares3p) <- deltas
rownames(aares3p) <- groupsizes
aares3p %>% kbl(caption="AA precision") %>% kable_paper("hover", full_width = F)
AA precision
0.1 0.2 0.3 0.4 0.5
3 0.0280620 0.0289813 0.0292312 0.0326375 0.0376139
5 0.0250501 0.0252525 0.0255624 0.0326531 0.0674603
8 NaN 0.0989011 0.0873016 0.1162791 0.1923990
12 NaN NaN NaN 1.0000000 1.0000000
aares3r <- do.call(rbind,lapply(groupsizes, function (g) { aares2[params$groupsizes==g,"REC"] }))
colnames(aares3r) <- deltas
rownames(aares3r) <- groupsizes
aares3r %>% kbl(caption="AA recall") %>% kable_paper("hover", full_width = F)
AA recall
0.1 0.2 0.3 0.4 0.5
3 0.0384 0.0396 0.0400 0.0444 0.0512
5 0.0100 0.0100 0.0100 0.0128 0.0272
8 0.0000 0.0072 0.0088 0.0160 0.0324
12 0.0000 0.0000 0.0000 0.0040 0.0140
F1(aares3p,almres3r) %>% kbl(caption="AA F1") %>% kable_paper("hover", full_width = F)
AA F1
0.1 0.2 0.3 0.4 0.5
3 0.0037339 0.0362541 0.0525621 0.0612752 0.0710934
5 0.0096812 0.0438225 0.0484661 0.0621787 0.1236819
8 NaN 0.1514119 0.1523212 0.2009925 0.3107336
12 NaN NaN NaN 0.9025461 0.9240370

Session information

save.image("GSE158422_simulate100.Rdata")

sessionInfo()
## R version 4.3.2 (2023-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] kableExtra_1.3.4                                   
##  [2] mitch_1.15.0                                       
##  [3] psych_2.3.9                                        
##  [4] org.Hs.eg.db_3.18.0                                
##  [5] AnnotationDbi_1.64.1                               
##  [6] HGNChelper_0.8.1                                   
##  [7] IlluminaHumanMethylation450kmanifest_0.4.0         
##  [8] missMethyl_1.36.0                                  
##  [9] IlluminaHumanMethylationEPICanno.ilm10b4.hg19_0.6.0
## [10] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.1 
## [11] minfi_1.48.0                                       
## [12] bumphunter_1.44.0                                  
## [13] locfit_1.5-9.8                                     
## [14] iterators_1.0.14                                   
## [15] foreach_1.5.2                                      
## [16] Biostrings_2.70.1                                  
## [17] XVector_0.42.0                                     
## [18] SummarizedExperiment_1.32.0                        
## [19] Biobase_2.62.0                                     
## [20] MatrixGenerics_1.14.0                              
## [21] matrixStats_1.2.0                                  
## [22] GenomicRanges_1.54.1                               
## [23] GenomeInfoDb_1.38.2                                
## [24] IRanges_2.36.0                                     
## [25] S4Vectors_0.40.2                                   
## [26] BiocGenerics_0.48.1                                
## [27] limma_3.58.1                                       
## [28] stringi_1.8.3                                      
## 
## loaded via a namespace (and not attached):
##   [1] splines_4.3.2             later_1.3.2              
##   [3] BiocIO_1.12.0             bitops_1.0-7             
##   [5] filelock_1.0.3            BiasedUrn_2.0.11         
##   [7] tibble_3.2.1              preprocessCore_1.64.0    
##   [9] XML_3.99-0.16             lifecycle_1.0.4          
##  [11] lattice_0.22-5            MASS_7.3-60              
##  [13] base64_2.0.1              scrime_1.3.5             
##  [15] magrittr_2.0.3            sass_0.4.8               
##  [17] rmarkdown_2.25            jquerylib_0.1.4          
##  [19] yaml_2.3.8                httpuv_1.6.13            
##  [21] doRNG_1.8.6               askpass_1.2.0            
##  [23] DBI_1.1.3                 RColorBrewer_1.1-3       
##  [25] abind_1.4-5               zlibbioc_1.48.0          
##  [27] rvest_1.0.3               quadprog_1.5-8           
##  [29] purrr_1.0.2               RCurl_1.98-1.13          
##  [31] rappdirs_0.3.3            GenomeInfoDbData_1.2.11  
##  [33] genefilter_1.84.0         annotate_1.80.0          
##  [35] svglite_2.1.3             DelayedMatrixStats_1.24.0
##  [37] codetools_0.2-19          DelayedArray_0.28.0      
##  [39] xml2_1.3.6                tidyselect_1.2.0         
##  [41] beanplot_1.3.1            BiocFileCache_2.10.1     
##  [43] webshot_0.5.5             illuminaio_0.44.0        
##  [45] GenomicAlignments_1.38.0  jsonlite_1.8.8           
##  [47] multtest_2.58.0           ellipsis_0.3.2           
##  [49] survival_3.5-7            systemfonts_1.0.5        
##  [51] tools_4.3.2               progress_1.2.3           
##  [53] Rcpp_1.0.11               glue_1.6.2               
##  [55] gridExtra_2.3             mnormt_2.1.1             
##  [57] SparseArray_1.2.2         xfun_0.41                
##  [59] dplyr_1.1.4               HDF5Array_1.30.0         
##  [61] fastmap_1.1.1             GGally_2.2.0             
##  [63] rhdf5filters_1.14.1       fansi_1.0.6              
##  [65] openssl_2.1.1             caTools_1.18.2           
##  [67] digest_0.6.33             R6_2.5.1                 
##  [69] mime_0.12                 colorspace_2.1-0         
##  [71] gtools_3.9.5              biomaRt_2.58.0           
##  [73] RSQLite_2.3.4             utf8_1.2.4               
##  [75] tidyr_1.3.0               generics_0.1.3           
##  [77] data.table_1.14.10        rtracklayer_1.62.0       
##  [79] prettyunits_1.2.0         httr_1.4.7               
##  [81] htmlwidgets_1.6.4         S4Arrays_1.2.0           
##  [83] ggstats_0.5.1             pkgconfig_2.0.3          
##  [85] gtable_0.3.4              blob_1.2.4               
##  [87] siggenes_1.76.0           htmltools_0.5.7          
##  [89] echarts4r_0.4.5           scales_1.3.0             
##  [91] png_0.1-8                 rstudioapi_0.15.0        
##  [93] knitr_1.45                reshape2_1.4.4           
##  [95] tzdb_0.4.0                rjson_0.2.21             
##  [97] nlme_3.1-163              curl_5.2.0               
##  [99] cachem_1.0.8              rhdf5_2.46.1             
## [101] stringr_1.5.1             KernSmooth_2.23-22       
## [103] restfulr_0.0.15           GEOquery_2.70.0          
## [105] pillar_1.9.0              grid_4.3.2               
## [107] reshape_0.8.9             vctrs_0.6.5              
## [109] gplots_3.1.3              promises_1.2.1           
## [111] dbplyr_2.4.0              xtable_1.8-4             
## [113] beeswarm_0.4.0            evaluate_0.23            
## [115] readr_2.1.4               GenomicFeatures_1.54.1   
## [117] cli_3.6.2                 compiler_4.3.2           
## [119] Rsamtools_2.18.0          rlang_1.1.2              
## [121] crayon_1.5.2              rngtools_1.5.2           
## [123] nor1mix_1.3-2             mclust_6.0.1             
## [125] plyr_1.8.9                viridisLite_0.4.2        
## [127] BiocParallel_1.36.0       munsell_0.5.0            
## [129] Matrix_1.6-1.1            hms_1.1.3                
## [131] sparseMatrixStats_1.14.0  bit64_4.0.5              
## [133] ggplot2_3.4.4             Rhdf5lib_1.24.1          
## [135] KEGGREST_1.42.0           statmod_1.5.0            
## [137] shiny_1.8.0               highr_0.10               
## [139] memoise_2.0.1             bslib_0.6.1              
## [141] bit_4.0.5

Notes

LA

  • simla: parametric self-contained

  • simlac: parametric competitive

  • simnla: nonparametric self-contained

  • simnlac: nonparametric competitive

AL

  • simal: parametric self-contained

  • simalc: parametric competitive

  • simnal: nonparametric self-contained

  • simnalc: nonparametric competitive