

Predicted microRNA targets are often submitted to functional enrichment analysis to infer the likely function of a microRNA, but this practice has logical problems. First, not all predicted targets are expressed in a given biological tissue. Second, enrichment analysis requires a background list consisting of all the genes that could have been measured. Because not all genes are expressed in a tissue (at least half are silenced), each enrichment analysis requires a custom background gene list. Unfortunately, a custom background list is rarely used. We argue that this dramatically distorts the results.
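To make the background problem concrete, the sketch below runs the same one-sided hypergeometric over-representation test twice: once against a whole-genome background and once against a tissue-specific background. Every number is invented for illustration, and the helper name enrich_p is an assumption rather than part of this analysis.

```r
# One-sided hypergeometric over-representation test.
# k: foreground genes in the gene set, n: foreground size,
# K: gene set members in the background, N: background size.
enrich_p <- function(k, n, K, N) {
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

# Hypothetical foreground: 500 expressed predicted targets,
# 40 of which belong to the pathway of interest.
n_targets <- 500
k_hits <- 40

# Whole-genome background: 20000 genes, 800 in the pathway (4%).
p_genome <- enrich_p(k_hits, n_targets, K = 800, N = 20000)

# Tissue background: 10000 detected genes, 700 in the pathway (7%).
p_tissue <- enrich_p(k_hits, n_targets, K = 700, N = 10000)

c(genome = p_genome, tissue = p_tissue)
```

With these made-up numbers the pathway looks strongly enriched against the whole genome but not against the genes actually detected in the tissue, because the pathway is itself over-represented among expressed genes.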

Import read counts

Import the RNA-seq count tables and combine them into a single matrix.

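# find the count tables anywhere below the working directory
myfiles <- list.files(".",pattern="ke.tsv",recursive=TRUE)

# read each file with the first column as row names
x <- lapply(myfiles,function(x) {
  xx <- read.table(x,header=TRUE,row.names=1)
  xx
})

# bind the per-file tables column-wise and tidy the sample names
x <- do.call(cbind,x)
colnames(x) <- gsub("_est_counts","",colnames(x))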

Gene symbols are needed to map the transcripts to genes.

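library("getDEE2")

# fetch the DEE2 metadata, then download one run to obtain the transcript annotation
mdat <- getDEE2Metadata("hsapiens")
d <- getDEE2(species="hsapiens",SRRvec="SRR11509477",mdat,outfile="NULL",counts="GeneCounts",legacy=TRUE)
txinfo <- d$TxInfo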

Merge the counts with txinfo and aggregate transcript counts to gene level.

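# join counts and annotation on their row names (transcript IDs)
xm <- merge(x,txinfo,by=0)
# build a combined "GeneID GeneSymbol" label for each transcript
xm$GeneID_symbol <- paste(xm$GeneID,xm$GeneSymbol)
# drop the annotation columns so only counts and the label remain
xm$Row.names = xm$GeneID = xm$GeneSymbol = xm$TxLength = NULL
# sum transcript counts within each gene
xa <- aggregate(. ~ GeneID_symbol,xm,sum)
rownames(xa) <- xa[,1]
xa[,1] = NULL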
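The aggregation step can be checked on a toy table before trusting it on the full matrix. The sketch below uses invented transcript counts and hypothetical gene labels; it only illustrates how the formula interface of aggregate() sums rows that share a label.

```r
# Toy transcript-level counts: two transcripts of gene A, one of gene B.
toy <- data.frame(
  GeneID_symbol = c("ENSG01 GENEA", "ENSG01 GENEA", "ENSG02 GENEB"),
  s1 = c(5, 7, 3),
  s2 = c(1, 2, 4)
)

# Sum all count columns within each gene label.
toy_gene <- aggregate(. ~ GeneID_symbol, toy, sum)

# Move the label into the row names, as in the main analysis.
rownames(toy_gene) <- toy_gene[, 1]
toy_gene[, 1] <- NULL
toy_gene
```

One caveat worth keeping in mind: the formula method of aggregate() omits rows containing NA by default, so transcripts with missing values in any column would be dropped before summing.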

Differential expression

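# keep genes with a mean of at least 10 reads per sample
xaf <- xa[which(rowMeans(xa)>=10),]
dim(xa) ; dim(xaf)

# sample sheet: trt codes the two groups (1 or 0) in column order
ss <- data.frame("run"=colnames(xaf),"trt"=c(1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0))
rownames(ss) <- ss$run

# multidimensional scaling of the samples; text() needs an open plot to draw on
mds <- cmdscale(dist(t(xaf)))
plot(mds, xlab="Coordinate 1", ylab="Coordinate 2", type="n")
text(mds, labels=rownames(mds) ,col="black")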

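library("DESeq2")

dds <- DESeqDataSetFromMatrix(countData = round(xaf) , colData = ss , design = ~ trt )
# fit the model; this defines res, which results() reads below
res <- DESeq(dds)
z <- results(res)
vsd <- vst(dds, blind=FALSE)
# combine the test statistics with the variance-stabilised counts
zz <- cbind(as.data.frame(z),assay(vsd))
dge <- as.data.frame(zz[order(zz$pvalue),])
# significantly up- and down-regulated genes at FDR < 0.05
ups <- rownames(subset(dge,padj<0.05 & log2FoldChange>0))
dns <- rownames(subset(dge,padj<0.05 & log2FoldChange<0))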
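Returning to the background argument from the introduction, one possible way to carry these results into enrichment analysis is to treat the genes that passed the detection filter as the background, and to restrict any predicted target list to that set. The sketch below uses a small made-up results table; toy_dge, background, targets, and the gene labels are all hypothetical names for illustration, not objects from this analysis.

```r
# Made-up differential expression table with the same columns used above.
toy_dge <- data.frame(
  log2FoldChange = c(1.2, -0.8, 0.1, 2.0, -1.5),
  padj = c(0.01, 0.03, 0.80, 0.20, 0.001),
  row.names = c("G1 A", "G2 B", "G3 C", "G4 D", "G5 E")
)

# Every gene that survived filtering is a candidate background gene.
background <- rownames(toy_dge)

# Same selection rule as for ups and dns.
toy_ups <- rownames(subset(toy_dge, padj < 0.05 & log2FoldChange > 0))
toy_dns <- rownames(subset(toy_dge, padj < 0.05 & log2FoldChange < 0))

# Hypothetical predicted targets; "G9 Z" was not detected in this tissue.
targets <- c("G2 B", "G5 E", "G9 Z")

# Only targets that are actually expressed can be tested for enrichment.
expressed_targets <- intersect(targets, background)
expressed_targets
```

In the real analysis the background would be rownames(xaf), so that undetected genes are excluded from both the foreground and the background.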

Session information

For reproducibility.

sessionInfo()

