Source: https://github.com/markziemann/combined_enrichment
This script is designed to fetch GTEXv8 RNA-seq expression data in order to assess the correlation of genes within a gene set.
suppressPackageStartupMessages({
# library("kableExtra")
library("beeswarm")
# library("getDEE2")
library("mitch")
library("gplots")
library("Hmisc")
library("parallel")
library("vioplot")
})
res2 <- read.table("cor.tsv",sep="\t")
par(mfrow=c(2,1))
hist(res2[,"stat.t"],breaks=100,main="",xlab="test statistic")
boxplot(res2[,"stat.t"],horizontal=TRUE)
hist(res2[,"stat.t"],xlim=c(-50,50),breaks=100,main="",xlab="test statistic")
boxplot(res2[,"stat.t"],ylim=c(-50,50),horizontal=TRUE)
par(mfrow=c(1,1))
pos <- length( which(res2[,1]>0 & res2[,2] < 0.05) )
neg <- length( which(res2[,1]<0 & res2[,2] < 0.05) )
none <- length( which(res2[,2] > 0.05) )
bars <- c("Pos"=pos,"None"=none,"Neg"=neg)
barplot(bars,ylab="no. gene sets with correlated expression",ylim=c(0,max(bars)*1.3))
text(((1:length(bars))-0.5)*1.25,bars+50,labels=bars)
This work confirms the observations by Hong, that genes in a set are on the whole, most gene sets positively correlated members.
Some gene sets are highly correlated, while others are heterogenous, or slightly anticorrelated.
We can try to reproduce this correlation structure in simulated data.
sessionInfo()
## R version 4.0.1 (2020-06-06)
## Platform: x86_64-conda_cos6-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
##
## Matrix products: default
## BLAS/LAPACK: /ceph-g/opt/miniconda3/envs/R40/lib/libopenblasp-r0.3.9.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] vioplot_0.3.7 zoo_1.8-8 sm_2.2-5.7 Hmisc_4.6-0
## [5] ggplot2_3.3.5 Formula_1.2-4 survival_3.2-3 lattice_0.20-41
## [9] gplots_3.1.1 mitch_1.0.10 beeswarm_0.4.0
##
## loaded via a namespace (and not attached):
## [1] splines_4.0.1 gtools_3.9.2 shiny_1.4.0.2
## [4] assertthat_0.2.1 latticeExtra_0.6-29 blob_1.2.1
## [7] yaml_2.2.1 pillar_1.7.0 backports_1.1.8
## [10] glue_1.4.1 digest_0.6.25 RColorBrewer_1.1-2
## [13] promises_1.1.1 checkmate_2.0.0 colorspace_1.4-1
## [16] htmltools_0.5.0 httpuv_1.5.4 Matrix_1.2-18
## [19] plyr_1.8.6 pkgconfig_2.0.3 purrr_0.3.4
## [22] xtable_1.8-4 scales_1.1.1 jpeg_0.1-9
## [25] later_1.1.0.1 tibble_3.0.1 htmlTable_2.4.0
## [28] echarts4r_0.3.3 generics_0.0.2 ellipsis_0.3.2
## [31] withr_2.2.0 nnet_7.3-14 cli_3.1.1
## [34] magrittr_1.5 crayon_1.3.4 mime_0.9
## [37] evaluate_0.14 GGally_2.1.2 fansi_0.4.1
## [40] MASS_7.3-51.6 foreign_0.8-80 tools_4.0.1
## [43] data.table_1.12.8 lifecycle_1.0.1 stringr_1.4.0
## [46] munsell_0.5.0 cluster_2.1.0 compiler_4.0.1
## [49] caTools_1.18.2 rlang_1.0.0 grid_4.0.1
## [52] rstudioapi_0.11 htmlwidgets_1.5.1 bitops_1.0-7
## [55] base64enc_0.1-3 rmarkdown_2.3 gtable_0.3.0
## [58] DBI_1.1.0 reshape_0.8.8 reshape2_1.4.4
## [61] R6_2.4.1 gridExtra_2.3 knitr_1.28
## [64] dplyr_1.0.8 fastmap_1.0.1 utf8_1.1.4
## [67] KernSmooth_2.23-17 stringi_1.4.6 Rcpp_1.0.8
## [70] vctrs_0.3.8 rpart_4.1-15 png_0.1-7
## [73] tidyselect_1.1.1 xfun_0.15