Source: https://github.com/markziemann/combined_enrichment

Intro

This script is designed to fetch GTEXv8 RNA-seq expression data in order to assess the correlation of genes within a gene set.

suppressPackageStartupMessages({
#  library("kableExtra")
  library("beeswarm")
#  library("getDEE2")
  library("mitch")
  library("gplots")
  library("Hmisc")
  library("parallel")
  library("vioplot")
})
res2 <- read.table("cor.tsv",sep="\t")

par(mfrow=c(2,1))
hist(res2[,"stat.t"],breaks=100,main="",xlab="test statistic")
boxplot(res2[,"stat.t"],horizontal=TRUE)

hist(res2[,"stat.t"],xlim=c(-50,50),breaks=100,main="",xlab="test statistic")
boxplot(res2[,"stat.t"],ylim=c(-50,50),horizontal=TRUE)

par(mfrow=c(1,1))
pos <- length( which(res2[,1]>0 & res2[,2] < 0.05) )
neg <- length( which(res2[,1]<0 & res2[,2] < 0.05) )
none <- length( which(res2[,2] > 0.05) )
bars <- c("Pos"=pos,"None"=none,"Neg"=neg)
barplot(bars,ylab="no. gene sets with correlated expression",ylim=c(0,max(bars)*1.3))
text(((1:length(bars))-0.5)*1.25,bars+50,labels=bars)

Conclusions

This work confirms the observations by Hong, that genes in a set are on the whole, most gene sets positively correlated members.

Some gene sets are highly correlated, while others are heterogenous, or slightly anticorrelated.

We can try to reproduce this correlation structure in simulated data.

Session information

sessionInfo()
## R version 4.0.1 (2020-06-06)
## Platform: x86_64-conda_cos6-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
## 
## Matrix products: default
## BLAS/LAPACK: /ceph-g/opt/miniconda3/envs/R40/lib/libopenblasp-r0.3.9.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] vioplot_0.3.7   zoo_1.8-8       sm_2.2-5.7      Hmisc_4.6-0    
##  [5] ggplot2_3.3.5   Formula_1.2-4   survival_3.2-3  lattice_0.20-41
##  [9] gplots_3.1.1    mitch_1.0.10    beeswarm_0.4.0 
## 
## loaded via a namespace (and not attached):
##  [1] splines_4.0.1       gtools_3.9.2        shiny_1.4.0.2      
##  [4] assertthat_0.2.1    latticeExtra_0.6-29 blob_1.2.1         
##  [7] yaml_2.2.1          pillar_1.7.0        backports_1.1.8    
## [10] glue_1.4.1          digest_0.6.25       RColorBrewer_1.1-2 
## [13] promises_1.1.1      checkmate_2.0.0     colorspace_1.4-1   
## [16] htmltools_0.5.0     httpuv_1.5.4        Matrix_1.2-18      
## [19] plyr_1.8.6          pkgconfig_2.0.3     purrr_0.3.4        
## [22] xtable_1.8-4        scales_1.1.1        jpeg_0.1-9         
## [25] later_1.1.0.1       tibble_3.0.1        htmlTable_2.4.0    
## [28] echarts4r_0.3.3     generics_0.0.2      ellipsis_0.3.2     
## [31] withr_2.2.0         nnet_7.3-14         cli_3.1.1          
## [34] magrittr_1.5        crayon_1.3.4        mime_0.9           
## [37] evaluate_0.14       GGally_2.1.2        fansi_0.4.1        
## [40] MASS_7.3-51.6       foreign_0.8-80      tools_4.0.1        
## [43] data.table_1.12.8   lifecycle_1.0.1     stringr_1.4.0      
## [46] munsell_0.5.0       cluster_2.1.0       compiler_4.0.1     
## [49] caTools_1.18.2      rlang_1.0.0         grid_4.0.1         
## [52] rstudioapi_0.11     htmlwidgets_1.5.1   bitops_1.0-7       
## [55] base64enc_0.1-3     rmarkdown_2.3       gtable_0.3.0       
## [58] DBI_1.1.0           reshape_0.8.8       reshape2_1.4.4     
## [61] R6_2.4.1            gridExtra_2.3       knitr_1.28         
## [64] dplyr_1.0.8         fastmap_1.0.1       utf8_1.1.4         
## [67] KernSmooth_2.23-17  stringi_1.4.6       Rcpp_1.0.8         
## [70] vctrs_0.3.8         rpart_4.1-15        png_0.1-7          
## [73] tidyselect_1.1.1    xfun_0.15