The purpose of the script is to understand why in figure 6 there is a positive correlation in the shortlist of genes shown in the scatterplot in panel A, but the contour heatmap indicatess a slight negative correlation.
cell <- readRDS("deRes_maleVSfemale_young_cell.rds")
tiss <- readRDS("deRes_maleVSfemale_young_tissue.rds")
head(cell)
## baseMean log2FoldChange lfcSE stat pvalue
## ENSG00000000003 1826.0151 0.39965523 0.2182276 1.8313693 0.06704544
## ENSG00000000419 970.3843 0.54076101 0.2373122 2.2786900 0.02268550
## ENSG00000000457 280.2717 0.11917897 0.1600669 0.7445574 0.45653935
## ENSG00000000460 189.8609 0.04339489 0.2367166 0.1833200 0.85454693
## ENSG00000000971 2285.8929 -0.04185744 0.1943196 -0.2154052 0.82945142
## ENSG00000001036 2496.6810 -0.11903377 0.1034856 -1.1502448 0.25004307
## padj diffexpressed
## ENSG00000000003 0.7304847 NO
## ENSG00000000419 0.6636213 NO
## ENSG00000000457 0.9050947 NO
## ENSG00000000460 0.9806722 NO
## ENSG00000000971 0.9753683 NO
## ENSG00000001036 0.8355613 NO
m <- merge(cell,tiss,by=0)
head(m)
## Row.names baseMean.x log2FoldChange.x lfcSE.x stat.x pvalue.x
## 1 ENSG00000000003 1826.0151 0.39965523 0.2182276 1.8313693 0.06704544
## 2 ENSG00000000419 970.3843 0.54076101 0.2373122 2.2786900 0.02268550
## 3 ENSG00000000457 280.2717 0.11917897 0.1600669 0.7445574 0.45653935
## 4 ENSG00000000460 189.8609 0.04339489 0.2367166 0.1833200 0.85454693
## 5 ENSG00000000971 2285.8929 -0.04185744 0.1943196 -0.2154052 0.82945142
## 6 ENSG00000001036 2496.6810 -0.11903377 0.1034856 -1.1502448 0.25004307
## padj.x diffexpressed.x baseMean.y log2FoldChange.y lfcSE.y stat.y
## 1 0.7304847 NO 131.7765 -0.12412971 0.17155218 -0.7235683
## 2 0.6636213 NO 636.7011 0.08664223 0.07605463 1.1392105
## 3 0.9050947 NO 248.5846 0.01910610 0.11413696 0.1673962
## 4 0.9806722 NO 23.6668 -0.18119207 0.27099065 -0.6686285
## 5 0.9753683 NO 522.6931 -0.35835736 0.17272997 -2.0746682
## 6 0.8355613 NO 147.0028 0.04276142 0.16434942 0.2601860
## pvalue.y padj.y diffexpressed.y
## 1 0.4693308 0.8238626 NO
## 2 0.2546154 0.6904190 NO
## 3 0.8670583 0.9653380 NO
## 4 0.5037325 0.8403387 NO
## 5 0.0380173 0.3590989 NO
## 6 0.7947203 0.9470191 NO
Now test association of fold changes.
cor.test(m$log2FoldChange.x,m$log2FoldChange.y)
##
## Pearson's product-moment correlation
##
## data: m$log2FoldChange.x and m$log2FoldChange.y
## t = 43.104, df = 14668, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3208568 0.3495836
## sample estimates:
## cor
## 0.3352981
mylm <- lm(m$log2FoldChange.y ~ m$log2FoldChange.x)
plot(m$log2FoldChange.x,m$log2FoldChange.y,xlab="cell",ylab="tissue",main="logFC")
abline(mylm,col="red",lty=2)
It looks similar to Fig 6A, which is good.
Now test DESeq2 test statistic.
cor.test(m$stat.x,m$stat.y)
##
## Pearson's product-moment correlation
##
## data: m$stat.x and m$stat.y
## t = -4.4546, df = 14668, p-value = 8.467e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.05290706 -0.02058623
## sample estimates:
## cor
## -0.03675626
mylm <- lm(m$stat.y ~ m$stat.x)
plot(m$stat.x,m$stat.y,xlab="cell",ylab="tissue",main="DESeq2 stat")
abline(mylm,col="red",lty=2)
A slight negative correlation is seen.
Now look at the rank-rank plot.
plot(rank(m$stat.x),rank(m$stat.y),cex=0.1,pch=2)
mylm <- lm(rank(m$stat.y) ~ rank(m$stat.x))
abline(mylm,col="red",lty=2)
cor.test(m$stat.x,m$stat.y,method = "spearman")
## Warning in cor.test.default(m$stat.x, m$stat.y, method = "spearman"): Cannot
## compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: m$stat.x and m$stat.y
## S = 6.0179e+11, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.1436798
Definitely a slight negative correlation that is consistent with Megan’s results.
I have checked that the apparent discrepancy between Fig 6A and 6B is due to the unusual nature of the data itself.
When looking at these top genes, they are indeed positively correlated, however majority of other genes show a negative association.
sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
## [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
## [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Australia/Melbourne
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.37 R6_2.5.1 fastmap_1.2.0 xfun_0.47
## [5] cachem_1.1.0 knitr_1.48 htmltools_0.5.8.1 rmarkdown_2.28
## [9] lifecycle_1.0.4 cli_3.6.3 sass_0.4.9 jquerylib_0.1.4
## [13] compiler_4.4.2 highr_0.11 rstudioapi_0.16.0 tools_4.4.2
## [17] evaluate_0.24.0 bslib_0.8.0 yaml_2.3.10 rlang_1.1.4
## [21] jsonlite_1.8.8