Test

The purpose of the script is to understand why in figure 6 there is a positive correlation in the shortlist of genes shown in the scatterplot in panel A, but the contour heatmap indicatess a slight negative correlation.

cell <- readRDS("deRes_maleVSfemale_young_cell.rds")
tiss <- readRDS("deRes_maleVSfemale_young_tissue.rds")
head(cell)
##                  baseMean log2FoldChange     lfcSE       stat     pvalue
## ENSG00000000003 1826.0151     0.39965523 0.2182276  1.8313693 0.06704544
## ENSG00000000419  970.3843     0.54076101 0.2373122  2.2786900 0.02268550
## ENSG00000000457  280.2717     0.11917897 0.1600669  0.7445574 0.45653935
## ENSG00000000460  189.8609     0.04339489 0.2367166  0.1833200 0.85454693
## ENSG00000000971 2285.8929    -0.04185744 0.1943196 -0.2154052 0.82945142
## ENSG00000001036 2496.6810    -0.11903377 0.1034856 -1.1502448 0.25004307
##                      padj diffexpressed
## ENSG00000000003 0.7304847            NO
## ENSG00000000419 0.6636213            NO
## ENSG00000000457 0.9050947            NO
## ENSG00000000460 0.9806722            NO
## ENSG00000000971 0.9753683            NO
## ENSG00000001036 0.8355613            NO
m <- merge(cell,tiss,by=0)
head(m)
##         Row.names baseMean.x log2FoldChange.x   lfcSE.x     stat.x   pvalue.x
## 1 ENSG00000000003  1826.0151       0.39965523 0.2182276  1.8313693 0.06704544
## 2 ENSG00000000419   970.3843       0.54076101 0.2373122  2.2786900 0.02268550
## 3 ENSG00000000457   280.2717       0.11917897 0.1600669  0.7445574 0.45653935
## 4 ENSG00000000460   189.8609       0.04339489 0.2367166  0.1833200 0.85454693
## 5 ENSG00000000971  2285.8929      -0.04185744 0.1943196 -0.2154052 0.82945142
## 6 ENSG00000001036  2496.6810      -0.11903377 0.1034856 -1.1502448 0.25004307
##      padj.x diffexpressed.x baseMean.y log2FoldChange.y    lfcSE.y     stat.y
## 1 0.7304847              NO   131.7765      -0.12412971 0.17155218 -0.7235683
## 2 0.6636213              NO   636.7011       0.08664223 0.07605463  1.1392105
## 3 0.9050947              NO   248.5846       0.01910610 0.11413696  0.1673962
## 4 0.9806722              NO    23.6668      -0.18119207 0.27099065 -0.6686285
## 5 0.9753683              NO   522.6931      -0.35835736 0.17272997 -2.0746682
## 6 0.8355613              NO   147.0028       0.04276142 0.16434942  0.2601860
##    pvalue.y    padj.y diffexpressed.y
## 1 0.4693308 0.8238626              NO
## 2 0.2546154 0.6904190              NO
## 3 0.8670583 0.9653380              NO
## 4 0.5037325 0.8403387              NO
## 5 0.0380173 0.3590989              NO
## 6 0.7947203 0.9470191              NO

Now test association of fold changes.

cor.test(m$log2FoldChange.x,m$log2FoldChange.y)
## 
##  Pearson's product-moment correlation
## 
## data:  m$log2FoldChange.x and m$log2FoldChange.y
## t = 43.104, df = 14668, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3208568 0.3495836
## sample estimates:
##       cor 
## 0.3352981
mylm <- lm(m$log2FoldChange.y ~ m$log2FoldChange.x)
plot(m$log2FoldChange.x,m$log2FoldChange.y,xlab="cell",ylab="tissue",main="logFC")
abline(mylm,col="red",lty=2)

It looks similar to Fig 6A, which is good.

Now test DESeq2 test statistic.

cor.test(m$stat.x,m$stat.y)
## 
##  Pearson's product-moment correlation
## 
## data:  m$stat.x and m$stat.y
## t = -4.4546, df = 14668, p-value = 8.467e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.05290706 -0.02058623
## sample estimates:
##         cor 
## -0.03675626
mylm <- lm(m$stat.y ~ m$stat.x)
plot(m$stat.x,m$stat.y,xlab="cell",ylab="tissue",main="DESeq2 stat")
abline(mylm,col="red",lty=2)

A slight negative correlation is seen.

Now look at the rank-rank plot.

plot(rank(m$stat.x),rank(m$stat.y),cex=0.1,pch=2)
mylm <- lm(rank(m$stat.y) ~ rank(m$stat.x))
abline(mylm,col="red",lty=2)

cor.test(m$stat.x,m$stat.y,method = "spearman")
## Warning in cor.test.default(m$stat.x, m$stat.y, method = "spearman"): Cannot
## compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  m$stat.x and m$stat.y
## S = 6.0179e+11, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.1436798

Definitely a slight negative correlation that is consistent with Megan’s results.

Conclusion

I have checked that the apparent discrepancy between Fig 6A and 6B is due to the unusual nature of the data itself.

When looking at these top genes, they are indeed positively correlated, however majority of other genes show a negative association.

Session

sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
##  [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
##  [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Australia/Melbourne
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37     R6_2.5.1          fastmap_1.2.0     xfun_0.47        
##  [5] cachem_1.1.0      knitr_1.48        htmltools_0.5.8.1 rmarkdown_2.28   
##  [9] lifecycle_1.0.4   cli_3.6.3         sass_0.4.9        jquerylib_0.1.4  
## [13] compiler_4.4.2    highr_0.11        rstudioapi_0.16.0 tools_4.4.2      
## [17] evaluate_0.24.0   bslib_0.8.0       yaml_2.3.10       rlang_1.1.4      
## [21] jsonlite_1.8.8