Source: https://github.com/markziemann/SurveyEnrichmentMethods
suppressPackageStartupMessages({
library("getDEE2")
library("DESeq2")
library("clusterProfiler")
library("org.Hs.eg.db")
library("mitch")
library("kableExtra")
library("eulerr")
})
In Kaumadi et al 2021 we showed that major problems plague functional enrichment analysis in many journal articles.
Here I will do my best to reproduce and replicate the findings of some studies. These were selected on the following criteria:
Included in the 2019 group that was studies in Kaumadi et al 2021.
Human study
RNA-seq gene expression
Performed DAVID gene set analysis
Provided gene list
From this subset, four studies were selected
Study | Analytical issues |
---|---|
PMC6349697 | Background |
PMC6425008 | Background, FDR |
PMC6463127 | Background, FDR |
PMC6535219 | Background |
In this document I will be seeing if I can reproduce the same findings by using the same tool with the gene list provided in the article. Also I will correct the method wherever possible. This means using the correct background gene list, using FDR control and performing separate analysis of up and down-regulated gene lists.
The study examined in this document is PMC6425008: Balusamy et al, 2019.
In this study, the effect of citral (3,7-dimethyl-2,6-octadienal) on cells is investigated. In the paper, they isolate the compound with chromatography, followed by in vitro treatment of AGL cell line with different levels of the compound. Treated cells were characterised in terms of morphology and viability. Citral caused shrunken cells, amorphous shape and lower cell number. To understand the pattern of gene expression, RNA-seq of n=1 control and n=1 treated samples was performed yielding 75 and 65 M read pairs respectively which were deposited to SRA (SRP150561).
Some type of differential analysis was performed with Cuffdiff, but with no replicates, this might not be reliable. Anyway, the up and down regulated gene lists are found in (Supplementary File S6)[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6425008/bin/41598_2019_41406_MOESM7_ESM.xlsx]. It contains 612 genes; 216 up and 396 downregulated genes.
There is a network analysis which looks a bit like a hairball and isn’t described in any detail.
Figures 5-7 show the results of the enrichment analysis. A number of gene sets are presented as significant, despite lack of FDR correction. There was no mention of a background gene list in the article.
Here I have summarised the “significant” findings presented in the supplements. There are some problems. Firstly, only 7 of these have an FDR (BH method) < 0.05. Secondly, some of these show that only 1 gene was found in a gene set. Surely one gene on its own can’t constitute a valid enrichment.
orig <- read.table("PMC6425008_orig.tsv",sep="\t",header=TRUE)
orig %>% kbl(caption="Enrichment results presented in PMC6425008 supplementary files") %>% kable_paper("hover", full_width = F)
GO.class | GO.Term | No..of.genes.in.the.dataset | No..of.genes.in.the.background.dataset | Percentage.of.genes | Fold.enrichment | P.value..Hypergeometric.test. | BH.method |
---|---|---|---|---|---|---|---|
GO_BP | Protein metabolism | 43 | 1323 | 22.4 | 3.1 | 0.0e+00 | 0.0e+00 |
GO_BP | Intracellular signaling cascade | 1 | 3 | 0.5 | 31.7 | 3.1e-02 | 1.0e+00 |
GO_BP | Chromosome organization and biogenesis | 1 | 4 | 0.5 | 23.8 | 4.2e-02 | 1.0e+00 |
GO_MF | Heat shock protein activity | 8 | 27 | 4.2 | 28.0 | 0.0e+00 | 1.0e-07 |
GO_MF | Ubiquitin-specific protease activity | 16 | 377 | 8.3 | 4.0 | 2.8e-06 | 3.1e-04 |
GO_MF | Structural constituent of ribosome | 7 | 152 | 3.6 | 4.4 | 1.2e-03 | 8.9e-02 |
GO_MF | Oxidoreductase activity | 7 | 161 | 3.6 | 4.1 | 1.7e-03 | 9.3e-02 |
GO_MF | Chaperone activity | 6 | 126 | 3.1 | 4.5 | 2.3e-03 | 1.0e-01 |
GO_MF | ATPase activity | 4 | 111 | 2.1 | 3.4 | 3.0e-02 | 7.8e-01 |
GO_MF | Storage protein | 1 | 3 | 0.5 | 31.7 | 3.1e-02 | 7.8e-01 |
GO_MF | Superoxide dismutase activity | 1 | 3 | 0.5 | 31.7 | 3.1e-02 | 7.8e-01 |
GO_MF | Transferase activity, transferring aldehyde or ketonic groups | 1 | 3 | 0.5 | 31.7 | 3.1e-02 | 7.8e-01 |
GO_MF | Protein transporter activity | 1 | 4 | 0.5 | 23.8 | 4.2e-02 | 9.3e-01 |
GO_CC | Proteasome complex | 10 | 34 | 5.9 | 25.3 | 0.0e+00 | 0.0e+00 |
GO_CC | Exosomes | 59 | 2043 | 34.9 | 2.5 | 0.0e+00 | 0.0e+00 |
GO_CC | Lysosome | 39 | 1620 | 23.1 | 2.1 | 6.6e-06 | 1.7e-03 |
GO_CC | Inclusion body | 3 | 6 | 1.8 | 43.1 | 3.0e-05 | 5.9e-03 |
In this work, I will see if I can replicate the results for the 7 FDR significant gene sets.
subset(orig,BH.method < 0.05)$GO.Term
## [1] "Protein metabolism"
## [2] "Heat shock protein activity"
## [3] "Ubiquitin-specific protease activity"
## [4] "Proteasome complex"
## [5] "Exosomes"
## [6] "Lysosome"
## [7] "Inclusion body"
The article also shows some KEGG pathway analysis in Table 2 but shows results for individual genes and doesn’t present any results of enrichment tests for gene sets.
In Table 3 of the article, GO enrichment table is shown for apoptosis pathways. It seems only two of the sets are significant: “Response to unfolded protein” and “Regulation of cell cycle”
orig2 <- read.table("PMC6425008_orig2.tsv",sep="\t",header=TRUE)
orig2 %>% kbl(caption="Enrichment results presented in PMC6425008 Table 3") %>% kable_paper("hover", full_width = F)
Go.terms | Category | Number.of.genes | Go.id | up_P.value | dn_p.value |
---|---|---|---|---|---|
Response to unfolded protein | BP | 8 | GO:0006986 | 0.00e+00 | NA |
Negative regulation of programmed cell death | BP | 2 | GO:0043069 | 6.20e-01 | 3.30e-01 |
Negative regulation of cell death | BP | 4 | GO:0060548 | 7.70e-01 | 5.70e-01 |
Regulation of cell cycle | BP | 19 | GO:0051726 | 1.64e-05 | 1.64e-05 |
Negative regulation of programmed cell death | BP | 2 | GO:0043069 | 6.20e-01 | 3.30e-01 |
Positive regulation of cell death | BP | 1 | GO:0010942 | 1.20e-01 | NA |
Negative regulation of cell death | BP | 4 | GO:0060548 | 7.70e-01 | 5.70e-01 |
Intracellular membrane-bounded organelle | CC | 10 | GO:0043231 | NA | 6.70e-01 |
Negative regulation of cell growth | BP | 2 | GO:0030308 | NA | 8.70e-01 |
Negative regulation of catalytic activity | BP | 1 | GO:0043086 | 6.40e-01 | NA |
Cell cycle arrest | BP | 4 | GO:0007050 | NA | 6.40e-01 |
Negative regulation of growth | BP | 2 | GO:0045926 | 4.00e-01 | NA |
Cellular protein metabolic process | BP | 5 | GO:0044267 | NA | 6.40e-01 |
Positive regulation of cellular protein metabolic process | BP | 1 | GO:0032270 | NA | 7.60e-01 |
In addition, the results of Table 3 are inconsistent with the GO results shown in the supplements. For example “Response to unfolded protein” and “Regulation of cell cycle” are missing from the supplementary tables.
I submitted the 216 upregulated genes to DAVID and these were mapped to 213 DAVID gene IDs. This analysis without a background list gave 31 BP, 17 CC and 10 MF significant GO terms (see table below). This is drastically different to what is shown in the article.
repl1 <- read.table("PMC6425008_repl1.tsv", header=TRUE, sep="\t")
repl1 %>% kbl(caption="DAVID Enrichment results when using the genes provided in the supplement and a corrected background gene list.") %>% kable_paper("hover", full_width = F)
Category | Term | Count | X. | PValue | List.Total | Pop.Hits | Pop.Total | Fold.Enrichment | Benjamini |
---|---|---|---|---|---|---|---|---|---|
GOTERM_BP_DIRECT | GO:0043488~regulation of mRNA stability | 18 | 8.450704 | 0.0000000 | 179 | 103 | 16792 | 16.393990 | 0.0000000 |
GOTERM_BP_DIRECT | GO:0038061~NIK/NF-kappaB signaling | 13 | 6.103286 | 0.0000000 | 179 | 66 | 16792 | 18.477738 | 0.0000000 |
GOTERM_BP_DIRECT | GO:0051436~negative regulation of ubiquitin-protein ligase activity involved in mitotic cell cycle | 13 | 6.103286 | 0.0000000 | 179 | 71 | 16792 | 17.176489 | 0.0000000 |
GOTERM_BP_DIRECT | GO:0051437~positive regulation of ubiquitin-protein ligase activity involved in regulation of mitotic cell cycle transition | 13 | 6.103286 | 0.0000000 | 179 | 76 | 16792 | 16.046457 | 0.0000000 |
GOTERM_BP_DIRECT | GO:0031145~anaphase-promoting complex-dependent catabolic process | 13 | 6.103286 | 0.0000000 | 179 | 79 | 16792 | 15.437098 | 0.0000000 |
GOTERM_BP_DIRECT | GO:0002223~stimulatory C-type lectin receptor signaling pathway | 14 | 6.572770 | 0.0000000 | 179 | 105 | 16792 | 12.508007 | 0.0000000 |
GOTERM_BP_DIRECT | GO:0006521~regulation of cellular amino acid metabolic process | 11 | 5.164319 | 0.0000000 | 179 | 51 | 16792 | 20.233542 | 0.0000000 |
GOTERM_BP_DIRECT | GO:0060071~Wnt signaling pathway, planar cell polarity pathway | 13 | 6.103286 | 0.0000000 | 179 | 92 | 16792 | 13.255769 | 0.0000000 |
GOTERM_BP_DIRECT | GO:0033209~tumor necrosis factor-mediated signaling pathway | 14 | 6.572770 | 0.0000000 | 179 | 118 | 16792 | 11.130007 | 0.0000000 |
GOTERM_BP_DIRECT | GO:0002479~antigen processing and presentation of exogenous peptide antigen via MHC class I, TAP-dependent | 11 | 5.164319 | 0.0000000 | 179 | 63 | 16792 | 16.379534 | 0.0000001 |
GOTERM_BP_DIRECT | GO:0000209~protein polyubiquitination | 16 | 7.511737 | 0.0000000 | 179 | 184 | 16792 | 8.157396 | 0.0000001 |
GOTERM_BP_DIRECT | GO:0090090~negative regulation of canonical Wnt signaling pathway | 15 | 7.042254 | 0.0000000 | 179 | 163 | 16792 | 8.632827 | 0.0000002 |
GOTERM_BP_DIRECT | GO:0090263~positive regulation of canonical Wnt signaling pathway | 13 | 6.103286 | 0.0000000 | 179 | 120 | 16792 | 10.162756 | 0.0000004 |
GOTERM_BP_DIRECT | GO:0050852~T cell receptor signaling pathway | 14 | 6.572770 | 0.0000000 | 179 | 148 | 16792 | 8.873924 | 0.0000004 |
GOTERM_BP_DIRECT | GO:1900034~regulation of cellular response to heat | 11 | 5.164319 | 0.0000000 | 179 | 75 | 16792 | 13.758808 | 0.0000004 |
GOTERM_BP_DIRECT | GO:0006986~response to unfolded protein | 9 | 4.225352 | 0.0000000 | 179 | 42 | 16792 | 20.102155 | 0.0000007 |
GOTERM_BP_DIRECT | GO:0038095~Fc-epsilon receptor signaling pathway | 14 | 6.572770 | 0.0000001 | 179 | 178 | 16792 | 7.378319 | 0.0000030 |
GOTERM_BP_DIRECT | GO:0006457~protein folding | 14 | 6.572770 | 0.0000001 | 179 | 180 | 16792 | 7.296338 | 0.0000032 |
GOTERM_BP_DIRECT | GO:0043161~proteasome-mediated ubiquitin-dependent protein catabolic process | 14 | 6.572770 | 0.0000003 | 179 | 203 | 16792 | 6.469659 | 0.0000125 |
GOTERM_BP_DIRECT | GO:0090383~phagosome acidification | 6 | 2.816901 | 0.0000085 | 179 | 27 | 16792 | 20.846679 | 0.0003880 |
GOTERM_BP_DIRECT | GO:0042026~protein refolding | 5 | 2.347418 | 0.0000152 | 179 | 15 | 16792 | 31.270019 | 0.0006649 |
GOTERM_BP_DIRECT | GO:0015991~ATP hydrolysis coupled proton transport | 6 | 2.816901 | 0.0000202 | 179 | 32 | 16792 | 17.589385 | 0.0008429 |
GOTERM_BP_DIRECT | GO:0000165~MAPK cascade | 13 | 6.103286 | 0.0000236 | 179 | 262 | 16792 | 4.654697 | 0.0009433 |
GOTERM_BP_DIRECT | GO:0033572~transferrin transport | 6 | 2.816901 | 0.0000317 | 179 | 35 | 16792 | 16.081724 | 0.0012140 |
GOTERM_BP_DIRECT | GO:0090084~negative regulation of inclusion body assembly | 4 | 1.877934 | 0.0001331 | 179 | 10 | 16792 | 37.524022 | 0.0048858 |
GOTERM_BP_DIRECT | GO:0009408~response to heat | 6 | 2.816901 | 0.0001498 | 179 | 48 | 16792 | 11.726257 | 0.0050924 |
GOTERM_BP_DIRECT | GO:0051603~proteolysis involved in cellular protein catabolic process | 6 | 2.816901 | 0.0001498 | 179 | 48 | 16792 | 11.726257 | 0.0050924 |
GOTERM_BP_DIRECT | GO:0031396~regulation of protein ubiquitination | 4 | 1.877934 | 0.0005925 | 179 | 16 | 16792 | 23.452514 | 0.0194253 |
GOTERM_BP_DIRECT | GO:0006979~response to oxidative stress | 7 | 3.286385 | 0.0011237 | 179 | 110 | 16792 | 5.969731 | 0.0355706 |
GOTERM_BP_DIRECT | GO:0016241~regulation of macroautophagy | 5 | 2.347418 | 0.0011903 | 179 | 44 | 16792 | 10.660234 | 0.0364238 |
GOTERM_BP_DIRECT | GO:0008286~insulin receptor signaling pathway | 6 | 2.816901 | 0.0014301 | 179 | 78 | 16792 | 7.216158 | 0.0423494 |
GOTERM_CC_DIRECT | GO:0000502~proteasome complex | 14 | 6.572770 | 0.0000000 | 187 | 60 | 18224 | 22.739394 | 0.0000000 |
GOTERM_CC_DIRECT | GO:0005829~cytosol | 70 | 32.863850 | 0.0000000 | 187 | 3315 | 18224 | 2.057864 | 0.0000001 |
GOTERM_CC_DIRECT | GO:0070062~extracellular exosome | 62 | 29.107981 | 0.0000000 | 187 | 2811 | 18224 | 2.149478 | 0.0000002 |
GOTERM_CC_DIRECT | GO:0005654~nucleoplasm | 57 | 26.760563 | 0.0000002 | 187 | 2784 | 18224 | 1.995298 | 0.0000120 |
GOTERM_CC_DIRECT | GO:0022624~proteasome accessory complex | 6 | 2.816901 | 0.0000006 | 187 | 17 | 18224 | 34.395722 | 0.0000269 |
GOTERM_CC_DIRECT | GO:0005737~cytoplasm | 83 | 38.967136 | 0.0000048 | 187 | 5222 | 18224 | 1.548971 | 0.0001832 |
GOTERM_CC_DIRECT | GO:0005634~nucleus | 82 | 38.497653 | 0.0000404 | 187 | 5415 | 18224 | 1.475766 | 0.0013227 |
GOTERM_CC_DIRECT | GO:0005839~proteasome core complex | 5 | 2.347418 | 0.0000549 | 187 | 21 | 18224 | 23.203463 | 0.0015712 |
GOTERM_CC_DIRECT | GO:0005838~proteasome regulatory particle | 4 | 1.877934 | 0.0001625 | 187 | 11 | 18224 | 35.438017 | 0.0041352 |
GOTERM_CC_DIRECT | GO:0008540~proteasome regulatory particle, base subcomplex | 4 | 1.877934 | 0.0002151 | 187 | 12 | 18224 | 32.484848 | 0.0049250 |
GOTERM_CC_DIRECT | GO:0016234~inclusion body | 4 | 1.877934 | 0.0007625 | 187 | 18 | 18224 | 21.656566 | 0.0158738 |
GOTERM_CC_DIRECT | GO:0005765~lysosomal membrane | 10 | 4.694836 | 0.0020590 | 187 | 274 | 18224 | 3.556735 | 0.0392932 |
GOTERM_CC_DIRECT | GO:0000786~nucleosome | 6 | 2.816901 | 0.0027678 | 187 | 94 | 18224 | 6.220503 | 0.0425362 |
GOTERM_CC_DIRECT | GO:0031595~nuclear proteasome complex | 3 | 1.408451 | 0.0027862 | 187 | 8 | 18224 | 36.545454 | 0.0425362 |
GOTERM_CC_DIRECT | GO:0033180~proton-transporting V-type ATPase, V1 domain | 3 | 1.408451 | 0.0027862 | 187 | 8 | 18224 | 36.545454 | 0.0425362 |
GOTERM_CC_DIRECT | GO:0033179~proton-transporting V-type ATPase, V0 domain | 3 | 1.408451 | 0.0035583 | 187 | 9 | 18224 | 32.484848 | 0.0509278 |
GOTERM_CC_DIRECT | GO:0031597~cytosolic proteasome complex | 3 | 1.408451 | 0.0044181 | 187 | 10 | 18224 | 29.236364 | 0.0595140 |
GOTERM_MF_DIRECT | GO:0051082~unfolded protein binding | 13 | 6.103286 | 0.0000000 | 182 | 110 | 16881 | 10.961688 | 0.0000003 |
GOTERM_MF_DIRECT | GO:0005515~protein binding | 134 | 62.910798 | 0.0000000 | 182 | 8785 | 16881 | 1.414783 | 0.0000003 |
GOTERM_MF_DIRECT | GO:0051087~chaperone binding | 10 | 4.694836 | 0.0000002 | 182 | 81 | 16881 | 11.450957 | 0.0000200 |
GOTERM_MF_DIRECT | GO:0046961~proton-transporting ATPase activity, rotational mechanism | 6 | 2.816901 | 0.0000073 | 182 | 26 | 16881 | 21.404480 | 0.0005345 |
GOTERM_MF_DIRECT | GO:0015078~hydrogen ion transmembrane transporter activity | 6 | 2.816901 | 0.0000213 | 182 | 32 | 16881 | 17.391140 | 0.0011485 |
GOTERM_MF_DIRECT | GO:0055131~C3HC4-type RING finger domain binding | 4 | 1.877934 | 0.0000237 | 182 | 6 | 16881 | 61.835165 | 0.0011485 |
GOTERM_MF_DIRECT | GO:0004298~threonine-type endopeptidase activity | 5 | 2.347418 | 0.0000663 | 182 | 21 | 16881 | 22.083987 | 0.0027580 |
GOTERM_MF_DIRECT | GO:0016887~ATPase activity | 10 | 4.694836 | 0.0001614 | 182 | 183 | 16881 | 5.068456 | 0.0058694 |
GOTERM_MF_DIRECT | GO:0030544~Hsp70 protein binding | 5 | 2.347418 | 0.0004103 | 182 | 33 | 16881 | 14.053447 | 0.0132653 |
GOTERM_MF_DIRECT | GO:0036402~proteasome-activating ATPase activity | 3 | 1.408451 | 0.0016671 | 182 | 6 | 16881 | 46.376374 | 0.0485127 |
Now I’ll try this again, but using a correct background gene set. As the data is found on the DEE2 database, I can see which genes are detected and I can use those ones as a background. As the results below show, there are 14209 genes in the background.
m <- getDEE2::getDEE2Metadata(species="hsapiens")
ss <- m[grep("SRP150561",m$SRP_accession),]
x <- getDEE2(species="hsapiens",SRRvec=ss$SRR_accession,legacy=TRUE)
## For more information about DEE2 QC metrics, visit
## https://github.com/markziemann/dee2/blob/master/qc/qc_metrics.md
mx <- x$GeneCounts
mxf <- mx[which(rowMeans(mx)>=10),]
nrow(mx) #58302
## [1] 58302
nrow(mxf) #14119
## [1] 14218
bg <- rownames(mxf)
geneinfo <- x$GeneInfo
bgg <- unique(geneinfo[which(rownames(geneinfo) %in%bg),"GeneSymbol"])
length(bgg)
## [1] 14209
writeLines(bgg,"PMC6425008_bg.txt")
So I tried DAVID with the new background, but was greeted with “Due to the ambiguous nature of Gene Symbol, DAVID does not allow it to be used for a background.” This poses a problem for many researchers.
Anyway so I will use Ensembl IDs instead (14218 IDs).
writeLines(bg,"PMC6425008_bg2.txt")
Here are the results when using all detected genes as the background.
repl2 <- read.table("PMC6425008_repl2.tsv", header=TRUE, sep="\t")
repl2 %>% kbl(caption="DAVID Enrichment results when using the genes provided in the supplement.") %>% kable_paper("hover", full_width = F)
Category | Term | Count | X. | PValue | List.Total | Pop.Hits | Pop.Total | Fold.Enrichment | Benjamini |
---|---|---|---|---|---|---|---|---|---|
GOTERM_MF_DIRECT | GO:0051082~unfolded protein binding | 13 | 6.103286 | 0.0000000 | 182 | 110 | 16881 | 10.961688 | 0.0000003 |
GOTERM_MF_DIRECT | GO:0005515~protein binding | 134 | 62.910798 | 0.0000000 | 182 | 8785 | 16881 | 1.414783 | 0.0000003 |
GOTERM_MF_DIRECT | GO:0051087~chaperone binding | 10 | 4.694836 | 0.0000002 | 182 | 81 | 16881 | 11.450957 | 0.0000200 |
GOTERM_MF_DIRECT | GO:0046961~proton-transporting ATPase activity, rotational mechanism | 6 | 2.816901 | 0.0000073 | 182 | 26 | 16881 | 21.404480 | 0.0005345 |
GOTERM_MF_DIRECT | GO:0015078~hydrogen ion transmembrane transporter activity | 6 | 2.816901 | 0.0000213 | 182 | 32 | 16881 | 17.391140 | 0.0011485 |
GOTERM_MF_DIRECT | GO:0055131~C3HC4-type RING finger domain binding | 4 | 1.877934 | 0.0000237 | 182 | 6 | 16881 | 61.835165 | 0.0011485 |
GOTERM_MF_DIRECT | GO:0004298~threonine-type endopeptidase activity | 5 | 2.347418 | 0.0000663 | 182 | 21 | 16881 | 22.083987 | 0.0027580 |
GOTERM_MF_DIRECT | GO:0016887~ATPase activity | 10 | 4.694836 | 0.0001614 | 182 | 183 | 16881 | 5.068456 | 0.0058694 |
GOTERM_MF_DIRECT | GO:0030544~Hsp70 protein binding | 5 | 2.347418 | 0.0004103 | 182 | 33 | 16881 | 14.053447 | 0.0132653 |
GOTERM_MF_DIRECT | GO:0036402~proteasome-activating ATPase activity | 3 | 1.408451 | 0.0016671 | 182 | 6 | 16881 | 46.376374 | 0.0485127 |
GOTERM_CC_DIRECT | GO:0000502~proteasome complex | 13 | 6.103286 | 0.0000000 | 179 | 57 | 10737 | 13.680388 | 0.0000000 |
GOTERM_CC_DIRECT | GO:0070062~extracellular exosome | 59 | 27.699531 | 0.0000003 | 179 | 1842 | 10737 | 1.921287 | 0.0000347 |
GOTERM_CC_DIRECT | GO:0022624~proteasome accessory complex | 6 | 2.816901 | 0.0000045 | 179 | 16 | 10737 | 22.493715 | 0.0003196 |
GOTERM_CC_DIRECT | GO:0005829~cytosol | 68 | 31.924883 | 0.0000519 | 179 | 2621 | 10737 | 1.556223 | 0.0027916 |
GOTERM_CC_DIRECT | GO:0005839~proteasome core complex | 5 | 2.347418 | 0.0002330 | 179 | 19 | 10737 | 15.785063 | 0.0100174 |
GOTERM_CC_DIRECT | GO:0005838~proteasome regulatory particle | 4 | 1.877934 | 0.0006702 | 179 | 11 | 10737 | 21.812087 | 0.0205860 |
GOTERM_CC_DIRECT | GO:0008540~proteasome regulatory particle, base subcomplex | 4 | 1.877934 | 0.0006702 | 179 | 11 | 10737 | 21.812087 | 0.0205860 |
GOTERM_CC_DIRECT | GO:0016234~inclusion body | 4 | 1.877934 | 0.0011337 | 179 | 13 | 10737 | 18.456382 | 0.0304681 |
GOTERM_MF_DIRECT | GO:0051082~unfolded protein binding | 12 | 5.633803 | 0.0000002 | 176 | 87 | 10274 | 8.051724 | 0.0000667 |
GOTERM_MF_DIRECT | GO:0051087~chaperone binding | 9 | 4.225352 | 0.0000149 | 176 | 66 | 10274 | 7.960227 | 0.0018296 |
GOTERM_MF_DIRECT | GO:0055131~C3HC4-type RING finger domain binding | 4 | 1.877934 | 0.0000192 | 176 | 4 | 10274 | 58.375000 | 0.0018296 |
GOTERM_MF_DIRECT | GO:0004298~threonine-type endopeptidase activity | 5 | 2.347418 | 0.0002065 | 176 | 18 | 10274 | 16.215278 | 0.0147632 |
GOTERM_MF_DIRECT | GO:0046961~proton-transporting ATPase activity, rotational mechanism | 5 | 2.347418 | 0.0002581 | 176 | 19 | 10274 | 15.361842 | 0.0147632 |
GOTERM_MF_DIRECT | GO:0015078~hydrogen ion transmembrane transporter activity | 5 | 2.347418 | 0.0004680 | 176 | 22 | 10274 | 13.267046 | 0.0223099 |
GOTERM_MF_DIRECT | GO:0030544~Hsp70 protein binding | 5 | 2.347418 | 0.0007778 | 176 | 25 | 10274 | 11.675000 | 0.0317777 |
Considering the results shown in the supplement, there were 7 FDR significant sets.
When the analysis was rerun without a specific background there were mixed results with most of the findings not reproducing.
Did reproduce: “Proteasome complex”, “Inclusion body”
Did not reproduce: “Protein metabolism”, “Heat shock protein activity”, “Ubiquitin-specific protease activity”, “Exosomes”, “Lysosome”
When considering the re-analysis with the corrected background set:
Did reproduce: “Proteasome complex”, “Inclusion body”
Did not reproduce: “Protein metabolism”, “Heat shock protein activity”, “Ubiquitin-specific protease activity”, “Exosomes”, “Lysosome”
This suggests that the main problem here is that the results shown in the supplement are incorrect and are inconsistent with the gene list provided in the supplement.
Regarding the results shown in Table 3, “Response to unfolded protein” was reproduced, while “Regulation of cell cycle” wasn’t reproduced.
Some interesting statements about the enrichment analysis:
“… Furthermore, the enrichment and KEGG pathway results suggested that there are several genes involved to induce apoptosis pathway.”
“.. The enrichment analysis identified DEGs genes from transcriptome libraries including cell death, cell cycle, apoptosis and cell growth.”
“The present study showed the significant inhibition effect upon citral by regulating various genes involved in signaling pathways, inhibits metastasis, colony formation and induced apoptosis both in silico and in vitro.”
“KEGG analysis revealed that MAPK signaling pathway, NF-kappa B signaling pathway, p53 signaling pathway, PI3-K-Akt signaling pathway, pathways in cancer, and prostate cancer pathways are mostly enriched in up and down-regulated genes which identified between normal and citral treated samples in AGS. Among them, cell cycle, pathways in cancer, and MAPK signaling pathways were observed as mostly enriched pathways. In addition, several common genes were identified in these pathways were listed in Table 2. Among them, cell cycle (hsa04110) and cancer (hsa05200) occupied several down-regulated DEG genes. The genes involved in cell cycle synthesis are very crucial for the inhibition of cancer cell progression19. For example, cyclin-dependent kinases (CKD2) are the most important regulators for the transition and cell cycle progression and play a significant role in regulating cell cycle division including centromere duplication, DNA synthesis, G1-S transition, and modulation of G2 progression20–22. Furthermore, the enrichment analysis and KEGG pathway results indicated that, there are several genes involved to initiate apoptosis pathway that are directly/indirectly related to the treatment of citral. In addition, enrichment analysis identified DEGs genes from transcriptome libraries were mainly involved in cell death, cell cycle, apoptotic process, and cell growth (Table 3).”
To summarise the conclusions, authors allude to enrichment analysis identifying apoptosis pathways. They speak about the KEGG classifications but there was no statistical enrichment test performed so this is not a quantitative enrichment. Specifically they mention table 3 supports enrichment analysis that indicates “cell death, cell cycle, apoptotic process, and cell growth”
The only hint of apoptosis pathways identified was the “Ubiquitin-specific protease activity” found in the original study (FDR=3.1e-04), and this was partially reproduced with DAVID without a background set (GO:0051436~negative regulation of ubiquitin-protein ligase activity involved in mitotic cell cycle; FDR=3.154072e-09), however when a correct background gene set was used, no ubiquitin-related pathways were identified as significant.
So to conclude, it appears that most pathway results weren’t reproduced regarding cell cycle, signaling or apoptosis.
What I did find, was that the upregulated genes were enriched in functions related to protein misfolding, eg: chaperones, proteasome, etc.
sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
## [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
## [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] eulerr_6.1.1 kableExtra_1.3.4
## [3] mitch_1.4.1 org.Hs.eg.db_3.13.0
## [5] AnnotationDbi_1.54.1 clusterProfiler_4.0.5
## [7] DESeq2_1.32.0 SummarizedExperiment_1.22.0
## [9] Biobase_2.52.0 MatrixGenerics_1.4.3
## [11] matrixStats_0.61.0 GenomicRanges_1.44.0
## [13] GenomeInfoDb_1.28.4 IRanges_2.26.0
## [15] S4Vectors_0.30.2 BiocGenerics_0.38.0
## [17] getDEE2_1.2.0
##
## loaded via a namespace (and not attached):
## [1] shadowtext_0.0.9 fastmatch_1.1-3 systemfonts_1.0.3
## [4] plyr_1.8.6 igraph_1.2.8 lazyeval_0.2.2
## [7] splines_4.1.2 BiocParallel_1.26.2 ggplot2_3.3.5
## [10] digest_0.6.28 yulab.utils_0.0.4 htmltools_0.5.2
## [13] GOSemSim_2.18.1 viridis_0.6.2 GO.db_3.13.0
## [16] fansi_0.5.0 magrittr_2.0.1 memoise_2.0.0
## [19] Biostrings_2.60.2 annotate_1.70.0 graphlayouts_0.7.1
## [22] svglite_2.0.0 enrichplot_1.12.3 colorspace_2.0-2
## [25] rvest_1.0.2 blob_1.2.2 ggrepel_0.9.1
## [28] xfun_0.28 dplyr_1.0.7 crayon_1.4.2
## [31] RCurl_1.98-1.5 jsonlite_1.7.2 scatterpie_0.1.7
## [34] genefilter_1.74.1 survival_3.2-13 ape_5.5
## [37] glue_1.5.0 polyclip_1.10-0 gtable_0.3.0
## [40] zlibbioc_1.38.0 XVector_0.32.0 webshot_0.5.2
## [43] htm2txt_2.1.1 DelayedArray_0.18.0 scales_1.1.1
## [46] DOSE_3.18.3 DBI_1.1.1 GGally_2.1.2
## [49] Rcpp_1.0.7 viridisLite_0.4.0 xtable_1.8-4
## [52] gridGraphics_0.5-1 tidytree_0.3.6 bit_4.0.4
## [55] htmlwidgets_1.5.4 httr_1.4.2 fgsea_1.18.0
## [58] gplots_3.1.1 RColorBrewer_1.1-2 ellipsis_0.3.2
## [61] pkgconfig_2.0.3 reshape_0.8.8 XML_3.99-0.8
## [64] farver_2.1.0 sass_0.4.0 locfit_1.5-9.4
## [67] utf8_1.2.2 ggplotify_0.1.0 tidyselect_1.1.1
## [70] rlang_0.4.12 reshape2_1.4.4 later_1.3.0
## [73] munsell_0.5.0 tools_4.1.2 cachem_1.0.6
## [76] downloader_0.4 generics_0.1.1 RSQLite_2.2.8
## [79] evaluate_0.14 stringr_1.4.0 fastmap_1.1.0
## [82] yaml_2.2.1 ggtree_3.0.4 knitr_1.36
## [85] bit64_4.0.5 tidygraph_1.2.0 caTools_1.18.2
## [88] purrr_0.3.4 KEGGREST_1.32.0 ggraph_2.0.5
## [91] nlme_3.1-153 mime_0.12 aplot_0.1.1
## [94] xml2_1.3.2 DO.db_2.9 rstudioapi_0.13
## [97] compiler_4.1.2 beeswarm_0.4.0 png_0.1-7
## [100] treeio_1.16.2 tibble_3.1.6 tweenr_1.0.2
## [103] geneplotter_1.70.0 bslib_0.3.1 stringi_1.7.5
## [106] highr_0.9 lattice_0.20-45 Matrix_1.3-4
## [109] vctrs_0.3.8 pillar_1.6.4 lifecycle_1.0.1
## [112] jquerylib_0.1.4 data.table_1.14.2 cowplot_1.1.1
## [115] bitops_1.0-7 httpuv_1.6.3 patchwork_1.1.1
## [118] qvalue_2.24.0 R6_2.5.1 promises_1.2.0.1
## [121] KernSmooth_2.23-20 echarts4r_0.4.2 gridExtra_2.3
## [124] MASS_7.3-54 gtools_3.9.2 assertthat_0.2.1
## [127] GenomeInfoDbData_1.2.6 grid_4.1.2 ggfun_0.0.4
## [130] tidyr_1.1.4 rmarkdown_2.11 ggforce_0.3.3
## [133] shiny_1.7.1