Source: https://github.com/markziemann/SurveyEnrichmentMethods

Intro

suppressPackageStartupMessages({
library("getDEE2")
library("DESeq2")
library("clusterProfiler")
library("org.Hs.eg.db")
library("mitch")
library("kableExtra")
library("eulerr")
})

In Kaumadi et al 2021 we showed that major problems plague functional enrichment analysis in many journal articles.

Here I will do my best to reproduce and replicate the findings of some studies. These were selected on the following criteria:

  • Included in the 2019 group that was studies in Kaumadi et al 2021.

  • Human study

  • RNA-seq gene expression

  • Performed DAVID gene set analysis

  • Provided gene list

From this subset, four studies were selected

Study Analytical issues
PMC6349697 Background
PMC6425008 Background, FDR
PMC6463127 Background, FDR
PMC6535219 Background

In this document I will be seeing if I can reproduce the same findings by using the same tool with the gene list provided in the article. Also I will correct the method wherever possible. This means using the correct background gene list, using FDR control and performing separate analysis of up and down-regulated gene lists.

The study examined in this document is PMC6425008: Balusamy et al, 2019.

In this study, the effect of citral (3,7-dimethyl-2,6-octadienal) on cells is investigated. In the paper, they isolate the compound with chromatography, followed by in vitro treatment of AGL cell line with different levels of the compound. Treated cells were characterised in terms of morphology and viability. Citral caused shrunken cells, amorphous shape and lower cell number. To understand the pattern of gene expression, RNA-seq of n=1 control and n=1 treated samples was performed yielding 75 and 65 M read pairs respectively which were deposited to SRA (SRP150561).

Some type of differential analysis was performed with Cuffdiff, but with no replicates, this might not be reliable. Anyway, the up and down regulated gene lists are found in (Supplementary File S6)[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6425008/bin/41598_2019_41406_MOESM7_ESM.xlsx]. It contains 612 genes; 216 up and 396 downregulated genes.

There is a network analysis which looks a bit like a hairball and isn’t described in any detail.

Figures 5-7 show the results of the enrichment analysis. A number of gene sets are presented as significant, despite lack of FDR correction. There was no mention of a background gene list in the article.

Here I have summarised the “significant” findings presented in the supplements. There are some problems. Firstly, only 7 of these have an FDR (BH method) < 0.05. Secondly, some of these show that only 1 gene was found in a gene set. Surely one gene on its own can’t constitute a valid enrichment.

orig <- read.table("PMC6425008_orig.tsv",sep="\t",header=TRUE)

orig %>% kbl(caption="Enrichment results presented in PMC6425008 supplementary files") %>% kable_paper("hover", full_width = F)
Enrichment results presented in PMC6425008 supplementary files
GO.class GO.Term No..of.genes.in.the.dataset No..of.genes.in.the.background.dataset Percentage.of.genes Fold.enrichment P.value..Hypergeometric.test. BH.method
GO_BP Protein metabolism 43 1323 22.4 3.1 0.0e+00 0.0e+00
GO_BP Intracellular signaling cascade 1 3 0.5 31.7 3.1e-02 1.0e+00
GO_BP Chromosome organization and biogenesis 1 4 0.5 23.8 4.2e-02 1.0e+00
GO_MF Heat shock protein activity 8 27 4.2 28.0 0.0e+00 1.0e-07
GO_MF Ubiquitin-specific protease activity 16 377 8.3 4.0 2.8e-06 3.1e-04
GO_MF Structural constituent of ribosome 7 152 3.6 4.4 1.2e-03 8.9e-02
GO_MF Oxidoreductase activity 7 161 3.6 4.1 1.7e-03 9.3e-02
GO_MF Chaperone activity 6 126 3.1 4.5 2.3e-03 1.0e-01
GO_MF ATPase activity 4 111 2.1 3.4 3.0e-02 7.8e-01
GO_MF Storage protein 1 3 0.5 31.7 3.1e-02 7.8e-01
GO_MF Superoxide dismutase activity 1 3 0.5 31.7 3.1e-02 7.8e-01
GO_MF Transferase activity, transferring aldehyde or ketonic groups 1 3 0.5 31.7 3.1e-02 7.8e-01
GO_MF Protein transporter activity 1 4 0.5 23.8 4.2e-02 9.3e-01
GO_CC Proteasome complex 10 34 5.9 25.3 0.0e+00 0.0e+00
GO_CC Exosomes 59 2043 34.9 2.5 0.0e+00 0.0e+00
GO_CC Lysosome 39 1620 23.1 2.1 6.6e-06 1.7e-03
GO_CC Inclusion body 3 6 1.8 43.1 3.0e-05 5.9e-03

In this work, I will see if I can replicate the results for the 7 FDR significant gene sets.

subset(orig,BH.method < 0.05)$GO.Term
## [1] "Protein metabolism"                  
## [2] "Heat shock protein activity"         
## [3] "Ubiquitin-specific protease activity"
## [4] "Proteasome complex"                  
## [5] "Exosomes"                            
## [6] "Lysosome"                            
## [7] "Inclusion body"

The article also shows some KEGG pathway analysis in Table 2 but shows results for individual genes and doesn’t present any results of enrichment tests for gene sets.

In Table 3 of the article, GO enrichment table is shown for apoptosis pathways. It seems only two of the sets are significant: “Response to unfolded protein” and “Regulation of cell cycle”

orig2 <- read.table("PMC6425008_orig2.tsv",sep="\t",header=TRUE)

orig2 %>% kbl(caption="Enrichment results presented in PMC6425008 Table 3") %>% kable_paper("hover", full_width = F)
Enrichment results presented in PMC6425008 Table 3
Go.terms Category Number.of.genes Go.id up_P.value dn_p.value
Response to unfolded protein BP 8 GO:0006986 0.00e+00 NA
Negative regulation of programmed cell death BP 2 GO:0043069 6.20e-01 3.30e-01
Negative regulation of cell death BP 4 GO:0060548 7.70e-01 5.70e-01
Regulation of cell cycle BP 19 GO:0051726 1.64e-05 1.64e-05
Negative regulation of programmed cell death BP 2 GO:0043069 6.20e-01 3.30e-01
Positive regulation of cell death BP 1 GO:0010942 1.20e-01 NA
Negative regulation of cell death BP 4 GO:0060548 7.70e-01 5.70e-01
Intracellular membrane-bounded organelle CC 10 GO:0043231 NA 6.70e-01
Negative regulation of cell growth BP 2 GO:0030308 NA 8.70e-01
Negative regulation of catalytic activity BP 1 GO:0043086 6.40e-01 NA
Cell cycle arrest BP 4 GO:0007050 NA 6.40e-01
Negative regulation of growth BP 2 GO:0045926 4.00e-01 NA
Cellular protein metabolic process BP 5 GO:0044267 NA 6.40e-01
Positive regulation of cellular protein metabolic process BP 1 GO:0032270 NA 7.60e-01

In addition, the results of Table 3 are inconsistent with the GO results shown in the supplements. For example “Response to unfolded protein” and “Regulation of cell cycle” are missing from the supplementary tables.

Try to reproduce

I submitted the 216 upregulated genes to DAVID and these were mapped to 213 DAVID gene IDs. This analysis without a background list gave 31 BP, 17 CC and 10 MF significant GO terms (see table below). This is drastically different to what is shown in the article.

repl1 <- read.table("PMC6425008_repl1.tsv", header=TRUE, sep="\t")

repl1 %>% kbl(caption="DAVID Enrichment results when using the genes provided in the supplement and a corrected background gene list.") %>% kable_paper("hover", full_width = F)
DAVID Enrichment results when using the genes provided in the supplement and a corrected background gene list.
Category Term Count X. PValue List.Total Pop.Hits Pop.Total Fold.Enrichment Benjamini
GOTERM_BP_DIRECT GO:0043488~regulation of mRNA stability 18 8.450704 0.0000000 179 103 16792 16.393990 0.0000000
GOTERM_BP_DIRECT GO:0038061~NIK/NF-kappaB signaling 13 6.103286 0.0000000 179 66 16792 18.477738 0.0000000
GOTERM_BP_DIRECT GO:0051436~negative regulation of ubiquitin-protein ligase activity involved in mitotic cell cycle 13 6.103286 0.0000000 179 71 16792 17.176489 0.0000000
GOTERM_BP_DIRECT GO:0051437~positive regulation of ubiquitin-protein ligase activity involved in regulation of mitotic cell cycle transition 13 6.103286 0.0000000 179 76 16792 16.046457 0.0000000
GOTERM_BP_DIRECT GO:0031145~anaphase-promoting complex-dependent catabolic process 13 6.103286 0.0000000 179 79 16792 15.437098 0.0000000
GOTERM_BP_DIRECT GO:0002223~stimulatory C-type lectin receptor signaling pathway 14 6.572770 0.0000000 179 105 16792 12.508007 0.0000000
GOTERM_BP_DIRECT GO:0006521~regulation of cellular amino acid metabolic process 11 5.164319 0.0000000 179 51 16792 20.233542 0.0000000
GOTERM_BP_DIRECT GO:0060071~Wnt signaling pathway, planar cell polarity pathway 13 6.103286 0.0000000 179 92 16792 13.255769 0.0000000
GOTERM_BP_DIRECT GO:0033209~tumor necrosis factor-mediated signaling pathway 14 6.572770 0.0000000 179 118 16792 11.130007 0.0000000
GOTERM_BP_DIRECT GO:0002479~antigen processing and presentation of exogenous peptide antigen via MHC class I, TAP-dependent 11 5.164319 0.0000000 179 63 16792 16.379534 0.0000001
GOTERM_BP_DIRECT GO:0000209~protein polyubiquitination 16 7.511737 0.0000000 179 184 16792 8.157396 0.0000001
GOTERM_BP_DIRECT GO:0090090~negative regulation of canonical Wnt signaling pathway 15 7.042254 0.0000000 179 163 16792 8.632827 0.0000002
GOTERM_BP_DIRECT GO:0090263~positive regulation of canonical Wnt signaling pathway 13 6.103286 0.0000000 179 120 16792 10.162756 0.0000004
GOTERM_BP_DIRECT GO:0050852~T cell receptor signaling pathway 14 6.572770 0.0000000 179 148 16792 8.873924 0.0000004
GOTERM_BP_DIRECT GO:1900034~regulation of cellular response to heat 11 5.164319 0.0000000 179 75 16792 13.758808 0.0000004
GOTERM_BP_DIRECT GO:0006986~response to unfolded protein 9 4.225352 0.0000000 179 42 16792 20.102155 0.0000007
GOTERM_BP_DIRECT GO:0038095~Fc-epsilon receptor signaling pathway 14 6.572770 0.0000001 179 178 16792 7.378319 0.0000030
GOTERM_BP_DIRECT GO:0006457~protein folding 14 6.572770 0.0000001 179 180 16792 7.296338 0.0000032
GOTERM_BP_DIRECT GO:0043161~proteasome-mediated ubiquitin-dependent protein catabolic process 14 6.572770 0.0000003 179 203 16792 6.469659 0.0000125
GOTERM_BP_DIRECT GO:0090383~phagosome acidification 6 2.816901 0.0000085 179 27 16792 20.846679 0.0003880
GOTERM_BP_DIRECT GO:0042026~protein refolding 5 2.347418 0.0000152 179 15 16792 31.270019 0.0006649
GOTERM_BP_DIRECT GO:0015991~ATP hydrolysis coupled proton transport 6 2.816901 0.0000202 179 32 16792 17.589385 0.0008429
GOTERM_BP_DIRECT GO:0000165~MAPK cascade 13 6.103286 0.0000236 179 262 16792 4.654697 0.0009433
GOTERM_BP_DIRECT GO:0033572~transferrin transport 6 2.816901 0.0000317 179 35 16792 16.081724 0.0012140
GOTERM_BP_DIRECT GO:0090084~negative regulation of inclusion body assembly 4 1.877934 0.0001331 179 10 16792 37.524022 0.0048858
GOTERM_BP_DIRECT GO:0009408~response to heat 6 2.816901 0.0001498 179 48 16792 11.726257 0.0050924
GOTERM_BP_DIRECT GO:0051603~proteolysis involved in cellular protein catabolic process 6 2.816901 0.0001498 179 48 16792 11.726257 0.0050924
GOTERM_BP_DIRECT GO:0031396~regulation of protein ubiquitination 4 1.877934 0.0005925 179 16 16792 23.452514 0.0194253
GOTERM_BP_DIRECT GO:0006979~response to oxidative stress 7 3.286385 0.0011237 179 110 16792 5.969731 0.0355706
GOTERM_BP_DIRECT GO:0016241~regulation of macroautophagy 5 2.347418 0.0011903 179 44 16792 10.660234 0.0364238
GOTERM_BP_DIRECT GO:0008286~insulin receptor signaling pathway 6 2.816901 0.0014301 179 78 16792 7.216158 0.0423494
GOTERM_CC_DIRECT GO:0000502~proteasome complex 14 6.572770 0.0000000 187 60 18224 22.739394 0.0000000
GOTERM_CC_DIRECT GO:0005829~cytosol 70 32.863850 0.0000000 187 3315 18224 2.057864 0.0000001
GOTERM_CC_DIRECT GO:0070062~extracellular exosome 62 29.107981 0.0000000 187 2811 18224 2.149478 0.0000002
GOTERM_CC_DIRECT GO:0005654~nucleoplasm 57 26.760563 0.0000002 187 2784 18224 1.995298 0.0000120
GOTERM_CC_DIRECT GO:0022624~proteasome accessory complex 6 2.816901 0.0000006 187 17 18224 34.395722 0.0000269
GOTERM_CC_DIRECT GO:0005737~cytoplasm 83 38.967136 0.0000048 187 5222 18224 1.548971 0.0001832
GOTERM_CC_DIRECT GO:0005634~nucleus 82 38.497653 0.0000404 187 5415 18224 1.475766 0.0013227
GOTERM_CC_DIRECT GO:0005839~proteasome core complex 5 2.347418 0.0000549 187 21 18224 23.203463 0.0015712
GOTERM_CC_DIRECT GO:0005838~proteasome regulatory particle 4 1.877934 0.0001625 187 11 18224 35.438017 0.0041352
GOTERM_CC_DIRECT GO:0008540~proteasome regulatory particle, base subcomplex 4 1.877934 0.0002151 187 12 18224 32.484848 0.0049250
GOTERM_CC_DIRECT GO:0016234~inclusion body 4 1.877934 0.0007625 187 18 18224 21.656566 0.0158738
GOTERM_CC_DIRECT GO:0005765~lysosomal membrane 10 4.694836 0.0020590 187 274 18224 3.556735 0.0392932
GOTERM_CC_DIRECT GO:0000786~nucleosome 6 2.816901 0.0027678 187 94 18224 6.220503 0.0425362
GOTERM_CC_DIRECT GO:0031595~nuclear proteasome complex 3 1.408451 0.0027862 187 8 18224 36.545454 0.0425362
GOTERM_CC_DIRECT GO:0033180~proton-transporting V-type ATPase, V1 domain 3 1.408451 0.0027862 187 8 18224 36.545454 0.0425362
GOTERM_CC_DIRECT GO:0033179~proton-transporting V-type ATPase, V0 domain 3 1.408451 0.0035583 187 9 18224 32.484848 0.0509278
GOTERM_CC_DIRECT GO:0031597~cytosolic proteasome complex 3 1.408451 0.0044181 187 10 18224 29.236364 0.0595140
GOTERM_MF_DIRECT GO:0051082~unfolded protein binding 13 6.103286 0.0000000 182 110 16881 10.961688 0.0000003
GOTERM_MF_DIRECT GO:0005515~protein binding 134 62.910798 0.0000000 182 8785 16881 1.414783 0.0000003
GOTERM_MF_DIRECT GO:0051087~chaperone binding 10 4.694836 0.0000002 182 81 16881 11.450957 0.0000200
GOTERM_MF_DIRECT GO:0046961~proton-transporting ATPase activity, rotational mechanism 6 2.816901 0.0000073 182 26 16881 21.404480 0.0005345
GOTERM_MF_DIRECT GO:0015078~hydrogen ion transmembrane transporter activity 6 2.816901 0.0000213 182 32 16881 17.391140 0.0011485
GOTERM_MF_DIRECT GO:0055131~C3HC4-type RING finger domain binding 4 1.877934 0.0000237 182 6 16881 61.835165 0.0011485
GOTERM_MF_DIRECT GO:0004298~threonine-type endopeptidase activity 5 2.347418 0.0000663 182 21 16881 22.083987 0.0027580
GOTERM_MF_DIRECT GO:0016887~ATPase activity 10 4.694836 0.0001614 182 183 16881 5.068456 0.0058694
GOTERM_MF_DIRECT GO:0030544~Hsp70 protein binding 5 2.347418 0.0004103 182 33 16881 14.053447 0.0132653
GOTERM_MF_DIRECT GO:0036402~proteasome-activating ATPase activity 3 1.408451 0.0016671 182 6 16881 46.376374 0.0485127

Try to reproduce with corrected background gene list

Now I’ll try this again, but using a correct background gene set. As the data is found on the DEE2 database, I can see which genes are detected and I can use those ones as a background. As the results below show, there are 14209 genes in the background.

m <- getDEE2::getDEE2Metadata(species="hsapiens")
ss <- m[grep("SRP150561",m$SRP_accession),]
x <- getDEE2(species="hsapiens",SRRvec=ss$SRR_accession,legacy=TRUE)
## For more information about DEE2 QC metrics, visit
##     https://github.com/markziemann/dee2/blob/master/qc/qc_metrics.md
mx <- x$GeneCounts
mxf <- mx[which(rowMeans(mx)>=10),]
nrow(mx) #58302
## [1] 58302
nrow(mxf) #14119
## [1] 14218
bg <- rownames(mxf)

geneinfo <- x$GeneInfo
bgg <- unique(geneinfo[which(rownames(geneinfo) %in%bg),"GeneSymbol"])
length(bgg)
## [1] 14209
writeLines(bgg,"PMC6425008_bg.txt")

So I tried DAVID with the new background, but was greeted with “Due to the ambiguous nature of Gene Symbol, DAVID does not allow it to be used for a background.” This poses a problem for many researchers.

Anyway so I will use Ensembl IDs instead (14218 IDs).

writeLines(bg,"PMC6425008_bg2.txt")

Here are the results when using all detected genes as the background.

repl2 <- read.table("PMC6425008_repl2.tsv", header=TRUE, sep="\t")

repl2 %>% kbl(caption="DAVID Enrichment results when using the genes provided in the supplement.") %>% kable_paper("hover", full_width = F)
DAVID Enrichment results when using the genes provided in the supplement.
Category Term Count X. PValue List.Total Pop.Hits Pop.Total Fold.Enrichment Benjamini
GOTERM_MF_DIRECT GO:0051082~unfolded protein binding 13 6.103286 0.0000000 182 110 16881 10.961688 0.0000003
GOTERM_MF_DIRECT GO:0005515~protein binding 134 62.910798 0.0000000 182 8785 16881 1.414783 0.0000003
GOTERM_MF_DIRECT GO:0051087~chaperone binding 10 4.694836 0.0000002 182 81 16881 11.450957 0.0000200
GOTERM_MF_DIRECT GO:0046961~proton-transporting ATPase activity, rotational mechanism 6 2.816901 0.0000073 182 26 16881 21.404480 0.0005345
GOTERM_MF_DIRECT GO:0015078~hydrogen ion transmembrane transporter activity 6 2.816901 0.0000213 182 32 16881 17.391140 0.0011485
GOTERM_MF_DIRECT GO:0055131~C3HC4-type RING finger domain binding 4 1.877934 0.0000237 182 6 16881 61.835165 0.0011485
GOTERM_MF_DIRECT GO:0004298~threonine-type endopeptidase activity 5 2.347418 0.0000663 182 21 16881 22.083987 0.0027580
GOTERM_MF_DIRECT GO:0016887~ATPase activity 10 4.694836 0.0001614 182 183 16881 5.068456 0.0058694
GOTERM_MF_DIRECT GO:0030544~Hsp70 protein binding 5 2.347418 0.0004103 182 33 16881 14.053447 0.0132653
GOTERM_MF_DIRECT GO:0036402~proteasome-activating ATPase activity 3 1.408451 0.0016671 182 6 16881 46.376374 0.0485127
GOTERM_CC_DIRECT GO:0000502~proteasome complex 13 6.103286 0.0000000 179 57 10737 13.680388 0.0000000
GOTERM_CC_DIRECT GO:0070062~extracellular exosome 59 27.699531 0.0000003 179 1842 10737 1.921287 0.0000347
GOTERM_CC_DIRECT GO:0022624~proteasome accessory complex 6 2.816901 0.0000045 179 16 10737 22.493715 0.0003196
GOTERM_CC_DIRECT GO:0005829~cytosol 68 31.924883 0.0000519 179 2621 10737 1.556223 0.0027916
GOTERM_CC_DIRECT GO:0005839~proteasome core complex 5 2.347418 0.0002330 179 19 10737 15.785063 0.0100174
GOTERM_CC_DIRECT GO:0005838~proteasome regulatory particle 4 1.877934 0.0006702 179 11 10737 21.812087 0.0205860
GOTERM_CC_DIRECT GO:0008540~proteasome regulatory particle, base subcomplex 4 1.877934 0.0006702 179 11 10737 21.812087 0.0205860
GOTERM_CC_DIRECT GO:0016234~inclusion body 4 1.877934 0.0011337 179 13 10737 18.456382 0.0304681
GOTERM_MF_DIRECT GO:0051082~unfolded protein binding 12 5.633803 0.0000002 176 87 10274 8.051724 0.0000667
GOTERM_MF_DIRECT GO:0051087~chaperone binding 9 4.225352 0.0000149 176 66 10274 7.960227 0.0018296
GOTERM_MF_DIRECT GO:0055131~C3HC4-type RING finger domain binding 4 1.877934 0.0000192 176 4 10274 58.375000 0.0018296
GOTERM_MF_DIRECT GO:0004298~threonine-type endopeptidase activity 5 2.347418 0.0002065 176 18 10274 16.215278 0.0147632
GOTERM_MF_DIRECT GO:0046961~proton-transporting ATPase activity, rotational mechanism 5 2.347418 0.0002581 176 19 10274 15.361842 0.0147632
GOTERM_MF_DIRECT GO:0015078~hydrogen ion transmembrane transporter activity 5 2.347418 0.0004680 176 22 10274 13.267046 0.0223099
GOTERM_MF_DIRECT GO:0030544~Hsp70 protein binding 5 2.347418 0.0007778 176 25 10274 11.675000 0.0317777

Do the results reproduce?

Considering the results shown in the supplement, there were 7 FDR significant sets.

When the analysis was rerun without a specific background there were mixed results with most of the findings not reproducing.

  • Did reproduce: “Proteasome complex”, “Inclusion body”

  • Did not reproduce: “Protein metabolism”, “Heat shock protein activity”, “Ubiquitin-specific protease activity”, “Exosomes”, “Lysosome”

When considering the re-analysis with the corrected background set:

  • Did reproduce: “Proteasome complex”, “Inclusion body”

  • Did not reproduce: “Protein metabolism”, “Heat shock protein activity”, “Ubiquitin-specific protease activity”, “Exosomes”, “Lysosome”

This suggests that the main problem here is that the results shown in the supplement are incorrect and are inconsistent with the gene list provided in the supplement.

Regarding the results shown in Table 3, “Response to unfolded protein” was reproduced, while “Regulation of cell cycle” wasn’t reproduced.

What about the conclusions of the study?

Some interesting statements about the enrichment analysis:

  • Abstract:

“… Furthermore, the enrichment and KEGG pathway results suggested that there are several genes involved to induce apoptosis pathway.”

“.. The enrichment analysis identified DEGs genes from transcriptome libraries including cell death, cell cycle, apoptosis and cell growth.”

“The present study showed the significant inhibition effect upon citral by regulating various genes involved in signaling pathways, inhibits metastasis, colony formation and induced apoptosis both in silico and in vitro.”

  • Results and Discussion:

“KEGG analysis revealed that MAPK signaling pathway, NF-kappa B signaling pathway, p53 signaling pathway, PI3-K-Akt signaling pathway, pathways in cancer, and prostate cancer pathways are mostly enriched in up and down-regulated genes which identified between normal and citral treated samples in AGS. Among them, cell cycle, pathways in cancer, and MAPK signaling pathways were observed as mostly enriched pathways. In addition, several common genes were identified in these pathways were listed in Table 2. Among them, cell cycle (hsa04110) and cancer (hsa05200) occupied several down-regulated DEG genes. The genes involved in cell cycle synthesis are very crucial for the inhibition of cancer cell progression19. For example, cyclin-dependent kinases (CKD2) are the most important regulators for the transition and cell cycle progression and play a significant role in regulating cell cycle division including centromere duplication, DNA synthesis, G1-S transition, and modulation of G2 progression20–22. Furthermore, the enrichment analysis and KEGG pathway results indicated that, there are several genes involved to initiate apoptosis pathway that are directly/indirectly related to the treatment of citral. In addition, enrichment analysis identified DEGs genes from transcriptome libraries were mainly involved in cell death, cell cycle, apoptotic process, and cell growth (Table 3).”

To summarise the conclusions, authors allude to enrichment analysis identifying apoptosis pathways. They speak about the KEGG classifications but there was no statistical enrichment test performed so this is not a quantitative enrichment. Specifically they mention table 3 supports enrichment analysis that indicates “cell death, cell cycle, apoptotic process, and cell growth”

The only hint of apoptosis pathways identified was the “Ubiquitin-specific protease activity” found in the original study (FDR=3.1e-04), and this was partially reproduced with DAVID without a background set (GO:0051436~negative regulation of ubiquitin-protein ligase activity involved in mitotic cell cycle; FDR=3.154072e-09), however when a correct background gene set was used, no ubiquitin-related pathways were identified as significant.

My conclusion

So to conclude, it appears that most pathway results weren’t reproduced regarding cell cycle, signaling or apoptosis.

What I did find, was that the upregulated genes were enriched in functions related to protein misfolding, eg: chaperones, proteasome, etc.

Session information

sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
##  [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
##  [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] eulerr_6.1.1                kableExtra_1.3.4           
##  [3] mitch_1.4.1                 org.Hs.eg.db_3.13.0        
##  [5] AnnotationDbi_1.54.1        clusterProfiler_4.0.5      
##  [7] DESeq2_1.32.0               SummarizedExperiment_1.22.0
##  [9] Biobase_2.52.0              MatrixGenerics_1.4.3       
## [11] matrixStats_0.61.0          GenomicRanges_1.44.0       
## [13] GenomeInfoDb_1.28.4         IRanges_2.26.0             
## [15] S4Vectors_0.30.2            BiocGenerics_0.38.0        
## [17] getDEE2_1.2.0              
## 
## loaded via a namespace (and not attached):
##   [1] shadowtext_0.0.9       fastmatch_1.1-3        systemfonts_1.0.3     
##   [4] plyr_1.8.6             igraph_1.2.8           lazyeval_0.2.2        
##   [7] splines_4.1.2          BiocParallel_1.26.2    ggplot2_3.3.5         
##  [10] digest_0.6.28          yulab.utils_0.0.4      htmltools_0.5.2       
##  [13] GOSemSim_2.18.1        viridis_0.6.2          GO.db_3.13.0          
##  [16] fansi_0.5.0            magrittr_2.0.1         memoise_2.0.0         
##  [19] Biostrings_2.60.2      annotate_1.70.0        graphlayouts_0.7.1    
##  [22] svglite_2.0.0          enrichplot_1.12.3      colorspace_2.0-2      
##  [25] rvest_1.0.2            blob_1.2.2             ggrepel_0.9.1         
##  [28] xfun_0.28              dplyr_1.0.7            crayon_1.4.2          
##  [31] RCurl_1.98-1.5         jsonlite_1.7.2         scatterpie_0.1.7      
##  [34] genefilter_1.74.1      survival_3.2-13        ape_5.5               
##  [37] glue_1.5.0             polyclip_1.10-0        gtable_0.3.0          
##  [40] zlibbioc_1.38.0        XVector_0.32.0         webshot_0.5.2         
##  [43] htm2txt_2.1.1          DelayedArray_0.18.0    scales_1.1.1          
##  [46] DOSE_3.18.3            DBI_1.1.1              GGally_2.1.2          
##  [49] Rcpp_1.0.7             viridisLite_0.4.0      xtable_1.8-4          
##  [52] gridGraphics_0.5-1     tidytree_0.3.6         bit_4.0.4             
##  [55] htmlwidgets_1.5.4      httr_1.4.2             fgsea_1.18.0          
##  [58] gplots_3.1.1           RColorBrewer_1.1-2     ellipsis_0.3.2        
##  [61] pkgconfig_2.0.3        reshape_0.8.8          XML_3.99-0.8          
##  [64] farver_2.1.0           sass_0.4.0             locfit_1.5-9.4        
##  [67] utf8_1.2.2             ggplotify_0.1.0        tidyselect_1.1.1      
##  [70] rlang_0.4.12           reshape2_1.4.4         later_1.3.0           
##  [73] munsell_0.5.0          tools_4.1.2            cachem_1.0.6          
##  [76] downloader_0.4         generics_0.1.1         RSQLite_2.2.8         
##  [79] evaluate_0.14          stringr_1.4.0          fastmap_1.1.0         
##  [82] yaml_2.2.1             ggtree_3.0.4           knitr_1.36            
##  [85] bit64_4.0.5            tidygraph_1.2.0        caTools_1.18.2        
##  [88] purrr_0.3.4            KEGGREST_1.32.0        ggraph_2.0.5          
##  [91] nlme_3.1-153           mime_0.12              aplot_0.1.1           
##  [94] xml2_1.3.2             DO.db_2.9              rstudioapi_0.13       
##  [97] compiler_4.1.2         beeswarm_0.4.0         png_0.1-7             
## [100] treeio_1.16.2          tibble_3.1.6           tweenr_1.0.2          
## [103] geneplotter_1.70.0     bslib_0.3.1            stringi_1.7.5         
## [106] highr_0.9              lattice_0.20-45        Matrix_1.3-4          
## [109] vctrs_0.3.8            pillar_1.6.4           lifecycle_1.0.1       
## [112] jquerylib_0.1.4        data.table_1.14.2      cowplot_1.1.1         
## [115] bitops_1.0-7           httpuv_1.6.3           patchwork_1.1.1       
## [118] qvalue_2.24.0          R6_2.5.1               promises_1.2.0.1      
## [121] KernSmooth_2.23-20     echarts4r_0.4.2        gridExtra_2.3         
## [124] MASS_7.3-54            gtools_3.9.2           assertthat_0.2.1      
## [127] GenomeInfoDbData_1.2.6 grid_4.1.2             ggfun_0.0.4           
## [130] tidyr_1.1.4            rmarkdown_2.11         ggforce_0.3.3         
## [133] shiny_1.7.1