Source: https://github.com/markziemann/SurveyEnrichmentMethods
suppressPackageStartupMessages({
library("getDEE2")
library("DESeq2")
library("clusterProfiler")
library("org.Hs.eg.db")
library("mitch")
library("kableExtra")
library("eulerr")
})
In Kaumadi et al 2021 we showed that major problems plague functional enrichment analysis in many journal articles.
Here I will do my best to reproduce and replicate the findings of some studies. These were selected on the following criteria:
Included in the 2019 group that was studied in Kaumadi et al 2021.
Human study
RNA-seq gene expression
Performed DAVID gene set analysis
Provided gene list
From this subset, four studies were selected:
Study | Analytical issues |
---|---|
PMC6349697 | Background |
PMC6425008 | Background, FDR |
PMC6463127 | Background, FDR |
PMC6535219 | Background |
In this document I will check whether I can reproduce the findings using the same tool and the gene list provided in the article. I will also correct the method wherever possible: using the correct background gene list, applying FDR control and analysing up- and down-regulated gene lists separately.
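One of the corrections listed above is FDR control. As a minimal illustration with made-up p-values (not taken from any of the studies), base R's `p.adjust` applies the Benjamini-Hochberg procedure, and terms that look significant on raw p-values can fail once the FDR is controlled.

```r
# Made-up raw p-values from five hypothetical gene set tests
p_raw <- c(0.001, 0.008, 0.039, 0.041, 0.6)

# Benjamini-Hochberg adjustment controls the false discovery rate
p_fdr <- p.adjust(p_raw, method = "BH")

# Number of "significant" gene sets before and after adjustment
sum(p_raw < 0.05)
sum(p_fdr < 0.05)
```

Here four sets pass a raw p<0.05 threshold, but only two pass FDR<0.05.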
The study examined in this document is PMC6349697: Liu et al, 2019.
In this study, LINC00941 is the focus. Using TCGA expression data, the authors built a co-expression network and identified the genes associated with LINC00941. These genes were then subjected to enrichment analysis of GO terms and KEGG pathways using DAVID v6.8, with FDR<0.05 considered significant. The article provides no information about a background gene list.
From the article, they present in Supplementary Table S5 a list of 123 co-expressed genes.
In Supplementary Table S6 they show the 3180 genes that underwent clustering, along with the cluster assignment. There are 124 genes in the “yellow” cluster.
In Supplementary Table S7, the authors show the significant results of the enrichment tests.
Category | Term | Count | FDR |
---|---|---|---|
GOTERM_BP_DIRECT | GO:0030574~collagen catabolic process | 15 | 2.73E-15 |
GOTERM_BP_DIRECT | GO:0030199~collagen fibril organization | 13 | 8.48E-15 |
GOTERM_BP_DIRECT | GO:0030198~extracellular matrix organization | 20 | 1.43E-14 |
GOTERM_BP_DIRECT | GO:0007155~cell adhesion | 21 | 8.40E-09 |
GOTERM_BP_DIRECT | GO:0001501~skeletal system development | 10 | 2.67E-04 |
GOTERM_BP_DIRECT | GO:0001503~ossification | 7 | 0.0154 |
KEGG | hsa04512:ECM-receptor interaction | 10 | 1.81E-06 |
KEGG | hsa04974:Protein digestion and absorption | 9 | 4.53E-05 |
KEGG | hsa04510:Focal adhesion | 10 | 0.0032 |
KEGG | hsa04151:PI3K-Akt signaling pathway | 11 | 0.032 |
Here I’m going to reproduce the findings with DAVID as best I can.
Here is the list of genes used as the foreground. The gene IDs are saved in this repository as a text file called “PMC6349697_fg.txt”.
fg <- readLines("PMC6349697_fg.txt")
fg
## [1] "ENSG00000142552" "ENSG00000183682" "ENSG00000137868" "ENSG00000140937"
## [5] "ENSG00000164932" "ENSG00000099953" "ENSG00000261732" "ENSG00000163331"
## [9] "ENSG00000141756" "ENSG00000168913" "ENSG00000145040" "ENSG00000261456"
## [13] "ENSG00000183844" "ENSG00000005243" "ENSG00000072210" "ENSG00000179698"
## [17] "ENSG00000106683" "ENSG00000113739" "ENSG00000110427" "ENSG00000117586"
## [21] "ENSG00000162849" "ENSG00000124225" "ENSG00000169385" "ENSG00000164465"
## [25] "ENSG00000187959" "ENSG00000135362" "ENSG00000163359" "ENSG00000091879"
## [29] "ENSG00000184937" "ENSG00000149380" "ENSG00000148848" "ENSG00000171617"
## [33] "ENSG00000146674" "ENSG00000006327" "ENSG00000086717" "ENSG00000198108"
## [37] "ENSG00000011422" "ENSG00000133466" "ENSG00000105989" "ENSG00000112655"
## [41] "ENSG00000188483" "ENSG00000118785" "ENSG00000101198" "ENSG00000172061"
## [45] "ENSG00000134013" "ENSG00000134668" "ENSG00000011201" "ENSG00000060718"
## [49] "ENSG00000204262" "ENSG00000168542" "ENSG00000151388" "ENSG00000248329"
## [53] "ENSG00000113140" "ENSG00000132000" "ENSG00000187498" "ENSG00000130948"
## [57] "ENSG00000176887" "ENSG00000203805" "ENSG00000095752" "ENSG00000101825"
## [61] "ENSG00000138316" "ENSG00000182492" "ENSG00000128567" "ENSG00000078098"
## [65] "ENSG00000168487" "ENSG00000111799" "ENSG00000106366" "ENSG00000137745"
## [69] "ENSG00000157766" "ENSG00000214954" "ENSG00000147003" "ENSG00000102128"
## [73] "ENSG00000154096" "ENSG00000196177" "ENSG00000168334" "ENSG00000136378"
## [77] "ENSG00000115363" "ENSG00000159261" "ENSG00000125657" "ENSG00000137878"
## [81] "ENSG00000137573" "ENSG00000164694" "ENSG00000164692" "ENSG00000130635"
## [85] "ENSG00000159216" "ENSG00000131389" "ENSG00000122641" "ENSG00000168824"
## [89] "ENSG00000038427" "ENSG00000124875" "ENSG00000162745" "ENSG00000169067"
## [93] "ENSG00000123500" "ENSG00000163673" "ENSG00000117122" "ENSG00000130720"
## [97] "ENSG00000113083" "ENSG00000204767" "ENSG00000155886" "ENSG00000119711"
## [101] "ENSG00000103888" "ENSG00000181378" "ENSG00000149257" "ENSG00000188611"
## [105] "ENSG00000087303" "ENSG00000087116" "ENSG00000104415" "ENSG00000108821"
## [109] "ENSG00000222047" "ENSG00000170373" "ENSG00000114270" "ENSG00000144810"
## [113] "ENSG00000188064" "ENSG00000187730" "ENSG00000171722" "ENSG00000122861"
## [117] "ENSG00000128342" "ENSG00000137809" "ENSG00000225614" "ENSG00000196611"
## [121] "ENSG00000177202" "ENSG00000186340" "ENSG00000086991" "ENSG00000235884"
These genes were submitted to DAVID v6.8 on 29th November 2021 without a custom background gene set (i.e. using DAVID's default whole-genome background), which gave the following results.
res1 <- read.table("PMC6349697_res1.tsv",header=TRUE,sep="\t")
res1 %>% kbl(caption="DAVID results obtained using article gene list without background") %>% kable_paper("hover", full_width = F)
Category | Term | Count | % | PValue | List.Total | Pop.Hits | Pop.Total | Fold.Enrichment | FDR |
---|---|---|---|---|---|---|---|---|---|
GOTERM_BP_DIRECT | GO:0030574~collagen catabolic process | 15 | 12.195122 | 0.0000000 | 105 | 64 | 16792 | 37.482143 | 0.0000000 |
GOTERM_BP_DIRECT | GO:0030199~collagen fibril organization | 13 | 10.569106 | 0.0000000 | 105 | 39 | 16792 | 53.307936 | 0.0000000 |
GOTERM_BP_DIRECT | GO:0030198~extracellular matrix organization | 20 | 16.260163 | 0.0000000 | 105 | 196 | 16792 | 16.318756 | 0.0000000 |
GOTERM_BP_DIRECT | GO:0007155~cell adhesion | 21 | 17.073171 | 0.0000000 | 105 | 459 | 16792 | 7.316776 | 0.0000000 |
GOTERM_BP_DIRECT | GO:0001501~skeletal system development | 10 | 8.130081 | 0.0000002 | 105 | 137 | 16792 | 11.673271 | 0.0000257 |
GOTERM_BP_DIRECT | GO:0001503~ossification | 7 | 5.691057 | 0.0000101 | 105 | 80 | 16792 | 13.993333 | 0.0012345 |
GOTERM_BP_DIRECT | GO:0022617~extracellular matrix disassembly | 6 | 4.878049 | 0.0001078 | 105 | 76 | 16792 | 12.625564 | 0.0098657 |
GOTERM_BP_DIRECT | GO:0030324~lung development | 6 | 4.878049 | 0.0001078 | 105 | 76 | 16792 | 12.625564 | 0.0098657 |
GOTERM_BP_DIRECT | GO:0032964~collagen biosynthetic process | 3 | 2.439024 | 0.0005607 | 105 | 6 | 16792 | 79.961905 | 0.0443490 |
GOTERM_BP_DIRECT | GO:0035987~endodermal cell differentiation | 4 | 3.252032 | 0.0006059 | 105 | 27 | 16792 | 23.692416 | 0.0443490 |
GOTERM_BP_DIRECT | GO:0007507~heart development | 7 | 5.691057 | 0.0009683 | 105 | 183 | 16792 | 6.117304 | 0.0633311 |
GOTERM_BP_DIRECT | GO:0071711~basement membrane organization | 3 | 2.439024 | 0.0010382 | 105 | 8 | 16792 | 59.971429 | 0.0633311 |
GOTERM_BP_DIRECT | GO:0001568~blood vessel development | 4 | 3.252032 | 0.0016632 | 105 | 38 | 16792 | 16.834085 | 0.0936507 |
KEGG_PATHWAY | hsa04512:ECM-receptor interaction | 10 | 8.130081 | 0.0000000 | 43 | 87 | 6879 | 18.388132 | 0.0000001 |
KEGG_PATHWAY | hsa04974:Protein digestion and absorption | 9 | 7.317073 | 0.0000000 | 43 | 88 | 6879 | 16.361258 | 0.0000015 |
KEGG_PATHWAY | hsa04510:Focal adhesion | 10 | 8.130081 | 0.0000031 | 43 | 206 | 6879 | 7.765861 | 0.0000738 |
KEGG_PATHWAY | hsa04151:PI3K-Akt signaling pathway | 11 | 8.943089 | 0.0000308 | 43 | 345 | 6879 | 5.100708 | 0.0005549 |
KEGG_PATHWAY | hsa05146:Amoebiasis | 6 | 4.878049 | 0.0004272 | 43 | 106 | 6879 | 9.055287 | 0.0061517 |
KEGG_PATHWAY | hsa04060:Cytokine-cytokine receptor interaction | 7 | 5.691057 | 0.0032944 | 43 | 243 | 6879 | 4.608384 | 0.0395333 |
KEGG_PATHWAY | hsa04611:Platelet activation | 5 | 4.065041 | 0.0078237 | 43 | 130 | 6879 | 6.152952 | 0.0804719 |
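To make the DAVID columns concrete, the sketch below recomputes Fold.Enrichment for two rows of the table above and runs a one-sided Fisher's exact test alongside an EASE-style score (the same test with one gene subtracted from the list hits). The 2x2 layout (list hits vs population hits) is an assumption based on the example in DAVID's help pages, `david_like_stats` is a hypothetical helper, and its p-values are not guaranteed to match DAVID's PValue column exactly.

```r
# Hypothetical helper: DAVID-style statistics for a single term, assuming a
# 2x2 table with the gene list in one column and the population in the other
david_like_stats <- function(count, list_total, pop_hits, pop_total) {
  # Fold enrichment: proportion of list genes in the term divided by the
  # proportion of population genes in the term
  fold <- (count / list_total) / (pop_hits / pop_total)
  # Contingency table for a given number of list hits
  tab <- function(hits) {
    matrix(c(hits, list_total - count, pop_hits, pop_total - pop_hits),
           nrow = 2)
  }
  # Ordinary one-sided Fisher's exact test
  fisher_p <- fisher.test(tab(count), alternative = "greater")$p.value
  # EASE-style score: one gene removed from the list hits, which makes the
  # test more conservative, particularly for terms with few genes
  ease <- fisher.test(tab(count - 1), alternative = "greater")$p.value
  list(fold = fold, fisher = fisher_p, ease = ease)
}

# Values from the "collagen catabolic process" and "ossification" rows
coll <- david_like_stats(15, 105, 64, 16792)
ossif <- david_like_stats(7, 105, 80, 16792)
coll$fold   # about 37.48, as in the Fold.Enrichment column
ossif
```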
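Using the gene list as provided by the authors, without a custom background gene list, all six GO terms and four KEGG pathways reported in the article were reproduced with identical gene counts, although the FDR values differ from those in Table S7. Four additional GO terms and two additional KEGG pathways also passed FDR<0.05.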
Next I used a corrected background gene list. Table S6 lists the genes in all of the co-expression clusters, totalling 3180 genes, but this is likely not the full set of genes detected in the dataset. Therefore, I downloaded the TCGA data used in the study and selected all genes with an average of at least 10 reads per sample.
There are 25947 genes in the background.
bg <- readLines("PMC6349697_bg.txt")
length(bg)
## [1] 25947
These genes are stored as “PMC6349697_bg.txt” in the repo.
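The background filter described above can be sketched as follows, assuming a genes-by-samples matrix of raw read counts with Ensembl IDs as row names. The toy matrix and object names are hypothetical stand-ins for the TCGA data; the script actually used to build “PMC6349697_bg.txt” is not shown here.

```r
# Toy genes-by-samples matrix of raw read counts (hypothetical values standing
# in for the TCGA count matrix; matrix() fills column by column)
counts <- matrix(c(0, 5, 12, 30,     # sample1
                   20, 8, 15, 40,    # sample2
                   1, 2, 9, 50),     # sample3
                 nrow = 4,
                 dimnames = list(paste0("ENSG_toy", 1:4),
                                 paste0("sample", 1:3)))

# Keep genes averaging at least 10 reads per sample
min_mean_reads <- 10
bg_toy <- rownames(counts)[rowMeans(counts) >= min_mean_reads]
bg_toy
```

With the real objects, `all(fg %in% bg)` is a quick check that every foreground gene is also present in the background, which a valid background list should satisfy.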
Next I repeated DAVID analysis, this time using the correct background gene list. Here is the result I obtained.
res2 <- read.table("PMC6349697_res2.tsv",header=TRUE,sep="\t")
res2 %>% kbl(caption="DAVID results obtained using the correct background") %>% kable_paper("hover", full_width = F)
Category | Term | Count | % | PValue | List.Total | Pop.Hits | Pop.Total | Fold.Enrichment | FDR |
---|---|---|---|---|---|---|---|---|---|
GOTERM_BP_DIRECT | GO:0030574~collagen catabolic process | 15 | 12.195122 | 0.0000000 | 103 | 60 | 14590 | 35.412621 | 0.0000000 |
GOTERM_BP_DIRECT | GO:0030199~collagen fibril organization | 13 | 10.569106 | 0.0000000 | 103 | 37 | 14590 | 49.769089 | 0.0000000 |
GOTERM_BP_DIRECT | GO:0030198~extracellular matrix organization | 20 | 16.260163 | 0.0000000 | 103 | 185 | 14590 | 15.313566 | 0.0000000 |
GOTERM_BP_DIRECT | GO:0007155~cell adhesion | 21 | 17.073171 | 0.0000000 | 103 | 429 | 14590 | 6.933940 | 0.0000000 |
GOTERM_BP_DIRECT | GO:0001501~skeletal system development | 10 | 8.130081 | 0.0000002 | 103 | 126 | 14590 | 11.242102 | 0.0000339 |
GOTERM_BP_DIRECT | GO:0001503~ossification | 7 | 5.691057 | 0.0000100 | 103 | 71 | 14590 | 13.965541 | 0.0012063 |
GOTERM_BP_DIRECT | GO:0030324~lung development | 6 | 4.878049 | 0.0001193 | 103 | 69 | 14590 | 12.317433 | 0.0123536 |
GOTERM_BP_DIRECT | GO:0022617~extracellular matrix disassembly | 6 | 4.878049 | 0.0001559 | 103 | 73 | 14590 | 11.642506 | 0.0141314 |
GOTERM_BP_DIRECT | GO:0035987~endodermal cell differentiation | 4 | 3.252032 | 0.0006823 | 103 | 25 | 14590 | 22.664078 | 0.0516796 |
GOTERM_BP_DIRECT | GO:0032964~collagen biosynthetic process | 3 | 2.439024 | 0.0007128 | 103 | 6 | 14590 | 70.825243 | 0.0516796 |
GOTERM_BP_DIRECT | GO:0071711~basement membrane organization | 3 | 2.439024 | 0.0009934 | 103 | 7 | 14590 | 60.707351 | 0.0654741 |
GOTERM_BP_DIRECT | GO:0007507~heart development | 7 | 5.691057 | 0.0013019 | 103 | 172 | 14590 | 5.764845 | 0.0786564 |
KEGG_PATHWAY | hsa04512:ECM-receptor interaction | 10 | 8.130081 | 0.0000000 | 42 | 81 | 5885 | 17.298648 | 0.0000002 |
KEGG_PATHWAY | hsa04974:Protein digestion and absorption | 9 | 7.317073 | 0.0000001 | 42 | 83 | 5885 | 15.193632 | 0.0000025 |
KEGG_PATHWAY | hsa04510:Focal adhesion | 10 | 8.130081 | 0.0000061 | 42 | 197 | 5885 | 7.112642 | 0.0001419 |
KEGG_PATHWAY | hsa04151:PI3K-Akt signaling pathway | 11 | 8.943089 | 0.0000360 | 42 | 309 | 5885 | 4.988057 | 0.0006298 |
KEGG_PATHWAY | hsa05146:Amoebiasis | 6 | 4.878049 | 0.0005387 | 42 | 98 | 5885 | 8.578717 | 0.0075413 |
KEGG_PATHWAY | hsa04060:Cytokine-cytokine receptor interaction | 7 | 5.691057 | 0.0025016 | 42 | 202 | 5885 | 4.855611 | 0.0291855 |
KEGG_PATHWAY | hsa04611:Platelet activation | 5 | 4.065041 | 0.0098511 | 42 | 122 | 5885 | 5.742584 | 0.0985112 |
In contrast to the six GO terms and four KEGG pathways with FDR<0.05 reported in the article, the corrected background gives eight GO terms and six KEGG pathways.
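These tallies can be checked directly against the two DAVID tables above. In the sketch below the Category and FDR columns are transcribed from those tables in row order (values displayed as 0.0000000 are entered as 0), and `sig_by_category` is a hypothetical helper.

```r
# Category and FDR transcribed from the DAVID table without a custom background
no_bg <- data.frame(
  Category = rep(c("GOTERM_BP_DIRECT", "KEGG_PATHWAY"), times = c(13, 7)),
  FDR = c(0, 0, 0, 0, 0.0000257, 0.0012345, 0.0098657, 0.0098657,
          0.0443490, 0.0443490, 0.0633311, 0.0633311, 0.0936507,
          0.0000001, 0.0000015, 0.0000738, 0.0005549, 0.0061517,
          0.0395333, 0.0804719))

# Category and FDR transcribed from the DAVID table with the corrected background
with_bg <- data.frame(
  Category = rep(c("GOTERM_BP_DIRECT", "KEGG_PATHWAY"), times = c(12, 7)),
  FDR = c(0, 0, 0, 0, 0.0000339, 0.0012063, 0.0123536, 0.0141314,
          0.0516796, 0.0516796, 0.0654741, 0.0786564,
          0.0000002, 0.0000025, 0.0001419, 0.0006298, 0.0075413,
          0.0291855, 0.0985112))

# Count the terms passing the FDR threshold in each category
sig_by_category <- function(res, fdr = 0.05) {
  table(res$Category[res$FDR < fdr])
}

sig_by_category(no_bg)
sig_by_category(with_bg)
```

With the full tables loaded, `sig_by_category(res1)` and `sig_by_category(res2)` should give the same tallies: 10 GO terms and 6 KEGG pathways without a custom background, and 8 GO terms and 6 KEGG pathways with the corrected one.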
The article makes several claims based on the enrichment analysis:
“… functional enrichment analysis of LINC00941 co-expression network demonstrated that LINC00941 might be an essential regulator of tumor metastasis and cancer cell proliferation.”
“… through the functional enrichment analysis, we found that LINC00941 might be an essential regulator of tumor metastasis and cancer cell proliferation.”
“Some GO terms, such as extracellular matrix organization (GO: 0030198) and cell adhesion (GO: 0007155) are cell migration-related GO processes, which are associated with tumor metastasis. We detected that ECM-receptor interaction (KEGG: hsa04512) and Focal adhesion (KEGG: hsa04510) are metastasis-related pathways. PI3K-Akt signaling pathway (KEGG: hsa04151) is cell proliferation-related pathway. Our findings demonstrated that LINC00941 could be a potential regulator of tumor metastasis and cancer cell proliferation.”
“The results of GO terms and KEGG pathways were mainly enriched in cell proliferation, cell migration, and tumor metastasis.”
Indeed, the results indicate that LINC00941 is co-expressed with genes involved in ECM structure and metabolism, which might reflect tissue remodelling associated with tumour maturation. Most of these statements look reasonable, apart from the claims about proliferation: none of these ontologies or pathways relate to the cell cycle or proliferation. The statement in the discussion is also suspect, because none of these pathways are closely linked to proliferation, migration or metastasis. Other gene set libraries could test these associations more thoroughly; for example, there are 58 gene sets containing the keyword “metastasis” in the MSigDB v7.4 collection (29/11/2021).
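A sketch of how such a targeted test could be run against a single custom gene set (for example, one metastasis set exported from MSigDB) is shown below. It uses a one-sided hypergeometric test and, importantly, restricts both the foreground and the gene set to the background universe before counting. `test_custom_set` and the toy gene IDs are hypothetical; clusterProfiler's `enricher()` with a `TERM2GENE` table is an alternative for testing many custom sets at once.

```r
# Hypothetical helper: one-sided hypergeometric test of one custom gene set,
# with all genes restricted to the background universe first
test_custom_set <- function(fg, bg, gene_set) {
  bg <- unique(bg)
  fg <- intersect(unique(fg), bg)              # foreground genes in the universe
  gene_set <- intersect(unique(gene_set), bg)  # set members that could be detected
  k <- length(intersect(fg, gene_set))         # overlap between list and set
  M <- length(gene_set)                        # set size within the universe
  N <- length(bg)                              # universe size
  n <- length(fg)                              # foreground size
  # P(X >= k) when drawing n genes from N, of which M belong to the set
  p <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
  c(overlap = k, set_size = M, universe = N, list_size = n, p_value = p)
}

# Toy example: two set members were never detected, so they are excluded
bg_toy <- paste0("g", 1:100)
fg_toy <- paste0("g", 1:10)
set_toy <- c(paste0("g", 1:5), "not_detected1", "not_detected2")
res <- test_custom_set(fg_toy, bg_toy, set_toy)
res
```

Without the `intersect` with the background, undetected genes would inflate the set size and bias the test, which is the same problem as omitting a background list.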
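As a cross-check, I repeated the analysis with the enrichGO function from clusterProfiler, using the same background. Its statistical test (a one-sided hypergeometric test) differs slightly from DAVID's modified Fisher's exact test (the EASE score), and the gene annotations may be from a different version. First, GO biological processes.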
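# Over-representation analysis of GO biological processes, using the detected
# genes (bg) as the universe rather than the whole genome.
# Note: in clusterProfiler, pvalueCutoff is applied to the BH-adjusted p-values,
# so reported terms have p.adjust < 0.01 as well as qvalue < 0.05.
go_bp <- enrichGO(gene = fg, keyType = "ENSEMBL", universe = bg,
OrgDb = org.Hs.eg.db, ont = "BP", pAdjustMethod = "BH", pvalueCutoff = 0.01,
qvalueCutoff = 0.05, readable = TRUE)
go_bp <- data.frame(go_bp)
# Keep terms with q < 0.05 (already applied by enrichGO; kept for clarity)
go_bp <- subset(go_bp,qvalue<0.05)
nrow(go_bp)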
## [1] 69
writeLines(go_bp$Description,"PMC6349697_gobp.txt")
go_bp[,c(2:7,9)] %>% kbl(caption="Clusterprofiler GO results obtained using the correct background") %>% kable_paper("hover", full_width = F)
ID | Description | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count |
---|---|---|---|---|---|---|---|
GO:0030198 | extracellular matrix organization | 38/110 | 379/15725 | 0.0000000 | 0.0000000 | 0.0000000 | 38 |
GO:0043062 | extracellular structure organization | 38/110 | 380/15725 | 0.0000000 | 0.0000000 | 0.0000000 | 38 |
GO:0045229 | external encapsulating structure organization | 38/110 | 382/15725 | 0.0000000 | 0.0000000 | 0.0000000 | 38 |
GO:0030199 | collagen fibril organization | 14/110 | 51/15725 | 0.0000000 | 0.0000000 | 0.0000000 | 14 |
GO:0001501 | skeletal system development | 23/110 | 457/15725 | 0.0000000 | 0.0000000 | 0.0000000 | 23 |
GO:0032963 | collagen metabolic process | 12/110 | 98/15725 | 0.0000000 | 0.0000000 | 0.0000000 | 12 |
GO:0061448 | connective tissue development | 16/110 | 227/15725 | 0.0000000 | 0.0000000 | 0.0000000 | 16 |
GO:0001503 | ossification | 18/110 | 369/15725 | 0.0000000 | 0.0000000 | 0.0000000 | 18 |
GO:0051216 | cartilage development | 13/110 | 173/15725 | 0.0000000 | 0.0000001 | 0.0000000 | 13 |
GO:0035987 | endodermal cell differentiation | 7/110 | 44/15725 | 0.0000000 | 0.0000041 | 0.0000035 | 7 |
GO:0007492 | endoderm development | 8/110 | 75/15725 | 0.0000001 | 0.0000091 | 0.0000077 | 8 |
GO:0060541 | respiratory system development | 11/110 | 181/15725 | 0.0000001 | 0.0000091 | 0.0000077 | 11 |
GO:0001706 | endoderm formation | 7/110 | 53/15725 | 0.0000001 | 0.0000122 | 0.0000104 | 7 |
GO:0030324 | lung development | 10/110 | 157/15725 | 0.0000001 | 0.0000208 | 0.0000178 | 10 |
GO:0030323 | respiratory tube development | 10/110 | 161/15725 | 0.0000002 | 0.0000246 | 0.0000210 | 10 |
GO:0007423 | sensory organ development | 16/110 | 488/15725 | 0.0000003 | 0.0000341 | 0.0000291 | 16 |
GO:0002062 | chondrocyte differentiation | 8/110 | 99/15725 | 0.0000004 | 0.0000521 | 0.0000445 | 8 |
GO:0030574 | collagen catabolic process | 6/110 | 44/15725 | 0.0000006 | 0.0000615 | 0.0000526 | 6 |
GO:0090596 | sensory organ morphogenesis | 11/110 | 229/15725 | 0.0000006 | 0.0000615 | 0.0000526 | 11 |
GO:0001704 | formation of primary germ layer | 8/110 | 109/15725 | 0.0000009 | 0.0000927 | 0.0000793 | 8 |
GO:0001649 | osteoblast differentiation | 10/110 | 203/15725 | 0.0000016 | 0.0001482 | 0.0001267 | 10 |
GO:0001654 | eye development | 12/110 | 329/15725 | 0.0000032 | 0.0002905 | 0.0002484 | 12 |
GO:0150063 | visual system development | 12/110 | 333/15725 | 0.0000037 | 0.0003146 | 0.0002691 | 12 |
GO:0072001 | renal system development | 11/110 | 279/15725 | 0.0000041 | 0.0003343 | 0.0002858 | 11 |
GO:0048880 | sensory system development | 12/110 | 339/15725 | 0.0000044 | 0.0003476 | 0.0002972 | 12 |
GO:0048592 | eye morphogenesis | 8/110 | 136/15725 | 0.0000050 | 0.0003780 | 0.0003233 | 8 |
GO:0042060 | wound healing | 14/110 | 484/15725 | 0.0000070 | 0.0005112 | 0.0004371 | 14 |
GO:0001525 | angiogenesis | 14/110 | 488/15725 | 0.0000077 | 0.0005413 | 0.0004628 | 14 |
GO:0001655 | urogenital system development | 11/110 | 311/15725 | 0.0000114 | 0.0007742 | 0.0006620 | 11 |
GO:0032964 | collagen biosynthetic process | 5/110 | 44/15725 | 0.0000134 | 0.0008803 | 0.0007527 | 5 |
GO:0042730 | fibrinolysis | 4/110 | 23/15725 | 0.0000181 | 0.0011553 | 0.0009879 | 4 |
GO:0001822 | kidney development | 10/110 | 271/15725 | 0.0000202 | 0.0012473 | 0.0010666 | 10 |
GO:0007369 | gastrulation | 8/110 | 168/15725 | 0.0000233 | 0.0013984 | 0.0011958 | 8 |
GO:0006029 | proteoglycan metabolic process | 6/110 | 87/15725 | 0.0000326 | 0.0018935 | 0.0016192 | 6 |
GO:0046888 | negative regulation of hormone secretion | 5/110 | 55/15725 | 0.0000403 | 0.0022399 | 0.0019154 | 5 |
GO:0072006 | nephron development | 7/110 | 133/15725 | 0.0000408 | 0.0022399 | 0.0019154 | 7 |
GO:0071559 | response to transforming growth factor beta | 9/110 | 240/15725 | 0.0000471 | 0.0025163 | 0.0021517 | 9 |
GO:0014910 | regulation of smooth muscle cell migration | 5/110 | 58/15725 | 0.0000521 | 0.0026389 | 0.0022565 | 5 |
GO:0061029 | eyelid development in camera-type eye | 3/110 | 11/15725 | 0.0000528 | 0.0026389 | 0.0022565 | 3 |
GO:0048608 | reproductive structure development | 11/110 | 368/15725 | 0.0000534 | 0.0026389 | 0.0022565 | 11 |
GO:0061458 | reproductive system development | 11/110 | 371/15725 | 0.0000574 | 0.0027698 | 0.0023685 | 11 |
GO:0003338 | metanephros morphogenesis | 4/110 | 31/15725 | 0.0000617 | 0.0029023 | 0.0024817 | 4 |
GO:0061035 | regulation of cartilage development | 5/110 | 61/15725 | 0.0000666 | 0.0030613 | 0.0026177 | 5 |
GO:0060428 | lung epithelium development | 4/110 | 33/15725 | 0.0000793 | 0.0035328 | 0.0030209 | 4 |
GO:1903034 | regulation of response to wounding | 7/110 | 148/15725 | 0.0000804 | 0.0035328 | 0.0030209 | 7 |
GO:0014909 | smooth muscle cell migration | 5/110 | 65/15725 | 0.0000904 | 0.0038860 | 0.0033229 | 5 |
GO:0038065 | collagen-activated signaling pathway | 3/110 | 14/15725 | 0.0001146 | 0.0048212 | 0.0041226 | 3 |
GO:0033627 | cell adhesion mediated by integrin | 5/110 | 71/15725 | 0.0001379 | 0.0056782 | 0.0048554 | 5 |
GO:0031589 | cell-substrate adhesion | 10/110 | 347/15725 | 0.0001597 | 0.0064431 | 0.0055094 | 10 |
GO:0043010 | camera-type eye development | 9/110 | 286/15725 | 0.0001793 | 0.0069487 | 0.0059418 | 9 |
GO:0046879 | hormone secretion | 9/110 | 286/15725 | 0.0001793 | 0.0069487 | 0.0059418 | 9 |
GO:0061041 | regulation of wound healing | 6/110 | 119/15725 | 0.0001868 | 0.0071011 | 0.0060721 | 6 |
GO:0007517 | muscle organ development | 9/110 | 291/15725 | 0.0002040 | 0.0076092 | 0.0065066 | 9 |
GO:0014812 | muscle cell migration | 5/110 | 78/15725 | 0.0002150 | 0.0077296 | 0.0066096 | 5 |
GO:0022617 | extracellular matrix disassembly | 5/110 | 78/15725 | 0.0002150 | 0.0077296 | 0.0066096 | 5 |
GO:0060562 | epithelial tube morphogenesis | 9/110 | 295/15725 | 0.0002258 | 0.0079705 | 0.0068156 | 9 |
GO:0009914 | hormone transport | 9/110 | 296/15725 | 0.0002315 | 0.0080297 | 0.0068661 | 9 |
GO:0071560 | cellular response to transforming growth factor beta stimulus | 8/110 | 234/15725 | 0.0002373 | 0.0080884 | 0.0069164 | 8 |
GO:0007566 | embryo implantation | 4/110 | 44/15725 | 0.0002481 | 0.0083118 | 0.0071074 | 4 |
GO:1903035 | negative regulation of response to wounding | 5/110 | 81/15725 | 0.0002567 | 0.0084577 | 0.0072322 | 5 |
GO:0001890 | placenta development | 6/110 | 129/15725 | 0.0002894 | 0.0092673 | 0.0079245 | 6 |
GO:0030195 | negative regulation of blood coagulation | 4/110 | 46/15725 | 0.0002950 | 0.0092673 | 0.0079245 | 4 |
GO:0046697 | decidualization | 3/110 | 19/15725 | 0.0002974 | 0.0092673 | 0.0079245 | 3 |
GO:0060348 | bone development | 7/110 | 185/15725 | 0.0003196 | 0.0092673 | 0.0079245 | 7 |
GO:0032330 | regulation of chondrocyte differentiation | 4/110 | 47/15725 | 0.0003207 | 0.0092673 | 0.0079245 | 4 |
GO:0033628 | regulation of cell adhesion mediated by integrin | 4/110 | 47/15725 | 0.0003207 | 0.0092673 | 0.0079245 | 4 |
GO:0060425 | lung morphogenesis | 4/110 | 47/15725 | 0.0003207 | 0.0092673 | 0.0079245 | 4 |
GO:1900047 | negative regulation of hemostasis | 4/110 | 47/15725 | 0.0003207 | 0.0092673 | 0.0079245 | 4 |
GO:0046883 | regulation of hormone secretion | 8/110 | 245/15725 | 0.0003234 | 0.0092673 | 0.0079245 | 8 |
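The p-values in this table can be checked by hand. clusterProfiler's test is a one-sided hypergeometric test in which GeneRatio is k/n (annotated foreground genes) and BgRatio is M/N (annotated background genes). The sketch below recomputes the fibrinolysis row with base R's `phyper`; the result should closely match the pvalue column (0.0000181).

```r
# Values from the "fibrinolysis" row: GeneRatio 4/110, BgRatio 23/15725
k <- 4      # foreground genes annotated to the term
n <- 110    # foreground genes with GO BP annotation in the universe
M <- 23     # background genes annotated to the term
N <- 15725  # background genes with GO BP annotation

# Probability of observing k or more term genes by chance: P(X >= k)
p <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# Fold enrichment, for comparison with DAVID's Fold.Enrichment column
fold <- (k / n) / (M / N)

signif(p, 3)
fold
```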
Next, KEGG analysis. For human, enrichKEGG works with Entrez Gene IDs, so the Ensembl IDs need to be converted first.
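# Map Ensembl IDs to Entrez IDs; unmapped IDs become NA and are dropped,
# and duplicates arising from one-to-many mappings are removed
fg_entrez <- unique(na.omit(unlist(mget(fg, org.Hs.egENSEMBL2EG, ifnotfound = NA))))
bg_entrez <- unique(na.omit(unlist(mget(bg, org.Hs.egENSEMBL2EG, ifnotfound = NA))))
# By default enrichKEGG retrieves KEGG annotations online, so results can
# change as the KEGG database is updated
kegg <- enrichKEGG(gene = fg_entrez, universe = bg_entrez,
organism = "hsa", pAdjustMethod = "BH", pvalueCutoff = 0.01,
qvalueCutoff = 0.05)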
kegg <- as.data.frame(kegg)
writeLines(kegg$Description,"PMC6349697_kegg.txt")
kegg[,c(2:7,9)] %>% kbl(caption="Clusterprofiler KEGG results obtained using the correct background") %>% kable_paper("hover", full_width = F)
ID | Description | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | Count |
---|---|---|---|---|---|---|---|
hsa04974 | Protein digestion and absorption | 12/55 | 99/7042 | 0.0000000 | 0.0000000 | 0.0000000 | 12 |
hsa04512 | ECM-receptor interaction | 7/55 | 85/7042 | 0.0000037 | 0.0002106 | 0.0001847 | 7 |
hsa04933 | AGE-RAGE signaling pathway in diabetic complications | 6/55 | 100/7042 | 0.0001165 | 0.0044265 | 0.0038829 | 6 |
hsa04926 | Relaxin signaling pathway | 6/55 | 122/7042 | 0.0003463 | 0.0098697 | 0.0086576 | 6 |
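The clusterProfiler GO BP analysis gave 69 significant terms. Terms in common with the article included extracellular matrix organization, collagen fibril organization, collagen catabolic process, skeletal system development and ossification. Cell adhesion (GO:0007155) itself was not significant, although related terms such as cell-substrate adhesion and cell adhesion mediated by integrin were. clusterProfiler also identified some interesting gene sets not reported in the article, such as angiogenesis, smooth muscle cell migration and response to transforming growth factor beta.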
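For KEGG, only two of the four pathways reported in the article (ECM-receptor interaction, and protein digestion and absorption) were significant with clusterProfiler; focal adhesion and PI3K-Akt signaling were not.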
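The overlap between the DAVID (corrected background, FDR<0.05) and clusterProfiler results can be made explicit with simple set operations. In this sketch the term IDs are transcribed from the tables above; `compare_sets` is a hypothetical helper.

```r
# DAVID GO BP terms with FDR < 0.05 using the corrected background
david_go <- paste0("GO:", c("0030574", "0030199", "0030198", "0007155",
                            "0001501", "0001503", "0030324", "0022617"))

# The 69 clusterProfiler GO BP terms, in table order
cp_go <- paste0("GO:", c(
  "0030198", "0043062", "0045229", "0030199", "0001501", "0032963", "0061448",
  "0001503", "0051216", "0035987", "0007492", "0060541", "0001706", "0030324",
  "0030323", "0007423", "0002062", "0030574", "0090596", "0001704", "0001649",
  "0001654", "0150063", "0072001", "0048880", "0048592", "0042060", "0001525",
  "0001655", "0032964", "0042730", "0001822", "0007369", "0006029", "0046888",
  "0072006", "0071559", "0014910", "0061029", "0048608", "0061458", "0003338",
  "0061035", "0060428", "1903034", "0014909", "0038065", "0033627", "0031589",
  "0043010", "0046879", "0061041", "0007517", "0014812", "0022617", "0060562",
  "0009914", "0071560", "0007566", "1903035", "0001890", "0030195", "0046697",
  "0060348", "0032330", "0033628", "0060425", "1900047", "0046883"))

# KEGG pathways: DAVID (corrected background, FDR < 0.05) and clusterProfiler
david_kegg <- c("hsa04512", "hsa04974", "hsa04510", "hsa04151",
                "hsa05146", "hsa04060")
cp_kegg <- c("hsa04974", "hsa04512", "hsa04933", "hsa04926")

# Shared terms and terms unique to each result set
compare_sets <- function(a, b) {
  list(shared = intersect(a, b), only_a = setdiff(a, b), only_b = setdiff(b, a))
}

go_cmp <- compare_sets(david_go, cp_go)
kegg_cmp <- compare_sets(david_kegg, cp_kegg)
go_cmp$only_a     # DAVID terms missing from clusterProfiler
kegg_cmp$shared   # KEGG pathways found by both tools
```

In the live analysis, `go_bp$ID` and `kegg$ID` could replace the transcribed vectors, and since eulerr is already loaded, the same lists could be passed to `euler()` to draw the overlap.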
In conclusion, omitting a custom background gene set had little effect on the results for this study. However, the conclusions drawn in the article regarding proliferation and metastasis based on enrichment analysis are unfounded.
sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
## [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
## [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] eulerr_6.1.1 kableExtra_1.3.4
## [3] mitch_1.4.1 DESeq2_1.32.0
## [5] SummarizedExperiment_1.22.0 MatrixGenerics_1.4.3
## [7] matrixStats_0.61.0 GenomicRanges_1.44.0
## [9] GenomeInfoDb_1.28.4 getDEE2_1.2.0
## [11] org.Hs.eg.db_3.13.0 AnnotationDbi_1.54.1
## [13] IRanges_2.26.0 S4Vectors_0.30.2
## [15] Biobase_2.52.0 BiocGenerics_0.38.0
## [17] clusterProfiler_4.0.5 reshape2_1.4.4
##
## loaded via a namespace (and not attached):
## [1] shadowtext_0.0.9 fastmatch_1.1-3 systemfonts_1.0.3
## [4] plyr_1.8.6 igraph_1.2.8 lazyeval_0.2.2
## [7] splines_4.1.2 BiocParallel_1.26.2 ggplot2_3.3.5
## [10] digest_0.6.28 yulab.utils_0.0.4 htmltools_0.5.2
## [13] GOSemSim_2.18.1 viridis_0.6.2 GO.db_3.13.0
## [16] fansi_0.5.0 magrittr_2.0.1 memoise_2.0.0
## [19] Biostrings_2.60.2 annotate_1.70.0 graphlayouts_0.7.1
## [22] svglite_2.0.0 enrichplot_1.12.3 colorspace_2.0-2
## [25] rvest_1.0.2 blob_1.2.2 ggrepel_0.9.1
## [28] xfun_0.28 dplyr_1.0.7 crayon_1.4.2
## [31] RCurl_1.98-1.5 jsonlite_1.7.2 scatterpie_0.1.7
## [34] genefilter_1.74.1 survival_3.2-13 ape_5.5
## [37] glue_1.5.0 polyclip_1.10-0 gtable_0.3.0
## [40] zlibbioc_1.38.0 XVector_0.32.0 webshot_0.5.2
## [43] htm2txt_2.1.1 DelayedArray_0.18.0 scales_1.1.1
## [46] DOSE_3.18.3 DBI_1.1.1 GGally_2.1.2
## [49] Rcpp_1.0.7 viridisLite_0.4.0 xtable_1.8-4
## [52] gridGraphics_0.5-1 tidytree_0.3.6 bit_4.0.4
## [55] htmlwidgets_1.5.4 httr_1.4.2 fgsea_1.18.0
## [58] gplots_3.1.1 RColorBrewer_1.1-2 ellipsis_0.3.2
## [61] pkgconfig_2.0.3 reshape_0.8.8 XML_3.99-0.8
## [64] farver_2.1.0 sass_0.4.0 locfit_1.5-9.4
## [67] utf8_1.2.2 later_1.3.0 ggplotify_0.1.0
## [70] tidyselect_1.1.1 rlang_0.4.12 munsell_0.5.0
## [73] tools_4.1.2 cachem_1.0.6 downloader_0.4
## [76] generics_0.1.1 RSQLite_2.2.8 evaluate_0.14
## [79] stringr_1.4.0 fastmap_1.1.0 yaml_2.2.1
## [82] ggtree_3.0.4 knitr_1.36 bit64_4.0.5
## [85] tidygraph_1.2.0 caTools_1.18.2 purrr_0.3.4
## [88] KEGGREST_1.32.0 ggraph_2.0.5 nlme_3.1-153
## [91] mime_0.12 aplot_0.1.1 xml2_1.3.2
## [94] DO.db_2.9 rstudioapi_0.13 compiler_4.1.2
## [97] beeswarm_0.4.0 png_0.1-7 treeio_1.16.2
## [100] tibble_3.1.6 tweenr_1.0.2 geneplotter_1.70.0
## [103] bslib_0.3.1 stringi_1.7.5 highr_0.9
## [106] lattice_0.20-45 Matrix_1.3-4 vctrs_0.3.8
## [109] pillar_1.6.4 lifecycle_1.0.1 jquerylib_0.1.4
## [112] data.table_1.14.2 cowplot_1.1.1 bitops_1.0-7
## [115] httpuv_1.6.3 patchwork_1.1.1 qvalue_2.24.0
## [118] R6_2.5.1 promises_1.2.0.1 KernSmooth_2.23-20
## [121] echarts4r_0.4.2 gridExtra_2.3 gtools_3.9.2
## [124] MASS_7.3-54 assertthat_0.2.1 GenomeInfoDbData_1.2.6
## [127] grid_4.1.2 ggfun_0.0.4 tidyr_1.1.4
## [130] rmarkdown_2.11 ggforce_0.3.3 shiny_1.7.1