Source: https://github.com/markziemann/SurveyEnrichmentMethods

Intro

suppressPackageStartupMessages({
library("getDEE2")
library("DESeq2")
library("clusterProfiler")
library("org.Hs.eg.db")
library("mitch")
library("kableExtra")
library("eulerr")
})

In Kaumadi et al 2021 we showed that major problems plague functional enrichment analysis in many journal articles.

Here I will do my best to reproduce and replicate the findings of some studies. These were selected according to the following criteria:

  • Included in the 2019 group that was studied in Kaumadi et al 2021.

  • Human study

  • RNA-seq gene expression

  • Performed DAVID gene set analysis

  • Provided gene list

From this subset, four studies were selected:

Study        Analytical issues
PMC6349697   Background
PMC6425008   Background, FDR
PMC6463127   Background, FDR
PMC6535219   Background

In this document I will check whether I can reproduce the published findings using the same tool and the gene list provided in the article. I will also correct the method wherever possible: using an appropriate background gene list, applying FDR control, and analysing up- and down-regulated gene lists separately.
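One common way to apply FDR control is the Benjamini-Hochberg (BH) procedure, which is also the method requested with `pAdjustMethod = "BH"` in the clusterProfiler calls in this document. As a minimal sketch with made-up p-values (not taken from any of the studies), base R's `p.adjust()` shows how the adjustment can remove borderline results:

```r
# Hypothetical raw p-values for five gene sets (illustration only)
raw_p <- c(setA = 0.002, setB = 0.009, setC = 0.021, setD = 0.045, setE = 0.3)

# Benjamini-Hochberg adjustment to control the false discovery rate
adj_p <- p.adjust(raw_p, method = "BH")
round(adj_p, 5)  # setA 0.01, setB 0.0225, setC 0.035, setD 0.05625, setE 0.3

# Four sets have raw p < 0.05, but only three pass FDR < 0.05
names(raw_p)[raw_p < 0.05]
names(adj_p)[adj_p < 0.05]
```

Reporting terms by raw p-value alone would have included setD here, which is the kind of error FDR control guards against.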

The study examined in this document is PMC6349697: Liu et al, 2019.

This study focuses on LINC00941. Using TCGA expression data, the authors built a coexpression network and then obtained the genes associated with LINC00941. These genes were then subjected to enrichment analysis of GO terms and KEGG pathways using DAVID v6.8, with terms at FDR<0.05 considered significant. The article provides no information about a background gene list.

In Supplementary Table S5, the authors present a list of 123 co-expressed genes.

In Supplementary Table S6 they show the 3180 genes that underwent clustering, along with the cluster assignment. There are 124 genes in the “yellow” cluster.

In Supplementary Table S7, the authors show the significant results of the enrichment tests.

Category          Term                                          Count  FDR
GOTERM_BP_DIRECT  GO:0030574~collagen catabolic process         15     2.73E-15
GOTERM_BP_DIRECT  GO:0030199~collagen fibril organization       13     8.48E-15
GOTERM_BP_DIRECT  GO:0030198~extracellular matrix organization  20     1.43E-14
GOTERM_BP_DIRECT  GO:0007155~cell adhesion                      21     8.40E-09
GOTERM_BP_DIRECT  GO:0001501~skeletal system development        10     2.67E-04
GOTERM_BP_DIRECT  GO:0001503~ossification                       7      0.0154
KEGG              hsa04512:ECM-receptor interaction             10     1.81E-06
KEGG              hsa04974:Protein digestion and absorption     9      4.53E-05
KEGG              hsa04510:Focal adhesion                       10     0.0032
KEGG              hsa04151:PI3K-Akt signaling pathway           11     0.032

Try to reproduce

Here I’m going to reproduce the findings with DAVID as best I can.

Here is the list of genes used as the foreground. The gene IDs are saved in this repository as a text file called “PMC6349697_fg.txt”.

fg <- readLines("PMC6349697_fg.txt")

fg
##   [1] "ENSG00000142552" "ENSG00000183682" "ENSG00000137868" "ENSG00000140937"
##   [5] "ENSG00000164932" "ENSG00000099953" "ENSG00000261732" "ENSG00000163331"
##   [9] "ENSG00000141756" "ENSG00000168913" "ENSG00000145040" "ENSG00000261456"
##  [13] "ENSG00000183844" "ENSG00000005243" "ENSG00000072210" "ENSG00000179698"
##  [17] "ENSG00000106683" "ENSG00000113739" "ENSG00000110427" "ENSG00000117586"
##  [21] "ENSG00000162849" "ENSG00000124225" "ENSG00000169385" "ENSG00000164465"
##  [25] "ENSG00000187959" "ENSG00000135362" "ENSG00000163359" "ENSG00000091879"
##  [29] "ENSG00000184937" "ENSG00000149380" "ENSG00000148848" "ENSG00000171617"
##  [33] "ENSG00000146674" "ENSG00000006327" "ENSG00000086717" "ENSG00000198108"
##  [37] "ENSG00000011422" "ENSG00000133466" "ENSG00000105989" "ENSG00000112655"
##  [41] "ENSG00000188483" "ENSG00000118785" "ENSG00000101198" "ENSG00000172061"
##  [45] "ENSG00000134013" "ENSG00000134668" "ENSG00000011201" "ENSG00000060718"
##  [49] "ENSG00000204262" "ENSG00000168542" "ENSG00000151388" "ENSG00000248329"
##  [53] "ENSG00000113140" "ENSG00000132000" "ENSG00000187498" "ENSG00000130948"
##  [57] "ENSG00000176887" "ENSG00000203805" "ENSG00000095752" "ENSG00000101825"
##  [61] "ENSG00000138316" "ENSG00000182492" "ENSG00000128567" "ENSG00000078098"
##  [65] "ENSG00000168487" "ENSG00000111799" "ENSG00000106366" "ENSG00000137745"
##  [69] "ENSG00000157766" "ENSG00000214954" "ENSG00000147003" "ENSG00000102128"
##  [73] "ENSG00000154096" "ENSG00000196177" "ENSG00000168334" "ENSG00000136378"
##  [77] "ENSG00000115363" "ENSG00000159261" "ENSG00000125657" "ENSG00000137878"
##  [81] "ENSG00000137573" "ENSG00000164694" "ENSG00000164692" "ENSG00000130635"
##  [85] "ENSG00000159216" "ENSG00000131389" "ENSG00000122641" "ENSG00000168824"
##  [89] "ENSG00000038427" "ENSG00000124875" "ENSG00000162745" "ENSG00000169067"
##  [93] "ENSG00000123500" "ENSG00000163673" "ENSG00000117122" "ENSG00000130720"
##  [97] "ENSG00000113083" "ENSG00000204767" "ENSG00000155886" "ENSG00000119711"
## [101] "ENSG00000103888" "ENSG00000181378" "ENSG00000149257" "ENSG00000188611"
## [105] "ENSG00000087303" "ENSG00000087116" "ENSG00000104415" "ENSG00000108821"
## [109] "ENSG00000222047" "ENSG00000170373" "ENSG00000114270" "ENSG00000144810"
## [113] "ENSG00000188064" "ENSG00000187730" "ENSG00000171722" "ENSG00000122861"
## [117] "ENSG00000128342" "ENSG00000137809" "ENSG00000225614" "ENSG00000196611"
## [121] "ENSG00000177202" "ENSG00000186340" "ENSG00000086991" "ENSG00000235884"
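Before submitting a list, it can help to check how many unique IDs it contains and whether they all look like human Ensembl gene IDs. The printed list above has 124 entries, the same number as the yellow cluster rather than the 123 genes of Table S5, so such a count is worth confirming. The sketch below uses a hypothetical helper, `check_ensembl_ids`, on a toy vector; it assumes unversioned IDs (ENSG followed by 11 digits), so versioned IDs such as "ENSG00000142552.8" would be flagged and would need their suffix stripped first. With the real data it would be called as `check_ensembl_ids(fg)`.

```r
# Summarise a gene ID list: total entries, unique entries, and entries that
# do not look like unversioned human Ensembl gene IDs (ENSG + 11 digits)
check_ensembl_ids <- function(ids) {
  list(n_total   = length(ids),
       n_unique  = length(unique(ids)),
       malformed = ids[!grepl("^ENSG[0-9]{11}$", ids)])
}

# Toy input: two IDs from the list above, one duplicate and one gene symbol
toy_ids <- c("ENSG00000142552", "ENSG00000183682", "ENSG00000142552", "LINC00941")
str(check_ensembl_ids(toy_ids))
```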

These genes were submitted to DAVID v6.8 on 29th November 2021 without specifying a background gene set (i.e. using DAVID's default background), which gave the following results.

res1 <- read.table("PMC6349697_res1.tsv",header=TRUE,sep="\t")

res1 %>% kbl(caption="DAVID results obtained using article gene list without background") %>% kable_paper("hover", full_width = F)
DAVID results obtained using article gene list without background
Category Term Count X. PValue List.Total Pop.Hits Pop.Total Fold.Enrichment FDR
GOTERM_BP_DIRECT GO:0030574~collagen catabolic process 15 12.195122 0.0000000 105 64 16792 37.482143 0.0000000
GOTERM_BP_DIRECT GO:0030199~collagen fibril organization 13 10.569106 0.0000000 105 39 16792 53.307936 0.0000000
GOTERM_BP_DIRECT GO:0030198~extracellular matrix organization 20 16.260163 0.0000000 105 196 16792 16.318756 0.0000000
GOTERM_BP_DIRECT GO:0007155~cell adhesion 21 17.073171 0.0000000 105 459 16792 7.316776 0.0000000
GOTERM_BP_DIRECT GO:0001501~skeletal system development 10 8.130081 0.0000002 105 137 16792 11.673271 0.0000257
GOTERM_BP_DIRECT GO:0001503~ossification 7 5.691057 0.0000101 105 80 16792 13.993333 0.0012345
GOTERM_BP_DIRECT GO:0022617~extracellular matrix disassembly 6 4.878049 0.0001078 105 76 16792 12.625564 0.0098657
GOTERM_BP_DIRECT GO:0030324~lung development 6 4.878049 0.0001078 105 76 16792 12.625564 0.0098657
GOTERM_BP_DIRECT GO:0032964~collagen biosynthetic process 3 2.439024 0.0005607 105 6 16792 79.961905 0.0443490
GOTERM_BP_DIRECT GO:0035987~endodermal cell differentiation 4 3.252032 0.0006059 105 27 16792 23.692416 0.0443490
GOTERM_BP_DIRECT GO:0007507~heart development 7 5.691057 0.0009683 105 183 16792 6.117304 0.0633311
GOTERM_BP_DIRECT GO:0071711~basement membrane organization 3 2.439024 0.0010382 105 8 16792 59.971429 0.0633311
GOTERM_BP_DIRECT GO:0001568~blood vessel development 4 3.252032 0.0016632 105 38 16792 16.834085 0.0936507
KEGG_PATHWAY hsa04512:ECM-receptor interaction 10 8.130081 0.0000000 43 87 6879 18.388132 0.0000001
KEGG_PATHWAY hsa04974:Protein digestion and absorption 9 7.317073 0.0000000 43 88 6879 16.361258 0.0000015
KEGG_PATHWAY hsa04510:Focal adhesion 10 8.130081 0.0000031 43 206 6879 7.765861 0.0000738
KEGG_PATHWAY hsa04151:PI3K-Akt signaling pathway 11 8.943089 0.0000308 43 345 6879 5.100708 0.0005549
KEGG_PATHWAY hsa05146:Amoebiasis 6 4.878049 0.0004272 43 106 6879 9.055287 0.0061517
KEGG_PATHWAY hsa04060:Cytokine-cytokine receptor interaction 7 5.691057 0.0032944 43 243 6879 4.608384 0.0395333
KEGG_PATHWAY hsa04611:Platelet activation 5 4.065041 0.0078237 43 130 6879 6.152952 0.0804719

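This means that all six GO terms and all four KEGG pathways reported in Table S7 could be reproduced at FDR<0.05, with identical gene counts, using the gene list as provided by the authors and without a custom background gene list. The FDR values differ from the published ones, and a further four GO terms and two KEGG pathways also passed FDR<0.05. In this table, the X. column is DAVID's “%” column: the count as a percentage of the genes in the list (e.g. 15/123 = 12.2%).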
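The Fold.Enrichment column can be recomputed from the other columns as (Count / List.Total) / (Pop.Hits / Pop.Total). The sketch below does this for the collagen catabolic process row, using the values from the table above and from the corrected-background DAVID table later in this document; `fold_enrichment` is an illustrative helper, not a DAVID function.

```r
# Fold enrichment as reported by DAVID: the term's share of the list
# divided by its share of the background population
fold_enrichment <- function(count, list_total, pop_hits, pop_total) {
  (count / list_total) / (pop_hits / pop_total)
}

# "GO:0030574~collagen catabolic process", values copied from the DAVID tables
fe_default <- fold_enrichment(15, 105, 64, 16792)  # default background
fe_custom  <- fold_enrichment(15, 103, 60, 14590)  # expressed-gene background
round(c(default = fe_default, custom = fe_custom), 6)
# default 37.482143, custom 35.412621 (matching the Fold.Enrichment columns)
```

The lower value with the expressed-gene background follows from the numbers: restricting the population raises the background frequency of this term (60/14590 vs 64/16792) more than it raises its frequency in the list.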

Try to reproduce with corrected background gene list

Now I will try to use a corrected background gene list. Table S6 lists all the clustered genes, totalling 3180, but this is likely not all the genes that were detected in the dataset. Therefore, I downloaded the TCGA data used in the study and identified all the genes with an average of 10 or more reads per sample.

There are 25947 genes in the background.

bg <- readLines("PMC6349697_bg.txt")

length(bg)
## [1] 25947

These genes are stored as “PMC6349697_bg.txt” in the repo.
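The expression filter described above (genes averaging at least 10 reads per sample) can be sketched as follows. The count matrix is a hypothetical toy example, not the TCGA data, and `make_background` is an illustrative helper rather than the code used to create PMC6349697_bg.txt. The sketch also checks that every foreground gene is present in the background, since a background list should contain the foreground.

```r
# Hypothetical toy count matrix: rows are genes, columns are samples
counts <- matrix(c(50, 60, 40,
                    2,  0,  5,
                   10, 10, 10,
                    9, 11,  9),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                                 c("s1", "s2", "s3")))

# Keep genes with a mean of at least `min_mean` reads per sample
make_background <- function(counts, min_mean = 10) {
  rownames(counts)[rowMeans(counts) >= min_mean]
}

bg_toy <- make_background(counts)
bg_toy  # "geneA" "geneC"

# Foreground genes missing from the background (ideally none)
fg_toy <- c("geneA", "geneB")
setdiff(fg_toy, bg_toy)  # "geneB"
```

With the real lists, `setdiff(fg, bg)` would show whether any of the co-expressed genes fall below the expression threshold.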

Next I repeated the DAVID analysis, this time using the correct background gene list. Here is the result I obtained.

res2 <- read.table("PMC6349697_res2.tsv",header=TRUE,sep="\t")

res2 %>% kbl(caption="DAVID results obtained using the correct background") %>% kable_paper("hover", full_width = F)
DAVID results obtained using the correct background
Category Term Count X. PValue List.Total Pop.Hits Pop.Total Fold.Enrichment FDR
GOTERM_BP_DIRECT GO:0030574~collagen catabolic process 15 12.195122 0.0000000 103 60 14590 35.412621 0.0000000
GOTERM_BP_DIRECT GO:0030199~collagen fibril organization 13 10.569106 0.0000000 103 37 14590 49.769089 0.0000000
GOTERM_BP_DIRECT GO:0030198~extracellular matrix organization 20 16.260163 0.0000000 103 185 14590 15.313566 0.0000000
GOTERM_BP_DIRECT GO:0007155~cell adhesion 21 17.073171 0.0000000 103 429 14590 6.933940 0.0000000
GOTERM_BP_DIRECT GO:0001501~skeletal system development 10 8.130081 0.0000002 103 126 14590 11.242102 0.0000339
GOTERM_BP_DIRECT GO:0001503~ossification 7 5.691057 0.0000100 103 71 14590 13.965541 0.0012063
GOTERM_BP_DIRECT GO:0030324~lung development 6 4.878049 0.0001193 103 69 14590 12.317433 0.0123536
GOTERM_BP_DIRECT GO:0022617~extracellular matrix disassembly 6 4.878049 0.0001559 103 73 14590 11.642506 0.0141314
GOTERM_BP_DIRECT GO:0035987~endodermal cell differentiation 4 3.252032 0.0006823 103 25 14590 22.664078 0.0516796
GOTERM_BP_DIRECT GO:0032964~collagen biosynthetic process 3 2.439024 0.0007128 103 6 14590 70.825243 0.0516796
GOTERM_BP_DIRECT GO:0071711~basement membrane organization 3 2.439024 0.0009934 103 7 14590 60.707351 0.0654741
GOTERM_BP_DIRECT GO:0007507~heart development 7 5.691057 0.0013019 103 172 14590 5.764845 0.0786564
KEGG_PATHWAY hsa04512:ECM-receptor interaction 10 8.130081 0.0000000 42 81 5885 17.298648 0.0000002
KEGG_PATHWAY hsa04974:Protein digestion and absorption 9 7.317073 0.0000001 42 83 5885 15.193632 0.0000025
KEGG_PATHWAY hsa04510:Focal adhesion 10 8.130081 0.0000061 42 197 5885 7.112642 0.0001419
KEGG_PATHWAY hsa04151:PI3K-Akt signaling pathway 11 8.943089 0.0000360 42 309 5885 4.988057 0.0006298
KEGG_PATHWAY hsa05146:Amoebiasis 6 4.878049 0.0005387 42 98 5885 8.578717 0.0075413
KEGG_PATHWAY hsa04060:Cytokine-cytokine receptor interaction 7 5.691057 0.0025016 42 202 5885 4.855611 0.0291855
KEGG_PATHWAY hsa04611:Platelet activation 5 4.065041 0.0098511 42 122 5885 5.742584 0.0985112

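In contrast to the six GO terms and four KEGG pathways reported at FDR<0.05 in the article, the corrected background gives eight GO terms and six KEGG pathways at FDR<0.05, including all ten published terms. Compared with the analysis without a background, two GO terms (collagen biosynthetic process and endodermal cell differentiation) no longer pass FDR<0.05, while the significant KEGG pathways are unchanged.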
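These counts can be tallied per category in R. The sketch below uses a small data frame holding the Category and FDR values copied from the corrected-background DAVID table; FDRs shown as 0.0000000 in the rounded table are entered as 0. `count_sig` is an illustrative helper, and with the loaded results it could be called directly as `count_sig(res2)` or `count_sig(res1)`.

```r
# Number of terms per category passing an FDR threshold
count_sig <- function(res, alpha = 0.05) {
  table(res$Category[res$FDR < alpha])
}

# Category and FDR columns copied from the corrected-background DAVID table
res2_fdr <- data.frame(
  Category = rep(c("GOTERM_BP_DIRECT", "KEGG_PATHWAY"), times = c(12, 7)),
  FDR = c(0, 0, 0, 0, 0.0000339, 0.0012063, 0.0123536, 0.0141314,
          0.0516796, 0.0516796, 0.0654741, 0.0786564,
          0.0000002, 0.0000025, 0.0001419, 0.0006298, 0.0075413,
          0.0291855, 0.0985112)
)

count_sig(res2_fdr)  # GOTERM_BP_DIRECT: 8, KEGG_PATHWAY: 6
```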

What about the conclusions of the study?

Some interesting statements about the enrichment analysis:

  • Abstract:

“… functional enrichment analysis of LINC00941 co-expression network demonstrated that LINC00941 might be an essential regulator of tumor metastasis and cancer cell proliferation.”

  • Introduction final paragraph:

“… through the functional enrichment analysis, we found that LINC00941 might be an essential regulator of tumor metastasis and cancer cell proliferation.”

  • Results:

“Some GO terms, such as extracellular matrix organization (GO: 0030198) and cell adhesion (GO: 0007155) are cell migration-related GO processes, which are associated with tumor metastasis. We detected that ECM-receptor interaction (KEGG: hsa04512) and Focal adhesion (KEGG: hsa04510) are metastasis-related pathways. PI3K-Akt signaling pathway (KEGG: hsa04151) is cell proliferation-related pathway. Our findings demonstrated that LINC00941 could be a potential regulator of tumor metastasis and cancer cell proliferation.”

  • Discussion:

“The results of GO terms and KEGG pathways were mainly enriched in cell proliferation, cell migration, and tumor metastasis.”

Indeed, the results indicate that LINC00941 is co-expressed with genes involved in ECM structure and metabolism, which might reflect tissue remodelling during tumour maturation. This all looks okay, apart from the claims about proliferation: none of these ontologies/pathways are related to the cell cycle or proliferation. The statement in the discussion is also suspect, because none of these pathways are closely linked to proliferation, migration or metastasis. Other gene sets are available that could test these associations more thoroughly; for example, there are 58 gene sets containing the keyword “metastasis” in the MSigDB v7.4 collection (29/11/2021).
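Counting gene sets by keyword, as in the MSigDB example above, only needs the set names. In the GMT format, each tab-separated line holds a set name, a description and then the member genes. The sketch below parses a few hypothetical GMT lines (the set names are made up, not real MSigDB sets) with an illustrative `parse_gmt` helper and counts names containing "metastasis". For a real collection, the lines would come from `readLines()` on the downloaded GMT file (the mitch package loaded above also provides `gmt_import()` for this), and the count depends on the MSigDB version.

```r
# Hypothetical GMT lines: set name, description, then member genes (tab-separated)
gmt_lines <- c(
  "TOY_METASTASIS_UP\tmade-up set\tGENE1\tGENE2",
  "TOY_CELL_CYCLE\tmade-up set\tGENE3",
  "TOY_LIVER_METASTASIS_DN\tmade-up set\tGENE4\tGENE5"
)

# Parse GMT lines into a named list of gene vectors
parse_gmt <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  sets <- lapply(fields, function(f) f[-(1:2)])
  names(sets) <- vapply(fields, function(f) f[1], character(1))
  sets
}

genesets <- parse_gmt(gmt_lines)

# Count gene sets whose name contains the keyword, ignoring case
n_meta <- sum(grepl("metastasis", names(genesets), ignore.case = TRUE))
n_meta  # 2
```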

What about a replication using an R script?

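Here I use the clusterProfiler enrichGO function to analyse the data. The algorithm is slightly different (clusterProfiler uses a one-sided hypergeometric test, whereas DAVID uses a modified Fisher's exact test called the EASE score), and the gene set annotations might be a different version. First, GO biological processes.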

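# GO biological process over-representation analysis, using the
# expressed genes (bg) as the universe and BH-adjusted p-values
go_bp <- enrichGO(gene = fg, keyType = "ENSEMBL", universe = bg,
    OrgDb = org.Hs.eg.db, ont = "BP", pAdjustMethod = "BH", pvalueCutoff = 0.01,
    qvalueCutoff = 0.05, readable = TRUE)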

go_bp <- data.frame(go_bp)

go_bp <- subset(go_bp,qvalue<0.05)

nrow(go_bp)
## [1] 69
writeLines(go_bp$Description,"PMC6349697_gobp.txt")

go_bp[,c(2:7,9)] %>% kbl(caption="Clusterprofiler GO results obtained using the correct background") %>% kable_paper("hover", full_width = F)
Clusterprofiler GO results obtained using the correct background
Description GeneRatio BgRatio pvalue p.adjust qvalue Count
GO:0030198 extracellular matrix organization 38/110 379/15725 0.0000000 0.0000000 0.0000000 38
GO:0043062 extracellular structure organization 38/110 380/15725 0.0000000 0.0000000 0.0000000 38
GO:0045229 external encapsulating structure organization 38/110 382/15725 0.0000000 0.0000000 0.0000000 38
GO:0030199 collagen fibril organization 14/110 51/15725 0.0000000 0.0000000 0.0000000 14
GO:0001501 skeletal system development 23/110 457/15725 0.0000000 0.0000000 0.0000000 23
GO:0032963 collagen metabolic process 12/110 98/15725 0.0000000 0.0000000 0.0000000 12
GO:0061448 connective tissue development 16/110 227/15725 0.0000000 0.0000000 0.0000000 16
GO:0001503 ossification 18/110 369/15725 0.0000000 0.0000000 0.0000000 18
GO:0051216 cartilage development 13/110 173/15725 0.0000000 0.0000001 0.0000000 13
GO:0035987 endodermal cell differentiation 7/110 44/15725 0.0000000 0.0000041 0.0000035 7
GO:0007492 endoderm development 8/110 75/15725 0.0000001 0.0000091 0.0000077 8
GO:0060541 respiratory system development 11/110 181/15725 0.0000001 0.0000091 0.0000077 11
GO:0001706 endoderm formation 7/110 53/15725 0.0000001 0.0000122 0.0000104 7
GO:0030324 lung development 10/110 157/15725 0.0000001 0.0000208 0.0000178 10
GO:0030323 respiratory tube development 10/110 161/15725 0.0000002 0.0000246 0.0000210 10
GO:0007423 sensory organ development 16/110 488/15725 0.0000003 0.0000341 0.0000291 16
GO:0002062 chondrocyte differentiation 8/110 99/15725 0.0000004 0.0000521 0.0000445 8
GO:0030574 collagen catabolic process 6/110 44/15725 0.0000006 0.0000615 0.0000526 6
GO:0090596 sensory organ morphogenesis 11/110 229/15725 0.0000006 0.0000615 0.0000526 11
GO:0001704 formation of primary germ layer 8/110 109/15725 0.0000009 0.0000927 0.0000793 8
GO:0001649 osteoblast differentiation 10/110 203/15725 0.0000016 0.0001482 0.0001267 10
GO:0001654 eye development 12/110 329/15725 0.0000032 0.0002905 0.0002484 12
GO:0150063 visual system development 12/110 333/15725 0.0000037 0.0003146 0.0002691 12
GO:0072001 renal system development 11/110 279/15725 0.0000041 0.0003343 0.0002858 11
GO:0048880 sensory system development 12/110 339/15725 0.0000044 0.0003476 0.0002972 12
GO:0048592 eye morphogenesis 8/110 136/15725 0.0000050 0.0003780 0.0003233 8
GO:0042060 wound healing 14/110 484/15725 0.0000070 0.0005112 0.0004371 14
GO:0001525 angiogenesis 14/110 488/15725 0.0000077 0.0005413 0.0004628 14
GO:0001655 urogenital system development 11/110 311/15725 0.0000114 0.0007742 0.0006620 11
GO:0032964 collagen biosynthetic process 5/110 44/15725 0.0000134 0.0008803 0.0007527 5
GO:0042730 fibrinolysis 4/110 23/15725 0.0000181 0.0011553 0.0009879 4
GO:0001822 kidney development 10/110 271/15725 0.0000202 0.0012473 0.0010666 10
GO:0007369 gastrulation 8/110 168/15725 0.0000233 0.0013984 0.0011958 8
GO:0006029 proteoglycan metabolic process 6/110 87/15725 0.0000326 0.0018935 0.0016192 6
GO:0046888 negative regulation of hormone secretion 5/110 55/15725 0.0000403 0.0022399 0.0019154 5
GO:0072006 nephron development 7/110 133/15725 0.0000408 0.0022399 0.0019154 7
GO:0071559 response to transforming growth factor beta 9/110 240/15725 0.0000471 0.0025163 0.0021517 9
GO:0014910 regulation of smooth muscle cell migration 5/110 58/15725 0.0000521 0.0026389 0.0022565 5
GO:0061029 eyelid development in camera-type eye 3/110 11/15725 0.0000528 0.0026389 0.0022565 3
GO:0048608 reproductive structure development 11/110 368/15725 0.0000534 0.0026389 0.0022565 11
GO:0061458 reproductive system development 11/110 371/15725 0.0000574 0.0027698 0.0023685 11
GO:0003338 metanephros morphogenesis 4/110 31/15725 0.0000617 0.0029023 0.0024817 4
GO:0061035 regulation of cartilage development 5/110 61/15725 0.0000666 0.0030613 0.0026177 5
GO:0060428 lung epithelium development 4/110 33/15725 0.0000793 0.0035328 0.0030209 4
GO:1903034 regulation of response to wounding 7/110 148/15725 0.0000804 0.0035328 0.0030209 7
GO:0014909 smooth muscle cell migration 5/110 65/15725 0.0000904 0.0038860 0.0033229 5
GO:0038065 collagen-activated signaling pathway 3/110 14/15725 0.0001146 0.0048212 0.0041226 3
GO:0033627 cell adhesion mediated by integrin 5/110 71/15725 0.0001379 0.0056782 0.0048554 5
GO:0031589 cell-substrate adhesion 10/110 347/15725 0.0001597 0.0064431 0.0055094 10
GO:0043010 camera-type eye development 9/110 286/15725 0.0001793 0.0069487 0.0059418 9
GO:0046879 hormone secretion 9/110 286/15725 0.0001793 0.0069487 0.0059418 9
GO:0061041 regulation of wound healing 6/110 119/15725 0.0001868 0.0071011 0.0060721 6
GO:0007517 muscle organ development 9/110 291/15725 0.0002040 0.0076092 0.0065066 9
GO:0014812 muscle cell migration 5/110 78/15725 0.0002150 0.0077296 0.0066096 5
GO:0022617 extracellular matrix disassembly 5/110 78/15725 0.0002150 0.0077296 0.0066096 5
GO:0060562 epithelial tube morphogenesis 9/110 295/15725 0.0002258 0.0079705 0.0068156 9
GO:0009914 hormone transport 9/110 296/15725 0.0002315 0.0080297 0.0068661 9
GO:0071560 cellular response to transforming growth factor beta stimulus 8/110 234/15725 0.0002373 0.0080884 0.0069164 8
GO:0007566 embryo implantation 4/110 44/15725 0.0002481 0.0083118 0.0071074 4
GO:1903035 negative regulation of response to wounding 5/110 81/15725 0.0002567 0.0084577 0.0072322 5
GO:0001890 placenta development 6/110 129/15725 0.0002894 0.0092673 0.0079245 6
GO:0030195 negative regulation of blood coagulation 4/110 46/15725 0.0002950 0.0092673 0.0079245 4
GO:0046697 decidualization 3/110 19/15725 0.0002974 0.0092673 0.0079245 3
GO:0060348 bone development 7/110 185/15725 0.0003196 0.0092673 0.0079245 7
GO:0032330 regulation of chondrocyte differentiation 4/110 47/15725 0.0003207 0.0092673 0.0079245 4
GO:0033628 regulation of cell adhesion mediated by integrin 4/110 47/15725 0.0003207 0.0092673 0.0079245 4
GO:0060425 lung morphogenesis 4/110 47/15725 0.0003207 0.0092673 0.0079245 4
GO:1900047 negative regulation of hemostasis 4/110 47/15725 0.0003207 0.0092673 0.0079245 4
GO:0046883 regulation of hormone secretion 8/110 245/15725 0.0003234 0.0092673 0.0079245 8
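The GeneRatio and BgRatio columns hold the four numbers behind each p-value in this table: k/n (foreground genes in the term / annotated foreground genes) and M/N (background genes in the term / annotated background genes). N (15725) is smaller than the 25947 background genes because only background genes with GO annotations are counted. The sketch below applies a one-sided hypergeometric test to two rows; `parse_ratio` and `ora_pvalue` are illustrative helpers, not clusterProfiler functions. For the regulation of hormone secretion row, the result should agree with the pvalue column (0.0003234). The p.adjust and qvalue columns cannot be recomputed this way, because they depend on all the terms tested.

```r
# Split a clusterProfiler ratio string such as "14/110" into two numbers
parse_ratio <- function(x) as.numeric(strsplit(x, "/", fixed = TRUE)[[1]])

# One-sided hypergeometric test: probability of k or more term genes in a
# foreground of size n drawn from a background of size N with M term genes
ora_pvalue <- function(gene_ratio, bg_ratio) {
  g <- parse_ratio(gene_ratio)  # k, n
  b <- parse_ratio(bg_ratio)    # M, N
  phyper(g[1] - 1, b[1], b[2] - b[1], g[2], lower.tail = FALSE)
}

# Rows copied from the GO table above
p_hormone  <- ora_pvalue("8/110", "245/15725")  # regulation of hormone secretion
p_collagen <- ora_pvalue("14/110", "51/15725")  # collagen fibril organization
signif(c(hormone = p_hormone, collagen = p_collagen), 3)
```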

Now for the KEGG analysis. enrichKEGG expects Entrez gene IDs, so the Ensembl IDs need to be converted first.

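# Map Ensembl gene IDs to Entrez IDs, then drop unmapped (NA) IDs and duplicates
fg_entrez <- unlist(mget(fg, org.Hs.egENSEMBL2EG, ifnotfound = NA))
fg_entrez <- unique(fg_entrez[!is.na(fg_entrez)])

bg_entrez <- unlist(mget(bg, org.Hs.egENSEMBL2EG, ifnotfound = NA))
bg_entrez <- unique(bg_entrez[!is.na(bg_entrez)])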
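`mget()` on an AnnotationDbi map returns a named list with one element per input ID, and `unlist()` flattens it: unmapped IDs become NA, and an ID with several Entrez matches contributes several entries with suffixed names. The toy list below is hypothetical and only mimics that shape, so the sketch runs without org.Hs.eg.db; `clean_ids` is an illustrative helper showing one way to drop the NAs and duplicated Entrez IDs before testing.

```r
# Hypothetical stand-in for mget(ids, org.Hs.egENSEMBL2EG, ifnotfound = NA)
mapped <- list(ENSG_A = c("E001", "E002"),  # one Ensembl ID, two Entrez IDs
               ENSG_B = NA,                 # no Entrez match
               ENSG_C = "E001")             # shares an Entrez ID with ENSG_A

flat <- unlist(mapped)
names(flat)  # "ENSG_A1" "ENSG_A2" "ENSG_B" "ENSG_C"

# Drop unmapped IDs and duplicated Entrez IDs
clean_ids <- function(x) unique(unname(x[!is.na(x)]))
clean_ids(flat)  # "E001" "E002"
```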


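# KEGG over-representation analysis with the expressed genes as the universe
# (enrichKEGG retrieves KEGG data online, so results can change over time)
kegg <- enrichKEGG(gene = fg_entrez, universe = bg_entrez,
    organism = "hsa", pAdjustMethod = "BH", pvalueCutoff = 0.01,
    qvalueCutoff = 0.05)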

kegg <- as.data.frame(kegg)

writeLines(kegg$Description,"PMC6349697_kegg.txt")

kegg[,c(2:7,9)] %>% kbl(caption="Clusterprofiler KEGG results obtained using the correct background") %>% kable_paper("hover", full_width = F)
Clusterprofiler KEGG results obtained using the correct background
Description GeneRatio BgRatio pvalue p.adjust qvalue Count
hsa04974 Protein digestion and absorption 12/55 99/7042 0.0000000 0.0000000 0.0000000 12
hsa04512 ECM-receptor interaction 7/55 85/7042 0.0000037 0.0002106 0.0001847 7
hsa04933 AGE-RAGE signaling pathway in diabetic complications 6/55 100/7042 0.0001165 0.0044265 0.0038829 6
hsa04926 Relaxin signaling pathway 6/55 122/7042 0.0003463 0.0098697 0.0086576 6

The clusterProfiler GO:BP analysis gave 69 significant terms. Terms similar to those in the article included ECM, collagen, skeletal system development and ossification, although cell adhesion (GO:0007155) was not significant. clusterProfiler also identified some additional interesting gene sets, such as angiogenesis, smooth muscle cell migration and response to transforming growth factor beta.

For KEGG, only two of the four published pathways (ECM-receptor interaction and protein digestion and absorption) were significant, while focal adhesion and PI3K-Akt signaling were not.
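The overlap between the published and clusterProfiler KEGG results can be checked with base R set operations. The sketch below types in the pathway names from Table S7 and from the clusterProfiler KEGG table; `compare_sets` is an illustrative helper. With the saved outputs, `readLines("PMC6349697_kegg.txt")` (or "PMC6349697_gobp.txt" for GO) could replace the typed-in vector, and the eulerr package loaded above could be used to draw the overlap.

```r
published_kegg <- c("ECM-receptor interaction", "Protein digestion and absorption",
                    "Focal adhesion", "PI3K-Akt signaling pathway")
clusterprofiler_kegg <- c("Protein digestion and absorption", "ECM-receptor interaction",
                          "AGE-RAGE signaling pathway in diabetic complications",
                          "Relaxin signaling pathway")

# Shared terms and terms unique to each result set
compare_sets <- function(a, b) {
  list(shared = intersect(a, b), only_a = setdiff(a, b), only_b = setdiff(b, a))
}

cmp <- compare_sets(published_kegg, clusterprofiler_kegg)
cmp$shared  # "ECM-receptor interaction" "Protein digestion and absorption"
cmp$only_a  # "Focal adhesion" "PI3K-Akt signaling pathway"
```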

In conclusion, the lack of a background gene set appears to have had little effect on the results. However, the conclusions drawn in the article regarding proliferation and metastasis, based on the enrichment analysis, are unfounded.

Session information

sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
##  [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
##  [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] eulerr_6.1.1                kableExtra_1.3.4           
##  [3] mitch_1.4.1                 DESeq2_1.32.0              
##  [5] SummarizedExperiment_1.22.0 MatrixGenerics_1.4.3       
##  [7] matrixStats_0.61.0          GenomicRanges_1.44.0       
##  [9] GenomeInfoDb_1.28.4         getDEE2_1.2.0              
## [11] org.Hs.eg.db_3.13.0         AnnotationDbi_1.54.1       
## [13] IRanges_2.26.0              S4Vectors_0.30.2           
## [15] Biobase_2.52.0              BiocGenerics_0.38.0        
## [17] clusterProfiler_4.0.5       reshape2_1.4.4             
## 
## loaded via a namespace (and not attached):
##   [1] shadowtext_0.0.9       fastmatch_1.1-3        systemfonts_1.0.3     
##   [4] plyr_1.8.6             igraph_1.2.8           lazyeval_0.2.2        
##   [7] splines_4.1.2          BiocParallel_1.26.2    ggplot2_3.3.5         
##  [10] digest_0.6.28          yulab.utils_0.0.4      htmltools_0.5.2       
##  [13] GOSemSim_2.18.1        viridis_0.6.2          GO.db_3.13.0          
##  [16] fansi_0.5.0            magrittr_2.0.1         memoise_2.0.0         
##  [19] Biostrings_2.60.2      annotate_1.70.0        graphlayouts_0.7.1    
##  [22] svglite_2.0.0          enrichplot_1.12.3      colorspace_2.0-2      
##  [25] rvest_1.0.2            blob_1.2.2             ggrepel_0.9.1         
##  [28] xfun_0.28              dplyr_1.0.7            crayon_1.4.2          
##  [31] RCurl_1.98-1.5         jsonlite_1.7.2         scatterpie_0.1.7      
##  [34] genefilter_1.74.1      survival_3.2-13        ape_5.5               
##  [37] glue_1.5.0             polyclip_1.10-0        gtable_0.3.0          
##  [40] zlibbioc_1.38.0        XVector_0.32.0         webshot_0.5.2         
##  [43] htm2txt_2.1.1          DelayedArray_0.18.0    scales_1.1.1          
##  [46] DOSE_3.18.3            DBI_1.1.1              GGally_2.1.2          
##  [49] Rcpp_1.0.7             viridisLite_0.4.0      xtable_1.8-4          
##  [52] gridGraphics_0.5-1     tidytree_0.3.6         bit_4.0.4             
##  [55] htmlwidgets_1.5.4      httr_1.4.2             fgsea_1.18.0          
##  [58] gplots_3.1.1           RColorBrewer_1.1-2     ellipsis_0.3.2        
##  [61] pkgconfig_2.0.3        reshape_0.8.8          XML_3.99-0.8          
##  [64] farver_2.1.0           sass_0.4.0             locfit_1.5-9.4        
##  [67] utf8_1.2.2             later_1.3.0            ggplotify_0.1.0       
##  [70] tidyselect_1.1.1       rlang_0.4.12           munsell_0.5.0         
##  [73] tools_4.1.2            cachem_1.0.6           downloader_0.4        
##  [76] generics_0.1.1         RSQLite_2.2.8          evaluate_0.14         
##  [79] stringr_1.4.0          fastmap_1.1.0          yaml_2.2.1            
##  [82] ggtree_3.0.4           knitr_1.36             bit64_4.0.5           
##  [85] tidygraph_1.2.0        caTools_1.18.2         purrr_0.3.4           
##  [88] KEGGREST_1.32.0        ggraph_2.0.5           nlme_3.1-153          
##  [91] mime_0.12              aplot_0.1.1            xml2_1.3.2            
##  [94] DO.db_2.9              rstudioapi_0.13        compiler_4.1.2        
##  [97] beeswarm_0.4.0         png_0.1-7              treeio_1.16.2         
## [100] tibble_3.1.6           tweenr_1.0.2           geneplotter_1.70.0    
## [103] bslib_0.3.1            stringi_1.7.5          highr_0.9             
## [106] lattice_0.20-45        Matrix_1.3-4           vctrs_0.3.8           
## [109] pillar_1.6.4           lifecycle_1.0.1        jquerylib_0.1.4       
## [112] data.table_1.14.2      cowplot_1.1.1          bitops_1.0-7          
## [115] httpuv_1.6.3           patchwork_1.1.1        qvalue_2.24.0         
## [118] R6_2.5.1               promises_1.2.0.1       KernSmooth_2.23-20    
## [121] echarts4r_0.4.2        gridExtra_2.3          gtools_3.9.2          
## [124] MASS_7.3-54            assertthat_0.2.1       GenomeInfoDbData_1.2.6
## [127] grid_4.1.2             ggfun_0.0.4            tidyr_1.1.4           
## [130] rmarkdown_2.11         ggforce_0.3.3          shiny_1.7.1