PMC6349697: A replication study

Intro

suppressPackageStartupMessages({
library("getDEE2")
library("DESeq2")
library("clusterProfiler")
library("org.Hs.eg.db")
library("mitch")
library("kableExtra")
library("eulerr")
})

In Kaumadi et al 2021 we showed that major problems plague functional enrichment analysis in many journal articles.

Here I will do my best to reproduce and replicate the findings of some studies. These were selected on the following criteria:

Included in the 2019 group that was studied in Kaumadi et al 2021.
Human study
RNA-seq gene expression
Performed DAVID gene set analysis
Provided gene list

From this subset, four studies were selected:

Study	Analytical issues
PMC6349697	Background
PMC6425008	Background, FDR
PMC6463127	Background, FDR
PMC6535219	Background

In this document I will test whether I can reproduce the published findings by using the same tool with the gene list provided in the article. I will also correct the method wherever possible. This means using the correct background gene list, applying FDR control, and analysing up- and down-regulated gene lists separately.
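As a minimal sketch of what FDR control means in practice, the base R function `p.adjust` can apply the Benjamini-Hochberg correction to a vector of p-values. The p-values below are hypothetical and chosen only for illustration; they are not taken from any of the studies.

```r
# Hypothetical raw p-values for five gene sets (illustrative only)
p_raw <- c(0.001, 0.01, 0.02, 0.045, 0.2)

# Benjamini-Hochberg adjustment, which controls the false discovery rate
p_fdr <- p.adjust(p_raw, method = "BH")

# Number of gene sets called significant before and after correction
n_raw <- sum(p_raw < 0.05)
n_fdr <- sum(p_fdr < 0.05)

print(data.frame(p_raw, p_fdr))
```

In this toy case one gene set that passes a nominal p<0.05 threshold no longer passes after adjustment, which is the kind of difference that omitting FDR control can hide.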

The study examined in this document is PMC6349697: Liu et al, 2019.

In this study, LINC00941 is the focus. From TCGA expression data, a co-expression network was built and the genes associated with LINC00941 were extracted. These genes were then subjected to enrichment analysis of GO terms and KEGG pathways using DAVID v6.8, with FDR<0.05 considered significant. The article provides no information about a background gene list.

In Supplementary Table S5, the authors present a list of 123 co-expressed genes.

In Supplementary Table S6, they show the 3180 genes that underwent clustering, along with the cluster assignment. There are 124 genes in the “yellow” cluster.

In Supplementary Table S7, the authors show the significant results of the enrichment tests.

Category	Term	Count	FDR
GOTERM_BP_DIRECT	GO:0030574~collagen catabolic process	15	2.73E-15
GOTERM_BP_DIRECT	GO:0030199~collagen fibril organization	13	8.48E-15
GOTERM_BP_DIRECT	GO:0030198~extracellular matrix organization	20	1.43E-14
GOTERM_BP_DIRECT	GO:0007155~cell adhesion	21	8.40E-09
GOTERM_BP_DIRECT	GO:0001501~skeletal system development	10	2.67E-04
GOTERM_BP_DIRECT	GO:0001503~ossification	7	0.0154
KEGG	hsa04512:ECM-receptor interaction	10	1.81E-06
KEGG	hsa04974:Protein digestion and absorption	9	4.53E-05
KEGG	hsa04510:Focal adhesion	10	0.0032
KEGG	hsa04151:PI3K-Akt signaling pathway	11	0.032

Try to reproduce

Here I’m going to reproduce the findings with DAVID as best I can.

Here is the list of genes used as the foreground. The gene IDs are saved in this repository as a text file called “PMC6349697_fg.txt”.

fg <- readLines("PMC6349697_fg.txt")

fg

##   [1] "ENSG00000142552" "ENSG00000183682" "ENSG00000137868" "ENSG00000140937"
##   [5] "ENSG00000164932" "ENSG00000099953" "ENSG00000261732" "ENSG00000163331"
##   [9] "ENSG00000141756" "ENSG00000168913" "ENSG00000145040" "ENSG00000261456"
##  [13] "ENSG00000183844" "ENSG00000005243" "ENSG00000072210" "ENSG00000179698"
##  [17] "ENSG00000106683" "ENSG00000113739" "ENSG00000110427" "ENSG00000117586"
##  [21] "ENSG00000162849" "ENSG00000124225" "ENSG00000169385" "ENSG00000164465"
##  [25] "ENSG00000187959" "ENSG00000135362" "ENSG00000163359" "ENSG00000091879"
##  [29] "ENSG00000184937" "ENSG00000149380" "ENSG00000148848" "ENSG00000171617"
##  [33] "ENSG00000146674" "ENSG00000006327" "ENSG00000086717" "ENSG00000198108"
##  [37] "ENSG00000011422" "ENSG00000133466" "ENSG00000105989" "ENSG00000112655"
##  [41] "ENSG00000188483" "ENSG00000118785" "ENSG00000101198" "ENSG00000172061"
##  [45] "ENSG00000134013" "ENSG00000134668" "ENSG00000011201" "ENSG00000060718"
##  [49] "ENSG00000204262" "ENSG00000168542" "ENSG00000151388" "ENSG00000248329"
##  [53] "ENSG00000113140" "ENSG00000132000" "ENSG00000187498" "ENSG00000130948"
##  [57] "ENSG00000176887" "ENSG00000203805" "ENSG00000095752" "ENSG00000101825"
##  [61] "ENSG00000138316" "ENSG00000182492" "ENSG00000128567" "ENSG00000078098"
##  [65] "ENSG00000168487" "ENSG00000111799" "ENSG00000106366" "ENSG00000137745"
##  [69] "ENSG00000157766" "ENSG00000214954" "ENSG00000147003" "ENSG00000102128"
##  [73] "ENSG00000154096" "ENSG00000196177" "ENSG00000168334" "ENSG00000136378"
##  [77] "ENSG00000115363" "ENSG00000159261" "ENSG00000125657" "ENSG00000137878"
##  [81] "ENSG00000137573" "ENSG00000164694" "ENSG00000164692" "ENSG00000130635"
##  [85] "ENSG00000159216" "ENSG00000131389" "ENSG00000122641" "ENSG00000168824"
##  [89] "ENSG00000038427" "ENSG00000124875" "ENSG00000162745" "ENSG00000169067"
##  [93] "ENSG00000123500" "ENSG00000163673" "ENSG00000117122" "ENSG00000130720"
##  [97] "ENSG00000113083" "ENSG00000204767" "ENSG00000155886" "ENSG00000119711"
## [101] "ENSG00000103888" "ENSG00000181378" "ENSG00000149257" "ENSG00000188611"
## [105] "ENSG00000087303" "ENSG00000087116" "ENSG00000104415" "ENSG00000108821"
## [109] "ENSG00000222047" "ENSG00000170373" "ENSG00000114270" "ENSG00000144810"
## [113] "ENSG00000188064" "ENSG00000187730" "ENSG00000171722" "ENSG00000122861"
## [117] "ENSG00000128342" "ENSG00000137809" "ENSG00000225614" "ENSG00000196611"
## [121] "ENSG00000177202" "ENSG00000186340" "ENSG00000086991" "ENSG00000235884"

These genes were submitted to DAVID v6.8 on 29th November 2021 without any particular background gene set, which gave the following results.

res1 <- read.table("PMC6349697_res1.tsv",header=TRUE,sep="\t")

res1 %>% kbl(caption="DAVID results obtained using article gene list without background") %>% kable_paper("hover", full_width = F)

DAVID results obtained using article gene list without background
Category	Term	Count	X.	PValue	List.Total	Pop.Hits	Pop.Total	Fold.Enrichment	FDR
GOTERM_BP_DIRECT	GO:0030574~collagen catabolic process	15	12.195122	0.0000000	105	64	16792	37.482143	0.0000000
GOTERM_BP_DIRECT	GO:0030199~collagen fibril organization	13	10.569106	0.0000000	105	39	16792	53.307936	0.0000000
GOTERM_BP_DIRECT	GO:0030198~extracellular matrix organization	20	16.260163	0.0000000	105	196	16792	16.318756	0.0000000
GOTERM_BP_DIRECT	GO:0007155~cell adhesion	21	17.073171	0.0000000	105	459	16792	7.316776	0.0000000
GOTERM_BP_DIRECT	GO:0001501~skeletal system development	10	8.130081	0.0000002	105	137	16792	11.673271	0.0000257
GOTERM_BP_DIRECT	GO:0001503~ossification	7	5.691057	0.0000101	105	80	16792	13.993333	0.0012345
GOTERM_BP_DIRECT	GO:0022617~extracellular matrix disassembly	6	4.878049	0.0001078	105	76	16792	12.625564	0.0098657
GOTERM_BP_DIRECT	GO:0030324~lung development	6	4.878049	0.0001078	105	76	16792	12.625564	0.0098657
GOTERM_BP_DIRECT	GO:0032964~collagen biosynthetic process	3	2.439024	0.0005607	105	6	16792	79.961905	0.0443490
GOTERM_BP_DIRECT	GO:0035987~endodermal cell differentiation	4	3.252032	0.0006059	105	27	16792	23.692416	0.0443490
GOTERM_BP_DIRECT	GO:0007507~heart development	7	5.691057	0.0009683	105	183	16792	6.117304	0.0633311
GOTERM_BP_DIRECT	GO:0071711~basement membrane organization	3	2.439024	0.0010382	105	8	16792	59.971429	0.0633311
GOTERM_BP_DIRECT	GO:0001568~blood vessel development	4	3.252032	0.0016632	105	38	16792	16.834085	0.0936507
KEGG_PATHWAY	hsa04512:ECM-receptor interaction	10	8.130081	0.0000000	43	87	6879	18.388132	0.0000001
KEGG_PATHWAY	hsa04974:Protein digestion and absorption	9	7.317073	0.0000000	43	88	6879	16.361258	0.0000015
KEGG_PATHWAY	hsa04510:Focal adhesion	10	8.130081	0.0000031	43	206	6879	7.765861	0.0000738
KEGG_PATHWAY	hsa04151:PI3K-Akt signaling pathway	11	8.943089	0.0000308	43	345	6879	5.100708	0.0005549
KEGG_PATHWAY	hsa05146:Amoebiasis	6	4.878049	0.0004272	43	106	6879	9.055287	0.0061517
KEGG_PATHWAY	hsa04060:Cytokine-cytokine receptor interaction	7	5.691057	0.0032944	43	243	6879	4.608384	0.0395333
KEGG_PATHWAY	hsa04611:Platelet activation	5	4.065041	0.0078237	43	130	6879	6.152952	0.0804719

This result shows that all ten terms reported in the article could be reproduced using the gene list provided by the authors without a custom background gene list, although the FDR values differ and several additional terms pass the FDR<0.05 threshold.
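To make the DAVID columns easier to interpret, the sketch below recomputes the fold enrichment for the first row of the table above (GO:0030574) from Count, List.Total, Pop.Hits and Pop.Total. It also computes a plain one-sided hypergeometric p-value with `phyper`. This is an assumption about a comparable test, not DAVID's own calculation: DAVID reports a modified Fisher exact p-value (the EASE score), so the p-value here is not expected to match the table exactly.

```r
# Values copied from the first row of the DAVID table above
# (GO:0030574, collagen catabolic process)
count      <- 15     # foreground genes annotated to the term
list_total <- 105    # foreground genes with any annotation in this category
pop_hits   <- 64     # background genes annotated to the term
pop_total  <- 16792  # background genes with any annotation in this category

# Fold enrichment: proportion in the foreground over proportion in the background
fold_enrichment <- (count / list_total) / (pop_hits / pop_total)

# Plain hypergeometric test, P(X >= count); not identical to DAVID's EASE score
p_hyper <- phyper(count - 1, pop_hits, pop_total - pop_hits,
                  list_total, lower.tail = FALSE)

print(fold_enrichment)
print(p_hyper)
```

Because Pop.Hits and Pop.Total both come from the background, changing the background gene list changes both the fold enrichment and the p-value, which is why the choice of background matters.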

Try to reproduce with corrected background gene list

Now I will try to use a corrected background gene list. Table S6 contains all the clustered gene sets, totalling 3180 genes; however, this is likely not all the genes that were detected in the dataset. Therefore, I downloaded the TCGA data used in the study and identified all the genes with an average of 10 reads per sample or more.

There are 25947 genes in the background.

bg <- readLines("PMC6349697_bg.txt")

length(bg)

## [1] 25947

These genes are stored as “PMC6349697_bg.txt” in the repo.
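The filtering rule described above (an average of at least 10 reads per sample) can be sketched as follows. The count matrix here is a hypothetical toy example with made-up gene names and values; the real background was derived from the TCGA data, and the exact code used for that step is not shown in this document.

```r
# Toy count matrix: rows are genes, columns are samples (hypothetical values)
counts <- matrix(
  c(120, 95, 130,   # geneA: well expressed
      0,  2,   1,   # geneB: essentially undetected
     12,  9,  10,   # geneC: mean just above 10
      8, 11,  10),  # geneD: mean just below 10
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("s1", "s2", "s3"))
)

# Keep genes with an average of 10 or more reads per sample
bg_toy <- rownames(counts)[rowMeans(counts) >= 10]

print(bg_toy)
```

Genes that fail this filter were not reliably detected, so they could never appear in the foreground and should not be counted in the background either.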

Next I repeated DAVID analysis, this time using the correct background gene list. Here is the result I obtained.

res2 <- read.table("PMC6349697_res2.tsv",header=TRUE,sep="\t")

res2 %>% kbl(caption="DAVID results obtained using the correct background") %>% kable_paper("hover", full_width = F)

DAVID results obtained using the correct background
Category	Term	Count	X.	PValue	List.Total	Pop.Hits	Pop.Total	Fold.Enrichment	FDR
GOTERM_BP_DIRECT	GO:0030574~collagen catabolic process	15	12.195122	0.0000000	103	60	14590	35.412621	0.0000000
GOTERM_BP_DIRECT	GO:0030199~collagen fibril organization	13	10.569106	0.0000000	103	37	14590	49.769089	0.0000000
GOTERM_BP_DIRECT	GO:0030198~extracellular matrix organization	20	16.260163	0.0000000	103	185	14590	15.313566	0.0000000
GOTERM_BP_DIRECT	GO:0007155~cell adhesion	21	17.073171	0.0000000	103	429	14590	6.933940	0.0000000
GOTERM_BP_DIRECT	GO:0001501~skeletal system development	10	8.130081	0.0000002	103	126	14590	11.242102	0.0000339
GOTERM_BP_DIRECT	GO:0001503~ossification	7	5.691057	0.0000100	103	71	14590	13.965541	0.0012063
GOTERM_BP_DIRECT	GO:0030324~lung development	6	4.878049	0.0001193	103	69	14590	12.317433	0.0123536
GOTERM_BP_DIRECT	GO:0022617~extracellular matrix disassembly	6	4.878049	0.0001559	103	73	14590	11.642506	0.0141314
GOTERM_BP_DIRECT	GO:0035987~endodermal cell differentiation	4	3.252032	0.0006823	103	25	14590	22.664078	0.0516796
GOTERM_BP_DIRECT	GO:0032964~collagen biosynthetic process	3	2.439024	0.0007128	103	6	14590	70.825243	0.0516796
GOTERM_BP_DIRECT	GO:0071711~basement membrane organization	3	2.439024	0.0009934	103	7	14590	60.707351	0.0654741
GOTERM_BP_DIRECT	GO:0007507~heart development	7	5.691057	0.0013019	103	172	14590	5.764845	0.0786564
KEGG_PATHWAY	hsa04512:ECM-receptor interaction	10	8.130081	0.0000000	42	81	5885	17.298648	0.0000002
KEGG_PATHWAY	hsa04974:Protein digestion and absorption	9	7.317073	0.0000001	42	83	5885	15.193632	0.0000025
KEGG_PATHWAY	hsa04510:Focal adhesion	10	8.130081	0.0000061	42	197	5885	7.112642	0.0001419
KEGG_PATHWAY	hsa04151:PI3K-Akt signaling pathway	11	8.943089	0.0000360	42	309	5885	4.988057	0.0006298
KEGG_PATHWAY	hsa05146:Amoebiasis	6	4.878049	0.0005387	42	98	5885	8.578717	0.0075413
KEGG_PATHWAY	hsa04060:Cytokine-cytokine receptor interaction	7	5.691057	0.0025016	42	202	5885	4.855611	0.0291855
KEGG_PATHWAY	hsa04611:Platelet activation	5	4.065041	0.0098511	42	122	5885	5.742584	0.0985112

In contrast to the six GO terms and four KEGG pathways identified with FDR<0.05 originally, with the corrected background there are now eight GO terms and six KEGG pathways.
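The tally of significant terms can be checked with a short sketch. To keep it self-contained, the Category and FDR columns are re-entered by hand from the table above rather than read from "PMC6349697_res2.tsv"; the object name `res2_fdr` is introduced here only for illustration.

```r
# Category and FDR columns copied from the DAVID table above
res2_fdr <- data.frame(
  Category = c(rep("GOTERM_BP_DIRECT", 12), rep("KEGG_PATHWAY", 7)),
  FDR = c(0, 0, 0, 0, 0.0000339, 0.0012063, 0.0123536, 0.0141314,
          0.0516796, 0.0516796, 0.0654741, 0.0786564,
          0.0000002, 0.0000025, 0.0001419, 0.0006298, 0.0075413,
          0.0291855, 0.0985112)
)

# Count the terms passing FDR < 0.05 in each category
sig_counts <- table(res2_fdr$Category[res2_fdr$FDR < 0.05])

print(sig_counts)
```

With the full results loaded, the same count could be obtained with `table(res2$Category[res2$FDR < 0.05])`.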

What about the conclusions of the study?

Some interesting statements about the enrichment analysis:

Abstract:

“… functional enrichment analysis of LINC00941 co-expression network demonstrated that LINC00941 might be an essential regulator of tumor metastasis and cancer cell proliferation.”

Introduction final paragraph:

“… through the functional enrichment analysis, we found that LINC00941 might be an essential regulator of tumor metastasis and cancer cell proliferation.”

Results:

“Some GO terms, such as extracellular matrix organization (GO: 0030198) and cell adhesion (GO: 0007155) are cell migration-related GO processes, which are associated with tumor metastasis. We detected that ECM-receptor interaction (KEGG: hsa04512) and Focal adhesion (KEGG: hsa04510) are metastasis-related pathways. PI3K-Akt signaling pathway (KEGG: hsa04151) is cell proliferation-related pathway. Our findings demonstrated that LINC00941 could be a potential regulator of tumor metastasis and cancer cell proliferation.”

Discussion:

“The results of GO terms and KEGG pathways were mainly enriched in cell proliferation, cell migration, and tumor metastasis.”

Indeed, the results indicate that LINC00941 is co-regulated with genes involved in ECM structure and metabolism. These pathways might be involved in the tissue remodelling associated with tumour maturation. Most of the statements appear reasonable, apart from the claim about proliferation: none of these ontologies or pathways are related to the cell cycle or proliferation. The statement in the discussion is also suspect, because none of these pathways are closely linked to proliferation, migration or metastasis. Other gene sets are likely available that could be used to test these associations more thoroughly. For example, there are 58 gene sets containing the keyword “metastasis” in the MSigDB v7.4 collection (29/11/2021).

What about a replication using an R script?

Here I’m using the enrichGO function from clusterProfiler to analyse the data. The algorithm is slightly different and the gene sets might be a different version. First, GO biological processes.

go_bp <- enrichGO(gene = fg,  keyType = "ENSEMBL", universe = bg,
    OrgDb = org.Hs.eg.db, ont = "BP", pAdjustMethod = "BH", pvalueCutoff  = 0.01,
    qvalueCutoff  = 0.05, readable = TRUE)

go_bp <- data.frame(go_bp)

go_bp <- subset(go_bp,qvalue<0.05)

nrow(go_bp)

## [1] 69

writeLines(go_bp$Description,"PMC6349697_gobp.txt")

go_bp[,c(2:7,9)] %>% kbl(caption="Clusterprofiler GO results obtained using the correct background") %>% kable_paper("hover", full_width = F)

Clusterprofiler GO results obtained using the correct background
	Description	GeneRatio	BgRatio	pvalue	p.adjust	qvalue	Count
GO:0030198	extracellular matrix organization	38/110	379/15725	0.0000000	0.0000000	0.0000000	38
GO:0043062	extracellular structure organization	38/110	380/15725	0.0000000	0.0000000	0.0000000	38
GO:0045229	external encapsulating structure organization	38/110	382/15725	0.0000000	0.0000000	0.0000000	38
GO:0030199	collagen fibril organization	14/110	51/15725	0.0000000	0.0000000	0.0000000	14
GO:0001501	skeletal system development	23/110	457/15725	0.0000000	0.0000000	0.0000000	23
GO:0032963	collagen metabolic process	12/110	98/15725	0.0000000	0.0000000	0.0000000	12
GO:0061448	connective tissue development	16/110	227/15725	0.0000000	0.0000000	0.0000000	16
GO:0001503	ossification	18/110	369/15725	0.0000000	0.0000000	0.0000000	18
GO:0051216	cartilage development	13/110	173/15725	0.0000000	0.0000001	0.0000000	13
GO:0035987	endodermal cell differentiation	7/110	44/15725	0.0000000	0.0000041	0.0000035	7
GO:0007492	endoderm development	8/110	75/15725	0.0000001	0.0000091	0.0000077	8
GO:0060541	respiratory system development	11/110	181/15725	0.0000001	0.0000091	0.0000077	11
GO:0001706	endoderm formation	7/110	53/15725	0.0000001	0.0000122	0.0000104	7
GO:0030324	lung development	10/110	157/15725	0.0000001	0.0000208	0.0000178	10
GO:0030323	respiratory tube development	10/110	161/15725	0.0000002	0.0000246	0.0000210	10
GO:0007423	sensory organ development	16/110	488/15725	0.0000003	0.0000341	0.0000291	16
GO:0002062	chondrocyte differentiation	8/110	99/15725	0.0000004	0.0000521	0.0000445	8
GO:0030574	collagen catabolic process	6/110	44/15725	0.0000006	0.0000615	0.0000526	6
GO:0090596	sensory organ morphogenesis	11/110	229/15725	0.0000006	0.0000615	0.0000526	11
GO:0001704	formation of primary germ layer	8/110	109/15725	0.0000009	0.0000927	0.0000793	8
GO:0001649	osteoblast differentiation	10/110	203/15725	0.0000016	0.0001482	0.0001267	10
GO:0001654	eye development	12/110	329/15725	0.0000032	0.0002905	0.0002484	12
GO:0150063	visual system development	12/110	333/15725	0.0000037	0.0003146	0.0002691	12
GO:0072001	renal system development	11/110	279/15725	0.0000041	0.0003343	0.0002858	11
GO:0048880	sensory system development	12/110	339/15725	0.0000044	0.0003476	0.0002972	12
GO:0048592	eye morphogenesis	8/110	136/15725	0.0000050	0.0003780	0.0003233	8
GO:0042060	wound healing	14/110	484/15725	0.0000070	0.0005112	0.0004371	14
GO:0001525	angiogenesis	14/110	488/15725	0.0000077	0.0005413	0.0004628	14
GO:0001655	urogenital system development	11/110	311/15725	0.0000114	0.0007742	0.0006620	11
GO:0032964	collagen biosynthetic process	5/110	44/15725	0.0000134	0.0008803	0.0007527	5
GO:0042730	fibrinolysis	4/110	23/15725	0.0000181	0.0011553	0.0009879	4
GO:0001822	kidney development	10/110	271/15725	0.0000202	0.0012473	0.0010666	10
GO:0007369	gastrulation	8/110	168/15725	0.0000233	0.0013984	0.0011958	8
GO:0006029	proteoglycan metabolic process	6/110	87/15725	0.0000326	0.0018935	0.0016192	6
GO:0046888	negative regulation of hormone secretion	5/110	55/15725	0.0000403	0.0022399	0.0019154	5
GO:0072006	nephron development	7/110	133/15725	0.0000408	0.0022399	0.0019154	7
GO:0071559	response to transforming growth factor beta	9/110	240/15725	0.0000471	0.0025163	0.0021517	9
GO:0014910	regulation of smooth muscle cell migration	5/110	58/15725	0.0000521	0.0026389	0.0022565	5
GO:0061029	eyelid development in camera-type eye	3/110	11/15725	0.0000528	0.0026389	0.0022565	3
GO:0048608	reproductive structure development	11/110	368/15725	0.0000534	0.0026389	0.0022565	11
GO:0061458	reproductive system development	11/110	371/15725	0.0000574	0.0027698	0.0023685	11
GO:0003338	metanephros morphogenesis	4/110	31/15725	0.0000617	0.0029023	0.0024817	4
GO:0061035	regulation of cartilage development	5/110	61/15725	0.0000666	0.0030613	0.0026177	5
GO:0060428	lung epithelium development	4/110	33/15725	0.0000793	0.0035328	0.0030209	4
GO:1903034	regulation of response to wounding	7/110	148/15725	0.0000804	0.0035328	0.0030209	7
GO:0014909	smooth muscle cell migration	5/110	65/15725	0.0000904	0.0038860	0.0033229	5
GO:0038065	collagen-activated signaling pathway	3/110	14/15725	0.0001146	0.0048212	0.0041226	3
GO:0033627	cell adhesion mediated by integrin	5/110	71/15725	0.0001379	0.0056782	0.0048554	5
GO:0031589	cell-substrate adhesion	10/110	347/15725	0.0001597	0.0064431	0.0055094	10
GO:0043010	camera-type eye development	9/110	286/15725	0.0001793	0.0069487	0.0059418	9
GO:0046879	hormone secretion	9/110	286/15725	0.0001793	0.0069487	0.0059418	9
GO:0061041	regulation of wound healing	6/110	119/15725	0.0001868	0.0071011	0.0060721	6
GO:0007517	muscle organ development	9/110	291/15725	0.0002040	0.0076092	0.0065066	9
GO:0014812	muscle cell migration	5/110	78/15725	0.0002150	0.0077296	0.0066096	5
GO:0022617	extracellular matrix disassembly	5/110	78/15725	0.0002150	0.0077296	0.0066096	5
GO:0060562	epithelial tube morphogenesis	9/110	295/15725	0.0002258	0.0079705	0.0068156	9
GO:0009914	hormone transport	9/110	296/15725	0.0002315	0.0080297	0.0068661	9
GO:0071560	cellular response to transforming growth factor beta stimulus	8/110	234/15725	0.0002373	0.0080884	0.0069164	8
GO:0007566	embryo implantation	4/110	44/15725	0.0002481	0.0083118	0.0071074	4
GO:1903035	negative regulation of response to wounding	5/110	81/15725	0.0002567	0.0084577	0.0072322	5
GO:0001890	placenta development	6/110	129/15725	0.0002894	0.0092673	0.0079245	6
GO:0030195	negative regulation of blood coagulation	4/110	46/15725	0.0002950	0.0092673	0.0079245	4
GO:0046697	decidualization	3/110	19/15725	0.0002974	0.0092673	0.0079245	3
GO:0060348	bone development	7/110	185/15725	0.0003196	0.0092673	0.0079245	7
GO:0032330	regulation of chondrocyte differentiation	4/110	47/15725	0.0003207	0.0092673	0.0079245	4
GO:0033628	regulation of cell adhesion mediated by integrin	4/110	47/15725	0.0003207	0.0092673	0.0079245	4
GO:0060425	lung morphogenesis	4/110	47/15725	0.0003207	0.0092673	0.0079245	4
GO:1900047	negative regulation of hemostasis	4/110	47/15725	0.0003207	0.0092673	0.0079245	4
GO:0046883	regulation of hormone secretion	8/110	245/15725	0.0003234	0.0092673	0.0079245	8

Now for the KEGG analysis, but the gene IDs need to be converted to Entrez IDs first.

fg_entrez <- unlist(mget(fg, org.Hs.egENSEMBL2EG, ifnotfound = NA))

bg_entrez <- unlist(mget(bg, org.Hs.egENSEMBL2EG, ifnotfound = NA))
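One thing to keep in mind with this conversion is that some Ensembl IDs have no Entrez match (returned as NA) and some map to more than one Entrez ID. The sketch below shows one possible way to tidy such a result before enrichment. It uses a hand-made list standing in for the output of `mget()`, with made-up identifiers, and is a suggestion rather than a step that was run in this analysis.

```r
# Toy stand-in for the list returned by mget(): names are Ensembl IDs and
# values are Entrez IDs. All identifiers here are made up for illustration.
mapped <- list(
  ENSG_A = "101",
  ENSG_B = NA,               # no Entrez match
  ENSG_C = c("202", "203"),  # one Ensembl ID mapping to two Entrez IDs
  ENSG_D = "101"             # maps to an Entrez ID that was already seen
)

# Flatten the list, then drop missing values and duplicates
ids <- unlist(mapped, use.names = FALSE)
ids_clean <- unique(ids[!is.na(ids)])

print(ids_clean)
```

Removing NA values and duplicates makes the foreground and background sizes easier to interpret when comparing them with the GeneRatio and BgRatio columns.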


kegg <- enrichKEGG(gene = fg_entrez, universe = bg_entrez,
    organism = "hsa", pAdjustMethod = "BH", pvalueCutoff  = 0.01,
    qvalueCutoff  = 0.05)

kegg <- as.data.frame(kegg)

writeLines(kegg$Description,"PMC6349697_kegg.txt")

kegg[,c(2:7,9)] %>% kbl(caption="Clusterprofiler KEGG results obtained using the correct background") %>% kable_paper("hover", full_width = F)

Clusterprofiler KEGG results obtained using the correct background
	Description	GeneRatio	BgRatio	pvalue	p.adjust	qvalue	Count
hsa04974	Protein digestion and absorption	12/55	99/7042	0.0000000	0.0000000	0.0000000	12
hsa04512	ECM-receptor interaction	7/55	85/7042	0.0000037	0.0002106	0.0001847	7
hsa04933	AGE-RAGE signaling pathway in diabetic complications	6/55	100/7042	0.0001165	0.0044265	0.0038829	6
hsa04926	Relaxin signaling pathway	6/55	122/7042	0.0003463	0.0098697	0.0086576	6

The clusterProfiler GO:BP results gave 69 significant terms. Prominent terms shared with the article were ECM, collagen, skeletal system and ossification, whereas clusterProfiler also gave some interesting gene sets such as angiogenesis, smooth muscle cell migration and response to transforming growth factor beta.

With KEGG analysis, only two sets were consistent (protein digestion and absorption, and ECM-receptor interaction), while focal adhesion and PI3K-Akt were not significant.
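The overlap between the KEGG pathways in the article and those from clusterProfiler can be checked with basic set operations. The pathway IDs below are copied from Supplementary Table S7 as quoted earlier and from the clusterProfiler KEGG table above; the variable names are introduced here only for illustration.

```r
# KEGG pathway IDs reported in the article (Supplementary Table S7)
article_kegg <- c("hsa04512", "hsa04974", "hsa04510", "hsa04151")

# KEGG pathway IDs from the clusterProfiler table above
cp_kegg <- c("hsa04974", "hsa04512", "hsa04933", "hsa04926")

shared       <- intersect(article_kegg, cp_kegg)  # found by both
article_only <- setdiff(article_kegg, cp_kegg)    # only in the article
cp_only      <- setdiff(cp_kegg, article_kegg)    # only in clusterProfiler

print(shared)
print(article_only)
print(cp_only)
```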

In conclusion, it appears that the lack of a background gene set did not have much of an effect on the results; however, the conclusions drawn in the article regarding proliferation and metastasis based on enrichment analysis are unfounded.

Session information

sessionInfo()

## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
##  [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
##  [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] eulerr_6.1.1                kableExtra_1.3.4           
##  [3] mitch_1.4.1                 DESeq2_1.32.0              
##  [5] SummarizedExperiment_1.22.0 MatrixGenerics_1.4.3       
##  [7] matrixStats_0.61.0          GenomicRanges_1.44.0       
##  [9] GenomeInfoDb_1.28.4         getDEE2_1.2.0              
## [11] org.Hs.eg.db_3.13.0         AnnotationDbi_1.54.1       
## [13] IRanges_2.26.0              S4Vectors_0.30.2           
## [15] Biobase_2.52.0              BiocGenerics_0.38.0        
## [17] clusterProfiler_4.0.5       reshape2_1.4.4             
## 
## loaded via a namespace (and not attached):
##   [1] shadowtext_0.0.9       fastmatch_1.1-3        systemfonts_1.0.3     
##   [4] plyr_1.8.6             igraph_1.2.8           lazyeval_0.2.2        
##   [7] splines_4.1.2          BiocParallel_1.26.2    ggplot2_3.3.5         
##  [10] digest_0.6.28          yulab.utils_0.0.4      htmltools_0.5.2       
##  [13] GOSemSim_2.18.1        viridis_0.6.2          GO.db_3.13.0          
##  [16] fansi_0.5.0            magrittr_2.0.1         memoise_2.0.0         
##  [19] Biostrings_2.60.2      annotate_1.70.0        graphlayouts_0.7.1    
##  [22] svglite_2.0.0          enrichplot_1.12.3      colorspace_2.0-2      
##  [25] rvest_1.0.2            blob_1.2.2             ggrepel_0.9.1         
##  [28] xfun_0.28              dplyr_1.0.7            crayon_1.4.2          
##  [31] RCurl_1.98-1.5         jsonlite_1.7.2         scatterpie_0.1.7      
##  [34] genefilter_1.74.1      survival_3.2-13        ape_5.5               
##  [37] glue_1.5.0             polyclip_1.10-0        gtable_0.3.0          
##  [40] zlibbioc_1.38.0        XVector_0.32.0         webshot_0.5.2         
##  [43] htm2txt_2.1.1          DelayedArray_0.18.0    scales_1.1.1          
##  [46] DOSE_3.18.3            DBI_1.1.1              GGally_2.1.2          
##  [49] Rcpp_1.0.7             viridisLite_0.4.0      xtable_1.8-4          
##  [52] gridGraphics_0.5-1     tidytree_0.3.6         bit_4.0.4             
##  [55] htmlwidgets_1.5.4      httr_1.4.2             fgsea_1.18.0          
##  [58] gplots_3.1.1           RColorBrewer_1.1-2     ellipsis_0.3.2        
##  [61] pkgconfig_2.0.3        reshape_0.8.8          XML_3.99-0.8          
##  [64] farver_2.1.0           sass_0.4.0             locfit_1.5-9.4        
##  [67] utf8_1.2.2             later_1.3.0            ggplotify_0.1.0       
##  [70] tidyselect_1.1.1       rlang_0.4.12           munsell_0.5.0         
##  [73] tools_4.1.2            cachem_1.0.6           downloader_0.4        
##  [76] generics_0.1.1         RSQLite_2.2.8          evaluate_0.14         
##  [79] stringr_1.4.0          fastmap_1.1.0          yaml_2.2.1            
##  [82] ggtree_3.0.4           knitr_1.36             bit64_4.0.5           
##  [85] tidygraph_1.2.0        caTools_1.18.2         purrr_0.3.4           
##  [88] KEGGREST_1.32.0        ggraph_2.0.5           nlme_3.1-153          
##  [91] mime_0.12              aplot_0.1.1            xml2_1.3.2            
##  [94] DO.db_2.9              rstudioapi_0.13        compiler_4.1.2        
##  [97] beeswarm_0.4.0         png_0.1-7              treeio_1.16.2         
## [100] tibble_3.1.6           tweenr_1.0.2           geneplotter_1.70.0    
## [103] bslib_0.3.1            stringi_1.7.5          highr_0.9             
## [106] lattice_0.20-45        Matrix_1.3-4           vctrs_0.3.8           
## [109] pillar_1.6.4           lifecycle_1.0.1        jquerylib_0.1.4       
## [112] data.table_1.14.2      cowplot_1.1.1          bitops_1.0-7          
## [115] httpuv_1.6.3           patchwork_1.1.1        qvalue_2.24.0         
## [118] R6_2.5.1               promises_1.2.0.1       KernSmooth_2.23-20    
## [121] echarts4r_0.4.2        gridExtra_2.3          gtools_3.9.2          
## [124] MASS_7.3-54            assertthat_0.2.1       GenomeInfoDbData_1.2.6
## [127] grid_4.1.2             ggfun_0.0.4            tidyr_1.1.4           
## [130] rmarkdown_2.11         ggforce_0.3.3          shiny_1.7.1