Abstract
Decidualization is a crucial process for successful embryo implantation and pregnancy in humans. Defects in decidualization during early pregnancy are associated with several pregnancy complications, such as pre-eclampsia, intrauterine growth restriction and recurrent pregnancy loss. However, the mechanism underlying decidualization remains poorly understood. In the present study, we performed a systematic analysis of decidualization-related genes using text mining. We identified 286 genes for humans and 287 genes for mice, with 111 genes shared by both species. Through enrichment analysis, we demonstrated that, although some divergence was observed, the majority of enriched Gene Ontology terms and pathways were shared by both species, suggesting that functional categories are more conserved than individual genes. We further constructed a decidualization-related protein-protein interaction network consisting of 344 nodes connected by 1,541 edges. We prioritized genes in this network and identified 12 genes that may be key regulators of decidualization. These findings provide clues for further research on the mechanism underlying decidualization.
Introduction
In mammals, pregnancy begins with implantation of the embryo into the uterus [1]. In some species, such as humans and mice, invasive embryo implantation is accompanied by a rapid remodeling of the uterine stromal compartment known as decidualization. Upon decidualization, stromal cells proliferate and subsequently differentiate into large epithelioid cells characterized by cytoplasmic accumulation of glycogen and lipid droplets, as well as expansion of the Golgi complex and rough endoplasmic reticulum [2, 3]. This process is marked by the secretion of decidual prolactin (PRL) and insulin-like growth factor binding protein 1 (IGFBP1) [4].
From a functional perspective, decidualization contributes to uterine angiogenesis and hemostasis during trophoblast invasion and placenta formation [5]. It also helps establish maternal immunological tolerance to embryonic antigens [6]. Defects in decidualization during early pregnancy are associated with several pregnancy complications, such as pre-eclampsia, intrauterine growth restriction and recurrent pregnancy loss [7]. Therefore, a clear understanding of the molecular mechanism underlying decidualization is needed in order to improve reproductive health.
In humans, decidualization is initiated spontaneously in the secretory phase of the menstrual cycle [8]. If pregnancy occurs, decidualization continues as the embryo undergoes implantation; otherwise, menstruation occurs. Most knowledge about human decidualization has come from studies using in vitro model systems. It is well established that decidualization can be induced in cultured endometrial stromal cells by incubation with progesterone after proper estrogen priming [9]. Decidualization is mediated by a gradual increase in the intracellular cAMP level, and the addition of cAMP analogues enhances this process [10, 11]. The main advantage of in vitro model systems is that they provide key information on the response of a single cell type. However, a cell growing as a layer in a dish lacks the complexity of a cell growing in vivo. Most importantly, the uterus is a complex organ composed of many cell types, and cultured stromal cells lack whole-organ physiology and an interacting microenvironment.
Because of ethical restrictions and experimental difficulties, in vivo study of decidualization in humans is not practical, and direct analysis of decidualization relies heavily on mice. Unlike in humans, the decidual reaction in mice is an embryo-dependent process [8].
Decidualization begins shortly after the blastocyst attaches to the uterine luminal epithelium. Interestingly, a hormonally primed uterus can be stimulated by mechanical means (e.g. sesame oil) to trigger decidualization in the absence of an embryo [12]. The mechanically decidualized endometrium, known as the deciduoma, is morphologically similar to the embryo-induced decidua, making it a good model of in vivo decidualization free of embryo contamination [13, 14]. A previous study compared the global gene expression profiles of deciduoma and decidua [15]. Approximately 1,500 genes were differentially expressed by at least 1.2-fold. However, only 53 genes differed by 2.5-fold or more, indicating that the deciduoma is also similar to the decidua at the transcriptome level.
Nevertheless, a comprehensive analysis of the molecular mechanism underlying decidualization is lacking. A wealth of information remains hidden within published research articles, the number of which is growing fast. Recently, text mining methodology has been developed, providing a means to retrieve this information in an automated way [16]. Here we report a systematic analysis of decidualization-related genes in humans and mice using text mining. Our study provides in-depth insights into the molecular mechanism underlying decidualization from a comparative perspective.
Methods
Text mining
Articles were retrieved from the PubMed database. We conducted a search with the following combination of query keywords: “decidualization OR decidual OR decidua OR deciduas OR deciduoma OR decidualized OR decidualizing”. The search tag “[Title/Abstract]” was added after each keyword. The relevant articles were retrieved in XML format. This format makes information extraction more precise owing to the enclosure of contents within tag pairs. For each article, the title and abstract text were fetched using the dom4j XML parser in Java.
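The query construction and XML extraction described above can be sketched as follows. This is a minimal illustration in Python using the standard library, not the Java/dom4j code used in the study. The function names and the sample record are hypothetical; the element names (PubmedArticle, PMID, ArticleTitle, AbstractText) are assumed to follow the usual PubMed XML export layout.

```python
import xml.etree.ElementTree as ET

# Keywords taken from the search described in the text.
KEYWORDS = [
    "decidualization", "decidual", "decidua", "deciduas",
    "deciduoma", "decidualized", "decidualizing",
]


def build_query(keywords):
    """Append the [Title/Abstract] tag to each keyword and join with OR."""
    return " OR ".join(f"{kw}[Title/Abstract]" for kw in keywords)


def _text(node):
    """Return all text inside a node, including text nested in inline markup."""
    return "".join(node.itertext()).strip() if node is not None else ""


def extract_records(xml_text):
    """Collect the PMID, title and abstract text of each PubmedArticle element."""
    root = ET.fromstring(xml_text)
    records = []
    for article in root.iter("PubmedArticle"):
        records.append({
            "pmid": _text(article.find(".//PMID")),
            "title": _text(article.find(".//ArticleTitle")),
            # Structured abstracts contain several AbstractText elements.
            "abstract": " ".join(_text(n) for n in article.iter("AbstractText")),
        })
    return records


# Hypothetical placeholder record, used only to exercise the parser.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>0</PMID>
      <Article>
        <ArticleTitle>Placeholder title about <i>decidualization</i></ArticleTitle>
        <Abstract>
          <AbstractText Label="BACKGROUND">First placeholder sentence.</AbstractText>
          <AbstractText Label="RESULTS">Second placeholder sentence.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""

if __name__ == "__main__":
    print(build_query(KEYWORDS))
    for record in extract_records(SAMPLE_XML):
        print(record["pmid"], record["title"])
```

Reading text through tag pairs in this way keeps the title and the abstract cleanly separated, which is the precision advantage of the XML format noted above. Submitting the query to PubMed and downloading the results are not shown.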
Abstract texts were further divided into sentences using the sentence tokenizer implemented in LingPipe (Alias-I, Inc). Text mining was performed at the sentence level. Species names were parsed based on a lexicon [17]. All articles were classified into two categories according to the species names mentioned in the texts: those studying human decidualization (including studies in monkeys) and those studying mouse decidualization (including studies in rats). When no species name or multiple species names were detected, articles were classified manually.
Gene mention recognition was performed using two different gene mention taggers: the hidden Markov model (HMM) tagger implemented in LingPipe and the ABNER tagger [18], which is based on conditional random fields (CRF). Gene mentions detected by the two taggers were merged. Because researchers name genes in a highly variable manner, we built a gene synonym dictionary from the Entrez Gene database [19]. This dictionary was used for gene name normalization, during which gene mentions were mapped to unique Entrez genes by exact string matching. If multiple Entrez genes were linked to the same gene mention, the ambiguity was resolved manually. To reduce false positives, we required the co-occurrence of a decidualization mention and a gene mention within a single sentence.
In general, the abstract is sufficient for our text mining task, as it contains the most important findings of an article. However, articles on high-throughput experiments often report a large number of genes that cannot be fully listed in the abstract. For these articles, we downloaded the full texts (as well as supplementary files if needed) and extracted gene mentions by hand. Finally, we compiled two gene sets: one associated with human decidualization and the other associated with mouse decidualization. To ensure accurate and complete recording, each gene was checked manually and additional references