Abstract

Background and Objective

   Endometrial cancer (EC) is a common gynecological malignancy worldwide.
   Despite advances in the development of strategies for treating EC,
   prognosis of the disease remains unsatisfactory, especially for
   advanced EC. The aim of this study was to identify novel genes that can
   be used as potential biomarkers for identifying the prognosis of EC and
   to construct a novel risk stratification using these genes.

Methods and Results

   An mRNA sequencing dataset with corresponding survival data and a
   microarray expression profiling dataset of EC patients were obtained
   from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO),
   respectively. Common differentially expressed genes (DEGs) were
   identified from the sequencing and expression profiling datasets.
   Pathway enrichment analysis of the DEGs was performed using the
   Database for Annotation, Visualization, and Integrated Discovery. The
   protein–protein interaction network was established using the STRING
   online database in order to identify hub genes. Univariate and
   multivariable Cox regression analyses were used to screen prognostic
   DEGs and to construct a prognostic signature. Survival analysis based
   on the prognostic signature was performed on the TCGA EC dataset. A total
   of 255 common DEGs were found and 11 hub genes (TOP2A, CDK1, CCNB1,
   CCNB2, AURKA, PCNA, CCNA2, BIRC5, NDC80, CDC20, and BUB1BA) that may be
   closely related to the pathogenesis of EC were identified. A
   seven-gene signature consisting of PHLDA2, GGH, ESPL1, FAM184A,
   KIAA1644, LMNB1, and TRPM4 was constructed. The signature performed well for
   prognosis prediction (p < 0.001) and time-dependent receiver–operating
   characteristic (ROC) analysis displayed an area under the curve (AUC)
   of 0.797, 0.734, 0.729, and 0.647 for 1, 3, 5, and 10-year overall
   survival (OS) prediction, respectively.

Conclusion

   This study identified potential genes that may be involved in the
   pathophysiology of EC and constructed a novel gene expression signature
   for EC risk stratification and prognosis prediction.

   Keywords: endometrial cancer, bioinformatics, prognosis, biomarker,
   GEO, TCGA

Introduction

   Endometrial cancer (EC) is a group of epithelial malignancies that
   occur in the endometrium and is the most common gynecological
   malignancy in developed countries. In 2018, the estimated incidence
   and mortality of EC in females were 22.2/100,000 and 4.4/100,000,
   respectively, in Europe and 8.4/100,000 and 1.8/100,000, respectively,
   worldwide ([27]Ferlay et al., 2018a,[28]b). In China, the incidence and
   mortality of EC were 6.6/100,000 and 1.54/100,000, respectively, in
   females in 2014 ([29]Chen et al., 2014). The incidence of EC has
   increased in recent years as a result of population aging and
   population growth ([30]Chen et al., 2017; [31]Global Burden of Disease
   Cancer Collaboration, Fitzmaurice et al., 2018). While great advances
   have been made regarding treatment options available for EC, such as
   surgical interventions, radiotherapy and chemotherapy, large
   differences exist in the outcomes for patients with different stages of
   EC. Patients with early EC usually have a good prognosis, but patients
   with advanced, recurrent, or metastatic EC commonly have a poor
   outcome, which is associated with an ineffective response to radical
   surgery for EC
   ([32]Creasman et al., 2006; [33]Watari et al., 2009; [34]Mcgunigal et
   al., 2017). Therefore, there is an urgent need to identify new
   molecules that can be used as diagnostic biomarkers, molecular
   therapeutic targets, and predictors of EC prognosis.

   Endometrial cancer development and progression occur as a result of
   environmental factors and genetic variation, and show different
   pathological and molecular characteristics. Classification of EC has
   been established based on different systems, including clinical,
   metabolic and endocrine, histological, and genetic characteristics.
   These characteristics are usually used as a guide for selecting
   treatment strategies and assessing prognosis for EC patients
   ([35]Bokhman, 1983; [36]Han et al., 2013; [37]Murali et al., 2014). A
   few clinical factors and pathological features further determine the
   risk level and prognosis of EC patients. Comprehensive risk
   stratification of EC patients based on tumor stage and clinical and
   biological prognostic factors has been established and utilized
   ([38]Pecorelli, 2009;
   [39]Korkmaz et al., 2017). However, many genes and pathways are also
   associated with risk level and prognosis of EC patients ([40]Stelloo et
   al., 2015). Along with the development of next-generation sequencing, a
   large number of differentially expressed genes (DEGs) have been
   discovered between EC tissue and normal endometrium tissue, which have
   been applied to characterize EC into four subtypes ([41]Church et al.,
   2013). Furthermore, a few DEGs can be used as biomarkers for EC risk
   stratification and prognosis ([42]O’mara et al., 2016; [43]Corrado et
   al., 2018). However, only a few studies have conducted a comprehensive
   analysis of DEGs related to risk judgment and prognosis of EC.

   The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO)
   databases contain a large amount of high-throughput sequencing and gene
   expression profile data for many cancer types at the DNA, RNA,
   protein, and epigenetic levels. These genomic data are publicly
   available and play an important role in exploring the molecular
   characteristics of cancer occurrence, recurrence, and metastasis and in
   improving the diagnosis and treatment of cancer ([44]Tomczak et al.,
   2015). In recent years, a new molecular typing of EC has been
   developed through comprehensive genomic and transcriptomic analysis of
   ECs using TCGA high-throughput sequencing data, which can greatly
   contribute to the development of targeted therapies for populations
   with specific genetic mutations
   ([45]Mcalpine et al., 2018). Additionally, comprehensive analysis of
   DEGs based on TCGA and GEO data has produced new multi-gene models
   that have been used for risk stratification and as potential
   diagnostic and prognostic biomarkers in certain cancers ([46]Zhou et
   al., 2015; [47]Huang et al., 2017; [48]Liu X. et al., 2018).

   In this study, we first identified DEGs through an integrated analysis
   based on TCGA and GEO gene expression data of EC tissue and normal
   endometrial tissue. Bioinformatics analysis was then used to screen
   potential prognostic biomarkers for predicting the survival of EC
   patients using TCGA datasets. Finally, we constructed a DEG
   expression-based prognostic signature, which may contribute to the
   development of risk stratification and prognosis assessment of EC
   patients.

Materials and Methods

Data Source

   The gene microarray expression dataset [49]GSE63678, which includes 7
   EC tissue samples and 5 normal endometrial tissue samples, was
   downloaded from the Gene Expression Omnibus (GEO) database^[50]1. The
   EC dataset containing 551 tumor samples and 35 normal samples, which
   included raw counts of mRNA expression data and corresponding clinical
   information, was obtained from The Cancer Genome Atlas (TCGA)
   dataset^[51]2. The data in this study were obtained from the GEO and
   TCGA public databases, and their acquisition and use complied with the
   guidelines and policies of each database.

Differentially Expressed Gene (DEG) Screening

   The [52]GSE63678 expression profile was normalized and analyzed using
   the limma package of R software. The TCGA EC dataset was normalized and
   analyzed using the edgeR package of R software. A false discovery rate
   (FDR)-adjusted p-value < 0.05 and |logFC| > 1 were applied as the
   criteria to screen the DEGs. The DEGs that overlapped between the
   [53]GSE63678 and TCGA EC datasets were defined as common DEGs and were
   clustered using the pheatmap package of R software.
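
   The screening thresholds and the overlap step described above can be
   illustrated with a short sketch. It is not the authors' limma/edgeR
   pipeline: the gene names and (logFC, FDR) values are hypothetical, the
   function names are invented for illustration, and requiring the same
   direction of change in both datasets is an assumption suggested by the
   separate "common upregulated" and "common downregulated" counts.

```python
def screen_degs(results, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Return {gene: 'up' | 'down'} for genes with FDR < cutoff and |logFC| > cutoff."""
    degs = {}
    for gene, (log_fc, fdr) in results.items():
        if fdr < fdr_cutoff and abs(log_fc) > lfc_cutoff:
            degs[gene] = "up" if log_fc > 0 else "down"
    return degs


def common_degs(degs_a, degs_b):
    """Keep genes that are DEGs in both datasets with the same direction (assumption)."""
    return {gene: d for gene, d in degs_a.items() if degs_b.get(gene) == d}


# Hypothetical (logFC, FDR) results, for illustration only.
array_results = {
    "GENE_A": (2.1, 0.001),   # passes both thresholds, upregulated
    "GENE_B": (-1.4, 0.020),  # passes both thresholds, downregulated
    "GENE_C": (0.6, 0.001),   # fails |logFC| > 1
    "GENE_D": (1.8, 0.200),   # fails FDR < 0.05
}
seq_results = {
    "GENE_A": (3.0, 1e-8),
    "GENE_B": (1.2, 0.010),   # significant, but opposite direction
    "GENE_C": (1.5, 0.001),
    "GENE_D": (2.2, 1e-4),
}

shared = common_degs(screen_degs(array_results), screen_degs(seq_results))
print(shared)
```

   In this toy input only GENE_A is retained, because GENE_B changes in
   opposite directions in the two datasets.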

Functional Enrichment Analysis of DEGs

   The Database for Annotation, Visualization and Integrated Discovery
   (DAVID) v6.8^[54]3 was used to analyze the common DEGs using gene
   ontology (GO) enrichment analysis to identify the biological processes,
   molecular functions, cellular components, and signaling pathways
   associated with these DEGs. A p-value of <0.05 was considered as
   statistically significant.

Protein–Protein Interaction (PPI) Network and Module Analysis

   The potential relationships among the proteins encoded by the DEGs
   were analyzed using the STRING database^[55]4. Visualization of the PPI
   network was done using Cytoscape software. Genes with the top 10
   highest degrees in the PPI network were viewed as hub genes. Module
   analysis of the PPI network was performed using the Molecular Complex
   Detection (MCODE) tool of Cytoscape software. Functional enrichment
   analysis of the modules was carried out using the DAVID database.
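
   The degree-based selection of hub genes can be sketched as follows.
   This is an illustrative reimplementation rather than the Cytoscape
   workflow used in the study: the edge list and protein names are
   hypothetical, and keeping every node tied at the cutoff degree is an
   assumption (under such a rule, a "top 10" selection can return more
   than 10 genes).

```python
def node_degrees(edges):
    """Count the distinct interaction partners of each node in an undirected edge list."""
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {node: len(partners) for node, partners in neighbours.items()}


def top_degree_nodes(edges, k):
    """Return nodes whose degree is at least the k-th highest degree (ties are kept)."""
    degrees = node_degrees(edges)
    if not degrees or k <= 0:
        return []
    ranked = sorted(degrees.values(), reverse=True)
    cutoff = ranked[min(k, len(ranked)) - 1]
    selected = [node for node, degree in degrees.items() if degree >= cutoff]
    # Order by descending degree, then alphabetically for a stable result.
    return sorted(selected, key=lambda node: (-degrees[node], node))


# Hypothetical interaction pairs, for illustration only.
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
hubs = top_degree_nodes(toy_edges, k=2)
print(hubs)
```

   Here P2, P3, and P4 all share the second-highest degree, so asking for
   the top 2 returns four nodes.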

Survival Analysis

   In the TCGA EC dataset, patients with a survival time of more than 30
   days were used for the survival analysis. The raw counts of the DEGs
   were log2(x+1) transformed, and univariate Cox proportional hazards
   regression analysis was used to identify the potential genes involved
   in overall survival. DEGs with a p-value < 0.05 were subsequently used
   for multivariate Cox proportional hazards regression analysis to
   identify prognostic gene markers. To further evaluate the relative
   contribution of these prognostic gene markers to patient survival
   prediction, these markers were used as the independent variables
   (covariates) to construct the Cox proportional hazards regression
   model. A risk score model was constructed using a linear combination
   of the expression values of these prognostic genes, weighted by the
   regression coefficients (β) from the multivariate Cox proportional
   hazards regression analysis. The formula used is as follows:

   $\text{risk score} = \sum_{i=1}^{n} \beta_i \times \text{Exp}_i$,

   where $\text{Exp}_i$ is the expression of gene $i$ and $\beta_i$ is its
   regression coefficient. Patients were divided into a high-risk group and
   a low-risk group based on the median risk score. The survival analysis
   between the high-risk group and low-risk group was done using SPSS
   20.0. A time-dependent receiver–operating characteristic (ROC) curve
   was constructed using the survivalROC package of R software to analyze
   the predictive accuracy of patient overall survival obtained using the
   risk score model. In addition, comprehensive survival analysis based on
   the risk score model and EC subgroups, including EC grade, EC
   histological type and EC stage, were also performed to evaluate the
   adequacy of the prognostic gene signature for risk stratification and
   prognostic analysis of different EC subgroups.
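
   The risk score calculation and the median split can be sketched as
   follows. This is a minimal illustration, not the authors' code: the
   coefficients are the multivariate values reported in Table 1, whereas
   the raw counts, the patient identifiers, the function names, the use
   of log2(x+1) values as the expression term, and the choice to place
   scores equal to the median in the low-risk group are assumptions.

```python
import math
from statistics import median

# Multivariate Cox regression coefficients as reported in Table 1.
COEFFICIENTS = {
    "PHLDA2": 0.185, "KIAA1644": -0.128, "GGH": 0.222, "ESPL1": 0.396,
    "TRPM4": -0.17, "LMNB1": -0.509, "FAM184A": 0.142,
}


def risk_score(raw_counts, coefficients=COEFFICIENTS):
    """Sum of beta_i * log2(count_i + 1) over the signature genes."""
    return sum(
        beta * math.log2(raw_counts[gene] + 1)
        for gene, beta in coefficients.items()
    )


def split_by_median(scores):
    """Label a patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical raw counts, for illustration only.
zeros = dict.fromkeys(COEFFICIENTS, 0)
patients = {
    "P1": {**zeros, "ESPL1": 7},  # log2(8) = 3 for a gene with a positive coefficient
    "P2": {**zeros, "LMNB1": 7},  # log2(8) = 3 for a gene with a negative coefficient
    "P3": zeros,
}
scores = {pid: risk_score(counts) for pid, counts in patients.items()}
groups = split_by_median(scores)
print(scores, groups)
```

   Genes with positive coefficients raise the score as their expression
   increases, and genes with negative coefficients lower it.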

Statistical Analysis

   The univariate and multivariate Cox proportional hazards regression
   analyses were completed using the survival package of R software and
   SPSS 20.0, respectively. Survival analysis was performed between
   high-risk group and low-risk group using the Kaplan–Meier method in
   SPSS 20.0. An independent two-sample t-test was used to compare the
   expression of hub genes between tumor samples and normal samples, and
   the expression of prognostic genes between tumor samples and normal
   samples or between the high-risk group and low-risk group. A p-value
   of <0.05 was considered to be statistically significant.
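
   For readers unfamiliar with the Kaplan–Meier method mentioned above, a
   minimal product-limit estimator is sketched below. The study itself
   used SPSS 20.0; this pure-Python version, its function name, and the
   follow-up data are assumptions introduced only to show how the
   survival probability is updated at each observed death.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed death.

    times  -- follow-up time of each patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve


# Hypothetical follow-up times (years) and event indicators.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 0, 1])
print(curve)
```

   Censored patients do not lower the curve themselves, but they reduce
   the number at risk for later time points.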

Results

Identification of Differentially Expressed Genes

   According to the screening criteria, a total of 388 DEGs were found
   between EC tissue and normal endometrial tissue in [56]GSE63678, which
   included 239 upregulated and 149 downregulated genes ([57]Supplementary
   Table S1 and [58]Figure 1A). Hierarchical cluster analysis indicated
   that the two groups can be distinguished based on the expression of
   these DEGs ([59]Figure 1B). In addition, 4,410 DEGs were obtained,
   which consisted of 2,215 upregulated genes and 2,195 downregulated
   genes in EC tissue when compared with normal endometrial tissue in the
   TCGA dataset ([60]Supplementary Table S2). Furthermore, 255 common DEGs
   were identified between [61]GSE63678 and the TCGA EC dataset, which
   comprised 168 upregulated genes and 87 downregulated genes
   ([62]Figure 1C and [63]Supplementary Table S3). [64]Figure 1D shows the
   cluster analysis of the 255 common DEGs in the TCGA EC dataset.

FIGURE 1.

   Identification of the DEGs. (A) Volcano plot of [66]GSE63678. Red nodes
   represent DEGs with logFC >1 and p-value of <0.05. Green nodes
   represent DEGs with logFC <–1 and p-value of <0.05. (B) A heat map of
   all DEGs of [67]GSE63678. Each column represents a sample and each row
   represents one gene. The gradual color ranging from green to red
   represents the gene expression changing from downregulation to
   upregulation. (C) Venn diagrams of common DEGs of [68]GSE63678 and TCGA
   endometrial cancer (EC) dataset. 71 and 2,047 represent the upregulated
   DEGs of [69]GSE63678 and TCGA EC dataset, respectively, while 62 and
   2,106 represent the downregulated DEGs of [70]GSE63678 and TCGA EC
   dataset, respectively. 168 represent common upregulated DEGs of
   [71]GSE63678 and TCGA EC dataset, while 87 represent common
   downregulated DEGs of [72]GSE63678 and TCGA EC dataset. (D) A heat map
   of the common DEGs in TCGA EC dataset. DEGs, differentially expressed
   genes; EC, endometrial cancer; TCGA, The Cancer Genome Atlas.

Functional and Pathway Enrichment Analysis of the Common DEGs

   Gene ontology and KEGG enrichment analyses were used to explore the
   biological functions of the DEGs. The upregulated DEGs were mainly
   associated with cell proliferation, apoptotic process, cell adhesion,
   and cell cycle, while the downregulated DEGs were mainly enriched in
   DNA transcription and transcription factors, in addition to cell
   proliferation and apoptosis ([73]Figure 2A and [74]Supplementary Table
   S4). In the pathway enrichment analysis, metabolic pathways, p53
   signaling pathway, and cell cycle were identified for the upregulated
   DEGs, while the downregulated DEGs were associated with pathways such
   as PI3K-Akt signaling pathway, MAPK signaling pathway, and signaling
   pathways regulating pluripotency of stem cells and proteoglycans in
   cancer ([75]Figure 2B and [76]Supplementary Table S5).

FIGURE 2.

   Gene ontology and pathway enrichment analyses of the common DEGs. (A)
   GO enrichment analysis of the common DEGs. The y-axis labels represent
   clustered GO terms. The GeneRatio represents the ratio of the number of
   genes enriched in one GO term to the number of upregulated or
   downregulated DEGs. (B) KEGG enrichment analysis of the common DEGs.
   The y-axis labels represent clustered KEGG pathways. The GeneRatio
   represents the ratio of the number of genes enriched in one KEGG
   pathway to the number of upregulated or downregulated DEGs. GO, gene
   ontology; DEGs, differentially expressed genes; KEGG, Kyoto
   Encyclopedia of Genes and Genomes.

Protein–Protein Interaction (PPI) Network and Modular Analysis

   To reveal the potential relationships among the proteins encoded by
   the DEGs, a PPI network was constructed based on the STRING database.
   The PPI network included 194 proteins encoded by the DEGs (46
   downregulated genes and 148 upregulated genes) and 2,581 edges
   ([78]Figure 3A). In the network, the nodes with the top 10 highest
   degrees were TOP2A, CDK1, CCNB1, CCNB2, AURKA, PCNA, CCNA2, BIRC5,
   NDC80, CDC20, and BUB1BA, which were considered as hub genes. Using
   the MCODE tool of Cytoscape, two modules were identified in the PPI
   network. Module 1 contained 62 nodes and 1,810 edges and module 2
   contained 10 nodes and 33 edges ([79]Figure 3B,C). The expression
   distribution of the 11 hub genes is shown in [80]Figure 4. Notably,
   all 11 hub genes were members of module 1, suggesting that module 1
   plays a crucial role in the PPI network. GO term enrichment
   analysis suggested that module 1 was mainly involved in diverse
   cellular activities such as cell division, cell proliferation,
   apoptotic process, and the cell cycle, while module 2 mainly
   participates in diverse metabolic processes such as gluconeogenesis,
   carbohydrate metabolic process, and extracellular exosomes ([81]Figure
   5A and [82]Supplementary Table S6). In terms of KEGG enrichment
   analysis, module 1 was closely related to cell cycle, immune system,
   p53 signaling pathway and viral carcinogenesis pathways. Module 2
   regulated various metabolic pathways such as carbon metabolism and
   gluconeogenesis ([83]Figure 5B and [84]Supplementary Table S7).

FIGURE 3.

   Protein–protein interaction network of common DEGs and module analysis.
   (A) PPI network of proteins encoded by the DEGs, including 194 nodes
   and 2,581 edges. The yellow circle represents module 2 and the purple
   circle represents module 1. (B) Module 1 consisted of 62 nodes and
   1,810 edges. (C) Module 2 consisted of 10 nodes and 33 edges. Red nodes
   and green nodes represent upregulated and downregulated DEGs,
   respectively. PPI, protein–protein interaction; DEGs, differentially
   expressed genes.

FIGURE 4.

   Expression of eleven hub genes in the PPI network in TCGA endometrial
   cancer dataset between the UCEC group and NE group. The expression
   value was log2(X+1) transformed. An independent two-sample t-test
   was used to calculate the p-value. The white dot in each x-axis
   category represents the median. The dark bar on each x-axis category
   shows the interquartile range. The longer gray bar in each x-axis
   category represents the 95% confidence interval. TCGA, The Cancer
   Genome Atlas; PPI, protein–protein interaction; UCEC, uterine corpus
   endometrial carcinoma; NE, normal endometrium.

FIGURE 5.

   Gene ontology and pathway enrichment analyses of the modules in the PPI
   network. (A) GO enrichment analysis of module 1. The y-axis labels
   represent clustered GO terms. The GeneRatio represents the ratio of the
   number of genes enriched in one GO term to the number of genes in
   module 1. (B) KEGG pathway enrichment analysis of module 2. The y-axis
   labels represent clustered KEGG pathways. The GeneRatio represents the
   ratio of the number of genes enriched in one KEGG pathway to the number
   of genes in module 2. GO, gene ontology; KEGG, Kyoto Encyclopedia of
   Genes and Genomes; PPI, protein–protein interaction.

Survival Analysis

   A univariate Cox regression analysis found that 117 DEGs were
   associated with patient overall survival (p < 0.05). Multivariate Cox
   proportional hazards regression identified seven DEGs as a
   prognostic signature for overall survival (p < 0.05). These included
   PHLDA2, KIAA1644, GGH, ESPL1, TRPM4, LMNB1, and FAM184A. Among these
   genes, PHLDA2, GGH, ESPL1, and FAM184A, with a hazard ratio of >1, were
   regarded as risky prognostic genes, while KIAA1644, LMNB1, and TRPM4,
   with a hazard ratio of <1, were considered as protective prognostic genes
   ([88]Table 1). According to the risk score model, 276 patients were
   assigned to the high-risk group and the remaining 275 patients were
   assigned to the low-risk group. [89]Figure 6A–[90]C presents the risk
   score state of the TCGA EC dataset. Survival analysis showed that the
   low-risk group had a better overall survival than the high-risk group
   (p < 0.05) ([91]Figure 6D). The overall survival at 1, 3, and 5 years
   for the low-risk group was 99.6% (95% CI: 1–0.99), 95.6% (95% CI:
   0.97–0.90), and 94.2% (95% CI: 0.95–0.86), respectively. In comparison,
   overall survival at 1, 3, and 5 years for the high-risk group was 92.4%
   (95% CI: 0.95–0.89), 78.3% (95% CI: 0.78–0.65), and 75.4% (95% CI:
   0.71–0.55), respectively. A time-dependent ROC analysis based on the
   risk score model showed good performance in survival prediction and the
   area under the ROC curve was 0.797, 0.734, 0.729, and 0.647 for 1, 3,
   5, and 10 years, respectively ([92]Figure 6E). Joint effects analysis
   of the seven-gene signature with EC grade, histological type, and
   stage also showed a high predictive value for EC patient overall survival (p <
   0.001) ([93]Figure 7 and [94]Table 2). The overall survival at 1, 3,
   and 5 years for different EC subgroups based on the seven-gene
   signature risk stratification model also showed good predictive value
   ([95]Table 3). The expression value of the seven genes in EC tissue and
   normal endometrial tissue is shown in [96]Figure 8A, while the
   expression distribution of these genes in the low-risk and high-risk
   groups is shown in [97]Figure 8B.

Table 1.

   Prognostic value of the seven genes in endometrial cancer patients of
   the TCGA cohort.
   Gene symbol  Univariate analysis                Multivariate analysis
                HR (95% CI)              p-value   HR (95% CI)           p-value  Coefficient
   PHLDA2       1.164 (0.036–0.268)      0.010     1.203 (1.049–1.378)   0.008    0.185
   KIAA1644     0.869 (-0.241– -0.039)   0.006     0.88 (0.788–0.982)    0.022    -0.128
   GGH          1.310 (0.138–0.402)      5.99E-05  1.249 (1.038–1.502)   0.018    0.222
   ESPL1        1.331 (0.145–0.427)      7.17E-05  1.486 (1.149–1.922)   0.003    0.396
   TRPM4        0.857 (-0.292– -0.017)   0.027     0.844 (0.718–0.992)   0.04     -0.17
   LMNB1        1.188 (0.020–0.325)      0.027     0.601 (0.439–0.822)   0.001    -0.509
   FAM184A      1.111 (0.009–0.201)      0.032     1.153 (1.035–1.285)   0.01     0.142

   TCGA, The Cancer Genome Atlas; CI, confidence interval.

FIGURE 6.

   Prognostic analysis based on the seven-gene risk score model on TCGA
   endometrial cancer dataset. (A) Patient risk score distribution based
   on the risk score model. (B) Patient survival status distribution of
   the low-risk group and the high-risk group. (C) Heat map of the seven
   genes that were used to construct the risk score model of the low- and
   high-risk groups. (D) Survival curves for the low- and high-risk
   groups. (E) ROC analysis predicted overall survival using the risk
   score. TCGA, The Cancer Genome Atlas; ROC, receiver operating
   characteristic curve.

FIGURE 7.

   Joint effects analysis of OS stratified by risk score and EC clinical
   parameters. Joint effects analysis was stratified using the risk score
   and the following clinical parameters: tumor grade (A), histologic type
   (B), tumor stage (C). OS, overall survival; EC, endometrial cancer.

Table 2.

   Joint effects survival analysis of clinical factors and the DEG
   signature risk score with OS in EC patients.
   Group  Risk score  Variables       Events/total (521)  MST (years)  HR (95% CI)            P-value
   Histological grade
   A      Low risk    G1 + G2         8/164               NA           1
   B      Low risk    G3              11/97               NA           2.198 (0.884-5.467)    0.09
   C      High risk   G1 + G2         8/47                NA           4.149 (1.553-11.08)    0.005
   D      High risk   G3              60/213              8.526        7.115 (3.397-14.901)   <0.001
   Histological type^a
   a      Low risk    EEA             14/244              NA           1
   b      Low risk    SEA             4/14                9.175        4.577 (1.495-14.01)    0.008
   c      High risk   EEA             32/147              NA           4.645 (2.476-8.714)    <0.001
   d      High risk   SEA             31/95               5.326        6.902 (3.662-13.009)   <0.001
   Tumor stage
   1      Low risk    Stage I + II    12/212              NA           1
   2      Low risk    Stage III + IV  7/49                NA           2.904 (1.138-7.41)     0.026
   3      High risk   Stage I + II    26/164              8.907        3.384 (1.702-6.725)    0.001
   4      High risk   Stage III + IV  42/96               3.011        11.239 (5.896-21.42)   <0.001

   ^aInformation of histological type is mixed EEA and SEA for 21
   patients. DEGs, differentially expressed genes; OS, overall survival;
   EC, endometrial cancer; MST, median survival time; HR, hazard ratio;
   EEA, endometrioid endometrial adenocarcinoma; SEA, serous endometrial
   adenocarcinoma; CI, confidence interval.

Table 3.

   1, 3, and 5-year OS analysis of EC patients based on clinical factors
   and the DEG signature risk score.
   Variables       Risk score  1 year OS (95% CI)  p-value  3 year OS (95% CI)  p-value  5 year OS (95% CI)  p-value
   Histological grade
   G1 + G2         Low risk    100%                         97% (0.96-0.91)              96.3% (0.986-0.89)
   G1 + G2         High risk   93.6% (1.01-0.86)   0.001    85.1% (0.94-0.66)   0.002    85.1% (0.94-0.66)   0.004
   G3              Low risk    99% (1.01-0.97)              93.8% (0.98-0.86)            90.7% (0.95-0.76)
   G3              High risk   92% (0.95-0.88)     0.016    77% (0.77-0.62)     <0.001   73.2% (0.68-0.49)   <0.001
   Histological type
   EEA             Low risk    99.6% (1-0.99)               95.9% (0.98-0.90)            95.1% (0.97-0.88)
   EEA             High risk   91.8% (0.96-0.87)   <0.001   81% (0.83-0.66)     <0.001   78.9% (0.79-0.60)   <0.001
   SEA             Low risk    100%                         92.9% (1.01-0.74)            78.6% (0.97-0.25)
   SEA             High risk   93.7% (0.99-0.88)   0.341    78.7% (0.78-0.55)   0.155    70.5% (0.70-0.39)   0.43
   Tumor stage
   Stage I + II    Low risk    100%                         97.2% (0.99-0.92)            96.2% (0.98-0.89)
   Stage I + II    High risk   95.7% (0.99-0.92)   0.003    89% (0.91-0.78)     0.001    86.6% (0.87-0.67)   <0.001
   Stage III + IV  Low risk    98% (1.02-0.94)              89.8% (0.98-0.78)            85.7% (0.94-0.62)
   Stage III + IV  High risk   86.5% (0.93-0.79)   0.027    60.4% (0.62-0.39)   <0.001   56.2% (0.53-0.27)   <0.001

   ^aInformation of histological type is mixed EEA and SEA for 21
   patients. DEGs, differentially expressed genes; OS, overall survival;
   EC, endometrial cancer; MST, median survival time; HR, hazard ratio;
   EEA, endometrioid endometrial adenocarcinoma; SEA, serous endometrial
   adenocarcinoma; CI, confidence interval.

FIGURE 8.

   Expression of the seven genes in TCGA endometrial cancer dataset. The
   expression value was log2(X+1) transformed. (A) Expression of the seven
   genes between the UCEC group and NE group in TCGA endometrial cancer
   dataset. (B) Expression of the seven genes of the low- and high-risk
   groups in TCGA endometrial cancer dataset. The white dot on each x-axis
   category represents the median. The dark bar in each x-axis category
   shows the interquartile range. The longer gray bar in each x-axis
   category represents the 95% confidence interval. TCGA, The Cancer Genome
   Atlas; UCEC, uterine corpus endometrial carcinoma; NE, normal
   endometrium.

Discussion

   In this study, we identified DEGs between EC tissue and normal
   endometrium based on a GEO expression profile and TCGA high-throughput
   sequencing, and revealed the hub genes found among the protein-encoding
   DEGs. We looked for potential biomarkers related to EC prognosis from
   among the DEGs using univariate and multivariate Cox regression
   analyses and constructed a prognostic signature based on DEG
   expression. We found 255 common DEGs and 11 hub genes including TOP2A,
   CDK1, CCNB1, CCNB2, AURKA, PCNA, CCNA2, BIRC5, NDC80, CDC20, and
   BUB1BA. We developed a seven-gene signature for prognosis prediction of
   EC patients, which included the genes PHLDA2, KIAA1644, GGH, ESPL1,
   TRPM4, LMNB1, and FAM184A. The seven-gene signature displayed good
   predictive value for OS of EC patients and EC subgroups. In summary,
   these results provide clues for further exploring the pathogenesis of
   EC and for establishing a new risk classification and prognosis
   assessment model.

   Similar to our research, [104]Wu et al. (2019) reported a four-miRNA
   signature that can divide EC patients into a high-risk and a low-risk
   group with significantly different overall survival according to the
   TCGA EC dataset. A nine-lncRNA signature was also established, which
   performed well in overall survival prediction of endometrioid EC
   patients based on the TCGA dataset
   ([105]Xu et al., 2018). RNA sequencing analysis revealed the
   coexistence of mutations in a three-gene signature that can be viewed
   as a biomarker for diagnosis of endometrioid EC, while the absence of
   three-gene signature mutations when TP53 was mutated was found to be
   diagnostic of serous carcinomas ([106]Cuevas et al., 2019). In our
   study, a seven-gene signature was developed based on GEO and TCGA EC
   datasets, which can distinguish between high-risk and low-risk
   patients and performs well in predicting the overall survival of EC
   patients and EC subgroups. In addition, our study showed that
   metabolic pathways, the p53 signaling pathway, and the cell cycle were
   the pathways mainly enriched for the upregulated DEGs, while the
   downregulated DEGs were associated with pathways such as the PI3K-Akt
   signaling pathway and the MAPK signaling pathway. These results are
   consistent with those obtained by [107]Zhang et al. (2016) and
   [108]Liu et al. (2019).

   In the current study, we also found eleven hub genes in the PPI
   network, indicating that they possibly play an important role in the
   pathogenesis of EC. Similar to our findings, TOP2A positive EC patients
   have been found to have shorter overall survival and disease-free
   survival compared to TOP2A negative EC patients ([109]Lapinska-Szumczyk
   et al., 2014; [110]Ito et al., 2016). TOP2A heterogeneity was also
   related to EC stage and metastases. Stage III and IV EC patients and EC
   patients with EC metastases showed higher TOP2A heterogeneity
   ([111]Supernat et al., 2014). These results suggest that higher TOP2A
   levels are associated with EC progression and a higher degree of
   malignancy in EC. In other studies, TOP2A was upregulated in cancer
   tissues compared with adjacent non-cancerous tissues in
   breast cancer ([112]Wang et al., 2012), renal cell carcinoma ([113]Ye
   et al., 2018), ovarian cancer ([114]Erriquez et al., 2015), prostate
   cancer ([115]De Resende et al., 2013), nasopharyngeal carcinoma
   ([116]Lan et al., 2014), and colon cancer ([117]Zhang et al., 2018).
   Furthermore, TOP2A overexpression is a positive marker of tumor
   metastasis and a biomarker of poor prognosis. In addition, TOP2A
   downregulation
   was found to inhibit the proliferation and migration or invasion of
   pancreatic and colon cancer cell lines and involved the β-catenin
   signaling pathway in pancreatic cancer ([118]Pei et al., 2018;
   [119]Zhang et al., 2018). CCNB1, CCNB2, and CCNA2 are three members of
   the cyclin family, and CDK1, a serine/threonine kinase, is a
   master regulator of cell cycle progression. Furthermore, the cell
   cycle was also significantly enriched in our study for both biological
   processes and pathways, which indicates that the cell cycle is
   significantly altered in EC.
   Consistent with our research, CDK1 and CCNA2 were also found to be
   overexpressed in EC tissues and cells and were also identified as hub
   genes in the PPI network ([120]Zhang et al., 2016; [121]Li et al.,
   2017). At present, there are few studies regarding the role of CCNB1,
   CCNB2, and CCNA2 in EC. [122]Wang J. et al. (2018) found that CCNA2
   expression was high and was positively correlated with histological
   grade, with higher CCNA2 expression associated with poorer
   differentiation in endometrial adenocarcinoma. CDK1 is a target gene of
   miR-1271, human paired box 2, and lncRNA ABHD11-AS1, and regulates the
   proliferation, invasion, migration, apoptosis, and other mobility
   factors of endometrial carcinoma cell lines ([123]Li et al., 2017; [124]Liu
   Y. et al., 2018; [125]Wang J. et al., 2018). In vulvar squamous cell
   carcinoma, elevated levels of CDK1 were found in patients with advanced
   tumor behaviors and aggressive features ([126]Wang Z. et al., 2015). In
   addition, high expression of CDK1 in lung adenocarcinoma patients,
   epithelial ovarian cancer patients, and colorectal cancer patients was
   identified as a biomarker of poor survival ([127]Sung et
   al., 2014; [128]Xi et al., 2015; [129]Liu W.T. et al., 2018). AURKA is
   a human Aurora kinase and is reported to be involved in cell cycle
   regulation. In a study, AURKA was upregulated in higher tumor grades
   and was found to be associated with poor histological differentiation
   in EC ([130]Glover et al., 1995). Furthermore, knockout of AURKA
   inhibited EC cell line invasion and migration, and improved
   chemosensitivity to paclitaxel, suggesting that it is a potential
   therapeutic target in EC ([131]Umene et al., 2015). PCNA is a co-factor
   of DNA polymerase and is essential for DNA replication. It is also
   considered to play an important role in the transition from the G1
   phase to the S phase of the cell cycle ([132]Bolton et al., 1992). PCNA
   expression was reported
   to be higher in postmenopausal endometrial carcinoma in comparison to
   normal postmenopausal endometrium tissue. Furthermore, the expression
   level of PCNA was found to be related to clinicopathological features
   and prognosis of EC patients ([133]Hareyama, 1994). Additionally, many
   studies have demonstrated that it is a biomarker of poor survival in
   osteosarcoma, gastric cancer, and colorectal cancer ([134]Wang et al.,
   2017; [135]Yin et al., 2017; [136]Zhou et al., 2018). BIRC5, which
   encodes the survivin protein, is a member of the inhibitor of apoptosis
   gene family and regulates apoptosis and the cell cycle. Studies suggest
   that BIRC5 is overexpressed both in EC and in EC cell lines
   ([137]Pallares et al., 2005; [138]Nabilsi et al., 2009). Furthermore,
   BIRC5 expression was
   found to gradually increase from the proliferative endometrium to
   endometrial hyperplasia to endometrioid adenocarcinoma, indicating that
   it contributes to EC development ([139]Erkanli et al., 2006). More
   recently, high expression of BIRC5 was also reported to be a biomarker
   of poor progression-free survival ([140]Chuwa et al., 2016). NDC80 is
   a subunit of the Ndc80 complex and plays an important role in mitotic
   progression, suggesting that NDC80 may be associated with EC through
   regulation of the cell cycle ([141]Amin et al., 2018). [142]Chen et al.
   (2011) found that NDC80 was more highly expressed in serous
   adenocarcinomas than in endometrioid adenocarcinomas. In addition,
   NDC80 expression is increased in many cancers, such as colon, gastric,
   and pancreatic cancer and osteosarcoma, and is associated with poor
   prognosis. Furthermore, knockdown of NDC80 was found to inhibit cancer
   cell proliferation and induce apoptosis ([143]Qu et al., 2014;
   [144]Meng et al., 2015; [145]Xing et al., 2016; [146]Xu et al., 2017).
   CDC20 is a cell cycle regulating protein. A
   large number of studies have confirmed that CDC20 is upregulated in
   solid tumors and promotes cell growth and invasion leading to poor
   prognosis ([147]Ding et al., 2017; [148]Wang S. et al., 2018).
   Meanwhile, higher expression of CDC20 was found to be related to a high
   tumor grade and stage in common malignant tumors including EC
   ([149]Gayyed et al., 2016). BUB1BA has not been reported in previous
   studies, and its function remains to be elucidated. Taken together,
   this evidence suggests that almost all of the hub genes identified in
   this study are closely related to tumor development and progression,
   mainly through cell cycle regulation. The specific mechanisms by which
   they regulate EC need to be further investigated.

   In addition, we identified seven pivotal genes involved in EC prognosis
   and constructed a prognostic gene signature comprising these genes.
   Among these, PHLDA2, GGH, ESPL1, and FAM184A are viewed as risky
   prognostic genes. PHLDA2 is an imprinted gene located on human
   chromosome 11p15.5. Previous studies have suggested that it is a growth
   suppressor gene and that overexpression of this gene in the placenta
   leads to growth-restricted pregnancies both in humans and in animal
   models ([150]Jensen et al., 2014). Furthermore, ectopic expression of
   PHLDA2 results in pregnancy complications possibly by promoting
   apoptosis and suppressing trophoblast growth ([151]Jin et al., 2016).
   In cancer, the role of PHLDA2 is controversial. Many studies have shown
   that PHLDA2 expression is decreased in osteosarcoma tissue and cell
   lines when compared with controls and that a high level of PHLDA2 is a
   predictor of good prognosis ([152]Dai et al., 2012; [153]Wang et al.,
   2016). Additionally, upregulation of PHLDA2 induces osteosarcoma cell
   apoptosis and inhibits cell growth and tumorigenesis in vitro and in
   vivo ([154]Huang et al., 2012; [155]Li et al., 2014). However, PHLDA2
   was also found to play oncogenic roles in lung adenocarcinoma
   ([156]Hsu et al., 2017). In addition, high expression of PHLDA2 has
   been observed in triple-negative breast cancer cell lines and
   pancreatic ductal adenocarcinoma, where it indicates poor prognosis
   ([157]Moon et al., 2015). Silencing PHLDA2 reduces cancer cell
   aggressiveness and proliferation ([158]Idichi et al., 2018). In our
   study, PHLDA2
   expression was upregulated and associated with poor prognosis. These
   results suggest that the role of PHLDA2 in cancer is complex and
   further studies are needed to dissect the mechanism of PHLDA2 in EC.
   GGH is an enzyme involved in folate metabolism. Previous studies have
   confirmed that GGH is highly expressed in invasive breast cancer and
   ERG-negative prostate cancer in comparison with adjacent non-cancerous
   tissues and that high GGH levels are related to poor prognosis and
   unfavorable clinical outcomes ([159]Shubbar et al., 2013; [160]Melling
   et al., 2017). In oral squamous cell carcinoma, GGH is a member of an
   11-gene molecular signature that marks worse overall survival in
   patients without nodal metastases ([161]Wang W. et al., 2015).
   Additionally, it has been identified as a therapeutic target of
   chemotherapy in multiple cancer types. Lower expression of GGH enhances
   sensitivity of cancer cells to pemetrexed, 5-fluorouracil,
   methotrexate, and gemcitabine in colon cancer, advanced pancreatic
   cancer, and non-small cell lung cancer ([162]Iacopetta et al., 2008;
   [163]Nakamura et al., 2011; [164]Yoshida et al., 2016). Our results
   imply that GGH is highly expressed in EC and is a marker of poor
   prognosis. However, the underlying molecular mechanisms of GGH in EC
   remain unclear. The protein encoded by ESPL1 is a protease that cleaves
   chromosomal cohesin during mitosis. ESPL1 expression has been found to
   be upregulated in a wide range of cancers ([165]Finetti et al., 2014;
   [166]Wen et al., 2018), and high expression of ESPL1 is associated with
   loss of the key tumor suppressor gene P53, which further contributes to
   the progression of mammary adenocarcinomas ([167]Mukherjee et al.,
   2014). Nevertheless, it has also been reported that ESPL1 plays an
   opposite role in gastric adenocarcinoma. [168]Wang D. et al. (2018)
   showed that ESPL1 levels were lower in gastric adenocarcinoma tissue
   than in adjacent non-cancerous tissue and were associated with longer
   overall survival and a low tumor stage, suggesting a dual role of this
   gene in cancer. ESPL1 expression was found to be increased in our
   study; however, the clinical significance and functional mechanism of
   ESPL1 in EC remain to be verified. FAM184A was also found
   to be increased in the current study and was classified as a risky
   prognostic gene, but its role in EC has not been reported in previous
   studies.

   Additionally, the three protective prognostic genes identified in this
   study were TRPM4, LMNB1, and KIAA1644. TRPM4 is a Ca^2+-activated
   non-selective cation channel that influences calcium homeostasis.
   However, it is highly expressed in some cancers and is considered a
   risk factor as well as a predictor of poor survival in prostate cancer and
   diffuse large B-cell lymphoma ([169]Schinke et al., 2014; [170]Loo et
   al., 2017). Meanwhile, overexpression of TRPM4 promotes cell
   proliferation by enhancing the β-catenin signaling pathway and
   epithelial to mesenchymal transition, migration, and invasion in
   prostate cancer cell lines ([171]Armisen et al., 2011; [172]Sagredo et
   al., 2019). In contrast, low expression of TRPM4 was found in
   colorectal cancer indicating that it may also serve as a protective
   factor ([173]Sozucan et al., 2015). LMNB1 is an important member of the
   lamin protein family but its role in cancer is controversial. Its
   expression is decreased in colon cancer and gastric cancer ([174]Moss
   et al., 1999), but is increased in prostate cancer, hepatocellular
   carcinoma, and pancreatic cancer ([175]Sun et al., 2010; [176]Li et
   al., 2013). Overexpression of LMNB1 indicates lower survival rates both
   in pancreatic cancer and colon cancer ([177]Li et al., 2013;
   [178]Izdebska et al., 2018), while upregulation of LMNB1 represents
   good clinical outcome in breast cancer ([179]Wazir et al., 2013).
   Furthermore, [180]Fridley et al. (2014) reported that silencing of
   LMNB1 in cancer cells increases their resistance to cisplatin,
   suggesting that LMNB1 is beneficial for cancer treatment. Given the
   complex role of LMNB1, additional studies are needed to confirm its
   role in EC. As for KIAA1644, little is known about its role and
   prognostic value in cancer.
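
   The way risky and protective genes combine in a Cox-derived signature
   can be illustrated with a minimal sketch. It assumes the usual linear
   predictor of a multivariable Cox model (the sum of each gene's
   coefficient multiplied by its expression value) and a median cutoff for
   splitting patients into high- and low-risk groups; the cutoff rule is an
   assumption of this sketch. The coefficients and expression values below
   are hypothetical and chosen only for illustration; they are not the
   fitted values from this study.

```python
from statistics import median

# Hypothetical Cox coefficients, for illustration only: positive for the four
# risky genes, negative for the three protective genes. These are NOT the
# fitted values from this study.
COEFFICIENTS = {
    "PHLDA2": 0.20,
    "GGH": 0.15,
    "ESPL1": 0.25,
    "FAM184A": 0.10,
    "TRPM4": -0.20,
    "LMNB1": -0.30,
    "KIAA1644": -0.15,
}


def risk_score(expression):
    """Return the linear predictor: sum of coefficient * expression per gene."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def stratify(patients):
    """Label each patient 'high' or 'low' risk using the median score as cutoff."""
    scores = {pid: risk_score(expr) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low")
            for pid, score in scores.items()}


def uniform_profile(value):
    """Build a toy expression profile with the same value for every gene."""
    return {gene: value for gene in COEFFICIENTS}


# Toy expression values (arbitrary units), invented for this example.
patients = {
    # High risky-gene expression, low protective-gene expression.
    "P1": {"PHLDA2": 5.0, "GGH": 4.0, "ESPL1": 6.0, "FAM184A": 3.0,
           "TRPM4": 1.0, "LMNB1": 1.0, "KIAA1644": 1.0},
    # Low risky-gene expression, high protective-gene expression.
    "P2": {"PHLDA2": 1.0, "GGH": 1.0, "ESPL1": 1.0, "FAM184A": 1.0,
           "TRPM4": 5.0, "LMNB1": 5.0, "KIAA1644": 5.0},
    "P3": uniform_profile(3.0),
    "P4": uniform_profile(2.0),
}

groups = stratify(patients)
for pid in sorted(patients):
    print(pid, round(risk_score(patients[pid]), 3), groups[pid])
```

   In this sketch, a patient with high expression of the risky genes and
   low expression of the protective genes receives a higher score, and the
   resulting groups would then be compared by survival analysis.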

   Our study has several limitations. Firstly, our findings are based
   entirely on public databases using bioinformatics analysis and
   therefore functional experiments are needed to verify these results.
   Secondly, the prognostic predictive value of the seven-gene signature
   is based on only a single cohort with a relatively small sample size,
   and future studies involving larger independent cohorts should be
   conducted to validate our findings. Thirdly, we did not consider
   common clinical parameters because we focused only on the common DEGs,
   which may have resulted in vital information being ignored.

Conclusion

   In summary, our study identified 255 common DEGs between EC and normal
   endometrium, identified 11 hub genes, and constructed a seven-gene
   signature that can be used for risk stratification and for predicting
   survival at 1, 3, 5, and 10 years in EC patients. Our results therefore
   suggest novel potential molecular therapeutic targets and a new method
   for risk stratification and prognostic prediction in EC patients.
   Further experimental studies and
   independent cohort studies are needed to validate these findings.

Ethics Statement

   The data in this study were obtained from the GEO and TCGA public
   databases, and their acquisition and use complied with the
   corresponding database guidelines and policies.

Author Contributions

   HH and LL conceived and instructed the work. LL and JL checked the
   associated database and analyzed the raw data. LL wrote and revised the
   manuscript. All authors read and approved the final manuscript.

Conflict of Interest Statement

   The authors declare that the research was conducted in the absence of
   any commercial or financial relationships that could be construed as a
   potential conflict of interest.

   Funding. This work was supported in part by the project "Clinical study
   on early diagnosis and treatment of cervical precancerous lesions and
   cervical cancer" of the Liuzhou Science and Technology Bureau (No.
   2015J030508).

   ^1 [181]https://www.ncbi.nlm.nih.gov/geo/
   ^2 [182]https://portal.gdc.cancer.gov/
   ^3 [183]https://david.ncifcrf.gov
   ^4 [184]https://string-db.org

Supplementary Material

   The Supplementary Material for this article can be found online at:
   [185]https://www.frontiersin.org/articles/10.3389/fgene.2019.00373/full#supplementary-material

   TABLE S1

   List of differentially expressed genes in [186]GSE63678.
   [187]Click here for additional data file. (24.1KB, XLSX)

   TABLE S2

   List of differentially expressed genes between EC tissue and normal
   endometrial tissue in the TCGA dataset.
   [188]Click here for additional data file. (266.4KB, XLSX)

   TABLE S3

   List of common differentially expressed genes both in [189]GSE63678 and
   the TCGA EC dataset.
   [190]Click here for additional data file. (19.1KB, XLSX)

   TABLE S4

   Gene ontology enrichment analysis of the common DEGs both in
   [191]GSE63678 and the TCGA EC dataset.
   [192]Click here for additional data file. (45.2KB, XLSX)

   TABLE S5

   Pathway enrichment analysis of the common DEGs both in [193]GSE63678
   and the TCGA EC dataset.
   [194]Click here for additional data file. (18.2KB, XLSX)

   TABLE S6

   Gene ontology enrichment analysis of module1 and module2 in PPI
   network.
   [195]Click here for additional data file. (32.8KB, XLSX)

   TABLE S7

   Pathway enrichment analysis of module1 and module2 in PPI network.
   [196]Click here for additional data file. (17.1KB, XLSX)

References