Abstract

   Ovarian cancer (OC) is the most malignant tumor in the female
   reproductive tract. Although abundant molecular biomarkers have been
   identified, a robust and accurate gene expression signature is still
   essential to assist oncologists in evaluating the prognosis of OC
   patients. In this study, samples from 367 patients in The Cancer Genome
   Atlas (TCGA) database were subjected to mRNA expression profiling.
   Then, we used a gene set enrichment analysis (GSEA) to screen genes
   correlated with epithelial–mesenchymal transition (EMT) and assess
   their prognostic power with a Cox proportional regression model. Six
   genes (TGFBI, SFRP1, COL16A1, THY1, PPIB, BGN) associated with overall
   survival (OS) were used to construct a risk assessment model, after
   which the patients were divided into high-risk and low-risk groups. The
   six-gene signature was an independent prognostic biomarker of OS for OC
   patients based on the multivariate Cox regression analysis. In
   addition, the six-gene model was validated with samples from the Gene
   Expression Omnibus (GEO) database. In summary, we established a
   six-gene signature relevant to the prognosis of OC, which might become
   a therapeutic tool with clinical applications in the future.

   Keywords: ovarian cancer, EMT, prognostic, mRNAs, survival

Introduction

   Ovarian cancer is currently the fifth leading cause of death and has
   become a major threat to the reproductive health of women. According to
   previous reports, 80% of OC patients are initially diagnosed with
   advanced OC ([25]Baldwin et al., 2015; [26]Ferlay et al., 2015), and
   28% of these patients show distant metastasis ([27]Torre et al., 2018).
   Although standard therapies aimed at complete resection combined with
   chemotherapy and neoadjuvant therapies, such as carboplatin and
   paclitaxel ([28]Elies et al., 2018) have been widely used in clinical
   practice, the prognosis of OC is poor, and the 5-year survival rate is
   only 30–50% ([29]Tarver and Cancer Facts & Figures, 2012). The
   traditional FIGO (2014) staging system is the primary assessment
   standard for estimating the treatment and outcome of OC patients.
   However, the heterogeneity of OC is constantly changing. Clonal
   expansions in histologically normal tissues occurs not only in adults,
   but in childhood. The extent of clone expansion may inform the
   malignant potential and recurrence risk, which is a new insight in
   tumor therapy. For heterogeneous tumor samples at different sites or
   different times, the sequencing results of the same patient are
   theoretically inconsistent. PyClone can be used to analyze which
   subclones are present in the different sequencing data^[30]1. Each
   ovarian tumor can be regarded as a collection of many cells with
   different genetic and epigenetic traits. It is obviously impossible to
   completely cure OC by treating with a rigid treatment model, as
   individuals with the same stage and who receive the same treatment
   experience different outcomes ([31]Minlikeeva et al., 2017; [32]Testa
   et al., 2018). Thus, the current staging system is inadequate, and it
   is urgent to seek more accurate indicators to identify the high-risk
   population.

   It is time consuming and laborious to experimentally identify key
   genomic mutations (structural variation or number variation). However,
   with the rapid development of high-throughput gene microarrays and
   whole-genome sequencing, numerous molecular landscapes and gene
   variants have been discovered in many tumors ([33]Nakagawa and Fujita,
   2018; [34]Schuh et al., 2018). Developing a convenient and effective
   programmatic screening method for high-risk populations is of vital
   importance ([35]Song et al., 2019). Currently, a large number of
   databases have been built by researchers to provide access for the
   exploration of genomic alterations. Increasing evidence has
   demonstrated that molecular biomarkers contribute to the prognosis
   evaluation of tumors, and searches can be conducted for cancer-related
   somatic mutation sites from databases such as COSMIC ([36]Bamford et
   al., 2004). For example, overexpression of upregulated DLL3 is an
   independent prognostic predictor for endometrial cancer and could be a
   potential and novel tumor marker for early-stage endometrial cancer
   ([37]Wang et al., 2018). The angiopoietin receptor, plasma TIE2, is a
   tumor vascular response biomarker for VEGF that inhibitors in
   metastatic colorectal cancer ([38]Jayson et al., 2018). Nevertheless,
   researchers found that rather than traditional single-gene biomarkers,
   gene signatures comprising several genes can provide strong evidence
   for the prognosis and survival of patients with tumors. For example, a
   five-gene signature (ADM, ASPM, DCBLD2, E2F7, and KRT6A) demonstrates
   significant power to predict patient survival in two distinct patient
   cohorts with pancreatic ductal adenocarcinoma and is independent of the
   AJCC TNM staging of this disease ([39]Raman et al., 2018).
   Cross-validation of this gene signature reported a better AUC of the
   ROC (≥0.8) than existing pancreatic ductal adenocarcinoma (PDAC)
   survival signatures. Gene signatures may provide insights into the
   mechanism of carcinogenesis to facilitate a preferable and ideal
   treatment schedule. A modified PageRank algorithm was used to establish
   a pipeline that can help identify driver mutations among the many
   differentially expressed genes to produce prognostic genes. It is also
   helpful in the selection of key genes in other biological processes
   ([40]Wang et al., 2017). [41]Murakami et al. (2016) proposed a scoring
   system through TCGA and the GEO databases that could predict the tumor
   response to platinum-based therapies or taxane, which could be useful
   to develop individualized treatments for OC. A prognostic model of
   high-grade serous ovarian carcinoma classification named CLOVAR was
   developed by the TCGA specifically for OC ([42]Verhaak et al., 2013;
   [43]Boris et al., 2016). In fact, many studies about creating
   prognostic models via TCGA databases, gene set enrichment analysis
   (GSEA) and COX analysis have emerged ([44]Huiran et al., 2019; [45]Wang
   et al., 2019), and the gene signatures concluded from other studies
   were different than what we identify and present below.

   In this study, we used GSEA, a technique superior to traditional
   analysis, to screen target genes because GSEA is not limited to
   focusing on gene expression differences in disease biomarkers and
   analyzing survival changes; rather, this technique only focuses on
   genes with statistically significant aberrant expression ([46]Mathur et
   al., 2018). For GO) and KEGG pathway enrichment analysis, researchers
   need to provide a clear threshold for differentially expressed genes.
   However, the actual changes in RNA expression observed by ChIP are
   usually the result of multiple negative feedback loops, and the
   sensitivity of expression discrepancies is different among various
   tissues. GSEA can avoid ignoring those genes that have no significant
   expression difference but contribute to biological behavior, gene
   function, and are even related gene regulation networks. In addition,
   GSEA allows for the monitoring of the expression changes in gene sets
   rather than in individual genes, so subtle expression changes can be
   included to obtain more accurate results ([47]Dean et al., 2017;
   [48]Zhang et al., 2017).

   In our study, we attempted to produce enriched gene sets based on
   molecular signatures with the prognosis of OC in different stages. To
   this end, we profiled specific gene sets in 367 OC patients with
   integrated mRNA expression datasets from the TCGA database. A total of
   197 mRNAs were related to epithelial–mesenchymal transition (EMT), and
   a six-gene risk signature that could predict patient outcomes and
   identify high-risk OC populations that indicate poor prognosis was
   established.

Materials and Methods

Patient Clinical Data Source and mRNA Expression Dataset

   The mRNA expression profiles and corresponding clinical information
   from OC patients were extracted from the TCGA^[49]2 data portal^[50]3
   ([51]Tomczak et al., 2015). These data were imputed on an Illumina
   HiSeq RNA-Seq platform and comprised of one OC patient in stage I, 22
   in stage II, 289 in stage III, and 55 in stage IV, with matching
   relative to stage, age, cancer status, grade, venous invasion, and
   lymphatic invasion. The above information is based on high-throughput
   whole-genome sequencing of each tumor sample. The abovementioned data
   are displayed in [52]Table 1. Both the expression profiles and clinical
   characteristics can be obtained publicly, so there was no need to
   obtain ethics committee approval.

TABLE 1.

   Clinical pathological parameters of patients with ovary cancer in this
   study.
   Clinical pathological parameters  N    %   Confirmed deaths
   Stage
   Stage I                           1  0.27         1
   Stage II                         22  5.99         5
   Stage III                        289 78.74       159
   Stage IV                         55  14.98        34
   Cancer status
   Tumor free                       85  23.22        3
   With tumor                       281 76.78       196
   Grade
   G1                                1  0.27         0
   G2                               42  11.48        25
   G3                               314 85.79       167
   G4                                1  0.27         1
   GX                                8  2.186        5
   Age
   <58                              183  50          92
   >58                              183  50         107
   Venous invasion
   No                               40  38.83        17
   Yes                              63  61.17        23
   Lymphatic invasion
   No                               47  41.23        19
   Yes                              97  58.77        45
   [53]Open in a new tab

Gene Set Enrichment Analysis

   Single-gene analysis yields little similarity between two independent
   fields of patient survival in cancer. GSEA is an approach that focuses
   on many biological functions, chromosomal locations, or regulatory
   activities to interpret genome-wide expression profiles ([54]Fevzi et
   al., 2005). We used GSEA^[55]4 to determine whether a particular gene
   set shows a statistically significant difference in expression. The key
   point of GSEA is the ES, with which each gene is endowed. The ES
   reflects the degree of enrichment of gene members at both ends of the
   sequencing list. For the analysis results, we defined the significance
   of gene sets based on the following parameters: |normalized enrichment
   score (NES)| > 1, nominal p-value (NOM) < 0.05, and FDR q-val < 0.25.

Construction of Risk Model

   Firstly, we matched the patient’s gene expression profile with the
   patient’s clinicopathological parameters and selected patients with
   relatively complete data. The patients were divided into two groups
   according to stage: group 1 included stage I and stage II patients,
   whereas group 2 included stage III and stage IV patients. We determined
   the enriched cell pathways in group 2 by GSEA, after which we selected
   the target cell pathway according to the screening conditions and
   sorted enriched mRNAs. Univariate Cox regression analysis ([56]Woodward
   and Overall, 1975) was launched to screen for survival-related mRNAs
   with p < 0.05, and then the multivariate Cox proportional hazards
   regression analysis was used to analyze mRNAs related to OS. After all
   of the genes were divided into high-risk [hazard ratio (HR) > 1] and
   protective (0 < HR < 1) groups, a prognostic risk score formula was
   established based on a linear combination of the expression levels
   weighted with regression coefficients originating from the multivariate
   Cox regression analysis. The formula is as follows: Risk score =
   [MATH: <mrow><msubsup><mo largeop="true"
   symmetric="true">∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>n
   </mi></msubsup><mrow><mi>E</mi><mi>x</mi><msubsup><mi>p</mi><mi>i</mi><
   mo>*</mo></msubsup><msub><mi
   mathvariant="normal">β</mi><mi>i</mi></msub></mrow></mrow> :MATH]
   , where n is the number of selected genes, ExPi is the expression level
   of gene i, and βi represents the regression coefficient of gene i.

   Patients were divided into high-risk and low-risk groups according to
   the median patient risk score. Differential expression and heat maps
   were used to analyze the different expression levels of genes that
   constitute the risk scores in the high-risk and low-risk groups.
   cBioPortal provides visualization tools for research and analysis of
   cancer genetic data to help decipher the molecular data obtained from
   cancer tissue and cytology research. We used cBioPortal to identify
   whether the genes that make up the risk score manifested mutations.

   GEPIA^[57]5 is a cancer data mining website mainly based on the TGCA
   and GTEx projects. The content that can be analyzed also covers
   multiple aspects: single-gene or multi-gene analysis, cancer types.
   Researchers select the specific tumor data according to their needs.
   Here, we entered the name of the single gene and obtained its
   expression level in different stages of OC.

Functional Enrichment Analysis

   Gene set variation analysis^[58]6 was performed with R according to the
   researchers’ needs of the OC patients’ mRNA expression profile data. In
   addition to the R version of the downloaded executable files, source
   code and documentation, various user-written software packages were
   also included. Downloading R (3.4.1. Windows 64-bit) required visiting
   the main website^[59]7 and selecting the CRAN to initiate download.

   We identified differentially expressed genes between high- and low-risk
   groups through the EDGR algorithm ([60]Robinson et al., 2010) and
   analyzed the cell pathways corresponding to the genes with significant
   differences between the high- and low-risk groups online through the
   DAVID^[61]8.

   We used GO and KEGG analyses to explore the biological processes
   related to the differentially expressed genes. The results of the two
   strategies were substantial in providing both an overall and a deep
   understanding of the biological systems identified by GSEA in our
   study. A PPI network was developed to explore the relationships among
   these genes using the online database STRING^[62]9. STRING was then
   used to map all of the hub genes connected with each other.

Verification of the Risk Score via Cox Regression Analysis

   We classified patients according to their risk scores and other
   clinicopathological parameters. Then, we used univariate Cox regression
   analysis to screen clinical pathological parameters related to survival
   followed by multivariate Cox proportional hazard regression analysis to
   determine whether the risk score was an independent predictor of OC.
   Most OC patients are diagnosed with stages III–IV disease, so the
   number of stages I–II OC cases is limited. Kaplan–Meier survival curves
   and ROC curves were created to compare the accuracy of the prognostic
   ability of the risk score with other pathological parameters in OC. To
   estimate the sensitivity and specificity of the risk score model and
   AUC value, ROC analysis was performed using SPSS 19.0. The visual
   nomogram ([63]Roman et al., 2005) was displayed in the R software
   mentioned above as the total points for OS for each patient.

Validation of the Risk Prognosis Model With a GEO Dataset

   Although there are more than a dozen GEO datasets for OC, we chose the
   [64]GSE9891 dataset because it included all six genes with matched
   survival data, and the size of the dataset is considerable. We
   downloaded [65]GSE9891^[66]10 ([67]Tothill et al., 2008), which
   comprises the clinical and gene expression data of 285 OC patients from
   the GEO database^[68]11. We matched the gene expression profiles of
   these patients with their clinicopathological parameters. The risk
   scores of the OC patients were calculated, and they were divided into a
   high-risk group and a low-risk group according to the median risk
   score. The feasibility of this prognosis model was verified based on
   whether there was a difference in survival between the high-risk and
   low-risk groups.

RNA Isolation and Quantitative Real-Time Polymerase Chain Reaction (qRT-PCR)

   Twenty pairs of OC tissues and normal ovarian tissues were obtained
   from patients seen at the Department of Obstetrics and Gynecology,
   Shengjing Hospital of China Medical University, China, from 2015 to
   2019. In addition, non-cancerous ovarian tissues were collected from
   women who underwent a hysterectomy for diseases other than cancer. All
   patients provided informed consent, and this study was approved by the
   Ethics Committee of Shengjing Hospital of China Medical University
   (2018PS251K). Histological diagnosis was assessed by three experienced
   pathologists. No patient received local or systemic treatment
   preoperatively. Total RNA was extracted from cancer and normal tissues
   with a TRIzol reagent (Takara, Dalian, China), and cDNA was generated
   from total RNA using a PrimeScript RT-polymerase (Takara). qRT-PCR was
   performed using SYBR-Green Premix (Takara) with specific PCR primers
   (Sangon Biotech, Co., Ltd., Shanghai, China) for the following
   proteins: TGFBI, F ACTCAGCCAAGACACTATTTGA, R CTTGTATGGGCATCAATTGGAG;
   SFRP1, F GCTCAACAAG AACTGCCAC, R CTTGTCACACTTAAGCATCTCG; COL 16A1, F
   GGAAGGACTCAAATTGGAACAC, R GATCTTC TTGATGGCAGACGTC; THY1, F
   CCAACTTCACCAGCAAA TACAA, R ACTTGACCAGTTTGTCTCTGAG; PPIB, F TTC
   TTCATCACGACAGTCAAGA, R TCACATCCTTCAGGGGT TTATC; and biglycan (BGN), F
   GAACATGAACTGCATCGAG ATG, R ATTTTGTTGTGGTCTAGGTGGA.
   Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) was used as an
   internal control, and fold-changes were calculated with the 2^–ΔΔCt
   method. The qRT-PCR data are expressed as the means ± standard
   deviation (SD) of three independent experiments. Statistical analyses
   were performed with the GraphPad Prism 6.0 software (La Jolla, CA,
   United States). Since the data in both groups exhibited normal
   distribution, two-sided Student’s t-test or one-way analysis of
   variance (ANOVA) was used to ascertain differences between the two
   groups. A p-value < 0.05 was considered statistically significant.

Results

   The flowchart of this study is shown in [69]Figure 1. A prognostic
   model was established with EMT-related genes and Cox analysis, which
   indicated that the risk model was an independent prognostic indicator
   of OC.

FIGURE 1.

   [70]FIGURE 1
   [71]Open in a new tab

   Flowchart of this study.

Gene Set Enrichment Analysis

   There were 367 specimens (with a total of 24991 genes) from the TCGA
   database classified into two groups (stages I–II and stages III–IV).
   GSEA revealed that 24 out of the 50 hallmark biological processes were
   significantly enriched in the stages III–IV group. Among the 24
   processes, EMT (p = 0), oxidative phosphorylation (p = 0), adipogenesis
   (p = 0), myogenesis (p = 0), coagulation (p = 0.003), apoptosis (p =
   0.037), and fatty acid metabolism (p = 0.038) showed the most
   significant differences between the stages I–II and stages III–IV
   groups ([72]Figure 2 and [73]Table 2). A false discovery rate (FDR) <
   0.25, NOM p-val < 0.05 and |NES| > 1were considered the cut-off
   criteria for each process. The top-ranking gene set was the EMT process
   with the highest |NES|, which includes 197 mRNAs, and was selected for
   further studies.

FIGURE 2.

   [74]FIGURE 2
   [75]Open in a new tab

   Enrichment plots of seven gene sets that were functionally enriched in
   OC.

TABLE 2.

   Gene sets enriched in ovary cancer (367 samples).
   GS follow link to MSigDB          Size  ES  NOM p-value Rank at maximum
   EPITHELIAL MESENCHYMAL TRANSITION 197  0.6       0           5691
   OXIDATIVE PHOSPHORYLATION         183  0.53      0           7891
   ADIPOGENESIS                      190  0.41      0           10392
   MYOGENESIS                        199  0.39      0           10785
   COAGULATION                       136  0.39    0.003         10967
   APOPTOSIS                         159  0.32    0.037         5216
   FATTY ACID METABOLISM             157  0.32    0.038         10137
   [76]Open in a new tab

Identification of a Six-mRNA Signature Predicts Survival of OC Patients

   Eighty-three of the 197 genes were selected because their expression
   profile was enriched in stages III–IV OC. Then, the top 19 mRNAs
   associated with prognosis were obtained using univariate Cox regression
   analysis, six of which (TGFBI, SFRP1, COL16A1, THY1, PPIB, BGN) were
   selected for multivariate Cox regression analysis to construct a risk
   assessment model. A p-value < 0.05 in the univariate Cox analysis was
   the criterion for inclusion in the multivariate Cox analysis. We
   designated four mRNAs as high-risk mRNAs (TGFBI, SFRP1, COL16A1, and
   THY1; HR > 1) and two as protective mRNAs (PPIB and BGN; 0 < HR < 1)
   ([77]Table 3).

TABLE 3.

   The detailed information of six prognostic mRNAs significantly
   associated with overall survival in patients with ovarian cancer.
   mRNA Ensemble ID Location HR B (Cox) p
   TGFBI ENSG00000120708 Chr5: 136,028,988–136,063,818 1.1324 0.1243
   0.0175
   SFRP1 ENSG00000104332 Chr8: 41,261,962–41,309,473 1.0707 0.0683 0.0217
   COL16A1 ENSG00000084636 Chr1: 31,652,263–31,704,319 1.1528 0.1422
   0.0249
   THY1 ENSG00000154096 Chr11: 119,417,378–119,424,985 1.0853 0.0819
   0.0504
   PPIB ENSG00000166794 Chr15: 64,155,812–64,163,205 0.7530 −0.2837 0.0069
   BGN ENSG00000182492 ChrX: 153,494,980–153,509,546 0.8216 −0.1965 0.0393
   [78]Open in a new tab

   A model was developed to predict prognosis according to the gene
   expression and regression coefficients of the six genes and is
   summarized as follows.

   Risk score = 0.1243^∗ TGFBI + 0.0683^∗ SFRP1 + 0.1422^∗ COL16A1 +
   0.0819^∗ THY1 – 0.2837^∗ PPIB – 0.1965^∗ BGN.

   After a risk score was calculated for every patient, the median risk
   score was regarded as the cut-off value, and the patients were divided
   into low-risk and high-risk groups ([79]Figure 3A). The distribution of
   the patient relapse status is also shown in [80]Figure 3A. The
   mortality increased with an increasing risk score among these patients,
   with a heat map ([81]Figure 3B) indicating the expression pattern of
   the six mRNAs. As the risk score of the OC patients increased, the
   expression of the high-risk mRNAs (TGFBI, SFRP1, COL16A1, THY1) showed
   obvious upregulation whereas the expression of protective mRNAs (PPIB,
   BGN) was downregulated.

FIGURE 3.

   [82]FIGURE 3
   [83]Open in a new tab

   The six-mRNA signatures related to risk score predicts the OS and was
   associated with survival in OC patients. (A) mRNA risk score
   distribution and survival days of patients. (B) A heatmap of five gene
   expression profiles. (C) Selected gene alterations in the clinical
   samples. (D) Selected gene-specific alterations in the detailed cancer
   type.

   The results from the OncoPrint of cBioPortal showed a summary of the
   genetic alterations of the six genes in 367 OC patients. It indicated
   that TGFBI was altered in 0.9% patients, with two instances of
   amplification, two instances of deep deletions and one instance of a
   missense mutation (unknown significance). SFRP1, COL16A1, THY1, PPIB,
   and BGN showed alterations in 4, 3, 1, 1.2, and 4% of patients,
   respectively ([84]Figure 3C). Similarly, the specific type of
   alteration in the selected genes are shown in high-grade serous OC
   ([85]Figure 3D). It is clear that a loss of mRNA constituted the
   majority of alterations.

Validation of the Six mRNAs for Predicting Survival by Kaplan–Meier Curves

   Kaplan–Meier curves were used to validate the prognostic value of each
   mRNA in predicting OC. The results showed that in addition to SFRP1 (p
   = 0.0314), the risk score was a specific indicator between the high-
   and low-risk groups (p < 0.0001). The patients in the high-risk group
   definitively had a shorter survival ([86]Figure 4A). The original data
   of the six-mRNA expression profiles without log transformation in the
   high- and low-risk groups are displayed in [87]Figure 4B. Furthermore,
   the mRNA expression levels of the six selected genes were tested in
   normal ovary tissues and OC tissues ([88]Figure 4C). High-risk genes
   such as TGFBI (p = 0.0002), SFRP1 (p = 0.0344), COL16A1 (p = 0.0449),
   and THY1 (p < 0.0001) were overexpressed in OC tissues compared with
   normal tissues, while protective genes such as PPIB (p = 0.0128) and
   BGN (p = 0.0047) were expressed at lower levels in the OC tissues
   ([89]Figure 4D).

FIGURE 4.

   [90]FIGURE 4
   [91]Open in a new tab

   Survival validation and expression profile of the six genes grouped by
   high- and low-risk. (A) Prognostic prediction of OC by every mRNA in
   high- and low-risk groups. (B) Gene expression of the six genes in
   high- and low-risk groups. (C) Differential expression of the six
   genes. (D) The expression level of six mRNAs detected by qRT-PCR. ^∗P <
   0.05, ^∗∗P < 0.01, ^∗∗∗P < 0.001, ^****P < 0.0001.

EMT Process Was Validated to Be Associated With the High-Risk Group of OC via
GSVA Database and Functional Enrichment Analysis

   According to the high- and low-risk score groups obtained using samples
   from the GSVA database, we established a heatmap displaying various
   biological processes associated with OC, from which we could see that
   EMT was included ([92]Figure 5A). A heatmap generated by the EDGR
   algorithm of the 83 most common differentially expressed genes in the
   high-risk and low-risk groups is shown in [93]Supplementary Figure 1.
   Upregulated genes were defined as logFC > 1, and downregulated genes
   were defined as logFC < −1. p < 0.05 was considered statistically
   significant. The prognostic signaling pathways were evaluated by KEGG
   and GO pathways, from which we concluded that the genes related to
   prognosis were enriched in ligand receptor activity. [94]Figures 5B,C
   display the GO and KEGG pathway enrichment plots, respectively, for OC.
   There were some differentially expressed cancer-related biological
   processes, such as ligand receptor activity and the PI3K-Akt signaling
   pathway, between the high-risk and low-risk groups, which indicated
   that the risk score was of great relevance to the tumorigenesis or
   development of OC.

FIGURE 5.

   [95]FIGURE 5
   [96]Open in a new tab

   Epithelial–mesenchymal transition process was associated with the
   high-risk group of OC. (A) Heatmap displayed various biological
   processes associated with OC. (B) GO and (C) KEGG pathways for the
   functional enrichment analysis.

The Risk Score, Stage, and Cancer Status Are Independent Prognostic
Indicators of OC

   Univariate and multivariate analyses were carried out together to
   compare the prognostic strength of the risk score and other common
   clinicopathological parameters ([97]Table 4). The results with the GSVA
   dataset indicated that risk score, stage, and cancer status were
   independent prognostic indicators since their p-values were < 0.05 not
   only in the univariate but also in the multivariate analysis.
   Importantly, the cancer status was the most obvious clinical parameter
   related to mortality for OC patients in the high-risk group, who were
   8.837 times more likely to die than those in the low-risk score group.
   Risk score was the next most relevant factor, indicating a 2.742-fold
   increased likelihood of death for OC patients in the high-risk score
   group compared to those in the low-risk group.

TABLE 4.

   Univariable and multivariable analyses for each clinical feature.
 Clinical feature   Number Univariate analysis
     __________________________________________________________________

   Multivariate analysis
     __________________________________________________________________

                            HR   95%CI of HR  p-Value  HR   95%CI of HR  p-Value
 Risk score          367   2.742 1.855–4.053  <0.0001 1.983 1.263–3.112  0.0029
 Age                 367   1.023 1.010–1.035  0.0003  8.241 1.445–15.279 0.0044
 Cancer Status       318   8.837 4.787–16.315 <0.0001 0.421 0.300–0.594  <0.0001
 Grade               366   1.214 0.811–1.818  0.3457
 Stage               367   2.113 0.937–4.763  0.0711
 Venous invasion     102   1.026 0.611–1.722   0.924
 Lymphatic invasion  147   1.447 0.847–2.472   0.176
   [98]Open in a new tab

Validation of the Six-mRNA Signature for Predicting Survival by Kaplan–Meier
Curves

   Kaplan–Meier curves and the log-rank method were used to validate the
   prognostic ability of clinical parameters (risk score, stage, age,
   cancer status, grade, venous invasion, and lymphatic invasion) to
   predict survival in OC. The results indicated that patients with
   high-risk scores had poor prognoses. Patients with stages III–IV
   disease or tumor status were at a higher risk of poor prognosis than
   were patients with stages I–II disease ([99]Figure 6A). Furthermore, we
   performed a data stratification analysis on the entire cohort, and 367
   patients were stratified based on their clinical parameters. According
   to the results above, patients with tumors, patients in stages III–IV
   or grade 3, and patients older than 65 years old were more likely to
   have a shorter survival ([100]Supplementary Figure 2A). Kaplan–Meier
   curves were established to validate the prognostic value of the risk
   score to predict survival in OC in an independent GEO cohort
   ([101]GSE9891). As we stated above, every patient was assigned a risk
   score, with patients from the GEO database divided into low-risk and
   high-risk groups based on the median risk score value as the cut-off
   criterion. The distribution of the patient relapse status is shown in
   [102]Supplementary Figure 2D. It was apparent that patients with
   high-risk scores had poor prognoses (p < 0.0001), which was consistent
   with the result observed in [103]Figure 6B. Kaplan–Meier curves were
   also used to validate the prognostic value of the risk score in
   predicting colon cancer ([104]Supplementary Figure 2B) and
   hepatocellular cancer ([105]Supplementary Figure 2C). The results
   showed that the risk score was not a significant indicator between the
   high- and low-risk groups (p = 0.1158 and p = 0.3675), which indicated
   the specificity of the risk score model to OC.

FIGURE 6.

   [106]FIGURE 6
   [107]Open in a new tab

   Kaplan–Meier survival analysis for the patients with OC in the TCGA
   dataset. (A) The Kaplan–Meier curve for patients divided into high-risk
   and low-risk groups and different clinical features that predict
   patient survival. (B) GEO validation of the signature. (C) ROC curves
   of each parameter and the combination of independent prognostic risk
   factors of the 5-year OS prediction. (D) Nomogram of total points for
   OS for each patient.

   The time-dependent ROC curve of each parameter demonstrated the
   sensitivity and specificity of the 5-year OS prediction ([108]Figure
   6C). We identified that risk score, age, and cancer status were
   independent risk factors, with AUCs of 0.711, 0.56, and 0.731,
   respectively. In addition, we integrated these three independent risk
   factors into a larger model, and its AUC increased to 0.776. The AUC of
   the ROC curve verified the accuracy of our prognostic model. As shown
   in [109]Figure 6D, each patient can be assigned point values according
   to the risk score, age, and cancer status in this nomogram, with the
   total score reflecting OS. The model is more accurate in assessing
   patient outcomes when the risk score, age, and cancer status are
   combined.

   Protein–protein interaction networks obtained from the STRING database
   and visualized with the Cytoscape software helped us to identify the
   hub genes among the genes related to prognosis. The core genes (degrees
   ≥ 15) ([110]Supplementary Figure 3B) were further submitted for PPI
   network analysis, which indicated that TGFBI, SFRP1, and COL16A1 were
   clearly at the center of the network ([111]Supplementary Figure 3A).

Discussion

   In clinical practice, although two patients with the same clinical and
   pathological parameters received the same treatment, their outcomes
   were sometimes drastically different. Robust biomarkers are emerging to
   stratify high-risk groups among these patients. For example, the
   expression level of miR-205 was an independent variable selected to
   predict the events of progressive disease ([112]Cañueto et al., 2017).
   Soluble VEGFR-2 level could predict the malignancy of ovarian neoplasms
   and poor prognosis in epithelial OC ([113]Sallinen et al., 2014). The
   lncHIFCAR-mediated mechanism for HIF-1 activation in oral carcinoma is
   of great value in determining prognosis and developing a potential
   therapeutic strategy ([114]Bolha et al., 2017; [115]Shih et al., 2017).
   Nevertheless, increasing studies support the fact that gene regulation
   in biological processes is complex, with multiple genes interacting
   with each other to form a network. Therefore, the expression status of
   a single gene is insufficient to predict the prognosis of these
   patients ([116]Qiu et al., 2017; [117]Chen et al., 2018).

   Epithelial–mesenchymal transition was initially studied as one of the
   five characteristics to differentiate benign and malignant tumors. One
   group proposed that an accumulation of a set of four specific cancer
   mutations results in the malignant transformation of normal cells
   ([118]Koten and Otter, 1991; [119]Koten et al., 1993). The EMT process
   has been well-studied for almost 30 years, and there have also been
   arguments about its mechanism as it relates to cancer ([120]Jolly et
   al., 2017). In EMT, cancer cells lose their epithelial features and
   acquire more invasive characteristics. Activation of EMT elicits
   changes in multiple fundamental aspects of cellular physiology, such as
   wound healing, tissue fibrosis, and drug resistance, and has become a
   hot topic in studying carcinoma progression ([121]Nieto et al., 2016;
   [122]Shibue and Weinberg, 2017). As previously described, EMT gives
   rise to a variety of intermediate cell states functioning as CSCs
   between the epithelial and mesenchymal states ([123]Anushka and Robert,
   2019). EMT might control the expression of immune checkpoint inhibitors
   and promote immune evasion in non-small-cell lung carcinoma
   ([124]Asgarova et al., 2018). miR-34a acts on SNAIL to regulate EMT in
   breast cancer and lung cancer cells ([125]He et al., 2017). The EMT
   markers currently used are not consistently related to poor prognosis
   because the cellular context is complex.

   The wide application of genomic profiling is gradually becoming the
   standard in clinical oncology and will undoubtedly accelerate progress
   in diagnosing and treating cancer ([126]Ocak et al., 2009; [127]Wheeler
   and Wang, 2013; [128]Ouellet et al., 2019). The age of big data has
   arrived, and the endless screening of hub molecules has emerged. For
   example, researchers presented a data-interlinked platform called
   BIOOPENER, which enabled the query of different types of mutations and
   genomic alterations that may contribute to molecular and clinical
   insights in cancer; this approach resulted in the discovery of three
   pathways that potentially cause promoter changes in gynecological
   cancers ([129]Jha et al., 2017). Gene signatures containing several
   genes are superior to individual biomarkers in predicting OC prognosis
   and survival and can be applied widely ([130]Shahid et al., 2016;
   [131]Liu et al., 2018; [132]Qiao et al., 2019).

   In the present study, we obtained mRNA data from 367 patients from the
   TCGA database, and vital bioinformatics analyses, such as GSEA, were
   used to reveal that EMT was potentially the most affected pathway in OC
   with p < 0.05. We then screened the 197 mRNAs related to EMT signaling
   and found that 83 were enriched in OC patients with stages III and IV
   disease. To narrow the field and improve the predictive efficiency,
   univariate and multivariate Cox regression analyses were performed, and
   six mRNAs, namely, TGFBI, SFRP1, COL16A1, THY1, PPIB, and BGN, were
   found to be associated with the prognosis of OC patients and
   subsequently combined to construct a six-gene signature model. TGFBI,
   SFRP1, COL16A1, and THY1 were validated to be high-risk genes, while
   PPIB and BGN were low-risk genes. The expression level of these genes
   as detected by qRT-PCR confirmed that the high-risk genes were
   overexpressed in OC tissues and that the protective genes were
   expressed at lower levels. We were able to distinguish low-risk and
   high-risk samples using the content of these six-mRNA signatures with
   relatively high prognostic accuracy. The prognosis-related biological
   processes and hub genes were further probed based on different risk
   groups. Among these mRNAs, TGFBI was investigated as a highly induced
   transcript during EMT in the non-small-cell lung cancer cell line A549,
   acting as a competing endogenous RNA (ceRNA) ([133]Qi et al., 2015) for
   miR-21 to modulate EMT, which indicated that the TGFBI 3′ UTR
   containing the miR-21 binding site could reduce miR-21 expression and
   mitigate its biological function ([134]Liu et al., 2019). SFRP1
   exhibited a tumor-promoting function by selectively activating TGFβ
   signaling in gastric cancer cells and thus activating EMT progression
   ([135]Peng et al., 2019). COL16A1 was found to be one of eight genes in
   a signature that predicts survival in OC and was validated to be highly
   expressed in cancer tissues compared with normal tissues ([136]Wang and
   Li, 2020). THY1 is more highly expressed in ovarian CSCs than in
   non-CSCs, and high THY1 expression in patients with serous OC indicates
   poorer outcomes. THY1 promotes proliferation and self-renewal in OC
   ([137]Elizabeth et al., 2019). BGN was indicated to be an endogenous
   inhibitor of bladder cancer cell proliferation by antiproliferative
   tyrosine kinase inhibitors. BGN is related to favorable prognosis
   ([138]Christian et al., 2013). However, we also discovered that SFRP1
   is thought to be a tumor suppressor that is epigenetically silenced by
   DNA methylation ([139]Rashidah et al., 2020). Given all these data,
   what should we think about the contradictory function of a single gene?
   As far as we know, many genes have “dual roles” in different cancers,
   even in different stages of one cancer type. For example, TGF-beta
   inhibits the proliferation of carcinomas in the early stages of breast
   cancer but promotes tumor growth and metastasis in later stages of
   cancer ([140]Aesun et al., 2005). In esophageal cancer cells, Nrf2
   promotes cell proliferation via metabolic reprogramming and ROS
   detoxification ([141]Yuki et al., 2017). By contrast, low levels of
   Nrf2 expression were correlated with poor survival in patients with
   melanoma (p = 0.0341), kidney cancer (p = 0.0203), and prostate cancer
   (p = 0.00279) ([142]Juan et al., 2014). Therefore, the role and
   function of genes in cancer are complex and multifactorial; they may
   depend on the tumor microenvironment, targets of genes, or tumor stage,
   all of which need to be studied in the future.

   We also investigated the prognostic value of the signature and compared
   it with that of other clinicopathologic parameters and different
   subgroups. Based on Cox analysis, the signature was able to strongly
   predict risk score, cancer status, and age in OC patients. It is
   generally considered scientific to represent tumor prognosis with
   5-year OS rates. It is obvious that 5-year OS was significantly
   different between the high-risk and low-risk groups, revealing that the
   signature is a prospective independent marker of prognosis. Actually,
   when assessed as a ROC curve, the combination of the six-gene signature
   with cancer status and age exhibited a more powerful prediction for OS
   than did the parameters individually, indicating that comprehensive
   consideration of these three factors may be a promising independent
   tool for OC. Finally, the GEO cohort was examined to confirm the
   predictive capacity of this signature, and the survival result was
   consistent with that obtained from the original dataset. In fact, OC
   metastasis to the liver, colon, and other important visceral tissues is
   common. However, the risk score of the six-gene signature was not
   significant in primary liver or colon cancer, which indicated that the
   risk score was specific to OC prognosis. To our knowledge, this study
   is the first to create a prognostic gene signature in OC.

   Despite the novel findings proposed by our study of candidates for OC
   prognosis, there are still limitations that require further
   investigation. This is a retrospective study, and the results would be
   more convincing with a larger sample size. In conclusion, the six
   signature genes TGFBI, SFRP1, COL16A1, THY1, PPIB, and BGN might be
   potential biomarkers for predicting the prognosis of OC patients. To
   some degree, our study might provide some clues for further
   investigation into the biological processes, clinical diagnosis, and
   therapeutic strategies of OC relating to these genes.

Data Availability Statement

   The mRNA expression and corresponding clinical information for OC
   patients were extracted from the TCGA data portal
   ([143]https://cancergenome.nih.gov/). GEO cohort ([144]GSE9891) was
   download from the GEO da ([145]http://www.ncbi.nlm.nih.gov/gds).

Ethics Statement

   The studies involving human participants were reviewed and approved by
   Shengjing Hospital of China Medical University (2018PS251K). The
   patients/participants provided their written informed consent to
   participate in this study.

Author Contributions

   XP finished the experiment, analyzed the data, drafted the manuscript,
   and prepared all figures and tables. XM designed the study, drafted and
   revised the manuscript. Both authors read and approved the final
   manuscript.

Conflict of Interest

   The authors declare that the research was conducted in the absence of
   any commercial or financial relationships that could be construed as a
   potential conflict of interest.

Acknowledgments