Abstract

Gastric cancer (GC) is a common gastrointestinal tumor with poor prognosis. However, conventional prognostic factors cannot accurately predict the outcomes of GC patients. Therefore, there remains a need to identify novel predictive markers to improve prognosis. In this study, we obtained microRNA expression profiles of 385 GC patients from The Cancer Genome Atlas. We performed Cox regression analysis to identify overall survival‐related microRNA and then constructed a microRNA signature‐based prognostic model. The accuracy of the model was evaluated and validated through Kaplan–Meier survival analysis and time‐dependent receiver operating characteristic (ROC) curve analysis. The independent prognostic value of the model was assessed by multivariate Cox regression analysis. Enrichment analysis was performed to explore potential functions of the prognostic microRNA. Finally, a prognostic model based on a six‐microRNA (miRNA‐100, miRNA‐374a, miRNA‐509‐3, miRNA‐668, miRNA‐549, and miRNA‐653) signature was developed. Further analysis in the training set, the test set, and the complete Cancer Genome Atlas set showed that the model can distinguish between high‐risk and low‐risk patients and predict 3‐year and 5‐year survival. The six‐microRNA signature was also an independent prognostic marker, and enrichment analysis suggested that the microRNA may be involved in cell cycle and mitosis. These results demonstrated that the model based on the six‐microRNA signature can be used to accurately predict the prognosis of GC patients.
Keywords: gastric cancer, microRNA, overall survival, prognosis

Abbreviations
AUC, area under the ROC curve
BP, biological process
CC, cell component
GC, gastric cancer
GO, gene ontology
HR, hazard ratio
KEGG, Kyoto Encyclopedia of Genes and Genomes
MF, molecular function
ROC curve, time‐dependent receiver operating characteristic curve
TCGA, The Cancer Genome Atlas

Gastric cancer (GC) is one of the most common gastrointestinal malignant tumors. In 2015, 1 310 000 people were diagnosed with GC around the world and 810 000 patients died because of GC. The morbidity and mortality of GC ranked 5th and 3rd among all malignant tumors, respectively [1]. Due to atypical early symptoms, most patients are diagnosed with GC at an advanced stage and the median overall survival time is usually < 1 year [2,3]. Moreover, even among patients who have received radical surgery, 37%–48% die from recurrence or metastasis [4]. The prognosis of GC is therefore poor, and it is essential to improve early diagnosis and to deliver appropriate, individualized therapies based on prognosis.

The AJCC TNM staging system is a conventional prognostic indicator. However, it is sometimes difficult to obtain an accurate stage in clinical practice for several reasons, such as dissection of fewer than 15 lymph nodes and failure to remove the tumor completely. Moreover, the AJCC staging system cannot distinguish patients who are at the same stage but have different survival times [5,6]. In the genomic era, the most likely explanation is the molecular heterogeneity of patients within the same stage group. Recently, several novel molecular classification schemas of GC have been proposed according to these heterogeneous molecular characteristics [7,8].
It is therefore also necessary to develop a novel prognostic model based on molecular characteristics to predict the outcome of patients with GC.

microRNA are a group of small noncoding RNA consisting of approximately 22 nucleotides. One microRNA can regulate the expression levels of multiple mRNA, exerting its biological functions by participating in the degradation of mRNA or by inhibiting the translation of mRNA [9,10]. A number of studies have shown that microRNA are involved in proliferation [11], apoptosis [12,13], differentiation [14,15], invasion [16,17], and migration [18] of GC cells. Moreover, several studies have reported that some microRNA can also affect the survival of patients with GC [19,20]. Consequently, it is feasible to construct a prognostic model based on expression profiles of microRNA.

In this study, we developed a prognostic model of GC based on a six‐microRNA expression signature by using The Cancer Genome Atlas (TCGA) high‐throughput sequencing data of microRNA. The six‐microRNA expression signature was associated with overall survival and can predict 3‐ and 5‐year overall survival of patients with GC. Moreover, it was also an independent prognostic factor.

Material and methods

Genetic and clinical data acquisition and processing

Genetic and clinical data of patients with GC were obtained from TCGA (http://cancergenome.nih.gov/). Genetic data included microRNA and mRNA expression levels for each patient, and clinical information included age, gender, pathological stage, histological grade, survival status, and overall survival time. microRNA and mRNA expression levels were measured by log (RPM + 1) and log (FPKM + 1), respectively. microRNA that were not expressed in more than 50% of patients were removed.
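The filtering and transformation step described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' R code. It assumes that "not expressed" means an RPM of exactly zero and that the logarithm is base 2; the text states neither. The function name and the toy values are hypothetical.

```python
import math

def filter_and_transform(rpm_by_mirna, max_unexpressed_fraction=0.5):
    """Drop microRNA that are unexpressed in more than the given fraction of
    patients, then transform the remaining values as log2(RPM + 1).

    Assumptions (not stated in the paper): "unexpressed" means RPM == 0,
    and the log is base 2.
    """
    kept = {}
    for mirna, rpm_values in rpm_by_mirna.items():
        unexpressed = sum(1 for value in rpm_values if value == 0)
        if unexpressed / len(rpm_values) > max_unexpressed_fraction:
            continue  # unexpressed in too many patients: removed
        kept[mirna] = [math.log2(value + 1) for value in rpm_values]
    return kept

# Hypothetical toy data: four patients, two microRNA.
toy_rpm = {
    "miR-A": [0.0, 3.0, 7.0, 15.0],  # unexpressed in 1 of 4 patients: kept
    "miR-B": [0.0, 0.0, 0.0, 1.0],   # unexpressed in 3 of 4 patients: removed
}
filtered = filter_and_transform(toy_rpm)
print(sorted(filtered))
print(filtered["miR-A"])
```

Applied to the TCGA data, a filter of this kind is what reduces the full microRNA list to the subset carried into the Cox regression.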
The patients were randomly divided into two groups, which served as the training set and the test set, using the sampling package in R (v3.5.0, The R Foundation, Vienna, Austria), with the survival status of patients balanced between the two sets.

Statistical analysis

Univariate and multivariate Cox regression analyses were used to identify the survival‐related microRNA in the training set. Then, the prognostic model based on the survival‐related microRNA was constructed according to the Cox regression model, in which the regression coefficients represented the weights of the microRNA expression levels. The risk score of each patient was calculated as the sum of the weighted expression levels of the microRNA. The patients in each set were classified into a high‐risk group and a low‐risk group using the median risk score in the training set as the cutoff value. Kaplan–Meier survival analyses with the log‐rank test were used to compare the overall survival of patients in the two groups, and univariate Cox regression analyses were used to calculate hazard ratios (HR) between the two groups. Time‐dependent receiver operating characteristic curve (ROC curve) analyses were performed with the survivalROC package [21] in R to evaluate the sensitivity and specificity of the prognostic model in predicting 3‐ and 5‐year overall survival in each set. In addition, multivariate Cox regression analyses were used to determine whether the microRNA signature was an independent prognostic marker.

Function enrichment analysis

Since microRNA exert their biological activities through trans‐regulating mRNA, the expression correlations between microRNA and mRNA were analyzed by Pearson's correlation test. mRNA with a correlation coefficient < −0.3 and P < 0.05 were identified as target genes of the microRNA.
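The correlation‐based selection of candidate target genes can be illustrated with a short sketch. It is written in plain Python rather than the R used in the study and applies only the coefficient cutoff (r < −0.3). The P < 0.05 criterion is deliberately left out because the standard library has no t‐distribution; in practice a statistics library would supply it. The function names, gene names, and expression values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def candidate_targets(mirna_expr, mrna_expr_by_gene, r_cutoff=-0.3):
    """Return genes whose expression is negatively correlated with the
    microRNA beyond the cutoff. The significance test is omitted here."""
    hits = {}
    for gene, expr in mrna_expr_by_gene.items():
        r = pearson_r(mirna_expr, expr)
        if r < r_cutoff:
            hits[gene] = r
    return hits

# Hypothetical toy data: one microRNA and three mRNA across five patients.
mirna = [1.0, 2.0, 3.0, 4.0, 5.0]
mrna = {
    "GENE_NEG": [5.0, 4.0, 3.0, 2.0, 1.0],   # strongly anti-correlated
    "GENE_POS": [1.0, 2.0, 3.0, 4.0, 5.0],   # positively correlated
    "GENE_NONE": [2.0, 1.0, 3.0, 1.0, 2.0],  # uncorrelated
}
hits = candidate_targets(mirna, mrna)
print(hits)
```

Only negative correlations are retained because a microRNA is expected to lower the abundance of the mRNA it regulates.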
Subsequently, gene ontology (GO) enrichment analyses in the cell component (CC), molecular function (MF), and biological process (BP) categories, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses, were performed and visualized with the clusterProfiler package [22] in R. P < 0.05 was considered to be significant.

Results

Preparation of genetic and clinical data

Genetic and clinical data of 385 patients with gastric adenocarcinoma were downloaded from the TCGA database. They were randomly assigned to the training set (n = 192) and test set (n = 193). There were no statistically significant differences in age, gender, pathological stage, histological grade, and survival status between the two sets (Table 1). After removing microRNA unexpressed in more than half of the samples, 566 of the total 1046 microRNA were further analyzed (Tables S1–S3).

Table 1. Clinical characteristics of patients in each dataset

Characteristic        Training set (n = 192)   Test set (n = 193)   χ²       P‐value
Age (years)
  < 67                89                       102                  1.7697   0.183
  ≧ 67                103                      88
Gender
  Male                123                      130                  0.329    0.566
  Female              69                       63
Histological grade
  G1/G2               69                       76                   0.404    0.525
  G3                  119                      112
Pathological stage
  I/II                82                       92                   0.067    0.796
  III/IV              103                      100
Survival status
  Alive               118                      119                  0        1
  Dead                74                       74

Development of prognostic model in the training set

In the training set, univariate Cox regression analysis showed that the expression levels of 46 microRNA were related to the overall survival time of patients (Table S4). Subsequent multivariate Cox regression analysis showed that the expression levels of six of the 46 microRNA were related to the overall survival of patients (Table 2). They were independent prognostic factors of GC patients. Among them, miRNA‐100, miRNA‐653, and miRNA‐668 were risk genes, while miRNA‐374a, miRNA‐509‐3, and miRNA‐549 were protective genes.

Table 2. microRNA independently associated with overall survival

microRNA       Coefficient   P‐value   HR       95% confidence interval
miRNA‐100      1.163         0.0212    3.199    1.193–8.576
miRNA‐374a     −1.619        0.009     0.198    0.059–0.663
miRNA‐509‐3    −1.471        0.038     0.230    0.057–0.919
miRNA‐549      −0.980        0.045     0.375    0.144–0.980
miRNA‐653      0.551         0.029     1.735    1.058–2.844
miRNA‐668      2.723         0.021     15.224   1.512–153.341

To construct a prognostic model, multivariate Cox regression analysis was performed on the six microRNA with independent prognostic value, and the weight of each microRNA expression level in the predictive model was obtained according to the regression coefficient. The risk score was defined as follows:

Risk score = (0.336 × expression level of miRNA‐100) + (−0.777 × expression level of miRNA‐374a) + (−0.578 × expression level of miRNA‐509‐3) + (−0.487 × expression level of miRNA‐549) + (0.618 × expression level of miRNA‐653) + (1.223 × expression level of miRNA‐668).

Based on this model, the risk score of each patient was calculated. Using the median risk score of patients in the training set as the cutoff value, there were 96 patients in the high‐risk group and 96 patients in the low‐risk group in the training set. Kaplan–Meier survival analysis with the log‐rank test demonstrated a significant difference between the two groups: patients in the low‐risk group tended to have longer overall survival than those in the high‐risk group (P < 0.001, Fig. 1A). The univariate Cox regression analysis indicated that the HR of the high‐risk group versus the low‐risk group was 3.154 (95% CI: 1.899–5.24, P < 0.001, Table 3). Furthermore, time‐dependent ROC analysis of the six‐microRNA signature showed that the area under the ROC curve (AUC) reached 0.759 and 0.821 for predicting 3‐ and 5‐year survival, respectively (Fig. 1B). Therefore, the six‐microRNA signature‐based model can predict the prognosis of patients.
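The risk score and the median‐based grouping described above can be sketched in a few lines. The weights are the six coefficients from the risk score formula in the text; everything else is an assumption made for illustration. That includes the expression values, the helper names, and the rule that a score exactly equal to the cutoff counts as low risk. Plain Python is used here, whereas the study itself was carried out in R.

```python
from statistics import median

# Weights taken from the risk score formula in the text.
WEIGHTS = {
    "miRNA-100": 0.336,
    "miRNA-374a": -0.777,
    "miRNA-509-3": -0.578,
    "miRNA-549": -0.487,
    "miRNA-653": 0.618,
    "miRNA-668": 1.223,
}

def risk_score(expression):
    """Weighted sum of the six microRNA expression levels."""
    return sum(weight * expression[name] for name, weight in WEIGHTS.items())

def assign_groups(scores, cutoff):
    """Label each score; ties with the cutoff go to "low" (an assumption)."""
    return ["high" if score > cutoff else "low" for score in scores]

# Hypothetical expression profiles, not TCGA data.
baseline = {name: 1.0 for name in WEIGHTS}
training = [
    baseline,
    {**baseline, "miRNA-668": 2.0},   # more of a risk microRNA
    {**baseline, "miRNA-374a": 2.0},  # more of a protective microRNA
    {name: 0.0 for name in WEIGHTS},
]
training_scores = [risk_score(patient) for patient in training]
cutoff = median(training_scores)  # cutoff is fixed on the training set
training_groups = assign_groups(training_scores, cutoff)

# A new patient is classified with the training-set cutoff, not a new median.
test_patient = {**baseline, "miRNA-100": 2.0}
test_group = assign_groups([risk_score(test_patient)], cutoff)[0]
print(training_groups, test_group)
```

Fixing the cutoff on the training set is what allows the same rule to be applied unchanged to the test set and to the entire TCGA set.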
Figure 1. The prognostic performance of the six‐microRNA signature in the training set. (A) Kaplan–Meier survival analysis by log‐rank test between the high‐risk group and low‐risk group in the training set. (B) Time‐dependent ROC analysis for the six‐microRNA signature to predict 3‐ and 5‐year survival.

Table 3. Univariate and multivariate analyses of clinical characteristics and the six‐microRNA signature

                                              Univariate analysis              Multivariate analysis
Variable                                      HR      95% CI        P‐value    HR      95% CI        P‐value
Training set
Age (< 67/≧ 67 years)                         1.372   0.861–2.186   0.184      1.63    1.012–2.628   0.045
Gender (male/female)                          0.832   0.513–1.348   0.455      0.913   0.559–1.490   0.715
Histological grade (G1, G2/G3)                1.973   1.157–3.365   0.013      1.696   0.953–2.983   0.058
Pathological stage (I, II/III, IV)            1.927   1.174–3.163   0.01       1.711   1.032–2.839   0.037
Six‐microRNA signature (low risk/high risk)   3.154   1.899–5.24    < 0.001    2.682   1.597–4.504   < 0.001
Test set
Age (< 67/≧ 67 years)                         1.344   0.844–2.14    0.212      1.792   1.087–2.953   0.022
Gender (male/female)                          0.76    0.454–1.272   0.296      0.677   0.386–1.189   0.175
Histological grade (G1, G2/G3)                1.04    0.648–1.67    0.871      0.946   0.573–1.563   0.83
Pathological stage (I, II/III, IV)            1.958   1.208–3.174   0.006      1.819   1.096–3.019   0.021
Six‐microRNA signature (low risk/high risk)   1.699   1.07–2.698    0.025      1.702   1.039–2.787   0.035
Entire TCGA set
Age (< 67/≧ 67 years)                         1.324   0.957–1.831   0.09       1.645   1.176–2.302   0.004
Gender (male/female)                          0.807   0.568–1.146   0.231      0.768   0.535–1.102   0.152
Histological grade (G1, G2/G3)                1.406   0.993–1.991   0.055      1.232   0.86–1.765    0.256
Pathological stage (I, II/III, IV)            1.94    1.376–2.734   < 0.001    1.724   1.211–2.455   0.003
Six‐microRNA signature (low risk/high risk)   2.3     1.646–3.216   < 0.001    2.12    1.496–3.005   < 0.001

Validation of the prognostic model in the test set and entire TCGA set

To assess the predictive value of this model, we further validated the six‐microRNA signature in the test set.
Using the same risk score calculation, the 193 patients in the test set were divided into a high‐risk group (n = 87) and a low‐risk group (n = 106) according to the cutoff value used in the training set. The result of the Kaplan–Meier survival analysis was consistent with that in the training set: patients in the low‐risk group tended to have longer overall survival than those in the high‐risk group (P = 0.023, Fig. 2A). The HR of the high‐risk group versus the low‐risk group was 1.699 (95% CI: 1.07–2.698, P = 0.025, Table 3) according to univariate Cox regression analysis. The AUC in time‐dependent ROC analysis was 0.708 for 3‐year survival and 0.729 for 5‐year survival (Fig. 2B). These results showed that the model also performed well in the test set.

Figure 2. The prognostic performance of the six‐microRNA signature in the test set. (A) Kaplan–Meier survival analysis by log‐rank test between the high‐risk group and low‐risk group in the test set. (B) Time‐dependent ROC analysis for the six‐microRNA signature to predict 3‐ and 5‐year survival.

To further verify the robustness of the prognostic model, the six‐microRNA signature was tested in the entire TCGA set. Using the same risk cutoff as above, the patients in the entire TCGA set were classified into a high‐risk group (n = 183) and a low‐risk group (n = 202). A similar result was observed in the Kaplan–Meier survival analysis with the log‐rank test: patients in the low‐risk group tended to have better overall survival than those in the high‐risk group (P < 0.001, Fig. 3A). The univariate Cox regression analysis showed that the HR of the high‐risk group versus the low‐risk group was 2.3 (95% CI: 1.646–3.216, P < 0.001, Table 3). Time‐dependent ROC analyses showed that the AUC of the prognostic model for predicting 3‐ and 5‐year survival was 0.71 and 0.789, respectively (Fig. 3B).
These analyses on the entire TCGA set confirmed the robustness of the six‐microRNA signature.

Figure 3. The prognostic performance of the six‐microRNA signature in the entire TCGA set. (A) Kaplan–Meier survival analysis by log‐rank test between the high‐risk group and low‐risk group in the entire TCGA set. (B) Time‐dependent ROC analysis for the six‐microRNA signature to predict 3‐ and 5‐year survival.

Assessment of the independent prognostic value of the six‐microRNA signature

To assess the independent prognostic value of the six‐microRNA signature, multivariate Cox regression analyses were performed. The consistent results in the training, test, and entire TCGA sets showed that pathological stage, age, and the six‐microRNA signature were independent prognostic markers of patients with GC. The HR of the high‐risk group versus the low‐risk group was 2.682 (95% CI: 1.597–4.504) in the training set, 1.702 (95% CI: 1.039–2.787) in the test set, and 2.120 (95% CI: 1.496–3.005) in the entire TCGA set. However, time‐dependent ROC analysis of pathological stage and age to predict 3‐ and 5‐year survival revealed that the AUCs were < 0.7 (Fig. 4A,B). These results demonstrated that the six‐microRNA signature was an independent prognostic marker of GC patients and superior to pathological stage and age.

Figure 4. Time‐dependent ROC analysis for pathological stage and age to predict 3‐ and 5‐year survival in the entire TCGA set. (A) Pathological stage. (B) Age.

Function pathway enrichment analysis of the six microRNA

To explore potential functions of these six microRNA, 978 co‐expressed mRNA, which may be the target genes of the microRNA, were identified by Pearson's correlation test. GO enrichment analysis of the co‐expressed mRNA suggested that chromosome, centromeric region (CC), ATPase activity (MF), and mitotic nuclear division (BP) were the most significantly enriched terms in their respective categories (Fig. 5A–C, Table S5).
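Enrichment analyses of this kind are commonly based on an over‐representation test, in which the overlap between a gene list and an annotated term is compared against the hypergeometric distribution. The sketch below shows that calculation for a single term in plain Python. It is an illustration of the general technique, not a reproduction of the clusterProfiler call used in the study. The function name and all counts are hypothetical, and no multiple‐testing correction is applied.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_in_term, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_selected genes are drawn without replacement from n_background genes,
    of which n_in_term are annotated to the term of interest.
    """
    total = comb(n_background, n_selected)
    upper = min(n_in_term, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 20 background genes, 5 annotated to the term,
# 4 genes in the selected list, 3 of which carry the annotation.
p_value = hypergeom_enrichment_p(20, 5, 4, 3)
print(p_value)
```

A small P‐value indicates that the selected genes carry the annotation more often than random draws from the background would.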
KEGG pathway enrichment analysis indicated that cell cycle was the most significantly enriched pathway (Fig. 5D, Table S6). In addition, these mRNA were also enriched for functions such as microtubule binding and tubulin binding, which have been shown to be related to cell proliferation. They were also involved in cancer‐related biological processes and signaling pathways such as cell cycle phase transition, cell cycle checkpoint, regulation of cell division, and the p53 signaling pathway.

Figure 5. GO and KEGG pathway enrichment analysis. (A) Top 20 significantly enriched cellular component GO annotations. (B) Significantly enriched BP GO annotations. (C) Top 20 significantly enriched MF GO annotations. (D) Significantly enriched KEGG pathways.

Discussion

In the present study, we identified six survival‐related microRNA in patients with GC by Cox regression and proposed a six‐microRNA signature‐based prognostic model. The model can distinguish GC patients with poor prognosis from those with good prognosis, and the ROC curve analysis showed that the AUC of the model for predicting 3‐ or 5‐year overall survival was > 0.7. In addition, according to the multivariate Cox regression analysis, the six‐microRNA signature was also an independent prognostic marker. These results, which were consistent across the training set, test set, and entire TCGA set, indicated that the model based on the six‐microRNA signature was robust in predicting the outcomes of patients with GC.

Several similar studies have developed prognostic models of GC based on molecular profiles. Two studies constructed prognostic models based on mRNA signatures. However, the sample sizes of both were relatively small [23,24]. Another study, conducted by Wang et al. [25], built a model based on a nine‐mRNA signature to predict the prognosis of GC patients.
It can distinguish high‐risk from low‐risk patients within a cohort but cannot predict the prognosis of a single patient, because the evaluation method was based on median gene expression levels of the cohort. Recently, noncoding RNA have also been used to construct prognostic models. Miao et al. [26] proposed a four‐lncRNA‐based prognostic model of GC. However, the AUC of the time‐dependent ROC curve for predicting 5‐year overall survival was < 0.7. Another study [27] developed a microRNA‐based model but did not evaluate its value in predicting 3‐ and 5‐year survival. Compared with these studies, the model in the current study can distinguish patients with poor or good prognosis, and it also performed well in predicting 3‐ and 5‐year survival.

In our study, six microRNA were identified as being associated with overall survival. The enrichment analyses revealed that their target mRNA took part in cell cycle, mitosis, the p53 signaling pathway, and related processes. These results may explain why the six microRNA were related to the prognosis of patients. In addition, most of these microRNA have previously been found to be related to tumors. Among them, miRNA‐100 and miRNA‐374a have been the most frequently studied. Nevertheless, conflicting results about their roles in tumors have been reported. miRNA‐100 was upregulated in patients with diffuse‐type GC and related to the depth of invasion, lymph node metastasis, and stage [28]. In contrast, another study showed that miRNA‐100 could promote apoptosis of GC cells through the Notch‐apoptosis pathway and improve the sensitivity of GC cells to chemotherapy [29]. miRNA‐374a could promote proliferation, migration, and invasion of GC cells by downregulating SRCIN1, while inhibiting proliferation, invasion, migration, and intrahepatic metastasis of colon cancer cells by targeting CCND1 [30,31]. In our study, miRNA‐100 was a risk microRNA and miRNA‐374a was a protective microRNA.
These inconsistent results may be due to differences in tumor type or in experimental setting, such as in vitro versus in vivo. miRNA‐509‐3 has previously been identified as a tumor suppressor in lung cancer [32], ovarian cancer [33], hepatoma [34], leukemia [35], renal cell carcinoma [36], and GC [37]. It was also an independent prognostic biomarker in GC patients. These findings were consistent with ours. miRNA‐668 might act as an oncogene [38] and could be associated with radioresistance in breast cancer [39]. Our results were similar, showing that miRNA‐668 was a risk gene in GC. To date, there have been no direct studies focusing on the relationships between miRNA‐549 or miRNA‐653 and tumors. However, our study showed that miRNA‐549 was a protective microRNA and miRNA‐653 was a risk microRNA, both of which deserve further study.

In summary, our study identified six survival‐related microRNA (miRNA‐100, miRNA‐374a, miRNA‐509‐3, miRNA‐668, miRNA‐549, and miRNA‐653) in GC patients and developed a prognostic prediction model. The model can be used to predict the risk of death and the 3‐ and 5‐year overall survival of patients with GC. Moreover, the six‐microRNA signature of the model was also a novel independent molecular prognostic biomarker. These results will contribute to individualized therapies for GC patients.

Author contributions

Y‐fH and JC conceived and designed the experiments. BH reviewed drafts of the paper. B‐jS and WW performed the experiments and analyzed the data. X‐jQ prepared figures and/or tables.

Conflicts of interest

The authors declare no conflict of interest.

Supporting information

Table S1. Training set.
Table S2. Test set.
Table S3. Clinical information.
Table S4. Survival‐related microRNA according to univariate Cox regression analysis.
Table S5. GO enrichment analysis of the co‐expressed mRNA.
Table S6. KEGG pathway enrichment analysis of the co‐expressed mRNA.

Acknowledgement