Graphical abstract
   System overview of GASE. (A) MicroRNA expression profiles and survival
   times of patients with stomach and esophageal carcinoma are the input
   to the system. (B) GASE method development. (C) MicroRNA discovery and
   analysis.
   Keywords: miRNA signature, Machine learning, Survival estimation,
   Stomach and esophageal carcinoma
Abstract
   Identifying a miRNA signature associated with survival will open a new
   window for developing miRNA-targeted treatment strategies in stomach
   and esophageal cancers (STEC). Here, using data from The Cancer Genome
   Atlas on 516 patients with STEC, we developed a Genetic Algorithm-based
   Survival Estimation method, GASE, to identify a miRNA signature that
   could estimate survival in patients with STEC. GASE identified 27
   miRNAs as a survival miRNA signature and estimated the survival time
   with a mean squared correlation coefficient of 0.80 ± 0.01 and a mean
   absolute error of 0.44 ± 0.25 years between actual and estimated
   survival times, and showed a good estimation capability on an
   independent test cohort. The miRNAs of the signature were prioritized
   and analyzed to explore their roles in STEC. Analysis of the diagnostic
   ability of the identified miRNA signature highlighted some critical
   miRNAs in STEC. Further, miRNA-gene target enrichment analysis
   revealed the involvement of these miRNAs in various pathways, including
   the somatotrophic axis in mammals that involves the growth hormone and
   transforming growth factor beta signaling pathways, and gene ontology
   annotations. The identified miRNA signature provides evidence for
   survival-related miRNAs and their involvement in STEC, which would aid
   in developing miRNA-target based therapeutics.
1. Introduction
   Stomach and esophageal carcinomas (STEC) are among the most prevalent
   malignant diseases, causing hundreds of thousands of deaths globally
   each year. Worldwide, stomach cancer ranks sixth in cancer incidence,
   with 1,089,103 new cases, and third in cancer mortality, with 768,793
   deaths, while esophageal cancer ranks tenth in cancer incidence, with
   604,100 new cases, and sixth in cancer mortality, with 544,076 deaths,
   based on
   estimates for the year 2020 [36][1]. STEC ranks higher in mortality
   than incidence because these cancers are often first diagnosed at an
   advanced stage. In the United States, diagnosis occurs at a localized,
   regional, or distant stage in 28 %, 32 %, and 40 %, respectively, of
   stomach cancer cases, and in 25 %, 29 %, and 31 %, respectively, of
   esophageal cancer cases [37][2], [38][3]. For localized, regional, and
   metastatic disease, five-year survival is 64 %, 28.2 %, and 5.3 %,
   respectively, for stomach cancer, and 46.7 %, 25.1 %, and 4.8 %,
   respectively, for esophageal cancer [39][2], [40][3]. Treatment for
   STEC is selected based on disease stage [41][4], [42][5]. Surgery can
   be curative but is offered mainly in early disease stages. Chemotherapy
   and chemoradiotherapy provide an added survival benefit to surgery in
   early-stage disease and are offered without surgery in later disease
   stages. Targeted therapies (e.g., Trastuzumab, an inhibitor of human
   epidermal growth factor receptor 2) improve survival in STEC and are
   increasingly being used in STEC treatment [43][6], and immunotherapy
   and other emerging therapies continue to be evaluated for improvement
   in STEC survival [44][4], [45][5].
   Biomarkers associated with STEC survival are potential targets for
   designing new STEC treatments to improve patient survival [46][7],
   [47][8]. MicroRNAs (miRNAs) function as oncogenes or tumor suppressor
   genes in STEC [48][6], [49][7] and have been investigated as biomarkers
   of STEC diagnosis and prognosis [50][9], [51][10]. Roles for miRNAs in
   STEC progression and survival have been described in several reports.
   For example, low levels of miR-148a, a miRNA that suppresses cell
   invasion and migration, are associated with advanced clinical stage and
   poor prognosis in stomach cancer [52][11]. MiR-616-3p promotes
   angiogenesis and metastasis and is correlated with poor prognosis in
   stomach cancer [53][12]. Elevated miR-21 expression is linked to lymph
   node metastasis [54][13] and poor prognosis [55][14] in esophageal
   cancer. MiR-375 targets proteins involved in cancer cell proliferation
   and invasion [56][15], and its downregulation is associated with
   advanced cancer staging and poor prognosis in esophageal squamous cell
   carcinoma [57][16]. Aberrant miRNA expression has also been identified
   in STEC. Hwang et al. identified miRNAs, including miR-601, miR-107,
   miR-18a, miR-370, miR-300, and miR-96, that were significantly expressed
   in early gastric cancers when compared to normal samples [58][17]. A
   serum biomarker miRNA panel consisting of 12 miRNAs was developed for
   risk assessment in patients with gastric cancer [59][18]. Furthermore,
   several dysregulated miRNAs have been found in esophageal tumors that
   regulate carcinogenesis [60][19], [61][20]. An RT-qPCR study of
   patients with esophageal carcinoma revealed three miRNAs, miR-34a-5p,
   miR-148a-3p, and miR-181a-5p, that were associated with cancer
   progression [62][21].
   In most studies, associations between miRNAs and STEC survival have
   been based on results from a single study sample assessed using the
   log-rank test to compare Kaplan-Meier survival curves or Cox
   proportional hazards regression analysis [63][22], [64][23], [65][24],
   [66][25]. A few other studies have employed discovery and validation
   stages in their design to increase the strength of the evidence
   supporting associations between miRNAs and STEC survival. These include
   studies that have identified differentially expressed miRNAs in STEC in
   the discovery stage and tested for association between the miRNAs and
   survival in an independent STEC study sample in the validation stage
   [67][26], [68][27], [69][28], [70][29], [71][30], [72][31]. Machine
   learning methods are also being applied to identify miRNAs associated
   with STEC survival. In a study of esophageal squamous cell carcinoma, a
   recursive feature elimination-support vector machine algorithm along
   with LASSO Cox proportional hazards regression was used to identify
   miRNAs associated with survival and build a prognostic model in a
   training sample, and the prognostic model was shown to correlate with
   survival in an independent test sample [73][32]. While these previous
   reports indicate that miRNAs have potential clinical value as
   biomarkers of prognosis in STEC, they have not addressed whether miRNAs
   can predict STEC survival time in individual patients.
   To design a personalized survival prediction model, it is necessary to
   identify biomarkers that show a robust association with survival in
   STEC patients. Accordingly, this study aimed to develop a genetic
   algorithm (GA)-based survival estimation method (GASE) that identifies
   a survival-associated miRNA signature and estimates survival time in
   patients with STEC from their miRNA expression profiles. GASE was
   developed using support vector regression (SVR) combined with an
   optimal feature selection algorithm, the inheritable bi-objective
   combinatorial genetic algorithm (IBCGA) [74][33]. The identified miRNA
   signature was analyzed further to explore the association of its
   miRNAs with STEC. The system overview of GASE is shown in the graphical
   abstract.
2. Material and methods
   The miRNA expression profiles of patients with STEC were retrieved from
   The Cancer Genome Atlas (TCGA) database. These data were generated
   using an Illumina HiSeq 2000 sequencing platform. The number of
   patients with STEC in the initial dataset was 628. After excluding the
   patients without survival information and those whose survival time was
   less than 30 days, the final dataset consisted of 123 patients with
   miRNA expression profiles and clinical data, including days to death.
   Each miRNA expression profile, consisting of 500 miRNAs, was used for
   the survival estimation procedure. For the independent validation, we
   used a cohort of 393 TCGA patients with STEC who were alive at last
   follow-up.
2.1. Survival estimation method GASE
   The two primary objectives of GASE were to estimate the survival time
   and simultaneously identify the miRNA signature associated with
   survival in patients with STEC. GASE was developed using SVR and an
   optimal feature selection algorithm, IBCGA. The optimization technique
   implemented in GASE was adopted from previous studies [75][34],
   [76][35], [77][36]. The support vector machine (SVM) is a supervised
   machine learning method that has demonstrated good prediction
   capability in solving classification and regression problems in
   various biomedical fields, especially in cancer genomics [78][37]. SVR
   uses a nonlinear transformation to find the relation between input and
   output variables by generating a hyperplane that optimally fits the
   data in the high-dimensional space and carries out the regression
   function [79][38]. The tuning of the parameters C, γ, and ν determines
   the performance of SVR; hence, parameter tuning plays a vital role in
   the SVR modeling process. For the given input data points, the SVR
   model is obtained by minimizing the following objective function.
   \[ \min \; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} \left( \xi_i + \xi_i^{*} \right) \tag{1} \]
   where ||w|| is the magnitude of the normal vector to the surface, C is
   a regularization parameter, ξ_i and ξ_i* are slack variables with
   ξ_i ≥ 0 and ξ_i* ≥ 0, and i = 1, 2, …, N.
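   As a rough illustration of Eq. (1), the sketch below evaluates the
   objective for a candidate model. It is not the GASE implementation:
   the linear predictor f(x) = w·x + b, the ε-insensitive tube width
   `eps` used to derive the slack variables, and the function name
   `svr_objective` are assumptions introduced here, and the nonlinear
   kernel and the ν parameter mentioned in the text are omitted for
   brevity.

```python
def svr_objective(w, b, X, y, C, eps):
    """Evaluate 0.5 * ||w||^2 + C * sum(xi_i + xi_i*) for a linear model.

    Assumption: slack variables come from an epsilon-insensitive tube of
    half-width `eps` around the prediction (not stated in the source).
    """
    slack_total = 0.0
    for x_i, y_i in zip(X, y):
        prediction = sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b
        residual = y_i - prediction
        xi = max(0.0, residual - eps)        # slack when the target lies above the tube
        xi_star = max(0.0, -residual - eps)  # slack when the target lies below the tube
        slack_total += xi + xi_star
    squared_norm = sum(w_j * w_j for w_j in w)
    return 0.5 * squared_norm + C * slack_total


# Toy data (hypothetical): two samples, two features.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 3.0]
objective = svr_objective(w=[1.0, 1.0], b=0.0, X=X, y=y, C=2.0, eps=0.5)
print(objective)
```

   In this toy case only the second sample falls outside the tube, so
   the slack term is C × 1.5 and the norm term is 1. A larger C weights
   such violations more heavily relative to the norm term.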
   The optimal parameters of GASE were tuned based on an intelligent
   evolutionary algorithm (IEA) [80][39]. In the optimization process,
   IBCGA [81][33] was used to identify a small set of miRNAs while
   maximizing the fitness function in terms of squared correlation
   coefficient. GASE prediction performance was evaluated using two
   metrics, squared correlation coefficient and mean absolute error. IBCGA
   effectively solves bi-objective combinatorial problems in which a small
   set of informative features is selected from a large number of
   candidate features. The applications of IBCGA in identifying biomarkers
   in cancer research have been demonstrated in previous studies [82][34],
   [83][35], [84][36], [85][40], [86][41]. In the optimal feature
   selection process, all the candidate features were encoded into binary
   variables, including the parameters C, γ, and ν of the SVR. The
   detailed steps involved in IBCGA can be found in the [87]supplementary
   methods. After identifying the miRNA signature, main effect difference
   (MED) [88][42] analysis was used to prioritize the miRNAs of the
   signature based on their contribution to the prediction performance.
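   The binary encoding of candidate features and SVR parameters described
   above can be pictured with the following sketch. It is hypothetical
   and is not the IBCGA procedure from the supplementary methods: the
   chromosome layout, the 4-bit width per parameter, the power-of-two
   grids, and the function names are all assumptions chosen only for
   illustration.

```python
def bits_to_int(bits):
    """Interpret a list of 0/1 values as an unsigned binary integer."""
    value = 0
    for bit in bits:
        value = value * 2 + bit
    return value


def decode_chromosome(bits, n_features, param_bits=4):
    """Split a binary chromosome into a feature subset and (C, gamma, nu).

    Assumed layout: n_features selection bits, followed by param_bits
    bits each for C, gamma, and nu.
    """
    if len(bits) != n_features + 3 * param_bits:
        raise ValueError("chromosome length does not match the assumed layout")
    selected = [i for i in range(n_features) if bits[i] == 1]
    start = n_features
    c_code = bits_to_int(bits[start:start + param_bits])
    gamma_code = bits_to_int(bits[start + param_bits:start + 2 * param_bits])
    nu_code = bits_to_int(bits[start + 2 * param_bits:start + 3 * param_bits])
    C = 2.0 ** (c_code - 5)               # hypothetical grid: 2^-5 .. 2^10
    gamma = 2.0 ** (gamma_code - 10)      # hypothetical grid: 2^-10 .. 2^5
    nu = (nu_code + 1) / 2 ** param_bits  # keeps nu within (0, 1]
    return selected, C, gamma, nu


# Four candidate features followed by three 4-bit parameter fields.
chromosome = [1, 0, 1, 0] + [0, 1, 0, 1] + [0, 0, 0, 0] + [0, 1, 1, 1]
selected, C, gamma, nu = decode_chromosome(chromosome, n_features=4)
print(selected, C, gamma, nu)
```

   Encoding the parameters alongside the feature mask lets a single
   search optimize the feature subset and the SVR settings together,
   which is the property the text attributes to the optimization.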
2.2. Feature appearance score
   To ensure robustness, we performed 50 independent runs of GASE and
   selected one feature set with the highest appearance score for the
   analysis. The feature appearance score (FAS) reflects how frequently
   the features of a signature appeared across the 50 independent runs. A
   feature set with a higher appearance score contains features that were
   selected more often across the independent runs than the features of
   other sets. Let S_t be the number of features in the t-th signature
   and f(m_i) the frequency with which feature m_i appears in the miRNA
   signatures. The score of the t-th signature is calculated as follows.
   \[ \text{Feature appearance score} = \frac{1}{S_t} \sum_{i=1}^{S_t} f(m_i) \tag{2} \]
   where m_i is the i-th miRNA of the t-th signature.
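   A minimal sketch of Eq. (2) follows. It assumes that f(m_i) is the
   number of independent runs whose signature contains feature m_i; the
   source does not spell out this definition, and the function names and
   toy feature labels are hypothetical.

```python
from collections import Counter


def feature_appearance_scores(signatures):
    """Return one score per signature: mean run-count of its features."""
    # Count in how many signatures (runs) each feature appears.
    counts = Counter(feature for sig in signatures for feature in set(sig))
    scores = []
    for sig in signatures:
        features = set(sig)
        scores.append(sum(counts[f] for f in features) / len(features))
    return scores


def select_signature(signatures):
    """Pick the index of the signature with the highest score."""
    scores = feature_appearance_scores(signatures)
    best = max(range(len(signatures)), key=lambda t: scores[t])
    return best, scores


# Three toy runs with placeholder feature names.
runs = [["m1", "m2"], ["m1", "m3"], ["m1", "m2", "m4"]]
best_index, fas = select_signature(runs)
print(best_index, fas)
```

   Dividing by S_t keeps large signatures from being favored simply
   because they contain more features.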
2.3. LASSO and elastic net
   To evaluate the estimation ability of GASE, we compared its prediction
   performance with that of standard regression methods, namely ridge
   [89][43], LASSO [90][44], and elastic net [91][45]. We used the miRNA
   expression profiles and survival times of 123 patients with STEC as
   input. The minimum λ was selected after 100 independent runs of LASSO
   and elastic net using 10-fold cross-validation (10-CV).
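   To make the 10-CV setup concrete, the sketch below partitions 123
   samples into ten folds. The shuffling, the fixed seed, and the
   function name `kfold_indices` are assumptions; the source does not
   describe how its folds were formed.

```python
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # assumed: random fold assignment
    folds = [indices[i::k] for i in range(k)]   # fold sizes differ by at most one
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(kfold_indices(123, k=10))
fold_sizes = sorted(len(test) for _, test in splits)
print(fold_sizes)
```

   Each candidate λ would be scored on the held-out fold of every split
   and the scores averaged before choosing a value.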
2.4. Strong evidence on miRNA-gene target interaction
   To identify the target genes of the selected miRNAs, we used the
   miRTarBase (9.0 beta) database [92][46] to extract the experimentally
   verified microRNA-target interactions (MTIs) with strong evidence,
   which are validated by reporter assay, Western blot, and qPCR.
2.5. Gene set enrichment test
   Gene-set libraries are used to organize accumulated knowledge about the
   function of groups of genes. We used Enrichr [93][47], [94][48], which
   is a web-based application that includes the latest gene-set libraries,
   to perform gene-set enrichment analysis. We ranked terms from the
   gene-set libraries using the Enrichr combined score, c, which
   multiplies the logarithm of the p-value computed using Fisher’s exact
   test by the z-score of the deviation from the expected rank:
   \[ c = \log(p) \cdot z \]
   where z is the z-score and p is the p-value.
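   The combined score can be computed directly from the two quantities,
   as in this small sketch. The natural logarithm is an assumption (the
   text only writes "log"), and the function name is hypothetical.

```python
import math


def combined_score(p_value, z_score):
    """Return c = log(p) * z for one enrichment term."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    return math.log(p_value) * z_score  # assumed: natural logarithm


# Hypothetical inputs chosen so the result is easy to check by hand.
score = combined_score(math.exp(-2.0), -3.0)
print(score)
```

   Because log(p) is negative for p < 1, a negative z-score yields a
   positive combined score, and the score grows as p shrinks.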
   This study used six gene-set libraries: 1) WikiPathway Human
   2021 [95][49], 2) Kyoto Encyclopedia of Genes and Genomes (KEGG), 3)
   MSigDB Hallmark [96][50], 4) Gene Ontology Molecular Function 2021
   [97][51], 5) Gene Ontology Biological Process, and 6) Gene Ontology
   Cellular Component.
3. Results
3.1. GASE prediction performance
   We used a survival estimation method, GASE, to identify a miRNA
   signature and estimate the survival time in patients with STEC. The
   miRNA expression profiles of 123 patients were retrieved from the
   TCGA database. GASE identified 27 miRNAs as a
   survival miRNA signature and estimated the survival time with a mean
   squared correlation coefficient (R^2) of 0.80 ± 0.01 and a mean
   absolute error (MAE) of 0.44 ± 0.25 years between actual and estimated
   survival times.
   A robust miRNA signature was selected by measuring the feature
   appearance score (FAS) across 50 independent runs of GASE. The miRNA
   signature with the highest FAS contains the miRNAs that appeared most
   frequently across the independent runs of GASE. The mean FAS
   obtained for the independent runs was 15.55 ± 1.45, while the highest
   FAS was 18.85 (shown in [98]Supplementary Fig. S1 and [99]Supplementary
   Table S1). The feature set with the highest FAS was selected for the
   analysis. This feature set obtained an R^2 of 0.80 and an MAE of
   0.43 years between actual and estimated survival times, and comprised
   27 miRNAs as a signature to estimate survival time in patients with
   STEC.
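   The two reported metrics can be computed as in the sketch below. It
   assumes that R^2 denotes the squared Pearson correlation between
   actual and estimated survival times, as the term "squared correlation
   coefficient" suggests; the function names and toy values are
   hypothetical.

```python
def squared_correlation(actual, predicted):
    """Squared Pearson correlation coefficient between two sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return (cov * cov) / (var_a * var_p)


def mean_absolute_error(actual, predicted):
    """Average absolute difference, in the units of the inputs."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy survival times in years: predictions are exactly twice the truth.
actual_years = [1.0, 2.0, 3.0, 4.0]
estimated_years = [2.0, 4.0, 6.0, 8.0]
r2 = squared_correlation(actual_years, estimated_years)
mae = mean_absolute_error(actual_years, estimated_years)
print(r2, mae)
```

   The toy data show why both metrics are useful together: a perfectly
   correlated but mis-scaled estimate still has a large absolute error.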
3.2. Prediction performance comparison and validation
   Next, we compared GASE with standard machine learning methods in terms
   of their performance in predicting survival times. The machine
   learning methods used in the comparison were ridge regression, least
   absolute shrinkage and selection operator (LASSO), and elastic net.
   Ridge regression obtained an R^2 of 0.77 and an MAE of 0.54 years
   between actual and estimated survival times. LASSO obtained an R^2 of
   0.51 and an MAE of 0.69 years, and elastic net obtained an R^2 of 0.50
   and an MAE of 0.71 years. In comparison, the best GASE model obtained
   the highest R^2, 0.83, and the lowest MAE, 0.41 years ([100]Table 1).
   The results indicated that the performance of GASE was better than
   that of the standard machine learning methods. The correlation plots
   of GASE and the other machine learning methods are shown in
   [101]Supplementary Fig. S2A-D.
Table 1.
   Prediction performance of GASE.
        Method         R^2      MAE (years) Features selected
   Ridge regression 0.77        0.54        485
   LASSO            0.51        0.69        28
   Elastic net      0.50        0.71        30
   GASE-FAS         0.80        0.43        27
   GASE-Best        0.83        0.41        32
   GASE-Mean        0.80 ± 0.01 0.44 ± 0.25 33.44 ± 3.59
   Next, the estimation ability of GASE was validated using a validation
   dataset consisting of 393 patients with STEC along with their follow-up
   times. The follow-up times of these patients ranged from 0.3 to 56
   months, with a mean of 8.09 ± 12.09 months. We estimated the survival
   times of these patients using the GASE prediction model; the mean
   predicted survival time was 17.74 ± 10.50 months. An estimated
   survival time that was higher than the patient’s follow-up time was
   considered a correct prediction, whereas an estimated survival time
   that was lower than the follow-up time was considered a prediction
   error. By this criterion, GASE achieved an accuracy of 80.41 %: the
   estimated survival times of 316 patients were higher than their
   follow-up times (mean follow-up time 4.0 ± 5.9 months; mean estimated
   survival time 19.10 ± 10.28 months), and a mean prediction error of
   12.15 months was obtained for the remaining patients. The follow-up
   and estimated survival times of these patients are shown in
   [103]Fig. 1.
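   The validation criterion can be expressed in a few lines. In this
   sketch the prediction error for a miss is taken to be the follow-up
   time minus the estimate, which is an assumption (the source does not
   define it); the function name and toy values are hypothetical.

```python
def evaluate_on_followup(follow_up, estimated):
    """Score estimates against follow-up times of patients still alive.

    An estimate above the follow-up time counts as correct; otherwise the
    shortfall (follow-up minus estimate) is recorded as the error.
    """
    correct = [e > f for f, e in zip(follow_up, estimated)]
    accuracy = sum(correct) / len(correct)
    errors = [f - e for f, e, ok in zip(follow_up, estimated, correct) if not ok]
    mean_error = sum(errors) / len(errors) if errors else 0.0
    return accuracy, mean_error


# Toy follow-up and estimated survival times in months.
follow_up_months = [2.0, 10.0, 5.0, 8.0]
estimated_months = [6.0, 4.0, 9.0, 7.0]
accuracy, mean_error = evaluate_on_followup(follow_up_months, estimated_months)
print(accuracy, mean_error)

# The reported accuracy corresponds to 316 of 393 patients.
reported_accuracy = round(100 * 316 / 393, 2)
print(reported_accuracy)
```

   Note that this criterion only checks consistency with the observed
   follow-up; it cannot tell how close an estimate is to the eventual
   survival time.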
Fig. 1.
   The GASE prediction performance on an independent test cohort of 393
   patients with follow-up times.
3.3. Ranking of miRNA signature
   The miRNAs of the identified miRNA signature were ranked based on their
   contribution towards estimating the survival time using main effect
   difference (MED) [106][42] analysis. A higher MED score indicates a
   greater contribution of the miRNA to survival time estimation, whereas
   a lower score indicates a smaller contribution. The top 10 ranked
   miRNAs according to the MED analysis are hsa-miR-760,
   hsa-miR-767-5p, hsa-miR-1301-3p, hsa-miR-891a-5p, hsa-miR-532-5p,
   hsa-miR-29a-5p, hsa-miR-16-5p, hsa-miR-130a-5p, hsa-miR-329-3p, and
   hsa-miR-496 ([107]Table 2). The prioritization of miRNAs based on their
   contribution to the survival estimation is shown in [108]Fig. 2.
Table 2.
   Ranking of miRNA signature and corresponding MED scores.
   Rank      miRNA        MIMAT-ID     MED
   1    hsa-miR-760     MIMAT0004957 1.728135
   2    hsa-miR-767-5p  MIMAT0003882 1.480966
   3    hsa-miR-1301-3p MIMAT0005797 1.344602
   4    hsa-miR-891a-5p MIMAT0004902 1.14225
   5    hsa-miR-532-5p  MIMAT0002888 1.139153
   6    hsa-miR-29a-5p  MIMAT0004503 0.887408
   7    hsa-miR-16-5p   MIMAT0000069 0.88658
   8    hsa-miR-130a-5p MIMAT0004593 0.863724
   9    hsa-miR-329-3p  MIMAT0001629 0.844311
   10   hsa-miR-496     MIMAT0002818 0.818043
   11   hsa-miR-20a-3p  MIMAT0004493 0.724058
   12   hsa-miR-125a-5p MIMAT0000443 0.63757
   13   hsa-miR-181b-5p MIMAT0000257 0.590379
   14   hsa-miR-675-3p  MIMAT0006790 0.578151
   15   hsa-miR-9-5p    MIMAT0000441 0.484588
   16   hsa-miR-664a-5p MIMAT0005948 0.425219
   17   hsa-miR-93-5p   MIMAT0000093 0.364274
   18   hsa-miR-30e-5p  MIMAT0000692 0.355408
   19   hsa-miR-376c-3p MIMAT0000720 0.345478
   20   hsa-miR-326     MIMAT0000756 0.312151
   21   hsa-miR-193a-5p MIMAT0004614 0.275742
   22   hsa-miR-532-3p  MIMAT0004780 0.268942
   23   hsa-miR-625-3p  MIMAT0004808 0.259763
   24   hsa-miR-106a-5p MIMAT0000103 0.213424
   25   hsa-let-7g-5p   MIMAT0000414 0.152833
   26   hsa-let-7f-5p   MIMAT0000067 0.04358
   27   hsa-miR-193b-5p MIMAT0004767 0.010963
Fig. 2.
   Chord diagram showing the prioritization of miRNAs of the signature
   based on their survival estimation ability in stomach and esophageal
   carcinoma. The size of the line is proportional to the percent
   contribution towards the survival estimation.
3.4. Diagnosis prediction
   The diagnostic ability of the identified miRNA signature was measured
   by distinguishing healthy individuals from STEC patients using the
   CancerMiRNome database [111][52]. The individual miRNAs that compose
   the miRNA signature had AUCs in the range of 0.49–0.94 for
   distinguishing healthy individuals from STEC patients, as shown in
   [112]Table 3. Among the signature miRNAs, 13 miRNAs (hsa-miR-93-5p,
   hsa-miR-181b-5p, hsa-miR-125a-5p, hsa-miR-1301-3p, hsa-miR-30e-5p,
   hsa-miR-767-5p, hsa-miR-16-5p, hsa-miR-675-3p, hsa-miR-326,
   hsa-miR-760, hsa-miR-20a-3p, hsa-miR-664a-5p, and hsa-miR-130a-5p)
   were good diagnostic predictors of esophageal carcinoma (ESCA)
   (AUC ≥ 0.70), as shown in [113]Fig. 3. Ten miRNAs (hsa-miR-30e-5p,
   hsa-miR-1301-3p, hsa-miR-125a-5p, hsa-miR-93-5p, hsa-miR-326,
   hsa-miR-532-5p, hsa-miR-9-5p, hsa-miR-181b-5p, hsa-miR-193a-5p, and
   hsa-let-7g-5p) were good diagnostic predictors of stomach
   adenocarcinoma (STAD) (AUC ≥ 0.70), as shown in [114]Fig. 4.
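   For readers unfamiliar with the AUC, the sketch below computes it for
   a single miRNA as the probability that a randomly chosen tumor sample
   has a higher expression value than a randomly chosen normal sample,
   counting ties as one half. This is a generic formulation, not the
   CancerMiRNome procedure, and the function name and expression values
   are hypothetical.

```python
def auc(tumor_values, normal_values):
    """Area under the ROC curve via pairwise comparison of two groups."""
    wins = 0.0
    for t in tumor_values:
        for n in normal_values:
            if t > n:
                wins += 1.0   # tumor sample ranked above normal sample
            elif t == n:
                wins += 0.5   # ties count as half
    return wins / (len(tumor_values) * len(normal_values))


# Toy expression values for one miRNA.
perfect = auc([3.0, 4.0], [1.0, 2.0])
mixed = auc([1.0, 3.0], [2.0, 4.0])
print(perfect, mixed)
```

   Under this orientation, a value near 0.5 means the miRNA does not
   separate the two groups.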
Table 3.
   Diagnosis prediction of patients with STEC using the miRNA signature.
       miRNAs      ESCA-AUC STAD-AUC
   hsa-miR-760       0.73     0.60
   hsa-miR-767-5p    0.78     0.63
   hsa-miR-1301-3p   0.82     0.82
   hsa-miR-891a-5p   0.52     0.53
   hsa-miR-532-5p    0.59     0.78
   hsa-miR-29a-5p    0.57     0.64
   hsa-miR-16-5p     0.77     0.49
   hsa-miR-130a-5p   0.70     0.56
   hsa-miR-329-3p    0.55     0.53
   hsa-miR-496       0.60     0.59
   hsa-miR-20a-3p    0.73     0.62
   hsa-miR-125a-5p   0.84     0.81
   hsa-miR-181b-5p   0.87     0.77
   hsa-miR-675-3p    0.74     0.54
   hsa-miR-9-5p      0.49     0.78
   hsa-miR-664a-5p   0.73     0.69
   hsa-miR-93-5p     0.94     0.81
   hsa-miR-30e-5p    0.82     0.85
   hsa-miR-376c-3p   0.57     0.66
   hsa-miR-326       0.74     0.81
   hsa-miR-193a-5p   0.59     0.73
   hsa-miR-532-3p    0.52     0.68
   hsa-miR-625-3p    0.68     0.5
   hsa-miR-106a-5p   0.62     0.51
   hsa-let-7g-5p     0.64     0.7
   hsa-let-7f-5p     0.50     0.62
   hsa-miR-193b-5p   0.58     0.58
   Abbreviations: ESCA, esophageal carcinoma; STAD, stomach
   adenocarcinoma; AUC, area under the receiver operating characteristic
   curve.
Fig. 3.
   Diagnosis prediction ability of miRNAs was evaluated in ESCA using ROC
   curves.
Fig. 4.
   Diagnosis prediction ability of miRNAs was evaluated in STAD using ROC
   curves.
3.5. Expression differences of the miRNA signature
   Expression difference analysis was performed to test for significant
   differences in the expression levels of the identified miRNA signature
   between normal and tumor tissues of ESCA and STAD patients using the
   CancerMiRNome database [120][52]. Fourteen miRNAs (hsa-miR-625-3p,
   hsa-miR-664a-5p, hsa-miR-326, hsa-miR-130a-5p, hsa-miR-20a-3p,
   hsa-miR-675-3p, hsa-miR-760, hsa-miR-16-5p, hsa-miR-767-5p,
   hsa-miR-1301-3p, hsa-miR-125a-5p, hsa-miR-181b-5p, hsa-miR-93-5p, and
   hsa-miR-30e-5p) showed a significant difference (p < 0.05) between
   normal and ESCA samples ([121]Table 4). Nineteen miRNAs
   (hsa-miR-664a-5p, hsa-miR-767-5p, hsa-let-7g-5p, hsa-let-7f-5p,
   hsa-miR-376c-3p, hsa-miR-29a-5p, hsa-miR-760, hsa-miR-1301-3p,
   hsa-miR-532-5p, hsa-miR-20a-3p, hsa-miR-125a-5p, hsa-miR-181b-5p,
   hsa-miR-9-5p, hsa-miR-93-5p, hsa-miR-30e-5p, hsa-miR-326,
   hsa-miR-193a-5p, hsa-miR-532-3p, and hsa-miR-193b-5p) had
   significantly different expression between normal and STAD samples
   ([122]Table 4). The expression differences of the top 10 ranked miRNAs
   between normal and ESCA samples and between normal and STAD samples
   are shown in [123]Fig. 5 and [124]Fig. 6, respectively.
Table 4.
   Expression differences of the miRNA signature between normal and tumor
   tissues.
   miRNA signature Normal vs ESCA Normal vs STAD
                      p-value        p-value
   hsa-miR-760         0.003          0.0297
   hsa-miR-767-5p      0.0004         0.0003
   hsa-miR-1301-3p     0.0001        <0.0001
   hsa-miR-891a-5p     0.8733         0.6737
   hsa-miR-532-5p      0.3987        <0.0001
   hsa-miR-29a-5p      0.3481         0.008
   hsa-miR-16-5p       0.0005         0.0911
   hsa-miR-130a-5p     0.0115         0.1496
   hsa-miR-329-3p      0.4576         0.6845
   hsa-miR-496         0.1857         0.1031
   hsa-miR-20a-3p      0.005         <0.0001
   hsa-miR-125a-5p    <0.0001        <0.0001
   hsa-miR-181b-5p    <0.0001        <0.0001
   hsa-miR-675-3p      0.0033         0.1292
   hsa-miR-9-5p        0.9717        <0.0001
   hsa-miR-664a-5p     0.0177         0.0002
   hsa-miR-93-5p      <0.0001        <0.0001
   hsa-miR-30e-5p     <0.0001        <0.0001
   hsa-miR-376c-3p     0.5389         0.0033
   hsa-miR-326         0.0117        <0.0001
   hsa-miR-193a-5p     0.3494        <0.0001
   hsa-miR-532-3p      0.8464        <0.0001
   hsa-miR-625-3p      0.023          0.9569
   hsa-miR-106a-5p     0.1783         0.9017
   hsa-let-7g-5p       0.3872         0.0003
   hsa-let-7f-5p       0.7968         0.0026
   hsa-miR-193b-5p     0.343         <0.0001
   Abbreviation: ESCA-Esophageal carcinoma, STAD-Stomach adenocarcinoma.
Fig. 5.
   Comparison of expression of the top 10 ranked miRNAs between normal and
   ESCA samples using boxplot representation (* indicates p < 0.05).
Fig. 6.
   Comparison of expression of the top 10 ranked miRNAs between normal and
   STAD samples using boxplot representation (* indicates p < 0.05).
3.6. MiRNA-gene target enrichment analysis
   There were 558 miRNA-target interactions (MTIs) with strong evidence,
   which included 32 miRNAs and 352 target genes from miRTarBase
   ([130]Supplementary Table S2). We performed gene-set enrichment
   analysis using three pathway libraries: WikiPathway, KEGG, and MSigDB
   Hallmark, as shown in [131]Fig. 7. The most highly enriched pathways in
   WikiPathway, KEGG, and MSigDB Hallmark were the somatotrophic axis and
   its relationship to dietary restriction and aging (WP4186) (adjusted
   p-value: 1.34E-10, odds ratio: 117888, combined score: 2,862,498),
   pancreatic cancer (adjusted p-value: 1.54E-34, odds ratio: 44.55,
   combined score: 3655.72), and apoptosis (adjusted p-value: 5.17E-20,
   odds ratio: 12.68, combined score: 563.12), respectively, as shown in
   [132]Supplementary Tables S3-S5. Additionally, the miRNA signature-gene
   interaction network was built using miRTarBase [133][46], TarBase v8
   [134][53], and miRecords [135][54]. The initial network contained
   28,057 edges associated with 10,525 genes. We removed the low-priority
   edges using the shortest path network measures [136][55]. The final
   network, consisting of 832 edges and 93 target genes, is shown in
   [137]Supplementary Fig. S3.
Fig. 7.
   Pathway enrichment analysis of the genes targeted by the miRNA
   signature in three categories: (A) WikiPathway, (B) KEGG pathways, and
   (C) MSigDB Hallmark.
   The Gene Ontology (GO) annotations of the target genes were in three
   categories: biological process, molecular function, and cellular
   component. The most highly enriched terms for biological process,
   molecular function, and cellular component were positive regulation of
   smooth muscle cell apoptotic process (GO:0034393), I-SMAD binding
   (GO:0070411), and serine/threonine protein kinase complex (GO:1902554),
   respectively, as shown in [140]Supplementary Figs. S4-S6 and
   [141]Supplementary Table S6.
3.7. MiRNAs in cancers
   The roles of the top 10 ranked miRNAs in various diseases and cancers
   were examined using the Human microRNA Disease Database (HMDD v3.2)
   [142][56], miRTarBase, and by reviewing the scientific literature. The
   information from these resources indicates that the top 10 ranked
   miRNAs are involved in STEC. A quantitative real-time PCR analysis
   reported
   that hsa-miR-760 is significantly downregulated in ESCA tissues and
   cell lines, suggesting that this miRNA could be used as a prognostic
   indicator [143][57]. Significant differential expression of
   hsa-miR-769-5p was observed in ESCA tissue when compared to the normal
   tissues [144][58]. Over-expression of hsa-miR-1301-3p induces cell
   proliferation and tumorigenesis in gastric cancer tissues [145][59]. Wu
   et al. reported the differential expression of hsa-miR-1301-3p in ESCA,
   suggesting that this miRNA could be used as a prognostic biomarker for
   ESCA [146][60]. Zhang and colleagues reported the downregulation of
   hsa-miR-532-5p in gastric cancer cells, and its expression is
   associated with poorer survival in patients with gastric cancer
   [147][61]. Tokumaru and colleagues demonstrated the association of
   hsa-miR-29a with overall survival in patients with gastric cancer, in
   whom lower expression of hsa-miR-29a was associated with worse overall
   survival [148][62]. Hsa-miR-16-5p has been utilized
   as a prospective biomarker for prognosis prediction in patients with
   gastric cancer and ESCA [149][63], [150][64]. Hsa-miR-130a-5p affects
   cell growth, migration and invasion by targeting cannabinoid receptor 1
   in gastric cancer cells [151][65]; it also deregulates PTEN and
   controls malignant cell survival and tumor growth in multiple cancers
   [152][66]. Hsa-miR-329-3p acts as a tumor suppressor by targeting T
   lymphoma invasion and metastasis in gastric cancer cells and could be
   utilized as a potential therapeutic target [153][67]. Hsa-miR-496 is
   downregulated in gastric cancer cell lines, and it inhibits cell
   proliferation via targeting Lyn kinase in gastric cancer cell lines
   [154][68]. Among the top 10 ranked miRNAs, the roles of two miRNAs,
   hsa-miR-769-5p and hsa-miR-891a-5p, have not been reported previously
   in either STAD or ESCA.
   Additionally, a miRNA-disease network was constructed for the miRNA
   signature using miRNet 2.0 [155][55]. The miRNAs of the signature were
   observed to be involved in several diseases. The miRNA-disease
   association network contained 12 nodes (miRNAs) with 132 edges
   associated with 85 diseases, as shown in [156]Supplementary Fig. S7.
4. Discussion
   MiRNAs provide a way to explore disease mechanisms in various cancers,
   including STEC. The clinical applications of miRNAs in cancer rely on
   identifying miRNA signatures as potential biomarkers and developing
   miRNA-target based therapeutics. Accordingly, we developed a survival
   time estimation method, GASE, to identify a miRNA signature that was
   correlated with STEC patient survival. Computational methods for
   feature selection often suffer from issues related to data quality and
   high dimensionality, especially when dealing with biomedical data. To
   address the challenges of identifying suitable biomarkers, we used an
   optimal feature selection algorithm, IBCGA, which is good at
   identifying a small number of important features from a large number of
   candidate features. This optimization method was previously utilized to
   estimate the survival time in various cancers [157][34], [158][35],
   [159][36]. In this study, we exclusively focused on identifying a miRNA
   signature in patients with STEC. The proposed method, GASE, identified
   27 miRNAs as a survival miRNA signature and performed better than
   standard machine learning methods in estimating survival time. Our
   evaluation of the diagnostic ability of the identified miRNAs revealed
   that 13 miRNAs were good diagnostic predictors (AUC ≥ 0.7) in ESCA and
   10 miRNAs in STAD. Differential expression analysis of tumor and
   normal samples from patients with STEC revealed that several miRNAs
   were significantly differentially expressed between the two sample
   types. Further, previous reports provide evidence supporting the
   importance of the top 10 ranked miRNAs of the signature in STEC.
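   The AUC ≥ 0.7 screening criterion described above can be illustrated
   with a minimal sketch. It is not the authors' implementation: the
   function names, the toy expression values, and the placeholder miRNA
   names are hypothetical, and flipping AUC values below 0.5 (so that
   downregulated miRNAs also count as discriminative) is an assumption
   made here for illustration.

```python
from typing import Dict, List, Sequence


def auc_from_scores(tumor: Sequence[float], normal: Sequence[float]) -> float:
    """AUC as the fraction of (tumor, normal) pairs in which the tumor
    value is higher; ties count as half a win."""
    wins = 0.0
    for t in tumor:
        for n in normal:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor) * len(normal))


def diagnostic_mirnas(
    expr_tumor: Dict[str, Sequence[float]],
    expr_normal: Dict[str, Sequence[float]],
    threshold: float = 0.7,
) -> List[str]:
    """Return the miRNAs whose single-feature AUC reaches the threshold."""
    selected = []
    for name in expr_tumor:
        auc = auc_from_scores(expr_tumor[name], expr_normal[name])
        # Assumption: direction does not matter, so an AUC below 0.5
        # (downregulated in tumor) is mirrored around 0.5.
        auc = max(auc, 1.0 - auc)
        if auc >= threshold:
            selected.append(name)
    return selected


# Toy data with placeholder names (not miRNAs from the signature).
toy_tumor = {"miR-A": [5, 6, 7], "miR-B": [1, 2, 3, 4], "miR-C": [1, 2]}
toy_normal = {"miR-A": [1, 2, 3], "miR-B": [1, 2, 3, 4], "miR-C": [3, 4]}
print(diagnostic_mirnas(toy_tumor, toy_normal))
```

   In this toy example, the first and third placeholder miRNAs separate
   the two groups perfectly (in opposite directions), whereas the second
   does not separate them at all and is excluded.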
   The miRNA-gene target interaction analysis showed that the target genes
   were highly enriched in the pathway describing the somatotrophic axis
   and its relationship to dietary restriction and aging (WP4186). The
   somatotrophic
   axis in mammals involves signaling by growth hormone (GH), which is
   produced by the anterior pituitary, and its secondary mediator,
   insulin-like growth factor 1 (IGF-1). In a previous study, growth
   hormone–releasing hormone and its receptor (GHRH-R) were found
   primarily in the anterior pituitary gland, gastric cancers, other solid
   tumors, and lymphomas. Increased levels of GHRH-R in tumor samples from
   patients with gastric cancer are associated with poor outcomes
   [160][69]. Another important enriched pathway of the miRNA signature
   was the transforming growth factor beta (TGF-β) signaling pathway.
   TGF-β is
   a cytokine that participates in both physiological and pathological
   processes including tumorigenesis [161][70]. During tumor progression,
   TGF-β signaling regulates the immune/inflammatory response and the
   tumor microenvironment. It also regulates tumor growth,
   epithelial-mesenchymal transition (EMT), and cancer cell stemness
   depending on tumor stage and cellular context [162][71]. EMT is also an
   enriched pathway from MSigDB Hallmark (adjusted p-value: 8.24E-19, Odds
   ratio: 11.13, combined score: 506.81), which is consistent with this
   biological mechanism. Abnormal TGF-β signaling has been associated with
   progression of gastrointestinal cancer [163][72], which includes
   esophageal, gastric, liver, colorectal, and pancreatic carcinomas that,
   collectively, are major causes of cancer-related deaths worldwide
   [164][73]. Several TGF-β-based therapeutics have been developed for the
   treatment of gastrointestinal cancers and have displayed efficacy in
   clinical trials [165][74], [166][75]. Additional support for the role
   of TGF-β signaling in STEC was obtained from the GO annotation
   analysis, which showed that I-SMAD binding (GO:0070411) was enriched in
   the GO molecular function category (adjusted p-value: 1.03E-06).
   Nuclear accumulation of active SMAD complexes is crucial for the
   transduction of TGF-β superfamily signals from transmembrane receptors
   to the nucleus.
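   Enrichment statistics of the kind quoted above (p-value and odds
   ratio) are commonly derived from a 2 × 2 overlap table between the
   target gene list and a pathway gene set. The following is a minimal
   sketch of such an over-representation test, assuming a one-sided
   hypergeometric test; it is not necessarily the procedure used by the
   enrichment tool in this study, it omits multiple-testing adjustment,
   and the function name and the counts in the example are hypothetical.

```python
from math import comb
from typing import Tuple


def enrichment(
    overlap: int, n_targets: int, n_pathway: int, n_background: int
) -> Tuple[float, float]:
    """One-sided hypergeometric p-value and sample odds ratio.

    overlap      : target genes that are also in the pathway
    n_targets    : size of the target gene list
    n_pathway    : size of the pathway gene set
    n_background : size of the gene universe
    """
    # P(X >= overlap) when n_targets genes are drawn without replacement
    # from a universe containing n_pathway pathway genes.
    max_k = min(n_targets, n_pathway)
    total = comb(n_background, n_targets)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_targets - k)
        for k in range(overlap, max_k + 1)
    )
    p_value = tail / total

    # 2 x 2 table: rows = in target list or not, columns = in pathway or not.
    a = overlap
    b = n_targets - overlap
    c = n_pathway - overlap
    d = n_background - n_targets - n_pathway + overlap
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    return p_value, odds_ratio


# Hypothetical counts, for illustration only.
p, odds = enrichment(overlap=1, n_targets=2, n_pathway=2, n_background=4)
print(p, odds)
```

   A small p-value together with an odds ratio well above 1 indicates that
   the pathway genes are over-represented among the targets relative to
   the background.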
   The top hits for gene target enrichment analysis also indicated that
   the miRNA signature was related to miRNAs involved in DNA damage
   response, epidermal growth factor receptor tyrosine kinase inhibitor
   resistance, apoptosis, Wnt/beta-catenin signaling, and angiogenesis.
   DNA damage response pathways are known to be related to therapy
   resistance in STEC [167][76], [168][77], and resistance to epidermal
   growth factor receptor tyrosine kinase inhibitors is relevant to
   survival in STEC, consistent with the use of these inhibitors as
   targeted therapy in STEC [169][78], [170][79]. The Wnt/beta-catenin
   signaling pathway has been implicated in cancer progression in STEC
   [171][80], and the dysregulation of apoptosis and angiogenesis is known
   to promote tumor growth [172][81], [173][82]. These findings suggest
   that the miRNAs in the signature and their putative gene targets are
   possible molecular targets for the development of new therapies for
   STEC.
   In addition to being associated with survival, the miRNAs in the
   signature could discriminate between healthy individuals and patients
   with STEC, and were differentially expressed between the healthy and
   tumor tissues of patients with STEC. This suggests that these miRNAs
   are capable of functioning as prognostic or diagnostic biomarkers.
   Further
   investigation is needed to determine the utility of the miRNA signature
   as a prognostic biomarker for monitoring response to therapy or
   predicting survival after therapy in STEC patients and as a biomarker
   for early STEC diagnosis. Other questions for study are whether the
   miRNA signature can perform as a biomarker in STEC of different types
   and stages, and whether the miRNA signature can be detected in blood at
   a level of accuracy comparable to that in tumor tissue (to allow for
   the possibility of performing liquid biopsies for biomarker detection).
   In conclusion, a better understanding of the miRNA signature in
   survival predictions will aid in developing treatment strategies for
   STEC. We anticipate that the miRNA signature identified here could help
   in understanding the roles of miRNAs in STEC and developing miRNA-based
   cancer therapeutics.
Funding
   This work was supported in part by the Marshfield Clinic Research
   Institute, Marshfield, WI. The funders had no role in the study design,
   data collection and analysis, decision to publish, or preparation of
   the manuscript.
Author contributions
   S.Y.S. designed the system, carried out the detailed study, and
   supervised the study. S.Y.S., M.T., T.C., P.A., S.K.S., A.B., and
   S.Y.H. participated in data analysis and manuscript preparation and
   discussed the results. All authors have read and approved the final
   manuscript.
Availability of data and materials
   All data used in this analysis are available from the TCGA data portal
   ([174]https://portal.gdc.cancer.gov/).
Ethics approval and consent to participate
   Not applicable.
Consent to publish
   Not applicable.
CRediT authorship contribution statement
   Srinivasulu Yerukala Sathipati: Conceptualization, Data curation,
   Writing – original draft, Formal analysis, Funding acquisition,
   Investigation, Methodology, Project administration, Supervision.
   Ming-Ju Tsai: Validation, Visualization, Formal analysis. Tonia Carter:
   Formal analysis, Data curation, Writing - review & editing. Patrick
   Allaire: Formal analysis. Sanjay K Shukla: Formal analysis. Afshin
   Beheshti: Formal analysis, Writing - review & editing. Shinn-Ying Ho:
   Formal analysis.
Declaration of Competing Interest
   The authors declare that they have no known competing financial
   interests or personal relationships that could have appeared to
   influence the work reported in this paper.
Footnotes
   Appendix A
   Supplementary data to this article can be found online at
   [175]https://doi.org/10.1016/j.csbj.2022.08.025.
Appendix A. Supplementary data
   The following are the Supplementary data to this article:
   Supplementary data 1
   [176]mmc1.docx (1MB, docx)
   Supplementary data 2
   [177]mmc2.xlsx (41.1KB, xlsx)
   Supplementary data 3
   [178]mmc3.xlsx (60.9KB, xlsx)
   Supplementary data 4
   [179]mmc4.xlsx (37.5KB, xlsx)
   Supplementary data 5
   [180]mmc5.xlsx (14.3KB, xlsx)
   Supplementary data 6
   [181]mmc6.xlsx (33.5KB, xlsx)
References