Graphical abstract
System overview of GASE. (A) MicroRNA expression profiles and survival
time of patients with stomach and esophageal carcinoma are input of the
system. (B) GASE method development and (C) microRNA discovery and
analysis.
Keywords: miRNA signature, Machine learning, Survival estimation,
Stomach and esophageal carcinoma
Abstract
Identifying a miRNA signature associated with survival will open a new
window for developing miRNA-targeted treatment strategies in stomach
and esophageal cancers (STEC). Here, using data from The Cancer Genome
Atlas on 516 patients with STEC, we developed a Genetic Algorithm-based
Survival Estimation method, GASE, to identify a miRNA signature that
could estimate survival in patients with STEC. GASE identified 27
miRNAs as a survival miRNA signature and estimated the survival time
with a mean squared correlation coefficient of 0.80 ± 0.01 and a mean
absolute error of 0.44 ± 0.25 years between actual and estimated
survival times, and showed a good estimation capability on an
independent test cohort. The miRNAs of the signature were prioritized
and analyzed to explore their roles in STEC. Analysis of the diagnostic
ability of the identified miRNA signature highlighted some critical
miRNAs in STEC. Further, miRNA-gene target enrichment analysis
revealed the involvement of these miRNAs in various pathways, including
the somatotrophic axis in mammals that involves the growth hormone and
transforming growth factor beta signaling pathways, and gene ontology
annotations. The identified miRNA signature provides evidence for
survival-related miRNAs and their involvement in STEC, which would aid
in developing miRNA-target based therapeutics.
1. Introduction
Stomach and esophageal carcinomas (STEC) are among the most prevalent
malignant diseases causing thousands of deaths globally. Worldwide,
stomach cancer ranks sixth in cancer incidence, with 1,089,103 new
cases, and third in cancer mortality, with 768,793 deaths, while
esophageal cancer ranks tenth in cancer incidence, with 604,100 new
cases, and sixth in cancer mortality, with 544,076 deaths, based on
estimates for the year 2020 [1]. STEC ranks higher in mortality
than incidence because these cancers are often first diagnosed at an
advanced stage. In the United States, diagnosis occurs at a localized,
regional, and distant stage in 28 %, 32 %, and 40 %, respectively, of
stomach cancer cases, and in 25 %, 29 %, and 31 %, respectively, of
esophageal cancer cases [2], [3]. For localized, regional, and
metastatic disease, five-year survival is 64 %, 28.2 %, and 5.3 %,
respectively, for stomach cancer, and 46.7 %, 25.1 %, and 4.8 %,
respectively, for esophageal cancer [2], [3]. Treatment for
STEC is selected based on disease stage [4], [5]. Surgery can
be curative but is offered mainly in early disease stages. Chemotherapy
and chemoradiotherapy provide an added survival benefit to surgery in
early-stage disease and are offered without surgery in later disease
stages. Targeted therapies (e.g., Trastuzumab, an inhibitor of human
epidermal growth factor receptor 2) improve survival in STEC and are
increasingly being used in STEC treatment [43][6], and immunotherapy
and other emerging therapies continue to be evaluated for improvement
in STEC survival [44][4], [45][5].
Biomarkers associated with STEC survival are potential targets for
designing new STEC treatments to improve patient survival [46][7],
[47][8]. MicroRNAs (miRNAs) function as oncogenes or tumor suppressor
genes in STEC [48][6], [49][7] and have been investigated as biomarkers
of STEC diagnosis and prognosis [50][9], [51][10]. Roles for miRNAs in
STEC progression and survival have been described in several reports.
For example, low levels of miR-148a, a miRNA that suppresses cell
invasion and migration, are associated with advanced clinical stage and
poor prognosis in stomach cancer [11]. MiR-616-3p promotes
angiogenesis and metastasis and is correlated with poor prognosis in
stomach cancer [53][12]. Elevated miR-21 expression is linked to lymph
node metastasis [54][13] and poor prognosis [55][14] in esophageal
cancer. MiR-375 targets proteins involved in cancer cell proliferation
and invasion [56][15], and its downregulation is associated with
advanced cancer staging and poor prognosis in esophageal squamous cell
carcinoma [57][16]. Aberrant miRNA expression has also been identified
in STEC. Hwang et al. identified miRNAs, including miR-601, miR-107,
miR-18a, miR-370, miR-300, and miR-96, that were significantly expressed
in early gastric cancers when compared to normal samples [17]. A
serum biomarker miRNA panel consisting of 12 miRNAs was developed for
risk assessment in patients with gastric cancer [18]. Furthermore,
several dysregulated miRNAs that regulate carcinogenesis have been found
in esophageal tumors [19], [20]. An RT-qPCR study of patients with
esophageal carcinoma revealed three miRNAs, miR-34a-5p, miR-148a-3p,
and miR-181a-5p, that were associated with cancer progression [21].
In most studies, associations between miRNAs and STEC survival have
been based on results from a single study sample assessed using the
log-rank test to compare Kaplan-Meier survival curves or Cox
proportional hazards regression analysis [63][22], [64][23], [65][24],
[66][25]. A few other studies have employed discovery and validation
stages in their design to increase the strength of the evidence
supporting associations between miRNAs and STEC survival. These include
studies that have identified differentially expressed miRNAs in STEC in
the discovery stage and tested for association between the miRNAs and
survival in an independent STEC study sample in the validation stage
[67][26], [68][27], [69][28], [70][29], [71][30], [72][31]. Machine
learning methods are also being applied to identify miRNAs associated
with STEC survival. In a study of esophageal squamous cell carcinoma, a
recursive feature elimination-support vector machine algorithm along
with LASSO Cox proportional hazards regression was used to identify
miRNAs associated with survival and build a prognostic model in a
training sample, and the prognostic model was shown to correlate with
survival in an independent test sample [32]. While these previous
reports indicate that miRNAs have potential clinical value as
biomarkers of prognosis in STEC, they have not addressed whether miRNAs
can predict STEC survival time in individual patients.
To design a personalized survival prediction model, it is necessary to
identify biomarkers that show a robust association with survival in
STEC patients. Accordingly, this study aimed to develop a genetic
algorithm (GA)-based survival estimation method (GASE) that identifies a
survival-associated miRNA signature and estimates survival time in
patients with STEC from miRNA expression profiles. GASE was developed
using support vector regression (SVR) combined with an optimal feature
selection algorithm, the inheritable bi-objective combinatorial genetic
algorithm (IBCGA) [33]. The identified miRNA signature was analyzed
further to explore miRNA association with STEC. The system overview of
GASE is shown in the graphical abstract.
2. Material and methods
The miRNA expression profiles of patients with STEC were retrieved from
The Cancer Genome Atlas (TCGA) database. These data were generated
using an Illumina HiSeq 2000 sequencing platform. The number of
patients with STEC in the initial dataset was 628. After excluding the
patients without survival information and those whose survival time was
less than 30 days, the final dataset consisted of 123 patients with
miRNA expression profiles and clinical data, including days to death.
Each miRNA expression profile consisting of 500 miRNAs was used for the
survival estimation procedure. For the independent validation, we used
a cohort of 393 patients who were alive with STEC at last follow-up in
the TCGA.
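The exclusion rule described above can be illustrated with a minimal sketch. The record layout (a list of dictionaries with a `days_to_death` key) and the function name are hypothetical choices made for this illustration, not the pipeline used in the study; only the two exclusion criteria come from the text.

```python
MIN_SURVIVAL_DAYS = 30  # threshold stated in the text


def select_training_cohort(patients, min_days=MIN_SURVIVAL_DAYS):
    """Keep patients that have survival information and survived at least min_days.

    `patients` is a list of dicts; the key "days_to_death" is a hypothetical
    field name used only for this sketch.
    """
    kept = []
    for patient in patients:
        days = patient.get("days_to_death")
        if days is None:        # no survival information: exclude
            continue
        if days < min_days:     # survival time shorter than 30 days: exclude
            continue
        kept.append(patient)
    return kept


# Toy records (made up for illustration, not TCGA data)
toy_patients = [
    {"id": "P1", "days_to_death": 400},
    {"id": "P2", "days_to_death": None},
    {"id": "P3", "days_to_death": 12},
    {"id": "P4", "days_to_death": 30},
]
training = select_training_cohort(toy_patients)
print([p["id"] for p in training])  # ['P1', 'P4']
```

A patient with exactly 30 days is retained here because the text excludes only survival times of less than 30 days.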
2.1. Survival estimation method GASE
The two primary objectives of GASE were to estimate the survival time
and simultaneously identify the miRNA signature associated with
survival in patients with STEC. GASE was developed using SVR and an
optimal feature selection algorithm, IBCGA. The optimization technique
implemented in GASE was adopted from previous studies [34], [35], [36].
The support vector machine (SVM) is a supervised machine learning
method that has demonstrated good prediction capability in solving
classification and regression problems in various biomedical fields,
especially in cancer genomics [37]. SVR uses a nonlinear transformation
to find the relation between input and output variables by generating a
hyperplane that optimally fits in the high dimensional space and
carries out the regression function [38]. The tuning of the
parameters C, γ, and ν determines the performance of SVR; hence
parameter tuning plays a vital role in the SVR modeling process. For
the given input data points, SVR training minimizes the following
objective function.
\[ \min \; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} \left( \xi_i + \xi_i^{*} \right) \tag{1} \]
where ||w|| is the magnitude of the normal vector to the surface, C is a
regularization parameter, ξ_i and ξ_i* are slack variables, ξ_i ≥ 0,
ξ_i* ≥ 0, and i = 1, 2, …, N.
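To make the terms of Eq. (1) concrete, the following sketch evaluates the objective for a fixed linear model. It assumes an ε-insensitive tube of half-width `epsilon` to define the slack variables; ε is not named in the text (which tunes ν instead), so it is an assumption of this illustration, as are the function name and the toy numbers.

```python
def svr_objective(w, b, X, y, C, epsilon):
    """Evaluate 0.5*||w||^2 + C * sum(xi_i + xi_i*) for f(x) = w.x + b.

    Slack variables follow the usual epsilon-insensitive definition
    (an assumption of this sketch):
      xi_i  = max(0, y_i - f(x_i) - epsilon)   # point above the tube
      xi_i* = max(0, f(x_i) - y_i - epsilon)   # point below the tube
    """
    total_slack = 0.0
    for x_i, y_i in zip(X, y):
        f_i = sum(w_j * x_ij for w_j, x_ij in zip(w, x_i)) + b
        xi = max(0.0, y_i - f_i - epsilon)
        xi_star = max(0.0, f_i - y_i - epsilon)
        total_slack += xi + xi_star
    regularizer = 0.5 * sum(w_j * w_j for w_j in w)
    return regularizer + C * total_slack


# Toy example: three points, one inside the tube and two outside it
w, b = [1.0, 0.0], 0.0
X = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
y = [1.0, 2.5, 2.0]
value = svr_objective(w, b, X, y, C=2.0, epsilon=0.2)
print(round(value, 6))  # approximately 2.7 = 0.5 + 2.0 * (0.3 + 0.8)
```

A larger C penalizes points outside the tube more heavily, which is why the choice of C affects the fitted model.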
The optimal parameters of GASE were tuned based on an intelligent
evolutionary algorithm (IEA) [39]. In the optimization process,
IBCGA [33] was used to identify a small set of miRNAs while
maximizing the fitness function in terms of squared correlation
coefficient. GASE prediction performance was evaluated using two
metrics, squared correlation coefficient and mean absolute error. IBCGA
effectively solves bi-objective combinatorial problems in which a small
set of informative features is selected from a large number of
candidate features. The applications of IBCGA in identifying biomarkers
in cancer research have been demonstrated in previous studies [34],
[35], [36], [40], [41]. In the optimal feature selection process, all
the candidate features, together with the parameters C, γ, and ν of the
SVR, were encoded into binary variables. The detailed steps involved in
IBCGA can be found in the supplementary methods. After identifying the
miRNA signature, main effect difference (MED) [42] analysis was used to
prioritize the miRNAs of the signature based on their contribution to
the prediction performance.
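The two evaluation metrics named above can be written out as a short sketch. It assumes that the squared correlation coefficient is the square of the Pearson correlation between actual and estimated survival times; the function names and the toy values are illustrative only.

```python
def squared_correlation(actual, estimated):
    """Square of the Pearson correlation between two equal-length sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_e = sum(estimated) / n
    cov = sum((a - mean_a) * (e - mean_e) for a, e in zip(actual, estimated))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return (cov * cov) / (var_a * var_e)


def mean_absolute_error(actual, estimated):
    """Average absolute difference, in the same unit as the inputs (e.g. years)."""
    return sum(abs(a - e) for a, e in zip(actual, estimated)) / len(actual)


# Toy survival times in years: every estimate is off by a constant 0.5 years
actual_years = [1.0, 2.0, 3.0, 4.0]
estimated_years = [1.5, 2.5, 3.5, 4.5]
print(squared_correlation(actual_years, estimated_years))  # 1.0
print(mean_absolute_error(actual_years, estimated_years))  # 0.5
```

The toy example shows why the two metrics are reported together: a constant offset leaves the squared correlation at 1.0, while the mean absolute error still exposes the 0.5-year bias.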
2.2. Feature appearance score
To ensure robustness, we performed 50 independent runs of GASE and
selected the feature set with the highest appearance score for the
analysis. The feature appearance score (FAS) reflects how frequently
the features of a set appeared across the 50 independent runs. A feature
set with a higher appearance score contains features that were selected
more often across the independent runs than the features of other sets.
There are S_t features in the t-th signature. The FAS of the t-th
signature is the mean frequency of its features:
\[ \mathrm{FAS}_t = \frac{1}{S_t} \sum_{i=1}^{S_t} f(m_i) \tag{2} \]
where m_i is the i-th miRNA of the t-th signature and f(m_i) is the
frequency with which m_i appeared in the independent runs.
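Equation (2) can be sketched in a few lines. The example assumes that f(m) is the number of independent runs whose signature contains miRNA m, counting the run under evaluation as well; whether the original implementation counts it the same way is not stated, and the miRNA names below are placeholders.

```python
from collections import Counter


def feature_appearance_scores(signatures):
    """Return the FAS of each signature: the mean run-frequency of its features.

    `signatures` is a list with one collection of feature names per run.
    """
    # f(m): number of runs in which feature m was selected
    frequency = Counter(m for sig in signatures for m in set(sig))
    scores = []
    for sig in signatures:
        features = set(sig)
        scores.append(sum(frequency[m] for m in features) / len(features))
    return scores


# Three toy runs with placeholder feature names
runs = [
    {"miR-a", "miR-b", "miR-c"},
    {"miR-a", "miR-b"},
    {"miR-a", "miR-d"},
]
scores = feature_appearance_scores(runs)
print(scores)                     # [2.0, 2.5, 2.0]
print(scores.index(max(scores)))  # 1
```

In this toy case the second run is selected because its features recur most often across runs.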
2.3. LASSO and elastic net
To evaluate the estimation ability of GASE, we compared its prediction
performance with that of standard regression methods, including ridge
[43], LASSO [44], and elastic net [45]. We used the miRNA
expression profiles and survival time of 123 patients with STEC as
input. The minimum λ was selected after 100 independent runs of LASSO
and elastic net using 10-fold cross-validation (10-CV).
2.4. Strong evidence on miRNA-gene target interaction
To identify the target genes of the selected miRNAs, we used the
miRTarBase (9.0 beta) database [92][46] to extract the experimentally
verified microRNA-target interactions (MTIs) with strong evidence,
which are validated by reporter assay, Western blot, and qPCR.
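The strong-evidence filter can be sketched as follows. The record fields (`mirna`, `target_gene`, `experiments`) and the gene names are hypothetical stand-ins chosen for this illustration; they are not the actual miRTarBase column names, and the matching rule (case-insensitive substring match on the assay name) is an assumption.

```python
# Assay types the text treats as strong evidence
STRONG_ASSAYS = ("reporter assay", "western blot", "qpcr")


def strong_evidence_mtis(records, signature):
    """Return unique (miRNA, target gene) pairs backed by a strong-evidence assay.

    `records` is a list of dicts with hypothetical keys "mirna",
    "target_gene", and "experiments" (free-text list of assays).
    """
    wanted = set(signature)
    pairs = set()
    for rec in records:
        experiments = rec["experiments"].lower()
        has_strong = any(assay in experiments for assay in STRONG_ASSAYS)
        if rec["mirna"] in wanted and has_strong:
            pairs.add((rec["mirna"], rec["target_gene"]))
    return sorted(pairs)


# Toy records with placeholder gene names
toy_records = [
    {"mirna": "hsa-miR-760", "target_gene": "GENE_A",
     "experiments": "Luciferase reporter assay//Western blot"},
    {"mirna": "hsa-miR-760", "target_gene": "GENE_B",
     "experiments": "Microarray"},
    {"mirna": "hsa-miR-21-5p", "target_gene": "GENE_C",
     "experiments": "qPCR"},
]
print(strong_evidence_mtis(toy_records, ["hsa-miR-760"]))
# [('hsa-miR-760', 'GENE_A')]
```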
2.5. Gene set enrichment test
Gene-set libraries are used to organize accumulated knowledge about the
function of groups of genes. We used Enrichr [47], [48], which
is a web-based application that includes the latest gene-set libraries,
to perform gene-set enrichment analysis. Enrichr ranks terms from
gene-set libraries by a combined score c, which multiplies the
logarithm of the p-value computed using Fisher’s exact test by the
z-score of the deviation from the expected rank:
\[ c = \log(p) \cdot z \]
where z is the z-score and p is the p-value.
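The two ingredients of the combined score can be illustrated with a small sketch. It assumes a one-sided (over-representation) Fisher's exact test computed from the hypergeometric distribution and a natural logarithm in c = log(p)·z; neither detail is specified in the text, and the z-score is taken as a given input because its computation depends on precomputed expected ranks.

```python
import math


def fisher_right_tail(overlap, gene_set_size, query_size, background_size):
    """One-sided Fisher's exact p-value: P(X >= overlap) under the
    hypergeometric distribution (over-representation of the gene set)."""
    upper = min(gene_set_size, query_size)
    total = math.comb(background_size, query_size)
    tail = sum(
        math.comb(gene_set_size, k)
        * math.comb(background_size - gene_set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def combined_score(p_value, z_score):
    """c = log(p) * z, using the natural logarithm (an assumption)."""
    return math.log(p_value) * z_score


# Toy numbers: 2 of 2 query genes fall in a 2-gene set, background of 4 genes
p = fisher_right_tail(overlap=2, gene_set_size=2, query_size=2, background_size=4)
print(p)                         # 1/6, about 0.1667
print(combined_score(p, -1.5))   # positive: log(p) < 0 and z < 0
```

Because log(p) is negative for p < 1, a negative z-score yields a positive combined score, so smaller p-values and more negative z-scores both increase c.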
This study used six gene-set libraries: 1) WikiPathway Human
2021 [49], 2) Kyoto Encyclopedia of Genes and Genomes (KEGG), 3)
MSigDB Hallmark [50], 4) Gene Ontology Molecular Function 2021
[51], 5) Gene Ontology Biological Process, and 6) Gene Ontology
Cellular Component.
3. Results
3.1. GASE prediction performance
We used a survival estimation method, GASE, to identify a miRNA
signature and estimate the survival time in patients with STEC. One
hundred and twenty-three patients with miRNA expression profiles were
retrieved from the TCGA database. GASE identified 27 miRNAs as a
survival miRNA signature and estimated the survival time with a mean
squared correlation coefficient (R^2) of 0.80 ± 0.01 and a mean
absolute error (MAE) of 0.44 ± 0.25 years between actual and estimated
survival times.
A robust miRNA signature was selected by measuring the feature
appearance score (FAS) using 50 independent runs of GASE. A miRNA
signature with the highest FAS indicates higher frequencies of miRNAs
in the signature across the independent runs of GASE. The mean FAS
obtained for the independent runs was 15.55 ± 1.45, while the highest
FAS was 18.85 (shown in Supplementary Fig. S1 and Supplementary
Table S1). The feature set with the highest FAS was selected for the
analysis. This feature set obtained an R^2 of 0.80 and an MAE of
0.43 years between actual and estimated survival times, and selected 27
miRNAs as a signature to estimate survival time in patients with STEC.
3.2. Prediction performance comparison and validation
Next, we compared GASE with some standard machine learning methods on
their performance in predicting survival times. The machine learning
methods used in the comparison included ridge regression, least
absolute shrinkage and selection operator (LASSO), and elastic net.
Ridge regression obtained an R^2 of 0.77 and an MAE of 0.54 years
between actual and estimated survival times. LASSO obtained an R^2 of
0.51 and an MAE of 0.69 years, and elastic net obtained an R^2 of 0.50
and an MAE of 0.71 years. In comparison, GASE obtained the highest R^2
of 0.83 and the lowest MAE of 0.41 years between actual and estimated
survival times (Table 1). The results indicated that
the performance of GASE was better than that of the standard machine
learning methods. The correlation plots of GASE and the other machine
learning methods are shown in Supplementary Fig. S2A-D.
Table 1.
Prediction performance of GASE.
Method              R^2            MAE (years)    Features selected
Ridge regression    0.77           0.54           485
LASSO               0.51           0.69           28
Elastic net         0.50           0.71           30
GASE-FAS            0.80           0.43           27
GASE-Best           0.83           0.41           32
GASE-Mean           0.80 ± 0.01    0.44 ± 0.25    33.44 ± 3.59
Next, the estimation ability of GASE was validated using a validation
dataset consisting of 393 patients with STEC along with their follow-up
times. The follow-up times of these patients were in the range of
0.3–56 months, with a mean of 8.09 ± 12.09 months. We estimated the
survival times of these patients using the GASE prediction model; the
mean predicted survival time was 17.74 ± 10.50 months. Because these
patients were alive at last follow-up, the results were interpreted as
follows: an estimated survival time that was higher than the patient’s
follow-up time was considered a correct prediction, whereas an
estimated survival time that was lower than the follow-up time was
considered a prediction error. By this criterion, GASE achieved an
accuracy of 80.41 % (316 of 393 patients). For the 316 correctly
predicted patients, the mean follow-up time was 4.0 ± 5.9 months and
the mean estimated survival time was 19.10 ± 10.28 months. A mean
prediction error of 12.15 months was obtained for the remaining
patients. The follow-up and estimated survival times of these
patients are shown in Fig. 1.
Fig. 1.
The GASE prediction performance on an independent test cohort of 393
patients with follow-up times.
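The validation criterion used for the follow-up cohort can be expressed as a short sketch. It treats an estimate above the follow-up time as correct, as described in the text, and assumes that the prediction error of an incorrect estimate is the follow-up time minus the estimate; that error definition, the function name, and the toy values are assumptions of this illustration.

```python
def censored_validation(follow_up, estimated):
    """Score estimates for patients who were alive at last follow-up.

    Returns (accuracy in percent, mean shortfall of the incorrect estimates).
    An estimate is correct when it exceeds the observed follow-up time.
    """
    correct = [e > f for f, e in zip(follow_up, estimated)]
    accuracy = 100.0 * sum(correct) / len(correct)
    shortfalls = [
        f - e for f, e, ok in zip(follow_up, estimated, correct) if not ok
    ]
    mean_error = sum(shortfalls) / len(shortfalls) if shortfalls else 0.0
    return accuracy, mean_error


# Toy values in months (made up for illustration)
follow_up_months = [2.0, 10.0, 30.0, 5.0]
estimated_months = [12.0, 15.0, 20.0, 9.0]
print(censored_validation(follow_up_months, estimated_months))  # (75.0, 10.0)
```

This criterion is one-sided: it can show that an estimate is too short, but it cannot show that an estimate is too long, since the true survival time of a living patient is unknown.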
3.3. Ranking of miRNA signature
The miRNAs of the identified miRNA signature were ranked based on their
contribution towards estimating the survival time using main effect
difference (MED) [42] analysis. A higher MED score represents a
greater contribution of the miRNA to survival time estimation, whereas
a lower MED score represents a smaller contribution. The top 10 ranked
miRNAs according to the MED analysis were hsa-miR-760,
hsa-miR-767-5p, hsa-miR-1301-3p, hsa-miR-891a-5p, hsa-miR-532-5p,
hsa-miR-29a-5p, hsa-miR-16-5p, hsa-miR-130a-5p, hsa-miR-329-3p, and
hsa-miR-496 (Table 2). The prioritization of miRNAs based on their
contribution to the survival estimation is shown in Fig. 2.
Table 2.
Ranking of miRNA signature and corresponding MED scores.
Rank miRNA MIMAT-ID MED
1 hsa-miR-760 MIMAT0004957 1.728135
2 hsa-miR-767-5p MIMAT0003882 1.480966
3 hsa-miR-1301-3p MIMAT0005797 1.344602
4 hsa-miR-891a-5p MIMAT0004902 1.14225
5 hsa-miR-532-5p MIMAT0002888 1.139153
6 hsa-miR-29a-5p MIMAT0004503 0.887408
7 hsa-miR-16-5p MIMAT0000069 0.88658
8 hsa-miR-130a-5p MIMAT0004593 0.863724
9 hsa-miR-329-3p MIMAT0001629 0.844311
10 hsa-miR-496 MIMAT0002818 0.818043
11 hsa-miR-20a-3p MIMAT0004493 0.724058
12 hsa-miR-125a-5p MIMAT0000443 0.63757
13 hsa-miR-181b-5p MIMAT0000257 0.590379
14 hsa-miR-675-3p MIMAT0006790 0.578151
15 hsa-miR-9-5p MIMAT0000441 0.484588
16 hsa-miR-664a-5p MIMAT0005948 0.425219
17 hsa-miR-93-5p MIMAT0000093 0.364274
18 hsa-miR-30e-5p MIMAT0000692 0.355408
19 hsa-miR-376c-3p MIMAT0000720 0.345478
20 hsa-miR-326 MIMAT0000756 0.312151
21 hsa-miR-193a-5p MIMAT0004614 0.275742
22 hsa-miR-532-3p MIMAT0004780 0.268942
23 hsa-miR-625-3p MIMAT0004808 0.259763
24 hsa-miR-106a-5p MIMAT0000103 0.213424
25 hsa-let-7g-5p MIMAT0000414 0.152833
26 hsa-let-7f-5p MIMAT0000067 0.04358
27 hsa-miR-193b-5p MIMAT0004767 0.010963
Fig. 2.
Chord diagram showing the prioritization of miRNAs of the signature
based on their survival estimation ability in stomach and esophageal
carcinoma. The size of the line is proportional to the percent
contribution towards the survival estimation.
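One plausible way to turn the MED scores of Table 2 into the percent contributions drawn in the chord diagram is to normalize each score by the sum of all 27 scores. The normalization rule is an assumption of this sketch (the text does not state how the percentages were derived); the MED values are copied from Table 2.

```python
# MED scores from Table 2 (rank order)
MED_SCORES = {
    "hsa-miR-760": 1.728135, "hsa-miR-767-5p": 1.480966,
    "hsa-miR-1301-3p": 1.344602, "hsa-miR-891a-5p": 1.14225,
    "hsa-miR-532-5p": 1.139153, "hsa-miR-29a-5p": 0.887408,
    "hsa-miR-16-5p": 0.88658, "hsa-miR-130a-5p": 0.863724,
    "hsa-miR-329-3p": 0.844311, "hsa-miR-496": 0.818043,
    "hsa-miR-20a-3p": 0.724058, "hsa-miR-125a-5p": 0.63757,
    "hsa-miR-181b-5p": 0.590379, "hsa-miR-675-3p": 0.578151,
    "hsa-miR-9-5p": 0.484588, "hsa-miR-664a-5p": 0.425219,
    "hsa-miR-93-5p": 0.364274, "hsa-miR-30e-5p": 0.355408,
    "hsa-miR-376c-3p": 0.345478, "hsa-miR-326": 0.312151,
    "hsa-miR-193a-5p": 0.275742, "hsa-miR-532-3p": 0.268942,
    "hsa-miR-625-3p": 0.259763, "hsa-miR-106a-5p": 0.213424,
    "hsa-let-7g-5p": 0.152833, "hsa-let-7f-5p": 0.04358,
    "hsa-miR-193b-5p": 0.010963,
}


def percent_contribution(med_scores):
    """Share of each MED score in the total, in percent (assumed rule)."""
    total = sum(med_scores.values())
    return {name: 100.0 * score / total for name, score in med_scores.items()}


shares = percent_contribution(MED_SCORES)
ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
for name, share in ranked[:3]:
    print(f"{name}: {share:.1f} %")
```

Under this rule the ordering of the percentages is identical to the MED ranking, so the sketch changes only the scale, not the prioritization.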
3.4. Diagnosis prediction
The diagnostic ability of the identified miRNA signature was measured
by distinguishing healthy from STEC patients using the CancerMiRNome
database [52]. The individual miRNAs that compose the miRNA
signature had AUCs in the range of 0.49–0.94 for distinguishing healthy
from STEC patients, as shown in Table 3. Among the signature
miRNAs, 13 miRNAs (hsa-miR-93-5p, hsa-miR-181b-5p,
hsa-miR-125a-5p, hsa-miR-1301-3p, hsa-miR-30e-5p, hsa-miR-767-5p,
hsa-miR-16-5p, hsa-miR-675-3p, hsa-miR-326, hsa-miR-760,
hsa-miR-20a-3p, hsa-miR-664a-5p, and hsa-miR-130a-5p) were good
diagnostic predictors of esophageal carcinoma (ESCA) (AUC ≥ 0.70), as
shown in Fig. 3. Ten miRNAs (hsa-miR-30e-5p,
hsa-miR-1301-3p, hsa-miR-125a-5p, hsa-miR-93-5p, hsa-miR-326,
hsa-miR-532-5p, hsa-miR-9-5p, hsa-miR-181b-5p, hsa-miR-193a-5p, and
hsa-let-7g-5p) were good diagnostic predictors of stomach
adenocarcinoma (STAD) (AUC ≥ 0.70), as shown in Fig. 4.
Table 3.
Diagnosis prediction of patients with STEC using the miRNA signature.
miRNAs ESCA-AUC STAD-AUC
hsa-miR-760 0.73 0.60
hsa-miR-767-5p 0.78 0.63
hsa-miR-1301-3p 0.82 0.82
hsa-miR-891a-5p 0.52 0.53
hsa-miR-532-5p 0.59 0.78
hsa-miR-29a-5p 0.57 0.64
hsa-miR-16-5p 0.77 0.49
hsa-miR-130a-5p 0.70 0.56
hsa-miR-329-3p 0.55 0.53
hsa-miR-496 0.60 0.59
hsa-miR-20a-3p 0.73 0.62
hsa-miR-125a-5p 0.84 0.81
hsa-miR-181b-5p 0.87 0.77
hsa-miR-675-3p 0.74 0.54
hsa-miR-9-5p 0.49 0.78
hsa-miR-664a-5p 0.73 0.69
hsa-miR-93-5p 0.94 0.81
hsa-miR-30e-5p 0.82 0.85
hsa-miR-376c-3p 0.57 0.66
hsa-miR-326 0.74 0.81
hsa-miR-193a-5p 0.59 0.73
hsa-miR-532-3p 0.52 0.68
hsa-miR-625-3p 0.68 0.50
hsa-miR-106a-5p 0.62 0.51
hsa-let-7g-5p 0.64 0.70
hsa-let-7f-5p 0.50 0.62
hsa-miR-193b-5p 0.58 0.58
Abbreviations: ESCA, esophageal carcinoma; STAD, stomach adenocarcinoma;
AUC, area under the receiver operating characteristic curve.
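For readers unfamiliar with the AUC values in Table 3, the following sketch computes an AUC for a single miRNA from its expression values in tumor and normal samples. It uses the rank-based definition (the probability that a randomly chosen tumor sample has a higher value than a randomly chosen normal sample, with ties counted as one half) and assumes that higher expression points to tumor. The expression values are made up, and this is not the CancerMiRNome implementation.

```python
def roc_auc(tumor_values, normal_values):
    """AUC as the fraction of (tumor, normal) pairs ranked correctly.

    Equivalent to the area under the ROC curve when the expression value
    itself is used as the diagnostic score; ties count as 0.5.
    """
    wins = 0.0
    for t in tumor_values:
        for n in normal_values:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor_values) * len(normal_values))


GOOD_PREDICTOR_AUC = 0.70  # threshold used in the text

# Toy expression values (arbitrary units, made up for illustration)
tumor = [5.1, 6.3, 7.0, 4.2]
normal = [3.9, 4.5, 5.0]
auc = roc_auc(tumor, normal)
print(round(auc, 4))              # 0.8333 (10 of 12 pairs ranked correctly)
print(auc >= GOOD_PREDICTOR_AUC)  # True
```

An AUC near 0.5, as seen for several miRNAs in Table 3, means the expression value separates the two groups no better than chance.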
Fig. 3.
Diagnosis prediction ability of miRNAs was evaluated in ESCA using ROC
curves.
Fig. 4.
Diagnosis prediction ability of miRNAs was evaluated in STAD using ROC
curves.
3.5. Expression differences of the miRNA signature
Expression difference analysis was performed to test whether the
expression levels of the miRNAs in the identified signature differed
significantly between normal and tumor tissues of ESCA and STAD
patients using the CancerMiRNome database [52]. Fourteen miRNAs
(hsa-miR-625-3p, hsa-miR-664a-5p, hsa-miR-326, hsa-miR-130a-5p,
hsa-miR-20a-3p, hsa-miR-675-3p, hsa-miR-760, hsa-miR-16-5p,
hsa-miR-767-5p, hsa-miR-1301-3p, hsa-miR-125a-5p, hsa-miR-181b-5p,
hsa-miR-93-5p, and hsa-miR-30e-5p) showed a significant difference
(p < 0.05) between normal and ESCA samples (Table 4). Nineteen miRNAs
(hsa-miR-664a-5p, hsa-miR-767-5p, hsa-let-7g-5p,
hsa-let-7f-5p, hsa-miR-376c-3p, hsa-miR-29a-5p, hsa-miR-760,
hsa-miR-1301-3p, hsa-miR-532-5p, hsa-miR-20a-3p, hsa-miR-125a-5p,
hsa-miR-181b-5p, hsa-miR-9-5p, hsa-miR-93-5p, hsa-miR-30e-5p,
hsa-miR-326, hsa-miR-193a-5p, hsa-miR-532-3p, and hsa-miR-193b-5p)
had significantly different expression between normal and STAD samples
(Table 4). The expression differences of the top 10 ranked miRNAs
between healthy and ESCA patients and between healthy and STAD
patients are shown in Fig. 5 and Fig. 6, respectively.
Table 4.
Expression differences of the miRNA signature between normal and tumor
tissues.
miRNA signature Normal vs ESCA (p-value) Normal vs STAD (p-value)
hsa-miR-760 0.003 0.0297
hsa-miR-767-5p 0.0004 0.0003
hsa-miR-1301-3p 0.0001 <0.0001
hsa-miR-891a-5p 0.8733 0.6737
hsa-miR-532-5p 0.3987 <0.0001
hsa-miR-29a-5p 0.3481 0.008
hsa-miR-16-5p 0.0005 0.0911
hsa-miR-130a-5p 0.0115 0.1496
hsa-miR-329-3p 0.4576 0.6845
hsa-miR-496 0.1857 0.1031
hsa-miR-20a-3p 0.005 <0.0001
hsa-miR-125a-5p <0.0001 <0.0001
hsa-miR-181b-5p <0.0001 <0.0001
hsa-miR-675-3p 0.0033 0.1292
hsa-miR-9-5p 0.9717 <0.0001
hsa-miR-664a-5p 0.0177 0.0002
hsa-miR-93-5p <0.0001 <0.0001
hsa-miR-30e-5p <0.0001 <0.0001
hsa-miR-376c-3p 0.5389 0.0033
hsa-miR-326 0.0117 <0.0001
hsa-miR-193a-5p 0.3494 <0.0001
hsa-miR-532-3p 0.8464 <0.0001
hsa-miR-625-3p 0.023 0.9569
hsa-miR-106a-5p 0.1783 0.9017
hsa-let-7g-5p 0.3872 0.0003
hsa-let-7f-5p 0.7968 0.0026
hsa-miR-193b-5p 0.343 <0.0001
Abbreviation: ESCA-Esophageal carcinoma, STAD-Stomach adenocarcinoma.
Fig. 5.
Comparison of expression of the top 10 ranked miRNAs between normal and
ESCA samples using boxplot representation (* indicates p < 0.05).
Fig. 6.
Comparison of expression of the top 10 ranked miRNAs between normal and
STAD samples using boxplot representation (* indicates p < 0.05).
3.6. MiRNA-gene target enrichment analysis
There were 558 miRNA-target interactions (MTIs) with strong evidence,
which included 32 miRNAs and 352 target genes from miRTarBase
(Supplementary Table S2). We performed gene-set enrichment
analysis using three pathway libraries: WikiPathway, KEGG, and MSigDB
Hallmark, as shown in Fig. 7. The most highly enriched pathways in
WikiPathway, KEGG, and MSigDB Hallmark were the somatotrophic axis and
its relationship to dietary restriction and aging (WP4186) (adjusted
p-value: 1.34E-10, odds ratio: 117888, combined score: 2,862,498),
pancreatic cancer (adjusted p-value: 1.54E-34, odds ratio: 44.55,
combined score: 3655.72), and apoptosis (adjusted p-value: 5.17E-20,
odds ratio: 12.68, combined score: 563.12), respectively, as shown in
Supplementary Tables S3-S5. Additionally, the miRNA signature-gene
interaction network was built using miRTarBase [46], TarBase v8
[53], and miRecords [54]. There were 28,057 edges associated
with 10,525 genes. We reduced the low priority edges using the shortest
path network measures [55]. The final network, consisting of 832
edges and 93 targeted genes, is shown in Supplementary Fig. S3.
Fig. 7.
The pathways enrichment analysis of miRNA signature targeted genes in
three categories, (A) wiki pathways, (B) KEGG pathways, and (C) MSigDB
hallmark.
The Gene Ontology (GO) annotations of the target genes were in three
categories: biological process, molecular function, and cellular
component. The most highly enriched terms for biological process,
molecular function, and cellular component were positive regulation of
smooth muscle cell apoptotic process (GO:0034393), I-SMAD binding
(GO:0070411), and serine/threonine protein kinase complex (GO:1902554),
respectively, as shown in Supplementary Figs. S4-S6 and
Supplementary Table S6.
3.7. MiRNAs in cancers
The roles of the top 10 ranked miRNAs in various diseases and cancers
were examined using the Human microRNA Disease Database (HMDD v3.2)
[56], miRTarBase, and by reviewing the scientific literature. The
information from these resources indicates that the top 10 ranked miRNAs
are involved in STEC. A quantitative real-time PCR analysis reported
that hsa-miR-760 is significantly downregulated in ESCA tissues and
cell lines, suggesting that this miRNA could be used as a prognostic
indicator [143][57]. Significant differential expression of
hsa-miR-769-5p was observed in ESCA tissue when compared to the normal
tissues [144][58]. Over-expression of hsa-miR-1301-3p induces cell
proliferation and tumorigenesis in gastric cancer tissues [145][59]. Wu
et al. reported the differential expression of hsa-miR-1301-3p in ESCA,
suggesting that this miRNA could be used as a prognostic biomarker for
ESCA [146][60]. Zhang and colleagues reported the downregulation of
hsa-miR-532-5p in gastric cancer cells, and its expression is
associated with poorer survival in patients with gastric cancer
[147][61]. Tokumaru and colleagues demonstrated the association of
hsa-miR-29a with overall survival in patients with gastric cancer, and
lower expression of hsa-miR-29a worsens the overall survival in
patients with gastric cancer [148][62]. Hsa-miR-16-5p has been utilized
as a prospective biomarker for prognosis prediction in patients with
gastric cancer and ESCA [149][63], [150][64]. Hsa-miR-130a-5p affects
cell growth, migration and invasion by targeting cannabinoid receptor 1
in gastric cancer cells [151][65]; it also deregulates PTEN and
controls malignant cell survival and tumor growth in multiple cancers
[152][66]. Hsa-miR-329-3p acts as a tumor suppressor by targeting T
lymphoma invasion and metastasis in gastric cancer cells and could be
utilized as a potential therapeutic target [67]. Hsa-miR-496 is
downregulated in gastric cancer cell lines, and it inhibits cell
proliferation via targeting Lyn kinase in gastric cancer cell lines
[154][68]. Among the top 10 ranked miRNAs, the roles of two miRNAs,
hsa-miR-769-5p and hsa-miR-891a-5p, have not been reported previously
in either STAD or ESCA.
Additionally, a miRNA-disease network was constructed for the miRNA
signature using miRNet 2.0 [155][55]. The miRNAs of the signature were
observed to be involved in several diseases. In the miRNA-disease
association network, there were 12 nodes (miRNAs) with 132 edges
associated with 85 diseases, shown in [156]Supplementary Fig. S7.
4. Discussion
MiRNAs provide a way to explore disease mechanisms in various cancers,
including STEC. The clinical applications of miRNAs in cancer rely on
identifying miRNA signatures as potential biomarkers and developing
miRNA-target based therapeutics. Accordingly, we developed a survival
time estimation method, GASE, to identify a miRNA signature that was
correlated with STEC patient survival. Computational methods for
feature selection often suffer from issues related to data quality and
high dimensionality, especially when dealing with biomedical data. To
address the challenges of identifying the right biomarker, we used an
optimal feature selection algorithm, IBCGA, which is good at
identifying a small number of important features from a large number of
candidate features. The optimization method was previously utilized to
estimate the survival time in various cancers [34], [35],
[36]. In this study, we exclusively focused on identifying a miRNA
signature in patients with STEC. The proposed method, GASE, identified
27 miRNAs as a survival miRNA signature and performed better than
standard machine learning methods in estimating survival time. Our
evaluation of the diagnostic ability of the identified miRNAs revealed
that 13 miRNAs were good diagnostic predictors (AUC ≥ 0.7) in ESCA and
10 miRNAs in STAD. The differential expression analysis between tumor
and normal samples from patients with STEC revealed that several miRNAs
had significantly different expression between tumor and normal
samples. Further, previous reports provide evidence supporting the
importance of the top 10 ranked miRNAs of the signature in STEC.
The miRNA-gene target interaction analysis showed that the target genes
were highly enriched in the pathway for the somatotrophic axis and its
relationship to dietary restriction and aging (WP4186). The somatotrophic
axis in mammals involves signaling by growth hormone (GH), which is
produced by the anterior pituitary, and its secondary mediator,
insulin-like growth factor 1 (IGF-1). In a previous study, growth
hormone–releasing hormone and its receptor (GHRH-R) were found
primarily in the anterior pituitary gland, gastric cancers, other solid
tumors, and lymphomas. Increased levels of GHRH-R in tumor samples from
patients with gastric cancer are associated with poor outcomes
[69]. Another important enriched pathway of the miRNA signature
was the transforming growth factor beta (TGF-β) signaling pathway. TGF-β is
a cytokine that participates in both physiological and pathological
processes including tumorigenesis [161][70]. During tumor progression,
TGF-β signaling regulates the immune/inflammatory response and the
tumor microenvironment. It also regulates tumor growth,
epithelial-mesenchymal transition (EMT), and cancer cell stemness
depending on tumor stage and cellular context [162][71]. EMT is also an
enriched pathway from MSigDB Hallmark (adjusted p-value: 8.24E-19, Odds
ratio: 11.13, combined score: 506.81), which is consistent with this
biological mechanism. Abnormal TGF-β signaling has been associated with
progression of gastrointestinal cancer [163][72], which includes
esophageal, gastric, liver, colorectal, and pancreatic carcinomas that,
collectively, are major causes of cancer-related deaths worldwide
[164][73]. Several TGF-β-based therapeutics have been developed for the
treatment of gastrointestinal cancers and have displayed efficacy in
clinical trials [165][74], [166][75]. Additional support for the role
of TGF-β signaling in STEC was obtained from the GO annotation
analysis, which showed that I-SMAD binding (GO:0070411) was enriched in
the GO molecular function category (adjusted p-value: 1.03E-06).
Nuclear accumulation of active SMAD complexes is crucial for the
transduction of TGF-β superfamily signals from transmembrane receptors
to the nucleus.
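The enrichment statistics quoted above (adjusted p-value and odds ratio)
can be illustrated with a minimal sketch. It assumes a one-sided Fisher's
exact test on a 2x2 gene-overlap table and Benjamini-Hochberg adjustment,
which are common choices in over-representation analysis; the text does
not state that these exact procedures were used here. All function names
and gene counts below are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """One-sided Fisher's exact p-value: P(X >= k).

    N: background genes, K: genes in the pathway,
    n: target genes in the query list, k: overlap between the two.
    """
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def odds_ratio(k, K, n, N):
    """Sample odds ratio of the 2x2 overlap table.

    Assumes no cell that ends up in the denominator is zero.
    """
    a = k              # in query list, in pathway
    b = n - k          # in query list, not in pathway
    c = K - k          # not in query list, in pathway
    d = N - K - n + k  # in neither
    return (a * d) / (b * c)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts for three pathways tested against one target list.
N_BACKGROUND, N_TARGETS = 20000, 500
pathways = {"pathway_A": (200, 30), "pathway_B": (150, 6), "pathway_C": (80, 2)}

raw = {
    name: hypergeom_upper_tail(k, K, N_TARGETS, N_BACKGROUND)
    for name, (K, k) in pathways.items()
}
adj = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
for name, (K, k) in pathways.items():
    ratio = odds_ratio(k, K, N_TARGETS, N_BACKGROUND)
    print(f"{name}: OR={ratio:.2f}, p={raw[name]:.3g}, adj. p={adj[name]:.3g}")
```

An odds ratio well above 1 together with a small adjusted p-value marks a
pathway whose genes are over-represented among the miRNA targets, which is
how values such as those reported for EMT and I-SMAD binding are usually
read.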
The top hits for gene target enrichment analysis also indicated that
the miRNA signature was related to miRNAs involved in DNA damage
response, epidermal growth factor receptor tyrosine kinase inhibitor
resistance, apoptosis, Wnt/beta-catenin signaling, and angiogenesis.
DNA damage response pathways are known to be related to therapy
resistance in STEC [167][76], [168][77], and resistance to epidermal
growth factor receptor tyrosine kinase inhibitors is relevant to
survival in STEC, consistent with the use of these inhibitors as
targeted therapy in STEC [169][78], [170][79]. The Wnt/beta-catenin
signaling pathway has been implicated in cancer progression in STEC
[171][80], and dysregulation of apoptosis and angiogenesis is known to
promote tumor growth [172][81], [173][82]. This suggests that the miRNAs
in the signature and their putative gene targets are possible molecular
targets for new STEC therapies.
In addition to being associated with survival, the miRNAs in the
signature could discriminate between healthy and STEC patients, and
were differentially expressed between the healthy and tumor tissues of
patients with STEC. This suggests that these miRNAs could function as
prognostic or diagnostic biomarkers. Further
investigation is needed to determine the utility of the miRNA signature
as a prognostic biomarker for monitoring response to therapy or
predicting survival after therapy in STEC patients and as a biomarker
for early STEC diagnosis. Other questions for study are whether the
miRNA signature can perform as a biomarker in STEC of different types
and stages, and whether the miRNA signature can be detected in blood at
a level of accuracy comparable to that in tumor tissue (to allow for
the possibility of performing liquid biopsies for biomarker detection).
In conclusion, a better understanding of the miRNA signature in
survival predictions will aid in developing treatment strategies for
STEC. We anticipate that the miRNA signature identified here could help
in understanding the roles of miRNAs in STEC and developing miRNA-based
cancer therapeutics.
Funding
This work was supported in part by the Marshfield Clinic Research
Institute, Marshfield, WI. The funders had no role in the study design,
data collection and analysis, decision to publish, or preparation of
the manuscript.
Author contributions
S.Y.S. designed the system, carried out the detailed study, and
supervised the study. S.Y.S., M.T., T.C., P.A., S.K.S., A.B., and S.Y.H.
participated in data analysis and manuscript preparation and discussed
the results. All authors have read and approved the final manuscript.
Availability of data and materials
All the data used in this analysis can be found on the TCGA data portal
([174]https://portal.gdc.cancer.gov/).
Ethics approval and consent to participate
Not applicable.
Consent to publish
Not applicable.
CRediT authorship contribution statement
Srinivasulu Yerukala Sathipati: Conceptualization, Data curation,
Writing – original draft, Formal analysis, Funding acquisition,
Investigation, Methodology, Project administration, Supervision.
Ming-Ju Tsai: Validation, Visualization, Formal analysis. Tonia Carter:
Formal analysis, Data curation, Writing - review & editing. Patrick
Allaire: Formal analysis. Sanjay K Shukla: Formal analysis. Afshin
Beheshti: Formal analysis, Writing - review & editing. Shinn-Ying Ho:
Formal analysis.
Declaration of Competing Interest
The authors declare that they have no known competing financial
interests or personal relationships that could have appeared to
influence the work reported in this paper.
Footnotes
Appendix A
Supplementary data to this article can be found online at
[175]https://doi.org/10.1016/j.csbj.2022.08.025.
Appendix A. Supplementary data
The following are the Supplementary data to this article:
Supplementary data 1: [176]mmc1.docx (1MB, docx)
Supplementary data 2: [177]mmc2.xlsx (41.1KB, xlsx)
Supplementary data 3: [178]mmc3.xlsx (60.9KB, xlsx)
Supplementary data 4: [179]mmc4.xlsx (37.5KB, xlsx)
Supplementary data 5: [180]mmc5.xlsx (14.3KB, xlsx)
Supplementary data 6: [181]mmc6.xlsx (33.5KB, xlsx)
References