Abstract

Background

   Gastric cancer (GC) is one of the most common causes of cancer-related
   fatalities worldwide, and its progression is associated with RNA
   modifications. Here, using RNA modification-related genes (RNAMRGs), we
   aimed to construct a prognostic model for patients with GC.

Methods

   Based on RNAMRGs, RNA modification scores (RNAMSs) were obtained for GC
   samples from The Cancer Genome Atlas and were divided into high- and
   low-RNAMS groups. Differential analysis and weighted correlation
   network analysis were performed for the differential expressed genes
   (DEGs) to obtain the key genes. Next, univariate Cox regression, least
   absolute shrinkage and selection operator, and multivariate Cox
   regression analyses were performed to obtain the model. According to
   the model risk score, samples were divided into high- and low-risk
   groups. Enrichment analysis and immunoassays were performed for the
   DEGs in these groups. Four external datasets from Gene Expression
   Omnibus data base were used to test the accuracy of the predictive
   model.

Results

   We identified SELP and CST2 as key DEGs, which were used to generate
   the predictive model. The high-risk group had a worse prognosis
   compared to the low-risk group (p < 0.05). Enrichment analysis and
   immunoassays revealed that 144 DEGs related to immune cell infiltration
   were associated with the Wnt signaling pathway and included hub genes
   such as ELN. Overall mutation levels, tumor mutation burden, and
   microsatellite instability were lower, but tumor immune dysfunction and
   exclusion scores were greater (p < 0.05) in the high-risk group than in
   the low-risk group. The validation results showed that the prediction
   model score can accurately predict the prognosis of GC patients.
   Finally, a nomogram was constructed using the risk score combined with
   the clinicopathological characteristics of patients with GC.

Conclusion

   This risk score from the prediction model related to the tumor
   microenvironment and immunotherapy could accurately predict the overall
   survival of GC patients.

Supplementary Information

   The online version contains supplementary material available at
   10.1186/s12859-023-05283-3.

   Keywords: RNA modification, Gastric cancer, Prognosis, Bioinformatics,
   Epigenomic

Background

   Gastric cancer (GC) is the fifth most diagnosed malignancy worldwide
   [[37]1]. The most common pathological type of GC is stomach
   adenocarcinoma (STAD), which originates from the epithelial cells of
   the mucosa [[38]2]. Surgery is the cornerstone treatment for GC
   [[39]3], and survival rates have been improved using chemotherapy
   [[40]4]. Although many individuals with GC have benefited from the
   recent development of tailored treatment options, such as
   immunological, targeted, and combination therapies [[41]5], the
   mortality rate of GC remains high, making it the third most common
   cause of cancer-related death [[42]6]. It is therefore important to
   identify novel prognostic markers and therapeutic targets for patients
   with GC.

   Over 170 different types of modifications for coding and non-coding RNA
   (ncRNA) have been identified, including N6-methyladenosine (m6A),
   5-methylcytosine (m5C), and N1-methyladenosine (m1A) [[43]7]. The
   methyltransferases, demethylases, and binding proteins, together known
   as “writers,” “erasers,” and “readers,” can control the dynamics and
   reversibility of RNA epigenetic modifications [[44]8, [45]9].
   Accumulating evidence suggests that the aberrant expression of RNA
   modifications is associated with cell survival, proliferation,
   self-renewal, differentiation, stress adaptation, invasion, and
   resistance to therapy, all of which are hallmarks of cancer
   [[46]10–[47]17]. Abnormal post-transcriptional alterations contribute
   toward cancer cell migration, self-renewal, proliferation, and survival
   [[48]18] and thus are promising therapeutic targets for cancer. The
   most prevalently distributed RNA post-transcriptional modification is
   m6A [[49]19]. It has been reported that the m6A writers METTL3 and
   METTL14 have an impact on GC progression [[50]20–[51]22], the
   expression of METTL3 is associated with GC prognosis [[52]21].
   Therefore, studying the relationship between RNA modification-related
   genes (RNAMRGs) and GC would facilitate exploration of potential
   prognostic markers and therapeutic targets.

   In this study, we identified 48 RNAMRGs associated with m6A, m5C, and
   m1A. Similar to bioinformatics methods of some studies [[53]23–[54]26],
   we used The Cancer Genome Atlas (TCGA) data to screen prognosis-related
   genes and built a predictive model according to the study flowchart
   shown in Fig. [55]1 to explore whether RNAMRGs were associated with GC
   prognosis.

Fig. 1.

   [56]Fig. 1
   [57]Open in a new tab

   Study flow chart

Methods

Data collection

   Expression profiles and clinical data of patients with STAD were
   downloaded from TCGA. GC samples with an overall survival (OS) > 0 were
   retained, leaving a total of 328 samples (Table [58]1). "Masked Somatic
   Mutation" data were downloaded and analyzed. The tumor mutational
   burden (TMB) and microsatellite instability (MSI) were obtained.
   “Masked Copy Number Segment” data were used as copy number variation
   (CNV) data for 440 patients with STAD, who were divided into high- and
   low-risk groups. Segments were analyzed using GenePattern
   ([59]https://cloud.genepattern.org) for GISTIC 2.0 [[60]27] with
   default settings and a confidence level of 0.99. The STAD datasets
   [61]GSE26899 (93 samples) [[62]28], [63]GSE26901 (109 samples)
   [[64]28], [65]GSE84437 (357 samples) [[66]29], and [67]GSE62254 (300
   samples) [[68]30] were downloaded from Gene Expression Omnibus (GEO)
   [[69]31], and 48 RNAMRGs (Additional file [70]1: Table S1) were
   extracted from an article [[71]32].

Table 1.

   Characteristics of patients with gastric cancer in TCGA-STAD data
          Alive       Dead       Total
          258         70         328
   Age (years)
   Mean   64.7        67.4       65.2
   Median 66          68.5       67
    > 67  116 (45%)   42 (60%)   158 (48.2%)
    ≤ 67  142 (54%)   28 (40%)   170 (51.8%)
   Gender
   Female 101 (39.1%) 14 (20%)   115 (35.1%)
   Male   157 (60.9%) 56 (80%)   213 (64.9%)
   T stage
   T1     14 (5.4%)   1 (1.4%)   15 (4.6%)
   T2     54 (20.9%)  12 (17.2%) 66 (20.1%)
   T3     116 (45%)   36 (51.4%) 152 (46.3%)
   T4     74 (28.7%)  21 (30%)   95 (29%)
   TX     0 (0%)      0 (0%)     0 (0%)
   N stage
   N0     85 (33%)    14 (20%)   99 (30.2%)
   N1     67 (26%)    16 (22.9%) 83 (25.3%)
   N2     53 (20.5%)  16 (22.9%) 69 (21%)
   N3     47 (18.2%)  22 (31.3%) 69 (21%)
   NX     6 (2.3%)    2 (2.9%)   8 (2.5%)
   M stage
   M0     235 (91.1%) 56(80%)    291 (88.7%)
   M1     11 (4.3%)   11(15.7%)  22 (6.7%)
   MX     12 (4.6%)   3(4.3%)    15 (4.6%)
   Stage
   I      34 (13.2%)  9 (12.9%)  43 (13.1%)
   II     96 (37.2%)  8 (11.4%)  104 (31.7%)
   III    108 (41.9%) 31 (44.3%) 139 (42.4%)
   IV     15 (5.8%)   18 (25.7%) 33 (10.1%)
   X      5 (1.9%)    4 (5.7%)   9 (2.7%)
   [72]Open in a new tab

RNA modification score (RNAMS) analysis

   The RNAMS was calculated for TCGA samples based on the RNAMRGs using
   the “ssGSEA” algorithm. Samples were divided into high- and low-RNAMS
   groups, and differentially expressed genes (DEGs) were screened using
   adj. p < 0.01 and |log fold change (FC)|> 1.

   Weighted gene correlation network analysis (WGCNA) was performed using
   TCGA data as the input, and 0.85 was calculated as the optimal soft
   threshold by the “pickSoftTreshold” function. With 200 as the minimum
   number of genes per module, gene modules were dynamically assigned to
   identify genes. We screened four modules with the strongest positive
   and negative correlations with the high-RNAMS group. Genes with module
   membership > 0.5 and significance > 0.2 were screened as key gene
   modules. DEGs in the high- and low-RNAMS groups were intersected with
   key gene modules to obtain key genes.

   Consensus clustering was performed using TCGA data and the key genes to
   improve the differentiation between different GC subtypes. The number
   of clusters was set between 2 and 8, and 80% of all samples were drawn
   in 1000 repetitions with clusterAlg = “pam” and distance = “spearman.”

Risk prediction model construction

   Univariate Cox regression analysis was used to calculate the
   association between the expression of each key gene and OS. The least
   absolute shrinkage and selection operator (LASSO) algorithm was then
   used to eliminate multicollinearity and screen for significant
   variables. Following multivariate Cox regression analysis, final
   screening was performed using stepwise regression. The risk score
   formula was calculated by considering the expression of optimized genes
   and the multivariate Cox regression coefficients, as follows:
   [MATH:
   <mrow><mi>R</mi><mi>i</mi><mi>s</mi><mi>k</mi><mi>s</mi><mi>c</mi><mi>o
   </mi><mi>r</mi><mi>e</mi><mo>=</mo><munder><mo
   movablelimits="false">∑</mo><mi>i</mi></munder><mi>C</mi><mi>o</mi><mi>
   e</mi><mi>f</mi><mi>f</mi><mi>i</mi><mi>c</mi><mi>i</mi><mi>e</mi><mi>n
   </mi><mi>t</mi><mfenced close=")"
   open="("><mrow><mi>g</mi><mi>e</mi><mi>n</mi><msub><mi>e</mi><mi>i</mi>
   </msub></mrow></mfenced><mo>×</mo><mi>m</mi><mi>R</mi><mi>N</mi><mi>A</
   mi><mi>e</mi><mi>x</mi><mi>p</mi><mi>r</mi><mi>e</mi><mi>s</mi><mi>s</m
   i><mi>i</mi><mi>o</mi><mi>n</mi><mfenced close=")"
   open="("><mrow><mi>g</mi><mi>e</mi><mi>n</mi><msub><mi>e</mi><mi>i</mi>
   </msub></mrow></mfenced></mrow> :MATH]

   TCGA patients were divided into high- and low-risk groups based on the
   calculated optimal risk score cutoff. Kaplan–Meier analysis and
   log-rank tests were performed, and time-dependent receiver operating
   characteristic (ROC) curves were used to assess survival prediction by
   calculating the area under curve (AUC).

   [73]GSE26901, [74]GSE26899, [75]GSE84437 and [76]GSE62254 were used as
   validation datasets. After calculating risk scores using the formula
   above, data were grouped according to the optimal risk score cutoff and
   subjected to Kaplan–Meier analysis and log-rank tests.

Risk score analysis

   Differential analysis was performed based on the high- and low-risk
   groups from TCGA data, with adj. p < 0.01 and |logFC|> 1.5 as screening
   thresholds. Gene Ontology (GO) analysis [[77]33] and Kyoto Encyclopedia
   of Genes and Genomes (KEGG) pathway enrichment analysis [[78]34] were
   performed on the intersecting genes, the critical value of FDR < 0.05
   was considered statistically significant, and the entry screening
   criteria were adj. p < 0.05 and q-value < 0.05, and the p-value
   correction method was the Benjamini–Hochberg method. Based on the gene
   expression profile dataset for patients with STAD in TCGA, we performed
   gene set enrichment analysis (GSEA) [[79]35], and adj. p < 0.05 was
   considered statistically significant.

   The STRING ([80]https://string-db.org/) database [[81]36] was used to
   construct a protein–protein interaction (PPI) network for DEGs in the
   high- and low-risk groups, with a coefficient of 0.4. PPI results were
   exported from STRING and visualized using Cytoscape [[82]37], and the
   “CytoHubba” plugin [[83]38] was used to analyze the bub genes in the
   PPI network. miRNA–mRNA interaction information was downloaded from the
   miRTarBas ([84]https://mirtarbase.cuhk.edu.cn/) database [[85]39].
   Based on the hub genes identified using the PPI network, an miRNA–mRNA
   regulatory network was constructed by predicting possible regulated
   miRNAs.

   Immune-related genes were downloaded from a pan-cancer immunogenomic
   analysis article [[86]40], which included 782 genes and 28 cell types.
   The degree of immune cell infiltration was analyzed using TCGA-STAD
   data, and an immune score is obtained for each tumor sample. Tumor
   Immune Dysfunction and Exclusion (TIDE) [[87]41]
   ([88]http://tide.dfci.harvard.edu) was used to analyze the treatment
   response in patients with high- and low-risk scores.

   Finally, a nomogram was constructed using the risk score and
   clinicopathological characteristics significantly associated with OS.
   Calibration curves were generated to assess nomogram performance.

Statistical analysis

   All data processing and analyses were conducted using R (v4.1.1). To
   compare two groups of continuous variables, the statistical
   significance of normally distributed variables was estimated using
   independent Student’s t-tests, and differences between non-normally
   distributed variables were analyzed using Mann–Whitney U-tests.
   Chi-square or Fisher’s exact tests were used to compare and analyze
   statistical significance between two groups of categorical variables.
   All statistical p values were bilateral and statistically significant
   at p < 0.05. Significance labeled as NS, p > 0.05; *p < 0.05;
   **p < 0.01; ***p < 0.001; ****p < 0.0001.

   The following packages were used in this study: TCGAbiolinks [[89]42],
   maftools [[90]43], TCGAmutations, GEOquery [[91]44], GSVA [[92]45],
   survival, survminer, limma [[93]46], WGCNA [[94]47], Venn,
   ConsensusClusterPlus [[95]48], survivor, timeROC [[96]49],
   clusterProfiler [[97]50], ESTIMATE [[98]51], rms, glmnet [[99]52],
   Pheatmap, and ggplot2.

Results

Data pre-processing

   First, to analyze the effect of RNA modification on the process of
   gastric carcinogenesis, we downloaded the gene expression profiles of
   STAD patients from the TCGA database as the training set and the
   [100]GSE26899, [101]GSE26901, [102]GSE84437 and [103]GSE62254 gene
   expression profiles associated with STAD from the GEO database as the
   validation sets. The box line plots of the gene expression matrices of
   [104]GSE26899 (Fig. [105]2a), [106]GSE26901 (Fig. [107]2b),
   [108]GSE84437 (Fig. S1a) and [109]GSE62254 (Additional file [110]2:
   Fig. S1b) were plotted. The results showed the same expression trends
   between samples for these datasets, with no intra-group differences,
   and can be used for subsequent analysis.

Fig. 2.

   [111]Fig. 2
   [112]Open in a new tab

   Boxplots of [113]GSE26899 (a) and [114]GSE26901 (b) expression profile
   data

Prognostic analysis of RNAMS in GC and identification of key genes

   We calculated the RNAMS for patients with STAD in TCGA to indicate
   their RNA modification levels. Based on the optimal RNAMS cutoff value
   (1.207673), TCGA-STAD patients were then divided into high- and
   low-RNAMS groups. Survival analysis revealed that patients with a high
   RNAMS had a better prognosis than those with a low RNAMS
   (Fig. [115]3a), while clinical analysis showed that patients in the
   low-RNAMS group had a low age bias and those in the high-RNAMS group
   had a high age bias (Fig. [116]3b).

Fig. 3.

   [117]Fig. 3
   [118]Open in a new tab

   Prognostic analysis of RNA modification score (RNAMS) in gastric cancer
   and identification of key genes. a Survival curves for the high- and
   low-RNAMS groups; b Bar chart of age subgroups in high- and low-RNAMS
   groups; c Differential expression volcano plot of high- and low-RNAMS
   groups; d Heatmap of differentially expressed genes (DEGs) in the high-
   and low-RNAMS groups; e Heatmap of correlation between modules and
   traits in weighted gene correlation network analysis molecules; f Venn
   diagram of the intersection between DEGs and key gene modules

   To identify key RNAMRGs in GC, we analyzed DEGs in the high- and
   low-RNAMS groups (Fig. [119]3c, d). Of the 275 DEGs identified, 77 were
   significantly upregulated and 198 were significantly downregulated
   (Additional file [120]3: Table S2). Next, we constructed a gene
   co-expression network to identify biologically significant gene modules
   as well as genes closely related to RNAMS. Eleven modules were obtained
   (Fig. [121]3e) and screened to identify the four modules with the
   strongest positive and negative correlations with the high-RNAMS group
   (blue; purple; brown; pink). Intersecting the strongly related genes in
   the four modules yielded 2,675 module key genes (Additional file
   [122]4: Table S3), of which 236 overlapped with the DEGs (Additional
   file [123]5: Table S4) and were considered key RNA
   modification-associated genes in GC (Fig. [124]3f).

Analysis of key genes and molecular typing of RNAMS-related genes

   To investigate the overall expression of the 236 key genes in patients
   with STAD, we plotted a heatmap (Fig. [125]4a) and analyzed their
   correlation (Additional file [126]6: Table S5). Fifty genes were
   randomly selected for visualization (Fig. [127]4b). Most of the key
   genes were highly expressed in the low-RNAMS group, and the
   correlations between genes were generally positive. When we further
   analyzed the DNA level variation of key genes, we found that single
   nucleotide polymorphisms (SNPs) were present in key genes in 193
   (60.12%) of TCGA-STAD samples (Fig. [128]4c).

Fig. 4.

   [129]Fig. 4
   [130]Open in a new tab

   Analysis of key genes and molecular typing of RNA modification score
   (RNAMS)-related genes. a Differential expression heatmap of key genes;
   b Correlation heatmap of key genes; c Mutation mapping of
   RNAMS-associated genes in patients with stomach adenocarcinoma; d
   Consistency cumulative distribution function graph; e Scree plot; f and
   g Clustering results for k = 3 (f) and k = 4 (g); h Principal
   components analysis of clustering results; i Survival curves for the
   three subgroups

   Next, we used the expression profiles of the 236 key genes in all
   samples to perform unsupervised clustering using a consensus clustering
   algorithm. The optimal number of clusters was calculated and determined
   to be three (Fig. [131]4d–g); therefore, k = 3 was used to cluster all
   samples into three subtypes that were subjected to principal components
   analysis (Fig. [132]4h). The three subtypes were well differentiated,
   and prognostic analysis revealed that survival differed significantly
   among the three subtypes (Fig. [133]4i), indicating the accuracy of the
   clustering results.

RNAMS risk model construction and evaluation

   To quantify the effect of the 236 key genes on prognosis, we combined
   their expression to construct a risk score model. First, the key genes
   were screened using univariate Cox regression, and 66 genes were
   retained for LASSO regression to eliminate covariance (Fig. [134]5a,
   Additional file [135]7: Table S6). After the best lambda values had
   been determined and crossed by ten-fold validation, six genes (SELP,
   APOD, MXRA8, CST2, RRAD, and GPX3; Fig. [136]5b) were subjected to
   multivariate Cox regression analysis, and the optimal combination was
   screened using stepwise regression to obtain two genes (SELP and CST2)
   for model construction (Fig. [137]5c). The model scoring formula was as
   follows:
   [MATH:
   <mrow><mi>R</mi><mi>i</mi><mi>s</mi><mi>k</mi><mi>s</mi><mi>c</mi><mi>o
   </mi><mi>r</mi><mi>e</mi><mo>=</mo><mfenced close=")"
   open="("><mrow><mn>0.189</mn><mo>×</mo><mi>S</mi><mi>E</mi><mi>L</mi><m
   i>P</mi><mi>e</mi><mi>x</mi><mi>p</mi><mi>r</mi><mi>e</mi><mi>s</mi><mi
   >s</mi><mi>i</mi><mi>o</mi><mi>n</mi></mrow></mfenced><mo>+</mo><mfence
   d close=")"
   open="("><mrow><mn>0.119</mn><mo>×</mo><mi>C</mi><mi>S</mi><mi>T</mi><m
   n>2</mn><mi>e</mi><mi>x</mi><mi>p</mi><mi>r</mi><mi>e</mi><mi>s</mi><mi
   >s</mi><mi>i</mi><mi>o</mi><mi>n</mi></mrow></mfenced></mrow> :MATH]

Fig. 5.

   [138]Fig. 5
   [139]Open in a new tab

   Construction and evaluation of RNA modification score (RNAMS) risk
   prediction model. a Univariate Cox Forest plot for the top 10 genes; b
   Least absolute shrinkage and selection operator regression cross-check
   lambda results plot; c Multivariate Cox Forest plot; d Scatter plot of
   the correlation between risk scores and RNAMS; e–g Survival curves for
   The Cancer Genome Atlas (TCGA)-stomach adenocarcinoma (STAD) data (e),
   [140]GSE26899 (f), and [141]GSE26901 (g); h–j Risk score distribution
   for TCGA (h), [142]GSE26899 (i), and [143]GSE26901 (j) expression
   profiles; k–m ROC curves for TCGA-STAD (k), [144]GSE26899 (l), and
   [145]GSE26901 (m)

   To verify the validity of the model, the correlation between risk score
   and RNAMS for TCGA patients was plotted (Fig. [146]5d), and a
   significant negative correlation was found (cor = -0.49, p < 2.2e^−16).
   A risk score cutoff value of 0.4448089 was used to classify the
   training set TCGA-STAD patients into high- and low-risk groups.
   Survival analysis showed that the low-risk group had significantly
   better survival than the high-risk group (Fig. [147]5e). Consequently,
   patients in the validation sets were classified into high- and low-risk
   groups based on their respective risk score cutoff values
   ([148]GSE26899: 2.059207; [149]GSE26901: 2.140648; [150]GSE84437:
   2.156697; [151]GSE62254: 1.788483), and survival analysis showed that
   the low-risk groups had significantly better survival than the
   high-risk groups (Fig. [152]5f, g; Additional file [153]8: Fig. S2a,
   b). Analyses for risk score distribution, survival status, and
   characteristic gene expression patterns for TCGA (Fig. [154]5h),
   [155]GSE26899 (Fig. [156]5i), [157]GSE26901 (Fig. [158]5j),
   [159]GSE84437 (Additional file [160]8: Fig. S2c), and [161]GSE62254
   (Additional file [162]8: Fig. S2d) revealed that patients in high-risk
   groups had worse survival and similar gene expression patterns.
   Time-dependent ROC analysis of risk scores in the five datasets
   revealed the AUC values for 1-, 3-, and 5-year OS in TCGA
   (Fig. [163]5k), [164]GSE26899 (Fig. [165]5l), [166]GSE26901
   (Fig. [167]5m), [168]GSE84437 (Additional file [169]8: Fig. S2e) and
   [170]GSE62254 (Additional file [171]8: Fig. S2f) indicating that the
   risk score could accurately predict the OS of patients with GC.

Enrichment analysis

   To determine the ability of the model to predict cancer development in
   patients with STAD, we performed differential expression analysis in
   high- and low-risk patient groups. Of the 144 DEGs identified, 2 were
   upregulated and 142 were significantly downregulated (Fig. [172]6a, b).
   Next, we performed GO and KEGG enrichment analyses to identify
   biological processes, molecular functions, cellular components, and
   biological pathways related to the 144 DEGs (Fig. [173]6c–f, Additional
   file [174]9: Tables S7, Additional file [175]10: Table S8), which
   included the Wnt signaling pathway and vascular smooth muscle
   contraction.

Fig. 6.

   [176]Fig. 6
   [177]Open in a new tab

   Analysis of differentially expressed genes (DEGs) in high- and low-risk
   patient groups. a Volcano plot of DEGs; b Heatmap of DEGs; c Top six
   biological processes; d Top six cellular components; e Top six
   molecular functions; f Kyoto Encyclopedia of Genes and Genomes pathway
   enrichment results

   To elucidate the effect of gene expression levels on GC, we analyzed
   the associations between biological processes involved in gene
   expression in TCGA data using GSEA. The MYC targets V2 pathway
   (Fig. [178]7a) was significantly enriched in high-risk patients,
   whereas KRAS signaling up (Fig. [179]7b), inflammatory response
   (Fig. [180]7c), interferon gamma response (Fig. [181]7d), IL2–STAT5
   signaling (Fig. [182]7e), KRAS signaling dn (Fig. [183]7f), TGFβ
   signaling (Fig. [184]7g), apoptosis (Fig. [185]7h), and IL6–JAK–STAT3
   signaling (Fig. [186]7i) were significantly enriched in low-risk
   patients (Additional file [187]11: Table S9).

Fig. 7.

   [188]Fig. 7
   [189]Open in a new tab

   Gene set enrichment analysis. a The MYC targets V2 pathway; b KRAS
   signaling up; c Inflammatory response; d Interferon gamma response; e
   IL2–STAT5 signaling; f KRAS signaling dn; g TGFβ signaling; h
   Apoptosis; i IL6–JAK–STAT3 signaling

Construction of PPI and related regulatory networks

   To analyze the protein interactions among the 144 DEGs, we constructed
   a PPI network (Fig. [190]8a). The top 10 hub genes (ELN, DCN, MYH11,
   ACTA2, FBLN5, TAGLN, CLU, MFAP5, MFAP4, and FBLN1) were obtained based
   on local node density, with a darker color indicating node importance
   (Fig. [191]8b). Functional similarity (Friends) analysis revealed that
   ELN was an important hub gene (Fig. [192]8c), while correlation
   analysis indicated significant co-expression patterns between hub genes
   and risk scores, all of which correlated positively (Fig. [193]8d).
   Based on miRNA–mRNA interaction information downloaded from the
   miRTarBase database, an miRNA–mRNA regulatory network was constructed
   using the hub genes obtained from the PPI network (Fig. [194]8e), which
   contained a total of 202 miRNAs and 9 mRNAs worthy of further study.

Fig. 8.

   [195]Fig. 8
   [196]Open in a new tab

   Construction of protein–protein interaction (PPI) and related
   regulatory networks. a PPI network of differentially expressed genes; b
   Maximal Clique Centrality network of top 10 nodes. Darker color
   indicates more important nodes; c Hub gene importance raincloud plot; d
   Heatmap of correlations between hub genes and risk scores; e miRNA–mRNA
   regulatory network constructed using hub genes. Node size indicates its
   connectivity in the network

Immune cell infiltration analysis

   To evaluate the effect of the risk score on the overall immune
   characteristics and immune cell infiltration levels of patients with
   STAD, we calculated immune and stromal scores for each patient. A
   significant positive correlation was observed between risk score and
   both the immune score (cor = 0.37, p = 4.9e^−12, Fig. [197]9a) and
   stromal score (cor = 0.73, p < 2.2e^−16, Fig. [198]9b). Immune cell
   infiltration scores in TCGA were displayed using a heatmap to further
   examine immune cell expression (Fig. [199]9c). Most immune cells
   correlated positively with one another (Fig. [200]9d), with > 90% of
   immune cells in both high- and low-risk groups showing a significant
   difference and most immune cells displaying significantly higher
   infiltration in the high-risk group than in the low-risk group
   (Fig. [201]9e).

Fig. 9.

   [202]Fig. 9
   [203]Open in a new tab

   Immune cell infiltration analysis. a Scatter plot of correlation
   between risk score and immune score; b Scatter plot of correlation
   between risk score and stromal score; c Heatmap of immune infiltration;
   d Heatmap of immune cell correlations; e Boxplot of differences in
   immune cell infiltration in high- and low-risk groups. f Heatmap of
   correlations between hub genes and immune cell infiltration; g Heatmap
   of correlations between hub genes and immune genes; h Boxplot of
   differences in immune genes in high- and low-risk groups

   Next, we plotted the correlation heatmap between hub genes and immune
   cells (Fig. [204]9f) and between hub genes and immune genes
   (Fig. [205]9g) to determine the effect of hub genes on immune function
   in patients with STAD. Differential boxplots revealed that the
   expression of all immune genes differed significantly between the high-
   and low-risk groups (Fig. [206]9h).

Effect of risk score on genomic changes in patients with STAD

   To determine the effect of the risk score on genetic variation in
   patients with STAD, we analyzed changes in SNPs and CNVs. The overall
   level of single nucleotide mutations in common tumorigenesis driver
   genes was lower in the high-risk group than in the low-risk group
   (Fig. [207]10a, b), whereas the frequency of CNVs was significantly
   increased in all high-risk patients, particularly for copy number
   deletions (Fig. [208]10c, d). By calculating and plotting the TIDE,
   TMB, and MSI, we found that TIDE scores were significantly higher in
   the high-risk group than in the low-risk group (p = 2.6e^−06,
   Fig. [209]10e), whereas TMB (p = 1.8e^−05, Fig. [210]10f) and MSI
   (p = 6.7e^−05, Fig. [211]10g) were significantly lower in the high-risk
   group than in the low-risk group. Together, these results indicate that
   the model risk score can be a potential indicator for evaluating the
   efficacy of immunotherapy in patients with GC.

Fig. 10.

   [212]Fig. 10
   [213]Open in a new tab

   Effect of risk score on genomic changes in patients with stomach
   adenocarcinoma. a and b Mutation profiles of common tumorigenic driver
   genes in high-risk group (a) and low-risk group (b); c and d Changes in
   copy number levels of genes in the high-risk group (c) and low-risk
   group (d); e–g Boxplots of the difference in Tumor Immune Dysfunction
   and Exclusion (TIDE) (e), tumor mutational burden (TMB) (f), and
   microsatellite instability (MSI) (g) in high- and low-risk groups

Construction of a clinical predictive nomogram

   Finally, we generated boxplots to visualize the differences between
   risk score and clinical characteristics. Risk score differed
   significantly with pathological stage, T stage, and N stage
   (Fig. [214]11a), and the RNAMS was significantly lower in the high-risk
   group than in the low-risk group (p = 1.5e^−08, Fig. [215]11b). To
   determine whether the risk score and clinicopathological
   characteristics were independent prognostic factors, we performed
   regression analyses. Univariate Cox regression analysis showed that
   risk score (p < 0.001), pathological stage (p = 0.00159), M stage
   (p = 0.00243), T stage (p = 0.00565), and gender (p = 0.0473) were all
   associated with OS (Fig. [216]11c, Table [217]2), and multivariate Cox
   regression analysis of these significant factors revealed that risk
   score (p = 0.001), M stage (p < 0.001), T stage (p = 0.014), and gender
   (p = 0.005) were significantly associated with OS (Fig. [218]11d, Table
   [219]3). Therefore, we combined the risk score with these three
   clinicopathological characteristics to construct a predictive nomogram
   to assess the OS of patients with STAD (Fig. [220]11e). Calibration
   curves revealed good uniformity between the OS estimates of the
   nomograph for 1-, 2- and 3-year OS of patients and the actual observed
   values (Fig. [221]11f), suggesting that the nomogram predictions were
   accurate.

Fig. 11.

   [222]Fig. 11
   [223]Open in a new tab

   Construction of a nomogram based on the risk score. a Boxplot of
   differences in risk scores across clinical characteristics; b Boxplot
   of differences between risk scores and RNA modification scores; c
   Univariate Cox regression Forest plot; d Multivariate Cox regression
   Forest plot; e Nomogram predicting the 1-, 2-, and 3-year overall
   survival status of patients with gastric cancer; f Nomogram calibration
   curve. g 1-, 2-, and 3-year receiver operating characteristic curves
   for the nomogram. AUC: area under curve

Table 2.

   Univariate Cox regression analysis results
              Hazard ratio Lower 95% CI Upper 95% CI p
   Risk score     2.72         1.52         4.85     0.000709
   Stage         0.892        0.333         2.38     0.00159
   TNM_M          2.74         1.43         5.27     0.00243
   TNM_T          2.42         1.29         4.53     0.00565
   Gender         1.81         1.01         3.26      0.0473
   Age            1.52         0.94         2.47      0.0873
   TNM_N          1.6         0.873         2.94      0.128
   RNA           0.329        0.0697        1.55      0.161
   [224]Open in a new tab

   CI Confidence level

Table 3.

   Multivariate Cox regression analysis results
                            Exp (coefficient) Lower 95% CI Upper 95% CI p
   Risk score                     3.28            1.66         6.47      0.001
   TNM_M (M0 vs. M1)              3.75            1.86         7.55      < 0.001
   TNM_T (T1–2 vs. T3–4)          2.35            1.19         4.64      0.014
   Gender (female vs. male)       2.53            1.32         4.86      0.005
   [225]Open in a new tab

   CI Confidence level

   The ROC curves of the prognostic discrimination of patients at 1 year,
   2 years and 3 years of the nomogram were plotted (Fig. [226]11g), and
   the results showed that the AUC values of the nomogram for predicting
   the prognosis of patients at 1 year, 2 years and 3 years were 0.696,
   0.683 and 0.655, respectively, indicating that the nomogram has good
   predictive ability.

Discussion

   Although the incidence and mortality rates of GC have declined in the
   past 5 decades [[227]53–[228]55], GC remains the third leading cause of
   cancer-related deaths [[229]56]. Cancer cells are characterized by
   genomic instability, which favors mutation accumulation and increased
   tumor heterogeneity [[230]57–[231]59]. RNA modifications have recently
   emerged as key post-transcriptional regulators of gene expression and
   are thought to be associated with various diseases, including cancer
   [[232]60]. However, the intrinsic relationship between RNA
   modifications and GC progression remains unknown. In this study, we
   analyzed the relationship between RNAMRGs and GC using bioinformatics
   analyses and statistical methods and constructed a prognostic model.

   First, we scored each TCGA sample and grouped the samples according to
   their RNAMS and found that patients with high RNA modification
   correlation had a worse prognosis. Dynamic RNA modification events can
   promote tumor progression by promoting tumor cell proliferation or
   regulating invasive and metastatic potential, resulting in a poor
   prognosis, consistent with our findings. Therefore, we explored hub
   RNAMRGs and established a prognostic prediction model for patients with
   GC using selectin P (SELP) and cystatin SA (CST2). SELP encodes a
   protein stored on platelets and endothelial cells that is redistributed
   to the plasma membrane during platelet activation and degranulation and
   mediates the interactions between activated endothelial cells or
   platelets and leukocytes [[233]61]. Coxsackievirus B 1 and 3 interact
   with human platelets to trigger SELP expression [[234]62], and SELP has
   predictive value in various cancers [[235]63, [236]64]; however, the
   relationship between SELP and RNA modification in GC carcinogenesis and
   development remains unclear. CST2 overexpression enhances the growth,
   migration, and invasion of GC cells by regulating the
   epithelial–mesenchymal transition and TGF-β1 signaling pathways
   [[237]65]. According to the formula of our predictive model, higher
   SELP and CST2 expression were associated with a higher risk score, and
   survival analysis further indicated that the high-risk group had a
   worse prognosis. The high predictive efficacy of the model was further
   validated using GEO datasets. Taken together, these findings are
   consistent with the hypothesis that SELP and CST2 may act as cancer
   progression-promoting genes in various tumors.

   Furthermore, enrichment analysis revealed the importance of the Wnt
   signaling pathway, which plays a key role in tumorigenesis [[238]66,
   [239]67]. While constructing the PPI network, we identified ELN as an
   important hub gene. ELN encodes one of the two components of elastic
   fibers that form part of the extracellular matrix and confer elasticity
   to organs and tissues. Abnormal ELN levels have been observed in many
   fibrotic diseases, including kidney [[240]68], lung [[241]69], and
   liver fibrosis [[242]70], and ELN accumulation is associated with the
   development of hepatocellular carcinoma [[243]71]. ELN has also been
   shown to regulate tumor development and the tumor microenvironment
   (TME) in colorectal cancer [[244]72]. Therefore, studying the mechanism
   of ELN and RNA modification in GC carcinogenesis is essential. miRNAs
   are epigenetic regulators that affect the levels of proteins encoded by
   target mRNAs without modifying gene sequences and are themselves
   regulated by epigenetic mechanisms [[245]73]. Reciprocal relationships
   between miRNAs and epigenetic regulation form miRNA-epigenetic feedback
   loops that can regulate cellular processes such as cell proliferation
   [[246]74], apoptosis [[247]75], and differentiation [[248]76]. To
   understand the regulatory relationship between the hub genes (e.g.,
   ELN) and miRNAs, we constructed an miRNA–mRNA regulatory network to
   explore whether miRNAs and mRNAs are associated with GC development.

   Recently, breakthroughs have been made in immunotherapy for cancers
   [[249]77], including GC [[250]78, [251]79]; however, not all patients
   with GC benefit from immunotherapy, possibly owing to the TME
   [[252]80]. Here, we found that immune cell infiltration was increased
   in the high-risk group. FBLN1 was highly expressed in most immune cells
   and correlated positively with the majority of immune genes, whereas
   FBLN5 was expressed at low levels in most immune cells. FBLN1 and FBLN5
   are components of extracellular matrix fibronectin, and ectopic FBLN1
   expression inhibits the growth of GC cells by inducing apoptosis
   [[253]81]. Additionally, FBLN5 is a potential indicator of therapeutic
   efficacy in patients with hepatocellular carcinoma [[254]82]. Thus,
   FBLN1 and FBLN5 may be potential immunotherapeutic targets for treating
   GC. A comparison of the high- and low-risk groups revealed that all
   immune genes exhibited significantly different expression patterns,
   suggesting that our predictive model is a potential indicator for
   immunotherapy in patients with GC.

   We also found that overall mutation levels for common tumor driver
   genes were lower in the high-risk group than in the low-risk group,
   while CNVs, particularly due to deletion, were significantly increased
   in all patients in the high-risk group. TIDE, TMB, and MSI results
   further suggested that our model could predict the effect of immune
   checkpoint inhibition therapy in patients with GC. We therefore
   combined the model with clinicopathological features to construct a
   nomogram to assess the clinical prognosis of GC. Despite these
   promising findings, our study had some limitations. First, the model
   has not yet been applied in a clinical setting, and its actual
   predictive accuracy remains unknown. Second, the value and mechanism of
   action of the hub RNAMRGs identified in this study have not been
   experimentally verified in patients with GC. To overcome these
   limitations, we will apply the nomogram in future clinical work and
   conduct further experiments to explore the mechanism of key RNA
   modifications in the development of GC.

   In conclusion, the RNAMRG risk prediction model developed in this study
   could effectively predict the overall survival of patients with GC.
   Thus, the model risk score can be used to study GC carcinogenesis and
   developing targeted therapies for GC.

Supplementary Information

   [255]12859_2023_5283_MOESM1_ESM.xlsx^ (10.4KB, xlsx)

   Additional file 1. Table S1: RNA modification-related genes.
   [256]12859_2023_5283_MOESM2_ESM.tif^ (11.1MB, tif)

   Additional file 2. Figure S1: Boxplots of GSE84437 (a) and GSE62254 (b)
   expression profile data.
   [257]12859_2023_5283_MOESM3_ESM.xlsx^ (32.3KB, xlsx)

   Additional file 3. Table S2: Differentially expressed genes in the
   high- and low-RNA modification score groups.
   [258]12859_2023_5283_MOESM4_ESM.xlsx^ (43.3KB, xlsx)

   Additional file 4. Table S3: Key gene modules.
   [259]Additional file 5. Table S4: Key genes.^ (12.8KB, xlsx)
   [260]12859_2023_5283_MOESM6_ESM.xlsx^ (600.8KB, xlsx)

   Additional file 6. Table S5: Correlation among all key genes.
   [261]12859_2023_5283_MOESM7_ESM.xlsx^ (19KB, xlsx)

   Additional file 7. Table S6: Univariate Cox regression analysis results
   for all key genes.
   [262]12859_2023_5283_MOESM8_ESM.tif^ (12.8MB, tif)

   Additional file 8. Figure S2: Evaluation of RNA modification score
   (RNAMS) risk prediction model. a–b Survival curves for GSE84437 (a) and
   GSE62254 (b); c–d Risk score distribution for GSE84437 (i) and GSE62254
   (j) expression profiles; e–f ROC curves for GSE84437 (e) and GSE26901
   (f).
   [263]12859_2023_5283_MOESM9_ESM.xlsx^ (36.5KB, xlsx)

   Additional file 9. Table S7: Gene Ontology enrichment analysis results.
   [264]12859_2023_5283_MOESM10_ESM.xlsx^ (10.8KB, xlsx)

   Additional file 10. Table S8: Kyoto Encyclopedia of Genes and Genomes
   enrichment analysis results.
   [265]12859_2023_5283_MOESM11_ESM.xlsx^ (16.7KB, xlsx)

   Additional file 11. Table S9: Gene set enrichment analysis results.

Acknowledgements