Abstract

MicroRNAs (miRNAs) are promising biomarkers for the early detection and prognostic prediction of esophageal squamous cell carcinoma (ESCC). This study aimed to explore potential biomarkers for the early diagnosis of ESCC and the underlying molecular pathogenesis. First, 48 differentially expressed miRNAs (DEMs) and 1319 differentially expressed genes (DEGs) were identified between 94 ESCC tissues and 13 normal esophageal tissues in TCGA. In the miRNA–mRNA regulatory network, the 48 DEMs have 6558 target genes, 400 of which are also among the 1319 DEGs. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses indicated that these 400 DEGs are significantly enriched in cell cycle, proteoglycans in cancer, p53 signaling pathway, protein digestion and absorption, transcriptional dysregulation in cancer, and oocyte meiosis. Sixty-six DEGs participate in these six biological pathways; we call them GO-DEGs. In the miRNA–mRNA regulatory network, 32 DEMs regulate the 66 GO-DEGs, and 22 of these DEMs have been verified in the literature by different types of experiments in ESCC tissues, cells, or serum. For the remaining 10 novel DEMs, single-factor Cox regression analysis showed that only hsa-miR-34b-3p was not significantly correlated with the overall survival of ESCC patients. We thus obtained nine novel ESCC-related DEMs, of which three are down-regulated and six are up-regulated. We analyzed the expression trends of target genes for five miRNAs and identified three significantly different miRNAs (hsa-miR-205-3p, hsa-miR-452-3p, and hsa-miR-6499-3p), which were confirmed by qPCR. Stage-specific miRNAs were also suggested: the three qPCR-validated miRNAs are specific to the early stages of ESCC, with hsa-miR-452-3p specific to Stages I, II, and III, hsa-miR-205-3p specific to Stages II and III, and hsa-miR-6499-3p specific to Stage II. They might therefore be potential biomarkers for ESCC stage diagnosis.
This study identified three novel miRNA markers that are potentially related to the diagnosis of ESCC and that participate in the occurrence and development of ESCC through cell cycle, proteoglycans in cancer, p53 signaling pathway, protein digestion and absorption, transcriptional dysregulation in cancer, and oocyte meiosis.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-024-76321-0.

Keywords: Esophageal squamous cell carcinoma, miRNA, mRNA, Biomarkers

Subject terms: Gene expression, Gene regulation, Cancer genomics, Tumour biomarkers

Introduction

Esophageal cancer is one of the most common malignant gastrointestinal tumors. Worldwide it ranks eighth in incidence and sixth in mortality among all cancers^1, and it is the fourth leading cause of cancer-related death in China^2. Esophageal cancer has two main pathological subtypes: esophageal adenocarcinoma and esophageal squamous cell carcinoma (ESCC). In China, about 90% of esophageal cancers are ESCC^3, and around 90% of ESCC cases are at a severe pathological stage^4. The prognosis of ESCC in China is very poor, with a 5-year survival rate of only 15–25%^5. ESCC is located mainly in the upper and middle esophagus. Causal factors include smoking, alcohol abuse, dietary habits, and genetic factors^6. However, the specific pathogenesis of ESCC remains unclear. Studies have identified several miRNAs that are upregulated in ESCC patients, including miR-10a, miR-18a, miR-19b, miR-21, miR-22, miR-25, miR-31, miR-93, miR-129, miR-1246, miR-1322, miR-451, and miR-365. Conversely, miR-155, miR-203, miR-205, miR-375, miR-377, miR-486, and miR-718 are downregulated in these patients^7. Recent findings indicate that miRNA profiles are crucial for ESCC prognosis. High levels of plasma miR-21 and miR-16 correlate with poor survival, and low miR-375 levels predict an especially poor outcome^8.
A six-miRNA signature outperforms traditional tumour, node and metastasis (TNM) staging in accuracy, while miR-129, miR-103, and miR-107 also relate to worse survival rates^9. Conversely, miR-377 and miR-15a levels are inversely related to survival, suggesting their potential as new prognostic markers, along with the rising importance of miR-367 serum levels. These results highlight the significant role of miRNA dysregulation in the diagnosis and treatment of ESCC^10. Previous studies show that miRNAs may participate in the occurrence, development, invasion, and metastasis of ESCC; thus, miRNAs are promising diagnostic and prognostic biomarkers of ESCC^11,12. miRNAs are small non-coding RNAs that regulate gene expression post-transcriptionally and can act as oncogenes or tumor suppressor genes^13. A meta-analysis^14 of 11 studies revealed that miRNAs could be used as biomarkers for the early detection of ESCC. Most previous studies on ESCC miRNA markers used The Cancer Genome Atlas (TCGA) database, while a few searched the miRNACancerMAP database and PubMed^15–17. In this study, we used the miRNA and mRNA transcriptomic data and the clinical survival information of ESCC from TCGA to explore novel key miRNA markers participating in the progression of ESCC, as well as their molecular pathogenic mechanisms.

Results

Differentially expressed miRNAs (DEM) and genes (DEG) between ESCC and normal

In total, 94 ESCC and 13 normal samples with both miRNA and mRNA transcriptome data from TCGA were included in the study. Compared with the normal esophageal tissue samples, 1319 differentially expressed genes (DEGs) with FDR < 0.05 and a 4-fold change were obtained, of which 845 were upregulated and 474 were downregulated in ESCC samples (volcano plot in Fig. 1A). Likewise, 48 differentially expressed miRNAs (DEMs) with FDR < 0.05 and a 4-fold change were identified, of which 28 were upregulated and 20 were downregulated in ESCC samples (volcano plot in Fig. 1B). The heatmaps of these 1319 DEGs and 48 DEMs are shown in Fig. 1C and D, respectively.

Fig. 1. DEMs and DEGs between ESCC and normal samples. A Volcano plot of 1319 DEGs with FDR < 0.05 and 4-fold change; B Volcano plot of DEMs with FDR < 0.05 and 4-fold change; C Heatmap of 1319 DEGs; D Heatmap of 48 DEMs.

The target genes of 48 DEMs

The 48 DEMs have 6559 target genes according to the miRTarBase database (https://mirtarbase.cuhk.edu.cn/~miRTarBase/miRTarBase_2022/php/index.php). The comparison of these 6559 target genes with the 1319 DEGs is shown in Fig. 2. The Venn diagram in Fig. 2A shows that 400 DEGs are also target genes regulated by the 48 DEMs, i.e., DEM-regulated DEGs. The heatmap of the 400 DEM-regulated DEGs in Fig. 2B shows that they separate the ESCC and normal samples well.

Fig. 2. Common genes between 1319 DEGs and 6559 target genes of 48 DEMs. A Venn diagram of 1319 DEGs and the 6559 target genes of 48 DEMs, where the 400 common genes are DEM-regulated DEGs; B Heatmap of the 400 DEM-regulated DEGs.

GO and KEGG enrichment

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of the 1319 DEGs, the 6559 target genes of the 48 DEMs, and the 400 DEM-regulated DEGs are shown in Fig. 3.

Fig. 3. GO and KEGG enrichment of DEGs and DEMs. A Dot plot of top GO terms enriched from 6559 target genes of 48 DEMs; B Dot plot of top GO terms enriched from 1319 DEGs; C Dot plot of top GO terms enriched from 400 DEM-regulated DEGs; D Dot plot of top KEGG pathways enriched from 6559 target genes of 48 DEMs; E Dot plot of top KEGG pathways enriched from 1319 DEGs; F Dot plot of top KEGG pathways enriched from 400 DEM-regulated DEGs; G The occurrence heatmap shows that only 66 DEM-regulated DEGs participate in the top six KEGG pathways.
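The selection steps described in the two preceding subsections (thresholding on FDR and fold change, then overlapping the DEGs with the DEM target genes) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. The function names, the input layout (a log2 fold change and an FDR per feature), the reading of the 4-fold cutoff as |log2FC| >= 2, and all gene and miRNA identifiers below are assumptions made for the example.

```python
import math


def select_differential(results, fdr_cutoff=0.05, fold_cutoff=4.0):
    """Return {feature: "up" | "down"} for features passing both cutoffs.

    results maps a feature name to (log2_fold_change, fdr).
    Assumption: a 4-fold change is read as |log2FC| >= log2(4) = 2.
    """
    min_abs_log2fc = math.log2(fold_cutoff)
    selected = {}
    for feature, (log2fc, fdr) in results.items():
        if fdr < fdr_cutoff and abs(log2fc) >= min_abs_log2fc:
            selected[feature] = "up" if log2fc > 0 else "down"
    return selected


def regulated_degs(degs, mirna_targets):
    """Intersect the DEGs with the union of all DEM target genes.

    mirna_targets maps a miRNA name to the set of its target genes,
    e.g. as exported from a miRNA-target database.
    """
    all_targets = set().union(*mirna_targets.values())
    return set(degs) & all_targets


# Toy input with placeholder identifiers (not data from this study).
toy_results = {
    "GENE_A": (2.5, 0.001),   # passes both cutoffs, upregulated
    "GENE_B": (-3.0, 0.010),  # passes both cutoffs, downregulated
    "GENE_C": (1.0, 0.001),   # fold change too small
    "GENE_D": (2.2, 0.200),   # FDR too high
}
toy_targets = {
    "miR-x": {"GENE_A", "GENE_C"},
    "miR-y": {"GENE_E"},
}

toy_degs = select_differential(toy_results)
toy_overlap = regulated_degs(toy_degs, toy_targets)
```

In this toy input, `toy_degs` keeps GENE_A and GENE_B, and `toy_overlap` keeps only GENE_A, which plays the role of a DEM-regulated DEG.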
From the GO enrichment of the 6559 DEM target genes in Fig. 3A, the top enriched biological processes include cell growth, mitotic cell cycle phase transition, gland development, regulation of apoptotic signaling pathway, and regulation of binding. The top cellular components include cell-substrate junction, focal adhesion, spindle, chromosomal region, and cell leading edge. The top molecular functions include DNA-binding transcription factor binding; DNA-binding transcription activator activity, RNA polymerase II-specific; RNA polymerase II-specific DNA-binding transcription factor binding; and cadherin binding.

From the GO enrichment of the 1319 DEGs in Fig. 3B, the top enriched biological processes include epidermis development, organelle fission, nuclear division, skin development, and mitotic nuclear division. The top enriched cellular components include the collagen-containing extracellular matrix, chromosome, chromosomal region, and basal part of the cell. The top enriched molecular functions include glycosaminoglycan binding; DNA-binding transcription activator activity, RNA polymerase II-specific; extracellular matrix structural constituent; and heparin binding.

From the GO enrichment of the 400 DEM-regulated DEGs in Fig. 3C, the top enriched biological processes include organelle fission, nuclear division, chromosome segregation, mitotic nuclear division, nuclear chromosome segregation, mitotic cell cycle phase transition, sister chromatid segregation, mitotic sister chromatid segregation, microtubule cytoskeleton organization involved in mitosis, and regulation of chromosome segregation. The top cellular components include the spindle, chromosomal region, condensed chromosome, chromosome centromeric region, condensed chromosome centromeric region, kinetochore, mitotic spindle, midbody, kinesin complex, and outer kinetochore. The top molecular functions include DNA-binding transcription activator activity, RNA polymerase II-specific; DNA-binding transcription activator activity; tubulin binding; microtubule binding; catalytic activity acting on DNA; microtubule motor activity; cytoskeletal motor activity; DNA secondary structure binding; extracellular matrix structural constituent conferring tensile strength; and single-stranded DNA helicase activity.

From the KEGG enrichment of the 6559 DEM target genes in Fig. 3D, the top enriched KEGG pathways include Human papillomavirus infection, Proteoglycans in cancer, Cellular senescence, and Cell cycle. From the KEGG enrichment of the 1319 DEGs in Fig. 3E, the top enriched KEGG pathways include Cell cycle, Oocyte meiosis, Hippo signaling pathway, Melanogenesis, IL-17 signaling pathway, and p53 signaling pathway. Furthermore, the KEGG enrichment of the 400 DEM-regulated DEGs in Fig. 3F shows that the top enriched KEGG pathways include cell cycle, proteoglycans in cancer, p53 signaling pathway, protein digestion and absorption, transcriptional dysregulation in cancer, and oocyte meiosis. Both GO and KEGG enrichment indicate that these six pathways are the main mechanisms shared by the DEM target genes and the DEM-regulated DEGs, and that they are related to ESCC.
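Enrichment of a gene list in a GO term or KEGG pathway is commonly assessed with a hypergeometric over-representation test. The Python sketch below shows that calculation for one pathway. This section does not state which tool or test produced the enrichment results, so the choice of test, the function name, and the numbers in the example are assumptions for illustration only.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, pathway_size, query_size, overlap):
    """One-sided over-representation p-value P(X >= overlap).

    X is the number of pathway genes found when query_size genes are drawn
    without replacement from a universe of universe_size genes, of which
    pathway_size belong to the pathway.
    """
    max_overlap = min(pathway_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers (not from this study): 10 genes in the universe, 4 in the
# pathway, 3 in the query list, 2 of which fall in the pathway.
# (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = 40/120
p_toy = hypergeom_enrichment_p(10, 4, 3, 2)
```

In practice such p-values are computed for every term and then corrected for multiple testing, which this sketch leaves out.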
We therefore selected the DEM-regulated DEGs in these six ESCC-related pathways; there are 66 of them, as shown in the occurrence heatmap of Fig. 3G. The heatmaps of these 66 DEM-regulated DEGs in the six ESCC-related pathways are shown in Fig. 4A–F, respectively. They show that these 66 DEM-regulated DEGs separate ESCC and normal samples well. We then looked for the DEMs regulating these 66 DEGs. According to the miRNA–mRNA regulatory network in the miRTarBase database, 32 DEMs in total regulate these 66 DEGs; we call them ESCC-related DEMs.

Fig. 4. The heatmaps of the DEM-regulated DEGs in six ESCC-related pathways. A Heatmap of DEM-regulated DEGs in cell cycle pathway; B Heatmap of DEM-regulated DEGs in proteoglycans in cancer pathway; C Heatmap of DEM-regulated DEGs in p53 signaling pathway; D Heatmap of DEM-regulated DEGs in protein digestion and absorption pathway; E Heatmap of DEM-regulated DEGs in transcriptional misregulation in cancer pathway; F Heatmap of DEM-regulated DEGs in oocyte meiosis pathway.

Comparison with ESCC-related DEMs reported in the literature

Among the 32 ESCC-related DEMs identified in the “GO and KEGG enrichment” section, 22 DEMs (hsa-miR-133a-3p, hsa-miR-196a-5p, hsa-miR-205-5p, hsa-miR-129-5p, hsa-miR-196b-5p, hsa-miR-30a-5p, hsa-miR-30a-3p, hsa-miR-148a-3p, hsa-miR-149-5p, hsa-miR-139-5p, hsa-miR-224-5p, hsa-miR-204-5p, hsa-miR-455-3p, hsa-miR-338-3p, hsa-miR-153-5p, hsa-miR-192-5p, hsa-miR-1-3p, hsa-miR-29c-3p, hsa-miR-375, hsa-miR-338-5p, hsa-miR-194-5p, hsa-miR-675-3p) were previously reported in miRNACancerMAP and PubMed and verified by different experiments in ESCC tissues, cell lines, or serum, as shown in Table 1.

Table 1. ESCC-related DEMs reported in miRNACancerMAP and PubMed.

miRNA Expression Target gene Detection method Sample References