Abstract

Objective
This study aimed to identify potential biomarkers and elucidate molecular mechanisms in oral squamous cell carcinoma (OSCC) using an integrated bioinformatics approach.

Methods
We conducted high-throughput RNA sequencing on OSCC and adjacent normal tissues to profile mRNA and lncRNA expression. Differentially expressed genes were identified and subjected to Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. A competing endogenous RNA (ceRNA) network was constructed, and protein–protein interaction (PPI) networks were analyzed using STRING and Cytoscape software. Hub genes were identified using the CytoHubba plug-in in Cytoscape.

Results
Our analysis identified 5362 differentially expressed mRNAs and 2801 differentially expressed lncRNAs. GO analysis revealed that dysregulated mRNAs were associated with system development and responses to organic substances, whereas lncRNAs were enriched in muscle system processes and cell–cell signaling. KEGG pathway analysis highlighted cancer-related pathways, including cytokine–cytokine receptor interactions and the NF-kappa B signaling pathway. The constructed ceRNA network highlighted key regulatory nodes, including hub genes IGF2BP1, CLDN6, and HLA-G, which may play pivotal roles in OSCC.

Conclusions
This study provides a comprehensive lncRNA-mRNA regulatory network and identifies biomarkers that could advance OSCC therapeutic strategies. The findings offer new insights into OSCC pathogenesis and potential targets for clinical application.

Supplementary Information
The online version contains supplementary material available at 10.1007/s12672-025-02922-4.

Keywords: Oral squamous cell carcinoma, LncRNA, mRNA, Bioinformatics, CeRNA network

Introduction
Oral cancer encompasses all malignant tumours arising in the oral cavity.
The GLOBOCAN 2020 database reports that in 2020 alone, 377,700 new cases and 177,800 deaths were recorded globally, with incidence and mortality rates increasing in many countries [1]. Oral squamous cell carcinoma (OSCC) is the most common form of oral cancer. Despite advances in diagnostics and therapeutics, the most recently reported 5-year survival rate for OSCC stands at 40–50%, depending on macro-region and stage at diagnosis [2–4]. Early OSCC is much more amenable to cure, yet early detection is hard to achieve. Once OSCC progresses to advanced stages, survival rates fall sharply, and distant metastasis is frequently seen [2, 3, 5]. For that reason, understanding the molecular processes that drive OSCC proliferation and subsequent metastasis is critical for developing new strategies to curb progression of the disease [6, 7].

Long non-coding RNAs (lncRNAs) are RNA molecules longer than 200 nucleotides that do not code for proteins but play a crucial role in regulating gene expression. More than 5000 lncRNAs have been identified using transcriptome sequencing and microarray technologies [8–11]. Recent evidence suggests that complex regulatory interactions among lncRNAs, microRNAs (miRNAs), and messenger RNAs (mRNAs) form competing endogenous RNA (ceRNA) networks in cancer pathogenesis. Such networks contribute significantly to tumorigenesis by controlling gene expression at the transcriptional and post-transcriptional levels [12–17]. In particular, regulatory interactions within the lncRNA-miRNA-mRNA axis have been identified as important mechanisms in OSCC development, affecting tumor cell proliferation, invasion, metastasis, and resistance to therapy [18–20]. Several studies have reported associations between lncRNAs and the prognosis and progression of OSCC.
For example, high expression of the lncRNA CASC9 correlates with larger tumor size, more advanced clinical stage, and poorer overall survival [21]. Similarly, knockdown of the lncRNA LSINCT5 inhibits the proliferation and migration of OSCC cells, and its overexpression is associated with poor prognosis [19]. In addition, other important lncRNAs, such as SNHG16, CCAT1, and PART1, have been shown to play a role in the proliferation, migration, and invasion of OSCC [18–20]. Despite these studies, the complex regulatory mechanisms and interactions of the lncRNA-miRNA-mRNA axis in OSCC remain poorly explored.

This study aims to bridge those gaps in knowledge by applying high-throughput RNA sequencing (RNA-seq) to 10 pairs of OSCC tumors and corresponding adjacent normal tissues. Using these data, we constructed a lncRNA–mRNA interaction network to illuminate key regulatory interactions involved in OSCC pathogenesis. Future studies should include in vitro and in vivo validation to complement the findings of the present analysis.

Materials and methods

Sample collection
Ten patients with a histopathologically confirmed diagnosis of OSCC contributed tissue samples. Both tumor samples and adjacent non-tumorous tissues were collected following standardized clinical protocols. Inclusion criteria were histologically confirmed OSCC and staging according to the tumor-node-metastasis (TNM) staging system. Only patients with Stage I-III OSCC were included; those with distant metastasis were excluded. Tumor grading was conducted according to the World Health Organization (WHO) histopathological classification. Immediately after excision, the tissues were flash-frozen in liquid nitrogen and sectioned into 2–3 mm thick portions to preserve RNA integrity.
Samples were stored in 1.5 mL RNase-free screw-cap tubes, tightly sealed with parafilm, and kept at −80 °C until further analysis. All patients were recruited from Hospital AA and gave written informed consent before participation. The study was approved by the Ethics Committee of The First People's Hospital of Foshan (Approval No. Lunshenyan (2025) No. 202) and was carried out in accordance with the Declaration of Helsinki.

Isolation of total RNA and sequencing
TRIzol reagent (Invitrogen, Carlsbad, CA, USA) was used in accordance with the manufacturer’s protocol to isolate total RNA from the tumor and normal tissue samples of OSCC patients. Agarose gel electrophoresis was used to assess RNA purity and integrity. Initial quantification of RNA concentration and purity was performed with the NanoDrop-1000 instrument (Thermo Fisher Scientific Inc., Waltham, MA, USA). The Agilent 2100 Bioanalyzer was used to check the insert size of the library, with an expected insert size of around 250–300 bp. Illumina PE150 sequencing was conducted in accordance with the manufacturer’s guidelines, and reads were aligned using Hisat2 (http://ccb.jhu.edu/software/hisat2). A total of 20 samples (10 tumor and 10 paired adjacent normal tissues) were analyzed, yielding 1,750,496,736 raw reads; after quality control, 1,707,758,234 clean reads remained for analysis.

PPI network construction and hub gene identification
A PPI network was constructed using the STRING database (version 11.5) with a confidence score threshold of 0.7 to capture molecular interactions. The top 3096 DEGs were selected for network analysis based on adjusted p value (< 0.01) and fold change (|log2FC| ≥ 2) criteria. The resulting PPI network was visualized in Cytoscape (version 3.9.1), and hub genes were determined using the CytoHubba plug-in, which ranked the nodes with the MCC algorithm.
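
The two filtering steps described above (DEG selection by adjusted p value and fold change, then retention of high-confidence interactions) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the gene names, values, and edge scores are toy data, the function names `select_degs` and `filter_edges` are hypothetical, and treating the 0.7 cut-off as inclusive is an assumption.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    log2fc: float  # log2 fold change, tumor vs. normal
    padj: float    # multiple-testing adjusted p value


def select_degs(genes, min_abs_log2fc=2.0, max_padj=0.01):
    """Keep genes with |log2FC| >= min_abs_log2fc and adjusted p < max_padj."""
    return [g for g in genes
            if abs(g.log2fc) >= min_abs_log2fc and g.padj < max_padj]


def filter_edges(edges, kept_names, min_score=0.7):
    """Keep interactions whose endpoints are both selected genes and whose
    confidence score is at least min_score (inclusive cut-off is assumed)."""
    kept = set(kept_names)
    return [(a, b, s) for a, b, s in edges
            if a in kept and b in kept and s >= min_score]


# Toy data, purely illustrative (not values from the study).
genes = [
    Gene("GENE_A", 4.6, 0.0004),
    Gene("GENE_B", -4.5, 0.0002),
    Gene("GENE_C", 1.2, 0.0001),   # fails the fold-change criterion
    Gene("GENE_D", 3.1, 0.2),      # fails the adjusted p value criterion
    Gene("GENE_E", 2.0, 0.009),
]
edges = [
    ("GENE_A", "GENE_B", 0.91),
    ("GENE_A", "GENE_C", 0.95),    # GENE_C is not a selected DEG
    ("GENE_B", "GENE_E", 0.40),    # below the confidence threshold
]

degs = select_degs(genes)
network = filter_edges(edges, [g.name for g in degs])
print([g.name for g in degs])
print(network)
```

In practice the surviving edge list would be loaded into Cytoscape for visualization and hub ranking; the MCC ranking itself is not reproduced here.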

Isolation of LncRNA
Following standard protocols, total RNA, including long non-coding RNA (lncRNA), was extracted from the OSCC tumor and adjacent normal tissues using TRIzol reagent (Invitrogen, Carlsbad, CA, USA). RNA quality and concentration were assessed with a NanoDrop spectrophotometer, and integrity was verified using an Agilent 2100 Bioanalyzer; only samples with an RNA integrity number (RIN) > 7 were used. Lithium chloride precipitation was performed to enrich lncRNAs and remove the small RNA fraction. The final lncRNA samples were stored at −80 °C for further analyses.

Screening for differentially expressed genes
To quantify transcript fold changes and identify differentially expressed mRNAs and lncRNAs, we employed the edgeR package in R (version 3.38.5) with thresholds of |log2FC| ≥ 2.0 and FDR < 0.01. Volcano plots and heat maps were generated in R using the ggplot2 and heatmap packages, respectively.

Principal component analysis (PCA)
PCA was performed to analyze the variation in gene expression patterns between the tumor and adjacent normal tissues of OSCC. Three PCs accounting for more than 75% of the variance in gene expression profiles were chosen for visualization, providing robust separation between the tumor and normal groups.

Gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) pathway analysis
GO and KEGG pathway enrichment analyses were performed in R with the clusterProfiler package (version 4.2.2). The hypergeometric test was used to determine the significance of enrichment, and p values were adjusted for multiple comparisons using the Benjamini–Hochberg method, with an adjusted p value threshold of < 0.05. Significantly enriched biological processes, cellular components, and molecular functions were identified and visualized using ggplot2.
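
To make the statistics behind the enrichment step concrete, the sketch below computes a one-sided hypergeometric p value for a single term and applies the Benjamini–Hochberg adjustment across terms. It is a plain-Python illustration under stated assumptions, not the clusterProfiler implementation: the function names and the term counts are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, K, n, N):
    """One-sided enrichment p value P(X >= k).

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the DEG list, k: DEGs annotated to the term.
    """
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts: (term, k, K) with a background of N genes and n DEGs.
N, n = 20000, 500
terms = [("TERM_1", 40, 400), ("TERM_2", 12, 450), ("TERM_3", 30, 300)]

raw = [hypergeom_pvalue(k, K, n, N) for _, k, K in terms]
adj = benjamini_hochberg(raw)
significant = [name for (name, _, _), p in zip(terms, adj) if p < 0.05]
print(significant)
```

Exact integer arithmetic via `math.comb` keeps the sketch dependency-free; for real gene sets a library routine working in log space would be the more usual choice.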

Co-expression of differential LncRNAs and mRNAs
To create the mRNA-lncRNA co-expression network, we integrated the differentially expressed mRNAs and lncRNAs and calculated the Pearson correlation coefficient for all pairs. Pairs of mRNAs and lncRNAs with coefficients greater than 0.9 or less than −0.9 were considered correlated. The resulting mRNA-lncRNA network was visualized using Cytoscape (version 3.9.1). We then used the Molecular Complex Detection (MCODE) plug-in in Cytoscape to identify hub genes within the network, providing insights into potential key regulators in the system.

Co-expression network construction
The interplay between differentially expressed lncRNAs and mRNAs was further explored by constructing a co-expression network based on Pearson correlation analysis. Pairs with correlation coefficients > 0.9 or < −0.9 were considered strongly correlated. The network was visualized in Cytoscape (version 3.9.1), and the MCODE plug-in was used to identify hub genes, offering a view of potential key regulators in OSCC.

Results

RNA sequencing and data quality control
RNA sequencing was performed on 20 samples (10 OSCC tumors and their paired adjacent normal tissues), yielding 1,750,496,736 raw reads (Table S1). After stringent quality control, 1,707,758,234 high-quality clean reads remained (Fig. 1A). The Q30 percentage was above 95%, indicating highly accurate sequencing. Read alignment to the human reference genome (GRCh38) using Hisat2 showed a mapping efficiency of over 98%, supporting the reliability of the sequence data for all samples. PCA distinctly separated tumor tissue from adjacent normal tissue (Fig. 1B).

Fig. 1. Identification of differentially expressed mRNAs and lncRNAs in RNA-seq. A Box plot of gene expression FPKM distribution in samples. B PCA results for the RNA-seq data.
C Volcano plot of differentially expressed mRNAs. D Volcano plot of differentially expressed lncRNAs. E Heatmap of differentially expressed mRNAs. F Heatmap of differentially expressed lncRNAs. N normal, T tumor

Identification of differentially expressed genes (DEGs) and LncRNAs
A total of 5362 DEGs and 2801 differentially expressed lncRNAs were identified (|log2FC| ≥ 2.0, FDR < 0.01) (Table S2). The most significantly upregulated DEGs included IGF2BP1 (log2FC = 4.67, p < 0.001), CLDN6 (log2FC = 4.21, p < 0.001), and HLA-G (log2FC = 3.98, p < 0.001). The most significantly downregulated genes included LINC00472 (log2FC = −4.51, p < 0.001), MUC19 (log2FC = −4.33, p < 0.001), and ZG16B (log2FC = −4.12, p < 0.001). Volcano plots (Fig. 1C, D) and hierarchical clustering heatmaps (Fig. 1E, F) visualize these significant differences between OSCC tumors and normal tissues.

Gene ontology (GO) and KEGG pathway enrichment analysis
As shown in Fig. 2A, the top 20 biological process (BP) terms in tumor tissues were associated with single-organism processes, system development, and responses to organic substances; the top 20 cellular component (CC) terms were associated with integral components of the plasma membrane, extracellular regions, and cytoplasm; and the top 20 molecular function (MF) terms involved molecular functions, protein binding, and protein dimerization activities (Table S3). The KEGG pathway analysis identified the most significantly enriched pathways for DEGs, which included metabolic pathways, pathways in cancer, cytokine–cytokine receptor interaction, viral carcinogenesis, and systemic lupus erythematosus (Fig. 2B). These pathways are implicated in OSCC pathogenesis, and the PI3K-Akt signaling pathway is known to regulate tumor proliferation, survival, and resistance to apoptosis (Fig. 2B).

Fig. 2. Functional analysis of differentially expressed mRNAs. A GO analysis of differentially expressed mRNAs.
B KEGG pathway analysis of differentially expressed mRNAs

The GO and KEGG analysis of differentially expressed lncRNAs revealed important pathways linked to OSCC (Fig. 3). The top 20 terms for BP, CC, and MF are presented in Fig. 3A. In KEGG analysis, the top five significantly enriched pathways for DEGs were metabolic pathways, pathways in cancer, regulation of actin cytoskeleton, neuroactive ligand–receptor interaction, and cytokine–cytokine receptor interaction (Table S4, Fig. 3B).

Fig. 3. Functional analysis of differentially expressed lncRNAs. A GO analysis of differentially expressed lncRNAs. B KEGG pathway analysis of differentially expressed lncRNAs

PPI network construction and hub gene identification
A protein–protein interaction (PPI) network comprising 3096 DEGs was built with the STRING database, retaining interactions with medium-to-high confidence scores (> 0.7) (Fig. 4). Hub genes were identified and ranked by the Maximal Clique Centrality (MCC) algorithm via the CytoHubba plug-in. The top 10 hub genes were IGF2BP1, CLDN6, C1orf94, HLA-G, MUC19, LINC00472, LINC02582, AC073365.Y, HTN1, and ZG16B (Fig. 4). These genes have been linked to cancer progression, immune evasion, and tumor microenvironment regulation.

Fig. 4. Construction of the differentially expressed lncRNA–mRNA co-expression network. Green ovals represent key genes; blue ovals represent differentially expressed mRNAs; red diamonds represent differentially expressed lncRNAs

lncRNA-mRNA co-expression network
A co-expression network was constructed to investigate lncRNA–mRNA interactions. Pearson correlation analysis identified significant associations (|r| > 0.9, p < 0.01).
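
The correlation cut-off used for the co-expression network can be illustrated with a minimal sketch. It assumes the Pearson coefficient named in the Methods and applies only the |r| > 0.9 filter; the p value filter is omitted, and the expression profiles, identifiers, and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant profile: correlation treated as zero here
    return cov / (sd_x * sd_y)


def coexpression_edges(lncrnas, mrnas, min_abs_r=0.9):
    """Return (lncRNA, mRNA, r) for every pair with |r| > min_abs_r."""
    edges = []
    for lnc_name, lnc_values in lncrnas.items():
        for mrna_name, mrna_values in mrnas.items():
            r = pearson(lnc_values, mrna_values)
            if abs(r) > min_abs_r:
                edges.append((lnc_name, mrna_name, r))
    return edges


# Toy expression profiles across five samples (not data from the study).
lncrnas = {"LNC_X": [1.0, 2.0, 3.0, 4.0, 5.0]}
mrnas = {
    "MRNA_P": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with LNC_X
    "MRNA_Q": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as LNC_X rises
    "MRNA_R": [1.0, 3.0, 2.0, 3.0, 1.0],   # no linear relationship
}

edges = coexpression_edges(lncrnas, mrnas)
for lnc, mrna, r in edges:
    print(lnc, mrna, round(r, 3))
```

Both strongly positive and strongly negative pairs are kept, which matches the two-sided threshold in the text; the resulting edge list is what would be exported for network visualization.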
Cytoscape visualization showed that IGF2BP1, CLDN6, and LINC00472 are the major regulatory hubs, indicating their possible roles in the modulation of OSCC-associated pathways (Fig. 4). These results provide insight into OSCC pathogenesis, highlighting specific key genes and pathways that could be candidates for therapeutic intervention or potential prognostic biomarkers.

Discussion
OSCC is an extremely heterogeneous malignancy; various factors, including ethnicity, tumor stage, and molecular subtype, may lead to variations in gene expression patterns and affect patient prognosis [22, 23]. We identified differentially expressed genes (DEGs) and long non-coding RNAs (lncRNAs) associated with OSCC, revealing several key regulatory pathways. Incorporating TCGA RNA-seq data into the analysis could further reinforce these findings by correlating biomarker gene expression with survival signatures in various patient cohorts [24].

Our results strongly suggest the importance of lncRNA-mediated regulatory networks in OSCC development. The competing endogenous RNA (ceRNA) hypothesis states that lncRNAs act as molecular sponges that modulate the stability and translation of target mRNAs by sequestering specific miRNAs [25, 26]. Although key lncRNA–mRNA interactions were identified, miRNA profiling was not fully integrated into our study, which limited the extent to which ceRNA interactions could be explored. Combining miRNA expression profiling with target prediction tools such as miRDB or TargetScan could provide further insight into the interconnected regulatory networks directing OSCC pathogenesis [27].

The protein–protein interaction (PPI) network analysis revealed several key hub genes, including IGF2BP1, CLDN6, and HLA-G.
IGF2BP1 is an oncogenic RNA-binding protein associated with enhanced tumor cell proliferation and metastasis, with high expression levels linked to poor prognosis in multiple cancers [28]. CLDN6, a tight junction protein, is involved in epithelial-mesenchymal transition (EMT), contributing to OSCC invasiveness and metastatic potential [29]. HLA-G, an immune checkpoint regulator, has been implicated in immune evasion mechanisms, suggesting a potential role in OSCC immune escape [30].

Our pathway enrichment analysis highlighted PI3K-Akt and NF-kappa B signaling as major pathways involved in the biology of OSCC. These pathways are vital for tumor cell survival, proliferation, and resistance to therapy [31, 32]. The PI3K-Akt pathway, often activated in OSCC, has been linked with poor prognosis, while NF-kappa B signaling contributes mainly to inflammation-driven carcinogenesis and chemoresistance [33]. Small-molecule inhibitors that specifically target these pathways, or immunotherapeutics, might provide novel therapeutic options for OSCC patients.

The difference in coding potential between mRNAs and non-coding RNAs is an important consideration when interpreting their contributions to OSCC progression. mRNAs encode active proteins that can drive oncogenic processes, whereas lncRNAs act primarily at the transcriptional and post-transcriptional levels to regulate gene expression [34]. The literature suggests that specific lncRNAs, such as LINC00472, play tumor-suppressing roles by regulating oncogene expression. In contrast, lncRNAs such as LINC02582 are thought to play oncogenic roles by promoting tumor growth and invasion [35, 36].

We acknowledge several limitations of this study. The small sample size limits generalizability and calls for larger multi-center studies.
The results reported here should be interpreted with caution, since they still require experimental validation in in vitro and in vivo models. Further investigation would benefit greatly from integrating clinical data, including patient survival and treatment response, which would enhance the translational potential of the biomarkers identified. Future efforts should focus on functional assays and patient-derived OSCC samples to establish clinical correlations.

In summary, this analysis of lncRNA–mRNA interactions in OSCC revealed key pathways and intricate regulatory networks implicated in tumor progression. Integrating bioinformatics analyses with experimental verification and clinical datasets, and extending these findings through broader multi-center studies, could ultimately lead to novel diagnostic and therapeutic strategies for OSCC. How the interactions between these molecular players affect OSCC prognosis and response to therapy remains an important area of future inquiry.

Conclusion
Our study identified ten RNAs as potential biomarkers for OSCC, providing valuable insights into their regulatory mechanisms in the lncRNA-mRNA ceRNA network. By building a PPI network, we identified key hub genes that might contribute to OSCC progression and therapeutic resistance. While our findings identified potential molecular targets, further experimental validation and clinical studies are required to examine their diagnostic and prognostic significance. Integrated with multi-omics data and functional assays, future studies could refine these biomarkers into novel OSCC diagnosis and treatment strategies.

Electronic supplementary material
Below is the link to the electronic supplementary material.
Supplementary Material 1 (636.6KB, xls)
Supplementary Material 2 (369.4KB, xls)
Supplementary Material 3 (166KB, xls)
Supplementary Material 4 (2.1MB, xls)

Author contributions
ShiWei Liu, Jin Li and Qing Shao carried out study concepts and design, clinical and experimental studies, data acquisition and analysis, statistical analysis, and manuscript preparation and editing; JuFeng Chen and Chen Zou contributed to literature research; YiLong Ai was the guarantor of integrity of the entire study and contributed to manuscript review. All authors have read and approved this article.

Funding
None.

Data availability
The datasets used or analysed during the current study are available from the corresponding author on reasonable request.

Declarations

Consent for publication
Informed consent was obtained from all individual participants included in the study.

Ethical approval
All procedures performed in studies involving human participants were in accordance with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. This study was approved by the Ethics Committee of Foshan First People’s Hospital. Written informed consent was obtained.

Competing interests
The authors declare no competing interests.

Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
ShiWei Liu, Jin Li and Qing Shao are co-first authors.

References