Abstract Background To investigate the correlation between computed tomography (CT) radiomic characteristics and key genes for cervical lymph node metastasis (LNM) in oral squamous cell carcinoma (OSCC). Methods The region of interest was annotated at the edge of the primary tumor on enhanced CT images from 140 patients with OSCC and obtained radiomic features. Ribonucleic acid (RNA) sequencing was performed on pathological sections from 20 patients. the DESeq software package was used to compare differential gene expression between groups. Weighted gene co‐expression network analysis was used to construct co‐expressed gene modules, and the KEGG and GO databases were used for pathway enrichment analysis of key gene modules. Finally, Pearson correlation coefficients were calculated between key genes of enriched pathways and radiomic features. Results Four hundred and eighty radiomic features were extracted from enhanced CT images of 140 patients; seven of these correlated significantly with cervical LNM in OSCC (p < 0.01). A total of 3527 differentially expressed RNAs were screened from RNA sequencing data of 20 cases. original_glrlm_RunVariance showed significant positive correlation with most long noncoding RNAs. Conclusions OSCC cervical LNM is related to the salivary hair bump signaling pathway and biological process. Original_glrlm_RunVariance correlated with LNM and most differentially expressed long noncoding RNAs. Keywords: computed tomography imaging (CT imaging)3, genomics4, lymph node metastasis (LNM)2, oral squamous cell carcinoma (OSCC)1, ribonucleic acid sequencing (RNA‐seq)5 1. INTRODUCTION Oral cancer accounts for 3%–5% of all systemic malignant tumors. It is the 15th most common malignant tumor in terms of incidence, with more than 480,000 new cases and approximately 220,000 global deaths each year[38] ^1 ; In this context, oral squamous cell carcinoma (OSCC) accounts for 90% of oral cancers.[39] ^2 After comprehensive treatment, the 5‐year survival rate of OSCC remains at 50%,[40] ^3 , [41]^4 while that of patients with cervical lymph node metastases (LNM) is only 20%. Ultrasonography (US), computed tomography (CT), magnetic resonance imaging (MRI), and 18F‐FDG PET/CT have been widely used to evaluate the cervical LN status of OSCC patients before surgery.[42] ^5 , [43]^6 , [44]^7 Medical imaging plays an important role in the clinical decision‐making process in oncology. Radiomics is used to decode tumor phenotypes by converting conventional imaging images of patients into quantitative data that can be mined, so as to help in the application of tumor diagnosis, efficacy prediction, and prognosis analysis.[45] ^8 Therefore, accurate assess cervical LN status based on CT radiomics, and screening of biomarkers related to cervical LNM are crucial for the treatment and prognosis of OSCC. Molecular characterization of tumor tissue by genomics and proteomics is achieved by obtaining tissue specimens through invasive surgery. However, specimens have an inherent selection bias during the biopsy process and cannot accurately display the entire lesion.[46] ^9 , [47]^10 Ribonucleic acid‐sequencing (RNA‐seq) can provide a comprehensive view of the entire, including fusion gene identification, coding sequence polymorphism study, gene expression analysis, directional sequencing, and single‐cell sequencing transcript and shape understanding on genome function from all aspects.[48] ^11 The ideal biomarker is noninvasive, predictive, high spatiotemporal resolution, and inexpensive. With tissue and blood compared with liquid‐derived biomarkers, imaging genomics combines radiographic phenotypes and genome data; it analyzes the relevance of genetic mutations and diagnoses radiographic features in intermediate phenotypes to explore disease pathogenesis. These markers can be used to predict treatment response and the likelihood of early metastasis and to personalize treatment decisions.[49] ^12 , [50]^13 , [51]^14 Image genomics has been applied to the study of various types of tumors, including renal clear cell carcinoma,[52] ^15 liver cancer,[53] ^16 and breast cancer[54] ^17 among others. It is noteworthy that standardized image acquisition and reconstruction is an important step to achieve the accuracy and reliability of imaging genomics research and analysis results.[55] ^18 In addition, the Image Biomarkers Standardization Initiative (IBSI) should also be established when calculating and extracting the images required for radiomics features.[56] ^19 Tumor phenotypic characteristics can be obtained through multimodal high‐throughput noninvasively, and radiomics features can be extracted based on standardized imaging data to jointly serve as noninvasive biomarkers for early tumor identification in clinical practice.[57] ^20 This study was aimed to analyze the correlation between imaging and gene expression in patients with cervical LNM from OSCC. Firstly, the tumor‐based multi‐region analysis included multiple quantitative imaging features, the Minimum Redundancy Maximum Relevance(mRMR) and Least absolute shrinkage and selection operator (LASSO) were used to further screen the optimal radiomics features. Weighted Gene Co‐expression Network Analysis (WGCNA) was performed to capture the gene expression patterns of lymph node metastasis in patients with OSCC. It also screened the association between LNM radiomic features and gene modules, explored imaging biomarkers and pathways of LNM, and aimed to predict the development of LNM in patients with OSCC. 2. MATERIALS AND METHODS 2.1. Screening of CT radiomic features of OSCC The study was approved by the Ethics Committee of the Chinese PLA General Hospital. A total of 140 patients with OSCC who underwent enhanced CT scan in the Department of Oral and Maxillofacial Surgery, Chinese PLA General Hospital from January 2014 to June 2020 were retrospectively analyzed for CT radiomics feature extraction (Table [58]S1). Twenty randomly selected tumor tissue samples were used for RNA‐seq (Table [59]S2). Patients who fulfilled the following criteria were included: no history of preoperative radiotherapy, chemotherapy, or any other anticancer therapy; underwent definitive surgery with a clear postoperative pathological diagnosis of squamous cell carcinoma. ITK‐SNAP software (version 3.8.0, University of Pennsylvania, [60]www.itksnap.org) was used to determine and delineate the region of interest (ROI) at the edge of the primary tumor on CT images. The PyRadiomics library from python was used to extract radiomic features from the delineated ROI. Finally, mRMR and LASSO were used to further select the optimal radiomic features (Figure [61]1). FIGURE 1. FIGURE 1 [62]Open in a new tab Work flowchart for the study. TCGA, the cancer genome atlas; OSCC, oral squamous cell carcinoma; non‐LNM, non lymph node metastasis; LNM, lymph node metastasis; LASSO, least absolute shrinkage and selection operator; WGCNA, weighted gene co‐expression network analysis; mRMR, minimum redundancy maximum relevance. 2.2. RNA‐seq For sample testing, 10–1000 ng/μL of total RNA with an RNA integrity number of ≥7.0 and 28S/18S ribosomal RNA (rRNA) ≥1.0 was analyzed using an Agilent 2100 Bioanalyzer. Following digestion with deoxyribonuclease I, a Ribo‐Zero TM kit was used for removing the rRNA from the total RNA, and the messenger RNA (mRNA), long noncoding RNA (lncRNA), and circular RNA were purified. The mRNA was broken into 140–160 nt fragments using a fragmentation buffer, and the secondary structure was denatured and hybridized with primers. A strand of circular deoxyribonucleic acid was synthesized using six‐base random primers by polymerase chain reaction (PCR). Double‐stranded circular deoxyribonucleic acid was synthesized from deoxyuridine triphosphate, and the ends were repaired using a Thermomixer Comfort apparatus by adding adenine bases to the 3′‐end. Following PCR amplification, an Agilent 2100 Bioanalyzer was used for detecting the range of the inserted fragments in the library. An ABI StepOnePlus Real‐Time PCR System (TaqMan Probe) was used to measure the quantitative concentration of the library. The library was denatured into single strands by the addition of sodium hydroxide, and hybridized with the connector on a Flow Cell. Bridge PCR amplification was performed on a cBot system; sequencing was finally performed on a DNBSEQ platform. The low‐quality reads containing adapter sequences and reads with >10% N (number of bases with mass values ≤5 accounting for >50% of the total reads) were removed. The gene expression profile data were then obtained by quality control of the raw sequencing data using FastQC software. 2.3. Differential expression analysis DESeq software was used for analyzing differential gene expression patterns in the expression profile data; the screening threshold was as follows: p < 0.05 and |log2FoldChange| > 1. The VennDiagram package (version 1.6.20) in R (version 4.0.1) was used to depict the differentially expressed genes in the 20 OSCC samples and the Cancer Genome Atlas (TCGA) database. The differentially expressed genes that were common between the two groups were subsequently extracted. 2.4. Weighted gene co‐expression network analysis (WGCNA) The differential gene expression data were normalized to transcript per million values to ensure comparability of the gene expression profiles among the samples. The optimal act law coefficient β value was determined based on the pickSoftThreshold function in the WGCNA package (version 1.69). The gene expression similarity matrix was converted into a network, constructed based on the adjacency matrix. The degree of dissimilarity between the genes was subsequently calculated and converted into a topological matrix. The topological overlap matrix was used to describe the level of association between the genes, and the hierarchical clustering method was used to obtain a gene clustering tree. The modules were split with the dynamic clipping tree algorithm and the minimum number of genes in each module was set to 30. The modules with similar expression were merged using the parameters TOMType = “unsigned” and MergeCutHeight = 0.25, and the network was finally constructed. The module eigengene (ME) was calculated by principal component analysis of the gene expression matrix of the entire module. The Pearson correlation coefficient between the ME and radiomic features was then calculated for determining the relationship between the modules and radiomic features. Smaller p values were indicative of more significant correlation. The labeledHeatmap function was used to visualize the model and the above correlation matrix. 2.5. Enrichment analysis of key module gene pathways Metascape ([63]http://metascape.org/) was used for online analysis of the shared differentially expressed genes identified from the sequencing data and TCGA database for the 20 OSCC samples in this study. Bar charts depicting the results of gene functional pathway enrichment were also prepared using Metascape. The genes that demonstrated significant correlation with the radiomic features were extracted. The genes were annotated by Gene Ontology (GO) analysis, into biological process (GO_BP), molecular function (GO_MF), and cellular component (GO_CC) categories. The Kyoto Encyclopedia of Genes and Genomes (KEGG) database was used for classification and annotation of the genes. The results of KEGG functional enrichment were depicted using bubble charts. 2.6. Correlation analysis Pearson correlation analysis was used for determining the correlation between the major enriched pathway genes and radiomic features in the key gene modules; these were visualized using the Pheatmap package (version 1.0.12) and Cytoscape (version 3.7.2), respectively. 3. RESULTS 3.1. Radiomic feature screening A total of 480 radiomic features were extracted from 140 patients with OSCC. The mRMR algorithm was used to further screen 171 features for LASSO analysis (Figure [64]2). LASSO regression was used to select the optimized feature subset. The seven following radiomic features were selected: Original_shape_Maximum2DDiameterRow, original_glrlm_RunVariance, original_glszm_Zonepercentage, square_firstorder_Median, square_glrlm_ShortRunHighGrayLevelEmphasis, logarithm_glszm_SmallAreaEmphasis, and exponential_firstorder_RootMeanSquared. FIGURE 2. FIGURE 2 [65]Open in a new tab Screening features in LASSO regression. (A) Relationship between the adjustment parameter (lambda) and mean square error (MSE) in LASSO regression; (B) Trend plot of the adjustment parameter (lambda) for 171 radiomic features in LASSO regression. 3.2. Differential gene analysis of pathological samples from patients with OSCC The transcriptome sequencing RNA‐seq data of the 20 patients in the experimental group revealed a total of 3527 differentially expressed RNAs (DERNAs) between LNM and non‐LNM; 2157 genes were upregulated while 1370 were downregulated (Figure [66]3). In the control setup, a total of 139 differentially expressed genes were screened between the LNM and non‐LNM groups in TCGA database. Among them, 41 genes were differentially expressed in both datasets (Figure [67]S1). Gene were primarily enriched in the GO: 0031424 (keratinization) pathway term (Figure [68]S2). The common differential genes that were enriched in this pathway included GAL, KRT81, TGM3, LCE2A, FOXJ1, SPRR3, LCE2B, and LCE2D (Table [69]1). FIGURE 3. FIGURE 3 [70]Open in a new tab Volcano plot depicting the overall distribution of the differentially expressed genes. For the DERNAs between the LNM and non‐LNM groups, the 2157 upregulated genes are depicted in red and the 1370 downregulated genes are depicted in green. Genes with no significant difference in expression levels are depicted in black. TABLE 1. Enrichment of common differential genes in keratinization. Gene symbol Description Biological process (GO) Protein function (Protein Atlas) GAL Galanin and GMAP prepropeptide GO:0051795 positive regulation of timing of catagen Disease related genes; Predicted secreted proteins GO:1902606 regulation of large conductance calcium‐activated potassium channel activity GO:1902608 positive regulation of large conductance calcium‐activated potassium channel activity KRT81 Keratin 81 GO:0070268 cornification GO:0031424 keratinization GO:0030216 keratinocyte differentiation Disease related genes Predicted intracellular proteins TGM3 Transglutaminase 3 GO:0043163 cell envelope organization Disease related genes ENZYME proteins; Transferases GO:0045229 external encapsulating structure organization Enzymes Potential drug targets; GO:0031069 hair follicle morphogenesis Predicted intracellular proteins LCE2A Late cornified envelope 2A GO:0031424 keratinization; GO:0030216 keratinocyte differentiation GO:0009913 epidermal cell differentiation Predicted intracellular proteins FOXJ1 Forkhead box J1 GO:0002510 central B cell tolerance induction GO:0002646 regulation of central tolerance induction GO:0002648 positive regulation of central tolerance induction Predicted intracellular proteins Disease related genes Cancer‐related genes Candidate cancer biomarkers; Transcription factors; Helix‐turn‐helix domains SPRR3 Small proline rich protein 3 GO:0018149 peptide cross‐linking Predicted intracellular proteins GO:0070268 cornification Cancer‐related genes Candidate cancer biomarkers GO:0031424 keratinization LCE2B Late cornified envelope 2B GO:0031424 keratinization Predicted intracellular proteins GO:0030216 keratinocyte differentiation GO:0009913 epidermal cell differentiation LCE2D Late cornified envelope 2D GO:0031424 keratinization; Predicted intracellular proteins GO:0030216 keratinocyte differentiation GO:0009913 epidermal cell differentiation [71]Open in a new tab 3.3. Construction of gene co‐expression network module WGCNA is used to simplify complex gene expression data into several functional modules by measuring the co‐expression relationship between gene expression correlation coefficients and radiomic features; the modules with biological significance can be identified by combining the phenotypic information. The optimal act law coefficient β value was determined to be 26; this indicates that the network was scale‐free (Figure [72]S3). The modules with similar expression were merged and a total of 13 gene modules were finally constructed (Figure [73]4A). The largest module comprised 1013 RNAs (gene module in turquoise), while the smallest module contained 49 RNAs (gene module in cyan). The number of genes in each module has been depicted in the Table [74]S3. The relative expression levels of the genes in the LNM and non‐LNM groups were subsequently determined (Figure [75]4B). In the largest gene module, the genes in the LNM group were relatively highly expressed, while the genes in the non‐LNM group exhibited relatively low expression. FIGURE 4. FIGURE 4 [76]Open in a new tab Construction of the co‐expressed gene network module and verification of gene expression levels. (A) The upper part of the co‐expressed gene network module is divided into clustering trees, the middle color bar is the gene module segmented by the dynamic shear tree algorithm, and the lower color band is the final gene module result after merging the modules with similar expression. (B) Relative expression levels of LNM and non‐LNM genes; red represents high expression and blue represents low expression. The heat map of correlation between gene modules and radiomic features (Figure [77]5) showed that the original_glrlm_RunVariance feature had the strongest correlation with the turquoise gene modules (correlation coefficient 0.85, p < 0.001). The correlation between the purple gene module and the square_firstorder_Median feature was second only to that with square_firstorder_Median (correlation coefficient 0.82, p < 0.001). The exponential_firstorder_RootMeanSquared feature also showed correlation with the purple gene module (correlation coefficient 0.78, p < 0.001). The turquoise and purple gene modules were screened for subsequent analysis. FIGURE 5. FIGURE 5 [78]Open in a new tab Heat map of correlation between gene modules and radiomic features. Rows represent the module eigengene (ME) of each module and represent the overall expression level of the module. Columns represent radiomic features. Each square is represented by the r value (p value); r is the correlation coefficient between modules and radiomic features and the p value represents the significance of the correlation coefficient. 3.4. Enrichment analysis of key gene module pathways The turquoise gene modules were mainly enriched in the “taste transduction” signaling pathway of the KEGG database (Figure [79]6A) and the “transposition, RNA‐mediated” biological process of GO_BP (Figure [80]6B). In the KEGG database(Figure [81]6C), the purple gene module was mainly enriched in the salivary secretion pathway; in GO_BP, it was mainly enriched in the biological process of saliva secretion (Figure [82]6D). Moreover, the enrichment bubble plots of the two genes in GO_CC and GO_MF are shown in Figure [83]S4, respectively. FIGURE 6. FIGURE 6 [84]Open in a new tab Enrichment bubble diagram. Enrichment bubble diagram of Turquoise gene module based on KEGG_Pathway (A), GO_BP (B); Purple module gene based on KEGG_Pathway (C); GO_BP (D). 3.5. Correlation analysis between genes and radiomic features A total of 56 genes were extracted from the main enrichment pathways of the turquoise and purple gene modules. The correlation between each gene and radiomic eigenvalue was calculated based on the expression levels of each gene. The results showed significant positive correlation (p < 0.05) between most genes and the two image features, square_glrlm_ShortRunHighGrayLevelEmphasis and original_glrlm_RunVariance (Figure [85]7). The correlation network diagram between key genes and radiomic features, which was constructed based on screening, showed that the original_glrlm_RunVariance feature correlated significantly and positively with most long noncoding RNAs (Figure [86]8). FIGURE 7. FIGURE 7 [87]Open in a new tab Correlation between key genes and the radiomic features heat map. Rows represent key genes, columns represent radiomic features, red represents positive correlation, and blue represents negative correlation, * represents significance (p < 0.05). FIGURE 8. FIGURE 8 [88]Open in a new tab Correlation network diagram between key genes and radiomic features (lncRNAs in orange, mRNAs in green, pseudo genes in blue, and radiomic features in red). The lines represent correlations and solid lines represent positive correlations. 4. DISCUSSION The concept of imaging genomics was first proposed by the European Society for Radiotherapy and Oncology (ESTRO) to establish correlations between gene expression data and radiomic features.[89] ^21 In a study from Israel, Segal et al[90] ^16 obtained 28 image features and gene expression profiles of patients with primary liver cancer after contrast enhanced CT scanning; they proposed a three‐step strategy for performing correlation analysis between radiomic features and gene expression. Image genomics correlates quantitative or qualitative imaging features with genomic data obtained from tissue analysis and other clinical data to provide more comprehensive understanding on tumors; this in turn aids the identification of imaging biomarkers that can be used as alternatives to genetic testing. The noninvasive nature and high spatial and temporal resolution of imaging biomarkers make them ideal for therapeutic monitoring and predictive analysis. Although studies performed over the past few decades have elucidated some of the mechanisms underlying OSCC progression, most studies have focused on protein‐coding genes, with few studies based on noncoding RNA. Protein‐coding genes comprise only 2% of the human genome, and most transcripts are noncoding RNA, including lncRNAs, which exceed 200 nt in length and do not have protein‐coding ability. Studies have shown that genomic regions that cannot encode proteins play an important role in regulating cellular physiological processes and disease progression; these regions are usually transcribed as lncRNA.[91] ^22 , [92]^23 , [93]^24 Based on the correlation between gene and radiomics features greater than 0.75. In this study, the radiomic feature, ‘original_glrlm_RunVariance’, which was extracted from enhanced CT images of 140 patients with OSCC, correlated significantly and positively with various lncRNA including F11‐AS1, LOC105373131, and LINC02588. A large number of abnormally expressed lncRNAs are associated with various cancers, and have become an important universal gene category in cancer development and progression. OSCC demonstrates many instances of gene regulation by lncRNAs; these are associated with and affect various aspects of cell balance including proliferation, survival, migration, and genomic stability. There is currently no OSCC‐specific diagnostic marker. Abnormal expression of some lncRNAs has been shown to be closely related to cancer prognosis. HOX transcript antisense RNA (HOTAIR) is highly expressed in OSCC tissues, and its expression level correlates with tumor size, TNM stage, and prognosis. These findings suggest that HOTAIR can be used as a biomarker for the diagnosis and prognostication of OSCC and as a molecular target for treatment.[94] ^25 Tang et al[95] ^26 found that MALAT‐1 and HOTAIR are expressed in patients with primary OSCC tumors; the expression level of HOTAIR differed between patients with LNM and primary tumor controls. This suggests that lncRNA detection offers promise for the noninvasive and rapid diagnosis of OSCC in the clinic; it may also help determine the presence of cervical LNM. The therapeutic effect and possibility of early cervical LNM can therefore be further predicted based on these imaging features. Many genomic and clinical biomarkers identified from various types of cancer have been included in TCGA; these correlate with corresponding images in the Cancer Imaging Archive database.[96] ^14 However, imaging data from the Cancer Imaging Archive (TCIA) cannot be used in clinical practice as the results of genetic testing cannot be matched with specific locations on images.[97] ^10 TCIA database was different from the imaging data types of 140 patients with OSCC in this study. contrast‐enhanced CT was used in 140 patients with OSCC, while Unenhanced CT was used in TCIA database. Imaging genomics can quantify lesion characteristics, better distinguish tumor nature and metastasis risk, stratify patients according to risk, and allow more accurate imaging and screening of tumors.[98] ^27 , [99]^28 Zhou et al[100] ^29 analyzed preoperative CT imaging features in 113 cases of non‐small cell lung cancer and detected changes in genetic information from tumor tissues by RNA‐seq; they then identified the genes associated with ground‐glass shaped irregular nodules. Using radiomics and gene expression analysis of enhanced CT images, Badic et al[101] ^30 found that enhanced CT combined with gene expression can predict the prognosis of primary colorectal cancer more effectively; this is conducive to the formulation of treatment plans. The main limitations of current imaging genomics models include the lack of reproducibility and stability.[102] ^31 In this study, owing to the small sample size (as in most retrospective studies), a prospective validation cohort is lacking. The main limitation of deep learning‐based image genomics is the availability of a limited data set. Insufficient volumes of required data lead to insufficient stratification between training, validation, and test queues.[103] ^32 , [104]^33 , [105]^34 Interpretable models therefore need to be urgently combined with open access and high‐quality public databases, as well as complete genomic and imaging data for different types of tumors; this will allow tumor heterogeneity to be better studied and addressed. We contrast enhanced CT images of cervical LNM of OSCC were delineated according to this standard, and the imaging features were obtained, and finally screened a small number of reliable mark genes related to radiomic features. And finally screened a small number of reliable mark genes related to radiomic features. In addition, factors such as different cancer cell clones, tumor microenvironment, cell cycle and cell differentiation should be taken into account when establishing radiogenomics models,[106] ^35 although these factors may eventually be replaced by more effective radiogenomics. In summary, with the development of precision medicine, image genomics has become a bridge between the tumor phenotype and genotype. From the perspective of cancer management, the role of imaging genomics is expected to expand across all stages, including tumor diagnosis, treatment response prediction, and risk monitoring. Imaging genomics not only provides more accurate prediction results, but also more effective treatment options. The use of imaging biomarkers to reflect gene expression analysis of solid tumors provides a rapid and reproducible noninvasive evaluation modality for more accurate diagnosis and prediction through detailed analysis of images. Further studies are needed to explore the association between radiomic features and key genes; this will allow researchers to determine whether new biomarkers can predict diagnosis, metastasis, or treatment response in independent patient populations. It will also improve the accuracy of prognostication and help develop the best treatment plan, thereby contributing to personalized precision therapy. 5. CONCLUSIONS CT imaging of cervical lymph node metastasis in OSCC is closely related to RNA sequencing. Based on RNA‐seq data analysis from OSCC patients, the findings from this study suggest that the “salivary secretion” signaling pathway and biological process may be related to the development of LNM in OSCC. Correlation analysis of radiomic features and key genes extracted from enhanced CT showed the “original_glrlm_RunVariance” feature to be related to metastasis; it also showed significant positive correlation with most differentially expressed lncRNAs. This forms a basis for the future search of gene imaging substitutes and the interpretation of radiomic features at the gene level. AUTHOR CONTRIBUTIONS Nenghao Jin: Data curation (lead); writing – original draft (lead). Bo Qiao: Data curation (equal); investigation (equal). Min Zhao: Methodology (equal); software (lead). Liangbo Li: Data curation (equal). Liang Zhu: Data curation (equal). Xiaoyi Zang: Data curation (equal). Bin Gu: Funding acquisition (lead); writing – review and editing (equal). Haizhong Zhang: Software (equal); writing – review and editing (equal). FUNDING INFORMATION This work were supported by Beijing Natural Science Foundation(Number: Z200001) and the National Key R&D Program of China (No. 2020YFC2008900). CONFLICT OF INTEREST STATEMENT The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. ETHICS STATEMENT The studies involving human participants were reviewed and approved by Ethics Office of Chinese PLA General Hospital(Number: S2022‐203‐01). And the consent to participate in the study have been obtained from participants. Author confirm that all the experiments on the use of human tissue samples confirm that all experiment were performed in accordance with the Declaration of Helsinki's guidelines and regulations. The study was approved and the participants gave their informed consent. CONSENT Not applicable. Supporting information Supplementary Figure 1 Venndiagram of the common differential genes in RNA‐seq and TCGA databases of 20 OSCC patients, blue circles are the differential genes between LNM and non‐LNM groups of OSCC patients in TCGA database, yellow circles are the differential genes obtained by RNA‐seq analysis of pathological samples of 20 OSCC patients in experimental group. A total of 41 genes were differentially expressed in both datasets. Supplementary Figure 2 Enrichment analysis of common differential gene pathways showed that genes were mainly enriched in the GO: 0031424 (Keratinization) pathway Supplementary Figure 3 Selection of parameters for network construction. A. The β values are presented on the x‐axis and the R^2 values (using the scale‐free topology junction model under different β values) are presented on the y‐axis; B. The x‐axis represents the β values and the y‐axis represents the average adjacency coefficient under different β values. Supplementary Figure 4 Enrichment bubble diagram: Enrichment bubble diagram of Turquoise gene module based on GO_CC (A) and GO_MF (B); Purple module gene based on GO_CC (C) and GO_MF (D). Supplementary Table 1 Information of 140 patients with oral squamous cell carcinoma Supplementary Table 2 The correlation of OSCC clinical‐pathological variables Supplementary Table 3 13 color genes Number of genes contained in the color module [107]Click here for additional data file.^ (963.6KB, docx) ACKNOWLEDGMENTS