Abstract Background Oral squamous cell carcinoma (OSCC) is the most common type of oral cancer with alarmingly high morbidity. The cancer-associated fibroblasts (CAFs) play a pivotal role in tumor development, while their specific mechanisms in OSCC remains largely unclear. Our object is to explore a CAFs-related biomarker in OSCC. Methods Single-cell RNA sequencing (ScRNA-seq) analysis was used to pinpoint CAF clusters in OSCC samples. Differentially expressed genes and Cox regression analyses were used to identify candidate genes, and their functions were evaluated using Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analyses. The prognostic performance of the identified biomarker was evaluated using receiver operating characteristic analysis. The qPCR and western blot were used to assess gene expression. The hub gene related immune characteristics were analyzed in independent cohorts, and gene expression differences between different immunotherapy response groups were investigated using Pearson correlation analysis. Results Desmoglein-2 (DSG2) was identified as a CAFs-related biomarker in OSCC exhibiting elevated expression compared to controls and being associated with poor prognosis. Enrichment analyses revealed that DSG2 was involved in signal transduction pathways like focal adhesion. The Area Under Curve values of DSG2 in predicting prognosis exceeded 0.6 in both training-set and validation-set. Furthermore, patients with low DSG2 expression were more likely to benefit from immunotherapy than those DSG2 highly expressed patients. Conclusion Our study identified DSG2 as a reliable CAFs-related prognostic biomarker in OSCC, providing a new reference for the mechanistic understanding and target therapy of this malignancy. Supplementary Information The online version contains supplementary material available at 10.1186/s12903-024-05284-5. Keywords: Oral squamous cell carcinoma, Cancer-associated fibroblasts, DSG2, Prognosis, Immune cell infiltration, Immunotherapy Introduction Oral squamous cell carcinoma (OSCC) is currently the most prevalent form of oral cancer, with alarmingly high morbidity and mortality rates [[36]1]. It affects diverse sites such as the lips, tongue, palate, cheek mucosa, gums, floor of the mouth, and the retromolar region [[37]2]. Multiple etiological factors including smoking, betel nut chewing, and human papillomavirus (HPV) infection have been established as significant risks for the development of OSCC [[38]1]. Despite advancements in treatment modalities, the overall 5-year survival rate for OSCC patients remains around 50% [[39]3], primarily due to the high incidence of metastasis, recurrence, and resistance to traditional chemotherapy methods [[40]4]. Additionally, the majority of OSCC patients are diagnosed at a clinically advanced stage, and further compounding the challenge [[41]5]. Although modern treatment options have shown promise in improving patient outcomes, a significant proportion of patients, ranging from 25 to 50%, continue to experience recurrence and distant metastasis [[42]6]. Therefore, early diagnosis and more effective therapeutic measures are crucial to improve the prognosis of OSCC patients. The tumor microenvironment (TME), a complex ecosystem encompassing tumor cells and stromal cells, has been established to associate with cancer progression [[43]7]. Among these stromal components, cancer-associated fibroblasts (CAFs) play a pivotal role in tumor development, tissue repair, and inflammatory responses [[44]8]. CAFs have been extensively studied in various cancer types, such as breast cancer, prostatic cancer, and lung cancer [[45]9–[46]11], due to their crucial involvement in cancer progression. Specifically, CAFs secrete a diverse range of growth factors and cytokines that promote tumor-enhancing properties, thereby influencing tumor cell proliferation, metastasis, and chemoresistance [[47]12, [48]13]. In the context of OSCC, CAFs have garnered increasing attention from researchers. For instance, Bienkowska et al. have observed a correlation between CAFs enrichment and poorer prognosis in OSCC patients [[49]14]. Sekiguchi et al. have demonstrated that a specific factor, ACLP, activates CAFs in OSCC, suppressing CD8 + T cell infiltration and thus facilitating tumor growth [[50]15]. Additionally, CAFs have been shown to stimulate the OSCC tumor cells invasion via CXCL1 and IL-1β, highlighting the interdependency between CAFs and cancer cells within the OSCC microenvironment [[51]16]. Yang et al. have further elucidated the underlying mechanism, revealing that CAFs activate the TGF-β1/Smad pathway, promoting OSCC invasion and ultimately leading to a worse prognosis [[52]17]. These accumulating evidences underscore the crucial role of CAFs in OSCC. Understanding these functions is essential for developing targeted therapeutic strategies against OSCC. Therefore, there is an urgent need to identify convenient and cost-effective CAFs-related biomarkers for OSCC patients, which could potentially serve as prognostic indicators and therapeutic targets. In this study, our objective is to employ single-cell RNA-sequencing (scRNA-seq) analysis to delineate CAFs clusters within OSCC and to identify CAFs-related hub genes in OSCC. Leveraging bioinformatics and machine learning techniques, we aim to identify a CAFs-related prognostic biomarker and to evaluate its impact on the OSCC immune microenvironment as well as treatment efficacy. Through our comprehensive investigation, we anticipate gaining novel insights into the diagnosis and treatment strategies for OSCC. Materials and methods Subjects The mRNA expression profile data, maf files, and clinical information of 335 OSCC patients including 305 OSCC samples and 30 paraneoplastic samples (controls) were downloaded from the Cancer Genome Atlas (TCGA, [53]https://tcga-data.nci.nih.gov/tcga/, version: Data Release 36.0—December 12, 2022) database. The detailed clinical information of these 305 OSCC samples were shown in Table [54]1. Table 1. Clinicopathological characteristics of OSCC patients from TCGA database Characteristics Patients (N = 305) NO % Gender Female 102 33.33% Male 203 66.67% Age  ≤ 61(Median) 156 51.31%  > 61(Median) 149 48.69% Grade GX 3 0.98% G1 48 16.01% G2 191 62.42% G3 62 20.26% Unknown 1 0.33% Survival Time Long(> 5 years) 31 10.13% Short(< 5 years) 274 89.87% OS status Dead 143 46.73% Alive 162 53.27% M M0 288 94.44% M1 2 0.65% Mx 12 3.92% Unknown 3 0.98% N N0 160 52.29% N1 56 18.30% N2 76 24.84% N3 2 0.65% Nx 9 2.94% Unknown 2 0.98% T T1 18 5.88% T2 97 31.70% T3 72 23.86% T4 110 35.95% Tx 5 1.63% Unknown 3 0.98% Primary site Anterior floor of mouth 2 0.65% Border of tongue 1 0.33% Cheek mucosa 19 6.21% Floor of mouth 51 16.67% Gum 8 2.61% Hard palate 4 1.31% Lip 3 0.98% Lower gum 2 0.65% Mouth 20 6.54% Overlapping lesion of lip,oral cavity and pharynx 69 22.55% Palate 1 0.33% Tongue 125 40.85% Upper gum 1 0.33% [55]Open in a new tab In addition, the mRNA expression profile datasets [56]GSE103322 ([57]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE103322), [58]GSE30784 ([59]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE30784), [60]GSE78060 ([61]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE78060), [62]GSE41613 ([63]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE41613), [64]GSE35640 ([65]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE35640), and [66]GSE91061 ([67]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE91061) were downloaded from the Gene Expression Omnibus (GEO, [68]https://www.ncbi.nlm.nih.gov/geo/) database. The details of all datasets were shown in Supplementary Table 1. Especially, the [69]GSE103322 dataset included 5,901 scRNA-seq data. The [70]GSE41613 was used to validate the survival differences. The [71]GSE35640 and [72]GSE91061 datasets were utilized to investigate the impact of OSCC’ s prognostic biomarker on immunotherapy response. ScRNA-seq data processing and analysis The processing of scRNA-seq data was conducted using the R package “Seurat” (version 5.0.1) [[73]18]. Initially, cells with low-quality data were filtered out. Subsequently, the filtered data were normalized. Principal Component Analysis (PCA) was then performed on the normalized data. Unsupervised clustering of the major cell subtypes was achieved by the “FindClusters” function within “Seurat” package, which was followed by visualization using t-distributed stochastic neighbor embedding (tSNE). Cellular annotation of CAFs were achieved through four marker genes, including ACTA2, FAP, PDGFRB, and NOTCH3. To further elucidate the heterogeneity within the CAF population, a re-clustered procedure was implemented, employing the same algorithm of “FindNeighbors” and “FindClusters” functions. This re-clustering was followed by dimensionality reduction using tSNE, specifically focusing on the CAF clusters [[74]19]. The specific genes of CAFs clusters in OSCC were identified using the “FindAllMarkers” function in “Seurat”. The copy number variations (CNV) characteristics among the CAFs clusters were analyzed using the “CopyKAT” R package to differentiate between tumor cells and normal cells in each sample. Differential expression analysis The differentially expressed gene (DEG) analysis was conducted using the “limma” R package (version 4.1.0) [[75]20]. The current study encompassed the comprehensive computation of p-values and logFC (log-fold change) values for each individual gene. Additionally, to account for the potential biases introduced by multiple testing, the Benjamini–Hochberg method was employed for the adjustment of p-values, resulting in the derivation of adjusted p-values (p.adjust). “|log[2]FC|> 1 and p.adjust < 0.05” were defined as the threshold for the differential expression of genes. Functional enrichment analysis The Gene Ontology (GO) terms, Kyoto Encyclopedia of Genes and Genomes(KEGG) pathway enrichment analysis, and Gene Set Enrichment Analysis (GSEA) were conducted using the “clusterProfiler” R function package (version 4.7.1.2) [[76]21]. A threshold of p < 0.05 was established for identifying significant biological processes and pathways. Survival analysis The R packages “survival” and “survminer” (version 0.4.9) ([77]https://CRAN.R-project.org/package=survival) were used to estimate the overall survival of patients in different gene expression groups. Specifically, the survival curves were drawn using Kaplan–Meier method, and the significance of survival difference was determined with log-rank test. Tissue samples Twenty pairs of clinical tissue samples, consisting of OSCC tissues and paraneoplastic tissues, were obtained from Tianjin Medical University Cancer Institute and Hospital (Supplementary Table 2), and written informed consent was obtained from all participating patients. This study was approved by the Medical Ethics Committee at Tianjin Medical University Cancer Institute and Hospital (approval number: E2024087A). The entire collection process adhered strictly to the ethical principles outlined in the Declaration of Helsinki. Upon harvesting the fresh tissue specimens, they were promptly rinsed with sterile phosphate-buffered saline (PBS) or normal saline to ensure optimal cleanliness. Following thorough aspiration of residual PBS, a minimum of 0.2 g of each sample was carefully transferred into a dedicated, enzyme-free 2 mL centrifuge tube. These tubes were then stored under controlled conditions at −80 °C and transported over dry ice to maintain the necessary cryogenic temperatures throughout transit, ensuring the quality and integrity of the samples for subsequent analysis. Quantitative real-time PCR (qPCR) The total RNA was extracted from cells by TriQuick Reagent Total RNA Extraction Reagent (R1100, Beijing Solarbio Science&Technology Co.,Ltd., Beijing, China). Next, the qPCR was performed according to the protocol in previous study [[78]22]. The primer sequences of Desmoglein-2 (DSG2) were as follows: forward, 5’-TGGATCAAGGGGGCAGTCTA-3’); reverse, (5’-CCACCTTCCATCCATCTCGG-3’). In addition, GAPDH was the reference gene and the primer sequences of GAPDH were: forward, (5’-GAAGGTGAAGGTCGGAGTC-3’); reverse, (5’-GAAGATGGTGATGGGATTTC-3’). Western blot (WB) About 0.1 g of tissue samples were weighed and about 1 mL radioimmunoprecipitation assay buffer (RIPA) Lysis Buffer (R0010-100 ml, Beijing Solarbio Science&Technology Co.,Ltd., Beijing, China) was added to extract total protein. Next, the WB was performed followed the protocol in previous study [[79]23]. In this experiment, the primary antibody was DSG2 Polyclonal Antibody (21,880–1-AP, Proteintech Group, Wuhan, China) and GAPDH Antibody (60,004–1-Ig, Proteintech Group, Wuhan, China), and the secondary antibody was horseradase labeling of goat with anti-rabbit IgG 1:1000 (ZB-2301–0.1 ml, Beijing Zhong Shan-Golden Bridge Biological Technology Co., Ltd., Beijing, China) and horseradase labeling of goat anti-mouse IgG 1:1000 (ZB-2301–0.1 ml, Beijing Zhong Shan-Golden Bridge Biological Technology Co., Ltd., Beijing, China). Finally, the ECL color was developed and placed into a fully automated chemiluminescence image analysis system for internal imaging. The protein bands were analyzed for gray values using the gray value software ImageJ. The original gels and multiple exposure images were shown in Supplementary Fig. S1. Immunohistochemistry The paraffin-embedded tumor tissue and adjacent paracancerous tissue were sectioned sequentially. All paraffin sections underwent deparaffinization and rehydration. Antigen retrieval was performed on the tissue sections using a citric acid antigen retrieval buffer (pH 6.0) (ZLI-9064, Beijing Zhongshan Jinqiao Biotechnology Co., Ltd, Beijing, China). Endogenous peroxidase activity was inhibited with a 3% hydrogen peroxide solution (10,011,218, Sinopharm Chemical Reagent Co., Ltd., China), followed by blocking with goat serum (C0265, Biyuntian Biotechnology, shanghai, China). The sections were then incubated overnight at 4 °C with DSG2 polyclonal antibody (1: 1000, 21,800–1-AP, Proteintech), followed by a 50-min incubation with the Goat Anti-Rabbit lgG Secondary Antibody (HRP) (1:1000, SSA004, Sino Biological Inc.) at room temperature. Subsequently, a DAB (ZLI-9018, Beijing Zhongshan Jinqiao Biotechnology Co., Ltd, Beijing, China) color reaction was conducted, followed by hematoxylin (BA4041, Zhuhai beso Biotechnology Co., Ltd., Shenzhen, China) counterstaining and alcohol dehydration. Finally, after sealing with neutral gel, the tissue sections were observed under a microscope (DP26, OLYMPUS). Construction and validation of nomogram The patients’ independent prognostic factors were assessed by performing multivariate Cox regression, and then a prognostic nomogram was established using R package “rms” based on these independent prognostic factors. In the nomogram, independent prognostic factors were matched with a score, and the total score was obtained by adding the scores across all variables of each sample. Then the calibration curves of the nomogram were then performed to examine the predicted value between the predicted 1 -, 3 -, and 5-year survival rates and the virtual outcomes. Immune cell infiltration analysis The R package “CICERSORT” was used to calculate the proportions of a total of 22 immune cell types for each sample in different groups [[80]24]. In addition, Pearson correlation analysis was performed to assess the association between immune cell infiltration and the hub gene’s expression level. The immunity score was calculated using the “estimate” function [[81]25]. To assess the effect of immunotherapy across diverse groups, we leveraged the Tumor immune dysfunction and exclusion (TIDE, [82]http://tide.dfci.harvard.edu/ which quantified the tumor's potential immune escape capabilities. A higher TIDE score was indicative of a less favorable response to immunotherapy. Furthermore, immunophenoscore (IPS) correlation scores for each sample were downloaded from the Cancer Immunome Database (TCIA) database (tcia.at/home, version: Data Release 36.0—December 12, 2022). A higher IPS indicated a higher response rate. The p < 0.05 was regarded as significant. Statistical analysis The Wilcox test was applied to assess the statistical significance of differences in gene expression and immune cell infiltration between two groups. The “cor” function of R language was performed to analyze the Pearson correlation. Multivariate Cox regression analysis was used to determine whether the biomarker under investigation serves as an independent prognostic factor. All statistical analyses were conducted using R software(version 4.3.1). Statistical significance was considered at p < 0.05. Results ScRNA-seq data clustering and identification of CAFs-related genes in OSCC Employing the scRNA-seq dataset [83]GSE103322, we conducted a rigorous single-cell clustering and annotation procedures to delineate the CAFs clusters in OSCC. Initially, we implemented quality control measures to exclude low-quality single cells (Supplementary Fig. S2A-E). Subsequently, we annotated the cellular subpopulations based on four established CAFs marker genes including ACTA2, FAP, PDGFRB, and NOTCH3. As a result, a total of four distinct CAFs clusters were identified (Fig. [84]1A). Notably, we observed varied expression patterns of these four marker genes across the different CAF subsets (Fig. [85]1B, C), further validating the accuracy of our CAFs classification. Fig. 1. [86]Fig. 1 [87]Open in a new tab Identification of CAFs-related specific genes using single-cell analysis in the [88]GSE103322 cohort. A A t-SNE plot of the distribution of four CAFs clusters after clustering. B, C T-SNE plots (B) and violin plots (C) of the expression of four published CAFs’ marker genes in different clusters. D Dot plot of the top five CAFs-related specific genes’ expression in different clusters. E T-SNE distribution map of tumor and normal cells predicted by the CNV characteristics. F The top 20 GO terms enrichment of CAFs-related specific genes. G The top 20 KEGG pathway enrichment of CAFs-related specific genes Then, we identified a total of 4,509 CAFs-related genes (CAFs-genes) in these four CAFs clusters using the “FindAllMarkers” function (Supplementary Table 3). The top five genes with the highest expression levels in each cluster were shown in Fig. [89]1D. Additionally, leveraging copy number variation (CNV) characteristics, we distinguished tumor cells and normal cells within the CAFs clusters (Fig. [90]1E). This distinction revealed 2,021 DEGs (CAFs-DEGs) between the tumor and normal cellular components. To explore the functional roles of CAFs-genes in OSCC, we performed GO and KEGG pathway enrichment analyses. The GO enrichment analysis results showed that they were involved in critical biological processes such as ribosome biogenesis and ribonucleoprotein complex biogenesis (Fig. [91]1, Supplementary Table 4). The KEGG analysis results revealed that they were significantly enriched in 61 pathways such as focal adhesion and proteoglycans in cancer (Fig. [92]1, Supplementary Table 4). Identification of DSG2 as a CAFs-related biomarker in OSCC To screen CAFs-related candidate genes in OSCC, the DEG analysis has been conducted. As a result, we identified 4,504, 2,498, and 2,264 DEGs between OSCC samples and controls in the TCGA (TCGA-DEGs) (Fig. [93]2A), [94]GSE30784 ([95]GSE30784-DEGs) (Fig. [96]2B), and [97]GSE78060 datasets ([98]GSE78060-DEGs) (Fig. [99]2C), respectively. Next, 102 overlapping genes were generated by taking the intersection of the 4,509 CAFs-genes, 2,021 CAFs-DEGs, 4,504 TCGA-DEGs, 2,498 [100]GSE30784-DEGs, and 2,264 [101]GSE78060-DEGs (Fig. [102]2D). To further investigate their clinical significance, we utilized the TCGA database to conduct an univariate Cox regression analysis on these 102 overlapping genes. Notably, the results indicated that eight genes exhibited significant associations with OSCC prognosis including DDIT4, DSG2, ADM, PLAU, BIRC5, ADORA2B, TK1, and LAPTM4B (Fig. [103]2E), highlighting their potential prognostic value as CAFs-related candidate genes. Fig. 2. [104]Fig. 2 [105]Open in a new tab Identification of CAFs-related candidate genes in OSCC. A-C Volcano plot of DEGs between OSCC and control samples in the TCGA (A), [106]GSE30784 (B), and [107]GSE78060 datasets (C). D Overlapping DEGs of 4,509 CAFs-genes, 2,021 CAFs-DEGs, 4,504 TCGA-DEGs, 2,498 [108]GSE30784-DEGs, and 2,264 [109]GSE78060-DEGs. E Univariate Cox regression analysis of eight candidate genes. F, G AUC values of eight candidate genes in diagnosing OSCC in the TCGA (F) and [110]GSE78060 (G) datasets based on ROC analysis. H ROC curves of DSG2 in diagnosing OSCC based on fivefold CV Subsequently, we conducted an receiver operating characteristic (ROC) analysis using the TCGA dataset to assess the diagnostic performance of eight CAFs-related candidates. Encouragingly, the findings revealed that all genes exhibited Area Under Curve (AUC) values exceeded 0.7 (Fig. [111]2F), suggesting their strong predictive capabilities. Additionally, when assessed in the [112]GSE78060 dataset, the AUC values further underscored the diagnostic significance of these genes in OSCC (Fig. [113]2G). Among the previous studies in OSCC, we discovered a scarcely reported gene, DSG2, which belonged to the cadherin superfamily of cell adhesion proteins. Given its established role in regulate cell–cell contacts, the aberrant expression of DSG2 was likely to play a crucial role in tumor genesis and progression of OSCC. The ROC analysis revealed AUC values of 0.711 in the TCGA dataset and 0.885 in the [114]GSE78060 dataset for DSG2 (Supplementary Fig. S3A-S3B). To further validate its diagnostic capability, we conducted a fivefold cross-validation (fivefold CV) methodology using the TCGA dataset, which yield a mean AUC value of 0.8365 for DSG2 (Fig. [115]2H). This finding further corroborated the potential effectiveness of DSG2 as a biomarker in diagnosing OSCC. Based on these results, we postulated DSG2 as a CAFs-related hub gene and proceeded to conduct downstream analysis to elucidate its intricate role in the pathogenesis of OSCC. Significant high expression of DSG2 in OSCC samples compared to controls To investigate the role of DSG2 expression in OSCC, we conducted a comprehensive analysis of its expression patterns in both OSCC samples and controls. The results showed a significant up-regulation of DSG2 in OSCC samples compared to the controls across the TCGA, [116]GSE30784, and [117]GSE78060 datasets (Fig. [118]3A-C). The results of qPCR showed that mRNA level of DSG2 was significantly upregulated in OSCC tissue compared to normal tissue (Fig. [119]3D). Immunohistochemistry images showed that DSG2 was more highly expressed in OSCC tissues than in normal tissues (Fig. [120]3E). The level of DSG2 protein was also significantly increased OSCC tissues (Fig. [121]3F). in In addition, compared to HPV-negative OSCC samples in the TCGA dataset, DSG2 was significantly increased in the HPV-positive samples (Fig. [122]3G). This observation suggested a potential association between DSG2 expression and HPV infection status in OSCC, implying a pivotal role for DSG2 in the disease's pathogenesis. Fig. 3. [123]Fig. 3 [124]Open in a new tab Validation of high DSG2 expression as an independent risk factor for the prognosis of OSCC. A-C Box plot of DSD2 expression in OSCC and controls of TCGA (A), [125]GSE30784 (B), and [126]GSE78060 cohorts (C). D The expression of DSD2 mRNA in OSCC determined by qPCR. E Immunohistochemistry staining of DSG2 (magnification: 100 × and 200x). F The expression of DSD2 protein in OSCC determined by western blot. G Box plot of DSG2 expression in HPV-negative and HPV-positive of TCGA dataset. H, I Survival analysis of patients with different DSG2 expression in TCGA (H) and [127]GSE41613 (I) datasets. J Multivariate Cox regression analysis of DSG2 expression and other clinicopathologic characteristics including age, gender, and grade. K Nomogram model integrating the DSG2 expression and age. L-N The calibration curves for the nomogram's overall survival prediction encompass 1-year (L), 3-year (M), and 5-year (N) horizons. (* p < 0.05, **p < 0.01, ***p < 0.001) High expression of DSG2 as an independent risk factor for the prognosis of OSCC Next, we examined the impact of DSG2 expression on the prognosis of OSCC patients. The OSCC samples in the TCGA dataset (TCGA-OSCC) and [128]GSE41613 cohort were categorized into high-expression (DSG2-H) and low-expression(DSG2-L) groups based on the median DSG2 expression level. Survival analysis results revealed that, in both the TCGA-OSCC and [129]GSE41613 cohorts, the DSG2-H group exhibited significantly poorer overall survival rates compared to the DSG2-L group (Fig. [130]3H, I). This observation strongly suggested that the high DSG2 expression served as a detrimental prognostic factor in OSCC. To assess the independence of DSG2 expression in predicting prognosis, we conducted a multivariate Cox regression analysis in the TCGA-OSCC samples, incorporating three critical clinical factors including age, gender, and grade. The results confirmed that both DSG2 expression and age retained significan predictive power for overall survival (Fig. [131]3J). Specifically, age and DSG2 expression (HR = 1.3, 95%CI: 1.11–1.5, P < 0.001) emerged as independent prognostic factors. Building on these findings, to enhance clinical prognosis prediction for OSCC patients, we utilized these two independent prognostic factors, DSG2 expression and age, to construct a nomogram. This nomogram facilitated the prediction of 1-, 3-, and 5-year survival probabilities for OSCC patients (Fig. [132]3K). The calibration curves further validated the accuracy of our nomogram, demonstrating its effectiveness in predicting actual survival outcomes (Fig. [133]3L-N). Elevated expression of DSG2 linked to malignant progression of OSCC To further investigate the role of DSG2 in the progression of OSCC, we employed the TCGA dataset to comprehensively analyze the relationship of its expression patterns with gender, age, clinical stage, grade, N stage, and T stage. The results revealed no significant variation in DSG2 expression levels between different gender and age groups (Supplementary Fig. S4A, B). However, we observed significant differences in DSG2 expression among different clinical stages (Supplementary Fig. S4C) and grade groups (Supplementary Fig. S4D), with a gradual increase in expression levels paralleling the malignancy progression of the tumor. On the other hand, although we did not detect significant differences in DSG2 expression among different N stage (Supplementary Fig. S4E) and T stage groups, except for T1 vs T2, T1 vs T4, and T3 vs T4 comparisons (Supplementary Fig. S4F), it is noteworthy that DSG2 expression levels exhibited a trend of gradual elevation with the advancement of both N stage and T stage. These findings suggested that the high DSG2 expression was intricately linked to the malignant progression of OSCC. Exploration of biological significance of different DSG2 expression groups To gain a more profound understanding of the roles of DSG2 in OSCC, we performed GSEA on DSG2-H and DSG2-L groups within the TCGA-OSCC dataset. The results revealed that 91 pathways were significantly enriched in the DSG2-H group compared to the DSG2-L group (Fig. [134]4A, Supplementary Table 5). In particular, immune-related pathways including AGE-RAGE signaling pathway in diabetic complications, ECM-receptor interaction, PI3K-Akt signaling pathway, TGF-beta signaling pathway, and wnt signaling pathway were significantly activated in the DSG2-H group (Fig. [135]4B). Fig. 4. [136]Fig. 4 [137]Open in a new tab Exploration of the biological significance of different DSG2 expression groups. A The top 30 pathways exhibiting significant differences between the high DSG2 expression and low DSG2 expression groups. B The immune-related pathways were activated in the high DSG2 expression group based on GSEA. C, D Heatmap of top 10 GO terms (C) and KEGG pathways (D) with significant differences between the high DSG2 expression and low DSG2 expression groups based on GSVA In addition, we conducted a thorough analysis of the disparities among functional pathways between DSG2-H and DSG2-L groups utilizing GSVA. As a result, we discovered that 5,550 GO terms such as fibroblast migration and 106 KEGG pathways like focal adhesion exhibited significant differences (adjust.p < 0.05) (Supplementary Table 6, Fig. [138]4C, D). Therefore, these findings suggested that the DSG2 potentially played a significant role in the immune-mediated processes in OSCC, necessitating further scrutiny and investigation. The divergent immune landscapes between different DSG2 expression groups Drawing upon the aforementioned findings, we delved deeper into the relationship between DSG2 expression and immune cell infiltration in OSCC. Initially, utilizing the ESTIMATE algorithm, we quantified compute the immune, stromal, and ESTIMATE scores in the DSG2-L and DSG2-H groups. The results revealed no significant differences in ESTIMATE score and stromal score between the two groups. However, a noteworthy finding was the significantly elevated immune score in the DSG2-L group compared to the DSG2-H group (Fig. [139]5A-C). This observation suggested that elevated DSG2 expression might suppress immune cell infiltration, potentially fostering tumor growth and dissemination. Fig. 5. [140]Fig. 5 [141]Open in a new tab The landscape of immune cell infiltration in different DSG2 expression groups. A-C Box plots of ESTIMATE (A), immune (B), and stromal (C) scores in different DSG2 expression groups. D. Stacked plot illustrating the distribution of 22 immune cell types in each sample. E The 22 immune cell infiltration in the high DSG2 expression and low DSG2 expression groups. F Correlation analysis of the relationship between 14 vital immune cells and DSG2 expression. (* p < 0.05, **p < 0.01, ***p < 0.001) Furthermore, to gain a more comprehensive understanding, we utilized the CIBERSORT algorithm to calculate the proportion of 22 distinct immune cell types in TCGA-OSCC samples (Fig. [142]5D). We found that significant disparities in the infiltration of 14 immune cell types between the DSG2-L and DSG2-H groups (Fig. [143]5E). Correlation analysis further revealed that the infiltration of resting memory CD4 T cells (R = 0.24, p < 0.001), resting NK cells (R = 0.2, p < 0.001), M0 macrophages (R = 0.22, p < 0.001), and activated mast cells (R = 0.2, p < 0.001) exhibited a significantly positive correlation with DSG2 expression. Conversely, seven immune cell infiltration such as CD8 T cells (R = −0.33, p < 0.001) demonstrated a significantly negative correlation with DSG2 expression (Fig. [144]5F). These findings provided a nuanced understanding of the complex relationship between immune cell infiltration and DSG2 expression in OSCC. DSG2 expression could predict the clinical benefit of immunotherapy In recent years, tumor mutation burden (TMB) had garnered great attention in the realm of cancer immunotherapy, as patients with a high TMB were more likely to benefit from immunotherapy. To investigate this phenomenon in OSCC, we observed the difference in somatic mutation levels between the DSG2-L and DSG2-H groups and calculated the TMB for the patients using the TCGA-OSCC dataset. Notably, we discovered that TP53 exhibited the highest mutation frequency in both the DSG2-L and DSG2-H groups (Supplementary Fig. S5A, B). In addition, the TMB in DSG2-L group was significantly elevated compared to that in the DSG2-H group (Supplementary Fig. S5C, Fig. [145]6A). This observation suggested that OSCC patients in the DSG2-L group were more likely to derive benefit from immunotherapy. Fig. 6. [146]Fig. 6 [147]Open in a new tab Validation of DSG2 expression in predicting the clinical benefit of immunotherapy. A Box plots of TMB differences in different DSG2 expression groups. B Analysis of the expression levels of immune checkpoint genes, including PD1, LAG3, CD80, and CD86 in different DSG2 expression groups. C Correlation analysis of the association between immune checkpoint genes and DSG2 expression. D Correlation analysis of the relationship between TIDE and DSG2 expression. E The correlation between DSG2 expression and MHC, EC, SC, IPS score. F The correlation between DSG2 expression and four immunotherapy strategies specifically targeting CTLA-4 and PD-1: CTLA-4(-)/PD-1(-), CTLA-4( +)/PD-1(-), CTLA-4(-)/PD-1( +), and CTLA-4( +)/PD-1( +). G Box plot of DSG2 expression for patients with different immunotherapeutic responses in [148]GSE35640 cohort. H ROC curve of response to immunotherapy in [149]GSE35640 cohort. I Box plot of DSG2 expression for patients with varying immunotherapeutic responses in [150]GSE91061 cohort. J ROC curve of response to immunotherapy in GSE91061cohort. (* p < 0.05, **p < 0.01, ***p < 0.001) To gain further insights, we examined the relationship between DSG2 expression and crucial immune checkpoints including PD1, LAG3, CD80 and CD86. Our findings revealed that in the DSG2-L group, PD1 and LAG3 expressions were significantly down-regulated, while CD80 and CD86 expressions were significantly up-regulated compared to the DSG2-H group (Fig. [151]6B). Pearson correlation analysis showed a significant negative correlation between PD1/LAG3 and DSG2 expression, and a positive correlation between CD80/CD86 and DSG2 expression (Fig. [152]6C). Subsequently, we delved into the clinical response of OSCC to immunotherapy. Utilizing the TIDE algorithm, we discovered a significant positive correlation between the TIDE score and DSG2 expression in the TCGA-OSCC cohort (R = 0.85, p < 0.001) (Fig. [153]6D). Moreover, in the [154]GSE30784 dataset, CTLA4, LAG3, and TIGIT expressions were significantly decreased in high DSG2 expression group (high vs. low), and were negatively correlated with DSG2 expression (Supplementary Fig. S6, P < 0.05). This finding suggested that patients with high DSG2 expression might possess a potent immune escape capacity. To further investigate this, we accessed the TCIA database and retrieved the IPS of the TCGA-OSCC samples. The correlation analysis results revealed that DSG2 expression exhibited a negative association with both the major histocompatibility complex (MHC) score and the IPS, whereas it displayed an inverse relationship with the suppressor cell (SC) score (Fig. [155]6E). Furthermore, our analysis indicated that DSG2 expression responded negatively to four immunotherapy strategies specifically targeting CTLA-4 and PD-1, including CTLA-4(-)/PD-1(-), CTLA-4( +)/PD-1(-), CTLA-4(-)/PD-1( +), and CTLA-4( +)/PD-1( +) (Fig. [156]6F). These findings collectively suggested that patients with lower DSG2 expression might have a higher response rate to immunotherapy compared to those with higher DSG2 expression. To further investigate the predictive potential of DSG2 expression in immunotherapy response, we employed the [157]GSE35640 and [158]GSE91061 datasets. In the [159]GSE35640 dataset, DSG2 expression was significantly higher in the immunotherapy nonresponder group than in the immunotherapy responder group (Fig. [160]6G). Additionally, the results of ROC analysis showed an AUC of 0.663 in predicting the response to immunotherapy (Fig. [161]6H), indicating the predictive accuracy of DSG2 expression in determining immunotherapy response. Similarly, in the [162]GSE91061 dataset, DSG2 expression was notably higher in patients with disease progression (PD) or disease stability (SD) (PD/SD) compared to those with partial response (PR) or complete response (CR) (PR/CD) (F[163]ig. [164]6I). The AUC value of 0.684 further supports the predictive value of DSG2 expression in immunotherapy response (Fig. [165]6J). Collectively, our findings suggested that OSCC patients with lower DSG2 expression were more likely to derive benefit from immunotherapy than those with higher DSG2 expression, highlighting the potential of DSG2 expression as a predictive biomarker for immunotherapy response in OSCC. Discussion OSCC, the most frequently encountered malignant tumor in the head and neck region, posed a significant threat to human health due to its high degree of heterogeneity and proclivity for metastasis [[166]26]. Despite advances in treatment, the prognosis for patients remained poor, with a alarmingly low 5-year survival rate [[167]27]. CAFs, modulated tumor proliferation, angiogenesis, metastasis, and chemotherapy resistance through the release of various factors into the TME, had been recognized as critical players in tumor progression [[168]28]. Consequently, exploring the molecular markers associated with CAFs in OSCC held the potential to enhance our understanding of the disease’s pathogenesis and paved the way for novel therapeutic strategies. In this study, we utilized scRNA-seq analysis to identify CAFs clusters and their specific genes within OSCC. Leveraging DEG analysis and Cox regression analysis, we successfully pinpointed DSG2 as a promising CAFs-related biomarker. DSG2, a calcium-binding transmembrane glycoprotein belonging to the cadherin family [[169]29], played a pivotal role in various cancer cell functions, including cell adhesion [[170]30], proliferation, and invasion [[171]31]. Previous studies had demonstrated that overexpression of DSG2 in the transgenic mice epidermis could induce hyperplasia [[172]32], while its loss led to reduced proliferation of mouse epithelial cells and inhibited tumor growth [[173]33]. Notably, DSG2 expression was significantly elevated in lung squamous cell carcinoma, pancreatic cancer, and prostate cancer, and high DSG2 expression was associated with a poor prognosis in these patients [[174]29]. Although DSG2 showed abnormal expression in other digestive system tumors and was closely related to tumor prognosis, its specific role in OSCC still requires further study. Our study found that DSG2 was highly expressed in OSCC samples compared to controls. Furthermore, patients with high DSG2 expression exhibited a worse prognosis, highlighting its potential as a prognostic marker in OSCC. The results of our functional enrichment analyses indicated that DSG2 was involved in critical signal transduction pathways, such as focal adhesion and signaling pathways regulating the pluripotency of stem cells. Prior reports had established the fundamental role of desmosomal cadherin as a vital component of intercellular junctions, facilitating intercellular communication and signal transduction processes [[175]34]. In the regulation of stem cell pluripotency, DSG2 was recognized as a key cell surface marker for defining the pluripotency status of human pluripotent stem cells (hPSCs) [[176]35]. DSG2 colocalized with specific cell surface markers unique to human PSCs, and its expression was rapidly downregulated following cell differentiation [[177]35]. Knockdown of CD133 (A marker for both cancer and normal stem cells) remarkably decreased DSG2 expression and impaired cell adhesion and tumourigenicity of clear cell carcinoma stem cells [[178]36]. The depletion of DSG2 significantly impaired both the proliferation and the expression of pluripotency markers in hPSCs, suggesting that DSG2 was crucial for the maintenance of PSC self-renewal and pluripotency [[179]35]. Moreover, DSG2 could regulates hPSCs self-renewal and pluripotency by regulating b-catenin/Slug-mediated epithelial to mesenchymal transition [[180]35]. Our findings revealed a significant activation of the wnt signaling pathway and PI3K Akt signaling pathway in the high DSG2 expression group. Wnt proteins were secreted glycoproteins that regulated multicellular development and maintained tissue homeostasis [[181]37]. During the transformation of mouse induced pluripotent stem cells (miPSCs) into cancer stem cells, upregulation of Dsg2 expression activates the Wnt/β-catenin signaling pathway, suggesting a possible mechanism by which stem cells transform into malignant phenotypes [[182]38]. In breast cancer, DSG2 forms desmosomal complexes with JUP and vimentin, triggers Wnt/PCP signaling, and was associated with poor prognosis [[183]39]. Therefore, we speculated that DSG2 might promote the malignant progression of OSCC by activating the Wnt/β-catenin signaling pathway. Meanwhile, the PI3K/Akt signaling pathway, a highly conserved signaling network in eukaryotic cells, promoted cell cycle progression and played an important role in cancer biology [[184]40]. Based on these observations, we hypothesized that DSG2 exerted its influence cell proliferation and TME via these intricate signaling mechanisms [[185]41]. Furthermore, our immune cell infiltration analysis revealed a significant negative correlation between T cell infiltration and DSG2 expression. In cervical cancer, up-regulation of DSG2 significantly enhanced tumor purity by reducing the infiltration of B cells, T cells, and NK cells [[186]42]. Overactivation of transforming growth factor (TGF)-beta-1 signaling contribute to the pathogenesis of arrhythmogenic cardiomyopathy caused by mutant DSG2 [[187]43]. It is well known that TGF-beta is a key cytokine that regulates T cell development, activation, proliferation, differentiation and death. In CD4 + T cells, TGF-β remained quiescent and controls the activation of naïve T cells [[188]44]. TGF-β was required for the induction of Foxp3 in naive T cells and for the development of regulatory T cells [[189]44]. TGF-β was essential for CD8 + T cell differentiation and memory maintenance, while inhibiting effector T cell function [[190]44]. Given the well-documented pivotal role of T cells in tumor progression [[191]45], this finding suggested that an upregulation of DSG2 expression might suppress the infiltration of immune cells. Considering that the TGF-beta signaling pathway was significantly activated in OSCC patients with high DSG2 expression, we suggested that DSG2 might regulate T cell infiltration by modulating the TGF-beta signaling pathway signaling pathway in TME, thereby promoting tumor progression. In recent years, immunotherapy had garnered substantial attention as an anticancer therapy [[192]46]. In the treatment of oral OSCC, the application of immune checkpoint inhibitors (ICIs) represents a significant area of research [[193]47]. While the direct relationship between DSG2 and immunotherapy remains unclear, studies indicated that the expression of immune checkpoints, such as PD-1, and PD-L1, in OSCC was closely associated with the TME and patient prognosis [[194]48, [195]49]. In the present study, we discovered that DSG2 expression was remarkably negatively correlated with PD-1 expression and T cell infiltration (eg. CD8 T cells) in OSCC. These results indicated that DSG2 might influence the immune escape of OSCC and the efficacy of immunotherapy by altering the behavior of immune cells within the tumor microenvironment, particularly the function of T cells. While the specific mechanism remains unclear, the role of DSG2 in the tumor microenvironment could significantly affect the response to immunotherapy. Furthermore, our in-depth analysis of DSG2 expression levels across different immune response groups unveiled a significant elevated DSG2 expression in the immunotherapy non-responder group compared to the immunotherapy responder group, suggesting that OSCC patients with lower DSG2 expression were more likely to benefit from immunotherapy than those with higher DSG2 expression. Consistent with our findings, Hill et al. reported a notable decrement in DSG2 levels among responded to anti-PD-1 therapy in Head and Neck Squamous Cell Carcinoma (HNSCC), while no such decrease was observed in non-responders [[196]50]. Moreover, previous studies had demonstrated that TGF-β signaling could promote cancer progression by suppressing the antitumor activities of immune cells, thereby potentially reducing or compromising the efficacy of anticancer immunotherapy [[197]51]. Strikingly, our analysis identified a significant enrichment of the TGF-β signaling pathway in the high DSG2 expression group. Therefore, it was plausible that DSG2 might contribute to immunotherapy resistance through multiple intricate mechanisms, which deserve further attention and validation. Although this study identified DSG2 as a biomarker associated with CAFs in OSCC, demonstrating a correlation with poor prognosis and outcomes in immunotherapy, the role of DSG2 in OSCC prognosis and immunotherapy should be further validation in a prospective cohort of patients, including follow-up data, who were receiving immunotherapy in clinical practice. Additionally, our study revealed a relationship between DSG2 and immune cells; however, the mechanisms through which DSG2 influences immune cells within the tumor microenvironment required further investigation at the molecular level. Finally, the role of DSG2 in OSCC progression should be investigated by in vivo and in vitro experiments in future studies. Conclusions Our study successfully identified DSG2 as a CAFs-related biomarker in OSCC, which holds significant predictive value for both diagnosis and prognosis. Notably, upregulation of DSG2 was found in OSCC, and this upregulation was linked to inferior prognosis. Patients with lower DSG2 expression were more prone to derive beneficial outcomes from immunotherapy compared to those with higher DSG2 expression. This study offered novel insights into the mechanistic understanding of CAFs for OSCC. Supplementary Information [198]12903_2024_5284_MOESM1_ESM.pdf^ (1.1MB, pdf) Supplementary Material 1. Supplementary Fig. S1. The original gels and multiple exposure images. [199]12903_2024_5284_MOESM2_ESM.docx^ (2.4MB, docx) Supplementary Material 2. Supplementary Fig. S2. Single cell quality control chart. (A). Violin plot of the distribution of counts, features, and mitochondrial numbers (B). Scatter plot of correlation between counts and feature numbers (C). Scatter plot of hypervariable genes, red points represent hypervariable genes (D). After PCA analysis, each Visual distribution of genes in principal components (E). Principal component scatter plot after PCA analysis. [200]12903_2024_5284_MOESM3_ESM.docx^ (204.5KB, docx) Supplementary Material 3. Supplementary Fig. S3. Receiver operating characteristic curve of DSG2 in diagnosing OSCC in the TCGA (A) and GSE78060 datasets (B). [201]12903_2024_5284_MOESM4_ESM.docx^ (392.3KB, docx) Supplementary Material 4. Supplementary Fig. S4. Correlation between DSG2 expression and clinicopathologic characteristics of OSCC. The expression of DSD2 in female and male(A), Age < 65 and > 65 years old (B), clinical stage (I/II/III/IV) (C), grade (G1/G2/G3) (D), N stage (N0/N1/N2/N3) (E), and T stage (T1/T2/T3/T4) (F). (*p < 0.05, **p < 0.01). [202]12903_2024_5284_MOESM5_ESM.docx^ (318.4KB, docx) Supplementary Material 5. Supplementary Fig. S5. OncoPrint analysis based on CNV profiles and TMB. OncoPrint visualizations based on the CNV profile in the high DSG2 expression group (A) and low DSG2 expression group (B). (C). The level of TMB of each sample in the TCGA-OSCC cohort. [203]12903_2024_5284_MOESM6_ESM.docx^ (774.9KB, docx) Supplementary Material 6. Supplementary Fig. S6. The correlation between DSG2 expression and CTLA4, LAG3, and TIGIT expressions in the GSE30784 dataset. (*p < 0.05, ***p < 0.001, ****p < 0.0001). [204]12903_2024_5284_MOESM7_ESM.docx^ (18.7KB, docx) Supplementary Material 7. Supplementary Table 1. The details of the datasets from the GEO database. [205]12903_2024_5284_MOESM8_ESM.docx^ (12.7KB, docx) Supplementary Material 8. Supplementary Table 2. Patient information for clinical tissue samples. [206]12903_2024_5284_MOESM9_ESM.xlsx^ (570.1KB, xlsx) Supplementary Material 9. Supplementary Table 3. Fibroblast-related genes in four CAFs clusters. [207]12903_2024_5284_MOESM10_ESM.xlsx^ (298.9KB, xlsx) Supplementary Material 10. Supplementary Table 4. The results of GO and KEGG enrichment analysis for 4,509 CAFs-genes. [208]12903_2024_5284_MOESM11_ESM.xlsx^ (35KB, xlsx) Supplementary Material 11. Supplementary Table 5. The result of GSEA for differentially expressed genes between high and low DSG2 expression groups. [209]12903_2024_5284_MOESM12_ESM.xlsx^ (820.7KB, xlsx) Supplementary Material 12. Supplementary Table 6. GSVA analysis revealed highly enriched GO and KEGG pathways in high and low DSG2 expression groups. Acknowledgements