Abstract

Transcription factors (TFs) are pivotal in tumor initiation and progression, regulating downstream gene expression and modulating cellular processes. In this study, we conducted a comprehensive analysis of TF gene sets to define the molecular subtypes of gliomas. Using nonnegative matrix factorization (NMF), we identified two distinct glioma subtypes with significantly different survival outcomes and clinical features. We then identified TF genes that were differentially expressed across gliomas of different World Health Organization (WHO) grades and analyzed them with protein–protein interaction (PPI) networks. By applying 101 machine learning model combinations, we identified five key genes (EZH2, TWIST1, EGR1, FOSL2, and TCF3) involved in glioma. Among these genes, TCF3 emerged as a potential key prognostic marker because of its distinct expression pattern and functional relevance. Using multi-omics and multi-dataset analyses, we explored the aberrant expression of TCF3 across multiple cancers and validated it at both the cellular and tissue levels. Furthermore, our analysis revealed a strong association between the mutational features of TCF3-high gliomas and prognosis, underscoring the potential of TCF3 as a therapeutic target. In summary, this study introduces a novel approach to the molecular subtyping of glioma and highlights TCF3 as a promising target for precision medicine. Our findings provide insights into the molecular mechanisms of glioma and a foundation for developing novel therapeutic strategies.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-09924-w.

Subject terms: Computational biology and bioinformatics, Molecular biology, Biomarkers, Diseases, Medical research, Molecular medicine, Neurology

Introduction

Cancer remains an escalating global health challenge, with both incidence and mortality rates steadily rising.
In 2022, an estimated 1.9 million new cancer cases and 609,360 cancer-related deaths were projected for the United States alone, underscoring the pressing need for more effective interventions^1. Among the many cancer types, glioma, a particularly aggressive brain tumor, is distinguished by its infiltrative growth: tumor cells intermingle closely with normal brain tissue, obscuring tumor boundaries^2. This makes surgical resection difficult and often incomplete, leaving residual tumor cells that give rise to recurrence. Moreover, gliomas can regrow and become increasingly malignant over time, further complicating treatment and worsening the prognosis^3. An additional challenge in glioma management is intertumoral heterogeneity^4: each tumor can present unique molecular features, leading to variable responses to standard therapies^5. Given the limited understanding of the molecular mechanisms driving glioma progression and the limitations of current treatments, in-depth molecular analyses are urgently needed to advance precision medicine and support more personalized therapeutic strategies.

Transcription factors (TFs) are critical regulators of gene expression and play key roles in cancer biology. In eukaryotic cells, transcription initiation requires the coordinated action of RNA polymerase II and numerous TFs that assemble at the promoter region of a gene to form the transcription initiation complex^6. TFs are generally classified as either general TFs, which help form the transcription initiation complex, or specific TFs, which regulate genes in response to stimuli such as hormones or growth factors. In cancer, TFs regulate gene expression networks that drive critical pathological processes, including tumorigenesis^7–9, metabolic reprogramming^10,11, and the self-renewal and differentiation of cancer stem cells (CSCs)^12,13.
During tumorigenesis, TFs can function as oncogenes or tumor suppressors, depending on the cancer type^10,14–17. For example, the TF Yin Yang 1 (YY1) promotes tumor cell proliferation and metastasis in most cancers^18 but exhibits tumor-suppressive activity in some, such as pancreatic and esophageal tumors. In metabolic reprogramming, TFs regulate key genes in metabolic pathways such as glycolysis and fatty acid metabolism, enabling tumor cells to meet the energy demands of rapid growth^19,20. TFs also modulate the properties of CSCs, a population of tumor cells responsible for recurrence and metastasis. In breast cancer, forkhead box O3 (FOXO3a) negatively regulates forkhead box M1 (FOXM1)^21–24, reducing CSC stemness and tumorigenicity. Given these diverse roles, TFs serve as important diagnostic and prognostic biomarkers in various cancers^25,26. Together, these findings highlight the multifaceted functions of TFs in cancer biology and their potential as therapeutic targets.

In this study, we focused on the role of TFs in glioma subtyping. Using nonnegative matrix factorization (NMF)^27, we analyzed 795 TFs and identified two molecular subtypes of gliomas with distinct survival rates and clinical characteristics. Further analysis revealed that certain TF-related genes were differentially expressed across World Health Organization (WHO) grade II, III, and IV gliomas. To explore protein-level interactions among these genes, we performed a protein–protein interaction (PPI) network analysis.
To identify core genes associated with glioma subtypes and WHO grades, we combined ten machine learning algorithms, namely least absolute shrinkage and selection operator (Lasso), Ridge, Elastic Net (Enet), StepCox, survivalSVM, CoxBoost, SuperPC, plsRcox, random survival forests (RSF), and gradient boosting machine (GBM), into 101 distinct strategies. A support vector machine (SVM) algorithm was then used to refine the results and identify genes representative of the different glioma subtypes. Univariate analysis was conducted to determine the main factors affecting glioma risk, followed by a PPI network analysis of these risk factors. Ultimately, we identified five key genes, EZH2, TWIST1, EGR1, FOSL2, and TCF3, that are important for understanding glioma subtypes and predicting patient outcomes.

Studies of transcription factors in glioma and glioblastoma have confirmed their distinct regulatory roles in these brain tumors. Despite its recognized roles in other tumors, TCF3 has been minimally studied in gliomas^28–30, although it has been reported to regulate cell proliferation and migration in glioma cell lines^31. Our multi-omics and multi-dataset analyses revealed aberrant expression of TCF3 in glioma and other cancers, which we validated at both the cellular and tissue levels. Furthermore, we explored the relationship between TCF3 and glioma prognosis, revealing new insights into its potential as a therapeutic target. Overall, our study introduces a novel TF-based molecular classification of gliomas and identifies TCF3 as a key prognostic marker and potential therapeutic target.
These findings lay the groundwork for advancing precision medicine in glioma treatment and may facilitate the development of novel therapeutic strategies.

Materials and methods

Data collection and processing

The workflow of this study is illustrated in Fig. 1. Transcription factors were obtained from the TRRUST database (https://www.grnpedia.org/trrust/), a manually curated repository of transcriptional regulatory networks in humans and mice (Table S1). These data are derived from 11,237 PubMed-indexed publications reporting small-scale experimental studies of transcriptional regulation and cover three categories of transcription factors: direct regulators, indirect mediators, and condition-specific modulators. Transcriptome and clinical data were obtained from two databases: the Chinese Glioma Genome Atlas (CGGA; http://www.cgga.org.cn) and The Cancer Genome Atlas (TCGA; https://portal.gdc.cancer.gov/)^32–37. TCGA contributed 702 glioma samples, and the CGGA contributed three datasets (CGGA325, CGGA693, and CGGA301) totaling 1,351 samples, all confirmed as gliomas. For prognostic analysis and molecular subtyping, samples with invalid survival data (survival time ≤ 0 days or missing survival status) were excluded. Genomic mutation data for gliomas were acquired from the Genomic Data Commons (GDC) Data Portal (https://portal.gdc.cancer.gov/). The glioma cell lines (U251, LN229, U87, A172, and U118) and normal human astrocytes (HA) were obtained from the Chinese Academy of Sciences. Tumor tissue specimens and paired adjacent normal brain tissue samples were obtained from the Tissue Repository of Lanzhou University Second Hospital. All patients provided written informed consent, and the study protocol was approved by the Clinical Ethics Committee of Lanzhou University Second Hospital.

Fig. 1. Structured workflow diagram of this study.

Identification and classification of glioma subtypes using TFs

A set of 795 TF-related genes (Table S1) was selected for analysis. To classify glioma subtypes on the basis of TF expression profiles, we applied NMF with the following parameters: rank = 2:10, method = “brunet”, and nrun = 10. After evaluating clustering performance across ranks, we determined that two subtypes (clusterNum = 2) provided the most robust classification. We then assessed the relationships between these two subtypes and clinical characteristics. Differential gene expression analyses between the two glioma subtypes and across WHO grades were performed with the limma R package to identify differentially expressed genes (DEGs). Gene Ontology (GO) analysis^38 was used to annotate DEGs in three categories: cellular component (CC), molecular function (MF), and biological process (BP). We also performed pathway enrichment analysis with the Kyoto Encyclopedia of Genes and Genomes (KEGG)^39–41 and gene set enrichment analysis (GSEA) to explore specific functional pathways.

Weighted gene co-expression network analysis (WGCNA)

WGCNA^42 was used to identify gene modules co-expressed in relation to the two glioma subtypes. To construct a scale-free network, we calculated candidate soft-thresholding powers and selected the optimal value according to the scale-free topology criterion. The minimum module size was set to 50 genes. The dynamic tree cut method was used to detect distinct gene modules, and similar modules were merged using a module eigengene dissimilarity threshold (MEDissThres) of 0.25.

Tumor microenvironment (TME), immune, and functional scoring

The R package ESTIMATE was used to infer immune, stromal, and total ESTIMATE scores for individual tumor samples^43.
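ESTIMATE scores, like the TISIDB immune cell abundances, are built on single-sample gene set enrichment analysis (ssGSEA). To make that scoring step concrete, here is a minimal Python sketch of an unnormalized ssGSEA score following the rank-weighted running-sum idea of Barbie et al. (2009). The weight exponent `alpha = 0.25`, the function name, and the toy genes are assumptions for illustration; this is not the ESTIMATE, GSVA, or TISIDB implementation, which add normalization steps not shown here.

```python
"""Simplified single-sample GSEA (ssGSEA) enrichment score (illustrative sketch)."""


def ssgsea_score(expression, gene_set, alpha=0.25):
    """Return the unnormalized ssGSEA score of one sample.

    expression: dict mapping gene -> expression value for a single sample
    gene_set:   iterable of gene names (genes absent from `expression` are ignored)
    alpha:      exponent applied to within-sample ranks when weighting hits
    """
    genes_in_set = set(gene_set) & set(expression)
    n_total = len(expression)
    n_hit = len(genes_in_set)
    if n_hit == 0 or n_hit == n_total:
        raise ValueError("gene set must overlap the sample but not cover all genes")

    # Rank genes within the sample (1 = lowest expression), then walk the
    # list from the most to the least expressed gene.
    ascending = sorted(expression, key=expression.get)
    rank = {gene: i + 1 for i, gene in enumerate(ascending)}
    ordered = ascending[::-1]

    hit_weight_total = sum(rank[g] ** alpha for g in genes_in_set)
    p_hit = p_miss = 0.0
    score = 0.0
    for gene in ordered:
        if gene in genes_in_set:
            p_hit += rank[gene] ** alpha / hit_weight_total
        else:
            p_miss += 1.0 / (n_total - n_hit)
        # ssGSEA sums the running difference instead of taking its maximum
        score += p_hit - p_miss
    return score


# Hypothetical expression values for one tumor sample
sample = {"G1": 9.1, "G2": 8.4, "G3": 7.7, "G4": 2.0, "G5": 1.5, "G6": 0.3}
print(ssgsea_score(sample, {"G1", "G2"}))  # gene set at the top of the ranking
print(ssgsea_score(sample, {"G5", "G6"}))  # gene set at the bottom of the ranking
```

A gene set concentrated among the most highly expressed genes yields a positive score and one concentrated at the bottom yields a negative score, which is what allows per-sample comparison of immune or stromal signatures.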
To investigate immune cell interactions within the TME, we used data from the TISIDB web portal to quantify the relative abundance of various immune cell types by single-sample gene set enrichment analysis (ssGSEA). Immune-related features were further assessed with KEGG_C2 pathway scores, and the R package IOBR^44,45 provided additional insights into immune cell infiltration and interactions.

PPI analysis and machine learning for prognostic gene identification

PPI analysis was conducted to identify core genes among those differentially expressed across glioma subtypes. Using the CGGA301 dataset as a test set, we applied ten machine learning algorithms: Lasso, Ridge, Enet, StepCox, survivalSVM, CoxBoost, SuperPC, plsRcox, RSF, and GBM. Within a cross-validation framework, one algorithm was used for variable selection and another to construct the prognostic model. The concordance index (C-index) of each of the 101 model combinations (Table S2) was calculated on the external datasets (optionally including the training set). For the CoxBoost model, we first determined the optimal penalty (shrinkage parameter) with the “optimCoxBoostPenalty” function. Tenfold cross-validation was then performed to identify the optimal number of boosting steps, and the final model was fitted with the “CoxBoost” function. For stepwise Cox analysis, we used the survival package and evaluated model complexity with the Akaike information criterion (AIC), considering all direction settings: “both” (bidirectional), “backward”, and “forward”. The Lasso, Ridge, and Enet models were constructed with the “cv.glmnet” function from the glmnet package.
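The model combinations were ranked by the concordance index (C-index). As a minimal sketch, assuming the standard Harrell definition (the estimator actually used is not stated), the Python function below counts comparable patient pairs and the fraction in which the higher-risk patient died first. Pairs with tied survival times are simply skipped here, whereas R implementations treat them in more detail; the data are hypothetical.

```python
def harrell_c_index(times, events, risk):
    """Harrell's C-index for right-censored data.

    times:  follow-up times
    events: 1 if the event (death) was observed, 0 if censored
    risk:   model risk scores; a higher score should mean shorter survival
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[j] > times[i]:  # j outlived i, so the pair is comparable
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: four patients, the third one censored
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
print(harrell_c_index(times, events, [0.9, 0.7, 0.2, 0.1]))  # 1.0
```

A C-index of 1 means perfect risk ordering and 0.5 means ordering no better than chance, which is the scale on which the 101 combinations are compared.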
Tenfold cross-validation was used to determine the regularization parameter lambda, with the mixing parameter alpha ranging from 0 to 1 in steps of 0.1: alpha = 1 gives the Lasso model, alpha = 0 gives the Ridge model, and intermediate values give the Enet (Elastic Net) model. For the survival support vector machine, we used the “survivalsvm” function from the survivalsvm package, which is designed for survival outcomes. The GBM model was fitted with the “gbm” function from the gbm package with tenfold cross-validation. The SuperPC model, an extension of principal component analysis (PCA), was implemented with the superpc package, using the “superpc.cv” function for tenfold cross-validation. The plsRcox model was built directly with the “cv.plsRcox” function from the plsRcox package. Finally, the RSF (random survival forest) model was fitted with the “rfsrc” function from the randomForestSRC package, where “ntree” is the number of trees and “nodesize” is the minimum terminal node size. In this study, “ntree” was set to 1000, and the minimum variable count for screening was set to 5. In total, 101 unique algorithm combinations were used to identify prognostic genes. We further refined gene selection with univariate analysis and SVM analysis, focusing on TFs differentially expressed between glioma subtypes. Survival analysis was then performed with the R packages survival and survminer, with p < 0.05 considered significant.

RNA extraction and real-time quantitative PCR (qRT-PCR) workflow

Cellular and tissue samples were lysed and purified using TRIzol reagent (Thermo Fisher Scientific).
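The qRT-PCR data in this subsection are quantified with the ΔΔCt method against GAPDH. Below is a minimal Python sketch of the standard 2^−ΔΔCt (Livak) calculation, assuming approximately 100% amplification efficiency and averaging of replicate Ct values; the Ct numbers and the use of HA cells as the calibrator are hypothetical.

```python
from statistics import mean


def relative_expression_ddct(target_ct, reference_ct,
                             calibrator_target_ct, calibrator_reference_ct):
    """Fold change of a target gene by the 2^-ddCt (Livak) method.

    Each argument is a list of replicate Ct values; replicates are averaged
    first. Assumes ~100% PCR efficiency for both target and reference genes.
    """
    delta_ct_sample = mean(target_ct) - mean(reference_ct)  # normalize to GAPDH
    delta_ct_calibrator = mean(calibrator_target_ct) - mean(calibrator_reference_ct)
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: TCF3 and GAPDH in a glioma line vs. astrocytes (HA)
fold = relative_expression_ddct(
    target_ct=[24.0, 24.2, 23.8],
    reference_ct=[18.0, 18.1, 17.9],
    calibrator_target_ct=[26.0, 26.1, 25.9],
    calibrator_reference_ct=[18.0, 18.0, 18.0],
)
print(round(fold, 3))  # 4.0
```

Here the target is detected two cycles earlier relative to GAPDH than in the calibrator, so its relative expression is 2^2 = 4-fold higher.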
Total RNA was recovered by isopropanol precipitation, and RNA purity was verified with a NanoDrop spectrophotometer (A260/A280 ratio ≥ 1.8). All procedures strictly followed the manufacturer’s instructions for the SYBR Premix Ex Taq™ kits (Takara, Japan; Catalog Nos. RR047A and RR820A). Real-time PCR amplification was performed on a CFX96 Touch™ Real-Time PCR Detection System (Bio-Rad, USA) equipped with a 96-well optical reaction module and precise temperature control. Relative gene expression was quantified with the ΔΔCt method using GAPDH as the endogenous reference gene. Each sample included at least three technical replicates and three biological replicates to ensure statistical robustness. Primer sequences are listed in Supplementary Table S5.

Statistical analysis

Statistical analyses and graphs were produced with R software version 4.3.0 (https://www.r-project.org/) and GraphPad Prism 9.0. Comparisons between two groups were made with the Wilcoxon test, and analysis of variance (ANOVA) was used for comparisons across more than two groups^46. Survival differences were assessed with the log-rank test using the R package survminer. Pearson correlation analysis was used to examine relationships between genes and gene set enrichment scores. A p value of less than 0.05 was considered statistically significant throughout.

Results

Transcriptomic profiles of the two glioma subgroups

We identified a set of 795 TF genes for analysis. After integrating the glioma datasets TCGA_LGG, TCGA_GBM, CGGA_325, and CGGA_693, we visualized TF expression levels across glioma samples grouped by WHO grade (grades II to IV). Most TFs were more highly expressed in higher-grade, more aggressive tumors (Fig. 2A). NMF stratified glioma patients into two distinct clusters, C1 and C2.
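The NMF step uses the “brunet” method, i.e. the Kullback–Leibler multiplicative update rules of Brunet et al. (2004) as implemented in the R NMF package. The plain-Python sketch below shows only those update rules and a dominant-metagene assignment of samples on a toy TF-by-sample matrix; the consensus clustering over nrun = 10 runs and the rank survey over 2–10 used to choose two subtypes are not reproduced, and all names and values are hypothetical.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product for list-of-lists matrices."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def nmf_brunet(V, rank, n_iter=1000, seed=0, eps=1e-12):
    """Factorize a nonnegative genes x samples matrix V as W @ H using the
    Kullback-Leibler multiplicative updates of Brunet et al. (2004)."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    for _ in range(n_iter):
        WH = matmul(W, H)
        for a in range(rank):
            # H[a][j] *= (sum_i W[i][a] * V[i][j] / WH[i][j]) / sum_i W[i][a]
            w_sum = sum(W[i][a] for i in range(m))
            for j in range(n):
                num = sum(W[i][a] * V[i][j] / (WH[i][j] + eps) for i in range(m))
                H[a][j] *= num / (w_sum + eps)
        WH = matmul(W, H)
        for a in range(rank):
            # W[i][a] *= (sum_j H[a][j] * V[i][j] / WH[i][j]) / sum_j H[a][j]
            h_sum = sum(H[a])
            for i in range(m):
                num = sum(H[a][j] * V[i][j] / (WH[i][j] + eps) for j in range(n))
                W[i][a] *= num / (h_sum + eps)
    return W, H


def assign_subtypes(H):
    """Label each sample (column of H) with its dominant metagene."""
    return [max(range(len(H)), key=lambda a: H[a][j]) for j in range(len(H[0]))]


# Toy matrix (hypothetical): TFs 0-1 high in samples 0-2, TFs 2-3 high in samples 3-5
V = [
    [5.0, 6.0, 5.5, 0.2, 0.1, 0.3],
    [4.5, 5.2, 6.1, 0.3, 0.2, 0.1],
    [0.1, 0.3, 0.2, 5.8, 6.2, 5.1],
    [0.2, 0.1, 0.3, 4.9, 5.5, 6.0],
]
W, H = nmf_brunet(V, rank=2)
labels = assign_subtypes(H)
print(labels)
```

Because NMF cluster numbers are arbitrary, only the grouping of samples is meaningful, not which label (0 or 1) a group receives.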
Visualization of TF expression patterns confirmed that the two clusters represent distinct TF-defined glioma subtypes (Fig. 2B–E). Patients in cluster C2 had significantly better overall survival than those in cluster C1 (Fig. 2F).

Fig. 2. Construction and clinical exploration of transcription factor-based molecular classification of gliomas. (A) Heatmap of transcription factor expression across WHO grades; (B, C) Derivation of 2–10 subtypes via non-negative matrix factorization (NMF) and NMF rank survey; (D, E) Subtype localization and heatmap visualization of transcription factor expression; (F) Survival difference analysis among subtypes reveals poorer prognosis in the C1 subtype; (G) Distribution ratio of subtypes across WHO grades; (H) Sankey diagram visualizing subtype–WHO grade associations; (I) Differential expression analysis of transcription factors across WHO grades identifies 208 differentially expressed transcription factors.

Comparison of clinical traits between the two clusters revealed significant differences in PRS type (primary, recurrent, or secondary tumor), tumor grade, isocitrate dehydrogenase (IDH) mutation status, 1p/19q codeletion status, and O^6-methylguanine-DNA methyltransferase (MGMT) promoter methylation status, but not in sex. In particular, the C1 cluster had a greater proportion of patients with WHO grade IV gliomas (54%) than the C2 cluster (19%), whereas the C2 cluster had more patients with lower-grade (WHO grade II and III) gliomas (WHO grade II: C1 = 6%, C2 = 55%; WHO grade III: C1 = 18%, C2 = 53%); all differences were statistically significant (p < 0.01) (Fig. 2G). Additionally, most patients in the C1 cluster had higher-grade gliomas (Fig. 2H).
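The survival difference between C1 and C2 is assessed with the log-rank test (via survminer in this study). A minimal stdlib-only Python sketch of the two-group log-rank statistic is shown below; the p value uses the fact that a chi-square variable with one degree of freedom is a squared standard normal. The follow-up data are invented for illustration.

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test.

    groups: 0/1 labels (e.g. 0 = C2, 1 = C1). Returns (chi-square statistic
    with 1 degree of freedom, two-sided p value).
    """
    data = list(zip(times, events, groups))
    o_minus_e = 0.0
    variance = 0.0
    for t in sorted({ti for ti, ei, _ in data if ei}):
        at_risk = [(ti, ei, gi) for ti, ei, gi in data if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for _, _, gi in at_risk if gi == 1)
        d = sum(1 for ti, ei, _ in at_risk if ti == t and ei)
        d1 = sum(1 for ti, ei, gi in at_risk if ti == t and ei and gi == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero")
    chi2 = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)
    return chi2, p_value


# Hypothetical follow-up (months): group 1 (C1-like) dies earlier than group 0
times = [2, 3, 4, 5, 6, 10, 12, 14, 16, 18]
events = [1] * 10
groups = [1] * 5 + [0] * 5
chi2, p = logrank_test(times, events, groups)
print(round(chi2, 2), round(p, 4))
```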
Differential analysis across WHO grades identified 208 TFs that were significantly differentially expressed between gliomas of different grades (Fig. 2I).

Protein interaction and machine learning analyses identified prognostic genes

PPI analysis of the TFs differentially expressed across glioma grades revealed interactions among key proteins (Fig. 3A,B). Among the machine learning strategies applied, RSF, Lasso + RSF, and CoxBoost + RSF yielded the best predictive performance for identifying prognostic genes (Fig. 3C), with RSF ranking first with a mean C-index of 0.791 (maximum possible value, 1). Within the RSF-derived gene set, several genes, including HES1, IRF1, and TCF3, showed their highest expression in high-grade gliomas (Fig. 3D). Further analysis with the SVM algorithm revealed distinct gene expression patterns in the two glioma subgroups. Several key genes were differentially expressed between the subgroups (Fig. 4A–C), and prognostic evaluation revealed significant associations between their expression and patient outcomes (p < 0.05) (Fig. 4D).

Fig. 3. Screening of prognostic differentially expressed transcription factors and protein–protein interaction analysis using 101 machine learning model combinations. (A, B) Visualization of protein–protein interaction networks for differentially expressed transcription factors; (C) Prognostic gene selection via 101 machine learning model ensemble strategies; (D) Heatmap of prognostic genes identified by the random survival forest (RSF) algorithm in glioblastoma.

Fig. 4. Screening of subtype-specific signature genes for transcription factor-based glioma classification. (A–C) Signature gene selection for glioma transcription factor subtypes using support vector machine (SVM); (D) Univariate analysis results of subtype-specific signature genes.
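Fig. 4D reports univariate (single-gene) survival analysis. Assuming this is a univariate Cox proportional hazards model, as is usual for such analyses and as fitted by `coxph` in the R survival package, the sketch below fits one covariate by Newton–Raphson on the Breslow partial likelihood and returns the hazard ratio with a Wald p value. It is a hand-written illustration with hypothetical data, not the code used in this study.

```python
import math


def univariate_cox(times, events, x, max_iter=50, tol=1e-10):
    """One-covariate Cox proportional hazards fit (Newton-Raphson, Breslow ties).

    Returns (hazard ratio per unit increase in x, two-sided Wald p value).
    """
    n = len(times)

    def partial_loglik(beta):
        """Partial log-likelihood, its first derivative, and the information."""
        loglik = score = info = 0.0
        for i in range(n):
            if not events[i]:
                continue
            risk_set = [j for j in range(n) if times[j] >= times[i]]
            w = [math.exp(beta * x[j]) for j in risk_set]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk_set))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk_set))
            loglik += beta * x[i] - math.log(s0)
            score += x[i] - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        return loglik, score, info

    beta = 0.0
    loglik, score, info = partial_loglik(beta)
    for _ in range(max_iter):
        step = score / info
        # Step halving keeps the partial log-likelihood from decreasing
        new = partial_loglik(beta + step)
        while new[0] < loglik and abs(step) > tol:
            step /= 2
            new = partial_loglik(beta + step)
        beta += step
        loglik, score, info = new
        if abs(step) < tol:
            break
    se = math.sqrt(1.0 / info)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))
    return math.exp(beta), p_value


# Hypothetical data: higher expression tends to go with earlier death
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
expr = [2.0, 3.0, 1.5, 2.5, 0.5, 1.0, 0.8, 0.2]
hr, p = univariate_cox(times, events, expr)
print(round(hr, 3), round(p, 4))
```

A hazard ratio above 1 marks a gene whose higher expression is associated with worse survival, the direction reported for the signature genes.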
Differential expression of key TFs in glioma

PPI network analysis of the proteins encoded by the 38 identified genes revealed the top-ranked genes (Fig. 5A). The top five genes, EZH2, TWIST1, EGR1, FOSL2, and TCF3, were identified as biomarkers of poor prognosis in glioma patients and as key genes for the molecular subtyping of gliomas. The mRNA expression of these five genes increased progressively with WHO grade (Fig. 5B). All five genes were more highly expressed in the poor-prognosis C1 subgroup than in the C2 subgroup (Fig. 5C–G), and their elevated expression was correlated with worse patient outcomes (Fig. 5H–L).

Fig. 5. Subtype-specific signature gene screening results. (A) Protein–protein interaction analysis of subtype-specific signature genes; (B) Core gene expression profiles across WHO grades indicate the highest expression in high-grade gliomas; (C–G) Upregulation of signature genes in the poor-prognosis subtype (C1); (H–L) Prognostic value analysis of signature genes in gliomas shows that high expression correlates with poor survival.

TME and gene expression in glioma subtypes

The TME is the cellular environment in which malignant tumor cells interact with immune and stromal cells, influencing tumor progression. Using the ESTIMATE algorithm, we evaluated the TME status of the two TF-based glioma subtypes. The C1 subgroup had higher immune, stromal, and ESTIMATE scores than the C2 subgroup, suggesting a greater abundance of immune and stromal components (Fig. 6A). We also assessed gene sets related to immune function and pathways (Fig. 6A), as well as KEGG_C2 pathways (Fig. 6B), across the subgroups.
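Two-group comparisons such as C1 versus C2 ESTIMATE scores use the Wilcoxon test (see Statistical analysis). The Python sketch below implements a two-sided Wilcoxon rank-sum test with midranks for ties, a tie-corrected variance, and a continuity correction. R's `wilcox.test` uses an exact p value for small samples without ties, so p values from this normal-approximation sketch can differ slightly; the scores are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test using the normal
    approximation with tie and continuity corrections. Returns (U_x, p)."""
    values = list(x) + list(y)
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    tie_term = 0.0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    n1, n2 = len(x), len(y)
    n = n1 + n2
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        raise ValueError("all values are tied")
    z = max(abs(u_x - n1 * n2 / 2) - 0.5, 0.0) / math.sqrt(var_u)
    return u_x, math.erfc(z / math.sqrt(2))


# Hypothetical immune scores for C1 vs. C2 samples
c1 = [5.1, 6.3, 7.2, 8.0, 6.8, 7.5]
c2 = [1.2, 2.5, 3.1, 2.2, 1.9, 3.4]
u, p = rank_sum_test(c1, c2)
print(u, round(p, 4))
```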
In the hallmark gene set, pathways such as EPITHELIAL_MESENCHYMAL_TRANSITION, ANGIOGENESIS, HYPOXIA, and P53_PATHWAY showed increased activity in the C1 subgroup, implying a link between these pathways and its malignant behavior. Similar results were observed in the immune and KEGG_C2 gene sets: scores for CELL_ADHESION_MOLECULES_CAMS, AMINO_SUGAR_AND_NUCLEOTIDE_SUGAR_METABOLISM, and P53_SIGNALING_PATHWAY were also increased in the C1 subgroup, further highlighting its malignant features. Additionally, comparison of TF expression profiles between subgroups showed increased expression of EZH2, TWIST1, EGR1, FOSL2, and TCF3 in the C1 subgroup (Fig. 6C,D).

Fig. 6. Immune landscape, functional enrichment, and transcription factor distribution across glioma subtypes. (A) Differences in tumor microenvironment, immune gene expression, and immune function scores across subtypes; (B) KEGG pathway enrichment visualization reveals elevated tumor-related pathways in the C1 subtype; (C, D) Heatmaps of core transcription factor expression across subtypes.

Pan-cancer analysis of TCF3 and its implications in glioma

Receiver operating characteristic (ROC) analysis of the five genes showed that TCF3 had the highest area under the curve (AUC; 0.735), compared with EZH2 (0.618), TWIST1 (0.667), EGR1 (0.594), and FOSL2 (0.494) (Fig. 7A). Data from the Human Protein Atlas (HPA; https://www.proteinatlas.org) and the Gene Expression Profiling Interactive Analysis (GEPIA) database^47 showed elevated TCF3 expression across multiple malignancies (Fig. 7B,C). Notably, TCF3 expression was markedly upregulated in glioblastoma (GBM) and low-grade glioma (LGG) tissues relative to normal brain tissue (Fig. 7D), suggesting a potential oncogenic role in glioma initiation or progression.
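An AUC can be read as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, so values near 0.5, such as FOSL2's 0.494, indicate discrimination close to chance. The sketch below computes the AUC this way (the Mann–Whitney formulation). The positive class used for Fig. 7A is not specified in this section, so the labels and values are hypothetical.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive outscores a random
    negative (the Mann-Whitney formulation); ties count as one half."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical expression values; label 1 = positive class (e.g. tumor)
expr = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(round(roc_auc(expr, labels), 3))  # 0.889
```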
TCF3 expression was also elevated in brain cancer cell lines, including LN229 and U251MG (Fig. 7E). Compared with normal human astrocytes (HA), glioma cell lines consistently overexpressed TCF3 (Fig. 7F), a finding corroborated by immunohistochemical staining showing intense TCF3 positivity in tumor specimens (Fig. 7G,H). Although only minor differences in TCF3 expression were observed across glioma classifications (Fig. 7I), our qRT-PCR analysis of clinical glioma samples revealed a progressive increase in TCF3 expression with ascending WHO grade (Fig. 7J). Elevated TCF3 expression was consistently associated with adverse clinical outcomes (Fig. 7K–N), including poorer survival in the GBM (Fig. 7L), WHO grade II (Fig. 7M), and WHO grade III (Fig. 7N) cohorts. Furthermore, among the four molecular subtypes of diffuse glioma, TCF3 was preferentially overexpressed in the prognostically unfavorable classical and mesenchymal subtypes (Fig. 7O,P).

Fig. 7. TCF3 pan-cancer analysis reveals tumor cachexia characteristics. (A) ROC curve indicates the highest AUC for TCF3; (B, C) TCF3 expression profiles across pan-cancer types; (D) Higher TCF3 expression in glioblastoma (GBM) and low-grade glioma (LGG) compared with normal tissues; (E) TCF3 expression abundance in brain cancer cell lines; (F) Elevated TCF3 expression in tumor cells vs. astrocytes; (G, H) Immunohistochemical staining confirms TCF3 overexpression in tumors; (I) TCF3 expression across pathological grades; (J, K) TCF3 expression correlates with WHO grade progression and poor prognosis; (L) High TCF3 expression is associated with worse survival in WHO grade IV gliomas; (M) High TCF3 expression correlates with poor prognosis in WHO grade II gliomas; (N) High TCF3 expression correlates with poor prognosis in WHO grade III gliomas; (O, P) TCF3 overexpression in the poor-prognosis classical and mesenchymal molecular subtypes.

TCF3 as a potential biomarker for glioma classification and prognosis

To characterize the TF-based glioma subtypes at the molecular level, we performed differential expression and functional enrichment analyses. Multiple prognosis-related genes, including TCF3, were significantly upregulated in the C1 subtype (Fig. 8A). We then divided samples into high-expression (> median) and low-expression (< median) groups by TCF3 level; KEGG enrichment analysis of the genes differentially expressed between these groups showed significant enrichment in the Wnt, FOXO, and Hippo signaling pathways (Table S3, Fig. 8B). Note that this result indicates only a statistical association between TCF3 expression and these pathways; the direction of regulation (e.g., whether TCF3 activates these pathways or is itself subject to feedback regulation by them) remains to be validated by functional experiments.

Fig. 8. Functional analysis of TCF3 in gliomas. (A) Upregulation of oncogenes including TCF3 in poor-prognosis subtypes; (C, D) Functional consistency between TCF3 analysis and subtype analysis (GO/KEGG enrichment); (E–H) Weighted gene co-expression network analysis (WGCNA) of glioma transcription factor subtypes; (I–K) Functional consistency validation between TCF3 and subtype-specific pathways.
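The enrichment tool used for this KEGG analysis is not named. KEGG over-representation analysis is commonly based on a one-sided hypergeometric test followed by Benjamini–Hochberg (BH) adjustment, as in `enrichKEGG` from the clusterProfiler R package; assuming that approach, a minimal Python sketch of both steps follows, with hypothetical gene counts.

```python
from math import comb


def hypergeom_upper_tail(k, n_drawn, n_pathway, n_universe):
    """P(X >= k) for X ~ Hypergeometric: the chance that at least k of n_drawn
    DEGs fall in a pathway of n_pathway genes drawn from n_universe genes."""
    top = min(n_drawn, n_pathway)
    hits = sum(comb(n_pathway, i) * comb(n_universe - n_pathway, n_drawn - i)
               for i in range(k, top + 1))
    return hits / comb(n_universe, n_drawn)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value downwards
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example (hypothetical counts): 20 background genes, 5 in the pathway,
# 4 DEGs, 3 of which fall in the pathway
p = hypergeom_upper_tail(k=3, n_drawn=4, n_pathway=5, n_universe=20)
print(round(p, 4))  # 0.032
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

In a real analysis, each KEGG pathway gets its own hypergeometric p value, and the BH step is applied across all tested pathways.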
Gene Ontology (GO) analysis was used to annotate the biological processes (BP), cellular components (CC), and molecular functions (MF) associated with glioma subtype genes (Table S4, Fig. 8C,D). At the BP level, TCF3-related genes were enriched in fundamental cellular functions such as regulation of gene expression, nucleotide metabolism, and GTPase activity, suggesting potential functional links between TCF3 and these processes; direct effects on cell growth and survival, however, require validation in vitro. At the CC level, the involvement of TCF3-related genes in cell–matrix adhesions and cell junctions implies a potential role in shaping the tumor microenvironment through cell–cell interactions, although the mechanisms governing cell migration or tissue formation require further study. At the MF level, enrichment of ubiquitin-protein ligase binding and guanine nucleotide exchange factor activity suggests that TCF3 may be involved in protein modification or signal transduction, although its precise molecular mechanisms remain to be explored. WGCNA identified a gene module strongly correlated with the C1 subtype (Fig. 8E–H): the brown2 module showed a significant positive correlation (cor = 0.62, p = 4 × 10^−102). Intersecting this module with glioma-associated differentially expressed genes (DEGs) yielded 523 overlapping genes. KEGG pathway analysis and gene set enrichment analysis (GSEA) showed that these genes were co-expressed in previously reported tumor-related pathways (Fig. 8J,K). However, this finding reflects only expression-level associations between TCF3 and specific pathways, and its functional relevance must be verified by genetic perturbation experiments (e.g., knockdown or overexpression).
Nevertheless, the functional consistency between TCF3 expression and the TF-based glioma subtypes supports its use as a marker for transcriptomic subtyping of glioma, and the significant expression-level associations between TCF3 and key glioma signaling pathways and cellular processes provide tentative evidence for its role as a molecular classifier and prognostic biomarker.

Mutation characteristics and clinical implications of TCF3

TCGA data revealed significant differences in tumor mutation burden (TMB) across glioma subtypes. Tumors with high TCF3 expression (TCF3-H) had increased TMB (Fig. 9A). Gliomas with both high TCF3 expression and high TMB (H-TMB + TCF3-H) had the worst prognosis, whereas those with low TCF3 expression and low TMB (L-TMB + TCF3-L) had the best (Fig. 9B). Among the 25 genes most frequently mutated in gliomas, genes such as IDH1, TP53, and ATRX were frequently mutated in both glioma subtypes. In TCF3-H tumors, TP53 (49%) and ATRX (34%) mutations were frequent, with less frequent mutations in genes such as NF5 (1%) and PTEN (10%) (Fig. 9C,D). These findings highlight the genetic complexity of TCF3-H tumors and their association with glioma progression. The presence of multiple mutations, particularly in TP53 and ATRX, may contribute to the aggressiveness of gliomas with elevated TCF3 expression, reinforcing the link between high TCF3 levels and poor prognosis. Furthermore, across molecular subgroups of glioma, the TCF3-high cohort was significantly overrepresented in prognostically unfavorable subgroups, namely the classical subtype and the MGMT promoter-methylated group (263:67), indirectly underscoring the strong correlation between elevated TCF3 expression and adverse clinical outcomes in glioma (Fig. 9E,F).

Fig. 9. Mutational landscape of TCF3 in gliomas.
(A) Association between TCF3 expression and tumor mutation burden (TMB); (B) Survival differences among patients stratified by TMB and TCF3 expression (high/low); (C, D) Mutational profiles of TCF3-high (C) and TCF3-low (D) tumors; (E) Higher prevalence of TCF3-high tumors in the poor-prognosis classical/mesenchymal subtypes; (F) Enrichment of TCF3-high tumors in MGMT-methylated subgroups with worse outcomes; (G) Poorer prognosis in MGMT-methylated vs. non-methylated gliomas; (H) Worse survival in classical/mesenchymal vs. proneural gliomas.
Discussion
Glioma is the most prevalent primary malignant intracranial tumor. It is characterized by poor prognosis, high rates of disability, and a propensity for recurrence, and it imposes a significant burden on patients, their families, and society at large. Glioblastoma, the most aggressive type of glioma, carries an extremely poor prognosis. Median survival is typically under 15 months, even with maximal safe resection, temozolomide chemotherapy, and radiotherapy. Gliomas differ from many other tumors in that their cells are especially invasive and infiltrate normal brain tissue early. Consequently, even after complete surgical removal of the tumor [48–50], recurrence is common at the surgical margins [29–31]. TFs play critical roles in tumorigenesis and tumor progression by regulating gene transcription, influencing processes such as proliferation, apoptosis, invasion, metastasis, and angiogenesis. The E26 transformation-specific (ETS) family is one of the largest groups of transcriptional regulators. ETS TFs drive oncogenesis through chromosomal translocations, aberrant gene expression, and disrupted signaling pathways, and they activate genes that promote tumor invasion and metastasis [32,33,51]. YY1, another key TF, acts as both a transcriptional activator and a repressor.
Overexpression of YY1 is strongly linked to tumor initiation and metastasis, and its involvement in critical pathways is associated with poor clinical outcomes [18]. Similarly, BTF3 is highly expressed in multiple cancers and promotes tumor growth by interacting with genes such as TP53. Silencing BTF3 in melanoma inhibits tumor growth and induces apoptosis, underscoring its role in tumor progression [52]. The intricate regulatory networks governed by TFs are fundamental to the biological behavior of gliomas and other malignancies. These networks not only drive tumorigenesis but also significantly affect clinical outcomes. In this study, we investigated the functions of 795 TFs in glioma. Using NMF, we identified two distinct prognostic subgroups, C1 and C2, which differed in biological characteristics and patient outcomes. The subgroups were distinguished by several clinical and molecular characteristics. These included PRS type (primary, recurrent, or secondary glioma), tumor grade, IDH mutation status, 1p/19q codeletion status, and MGMT promoter methylation status, emphasizing the biological complexity and heterogeneity of glioma. The two subgroups also differed in the distribution of WHO grades. C1 contained a greater proportion of WHO grade IV gliomas than C2 (54% vs. 19%), whereas C2 contained a greater proportion of lower-grade gliomas (WHO grade II: C1 6% vs. C2 55%; WHO grade III: C1 18% vs. C2 53%). These differences were statistically significant (p < 0.01), and most patients in the C1 subgroup had higher-grade gliomas. Further analysis revealed the biological pathways activated in each subgroup. Among the hallmark gene sets, pathways such as EPITHELIAL_MESENCHYMAL_TRANSITION, ANGIOGENESIS, HYPOXIA, and P53_PATHWAY were more active in the C1 subgroup, suggesting that these processes contribute to the aggressive nature of C1 tumors.
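The C1/C2 subgroups were obtained by NMF of TF gene expression, but the exact algorithm and settings are not given here. The following is a minimal sketch built on three assumptions. First, it uses Lee–Seung multiplicative updates for the squared Frobenius-norm objective. Second, it assigns each sample to the metagene with the largest coefficient in H. Third, it omits the consensus clustering over many random restarts that is usually used to choose the rank and stabilize subtypes. All names and the toy matrix are hypothetical.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Approximate non-negative V (genes x samples) as W @ H, with
    W (genes x k) and H (k x samples), via Lee-Seung multiplicative
    updates for the squared Frobenius-norm objective."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    # Strictly positive random start; multiplicative updates keep entries >= 0.
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return W, H


def assign_subtypes(H):
    """Assign each sample (column of H) to the metagene with the largest weight."""
    return [max(range(len(H)), key=lambda r: H[r][j]) for j in range(len(H[0]))]
```

Consider a toy matrix where genes 0–1 are high in samples 0–1 and genes 2–3 are high in samples 2–3. On such data, `assign_subtypes(nmf(V, 2)[1])` should place the two sample pairs in different clusters. Which cluster is labelled 0 or 1 is arbitrary, so subtype labels such as C1 and C2 would be attached afterwards, for example by comparing survival.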
Patterns similar to those in the hallmark gene sets were observed in the immune-related and KEGG gene sets. These included activation of the CELL_ADHESION_MOLECULES_CAMS, AMINO_SUGAR_AND_NUCLEOTIDE_SUGAR_METABOLISM, and P53_SIGNALING pathways, indicating more aggressive biological behavior in the C1 subgroup. To further understand the biological significance of these differences, we applied differential gene expression analysis, WGCNA, survival analysis, and machine learning methods. Together, these analyses pointed to TCF3 as a central regulator. TFs play pivotal roles in tumorigenesis, metabolic reprogramming of cancer cells, and stem cell regulation [10,15–17,29,30]. Building on this, targeting dysregulated TFs with nanomedicine has made significant progress as a potential therapeutic strategy in glioma [14]. TCF3 has been reported to be overexpressed in human gliomas. Its inhibition suppresses tumor growth both in vitro and in vivo and reduces the proliferation and migration of glioma cells [28,31]. In our data, TCF3 was significantly differentially expressed between the two glioma subgroups. Further analysis confirmed that TCF3 is closely associated with the WHO classification of glioma and can serve as a marker to distinguish between glioma subgroups. Mutation analysis provided additional support for the role of TCF3 in glioma. The TCF3-H subgroup had a higher TMB, with frequent mutations in genes such as TP53 (49%) and ATRX (34%). Several lower-frequency mutations were also detected in genes such as NF1 and PTEN, suggesting that additional signaling pathways are involved in the TCF3-H group. This diversity of mutations points to a more complex genetic makeup and greater tumor heterogeneity in the TCF3-H subgroup, which may contribute to its poor prognosis.
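Survival comparisons rest on estimating survival curves for patient groups. One example is the stratification by TMB and TCF3 expression shown in Fig. 9B (H-TMB + TCF3-H, L-TMB + TCF3-L, and so on). The "high" and "low" cut-offs are not stated in this section, so the sketch below assumes median splits for both variables. It estimates each group's curve with the Kaplan–Meier product-limit estimator. The helper names are hypothetical, and the log-rank test that would normally compare the curves is not shown.

```python
from statistics import median


def four_groups(tmb, tcf3):
    """Label patients by median splits of TMB and TCF3 expression.
    Assumption: values equal to the median fall into the low group."""
    tmb_cut, tcf3_cut = median(tmb), median(tcf3)
    return [
        f"{'H-TMB' if t > tmb_cut else 'L-TMB'} + {'TCF3-H' if e > tcf3_cut else 'TCF3-L'}"
        for t, e in zip(tmb, tcf3)
    ]


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.
    times: follow-up times; events: 1 = death observed, 0 = censored.
    Returns [(time, S(time))] at each distinct time with at least one death."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        # Pool all patients sharing this time; those censored at t are
        # still counted as at risk for deaths at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def curves_by_group(labels, times, events):
    """Kaplan-Meier curve for each group label."""
    grouped = {}
    for lab, t, e in zip(labels, times, events):
        ts, es = grouped.setdefault(lab, ([], []))
        ts.append(t)
        es.append(e)
    return {lab: kaplan_meier(ts, es) for lab, (ts, es) in grouped.items()}
```

A typical call is `curves_by_group(four_groups(tmb, tcf3), times, events)`, which yields one step curve per TMB/TCF3 combination. Using median cut-offs keeps groups balanced. Optimal cut-point methods are an alternative, but they need correction for the resulting multiple testing.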
In summary, we developed a novel molecular classification system for glioma based on TF-related genes, with particular emphasis on the role of TCF3 in molecular subtyping and prognosis prediction. Our comprehensive analyses and RT-qPCR validation identified TCF3 as a pivotal biomarker within the TF-based subtype landscape of glioma. However, the functional validation of its mechanistic role remains to be fully explored. Specifically, in vitro and in vivo experiments are needed to dissect the precise biological functions of TCF3 as a transcription factor in glioma. A systematic investigation of the interplay between TCF3 and classical oncogenic pathways is also needed to elucidate the mechanisms underlying its actions. By integrating multiple TFs, we identified two distinct prognostic subgroups with significant differences in clinical and molecular characteristics. TCF3 emerged as a critical marker for distinguishing these subgroups, as it is closely associated with tumor grade, mutation burden, and key pathways. This classification system enhances the accuracy of survival prediction for glioma patients and reflects the biological heterogeneity of the disease. Furthermore, this study provides a rich dataset that can be leveraged to explore TFs as potential therapeutic targets and to guide the development of personalized glioma treatments.
Electronic supplementary material
Supplementary Material 1 (2MB, xlsx)
Author contributions
All authors made significant contributions to the reported work. Qiao Li and Peng Feng contributed to the conception, research design, execution, data collection, and analysis. Shangyu Liu and Goupeng Tian participated in drafting, revising, or critically reviewing the article. The corresponding authors, Yawen Pan and Guoqiang Yuan, guided and oversaw the work as a whole.
Data availability
Publicly available datasets were analyzed in this study. These data can be found in the Chinese Glioma Genome Atlas (CGGA) database (http://www.cgga.org.cn) and The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/).
Declarations
Competing interests
The authors declare no competing interests.
Ethical approval and consent to participate
All patients involved in this study provided written informed consent, and the study was approved by the Medical Ethics Committee of the Second Hospital of Lanzhou University (Approval No. 2022 A-515). We adhere to the journal's publication ethics policies and ensure that all published research complies with the journal's stringent technical and ethical standards.
Consent for publication
All authors approved the final version for publication, agreed on the journal to which the article was submitted, and agreed to take responsibility for all aspects of the work.
Footnotes
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Qiao Li, Peng Feng and Shangyu Liu contributed equally to this work.
Contributor Information
Guoqiang Yuan, Email: yuangq08@lzu.edu.cn.
Yawen Pan, Email: pyw@lzu.edu.cn.
References