Abstract

   Transcription factors (TFs) are pivotal in tumor initiation and
   progression, regulating downstream gene expression and modulating
   cellular processes. In this study, we conducted a comprehensive
   analysis of TF gene sets to define the molecular subtypes of gliomas.
   Using nonnegative matrix factorization (NMF), we identified two
   distinct glioma subtypes characterized by significant differences in
   survival outcomes and clinical features. Additionally, we identified TF
   gene sets with differential expression across gliomas of various World
   Health Organization (WHO) grades, followed by protein‒protein
   interaction (PPI) network analysis. By applying 101 machine learning
   models, five key genes (EZH2, TWIST1, EGR1, FOSL2, and TCF3) involved
   in glioma were identified. Among these genes, TCF3 has emerged as a
   potential key prognostic marker because of its distinct expression
   patterns and functional relevance. By performing multi-omics and
   multi-dataset analyses, we explored the aberrant expression of TCF3
   across multiple cancers, with robust validation at both the cellular
   and tissue levels. Furthermore, our analysis revealed a strong
   association between TCF3 mutation and glioma prognosis, underscoring
   its potential as a therapeutic target. In summary, this study not only
   introduces a novel method for the molecular subtyping of glioma but
   also highlights TCF3 as a promising target for precision medicine. Our
   findings provide crucial insights into the molecular mechanisms of
   glioma and offer a foundation for the development of novel therapeutic
   strategies.

Supplementary Information

   The online version contains supplementary material available at
   10.1038/s41598-025-09924-w.

   Subject terms: Computational biology and bioinformatics, Molecular
   biology, Biomarkers, Diseases, Medical research, Molecular medicine,
   Neurology

Introduction

   Cancer remains an escalating global health challenge, with both
   incidence and mortality rates steadily rising. In 2022, the United
   States alone reported more than 1.9 million new cancer cases and
   609,360 cancer-related deaths, underscoring the pressing need for more
   effective interventions^[34]1. Among the many cancer types, glioma, a
   particularly aggressive brain tumor, is distinguished by its
   infiltrative growth pattern, which closely intertwines with normal
   brain tissue, obscuring tumor boundaries^[35]2. This makes surgical
   resection difficult and often incomplete, leading to residual tumor
   cells and recurrence. Moreover, gliomas exhibit a regenerative capacity
   that increases malignancy over time, further complicating treatment and
   worsening the prognosis^[36]3.

   An additional challenge in glioma management arises from its
   intertumoral heterogeneity^[37]4. This molecular complexity means that
   each tumor can present unique features, leading to variable responses
   to standard therapies^[38]5. Given the limited understanding of the
   molecular mechanisms driving glioma progression, alongside the
   limitations of current treatments, there is an urgent need for in-depth
   molecular analyses to advance precision medicine and support the
   development of more personalized therapeutic strategies.

   Transcription factors (TFs) are critical regulators of gene expression
   and play key roles in cancer biology. In eukaryotic cells,
   transcription initiation requires the coordinated action of RNA
   polymerase II and numerous TFs that assemble at the promoter region of
   a gene to form the transcription initiation complex^[39]6. TFs are
   generally classified as either general TFs, which help form the
   transcription initiation complex, or specific TFs, which regulate genes
   in response to stimuli such as hormones or growth factors.

   In cancer, TFs play a central role in various critical pathological
   processes, including tumorigenesis^[40]7–[41]9, metabolic
   reprogramming^[42]10,[43]11, and the self-renewal and differentiation
   of cancer stem cells (CSCs)^[44]12,[45]13, by regulating gene
   expression networks. During tumorigenesis, TFs can function as
   oncogenes or tumor suppressors, depending on the specific cancer
   type^[46]10,[47]14–[48]17. For example, the TF Yin Yang 1 (YY1)
   promotes tumor cell proliferation and metastasis in most cancers^[49]18
   but exhibits tumor-suppressive activity in specific cancers, such as
   pancreatic and esophageal tumors. In metabolic reprogramming, TFs
   regulate critical genes involved in metabolic pathways, such as
   glycolysis and fatty acid metabolism, enabling tumor cells to meet the
   energy demands of rapid growth^[50]19,[51]20. TFs also play crucial
   roles in modulating the properties of CSCs, a population of tumor cells
   responsible for recurrence and metastasis. In breast cancer, forkhead
   box O3 (FOXO3a) negatively regulates forkhead box M1
   (FOXM1)^[52]21–[53]24, reducing CSC stemness and tumorigenicity. Given
   these diverse roles, TFs serve as important diagnostic and prognostic
   biomarkers in various cancers^[54]25,[55]26. Together, these findings
   highlight the multifaceted functions of TFs in cancer biology and their
   potential as therapeutic targets.

   In this study, we focused on the role of TFs in glioma subtyping. Using
   nonnegative matrix factorization (NMF)^[56]27, we analyzed 795 TFs and
   identified two molecular subtypes of gliomas with distinct survival
   rates and clinical characteristics. Further analysis revealed that
   certain TF-related gene sets were differentially expressed across World
   Health Organization (WHO) grade II, III, and IV gliomas. To explore
   protein-level interactions among these genes, we performed a
   protein‒protein interaction (PPI) network analysis.

   To identify core genes associated with the subtyping of gliomas and the
   classification of gliomas according to the WHO standards, ten machine
   learning techniques, including least absolute shrinkage and selection
   operator (Lasso), Ridge, Elastic Net (Enet), StepCox, survivalSVM,
   CoxBoost, SuperPC, plsRcox, random survival forests (RSF), and gradient
   boosting machine (GBM), were combined into 101 distinct strategies.
   Subsequently, the support vector machine (SVM) algorithm was used to
   further refine the results, identifying genes that are representative
   of different glioma subtypes. This thorough approach led to the
   identification of several key genes linked to glioma subtypes.
   Univariate analysis was conducted to determine the main factors
   affecting glioma risk, and a thorough PPI network analysis of these
   risk factors was performed. Ultimately, we identified five key genes,
   EZH2, TWIST1, EGR1, FOSL2, and TCF3, which are important for
   understanding glioma subtypes and predicting patient outcomes.

   Research on transcription factors in glioma and glioblastoma has
   confirmed their unique regulatory roles in these types of brain tumors.
   Despite its recognized role in other tumors, TCF3 has been minimally
   studied in gliomas^[57]28–[58]30. The regulatory role of TCF3 in
   controlling cell proliferation and migration in glioma cell lines has
   been reported in previous studies^[59]31. Our multi-omics and
   multi-dataset analyses revealed the abnormal expression of TCF3 in
   glioma and other cancers, a finding validated at both the cellular and
   tissue levels. Furthermore, we explored the relationship between TCF3
   and glioma prognosis, revealing new insights into its potential as a
   therapeutic target.

   Overall, our study introduced a novel molecular classification system
   for gliomas and identified TCF3 as a key prognostic marker and
   potential therapeutic target. These findings lay the groundwork for
   advancing precision medicine in glioma treatment and may facilitate the
   development of novel therapeutic strategies.

Materials and methods

Data collection and processing

   The structured workflow diagram of this study is illustrated in
   Fig. [60]1. Transcription factors were sourced from the TRRUST database
   ([61]https://www.grnpedia.org/trrust/), a manually curated repository
   of transcriptional regulatory networks in humans and mice (Table
   [62]S1). These data originate from 11,237 PubMed-indexed publications
   documenting small-scale experimental investigations into
   transcriptional regulation, encompassing three categories of
   transcription factors: direct regulators, indirect mediators, and
   condition-specific modulators. Transcriptome and clinical data were
   obtained from two databases: the Chinese Glioma Genome Atlas (CGGA)
   database ([63]http://www.cgga.org.cn) and The Cancer Genome Atlas
   (TCGA) database ([64]https://portal.gdc.cancer.gov/)^[65]32–[66]37.
   Among these, TCGA included 702 glioma samples, while the CGGA database
   aggregated three datasets (CGGA325, CGGA693, and CGGA301) totaling
   1,351 samples, all confirmed as gliomas. For prognostic analysis and
   molecular subtyping, samples with invalid survival data (survival
   time ≤ 0 days or missing survival status) were excluded from the
   cohort. Genomic mutation data for gliomas were acquired from the
   Genomic Data Commons (GDC) Data Portal
   ([67]https://portal.gdc.cancer.gov/). The glioma cell lines (U251,
   LN229, U87, A172, and U118) and normal human astrocytes (HA) were
   obtained from the Chinese Academy of Sciences. Tumor tissue specimens
   and paired adjacent normal brain tissue samples were obtained from the
   Tissue Repository of Lanzhou University Second Hospital. All patients
   provided written informed consent, and the study protocol received full
   ethical approval from the Clinical Ethics Committee of Lanzhou
   University Second Hospital.
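
   The survival-data exclusion rule stated above (survival time ≤ 0 days
   or missing survival status) can be illustrated with a minimal Python
   sketch. The field names and records are hypothetical, and treating a
   missing survival time as invalid is an added assumption; the study's
   own filtering code is not described in the text.

```python
def has_valid_survival(sample):
    """Return True if the sample has a positive survival time and a recorded status."""
    time_days = sample.get("os_days")
    status = sample.get("status")
    # Missing time is treated as invalid too (an assumption; the text only
    # names time <= 0 and missing status explicitly).
    return time_days is not None and time_days > 0 and status is not None


def filter_cohort(samples):
    """Drop samples that fail the survival-data validity rule."""
    return [s for s in samples if has_valid_survival(s)]


# Hypothetical records; field names are illustrative only.
demo = [
    {"sample_id": "S1", "os_days": 420, "status": 1},
    {"sample_id": "S2", "os_days": 0, "status": 0},       # time <= 0
    {"sample_id": "S3", "os_days": 900, "status": None},  # missing status
    {"sample_id": "S4", "os_days": None, "status": 1},    # missing time
]
kept_ids = [s["sample_id"] for s in filter_cohort(demo)]
print(kept_ids)
```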

Fig. 1.

   Structured workflow diagram of this study.

Identification and classification of glioma subtypes using TFs

   A set of 795 TF-related genes (Table [70]S1) was identified for
   analysis. To classify glioma subtypes on the basis of TF expression
   profiles, we applied NMF with the following parameters: rank = 2:10,
   method = “brunet”, and nrun = 10. After evaluating the clustering
   performance, we determined that two subtypes (clusterNum = 2) provided
   the most robust classification. We then assessed the relationships
   between these two subtypes and their clinical characteristics.
   Differential gene expression analyses between two glioma subtypes and
   across different WHO grades of glioma were performed via the limma R
   package to identify differentially expressed genes (DEGs). Gene
   Ontology (GO) analysis^[71]38 was carried out to evaluate DEGs across
   dimensions, including cellular component (CC), molecular function (MF),
   and biological process (BP). We also performed pathway enrichment
   analysis via the Kyoto Encyclopedia of Genes and Genomes
   (KEGG)^[72]39–[73]41 and gene set enrichment analysis (GSEA) to explore
   specific functional pathways.
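
   To make the factorization step concrete, the following is a minimal
   pure-Python sketch of NMF with multiplicative updates that minimise
   the Kullback-Leibler divergence, the family of updates on which the
   "brunet" method is based. It is not the R NMF package used in the
   study: it performs a single run at a fixed rank of 2 on a toy matrix
   (the study surveyed rank = 2:10 with nrun = 10), and the
   cluster-assignment rule (dominant component after rescaling) is an
   assumption made for illustration.

```python
import math
import random

EPS = 1e-9  # numerical floor that keeps all factors strictly positive


def matmul(W, H):
    """Multiply an (n x k) matrix by a (k x m) matrix stored as nested lists."""
    k, m = len(H), len(H[0])
    return [[sum(row[a] * H[a][j] for a in range(k)) for j in range(m)] for row in W]


def kl_divergence(V, WH):
    """Generalised KL divergence D(V || WH), the objective these updates minimise."""
    total = 0.0
    for v_row, r_row in zip(V, WH):
        for v, r in zip(v_row, r_row):
            r = max(r, EPS)
            total += (v * math.log(v / r) if v > 0 else 0.0) - v + r
    return total


def nmf_kl(V, rank, n_iter=300, seed=0):
    """Factorise V (genes x samples) as W (genes x rank) times H (rank x samples)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + EPS for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + EPS for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        WH = matmul(W, H)
        for a in range(rank):  # multiplicative update of H
            w_sum = sum(W[i][a] for i in range(n))
            for j in range(m):
                num = sum(W[i][a] * V[i][j] / max(WH[i][j], EPS) for i in range(n))
                H[a][j] = max(H[a][j] * num / w_sum, EPS)
        WH = matmul(W, H)
        for a in range(rank):  # multiplicative update of W
            h_sum = sum(H[a][j] for j in range(m))
            for i in range(n):
                num = sum(H[a][j] * V[i][j] / max(WH[i][j], EPS) for j in range(m))
                W[i][a] = max(W[i][a] * num / h_sum, EPS)
    return W, H


def assign_clusters(W, H):
    """Label each sample by its dominant component after rescaling W columns to sum 1."""
    rank, m = len(H), len(H[0])
    scale = [sum(row[a] for row in W) for a in range(rank)]
    return [max(range(rank), key=lambda a: H[a][j] * scale[a]) for j in range(m)]


# Toy matrix: 4 hypothetical TFs x 6 samples with two obvious expression blocks.
V = [
    [5.0, 6.0, 5.0, 1.0, 1.0, 0.5],
    [6.0, 5.0, 6.0, 0.5, 1.0, 1.0],
    [1.0, 0.5, 1.0, 6.0, 5.0, 6.0],
    [0.5, 1.0, 1.0, 5.0, 6.0, 5.0],
]
W, H = nmf_kl(V, rank=2)
labels = assign_clusters(W, H)
print(labels)
```

   In practice, a rank survey repeats such a factorization for each
   candidate rank over several random starts and compares stability
   measures before a rank is chosen; the sketch omits that step.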

Weighted gene co-expression network analysis (WGCNA)

   WGCNA^[74]42 was used to identify gene modules co-expressed in relation
   to the two glioma subtypes. To construct a scale-free network, we
   calculated the soft-thresholding power on the basis of the scale-free
   topology criterion and selected the optimal value accordingly. Modules
   were defined with a minimum size of 50 genes. The dynamic tree cut
   method was used to detect distinct gene modules, and similar modules
   were merged via a module eigengene dissimilarity threshold
   (MEDissThres) of 0.25.
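
   The role of the soft-thresholding power can be sketched in a few
   lines of Python. The example assumes an unsigned network, in which
   the adjacency between two genes is the absolute Pearson correlation
   raised to the chosen power; the network type, the gene names, and the
   expression values are illustrative assumptions, and the scale-free
   fit itself is not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr, power):
    """Unsigned adjacency: a_ij = |cor(gene_i, gene_j)| ** power."""
    genes = list(expr)
    return {
        gi: {
            gj: 1.0 if gi == gj else abs(pearson(expr[gi], expr[gj])) ** power
            for gj in genes
        }
        for gi in genes
    }


def connectivity(adj):
    """Connectivity of each gene: sum of its adjacencies to all other genes."""
    return {g: sum(v for h, v in row.items() if h != g) for g, row in adj.items()}


# Hypothetical expression profiles across five samples.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.0, 4.1, 5.9, 8.2, 9.8],
    "g3": [5.0, 3.0, 4.0, 1.0, 2.0],
}
adj1 = soft_threshold_adjacency(expr, power=1)
adj6 = soft_threshold_adjacency(expr, power=6)
conn1, conn6 = connectivity(adj1), connectivity(adj6)
print(round(adj1["g1"]["g3"], 3), round(adj6["g1"]["g3"], 3))
```

   Raising the power suppresses weak correlations more strongly than
   strong ones, which is why the power is tuned until the connectivity
   distribution approximates a scale-free topology.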

Tumor microenvironment (TME), immune, and functional scoring

   The R package ESTIMATE was used to predict immune, stromal, and total
   ESTIMATE scores for individual tumor samples^[75]43. To investigate
   immune cell interactions within the TME, data from the web portal
   TISIDB were used to quantify the relative abundance of various immune
   cell types via single-sample gene set enrichment analysis (ssGSEA).
   Immune-related features were further assessed via KEGG_C2 pathway
   scores. The R package IOBR^[76]44,[77]45 provided additional insights
   into immune cell infiltration and interactions.

PPI analysis and machine learning for prognostic gene identification

   PPI was conducted to identify core protein-related genes that were
   differentially expressed across glioma subtypes. Leveraging the CGGA301
   dataset as a test set, we applied ten machine learning algorithms,
   including Lasso, Ridge, Enet, StepCox, survivalSVM, CoxBoost, SuperPC,
   plsRcox, RSF, and GBM. Under the framework of cross-validation, one
   algorithm was used for variable selection while another was employed to
   construct the prognostic model. The concordance index (C-index) was
   calculated for each model combination (totaling 101 combinations, Table
   [78]S2) on the external datasets (with or without the training set). For the
   CoxBoost model, we first determined the optimal penalty term (shrinkage
   parameter) by invoking the “optimCoxBoostPenalty” function.
   Subsequently, tenfold cross-validation was performed to identify the
   optimal boosting steps for the CoxBoost model, and final model fitting
   was accomplished using the “CoxBoost” function. In terms of stepwise
   Cox analysis, we utilized the survival package and evaluated model
   complexity based on the Akaike Information Criterion (AIC). All
   possible combinations of direction parameters were considered,
   including “both” (bidirectional), “backward”, and “forward” elimination
   approaches. The Lasso, Ridge, and Enet models were constructed using
   the “cv.glmnet” function from the glmnet package. A tenfold
   cross-validation approach was adopted to determine the regularization
   parameter lambda, with the compromise parameter alpha ranging between 0
   and 1 at 0.1 intervals. Specifically, when alpha = 1, the Lasso model
   was implemented; when alpha = 0, the Ridge model was used; and for
   other alpha values, the Enet (Elastic Net) model was applied. For the
   survival support vector machine model, we employed the “survivalsvm”
   function from the survivalsvm package, which is specialized for
   survival outcome analysis. The GBM model was fitted using the “gbm”
   function from the gbm package combined with tenfold cross-validation.
   The SuperPC model, an extension of principal component analysis (PCA),
   was implemented via the superpc package. During model construction,
   tenfold cross-validation was performed using the “superpc.cv” function.
   The plsRcox model was directly established using the “cv.plsRcox”
   function from the plsRcox package. Finally, for the RSF (Random
   Survival Forest) model, we utilized the “rfsrc” function from the
   randomForestSRC package. In parameter settings, “ntree” represents the
   number of trees in the random forest, and “nodesize” denotes the
   minimum size of terminal nodes. In this study, “ntree” was set to 1000,
   and the minimum variable count for screening was configured as 5. A
   total of 101 unique algorithmic combinations were used to identify
   prognostic genes. We further refined our gene selection through
   univariate analysis and SVM analysis, focusing on TFs differentially
   expressed between glioma subtypes. Survival analysis was subsequently
   performed via the R packages survival and survminer, with p < 0.05
   considered statistically significant.
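
   Because the model combinations above are ranked by the concordance
   index, a small Python sketch of Harrell's C-index may help. It counts
   comparable patient pairs (the earlier time must be an observed event)
   and scores a pair as concordant when the patient who died earlier
   received the higher predicted risk. The data are hypothetical, and
   the handling of tied risk scores (half credit) is a common convention
   assumed here rather than taken from the study.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times: follow-up times; events: 1 = event observed, 0 = censored;
    risk_scores: higher score means higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort of four patients (third patient is censored).
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.8, 0.1]
c_index = concordance_index(times, events, risk)
print(c_index)
```

   A value of 0.5 corresponds to random ranking and 1 to perfect
   ranking, which is the scale on which the 101 combinations are
   compared.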

RNA extraction and real-time quantitative PCR (qRT-PCR) workflow

   Cellular and tissue samples were lysed in TRIzol reagent (Thermo
   Fisher Scientific). Total RNA was precipitated with isopropanol, and
   RNA purity was verified using a Nanodrop spectrophotometer
   (A260/A280 ratio ≥ 1.8). All procedures were
   performed strictly following the manufacturer’s instructions with SYBR
   Premix Ex Taq™ kits (Takara, Japan; Catalog Nos. RR047A and RR820A).
   Real-time PCR amplification was conducted on a CFX96 Touch™ Real-Time
   PCR Detection System (Bio-Rad, USA) equipped with a 96-well optical
   reaction module and precision temperature control system. Relative gene
   expression was quantified using the ΔΔCt method with GAPDH as the
   endogenous reference gene. Each sample included a minimum of 3
   technical replicates and 3 biological replicates to ensure statistical
   robustness. Primer sequences used are listed in Supplementary Table
   [79]S5.
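
   The ΔΔCt quantification described above can be written out as a short
   Python sketch. It assumes the standard 2^(−ΔΔCt) form with roughly
   100% amplification efficiency and averages technical replicates
   before differencing; the Ct values below are invented for
   illustration and are not data from this study.

```python
from statistics import mean


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene versus control by the 2^(-ddCt) method.

    Each argument is a list of Ct values from technical replicates.
    """
    delta_sample = mean(ct_target_sample) - mean(ct_ref_sample)     # dCt, sample
    delta_control = mean(ct_target_control) - mean(ct_ref_control)  # dCt, control
    delta_delta = delta_sample - delta_control                      # ddCt
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: TCF3 as target, GAPDH as reference gene.
fold = relative_expression(
    ct_target_sample=[24.0, 24.1, 23.9],   # glioma cell line
    ct_ref_sample=[18.0, 18.1, 17.9],
    ct_target_control=[26.0, 26.2, 25.8],  # normal astrocytes
    ct_ref_control=[18.0, 17.9, 18.1],
)
print(round(fold, 2))
```

   With these invented values, ΔCt is 6 in the sample and 8 in the
   control, so ΔΔCt is −2 and the target is about fourfold higher in the
   sample.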

Statistical analysis

   Statistical analysis and graphical outputs were generated via R
   software version 4.3.0 ([80]https://www.r-project.org/) and GraphPad
   Prism 9.0. Comparisons between two groups were made via the Wilcoxon test,
   whereas analysis of variance (ANOVA) was used for comparisons across
   more than two groups^[81]46. Survival analysis was conducted via the
   log-rank test, facilitated by the R package survminer. Pearson
   correlation analysis was performed to examine the relationships between
   genes and gene set enrichment scores. A p value of less than 0.05 was
   considered statistically significant throughout the analyses.
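
   As an illustration of the log-rank comparison used for survival
   analysis, the sketch below implements the two-group log-rank
   statistic in plain Python. It is a teaching example under the usual
   hypergeometric variance formula, not the survminer implementation,
   and the follow-up times are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) with 1 degree of freedom."""
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e == 1}
        | {t for t, e in zip(times_b, events_b) if e == 1}
    )
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e == 1)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        raise ValueError("no informative event times")
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical groups: early deaths in one group, later events in the other.
chi2, p = logrank_test(
    times_a=[1, 2, 3, 4, 5, 6], events_a=[1, 1, 1, 1, 1, 1],
    times_b=[10, 11, 12, 13, 14, 15], events_b=[1, 1, 1, 0, 1, 0],
)
print(round(chi2, 2), p < 0.05)
```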

Results

Transcriptomic profiles of the two glioma subgroups

   We identified a set of 795 TF genes for analysis. After integrating
   glioma datasets from TCGA_LGG, TCGA_GBM, CGGA_325, and CGGA_693, we
   visualized TF expression levels across glioma samples categorized by
   WHO tumor grade (from grade II to grade IV). These findings revealed
   that most TFs presented higher expression levels in more aggressive,
   higher-grade tumors (Fig. [82]2A). NMF was used to stratify glioma
   patients into two distinct clusters, C1 and C2. Visualization of the TF
   expression patterns in these two clusters confirmed that the two
   clusters represent two distinct subtypes of gliomas on the basis of TF
   activity (Fig. [83]2B–F). Patients in the C2 cluster had significantly
   better overall survival than those in the C1 cluster did (Fig. [84]2F).

Fig. 2.

   Construction and clinical exploration of transcription factor-based
   molecular classification of gliomas. (A) Heatmap of transcription
   factor expression across WHO grades; (B, C) Derivation of 2–10 subtypes
   via non-negative matrix factorization (NMF) and NMF rank survey; (D,
   E) Subtype localization and heatmap visualization of transcription
   factor expression; (F) Survival difference analysis among subtypes
   reveals poorer prognosis in C1 subtype; (G) Distribution ratio of
   subtypes across WHO grades; (H) Sankey diagram visualizing subtype-WHO
   grade associations; (I) Differential expression analysis of
   transcription factors in WHO grades identifies 208 differentially
   expressed transcription factors.

   Analysis of clinical traits between the two clusters revealed
   significant differences in primary/recurrent/secondary (PRS) type, tumor
   grade, isocitrate dehydrogenase (IDH) gene mutation status, 1p/19q
   codeletion status, and O^6-methylguanine-DNA methyltransferase (MGMT)
   promoter methylation status, but no significant differences according
   to sex. In
   particular, the C1 cluster had a greater proportion of patients with
   WHO grade IV gliomas (54%) than did the C2 cluster (19%), whereas the
   C2 subgroup had more patients with lower-grade (WHO grades II and III)
   gliomas (WHO grade II: C1 = 6%, C2 = 55%; WHO grade III: C1 = 18%,
   C2 = 53%), with all differences reaching statistical significance
   (p < 0.01) (Fig. [87]2G). Additionally, most patients in the C1 cluster
   had higher-grade gliomas (Fig. [88]2H). Differential analysis of TF
   expression across gliomas of different WHO grades revealed that 208 TFs
   were significantly differentially expressed between gliomas of varying
   grades (Fig. [89]2I).

Protein interaction and machine learning analyses identified prognostic genes

   PPI analysis of TFs with varying expression levels across glioma grades
   revealed interactions between key proteins (Fig. [90]3A,B). Among the
   machine learning strategies applied, RSF, Lasso + RSF, and
   CoxBoost + RSF yielded the best predictive performance for identifying
   prognostic genes (Fig. [91]3C), with RSF ranking first at a mean
   C-index of 0.791 (maximum: 1). From the RSF-derived gene
   set, several genes, including HES1, IRF1, and TCF3, were found to have
   the highest expression levels in high-grade gliomas (Fig. [92]3D).
   Further analysis via the SVM algorithm revealed distinct gene
   expression patterns in the two glioma subgroups. Several key genes
   exhibited differential expression between the subgroups
   (Fig. [93]4A–C), and prognostic evaluation revealed significant
   associations between gene expression and patient outcomes (p < 0.05)
   (Fig. [94]4D).

Fig. 3.

   Screening of prognostic differentially expressed transcription factors
   and protein–protein interaction analysis using 101 machine learning
   algorithms. (A, B) Visualization of protein–protein interaction
   networks for differentially expressed transcription factors; (C)
   Prognostic gene selection via 101 machine learning model ensemble
   strategies; (D) Heatmap of prognostic genes identified by random
   survival forest (RSF) algorithm in glioblastoma.

Fig. 4.

   Screening of subtype-specific signature genes for transcription
   factor-based glioma classification. (A–C) Signature gene selection for
   glioma transcription factor subtypes using support vector machine
   (SVM); (D) Univariate analysis results of subtype-specific signature
   genes.

Differential expression of key TFs in glioma

   PPI network analysis of the proteins encoded by the 38 identified genes
   revealed the top-ranked genes (Fig. [99]5A). The top five genes—EZH2,
   TWIST1, EGR1, FOSL2, and TCF3—were identified as biomarkers of poor
   prognosis in glioma patients and are crucial for the molecular
   subtyping of gliomas. Evaluation of the mRNA expression levels of these
   five genes across gliomas of different WHO grades revealed a
   progressive increase in expression with increasing tumor grade
   (Fig. [100]5B). When the two glioma subgroups were compared, these
   genes were more highly expressed in the C1 subgroup, which is
   associated with poor prognosis, than in the C2 subgroup
   (Fig. [101]5C–G). Elevated expression of these five genes was
   correlated with worse patient outcomes (Fig. [102]5H–L).

Fig. 5.

   Subtype-specific signature gene screening results. (A) Protein–protein
   interaction analysis of subtype-specific signature genes; (B) Core gene
   expression profiles across WHO grades indicate highest expression in
   high-grade gliomas; (C–G) Upregulation of signature genes in
   poor-prognosis subtype (C1); (H–L) Prognostic value analysis of
   signature genes in gliomas shows poor survival correlates with high
   expression.

TME and gene expression in glioma subtypes

   The TME is the cellular environment in which malignant tumor cells
   interact with immune and stromal cells, influencing tumor progression.
   Using the ESTIMATE algorithm, we evaluated the TME status of the two
   TF-based subtypes of gliomas. The C1 subgroup was found to have the
   highest immune, stromal, and ESTIMATE scores, suggesting a greater
   abundance of immune and stromal components (Fig. [105]6A). We assessed
   the gene sets related to immune function and pathways (Fig. [106]6A),
   as well as the KEGG_C2 pathway (Fig. [107]6B), across the subgroups. In
   the hallmark gene set, pathways such as
   EPITHELIAL_MESENCHYMAL_TRANSITION, ANGIOGENESIS, HYPOXIA, and
   P53_PATHWAY presented increased activity, implying a link between these
   pathways and the malignant behavior of the C1 subgroup. Similar results
   were also observed in the immune and KEGG_C2 gene sets. Increased
   scores for CELL_ADHESION_MOLECULES_CAMS,
   AMINO_SUGAR_AND_NUCLEOTIDE_SUGAR_METABOLISM, and P53_SIGNALING_PATHWAY
   were also observed in the C1 subgroup, further highlighting its
   malignant features. Additionally, we explored TF gene
   expression profiles between subgroups, identifying increased expression
   of genes such as EZH2, TWIST1, EGR1, FOSL2, and TCF3 in the C1 subgroup
   (Fig. [108]6C,D).

Fig. 6.

   Immune landscape, functional enrichment, and transcription factor
   distribution across glioma subtypes. (A) Differences in tumor
   microenvironment, immune gene expression, and immune function scores
   across subtypes; (B) KEGG pathway enrichment visualization reveals
   elevated tumor-related pathways in C1 subtype; (C–D) Heatmaps of core
   transcription factor expression across subtypes.

Pan-cancer analysis of TCF3 and its implications in glioma

   Receiver operating characteristic (ROC) analysis of the five genes
   showed that TCF3 achieved the highest area under the curve (AUC)
   value (0.735), compared with EZH2 (0.618), TWIST1 (0.667),
   EGR1 (0.594), and FOSL2 (0.494) (Fig. [111]7A). Cross-referencing data
   from the Human Protein Atlas (HPA) database
   ([112]https://www.proteinatlas.org) and Gene Expression Profiling
   Interactive Analysis (GEPIA) database^[113]47 revealed elevated TCF3
   expression levels across multiple malignancies (Fig. [114]7B,C).
   Notably, TCF3 expression was markedly upregulated in glioblastoma (GBM)
   and low-grade glioma (LGG) tissues relative to normal brain
   counterparts (Fig. [115]7D), suggesting a potential oncogenic role in
   gliomagenesis or progression. Elevated TCF3 expression persisted in
   brain cancer cell lines including LN229 and U251MG (Fig. [116]7E).
   Compared to normal human astrocytes (HA), glioma cell lines exhibited
   consistent overexpression of TCF3 (Fig. [117]7F), a finding
   corroborated by immunohistochemical staining demonstrating intense TCF3
   positivity in tumor specimens (Fig. [118]7G,H). While minor
   inter-subtype variations in TCF3 expression were observed across glioma
   classifications (Fig. [119]7I), our RT-qPCR analysis of clinical glioma
   samples revealed a progressive increase in TCF3 expression correlating
   with ascending WHO grades (Fig. [120]7J). Elevated TCF3 expression was
   uniformly associated with adverse clinical outcomes (Fig. [121]7K–N),
   including poorer survival in GBM (Fig. [122]7L), WHO grade II
   (Fig. [123]7M), and WHO grade III (Fig. [124]7N) cohorts. Furthermore,
   among the four molecular subtypes of diffuse glioma, TCF3 demonstrated
   preferential overexpression in the prognostically unfavorable classical
   and mesenchymal subtypes (Fig. [125]7O,P).
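
   The AUC values compared in the ROC analysis above can be understood
   through a minimal Python sketch that uses the rank interpretation of
   the AUC: the probability that a randomly chosen positive sample has a
   higher expression value than a randomly chosen negative one, with
   ties counted as one half. The labels and expression values are
   hypothetical; the text does not state which outcome the ROC curves
   classify, so none is assumed here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for lab, s in zip(labels, scores) if lab == 1]
    negatives = [s for lab, s in zip(labels, scores) if lab == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical binary outcome and gene-expression scores for six samples.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

   An AUC near 0.5, as reported for FOSL2, indicates discrimination
   close to chance, whereas values further above 0.5 indicate better
   separation of the two classes.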

Fig. 7.

   TCF3 pan-cancer analysis reveals tumor cachexia characteristics. (A)
   ROC curve indicates the highest AUC for TCF3; (B, C) TCF3 expression
   profiles across pan-cancer types; (D) Higher TCF3 expression in
   glioblastoma (GBM) and low-grade glioma (LGG) compared to normal
   tissues; (E) TCF3 expression abundance in brain cancer cell lines; (F)
   Elevated TCF3 expression in tumor cells vs. astrocytes; (G, H)
   Immunohistochemical staining confirms TCF3 overexpression in tumors;
   (I) TCF3 expression across pathological grades; (J, K) TCF3 expression
   correlates with WHO grade progression and poor prognosis; (L) High TCF3
   expression associates with worse survival in WHO IV gliomas; (M) High
   TCF3 expression correlates with poor prognosis in WHO II gliomas; (N)
   High TCF3 expression correlates with poor prognosis in WHO III gliomas;
   (O, P) TCF3 overexpression in poor-prognosis classical and mesenchymal
   molecular subtypes.

TCF3 as a potential biomarker for glioma classification and prognosis

   To elucidate the molecular characteristics of transcription factor
   subtypes in glioma, we conducted differential expression analysis and
   functional enrichment studies. Differential analysis revealed
   significant upregulation of multiple prognosis-related genes, including
   TCF3, in the C1 subtype (Fig. [128]8A). Further stratification of
   samples into high-expression (> median) and low-expression (< median)
   groups based on TCF3 levels revealed significant enrichment of
   differentially expressed genes in Wnt, FOXO, and Hippo signaling
   pathways through KEGG pathway enrichment analysis (Table [129]S3,
   Fig. [130]8B). It should be particularly noted that this result only
   indicates a statistical association between TCF3 expression levels and
   these pathways. The specific regulatory direction (e.g., whether TCF3
   activates the pathways or is subject to feedback regulation by the
   pathways) remains to be validated through functional experiments.
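
   The median-based stratification described above can be sketched as
   follows. The sample identifiers and expression values are
   hypothetical. Under the strict definitions given (> median and
   < median), a sample whose value equals the median falls into neither
   group; the sketch keeps that behaviour explicit rather than assigning
   such samples arbitrarily.

```python
from statistics import median


def split_by_median(expression):
    """Split samples into high (> median) and low (< median) expression groups.

    expression: dict mapping sample ID to expression value.
    Returns (high_ids, low_ids, unassigned_ids, cutoff).
    """
    cutoff = median(expression.values())
    high = sorted(s for s, v in expression.items() if v > cutoff)
    low = sorted(s for s, v in expression.items() if v < cutoff)
    unassigned = sorted(s for s, v in expression.items() if v == cutoff)
    return high, low, unassigned, cutoff


# Hypothetical TCF3 expression values for five samples.
tcf3 = {"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 4.0, "s5": 5.0}
high, low, unassigned, cutoff = split_by_median(tcf3)
print(high, low, unassigned, cutoff)
```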

Fig. 8.

   Functional analysis of TCF3 in gliomas. (A) Upregulation of oncogenes
   including TCF3 in poor-prognosis subtypes; (C, D) Functional
   consistency between TCF3 analysis and subtype analysis (GO/KEGG
   enrichment); (E–H) Weighted gene co-expression network analysis (WGCNA)
   of glioma transcription factor subtypes; (I–K) Functional consistency
   validation between TCF3 and subtype-specific pathways.

   Functional annotation of biological processes (BP), cellular components
   (CC), and molecular functions (MF) associated with glioma subtype genes
   was performed using Gene Ontology (GO) analysis (Table [133]S4,
   Fig. [134]8C,D). At the biological process level, TCF3-related genes
   were enriched in fundamental cellular functions such as gene expression
   regulation, nucleotide metabolism, and GTPase activity, suggesting
   potential functional links between TCF3 and these processes. However,
   direct regulatory effects on cell growth and survival require
   validation through in vitro experiments. At the cellular component
   level, involvement of TCF3-related genes in cell–matrix adhesions and
   cell junctions implies a potential role in shaping the tumor
   microenvironment through cell–cell interactions, though specific
   mechanisms governing cell migration or tissue formation require further
   elucidation. At the molecular function level, enrichment of
   ubiquitin-protein ligase binding and guanine nucleotide exchange factor
   activity suggests TCF3 may mediate protein modification or signal
   transduction processes, with its precise molecular mechanisms
   warranting exploration. Weighted Gene Co-expression Network Analysis
   (WGCNA) identified a gene module strongly correlated with the C1
   subtype (Fig. [135]8E–H), with the brown2 module exhibiting
   significant positive correlation (cor = 0.62, p = 4 × 10^−102).
   Intersection analysis of this module with glioma-associated
   differentially expressed genes (DEGs) yielded 523 overlapping genes.
   KEGG pathway and Gene Set Enrichment Analysis (GSEA) demonstrated
   co-expression patterns of these genes in previously reported
   tumor-related pathways (Fig. [136]8J,K). However, it must be emphasized
   that this finding reflects only expression-level associations between
   TCF3 and specific pathways, with functional relevance requiring
   verification through genetic perturbation experiments (e.g.,
   knockdown/overexpression). Nevertheless, the functional consistency
   between TCF3 expression and glioma transcriptional subtypes supports
   its utility as a marker for glioma transcriptomic subtyping.
   Furthermore, the significant expression-level association between TCF3
   and key glioma signaling pathways/cellular processes provides tentative
   evidence for its role as a molecular classifier and prognostic
   biomarker in glioma.

Mutation characteristics and clinical implications of TCF3

   Data from TCGA revealed significant differences in tumor mutation
   burden (TMB) across glioma subtypes. Tumors with high TCF3 expression
   (TCF3-H) had increased TMB (Fig. [137]9A). Analysis of patient
   prognosis revealed that gliomas with both high TCF3 expression and high
   TMB (H-TMB + TCF3-H) had the worst prognosis, whereas those with low
   TCF3 expression and low TMB (L-TMB + TCF3-L) had the best prognosis
   (Fig. [138]9B). Among the top 25 genes most frequently mutated in
   gliomas, certain genes, such as IDH1, TP53, and ATRX, had frequent
   mutations across both glioma subtypes. In TCF3-H tumors, TP53 (49%) and
   ATRX (34%) mutations were frequent, with additional low-frequency
   mutations in genes such as NF1 (1%) and PTEN (10%) (Fig. [139]9C–D).
   These findings highlight the genetic complexity of TCF3-H tumors and
   their association with glioma progression. The presence of multiple
   mutations, particularly in TP53 and ATRX, may contribute to the
   aggressive nature of gliomas with elevated TCF3 expression, reinforcing
   the link between high TCF3 levels and poor prognosis. Furthermore,
   across molecular subtypes of glioma, the TCF3-high cohort was
   significantly overrepresented in prognostically unfavorable subgroups,
   namely the classical subtype and the MGMT promoter-methylated subgroup
   (263:67), indirectly underscoring the strong correlation between
   elevated TCF3 expression and adverse clinical outcomes in glioma
   (Fig. [140]9E,F).
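
   The four-group stratification and survival comparison described above
   can be sketched as follows. This is a minimal illustration, not the
   study's pipeline: the median cutoffs for TMB and TCF3 expression are an
   assumption (the text does not state the thresholds), the field names
   are hypothetical, and the patient records are toy values.

```python
from statistics import median


def stratify(patients):
    """Split patients into four TMB x TCF3 groups using median cutoffs (assumed)."""
    tmb_cut = median(p["tmb"] for p in patients)
    expr_cut = median(p["tcf3"] for p in patients)
    groups = {}
    for p in patients:
        tmb_label = "H-TMB" if p["tmb"] > tmb_cut else "L-TMB"
        expr_label = "TCF3-H" if p["tcf3"] > expr_cut else "TCF3-L"
        groups.setdefault(f"{tmb_label} + {expr_label}", []).append(p)
    return groups


def kaplan_meier(times, events):
    """Kaplan-Meier estimate: [(time, survival probability)] at each event time.

    events[i] is 1 if death was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every record that shares this time point
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


# Hypothetical records: id, TMB (mutations/Mb), TCF3 expression, months, event
rows = [
    ("P1", 3.0, 9.0, 6, 1), ("P2", 2.8, 8.8, 9, 1),
    ("P3", 2.5, 4.0, 20, 1), ("P4", 2.6, 3.9, 25, 0),
    ("P5", 0.5, 8.5, 18, 1), ("P6", 0.6, 8.6, 22, 0),
    ("P7", 0.4, 3.5, 60, 0), ("P8", 0.3, 3.4, 48, 1),
]
keys = ("id", "tmb", "tcf3", "months", "event")
patients = [dict(zip(keys, r)) for r in rows]

groups = stratify(patients)
for name in ("H-TMB + TCF3-H", "L-TMB + TCF3-L"):
    members = groups[name]
    curve = kaplan_meier([p["months"] for p in members],
                         [p["event"] for p in members])
    print(name, curve)
```

   In practice the curves for the four groups would be compared with a
   log-rank test; the sketch stops at the survival estimates to stay
   short.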

Fig. 9.

   Mutational landscape of TCF3 in gliomas. (A) Association between TCF3
   expression and tumor mutation burden (TMB); (B) Survival differences
   among groups defined by TMB and TCF3 expression (high/low); (C, D)
   Mutational profiles of
   TCF3-high (C) and TCF3-low (D) tumors; (E) Higher prevalence of
   TCF3-high in classical/mesenchymal subtypes with poor prognosis; (F)
   Enrichment of TCF3-high in MGMT-methylated subgroups with worse
   outcomes; (G) Poorer prognosis in MGMT-methylated vs. non-methylated
   gliomas; (H) Worse survival in classical/mesenchymal vs. proneural
   gliomas.

Discussion

   Glioma is the most prevalent primary malignant tumor in the
   intracranial cavity and is characterized by poor prognosis, high rates
   of disability, and a propensity for recurrence. This disease imposes a
   significant burden on patients, their families, and society at large.
   Glioblastoma, the most aggressive type of glioma, has an extremely poor
   prognosis, with median survival typically less than 15 months even
   after standard treatment with maximal safe resection, temozolomide
   chemotherapy, and radiotherapy. Gliomas differ from other
   tumors because their cells are especially invasive, infiltrating normal
   brain tissue early. This aggressiveness means that even after complete
   surgical removal of the tumor^[143]48–[144]50, recurrence is common at
   the surgical margins^[145]29–[146]31.

   TFs play critical roles in tumorigenesis and tumor progression by
   regulating gene transcription, influencing processes such as
   proliferation, apoptosis, invasion, metastasis, and angiogenesis. The
   E26 transformation-specific (ETS) family of TFs, one of the largest
   groups of transcription regulators, drives oncogenesis through
   chromosomal translocations, abnormal gene expression, and disrupted
   signaling pathways. ETS family members also activate genes that promote
   tumor invasion and metastasis^[147]32,[148]33,[149]51. YY1, another key
   TF, has dual roles
   as both a gene activator and repressor. The overexpression of YY1 is
   strongly linked to tumor initiation and metastasis, and its involvement
   in critical pathways leads to poor clinical outcomes^[150]18.
   Similarly, BTF3 is highly expressed in multiple cancers and promotes
   tumor growth by interacting with genes such as TP53. Silencing BTF3 in
   melanoma has been shown to inhibit tumor growth and induce apoptosis,
   underscoring its role in tumor progression^[151]52. The intricate
   regulatory networks governed by TFs are fundamental to the biological
   behavior of gliomas and other malignancies. These networks not only
   drive tumorigenesis but also have a significant effect on clinical
   outcomes.

   In this study, the functions of 795 TFs in glioma were investigated,
   and two distinct prognostic subgroups, C1 and C2, were identified via
   NMF on the basis of their biological characteristics and patient
   outcomes. These subgroups were differentiated by several clinical and
   molecular characteristics, such as primary/recurrent/secondary (PRS) type,
   tumor grade, IDH gene mutation status, 1p/19q codeletion status, and
   MGMT gene promoter methylation status, emphasizing the biological
   complexity and heterogeneity of glioma. These two subgroups also
   differed in the proportion of gliomas with varying WHO grades. C1 had a
   greater proportion of patients with WHO grade IV gliomas (54%) than did
   C2 (19%), whereas C2 had a greater proportion of patients with
   lower-grade gliomas (WHO grade II: C1: 6% vs. C2: 55%; WHO grade III:
   C1: 18% vs. C2: 53%). These differences were statistically significant
   (p < 0.01). Additionally, most patients in the C1 subgroup had
   higher-grade gliomas.
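
   A comparison of grade distributions between two subgroups, as described
   above, is commonly made with a Pearson chi-square test on a contingency
   table. The sketch below illustrates that calculation under stated
   assumptions: the text does not name the test that was used, and the
   counts are hypothetical rather than taken from the study cohorts.

```python
from math import exp


def chi_square(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows = C1, C2; columns = WHO grade II, III, IV
table = [
    [10, 30, 60],
    [55, 30, 15],
]
stat = chi_square(table)

# A 2 x 3 table has (2 - 1) * (3 - 1) = 2 degrees of freedom, and the
# chi-square survival function with df = 2 has the closed form exp(-x / 2).
p_value = exp(-stat / 2)
print(f"chi-square = {stat:.2f}, df = 2, p = {p_value:.2e}")
```

   The closed-form p-value applies only when there are exactly two degrees
   of freedom; tables of other shapes need a general chi-square
   distribution function from a statistics library.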

   Further analysis revealed the biological pathways activated in each
   subgroup. In the hallmark gene set, pathways such as
   EPITHELIAL_MESENCHYMAL_TRANSITION, ANGIOGENESIS, HYPOXIA, and
   P53_PATHWAY were more active in the C1 subgroup, suggesting that these
   processes contribute to the aggressive nature of C1 tumors. Similar
   patterns were found in immune-related and KEGG gene sets, particularly
   in the activation of pathways related to CELL_ADHESION_MOLECULES_CAMS,
   AMINO_SUGAR_AND_NUCLEOTIDE_SUGAR_METABOLISM, and P53_SIGNALING,
   indicating more aggressive biological behavior in the C1 subgroup.

   To further understand the biological significance of these differences,
   a range of advanced analysis techniques, such as differential gene
   expression analysis, WGCNA, survival analysis, and machine learning
   methods, were used. These analyses revealed that TCF3 is a central
   regulator. Building on the pivotal roles of TFs in tumorigenesis,
   cancer cell metabolic reprogramming, and stem cell
   regulation^[152]10,[153]15–[154]17,[155]29,[156]30, significant
   progress has been achieved in glioma research through targeting
   dysregulated TFs using nanomedicine as a potential therapeutic
   strategy^[157]14. Studies have demonstrated that TCF3 is overexpressed
   in human gliomas, and its inhibition leads to suppressed tumor growth
   both in vitro and in vivo, accompanied by reduced proliferation and
   migration of glioma cells^[158]28,[159]31. TCF3 was significantly
   differentially expressed between the two glioma subgroups. This finding
   was confirmed through further analysis, which revealed that TCF3 is
   closely associated with the WHO classification of glioma and can serve
   as a marker to distinguish between different glioma subgroups. The role
   of the TCF3 gene in glioma was further confirmed via mutation analysis.
   The TCF3-H subgroup presented a greater TMB, with significant mutations
   in genes such as TP53 (49%) and ATRX (34%). Additionally, several
   low-frequency mutations were detected in genes such as NF1 and PTEN,
   suggesting that different signaling pathways are involved in the TCF3-H
   group. This variety of mutations points to a more complex genetic
   makeup and greater tumor heterogeneity in the TCF3-H subgroup, which
   may contribute to the poor prognosis in this subgroup.

   In summary, our study developed a novel molecular classification system
   based on TF-related genes, with a particular emphasis on the role of
   TCF3 in the molecular subtyping of glioma and prognosis prediction.
   Although our comprehensive analyses and RT-qPCR validation have
   identified TCF3 as a pivotal biomarker within the transcription factor
   subtype landscape of glioma, we acknowledge that the functional
   validation of its mechanistic role remains to be fully explored.
   Specifically, in vitro and in vivo model experiments are warranted to
   dissect the precise biological functions of TCF3 as a transcription
   factor in glioma. Moreover, a systematic investigation into the
   interplay between TCF3 and classical oncogenic pathways is imperative
   to elucidate the specific mechanisms underlying TCF3’s actions in
   glioma. By integrating multiple TFs, we identified two distinct
   prognostic subgroups with significant differences in clinical and
   molecular characteristics. TCF3 has emerged as a critical marker for
   distinguishing these subgroups, as it is closely associated with tumor
   grade, mutation burden, and key pathways. This classification system
   enhances the accuracy of survival prediction for glioma patients,
   reflecting the biological heterogeneity of the disease. Furthermore,
   this study provides a rich dataset that can be leveraged to explore TFs
   as potential therapeutic targets and guide the development of
   personalized glioma treatments.

Electronic supplementary material

   Below is the link to the electronic supplementary material.
   [160]Supplementary Material 1 (2 MB, xlsx)

Author contributions

   All authors made significant contributions to the reported work. Qiao
   Li and Peng Feng contributed to the conception, research design,
   execution, data collection, and analysis of the paper; Shangyu Liu and
   Goupeng Tian participated in drafting, revising, or critically
   reviewing the article; and the corresponding authors, Yawen Pan and
   Guoqiang Yuan, supervised the work as a whole.

Data availability

   Publicly available datasets were analyzed in this study. These data can
   be found in the Chinese Glioma Genome Atlas (CGGA) database
   ([161]http://www.cgga.org.cn) and The Cancer Genome Atlas (TCGA)
   database ([162]https://portal.gdc.cancer.gov/).

Declarations

Competing interests

   The authors declare no competing interests.

Ethical approval and consent to participate

   All patients involved in this study provided written informed consent,
   and the study was approved by the Medical Ethics Committee of the
   Second Hospital of Lanzhou University (Approval No. 2022 A-515). We
   adhere to the journal’s publication ethics policies and ensure that all
   research published complies with the journal’s stringent technical and
   ethical standards.

Consent for publication

   All authors approved the final version to be published, agreed on the
   journal to which the article was submitted, and agreed to take
   responsibility for all aspects of the work.

Footnotes

   Publisher’s note

   Springer Nature remains neutral with regard to jurisdictional claims in
   published maps and institutional affiliations.

   Qiao Li, Peng Feng and Shangyu Liu contributed equally to this work.

Contributor Information

   Guoqiang Yuan, Email: yuangq08@lzu.edu.cn.

   Yawen Pan, Email: pyw@lzu.edu.cn.

References