Abstract

We aimed to predict CD44 expression and assess its prognostic significance in patients with high-grade gliomas (HGG) using non-invasive, machine learning-based radiomics models. Contrast-enhanced magnetic resonance imaging, along with the corresponding gene expression and clinicopathological data, was downloaded from online databases. Kaplan–Meier survival curves, univariate and multivariate Cox analyses, and time-dependent receiver operating characteristic curves were used to assess the prognostic value of CD44. After radiomic features were screened using repeated least absolute shrinkage and selection operator regression, two radiomics models were constructed using logistic regression and support vector machine for validation purposes. The results indicated that CD44 protein levels were higher in HGG than in normal brain tissues, and CD44 expression emerged as an independent biomarker of diminished overall survival (OS) in patients with HGG. Moreover, two predictive models based on seven radiomic features were built to predict CD44 expression levels in HGG, achieving areas under the curve (AUCs) of 0.806 (logistic regression) and 0.809 (support vector machine). Calibration curves and decision curve analysis supported the goodness of fit and clinical utility of the models. Notably, patients with high radiomic scores presented worse OS (p < 0.001). In summary, our results indicate that the radiomics models can distinguish CD44 expression levels and stratify OS in patients with HGG.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-90128-7.

Keywords: Radiomics, CD44, High-grade glioma, Overall survival, Machine learning

Subject terms: Immunology, Biomarkers, Oncology, Cancer, CNS cancer

Introduction

Glioma is the most common primary malignancy of the central nervous system (CNS)^1. Despite advances in surgery, chemotherapy, radiotherapy, and targeted therapy for glioma in recent decades, it remains a highly lethal disease because treatment efficacy is limited.
Gliomas are categorized into grades I to IV based on their histologic and molecular features^2. Generally, grade I and II gliomas are categorized as low-grade gliomas (LGG) and have a relatively favorable prognosis. In contrast, grade III and IV gliomas are classified as high-grade gliomas (HGG), which are highly malignant, aggressive, and associated with a poor prognosis. According to the fifth edition of the WHO Classification of Tumors of the CNS, the mutational status of isocitrate dehydrogenase (IDH) is one of the most crucial biomarkers for gliomas, as it helps distinguish glioblastoma (GBM) from both astrocytoma and oligodendroglioma^3. Currently, the prognostic indicators of glioma include clinicopathologic features, IDH status, O^6-methylguanine-DNA methyltransferase (MGMT) promoter status, chromosome (Chr) 1p19q codeletion, and radiomics data from computed tomography (CT) and magnetic resonance imaging (MRI). However, these indicators are insufficient for precision medicine because of the heterogeneity of gliomas. Consequently, there is an urgent need to investigate novel predictors to improve prognostic stratification and personalize medical decision-making.

The CD44 antigen (CD44) is a cell-surface glycoprotein that functions as a receptor for extracellular matrix components such as hyaluronan, collagen, and matrix metalloproteinases^4,5. It participates in various biological functions, including cell adhesion, migration, lymphocyte activation, cell-cell interactions, hematopoiesis, and inflammation^6–9. A prior study found that high CD44 expression is linked to diminished survival rates in patients with grade II and III glioma, indicating its potential as a prognostic marker^10. Several compounds targeting CD44 have yielded promising antitumor effects^11–14. For instance, a phase II clinical study highlighted the efficacy and safety of Angstrom6 in ovarian cancer through modulation of CD44-mediated signaling^11.
Nevertheless, detection of CD44 in blood is inconvenient, expensive, and subject to the spatiotemporal heterogeneity of the tumor. Moreover, detection of CD44 in tissue samples is invasive, costly, and influenced by factors such as antibody specificity and operator variability. Consequently, an efficient, non-invasive, and repeatable method to predict CD44 expression in patients with HGG is urgently needed.

Given its low cost, non-invasiveness, and effectiveness, MRI is routinely performed in patients with glioma. Radiomics offers a method for quantitatively, dynamically, and non-invasively reporting tumor characteristics by analyzing a vast array of advanced imaging features^15. Machine learning (ML) has been extensively applied in clinical diagnosis and precision medicine, especially in the fields of radiology and oncology. A growing number of studies have indicated the potential of non-invasive evaluation via radiomics for early diagnosis and molecular typing of HGG, as well as for predicting the tumor microenvironment (TME) and heterogeneity^16–18. However, there have been no published studies on predicting CD44 expression in patients with HGG using radiomics.

In the present study, we aimed to build a non-invasive, ML-based radiomics model to predict CD44 expression in patients with HGG, using data from The Cancer Genome Atlas (TCGA) and The Cancer Imaging Archive (TCIA) databases. Additionally, we explored the potential relationship between CD44 and the TME. Such a model could serve as a valuable predictive tool to enhance clinical decision-making and precision medicine.

Materials and methods

Data collection

MRI images, along with the corresponding gene expression and clinicopathological data of patients with HGG, were downloaded from the TCIA (https://www.cancerimagingarchive.net/) and TCGA (https://portal.gdc.cancer.gov/) databases.
The inclusion criteria for this study were: (1) newly diagnosed HGG; (2) pathologically confirmed grade III or IV glioma; (3) availability of axial contrast-enhanced T1WI (T1WI + C) images of diagnostic quality. The exclusion criteria were: (1) absence of gene expression data, clinical information, and survival data; (2) survival time of less than 30 days; and (3) low image quality. A schematic of the criteria is shown in Supplementary Figure S1. For each patient, we collected the following variables: age (< 60 or ≥ 60 years), gender (female or male), grade (III or IV), IDH status (wild-type or mutated), Chr 1p19q codeletion status (codeletion or non-codeletion), MGMT promoter methylation status (unmethylated/unknown or methylated), chemotherapy (no or yes), radiotherapy (no or yes), expression level of CD44, survival time, and survival outcome (alive or dead). The requirement for ethical approval was waived.

Expression analysis of CD44

The RNAseq data (TPM) from TCGA and Genotype-Tissue Expression (GTEx), processed by the TOIL project^19, were accessed from UCSC Xena (https://xenabrowser.net/datapages/). All patients were divided into CD44^high and CD44^low groups based on the cutoff expression level of CD44 determined by the survminer R package. Differential CD44 expression analysis was performed between HGG samples and normal brain tissue samples (NBTs) after log2 transformation of the RNAseq data.

Immunohistochemistry (IHC)

We obtained a total of 4 HGG tissue samples and 4 NBTs from patients with epilepsy and brain injury in the Department of Pathology and Neurosurgery, Shengjing Hospital of China Medical University. After fixation in 10% formalin, the samples were paraffin-embedded and sectioned. Subsequently, IHC staining with an anti-CD44 antibody (Proteintech, # 60224-1-Ig, dilution 1:200) was performed to assess CD44 expression.
Informed consent was obtained from all patients, and the study was approved by the Ethics Committee of Shengjing Hospital (No. 2019PS105K).

Survival and enrichment analysis of CD44

OS was assessed and plotted using Kaplan–Meier survival curves. Univariate and multivariate Cox regression analyses were employed to evaluate the influence of various variables on OS. Moreover, we conducted subgroup analyses and interaction tests to evaluate the effects of CD44. Time-dependent receiver operating characteristic (ROC) curves were generated to estimate the accuracy and discrimination of CD44 in predicting OS at 12, 24, and 36 months of follow-up. To determine the correlations between CD44 and clinical features, we calculated Spearman rank coefficients. We applied CIBERSORTx (https://cibersortx.stanford.edu/) to deconvolve the expression matrix of HGG samples, thereby obtaining the distribution of 20 distinct immune cell types in the HGG tissue of each sample^20. We used the Wilcoxon test to compare immune cell abundances between the CD44^high and CD44^low groups with the R package limma^21. To identify the functions and relevant pathways that differ between the CD44^high and CD44^low groups, we conducted gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses on differentially expressed genes (DEGs) using the R package clusterProfiler with a q value < 0.05^22.

Development of radiomics models to predict CD44 expression level in patients with HGG

In total, we obtained 82 samples by intersecting the TCGA bioinformatics data with the TCIA image data. The imaging parameters were as follows: slice thickness, 0.9–6.0 mm; pixel spacing, 0.4–1.1 mm; spacing between slices, 1.0–7.5 mm; repetition time, 6.6–3252.8 ms; echo time, 2.75–20.0 ms.
We applied N4 bias field correction to the MR images to reduce signal inhomogeneity caused by magnetic field variations, which can make regions appear too bright or too dark or cause inconsistent brightness within the same tissue. Images were resampled to isotropic voxels, minimizing variations due to differences in scanning equipment and protocols, as well as lesion size. Image normalization was also applied to reduce noise and standardize intensity, thereby minimizing differences in image signal intensity across different machines. Patients were divided into CD44^high and CD44^low groups based on the cutoff expression level of CD44 determined by the survminer R package. Two experienced radiologists, blinded to the clinical data, manually delineated the entire tumor as the volume of interest (VOI) on the contrast-enhanced T1-weighted images (T1WI)^23 using 3D Slicer version 4.10.2. One radiologist completed the VOI delineation for all cases, while the second radiologist independently delineated 20 randomly selected samples. A total of 107 radiomic features were extracted by PyRadiomics and subsequently normalized. The intraclass correlation coefficient (ICC) was used to evaluate the reliability of the radiomic features between the two radiologists, and features with ICC ≥ 0.75 were chosen for subsequent analyses. Repeated least absolute shrinkage and selection operator (LASSO) regression was carried out to further screen the radiomic features using the R package glmnet^24. Features that were selected more than 600 times across the 1000 screening runs were included in the final model. After feature extraction, Z-score normalization was applied. Logistic regression (LR) and support vector machine (SVM) were used to predict the expression of CD44. The diagnostic discrimination of the radiomics models was assessed by plotting ROC and precision-recall (PR) curves.
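The Z-score normalization step mentioned above can be illustrated with a minimal sketch. It standardizes one feature column to zero mean and unit variance. The function name `z_score`, the use of the population standard deviation, and the five example values are assumptions made for illustration only; the source does not state these details and the values are not data from the study.

```python
from statistics import mean, pstdev


def z_score(values):
    """Standardize one feature column to zero mean and unit variance."""
    mu = mean(values)
    # Population SD is an assumption; the source does not say which SD was used.
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical sphericity values for five patients (not data from the study).
sphericity = [0.52, 0.61, 0.58, 0.70, 0.49]
scaled = z_score(sphericity)
```

Standardizing each feature this way puts shape, first-order, and texture features on a common scale, which matters for penalized regression and for SVM, both of which are sensitive to feature magnitude.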
The calibration curve was generated to evaluate the goodness of fit of the radiomics models. The Brier score was used to determine the overall performance of the radiomics models. The clinical benefit was determined by decision curve analysis (DCA). To quantify the CD44 levels predicted by the radiomics models, we calculated a radiomic score (RS). The Wilcoxon test was used to assess the difference in RS between the CD44^high and CD44^low groups. To determine the correlations among RS, CD44, and epithelial–mesenchymal transition (EMT) genes, we performed Spearman correlation analysis. We obtained 170 samples by intersecting TCGA clinical data and TCIA imaging data. The 170 patients were dichotomized into RS^high and RS^low groups based on the RS cutoff value derived from the SVM model and determined by the R package survminer. Log-rank tests were used for survival analysis.

Statistical analysis

Quantitative data were described as mean ± standard deviation. Categorical data were described as frequencies and proportions and compared at baseline using the χ^2 test. A two-sided p-value < 0.05 was considered statistically significant. The T1WI + C-clinical prediction model was visualized as a nomogram using the rms R package. All statistical analyses were performed with R software version 4.1.1.

Results

Patient characteristics

The flowchart of this study is presented in Fig. 1. A cohort of 310 HGG patients from the TCGA database was included in the survival analyses. Patients were separated into CD44^high (n = 155) and CD44^low (n = 155) groups based on the median expression of CD44 (5.043). The baseline characteristics of the patients are detailed in Supplemental Table 1.

Fig. 1. The brief flowchart of the study process.

Clinical significance of CD44

To elucidate the clinical relevance of CD44, we initially investigated its expression in HGG samples and NBTs. The results showed that CD44 expression was markedly elevated in HGG samples (p < 0.001, Fig.
2A). Furthermore, we conducted immunohistochemistry (IHC) experiments on HGG patient samples and NBTs, confirming that CD44 protein levels were elevated in HGG compared to NBTs (Fig. 2B). The median OS in the CD44^low group was 44.6 months, compared to just 16.2 months in the CD44^high group. Patients in the CD44^low group exhibited better OS than those in the CD44^high group (p < 0.001, Fig. 2C). Additionally, we analyzed the impact of various clinical characteristics on the OS of HGG patients. The results indicated that younger age (< 60), lower histologic grade, chemotherapy, IDH mutation, Chr 1p19q codeletion, and a methylated MGMT promoter were associated with longer OS (Supplementary Figure S2). The time-dependent ROC curve indicated that the area under the curve (AUC) of CD44 expression was 0.719 for 1-year OS prediction (Fig. 2D).

Fig. 2. CD44 expression was elevated and predicted poor OS in HGG. (A) CD44 was highly expressed in HGG tissues. (B) IHC analysis revealed that CD44 was upregulated in HGG compared to normal brain tissues. Images were captured at × 100. (C) Patients in the CD44^low group had better OS. (D) The ROC curves of CD44 in predicting OS at 12, 24, and 36 months of follow-up. (E) Cox univariate and multivariate regression analyses of the effects of clinicopathological variables on OS. (F) Subgroup analyses and interaction tests of the effects of CD44 expression level on OS. (G) The correlations between CD44 expression and clinical characteristics of HGG. CI, confidence interval; HGG, high-grade glioma; HR, hazard ratio; IDH, isocitrate dehydrogenase; IHC, immunohistochemistry; MGMT, O^6-methylguanine-DNA-methyltransferase; OS, overall survival; ROC, time-dependent receiver operating characteristic. *, p < 0.05; **, p < 0.01; ***, p < 0.001.
Further, the univariate Cox analysis revealed that, compared to low CD44 expression, high CD44 expression was significantly correlated with worse OS (HR = 3.025, 95% CI: 2.203–4.154, p < 0.001). As expected, younger age (< 60), histologic grade, chemotherapy, IDH mutation, Chr 1p19q codeletion, and a methylated MGMT promoter were likewise associated with OS. In the multivariate Cox analysis, high CD44 expression emerged as an independent factor for shorter OS compared to low CD44 expression (HR = 1.718, 95% CI: 1.189–2.484, p = 0.004, Fig. 2E). Moreover, low histologic grade, chemotherapy, and IDH mutation were identified as independent factors affecting OS in HGG patients. Subgroup analyses revealed that high CD44 expression was associated with unfavorable OS in most subgroups. Specifically, significant interaction effects of CD44 expression on OS were seen in the chemotherapy and radiotherapy subgroups (Fig. 2F). The heatmap of the correlations between CD44 expression and clinical characteristics revealed that CD44 expression was significantly and negatively related to IDH status, Chr 1p19q codeletion, and MGMT promoter status (all p < 0.001, Fig. 2G).

Immune infiltration and enrichment analysis

Subsequently, we examined the differences in the tumor immune microenvironment between the CD44^high and CD44^low groups. As shown in Fig. 3A, the violin plots revealed a higher infiltration of M2 macrophages, monocytes, and resting memory CD4 T cells in the CD44^high group compared with the CD44^low group (p < 0.001).

Fig. 3. The immune and functional analyses of DEGs between the CD44^high and CD44^low groups. (A) The correlations of CD44 expression and immune cell infiltration in HGG. (B) The GO analyses of DEGs between the CD44^high and CD44^low groups. (C) The KEGG analyses of DEGs between the CD44^high and CD44^low groups.
BP, biological process; CC, cell component; DEGs, differentially expressed genes; GO, gene ontology; HGG, high-grade glioma; KEGG, the Kyoto Encyclopedia of Genes and Genomes; MF, molecular function.

To further understand the biological functions and mechanisms of CD44 in HGG, we performed GO and KEGG analyses of the DEGs (listed in Supplemental Table 2) between the CD44^high and CD44^low groups. Synapse organization, focal adhesion, and transcription coregulator activity were significantly enriched in the GO analysis (p < 0.05, Fig. 3B). KEGG pathway enrichment analysis revealed several enriched pathways, including lysosome, Epstein-Barr virus infection, adherens junction, and human immunodeficiency virus 1 infection (p < 0.05, Fig. 3C).

Construction and evaluation of radiomics models

In total, 82 patients from the intersection of the TCGA and TCIA datasets were enrolled for radiomics analysis. Patients were separated into CD44^high and CD44^low groups based on the median expression of CD44 (5.043). The ICCs of the features extracted by PyRadiomics are listed in Supplemental Table 3. The median ICC was 0.938, and 93 features (86.9%) had an ICC ≥ 0.75 and were carried forward to feature selection. After repeated LASSO, seven features with frequencies > 600 remained in the model, including shape_sphericity, gray level size zone matrix (glszm)_gray level non uniformity normalized, and gray level run length matrix (glrlm)_long run high gray level emphasis, among others (Fig. 4).

Fig. 4. Selection of radiomic features by LASSO. (A) Plot of the LASSO coefficient profiles of the selected features. The vertical line marks the optimal λ value. (B) Selection of the tuning parameter (λ) in the LASSO model. The vertical line marks the optimal λ value. (C) Visualization of the top 20 radiomic features with the highest counts. The dashed line marks 600 counts. LASSO, least absolute shrinkage and selection operator.
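The frequency rule behind Fig. 4C (keep features selected more than 600 times across 1000 LASSO runs) can be sketched as follows. This is only an illustration of the counting and thresholding step, not the authors' R/glmnet pipeline. The function `stable_features`, the feature names, and the toy selection pattern are hypothetical and stand in for the non-zero coefficients that each real LASSO fit would return.

```python
from collections import Counter


def stable_features(runs, min_count):
    """Return (feature, count) pairs selected in more than `min_count` runs."""
    # Count each feature at most once per run.
    counts = Counter(name for selected in runs for name in set(selected))
    return [(name, n) for name, n in counts.most_common() if n > min_count]


# Toy stand-in for 1000 LASSO fits: each entry lists the features with a
# non-zero coefficient in that run (hypothetical names and frequencies).
runs = []
for i in range(1000):
    selected = []
    if i < 700:
        selected.append("feature_a")
    if i < 600:
        selected.append("feature_b")
    if i < 100:
        selected.append("feature_c")
    runs.append(selected)

kept = stable_features(runs, min_count=600)
```

In this toy run only `feature_a` (700 selections) passes, because the rule is strictly "more than 600": `feature_b`, selected exactly 600 times, is dropped.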
Subsequently, we constructed a radiomics model based on LR. The importance of the seven features is shown in Fig. 5A. The radiomics score (RS) was calculated using the formula:

RS = -0.01933 + 0.880206 × shape_sphericity + 0.072922 × glszm_gray level non uniformity normalized + 0.165961 × glrlm_long run high gray level emphasis - 1.05154 × neighborhood gray tone difference matrix (ngtdm)_coarseness - 0.5184 × glszm_large area low gray level emphasis - 0.44085 × shape_surface area + 0.483907 × first order mean.

The model demonstrated an accuracy of 0.768, with a sensitivity of 0.829, specificity of 0.707, positive predictive value (PPV) of 0.739, and negative predictive value (NPV) of 0.806. The predictive performance of the model was satisfactory, achieving an AUC of 0.806 (Fig. 5B). Moreover, the AUCs of the 10-fold cross-validation ROC curve and the PR curve were 0.751 and 0.772, respectively (Fig. 5C,D).

Fig. 5. Construction and assessment of the radiomics model for predicting CD44 expression based on LR. (A) Seven radiomic features and their corresponding importance in the radiomics model based on LR. (B) ROC curve. (C) ROC curve of 10-fold cross-validation. (D) PR curve. (E) Calibration curve. (F) Plot of DCA. (G) Predicted probabilities of RS in CD44^high and CD44^low groups. AUC, the area under the curve; DCA, decision curve analysis; Glrlm, gray level run length matrix; Glszm, gray level size zone matrix; HL, Hosmer-Lemeshow test; LR, logistic regression; Ngtdm, neighborhood gray tone difference matrix; PR, precision recall; ROC, receiver operating characteristic. ****, p < 0.0001.

The calibration curve and Hosmer-Lemeshow test indicated good agreement between the model-predicted probabilities of high CD44 expression and the actual values (p = 0.161, Fig. 5E). The DCA curves suggested that our radiomics model provided a higher net benefit than the treat-all and treat-none strategies across threshold probabilities of 0.1–0.7 (Fig. 5F).
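The LR-based RS formula reported in this section is a plain linear combination, which the following sketch evaluates for one patient. The intercept and coefficients are copied from the formula in the text; the underscored dictionary keys, the function name `radiomics_score`, and the assumption that inputs are already Z-score normalized feature values are illustrative choices, not details confirmed by the source.

```python
# Coefficients copied from the RS formula in the text; the underscored keys
# are hypothetical identifiers chosen for this sketch.
INTERCEPT = -0.01933
COEFFICIENTS = {
    "shape_sphericity": 0.880206,
    "glszm_gray_level_non_uniformity_normalized": 0.072922,
    "glrlm_long_run_high_gray_level_emphasis": 0.165961,
    "ngtdm_coarseness": -1.05154,
    "glszm_large_area_low_gray_level_emphasis": -0.5184,
    "shape_surface_area": -0.44085,
    "first_order_mean": 0.483907,
}


def radiomics_score(features):
    """Evaluate RS = intercept + sum(coefficient * feature value)."""
    missing = sorted(set(COEFFICIENTS) - set(features))
    if missing:
        raise KeyError(f"missing features: {missing}")
    return INTERCEPT + sum(c * features[k] for k, c in COEFFICIENTS.items())


# A patient whose normalized features all sit at the cohort mean (zeros)
# scores exactly the intercept; raising sphericity by one unit adds 0.880206.
baseline = {name: 0.0 for name in COEFFICIENTS}
rs_baseline = radiomics_score(baseline)
rs_round = radiomics_score({**baseline, "shape_sphericity": 1.0})
```

Read this way, the signs of the coefficients show the direction in which each feature moves the score: higher sphericity and first-order mean raise RS, while higher coarseness and surface area lower it.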
Notably, the RS was significantly higher in the CD44^high group than in the CD44^low group (p < 0.001, Fig. 5G).

To further investigate the optimal radiomics model for predicting CD44 expression in HGG, we developed a radiomics model using SVM. The importance of the seven features is shown in Fig. 6A. The SVM-based radiomics model achieved an accuracy of 0.780, with a sensitivity of 0.854, specificity of 0.707, PPV of 0.745, and NPV of 0.829. Similarly, the predictive performance of the model was good, with an AUC of 0.809 (Fig. 6B). In addition, the AUCs of the 10-fold cross-validation ROC curve and the PR curve were 0.778 and 0.771, respectively (Fig. 6C,D). The calibration curve and Hosmer-Lemeshow test indicated good agreement between the predicted CD44 expression and the true value (p = 0.194, Fig. 6E). The DCA curves suggested that our SVM-based radiomics model provided a higher net benefit than the treat-all and treat-none strategies across threshold probabilities of 0.1–0.8 (Fig. 6F). The RS was significantly higher in the CD44^high group than in the CD44^low group (p < 0.001, Fig. 6G). Although the AUC of the SVM model was higher than that of the LR model, the DeLong test found no significant difference between the two models. The RS of the SVM model was chosen for further analyses because of its relatively higher accuracy.

Fig. 6. Construction and assessment of the radiomics model for predicting CD44 expression based on SVM. (A) Seven radiomic features and their corresponding importance in the radiomics model based on SVM. (B) ROC curve. (C) ROC curve of 10-fold cross-validation. (D) PR curve. (E) Calibration curve. (F) Plot of DCA. (G) Predicted probabilities of RS in CD44^high and CD44^low groups. (H) Spearman correlation analysis of RS and EMT molecules. (I) Survival analysis of RS for OS. (J) Clinical application of the radiomics model in HGG.
AUC, the area under the curve; DCA, decision curve analysis; Glrlm, gray level run length matrix; Glszm, gray level size zone matrix; HL, Hosmer-Lemeshow test; Ngtdm, neighborhood gray tone difference matrix; OS, overall survival; PR, precision recall; ROC, receiver operating characteristic; RS, Radiomic score; SVM, support vector machine; ***, p < 0.001; ****, p < 0.0001.

In addition to CD44, Spearman correlation analysis showed that RS was also significantly correlated with several molecules involved in EMT (p < 0.05), including serpin H1 (SERPINH1), insulin like growth factor binding protein 2 (IGFBP2), and galectin-1 (LGALS1) (Fig. 6H).

To investigate the prognostic implications of the radiomics model, we obtained data from 170 patients with HGG from the TCGA database for subsequent survival analysis. All patients were divided into RS^high (n = 85) and RS^low (n = 85) groups based on the cutoff value of 0.532. The baseline clinicopathological characteristics of the patients are presented in Supplemental Table 4. As shown in Fig. 6I, the median OS was significantly longer in the RS^low group (21.6 months) than in the RS^high group (14.9 months), indicating that patients in the RS^high group had poorer OS than those in the RS^low group (p < 0.001). To enhance clinical applicability, we developed a predictive nomogram incorporating radiomic score, age, gender, chemotherapy, radiotherapy, MGMT promoter methylation status, and Chr 1p19q codeletion status. The nomogram score sheet displays the clinical predictive scheme at 12, 36, and 60 months (Fig. 6J).

Discussion

The present study aimed to elucidate the prognostic value of CD44 in HGG and to develop ML-based radiomics models to predict CD44 expression non-invasively. Our study corroborated that CD44 expression level was an independent predictor of OS and that high CD44 expression was associated with poor OS among HGG patients.
In addition, we constructed two radiomics models to predict CD44 expression level in HGG patients, achieving AUC values of 0.809 (SVM) and 0.806 (LR). Furthermore, HGG patients with higher RS had relatively shorter OS.

CD44, a cell-surface receptor with multifaceted roles in cellular functions, has been implicated as a risk factor in glioma in previous studies^25,26. For instance, a meta-analysis revealed that elevated CD44 expression adversely impacts OS in patients with glioma^25, which aligns with our findings. Accumulating evidence suggests that CD44 is responsible for growth and metastasis in HGG^27,28. Interestingly, our analysis found that the most significant GO enrichment terms associated with CD44 include synapse organization, focal adhesion, cell-substrate junction, and neuron to neuron synapse. In addition, CD44 was recently reported to participate in the TME^29, immunotherapy^30, and stemness^31 in HGG. Notably, our research also found a propensity for M2 macrophage infiltration in HGG with high CD44 expression, indicating a potential role of CD44 in driving M2 macrophage polarization and remodeling the TME^29.

Numerous molecules, including CD44, contribute to HGG progression in distinct ways and at different stages, giving rise to intratumoral and intertumoral heterogeneity^32. However, assessing this heterogeneity pathologically is invasive, subject to sampling variability, and impractical to perform for every sample in clinical practice. Thus, it is essential to establish an efficient approach to quantify the heterogeneity. MRI is routinely used for HGG diagnosis and follow-up in clinical practice, offering the capability to assess the entire lesion non-invasively and monitor it dynamically. Several lines of evidence have inspired us to explore the potential of MRI-based radiomics for the diagnosis and classification of HGG^33,34.
In the current research, we identified CD44 as an independent risk factor for OS by multivariate Cox regression analysis. To further predict CD44 expression in HGG, we applied repeated LASSO to screen the radiomic features of MRI images and constructed two radiomics models based on LR and SVM, respectively. Interestingly, both models exhibited satisfactory predictive power for CD44 expression.

As precision medicine continues to advance, an increasing number of molecular targets have been identified. Notably, accurate prediction of these targets has become critical for diagnosis and therapy, forming the foundation of precision oncology applications. Consequently, many investigators are committed to building radiomics models that predict intratumoral molecular status precisely and generalizably^35,36. Thus, radiomics models are emerging as a popular research direction for heterogeneity measurement and biomarker prediction. Routine MRI diagnosis relies on morphological features of the lesions and therefore lacks objective, quantitative information. Radiomics provides high-dimensional quantitative information that might be overlooked by the naked eye, thereby aiding diagnosis based on imaging features. It is often referred to as "digital biopsy". Recently, extensive studies have confirmed the importance of radiomics in predicting biomarkers in glioma. For instance, Sha et al.^37 constructed radiomics models to predict the IDH-mutant, MGMT-methylated molecular subtype in glioma, achieving an AUC of 0.9. Another study built an MRI-based radiomic signature that showed promising ability to assess the promoter mutation status of TERT in glioblastoma patients^38. Similarly, a recent study built two MRI-based radiomics models to predict CD44 expression in LGG^39.
In this research, we applied MRI radiomic features to predict CD44 expression in HGG, and our prediction models achieved AUCs of 0.806 and 0.809 for the LR and SVM models, respectively. These results provide an efficient, cost-effective, repeatable, non-invasive method to predict CD44 expression in HGG.

After screening 107 radiomic features, we retained seven to build our predictive models: shape_sphericity, glszm_gray level non uniformity normalized, glrlm_long run high gray level emphasis, ngtdm_coarseness, glszm_large area low gray level emphasis, shape_surface area, and first order mean. Glszm, a texture feature class, quantifies zones of connected voxels that share the same gray level intensity in an image^40. A higher value indicates greater heterogeneity within regions of interest. Several studies have confirmed that glszm-derived features are common and important radiomic factors in glioma research^39,41. Wang et al. built a predictive model for CD44 expression in LGG, in which a feature from the glszm category showed an odds ratio of 2.634 (p = 0.008)^40. Consistent with their findings, our study also highlighted the crucial role of the glszm category in both models. In the SVM model, glszm_gray level non uniformity normalized emerged as the most important feature, while in the LR model, glszm_large area low gray level emphasis ranked as the third most important feature. These results underscore that the glszm radiomic features played an important role in predicting the CD44 expression level in HGG.

A multitude of studies have highlighted the potential of radiomics models as a promising tool for diagnosis, management, and prognostication across various cancer types, including ovarian cancer, hepatocellular carcinoma, and meningioma^42–44.
Several studies have specifically focused on constructing radiomics models to predict the survival of glioma patients^45,46. Li et al.^47 established a prognostic radiomics model for patients with glioma, which demonstrated that patients with high radiomic feature scores had shorter OS. Similarly, our radiomics model, incorporating texture, first order, and shape features from radiomic images, showed discriminatory performance in predicting OS in HGG patients. According to the Kaplan–Meier curves, the median OS of the RS^low group was 21.6 months, which was 6.7 months longer than that of the RS^high group. These findings collectively support the prognostic value of our radiomics models.

While our radiomics models performed well in predicting CD44 expression and OS in patients with HGG, several limitations of the present research point to directions for future study. Firstly, this was an exploratory study, and further validation studies are required. In addition, the present study was based on contrast-enhanced T1WI images, and future studies should consider incorporating additional functional MRI sequences. Moreover, the dataset used in this study was obtained from public databases and the sample size was relatively small. Multicenter prospective studies are still needed to reduce bias and enhance the generalizability of our findings.

Conclusion

Elevated CD44 expression was found to be strongly associated with shorter OS in patients with HGG. Our radiomics model, referred to as RS, was able to distinguish the level of CD44 expression and predict OS in patients with HGG. Moreover, RS showed a significant correlation with some EMT-related genes.
This non-invasive radiomics model exhibited reliable and feasible performance, offering a potentially powerful tool for predicting outcomes and assisting clinical decision-making in the era of individualized precision medicine and targeted therapy.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1 (2.2MB, tif)
Supplementary Material 2 (17.5KB, docx)
Supplementary Material 3 (2.5MB, xlsx)
Supplementary Material 4 (13KB, xlsx)
Supplementary Material 5 (12.9KB, xlsx)
Supplementary Material 6 (337.2KB, tif)
Supplementary Material 7 (13.9KB, docx)

Acknowledgements