Abstract Objective: Epithelial-mesenchymal transition (EMT) is a tightly regulated and dynamic process occurring in both embryonic development and tumor progression. Our study aimed to comprehensively explore the molecular subtypes, immune landscape, and prognostic signature based on EMT-related genes in low-grade gliomas (LGG) in order to facilitate treatment decision-making and drug discovery. Methods: We curated EMT-related genes and performed molecular subtyping with consensus clustering algorithm to determine EMT expression patterns in LGG. The infiltration level of diverse immune cell subsets was evaluated by implementing the single-sample gene set enrichment analysis (ssGSEA) and ESTIMATE algorithms. The distinctions in clinical characteristics, mutation landscape, and immune tumor microenvironment (TME) among the subtypes were subjected to further investigation. Gene Set Variation Analysis (GSVA) was performed to explore the biological pathways that were involved in subtypes. The chemo drug sensitivity and immunotherapy of subtypes were estimated through GDSC database and NTP algorithm. To detect EMT subtype-related prognostic gene modules, the analysis of weighted gene co-expression network (WGCNA) was performed. The LASSO algorithm was utilized to construct a prognostic risk model, and its efficacy was verified through an independent CGGA dataset. Finally, the expression of the hub genes from the prognostic model was evaluated through the single-cell dataset and in-vitro experiment. Results: The TCGA-LGG dataset revealed the creation of two molecular subtypes that presented different prognoses, clinical implications, TME, mutation landscapes, chemotherapy, and immunotherapy. A three-gene signature (SLC39A1, CTSA and CLIC1) based on EMT expression pattern were established through WGCNA analysis. Low-risk patients showed a positive outlook, increased immune cell presence, and higher expression of immune checkpoint proteins. In addition, several promising drugs, including birinapant, fluvastatin, clofarabine, dasatinib, tanespimycin, TAK−733, GDC−0152, AZD8330, trametinib and ingenol-mebutate had great potential to the treatment of high risk patients. Finally, CTSA and CLIC1 were highly expressed in monocyte cell through single-cell RNA sequencing analysis. Conclusion: Our research revealed non-negligible role of EMT in the TME diversity and complexity of LGG. A prognostic signature may contribute to the personalized treatment and prognostic determination. Keywords: low-grade gliomas, Epithelial-mesenchymal transition (EMT), molecular subtypes, tumor microenvironment, immunotherpapy, signature, single cell RNA analysis 1 Introduction The efficacy of cancer chemotherapy and immunotherapy is often impeded by inter-patient and intra-tumor heterogeneity, as well as multifactorial drug resistance. Low-grade gliomas (LGGs), being a form of malignant brain tumor, are also confronted with the challenge of tumor heterogeneity and resistance to treatment ([28]Weller et al., 2015). Even after undergoing standard surgical resection followed by radiotherapy and chemotherapy, individuals diagnosed with low-grade gliomas continue to have a bleak prognosis, with an average survival rate ranging from 2 to 10 years ([29]Golub et al., 2019). Despite extensive efforts to improve clinical outcomes, more than half of LGG cases progress and evolve into therapy-resistant high-grade aggressive glioma over time ([30]Claus et al., 2015). One of the primary objectives of precision medicine is to precisely identify patients who are likely to benefit from personalized treatment and to prescribe treatments that are tailored to the unique characteristics of their tumors, with the aim of achieving optimal therapeutic outcomes. Hence, in the realm of precision medicine, it is imperative to establish a more precise categorization of tumors to eradicate the heterogeneity and resistance to treatment associated with LGGs. The process of epithelial-mesenchymal transition (EMT) is an essential differentiation program that is necessary for the development of tissues during embryogenesis. During this process, cells lose their epithelial characteristics and acquire mesenchymal migration properties ([31]Yang et al., 2020). Aberrant activation of the EMT process is frequently observed in tumor proliferation, leading to the development of resistance towards conventional therapeutic interventions ([32]Lambert et al., 2017; [33]Shibue and Weinberg, 2017). EMT has the capability to enhance the potential of tumor cells for invasion and metastasis by facilitating their migration, disturbing cell-cell connections, disintegrating the basement membrane, and restructuring the extracellular matrix (ECM) ([34]Miyoshi et al., 2004). Moreover, the process of EMT is linked with a higher percentage of cancer stem cells, reduced immune response against tumors and the development of resistance towards treatment ([35]Akalay et al., 2013; [36]Rhim et al., 2014), the study of therapeutic resistance from the perspective of EMT is one of cancer research focus. Thus, the presence of EMT, which is a hallmark of cancer, has been shown to be intimately linked with tumor invasion, metastasis, and the acquisition of chemotherapy resistance, all of which are critical biological processes in cancer development ([37]van Staalduinen et al., 2018; [38]Loret et al., 2019). Despite the existence of variations among tumor subtypes, the EMT process in LGG could potentially hold significant prognostic and molecular typing value. As a result of the highly hostile and intrusive characteristics of gliomas, it has been progressively acknowledged that the occurrence of EMT in gliomas may hold significant significance in the progression of glioma and the restructuring of the glioma microenvironment ([39]Lu et al., 2012). Given the emergence of innovative immunotherapy techniques that offer groundbreaking cancer treatment alternatives, it is crucial to ascertain the immune profile of distinct EMT expression patterns, and their responsiveness to immunotherapy ([40]Yamaguchi, 2016). In our study, while we have primarily relied on computational analyses of existing datasets, we have also performed experimental validation of the identified key genes. This approach underscores the critical role of EMT-related genes in the context of low-grade gliomas (LGG) and further highlights their significance in future research. By combining computational insights with experimental evidence, we aim to provide a more comprehensive understanding of the role of EMT in the expression, prognosis, immune tumor microenvironment (TME), clinical implications, and personalized treatment strategies for LGG. 2 Materials and methods 2.1 Data acquisition Transcriptome data, mutation data and clinicopathologic characteristics of LGG samples were retrieved from the Cancer Genome Atlas (TCGA; [41]https://www. cancer. gov/tcga/) through the Genomic Data Commons data portal (GDC; [42]https://portal.gdc.cancer.gov/). In addition, the aforementioned information were obtained from two cohorts (mRNAseq_693, Illumina HiSeq Platform; mRNAseq_325, Illumina HiSeq (2000) and (2,500) platforms) in Chinese Glioma Genome Atlas (CGGA; [43]http://www.cgga.org.cn), and serve as an external dataset. “SVA” ([44]Leek et al., 2012) package was then used to merged the two cohorts after removing the batch effects (“ComBat” algorithm). After converting the raw read count, the values were expressed as transcripts per kilobase million (TPM). The Molecular signatures database (MSigDB; [45]http://www.broadinstitute.org/msigdb) ([46]Liberzon et al., 2015) provided a collection of 200 genes related to EMT, obtained from the gene set named “HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION”. [47]Figure 1 showed the graphical abstract and analysis flow chart of present study. FIGURE 1. [48]FIGURE 1 [49]Open in a new tab The graphical abstract and analysis flow chart of present study. (A) Construction of EMT-related consensus clusters. (B) Evaluation of immunotherapy of EMT-related consensus clusters. (C) Construction of EMT-related signature. (D) Experimental validation of SLC39A1. 2.2 Unsupervised clustering analysis To perform an unsupervised cluster analysis on mRNA expression profiles of genes associated with EMT, the “ConsensusClusterPlus” R package was created ([50]Wilkerson and Hayes, 2010). The optimal number of clusters was determined using the cumulative distribution function (CDF) and consensus matrix through resampling analysis to achieve the best performance. To assess the disparity in survival among clusters, the Kaplan-Meier (KM) analysis was employed. The results displayed the top 20 mutated genes in each cluster after visualizing single-nucleotide polymorphisms (SNP) variations using the R package “maftools” ([51]Mayakonda et al., 2018). The GISTIC2.0 tool from Genome Data Analysis Center (GDAC) Firehose ([52]https://gdac.broadinstitute.org) was used to analyze the copy number alterations (CNA) data among subtypes. 2.3 Landscape of immune TME To calculate the ratio of immune cells in LGG samples, the Estimation of Stromal and Immune cells in Malignant Tumours using Expression (ESTIMATE) data method was utilized to compute the stromal and immune scores ([53]Yoshihara et al., 2013). The estimation of tumor purity was done by combining stromal and immune scores. The violin plot displayed variations in immune score, stromal score, and tumor purity across various subtypes. The single-sample gene set enrichment analysis (ssGSEA) algorithm was utilized to measure the proportion of immune cell infiltration in the immune TME of LGG samples through the computation of enrichment scores. To compute the abundance of 28 tumor infiltrating immune cells (TIICs), a sum of 782 marker genes was collected from Bindea et al. ([54]Bindea et al., 2013) and Broad Institute ([55]http://software.broadinstitute.org/gsea/msigdb/index.jsp). Boxplots were used to present the results of ssGSEA analysis, which compared the level and function of immune infiltration among different subtypes using the gene set variation analysis (GSVA) software “GSVA” R package. 2.4 GSVA and functional annotation To evaluate dissimilarities in biological processes among various clusters, the R software package “GSVA” was utilized for conducting GSVA analysis. Based on transcriptome data, GSVA is an unsupervised and nonparametric gene enrichment method that estimates alterations in the activity of biological processes and pathways in samples ([56]Hänzelmann et al., 2013). To run GSVA analysis, the gene set of “C2. cp.kegg.v7.1″was downloaded from MSigDB database. The “clusterProfiler” R software package was utilized to conduct functional annotation of genes related to EMT. Significantly enriched pathways were identified through screening for those with an adjusted p-value < 0.05 and false discovery rate (FDR) < 0.05. 2.5 Therapy response prediction of subtype The tumor immune dysfunction and exclusion (TIDE) algorithm ([57]Jiang et al., 2018) was utilized to approximate the efficacy of immunotherapy based on tumor immune dysfunction and exclusion. According to the Genomics of Drug Sensitivity in Cancer (GDSC) database ([58]https://www.cancerrxgene.org/) ([59]Yang et al., 2012), we predict chemotherapy response of each LGG sample. After performing simple processing of data (such as elimination of low-variation genes and summarize duplicated gene expression data into averages), we estimated the sensitivity of eight chemotherapeutic agents (cisplatin, erlotinib, methotrexate, vincristine, carmustine, temozolomide (TMZ), rapamycin, and doxorubicin) currently under study or widely applied in gliomas by SubMap analysis (Gene Pattern) of GDSC data ([60]Hoshida et al., 2007). The ridge regression was performed to estimate the half-maximal inhibitory concentration (IC50) for LGG by “pRRophetic” R software package ([61]Geeleher et al., 2014a), and the quantitative prediction accuracy was verified by 10-fold cross validation based on the GDSC training set ([62]Geeleher et al., 2014b). 2.6 Weighted correlation network analysis (WGCNA) to identify hub genes from subtypes For the purpose of identifying subtype associated genes, we adopted WGCNA algorithm through the “WGCNA” R package (Version: 1.71) ([63]Langfelder et al., 2009). At first, genes that exhibited a variance value exceeding 25% were selected for the construction of the coexpression network ([64]Yang and Xu, 2021; [65]Zhou et al., 2021; [66]Feng et al., 2022). Subsequently, the outlier samples were excluded through the “goodSampleGenes” function, and the soft-thresholding value β = 5 (scale free = 0.85) was applied to ensure a scale-free network. Then, a gene clustering tree was generated based on the computed adjacency among genes, followed by the categorization of genes into distinct modules comprising no less than 100 genes exhibiting similar characteristics within each module. The modules that exhibited most correlation with subtypes were selected to the downstream analysis, such as KEGG analysis. Moreover, the hub genes were screened based on the cutoff gene significance (GS) > 0.6 and module membership (MM) > 0.8. Pathway enrichment analysis on the modules was performed via the “clusterProfiler” R package ([67]Wu et al., 2021). 2.7 Development of EMT-related signature The “glmnet” package was employed to conduct a LASSO regression analysis on genes associated with prognostic EMT subtyping to evaluate their effect on prognosis. The analysis was subjected to ten-fold cross-validation, and the genes were chosen based on the point with the lowest error rate. Patients were categorized into low or high EMT score subgroups using the median score, which was calculated by merging gene expression and coefficients to compute the EMT risk score. KM survival curves were generated by performing survival analysis with the aid of the “survival” and “survminer” packages. To generate receiver operating characteristic (ROC) curves the “timeROC” package was employed. Prognostic variables were subjected to uni- and multivariate-cox regression analyses. A nomogram was constructed using the “rms” package to estimate survival probability, and calibration curves were used to evaluate the accuracy of the predictions. 2.8 Drug discovery of EMT-related signature The response of human cancer cell lines to small molecule compounds was estimated using drug sensitivity profiles obtained from either the Cancer Therapeutics Response Portal (CTRP) ([68]Basu et al., 2013) or the PRISM project ([69]Corsello et al., 2020). 2.9 Single cell RNA-sequencing (scRNA-seq) revealed the expression level of genes The [70]GSE202096 dataset provided scRNA-seq data for one LGG samples ([71]GSM6094425) ([72]Chen et al., 2022). After conducting quality control, cells that had more than 20% mitochondrial UMI counts were eliminated using the “Seurat” package ([73]Cao et al., 2022). The selection process involved choosing the top 1,500 genes with high variability, followed by clustering cell populations via the FindClusters function and then mapping them into t-distributed stochastic neighbor embedding (t-SNE). Using the FindAllMarkers function, the markers for each cell cluster were determined. Then, the CellMarker database was utilized to annotate cells based on their cell markers ([74]Hu et al., 2022). 2.10 Cell culture and transfection The cultivation of HMC3, GOS-3, T98G, and LN-18 cells (ATCC) was carried out in DMEM supplemented with 10% fetal bovine serum (Sigma Aldrich) under an atmosphere of 5% CO2 at 37°C. After cloning siRNA targeting SLC39A1 into lentiviral vectors, the vectors were transfected into GOS-3 and T98G cells for 3 days and then exposed to 4 μg/mL puromycin for 1 week. 2.11 Real-time quantitative PCR (RT-qPCR) Total RNA was extracted using TRIzol (Beyotime) and cDNA was synthesized with the PrimeScript RT reagent kit after gDNA Eraser (Takara) treatment. SYBR Green II Mixture (TaKaRa) was used for RT-qPCR, and GAPDH was used as an internal reference to calculate the expression using the 2^−ΔΔCT method. 2.12 Flow cytometry Cell apoptosis was detected through flow cytometry and the Cell Apoptosis Detection Kit (KTA0002; Abbkine) was utilized. To summarize, a group of cells was exposed to 5 μL of Annexin V-AbFluor™ 488 binding and 2 μL of PI for 15 min at room temperature while being shielded from light. After adding 400 μL of 1 x Annexin V buffer, the level of apoptosis was assessed using a Beckman Flow cytometer. 2.13 Statistical analysis The R language ([75]R-project.org) platform and software package from the Bioconductor project ([76]www.bioconductor.org) (R Core Team, Version 4.0.2) were used for statistical analysis and visualization of results in the current study. To compare the two groups and more groups, the Wilcoxon and Kruskal–Wallis tests were utilized, respectively ([77]Hazra and Gogtay, 2016). To detect the differentially expressed genes (DEGs) between the two subtypes, the one-way ANOVA and Tukey’s test were conducted with a q-value of less than 0.05 and an absolute value of log2FC greater than 2. The KM curve, analyzed by log-rank test, exhibited the variations in operating systems among different groups. Bilateral p values were used, with a statistically significant difference defined as p < 0.05. 3 Results 3.1 Clinicopathological characteristics of two LGG molecular subtypes based on EMT related genes Based on the prognostic EMT gene expression profile, LGG samples was divided into two molecular subtypes by unsupervised clustering analysis ([78]Figures 2A–C). The EMT gene expression were presented a significant divergence between two subtypes ([79]Figure 2D). [80]Figure 2E showed that the principal components analysis (PCA) distribution patterns were mostly in agreement with the two subtypes designations. [81]Figure 2F suggested the overall prognosis of C2 is worse than that of C1 (p < 0.0001). Moreover, we further explore the relationship between subtypes and clinical traits. FIGURE 2. [82]FIGURE 2 [83]Open in a new tab EMT-related genes could distinguish LGG in TCGA with different clinical and molecular features. (A) Relative change in area under CDF curve for k = 2 to k = 9. (B) Delta curve analysis from k = 2 to k = 9. (C) Consensus clustering matrix heatmap plots of 506 samples from TCGA datasets for k = 2. (D) Expression of EMT-related genes in molecular subtypes. (E) PCA analysis of the EMT-related genes expression when k = 2. (F) Kaplan-Meier analysis of patients between two subtypes. CDF, cumulative distribution function; PCA, principal components analysis. As [84]Figures 3A–G showed, the patients survival status ([85]Figure 3A), age ([86]Figure 3B), grade ([87]Figure 3D), IDH status ([88]Figure 3E), 1p19q status ([89]Figure 3F) and methylation status ([90]Figure 3G) were showed a significant difference (p < 0.05) between C1 and C2, while no difference was observed in gender ([91]Figure 3C). Subtype C2 corresponding to more deaths, elderly patients, higher grade, 1p19q non-codel, IDH wild type, and unmethylated cases. FIGURE 3. [92]FIGURE 3 [93]Open in a new tab The relationship between two molecular subtypes, survival status (A), age (B), gender (C), grade (D), IDH mutation status (E), 1p19q status (F) and methylation status (G). We assessed the function of each subtype through GSVA analysis ([94]Supplementary Figure S1) to delve deeper into the potential biological process of the two clusters. Results suggested that C1 mainly enriched metabolism-related signaling pathways, such as butanoate metabolism and propanoate metabolism pathways. The primary routes of C2 that were enriched consist of cell adhesion molecules (CAMs), interaction with ECM receptors, the p53 signaling pathway, interaction between cytokines and cytokine receptors, focal adhesion, the JAK-stat signaling pathway, and apoptosis. These pathways were significantly correlated with tumor cell genesis, proliferation, invasion and migration. Furthermore, the GSVA outcomes indicated that C2 was linked to pathways related to DNA damage repair (such as Nucleotide excision repair, DNA replication, and Mismatch_repair) as well as immune functions (including Natural killer cell mediated cytotoxicity, Toll like receptor signaling pathway, and Antigen processing and presentation). In conclusion, C2 showed more malignant biological behaviors and enrichment signaling pathways than C1.In addition, SNP analysis of the two clusters in TCGA uncovered that C1 had a higher mutation proportion than C2 of IDH1, TP53, ATRX, CIC, FUBP1, NOTCH1 ([95]Rotin et al., 2015) and IDH2 ([96]Supplementary Figure S1A). However, EGFR, PTEN and NF1 ([97]Verhaak et al., 2010; [98]Marques et al., 2019; [99]Koptyra et al., 2023), the mutations that were common in glioblastoma (GBM), were more frequent in C2 than C1 ([100]Supplementary Figure S2B). Therefore, EMT related genes are important references for LGG molecular subtype. Our cluster-specific