Abstract

   Objective: Epithelial-mesenchymal transition (EMT) is a tightly
   regulated and dynamic process occurring in both embryonic development
   and tumor progression. Our study aimed to comprehensively explore the
   molecular subtypes, immune landscape, and prognostic signature based on
   EMT-related genes in low-grade gliomas (LGG) in order to facilitate
   treatment decision-making and drug discovery.

   Methods: We curated EMT-related genes and performed molecular subtyping
   with consensus clustering algorithm to determine EMT expression
   patterns in LGG. The infiltration level of diverse immune cell subsets
   was evaluated by implementing the single-sample gene set enrichment
   analysis (ssGSEA) and ESTIMATE algorithms. The distinctions in clinical
   characteristics, mutation landscape, and immune tumor microenvironment
   (TME) among the subtypes were subjected to further investigation. Gene
   Set Variation Analysis (GSVA) was performed to explore the biological
   pathways that were involved in subtypes. The chemo drug sensitivity and
   immunotherapy of subtypes were estimated through GDSC database and NTP
   algorithm. To detect EMT subtype-related prognostic gene modules, the
   analysis of weighted gene co-expression network (WGCNA) was performed.
   The LASSO algorithm was utilized to construct a prognostic risk model,
   and its efficacy was verified through an independent CGGA dataset.
   Finally, the expression of the hub genes from the prognostic model was
   evaluated through the single-cell dataset and in-vitro experiment.

   Results: The TCGA-LGG dataset revealed the creation of two molecular
   subtypes that presented different prognoses, clinical implications,
   TME, mutation landscapes, chemotherapy, and immunotherapy. A three-gene
   signature (SLC39A1, CTSA and CLIC1) based on EMT expression pattern
   were established through WGCNA analysis. Low-risk patients showed a
   positive outlook, increased immune cell presence, and higher expression
   of immune checkpoint proteins. In addition, several promising drugs,
   including birinapant, fluvastatin, clofarabine, dasatinib,
   tanespimycin, TAK−733, GDC−0152, AZD8330, trametinib and
   ingenol-mebutate had great potential to the treatment of high risk
   patients. Finally, CTSA and CLIC1 were highly expressed in monocyte
   cell through single-cell RNA sequencing analysis.

   Conclusion: Our research revealed non-negligible role of EMT in the TME
   diversity and complexity of LGG. A prognostic signature may contribute
   to the personalized treatment and prognostic determination.

   Keywords: low-grade gliomas, Epithelial-mesenchymal transition (EMT),
   molecular subtypes, tumor microenvironment, immunotherpapy, signature,
   single cell RNA analysis

1 Introduction

   The efficacy of cancer chemotherapy and immunotherapy is often impeded
   by inter-patient and intra-tumor heterogeneity, as well as
   multifactorial drug resistance. Low-grade gliomas (LGGs), being a form
   of malignant brain tumor, are also confronted with the challenge of
   tumor heterogeneity and resistance to treatment ([28]Weller et al.,
   2015). Even after undergoing standard surgical resection followed by
   radiotherapy and chemotherapy, individuals diagnosed with low-grade
   gliomas continue to have a bleak prognosis, with an average survival
   rate ranging from 2 to 10 years ([29]Golub et al., 2019). Despite
   extensive efforts to improve clinical outcomes, more than half of LGG
   cases progress and evolve into therapy-resistant high-grade aggressive
   glioma over time ([30]Claus et al., 2015). One of the primary
   objectives of precision medicine is to precisely identify patients who
   are likely to benefit from personalized treatment and to prescribe
   treatments that are tailored to the unique characteristics of their
   tumors, with the aim of achieving optimal therapeutic outcomes. Hence,
   in the realm of precision medicine, it is imperative to establish a
   more precise categorization of tumors to eradicate the heterogeneity
   and resistance to treatment associated with LGGs.

   The process of epithelial-mesenchymal transition (EMT) is an essential
   differentiation program that is necessary for the development of
   tissues during embryogenesis. During this process, cells lose their
   epithelial characteristics and acquire mesenchymal migration properties
   ([31]Yang et al., 2020). Aberrant activation of the EMT process is
   frequently observed in tumor proliferation, leading to the development
   of resistance towards conventional therapeutic interventions
   ([32]Lambert et al., 2017; [33]Shibue and Weinberg, 2017). EMT has the
   capability to enhance the potential of tumor cells for invasion and
   metastasis by facilitating their migration, disturbing cell-cell
   connections, disintegrating the basement membrane, and restructuring
   the extracellular matrix (ECM) ([34]Miyoshi et al., 2004). Moreover,
   the process of EMT is linked with a higher percentage of cancer stem
   cells, reduced immune response against tumors and the development of
   resistance towards treatment ([35]Akalay et al., 2013; [36]Rhim et al.,
   2014), the study of therapeutic resistance from the perspective of EMT
   is one of cancer research focus. Thus, the presence of EMT, which is a
   hallmark of cancer, has been shown to be intimately linked with tumor
   invasion, metastasis, and the acquisition of chemotherapy resistance,
   all of which are critical biological processes in cancer development
   ([37]van Staalduinen et al., 2018; [38]Loret et al., 2019). Despite the
   existence of variations among tumor subtypes, the EMT process in LGG
   could potentially hold significant prognostic and molecular typing
   value. As a result of the highly hostile and intrusive characteristics
   of gliomas, it has been progressively acknowledged that the occurrence
   of EMT in gliomas may hold significant significance in the progression
   of glioma and the restructuring of the glioma microenvironment ([39]Lu
   et al., 2012). Given the emergence of innovative immunotherapy
   techniques that offer groundbreaking cancer treatment alternatives, it
   is crucial to ascertain the immune profile of distinct EMT expression
   patterns, and their responsiveness to immunotherapy ([40]Yamaguchi,
   2016). In our study, while we have primarily relied on computational
   analyses of existing datasets, we have also performed experimental
   validation of the identified key genes. This approach underscores the
   critical role of EMT-related genes in the context of low-grade gliomas
   (LGG) and further highlights their significance in future research. By
   combining computational insights with experimental evidence, we aim to
   provide a more comprehensive understanding of the role of EMT in the
   expression, prognosis, immune tumor microenvironment (TME), clinical
   implications, and personalized treatment strategies for LGG.

2 Materials and methods

2.1 Data acquisition

   Transcriptome data, mutation data and clinicopathologic characteristics
   of LGG samples were retrieved from the Cancer Genome Atlas (TCGA;
   [41]https://www. cancer. gov/tcga/) through the Genomic Data Commons
   data portal (GDC; [42]https://portal.gdc.cancer.gov/). In addition, the
   aforementioned information were obtained from two cohorts (mRNAseq_693,
   Illumina HiSeq Platform; mRNAseq_325, Illumina HiSeq (2000) and (2,500)
   platforms) in Chinese Glioma Genome Atlas (CGGA;
   [43]http://www.cgga.org.cn), and serve as an external dataset. “SVA”
   ([44]Leek et al., 2012) package was then used to merged the two cohorts
   after removing the batch effects (“ComBat” algorithm). After converting
   the raw read count, the values were expressed as transcripts per
   kilobase million (TPM). The Molecular signatures database (MSigDB;
   [45]http://www.broadinstitute.org/msigdb) ([46]Liberzon et al., 2015)
   provided a collection of 200 genes related to EMT, obtained from the
   gene set named “HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION”. [47]Figure
   1 showed the graphical abstract and analysis flow chart of present
   study.

FIGURE 1.

   [48]FIGURE 1
   [49]Open in a new tab

   The graphical abstract and analysis flow chart of present study. (A)
   Construction of EMT-related consensus clusters. (B) Evaluation of
   immunotherapy of EMT-related consensus clusters. (C) Construction of
   EMT-related signature. (D) Experimental validation of SLC39A1.

2.2 Unsupervised clustering analysis

   To perform an unsupervised cluster analysis on mRNA expression profiles
   of genes associated with EMT, the “ConsensusClusterPlus” R package was
   created ([50]Wilkerson and Hayes, 2010). The optimal number of clusters
   was determined using the cumulative distribution function (CDF) and
   consensus matrix through resampling analysis to achieve the best
   performance. To assess the disparity in survival among clusters, the
   Kaplan-Meier (KM) analysis was employed. The results displayed the top
   20 mutated genes in each cluster after visualizing single-nucleotide
   polymorphisms (SNP) variations using the R package “maftools”
   ([51]Mayakonda et al., 2018). The GISTIC2.0 tool from Genome Data
   Analysis Center (GDAC) Firehose ([52]https://gdac.broadinstitute.org)
   was used to analyze the copy number alterations (CNA) data among
   subtypes.

2.3 Landscape of immune TME

   To calculate the ratio of immune cells in LGG samples, the Estimation
   of Stromal and Immune cells in Malignant Tumours using Expression
   (ESTIMATE) data method was utilized to compute the stromal and immune
   scores ([53]Yoshihara et al., 2013). The estimation of tumor purity was
   done by combining stromal and immune scores. The violin plot displayed
   variations in immune score, stromal score, and tumor purity across
   various subtypes.

   The single-sample gene set enrichment analysis (ssGSEA) algorithm was
   utilized to measure the proportion of immune cell infiltration in the
   immune TME of LGG samples through the computation of enrichment scores.
   To compute the abundance of 28 tumor infiltrating immune cells (TIICs),
   a sum of 782 marker genes was collected from Bindea et al. ([54]Bindea
   et al., 2013) and Broad Institute
   ([55]http://software.broadinstitute.org/gsea/msigdb/index.jsp).
   Boxplots were used to present the results of ssGSEA analysis, which
   compared the level and function of immune infiltration among different
   subtypes using the gene set variation analysis (GSVA) software “GSVA” R
   package.

2.4 GSVA and functional annotation

   To evaluate dissimilarities in biological processes among various
   clusters, the R software package “GSVA” was utilized for conducting
   GSVA analysis. Based on transcriptome data, GSVA is an unsupervised and
   nonparametric gene enrichment method that estimates alterations in the
   activity of biological processes and pathways in samples
   ([56]Hänzelmann et al., 2013). To run GSVA analysis, the gene set of
   “C2. cp.kegg.v7.1″was downloaded from MSigDB database. The
   “clusterProfiler” R software package was utilized to conduct functional
   annotation of genes related to EMT. Significantly enriched pathways
   were identified through screening for those with an adjusted p-value <
   0.05 and false discovery rate (FDR) < 0.05.

2.5 Therapy response prediction of subtype

   The tumor immune dysfunction and exclusion (TIDE) algorithm ([57]Jiang
   et al., 2018) was utilized to approximate the efficacy of immunotherapy
   based on tumor immune dysfunction and exclusion. According to the
   Genomics of Drug Sensitivity in Cancer (GDSC) database
   ([58]https://www.cancerrxgene.org/) ([59]Yang et al., 2012), we predict
   chemotherapy response of each LGG sample. After performing simple
   processing of data (such as elimination of low-variation genes and
   summarize duplicated gene expression data into averages), we estimated
   the sensitivity of eight chemotherapeutic agents (cisplatin, erlotinib,
   methotrexate, vincristine, carmustine, temozolomide (TMZ), rapamycin,
   and doxorubicin) currently under study or widely applied in gliomas by
   SubMap analysis (Gene Pattern) of GDSC data ([60]Hoshida et al., 2007).
   The ridge regression was performed to estimate the half-maximal
   inhibitory concentration (IC50) for LGG by “pRRophetic” R software
   package ([61]Geeleher et al., 2014a), and the quantitative prediction
   accuracy was verified by 10-fold cross validation based on the GDSC
   training set ([62]Geeleher et al., 2014b).

2.6 Weighted correlation network analysis (WGCNA) to identify hub genes from
subtypes

   For the purpose of identifying subtype associated genes, we adopted
   WGCNA algorithm through the “WGCNA” R package (Version: 1.71)
   ([63]Langfelder et al., 2009). At first, genes that exhibited a
   variance value exceeding 25% were selected for the construction of the
   coexpression network ([64]Yang and Xu, 2021; [65]Zhou et al., 2021;
   [66]Feng et al., 2022). Subsequently, the outlier samples were excluded
   through the “goodSampleGenes” function, and the soft-thresholding value
   β = 5 (scale free = 0.85) was applied to ensure a scale-free network.
   Then, a gene clustering tree was generated based on the computed
   adjacency among genes, followed by the categorization of genes into
   distinct modules comprising no less than 100 genes exhibiting similar
   characteristics within each module. The modules that exhibited most
   correlation with subtypes were selected to the downstream analysis,
   such as KEGG analysis. Moreover, the hub genes were screened based on
   the cutoff gene significance (GS) > 0.6 and module membership (MM) >
   0.8. Pathway enrichment analysis on the modules was performed via the
   “clusterProfiler” R package ([67]Wu et al., 2021).

2.7 Development of EMT-related signature

   The “glmnet” package was employed to conduct a LASSO regression
   analysis on genes associated with prognostic EMT subtyping to evaluate
   their effect on prognosis. The analysis was subjected to ten-fold
   cross-validation, and the genes were chosen based on the point with the
   lowest error rate. Patients were categorized into low or high EMT score
   subgroups using the median score, which was calculated by merging gene
   expression and coefficients to compute the EMT risk score. KM survival
   curves were generated by performing survival analysis with the aid of
   the “survival” and “survminer” packages. To generate receiver operating
   characteristic (ROC) curves the “timeROC” package was employed.
   Prognostic variables were subjected to uni- and multivariate-cox
   regression analyses. A nomogram was constructed using the “rms” package
   to estimate survival probability, and calibration curves were used to
   evaluate the accuracy of the predictions.

2.8 Drug discovery of EMT-related signature

   The response of human cancer cell lines to small molecule compounds was
   estimated using drug sensitivity profiles obtained from either the
   Cancer Therapeutics Response Portal (CTRP) ([68]Basu et al., 2013) or
   the PRISM project ([69]Corsello et al., 2020).

2.9 Single cell RNA-sequencing (scRNA-seq) revealed the expression level of
genes

   The [70]GSE202096 dataset provided scRNA-seq data for one LGG samples
   ([71]GSM6094425) ([72]Chen et al., 2022). After conducting quality
   control, cells that had more than 20% mitochondrial UMI counts were
   eliminated using the “Seurat” package ([73]Cao et al., 2022). The
   selection process involved choosing the top 1,500 genes with high
   variability, followed by clustering cell populations via the
   FindClusters function and then mapping them into t-distributed
   stochastic neighbor embedding (t-SNE). Using the FindAllMarkers
   function, the markers for each cell cluster were determined. Then, the
   CellMarker database was utilized to annotate cells based on their cell
   markers ([74]Hu et al., 2022).

2.10 Cell culture and transfection

   The cultivation of HMC3, GOS-3, T98G, and LN-18 cells (ATCC) was
   carried out in DMEM supplemented with 10% fetal bovine serum (Sigma
   Aldrich) under an atmosphere of 5% CO2 at 37°C. After cloning siRNA
   targeting SLC39A1 into lentiviral vectors, the vectors were transfected
   into GOS-3 and T98G cells for 3 days and then exposed to 4 μg/mL
   puromycin for 1 week.

2.11 Real-time quantitative PCR (RT-qPCR)

   Total RNA was extracted using TRIzol (Beyotime) and cDNA was
   synthesized with the PrimeScript RT reagent kit after gDNA Eraser
   (Takara) treatment. SYBR Green II Mixture (TaKaRa) was used for
   RT-qPCR, and GAPDH was used as an internal reference to calculate the
   expression using the 2^−ΔΔCT method.

2.12 Flow cytometry

   Cell apoptosis was detected through flow cytometry and the Cell
   Apoptosis Detection Kit (KTA0002; Abbkine) was utilized. To summarize,
   a group of cells was exposed to 5 μL of Annexin V-AbFluor™ 488 binding
   and 2 μL of PI for 15 min at room temperature while being shielded from
   light. After adding 400 μL of 1 x Annexin V buffer, the level of
   apoptosis was assessed using a Beckman Flow cytometer.

2.13 Statistical analysis

   The R language ([75]R-project.org) platform and software package from
   the Bioconductor project ([76]www.bioconductor.org) (R Core Team,
   Version 4.0.2) were used for statistical analysis and visualization of
   results in the current study. To compare the two groups and more
   groups, the Wilcoxon and Kruskal–Wallis tests were utilized,
   respectively ([77]Hazra and Gogtay, 2016). To detect the differentially
   expressed genes (DEGs) between the two subtypes, the one-way ANOVA and
   Tukey’s test were conducted with a q-value of less than 0.05 and an
   absolute value of log2FC greater than 2. The KM curve, analyzed by
   log-rank test, exhibited the variations in operating systems among
   different groups. Bilateral p values were used, with a statistically
   significant difference defined as p < 0.05.

3 Results

3.1 Clinicopathological characteristics of two LGG molecular subtypes based
on EMT related genes

   Based on the prognostic EMT gene expression profile, LGG samples was
   divided into two molecular subtypes by unsupervised clustering analysis
   ([78]Figures 2A–C). The EMT gene expression were presented a
   significant divergence between two subtypes ([79]Figure 2D). [80]Figure
   2E showed that the principal components analysis (PCA) distribution
   patterns were mostly in agreement with the two subtypes designations.
   [81]Figure 2F suggested the overall prognosis of C2 is worse than that
   of C1 (p < 0.0001). Moreover, we further explore the relationship
   between subtypes and clinical traits.

FIGURE 2.

   [82]FIGURE 2
   [83]Open in a new tab

   EMT-related genes could distinguish LGG in TCGA with different clinical
   and molecular features. (A) Relative change in area under CDF curve for
   k = 2 to k = 9. (B) Delta curve analysis from k = 2 to k = 9. (C)
   Consensus clustering matrix heatmap plots of 506 samples from TCGA
   datasets for k = 2. (D) Expression of EMT-related genes in molecular
   subtypes. (E) PCA analysis of the EMT-related genes expression when k =
   2. (F) Kaplan-Meier analysis of patients between two subtypes. CDF,
   cumulative distribution function; PCA, principal components analysis.

   As [84]Figures 3A–G showed, the patients survival status ([85]Figure
   3A), age ([86]Figure 3B), grade ([87]Figure 3D), IDH status ([88]Figure
   3E), 1p19q status ([89]Figure 3F) and methylation status ([90]Figure
   3G) were showed a significant difference (p < 0.05) between C1 and C2,
   while no difference was observed in gender ([91]Figure 3C). Subtype C2
   corresponding to more deaths, elderly patients, higher grade, 1p19q
   non-codel, IDH wild type, and unmethylated cases.

FIGURE 3.

   [92]FIGURE 3
   [93]Open in a new tab

   The relationship between two molecular subtypes, survival status (A),
   age (B), gender (C), grade (D), IDH mutation status (E), 1p19q status
   (F) and methylation status (G).

   We assessed the function of each subtype through GSVA analysis
   ([94]Supplementary Figure S1) to delve deeper into the potential
   biological process of the two clusters. Results suggested that C1
   mainly enriched metabolism-related signaling pathways, such as
   butanoate metabolism and propanoate metabolism pathways. The primary
   routes of C2 that were enriched consist of cell adhesion molecules
   (CAMs), interaction with ECM receptors, the p53 signaling pathway,
   interaction between cytokines and cytokine receptors, focal adhesion,
   the JAK-stat signaling pathway, and apoptosis. These pathways were
   significantly correlated with tumor cell genesis, proliferation,
   invasion and migration. Furthermore, the GSVA outcomes indicated that
   C2 was linked to pathways related to DNA damage repair (such as
   Nucleotide excision repair, DNA replication, and Mismatch_repair) as
   well as immune functions (including Natural killer cell mediated
   cytotoxicity, Toll like receptor signaling pathway, and Antigen
   processing and presentation). In conclusion, C2 showed more malignant
   biological behaviors and enrichment signaling pathways than C1.In
   addition, SNP analysis of the two clusters in TCGA uncovered that C1
   had a higher mutation proportion than C2 of IDH1, TP53, ATRX, CIC,
   FUBP1, NOTCH1 ([95]Rotin et al., 2015) and IDH2 ([96]Supplementary
   Figure S1A). However, EGFR, PTEN and NF1 ([97]Verhaak et al., 2010;
   [98]Marques et al., 2019; [99]Koptyra et al., 2023), the mutations that
   were common in glioblastoma (GBM), were more frequent in C2 than C1
   ([100]Supplementary Figure S2B). Therefore, EMT related genes are
   important references for LGG molecular subtype. Our cluster-specific