Abstract

   As the most common malignancy from mediastinum, the metabolic
   reprogramming of thymoma is important in its development. Nevertheless,
   the connection between the metabolic map and thymoma development is yet
   to be discovered. Thymoma was categorized into three subcategories by
   unsupervised clustering of molecular markers for metabolic pathway
   presentation in the TCGA dataset. Different genes and functions
   enriched were demonstrated through the utilization of metabolic Gene
   Ontology (GO) analysis. To identify the main contributors in the
   development of thymic malignancy, we utilized Gene Set Enrichment
   Analysis (GSEA), Gene Set Variation Analysis (GSVA), and Kyoto
   Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis.
   The prognosis of thymoma was evaluated by screening the essential
   pathways and genes using GSVA scores and machine learning classifiers.
   Furthermore, we integrated the transcriptomics findings with spectrum
   metabolomics investigation, detected through LC-MS/MS, in order to
   establish the essential controller network of metabolic reprogramming
   during thymoma progression. The thymoma prognosis is related to
   glycosphingolipid biosynthesis-lacto and neolacto series pathway, of
   what high B3GNT5 indicate poor survival. The investigation revealed
   that glycosphingolipid charts have a significant impact on metabolic
   dysfunction and could potentially serve as crucial targets in the
   clinical advancement of metabolic therapy.

   Keywords: Mass spectrometry analysis, Thymoma, Metabolic reprogramming,
   Glycosphingolipid

Introduction

   The most common primary mass in the anterior mediastinum is thymoma,
   which originates from the thymic epithelium. At present, the molecular
   mechanisms underlying thymoma are still not understood.
   Multidisciplinary approach is necessary for treating thymoma, with
   surgery being the primary method for achieving cure [47]^1. However, up
   to 30% to 50% of patients may develop advanced, recurrent, metastatic,
   or refractory tumors. Moreover, patients experiencing advanced stages
   of thymoma typically exhibit unsatisfactory treatment outcomes [48]^2^,
   [49]^3. Hence, it is crucial to discover dependable predictive
   biomarkers for the identification of thymoma patients, as this would
   enhance the chances of survival.

   Carcinogenesis often involves the reprogramming of metabolism, which
   has significant effects on the tumor microenvironment, cellular
   differentiation, and the expression of genes. The alteration of
   metabolism has been demonstrated to impact the growth, proliferation,
   and invasion of tumor cells in various manners. For instance, it can
   fulfill the energy requirements of rapidly dividing cancer cells and
   modify biological processes by influencing the makeup of substances
   [50]^4. The identification of metabolically significant biomarkers aids
   in the identification of abnormal organism-specific alterations, and
   there is an increasing body of evidence connecting metabolic
   irregularities to unfavorable prognosis in various cancers, including
   clear cell renal cell carcinoma, colorectal cancer, and endometrial
   cancer [51]^5^-[52]^7. Nevertheless, the precise expression patterns of
   genes associated with metabolism in thymoma remain uncertain.

   The beta-1,3-N-acetylglucosaminyltransferase (B3GNT) family consists of
   a collection of enzymes that demonstrate an extraordinary capacity to
   facilitate the transfer of N-acetylglucosamine (GlcNAc) from UDP-GlcNAc
   to specific target molecules [53]^8. The formation of beta-1,3-linked
   GlcNAc residues through this enzymatic process has significant
   implications in diverse biological scenarios. Currently, the B3GNT
   family comprises eight known members, namely B3GNT1 to B3GNT8, each
   displaying unique characteristics and functions. Notably, the
   disruption of the B3GNT group has been identified as a crucial element
   in the advancement and advancement of cancer [54]^9. For example,
   abnormal levels of B3GNT1 expression have consistently been detected in
   different types of cancer, appearing as an increase in its occurrence.
   The elevated expression is strongly linked to negative consequences
   like accelerated tumor growth, enhanced ability to spread, and worse
   prognosis [55]^10. Moreover, within the B3GNT family, B3GNT5 has
   garnered considerable attention due to its strong association with
   glioma progression [56]^9.

   The ST3GAL family is composed of sialyltransferases, which are enzymes
   categorized under the sialyltransferase 3 (ST3) subfamily [57]^11. Six
   members of this family has been reported. Glycosyltransferases known as
   sialyltransferases facilitate the transfer of sialic acid residues from
   donor molecule, typically CMP-sialic acid, to acceptor molecules. This
   process leads to the incorporation of sialic acid and glycan
   structures. The members of this family paly an important role in
   different biological processes, like cell communication, growth, immune
   defense, and progression of diseases. Changes in the expression or
   function of ST3GAL family members have been detected in various
   conditions, such as malignancy, inflammation, and neurodegenerative
   ailments [58]^12.

   For this study, we utilized mRNA expression information of individuals
   diagnosed with thymoma from The Cancer Genome Atlas (TCGA) repository
   to construct a metabolic prognostic signature. Extensive bioinformatics
   analysis revealed substantial alterations in the metabolomics of
   thymoma. Afterwards, thymoma metabolites and metabolic pathways were
   investigated using differential metabolomics. Afterwards, a correlation
   analysis was conducted to examine the relationship between distinct
   metabolic pathways and significant clinical phenotypes and we verified
   the results with the clinical samples and data. In conclusion, we have
   discovered the most predictive metabolic pathways and crucial genes in
   thymoma. These findings could offer fresh insights into examining the
   prognosis of thymoma and formulating personalized metabolic treatment
   strategies.

Methods

Patients and tissue samples

   All the thymoma tissues and adjacent normal thymic tissues were
   collected from patients who underwent surgery in thoracic surgery, the
   First Affiliated Hospital, Sun Yat-sen University. All patients have
   not received any therapy before surgery. The samples were collected
   immediately after removing the resected thymoma, and stored at -80 °C.
   All the participating patients have signed informed consent form and
   the study was approved by the Ethics Committee of the First Affiliated
   Hospital, Sun Yat-sen University.

Thymoma molecular subtyping and survival analysis

   Downloaded from the UCSC Xena database
   ([59]https://xenabrowser.net/datapages/), the RNA-seq data for 121
   TCGA-thymoma (THYM) samples was obtained in log2 (count+1) units. The
   phenotypes of the THYM samples were also downloaded from GDC Hub. The
   analysis of cancer molecular subtypes was conducted using the
   algorithms 'CancerSubtypes' and 'ClassDiscovery'[60]^13. Cluster
   analysis was performed using the RNA expression matrix and average
   linkage method, employing the 'SNFCC' approach.

   Following subcategorization, the patients with distinct thymoma
   subtypes underwent survival analysis utilizing the survival and
   survminer software packages [61]^14. Using the top-notch Separation
   algorithm to distinguish between high and low expression, the samples
   were automatically classified based on variations in survival rates.
   Each group consisted of a minimum of 10% of the total samples [62]^15.

Analysis of Prognostic differentially expressed genes and functional
enrichment

   An analysis of differential gene expression was conducted on the
   samples from two clusters that exhibited the most notable disparity in
   survival. To normalize gene expression profiles and remove
   low-expression genes, the DESeq algorithm was employed [63]^16. To
   choose differentially expressed genes, the requirements included a
   log2-fold change ≥1.5 and a Benjamini-Hochberg (B-H) adjusted P value
   <0.05. The differentially expressed genes were clustered based on their
   expression patterns using unsupervised cluster analysis. Subsequently,
   the clusters underwent Gene Ontology (GO) enrichment analysis to
   investigate their potential biological roles using ToppGene Suite
   ([64]https://toppgene.cchmc.org) [65]^17.

Gene set enrichment analysis and gene set variation analysis

   Based on the algorithms 'clusterProfiler' and 'AnnotationHub'[66]^18^,
   [67]^19, we performed a gene-set enrichment analysis (GSEA,
   [68]http://software.broadinstitute.org/gsea/index.jsp) on thymoma
   tissues from the TCGA-THYM project. The minimum gene set size parameter
   for the algorithm was set to 10, while the maximum gene set size was
   500. Significantly enriched pathways were determined by considering
   terms with a permutation test number of 1000 and a B-H corrected P
   value less than 0.05.

   Furthermore, GSVA is a nonparametric unsupervised technique utilized
   for assessing the outcomes of gene set enrichment in microarray or
   RNA-seq data [69]^20. The gene list of central pathways was gathered by
   incorporating the KEGG (http //www.genome.ad.jp/kegg/), REACTOME (https
   //reactome.org/) and PathCards databases (http
   //pathcards.genecards.org/). For normalized scoring of gene sets per
   cell, the GSVA and GSEABase packages were used [70]^21. To assess the
   lasting impact on patients, we conducted survival analysis on pathways
   scored using GSVA [71]^22.

Analysis of differential metabolic pathways

   GSVA is an unsupervised and nonparametric approach that assesses the
   enrichment of gene sets in the transcriptome. Identifying the metabolic
   pathways enriched in a sample can be achieved by converting the gene
   expression matrix into a gene set expression matrix. The project
   utilized GSVA to assess the variations in gene sets among different
   metabolic pathways in sample clusters [72]^23. By employing the
   approach described by Wu et al., we successfully detected gene clusters
   associated with 46 metabolic pathways in tumors. These pathways were
   standardized and quantified using GSVA algorithms. The differential
   analysis of metabolic pathway GSVA scores in sample clusters was
   performed using the B-H algorithm from the Limma R package, resulting
   in the generation of a bar chart [73]^24.

Significance of differential metabolic pathways in thymoma classification

   First, the GSVA scores of differential metabolic pathways were
   extracted and matched with the classification of clusters. We built a
   machine learning classifier for the two groups of patients with thymoma
   who had the most distinct survival rates, utilizing linear regression,
   decision tree, and random forest analyses. By employing recursive
   feature selection, the decision tree algorithm categorizes the training
   dataset and produces a tree composed of nodes and directed edges
   [74]^25. The nodes are divided into internal and leaf nodes, with an
   internal node representing a feature and a leaf node representing a
   class. A random forest is a type of classifier that consists of
   multiple decision trees, and the final class prediction is based on the
   most frequently occurring class among these trees. The random forest
   method is highly valuable for error balancing and accuracy maintenance
   [75]^26. It can be applied to handle datasets with multiple variables,
   generate classifiers with high accuracy, and evaluate feature
   importance using decision trees. Furthermore, we assessed the
   precision, recall, and AUC curve to determine the accuracy and
   stability of the machine learning model. Additionally, we built a SHAP
   model and computed the SHAP score to evaluate the impact of key
   attributes on the categorization of two groups of thymoma patients with
   the most significant contrast in survival [76]^27.

Mass spectrometry-based metabolomic profiling

   We picked 10 specimens, comprising of five samples of thymoma tissue
   and five samples of para-tumor tissue. The samples were analyzed
   utilizing ultra-performance liquid chromatography (UPLC, ExionLC AD,
   USA) and tandem mass spectrometry (MS/MS; QTRAP®, USA). Various
   statistical techniques were employed to examine variations in
   metabolites among the samples [77]^28.

   The mass spectrometry data was processed using Analyst 1.6.3.A
   combination of the sample extracts [78]^29 was used for quality
   assurance. To ensure the consistency of measurement under identical
   operating conditions, analytical samples were periodically supplemented
   with quality control samples, with a frequency of one every ten
   analytical samples. The total ion current (TIC) chromatogram and
   multiple reaction monitoring (MRM) multipeak chromatogram were
   obtained. To identify the distinctive ion of every compound, a triple
   quadrupole mass spectrometer was employed for screening. The detector
   captured the signal intensity (counts per second) of the characteristic
   ion. The samples' mass spectrometer output files were utilized for
   chromatographic peak integration and calibration using MultiQuant
   software [79]^30. The chromatographic peak area indicates the
   proportionate amount of the corresponding compound. All data related to
   the integration of chromatographic peak areas were exported and saved.
   In order to ensure the accuracy of qualification and quantification, we
   adjusted the chromatographic peak of each detected metabolite in
   various samples by considering the metabolite's retention time and peak
   type, allowing for comparison of their levels [80]^31.

Orthogonal partial least squares-discriminant analysis

   By utilizing the metabolomic data obtained earlier, we have the
   capability to conduct metabolite identification and analyze the quality
   of sample data. We choose differential metabolites and perform
   predictive and analytical functions on the metabolites present in the
   samples. OPLS-DA (Orthogonal Partial Least Squares-Discriminant
   Analysis) combines orthogonal signal correction and PLS-DA to separate
   the matrix data of independent variables into two components:
   irrelevant information and dependent variable-related information. By
   utilizing this approach, it becomes possible to maximize variations
   between different groups, which aids in the detection of distinct
   metabolites [81]^32. Additionally, it helps eliminate irrelevant
   disparities in order to pinpoint specific variables that differ
   significantly. Ultimately, this method enhances the outcomes of
   differential analysis. The OPLS-DA model utilized Variable Importance
   in Projection (VIP) to initially identify distinct metabolites among
   groups [82]^33. Further differential metabolite selection was based on
   the P value or fold change obtained from univariate analysis.
   Differential metabolites were determined by setting a threshold of VIP
   ≥ 1.0 and fold change ≥ 2 or fold change ≤ 0.5. In other words, if the
   ratio of a metabolite in the tumor group to that in the control group
   was ≥ 2 or ≤ 0.5, it was considered to have statistically significant
   difference.

Functional annotation and enrichment analysis of differential metabolites

   Using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, we
   conducted an analysis on the connections and routes of distinct
   metabolites in living beings. This primarily encompassed the potential
   metabolic pathways of carbohydrates, nucleotides, and amino acids, as
   well as the degradation of organic substances. Additionally, we
   performed a thorough annotation of enzymes for a range of reactions. P
   value < 0.05 was considered statistically significant [83]^34.

   The conventional analysis of enrichment, which relies on the
   hypergeometric distribution, is primarily suitable for identifying
   metabolites that have significant upregulation or downregulation.
   However, it may overlook metabolites that lack significant differential
   expression despite their crucial biological importance. Metabolic set
   enrichment analysis (MSEA) converts metabolomic data into a range of
   predetermined metabolic sets without specifying the threshold for
   differential metabolites in advance [84]^35. Metabolic sets with
   significant differences were identified using MSEA. P value < 0.05 for
   pathway enrichment was considered statistically significant.

Effects of the core pathway on survival and identification of target genes

   The machine learning classifier mentioned earlier identified the
   regulatory metabolic pathway of utmost significance. Additionally, we
   acquired the GSVA scores for this pathway, which were automatically
   partitioned using the survival and survminer packages to derive an
   optimal threshold value. We examined the impact of this pathway on
   long-term survival, as determined by DSS, PFI, and OS. Additionally, we
   acquired the gene expression pattern of this central pathway and
   investigated gene correlation through Pearson correlation analysis. The
   gene expression differences corrected by the B-H algorithm in Limma
   were used to determine the differential expression of these genes.
   Finally, genes with Log|FC| ≥1.0 and an adjusted p value ≤ 0.05 were
   selected as targets. Continuous variables may have nonlinear effects on
   survival. In order to eliminate the non-linear impacts of expression
   profiles, the smoothHR and Hmisc algorithms were employed to construct
   a survival prediction model that links differential target gene
   expression with overall survival (OS). Additionally, the risk ratio was
   calculated for each level of continuous gene expression compared to the
   baseline, providing further evaluation of the effects of genes on
   long-term survival. Afterwards, the validation of the effects of gene
   expression on the long-term survival (DSS, PFI, and OS) in patients
   with thymoma was conducted by utilizing the survival and survminer
   packages.

Immunohistochemistry (IHC) and Kaplan-Meier survival analysis

   IHC staining for B3GNT5 and ST3GAL6 was performed on all 28 thymoma and
   adjacent normal tissue samples, which were fixed in 10% neutral
   buffered formalin, paraffin-processed and embedded. The deparaffinized
   tissue sections (4 mm thick) were stained with antibody against B3GNT5
   and ST3GAL6 (Santa Cruz, Dallas, Texas, USA) for immunohistochemical
   analysis. Images of immunohistochemistry staining were photographed
   under a light microscope (Leica, Wetzlar, Germany). IHC scores were
   independently evaluated by two individuals. The final score was the
   multiplication of the positive ratio value (0-3) and the intensity
   value (0-3) of the immunoreactive cells. The scores ranging from 0-4
   were defined as low expression, while the scores > 4 were considered
   high expression. Kaplan-Meier survival analysis was conducted by
   GraphPad Prism version 9.5.1.

Results

Thymoma subtype identification and differential analysis

   The 121 samples of THYM were categorized into three distinct groups:
   Group 1 (52 cases), Group 2 (39 cases), and Group 3 (30 cases). The
   chi-squared test resulted in a Q value of 10.6 and a P value of 0.005
   for the groups (Fig. [85]1A-B). The unsupervised heat map of tissue
   gene expression data indicated that this classification was
   trustworthy, with groups showing relative autonomy and strong
   correlation within each group (Fig. [86]1B). Groups 1 and 3 exhibited
   the most notable variation in survival rates, defined as the
   good-outcome and poor-outcome groups correspondingly (Fig. [87]1A).

Figure 1.

   [88]Figure 1
   [89]Open in a new tab

   Consensus clustering and different metabolic profiles between the two
   clusters. A. Survival analysis of thymoma subgroups. B Consensus matrix
   heatmap analysis. C Volcano plot of differentially expressed genes
   between subgroup 1 and 3. D Correlation analysis and GO enrichment
   analysis between differentially expressed genes and clinical
   phenotypes.

   The analysis of differential gene expression, using the transcription
   data of Groups 1 and 3, revealed 1370 differentially expressed gene.
   Among these genes, 524 were found to be downregulated while 846 were
   upregulated (Fig. [90]1C). We gathered the clinical data related to the
   samples and conducted an unsupervised cluster analysis on the
   expression levels of the top 100 genes that showed differential
   expression. The findings indicated that the genes exhibiting
   differential expression in Groups 1 and 3 could potentially be linked
   to the histological classification of the tumor and past occurrence of
   myasthenia gravis (Fig. [91]1D).

   After clustering, the genes that showed differential expression were
   categorized into three distinct clusters. Cluster 1 was mainly
   associated with GO:0002475 (antigen processing and presentation via MHC
   class Ib; P < 0.001), GO:0042492 (gamma-delta T-cell differentiation; P
   < 0.001), and GO:0033151 (V[D]J recombination; P < 0.001). Cluster 2
   was mainly associated with GO:0008544 (epidermis development; P <
   0.001), GO:0046068 (cGMP metabolic process; P <0.001), and GO:0043588
   (skin development; P < 0.001). Cluster 3 was mainly associated with
   GO:0071449 (cellular response to lipid hydroperoxide; P = 0.001),
   GO:1902691 (respiratory basal cell differentiation; P = 0.001), and
   GO:0042398 (cellular modified amino acid biosynthetic process; P =
   0.002; Fig. [92]1D).

Construction of a pathway interaction network of different subtypes and
survival analysis

   A pathway interaction network was derived from the enriched
   differentially expressed genes of subtypes 1 and 3 by GSEA (Fig.
   [93]2A). Lipid metabolism related maps and immune/inflammation related
   maps were significantly enriched (Fig. [94]2B). Significant enrichment
   was observed in neutral lipid biosynthetic process (n = 32, normalized
   enrichment score [NES] = 1.82, P < 0.001), positive regulation of
   phospholipid metabolic process (n = 44, NES = 1.65, P = 0.004), and
   positive regulation of lipid metabolic process (n = 121, NES = 1.59, P
   = 0.002) among pathways related to lipid metabolism (Fig. [95]2B).

Figure 2.

   [96]Figure 2
   [97]Open in a new tab

   Construction of the pathway interaction network of different subtypes
   and survival analysis. A, B GSEA enrichment analysis of differentially
   expressed genes. C GSVA analysis of differentially expressed genes. D
   The lipid metabolism related pathway survival analysis based on GSVA
   score.

   The unsupervised cluster heatmap based on the GSVA score showed
   significant differences between Groups 1 and 3 in the components of the
   following pathways: GOMF: lipid transporter activity; GOBP: lipid
   localization; and GOBP: neutral lipid biosynthetic process (Fig.
   [98]2C). The results suggested important roles of lipid metabolism in
   the proliferation and differentiation of the two subtypes of thymomas.
   Based on the GSVA scores of pathways, we also found that the long-term
   survival of patients with thymomas was closely associated with GOMF:
   lipid transporter activity (P = 0.027, hazard ratio [HR] = 4.16, 95%
   confidence interval [CI]: 1.03-16.78), GOBP: lipid localization (P =
   0.002, HR = Inf, 95% CI: Inf-Inf), and GOBP: neutral lipid biosynthetic
   process (P = 0.01, HR = 6.01, 95% CI: 1.55-23.28) (Fig. [99]2D).

Analysis of differential metabolic pathways

   Using the method described by Wu et al., we identified gene sets
   involved in 46 tumor metabolic pathways in the KEGG database. By GSVA
   and Limma differential analyses, we identified 12 pathways with t value
   ≥ 1.0 and B-H corrected p value < 0.05, including
   PYRIMIDINE_METABOLISM, PURINE_METABOLISM, and GLYOXYL
   ATE_AND_DICARBOXYL ATE_METABOLISM. We also identified 25 pathways with
   t valuet ≤ -1.0 and B-H corrected p value <0.05, including
   GLYCOSPHINGOLIPID_BIOSYNTHESIS_LACTO_AND_NEOLACTO_SERIES,
   ARGININE_AND_PROLINE_METABOLISM, and GLYCOSAMINOGLYCAN_DEGRADATION
   (Fig. [100]3A).

Figure 3.

   [101]Figure 3
   [102]Open in a new tab

   Differential metabolic pathway analysis and core pathway construction
   in thymoma. A Twenty-five differential metabolic pathways were
   identified via GSVA analysis. B, C, D, E Different models to construct
   key metabolic pathways (B decision tree, C random forest, D, E SHAP
   model)

Developing a thymoma machine learning classifier by analyzing distinct
metabolic pathways

   Differential analysis revealed 37 differential metabolic pathways
   important for thymoma progression.

   The linear regression analysis revealed that there were significant
   variations in GSAV scores between Cluster1 and Cluster3 across 37
   pathways. The pathway variables showed a linear correlation
   (precision-recall curves: sensitivity and specificity AUC values of
   0.586 and recall and precision AUC values of 0.593). Decision tree
   analysis suggested that
   KEGG_GLYCOSPHINGOLIPID_BIOSYNTHESIS_LACTO_AND_NEOLACTO_SERIES with a
   GSVA score of < -0.12 and KEGG_GLYCOSPHINGOLIPID
   _BIOSYNTHESIS_GLOBO_SERIES with a GSVA score of ≥ 0.12 were the most
   important pathways for the model (Fig. [103]3B). Random forest analysis
   showed that the
   KEGG_GLYCOSPHINGOLIPID_BIOSYNTHESIS_LACTO_AND_NEOLACTO_SERIES,
   KEGG_ASCORBATE_AND_ALDARATE_METABOLISM, and
   KEGG_PRIMARY_BILE_ACID_BIOSYNTHESIS pathways had the highest
   contribution to the identification of thymoma subtypes with poor
   prognosis (Fig. [104]3C). The SHAP model showed that the
   KEGG_GLYCOSPHINGOLIPID_BIOSYNTHESIS_LACTO_AND_NEOLACTO_SERIES,
   KEGG_PRIMARY_BILE_ACID_BIOSYNTHESIS, KEGG_TAURINE_AND_
   HYPOTAURINE_METABOLISM, KEGG_CITRATE_CYCLE_TCA_CYCLE, and
   KEGG_ASCORBATE_AND_ALDARATE_METABOLISM pathways were of utmost
   importance for the model (Fig. [105]3D). Hence, these algorithms
   recognized distinct crucial routes, with the decision tree, random
   forest, and SHAP models concurring that
   KEGG_GLYCOSPHINGOLIPID_BIOSYNTHESIS_LACTO_AND_NEOLACTO_SERIES held
   utmost significance among the metabolic pathways. Additionally, we
   assessed the SHAP-value of this pathway and observed that its GSVA
   score greatly impacted the classifier output (Fig. [106]3E).

Impact of the core pathway on the long-term survival of patients with thymoma

   The analysis above revealed that the glycosphingolipid biosynthesis
   lacto and neolacto series is the central regulatory pathway linked to
   the progression of thymoma. In order to examine the influence of this
   pathway on the future outlook of thymoma patients, we conducted K-M
   survival analysis using the GSVA score of this pathway. The results in
   Fig. [107]4A show that the higher the GSVA score of this pathway was,
   the worse the prognosis of the patients was, as indicated by DSS (p =
   0.024, hazard ratio = 6.47, 95% CI: 0.36-114.93), OS (p <0.001, hazard
   ratio = 10.63, 95% CI: 1.61-70.12), and, to a lesser extent, PFI (p =
   0.091, hazard ratio = Inf, 95% CI: Inf-Inf).

Figure 4.

   [108]Figure 4
   [109]Open in a new tab

   Relationship between core metabolic pathways and long-term survival in
   thymoma patients. A K-M survival analysis of glycosphingolipid
   biosynthesis lacto and neolacto series. B, C Pearson correlation
   analysis and differential expression analysis of genes in
   glycosphingolipid biosynthesis lacto and neolacto series.

   In order to comprehend the expression and interaction of genes within
   this pathway, we conducted a Pearson correlation analysis on the
   expression levels of 22 genes involved in this pathway. In general,
   there was a correlation among the genes, with strong correlations
   observed between FUT7, ST3GAL3, and ST3GAL4, as well as between B3GNT4
   and FUT1 (Fig. [110]4B). In tumor tissues, the expression of ST3GAL6
   was found to be notably reduced, whereas the expression levels of
   B3GALT2, ABO FUT3, B3GNT3, and B3GNT5 were observed to be significantly
   elevated (Fig. [111]4C).

Metabolomic profiling analysis of thymomas

   Widely targeted metabolomic profiling detected a grand total of 1758
   metabolites from the 10 samples. Hierarchical cluster analysis
   suggested that the distributions of different types of metabolites had
   certain heterogeneity among groups (Fig. [112]5A). Unsupervised
   principal component analysis (PCA) revealed significant metabolic
   heterogeneity among groups and minimal variation within groups (Fig.
   [113]5B).

Figure 5.

   [114]Figure 5
   [115]Open in a new tab

   Metabolomic analysis of thymoma samples. A Correlation analysis of
   differential metabolites in thymoma tissue samples and para-tumor
   tissue samples. B The OPLS-DA plot. C Screening of differential
   metabolites D. Interaction diagrams between different metabolites.

   The coefficient of variation (CV) is the ratio of the standard
   deviation to the mean of the original data, reflecting the degree of
   data dispersion. The proportion of substances with CV < 0.3 in the
   quality control samples was higher than 85%, which was higher than the
   standard value of 75%, indicating that the experimental data were very
   stable.

   The Q^2 of the OPLS-DA model was 0.542, higher than the acceptable
   limit of 0.5 for an effective model proposed by Thévenot et al. By the
   permutation test, there were 93 randomized models with a superior
   interpretation rate for the matrix to this OPLS-DA model (P < 0.05). A
   total of 534 differential metabolites were selected by VIP, including
   233 downregulated metabolites and 301 upregulated metabolites (Fig.
   [116]5C). A significant interaction map among different metabolites was
   detected, as shown in Fig. [117]5D.

Functional enrichment analysis of differential metabolites

   The Pearson analysis showed that different differential metabolites had
   a synergistic or incompatible relationship (i.e., metabolic proximity),
   suggesting mutual regulation of differential metabolites (Fig.
   [118]6A). The differential abundance (DA) score is a numerical
   assessment of alterations in metabolism based on KEGG pathways,
   indicating the average and overall modifications in all metabolites
   within a particular pathway. The formula for the DA score was DA score
   = (the number of upregulated differential metabolites in the pathway -
   the number of downregulated differential metabolites in the
   pathway)/the number of all metabolites annotated to the pathway. Lipid
   and atherosclerosis, cholesterol metabolism, and regulation of
   lipolysis in adipocytes showed a decreased expression trend in the
   pathological progression process of thymoma. Ether lipid metabolism,
   and synthesis and degradation of ketone bodies showed an increased
   expression trend (Fig. [119]6B).

Figure 6.

   [120]Figure 6
   [121]Open in a new tab

   KEGG analysis of differential metabolites. A Pearson correlation
   analysis of differential metabolites. B KEGG pathway-based differential
   abundance (DA) scoring. C, D Screening of significantly enriched
   pathways.

   In the KEGG pathway map related to metabolism, 30 DEMs were enriched in
   the glycerophospholipid metabolism pathway (Fig. [122]6C). Furthermore,
   the galactose metabolism, steroid biosynthesis, and endocrine
   resistance pathways exhibited significant enrichment with the richness
   factor > 0.5 (Fig. [123]6D). In this plot, the compounds lactose (VIP =
   1.023, Log2FC = 2.274, p value = 0.002), UDP-glucose (VIP = 1.915,
   Log2FC = -2.898, p value = 0.004), and UDP-D-galactose (VIP=1.915,
   Log2FC = -2.898, p value = 0.004) were identified (Fig. [124]6D).

Hub regulator detection

   In order to mitigate the impact of nonlinear gene expression on
   survival analysis using transcriptome profiling, we employed the
   restricted cubic spline model to assess the non-linear impacts of the
   continuous expression of the target genes ST3GAL6, B3GALT2, ABO, FUT3,
   B3GNT3, and B3GNT5 on survival. ABO (p for combined association =
   0.007, p for non-linear association = 0.082) and B3GNT5 (p for combined
   association = 0.004, p for non-linear association ≤ 0.001) had a
   notable influence on the overall survival (OS) of thymoma patients
   (Fig. [125]7A). The linear continuous expression of ABO and B3GNT5
   showed significant impacts on OS (ABO: p = 0.056, hazard ratio = 0.26,
   95% CI: 0.04-1.93; B3GNT5: p < 0.001, hazard ratio = 22.71, 95% CI:
   0.89-578.3), PFI (ABO: p = 0.058, hazard ratio = 0.38, 95% CI:
   0.1-1.53; B3GNT5: p = 0.015, hazard ratio = 3.01, 95% CI: 0.89-10.17),
   and DSS (ABO: p = 0.003, hazard ratio = 0.07, 95% CI: 0.01-0.94;
   B3GNT5: p < 0.001, hazard ratio = 25.3, 95% CI: 0.15-4249.41) (Fig.
   [126]7B).

Figure 7.

   [127]Figure 7
   [128]Open in a new tab

   Hub regulator detection. A Association analysis of hub gene expression
   and survival. B Long-term survival analysis of hub gene expression in
   patients with thymoma

High B3GNT5 expression associated with worse disease-free survival

   Expression of B3GNT5 and ST3GAL6 was upregulated in thymoma tissues
   than adjacent normal thymic tissues (p < 0.05) (Fig. [129]8).
   Furthermore, the clinical data showed that high expression level of
   B3GNT5 in thymoma was associated with worse disease-free survival (p <
   0.05, HR = 0.3339, 95%CI 0.1204-0.9263), while ST3GAL6 expression
   didn't influence DFS (p > 0.05, HR = 1.995, 95%CI 0.7339-5.424) (Fig.
   [130]8).

Figure 8.

   [131]Figure 8
   [132]Open in a new tab

   B3GNT5 and ST3GAL6 are upregulated in thymoma tissues and B3GNT5
   expression associates with disease-free survival.

Discussion

   Due to the marked heterogeneity of thymoma, even with the same
   histological diagnosis, the prognosis of patients tends to vary
   significantly. The conversion of healthy cells to cancerous cells is
   accompanied by various biological characteristics with metabolic
   reprogramming being the most notable, including glycolysis,
   glutamate-dependent anabolism, and lipid synthesis [133]^36^-[134]^38.
   Hence, exploring novel subtypes of tumors, especially regards to their
   metabolism, is an effective way to study the heterogeneity of thymoma,
   thereby providing insights for clinicians to make more accurate
   clinical assessments [135]^39^, [136]^40. The aim of this study was to
   identify a valuable metabolic gene signature for thymoma by examining
   the distinct metabolic genes in the TCGA database.

   Through investigation, we conducted consensus clustering on thymoma
   individuals within the database by utilizing genes associated with
   metabolism. By employing advanced bioinformatics analysis, we
   successfully pinpointed the crucial involvement of lipid metabolism in
   thymoma. Organelles contain lipids that serve as essential nutrients
   for normal cell growth and also function as components of cell
   membranes [137]^41. Research showed that the alteration of lipid
   metabolism is vital in membrane synthesis, energy production, and
   signal transduction during cancer cell progression. Tumor cells rely on
   lipid metabolism to fuel their energy needs, support cell growth,
   produce signaling molecules, and prioritize lipid synthesis for rapid
   proliferation [138]^42^-[139]^44. The alteration of lipid metabolism is
   strongly associated with worse tumor prognosis, evidenced in different
   types of tumors including pancreatic cancer, breast cancer, and
   non-small cell lung cancer [140]^45^-[141]^47. Elevated lipid levels
   can enhance the metastatic potential of cancer cells and contribute to
   drug resistance. Adjust the sentence structure, delete unnecessary
   words, and use synonyms to maintain the meaning [142]^48^, [143]^49.

   Creation of a machine learning classifier helped us to find
   glycosphingolipid biosynthesis, particularly the lacto and neolacto
   series, played an important role in thymoma progression. This
   classification model helped us to identify two key enzymes, namely
   B3GNT5 and ST3GAL6, that play crucial roles in this process
   [144]^50^-[145]^53. Glycosphingolipids (GSLs) are highly varied and
   plentiful glycolipids that exist on the outer layer of the cell
   membrane in various living organisms. They are a crucial part of the
   lipid composition of the plasma membrane in most eukaryotic cells.
   Their involvement includes processes of cell‒cell recognition and
   regulation of signals through the regulation of membrane microdomains
   and proteins associated with the membrane [146]^54^, [147]^55. The
   glycan composition present on the cell surface is a major determinant
   of the structural and functional classification of GSL, and alterations
   in GSL glycosylation are associated with stem cell differentiation and
   contribute to a variety of cancer processes, such as persistent tumor
   cell proliferation, promotion of tumor cell metastasis [148]^56^,
   [149]^57. The core glycans of GSLs are categorized into three primary
   groups: the ganglio-series, globo-series, and lacto/neolacto series.
   Glycosphingolipids from the lacto/neolacto series function as carriers
   or essential constituents of numerous glycan antigens, which have
   demonstrated significant involvement in various types of cancer
   [150]^58. B3GNT5, an essential enzyme involved in the production of
   lactate and the glycosphingolipids of the lactate series, is crucial in
   the development of certain cancerous conditions like ovarian cancer and
   glioblastoma [151]^59^, [152]^60. A recent study showed enhanced
   enzymatic function of B3GNT5 causes the buildup of a fresh lactate
   sequence of glycosphingolipids in cancer cells, impeding immune
   monitoring [153]^9. ST3GAL6 belongs to sialyltransferase family,
   involving in synthesis of glycolipid substrates, abnormalities of which
   are associated with cancer development, cell adhesion, invasion and
   metastasis [154]^61. ST3GAL6 is excessively expressed in different
   types of cancers. For instance, it stimulates the growth and invasion
   of hepatocellular carcinoma and colon cancer cells through the PI3K/AKT
   signaling pathway. Additionally, its overexpression enhances the
   ability of gastric cancer cells to inhibit the resistance of the Met
   tyrosine kinase receptor to crizotinib treatment [155]^62.
   Nevertheless, the results from the TCGA repository indicate a
   significant correlation between increased B3GNT5 levels and unfavorable
   prognosis, whereas ST3GAL6 acts as a gene that provides protection.
   This result was validated in our tissue samples but remains to be
   further elucidated.

   To summarize, our examination revealed two molecular categories linked
   to metabolism in thymoma, and we investigated the metabolic routes and
   crucial genes implicated in thymoma.The results of our study provide
   new insights into the classification of thymoma and emphasize its
   involvement in the heterogeneity of tumors related to metabolism.
   However, further validation requires larger sample sizes.

Conclusions

   This study revealed a strong correlation between the glycosphingolipid
   biosynthesis pathway and thymoma. B3GNT5 can be used as potential
   biomarkers to predict better prognosis of thymoma patients.

Acknowledgments