Abstract

Background

   Obesity has become a growing global public health concern over recent
   decades. Its prevalence varies substantially worldwide, ranging from
   less than 5% in regions such as China, Japan, and parts of Africa to
   more than 75% in urban areas of Samoa.

Aim

   To examine the involvement of metabolism-related genes in obesity.

Methods

   Gene expression datasets GSE110729 and GSE205668 were obtained from
   the GEO database. Differentially expressed genes (DEGs) between obese
   and lean groups were identified using DESeq2. Metabolism-related genes
   and pathways were detected using enrichment analysis, WGCNA, Random
   Forest, and XGBoost. The identified signature genes were validated by
   real-time quantitative PCR (qRT-PCR) in mouse models.

Results

   A total of 389 differentially expressed genes were identified, showing
   significant enrichment in metabolic pathways, particularly the
   propanoate metabolism pathway. Weighted Correlation Network Analysis
   (WGCNA) identified the orangered4 module as the module most strongly
   correlated with propanoate metabolism. By integrating the DEGs, the
   WGCNA results, and machine learning methods, two metabolism-related
   genes were identified: Storkhead Box 1 (STOX1) and NACHT and WD repeat
   domain-containing protein 2 (NWD2). These signature genes
   distinguished obese from lean individuals, and qRT-PCR confirmed the
   downregulation of STOX1 and NWD2 in mouse models of obesity.

Conclusion

   This study analyzed publicly available GEO datasets to identify novel
   factors associated with obesity-related metabolism and found that
   STOX1 and NWD2 may serve as diagnostic biomarkers.

Supplementary Information

   The online version contains supplementary material available at
   10.1186/s12967-024-05615-8.

   Keywords: Bioinformatics analysis, Machine learning, Metabolism,
   Differentially expressed genes (DEGs), Biomarkers

Introduction

   In recent decades, economic development has made obesity a major
   global public health issue. Obesity prevalence varies significantly
   worldwide, ranging from less than 5% in regions such as China, Japan,
   and parts of Africa to more than 75% in urban areas of Samoa [1]. In
   both developing and developed countries, obesity can affect the
   overall health of the population. Moreover, obesity is closely
   associated with a variety of non-communicable diseases, such as
   cardiovascular diseases, diabetes, musculoskeletal disorders, and
   specific types of cancer [2]. Obesity is a multifactorial condition,
   with increased risk tied to environmental, socioeconomic, and
   demographic factors. Overweight and obesity have also been linked to
   age, gender, socioeconomic status, and urban versus rural residence
   [3].

   Obesity is commonly classified using the body mass index (BMI), which
   is calculated by dividing an individual's weight in kilograms by the
   square of their height in meters, giving a value in kg/m^2 [4]. In a
   systematic review of prevalence data, individuals with a BMI ≥ 30
   kg/m^2 were commonly classified as obese, and within this
   classification the reported prevalence of metabolically healthy
   obesity (MHO) ranged from 10 to 51% [5]. Phillips and colleagues
   further reported that the prevalence of metabolic health among obese
   individuals ranged from 6.8 to 36.6%, while the prevalence of
   metabolically unhealthy status among non-obese individuals ranged from
   21.8 to 87% [6].
   Additionally, obesity is a major contributor to hypertension, and
   metabolic abnormalities are closely associated with the severity of
   hypertension and the risk of target organ damage. Disturbances in body
   composition and the presence of visceral obesity significantly
   influence metabolic risk factors [7]. Obesity is closely linked with
   various cardiometabolic disorders, such as type 2 diabetes (T2D),
   dyslipidemia, hypertension, and coronary artery disease. However, the
   mechanisms that distinguish obesity from these common cardiometabolic
   complications remain incompletely understood. Alongside current
   research, genome-wide association studies can reveal new genetic
   factors and pathways [8, 9].
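   The BMI definition above can be illustrated with a short Python
   sketch. It computes BMI and applies the BMI ≥ 30 kg/m^2 cutoff. The
   function names and the example weight and height are hypothetical and
   are for illustration only.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def is_obese(bmi_value: float, cutoff: float = 30.0) -> bool:
    """Apply the commonly used cutoff BMI >= 30 kg/m^2."""
    return bmi_value >= cutoff


# Hypothetical example: 95 kg at 1.75 m gives 95 / 1.75**2, about 31.0 kg/m^2.
value = bmi(95.0, 1.75)
print(round(value, 1), is_obese(value))
```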

   The main goal of this research is to clarify how metabolism-related
   genes are involved in obesity-linked metabolic diseases. We further
   sought to evaluate their expression in mouse models of obesity.

Materials and methods

Data collection

   The GSE110729 and GSE205668 datasets were obtained from the Gene
   Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/).
   The GSE110729 expression profiles included 28 subjects (15 lean and 13
   obese) and served as the training dataset. The GSE205668 expression
   profiles included 61 subjects (35 lean and 26 obese) and served as the
   validation dataset. GSE205668 is a bulk RNA-Seq analysis of adipose
   samples collected during routine surgeries as part of the Leipzig
   Childhood Study in Germany, whereas GSE110729 is a bulk RNA-Seq study
   of adult subjects in the United States.

Differential analysis

   The "DESeq2" package in R was used to perform differential analysis
   on the GSE110729 dataset. Differentially expressed genes (DEGs) were
   identified using the screening criteria padj ≤ 0.05 and |logFC| > 1.
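   The screening rule above can be sketched as follows. By default,
   DESeq2 reports Benjamini–Hochberg adjusted p-values in its padj
   column. This Python sketch re-implements that adjustment and applies
   padj ≤ 0.05 and |logFC| > 1 to a small table of log2 fold changes and
   raw p-values. The gene names and values are hypothetical. The sketch
   is not the DESeq2 pipeline itself, which also performs normalization,
   dispersion estimation, and independent filtering.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2fc, pvalues, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Split genes passing padj <= cutoff and |log2FC| > cutoff into up/down lists."""
    padj = benjamini_hochberg(pvalues)
    up, down = [], []
    for gene, lfc, q in zip(genes, log2fc, padj):
        if q <= padj_cutoff and abs(lfc) > lfc_cutoff:
            (up if lfc > 0 else down).append(gene)
    return up, down


# Hypothetical input: positive log2FC means higher expression in the obese group.
genes = ["GENE1", "GENE2", "GENE3", "GENE4"]
log2fc = [2.1, -1.5, 0.4, -3.0]
pvalues = [0.001, 0.004, 0.0005, 0.2]
up, down = select_degs(genes, log2fc, pvalues)
print("up:", up, "down:", down)
```

   GENE3 is excluded by the fold-change cutoff despite its small p-value.
   GENE4 is excluded because its adjusted p-value exceeds 0.05.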

GO and KEGG pathway enrichment analyses

   Gene Ontology (GO) enrichment analysis is a widely used bioinformatics
   technique for extracting biological insight from large genomic
   datasets. GO terms are organized into three categories: Biological
   Process (BP), Cellular Component (CC), and Molecular Function (MF).
   Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment
   analysis is another common approach for interpreting biological
   processes and functions, and it was also performed. KEGG enrichment
   analysis was conducted with the "clusterProfiler" package. GO
   enrichment analysis was conducted with the "Metascape" database.

WGCNA

   Weighted Correlation Network Analysis (WGCNA) is a genomics method for
   discovering clusters of highly correlated genes. Using the WGCNA R
   package, we constructed a co-expression network from the 5000 genes
   with the highest expression variance. Such a network supports
   network-based screening for potential biomarkers or therapeutic
   targets. A weighted gene network is first built from the correlations
   between gene expression patterns. Gene modules are then identified
   through hierarchical clustering, so that genes with similar expression
   patterns are grouped into the same module. In this way, tens of
   thousands of genes can be divided into a manageable number of modules,
   with correlation coefficients serving as the key similarity measure.

GSVA

   We utilized Gene Set Variation Analysis (GSVA) to evaluate the activity
   of biological pathways in our gene expression dataset. GSVA is an
   unsupervised, non-parametric method that calculates pathway scores for
   each sample. This approach enables a comprehensive and data-driven
   exploration of changes at the pathway level, facilitating the discovery
   of biologically significant insights within our dataset.

   First, metabolism-related gene sets were obtained from the Molecular
   Signatures Database (MSigDB; gsea-msigdb.org). The GSVA package in R
   was then used with the single-sample gene set enrichment analysis
   (ssGSEA) method to compute metabolic scores. The limma package was
   used for differential analysis to identify significant KEGG pathways.
   Metabolic scores for the major KEGG pathways were used as
   sample-specific traits. A WGCNA network was then constructed from mRNA
   expression data to identify the module genes most strongly associated
   with metabolism, and the underlying molecular mechanisms were studied
   further.
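   The per-sample metabolic score can be illustrated with a simplified
   Python sketch of the single-sample GSEA (ssGSEA) statistic. It assumes
   the statistic's original formulation:

   - Genes are ranked by expression within one sample.
   - A running sum over genes in the set, weighted by rank^α (here
     α = 0.25), is compared with an unweighted running sum over genes
     outside the set.
   - The differences are summed along the ranked list.

   The GSVA package's ssgsea method may also normalize scores and handle
   ties differently, so its values would not be identical. The sample
   values and gene sets are hypothetical.

```python
def ssgsea_score(sample, gene_set, alpha=0.25):
    """Unnormalized ssGSEA enrichment score for one sample.

    sample:   dict mapping gene -> expression value in that sample
    gene_set: set of gene names (e.g. one metabolic pathway)
    """
    # Rank genes from highest to lowest expression; the top gene gets rank N.
    ordered = sorted(sample, key=sample.get, reverse=True)
    n = len(ordered)
    rank_value = {g: n - i for i, g in enumerate(ordered)}
    hits = [g for g in ordered if g in gene_set]
    if not hits or len(hits) == n:
        raise ValueError("gene set must be a non-empty proper subset of measured genes")
    total_weight = sum(rank_value[g] ** alpha for g in hits)
    n_miss = n - len(hits)

    p_hit = p_miss = score = 0.0
    for g in ordered:
        if g in gene_set:
            p_hit += rank_value[g] ** alpha / total_weight  # weighted ECDF of set genes
        else:
            p_miss += 1.0 / n_miss                           # ECDF of remaining genes
        score += p_hit - p_miss                              # accumulate the difference
    return score


# Hypothetical sample and pathway gene sets.
sample = {"g1": 10.0, "g2": 9.0, "g3": 8.0, "g4": 1.0, "g5": 0.5}
print(ssgsea_score(sample, {"g1", "g2"}))  # set genes near the top: positive score
print(ssgsea_score(sample, {"g4", "g5"}))  # set genes near the bottom: negative score
```

   Repeating this for every sample and every pathway yields the
   sample-by-pathway score matrix that is then compared between groups.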

Identify metabolism-associated signature genes of obesity by machine learning

   eXtreme Gradient Boosting (XGBoost) and Random Forest (RF) were
   applied to identify metabolism-related signature genes of obesity.
   XGBoost is a powerful and versatile machine learning algorithm, widely
   used across fields because of its strong predictive performance. It
   is an ensemble learning technique built on decision trees. It reduces
   prediction error by iteratively adding weak learners to the model,
   thereby boosting overall accuracy. The algorithm has proven
   particularly effective for high-dimensional datasets and complex
   relationships among variables. We implemented XGBoost through the
   "caret" package in R for both feature selection and classification.
   The RF algorithm is an ensemble method that combines many decision
   trees into a single prediction. Each tree is trained on a bootstrap
   sample drawn from the original dataset (bagging) and considers a
   randomly selected subset of features when splitting. The predictions
   of the individual trees are then aggregated by voting, and the class
   receiving the most votes is taken as the prediction. Here, we applied
   the RF algorithm from the "randomForest" package in R to identify
   metabolism-related signature genes.
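   The bagging and voting steps described above can be sketched in
   Python. This is a simplified illustration, not the randomForest
   package. Each "tree" here is a one-split decision stump. A real random
   forest grows full trees and samples a random subset of mtry candidate
   features at every split. With a single split, sampling features per
   tree is equivalent. The function names and the toy data are
   hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """Pick the (feature, threshold, leaf labels) with the fewest training errors."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            for left_label, right_label in ((0, 1), (1, 0)):
                errors = sum((left_label if row[f] <= t else right_label) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label


def fit_forest(X, y, n_trees=25, mtry=1, seed=0):
    """Train each stump on a bootstrap sample using a random subset of features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample rows with replacement
        features = rng.sample(range(p), mtry)       # random feature subset
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return trees


def predict(trees, row):
    """Majority vote over all trees."""
    votes = Counter(tree(row) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical two-gene expression profiles: label 0 = lean, 1 = obese.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
y = [0, 0, 0, 1, 1, 1]
trees = fit_forest(X, y)
print(predict(trees, [0.05, 0.05]), predict(trees, [0.95, 0.95]))
```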

   The genes identified by the machine learning methods above were
   regarded as metabolism-related signature genes of obesity. Their
   expression levels were assessed in both the training set (GSE110729)
   and the testing set (GSE205668). To assess the predictive accuracy of
   these signature genes, we generated receiver operating characteristic
   (ROC) curves using the "pROC" package in R.

   The ROC curve plots sensitivity (the true positive rate) against
   1 − specificity (the false positive rate) across classification
   thresholds. A more accurate classifier reaches high sensitivity at a
   low false positive rate. Its curve therefore lies closer to the
   upper-left corner, and its area under the curve (AUC) is closer to 1.
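   The link between the ROC curve and the AUC can be made concrete with
   a Python sketch. It sweeps a decision threshold over hypothetical
   classifier scores (label 1 = obese, 0 = lean) and records (false
   positive rate, true positive rate) points. It then integrates those
   points with the trapezoidal rule.

   The sketch also computes the AUC a second way: as the probability
   that a randomly chosen positive scores higher than a randomly chosen
   negative (the Mann–Whitney formulation). Both give the same value.

   For a gene whose expression is lower in obese samples, one would
   negate its expression before using it as a score. This is an
   illustration only, not the pROC implementation, and confidence
   intervals are omitted.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points as the threshold moves from the highest score down."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pairs = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples tied at this score are classified together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


def auc_mann_whitney(scores, labels):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores: label 1 = obese, 0 = lean.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(auc_trapezoid(roc_points(scores, labels)), auc_mann_whitney(scores, labels))
```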

Animals

   Thirteen-week-old male C57BL/6J (WT), ob/ob, and db/db mice were
   purchased from Shulaibao (Wuhan) Biotechnology Co., Ltd. The mice were
   housed in temperature-controlled cages under a 12-h light–dark cycle
   for 4 weeks, with free access to water and a standard chow diet (CD).
   In a separate cohort, six-week-old male C57BL/6J mice were fed a
   high-fat diet (HFD, D12492, Research Diets, New Brunswick, NJ, USA)
   for 12 weeks. All mice in this study were on a C57BL/6J genetic
   background. Mice were euthanized by intraperitoneal injection of
   ketamine (100 mg/kg) and xylazine (10 mg/kg), followed by cervical
   dislocation. White adipose tissue was then harvested for further
   processing and analysis. All experimental procedures were approved by
   the Animal Experimental Ethics Committee of Chongqing Medical
   University and were conducted in compliance with applicable
   guidelines and regulations.

Real-time quantitative PCR (qRT-PCR)

   Total RNA was extracted using the RNeasy Mini Kit (Qiagen, Germany)
   according to the manufacturer's protocol. cDNA was synthesized using
   Qiagen's quantitative reverse transcription kit according to the
   manufacturer's instructions. Target genes were quantified using the
   FastStart Universal SYBR Green Master kit, and amplification was
   performed on a LightCycler 480 system. Relative target mRNA abundance
   was calculated with the delta-delta Ct (ΔΔCt) method, normalized to an
   internal control. The PCR conditions were an initial denaturation at
   95 °C for 5 min, followed by 40 cycles of denaturation at 95 °C for
   10 s and annealing at 60 °C for 60 s, and a final extension at 72 °C
   for 30 s.
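   The ΔΔCt calculation can be sketched in Python, assuming the standard
   2^(−ΔΔCt) formulation:

   - ΔCt is the target Ct minus the internal-control Ct in the same
     sample.
   - ΔΔCt subtracts the mean ΔCt of the control group (for example, WT
     mice).
   - The fold change relative to that group is 2^(−ΔΔCt).

   The text does not name the internal control gene, so the Ct values
   below are hypothetical.

```python
from statistics import mean


def delta_ct(ct_target, ct_reference):
    """ΔCt: target-gene Ct minus internal-control Ct in the same sample."""
    return ct_target - ct_reference


def fold_change(ct_target, ct_reference, control_delta_cts):
    """Relative expression 2^(-ΔΔCt) versus the mean ΔCt of the control group."""
    delta_delta_ct = delta_ct(ct_target, ct_reference) - mean(control_delta_cts)
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: three control (WT) mice and one obese mouse.
control = [delta_ct(25.0, 18.0), delta_ct(24.8, 18.0), delta_ct(25.2, 18.0)]
obese = fold_change(26.0, 18.0, control)
print(obese)  # a value below 1 means lower expression than in the control group
```

   Because each extra cycle roughly doubles the product, a target that
   crosses the threshold one cycle later than in controls, after
   normalization, corresponds to about half the relative mRNA level.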

Data and statistical analysis

   Statistical analyses were performed using GraphPad Prism 6.01
   (GraphPad Software Inc., San Diego, CA, USA). Data are presented as
   mean ± standard error of the mean (SEM). Quantitative values were
   compared using two-way analysis of variance. Tukey's honest
   significant difference (HSD) test was used for subsequent pairwise
   comparisons [10].

Results

Identification of DEGs related to obesity

   Data from the GSE110729 dataset were retrieved from the Gene
   Expression Omnibus (GEO) database
   (https://www.ncbi.nlm.nih.gov/gds/). Principal component analysis
   (PCA) was performed with the "FactoMineR" package to assess dataset
   variance, and the results were visualized with the "ggplot2" package
   (Fig. 1C). To identify differentially expressed genes (DEGs), the 28
   samples were divided into 13 obese and 15 lean samples. DEG analysis
   was performed using the "DESeq2" package with the screening criteria
   padj < 0.05 and |logFC| > 1, and the resulting genes were sorted by
   ascending p-value. The 40 most significant genes, comprising 20
   up-regulated and 20 down-regulated genes, were selected. Hierarchical
   clustering analysis using all DEGs revealed distinct expression
   differences between the two groups, as shown in the heatmap
   (Fig. 1A). In addition, the volcano plot generated with "ggplot2"
   shows that most of the differential genes in the dataset are
   downregulated (Fig. 1B).

Fig. 1.

   Identification of differentially expressed genes (DEGs) and
   functional enrichment analysis. A Heatmap of 40 DEGs identified using
   the "DESeq2" package. Samples from the GSE110729 dataset are shown in
   columns and genes in rows. Gray squares indicate lean samples, and red
   squares indicate obese samples. DEGs: differentially expressed genes.
   B Volcano plot of DEGs, with red indicating high expression and blue
   indicating low expression. C Principal component analysis (PCA) used
   for quality control, with each point representing a sample. Blue
   points represent lean samples, and red points represent obese samples.
   D GO enrichment analysis of DEGs performed with Metascape. E KEGG
   enrichment analysis of DEGs performed with clusterProfiler

GO and KEGG enrichment analysis of DEGs

   To investigate the biological characteristics of the DEGs, enrichment
   analysis was performed using the Metascape database [11]. In the BP
   category, DEGs were enriched in processes such as "cellular
   glucuronidation," "uronic acid metabolic process," and "glucuronate
   metabolic process." This suggests that metabolic reactions may be
   important in obese populations. In the CC category, the terms
   "spindle," "collagen-containing extracellular matrix," and "condensed
   chromosome" were significantly enriched. We therefore hypothesize that
   the DEGs may influence chromosomal activities. The MF terms associated
   with DEGs included "glucuronosyltransferase activity," "cytokine
   activity," and "receptor ligand activity" (Fig. 1D). To investigate
   the underlying roles of the DEGs, KEGG pathway enrichment analysis was
   performed using the "clusterProfiler" package. The DEGs were
   significantly enriched in the "Porphyrin metabolism," "Ascorbate and
   aldarate metabolism," and "Pentose and glucuronate interconversions"
   pathways (Fig. 1E).

GSVA reveals the potential function of differential genes

   We obtained pathway scores for each sample through GSVA. Differential
   analysis was then performed using the limma package with a threshold
   of P < 0.01 to identify differential metabolic pathways. The top
   differential metabolism-related GSVA terms were "propanoate
   metabolism," "porphyrin and chlorophyll metabolism," and "galactose
   metabolism" (Additional file 1: Fig. S1A). The volcano plot shows that
   metabolic pathways such as "propanoate metabolism" were downregulated
   in obese individuals compared with lean individuals. Conversely,
   "porphyrin and chlorophyll metabolism" and "galactose metabolism" were
   upregulated in obese individuals compared with their lean
   counterparts (Additional file 1: Fig. S1B).

WGCNA of gene expression profiles in obese and lean individuals

   WGCNA detects disease-related modules characterized by coordinated
   expression patterns, which substantially aids the identification of
   hub genes. To construct the gene co-expression network, all 28
   samples in the GSE110729 dataset were used to build a hierarchical
   clustering tree. A soft-threshold power of 24 was selected for the
   GSE110729 dataset using the pickSoftThreshold function of the WGCNA
   package (Fig. 2A). Modules with highly similar eigengenes were merged
   using the dynamic hybrid tree-cutting method. This yielded 10 distinct
   gene modules, represented by different colors. The
   plotEigengeneNetworks function of the WGCNA package was used to
   visualize and analyze the eigengene networks. The eigengene adjacency
   heatmap illustrates the correlation structure among modules. It
   reveals strong interconnectivity between specific eigengenes, such as
   MEmaroon and MEmediumpurple4, as well as MEdarkorange2 and MEbrown2
   (Fig. 2B). The gray module consisted of genes that could not be
   assigned to any module (Fig. 2C). The heatmap shows that the
   orangered4 module had the strongest correlation with propanoate
   metabolism among the three metabolic pathways (Fig. 2D). To evaluate
   the potential function and mechanism of propanoate metabolism, we
   designated the orangered4 module, which showed the highest correlation
   with propanoate metabolism, as the hub module. The genes in this hub
   module were subjected to GO and KEGG enrichment analysis using the
   "clusterProfiler" package.

Fig. 2.

   Weighted Gene Co-expression Network Analysis (WGCNA) in the GSE110729
   cohort. A Scale independence and mean connectivity for soft-threshold
   power selection. B Hierarchical clustering dendrogram of module
   eigengenes with color labels. C Gene dendrogram and modules before and
   after merging in the GSE110729 cohort. D Correlation analysis of
   merged modules with metabolic pathways in the GSE110729 cohort. E
   Intersection of the obesity metabolism-related genes obtained from
   WGCNA with the set of differentially expressed genes (DEGs) in the
   GSE110729 cohort

GO and KEGG enrichment analysis of metabolism-related genes

   In the BP category, metabolism-related genes were enriched in
   "carboxylic acid catabolic process," "organic acid catabolic process,"
   and "cellular amino acid catabolic process." In the CC category, the
   terms "Golgi apparatus cisterna," "cis-Golgi network," and "Golgi
   apparatus cisterna membrane" were significantly enriched. In the MF
   category, metabolism-related differentially expressed genes were
   associated with terms such as "vitamin binding," "ligase activity,"
   and "pyridoxal phosphate binding" (Additional file 1: Fig. S2A). KEGG
   pathway enrichment analysis with the "clusterProfiler" package showed
   that these genes were significantly enriched in "Valine, leucine and
   isoleucine degradation," "Propanoate metabolism," and "Glyoxylate and
   dicarboxylate metabolism" (Additional file 1: Fig. S2B).

Construction and evaluation of machine learning models

   As shown in the Venn diagram (Fig. 2E), the metabolism-related
   orangered4 module obtained from WGCNA was intersected with the set of
   DEGs, yielding 14 metabolism-related DEGs. Two machine learning models
   were then constructed: Random Forest and XGBoost. In the XGBoost
   model, gene importance was evaluated using variable importance values,
   and genes with values greater than or equal to the median were
   selected (Additional file 1: Fig. S3A). In the Random Forest model,
   gene importance was evaluated using MeanDecreaseGini values, and genes
   with values ≥ 1 were selected (Additional file 1: Fig. S3B). Using the
   two machine learning models, two key genes were identified among the
   14 metabolism-related DEGs (Additional file 1: Fig. S3C). We evaluated
   the diagnostic performance of the model and of the two signature genes
   using ROC curves. The model achieved an area under the curve (AUC) of
   1 (95% confidence interval [CI] 1–1) in the training dataset and 0.799
   (95% CI 0.727–1) in the validation dataset (Fig. 3A, D). For Storkhead
   Box 1 (STOX1), the AUC was 0.877 (95% CI 0.727–1) in the training set
   and 0.791 (95% CI 0.666–0.917) in the validation set. For NACHT and WD
   repeat domain-containing protein 2 (NWD2), the AUC was 0.908 (95% CI
   0.779–1) in the training set and 0.803 (95% CI 0.681–0.925) in the
   validation set (Fig. 3B, E). Expression differences of the obesity
   signature genes between lean and obese individuals are shown in
   Fig. 3C, F. NWD2 and STOX1 were significantly lower in obese
   individuals than in lean individuals (P < 0.05).
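   The gene-selection rule described above can be sketched as set
   operations. The rule has three parts: XGBoost importance at or above
   the median, Random Forest MeanDecreaseGini ≥ 1, and then the final
   signature. The sketch assumes the final signature is the overlap of
   the genes selected by both models. The gene names and importance
   values are hypothetical and do not reproduce the study's results; they
   only show how the two cutoffs combine.

```python
from statistics import median


def select_by_median(importance):
    """Keep genes whose importance is greater than or equal to the median."""
    cutoff = median(importance.values())
    return {gene for gene, value in importance.items() if value >= cutoff}


def select_by_threshold(importance, threshold=1.0):
    """Keep genes whose importance is at least a fixed threshold."""
    return {gene for gene, value in importance.items() if value >= threshold}


# Hypothetical importance scores for four candidate genes.
xgb_importance = {"GENE_A": 0.42, "GENE_B": 0.31, "GENE_C": 0.12, "GENE_D": 0.05}
rf_mean_decrease_gini = {"GENE_A": 1.8, "GENE_B": 1.1, "GENE_C": 1.2, "GENE_D": 0.3}

# Signature genes: selected by both models (set intersection).
signature = select_by_median(xgb_importance) & select_by_threshold(rf_mean_decrease_gini)
print(sorted(signature))
```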

Fig. 3.

   Expression differences and ROC curves of obesity signature genes in
   GSE110729 (training set) and GSE205668 (validation set). ROC curves
   evaluating the diagnostic efficacy of the Random Forest model in
   GSE110729 (A) and GSE205668 (D). ROC curves of the two obesity
   signature genes in GSE110729 (B) and GSE205668 (E). Expression
   differences of obesity signature genes between lean and obese
   individuals in GSE110729 (C) and GSE205668 (F)

Distinct metabolic subtypes in obesity

   To investigate the predictive potential of metabolic genes in the 28
   obese and lean individuals, we performed unsupervised clustering
   using the R package "ConsensusClusterPlus". Two distinct subtypes were
   identified, with optimal cluster stability at k = 2 (Fig. 4A, B).
   Notably, the expression of metabolism-related genes differed markedly
   between the two clusters (Fig. 4C). Principal component analysis
   further confirmed that the expression levels of 28 metabolism-related
   molecules clearly distinguished the two clusters (Fig. 4D). Moreover,
   GO, KEGG, and GSEA analyses showed a strong association between the
   subtypes identified by consensus clustering and metabolic pathways
   (Fig. 4E–G). The lollipop plots show strong correlations between the
   two key genes, NWD2 and STOX1, and metabolic pathways (Fig. 4H, I).
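   Consensus clustering summarizes how often two samples fall in the
   same cluster across repeated clusterings of resampled data.
   ConsensusClusterPlus builds a consensus matrix of this kind, and the
   CDF of its entries is used to judge stability for each k. The Python
   sketch below computes consensus values from a few hypothetical
   subsampled clustering runs. It assumes the per-run cluster labels are
   already available and does not reproduce the package's resampling or
   clustering steps.

```python
from itertools import combinations


def consensus_matrix(samples, runs):
    """Fraction of runs in which each pair co-clusters, among runs sampling both.

    runs: list of dicts mapping each subsampled sample to its cluster label
          (labels only need to be consistent within a single run).
    """
    together, sampled = {}, {}
    for labels in runs:
        for a, b in combinations(samples, 2):
            if a in labels and b in labels:
                sampled[(a, b)] = sampled.get((a, b), 0) + 1
                if labels[a] == labels[b]:
                    together[(a, b)] = together.get((a, b), 0) + 1
    return {pair: together.get(pair, 0) / n for pair, n in sampled.items()}


def empirical_cdf(values, x):
    """Share of consensus values less than or equal to x."""
    return sum(v <= x for v in values) / len(values)


# Hypothetical clustering runs on subsamples of four individuals (k = 2).
samples = ["s1", "s2", "s3", "s4"]
runs = [
    {"s1": 1, "s2": 1, "s3": 2, "s4": 2},
    {"s1": 1, "s2": 1, "s3": 2},
    {"s2": 2, "s3": 1, "s4": 1},
    {"s1": 2, "s2": 2, "s4": 1},
]
m = consensus_matrix(samples, runs)
print(m, empirical_cdf(list(m.values()), 0.5))
```

   A stable k gives consensus values close to 0 or 1, so the CDF is flat
   in the middle, as in this toy example.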

Fig. 4.

   Identification and expression analysis of obesity metabolism-related
   gene clusters, enrichment analysis of subtypes C1 and C2, and
   correlation analysis of NWD2 and STOX1 with metabolic pathways. A
   Consensus clustering matrix at k = 2. B Representative CDF curve. C
   PCA of the C1 and C2 gene clusters. D Expression of the C1 and C2 gene
   clusters. E GO analysis. F KEGG analysis. G GSEA. H Correlation
   analysis of NWD2 with metabolic pathways. I Correlation analysis of
   STOX1 with metabolic pathways

Expression of STOX1 and NWD2 in mice

   To investigate the potential relationship between STOX1 and NWD2 and
   metabolic processes, we analyzed their expression in the white
   adipose tissue of wild-type (WT), ob/ob, db/db, and high-fat diet
   (HFD) mice. qRT-PCR showed that STOX1 and NWD2 mRNA levels were
   significantly lower in obese mice than in lean mice (p < 0.05;
   Fig. 5). These results indicate that STOX1 and NWD2 could potentially
   act as biomarkers of obesity-associated metabolism.

Fig. 5.

   Expression of STOX1 and NWD2 in white adipose tissue of mice. A NWD2.
   B STOX1. Groups: C57BL/6J (WT), db/db, HFD, and ob/ob

Discussion

   Adipokines are signaling molecules secreted by adipose tissue. They
   play crucial roles in energy balance, metabolism, inflammation, and
   immune function, and thereby help maintain metabolic homeostasis.
   Further research on adipokines and their interactions with other
   tissues and organs may provide valuable insights into the pathogenesis
   of obesity and associated diseases. It may also reveal potential
   therapeutic targets for managing these conditions.

   At the genetic level, loci near IRS1 have been found to harbor alleles
   associated with increased body fat yet a favorable cardiometabolic
   risk profile. Several genes related to insulin signaling (ADCY5,
   CCCDC9, MTOR, RAC1), energy expenditure (IGF2BP2), and inflammation
   (SH2B3, ADCY9) may serve as therapeutic targets for mitigating the
   cardiometabolic risks linked with obesity [12].

   In the current study, DESeq2 was used to compare gene expression
   between obese and lean subjects and to identify differentially
   expressed genes (DEGs). A comprehensive enrichment analysis was then
   carried out to explore the biological pathways linked to these DEGs.
   It showed that they were primarily enriched in "cellular
   glucuronidation" and "glucuronate metabolic process". Machine learning
   techniques were then used to identify STOX1 and NWD2 as genes
   functionally related to metabolism. Consensus clustering based on the
   expression of STOX1 and NWD2 identified two distinct molecular
   subtypes. The accuracy of the random forest model was evaluated using
   ROC curves. The expression of the two genes was verified by qRT-PCR in
   white adipose tissue from obese mouse models. Interestingly, both
   genes were expressed at relatively low levels in obese individuals,
   implying that STOX1 and NWD2 may be valuable biomarkers for
   obesity-related diseases. Furthermore, the correlation plots
   demonstrate a strong correlation between the two key genes and
   metabolic pathways. GSEA of the DEGs between the subtypes identified
   by consensus clustering based on NWD2 and STOX1 revealed a significant
   association with metabolic pathways.

   STOX1 is a winged-helix transcription factor that shares structural
   and functional similarities with the forkhead transcription factors
   [13]. STOX1 was initially described as having six isoforms (A, B, C,
   D, E, and F) generated by alternative splicing. Among these, STOX1A
   and STOX1B have been studied most extensively. STOX1A is the
   full-length isoform and contains both a DNA-binding domain and a
   transactivation domain, whereas STOX1B contains only the DNA-binding
   domain [14]. Functional studies of STOX1 have largely focused on its
   involvement in biological processes [15] such as the cell cycle [16],
   early development [17], and regulation of oxidative stress [18].
   STOX1 also has important implications in various diseases. The STOX1
   gene encodes a cytoplasmic protein predominantly associated with fetal
   development and the regulation of maternal blood pressure [19]. Yi Xu
   et al. demonstrated that STOX1 overexpression modulates genes involved
   in hypoxia, redox balance, carbon monoxide, and energy metabolism,
   thereby playing a significant role in pulmonary artery remodeling.
   Additionally, STOX1 has been shown to promote mitotic entry and
   proliferation of inner ear epithelial cells. It has also been shown
   to inhibit cerebellar granule neurogenesis and the formation of
   neural tube cell tumors [20].

   NWD2 is a NOD-like receptor (NLR) with an N-terminal motif resembling
   the β-solenoid fold. Like the prion-forming region of HET-s, this
   N-terminal region can undergo a structural transition to a β-solenoid
   fold, thereby activating the HET-S pore-forming protein [21]. NWD2,
   located on the fifth position of the chromosome, has been shown to
   play a unique role in signal transduction, particularly in cholinergic
   neurons within the habenular nucleus [22]. Further investigations have
   indicated that oligomeric NWD2 triggers the transformation of HET-s
   prions. Mutations affecting its helical folding disrupt its ability to
   nucleate HET-s aggregation [21]. Additionally, in fungi, NWD2
   expression redirects HET-s towards the cell periphery, activating the
   pore-forming protein and providing evidence for NWD2 signaling in
   primitive cells [23]. Nonetheless, the contribution of STOX1 and NWD2
   to obesity metabolism remains inadequately investigated.

   Our analysis revealed significant associations between STOX1 and NWD2
   and the KEGG pathways KEGG_PROPANOATE_METABOLISM and
   KEGG_PORPHYRIN_AND_CHLOROPHYLL_METABOLISM. These findings suggest
   novel potential connections between these genes and adipose tissue
   biology, specifically in short-chain fatty acid metabolism and heme
   processing. The association with the propanoate metabolism pathway
   points to a possible involvement of STOX1 and NWD2 in short-chain
   fatty acid (SCFA) metabolism. SCFAs, particularly propionate, have
   been shown to play crucial roles in energy homeostasis and adipose
   tissue function. They act as signaling molecules, influencing appetite
   regulation, insulin sensitivity, and inflammation in adipose tissue
   [24]. The link between STOX1 and NWD2 and propanoate metabolism
   suggests that these genes may modulate SCFA production or utilization
   in adipose tissue, potentially affecting the development of obesity.

   Concurrently, the association with the porphyrin and chlorophyll
   metabolism pathway implies a potential role in heme processing. Recent
   studies have highlighted the importance of heme metabolism in adipose
   tissue function and insulin sensitivity [[128]25]. Dysregulation of
   heme homeostasis has been linked to adipose tissue dysfunction and
   insulin resistance, key factors in obesity pathogenesis [[129]26]. The
   connection of STOX1 and NWD2 to this pathway suggests that they may
   influence adipose tissue function through modulation of heme
   metabolism.
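
   The gene-pathway associations discussed above can be quantified in
   several ways, and this section does not restate which method was used.
   As one illustration, the sketch below assumes a simple approach: each
   sample's pathway activity is scored as the mean z-score of the
   pathway's member genes (a simplified stand-in for single-sample
   scoring methods such as ssGSEA or GSVA), and the Spearman correlation
   between that score and a candidate gene's expression is then computed.
   The gene names (CANDIDATE, MEMBER_1, MEMBER_2) and the expression
   values are hypothetical and are not taken from either GEO dataset.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one gene across samples (population SD)."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def rank(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over a run of tied values
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    if den == 0:
        raise ValueError("correlation undefined for constant input")
    return num / den


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def pathway_score(expr, members):
    """Assumed pathway activity: per-sample mean z-score of member genes."""
    z = [zscore(expr[g]) for g in members]
    return [mean(sample) for sample in zip(*z)]


# Hypothetical log-expression values for six samples (illustrative only).
expr = {
    "CANDIDATE": [5.1, 5.3, 4.8, 6.2, 6.0, 6.4],
    "MEMBER_1": [2.0, 2.2, 1.9, 3.1, 2.8, 3.3],
    "MEMBER_2": [7.5, 7.4, 7.6, 8.3, 8.1, 8.6],
}
score = pathway_score(expr, ["MEMBER_1", "MEMBER_2"])
rho = spearman(expr["CANDIDATE"], score)
print(f"Spearman rho = {rho:.3f}")
```

   With real data, a p-value and a multiple-testing correction across all
   gene-pathway pairs would also be needed; scipy.stats.spearmanr returns
   both the coefficient and a p-value and could replace the hand-written
   helpers.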

   Recent studies have established a close association between
   metabolism-related genes and obesity. Our findings indicate that STOX1
   and NWD2 have diagnostic value for obesity-related metabolic diseases:
   they achieved an AUC of 1 in the training set ([130]GSE110729) and an
   AUC of 0.799 in the validation set ([131]GSE205668), supporting their
   potential as diagnostic biomarkers. Given their close relationship with
   cellular metabolism, STOX1 and NWD2 may therefore serve as biomarkers
   of obesity-related metabolism. We further aim to determine how these
   genes relate to the biological pathways governing the initiation of
   obesity and how they affect metabolic characteristics.
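
   The AUC values above can be read as a probability: the AUC equals the
   chance that a randomly chosen obese sample receives a higher score than
   a randomly chosen lean sample, with ties counted as one half (the
   Mann-Whitney formulation). An AUC of 1 therefore means that every obese
   sample in the training set was ranked above every lean sample. The
   sketch below computes the AUC this way for a single gene. Because STOX1
   and NWD2 were reported as downregulated in obesity, it assumes negated
   expression as the "obese" score; the expression values are
   hypothetical.

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney formulation.

    scores: higher values should indicate the positive class (obese).
    labels: 1 = obese (positive), 0 = lean (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one obese and one lean sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


labels = [1, 1, 1, 0, 0, 0]  # 1 = obese, 0 = lean

# Hypothetical expression of a gene assumed to be downregulated in obesity.
separated = [3.2, 2.9, 3.5, 4.1, 4.4, 3.9]
overlapping = [3.2, 4.0, 3.5, 4.1, 3.4, 3.9]

# Negate expression so that lower expression gives a higher "obese" score.
auc = roc_auc([-e for e in separated], labels)
overlap_auc = roc_auc([-e for e in overlapping], labels)
print(f"separated AUC = {auc:.3f}")  # separated AUC = 1.000
print(f"overlapping AUC = {overlap_auc:.3f}")  # overlapping AUC = 0.667
```

   For a model that combines both genes, the same function applies to the
   model's predicted probabilities; sklearn.metrics.roc_auc_score computes
   the same quantity for larger datasets.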

   Although age-related differences in gene expression and metabolic
   regulation may influence the roles of STOX1 and NWD2 in the propanoate
   and porphyrin metabolism pathways, both genes are associated with
   KEGG_PROPANOATE_METABOLISM and
   KEGG_PORPHYRIN_AND_CHLOROPHYLL_METABOLISM in datasets from different
   age groups. Furthermore, these genes are differentially expressed
   between obese and normal samples, a finding validated in animal
   models. The [132]GSE205668 dataset comprises bulk RNA-seq data from
   adipose tissue samples of the Leipzig Childhood Study in Germany,
   whereas [133]GSE110729 comprises deep bulk RNA sequencing of 12 obese
   and 15 lean adults from the Karolinska Institutet in Sweden. We
   selected [134]GSE110729 as the training set and [135]GSE205668 for
   external validation based on their compatibility and data quality.

   We acknowledge that the ethnic background of participants is a crucial
   factor in diseases such as obesity. However, these datasets primarily
   provide geographical location information and lack detailed data on
   participant ethnicity. Consequently, our study could not directly
   address the ethnicity of the cohorts, and we recognize this as a
   limitation of our research. The generalizability of our findings to
   other cohorts may also be limited by the demographic and ethnic
   composition of the datasets we used. While our findings offer valuable
   insights into the genes associated with obesity, we plan to conduct
   further studies in larger and more ethnically diverse cohorts to
   validate and extend the applicability of these genes across
   populations and to provide a more comprehensive understanding of their
   role in obesity. Finally, examination of a larger sample size is
   warranted to validate these findings and unravel their underlying
   mechanisms.
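
   The training and external-validation design described above can be
   illustrated with a deliberately simple model. The sketch below is not
   the Random Forest or XGBoost pipeline used in this study. It assumes
   that each gene is standardized within each dataset (one possible way to
   reduce cohort-level shifts, such as adult versus childhood samples),
   learns one weight per gene from the training set as the difference
   between obese and lean class means, and then applies those fixed
   weights unchanged to the validation set. The two-gene expression
   values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each gene (column) within one dataset."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]  # guard zero SD
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def fit_mean_difference(rows, labels):
    """Per-gene weight = mean(obese) - mean(lean) on standardized data."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    return [mean(p) - mean(n) for p, n in zip(zip(*pos), zip(*neg))]


def linear_score(rows, weights):
    return [sum(w * v for w, v in zip(weights, r)) for r in rows]


def roc_auc(scores, labels):
    """Mann-Whitney AUC; ties count as half a correct ordering."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical [gene_1, gene_2] expression; not data from either GEO series.
train_x = [[3.0, 1.1], [2.8, 0.9], [3.1, 1.0], [4.0, 1.8], [4.2, 2.0], [3.9, 1.7]]
train_y = [1, 1, 1, 0, 0, 0]  # 1 = obese, 0 = lean
valid_x = [[5.0, 2.2], [5.6, 2.0], [5.2, 2.6], [6.1, 2.9], [5.1, 2.3], [6.0, 3.1]]
valid_y = [1, 1, 1, 0, 0, 0]

# Weights are learned on the training cohort only, then frozen.
weights = fit_mean_difference(zscore_columns(train_x), train_y)
train_auc = roc_auc(linear_score(zscore_columns(train_x), weights), train_y)
valid_auc = roc_auc(linear_score(zscore_columns(valid_x), weights), valid_y)
print(f"weights = {weights}")
print(f"training AUC = {train_auc:.3f}, validation AUC = {valid_auc:.3f}")
```

   Standardizing within each cohort removes differences in overall
   expression level between datasets, but it also assumes that the two
   cohorts have broadly similar case-control composition; when that
   assumption does not hold, validation scores may be biased.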

Conclusion

   This preliminary study examined the involvement of the
   metabolism-related genes STOX1 and NWD2 in obesity-related metabolic
   disorders. The findings add to existing knowledge and suggest that
   STOX1 and NWD2 might serve as promising biomarkers for obesity. Future
   research should validate these findings in updated datasets, use
   immunohistochemistry and Western blotting to assess protein
   expression, and investigate the clinical utility of these genes using
   metabolic inhibitors.

Supplementary Information

   Additional file 1 (340.9 KB, docx)

Acknowledgements