Abstract

   Non-alcoholic fatty liver disease (NAFLD) is a global health challenge
   with complex pathogenesis and limited diagnostic biomarkers.
   Palmitoylation, a post-translational modification, has emerged as a
   critical regulator in metabolic disorders, yet its role in NAFLD
   remains underexplored. This study integrated bioinformatics analysis
   and machine learning to identify palmitoylation-related biomarkers for
   NAFLD. Transcriptomic datasets from human liver tissues were analyzed
   to identify differentially expressed genes (DEGs) and co-expression
   modules via WGCNA. Intersection analysis revealed 60
   palmitoylation-related DEGs (PR-DEGs). Seven machine learning models
   were employed, with Neural Network (NNET) and Decision Tree (DT)
   outperforming others, identifying three hub genes: TYMS, WNT5A, and
   ZFP36. A nomogram integrating these genes demonstrated robust
   diagnostic accuracy (AUC = 0.976). The pivotal role of these genes in
   diagnosing NAFLD was confirmed using the validation dataset
   (AUC = 0.903). Functional enrichment linked these genes to TNF
   signaling, lipid metabolism, and immune pathways. Single-cell RNA-seq
   analysis highlighted their expression in hepatocytes and immune cells,
   with altered intercellular communication patterns. Immune infiltration
   analysis revealed significant shifts in monocytes, dendritic cells, and
   macrophages in NAFLD. Regulatory network analysis highlighted that
   hsa-let-7b-5p might be pivotal co-regulator of the three hub gene
   expressions. Finally, the top 10 potential gene-targeted drugs were
   screened. This study unveils novel palmitoylation-related biomarkers
   and provides insights into NAFLD pathogenesis, offering diagnostic and
   therapeutic avenues.

Supplementary Information

   The online version contains supplementary material available at
   10.1038/s41598-025-13477-3.

   Keywords: Non-alcoholic fatty liver disease, Palmitoylation, Biomarker,
   Bioinformatics, Machine learning

   Subject terms: Computational biology and bioinformatics, Drug
   discovery, Immunology, Molecular biology, Biomarkers, Gastroenterology

Introduction

   Non-alcoholic fatty liver disease (NAFLD) has emerged as the most
   prevalent chronic liver disease globally, with an incidence rate of 25%
   among adults. This figure is projected to continue rising in tandem
   with the increasing prevalence of obesity and metabolic
   syndrome^[44]1–[45]3. Characterized by excessive lipid accumulation in
   hepatocytes (steatosis), NAFLD encompasses a spectrum of liver
   pathologies, ranging from simple steatosis to non-alcoholic
   steatohepatitis (NASH), which may progress to fibrosis, cirrhosis, and
   hepatocellular carcinoma (HCC)^[46]4,[47]5. Notably, 20–30% of NAFLD
   patients develop NASH, and up to 20% of these may progress to cirrhosis
   within 3–4 years, significantly increasing their risk of liver-related
   mortality^[48]6,[49]7. Alarmingly, NAFLD is projected to become the
   leading cause of liver transplantation by 2030^[50]8,[51]9,
   underscoring the urgent need for early diagnostic biomarkers and
   targeted therapeutic strategies.

   Palmitoylation, a reversible post-translational modification involving
   the covalent attachment of palmitate to cysteine residues, regulates
   diverse cellular processes, including protein trafficking, membrane
   localization, and signal transduction^[52]10–[53]12. In metabolic
   disorders, palmitoylation modulates lipid metabolism, inflammatory
   responses, and insulin signaling^[54]13,[55]14—key pathways implicated
   in NAFLD pathogenesis^[56]15,[57]16. For instance, palmitoylation of
   Toll-like receptor 4 (TLR4) enhances pro-inflammatory cytokine
   production in hepatic macrophages, exacerbating NASH
   progression^[58]17. Similarly, dysregulated palmitoylation of
   mitochondrial proteins disrupts fatty acid oxidation, promoting hepatic
   lipotoxicity^[59]14,[60]18. Despite these advances, the systemic
   profiling of palmitoylation-related genes (PRGs) in NAFLD remains
   underexplored, and their roles as diagnostic biomarkers or therapeutic
   targets are yet to be elucidated.

   Leveraging integrated bioinformatics and machine learning, this study
   aims to bridge this knowledge gap. By analyzing transcriptomic datasets
   from NAFLD patients, we employed WGCNA to identify disease-associated
   modules and intersected these with palmitoylation-related genes.
   Advanced machine learning algorithms were then applied to screen hub
   genes, validated through single-cell RNA sequencing and immune
   infiltration analyses. Our multi-omics approach identifies novel
   palmitoylation-related biomarkers and constructs a robust diagnostic
   nomogram, providing insights into their roles in immune modulation,
   metabolic dysregulation, and intercellular communication within the
   hepatic microenvironment. This work provides a foundation for
   understanding NAFLD pathogenesis through the lens of palmitoylation and
   highlights potential therapeutic targets for clinical translation.

Materials and methods

Data collection

   Figure [61]1 illustrates the workflow of this study. We obtained liver
   tissue transcriptomic datasets from the Gene Expression Omnibus (GEO)
   database ([62]https://www.ncbi.nlm.nih.gov/geo/) (accessed on
   2025-01-13). The GEO database provides large-scale, well-annotated
   public microarray datasets that are highly standardized and have been
   extensively validated in numerous studies, providing a robust
   foundation for our analyses. The following criteria guided our search:
   (1) samples were derived from Homo sapiens; (2) the experimental data
   type was microarray; (3) the dataset included both control and disease
   groups; (4) a minimum sample size of 20. Based on these criteria, we
   selected the datasets [63]GSE89632 ([64]GPL14951) and [65]GSE164760
   ([66]GPL13667) as the training and validation cohorts, respectively.
   The [67]GSE89632 dataset comprised 63 samples: 24 control samples and
   39 NAFLD samples. The [68]GSE164760 dataset included 80 samples: 6
   control samples and 74 NAFLD samples. Subsequently, we downloaded
   palmitoylation-related genes from the GeneCards database
   ([69]https://www.genecards.org/) (accessed on 2025-01-13)
   (Supplementary Table [70]S1).

Fig. 1.

   [71]Fig. 1
   [72]Open in a new tab

   Study flowchart. WGCNA Weighted Gene Co-expression Network Analysis,
   DGEs Differentially Expressed Genes, GO and KEGG Gene Ontology and
   Kyoto Encyclopedia of Genes and Genomes, GSEA Gene Set Enrichment
   Analysis, GSVA Gene Set Variation Analysis.

Identification of differentially expressed genes (DEGs)

   We identified DEGs between the normal and NAFLD groups using the
   “limma” R package. The threshold criteria for DEGs were set at p < 0.05
   and |log2FC| ≥ 0.585. To visualize the DEGs, we employed the “ggplot2”
   R package to generate volcano plots, which highlight genes with
   significant expression changes. Additionally, heatmaps were constructed
   using the “pheatmap” R package to display the expression patterns of
   DEGs across samples. To further elucidate the overlap of DEGs with
   palmitoylated genes, we utilized the “Venn” R package to create Venn
   diagrams, thereby identifying PR-DEGs that are potentially involved in
   NAFLD pathogenesis.

Weighted gene coexpression network analysis (WGCNA) and module gene
identification

   The R package “WGCNA” was adopted to seek co-expression gene modules
   with high biological significance and explore the relationship between
   gene networks and diseases. First, we selected 20,819 genes that were
   expressed in more than half of the samples from the [73]GSE89632
   dataset for further analysis. Second, the optimal soft-thresholding
   power β (ranging from 1 to 30) was generated by the “pickSoftThreshold”
   function to construct a scale free network. The average connectivity
   R^2 threshold was set to 0.9. Third, the adjacency was then converted
   into a topological overlap matrix (TOM), and the gene ratio and
   dissimilarity were determined. Fourth, hierarchical clustering and the
   dynamic tree cut function were employed to cut and identify
   co-expression modules, which were then merged based on similar
   expression patterns for further analysis. The parameters
   “minModuleSize”, “mergeCutHeight” and “deepSplit” were set to 50, 0.6
   and 1.0, respectively. Finally, modules were linked to disease by
   calculating gene significance (GS) and module membership (MM). The
   genes within the module exhibiting the strongest correlation with
   disease were used for subsequent analysis.

Application of machine learning for screening hub genes

   In this study, we employed a diverse array of machine learning
   algorithms to identify hub genes, all of which were implemented through
   corresponding R packages. Specifically, the Random Forest (RF) model
   was constructed using the “randomForest” package; Support Vector
   Machine (SVM) was realized through the “kernlab” package; Generalized
   Linear Model (GLM) was trained with the “caret” package; K-Nearest
   Neighbors (KNN) was also implemented using the “caret” package; Neural
   Network (NNET) was built via the “nnet” package; Least Absolute
   Shrinkage and Selection Operator (LASSO) regression was executed
   through the “glmnet” package; and Decision Tree (DT) was constructed
   using the “rpart” package. The “train” function from the “caret”
   package was utilized to train each model with a 5-fold cross-validation
   strategy. Subsequently, the trained models were transformed into
   interpretable objects using the “explain” function from the “DALEX”
   package, and their predictive performance was assessed via the
   “model_performance” function. The evaluation metrics included residual
   analysis, Receiver Operating Characteristic (ROC) curves, and Area
   Under the Curve (AUC) values. The fitting quality of the models was
   evaluated based on the distribution characteristics of residuals, with
   detailed analyses of prediction errors conducted through the plotting
   of reverse cumulative distribution curves, boxplots, histograms, and
   Precision-Recall Curves (PRC). The importance scores of genes in each
   model were calculated using the “variable_importance” function from the
   “DALEX” package. The residual boxplots and gene importance score plots
   were generated using the “ggplot2” package, while ROC curves were
   created with the “pROC” package. Ultimately, the top-performing two
   machine learning models were selected, and their intersections were
   used to identify the hub genes.

Construction and evaluation of the nomogram

   To evaluate the predictive performance and stability of the hub genes
   and the nomogram, we employed a variety of statistical methods. First,
   the nomogram was constructed using the “rms” package to integrate
   multiple predictive variables into a comprehensive predictive model.
   Subsequently, the ROC curve was plotted using the “pROC” package, and
   the AUC was calculated to assess the model’s discriminative ability. A
   higher AUC value indicates stronger predictive capacity. Additionally,
   calibration curves were used to assess the consistency between
   predicted probabilities and actual outcomes, and the concordance index
   (C-index) was employed to measure the model’s ranking ability and
   stability. A higher C-index value suggests greater stability and
   predictive accuracy of the model.

GO, KEGG, GSEA and GSVA analyses

   The “clusterProfiler” and “org.Hs.eg.db” packages were used to conduct
   Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG)
   analyses^[74]19,[75]20. Additionally, Gene Set Enrichment Analysis
   (GSEA) was performed using the “clusterProfiler” package, utilizing
   human gene sets retrieved from the MSigDB database (accessed on
   2025-01-15), specifically the “c5.go.v2024.1.Hs.symbols.gmt” and
   “c2.cp.kegg_medicus.v2024.1.Hs.symbols.gmt” files. The ranked list of
   genes based on their Spearman correlation coefficients with each hub
   gene was provided as input to the GSEA analysis. To further elucidate
   the phenotypic impact of hub genes, we conducted Gene Set Variation
   Analysis (GSVA) using the “GSVA” package, comparing high and low
   expression groups. Visualization of GO, KEGG, and GSEA enrichment
   results was achieved using the “enrichplot” package, while GSVA results
   were depicted using “ggplot2”. Statistical significance was determined
   using thresholds of p < 0.05 and false discovery rate (FDR) q < 0.05.

Gathering and handling of data for single-cell RNA-Seq analysis

   We analyzed the single-cell data from [76]GSE136103 including four
   high-quality liver samples: [77]GSM4041162, [78]GSM4041163,
   [79]GSM4041165, and [80]GSM4041167 in the GEO database (accessed on
   2025-01-17) using the “Seurat” software package, performing a series of
   standard workflows with the “PercentageFeatureSet”, “NormalizeData”,
   “FindVariableFeatures”, “RunPCA”, “FindNeighbors”, “FindClusters”,
   “RunUMAP”, and “FindAllMarkers” functions. Cell types were annotated
   using the “SingleR” tool, which facilitates accurate cell type
   identification through single-cell expression profiling. To assess the
   functional impact of palmitoylation-related hub genes (PRHGs), we
   utilized the “AddModuleScore” function to determine the
   feature-specific scores for each cell. Furthermore, we examined the
   dynamics of cell-cell interactions using the “CellChat” R package,
   which enables the reconstruction and analysis of intercellular
   communication networks.

Immune infiltration analysis

   The CIBERSORT algorithm, a deconvolution method designed to identify
   the composition of 22 immune cell types based on gene expression data,
   was employed to analyze immune cell infiltration using the “CIBERSORT”
   package. The resulting data were visualized utilizing the “ggplot2” and
   “corrplot” packages. To assess the relationships between immune cells
   and potential biomarkers, Spearman correlation analysis was conducted.

Recognition of transcription factors and miRNAs

   The transcription factor (TF)-gene and gene-miRNA regulatory networks
   were constructed using the NetworkAnalyst 3.0 platform
   ([81]https://www.networkanalyst.ca/) (accessed on 2025-01-27). Data
   from the ENCODE project, which provides comprehensive and integrative
   functional genomic information for human and mouse cells and tissues,
   were utilized. Additionally, miRNA-mRNA interactions were sourced from
   TarBase, a primary database for experimentally validated miRNA-mRNA
   interactions. The resulting networks were visualized using Cytoscape.

Evaluation of candidate drugs

   The Enrichers platform was employed to elucidate the associations
   between drug molecules and hub genes, with data sourced from the Drug
   Signatures Database (DSigDB, [82]https://tanlab.ucdenver.edu/DSigDB)
   (accessed on 2025-02-01). DSigDB comprises 22,527 gene sets,
   encompassing 17,389 drugs that target 19,531 genes.

Statistical analysis

   All statistical analyses were performed using R software (version
   4.4.1). The Wilcoxon test was employed to compare the expression levels
   of identified biomarkers between the control group and the NAFLD group.
   Correlation analysis was conducted using Spearman’s correlation
   analysis. All statistical tests were two-sided, and a p-value of less
   than 0.05 was considered statistically significant.

Results

Implementation of WGCNA and identification of key module genes

   To identify modules significantly associated with NAFLD, we performed
   WGCNA on the [83]GSE89632 dataset. Initially, we conducted hierarchical
   clustering analysis on the samples and detected two outlier samples,
   [84]GSM2385767 and [85]GSM2385782. These outliers were removed by
   setting the cutHeight to 130 (Fig. [86]2A). Subsequently, we chose an
   optimal soft threshold power of 5 (R^2 = 0.9) to guarantee a scale-free
   topological network (Fig. [87]2B). Then, we defined modules by setting
   the “minModuleSize” to 50, “mergeCutHeight” to 0.6, and “deepSplit” to
   1.0, resulting in the identification of nine distinct modules. Our
   analysis revealed a strong correlation between the MEbrown and NAFLD
   (correlation coefficient = -0.87, p = 3e-20) (Fig. [88]2C). Within the
   brown module, a highly significant correlation was observed between
   gene significance (GS) and module membership (MM) (correlation
   coefficient = 0.92, p < 1e − 200) (Fig. [89]2D). Ultimately, we
   identified 2,969 genes significantly associated with NAFLD within the
   brown module.

Fig. 2.

   [90]Fig. 2
   [91]Open in a new tab

   Implementation of WGCNA and identification of key module genes. (A)
   Hierarchical Clustering Dendrogram for Outlier Detection in Sample
   Dataset [92]GSE89632. (B) A soft-thresholding power was set at 5,
   ensuring a scale-free R2 = 0.9. (C) Correlation Heatmap of Module-Trait
   Relationships in NAFLD and Control Samples. (D) Relationship Between
   Module Membership and Gene Significance in the Brown Module.

Identification and functional enrichment analysis of palmitoylation-related
DEGs in NAFLD

   To further elucidate the relationship between palmitoylation-related
   genes and the development of NAFLD, we subsequently conducted a
   comprehensive analysis of DEGs between normal and NAFLD samples in the
   [93]GSE89632 dataset. Ultimately, a total of 872 DEGs were identified
   in NAFLD samples compared to normal samples (Fig. [94]3A, B). By
   intersecting these DEGs with genes from the brown module obtained
   through WGCNA and palmitoylation-related genes, we successfully
   identified 60 PR-DEGs (Fig. [95]3C). GO analysis of these 60 key genes
   revealed significant enrichment in biological processes (BP), including
   the regulation of inflammatory response, positive regulation of
   cytokine production, and response to lipopolysaccharide. In the
   cellular component (CC) classification, collagen-containing
   extracellular matrix and endocytic vesicle exhibited the highest
   enrichment scores. The molecular functions (MF) were primarily enriched
   in enzyme inhibitor activity, DNA-binding transcription activator
   activity, and cytokine activity (Fig. [96]3D, E) (Supplementary Table
   S2). KEGG pathway enrichment analysis indicated that the majority of
   these genes were highly enriched in pathways such as the TNF signaling
   pathway, lipid metabolism and atherosclerosis, IL-17 signaling pathway,
   and inflammatory bowel disease (Fig. [97]3F, G) (Supplementary Table
   S3).

Fig. 3.

   [98]Fig. 3
   [99]Open in a new tab

   Association Analysis of Palmitoylation-Related Genes with the
   Development of NAFLD. (A) A volcano plot was constructed to identify
   DEGs between normal and NAFLD samples. (B) A heatmap was generated to
   display the top 20 upregulated and downregulated genes, sorted by log2
   fold change. (C) A Venn diagram was utilized to illustrate the
   intersection of genes identified through WGCNA, DEGs, and palmitoylated
   genes. (D) GO analysis was performed, and the results were presented in
   a circular plot. (E) GO analysis was further visualized in a bar chart,
   with annotations for biological processes (BP), cellular components
   (CC), and molecular functions (MF). (F) KEGG pathway enrichment
   analysis was conducted, and the findings were depicted in a circular
   plot. (G) KEGG pathway enrichment analysis was also represented in a
   bar chart to provide a detailed overview of the enriched pathways.

Identification of PRHGs for NAFLD via machine learning

   Based on the expression profiles of 60 PR-DEGs, we constructed seven
   machine learning models to further identify genes with high diagnostic
   value. The results indicated that the NNET and DT machine learning
   models exhibited relatively low residuals (Fig. [100]4A, B). We
   assessed the diagnostic performance of the seven machine learning
   algorithms in the test cohort using the AUC. The AUC values for the RF
   model, SVM model, GLM model, KNN model, NNET model, LASSO model, and DT
   model were 0.987, 0.974, 0.675, 0.948, 0.987, 0.974, and 0.955,
   respectively (Fig. [101]4C). We further ranked the top 10 significant
   feature variables in each model using the root mean square error (RMSE)
   (Fig. [102]4D). Collectively, the NNET and DT machine learning models
   demonstrated the highest efficiency in distinguishing between normal
   individuals and patients with NAFLD among the seven models. We selected
   the top 10 predictive genes from both the NNET and DT models and
   identified three PRHGs at their intersection for further investigation:
   TYMS, WNT5A, and ZFP36 (Fig. [103]4E). The AUC values for TYMS, WNT5A,
   and ZFP36 were 0.958, 0.963, and 0.922, respectively, indicating their
   excellent diagnostic potential (Fig. [104]4F).

Fig. 4.

   [105]Fig. 4
   [106]Open in a new tab

   Construction and evaluation of seven machine models. (A) The reverse
   cumulative distribution of residuals for seven machine learning models
   (NNET, RF, KNN, GLM, DT, SVM, LASSO). (B) The boxplot of residuals. The
   boxplot illustrates the distribution of residuals for the seven machine
   learning models (NNET, RF, KNN, GLM, DT, SVM, LASSO). Red dots
   represent the root mean square error (RMSE) of the residuals. (C) The
   ROC curves and corresponding AUC values for seven machine learning
   models (RF, SVM, GLM, KNN, NNET, LASSO, DT). (D) The top 10 significant
   feature variables in seven machine learning models (DT, GLM, KNN,
   LASSO, NNET, RF, SVM), ranked according to root mean square error
   (RMSE). (E) The intersection of the top 10 predicted genes in NNET and
   DT models. (F) The ROC curves and corresponding AUC values for TYMS,
   WNT5A, and ZFP36.

Gene expression analysis and construction and validation of the nomogram

   We initially compared the expression levels of TYMS, WNT5A, and ZFP36
   genes between the control group and the NAFLD patient group. The
   results revealed that the expression of TYMS and WNT5A was
   significantly upregulated, whereas the expression of ZFP36 was
   significantly downregulated in the NAFLD group (Fig. [107]5A). Based on
   these differences, we constructed a nomogram model that converts the
   relative expression levels of each gene into scores, allowing for the
   estimation of NAFLD risk based on the cumulative score (Fig. [108]5B).
   To evaluate the effectiveness of this model, we further analyzed its
   calibration and discrimination capabilities. The calibration curve
   (Fig. [109]5C) demonstrated a high degree of concordance between
   predicted probabilities and actual observed probabilities, indicating
   excellent calibration. The C-index was 0.976 (95% CI: 0.935–1.018)
   (Fig. [110]5C), and the area under the ROC (AUC) was 0.976
   (Fig. [111]5D), confirming the model’s superior discrimination ability.
   Moreover, we validated the accuracy of these candidate biomarkers in an
   external dataset ([112]GSE164760). The results were largely consistent
   with our initial findings, demonstrating that TYMS and WNT5A were
   upregulated, whereas ZFP36 was downregulated in the NAFLD group
   (Fig. [113]5E). Further analysis using the nomogram and ROC curve
   provided additional support for the diagnostic value of these genes
   (Fig. [114]5F, H). In summary, our findings suggest that TYMS, WNT5A,
   and ZFP36 hold promise as diagnostic biomarkers for NAFLD.

Fig. 5.

   [115]Fig. 5
   [116]Open in a new tab

   Evaluation of the diagnostic potential of candidate biomarkers and
   construction of a nomogram in the training ([117]GSE89632) and
   validation cohorts ([118]GSE164760). (A) The violin plots elucidate the
   differential expression profiles of TYMS, WNT5A, and ZFP36 between the
   control and the NAFLD group in the training cohort. (B) A predictive
   nomogram for NAFLD occurrence constructed from PRHGs. “Points” indicate
   the scores of hub genes, and “Total Points” represent the sum of all
   these gene scores. (C) The calibration curve illustrates the
   relationship between actual and predicted NAFLD rates. A perfectly
   accurate model would be represented by the dotted diagonal line. The
   performance of the nomogram is depicted by the solid line, with its
   closeness to the diagonal indicating greater precision. (D) ROC curve
   of the nomogram in training cohort for NAFLD. (E) The violin plots
   elucidate the differential expression profiles of TYMS, WNT5A, and
   ZFP36 between the control and the NAFLD group in the validation cohort.
   (F) A predictive nomogram for NAFLD occurrence constructed from PRHGs
   in the validation cohort. (G) The ROC curves and corresponding AUC
   values for TYMS, WNT5A, and ZFP36 in the validation cohort. (H) ROC
   curve of the nomogram in validation cohort for NAFLD.

GSEA and GSVA pathway analyses reveal molecular mechanisms of PRHGs in NAFLD

   To further investigate the molecular mechanisms of the three PRHGs in
   the diagnosis of NAFLD, we performed GSEA-KEGG pathway enrichment
   analysis for each gene biomarker. The graphical representations
   highlighted the top five most enriched pathways (Fig. [119]6A-C and
   Supplementary Table S4-6). By intersecting the enrichment results of
   these three hub genes, we identified several key pathways in which they
   were commonly enriched. These pathways include mitochondrial complex
   UCP1 in thermogenesis, the impact of inactivated PINK1 on electron
   transfer in Complex I, aberrant TDP43 on electron transfer in Complex
   I, electron transfer in Complex I, the effects of aberrant SNCA and
   ABETA on electron transfer in Complex I, DNA replication origin
   unwinding and elongation, and the interaction between CENPE and the
   NDC80 complex (Fig. [120]6D, E). Subsequently, NAFLD samples were
   stratified into high- and low-expression groups based on the median
   expression levels of the hub genes. GSVA enrichment analysis was then
   performed to investigate the differential pathways between these
   groups. Comprehensive analysis indicated that high expression of TYMS
   may activate serotonin metabolism, hormone-like cytokine to JAK-STAT
   signaling pathway, GH JAK-STAT signaling pathway, and PAX8-PPARG fusion
   to PPARG-mediated transcription. Conversely, low expression of TYMS was
   associated with the activation of pathways such as DNA replication
   origin unwinding and elongation, pre-initiation complex (pre-IC)
   formation, aberrant ABETA to crosstalk between extrinsic and intrinsic
   apoptotic pathways, and keratan sulfate degradation (Fig. [121]6F).
   High expression of WNT5A was linked to the activation of the KEAP1-NRF2
   signaling pathway, tyrosine degradation, and gluconeogenesis, while low
   expression of WNT5A was associated with the activation of the activin
   signaling pathway, nodal signaling pathway, and HSV GD to HVEM NFKB
   signaling pathway (Fig. [122]6G). Similarly, high expression of ZFP36
   was related to the TLR7/8/9-IRF5 signaling pathway and TLR7/9-IRF7
   signaling pathway, whereas low expression of ZFP36 was associated with
   the PERK-ATF4 signaling pathway (Fig. [123]6H).

Fig. 6.

   [124]Fig. 6
   [125]Open in a new tab

   GSEA and GSVA of 3 PRHGs. GSEA of TYMS (A), WNT5A (B) and ZFP36 (C)
   using KEGG gene sets. (D, E) Intersections of GSEA results for 3 PRHGs.
   GSVA of TYMS (F), WNT5A (G) and ZFP36 (H) using KEGG gene sets.

The correlation of the PRHGs with singlecell characteristics

   In order to evaluate the functions of PRHGs within the hepatic
   microenvironment at the single-cell transcriptomic level, we
   comprehensively analyzed the expression profiles of TYMS, WNT5A, and
   ZFP36 across distinct cell types. Our results revealed that TYMS is
   predominantly expressed in NK cells, WNT5A is primarily expressed in
   hepatocytes and monocytes, while ZFP36 exhibits widespread expression
   across various cell types. Notably, hepatocytes were identified as the
   main cell type in which all three PRHGs are co-expressed
   (Fig. [126]7A-E). Utilizing the “AddModuleScore” function, we assigned
   the signature-specific score to each cell based on PRHGs expression.
   Subsequently, cells were categorized into high-score and low-score
   groups according to their PRHGs scores (Fig. [127]7F), and differential
   expression analysis was performed to identify DEGs. GSEA demonstrated
   that these DEGs were significantly enriched in pathways and functions
   such as Cytokine Signaling in the Immune System, IL-18 Signaling
   Pathway, Cytokine Mediated Signaling Pathway, Response to Cytokine, and
   Inflammatory Response (Fig. [128]7G, H). Intriguingly, liver cells
   within the microenvironment exhibited diverse communication patterns
   depending on their PRHGs scores (Fig. [129]7I). Within this intricate
   hepatic microenvironment, various cell types can act as Sender,
   Receiver, Mediator, and Influencer during cellular communication,
   ultimately leading to distinct intercellular signaling cues. Our study
   identified significant alterations and key influencers in the cell
   communication signals of the high-score group, particularly in MIF,
   APP, and COLLAGEN signaling (Fig. [130]7J–L). These findings suggest
   that such signals may play regulatory roles in inflammation, fibrosis,
   immune modulation, and metabolic processes within the hepatic
   microenvironment.

Fig. 7.

   [131]Fig. 7
   [132]Open in a new tab

   The single-cell characteristics of PRHGs. (A) UMAP plot of cell type
   annotation using the “SingleR” package. Density of TYMS: (B), WNT5A:
   (C), ZFP36: (D) Gene Expression. (E) Density of Co-expressed TYMS,
   WNT5A, and ZFP36 Genes. (F) The UMAP plot categorizes cells into two
   groups based on their signature-specific scores derived from the
   expression of PRHGs. (G) The results of GSEA performed using the
   c2.cp.v2024.1.Hs.symbols.gmt gene set collection. (H) The results of
   GSEA performed using the c5.go.v2024.1.Hs.symbols.gmt gene set
   collection. (I) Identifying distinct signal pathways in varying PRHGs
   score groups. (J–L) The signaling pathway networks for MIF (J), APP
   (K), and COLLAGEN (L) pathways, with heatmaps showing cell type
   involvement.

Immune cell infiltration analysis

   To investigate the immune response mechanisms in NAFLD, we employed the
   CIBERSORT algorithm to evaluate the changes in immune cell abundance
   between NAFLD and control groups (Fig. [133]8A, B). Our results
   revealed that, compared to control samples, NAFLD samples exhibited
   significantly higher levels of plasma cells, activated memory CD4 T
   cells, monocytes, activated dendritic cells, activated mast cells, and
   neutrophils. Conversely, there were lower levels of resting memory CD4
   T cells, γδ T cells, activated NK cells, M1 macrophages, M2
   macrophages, resting dendritic cells, and resting mast cells in NAFLD
   samples. Subsequent correlation analysis through a heatmap of immune
   cell interactions demonstrated that in NAFLD samples, M0 macrophages
   were positively correlated with CD8 T cells (r = 0.55), and γδ T cells
   were positively correlated with M2 macrophages (r = 0.63). In contrast,
   monocytes exhibited negative correlations with γδ T cells (r = – 0.75)
   and M2 macrophages (r = – 0.71) (Fig. [134]8C). These findings suggest
   that the immune profile and the interactions among various immune cell
   types in NAFLD patients are distinct from those in controls.
   Additionally, a correlation bubble chart was utilized to illustrate the
   associations between the PRHGs and different immune cells
   (Fig. [135]8D).

Fig. 8.

   [136]Fig. 8
   [137]Open in a new tab

   Analysis of immune cell infiltration. (A) The bar plot visualizing the
   proportion of infiltrating immune cells in control and NAFLD groups.
   (B) The boxplot comparing the proportion of immune cells between NAFLD
   and control groups. *p < 0.05; ** p < 0.01; *** p < 0.001; ****
   p < 0.0001. (C) The correlated heatmap representing the correlation
   between different immune cells in NAFLD. (D) Correlation analysis of
   infiltrating immune cells with three PRHGs.

Construction of the TFs-genes and genes-miRNAs regulatory network

   Given the importance of TFs and miRNAs in elucidating disease
   progression, we constructed TFs-genes and genes-miRNAs network to
   identify the transcriptional and post-transcriptional regulators of
   PRHGs, thereby enhancing our understanding of the disease. The
   TFs-genes network comprises 202 nodes and 228 edges (Fig. [138]9A),
   while the genes-miRNAs network consists of 327 nodes and 501 edges. By
   selecting the “Minimum Network” option, the genes-miRNAs network was
   reduced to a subnetwork with only 4 nodes and 3 edges (Fig. [139]9B).
   Notably, hsa-let-7b-5p interacts with three hub genes, suggesting that
   it may serve as a common regulator of these genes. However, these
   findings require further validation to confirm their roles.

Fig. 9.

   Fig. 9
   [140]Open in a new tab

   Network for TFs-genes and genes-miRNAs with three hub genes. (A)
   TFs-genes regulatory network of three hub genes. (B) Genes-miRNAs
   regulatory network of three hub genes.

Identification of drug candidates

   We utilized the DSigDB database on the Enrichr platform to identify
   potential drugs targeting hub genes. A total of 265 gene-drug
   associations were identified (Supplementary Table S7). The top 10
   compounds were selected based on their p-values and adjusted p-values.
   These potential therapeutic molecules include troglitazone (CTD
   00002415), folic acid (CTD 00005997), TPEN (CTD 00001994), mitomycin C
   (CTD 00007136), progesterone (CTD 00006624), calcitriol (CTD 00005558),
   medroxyprogesterone acetate (CTD 00006623), KT 5720 (CTD 00002405),
   trifluridine (TTD 00011552), and 2-mercaptoethanol (TTD 00000601)
   (Table [141]1).

Table 1.

   Predicted top 10 drug compounds.
   Term P-value Adjusted P-value Combined score Genes
   troglitazone CTD 00002415 3.43E-05 0.009097838 596691.1364 ZFP36;WNT5A;
   TYMS
   folic acid CTD 00005997 2.79E-04 0.036970492 1688.42119 WNT5A; TYMS
   TPEN CTD 00001994 4.74E-04 0.040748318 1204.258214 ZFP36;TYMS
   mitomycin C CTD 00007136 7.02E-04 0.040748318 934.5688531 WNT5A; TYMS
   progesterone CTD 00006624 8.77E-04 0.040748318 381927.2499 ZFP36;WNT5A;
   TYMS
   calcitriol CTD 00005558 9.37E-04 0.040748318 377411.5153 ZFP36;WNT5A;
   TYMS
   Medroxyprogesterone acetate CTD 00006623 0.001117417 0.040748318
   688.8061376 ZFP36;WNT5A
   KT 5720 CTD 00002405 0.001798976 0.040748318 5741.921288 ZFP36
   Trifluridine TTD 00011552 0.001798976 0.040748318 5741.921288 TYMS
   2-mercaptoethanol TTD 00000601 0.001798976 0.040748318 5741.921288 TYMS
   [142]Open in a new tab

Discussion

   NAFLD continues to be a significant global health burden, with its
   pathogenesis being highly complex. There is an urgent need for reliable
   diagnostic biomarkers. In this study, we for the first time integrated
   bioinformatics and machine learning approaches to identify biomarkers
   associated with palmitoylation in NAFLD, revealing three key genes
   (TYMS, WNT5A, and ZFP36) with substantial diagnostic and therapeutic
   potential. Our findings are not only consistent with the existing
   theories of NAFLD pathogenesis but also provide novel insights into the
   role of palmitoylation in NAFLD, offering a more refined framework for
   future research and clinical translation.

   Initially, through WGCNA analysis and differential expression analysis,
   this study identified 60 PR-DEGs, which were enriched in pathways
   regulating inflammatory responses (e.g., IL-17, TNF signaling pathways)
   and lipid metabolism (e.g., atherosclerosis). The significant
   associations with TNF signaling and immune pathways highlight the
   critical role of inflammation in the progression of NAFLD. Activation
   of the TNF signaling pathway has been demonstrated to drive the
   progression of NAFLD to NASH by promoting hepatocyte apoptosis, the
   release of pro-inflammatory cytokines, and insulin
   resistance^[143]21,[144]22. Notably, palmitoylation may exacerbate
   hepatic inflammation and metabolic disturbances by modulating the
   subcellular localization or protein stability of key molecules in the
   TNF pathway, such as TLR4 or TNFR1. For instance, palmitoylation of
   TLR4 can enhance its membrane localization and promote the activation
   of downstream NF-κB signaling^[145]23,[146]24, a mechanism that may be
   closely related to the upregulation of PR-DEGs observed in monocytes
   and macrophages in this study. Moreover, the enrichment of PR-DEGs in
   pathways related to extracellular matrix (ECM) remodeling (e.g.,
   collagen metabolism) suggests that palmitoylation may influence the
   dynamic balance of the ECM by regulating the activity of matrix
   metalloproteinases (MMPs) or their inhibitors (TIMPs), thereby
   participating in the process of hepatic fibrosis^[147]25,[148]26. For
   example, WNT5A, one of the core genes identified in this study, has
   been shown to promote the activation of hepatic stellate cells and
   collagen deposition through activation of the Wnt/β-catenin signaling
   pathway^[149]27. However, the specific palmitoylation sites of PR-DEGs
   and their functional regulation require further experimental
   validation.

   Through comparative analysis of seven machine learning models, this
   study identified TYMS, WNT5A, and ZFP36 as the PRHGs for NAFLD. Among
   these models, the NNET and DT models exhibited the most optimal
   diagnostic performance, characterized by the smallest residuals and an
   AUC exceeding 0.95. Compared with the control group, TYMS and WNT5A
   were significantly upregulated, while ZFP36 was significantly
   downregulated in patients with NAFLD. ROC analysis further demonstrated
   their potential as diagnostic biomarkers for NAFLD, with all AUC values
   exceeding 0.90. A nomogram model constructed based on these genes
   achieved an AUC of 0.976 and maintained high accuracy in the
   independent validation cohort ([150]GSE164760), with an AUC of 0.903,
   thereby highlighting its clinical translational potential. However, it
   is important to note that the validation dataset ([151]GSE164760)
   contained only 6 control samples, which may limit the statistical
   robustness of our validation results. This small sample size could
   potentially introduce variability in the AUC estimation and may affect
   the generalizability of our findings. Future studies with larger
   validation cohorts are needed to further confirm the diagnostic
   accuracy and stability of our model.

   TYMS is a key enzyme in nucleotide metabolism, catalyzing the
   conversion of deoxyuridine monophosphate (dUMP) to deoxythymidine
   monophosphate (dTMP)^[152]28. Extensive research has demonstrated that
   TYMS plays a crucial role in a variety of diseases, including cancer,
   metabolic disorders, inflammatory diseases, neurodegenerative diseases,
   and cardiovascular diseases^[153]29–[154]31. In this study, TYMS was
   significantly upregulated in patients with NAFLD, potentially
   reflecting enhanced hepatocyte proliferation or a compensatory response
   to lipotoxicity-induced DNA damage. Previous studies have shown that
   overexpression of TYMS is closely associated with cell proliferation
   and oxidative stress in hepatic fibrosis^[155]32,[156]33. Our GSEA
   analysis further revealed an association between high TYMS expression
   and the serotonin metabolism and JAK-STAT signaling pathways,
   suggesting its involvement in metabolic dysregulation and inflammatory
   responses, which are hallmark features of NAFLD.

   WNT5A is a pivotal member of the Wnt protein family and serves as a
   ligand for the non-canonical Wnt signaling pathway^[157]34. It plays a
   crucial role in cell migration, inflammation, immune regulation, and
   metabolic control^[158]35,[159]36. In non-alcoholic steatohepatitis
   (NASH), the upregulation of WNT5A has been shown to be closely
   associated with oxidative stress and inflammatory responses in
   hepatocytes^[160]37,[161]38. This finding is consistent with our GSEA
   results, which link WNT5A to the KEAP1-NRF2 pathway, further supporting
   its involvement in the progression of NAFLD. Its association with
   gluconeogenesis and tyrosine degradation further underscores its role
   in metabolic reprogramming, a key driver in the progression of NAFLD.
   Additionally, WNT5A may promote the progression of hepatic fibrosis by
   regulating the activation and proliferation of hepatic stellate
   cells^[162]39. Intriguingly, WNT5A inhibitors have demonstrated
   efficacy in reducing fibrosis in various preclinical
   models^[163]40–[164]42, suggesting their potential as dual therapeutic
   targets addressing both the metabolic and fibrotic components of NAFLD.

   ZFP36, which is downregulated in NAFLD, is an RNA-binding protein that
   suppresses excessive inflammatory responses by degrading the mRNA of
   pro-inflammatory cytokines such as TNF-α, IL-6, and IL-1β^[165]43. In
   diseases like rheumatoid arthritis and inflammatory bowel disease, low
   expression of ZFP36 leads to the accumulation of pro-inflammatory
   cytokines^[166]44,[167]45. Therefore, its reduced expression in NAFLD
   may exacerbate hepatic inflammation, a finding corroborated by the
   enrichment of Toll-like receptor (TLR) and PERK-ATF4 pathways observed
   in our analysis. ZFP36 can also participate in lipid metabolism and
   insulin signaling pathways^[168]46, exerting a profound influence on
   cellular metabolism by modulating the expression of various metabolic
   enzymes and transport proteins^[169]47. Thus, restoring ZFP36 activity
   may alleviate inflammation and metabolic dysregulation associated with
   NAFLD, offering a novel therapeutic strategy.

   Single-cell RNA sequencing analysis revealed that TYMS is highly
   expressed in natural killer cells, WNT5A is enriched in hepatocytes and
   monocytes, while ZFP36 is widely distributed across various immune
   cells. This distribution pattern suggests that PRHGs may influence the
   progression of NAFLD by modulating the interactions between hepatocytes
   and immune cells. For instance, high expression of WNT5A in hepatocytes
   may activate the pro-inflammatory phenotype (M1) of neighboring
   macrophages through paracrine signaling^[170]27,[171]48, while the
   absence of ZFP36 may compromise its function in suppressing the release
   of inflammatory cytokines^[172]49. Additionally, CellChat analysis
   identified that cellular communication in the high PRHGs score group is
   primarily mediated by MIF, APP, and collagen signaling pathways, which
   are closely associated with hepatic fibrosis and immune
   evasion^[173]50–[174]52. This further supports the pivotal role of
   PRHGs in regulating the microenvironment.

   Immune infiltration analysis revealed a significant increase in
   monocytes, activated dendritic cells, and neutrophils in the livers of
   patients with NAFLD, while resting immune cells, such as M2 macrophages
   and resting mast cells, were decreased. This pattern is consistent with
   a chronic inflammatory state: activated monocytes-macrophages drive
   insulin resistance through the release of IL-1β and TNF-α^[175]53,
   while neutrophil infiltration exacerbates oxidative stress
   damage^[176]54,[177]55. Notably, WNT5A exhibited a strong positive
   correlation with monocyte infiltration (r = 0.72), suggesting that it
   may promote the inflammatory cascade by recruiting monocytes to the
   liver.

   MicroRNAs (miRNAs) are primarily responsible for post-transcriptional
   gene regulation, while transcription factors (TFs) modulate the
   transcription of target genes. Both play critical roles in diverse
   biological processes and diseases^[178]56. Surprisingly, in our study,
   the miRNA hsa-let-7b-5p was identified as a potential common regulator
   of all three key genes, suggesting its potential as a therapeutic
   target. Although members of the let-7 family are known to regulate
   lipid metabolism and inflammation^[179]57,[180]58, their direct
   association with palmitoylation-related genes in NAFLD is
   unprecedented. This indicates a post-transcriptional regulatory axis in
   which hsa-let-7b-5p finely tunes palmitoylation-driven pathways,
   bridging metabolic and immune dysregulation—a hypothesis that warrants
   experimental validation.

   Our drug screening identified several compounds, including
   pioglitazone, folic acid, and calcitriol, as potential therapeutic
   candidates targeting these hub genes. Pioglitazone, a PPARγ agonist,
   has demonstrated efficacy in improving hepatic steatosis by enhancing
   fatty acid oxidation^[181]59,[182]60. Similarly, calcitriol (vitamin
   D3) has been shown to modulate immune responses and reduce hepatic
   inflammation in preclinical models of NAFLD^[183]61. These findings are
   consistent with the GSVA results linking the hub genes to the JAK-STAT
   and Toll-like receptor (TLR) signaling pathways, suggesting that
   targeting these pathways may help ameliorate the progression of NAFLD.
   However, their functional roles and underlying molecular pathways
   require further validation through cellular experiments and clinical
   samples.

   Despite these advancements, our study has several limitations. First,
   reliance on transcriptomic data from public databases may introduce
   batch effects or confounding factors. Second, the data used in this
   study primarily originated from specific geographical regions,
   potentially limiting the generalizability of findings across diverse
   ethnic or population groups. Third, although bioinformatics predictions
   provided mechanistic insights, experimental validation (e.g.,
   CRISPR/Cas9 gene knockout or palmitoylation analysis) is required to
   establish causality. Future research could integrate spatial
   transcriptomics with metabolomics to elucidate the spatiotemporal
   dynamics of palmitoylation in the lobule-specific pathological
   processes of the liver.

Conclusion

   This study has uncovered the critical roles of palmitoylated genes
   TYMS, WNT5A, and ZFP36 in the diagnosis and immunometabolic regulation
   of NAFLD, establishing high-precision diagnostic models and identifying
   potential therapeutic agents. These findings provide novel insights
   into the mechanisms underlying NAFLD and offer new avenues for clinical
   intervention. Future research, through the integration of multi-omics
   approaches and experimental validation, is poised to advance the
   translational application of these discoveries.

Supplementary Information

   Below is the link to the electronic supplementary material.
   [184]Supplementary Material 1^ (249.5KB, xlsx)

Author contributions

   Author Contributions: Study conception and design: Zheng Liu, Xiaohong
   Wang and Xiaowei Tang. Drafting of manuscript: Zheng Liu, Mingzhu Xiu,
   Xiaohong Wang, Rui Luo, Xiaomin Shi and Sha Liu. Acquisition of data
   and critical revision: Zheng Liu, Yizhou Wang, Yusong Ye, Ruiyu Wang.
   Revision of manuscript, and final approval of manuscript: Xiaowei Tang,
   Muhan Lv.

Funding

   We acknowledged the following grant to our study, Natural Science
   Foundation of Sichuan Province (No. 2022NSFSC1378).

Data availability

   The datasets generated and/or analysed during the current study are
   available in the Gene Expression Omnibus (GEO) repository. The specific
   datasets used in this study are as follows:
   [185]GSE89632:https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?
   acc=[186]GSE89632,
   [187]GSE164760:https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?
   acc=[188]GSE164760, [189]GSE136103:
   https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi? acc=[190]GSE136103.

Code acquisition

   The code used for the analyses is available on GitHub at
   [191]https://github.com/laotoubienao/code.git.

Declarations

Competing interests

   The authors declare no competing interests.

Footnotes

   Publisher’s note

   Springer Nature remains neutral with regard to jurisdictional claims in
   published maps and institutional affiliations.

   Zheng Liu, Xiaohong Wang and Mingzhu Xiu have contributed equally to
   this work.

Contributor Information

   Muhan Lv, Email: lvmuhan@swmu.edu.cn.

   Xiaowei Tang, Email: solitude5834@hotmail.com.

References