Abstract Non-alcoholic fatty liver disease (NAFLD) is a global health challenge with complex pathogenesis and limited diagnostic biomarkers. Palmitoylation, a post-translational modification, has emerged as a critical regulator in metabolic disorders, yet its role in NAFLD remains underexplored. This study integrated bioinformatics analysis and machine learning to identify palmitoylation-related biomarkers for NAFLD. Transcriptomic datasets from human liver tissues were analyzed to identify differentially expressed genes (DEGs) and co-expression modules via WGCNA. Intersection analysis revealed 60 palmitoylation-related DEGs (PR-DEGs). Seven machine learning models were employed, with Neural Network (NNET) and Decision Tree (DT) outperforming others, identifying three hub genes: TYMS, WNT5A, and ZFP36. A nomogram integrating these genes demonstrated robust diagnostic accuracy (AUC = 0.976). The pivotal role of these genes in diagnosing NAFLD was confirmed using the validation dataset (AUC = 0.903). Functional enrichment linked these genes to TNF signaling, lipid metabolism, and immune pathways. Single-cell RNA-seq analysis highlighted their expression in hepatocytes and immune cells, with altered intercellular communication patterns. Immune infiltration analysis revealed significant shifts in monocytes, dendritic cells, and macrophages in NAFLD. Regulatory network analysis highlighted that hsa-let-7b-5p might be pivotal co-regulator of the three hub gene expressions. Finally, the top 10 potential gene-targeted drugs were screened. This study unveils novel palmitoylation-related biomarkers and provides insights into NAFLD pathogenesis, offering diagnostic and therapeutic avenues. Supplementary Information The online version contains supplementary material available at 10.1038/s41598-025-13477-3. Keywords: Non-alcoholic fatty liver disease, Palmitoylation, Biomarker, Bioinformatics, Machine learning Subject terms: Computational biology and bioinformatics, Drug discovery, Immunology, Molecular biology, Biomarkers, Gastroenterology Introduction Non-alcoholic fatty liver disease (NAFLD) has emerged as the most prevalent chronic liver disease globally, with an incidence rate of 25% among adults. This figure is projected to continue rising in tandem with the increasing prevalence of obesity and metabolic syndrome^[44]1–[45]3. Characterized by excessive lipid accumulation in hepatocytes (steatosis), NAFLD encompasses a spectrum of liver pathologies, ranging from simple steatosis to non-alcoholic steatohepatitis (NASH), which may progress to fibrosis, cirrhosis, and hepatocellular carcinoma (HCC)^[46]4,[47]5. Notably, 20–30% of NAFLD patients develop NASH, and up to 20% of these may progress to cirrhosis within 3–4 years, significantly increasing their risk of liver-related mortality^[48]6,[49]7. Alarmingly, NAFLD is projected to become the leading cause of liver transplantation by 2030^[50]8,[51]9, underscoring the urgent need for early diagnostic biomarkers and targeted therapeutic strategies. Palmitoylation, a reversible post-translational modification involving the covalent attachment of palmitate to cysteine residues, regulates diverse cellular processes, including protein trafficking, membrane localization, and signal transduction^[52]10–[53]12. In metabolic disorders, palmitoylation modulates lipid metabolism, inflammatory responses, and insulin signaling^[54]13,[55]14—key pathways implicated in NAFLD pathogenesis^[56]15,[57]16. For instance, palmitoylation of Toll-like receptor 4 (TLR4) enhances pro-inflammatory cytokine production in hepatic macrophages, exacerbating NASH progression^[58]17. Similarly, dysregulated palmitoylation of mitochondrial proteins disrupts fatty acid oxidation, promoting hepatic lipotoxicity^[59]14,[60]18. Despite these advances, the systemic profiling of palmitoylation-related genes (PRGs) in NAFLD remains underexplored, and their roles as diagnostic biomarkers or therapeutic targets are yet to be elucidated. Leveraging integrated bioinformatics and machine learning, this study aims to bridge this knowledge gap. By analyzing transcriptomic datasets from NAFLD patients, we employed WGCNA to identify disease-associated modules and intersected these with palmitoylation-related genes. Advanced machine learning algorithms were then applied to screen hub genes, validated through single-cell RNA sequencing and immune infiltration analyses. Our multi-omics approach identifies novel palmitoylation-related biomarkers and constructs a robust diagnostic nomogram, providing insights into their roles in immune modulation, metabolic dysregulation, and intercellular communication within the hepatic microenvironment. This work provides a foundation for understanding NAFLD pathogenesis through the lens of palmitoylation and highlights potential therapeutic targets for clinical translation. Materials and methods Data collection Figure [61]1 illustrates the workflow of this study. We obtained liver tissue transcriptomic datasets from the Gene Expression Omnibus (GEO) database ([62]https://www.ncbi.nlm.nih.gov/geo/) (accessed on 2025-01-13). The GEO database provides large-scale, well-annotated public microarray datasets that are highly standardized and have been extensively validated in numerous studies, providing a robust foundation for our analyses. The following criteria guided our search: (1) samples were derived from Homo sapiens; (2) the experimental data type was microarray; (3) the dataset included both control and disease groups; (4) a minimum sample size of 20. Based on these criteria, we selected the datasets [63]GSE89632 ([64]GPL14951) and [65]GSE164760 ([66]GPL13667) as the training and validation cohorts, respectively. The [67]GSE89632 dataset comprised 63 samples: 24 control samples and 39 NAFLD samples. The [68]GSE164760 dataset included 80 samples: 6 control samples and 74 NAFLD samples. Subsequently, we downloaded palmitoylation-related genes from the GeneCards database ([69]https://www.genecards.org/) (accessed on 2025-01-13) (Supplementary Table [70]S1). Fig. 1. [71]Fig. 1 [72]Open in a new tab Study flowchart. WGCNA Weighted Gene Co-expression Network Analysis, DGEs Differentially Expressed Genes, GO and KEGG Gene Ontology and Kyoto Encyclopedia of Genes and Genomes, GSEA Gene Set Enrichment Analysis, GSVA Gene Set Variation Analysis. Identification of differentially expressed genes (DEGs) We identified DEGs between the normal and NAFLD groups using the “limma” R package. The threshold criteria for DEGs were set at p < 0.05 and |log2FC| ≥ 0.585. To visualize the DEGs, we employed the “ggplot2” R package to generate volcano plots, which highlight genes with significant expression changes. Additionally, heatmaps were constructed using the “pheatmap” R package to display the expression patterns of DEGs across samples. To further elucidate the overlap of DEGs with palmitoylated genes, we utilized the “Venn” R package to create Venn diagrams, thereby identifying PR-DEGs that are potentially involved in NAFLD pathogenesis. Weighted gene coexpression network analysis (WGCNA) and module gene identification The R package “WGCNA” was adopted to seek co-expression gene modules with high biological significance and explore the relationship between gene networks and diseases. First, we selected 20,819 genes that were expressed in more than half of the samples from the [73]GSE89632 dataset for further analysis. Second, the optimal soft-thresholding power β (ranging from 1 to 30) was generated by the “pickSoftThreshold” function to construct a scale free network. The average connectivity R^2 threshold was set to 0.9. Third, the adjacency was then converted into a topological overlap matrix (TOM), and the gene ratio and dissimilarity were determined. Fourth, hierarchical clustering and the dynamic tree cut function were employed to cut and identify co-expression modules, which were then merged based on similar expression patterns for further analysis. The parameters “minModuleSize”, “mergeCutHeight” and “deepSplit” were set to 50, 0.6 and 1.0, respectively. Finally, modules were linked to disease by calculating gene significance (GS) and module membership (MM). The genes within the module exhibiting the strongest correlation with disease were used for subsequent analysis. Application of machine learning for screening hub genes In this study, we employed a diverse array of machine learning algorithms to identify hub genes, all of which were implemented through corresponding R packages. Specifically, the Random Forest (RF) model was constructed using the “randomForest” package; Support Vector Machine (SVM) was realized through the “kernlab” package; Generalized Linear Model (GLM) was trained with the “caret” package; K-Nearest Neighbors (KNN) was also implemented using the “caret” package; Neural Network (NNET) was built via the “nnet” package; Least Absolute Shrinkage and Selection Operator (LASSO) regression was executed through the “glmnet” package; and Decision Tree (DT) was constructed using the “rpart” package. The “train” function from the “caret” package was utilized to train each model with a 5-fold cross-validation strategy. Subsequently, the trained models were transformed into interpretable objects using the “explain” function from the “DALEX” package, and their predictive performance was assessed via the “model_performance” function. The evaluation metrics included residual analysis, Receiver Operating Characteristic (ROC) curves, and Area Under the Curve (AUC) values. The fitting quality of the models was evaluated based on the distribution characteristics of residuals, with detailed analyses of prediction errors conducted through the plotting of reverse cumulative distribution curves, boxplots, histograms, and Precision-Recall Curves (PRC). The importance scores of genes in each model were calculated using the “variable_importance” function from the “DALEX” package. The residual boxplots and gene importance score plots were generated using the “ggplot2” package, while ROC curves were created with the “pROC” package. Ultimately, the top-performing two machine learning models were selected, and their intersections were used to identify the hub genes. Construction and evaluation of the nomogram To evaluate the predictive performance and stability of the hub genes and the nomogram, we employed a variety of statistical methods. First, the nomogram was constructed using the “rms” package to integrate multiple predictive variables into a comprehensive predictive model. Subsequently, the ROC curve was plotted using the “pROC” package, and the AUC was calculated to assess the model’s discriminative ability. A higher AUC value indicates stronger predictive capacity. Additionally, calibration curves were used to assess the consistency between predicted probabilities and actual outcomes, and the concordance index (C-index) was employed to measure the model’s ranking ability and stability. A higher C-index value suggests greater stability and predictive accuracy of the model. GO, KEGG, GSEA and GSVA analyses The “clusterProfiler” and “org.Hs.eg.db” packages were used to conduct Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses^[74]19,[75]20. Additionally, Gene Set Enrichment Analysis (GSEA) was performed using the “clusterProfiler” package, utilizing human gene sets retrieved from the MSigDB database (accessed on 2025-01-15), specifically the “c5.go.v2024.1.Hs.symbols.gmt” and “c2.cp.kegg_medicus.v2024.1.Hs.symbols.gmt” files. The ranked list of genes based on their Spearman correlation coefficients with each hub gene was provided as input to the GSEA analysis. To further elucidate the phenotypic impact of hub genes, we conducted Gene Set Variation Analysis (GSVA) using the “GSVA” package, comparing high and low expression groups. Visualization of GO, KEGG, and GSEA enrichment results was achieved using the “enrichplot” package, while GSVA results were depicted using “ggplot2”. Statistical significance was determined using thresholds of p < 0.05 and false discovery rate (FDR) q < 0.05. Gathering and handling of data for single-cell RNA-Seq analysis We analyzed the single-cell data from [76]GSE136103 including four high-quality liver samples: [77]GSM4041162, [78]GSM4041163, [79]GSM4041165, and [80]GSM4041167 in the GEO database (accessed on 2025-01-17) using the “Seurat” software package, performing a series of standard workflows with the “PercentageFeatureSet”, “NormalizeData”, “FindVariableFeatures”, “RunPCA”, “FindNeighbors”, “FindClusters”, “RunUMAP”, and “FindAllMarkers” functions. Cell types were annotated using the “SingleR” tool, which facilitates accurate cell type identification through single-cell expression profiling. To assess the functional impact of palmitoylation-related hub genes (PRHGs), we utilized the “AddModuleScore” function to determine the feature-specific scores for each cell. Furthermore, we examined the dynamics of cell-cell interactions using the “CellChat” R package, which enables the reconstruction and analysis of intercellular communication networks. Immune infiltration analysis The CIBERSORT algorithm, a deconvolution method designed to identify the composition of 22 immune cell types based on gene expression data, was employed to analyze immune cell infiltration using the “CIBERSORT” package. The resulting data were visualized utilizing the “ggplot2” and “corrplot” packages. To assess the relationships between immune cells and potential biomarkers, Spearman correlation analysis was conducted. Recognition of transcription factors and miRNAs The transcription factor (TF)-gene and gene-miRNA regulatory networks were constructed using the NetworkAnalyst 3.0 platform ([81]https://www.networkanalyst.ca/) (accessed on 2025-01-27). Data from the ENCODE project, which provides comprehensive and integrative functional genomic information for human and mouse cells and tissues, were utilized. Additionally, miRNA-mRNA interactions were sourced from TarBase, a primary database for experimentally validated miRNA-mRNA interactions. The resulting networks were visualized using Cytoscape. Evaluation of candidate drugs The Enrichers platform was employed to elucidate the associations between drug molecules and hub genes, with data sourced from the Drug Signatures Database (DSigDB, [82]https://tanlab.ucdenver.edu/DSigDB) (accessed on 2025-02-01). DSigDB comprises 22,527 gene sets, encompassing 17,389 drugs that target 19,531 genes. Statistical analysis All statistical analyses were performed using R software (version 4.4.1). The Wilcoxon test was employed to compare the expression levels of identified biomarkers between the control group and the NAFLD group. Correlation analysis was conducted using Spearman’s correlation analysis. All statistical tests were two-sided, and a p-value of less than 0.05 was considered statistically significant. Results Implementation of WGCNA and identification of key module genes To identify modules significantly associated with NAFLD, we performed WGCNA on the [83]GSE89632 dataset. Initially, we conducted hierarchical clustering analysis on the samples and detected two outlier samples, [84]GSM2385767 and [85]GSM2385782. These outliers were removed by setting the cutHeight to 130 (Fig. [86]2A). Subsequently, we chose an optimal soft threshold power of 5 (R^2 = 0.9) to guarantee a scale-free topological network (Fig. [87]2B). Then, we defined modules by setting the “minModuleSize” to 50, “mergeCutHeight” to 0.6, and “deepSplit” to 1.0, resulting in the identification of nine distinct modules. Our analysis revealed a strong correlation between the MEbrown and NAFLD (correlation coefficient = -0.87, p = 3e-20) (Fig. [88]2C). Within the brown module, a highly significant correlation was observed between gene significance (GS) and module membership (MM) (correlation coefficient = 0.92, p < 1e − 200) (Fig. [89]2D). Ultimately, we identified 2,969 genes significantly associated with NAFLD within the brown module. Fig. 2. [90]Fig. 2 [91]Open in a new tab Implementation of WGCNA and identification of key module genes. (A) Hierarchical Clustering Dendrogram for Outlier Detection in Sample Dataset [92]GSE89632. (B) A soft-thresholding power was set at 5, ensuring a scale-free R2 = 0.9. (C) Correlation Heatmap of Module-Trait Relationships in NAFLD and Control Samples. (D) Relationship Between Module Membership and Gene Significance in the Brown Module. Identification and functional enrichment analysis of palmitoylation-related DEGs in NAFLD To further elucidate the relationship between palmitoylation-related genes and the development of NAFLD, we subsequently conducted a comprehensive analysis of DEGs between normal and NAFLD samples in the [93]GSE89632 dataset. Ultimately, a total of 872 DEGs were identified in NAFLD samples compared to normal samples (Fig. [94]3A, B). By intersecting these DEGs with genes from the brown module obtained through WGCNA and palmitoylation-related genes, we successfully identified 60 PR-DEGs (Fig. [95]3C). GO analysis of these 60 key genes revealed significant enrichment in biological processes (BP), including the regulation of inflammatory response, positive regulation of cytokine production, and response to lipopolysaccharide. In the cellular component (CC) classification, collagen-containing extracellular matrix and endocytic vesicle exhibited the highest enrichment scores. The molecular functions (MF) were primarily enriched in enzyme inhibitor activity, DNA-binding transcription activator activity, and cytokine activity (Fig. [96]3D, E) (Supplementary Table S2). KEGG pathway enrichment analysis indicated that the majority of these genes were highly enriched in pathways such as the TNF signaling pathway, lipid metabolism and atherosclerosis, IL-17 signaling pathway, and inflammatory bowel disease (Fig. [97]3F, G) (Supplementary Table S3). Fig. 3. [98]Fig. 3 [99]Open in a new tab Association Analysis of Palmitoylation-Related Genes with the Development of NAFLD. (A) A volcano plot was constructed to identify DEGs between normal and NAFLD samples. (B) A heatmap was generated to display the top 20 upregulated and downregulated genes, sorted by log2 fold change. (C) A Venn diagram was utilized to illustrate the intersection of genes identified through WGCNA, DEGs, and palmitoylated genes. (D) GO analysis was performed, and the results were presented in a circular plot. (E) GO analysis was further visualized in a bar chart, with annotations for biological processes (BP), cellular components (CC), and molecular functions (MF). (F) KEGG pathway enrichment analysis was conducted, and the findings were depicted in a circular plot. (G) KEGG pathway enrichment analysis was also represented in a bar chart to provide a detailed overview of the enriched pathways. Identification of PRHGs for NAFLD via machine learning Based on the expression profiles of 60 PR-DEGs, we constructed seven machine learning models to further identify genes with high diagnostic value. The results indicated that the NNET and DT machine learning models exhibited relatively low residuals (Fig. [100]4A, B). We assessed the diagnostic performance of the seven machine learning algorithms in the test cohort using the AUC. The AUC values for the RF model, SVM model, GLM model, KNN model, NNET model, LASSO model, and DT model were 0.987, 0.974, 0.675, 0.948, 0.987, 0.974, and 0.955, respectively (Fig. [101]4C). We further ranked the top 10 significant feature variables in each model using the root mean square error (RMSE) (Fig. [102]4D). Collectively, the NNET and DT machine learning models demonstrated the highest efficiency in distinguishing between normal individuals and patients with NAFLD among the seven models. We selected the top 10 predictive genes from both the NNET and DT models and identified three PRHGs at their intersection for further investigation: TYMS, WNT5A, and ZFP36 (Fig. [103]4E). The AUC values for TYMS, WNT5A, and ZFP36 were 0.958, 0.963, and 0.922, respectively, indicating their excellent diagnostic potential (Fig. [104]4F). Fig. 4. [105]Fig. 4 [106]Open in a new tab Construction and evaluation of seven machine models. (A) The reverse cumulative distribution of residuals for seven machine learning models (NNET, RF, KNN, GLM, DT, SVM, LASSO). (B) The boxplot of residuals. The boxplot illustrates the distribution of residuals for the seven machine learning models (NNET, RF, KNN, GLM, DT, SVM, LASSO). Red dots represent the root mean square error (RMSE) of the residuals. (C) The ROC curves and corresponding AUC values for seven machine learning models (RF, SVM, GLM, KNN, NNET, LASSO, DT). (D) The top 10 significant feature variables in seven machine learning models (DT, GLM, KNN, LASSO, NNET, RF, SVM), ranked according to root mean square error (RMSE). (E) The intersection of the top 10 predicted genes in NNET and DT models. (F) The ROC curves and corresponding AUC values for TYMS, WNT5A, and ZFP36. Gene expression analysis and construction and validation of the nomogram We initially compared the expression levels of TYMS, WNT5A, and ZFP36 genes between the control group and the NAFLD patient group. The results revealed that the expression of TYMS and WNT5A was significantly upregulated, whereas the expression of ZFP36 was significantly downregulated in the NAFLD group (Fig. [107]5A). Based on these differences, we constructed a nomogram model that converts the relative expression levels of each gene into scores, allowing for the estimation of NAFLD risk based on the cumulative score (Fig. [108]5B). To evaluate the effectiveness of this model, we further analyzed its calibration and discrimination capabilities. The calibration curve (Fig. [109]5C) demonstrated a high degree of concordance between predicted probabilities and actual observed probabilities, indicating excellent calibration. The C-index was 0.976 (95% CI: 0.935–1.018) (Fig. [110]5C), and the area under the ROC (AUC) was 0.976 (Fig. [111]5D), confirming the model’s superior discrimination ability. Moreover, we validated the accuracy of these candidate biomarkers in an external dataset ([112]GSE164760). The results were largely consistent with our initial findings, demonstrating that TYMS and WNT5A were upregulated, whereas ZFP36 was downregulated in the NAFLD group (Fig. [113]5E). Further analysis using the nomogram and ROC curve provided additional support for the diagnostic value of these genes (Fig. [114]5F, H). In summary, our findings suggest that TYMS, WNT5A, and ZFP36 hold promise as diagnostic biomarkers for NAFLD. Fig. 5. [115]Fig. 5 [116]Open in a new tab Evaluation of the diagnostic potential of candidate biomarkers and construction of a nomogram in the training ([117]GSE89632) and validation cohorts ([118]GSE164760). (A) The violin plots elucidate the differential expression profiles of TYMS, WNT5A, and ZFP36 between the control and the NAFLD group in the training cohort. (B) A predictive nomogram for NAFLD occurrence constructed from PRHGs. “Points” indicate the scores of hub genes, and “Total Points” represent the sum of all these gene scores. (C) The calibration curve illustrates the relationship between actual and predicted NAFLD rates. A perfectly accurate model would be represented by the dotted diagonal line. The performance of the nomogram is depicted by the solid line, with its closeness to the diagonal indicating greater precision. (D) ROC curve of the nomogram in training cohort for NAFLD. (E) The violin plots elucidate the differential expression profiles of TYMS, WNT5A, and ZFP36 between the control and the NAFLD group in the validation cohort. (F) A predictive nomogram for NAFLD occurrence constructed from PRHGs in the validation cohort. (G) The ROC curves and corresponding AUC values for TYMS, WNT5A, and ZFP36 in the validation cohort. (H) ROC curve of the nomogram in validation cohort for NAFLD. GSEA and GSVA pathway analyses reveal molecular mechanisms of PRHGs in NAFLD To further investigate the molecular mechanisms of the three PRHGs in the diagnosis of NAFLD, we performed GSEA-KEGG pathway enrichment analysis for each gene biomarker. The graphical representations highlighted the top five most enriched pathways (Fig. [119]6A-C and Supplementary Table S4-6). By intersecting the enrichment results of these three hub genes, we identified several key pathways in which they were commonly enriched. These pathways include mitochondrial complex UCP1 in thermogenesis, the impact of inactivated PINK1 on electron transfer in Complex I, aberrant TDP43 on electron transfer in Complex I, electron transfer in Complex I, the effects of aberrant SNCA and ABETA on electron transfer in Complex I, DNA replication origin unwinding and elongation, and the interaction between CENPE and the NDC80 complex (Fig. [120]6D, E). Subsequently, NAFLD samples were stratified into high- and low-expression groups based on the median expression levels of the hub genes. GSVA enrichment analysis was then performed to investigate the differential pathways between these groups. Comprehensive analysis indicated that high expression of TYMS may activate serotonin metabolism, hormone-like cytokine to JAK-STAT signaling pathway, GH JAK-STAT signaling pathway, and PAX8-PPARG fusion to PPARG-mediated transcription. Conversely, low expression of TYMS was associated with the activation of pathways such as DNA replication origin unwinding and elongation, pre-initiation complex (pre-IC) formation, aberrant ABETA to crosstalk between extrinsic and intrinsic apoptotic pathways, and keratan sulfate degradation (Fig. [121]6F). High expression of WNT5A was linked to the activation of the KEAP1-NRF2 signaling pathway, tyrosine degradation, and gluconeogenesis, while low expression of WNT5A was associated with the activation of the activin signaling pathway, nodal signaling pathway, and HSV GD to HVEM NFKB signaling pathway (Fig. [122]6G). Similarly, high expression of ZFP36 was related to the TLR7/8/9-IRF5 signaling pathway and TLR7/9-IRF7 signaling pathway, whereas low expression of ZFP36 was associated with the PERK-ATF4 signaling pathway (Fig. [123]6H). Fig. 6. [124]Fig. 6 [125]Open in a new tab GSEA and GSVA of 3 PRHGs. GSEA of TYMS (A), WNT5A (B) and ZFP36 (C) using KEGG gene sets. (D, E) Intersections of GSEA results for 3 PRHGs. GSVA of TYMS (F), WNT5A (G) and ZFP36 (H) using KEGG gene sets. The correlation of the PRHGs with singlecell characteristics In order to evaluate the functions of PRHGs within the hepatic microenvironment at the single-cell transcriptomic level, we comprehensively analyzed the expression profiles of TYMS, WNT5A, and ZFP36 across distinct cell types. Our results revealed that TYMS is predominantly expressed in NK cells, WNT5A is primarily expressed in hepatocytes and monocytes, while ZFP36 exhibits widespread expression across various cell types. Notably, hepatocytes were identified as the main cell type in which all three PRHGs are co-expressed (Fig. [126]7A-E). Utilizing the “AddModuleScore” function, we assigned the signature-specific score to each cell based on PRHGs expression. Subsequently, cells were categorized into high-score and low-score groups according to their PRHGs scores (Fig. [127]7F), and differential expression analysis was performed to identify DEGs. GSEA demonstrated that these DEGs were significantly enriched in pathways and functions such as Cytokine Signaling in the Immune System, IL-18 Signaling Pathway, Cytokine Mediated Signaling Pathway, Response to Cytokine, and Inflammatory Response (Fig. [128]7G, H). Intriguingly, liver cells within the microenvironment exhibited diverse communication patterns depending on their PRHGs scores (Fig. [129]7I). Within this intricate hepatic microenvironment, various cell types can act as Sender, Receiver, Mediator, and Influencer during cellular communication, ultimately leading to distinct intercellular signaling cues. Our study identified significant alterations and key influencers in the cell communication signals of the high-score group, particularly in MIF, APP, and COLLAGEN signaling (Fig. [130]7J–L). These findings suggest that such signals may play regulatory roles in inflammation, fibrosis, immune modulation, and metabolic processes within the hepatic microenvironment. Fig. 7. [131]Fig. 7 [132]Open in a new tab The single-cell characteristics of PRHGs. (A) UMAP plot of cell type annotation using the “SingleR” package. Density of TYMS: (B), WNT5A: (C), ZFP36: (D) Gene Expression. (E) Density of Co-expressed TYMS, WNT5A, and ZFP36 Genes. (F) The UMAP plot categorizes cells into two groups based on their signature-specific scores derived from the expression of PRHGs. (G) The results of GSEA performed using the c2.cp.v2024.1.Hs.symbols.gmt gene set collection. (H) The results of GSEA performed using the c5.go.v2024.1.Hs.symbols.gmt gene set collection. (I) Identifying distinct signal pathways in varying PRHGs score groups. (J–L) The signaling pathway networks for MIF (J), APP (K), and COLLAGEN (L) pathways, with heatmaps showing cell type involvement. Immune cell infiltration analysis To investigate the immune response mechanisms in NAFLD, we employed the CIBERSORT algorithm to evaluate the changes in immune cell abundance between NAFLD and control groups (Fig. [133]8A, B). Our results revealed that, compared to control samples, NAFLD samples exhibited significantly higher levels of plasma cells, activated memory CD4 T cells, monocytes, activated dendritic cells, activated mast cells, and neutrophils. Conversely, there were lower levels of resting memory CD4 T cells, γδ T cells, activated NK cells, M1 macrophages, M2 macrophages, resting dendritic cells, and resting mast cells in NAFLD samples. Subsequent correlation analysis through a heatmap of immune cell interactions demonstrated that in NAFLD samples, M0 macrophages were positively correlated with CD8 T cells (r = 0.55), and γδ T cells were positively correlated with M2 macrophages (r = 0.63). In contrast, monocytes exhibited negative correlations with γδ T cells (r = – 0.75) and M2 macrophages (r = – 0.71) (Fig. [134]8C). These findings suggest that the immune profile and the interactions among various immune cell types in NAFLD patients are distinct from those in controls. Additionally, a correlation bubble chart was utilized to illustrate the associations between the PRHGs and different immune cells (Fig. [135]8D). Fig. 8. [136]Fig. 8 [137]Open in a new tab Analysis of immune cell infiltration. (A) The bar plot visualizing the proportion of infiltrating immune cells in control and NAFLD groups. (B) The boxplot comparing the proportion of immune cells between NAFLD and control groups. *p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001. (C) The correlated heatmap representing the correlation between different immune cells in NAFLD. (D) Correlation analysis of infiltrating immune cells with three PRHGs. Construction of the TFs-genes and genes-miRNAs regulatory network Given the importance of TFs and miRNAs in elucidating disease progression, we constructed TFs-genes and genes-miRNAs network to identify the transcriptional and post-transcriptional regulators of PRHGs, thereby enhancing our understanding of the disease. The TFs-genes network comprises 202 nodes and 228 edges (Fig. [138]9A), while the genes-miRNAs network consists of 327 nodes and 501 edges. By selecting the “Minimum Network” option, the genes-miRNAs network was reduced to a subnetwork with only 4 nodes and 3 edges (Fig. [139]9B). Notably, hsa-let-7b-5p interacts with three hub genes, suggesting that it may serve as a common regulator of these genes. However, these findings require further validation to confirm their roles. Fig. 9. Fig. 9 [140]Open in a new tab Network for TFs-genes and genes-miRNAs with three hub genes. (A) TFs-genes regulatory network of three hub genes. (B) Genes-miRNAs regulatory network of three hub genes. Identification of drug candidates We utilized the DSigDB database on the Enrichr platform to identify potential drugs targeting hub genes. A total of 265 gene-drug associations were identified (Supplementary Table S7). The top 10 compounds were selected based on their p-values and adjusted p-values. These potential therapeutic molecules include troglitazone (CTD 00002415), folic acid (CTD 00005997), TPEN (CTD 00001994), mitomycin C (CTD 00007136), progesterone (CTD 00006624), calcitriol (CTD 00005558), medroxyprogesterone acetate (CTD 00006623), KT 5720 (CTD 00002405), trifluridine (TTD 00011552), and 2-mercaptoethanol (TTD 00000601) (Table [141]1). Table 1. Predicted top 10 drug compounds. Term P-value Adjusted P-value Combined score Genes troglitazone CTD 00002415 3.43E-05 0.009097838 596691.1364 ZFP36;WNT5A; TYMS folic acid CTD 00005997 2.79E-04 0.036970492 1688.42119 WNT5A; TYMS TPEN CTD 00001994 4.74E-04 0.040748318 1204.258214 ZFP36;TYMS mitomycin C CTD 00007136 7.02E-04 0.040748318 934.5688531 WNT5A; TYMS progesterone CTD 00006624 8.77E-04 0.040748318 381927.2499 ZFP36;WNT5A; TYMS calcitriol CTD 00005558 9.37E-04 0.040748318 377411.5153 ZFP36;WNT5A; TYMS Medroxyprogesterone acetate CTD 00006623 0.001117417 0.040748318 688.8061376 ZFP36;WNT5A KT 5720 CTD 00002405 0.001798976 0.040748318 5741.921288 ZFP36 Trifluridine TTD 00011552 0.001798976 0.040748318 5741.921288 TYMS 2-mercaptoethanol TTD 00000601 0.001798976 0.040748318 5741.921288 TYMS [142]Open in a new tab Discussion NAFLD continues to be a significant global health burden, with its pathogenesis being highly complex. There is an urgent need for reliable diagnostic biomarkers. In this study, we for the first time integrated bioinformatics and machine learning approaches to identify biomarkers associated with palmitoylation in NAFLD, revealing three key genes (TYMS, WNT5A, and ZFP36) with substantial diagnostic and therapeutic potential. Our findings are not only consistent with the existing theories of NAFLD pathogenesis but also provide novel insights into the role of palmitoylation in NAFLD, offering a more refined framework for future research and clinical translation. Initially, through WGCNA analysis and differential expression analysis, this study identified 60 PR-DEGs, which were enriched in pathways regulating inflammatory responses (e.g., IL-17, TNF signaling pathways) and lipid metabolism (e.g., atherosclerosis). The significant associations with TNF signaling and immune pathways highlight the critical role of inflammation in the progression of NAFLD. Activation of the TNF signaling pathway has been demonstrated to drive the progression of NAFLD to NASH by promoting hepatocyte apoptosis, the release of pro-inflammatory cytokines, and insulin resistance^[143]21,[144]22. Notably, palmitoylation may exacerbate hepatic inflammation and metabolic disturbances by modulating the subcellular localization or protein stability of key molecules in the TNF pathway, such as TLR4 or TNFR1. For instance, palmitoylation of TLR4 can enhance its membrane localization and promote the activation of downstream NF-κB signaling^[145]23,[146]24, a mechanism that may be closely related to the upregulation of PR-DEGs observed in monocytes and macrophages in this study. Moreover, the enrichment of PR-DEGs in pathways related to extracellular matrix (ECM) remodeling (e.g., collagen metabolism) suggests that palmitoylation may influence the dynamic balance of the ECM by regulating the activity of matrix metalloproteinases (MMPs) or their inhibitors (TIMPs), thereby participating in the process of hepatic fibrosis^[147]25,[148]26. For example, WNT5A, one of the core genes identified in this study, has been shown to promote the activation of hepatic stellate cells and collagen deposition through activation of the Wnt/β-catenin signaling pathway^[149]27. However, the specific palmitoylation sites of PR-DEGs and their functional regulation require further experimental validation. Through comparative analysis of seven machine learning models, this study identified TYMS, WNT5A, and ZFP36 as the PRHGs for NAFLD. Among these models, the NNET and DT models exhibited the most optimal diagnostic performance, characterized by the smallest residuals and an AUC exceeding 0.95. Compared with the control group, TYMS and WNT5A were significantly upregulated, while ZFP36 was significantly downregulated in patients with NAFLD. ROC analysis further demonstrated their potential as diagnostic biomarkers for NAFLD, with all AUC values exceeding 0.90. A nomogram model constructed based on these genes achieved an AUC of 0.976 and maintained high accuracy in the independent validation cohort ([150]GSE164760), with an AUC of 0.903, thereby highlighting its clinical translational potential. However, it is important to note that the validation dataset ([151]GSE164760) contained only 6 control samples, which may limit the statistical robustness of our validation results. This small sample size could potentially introduce variability in the AUC estimation and may affect the generalizability of our findings. Future studies with larger validation cohorts are needed to further confirm the diagnostic accuracy and stability of our model. TYMS is a key enzyme in nucleotide metabolism, catalyzing the conversion of deoxyuridine monophosphate (dUMP) to deoxythymidine monophosphate (dTMP)^[152]28. Extensive research has demonstrated that TYMS plays a crucial role in a variety of diseases, including cancer, metabolic disorders, inflammatory diseases, neurodegenerative diseases, and cardiovascular diseases^[153]29–[154]31. In this study, TYMS was significantly upregulated in patients with NAFLD, potentially reflecting enhanced hepatocyte proliferation or a compensatory response to lipotoxicity-induced DNA damage. Previous studies have shown that overexpression of TYMS is closely associated with cell proliferation and oxidative stress in hepatic fibrosis^[155]32,[156]33. Our GSEA analysis further revealed an association between high TYMS expression and the serotonin metabolism and JAK-STAT signaling pathways, suggesting its involvement in metabolic dysregulation and inflammatory responses, which are hallmark features of NAFLD. WNT5A is a pivotal member of the Wnt protein family and serves as a ligand for the non-canonical Wnt signaling pathway^[157]34. It plays a crucial role in cell migration, inflammation, immune regulation, and metabolic control^[158]35,[159]36. In non-alcoholic steatohepatitis (NASH), the upregulation of WNT5A has been shown to be closely associated with oxidative stress and inflammatory responses in hepatocytes^[160]37,[161]38. This finding is consistent with our GSEA results, which link WNT5A to the KEAP1-NRF2 pathway, further supporting its involvement in the progression of NAFLD. Its association with gluconeogenesis and tyrosine degradation further underscores its role in metabolic reprogramming, a key driver in the progression of NAFLD. Additionally, WNT5A may promote the progression of hepatic fibrosis by regulating the activation and proliferation of hepatic stellate cells^[162]39. Intriguingly, WNT5A inhibitors have demonstrated efficacy in reducing fibrosis in various preclinical models^[163]40–[164]42, suggesting their potential as dual therapeutic targets addressing both the metabolic and fibrotic components of NAFLD. ZFP36, which is downregulated in NAFLD, is an RNA-binding protein that suppresses excessive inflammatory responses by degrading the mRNA of pro-inflammatory cytokines such as TNF-α, IL-6, and IL-1β^[165]43. In diseases like rheumatoid arthritis and inflammatory bowel disease, low expression of ZFP36 leads to the accumulation of pro-inflammatory cytokines^[166]44,[167]45. Therefore, its reduced expression in NAFLD may exacerbate hepatic inflammation, a finding corroborated by the enrichment of Toll-like receptor (TLR) and PERK-ATF4 pathways observed in our analysis. ZFP36 can also participate in lipid metabolism and insulin signaling pathways^[168]46, exerting a profound influence on cellular metabolism by modulating the expression of various metabolic enzymes and transport proteins^[169]47. Thus, restoring ZFP36 activity may alleviate inflammation and metabolic dysregulation associated with NAFLD, offering a novel therapeutic strategy. Single-cell RNA sequencing analysis revealed that TYMS is highly expressed in natural killer cells, WNT5A is enriched in hepatocytes and monocytes, while ZFP36 is widely distributed across various immune cells. This distribution pattern suggests that PRHGs may influence the progression of NAFLD by modulating the interactions between hepatocytes and immune cells. For instance, high expression of WNT5A in hepatocytes may activate the pro-inflammatory phenotype (M1) of neighboring macrophages through paracrine signaling^[170]27,[171]48, while the absence of ZFP36 may compromise its function in suppressing the release of inflammatory cytokines^[172]49. Additionally, CellChat analysis identified that cellular communication in the high PRHGs score group is primarily mediated by MIF, APP, and collagen signaling pathways, which are closely associated with hepatic fibrosis and immune evasion^[173]50–[174]52. This further supports the pivotal role of PRHGs in regulating the microenvironment. Immune infiltration analysis revealed a significant increase in monocytes, activated dendritic cells, and neutrophils in the livers of patients with NAFLD, while resting immune cells, such as M2 macrophages and resting mast cells, were decreased. This pattern is consistent with a chronic inflammatory state: activated monocytes-macrophages drive insulin resistance through the release of IL-1β and TNF-α^[175]53, while neutrophil infiltration exacerbates oxidative stress damage^[176]54,[177]55. Notably, WNT5A exhibited a strong positive correlation with monocyte infiltration (r = 0.72), suggesting that it may promote the inflammatory cascade by recruiting monocytes to the liver. MicroRNAs (miRNAs) are primarily responsible for post-transcriptional gene regulation, while transcription factors (TFs) modulate the transcription of target genes. Both play critical roles in diverse biological processes and diseases^[178]56. Surprisingly, in our study, the miRNA hsa-let-7b-5p was identified as a potential common regulator of all three key genes, suggesting its potential as a therapeutic target. Although members of the let-7 family are known to regulate lipid metabolism and inflammation^[179]57,[180]58, their direct association with palmitoylation-related genes in NAFLD is unprecedented. This indicates a post-transcriptional regulatory axis in which hsa-let-7b-5p finely tunes palmitoylation-driven pathways, bridging metabolic and immune dysregulation—a hypothesis that warrants experimental validation. Our drug screening identified several compounds, including pioglitazone, folic acid, and calcitriol, as potential therapeutic candidates targeting these hub genes. Pioglitazone, a PPARγ agonist, has demonstrated efficacy in improving hepatic steatosis by enhancing fatty acid oxidation^[181]59,[182]60. Similarly, calcitriol (vitamin D3) has been shown to modulate immune responses and reduce hepatic inflammation in preclinical models of NAFLD^[183]61. These findings are consistent with the GSVA results linking the hub genes to the JAK-STAT and Toll-like receptor (TLR) signaling pathways, suggesting that targeting these pathways may help ameliorate the progression of NAFLD. However, their functional roles and underlying molecular pathways require further validation through cellular experiments and clinical samples. Despite these advancements, our study has several limitations. First, reliance on transcriptomic data from public databases may introduce batch effects or confounding factors. Second, the data used in this study primarily originated from specific geographical regions, potentially limiting the generalizability of findings across diverse ethnic or population groups. Third, although bioinformatics predictions provided mechanistic insights, experimental validation (e.g., CRISPR/Cas9 gene knockout or palmitoylation analysis) is required to establish causality. Future research could integrate spatial transcriptomics with metabolomics to elucidate the spatiotemporal dynamics of palmitoylation in the lobule-specific pathological processes of the liver. Conclusion This study has uncovered the critical roles of palmitoylated genes TYMS, WNT5A, and ZFP36 in the diagnosis and immunometabolic regulation of NAFLD, establishing high-precision diagnostic models and identifying potential therapeutic agents. These findings provide novel insights into the mechanisms underlying NAFLD and offer new avenues for clinical intervention. Future research, through the integration of multi-omics approaches and experimental validation, is poised to advance the translational application of these discoveries. Supplementary Information Below is the link to the electronic supplementary material. [184]Supplementary Material 1^ (249.5KB, xlsx) Author contributions Author Contributions: Study conception and design: Zheng Liu, Xiaohong Wang and Xiaowei Tang. Drafting of manuscript: Zheng Liu, Mingzhu Xiu, Xiaohong Wang, Rui Luo, Xiaomin Shi and Sha Liu. Acquisition of data and critical revision: Zheng Liu, Yizhou Wang, Yusong Ye, Ruiyu Wang. Revision of manuscript, and final approval of manuscript: Xiaowei Tang, Muhan Lv. Funding We acknowledged the following grant to our study, Natural Science Foundation of Sichuan Province (No. 2022NSFSC1378). Data availability The datasets generated and/or analysed during the current study are available in the Gene Expression Omnibus (GEO) repository. The specific datasets used in this study are as follows: [185]GSE89632:https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi? acc=[186]GSE89632, [187]GSE164760:https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi? acc=[188]GSE164760, [189]GSE136103: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi? acc=[190]GSE136103. Code acquisition The code used for the analyses is available on GitHub at [191]https://github.com/laotoubienao/code.git. Declarations Competing interests The authors declare no competing interests. Footnotes Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Zheng Liu, Xiaohong Wang and Mingzhu Xiu have contributed equally to this work. Contributor Information Muhan Lv, Email: lvmuhan@swmu.edu.cn. Xiaowei Tang, Email: solitude5834@hotmail.com. References