Abstract Non-alcoholic Fatty Liver Disease (NAFLD), noted for its widespread prevalence among adults, has become the leading chronic liver condition globally. Simultaneously, the annual disease burden, particularly liver cirrhosis caused by NAFLD, has increased significantly. Neutrophil Extracellular Traps (NETs) play a crucial role in the progression of this disease and are key to the pathogenesis of NAFLD. However, research into the specific roles of NETs-related genes in NAFLD is still a field requiring thorough investigation. Utilizing techniques like AddModuleScore, ssGSEA, and WGCNA, our team conducted gene screening to identify the genes linked to NETs in both single-cell and bulk transcriptomics. Using algorithms including Random Forest, Support Vector Machine, Least Absolute Shrinkage, and Selection Operator, we identified ZFP36L2 and PHLDA1 as key hub genes. The pivotal role of these genes in NAFLD diagnosis was confirmed using the training dataset [36]GSE164760. This study identified 116 genes linked to NETs across single-cell and bulk transcriptomic analyses. These genes demonstrated enrichment in immune and metabolic pathways. Additionally, two NETs-related hub genes, PHLDA1 and ZFP36L2, were selected through machine learning for integration into a prognostic model. These hub genes play roles in inflammatory and metabolic processes. scRNA-seq results showed variations in cellular communication among cells with different expression patterns of these key genes. In conclusion, this study explored the molecular characteristics of NETs-associated genes in NAFLD. It identified two potential biomarkers and analyzed their roles in the hepatic microenvironment. These discoveries could aid in NAFLD diagnosis and management, with the ultimate goal of enhancing patient outcomes. Keywords: Non-alcoholic fatty liver disease, Neutrophil extracellular traps (NETs), Single-cell RNA-seq, Biomarker, Bioinformatics, Machine learning Subject terms: Computational biology and bioinformatics, Molecular biology, Endocrinology, Pathogenesis Introduction Non-alcoholic Fatty Liver Disease (NAFLD) has emerged as the most common chronic liver disease globally, with a prevalence rate of 25% among adults, and this figure is on the rise^[37]1. This condition is marked by excessive lipid storage in hepatocytes, resulting in continuous alterations in liver enzymes, including aspartate transaminase and alanine transaminase^[38]2. The spectrum of NAFLD includes various liver conditions, extending from Non-alcoholic Fatty Liver (NAFL) to Non-alcoholic Steatohepatitis (NASH). Without intervention, NASH may advance to cirrhosis and potentially to hepatocellular carcinoma (HCC)^[39]3. With the aging of the affected population and prolonged disease exposure, the burden of NAFLD-related cirrhosis is increasing, projected to double or triple in many regions worldwide from 2015 to 2030^[40]4,[41]5. Consequently, it becomes imperative to delve more deeply into the pathogenesis of NAFLD and to innovate new strategies for treatment. In the last twenty years, there has been a growing focus on studying the influence of immune cells on the transition from NAFLD to NASH fibrosis. Since that time, a multitude of research has delved into the roles of different immune cells and inflammatory factors in the development and advancement of NAFLD. Many of these studies have underscored the importance of macrophages, T cells, and cytokines in the pathogenesis of liver inflammation and fibrosis associated with NAFLD^[42]6–[43]10. Neutrophils, forming a crucial subset of white blood cells, play a central role in the immune system's frontline defense. Their primary functions include safeguarding the body against infections and diseases through mechanisms such as phagocytosis, degranulation, and neuroendocrine actions directed at combating pathogens, including viruses, bacteria, and fungi^[44]11–[45]13. The explanation of Neutrophil Extracellular Traps (NETs) has transformed our comprehension of neutrophil function and their contribution to immune responses^[46]14. NETs, composed of chromatin, granular proteins, and histones, form a mesh-like extracellular structure. In NAFLD, the buildup of fat in the liver initiates an inflammatory reaction, which results in the mobilization of neutrophils and subsequent release of NETs.NETs not only worsen inflammation but also attract additional immune cells to the liver, including macrophages and regulatory T cells (Tregs), ultimately playing a role in the advancement of NASH-HCC^[47]15,[48]16. Therefore, NETs are regarded as a crucial element in the advancement of NAFLD. However, further extensive investigations are necessary to fully understand the involvement of genes associated with NETs in NAFLD. Researchers can now quickly evaluate the expression levels of numerous genes, thanks to the notable progress in gene microarrays and single-cell sequencing technologies. This advancement greatly contributes to our comprehension of the genetic causes of diseases. Hence, we aim to utilize bioinformatics to uncover the mechanisms through which NETs facilitate NAFLD, offering proof to guide the creation of diagnostic and therapeutic approaches for NAFLD. The study is illustrated by the workflow diagram in Fig. [49]1. Fig. 1. [50]Fig. 1 [51]Open in a new tab Flowchart of this study. Materials and methods Data sources and processing We consolidated multiple liver tissue transcriptomic datasets obtained from the Gene Expression Omnibus (GEO) database ([52]https://www.ncbi.nlm.nih.gov/). The criteria for selecting raw expression profile datasets included: (1) a focus on expression profiling via array methods; (2) inclusion of datasets containing liver tissue samples from both NAFLD patients and control subjects; (3) a minimum sample size of 15; (4) the presence of either raw data or array-based gene expression profiles in the GEO database. Consequently, four datasets met these criteria: [53]GSE89632 (Microarray, platform [54]GPL14951), [55]GSE48452 (Microarray, platform [56]GPL11532), [57]GSE66676 (Microarray, platform [58]GPL6244), and [59]GSE164760 (Microarray, platform [60]GPL13667). Additional details are available in the Supplementary Table [61]1. Similar to our previous study^[62]17, we initially merged datasets [63]GSE89632, [64]GSE48452, and [65]GSE66676, comprising 72 normal and 104 NAFLD samples. These datasets were then normalized using the "sva" package^[66]18. Differential gene expression between NAFLD and control groups was analyzed with the "limma" package^[67]19, considering p-values below 0.05 as statistically significant. For validation, datasets [68]GSE164760 (6 normal and 74 NAFLD samples) were utilized. To identify NETs-related genes, we compiled a list of 170 genes from existing literature^[69]20,[70]21 (Supplementary Table [71]2). Gathering and handling of data for single-cell RNA-Seq analysis To assess the influence of the immune microenvironment in the liver on NAFLD and fibrosis, we analyzed the scRNA-seq dataset [72]GSE136103.In analyzing [73]GSE136103 including four high-quality liver samples: [74]GSM4041162, [75]GSM4041163, [76]GSM4041165, and [77]GSM4041167.The "Seurat" package^[78]22 was employed for single-cell sequencing data analysis. The QC process started by choosing cells that had mitochondrial gene content lower than 15% and genes that were expressed in a minimum of three cells, within an expression range of 500 to 5000. For further analysis, we pinpointed 2000 genes characterized by high variability. To minimize batch effects across the four samples, the "Harmony" package was employed. Subsequently, cell clusters were created using the “FindClusters” and “FindNeighbors” functions, and the “t-SNE” method was applied for visualization. The selection of marker genes, vital for annotating different cell types, was informed by previous research findings^[79]23. The AddModuleScore function was utilized to determine each cell's unique signature score, specifically targeting NETs genes. Seurat's "FindMarkers" function was used to identify differentially expressed genes (DEGs) between two distinct groups. We determined the statistical significance of these DEGs by employing the Wilcoxon test, with the adjusted p-value threshold set below 0.05, while keeping other parameters at their default values. Genes exhibiting diverse expressions in cells with distinct NETs scores were identified as potential contributors to NETs at the single-cell transcriptome level. The identified genes were subsequently included in the Weighted Gene Co-expression Network Analysis (WGCNA) to conduct a more comprehensive evaluation of gene expression profiles. In addition, cell interaction dynamics were examined using the "CellChat" R package^[80]24. Analysis of gene co-expression networks using weights (WGCNA) The identification of co-expression modules involved the utilization of the R package 'WGCNA' (version 1.70.3)^[81]25. The analysis, focusing on the NAFLD group, used a combined dataset of gene expressions. To begin, we established an appropriate soft threshold β for creating a scale-free network. Afterward, the weighted adjacency matrix was converted into a topological overlap matrix (TOM), and the dissimilarity (dissTOM) was computed. Next, we utilized the dynamic tree-cut technique to group genes and identify modules. The module that demonstrated the most substantial correlation with the NETs score was earmarked for in-depth analysis. Identification of differentially expressed genes Differentially expressed genes (DEGs) were identified within batch-calibrated datasets [82]GSE89632, [83]GSE48452, and [84]GSE66676. For screening DEGs between NAFLD and normal samples, the Limma program package^[85]26 was utilized, adopting a P. adj value < 0.05 as the cutoff criterion. Due to dataset characteristics, a logFC threshold was not established. The resulting data was visualized using volcano plots and heatmaps, created with the R packages “ggplot2” and “pheatmap,” respectively. Identification of a hub gene associated with neutrophil extracellular traps (NETs) via machine learning algorithm Subsequently, we performed an intersection of the differentially expressed genes (DEGs) at the gene expression profile level with those in the NETs-related module, as identified through WGCNA. The genes found at this intersection were deemed to be involved in neutrophil extracellular traps (NETs) at both the gene expression profile and single-cell transcriptome levels. Consequently, these genes were designated as Neutrophil Extracellular Traps-related Genes (NRGs).To create a strong predictive model with improved accuracy, we utilized the Support Vector Machine (SVM), Least Absolute Shrinkage and Selection Operator (LASSO), and Random Forest (RF) machine learning algorithms. The LASSO technique is a regression approach that prioritizes variable selection to improve the predictive accuracy and interpretability of statistical models^[86]27. RF is advantageous for its lack of variable condition constraints and superior accuracy, sensitivity, and specificity, suitable for predicting continuous variables and providing stable forecasts^[87]28. SVM, on the other hand, constructs a hyperplane in feature space to effectively separate negative from positive instances with the maximum margin^[88]29. We utilized the “glmnet”^[89]30, “e1071”^[90]31, and “randomForest”^[91]32 R packages for conducting LASSO regression, SVM, and RF analysis, respectively. The choice of hub NAFLD genes was made according to the agreement genes identified by all three algorithms. Construction and validation of a diagnostic model for NAFLD The identified hub genes underwent multivariate logistic regression analysis using the 'ROCR' package^[92]33 to assess their diagnostic significance in NAFLD. Additionally, the area under the receiver operator characteristic (ROC) curve (AUC) was calculated to further evaluate their predictive accuracy. Moreover, a nomograph was developed to forecast the probability of NAFLD^[93]34, along with a calibration graph and decision curve analyses to showcase the stability of the model. Gene set variation analysis (GSVA) analysis and gene set enrichment analysis (GSEA) This study commenced with the retrieval of "c5.go.symbols" files from the MSigDB database. Following this, the “GSVA” R package (version 2.11)^[94]35 was utilized to reveal differences in enrichment among Gene Ontology (GO) categories via a non-parametric, unsupervised gene set variation analysis (GSVA) approach. A threshold for statistical significance was set at a p-value lower than 0.05. Moreover, the “clusterProfiler” package (version 3.16.1) was employed for conducting Gene Set Enrichment Analysis (GSEA) to determine the abundance of significant gene clusters in the Kyoto Encyclopedia of Genes and Genomes (KEGG) ([95]http://www.gsea-msigdb.org) pathways^[96]36. Assessment of the infiltration of immune cells using CIBERSORTx and ssGSEA The LM22 genetic characteristic matrix algorithm^[97]36 was used by Cibersort to evaluate the immune-system cell subtype in each sample by analyzing their gene expression profiles. Additionally, the p-value for the backfold product of each sample was computed using Monte Carlo sampling, and immune cell abundance differences between groups were estimated using the Wilcoxon rank sum test. In this study, a p-value < 0.05 was deemed statistically significant. The analysis concentrated on the expression of particular immune cell metagenes through Single-sample Gene Set Enrichment Analysis (ssGSEA). We utilized the 'GSVA' R package for the quantitative assessment of variations in immune functions between groups with high and low expressions of hub genes. The two-tailed Wilcoxon test (p-value < 0.05) was applied to pinpoint differences in immune-related functions between these groups. Subsequently, the 'vioplot' R package^[98]37 was employed for visualizing the results. Experimental animals and histological examination In this study, twelve 6-week-old male C57BL/6 J mice were used, housed in a controlled environment (ambient temperature: 23 °C ± 2 °C; 12-h light/dark cycle) with free access to food and water. After an initial acclimatization period of one week, these mice were randomly segregated into two dietary groups: a normal chow (CON) group and a high-fat diet (HFD) group. The HFD group received a diet with 60% calories from fat (d12492, Medicine, Jiangsu, China), in contrast to the CON group, which was provided with standard lab chow. Following a 16-week dietary regimen, we successfully developed a mouse model indicative of non-alcoholic fatty liver disease (NAFLD)^[99]41. At the end of this period, the mice were sedated using 2% isoflurane and subsequently euthanized via cervical dislocation for liver tissue collection. To analyze morphological changes, liver sections (5 μm thick) embedded in paraffin were subjected to staining with hematoxylin and eosin (H&E) and Oil Red O for assessing hepatic steatosis. The Harbin Medical University's Professional Committee for Animal Protection (2022-DWSYLLCZ-20) sanctioned all the experimental methodologies employed in this study, and all methods were performed in accordance with the relevant guidelines and regulations. Immunohistochemical analyses Immunohistochemical staining of paraffin-embedded liver sections was conducted following standard protocols. Primary antibodies rabbit anti-PHLDA1 (1:100 dilution; PA5948; Abmart) and rabbit anti-ZFP36L2 (1:100 dilution; PA4972; Abmart) were incubated with the sections overnight at 4 °C. The stained sections were then visualized using a light field microscope. To maintain objectivity, a blinded method was used to randomly select three mice for each section. Quantitative polymerase chain reaction (RT-qPCR) Total RNA was extracted from homogenized tissue samples using TRIzol reagent (Invitrogen, Carlsbad, CA, USA). Following this, 1 µg of the extracted RNA underwent reverse transcription with PrimeScript reverse transcriptase (Takara, Kusatsu, Japan). The expression levels of genes were then quantified by employing 2X SYBR Green qPCR (Vazyme, Nanjing, China). For normalization purposes, β-actin was used as an internal reference. The sequences of the primers used for the target genes are specified below: ZFP36L2: CACACTTCTGTCACCCTTCTAC (F), and GTCCAGCATGTTGTTCAGATTG (R); PHLDA1: CACCAGTCAAGCTGAAGGAA (F), and GTCATCACCACAGTGAAGTACA (R). The 2^−ΔΔCt technique was utilized for the semi-quantitative assessment of mRNA expression levels in the target genes. Results Neutrophil extracellular traps characteristic in single‑cell transcriptome We investigated a liver dataset ([100]GSE136103) using single-cell RNA sequencing (scRNA-seq) to explore the contribution of different liver cell types, including hepatocytes, endothelial cells, and immune cells, to the progression of NAFLD and fibrosis. The exam involved creating transcriptomic signatures unique to every cell type, which were determined by genes primarily expressed in each cell subset. This task included the analysis of four liver samples characterized by high-quality single-cell transcriptomes, specifically [101]GSM4041162, [102]GSM4041163, [103]GSM4041165, and [104]GSM4041167.To mitigate batch effects, the Harmony package was employed, successfully integrating the four samples as depicted in Supplementary Fig. [105]1A and B.To achieve dimensionality reduction, the top 2000 genes exhibiting the greatest variability were subjected to principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) techniques. This led to the clustering of all cells into 20 distinct groups with a resolution of 0.5, as shown in Supplementary Fig. [106]2. For cell classification, we utilized specific marker genes corresponding to various cell types, as established in prior research^[107]23. Figure [108]2A illustrates the analyzed cell types, which encompassed Hepatocytes, Cholangiocytes, B cells, CD4 T cells, CD8 T cells, Endothelial cells, Kupffer cells, Macrophages, Monocytes, and NK cells. Figure [109]2B showcases a heatmap of the top five marker genes for each of these cell groups. To assess the function of Neutrophil Extracellular Traps (NETs) in various cell types, we utilized the Seurat package's 'AddModuleScore' function to gauge the expression levels of a distinct group of 170 genes associated with NETs across all cell types (Fig. [110]2C). Notably, Monocytes, Kupffer cells, and Macrophages exhibited significantly heightened levels of NETs activity, as depicted in Fig. [111]2D. Following this, cells were categorized into groups with high and low NETs activity. Based on this classification, 1276 differentially expressed genes (DEGs) were discerned between these two groups, setting the stage for subsequent analysis (see Supplementary Table [112]3). Fig. 2. [113]Fig. 2 [114]Open in a new tab Characteristics of Neutrophil Extracellular Traps (NETs) identified within the single-cell transcriptome: (A) t-SNE plot showing cell types via marker genes. (B) Heatmap displaying each cell cluster's top five marker genes. (C) Scoring NETs activity for each cell. (D) NETs scores' variation across different cell types. Identification of the hub module and genes related to NETs in the expression profile of the NAFLD samples The single-sample Gene Set Enrichment Analysis (ssGSEA) algorithm is widely used to assess changes in biological processes and pathway activities in individual samples. For our investigation, we utilized ssGSEA to calculate a score representing the activity of Neutrophil Extracellular Traps (NETs) for every sample in the GEO-NAFLD dataset. This score was then employed as phenotype data in the subsequent analysis of Weighted Gene Co-expression Network Analysis (WGCNA). To identify modules that are strongly correlated with NETs scores, we performed WGCNA on the 1276 DEGs associated with NETs that were identified through single-cell sequencing. Outlier samples were excluded before the analysis (Supplementary Fig. [115]3B). By utilizing a soft power value of 6 (Supplementary Fig. [116]3A), gene modules were defined and the dynamic tree-cut algorithm was employed to detect three separate co-expressed gene modules. These modules were then displayed in a topological overlap matrix (TOM) heatmap (Supplementary Fig. [117]3C). Our analysis revealed that the MEblue module exhibited a strong correlation with the NRGs score in the expression profile (cor = 0.59, Fig. [118]3A). Moreover, the scatter diagram depicting the importance of genes (GS) compared to their membership in the blue module revealed a noteworthy association (cor = 0.68, p = 1.9e − 45, Fig. [119]3B). This suggests that the genes within this module could potentially have functional significance in neutrophil extracellular traps (Supplementary Table [120]6). To depict the dissimilarly expressed genes in the expression profile of normal tissues and NAFLD samples, volcano plots and heat maps (Fig. [121]3C,D) were employed. By intersecting the 209 genes from the blue module with the DEGs in the expression profile, we identified a total of 116 genes (Fig. [122]3E), which are believed to be involved in Neutrophil Extracellular Traps (NETs) at both global and single-cell transcriptome levels. These genes have been designated as Neutrophil Extracellular Traps-related genes (NRGs). Gene Ontology (GO) analysis of these NRGs revealed significant enrichment in biological processes (BP) including phagocytosis, response to bacterial origin molecules, lipopolysaccharide response, and steroid hormone response. Furthermore, enhancements were observed in the cellular component (CC) classification, particularly in the extracellular matrix containing collagen, and in molecular functions (MF) like inhibitory activity of enzymes, DNA-binding transcription activator activity, and apoptotic process involving cysteine-type endopeptidase inhibitor activity(Fig. [123]3F–G) (Supplementary Table [124]4). Fig. 3. [125]Fig. 3 [126]Open in a new tab Identifying Neutrophil Extracellular Traps-related genes (NRGs). (A) Module-trait heatmap links the MEblue module with the NETs trait. (B) The scatter plot correlates gene significance (GS) with module membership (MM) in the blue module. (C) DEGs heatmap, with color variations indicating gene expression trends in GEO NAFLD versus normal samples. The top 100 genes ranked by adjusted p-values are shown. (D) The volcano plot presents GEO NAFLD and normal sample differential analysis. (E) Venn diagram depicts common genes between the MEblue module and Microarray DEGs. (F) GO enrichment analysis for NRGs. (G) Circular diagram of NRGs' GO enrichment. Identification of NETs-related hub genes for NAFLD To further identify NETs-related hub genes for NAFLD, we analyzed the 116 Neutrophil Extracellular Traps-related genes (NRGs) using a combination of three machine-learning algorithms. Initially, LASSO regression analysis was performed on the intersected genes, resulting in the identification of twenty-seven candidate hub genes (Fig. [127]4A,B, Supplementary Table [128]7). Subsequently, the SVM-RFE analysis indicated that the classifier error was minimal when the eigengene number was twenty-nine (Fig. [129]4C,D, Supplementary Table [130]7). Following this, the Random Forest (RF) algorithm ranked the relative importance of the genes, identifying five characteristic genes (Fig. [131]4E and F, Supplementary Table [132]7). Finally, the overlapping genes determined by all three algorithms led to the selection of ZFP36L2 and PHLDA1 as the hub genes (Fig. [133]5A). Fig. 4. [134]Fig. 4 [135]Open in a new tab Identification of the NETs-related hub genes for NAFLD: (A,B) LASSO regression for hub gene screening. (A) LASSO coefficient spectrum for 28 genes, generating a logarithmic coefficient map. (B) Optimal lambda selection via repeated tenfold cross-validation, guiding feature selection. (C,D) RF for candidate hub gene identification. (C) Decision tree count impacts error rates; green, red, and black represent NAFLD, non-NAFLD, and all samples respectively. (D) Gini importance plot with mean decrease Gini on the horizontal axis and NRGs on the vertical. (E,F) Finalizing 29 characteristic NRGs using SVM-RFE. Fig. 5. [136]Fig. 5 [137]Open in a new tab NRGHs-related Nomogram built to evaluate clinical utility. (A) Overlapping genes among three algorithms are shown in a Venn diagram. (B) A predictive nomogram for NAFLD occurrence constructed from NRGHs. (C) In decision curve analysis, red highlights the nomogram's net benefit in NAFLD prediction versus universal or no NAFLD assumptions (black and gray, respectively). (D) The calibration curve compares actual versus predicted NAFLD rates. An ideal model's perfect prediction is indicated by the dotted diagonal, with the nomogram's performance shown by the solid line, where closer proximity to the diagonal suggests higher accuracy. Evaluation of hub genes associated with NETs We developed a nomogram model centered on the two NET-associated hub genes to estimate the probability of NAFLD onset and to evaluate their predictive accuracy (as illustrated in Fig. [138]5B). The efficacy of this model was corroborated using a calibration curve (shown in Fig. [139]5D) and through Decision Curve Analysis (DCA) (depicted in Fig. [140]5C). Then, boxplots were used to show the expression of the two hub genes in the training set and the validation set [141]GSE164760 (shown in Fig. [142]6A,B). Furthermore, the Receiver Operating Characteristic (ROC) analysis was utilized to ascertain the Area Under the Curve (AUC) and the 95% Confidence Intervals (CI) for each of the genes under consideration. The results were as follows: PHLDA1 (AUC: 0.783, 95% CI 0.711 − 0.849) and ZFP36L2 (AUC: 0.713, 95% CI 0.628 − 0.795) (Fig. [143]6C and E), demonstrating substantial diagnostic efficiency. Furthermore, ROC analysis of the validation dataset [144]GSE164760 showed similar efficacy for PHLDA1 (AUC: 0.753, 95% CI 0.561 − 0.923) and ZFP36L2 (AUC: 0.721, 95% CI 0.588 − 0.838) (Fig. [145]6D and F). Fig. 6. [146]Fig. 6 [147]Open in a new tab NRGHs' efficacy in both training and validation sets: (A) Boxplots compare ZFP36L2 and PHLDA1 expressions between NAFLD and controls in the training set. (B) A similar comparison is in the validation set [148]GSE164760. (C,D) ROC curves for the PHLDA1 gene in diagnosing NAFLD in the training set (C) and [149]GSE164760 validation set D. (E,F) ROC analysis for ZFP36L2 gene in NAFLD diagnosis within the training set (E) and [150]GSE164760 validation set (F). *P < 0.05, ***P < 0.001. NETs-related hub genes were associated with NAFLD-related enrichment pathways To delve deeper into the molecular mechanisms of two NETs-related hub genes in the context of NAFLD diagnosis, ssGSEA-KEGG pathway enrichment analysis was applied to each gene biomarker. The graphical representations highlighted the top six most enriched pathways (as seen in Fig. [151]7A,B and Supplementary Table [152]5). Comprehensive analysis indicated significant enrichment of these two hub genes in pathways such as cytokine-cytokine receptor interaction, MAPK signaling, ECM receptor interaction, olfactory transduction, and the JAK-STAT signaling pathway. Subsequently, NAFLD samples were categorized into high and low-expression groups based on the median expression levels of the hub genes.GSVA enrichment analysis was then performed to investigate the differential pathways between these groups. Comprehensive analysis indicated that high expression of ZFP36L2 might activate pathways such as protein export, nonhomologous end joining, regulation of autophagy, and riboflavin metabolism. Conversely, low expression of ZFP36L2 was associated with the activation of pathways like bladder cancer, glycosaminoglycan biosynthesis—keratan sulfate, taurine and hypotaurine metabolism, and glycosaminoglycan degradation (Fig. [153]7C). Similarly, high expression of PHLDA1 was linked to the activation of sulfur metabolism, base excision repair, and nonhomologous end joining pathways. In contrast, low expression of PHLDA1 was correlated with the activation of nod-like receptor signaling, leishmania infection, hematopoietic cell lineage, and glycosaminoglycan biosynthesis—chondroitin sulfate pathways (Fig. [154]7D). Fig. 7. [155]Fig. 7 [156]Open in a new tab GSEA and GSVA of 2 NRGHs. GSEA of ZFP36L2: (A) and PHLDA1 (B) genes using KEGG gene sets. GSVA of ZFP36L2 C and PHLDA1 D genes using KEGG gene sets. The correlation of the NETs-related hub genes with single‑cell characteristics To evaluate the function of NRGHs in the liver's microenvironment at the single-cell transcriptomic level, we analyzed the expression patterns of PHLDA1 and ZFP36L2 across different cell types (Fig. [157]8A). These results revealed that ZFP36L2 is broadly expressed in various liver cell types, with PHLDA1 predominantly expressed in Hepatocytes and Cholangiocytes (Fig. [158]8B,C). Employing the “AddModuleScore” function, we determined the signature-specific score for each cell based on the NETs-related hub genes (NRGHs). Significantly, CD4 T cells, NK cells, and CD8 T cells displayed markedly higher scores (see Supplementary Fig. [159]4). Cells were divided into groups with high and low scores based on their NRGHs scores, followed by a subsequent differential analysis. KEGG and GSEA pathway analyses of the differentially expressed genes showed significant enrichment in pathways, including natural killer cell-mediated cytotoxicity, Th1 and Th2 cell differentiation, Th17 cell differentiation, cytokine-cytokine receptor interaction, and the chemokine signaling pathway (refer to Fig. [160]8D and E). It was observed that liver cells in the microenvironment with varying NRGHs scores exhibited diverse communication patterns (Fig. [161]8F). Within the microenvironment of the liver, various types of cells have the ability to act as transmitters, recipients, facilitators, and agents in the process of cellular communication, ultimately resulting in distinct intercellular cues. Our study identified significant changes and influencers in the cell communication signals of the low-score group, particularly in VISFATIN, ANGPTL, and COMPLEMENT signaling (Fig. [162]8G–I). These results indicate that such signals could have a regulatory impact on inflammation, metabolism, and apoptosis in the liver microenvironment^[163]38–[164]40. Fig. 8. [165]Fig. 8 [166]Open in a new tab NRGHs' correlation with single-cell characteristics: (A–C) ZFP36L2 and PHLDA1 expression across cell types was analyzed via single-cell RNA-seq. (D) KEGG dissects DEGs in high vs. low NRGHs score groups. (E) GSEA pinpoints GO terms prevalent in these groups by NRGHs scores. (F) Identifying distinct signal pathways in varying NRGHs score groups. (G–I) Circos plots for VISFATIN (G), ANGPTL (H), and COMPLEMENT (I) pathways, with heatmaps showing cell type involvement. Immune microenvironment and immune-related functions analysis To investigate the immune response mechanisms in NAFLD, the CIBERSORT algorithm was employed to assess the variation in immune cell abundance between patients with NAFLD and healthy individuals (as illustrated in Fig. [167]9A,B). Our findings indicated a notably higher presence of Macrophages M1, Macrophages M2, and resting Mast cells in NAFLD samples compared to control ones. Conversely, levels of naive B cells, Monocytes, activated Mast cells, and Neutrophils were significantly reduced in NAFLD samples relative to controls. Additionally, we investigated the differences in immune functions between groups exhibiting high and low expression of the hub genes, as shown in (see Fig. [168]9C,D). Additionally, a correlation heatmap was used to illustrate the associations between the hub genes and different immune cells (refer to Fig. [169]9E). Fig. 9. [170]Fig. 9 [171]Open in a new tab Assessment of the extent of immune cell infiltration using data from the training set: (A) Immune cell proportion changes are shown in a stacked histogram. (B) NAFLD vs. control group differences in immune infiltration. ssGSEA assesses immune function differences in NRGHs' high vs. low expression groups for ZFP36L2 (C) and PHLDA1 (D). (E) The link between immune infiltration and NRGHs-inclusive genes. *P < 0.05, ***P < 0.001. Hub gene expression validation in NAFLD mouse model Hub Gene Expression Confirmation in NAFLD Mouse Model: The high-fat diet (HFD) group, exhibited severe hepatic steatosis and sporadic inflammation, as evidenced by H&E and Oil Red O staining of liver tissue sections (Fig. [172]10A,B). The qRT-PCR analysis revealed that PHLDA1 and ZFP36L2 expression levels were significantly reduced in the liver tissues of the HFD group in comparison to the control group (refer to Fig. [173]10E,F). This observation was corroborated by immunohistochemical staining, demonstrating lower levels of PHLDA1 and ZFP36L2 expression in the liver tissues of the HFD group compared to the control group. (Fig. [174]10C,D). Fig. 10. [175]Fig. 10 [176]Open in a new tab Validation of NRGHs expression in NAFLD mouse model: (A) Liver tissue samples from the HFD and CON groups were stained with H&E. (B) Liver tissue samples from the HFD and CON groups were stained with Oil Red O. (C,D) Immunohistochemical staining was performed to assess the expression levels of ZFP36L2 (C) and PHLDA1 (D) in the Liver tissues. (E,F) mRNA expression levels of ZFP36L2 (E) and PHLDA1 (F) in Liver tissues from HFD and CON groups. *P < 0.05, ***P < 0.001. Discussion The development of Non-alcoholic Fatty Liver Disease (NAFLD) is intricately linked to Neutrophil Extracellular Traps (NETs). While vital for defending against infection and inflammation, excessive accumulation of NETs can lead to liver damage and disease progression, potentially culminating in liver failure. Recent studies highlight the importance of peptidyl arginine deiminase 4 (PAD 4) in NET formation, noting that neutrophils deficient in PAD 4 are unable to form NETs^[177]41. Two members of the DNase 1 family, specifically DNase 1 and DNase 1-like 3 (DNase 1 L3), are recognized as key contributors to NET formation, effective both in vitro and in vivo^[178]41. Abnormal lipid accumulation due to lipotoxicity is considered a key event in the progression of hepatic steatosis.NAFLD is characterized by a notable rise in the generation of Free Fatty Acids (FFAs)^[179]42,[180]43. Inhibiting fatty acid synthase (FATCH) in primary human liver tissues has been shown to prevent steatosis^[181]44. In vitro studies have shown that FFAs, including linoleic acid (LA) and palmitic acid (PA), can induce NET formation, while oleic acid (OA) does not. However, suppressing the increase in free fatty acids (FFAs) is not accomplished by inhibiting NETs with DNase 1 or through the use of PAD 4 knockout mice, suggesting that NET formation is not a causative factor in steatosis, but rather a result of lipid accumulation^[182]15. The underlying mechanisms in this context still warrant further investigation. At first, we examined the liver scRNA-seq dataset ([183]GSE136103) and generated transcriptomic signatures specific to each cell subtype by identifying genes that were expressed significantly. Afterward, subsequently, the "AddModuleScore" function was used to determine the expression levels of 170 NETs associated genes in all cells, thereby quantifying NETs activity across various cell types. Notably, Monocytes, Kupffer cells, and Macrophages exhibited significantly higher NETs activity among ten cell types (Fig. [184]2D). Based on NETs activity, cells were classified into high and low NETs score groups, and 1276 differentially expressed genes (DEGs) between these groups were identified for further analysis. To identify NETs-related genes at both single-cell and bulk transcriptome levels, we utilized a novel bioinformatics approach that combines AddModuleScore, single-sample Gene Set Enrichment Analysis (ssGSEA), and Weighted Gene Co-expression Network Analysis (WGCNA) algorithms. Weighted Gene Co-expression Network Analysis (WGCNA) results indicated that genes within the blue module might be significantly associated with Neutrophil Extracellular Traps (NETs). By using the GEO database, we examined the levels of gene expression in individuals with NAFLD and those who are in good health. We then compared the 209 genes from the blue module with the differentially expressed genes (DEGs) found in the expression profile, resulting in the identification of 116 genes. These genes are believed to participate in NETs at both global and single-cell transcriptome levels and have been named Neutrophil Extracellular Traps-related genes (NRGs). Functional enrichment analysis of these NRGs suggested their strong association with immune response pathways and DNA-binding transcription activator activity. This corroborates previous research showing that overexpression of NETs can lead to liver inflammation^[185]45–[186]47, with NETs primarily composed of DNA^[187]14. Due to their capacity to recognize and manage high-dimensional feature data, machine learning algorithms, a crucial aspect of Artificial Intelligence (AI), have been widely employed in the identification and screening of hub genes, thanks to the swift progress of AI^[188]48. To further identify core NETs-related genes associated with NAFLD, we analyzed the 116 NRGs using three machine learning algorithms. In the end, ZFP36L2 and PHLDA1 were chosen as central genes, and their ability to diagnose NAFLD was confirmed in a separate dataset. The decay of mRNA targets^[189]49 is facilitated by CCCH tandem zinc finger (TZF) proteins, namely ZFP36, ZFP36L1, and ZFP36L2, which are members of the Tristetraprolin (TTP) family. An extra member, ZFP36L3, found exclusively in rodents, is specifically expressed in both the placenta and the yolk sac^[190]50. The TTP family proteins have been recognized as crucial in controlling cell apoptosis and have a notable function in cellular differentiation^[191]51–[192]53. Targeted gene studies in mice have delineated diverse functions of ZFP36 family members, notably identifying ZFP36L2 as a crucial regulator in hematopoiesis^[193]54–[194]56. Its role extends to female fertility, with its inactivation linked to impaired lipid metabolism^[195]57,[196]58. Imbalances in T-cell responses are observed in various autoimmune diseases^[197]59. ZFP36L2 is involved in the progression of human autoimmune diseases, specifically Systemic Lupus Erythematosus (SLE), due to its crucial function in the differentiation of hematopoietic stem cells and the formation of the thymus.ZFP36L2 shows notable downregulation in the PBMCs of individuals with SLE when compared to healthy individuals^[198]60. Furthermore, ZFP36L2 has been recognized as a gene linked to increased susceptibility in Multiple Sclerosis (MS), where its expression is notably reduced in patients relative to healthy counterparts^[199]61. The degradation of Ikzf2 mRNA^[200]62 has been linked to the inhibition of induced T cell proliferation (iTregs) through the involvement of ZFP36L2 in accelerating it. It is noteworthy that RNA-binding proteins play a crucial role in regulating T cell activation, proliferation, and cytokine production. The Zinc Finger Protein 36 (ZFP36) family genes (including Zfp36, Zfp36l1, and Zfp36l2) encode RNA-binding proteins that facilitate the degradation of transcripts containing AU-rich elements. The homeostasis of T cells relies on the RNA-binding proteins of the ZFP36 family, with primary dependency on ZFP36L1 and ZFP36L2 in T cells^[201]63. Previous studies have demonstrated that by promoting the differentiation and function of regulatory T cells, NETs serve as a bridge between adaptive and innate immunity^[202]16. In this study, a mouse model subjected to a Western diet was utilized to induce a NASH phenotype, revealing a direct relationship between enhanced regulatory T cell activity and NETs formation. Moreover, inhibiting regulatory T cells can prevent the progression of NASH liver. This finding underscores the role of the mitochondrial oxidative phosphorylation (OXPHOS) pathway in naïve CD4 positive T, mediated by TLR4-induced metabolic reprogramming^[203]16. Initially identified as T-cell death-associated gene 51 (TDAG 51), Pleckstrin Homology-like Domain Family A Member 1 (PHLDA1) was first linked to apoptotic processes in adaptive immunity^[204]64,[205]65. Research has shown that PHLDA1 is involved in regulating cell growth, maintaining energy balance, promoting cell specialization, and regulating programmed cell death^[206]66–[207]68. Recent research has underscored the substantial role of PHLDA1 in immune responses. As an illustration, Hossain and colleagues, as well as Han et al., have demonstrated that inhibiting PHLDA1 modifies the properties of macrophages and endothelial cells, thereby reducing oxidative and endoplasmic reticulum (ER) stress associated with atherosclerosis^[208]68 in studies related to Parkinson's disease. According to the report, PHLDA1 has been identified as a potent controller of neuroinflammation, as its suppression greatly hampers the activation of M1 microglia^[209]69. Another study demonstrated that compounds targeting PHLDA1 inhibition can mitigate neuroinflammation following an ischemic stroke by maintaining equilibrium between M1 and M2 polarization within microglia^[210]70. Additionally, recent hepatic research has uncovered that the miR-194/PHLDA1 axis is pivotal as an upstream regulator of IKK and MAPK in hepatic ischemia–reperfusion injury (IRI). Focusing on PHLDA1 might offer a promising strategy in the treatment of liver ischemia–reperfusion injury (IRI)^[211]70. Moreover, reduced hepatic levels of TDAG51 protein are positively correlated with obesity, hepatic steatosis, and insulin resistance (IR). Supporting this notion, established obese mouse models, including wild-type mice fed a high-fat diet (HFD) and leptin-deficient mice, exhibit significantly decreased hepatic TDAG51 protein levels^[212]67. In human hepatocytes, overexpression of PHLDA1 protein negatively regulates SREBP-1 protein, a key regulator of triglyceride synthesis. Restoring the endoplasmic reticulum stress response protein TDAG51 in hepatocytes can alleviate non-alcoholic fatty liver disease (NAFLD) in mice^[213]71. During inflammatory stimuli, TDAG51 acts as a regulator of the transcription factor FoxO1, leading to enhanced FoxO1 activity in LPS-induced inflammatory responses^[214]72. Concurrently, abnormal TLR4-induced inflammation is closely associated with the development of acute inflammatory diseases, including bacterial sepsis, and chronic inflammatory diseases such as autoimmune diseases, cancer, and metabolic disorders^[215]73–[216]75. Additionally, in response to LPS stimulation, neutrophils release NETs via activation of the Toll-like receptor 4 (TLR4)-JNK axis^[217]76. Consequently, steatosis may induce the formation of NETs through these various pathways. To date, there is no documentation regarding the roles of these two crucial genes in the initiation and progression of NAFLD. Nevertheless, our investigation uncovered that their primary engagement lies in pathways such as the interaction between cytokines and cytokine receptors, the signaling pathway of MAPK, the interaction with extracellular matrix receptors, the transduction of olfactory signals, and the signaling pathway of JAK-STAT. Furthermore, NF-κB is known to catalyze the production of NLRP3, the precursor of IL-1β, and additional pro-inflammatory cytokines. Potential triggers of inflammasomes include substances such as cholesterol crystals, Reactive Oxygen Species (ROS), and fatty acids, acting as Damage-Associated Molecular Patterns (DAMPs)^[218]77–[219]79. These mechanisms work together to activate toll-like receptors (TLRs), leading to the activation of NF-κB and the mitogen-activated protein kinase (MAPK) signaling pathways. These pathways play a crucial role in inflammatory and fibrotic processes^[220]77–[221]79. Recent research indicates that the JAK-STAT signaling pathway plays a crucial part in the development of inflammatory disorders. It is worth mentioning that Nicolas and his team described the disruption of the JAK-STAT pathway in conditions marked by inflammation, cancer, and neurodegeneration. Similarly, Cai and colleagues emphasized the connection between the release of different cytokines and inflammatory mediators and the JAK/STAT pathway, underscoring its regulatory function in the immune response to sepsis^[222]80. Additionally, recent research has also validated the involvement of JAK-STAT in NASH^[223]81. Furthermore, Wohlmann et al.It was discovered that TSLP induces inflammatory reactions in atopic conditions via the JAK-STAT signaling pathway^[224]82. Therefore, we hypothesize that the hub genes PHLDA1 and ZFP36L2 may participate in the pathogenesis of NAFLD through toll-like receptors, the MAPK signaling pathway, and the JAK-STAT pathway, potentially becoming viable targets for NAFLD treatment. However, their detailed mechanisms warrant further investigation. Given that Neutrophil Extracellular Traps-related Hub Genes (NRGHs) were derived from single-cell transcriptomic data, we re-examined this dataset to delve into the molecular mechanisms linked to NRGHs. The "AddModuleScore" function was utilized to compute a signature-specific score for each cell, anchored in the NRGHs. Significantly, CD4 T cells, NK cells, and CD8 T cells displayed markedly higher scores, indicating a potential role of these genes in the functioning of immune cells. Furthermore, we identified biological processes pertinent to NAFLD progression and immune function, including natural killer cell-mediated cytotoxicity, Th1 and Th2 cell differentiation, Th17 cell differentiation, as well as cytokine-cytokine receptor interactions and chemokine signaling pathways. These findings further substantiate the involvement of NRGHs in NAFLD progression and their association with immune functions. Furthermore, by analyzing cellular communication, we have discovered unique patterns of communication between liver microenvironments that have high and low NRGHs scores. For instance, we noted significant changes and influencing factors in the low NRGHs score group’s cellular communication signals in VISFATIN, ANGPTL, and complement signaling, suggesting their potential role in regulating inflammation, metabolism, and apoptosis in the liver microenvironment^[225]83–[226]85. Furthermore, the association between the onset of inflammation and NRGHs was demonstrated by utilizing the CIBERSORT algorithm to evaluate the variances in immune cell abundance among NAFLD patients and individuals in good health, as well as by analyzing the correlation between immune cells and NRGHs. However, our study still has a few limitations. First, while gene expression results were authenticated in mouse models, the derived conclusions came from a modestly sized group of NAFLD patients, highlighting the need for larger patient cohorts to ensure more dependable and solid results. Additionally, the diagnostic model for NAFLD formulated in this study demands more comprehensive evaluation and validation from external sources before its clinical implementation. Finally, this study concentrated exclusively on gene expression data. Consequently, subsequent studies should also delve into the changes in epigenetics, proteomics, and metabolomics involved in NAFLD's development. Conclusions Based on our current understanding, this study represents the first investigation into the molecular properties of genes associated with NETs in NAFLD. It has discovered two potential biomarkers, namely PHLDA1 and ZFP36L2, and has provided insights into their functions within the liver microenvironment. These discoveries may aid in the diagnosis and management of NAFLD, ultimately aiming to enhance patient prognosis. Supplementary Information [227]Supplementary Figure 1.^ (653.8KB, pdf) [228]Supplementary Figure 2.^ (3MB, pdf) [229]Supplementary Figure 3.^ (562.2KB, pdf) [230]Supplementary Figure 4.^ (373.1KB, pdf) [231]Supplementary Legends.^ (15KB, docx) [232]Supplementary Tables.^ (187.5KB, xlsx) Acknowledgements