Abstract

Objective

   Diabetes mellitus combined with nonalcoholic fatty liver disease is a
   prevalent and intricate metabolic disorder that presents a significant
   global health challenge, imposing economic and emotional burdens on
   society and families. An in-depth understanding of the disease
   pathogenesis is crucial for enhancing diagnostic and therapeutic
   efficacy. Therefore, the study aims to identify and validate
   autophagy-related diagnostic biomarkers associated with T2DM-associated
   MAFLD, investigate regulatory mechanisms in disease progression, and
   explore cellular diversity within the same tissue using single-cell
   sequencing data.

Methods

   This study utilized four datasets retrieved from the Gene Expression
   Omnibus (GEO) database: [35]GSE15653, [36]GSE89632, [37]GSE24807 and
   [38]GSE23343. The analysis involved variance analysis, WGCNA analysis,
   PPI network construction, machine learning application, examination of
   autophagy-related gene sets, and diagnostic ROC analysis to identify
   and validate autophagy-related biomarkers in T2DM combined with MAFLD
   within an independent external dataset. Functional enrichment analysis,
   immune infiltration analysis, and validation of gene significance in
   T2DM combined with MAFLD progression were conducted using animal
   experiments to understand the biological functions and immunomodulatory
   roles of key biomarkers. Cellular diversity within liver tissues was
   characterized at the single-cell level, exploring interrelationships,
   differentiation, and developmental trajectories among cell populations
   through cellular communication and pseudo-temporal analyses.

Results

   The study identified four key biomarkers (IRAK3, TNFRSF1A, CX3CR1,
   JUNB). Real-time fluorescence quantitative PCR analysis in animal
   experiments demonstrated significantly higher mRNA expression levels of
   IRAK3, TNFRSF1A, CX3CR1, and JUNB in T2DM and MAFLD rat liver tissues
   compared to the control group. Quantitative immunohistochemical
   analysis revealed notably elevated protein expression levels of IRAK3,
   TNFRSF1A, CX3CR1, and JUNB in liver tissues of rats with T2DM and MAFLD
   when contrasted with the control group (P < 0.05). Enrichment analysis
   indicated associations of T2DM combined with MAFLD pathogenesis with
   pathways such as the NF-kappa B signaling pathway, MAPK signaling
   pathway, Fluid shear stress and atherosclerosis, Insulin resistance,
   and Cytokine-cytokine receptor interaction. Correlative analysis
   uncovered connections between immune infiltration and the identified
   genes. Single-cell transcriptomic analysis highlighted the
   differentiation of CX3CR1, JUNB, and TFRC in various
   single-cell-annotated populations. The pseudo-temporal analysis of
   epithelial cells identified enriched genes at crucial nodes related to
   “Leukocyte transendothelial migration”, “Lipid and atherosclerosis”,
   and “Type II diabetes mellitus” signaling pathways. Additionally, four
   cellular communication signaling pathways (TNF, CXCL, VEGF, and MIF)
   potentially significant in T2DM combined with MAFLD progression were
   identified through cell communication analysis.

Conclusion

   This study unveiled potential associations and key biomarkers (IRAK3,
   TNFRSF1A, CX3CR1, JUNB) concerning T2DM combined with MAFLD and
   relevant pathways, offering novel insights for the investigation of
   these two conditions.

   Keywords: T2DM, MAFLD, single-cell RNA sequencing, biomarkers, immune
   signaling pathway

1. Introduction

   Type 2 Diabetes Mellitus (T2DM) is a prevalent metabolic disorder
   characterized by insulin resistance and relative insulin deficiency,
   resulting in hyperglycemia. The International Diabetes Federation
   reported approximately 537 million adults living with diabetes globally
   in 2021, with T2DM representing about 90% of these cases ([39]1).
   Metabolic dysfunction-associated fatty liver disease (MAFLD), formerly
   termed non-alcoholic fatty liver disease (NAFLD), is characterized by
   excessive fat accumulation in the liver and affects nearly 25% of the
   global population. The surge in MAFLD cases mirrors shifts in the
   environment and lifestyle driven by rapid global industrialization;
   environmental elements like air pollution may heighten MAFLD risk,
   compounding socio-economic challenges. In essence, MAFLD represents an
   escalating global health issue necessitating robust preventive and
   therapeutic strategies. The pathogenesis of MAFLD is multifactorial,
   heavily influenced by obesity, insulin resistance, and dyslipidemia.
   The co-occurrence of MAFLD in T2DM patients raises concerns as it
   increases the risk of liver-related complications and cardiovascular
   events ([40]2, [41]3). This relationship is intricate, with overlapping
   metabolic pathways and inflammatory processes that may drive disease
   progression. Furthermore, MAFLD is closely linked to metabolic
   dysregulation, primarily insulin resistance, which exacerbates liver
   fat accumulation and inflammation ([42]4). The interplay between T2DM
   and MAFLD is significant; both conditions share common risk factors and
   pathophysiological mechanisms, including obesity and dyslipidemia,
   creating a vicious cycle that further deteriorates metabolic health
   ([43]5). Given the insidious progression of T2DM with MAFLD,
   identification of novel diagnostic markers significantly impacts the
   early detection and treatment such kind of disease.

   The correlation between autophagy and MAFLD has garnered substantial
   attention. Autophagy, a conserved catabolic process, plays a crucial
   role in physiological functions by degrading misfolded proteins,
   eliminating dysfunctional organelles, and influencing growth and aging.
   Autophagy’s formation of autophagosomes, which ensnare substances for
   degradation and unite with lysosomes housing hydrolytic enzymes,
   results in the breakdown of encapsulated components into biomolecule
   monomers like amino acids, fatty acids, and nucleotides for cellular
   recycling. An integral mechanism of MAFLD involves diminished
   autophagic capacity; the accumulation of excess lipids in hepatocytes
   can hinder the autophagic process, impeding efficient lipid removal and
   exacerbating fatty liver progression. Moreover, oxidative stress
   emerges as a significant determinant of autophagic levels in MAFLD;
   augmented reactive oxygen species (ROS) production can disrupt
   autophagy regulation and functionality. Research indicates a prevalent
   magnesium deficiency in individuals with obesity and metabolic
   syndrome. Magnesium supplementation demonstrates effectiveness in
   ameliorating metabolic disorders such as obesity and fatty liver by
   modulating cellular lipid metabolism through autophagy stimulation via
   the AMPK/mTOR pathway, thereby curbing lipid buildup. Recognizing
   autophagy’s pivotal role in MAFLD pathogenesis, drug development
   targeting the autophagic process holds promise as a novel therapeutic
   avenue for MAFLD treatment.

   The interplay between T2DM and MAFLD has not been fully elucidated,
   particularly regarding the role of autophagy-related genes. To address
   this knowledge gap, our study aimed to examine the relationship between
   autophagy-related genes and endoplasmic reticulum stress-associated
   genes in the context of T2DM and MAFLD. We utilized relevant datasets
   from the GEO database for differential analysis to identify
   differentially expressed genes (DEGs). Subsequently, we employed WGCNA,
   PPI network analysis, machine learning techniques, diagnostic ROC
   analyses, and enrichment assessments to identify key biological markers
   and elucidate their pathways, gaining deeper insights into the
   underlying biological mechanisms.

   In our rat model of T2DM combined with MAFLD, we verified the
   expression levels of predicted genes, identifying IRAK3, TNFRSF1A,
   CX3CR1, and JUNB as potentially significant candidates. Further,
   leveraging single-cell analysis, we elucidated potential mechanisms
   underlying the progression of T2DM and MAFLD, focusing on precise
   cellular localization, differentiation, developmental trajectories
   within pertinent cell populations, and potential intercellular
   communication networks. This study not only enhances our understanding
   of the pathogenesis of T2DM combined with MAFLD but also identifies
   diagnostic factors linked to autophagy. Such advancements are vital for
   improving early identification, precise diagnosis, personalized
   treatment, and disease monitoring for these conditions. The findings
   provide novel opportunities and challenges for the prevention and
   treatment of T2DM and MAFLD. The flow diagram of this study is showed
   in the [44]Supplementary Figure 1 .

2. Material and methods

2.1. Identification of differential expression genes

   The datasets [45]GSE15653, [46]GSE89632, [47]GSE24807 and [48]GSE23343
   were obtained from the GEO database through the GEOquery package
   (version 2.72.0) ([49]6),please refer to [50]Supplementary Table S1 for
   more details.[51]GSE15653 and [52]GSE89632 served as training sets,
   while [53]GSE24807 and [54]GSE23343 was utilized as the validation set.
   The MAFLD samples within the [55]GSE89632 dataset were categorized by
   different stages: Simple Steatosis (SS) and Non-Alcoholic
   Steatohepatitis (NASH), and were designated as GSE89632_Simple
   Steatosis (GSE89632_SS) and GSE89632_Non-Alcoholic Steatohepatitis
   (GSE89632_NASH), respectively. The data from [56]GSE15653, GSE89632_SS,
   and GSE89632_NASH were normalized using the normalizeBetweenArrays
   function from the limma package ([57]7). Principal component analysis
   (PCA) was performed on the normalized dataset. Difference analysis was
   performed separately to obtain the respective DEGs, using limma package
   (version 3.58.1), with screening criteria: |log2FC| > 0.5 and P < 0.05
   ([58]8, [59]9). Subsequent analyses included the generation of PCA
   plots, volcano plots, and basic numerical heatmaps using “ggplot2,”
   alongside the creation of complex numerical heatmaps utilizing the
   “ComplexHeatmap” package (version 2.20.0) ([60]10).

2.2. WGCNA analysis

   Weighted gene co-expression networks were analyzed using the WGCNA
   package (version 1.72-5) ([61]11) for the datasets GSE89632_SS and
   GSE89632_NASH. Initially, the analysis focused on the top 25% of genes
   based on expression variance, identifying and excluding potential
   outlier samples through clustering analysis. Next, the
   pickSoftThreshold function was employed to ascertain the optimal soft
   threshold. After establishing the threshold, the cutreeDynamic function
   identified dynamic modules and set the minimum gene count required for
   each module ([62]12). Initial modules were identified through the
   Dynamic Tree Cut (DTC) method, with similar modules subsequently merged
   based on the clustering relationships of modular eigengene (ME).
   Subsequently, the TOMsimilarity function calculated the topological
   molecular similarity matrix among genes, with 1,000 genes randomly
   selected for heatmap visualization. Pearson correlation coefficients
   were computed to evaluate associations between modules and clinical
   traits, with statistical significance assessed via the corPvalueStudent
   function (WGCNA R package). Modules demonstrating significant
   correlations with both SS and NASH (p < 0.05) were selected for
   downstream analysis. KEGG pathway enrichment analysis was then
   systematically performed on the genes within these modules.

2.3. Protein-protein interaction network

   The results of DEGs of [63]GSE15653 were intersected with DEGs of
   GSE89632_SS, GSE89632_NASH, and WGCNA modular genes, respectively, and
   the results were visualized in an “UpSet plot”. In order to explore the
   potential interactions of the above genes, we used the STRING database
   ([64]https://string-db.org/) for the analysis ([65]13), and the results
   were imported into Cytoscape (version 3.10.2) for the analysis of
   protein interaction networks ([66]14). Meanwhile, the Maximal Clique
   Centrality (MCC), Degree and Molecular Network Centrality (MNC)
   algorithms in CytoHubba plug-in were applied to select the top 15 genes
   at key positions in the PPI network ([67]15). Subsequently, the results
   of the three algorithms were intersected and visualized using a Wayne
   diagram. Chromosomal localization analysis of the intersection hub
   genes was performed using the “circlize package”. Finally, the
   correlation analysis was performed using Spearman, and the correlation
   pie charts were used to show the degree of association and interactions
   between genes, which were visualized using the “ggplot2 package”.

2.4. Acquisition of autophagy-related genes

   In order to deeply investigate the molecular mechanisms related to
   autophagy in T2DM combined with MAFLD, we combined the resources from
   multiple databases. We obtained the autophagy-associated molecular
   mechanisms from PubMedGene database
   ([68]https://www.ncbi.nlm.nih.gov/gene/), Genecard database
   ([69]https://www.genecards.org/), and GSEA database
   ([70]https://www.gsea-msigdb.org/), autophagy-related websites
   ([71]http://www.autophagy.lu/),
   ([72]http://hamdb.scbdd.com/home/index/) to obtain autophagy-related
   genes.

2.5. Machine learning

   The GSE89632_SS and GSE89632NASH datasets underwent individual
   screenings using LASSO, RFE-SVM, and Boruta methods, respectively. The
   diagnostic LASSO coefficient screening leveraged the glmnet package
   (Version 4.1.7) to analyze the cleaned data, derive variable lambda,
   likelihood, and L1 regularization values, along with visualization. We
   employed ten-fold cross-validation (seed number: 2022) for validation
   ([73]16). Additionally, we utilized two feature selection strategies:
   Recursive Feature Elimination (RFE) and the Boruta algorithm. RFE
   eliminates features without significant impact on Accuracy during model
   training repetitions ([74]17), while the Boruta algorithm selects or
   eliminates significant and non-significant features in each base
   learner training iteration based on shadow features and Z-scores
   ([75]18). This study integrated SVM base learner scores from the e1071
   package (Version 1.7-14) with the RFE strategy ([76]17) through 10*10
   fold cross-validation and applied the Boruta algorithm using the Boruta
   package (Version 8.0.0) ([77]18). Subsequently, the outcomes of Lasso,
   Boruta, and RFE-SVM screenings, combined with autophagy-related genes,
   were intersected to identify diagnostic biomarkers linked to autophagy
   in T2DM and NAFLD. These results were visually represented in Wayne
   diagrams.

2.6. Receiver operating characteristic analysis

   In datasets GSE89632_SS, GSE89632_NASH and [78]GSE15653, ROC curve
   analysis was performed on each of the three datasets using the pROC
   software package to determine the sensitivity and specificity of the
   above genes ([79]19), to predict the ROC-related information and data
   of the variables at their respective cut-off values, and to assess the
   diagnostic accuracy, and the results were presented as ROC The results
   were quantified by area under the curve (AUC), and the AUC ranges
   between 0.5 and 1, and values closer to 1 indicate superior diagnostic
   performance of the variable in predicting clinical outcomes and
   visualized using ggplot2.

2.7. Enrichment analysis

   To investigate the functions and pathways of the above biomarkers, we
   used the DAVID online database ([80]https://david.ncifcrf.gov/) for GO
   and KEGG enrichment analysis ([81]20–[82]22). The “org.Hs.eg.db”
   package was used to convert the ID of the input gene list, and the
   “clusterProfiler package” was used for enrichment analysis ([83]23).
   The significance of the enrichment results was assessed by calculating
   the z-score value for each enriched entry using the “GOplot” package
   ([84]24). The results were screened for significance and biological
   significance at P<0.05 and FDR<0.2 ([85]25). At the same time,
   GSEA(Gene Set Enrichment Analysis) was performed ([86]26), and the
   results were screened according to the following criteria: normalized
   enrichment score |NES| > 1, FDR < 0.25, p.adjust < 0.05. Finally, the
   results were visualized using the “ggplot2 package”.

2.8. Immune cell infiltration analysis

   In order to explore the immune cell infiltration in liver tissues of
   patients with T2DM combined with MAFLD, the “Single Sample Gene Set
   Enrichment Analysis” (ssGSEA) method was used in this study ([87]27).
   The ssGSEA algorithm provided by the GSVA package was utilized for this
   purpose. Using markers specific to each class of immune cells ([88]28)
   as gene sets, the enrichment score for each class of immune cells in
   each sample was calculated to assess the infiltration of immune cells
   in each sample. All analyses and visualizations were performed in R
   4.2.1. We used the ggplot2 package to draw histograms to visualize the
   differences in immune cell infiltration between the normal and control
   groups. In addition, Spearman statistics were used to analyze the
   correlation between them one by one, and the linkET package was used
   for the calculation of the data portion of the network graph. The
   analysis results were visualized by ggplot2 package for group
   comparison plots, lollipop plots, correlation scatter plots, and
   correlation network heatmaps, thus demonstrating more intuitively the
   infiltration of immune cells in liver tissues of patients with T2DM
   combined with MAFLD.

2.9. Preparation of T2DM combined with MAFLD rat model

   Thirty healthy clean-grade male Sprague–Dawley rats, aged 6–8 weeks,
   weighing 180–220 g, were provided by Wuhan Yunkron Technology Company.
   The animal production and use license number is DCXR(E)2018-0021. SYXK
   (E) 2013-0069, respectively. After 1 week of normal feeding, high-fat
   and high-sugar diets and regular diets were prepared according to the
   formula. Twenty rats were fed with high-fat and high-sugar diet,
   high-fat and high-sugar diet feed formula: 10% lard, 20% sucrose, 2%
   cholesterol, 60% ordinary feed, 8% egg yolk powder(Jiangsu Xietong
   Pharmaceutical Bio-engineering Co., Ltd.) and 10 rats in normal control
   group were fed with normal diet. Twenty rats were fed a 100 g high-fat
   and high-sugar diet daily with free access to food and water. Corn oil
   5 mL/kg was given by gavage on an empty stomach at 8:00 am every
   morning. At the end of the 12th week, 10 rats in the model group were
   given intraperitoneal injection of streptozotocin (STZ) (30 mg/kg)
   overnight fasting, and the other 10 rats in the model group were not
   given STZ injection. Ten control rats fed a regular diet were injected
   with the corresponding volume of citrate buffer. Blood samples were
   collected from the tail vein of 10 rats fed a high-fat and high-sugar
   diet and injected with STZ 3 days later, and random blood glucose
   was≥16.7 mmol/L. Blood glucose was monitored after 2 weeks, and fasting
   blood glucose was detected on the 2nd, 4th, 6th, 8th, 10th, 12th and
   14th days. At the 14th week, 4 rats in each group were randomly
   selected to complete liver ultrasound, and fatty liver results were
   formed to determine the success of modeling. Finally, the rats were
   divided into 3 groups: (1) T2DM+MASLD group (n=10); (2) High-fat and
   high-glucose group (n=10); (3) normal control group (n=10); After 15
   weeks, the rats were sacrificed under anesthesia, and the livers were
   collected, rinsed with normal saline at 4°C, weighed and placed in
   preservation solution and stored in a refrigerator at −80°C.

2.10. Quantitative real time PCR

   Real-time fluorescence quantitative PCR (Polymerase Chain Reaction):
   0.15 g of rat liver tissue was taken from each group, and total RNA was
   extracted by the Trizol method. cDNA was reverse-transcribed into the
   corresponding cDNA in accordance with the reverse-transcription kit,
   and then subjected to real time Polymerase Chain Reaction
   (PCR)、RELB、S100A9 and SOCS1 primer sequences are shown in
   [89]Supplementary Table S2 , Chain Reaction), PCR reaction total system
   20μL, PCR amplification conditions: denaturation at 95°C for 10 min,
   annealing at 60°C for 1 min, extension at 95°C for 15 s, 40 cycles,
   each sample set up 3 replicate wells, internal reference GAPDH. The
   results of the experiments were analyzed by Bio-Rad Fluorescence
   Quantitative Analysis. The results of the experiments were read by
   Bio-Rad fluorescence quantitative analysis software, and the
   quantitative calculation of the mRNA expression levels of and in each
   group was expressed by 2−ΔΔCt (CT is the number of cycles). The main
   observation indexes were RELB, S100A9, SOCS1 mRNA expression levels in
   rat liver tissues in each group.

2.11. Immunohistochemical staining

   Kidney tissue specimens were fixed for 48h and paraffin-embedded. The
   paraffin-embedded tissue sections were serial sectioned at 3 μm,
   dewaxed and hydrated. 3% H 202 was incubated at room temperature for 10
   min to remove endogenous peroxidase. After 10 min of citrate antigen
   repair, primary antibodies RELB (1:50), S100A9(1:300), SOCS1 (1:300),
   were added dropwise and incubated at 37°C for 25 min, and then freshly
   prepared DAB color development solution was added dropwise. The film
   was stained with hematoxylin and sealed with neutral gum. At the same
   time, PBS was used as negative control instead of primary antibodies.
   The immunohistochemical pathology pictures were analyzed using
   ImageScope software as follows: three different 400x fields of view
   were intercepted from each section, enter the analysis module of
   lmageScope software, and set all the dark brown color on the tissue
   sections as strong positive, brownish yellow as moderate positive,
   light yellow as weak positive, and blue nuclei as negative. Each tissue
   point was then identified and analyzed to find out the area (in pixels)
   of strong positive, moderate positive, weak positive and negative, the
   percentage of positivity and finally the H-score rating.

2.12. Single-cell data preprocessing and clustering annotation

   High-throughput sequencing expression profile data ([90]GSE136103) were
   downloaded from the Gene Expression Omnibus (GEO) database
   ([91]http://www.ncbi.nlm.nih.gov/) ([92]29). Twenty liver tissue
   samples were selected, including 9 MAFLD patients and 11 healthy
   individuals. Doublets in each sample were identified using
   DoubletFinder ([93]30). Subsequently, doublets were filtered out,
   retaining cells meeting the following criteria: nFeature_RNA > 500 and
   < 3000; nCount_RNA < 10,000; percent_mito < 10. Cells from different
   samples were then merged using the merge function and analyzed using
   the Seurat package ([94]31). Using the NormalizeData function, data
   were normalized for each library with the LogNormalize method and
   scale.factor of 10,000. The top 2,000 variable features were identified
   through the FindVariableFeatures function ([95]32), and linear
   dimensionality reduction was performed using the RunPCA function with
   default parameters (npcs = 50). Batch effects were corrected using the
   RunHarmony function ([96]33), followed by nonlinear dimensionality
   reduction using the RunUMAP function. The nearest neighbor graph was
   constructed using the FindNeighbors function (k.param = 20) with 30
   principal components. Cell clustering was completed using the
   FindClusters function. Cluster marker genes were identified through the
   FindAllMarkers function with min.pct = 0.5 and logfc.threshold = 1.
   Single-cell samples were clustered and characterized using known
   lineage markers and manually annotated with reference to the human
   liver cell atlas on the CellMarker2 website
   ([97]http://117.50.127.228/CellMarker/) ([98]34). Differential
   expression analysis between cell clusters was performed using the
   FindMarkers function (Wilcoxon rank-sum test, p_val < 0.05 &
   abs(avg_log2FC) > 0.5).

2.13. Exploring cell clusters associated with MAFLD

   To further explore intergroup heterogeneity across cell clusters, we
   analyzed proportional differences in cell counts between control and
   disease groups within distinct clusters, intergroup density variations,
   and distributional biases across conditions. Additionally, to test
   changes in cellular abundance at high resolution between
   groups/conditions, we employed the neighborhood abundance method in
   Milo (v2.2.0) ([99]35) using the functions “buildGraph” and
   “makeNhoods” with parameters: “K=30”, “d=50”, “prop=0.2”,
   “refined=TRUE”, and “refinement_scheme=reduced_dim”, indicating
   sampling using dimensionality-reduced data, with remaining parameters
   selected according to Milo’s standard workflow recommendations. For
   pathway activity scoring of each cell cluster, we used the R package
   AUCell (v1.28.0): first, gene expression rankings per cell were
   computed via AUCell_buildRankings using the expression matrix with
   default parameters; then, custom-defined gene sets constructed from key
   genes identified in bulk-RNA analyses were applied to score individual
   cells; during this process, for each cell, the Area Under the Curve
   (AUC) value was calculated with AUCell_calcAUC based on gene expression
   rankings, where the AUC value represents the proportion of genes in the
   custom-defined set appearing among top-ranked genes per cell ([100]36).
   Gene Set Variation Analysis implemented in the GSVA package (v2.0.0)
   was applied for gene set enrichment analysis using the “HALLMARK gene
   set” exported via GSEABase (v1.44.0). Intergroup differences in pathway
   activity scores per cell cluster between control and disease groups
   were computed using the LIMMA package (v3.62.2). Finally, based on
   intergroup heterogeneity patterns across cell clusters, we extracted
   Liver Sinusoidal Endothelial Cells (LSECs) and repeated the standard
   Seurat V5 workflow on these cells to identify and visualize endothelial
   subpopulations exhibiting significant distributional differences
   between groups.

2.14. Pseudo-temporal analysis

   Unsupervised pseudotemporal analysis was performed using the “Monocle”
   package (v2.34.0) ([101]37). First, a cell dataset containing the
   expression matrix, phenotypic data, and feature data was constructed
   with the newCellDataSet function (expressionFamily = negbinomial.size).
   Next, size factor dispersion and gene expression across cells were
   corrected via estimateSizeFactors and estimateDispersions functions.
   Subsequently, dimensionality reduction was conducted using the DDRTree
   method (parameter max_components = 2), and cell ordering and
   visualization were performed with plot_cell_trajectory ([102]38), while
   filtering highly variable genes correlated with pseudotime; expression
   changes of these genes along pseudotime were analyzed using the
   differentialGeneTest function. Finally, the filtered genes were
   clustered into distinct groups based on expression patterns and
   visualized using plot_pseudotime_heatmap. To identify genes bifurcating
   cells into different branches while exploring specific roles of key
   genes from bulk RNA analyses in single-cell pseudotime trajectories, we
   conducted “Branched Expression Analysis Modeling” (BEAM) analysis
   ([103]39), further enhancing the filtering threshold; genes from BEAM
   analysis were visualized via plot_genes_branched_heatmap. Subsequently,
   KEGG functional enrichment analysis was performed for genes within each
   cluster using clusterProfiler (v4.14.4) and org.Hs.eg.db (v3.20.0)
   packages.

2.15. Cell–cell communication

   Initially, to further explore the LSEC clusters derived from prior
   research that may be closely associated with MAFLD progression, we
   processed and re-clustered LSECs using the standard Seurat V5 workflow.
   Then, by comparing these clusters through UMAP visualization of MAFLD
   and healthy control groups, the LSEC subpopulation exhibiting the most
   distinct cellular distribution was identified. Additionally, beyond
   LSECs, we identified cell clusters specifically distributing “bulk-RNA
   key genes” through scatter plots of gene distribution across cells;
   these cell clusters were simultaneously analyzed with LSECs for
   cell-cell communication analysis using the “CellChat” package (v2.1.2)
   ([104]40). A random seed was set (seed=0528), and 8000 cells were
   randomly selected to create a CellChat object. Ligand-receptor
   interactions of “secretory signaling” were analyzed using “human” data
   from CellChatDB. The netVisual_circle function displayed the number and
   strength of inter-cell connections, while the netVisual_bubble function
   illustrated communications between different cell types. Subsequently,
   the netVisual_aggregate function demonstrated the communication network
   of a specific signaling pathway and calculated the contribution of
   various ligands. The mutual communication network between cells was
   visualized using the netVisual_individual function. Network centrality
   analysis was conducted using the netAnalysis_computeCentrality function
   ([105]41) and represented as a heatmap.

2.16. Statistical analyses

   Data analysis utilized SPSS 26.0 statistical software. Normally
   distributed measurement data were expressed as mean ± standard
   deviation and compared between two groups using independent samples
   t-test. Non-normally distributed measurement data were presented using
   the median and interquartile range, and comparisons between groups were
   conducted using the Mann-Whitney test. Statistical significance was set
   at P < 0.05.

3. Results

3.1. Screening of DEGs

   The datasets [106]GSE15653 and [107]GSE89632 were downloaded from GEO
   data. The [108]GSE89632 dataset was divided into GSE89632_SS group and
   GSE89632_NASH group according to SS and NASH samples. PCA principal
   component analysis was performed separately, and in the [109]GSE15653
   dataset, PC1 was 26.2% and PC2 was 8.7% ([110] Figure 1A ); in the
   GSE89632_SS dataset, PC1 was 30.6% and PC2 was 12.1% ([111] Figure 1B
   ); in the GSE89632_NASH dataset, PC1 was 30.3% and PC2 was 12.1% ([112]
   Figure 1C ). The volcano plot showed that 1686 differentially expressed
   genes were identified in the [113]GSE15653 dataset with |log2(FC)|>0.5,
   p-value<0.05 as the screening threshold, of which 864 were up-regulated
   and 822 were down-regulated ([114] Figure 1D ). In the GSE89632_SS
   dataset, 2219 differentially expressed genes were identified, of which
   1003 genes were up-regulated in expression and 1216 genes were
   down-regulated in expression, as in ([115] Figure 1E ). In the
   GSE89632_NASH dataset, 2248 differentially expressed genes were
   identified, of which 1062 genes were up-regulated in expression and
   1186 genes were down-regulated in expression, as in ([116] Figure 1F ).
   The ring value heatmap and complex value heatmap showed the genes
   ranked top40 in differential folds in [117]GSE15653 ([118] Figure 1G ),
   GSE89632_SS ([119] Figure 1H ) and GSE89632_NASH ([120] Figure 1H ),
   respectively.

Figure 1.

   [121]Graphs and heatmaps illustrating gene expression data. Panels A-C
   show PCA plots with distinct groupings: T2DM, SS, and NASH, compared to
   Control groups. Panels D-F depict scatter plots with data from
   GSE15653, GSE89632_SS, and GSE89632_NASH; points indicate significant
   expression changes. Panel G is a circular heatmap showing gene
   expression in GSE15653, indicating up- and down-regulated genes. Panel
   H shows a heatmap for GSE89632, with color codes for gene expression
   levels, comparing Normal and Disease across SS and NASH.
   [122]Open in a new tab

   Genes of interest. (A–C) Principal component analysis of [123]GSE15653,
   GSE89632_SS and GSE89632_NASH. (D–F) Volcano plots of differentially
   expressed mRNAs in [124]GSE15653, GSE89632_SS, and GSE89632_NASH,
   |logFC| > 0.5, P < 0.05. (G, H) Expression heatmaps of the top40 DEGs
   in the [125]GSE15653 and [126]GSE89632 datasets.

3.2. WGCNA analysis

   Weighted gene co-expression network analysis (WGCNA) was performed on
   [127]GSE89632SS, [128]GSE89632NASH. Hierarchical clustering was
   performed based on similarity or correlation between samples and
   visualized as a hierarchical dendrogram ([129] Figures 2A, E ). The
   distance between samples indicated the degree of similarity or
   correlation between samples, and outlier samples were removed by
   pruning operation, and the outlier samples rejected were
   [130]GSM2385767 and [131]GSM2385782, and the most representative subset
   of samples was selected, and the clustering differences between
   different samples in the Normal group and the MAFLD group were
   visualized in a joint heat map ([132] Figures 2C, G ). The top 25% of
   genes with the largest fluctuations were selected for WGCNA analysis
   based on standard deviation ranking. The scale-free fit indices and
   average connectivity of various soft-threshold powers were evaluated on
   the basis of scale-free R2. Among them, GSE89632_SS selected soft
   threshold powers with β = 11 and scale-free R2 = 0.8 ([133] Figure 2B
   ), and [134]GSE89632NASH selected soft threshold powers with β = 5 and
   scale-free R2 = 0.8 ([135] Figure 2F ). Gene clustering tree with
   module identification demonstrates the results of hierarchical
   clustering of genes in the two datasets and categorizes genes into
   different modules, further merging similar modules to reduce redundancy
   ([136] Supplementary Figures 2A, D ). The hierarchical clustering
   results between different modules were visualized using module
   clustering dendrogram ([137] Supplementary Figures 2B, E ), while the
   correlations between different modules were visualized with modal
   correlation heatmap ([138] Supplementary Figures 2C, F ). Finally,
   [139]GSE89632SS identified 7 modules ([140] Figure 2D ), GSE89632_NASH
   identified 7 modules ([141] Figure 2H ). We performed KEGG enrichment
   analysis on genes within the "brown" modules of GSE89632_SS and
   GSE89632_NASH. The results demonstrated that in the SS group, pathways
   including TNF-JNK signaling pathway,IL-1/IL-1R-JNK signaling cascade,
   and TLR2/4-MAPK signaling axis collectively indicate the role of
   pro-inflammatory signaling in driving steatosis progression toward
   fibrosis. Concurrently, pathways such as Environmental factor-induced
   PI3K signaling pathway and Environmental factor-mediated RAS-ERK
   signaling cascade (activated by heavy metals) suggest synergistic
   interactions between environmental carcinogens and metabolic
   dysregulation. In the NASH group, the recurrent enrichment of TNF-JNK
   signaling pathway and Environmental factor-mediated RAS-ERK signaling
   cascade further validates conserved mechanisms underlying
   inflammation-carcinogenesis transition. The observed negative
   correlation may reflect compensatory dysregulation of protective
   mechanisms during disease progression, where downregulation of module
   genes attenuates their inhibitory effects on
   pro-inflammatory/carcinogenic pathways, thereby accelerating
   pathological advancement ([142] Supplementary Files 1 , [143]2 ). Based
   on this observation, we selected the "brown" modules from both SS and
   NASH WGCNA cohorts.

Figure 2.

   [144]Panel A and E show hierarchical cluster trees for sample outlier
   detection. Panels B and F display scale independence and mean
   connectivity plots. Panels C and G present gene dendrograms with module
   colors. Panels D and H feature heatmaps of module-trait relationships
   with different color scales, showing correlations between modules and
   traits like “Normal” vs. “SS” and “Control” vs. “NASH”.
   [145]Open in a new tab

   WGCNA.(A–D) GSE89632_SS.(E–H) GSE89632_NASH.(A, E) Sample dendrogram
   with outlier samples above the red line. (B, F) Scale-free fit indices
   and average connectivity for different soft thresholds. (C, G)
   Clustering dendrogram after fusing similar modules. (D, H) Module-trait
   association plot.

3.3. Protein-protein interaction network

   UpSet plots visualize the results of intersections of DEGs of
   [146]GSE15653 taken with GSE89632_SS, GSE89632_NASH, SS WGCNA, NASH
   WGCNA, respectively. ([147] Figure 3A ). In order to better understand
   the interactions between genes, we used the STRING database and
   constructed the PPI network ([148] Figures 3B, C ), while the CytoHubba
   plugin was used to identify the top 15 hub genes in the GSE89632_SS
   group and the GSE89632_NASH group, respectively, based on the MCC, MNC,
   and Degree algorithms ([149] Figures 3D–F , [150]3H–J ), and then these
   three algorithms took the intersection to obtain 13 hub genes each
   ([151] Figures 3G , [152]K ). Next, we investigated the chromosomal
   localization ([153] Figures 3L , [154]M ) and interrelationships ([155]
   Figures 3N , [156]O ) of the 13 hub genes.

Figure 3.

   [157]Image showing multiple panels of graphical data analysis related
   to gene expression. Panel A presents an upset plot indicating shared
   features among datasets. Panels B and C display circular network graphs
   with nodes representing genes, colored by degree of connectivity, from
   dark purple to light. Panels D to J feature similar network graphs with
   genes highlighted in red, orange, or yellow based on specific criteria.
   Panels G and K display Venn diagrams illustrating intersections among
   different metrics. Panels L and M show circular diagrams mapping gene
   positions. Panels N and O contain heatmaps detailing gene expression
   levels across conditions, using red and green scales.
   [158]Open in a new tab

   Protein-Protein Interaction (PPI) Network and 13 Hub Genes. (A) UpSet
   Plot. (B, C) PPI networks for GSE89632_SS and GSE89632_NASH. (D–F)
   GSE89632_SS: Top 15 genes identified by the MCC, MNC, and Degree
   algorithms. (H–J) GSE89632_NASH: Top 15 genes identified by the MCC,
   MNC, and Degree algorithms. (G) GSE89632_SS: Venn diagrams illustrating
   the intersecting genes across the three algorithms. (K) GSE89632_NASH:
   Venn diagram depicting the intersection of genes identified by the
   three algorithms. (L) GSE89632_SS: Chromosomal localization of 13 hub
   genes. (M) GSE89632_NASH: Chromosomal localization of 13 hub genes. (N)
   GSE89632_SS: Correlation heatmap of 13 hub genes. (O) GSE89632_NASH:
   Correlation heatmap of 13 hub genes.

3.4. Shared screen of multiple machine learning algorithms

   We employed Lasso, Boruta, and RFE-SVM algorithms to identify hub genes
   associated with T2DM and MAFLD in the GSE89632_SS and GSE89632_NASH
   datasets. In GSE89632_SS, Lasso analysis revealed four significant
   variables: CLDN7, IRAK3, TFRC, and TNFRSF1A ([159] Figures 4A, B ). The
   Boruta algorithm identified ten significant variables, including
   CDKN1A, GJA1, IL1B, IRAK3, LBP, SOCS1, STAT3, TFRC, TLR2, and TNFRSF1A
   ([160] Figure 4C ). RFE_SVM highlighted eight features: IRAK3, MMP9,
   TNFRSF1A, TLR2, JUNB, TFRC, RELB, and CDKN1A ([161] Figure 4D ). In
   GSE89632_NASH, Lasso analysis identified five significant variables:
   ASPM, CCNF, CLEC4E, CX3CR1, and JUNB ([162] Figures 4F, G ).
   Furthermore, the Boruta algorithm filtered eleven significant
   variables, specifically ASPM, CCNF, CLEC4E, CX3CR1, GJA1, IL1B, IRAK3,
   JUNB, SOCS1, TLR2, and TNFRSF1A ([163] Figure 4H ). Two features were
   identified through RFE_SVM: JUNB and CX3CR1 ([164] Figure 4I ).
   Subsequently, we intersected the machine learning results from both
   datasets with autophagy-related genes to derive diagnostic biomarkers
   associated with autophagy in type 2 diabetes mellitus combined with
   NAFLD ([165] Figures 4E, J ).

Figure 4.

   [166]Composite image of multiple data visualizations and analyses.
   Panel A shows a line plot of binomial deviance versus log lambda with
   error bars. Panel B displays a line graph of coefficients over log
   lambda values. Panel C is a boxplot of importance Z-scores for various
   attributes. Panel D shows a line plot of out-of-bag error rate versus
   number of features. Panel E is a Venn diagram illustrating overlaps in
   selected features across methods. Panels F and G repeat Panels A and B
   with different data. Panel H is a similar boxplot to Panel C. Panel I
   repeats Panel D. Panel J mirrors Panel E with different overlaps.
   Panels K to O are ROC curves comparing sensitivity and specificity
   across different studies.
   [167]Open in a new tab

   Machine learning screening biomarker. (A–E) Dataset GSE89632_SS. (F–J)
   Dataset GSE89632_NASH. (A, F) Cross validation of parameter selection
   in Lasso regression. (B, G) Lasso regression for 13 hub genes. (C, H)
   Boruta algorithm feature gene screening. (D, I) SVM-RFE algorithm
   feature gene screening. (E, J) Wayne plots. (K) ROC curves for IRAK3,
   TFRC, TNFRSF1A in GSE89632_SS. (L) ROC curves for CX3CR1, JUNB in
   GSE89632_NASH. (M) ROC curves for IRAK3, TFRC, TNFRSF1A, CX3CR1, JUNB
   in [168]GSE15653. (N) ROC curves for IRAK3, TFRC, TNFRSF1A, CX3CR1,
   JUNB in [169]GSE24807. (O) ROC curves for IRAK3, TFRC, TNFRSF1A,
   CX3CR1, JUNB in [170]GSE23343. AUC, area under the curve; TPR, true
   positive rate; FPR, false positive rate.

3.5. Performance of five diagnostic biomarkers

   We assessed the diagnostic performance of IRAK3, TFRC, TNFRSF1A,
   CX3CR1, and JUNB using the datasets GSE89632_SS, GSE89632_NASH, and
   [171]GSE15653 for training, followed by plotting ROC curves. The
   results indicated that in GSE89632_SS, the AUC for IRAK3, TFRC, and
   TNFRSF1A exceeded 0.8 ([172] Figure 4K ). In GSE89632_NASH, the AUC for
   CX3CR1 and JUNB also surpassed 0.8 ([173] Figure 4L ), reflecting their
   high predictive accuracy. In [174]GSE15653, the AUCs for all five genes
   were above 0.8 ([175] Figure 4M ). Subsequently, we conducted external
   validation using independent datasets [176]GSE24807 and [177]GSE23343.
   The results showed that in [178]GSE24807, the AUC values of IRAK3,
   TNFRSF1A, CX3CR1, and JUNB were all greater than 0.6, while in
   [179]GSE23343, the AUC values of IRAK3, TNFRSF1A, CX3CR1, and JUNB were
   all greater than 0.5 ([180] Figures 4N, O ). These results indicate
   that in the two independent external validation sets, these indicators
   also possess certain diagnostic value, which is consistent with our
   previous predictions. Ultimately, we identified IRAK3, TNFRSF1A,
   CX3CR1, and JUNB as diagnostic biomarkers for T2DM complicated with
   MAFLD associated with autophagy and endoplasmic reticulum stress.

3.6. Enrichment analysis of GO and KEGG

   During the GO and KEGG enrichment analysis, TNFRSF1A and JUNB were
   predominantly enriched in the TNF signaling pathway, while TNFRSF1A and
   TFRC showed significant enrichment in the HIF-1 signaling pathway.
   Furthermore, TNFRSF1A demonstrated substantial enrichment in the
   NF-kappa B signaling pathway, MAPK signaling pathway, Fluid shear
   stress and atherosclerosis, and Insulin resistance ([181] Figure 5A ,
   [182]Figure 5C ). CX3CR1 exhibited notable enrichment in the
   Cytokine-cytokine receptor interaction pathway, and TFRC displayed
   significant enrichment in the Hematopoietic cell lineage ([183]
   Figures 5B, D ).

Figure 5.

   [184]Four panels labeled A, B, C, and D depict different pathway
   analyses. Panels A and B are bubble charts showing pathways enriched in
   biological processes (BP), molecular functions (MF), cellular
   components (CC), and KEGG categories, with significance measured on the
   x-axis. Panels C and D are heat maps displaying gene-pathway
   interactions for KEGG pathways, with panel C highlighting IL1B, JUNB,
   RELB, TFRC, TNFRSF1A and panel D showing CXCR1, IL1B, JUNB, RELB,
   SOCS1, TFRC, TNFRSF1A. Ontology labels BP, CC, MF, and KEGG are
   specified in legends.
   [185]Open in a new tab

   Functional enrichment analysis. (A, B) Lollipop diagrams showing GO and
   KEGG enrichment analysis of GSE89632_SS group and GSE89632_NASH group,
   respectively. (C, D) Heatmaps showing the pathway enrichment of some
   hub genes in GSE89632_SS and GSE89632_NASH groups, respectively.

3.7. Enrichment analysis of GSEA

   In Enrichment analysis of GSEA, IRAK3、TNFRSF1A 和 JUNB commonly enriched
   in REACTOME_SIGNALING_BY_INTERLEUKINS([186] Supplementary Figure 3A ,
   [187]Supplementary Figure 3E , [188]Supplementary Figure 3M ).IRAK3
   mainly enriched in BIOCARTA_IL1R_PATHWAY([189] Supplementary Figure 3B
   );REACTOME_TOLL_LIKE_RECEPTOR_TLR1_TLR2_CASCADE([190] Supplementary
   Figure 3C );REACTOME_TOLL_LIKE_RECEPTOR_CASCADES ([191] Supplementary
   Figure 3D ).TNFRSF1A mainly enriched in
   KEGG_CYTOKINE_CYTOKINE_RECEPTOR_INTERACTION ([192] Supplementary
   Figure 3F );REACTOME_INTERLEUKIN_10_SIGNALING ([193] Supplementary
   Figure 3G );KEGG_MAPK_SIGNALING_PATHWAY([194] Supplementary Figure 3H
   ).TFRC mainly enriched in PID_HIF1_TFPATHWAY([195] Supplementary
   Figure 3I );KEGG_HEMATOPOIETIC_CELL_LINEAGE ([196] Supplementary
   Figure 3J ). JUNB mainly enriched in
   REACTOME_INTERLEUKIN_4_AND_INTERLEUKIN_13_SIGNALING([197] Supplementary
   Figure 3K );WP_TGFBETA_SIGNALING_PATHWAY([198] Supplementary Figure 3L
   ); PID_IL6_7_PATHWAY([199] Supplementary Figure 3N
   );REACTOME_SIGNALING_BY_TGF_BETA_RECEPTOR_COMPLEX ([200] Supplementary
   Figure 3O ).

3.8. Infiltration and functional exploration of immune cells

   To assess immune cell infiltration and functional differences between
   the T2DM and control groups, we utilized the ssGSEA method to evaluate
   various immune cell subpopulations. The group comparison plot revealed
   increased levels of Activated dendritic cells, Monocytes, and
   Regulatory T cells in the T2DM group, with decreased levels of Memory B
   cells ([201] Figure 6A ). The correlation coefficient’s absolute value
   indicates the strength of correlation: 0.3-0.5 denotes weak
   correlation, 0.5-0.8 signifies moderate correlation, and 0.8–1
   represents strong correlation; P < 0.05 indicates statistical
   significance. In the T2DM group, IRAK3 exhibited a negative correlation
   with Natural killer cell infiltration level (R=-0.933) ([202] Figure 6B
   , [203]Supplementary Figure 4A ). The correlations between TNFRSF1A and
   Effector memory CD8T cells, Gamma delta T cells, Macrophages, Natural
   killer cells, Type 1 T helper cells, and Regulatory T cells were
   positive (R=0.733, 0.700, 0.717, 0.717, 0.700, 0.733) ([204] Figure 6C
   , [205]Supplementary Figures 4B–G ). CX3CR1 showed a negative
   correlation with Type 17 T helper cell infiltration (R=-0.983) ([206]
   Figure 6D , [207]Supplementary Figure 4H ). JUNB displayed a positive
   correlation with Monocyte infiltration (R=0.700) ([208] Figure 6E ,
   [209]Supplementary Figure 4I ). Additionally, correlations existed
   between different types of immune cells ([210] Supplementary Figure 4J
   ).

Figure 6.

   [211]Panel A shows a bar graph comparing cell value differences between
   control and MAFLD groups. Panels B to E present correlation plots for
   genes IRAK3, TNFRSF1A, CX3CR1, and JUNB with different immune cells,
   highlighting correlations, p-values, and variance for each.
   [212]Open in a new tab

   Immune cell infiltration assessment. (A) Subgroup comparison plot
   demonstrating the difference in immune cell infiltration between the
   two groups as calculated by the ssGSEA algorithm. (B–E) Lollipop plots
   showing the correlation between IRAK3 (B), TNFRSF1A (C), CX3CR1 (D),
   JUNB (E) and immune cells. Note: Significance levels are denoted as
   follows: ns p ≥0.05; * p < 0.05; ** p < 0.01; *** p < 0.001.

3.9. Real-time fluorescence quantitative PCR results

   The results of real-time fluorescence quantitative PCR analysis showed
   that there were significant differences in the expression levels of
   IRAK3, TNFR1, CX3CR1 and JUNB between the two groups (P < 0.05); among
   them, the mRNA expression levels of IRAK3, TNFR1, CX3CR1 and JUNB in
   the liver tissues of rats in the T2DM combined with MAFLD group were
   significantly higher than that in the control group (P > 0.05).), as
   shown in [213]Figure 7F .

Figure 7.

   [214]Histological and graphical analysis of biomarkers in liver
   conditions. Panels A-D show histology images comparing control and T2DM
   & MAFLD groups at 40x magnification, focusing on CX3CR1, LRAK3, JUNB,
   and TNFR1. Panels E and F display bar graphs indicating integrated
   optical density and mRNA expression levels for these biomarkers, with
   statistical significance noted. Comparison shows higher levels in T2DM
   & MAFLD than controls.
   [215]Open in a new tab

   (F). IRAK3, TNFR1,CX3CR1, JUNB mRNA expression in liver tissues of rats
   in T2DM and MAFLD group and control group. (A–E), Expression levels of
   antigen-antibody complexes in the liver tissues of rats in the T2DM and
   MAFLD groups and the Control group. Note: Significance levels are
   denoted as follows: ns p ≥0.05; * p < 0.05; ** p < 0.01; *** p < 0.001.

3.10. Immunohistochemical results

   Immunohistochemical picture analysis showed that the expression levels
   of the four factors, CX3CR1, IRAK3, JUNB, and TNFR1, were significantly
   different between the T2DM & MAFLD groups and the Control group (P <
   0.05); and the T2DM combined with MAFLD group were significantly higher
   than the Control group. For details, see [216]Supplementary Table S3
   and [217]Figures 7A–E .

3.11. Single-cell data preprocessing and clustering annotation

   A comprehensive scRNA-seq analysis was conducted using publicly
   available data from healthy and MAFLD liver samples. The clustering
   results, illustrated through dendrograms([218] Supplementary Figure 5A
   ) and marker genes bubble plots ([219] Figure 8B ), culminated in the
   identification of 13 major cell clusters at a resolution of 0.6.
   Distinct cell types were pinpointed, including Cycling(cycling cells)
   (STMN1,MKI67), B_cell (CD79A,CD79B,MS4A1), Plasma(Plasma cell)
   (CD79A,IGHA1), pDC (plasmacytoid dendritic cell) (LILRA4,CLEC4C),
   Mes(mesenchymal cell) (PDGFRB,ACTA2,COL1A1,COL1A2,COL3A1,DCN), Cho
   (Cholangiocyte) (EPCAM,KRT19,FXYD2), T_cell (CD3D,CD3E,CD3G,CD8A), ILC
   (innate lymphoid cell) (KLRF1,KLRC1,GZMB,NKG7), Kupffer (CD163,MARCO),
   Neu(Neutrophils) (S100A8,S100A9), cDC(Conventional Dendritic Cell)
   (CLEC10A,CD86), LSEC(Liver Sinusoidal Endothelial Cel)
   (CLEC14A,CD34,VWF,STAB2,CLDN5) and LEC(Lymphatic Endothelial Cells)
   (PROX1,TFF3,CCL21). The findings were further visualized using UMAP
   ([220] Figure 8A ).

Figure 8.

   [221]Panel A shows a UMAP plot clustering cell types with different
   colors. Panel B is a dot plot presenting gene expression across
   identities with varying dot sizes and shades. Panel C displays stacked
   bar charts comparing cell type ratios between control and MAFLD groups.
   Panel D illustrates a stacked bar chart of cell type distribution
   across samples.
   [222]Open in a new tab

   Cell types annotation. (A) UMAP plot showing the clustering results of
   cells at 0.6 resolution. (B) Bubble diagram showing Marker genes used
   to identify major cell types. (C) Percentage of different cell types
   between Control and MAFLD groups. (D) Percentage of different cell
   types among different samples between Control and MAFLD groups.

3.12. Exploring cell clusters associated with MAFLD

   We analyzed the proportional differences in cell counts between control
   and disease groups across distinct cell clusters, with LSEC showing a
   significantly higher proportion in the MAFLD group ([223] Figure 8C ),
   along with proportional variations of cell clusters among different
   samples ([224] Figure 8D ). We performed density analysis of
   inter-group cell counts at a two-dimensional level, where relative
   brightness indicates higher density of the corresponding cell cluster
   in its respective group ([225] Figures 9A, B ). Subsequently, we
   evaluated the distributional preference between MAFLD and Control
   groups using the Ro/e index ([226] Figure 9C ), as well as
   distributional preferences across individual samples ([227] Figure 9D