Abstract Background The imbalance in macrophage phenotype transition is a central mechanism driving chronic inflammation in diabetic kidney disease (DKD). Macrophages can polarize toward the M2 phenotype via efferocytosis, exerting anti-inflammatory and pro-resolving effects. However, the identification and functional validation of regulatory genes governing M2 macrophage and efferocytosis in DKD remain to be thoroughly explored. Methods Differentially expressed genes were obtained based on [42]GSE96804 and [43]GSE30122 data sets. Based on efferocytosis-related genes (ERGs) and M2 polarization-related genes (MRGs), ERG and MRG scores were computed in the [44]GSE96804 dataset. Weighted gene co-expression network analysis (WGCNA) was carried out to identify critical module genes. Finally, macrophage-efferocytosis-related DEGs (MEDEGs) were identified. Further, machine learning (ML)—support vector machine (SVM), BORUTA, and lasso regression—were employed to identify hub genes and build Nomogram predictive model. Additionally, hub genes were confirmed through animal experiments. Results A total of 35 MEDEGs were identified. ML recognized 3 hub genes—MCUR1, CYP27B1, and G6PC. Hub genes were notably downregulated in DKD group and exhibited high predictive ability. Furthermore, the Nomogram model based on key genes has shown potential in predicting DKD. The findings were further validated through transcriptome sequencing of DKD model. Conclusion This study uncovered 3 hub genes—MCUR1, CYP27B1, and G6PC—linked to M2 polarization, efferocytosis, and DKD. These genes may contribute to DKD pathogenesis, providing novel targets for early diagnosis and therapeutic interventions in DKD. Keywords: diabetic kidney disease, M2 macrophage, efferocytosis, bioinformatics, immune regulatory 1. Introduction Diabetic kidney disease (DKD) is one of the most severe microvascular complications of diabetes ([45]1). According to the Global Burden of Disease study, DKD accounts for more than 50% of new dialysis cases in developed countries ([46]2). Despite advancements in glycemic control and renin-angiotensin system blockade therapies, 30-40% of patients still progress to irreversible renal failure ([47]3). This clinical challenge highlights the urgent need to elucidate novel pathogenic mechanisms. The kidney is a complex organ composed of more than 50 different cell types ([48]4). The development of kidney diseases is driven by complex intercellular interactions in the renal microenvironment ([49]5). Previous DKD studies have primarily focused on hyperglycemia-induced direct injury mechanisms affecting renal ([50]6–[51]8). However, the underlying pathogenic mechanisms remain unclear, and there is a lack of sensitive and efficient predictive biomarkers. Once DKD progresses to overt nephropathy, it rapidly advances to the end-stage ([52]9). Recent findings suggest that chronic inflammation is a key factor driving DKD, with macrophage-mediated immune dysregulation playing a crucial role in sustained renal injury ([53]10, [54]11). Macrophage-dominant immune cell infiltration is observed in renal tissues of DKD patients at various disease stages ([55]12), and the extent of macrophage infiltration is significantly correlated with the rate of renal function decline in DKD patients ([56]13). The pro-inflammatory M1 phenotype dominates the early stages of injury, whereas M2 macrophages contribute to anti-inflammatory responses and tissue repair. The imbalance in macrophage polarization is a crucial factor driving the persistent progression of inflammation ([57]14). Efferocytosis is the process by which phagocytic innate immune cells clear apoptotic cells. It plays a pivotal role in maintaining immune homeostasis, controlling chronic inflammation, and facilitating tissue repair ([58]15). Mechanistically, efferocytosis not only eliminates cellular debris but also promotes the polarization of macrophages toward the anti-inflammatory and reparative M2 phenotype, establishing a self-sustaining reparative loop ([59]16). Macrophage efferocytosis is a critical step in inflammation resolution. Impairment in this process exacerbates tubular atrophy and glomerulosclerosis ([60]17). This study uses bioinformatics techniques to explore the potential role of M2 polarization and efferocytosis-associated genes. Machine learning (ML) algorithms are then applied to pinpoint biomarkers closely tied to DKD pathology ([61] Figure 1 ). Figure 1. [62]Flowchart illustrating a bioinformatics study. The process includes four main phases: Screening, Machine Learning, Validation and Analysis, and Application. Screening involves GSE96804 and GSE30122 data sets, DEGs, MRGs, ERGs, WGCNA, PPI, and GO+KEGG analyses. Machine Learning employs SVW, Boruta, LASSO, and a Sankey diagram. Validation and Analysis involve nephroseq, transcriptomics, immune infiltration, and single-cell analysis with various charts and sample images. Application phase includes nomogram, consensus clustering, and network construction with graphical representations. Each phase contains specific tasks and visualizations. [63]Open in a new tab Research flowchart. 2. Methods 2.1. Data sources The DKD gene expression datasets [64]GSE96804 and [65]GSE30122 were retrieved from gene expression omnibus (GEO) database ([66] Table 1 ). The keyword “efferocytosis” was used to perform a search in the GeneCards database ([67]https://www.genecards.org/). Based on literature extraction, 137 efferocytosis-related genes (ERGs) were identified ([68]18–[69]20), and after integration, 137 ERGs were derived ([70] Supplementary Table S1 ). Additionally, based on previous studies ([71]21), we identified 46 genes associated with M2 macrophage polarization (MRGs) ([72] Supplementary Table S2 ). Table 1. Overview of the datasets. Dataset Database Platform Type [73]GSE96804 GEO [74]GPL17586 41 DKD cases and 20 normal case [75]GSE30122 GEO [76]GPL571 19 DKD cases and 50 normal case Ju CKD Glom Nephroseq / 12 DKD cases and 21 normal case Ju CKD TubInt Nephroseq / 17 DKD cases and 31 normal cases [77]Open in a new tab 2.2. Differentially expressed genes The limma package ([78]22) was employed to analyze the [79]GSE96804 and [80]GSE30122 datasets. DEGs from [81]GSE96804 were designated as DEGs 1, while those from [82]GSE30122 were labeled as DEGs 2. The selection criteria for DEGs were a fold change (FC) > 1.5 and adj.P < 0.05. Heatmaps (pheatmap package) and Volcano plots (ggplot2 package ([83]23)) were used to visualize DEGs expression patterns. Finally, the overlapping DEGs genes from the two datasets were integrated to form the final DEGs. 2.3. Single sample gene set enrichment analysis scoring We performed ssGSEA using the GSVA package ([84]24), with MRGs and ERGs as the reference background set. ssGSEA enrichment scores were computed for all samples in the [85]GSE96804 dataset. 2.4. Weighted gene co-expression network analysis analysis WGCNA is a powerful tool for analyzing gene expression data, aiming to identify gene modules by constructing gene co-expression networks and further revealing key genes within these modules. Genes were filtered by calculating the median absolute deviation (MAD) and retaining the top 50% with highest variability. The WGCNA package’s goodSamplesGenes function was then applied to remove outliers ([86]25). A scale-free co-expression network was subsequently built using WGCNA. MRG and ERG scores were considered as trait variables, and pearson correlation analysis was performed between modules and MRGs/ERGs scores. 2.5. Enrichment analysis The intersection of DEGs and module genes was obtained to identify macrophage-efferocytosis related DEGs (MEDEGs). The STRING database ([87]https://cn.string-db.org/) was utilized to construct the protein-protein interaction (PPI) network. Functional enrichment analysis of gene ontology (GO) terms and kyoto encyclopedia of genes and genomes (KEGG) pathways was performed with the clusterProfiler R package. 2.6. ML selection of hub genes Three ML methods—lasso regression, support vector machine (SVM), and BORUTA algorithm—were employed to filter MEDEGs. Lasso regression enhances model interpretability through regularization ([88]26). SVM is employed to manage high-dimensional data and optimize the classification boundary and the BORUTA algorithm identifies significant features by evaluating the relative importance of variables. The intersection was considered as hub genes. 2.7. Validation and clinical significance The pROC package ([89]27) was employed to calculate the area under the ROC curve (AUC) to evaluate predictive performance. Furthermore, the correlation between hub genes and clinical parameters such as serum creatinine (Scr) and glomerular filtration rate (GFR) was examined using the [90]GSE30122 dataset and the Nephroseq database ([91]http://v5.nephroseq.org). Furthermore, Nomogram was constructed using the rms package. 2.8. Immune infiltration analysis The CIBERSORT algorithm was applied to assess immune infiltration levels. The “PERM” parameter was set to 100, with a critical value of P at 0.05. Assessing the correlation between hub genes and immune cells, and heatmap was generated to visualize associations. 2.9. Single-cell RNA sequencing analysis Sc-RNA seq datas were analyzed using the Kidney Integrated Transcriptomics (K.I.T.) database ([92]https://www.humphreyslab.com/SingleCell/). Wilson P et al. ([93]28)conducted unbiased Sc-RNA seq on cryopreserved human diabetic kidney samples. This study extracted sc-RNA seq data of hub genes in DKD samples and visualized the results. 2.10. Consensus clustering Consensus clustering analysis was conducted on hub genes expression matrix using the Consensus Cluster Plus package ([94]29). The similarity between the samples was assessed using Euclidean distance, followed by K-means clustering. Subsequently, 500 iterations were performed with a resampling proportion of 0.8. Next, principal component analysis (PCA) was subsequently employed to validate the expression patterns of genes under different subtypes. Additionally, the gene set variation analysis (GSVA) expression profiles of different subtypes were compared. 2.11. Molecular regulatory networks and drug prediction Transcription factors (TFs) were predicted through the NetworkAnalyst 3.0 platform (JASPAR database) ([95]https://www.networkanalyst.ca/). The miRTarBase database was used to predict miRNA targets. Next, the mRNA-TF, mRNA-miRNA, and TF-miRNA-mRNA interaction networks were constructed to elucidate their regulatory roles. Furthermore, drug prediction for hub genes was conducted using the DGIdb database ([96]https://www.dgidb.org/). 2.12. Animal model validation All methods were carried out in accordance with relevant guidelines and regulations in the manuscript. All animal experimental procedures were in accordance with ARRIVE guidelines. Twenty-four SPF-grade male C57BL/6J mice (20 ± 2 g, 6–8 weeks) were purchased from Beijing Vital River Laboratory Animal Technology Co., Ltd. (Production License No.: SCXK (Beijing)-20210006). This study was approved by the Animal Ethics Committee of Beijing University of Chinese Medicine (BUCM-2023120104-4282). The detailed experimental procedures are provided in the [97]Supplementary Materials . Under 1% pentobarbital sodium anesthesia administered intraperitoneally (ip), the kidneys were completely excised, and the cortical tissue was divided into two parts: one was fixed in 4% paraformaldehyde for 24 hours for histopathological evaluation using HE, PAS, and Masson staining; the other was snap-frozen in liquid nitrogen and submitted to Shanghai Majorbio Bio-pharm Technology Co.,Ltd. for transcriptomic sequencing. 2.13. Statistical analysis Statistical analyses were conducted using R Studio 4.3.0. Pearson correlation analysis was used to evaluate relationships between variables, and the Wilcoxon test was applied to compare differences between the two groups. P-value < 0.05 was considered statistically significant. 3. Results 3.1. DKD is associated with efferocytosis and M2 macrophage polarization In the [98]GSE96804 dataset, differential expression analysis identified 578 significantly upregulated genes (DEGs 1_up) and 948 significantly downregulated genes (DEGs 1_down) ([99] Figures 2A, C ; [100]Supplementary Table S3 ). Similarly, in the [101]GSE30122 dataset, 364 upregulated genes (DEGs2_up) and 177 downregulated genes (DEGs2_down) were detected ([102] Figures 2B, D ; [103]Supplementary Table S4 ). Finally, 171 DEGs were obtained ([104] Figure 2E ). ssGSEA revealed a notable difference between the DKD and normal group (P < 0.001). Notably, the ssGSEA score was significantly elevated in DKD group ([105] Figures 2F, G ), suggesting that the pathological progression of DKD may involve the regulation of M2 macrophage polarization and efferocytosis. Figure 2. [106]Volcano plots (A and B) show gene expression changes with regulated, up-regulated, and down-regulated genes highlighted. Heatmaps (C and D) display gene expression clustering for normal and diabetic kidney disease groups. A bar chart (E) presents the distribution of differentially expressed gene overlaps. Box plots (F and G) compare MRGs and ERGs between normal and DKD groups, indicating significant differences. [107]Open in a new tab Efferocytosis and M2 macrophage polarization in DKD. (A, B) Volcano plot. (A) DEGs 1; (B) DEGs 2. (C, D) Heatmap. (C) DEGs 1; (D) DEGs 2; (E) Upset plot of intersecting DEGs. (F) MRGs scores (G) ERGs scores (^* P < 0.05,^**** P < 0.001). 3.2. WGCNA analysis Enrichment analysis was performed using background gene sets constructed based on MRGs and ERGs, and no significant outliers were detected ([108] Figures 3A, B ). The optimal soft-thresholding power was identified as 9, based on the R² statistic and the mean connectivity criterion ([109] Figures 3C, D ). Hierarchical clustering analysis classified the gene expression profiles of [110]GSE96804 dataset into 16 co-expression modules ([111] Figures 3E, F ). Among them, the green yellow module positively correlated with the M2 macrophage score (r = 0.95, P = 1.5×10^-30), while the turquoise module negatively correlated (r = -0.81, P = 1.7×10^-15) ([112] Figure 3E ). In efferocytosis-related characteristics, the green yellow (r = 0.92, P = 4.6×10^-25) and dark red (r = 0.80, P = 1.80×10^-14) modules exhibited significant positive correlations, whereas turquoise module negatively correlated (r = -0.81, P = 2.2×10^-15) ([113] Figure 3F ). A total of 1,021 module genes associated with MRGs and 1,089 module genes associated with ERGs were ultimately identified. Figure 3. [114]Hierarchical clustering, dendrograms, and network analysis are displayed in a composite image. Panels A and B show dendrograms with heat maps for MRGs and ERGs, indicating grouping into 'Normal' and 'DKD.' Panel C has graphs for scale-free topology and mean connectivity, with a highlighted cutoff point. Panel D shows a detailed dendrogram with dynamic and merged clusters. Panels E and F feature correlation networks with color-coded pathways and interaction strengths. [115]Open in a new tab Analysis of gene modules associated with M2 macrophage polarization and efferocytosis in DKD. (A, B) Clustering analysis of characteristic genes (A) M2 macrophages; (B) efferocytosis; (C) Selection of the soft-thresholding power; (D) WGCNA dendrogram: Co-expression modules identified and color-coded, with a total of 16 characteristic modules; (E, F) Heatmap of correlations between co-expression modules (E) M2 macrophage polarization; (F) efferocytosis. 3.3. Enrichment analysis of MEDEGs A total of 35 MEDEGs were identified ([116] Figure 4A ). The PPI network was constructed to analyze the interactions between the 35 MEDEGs ([117] Figure 4B ). Biological process (BP) was predominantly enriched in various metabolic processes, such as small molecule catabolism and organic acid degradation. The cellular component (CC) category was primarily enriched in microbody lumen, vesicle lumen, and secretory granule lumen. In the molecular function (MF) category, genes were enriched in binding to monocarboxylic acids, vitamins, and fatty acids ([118] Figure 4C ). KEGG pathway enrichment analysis were widely involved in various metabolic pathways, including carbon metabolism, pentose and glucuronate interconversions, and purine metabolism. The involvement of these pathways suggests that DEGs may play crucial roles in energy metabolism, substance conversion, and signal transduction ([119] Figure 4D ). Figure 4. [120]Panel A shows a Venn diagram comparing DEGs, MRGs, and ERGs with overlap indicated by numbers. Panel B presents a gene interaction network highlighting connections and gene significance. Panel C is a bar graph showing GO enrichment scores for biological processes (BP), cellular components (CC), and molecular functions (MF). Panel D depicts a network analysis of metabolic pathways with node size indicating significance and colored lines denoting categories. [121]Open in a new tab Enrichment analysis of MEDEGs. (A) Venn diagram of DEGs, ERGs, and MRGs.; (B) PPI network; (C) GO enrichment analysis; (D) KEGG enrichment analysis. 3.4. ML selection of hub genes To further refine the selection of MEDEGs, three ML algorithms—SVM, BORUTA feature selection, and LASSO regression—were employed to screen hub genes ([122] Figure 5 ). The SVM algorithm, utilizing a radial basis function kernel, identified six high-weight genes ([123] Figures 5A, B ). The BORUTA algorithm, based on random forest importance evaluation, identified 13 key genes ([124] Figures 5C, D ). LASSO regression, using ten-fold cross-validation, ultimately selected five feature genes ([125] Figure 5E ). A consensus analysis integrating all three algorithms identified MCUR1, CYP27B1, and G6PC as the core hub genes in DKD ([126] Figure 5F ). Figure 5. [127]Six-panel image depicting feature selection and importance analysis in machine learning. Panel A shows a line graph of 5-fold cross-validation accuracy vs. number of features, peaking at six features with 0.942 accuracy. Panel B shows a 5-fold cross-validation error graph, minimizing at six features with 0.0576 error. Panel C displays a box plot of feature importance across attributes, with G6PC being the most important. Panel D is a line chart showing importance stability over 100 classifier runs. Panel E shows misclassification error vs. log(lambda) in a LASSO model, highlighting minimal error. Panel F is a Sankey diagram linking genes to machine learning methods. [128]Open in a new tab ML selection of hub genes. (A) Five-fold cross-validation accuracy of the SVM method under different feature numbers; (B) Five-fold cross-validation error of the SVM method under different feature numbers; (C, D) BORUTA algorithm results. (E) Cross-validation curve of LASSO regression; (F) Sankey diagram integrating the three machine learning approaches. 3.5. Validation and clinical significance The expression of MCUR1, CYP27B1, and G6PC were analyzed in the [129]GSE96804 dataset, revealing significantly lower expression in the DKD group (P < 0.01) ([130] Figure 6A ). To further evaluate the predictive capability of these three genes in distinguishing DKD from normal samples, the AUC was calculated: MCUR1 (0.935), CYP27B1 (0.960), and G6PC (0.987) ([131] Figure 6B ). The expression trends of hub genes in the [132]GSE30122 dataset, Ju CKD TubInt, and Ju CKD Glom were consistent with those observed in [133]GSE96804 ([134] Figures 6C, D, G ). Except for MCUR1, which exhibited a lower AUC in Ju CKD Glom, all other AUC values exceeded 0.9 ([135] Figures 6E, H ), demonstrating strong discriminative capability and suggesting their potential value in the early diagnosis of DKD. [136]Figure 6F and [137]Figure 6I illustrate the correlation between MCUR1, CYP27B1, and G6PC and clinical parameters, including serum creatinine levels and GFR. Correlation analysis revealed a significant negative correlation between hub gene expression and Scr levels, while a positive correlation was observed with GFR. This suggests that alterations in these gene expressions are not only associated with DKD progression but may also play a crucial role in reflecting kidney function impairment. Figure 6. [138]Several panels illustrate gene expression analysis and correlation data related to diabetic kidney disease (DKD). Panels A, C, D, and G show box plots comparing gene expression levels of MCU1, CYP27B1, and GP6C in normal versus DKD or diabetic nephropathy samples. Panels B, E, and H display ROC curves assessing the diagnostic performance of these genes. Panels F and I depict correlation matrices, highlighting relationships between gene expression and clinical variables like serum creatinine level and GFR. Different colors and symbols indicate varying levels of correlation and significance. [139]Open in a new tab Validation and clinical correlation analysis. (A)hub genes in [140]GSE96804 dataset; (B) ROC curves of hub genes in [141]GSE96804; (C) hub genes in the [142]GSE30122 dataset; (D) hub genes in the Ju CKD TubInt dataset; (E) ROC curves of hub genes in Ju CKD TubInt; (F) Correlation between hub genes and clinical parameters in Ju CKD TubInt; (G) hub genes in the Ju CKD Glom dataset; (H) ROC curves of hub genes in Ju CKD Glom; (I) Correlation between hub genes and clinical parameters in Ju CKD Glom (^* P < 0.05, ^** P < 0.01, ^*** P < 0.001, ^**** P < 0.0001). 3.6. Nomogram model A Nomogram model for predicting DKD was developed based on MCUR1, CYP27B1, and G6PC ([143] Figure 7A ). The calibration curve provided preliminary evidence of predictive accuracy for DKD ([144] Figure 7B ). In the [145]GSE96804 dataset, the risk score was significantly higher in DKD group (P < 0.001) ([146] Figure 7C ). The AUC based on the risk score was 0.988, suggesting a promising discriminative ability ([147] Figure 7D ). Consistent results are also obtained in Ju CKD TubInt (AUC = 0.949) ([148] Figures 7E, F ). Notably, a positive correlation exists between risk score and Scr levels (r = 0.59, P < 0.001) and a negative correlation exists between risk score and GFR (r = -0.71, P < 0.001) ([149] Figures 7G, H ). Figure 7. [150]A series of graphs and charts displaying a nomogram and its analysis in relation to diabetic kidney disease (DKD) and other groups. Panel A shows a nomogram with points for MCUR1, CYP27B1, and G6PC. Panel B presents a calibration plot with a C-index of 0.988. Panel C is a scatter plot comparing nomogram scores between normal and DKD groups, indicating significant differences. Panel D depicts an ROC curve with an AUC of 0.988. Panel E is another scatter plot comparing healthy living donors and diabetic nephropathy. Panel F shows an ROC curve with an AUC of 0.949. Panel G and H display correlations of the nomogram with serum creatinine levels and GFR (MDRD), respectively. [151]Open in a new tab Nomogram model. (A) Nomogram; (B) Calibration curve; (C) Risk scores in [152]GSE96804 dataset; (D) ROC curve of risk scores in [153]GSE96804 dataset; (E) Risk scores in Ju CKD TubInt dataset; (F) ROC curve of risk scores in Ju CKD TubInt dataset; (G) Correlation between the risk scores and Scr; (H) Correlation between the risk scores and GFR. 3.7. Immune infiltration analysis To explore the immune microenvironment of DKD, we employed the CIBERSORT algorithm to compare immune cell infiltration levels ([154] Figures 8A, B ). Several immune cell subsets, including γδ T cells, NK cells, and macrophages (M0, M1 and M2), were significantly elevated in DKD group. These findings indicate that immune cells could be crucial in the pathological progression of DKD. MCUR1 was positively correlated with M2 macrophages and neutrophils, CYP27B1 was associated with neutrophils, and G6PC exhibited correlations with M2 macrophages, neutrophils, and eosinophils ([155] Figure 8C ). Figure 8. [156]Chart A displays stacked bar graphs of cell type proportions across multiple samples, categorized by color. Chart B shows a box plot comparing proportions of different immune cell types in normal versus DKD groups, with statistical significance indicated by asterisks. Chart C illustrates a network correlation plot detailing relationships between various cell types and three genes: MCU1, CYP27B1, and G6PC, using lines and shaded squares to denote correlation strength and significance. [157]Open in a new tab Immune infiltration. (A) Relative proportions of 22 immune cells; (B) Box plot comparing immune cell infiltration levels; (C) Heatmap illustrating the correlations between hub genes and immune cell types (^* P < 0.05, ^** P < 0.01, ^*** P < 0.001, ^**** P < 0.0001). 3.8. Sc-RNA seq analysis To gain deeper insights, we performed sc-RNA seq analysis in KIT database. [158]Figure 9A displays the t-distributed stochastic neighbor embedding (t-SNE) plot of sc-RNA data in DKD. The analysis identified 12 distinct cell clusters, representing various renal cell types, including podocytes (PODO), proximal convoluted tubule cells (PCT), loop of Henle cells (LOH), endothelial cells (ENDO), and parietal epithelial cells (PEC). Regarding gene expression patterns, MCUR1 was primarily localized in PEC, PCT, and LOH. In contrast, CYP27B1 and G6PC were predominantly expressed in PCT ([159] Figures 9C, D ). Figure 9. [160]Panel A shows a t-SNE plot with colored clusters labeled as PCT, CD-ICA, PEC, and others, indicating cell types. Panels B, C, and D are dot plots showing the percent expression of MCUR1, CYP27B1, and G6PC across various samples, with dot size indicating expression level. [161]Open in a new tab Sc-RNA seq expression analysis (A) t-SNE plot of single-cell sequencing data; (B) Bubble plot of MCUR1 expression levels; (C) Bubble plot of CYP27B1 expression levels; (D) Bubble plot of G6PC expression levels. 3.9. Consensus clustering Consensus clustering was conducted to classify 41 DKD patients into distinct molecular subtypes. By comprehensively evaluating cluster stability indicators, the optimal cluster number was determined as k = 2. The consensus matrix heatmap, cumulative distribution function (CDF) curve, and cluster stability analysis collectively confirmed the robustness of this clustering strategy ([162] Figures 10A-C ). As a result, 41 DKD patients were divided into two subgroups: C1 (n = 21) and C2 (n = 20). Further PCA demonstrated notable differences between two subtypes ([163] Figure 10D ). The box plots of hub gene expression levels clearly demonstrated that MCUR1, CYP27B1, and G6PC were significantly higher in C1 group ([164] Figures 10E-G ). To explore functional differences between the two subgroups, GSVA was conducted. The results showed that the Renin-Angiotensin System and Glycerophospholipid Metabolism pathways were upregulated in C1, whereas NOTCH signaling, Gamma R-mediated phagocytosis, ECM-receptor interactions, and chemokine signaling pathways were upregulated in C2 ([165] Figure 10H ). Figure 10. [166]A combination of charts and plots related to data analysis: - A) Heatmap with two clusters, consensus matrix \( k=2 \). - B) Consensus CDF plot with multiple colored lines showing consensus indices. - C) Delta area plot showing relative changes in area under CDF curve. - D) PCA plot differentiating two groups (C1, C2) on PC1 and PC2 axes. - E, F, G) Box plots comparing gene expression levels (MCU, GP6, CYP27B1) between groups C1 and C2, showing statistical significance. - H) Bar chart displaying gene sets with up and down-regulated values between groups, indicated by T values of GSVA score. Each section represents different parts of a comprehensive data analysis. [167]Open in a new tab Consensus clustering analysis. (A) Consensus clustering matrix (k = 2); (B) CDF curve; (C) Area under the CDF curve; (D) PCA plot of two subtypes; (E-G) Box plots of hub genes expression (^* P < 0.05, ^** P < 0.01, ^*** P < 0.001, ^**** P < 0.0001).; (H) GSVA analysis between two subtypes. 3.10. Molecular regulatory network and drug prediction To elucidate the regulatory mechanisms of hub genes in DKD, a molecular regulatory network was constructed, followed by drug prediction analysis. At the transcriptional regulation level, a 23-node, 27-edge network was identified, including 20 key TFs. Among them, NFKB1 and HNF4A were identified as regulators of MCUR1 and CYP27B1, while POU2F2, MEF2A, and HOXA5 specifically regulated MCUR1 and G6PC. Additionally, TP53 was found to exert dual-target regulation over CYP27B1 and G6PC ([168] Figure 11A ). The mRNA-miRNA interaction network was generated, predicting 123 miRNAs, forming a 126-node, 125-edge network ([169] Figure 11B ). Subsequently, we constructed the TF-miRNA-mRNA network to provide a clearer and more intuitive depiction of the interrelationships among mRNA, miRNA, and TF ([170] Figure 11C ). Notably, drug prediction analysis using the DGIdb database revealed that CYP27B1 could serve as a potential therapeutic target for 13 drugs ([171] Figure 11C ). Figure 11. [172]Diagram with four panels illustrating biological relationships and data. Panel A shows a gene interaction network with nodes for G6PC and MCUR1 emphasizing CYP27B1. Panel B presents three clusters of interactions, color-coded in green, yellow, and red. Panel C displays a circular gene and microRNA interaction map linking G6PC, CYP27B1, and MCUR1. Panel D is a Sankey diagram, mapping genes like CYP27B1 to drugs and their regulatory statuses, with color coding indicating therapeutic uses. [173]Open in a new tab Molecular regulatory network and drug prediction. (A) mRNA-TF network; (B) mRNA-miRNA network; (C) TF-miRNA-mRNA network; (D) Drug prediction. 3.11. Animal experiment validation Further exploration of hub genes expression was carried out using the DKD mouse model. Histological analysis ([174] Figure 12A ): HE, PAS, and Masson staining were used to compare the renal pathology. The control group showed normal glomerular morphology, well-arranged renal tubules, and no significant pathological alterations. The DKD group exhibited classic DKD pathological features, including glomerular hypertrophy, diffuse thickening of the tubular basement membrane, and extensive collagen deposition in the renal interstitium, indicating significant renal interstitial fibrosis. Renal function assessment ([175] Figures 12B-D ): The kidney-to-body weight ratio (kW/BW) was significantly increased in DKD group. Scr and urine microalbumin/creatinine ratio (mAlb/uCr) were both significantly elevated, confirming renal dysfunction in DKD mice. Transcriptomic validation ([176] Figures 12E-G ): Whole-transcriptome sequencing was performed to validate hub genes expression. Compared to control group, the mRNA expression levels of Mcur1, Cyp27b1, and G6pc in DKD renal tissues were significantly downregulated (P < 0.05). Figure 12. [177]Microscopic images and box plots comparing control and DKD groups. Panel A shows kidney tissue stained with HE, Masson's trichrome, and PAS. The control group exhibits less alteration, while DKD samples show increased pathological changes. Panels B to G present box plots of kidney function indicators and gene expression levels, with DKD showing significant changes compared to control. Statistical significance is denoted by asterisks, with DKD generally showing higher levels in panels B, C, and D, and lower levels in panels E, F, and G. [178]Open in a new tab Animal experiment validation. (A) HE, PAS, and Masson staining (×400; 50μm). (B) kW/BW at 20 weeks. (C) Scr at 20 weeks. (D) mA/uCr at 20 weeks. (E-G) mRNA expression of Mcur1, Cyp27b1, and G6pc (^* P < 0.05, ^** P < 0.01, ^*** P < 0.001). 4. Discussion DKD pathogenesis involves complex metabolic disorders, inflammatory responses, and immune microenvironment dysregulation. This study systematically identified key genes related to M2 macrophage polarization and efferocytosis (MCUR1, CYP27B1, and G6PC) using multi-dataset analysis, machine learning, and animal-level transcriptomic sequencing, revealing their potential roles in immune-metabolic regulation in DKD and constructing a predictive model. 4.1. The pathological significance of M2 macrophage polarization and efferocytosis in DKD Recent studies have highlighted that macrophage phenotype polarization (M1/M2 balance) and dysfunctional efferocytosis are critical factors in the progression of DKD ([179]14, [180]30). Our study found that the M2 macrophage score was significantly elevated in DKD patients. M2 macrophages contribute to alleviating renal injury through their anti-inflammatory and tissue-repair functions. However, excessive M2 polarization is considered a risk factor for tissue fibrosis ([181]31, [182]32). Additionally, efferocytosis, a key mechanism for clearing apoptotic cells, plays a central role in maintaining tissue homeostasis. Its dysfunction can lead to the accumulation of apoptotic debris, triggering secondary necrosis and inflammatory storms ([183]33–[184]35). Although some studies have shown that M1 macrophages possess phagocytic capability, the majority of research supports that M2 macrophages exhibit stronger efferocytosis compared to M1 macrophages ([185]36–[186]42). Interestingly, macrophage efferocytosis promotes polarization towards the M2 phenotype, a process regarded as essential for resolving inflammation and facilitating tissue repair, thereby aiding in the restoration of tissue homeostasis ([187]37, [188]43–[189]46). In this study, the elevated ERGs score in the DKD group may be associated with an increased demand for apoptotic cell clearance; however, whether its efficiency is impaired requires further validation. 4.2. Regulatory network and functional analysis of hub genes MCUR1 is a key regulatory protein of the MCU complex, primarily expressed in the inner mitochondrial membrane. It modulates MCU complex activity, regulating mitochondrial calcium uptake, thereby influencing cellular energy metabolism and apoptosis ([190]47). Efferocytosis is a mitochondria-mediated process, where mitochondrial metabolism, dynamics, and inter-organelle communication influence the clearance of apoptotic cells by phagocytes ([191]48, [192]49) and are critical for macrophage activation and differentiation ([193]50). In an atherosclerotic mouse model, MCUR1 was found to regulate macrophage endocytosis influenced by oxidized low-density lipoprotein (oxLDL) ([194]51). However, no studies have yet investigated MCUR1 in DKD, highlighting the novelty of our findings. CYP27B1 is a cytochrome P450 enzyme primarily involved in vitamin D (VD) metabolism ([195]52). Besides regulating calcium and bone homeostasis, VD plays a crucial role in immune system, particularly in modulating macrophage polarization and function ([196]53, [197]54). In DKD studies, active VD has been shown in vitro to promote the transition of high-glucose-induced M1 macrophages to the M2 phenotype, reducing inflammation to prevent podocyte injury ([198]55). Additionally, it can inhibit macrophage polarization towards the M1 phenotype via the STAT-1/TREM-1 pathway ([199]56), thereby alleviating renal inflammation and fibrosis and ultimately preserving kidney function. Macrophages can synthesize active VD and express the VD receptor (VDR) in their nuclei. Their polarization state significantly affects their metabolic capacity for 25(OH)D3, with M2a macrophages exhibiting higher activity in synthesizing active VD, whereas M1 macrophages display lower activity ([200]57). In CYP27B1 gene knockout mouse models, excessive 25(OH)D3 significantly exacerbated renal tubulointerstitial fibrosis and oxidative stress, pathological changes that are closely associated with macrophage phenotype alterations ([201]58). VD can upregulate ASAP2 transcription, promoting efferocytosis by modulating cytoskeletal rearrangement and vesicular transport ([202]59). The G6PC gene encodes the catalytic subunit of glucose-6-phosphatase, playing a crucial role in glucose production and release, which is essential for maintaining glucose homeostasis ([203]60). In liver-specific G6pc knockout mice, proteins associated with tissue inflammation and macrophage M2 polarization (such as ARG1 and CD163) were significantly upregulated ([204]61). Renal G6pc deficiency leads to glucose-6-phosphate (G6P) accumulation, which activates the pentose phosphate pathway and de novo lipogenesis, resulting in triglyceride accumulation. The activation of the renin-angiotensin system and increased TGF-β1 levels contribute to epithelial-mesenchymal transition (EMT) and podocyte injury, ultimately impairing the glomerular filtration barrier ([205]62). However, research on the role of G6PC in macrophages and DKD remains unexplored. 4.3. Molecular subtyping and potential applications in precision medicine Consensus clustering analysis classified DKD patients into two subtypes, C1 and C2, based on hub genes. The C1 subtype is characterized by activation of the RAS pathway and glycerophospholipid metabolism, while the C2 subtype is enriched in the NOTCH signaling and extracellular matrix (ECM) remodeling pathways. This classification suggests that: (1) C1 subtype patients may benefit from a combination of RAS inhibitors (e.g., ACEI/ARB) and lipid metabolism modulators; (2) C2 subtype patients may require targeted interventions against fibrosis-related pathways, such as TGF-β and ECM receptors. Additionally, the higher expression of hub genes in the C1 subtype may indicate stronger renal compensatory capacity, suggesting a better prognosis compared to the C2 subtype. This molecular subtyping provides new insights for individualized treatment strategies in DKD. 4.4. Immune microenvironment imbalance: from mechanisms to therapeutic targets Immune infiltration analysis revealed significantly increased levels of M0, M1, and M2 macrophages, CD8+ T cells, and NK cells in the DKD group, suggesting that both the innate and adaptive immune systems contribute to the pathogenesis of DKD. The abnormal infiltration of these immune cells may exacerbate inflammatory responses and worsen tissue damage. Notably, the increased infiltration of M2 macrophages in DKD may reflect the body’s attempt to counteract inflammation through their anti-inflammatory effects. Moreover, M2 macrophages exhibit a correlation with hub genes, indicating that these genes may regulate macrophage metabolic phenotypes and influence their functions. For instance, MCUR1-mediated mitochondrial calcium signaling may modulate macrophage cytokine secretion, while G6PC influences macrophage polarization through glucose metabolism. Targeting the immune-metabolic microenvironment, such as modulating the glycolysis/oxidative phosphorylation balance in macrophages, may offer a potential new therapeutic approach for DKD. Sc-RNA seq analysis further revealed the distribution patterns of MCUR1, CYP27B1, and G6PC across different renal cell types. These genes are primarily expressed in parietal epithelial cells of the glomerulus, proximal convoluted tubule cells, and proximal tubular cells, all of which play crucial roles in the pathogenesis of DKD. For example, damage to parietal epithelial cells is closely linked to the integrity of the glomerular filtration barrier, while injury to proximal convoluted and proximal tubular cells may lead to tubular dysfunction and proteinuria. Therefore, the downregulation of these genes in specific cell types may directly impact renal structure and function. Animal experiments confirmed the downregulation of MCUR1, CYP27B1, and G6PC in DKD mouse models, further supporting their potential diagnostic value in DKD. 4.5. Limitations This study has the following limitations: (1) This study is based on public databases; however, further in vitro and in vivo functional studies, such as gene knock-out or overexpression, are required to validate the role of hub genes in efferocytosis and macrophage polarization; (2) The study primarily focuses on the roles of MCUR1, CYP27B1, and G6PC in macrophages. However, the mechanisms by which these genes interact with renal tubular and glomerular cells remain unclear. Furthermore, the specific role of these interactions in the pathogenesis of DKD is not well understood. (3) Although the critical roles of MCUR1, CYP27B1, and G6PC in M2 macrophage polarization and macrophage apoptosis phagocytosis in DKD have been revealed, the specific molecular pathways through which these key genes regulate M2 macrophage polarization and apoptosis phagocytosis remain unclear. Future research should focus on further validating, through in vitro and in vivo experiments, the role of hub genes in phagocytosis or efferocytosis polarization. Additionally, it should investigate the molecular mechanisms underlying the regulation of macrophage-tubule/glomerular cell interactions by hub genes, while optimizing models based on these key genes. Furthermore, the feasibility of using these models as non-invasive diagnostic tools or therapeutic monitoring biomarkers should be explored. 5. Conclusion This study systematically identified MCUR1, CYP27B1, and G6PC as key genes associated with M2 macrophage polarization and efferocytosis in DKD through multi-dataset analysis, machine learning, and experimental validation. They may contribute to DKD development via metabolic and immune pathways. Future research should investigate their regulatory mechanisms and evaluate their potential as therapeutic targets. Acknowledgments