Abstract Uremia is a serious complication of end-stage chronic kidney disease, closely associated with immune imbalance and chronic inflammation. However, its molecular mechanisms remain largely unclear. In this study, we analyzed transcriptomic data from the [34]GSE37171 dataset to identify genes associated with uremia. Differential expression and WGCNA analyses were used to screen core genes, followed by machine learning (LASSO, Random Forest, SVM-RFE) to identify key feature genes. GSEA and immune infiltration analyses were conducted to explore functional pathways and immune relevance. ROC curves were used to evaluate the discriminatory power of the selected genes. Four feature genes—NAF1, SNORD4A, CGB3, and CD3E—were identified. These genes were enriched in pathways related to apoptosis, immune regulation, and oxidative stress. Their expression levels correlated with multiple immune cell types, and ROC analysis demonstrated good discriminatory performance between uremia and healthy samples. Our findings provide potential molecular candidates for further investigation into the immune-related mechanisms of uremia. Supplementary Information The online version contains supplementary material available at 10.1038/s41598-025-09950-8. Keywords: Uremia, Bioinformatics, Machine learning, Immune infiltration, Feature genes. Subject terms: Biomarkers, Nephrology, Urology Introduction Uremia is a common complication that occurs as chronic kidney disease (CKD) progresses to its end stage. It results from a prolonged and irreversible decline in renal function, leading to the accumulation of metabolic waste products and toxins in the body. This accumulation triggers a range of systemic toxic responses and metabolic disturbances, forming a complex clinical syndrome. A distinctive feature of uremia is the disruption of both local and systemic metabolic processes and signaling pathways, which presents substantial challenges for clinical treatment^[35]1.Globally, the prevalence of CKD is approximately 13.4%, and the number of patients progressing to end-stage renal disease (ESRD) continues to increase each year^[36]2.As a result, uremia is gradually emerging as a major threat to public health and safety^[37]3.Currently, kidney transplantation and dialysis remain the most effective therapeutic options for uremia. However, these treatments are associated with high costs, limited patient eligibility, and unsatisfactory long-term outcomes, all of which significantly impair patients’ quality of life^[38]4. Therefore, elucidating the complex mechanisms underlying the pathogenesis and progression of uremia, identifying key biomarkers, and developing novel diagnostic and therapeutic strategies have long been major focuses in uremia research^[39]5,[40]6. In recent years, microarray and high-throughput technologies have been widely applied in disease research. When integrated with bioinformatics approaches, these technologies enable systematic analyses of complex diseases at the gene expression level, significantly advancing our understanding of disease mechanisms and contributing to the identification of critical biomarkers and pathways involved in disease development^[41]7–[42]9.Meanwhile, the incorporation of machine learning into the biomedical field has enabled researchers to analyze large-scale and complex datasets, facilitating tasks such as identifying potential key genes and constructing diagnostic prediction models^[43]10. This integrative analytical approach not only enhances the identification of biologically relevant genes but also offers novel perspectives for the discovery and validation of disease biomarkers. Notably, recent studies have identified a range of specific molecular biomarkers associated with uremia, CKD, and dialysis outcomes. For example, elevated levels of IL-6 have been linked to increased cardiovascular and all-cause mortality in dialysis patients^[44]11 while APOL1 genetic variants have been shown to influence dialysis survival in individuals of African ancestry^[45]12. Circulating microRNAs such as miR-126 and miR-223, as well as genetic polymorphisms like VEGF rs699947, have also been implicated in adverse outcomes^[46]13,[47]14. In addition, multi-omics studies in CKD populations have identified promising biomarkers, including LCN2, KIM-1, UMOD, and miR-21, which may have diagnostic or therapeutic value^[48]15. These findings have enriched the current biomarker landscape for uremia and related kidney diseases. However, integrative approaches that combine transcriptomic analysis, machine learning, and immune profiling to systematically identify novel regulatory targets remain limited. In this study, we aim to identify key genes and potential mechanistic pathways associated with the development of uremia by integrating bioinformatics and machine learning approaches. Furthermore, we explore the functional roles of these genes within the immune microenvironment, with the goal of providing a theoretical foundation and robust data support for advancing uremia research. Methods Data source The dataset used in this study was obtained from the Gene Expression Omnibus (GEO) database under the accession number [49]GSE37171. This dataset includes gene expression profiles from 40 healthy individuals and 75 patients with uremia. The gene expression data were obtained from peripheral blood samples of uremic patients aged 18–75 years who were clinically stable, at stage 5 of chronic kidney disease, awaiting kidney transplantation, and had not received immunosuppressive therapy^[50]16. The dataset was generated using the [51]GPL570 platform. Identification of differentially expressed genes The [52]GSE37171 dataset and its corresponding [53]GPL570 platform annotation were processed using R to convert probe identifiers and generate the gene expression matrix. The expression data were first normalized using the limma package (version 3.60.6), followed by differential expression analysis. Genes with an |log2FoldChange|>1 and an adjusted p-value<0.05 were considered differentially expressed. Positive values indicated upregulation, while negative values indicated downregulation. Visualization of the differentially expressed genes (DEGs) was performed using the ggplot2 package (version 3.5.1) for the volcano plot and the pheatmap package (version 1.0.12) for the heatmap. In addition, principal component analysis (PCA) was performed on the normalized expression matrix to evaluate the overall sample distribution and identify potential outliers. Weighted gene co-expression network analysis (WGCNA) Weighted Gene Co-expression Network Analysis (WGCNA) is a widely used method for analyzing gene expression data, which constructs gene co-expression networks to identify modules of co-expressed genes and relate them to external sample traits^[54]17. In this study, WGCNA was performed on the normalized gene expression matrix using the R package WGCNA (version 1.73). An appropriate soft-thresholding power was selected to construct a scale-free network and to define gene modules. The module with the highest correlation to the trait of interest was selected for downstream analysis. Identification of hub genes and enrichment analysis The intersection of the previously identified differentially expressed genes (DEGs) and the genes from the most significantly correlated WGCNA module was used to define the hub gene set. A Venn diagram was generated to visualize this overlap and illustrate the selection process for the hub genes. To investigate the functional characteristics of the hub genes and their involvement in biological pathways, we conducted Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses^[55]18. GO analysis annotated the hub genes based on three categories: Biological Process (BP), Cellular Component (CC), and Molecular Function (MF). KEGG analysis provided systematic insights into the functional roles of the hub genes by integrating genomic, biochemical, and evolutionary information. Both analyses were performed using the clusterProfiler package (version 4.12.6) and the org.Hs.eg.db database (version 3.19.1). The top 10 enriched terms (p < 0.05) were visualized using the ggplot2 package (version 3.5.1). Machine learning and ROC validation To identify key feature genes from the previously defined hub gene set, we employed three machine learning algorithms: Least Absolute Shrinkage and Selection Operator (LASSO) regression, Random Forest (RF), and Support Vector Machine–Recursive Feature Elimination (SVM-RFE)^[56]10. The intersection of the results from these three methods was used to determine the key feature genes associated with uremia. To assess the robustness of these candidate genes in distinguishing uremia patients from healthy controls, we performed five-fold cross-validation and receiver operating characteristic (ROC) curve analysis. All analyses were conducted using the R packages caret (version 7.0–1), glmnet (version 4.1-8), randomForest (version 4.7–1.2), and pROC (version 1.18.5). Single-gene gene set enrichment analysis of key feature genes To further investigate the biological pathways potentially regulated by the identified key feature genes, we conducted single-gene Gene Set Enrichment Analysis (GSEA)^[57]19.Specifically, for each key gene, samples were stratified into high-expression and low-expression groups based on the median expression level^[58]20–[59]22. To comprehensively explore the pathways associated with each key gene and to avoid biases introduced by pre-filtering, we used the normalized full gene expression matrix as the input for the analysis. GSEA was performed using the R packages clusterProfiler (version 3.14.3) and org.Hs.eg.db (version 3.19.1). Given the exploratory nature of this study and in order to capture as many potential biological signals as possible, we applied a relatively inclusive criterion: nominal p-value < 0.05 and absolute value of the normalized enrichment score (|NES|) > 1.5. This approach allowed us to maximize the identification of potentially relevant pathways associated with the expression of key feature genes. Immune infiltration analysis Recent studies have demonstrated a close relationship between uremia and the immune system^[60]23,[61]24. Uremia has been shown to impair not only innate immunity^[62]25but also exert substantial effects on adaptive immune responses^[63]26. To explore the potential immunological roles of the identified key feature genes, we performed immune cell infiltration analysis using the CIBERSORT algorithm on the normalized gene expression matrix. CIBERSORT is a deconvolution method based on support vector regression that estimates the relative proportions of 22 immune cell subtypes from bulk tissue gene expression profiles^[64]27. This analysis was conducted using the CIBERSORT R package (version 0.1.0), and the results were visualized with the ggplot2 (version 3.5.1) and ggpubr (version 0.6.0) packages. Study design and statistical analysis The overall analytical workflow of this study is illustrated in Fig. [65]1. All statistical analyses in this study were performed using R. Unless otherwise specified, a p-value < 0.05 was considered statistically significant. Fig. 1. [66]Fig. 1 [67]Open in a new tab The overall analytical workflow. Results Results of DEGs A total of 630 DEGs were identified based on the predefined thresholds, including 51 upregulated and 579 downregulated genes. The PCA plot showed a clear separation trend between the uremia and control groups in the principal component space, with no obvious extreme outliers observed (Fig. [68]2A). The results of DEG screening are visualized in the volcano plot and heatmap (Fig. [69]2B,C). The volcano plot displays all identified DEGs, while the heatmap shows the top 50 most significantly upregulated and downregulated genes. Fig. 2. [70]Fig. 2 [71]Open in a new tab (A) The PCA plot. (B) The volcano plot displays all differentially expressed genes (DEGs). (C) The heatmap illustrates the top 50 most significantly upregulated and downregulated genes across all samples. Results of WGCNA Based on WGCNA analysis, a soft-thresholding power of 12 was selected using the pickSoftThreshold function to construct the gene co-expression network and define gene modules (Fig. [72]3A–B). Among the resulting modules, the yellow module showed the strongest correlation with uremia (r = 0.82, p = 1 × 10^−29), indicating that the genes within this module may play a key role in the pathogenesis and progression of the disease. This module contained 363 genes (Fig. [73]3C). Fig. 3. [74]Fig. 3 [75]Open in a new tab (A) Selection of the soft-thresholding power in WGCNA. The left panel shows the scale-free fit index, and the right shows mean connectivity across soft-thresholding powers. (B) Gene dendrogram and module detection using WGCNA. (C) Module–Trait Relationship Heatmap. The MEyellow showed the strongest positive correlation with the uremia trait. Core genes and enrichment analysis in uremia The intersection of the differentially expressed genes and the genes from the most significant WGCNA module yielded a set of 52 core genes associated with uremia (Fig. [76]4A). The complete results of the intersecting genes are provided in Supplementary Material [77]1. Fig. 4. [78]Fig. 4 [79]Open in a new tab (A) Venn diagram of DEGs and WGCNA module genes. (B) GO enrichment analysis in the category of BP. (C) GO enrichment analysis in MF. (D) GO enrichment analysis in CC. (E) KEGG pathway enrichment analysis. Subsequently, GO and KEGG enrichment analyses were performed on this core gene set. According to the GO enrichment results, the core genes were mainly involved in T cell-mediated immune response, T cell activation and proliferation, immunological synapse formation, leukocyte differentiation, and hematopoiesis within the BP category (Fig. [80]4B). In the CC category, they were enriched in components such as immunological synapses, cytolytic granules, and transcription factor complexes (Fig. [81]4C). For the MF category, the genes were primarily associated with superoxide dismutase activity, oxidoreductase activity (acting on superoxide radicals as acceptors), and DNA binding in transcription regulatory regions (Fig. [82]4D). KEGG pathway analysis revealed enrichment in several key signaling pathways, including the TGF-β signaling pathway, FOXO signaling pathway, and various immune-related pathways (Fig. [83]4E)^[84]18. Overall, these results suggest that the core genes in uremia may exert biological effects through diverse mechanisms, including immune responses, oxidative stress, epigenetic regulation, and transcriptional control. Machine learning identification of key feature genes in uremia Based on the expression data of 52 core genes, three machine learning algorithms were applied for feature selection: LASSO regression, RF, and SVM-RFE. LASSO regression, coupled with 10-fold cross-validation, identified 11 genes with non-zero coefficients as potential predictors (Fig. [85]5A–B). SVM-RFE with 10-fold cross-validation was used to rank genes by their importance scores in the SVM model, and the top 20 ranked genes were selected (Fig. [86]5C). Similarly, Random Forest was performed using 500 decision trees, and genes were ranked by the Mean Decrease in Gini index; the top 20 most important genes were retained (Fig. [87]5D). All models were trained with appropriate cross-validation to minimize the risk of overfitting. The intersection of the results from all three algorithms yielded four key feature genes: NAF1, SNORD4A, CD3E, and CGB3 (Fig. [88]5E). Consistent with the differential expression analysis, NAF1 and CD3E were downregulated in uremia samples, whereas SNORD4A and CGB3 were upregulated. Fig. 5. [89]Fig. 5 [90]Open in a new tab (A) LASSO coefficient paths. (B) LASSO cross-validation curve. (C) SVM-RFE feature importance ranking. (D) Top 20 Important features by random forest. (E) Venn diagram of the intersection genes from three machine learning methods. ROC curve analysis demonstrated high predictive performance for all four genes in distinguishing uremia from healthy samples: NAF1 (AUC = 0.9827), SNORD4A (AUC = 0.9803), CD3E (AUC = 0.9713), CGB3 (AUC = 0.9677) and multi-gene (AUC = 1). A logistic regression model constructed based on these key feature genes achieved an accuracy and Kappa value of 1.0 under five-fold cross-validation, indicating excellent classification performance. (Fig. [91]6). The machine learning results are provided in Supplementary Material [92]2. Fig. 6. [93]Fig. 6 [94]Open in a new tab The ROC-AUC curves of NAF1 (A), SNORD4A (B), CGB3 (C), CD3E (D) and all four genes (E). Sensitivity-specificity curve for all four genes (F). Single-gene gene set enrichment analysis of key feature genes Samples were divided into high-expression (≥ 50%) and low-expression (< 5 0%) groups based on the expression levels of the four key feature genes: NAF1, SNORD4A, CD3E, and CGB3. Gene sets for GO biological processes (c5.go.bp.v7.4.symbols.gmt) and KEGG pathways (c2.cp.kegg.v7.4.symbols.gmt) were obtained from the Molecular Signatures Database (MSigDB)^[95]28. GSEA was then performed to identify significantly enriched pathways and potential molecular mechanisms between the high- and low-expression groups, aiming to reveal biological processes associated with the pathogenesis of uremia. To highlight pathways potentially activated or upregulated under uremic conditions, results with a normalized enrichment score (NES) > 1.5 and a nominal p-value < 0.05 were visualized. The complete GSEA results are provided in Supplementary Material [96]3. The GSEA results for NAF1 showed significant enrichment in several Gene Ontology (GO) biological processes, including macrophage fusion (NES = 1.6875, NP = 0.0020), nuclear body organization (NES = 1.6526, NP = 0.0078), and translational readthrough (NES = 1.6421, NP = 0.0021) (Fig. [97]7A). KEGG pathway analysis indicated that NAF1 was associated with the lysosome pathway (NES = 1.5583, NP = 0.0228), SNARE interactions in vesicular transport (NES = 1.5269, NP = 0.0318), T cell receptor signaling pathway (NES = 1.5195, NP = 0.0260), and Toll-like receptor signaling pathway (NES = 1.5145, NP = 0.0417) (Fig. [98]8A). Fig. 7. [99]Fig. 7 [100]Open in a new tab GSEA -GO Results for NAF1 (A), SNORD4A (B), CGB3 (C) and CD3E (D). Fig. 8. [101]Fig. 8 [102]Open in a new tab GSEA -KEGG Results for NAF1 (A), SNORD4A (B), CGB3 (C) and CD3E (D). The GSEA results for SNORD4A indicated significant enrichment in several GO biological processes, including positive regulation of necrotic cell death (NES = 1.7901, NP = 0.0038), negative regulation of sphingolipid biosynthetic process (NES = 1.6862, NP = 0.0058), NADP metabolic process (NES = 1.6724, NP = 0.0119), regulation of autophagosome maturation (NES = 1.6705, NP = 0.0077), and modulation by virus of host cellular process (NES = 1.6509, NP = 0.0038) (Fig. [103]7B). KEGG pathway analysis showed enrichment in the acute myeloid leukemia pathway (NES = 1.5218, NP = 0.0226) and aminoacyl-tRNA biosynthesis (NES = 1.5084, NP = 0.0131) (Fig. [104]8B). The GSEA results for CD3E showed significant enrichment in GO biological processes such as DNA unwinding involved in DNA replication (NES = 1.6563, NP = 0.0042), nuclear body organization (NES = 1.6312, NP = 0.0170), and translational readthrough (NES = 1.6074, NP = 0.0106) (Fig. [105]7C). KEGG pathway analysis revealed associations with aminoacyl-tRNA biosynthesis (NES = 1.6680, NP = 0.0000), DNA replication (NES = 1.6654, NP = 0.0020), one-carbon pool by folate (NES = 1.6102, NP = 0.0021), and the non-homologous end-joining (NHEJ) DNA repair pathway (NES = 1.5219, NP = 0.0325) (Fig. [106]8C). The GSEA results for CGB3 suggested significant enrichment in several GO biological processes, including positive regulation of necrotic cell death (NES = 1.6914, NP = 0.0057), regulation of mitochondrial gene expression (NES = 1.6008, NP = 0.0019), regulation of autophagosome maturation (NES = 1.5788, NP = 0.0265), negative regulation of cytosolic pattern recognition receptor signaling pathway in response to virus (NES = 1.5763, NP = 0.0144), and NADPH regeneration (NES = 1.5751, NP = 0.0039) (Fig. [107]7D). KEGG pathway analysis indicated that CGB3 was associated with aminoacyl-tRNA biosynthesis (NES = 1.5204, NP = 0.0254) and the acute myeloid leukemia pathway (NES = 1.5161, NP = 0.0137) (Fig. [108]8D). Immune infiltration analysis results To investigate immune system abnormalities in uremia, we performed immune cell infiltration analysis using the CIBERSORT algorithm. As shown in Fig. [109]9A, the composition of immune cells differed significantly between uremic patients and healthy controls. In particular, the proportions of memory B cells (fold change = 5.77, P < 0.001), plasma cells (fold change = 5.63, P < 0.001), gamma delta T cells (fold change = 9.32, P < 0.001), monocytes (fold change = 1.54, P < 0.001), and M0 macrophages (fold change = 2.40, P < 0.001) were significantly increased in the uremia group. These findings indicate an immune-activated state and potential inflammatory response in uremia. Fig. 9. [110]Fig. 9 [111]Open in a new tab (A) Boxplot of immune cell infiltration. (B) Stacked bar plot of immune cell infiltration. (C) Correlation heatmap between gene expression and immune cell infiltration. (D) Correlation network based on mantel test. *p < 0.05, **p < 0.01, ***p < 0.001, **** p < 0.0001. A bar plot was further generated to visualize the relative proportions of 22 immune cell subtypes in each sample (Fig. [112]9B). Although inter-sample variability was observed, the overall infiltration patterns were largely consistent within groups, suggesting that abnormal infiltration of certain immune cell subsets may be characteristic of the uremic condition. In addition, we assessed the correlations between four key feature genes (NAF1, SNORD4A, CD3E, and CGB3) and the proportions of 22 immune cell subtypes (Fig. [113]9C). The results showed that NAF1 was most strongly and positively correlated with resting NK cells (r = 0.51), and most negatively correlated with monocytes (r = − 0.63). SNORD4A was positively associated with monocytes (r = 0.44) and negatively with resting NK cells (r = − 0.47). CD3E showed the strongest positive correlation with naïve B cells (r = 0.48) and the strongest negative correlation with monocytes (r = − 0.64). CGB3 was most positively correlated with monocytes (r = 0.63) and negatively correlated with resting NK cells (r = − 0.48). These findings indicate that these key genes may participate in the regulation of innate and adaptive immune responses. Furthermore, the Mantel test and correlation network analysis (Fig. [114]9D) demonstrated a significant overall association between the four genes and immune cell infiltration (P < 0.01), additionally highlighting the major gene–cell interaction network. The results are provided in Supplementary Material [115]4. Discussion The kidneys play a vital role not only in eliminating metabolic waste products and maintaining electrolyte and acid–base balance, but also in regulating blood pressure and participating in hormone secretion and other physiological functions^[116]29. When acute or chronic kidney failure occurs, metabolic waste products that are normally excreted or metabolized by the kidneys accumulate in the bloodstream, resulting in toxic manifestations. Given the kidney’s extensive interactions with other organs and systems, this dysfunction leads to a complex clinical syndrome characterized by systemic symptoms—uremia^[117]30. Currently, the main therapeutic strategies for uremia include kidney transplantation and hemodialysis. Kidney transplantation is considered the definitive treatment for end-stage renal disease; however, it is not widely accessible due to donor organ shortages, risk of immune rejection, the need for lifelong immunosuppressive therapy, and high economic burden^[118]31,[119]32. Hemodialysis, the most commonly used renal replacement therapy, removes metabolic waste products through extracorporeal circulation, offering short-term improvements in clinical symptoms and quality of life. However, it does not fully replicate the kidneys’ functions and is associated with long-term complications such as anemia and infection^[120]33–[121]35. Moreover, studies have shown that the five-year survival rate after initiation of maintenance hemodialysis remains suboptimal, at approximately 40%^[122]36. Therefore, to develop more effective treatments for uremia, it is critical to gain deeper insights into the molecular mechanisms underlying its pathogenesis and to identify reliable biomarkers for diagnosis, prognosis, and therapeutic targeting. In this study, we identified a set of core genes associated with uremia by integrating differential expression analysis and WGCNA, followed by GO and KEGG enrichment analyses to explore their potential biological functions and pathways. To further refine the most relevant candidates, we applied machine learning algorithms and identified four key genes—NAF1, SNORD4A, CD3E, and CGB3—that showed the strongest associations with uremia. To gain deeper insight into the roles these genes may play in the pathogenesis and progression of uremia, we conducted Gene Set Enrichment Analysis (GSEA) and immune infiltration analysis. These complementary approaches allowed us to investigate the signaling pathways and immune-related mechanisms potentially mediated by the identified genes. The GO and KEGG enrichment results of the core gene set suggested that immune dysregulation and inflammatory responses may be major pathogenic mechanisms underlying uremia. Consistently, the GSEA and immune infiltration analyses of the four key feature genes further indicated their broad involvement in immune and inflammatory processes. The high degree of concordance between these two analytical approaches at the functional level supports the reliability of our findings. Previous studies have also demonstrated widespread immune abnormalities in uremic conditions, with various immune cell populations exhibiting functional impairments or imbalance. These include alterations in both innate and adaptive immune responses, contributing to chronic inflammation and increased susceptibility to infection commonly observed in patients with end-stage renal disease^[123]37. Polymorphonuclear leukocytes (PMNLs), particularly neutrophils, are notably affected in uremia. Studies have shown that neutrophils in uremic patients exhibit impaired phagocytic activity, reduced oxidative burst capacity, and delayed apoptosis^[124]38–[125]42. Additionally, the uremic environment contains circulating factors that may induce a pre-activated state in PMNLs, rendering them more prone to activation and release of inflammatory mediators^[126]43–[127]46. While this primed state may be beneficial under physiological conditions, it can contribute to a persistent low-grade inflammatory response when homeostasis is disrupted^[128]47,[129]48. Other immune cell types are also affected in uremia, including monocytes, dendritic cells, and lymphocytes. Girndt et al. reported functional and phenotypic differences in monocytes between uremic patients and healthy individuals. Based on CD14/CD16 expression profiles, they identified three distinct monocyte subsets, among which the intermediate monocyte population (Mo2: CD14⁺⁺/CD16⁺) exhibited strong pro-inflammatory activity^[130]25. Dendritic cells (DCs), which serve as critical mediators between innate and adaptive immunity, are also impaired in chronic kidney disease. Verkade et al. demonstrated that advanced CKD inhibits the differentiation of monocytes into DCs^[131]49leading to compromised antigen presentation and weakened immune responses^[132]50. Moreover, renal-resident dendritic cells can interact with glomerular antigens and infiltrating T cells, thereby contributing to the progression of kidney disease^[133]51,[134]52. Our study identified four genes that have been rarely reported in previous research. NAF1 is a component of the H/ACA ribonucleoprotein complex, primarily involved in the processing and modification of small nucleolar RNAs (snoRNAs). It also plays essential roles in telomere maintenance, RNA maturation, and cellular stress responses. In our study, GSEA results showed that NAF1 was enriched in pathways related to apoptosis, stress response, and oxidative phosphorylation—processes closely associated with cellular injury—suggesting that NAF1 may be involved in stress-induced renal damage. Additionally, immune infiltration analysis revealed that NAF1 expression was significantly correlated with the infiltration levels of multiple immune cell types. This finding implies that NAF1 may contribute to the progression of uremia by modulating the immune microenvironment. Although research on NAF1 in the context of uremia or kidney disease remains limited, previous studies have reported a strong link between NAF1 mutations and pulmonary fibrosis^[135]53,[136]54. In a study on idiopathic pulmonary fibrosis, pathogenic variants were identified in two telomere-related genes, one of which was NAF1, and both were associated with severe telomere shortening^[137]54. Stanley et al. reported a frameshift mutation in NAF1 in patients with combined pulmonary fibrosis and emphysema. Their modeling experiments demonstrated that the mutation impaired the generation and stability of telomerase RNA, resulting in telomere attrition and disease development^[138]53. Telomere shortening has also been linked to chronic kidney disease. A systematic review by Levstek et al. concluded that reduced telomere length is associated with declining renal function and progression to chronic renal failure^[139]55. Renal fibrosis, in turn, is widely recognized as the final common pathway in nearly all forms of chronic kidney disease^[140]56. Based on these findings, we speculate that NAF1 may influence the development of uremia not only through immune modulation and stress response but also by regulating telomere length and contributing to renal fibrosis. Although this hypothesis remains to be experimentally validated, our findings suggest that NAF1 represents a potentially novel therapeutic target and a promising direction for future uremia research. SNORD4A belongs to the C/D box small nucleolar RNA (snoRNA) family and is primarily involved in the 2′-O-methylation of ribosomal RNA (rRNA), thereby regulating ribosome biogenesis and protein translation efficiency^[141]57. Recent studies have shown that snoRNAs are associated with various kidney diseases. For instance, Horikawa et al. reported that a splice variant of the SNORA host gene 4 was significantly upregulated in a podocyte injury model and was detectable in urine, suggesting its disease-specific expression and potential as a noninvasive biomarker for kidney disorders^[142]58. Similarly, Zhu et al. demonstrated that SNORD3A promotes ferroptosis in acute kidney injury by transcriptionally activating the STING pathway^[143]59. In the context of diabetic nephropathy, five snoRNAs (U8, SNORD118, SNORD24, SNORD107, and SNORD87) were found to be abnormally expressed in patients’ plasma. Notably, although snoRNAs represent only about 10% of total small non-coding RNAs (sncRNAs), they accounted for 71% of those significantly associated with diabetic nephropathy, highlighting their disproportionate relevance to disease processes^[144]60. Another study on renal cell carcinoma identified SNORD63 and SNORD96A as significantly upregulated in patients with clear cell renal cell carcinoma and proposed them as potential noninvasive diagnostic markers^[145]61. In our study, SNORD4A emerged as one of the key feature genes identified through integrative bioinformatics and machine learning approaches. Although there is currently no direct evidence linking SNORD4A to kidney disease, its structural and functional similarities to other disease-associated C/D box snoRNAs (e.g., SNORD3A, SNORD96A) suggest that it may play a regulatory role in immune or inflammatory processes relevant to uremia. Based on our findings, we speculate that SNORD4A may act as a novel inflammation- or immunity-related biomarker in uremia. CGB3 is a member of the human chorionic gonadotropin beta subunit (hCG-β) gene family. Our findings suggest that CGB3 is involved in several immune- and inflammation-related pathways, with enrichment analysis indicating its association with T cell–mediated immune responses and cytokine secretion. Further immune infiltration analysis revealed that CGB3 expression was strongly correlated with monocyte and resting NK cell infiltration, suggesting a role in innate immune regulation in uremia. As a member of the hCG-β subunit gene family, CGB3, along with related genes such as CGB1 and CGB2, contributes to the production and regulation of the hCG hormone^[146]62. While hCG is traditionally associated with pregnancy and placental function, increasing evidence has linked hCG and its β subunits to cancer progression^[147]62–[148]64. Several studies suggest that CGB3 and other CGB family genes may influence cancer development through mechanisms such as immune evasion, tumor metastasis, and apoptosis^[149]63–[150]65. CGB3 has also been reported as a biomarker or potential immunotherapeutic target in cancers such as bladder, cervical, and thyroid cancer^[151]66–[152]68. However, there is currently no direct evidence implicating CGB3 in uremia or kidney disease. In our study, the significant correlation between CGB3 expression and the infiltration of monocytes and resting NK cells provides preliminary support for its potential involvement in modulating the renal immune microenvironment. Nonetheless, given the lack of specific evidence linking CGB3 to renal pathology, its utility as a diagnostic or therapeutic biomarker in uremia remains uncertain. That said, our findings support the notion that CGB3 may represent a potential candidate gene worthy of further investigation. Even if it cannot yet be considered a clinical biomarker, it may serve as a molecular indicator of underlying immune dysregulation in the kidney, and thus could be a valuable target for future studies focused on renal immune modulation. CD3E encodes the CD3ε chain, an essential component of the T cell receptor (TCR) complex, and plays a central role in T cell development, activation, and immune responses^[153]69. In our study, single-gene GSEA revealed that CD3E was significantly enriched in pathways related to cellular activation, nucleic acid metabolism, and DNA replication and repair, suggesting its potential involvement in T cell proliferation and activation. Existing studies support these findings from multiple perspectives. Winterberg et al. established a uremic cardiomyopathy mouse model and showed that depletion of T cells using an anti-CD3ε monoclonal antibody significantly improved diastolic dysfunction^[154]70 providing direct experimental evidence that CD3E-mediated T cell activation plays a pathogenic role in uremia-related organ damage. At the transcriptomic level, Wu et al. used single-cell RNA sequencing in hemodialysis patients and found that CD3E was significantly downregulated in CD4⁺ T cells, accompanied by reduced T cell activity and suppression of downstream TCR signaling pathways, including PI3K-Akt and NF-κB^[155]71. Similarly, Zheng et al. observed downregulation of CD3E and CD3D in T cells from IgA nephropathy patients, indicating that T cell dysfunction in chronic kidney disease is closely associated with abnormal CD3E expression^[156]72. Furthermore, Galichon et al. suggested that changes in urinary CD3E mRNA levels may serve as a biomarker for renal transplant rejection, reflecting local immune activity^[157]73. Given that CD3E is stably expressed on the surface of CD3⁺ T cells, fluctuations in CD3⁺ T cell counts may indirectly reflect its functional status. For example, Baranwal et al. found that expansion of the renal lymphatic network after acute kidney injury modulated CD3E⁺ T cell infiltration and promoted renal repair^[158]74. In patients with focal segmental glomerulosclerosis (FSGS), Kronbichler et al. reported significant CD3⁺ T cell infiltration in both glomerular and interstitial regions^[159]75further implicating T cell activation in kidney structural damage. Additionally, Xu et al. demonstrated that persistent CD3⁺ T cell infiltration contributed to tubular atrophy and functional loss during the transition from acute kidney injury to chronic kidney disease^[160]76. It is worth further discussion that our immune infiltration analysis revealed that CD3E expression was associated not only with T cell subsets, but also showed certain correlations with naïve B cell and monocyte infiltration. Regarding this phenomenon, we believe that although CD3E is traditionally regarded as a T cell-specific molecule, its expression changes in the complex immune microenvironment of uremia may indirectly reflect correlations with other immune cells by influencing immune signaling networks or intercellular interactions. However, relevant studies on this topic are still limited. Therefore, we still agree that the primary function of CD3E lies in mediating T cell development and activation, while its association with other immune cell types may more likely reflect indirect links between cellular subsets under conditions of immune imbalance. Taken together, these findings position CD3E as the most promising of the four key feature genes identified in our study, with strong potential as a therapeutic or interventional target for uremia and its related complications. This study has several limitations. Our analysis primarily relied on bioinformatics and machine learning approaches, combined with evidence from previously published experimental studies, to systematically identify and explore key feature genes and potential mechanisms associated with uremia. However, the study lacks direct experimental validation using independent clinical samples or functional assays. In addition, a relatively lenient threshold was applied during the screening of differentially expressed genes to balance sensitivity and specificity. Similarly, the single-gene GSEA analysis used a filtering criterion of p < 0.05 and |NES| > 1.5, which may carry a risk of false positives. Moreover, we acknowledge that patients with uremia or advanced chronic kidney disease often present with comorbid conditions such as diabetes and hypertension. Previous studies have shown that these comorbidities may influence gene expression to a certain extent, potentially through mechanisms such as epigenetic regulation, oxidative stress, and chronic inflammation^[161]77–[162]79. However, as the original dataset used in this study did not include relevant clinical comorbidity information, we were unable to accurately assess the potential confounding effects of these factors on gene expression. Furthermore, the number and diversity of available samples in public databases remain limited, which may affect the generalizability and applicability of our findings. In future studies, we plan to validate the expression patterns of the identified key genes using large-scale, multi-center datasets and to conduct mechanistic investigations through cellular and animal experiments. These efforts aim to elucidate the functional roles of these genes in the development and progression of uremia, ultimately providing new theoretical insights for its prevention and treatment. Conclusion Through an integrative analysis combining bioinformatics and machine learning approaches, this study systematically identified four genes—NAF1, SNORD4A, CGB3, and CD3E—as potential key regulators involved in immune imbalance and chronic inflammation in uremia, with CD3E demonstrating the greatest potential. Although this study has certain limitations, our findings provide valuable theoretical insights for further research on uremia. Future studies should incorporate larger cohorts and functional experiments to validate these results and elucidate the precise molecular mechanisms of the identified genes. Electronic supplementary material Below is the link to the electronic supplementary material. [163]Supplementary Information 1.^ (21.6KB, xlsx) [164]Supplementary Information 2.^ (10.1KB, xlsx) [165]Supplementary Information 3.^ (38.2KB, xlsx) [166]Supplementary Information 4.^ (40.4KB, xlsx) Acknowledgements