Abstract

Introduction

Diabetic kidney disease (DKD) is a long-term complication of diabetes that causes renal microvascular disease. It is also one of the main causes of end-stage renal disease (ESRD) and has a complex pathophysiology. Timely prevention and treatment are important for delaying DKD progression. This study aimed to use bioinformatics analysis to identify key diagnostic markers that could also serve as therapeutic targets for DKD.

Methods

We downloaded DKD datasets from the Gene Expression Omnibus (GEO) database. Over-representation analysis (ORA) was used to explore the underlying biological processes in DKD. Weighted gene co-expression network analysis (WGCNA), least absolute shrinkage and selection operator (LASSO) regression, random forest (RF), and support vector machine recursive feature elimination (SVM-RFE) were used to screen DKD diagnostic markers. The reliability and practicability of the diagnostic model were evaluated using calibration curves, receiver operating characteristic (ROC) curves, and decision curve analysis (DCA). Gene set enrichment analysis (GSEA) and correlation analysis were used to explore the biological processes and significance of the candidate markers. Finally, we constructed mouse models of DKD and diabetes mellitus (DM) and further verified the reliability of the markers through PCR, immunohistochemistry, renal pathological staining, and ELISA.

Results

Biological processes such as immune activation, T-cell activation, and cell adhesion were enriched in DKD. Based on differentially expressed oxidative stress- and inflammatory response-related genes (DEOIGs), we divided DKD patients into C1 and C2 subtypes. Four potential diagnostic markers for DKD were identified using multiple bioinformatics analyses: tenascin C (TNC), peroxidasin (PXDN), tissue inhibitor of metalloproteinases 1 (TIMP1), and tropomyosin 1 (TPM1). Further enrichment analysis showed that the four diagnostic markers were closely related to various immune cells and played an important role in the immune microenvironment of DKD.
In addition, the results of the mouse experiments were consistent with the bioinformatics analysis, further confirming the reliability of the four markers.

Conclusion

We identified four reliable potential diagnostic markers through comprehensive, systematic bioinformatics analysis and experimental validation; these markers could also serve as potential therapeutic targets for DKD. We performed a preliminary examination of the biological processes involved in DKD pathogenesis and provide a novel idea for DKD diagnosis and treatment.

Keywords: diabetic kidney disease, biomarker, diagnostic model, immune, bioinformatic analysis

Introduction

Diabetic kidney disease (DKD) is a chronic kidney disease caused by diabetes. About 40% of type 2 diabetes patients and 30% of type 1 diabetes patients present with DKD (1, 2). With the increasing prevalence of diabetes, the number of DKD patients has also increased (3, 4). DKD patients present with different forms of kidney damage, characterized by a persistent increase in urinary albumin excretion and/or a progressive decline in glomerular filtration rate (GFR), which eventually progresses to end-stage renal disease (ESRD) (5). DKD is the main cause of ESRD, accounting for about 30% to 50% of ESRD cases worldwide (6). Therefore, effective early diagnostic markers and intervention targets are urgently needed to develop new diagnosis and treatment strategies and improve clinical outcomes in DKD.

The pathogenesis of DKD is complex and multifactorial. Generally, DKD is mainly caused by hemodynamic changes and metabolic disturbances (7). These changes subsequently lead to activation of the renin-angiotensin-aldosterone system (RAAS) (8), increases in metabolites and pro-inflammatory factors, and dysregulation of many intracellular signaling cascades associated with oxidative stress (9–11).
In the diabetic state, on the one hand, the auto-oxidation of glucose causes mitochondrial overload and excessive production of reactive oxygen species (ROS). On the other hand, the body's antioxidant capacity decreases, and the amount of intracellular antioxidant (nicotinamide adenine dinucleotide phosphate [NADPH]) is insufficient (12), resulting in an imbalance between oxidants and antioxidants. In addition, oxidative stress is closely related to inflammatory cell activation, and the two often coexist and activate each other. Excessive oxidative stress and inflammatory responses damage the renal interstitium, glomeruli, and podocytes, thereby impairing renal function. Therefore, identifying diagnostic and therapeutic targets related to oxidative stress and the inflammatory response is expected to help block the process of renal injury in DKD and restore renal function.

With the popularization of gene chips and high-throughput sequencing, many disease databases have gradually improved, and a growing volume of usable data is available to reveal the pathogenesis of diseases and new therapeutic targets. For example, Ma used a bioinformatic approach to analyze gene expression profiles and underlying functional networks in cardiac tissue from patients with dilated cardiomyopathy (13). Huang analyzed the correlation between serum 25-hydroxyvitamin D levels and the progression of proteinuria in DKD, along with its underlying mechanisms (14). Yang used machine learning to identify seven immune-related genes that can predict the progression of atherosclerotic plaque (15). However, existing studies have limitations, such as analysis based on a single dataset, a limited number of patients, and a lack of multi-faceted validation of the bioinformatics findings, all of which reduce predictive capability or reliability.
This study integrated DKD datasets from different sources, used a variety of bioinformatics methods to screen diagnostic markers related to oxidative stress and the inflammatory response in DKD, and thoroughly examined the biological functions and potential mechanisms of these markers. The findings may provide a promising direction for clarifying the diagnosis and pathogenesis of DKD.

Materials and methods

Data sources and processing

DKD gene expression profiling data were downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/), comprising seven datasets: GSE111154, GSE142025, GSE162830, GSE163603, GSE96804, GSE1009, and GSE30122. Table 1 presents more details on these datasets. After excluding samples irrelevant to this study, 214 samples remained, comprising 101 normal samples and 113 DKD samples. The "sva" R package was used to remove batch effects between the datasets (16). Principal component analysis (PCA) was used to assess the effect of batch-effect removal and to visualize the distribution of DKD and normal samples. Subsequently, we obtained 458 oxidative stress-related genes from the Gene Ontology (GO) knowledgebase (http://geneontology.org/) and 200 inflammatory response-related genes from MSigDB (HALLMARK_INFLAMMATORY_RESPONSE) (http://www.broad.mit.edu/gsea/msigdb/), as shown in Supplementary Table 1.

Table 1. Details of the datasets included in this study.
Datasets Platforms Organism DKD Normal References Status