Abstract Diabetic nephropathy (DN) is one of the most established microvascular complications of diabetes and a key cause of end-stage renal disease. It is well established that gene susceptibility to DN plays a critical role in disease pathophysiology. Therefore, many genetic studies have been performed to categorize candidate genes in prominent diabetic cohorts, aiming to investigate DN pathogenesis and etiology. In this study, we performed a meta-analysis on the expression profiles of [41]GSE1009, [42]GSE30122, [43]GSE96804, [44]GSE99340, [45]GSE104948, [46]GSE104954, and [47]GSE111154 to identify critical transcriptional factors associated with DN progression. The analysis was conducted for all individual datasets for each kidney tissue (glomerulus, tubules, and kidney cortex). We identified distinct clusters of susceptibility genes that were dysregulated in a renal compartment-specific pattern. Further, we recognized a small but a closely connected set of these susceptibility genes enriched for podocyte differentiation, several of which were characterized as genes encoding critical transcriptional factors (TFs) involved in DN development and podocyte function. To validate the role of identified TFs in DN progression, we functionally validated the three main TFs (DACH1, LMX1B, and WT1) identified through differential gene expression and network analysis using the hyperglycemic zebrafish model. We report that hyperglycemia-induced altered gene expression of the key TF genes leads to morphological abnormalities in zebrafish glomeruli, pronephric tubules, proximal and distal ducts. This study demonstrated that altered expression of these TF genes could be associated with hyperglycemia-induced nephropathy and, thus, aids in understanding the molecular drivers, essential genes, and pathways that trigger DN initiation and development. Keywords: Diabetic nephropathy, Diabetic kidney disease, Gene expression, Network analysis, Hyperglycemic zebrafish, Transcription factors, DACH1, LMX1B, WT, Therapeutic biomarkers Graphical Abstract [48]ga1 [49]Open in a new tab 1. Introduction Diabetic nephropathy (DN), also known as diabetic kidney disease, is a main microvascular complication of diabetes mellitus (DM) and a key factor in increased morbidity and mortality in DM patients. Several factors, including hyperglycemia-generated metabolic fluctuations [50][1], age at onset [51][2], DM duration [52][3], hypertension [53][4] and genetic susceptibility [54][5], have been proven as significant contributors to the disease progression. It is majorly the glomerular damage, besides tubulointerstitial fibrosis, contributing to the pathogenesis of DN [55][6], [56][7]. The current treatment for DN is controlling glucose levels and reducing systemic blood pressure using renin-angiotensin system inhibitors [57][8]. However, even with these therapeutic options, the prevalence of DN kept arising, and innovative treatment strategies were immediately required. Recently, multiple studies have used computational approaches to identify risk factors that could contribute to DN pathogenesis [58][9], [59][10], [60][11], [61][12]. An enormous volume of data, including high throughput omics data and clinical information collected through Electronic Health Records, has been produced because of the tremendous advancements in biotechnology and the health sciences. Application of appropriate computational methods including machine learning and meta-analysis approaches is essential to transform all this information intelligently into useful knowledge. Finding new molecular signatures requires integrating multiple datasets with greater sample sizes that are less prone to systematic biases. These signatures may offer additional insights and treatment options into the etiology and pathogenicity of DN. The hyperglycemic conditions in DM patients stimulate several molecular pathways, and biochemical reactions in diverse kidney cell types. These include activation of protein kinase C [62][13], increasing inflammatory mediators and growth factors [63][14], the elevation of complex glycation end products resulted from nonenzymatic reactions [64][15], and the accumulation of reactive oxygen species (ROS) [65][16]. Nonetheless, a comprehensive understanding of the gene regulation mechanisms that aggravate hyperglycemia-arbitrated biochemical imbalances is less understood. In vitro and in vivo research investigations in mesangial and glomerular cells, podocyte loss and epithelial dysfunction have emphasized several transcription factors (TFs) that contribute to regulating gene expression in diabetic contexts [66][17], [67][18]. For instance, the transcription factor NF-κB is activated by ROS cellular signaling or through hyperglycemia-induced cytokines influencing the glomeruli microinflammation [68][19]. The SMAD proteins, including SMAD1 and SMAD3, are activated in response to transforming growth factor (TGF- β) and glycation end products to maximize the transcriptional activity of mesangial cells through the production of extracellular matrix proteins [69][20], [70][21], [71][22]. Increasing angiotensin II in glomerular mesangial cells and elevated glucose levels activate STAT1 and STAT3, leading to the production of extracellular matrix proteins and manipulating cellular growth [72][23], [73][24]. Transcriptional dysregulation in the DN can occur at several levels, including modifications in upstream cellular signals or the TF itself. In this regard, TFs are the final founder proteins at the site of gene regulation to arbitrate the cellular responses to pathogenic elements in DM. Consequently, TFs may represent attractive therapeutic targets that could avoid many exertions with signal idleness and unwanted signal transduction events often faced in therapies designed to target the upstream signaling pathways [74][24], [75][25]. Despite the advancement in this topic, obstacles in developing TFs as drug molecules are still challenging. For instance, several TFs are directly regulated by synthetic steroid ligands such as estrogen receptor in endocrine treatment for several cancers [76][26] and glucocorticoid receptor in immunosuppression therapy [77][27], [78][28]. However, for some other TFs, direct targeting designs exist, e.g., c-Myc expression targeting and regulation through quadruplex stabilizer [79][29], [80][30], [81][31], disrupting STAT3– or NF-κB–DNA interactions by decoy oligonucleotides [82][32], [83][33], and small interfering RNA or antisense oligonucleotide technology for many other diseases [84][34], [85][35]. Given that, more realistic strategies for targeting TFs are essential. An example of a therapeutic approach for DN is through induction of O-GlcNAcylation of carbohydrate response element (ChRE)-binding protein (ChREBP) to high glucose levels in mesangial cells [86][36]. It is a basic helix–loop–helix/leucine zipper TF produced in several metabolically related tissues, such as pancreatic β-cells, liver, and adipocytes, and acts as a moderator of glucose-responsive gene activation [87][36], [88][37], [89][38]. Following glucose stimulation, ChREBP translocates into the nucleus, thus producing a heterodimer with Max-like protein X (Mlx) that conjugate to the carbohydrate response element (ChRE) for transcriptional regulation of genes related to glucose, fatty acid steroid and lipid metabolism [90][38], [91][39], [92][40], [93][41], [94][42]. Despite the druggability hurdle yet to be solved, progress is being achieved in research targeting the modulation of TFs with possible clinical advantages. To create novel therapies for DN, new research studies and concepts are needed to determine specific TFs and investigate their potential functional role in the disease pathogenesis. This study aims to identify key kidney-specific TFs of DN using publicly available microarray datasets and further prioritize them through functional validation in hyperglycemic zebrafish models. The study also discussed the specific role of each identified TF in developing DN to their correlated functions. 2. Methods 2.1. Dataset selection and curation All microarray datasets used in this study were retrieved and accessed using the Gene Expression Omnibus [95][43] repository on January 25, 2022. The search keywords were limited to “diabetic nephropathy”, and the results were further filtered based on organism (“Human”) and study type (“Expression profiling by array”). The resulting entries (n = 36) were further manually curated based on the following criteria: (1) relevance to DN; (2) availability of data for kidney tissue; (3) classification of study as observational; and (4) availability of baseline comparators, i.e., non-chronic or acute kidney disease tissue. 2.2. Individual dataset pre-processing and quality-check The final curation of microarray datasets (Summarized in [96]Table 1) was processed using a uniform “R” [97][44] pipeline script. Individually selected raw intensity expression (CEL) files obtained based on the criteria mentioned above were imported into R. The background expression level was subtracted, quantile-normalized, median-polished, and log[2]-transformed using the oligo package version 1.58.0 [98][45]. The presence of duplicate samples was screened for using the Dopplegang R package version 1.22.0 [99][46], based on “expression” doppelgängers. If doppelgängers were found, they were manually removed based on further inspection of all information submitted for the suspected samples and the respective GEO series entries. Table 1. List of GEO datasets curated and used for this study. All datasets were screened and obtained from Gene Expression Omnibus (GEO). GEO Series Platform Tissue studied Compared groups Curated samples Outliers (n) Analyzed samples (n) Sample-type breakdown (n of DN/n of Control) [100]GSE1009 [101]GPL8300 (HG_U95Av2) Glomerulus DN; control 6 0 6 6^✧ [102]GSE30122^# [103]GPL571 (HG-U133A_2) Glomerulus; Tubulointerstitium DN; control Glom = 22 Tubu = 22 Glom = 1 Tubu = 1 Glom = 21 Tubu = 21 Glom = 8/13 Tubu = 10/11 [104]GSE96804 [105]GPL17586 (HTA-2_0) Glomerulus EDN; LDN; control 61 0 61 41/20^⁑ [106]GSE99340^# [107]GPL19109 (HG-U133_Plus_2) Glomerulus DN 7 0 0^⁑ 0 [108]GPL19184 (HG-U133A) Glomerulus; Tubulointerstitium DN; control 29 1 14^❋ 15 [109]GSE104948^# [110]GPL22945 (HG-U133_Plus_2) Glomerulus DN; control Glom = 7 Tubu = 22 0 Glom = 0 Tubu = 22 7/18 [111]GPL24120 (HG-U133A) Glomerulus DN; control 8 1 7 4/3 [112]GSE104954 [113]GPL22945 (HG-U133_Plus_2) Tubulointerstitium DN; control 25 2 23 6/17 [114]GPL24120 (HG-U133A) Tubulointerstitium DN; control 13 0 13 10/3 [115]GSE111154 [116]GPL17586 (HTA-2_0) Kidney cortex EDN; control 8 1 7 4/3 [117]Open in a new tab Abbreviations: EDN, early diabetic nephropathy; Glom, Glomerulus; LDN, late diabetic nephropathy; Tubu, Tubulointerstitium. ✧ Two samples had technical replicates. ⁑ EDN and LDN were both categorized as DN due to a lack of annotation details associated with the public dataset, despite an attempt to personally communicate with authors to attain further details. ✻ Glomerulus samples did not contain controls and were thus not analyzed. # Not included in Network inference analysis To determine sample quality scores for use in outlier detection, we computed array quality metrics [118][47] and the IQRray statistic [119][48]. To inform decisions to remove outliers, IQRrays were tested for outlier values using either Rosner’s or Dixon’s test, as implemented in EnvStats [120][49]. In addition, each sample was further inspected based on the quality scores from the ArrayQualityMetrics output and inspected visually based on principal component analysis (PCA) plots and density plots and based on outliers for distances between arrays using the Kolmogorov–Smirnov statistic Ka (boxplot outliers). To functionally annotate probes, the AnnotationDbi package [121][50] was used to assign gene symbol keys to the probes according to the database that matched the platform used for each study. Datasets were further filtered to remove internal control probes and probes that matched to no or multiple gene symbols. Finally, expression values for multiple probes mapping to the same gene were averaged to obtain the mean. 2.3. Differential gene expression analysis and filtering for “core” differentially expressed genes To test for differentially expressed genes (DEGs), we constructed least-squares fitted linear models for the expression data of each gene, and then we computed the log-fold changes in gene expression based on empirical Bayes function using limma [122][51]. This analysis was computed for all individual datasets separated by platform and kidney tissue (glomerulus, tubules, and kidney cortex) and for each of three “combined meta-analysis” datasets, including glomerulus and tubules datasets. The combined meta-analysis dataset samples that passed quality checks were combined into a single data frame, and the expression data were limited to genes that overlapped all individual datasets, which were corrected for batch effects using the ComBat [123][52] function in the SVA package [124][53]. DEGs were corrected for multiple testing at a false discovery rate (FDR) threshold of 5% using the Benjamini–Hochberg method [125][54], and the final list of genes was visualized for among-datasets highest-degree. To screen for “core” DEGs between DN and control samples, we employed a vote-counting [126][55] approach using individual datasets for glomeruli and tubules and the combined meta-analysis dataset. First, statistically significant genes that passed a log[2] fold-change threshold of 1.5 (raw FC ≈ 2.83) were retained; otherwise, they were dropped. This more stringent fold-change cutoff allowed us to isolate a panel of genes that varied widely in their gene expression among datasets and represented biologically meaningful DEGs [127][56]. These genes were further used to pick the topmost representative genes by vote-counting genes that were significant and passed the 1.5 log[2] fold-change threshold. Genes that were vote-counted at least twice in either all the glomeruli or tubule datasets were used, and then the fold-change values from these genes were used to generate a heatmap using pheatmap [128][57]. The “Ward 2″ error sum of squares hierarchical clustering method [129][58] was used ad hoc to create clusters based on dissimilarity patterns across datasets. Finally, to determine the directionality of consistency (i.e., upregulation vs downregulation) for gene expression, the MetaVolcanoR package was used to calculate meta-analysis fold-changes, p-values, and sign consistencies for each of the genes using individual series datasets, calculated as implemented in MetaVolcanoR using the vote-counting and -combining approaches [130][59]. 2.4. Gene function and disease enrichment analyses The DEGs within the resulting heatmap clusters (which we term “aggregates” to prevent confusion with reference to the network analysis clusters generated below) were used to guide subsequent enrichment analyses. Subsequently, gene ontology (GO) and biological pathway (BP) sub ontologies and disease ontology semantic and enrichment (DOSE) terms that were enriched more than would be expected by chance were generated using clusterProfiler [131][60]. The enrichment functions were run using p- and q-value cutoff levels of 0.05, and p-values were adjusted post-hoc using the BH [132][54] method. To visualize the results, heatmaps, gene-concept networks (“cnetplots”), and treeplots were generated within clusterProfiler, and the number of viewable ontologies was adjusted to achieve meaningful visualization of results. 2.5. Network inference analysis Network inference was used to explore and detect novel associations between the DEGs. The network inference algorithm identifies statistical dependencies between gene expression levels as edges. Specifically, we used network inference to infer the interactions between set of genes using the mutual information (MI) [133][61] measure. Statistically significant interactions were selected using a threshold of p < 10^−7 and visualized using a force-directed layout in the software application Cytoscape [134][62]. This layout uses the MI values to determine the structure of the network. Genes with high MI values will appear closer in the network, whereas genes with low MI values will be further apart. Several highly correlated genes will appear as dense clusters in the network. A network clustering tool, MCODE [135][63], was used to find densely connected regions in the network. A protein-protein interaction database (STRING) was further used for functional enrichment of the clusters in the network and in silico validation of the gene interactions. This analysis was performed on a subset of our datasets ([136]GSE30122, [137]GSE99340, [138]GSE104948 were not included in the Network inference analysis). 2.6. Functional validation experiments using Zebrafish model 2.6.1. Animal care declaration Zebrafish (Danio rerio) adults were maintained in a recirculating housing system under standard environmental conditions of temperature at 28 °C, conductivity at 1000 µs, and pH at 7.5 with 14 h light and 10 h dark cycle. All protocols used in these studies were approved by the local Animal Care and Use Committee and conform to the Zebrafish Policy published by the Qatar Ministry of Public Health that following the Guide for the Care and Use of Laboratory Animals published by the National Institutes of Health. Zebrafish experiments were approved by the local Animal Care Committee (Qatar Foundation–EVMC-2020–006). Wild-type zebrafish were maintained as previously described in Da’as et al., 2020 [139][64]. Embryos were obtained from breeding and kept in an E3 solution at 28.5 °C. The zebrafish were divided into groups and raised in E3 media in control 0% glucose conditions or, to induce hyperglycemia in the developing zebrafish model, 2% high glucose conditions. 2.7. Zebrafish care, maintenance, and histological examination Both groups were staged at 3 days old and fixed in 4% paraformaldehyde, then embedded in paraffin and cut into sections at a thickness of 4-µm. The kidney sections were stained with hematoxylin and eosin (H&E) as described previously [140][64]. Representative images were scanned using Phillips Ultra versatile scanner L60, at 40X magnification, scale bar, 50 µm. 2.8. Gene expression analysis of zebrafish Total RNA was extracted from whole-zebrafish larvae using RNeasy mini kit (Cat. No. 74104, Qiagen, LLC, USA) according to the manufacturer’s instructions. Reverse transcription was achieved using Superscript IV (Cat. No. 18091050, Invitrogen, USA) as previously described [141][64]. Real-time quantitative PCR (qPCR) was performed in 20 μL reaction volumes using the listed primer sets ([142]Table 2), and the qPCR reaction mixture contained 1 µL of cDNA, 4 µL of H2O, 1 µL of forward primer (10 µM), 1 µL of reverse primer, and 10 µL of PowerUp™ SYBR™ Green Master Mix (Cat. No. A25741). The qPCR reaction was carried out in a QuantStudio™ 12 K Flex Real-Time PCR System in a three stages standard curve experiment: (1) hold stage of 50ºC for 2 min, followed by 95ºC for 10 min; (2) fourty-cycle PCR stage of 95ºC for 15 s, followed by 60ºC for 1 min; (3) melt curve stage 95ºC for 95 s, 60ºC for 1 min, 95ºC for 15 s Table 2. List of target genes and primers used to target zebrafish genes in the study. Target gene 5′ to 3′ Direction Primer sequence dachd Forward TTCTCAAACACCTGGTCGGG dachd Reverse CTGCTCCACGTTACACACCA lmx1ba Forward CGCGCTGTTTTGTCTTGTCT lmx1ba Reverse CACTGACTCTCTCCAGACGC lmx1bb Forward GACTTGTGCACGGTGTAGGA lmx1bb Reverse ACGTCCGAGACGTTTTCTCC wt1a Forward TGCCTTTACCGTGCACTTCT wt1a Reverse GCTGAACATCCTTGGTTGGC wt1b Forward CGATGGGATCGGATGTCAGAG wt1b Reverse GAGGCAGGGTGAAAGTCCA eefa1 Forward GGGCCCATGACATGACTACA eefa1 Reverse AGGTGACCAGTCGTAGTGGA [143]Open in a new tab 2.9. Statistical analysis For expression analysis, the relative change in gene expression was quantified from the qPCR results using the Livak [144][65] method. The CT values for each gene in each experimental condition were assessed using 11–12 replicates in each group and were normalized to an internal control gene (eefa1). The relative expression was calibrated against the mean expression of control samples, and significant differential expression between groups was estimated using Wilcoxon rank-sum test. The data were analyzed for differential expression using custom scripts and the PCR [145][66] package in R. 3. Results 3.1. Identification of key differentially expressed genes In response to our GEO query, 62 DN glomeruli and 57 corresponding control samples, 36 tubulointerstitial and 35 corresponding control samples, and 4 DN kidney cortex and 3 corresponding control samples were selected for meta-analysis ([146]Table 1). Considerable batch effects were observed across the preprocessed study samples, as illustrated in ([147]Fig. 1A). The PCA plot of RMA/log[2]-normalized and gene-level averaged expression across all datasets indicated that platform-introduced experimental batch effects (represented by platform colors) largely contributed to the differences compared to sample origin and disease status. Using the empirical Bayes-based method ComBat [148][52], the undesirable batch effects on gene expression were minimized, as shown in ([149]Fig. 1B). However, one dataset ([150]GSE1009) remained separated from the other datasets. After batch effects were corrected, the amount of variance described by the first two principal components that explained the most variance also decreased, showing that the study samples were more homogeneous and that the bias resulting from uncontrolled cross-study experimental circumstances was less. Differential gene expression using least-squares fitted linear models (see Methods) was independently computed for each batch-corrected dataset. In general, the number of unique DEGs within each dataset was proportional to the total number of DEGs in the dataset ([151]Fig. 2). For example, the dataset series [152]GSE96804, which contained the highest number of DEGs (10,116), also contained the highest unique number of DEGs (5424) not found in other datasets. The same dataset was also one of the top five in the list of 10 datasets with DEGs overlapping with other datasets. Furthermore, fewer DEGs were found within the tubulointerstitial datasets than the glomerulus datasets. Finally, no DEGs were found between the DN and controls for the kidney cortical tissue samples ([153]GSE111154) and one tubule dataset (GSE104954_GPL24120). Fig. 1. [154]Fig. 1 [155]Open in a new tab PCA of control and DN human kidney tissue datasets. Shown are the first two most variance-explaining principal components (PC) of RMA/log2-normalized and gene-level-averaged gene expression across all datasets (A) before or (B) after adjusting for batch-effects. Fig. 2. [156]Fig. 2 [157]Open in a new tab UpSetR plot of unique and shared DEGs among individual datasets. The number of total DEGs in each target glomerulus (A) and tubules (B) dataset is shown on the left side of the figure, and the left side of the figure shows the number of DEGs unique to each study, followed by the top 10 shared gene sets. Directionality of gene expression for unique genes is represented by red for upregulated genes and blue for downregulated genes. We next used a vote-counting consensus approach to identify a subset of core DEGs. The dual approach using vote counts and meta-analysis fold changes allowed us to gain insight into concordances between the two methods and combine them to their fullest potential. Genes found in at least a third of the datasets and common to both glomerular and tubulointerstitial samples were retained for further pathway analysis. In total, 164 genes were identified as core DEGs between DN and control samples for either the glomerulus or tubule datasets. Out of the 164 core DEGs presented in the heatmap ([158]Fig. 3), 147 were found to be DEGs in the glomerulus compartment ([159]Table S1), compared to only 35 in the tubulointerstitium ([160]Table S1). We broadly identified two aggregates of downregulated (Aggregate 1, n = 88 genes) and upregulated (Aggregate 2, n = 75 genes) sets of genes from the heatmap. These were used to conduct pathway enrichment analysis based on the consistency in the directionality of the gene expression changes [161][67]. Fig. 3. [162]Fig. 3 [163]Open in a new tab Heatmap of core DEGs human DN kidney tissue samples compared with healthy controls. To simplify visualization, the groups branching from the first hierarchical cluster dendrogram were split and situated side-by-side, labeled “Aggregate 1″ and “Aggregate 2″. The vote counts represent the number of times a gene was significantly down- or up-regulated (Log[2] FC = 1.5, FDR<0.05). 3.2. Signature pathways in glomerular and tubular datasets The core downregulated DEGs in Aggregate 1 were found to be enriched for kidney-tissue development pathways ([164]Fig. 4A). In contrast, the core upregulated DEGs in Aggregate 2 were heavily enriched for regulation (including negative regulation) of metabolic functions and humoral immune responses, with specific genes such as LTF, SLPI, and C3 connecting both biological themes ([165]Fig. 4B). In addition, kidney-related diseases and conditions were more highly enriched in the Aggregate 1 than the Aggregate 2 gene set. DEGs found within both clusters were also enriched for non-kidney related diseases, such as blood disorders ([166]Fig. 5A) and respiratory illnesses ([167]Fig. 5B). Fig. 4. [168]Fig. 4 [169]Open in a new tab Gene-concept network plot comparing the functional profiles of DEGs in human DN kidney tissue samples compared with healthy controls. Shown are the top 8 gene-ontology terms assigned to genes from the first/downregulated (A) and second/upregulated (B) hierarchical cluster groups produced in [170]Fig. 3. Log[2] fold-changes were based on meta-analysis datasets using glomerulus or tubule datasets. Grey fold-change values indicate lack of data for meta-analysis dataset. Abbreviations: FC, fold change; GO, gene ontology. Fig. 5. [171]Fig. 5 [172]Open in a new tab Tree plot of hierarchical clustering groups of disease ontology semantic and enrichment analysis terms for core DEGs. Each tree was created using the core DEGs defined and identified in the meta-analysis from Aggregate 1 (top) and Aggregate 2 (bottom). 3.3. Gene network inference We next looked at the interaction network model computed using the MI algorithm [173][61] and explored the top MI values between the gene pairs. The force-driven layout of the network showed intriguing associations between the clusters in the network. The STRING protein-protein interaction database (https://string-db.org/) was used in the annotation of these clusters based on the most prominent biological functions ([174]Fig. 6). Four of the densest clusters in the network were enriched for podocyte function, immune system processes, mTOR signaling, cytokine receptor interactions, mitochondrial dysfunction, osteoclast differentiation, and MAPK signaling pathways. After assessing the sizes of the clusters, we focused on the podocyte differentiation cluster, which also consisted of high MI values (see [175]Table 1). Fig. 6. [176]Fig. 6 [177]Open in a new tab Overall network model using the most significant MI scores visualized by means of a force-directed layout, revealing several clusters in the network. The most prominent biological functions were used to annotate the clusters. Nodes in the network represent the genes, and edges (lines) represent the correlations, measured as mutual information values. Each node border color represents the fold-change in the gene expression in DN conditions. The node size represents the connection to other genes (hubs), and the thickness of the edges represents the strength of the association between genes. 3.4. Podocyte differentiation Using network inference analysis, we found a sub-network cluster enriched for ribosomal activity and podocyte function, which was built from several hub genes and TFs known to play a role in podocyte differentiation ([178]Fig. 7A). TF genes DACH1 and LMX1B, which our DEG analysis showed to be significantly downregulated in DN, are reportedly critical to podocyte differentiation [179][68], [180][69]. These were connected to many other genes in the identified sub-network cluster and appeared as network hubs. The other hub genes in this cluster were NPHS1, TYRO3, VEGFA, PTPRO, PODXL, and FGF1, all of which were enriched in renal system development pathways ([181]Fig. 3, aggregate 1). The expression fold changes for nearly all genes in this cluster was suppressed in DN compared to the controls. NPHS1 showed interactions with 40 other genes in the network. STRING network analysis using the same gene sets supported several of these interactions and revealed VEGFA, NPHS1, and WT1 as potential interaction partners. The protein-protein interaction network comprised of 44 nodes and 49 edges at an average node degree of 2.23 with a highly significant interaction enrichment (p-value = 5.41e-11), indicating that these proteins form at least a partially connected biological group. Fig. 7. [182]Fig. 7 [183]Open in a new tab Network clustering of podocyte-differentiation-related differentially expressed genes in human DN kidney tissue compared with healthy controls. The size of each circle is proportional to the number of gene interactions, and the thickness of the edges represents mutual information between genes (A). Network cluster enriched for podocyte function with TFs in bold text. The interactions between the genes were also confirmed by STRING network (B). 3.5. Zebrafish functional validation The TF genes DACH1, WT1, and LMX1B, which were downregulated in DN based on our DEG analysis, are master regulators of gene expression in podocyte differentiation. Additionally, they were identified as hubs with multiple potential interaction partners using our MI network inference model. To elucidate their gene regulatory functions, these three TF genes and their orthologs, DACH1 (dachd), LMX1B (lmx1ba and lmx1bb), and WT1 (wt1a and wt1b), were selected for additional functional validation in hyperglycemic zebrafish models. The incubation of developing zebrafish embryos in 2% glucose resulted in increased total free-glucose levels in the whole-zebrafish homogenates. The average total free-glucose levels in the 3-day-old zebrafish group incubated in E3 media with 2% glucose measured 69.6 mg/dL compared to 43 mg/dL glucose levels in the control zebrafish group incubated in 0% glucose. Renal development in the zebrafish model was then evaluated. In accordance with the previously established hyperglycemia model [184][64], alterations to the developing renal system were noticed in the zebrafish incubated in 2% glucose conditions. Indeed, hyperglycemia caused noticeable tubular morphological changes to the glomeruli, pronephric tubules, and proximal and distal ducts, as evident in the examined histological sections, that were associated with the increase in total glucose levels ([185]Fig. 8). The hyperglycemia-induced zebrafish model showed an abnormally distorted pronephros: the pronephric tubules were reduced in size at the proximal and distal pronephric ducts at 2% glucose condition when compared to the control group at 0% glucose. Fig. 8. [186]Fig. 8 [187]Open in a new tab Hyperglycemia-induced pronephric abnormalities in zebrafish model. Representative micrographs from the level of the glomerulus ([188]1, [189]2, [190]3), proximal pronephric duct ([191]Sections 4–5), and distal pronephric duct (Section 6) are shown. Shapes indicate the location of the glomerulus (GLM, green), pronephric tubule (PT, red), and pronephric duct (PD, blue). Abbreviations: ZF, zebrafish. We next examined the effects of hyperglycemic conditions on the expression of DACH1A (zebrafish ortholog: dachd), LMX1B (zebrafish orthologs: lmx1ba and lmx1bb), and WT1A (zebrafish orthologs: wt1a and wt1b). DACH1 and WT1 were prominent in the DEG and MI analyses, and LMX1B was important in the MI analysis and confirmed by STRING network ([192]Fig. 3, [193]Fig. 7). The evaluation of expression profiles of the key TF genes in the hyperglycemia-induced model demonstrated that, with increasing hyperglycemia, there was a significant reduction in dach1 and lmx1b expression levels and a significant increase in lmx1a, wt1a, and wt1b orthologs ([194]Fig. 9). Fig. 9. [195]Fig. 9 [196]Open in a new tab Expression of key TF genes in hyperglycemia-induced zebrafish model. Expression levels of key TF genes (A) dach1a, (B) lmx1ba, (C) lmx1bb, (D) wt1a and (E) wt1b were determined using quantitative real-time PCR and normalized to the internal control gene (eefa1), and relative expression was calibrated using the mean expression of control samples. Data are expressed as mean± SEM. Groups were 0% glucose and 2% glucose conditions. 4. Discussion High-throughput transcriptomics and analysis methodologies have made it possible to explore the mechanisms of complex diseases in detail. Finding the gene regulatory pathways, which are important and potentially causal to the phenotypic changes or pathogenesis of a disease and the DEGs, is challenging but worthwhile objective. In this study, we found distinct clusters of genes that were dysregulated in a broadly renal compartment-specific pattern using nine individual datasets of human DN and healthy control glomeruli and tubulointerstitial tissues. Specifically, most downregulated core DEGs defined within this study were strongly downregulated in the glomerulus compartment. In contrast, the core DEGs showed more pronounced upregulation in the tubule datasets. These patterns are in line with previous reports of gene-expression downregulation in glomerulus samples [197][70], [198][71], [199][72] and upregulation in tubule samples in DKD [200][71]. Using network models, we identified a small but a closely connected cluster of genes enriched for podocyte differentiation, several of which were identified as genes encoding critical TFs involved in podocyte function. Podocyte damage is one of the hallmarks of DN and leads to proteinuria and kidney damage [201][73], [202][74]. We selected a group of three TFs for functional validation in the hyperglycemic zebrafish model based on the combined outcomes from DEG and MI network analyses and their importance for podocyte differentiation. Hyperglycemia resulted in altered zebrafish pronephric development and was associated with altered expression levels of dachd, lmx1b, and wt1. Moreover, the occurrence of abnormal pronephric development adds to the substantial evidence that the key TF genes LMX1B, DACH1, and WT1 are associated with hyperglycemia-induced nephropathy. Firstly, DACH1 was consistently downregulated across all our analyzed datasets, presenting as a core downregulated DEG in human kidney tissue and a downregulated gene (dachd) in the hyperglycemic zebrafish model, consistent with its downregulation in human nephrectomy samples [203][68] and its knockdown effect (using dachd ortholog) on compromised podocyte function in zebrafish [204][75]. DACH1 expression is localized to the glomerular podocytes and epithelium of convoluted tubules during kidney development [205][76] and the nuclei of glomerular and distal tubular epithelial cells in adult kidneys, but not to the renal interstitium of normal tissue [206][77]. DACH1 functions as a TF and a key cell-fate determinant [207][78] which is critical for proper kidney development and function [208][79]. For example, it was demonstrated to protect podocytes (by podocyte-specific augmentation, [209][68]) and tubules [210][68] during experimentally induced kidney injury. Moreover, global knockouts of Dach1 in mice lead to early postnatal death [211][68], [212][80], and kidney hypoplasia suggesting global Dach1-KO mice die from renal failure [213][68]. Immunohistochemical staining of DACH1 proteins in human renal biopsies of immunoglobulin A nephropathy (IgAN) and idiopathic membranous nephropathy (IMN) patients show significantly decreased eGFR with declining DACH1, and significantly increased urinary protein in IMN with increasing DACH1 [214][77]. Although the biological mechanism behind DACH1 involvement in adult kidney tissue remains a mystery, the application of DACH1 as a predictor for clinical parameters [215][77] in CDKs such as IgAN and IMN highlights its potential as a key molecule in DN and lays the groundwork for future investigations into its mechanistic role in disease progression. LMX1B was identified as a second core DEG that was downregulated in the meta-analysis datasets for glomeruli and tubules. Similarly, WT1 (a core DEG) was consistently downregulated in glomeruli and in the majority of tubule datasets. The roles of WT1 and LMX1B in organ development and renal pathophysiology are well established. Multiple studies have shown that WT1 is involved in the development of the kidneys, neurons, and genitourinary tissue [216][81], and LMX1B is similarly involved in the development of kidneys [217][82], neurons [218][83], [219][84], and limbs [220][82], [221][85]. WT1 has been implicated as an essential regulator of podocyte fate [222][86], and it also functions in the activation of the downstream target LMX1B [223][69], [224][84]. Like WT1, LMX1B also plays a significant role in podocyte differentiation and function. It has been reported that the loss of LMX1B function results in hypoplastic kidneys that lack mature glomeruli or nephrons [225][87]. Indeed, in Lmx1b-knockout mouse models, kidney size diminishes, and glomeruli do not fully differentiate [226][69]. In addition to the poor kidney morphology outcomes related to the expression of TFs described above, we found the strong enrichment of pathways related to other nephropathic complications and multiple renal-system development pathways. Specifically, we found evidence for the downregulation of several key experimentally validated genes in DN, kidney development, and podocyte function in the glomeruli, such as FOXC1 [227][88], MAGI2 [228][89], PTPRO [229][90], NPHS1 [230][91], and NPHS2 [231][92]. More specifically, TYRO3 was included within the critical hub genes in the network analysis and was generally downregulated in our meta-analysis. TYRO3 is a protein expressed in podocytes and reported to play a critical role in maintaining podocyte function, especially in DN [232][93]. It has also been reported that TYRO3 mRNA expression decreases with the progression of the disease, and it may be a potential drug target [233][94]. The disease and gene-function enrichment analyses provided evidence of the association between DN and signature complications of CKD and other poor systemic outcomes, such as anemia, proteinuria, and humoral immune responses. Although DN is considered a non-inflammation sub-type [234][95] of nephrotic syndrome [235][96], we found there was sporadic but highly elevated expression of immune activation in gene-function aggregate 2. This aggregate included genes of notable significance to diabetic nephropathy, specifically in the tubulointerstitial compartment for CXCL6 [236][97], [237][98], and for the key DN target C3, which was recently identified in a bioinformatic analysis [238][99] and was shown to be involved in complement-system mediated glomerular and tubular injury using the ZS rat model [239][100]. Although the relevance of CXCL6 has been validated in in vitro using a human podocyte cell line [240][98] and in kidney biopsies and resections [241][97], the role of C3 in remains poorly characterized in kidney tissue from patients with diabetic nephropathy. Overall, the enrichment of pathways relating to immune function supports the pathophysiology re-defining role of inflammation in diabetic nephropathy [242][95]. This study has potential limitations, which are highlighted by the presence of unexpected results. Although we expected all orthologs to be downregulated in our qPCR assay in agreement with our meta-analysis, we found unexpected upregulation in the zebrafish hyperglycemic validation model for both WT1 orthologs and one of the LMX1B orthologs (lmx1ba). The experimental conditions we used (2% glucose treatment for 3 days) may not have been stringent enough to induce the DN pathology experienced by patients included in the datasets. Additionally, we chose to examine gene expression of the key TFs using whole-larvae homogenates—as opposed to pronephros tissue— as it was logistically more feasible. Using whole larvae for RNA extraction could have skewed the results on our target genes, as they are involved in many developmental pathways that could last for up to 6 weeks after embryonic development in zebrafish [243][101], during which time the proteins encoded by these genes are expected to be highly abundant. To address some of the difficulties raised here, experimental designs utilizing organ-specific assays, together with immunohistochemistry or in situ hybridization labeling for the major DEGs, would be necessary. Future studies should therefore employ more focused experimental setups and targeted expression assays in zebrafish models in order to further validate the role of key genes in CKD. Finally, by providing gene-expression-supported evidence for the over-representation of genes related to immune responses, renal system development, and diabetic/CKD development and validating key-TFs with podocyte-specific functions, our study has highlighted the systemic nature of diabetic nephropathy as a chronic disease [244][102]. By utilizing multiple independent datasets and leveraging the power of meta-analysis, we could highlight DN driving key TFs that are typically hidden amid a sizable number of DEGs in any given study. As a result, this work identifies and offers a confined set of biomarkers that may have predictive and therapeutic implications for diabetic nephropathy. Declarations Ethics approvals and consent to participate * • IACUC ethical approval: Institutional Animal Care and Use Committee (IACUC). Protocol Number: EVMC-2020–006 * • IACUC ethical approval: 2020–1132, Sidra Medicine Zebrafish Facility approval date: 06/15/2020, Expiry date: 06/15/2023. Funding This work is supported by Sidra Medicine Research Branch, Qatar, grant number SDR100002. Author AAA is the project lead principal investigator, budget holder, and the recipient of the research funding support. Consent for publication All authors have approved the manuscript and give their consent for submission and publication. Data Sharing Statement Data used are publicly available. CRediT authorship contribution statement Ikhlak Ahmed, and Mubarak Ziab chose and examined the final datasets needed to construct the manuscript, writing, visualizing, and editing the final manuscript. Sahar Da'as, designing the zebrafish model and tests, collecting data, interpreting and writing the outcomes of the research, and visualizing the results. Waseem Hasan, zebrafish experiments and evaluation. Elbay Aliyev data analysis and dataset search. Sujitha P. Jeya molecular biology experiments linked to zebrafish validation, and initial dataset search. Sabah Nisar, the final manuscript version was editing and illustration. Ajaz A. Bhat experimental lab work in connection with the validation studies. Khalid Fakhro critical final revision and editing. Ammira S. Alshabeeb Akil, budget holder, study design, lab staff supervision, manuscript writing and revision, obtained the final approval of all authors, and produced the final manuscript draft. Competing interests The authors have no relevant financial or non-financial interests to disclose. Acknowledgments