Abstract

   Globally, nearly 40 percent of all diabetic patients develop serious
   diabetic kidney disease (DKD). The identification of the potential
   early-stage biomarkers and elucidation of their underlying molecular
   mechanisms in DKD are required. In this study, we performed integrated
   bioinformatics analysis on the expression profiles [31]GSE111154,
   [32]GSE30528 and [33]GSE30529 associated with early diabetic
   nephropathy (EDN), glomerular DKD (GDKD) and tubular DKD (TDKD),
   respectively. A total of 1,241, 318 and 280 differentially expressed
   genes (DEGs) were identified for [34]GSE30258, [35]GSE30529, and
   [36]GSE111154 respectively. Subsequently, 280 upregulated and 27
   downregulated DEGs shared between the three GSE datasets were
   identified. Further analysis of the gene expression levels conducted on
   the hub genes revealed SPARC (Secreted Protein Acidic And Cysteine
   Rich), POSTN (periostin), LUM (Lumican), KNG1 (Kininogen 1), FN1
   (Fibronectin 1), VCAN (Versican) and PTPRO (Protein Tyrosine
   Phosphatase Receptor Type O) having potential roles in DKD progression.
   FN1, LUM and VCAN were identified as upregulated genes for GDKD whereas
   the downregulation of PTPRO was associated with all three diseases.
   Both POSTN and SPARC were identified as the overexpressed putative
   biomarkers whereas KNG1 was found as downregulated in TDKD.
   Additionally, we also identified two drugs, namely pidorubicine, a
   topoisomerase inhibitor (LINCS ID- BRD-K04548931) and Polo-like kinase
   inhibitor (LINCS ID- BRD-K41652870) having the validated role in
   reversing the differential gene expression patterns observed in the
   three GSE datasets used. Collectively, this study aids in the
   understanding of the molecular drivers, critical genes and pathways
   that underlie DKD initiation and progression.

   Keywords: diabetic nephropathy, gene ontology, prognosis, biomarkers,
   differential gene expressions, diabetic kidney disease

Introduction

   Diabetes is a major global health crisis projected to affect 642
   million people by 2040 ([37]1). According to the statistics from the
   World Health Organization (WHO), the rapid rise in the population
   prevalence in low- and middle- income countries led to the dramatic
   rise in the number of diabetic patients from 108 million in 1980 to 422
   million in 2014. Unfortunately, nearly 40 percent of all diabetic
   patients develop serious diabetic kidney disease (DKD) ([38]2). The
   dangerous liaison between diabetes and DKD is well-established.
   Currently, diabetes is one of the leading causes of both end-stage
   renal disease (ESRD) and chronic kidney disease (CKD) ([39]3, [40]4).
   According to one study, the mortality due to the DKD rose by 94 percent
   between 1990 and 2012 ([41]5).

   In 2015, the Global Burden of Disease (GBD) assessed 1.2 million deaths
   and 19 million disability-adjusted life-years attributed to the hostile
   ailments of diabetes directly related to the reduced glomerular
   filtration rates ([42]6, [43]7). According to the data from the WHO,
   nearly 100 million Chinese people are currently suffering from diabetes
   ([44]8). Several studies reported the prevalence and risk of CKD and
   DKD in Chinese residents. Liu et al. conducted a cross-sectional survey
   including 23,869 clinical samples to measure the prevalences and risk
   factors associated with CKD and DKD in a Chinese rural population
   ([45]9). In participants with diabetes, the overall prevalence of CKD
   was more than 35 percent. The Chinese cohort study conducted from 2003
   through 2015 implicated a higher mortality rate and shorter life
   expectancy in the patients suffering from DKD. Taken together, there is
   an urgent need to develop the methods for early detecting the
   progression of kidney diseases in diabetic patients.

   Based on anatomical distinction between the different sections of human
   kidney, the DKD which is also commonly known as diabetic nephropathy
   (DN) are subdivided into glomerular or tubular ([46]10). Henceforth,
   the diseases associated with the glomerular and tubular region will be
   called glomerular diabetic kidney diseases (GDKDs) and tubular kidney
   diseases (TDKDs). The GDKDs are characterized by reduced glomerular
   filtration capacity, and the deposition of immune leading into the
   damage of the glomerular basement membrane ([47]11) whereas the TDKDs
   are characterized by the dysfunctions of transporters and channels in
   the renal tubular system, and the destructions of the normal tubular
   tissues ([48]12).

   The diagnosis of the DKD has posed a challenge due to the complicated
   etiopathology and comorbidity associated with other related diseases.
   Identification of both prognostic and diagnostics biomarkers to monitor
   the initiation and progression of DKD has been recommended ([49]13).
   Several diagnostic biomarkers mainly identified by monitoring the
   morphological changes in diabetic kidney were reported. The marker of
   tubular damage such as neutrophil gelatinase-associated lipocalin
   (NGAL), alpha-1-microglobulin, kidney injury molecule-1 (KIM-1),
   N-acetyl-beta-D-glucosaminidase, angiotensinogen, uromodulin,
   liver-type fatty acid-binding protein (L-FABP), and transferrin were
   also reported ([50]14). However, the prognosis of the disease is still
   poorly understood. To this end, gene expression profiling can aid
   tremendous value in identifying the putative prognosis biomarkers.

   The integrated use of differential gene expression (DEG) analysis and
   bioinformatics approaches have been advocated in early detection and
   monitoring of disease progression. It has also been used in identifying
   the chemical perturbagens capable of intervening the critical molecular
   pathways in diseased conditions. For example, the genomics and
   transcriptomics data profiling has helped in elucidating the
   pathogenesis of diabetes ([51]15–[52]17) and in identifying potential
   new mechanisms in DN ([53]18, [54]19). However, there are only a few
   studies that shed light on the potential pathogenesis mechanism and
   cross-talks between GDKD, TDKD and early diabetic nephropathy (EDN).
   For example, Woroniecka et al. conducted the transcriptomics profiling
   of GDKD and TDKD on human kidney samples ([55]20). The integration of
   multiple genomics data and their reanalysis offers a great avenue to
   identify certain disease-related biomarkers and help overcome the
   pitfalls of single cohort study, poor reproducibility and consistency
   ([56]21).

   In the current study, we analyzed the gene expression profiles of EDN
   ([57]GSE111154) ([58]22), GDKD ([59]GSE30528) ([60]20) and TDKD
   ([61]GSE30529) ([62]20) accessed from the NCBI‐Gene Expression Omnibus
   database (NCBI‐GEO). We identified the differentially expressed genes
   (DEGs) and further examined the molecular factors that drive the DKD
   pathogenesis using an integrated bioinformatics approach. We further
   conducted the drug perturbation analysis using the LINCS L1000 data
   repository ([63]23) to study the effect of small-molecules and
   identified the potential drug candidates which have the ability to
   induce and reverse the transcription profile signatures.

Materials and Methods

Dataset and Identification of DEGs

   The high throughput gene expression profile datasets of EDN
   ([64]GSE111154) ([65]22), GDKD ([66]GSE30528) and TDKD ([67]GSE30529)
   ([68]20) were obtained from NCBI-GEO ([69]24). The datasets
   [70]GSE30528 and [71]GSE30529 were the subseries of [72]GSE30122 and
   based on the [73]GPL571 platform- HG-U133A_2 Human Genome Affymetrix
   array. [74]GSE111154 was based on the platform [75]GPL17586- [HTA-2_0]
   Affymetrix Human Transcriptome Array 2.0. The GDKD dataset had 9
   samples and 13 samples collected from the glomerular tissues of DKD and
   normal patients, respectively. The TDKD dataset had 10 samples and 12
   samples collected from tubular tissues of DKD and normal patients,
   respectively. The EDN dataset consisted of 4 clinical samples of the
   early DN and 4 non-diabetic control samples obtained from the kidney
   tissues and blood. The GEO2R module of the bioconductor package was
   used to identify DEG. The datasets were standardized, normalized by TPM
   (Transcripts Per Kilobase Million) and log transformed to identify DEGs
   with respect to the diseased vs control samples. In this study, the
   genes with log2FC > 1.5 and P value < 0.05 were defined as DEGs
   ([76]25, [77]26). The datasets were examined both individually and in
   groups to identify the genes unique to the respective diseases. Volcano
   plots and heatmap clusters of the DEGs were plotted using the ggplot
   modules of R package 3.5.8 and R studio 1.4.1106 ([78]27).

Gene Ontology and Pathway Enrichment Analysis

   The functions of the candidate DEGs were annotated using the DAVID gene
   annotation tool ([79]https://david-d.ncifcrf.gov/) ([80]28). The
   annotations were evaluated for three sub-ontologies viz. biological
   process (BP), molecular functions (MF) and cellular components (CC).
   The pathways enrichment analysis was performed using the Kyoto
   Encyclopedia of Genes and Genomes (KEGG) ([81]29, [82]30) and REACTOME
   ([83]31) databases. The interrelation analysis of the molecular
   pathways were conducted using the ClueGo (version 2.5.7) ([84]32)
   implemented in Cytoscape software (version 3.8.2) ([85]33). In this
   work, we have plotted the clusters based on the Benjamini Hochberg
   statistical significance including enrichment/depletion hypergeometric
   tests and a kappa score cutoff of 3 ([86]21).

Protein-Protein Interaction (PPI) Network Establishment and Modular Analysis

   The DEG-encoded proteins and their interactions amongst each other were
   established using the STRING database ([87]http://stringdb.org/)
   ([88]34). The PPI-network was created and evaluated using the Cytoscape
   software. Additionally, the significant interactions within the
   PPI-network were screened using MCODE plugin ([89]35) implemented in
   Cytoscape software. The selections were made based on the highest MCODE
   score, degrees and the number of nodes. In this work, the significant
   clusters were selected based on the network scoring degree cutoff of 2.
   The parameters to find the clusters were optimized to the haircut
   option with K- score cutoff of 2 and node score cutoff of 0.2.

Validation of Hub Genes

   The expression levels of the hub genes in various tissues were
   investigated using the expression atlas platform
   ([90]https://www.ebi.ac.uk/gxa/home) ([91]36). The presence and absence
   of the genes were considered as the biomarker for the kidney diseases
   based on the standard expression levels of the genes in the renal and
   associated systems. The protein expression levels of the hub genes were
   also identified from the protein atlas database
   ([92]www.proteinatlas.org). The tissue pathology, transcript expression
   of hub genes as identified from Genotype tissue Expression dataset
   (GTEx) and their cell specific protein expression from human proteome
   atlas (HPA) in normal cells of glomeruli, and tubules were predicted.

Confirmation of Hub Genes

   The hub genes identified needed confirmation and hence a single-cell
   transcriptome data of early diabetic nephropathy ([93]37) from
   Humphreys Lab ([94]http://humphreyslab.com/SingleCell/) was analyzed
   with the data visualization tool. The dataset with GEO accession
   [95]GSE131882 had an integrated single-nuclear (sn) RNA data that
   included 3 control and 3 early diabetes samples from podocytes (PODO),
   proximal convoluted tubules (PCT), cells of the loop of Henle (LOH),
   connecting tubule cells (CNT), distal convoluted duct cells (DCT),
   principal cells of collecting duct (CTP), glomerular endothelial cells
   (EDC), intercalated cell B from collecting duct (CTB), intercalated
   cell A from the collecting duct (CTA), parietal epithelial cells (PEC),
   messenger cells and immune cells ([96]38). The hub genes were compared
   to the snRNA dataset to confirm their potential effect. Their
   expression in diabetic nephropathy vs healthy human donor was predicted
   through Nephroseq v5 tool ([97]https://nephroseq.org). The cut off
   thresholds were p value of 0.05, and log2FC of 1.5 to filter out the
   expression patterns of the putative candidate biomarker genes.

Drug Perturbation Analysis

   We have conducted the transcriptomic signature similarity search on the
   identified DEGs using L1000FWD web server
   ([98]http://www.lincsproject.org/LINCS/dmoa). The potential drug
   candidates which mimicked and reversed the gene expression signatures
   were ranked based on the similarity score close to 1 and -1 ([99]39).

Results

Identification of DEGs in DKD

   We obtained the gene expression profiles of diseased and controlled
   samples of GDKD, TDKD and EDN from the [100]GSE30528, [101]GSE30529 and
   [102]GSE111154 datasets and identified the DEGs using the GEO2R
   package. Upon setting the cut-off criterion as log2FC > 1.5 and P <
   0.05, we identified 1,241, 318 and 280 DEGs from [103]GSE30528,
   [104]GSE30529 and [105]GSE111154, respectively. The overlapping DEGs
   between the three datasets were also identified ([106]Figure 1A) The
   volcano plots of the DEGs of each dataset are shown in [107]Figure 1B.
   The heat map of the DEGs based on Hierarchical clustering between
   patients with DKD and healthy controls are shown in [108]Figure 1C. We
   have identified 44 upregulated and 23 downregulated genes overlapped
   between the EDN and GDKD, 188 upregulated and 5 downregulated genes
   were found overlapped between the GDKD and TDKD, and 52 upregulated
   genes were found commonly between the EDN and TDKD. Interestingly, two
   upregulated genes SULF1 and DCN, and one downregulated gene PTPRO were
   observed as common to the three datasets ([109]Figure 1A). The overall
   statistics are shown in [110]Table 1.

Figure 1.

   [111]Figure 1
   [112]Open in a new tab

   Identification of differentially expressed genes (DEGs) in the three
   datasets (EDN: [113]GSE111154, GDKD: [114]GSE30528 and TDKD:
   [115]GSE30529). (A) Venn diagrams representing commonly changed DEGs in
   the three datasets. (B) Respective volcano plot of the three datasets.
   Red and blue plots represent up- and downregulated genes, respectively
   (log2FC > 1.5 and P value < 0.05). Black plots represent the remaining
   genes with no significant difference. (C) Heatmap of the top 500 DEGs
   are shown.

Table 1.

   Identification of the commonly changed DEGs in the three datasets.
   Group               EDN vs GDKD GDKD vs TDKD EDN vs TDKD EDN vs GDKD vs TDKD
   Upregulated genes       186          42          50               2
   Downregulated genes     22           4            0               1
   [116]Open in a new tab

GO Term Enrichment Analysis of DEGs

   Gene ontology enrichment is significant to elucidate the mechanisms of
   differentially expressed genes and we performed DEG GO analysis using
   the DAVID ([117]https://david-d.ncifcrf.gov/) gene annotation tool.
   Three sub-ontologies viz. BP, MF and CC were examined. The DEGs of the
   respective datasets were examined and based on the P- value, the top 10
   best ranked gene annotations were considered for further analysis
   ([118]40). The GO annotations for the up- and downregulated genes are
   shown in [119]Supplementary Figures 1 and [120]2, respectively. As
   shown in [121]Supplementary Figure 1 illustrating the BP term, the
   upregulated genes were mainly enriched in extracellular matrix
   organization, regulation of cell migration, response to cytokine
   stimulation, regulation of coagulation and glomerular epithelial cell
   differentiation; downregulated genes were largely enriched in
   regulation of glomerular filtration, chemotaxis and cellular response
   to vascular endothelial growth factors. For MF, enrichment of
   upregulated genes were mainly involved in the metalloendopeptidase
   inhibitor activity, tau protein binding, integrin binding and
   prostaglandin E receptor activity, and that of downregulated genes were
   mainly in the filamin binding, VEGF (Vascular Endothelial Growth
   Factor) receptor binding and CMP alpha 2-3 sialyltransferase and
   immunoglobulin binding mechanisms. For CC, enrichment of upregulated
   genes were primarily in the phagocytic vesicles and granular lumen and
   the downregulated genes mainly in secretory vesicles, exocytic
   vesicles, platelet alpha granules and endolysosomes. We have also
   performed the GO term enrichment analysis for the overlapped genes
   ([122]Supplementary Table 1).

Signaling Pathway Enrichment Analysis

   KEGG and REACTOME pathway enrichment analyses of the DEGs (both up- and
   downregulated) were performed ([123]Supplementary Figure 3). The
   significantly upregulated pathways were identified to be the
   hemostasis, aggregation and activation of platelets, collagen formation
   and complement cascade activation. Enrichment of downregulated DEGS was
   mostly in vascular endothelial growth factor receptor dimerization,
   VEGF- ligand interactions, defective B3GALTL (Beta
   3-Glucosyltransferase) and pathways associated with O-glycosylation. We
   further performed the interrelational analysis of pathways by examining
   KEGG processes of up- and downregulated DEGs in ClueGO. Interestingly,
   the upregulated DEGs were primarily associated with the complement
   activation, collagen formation, collagen biosynthesis, humoral immune
   response, regulation of prostaglandin, perception of neurofibrillary
   tangle, cold stimulus and nephron morphogenesis ([124]Figure 2). The
   downregulated DEGs were primarily associated with the regulation of
   axon guidance, endothelial cell chemotaxis, vascular endothelial growth
   factor signaling, MAP kinase signaling, protein kinase activity,
   glomerular filtration, collagen biosynthesis, formation of fibrin clots
   and coagulation ([125]Figure 3).

Figure 2.

   [126]Figure 2
   [127]Open in a new tab

   Interrelation analysis of pathways upregulated DEGs by examining KEGG
   processes in ClueGO.

Figure 3.

   [128]Figure 3
   [129]Open in a new tab

   Interrelation analysis of pathways downregulated DEGs by examining KEGG
   processes in ClueGO.

PPI Network Construction and Identification of Hub Genes

   We have investigated the integrated pathways and PPI of the DEGs using
   the STRING database and Cytoscape software. The overall interactions
   were further validated with the MCODE module (shown in [130]Figure 4).
   Significant PPI clusters for up- and downregulated DEGs were identified
   with the maximum score of 13.286 and 6.25, respectively. The PPI
   cluster of the upregulated DEGs included 15 nodes with 94 degrees. The
   PPI cluster of the downregulated DEGs included 7 nodes with 25 degrees
   along with two subclusters containing 2 nodes and 2 degrees each with a
   score of 3.32. CTGF (Cellular Communication Network Factor 2), IGFBP5
   (Insulin Like Growth Factor Binding Protein 5) and LOX (Lysyl Oxidase)
   were identified as two upregulated hub genes for EDN. THBS2
   (Thrombospondin 2), LUM, COL6A3 (Collagen Type VI Alpha 3 Chain), FN1
   and VCAN were identified as four upregulated hub genes for GDKD. FSTL1
   (Follistatin Like 1), TIMP1 (TIMP Metallopeptidase Inhibitor 1), POSTN,
   SPARC and COL3A1 (Collagen Type III Alpha 1 Chain) were identified as
   five upregulated hub genes for TDKD. VEGFA, PLCE1 (Phospholipase C
   Epsilon 1), BMP2 (Bone Morphogenetic Protein 2) and FGF1 (Fibroblast
   Growth Factor 1) were identified as four downregulated hub genes for
   EDN. HRG (Histidine Rich Glycoprotein) was identified as the
   downregulated hub gene for GDKD. Interestingly, two most significant
   candidate genes viz. DCN (encodes decorin) and PTPRO (encodes
   receptor-type tyrosine-protein phosphatase O) were found common for the
   respective up- and downregulated DEGs in all three datasets. Non
   overlapping upregulated gene COL1A2 (Collagen Type VI Alpha 2 Chain)
   and downregulated THY1 (Thy-1 Cell Surface Antigen), KNG1, EGF and
   THBS1 were also identified from the string interactions.

Figure 4.

   [131]Figure 4
   [132]Open in a new tab

   Protein–protein interaction (PPI) network of DEGs and module analysis.
   (A) Identification of a significant module based on the degree of
   importance examined for upregulated DEGs. (B) Identification of a
   significant module based on the degree of importance examined for
   downregulated DEGs. The representations are as follows: the potential
   genes unique to EDN, GDKD, and TDKD are shown in green, pink and cream
   colors respectively. The cyan color represents the common gene noticed
   in all three datasets for the respective up- and downregulated DEGs.

Validation of Hub Genes

   The hub genes were validated based on the comparison between the
   standard expression levels from the Genotype tissue expression (GTEx)
   dataset and the fold change as a measure of Transcripts per kilobase
   million (TPM) with a maximum of 11,749 TPM. Analysis of the
   distribution of the hub genes revealed that they were significantly
   expressed in both left and right cortex tissues of the kidney. As per
   the GTEx dataset, higher expression levels of PTPRO, NPHS1, VCAN,
   COL6A3, THBS1 and FN1 were found in the cortex and pelvis of both the
   kidneys. Medium expressions of KNG1, LOX, THY1, THBS2, PLCE1 and VEGFA
   and lower expressions of HRG were identified in the renal tissues
   ([133]Figure 5). Validation was further extended with the prediction of
   the tissue specific expression rates as per GTEx data set and cell
   specific expression rates from Human Protein Atlas maps and are
   represented in [134]Table 2.

Figure 5.

   [135]Figure 5
   [136]Open in a new tab

   Hub genes identified from the expression atlas based on their
   transcripts per kilobase million expression level in the renal system.

Table 2.

   Expression levels of the hub genes identified from GTEx and HPA data.
   Hub genes RNA seq expression Protein Expression Sn RNA dataset
   Kidney (pTPM) Glomerulus Tubules P- value Cell type
   VEGFA 96.1 medium medium 0.962 PODO
   PTPRO 22.6 high not detected 0.000339427082146 PODO
   LUM 36.3 not detected not detected Not detected Not detected
   COL6A3 6.8 not detected not detected 2.55212579805471E-06 MES
   FN1 38.9 medium high 0.406463264506511 MES
   POSTN 23.0 not detected low 0.010690303094222 MES
   KNG1 126.5 medium high 1.10077051593139E-05 DCT-CT
   FGF1 40.9 high not detected 0.002353864219147 PODO
   BMP2 3.3 No data No data No data No data
   LOX 18.9 No data No data No data No data
   EGF 13.2 No data No data 0.027696802696135 DCT-CT
   DCN 301.3 low not detected 0.800847912989593 PODO
   SPARC 158.7 High low 0.018698295677998 PODO
   VCAN 9.6 not detected not detected 9.57643667243769E-07 PEC
   PLCE1 8.2 Medium Medium 0.455738614120668 PODO
   CTGF 167.1 low low 0.862180638654418 PEC
   [137]Open in a new tab

   Mean TPM of DCN was higher in kidneys with an expression rate of 301.3,
   followed by SPARC with a value of 158.7. Their expression rates were
   the highest in the urinary bladder with 1509.2 pTPM and 345.8 pTPM for
   DCN and SPARC respectively. DCN was identified as a renal cancer
   prognostic marker with lower levels of its protein identified in
   glomeruli cells of normal kidneys whereas higher levels of SPARC was
   predicted in glomeruli and lower levels in tubuli cells of normal
   kidneys. Higher protein expressions of PTPRO and FGF1 were detected in
   glomerulus and not in tubuli. PTPRO and FGF1 were already established
   prognostic biomarkers of renal cancers. Higher protein expression
   levels of FN1 and KNG1 were identified in normal tubular kidneys and
   medium levels in glomerulus and bladder tissues. Expressions of LUM,
   VCAN, COL6A3 were not detected in the normal glomerular, tubular and
   bladder tissues.

   From the overall analysis of hub genes and expression analysis,
   overexpression of FN1, LUM, VCAN could be the potential prognostic
   biomarkers of GDKD. Downregulation of KNG1 and upregulation of POSTN
   and SPARC could be significant biomarkers of TDKD. Upregulated CTGF
   represents a potential biomarker of EDN. The below cutoff expression
   (represented in transcripts per kilobase million or TPM) of PTPRO could
   represent any of the three diseased conditions such as early diabetic
   nephropathy, glomerular and tubular diabetic kidney diseases. Although
   the DCN is normally expressed in kidney tissue, high traces of its
   expression was also found in other organs such as heart and muscles.
   Thus, despite its significant expression in upregulated DEGs of all
   three datasets, it cannot be proposed as a suitable biomarker for DKD.

Comparison of Hub Genes With snRNA Transcriptome Dataset

   The hub genes were compared with the snRNA transcriptome dataset to
   infer their potentiality. The genes SPARC, THBS1, EGF, PTPRO, FGF1,
   POSTN, and COL6A3 were identified to be significantly enriched in the
   PODO, MES, DCT, PODO, MES, PEC, and MES samples respectively with
   significant p-value ([138]38) as seen in [139]Table 2. LUM, BMP2 and
   LOX were not detected. VEGFA, FN1, KNG1, FGF1, DCN, VCAN and CTGF were
   identified in PODO, MES, DCT-CT, PODO, PODO, PEC and PEC respectively.
   As seen in the [140]Figure 6, Nephroseq v5 webserver predicted the over
   expression of FN1, SPARC, LUM, VCAN, and POSTN with fold change of
   2.808, 1.655, 6.25, 1.77, and 3.324 in diabetic nephropathy
   respectively. Under expression of PTPRO and KNG1 with fold change of
   -3.308 and -1.618 respectively were reported in diabetic nephropathy in
   comparison with healthy living donors. These results correlated with
   the identified putative biomarkers of GDKD, TDKD and EDN except that of
   CTGF, where it has an under expression (fold change of -4.728) in
   diabetic nephropathy in contrast to the earlier predictions.

Figure 6.

   Figure 6
   [141]Open in a new tab

   The box plot of seven candidate genes that exhibit the over and under
   expression rates in diabetic nephropathy in comparison with healthy
   living donors.

Drug Perturbation of the Transcriptome Data

   Further, we identified the small-molecules chemical perturbagens. The
   large publicly available gene expression datasets from LINCS were
   explored to first identify the signature genesets for kidney disease.
   Subsequently, we identified the 50 different drug perturbation
   signatures for the overlapping DEGs. We filtered out the similar and
   opposite signatures for the DEGs from the human renal epithelial cell
   line HA1E ([142]41). Drugs with similar gene signatures represent the
   ones strikingly mimic the current gene expression patterns whereas the
   opposite gene signature patterns represent the ones capable of
   reversing the observed current expression profiles ([143]42). In this
   study, the top drugs were ranked based on the similarity scores. Two
   drugs viz. pidorubicine, a topoisomerase inhibitor (LINCS ID-
   BRD-K04548931) and Polo-like kinase inhibitor (LINCS ID- BRD-K41652870)
   were identified from the opposite signatures. Both the drugs were
   validated to be effective on the renal epithelium cell line with the
   similarity score of -0.9812 and -0.9516, respectively. The details of
   the drug perturbation analysis is presented in [144]Table 3. The
   transcriptomic signatures and the action of the drugs BRD-K04548931 and
   BRD-K41652870 on renal epithelium cell lines (HA1E) are shown in
   [145]Supplementary Figure 3.

Table 3.

   Drug perturbation analysis of the overlapped DEGs.
   Signature ID Pubchem ID Drug Similarity score Drug Structure
   Similar signatures - drugs mimic the originally observed expression
   patterns
   CPC002_HA1E_6H:BRD-K80348542-001-01-4:10 442195 Cephaeline
   BRD-K80348542 1 graphic file with name fendo-12-721202-g007.jpg
   CPC010_HA1E_6H:BRD-A24643465-001-05-3:10 16219462 Homoharringtonine
   BRD-A24643465 protein synthesis inhibitor 1 graphic file with name
   fendo-12-721202-g008.jpg
   Opposite signatures - drugs reverse the observed pattern of DEGs
   CPC002_HA1E_24H:BRD-K91370081-001-10-3:10 65348 Pidorubicine
   BRD-K04548931 Topoisomerase inhibitor -0.9812 graphic file with name
   fendo-12-721202-g009.jpg
   CPC012_HA1E_6H:BRD-K41652870-001-01-9:10 [146]3190037 BRD-K41652870
   Polo-like kinase inhibitor -0.9516 graphic file with name
   fendo-12-721202-g010.jpg
   [147]Open in a new tab

Discussion

   Currently, due to the aggressive comorbidities associated with the DKD
   poses an enormous challenge in their diagnosis. The identification of
   the better biomarkers are urgently needed to improve diagnosis.
   Niewczas et al. recently examined the proteomic profiling of
   circulating proteins ([148]19) in type 1 and type 2 diabetes patients.
   They identified an extremely robust inflammatory signature, consisting
   of 17 proteins enriched for TNF Receptor Superfamily members with a
   10-year risk of end-stage renal disease. Mitrofanova et al., reported
   that SMPDL3b modulates insulin receptor signaling in DKD ([149]43).
   Bjornstad et al., recently reported the relationships between markers
   of tubular injury and intrarenal hemodynamic function in adults with
   and without type 1 diabetes ([150]44). Hirao et al., reported the
   discovery of a useful biomarker for EDN discovery by proteomics
   analysis of urine proteomes of DM patients without microalbuminuria and
   healthy volunteers ([151]45). Cheung et al. performed an exome-chip
   association analysis in Chinese subjects with type 2 diabetes
   ([152]46). They noticed elevated circulating pigment epithelium-derived
   factor levels were associated with increased risks of DN and
   sight-threatening diabetic retinopathy. Collectively, the
   identification of novel markers for prognosis and diagnosis of DKD is
   an active area of research.

   In this work, we aimed to reanalyze the omics molecular profiles
   available in the public databases to identify the prognosis biomarkers
   and small-molecules chemical perturbagens of the DKD. Similar
   approaches have been advocated in previous studies. In the current
   study, we integrated gene expression patterns of three stages of DKD
   viz. EDN, GDKD and TDKD. We obtained three datasets of gene expression
   patterns from NCBI-GEO: [153]GSE111154 with samples affected by EDN
   along with healthy control, [154]GSE30528 with samples from the
   affected GDKD along with healthy control, and finally [155]GSE30529
   containing samples affected with TDKD along with healthy control.

   A total of 280 up- and 27 downregulated DEGs were identified and the
   subsequent analysis was carried out on them. Gene annotations of DEGs
   revealed the upregulations of biological processes such as cell
   migration, cytokine stimulation, and glomerular epithelial cell
   differentiation. Downregulated molecular functions being
   metallopeptidase inhibitor, VEGFA receptor binding and integrin binding
   ability. Cellular components like phagocytic vesicles and granular
   lumen were found in higher levels. Pathways enrichment analysis
   disclosed potential activation and aggregation of platelets, collagen
   fiber formation, hemostasis and activated complement cascade. Pathways
   involving the vascular endothelial growth factor dimerization and its
   interaction with ligands were substantially deferred from the normal
   occurrences. The subsequent inter-relational analysis supported the
   above findings. Strikingly, reduced developments of glomeruli and
   nephrons were noticed in all three diseased conditions. The
   determination of PPI networks allowed us to narrow down the DEGs to
   identify potential biomarkers.

   We compared the gene expression levels of the identified hub genes. FN1
   (responsible for cell adhesion), LUM and VCAN (veriscan) (involved in
   cytokine activity and signaling pathways) could be the putative
   upregulated biomarkers of GDKD. We predicted the lower expression of
   KNG1 (kininogen1) responsible for elevated cytosolic calcium levels and
   fibrin clots can be a putative prognostic biomarker for TDKD.
   Upregulation of POSTN (peristin) and SPARC (secreted protein acidic and
   cysteine rich) could also be putative biomarkers of TDKD. PTPRO
   (protein tyrosine phosphatase receptor O) responsible for the
   chemotaxis and regulation of glomerular filtration was proposed as an
   attractive downregulated candidate biomarker for all three diseases.
   The genes FN1, and PTPRO were reported by the Online Mendelian
   Inheritance in Man (OMIM) database to be associated with
   glomerulopathy, renal fibrosis and nephrotic syndrome, respectively
   ([156]47). Correlation of expression of the candidate genes were
   obtained in comparison to the snRNA datasets, except that of CTGF which
   had a contradictory under expression in diabetic nephropathy Thus, the
   identified key candidate genes - FN1, SPARC, VCAN, LUM, POSTN, KNG1 and
   PTPRO can be considered further for the investigation in clinical
   research. Finally the two drugs viz. pidorubicine, a topoisomerase
   inhibitor (LINCS ID- BRD-K04548931) and Polo-like kinase inhibitor
   (LINCS ID- BRD-K41652870) were also identified using the induced
   transcriptional gene signature database. These drugs were known to have
   the validated role in reversing the differential gene expression
   patterns observed in all three datasets ([157]48).

   Though the study has integrated gene enrichment, pathways, tissue and
   cell specific expression analysis in the identification of hub genes
   for diabetic kidney conditions, there exists few limitations as well.
   There are advanced sc-RNA and bulk sequencing technologies available,
   yet the current study relied upon the affymetrix microarray datasets to
   infer the diabetic kidney disease pathogenesis for three reasons:
   first, majority of the diabetic kidney disease datasets were based on
   affymetrix platform, secondly, the human transcriptome data were
   limited and finally, the datasets without any treatment conditions were
   a little. The present study needs extensive clinical validation of the
   proposed candidate genes and drug perturbations. Thus the field demands
   more research and the current research opens up novel directions to
   explore the biomarkers and pathology of diabetic kidney diseases.
   Comparison of the hub genes with the sn-RNA sequence data added
   confirmation and correlated with the main findings of the study.

   In conclusion, we have investigated the potential biomarkers/key
   candidate genes for DKD using the integrated bioinformatics analysis.
   We identified seven candidate genes FN1, LUM, VCAN, KNG1, POSTN, SPARC
   and PTPRO significantly associated with the progression of DKD, and
   further investigation of these genes in clinical research is warranted.
   Furthermore, the results of this study increase our understanding of
   the molecular drivers, critical genes and pathways that underlie DKD
   initiation and progression. Identification of the small-molecules
   chemical perturbagens further validate their potentials as therapeutic
   targets.

Data Availability Statement

   The original contributions presented in the study are included in the
   article/[158]Supplementary Material. Further inquiries can be directed
   to the corresponding author.

Author Contributions

   ZG designed the study, curated the datasets, analyzed the computational
   results and wrote the original draft. AS helped in the acquisition,
   analysis and interpretation of the datasets, and wrote the revised
   manuscript. XML, XLL and LS helped in the datasets analysis and wrote
   the original manuscript. All authors contributed to the article and
   approved the submitted version.

Funding

   This work is funded by the Everest Project of Air Force Medical
   University (NO: 2020ZFB002).

Conflict of Interest

   The authors declare that the research was conducted in the absence of
   any commercial or financial relationships that could be construed as a
   potential conflict of interest.

Publisher’s Note

   All claims expressed in this article are solely those of the authors
   and do not necessarily represent those of their affiliated
   organizations, or those of the publisher, the editors and the
   reviewers. Any product that may be evaluated in this article, or claim
   that may be made by its manufacturer, is not guaranteed or endorsed by
   the publisher.

Supplementary Material

   The Supplementary Material for this article can be found online at:
   [159]https://www.frontiersin.org/articles/10.3389/fendo.2021.721202/ful
   l#supplementary-material
   Supplementary Figure 1

   Gene Ontology (GO) term enrichment analysis of upregulated genes in the
   three datasets (EDN: [160]GSE111154, GDKD: [161]GSE30528 and TDKD:
   [162]GSE30529). GO analysis of the upregulated genes were examined for
   three sub-ontologies (biological process, molecular function, and
   cellular component).
   [163]Click here for additional data file.^ (578.4KB, jpeg)
   Supplementary Figure 2

   Gene Ontology (GO) term enrichment analysis of downregulated genes in
   the three datasets (EDN: [164]GSE111154, GDKD: [165]GSE30528 and TDKD:
   [166]GSE30529). GO analysis of the downregulated genes were examined
   for three sub-ontologies (biological process, molecular function, and
   cellular component).
   [167]Click here for additional data file.^ (723.4KB, jpeg)
   Supplementary Figure 3

   Signaling pathway enrichment analysis of overlapped DEGs. from KEGG and
   REACTOME pathways.
   [168]Click here for additional data file.^ (770.5KB, jpeg)
   [169]Click here for additional data file.^ (11KB, docx)

References