Abstract

Background

   A significant cause of advanced renal failure is diabetic nephropathy
   (DKD), with few treatment options available. Calcitriol shows potential
   in addressing fibrosis related to DKD, though its molecular mechanisms
   remain poorly understood. This research seeks to pinpoint the crucial
   genes and pathways influenced by calcitriol within the scope of
   DKD-related fibrosis.

Methods

   Single-cell gene expression profiling of calcitriol treated DKD rat
   kidney tissue and screening of fibrosis-associated cell subsets.
   Mendelian randomization and enrichment analyses (CIBERSORT, GSVA, GSEA,
   Motif Enrichment) were used to explore gene-immune cell interactions
   and signaling pathways. Key findings were validated using independent
   datasets and protein expression data from the Human Protein Atlas.

Results

   Calcitriol treatment reduced proliferative cell populations and
   highlighted the FoxO signaling pathway's role in DKD. SUMO3 and CD74
   were identified as key markers linked to immune infiltration and renal
   function. These genes were significantly associated with creatinine
   levels and eGFR, indicating their potential role in DKD progression.

Conclusion

   Our results suggest that calcitriol modulates DKD fibrosis through the
   FoxO pathway, with SUMO3 and CD74 serving as potential biomarkers for
   kidney protection. These results provide fresh insights into strategies
   for treating DKD.

   Keywords: Calcitriol, Diabetic kidney disease, Fibrosis, Biomarkers,
   FoxO pathway

1. Introduction

   There is a great concern in the global health community regarding
   diabetic kidney disease (DKD), a common complication arising from
   diabetes that significantly contributes to the progression of end-stage
   renal disease (ESRD) [[39]1]. The condition typically involves several
   harmful processes, such as an increase in fat levels, long-lasting
   inflammation, and oxidative stress. Additionally, the buildup of
   certain proteins outside cells contributes to the gradual worsening of
   kidney function, eventually leading to ESRD. DKD also significantly
   elevates the risk of cardiovascular and cerebrovascular diseases
   [[40]2,[41]3]. The complexity and variability of DKD pose major
   challenges for current treatments. In the absence of effective drug
   therapies, many patients advance to ESRD, necessitating renal
   replacement options like dialysis or kidney transplantation
   [[42]4,[43]5]. Although some medications have shown efficacy in
   improving DKD, the overall clinical landscape remains largely unchanged
   [[44]6]. Consequently, there is a pressing need for novel treatment
   approaches to better control and manage DKD, given its increasing
   prevalence and profound impact on clinical outcomes.

   Active vitamin D has garnered increasing attention due to its
   multifaceted biological activities, particularly in the context of DKD.
   The regulation of calcium and phosphate levels, immune response
   modulation, and anti-inflammatory effects are all critically influenced
   by active vitamin D [[45]7,[46]8]. Evidence indicates that vitamin D
   could offer protective benefits in slowing the progression of chronic
   kidney disease [[47]9]. Early research on DKD suggests that active
   vitamin D may provide therapeutic advantages by mitigating renal
   inflammation and preserving kidney function [[48]10,[49]11]. However,
   despite these promising findings, the specific molecular mechanisms by
   which active vitamin D impacts DKD fibrosis remain unclear and warrant
   further investigation.

   This study aims to systematically explore the mechanisms by which
   active vitamin D affects the treatment of diabetic kidney disease (DKD)
   through an extensive analysis of diverse datasets. Utilizing Mendelian
   randomization, the research seeks to assess the potential causal
   relationship between active vitamin D levels and the initiation and
   progression of fibrosis in DKD. This method reduces the influence of
   external factors and reverse causality, allowing for a clearer
   assessment of the causal relationship [[50]12]. Single-cell sequencing
   technology will be employed to analyze the mechanism of action of
   active vitamin D in DKD fibrosis, discover new therapeutic targets, and
   provide a basis for cell-specific intervention strategies, thereby
   improving therapeutic efficacy [[51]13]. Additionally, bulk RNA
   sequencing will explore the immune microenvironment and regulatory
   networks within DKD. This comprehensive study seeks to elucidate the
   molecular mechanisms of active vitamin D in DKD, providing new
   scientific evidence for future precision medicine and personalized
   treatments to enhance patient outcomes and quality of life.

2. Methods

2.1. GEO data source and preparation

   A total of two major datasets, [52]GSE30528 and [53]GSE30529, were
   retrieved from Gene Expression Omnibus (GEO) repository. Dataset
   [54]GSE30528, employed as the primary analysis set, includes genetic
   expression data from kidney tissues of 10 DKD patients and 8 healthy
   controls. Dataset [55]GSE30529, utilized as an independent validation
   set, comprises gene expression data from kidney tissues of 12
   individuals diagnosed with DKD and 13 non-DKD controls.
   [56]Supplementary Table 1 provides detailed information about these
   datasets. Fibrosis-related gene collections were sourced through the
   GeneCards database ([57]https://www.genecards.org/). Since the dataset
   are sourced from a public database, approval from a local ethics
   committee was not required.

2.2. Animal and experimental design

   Fifteen male Sprague-Dawley (SD) rats, each six weeks old, were sourced
   from Three Gorges University (Yichang, China). The rats were housed in
   a pathogen-free (SPF) environment with tightly regulated conditions,
   stabilizing the temperature at 22 °C and relative humidity at 50 % and
   a 12-h cycle of light and darkness. The animals were provided with
   unlimited availability of food and water for the entire study period,
   ensuring their nutritional needs were met without restriction. The
   Animal Experiment Ethics Committee of Minda Hospital, affiliated with
   Hubei Minzu University, reviewed and approved the study protocol
   (Ethics Approval Number: Y2022009).

   The rats were randomly distributed into one of three experimental
   groups, each consisting of five animals. The first group, designated as
   the wild-type (WT) control, was administered an intraperitoneal
   injection of sodium citrate buffer (25 mM/L, pH 4.0) and maintained on
   a regular chow diet. The other rats were fed a diet rich in fats and
   glucose for eight weeks to induce metabolic alterations. Following this
   period, the rats were administered an intraperitoneal dose of 30 mg/kg
   streptozotocin (STZ) to develop a model of DKD. The DKD rats were
   further divided into two subgroups (n = 5 per subgroup): the DKD model
   group and the DKD + calcitriol group. Rats in the DKD + calcitriol
   group received both STZ injections (60 mg/kg, intraperitoneally) and
   were treated with calcitriol (0.03 μg/kg/day, orally). Calcitriol was
   supplied by Hy cell Biotechnology (Wuhan, China).

   Glucose concentrations in the blood were measured via tail vein
   sampling 72 h following the STZ injection. DKD was deemed successfully
   induced when random blood glucose levels reached or exceeded
   16.7 mmol/L. At the end of the 8-week intervention period, each rat was
   anesthetized with an intraperitoneal administration of sodium
   pentobarbital (30 mg/kg), followed by euthanasia through cervical
   dislocation. Subsequently, kidney tissues were harvested and stored at
   a temperature of −80 °C for subsequent examination and evaluation. For
   additional molecular analysis, right kidney tissues were obtained from
   two rats in the diabetic kidney disease cohort and two from the
   diabetic kidney disease + calcitriol cohort. The samples were pooled
   according to group and sent to Ruixing Biotechnology (Wuhan, China) for
   single-cell gene expression profiling.

2.3. Single-cell RNA profiling analysis

   Kidney tissues were isolated from control rats and experimental groups
   to generate single-cell suspensions. Two samples from the untreated DKD
   cohort and two from the calcitriol-administered group were combined
   into two pooled samples. The kidney cells were subsequently
   reconstituted in phosphate-buffered saline (PBS, Thermo Fisher
   Scientific, Waltham, MA, USA) at cell concentrations ranging between
   1.6 million and 7 million cells per milliliter, preparing them for
   sequencing. The cells were then loaded onto 10x Chromium microfluidic
   chips (10x Genomics, Pleasanton, CA, USA) and barcoded using the 10x
   Genomics Chromium System. Total RNA was isolated and converted into
   cDNA, after which a single-cell sequencing library was constructed
   using the 3′ v2 kit from the Chromium Single Cell system (10x
   Genomics). Sequencing was conducted on the NovaSeq 6000 system
   (Illumina, San Diego, CA, USA) utilizing 150 bp paired-end reads.

   Filtering and quality assessment for the scRNA-Seq data included
   eliminating low-quality cells and genes with minimal expression. Data
   normalization, homogenization, and PCA analysis were performed
   sequentially. The ElbowPlot method was employed to determine the
   optimal count of principal components (PCs), and clustering was
   executed through UMAP. Cell group classifications were assigned using
   the celldex toolkit, emphasizing the identification of cells relevant
   to the progression of the disease. Using the LIMMA software for
   differential gene expression analysis [[58]14], we utilized a corrected
   p-value cutoff of less than 0.05 to detect meaningful differences.

3. Mendelian randomization analysis

3.1. Exposure data (pQTL)

   Protein quantitative trait loci (pQTL) data were sourced from the 2021
   version of the deCODE database
   ([59]https://www.decode.com/summarydata/). This dataset contains a
   genome-wide association study (GWAS) of 35,559 European individuals,
   using 4907 aptamers to quantify plasma protein concentrations. Single
   nucleotide polymorphisms (SNPs) linked to each gene with a significance
   threshold of P < 1e-5 were regarded as potential instrumental variables
   (IVs). To maintain accuracy, linkage disequilibrium (LD) between SNPs
   was assessed, and strict criteria (R2 < 0.001 and p2 < 5e-5) were
   implemented to filter for the most relevant SNPs.

3.2. Outcome data

   Outcome data were obtained from the FinnGen repository
   ([60]https://www.finngen.fi/en/access_results), a comprehensive genetic
   research project focused on the European population. We specifically
   downloaded GWAS data for DKD, comprising 4111 cases and 308,539
   controls.

3.3. Statistical methods

   To establish causal relationships between pQTL and DKD, we used outcome
   IDs extracted from the FinnGen biobank database. The analysis utilized
   various statistical techniques to assess causality and determine the
   robustness of the instrumental variables: MR Egger, Weighted Median,
   Weighted Mode, and Inverse-Variance Weighted (IVW) methods. The
   reliability of the causal inferences was reinforced through further
   statistical validation, including the Wald ratio, offering a detailed
   assessment of gene expression's influence on DKD.

3.4. Sensitivity analysis

   A leave-one-out sensitivity analysis was conducted to assess the
   consistency of the genetic variants' effect on DKD risk. This approach
   sequentially excludes each SNP and recomputes the combined effect size
   for the remaining SNPs. By generating a new point estimate and its 95 %
   confidence interval with each SNP removal, we could evaluate each SNP's
   individual contribution. This analysis ensures robustness and
   identifies any disproportionately influential SNPs. Results were
   summarized in a chart comparing the estimates after individual SNP
   removals and the overall estimate with all SNPs included.

3.5. Heterogeneity test

   To evaluate statistical heterogeneity among the studied SNPs, a
   Mendelian heterogeneity test was performed. This analysis calculated
   the weighted sum of squares for effect sizes and standard errors of
   each SNP to produce a Q value. The Q value follows a chi-square
   distribution, where the degrees of freedom correspond to the total
   number of SNPs minus one. A p-value exceeding 0.05 suggests no
   significant heterogeneity in SNP effect sizes, indicating consistent
   impacts on DKD risk.

3.6. Functional exploration of GO and KEGG pathways

   The identified genes underwent Gene Ontology (GO) and KEGG pathway
   enrichment analysis via the clusterProfiler package in R. GO terms were
   categorized into biological processes, cellular components, and
   molecular functions, with statistical significance evaluated using an
   adjusted p-value threshold of <0.05. The results were depicted through
   bar plots, with color gradients representing the statistical
   significance of each term or pathway. Data visualization was achieved
   using ggplot2 to ensure clarity and precision.

3.7. Analysis of immune cell infiltration

   The composition of immune cells was evaluated using the CIBERSORT
   algorithm, applying the LM22 signature matrix to the [61]GSE30528
   dataset [[62]15]. The [63]GSE30528 dataset was normalized by
   calculating transcripts per million (TPM), and batch effects were
   corrected using the ComBat method to improve consistency. To ensure
   robust estimations of immune cell proportions, CIBERSORT was run with
   1000 permutations. The proportions of immune cells between the groups
   were compared using the Wilcoxon rank-sum test, a method well-suited
   for handling data that do not follow a normal distribution. All
   visualizations were generated using the ggplot2 and pheatmap packages
   in the R programming environment.

3.8. Pathway enrichment evaluation using GSEA

   To gain insights into the signaling pathway differences among groups
   with varying gene expression levels, we implemented Gene Set Enrichment
   Analysis (GSEA). This method utilized the curated gene sets from the
   MsigDB (version 7.0) database, providing a framework for identifying
   pathways unique to specific subtypes. By examining the differential
   expression patterns across these subtypes, we pinpointed key gene sets
   that were significantly enriched, ranking them by enrichment scores
   with a corrected p-value cutoff of less than 0.05. GSEA proves
   particularly useful in linking gene expression profiles to biological
   processes, making it invaluable for studies that aim to uncover the
   functional relevance of disease subtypes and their molecular
   mechanisms.

3.9. Gene Set Variation Analysis (GSVA)

   Unlike conventional gene set enrichment analysis, GSVA calculates
   enrichment scores at the individual sample level, enabling the
   identification of subtle pathway activity changes. To explore the role
   of key genes in DKD renal tubules, a training dataset was constructed
   from transcriptomic data, and the GSVA approach was employed. At the
   outset, samples were categorized into high-expression (HExp) and
   low-expression (LExp) groups according to the expression levels of the
   key genes. GSVA was then applied to compute the enrichment scores for
   specific gene sets within each sample, and differences between the
   expression groups were assessed, generating t-values to indicate
   pathway activity changes across various signaling pathways. This
   approach allows for the detection of biological pathway alterations
   across samples, providing deeper insights into functional dynamics and
   uncovering intricate mechanisms underlying the data.

3.10. Transcriptional regulation analysis of key genes

   Analysis of enriched transcription factor (TF) binding motifs was
   carried out using the HOMER software (v4.10) with its default settings
   [[64]16]. The gene set analyzed included key renal tubular genes
   involved in DKD, derived from RNA-seq data of the DKD training set,
   which compared renal tubular samples from DKD patients with healthy
   controls. A background gene set, adjusted for sequence length and GC
   content, was applied to control for non-specific motif matches. Motif
   enrichment was evaluated using the cisBP database, with results
   quantified by normalized enrichment scores (NES) and area under the
   curve (AUC) values. High-confidence transcription factor annotations
   were sourced from the cisBP database and supplemented with experimental
   data when available. Only motifs with significant NES and AUC scores
   were selected for further analysis.

3.11. Clinical Correlation with key genes

   Gene expression data for key renal tubular genes were acquired and
   analyzed from the DKD-specific dataset in NephroseqV5, with the
   [65]GSE30529 dataset serving as an independent validation cohort. The
   gene expression data were processed through TPM normalization and log2
   transformation, with low-expression genes (TPM <1) being excluded from
   further analysis. To evaluate the diagnostic relevance of key genes in
   DKD, ROC curve analysis was performed with the pROC package in R, and
   the area under the curve (AUC) was computed to gauge their ability to
   distinguish DKD patients from healthy controls. Correlation analysis
   was conducted to explore the relationship between gene expression and
   clinical parameters, such as serum creatinine and estimated glomerular
   filtration rate (eGFR), using either Pearson or Spearman correlation
   coefficients depending on the results of the Shapiro-Wilk test for
   normality. The eGFR values were calculated using both the MDRD and
   CKD-EPI equations. All statistical analyses and visualizations were
   performed in R, with data visualized through the ggplot2 package.

3.12. Verification of key gene expression by immunohistochemistry

   The expression of key genes was validated through immunohistochemistry
   (IHC) analysis. Data for this validation were sourced from the Human
   Protein Atlas (HPA) repository ([66]https://www.proteinatlas.org/), a
   robust resource offering extensive proteomic information. This database
   contains data generated from 27,520 antibodies, each targeting one of
   17,288 human proteins, providing a valuable platform for assessing
   protein expression patterns in various tissues. This database provides
   detailed protein atlases at various levels, including subcellular,
   single-cell, cell line, and tissue, and contains extensive resources of
   IHC staining images. For this study, we downloaded IHC images of normal
   kidney tissue for the selected key genes from the HPA database. The
   expression patterns observed in these images were analyzed to
   corroborate the findings from our gene expression analysis.

3.13. Statistical analysis

   Mendelian Randomization (MR) analysis was grounded in three fundamental
   assumptions. First, the relevance assumption was confirmed by selecting
   instrumental variables (IVs) with an F-statistic greater than 10,
   ensuring strong instruments. Second, the independence assumption was
   evaluated using MR-Egger regression and weighted median analysis to
   detect and account for potential pleiotropy. Third, the exclusion
   restriction assumption was tested through sensitivity analyses,
   ensuring that the IVs influenced the outcome only through the exposure.
   IVs were drawn from genome-wide association studies (GWAS) based on
   their significant association with the exposure trait (p < 5 × 10^-8).
   The primary analysis utilized the inverse-variance weighted (IVW)
   method, with MR-Egger and weighted median approaches employed for
   additional robustness. Statistical significance was set at a p-value
   threshold of 0.05, and adjustments for multiple comparisons were made
   using the Benjamini-Hochberg false discovery rate (FDR). All analyses
   were conducted in R version 4.2.2.

4. Results

4.1. Analysis of scRNA-Seq data at the single-cell resolution

   To investigate the cellular mechanisms contributing to DKD and evaluate
   the effects of Calcitriol treatment, this study employed single-cell
   transcriptomics as a technique to analyze gene expression at the
   individual cell level. The experimental setup included two DKD rats
   treated with Calcitriol and two untreated DKD rats, with renal tissues
   subjected to scRNA-Seq analysis. The Seurat framework was applied to
   process the gene expression data, applying filters to remove
   low-quality cells based on the following criteria: nFeature_RNA greater
   than 200, nFeature_RNA less than 4000, nCount_RNA below 25000, and
   percent.mt under 75. This filtering retained 25,841 cells.
   Post-filtering, gene expression distributions were illustrated using
   violin and scatter plots ([67]Supplementary Figs. 1A and 1B). Emphasis
   was placed on the ten genes showing the highest normalized variance
   ([68]Supplementary Fig. 1C), potentially serving as biomarkers and key
   regulatory genes. After normalization, principal component analysis
   (PCA), and harmony analysis ([69]Supplementary Figs. 1D–1F), UMAP
   analysis was conducted to reveal the spatial relationships among cell
   clusters, identifying 18 distinct cellular subtypes ([70]Fig. 1A).
   These results provide crucial insights into the cellular heterogeneity
   present in DKD kidneys.

Fig. 1.

   [71]Fig. 1
   [72]Open in a new tab

   Cell Annotation and Differential Analysis (A) UMAP-based clustering of
   cells into 18 clusters using PCA components. (B) Scatter plot showing
   marker gene expression. (C) Annotation of cells into 13 distinct types.
   (D) Bubble chart illustrating cell types and their associated markers.
   (E) Differences in cell type proportions between sample groups. (F)
   Volcano plot depicting differential expression in proliferative cells.
   (G) Venn diagram showing intersecting genes. UMAP: Uniform Manifold
   Approximation and Projection; PCA: Principal Component Analysis.

4.2. Cellular subtype annotation in scRNA-Seq

   In this research, we identified and annotated 18 distinct cellular
   subtypes ([73]Fig. 1B) from scRNA-Seq, organizing them into 13 unique
   renal cell categories: Proliferative cell, mesangial cell, S3 proximal
   cell, macrophage cell, S1/S2 proximal tubule cell, Loop of Henle cell,
   collecting duct intercalated cell, principal cell, S2 proximal cell,
   proximal tubular cell, distal convoluted tubular cell, collecting duct
   principal cell, and connecting tubular cell ([74]Fig. 1C). The cellular
   markers for each type were depicted in a bubble chart ([75]Fig. 1D),
   and the distribution of each cell type was displayed in a bar chart
   ([76]Fig. 1E).

   Given the established association between proliferative cells and
   fibrosis [[77]17], the proliferative cell subtype was analyzed from
   both control and DKD (diabetic kidney disease) + Calcitriol
   intervention groups to determine differentially expressed genes using a
   P-value cutoff of less than 0.05. This analysis identified 433 marker
   genes, which were visualized using a volcano plot ([78]Fig. 1F).
   Subsequently, fibrosis-related genes sourced from the GeneCards
   repository were compared with the identified marker genes showing
   differential expression, yielding 219 common genes. This gene set was
   used for Mendelian randomization analysis to investigate their
   potential roles in the fibrotic process ([79]Fig. 1G).

4.3. Biological function enrichment analysis

   To investigate the biological roles of the overlapping genes identified
   in this research, we performed pathway enrichment analyses utilizing
   both Gene Ontology (GO) and KEGG databases. The GO analysis revealed a
   notable enrichment of these genes in pathways linked to mitotic nuclear
   division, nuclear division, and nuclear chromosome organization
   ([80]Fig. 2A). Furthermore, the KEGG analysis demonstrated that these
   genes play a pivotal role in key cellular processes, including the
   regulation of the cell cycle, DNA replication, and the signaling
   pathway involving Forkhead box O (FoxO) ([81]Fig. 2B). These results
   offer important insights into the molecular pathways driving renal
   fibrosis in DKD and could help in identifying novel therapeutic targets
   for treatment.

Fig. 2.

   [82]Fig. 2
   [83]Open in a new tab

   Enrichment Analysis (A) GO enrichment analysis results of intersecting
   genes. (B) KEGG pathway enrichment analysis results of intersecting
   genes. GO: Gene Ontology; KEGG: Kyoto Encyclopedia of Genes and
   Genomes.

4.4. Mendelian Randomization Analysis

   We utilized the 219 intersecting genes identified previously and
   accessed relevant data on diabetic kidney disease from the deCODE
   database. Utilizing aggregated statistical data from 312,650 samples
   (Controls: 308,539; Cases: 4111), we obtained the outcome ID
   finngen_R9_DM_NEPHROPATHY_EXMORE. Subsequently, we employed the
   extract_instruments and extract_outcome_data tools to establish 77
   gene-outcome causal relationships ([84]Supplementary Table 2).
   Mendelian randomization (MR) analysis identified two gene-outcome pairs
   with a positive causal association ([85]Fig. 3A–B), with an
   inverse-variance weighted (IVW) p-value <0.05. These genes were CD74
   and SUMO3. The presence of one variant with an odds ratio (OR) of SUMO3
   (OR: 0.775; 95 % CI: 0.619–0.970; p = 0.044) potentially correlated
   with a reduced risk of diabetic kidney disease ([86]Supplementary Table
   3), while the CD74 gene (OR: 1.474; 95 % CI: 1.128–1.926; p = 0.004)
   was linked to an elevated risk ([87]Supplementary Table 4).

Fig. 3.

   [88]Fig. 3
   [89]Open in a new tab

   Mendelian Randomization Analysis. (A–B) Scatter plots depicting MR
   analysis of key genes. Different colors denote distinct statistical
   methods, while the slope of the line illustrates the causal effect of
   each method. MR: Mendelian randomization. (For interpretation of the
   references to color in this figure legend, the reader is referred to