Abstract
There is a need to define regions of gene activation or repression that
control human kidney cells in states of health, injury, and repair to
understand the molecular pathogenesis of kidney disease and design
therapeutic strategies. Comprehensive integration of gene expression
with epigenetic features that define regulatory elements remains a
significant challenge. We measure dual single nucleus RNA expression
and chromatin accessibility, DNA methylation, and H3K27ac, H3K4me1,
H3K4me3, and H3K27me3 histone modifications to decipher the chromatin
landscape and gene regulation of the kidney in reference and adaptive
injury states. We establish a spatially-anchored epigenomic atlas to
define the kidney’s active, silent, and regulatory accessible chromatin
regions across the genome. Using this atlas, we note distinct control
of adaptive injury in different epithelial cell types. A proximal
tubule cell transcription factor network of ELF3, KLF6, and KLF10
regulates the transition between health and injury, while in thick
ascending limb cells this transition is regulated by NR2F1. Further,
combined perturbation of ELF3, KLF6, and KLF10 distinguishes two
adaptive proximal tubular cell subtypes, one of which manifested a
repair trajectory after knockout. This atlas will serve as a foundation
to facilitate targeted cell-specific therapeutics by reprogramming gene
regulatory networks.
Subject terms: Histone post-translational modifications, Chronic kidney
disease, Epigenomics
__________________________________________________________________
Comprehensive integration of gene expression with epigenetic features
is needed to understand the transition of kidney cells from health to
injury. Here, the authors integrate dual single nucleus RNA expression
and chromatin accessibility, DNA methylation, and histone modifications
to decipher the chromatin landscape of the kidney in reference and
adaptive injury cell states, identifying a transcription factor network
of ELF3, KLF6, and KLF10 which regulates adaptive repair and
maladaptive failed repair.
Introduction
The cellular response to kidney injury causes gene expression changes,
leading to successful repair with restored function or failed tubular
epithelial repair that can progress to chronic kidney disease
(CKD)^[115]1,[116]2. Putative adaptive states with successful or
maladaptive characteristics were recently described^[117]3 in both the
proximal tubule (PT) and the thick ascending limb (TAL) epithelial
cells of the loop of Henle. Gene expression profiles found in these
adaptive states included epithelial to mesenchymal transition,
elevation of cytokine production, senescence, and downregulation of ion
and solute transporters required for normal physiological function of
these nephron segments. Expression of key injury, repair, and
progenitor markers were altered in these states^[118]3–[119]5. The
proportion of adaptive cell state signature was increased in
individuals with CKD or acute kidney injury (AKI) and correlated with
faster progression to end-stage renal disease^[120]3. Although these
studies provide important insights into gene expression changes in
kidney injury and repair, a critical need persists to define the
epigenetic regulation of gene expression as kidney cells progress
through states of health, injury, regeneration, or failed repair.
Activation or repression of gene transcription is regulated by changes
in the epigenetic landscape, including the sliding of nucleosomes to
create regions of open and closed chromatin, DNA methylation, and
histone modification^[121]6. Numerous investigations have uncovered
these epigenetic signals in kidney disease^[122]7–[123]9. Different
patterns of DNA methylation have been found in DKD with alterations in
tumor necrosis factor-alpha and other pro-fibrotic
genes^[124]10–[125]12. Putative regulatory regions have been identified
in diabetic kidney disease (DKD), renal cell carcinoma (RCC), and
polycystic kidney disease (PKD)^[126]11,[127]13,[128]14 using ATAC-seq
to determine the accessibility of chromatin across the genome. A broad
set of ATAC-seq data is now available for a variety of renal diseases,
including RCC and DKD^[129]11,[130]13,[131]15, and has also been
leveraged to infer tissue-specific causal regulation of GWAS
findings^[132]16. The snATAC-seq of kidney tissue has augmented our
understanding of the regulation of disease at the single-cell level. In
one study, reduced accessibility of glucocorticoid receptor binding
sites was observed in the proximal tubule of diabetic humans^[133]15.
Investigations have also examined chromatin accessibility in human
Papillary RCC (pRCC), kidney organoid
differentiation^[134]5,[135]17,[136]18, and in multiple murine
models^[137]19,[138]20. Each of these studies provides important
insights into epigenetic expression regulation in the kidney. However,
deeper insights would be gained by the integration of these
technologies, particularly with single-cell sequencing, spatial
transcriptomics, and sequencing to delineate histone modifications.
To accurately identify regulatory regions that orchestrate the
coordinated activation and silencing of gene networks in injury and
repair, we interrogated an overlapping set of human kidney tissue
samples with multiple technologies to establish an atlas of genome-wide
DNA methylation, histone modifications associated with active or
repressed regions in promoters and enhancers, open chromatin, gene
expression, and protein expression. These methods have undergone
rigorous QA/QC review and standardization^[139]21, and have been
optimized for relatively low cell inputs suitable for interrogation of
kidney biopsy samples. Orthogonal validation was attained at
single-cell resolution and with spatial localization. We identified
differential regulation of the adaptive cell state in the PT and TAL.
The kidney epigenomic atlas established here will prove a valuable
reference for future studies analyzing diseased tissue obtained from
biopsy specimens.
Results
Samples and quality control
To comprehensively understand the contributions of epigenomic features
to transcript expression regulation, we generated, integrated, and
aligned data from multiple orthogonal technologies performed on a
partially overlapping set of 25 unique specimens (including 23 unique
epigenetic samples, Fig. [140]1, Supplementary Data [141]1). Multiple
replicates and multiple technologies were run for some samples,
resulting in a total of 84 datasets. Epigenetic assays used included:
(1) WGBS from laser microdissected glomeruli (GLOM) and
tubulointerstitium (TI), (2) single nucleus multiome (10x Genomics
combined snATAC-seq and snRNAseq), (3) bulk ATAC-seq, and (4) CUT&RUN
for histone modifications of active chromatin (H3K27ac, H3K4me3,
H3K4me1) and repressive chromatin (H3K27me3) derived from kidney
cortex. A kidney chromatin landscape browser application was developed
to view these aligned datasets across the genome (10.48698/HHE6-YV15).
Spatial transcriptomic, regional transcriptomic, and regional proteomic
expression datasets complemented the epigenetic technologies.
Fig. 1. Study workflow.
[142]Fig. 1
[143]Open in a new tab
In an overlapping set of kidney samples, tissue was interrogated by
laser microdissection-guided whole genome bisulfite sequencing (WGBS),
by multiome single nucleus Assay for Transposase-Accessible Chromatin
for sequencing (snATAC-seq) and single nucleus RNA sequencing
(snRNAseq) after cell disaggregation, and by Cleavage Under Targets &
Release Using Nuclease (CUT&RUN) of kidney cortex. Datasets were
aligned in the human genome 38 to create an integrated epigenomic
atlas.
We established extensive quality control assessments for the generated
datasets to determine technical and biological reproducibility,
coverage, batch effect, and assay drift. For WGBS, samples clustered as
expected based on the region dissected (glomeruli and the
tubulointerstitium) with an average coverage of 25×, and a mean of
22,156,845 CpG sites (range: 19.9–25.7 million sites) mapped per sample
with calculated methylation, representing 81% of all CpG sites and >99%
of annotated CpG islands within the human genome hg38 (Supplementary
Fig. [144]1). The CUT&RUN data was highly reproducible and showed
strong correlation across samples (Supplementary Fig. [145]2) and
overlap with the ENCODE ChIP-seq and DNAse-seq datasets, providing
additional validation of our approach in human kidney (Supplementary
Fig. [146]3). The multiome dataset also showed robust analytical and
biological reproducibility with ~2400 average genes/cell detected,
expected cell type distribution, and 4.8 mean transcriptional start
sites (TSS) enriched per cell (Supplementary Fig. [147]4).
Integrated whole epigenome alignment
After demonstrating satisfactory data quality, we examined the
genome-wide alignment of the open chromatin signatures of the whole
kidney by first using a publicly available ATAC-seq dataset^[148]22
with histone marks and methylation patterns (Fig. [149]2). These
modalities aligned across the genome. For example, for PODXL, a gene
expressed in podocytes, there was a strong methylation dip (reduction
or absence of DNA methylation) that coincided with an ATAC-seq open
chromatin peak, and the H3K4me3 promoter mark near the TSS
(Fig. [150]2a). Furthermore, two intronic ATAC-seq peaks and a peak in
the final exonic region aligned with H3K27ac active chromatin marks,
H3K4me1 enhancer marks, and glomerular methylation dips. This
multimodal approach allowed more detailed insights into mechanisms and
gene regulatory regions that define PODXL expression. We also confirmed
this correspondence in epigenetic regulation for proximal tubule and
TAL-specific genes PDZK1 and CASR, respectively.
Fig. 2. Alignment of epigenomic features in bulk and regional human kidney
samples.
[151]Fig. 2
[152]Open in a new tab
a Epigenomic features of marker genes for glomerulus (PODXL), proximal
tubule (PT-S12, PDZK1), and thick ascending loop of Henle (C-TAL,
CASR), displaying: (1) DNA methylation (DNAm) in the tubulointerstitium
(TI) (N = 15) and glomerulus (GLOM) (N = 15), (2) bulk CUT&RUN for four
histone modifications: H3K27ac (N = 6), H3K27me3 (N = 10),
H3K4me1 (N = 3), H3K4me3 (N = 3), and (3) assay for
transposase-accessible chromatin using sequencing (ATAC-seq) on bulk
tissue (Encode ENCSR297VGU) (N = 1). Gray stripes indicate active
promoters wherein ATAC-seq peaks at transcriptional start sites
coincide with DNAm dips, and H3K4me3 peaks. Variable H3K27ac peaks
reflect compartments’ proportion within bulk tissue. b Differential
methylation between GLOM and TI kidney compartments for summative
methylation of 30,024 gene bodies P value < 0.05 by t test. c Best-fit
regression model of methylation and mRNA expression in identical
samples (N = 22) for differentially expressed genes (N = 5408) between
the GLOM and TI. Each dot represents a gene. Y axis is the Log[2] fold
change of mRNA between the GLOM and TI. X axis is the Log[2] fold
change of methylation between the GLOM and TI. The best-fit annotated
gene region (summative promoter, exon, intron, of CpG island
methylation) with the most negative correlation was identified as the
promoter for 1867 genes and CpG island for 1327 genes. The inset
represents methylation fold change distribution in annotated gene
regions. d Genomic region annotation criteria based on epigenetic
landscape. e Landscape correlation agreement between datasets (Fisher’s
exact test two sides). f Histone markers and spearman correlation with
snRNAseq expression in the PT-S12 and C-TAL and in the regional mRNA
expression of the microdissected TI. g Cell type deconvolution of
CUT&RUN for H3K27ac, H3K4me1, H3K4me3 active histone modifications at
promoters. The RNA signature was taken from the HuBMAP/KPMP atlas
snRNAseq (N = 36), using the 10% most DE marker genes. h Upset plot
depicting overlap in peaks of H3K27ac, H3K4me1, H3K4me3, and H3K27me3
with DNAm dips across the genome. Active promoters, predicted
enhancers, and repressed regions are annotated. i Heatmap of CUT&RUN
marks across the genome by annotated region after filtering for open
chromatin and DNA methylation dips. The figure uses a licensed stock
image adapted from Adobe Illustrator (Eadon et al. - stock.adobe.com).
We next assessed regulation in key kidney structures by first examining
differential methylation patterns between glomeruli and the
tubulointerstitium (Supplementary Data [153]2). As expected, we
observed an inverse relationship between gene expression and
methylation in these structures; genes expressed in glomeruli (e.g.,
WT1) were methylated in tubules and tubule-expressed genes (MIOX) were
methylated in glomeruli (Fig. [154]2b). Similar trends were observed
for promoter methylation, summative intronic methylation, exonic
methylation, CpG island methylation, and methylation in the 5’ and 3’
untranslated regions (UTR); however, genes were not differentially
methylated to the same extent in these gene regions (Supplementary
Fig. [155]1n). Given the variability in differential gene region
methylation, we sought to determine the key regulatory regions where
methylation contributes to differential expression within the kidney.
To accomplish this, we acquired mRNA and protein expression (by mass
spectrometry) from the same microdissected samples as those
interrogated with WGBS (glomeruli and tubulointerstitium). Univariate
comparisons of expression with summative gene region methylation were
performed for all differentially expressed genes (DEGs, n = 5408) and
proteins (DEPs, n = 1796) between the glomeruli and tubulointerstitium
(Supplementary Fig. [156]5). We applied a best-fit model (overall
C = 0.58) of expression and methylation across all DEGs including the
region with the highest c-statistic (a measure of model fit) for each
individual gene (Fig. [157]2c). Summative promoter region methylation
best-explained mRNA expression in 1867 of the 5408 DEGs, while CpG
island methylation best-explained expression in 1327 genes. Albeit
non-canonical, intronic (n = 1143) and exonic (n = 1071) methylation
best correlated with expression in a subset of DEGs, but these genes
had a lower overall correlation between methylation and expression.
Similar correlation patterns were observed in methylation and protein
expression regressions (Supplementary Fig. [158]5). We hypothesized
that summative methylation from regions defined by CUT&RUN histone mark
peaks might also correlate with mRNA expression. The strongest
correlation between methylation and mRNA was found for regions in which
methylation was defined by enhancer H3K4me1 peaks (C = 0.45), but this
did not outperform the best-fit model of Fig. [159]2c. Even in the
best-fit model, the correlation was 0.58, suggesting features beyond
methylation contribute significantly to expression regulation.
We assayed genome-wide DNA methylation coexistence with CUT&RUN histone
marks in the kidney cortex and bulk kidney ATAC-seq peaks. Across the
genome, open chromatin regions were positively associated with active
chromatin marks (H3K27ac, H3K4me1, H3K4me3), and TI methylation dips,
whereas the silencing histone mark (H3K27me3) displayed a negative
association with these features (Fig. [160]2d, e). These patterns
suggest conserved features of epigenomic regulation across the genome.
Epigenomic features at gene promoters were then compared with mRNA
expression using laser microdissection-guided tubulointerstitial
expression^[161]23 or pseudobulk snRNAseq expression of the PT (S1 and
S2) and cortical TAL (C-TAL)^[162]3. The active chromatin marks
(H3K4me3, H3K4me1, and H3K27ac) positively correlated with expression,
while silencing marks were negatively correlated (Fig. [163]2f). We
justified these comparisons because the most abundant cell types of
bulk tissue were the PT and cortical TAL cells and the
tubulointerstitium accounts for the vast majority of these cell types.
We performed cell type deconvolution of the histone modifications
within promoters and identified major contributions from the PT, TAL,
and interstitial cell type signatures within the cortex histone data
(Fig. [164]2g). Similarly, bulk ATAC-seq peaks display the strongest
association with active promoters and predicted enhancers associated
with PT specific genes (Supplementary Fig. [165]6a). Approximately half
of all TI DNA methylation dips overlapped with bulk ATAC-seq peaks, but
very few ATAC-seq peaks overlapped in regions of elevated DNA
methylation (Supplementary Fig. [166]6b, c). Further, TI DNA
methylation dips that overlapped ATAC-seq peaks showed stronger histone
modification patterns than those that lacked overlapping ATAC-seq peaks
(Supplementary Fig. [167]6d). We quantitatively examined the overlap
between TI DNA methylation dips with cortex histone modifications at
bulk ATAC peaks. We identified a strong overlap at peaks associated
with epigenetic patterns indicative of active promoters (H3K4me3+,
H3K27ac+, and H3K4me1+/−) but less overlap with peaks displaying active
enhancer (H3K4me1+ and H3K27ac+) and repressed region (H3K27me3)
histone marks (Fig. [168]2h). To enhance specificity to TI, bulk ATAC
peaks were filtered for those overlapping TI DNA methylation dips and
we examined the binding patterns of promoter H3K4me3, enhancer H3K4me1,
active H3K27ac, and silencing H3K27me3 histone modifications in
distinct genomic regions (Fig. [169]2i). As expected, H3K4me3 localized
strongly to promoters which also display H3K27ac binding, indicating
that the majority of the promoters selected by the overlap of bulk
ATAC-seq peaks and TI DNA methylation dips are in the active state.
H3K4me1 is expectedly localized outside of promoters (within predicted
extragenic enhancers, exonic, and intronic peaks) together with
H3K27ac, indicating that the majority of the remaining ATAC peaks with
DNA methylation dips display active enhancer features. This integrative
analysis combining bulk tissue ATAC and cortex CUT&RUN histone data
with region-specific DNA methylation demonstrates the ability to use
bulk-level data to augment region-specific data and gain additional
insights into region-specific epigenetic landscape information.
Cell-type specific open chromatin in the reference cell state
We leveraged the 10x multiome data to better understand the cell
type-specific open chromatin signature across the kidney’s diverse
range of cell types. The multiome dataset included 47,217 nuclei
derived from 12 samples, which partially overlapped with those used in
WGBS and CUT&RUN (Fig. [170]3). The multiome atlas yielded 72 distinct
cell types based on a pre-existing human kidney atlas^[171]3. As a
control, we examined the alignment of expression and open chromatin in
known marker genes of the podocytes (POD), proximal tubules (PT), and
cortical thick ascending loop of Henle (C-TAL) in Fig. [172]3c. The
area under-the-curve (AUC) of the TSS was highest in the POD for PODXL
and NPHS1, while the TSS AUC was highest for LRP2 and SLC5A2 in the PT.
The regulatory patterns of PDZK1 (expressed in PT), ESRRG (expressed in
C-TAL), and PODXL (expressed in POD), align across snATAC-seq, WGBS in
the GLOM and TI, and the 4 histone modifications (Fig. [173]3d–f). For
example, in PDZK1, open chromatin was differentially accessible in the
PT (combined S1 and S2) with the TSS peak aligning with a TI methyl
dip, and a peak in H3K27ac with an H3K4me3 promoter peak in close
proximity. Two additional upstream (1–10 Kb) multiome peaks also
aligned with methyl dips and H3K27ac peaks, albeit with H3K4me1
enhancer peaks. In ESRRG, the first region highlighted in red denoted a
CpG island at the TSS with a methyl dip that aligned with open
chromatin of only the C-TAL cell type as well as the H3K4me3 promoter
and H3K27ac activation marks. A small peak in H3K27me3 also aligned,
likely due to the bulk nature of the CUT&RUN assay.
Fig. 3. Single-cell epigenomics in health.
[174]Fig. 3
[175]Open in a new tab
The multiome reduction by uniform manifold approximation and projection
(UMAP) of 47,217 nuclei in 72 clusters and 12 samples aligning with the
HuBMAP/KPMP atlas is depicted for a snRNA-seq and b snATAC-seq assays.
Highlighted clusters include podocytes (POD), proximal tubule (PT) S1,
S2, S3 and adaptive PT (aPT), cortical thick ascending loop of Henle
(C-TAL), and adaptive TAL (aTAL1 and aTAL2) cells. c Gene markers for
POD, PT-S1 merged with S2 (PT-S12), and C-TAL reveal cell type
specificity of expression and chromatin accessibility. Dot plot (top)
reveals transcript expression. Bar graph (bottom) represents the area
under the curve (AUC) for open chromatin coverage summed across the
entire gene. Genes d PDZK1, e ESRRG, and f PODXL align across the
snATAC-seq, WGBS, and CUT&RUN technologies. snATAC-seq peaks are
displayed in tracks 1–3 with respective RNA expression (N = 12) in
adjacent violin plots. DNAm levels (track 4) and differential DNAm
(track 5) are depicted for the GLOM (N = 15, blue) and TI (N = 15,
red). Histone marks for CUT&RUN are found in track 6–9: H3K27ac
(N = 10), H3K27me3 (N = 6), H3K4me1 (N = 3) and H3K4me3 (N = 3),
respectively. Aggregate open chromatin regions (track 10), and
chromosome coordinates are below. Differentially accessible open
chromatin peaks are annotated as red stripes (coincides with CpG
island) or gray (no CpG island). Cohen’s Kappa (CK) agreement between
snATAC-seq peaks with DNAm dips (g) or H3K4me3 peaks (h) in the 2194
differentially expressed genes of POD, PT-S12, and C-TAL. Perfect
agreement (disagreement) is 1 (−1), ranging from CK [0,0.2] for no
agreement to CK [0.8,1] for perfect agreement. Average CK across all
genes: G[T]. Most peaks positively correlated between technologies.
Smaller genes have stronger correlation. i Association of DNAm &
histone marks with open chromatin. The correlation of open chromatin
peaks with DNAm dips, and histone mark peaks is given for
differentially expressed genes of the POD, PT-S12, and C-TAL (Fisher’s
Exact test two sides). j Upset plot shows the intersection of CUT&RUN
peaks and DNAm dips across snATAC-seq PT-S12/aPT peaks, identifying
regulatory regions for PT. k Heatmap of CUT&RUN marks in PT-S12/aPT in
each annotated region after filtering for open chromatin and DNAm dips.
To confirm the alignment between the snATAC-seq peaks with the other
technologies, we selected 2194 DEGs from POD, PT, and C-TAL. The
regions of each gene were coded as “peak” and “absent peak” regions for
the snATAC-seq and H3K4me3 datasets, and as “dip” or “absent dip” for
TI WGBS. A comparison across technologies using the Cohen’s Kappa test
was performed to quantitate alignment of peaks across all chromatin of
the DEGs (Fig. [176]3g, h, Supplementary Data [177]3). Alignment ranged
from −1 for perfect disagreement to 1 for total agreement. The
agreement was strong between technologies in the DEGs with >99% of
genes having a positive Cohen’s Kappa value and the strongest agreement
was seen in smaller genes. In Fig. [178]3i, the correlation plot
calculated by Fisher’s exact test showed agreement between all
technologies, with the exception of the H3K27me3 silencing mark, which
is expected. These results were in agreement with those from publicly
available snATAC-seq data generated by the Humphreys lab^[179]5
(Supplementary Fig. [180]7b, d). Agreement was also seen with
differentially upregulated regions of DNA methylation in the glomerulus
and the corresponding expected TI gene expression (Supplementary
Fig. [181]7c), as well as with other technologies (Supplementary
Fig. [182]7e).
We explored the overlap of regulatory regions within the snATAC-seq
peaks of the combined PT-S1 and PT-S2 clusters (PT-S12) (Fig. [183]3j).
DNA methyl dips frequently coincided in regions of active promoters,
active enhancers, and repressed promoters. Upon filtering the
snATAC-seq peaks for regions with DNA methyl dips, peaks of H3K4me3
marks were strongly associated and centered upon known promoter
regions. In contrast, H3K4me1 peaks were centered in regions outside of
promoters. In both cases, H3K4me3 and H3K4me1 peaks strongly correlated
with H3K27ac enrichment but not H3K27me3 (Fig. [184]3k). This
integrative analysis resulted in a PT-S12 specific snATAC-seq peak map
with DNA-methylation-and-histone-modification-landscape information for
each peak, thereby providing the ability to functionally define each
open chromatin region.
We sought to understand the specificity of multiome signals in cell
subtypes of the atlas, comparing expression and open chromatin across
the PT-S1, PT-S2, and PT-S3 cell clusters. The PT shared expression of
common gene markers (LRP2, CUBN, SLC5A12) across all three segments;
however, unique gene markers specific to each proximal subsegment were
also identified (Supplementary Fig. [185]7f, g). For example, PRODH2
was most highly expressed in the PT-S1 with differential accessibility
(DA) in its TSS and 3’ UTR. SLC34A1 (Sodium phosphate co-transporter
2a) was highly expressed in the PT-S2, with DA seen in most peaks.
Finally, SLC7A13 (Aspartate/Glutamate Transporter 1) was expressed most
highly in the PT-S3 with DA of its TSS and other intragenic regions
(Supplementary Fig. [186]7h–j). Through these analyses, we have
demonstrated that the multiome dataset contains the resolution to
identify PT subtypes. Additionally, expected open chromatin patterns
were seen for the distal convoluted tubule (SLC12A3), principal cells
(AQP2), and endothelial cells (PECAM1) in known marker genes
(Supplementary Fig. [187]8).
Adaptive cell state in the proximal tubule
Recently, important and varied cell states associated with injury have
been annotated in a comprehensive atlas of the kidney^[188]3. One such
cell state is the adaptive state, found in multiple epithelial cells of
the kidney, including the PT and C-TAL. The adaptive cell state refers
to the balance of adaptive repair with maladaptive epithelial to
mesenchymal transition. The state is defined by a loss of expression of
canonical markers (like PDZK1 or SLC5A12 in the PT) with upregulation
of injury markers such as ITGB3, PROM1, and TPM1.
We used the multiome data to identify a corresponding set of 4194 DEGs
between PT-S12 and the aPT, in alignment with that of the published
kidney atlas^[189]3 (Fig. [190]4a, Supplementary Data [191]4). The
diffusion map (Fig. [192]4b) reveals the developmental trajectory of
the PT, from PT-S12 to the adaptive PT (aPT) cell state with feature
plots indicating the loss of canonical genes and upregulation of injury
markers over the pseudotime transition to the aPT phenotype
(Fig. [193]4c).
Fig. 4. Adaptive cell state in the proximal tubule.
[194]Fig. 4
[195]Open in a new tab
a Differentially expressed genes (DEGs) between the PT-S12 and aPT cell
types within the multiome atlas (N = 12), 4194 genes with
Bonferroni-adjusted P value < 0.05 (Wilcox test). b Diffusion map of
PT-S1, PT-S2 and aPT with 13,241 nuclei. The inset shows the pseudotime
trajectory from PT to aPT. c Gene expression localization in aPT cells
(ITGB3, PROM1, TPM1) for aPT marker genes. Canonical PT markers (PDZK1,
SLC5A12, and RXRA) localize to the PT-S12. d Differentially accessible
(DA) peaks N = 10,506, Bonferroni-adjusted P value < 0.05 (LR test)
between the PT-S12 and aPT (from multiome TRIPOD-seurat-aPTxPT-S12
object that coincide with TI DNA methylation (DNAm) dips (N = 15).
Red = promoter dip, blue = dip outside promoter. e DA multiome peaks
with DNAm dips and a CUT&RUN histone mark peak (N = 22). Active
promoter (Act[pro]) = green, predicted enhancer (Pred[enh]) = blue,
repressed promoter (Rep[pro]) = red, N = 6607, Bonferroni-adjusted P
value < 0.05 (LR test). f DA peaks (N = 2557) in upregulated DEGs of
the PT-S12, targeted by a transcription factor (TF) in TRIPOD analyses.
Orange dots represent new peaks (NP) in the PT-S12, where fewer than 2%
of aPT nuclei had open chromatin Bonferroni-adjusted P value < 0.05, LR
test). Gray dots display DA peaks present in both PT-S12 and aPT. All
DA NP were found in PT-S12 cells for DEGs upregulated in the PT-S12. g
DA peaks (N = 4573) in upregulated DEGs of the aPT. Green = aPT NP.
Gray = DA peak present in both PT-S12 and aPT.
We observed 137,993 peaks (labeled from 1 to 137,993 with gene name and
peak number) within the whole genome of the PT-S12 and aPT clusters
(Supplementary Data [196]5). Upon annotation, 21,790 peaks were mapped
to within 5 kb of 2 or more genes. From the annotated peaks, 49,662
coincided with a TI DNA methylation dip (Fig. [197]4d). The
differential accessibility was queried between all peaks in the PT-S12
and aPT genes, revealing 10,506 as DA, with 3,112 occurring in promoter
regions. In Fig. [198]4e, 27,174 peaks coinciding with DNA methyl dips
were categorized as active promoters (Act-Pro), predicted enhancers
(Pred-Enh), and repressed promoters (Repr-Pro) based on histone
landscape information and proximity to the TSS. Of these, 6607 were DA
peaks. PT peaks aligning with a DNA methyl dip were more likely to be
found in a CpG island (Supplementary Fig. [199]9). PT peaks aligning
with enhancer and repressed promoter marks, but not active promoter
marks, were more likely to occur in CpG islands. The relationship was
maintained after filtering for DNA methyl dips (Supplementary
Fig. [200]9a).
Transcription factors (TF) are proteins that bind to specific DNA
sequences and regulate gene transcription by activating or repressing
expression. We identified the transcriptional pathways that are
associated with differentially regulated regions in chromatin in aPT
cells using the TRIPOD package^[201]24. This permitted the
identification of regulatory trios of TF expression, target gene
expression, and target gene DA in a multiomic dataset (Supplementary
Data [202]6). If a TF DNA binding motif is enriched at a particular
peak, this suggests that the TF is likely regulating target gene
expression. If a particular histone modification is enriched, it
suggests that the modification is involved in regulating the
accessibility of the target gene chromatin. We analyzed 96 DEGs meeting
all criteria (DA, DEGs, TF binding in TRIPOD, DNA methyl dip, histone
modification present), with 39 DEGs upregulated in PT-S12 and 57 DEGs
upregulated in the aPT. The analysis revealed 68 distinct pathways,
organized into 40 clusters (Supplementary Fig. [203]9e, Supplementary
Data [204]7), with the top cluster being platelet-derived growth factor
binding, which is important in myofibroblast differentiation. Other key
pathways included stress fiber regulation and integrin complex
cell-matrix adhesion.
The multiome dataset defines cells by both expression and open
chromatin. Resultantly, we sought to exploit this opportunity to
identify the occurrence of unique “new peaks” (NP) in the aPT cell
state and PT-S12 reference cell state, both in promoter and
non-promoter regions of DEGs. A new peak was defined by the absence of
open chromatin in the comparative cell state (i.e., a new peak in
the aPT is a DA peak defined by fewer than 2% of cells in the PT-S12
possessing open chromatin). DA peaks occurred in 3095 regions of
upregulated DEGS in PT-S12 cells and were associated with TFs
identified by TRIPOD (Fig. [205]4f) with 380 new peaks. New peaks were
only found in the PT-S12 for DEGs upregulated in the PT-S12. The
converse was true in the aPT (Fig. [206]4g). Upregulated DEGs in the
aPT contained 7535 DA peaks with an associated TF, of which 455 were
new peaks not occurring in the PT-S12. The distribution of peaks and
new peaks was explored across DNA methyl dip regions and histone
modifications (Supplementary Fig. [207]10). New peak regions were more
likely to occur in introns, regardless of co-occurrence with DNA methyl
dips or enhancer, active, or repressed marks.
Two exemplar genes demonstrate new peak regulation (Fig. [208]5a, b).
The two genes have 7 NP that are associated with 98 transcription
factors (TFs), of which 9 are differentially expressed, as shown in
Fig. [209]5c. In the PT-S12, SLC5A12 (Fig. [210]5d) was upregulated
with two new DA peaks located at 83830 (promoter) and 83824
(neighboring ANO3 gene). The 83824 peak coincided with active enhancer
marks (H3K4me1 and H3K27ac) and was regulated by the TF / DEG RREB1 and
other TFs. PROM1 was upregulated in the aPT cell type (Fig. [211]5e)
with 5 new peaks spanning the promoter, exonic, and intronic regions.
Its TFs included PRDM1 which targeted 3 PROM1 new peaks. PRDM1 plays a
pivotal role in regulating stem cell differentiation and tissue
morphogenesis, including regulating injury signals in glomerular
endothelial cells^[212]25. Other TFs regulating PROM1 peaks included
ZEB1, KLF5, and EGR1.
Fig. 5. New peaks in reference and adaptive cell states.
[213]Fig. 5
[214]Open in a new tab
a, b Gene alignment of SLC5A12, a PT-S12 marker, and PROM1, an aPT
marker, across snATAC-seq peaks (N = 12), DNAm TI dips (N = 15), and
CUT&RUN histone marks H3K27ac (N = 10), H3K27me3 (N = 6), H3K4me1
(N = 3), and H3K4me3 (N = 3). Red stripe indicates new peaks (NP) with
transcription factor (TF) binding. c Differentially expressed TFs of
the aPT and PT-S12 which target NP of PROM1 and SLC5A12. d, e Chord
diagram of SLC5A12 and PROM1, respectively, with TFs (TRUE = positive
binding or DNAm Dip and FALSE = no binding or no DNAm Dip).
Transcription factor networks in the proximal tubule
We sought to explore the regulatory landscape of the proximal tubule in
reference to adaptive states. Multiple TFs displayed increased
expression in the aPT (Fig. [215]6a), with concurrently increased open
chromatin within a subset of these genes, such as KLF6, ELF3, SOX4,
KLF10, and NFKB1. Open chromatin was calculated across each TF by
including the TSS and all other peaks within 5 kb of the TSS. We
conducted a TRIPOD analysis of DA peaks within the aPT. The analysis
revealed ELF3 as a candidate regulatory TF of the aPT state due to its
predicted regulation of multiple aPT-specific genes through key
functional regulatory regions defined by DNAm and the histone
landscape. Through this exploration, we discovered a predicted TF
regulatory network specific to the aPT involving ELF3, KLF6, and KLF10
(Fig. [216]6b).
Fig. 6. Regulation of adaptation in the proximal tubule (PT) in the multiome
atlas.
[217]Fig. 6
[218]Open in a new tab
a Mean expression and area under the curve (AUC) of summative open
chromatin for transcription factors (TF) of the adaptive proximal
tubule (aPT) and PT-S1 and S2 (PT-S12) in 12 samples. Bold font
indicates expression upregulation in the aPT (negative binomial exact
test, p < 0.05 after Bonferroni and average Fold Change >0.25
Supplementary Data [219]4). b TF network defined by the TRIPOD[1]
method wherein ELF3, KLF6, and KLF10 cross-regulate each other, and two
genes upregulated in the aPT (ITGB3 and TPM1). Edge thickness
represents the number of peaks predicted in the interaction. c, d
Alignment of epigenomic features in TPM1 and KLF6 for the aPT and
PT-S12. Red stripe indicates a peak with predicted TF binding by ELF3.
Co-accessibility scores were correlated with gene expression, peak
accessibility by Signac, DNAm in the tubulointerstitium (TI), and
histone marks. TF Peaks are numbered and correspond to (Supplementary
Data [220]5). e Pseudotime trajectories from PT-S12 to aPT for the
expression and activity of TFs ELF3, KLF6, and KLF10 with target gene
expression. X axis: pseudotime, Y axis: z score of transformed values
based on the standard deviation of the mean. TF motifs are provided to
the right. f Representative spatial transcriptomic mapping (N = 3) in a
healthy reference, injured cortex, and injured medulla. SLC12A1 defines
the corticomedullary distribution (including medullary rays). TPM1 and
ELF3 expressions are upregulated in the injured cortex. TF activity of
ELF3 is present only in the cortex (in PT dominant spots) and is
upregulated in the injured cortex.
For context, ELF3 is known to regulate Smad3-induced fibrosis in
podocytes of diabetic kidneys^[221]26. Both ELF3 and KLF6 were
predicted to regulate AQP2 expression in the collecting duct of
mice^[222]27,[223]28. KLF6 has been implicated in multiple murine
models of AKI, by impairing branched-chain amino acid (BCAA) catabolism
in aristolochic acid I^[224]29, by enabling pyroptosis in septic
AKI^[225]30, and in mediating apoptosis and inflammation in
ischemia^[226]31. KLF10 has been shown to mediate epithelial to
mesenchymal transition in the kidney^[227]32 and its knockdown reduces
DKD-related fibrosis^[228]33.
ELF3 is predicted to bind ITGB3 at 2 snATAC-seq peaks and TPM1 at 1
promoter-associated snATAC-seq peak, implying direct regulation of
their expression. The regions ELF3 is predicted to bind in ITGB3 were
specific active enhancer (H3K4me1+/H3K27ac+) peaks (116237 and 116240,
Supplementary Fig. [229]10a), indicating regulation via enhancer
activity rather than direct promoter activity. For TPM1, a new peak
(106234) was noted just upstream of the TSS which was also a DNA methyl
dip and CpG island, with histone modifications slightly offset from the
center of the snATAC-seq peak (Fig. [230]6c). An active promoter
signature aligned with the second TPM1 peak (106235); however, the
amount of open chromatin was not different between the PT-S12 and the
aPT, suggesting regulation in the aPT cell state is occurring within
the promoter peak 106234. Of note, the ELF3-regulated peaks also
display H3K27me3 modification in the kidney cortex. This could be due
to the bulk nature of the data or could indicate regions that are
bivalently regulated for rapid transcriptional regulation switches.
ELF3 is also predicted to bind 5 snATAC peaks upstream of the KLF6
promoter, all of which also display enhancer histone modifications
(Fig. [231]6d) and 3 peaks outside of KLF10 (Supplementary
Fig. [232]10b). Analysis using scMEGA (1.0.1) predicts that both KLF6
and KLF10 in turn regulate ITGB3 (Supplementary Fig. [233]10d,
Supplementary Data [234]8). All 5 peaks in KLF6 corresponded to active
enhancer regulatory regions. One of the peaks in KLF10 (67206,
upstream) was a DNAm dip with repressed enhancer marks, while the
others (downstream, 67188 and 67191) were active enhancers with
corresponding H3K4me1 and H3K27ac peaks. Together, the TFs ELF3, KLF6,
and KLF10 were predicted to be involved in an aPT-specific regulatory
network with cross-regulatory relationships between themselves and
evidence of direct regulation of the aPT target genes TPM1 and ITGB3
through binding at specific promoters and enhancers (Fig. [235]6b).
We further explored the pseudotime and localization features of these
TFs using the scMEGA package. Since it defines more permissive
relationships than TRIPOD (which requires a three-way association of TF
expression, target gene DE and target gene DA), we validated the method
by identifying the same relationships in scMEGA with additional aPT
target genes (PROM1, VCAM1) regulated by ELF3, KLF6, and KLF10
(Supplementary Fig. [236]10d). Chromatin accessibility, gene
expression, and TF activity are depicted in Supplementary
Fig. [237]10d. TF activity, target gene expression, and TF expression
were visualized as relationships to the pseudotime trajectory of
Fig. [238]4b for KLF6, KLF10, and ELF3 (Fig. [239]6e). These
relationships showed strong alignment with their aPT target genes for
each associated TF motif. A comprehensive TF activity matrix, filtered
to retain interactions between TFs and target genes with a correlation
of >90% is provided (Supplementary Data [240]9).
Spatial transcriptomics (ST) facilitated localization of target gene
expression, TF expression, and TF activity (Fig. [241]6f). TF activity
refers to a composite vector for a gene regulatory network (GRN) and
all target gene expression, chromatin accessibility, and TF expression.
We mapped these features for ELF3 in a reference nephrectomy with
cortex and medulla, as well as two diseased biopsy samples, one cortex
and one medulla. Expression of SLC12A1, the sodium-potassium-2 chloride
(NKCC) transporter in the TAL, demarcates the regions of medulla or
medullary rays within the nephrectomy and differentiates the regions of
the two biopsy samples. In the second row of Fig. [242]6f, we show the
expression of TPM1, which is upregulated in the injured cortex, but
less so in the medulla. ELF3 expression and activity were both
upregulated in the injured cortex, but not the medulla. In reference
tissue, baseline ELF3 transcription factor activity is confined to the
cortex. Taken together, our results demonstrate that TF activity
matrices and spatial transcriptomics are powerful tools for identifying
transcriptional regulatory networks in spatially resolved tissues. ELF3
TF activity was specifically localized to cortical regions of abundant
PT-S12. These findings have important implications for understanding
the molecular mechanisms underlying tissue injury and repair.
Perturbation analysis of the proximal tubule
To determine whether the TF network of ELF3, KLF6, and KLF10
is integral to or associative with the PT adaptive cell state,
Celloracle^[243]34 in silico knockout of these 3 TFs was performed
individually and in combination (Fig. [244]7). Nine subclusters of the
PT-S1, PT-S2, and aPT were identified and their connections were
modeled (Fig. [245]7a–c). Combination knockout of ELF3 (expression
shown in Fig. [246]7d), KLF6, and KLF10 significantly disrupted the
trajectory path of both the PT-S1 and PT-S2 transition to the aPT
(Fig. [247]7e, f). Comparing the individual knockouts of each gene, the
KLF6 knockout perturbed the largest number of aPT target genes with a
similar trajectory disruption to the combined knockout (Supplementary
Data [248]10, Supplementary Fig. [249]11). ELF3 knockout also yielded a
trajectory disruption between the PT-S12 and aPT, with expression
changes of additional aPT target genes over the individual KLF6
knockout. ELF3 knockout is predicted to reduce KLF6 expression. Thus,
the combined effects are greater than either individual knockout and
consistent with the cross-regulatory relationship we discovered for
these TFs using TRIPOD and scMEGA. Isolated KLF10 knockout affected the
aPT trajectory to a lesser extent than either KLF6 or ELF3, suggesting
redundancy with these genes.
Fig. 7. In silico perturbation knockout in the proximal tubule (PT) (N = 12).
[250]Fig. 7
[251]Open in a new tab
a Cell type distribution of PT-S1, PT-S2, and adaptive PT (aPT). b
Partition-based graph abstraction (PAGA) shows the connectivity of 9
subclusters in Louvain annotation. c Cell type annotation distribution
across the 9 subclusters. d ELF3, a TF targeting multiple aPT marker
genes, is expressed in PT clusters before combined knockout of ELF3,
KLF6, and KLF10, but is reduced in expression after knockout. e Cell
velocity combined with the pseudotime plot showing the cell flow from
PT-S1 and PT-S2 to the aPT cell state. f The cell flow after in silico
knockout revealing disruption of the trajectory. g Gene expression in
nine subclusters of selected PT-S12 marked genes (blue) and 50 top
differential aPT markers before and after combined knockout
Benjamini-Hochberg adjusted P value < 0.05 (Wilcox test). h Expression
of selected aPT marker genes with predicted TF binding from ELF3, KLF6,
or KLF10 before and after combined knockout.
The combined knockout had minimal effect on canonical marker genes of
the PT-S12, but did reduce expression of multiple aPT genes
(Fig. [252]7g, h) with accessible chromatin and predicted binding by
ELF3 and KLF6 (top 50 genes shown). The two aPT subclusters (aPT-A and
aPT-B) demonstrated different behavior in response to the knockout. The
aPT-A sub-cluster maintained higher expression of PT-S12 genes prior to
knockout, but also showed less change in expression of injury genes in
response to the knockout when compared to the aPT-B sub-cluster. In
response to knockout, the aPT-B sub-cluster had increased PT-S12 gene
expression (e.g., SLC13A3, SLC34A1) and reduced aPT gene expression.
Both the aPT-A and aPT-B subclusters had reduction in aPT marker gene
expression or restoration of PT-S12 gene expression after knockout;
however, a greater proportion of the aPT-A cells manifest a potential
repair trajectory after combined knockout. Together, these data suggest
ELF3, KLF6, and KLF10 mediate progression to the aPT cell state.
In a cell culture model, normal human proximal tubular kidney (NHPTK)
cells were transfected with siRNA targeting the three TFs individually
and in combination (Supplementary Fig. [253]11f). Target gene
expression of TPM1 and VCAM1 were reduced in the combined knockout,
supporting the disruption of the aPT trajectory observed in the in
silico analysis. Individually, ELF3 knockdown led to modest reductions
in KLF10, TPM1, and ITGB8 expression. KLF10 knockdown led to reductions
of ELF3, TPM1, and VCAM1. Individual KLF6 knockdown led to increases in
ELF3 and ITGB8 expression, which were not seen in the combined
knockdown, perhaps supporting the approach to evaluate these genes as a
TF network.
Dysregulation of the thick ascending loop of Henle in the adaptive state
In addition to the adaptive state being present in proximal tubule
epithelial cells, it has also been identified in TAL epithelial cells
of the Loop of Henle^[254]3. We sought to determine how the regulatory
features of the adaptive TAL (aTAL) cell corresponded or differed from
the aPT. Overall, there were fewer TAL cells than PT cells in the
atlas. Nonetheless, patterns of DEGs between the cortical TAL (C-TAL)
and aTAL were preserved between epithelial cells. aTAL signatures
revealed loss of canonical markers like SLC12A1, ESRRG, UMOD, and EGF
(Fig. [255]8a–c, Supplementary Data [256]11). We observed 3152 DEGs,
with upregulation of 2485 for C-TAL and 667 for aTAL. ELF3 was
differentially expressed in the aTAL, but to a lesser extent than in
the aPT. Pro-fibrotic genes were upregulated in the aTAL including
ZEB1, LAMC2, and SMAD3. FHL2 is known to mediate podocyte
dedifferentiation and glomerular basement membrane thickening in
diabetic kidney disease and is pro-fibrotic in the TI through the
beta-catenin pathway^[257]35–[258]37, aligning with extracellular
matrix-associated pathways enriched in the aTAL enrichment analysis for
top 300 DEGs. Pathways showed significance with Bonferroni-adjusted P
values < 0.05 in enrichment tests (Fig. [259]8d, Supplementary
Data [260]12). TM4SF1 was a specific aTAL DEG, but little is known of
its function in the kidney. TM4SF1 functions as a tumor
suppressor^[261]38, promotes epithelial to mesenchymal
transition^[262]39, and serves as a marker of functional human alveolar
epithelial progenitor cells^[263]40.
Fig. 8. Adaptation in the cortical thick ascending loop of Henle (C-TAL).
[264]Fig. 8
[265]Open in a new tab
a Differentially expressed genes (DEGs) between the adaptive TAL (aTAL,
blue) and C-TAL (magenta), for N = 3152 genes at a Bonferroni adjusted
P value < 0.05 (Wilcox test) within the multiome atlas (N = 12). b UMAP
harmony of aTAL and C-TAL. Inset shows pseudotime from C-TAL to aTAL. c
Gene expression localizes in aTAL cells for aTAL marker genes (LAMC2,
FHL2, and TM4SF1). Canonical C-TAL markers (UMOD and ESRRG) are
expressed in the TAL. NR2F1 is a TF that is not differentially
expressed. d Top 15 clusters from GO-All pathway enrichment analysis at
a Bonferroni adjusted P value < 0.05 (enrichment tests). The genes are
based on DA regions in the aTAL. Key pathways of the adaptive process
overlap with those of the aPT, including mesodermal cell
differentiation and adhesion. e Alignment of epigenomic features in
TM4SF1 and FHL2 for the aTAL and C-TAL. The red stripe indicates a peak
with TF binding by NR2F1. Co-accessibility scores were correlated with
gene expression, peak accessibility by Signac, DNAm in the
tubulointerstitium (TI), and histone marks. Additional TF peaks and
target genes are found in Supplementary Fig. [266]12. TF Peaks are
numbered and correspond to Supplementary Data [267]13. f Top 10 most
differentially expressed TF in the aTAL and C-TAL (by TRIPOD1). The TF
NR2F1 is not differentially expressed. The bar plot conveys the AUC of
summative gene open chromatin. g The expression of FHL2 and TM4SF1 is
highest in the aTAL. The peak activity of the peaks targeted by NR2F1
is also highest in the aTAL region. h Representative spatial
transcriptomic mapping (N = 3) in a healthy reference, injured cortex,
and injured medulla. NR2F1 expression and activity occur in the
medulla, and activity is present in injury.
We explored the regulation of FHL2 and TM4SF1, which shared a common
TF, NR2F1 (Fig. [268]8e). FHL2 did not possess open chromatin at its
TSS; instead, NR2F1 is predicted to bind FHL2 in an intronic region
(peak 14502) before the first coding exon with an active enhancer mark.
The TSS for TM4SF1 was targeted by multiple TFs that were not
differentially expressed, including ETV1 (Supplementary Data [269]13).
However, NR2F1 had predicted binding upstream at peak 25256 (numbered
according to Supplementary Data [270]14), also corresponding to an
active enhancer mark. NR2F1 was not differentially expressed
(Fig. [271]8g), but had TF activity specific to the medulla
(Fig. [272]8h, Supplementary Data [273]15). The expression of NR2F1 in
spatial transcriptomics was present in relatively few spots,
corresponding to the low expression in a few cells of scRNA-seq.
However, its activity was robust and localized to the medulla both in
reference and injury. (Fig. [274]8i, lower panels), potentially
highlighting the value of these complementary technologies. The aTAL
phenotype included upregulation of SMAD3. SMAD3 potentially interacted
with ZEB1 (at peak 85995), and while not statistically significant for
DA, the peak was a DNA methyl dip and active enhancer peak. In
contrast, a SMAD3 peak targeted by ELF3 (86016) was DA, but without a
DNA methyl dip or active enhancer, suggesting alternative regulation in
this region. TF expression and DA were also assessed using scMega and
revealed similar patterns (Supplementary Fig. [275]12). Of note, HIF1A
was associated with both the aPT and aTAL cell phenotypes. In summary,
the regulation of the aTAL shares overlapping features with the aPT,
but still has distinctly localized TFs that mediate the phenotype.
Discussion
While much progress has been made in transcriptomic and proteomic
interrogation of the kidney, integrating diverse epigenetic
technologies is still a nascent endeavor. Multiple studies have
examined single-cell open chromatin in the context of mRNA
expression^[276]15,[277]41–[278]44. Open chromatin is necessary for
gene transcription, but it may also be found at promoters or enhancers
of inactive or silent genes, indicating that chromatin accessibility is
insufficient alone to predict expression^[279]6. There are multiple
layers of epigenetic regulation that determine the activity of
regulatory regions, with DNA methylation and histone modifications
being among the most important^[280]45. Relevant studies by the Susztak
laboratory have examined WGBS with chromatin immunoprecipitation
sequencing of histone modifications in diabetic and healthy tissue
samples^[281]10 as well as interpreted genome-wide association study
variants in the context of snATAC-seq and methylation. We build upon
this investigation by integrating CUT&RUN, WGBS, multiomic snATAC-seq,
regional transcriptomics, spatial transcriptomics, and regional
proteomics, all at a biopsy scale of tissue. In this study, we have
generated a genome-wide dataset that integrates all these epigenomic
layers of the gene regulatory program with mRNA and protein expression
in overlapping human kidney samples.
A major goal of this work was to create a comprehensive atlas that will
serve as an essential reference for comparison to clinical biopsy
samples and diseased kidney tissue. Studies of this type are needed to
reveal how cell-type specific regulatory regions control the transition
between healthy and diseased states, and to guide efforts to reprogram
cells to promote repair after injury. Because our work builds upon a
transcriptomic atlas of healthy and diseased tissue^[282]3, we were
able to explore the regulatory network of the adaptive cell state that
is associated with the progression of CKD and AKI to CKD transition.
Although this epigenomic atlas includes a limited number of diseased
kidney biopsy samples, both the reference and adaptive cell states are
found in nephrectomy samples, which have varying degrees of injury.
Healthy tissue is heterogenous and often contains abnormal pathology
and altered cells with injury signatures. Most samples selected in this
study were provided by the KPMP and HuBMAP, supplemented by local
repositories, in order to assess the technical reproducibility and
potential biological insights of these technologies to each consortium.
Future studies in a larger KPMP cohort of diseased tissue will better
examine specific epigenomic alterations in AKI, CKD, and those
associated with histologic manifestations. For example, DNA methylation
is known to correlate with renal disease and outcomes, including those
observed during ischemia in kidney transplant allografts^[283]46.
Our bioinformatics analyses highlight the potential value of our
dataset to elucidate regulatory pathways involved in the adaptive cell
state. We uncovered differential regulation of the PT and aPT,
specifically identifying a TF network of ELF3, KLF6, and KLF10,
responsible for multiple downstream target genes within the adaptive
cell state and localizing specifically to the kidney cortex and PT. In
silico knockout of the TFs in this network supported cross-regulation
among these genes and revealed two distinct aPT trajectories.
Interestingly, target gene expression in one of these trajectories was
significantly more responsive to TF knockdown, underscoring a potential
role in transition between PT and aPT. Further, our studies revealed
that regulation of the adaptive cell state differs across epithelial
cell types, as the TF regulation of the aPT was different than that of
the aTAL. The multiome defines cells based on mRNA signatures and TSS
open chromatin, unlike standard snATAC-seq, which uses the TSS alone to
define cell types. This added dimension allows for greater
interpretation of peaks outside the TSS and improves resolution of
injury cell states in which open chromatin of canonical marker genes
may be lost. The integration of WGBS and CUT&RUN facilitates the
assessment of DNA methylation and histone modifications as
contributions to chromatin accessibility.
In order for the atlas to serve as a foundation for the scientific
community, we reasoned that it must meet specific benchmarks. Until
recently, epigenomic analysis of clinical tissues has been limited by
the relatively large amount of starting material needed to perform some
of the assays. However, advances in epigenomic technologies have now
enabled these assays to be performed on a smaller scale. We performed
all methods described in this study using relatively low cell inputs
suitable for interrogation of clinical kidney biopsy samples. For
example, all nephrectomy tissue was cut to be biopsy-sized and in a
subset of samples, multiple technologies were performed on a single
Optimal-Cutting-Temperature compound embedded block. A second key
feature of a reference dataset is that the methodology must meet
rigorous QA/QC criteria to account for assay drift and batch effects,
and be readily reproducible by other investigators. Each of the
technologies used in this study has undergone careful expert review and
standardization based on published KPMP criteria^[284]21 and the
resultant quality data is included in the supplement. In this study, we
demonstrated a high degree of reproducibility among technical and
biological replicates collected over or stored for many months.
Furthermore, we demonstrated genome-wide orthogonal validation between
complementary epigenomic features. Investigators using our opensource
detailed protocols^[285]47–[286]49 adapted to low cell numbers will be
able to validate our data in independent reference samples and use
these methods to interrogate additional diseased kidney tissue.
This multimodal analysis of chromatin regulation aligned histone
modifications with DNA methylation patterns and open chromatin across
the genome of the kidney. There is not a one-to-one correlation of
these technologies and their integration provided a holistic view of
the contributors to chromatin accessibility. This comprehensive
epigenetic map will complement the interpretation of signatures within
the merged Human Biomolecular Atlas project (HuBMAP) and Kidney
Precision Medicine Project (KPMP) transcriptomic atlas^[287]3. Future
studies comparing healthy and diseased tissue have the potential for
significant clinical impact by uncovering epigenetic biomarkers of
disease progression and druggable cell-specific regulatory targets that
promote successful or failed repair.
Methods
Human subjects and samples
This study complies with all relevant ethical regulations and was
approved by the Institutional Review Boards of Indiana University and
Washington University in St. Louis and as part of sIRB protocols of the
HuBMAP and KPMP. Human samples and clinical data were derived from the
following sources: (1) The Biopsy Biobank Cohort of Indiana (BBCI, IRB
#1906572234)^[288]50 approved by Indiana University, (2) the Kidney
Translational Research Center (KTRC, (IRB #201102312) of Washington
University in St. Louis^[289]51, (3) the Kidney Precision Medicine
Project (KPMP)^[290]52, and (4) the Human Biomolecular Atlas
Project^[291]53 (HuBMAP). For the BBCI, KTRC, and HuBMAP, samples were
obtained under a waiver of informed consent due to minimal risk to
subjects according to the United States Department of Health and Human
Services Common Rule 45 CFR 46.116(f). For KPMP, samples were acquired
with written informed consent. Samples included healthy reference
tissue from patients undergoing total nephrectomy, deceased donor
nephrectomy, percutaneous kidney biopsy in a healthy transplant donor,
or percutaneous kidney biopsy in an individual with kidney disease
(Supplementary Data [292]1). All nephrectomy samples were dissected
from tumor-free regions. Individuals with AKI or CKD were included in
the KPMP. The proportion of cortex and medulla, the demographic data,
the clinical data, and the histopathologic findings were recorded.
Multiome sample processing, library preparation, sequencing, and analysis
For sample processing, in 12 total samples (8 distinct biologic
samples), 47,217 nuclei were isolated from tissue cryosections
according to a published protocol^[293]54, and with the following
exceptions: (1) tissue sections were cut then stored on dry ice until
nuclei isolation; (2) Protector RNase Inhibitor (Sigma-Aldrich, Catalog
#3335402001) was used at a concentration of 1.0 U/μl; (3) Complete
protease inhibitor cocktail (Roche, cat #11836153001) was included for
final 1× concentration; and (4) 4′,6-diamidino-2-phenylindole (DAPI)
was excluded from the extraction buffer. Furthermore, two nuclei
samples were further purified immediately following isolation using the
LeviCell system (Levitas Bio) using company protocols. This was
performed to assess for any data quality improvements for nuclei that
had been separated from cellular debris generated during the tissue
dissociation.
For library preparation, isolated nuclei were processed immediately
using the Chromium Next GEM Single Cell Multiome ATAC + Gene Expression
(v.1.0) kit according to the step-by-step protocol available
online^[294]49.
For sequencing, the RNA and ATAC libraries were sequenced separately on
the NovaSeq 6000 (Illumina) system (NovaSeq Control Software v.1.7.0
and v.1.7.5). Sample demultiplexing, barcode processing, gene
expression and open chromatin peak quantifications were performed using
the Cell Ranger Arc software (v.2.0.0) using GRCh38 (hg38) reference
genome.
For RNA Analysis, cell barcodes passing the following quality control
filters were used for downstream analyses: (1) passing 10X Cell Ranger
Arc (RNA/ATAC) filters; (2) considered singlets by the DoubletDetection
software (v.2.4.0); 3) showing greater than 400 and <7500
non-mitochondrial genes detected; (4) passing a gene UMI ratio filter
(gene vs. molecule.cell.filter) from the Pagoda2 software
([295]https://github.com/hms-dbmi/pagoda2). Mitochondrial transcripts
were removed, and the query RNA counts were integrated with a previous
snRNA-seq reference atlas of healthy and injured cell types in the
human kidney^[296]3. This integration was performed using Seurat
(v.4.0.0)^[297]3. Briefly, RNA counts were normalized using
sctransform, anchors between datasets were identified based on the
reference Pagoda2 principal components. Query data was then projected
onto the reference principal component (PC) and UMAP space, with cell
type labels transferred using the MapQuery function. A k-nearest
neighbor graph (k = 100) was generated using Pagoda2 based on the
integrated PCs, and integrated clusters were identified using the
infomap community detection algorithm. The query portion of each
cluster was then annotated to the most overlapping, correlated and / or
predicted reference cell type label. This was performed to account for
inaccurate cell type labeling that is prevalent for altered cellular
states. A further step of manual assessment of cell type markers was
performed to confirm identities. Integrated clusters that showed
markers or identities that overlapped across disparate cell types were
labeled as ambiguous or low-quality and removed.
For chromatin Analysis, snATAC data was processed using Signac
(v.1.6.0)^[298]55. Peaks called using Cell Ranger Arc were combined
across experiments using the reduce function. Fragment objects for each
experiment were prepared from Cell Ranger Arc fragment files using the
CreateFragmentObject function. The combined set of peak regions was
used to generate peak-by-cell matrices for each experiment using the
FeatureMatrix function. These peak matrices were then used to create
individual Seurat objects that were merged to form a single combined
object. Only cell barcodes that were retained from RNA analyses were
used for further analyses. Accessible peaks were then called separately
for multiple levels of cell type annotations (clusters, subclass level
3 and subclass level 1) using the CallPeaks function and MACS
(v.3.0.0a6; [299]https://github.com/macs3-project/MACS). All peaks,
including those called by Cell Ranger Arc, were combined using the
reduce function and filtered to remove: (1) regions >10000 and <20 base
pairs; (2) regions falling within nonstandard chromosomes; (3) regions
occurring in blacklist regions using the blacklist_hg38_unified object
from Signac. The final peak set was used to create a new peak by cell
count matrix and seurat object as detailed above. Gene annotation of
the peaks was performed using GetGRangesFromEnsDb(ensdb =
EnsDb.Hsapiens.v86). Nucleosome signal (NS) scores, transcription start
site (TSS) enrichment scores and fraction of reads in peaks (FRiP) were
calculated for each cell using the NuceosomeSignal, TSS Enrichment and
FRiP functions. Cell barcodes further passing the following ATAC
filters were kept for downstream analyses: (1) >1000 and <100000 peak
counts per cell; (2) a FRiP greater than 0.25; (3) an NS score less
than 4; (4) a TSS enrichment score >2. A combined Seurat object was
then generated with separate RNA and ATAC assays for the same cell
barcodes. The LinkPeaks function was used to link potential regulatory
peaks to each gene, taking into account GC content (computed using the
RegionStats function with the BSgenome.Hsapiens.UCSC.hg38 genome),
overall accessibility and peak size. Transcription factor motif
information was added to the Seurat object using the AddMotifs function
in Signac and using Jaspar motifs (JASPAR2020, all vertebrate). Motif
activity scores were computed using chromVar^[300]56 (v.1.12.0;
[301]https://greenleaflab.github.io/chromVAR) through the RunChromVar
function. For quality control assessment, the multiome dataset was
compared to a publicly available snATAC-seq dataset ([302]GSE151302)
that was aligned and merged with the KPMP snRNA-seq atlas^[303]3.
Cleavage under targets & release using nuclease (cut&run) sample processing,
library preparation, sequencing, and analysis
For sample processing, liquid nitrogen or OCT-embedded frozen kidney
tissue was processed according to a modified protocol adapted from
Epicypher and available on protocols.io^[304]48. Antibodies used in
this study for CUT&RUN reactions: H3K27ac (Cell Signaling, 8173),
H3K27me3 (Cell Signaling, 9733), H3K4me1 (Cell Signaling, 5326),
H3K4me3 (Cell Signaling, 9751), and IgG (Cell Signaling, 2729) at a
1:50 dilution. The number of samples processed for each antibody are:
H3K27ac N = 6 total and 4 distinct biologic, H3K27me3 N = 10 total and
6 biologic, H3K4me1 N = 3 total and 1 biologic, H3K4me3 N = 3 total and
1 biologic.
For library preparation, up to 3 ng of DNA from KPMP donor samples was
used to prepare sequencing libraries using the Ion Plus Fragment
Library Kit (Thermo Fisher Scientific, 4471252), following
manufacturer’s instructions, using 17 cycles to amplify the library. Up
to 0.5 ng of DNA from KTRC donor samples (3399, 3409, 3447) was used to
prepare sequencing libraries using NEBNext Ultra II Library Kit (NEB,
E7645), following manufacturer’s instructions, using 18 cycles to
amplify the library.
For sequencing, 100 pM KPMP donor libraries were sequenced on the Ion
Torrent Proton to a sequencing depth of 30 million reads (2–3× genome
coverage). 0.8 pm KTRC donor libraries were sequenced on the NovaSeq
6000 (Illumina) targeting 100 million paired-end reads.
For analysis, detailed bioinformatic analysis with command line
examples for Illumina sequences can be found on protocols.io^[305]48.
Briefly, trimmed fastq files were aligned to the hg38 reference genome
using Bowtie2 (for Illumina sequenced) or TMAP (for Ion Torrent
sequenced). Aligned reads were extracted and converted to a sorted bam
file using SAMtools. Scaled bigWig files were generated using
Deeptools, and peaks were called using Macs2.
Whole genome bisulfite sequencing (Wgbs) sample processing, library
preparation, sequencing, and analysis
For sample processing, thirty kidney glomerular and tubulointerstitial
(TI) samples from 14 distinct donors were processed along with positive
and negative blood controls. Glomerular and TI samples from 1 donor
were processed twice to assess batch effect. Tissue was stored at
−80 °C and sectioned from an OCT block at 12 µm thickness, and mounted
on Leica PPS-membrane Laser Microdissection (LMD) slides (Leica, Cat#
11505268). A rapid stain protocol^[306]23 prepared slides for LMD,
briefly: slides were fixed in –20 °C Acetone (Sigma-Aldrich, Cat#
270725-1 L) for 1 minute, washed with PBS (VWR, Cat# K812-500ML) 2
times for 30 seconds, stained with antibody mix [4 μL OG-Phalloidin
(Oregon Green 488, ThermoFisher, Cat# O7466) + 1.5 μL DAPI
(ThermoFisher, Cat# 62248) in a total volume of 200 µl of PBS solution]
for 5 minutes, washed with PBS for 30 seconds twice and air dried for
5 minutes. Glomerular (GLOM) and TI sections were dissected on a Leica
LMD6500 system and collected directly in the flat cap of an autoclaved
0.5 mL microcentrifuge tube (ThermoScientific, Cat# AB-0350) tubes,
then stored in −80 °C. DNA was isolated using the PureLinkTM Pro 96
Genomic DNA Kit (Cat# K1821-04A) with minor modifications. Tissue
lysates were prepared using the “Mammalian Cells and Blood Lysate”
protocol, briefly: 200 µl of PBS, 20 µl of Proteinase, 20 µl of RNase A
were added to each tube, mixed and incubated for 2 minutes at room
temperature. Then, 200 µl of PureLinkTM Pro 96 Genomic Lysis/Binding
Buffer was added to each tube, mixed and incubated for 15 minutes at
55 °C. The produced tissue lysate was mixed with 200 µl of 100% ethanol
and then transferred to a PureLinkTM gDNA filter plate, centrifuged at
≥ 2100 × g for 10 minutes, then washed with 500 µl of wash buffer 1 and
wash buffer 2 separately; the purified DNA was eluted in 50 µl of
elution buffer.
For library preparation, DNA quantity and quality were assessed by
Qubit Fluorometer 4.1 (Invitrogen). Bisulfite conversion was performed
using the EpiTect Fast DNA Bisulfite Kit (Qiagen Cat# 59824). An input
of 10 ng was used for library preparation, and QIAseq Methyl DNA
Library Kit (Cat# 180502) was used for amplification and ligation of
Illumina adaptors. Library purification was performed using QIAseq
Beads. The library quality was assessed using an Agilent Bioanalyzer
2100 and Qubit Fluorometer 4.1.
For sequencing, multiple libraries were pooled in equal molarity, and a
final concentration of 300 pM was loaded onto the NovaSeq 6000
sequencer (Illumina) for 150 bp paired-end sequencing. Approximately
200 million read pairs (400 million reads total) were generated per
library for ~15–20× calculated coverage. The actual final measured
coverage averages 11.8× after processing. An EpiTect Unmethylated DNA
control (Qiagen Cat# 59568) and EpiTect Methylated DNA control (Qiagen
Cat# 59655) were sequenced as controls.
For analysis, sequencing quality control, mapping, and methylation
analysis were performed using the pipeline with FastC (v.0.11.5) and
MultiQC (v.1.8). TrimGalore (v.0.6.7) was used to remove Illumina
adapter sequences and ten 5’ prime bp and five 3’ prime bp. Bismark
(v.0.23.1) DNA non-directional mapping was used to align to the
hg38/GRCh38 reference genome. Duplicated reads were removed.
Methylation extractor was used to define CpG regions. Alignment was
performed on the Carbonate Cluster at Indiana University. Batch effect
correction was performed by an empirical Bayes framework^[307]57 by SVA
R package. To test reproducibly between batches, biological replicants
were applied to the Rank–Rank Hypergeometric Overlap (RRHO2)^[308]58 to
determine agreement.
DNA methylation data was stored in a methylKit (v.1.20.0) object. This
object accepted CpG sites with coverage greater than 5 reads across all
samples for the GLOM and TI. Whole genome DNAm was extracted
(Supplementary Fig. [309]1) for each nucleotide. Methylated cytosines
(#C’s), unmethylated cytosines (#T’s) and coverage (#C’s + #T’s) were
quantified per nucleotide in the glomerulus (g) and tubulointerstitium
(ti) for each sample i. Regions of interest were extracted using Grange
GenomicRanges (v.1.46.1), where each row has information for genomic
regions as small as one base pair to those spanning an entire
chromosome. Regions were subset and summed (coverage, numC’s, and
numT’s). For each region, the methylation levels are given by Eq.
([310]1) GLOM and Eq. ([311]2) in TI:
[MATH:
mlGLO<
/mi>M=∑i=1
15#C′sgi
mrow>∑i=1
15#T′sgi
mrow>+#C′sgi
mrow> :MATH]
1
[MATH:
mlTI=∑i=1
15#C′stii∑i=1
15#T′stii+#C′stii :MATH]
2
A t test is applied with multiple testing correction for
[MATH:
mlG
LOMi<
/msub> :MATH]
and
[MATH:
mlT
Ii :MATH]
.
The relative, differential, or hyper-methylation (Hyper) is given by
Eq. ([312]3):
[MATH:
Hyper=log2mlGLOM+β
mlTI+β :MATH]
3
were β = 0.001. The measurement compares the GLOM and TI compartments
to assess relative methylation levels. A positive value points to
greater methylation in the GLOM and a negative value has greater
methylation in the TI. Hyper was calculated for any region.
Using the annotatr (v.1.21.0) R package (PMID: 28369316), genes were
annotated with whole gene and gene regions (1 to 5 kb upstream of TSS,
exons, introns, promoters, 3’ UTRs, 5’ UTRs, CpG islands, CpG shelves,
CpG shores, CpG intergenic). Methylation status was summed across CpGs
in a region for each gene.
Regional transcriptomics sample processing, library preparation, sequencing,
and analysis
For sample processing, an overlapping set of samples (11 of 14 distinct
biologic WGBS samples) were processed for regional transcriptomics of
GLOM and TI according to a laser microdissection protocol^[313]23,
available on protocols.io^[314]47. Briefly, glomeruli and the
tubulointerstitium were microdissected from a frozen OCT section using
a rapid staining antibody-based protocol.
For library preparation, cDNA was synthesized with the SMARTer
Universal Low Input RNA Kit protocol (Clontech, cat. no. 634938) using
the standard RiboGone–Mammalian Kit protocol.
For sequencing, barcoded libraries were pooled in equal molarity and
sequenced on an Illumina HiSeq 4000 for about 100 million reads per
library.
For analysis, mRNA expression was calculated according to a negative
binomial dispersion exact test. Statistics were performed with negative
binomial generalized linear models^[315]59 in edgeR (v.3.36.0).
Regional proteomics sample processing, protein measurement, and analysis
For sample processing, an overlapping set of samples (8 of 14 distinct
biologic WGBS samples) were processed for regional proteomics of GLOM
and TI according to the protocol available on protocols.io
(10.17504/protocols.io.bew6jfhe). Briefly, glomeruli and the
tubulointerstitium were microdissected from a frozen OCT section.
Protein content was measured by Liquid chromatography-tandem mass
spectrometry. Analysis was performed with an Easy-nLC 1000 coupled to
an Orbitrap Fusion mass spectrometer (ThermoScientific, Waltham, MA).
For protein measurement and analysis, protein was quantitated by
high-performance liquid chromatography with mass spectrometry^[316]10.
Spatial transcriptomics sample processing, library preparation, sequencing,
and analysis
For sample processing, human kidney tissue (N = 3) was prepared and
imaged using the Visium Spatial Gene Expression protocols (10x
Genomics, [317]CG000240 protocol)^[318]60. From optimal-cutting
temperature (OCT) compound embedded blocks, tissue was sectioned at a
thickness of 10 µm. Hematoxylin and eosin (H&E) stained brightfield
Images were acquired from a Keyence BZ-X810 microscope. The microscope
was equipped with a Nikon 10X CFI Plan Fluor objective. Brightfield
mosaics were stitched and aligned with Visium fiducials. mRNA was
isolated from the tissue sections after 12 minutes of permeabilization.
mRNA was bound to oligonucleotides in the fiducial spots and then
reverse transcribed.
For library preparation and sequencing, the mRNA underwent
second-strand cDNA synthesis, denaturation, cDNA amplification, and
SPRIselect cDNA cleanup (Visium [319]CG000239 protocol). cDNA was
sequenced on an Illumina NovaSeq 6000.
Analysis
Space Ranger (v2.0.0) with the reference genome GRCh38 2020-A was used
to perform expression anEalysis, mapping, counting, and clustering. To
localize transcription factor activity, we first created trajectories
to obtain gene regulatory networks (GRNs) in scMEGA^[320]61. The
multiome data was subdivided into aPT/PT-S1S2 cells and aTAL / C-TAL
cells to identify the pseudotime spectrum of cells from health to
adaptive injury states. Harmony was applied for batch
correction^[321]62. The R package Destiny^[322]63 was used to construct
a diffusion map and apply a reduction in non-linear space.
Computational inference of GRNs was obtained in order to map TF
activity with scMEGA. Briefly, a composite vector accounting for target
gene expression, target gene open chromatin accessibility, and TF
expression are localized over renal tissue using spatial transcriptomic
expression patterns.
Cohen’s Kappa measurement of alignment
All datasets were aligned to the hg38/GRCh38 reference genome. The
agreement of multiome peaks, each CUT&RUN antibody peak, and WGBS dips
were assessed. Regions were defined as peak or no-peak in the multiome
dataset by Signac. To evaluate peak alignment, the podocyte (POD),
proximal tubule (PT-s1/S2), and cortical TAL (cTAL) cell types were
defined with the findmarker function and cell-type specific marker
genes were selected from the merged HuBMAP/KPMP sc/snRNAseq atlas v
1.0^[323]3. A Gisch filter was implemented to select cell-type specific
peaks for the multiome and remove background signals from other
technologies. For this filter, the AUC was calculated for each gene
region (exon, intron, etc.). In regions where the peak AUC was greater
than the average AUC of the entire gene, the region was called a peak.
Other regions were considered no-peak regions. Peaks were called
similarly for H3K27ac, H3K27me3, H3K4me1, and H3K4me3 antibody marks.
WGBS dips were called when the AUC of a region was less than the
average AUC of the whole gene. Fisher’s exact and Cohen’s Kappa tests
were used to evaluate the agreement of peaks across technologies.
Heatmap analysis
Heatmaps of the alignment of histone modifications with genomic
features were generated using DeepTools. Genomic features assayed were
restricted to those that overlapped with a macs2 called ATAC peak (bulk
ATAC-seq or PT-specific snATAC-seq peaks). Promoters were defined as a
2 kb region centered on an annotated GRCh38 transcription start site.
Putative enhancers were called by the overlap of macs2 called
H3K4me1+/H3K27ac+/H3K4me3- bulk ATAC-seq peaks not located within an
annotated gene body or promoter. DNA methylation dips were called based
on a DNAm value of ≤0.4 across a 50 bp region or larger.
Cell-type deconvolution
Cell-type deconvolution was performed using the Dtangle algorithm in
the Granulator (v.1.7.0) R Package. The RNA Signature was derived from
the merged HuBMAP/KPMP snRNAseq kidney atlas. The top 10% marker DEGs
were selected for which CUT&RUN promoter peaks were detected in
H3K27ac, H3K4me1, or H3K4me3. Cell types were restricted to those with
≥20 Marker Genes in this top 10% of DEGs. The descending thin limb and
endothelial cell clusters were removed because of similarity to TAL
signature and poor algorithm performance, respectively. The Detangle
algorithm then identified the percentage of cells from each cell
cluster based on the histone mark.
Transcription factor analysis
TRIPOD (v.0.01) R package was employed to identify regulatory trios
from the multiome dataset between: (1) open chromatin with
cis-regulatory regions of target genes (100–200 kb surrounding the
TSS), (2) target gene expression, and (3) TF expression^[324]24. The TF
definitions were based on the Signac enrichment method^[325]55 and the
JASPAR2020 dataset^[326]64. TRIPOD creates a meta cells matrix to link
RNA expression (snRNAseq) and open chromatin (snATAC-seq). Briefly, the
steps in the TRIPOD method include: (1) RNA is normalized by
regularized negative binomial regression, nuclei are clustered by PCA
and UMAP, (3) the open chromatin in the snATAC-seq assay is normalized
by default options, and (4) using the PCA (RNA) and LSI (ATAC)
reduction, a weighted nearest neighbor (WNN) graph was built^[327]65 to
cluster RNA and ATAC together. The matrix is used to compare each
regulatory trio and identify TF and target gene associations with high
confidence.
For pseudotime trajectories of transcription factor activity and
expression, scMEGA (v.1.0.1) was used as described above^[328]61. For
differentially expressed genes, a Seurat findmarker function was run
using the test.use = wilcox and plotted as a volcano plot with a
Bonferroni-corrected P value < 0.05 and percent of cells expressed
above 2%.
To calculate the differential accessibility (DA) of genes, the Seurat
function FindMarkers was used with the logistic regression LR. Peaks
were classified as DA peaks with a Bonferroni-corrected P value < 0.05
and an absolute log[2] fold change >0 between the cell types, with the
setting only.pos = FALSE. An average log[2] fold change >0 means
greater open chromatin in adaptive injury cell states.
In silico perturbation
Co-accessibility networks for the PT cells were obtained with
Cicero^[329]66. Following the CellOracle (v.0.10.12) pipeline^[330]34,
9 subclusters were identified, and the connections in gene expression
were modeled with a graph abstraction. A combined knockout of ELF3,
KLF6 and KLF10 was simulated in silico, and changes in gene expression
and cell velocity were calculated. Highly aPT-changing genes were
selected by the higher variation in Fold Change between aPT and PT-S12.
Expression modeling
DNA methylation was summed across CpG sites within gene region
annotations for whole gene bodies (N = 30,024), exons (27,830), introns
(25,904), promoters (28,625) and CpG Islands (18,849). Methylation was
compared to mRNA and protein expression derived from a set of
completely overlapping samples in differentially expressed genes (DEGs,
N = 5942) or proteins (N = 1917) between GLOM and TI dissections. The
correlation of differential DNA methylation with expression of DEGs or
DEPs was assessed by univariate linear regression, and residuals were
calculated using ggplot2 and stat R packages. A best-fit regression
model was developed wherein a gene annotation (CpG island, exon,
intron, promoter) was selected based on the greatest C-statistic
between log[2] differential expression and differential gene region
methylation for each gene (Fig. [331]2c, Supplementary Fig. [332]5).
Analogously, DNA methylation was summed within four sets of CUT&RUN
peaks, bulk ATAC-seq peaks, and single-cell multiome peaks (PT-S1 and
C-TAL for TI, and Podocyte for GLOM). The peaks of these orthogonal
technologies acted as filters for univariate regressions in which
summative DNA methylation was plotted within each peak set against mRNA
or protein expression for all DEGs and DEPs.
Enrichment analyses
pathfindR (v.2.1.0) was used to combine a BioGRID active subnetwork
search with a gene ontology (GO) pathway enrichment analysis. We input
a gene interest list with adjusted P value from gene differential
expression^[333]67. The enriched terms are clustered and identify
significative adjusted P value < 0.05 and are called representative
pathways and their members. The P values obtained from the enrichment
tests are adjusted by Bonferroni method.
Kidney cell culture
Normal human proximal tubular kidney (NHPTK) cells^[334]68 were plated
at a density of 4.5 × 10^4 in a 24-well flat-bottom plate and
maintained in Renal Epithelial Growth media (REGM, Lonza, Basel,
Switzerland) and 9% fetal bovine serum (HyClone). Cells were diluted to
20–30% confluency three times a week and maintained at 37 °C in 95%
humidified atmosphere with 5% CO2.
siRNA knockdown and real-time polymerase chain reaction
We conducted siRNA-mediated knockdown of ELF3, KLF10, and KLF6 in NHPTK
cells with lipofectamine. Cells were plated on Day 1 with lipofectamine
and a pool of up to 6 directed siRNA constructs (20 nM concentration
each) for 5 conditions: (1) Silencer, (Cat. No. 4390843) scrambled
siRNA control, (Cat. No. 4427037), (2) a pool of two ELF3 siRNA
molecules (siRNA ID s4623 and s4624), (3) a pool of two KLF6 siRNA
molecules (siRNA ID s3376 and s3375), (4) a pool of two KLF10 siRNA
molecules (siRNA ID s14129 and s14130), and (5) a combination of 6
siRNA molecules targeting ELF3, KLF10, and KLF6. Gene expression of
(Cat. No. 4448892) ELF3 (Assay ID Hs00963877_g1), TPM1 (Assay ID
Hs04398572_m1), ITGB8 (Assay ID Hs00174456_m1), (Cat. No. 4453320),
KLF10 (Assay ID Hs00921811_m1), KLF6 (Assay ID Hs00810569_m1) and VCAM1
(Assay ID Hs01003372_m1) was measured 48 hours after siRNA or scrambled
control transfection. RNA was isolated with the miRNeasy Mini Kit (Cat.
No. 217004, Qiagen, Hilden, Germany) and converted to cDNA with the
High-Capacity cDNA Reverse Transcription Kit (Cat. No. 4368814, Thermo
Fisher) according to the manufacturer’s protocol. Real-time PCR was
performed on the ViiA 7 Real‐Time PCR System (Applied Biosystems,
Waltham, MA) using TaqMan Gene Expression assays for all six genes
(Life Technologies, Foster City, CA) with GAPDH as a control (Cat. No.
4448489, Assay ID Hs02786624_g1).
The ΔΔCT technique was used to calculate the relative gene expression
in samples with the fold difference between the siRNA knockdown and
scrambled control calculated as: fold difference = 2^(ΔΔCT). Gene
expression after siRNA transfection is expressed as a percentage
relative to the scrambled control expression for each gene. Four
replicates were performed for the control and each individual siRNA
knockdown. Two replicates were performed for the combined 3-gene
knockdown. For each experiment, significance was determined by an t
test.
Statistics
For basic statistical analyses and when not otherwise specified, a t
test was used to compare two continuous variables, and a Fisher’s exact
test was used for categorical comparisons.
Reporting summary
Further information on research design is available in the [335]Nature
Portfolio Reporting Summary linked to this article.
Supplementary information
[336]Supplementary Figures^ (10.9MB, pdf)
[337]Peer Review File^ (817.3KB, pdf)
[338]41467_2023_44467_MOESM3_ESM.pdf^ (26.1KB, pdf)
Description of Additional Supplementary Files
[339]Supplementary Dataset 1 to 9^ (159.5MB, xlsx)
[340]Supplementary Dataset 10 to 15^ (34.4MB, xlsx)
[341]Reporting Summary^ (335.5KB, pdf)
Source data
[342]Source Data^ (863.4KB, xlsx)
Acknowledgements