Abstract

   The clear cell renal cell carcinoma (ccRCC) microenvironment consists
   of many different cell types and structural components that play
   critical roles in cancer progression and drug resistance, but the
   cellular architecture and underlying gene regulatory features of ccRCC
   have not been fully characterized. Here, we applied single-cell RNA
   sequencing (scRNA-seq) and single-cell assay for transposase-accessible
   chromatin sequencing (scATAC-seq) to generate transcriptional and
   epigenomic landscapes of ccRCC. We identified tumor cell-specific
   regulatory programs mediated by four key transcription factors (TFs)
   (HOXC5, VENTX, ISL1, and OTP), and these TFs have prognostic
   significance in The Cancer Genome Atlas (TCGA) database. Targeting
   these TFs via short hairpin RNAs (shRNAs) or small molecule inhibitors
   decreased tumor cell proliferation. We next performed an integrative
   analysis of chromatin accessibility and gene expression for CD8^+ T
   cells and macrophages to reveal the different regulatory elements in
   their subgroups. Furthermore, we delineated the intercellular
   communications mediated by ligand–receptor interactions within the
   tumor microenvironment. Taken together, our multiomics approach further
   clarifies the cellular heterogeneity of ccRCC and identifies potential
   therapeutic targets.

   Subject terms: Tumour heterogeneity, Epigenetics

Introduction

   Clear cell renal carcinoma (ccRCC) is the most common and aggressive
   histological subtype of renal cell carcinoma^[44]1,[45]2. More than
   one-third of ccRCC patients relapse and develop metastases after
   surgery. The prognosis for metastatic ccRCC patients is poor, with a
   5-year survival rate of 10%^[46]3, emphasizing the need to understand
   the underlying cellular and molecular mechanisms to facilitate the
   discovery of biomarkers and guide clinical intervention. The efficacy
   of current ccRCC clinical treatment modalities, including conventional
   chemotherapy, targeted therapy, and immunotherapy, is limited by tumor
   heterogeneity^[47]2,[48]4.

   Previously, large-scale genomic studies revealed many essential genome
   mutations that drive tumor progression and contribute to clinical
   treatment^[49]5–[50]7. A common mutation is the inactivation of the VHL
   gene, leading to the stabilization of oncogenic hypoxia-inducible
   factor proteins (HIF1α, HIF2α). Other frequent mutations in chromatin
   remodeling genes, such as PBRM1, BAP1, and SETD2, are necessary for
   tumorigenesis and tumor development^[51]7,[52]8. However, tumors exist
   in a highly heterogeneous microenvironment that includes many different
   cell types, so bulk sequencing is not suitable for delineating tumor
   characteristics at the cellular level. Single-cell isolation and
   barcoding technologies have enabled us to investigate cellular
   heterogeneity at a single-cell resolution to discern the role of
   different cell populations whose gene expression patterns may be masked
   or diluted in bulk sequencing^[53]9,[54]10. ScRNA-seq has been used to
   comprehensively characterize the cellular composition and
   transcriptional states of ccRCC, revealing its origin and intratumoral
   heterogeneity^[55]11,[56]12. Comparative scRNA-seq analyses of
   different conditions (e.g., before and after treatment, in disease
   stages, and at sampling sites) revealed multiple cell populations
   associated with immunotherapy resistance and a poor prognosis in ccRCC
   patients^[57]13–[58]15. Although significant progress has been made
   toward delineating changes in the transcriptional expression of various
   cell types during tumor development and clinical treatment, it remains
   unclear how cis-acting DNA elements (e.g., enhancers and promoters) and
   trans-acting factors (e.g., transcription factors (TFs)) regulate these
   changes within the microenvironment.

   ScATAC-seq identifies accessible chromatin regions by Tn5
   transposase-mediated tagmentation and captures active DNA regulatory
   elements at a single-cell resolution^[59]16. Similar to bulk ATAC-seq,
   this approach can capture multiple types of gene regulatory
   information; for example, it can identify genome-wide cis-elements and
   indicate inference with TFs binding and activity. ScATAC-seq has been
   widely used to delineate the differentiation trajectory of
   developmental lineages and reveal key regulatory
   elements^[60]17,[61]18. Multidimensional data, such as DNA and RNA
   sequencing data, provide different perspectives to investigate
   biological phenomena and deepen our understanding of tumor pathogenesis
   and progression. Recent studies integrating scRNA-seq and scATAC-seq
   data on healthy kidneys have depicted the transcriptional and chromatin
   accessibility landscape and identified key celltype-specific TFs that
   play crucial roles in kidney development^[62]19,[63]20. Resolving ccRCC
   heterogeneity with multidimensional information at the single-cell
   level can provide new insights for exploring tumor regulatory
   mechanisms and identifying potential therapeutic targets.

   Here, we performed scRNA-seq and scATAC-seq on ccRCC primary tumor
   tissues to identify the key regulatory molecules that mediate tumor
   development and manipulate the function of immune cells. Our multiomics
   analysis revealed a tumor-specific regulatory signature and multiple
   TFs associated with immune cell functional states. Our work provides
   important insights into dissecting the tumor heterogeneity of ccRCC
   using single-cell multiomics data.

Results

Single-cell transcriptional and chromatin accessibility profiling in ccRCC

   To systematically dissect the heterogenous architecture of ccRCC
   tumors, we performed paired scRNA-seq and scATAC-seq on three patients
   and scRNA-seq alone on one additional patient (Fig. [64]1a). Selected
   patients ranged in age from 33 to 75 years old and included both women
   (n = 2) and men (n = 2) (Supplementary Table [65]S1). All samples had
   VHL gene mutations (Supplementary Fig. [66]S1a). Other common
   mutations, such as the histone deubiquitinase BAP1^[67]8 and SWI/SNF
   component ARID1B^[68]21, were also detected sporadically in these
   samples. After quality-control filtering, a total of 38,097 cells from
   scRNA-seq and 21,272 cells from scATAC-seq were retained for the
   following analysis (Supplementary Fig. [69]S1b–e). Using canonical
   markers, we identified 15 cell types in the scRNA-seq dataset (Fig.
   [70]1b, c and Supplementary Tables [71]S2, [72]S3). For our scATAC-seq
   data, we calculated the prediction scores by Seurat’s label-transfer
   algorithm and annotated cell clusters in a supervised manner (Fig.
   [73]1d and Supplementary Fig. [74]S1f, g). In parallel, we inspected
   the chromatin accessibility at the promoter regions for known marker
   genes and calculated their activity scores for assigning cell
   identities (Supplementary Fig. [75]S1h). Comparing the annotation
   results between these two annotation strategies, we found that most
   cell types were present in both datasets, which supported that
   scATAC-seq is comparable to scRNA-seq in the detection and annotation
   of cell types (Supplementary Fig. [76]S1i). Finally, we identified 12
   cell types in the scATAC-seq dataset by combining these two annotation
   results (Fig. [77]1e, f).

Fig. 1. Single-cell transcriptional and chromatin accessibility profiling of
human ccRCC.

   [78]Fig. 1
   [79]Open in a new tab

   a Schematic of multiomics profiling of chromatin accessibility and
   transcription in ccRCC using scRNA-seq and scATAC-seq. b scRNA-seq UMAP
   projection of 38,097 single cells from four ccRCC samples. c Dot plot
   showing the gene expression patterns of cell-type marker genes in the
   scRNA-seq data. d Cross-platform linkage of scATAC-seq cells with
   scRNA-seq cells. e scATAC-seq UMAP projection of 21,272 single cells
   from three ccRCC samples. f Normalized chromatin accessibility profiles
   for each cell type at canonical marker genes. The promoter region is
   highlighted in gray with the gene model and chromosome position shown
   below. g Heatmap of CNV signals normalized against the “normal” cluster
   shown in the top panel (CD8^+ T cells and NK/NKT cells) for CNV changes
   by chromosome (columns) within individual cells (rows). All cells in
   the tumor population (bottom panel) exhibited chromosome 3p loss (blue
   dotted frame) or chromosome 5q gain (red dotted frame), which are
   classic genomic features of ccRCC. h Pie charts of the proportion of
   each class in the scRNA-seq and scATAC-seq dataset.

   We identified five major lymphocyte populations: CD4^+ (CD4, IL7R,
   CD3D, CD3E) and CD8^+ T cells (CD8A, CD8B, CD3D, CD3E), Treg (FOXP3,
   IL2RA), natural killer (NK)/natural killer T (NKT) cells (KLRD1, GNLY),
   B cells (MS4A1/CD20, CD79A) and multiple myeloid subsets after
   integration of scRNA-seq data and scATAC-seq data, including
   macrophages (CSF1R, CD68, CD163), monocytes (S100A12, FCGR3A/CD16), and
   mast cells (TPSAB1, KIT) (Fig. [80]1c, f and Supplementary Fig.
   [81]S1j). We also identified several nonimmune cell types, including
   endothelial cells (PECAM1, PTPRB) and mesangial cells (PDGFRB, ACTA2).
   Notably, we found two different endothelial subpopulations: one
   specifically expressed vascular cell adhesion molecule 1 (VCAM1), while
   the other did not express VCAM1 but had enriched kinase insert domain
   receptor (KDR). Similar to a previous study^[82]11, we found that the
   VCAM1^− endothelial cells highly expressed genes regulating endothelial
   cell proliferation and vasculature development, whereas the VCAM1^+
   endothelial cells highly expressed genes regulating immune cell
   chemotaxis and migration (Supplementary Fig. [83]S1k). Next, we
   identified the tumor cells by detecting both the canonical marker
   CA9^[84]11,[85]22 and classic copy number variations (CNVs) (chromosome
   3p loss or chromosome 5q gain)^[86]8 (Fig. [87]1c, f, g). In the entire
   tumor microenvironment (TME) of ccRCC, we found that immune cells were
   the most numerous cell population, accounting for more than 70% of the
   total cells, while tumor cells accounted for <10% of the population
   (9.36% from scRNA-seq data; 6.2% from scATAC-seq data) (Fig. [88]1h).
   Moreover, we found that the abundance of cell types varied across
   samples, suggesting a considerable level of tumor heterogeneity
   (Supplementary Fig. [89]S1l). In summary, our integrated scRNA-seq and
   scATAC-seq multiomics approach revealed the heterogeneous cellular
   composition of ccRCC.

Tumor-specific regulatory elements in ccRCC

   To investigate the differences in chromatin accessibility among all
   cell types, we first identified a total of 212,326 peaks in our
   scATAC-seq data by using MACS2^[90]23 and found that ~10.6% (22,682
   unique peaks) of them exhibited significant differences among cell
   types; these sites were defined as differentially accessible chromatin
   regions (DARs) (adjusted P < 0.05 and log[2](fold change (FC)) > 0.25)
   (Fig. [91]2a). Approximately 17.3% (mean proportion = 0.173 ± 0.067) of
   DARs were closely associated with differentially expressed genes (DEGs)
   in their respective cell types (Supplementary Table [92]S4). Overall,
   endothelium and mesangial cells had the most DARs, followed by tumor
   cells (Supplementary Fig. [93]S2a). Notably, lymphoid lineage
   populations, except B cells, had fewer DARs than myeloid cell
   populations. The majority of DARs were located in the promoter and
   intronic regions of the genome, and the distribution of DARs was
   relatively conserved across cell types (Fig. [94]2b and Supplementary
   Fig. [95]S2b). In addition, we found that both DARs and DEGs of tumor
   cells were significantly enriched in metabolism-related biological
   processes (Fig. [96]2c and Supplementary Fig. [97]S2c). Next, we
   applied chromVAR^[98]24 to infer TF motif activity in our scATAC-seq
   data. Hierarchical clustering of the bias-corrected deviation scores
   for the differential TFs revealed shared and unique regulatory elements
   across cell types (Fig. [99]2d and Supplementary Table [100]S5). The
   identification of distinct lineage-specific TFs further supported our
   assignment of cell identities. For example, T-Box (e.g., EOMES and
   TBX5) TFs were exclusively enriched in NK/NKT and CD8^+ T cell
   populations^[101]25,[102]26. Within myeloid cells, SPI1^[103]27 TF were
   enriched in macrophages, CEBP TF family^[104]28 exhibited high activity
   in monocytes, while the GATA2^[105]29 TF was specifically enriched in
   mast cells. Interestingly, two endothelial subpopulations exhibited
   differences in TF enrichment: the VCAM1^− endothelial cells showed high
   activity of SOX TF family, while STAT1 and NFATC4 were enriched in the
   VCAM1^+ endothelial cells.

Fig. 2. Identification of tumor-specific regulatory elements.

   [106]Fig. 2
   [107]Open in a new tab

   a Heatmap of the Z-score normalized accessibilities of 22,682
   scATAC-seq differential peaks for each cell type. b Pie plot of genomic
   annotations for scATAC-seq differential peaks. c Gene ontology term
   enrichment analysis for distal (intergenic region) and proximal
   (promoter and gene body) upregulated DARs in tumor cells. The
   Benjamini-Hochberg adjusted P values are shown on the right side of the
   plots. d Heatmap representation of chromVAR bias-corrected deviation
   scores for the differential TF motifs of each cell type. The top TF
   motifs of each cell type are indicated on both sides of the plot. e
   Heatmap of the log[2](FC) values for tumor-specific TFs identified by
   our filtering strategy. The right bar plot shows the
   Bonferroni-adjusted P values. f Kaplan–Meier overall survival curves of
   TCGA-KIRC patients grouped by the average expression of tumor-specific
   TFs (with the median value as the threshold). HR hazard ratio.

   To uncover the pivotal TFs participating in tumorigenesis and
   development, we designed a filtering strategy to identify highly
   specific TFs enriched in tumor cells. First, we calculated the standard
   deviation of chromVAR bias-corrected scores among all cell types for
   each TF and filtered out the TFs with low variability by the median
   absolute deviation method^[108]30. Then, we selected the TFs highly
   enriched exclusively in tumor cells (log[2](FC) > 4 in tumor cells and
   log[2](FC) < 1 in any other cell types; adjusted P < 0.0001) and
   identified 49 TFs with higher activity scores in tumor cells than in
   any other cell types (Fig. [109]2e and Supplementary Table [110]S6).
   Hepatocyte nuclear factor 1 (HNF1A, HNF1B), a family of
   tissue-restricted transcription regulators associated with kidney
   developmental disorders^[111]31, had the highest activity scores and
   were highly expressed in tumor cells specifically (Supplementary Fig.
   [112]S2d). Multiple HOX family TF genes were also highly enriched in
   tumor cells and have been reported to play crucial roles in numerous
   tumor processes, including angiogenesis and
   oncogenesis^[113]32,[114]33. We then examined the association of these
   tumor-specific TFs with the prognosis of patients in The Cancer Genome
   Atlas kidney renal clear cell carcinoma (TCGA-KIRC) cohort. Patients
   with high average expression of tumor-specific TFs had shorter overall
   survival and disease-free survival than the patients with low average
   expression (Fig. [115]2f and Supplementary Fig. [116]S2e).

   To further investigate and verify the biological functions of these TFs
   in ccRCC, we selected four TFs (HOXC5, VENTX, ISL1, and OTP) whose
   expression levels were significantly associated with worse overall
   survival in the TCGA-KIRC dataset, and binding sites were located in
   accessible chromatin regions, which were specific for kidney cancer and
   identified by ATAC-seq^[117]34 (Fig. [118]3a and Supplementary Fig.
   [119]S3a). We identified their target genes whose promoters or linked
   candidate cis-regulatory elements (cCREs) were accessible and contained
   the TF-binding motif in tumor cells. Then, we constructed TF regulatory
   networks to gain further insights into TF-mediated gene regulation in
   tumor cells. Within these networks, we found that these four TFs
   regulated multiple tumor-specific genes, such as FXYD2 and CRYAB, with
   a significant increase in expression in tumor cells (Fig. [120]3b and
   Supplementary Table [121]S3). Moreover, we found that the target genes
   of these TFs were significantly enriched in hypoxia and cell
   proliferation signaling pathways (Supplementary Fig. [122]S3b). For
   each TF, we calculated the target gene scores for each cell type and
   found that the scores in tumor cells were significantly higher than in
   any other cell type (Supplementary Fig. [123]S3c). Similar phenomena
   were observed in the other three previous scRNA-seq datasets of
   ccRCC^[124]11,[125]13,[126]14 (Supplementary Fig. [127]S3d–f). More
   importantly, we found that knocking down these TFs significantly
   reduced tumor cell proliferation and increased cell death (Fig.
   [128]3c, d and Supplementary Fig. [129]S3g, h). We further confirmed
   that knocking down two TFs (HOXC5 and ISL1) strongly reduced tumor
   growth in the xenograft mouse models (Supplementary Fig. [130]S3i–k).
   To find potential drugs for targeting these TFs, we used the LINCS
   consortium^[131]35, which included 19,811 small molecule
   compound-perturbed profiles, to interrogate the effects of drugs on
   gene expression. With this information, we found two candidate drugs
   (homoharringtonine (HHT) and mitotane) approved by the FDA could
   significantly decrease the expression level of HOXC5, ISL1, and VENTX
   (Fig. [132]3e). OTP was not included in the drug perturbation
   expression profile and thus was not subjected to subsequent analysis.
   As we expected, both of these drugs significantly decreased the
   proliferation rate of tumor cells and the expression levels of HOXC5,
   ISL1, and VENTX in two renal cancer cell lines (Fig. [133]3f–j).
   Collectively, we identified tumor-specific regulatory elements with
   chromosome accessibility signals captured by scATAC-seq and verified
   their roles in promoting tumor growth.

Fig. 3. Characterization of the four tumor-specific TFs.

   [134]Fig. 3
   [135]Open in a new tab

   a Kaplan–Meier analyses of overall survival in TCGA-KIRC patients
   separated by HOXC5, VENTX, ISL1, and OTP expression (with the median
   value as the threshold) individually. HR, hazard ratio. b TF regulatory
   network showing the candidate target genes for the following TFs:
   HOXC5, VENTX, ISL1, and OTP in tumor cells. c The effects of sh-HOXC5,
   VENTX, ISL1, OTP, and vector on cell proliferation were determined by a
   cell proliferation assay in the 786-O cell line. d HOXC5, VENTX, ISL1,
   and OTP mRNA expression were significantly downregulated in the 786-O
   cell line. Significance was determined by two-way ANOVA. e Heatmap
   showing the degree of decrease in expression levels of VENTX, ISL1, and
   HOXC5 after drug treatment. f Cell proliferation was assessed over a
   2-day time course after treatment with DMSO or HHT in the 786-O (top)
   and 769-P (bottom) ccRCC cell lines. Significance was determined by
   multiple t-tests. g The mRNA levels of HOXC5 after treatment with DMSO
   or HHT in the 786-O (left) and 769-P (right) ccRCC cell lines were
   measured using qPCR. Significance was determined by Student’s t-test. h
   Cell proliferation was assessed over a 2-day time course after
   treatment with DMSO or mitotane in the 786-O (top) and 769-P (bottom)
   ccRCC cell lines. Significance was determined by multiple t-tests. i
   The mRNA levels of VENTX after treatment with DMSO or mitotane in the
   786-O (left) and 769-P (right) ccRCC cell lines were measured using
   qPCR. Significance was determined by Student’s t-test. j The mRNA
   levels of ISL1 after treatment with DMSO or mitotane in the 786-O
   (left) and 769-P (right) ccRCC cell lines were measured using qPCR.
   Significance was determined by Student’s t-test. For all statistical
   tests, *P < 0.05; **P < 0.01; ***P < 0.001.

Malignant transcriptional programs within ccRCC

   To explore how expression states vary among different malignant cells
   within ccRCC, we focused on 3564 tumor cells from four samples
   (Supplementary Fig. [136]S4a). Pairwise correlation analysis revealed
   multiple distinct transcriptional states that were consistently present
   within these tumors (Fig. [137]4a). To interrogate tumor heterogeneity
   information more precisely, we applied nonnegative matrix factorization
   (NMF) to define underlying transcriptional programs consisting of
   coexpressed genes. NMF is a matrix factorization method and can extract
   meaningful features from noisy or complex data in a direct, unbiased
   manner^[138]36. Due to its excellent performance in dimensionality
   reduction and data interpretability, NMF is widely used to identify key
   subclasses and latent biological processes in the scRNA-seq
   dataset^[139]37,[140]38. With this method, we successfully extracted a
   total of 11 intratumor programs among these samples (Supplementary Fig.
   [141]S4b, c). Subsequently, we refined these programs by hierarchical
   clustering and identified two meta-programs that included highly
   similar programs across the four samples (Fig. [142]4b and
   Supplementary Fig. [143]S4d, Table [144]S7). Meta-program 1 was present
   in all samples, while meta-program 2 was present in three samples. The
   cluster of programs that covered less than half the samples was not
   subjected to subsequent analysis. These two meta-programs exhibited a
   relatively low correlation (r = −0.2), suggesting the different roles
   in tumors (Supplementary Fig. [145]S4e). Next, we investigated the
   biological functions of these meta-programs using pathway enrichment
   analysis. The genes of meta-program 1 were significantly enriched in
   stress-related pathways, such as hypoxia and the MAPK signaling
   pathway, while the genes of meta-program 2 were mainly involved in
   metabolic-related biological processes, such as glycolysis and
   monosaccharide metabolism (Fig. [146]4c). Furthermore, we examined the
   association between the average gene expression of each meta-program
   and the prognosis of patients in the TCGA-KIRC cohort. Patients with
   high expression of meta-program 1 had worse overall survival than
   patients with low expression, while patients with high expression of
   meta-program 2 were associated with better overall survival (Fig.
   [147]4d, e).

Fig. 4. Deciphering malignant transcriptional programs within ccRCC.

   [148]Fig. 4
   [149]Open in a new tab

   a Pairwise correlations between the expression profiles of four
   scRNA-seq samples (3564 cells). b Heatmap of average correlations
   across four samples between pairs of programs. c Dot plot showing
   significantly enriched pathways for signature genes of each
   meta-program. The color shade of the dot indicates the P value after
   FDR correction (the redder, the smaller adjusted P value), and the size
   of the dot represents the number of genes in the indicated pathway (the
   larger, the more genes included). d, e The Kaplan–Meier overall
   survival curves of TCGA-KIRC patients grouped by the average gene
   expression of meta-program 1 and meta-program 2 (with the median as the
   threshold). The P value was calculated by the log-rank test. HR, hazard
   ratio. f, g Sankey diagrams showing the regulatory relationship between
   the four tumor-specific TFs and genes within meta-program 1 and
   meta-program 2.

   We further interrogated the key regulatory molecules for these two
   meta-programs. With the previous strategy for the identification of TF
   target genes, we found that more than two-thirds of the genes in each
   meta-program can be regulated by numerous TFs (Supplementary Fig.
   [150]S4f). Multiple TFs, such as ZNF263 and SP1, showed similar and
   strong regulatory capabilities in different meta-programs. We noticed
   that VEGFA and SMIM24 were the cardinal coregulatory target genes of
   these TFs for meta-program 1 and meta-program 2, respectively
   (Supplementary Fig. [151]S4g, h). We further explored the regulatory
   relationship between meta-programs and four tumor-specific TFs
   identified previously. Notably, we also found that the four TFs all
   exhibited a regulatory relationship with VEGFA in meta-program 1 (Fig.
   [152]4f). In meta-program 2, CRYAB, the second most significant DEG in
   tumor cells, was regulated by all four TFs (Fig. [153]4g). The most
   significant DEG in tumor cells, FXYD2, was regulated by both ISL1 and
   OTP. Overall, we discovered two major transcriptome programs within
   ccRCC tumor cells and further identified the possible regulatory
   elements of these two programs.

CD8^+ T cell clustering and state analysis in ccRCC

   CD8^+ T lymphocytes play crucial roles in inhibiting tumor progression.
   To better elucidate the heterogeneity of tumor-infiltrating CD8^+ T
   cells, we further investigated CD8^+ T cell subpopulations in both
   scRNA-seq and scATAC-seq datasets. We focused on the major
   subpopulations and removed clusters with fewer than 100 cells
   (Supplementary Fig. [154]S5a–f). Finally, we identified four major
   clusters in the scRNA-seq dataset and five major clusters in the
   scATAC-seq dataset (Fig. [155]5a, b). Tissue-resident CD8^+ T cells
   were identified in both scRNA-seq and scATAC-seq datasets with high
   gene expression or activity of tissue-resident markers (CD69,
   ZNF683/Hobit, ITGAE/CD103, and ITGA1/CD49A)^[156]39–[157]41 (Fig.
   [158]5c). Interestingly, tissue-resident CD8^+ T cells included two
   clusters in the scRNA-seq data: one cluster (tissue-resident C2)
   exhibited high expression of effector molecules (TNF, IFNG, and GZMA),
   and the other (tissue-resident C1) was highly expressed naive/memory
   genes (IL7R, CCR7, and TCF7). We identified the exhausted CD8^+ T cells
   based on the expression of exhaustion markers (PDCD1 and TOX^[159]42)
   (Fig. [160]5c and Supplementary Fig. [161]S5g). The cluster (“exhausted
   immediate-early genes (exhausted IEG)”) was also characterized by the
   high expression of genes induced early after activation (e.g., HSPA1A,
   DNAJB1, JUNB, and ATF3), which is consistent with a recent
   study^[162]15. Notably, we identified two exhausted IEG clusters in the
   scATAC-seq dataset and exhibited different gene activities in T cell
   inhibitory genes (CTLA4, LAG3, and HAVCR2/TIM3).

Fig. 5. Characterization of CD8^+ T cells in ccRCC.

   [163]Fig. 5
   [164]Open in a new tab

   a, b Subclustering of CD8^+ T cells on the UMAP plots of the scRNA-seq
   and scATAC-seq datasets. c Heatmap of CD8^+ T cell lineage and
   functional markers provided phenotypic information for individual CD8^+
   T cell clusters in the scRNA-seq (left) and scATAC-seq (right) data. d,
   e Violin plots showing the signature scores of cytotoxic and exhaustion
   gene sets for each CD8^+ T cell cluster in the scRNA-seq (top) and
   scATAC-seq data (bottom). Signature scores for each cell were
   calculated by the VISION method. Two-sided Wilcoxon test. ns, no
   significance; *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001. f
   Violin plots showing the progenitor and terminally exhausted signature
   scores for each CD8^+ T cell cluster in the scRNA-seq data. Signature
   scores for each cell were calculated by the VISION method. Two-sided
   Wilcoxon test. ns, no significance; *P < 0.05; **P < 0.01;
   ***P < 0.001; ****P < 0.0001. g Heatmap of significantly enriched
   pathways for DEGs of each CD8^+ T cell cluster in the scRNA-seq data.
   The darker the red color is, the smaller the adjusted P value. h Dot
   plot showing the top enriched TFs within each CD8^+ T cell
   subpopulation individually.

   We further investigated the functional properties of these CD8^+ T cell
   subpopulations using multiple functional gene sets^[165]13,[166]43 and
   the VISION method^[167]44 (Supplementary Table [168]S2). The exhaustion
   cluster presented the highest cytotoxic, exhaustion, and terminal
   differentiation signature scores (Fig. [169]5d, e and Supplementary
   Fig. [170]S5h). Exhausted IEG cells had higher cell stress signature
   scores than other cells (Supplementary Fig. [171]S5i). Recent studies
   have reported that progenitor exhausted T cells control tumors more
   effectively than terminally exhausted T cells and respond better to
   anti-PD1 therapy^[172]43,[173]45. Thus, we assessed progenitor and
   terminally exhausted signatures and found that the tissue-resident C1
   cluster had the highest progenitor exhausted signature scores and the
   lowest terminally exhausted signature scores (Fig. [174]5f), indicating
   an important role in tumor immunotherapy. In parallel, we performed
   pathway enrichment analysis for DEGs (adjusted P < 0.05 and
   log[2](FC) > 0.5) of each cluster in our scRNA-seq dataset. We found
   that two tissue-resident clusters enriched distinct pathways: the
   tissue-resident C1 cluster was significantly associated with the
   inflammatory response and the tissue-resident C2 cluster was enriched
   in cytokine and vitamin metabolic pathways (Fig. [175]5g). Multiple T
   cell exhaustion-related pathways, such as PD-1, IL2-STAT5^[176]46, and
   IL6-JAK-STAT3^[177]47 signaling, were significantly enriched in the
   exhaustion cluster. Furthermore, we found that the average expression
   of DEGs in the exhaustion cluster was associated with worse overall
   survival in the TCGA-KIRC and CheckMate-025 cohorts, a randomized phase
   III trial of nivolumab (anti-PD-1) treatment in ccRCC^[178]48
   (Supplementary Fig. [179]S5j). Next, we examined the regulatory
   elements in these CD8^+ T clusters. EOMES and BATF implicating in CD8^+
   T cell exhaustion^[180]49,[181]50 were highly enriched in the
   exhaustion cluster (Fig. [182]5h). Multiple Rel/NF-κB family TFs, such
   as RELA and NFKB1, were also enriched in the exhaustion cluster.
   Interestingly, we found that two exhausted IEG clusters were regulated
   by different TFs.

Macrophage clustering and state analysis in ccRCC

   Tumor-associated macrophages (TAMs) are the major population of myeloid
   cells in tumors and play vital roles in tumorigenesis and drug
   resistance^[183]51. To systematically interrogate the heterogeneity of
   TAMs, we performed recluster analysis similar to CD8^+ T cells and
   identified three TAM clusters in both datasets using known phenotypic
   markers^[184]52 (Fig. [185]6a–c and Supplementary Fig. [186]S6a–f). The
   first cluster (“TAM-C1QB”) expressed complement genes (e.g., C1QB and
   C1QC), APOE, and early response-related genes (e.g., IER2 and JUN).
   Moreover, numerous MHC class II molecules were highly expressed in this
   cluster, suggesting strong antigen presentation ability (Fig. [187]6d).
   The second cluster (“TAM-RGCC”) expressed high levels of RGCC and
   proinflammatory genes, such as NLRP3^[188]53, CLEC5A, and IL1A/B. The
   third cluster (“TAM-LGALS3”) exhibited intermediate expression of C1QB
   and C1QC but highly expressed genes involved in alternative (M2)
   macrophage activation, such as ANXA2^[189]54 and LGALS3^[190]55. In
   addition, this cluster also expressed immunosuppressive genes, such as
   GPNMB^[191]56 and TREM2^[192]57, and multiple MHC class I molecules.

Fig. 6. Characterization of TAMs in ccRCC.

   [193]Fig. 6
   [194]Open in a new tab

   a, b Subclustering of TAMs on the UMAP plots of the scRNA-seq and
   scATAC-seq datasets. c Dot plot showing the expression levels and
   activity scores for known phenotypic markers in each TAM cluster. Dot
   size indicates the fraction of expressing cells, colored based on
   normalized expression or activity. d Heatmap showing the expression of
   MHC molecules, chemokines, cytokines, and other related genes in each
   TAM cluster. e Boxplots showing the M1 and M2 signature scores for each
   TAM cluster in the scRNA-seq data. The signature score was calculated
   by the VISION method. Two-sided Wilcoxon test. ns, no significance;
   *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001. f, g Boxplot
   showing the angiogenesis and phagocytosis signature scores for each TAM
   cluster in the scRNA-seq data. The signature score was calculated by
   the VISION method. Two-sided Wilcoxon test. ns, no significance;
   *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001. h Dot plot showing
   the expression of the immune costimulatory, checkpoint, and evasion
   genes for each TAM cluster in the scRNA-seq dataset. i Heatmap of
   log[2](FC) values for TFs with significant differences both in the
   scRNA-seq and scATAC-seq datasets. j TF regulatory network showing the
   predicted target genes for MEF2C in the TAM-C1QB cluster.

   To understand the function of these TAM subpopulations, we further
   examined the expression or activity of multiple functional gene
   sets^[195]58 (Supplementary Table [196]S2). We found that the TAM-RGCC
   cluster exhibited the highest M1 signature scores, followed by the
   TAM-C1QB cluster (Fig. [197]6e and Supplementary Fig. [198]6g).
   Conversely, the M2 signature was primarily enriched in the TAM-LGALS3
   cluster. TAM-RGCC cluster possessed higher angiogenesis signature
   scores (Fig. [199]6f and Supplementary Fig. [200]S6h). Notably, the
   TAM-C1QB cluster showed the highest phagocytosis signature scores in
   scRNA-seq data, suggesting a high phagocytic activity (Fig. [201]6g).
   However, scATAC-seq data did not show a similar trend, maybe due to the
   sparsity of its data^[202]59 (Supplementary Fig. [203]S6i). In
   parallel, we investigated the differences among these subpopulations at
   the pathway level and found that various pathways related to
   inflammation (e.g., TNFA signaling via NFKB, and inflammatory response)
   and phagocytosis (e.g., endocytosis and phagocytosis) were
   significantly enriched in the TAM-C1QB cluster (Supplementary Fig.
   [204]S6j). TAM-RGCC cluster exhibited enrichment of
   angiogenesis-related pathways, while the TAM-LGALS3 cluster exhibited
   enrichment of lysosome and lipid catabolic processes. Furthermore, we
   examined the expression of immune checkpoint genes and costimulatory
   molecules. Multiple costimulatory signals were detected in both
   TAM-C1QB and TAM-RGCC clusters but not in the TAM-LGALS3 cluster (Fig.
   [205]6h). The ligands (PD-L1, PD-L2, VSIR/VISTA^[206]60, VSIG4^[207]61,
   and SIGLEC10^[208]62) mediating T cell immune checkpoint were highly
   expressed in the TAM-C1QB cluster while LGALS9^[209]63 was specifically
   enriched in the TAM-LGALS3 cluster. Next, we investigated the
   difference in TF motif activity and found distinct TF regulation
   programs among the three TAM clusters (Supplementary Fig. [210]S6k).
   Four TFs (MEF2C, NFKB1, RUNX3, and ENO1) showed a substantial increase
   in both gene expression and activity in the corresponding clusters
   (Fig. [211]6i). Previous studies have demonstrated that MEF2C plays
   vital roles in promoting M1 macrophage polarization and inducing cell
   death of macrophage^[212]64,[213]65. We identified MEF2C target genes
   using the same strategy described in the previous section and found
   that this gene regulated multiple TFs (e.g., FOXO1, NEU1, and NRP1) and
   chemokines (e.g., CCL3 and CCL3L1) (Fig. [214]6j) that have been
   demonstrated to be associated with phagocytosis and angiogenesis in
   macrophages^[215]66–[216]69. Moreover, the signature score of MEF2C
   target genes was associated with better survival in both the TCGA-KIRC
   and CheckMate-025 cohorts (Supplementary Fig. [217]S6l).

Cell–cell cross-talk in the ccRCC microenvironment

   To demonstrate the intercellular communications within the ccRCC
   microenvironment, we applied CellPhoneDB, a repository of known
   ligand–receptor interactions, for data analysis. We found that the
   endothelial cells had the largest ligand–receptor interaction pairs,
   while B cells had the fewest (Fig. [218]7a and Supplementary Fig.
   [219]S7a). Notably, the TAM populations had the largest number of
   ligand–receptor interaction pairs compared to other immune cell
   populations (Supplementary Fig. [220]S7b). Tumor cells are the key
   cellular component of the entire TME, and the interactions between
   tumor cells and other cells play crucial roles in tumorigenesis.
   Therefore, we built an interaction map to investigate the intercellular
   communications of tumor cells (Fig. [221]7b). Tumor cells mainly
   communicated with the endothelium, TAMs, proliferative CD8^+ T cells,
   and CD8^+ exhaustion T cells through multiple ligand–receptor
   interactions (Fig. [222]7c and Supplementary Fig. [223]S7c).

Fig. 7. Ligand–receptor-based interaction between tumor and immune cells.

   [224]Fig. 7
   [225]Open in a new tab

   a Heatmap of the number of significant ligand–receptor interactions
   between cell types in the scRNA-seq data. b The interaction network
   between the tumor cell and other cell types. The direction of the arrow
   indicates the cell type expressing the receptor. The wider the edge or
   larger the dot is, the greater the number of interactions. c Heatmap of
   the number of significant interactions mediated by tumor cells as
   ligand-expressing and receptor-expressing cells. d Dot plots showing
   the mean interaction strength for selected ligand–receptor pairs
   between tumor cells and TAM clusters. The left and right plots show
   that the ligand is expressed in the tumor cells and TAM clusters,
   respectively. Dot size indicates the P value, colored by the average
   expression level of interacting molecule 1 in cluster 1 and interacting
   molecule 2 in cluster 2. e The Kaplan–Meier overall survival curves of
   TCGA-KIRC patients grouped by the averaged expression of RPS19 and
   C5AR1 (with the median value as the threshold). HR, hazard ratio. f Dot
   plots showing the mean interaction strength for selected
   ligand–receptor pairs between tumor cells and three lymphoid subsets.
   The left and right plots show that the ligand is expressed in the tumor
   cells and lymphoid subsets, respectively. Dot size indicates the P
   value, colored by the average expression level of interacting molecule
   1 in cluster 1 and interacting molecule 2 in cluster 2. g Kaplan–Meier
   overall survival curves of TCGA-KIRC patients grouped by the averaged
   expression of LTB and LTBR (with the median value as the threshold).

   The crosstalks between various immune cells (especially macrophages and
   T cells) and tumor cells play crucial roles in tumor immune escape,
   which is a hot spot in cancer immunology research. Thus, we selected
   three cell subpopulations in the myeloid and lymphoid lineages with the
   highest number of interactions with tumor cells for further study. We
   found strong interactions between tumor cells and TAM populations
   mediated by macrophage migration inhibitory factor (MIF) and its
   receptors (CD44 and CD74), which have been well characterized to play
   vital roles in tumor progression, angiogenesis, and immune escape in
   ccRCC^[226]70–[227]72 (Fig. [228]7d and Supplementary Fig. [229]S7d).
   We also uncovered some novel interactions that have not been reported
   in ccRCC, for example, the ligand–receptor interaction between
   ribosomal protein S19 (RPS19) and complement C5a receptor 1 (C5AR1),
   which has been shown to contribute to immunosuppression in the human
   breast and ovarian cancer TME^[230]73. Patients with high average
   expression of RPS19 and C5AR1 had a poor prognosis in the TCGA-KIRC
   dataset (Fig. [231]7e). Moreover, we identified multiple interactions
   between tumor cells and three lymphocyte subpopulations (proliferative
   CD8^+ T cell, Treg, and CD8^+ T-exhaustion) (Fig. [232]7f). Notably, we
   characterized a Treg-tumor cell-specific interaction mediated by the
   ligand Lymphotoxin Beta (LTB) and its receptor lymphotoxin beta
   receptor (LTBR), which is required for fibrosarcoma and hepatocellular
   carcinoma tumor formation^[233]74,[234]75. High average expression of
   LTB and LTBR was associated with poor clinical outcomes in the
   TCGA-KIRC dataset, suggesting the translational potential of this
   finding (Fig. [235]7g).

Discussion

   Here, we performed integrative scRNA-seq and scATAC-seq to delineate
   the transcriptional and epigenetic landscape of ccRCC. Our analysis
   demonstrated that scRNA-seq and scATAC-seq had consistently high cell
   identification abilities, and their data confirmed and complemented
   each other. Single-cell multiomics profiling can provide more
   comprehensive information from different perspectives, enabling us to
   better dissect cellular compositions and decipher cross-compartment
   interactions in the TME.

   Consistent with several recently reported scRNA-seq studies of ccRCC,
   we found that tumor cells represent only a small fraction (7.2%) of all
   cells in ccRCC tissues^[236]11, suggesting that traditional bulk-level
   epigenetic sequencing methods, such as ATAC-seq and ChIP-seq, may fail
   to identify some tumor cell-specific regulatory elements and their
   networks. In this study, we revealed several pivotal regulatory TFs
   (HOXC5, ISL1, VENTX, and OTP) in tumor cells by integrating scRNA-seq
   and scATAC-seq data and further experimentally validated their roles in
   tumor growth. The function of these TFs had been explored in some other
   types of cancers. For example, ISL1 has been reported as an important
   regulator and potential therapeutic target for gastric cancer and
   triple-negative breast cancer^[237]76,[238]77, while OTP has been
   characterized as a promising prognostic marker of pulmonary
   neuroendocrine tumors^[239]78. Our findings highlight the potential
   oncogenic roles of these TFs in ccRCC for the first time, but the
   underlying mechanisms remain to be investigated in future studies.

   In addition, multiple hepatocyte nuclear factors (e.g., HNF1A, HNF4A,
   and HNF4G) we identified in tumor cells have been shown to be specific
   for proximal tubule cells, which are the original cells of
   ccRCC^[240]79. We also adopted a refined clustering strategy on the two
   major immune cell populations (CD8^+ T cells and macrophages) in the
   TME to reveal the TFs that control different functional states of these
   cells. Multiple NFATC TFs (e.g., NFATC2/NFAT1 and NFATC3/NFAT4) were
   specifically enriched in the exhausted IEG C1 population and have
   already been demonstrated to promote exhaustion of T cells and better
   control immune responses by disrupting its
   interactions^[241]80,[242]81. C-Rel (REL) plays a vital role in the
   activation of Tregs and is a potential target for suppressing Treg
   function^[243]82. Interestingly, we found that the REL TF was
   significantly enriched in the exhaustion T cells, suggesting a
   potential target for CD8^+ T cells for future research. Furthermore, we
   comprehensively delineated the multicellular communities in the TME and
   identified ligand–receptor pairs, including RPS19-C5AR1 and LTB-LTBR,
   which could serve as potential therapeutic targets for further studies.
   Targeting the LTB-LTBR interaction with an agonistic monoclonal
   antibody (CBE11) inhibited tumor growth and prolonged survival in
   colorectal cancer xenografts^[244]83. Therefore, therapeutic
   interruption of these interactions by antagonist or blocking antibodies
   may alleviate the immunosuppressive nature of the microenvironment and
   provide a potential strategy for ccRCC treatment.

   In summary, our single-cell multiomics analysis provides a
   high-resolution transcriptional and epigenomic map of ccRCC and
   in-depth knowledge of tumor heterogeneity and underlying regulatory
   factors, facilitating a step forward in rational therapeutic strategy
   design.

Materials and methods

Human specimens

   All research activities were pre-approved by The Institutional Research
   Ethics Committee of The Second Affiliated Hospital, School of Medicine,
   Zhejiang University (NO. IR2020001463), and Informed consent were
   obtained for all human participants. Human ccRCC samples were directly
   obtained from the operating room during nephrectomy and transported in
   MACS tissue storage solution (#130-100-008; Miltenyi) on ice.

Tissue dissociation and library preparation

   The tumor tissue of each sample was equally divided into two parts for
   scRNA-seq and scATAC-seq library preparation. For scATAC-seq, fresh
   nuclei were isolated with 1 mL prepared lysis buffer (10 nM Tris-HCI,
   10 mM NaCI, 3 mM MgCl[2], 0.1% Nonidet^TM P40 Substitute (#74385;
   Sigma-Aldrich)) and incubated on ice for 5 min. The culture was
   centrifuged at 500× g for 5 min at 4 °C and mixed with 1 mL nuclear
   resuspension buffer (1× PBS, 1% BSA, and 0.2 U/μL RNase Inhibitor). The
   homogenate was filtered through a 40-μm cell strainer (#H13680-0040;
   BelArt) and the nuclear concentration was determined using Countess® II
   FL Automated Cell Counter (#[245]C10228; Thermo Fisher). For scRNA-seq,
   tissues were cut into 2–4 mm^3 pieces and dissociated to generate a
   single-cell suspension. The cell suspension culture was centrifuged at
   300× g for 30 s and resuspended with an additional 10 mL RPMI 1640
   (#10-040-CM; Miltenyi Biotec). The homogenate was filtered through a
   70-μm cell strainer (#130-098-462; Miltenyi Biotec) and centrifuged at
   300× g for 7 min followed by red blood cell lysis with 1 mL 1× Red
   Blood Cell Lysis Solution (#130-094-183; Miltenyi Biotec). The cell
   concentration was determined using Countess® II FL Automated Cell
   Counter. 10× Chromium libraries were prepared according to manufacturer
   protocol. Three scATAC-seq libraries and four scRNA-seq libraries were
   obtained using 10× Genomics Chromium Single Cell ATAC v1 chemistry and
   10× Genomics Chromium Single Cell 3′ v3 chemistry, respectively. All
   libraries were sequenced on Illumina NovaSeq 6000 using a paired-end
   150 bp protocol.

scRNA-seq data processing

   Raw sequencing data were processed with Cell Ranger (v5.0.0, 10×
   Genomics) software for demultiplexing, aligning to GRCh38 human
   reference genome, and generating gene-barcode matrices. Seurat
   (v4.0.3)^[246]84 R package was used to perform filtering,
   normalization, dimensionality reduction, clustering, and differential
   expression analysis. The following criteria were applied to each sample
   to remove low-quality cells: gene number between 200 and 6000, UMI
   count > 1000, and mitochondrial content > 10%. Doublets were predicted
   by the DoubletFinder (v2.0.3)^[247]85 algorithm. After filtering, a
   total of 38,600 cells were left. The batch effect across different
   samples was eliminated by the Harmony (v1.0)^[248]86 method. The top 40
   harmony embedding was selected by the “ElbowPlot” function and used to
   perform clustering and visualization. The “FindClusters” function was
   performed to generate different clustering results with resolutions
   ranging from 0.2 to 1.2. An appropriate resolution was determined based
   on cluster stability with clustree (v0.4.3)^[249]87 R package. Finally,
   we obtained 19 clusters (resolution = 0.3) (Supplementary Fig.
   [250]S1c). Two clusters (15 and 18) were excluded because they
   contained fewer than 100 cells or did not express known markers, and
   38,097 cells were retained for the subsequent analysis. Differential
   gene analysis was performed by the MAST method^[251]88
   (“FindAllMarkers” function) and the donor as the latent variable. DEGs
   were identified with Bonferroni-adjusted P values smaller than 0.05 and
   log[2]-fold-change values larger than 0.25. The second round of
   clustering procedures for CD8^+ T and macrophage cells was the same as
   above: starting from normalized expression matrix with SCTransfrom
   method, performing integration analysis with Harmony method, and
   clustering with “FindNeighbors” and “FindClusters” function.

scATAC-seq data processing

   Raw sequencing data were processed with Cell Ranger ATAC (v1.2.0, 10×
   Genomics) software for demultiplexing, aligning to GRCh38 human
   reference genome, and generating peak-barcode matrices. Signac
   (v1.2.1)^[252]89 R package was used to perform subsequent analysis.
   Low-quality cells were removed based on the following criteria: peak
   region fragments > 1000, peak region fragments < 20,000, reads in peaks
   > 15%, blacklist ratio < 0.05, nucleosome signal < 4 and TSS enrichment
   > 3. After filtering, a dataset from three samples comprising 88,392
   peaks from 24,173 cells remained for latent semantic indexing (LSI)
   analysis. Clustering and dimensionality reduction were then performed
   on the corrected LSI components by the Harmony^[253]86 method. Finally,
   we obtained 15 clusters with resolution = 0.5 and dims = 2:15
   (Supplementary Fig. [254]S1e). The activity of each gene was quantified
   by examining the local chromatin accessibility, including the 2 kb
   upstream of the transcriptional start site and gene body. The
   accessible chromatin peaks for each cell type were identified by the
   MACS2 method^[255]23. Differential chromatin accessibility analysis was
   performed by the “FindAllMarkers” function. Differentially accessible
   chromatin regions were identified with Bonferroni-adjusted P values
   smaller than 0.05 and log[2]-fold-change values larger than 0.25. The
   genomic regions containing accessible chromatin peaks were annotated by
   ChIPSeeker (v1.26.2)^[256]90 with the UCSC database on hg38^[257]91. A
   similar analysis strategy as described above was used to investigate
   the subpopulation for CD8^+ T cells and macrophages.

Integrated analysis of scRNA-seq and scATAC-seq data

   To help to interpret the scATAC-seq data, we applied Seurat’s
   integration framework to identify the pairs of corresponding cells
   between two modalities data. The shared correlation patterns between
   scATAC-seq gene activity and scRNA-seq gene expression were identified
   by the “FindTransferAnchors” function (reduction = ‘cca’). Then the
   cell type label of each cell in scATAC-seq data was predicted by
   “TransferData” function (weight.reduction = ‘lsi’ and dim = 2:15). A
   total of 21,272 cells were left after filtering using a maximum
   prediction score ≥ 0.5 (Supplementary Fig. [258]S1f). The filtered
   scATAC-seq object was reprocessed with LSI, batch corrected with
   Harmony algorithm, and clustered with SLM algorithm. The Jaccard index
   was used to assess the consistency between cell identities predicted by
   label transfer and curated annotations based on gene activities of
   known markers (Supplementary Fig. [259]S1i).

Single-cell copy number variation analysis

   The inferCNV (v1.6.0) (inferCNV of the Trinity CTAT Project,
   [260]https://github.com/broadinstitute/inferCNV) R package was used to
   distinguish malignant cells by inferring chromosomal CNVs based on the
   gene expression data. The CD8^+ T cells and NK/NKT cells as normal
   reference cells were used to estimate CNVs for the potential tumor cell
   population. A gene ordering file from the human GRCh38 assembly
   containing each gene’s chromosomal start and end positions was created
   as the input of the “gene_order_file” parameter. The raw count matrix
   and annotation file were input to run inferCNV with cutoff = 0.1.

Single-cell transcription factor activity analysis

   Single-cell TF motif activity was estimated for a set of 870 TFs from
   the Catalog of Inferred Sequence Binding Preferences (CIS-BP) database