Abstract The clear cell renal cell carcinoma (ccRCC) microenvironment consists of many different cell types and structural components that play critical roles in cancer progression and drug resistance, but the cellular architecture and underlying gene regulatory features of ccRCC have not been fully characterized. Here, we applied single-cell RNA sequencing (scRNA-seq) and single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq) to generate transcriptional and epigenomic landscapes of ccRCC. We identified tumor cell-specific regulatory programs mediated by four key transcription factors (TFs) (HOXC5, VENTX, ISL1, and OTP), and these TFs have prognostic significance in The Cancer Genome Atlas (TCGA) database. Targeting these TFs via short hairpin RNAs (shRNAs) or small molecule inhibitors decreased tumor cell proliferation. We next performed an integrative analysis of chromatin accessibility and gene expression for CD8^+ T cells and macrophages to reveal the different regulatory elements in their subgroups. Furthermore, we delineated the intercellular communications mediated by ligand–receptor interactions within the tumor microenvironment. Taken together, our multiomics approach further clarifies the cellular heterogeneity of ccRCC and identifies potential therapeutic targets. Subject terms: Tumour heterogeneity, Epigenetics Introduction Clear cell renal carcinoma (ccRCC) is the most common and aggressive histological subtype of renal cell carcinoma^[44]1,[45]2. More than one-third of ccRCC patients relapse and develop metastases after surgery. The prognosis for metastatic ccRCC patients is poor, with a 5-year survival rate of 10%^[46]3, emphasizing the need to understand the underlying cellular and molecular mechanisms to facilitate the discovery of biomarkers and guide clinical intervention. The efficacy of current ccRCC clinical treatment modalities, including conventional chemotherapy, targeted therapy, and immunotherapy, is limited by tumor heterogeneity^[47]2,[48]4. Previously, large-scale genomic studies revealed many essential genome mutations that drive tumor progression and contribute to clinical treatment^[49]5–[50]7. A common mutation is the inactivation of the VHL gene, leading to the stabilization of oncogenic hypoxia-inducible factor proteins (HIF1α, HIF2α). Other frequent mutations in chromatin remodeling genes, such as PBRM1, BAP1, and SETD2, are necessary for tumorigenesis and tumor development^[51]7,[52]8. However, tumors exist in a highly heterogeneous microenvironment that includes many different cell types, so bulk sequencing is not suitable for delineating tumor characteristics at the cellular level. Single-cell isolation and barcoding technologies have enabled us to investigate cellular heterogeneity at a single-cell resolution to discern the role of different cell populations whose gene expression patterns may be masked or diluted in bulk sequencing^[53]9,[54]10. ScRNA-seq has been used to comprehensively characterize the cellular composition and transcriptional states of ccRCC, revealing its origin and intratumoral heterogeneity^[55]11,[56]12. Comparative scRNA-seq analyses of different conditions (e.g., before and after treatment, in disease stages, and at sampling sites) revealed multiple cell populations associated with immunotherapy resistance and a poor prognosis in ccRCC patients^[57]13–[58]15. Although significant progress has been made toward delineating changes in the transcriptional expression of various cell types during tumor development and clinical treatment, it remains unclear how cis-acting DNA elements (e.g., enhancers and promoters) and trans-acting factors (e.g., transcription factors (TFs)) regulate these changes within the microenvironment. ScATAC-seq identifies accessible chromatin regions by Tn5 transposase-mediated tagmentation and captures active DNA regulatory elements at a single-cell resolution^[59]16. Similar to bulk ATAC-seq, this approach can capture multiple types of gene regulatory information; for example, it can identify genome-wide cis-elements and indicate inference with TFs binding and activity. ScATAC-seq has been widely used to delineate the differentiation trajectory of developmental lineages and reveal key regulatory elements^[60]17,[61]18. Multidimensional data, such as DNA and RNA sequencing data, provide different perspectives to investigate biological phenomena and deepen our understanding of tumor pathogenesis and progression. Recent studies integrating scRNA-seq and scATAC-seq data on healthy kidneys have depicted the transcriptional and chromatin accessibility landscape and identified key celltype-specific TFs that play crucial roles in kidney development^[62]19,[63]20. Resolving ccRCC heterogeneity with multidimensional information at the single-cell level can provide new insights for exploring tumor regulatory mechanisms and identifying potential therapeutic targets. Here, we performed scRNA-seq and scATAC-seq on ccRCC primary tumor tissues to identify the key regulatory molecules that mediate tumor development and manipulate the function of immune cells. Our multiomics analysis revealed a tumor-specific regulatory signature and multiple TFs associated with immune cell functional states. Our work provides important insights into dissecting the tumor heterogeneity of ccRCC using single-cell multiomics data. Results Single-cell transcriptional and chromatin accessibility profiling in ccRCC To systematically dissect the heterogenous architecture of ccRCC tumors, we performed paired scRNA-seq and scATAC-seq on three patients and scRNA-seq alone on one additional patient (Fig. [64]1a). Selected patients ranged in age from 33 to 75 years old and included both women (n = 2) and men (n = 2) (Supplementary Table [65]S1). All samples had VHL gene mutations (Supplementary Fig. [66]S1a). Other common mutations, such as the histone deubiquitinase BAP1^[67]8 and SWI/SNF component ARID1B^[68]21, were also detected sporadically in these samples. After quality-control filtering, a total of 38,097 cells from scRNA-seq and 21,272 cells from scATAC-seq were retained for the following analysis (Supplementary Fig. [69]S1b–e). Using canonical markers, we identified 15 cell types in the scRNA-seq dataset (Fig. [70]1b, c and Supplementary Tables [71]S2, [72]S3). For our scATAC-seq data, we calculated the prediction scores by Seurat’s label-transfer algorithm and annotated cell clusters in a supervised manner (Fig. [73]1d and Supplementary Fig. [74]S1f, g). In parallel, we inspected the chromatin accessibility at the promoter regions for known marker genes and calculated their activity scores for assigning cell identities (Supplementary Fig. [75]S1h). Comparing the annotation results between these two annotation strategies, we found that most cell types were present in both datasets, which supported that scATAC-seq is comparable to scRNA-seq in the detection and annotation of cell types (Supplementary Fig. [76]S1i). Finally, we identified 12 cell types in the scATAC-seq dataset by combining these two annotation results (Fig. [77]1e, f). Fig. 1. Single-cell transcriptional and chromatin accessibility profiling of human ccRCC. [78]Fig. 1 [79]Open in a new tab a Schematic of multiomics profiling of chromatin accessibility and transcription in ccRCC using scRNA-seq and scATAC-seq. b scRNA-seq UMAP projection of 38,097 single cells from four ccRCC samples. c Dot plot showing the gene expression patterns of cell-type marker genes in the scRNA-seq data. d Cross-platform linkage of scATAC-seq cells with scRNA-seq cells. e scATAC-seq UMAP projection of 21,272 single cells from three ccRCC samples. f Normalized chromatin accessibility profiles for each cell type at canonical marker genes. The promoter region is highlighted in gray with the gene model and chromosome position shown below. g Heatmap of CNV signals normalized against the “normal” cluster shown in the top panel (CD8^+ T cells and NK/NKT cells) for CNV changes by chromosome (columns) within individual cells (rows). All cells in the tumor population (bottom panel) exhibited chromosome 3p loss (blue dotted frame) or chromosome 5q gain (red dotted frame), which are classic genomic features of ccRCC. h Pie charts of the proportion of each class in the scRNA-seq and scATAC-seq dataset. We identified five major lymphocyte populations: CD4^+ (CD4, IL7R, CD3D, CD3E) and CD8^+ T cells (CD8A, CD8B, CD3D, CD3E), Treg (FOXP3, IL2RA), natural killer (NK)/natural killer T (NKT) cells (KLRD1, GNLY), B cells (MS4A1/CD20, CD79A) and multiple myeloid subsets after integration of scRNA-seq data and scATAC-seq data, including macrophages (CSF1R, CD68, CD163), monocytes (S100A12, FCGR3A/CD16), and mast cells (TPSAB1, KIT) (Fig. [80]1c, f and Supplementary Fig. [81]S1j). We also identified several nonimmune cell types, including endothelial cells (PECAM1, PTPRB) and mesangial cells (PDGFRB, ACTA2). Notably, we found two different endothelial subpopulations: one specifically expressed vascular cell adhesion molecule 1 (VCAM1), while the other did not express VCAM1 but had enriched kinase insert domain receptor (KDR). Similar to a previous study^[82]11, we found that the VCAM1^− endothelial cells highly expressed genes regulating endothelial cell proliferation and vasculature development, whereas the VCAM1^+ endothelial cells highly expressed genes regulating immune cell chemotaxis and migration (Supplementary Fig. [83]S1k). Next, we identified the tumor cells by detecting both the canonical marker CA9^[84]11,[85]22 and classic copy number variations (CNVs) (chromosome 3p loss or chromosome 5q gain)^[86]8 (Fig. [87]1c, f, g). In the entire tumor microenvironment (TME) of ccRCC, we found that immune cells were the most numerous cell population, accounting for more than 70% of the total cells, while tumor cells accounted for <10% of the population (9.36% from scRNA-seq data; 6.2% from scATAC-seq data) (Fig. [88]1h). Moreover, we found that the abundance of cell types varied across samples, suggesting a considerable level of tumor heterogeneity (Supplementary Fig. [89]S1l). In summary, our integrated scRNA-seq and scATAC-seq multiomics approach revealed the heterogeneous cellular composition of ccRCC. Tumor-specific regulatory elements in ccRCC To investigate the differences in chromatin accessibility among all cell types, we first identified a total of 212,326 peaks in our scATAC-seq data by using MACS2^[90]23 and found that ~10.6% (22,682 unique peaks) of them exhibited significant differences among cell types; these sites were defined as differentially accessible chromatin regions (DARs) (adjusted P < 0.05 and log[2](fold change (FC)) > 0.25) (Fig. [91]2a). Approximately 17.3% (mean proportion = 0.173 ± 0.067) of DARs were closely associated with differentially expressed genes (DEGs) in their respective cell types (Supplementary Table [92]S4). Overall, endothelium and mesangial cells had the most DARs, followed by tumor cells (Supplementary Fig. [93]S2a). Notably, lymphoid lineage populations, except B cells, had fewer DARs than myeloid cell populations. The majority of DARs were located in the promoter and intronic regions of the genome, and the distribution of DARs was relatively conserved across cell types (Fig. [94]2b and Supplementary Fig. [95]S2b). In addition, we found that both DARs and DEGs of tumor cells were significantly enriched in metabolism-related biological processes (Fig. [96]2c and Supplementary Fig. [97]S2c). Next, we applied chromVAR^[98]24 to infer TF motif activity in our scATAC-seq data. Hierarchical clustering of the bias-corrected deviation scores for the differential TFs revealed shared and unique regulatory elements across cell types (Fig. [99]2d and Supplementary Table [100]S5). The identification of distinct lineage-specific TFs further supported our assignment of cell identities. For example, T-Box (e.g., EOMES and TBX5) TFs were exclusively enriched in NK/NKT and CD8^+ T cell populations^[101]25,[102]26. Within myeloid cells, SPI1^[103]27 TF were enriched in macrophages, CEBP TF family^[104]28 exhibited high activity in monocytes, while the GATA2^[105]29 TF was specifically enriched in mast cells. Interestingly, two endothelial subpopulations exhibited differences in TF enrichment: the VCAM1^− endothelial cells showed high activity of SOX TF family, while STAT1 and NFATC4 were enriched in the VCAM1^+ endothelial cells. Fig. 2. Identification of tumor-specific regulatory elements. [106]Fig. 2 [107]Open in a new tab a Heatmap of the Z-score normalized accessibilities of 22,682 scATAC-seq differential peaks for each cell type. b Pie plot of genomic annotations for scATAC-seq differential peaks. c Gene ontology term enrichment analysis for distal (intergenic region) and proximal (promoter and gene body) upregulated DARs in tumor cells. The Benjamini-Hochberg adjusted P values are shown on the right side of the plots. d Heatmap representation of chromVAR bias-corrected deviation scores for the differential TF motifs of each cell type. The top TF motifs of each cell type are indicated on both sides of the plot. e Heatmap of the log[2](FC) values for tumor-specific TFs identified by our filtering strategy. The right bar plot shows the Bonferroni-adjusted P values. f Kaplan–Meier overall survival curves of TCGA-KIRC patients grouped by the average expression of tumor-specific TFs (with the median value as the threshold). HR hazard ratio. To uncover the pivotal TFs participating in tumorigenesis and development, we designed a filtering strategy to identify highly specific TFs enriched in tumor cells. First, we calculated the standard deviation of chromVAR bias-corrected scores among all cell types for each TF and filtered out the TFs with low variability by the median absolute deviation method^[108]30. Then, we selected the TFs highly enriched exclusively in tumor cells (log[2](FC) > 4 in tumor cells and log[2](FC) < 1 in any other cell types; adjusted P < 0.0001) and identified 49 TFs with higher activity scores in tumor cells than in any other cell types (Fig. [109]2e and Supplementary Table [110]S6). Hepatocyte nuclear factor 1 (HNF1A, HNF1B), a family of tissue-restricted transcription regulators associated with kidney developmental disorders^[111]31, had the highest activity scores and were highly expressed in tumor cells specifically (Supplementary Fig. [112]S2d). Multiple HOX family TF genes were also highly enriched in tumor cells and have been reported to play crucial roles in numerous tumor processes, including angiogenesis and oncogenesis^[113]32,[114]33. We then examined the association of these tumor-specific TFs with the prognosis of patients in The Cancer Genome Atlas kidney renal clear cell carcinoma (TCGA-KIRC) cohort. Patients with high average expression of tumor-specific TFs had shorter overall survival and disease-free survival than the patients with low average expression (Fig. [115]2f and Supplementary Fig. [116]S2e). To further investigate and verify the biological functions of these TFs in ccRCC, we selected four TFs (HOXC5, VENTX, ISL1, and OTP) whose expression levels were significantly associated with worse overall survival in the TCGA-KIRC dataset, and binding sites were located in accessible chromatin regions, which were specific for kidney cancer and identified by ATAC-seq^[117]34 (Fig. [118]3a and Supplementary Fig. [119]S3a). We identified their target genes whose promoters or linked candidate cis-regulatory elements (cCREs) were accessible and contained the TF-binding motif in tumor cells. Then, we constructed TF regulatory networks to gain further insights into TF-mediated gene regulation in tumor cells. Within these networks, we found that these four TFs regulated multiple tumor-specific genes, such as FXYD2 and CRYAB, with a significant increase in expression in tumor cells (Fig. [120]3b and Supplementary Table [121]S3). Moreover, we found that the target genes of these TFs were significantly enriched in hypoxia and cell proliferation signaling pathways (Supplementary Fig. [122]S3b). For each TF, we calculated the target gene scores for each cell type and found that the scores in tumor cells were significantly higher than in any other cell type (Supplementary Fig. [123]S3c). Similar phenomena were observed in the other three previous scRNA-seq datasets of ccRCC^[124]11,[125]13,[126]14 (Supplementary Fig. [127]S3d–f). More importantly, we found that knocking down these TFs significantly reduced tumor cell proliferation and increased cell death (Fig. [128]3c, d and Supplementary Fig. [129]S3g, h). We further confirmed that knocking down two TFs (HOXC5 and ISL1) strongly reduced tumor growth in the xenograft mouse models (Supplementary Fig. [130]S3i–k). To find potential drugs for targeting these TFs, we used the LINCS consortium^[131]35, which included 19,811 small molecule compound-perturbed profiles, to interrogate the effects of drugs on gene expression. With this information, we found two candidate drugs (homoharringtonine (HHT) and mitotane) approved by the FDA could significantly decrease the expression level of HOXC5, ISL1, and VENTX (Fig. [132]3e). OTP was not included in the drug perturbation expression profile and thus was not subjected to subsequent analysis. As we expected, both of these drugs significantly decreased the proliferation rate of tumor cells and the expression levels of HOXC5, ISL1, and VENTX in two renal cancer cell lines (Fig. [133]3f–j). Collectively, we identified tumor-specific regulatory elements with chromosome accessibility signals captured by scATAC-seq and verified their roles in promoting tumor growth. Fig. 3. Characterization of the four tumor-specific TFs. [134]Fig. 3 [135]Open in a new tab a Kaplan–Meier analyses of overall survival in TCGA-KIRC patients separated by HOXC5, VENTX, ISL1, and OTP expression (with the median value as the threshold) individually. HR, hazard ratio. b TF regulatory network showing the candidate target genes for the following TFs: HOXC5, VENTX, ISL1, and OTP in tumor cells. c The effects of sh-HOXC5, VENTX, ISL1, OTP, and vector on cell proliferation were determined by a cell proliferation assay in the 786-O cell line. d HOXC5, VENTX, ISL1, and OTP mRNA expression were significantly downregulated in the 786-O cell line. Significance was determined by two-way ANOVA. e Heatmap showing the degree of decrease in expression levels of VENTX, ISL1, and HOXC5 after drug treatment. f Cell proliferation was assessed over a 2-day time course after treatment with DMSO or HHT in the 786-O (top) and 769-P (bottom) ccRCC cell lines. Significance was determined by multiple t-tests. g The mRNA levels of HOXC5 after treatment with DMSO or HHT in the 786-O (left) and 769-P (right) ccRCC cell lines were measured using qPCR. Significance was determined by Student’s t-test. h Cell proliferation was assessed over a 2-day time course after treatment with DMSO or mitotane in the 786-O (top) and 769-P (bottom) ccRCC cell lines. Significance was determined by multiple t-tests. i The mRNA levels of VENTX after treatment with DMSO or mitotane in the 786-O (left) and 769-P (right) ccRCC cell lines were measured using qPCR. Significance was determined by Student’s t-test. j The mRNA levels of ISL1 after treatment with DMSO or mitotane in the 786-O (left) and 769-P (right) ccRCC cell lines were measured using qPCR. Significance was determined by Student’s t-test. For all statistical tests, *P < 0.05; **P < 0.01; ***P < 0.001. Malignant transcriptional programs within ccRCC To explore how expression states vary among different malignant cells within ccRCC, we focused on 3564 tumor cells from four samples (Supplementary Fig. [136]S4a). Pairwise correlation analysis revealed multiple distinct transcriptional states that were consistently present within these tumors (Fig. [137]4a). To interrogate tumor heterogeneity information more precisely, we applied nonnegative matrix factorization (NMF) to define underlying transcriptional programs consisting of coexpressed genes. NMF is a matrix factorization method and can extract meaningful features from noisy or complex data in a direct, unbiased manner^[138]36. Due to its excellent performance in dimensionality reduction and data interpretability, NMF is widely used to identify key subclasses and latent biological processes in the scRNA-seq dataset^[139]37,[140]38. With this method, we successfully extracted a total of 11 intratumor programs among these samples (Supplementary Fig. [141]S4b, c). Subsequently, we refined these programs by hierarchical clustering and identified two meta-programs that included highly similar programs across the four samples (Fig. [142]4b and Supplementary Fig. [143]S4d, Table [144]S7). Meta-program 1 was present in all samples, while meta-program 2 was present in three samples. The cluster of programs that covered less than half the samples was not subjected to subsequent analysis. These two meta-programs exhibited a relatively low correlation (r = −0.2), suggesting the different roles in tumors (Supplementary Fig. [145]S4e). Next, we investigated the biological functions of these meta-programs using pathway enrichment analysis. The genes of meta-program 1 were significantly enriched in stress-related pathways, such as hypoxia and the MAPK signaling pathway, while the genes of meta-program 2 were mainly involved in metabolic-related biological processes, such as glycolysis and monosaccharide metabolism (Fig. [146]4c). Furthermore, we examined the association between the average gene expression of each meta-program and the prognosis of patients in the TCGA-KIRC cohort. Patients with high expression of meta-program 1 had worse overall survival than patients with low expression, while patients with high expression of meta-program 2 were associated with better overall survival (Fig. [147]4d, e). Fig. 4. Deciphering malignant transcriptional programs within ccRCC. [148]Fig. 4 [149]Open in a new tab a Pairwise correlations between the expression profiles of four scRNA-seq samples (3564 cells). b Heatmap of average correlations across four samples between pairs of programs. c Dot plot showing significantly enriched pathways for signature genes of each meta-program. The color shade of the dot indicates the P value after FDR correction (the redder, the smaller adjusted P value), and the size of the dot represents the number of genes in the indicated pathway (the larger, the more genes included). d, e The Kaplan–Meier overall survival curves of TCGA-KIRC patients grouped by the average gene expression of meta-program 1 and meta-program 2 (with the median as the threshold). The P value was calculated by the log-rank test. HR, hazard ratio. f, g Sankey diagrams showing the regulatory relationship between the four tumor-specific TFs and genes within meta-program 1 and meta-program 2. We further interrogated the key regulatory molecules for these two meta-programs. With the previous strategy for the identification of TF target genes, we found that more than two-thirds of the genes in each meta-program can be regulated by numerous TFs (Supplementary Fig. [150]S4f). Multiple TFs, such as ZNF263 and SP1, showed similar and strong regulatory capabilities in different meta-programs. We noticed that VEGFA and SMIM24 were the cardinal coregulatory target genes of these TFs for meta-program 1 and meta-program 2, respectively (Supplementary Fig. [151]S4g, h). We further explored the regulatory relationship between meta-programs and four tumor-specific TFs identified previously. Notably, we also found that the four TFs all exhibited a regulatory relationship with VEGFA in meta-program 1 (Fig. [152]4f). In meta-program 2, CRYAB, the second most significant DEG in tumor cells, was regulated by all four TFs (Fig. [153]4g). The most significant DEG in tumor cells, FXYD2, was regulated by both ISL1 and OTP. Overall, we discovered two major transcriptome programs within ccRCC tumor cells and further identified the possible regulatory elements of these two programs. CD8^+ T cell clustering and state analysis in ccRCC CD8^+ T lymphocytes play crucial roles in inhibiting tumor progression. To better elucidate the heterogeneity of tumor-infiltrating CD8^+ T cells, we further investigated CD8^+ T cell subpopulations in both scRNA-seq and scATAC-seq datasets. We focused on the major subpopulations and removed clusters with fewer than 100 cells (Supplementary Fig. [154]S5a–f). Finally, we identified four major clusters in the scRNA-seq dataset and five major clusters in the scATAC-seq dataset (Fig. [155]5a, b). Tissue-resident CD8^+ T cells were identified in both scRNA-seq and scATAC-seq datasets with high gene expression or activity of tissue-resident markers (CD69, ZNF683/Hobit, ITGAE/CD103, and ITGA1/CD49A)^[156]39–[157]41 (Fig. [158]5c). Interestingly, tissue-resident CD8^+ T cells included two clusters in the scRNA-seq data: one cluster (tissue-resident C2) exhibited high expression of effector molecules (TNF, IFNG, and GZMA), and the other (tissue-resident C1) was highly expressed naive/memory genes (IL7R, CCR7, and TCF7). We identified the exhausted CD8^+ T cells based on the expression of exhaustion markers (PDCD1 and TOX^[159]42) (Fig. [160]5c and Supplementary Fig. [161]S5g). The cluster (“exhausted immediate-early genes (exhausted IEG)”) was also characterized by the high expression of genes induced early after activation (e.g., HSPA1A, DNAJB1, JUNB, and ATF3), which is consistent with a recent study^[162]15. Notably, we identified two exhausted IEG clusters in the scATAC-seq dataset and exhibited different gene activities in T cell inhibitory genes (CTLA4, LAG3, and HAVCR2/TIM3). Fig. 5. Characterization of CD8^+ T cells in ccRCC. [163]Fig. 5 [164]Open in a new tab a, b Subclustering of CD8^+ T cells on the UMAP plots of the scRNA-seq and scATAC-seq datasets. c Heatmap of CD8^+ T cell lineage and functional markers provided phenotypic information for individual CD8^+ T cell clusters in the scRNA-seq (left) and scATAC-seq (right) data. d, e Violin plots showing the signature scores of cytotoxic and exhaustion gene sets for each CD8^+ T cell cluster in the scRNA-seq (top) and scATAC-seq data (bottom). Signature scores for each cell were calculated by the VISION method. Two-sided Wilcoxon test. ns, no significance; *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001. f Violin plots showing the progenitor and terminally exhausted signature scores for each CD8^+ T cell cluster in the scRNA-seq data. Signature scores for each cell were calculated by the VISION method. Two-sided Wilcoxon test. ns, no significance; *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001. g Heatmap of significantly enriched pathways for DEGs of each CD8^+ T cell cluster in the scRNA-seq data. The darker the red color is, the smaller the adjusted P value. h Dot plot showing the top enriched TFs within each CD8^+ T cell subpopulation individually. We further investigated the functional properties of these CD8^+ T cell subpopulations using multiple functional gene sets^[165]13,[166]43 and the VISION method^[167]44 (Supplementary Table [168]S2). The exhaustion cluster presented the highest cytotoxic, exhaustion, and terminal differentiation signature scores (Fig. [169]5d, e and Supplementary Fig. [170]S5h). Exhausted IEG cells had higher cell stress signature scores than other cells (Supplementary Fig. [171]S5i). Recent studies have reported that progenitor exhausted T cells control tumors more effectively than terminally exhausted T cells and respond better to anti-PD1 therapy^[172]43,[173]45. Thus, we assessed progenitor and terminally exhausted signatures and found that the tissue-resident C1 cluster had the highest progenitor exhausted signature scores and the lowest terminally exhausted signature scores (Fig. [174]5f), indicating an important role in tumor immunotherapy. In parallel, we performed pathway enrichment analysis for DEGs (adjusted P < 0.05 and log[2](FC) > 0.5) of each cluster in our scRNA-seq dataset. We found that two tissue-resident clusters enriched distinct pathways: the tissue-resident C1 cluster was significantly associated with the inflammatory response and the tissue-resident C2 cluster was enriched in cytokine and vitamin metabolic pathways (Fig. [175]5g). Multiple T cell exhaustion-related pathways, such as PD-1, IL2-STAT5^[176]46, and IL6-JAK-STAT3^[177]47 signaling, were significantly enriched in the exhaustion cluster. Furthermore, we found that the average expression of DEGs in the exhaustion cluster was associated with worse overall survival in the TCGA-KIRC and CheckMate-025 cohorts, a randomized phase III trial of nivolumab (anti-PD-1) treatment in ccRCC^[178]48 (Supplementary Fig. [179]S5j). Next, we examined the regulatory elements in these CD8^+ T clusters. EOMES and BATF implicating in CD8^+ T cell exhaustion^[180]49,[181]50 were highly enriched in the exhaustion cluster (Fig. [182]5h). Multiple Rel/NF-κB family TFs, such as RELA and NFKB1, were also enriched in the exhaustion cluster. Interestingly, we found that two exhausted IEG clusters were regulated by different TFs. Macrophage clustering and state analysis in ccRCC Tumor-associated macrophages (TAMs) are the major population of myeloid cells in tumors and play vital roles in tumorigenesis and drug resistance^[183]51. To systematically interrogate the heterogeneity of TAMs, we performed recluster analysis similar to CD8^+ T cells and identified three TAM clusters in both datasets using known phenotypic markers^[184]52 (Fig. [185]6a–c and Supplementary Fig. [186]S6a–f). The first cluster (“TAM-C1QB”) expressed complement genes (e.g., C1QB and C1QC), APOE, and early response-related genes (e.g., IER2 and JUN). Moreover, numerous MHC class II molecules were highly expressed in this cluster, suggesting strong antigen presentation ability (Fig. [187]6d). The second cluster (“TAM-RGCC”) expressed high levels of RGCC and proinflammatory genes, such as NLRP3^[188]53, CLEC5A, and IL1A/B. The third cluster (“TAM-LGALS3”) exhibited intermediate expression of C1QB and C1QC but highly expressed genes involved in alternative (M2) macrophage activation, such as ANXA2^[189]54 and LGALS3^[190]55. In addition, this cluster also expressed immunosuppressive genes, such as GPNMB^[191]56 and TREM2^[192]57, and multiple MHC class I molecules. Fig. 6. Characterization of TAMs in ccRCC. [193]Fig. 6 [194]Open in a new tab a, b Subclustering of TAMs on the UMAP plots of the scRNA-seq and scATAC-seq datasets. c Dot plot showing the expression levels and activity scores for known phenotypic markers in each TAM cluster. Dot size indicates the fraction of expressing cells, colored based on normalized expression or activity. d Heatmap showing the expression of MHC molecules, chemokines, cytokines, and other related genes in each TAM cluster. e Boxplots showing the M1 and M2 signature scores for each TAM cluster in the scRNA-seq data. The signature score was calculated by the VISION method. Two-sided Wilcoxon test. ns, no significance; *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001. f, g Boxplot showing the angiogenesis and phagocytosis signature scores for each TAM cluster in the scRNA-seq data. The signature score was calculated by the VISION method. Two-sided Wilcoxon test. ns, no significance; *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001. h Dot plot showing the expression of the immune costimulatory, checkpoint, and evasion genes for each TAM cluster in the scRNA-seq dataset. i Heatmap of log[2](FC) values for TFs with significant differences both in the scRNA-seq and scATAC-seq datasets. j TF regulatory network showing the predicted target genes for MEF2C in the TAM-C1QB cluster. To understand the function of these TAM subpopulations, we further examined the expression or activity of multiple functional gene sets^[195]58 (Supplementary Table [196]S2). We found that the TAM-RGCC cluster exhibited the highest M1 signature scores, followed by the TAM-C1QB cluster (Fig. [197]6e and Supplementary Fig. [198]6g). Conversely, the M2 signature was primarily enriched in the TAM-LGALS3 cluster. TAM-RGCC cluster possessed higher angiogenesis signature scores (Fig. [199]6f and Supplementary Fig. [200]S6h). Notably, the TAM-C1QB cluster showed the highest phagocytosis signature scores in scRNA-seq data, suggesting a high phagocytic activity (Fig. [201]6g). However, scATAC-seq data did not show a similar trend, maybe due to the sparsity of its data^[202]59 (Supplementary Fig. [203]S6i). In parallel, we investigated the differences among these subpopulations at the pathway level and found that various pathways related to inflammation (e.g., TNFA signaling via NFKB, and inflammatory response) and phagocytosis (e.g., endocytosis and phagocytosis) were significantly enriched in the TAM-C1QB cluster (Supplementary Fig. [204]S6j). TAM-RGCC cluster exhibited enrichment of angiogenesis-related pathways, while the TAM-LGALS3 cluster exhibited enrichment of lysosome and lipid catabolic processes. Furthermore, we examined the expression of immune checkpoint genes and costimulatory molecules. Multiple costimulatory signals were detected in both TAM-C1QB and TAM-RGCC clusters but not in the TAM-LGALS3 cluster (Fig. [205]6h). The ligands (PD-L1, PD-L2, VSIR/VISTA^[206]60, VSIG4^[207]61, and SIGLEC10^[208]62) mediating T cell immune checkpoint were highly expressed in the TAM-C1QB cluster while LGALS9^[209]63 was specifically enriched in the TAM-LGALS3 cluster. Next, we investigated the difference in TF motif activity and found distinct TF regulation programs among the three TAM clusters (Supplementary Fig. [210]S6k). Four TFs (MEF2C, NFKB1, RUNX3, and ENO1) showed a substantial increase in both gene expression and activity in the corresponding clusters (Fig. [211]6i). Previous studies have demonstrated that MEF2C plays vital roles in promoting M1 macrophage polarization and inducing cell death of macrophage^[212]64,[213]65. We identified MEF2C target genes using the same strategy described in the previous section and found that this gene regulated multiple TFs (e.g., FOXO1, NEU1, and NRP1) and chemokines (e.g., CCL3 and CCL3L1) (Fig. [214]6j) that have been demonstrated to be associated with phagocytosis and angiogenesis in macrophages^[215]66–[216]69. Moreover, the signature score of MEF2C target genes was associated with better survival in both the TCGA-KIRC and CheckMate-025 cohorts (Supplementary Fig. [217]S6l). Cell–cell cross-talk in the ccRCC microenvironment To demonstrate the intercellular communications within the ccRCC microenvironment, we applied CellPhoneDB, a repository of known ligand–receptor interactions, for data analysis. We found that the endothelial cells had the largest ligand–receptor interaction pairs, while B cells had the fewest (Fig. [218]7a and Supplementary Fig. [219]S7a). Notably, the TAM populations had the largest number of ligand–receptor interaction pairs compared to other immune cell populations (Supplementary Fig. [220]S7b). Tumor cells are the key cellular component of the entire TME, and the interactions between tumor cells and other cells play crucial roles in tumorigenesis. Therefore, we built an interaction map to investigate the intercellular communications of tumor cells (Fig. [221]7b). Tumor cells mainly communicated with the endothelium, TAMs, proliferative CD8^+ T cells, and CD8^+ exhaustion T cells through multiple ligand–receptor interactions (Fig. [222]7c and Supplementary Fig. [223]S7c). Fig. 7. Ligand–receptor-based interaction between tumor and immune cells. [224]Fig. 7 [225]Open in a new tab a Heatmap of the number of significant ligand–receptor interactions between cell types in the scRNA-seq data. b The interaction network between the tumor cell and other cell types. The direction of the arrow indicates the cell type expressing the receptor. The wider the edge or larger the dot is, the greater the number of interactions. c Heatmap of the number of significant interactions mediated by tumor cells as ligand-expressing and receptor-expressing cells. d Dot plots showing the mean interaction strength for selected ligand–receptor pairs between tumor cells and TAM clusters. The left and right plots show that the ligand is expressed in the tumor cells and TAM clusters, respectively. Dot size indicates the P value, colored by the average expression level of interacting molecule 1 in cluster 1 and interacting molecule 2 in cluster 2. e The Kaplan–Meier overall survival curves of TCGA-KIRC patients grouped by the averaged expression of RPS19 and C5AR1 (with the median value as the threshold). HR, hazard ratio. f Dot plots showing the mean interaction strength for selected ligand–receptor pairs between tumor cells and three lymphoid subsets. The left and right plots show that the ligand is expressed in the tumor cells and lymphoid subsets, respectively. Dot size indicates the P value, colored by the average expression level of interacting molecule 1 in cluster 1 and interacting molecule 2 in cluster 2. g Kaplan–Meier overall survival curves of TCGA-KIRC patients grouped by the averaged expression of LTB and LTBR (with the median value as the threshold). The crosstalks between various immune cells (especially macrophages and T cells) and tumor cells play crucial roles in tumor immune escape, which is a hot spot in cancer immunology research. Thus, we selected three cell subpopulations in the myeloid and lymphoid lineages with the highest number of interactions with tumor cells for further study. We found strong interactions between tumor cells and TAM populations mediated by macrophage migration inhibitory factor (MIF) and its receptors (CD44 and CD74), which have been well characterized to play vital roles in tumor progression, angiogenesis, and immune escape in ccRCC^[226]70–[227]72 (Fig. [228]7d and Supplementary Fig. [229]S7d). We also uncovered some novel interactions that have not been reported in ccRCC, for example, the ligand–receptor interaction between ribosomal protein S19 (RPS19) and complement C5a receptor 1 (C5AR1), which has been shown to contribute to immunosuppression in the human breast and ovarian cancer TME^[230]73. Patients with high average expression of RPS19 and C5AR1 had a poor prognosis in the TCGA-KIRC dataset (Fig. [231]7e). Moreover, we identified multiple interactions between tumor cells and three lymphocyte subpopulations (proliferative CD8^+ T cell, Treg, and CD8^+ T-exhaustion) (Fig. [232]7f). Notably, we characterized a Treg-tumor cell-specific interaction mediated by the ligand Lymphotoxin Beta (LTB) and its receptor lymphotoxin beta receptor (LTBR), which is required for fibrosarcoma and hepatocellular carcinoma tumor formation^[233]74,[234]75. High average expression of LTB and LTBR was associated with poor clinical outcomes in the TCGA-KIRC dataset, suggesting the translational potential of this finding (Fig. [235]7g). Discussion Here, we performed integrative scRNA-seq and scATAC-seq to delineate the transcriptional and epigenetic landscape of ccRCC. Our analysis demonstrated that scRNA-seq and scATAC-seq had consistently high cell identification abilities, and their data confirmed and complemented each other. Single-cell multiomics profiling can provide more comprehensive information from different perspectives, enabling us to better dissect cellular compositions and decipher cross-compartment interactions in the TME. Consistent with several recently reported scRNA-seq studies of ccRCC, we found that tumor cells represent only a small fraction (7.2%) of all cells in ccRCC tissues^[236]11, suggesting that traditional bulk-level epigenetic sequencing methods, such as ATAC-seq and ChIP-seq, may fail to identify some tumor cell-specific regulatory elements and their networks. In this study, we revealed several pivotal regulatory TFs (HOXC5, ISL1, VENTX, and OTP) in tumor cells by integrating scRNA-seq and scATAC-seq data and further experimentally validated their roles in tumor growth. The function of these TFs had been explored in some other types of cancers. For example, ISL1 has been reported as an important regulator and potential therapeutic target for gastric cancer and triple-negative breast cancer^[237]76,[238]77, while OTP has been characterized as a promising prognostic marker of pulmonary neuroendocrine tumors^[239]78. Our findings highlight the potential oncogenic roles of these TFs in ccRCC for the first time, but the underlying mechanisms remain to be investigated in future studies. In addition, multiple hepatocyte nuclear factors (e.g., HNF1A, HNF4A, and HNF4G) we identified in tumor cells have been shown to be specific for proximal tubule cells, which are the original cells of ccRCC^[240]79. We also adopted a refined clustering strategy on the two major immune cell populations (CD8^+ T cells and macrophages) in the TME to reveal the TFs that control different functional states of these cells. Multiple NFATC TFs (e.g., NFATC2/NFAT1 and NFATC3/NFAT4) were specifically enriched in the exhausted IEG C1 population and have already been demonstrated to promote exhaustion of T cells and better control immune responses by disrupting its interactions^[241]80,[242]81. C-Rel (REL) plays a vital role in the activation of Tregs and is a potential target for suppressing Treg function^[243]82. Interestingly, we found that the REL TF was significantly enriched in the exhaustion T cells, suggesting a potential target for CD8^+ T cells for future research. Furthermore, we comprehensively delineated the multicellular communities in the TME and identified ligand–receptor pairs, including RPS19-C5AR1 and LTB-LTBR, which could serve as potential therapeutic targets for further studies. Targeting the LTB-LTBR interaction with an agonistic monoclonal antibody (CBE11) inhibited tumor growth and prolonged survival in colorectal cancer xenografts^[244]83. Therefore, therapeutic interruption of these interactions by antagonist or blocking antibodies may alleviate the immunosuppressive nature of the microenvironment and provide a potential strategy for ccRCC treatment. In summary, our single-cell multiomics analysis provides a high-resolution transcriptional and epigenomic map of ccRCC and in-depth knowledge of tumor heterogeneity and underlying regulatory factors, facilitating a step forward in rational therapeutic strategy design. Materials and methods Human specimens All research activities were pre-approved by The Institutional Research Ethics Committee of The Second Affiliated Hospital, School of Medicine, Zhejiang University (NO. IR2020001463), and Informed consent were obtained for all human participants. Human ccRCC samples were directly obtained from the operating room during nephrectomy and transported in MACS tissue storage solution (#130-100-008; Miltenyi) on ice. Tissue dissociation and library preparation The tumor tissue of each sample was equally divided into two parts for scRNA-seq and scATAC-seq library preparation. For scATAC-seq, fresh nuclei were isolated with 1 mL prepared lysis buffer (10 nM Tris-HCI, 10 mM NaCI, 3 mM MgCl[2], 0.1% Nonidet^TM P40 Substitute (#74385; Sigma-Aldrich)) and incubated on ice for 5 min. The culture was centrifuged at 500× g for 5 min at 4 °C and mixed with 1 mL nuclear resuspension buffer (1× PBS, 1% BSA, and 0.2 U/μL RNase Inhibitor). The homogenate was filtered through a 40-μm cell strainer (#H13680-0040; BelArt) and the nuclear concentration was determined using Countess® II FL Automated Cell Counter (#[245]C10228; Thermo Fisher). For scRNA-seq, tissues were cut into 2–4 mm^3 pieces and dissociated to generate a single-cell suspension. The cell suspension culture was centrifuged at 300× g for 30 s and resuspended with an additional 10 mL RPMI 1640 (#10-040-CM; Miltenyi Biotec). The homogenate was filtered through a 70-μm cell strainer (#130-098-462; Miltenyi Biotec) and centrifuged at 300× g for 7 min followed by red blood cell lysis with 1 mL 1× Red Blood Cell Lysis Solution (#130-094-183; Miltenyi Biotec). The cell concentration was determined using Countess® II FL Automated Cell Counter. 10× Chromium libraries were prepared according to manufacturer protocol. Three scATAC-seq libraries and four scRNA-seq libraries were obtained using 10× Genomics Chromium Single Cell ATAC v1 chemistry and 10× Genomics Chromium Single Cell 3′ v3 chemistry, respectively. All libraries were sequenced on Illumina NovaSeq 6000 using a paired-end 150 bp protocol. scRNA-seq data processing Raw sequencing data were processed with Cell Ranger (v5.0.0, 10× Genomics) software for demultiplexing, aligning to GRCh38 human reference genome, and generating gene-barcode matrices. Seurat (v4.0.3)^[246]84 R package was used to perform filtering, normalization, dimensionality reduction, clustering, and differential expression analysis. The following criteria were applied to each sample to remove low-quality cells: gene number between 200 and 6000, UMI count > 1000, and mitochondrial content > 10%. Doublets were predicted by the DoubletFinder (v2.0.3)^[247]85 algorithm. After filtering, a total of 38,600 cells were left. The batch effect across different samples was eliminated by the Harmony (v1.0)^[248]86 method. The top 40 harmony embedding was selected by the “ElbowPlot” function and used to perform clustering and visualization. The “FindClusters” function was performed to generate different clustering results with resolutions ranging from 0.2 to 1.2. An appropriate resolution was determined based on cluster stability with clustree (v0.4.3)^[249]87 R package. Finally, we obtained 19 clusters (resolution = 0.3) (Supplementary Fig. [250]S1c). Two clusters (15 and 18) were excluded because they contained fewer than 100 cells or did not express known markers, and 38,097 cells were retained for the subsequent analysis. Differential gene analysis was performed by the MAST method^[251]88 (“FindAllMarkers” function) and the donor as the latent variable. DEGs were identified with Bonferroni-adjusted P values smaller than 0.05 and log[2]-fold-change values larger than 0.25. The second round of clustering procedures for CD8^+ T and macrophage cells was the same as above: starting from normalized expression matrix with SCTransfrom method, performing integration analysis with Harmony method, and clustering with “FindNeighbors” and “FindClusters” function. scATAC-seq data processing Raw sequencing data were processed with Cell Ranger ATAC (v1.2.0, 10× Genomics) software for demultiplexing, aligning to GRCh38 human reference genome, and generating peak-barcode matrices. Signac (v1.2.1)^[252]89 R package was used to perform subsequent analysis. Low-quality cells were removed based on the following criteria: peak region fragments > 1000, peak region fragments < 20,000, reads in peaks > 15%, blacklist ratio < 0.05, nucleosome signal < 4 and TSS enrichment > 3. After filtering, a dataset from three samples comprising 88,392 peaks from 24,173 cells remained for latent semantic indexing (LSI) analysis. Clustering and dimensionality reduction were then performed on the corrected LSI components by the Harmony^[253]86 method. Finally, we obtained 15 clusters with resolution = 0.5 and dims = 2:15 (Supplementary Fig. [254]S1e). The activity of each gene was quantified by examining the local chromatin accessibility, including the 2 kb upstream of the transcriptional start site and gene body. The accessible chromatin peaks for each cell type were identified by the MACS2 method^[255]23. Differential chromatin accessibility analysis was performed by the “FindAllMarkers” function. Differentially accessible chromatin regions were identified with Bonferroni-adjusted P values smaller than 0.05 and log[2]-fold-change values larger than 0.25. The genomic regions containing accessible chromatin peaks were annotated by ChIPSeeker (v1.26.2)^[256]90 with the UCSC database on hg38^[257]91. A similar analysis strategy as described above was used to investigate the subpopulation for CD8^+ T cells and macrophages. Integrated analysis of scRNA-seq and scATAC-seq data To help to interpret the scATAC-seq data, we applied Seurat’s integration framework to identify the pairs of corresponding cells between two modalities data. The shared correlation patterns between scATAC-seq gene activity and scRNA-seq gene expression were identified by the “FindTransferAnchors” function (reduction = ‘cca’). Then the cell type label of each cell in scATAC-seq data was predicted by “TransferData” function (weight.reduction = ‘lsi’ and dim = 2:15). A total of 21,272 cells were left after filtering using a maximum prediction score ≥ 0.5 (Supplementary Fig. [258]S1f). The filtered scATAC-seq object was reprocessed with LSI, batch corrected with Harmony algorithm, and clustered with SLM algorithm. The Jaccard index was used to assess the consistency between cell identities predicted by label transfer and curated annotations based on gene activities of known markers (Supplementary Fig. [259]S1i). Single-cell copy number variation analysis The inferCNV (v1.6.0) (inferCNV of the Trinity CTAT Project, [260]https://github.com/broadinstitute/inferCNV) R package was used to distinguish malignant cells by inferring chromosomal CNVs based on the gene expression data. The CD8^+ T cells and NK/NKT cells as normal reference cells were used to estimate CNVs for the potential tumor cell population. A gene ordering file from the human GRCh38 assembly containing each gene’s chromosomal start and end positions was created as the input of the “gene_order_file” parameter. The raw count matrix and annotation file were input to run inferCNV with cutoff = 0.1. Single-cell transcription factor activity analysis Single-cell TF motif activity was estimated for a set of 870 TFs from the Catalog of Inferred Sequence Binding Preferences (CIS-BP) database