Abstract Background Chromatin accessibility plays a crucial role in mediating transcriptional dysregulation and heterogeneity in Esophageal Squamous Cell Carcinoma (ESCC). Examining the chromatin accessibility of ESCC at single-cell level is imperative to understand how it activates oncogenes and contributes to the onset and metastasis of ESCC. Methods We performed single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq) on cancerous and adjacent noncancerous tissues from four ESCC patients who were pathological staged as T1a, T2b, T3b, or T4a, to investigate whether regulatory elements are pivotal in instigating cellular heterogeneity during ESCC metastasis. In conjunction, we integrated these data with 55 scRNA-seq datasets, ChIP-seq or CUT&Tag sequencing data, Hi-C sequencing data, bulk RNA-seq data, and bulk ATAC-seq data from ESCC cell lines to dissect the mechanisms underlying the heterogeneity of ESCC and tumor microenvironment (TME). Results Our study identified enhancer-specific activation within epithelial cells orchestrated by the three-dimensional structure of chromatin that regulates SERPINH1 transcription, and promotes the epithelial-mesenchymal transition (EMT) and metastasis of ESCC. Additionally, chromatin element activation facilitated the expression of TNFSF4 in CD8 + exhausted T cells, thereby activating Tregs. Furthermore, we observed that chromatin accessibility promoted the differentiation of tumor-associated macrophages (TAMs) and cancer associated fibroblasts (CAFs). Conclusions In summary, utilizing multiomics analyses, we have revealed chromatin accessibility maps and illuminated the intricate molecular mechanisms that underlie cellular heterogeneity during ESCC metastasis, offering valuable insights to further advance research on tumor progression and deterioration. Supplementary Information The online version contains supplementary material available at 10.1186/s12967-025-06154-6. Keywords: Single-cell multiomic analysis, scATAC-seq, Tumor heterogeneity, SERPINH1, Gene regulation, Tumor microenvironment, Esophageal Squamous Cell Carcinoma Background Esophageal Squamous Cell Carcinoma (ESCC) is a common malignant digestive tract tumor and is the sixth leading cause of cancer-related death, with a five-year survival rate of only 15–25% [[51]1, [52]2]. Although endoscopic examination has become more popular in recent years, only a small number of patients with ESCC can achieve early diagnosis, and most individuals progress to advanced disease. Local lymphatic metastasis is the most common recurrence pattern and accounts for treatment failure. Most researches focused on the mechanisms of the progression and metastasis of ESCC [[53]3, [54]4], but the activation process of key gene regulatory elements that promote the heterogeneous development and the tumor microenvironment (TME) of ESCC remains unclear. Therefore, a priority for oncologists is to explore ESCC heterogeneity and TME to achieve early diagnosis and prevent the metastasis of tumor. ESCC exhibits significant tumor heterogeneity, especially evident in chromosome copy number, differential expression of cellular markers and functional disparities among its constituent cells throughout the occurrence and progression across various clinical stages [[55]5–[56]8]. This heterogeneity is initially observable in the peritumoral environment. Chen et al. have reported that some antitumor and inflammatory molecules, such as KRT13 and MUC21 were high expressed in non-malignant esophageal epithelial cells, whereas SERPINH1, a gene related epithelial-mesenchymal transition (EMT) was overexpressed in cancer cells [[57]6, [58]9]. This result demonstrated the heterogenous gene expression between cancerous and noncancerous cells in ESCC tissues. In addition, the heterogeneity may not only exist between cancerous cells and noncancerous cells, the tumor cells even in one individual also exhibit a diverse array of cell subsets due to the continuous proliferation and distinct differentiation processes. These cell subsets exhibit unique characteristics related to immunology, cell cycle regulation, EMT-like properties and cell repair mechanisms, which further contribute to the overall heterogeneity of ESCC [[59]6, [60]9–[61]12]. Furthermore, the heterogeneity between metastatic and non-metastatic ESCC tissues also obviously exists. Yin, X et al. have revealed that the pathways related to keratosis, epidermal cell differentiation, and T-cell activation are significantly activated in ESCC without metastasis, while the tumor tissues in ESCC with lymphatic metastasis exhibit extremely high levels of immunoregulatory pathways through cNMF analysis [[62]5]. Just as mentioned above, most studies have focused on the phenomenon of ESCC heterogeneity, the underlying mechanisms, especially the epigenetic mechanism which account for the heterogenous changes remains unclear. Epigenetic disorders drive aberrant transcriptional programs, leading to cancer initiation and progression [[63]13]. Chromatin accessibility, accompanied by histone modifications, can regulate the activity of oncogenes without changing the DNA sequence. Technologies such as ATAC-seq and ChIP-seq (such as H3K27ac ChIP-seq) enable the identification of the activation of regulatory elements [[64]14]. By combining bulk ATAC-seq with TCGA analyses, researchers have shown that increased chromatin accessibly at EGFRe1, an enhancer near the EGFR gene locus, can promote the overexpression of EGFR and mediate the proliferation and migration of ESCC cells [[65]15]. Furthermore, regions around transcriptionally active genes (NELL2 and PRIM2), exhibit more open chromatin structures, whereas those associated with genes with low-expression (such as HPGD, KAT2B, and RRAGD) present reduced chromatin accessibility [[66]16]. However, these findings are primarily based on analyses of cell line and tissues, with a limited understanding of single-cell chromatin accessibility and its impact on the ESCC TME. Single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq) is a single-cell genomics technique that identifies open regions of chromatin to reflect gene expression and cellular function. Compared with bulk ATAC-seq, scATAC-seq can reveal regions of gene transcriptional activity at the single-cell level and can more precisely reveal the epigenetic mechanisms underlying tumor heterogeneity and TME evolution. Integrated scATAC and scRNA-seq analyses have revealed gene transcriptional activation mechanisms in various cancers, including endometrial cancer [[67]17], colorectal cancer [[68]18], clear cell renal cell carcinoma [[69]19], bladder cancer [[70]20], and glioma [[71]21]. These studies demonstrated that chromatin accessibility changes activated gene transcription, thereby promoting tumorigenesis and metastasis at the single-cell level. Regner et al. reported that the increased accessibility of the LAPTM4B promoter and enhancers can promote the overexpression of LAPTM4B, increase chemotherapy resistance, and even correlate with the poor prognosis of ovarian cancer patients using the combination of scATAC-seq with scRNA-seq [[72]17]. However, the role of chromatin accessibility in regulating gene transcription at the single-cell level in ESCC has not been reported. The TME comprises an environment that encompasses tumor cells, adjacent tissues, and immune cells, fostering the survival of tumor cells [[73]22, [74]23]. Variations in the TME cell composition and function significantly affect cancer initiation and progression [[75]24]. Several researchers have proposed that CD8 + exhausted T cells are more abundant in tumor cells than in adjacent tissues [[76]25]. Malignant epithelial cells can promote the transformation of normal fibroblasts (NFs) to cancer-associated fibroblasts (CAFs) by downregulating ANXA1, with CAFs contributing to the formation and migration of ESCC cells [[77]26]. In addition, fibroblasts are significantly enriched in cancer regions, whereas immune cells (T cells, NK cells, and B cells) are more abundant in stromal regions in ESCC [[78]8]. However, the epigenetic mechanisms underlying these TME changes remain unclear. Therefore, a deeper understanding of ESCC TME dynamics and their underlying epigenetic regulation is crucial for developing effective therapeutic strategies for ESCC. In this study, we performed scATAC-seq on the tumors and adjacent tissues of 4 patients with ESCC at different disease stages, coupled with a comprehensive analysis of 55 groups of scRNA-seq datasets. The results revealed the epigenetic alterations in ESCC-related differentially expressed genes (DEGs). Furthermore, we explored the chromatin accessibility dynamics of ESCC at different stages and inferred the activity of transcription factors (TFs) that regulate the interaction of stage-specific elements. In addition, chromatin accessibility and transcription maps of CAFs, tumor associated macrophages (TAMs) and T cells in the TME were generated, providing a basis for the development of novel ESCC therapeutic strategies. Methods Patient selection and tissue collection We retrospectively selected patients with ESCC who had undergone curative resection at Shandong Provincial Hospital Affiliated to Shandong First Medical University. These patients were pathologically staged as I-IV according to the 8th Edition of TNM Staging Manual published by the American Joint Committee on Cancer. From each stage, we selected 4 patients, resulting in a total of 16 patients included in our study. For each patient, a pair of ESCC and non-cancerous esophageal tissue (located more than 5 cm from the margin) were collected. To comprehensively analyze chromatin accessibility and gene expression, we employed a multi-faceted approach. 4 pairs of representative tissues (one per stage) were subjected to scACAC-seq (Supplementary Table S1). The remaining 12 ESCC tissues were analyzed using bulk ATAC-seq followed by qPCR to assess chromatin accessibility changes; Concurrently, IHC staining was performed on these 12 samples to evaluate the expression levels of the genes of interest (Supplementary Table S1). This study was approved by the Ethics Committee of Biomedical Research of Shandong Provincial Hospital Affiliated to Shandong First Medical University (NO.2022–160, March 1, 2022). Cells culture Four established human ESCC cell lines (KYSE30, KYSE150, KYSE510 and KYSE180) were purchased from the ATCC cell bank, and one normal human esophageal epithelial cell line (HEEC; BNCC359279) was a gift from Dr. Zhaohua Xiao (The Second Hospital Affiliated to Shandong University). All the cells were cultured in Roswell Park Memorial Institute medium 1640 (RPMI 1640). The media were supplemented with 10% fetal bovine serum (FBS), 100 U/mL penicillin, and 100 U/mL streptomycin (all of these reagents were obtained from Gibco, Invitrogen, Inc, USA). All the cell lines were maintained in a humidified atmosphere with 5% CO[2] at 37℃. Plasmids and cell transfection SERPINH1 overexpression plasmids were constructed by cloning the human SERPINH1 gene into the pcDNA3.1-3xFlag-C vector (pcDNA3.1-SERPINH1-3xFlag). The cells were transfected with pcDNA3.1-SERPINH1-3xFlag (OE_SERPINH1) or pcDNA3.1-3xFlag-C (Empty Vector) using Lipofectamine 3000 (Invitrogen, USA). scATAC library construction and sequencing Approximately 0.1 g of tissue was obtained, ground, digested with a micropestle, and passed through a 40-micron cell strainer, after which all the cells were collected by resuspension in precooled PBS. The cells were pelleted via centrifugation at 500 ×g for 5–10 min at 4 °C. After washes with PBS and centrifugation, the cells were resuspended in 50 μl of cold nuclei lysis buffer and incubated on ice. Nuclei were isolated by centrifugation at 500 ×g for 10 min at 4 °C. Following Trypan blue staining and cell counting the nuclei concentration was adjusted to the desired capture number. scATAC libraries were prepared using a 10 × Genomics single-cell ATAC kit and sequenced on the Illumina NextSeq 500 platform according to the manufacturer's recommendations. scATAC-seq data analysis The raw reads from each patient sample were aligned to the hg38 reference genome, and a list of unique ATAC-seq fragments with associated barcodes was generated using 10 × Genomics Cell Ranger ATAC. The list of unique fragments per barcode for each patient’s sample was read into the R package ArchR (version 1.0.2) [[79]27]. Cells with > 1000 unique nuclear fragments and > 4 TSSs were preserved in accordance with the strict quality control (QC) criteria for scATAC-seq data. Doublet removal was performed via “addDoubletScores” and “filterDoublets” functions within ArchR. The filterRatio was configured to be 1 during this process. Then, further quality control of the sequencing depth of each dataset was performed according to the TSS enrichment and log10(nFrags) criteria [[80]27]. For dimensionality reduction of the scATAC-seq data, iterative latent semantic indexing (LSI) in ArchR [[81]27] was performed using the "addIterativeLSI" function, with 25,000 variable features and 30 dimensions. A batch effect correction tool called “Harmony” was applied to remove strong batch effect differences. Subsequently, clusters were identified by utilizing the 'addClusters' function with a resolution parameter set at 0.8. Thereafter, a two-dimensional visualization of the data was generated through the employment of the 'addUMAP' function, with the parameters configured as nNeighbors = 30, minDist = 0.5, and metric = cosine, to enhance the clarity and interpretability of the data representation. Pseudo bulk replicates were generated using the “addGroupCoverages” function to obtain measurements of statistical significance. Peak calling was conducted using the “findMacs2” function, followed by reproducible peak set creation and peak matrix generation with the “addReproduciblePeakSet” and “addPeakMatrix” function. Furthermore, the “getMarkerFeatures” function was used to identify marker peaks based on TSS enrichment and the log10(nFrags) value, and a t-test for statistical significance was performed. Coaccessibility was used to determine correlations in accessibility between two peaks, and peak-to-gene linkage leakage was used to integrate scRNA-seq data and determine correlations between peak accessibility and gene expression [[82]27]. scRNA-seq data analysis scRNA-seq data were obtained from the [83]GSE160269 dataset of GEO ([84]https://www.ncbi.nlm.nih.gov/geo/) and the PRJNA777911 and PRJNA971344 (in house) datasets of SRA ([85]https://www.ncbi.nlm.nih.gov/sra) (Supplementary Table S2). The detailed clinical information is summarized in Supplementary Table S2. The raw reads of the latter two datasets were aligned to the hg38 reference genome and were counted using 10 × Genomics Cell Ranger with the default parameters. In addition, Seurat (version 4.2.0) [[86]28] was used to convert the matrix to a Seurat object and for QC. Samples whose number of cells was less than 1000 were removed. To eliminate potentially low-quality cells, those with a high proportion of mitochondrial gene expression were excluded. Additionally, genes that were expressed in fewer than three cells were also removed from the analysis. Furthermore, the data were integrated via CCA to eliminate the influence of data sources on the data. Meanwhile, the 'FindVariableFeatures' function was employed to detect the top 2000 highly variable genes. Then, Principal component analysis (PCA) was subsequently performed with the “RunPCA” function in Seurat. The “FindNeighbors” and “FindCluster” functions in Seurat were used to sort the cells. The clustering results were visualized in UMAP plot generated with the “RunUMAP” function with the top ten principal components [[87]29]. A cell annotation approach based on well-characterized gene markers was adopted, as detailed in a previous publication [[88]9]. Briefly, EPCAM, KRT19, and KRT5 in epithelial cells; FN1, COL1A1, and COL3A1 in fibroblasts; VWF, PECAM1, and SELE in endothelial cells; CD3A and CD3D in T cells; CD79A and CD79B in B cells; CD163 in macrophages; TPSAB1 in mast cells; and CLEC9A in dendritic cells were analyzed. scATAC integration with scRNA-seq and scATAC-seq annotations Gene activity scores were inferred via scATAC-seq using the “addGeneScoreMatrix” function from ArchR before the marker was transferred from the scRNA-seq data to the scATAC-seq data. The scATAC-seq gene scoring matrix (GeneScoreMatrix) was compared with the scRNA-seq gene expression matrix (RNAAssay) using the “addGeneIntegrationMatrix” function. The scATAC-seq data were subsequently integrated with the scRNA-seq data. This integration allowed us to transfer cell type annotations from the well-characterized scRNA-seq data to the scATAC-seq data. Briefly, cells from scATAC-seq data were directly aligned with corresponding cells from the scRNA-seq data by comparing the scATAC-seq gene score matrix with the scRNA-seq gene expression matrix, using the “FindTransferAnchors” function from the Seurat R package and the “addGeneIntegrationMatrix” in the ArchR R package. Copy number variation (CNV) analysis The CNV assessment of epithelial cells was performed with the infercnv (version 1.10.1) R package [[89]30]. Firstly, the RNA microarray matrices were extracted from scRNA-seq data using the “GetAssayData” function and from scATAC-seq data using the “GeneScoreMatrix” function, with the average gene copy number of T cells used as a reference. Subsequently, AnnoProbe (version 0.1.7, GitHub—jmzeng1314/AnnoProbe) was used to construct the gene map reference files. Finally, by comparing the gene copy numbers and references of epithelial cells,