Abstract Background Esophageal cancer has a low 5-year survival rate despite treatments. scRNA-seq and MR offer insights into neoadjuvant treatment efficacy for precision medicine. Methods Three post-neoadjuvant treatment datasets underwent QC for Seurat analysis. Marker genes identified cell subsets. MR analyzed eQTL data from GWAS cohorts for causal links. Single-cell and MR-derived genes intersected to reveal PLTP in CD4⁺T cells. Results Single-cell analysis of 16 samples found 40,198 genes and 120,102 cells. CD4⁺T cell numbers differed significantly between groups after therapy. Differentially expressed genes were immune-related. Pseudotime and cell-cell communication varied. PLTP, linked to esophageal cancer, co-expressed with genes involved in cell cycle processes. Conclusion The study highlights CD4⁺T cells’ predictive role in therapy efficacy via scRNA-seq and MR. PLTP emerges as a key gene, offering new precision medicine strategies for esophageal cancer. Supplementary Information The online version contains supplementary material available at 10.1007/s12672-025-03751-1. Introduction Esophageal cancer stands as a highly prevalent malignant tumor within the gastrointestinal tract, occupying the seventh position globally in terms of incidence and the sixth spot in mortality rates. This form of cancer is notorious for its dismal prognosis, as patients typically face a five-year survival rate that hovers merely between 15% and 25% [[30]1]. Esophageal squamous cell carcinoma (ESCC) exhibits tumor heterogeneity [[31]2, [32]3], manifesting as significant differences in chromosomal copy numbers, differential expression of cellular markers, and functional disparities among its cells, even across distinct clinical stages of tumor progression [[33]4]. Esophageal cancer has two primary types: squamous cell carcinoma (predominant in China, >90% of cases) and adenocarcinoma (more common in the West). Early-stage ESCC is typically treated with radical surgery, while locally advanced cases often receive neoadjuvant chemotherapy (e.g., cisplatin + 5-fluorouracil) before surgery, which reduces mortality [[34]5, [35]6]. The emergence of immunotherapy has ushered in groundbreaking transformations within the field of oncological treatment. Key trials like KEYNOTE590, CHECKMATE 648, and ESCORT-1st have proven that combining chemotherapy with immunotherapy as a first-line treatment is effective for advanced ESCC, prompting research into immunotherapy’s role in neoadjuvant therapy. The KEYSTONE-002 [[36]7] and PALACE-1 [[37]8] studies reported pathological complete response (pCR) rates of 56% and 46.1%, respectively, for locally advanced ESCC treated with neoadjuvant chemoradiotherapy combined with immunotherapy. Concurrently, the NICE study found a pCR rate of 45.4% for locally advanced ESCC treated with Camrelizumab in combination with nab-paclitaxel and carboplatin [[38]9]. Despite the enhanced efficacy of immunotherapy as a neoadjuvant treatment for ESCC, a considerable proportion of patients still do not derive benefit. Traditionally, tumor mutation burden (TMB), microsatellite instability, and PD-L1 protein expression have been considered potential biomarkers [[39]10]. However, both the KEYNOTE-590 [[40]11] and ChECKMATE-648 [[41]12] studies have confirmed that the efficacy of immunotherapy is independent of PD-L1 status. To avoid the wastage of medical resources, there is an urgent need to develop biomarkers capable of predicting the efficacy of immunotherapy for esophageal squamous cell carcinoma. Previously, tumor research centered on abnormal gene expression and signaling pathways. Now, single-cell technologies like scRNA-seq, snRNA, TCR sequencing, spatial transcriptomics, and whole-genome sequencing offer a deeper, holistic view of tumor development by analyzing genomics, transcriptional patterns, and the tumor microenvironment (TME) [[42]13]. Single-cell sequencing technology is a major driver of cancer research progress. For instance, in gastric cancer with peritoneal metastasis, it revealed that signaling between SPP1⁺ tumor-associated macrophages (TAMs) and stromal myofibroblastic cancer-associated fibroblasts (mCAFs) in the TME influences immunotherapy efficacy [[43]14]. In intrahepatic cholangiocarcinoma, single-cell sequencing found that weakened antigen-presenting cell function causes CD8⁺T cells to have a naive state. This then attracts SPP1⁺ macrophages from the stroma. These macrophages, together with POSTN⁺ cancer-associated fibroblasts and endothelial cells, drive tumor growth and spread [[44]15]. In hepatocellular carcinoma, glutamine drives the infiltration of GPR109A⁺ myeloid cells, enabling them to evade immune surveillance. Blocking GPR109A activates the antitumor effects of CD8⁺T cells [[45]16]. Single-cell and spatial transcriptomic studies of head and neck squamous cell carcinoma revealed that inflammatory fibroblasts (expressing IL-11) and fibroreticular-like cells (expressing CCL19) have immune-regulating gene expressions. Together, they team up with TNF-α to activate the typical NF-κB signaling pathway [[46]17]. In breast cancer research, molecular biomarkers can predict CDK4/6 inhibitors’ effectiveness. CD8⁺T and NK cells might also indicate treatment response. For instance, HSP90 and HSPA8 are linked to resistance against PD1/PD-L1 immune checkpoint drugs. The SPP1-CD44 axis curbs suppressive T-cell growth, and the MDK-NCL axis dampens immune activity [[47]18]. Tumor-associated macrophages (TAMs) with high SPP1 levels are abundant in metastatic castration-resistant prostate cancer (mCRPC). They suppress CD8⁺T cell activity and boost resistance to immune checkpoint inhibitors (ICIs) in the body. Using ciforadenant to block A2ARs cuts down on SPP1⁺ TAMs in CRPC, possibly improving treatment outcomes [[48]19]. High-grade glioblastomas exhibit multiple pathological vascular cell subtypes within their tumor vasculature, which interact more extensively with diverse immune cell populations, laying the groundwork for the development of targeted therapeutic strategies against both the vascular and immune systems [[49]20]. In cervical adenocarcinoma, the Epi_10_CYSTM1 cell cluster attracts and increases Tregs by activating the ALCAM-CD6 pathway. In return, Tregs give stem - like traits to the Epi_10_CYSTM1 cluster through the TGFβ pathway, forming a strongly immunosuppressive tumor microenvironment (TME) [[50]21]. In non-small-cell-lung-cancer, combining scRNA-seq and spatial transcriptomics reveals that abundant CD4⁺ Tregs and mCAFs indicate an immunosuppressive environment, while more CD4⁺ Th17 cells and iCAFs may predict treatment success. Single-cell sequencing has deepened our understanding of tumor development and tumor microenvironment changes. Mendelian randomization (MR) study is a causal inference method based on genetic variation. It utilizes genetic variants as instrumental variables (IVs) to mimic the design of a randomized controlled trial (RCT), thereby evaluating the causal relationship between exposure factors and health outcomes [[51]22]. It has emerged as one of the research hotspots in recent years. MR studies have established relatively close associations between numerous factors and tumorigenesis. For instance, metabolic syndrome may increase the risk of developing certain types of cancer [[52]23]. Infectious diseases are significantly associated with an elevated risk of lung cancer, with lower respiratory tract infections exerting the most pronounced promoting effect on lung cancer risk [[53]24]. Bacterial species such as Lactobacillus gasseri, Dysgonomonas, and Saccharofermentans family exhibit significant and robust causal associations with the risk of colorectal cancer [[54]25]. ADHD, schizophrenia, and insomnia may affect glioma development via neurotransmitter signaling, immune responses, tumor invasiveness/metastasis regulation, stem cell pluripotency, and glycolipid/amino acid metabolism [[55]26]. Three major categories of environmental endocrine disruptors (bisphenols, parabens, and phthalates) exert distinct impacts on the risk of different subtypes of breast cancer. Specifically, n-BuP is positively associated with luminal A-type breast cancer, MMP is negatively correlated with luminal B-type breast cancer, and MiBP shows a positive association with triple-negative breast cancer (TNBC) [[56]27]. Allergic diseases are inversely associated with the risk of colorectal cancer, and this inverse correlation is more pronounced in colorectal cancer cases with an age of onset below 50 years [[57]28]. The phyla Euryarchaeota, the genera Escherichia-Shigella, Family XIII AD3011 group, Prevotella 9, and two unclassified unknown bacterial genera are identified as pathogenic bacteria for ovarian cancer. In contrast, the R.7 group of the family Christensenellaceae, Tyzzerella 3, and the family Victivallaceae have been demonstrated to confer potential protective effects against ovarian cancer [[58]28]. From the above findings, it is evident that many previously overlooked etiological factors of tumors have come to our attention through in-depth MR investigations. This study merges scRNA-seq with Mendelian randomization (MR) to understand varied immune-neoadjuvant therapy responses in ESCC. By identifying overlaps between DEGs in single-cell subgroups and MR-derived eQTLs, it aims to clarify key gene expression and function, using single-cell analysis to reveal molecular mechanisms and guide future research. Methods Dataset source We got three datasets from the Gene Expression Omnibus (GEO) and the National Genomics Data Center (NGDC). These datasets have info on immunotherapy in neoadjuvant treatment of esophageal squamous cell carcinoma (ESCC) patients and postoperative evaluations of treatment effects. In some datasets, only part of the data meets our analytical objectives, and irrelevant data is excluded. * GSE203115: Data from 3 patients who had 2 cycles of immune-neoadjuvant therapy before surgery. 2 were “responders” and 1 was a “non-responder”. * GSE221561: 7 patients had neoadjuvant treatment, with 2 going straight to surgery. Patient S2487T got immunotherapy, radiotherapy, and chemotherapy as neoadjuvant treatment and was TRG1 post-op. * OMIX005710: 46 samples. Some patients had a complete response (CR) or partial response (PR) and were “responders”. Some had an incomplete partial response (IPR) and were “non - responders”. Though combining these data might cause batch effects, they help us understand the link between ESCC patients’ molecular traits after immune-neoadjuvant therapy and immunotherapy efficacy better. Using public data means no extra ethics docs are needed. Detailed steps of the single-cell analysis process In single - cell RNA sequencing (scRNA-seq) analysis, we used the Seurat R package (v5.2.1) to turn raw data into a Seurat object. During data preprocessing, we had strict quality control. We removed genes with very low expression and mitochondrial genes with expression over 3-fold different from the mean. For dimensionality reduction, we used “RunPCA” and “FindCluster” for PCA and clustering. We applied the Harmony algorithm to remove batch effects and UMAP for 2-D cell visualization. After filtering out “doublets” (cells wrongly seen as one), we used Seurat’s “FindAllMarkers” to find marker genes for different cell clusters by comparing cells within a cluster to all others. We manually annotated cell clusters based on known biological types using the CellMarker 2.0 database and genes from previous studies. For specific sub-groups, we did secondary clustering and annotation. We corrected for confounding factors like cell-cycle effects and batch variations. Only a few cells were filtered out in quality control. To keep potentially important differentially expressed genes (DEGs), we didn’t add more filtering at this stage. Gene expression correlation We used the Harmony algorithm or Seurat’s IntegrateData function to fix and lessen batch effects in the dataset. Then, we picked highly variable genes with the FindVariableFeatures function. Next, we calculated Spearman correlation coefficients between these genes and the CD4 gene. Genes with strong correlations to CD4 were visualized to clearly show their links. Differentially expressed gene analysis and pathway enrichment analysis We identified differentially expressed genes (DEGs) with Seurat’s “FindMarker” function using the Wilcoxon rank - sum test. For pathway enrichment analysis, we used R’s “clusterProfiler” package (v4.12.6) for GO and KEGG analyses. GO gene sets have three main categories: biological process (BP), cellular component (CC), and molecular function (MF). Pathways with an adjusted p-value below 0.05 were deemed significantly enriched. GSVA enrichment analysis We performed gene set variation analysis (GSVA) using 50 classic pathways from the Molecular Signatures Database (MSigDB). For each cell type, the GSVA package (v2.0.5) analyzed gene set enrichment to check pathway activity changes. Then, the limma R package (v3.60.4) compared cell activity scores via a moderated t-test, giving us t-values to measure pathway activity differences between cells in a specific cluster and those from other clusters. Intercellular communication and gene regulatory network analysis We used CellChat (v1.6.1) to combine gene expression data and assess differences in intercellular communication modules, relying on the default CellChatDB as the ligand-receptor interaction database. By finding overexpressed ligands or receptors in certain cell groups, we then located ligand-receptor interactions strengthened by this overexpression. To deduce gene regulatory networks, we utilized pySCENIC (v1.3.1), a Python algorithm that reconstructs regulatory networks (regulons, made up of transcription factors [TFs] and their target genes) in each cell from scRNA-seq data. This let us decode the complex transcriptional regulatory programs shaping cellular behavior and responses during intercellular communication. Pseudotime trajectory analysis We performed pseudotime trajectory analysis with Monocle 2 (v2.34.0). First, we used the “FindAllMarkers” function to spot differentially expressed genes across cell subpopulations in the dataset. Then, the “reduceDimension” function cut down data dimensionality, projecting cells onto a pseudotime trajectory using default settings. This let us figure out the developmental or temporal order of cells, giving us a peek into cellular differentiation paths and lineage ties in the studied biological system. Survival analysis We used esophageal squamous cell carcinoma (ESCC) data from the UCSC XENA database to assess the prognostic ability of certain immune cell clusters. First, we calculated signature scores for each cell cluster via Gene Set Variation Analysis (GSVA). Then, we did survival analysis for different single-cell RNA sequencing (scRNA-seq)-derived cell types. Using the survminer package (v0.4.9) in R, we split cell types into two groups based on the median of their signature scores. We generated survival curves with the Kaplan - Meier method to compare survival between the groups. We also used a multivariate Cox proportional hazards model to get hazard ratios, evaluating how cell-cluster-derived signature scores affected patient survival while accounting for confounding factors. This approach let us figure out the prognostic importance of specific immune cell clusters in ESCC patients. Mendelian randomization study We used the Mendelian Randomization (MR) framework to assess the potential causal impact of gene expression on esophageal cancer risk, relying on publicly available genome - wide association study (GWAS) datasets for esophageal cancer and expression quantitative trait loci (eQTL) data. Outcome data came from GWAS studies of three independent cohorts: * BBJ-a-117: From the BioBank Japan (BBJ) cohort, with genetic association signals for esophageal squamous cell carcinoma (ESCC), representing East Asian genetic traits. * EBI-a-GCST90018621: From the European Bioinformatics Institute (EBI), focusing on genetic predisposition to esophageal adenocarcinoma (EAC). * EBI-a-GCST90018841: From the EBI cohort, involving a cross - subtype analysis of esophageal cancer, including both squamous cell carcinoma and adenocarcinoma. We selected eQTLs with a p-value < 5 × 10⁻⁸. To remove redundant single nucleotide polymorphisms (SNPs), we did linkage disequilibrium (LD) pruning (r² < 0.001 within a 10,000 kb window). We extracted cis - acting eQTLs (cis-eQTLs) in esophageal tissue (within 1 megabase [Mb] upstream or downstream of the target gene) as exposure factors. To estimate causal effects, we used five mainstream MR methods: * Inverse-variance weighted (IVW). * MR-Egger regression. * Weighted median. * Simple mode. * Weighted mode. We conducted sensitivity analyses for result robustness: * Heterogeneity tests to check causal estimate consistency across genetic variants. * Horizontal pleiotropy tests to detect pleiotropic confounding. * Leave - one - out analyses to assess individual SNP influence on the overall causal estimate. All analyses were done in R (v4.3.2) using packages like TwoSampleMR, coloc, ggplot2, and forestplot. MR results were visualized with forest, scatter, and funnel plots, mainly presenting IVW method results for interpretation. Reactome analysis We used the Reactome pathway analysis module on the Xiantao website ([59]https://www.xiantaozi.com/) to analyze the biological functions of differentially expressed genes (DEGs). We uploaded the gene list linked to PLTP (phospholipid transfer protein) expression into the tool. We set significance thresholds: a p-value < 0.05 (from Fisher’s exact test) and a false discovery rate (FDR) < 0.25. Based on these, the tool created visualizations showing enriched pathways and biological processes tied to PLTP-related genes, offering insights into their functional roles and regulatory mechanisms. Statistical analysis All data handling, statistical analysis, and visualization were done in R (v4.3.2). We used the Kaplan - Meier method and log-rank test to estimate and compare subtype - specific overall survival (OS). The Wilcoxon or t-test assessed differences in continuous variables between two groups. For categorical variables, we applied the chi-square or Fisher’s exact test. We corrected p-value with the false discovery rate (FDR) method to control multiple testing and cut false positives. Pearson correlation analysis evaluated variable correlations. All p-value were two-tailed, and statistical significance was p < 0.05. Result Profiling of the relationship between immunotherapy neoadjuvant efficacy and immune cells in ESCC To explore the link between the effectiveness of neoadjuvant therapy (combining immune checkpoint blockade and chemotherapy) and immune cells in ESCC, we got scRNA-seq data from the 10× Genomics platform in public databases. We included 16 single-cell samples from three esophageal cancer datasets (Fig. [60]1A, Supplementary Table 1). Based on therapeutic response, we labeled cases with complete response (CR) or major pathological response (MPR) as the “response/stim group” and those with incomplete pathological response (IPR) as the “Non-response/ctrl group”. After quality control, filtering, and data integration, we had 40,198 genes and 120,102 single cells. Using dimensionality reduction, clustering, doublet filtering, and UMAP visualization, we divided the cells into 16 clusters. The top 10 expressed genes differed significantly among these clusters (Fig. [61]1B). We manually annotated the clusters using the CellMarker 2.0 website and AI-assisted searches for cluster-specific markers. The cells fell into four main lineages: epithelial, fibroblast, lymphocyte, and myeloid cells (Figs. [62]1C-E for clustering, annotation, and origin). We identified these lineages by their characteristic markers (Supplementary Figs. 1A-E). Epithelial cells were marked by genes like SFN, KRT5, and KRT14; fibroblasts by COL1A1, COL3A1, and COL1A2; myeloid cells by CD14, TPSAB1, and S100A8; and lymphocytes by CD3D, CD4, CD79A, MZB1, KLRD1, and FGFBP2. We noticed that while all these cell types were in tumor tissues, their proportions and numbers varied greatly across different tissues. This implies these differences might affect the efficacy of immunotherapy neoadjuvant treatment (Fig. [63]1F). Fig. 1. [64]Fig. 1 [65]Open in a new tab Overview of the relationship between the efficacy of immunotherapy neoadjuvant treatment and immune cells in esophageal squamous cell carcinoma (ESCC). A Workflow diagram illustrating the entire research process. B Heatmap depicting the top 10 differentially expressed genes in each immune cell cluster. C UMAP (Uniform Manifold Approximation and Projection) plot visualizing the clustering of immune cells. D UMAP plot displaying the annotation results of immune cell clusters. E UMAP plot indicating the data sources of individual cells. F Bar chart or table presenting the number and types of immune cell subtypes within each dataset Differences and functions of lymphoid cells in ESCC tissues with distinct efficacies of neoadjuvant therapy We first isolated lymphoid cells and did another round of quality control, filtering, dimensionality reduction, and clustering. Using classical markers (Fig. [66]2A), we grouped the 34 cell clusters into six major cell types (Fig. [67]2B). The proportions and numbers of each cell type varied a lot across specimens (Supplementary Fig. 2A). Based on the earlier-mentioned criteria for assessing immunotherapy neoadjuvant treatment efficacy, the “response/stim group” had a higher cell proportion (Fig. [68]2C, Supplementary Fig. 2B). By comparing the “response/stim group” with the “Non-response/ctrl group”, we found differentially expressed genes (Supplementary Table 2). Gene Ontology (GO) analysis (Fig. [69]2D) showed these genes were linked to functions like “immune response activation”. Gene Set Enrichment Analysis (GSEA) (Fig. [70]2E) revealed significant associations with signaling pathways and metabolic processes, including IL6_JAK_STAT3_SIGNALING, ANGIOGENESIS (Fig. [71]2F), MYC_TARGETS_V2, COAGULATION (Supplementary Fig. 2C), etc. Gene Set Variation Analysis (GSVA) (Fig. [72]2G) allowed a more detailed comparison of gene - set activity levels across groups, potentially helping predict prognosis or treatment response. We also explored differences in cell-cell communication and pathway activities between the two groups. Using the CellChatDB database, we found dense intercellular communication networks within each group (“response/stim” and “Non-response/ctrl”) (Fig. [73]2H). But when comparing the groups, there were significant differences in communication intensity. The “response/stim group” had slightly fewer intercellular communications (560) than the “Non-response/ctrl group” (699) (Fig. [74]2I). Overall, these findings suggest differences in lymphoid cells may be tied to variations in neoadjuvant therapy efficacy for ESCC. Fig. 2. [75]Fig. 2 [76]Open in a new tab Differences and functional characteristics of lymphoid cells in ESCC tissues with varying responses to neoadjuvant therapy. A UMAP plot depicting the expression patterns of lymphoid cell markers. B UMAP plot illustrating the clustering and cell annotation of lymphoid cells. C UMAP plot showing the distribution of cells according to different therapeutic responses. D GO (Gene Ontology) analysis revealing the functional enrichment of differentially expressed genes (DEGs) between the two groups with distinct prognoses. E GSEA (Gene Set Enrichment Analysis) demonstrating differences in signaling pathways between the two groups with varying prognoses. F Enrichment status of DEGs in the IL6_JAK_STAT3_SIGNALING and ANGIOGENESIS pathways. G GSVA (Gene Set Variation Analysis) displaying the activity levels of pathways. H Differential network diagram depicting intracellular and intercellular connections between the two groups with different responses. I Heatmap illustrating the differences in the number of cell-cell communications between the two groups with varying responses Differences in immune cell types and numbers in ESCC tissues with distinct efficacies of neoadjuvant therapy Our previous analysis showed that relying solely on existing database markers couldn’t comprehensively cluster cells due to some lacking characteristic markers or expressing multiple subset markers. For better clustering and subset comparison, we applied stricter quality control and Harmony integration to all cells, using markers from a wider range of studies. This gave us 35,479 genes and 78,914 single cells for analysis. We tested different reduction parameters in clustering: 0.1 for 11 clusters, 0.5 for 25, and 0.8 for 33. To avoid annotation errors from too many clusters, we initially picked 11. Immune cells were identified by positive PTPRC expression (Supplementary Fig. 3A), leaving us with 35,479 genes and 37,296 immune cells. After another Harmony integration (reduction parameter = 0.8), these immune cells clustered into 18 groups, annotated into eight subsets (Fig. [77]3A). These subsets had distinct marker expression patterns (Fig. [78]3B), and their top 10 expressed genes varied greatly, reflecting different functions (Fig. [79]3C). Both groups had all immune cell subsets (Fig. [80]3D), with 21,171 and 16,125 immune cells respectively. But the number of each subset varied a lot among patients (Supplementary Fig. 3B). The “Non-response/ctrl group” had more CD8⁺T cells, while the “response/stim group” had more CD4⁺T cells and mast cells (Fig. [81]3E). These differences were clear in both proportion and absolute count (Fig. [82]3F). These findings suggest that changes in immune cell numbers and types may be linked to neoadjuvant therapy efficacy for esophageal cancer. However, survival analysis of 185 esophageal cancer patients from the UCSC database found no significant correlation between immune cell subset types and prognosis (Supplementary Fig. 3C). Fig. 3. [83]Fig. 3 [84]Open in a new tab Differences in immune cell types and numbers in ESCC tissues with varying responses to neoadjuvant therapy. A UMAP plot showing the clustering and cell annotation of PTPRC-positive immune cells. B Heatmap depicting the expression patterns of typical markers in each immune cell subset. C Heatmap illustrating the top 10 differentially expressed genes (DEGs) in each immune cell cluster. D UMAP plot displaying the distribution of immune cell subsets according to different therapeutic responses. E Line graph presenting the differences in the numbers of immune cell subsets between groups with varying responses. F Bar chart showing the proportions and absolute numbers of immune cell subsets in groups with different therapeutic responses Functional differences in immune cells within esophageal squamous cell carcinoma (ESCC) tissues with distinct prognoses There were 10,949 shared differentially expressed genes (DEGs) between the “response/stim group” and the “Non-response/ctrl group” (Supplementary Table 3), with 2,573 showing significant differential expression (p_adj < 0.01, |log2FC| >1). GO analysis (Fig. [85]4A) linked these DEGs to immune response activation. GSEA (Fig. [86]4B) revealed associations with signaling pathways and metabolic processes, some activating, others suppressing (Fig. [87]4C). Changes were noted in pathways like ANGIOGENESIS and COAGULATION (Fig. [88]4D), concentrated in CD4⁺T cells (Fig. [89]4E). GSVA (Fig. [90]4F) showed pathway activity differences between groups, suggesting efficacy variations may relate to immune cell function differences. Cell-cell communication analysis revealed dense networks in each group but with intensity differences (Fig. [91]4G): 402 communications in the “response/stim group” vs. 546 in the “Non-response/ctrl group” (Fig. [92]4H). Pathway numbers differed (31 vs. 25), with cell types varying even for the same pathway. Taking IFN-II as an example (Fig. [93]4I), CD4⁺T cells communicated with different cells in each group, possibly due to ligand-receptor differences. Genes closely linked to CD4, like IL-2, CD2, CD7, were immune-related (Fig. [94]4J). Pseudotime analysis showed CD4⁺T cells in the “response/stim group” participated more in cellular evolution (Fig. [95]4K). These findings imply neoadjuvant therapy efficacy variations may correlate with CD4⁺T cell functions. Fig. 4. [96]Fig. 4 [97]Open in a new tab Functional disparities in immune cells within esophageal squamous cell carcinoma (ESCC) tissues with distinct prognoses. A GO (Gene Ontology) analysis revealing the functional enrichment of differentially expressed genes (DEGs) between the two groups with different prognoses. B Gene set enrichment analysis (GSEA) of DEGs comparing the two groups with varying prognoses. C Visualization of the activation and suppression status of DEG sets in relation to different prognoses. D Enrichment status of DEGs in the ANGIOGENESIS and COAGULATION pathways across the two prognostic groups. E Heatmap illustrating the association between activated gene sets and specific immune cell clusters. F GSVA (Gene Set Variation Analysis) depicting the activity levels of pathways in the two prognostic groups. G Differential network diagram showcasing intracellular and intercellular connections between the two groups with different responses. H Heatmap demonstrating the differences in the number of cell-cell communications between the two groups with varying responses. I Circos plot highlighting the disparities in cell types participating in the same signaling pathway between the two prognostic groups. J Top 30 genes correlated with CD4 expression. K Pseudotime analysis revealing the evolutionary differences of CD4⁺T cells across the two prognostic groups Differences in T cell subset types and numbers in ESCC tissues with distinct efficacies of neoadjuvant therapy Our previous analyses suggest CD4⁺T cells are the main cellular contributors to prognosis differences after immunotherapy neoadjuvant treatment. To clarify T cell subset counts and functions, we isolated “CD4⁺T”, “CD8⁺T”, “Proliferative cell”, and “T cell” subsets from PTPRC-positive cells. After quality control and data integration, we got 35,479 genes and 20,514 cells, clustered together. Based on characteristic markers (Fig. [98]5A, Supplementary Fig. 4A), these cells were divided into eight subsets (Fig. [99]5B). All subsets expressed immunotherapy markers like LAG3, CTLA4, and CD274 (Fig. [100]5C), with notable differences in their top 10 expressed genes (Supplementary Fig. 4B). The “response/stim group” had 10,595 cells, while the “Non-response/ctrl group” had 9,919, showing significant cellular composition differences (Fig. [101]5D). Cell types and numbers also varied across samples (Supplementary Fig. 4C). The “response/stim group” had more CD4⁺T cells, while the “Non-response/ctrl group” had more CD8⁺T cells (Fig. [102]5E), evident in both proportions and absolute counts (Fig. [103]5F). However, no correlation was found between these cellular differences and prognosis in UCSC database data (Supplementary Fig. 4D). Fig. 5. [104]Fig. 5 [105]Open in a new tab Variations in T cell subset types and numbers within ESCC tissues showing different responses to neoadjuvant therapy. A UMAP plot depicting the expression patterns of T cell markers. B UMAP plot illustrating the clustering and cell annotation of T cell subsets. C Heatmap displaying the expression levels of typical markers in each T cell subset. D UMAP plot showing the distribution of T cell subsets according to different therapeutic responses. E Line graph presenting the differences in the numbers of T cells between groups with varying therapeutic responses. F Bar chart showing the proportions and absolute numbers of T cell subsets in groups with different therapeutic responses Functional differences in T cells within esophageal squamous cell carcinoma (ESCC) tissues with distinct prognoses Among T cell subsets, 11,382 DEGs were shared between the “response/stim” and “Non-response/ctrl” groups (Supplementary Table 4), with 3,311 showing significant differential expression (p_adj < 0.01, |log2FC| >1). T cell functions align with immune cells. GO analysis (Supplementary Fig. 5A) linked DEGs to immune response activation. KEGG analysis (Supplementary Fig. 5B) suggested correlations with cytokine pathways. GSEA (Fig. [106]6A) revealed pathway associations, some activating, others suppressing (Fig. [107]6B). GSVA (Fig. [108]6C) showed pathway activity differences between groups, implying efficacy variations may relate to immune cell functions, similar but distinct from previous subsets. Notable pathway changes included ANGIOGENESIS, COAGULATION, IL6-JAK-STAT3, and NOTCH-SIGNALING (Fig. [109]6D), concentrated in CD4⁺T cells (Supplementary Fig. 5C). Cell-cell communication varied between groups (Supplementary Fig. 5D), with 580 communications in “response/stim group” and 850 in “Non-response/ctrl group” (Supplementary Fig. 5E), forming 26 and 31 pathways, respectively. Participating cell types differed even for the same pathway (Supplementary Fig. 5F), likely due to ligand-receptor differences (Supplementary Fig. 5G). Genes associated with CD4, like IL-32, CD2, CD7, correlated with immune molecules (Fig. [110]6E). Immunomodulatory gene indices differed between groups (Fig. [111]6F). The “response/stim group” had a higher EMT score (Fig. [112]6G). Pseudotime analysis showed CD4⁺T cells in “response/stim” participated more in cellular evolution (Fig. [113]6H). These findings suggest efficacy variations may correlate with CD4⁺T cell immune functions. Fig. 6. [114]Fig. 6 [115]Open in a new tab Functional disparities of T cells in esophageal squamous cell carcinoma (ESCC) tissues with distinct prognoses. A Gene set enrichment analysis (GSEA) of differentially expressed gene (DEG) sets comparing the two groups with different prognoses. B Visualization of the activation and suppression status of DEG sets in relation to different prognoses. C GSVA (Gene Set Variation Analysis) illustrating the activity levels of pathways in the two prognostic groups. D Enrichment status of DEGs in the ANGIOGENESIS, COAGULATION, IL6-JAK-STAT3-SIGNALING, and NOTCH-SIGNALING pathways across the two prognostic groups. E Top 30 genes correlated with CD4 expression. F Differences in immune-regulatory genes between the two groups with varying prognoses. G Comparison of epithelial-mesenchymal transition (EMT) scores between the two groups with different responses. H Pseudotime analysis revealing the evolutionary differences of CD4⁺T cells across the two prognostic groups Differences in the types and numbers of CD4⁺T cell subsets in esophageal squamous cell carcinoma (ESCC) tissues with distinct prognoses We extracted the T cell subsets annotated as “CD4⁺T”, “TH17”, and “Treg” from the previously analyzed T cell populations for further quality control and data integration. After obtaining a comprehensive dataset of 35,479 genes and 7,949 cells, we re-clustered these cells. Based on characteristic markers (Supplementary Fig. 6A), the cells were divided into four subsets (Fig. [116]7A). While all subsets expressed classical immune markers, their expression profiles varied among the subsets (Fig. [117]7B). There were notable differences in the top 10 expressed genes across these subsets (Supplementary Fig. 6B). There were significant disparities in the types and numbers of cells between the “response/stim group” and the “Non-response/ctrl group” (Fig. [118]7C). The proportions of cellular subsets varied considerably among individual patients (Supplementary Fig. 6C). Specifically, the “response/stim group” exhibited a marked increase in Treg cells, whereas the “Non-response/ctrl group” showed a notable elevation in Tfh_BCL6 cells (Fig. [119]7D). These differences were evident in both the cellular proportions and absolute counts (Fig. [120]7E). Fig. 7. [121]Fig. 7 [122]Open in a new tab Variations in CD4⁺T cell subset types and numbers within esophageal squamous cell carcinoma (ESCC) tissues with distinct prognoses. A UMAP plot illustrating the clustering and cell annotation of CD4⁺T cell subsets. B Heatmap depicting the expression levels of commonly used clinical immune-predictive molecules in each CD4⁺T cell subset. C UMAP plot showing the distribution of CD4⁺T cell subsets according to different therapeutic responses. D Line graph presenting the differences in the numbers of CD4⁺T cells between groups with varying therapeutic responses. E Bar chart showing the proportions and absolute numbers of CD4⁺T cell subsets in groups with different therapeutic responses Functional differences in CD4⁺T cell subsets among esophageal squamous cell carcinoma (ESCC) tissues with distinct prognoses Within CD4⁺T cell subsets, 10,172 DEGs were shared between the “response/stim” and “Non-response/ctrl” groups (Supplementary Table 5), with 748 showing significant differential expression (p_adj < 0.01, |log2FC| >1). GO analysis (Supplementary Fig. 7A) linked DEGs to adaptive immune response. KEGG analysis (Supplementary Fig. 7B) suggested cytokine pathway correlations. GSEA (Fig. [123]8A) revealed pathway associations, some activating, others suppressing (Fig. [124]8B). GSVA (Fig. [125]8C) showed pathway activity differences between groups. Notable pathway changes included INTERFERON_ALPHA_RESPONSE, KRAS_SIGNALING, TGF_BETA_SIGNALING, and INTERFERON_GAMMA_RESPONSE (Fig. [126]8D), concentrated in Tfh_BCL6 (upregulated) and Treg (downregulated) subsets (Supplementary Fig. 7C). Cell-cell communication analysis uncovered both intracellular and intercellular connections (Supplementary Fig. 7D). The “response/stim group” had 222 intercellular communications, while the “Non-response/ctrl group” had 152, involving 20 and 19 pathways, respectively. Participating cell types differed even for the same pathway (Supplementary Fig. 7E). Genes closely associated with CD4 expression correlated with immune molecules like IL-32, CD27, HLA-A/B (Fig. [127]8E). EMT score differences were significant between groups (Fig. [128]8F). Pseudotime analysis showed immunological profile disparities at different stages (Fig. [129]8G). These findings suggest efficacy variations may relate to CD4⁺T cell functions, particularly the Treg subset, and the immunological microenvironment. Fig. 8. [130]Fig. 8 [131]Open in a new tab Functional disparities of CD4⁺T cell subsets in esophageal squamous cell carcinoma (ESCC) tissues with distinct prognoses. A Gene set enrichment analysis (GSEA) of differentially expressed gene (DEG) sets comparing the two groups with different prognoses. B Visualization of the activation and suppression status of pathways associated with DEG sets in relation to different prognoses. C GSVA (Gene Set Variation Analysis) depicting the activity levels of pathways in the two prognostic groups. D Enrichment status of DEGs in the INTERFERON_ALPHA_RESPONSE, KRAS_SIGNALING, TGF_BETA_SIGNALING, and INTERFERON_GAMMA_RESPONSE pathways across the two prognostic groups. E Top 30 genes correlated with CD4 expression. F Differences in immune-regulatory genes between the two groups with varying prognoses. G Comparison of epithelial-mesenchymal transition (EMT) scores between the two groups with different therapeutic responses. H Pseudotime analysis revealing the evolutionary differences of regulatory T (Treg) cells within CD4⁺T cells across the two prognostic groups Investigating the role of the PLTP gene in esophageal cancer via Mendelian randomization studies Previous analyses explored cellular differences and their correlations with immunotherapy efficacy, suggesting genetic disparities may drive cellular variations. Comparing DEGs between “response/stim” and “Non-response/ctrl” groups across lymphoid, immune cell, T cell, and CD4⁺T cell subsets, we identified 95 overlapping DEGs (Supplementary Tables 2–5). Using esophageal cancer datasets from the EAU website, we conducted MR analyses and identified PLTP and PKD4 as shared genes with significant MR associations (Fig. [132]9A). PLTP was widely expressed in T subsets and CD4⁺T cell subsets (Fig. [133]9B), with higher expression in the “response/stim group” (Fig. [134]9C), particularly in CD4⁺T cells and their Treg subset (Fig. [135]9D), suggesting a link between PLTP expression and immunotherapy efficacy. We retrieved the PLTP promoter sequence from NCBI and used HumanTFDB for transcription factor prediction. SCENIC analysis on lymphoid, immune cell, and T subsets identified key transcription factors (BCL3, E2F1, etc.) (Fig. [136]9E, Supplementary Tables 6–8). Genes correlated with PLTP expression (Fig. [137]9F) were predominantly associated with metabolic processes (Supplementary Table 9). Reactome pathway analysis linked relevant DEGs to pathways like “Cell Cycle Checkpoints” (Supplementary Table 10). GSEA analysis and ridge plots revealed associations with metabolic pathways, including the proteasome and oxidative phosphorylation (Fig. [138]9G). These findings suggest that the interplay between immune cells, gene regulation, and metabolism may influence immunotherapy efficacy in esophageal cancer. Fig. 9. [139]Fig. 9 [140]Open in a new tab The role of the PLTP gene in esophageal cancer. A Positive results of expression quantitative trait loci (eQTL) from Mendelian randomization studies using esophageal cancer datasets. B UMAP plot depicting the distribution of PLTP in T cell subsets and CD4⁺T cell subsets. C Differential expression of PLTP between groups with distinct prognoses in esophageal cancer. D Expression levels of PLTP in different T cell subsets and CD4⁺T cell subsets. E Heatmap illustrating the results of SCENIC (Single-Cell rEgulatory Network Inference and Clustering) analysis for T cell subsets and CD4⁺T cell subsets. F Top 30 genes correlated with PLTP expression. G Visualization of gene pathways associated with PLTP expression through Reactome pathway analysis Discussion This study used scRNA-seq and eQTL analysis to explore why immunotherapy works differently in esophageal squamous cell carcinoma (ESCC), aiming to predict treatment responses. We found CD4⁺T cells, crucial in the ESCC immune environment, are linked to good immunotherapy results. GWAS data revealed PLTP, an esophageal cancer gene, is highly expressed in T cell subsets, especially CD4⁺T cells. As an oncogene, PLTP affects both cancer development and immunotherapy effectiveness. Thus, PLTP and CD4⁺T cells could be biomarkers for predicting immunotherapy response in ESCC. This research offers key insights into ESCC’s immune mechanisms and suggests new treatment strategies, contributing to personalized, effective ESCC therapies and better patient outcomes. We analyzed scRNA-seq data to study how cellular subtypes relate to immunotherapy efficacy in ESCC. Through dimensionality reduction and clustering, CD4⁺T cells emerged as crucial, evolving from lymphoid to immune and then T cell subsets. Patients responding to immunotherapy had more CD4⁺T cells than non-responders. Pseudotime analysis revealed their role in evolutionary processes, causing differential gene expression that impacts pathways and cell communication, suggesting CD4⁺T cells as potential predictive biomarkers. Numerous studies confirm CD4⁺T cells’ role in tumor progression, though their specific functions are still debated. For instance, a subset of CD4⁺T cells expressing high levels of the interleukin-7 receptor, along with CD97, interleukin-18 receptor, and Ly6C, has been linked to patient responsiveness to anti-PD-1 therapy in cancer and disease states in multiple sclerosis [[141]29]. In the tumor microenvironment (TME) of endometrial cancer, a CD4⁺ T cell cluster characterized by CXCL13 expression, along with granzyme B and CCL5 expression, exhibits cytotoxic functions and participates in immune surveillance [[142]30]. CD4⁺FOXP3⁺ T cells, which suppress the body’s immune response against tumors and are present in various cancer microenvironments, can be influenced by the CXCL12/CXCR4 signaling axis to induce FOXP3 exon 2 splice variants (FOXP3E2). FOXP3E2⁺ Tregs possess stronger immunosuppressive capabilities and are associated with poorer patient prognosis [[143]31]. In head and neck squamous cell carcinoma, MIF activates the JAK/STAT3 pathway, boosts apCAF transformation, raises the CD4⁺T/CD8⁺T cell ratio, and aids tumor progression [[144]32]. Memory CD4⁺T cells rely on MHCII signals provided by tumor-infiltrating myeloid cells and tissue-resident myeloid cells to regulate the appropriate ratio of Bcl-6 to T-bet, triggering long-term antitumor immune memory and sustaining a durable antitumor response [[145]33]. In non-small cell lung cancer patients, concurrent targeting of PD-1 and CD85j enhances the effector functions of CD4⁺GzmB⁺ T cells through the activation of the AKT-FOXO1-T-bet pathway [[146]34]. PD-L1 blockade therapy, when administered after irreversible electroporation, promotes the differentiation of CD4⁺T cells into interferon-γ-positive (IFN-γ⁺) Th1 and Th17 cells with antitumor activity, activating the cDC2-CD4⁺T cell axis and playing a crucial antitumor role in pancreatic cancer with low MHC-I expression [[147]35]. Adoptive transfer of interleukin-9-producing cytotoxic CD8⁺T cells (Tc9 cells) secretes IL-24, recruits cDC2 cells, and inhibits tumor growth by activating effector CD4⁺T cells [[148]36]. In colorectal cancer, a co-culture study using CD4⁺T cells from Foxp3eGFP mice and CRC organoids showed that more infiltrating CD4⁺ regulatory T cells (Tregs) in the TME correlate with tumor progression, immunotherapy failure, and poor prognosis [[149]37]. Despite these divergent findings regarding the specific roles of CD4⁺T cells in tumors, there is a consensus on their significant involvement in tumor development and progression. MR analysis pinpointed pathogenic genes tied to ESCC. By cross-referencing these with DEGs from our single-cell study (comparing immune cells, T cell subsets, and CD4⁺T cells across prognostic groups), we spotlighted PLTP as a key gene. PLTP was strongly linked to ESCC risk, suggesting its involvement in disease development. Genes related to PLTP were mainly involved in metabolic pathways, especially Cell Cycle Checkpoints or Mitotic Prometaphase. This connects immune cells, gene control, and metabolism, providing fresh insights into PLTP’s cancer-causing roles. A literature review backs PLTP’s vital functions in disease processes, highlighting its predictive value and essential roles. For instance, recombinant phospholipid transfer protein (PLTP) suppresses the release of interleukin-1β (IL-1β), IL-18, and tumor necrosis factor-α (TNF-α), and blocks the activation of the NLRP3 inflammasome/GSDMD signaling pathway in septic mice, thereby inhibiting inflammatory responses and cardiomyocyte pyroptosis, and ultimately ameliorating sepsis-induced cardiac dysfunction [[150]38]. Three cancer-associated p53 hypomorphic variants, P47S, Y107H, and G334R, significantly impair PLTP gene transcriptional activation, inhibit cellular colony formation ability, and modulate cellular sensitivity to ferroptosis [[151]39]. In lung adenocarcinoma patients, postoperative peripheral blood extracellular vesicle protein levels of PLTP and MET are markedly reduced, suggesting their potential as biomarkers for assessing the success of tumor resection surgery [[152]40]. Phosphocholine, via PLTP-mediated energy-dependent active transport, enables tumor-targeted delivery, offering a new approach for developing intelligent nanomedicine systems based on metabolic targets [[153]41]. In human hepatocellular carcinoma cell lines, overexpression of profurin forms a complex with PLTP via its prodomain, mediating PLTP degradation and thereby intervening in lipid metabolism disorders [[154]42]. Knockdown of LPL and PLTP reduces cell cycle proteins, boosts kinase inhibitors like p16, p21, and Rb, and inhibits EMT markers and invasion molecules (e.g., MMPs), thereby curbing glioma cell proliferation and migration and promoting apoptosis [[155]43]. Collectively, these findings underscore the vital biological functions of PLTP, further emphasizing its significance in cancer biology and its potential as a therapeutic target or prognostic biomarker. Limitations of the study CD4⁺T cells benefit ESCC immunotherapy neoadjuvant treatment, suggesting their potential as predictive biomarkers. Targeting PLTP and its pathways may also offer a promising therapeutic approach. However, our study has limitations: single-cell data lack large-scale validation, not all cells show characteristic gene expression, MR analysis only indicates disease occurrence (not therapy outcomes), and findings need further in vitro/in vivo validation. Despite these, our research reveals the complex interplay between immune cells, genetics, and therapy responses in ESCC, guiding future personalized treatment strategies to improve clinical outcomes. Conclusion This study emphasizes the importance of scRNA-seq and MR in highlighting CD4⁺T cells’ role in predicting immunotherapy efficacy for ESCC, paving the way for tailored treatments. Identifying PLTP as a key gene in ESCC offers new precision medicine strategies, potentially improving outcomes across disease stages. While these findings provide novel insights into predicting immunotherapy efficacy, further research with larger samples and experimental validations is needed to confirm these discoveries and fully explore CD4⁺T cells’ potential. Such efforts will advance personalized oncology and optimize ESCC treatments. Supplementary Information Below is the link to the electronic supplementary material. [156]Supplementary Material 1.^ (16.5MB, zip) Acknowledgements