Abstract

Introduction
Hypopharyngeal squamous cell carcinoma (HSCC) has one of the worst prognoses among head and neck cancers. The transformation from normal tissue through low-grade and high-grade intraepithelial neoplasia to cancerous tissue in HSCC is generally viewed as a progressive pathological sequence characteristic of tumorigenesis. Nonetheless, the alterations in diverse cell clusters within the tissue microenvironment (TME) throughout tumorigenesis, and their impact on the development of HSCC, are not yet fully understood.

Methods
We employed single-cell RNA sequencing and TCR/BCR sequencing to profile 60,854 cells from nine tissue samples representing different stages of HSCC progression. This allowed us to construct dynamic transcriptomic maps of cells in the TME across disease stages and to experimentally validate the key molecules identified in these maps.

Results
We delineated the heterogeneity among tumor cells, immune cells (including T cells, B cells, and myeloid cells), and stromal cells (such as fibroblasts and endothelial cells) during the tumorigenesis of HSCC. We uncovered alterations in the function and state of distinct cell clusters at different stages of tumor development and identified specific clusters closely associated with the tumorigenesis of HSCC. From these clusters, we identified molecules such as MAGEA3 and MMP3 that are pivotal for the diagnosis and treatment of HSCC.

Discussion
Our research sheds light on the dynamic alterations within the TME during the tumorigenesis of HSCC, which will help to clarify its mechanism of carcinogenesis, identify early diagnostic markers, and discover new therapeutic targets.

Keywords: hypopharyngeal squamous cell carcinoma, carcinogenesis, dynamic transcriptomic mapping, tissue microenvironment, single-cell sequencing

1. Introduction

The hypopharynx is crucial for physiological functions such as swallowing and speech.
Hypopharyngeal squamous cell carcinoma (HSCC) often develops unnoticed, leading to late-stage diagnoses. Despite advanced treatments, the 5-year survival rate is below 40%, marking it as one of the most severe malignancies in the head and neck region (1). Squamous cell carcinoma is the predominant form of hypopharyngeal cancer and is strongly linked to smoking and alcohol consumption (2). The transformation from normal mucosa to cancer, which passes through stages of low-grade intraepithelial neoplasia (LGIN) and high-grade intraepithelial neoplasia (HGIN), is poorly understood because of the complexity of the signaling networks and molecules involved (3). Thus, studying the cellular and molecular alterations in HSCC development is vital for understanding its molecular mechanisms, identifying diagnostic markers, and discovering therapeutic targets.

Understanding the biological mechanisms of HSCC carcinogenesis requires detailed characterization of the molecular, cellular, and acellular components involved in cancerous transformation, which exhibit pronounced spatial and temporal heterogeneity throughout tumor initiation and progression (4). This heterogeneity is marked by the emergence of distinct cellular entities, each with unique molecular signatures and functional differences (5). Traditional bulk transcriptome sequencing averages RNA expression across many cells and can therefore mask differences in gene expression among the cells within a population. The advent of single-cell RNA sequencing (scRNA-seq) has overcome this limitation by allowing RNA libraries to be extracted and prepared at the single-cell level, providing comprehensive transcriptomic insights. This technology aids in identifying cell types, states, and functions, and illuminates cellular heterogeneity and evolutionary pathways.
Integrated with immune repertoire library analysis, it also allows examination of the T-cell receptors (TCR) and B-cell receptors (BCR) of immune cells, providing a detailed immune profile alongside gene expression in specific tissues or diseases. This analysis is crucial for uncovering tumor heterogeneity, tracing cell lineage, and understanding the complex mechanisms of tumor clonal evolution, paving the way for early detection, therapeutic stratification, prognostic evaluation, and monitoring of recurrence (6).

In this study, we explored the multi-stage carcinogenesis of hypopharyngeal mucosa by conducting scRNA-seq on tissue specimens from various pathological stages, aiming to analyze the composition and expression variations of cells within different tissue microenvironments (TME). This approach, which uses samples at successive pathological stages to model the progression of hypopharyngeal carcinogenesis, aims to construct a comprehensive dynamic transcriptome map depicting the evolution of cellular and molecular expression throughout tumorigenesis. The insights gained are expected to elucidate the trajectory and potential molecular mechanisms of hypopharyngeal carcinogenesis and to reveal key regulatory molecules, aiding the identification of predictive molecular markers and thereby contributing to the advancement of HSCC research.

2. Materials and methods

2.1. Patient recruitment and sample collection

From January to February 2023, five male patients with HSCC, median age 57 years (55-63 years), were recruited from the Cancer Hospital of the Chinese Academy of Medical Sciences. The study was approved by the hospital ethics committee (number: 22/454-3656), and informed consent was obtained from each patient before examination. None of the patients had received any treatment (radiation, chemotherapy, or surgery) or had a history of other tumor diseases. Before treatment, all patients underwent laryngoscopy and gastroscopy.
Based on laryngoscopic findings, multiple biopsies were taken from different hypopharyngeal regions of the patients, yielding nine samples: four from the left pyriform sinus, three from the right, and two from the posterior hypopharyngeal wall. Each sample was divided in two: one fragment was preserved in 10% formalin for routine pathology, and the remainder was used for scRNA-seq and TCR/BCR-seq. Two experienced, blinded pathologists conducted the pathological assessments and agreed on the final diagnoses. Histopathological diagnosis was taken as the diagnostic gold standard. The final pathological diagnoses were two cases of normal squamous epithelial tissue, one of LGIN, three of HGIN, and three of HSCC. Subsequently, we classified the samples into four groups based on pathological grading: Normal, LGIN, HGIN, and Tumor (7, 8).

2.2. Preparation of single cell suspensions

After sampling, tissues were rinsed with saline to remove blood, stored in brown tubes with MACS® Tissue Storage Solution (Miltenyi Biotec), and transported to the laboratory at 4°C. Dissociation experiments began within 1 hour of arrival. Samples were cut into 2-3 mm pieces and processed into single-cell suspensions following the Human Tumor Dissociation Kit protocol. Tissue pieces were transferred to a 5 mL tube with dissociation solution and dissociated at 37°C for 2 hours on a rotary mixer. After dissociation, 20 mL of DMEM was added, and the suspension was filtered through a 70 μm strainer. Cells were collected by centrifugation at 4°C, resuspended in 1×PBS, and treated with Red Blood Cell Lysis Solution. Finally, cells were resuspended in 1×PBS + 0.04% BSA + 1U/μL RNase inhibitor. Cell viability, concentration, and aggregation rate were measured using AO/PI Fluorescent Dye on a LUNA-FL™ Cell Counter, with required standards of viability >75%, concentration of 700-1200 cells/μL, and aggregation rate <5%.

2.3. Library construction and next-generation sequencing

We used Chromium Next GEM Single Cell 5’ Reagent Kits v2 (Dual Index) from 10× Genomics to construct single-cell libraries, aiming for 10,000 captured cells per sample. After generating GEMs with the Chromium Controller, we followed the kit instructions for 5’ single-cell RT-PCR amplification, 5’ cDNA amplification and purification, TCR and BCR sequence amplification and purification, and library construction. The quality and concentration of the library were assessed using a Qubit 4.0 and Qubit™ 1× dsDNA Assay Kits (high sensitivity) from Thermo Fisher Scientific. The molar concentration and insert fragment size of the library were evaluated using the StepOnePlus™ Real-Time PCR System from Applied Biosystems and LabChip Touch. Sequencing was performed on an Illumina NovaSeq 6000 with a PE150 read length. To process the single-cell 5’ gene expression and TCR/BCR enrichment data, we used the Cell Ranger count and Cell Ranger vdj functions of Cell Ranger software (version 6.0.1) from 10× Genomics. Gene expression data were aligned to the human genome reference (GRCh38), and TCR and BCR enrichment data were aligned to the VDJ reference sequences available at 10× Genomics Reference Data.

2.4. Single-cell gene expression quantification and determination of cell types

We processed the sequencing data using the Seurat R package (version 4.3.0) (9). Data were converted into a Seurat object and quality filtered to exclude cells meeting specific conditions. Batch effect correction was performed using the harmony package (version 1.0.3). Logarithmic normalization and linear regression were performed using Seurat package functions to construct the gene expression matrix. The COSG package (version 0.9.3) was used to characterize the cell clusters, which were categorized into six primary cell types based on distinct marker genes: T cells, myeloid cells, B cells, epithelial cells, mast cells, and fibroblasts (10).
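As an illustration of the logarithmic normalization step mentioned above, the following is a minimal Python sketch of the transform for a single cell. It is not the code used in the study, which relied on Seurat functions in R. The function name `log_normalize`, the toy counts, and the scale factor of 10,000 (Seurat's documented default, not stated in the text) are assumptions made for this example.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Log-normalize the UMI counts of one cell.

    Each gene count is divided by the cell's total count, multiplied by
    scale_factor (assumed here to be 10,000), and transformed with log(1 + x).
    """
    total = sum(counts)
    if total == 0:
        # A cell with no counts has nothing to scale; return zeros.
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]


# Hypothetical toy cell with four genes (not data from the study).
cell = [0, 3, 1, 6]
normalized = log_normalize(cell)
```

Dividing by the per-cell total before taking the logarithm makes cells with different sequencing depths comparable, which is why this step precedes scaling and clustering.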
Subsequently, normalization, scaling, and clustering were repeated to further subdivide and label specific cell subtypes based on the average expression of gene sets in each primary cell type. Cells expressing marker genes of multiple cell types together with elevated UMI counts were considered cellular contamination and excluded. Each cluster of a primary cell type was assigned a cluster identifier containing a marker gene. Marker genes were selected based on criteria including top ranking in differential gene expression analysis, high specificity of gene expression, and literature support for their role as marker genes or functional genes associated with the cell type.

2.5. Pathway enrichment analysis

We used the clusterProfiler package (11) (version 4.0.5) for R to perform Gene Ontology (GO), KEGG Pathway, and Reactome enrichment analyses and to explore the functions and mechanisms of the identified cellular clusters. P-values were adjusted with the Benjamini and Hochberg method, and p.adjust values below 0.05 were deemed statistically significant. Additionally, we performed Gene Set Variation Analysis (GSVA) using the GSEABase package (12) (version 1.62.0) for R, focusing primarily on the 50 hallmark gene sets from the MSigDB database (https://www.gsea-msigdb.org/).

2.6. CNV estimation

To identify malignant cells in HSCC patients, we used inferCNV software (version 1.14.2) to infer CNVs from chromosomal gene expression patterns, setting the cutoff value at 0.1 and enabling the denoise option. We chose the expression profiles of T and B cells as references, including all epithelial cell clusters in the observation