Abstract Simple Summary Eosinophilic esophagitis is a chronic disease that can be complicated by fibrosis and strictures in the esophagus. Fibroblasts are a cell type that are considered important mediators of this process. The aim of this study was to better understand how fibroblasts can cause strictures by using publicly available single-cell RNA sequencing data to identify the different types of fibroblasts in patients with eosinophilic esophagitis. We identified at least two types of fibroblasts in eosinophilic esophagitis. Our analysis shows that fibroblasts are important contributors to the communication that occurs between cells in the human esophagus and that each of the fibroblast types has a unique way of interacting with other cell types (e.g., with epithelial cells or immune cells). This work can be used to provide additional insights into how strictures form in eosinophilic esophagitis. Abstract Fibroblast heterogeneity remains undefined in eosinophilic esophagitis (EoE), an allergic inflammatory disorder complicated by fibrosis. We utilized publicly available single-cell RNA sequencing data ([34]GSE201153) of EoE esophageal biopsies to identify fibroblast sub-populations, related transcriptomes, disease status-specific pathways and cell–cell interactions. IL13-treated fibroblast cultures were used to model active disease. At least 2 fibroblast populations were identified, F_A and F_B. Several genes including ACTA2 were more enriched in F_A. F_B percentage was greater than F_A and epithelial–mesenchymal transition upregulated in F_B vs. F_A in active and remission EoE. Epithelial–mesenchymal transition was also upregulated in F_B in active vs. remission EoE and TNF-α signaling via NFKB was downregulated in F_A. IL-13 treatment upregulated ECM-related genes more profoundly in ACTA2− fibroblasts than ACTA2+ myofibroblasts. After proliferating epithelial cells, F_B and F_A contributed most to cell–cell communication networks. ECM–Receptor interaction strength was stronger than secreted or cell–cell contact signaling in active vs. remission EoE and significant ligand–receptor pairs were driven mostly by F_B. This unbiased analysis identifies at least 2 fibroblast sub-populations in EoE in vivo, distinguished in part by ACTA2. Fibroblasts play a critical role in cell–cell interactions in EoE, most profoundly via ECM–receptor signaling via the F_B sub-group. Keywords: cell–cell communication, eosinophilic esophagitis, fibroblasts, fibroblast sub-populations, pathway analysis, single-cell RNA sequencing data analysis 1. Introduction Eosinophilic esophagitis (EoE) is a Th2 immune/antigen-mediated clinicopathologic disorder characterized by symptoms of esophageal dysfunction and an esophageal, eosinophil predominant inflammation of ≥15 eosinophils/hpf in mucosal biopsies. Worsening dysphagia and esophageal food impactions requiring endoscopic interventions and the need for esophageal dilations are the most significant complications of EoE. The prevailing theory is that chronic, untreated and/or persistent inflammation (e.g., due to delays in diagnosis or therapy) can initiate a tissue remodeling process in the esophagus that involves epithelial hyperplasia, subepithelial fibrosis, and hypertrophy of the esophageal smooth muscle, that progresses, in many, though not all adults, to a fibrostenotic phenotype with strictures and luminal narrowing [[35]1,[36]2,[37]3,[38]4,[39]5]. The pathogenesis of fibrostenosis in EoE is an active area of investigation and remains incompletely defined [[40]6]. Activated fibroblasts are considered one of the primary effector cells in fibrosis, via deposition of connective tissue components including extracellular matrix (ECM) structural and adhesive proteins such as collagen and fibronectin and ground substance [[41]7,[42]8]. In EoE, cell-culture-based studies have shown that pro-fibrotic fibroblast behavior is regulated by Th2 cytokines prevalent in EoE (IL-4, IL-13, IL-5) and TGFβ1 secreted from a number of cells including eosinophils [[43]9]. Pro-fibrotic fibroblast behavior has also been shown to be regulated by crosstalk with the extracellular matrix [[44]10], endothelial [[45]11], and epithelial [[46]12] cells. Finally, epithelial cells have been shown to acquire fibroblast characteristics through epithelial–mesenchymal transition [[47]13]. It is becoming increasingly clear, however, that fibroblasts are a heterogeneous population that remains incompletely defined in the human esophagus. Single-cell surveys of other organs and disease states such as the human colon and inflammatory bowel disease have revealed heterogeneity of stromal cell populations. Myofibroblasts, pericytes and 4 additional distinct populations of fibroblast-like cells have been identified in the healthy colon with shifts in these populations in active colonic inflammation [[48]14]. More recently, single-cell RNA profiling of the human esophagus has revealed the existence of three major classes of stromal cells in the normal human esophagus (early immune, desmoplastic/ECM-depositing, and contractile fibroblast phenotypes) with shifts in these populations in the setting of Barrett’s esophagus, an intestinal metaplasia of the otherwise squamous epithelium [[49]15]. There is also emerging evidence that fibroblast populations change with disease state in EoE with a recent study showing shifts in fibroblast populations in response to a TNF-related cytokine LIGHT. Spatial transcriptomics demonstrates an increase in LIGHT-regulated inflammatory fibroblasts in the epithelium in active EoE and a decrease in homeostatic fibroblasts in the lamina propria [[50]16,[51]17]. Despite these advances, the molecular features, heterogeneity, and cellular interactions of fibroblasts in the complex fibroinflammatory milieu of EoE remain incompletely defined. Our aim was to leverage a publicly available dataset of single-cell RNA seq (scRNA-seq) of EoE patients [[52]18] to identify fibroblast sub-populations, and related transcriptomes, disease status-specific pathways and cell–cell interactions in active and remission states. Active EoE was defined by histologic findings of ≥15 eosinophils per high-power field in esophageal biopsies and clinical symptoms. EoE in remission was defined by a history of EoE and eosinophils < 2 eosinophils per high power field in esophageal biopsies [[53]18]. Herein we identify at least two fibroblast sub-populations in active and remission EoE, F_B and F_A, distinguished in part by ACTA2. The percentage of F_B is greater than F_A in active and remission EoE and characterized by genes related to epithelial–mesenchymal transition. TNF-α signaling via NFKB is downregulated in F_A and epithelial–mesenchymal transition is upregulated in F_B in active vs. remission EoE, respectively. Cell–cell communication analysis shows that fibroblasts play a critical role in cell–cell interactions in EoE, most profoundly via ECM-receptor signaling via the F_B sub-group. 2. Materials and Methods 2.1. Data Sources Raw scRNA-seq data were extracted from Gene Expression Omnibus (GEO) under the accession number: [54]GSE201153. 2.2. scRNA-Seq Data Analysis The scRNA-seq data were pre-processed for filtration, normalization and scaling using R toolkit Seurat (v4.1.2) [[55]19]. After the selection and filtration of cells (genes expressed in at least 3 cells, cells with more than 200 genes represented by at least one UMI (unique molecular identifier) and percentage of counts coming from mitochondrial genes less than or equal to 5%), data normalization and scaling were performed using default options. Data were then batch corrected using Seurat’s integration method by all 10 samples, i.e., control (n = 2), EoE remission (n = 3), and active EoE (n = 5), to unbiasedly compare the cellular population repertoire. The datasets were first normalized with SCTransform. Seurat’s PrepSCTIntegration function was then applied prior to identifying anchors. In Seurat’s FindIntegrationAnchors and IntegrateData function, the normalization.method parameter was set to ‘SCT’. The top 20 principal components of the processed expression data were used to identify clusters based on Elbow plots ([56]Supplementary Figure S1A). UMAP algorithm was used for dimensionality reduction. All dimensions of the low-dimension representation were used as input to generate the UMAP plots with default parameters. Marker genes were identified by determining the average log-fold change (logFC) of expression of each cluster compared to the rest of the cells using Seurat’s FindAllMarkers function using default settings for the Wilcoxon rank sum and MAST tests, which yielded the same commonly highly expressed genes in each cluster. We identified marker genes as those with an average log-fold change above 0.25, as previously described [[57]18]. Clusters were labeled using cell types associated with the previously identified marker genes. The listed marker genes include, e.g., CNFN labeling epithelial cells, GNLY labeling lymphocytes, LYZ labeling myeloid, TPSAB1 labeling mast cells, PLVAP labeling endothelial cells and LUM labeling fibroblasts. The acquired epithelial cell populations were also clustered into six subpopulations using Seurat with default settings, as previously described to annotate the epithelial subpopulations [[58]18]. Listed marker genes included KRT15 for Quiescent, TOP2A for Proliferating, DSP for Transitioning 1, SERPINB3 for Transitioning 2, TGM3 for Differentiated_low and RNASE7 for labeling Differentiated_high, respectively. 2.3. Identification of Fibroblast Subpopulations Data were batch corrected using Seurat’s integration method by 8 samples (i.e., EoE remission, and active EoE) as described above. Following integration, UMAP dimensionality reduction was performed on the first 20 principal components, again based on Elbow plot ([59]Supplementary Figure S1B), resulting in 383 and 243 of the total number of fibroblasts in EoE remission and active EoE samples, respectively. Next, the integrated fibroblast population was further clustered into two and four subpopulations using Seurat with resolution set at 0.2 and 0.6 (determined by clustering tree using the R packages ‘clustree’ Seurat: v4.2.0 to ensure stable cluster), respectively, on the first 15 principal components ([60]Supplementary Figure S1C). We did not perform sub-clustering in the control samples because there were only 26 fibroblasts. In the calculator from SATIJA lab that developed R package ‘Seurat’ v4.2.0, access date 20 September 2023 ([61]https://satijalab.org/howmanycells/, accessed on 20 September 2023), we set the assumed number of cell types to 2 or 4, minimum fraction to 10% (we assumed there were no rare cell types) and minimum desired cells per type to 10 under the condition of 95% confidence. We found that if the cell number reached 167 or 179, at least 10 cells from each cluster could be accurately detected. The UMAP algorithm was used for dimensionality reduction. Marker genes across two fibroblast subpopulations were identified using Seurat’s FindMarkers function with default settings with both Wilcoxon rank sum and MAST tests. A heatmap depicting the top 30 marker genes in each cluster was generated, where the listed gene order was based on adjusted p-value (<1 × 10^−10). The lowest fold change among the marker genes was 1.6. 2.4. DEGs and GSEA/Hallmark Database Marker genes or differentially expressed genes across sub-clusters in active and in remission EoE and across disease status for each sub-cluster were identified with Seurat’s FindMarkers function using the default settings for both Wilcoxon rank sum and MAST tests, as described above, with similar results for each sub-cluster. Differentially expressed genes were considered with an FDR-adjusted p value threshold of q < 0.05 and a fold-change (FC) threshold of FC > 1.5. The whole gene lists identified with Seurat’s FindMarkers function using the default settings were used as inputs of R package ‘fgsea’ v1.30.0 with default parameters for gene set enrichment analysis (GSEA). Human hallmark functional collection was chosen as a resource of the GSEA molecular signatures database. Only pathways with an FDR-adjusted p value threshold of q < 0.25 were considered. 2.5. Cell–Cell Interaction Cell–cell interaction was calculated using the R package ‘CellChat’ v1.5.0 and CellChatDB database with default parameters [[62]20], only in EoE remission and active EoE samples. The above-mentioned Seurat outputs from these 8 samples with a resolution of 0.2 for fibroblasts sub-clustering, with the fibroblast population replaced by F_A and F_B sub-clusters, were used as the inputs. CellChatDB is generated based on the KEGG signaling pathway database and recent peer-reviewed studies and consists of three categories: secreted (paracrine/autocrine) signaling interactions, extracellular matrix (ECM)–receptor interactions, and cell–cell contact interactions. The difference of incoming or outgoing interaction strength between all cell populations in active EoE and EoE remission via secreted signaling, ECM–receptor or cell–cell contact was obtained from CellChat, all of which were used as inputs of principal component analysis (PCA), performed by R packages ‘FactoMineR’ v2.9 and ‘factoextra’ v1.0.7 with default parameters. The total contribution scores of individual cell populations were calculated based on the scores from the first two dimensions of the PCA (Dimension 1 = 65.4% and Dimension 2 = 26.1%), following the equation: score in Dimension 1 × % Dimension 1 + score in Dimension 2 × % Dimension 2, which was used as inputs of R package ‘pheatmap’ to generate a heatmap. Significant ligand–receptor gene pairs (p < 0.05) between active and remission EoE, focused on the fibroblast sub-populations, were identified using the “statistically robust mean method” in the CellChat algorithm. The ligand–receptor pairs were classified into secreted, cell–cell contact, and ECM–receptor cell–cell type interactions. The communication probability between cell types was modeled by the law of mass action as described by Jin et al. [[63]20]. 2.6. Cell Culture and Treatment Previously characterized primary human esophageal fibroblast/myofibroblast cultures, established from surgical resections of deidentified normal human esophagi were cultured as previously described [[64]21,[65]22]. These primary cultures have been previously shown to express ACTA2+ gene and α-SMA protein. Human esophageal primary fibroblasts were purchased from ScienCell Research Laboratories (Carlsbad, CA, USA, Catalog no. 2730), and cultured in Fibroblast Medium (FM, Cat. #2301) as recommended by the manufacturer. To model active EoE, ACTA2+ and ACTA2− fibroblasts (passage 2–6) grown in 6 well plates were treated with IL-13 (final concentration 100 ng/mL, Peprotech, Rocky Hill, NJ, USA, catalog no. 200-13) for 24 h, as previously described [[66]18,[67]23]. 2.7. Real-Time Quantitative RT-PCR RNA was isolated using the RNAeasy mini kit (QIAGEN, Germantown, MD, USA), followed by cDNA synthesis using SuperScript™ (Invitrogen™, Waltham, MA, USA) reverse transcription system according to the manufacturer’s protocol. Quantitative PCR (qPCR) was performed with the isolated cDNAs using SYBR Green Master Mix for ACTA2, POSTN, LUM, RGS5, EGFL6, CCL26, TNC, IL13RA1, IL13RA2, FN1, COL6A1, COL6A2, COL6A3, and COL1A2 ([68]Supplementary Table S1). Ten microliters of reaction medium contained 2 µL of diluted cDNAs and 8 µL of 300 nM gene-specific primers and SYBR Green Master Mix (Applied Biosystems, Waltham, MA, USA). A control analysis was performed with an RNA sample to detect contamination by genomic DNA and with water to exclude the formation of primer–dimers. The PCR program was conducted as follows on a QuantStudio 5 (Life Technologies, Carlsbad, CA, USA): one cycle at 95 °C for 2 min, and 40 cycles of 15 s at 95 °C and 1 min at 60 °C. Data collection was performed at 60 °C. The melting curve was produced according to the following program: 15 s at 95 °C at a ramp rate of 1.6 °C s^−1, 1 min at 60 °C at a ramp rate of 1.6 °C s^−1, heating to 95 °C at a ramp rate of 0.05 °C s^−1 and held in place for 15 s. The data were continuously collected. The data were analyzed by the 2^−∆∆Ct method, using GAPDH and untreated conditions as references. qPCR was