Abstract Hypersecretion of airway mucus caused by goblet cell metaplasia is a characteristic of chronic pulmonary inflammatory diseases including asthma, cystic fibrosis (CF), and chronic obstructive pulmonary disease (COPD). Goblet cells originate from airway progenitor club cells. However, the molecular mechanisms and features of goblet cell metaplasia in lung disease are poorly understood. Herein, public single-cell RNA sequencing datasets of human lungs were reanalyzed to explore the transitional phase as club cells differentiate into goblet cells in asthma, CF, and COPD. We found that changes in club and goblet cells during pathogenesis and cellular transition were associated with signalling pathways related to immune response, oxidative stress, and apoptosis. Moreover, other key drivers of goblet cell specification appeared to be pathologically specific, with interleukin (IL)-13 and hypoxia inducible factor 1 (HIF-1)-induced genetic changes in asthma, cystic fibrosis transmembrane conductance regulator (CFTR) mutation being present in CF, and interactions with CD8^+ T cells, mitophagy, and mitochondria-induced apoptosis in COPD. In conclusion, this study revealed the similarities and differences in goblet cell metaplasia in asthma, CF, and COPD at the transcriptome level, thereby providing insights into possible novel therapeutic approaches for these diseases. Keywords: Goblet cell metaplasia, Asthma, Cystic fibrosis, COPD, Single-cell RNA sequencing Highlights * • Asthma, CF, and COPD share common features in immune response, cellular metabolism, and stress response. * • In asthma, IRF6, TAGLN2, and IL-13-induced genetic changes drive club-to-goblet transition. * • CF involves SPDEF, JUND, CREB3L4, and NF-kappa B transcription factors in goblet transition. * • In COPD, STAT1, CEBPB, KLF5, KLF2, and FOXQ1, are required for T cell interaction, mitophagy, and apoptosis in goblet transition. 1. Introduction Mucus is a gel composed of water, mucins, lipids, and cell debris. Mucus traps bacteria, allergens, and other harmful substances and removes them from the lungs through ciliary activity [[35]1,[36]2]. Airway obstruction caused by mucus hypersecretion and plugging contributes significantly to the high morbidity and mortality of chronic lung diseases including asthma, cystic fibrosis (CF), and chronic obstructive pulmonary disease (COPD), which share a common phenotype of goblet cell metaplasia [[37][3], [38][4], [39][5]]. Goblet cells originate from airway progenitor club cells that express secretoglobin family 1A member 1 (SCGB1A1) [[40]6]. Several transcription factors (TFs) are pivotal in regulating goblet cell transition, such as SAM-pointed domain-containing ETs-like factor (SPDEF), forkhead box A3 (FOXA3), FOXA2, and NK2 homeobox 1 [[41]7]. However, it remains unclear whether control of goblet cell fate is identical to asthma, CF, and COPD. In addition, it is difficult to precisely capture and study the transitional phase of the club-to-goblet cell transition. Single-cell RNA sequencing (scRNA-seq) is a powerful and rapid tool for high-throughput sequencing analysis of transcriptomes at the single-cell level. It can identify genetic differences among thousands of cells in multiple samples. The advent of scRNA-seq has enabled us to better understand dynamic gene expression within and heterogeneity between related cell types in diseased human lungs [[42]8]. Herein, we integrated five public scRNA-seq datasets from human lungs to analyse the transcriptional similarities and differences of goblet cell metaplasia in asthma, CF, and COPD. Our results clarify the mechanism of airway mucus production and expression of signaling pathway genes, and suggest possible therapeutic targets for these diseases. 2. Materials and methods 2.1. Asthma, CF, and COPD datasets scRNA-seq data from bronchoscopies of six healthy controls and six patients with asthma from the proximal airways (third to sixth generations of the right lower and middle lobe) were kindly provided by FA Vieira Braga [[43]9]. scRNA-seq data of proximal airway epithelia ([44]GSE150674) from 20 healthy controls and 22 patients with CF were downloaded from the Gene Expression Omnibus (GEO; [45]https://www.ncbi.nlm.nih.gov/geo) [[46]10]. scRNA-seq data from lung distal parenchymal samples ([47]GSE136831, [48]GSE173896, and [49]GSE168191) from 39 healthy controls and 28 patients with COPD were downloaded from the GEO database [[50][11], [51][12], [52][13]]. The demographics and clinical data of the patients are shown in [53]Supplementary Table 1. 2.2. Data processing The expression matrices and patient metadata of the five datasets were downloaded and analysed using R software (version 4.2.1). Doublets of raw data ([54]GSE168191 and [55]GSE173896) were removed using the DoubletFinder package (version 2.0.3) before analysis. Doublets in other datasets were previously removed and were not considered in this study. The Seurat package (version 4.1.1) was used for subsequent analysis [[56]14]. Briefly, batch effects between the datasets and samples were removed using the harmony package (version 0.1.0) [[57]15]. Cells with <200 and >6000 genes or those composed of >20 % mitochondrial genes were considered low-quality cells and were excluded from the asthma and COPD datasets. For the CF dataset, quality control had already been performed on the downloaded data, therefore cell filtering was not conducted. Then, epithelial cells (EPCAM^+ and CDH1^+) in the five datasets were then integrated and analysed using the standard workflow of the Seurat package (normalisation, dimension reduction, and clustering) with default parameters (R codes are provided in the Supplementary Materials). Canonical marker genes were used to annotate the cell types in the dataset [[58][16], [59][17], [60][18]]. Club and goblet cells from the three diseases were extracted from the integrated data for downstream analysis. 2.3. Cell trajectory analysis Cell trajectory analysis measures the transcriptional differences in individual cells during cell transition and ranks each unit according to its progress on the trajectory. Herein, the monocle package (version 2.20.0) was used for cell trajectory analysis of club-to-goblet cell transition in the three diseases [[61]19]. The genes along the cell differentiation trajectory (CTG) were determined based on a q-value <0.01. Club cells were defined as the root of the cell differentiation trajectory in each dataset. 2.4. Pathway enrichment analysis The club and goblet cells were reclassified according to the cell order from the cell trajectory analysis. To study goblet cell metaplasia in the three diseases, club and goblet cells were analysed for differentially expressed genes (DEGs). The DEGs in airway club and goblet cells from the disease groups compared with cells from the control groups were identified using the run_de function of Libra (version 1.1.0) [[62]20]. |Fold change (FC)| >1.5 and an adjusted p-value <0.05 were used to screen DEGs. Gene Ontology biological process (GO BP), and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of the DEGs were performed using the Database for Annotation, Visualisation and Integrated Discovery (DAVID, 2021 Update) website with default parameters. The top five upregulated and downregulated DEGs of club and goblet cells for each disease were visualised using volcano diagrams. 2.5. Transcriptional regulatory network analysis To determine the crucial TFs among the CTGs of the disease and control groups, club and goblet cells were used to perform standard single-cell regulatory network inference and clustering (SCENIC version 1.3.1) analysis [[63]21]. Regulons (TFs and their target genes) were identified using the GENIE3 package (version 1.18.0). The AUCell package (version 1.18.0) was used to score the regulatory activity of TFs in each disease. Regulons with scores beyond the calculated thresholds were considered active regulons and were retained in the regulatory network. Eventually, active TFs and target genes that met the screening criteria (|FC| >1.2 and adjusted p-value <0.05) were visualised using Cytoscape software (version 3.9.1) [[64]22]. 2.6. Transitional state analysis of the CTG To further understand the CTGs in the three diseases, the genes obtained from cell trajectory analysis were divided into three states (start, middle, and end) using the monocle package. The CTGs in the middle state, which represent the differentiating state were pulled out for pathway enrichment analysis by DAVID. GO BP terms and KEGG pathways with Kappa scores of >0.4 were visualised, and a two-sided hypergeometric test was used to determine whether these terms should be retained, based on an adjusted p-value of 0.05. 2.7. Gene module score and gene set variation analysis (GSVA) The following 10 pathway gene sets related to club-to-goblet cell transition reported in previous studies were downloaded from the KEGG pathway database ([65]https://www.kegg.jp): GABAergic synapse (hsa04727), WNT signalling pathway (hsa04310), Hedgehog signalling pathway (hsa04340), JAK-STAT signalling pathway (hsa04630), MAPK signalling pathway (hsa04010), NF-kappa B signalling pathway (hsa04064), NOTCH signalling pathway (hsa04330), PI3K-AKT signalling pathway (hsa04151), TGF-beta signalling pathway (hsa04350), and VEGF signalling pathway (hsa04370). To determine the functional activities of these pathways in our scRNA-seq data, the average score of each airway epithelial cell type was calculated using the corresponding gene set via the AddModuleScore function of the Seurat package and visualised using heatmaps. The normalised expression matrix of club and goblet cells was used for GSVA (version 1.44.1) [[66]23], and the results were visualised using divergent bar plots. 2.8. Statistical analysis All statistical analyses were calculated using R software. For the DEG analysis, we used the edgeR algorithm [[67]24] based on the run_de function of the Libra package. To compare the cell proportion and regulon activities in different groups, we used the two-sided Student's t-tests via the t.test function of the stats package if the data were normally distributed, and the two-sided Wilcoxon rank-sum test via the wilcox.test function of the stats package if the data were not normally distributed. Statistical significance was set at p < 0.05 (*p < 0.05, **p < 0.01, ***p < 0.001). 3. Results 3.1. scRNA-seq analysis of airway goblet cell metaplasia Goblet cell metaplasia is commonly found in chronic airway inflammatory diseases such as asthma, CF, and COPD. To further understand the similarities and differences in the molecular mechanisms that regulate goblet cell metaplasia, we retrieved and analysed scRNA-seq data collected from the lungs of patients with these three diseases. After removing low-quality cells and batch effects, the standard workflow of the Seurat package was used for downstream analysis. The epithelial cells (EPCAM^+ and CDH1^+) of five datasets were first integrated and clustered, and airway epithelial cells from each disease were selected for further analysis. Airway epithelial cells from the three diseases totalled 61,479 cells (control: 32,615 cells, asthma: 12,833 cells, CF: 10,128 cells, COPD: 5,903 cells) and were divided into nine cell types according to canonical marker genes ([68]Fig. 1a and b), including seven normal clusters comprising basal cells (KRT5^+ and TP63^+), club cells (SCGB1A1^+ and SCGB3A2^+), goblet cells (MUC5AC^+, MUC5B^+, and SPDEF^+), submucosal gland cells (LYZ^+ and LTF^+), ciliated cells (FOXJ1^+ and CCDC78^+), neuroendocrine cells (CALCA^+, CHGA^+, and ASCL1^+), and ionocytes (CFTR^+ and FOXI1^+); one transitional cluster comprising trans-ciliated cells (SCGB1A1^+, MUC5AC^+, and FOXJ1^+); and one proliferating cluster comprising proliferating cells (MKI67^+ and TOP2A^+). Furthermore, the proportion of each type of lung epithelial cell was analysed in patients with asthma, CF, and COPD ([69]Supplementary Fig. 1). Fig. 1. [70]Fig. 1 [71]Open in a new tab Single-cell RNA-sequencing analysis of airway epithelial cells in asthma, cystic fibrosis (CF), and chronic obstructive pulmonary disease (COPD). a Uniform manifold approximation and projection (UMAP) graphical representation of 61,479 epithelial cells from control and diseases. b Heatmap of canonical marker gene expression levels in nine airway epithelial cell types. 3.2. Disease-specificity of the club-to-goblet cell transition trajectory To avoid the clinical-group batch effect and determine the real differences in club-to-goblet cell transition between the disease and control groups, cell trajectory analysis was performed separately in different clinical groups using the monocle package. We annotated club and goblet cells in our data according to the cell order calculated by monocle ([72]Fig. 2a and b). We then used the pseudo bulks method [[73]20] for DEG analysis of club and goblet cells in each disease individually. The DEGs of the three diseases were combined, and the results showed 203 DEGs for asthma, 772 for CF, and 1654 for COPD ([74]Supplementary Table 2). DPYSL3 and DENND2C were the common DEGs in all three datasets ([75]Fig. 2c). The top five upregulated and downregulated DEGs of each disease were visualised using volcano plots ([76]Fig. 2d). In asthma, CST1, CST4, and CCL26 levels were increased in club cells, and CST1, FETUB, and CLCA1 levels were upregulated in goblet cells. In CF, club cells showed high expression levels of mitochondrial genes. In COPD, CXCL9 and CXCL11 levels were increased in club cells, while the levels of ATP5E, ATP5I, ATP5L and other genes encoding adenosine triphosphate (ATP) synthase were decreased in club and goblet cells. Fig. 2. [77]Fig. 2 [78]Open in a new tab Transcriptomic changes in club and goblet cells in asthma, cystic fibrosis (CF), and chronic obstructive pulmonary disease (COPD). a The cell trajectory from club to goblet cells in the control and disease group. Each point represents a cell. b The relative expression of SCGB1A1, MUC5AC, MUC5B and SPDEF during club-to-goblet cell transition. c Venn diagram showing the intersections of differentially expressed genes (DEGs) in asthma, CF, and COPD. d Volcano diagrams showing the top five upregulated and downregulated DEGs of club and goblet cells in each disease. e The number of Gene Ontology biological process (GO BP) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enriched for DEGs from the three diseases. f The common GO BP and KEGG pathways among the three diseases. We then conducted enrichment analysis using the DEGs to identify possible biological processes and signalling pathways associated with each disease. The number of enriched terms was higher in asthma and CF than in COPD ([79]Fig. 2e), indicating that the alteration of club and goblet cells in COPD was more complicated and might be associated with multiple physiological activities. Thereafter, we performed intersection analysis using the enrichment results of the three diseases and extracted the top 10 pathways of DEG enrichment for each disease ([80]Supplementary Tables 3 and 4). We found that the mutually enriched terms in the three diseases were mainly involved in neutrophil chemotaxis, response to lipopolysaccharide, defence response to bacterium, and response to hydrogen peroxide ([81]Fig. 2f). 3.3. Distinct TFs regulate the goblet cell transition in different lung diseases To study the TFs that control the goblet cell transition in the three lung diseases, we performed the single-cell regulatory network inference and clustering (SCENIC) workflow with disease-specific CTGs and set the selection criteria as: |FC| >1.2 and and adjusted p-value <0.05. We identified two active TFs in asthma, three in CF, and five in COPD ([82]Fig. 3a). Along the asthmatic cell differentiation trajectory, IRF6 and TAGLN2 were highly expressed in the end state. In CF, JUND was initially highly expressed, whereas SPDEF and CREB3L4 were mainly concentrated in the end state. Additionally, in COPD, the KLF5, FOXQ1, and CEBPB played roles in the initial transition state, while STAT1 and KLF2 were expressed at the initiation and intermediate stages ([83]Fig. 3b). The activity scores of the TF regulatory units suggested that most regulons may be more active in the goblet cells of the disease groups compared with the control group ([84]Fig. 3c); therefore, these regulators may be important targets for biological therapy. To map the TFs of club and goblet cells in the steady state, we extracted CTGs related to the transition of club cells to goblet cells in normal samples of the three datasets. TFs that may be responsible for goblet cell transition at the steady-state included CREB3L1, FOSB, GTF2B, MAFB, and CHD2 ([85]Supplementary Fig. 2A, [86]Supplementary Table 5). Fig. 3. [87]Fig. 3 [88]Open in a new tab Distinct transcription factors (TFs) govern club-to-goblet cell transition in asthma, cystic fibrosis (CF), and chronic obstructive pulmonary disease (COPD). a The network of TFs (left) and their target genes (right) for three diseases. The colours of the nodes represent the fold change in upregulation (red) and downregulation (blue) of genes in the disease group compared with the control group. The size of the TF nodes indicates for the number of their target genes. b Heatmaps of TF expression along the cell differentiation trajectory in each disease. c Violin plots showing the activities of the TF regulatory units in club and goblet cells of the three diseases. ***p < 0.001. (For interpretation of the references to colour in this figure legend,