Abstract Cardiac differentiation of human pluripotent stem cells is not only a new strategy of regenerative therapy for cardiovascular disease treatment but also provides unique opportunities for the study of in vitro disease models and human heart development. To elucidate the dynamic gene regulatory networks and pivotal regulators involved in the cardiomyocyte differentiation process, we conducted an analysis of single-cell RNA sequencing data obtained from the reprogramming of two human induced pluripotent stem cell (iPSC) lines into cardiomyocytes. The data were collected from 32,365 cells at 4 stages of this process. We successfully identified cardiomyocyte clusters and several other cell clusters with different molecular characteristics derived from iPSC and described the differentiation trajectory of cardiomyocytes during differentiation in vitro. Through differential gene analysis and SCENIC analysis, we identified several candidate genes including CREG and NR2F2 that play an important regulatory role in cardiomyocyte lineage commitment. This study provides the key differentiation trajectory of heart differentiation in vitro at single-cell resolution and reveals the molecular basis of heart development and differentiation of iPSC-derived cardiomyocytes. Keywords: Induced pluripotent stem cells (iPSC), Single-cell RNA sequencing, Cardiomyocyte differentiation, Transcription regulation, Trajectory inference Subject terms: Cell biology, Stem cells Introduction Cardiovascular disease is the leading cause of death worldwide^[30]1. Regenerative therapy with induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs) is a new potential treatment strategy for cardiovascular disease^[31]2. The technique reported by Yamanaka and others for generating pluripotent stem cells from adult cells avoids the need for human embryos, sidestepping political controversy rooted in moral and religious concerns^[32]3–[33]5. iPSCs have the ability to self-renew and differentiate into various human cell types, which can be used not only in regenerative medicine and drug discovery but also in analyzing the genetic and molecular mechanisms of cardiac differentiation and disease, providing potential for clinical cell therapy. With the rapid development of iPSC technology in the 21st century, it has been widely used in tissue-engineered cardiac patches and cardiac regenerative medicine. Intramyocardial injection of iPSC-CMs can improve myocardial function, and attenuate ventricular remodeling and myocardial fibrosis in infarcted rats^[34]6. Despite the exciting progress made, the problems of uneven differentiation efficiency, low purity, and immaturity of these cardiomyocytes remain unresolved^[35]7. However, the transplantation of immature cardiomyocytes may cause fatal arrhythmias^[36]8. Recent advances in RNA or chromatin immunoprecipitation sequencing have revealed some of the complexities of gene regulation during the differentiation of iPSC-derived cardiomyocytes. However, the complete genetic and differentiation regulatory circuits have not yet been discovered. Bulk RNA-seq ignores the stochasticity of gene expression in different cell types, with results influenced by the relative cell type abundance and status of each cell type within a sample^[37]9. Whereas, with the significant advances in single-cell RNA sequencing (scRNA-seq) technology, scRNA-seq has transformed our understanding of cardiac morphogenesis, particularly human cardiogenesis, by enabling the study of global transcriptomic profiles at the single-cell level^[38]10. However, the overall composition of cell types and the gene regulatory profile of individual cell change dynamically during cell differentiation. To identify these dynamic effects with high resolution, we collected cells at four differentiation time points from induced pluripotent stem cells to cardiomyocytes for scRNA-seq. In this paper, we identified the true iPSC-derived cardiomyocyte clusters and recognized the cellular heterogeneity during differentiation. Through the construction of regulatory networks, lineage trajectories, and intercellular crosstalk, we can more accurately understand the characteristics of the identity of differentiation process and derivation of cardiac fates, which provides new information for the application of iPSC-CMs in clinical medicine and drug development. Result Single-cell RNA sequencing analysis of cardiac directed differentiation To further understand the genetic regulation of cardiovascular development, we reanalyzed the scRNA-seq data at four time points during the differentiation of iPSCs into cardiomyocytes^[39]11. The data were derived from single-cell transcriptional profiling of human iPSCs induced using a chemically defined cardiac differentiation kit (Cellapy, CA2004500, China). Two human iPSC lines (CA4024106, cell line 1 M-iPS; CA4027106, cell line 2 F-iPS) derived from healthy males and females using the Cytotune-iPS 2.0 KOSM transgenic system were used as parental cell lines for the study^[40]11. Cells were collected at days 0, 2, 4, and 10 of differentiation for large-scale scRNA-seq (10x genomics) (Fig. [41]1a). The Seurat R package was used for log-normalization and data analysis. Cells with a significantly abnormal number of genes (> 8000) were considered potential multiplets and were excluded from subsequent analysis. Similarly, cells with a percentage of mitochondrial genes ≥ 10% also were excluded. We collected a total of 32,365 cells from the four sample groups, including 11,654 cells on day 0, 6903 cells on day 2, 8724 cells on day 4, and 6272 cells on day 10, resulting in a total of 2,066,741,896 high-quality clean reads. Fig. 1. [42]Fig. 1 [43]Open in a new tab Single-cell RNA sequencing analysis of cardiac directed differentiation. (a) Schematic of the experimental design. (b) Gene expression across all cells at individual time points. (c) UMAP (upper panel) and t-SNE (lower panel) plot of single-cell clustering from all four time points. (d) Number of genes and UMIs per cluster. (e) Number of cells per cluster. (f) The proportion of cells per cluster at four time points. (g) UMAP plot at each time point. (h) t-SNE plot at each time point. To confirm that the differentiation followed the known cardiac differentiation trajectory, we analyzed the expression of known genes. On day 0, the human iPSC lines were positive for pluripotency marker genes including T, NANOG, and POU5F. On day 2, previously reported mesoderm markers such as T, MIXL1, and EOMES were sequentially activated, while the expression of pluripotency markers SOX2 and NANOG was gradually downregulated (Fig. [44]1b). The expression levels and cell numbers of cardiac and cardiomyocyte markers significantly increased at day 10 of cardiac differentiation. These data show that cardiac-directed induction of differentiation produces distinct cell populations and displays expected time-specific transcriptional profiles. Non-linear dimensional reduction of the scRNA-Seq data by t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) revealed 9 distinctly separated cell populations (Cluster 0–8) (Fig. [45]1c). Individual clusters have variable numbers of UMIs and genes (Fig. [46]1d). The proportion of cells per cluster at different stages during differentiation reflects the differentiation process of cells within each cluster (Fig. [47]1e–f). The cell numbers in clusters 1, 2, and 4 continued to decrease from day 0 to day 10, while the number of cells in cluster 6 increased continuously (Fig. [48]1g–h). Notably, cell numbers in clusters 5 and 0 peaked on day 2 and day 4, respectively, before declining. Identify the iPSC-CMs cluster To identify the iPSC-CMs and other cell clusters that may exist during differentiation, we visualized the expression of known marker genes (Fig. [49]2a). Pluripotency biomarkers (e.g., T, NANOG, POU5F1) were expressed in cell clusters 1,2,3 and 4, while typical primitive streak (PS) markers, such as MIXL1, T-brachyury, EOMES, and MIXL1 was found in cluster 5. DNMT3B, a rejuvenation signature^[50]12, also exhibited higher expression in clusters 1, 2, and 4. The transcription factor genes associated with cardiac lineage, which play a decisive role in cardiac development and formation, including Nkx2-5, TBX5, GATA4, Isl1, HAND1/2, and MEF2C, were activated in clusters 0 and 6 (Fig. [51]2a). The early expression of the zinc finger transcription factor GATA4in precardiogenic splanchnic mesoderm, which can activate the expression of certain cardiac-specific genes, is crucial for normal cardiac morphogenesis and is highly expressed in cluster 0/6^[52]13,[53]14. Consistently, the LIM-homeodomain transcription factor Islet-1 (Isl1), a marker of cardiac precursors during differentiation, serves as a pioneer factor driving cardiomyocyte lineage commitment^[54]15. ISL1 is vital for orchestrating proper cardiogenesis and establishing the epigenetic memory of cardiomyocyte fate commitment^[55]15, with its expression limited to clusters 0 and 6. Structural genes encoding sarcomeric-related proteins expressed in differentiated cardiomyocytes, including MYL7, MYH6, TNNI1, TTN, and TNNT2, were found in clusters 0 and 6, with the highest expression observed in cluster 6 (Fig. [56]2a). Gene Set Enrichment Analysis (GSEA) revealed that Cluster 6 significantly enriched metabolic pathways, including “Dilated cardiomyopathy”, “Hypertrophic cardiomyopathy (HCM)”, “Cardiac muscle contraction” and “Adrenergic signaling in cardiomyocytes” (Fig. [57]2b). The expression of genes that have been reported to be involved in cardiomyocyte development, including PPARA, FABP3, BMP5, BMP7, ERBB2, and ERBB4^[58]16, exhibited significantly increased expression in cluster 6 (Fig. [59]2c). The PPAR pathway, which is activated during cardiomyocyte differentiation^[60]17, leads to increased expression of FABP3, a regulator of cardiac metabolism^[61]18. The trabecular genes ERBB2 and ERBB4 are also highly expressed in cluster 6 (Fig. [62]2c), potentially promoting cell differentiation or proliferation^[63]19. Consistent with the study by Cui et al., BMP signaling is involved in regulating cardiac development^[64]16, and BMP ligands BMP5 and BMP7 are expressed at high levels in cardiomyocytes (Fig. [65]2c). Based on this information, cluster 6 was identified as a true iPSC-CM cluster. Fig. 2. [66]Fig. 2 [67]Open in a new tab Identify the iPSC-CMs and other cell clusters. (a) Expression of specific markers in all clusters. The shade of red color in the UMAP plot reflects the relative expression level of corresponding genes. (b) Pathway enrichment of highly expressed genes in cluster 6. The abscissa represents the gene ratio, the size of dots represents the the number of genes, and the color of dots represents the p values. (c) The gene expression levels of each cluster are visualized by violin plots. (d) UMAP plots identified nine cell types separated into distinct clusters. (e) Heat maps showed highly expressed genes of nine clusters. On the other hand, cluster 8 expressed relatively high levels of smooth muscle cell markers TAGLN, TNN1, and ACTA2 (α-SMA) (Fig. [68]2a). Cluster 7 cells exhibited high expression of PAX3 (Fig. [69]2a), an ectodermal marker and an important transcription factor in the PAX family, which is closely related to the brain and neural crest during embryonic development^[70]20,[71]21. However, cluster 7 did not express other ectoderm genes, but highly expressed pluripotency genes, suggesting that these cells may represent an undifferentiated population. Taken together, these data show that iPSCs differentiate into PS mesoderm (cluster 5), cardiac progenitors (cluster 0), cardiomyocytes (cluster 6), and smooth muscle cells (cluster 8) (Fig. [72]2d). The highly differentially expressed genes across the 9 clusters are shown in the heatmap (Fig. [73]2e). Reconstruction of the developmental trajectory of cardiac differentiation from human iPSCs To track the differentiating cells from day 0 to day 12, we performed a pseudotime analysis, revealing that the pluripotent stem cells gradually developed into committed and definitive cardiomyocytes, with obvious branching and turning observed on day 2 (Fig. [74]3a). From cluster 0 onward, the cells are committed to the cardiac lineage (Fig. [75]3a). In order to understand the potential biological functions of the clusters, enrichment analysis revealed specific pathways related to cardiac differentiation (Fig. [76]3b). In cardiac progenitors (Cluster 0), cardiomyocytes (Cluster 6), and SMCs (Cluster 8), the TNF-β signaling pathway arose as a major biological event, indicating its close association with cardiomyocytes development. The role of TGF-β signal transduction in the cardiac outflow tract and cardiac valve has been reported^[77]22,[78]23. In addition, genes involved in extracellular matrix (ECM) receptor interactions were enriched in these three clusters, which may be due to the necessity of cardiac ECM synthesis for supporting tissue growth in early cardiac morphogenesis^[79]24. From cluster 0 onwards, the cells are clearly committed to the cardiac lineage. Unlike clusters 0 and 8, cluster 6 showed significantly high expression of oxidative phosphorylation genes, which is consistent with the high dependence of cardiomyocytes on mitochondrial oxidative metabolism. In contrast, iPSs exhibited marked enrichment for DNA replication and base excision repair (Fig. [80]3b). RNA velocity, which distinguishes between unspliced and spliced mRNAs in single-cell RNA sequencing, infers the direction of gene expression changes, allowing predictions of future individual cell states^[81]25. As shown in Fig. [82]3c, the direction and magnitude of the velocities are projected onto the UMAP plot as arrows, with cardiac progenitor cells (cluster 0) flowing toward cardiomyocytes (cluster 0) as expected, while clusters 2 and 4 flow toward cluster 5. The pseudotime analysis on days 0 and 2 revealed that cluster 2 had a greater potential for transitioning into cluster 5 (Fig. [83]3d–e). Based on these results, we constructed the developmental trajectory of iPSC-CMs differentiation in this study (Fig. [84]3f). The iPSCs in this study, consistent with most studies^[85]26–[86]28, developed into cardiomyocytes following the same steps e same steps observed in human embryonic stem cell cardiac development (Fig. [87]3f). Fig. 3. [88]Fig. 3 [89]Open in a new tab Reconstruction of the developmental trajectory of cardiac differentiation from human iPSCs. (a) Reconstruction of cell differentiation trajectory from day 0 to day 10, with the cardiomyocytes highlighted in the pseudotime trajectory. (b) The heat map showed pathway enrichment of differentially expressed genes in all clusters. (c) The UMAP plot of RNA velocities shows the dynamics and trajectory of mRNA transcription states. The direction of cell state transition and RNA velocities are projected onto UMAP embedding as streamlines. (d–e) Reconstruction of cell differentiation trajectory from day 0 to day 2. (f) Schematic representation of cardiomyocyte differentiation from human iPSCs in this study. (g) Heatmap showing the number of cell–cell interactions between cell clusters, predicted by the CellphoneDB. (h) Dot plot showing the selected ligand-receptor pairs between clusters. Dot size and color represents the scaled mean of the expression level of the two interacting molecules in their respective clusters. The red outline indicates significance. Intercellular communication mediated by ligand-receptor complexes is essential for coordinating a variety of biological processes^[90]29. To explore the crosstalk between different cell types during cardiomyocyte development, CellPhoneDB was used to study potential receptor-ligand interactions (Fig. [91]3g–h). Our analysis reveals that interactions among cardiomyocytes were the most prevalent during this process. Additionally, the interactions of receptor-ligand between cardiomyocytes and other clusters were notably abundant, particularly with cardiac progenitor cells, suggesting a significant connection between these two clusters (Fig. [92]3g–h). Important transcription factor regulating cardiac lineage commitment To further characterize the molecular changes surrounding cardiac lineage specification, we focused on clusters 0 and 6. Using the Louvain clustering algorithm, we identified three subclusters, with cardiac progenitors (cluster 0) divided into subclusters 0–1 and 0–2, according to their expression profiles (Fig. [93]4a). Both subclusters expressed cardiac progenitor markers, but subclusters 0–2 demonstrated a greater potential to differentiate into cardiomyocytes (Fig. [94]4b). Further analysis indicated that upregulated genes in subcluster 0–2 were significantly enriched in ECM, MET activates PTK2 (FAK) signaling, and TGF-β signaling pathway, indicating that activation of these pathways may be involved in cardiomyocyte differentiation (Fig. [95]4c–e). The composition and distribution of ECM in the heart are dynamic, and ECM is indispensable for promoting cardiac regeneration^[96]24. Furthermore, the role of the TGF-β superfamily in cardiac development and the populating of the embryonic heart with cardiomyocytes has been well documented^[97]30. Conversely, cell cycle-related pathways were significantly inhibited, indicative of emerging cardiomyocytes (Fig. [98]4c–e). However, MET/FAK has been found to regulate the cell cycle, and its inhibition is associated with decreased cancer cell proliferation^[99]31. Fig. 4. [100]Fig. 4 [101]Open in a new tab Important transcription factor that regulates cardiac lineage commitment. (a) Cluster 0 and 6 are further analyzed by subclustering. (b) Reconstruction of cell differentiation trajectory of subclusters. (c, d) Pathway enrichment analysis with REACTOME and KEGG is shown for differentially expressed genes between subcluster 0–1 and subcluster 0–2. The abscissa represents the Normalized Enrichment Score (NES), the size of dots represents the Gene Ratio of differentially expressed genes in this pathway, and the color of dots represents the adjusted p values. (e) GSEA revealed significant pathway activations or inhibitions within subclusters 0–2. Two-tailed Student’s t test p values are indicated. (f) Heat maps show highly expressed genes in subclusters 0–1 and 0–2. (g) A selection of the regulons identified using SCENIC analysis (rows), and their activities in each cell (columns). Analysis of differentially expressed genes (DEGs) between the subclusters 0–1 and 0–2 revealed that CREG1, CCBE1, and NODAL were significantly upregulated in subcluster 0–2 (Fig. [102]4f). Notably, the cellular repressor of E1A-stimulated genes (CREG) has been shown to be a vascular remodeling mediator and its overexpression inhibits the proliferation of vascular SMCs while promoting their differentiation^[103]32,[104]33. In addition, the interaction between CREG1 and the cytolytic component Sect. 8 promotes the differentiation of ES cells into cardiomyocytes and is essential for assembling intercellular connections^[105]34. This suggests that CREG1 may be an important transcription factor that regulates cardiac lineage commitment. The collagen- and calcium-binding EGF-like domains 1 (CCBE1) is a secreted protein that enhances VEGF-C signaling and is critical for lymphangiogenesis during development. It has recently been found to be required for the differentiation of embryonic stem cell-derived early cardiac precursor lineages^[106]35. NODAL was originally considered a mesoderm inducer in vertebrates^[107]36, but recent findings indicate its close relationship with proper cardiac specification and differentiation. Disruption of Nodal/TGFβ pathway signal can lead to abnormal cardiac and endoderm development^[108]37. Based on the above information, we speculate that CREG1, CCBE1, and NODAL are important genes that control cardiac lineage commitment from early mesodermal precursor cells. To further identify regulators of cardiomyocyte differentiation, we performed single-cell regulatory network interference and clustering (SCENIC) analysis^[109]38, which identifies differential expression of regulons (transcription factors and their putative direct targets). Analysis of the regulons enriched in subclusters 0–1, 0–2, and 6 showed that although some regulons (including ARNTL2, NFIA, ZNF649) are shared among these three clusters, there are differentially expressed regulons in each cluster (Fig. [110]4g). Especially the transcription factors of the HOX family, including HOXB3, HOXA2, HOXD4, HOXD3, and HOXC5, are activated in cluster 6, as HOX proteins are required during heart development^[111]39. PPARA, which belongs to the PPAR family, is also activated in cluster 6. This family consists of three members, PPARα, PPARβ (also called PPARb/d or PPARδ), and PPARγ^[112]40. PPARα is found in different cardiomyocytes and is known to be essential for cardiac energy metabolism^[113]16,[114]41. Another transcription factor expressed in 6, ZIC2, was identified in a genome-wide CRISPR-knockout screen for cardiac differentiation from hESCs and has been proven to be a key gene in determining cardiac progenitor cell fate during human cardiogenesis^[115]42. Xiong et al. suggested that Lim homeobox transcription factor 1α (LMX1A)may be involved in the process of limonin-mediated cardiac repair following myocardial infarction^[116]43. The transcription factor Forkhead box G1 (FOXG1), a nuclear-cytosolic transcription factor expressed in the developing nervous system, is essential for forebrain development^[117]11,[118]44. Nuclear receptor subfamily 2, group F, member 2 (NR2F2) encodes the key nuclear receptor COUP-TF2, whose altered expression is associated with cardiovascular diseases, especially atherosclerosis and congenital heart defects (CDH)^[119]45. However, the exact roles of subgroup 6 specific TFs in human cardiac lineage commitment warrant further investigation. FOXA2, FOXG1, and FOXI3 are activated in subclusters 0–2 or 6. The forkhead box (FOX) proteins constitute a prominent family of transcription factors involved in a wide range of cellular processes. FOXA2 plays a key role in embryonic patterning and endoderm formation, and also regulates normal heart development^[120]46. FOXG1, a transcription factor that controls the proliferation of neural progenitor cells during brain development, was recently found to regulate epicardial proliferation^[121]47. FOXI3 is a regulator of ectodermal development^[122]48. Based on the above information, we hypothesize that transcription factors FOXA2 and FOXG1 are potentially significant in the cardiomyocyte differentiation process, warranting further exploration. Discussion The iPSC to cardiomyocyte differentiation system is a powerful tool for modeling human cardiac development, and scRNA-Seq technology provides a way to represent the dynamics of gene expression during differentiation at high resolution. In this study, we not only dissect cellular heterogeneity from a single sampling point but also analyze the dynamics of gene expression during differentiation across multiple time points, reconstructing developmental trajectories, and revealing the potential regulatory mechanisms involved. There are many established methods to reprogram somatic cells into pluripotent stem cells, such as retroviral transduction OCT4, SOX2, KLF4 and c-myc, adenovirus vector, recombinant proteins, and transposon system^[123]4,[124]49–[125]52. Although the iPSCs have the potential to differentiate into all three germ layers, they face limitations regarding low induction efficiency and lengthy process. Recent advancements in adding small molecules and growth factors to pluripotent stem cell culture media have improved the generation of mature cardiomyocytes, achieving higher yields (85–95%), and greater cost-effectiveness^[126]53. Generation of mature human cardiomyocytes from iPSCs is a challenging endeavor and in vitro differentiation does not fully replicate the morphology, expression profile, complex cell types, and functionality of adult in-vivo-derived cardiomyocytes^[127]7. Nkx2-5 is indispensable for normal cardiac development, but its expression level is relatively low in cardiac progenitors, and it significantly increases in cardiomyocytes. This phenomenon may be attributed to the immaturity of the in vitro differentiation system. Similar results have been observed in the research on cardiac differentiation from human pluripotent stem cells (hPSC)^[128]28,[129]54 or iPSC^[130]55. In this study, we focused on rebuilding the developmental trajectory and mining genes that play a key role in cardiomyocyte transcription and functional maturation, including CREG1, NODAL, ZIC2, LMX1A, FOXG1, and NR2F2. CREG, known as a cell suppressor of the E1A stimulator gene, including CREG1 and CREG2, can antagonize transcriptional activation and cell transformation of E1A^[131]56,[132]57. In human carcinoma, CREG protein can be secreted to extracellular and participate in the signaling cascade to enhance the differentiation of NTERA-2 human embryonic cancer cells^[133]58. CREG promotes the differentiation of rat SMCs, participates in the maintenance of quiescent mature SMC phenotype, and plays a role in maintaining vascular homeostasis and preventing vascular remodeling^[134]32,[135]33. On the other hand, CREG1 is highly expressed in the heart and has protective functions in damaged myocardium^[136]59–[137]61. Its deficiency exacerbates cardiac dysfunction, cardiac hypertrophy, and fibrosis in diabetic cardiomyopathy mice, which can be alleviated by overexpression^[138]62. The role of CREG1 in cardiomyocyte differentiation remains unclear and it may serve as a novel differentiation biomarker. A recent study showed that pathways like BMP and Wnt induce cardiogenic mesoderm patterning and specification through the coordinated activation of Cdx/Hoxgenes^[139]63. ZIC2 has been identified as an essential gene for the formation of cardiac progenitor cells^[140]42. Its mutation prevents cardiac lineage commitment by affecting apelin receptor-related signaling pathways during mesoderm formation^[141]42. NR2F2 protein is highly expressed in the atria throughout embryogenesis^[142]64. In mice, the absence of COUP-TF2 can lead to embryonic death due to angiogenesis and cardiogenic defects^[143]65, while in human embryonic hearts, mutations in the NR2F2 gene can lead to a variety of cardiovascular abnormalities, including atrioventricular septal defects, congenital bicuspid aortic valve (BAV), hypoplastic left heart and so on^[144]66–[145]68. Conclusion In summary, this study employed single-cell data to provide a high-resolution view of the differentiation landscape inferring pseudo-time along multiple differentiation trajectories and identifying several candidate genes that play key regulatory roles in early cardiac lineage commitment, which underlie the transcriptional and functional maturation of iPSC-derived cardiomyocytes. These findings deepen our understanding of the molecular basis of cardiac development and provide new insights into the development of novel cardiac differentiation strategies for disease modeling and regenerative medicine. Materials and methods scRNA-Seq data bioinformatics pre-processing The data is from the reanalysis of scRNA-seq data recently published by our laboratory, which can be accessed in the GEO database with the accession number [146]GSE130731^[147]11. Fastq files of 10X Genomics scRNA-Seq data were aligned to the GRCh38 human genome, and the reads were filtered and quantified reads using the Cell Ranger pipeline (v.7.0.0) with default parameters. For cell quality control, dimensionality reduction, clustering, cell annotation and differential gene analysis, downstream analyses were performed using the filtered gene expression matrices by R software (v.4.0.4) with the Seurat package (v.4.0.5)^[148]69. Briefly, for quality control, cells with (1) gene count < 500; (2) gene count > 8000; (3) UMI count < 1000; (4) mitochondrial gene proportion > 10% were filtered as low-quality cells or doublets. After performing logarithmic normalization, gene expression normalization, and principal component analysis on the data, we used the functions (FindIntegrationAnchors and IntegrateData) to merge and remove the batch effects. Clustering of scRNA-Seq data For clustering, we used the Find Neighbours and Find Clusters functions on 30 PCA components with a resolution of 0.4. UMAP dimensionality reduction was performed using the first 30 PCA components to obtain a 2-dimensional representation of the cell states^[149]70. The Find All Markers function was used to characterize clusters and broadly classify the clusters into the principal cell types based on the expression of canonical marker genes. Gene ontology, KEGG enrichment, and GSEA analyses Differential gene expression testing was performed using the FindMarkers function in Seurat with parameter ‘test.use = wilcox’ by default and the Benjamini-Hochberg method was used to estimate the false discovery rate (FDR). Differentially expressed genes (DEGs) were filtered using a minimum log2FC (fold change) of 0.25 and a maximum FDR value of 0.05. Gene Ontology (GO) and KEGG enrichment analysis were conducted using clusterProfiler package^[150]71. For GSEA (Gene Set Enrichment Analysis) analysis^[151]72, the differential genes were sorted in descending order of avg_log2FC. Based on the rearranged genes list, GSEA analysis was performed according to the MSigDB (Molecular Signatures Database). Pseudotime analysis Monocle2 (version 2.18.0) was used for pseudotime trajectory analyses to determine the differentiation trajectory of cell development^[152]73. After extracting the UMI matrix was read from the Seurat object, the newCellDataSet function was utilized to create a new object. For the trajectory analysis, 2000 highly variable genes were selected, followed by dimension reduction using the DDRTree method and cell ordering with the orderCells function. Differential genes along the pseudotime trajectory were identified using the differentialGeneTest function. SCENIC transcription factor analysis SCENIC analysis was applied to explore key transcription factors regulating differences in cell types. The analysis was performed using pySCENIC with default parameters, following the standard procedure^[153]38. We then constructed a co-expression network using grnboost2, inferred regulons for each transcription factor (TF) with the RcisTarget motif databases (hg38-refseq-r80-10 kb-up-and-down-tss. mc9nr.feather and hg38-refseq-r80-500 bp-up-and-100 bp-down-tss.mc9nr.feather)^[154]74, and calculated activity score (AUC) for each TF regulon in individual cell by AUCell. Significant TF regulons of cell types were identified using FindAllMarker function in Seurat and mapped to the heatmap to visualize. Cell-cell communication analysis Cell-cell communication was inferred using CellPhoneDB to analyze ligand-receptor interactions^[155]29. A heatmap visualizing potential interactions between different cell types was generated utilizing the default visualization function within CellPhoneDB. Furthermore, the plot_cpdb function from the R package ktplots (version 2.4.0) was employed to create a dot plot, illustrating the average expression levels of ligand-receptor pairs involved in cell-cell communication. Author contributions L.L and DD.L: data analysis, curation and manuscript writing; JH.W and Y.D: conceptualization and project administration. All authors have reviewed and approved the final version of the manuscript. Funding This work was supported by grants from the GuangDong Basic and Applied Basic Research Foundation (2022A1515110743 to JH.W). Data availability The scRNA data were generated from our previous study and are available in the NCBI GEO database with the accession number [156]GSE130731. The analysis code can be requested from the corresponding author upon reasonable request. Declarations Competing interests The authors declare no competing interests. Footnotes Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. These authors contributed equally: Lu Li and Dandan Li. Contributor Information Jiahong Wang, Email: wjh1987@smu.edu.cn. Yong Dai, Email: daiyong22@aliyun.com. References