Abstract Circulating tumor cells (CTCs) that undergo epithelial-to-mesenchymal transition (EMT) can provide valuable information regarding metastasis and potential therapies. However, current studies on the EMT overlook alternative splicing. Here, we used single-cell full-length transcriptome data and mRNA sequencing of CTCs to identify stage-specific alternative splicing of partial EMT and mesenchymal states during pancreatic cancer metastasis. We classified definitive tumor and normal epithelial cells via genetic aberrations and demonstrated dynamic changes in the epithelial-mesenchymal continuum in both epithelial cancer cells and CTCs. We provide the landscape of alternative splicing in CTCs at different stages of EMT, uncovering cell-type-specific splicing patterns and splicing events in cell surface proteins suitable for therapies. We show that MBNL1 governs cell fate through alternative splicing independently of changes in gene expression and affects the splicing pattern during EMT. We found a high frequency of events that contained multiple premature termination codons and were enriched with C and G nucleotides in close proximity, which influence the likelihood of stop codon readthrough and expand the range of potential therapeutic targets. Our study provides insights into the EMT transcriptome’s dynamic changes and identifies potential diagnostic and therapeutic targets in pancreatic cancer. Keywords: MT: Bioinformatics, alternative splicing, circulating tumor cell, epithelial-to-mesenchymal transition, pancreatic ductal adenocarcinoma Graphical abstract graphic file with name fx1.jpg [47]Open in a new tab __________________________________________________________________ Jiang and colleagues analyzed the epithelial, pEMT, and mesenchymal states of human PDAC circulating tumor cells; systematically delineated the EMT continuum from the epithelium to the tumor circulation; and revealed the landscape of alternative splicing characteristics of circulating tumor cells. Introduction Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive and lethal malignancy because of the lack of early diagnosis and limited response to treatments.[48]^1 Metastatic recurrence is a significant problem in patients with PDAC, with a 5-year survival rate of only 9% for fatal disease and metastatic disease emerging in nearly 98% of patients.[49]^2 Metastatic spread is a highly inefficient and complex multistep process.[50]^3 The formation of tumor metastases first requires circulating tumor cells (CTCs) to colonize metastatic organs or tissues through the circulatory system.[51]^4 CTCs are vital components of liquid biopsies for the diagnosis of residual cancer, monitoring of therapy response, and prognosis of recurrence.[52]^5 Epithelial-to-mesenchymal transition (EMT) promotes the generation of CTCs in epithelial cancers by increasing tumor cell invasiveness and motility to promote tumor cell intravasation and facilitate tumor cell survival in the peripheral system.[53]^6 EMT is involved in cancer invasion and metastasis, which is the process by which polarized epithelial cells lose their characteristic intercellular connections and polarity and acquire characteristics of mesenchymal cells.[54]^7 Experimental and clinical studies have highlighted that CTC with phenotypic hallmarks of EMT plays a critical role in the metastatic and recurrence of solid malignancy.[55]^7 EMT is now well documented not to be a linear process, but rather a dynamic one with the existence of partial EMT (pEMT) states between the epithelial and mesenchymal poles.[56]^8 Cells transitioning in these pEMT states exhibit stem-cell-like properties and display high plasticity, which is critical for metastasis and drug resistance.[57]^9 RNA alternative splicing (AS) drives multiple malignant phenotypes, including aggressiveness, angiogenesis, and abnormal metabolism, it is not surprising that it simultaneously affects different aspects of EMT progression.[58]^10^,[59]^11 In line with this notion, AS events have been reported to play a functional role in EMT. For instance, Brown et al.[60]^12 showed that CD44 isoform switching is essential for EMT to occur and for cancer cells to metastasize. Reciprocally, the RNA-binding protein (RBP) AKAP8 suppresses tumor metastasis by antagonizing EMT-associated AS.[61]^13 Together, these findings imply a role for AS in EMT, but, until now, this phenomenon has not been assessed in PDAC. AS is an essential mechanism that acts on the pre-mRNA transcripts of more than 90% of mammals, allowing cells to create different versions of transcripts and proteins by selectively including or excluding different exons during RNA processing.[62]^14 The parsing of unique alterations of tumor-specific RNA splicing in PDAC can reveal the key elements involved in the metastatic changes, and these elements can be exploited for novel immunotherapeutic strategies.[63]^15 Progression and metastasis are also influenced by complex and dynamic features in tumor AS. However, efforts to understand PDAC progression and metastasis through alternative splicing analysis of EMT progression in CTCs have not been reported. Here, we analyzed the epithelial, pEMT, and mesenchymal status of human PDAC CTCs to identify the full EMT continuum and create a dynamic landscape by profiling AS from the epithelial stage, through pEMT, to the mesenchymal stage. We demonstrated that the EMT program of CTCs exhibits significant heterogeneity. We found that EMT is accompanied by a significant AS modality switch, which is orchestrated mainly by the splicing regulator. These findings provide insight into RNA processing during EMT progression, enhance our understanding of the molecular mechanisms involved in PDAC metastasis, and offer a resource for future investigations of AS regulators critical for tumor progression. Results Single-cell resolution identified epithelial cells from PDAC tumor and normal samples We sought to assess the EMT status of CTCs during metastasis, reasoning that this may unveil the diverse regulation of metastasis in pancreatic cancer and reveal potential therapeutic targets. To comprehensively understand the dynamics of cancer spread from the primary site through circulation to metastatic sites, we collected data on localized PDAC (locPDAC), metastatic PDAC (metPDAC), and healthy samples ([64]Table S1). We explored intrinsic characteristics of locPDAC and metPDAC through a comparative analysis between epithelial cells and CTCs ([65]Figure 1A). We first needed to obtain pure cancer epithelial cells along with normal pancreatic duct cells. Single-cell RNA sequencing (RNA-seq) data were obtained from a total of 33 samples, which included 19 localized cases and 3 metastases. These metastases stemmed from the liver, lung, and surgical vaginal apex in human PDAC. Additionally, we obtained 11 normal samples. A standard single-cell analysis process was performed to identify the major cell types in PDAC datasets. To identify cell types on the basis of the similarity of their transcriptomic signatures between cells, we performed clustering analysis using uniform manifold approximation and projection (UMAP). Data integration showed segregation into two main cell clusters ([66]Figures S1A‒S1C). To further understand whether this resulted from a batch effect, we analyzed the cellular composition of these two groups. Our findings show that cluster 1 predominantly comprises endothelial cells and fibroblasts, whereas these cell types are rare in cluster 2 ([67]Figures S1D‒S1H). This suggests that the observed disparity is more likely attributable to cell type than to a batch effect. After quality-control filtering, we cataloged 35,545 high-quality single cells into 10 distinct cell clusters annotated with canonical marker gene expression ([68]Figures 1B and 1C). We noticed there were two dfferent clusters of epithelial cells and conducted further analysis on these distinct cell types. As epithelial 2 cluster shows high expression of the cell-cycle-associated TOP2A gene, we investigated the cell cycle status of individual cells within this cluster in greater detail. We found that not all cells in this cluster were in a proliferating state. Hence, our categorization of this cluster is not solely driven by cell cycle effects ([69]Figure S1I). The pie chart of proportions (Smartseq2 vs. 10x) for these two different epithelial cell clusters shows that the ratio of Smartseq2 to 10x in these two cell types is similar, indicating that their differences are not likely due to the two different protocols ([70]Figure S1J). The bar plot, however, reveals significant differences in the number of duct cells between these two cell types, with epithelial 1 showing a higher yield of cells from normal tissues ([71]Figure S1K). We also performed a Gene Ontology (GO) enrichment analysis on the differentially expressed genes (DEGs) within these two epithelial cell types. The results suggested that epithelial 1 is correlated with peptidase activity, while epithelial 2 appears to be engaged with inflammation-related biological processes ([72]Figures S1L‒S1M). Therefore, we retained only epithelial 1 cells for subsequent analysis. Figure 1. [73]Figure 1 [74]Open in a new tab Cell types identified in PDAC on the basis of single-cell RNA sequencing (A) Conceptual overview of study design and samples used. (B) UMAP plot of all single-cell transcriptomes color coded by main cell type (left) and bar plot showing proportions of cell types in pooled with localized PDAC (locPDAC), metastatic PDAC (metPDAC), and normal samples (right). (C) Bubble plot showing highly expressed marker genes in each cell type, with cell types in rows and genes in columns. The size of each bubble represents the percent of cells expressing marker and color represents the level of average expression. (D) Quantification of main cell types per patient. (E) Clarification of cancer cells on the basis of CNV inference. The two-dimensional plot between the mean squares and correlation of the CNV signal for each sample. Red and blue colors represent single cells defined as malignant and non-malignant, respectively. Gray represents immune cells. (F) UMAP of all epithelial single-cell transcriptomes color coded by the cell source. The yield of the epithelial transcriptome varied by histological subtype. To ensure accuracy in subsequent analyses, we removed any samples with an epithelial percentage less than 10% ([75]Figure 1D). To separate defined tumor cells from a potentially non-malignant population, we obtained epithelial cells from each tumor tissue, using genetic aberrations by inferring copy number variations (CNVs) from gene expression data ([76]Figures S1N‒S1P; [77]materials and methods).[78]^16 We calculated the correlation of each cell’s CNV segment expression with the average CNV segment expression of the top 5% samples on the basis of the CNV score. We identified 4,059 malignant epithelial cells, defined by a correlation or CNV score greater than 0.2 ([79]Figures 1E, [80]S1Q, and S1R). Epithelial cancer cells in tumor tissues were removed because of their ambiguous identities arising from the lack of CNV. In the subsequent tumor cell analysis, we retained the inferred malignant tumor cells and epithelial cells from the normal sample ([81]Figure 1F). Presentation of EMT gene signatures reveals dynamic changes in CTCs during the metastasis process Having obtained cancer cells and normal epithelial cells of PDAC, we ventured forth to perform a deeper characterization of EMT dynamics of epithelial cells from the primary site, through the circulation to the metastasis site. Recent studies have demonstrated that a purity of 50% can be achieved with the CTC-chip using the peripheral blood of patients with metastatic cancer (lung, prostate, breast, colon, and pancreatic cancers).[82]^17 We assessed the tumor purity of samples and inferred the immune cell composition within the CTC samples. We grouped B cells, T cells, natural killer (NK) cells, plasma cells, eosinophils, neutrophils, and monocytes collectively as total white blood cells (WBCs). The results indicated that the WBC content of CTCs in high tumor purity was significantly lower compared with that in normal peripheral blood mononuclear cells (PBMCs) ([83]Figures S2A and S2B; [84]Table S1; [85]materials and methods). To identify the EMT gene preferentially expressed in a particular cell cluster using CTCs from pancreatic cancer patients, we next performed principal-component analysis (PCA) using 30 CTCs with relatively high purity and 15 normal PBMC samples. The results showed that CTCs were separated from normal blood cells of healthy donors, and the first and second principal components explained 55% and 19% of the variability (74% in total; [86]Figure 2A). This suggests that the expression profiles of CTCs were similar between localized and metastatic tumors. To investigate the dynamic changes in epithelial and mesenchymal markers of CTCs during cancer metastasis, we compared EMT genes (identified by Tan et al.[87]^18) in CTCs and epithelial cancer cells ([88]materials and methods). We separately screened EMT genes that were differentially expressed in epithelial cancer cells and CTCs ([89]Figures S2C and S2D; [90]Table S2). We found 28 differentially expressed EMT genes in duct PDAC, consisting of 11 epithelial genes and 17 mesenchymal genes ([91]Figure 2B). Most of the differentially expressed EMT genes demonstrated an upregulation trend, with more mesenchymal genes being upregulated than epithelial genes. Moreover, out of all the differentially expressed EMT signatures, 10 epithelial genes, and 8 mesenchymal genes were found to be differentially expressed in both epithelial duct samples and CTCs ([92]Figures 2C and [93]S2E). The result showed opposite expression patterns of the same gene in CTCs and epithelial cells: TACSTD2, EHF, and OR7E14P in epithelial genes and VCAM1, DDR2, SFRP1, and ECM2 in mesenchymal genes. This implies EMT dynamics in cancer epithelial cells. We observed that differentially expressed EMT genes were shared across multiple biological processes. Specifically, the differentially expressed EMT genes were significantly enriched in terms associated with the activation of the EMT process during metastasis. This includes activities related to cadherin binding, integrin binding, and functioning as extracellular matrix structural constituents ([94]Figure 2D). Figure 2. [95]Figure 2 [96]Open in a new tab EMT phenotypes of mRNA across different PDAC patient groups (A) Principal-component analysis between CTCs and normal PBMC samples. (B) Differentially expressed EMT gene counts in CTCs and duct PDAC compared with normal samples. (C) Heatmap showing the log[2] fold change values for differentially expressed epithelial (top) and mesenchymal genes (bottom) shared in CTCs and duct PDAC. (D) Cnet plot representing the top 7 Gene Ontology enriched terms of differentially expressed EMT genes. Enriched terms with high similarity were clustered and rendered as a network. Each gene node is colored according to its EMT type. The node size of a term represents the number of enriched genes and is colored by differentially expressed epithelial genes (EPI) and mesenchymal genes (MES) in CTCs and ductal PDAC, respectively. (E) Analysis of the correlation between epithelial score (E score) and mesenchymal score (M score). Points are colored by different groups. (F) Distributions of EM score within the duct and CTC. The distribution of the EM score from the CTC is colored in blue, while the distribution of the EM score from the duct is colored in green. The black dashed lines represent the intersection of the CTC and duct EM score distribution curves. The red dots indicate the corresponding EM score at the intersection point. (G) Quantification of different patient groups in epithelial, partial, and mesenchymal phenotypes. (H) Boxplot showing EM scores in different CTC and duct groups. (I and J) Distribution of EMT status in CTC and duct groups. (K) Overview diagram showing the dynamic changes of the EMT stage from the primary site, through the circulation to the metastatic site. The color and shape of each point are determined by the group type and display the EM score of the relevant sample. To evaluate the epithelial stability of PDAC epithelial ductal cells and CTCs, we calculated the epithelial score (EPI) and mesenchymal score (MES) for each sample on the basis of the different expressed EMT genes ([97]materials and methods). Our results show an increase in both the EPI and MESs in the CTCs, with the CTC samples tending to have high MESs and low EPIs, which is not significant in ductal samples ([98]Figures 2E and [99]S2F). To verify the universality of this phenomenon, we carried out the same analysis process with these differentially expressed EMT genes in two other datasets (one is the CTC dataset, and the other is the PDAC dataset from The Cancer Genome Atlas [TCGA]). The results were consistent with our analysis, showing an increase in both the EPI and MES in the CTCs, with the CTC samples tending to have higher MESs and lower EPIs, and in the TCGA datasets, while the EPIs remained unchanged, the range of MESs changed greatly, implying that the degree of mesenchymal differentiation in tumor tissues is heterogeneous ([100]Figure S2G). These findings confirm that entry into circulation is a complex event involving alterations in EMT signatures that are crucial for the distant spread of cancer cells. To measure the overall EMT dynamics from the epithelium to the mesenchyme, we generated an EM score for each sample on the basis of these differentially expressed EMT genes ([101]materials and methods). The distribution of EM scores for both CTC and epithelial duct samples shows that scores for CTC are concentrated primarily between 0 and 0.7, whereas epithelial duct samples predominantly score near −1 and 1 ([102]Figure 2F). Then, we subdivided them into epithelial, pEMT, and mesenchymal phenotypes on the basis of the cutoff at the intersection of the epithelial duct and CTC density curve ([103]Figure 2G; [104]Table S3). Normal PBMCs predominantly exhibit a pEMT phenotype, while all normal ducts present an epithelial phenotype ([105]Figures 2H and [106]S2H). This suggests that samples in epithelial phenotype are less mobile and more structured. We also found that the transition from pEMT to a fully mesenchymal phenotype was accompanied by an increase in the epithelial duct and a decrease in CTCs ([107]Figure S2I). In CTC samples, we found that locPDAC accounted for a slightly higher proportion than metPDAC in the full mesenchymal phenotype ([108]Figure 2I). In addition, we observed that locPDAC epithelial duct samples contributed to 25% of both the epithelial and mesenchymal phenotypes, whereas metPDAC epithelial duct samples predominantly exhibited the mesenchymal phenotype ([109]Figure 2J). The result suggests a distinction in the behavior between localized and metastatic forms of PDAC and implies some level of dynamic balance between these cell types during the process of EMT ([110]Figure 2K). The EMT score measures the potential for metastasis and could also be connected to disease development. To determine whether this was a common occurrence in pancreatic cancer CTCs, we validated this hypothesis on three external datasets including CTC and duct tumor samples. The results are consistent with our analysis that CTC samples predominantly display pEMT and mesenchymal phenotype, duct normal samples are largely epithelial phenotype, and duct tumor samples vary across different EMT stages ([111]Figure S2J). We analyzed the overall survival of TCGA patients, and a higher EM score was associated with poorer overall survival ([112]Figure S2K). In addition, we also compared the correlation between EM score and the expression of shared EMT genes with opposite expression patterns, and we found that the expression of Mesenchymal gene VCAM1 and epithelial gene TACSTD2 were significantly correlated with EM score ([113]Figure S2L). Evidence has shown that CTCs can cluster with neutrophils, enhancing proliferation and promoting metastasis. Neutrophils interact with CTCs through binding to highly expressed adhesion molecules on tumor cells, such as VCAM1.[114]^19 TACSTD2 overexpression is related to increased growth and metastasis of pancreatic carcinoma.[115]^20 These results suggest that the acquisition of mesenchymal features by epithelial cells is closely associated with the migration of CTCs, leading to cancer progression. A range of AS events occur in CTCs during EMT AS is an important method of regulating gene expression that expands the cell proteome. After demonstrating the significance of the EMT program in the tumor cycle, our next step was to identify the splicing regulators responsible for it at the exon level. Instead of using blood samples as a control, normal epithelial cells were extracted from bulk RNA-seq to investigate the intrinsic differences between CTCs and epithelial cells ([116]materials and methods). We first analyzed the gene coverage of both the CTC RNA-seq and bulk RNA-seq data. This comparison confirms that the gene coverage across these two datasets is sufficiently similar to permit valid comparative analysis ([117]Figure S3A). To quantify the contribution of AS to dynamic transcript accumulation during the emergence of EMT, we calculated the use of alternative exons using the rMATS statistical model. This model estimates the abundance of each exon and assigns it a “percentage splicing in” (PSI) value (Ψ) to calculate the use of alternative exons. We identified five types of AS events, including exon skipping, mutually exclusive exons, alternative 5′ splice sites, alternative 3′ splice sites, and intron retention. On the basis of the location of each AS event, we obtained the intersection of events in CTC samples and normal epithelial ducts ([118]Figure S3B). We then identified differential AS events in epithelial, pEMT, and mesenchymal CTCs, respectively, compared with normal epithelial duct samples (adjusted p < 0.05 and delta PSI > 0.1; [119]Table S4). The relative proportions of differential AS event types remained stable when comparing epithelial, pEMT, and mesenchymal types ([120]Figures 3A, [121]S3C, and S3D). Notably, some of the differential AS events identified in our results have been reported as indeed existing. For example, the SE splicing event associated with the RBP gene MBNL1, located at chr3:152445281-152445539+, was experimentally verified by Chu et al.[122]^21 Similarly, the SE splicing event related to cell surface protein FLNA at chrX:154352552-154352675:- in exon 40 was also validated by Oda et al.[123]^22 GO analysis of these events revealed their association with signaling pathways related to protein metabolism (e.g., organelle biogenesis and maintenance, protein localization) and tumor growth factor (e.g., apoptotic signaling pathway, establishment of organelle localization), as shown in [124]Figure 3B. On the basis of our calculation of the number of events related to those genes, we discovered that on average, 20% of genes generated multiple isoforms, indicating transcriptional and post-transcriptional diversity within CTCs in different EMT states ([125]Figure 3C). We then conducted a detailed analysis to explore the composition of those alternative spliced genes. To explore the relationship between differential AS events and DEGs, we analyzed the DEGs in epithelial, pEMT, and mesenchymal phenotype compared with normal epithelial duct samples. We found that a few of the genes were both differentially spliced and expressed, whereas the majority of genes were alternatively spliced without expression changes ([126]Figure S3E). In subsequent analyses, we only retained events whose related gene showed no significant changes in expression between CTCs and normal epithelial ducts ([127]Figure 3D). To gain insights into the regulation of AS and functional relationships among genes, we explored the chromosomal distribution of these events. Our analysis revealed that relatively long chromosomes, such as chr1, chr2, and chr3, had a larger number of splicing events compared with other chromosomes in A5SS and MXE, suggesting that these chromosomal regions display increased transcriptional and post-transcriptional activity ([128]Figures 3E and [129]S3F). We went on to compare the delta PSI of differential AS events among epithelial cells, pEMT, and stromal cells. We found those AS events were different across the EMT stage. These differences were predominantly driven by strong relative enrichment of epithelial and mesenchymal, compared with their low abundances in pEMT. The relative abundances of RI events increased progressively from mesenchymal to epithelial ([130]Figure 3F). In addition, we found that multiple differential AS events were significantly associated with survival outcomes ([131]Figure S3G). Together, the differential AS events clearly show the necessity of discriminating AS-specific roles for a full understanding of the transcriptional landscape of EMT. Figure 3. [132]Figure 3 [133]Open in a new tab Differential alternative splicing events during EMT in CTCs (A) The distribution of AS types in EMT-specific AS events, comparing epithelia, pEMT, and mesenchymal CTCs. (B) Enrichment network representing the top 8 significantly enriched terms of genes related to differential AS events (p < 0.05). Enriched terms with high similarity were clustered and rendered as a network, while each node represents an enriched term and is colored according to its cluster. Node size indicates the number of enriched genes. (C) Bar graph showing the proportion of genes associated with a single event or multiple events. (D) The count of differential AS events without differences at gene level among epithelial, pEMT, and mesenchymal CTC. (E) Distribution of delta PSI of shared SE events across chromosomes. (F) The ternary phase diagram reveals variations in epithelial, pEMT, and stromal AS events, analyzed using delta PSI. Each point, colored per AS type, signifies genes: small dots for those associated with a unique splicing event, and larger ones for genes linked to multiple events. (G) The count of differential AS events related to cell surface proteins. (H) The frequencies of exon use in differential AS events related to cell surface proteins. (I) Cell surface protein genes associated with multiple differential AS events in epithelial, pEMT, mesenchymal, and normal duct samples. (J) The pie chart shows the topology present in ORF transcripts affected by differential AS events related to cell surface protein genes. (K and L) Examples showing topology and domains in ORF transcripts, such as in LMBRD1-229 (K) and MCAM-201 (L). Cell surface proteins that are specifically altered splicing in tumor cells are ideal targets to minimize off-target effects and are important markers for the detection and isolation of CTCs. Our analysis of splicing data revealed a total of 89 aberrant splicing events associated with 77 cell surface proteins among the differential AS events ([134]Figure 3G). To identify retained exons in these cell surface protein-related AS events, we used a threshold of ΔPSI > 0.1, while skipped exons were defined as those with ΔPSI < −0.1. The result showed that there exhibited a higher number of retained exons, which could potentially impact isoform length ([135]Figure 3H). Among them, we found 12 cell surface protein genes associated with multiple AS events, which might increase the number of inferred neoantigens in metPDAC ([136]Figure 3I). On the basis of the location of these cell surface protein gene-related events, we found that they affected 124 open reading frame (ORF) regions. These regions are associated with 22 different transcripts, which could potentially affect the reading frame and may serve as suitable therapeutic targets ([137]Table S5). To further screen for PDAC-specific targets in cancer-specific AS events, we then analyzed the topology of those 22 related transcripts. We identified 25 distinct topologies, 6 of which were associated with the extracellular domain ([138]Figure 3J). Furthermore, transcripts contain structures with multiple topologies that may influence cellular function, possibly leading to changes in phenotype or even disease conditions ([139]Figures 3K and 3L). However, we believe that this is a cautious estimate as junction-spanning polypeptides derived from AS are inadequately represented because of trypsin’s cleavage properties. Consequently, all of the RNA-level candidates we have identified might be considered appropriate for additional validation and can be developed further as targets for immune therapy. Together, these data indicate a specific and sustained requirement for diverse transcripts of cell surface protein during all stages of EMT. AS of RBP orchestrates the EMT-specific AS events After highlighting the significance of particular AS events in EMT, our subsequent objective was to pinpoint the splicing regulators accountable for them. Our focus was on skipped exon events, which are one of the most common types of AS. We used the online web server rMAPS2 (RNA Map Analysis and Plotting Server 2) to identify discriminative cis-regulatory sequence motifs enriched in the cassette exons that are preferentially included in epithelial, pEMTs, and mesenchymal phenotype, as well as motifs in their flanking introns ([140]materials and methods).[141]^23 These cis-regulatory motifs corresponded to predicted RBPs. To investigate the association between AS and RBPs, we analyzed the correlations between the gene expression levels of RBPs and the PSI values of exon-skipped events. We sorted the significantly correlated RBPs according to the absolute value of correlation and took the top 10 RBPs for downstream analysis (correlation R is greater than 0.6, and the p value is less than 0.05; [142]Figures S4A‒S4C). We observed that the well-conserved binding motifs of these RBPs were significantly over-represented in the transcript sequences of flanking the skipped/retained exons compared with non-differentially spliced exons ([143]Figures 4A–4C). Most RBPs were common among epithelial, pEMT, and mesenchymal, such as DAZAP1, FXR1, G3BP2, and HNRNPH2 ([144]Figure S4D). It is worth noting that when the same RBPs bind to the same RNA target sequence but possess distinct biochemical properties, it may result in varying splicing regulatory outcomes.[145]^24 As an example, in the case of pEMT-specific skipping events (ΔPSI > 0.1), we identified the HNRNPH2 motif occurring separately in the intron sequence located 250 bp upstream or downstream of the target exon sequence ([146]Figure 4B). In contrast, for epithelial and mesenchymal skipping events (ΔPSI > 0.1), this motif was absent in the intron sequence found 250 bp downstream of the target exon sequence ([147]Figures 4A and 4C). The results suggest that splicing factors have a common regulatory effect on AS patterns and that RBPs have potential position-dependent effects on exon splicing in both the process of pEMT and mesenchymal phenotype. Figure 4. [148]Figure 4 [149]Open in a new tab RBP regulated exon inclusion in CTCs during EMT (A‒C) The differential enrichment of RBPs between exon skipped (ΔPSI < 0) and exon inclusion (ΔPSI > 0). The motif sequences of the corresponding RBP are labeled on the right. Cell colors represent the smallest p value less than 0.05 for each enriched region, while blanks represent non-significant enrichment. (D) Bar graphs show expression levels of MBNL1 across epithelial, pEMT, mesenchymal, and normal duct samples. (E and F) The boxplot on the left represents the PSI of MBNL1. The genomic location of the alternative exon was labeled at the bottom. Sashimi plots on the right show the alternative spliced exons according to genomic locations. Junction reads are plotted as arcs whose width is determined by the number of reads aligned to the junction spanning the exons connected by the arc. (G) Positional distribution of MBNL1-binding motifs of epithelial (top), pEMT (middle), mesenchymal (bottom) CTC in SE events. Motif enrichment scores (solid line) and p values (dashed line) were plotted according to AS event positions. Arrows indicate peaks of enrichment for exons. (H) Bar plot showing isoform use of MBNL1. Data were analyzed using the Mann-Whitney two-sided test. ∗p ≤ 0.05, ∗∗p ≤ 0.01, ∗∗∗p ≤ 0.001, and ∗∗∗∗p ≤ 0.0001. Notably, the relative expression level of MBNL1 was not significantly different between pEMT PDAC, mesenchymal PDAC, and normal epithelial cells ([150]Figure 4D). However, MBNL1 generated distinct transcripts through its unique AS mechanism ([151]Figures 4E and 4F). Moreover, MBNL1 influences the recruitment of other splicing factors and the spliceosome, leading to its direct regulation of AS. Therefore, we have selected MBNL1 for further investigation to understand its role in epithelial, pEMT, and mesenchymal emergence. To investigate the effect of MBNL1 on AS changes between CTC and normal epithelial cells, we examined the positional distribution of MBNL1-binding motifs. Our analysis revealed a significant overrepresentation of the MBNL1 binding motif in the flanking retained exons of CTCs compared with non-differentially skipped exons. Additionally, we observed significant differences in the positioning of MBNL1 binding sites within CTC regions undergoing AS changes ([152]Figure 4G). In addition, we further analyzed the differential transcript use of MBNL1. Although there were no evident alterations in the expression levels of MBNL1, we noticed considerable variation in the use of the MBNL1-018 transcript isoform across epithelial, pEMT, and mesenchymal CTCs. On the other hand, significant changes in the MBNL1-205 transcript were only seen in epithelial CTCs ([153]Figure 4H). Collectively, these findings suggest that as EMT evolves step by step, it is accompanied by stage-specific variations in transcript diversity. These findings suggest that MBNL1 can differentially regulate the transition among epithelial, pEMT, and mesenchymal phenotypes through AS, independent of gene expression, potentially affecting CTC remodeling and differentiation into pEMT and mesenchymal subtypes. Stop codon readthrough affects the diversity of therapeutic targets in CTCs To provide a comprehensive understanding of the complexity of a given AS event, we analyzed changes in AS event modality relative to the stepwise progression from normal epithelial cells to pEMT and finally to mesenchymal. Our analysis began by assigning each AS event to one of three modalities: (1) included, wherein exons are entirely preserved (Ψ > 0.95); (2) excluded, wherein exons are fully skipped (ψ < 0.05); and (3) middle, wherein an exon may be included or skipped in RNA molecules transcribed from the gene (0.05 < Ψ < 0.95). Our results reveal that nearly 90% of AS events exhibited modality switches from the norm to epithelial phenotype, while 28.2% of AS events showed modality changes from epithelial to pEMT, and 35.6% of them changed from pEMT to mesenchymal phenotype ([154]Figures 5A–5C). We observed that among the modality change AS events, the majority were attributed to the middle mode. The middle mode represents an exon that skips over a percentage of transcripts while including the remaining portions. As various combinations of exons in a gene can generate multiple protein isoforms with unique functions or properties, the incorporation of these exons may play a critical role in determining the commitment of cells toward pEMT fate ([155]Figure 5D). We found that CYB5R3-related splicing events have different patterns in different EMT stages ([156]Figure 5E). CYB5R3 is an important component in maintaining glucose use, where it interacts with glucose to stabilize glucokinase.[157]^25 We found that the transcript of CYB5R3 has an extracellular topology domain, which may play a crucial role in processes like signal transduction. At the same time, compared with normal duct cells, we found that CYB5R3-214 showed significant isoform use differences during EMT ([158]Figure 5E). Our findings reveal a clear pattern that identifying the CTC-specific isoform may lead to the discovery of specific therapeutic targets. Figure 5. [159]Figure 5 [160]Open in a new tab Genes have alternative splice variants containing different stop codons (A‒C) Bar plot showing the proportion of AS events with (modality change) or without (no modality change) modality changes between neighboring developmental stages during EMT. (D) Sankey diagram showing the AS modality transition from norm to epithelial, then to pEMT, and finally to mesenchymal. (E) Isoforms related to CYB5R3. Top: two analyzed isoforms for the CYB5R3 gene: PFAM domains are indicated by black. Extracellular topology colored in green. UCSC gene transcript IDs are shown for each isoform. Bottom: the average isoform fraction in epithelial (left), pEMT (middle), and mesenchymal (right), compared with the norm duct. Data were analyzed using the Mann-Whitney two-sided test. ∗p ≤ 0.05, ∗∗p ≤ 0.01, ∗∗∗p ≤ 0.001, and ∗∗∗∗p ≤ 0.0001. (F) Circos plot of PTC (TAA, TAG, and TGA) signals in overall shared SE AS events. (G) Bar plot showing the frequency of SE AS events with different types of PTCs. (H) Distribution of different types of PTCs in shared SE AS events. (I) The frequencies of each nucleotide are plotted for all positions 40 nt upstream to 60 nt downstream of the stop codon. Nucleotides are plotted in order of increasing frequencies. Exon skipping and retention not only alter transcript length but also introduce premature termination codons (PTCs) within gene coding sequences.[161]^26 This can activate the evolutionarily conserved nonsense-mediated decay machinery, which guards against the negative consequences of mRNAs harboring PTCs.[162]^27 To comprehensively compare the impact of stop codon identity on these splicing events, we conducted a systematic analysis by tallying the occurrences of each possible PTC, including TAA, TAG, and TGA ([163]Figure 5F). Importantly, our analysis revealed a high frequency of single events harboring multiple PTCs, with longer exons exhibiting an increased number of PTCs ([164]Figures 5G and 5H). Given the known influence of C’s and G’s in promoting efficient translation termination compared with A’s and U’s, we investigated the nucleotide use within 40 nt upstream and 60 nt downstream of the stop codon in protein-coding transcripts ([165]Figure 5I). Our investigation revealed that several patterns in this region demonstrate a repeated three-nucleotide periodicity. Notably, we observed a significant enrichment of C and G nucleotides near the PTC. However, as the distance from the PTC increases, the frequency of C and G nucleotides decreases. Taken together, it appears that evolution has intricately refined PTC to enhance translation termination efficiency. Targeting crucial stop codons for readthrough is a promising approach that may be achieved without disrupting general translation termination. This trait could potentially provide therapeutic benefits when dealing with crucial genes identified to have PTCs. Discussion In this study, we provide quantitation of the epithelial to EMT continuum of CTCs in PDAC, covering localized and metastatic sites. Our findings reveal that CTCs, especially those obtained from metastatic samples, can destabilize the epithelium by preferentially expressing mesenchymal genes. This could lead to broad perturbations in the metastatic process through multiple pathways. Additionally, a genome-wide exon-based analysis of CTCs showed considerable variation between different EMT stages, allowing for investigation into the identification of cell-type-specific isoforms, AS events, and an unknown regulatory mechanism of EMT formation. We identified 77 splicing events in cell surface proteins that were specifically expressed in epithelial, pEMT, and mesenchymal states, respectively, which may represent potential targets for therapeutic intervention. These results suggest that abnormal AS is prevalent in EMT progression, with a complex mode switch from epithelial to pEMT and then to mesenchymal. Furthermore, we discovered that MBNL1 significantly regulates frequent differential AS switches in the epithelial, pEMT, and mesenchymal stages through AS without appreciable changes at the gene expression level. Finally, upon examination of the epithelial, pEMT, and mesenchymal termination codons, we observed a high frequency of single events harboring multiple PTCs. This finding could potentially lead to transcript diversity and contribute to the identification of useful markers for diagnosis and prognosis. A previous study discovered that the EMT program is highly plastic and demonstrated that epithelial stabilization leads to enhanced collective migration of epithelial cancer cells, as evidenced by the use of single-cell RNA-seq and genetic mouse models.[166]^28 Epithelial cells undergo a process known as EMT, during which they become more mobile and invasive by losing their normal cell-cell adhesion and polarity. This allows them to break off from the main tumor and enter the bloodstream as CTCs.[167]^29^,[168]^30 We explored epithelial stability during tumor metastasis to identify disease-associated cell-type-specific EMT gene signatures. The analysis of the molecular features that distinguish CTCs is an outstanding tool to provide insights into the understanding of crucial aspects of the metastatic process and could foster use in liquid biopsy-based personalized cancer treatment. On the basis of the EMT signature, our results showed that the EMT program of CTCs is highly heterogeneous, with many cells present in pEMT and mesenchymal states, whereas epithelial cancer cells acquire a more stable epithelial phenotype away from pEMT and mesenchymal states. Our results are consistent with previously reported that the EMT program of cancer cells exhibits molecular alterations, including decreased expression of epithelial markers (E-cadherin, ZO-1, claudins, and occludins) and increased expression of mesenchymal markers (vimentin, N-cadherin, fibroblast-specific protein1, and fibronectin).[169]^31^,[170]^32 Notably, we observed that both normal epithelial cells and healthy donor samples from blood fluctuated in a small range of EM scores, whereas epithelial cancer cells and CTCs demonstrated distinct polarization. These results suggest that epithelial cells can undergo EMT with broad effects. In addition, pEMT-phenotype CTCs were more similar to health donors, suggesting that this pEMT intermediate cell state may therefore be more stable than the other transition states and provided evidence of a metastatic advantage for pEMT CTCs, which was supported by previous studies.[171]^33^,[172]^34 Overall, our analysis of EMT signals of epithelial cells and CTCs establishes the importance of epithelial cells losing their intercellular adhesion and acquiring mesenchymal in supporting metastatic dissemination and highlights that pEMT-phenotype CTCs and mesenchymal CTCs had different metastases capacity. AS is a common process in which transcribed mRNAs generate distinct transcript isoforms, leading to protein products that possess unique functional properties.[173]^35 However, the current understanding of how to monitor disease progression from CTCs to metastasis is primarily derived from the observations at the gene expression level.[174]^36 Ignoring transcriptional isoforms in transcriptome analyses can lead to incomplete information about the complex process of EMT. Furthermore, it is common for previous studies of CTCs from PDAC patients to use purified healthy donor volunteer blood as a control, while the cells in purified blood from a healthy donor include red blood cells, WBCs (such as lymphocytes and neutrophils), and platelets.[175]^4^,[176]^36^,[177]^37^,[178]^38 Nonetheless, one main concern is that CTCs are rare cells that have detached from the localized tumor and entered the bloodstream, and they may provide a closer comparison with the epithelial cells of the localized tumor compared with other circulating cells in the bloodstream, including their genetic and molecular profile, as well as their morphology and behavior. Therefore, using healthy donor blood as a control might potentially mask important differences in EMT changes between CTCs and healthy individuals. Here, we describe AS changes in epithelial-type CTCs, pEMT-type CTCs, and mesenchymal-type CTCs using normal epithelial samples as controls to gain a deeper understanding of the unique characteristics of each CTC subtype. We identified multiple splicing events in cell surface proteins that are suitable targets for future research. It has been reported that abnormal splicing events provided the corresponding putative AS-derived neoepitopes of 16 cancer types.[179]^36 Lin et al.[180]^36 illustrated that many highly tumor-specific AS events are putative targets for TCR T cell approaches in the GBM. MBNL1 is well known to regulate RNA AS, localization, and decay and drives tumor dedifferentiation as well, implicating their activity to some extent in the relapse and distant metastasis.[181]^39^,[182]^40^,[183]^41 We found that gene MBNL1, with the absence of appreciable changes in the gene expression, produced significantly different AS events in CTC. Notably, we found a large subset of epithelial-, pEMT-, and mesenchymal-specific AS events were predicted to be under the control of MBNL1. Genome-wide transcriptomic analysis of the MBNL1 gene has revealed a potential close relationship between MBNL1 and breast cancer metastatic colonization.[184]^42 Together, our findings provide further evidence that specific AS isoforms might have key regulatory roles independently of expression alterations and are a necessary first step for understanding the isoform-specific functions of key proteins. The premature stop codons may lead to mRNA destruction through nonsense-mediated mRNA decay or the production of a truncated protein.[185]^42 Using the modality strategy, we explored the distribution and variation of AS events across different cell populations to reveal exon inclusion bias during EMT. We discovered a high frequency of single events harboring multiple PTCs that might support models of translation initiation inhibition. This highlights the significance of the stop codon context in regulating translation termination. However, achieving specificity for PTC readthrough poses a significant challenge for nonsense suppression therapies, requiring a deeper understanding of translation termination and stop codon readthrough in varied sequence contexts.[186]^43^,[187]^44 Despite this challenge, such readthrough events can be beneficial in therapy when a critical gene contains a PTC. In summary, this study suggests that the EMT program at the exon level within the CTC is heterogeneous and contributes to the functional annotation of the splicing factor by indicating the necessity for studying alternatively spliced isoforms during transcriptional analysis. These findings have allowed us to explore previously unidentified transcriptional processes involved in tumor metastasis and offer insights into potential therapeutic interventions. Materials and methods Data acquisition We downloaded 3 PDAC single-cell RNA-seq (scRNA-seq) datasets ([188]GSE156405 , [189]GSE81547, and [190]GSE155698), 3 CTC RNA-seq datasets ([191]GSE144561, [192]GSE40174, and [193]GSE114704), and 2 bulk RNA-seq datasets ([194]GSE57973 and [195]GSE193268) from the Gene Expression Omnibus (GEO) ([196]https://www.ncbi.nlm.nih.gov/geo/). The corresponding RNA-seq data were obtained from the TCGA data portal ([197]https://tcga-data.nci.nih.gov/tcga/). Datasets, including GEO: [198]GSE156405, [199]GSE81547, [200]GSE155698, [201]GSE144561, [202]GSE193268, and [203]GSE57973, were used for analysis. Datasets,including GEO: [204]GSE40174 and [205]GSE114704 and TCGA samples were used for validation purposes ([206]Table S1). Single-cell data quality pre-processing and initial cell type discovery Quality control and downstream analysis were performed using Seurat version 3.2.3.[207]^45 Each scRNA-seq dataset contained a single-cell transcriptome count matrix. Samples were analyzed after excluding low-quality cells (<1,000 genes/cell and >20% mitochondrial genes). The harmony algorithm was used to integrate single cells from different platforms and tissues, followed by the construction of a K-nearest neighbors (KNN) graph using the FindNeighbors function in Seurat. Finally, cell clustering and UMAP visualization were performed using the FindClusters and RunUMAP functions, respectively. The annotations of cell identity on each cluster were defined by the expression of known marker genes. Inference of CNV from scRNA-seq data To distinguish cancer cells from normal ones, chromosomal gene expression was perturbed to infer CNV aberrations. For the accuracy of cancer cell inference, we have compared the results of two methods: inferCNV and CopyKat.[208]^46^,[209]^47 First, we separately used CopyKat and inferCNV to compute the signal for copy number profiles from the gene expression matrix. To compare the results of these two methods, we calculated the CNV scores of all cells using the method from previous studies.[210]^48 Because of the algorithmic differences between the two methods, we scaled the CNV score. We then arranged the cells in ascending order and defined the high and low of the CNV score with 25% as the cutoff value. We found that in the results of CopyKat, there is a higher proportion of high scores in the epithelial cells, while in immune cells, there is a higher proportion of low scores. On the other hand, the results from InferCNV present similar proportions for both types of cells. Therefore, in subsequent analyses, we adopted the predictions of CopyKat. Second, we excluded cells that were not predicted and the diploid cells that were predicted to be normal epithelial or immune cells. Third, we calculated the mean square and the correlation of CNV per cell with the average of the top 5% of cells from the aneuploid single cells. This allowed us to detect cancer cells and enhance the precision of our findings. Finally, cells specifically with a correlation or CNV score greater than 0.2, are categorized as cancer cells. Considering that P5_GES156405 contained very few malignant epithelial cells, this sample was removed from subsequent analyses. Bulk RNA-seq quality pre-processing The CTC datasets read were aligned to the GRCh38 human reference genome from the University of California, Santa Cruz ([211]http://genome.ucsc.edu), using the STAR version 2.7.4a aligner with default settings (Alex Dobin; [212]https://github.com/alexdobin/STAR).[213]^49 We used ESTIMATE to evaluate tumor purity.[214]^50 The formula for calculating ESTIMATE-based tumor purity was developed as follows: Tumor_purity = cos(0.6049872018 + 0.0001467884 × ESTIMATEScore). We retained 30 CTC samples with a tumor purity greater than 50%, and 15 normal samples with a purity less than 50%. To further check the tumor purity of the sample, we analyzed the immune cell composition in the selected CTC samples, as inferred by the CIBERSORT algorithm.[215]^51 Differential expression analysis To perform sample-level differentially expressed EMT genes in single-cell PDAC samples and CTC samples in parallel, we used a pseudo-bulk approach for single-cell data. We use the functions from the SingleCellExperiment package to extract the counts and metadata of single-cell data.[216]^52 To calibrate the pseudo-bulk data, we added two bulk RNA-seq datasets as references, namely, [217]GSE193268, which