Abstract

   Circulating tumor cells (CTCs) that undergo epithelial-to-mesenchymal
   transition (EMT) can provide valuable information regarding metastasis
   and potential therapies. However, current studies on the EMT overlook
   alternative splicing. Here, we used single-cell full-length
   transcriptome data and mRNA sequencing of CTCs to identify
   stage-specific alternative splicing of partial EMT and mesenchymal
   states during pancreatic cancer metastasis. We classified definitive
   tumor and normal epithelial cells via genetic aberrations and
   demonstrated dynamic changes in the epithelial-mesenchymal continuum in
   both epithelial cancer cells and CTCs. We provide the landscape of
   alternative splicing in CTCs at different stages of EMT, uncovering
   cell-type-specific splicing patterns and splicing events in cell
   surface proteins suitable for therapies. We show that MBNL1 governs
   cell fate through alternative splicing independently of changes in gene
   expression and affects the splicing pattern during EMT. We found a high
   frequency of events that contained multiple premature termination
   codons and were enriched with C and G nucleotides in close proximity,
   which influence the likelihood of stop codon readthrough and expand the
   range of potential therapeutic targets. Our study provides insights
   into the EMT transcriptome’s dynamic changes and identifies potential
   diagnostic and therapeutic targets in pancreatic cancer.

   Keywords: MT: Bioinformatics, alternative splicing, circulating tumor
   cell, epithelial-to-mesenchymal transition, pancreatic ductal
   adenocarcinoma

Graphical abstract

   graphic file with name fx1.jpg
   [47]Open in a new tab
     __________________________________________________________________

   Jiang and colleagues analyzed the epithelial, pEMT, and mesenchymal
   states of human PDAC circulating tumor cells; systematically delineated
   the EMT continuum from the epithelium to the tumor circulation; and
   revealed the landscape of alternative splicing characteristics of
   circulating tumor cells.

Introduction

   Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive and
   lethal malignancy because of the lack of early diagnosis and limited
   response to treatments.[48]^1 Metastatic recurrence is a significant
   problem in patients with PDAC, with a 5-year survival rate of only 9%
   for fatal disease and metastatic disease emerging in nearly 98% of
   patients.[49]^2 Metastatic spread is a highly inefficient and complex
   multistep process.[50]^3 The formation of tumor metastases first
   requires circulating tumor cells (CTCs) to colonize metastatic organs
   or tissues through the circulatory system.[51]^4 CTCs are vital
   components of liquid biopsies for the diagnosis of residual cancer,
   monitoring of therapy response, and prognosis of recurrence.[52]^5
   Epithelial-to-mesenchymal transition (EMT) promotes the generation of
   CTCs in epithelial cancers by increasing tumor cell invasiveness and
   motility to promote tumor cell intravasation and facilitate tumor cell
   survival in the peripheral system.[53]^6

   EMT is involved in cancer invasion and metastasis, which is the process
   by which polarized epithelial cells lose their characteristic
   intercellular connections and polarity and acquire characteristics of
   mesenchymal cells.[54]^7 Experimental and clinical studies have
   highlighted that CTC with phenotypic hallmarks of EMT plays a critical
   role in the metastatic and recurrence of solid malignancy.[55]^7 EMT is
   now well documented not to be a linear process, but rather a dynamic
   one with the existence of partial EMT (pEMT) states between the
   epithelial and mesenchymal poles.[56]^8 Cells transitioning in these
   pEMT states exhibit stem-cell-like properties and display high
   plasticity, which is critical for metastasis and drug resistance.[57]^9
   RNA alternative splicing (AS) drives multiple malignant phenotypes,
   including aggressiveness, angiogenesis, and abnormal metabolism, it is
   not surprising that it simultaneously affects different aspects of EMT
   progression.[58]^10^,[59]^11 In line with this notion, AS events have
   been reported to play a functional role in EMT. For instance, Brown
   et al.[60]^12 showed that CD44 isoform switching is essential for EMT
   to occur and for cancer cells to metastasize. Reciprocally, the
   RNA-binding protein (RBP) AKAP8 suppresses tumor metastasis by
   antagonizing EMT-associated AS.[61]^13 Together, these findings imply a
   role for AS in EMT, but, until now, this phenomenon has not been
   assessed in PDAC.

   AS is an essential mechanism that acts on the pre-mRNA transcripts of
   more than 90% of mammals, allowing cells to create different versions
   of transcripts and proteins by selectively including or excluding
   different exons during RNA processing.[62]^14 The parsing of unique
   alterations of tumor-specific RNA splicing in PDAC can reveal the key
   elements involved in the metastatic changes, and these elements can be
   exploited for novel immunotherapeutic strategies.[63]^15 Progression
   and metastasis are also influenced by complex and dynamic features in
   tumor AS. However, efforts to understand PDAC progression and
   metastasis through alternative splicing analysis of EMT progression in
   CTCs have not been reported.

   Here, we analyzed the epithelial, pEMT, and mesenchymal status of human
   PDAC CTCs to identify the full EMT continuum and create a dynamic
   landscape by profiling AS from the epithelial stage, through pEMT, to
   the mesenchymal stage. We demonstrated that the EMT program of CTCs
   exhibits significant heterogeneity. We found that EMT is accompanied by
   a significant AS modality switch, which is orchestrated mainly by the
   splicing regulator. These findings provide insight into RNA processing
   during EMT progression, enhance our understanding of the molecular
   mechanisms involved in PDAC metastasis, and offer a resource for future
   investigations of AS regulators critical for tumor progression.

Results

Single-cell resolution identified epithelial cells from PDAC tumor and normal
samples

   We sought to assess the EMT status of CTCs during metastasis, reasoning
   that this may unveil the diverse regulation of metastasis in pancreatic
   cancer and reveal potential therapeutic targets. To comprehensively
   understand the dynamics of cancer spread from the primary site through
   circulation to metastatic sites, we collected data on localized PDAC
   (locPDAC), metastatic PDAC (metPDAC), and healthy samples
   ([64]Table S1). We explored intrinsic characteristics of locPDAC and
   metPDAC through a comparative analysis between epithelial cells and
   CTCs ([65]Figure 1A). We first needed to obtain pure cancer epithelial
   cells along with normal pancreatic duct cells. Single-cell RNA
   sequencing (RNA-seq) data were obtained from a total of 33 samples,
   which included 19 localized cases and 3 metastases. These metastases
   stemmed from the liver, lung, and surgical vaginal apex in human PDAC.
   Additionally, we obtained 11 normal samples. A standard single-cell
   analysis process was performed to identify the major cell types in PDAC
   datasets. To identify cell types on the basis of the similarity of
   their transcriptomic signatures between cells, we performed clustering
   analysis using uniform manifold approximation and projection (UMAP).
   Data integration showed segregation into two main cell clusters
   ([66]Figures S1A‒S1C). To further understand whether this resulted from
   a batch effect, we analyzed the cellular composition of these two
   groups. Our findings show that cluster 1 predominantly comprises
   endothelial cells and fibroblasts, whereas these cell types are rare in
   cluster 2 ([67]Figures S1D‒S1H). This suggests that the observed
   disparity is more likely attributable to cell type than to a batch
   effect. After quality-control filtering, we cataloged 35,545
   high-quality single cells into 10 distinct cell clusters annotated with
   canonical marker gene expression ([68]Figures 1B and 1C). We noticed
   there were two dfferent clusters of epithelial cells and conducted
   further analysis on these distinct cell types. As epithelial 2 cluster
   shows high expression of the cell-cycle-associated TOP2A gene, we
   investigated the cell cycle status of individual cells within this
   cluster in greater detail. We found that not all cells in this cluster
   were in a proliferating state. Hence, our categorization of this
   cluster is not solely driven by cell cycle effects ([69]Figure S1I).
   The pie chart of proportions (Smartseq2 vs. 10x) for these two
   different epithelial cell clusters shows that the ratio of Smartseq2 to
   10x in these two cell types is similar, indicating that their
   differences are not likely due to the two different protocols
   ([70]Figure S1J). The bar plot, however, reveals significant
   differences in the number of duct cells between these two cell types,
   with epithelial 1 showing a higher yield of cells from normal tissues
   ([71]Figure S1K). We also performed a Gene Ontology (GO) enrichment
   analysis on the differentially expressed genes (DEGs) within these two
   epithelial cell types. The results suggested that epithelial 1 is
   correlated with peptidase activity, while epithelial 2 appears to be
   engaged with inflammation-related biological processes
   ([72]Figures S1L‒S1M). Therefore, we retained only epithelial 1 cells
   for subsequent analysis.

Figure 1.

   [73]Figure 1
   [74]Open in a new tab

   Cell types identified in PDAC on the basis of single-cell RNA
   sequencing

   (A) Conceptual overview of study design and samples used. (B) UMAP plot
   of all single-cell transcriptomes color coded by main cell type (left)
   and bar plot showing proportions of cell types in pooled with localized
   PDAC (locPDAC), metastatic PDAC (metPDAC), and normal samples (right).
   (C) Bubble plot showing highly expressed marker genes in each cell
   type, with cell types in rows and genes in columns. The size of each
   bubble represents the percent of cells expressing marker and color
   represents the level of average expression. (D) Quantification of main
   cell types per patient. (E) Clarification of cancer cells on the basis
   of CNV inference. The two-dimensional plot between the mean squares and
   correlation of the CNV signal for each sample. Red and blue colors
   represent single cells defined as malignant and non-malignant,
   respectively. Gray represents immune cells. (F) UMAP of all epithelial
   single-cell transcriptomes color coded by the cell source.

   The yield of the epithelial transcriptome varied by histological
   subtype. To ensure accuracy in subsequent analyses, we removed any
   samples with an epithelial percentage less than 10% ([75]Figure 1D). To
   separate defined tumor cells from a potentially non-malignant
   population, we obtained epithelial cells from each tumor tissue, using
   genetic aberrations by inferring copy number variations (CNVs) from
   gene expression data ([76]Figures S1N‒S1P; [77]materials and
   methods).[78]^16 We calculated the correlation of each cell’s CNV
   segment expression with the average CNV segment expression of the top
   5% samples on the basis of the CNV score. We identified 4,059 malignant
   epithelial cells, defined by a correlation or CNV score greater than
   0.2 ([79]Figures 1E, [80]S1Q, and S1R). Epithelial cancer cells in
   tumor tissues were removed because of their ambiguous identities
   arising from the lack of CNV. In the subsequent tumor cell analysis, we
   retained the inferred malignant tumor cells and epithelial cells from
   the normal sample ([81]Figure 1F).

Presentation of EMT gene signatures reveals dynamic changes in CTCs during
the metastasis process

   Having obtained cancer cells and normal epithelial cells of PDAC, we
   ventured forth to perform a deeper characterization of EMT dynamics of
   epithelial cells from the primary site, through the circulation to the
   metastasis site. Recent studies have demonstrated that a purity of 50%
   can be achieved with the CTC-chip using the peripheral blood of
   patients with metastatic cancer (lung, prostate, breast, colon, and
   pancreatic cancers).[82]^17 We assessed the tumor purity of samples and
   inferred the immune cell composition within the CTC samples. We grouped
   B cells, T cells, natural killer (NK) cells, plasma cells, eosinophils,
   neutrophils, and monocytes collectively as total white blood cells
   (WBCs). The results indicated that the WBC content of CTCs in high
   tumor purity was significantly lower compared with that in normal
   peripheral blood mononuclear cells (PBMCs) ([83]Figures S2A and S2B;
   [84]Table S1; [85]materials and methods). To identify the EMT gene
   preferentially expressed in a particular cell cluster using CTCs from
   pancreatic cancer patients, we next performed principal-component
   analysis (PCA) using 30 CTCs with relatively high purity and 15 normal
   PBMC samples. The results showed that CTCs were separated from normal
   blood cells of healthy donors, and the first and second principal
   components explained 55% and 19% of the variability (74% in total;
   [86]Figure 2A). This suggests that the expression profiles of CTCs were
   similar between localized and metastatic tumors. To investigate the
   dynamic changes in epithelial and mesenchymal markers of CTCs during
   cancer metastasis, we compared EMT genes (identified by Tan
   et al.[87]^18) in CTCs and epithelial cancer cells ([88]materials and
   methods). We separately screened EMT genes that were differentially
   expressed in epithelial cancer cells and CTCs ([89]Figures S2C and S2D;
   [90]Table S2). We found 28 differentially expressed EMT genes in duct
   PDAC, consisting of 11 epithelial genes and 17 mesenchymal genes
   ([91]Figure 2B). Most of the differentially expressed EMT genes
   demonstrated an upregulation trend, with more mesenchymal genes being
   upregulated than epithelial genes. Moreover, out of all the
   differentially expressed EMT signatures, 10 epithelial genes, and 8
   mesenchymal genes were found to be differentially expressed in both
   epithelial duct samples and CTCs ([92]Figures 2C and [93]S2E). The
   result showed opposite expression patterns of the same gene in CTCs and
   epithelial cells: TACSTD2, EHF, and OR7E14P in epithelial genes and
   VCAM1, DDR2, SFRP1, and ECM2 in mesenchymal genes. This implies EMT
   dynamics in cancer epithelial cells. We observed that differentially
   expressed EMT genes were shared across multiple biological processes.
   Specifically, the differentially expressed EMT genes were significantly
   enriched in terms associated with the activation of the EMT process
   during metastasis. This includes activities related to cadherin
   binding, integrin binding, and functioning as extracellular matrix
   structural constituents ([94]Figure 2D).

Figure 2.

   [95]Figure 2
   [96]Open in a new tab

   EMT phenotypes of mRNA across different PDAC patient groups

   (A) Principal-component analysis between CTCs and normal PBMC samples.
   (B) Differentially expressed EMT gene counts in CTCs and duct PDAC
   compared with normal samples. (C) Heatmap showing the log[2] fold
   change values for differentially expressed epithelial (top) and
   mesenchymal genes (bottom) shared in CTCs and duct PDAC. (D) Cnet plot
   representing the top 7 Gene Ontology enriched terms of differentially
   expressed EMT genes. Enriched terms with high similarity were clustered
   and rendered as a network. Each gene node is colored according to its
   EMT type. The node size of a term represents the number of enriched
   genes and is colored by differentially expressed epithelial genes (EPI)
   and mesenchymal genes (MES) in CTCs and ductal PDAC, respectively. (E)
   Analysis of the correlation between epithelial score (E score) and
   mesenchymal score (M score). Points are colored by different groups.
   (F) Distributions of EM score within the duct and CTC. The distribution
   of the EM score from the CTC is colored in blue, while the distribution
   of the EM score from the duct is colored in green. The black dashed
   lines represent the intersection of the CTC and duct EM score
   distribution curves. The red dots indicate the corresponding EM score
   at the intersection point. (G) Quantification of different patient
   groups in epithelial, partial, and mesenchymal phenotypes. (H) Boxplot
   showing EM scores in different CTC and duct groups. (I and J)
   Distribution of EMT status in CTC and duct groups. (K) Overview diagram
   showing the dynamic changes of the EMT stage from the primary site,
   through the circulation to the metastatic site. The color and shape of
   each point are determined by the group type and display the EM score of
   the relevant sample.

   To evaluate the epithelial stability of PDAC epithelial ductal cells
   and CTCs, we calculated the epithelial score (EPI) and mesenchymal
   score (MES) for each sample on the basis of the different expressed EMT
   genes ([97]materials and methods). Our results show an increase in both
   the EPI and MESs in the CTCs, with the CTC samples tending to have high
   MESs and low EPIs, which is not significant in ductal samples
   ([98]Figures 2E and [99]S2F). To verify the universality of this
   phenomenon, we carried out the same analysis process with these
   differentially expressed EMT genes in two other datasets (one is the
   CTC dataset, and the other is the PDAC dataset from The Cancer Genome
   Atlas [TCGA]). The results were consistent with our analysis, showing
   an increase in both the EPI and MES in the CTCs, with the CTC samples
   tending to have higher MESs and lower EPIs, and in the TCGA datasets,
   while the EPIs remained unchanged, the range of MESs changed greatly,
   implying that the degree of mesenchymal differentiation in tumor
   tissues is heterogeneous ([100]Figure S2G). These findings confirm that
   entry into circulation is a complex event involving alterations in EMT
   signatures that are crucial for the distant spread of cancer cells.

   To measure the overall EMT dynamics from the epithelium to the
   mesenchyme, we generated an EM score for each sample on the basis of
   these differentially expressed EMT genes ([101]materials and methods).
   The distribution of EM scores for both CTC and epithelial duct samples
   shows that scores for CTC are concentrated primarily between 0 and 0.7,
   whereas epithelial duct samples predominantly score near −1 and 1
   ([102]Figure 2F). Then, we subdivided them into epithelial, pEMT, and
   mesenchymal phenotypes on the basis of the cutoff at the intersection
   of the epithelial duct and CTC density curve ([103]Figure 2G;
   [104]Table S3). Normal PBMCs predominantly exhibit a pEMT phenotype,
   while all normal ducts present an epithelial phenotype ([105]Figures 2H
   and [106]S2H). This suggests that samples in epithelial phenotype are
   less mobile and more structured. We also found that the transition from
   pEMT to a fully mesenchymal phenotype was accompanied by an increase in
   the epithelial duct and a decrease in CTCs ([107]Figure S2I). In CTC
   samples, we found that locPDAC accounted for a slightly higher
   proportion than metPDAC in the full mesenchymal phenotype
   ([108]Figure 2I). In addition, we observed that locPDAC epithelial duct
   samples contributed to 25% of both the epithelial and mesenchymal
   phenotypes, whereas metPDAC epithelial duct samples predominantly
   exhibited the mesenchymal phenotype ([109]Figure 2J). The result
   suggests a distinction in the behavior between localized and metastatic
   forms of PDAC and implies some level of dynamic balance between these
   cell types during the process of EMT ([110]Figure 2K). The EMT score
   measures the potential for metastasis and could also be connected to
   disease development. To determine whether this was a common occurrence
   in pancreatic cancer CTCs, we validated this hypothesis on three
   external datasets including CTC and duct tumor samples. The results are
   consistent with our analysis that CTC samples predominantly display
   pEMT and mesenchymal phenotype, duct normal samples are largely
   epithelial phenotype, and duct tumor samples vary across different EMT
   stages ([111]Figure S2J). We analyzed the overall survival of TCGA
   patients, and a higher EM score was associated with poorer overall
   survival ([112]Figure S2K). In addition, we also compared the
   correlation between EM score and the expression of shared EMT genes
   with opposite expression patterns, and we found that the expression of
   Mesenchymal gene VCAM1 and epithelial gene TACSTD2 were significantly
   correlated with EM score ([113]Figure S2L). Evidence has shown that
   CTCs can cluster with neutrophils, enhancing proliferation and
   promoting metastasis. Neutrophils interact with CTCs through binding to
   highly expressed adhesion molecules on tumor cells, such as
   VCAM1.[114]^19 TACSTD2 overexpression is related to increased growth
   and metastasis of pancreatic carcinoma.[115]^20 These results suggest
   that the acquisition of mesenchymal features by epithelial cells is
   closely associated with the migration of CTCs, leading to cancer
   progression.

A range of AS events occur in CTCs during EMT

   AS is an important method of regulating gene expression that expands
   the cell proteome. After demonstrating the significance of the EMT
   program in the tumor cycle, our next step was to identify the splicing
   regulators responsible for it at the exon level. Instead of using blood
   samples as a control, normal epithelial cells were extracted from bulk
   RNA-seq to investigate the intrinsic differences between CTCs and
   epithelial cells ([116]materials and methods). We first analyzed the
   gene coverage of both the CTC RNA-seq and bulk RNA-seq data. This
   comparison confirms that the gene coverage across these two datasets is
   sufficiently similar to permit valid comparative analysis
   ([117]Figure S3A). To quantify the contribution of AS to dynamic
   transcript accumulation during the emergence of EMT, we calculated the
   use of alternative exons using the rMATS statistical model. This model
   estimates the abundance of each exon and assigns it a “percentage
   splicing in” (PSI) value (Ψ) to calculate the use of alternative exons.
   We identified five types of AS events, including exon skipping,
   mutually exclusive exons, alternative 5′ splice sites, alternative 3′
   splice sites, and intron retention. On the basis of the location of
   each AS event, we obtained the intersection of events in CTC samples
   and normal epithelial ducts ([118]Figure S3B). We then identified
   differential AS events in epithelial, pEMT, and mesenchymal CTCs,
   respectively, compared with normal epithelial duct samples (adjusted
   p < 0.05 and delta PSI > 0.1; [119]Table S4). The relative proportions
   of differential AS event types remained stable when comparing
   epithelial, pEMT, and mesenchymal types ([120]Figures 3A, [121]S3C, and
   S3D). Notably, some of the differential AS events identified in our
   results have been reported as indeed existing. For example, the SE
   splicing event associated with the RBP gene MBNL1, located at
   chr3:152445281-152445539+, was experimentally verified by Chu
   et al.[122]^21 Similarly, the SE splicing event related to cell surface
   protein FLNA at chrX:154352552-154352675:- in exon 40 was also
   validated by Oda et al.[123]^22 GO analysis of these events revealed
   their association with signaling pathways related to protein metabolism
   (e.g., organelle biogenesis and maintenance, protein localization) and
   tumor growth factor (e.g., apoptotic signaling pathway, establishment
   of organelle localization), as shown in [124]Figure 3B. On the basis of
   our calculation of the number of events related to those genes, we
   discovered that on average, 20% of genes generated multiple isoforms,
   indicating transcriptional and post-transcriptional diversity within
   CTCs in different EMT states ([125]Figure 3C). We then conducted a
   detailed analysis to explore the composition of those alternative
   spliced genes. To explore the relationship between differential AS
   events and DEGs, we analyzed the DEGs in epithelial, pEMT, and
   mesenchymal phenotype compared with normal epithelial duct samples. We
   found that a few of the genes were both differentially spliced and
   expressed, whereas the majority of genes were alternatively spliced
   without expression changes ([126]Figure S3E). In subsequent analyses,
   we only retained events whose related gene showed no significant
   changes in expression between CTCs and normal epithelial ducts
   ([127]Figure 3D). To gain insights into the regulation of AS and
   functional relationships among genes, we explored the chromosomal
   distribution of these events. Our analysis revealed that relatively
   long chromosomes, such as chr1, chr2, and chr3, had a larger number of
   splicing events compared with other chromosomes in A5SS and MXE,
   suggesting that these chromosomal regions display increased
   transcriptional and post-transcriptional activity ([128]Figures 3E and
   [129]S3F). We went on to compare the delta PSI of differential AS
   events among epithelial cells, pEMT, and stromal cells. We found those
   AS events were different across the EMT stage. These differences were
   predominantly driven by strong relative enrichment of epithelial and
   mesenchymal, compared with their low abundances in pEMT. The relative
   abundances of RI events increased progressively from mesenchymal to
   epithelial ([130]Figure 3F). In addition, we found that multiple
   differential AS events were significantly associated with survival
   outcomes ([131]Figure S3G). Together, the differential AS events
   clearly show the necessity of discriminating AS-specific roles for a
   full understanding of the transcriptional landscape of EMT.

Figure 3.

   [132]Figure 3
   [133]Open in a new tab

   Differential alternative splicing events during EMT in CTCs

   (A) The distribution of AS types in EMT-specific AS events, comparing
   epithelia, pEMT, and mesenchymal CTCs. (B) Enrichment network
   representing the top 8 significantly enriched terms of genes related to
   differential AS events (p < 0.05). Enriched terms with high similarity
   were clustered and rendered as a network, while each node represents an
   enriched term and is colored according to its cluster. Node size
   indicates the number of enriched genes. (C) Bar graph showing the
   proportion of genes associated with a single event or multiple events.
   (D) The count of differential AS events without differences at gene
   level among epithelial, pEMT, and mesenchymal CTC. (E) Distribution of
   delta PSI of shared SE events across chromosomes. (F) The ternary phase
   diagram reveals variations in epithelial, pEMT, and stromal AS events,
   analyzed using delta PSI. Each point, colored per AS type, signifies
   genes: small dots for those associated with a unique splicing event,
   and larger ones for genes linked to multiple events. (G) The count of
   differential AS events related to cell surface proteins. (H) The
   frequencies of exon use in differential AS events related to cell
   surface proteins. (I) Cell surface protein genes associated with
   multiple differential AS events in epithelial, pEMT, mesenchymal, and
   normal duct samples. (J) The pie chart shows the topology present in
   ORF transcripts affected by differential AS events related to cell
   surface protein genes. (K and L) Examples showing topology and domains
   in ORF transcripts, such as in LMBRD1-229 (K) and MCAM-201 (L).

   Cell surface proteins that are specifically altered splicing in tumor
   cells are ideal targets to minimize off-target effects and are
   important markers for the detection and isolation of CTCs. Our analysis
   of splicing data revealed a total of 89 aberrant splicing events
   associated with 77 cell surface proteins among the differential AS
   events ([134]Figure 3G). To identify retained exons in these cell
   surface protein-related AS events, we used a threshold of ΔPSI > 0.1,
   while skipped exons were defined as those with ΔPSI < −0.1. The result
   showed that there exhibited a higher number of retained exons, which
   could potentially impact isoform length ([135]Figure 3H). Among them,
   we found 12 cell surface protein genes associated with multiple AS
   events, which might increase the number of inferred neoantigens in
   metPDAC ([136]Figure 3I). On the basis of the location of these cell
   surface protein gene-related events, we found that they affected 124
   open reading frame (ORF) regions. These regions are associated with 22
   different transcripts, which could potentially affect the reading frame
   and may serve as suitable therapeutic targets ([137]Table S5). To
   further screen for PDAC-specific targets in cancer-specific AS events,
   we then analyzed the topology of those 22 related transcripts. We
   identified 25 distinct topologies, 6 of which were associated with the
   extracellular domain ([138]Figure 3J). Furthermore, transcripts contain
   structures with multiple topologies that may influence cellular
   function, possibly leading to changes in phenotype or even disease
   conditions ([139]Figures 3K and 3L). However, we believe that this is a
   cautious estimate as junction-spanning polypeptides derived from AS are
   inadequately represented because of trypsin’s cleavage properties.
   Consequently, all of the RNA-level candidates we have identified might
   be considered appropriate for additional validation and can be
   developed further as targets for immune therapy. Together, these data
   indicate a specific and sustained requirement for diverse transcripts
   of cell surface protein during all stages of EMT.

AS of RBP orchestrates the EMT-specific AS events

   After highlighting the significance of particular AS events in EMT, our
   subsequent objective was to pinpoint the splicing regulators
   accountable for them. Our focus was on skipped exon events, which are
   one of the most common types of AS. We used the online web server
   rMAPS2 (RNA Map Analysis and Plotting Server 2) to identify
   discriminative cis-regulatory sequence motifs enriched in the cassette
   exons that are preferentially included in epithelial, pEMTs, and
   mesenchymal phenotype, as well as motifs in their flanking introns
   ([140]materials and methods).[141]^23 These cis-regulatory motifs
   corresponded to predicted RBPs. To investigate the association between
   AS and RBPs, we analyzed the correlations between the gene expression
   levels of RBPs and the PSI values of exon-skipped events. We sorted the
   significantly correlated RBPs according to the absolute value of
   correlation and took the top 10 RBPs for downstream analysis
   (correlation R is greater than 0.6, and the p value is less than 0.05;
   [142]Figures S4A‒S4C). We observed that the well-conserved binding
   motifs of these RBPs were significantly over-represented in the
   transcript sequences of flanking the skipped/retained exons compared
   with non-differentially spliced exons ([143]Figures 4A–4C). Most RBPs
   were common among epithelial, pEMT, and mesenchymal, such as DAZAP1,
   FXR1, G3BP2, and HNRNPH2 ([144]Figure S4D). It is worth noting that
   when the same RBPs bind to the same RNA target sequence but possess
   distinct biochemical properties, it may result in varying splicing
   regulatory outcomes.[145]^24 As an example, in the case of
   pEMT-specific skipping events (ΔPSI > 0.1), we identified the HNRNPH2
   motif occurring separately in the intron sequence located 250 bp
   upstream or downstream of the target exon sequence ([146]Figure 4B). In
   contrast, for epithelial and mesenchymal skipping events (ΔPSI > 0.1),
   this motif was absent in the intron sequence found 250 bp downstream of
   the target exon sequence ([147]Figures 4A and 4C). The results suggest
   that splicing factors have a common regulatory effect on AS patterns
   and that RBPs have potential position-dependent effects on exon
   splicing in both the process of pEMT and mesenchymal phenotype.

Figure 4.

   [148]Figure 4
   [149]Open in a new tab

   RBP regulated exon inclusion in CTCs during EMT

   (A‒C) The differential enrichment of RBPs between exon skipped
   (ΔPSI < 0) and exon inclusion (ΔPSI > 0). The motif sequences of the
   corresponding RBP are labeled on the right. Cell colors represent the
   smallest p value less than 0.05 for each enriched region, while blanks
   represent non-significant enrichment. (D) Bar graphs show expression
   levels of MBNL1 across epithelial, pEMT, mesenchymal, and normal duct
   samples. (E and F) The boxplot on the left represents the PSI of MBNL1.
   The genomic location of the alternative exon was labeled at the bottom.
   Sashimi plots on the right show the alternative spliced exons according
   to genomic locations. Junction reads are plotted as arcs whose width is
   determined by the number of reads aligned to the junction spanning the
   exons connected by the arc. (G) Positional distribution of
   MBNL1-binding motifs of epithelial (top), pEMT (middle), mesenchymal
   (bottom) CTC in SE events. Motif enrichment scores (solid line) and p
   values (dashed line) were plotted according to AS event positions.
   Arrows indicate peaks of enrichment for exons. (H) Bar plot showing
   isoform use of MBNL1. Data were analyzed using the Mann-Whitney
   two-sided test. ∗p ≤ 0.05, ∗∗p ≤ 0.01, ∗∗∗p ≤ 0.001, and ∗∗∗∗p ≤
   0.0001.

   Notably, the relative expression level of MBNL1 was not significantly
   different between pEMT PDAC, mesenchymal PDAC, and normal epithelial
   cells ([150]Figure 4D). However, MBNL1 generated distinct transcripts
   through its unique AS mechanism ([151]Figures 4E and 4F). Moreover,
   MBNL1 influences the recruitment of other splicing factors and the
   spliceosome, leading to its direct regulation of AS. Therefore, we have
   selected MBNL1 for further investigation to understand its role in
   epithelial, pEMT, and mesenchymal emergence. To investigate the effect
   of MBNL1 on AS changes between CTC and normal epithelial cells, we
   examined the positional distribution of MBNL1-binding motifs. Our
   analysis revealed a significant overrepresentation of the MBNL1 binding
   motif in the flanking retained exons of CTCs compared with
   non-differentially skipped exons. Additionally, we observed significant
   differences in the positioning of MBNL1 binding sites within CTC
   regions undergoing AS changes ([152]Figure 4G). In addition, we further
   analyzed the differential transcript use of MBNL1. Although there were
   no evident alterations in the expression levels of MBNL1, we noticed
   considerable variation in the use of the MBNL1-018 transcript isoform
   across epithelial, pEMT, and mesenchymal CTCs. On the other hand,
   significant changes in the MBNL1-205 transcript were only seen in
   epithelial CTCs ([153]Figure 4H). Collectively, these findings suggest
   that as EMT evolves step by step, it is accompanied by stage-specific
   variations in transcript diversity. These findings suggest that MBNL1
   can differentially regulate the transition among epithelial, pEMT, and
   mesenchymal phenotypes through AS, independent of gene expression,
   potentially affecting CTC remodeling and differentiation into pEMT and
   mesenchymal subtypes.

Stop codon readthrough affects the diversity of therapeutic targets in CTCs

   To provide a comprehensive understanding of the complexity of a given
   AS event, we analyzed changes in AS event modality relative to the
   stepwise progression from normal epithelial cells to pEMT and finally
   to mesenchymal. Our analysis began by assigning each AS event to one of
   three modalities: (1) included, wherein exons are entirely preserved
   (Ψ > 0.95); (2) excluded, wherein exons are fully skipped (ψ < 0.05);
   and (3) middle, wherein an exon may be included or skipped in RNA
   molecules transcribed from the gene (0.05 < Ψ < 0.95). Our results
   reveal that nearly 90% of AS events exhibited modality switches from
   the norm to epithelial phenotype, while 28.2% of AS events showed
   modality changes from epithelial to pEMT, and 35.6% of them changed
   from pEMT to mesenchymal phenotype ([154]Figures 5A–5C). We observed
   that among the modality change AS events, the majority were attributed
   to the middle mode. The middle mode represents an exon that skips over
   a percentage of transcripts while including the remaining portions. As
   various combinations of exons in a gene can generate multiple protein
   isoforms with unique functions or properties, the incorporation of
   these exons may play a critical role in determining the commitment of
   cells toward pEMT fate ([155]Figure 5D). We found that CYB5R3-related
   splicing events have different patterns in different EMT stages
   ([156]Figure 5E). CYB5R3 is an important component in maintaining
   glucose use, where it interacts with glucose to stabilize
   glucokinase.[157]^25 We found that the transcript of CYB5R3 has an
   extracellular topology domain, which may play a crucial role in
   processes like signal transduction. At the same time, compared with
   normal duct cells, we found that CYB5R3-214 showed significant isoform
   use differences during EMT ([158]Figure 5E). Our findings reveal a
   clear pattern that identifying the CTC-specific isoform may lead to the
   discovery of specific therapeutic targets.

Figure 5.

   [159]Figure 5
   [160]Open in a new tab

   Genes have alternative splice variants containing different stop codons

   (A‒C) Bar plot showing the proportion of AS events with (modality
   change) or without (no modality change) modality changes between
   neighboring developmental stages during EMT. (D) Sankey diagram showing
   the AS modality transition from norm to epithelial, then to pEMT, and
   finally to mesenchymal. (E) Isoforms related to CYB5R3. Top: two
   analyzed isoforms for the CYB5R3 gene: PFAM domains are indicated by
   black. Extracellular topology colored in green. UCSC gene transcript
   IDs are shown for each isoform. Bottom: the average isoform fraction in
   epithelial (left), pEMT (middle), and mesenchymal (right), compared
   with the norm duct. Data were analyzed using the Mann-Whitney two-sided
   test. ∗p ≤ 0.05, ∗∗p ≤ 0.01, ∗∗∗p ≤ 0.001, and ∗∗∗∗p ≤ 0.0001. (F)
   Circos plot of PTC (TAA, TAG, and TGA) signals in overall shared SE AS
   events. (G) Bar plot showing the frequency of SE AS events with
   different types of PTCs. (H) Distribution of different types of PTCs in
   shared SE AS events. (I) The frequencies of each nucleotide are plotted
   for all positions 40 nt upstream to 60 nt downstream of the stop codon.
   Nucleotides are plotted in order of increasing frequencies.

   Exon skipping and retention not only alter transcript length but also
   introduce premature termination codons (PTCs) within gene coding
   sequences.[161]^26 This can activate the evolutionarily conserved
   nonsense-mediated decay machinery, which guards against the negative
   consequences of mRNAs harboring PTCs.[162]^27 To comprehensively
   compare the impact of stop codon identity on these splicing events, we
   conducted a systematic analysis by tallying the occurrences of each
   possible PTC, including TAA, TAG, and TGA ([163]Figure 5F).
   Importantly, our analysis revealed a high frequency of single events
   harboring multiple PTCs, with longer exons exhibiting an increased
   number of PTCs ([164]Figures 5G and 5H). Given the known influence of
   C’s and G’s in promoting efficient translation termination compared
   with A’s and U’s, we investigated the nucleotide use within 40 nt
   upstream and 60 nt downstream of the stop codon in protein-coding
   transcripts ([165]Figure 5I). Our investigation revealed that several
   patterns in this region demonstrate a repeated three-nucleotide
   periodicity. Notably, we observed a significant enrichment of C and G
   nucleotides near the PTC. However, as the distance from the PTC
   increases, the frequency of C and G nucleotides decreases. Taken
   together, it appears that evolution has intricately refined PTC to
   enhance translation termination efficiency. Targeting crucial stop
   codons for readthrough is a promising approach that may be achieved
   without disrupting general translation termination. This trait could
   potentially provide therapeutic benefits when dealing with crucial
   genes identified to have PTCs.

Discussion

   In this study, we provide quantitation of the epithelial to EMT
   continuum of CTCs in PDAC, covering localized and metastatic sites. Our
   findings reveal that CTCs, especially those obtained from metastatic
   samples, can destabilize the epithelium by preferentially expressing
   mesenchymal genes. This could lead to broad perturbations in the
   metastatic process through multiple pathways. Additionally, a
   genome-wide exon-based analysis of CTCs showed considerable variation
   between different EMT stages, allowing for investigation into the
   identification of cell-type-specific isoforms, AS events, and an
   unknown regulatory mechanism of EMT formation. We identified 77
   splicing events in cell surface proteins that were specifically
   expressed in epithelial, pEMT, and mesenchymal states, respectively,
   which may represent potential targets for therapeutic intervention.
   These results suggest that abnormal AS is prevalent in EMT progression,
   with a complex mode switch from epithelial to pEMT and then to
   mesenchymal. Furthermore, we discovered that MBNL1 significantly
   regulates frequent differential AS switches in the epithelial, pEMT,
   and mesenchymal stages through AS without appreciable changes at the
   gene expression level. Finally, upon examination of the epithelial,
   pEMT, and mesenchymal termination codons, we observed a high frequency
   of single events harboring multiple PTCs. This finding could
   potentially lead to transcript diversity and contribute to the
   identification of useful markers for diagnosis and prognosis.

   A previous study discovered that the EMT program is highly plastic and
   demonstrated that epithelial stabilization leads to enhanced collective
   migration of epithelial cancer cells, as evidenced by the use of
   single-cell RNA-seq and genetic mouse models.[166]^28 Epithelial cells
   undergo a process known as EMT, during which they become more mobile
   and invasive by losing their normal cell-cell adhesion and polarity.
   This allows them to break off from the main tumor and enter the
   bloodstream as CTCs.[167]^29^,[168]^30 We explored epithelial stability
   during tumor metastasis to identify disease-associated
   cell-type-specific EMT gene signatures. The analysis of the molecular
   features that distinguish CTCs is an outstanding tool to provide
   insights into the understanding of crucial aspects of the metastatic
   process and could foster use in liquid biopsy-based personalized cancer
   treatment. On the basis of the EMT signature, our results showed that
   the EMT program of CTCs is highly heterogeneous, with many cells
   present in pEMT and mesenchymal states, whereas epithelial cancer cells
   acquire a more stable epithelial phenotype away from pEMT and
   mesenchymal states. Our results are consistent with previously reported
   that the EMT program of cancer cells exhibits molecular alterations,
   including decreased expression of epithelial markers (E-cadherin, ZO-1,
   claudins, and occludins) and increased expression of mesenchymal
   markers (vimentin, N-cadherin, fibroblast-specific protein1, and
   fibronectin).[169]^31^,[170]^32 Notably, we observed that both normal
   epithelial cells and healthy donor samples from blood fluctuated in a
   small range of EM scores, whereas epithelial cancer cells and CTCs
   demonstrated distinct polarization. These results suggest that
   epithelial cells can undergo EMT with broad effects. In addition,
   pEMT-phenotype CTCs were more similar to health donors, suggesting that
   this pEMT intermediate cell state may therefore be more stable than the
   other transition states and provided evidence of a metastatic advantage
   for pEMT CTCs, which was supported by previous
   studies.[171]^33^,[172]^34 Overall, our analysis of EMT signals of
   epithelial cells and CTCs establishes the importance of epithelial
   cells losing their intercellular adhesion and acquiring mesenchymal in
   supporting metastatic dissemination and highlights that pEMT-phenotype
   CTCs and mesenchymal CTCs had different metastases capacity.

   AS is a common process in which transcribed mRNAs generate distinct
   transcript isoforms, leading to protein products that possess unique
   functional properties.[173]^35 However, the current understanding of
   how to monitor disease progression from CTCs to metastasis is primarily
   derived from the observations at the gene expression level.[174]^36
   Ignoring transcriptional isoforms in transcriptome analyses can lead to
   incomplete information about the complex process of EMT. Furthermore,
   it is common for previous studies of CTCs from PDAC patients to use
   purified healthy donor volunteer blood as a control, while the cells in
   purified blood from a healthy donor include red blood cells, WBCs (such
   as lymphocytes and neutrophils), and
   platelets.[175]^4^,[176]^36^,[177]^37^,[178]^38 Nonetheless, one main
   concern is that CTCs are rare cells that have detached from the
   localized tumor and entered the bloodstream, and they may provide a
   closer comparison with the epithelial cells of the localized tumor
   compared with other circulating cells in the bloodstream, including
   their genetic and molecular profile, as well as their morphology and
   behavior. Therefore, using healthy donor blood as a control might
   potentially mask important differences in EMT changes between CTCs and
   healthy individuals. Here, we describe AS changes in epithelial-type
   CTCs, pEMT-type CTCs, and mesenchymal-type CTCs using normal epithelial
   samples as controls to gain a deeper understanding of the unique
   characteristics of each CTC subtype.

   We identified multiple splicing events in cell surface proteins that
   are suitable targets for future research. It has been reported that
   abnormal splicing events provided the corresponding putative AS-derived
   neoepitopes of 16 cancer types.[179]^36 Lin et al.[180]^36 illustrated
   that many highly tumor-specific AS events are putative targets for TCR
   T cell approaches in the GBM. MBNL1 is well known to regulate RNA AS,
   localization, and decay and drives tumor dedifferentiation as well,
   implicating their activity to some extent in the relapse and distant
   metastasis.[181]^39^,[182]^40^,[183]^41 We found that gene MBNL1, with
   the absence of appreciable changes in the gene expression, produced
   significantly different AS events in CTC. Notably, we found a large
   subset of epithelial-, pEMT-, and mesenchymal-specific AS events were
   predicted to be under the control of MBNL1. Genome-wide transcriptomic
   analysis of the MBNL1 gene has revealed a potential close relationship
   between MBNL1 and breast cancer metastatic colonization.[184]^42
   Together, our findings provide further evidence that specific AS
   isoforms might have key regulatory roles independently of expression
   alterations and are a necessary first step for understanding the
   isoform-specific functions of key proteins.

   The premature stop codons may lead to mRNA destruction through
   nonsense-mediated mRNA decay or the production of a truncated
   protein.[185]^42 Using the modality strategy, we explored the
   distribution and variation of AS events across different cell
   populations to reveal exon inclusion bias during EMT. We discovered a
   high frequency of single events harboring multiple PTCs that might
   support models of translation initiation inhibition. This highlights
   the significance of the stop codon context in regulating translation
   termination. However, achieving specificity for PTC readthrough poses a
   significant challenge for nonsense suppression therapies, requiring a
   deeper understanding of translation termination and stop codon
   readthrough in varied sequence contexts.[186]^43^,[187]^44 Despite this
   challenge, such readthrough events can be beneficial in therapy when a
   critical gene contains a PTC.

   In summary, this study suggests that the EMT program at the exon level
   within the CTC is heterogeneous and contributes to the functional
   annotation of the splicing factor by indicating the necessity for
   studying alternatively spliced isoforms during transcriptional
   analysis. These findings have allowed us to explore previously
   unidentified transcriptional processes involved in tumor metastasis and
   offer insights into potential therapeutic interventions.

Materials and methods

Data acquisition

   We downloaded 3 PDAC single-cell RNA-seq (scRNA-seq) datasets
   ([188]GSE156405
   , [189]GSE81547, and [190]GSE155698), 3 CTC RNA-seq datasets
   ([191]GSE144561, [192]GSE40174, and [193]GSE114704), and 2 bulk RNA-seq
   datasets ([194]GSE57973 and [195]GSE193268) from the Gene Expression
   Omnibus (GEO) ([196]https://www.ncbi.nlm.nih.gov/geo/). The
   corresponding RNA-seq data were obtained from the TCGA data portal
   ([197]https://tcga-data.nci.nih.gov/tcga/). Datasets, including GEO:
   [198]GSE156405, [199]GSE81547, [200]GSE155698, [201]GSE144561,
   [202]GSE193268, and [203]GSE57973, were used for analysis.
   Datasets,including GEO: [204]GSE40174 and [205]GSE114704 and TCGA
   samples were used for validation purposes ([206]Table S1).

Single-cell data quality pre-processing and initial cell type discovery

   Quality control and downstream analysis were performed using Seurat
   version 3.2.3.[207]^45 Each scRNA-seq dataset contained a single-cell
   transcriptome count matrix. Samples were analyzed after excluding
   low-quality cells (<1,000 genes/cell and >20% mitochondrial genes). The
   harmony algorithm was used to integrate single cells from different
   platforms and tissues, followed by the construction of a K-nearest
   neighbors (KNN) graph using the FindNeighbors function in Seurat.
   Finally, cell clustering and UMAP visualization were performed using
   the FindClusters and RunUMAP functions, respectively. The annotations
   of cell identity on each cluster were defined by the expression of
   known marker genes.

Inference of CNV from scRNA-seq data

   To distinguish cancer cells from normal ones, chromosomal gene
   expression was perturbed to infer CNV aberrations. For the accuracy of
   cancer cell inference, we have compared the results of two methods:
   inferCNV and CopyKat.[208]^46^,[209]^47 First, we separately used
   CopyKat and inferCNV to compute the signal for copy number profiles
   from the gene expression matrix. To compare the results of these two
   methods, we calculated the CNV scores of all cells using the method
   from previous studies.[210]^48 Because of the algorithmic differences
   between the two methods, we scaled the CNV score. We then arranged the
   cells in ascending order and defined the high and low of the CNV score
   with 25% as the cutoff value. We found that in the results of CopyKat,
   there is a higher proportion of high scores in the epithelial cells,
   while in immune cells, there is a higher proportion of low scores. On
   the other hand, the results from InferCNV present similar proportions
   for both types of cells. Therefore, in subsequent analyses, we adopted
   the predictions of CopyKat. Second, we excluded cells that were not
   predicted and the diploid cells that were predicted to be normal
   epithelial or immune cells. Third, we calculated the mean square and
   the correlation of CNV per cell with the average of the top 5% of cells
   from the aneuploid single cells. This allowed us to detect cancer cells
   and enhance the precision of our findings. Finally, cells specifically
   with a correlation or CNV score greater than 0.2, are categorized as
   cancer cells. Considering that P5_GES156405 contained very few
   malignant epithelial cells, this sample was removed from subsequent
   analyses.

Bulk RNA-seq quality pre-processing

   The CTC datasets read were aligned to the GRCh38 human reference genome
   from the University of California, Santa Cruz
   ([211]http://genome.ucsc.edu), using the STAR version 2.7.4a aligner
   with default settings (Alex Dobin;
   [212]https://github.com/alexdobin/STAR).[213]^49 We used ESTIMATE to
   evaluate tumor purity.[214]^50 The formula for calculating
   ESTIMATE-based tumor purity was developed as follows: Tumor_purity =
   cos(0.6049872018 + 0.0001467884 × ESTIMATEScore). We retained 30 CTC
   samples with a tumor purity greater than 50%, and 15 normal samples
   with a purity less than 50%. To further check the tumor purity of the
   sample, we analyzed the immune cell composition in the selected CTC
   samples, as inferred by the CIBERSORT algorithm.[215]^51

Differential expression analysis

   To perform sample-level differentially expressed EMT genes in
   single-cell PDAC samples and CTC samples in parallel, we used a
   pseudo-bulk approach for single-cell data. We use the functions from
   the SingleCellExperiment package to extract the counts and metadata of
   single-cell data.[216]^52 To calibrate the pseudo-bulk data, we added
   two bulk RNA-seq datasets as references, namely, [217]GSE193268, which