Abstract

   Long-read RNA sequencing has transformed transcriptome analysis by
   enabling comprehensive mapping of full-length transcripts, providing an
   unprecedented resolution of transcript diversity, alternative splicing
   and transcript-specific regulation. In this study, we employed nanopore
   long-read RNA sequencing to profile the transcriptomes of three cell
   types commonly used to model brain disorders, human fibroblasts,
   induced pluripotent stem cells and stem cell-derived cortical neurons,
   identifying extensive transcript diversity with 15 072 transcripts in
   stem cell-derived cortical neurons, 13 048 in fibroblasts and 12 759 in
   induced pluripotent stem cells. Our analyses uncovered 35 519
   differential transcript expression events and 5135 differential
   transcript usage events, underscoring the complexity of transcriptomic
   regulation across these cell types. Importantly, by integrating
   differential transcript expression and usage analyses, we gained deeper
   insights into transcript dynamics that are not captured by gene-level
   expression analysis alone. Differential transcript usage analysis
   highlighted transcript-specific changes in disease-relevant genes such
   as APP, KIF2A and BSCL2, associated with Alzheimer’s disease, neuronal
   migration disorders and degenerative axonopathies, respectively. This
   added resolution emphasizes the significance of transcript-level
   variations that often remain hidden in traditional differential gene
   expression analyses. Overall, our work provides a framework for
   understanding transcript diversity in both pluripotent and specialized
   cell types, which can be used to investigate transcriptomic changes in
   disease states in future work. Additionally, this study underscores the
   utility of differential transcript usage analysis in advancing our
   understanding of neurodevelopmental and neurodegenerative diseases,
   paving the way for identifying transcript-specific therapeutic targets.

1. Introduction

   Cellular differentiation is a complex process where cells transition
   from a pluripotent state to specialized mature phenotypes. This
   developmental continuum, from pluripotent stem cells to differentiated
   cell types like fibroblasts or neurons, is critical for understanding
   developmental biology, disease mechanisms and providing therapeutic
   innovations [[44]1–[45]3]. Likewise, in translational research,
   patient-derived fibroblasts are a more easily accessible cell source
   that can be reprogrammed into induced pluripotent stem cells (iPSC) and
   further differentiated into otherwise inaccessible cell types—opening
   new avenues for disease modelling and therapeutic discovery. Robust
   cell systems have been developed to generate various cell types in
   vitro, including different types of neurons [[46]4–[47]8].

   In the study of neurodevelopmental and neurodegenerative diseases,
   human fibroblasts are widely used as an accessible and clinically
   relevant somatic cell type, particularly when patient-derived neuronal
   tissue is unavailable. Hence, fibroblasts are often used as a proxy to
   study neuronal pathophysiology due to their availability and ease of
   handling. Additionally, fibroblasts can be reprogrammed into iPSC,
   which can subsequently be differentiated into various neural lineages,
   including cortical neurons [[48]9]. This
   fibroblast-to-iPSC-to-iPSC-derived cortical neuron (iCN) model has been
   extensively employed to investigate transcriptomic reprogramming and
   neuronal pathophysiology, especially in patient-derived disease models
   [[49]9–[50]12]. Despite their widespread use, our understanding of the
   differences between fibroblasts and neuronal cell types in terms of
   transcript regulation remains limited. While other ectoderm-derived
   lineages (e.g. surface ectoderm or ectomesenchyme) may offer
   developmental context [[51]13,[52]14], our focus is on capturing
   transcriptomic transitions across the full reprogramming and
   differentiation trajectory. Moreover, isoform-level comparisons between
   fibroblasts and iPSC-derived neurons remain underexplored, and our
   study aims to fill this gap using long-read (LR) RNA sequencing.

   Transcriptome studies in most neuronal models have been limited to gene
   expression studies, which often provide only broad regulatory patterns.
   These analyses tend to overlook the complexity introduced by
   alternative splicing (AS)—a process that affects nearly 95% of human
   pre-mRNAs [[53]15,[54]16]. AS facilitates the generation of diverse
   transcript variants, enabling fine-tuned regulation of gene expression
   across tissues and developmental stages [[55]17]. The regulation of AS
   involves a complex network in which RNA-binding proteins (RBPs) act as
   cis-regulators by directly interacting with intronic and exonic splice
   enhancer and silencer sequences. AS is also controlled by
   trans-regulators, such as transcription factors (TFs), which modulate
   the transcriptional elongation rate and modify chromatin structure
   [[56]6,[57]18–[58]23]. A comprehensive review highlights that AS is
   often coordinated with transcription and regulated by chromatin
   accessibility and epigenetic modifications, linking splicing outcomes
   to RNA polymerase II dynamics and histone marks [[59]24]. AS is
   particularly prevalent in the vertebrate brain, where it plays an
   essential role in key processes such as neurogenesis, synaptogenesis,
   axon guidance and neural plasticity [[60]20,[61]25–[62]31]. Quantifying
   these splice variants can reveal specific patterns associated with
   different cellular conditions, like neuronal differentiation, or
   disease states, like Alzheimer’s disease (AD) and other tauopathies
   [[63]32–[64]36].

   While differential gene expression (DGE) studies have been informative,
   growing evidence highlights the importance of differential transcript
   expression (DTE) and differential transcript usage (DTU) in decoding
   complex gene regulation. Computational tools such as SUPPA2 and DEXseq
   facilitate accurate quantification of AS events from transcript
   abundance estimates, enabling scalable splicing analysis across
   multiple conditions [[65]37,[66]38]. These approaches leverage fast and
   efficient transcript quantification techniques to dissect
   transcript-level regulation, making them especially suitable for cell
   differentiation and disease models where AS plays a critical regulatory
   role.

   Importantly, transcript-level analyses have been shown to improve both
   sensitivity and interpretability of RNA sequencing (RNA-seq) data. For
   example, Soneson et al. [[67]39] demonstrated that incorporating
   transcript-level abundance estimates enhances detection power in
   gene-level differential expression workflows [[68]39]. Moreover,
   transcript- and isoform-level analyses have revealed prognostic
   signatures and cancer-specific splicing networks that remain
   undetectable when relying solely on gene-level data [[69]40,[70]41]. By
   mapping the full landscape of AS and transcript diversity, LR
   sequencing offers unprecedented resolution in understanding gene
   regulation and its implications for human health and disease.

   Although traditional short-read RNA-seq has provided invaluable
   insights into the transcriptome, its limitations—such as an inability
   to accurately resolve complex transcript structures and quantify
   quantifying AS events—are well documented [[71]42–[72]44]. Short-read
   RNA-seq often struggles with repetitive regions and long contiguous
   sequences, leading to fragmented or incomplete transcript assemblies
   [[73]43,[74]45,[75]46]. As a result, transcript-level regulatory
   changes across cell differentiation stages remain incompletely
   characterized.

   By contrast, LR sequencing technologies, such as those developed by
   Oxford Nanopore and PacBio, have revolutionized transcriptomics by
   producing reads that span entire transcripts, enabling accurate
   reconstruction of full-length RNA molecules [[76]47,[77]48]. LR
   sequencing has now been successfully applied to profile the human
   tissue transcriptome, revealing extensive transcript diversity and
   previously unannotated isoforms [[78]49]. This breakthrough allows for
   the identification of novel transcripts and complex splicing events
   that are often missed by short-read methods [[79]50–[80]53].

   In our study, we utilized nanopore RNA-seq to achieve high-quality
   transcriptome profiling and a comprehensive analysis of DGE, DTE and
   DTU between human skin-derived fibroblasts, fibroblast-derived iPSC and
   iPSC-derived iCN. Rather than identifying strictly cell type-specific
   transcripts, we focused on genes and transcripts that are
   differentially expressed or used between cell types, thereby capturing
   transcriptomic transitions along the reprogramming and differentiation
   trajectory. Our DTU analysis revealed differences in transcript usage
   between cell types or during cell differentiation, reflecting AS
   changes potentially influenced by transcriptional, epigenetic or
   post-transcriptional regulatory mechanisms. Notably, we identified
   genes with significant changes in transcript expression and usage,
   including medically relevant genes such as APP, KIF2A and BSCL2, which
   undergo AS. These findings underscore the complexity of transcript
   regulation and highlight the importance of incorporating
   transcript-level analyses to fully understand transcriptomic regulation
   and disease mechanisms.

2. Results

2.1. Long-read sequencing uncovers transcript diversity and complexity across
cell types

   We utilized the Oxford Nanopore Technologies LR sequencing platform to
   generate an average of 17 722 607 QC-passed reads, enabling us to
   assess transcript diversity across various cell types and
   differentiation stages, including human-derived fibroblasts,
   fibroblast-derived iPSC and iPSC-derived iCN. The properties and purity
   of cultured iCN were previously shown [[81]54]. Gene and transcript
   level quantification was performed using our robust pipeline (see §4).
   A transcript was considered expressed above noise levels if its median
   TPM exceeded one within each respective cell type, i.e. evaluated
   independently per cell type. Using this criterion, we identified a
   total of 21 040 unique transcripts across all three cell types with an
   average of 15 072 transcripts in iCN, 13 048 in fibroblasts and 12 759
   in iPSC ([82]figure 1A; electronic supplementary material, figure S1).
   Additionally, our analysis revealed a significant number of lowly
   expressed transcripts (TPM between 0.1 and 1), with a count of 20 939
   in iCN, 18 896 in fibroblasts, 19 571 in iPSC (electronic supplementary
   material, figure S2A).

Figure 1.

   [83]Transcriptome Profiling of Cultured Cells Using Long-Read RNA
   Sequencing
   [84]Open in a new tab

   Transcriptome profiling of cultured cells using long-read RNA
   sequencing. (A) Transcript type distribution per cell type: bar plot
   showing the number and proportion of expressed transcripts by
   biotype—protein-coding (blue), long non-coding RNAs (lncRNA, orange)
   and others (green)—within each cell type (iCN, fibroblasts and iPSC).
   Expression is defined by a threshold of TPM > 1. This panel reflects
   total expressed transcript counts in each cell type but does not
   distinguish whether transcripts are unique to a given cell type or
   shared across types. (B) Transcript diversity per gene: distribution of
   the number of expressed transcripts per gene (TPM > 1), aggregated
   across all cell types. Genes are categorized by transcript biotype. The
   inset zooms in on genes expressing more than 10 transcript isoforms,
   including examples such as DMKN, SNHG29 and GAS5. Note that this
   analysis does not resolve transcript diversity by cell type. (C)
   Protein-coding transcript diversity in OMIM genes: bar plot showing the
   number of expressed coding transcripts per gene for genes listed in the
   OMIM database, aggregated across all samples. Most genes express fewer
   than five isoforms. The inset highlights outlier genes with high
   isoform diversity (e.g. FN1, HNRNPK, GNAS). As in (B), this plot does
   not represent cell-type-specific expression but instead summarizes
   global transcript diversity.

   To assess the accuracy of our LR transcriptomic data across cell types,
   we implemented a comprehensive multi-step validation process. First, we
   curated a set of marker genes specific to neuron (n = 85), fibroblasts
   (n = 132) and iPSC (n = 13) from the PanglaoDB database
   ([85]https://panglaoDB.se/index.html). We examined the expression of
   these marker genes within our samples (electronic supplementary
   material, figure S2B). As expected, the expression patterns of marker
   genes matched the respective cell types. Next, we compared our results
   with previously published datasets, including two Illumina short-read
   datasets [[86]55,[87]56] and a nanopore LR dataset [[88]51]. Principal
   component analysis (PCA) demonstrated that our data clustered
   distinctly by cell type, with clear separation and alignment with
   existing datasets (electronic supplementary material, figure S2C),
   thereby further validating the accuracy of our transcriptomic profiles.
   Samples grouped primarily by cell type along the first principal
   component (PC1), reflecting strong concordance in cell type-specific
   expression patterns across datasets. Notably, our samples aligned
   closely with their respective counterparts from published studies,
   supporting the robustness and validity of our data integration
   approach.

   Residual variation along the second principal component (PC2) was most
   apparent among iCN and likely reflected biological heterogeneity
   between donor-derived cell lines. These differences were consistent
   with known donor- and clone-specific transcriptomic signatures,
   commonly seen in reprogrammed and differentiated cell models
   [[89]57,[90]58], and were unlikely to represent technical batch
   effects, which were effectively minimized during preprocessing.

   Transcripts identified in our samples were classified based on the
   Gencode annotation (v43) into categories such as protein-coding
   transcripts, long non-coding RNAs (lncRNAs) and other biotypes,
   including transcripts containing retained introns or subjected to
   nonsense mediated decay and pseudogene transcripts. On average, 76% of
   the expressed transcripts in our samples were protein-coding, 8% were
   lncRNAs, and 15% belonged to other biotypes across all investigated
   cell types. Specifically, in iCN, 11 382 (75.5%) transcripts were
   classified as protein-coding, 1322 (8.8%) as lncRNAs and 2368 (15.7%)
   as other biotypes ([91]figure 1A). These transcript counts were based
   on a TPM > 1 threshold per cell type and reflected the presence of
   transcripts in each cell type individually, without distinguishing
   whether the transcript is cell-type-specific or shared.

   To explore isoform diversity per gene, we next examined all transcripts
   that passed the expression threshold in at least one of the three cell
   types. Using this approach, we detected 21 040 expressed transcripts
   from 11 853 unique genes. Interestingly, 60.8% (7201) of these genes
   expressed only a single transcript ([92]figure 1B). On the other end of
   the spectrum, three genes demonstrated the expression of more than 20
   transcripts each, including two lncRNA genes—GAS5 (45 transcripts) and
   SNHG29 (28 transcripts)—and one protein-coding gene, DMKN (20
   transcripts) ([93]figure 1B). Additionally, we identified 5135 coding
   transcripts from 2740 genes defined as medically relevant according to
   the OMIM database (released at 2023.7). Among these, 1207 (44.5%) genes
   expressed multiple transcripts ([94]figure 1C). Notably, genes such as
   the highly complex GNAS, which is involved in the assembly of the
   stimulatory G-protein alpha subunit and possibly involved in
   imprinting, the novel internal ribosomal entry site-transacting factor
   HNRNPK, and Fibronectin-1 (FN1), related to cell growth,
   differentiation and migration, exhibited over 10 transcripts each
   ([95]figure 1C). These transcript diversity measurements were computed
   across all cell types collectively, rather than in a cell-type-resolved
   manner.

2.2. Differential transcript expression reveals key transcripts
distinguishing cell types and highlighting disease associations

   Next, we utilized LR RNA-seq data to analyse differential transcript
   expression (DTE) across various cell types and identified 35 519 DTE
   events affecting 16 886 unique transcripts across 10 215 genes. Among
   these comparisons, 12 303 DTE events were identified in the comparison
   of iCN versus iPSC (Wald test with Benjamini–Hochberg (BH) adjusted p <
   0.05 and |log[2]FC| > 1), 12 293 events in iCN versus fibroblasts (Wald
   test with BH adjusted p < 0.05 and |log[2]FC| > 1), and 10 923 events
   in fibroblasts versus iPSC (Wald test with BH adjusted p < 0.05 and
   |log[2]FC| > 1) ([96]figure 2A). The analysis revealed a significant
   overlap of DTE across the three comparisons. 4262 DTE events were
   observed in all conditions. By contrast, a smaller subset of DTE
   transcripts exhibited comparison-specific expression changes. For
   instance, 988 transcripts were uniquely differentially expressed in the
   iCN versus iPSC comparison (electronic supplementary material, figure
   S3A).

Figure 2.

   [97]Differential Expression Analysis
   [98]Open in a new tab

   Differential expression analysis. (A) Bar plot showing the number of
   differentially expressed transcript (DTE, blue) and (B) differentially
   expressed gene (DGE, green) events in three comparisons: iCN versus
   iPSC (left), iCN versus fibroblasts (middle), and fibroblasts versus
   iPSC (right). (C) Volcano plot for DTE of iCN versus iPSC: volcano plot
   displaying the log[2] fold change (x-axis) against the
   −log[10](p‐value) (y-axis) for each transcript. A set of highlighted
   significant DTE events (adjusted p < 0.05 and |log[2]FC| > 1) are
   selected and categorized as top high-ranking events by adjusted p‐value
   (orange), originating from medical-associated genes defined by OMIM
   (green), from RBP mRNAs (yellow), and from cell marker transcripts
   (blue). (D) Volcano plot for DTE of iCN versus fibroblasts: volcano
   plot displaying log[2] fold change (x-axis) against the
   −log[10](p‐value) (y-axis) for each transcript. A set of highlighted
   significant DTE events (adjusted p < 0.05 and |log[2]FC| > 1) are
   selected and categorized as top high-ranking events by p‐value
   (orange), originating from medical-associated genes defined by OMIM
   (green), from RBP mRNAs (yellow), and from cell marker transcripts
   (blue). Each comparison group included nine independent replicates (n =
   9).

   Among DTE events detected in the iCN versus iPSC comparison, several
   transcripts stood out due to their high significance and fold change.
   These included TUBA1A-202, a major component of microtubules crucial
   for cell structure and intracellular transport [[99]59], ATIC-201,
   which plays an essential role in cellular proliferation [[100]60], and
   GAP43-201, known for its role in neuronal growth, regeneration and
   synaptic plasticity [[101]61–[102]65]. Additionally, multiple
   transcripts encoding RBPs were differentially expressed including
   HNRNPF-203, LIN28A-202 and ESRP1-202/203 ([103]figure 2B). We also
   observed differential expression of disease-associated genes (e.g.
   REEP1-201, KIF5A-202, EXOSC5-201), alongside upregulation of neuronal
   marker transcripts (e.g. BEX1-201, CNRIP1-201, TCEAL3-202) and the
   expected downregulation of iPSC marker transcripts in iCN (e.g.
   POU5F1-201, SALL4-201) ([104]figure 2C).

   Similarly, in the iCN versus fibroblasts comparison several transcripts
   showed significant differential expression ([105]figure 2D). These
   included BEX1-201, known to be involved in neurogenesis and neuronal
   differentiation [[106]66,[107]67], GAP43-201, which plays a crucial
   role in nervous system development [[108]68,[109]69], and MLLT11-201,
   associated with the development and progression of various human
   cancers [[110]70]. This comparison also revealed changes in transcripts
   associated with RNA-binding proteins (e.g. HNRNPF-203, RBM3-203) as
   well as fibroblast marker genes (e.g. S100A4-201, CD24-201)
   ([111]figure 2C). Furthermore, transcripts of disease-associated genes,
   such as BSCL2-206, REEP1-201, UCHL1-207/209 and KIF5A-202, all
   associated with hereditary spastic paraplegia (HSP), were significantly
   differentially expressed.

   The comparison between fibroblasts versus iPSC further highlighted key
   transcripts associated with cell marker genes (e.g. S100A4-201,
   POU5F1-201), RBPs (e.g. LIN28A-202) and disease-related genes (e.g.
   MME-203/214, KIF1A-207), which may play a role in maintaining
   fibroblast identity (electronic supplementary material, figure S3B).

   In addition to DTE, we investigated differential gene expression (DGE)
   to provide a broader context for our findings. Importantly, DGE and DTE
   were conducted as independent, parallel analyses, each using the Wald
   test with Benjamini–Hochberg adjusted p < 0.05 and |log[2]FC| > 1. Our
   analysis identified 19 367 DGE events across pairwise comparisons
   between cell types, involving 9200 unique genes. Among these, the
   comparison of iCN versus fibroblasts exhibited the most DGE events (n =
   6706), followed by the comparison of iCN versus iPSC (n = 6692) and
   fibroblasts versus iPSC (n = 5969). These events were presented side by
   side to illustrate the relative abundance of significant changes
   detected at gene and transcript levels ([112]figure 2A,B; electronic
   supplementary material, figure S4A).

   In the DGE analysis of iCN versus iPSC, genes such as TUBA1A, COMMD3,
   ATIC and GNG3 exhibited high significance and fold change, mirroring
   the patterns observed in the DTE analysis. Several other genes,
   including REEP1, KIF5A, LIN28A and POU5F1, were also prominent in both
   DGE and DTE analyses (electronic supplementary material, figure S4B).
   Similarly, in the comparisons of iCN versus fibroblasts and fibroblasts
   versus iPSC, several genes were consistently detected in both DGE and
   DTE analyses (electronic supplementary material, figure S4C–D).
   However, some genes, such as SGCE and STRADA ([113]table 1), did not
   show significant differential expression at the gene level between cell
   types but exhibited distinct differential expression patterns at the
   transcript level (electronic supplementary material, figure S10A,B).

Table 1.

   Summary of representative genes with significant differential
   transcript usage (DTU) between iCN and iPSC. The table lists key genes,
   their cellular function, disease association and their regulation
   status at gene level (upregulated, downregulated or unchanged) in iCN
   from our DGE analysis.

   gene ID

   cellular function

   disease association

   DGE in iCN versus iPSC

   MEAF6

   transcription regulation and chromatin remodelling [[114]71]

   OMIM: *611001

   upregulated

   BSCL2

   lipid storage and metabolism [[115]72]

   distal hereditary motor neuropathy and Silver syndrome [[116]73];
   hereditary spastic paraplegia type17 [[117]74]; OMIM: *606158, #269700,
   #608594. #619112, #270685, #615924

   upregulated

   SEPTIN6

   cytokinesis and cellular organization [[118]75,[119]76]

   OMIM: *300 683

   upregulated

   CDC42

   cell morphology, migration, endocytosis and cell cycle [[120]77]

   cancer and neurodevelopmental disorders [[121]78,[122]79];
   Takenouchi–Kosaki syndrome [[123]80]; OMIM: *116952

   upregulated

   OCIAD1

   cell adhesion [[124]81]

   OMIM: *300683

   upregulated

   APP

   synapse formation

   role in AD [[125]82,[126]83]; OMIM: *104760, #605714

   upregulated

   PFN2

   cell motility and structure [[127]84]

   OMIM: *176590

   upregulated

   C11orf1

   male reproduction [[128]85,[129]86]

   OMIM: *617615, *600393

   upregulated

   RTN4

   inhibition of nerve regeneration and neural plasticity [[130]87]

   OMIM: *604475

   upregulated

   SGCE

   muscle cell membrane stability [[131]88]

   muscle integrity and myoclonus-dystonia syndrome [[132]89]; OMIM:
   *604149, #159900, #608099

   not significant

   KRT8

   cellular structural integrity [[133]90]

   OMIM: *1 48 060

   not significant

   STRADA

   mTOR pathway regulation and cell growth [[134]91]

   polyhydramnios, epilepsy syndrome [[135]92]; OMIM: *608626, #611087,
   #234200

   not significant

   TPD52L2

   vesicle-mediated transport and secretion [[136]93]

   OMIM: *603747

   not significant

   KIF2A

   intracellular transport, mitotic spindle dynamics [[137]94]

   early-onset neurodegeneration [[138]95]; OMIM: *602591, #615411

   not significant

   LDHA

   conversion of pyruvate to lactate [[139]96]

   cancer and metabolic disorders [[140]97]; OMIM: *150000, #612933

   downregulated

   RABGAP1L

   intracellular vesicle trafficking [[141]98]

   neurodevelopmental syndrome [[142]99]; OMIM: *609238

   downregulated

   FKBP11

   protein folding [[143]100]

   OMIM: *610571

   downregulated

   CMC2

   mitochondrial function [[144]101]

   downregulated
   [145]Open in a new tab

2.3. Differential transcript usage highlights transcript complexity and
functional adaptations across cell types

   Differential transcript usage (DTU) refers to variations in the
   relative abundance of transcripts from the same gene across conditions
   or cell types. These variations often indicate functional adaptations
   at the transcript level. Genes were considered to exhibit DTU if they
   had at least one significant DTU event, defined by DRIMSeq-adjusted p <
   0.05 and an absolute change in isoform fraction (dIF) > 0.1. Among the
   4652 genes in our dataset that expressed more than one transcript, we
   identified a total of 5135 DTU events involving 3851 transcripts from
   1894 genes, based on comparisons between cell types. The highest number
   of DTU events was observed in the comparison of iCN versus iPSC (n =
   2109), followed by fibroblasts versus iPSC (n = 1651), and iCN versus
   fibroblasts (n = 1375) ([146]figure 3A). 1332 DTU events were unique to
   the iCN versus iPSC comparison, indicating distinct transcript usage
   patterns in this transition ([147]figure 3A). Interestingly, we
   observed 332 shared DTU events between all comparisons, which might
   suggest common regulatory mechanisms ([148]figure 3A).

Figure 3.

   [149]Differential Transcript Usage (DTU) Analysis Across Cell Types
   [150]Open in a new tab

   Differential transcript usage (DTU) analysis across cell types. (A)
   UpSet plot: UpSet plot illustrating the intersection of DTU events
   across three comparisons: iCN versus iPSC, fibroblasts versus iCN, and
   fibroblasts versus iPSC. The bars indicate the number of shared and
   unique DTU transcripts for each comparison. iPSC versus iCN comparison
   has the highest number of unique DTU transcripts (1332). The bar plot
   denotes the intersection size, circles denote which comparisons have
   overlap, and the set size reflects the total number of genes. (B)
   Volcano plot for iCN versus iPSC comparison: volcano plot showing the
   difference in transcript usage (x-axis) against the −log[10](p‐value)
   (y-axis) for each transcript. Top significant DTUs are highlighted,
   with notable genes such as APP, KIF2A and BSCL2. Points are coloured
   based on the number of transcripts per gene across all cell types, as
   indicated by the inset bar chart (aquamarine = 2 transcripts (n = 871);
   orange = 3 transcripts (n = 552); purple = 4 transcripts (n = 311);
   pink = 5 transcripts (n = 168); green > 5 transcripts (n = 210)). This
   reflects the overall transcript diversity of each gene, not limited to
   DTU-involved isoforms. The highest frequency of DTU events occurs in
   genes with two expressed transcripts, followed by those with three and
   four transcripts. Each comparison group included nine independent
   replicates (n = 9).

   We next categorized the significant DTU events by AS type and analysed
   their distribution across the different cell type comparisons.
   Transition from iPSC to iCN revealed a significant enrichment in iCN
   for alternative 3′ splice site usage (A3 loss), increased intron
   retention (IR gain), and mutually exclusive exon (MES) usage (FDR <
   0.05) (electronic supplementary material, figure S5). These AS patterns
   underscore the extent of splicing-mediated transcriptomic remodelling
   that occurs during cortical neuron differentiation. By contrast,
   comparisons involving fibroblast exhibited fewer or no significant AS
   enrichments, suggesting a more limited role for splicing changes in
   these transitions (electronic supplementary material, figure S5).

   To explore the functional relevance of the 1139 DTU genes between iCN
   versus iPSC, we performed pathway enrichment analysis using Gene
   Ontology (GO) terms and Reactome pathways. This analysis included all
   significant DTU genes (n = 1139). Enriched biological processes (BP)
   identified in the GO analysis included regulation of mRNA processing
   (GO:0050684, intersection size = 24, p = 1.88 × 10^−5), translation
   (GO:0006412, intersection size = 65, p = 0.000201), generation of
   neurons (GO:0048699, intersection size = 106, p = 0.00901), reflecting
   dynamic post-transcriptional and translational regulation during the
   transition from iPSC to iCN (electronic supplementary material, figure
   S6A).

   Enriched cellular components (CC) analysis revealed enrichment for
   focal adhesion (GO:0005925, intersection size = 42, p = 0.000525),
   clathrin coat (GO:0030118, intersection size = 11, p = 0.00503),
   ribosomal subunit (GO:0044391, intersection size = 22, p = 0.021),
   suggesting coordinated regulation of protein synthesis, cytoskeletal
   remodelling and vesicle trafficking during differentiation (electronic
   supplementary material, figure S6A).

   Enriched molecular functions (MF) included molecular adaptor activity
   (GO:0060090, intersection size = 105, p = 3.74 × 10^−7), mRNA binding
   (GO:0003729, intersection size = 39, p = 0.00243), cadherin binding
   (GO:0045296, intersection size = 36, p = 0.00363) (electronic
   supplementary material, figure S6A), highlighting post-transcriptional
   regulation, protein complex formation, and cell to cell adhesion
   processes involved in neuronal lineage commitment (electronic
   supplementary material, figure S6A).

   Reactome pathway enrichment analysis further underscored these findings
   (electronic supplementary material, figure S6B), with significant
   enrichment in pathways like RNA polymerase II transcription
   (REAC:R-HSA-73857, intersection size = 120, p = 0.000254), metabolism
   of proteins (REAC:R-HSA-392499, intersection size = 157, p = 0.00168),
   signalling by Rho GTPases (REAC:R-HSA-194315, intersection size = 68, p
   = 0.006), signalling by Rho GTPases, Miro GTPases and RHOBTB3
   (REAC:R-HSA-9716542, intersection size = 68, p = 0.012), chromatin
   organization (REAC:R-HSA-4839726, intersection size = 33, p = 0.0177),
   and mitochondrial translation (REAC:R-HSA-5368287, intersection size =
   16, p = 0.0369). Together, these may reflect the interplay of
   transcriptional, post-transcriptional and signalling processes that
   drive neuronal differentiation from iPSC.

   Among the 1139 DTU genes identified in the iCN versus iPSC comparison,
   525 genes (46%) were OMIM genes, underscoring a potential link between
   transcript isoform regulation and human disease. An additional 29 of
   1139 DTU genes encoded RBPs, pointing to a pivotal role for
   post-transcriptional regulation during the differentiation of iPSC to
   iCN. Functional annotation of these disease-associated gene and RBPs
   revealed clustering in key BPs, including generation of neurons
   (GO:0048699, n = 53), cell division (GO:0051301, n = 16), organelle
   organization (GO:0006996 n = 104) and translation (GO:0006412, n = 65)
   (electronic supplementary material, figure S7). These findings
   suggested that isoform level regulation within disease genes and RBPs
   may influence essential cellular pathways, thereby affecting both
   neurodevelopment and disease vulnerability.

   Further analysis of DTU event characteristics revealed that genes with
   two expressed transcripts across all cell types were the most common (n
   = 871) ([151]figure 3B). A similar pattern was observed in the
   comparison of iCN versus fibroblasts (n = 566) and fibroblasts versus
   iPSC (n = 689) (electronic supplementary material, figure S8A–D).
   Additionally, many genes exhibitng DTU events were found to have more
   than two expressed transcript isoforms in total ([152]figure 3B). For
   example, in the iCN versus iPSC comparison, genes such as RTN4, BSCL2
   and APP showed multiple transcript variants, highlighting the high
   level of transcript complexity in these genes ([153]figure 3B).

   [154]Table 1 presents 18 representative genes, each containing at least
   one of the top-ranking DTU transcripts (based on adjusted p‐value),
   along with their DGE changes, categorized as upregulated, downregulated
   or not significant in iCN versus iPSC comparison. The number of
   transcripts expressed per DTU gene is summarized in the inset bar plot
   in [155]figure 3B.

   To illustrate the diversity of transcript-level regulation among these
   representative genes, we highlight several examples. CDC42 displayed
   differential expression and transcript usage at both the gene and
   transcript levels (electronic supplementary material, figure S9A). In
   iCN, CDC42 was upregulated at the gene level, with transcript CDC42-201
   predominantly expressed and showing higher usage (electronic
   supplementary material, figure S9A). By contrast, CDC42-208 was the
   transcript predominantly used in iPSC (electronic supplementary
   material, figure S9A). LDHA was downregulated at the gene level in iCN
   but exhibited a complex transcript regulation pattern, with some
   transcripts upregulated and others downregulated, alongside
   differential transcript usage between iCN and iPSC (electronic
   supplementary material, figure S9B). Interestingly, other medically
   relevant DTU genes, such as SGCE and STRADA, did not show statistically
   significant changes at the gene level (electronic supplementary
   material, figure S10A,B). However, both genes exhibited multiple
   differentially expressed transcripts and notable shifts in transcript
   usage. For instance, four STRADA transcripts (STRADA-202, STRADA-203,
   STRADA-231 and STRADA-241) were identified in our dataset (electronic
   supplementary material, figure S7B). STRADA-202 showed the highest
   transcript abundance and usage in iCN, while the other three
   transcripts exhibited higher expression levels and greater usage
   frequency in iPSC (electronic supplementary material, figure S10B).

   To explore the relationship between differential regulation at the gene
   and transcript levels (DTE, DTU and DGE) in greater detail, we focused
   on the medically relevant DTU genes APP, KIF2A and BSCL2. It is widely
   accepted that APP is an Alzheimer hallmark gene and plays a significant
   role as a regulator in neural system development [[156]82]. The
   alternative splicing of APP mRNA can generate approximately 10
   different transcripts, which are differentially expressed in various
   tissues [[157]102,[158]103]. In our analysis, we observed upregulation
   of APP in iCN at gene level (DGE) and identified five distinct APP mRNA
   transcripts ([159]figure 4A). Of these, four (APP-201, APP-202, APP-204
   and APP-205) are annotated as protein-coding, while one, APP-218, is
   classified as protein-coding, though its coding sequence (CDS) has not
   been fully defined in the latest Gencode annotation ([160]figure 4A).
   Our differential transcript analysis revealed that three transcripts
   (APP-201, APP-202 and APP-218) displayed both differential expression
   and usage in iCN compared to iPSC ([161]figure 4A). By contrast,
   APP-204 and APP-205 exhibited differential usages without differential
   expressions. To highlight the magnitude of isoform switching, we
   calculated the values in dIF: APP-201 dIF = −0.56, APP-202 dIF =+0.48,
   APP-204 dIF = −0.23, APP-205 dIF = −0.00075 and APP-218 dIF =+0.054.
   Interestingly, the transcript APP-202, also known as APP[695], which
   lacks the Kunitz-like protease inhibitor (KPI) domain and Ox-2 antigen
   domain, was predominantly expressed in iCN, while APP-201 (APP[770])
   was primarily expressed in iPSC ([162]figure 4A). This isoform switch
   towards APP[695] in iCN is consistent with previous findings in
   Alzheimer’s disease brains, where significant transcript-level changes
   in APP were identified and linked to disease-relevant processing
   pathways [[163]104].

Figure 4.

   [164]Differential Transcript Usage Analysis for the APP, KIF2A and
   BSCL2 GenesFor each of the selected genes
   [165]Open in a new tab

   Differential transcript usage analysis for the APP, KIF2A and BSCL2
   genes. For each of the selected genes (A) APP gene; (B) KIF2A gene; (C)
   BSCL2 gene, the top panel indicates the transcript structures based on
   Gencode annotation v43, with main protein domains indicated by
   different colours. Bottom panels: gene expression (normalized TPM)
   levels (left), transcript expression (normalized TPM) levels (middle),
   and transcript usage (right) in iPSC (pink) and iCN (green).
   Differential gene expression (DGE) and differential transcript
   expression (DTE) were analysed using DESeq2, while differential
   transcript usage (DTU) was analysed using DRIMSeq. Statistical
   significance is indicated by ns (not significant), * (p < 0.05) and ***
   (p < 0.001). Each comparison group included six independent replicates
   (n = 9) (error bars: ± lfcSE (standard error of the log[2] fold change)
   are shown for DGE and DTE plots where applicable; for DTU, error bars
   are not displayed as DRIMSeq is based on a likelihood ratio framework
   and does not estimate standard errors or confidence intervals for
   transcript usage proportions).

   KIF2A (Kinesin Superfamily Protein 2A) plays a crucial role in neuronal
   migration and differentiation, primarily through its involvement in
   microtubule dynamics. Mutations in KIF2A are associated with cortical
   dysplasia and early-onset neurodegeneration
   [[166]95,[167]105,[168]106]. Alternative splicing of KIF2A mRNA
   produces multiple isoforms, each with distinct functional roles.
   Specifically, in mice, the inclusion or exclusion of exon 18, along
   with the alternative 5′ splice site selection in exon 5, generates
   isoforms that differ in their ability to support neuronal migration
   [[169]107]. In our study, three KIF2A transcripts (KIF2A-202, KIF2A-203
   and KIF2A-220) were identified, with significant differential
   expression and usage between iCN and iPSC ([170]figure 4B). Transcript
   usage changes were quantified using dIF: KIF2A-202 dIF = −0.71,
   KIF2A-203 dIF =+0.97 and KIF2A-220 dIF = −0.26. Notably, KIF2A-202 and
   KIF2A-220 were predominantly expressed in iPSC, while KIF2A-203, an
   isoform implicated in neuronal migration [[171]107], was more prevalent
   in iCN ([172]figure 4B). Importantly, despite the differential
   expression and usage of these transcripts, the overall gene expression
   of KIF2A remained unchanged between iPSC and iCN ([173]figure 4B). This
   finding suggests that while KIF2A is subject to complex regulation at
   the transcript level, its overall gene expression is tightly
   controlled, possibly to maintain essential cellular functions.

   BSCL2 (Berardinelli-Seip Congenital Lipodystrophy 2) encodes Seipin, an
   integral endoplasmic reticulum membrane protein crucial for lipid
   droplet formation and metabolism. Mutations in BSCL2 are associated
   with congenital generalized lipodystrophy type 2, as well as
   neurodegenerative axonopathies such as hereditary spastic paraplegia
   and distal hereditary motor neuropathy [[174]108–[175]110]. Aberrant
   splicing of BSCL2 has been implicated in several pathological
   conditions [[176]111]. In our analysis, we detected several BSCL2
   transcripts (BSCL2-203, BSCL2-206, BSCL2-207, BSCL2-210 and BSCL2-227)
   with differential expression and usage in iCN compared with iPSC
   ([177]figure 4C). Transcript usage changes were quantified using dIF:
   BSCL2-203: dIF =+0.365, BSCL2-206 dIF =+0.014, BSCL2-207 dIF = −0.36,
   BSCL2-210 dIF = −0.038, and BSCL2-227 dIF = −0.0198. Notably, BSCL2-203
   was predominantly expressed in iCN, while BSCL20-207 was more prevalent
   in iPSC, though expressed at significantly lower levels ([178]figure
   4C).

2.4. Comparative analysis of DGE, DTE and DTU reveals overlapping and unique
gene functional patterns

   We conducted a direct comparison of the genes identified in each
   analysis (DGE, DTE and DTU). The greatest overlap was observed between
   DGE and DTE genes, with a smaller proportion of significant genes
   identified by all three methods ([179]figure 5A–C).

Figure 5.

   [180]Venn Diagrams of Differential Gene Expression (DGE), Differential
   Transcript Expression (DTE)
   [181]Open in a new tab

   Venn diagrams of differential gene expression (DGE), differential
   transcript expression (DTE) and differential transcript usage (DTU)
   across cell type comparisons. Venn diagrams illustrating the overlap
   between differentially expressed genes (DGE; green), differentially
   expressed transcripts (DTE; blue), and genes with differential
   transcript usage (DTU; red) in: (A) iCN versus iPSC, (B) iCN versus
   fibroblasts, (C) fibroblasts versus iPSC. Each circle represents one of
   the categories (DTE: top right; DGE: top left; DTU: bottom; red), with
   the numbers indicating the count of genes in each category and their
   intersections.

   In the comparison of iCN versus iPSC, 6559 genes were differentially
   expressed both at the gene and transcript level, with 516 genes shared
   across all three differential analyses ([182]figure 5A). Similar
   patterns were observed in the comparisons of iCN versus fibroblasts
   ([183]figure 5B) and fibroblasts versus iPSC ([184]figure 5C).
   Interestingly, a notable proportion of DTE genes were not found in the
   DGE analysis, suggesting that transcript-level regulation plays a
   pivotal role in cell-specific functions, which may be overlooked by
   solely investigating gene-level expression. The DTU analysis identified
   the smallest set of genes, many of which overlap with either DGE or
   DTE, suggesting that changes in transcript usage often coincide with
   changes in overall transcript or gene expression levels. The small
   number of unique DTU genes indicates that few genes exhibit changes in
   transcript usage without showing changes in transcript abundance or
   gene expression.

   We performed functional enrichment analysis using gprofiler2 on DGE,
   DTE and DTU genes across all comparisons. The top 10 GO terms with the
   best p-values of enrichments from each category—BP, CC and MF—are
   displayed in [185]figure 6 and electronic supplementary material,
   figures S12–13. In the comparison of iCN versus iPSC, we found
   significant enrichment of 861 (DGE), 941 (DTE) and 174 (DTU) GO terms
   (p < 0.05). Many of these terms were related to neuronal functions,
   with ‘neuron’, ‘synapse’ and ‘axon’ prominently appearing in DGE (151
   terms, 18%) and DTE (137 terms, 15%), reflecting BPs related to
   cellular regulation, neurogenesis and neuron development ([186]figure
   6A). Additionally, genes identified through DGE and DTE analysis in the
   iCN versus iPSC comparison were enriched for cellular components of
   intracellular anatomical structures and organelles ([187]figure 6B).
   While DGE and DTE sets showed strong enrichments for neuronal
   functions, the DTU gene set exhibited a distinct pattern. In this case,
   neuronal pathway enrichment was more limited, with only two significant
   terms: generation of neurons (GO:0048699) and neuron differentiation
   (GO:0030182) ([188]figure 6A). Instead, DTU genes were more
   specifically associated with molecular functions such as to cadherin
   (GO:0045296), DNA (GO:0003677) and mRNA binding (GO:0003729)
   ([189]figure 6C), indicating additional layers of transcriptomic
   regulation that may not be captured by DGE and DTE analyses alone.

Figure 6.

   [190]Gene Ontology (GO) Term Enrichment Analysis for DGE, DTE, and DTU
   in iCN
   [191]Open in a new tab

   Gene Ontology (GO) term enrichment analysis for DGE, DTE and DTU in iCN
   versus iPSC. Dot plots illustrate the top 10 terms with the best
   p-values for enrichment from three categories: (A) biological processes
   (BP), (B) cellular components (CC), and (C) molecular functions (MF)
   for genes identified through DGE, DTE and DTU analyses. (D) Reactome
   pathway for genes identified through DGE, DTE and DTU analyses. Each
   dot represents a specific GO term or Reactome pathway (size of dots
   indicate the number of associated genes; colour reflecting the
   −log[10](p‐value), signifying the level of statistical significance).

   To further investigate the unique insights provided by transcript-level
   analyses, we focused the subset of 1512 genes identified exclusively at
   the transcript level, i.e. those identified in DTE but not found as to
   be differentially expressed at the gene level in DGE. These genes,
   which exhibit transcript-level changes in expression without
   corresponding gene-level differences, likely reflect regulatory
   transcript-level regulation that would otherwise be missed in standard
   gene-level analyses. Notably, this set included 379 OMIM associated
   genes, 24 RBP genes and 9 genes that overlapped between OMIM and RBP
   categories, highlighting their clinical and regulatory importance.

   To assess their biological roles, we projected these
   transcript-specific OMIM and RBP genes onto the GO biological process
   network derived from all DTE genes. The resulting network visualization
   (electronic supplementary material, figure S10) emphasized the
   functional involvement of these genes in key biological processes that
   were not captured by DGE. Highlighted BPs included nervous system
   development (GO:0007399), RNA splicing regulation (GO:0043484),
   mitochondrial respiratory chain complex I assembly (GO:0032981) and
   TORC1 signalling regulation (GO:1903432). Within these pathways, we
   specifically emphasized genes uniquely detected through
   transcript-level analyses, demonstrating their critical functional
   contributions that would be overlooked by gene-level approaches.

   In addition to GO enrichment, Reactome pathway analysis further
   revealed the functional distinctions among these gene sets ([192]figure
   6D). While DGE and DTE were predominantly enriched in
   neurodevelopmental pathways, including axon guidance
   (REAC:R-HSA-422475), neuronal system (REAC:R-HSA-112316), nervous
   system development (REAC:R-HSA-9675108) and synaptic transmission
   (REAC:R-HSA-112314), the DTU gene set exhibited a distinct enrichment
   profile. DTU genes were associated with pathways related to chromatin
   organization (REAC: R-HSA-4839726), RNA polymerase II transcription
   (REAC: R-HSA-73857) and peroxisomal protein import (REAC: R-HSA9033241)
   pathways. These results suggest that isoform-level regulation plays a
   key role in fine-tuning gene expression and modulating epigenetic
   processes, complementing the broader transcriptional patterns captured
   by DGE and DTE analyses.

   A similar enrichment pattern emerged in the comparison of iCN versus
   fibroblasts, supporting the notion that these regulatory profiles
   reflect the unique neuronal identity. By contrast, the fibroblast
   versus iPSC comparison revealed a markedly different set of enriched
   pathways, primarily associated with RNA/DNA metabolism, cell cycle
   progression and transcription reflecting regulatory changes
   characteristic of the reprogramming process rather than neuronal
   differentiation. Further details on GO term and Reactome pathway
   enrichment analysis for both iCN versus fibroblasts and fibroblasts
   versus iPSC are available in electronic supplementary material, figures
   S12–S13.

   In conclusion, the integrative analysis of DGE, DTE and DTU highlighted
   the complexity of gene regulation across different cellular contexts.
   While DGE and DTE overlapped substantially, pinpointing to shared
   biological relevance, DTU offered additional resolution by capturing
   transcript regulatory events that would otherwise be missed,
   emphasizing the importance of considering transcript usage in
   transcriptomic studies.

3. Discussion

   Our comprehensive transcriptome analysis using nanopore long-read RNA
   sequencing (LR RNA-seq) has provided valuable insights into the
   transcript diversity and expression profiles of human iPSC-derived iCN,
   fibroblasts and iPSC. By integrating DTE and DTU analyses, we uncovered
   a level of transcriptomic complexity that extends beyond conventional
   gene expression studies. This approach was essential for identifying
   transcript-specific regulation and AS events, offering a more nuanced
   understanding of the functional implications of transcript variation
   between cell types. The primary goal of our study was to characterize
   transcript expression and its dynamic changes during cellular
   differentiation. Importantly, our dataset is intended to serve as a
   baseline reference for other researchers working with human-derived
   disease models, providing a valuable framework for comparing
   transcriptomic profiles and guiding future investigations into disease
   mechanisms and potential therapeutic strategies as well as choice of
   appropriate cellular model system.

   Our results revealed that iCN exhibited the highest number of expressed
   transcripts compared with fibroblasts and iPSC. This aligns with
   findings from Heberle et al. [[193]112] and Page et al. [[194]49], who
   also emphasized the expansive transcriptomic landscape required for
   proper neuronal function [[195]49,[196]112]. This observation is
   consistent with the complex roles of neurons, which rely on a diverse
   set of transcripts to support critical processes like synaptic
   plasticity, neurotransmission and neurogenesis.

   Our DTU analysis revealed a cell-type-specific AS pattern, with IR
   being more prevalent in iCN compared with iPSC. This finding aligns
   with the emerging view of IR as a regulated and functionally meaningful
   mechanism during neuronal differentiation [[197]113,[198]114].
   Consistent with our findings, Braunschweig et al. [[199]113]
   demonstrated that IR is surprisingly widespread in mammalian
   transcriptomes and functions as a ‘transcriptome tuning’ mechanism
   [[200]113]. Their study showed that IR acts through both
   nonsense-mediated decay (NMD) and nuclear retention, reducing the
   levels of transcripts that are not required for the physiological
   identity of a given cell type, especially in the nervous system where
   IR is more conserved across species. Boutz et al. [[201]114] further
   refined this understanding by describing a distinct class of detained
   introns (DIs), introns that are retained specifically in the nucleus
   within polyadenylated transcripts and are not subject to NMD
   [[202]114]. These DIs are post-transcriptionally spliced in response to
   cellular signalling, such as Clk kinase activity, suggesting a role for
   IR beyond transcript suppression, serving as a regulated reservoir of
   transcripts that can be spliced upon demand to dynamically alter
   protein production [[203]114]. More recently, Petrić Howe et al.
   [[204]115] identified a class of cytoplasmic intron-retaining
   transcripts (CIRTs) enriched during motor neuron development
   [[205]115]. These transcripts were not associated with reduced gene
   expression but instead showed selective enrichment for RBP and miRNA
   interactions, suggesting a distinct post-transcriptional regulatory
   role for IR [[206]115]. Together, these findings suggest that the IR
   events enriched in iCN represent a multifaceted regulatory mechanism,
   modulating gene expression during neuronal lineage commitment through
   both transcript-level suppression and selective post-transcriptional
   interactions. In our dataset, DTU genes were significantly enriched for
   functions related to the cytoplasm, protein binding and neuron
   generation, reinforcing the notion that IR contributes to the
   functional specialization of the cortical neuron transcriptome.

   Within the framework of transcript-level analysis, DTE provides
   critical insights into how different isoforms of the same gene are
   expressed under specific conditions, while DTU highlights changes in
   the relative abundance of transcript isoforms, reflecting regulatory
   shifts in splicing across cell states or disease contexts. Increasing
   evidence supports the role of DTE and DTU in the progression of
   neurodegenerative diseases. For example, in Parkinson’s disease (PD),
   DTU analysis has revealed 23 splicing events across 19 genes, including
   THEM5, SLC16A1 and BCHE, in the prefrontal cortex, suggesting
   functional consequences for altered isoform expression [[207]116].
   Similarly, in Huntington’s disease (HD), the HTT gene produces multiple
   isoforms via alternative splicing, impacting the aggregation properties
   of the huntingtin protein and contributing to HD pathology [[208]117].

   Building on this evidence, our study underscores the power of DTE and
   DTU analyses in revealing the functional consequences of AS. For
   instance, our investigation into the APP gene highlighted distinct
   splicing patterns that are relevant to AD [[209]103]. Alternative
   splicing of APP directly influences the production of amyloid-beta
   peptides, a hallmark of AD pathology [[210]82,[211]118–[212]120].
   Specifically, our combination of DTE and DTU analysis revealed that the
   neuron-specific isoform APP[695] (APP-202) was predominantly expressed
   in iCN, whereas APP[770] (APP-201) and APP[751] (APP-204) were more
   prevalent in iPSC. These findings align with the well-established role
   of APP in AD pathology, where APP[695], lacking the KPI domain, is more
   susceptible to proteolytic cleavage, leading to the generation of
   amyloid-β peptides, a hallmark mechanism of AD [[213]121]. The higher
   expression of KPI-containing isoforms in non-neuronal cells might
   suggest a protective mechanism against amyloidogenic processing
   [[214]122–[215]126]. In addition to APP, genes such as BIN1 have been
   shown to exhibit significant differential transcript expression and
   usage, particularly in the temporal and frontal lobes of AD patients,
   further highlighting the pivotal role of AS in neurodegeneration
   [[216]104]. This observed pattern of isoform expression supports the
   validity of our transcript-level analyses and highlights the
   significance of transcript-specific regulation in understanding disease
   mechanisms.

   Moreover, our study demonstrated that transcript differential analyses
   can uncover cell-type-specific transcript patterns that would otherwise
   be missed with gene-level analyses alone. For example, we identified
   key transcript isoforms in KIF2A that exhibited distinct splicing
   patterns during the transition from iPSC to iCN, even though overall
   gene expression levels showed no significant differences. Specifically,
   isoforms KIF2A-202 and KIF2A-220, which lack exon 17, were
   significantly upregulated according to DTE analysis and predominantly
   expressed in iPSC according to DTU analysis. By contrast, KIF2A-203,
   which retains exon 17, was upregulated in iCN based on DTE and showed
   predominant usage according to DTU analysis. This suggests that neurons
   utilize specific KIF2A isoforms during differentiation, likely to
   support microtubule depolymerization, a process critical for neuronal
   migration and axonal development [[217]95,[218]127]. Importantly,
   transcript differential analyses (DTE and DTU) not only identify these
   transcript patterns but also guides formation of hypotheses on the
   regulatory mechanisms of AS. For example, the regulation of AS in KIF2A
   may be influenced by neuron-specific RBPs, which are known to bind to
   intronic sequences and 3′ UTRs, influencing exon inclusion and mRNA
   stability [[219]128,[220]129]. This mechanism could ensure the
   inclusion of exon 17 in KIF2A-203, preserving its role in microtubule
   depolymerization, essential for neuronal differentiation and migration
   [[221]107]. On the other hand, isoforms lacking exon 17, such as
   KIF2A-202 and KIF2A-220, may serve broader, more generalized cellular
   roles.

   In the past, genomic mutations were often linked only to gene-level
   changes, providing limited insight into how these mutations impact
   specific transcript isoforms. However, our findings demonstrate that
   while a gene may exhibit straightforward regulation at the gene level,
   its transcripts can display complex and varied regulatory patterns, as
   exemplified by APP and KIF2A. This indicates that transcript-level
   changes, such as AS or DTU, can lead to the production of isoforms with
   distinct functional properties, even when gene expression remains
   unchanged. These transcript-specific variations may drive disease
   pathology by altering key cellular processes like protein function,
   localization or molecular interactions. Therefore, transcript-level
   analyses allow us to directly link genomic mutations to specific
   transcript isoforms, offering more precise therapeutic targets for
   intervention. For example, mutations in the BSCL2 gene, such as the
   homozygous c.974dupG mutation, have been shown to lead to exon 7
   skipping, potentially producing an abnormal transcript [[222]110]. This
   splicing alteration, which results in the loss of exon 7 in certain
   isoforms, has been linked to severe neurodegenerative conditions like
   Celia’s encephalopathy [[223]110]. In our study, we found that the
   BSCL2-203 transcript, which includes exon 7, is the dominant isoform of
   the Seipin family expressed in iCN. The presence of exon 7 in BSCL2-203
   may be critical for maintaining normal neuronal function, and any
   disruption in this splicing pattern, such as the exon 7 skipping caused
   by the c.974dupG mutation, could lead to pathogenic outcomes. This
   example illustrates how transcript-specific changes, driven by
   mutations that cause AS, can be directly tied to disease mechanisms,
   emphasizing the importance of identifying specific transcripts as
   potential therapeutic targets. Therefore, one of the most significant
   contributions of transcriptome differential analyses may be the
   potential to identify disease-causing targets. In our study, nearly
   half of the DTU genes in the iPSC-to-iCN transition were linked to OMIM
   disease loci, and several were annotated as RBPs. Similarly, genes
   identified exclusively by DTE, not detected by DGE, showed a high
   proportion of OMIM- and RBP-associated entries. These findings further
   highlight the clinical and regulatory significance of transcript-level
   changes, while overall gene expression may remain unchanged, suggesting
   that AS and isoform shifts may directly contribute to disease
   mechanisms.

   Beyond its biological contributions, this study also provides a
   valuable benchmark for the transcriptomics field. As bioinformatics
   tools for LR RNA-seq analysis continue to mature, comprehensive
   datasets such as ours, spanning fibroblasts, iPSC and induced cortical
   neurons, can serve as a useful reference for evaluating transcript
   quantification, alternative splicing detection and isoform usage
   metrics. We anticipate that this resource will support the refinement
   of existing pipelines and stimulate the development of new algorithms
   optimized for long-read sequencing technologies.

   While our study provides important insights into transcript-level
   regulation during neuronal differentiation, several limitations should
   be acknowledged:
     * (1) The functional assignment of specific isoforms remains
       challenging, particularly in the absence of isoform-resolved
       experimental validation.
     * (2) Although Oxford Nanopore Technologies LR sequencing enables
       full-length transcript profiling, it is known to have a higher
       base-calling error rate compared to short-read platforms, which may
       affect transcript quantification and splicing accuracy
       [[224]130–[225]133].
     * (3) Despite applying batch correction, residual variability across
       samples, especially in iPSC-derived neurons, may reflect
       inter-donor and clonal differences, which can be confounding
       factors in interpretation [[226]58]. These factors should be
       considered when generalizing our findings, and future work
       involving patient-derived datasets and orthogonal validation
       approaches will be necessary to refine isoform-level functional
       interpretations.
     * (4) The functional assignment of specific isoforms remains
       challenging, particularly in the absence of isoform-resolved
       experimental validation. While we observed transcript usage
       differences in disease-associated genes such as KIF2A and BSCL2,
       the corresponding isoforms are annotated in current reference
       transcriptomes. However, their isoform-level validation in
       disease-relevant datasets remains limited, underscoring the need
       for follow-up studies in patient-derived systems.

   In conclusion, by leveraging nanopore LR RNA-seq, we were able to
   explore transcript diversity at a high resolution, providing a
   comprehensive framework for understanding transcript-specific
   regulation and its impact on cellular identity. This understanding is
   critical for advancing research into neurodevelopment and
   neurodegeneration. Importantly, we show that LR RNA-seq, combined with
   DTE and DTU analyses, can uncover the complexities of transcript
   regulation, particularly how AS shapes cellular identity and function.
   Transcript-level analysis may prove crucial in the future, as genetic
   testing alone overlooks changes in isoform expression and AS that may
   contribute to disease pathology. AS or alterations to the predominantly
   expressed isoforms in disease-relevant tissues may also account for
   diseases that currently lack a genetic explanation. Furthermore,
   identification of disease-relevant transcripts may open potential
   therapeutic avenues, as interventions targeting specific isoforms could
   restore normal splicing patterns and prevent the production of
   pathogenic isoforms [[227]134]. Ultimately, transcript-level analyses
   have the potential to significantly enhance our approach to genetic
   testing by identifying mutations that lead to aberrant splicing or
   isoform production. Taken together, these analyses hold significant
   promise for advancing our understanding of cellular functions,
   especially in neurodevelopmental and neurodegenerative diseases, where
   transcriptomic diversity and splicing dysregulation play pivotal roles.
   Moving forward, a continued focus on transcript-level changes in
   various disease models will provide deeper insights into the molecular
   mechanisms of disease progression and may lead to the identification of
   novel therapeutic targets for intervention.

4. Methods

4.1. Generation of human cell lines

   The study was approved by the Institutional Review Board of the
   University of Tübingen Medical Faculty at the University Hospital
   Tübingen, Germany (IRB: 423/2019BO1). All participants gave their
   written informed consent to study participation. Human skin fibroblasts
   were obtained from healthy donors as previously described [[228]135].
   Fibroblasts were cultured in DMEM high glucose media (Sigma) with 10%
   FCS (Life Technologies). From these fibroblasts and iPSC were
   reprogrammed as previously described [[229]136–[230]138], using
   episomal plasmids (pCXLE-hUL ID: 270776, pCXLE-hSK, ID: 27078, and
   pCXLE-hOCT3/4, ID: 27076) [[231]139]. The reprogrammed cells were
   seeded onto Matrigel-coated (1:60 in DMEM, Corning^®) 6-well plates in
   fibroblast media. The next day, the medium was supplemented with 2 ng
   ml^−1 FGF2 (Peprotech) and 1% P/S (Gibco). On day 3, medium was changed
   to Essential 8 (E8; in-house) media with 100 µM sodium butyrate
   (Sigma-Aldrich) and 0.1% P/S and changed every other day. After 3 to 4
   weeks, colonies were picked and expanded. Cryo-stocks were obtained
   using 50% E8 media with 40% KO-SR (Life Technologies), 10% DMSO
   (Sigma-Aldrich) and 10 μM Y-27632 (Abcam Biochemicals). A PCR
   Mycoplasma Test (AppliChem) was performed following manufacturer’s
   recommendation. Reprogrammed iPSC were differentiated into cortical
   neurons of layers V and VI. The differentiation followed previously
   established protocols [[232]54,[233]140,[234]141]. Briefly, iPSC were
   seeded at a density of 3 × 10^5 cells cm^−2 on Matrigel-coated plates
   (Corning^®), in E8 medium supplemented with 10 µM SB431542
   (Sigma-Aldrich) and 500 nM LDN-193189 (Sigma-Aldrich). The cells
   underwent neural induction over 9 days. Post induction, on day 9, the
   cells were split at a 1:3 ratio and then cultured in 3N medium with 20
   ng ml^−1 FGF-2 for an additional 2 days. From day 11 to day 26 after
   induction (DAI 11-26), cells were maintained in 3N medium, with
   addition of heparin (100 µg ml^−1) on DAI 13 and 15. Medium changes
   occurred bi-daily. iCN were split on DAI 26, with supplementation of 10
   µM DAPT (Tocris) and 10 µM PD0325901 (Tocris) in DAI 27 and 29. iCN
   were maintained until DAI 37 and RNA was isolated according to the
   manufacturer instructions (Qiagen; RNeasy Mini Kit). Cortical neuron
   identity was previously shown by immunocytochemistry (neuronal markers:
   β-III-tubulin, TAU; dendritic marker: MAP2; cortical layer V (CTIP2)
   and VI (TBR1) markers) and RT-qPCR (cortical layer markers: FOXG1 and
   PAX6; dendritic marker: MAP2; microtubule-associated marker: DXC) at
   DAI 36 [[235]54]. A total of four cell lines were used for the
   experiments ([236]table 2). All cells described were maintained at 37°C
   and 5% CO[2].

Table 2.

   Cell lines used for long-read RNA-sequencing experiments.

   human-derived fibroblasts

   fibroblast-derived induced pluripotent stem cells

   iPSC-derived cortical neurons

   CO-1

   CO-1 2.1

   CO-1 2.1

   —

   CO-2 1.1

   CO-2 1.1

   CO-3

   CO-3 B.1

   CO-3 B.1

   CO-4

   —

   —
   [237]Open in a new tab

4.2. Long-read RNA-sequencing processing and quality control

   RNA concentration was estimated using the Qubit Fluorometric
   Quantitation and RNA Broad-Range Assay (Thermo Fisher Scientific) and
   RNA Integrity Number RIN using a Fragment Analyzer 5300 and a Fragment
   Analyzer RNA kit (Agilent Technologies). Libraries were prepared using
   a PCR-cDNA Barcoding Kit (SQK-PCB111.24) from Oxford Nanopore
   Technologies according to manufacturer’s instructions. A total of 200
   ng of total RNA was annealed for strand-switching reaction and reverse
   transcription with Maxima H Minus RT (Thermo Fisher Scientific). The
   resulting cDNA was amplified with LongAmp Hot Start Taq Master Mix
   (NEB) according to the protocol from ONT; with following modifications;
   extension for 9 min with 12 cycles of amplification and a final
   extension of 10 min. Library molarity was determined by measuring the
   library size using a FemtoPulse instrument and a Genomic DNA 165 kb Kit
   (Agilent Technologies) and the library concentration using Qubit
   Fluorometric Quantitation and dsDNA High sensitivity assay (Thermo
   Fisher Scientific). An equimolar pool of four barcoded libraries was
   cleaned up with 0.8 × AMPure XP beads (Beckman Coulter). The pools were
   quantified and assessed with Qubit and Femto, as individual libraries.
   The Rapid Adaptors were added to the pool of amplified cDNA and 20 fmol
   of the library was loaded on a R.9.4.1 PromethION flow cell
   (FLO-PRO002) and ran on a PromethION instrument in High Accuracy
   Basecalling mode for 72 h.

   Demultiplexing was performed by MinKNOW software. Initially, quality
   was assessed using the pycoQC tool [[238]142]. Reads were processed
   with Pychopper ([239]https://github.com/nanoporetech/pychopper). The
   FASTQ files were mapped to the reference human genome (GRCh38) using
   the minimap2 tool with the splicing option and quantified using Bambu
   v3.2.4 [[240]143] to obtain transcript-level counts and CPM. Then CPM
   was converted to TPM (transcripts per million) with Gencode annotation
   v43. For each cell type, median transcript expression was calculated
   across replicates. Transcripts with a median TPM > 1 were defined as
   expressed in that cell type. Finally, the expressed transcript sets
   from each cell type were combined to generate a unified list of
   expressed transcripts across all conditions.

   To obtain gene-level expression, we aggregated the TPM values of all
   expressed transcripts that mapped to the same gene by summing the
   transcript-level TPMs. These gene-level TPMs were then used for
   downstream analyses, including differential gene expression.

   Principal component analysis (PCA) was performed on three published
   datasets, along with our own dataset, using the prcomp function in R
   version 4.3.1. Prior to PCA, the curated published datasets were
   converted to gene expression TPM values and integrated with our gene
   expression data. To remove technical batch effects, we applied the
   removeBatchEffect function from the limma package (version 3.58.1). We
   then compared the first two principal components of the integrated
   datasets to assess whether they exhibit similar transcriptome profiles.

4.3. Differential expression analysis

   Counts from expressed transcripts across all cell types were used to
   perform differential gene expression (DGE) and differential transcript
   expression (DTE) analysis using the DESeq2 (version-1.42.1) R package.
   Pairwise differential tests were conducted between cell types, such as
   fibroblasts versus iPSC, iCN versus iPSC, and iCN versus fibroblasts.
   Genes and transcripts were considered to have differential expressions
   between cell types if their corresponding test adjusted p-values were
   <0.05 and |log[2]FC| > 1. Upregulated genes and transcripts with
   significant DGE or DTE events (log[2]FC > 1) were selected for Gene
   Ontology (GO) term enrichment tests. For example, in the iCN versus
   iPSC comparison, genes upregulated in iCN were used to represent iCN
   expression. GO term tests were performed using gProfiler2(v0.2.3) in R
   and the top 10 GO terms with the best adjusted p-values from each
   category—biological processes (BP), cellular components (CC) and
   molecular functions (MF)—were selected for visualization. Reactome
   pathway enrichment was also performed using gProfiler2, and
   significantly enriched pathways (adjusted p < 0.05) were included to
   further interpret the biological functions associated with
   differentially expressed genes and transcripts.

4.4. Differential transcript usage analysis

   Differential transcript usage (DTU) analysis was performed using the
   IsoformSwitchAnalyzeR v2.2.0 framework in combination with DRIMSeq
   [[241]144,[242]145]. DRIMSeq conducted the DTU analysis based on
   transcript usage calculated from TPM values of expressed transcripts.
   Statistical significance was assessed using a Dirichlet-multinomial
   model with likelihood ratio tests, providing gene- and transcript-level
   p-values. As the model does not estimate standard errors or confidence
   intervals for transcript proportions, error bars are not included in
   usage plots. Instead, statistical significance is visualized by
   annotating adjusted p-values (FDR) in the plots.

   We integrated the DRIMSeq results with gene and transcript differential
   expression results produced by DESeq2 (in §4.3) within
   IsoformSwitchAnalyzeR framework to predict alternative splicing events
   and their consequences. Additionally, to ensure robust results, we
   applied several filtering steps as recommended by
   IsoformSwitchAnalyzeR. Initially, transcripts were filtered based on a
   minimum gene expression threshold (gene_condition_1 > 1 and
   gene_condition_2 > 1) to exclude noise. DTU events were identified
   using DRIMSeq test results with an adjusted p < 0.05 and an absolute
   usage difference |dIF| > 0.1. Identified DTU genes were then selected
   for GO term enrichment tests. gProfiler2(v0.2.3) in R and the top 10 GO
   terms with the best adjusted p-values from each category—biological
   processes (BP), cellular components (CC) and molecular functions
   (MF)—were selected for visualization.

   To predict the functional consequences of differentially utilized
   transcripts, we analysed coding potential (CPC2) [[243]146], protein
   domains (Pfam) [[244]147], signal peptides (SignalP 5.0) [[245]148],
   intrinsically disordered regions (IDR) [[246]149], using external
   sequence analysis tools including Pfam, CPC2, SignalP-5.0 and IUPred2A
   [[247]150].

Acknowledgements