Abstract In eukaryotes, alternative promoter (AP), alternative splicing (AS), and alternative polyadenylation (APA) are three crucial regulatory mechanisms that modulate message RNA (mRNA) diversity. Although AP, AS and APA are involved in diverse biological processess, whether they have dynamic changes in Angiotensin II (Ang II) induced senescence in rat primary aortic endothelial cells (RAECs), an important cellular model for studying cardiovascular disease, remains unclear. Here we integrated both PacBio single-molecule long-read isoform sequencing (Iso-Seq) and Illumina short-read RNA sequencing (RNA-seq) to analyze the changes of AP, AS and APA in Ang II-induced senescent RAECs. Iso-Seq generated 36,278 isoforms from 10,145 gene loci and 65.81% of these isoforms are novel, which were further cross-validated by public data obtained by other techonologies such as CAGE, PolyA-Seq and 3′READS. APA contributed most to novel isoforms, followed by AS and AP. Further investigation showed that AP, AS and APA could all contribute to the regulation of isoform, but AS has more dynamic changes compared to AP and APA upon Ang II stimulation. Genes undergoing AP, AS and APA in Ang II-treated cells are enriched in various pathways related to aging or senescence, suggesting that these molecular changes are involved in functional alterations during Ang II-induced senescence. Together, the present study largely improved the annotation of rat genome and revealed gene expression changes at isoform level, extending the understanding of the complexity of gene regulation in Ang II-treated RAECs, and also provided novel clues for discovering the regulatory mechanism undelying Ang II caused vascular senescence and diseases. Keywords: rat primary aortic endothelial cells senescence, Iso-Seq, RNA-seq, Angiotensin II, alternative promoter, alternative splicing, alternative polyadenylation Introduction Rat has become an important model in the field of cardiovascular diseases (CVDs) such as hypertension, cardiac hypertrophy, and heart failure ([34]Doggrell and Brown, 1998). However, the Rat genome is not well annotated compared to that of human and mouse ([35]Wang et al., 2019). Endothelial cell (EC) senescence appears to be the first step in a series of events that culminate with the development of cardiovascular pathologies ([36]Carracedo et al., 2018). The renin-angiotensin system (RAS) plays an important role in vascular biology and Angiotensin II (Ang II) is a principal effector of RAS. Ang II serves as an important signaling molecule involved in atherogenic stimuli and is known to promote cellular senescence and aging ([37]Shan et al., 2008). Ang II can stimulate cell migration ([38]Nadal et al., 2002) and induce angiogenesis via upregulation of vascular endothelial growth factor (VEGF) ([39]Suganuma et al., 2005). Ang II is also involved in the pathogenesis of endothelial dysfunction (ED) through multiple signaling pathways, including angiotensin type 1 receptor (AT1R)-mediated NADPH oxidase (Nox)/reactive oxygen species (ROS) ([40]Nguyen Dinh Cat et al., 2012). Most CVDs, as well as age-related cardiovascular changes, are associated with increases in oxidative stress, due to increased generation and/or decreased metabolism of ROS ([41]Ding et al., 2020). Ang II can also activate NF-κB pathway, which in turn induced the expression of pro-inflammatory genes ([42]Kim et al., 2012) and inflammation-related genes [e.g., VCAM-1, iNOS, COX-2 and IL-6] ([43]Yang et al., 2018). These inflammatory factors ususlly correlated with series of senescence-associated secretory phenotypes (SASP). Previous studies have investigated the events of Ang II-induced senescence of ECs from molecular perspective ([44]Forrester et al., 2018), however, no systematic analysis of the dynamic changes at the isoform levels has been performed. Transcription is a highly regulated process, and stress-induced changes in gene transcription have been shown to play a major role in stress response and adaptation ([45]Vilborg et al., 2017). Alternative promoter (AP), alternative splicing (AS), and alternative polyadenylation (APA) are three key steps that determine the final products of gene transcription, which greatly increase the transcriptome complexity ([46]Anvar et al., 2018; [47]Demircioğlu et al., 2019). Alternative promoter generates transcription start sites (TSSs) from different genomic locations and results in gene isoforms with distinct 5′untranslated regions (5′UTRs) ([48]Ayoubi and Van De Yen, 1996). Alternative promoters are particularly fascinating due to their abnormal usage in several diseases ([49]Singer et al., 2008). Previous analysis shows that the p21-coding gene has two transcripts, p21B and p21C, derived from alternative promoters and exhibit different functions. While p21C induces cell cycle arrest, p21B appears to stimulate apoptosis ([50]Landry et al., 2003). In eukaryotic genomes, AS is a prevalent phenomenon that can expand the protein pool without increasing the number of genes ([51]Roy et al., 2013). This helps to promote the evolution of complex functional transcriptomes that can control a range of cellular, molecular, and developmental events. AS is emerging as a critical contributor to senescence and aging ([52]Kwon et al., 2021). For example, PRPF19 downregulation changes the MDM4 splicing isoform from MDM4-FL to MDM4-S, which induces p53-p21-dependent cellular senescence ([53]Yano et al., 2021). APA is a phenomenon that RNA molecules with different 3′ends originate from distinct polyadenylation sites (PASs) of a single gene, which can influence the translation efficiency, stability, and localization of an mRNA ([54]Tian and Manley, 2016; [55]Chen et al., 2017). Multiple studies had showed that APA-mediated 3′UTR length changes were dynamically regulated during cellular senescence and could serve as a novel player in regulating senescence ([56]Chen et al., 2018; [57]Shen et al., 2019; [58]Schwich et al., 2021). Although AP, AS or APA have been separately reported to function in replicative senescence or age-related diseases, whether they are involved in Ang II-induced RAECs senescence, an important and widely used cellular model in cardiovascular disease, remains unkown. Traditional short-read RNA-seq is commonly used for quantifying genome-wide gene expression and AS events due to its high throughput ([59]Wang et al., 2009). But there is still a challenge for short reads [typically 100 to 250 base pairs (bp)] to infer the structure of full-length transcripts, which are often several thousand bases long ([60]Williams et al., 2014). Single-molecule real-time (SMRT) Iso-Seq using the Pacific Biosciences (PacBio) platforms can capture full-length transcripts, making it possible to refine genome annotation and discover novel transcripts and fusion genes. However, the base sequencing error rate for Iso-Seq is relatively higher than RNA-seq, so high-quality RNA-seq reads are often used to correct errors in Iso-Seq such as FMLRC ([61]Wang J. R. et al., 2018), proovread ([62]Förster et al., 2014), LoRDEC ([63]Rivals and Salmela, 2014). The quantification accuracy for Iso-Seq is also lower than RNA-seq ([64]Wang et al., 2019). Therefore, a combination of both Iso-Seq and RNA-seq would improve the performance at both qualitative and quantitative levels. In this study, we systematically characterized the transcriptome diversity and changes in RAECs before and after Ang II treatment. We first used the PacBio Iso-Seq approach ([65]Gordon et al., 2015) to generate full-length isoforms. We found widespread transcript diversity, with the detection of novel transcripts not in existing rat genomic annotations, and used short-read RNA-seq to cross-validate and complement the Iso-Seq result. We discovered many novel and differential alternative AP, AS and APA events between Control and Ang II-induced cells. Our results widen the research of Ang II-induced senescence in RAECs and also contribute to the investigation of transcriptome complexity in the research field of CVDs. Materials and methods Culture of rat primary aorta endothelial cells (RAECs) Rat primary aortic endothelial cells (RAECs) were isolated from male rats and those with the purity greater than 95% (determined by both positive staining for CD31 and morphologically based on their classical “hill and valley” appearance) were kept for the subsequent study, as performed according to the methods described previously ([66]Yang et al., 2018). Cells that experienced three to five passages were used for Ang II stimulation experiments. Quantified RAECs were treated with Ang II (2 μM) or vehicle (sodium chloride) for 48 hours (h) as described before ([67]Yang et al., 2018). RNA-seq library construction PolyA^+ RNA was enriched by oligo (dT)25 Dynabeads (Invitrogen) from total RNA. The dUTP-based strand-specific RNA-seq libraries for control and Ang II-induced RAECs were constructed according to the method previously described ([68]Yang et al., 2020). RNA-seq libraries were sequenced on Illumina HiSeq platform, and paired-end reads of 150 nucleotide (nt) lengths were obtained. PacBio sequencing library construction PolyA^+ RNA after double-stranded cDNA preparation, PCR amplification, product purification with AMPurePB magnetic beads, verification of DNA size, followed by qualitative and quantitative analysis using a Bioanalyzer. The resulting double-stranded DNA was used to generate the SMRTbell™ libraries using PacBio Template Prep Kit. SMRTbell templates are then sequenced in the PacBio Sequel System. Iso-seq data processing We performed initial data processing using IsoSeq v3 pipeline ([69]IsoSeq/isoseq-clustering.md at master·PacificBiosciences/IsoSeq·GitHub) for raw subreads. The circular consensus sequencing (CCS) reads were obtained through the consensus generation step. Then, we determine whether they are classified as full-length reads based on the presence or absence of the 5′and 3′primers, and use lima v2.0.0 ([70]https://lima.how/) to remove the primers. The obtained full-length reads, were refined by trimming poly (A) tails and removing concatemer, then full-length non-concatemer (FLNC) reads are finally obtained. To find transcript clusters, isoform-level clustering algorithm ICE (Iterative Clustering for Error Correction) ([71]Gordon et al., 2015) was applied to all full-length (FL) transcripts to obtain the consensus sequence for each cluster. Quiver ([72]Chin et al., 2013) was used for error correction to obtain high-quality (HQ) (accuracy, ≥99%) isoforms. To get non-redundant isoforms, HQ isoforms were mapped to rat genome (mRatBN7.2) using minimap2 ([73]Li, 2018) with parameters -ax splice -uf--secondary = no -C5 -O6,24 -B4, and collapsed by a third-party module Cupcakes Python script (collapse_isoforms_by_sam.py) with the following parameters “-c 0.99 -i 0.85”. The 5′degraded isoforms were filtered away because the Clontech SMARTer cDNA kit used to create the full-length cDNA does not do cap trap, these may be 5′RNA degradation products and not biologically meaningful. In that case, we filter them away according to the method described in Cupcake ([74]https://github.com/Magdoll/cDNA_Cupcake/wiki/Cupcake:-supporting-s cripts-for-Iso-Seq-after-clustering-step#filtersubset). Then the resulted isoforms were finally annotated using SQANTI3 ([75]Tardaguila et al., 2018). SQANTI3 marked the internal priming resulted PAS by identifying downstream six consecutive “A” or the percentage of “A” is greater than 60% in the 20 nucleotides downstream of the potential PAS, the transcript is considered to be an IntraPriming transcript and is filtered out. The coding potential of the resulted isoforms was also predicted with SQANTI3 which uses GeneMarkS-T (GMST) algorithm ([76]Tang et al., 2015). GeneMarkS-T utilizes iterative self-training and a hidden semi-Markov model to predict coding regions in eukaryotic transcripts. All FLNC reads were merged and the FMLRC ([77]Wang J. R. et al., 2018) was then used for error correction with RNA-seq to get high-quality FLNC reads. The corrected FLNC reads were then mapped to mRatBN7.2 genome using minimap2 ([78]Li, 2018) with parameters -ax splice -uf--secondary = no -C5 -O6,24 -B4, then using the sam_to_gff3.py from cDNA_Cupcake to generate an FLNC gff3 file. This file was then compared with non-redundant HQ isoforms using inhouse scripts to calculate the FLNC reads supporting HQ isoforms in Control and Ang II samples independently. Transcript quantification and differential expression analysis First, RNA-seq raw reads were cleaned by trimming using Cutadapt ([79]Martin, 2011) with the default parameter. The clean reads were mapped to the mRatBN7.2 reference genome using STAR (version 2.7.7a) ([80]Dobin et al., 2013) with parameters--twopassMode Basic--outFilterMultimapNmax 1 --outSAMstrandField intronMotif. Differential expression analysis between two conditions was performed using the DESeq2 R package (version 1.30.1) ([81]Love et al., 2014). The resulting p values were adjusted using Benjamin and Hochberg’s approach for controlling for the false discovery rate (FDR). Genes with an adjusted p-value <0.05, and |FoldChange (FC)| > 1.5 identified by DESeq2 ([82]Love et al., 2014) were assigned as differentially expressed. Transcripts per million (TPM) of reference transcriptome defined using Iso-Seq was used in calculating the expression level of transcripts by Salmon (v1.8.0) ([83]Patro et al., 2017). Functional annotation and enrichment analysis For differentially expressed genes (DEGs) and differentially spliced genes, GO and KEGG enrichment analysis were performed with clusterProfiler ([84]Yu et al., 2012), which supports statistical analysis and visualization of functional profiles for genes and gene clusters. Alternative promoter analysis Active promoter sites were identified by proActiv (v0.1.0) ([85]Demircioğlu et al., 2019) based on transcriptome generated by Iso-Seq. Short-reads of RNA-seq were aligned to the reference genome using STAR ([86]Dobin et al., 2013). The aligned RNA-seq bam files and GTF files generated from Iso-Seq were used as input by proActive to estimate promoter activities in each sample. We divided the promoter set into three categories depending on their relative promoter activity calculated by proActive, major, minor and inactive promoters. Promoters with the highest relative activity for each gene across the sample cohort were defined as major promoters. Besides, promoters with relative promoter activity less than 0.1 constitute inactive promoters whereas the rest promoters of a gene constitute minor promoters. Known promoter was defined if known genome annotation also uses this promoter. Novel promoter was defined only if Iso-Seq isoforms use this promoter. Differentially regulated promoters (DRPs) were identified using DESeq2. The resultant p values were adjusted using BH approach for controlling the false discovery rate (FDR). The promoters with an activity change level of |FoldChange| > 1.5 and adjusted p-value <0.05 were considered significant DRPs. We identified alternative promoters (APs) by considering both promoter expression and promoter activity. The criteria were as follows: (1) mean relative promoter activity >0.1 and were significantly changed between two conditions (FDR <0.05, BH adjusted); (2) |Fold Change| > 1.5 of promoter expression. (3) Gene expression FPKM >1. Alternative splicing and specific isoform analysis Through the use of SUPPA2 ([87]Trincado et al., 2018), a tool designed to find splicing events and determine whether there are differences between samples, alternative splicing patterns were found with the PacBio full-length transcripts. By scanning the ‘exon’ lines and categorizing each local event as a particular splicing type, the SUPPA2 “generateEvents” command was used to create local alternative splicing events from our high-quality PacBio transcript annotation. Then, using the Salmon ([88]Patro et al., 2017) quantification results, the SUPPA2 “psiPerEvent” command was used to quantify the percentage of spliced-in (PSI) values for each local alternative splicing event. Differential expression of AS events was screened out with the adjusted p-value <0.05. Alternative polyadenylation analysis PAS cluster was using with modified TAPIS 1.2.1 ([89]Abdel-Ghany et al., 2016) ([90]https://bitbucket.org/comp_bio/tapis/src/master/) scripts that PASs within 50 base pair (bp) were clustered as one PAS. APA analysis between two conditions was performed by using the algorithm of DaPars ([91]Xia et al., 2014). As DaPars predicts the proximal PAS using RNA-seq and Iso-Seq could provide validated PAS, we therefore supplied Iso-Seq supported PAS for APA analysis. These PAS were used as prior knowledge using a modified DaPars algorithm to calculate the distal PAS usage index (PDUI) based on RNA-seq coverage, and then Fisher exact test was used to determine differential APA events between the two groups, and the criteria for defining differential APA events between two groups were consistent with DaPars. 3′READS and Poly (A)-seq data analysis 3′READS and Poly (A)-seq reads were mapped to mRatBN7.2 genome using Bowtie 2 with the default parameters ([92]Langmead and Salzberg, 2012). Reads start pos were considered as potential PAS. PAS with at least five reads support is considered a potential validated PAS. CAGE data analysis CAGE reads were mapped to mRatBN7.2 genome using Bowtie 2 ([93]Langmead and Salzberg, 2012) with default parameters. Then peak calling was performed using MACS2 ([94]Zhang et al., 2008). The resultant files with suffix of.treat_pileup.bdg and.summits.bed were used for further analysis. Results Overall statistics of the Iso-Seq profiling on RAECs RNA-seq is widely used for the quantification of gene expression and AS events. However, it is still challenging to accurately identify full-length splicing isoforms using RNA-seq alone ([95]Abdel-Ghany et al., 2016). To reveal the complexity of the transcriptome of rat primary aortic endothelial cells (RAECs) and discover dynamic changes of alternative promoter (AP), alternative splicing (AS) and alternative polyadenylation (APA) in Ang II-induced endothelial cell senescence, we sequenced the transcriptome of RAECs before (Control) and after Ang II treatment using both PacBio Iso-Seq and Illumina RNA-seq techniques. We used PacBio’s official pipeline, IsoSeq3, to filter and process the raw sequencing files (details see: [96]https://github.com/PacificBiosciences/IsoSeq/blob/master/isoseq-clu stering.md). As a result, a total of 27,668,459 raw subreads with an average length of 1,838 bp were obtained in Control sample and 33,151,284 raw subreads with an average length of 1,700 bp in Ang II treatment samples ([97]Table 1). The PacBio official pipeline we adopted for the analysis of raw data includes circular consensus sequencing (CCS) reads calling, primer removal and demultiplexing, and the refining steps ([98]Supplementary Figure S1) (Details in Methods section). After processing with these steps, we got 1,469,285 FLNC reads with an average length of 2,361 bp in the Control sample and 1,676,143 FLNC with an average length of 2,486 bp in the Ang II sample, respectively ([99]Table 1). Then, polish step was performed on all the full-length transcripts to generate consensus isoforms. It created a consensus isoform for each cluster after classifying them into groups based on their sequence similarity. Subsequently, the full insert sequence that had not been tested was aligned to the adjusted to produce high-quality (HQ) isoforms (125,900) with a mean length of 2,603 from the consensus isoform and low quality (LQ) isoforms (5,323) with a mean length of 2,807 ([100]Table 1). To get non-redundant transcripts, HQ isoforms were mapped to rat genome (mRatBN7.2) using minimap2 ([101]Li, 2018) and collapsed with a third-party module Cupcakes Python script (collapse_isoforms_by_sam.py). Then we got 36,278 transcripts derived from 10,145 gene loci. These transcripts were annotated by SQANTI3 ([102]Tardaguila et al., 2018) for further analysis. TABLE 1. Summary of PacBio raw data analysis of Control and Ang II in RAECs. Subject Category Control Ang II Subreads Subreads number 27,668,459 33,151,284 Average length (bp) 1838 1700 Number of CCS Full-length non-concatemer (FLNC) 1,469,285 1,676,143 Average length (bp) 2,361 2,486 HQ isoform Number of polished high-quality isoforms 125,900 Average length (bp) 2,603 LQ isoform Number of polished low-quality isoforms 5,323 Average length (bp) 2,807 [103]Open in a new tab Full-length isoforms greatly improve rat genome annotation To validate the reliability of these isoforms identified by Iso-Seq, we used a series of public available datasets generated by independent methods including cap analysis gene expression (CAGE) data from FANTOM ([104]Lizio et al., 2015), RNA-seq short junction reads, rat 3′READ raw data used in polyA_DB3 ([105]Wang R. et al., 2018), rat PolyA-Seq data ([106]Derti et al., 2012), and ncbiRefSeq known rat genome annotation (REF) to construct reliable 5′-ends, splicing junction, and 3′-ends pool, respectively ([107]Figure 1Ai). If the isoforms have the same splicing pattern and PAS (within 50 nt) but different potential TSS, we only kept theose whose TSS are closest to the validated 5′-ends pool ([108]Figure 1Aii, iii). Since Iso-Seq protocol uses the similar strategy as the Clontech SMARTer cDNA kit to generate full-length cDNA, which does not perform cap trap, and thus the obtained 5′ends can be possibly derived from degradation products or reverse transcriptase drop offs ([109]Schaarschmidt et al., 2020). To filter out possible artificial isoforms, Iso-Seq defined transcription start site (TSS) and polyadenylation site (PAS) within 50 nt of validated 5′-ends and 3′-ends pool respectively, were called supported. As for splicing, if an Iso-Seq defined splicing event could be supported by at least two short junction reads in the pool, it was also called supported. We found considerable high support rates for TSS (96.5%), splicing junction (86.4%) and PAS (93.0%) ([110]Figure 1B), suggesting the reliability of Iso-Seq detected isoforms. FIGURE 1. [111]FIGURE 1 [112]Open in a new tab High confident full-length isoforms improve rat genome annotation. (A) The schematic diagram illustrating the cross-validation of full-length transcript end based on CAGE, PolyA-Seq and 3′READ. The transcriptome construction involves three steps. Step i: We generated validated pools of 5′-ends, splicing junction and 3′-ends from CAGE, short junction reads, PolyA-Seq and 3′READS, respectively, followed by comparing the aligned Iso-Seq isoforms with the validated 5′-end, junction, and 3′-end pool. Step ii: Mark those isoforms with the same splicing structure and PAS within 50 nt and only display 5′UTR difference (dotted closed circle). Step iii: We keep the one whose TSS is closest to the validated 5′-ends pool. (B) The support rate of PacBio isoforms for TSS, junction, and PAS independently. (C) Types and percentages of eight categories of PacBio defined transcripts. (D) Venn diagram shows the number of PacBio isoforms that differ from ncbiRefSeq in transcriptional start sites (TSS), alternative splicing (AS), and alternative polyadenylation (APA). (E) Arhgap21 as an example to show novel isoforms with novel TSSs when compared to ncbiRefSeq and were supported by CAGE. (F) Khdc4 as an example to show novel isoforms with intron retention when compared to ncbiRefSeq annotation and were supported by PacBio and RNA-seq. (G) Arhgef2 as an example to show novel isoforms with proximal PAS when compared to ncbiRefSeq annotation and were supported by PacBio and PolyA-Seq/3′READS. After confirming the reliability of the detected isoforms, we next asked whether full-length Iso-Seq could improve the gene annotation of rat genome. We used UCSC ncbiRefSeq as a reference for comparison, because this annotation is the most complete for rat genome ([113]Ji et al., 2020). We adopted the transcript structure classification strategy of SQANTI3 ([114]Tardaguila et al., 2018), which divides transcripts into eight categories compared to reference, to classify the Iso-Seq identified transcripts ([115]Supplementary Figure S2). Among all these transcripts annotated to known genes (n = 34,642), 61.59% (n = 21,337) were categorized as full-splice match (FSM, corresponds to a transcript with all the same splice junctions as a reference transcript) in catalog, 15.12% (n = 5,238) were categorized as novel not in catalog (NNC; transcripts use novel donors and/or acceptors), 13.54% (n = 4,692) were categorized as novel in catalog (NIC, transcripts contain new combinations of annotated splice junctions or novel splice junctions formed from annotated donors and acceptors), 8.42% (n = 2,916) were categorized as incomplete splice match (ISM, transcripts matching consecutive, but not all, splicing junctions of the REF), 0.59% (n = 204) were categorized as genic, 0.32% (n = 112) were categorized as intergenic transcripts, 0.23% (n = 80) were categorized as fusion, and 0.18% (n = 63) were categorized as antisense transcripts [structure categories defined by ([116]Tardaguila et al., 2018)] ([117]Figure 1C). We proceeded to characterize transcripts in these eight structure categories above. First, we cross-validated the reliability of transcripts in each category. PacBio transcripts with 5′-ends within 50 nt to CAGE or REF TSSs peaks were considered as validated TSSs, and PacBio transcripts 3′-end within 50 nt to PASs (generated from PolyA-Seq, 3′READ or REF) were considered as cross-validated PASs. Junctions of PacBio transcripts supported with RNA-seq junction reads were used for percent calculation. In line with our speculation, FSM category has high validation rate regarding TSS (100%), splicing junction (99.87%) and PAS (92.98%) ([118]Supplementary Figure S3). Noteworthy, two categories that contain novel sites (NNC) and (NIC) also showed consistently high validation rates (>95% for TSS, junction and PAS) ([119]Supplementary Figure S3), suggesting the novel isoforms detected by Iso-Seq are reliable. For categories that have lower validation rates (such as junction in genic and antisense categories), experimental or technical errors could possibly account for them, as discussed previously ([120]Tardaguila et al., 2018). After demonstrating the authenticity of these novel isoforms identified by Iso-Seq, we further explored the potential underlying contributing factors. We found AP, AS and APA could all contribute to the generation of novel isoforms, and APA contributed most, followed by AS and AP ([121]Figure 1D). Further, 92.83% of the APA-derived novel PASs are within 24 nt distance to validated PAS clusters ([122]Supplementary Figure S4), further supporting the biggest contribution of APA to novel isoforms. There were several good examples showing novel isoforms annotated by our data. For instance, the gene Arhgap21 has two novel TSSs not annotated in ncbiRefSeq but supported by CAGE data ([123]Figure 1E). The gene Khdc4 having intron retention (a type of AS), is also a contributor to novel isoforms ([124]Figure 1F). The gene Arhgef2 in the ncbiRefSeq annotation has only one PAS, however, we identified several APA isoforms of this gene based on our PacBio data, with both the shortest (novel proximal PAS) and the longest (annotated distal PAS) isoforms supported by strong PolyA-Seq/3′READs signals ([125]Figure 1G). Together, these above results support the notion that a combination of Iso-Seq and RNA-seq could largely improve the rat genome annotation. Expression changes at both gene and isoform levels in Ang II-induced RAEC senescence Previous analyses showed that Iso-Seq could identify full-length isoforms qualitatively but less quantitatively. In contrast, RNA-seq is more accurate in quantifying gene expression rather than distinguishing the different isoforms from the same gene ([126]Huang et al., 2021). Having confirmed the improvement of rat genome annotation by the combination of Iso-Seq and RNA-seq, we further applied this strategy to explore the expression changes at both gene and isoform levels in Ang II-induced senescent RAECs. First, we used RNA-seq to perform differential gene expression analysis and identified a total of 2,671 differentially expressed genes (DEGs) between Ang II-induced RAECs and the Control ones, including 1,249 upregulated and 1,422 downregulated DEGs ([127]Supplementary Figure S5A). The downregulated DEGs were most significantly enriched in Cell Cycle and DNA replication ([128]Supplementary Figure S5B), consistent with the cell cycle arrest phenotypes in Ang II-induced senescent RAECs ([129]Yang et al., 2020). The upregulated genes were highly enriched in ECM-receptor interaction, PI3K-Akt signaling pathway and TNF signaling pathway ([130]Supplementary Figure S5C), which were known to promote senescence ([131]Saito and Berk, 2002; [132]Yin et al., 2003; [133]Chen et al., 2012). Notably, a substantial number of DEGs were involved in Cell-Cycle pathway, including Cyclin- Dependent Kinase (CDK) gene family such as Cdk2, encoding a well-known cell cycle regulator, were decreased upon Ang II stimulation ([134]Supplementary Figure S5D). DEGs of PI3K-Akt signaling pathway, such as those encoding the inflammatory cytokine IL-6 (IL-6), and vascular endothelial growth factors A and D (Vegfa, Vegfd) were increased upon Ang II stimulation ([135]Supplementary Figure S5E). Overall, the DEG analysis with RNA-seq data indicates that the Ang II-induced cell senescence model is reliable and could be used for subsequent analysis. Next, we used FLNC reads generated from Iso-Seq to perform isoform level analysis. FLNC reads were chosen because they can be considered as full-length isoforms ([136]Wang et al., 2019). Although not as quantitative as RNA-seq, FLNC reads in Iso-Seq were still found to be semi-quantitative, especially for those full-length reads that derived from relatively highly expressed genes ([137]Uapinyoying et al., 2020; [138]Wyman et al., 2020). Therefore, we defined the isoform abundance as a percentage of isoform-supportive FLNC reads to the sum of all FLNC reads of a gene. The isoform with the largest percentage is called the major isoform, the isoform with abundance <0.1 is called inactive isoform, and the rest are called minor isoforms ([139]Figure 2A). The abundance distribution of these three isoform types is in line with their definition ([140]Figure 2B). The same trend was observed when quantification was performed with RNA-seq short reads ([141]Figure 2C). We next compared the isoform abundance between Ang II and Control conditions. The isoform abundance between these two conditions was overall correlated ([142]Figure 2D, Pearson’s correlation coefficient = 0.91; p < 2.2e-16). The abundance of most isoforms (78.62%, |Δ abundance| < 0.1) was not changed ([143]Figure 2D). However, if we do not consider isoforms that are inactive before and after Ang II stimulation (n = 14,329) ([144]Table 2), 36.2% of the isoforms undergo change (|Δ abundance| ≥ 0.1) in the proportion of usage before and after stimulation. Specifically, 3,548 isoforms showed upregulated expression and 3,470 isoforms showed downregulated expression upon Ang II treatment ([145]Figure 2D). KEGG pathway enrichment analysis showed that genes with changed isoform usage were more likely to be enriched in senescence or aging-related pathways such as p53 signaling pathway, mTOR signaling pathway and MAPK signaling pathway when compared to those genes with unchanged isoform usage ([146]Supplementary Figures S6A, B). We also used RNA-seq to examine the difference in isoform abundance and obtained similar results ([147]Figure 2E; [148]Table 3; [149]Supplementary Figures S6C, D). FIGURE 2. [150]FIGURE 2 [151]Open in a new tab Isoform abundance changed between Control and Ang II. (A) Classification of isoform abundance according to the percentage of FLNC reads support. (B) Isoform abundance (PacBio quantification) distribution between Major, Minor and Inactive isoforms. (C) Isoform abundance (RNA-seq quantification) distribution between Major, Minor and Inactive isoforms. (D) Scatter plot of the isoform abundance between Ang II and control conditions (Iso-Seq quantification). (Up + Down)/(Total–Inactive) = (3,548 + 3,470)/(33,706–14,329) = 36.2%, Around 36.2% of the identified isoforms changed the proportion of usage before and after Ang II stimulation. (E) Scatter plot of the isoform abundance between Ang II and control conditions (RNA-seq quantification). Correlation of the isoform abundance (PacBio quantification) between Control and Ang II. The text in the figure indicates example genes. TABLE 2. State of isoform changing in Control and Ang II (Iso-Seq quantification). Control.Class Ang II.Class n Inactive Inactive 14,329 Major Major 8,651 Minor Minor 4,559 Minor Inactive 1,891 Inactive Minor 1,813 Major Minor 949 Minor Major 915 Inactive Major 340 Major Inactive 259 [152]Open in a new tab TABLE 3. State of isoform changing in Control and Ang II (RNA-seq quantification). Control.Class Ang II.Class n Inactive Inactive 22,469 Major Major 6,737 Minor Minor 1,132 Minor Inactive 1,213 Inactive Minor 1,194 Major Minor 717 Minor Major 751 Inactive Major 1,359 Major Inactive 706 [153]Open in a new tab We selected Mxi1 and Cdh2 as examples to show the isoform usage difference upon Ang II treatment, both genes were associated with aging/senescence diseases ([154]Schreiber-Agus et al., 1998; [155]Mayosi et al., 2017; [156]Kruse et al., 2019; [157]Zhuo et al., 2019). Iso-Seq detected three isoforms of Mxi1, and two of these isoforms showed differential expression between Ang II and Control, despite no differential gene expression (Log2FC = 0.16), highlighting the importance of examining isoform level changes ([158]Supplementary Figure S7). Cdh2 gene is widely studied in CVDs and other diseases ([159]Mayosi et al., 2017; [160]Kruse et al., 2019; [161]Zhuo et al., 2019). Cdh2 showed PAS preference after Ang II stimulation, the major isoforms of Cdh2 in control tend to use the distal PAS while major isoforms in Ang II tend to use the proximal one ([162]Supplementary Figure S8). Alternative promoter regulation in ang II-induced RAEC senescence As alternative promoter ([163]Demircioğlu et al., 2019; [164]Huang et al., 2021), alternative splicing ([165]Roy et al., 2013) and alternative polyadenylation ([166]Chen et al., 2017) are three major contributors to isoform diversity in mammalian cells, we therefore respectively explored their contributions in Ang II-induced RAECs senescence. To explore the alternative promoter (AP) usage, we applied proActive ([167]Demircioğlu et al., 2019) to carry out the promoter analysis, wherein the promoter usage was defined based on the first junction from transcriptome GTF file (here we used the GTF file generated from our Iso-Seq data), and the activity of the promoter was evaluated based on the number of junction reads supported by RNA-seq ([168]Figure 3A). The promoter usage predicted by proActive is consistent with CAGE and H3K4me3 histone ChIP-seq data ([169]Demircioğlu et al., 2019). We identified 1,109 genes with multiple promoters and 9,013 genes with single promoter ([170]Figure 3B) using Iso-Seq data of RAECs. proActive classifies promoters into Major, Minor and Inactive according to their relative promoter activity. Promoters were also classified into known and novel promoters ([171]Figure 3C, see Methods for details), and the known promoter accounted for the largest proportion, while the novel ones accounted for a small proportion ([172]Figure 3D). We then applied DESeq2 to analyze differentially regulated promoters (DRPs) ([173]Huang et al., 2021) between Control and Ang II-treated cells, and found 188 and 197 known promoters were upregulated and downregulated respectively in Ang II-treated RAECs. Notably, 18 and 10 novel promoters were upregulated and downregulated respectively upon Ang II stimulation ([174]Figure 3E). Then we compared the genes with DRPs (DRPGs) with differentially expressed genes (DEGs) and found that 79.8% downregulated promoters overlapped with downregulated genes ([175]Supplementary Figure S9A). These genes with downregulated promoters were enriched in functional categories such as cell division, nuclear division, negative regulation of cell cycle ([176]Supplementary Figure S9B). 74.6% upregulated promoters overlapped with upregulated genes ([177]Supplementary Figure S9C), and genes with upregulated promoters were enriched in Ang II stimulation-related terms such as cellular response to external stimulus, wound healing, regulation of vascular development, regulation of angiogenesis, and reactive oxygen species metabolic process ([178]Supplementary Figure S9D). These results above indicated that the changed promoter expression may probably function in AngII-induced senescence in RAECs. FIGURE 3. [179]FIGURE 3 [180]Open in a new tab Alternative promoter activity analysis before and after Ang II treatment in RAECs. (A) Schematic representation of promoter definition in this study ([181]Demircioğlu et al., 2019). (B) Number of single promoter genes and multiple promoter genes identified in the transcriptome that Iso-Seq generated. (C) Example of known and novel promoters in RAECs. Promoters are considered known if known ncbiRefSeq transcripts also use this promoter. Promoters with the highest average activity are considered Major promoters, and all other promoters of the same gene are assigned as Minor promoters ([182]Huang et al., 2021). (D) The number of Major, Minor promoters in known and novel categories. Most promoters are known and the proportion of novel and minor promoters are small. (E) Differentially expressed promoter analysis in RAECs. Black color stands for not significantly differentially expressed promoters, gray color stands for significantly differentially expressed known promoters, and green color stands for significantly differentially expressed novel promoters. (F) Schematic representation of promoter activity switching. (G) Venn diagram shows the overlap of genes with upregulated (Up) and downregulated (Down) promoters. (H) Examples of genes that undergo dynamic changes in alternative promoter activity, alternative promoter was marked with ▲. Next, we asked how many promoters are experiencing dynamic changes between Control and Ang II based on the promoter usage and relative promoter activity ([183]Figure 3F). Relative promoter activity difference >0.1 and |FoldChange| > 1.5 were considered as alternative promoter (AP) between two conditions (Details in Methods section). A total of 239 APs derived from 121 genes were identified, including 34 genes with both upregulated and downregulated APs, which means switched promoter usage upon Ang II stimulation ([184]Figure 3G). Three genes were selected as representatives for the promoter usage switch ([185]Figure 3H). Map1b showed to have two promoters (Prmtr.29507 and Prmtr.29508) in Control, and after Ang II stimulation, Prmtr.29507 was downregulated (log2FC = −2.56) and Prmtr.29508 was upregulated (log2FC = 1.21) ([186]Figure 3H left panel). In Cdk16, the inactive promoter (Prmtr.4399) in Control turned to be active after Ang II stimulation (log2FC = 1.95), though the major promoter (Prmtr.4398) kept not changed (log2FC = −0.21) ([187]Figure 3H middle panel). As to Hpcal1, the activated promoter (Prmtr.11774) originally in Control turned to be inactive after Ang II treatment (log2FC = −1.84), but the major promoter (Prmtr.11773) did not show obvious change (log2FC = 0.35) ([188]Figure 3H right panel). These observations suggest that APs serve as one contributor to the transcriptome diversity in Ang II-induced senescence in RAECs. Alternative splicing regulation in ang II-induced RAEC senescence Alternative splicing (AS) is a well-known contributor to transcriptome diversity, and participates in diverse biological processes ([189]Wright et al., 2022). Therefore, we examined whether alternative splicing also showed dynamic changes upon Ang II stimulation in RAECs using SUPPA2 ([190]Trincado et al., 2018). AS patterns were generated based on the GTF file generated from Iso-Seq and quantified by Salmon quantification results (See Methods for details). All the AS events were classified into seven types: alternative 3′-acceptor (A3), alternative 5′-donor (A5), alternative first exon (AF), alternative last exon (AL), mutually exclusive exon (MX), intron retention (IR) and exon skipping (SE) ([191]Figure 4A). Among the expressed 10,122 genes based on PacBio Iso-Seq, 7,321 genes expressed more than one isoform. The most frequent AS type was SE (6,968), followed by AF (4,618), A3 (3,701), A5 (3,026), AL (593), IR (455) and MX (356) ([192]Figure 4B). Interestingly, the most frequent novel AS type found was AF (3,877), and IR (2,047) was the second ([193]Figure 4B), suggesting these two types of AS events are especially less annotated in the ncbiRefSeq. Next, we compared the percent spliced-in (PSI) values between known and novel in the seven AS types. We found that the expression of known AS events are generally higher than the novel ones. In novel AS events, the expression of AF, AL, MX was relatively higher than the expression of other events ([194]Figure 4C). SUPPA2 was next used for identification of differential AS events between Ang II and Control, which was evaluated with ΔPSI, the difference of the mean PSI between Ang II and Control. There were 666 significant different AS events between Ang II and Control, among of which, AF counts the largest proportion (n = 223, 33.5%) ([195]Figure 4D). Of note, AF was the most frequent novel differentially spliced AS type and SE was the most frequent known differentially spliced AS type ([196]Figure 4E). FIGURE 4. [197]FIGURE 4 [198]Open in a new tab Overview of Alternative splicing (AS) events in Ang II-induced RAEC senescence. (A) Schematic diagram of the seven types of AS events defined by SUPPA2 when compared to ncbiRefSeq of rat genome. (B) The number of AS events detected by SUPPA2 based on Iso-Seq transcriptome. (C) Expression ratio of AS events in novel and known. (D) Differential AS events (p-value <0.05) between Ang II and control conditions. (E) The number of significantly spliced AS events between Ang II and control. (F) GO enrichment analysis of differentially spliced genes between Ang II and control cells. (G) RNA-seq track plot of gene Castor1 before and after Ang II stimulation. Dashed circle indicates the differential intron retention (IR) event between two conditions. (H) RNA-seq track plot of gene Atp5mc1 upon Ang II treatment. The dashed circle represents the alternative first exon (AF) event between two conditions. We functionally categorized the differential AS genes above and found that they were enriched in Gene Ontology (GO) terms such as wound healing, regulation of intrinsic apoptotic signaling pathway, and regulation of cell growth ([199]Figure 4F). The importance of differential AS in Ang II-induced RAEC senescence is indicated by the following two example genes. Castor1, a regulator of mTORC1 ([200]Li et al., 2021), which plays a key role in cell proliferation in response to nutrition and growth stimuli, showed decreased intron retention level (a novel AS event for Castor1) upon Ang II stimulation ([201]Figure 4G). Atp5mc1, a gene associated with CVDs ([202]Hartmann et al., 2022), also displayed a usage shift of alternative first exon upon Ang II treatment ([203]Figure 4H). These results indicate that AS can contribute to Ang II-induced RAECs senescence through isoform changes in responsible genes. Alternative polyadenylation regulation in Ang II-induced RAEC senescence Alternative polyadenylation (APA) fine-tunes gene expression and largely enhances the complexity of mammalian transcriptome. To explore whether and to what extent APA contributes to the isoform changes in response to Ang II stimulation in RAECs, we performed APA analysis by integrating both Iso-Seq and RNA-seq data. We used Iso-Seq to identify Polyadenylation sites (PASs) and RNA-seq to quantify APA changes. PASs within 50 nt were clustered into one PAS, and PASs with distance >50 nt were considered as different PASs, as defined previously ([204]Tang et al., 2022). If a PAS is within 50 nt of a known ncbiRefSeq PAS, it is considered as known, otherwise defined as novel ([205]Figure 5A). We found that Iso-Seq detected a large number of PASs, including thousands of novel PASs ([206]Figure 5B). To evaluate whether these novel PASs are authentic, we compared them with the experimentally verified 3′-ends pool generated by 3′READ, PolyA-Seq and polyA_DB3 of rat samples. We found that most PASs were within 50 nt to those in the validated 3′-ends pool for both known and novel PASs ([207]Figure 5C). We analyzed the nucleotide composition of the 50 nt around the identified PASs and found that the 25 nt region upstream PAS is predominantly A-enriched, while the downstream is predominantly U-enriched ([208]Supplementary Figure S10A), consistent with known literature reports ([209]Chen et al., 1995). Moreover, we used MEME ([210]Bailey et al., 2015) to discover motif and found the canonical polyadenylation signal (AAUAAA) also over-represented in the PASs ([211]Supplementary Figure S10B) ([212]Elkon et al., 2013). These above lines of evidence suggested that these PASs identified by Iso-Seq are of satisfied quality and reliable for further analysis. A total of 5,755 (57.32%) genes had at least two PASs based on Iso-Seq of RAECs ([213]Figure 5D). We next analyzed the differential usage of PAS between Ang II and control RAECs using the modified algorithm of DaPars ([214]Xia et al., 2014) for the differential PAS usage between Ang II and control RAECs. DaPars uses mean squared error (MSE) to predict the proximal PAS with mean RNA-seq coverage. Here Iso-Seq can provide reliable but not predicted PASs, we therefore used these PASs as prior knowledge and calculated their differential PAS usage between Ang II and Control cells by Fisher’s exact test ([215]Figure 5E, see details in Method section). We found that 278 and 18 genes showed a preference usage of proximal and distal PASs, respectively, showing a global 3′UTR shortening upon Ang II stimulation ([216]Figure 5F). Most of the length-changed 3′UTRs are derived from novel PAS ([217]Figure 5F), suggesting Iso-Seq provides previously-unknown information for studying the APA regulation in Ang II-induced RAEC senescence. Further analysis showed that genes tending to use proximal PASs were enriched in pathways related to senescence such as p53 signaling pathway ([218]Ciardiello and Tortora, 2008; [219]Huang and Fu, 2015) ([220]Figure 5G), implying that APA may fine-tunes certain key pathways to participate in regulation of Ang II-induced senescence. FIGURE 5. [221]FIGURE 5 [222]Open in a new tab Global analysis of alternative polyadenylation combined with Iso-Seq and RNA-seq in RAECs. (A) The Schematic diagram of known and novel PASs. (B) The number of known and novel PASs identified by Iso-Seq. (C) Histogram of the genomic distance between Iso-Seq identified PASs and validated 3′ends pool in both known and novel classes. (D) Distribution of genes that have one or more PAS. (E) Differential usage of PAS was calculated based on given proximal PAS as prior knowledge. (F) Scatter plot between ΔPDUI and expression changes for genes with significantly longer or shorter 3′UTRs. Green and gray represent genes with novel and known PASs, respectively. (G) KEGG pathway enrichment analysis for genes tended to use proximal PAS after Ang II stimulation. As mentioned above, Iso-Seq and RNA-seq have their advantages and disadvantages. PacBio Iso-Seq data provides more reliable PASs even at relatively low coverage, and DaPars was originally designed to use Illunima’s RNA-seq data to predict proximal PASs usage and analyze global 3′UTR length changes between two conditions. Here we ran both the original DaPars program solely on RNA-seq data (Hereinafter referred to as RNA-seq) and the modified DaPars on both Iso-Seq and RNA-seq data (Hereinafter referred to as Iso-Seq), the results run with original DaPars used as comparison references. The DaPars result based