Abstract

   In eukaryotes, alternative promoter (AP), alternative splicing (AS),
   and alternative polyadenylation (APA) are three crucial regulatory
   mechanisms that modulate message RNA (mRNA) diversity. Although AP, AS
   and APA are involved in diverse biological processess, whether they
   have dynamic changes in Angiotensin II (Ang II) induced senescence in
   rat primary aortic endothelial cells (RAECs), an important cellular
   model for studying cardiovascular disease, remains unclear. Here we
   integrated both PacBio single-molecule long-read isoform sequencing
   (Iso-Seq) and Illumina short-read RNA sequencing (RNA-seq) to analyze
   the changes of AP, AS and APA in Ang II-induced senescent RAECs.
   Iso-Seq generated 36,278 isoforms from 10,145 gene loci and 65.81% of
   these isoforms are novel, which were further cross-validated by public
   data obtained by other techonologies such as CAGE, PolyA-Seq and
   3′READS. APA contributed most to novel isoforms, followed by AS and AP.
   Further investigation showed that AP, AS and APA could all contribute
   to the regulation of isoform, but AS has more dynamic changes compared
   to AP and APA upon Ang II stimulation. Genes undergoing AP, AS and APA
   in Ang II-treated cells are enriched in various pathways related to
   aging or senescence, suggesting that these molecular changes are
   involved in functional alterations during Ang II-induced senescence.
   Together, the present study largely improved the annotation of rat
   genome and revealed gene expression changes at isoform level, extending
   the understanding of the complexity of gene regulation in Ang
   II-treated RAECs, and also provided novel clues for discovering the
   regulatory mechanism undelying Ang II caused vascular senescence and
   diseases.

   Keywords: rat primary aortic endothelial cells senescence, Iso-Seq,
   RNA-seq, Angiotensin II, alternative promoter, alternative splicing,
   alternative polyadenylation

Introduction

   Rat has become an important model in the field of cardiovascular
   diseases (CVDs) such as hypertension, cardiac hypertrophy, and heart
   failure ([34]Doggrell and Brown, 1998). However, the Rat genome is not
   well annotated compared to that of human and mouse ([35]Wang et al.,
   2019). Endothelial cell (EC) senescence appears to be the first step in
   a series of events that culminate with the development of
   cardiovascular pathologies ([36]Carracedo et al., 2018). The
   renin-angiotensin system (RAS) plays an important role in vascular
   biology and Angiotensin II (Ang II) is a principal effector of RAS. Ang
   II serves as an important signaling molecule involved in atherogenic
   stimuli and is known to promote cellular senescence and aging ([37]Shan
   et al., 2008). Ang II can stimulate cell migration ([38]Nadal et al.,
   2002) and induce angiogenesis via upregulation of vascular endothelial
   growth factor (VEGF) ([39]Suganuma et al., 2005). Ang II is also
   involved in the pathogenesis of endothelial dysfunction (ED) through
   multiple signaling pathways, including angiotensin type 1 receptor
   (AT1R)-mediated NADPH oxidase (Nox)/reactive oxygen species (ROS)
   ([40]Nguyen Dinh Cat et al., 2012). Most CVDs, as well as age-related
   cardiovascular changes, are associated with increases in oxidative
   stress, due to increased generation and/or decreased metabolism of ROS
   ([41]Ding et al., 2020). Ang II can also activate NF-κB pathway, which
   in turn induced the expression of pro-inflammatory genes ([42]Kim et
   al., 2012) and inflammation-related genes [e.g., VCAM-1, iNOS, COX-2
   and IL-6] ([43]Yang et al., 2018). These inflammatory factors ususlly
   correlated with series of senescence-associated secretory phenotypes
   (SASP). Previous studies have investigated the events of Ang II-induced
   senescence of ECs from molecular perspective ([44]Forrester et al.,
   2018), however, no systematic analysis of the dynamic changes at the
   isoform levels has been performed.

   Transcription is a highly regulated process, and stress-induced changes
   in gene transcription have been shown to play a major role in stress
   response and adaptation ([45]Vilborg et al., 2017). Alternative
   promoter (AP), alternative splicing (AS), and alternative
   polyadenylation (APA) are three key steps that determine the final
   products of gene transcription, which greatly increase the
   transcriptome complexity ([46]Anvar et al., 2018; [47]Demircioğlu et
   al., 2019). Alternative promoter generates transcription start sites
   (TSSs) from different genomic locations and results in gene isoforms
   with distinct 5′untranslated regions (5′UTRs) ([48]Ayoubi and Van De
   Yen, 1996). Alternative promoters are particularly fascinating due to
   their abnormal usage in several diseases ([49]Singer et al., 2008).
   Previous analysis shows that the p21-coding gene has two transcripts,
   p21B and p21C, derived from alternative promoters and exhibit different
   functions. While p21C induces cell cycle arrest, p21B appears to
   stimulate apoptosis ([50]Landry et al., 2003).

   In eukaryotic genomes, AS is a prevalent phenomenon that can expand the
   protein pool without increasing the number of genes ([51]Roy et al.,
   2013). This helps to promote the evolution of complex functional
   transcriptomes that can control a range of cellular, molecular, and
   developmental events. AS is emerging as a critical contributor to
   senescence and aging ([52]Kwon et al., 2021). For example, PRPF19
   downregulation changes the MDM4 splicing isoform from MDM4-FL to
   MDM4-S, which induces p53-p21-dependent cellular senescence ([53]Yano
   et al., 2021).

   APA is a phenomenon that RNA molecules with different 3′ends originate
   from distinct polyadenylation sites (PASs) of a single gene, which can
   influence the translation efficiency, stability, and localization of an
   mRNA ([54]Tian and Manley, 2016; [55]Chen et al., 2017). Multiple
   studies had showed that APA-mediated 3′UTR length changes were
   dynamically regulated during cellular senescence and could serve as a
   novel player in regulating senescence ([56]Chen et al., 2018; [57]Shen
   et al., 2019; [58]Schwich et al., 2021). Although AP, AS or APA have
   been separately reported to function in replicative senescence or
   age-related diseases, whether they are involved in Ang II-induced RAECs
   senescence, an important and widely used cellular model in
   cardiovascular disease, remains unkown.

   Traditional short-read RNA-seq is commonly used for quantifying
   genome-wide gene expression and AS events due to its high throughput
   ([59]Wang et al., 2009). But there is still a challenge for short reads
   [typically 100 to 250 base pairs (bp)] to infer the structure of
   full-length transcripts, which are often several thousand bases long
   ([60]Williams et al., 2014). Single-molecule real-time (SMRT) Iso-Seq
   using the Pacific Biosciences (PacBio) platforms can capture
   full-length transcripts, making it possible to refine genome annotation
   and discover novel transcripts and fusion genes. However, the base
   sequencing error rate for Iso-Seq is relatively higher than RNA-seq, so
   high-quality RNA-seq reads are often used to correct errors in Iso-Seq
   such as FMLRC ([61]Wang J. R. et al., 2018), proovread ([62]Förster et
   al., 2014), LoRDEC ([63]Rivals and Salmela, 2014). The quantification
   accuracy for Iso-Seq is also lower than RNA-seq ([64]Wang et al.,
   2019). Therefore, a combination of both Iso-Seq and RNA-seq would
   improve the performance at both qualitative and quantitative levels.

   In this study, we systematically characterized the transcriptome
   diversity and changes in RAECs before and after Ang II treatment. We
   first used the PacBio Iso-Seq approach ([65]Gordon et al., 2015) to
   generate full-length isoforms. We found widespread transcript
   diversity, with the detection of novel transcripts not in existing rat
   genomic annotations, and used short-read RNA-seq to cross-validate and
   complement the Iso-Seq result. We discovered many novel and
   differential alternative AP, AS and APA events between Control and Ang
   II-induced cells. Our results widen the research of Ang II-induced
   senescence in RAECs and also contribute to the investigation of
   transcriptome complexity in the research field of CVDs.

Materials and methods

Culture of rat primary aorta endothelial cells (RAECs)

   Rat primary aortic endothelial cells (RAECs) were isolated from male
   rats and those with the purity greater than 95% (determined by both
   positive staining for CD31 and morphologically based on their classical
   “hill and valley” appearance) were kept for the subsequent study, as
   performed according to the methods described previously ([66]Yang et
   al., 2018). Cells that experienced three to five passages were used for
   Ang II stimulation experiments. Quantified RAECs were treated with Ang
   II (2 μM) or vehicle (sodium chloride) for 48 hours (h) as described
   before ([67]Yang et al., 2018).

RNA-seq library construction

   PolyA^+ RNA was enriched by oligo (dT)25 Dynabeads (Invitrogen) from
   total RNA. The dUTP-based strand-specific RNA-seq libraries for control
   and Ang II-induced RAECs were constructed according to the method
   previously described ([68]Yang et al., 2020). RNA-seq libraries were
   sequenced on Illumina HiSeq platform, and paired-end reads of 150
   nucleotide (nt) lengths were obtained.

PacBio sequencing library construction

   PolyA^+ RNA after double-stranded cDNA preparation, PCR amplification,
   product purification with AMPurePB magnetic beads, verification of DNA
   size, followed by qualitative and quantitative analysis using a
   Bioanalyzer. The resulting double-stranded DNA was used to generate the
   SMRTbell™ libraries using PacBio Template Prep Kit. SMRTbell templates
   are then sequenced in the PacBio Sequel System.

Iso-seq data processing

   We performed initial data processing using IsoSeq v3 pipeline
   ([69]IsoSeq/isoseq-clustering.md at
   master·PacificBiosciences/IsoSeq·GitHub) for raw subreads. The circular
   consensus sequencing (CCS) reads were obtained through the consensus
   generation step. Then, we determine whether they are classified as
   full-length reads based on the presence or absence of the 5′and
   3′primers, and use lima v2.0.0 ([70]https://lima.how/) to remove the
   primers. The obtained full-length reads, were refined by trimming poly
   (A) tails and removing concatemer, then full-length non-concatemer
   (FLNC) reads are finally obtained. To find transcript clusters,
   isoform-level clustering algorithm ICE (Iterative Clustering for Error
   Correction) ([71]Gordon et al., 2015) was applied to all full-length
   (FL) transcripts to obtain the consensus sequence for each cluster.
   Quiver ([72]Chin et al., 2013) was used for error correction to obtain
   high-quality (HQ) (accuracy, ≥99%) isoforms. To get non-redundant
   isoforms, HQ isoforms were mapped to rat genome (mRatBN7.2) using
   minimap2 ([73]Li, 2018) with parameters -ax splice -uf--secondary = no
   -C5 -O6,24 -B4, and collapsed by a third-party module Cupcakes Python
   script (collapse_isoforms_by_sam.py) with the following parameters “-c
   0.99 -i 0.85”. The 5′degraded isoforms were filtered away because the
   Clontech SMARTer cDNA kit used to create the full-length cDNA does not
   do cap trap, these may be 5′RNA degradation products and not
   biologically meaningful. In that case, we filter them away according to
   the method described in Cupcake
   ([74]https://github.com/Magdoll/cDNA_Cupcake/wiki/Cupcake:-supporting-s
   cripts-for-Iso-Seq-after-clustering-step#filtersubset). Then the
   resulted isoforms were finally annotated using SQANTI3 ([75]Tardaguila
   et al., 2018). SQANTI3 marked the internal priming resulted PAS by
   identifying downstream six consecutive “A” or the percentage of “A” is
   greater than 60% in the 20 nucleotides downstream of the potential PAS,
   the transcript is considered to be an IntraPriming transcript and is
   filtered out.

   The coding potential of the resulted isoforms was also predicted with
   SQANTI3 which uses GeneMarkS-T (GMST) algorithm ([76]Tang et al.,
   2015). GeneMarkS-T utilizes iterative self-training and a hidden
   semi-Markov model to predict coding regions in eukaryotic transcripts.

   All FLNC reads were merged and the FMLRC ([77]Wang J. R. et al., 2018)
   was then used for error correction with RNA-seq to get high-quality
   FLNC reads. The corrected FLNC reads were then mapped to mRatBN7.2
   genome using minimap2 ([78]Li, 2018) with parameters -ax splice
   -uf--secondary = no -C5 -O6,24 -B4, then using the sam_to_gff3.py from
   cDNA_Cupcake to generate an FLNC gff3 file. This file was then compared
   with non-redundant HQ isoforms using inhouse scripts to calculate the
   FLNC reads supporting HQ isoforms in Control and Ang II samples
   independently.

Transcript quantification and differential expression analysis

   First, RNA-seq raw reads were cleaned by trimming using Cutadapt
   ([79]Martin, 2011) with the default parameter. The clean reads were
   mapped to the mRatBN7.2 reference genome using STAR (version 2.7.7a)
   ([80]Dobin et al., 2013) with parameters--twopassMode
   Basic--outFilterMultimapNmax 1 --outSAMstrandField intronMotif.
   Differential expression analysis between two conditions was performed
   using the DESeq2 R package (version 1.30.1) ([81]Love et al., 2014).
   The resulting p values were adjusted using Benjamin and Hochberg’s
   approach for controlling for the false discovery rate (FDR). Genes with
   an adjusted p-value <0.05, and |FoldChange (FC)| > 1.5 identified by
   DESeq2 ([82]Love et al., 2014) were assigned as differentially
   expressed. Transcripts per million (TPM) of reference transcriptome
   defined using Iso-Seq was used in calculating the expression level of
   transcripts by Salmon (v1.8.0) ([83]Patro et al., 2017).

Functional annotation and enrichment analysis

   For differentially expressed genes (DEGs) and differentially spliced
   genes, GO and KEGG enrichment analysis were performed with
   clusterProfiler ([84]Yu et al., 2012), which supports statistical
   analysis and visualization of functional profiles for genes and gene
   clusters.

Alternative promoter analysis

   Active promoter sites were identified by proActiv (v0.1.0)
   ([85]Demircioğlu et al., 2019) based on transcriptome generated by
   Iso-Seq. Short-reads of RNA-seq were aligned to the reference genome
   using STAR ([86]Dobin et al., 2013). The aligned RNA-seq bam files and
   GTF files generated from Iso-Seq were used as input by proActive to
   estimate promoter activities in each sample. We divided the promoter
   set into three categories depending on their relative promoter activity
   calculated by proActive, major, minor and inactive promoters. Promoters
   with the highest relative activity for each gene across the sample
   cohort were defined as major promoters. Besides, promoters with
   relative promoter activity less than 0.1 constitute inactive promoters
   whereas the rest promoters of a gene constitute minor promoters. Known
   promoter was defined if known genome annotation also uses this
   promoter. Novel promoter was defined only if Iso-Seq isoforms use this
   promoter. Differentially regulated promoters (DRPs) were identified
   using DESeq2. The resultant p values were adjusted using BH approach
   for controlling the false discovery rate (FDR). The promoters with an
   activity change level of |FoldChange| > 1.5 and adjusted p-value <0.05
   were considered significant DRPs. We identified alternative promoters
   (APs) by considering both promoter expression and promoter activity.
   The criteria were as follows: (1) mean relative promoter activity >0.1
   and were significantly changed between two conditions (FDR <0.05, BH
   adjusted); (2) |Fold Change| > 1.5 of promoter expression. (3) Gene
   expression FPKM >1.

Alternative splicing and specific isoform analysis

   Through the use of SUPPA2 ([87]Trincado et al., 2018), a tool designed
   to find splicing events and determine whether there are differences
   between samples, alternative splicing patterns were found with the
   PacBio full-length transcripts. By scanning the ‘exon’ lines and
   categorizing each local event as a particular splicing type, the SUPPA2
   “generateEvents” command was used to create local alternative splicing
   events from our high-quality PacBio transcript annotation. Then, using
   the Salmon ([88]Patro et al., 2017) quantification results, the SUPPA2
   “psiPerEvent” command was used to quantify the percentage of spliced-in
   (PSI) values for each local alternative splicing event. Differential
   expression of AS events was screened out with the adjusted p-value
   <0.05.

Alternative polyadenylation analysis

   PAS cluster was using with modified TAPIS 1.2.1 ([89]Abdel-Ghany et
   al., 2016) ([90]https://bitbucket.org/comp_bio/tapis/src/master/)
   scripts that PASs within 50 base pair (bp) were clustered as one PAS.
   APA analysis between two conditions was performed by using the
   algorithm of DaPars ([91]Xia et al., 2014). As DaPars predicts the
   proximal PAS using RNA-seq and Iso-Seq could provide validated PAS, we
   therefore supplied Iso-Seq supported PAS for APA analysis. These PAS
   were used as prior knowledge using a modified DaPars algorithm to
   calculate the distal PAS usage index (PDUI) based on RNA-seq coverage,
   and then Fisher exact test was used to determine differential APA
   events between the two groups, and the criteria for defining
   differential APA events between two groups were consistent with DaPars.

3′READS and Poly (A)-seq data analysis

   3′READS and Poly (A)-seq reads were mapped to mRatBN7.2 genome using
   Bowtie 2 with the default parameters ([92]Langmead and Salzberg, 2012).
   Reads start pos were considered as potential PAS. PAS with at least
   five reads support is considered a potential validated PAS.

CAGE data analysis

   CAGE reads were mapped to mRatBN7.2 genome using Bowtie 2 ([93]Langmead
   and Salzberg, 2012) with default parameters. Then peak calling was
   performed using MACS2 ([94]Zhang et al., 2008). The resultant files
   with suffix of.treat_pileup.bdg and.summits.bed were used for further
   analysis.

Results

Overall statistics of the Iso-Seq profiling on RAECs

   RNA-seq is widely used for the quantification of gene expression and AS
   events. However, it is still challenging to accurately identify
   full-length splicing isoforms using RNA-seq alone ([95]Abdel-Ghany et
   al., 2016). To reveal the complexity of the transcriptome of rat
   primary aortic endothelial cells (RAECs) and discover dynamic changes
   of alternative promoter (AP), alternative splicing (AS) and alternative
   polyadenylation (APA) in Ang II-induced endothelial cell senescence, we
   sequenced the transcriptome of RAECs before (Control) and after Ang II
   treatment using both PacBio Iso-Seq and Illumina RNA-seq techniques.

   We used PacBio’s official pipeline, IsoSeq3, to filter and process the
   raw sequencing files (details see:
   [96]https://github.com/PacificBiosciences/IsoSeq/blob/master/isoseq-clu
   stering.md). As a result, a total of 27,668,459 raw subreads with an
   average length of 1,838 bp were obtained in Control sample and
   33,151,284 raw subreads with an average length of 1,700 bp in Ang II
   treatment samples ([97]Table 1). The PacBio official pipeline we
   adopted for the analysis of raw data includes circular consensus
   sequencing (CCS) reads calling, primer removal and demultiplexing, and
   the refining steps ([98]Supplementary Figure S1) (Details in Methods
   section). After processing with these steps, we got 1,469,285 FLNC
   reads with an average length of 2,361 bp in the Control sample and
   1,676,143 FLNC with an average length of 2,486 bp in the Ang II sample,
   respectively ([99]Table 1). Then, polish step was performed on all the
   full-length transcripts to generate consensus isoforms. It created a
   consensus isoform for each cluster after classifying them into groups
   based on their sequence similarity. Subsequently, the full insert
   sequence that had not been tested was aligned to the adjusted to
   produce high-quality (HQ) isoforms (125,900) with a mean length of
   2,603 from the consensus isoform and low quality (LQ) isoforms (5,323)
   with a mean length of 2,807 ([100]Table 1). To get non-redundant
   transcripts, HQ isoforms were mapped to rat genome (mRatBN7.2) using
   minimap2 ([101]Li, 2018) and collapsed with a third-party module
   Cupcakes Python script (collapse_isoforms_by_sam.py). Then we got
   36,278 transcripts derived from 10,145 gene loci. These transcripts
   were annotated by SQANTI3 ([102]Tardaguila et al., 2018) for further
   analysis.

TABLE 1.

   Summary of PacBio raw data analysis of Control and Ang II in RAECs.
      Subject                    Category                  Control     Ang II
     Subreads                Subreads number              27,668,459 33,151,284
                           Average length (bp)               1838       1700
   Number of CCS    Full-length non-concatemer (FLNC)     1,469,285  1,676,143
                           Average length (bp)              2,361      2,486
    HQ isoform   Number of polished high-quality isoforms       125,900
                           Average length (bp)                   2,603
    LQ isoform   Number of polished low-quality isoforms         5,323
                           Average length (bp)                   2,807
   [103]Open in a new tab

Full-length isoforms greatly improve rat genome annotation

   To validate the reliability of these isoforms identified by Iso-Seq, we
   used a series of public available datasets generated by independent
   methods including cap analysis gene expression (CAGE) data from FANTOM
   ([104]Lizio et al., 2015), RNA-seq short junction reads, rat 3′READ raw
   data used in polyA_DB3 ([105]Wang R. et al., 2018), rat PolyA-Seq data
   ([106]Derti et al., 2012), and ncbiRefSeq known rat genome annotation
   (REF) to construct reliable 5′-ends, splicing junction, and 3′-ends
   pool, respectively ([107]Figure 1Ai). If the isoforms have the same
   splicing pattern and PAS (within 50 nt) but different potential TSS, we
   only kept theose whose TSS are closest to the validated 5′-ends pool
   ([108]Figure 1Aii, iii). Since Iso-Seq protocol uses the similar
   strategy as the Clontech SMARTer cDNA kit to generate full-length cDNA,
   which does not perform cap trap, and thus the obtained 5′ends can be
   possibly derived from degradation products or reverse transcriptase
   drop offs ([109]Schaarschmidt et al., 2020). To filter out possible
   artificial isoforms, Iso-Seq defined transcription start site (TSS) and
   polyadenylation site (PAS) within 50 nt of validated 5′-ends and
   3′-ends pool respectively, were called supported. As for splicing, if
   an Iso-Seq defined splicing event could be supported by at least two
   short junction reads in the pool, it was also called supported. We
   found considerable high support rates for TSS (96.5%), splicing
   junction (86.4%) and PAS (93.0%) ([110]Figure 1B), suggesting the
   reliability of Iso-Seq detected isoforms.

FIGURE 1.

   [111]FIGURE 1
   [112]Open in a new tab

   High confident full-length isoforms improve rat genome annotation. (A)
   The schematic diagram illustrating the cross-validation of full-length
   transcript end based on CAGE, PolyA-Seq and 3′READ. The transcriptome
   construction involves three steps. Step i: We generated validated pools
   of 5′-ends, splicing junction and 3′-ends from CAGE, short junction
   reads, PolyA-Seq and 3′READS, respectively, followed by comparing the
   aligned Iso-Seq isoforms with the validated 5′-end, junction, and
   3′-end pool. Step ii: Mark those isoforms with the same splicing
   structure and PAS within 50 nt and only display 5′UTR difference
   (dotted closed circle). Step iii: We keep the one whose TSS is closest
   to the validated 5′-ends pool. (B) The support rate of PacBio isoforms
   for TSS, junction, and PAS independently. (C) Types and percentages of
   eight categories of PacBio defined transcripts. (D) Venn diagram shows
   the number of PacBio isoforms that differ from ncbiRefSeq in
   transcriptional start sites (TSS), alternative splicing (AS), and
   alternative polyadenylation (APA). (E) Arhgap21 as an example to show
   novel isoforms with novel TSSs when compared to ncbiRefSeq and were
   supported by CAGE. (F) Khdc4 as an example to show novel isoforms with
   intron retention when compared to ncbiRefSeq annotation and were
   supported by PacBio and RNA-seq. (G) Arhgef2 as an example to show
   novel isoforms with proximal PAS when compared to ncbiRefSeq annotation
   and were supported by PacBio and PolyA-Seq/3′READS.

   After confirming the reliability of the detected isoforms, we next
   asked whether full-length Iso-Seq could improve the gene annotation of
   rat genome. We used UCSC ncbiRefSeq as a reference for comparison,
   because this annotation is the most complete for rat genome ([113]Ji et
   al., 2020). We adopted the transcript structure classification strategy
   of SQANTI3 ([114]Tardaguila et al., 2018), which divides transcripts
   into eight categories compared to reference, to classify the Iso-Seq
   identified transcripts ([115]Supplementary Figure S2). Among all these
   transcripts annotated to known genes (n = 34,642), 61.59% (n = 21,337)
   were categorized as full-splice match (FSM, corresponds to a transcript
   with all the same splice junctions as a reference transcript) in
   catalog, 15.12% (n = 5,238) were categorized as novel not in catalog
   (NNC; transcripts use novel donors and/or acceptors), 13.54% (n =
   4,692) were categorized as novel in catalog (NIC, transcripts contain
   new combinations of annotated splice junctions or novel splice
   junctions formed from annotated donors and acceptors), 8.42% (n =
   2,916) were categorized as incomplete splice match (ISM, transcripts
   matching consecutive, but not all, splicing junctions of the REF),
   0.59% (n = 204) were categorized as genic, 0.32% (n = 112) were
   categorized as intergenic transcripts, 0.23% (n = 80) were categorized
   as fusion, and 0.18% (n = 63) were categorized as antisense transcripts
   [structure categories defined by ([116]Tardaguila et al., 2018)]
   ([117]Figure 1C).

   We proceeded to characterize transcripts in these eight structure
   categories above. First, we cross-validated the reliability of
   transcripts in each category. PacBio transcripts with 5′-ends within
   50 nt to CAGE or REF TSSs peaks were considered as validated TSSs, and
   PacBio transcripts 3′-end within 50 nt to PASs (generated from
   PolyA-Seq, 3′READ or REF) were considered as cross-validated PASs.
   Junctions of PacBio transcripts supported with RNA-seq junction reads
   were used for percent calculation. In line with our speculation, FSM
   category has high validation rate regarding TSS (100%), splicing
   junction (99.87%) and PAS (92.98%) ([118]Supplementary Figure S3).
   Noteworthy, two categories that contain novel sites (NNC) and (NIC)
   also showed consistently high validation rates (>95% for TSS, junction
   and PAS) ([119]Supplementary Figure S3), suggesting the novel isoforms
   detected by Iso-Seq are reliable. For categories that have lower
   validation rates (such as junction in genic and antisense categories),
   experimental or technical errors could possibly account for them, as
   discussed previously ([120]Tardaguila et al., 2018).

   After demonstrating the authenticity of these novel isoforms identified
   by Iso-Seq, we further explored the potential underlying contributing
   factors. We found AP, AS and APA could all contribute to the generation
   of novel isoforms, and APA contributed most, followed by AS and AP
   ([121]Figure 1D). Further, 92.83% of the APA-derived novel PASs are
   within 24 nt distance to validated PAS clusters ([122]Supplementary
   Figure S4), further supporting the biggest contribution of APA to novel
   isoforms. There were several good examples showing novel isoforms
   annotated by our data. For instance, the gene Arhgap21 has two novel
   TSSs not annotated in ncbiRefSeq but supported by CAGE data
   ([123]Figure 1E). The gene Khdc4 having intron retention (a type of
   AS), is also a contributor to novel isoforms ([124]Figure 1F). The gene
   Arhgef2 in the ncbiRefSeq annotation has only one PAS, however, we
   identified several APA isoforms of this gene based on our PacBio data,
   with both the shortest (novel proximal PAS) and the longest (annotated
   distal PAS) isoforms supported by strong PolyA-Seq/3′READs signals
   ([125]Figure 1G). Together, these above results support the notion that
   a combination of Iso-Seq and RNA-seq could largely improve the rat
   genome annotation.

Expression changes at both gene and isoform levels in Ang II-induced RAEC
senescence

   Previous analyses showed that Iso-Seq could identify full-length
   isoforms qualitatively but less quantitatively. In contrast, RNA-seq is
   more accurate in quantifying gene expression rather than distinguishing
   the different isoforms from the same gene ([126]Huang et al., 2021).
   Having confirmed the improvement of rat genome annotation by the
   combination of Iso-Seq and RNA-seq, we further applied this strategy to
   explore the expression changes at both gene and isoform levels in Ang
   II-induced senescent RAECs. First, we used RNA-seq to perform
   differential gene expression analysis and identified a total of 2,671
   differentially expressed genes (DEGs) between Ang II-induced RAECs and
   the Control ones, including 1,249 upregulated and 1,422 downregulated
   DEGs ([127]Supplementary Figure S5A). The downregulated DEGs were most
   significantly enriched in Cell Cycle and DNA replication
   ([128]Supplementary Figure S5B), consistent with the cell cycle arrest
   phenotypes in Ang II-induced senescent RAECs ([129]Yang et al., 2020).
   The upregulated genes were highly enriched in ECM-receptor interaction,
   PI3K-Akt signaling pathway and TNF signaling pathway
   ([130]Supplementary Figure S5C), which were known to promote senescence
   ([131]Saito and Berk, 2002; [132]Yin et al., 2003; [133]Chen et al.,
   2012). Notably, a substantial number of DEGs were involved in
   Cell-Cycle pathway, including Cyclin- Dependent Kinase (CDK) gene
   family such as Cdk2, encoding a well-known cell cycle regulator, were
   decreased upon Ang II stimulation ([134]Supplementary Figure S5D). DEGs
   of PI3K-Akt signaling pathway, such as those encoding the inflammatory
   cytokine IL-6 (IL-6), and vascular endothelial growth factors A and D
   (Vegfa, Vegfd) were increased upon Ang II stimulation
   ([135]Supplementary Figure S5E). Overall, the DEG analysis with RNA-seq
   data indicates that the Ang II-induced cell senescence model is
   reliable and could be used for subsequent analysis.

   Next, we used FLNC reads generated from Iso-Seq to perform isoform
   level analysis. FLNC reads were chosen because they can be considered
   as full-length isoforms ([136]Wang et al., 2019). Although not as
   quantitative as RNA-seq, FLNC reads in Iso-Seq were still found to be
   semi-quantitative, especially for those full-length reads that derived
   from relatively highly expressed genes ([137]Uapinyoying et al., 2020;
   [138]Wyman et al., 2020). Therefore, we defined the isoform abundance
   as a percentage of isoform-supportive FLNC reads to the sum of all FLNC
   reads of a gene. The isoform with the largest percentage is called the
   major isoform, the isoform with abundance <0.1 is called inactive
   isoform, and the rest are called minor isoforms ([139]Figure 2A). The
   abundance distribution of these three isoform types is in line with
   their definition ([140]Figure 2B). The same trend was observed when
   quantification was performed with RNA-seq short reads ([141]Figure 2C).
   We next compared the isoform abundance between Ang II and Control
   conditions. The isoform abundance between these two conditions was
   overall correlated ([142]Figure 2D, Pearson’s correlation coefficient =
   0.91; p < 2.2e-16). The abundance of most isoforms (78.62%, |Δ
   abundance| < 0.1) was not changed ([143]Figure 2D). However, if we do
   not consider isoforms that are inactive before and after Ang II
   stimulation (n = 14,329) ([144]Table 2), 36.2% of the isoforms undergo
   change (|Δ abundance| ≥ 0.1) in the proportion of usage before and
   after stimulation. Specifically, 3,548 isoforms showed upregulated
   expression and 3,470 isoforms showed downregulated expression upon Ang
   II treatment ([145]Figure 2D). KEGG pathway enrichment analysis showed
   that genes with changed isoform usage were more likely to be enriched
   in senescence or aging-related pathways such as p53 signaling pathway,
   mTOR signaling pathway and MAPK signaling pathway when compared to
   those genes with unchanged isoform usage ([146]Supplementary Figures
   S6A, B). We also used RNA-seq to examine the difference in isoform
   abundance and obtained similar results ([147]Figure 2E; [148]Table 3;
   [149]Supplementary Figures S6C, D).

FIGURE 2.

   [150]FIGURE 2
   [151]Open in a new tab

   Isoform abundance changed between Control and Ang II. (A)
   Classification of isoform abundance according to the percentage of FLNC
   reads support. (B) Isoform abundance (PacBio quantification)
   distribution between Major, Minor and Inactive isoforms. (C) Isoform
   abundance (RNA-seq quantification) distribution between Major, Minor
   and Inactive isoforms. (D) Scatter plot of the isoform abundance
   between Ang II and control conditions (Iso-Seq quantification). (Up +
   Down)/(Total–Inactive) = (3,548 + 3,470)/(33,706–14,329) = 36.2%,
   Around 36.2% of the identified isoforms changed the proportion of usage
   before and after Ang II stimulation. (E) Scatter plot of the isoform
   abundance between Ang II and control conditions (RNA-seq
   quantification). Correlation of the isoform abundance (PacBio
   quantification) between Control and Ang II. The text in the figure
   indicates example genes.

TABLE 2.

   State of isoform changing in Control and Ang II (Iso-Seq
   quantification).
   Control.Class Ang II.Class   n
     Inactive      Inactive   14,329
       Major        Major     8,651
       Minor        Minor     4,559
       Minor       Inactive   1,891
     Inactive       Minor     1,813
       Major        Minor      949
       Minor        Major      915
     Inactive       Major      340
       Major       Inactive    259
   [152]Open in a new tab

TABLE 3.

   State of isoform changing in Control and Ang II (RNA-seq
   quantification).
   Control.Class Ang II.Class   n
     Inactive      Inactive   22,469
       Major        Major     6,737
       Minor        Minor     1,132
       Minor       Inactive   1,213
     Inactive       Minor     1,194
       Major        Minor      717
       Minor        Major      751
     Inactive       Major     1,359
       Major       Inactive    706
   [153]Open in a new tab

   We selected Mxi1 and Cdh2 as examples to show the isoform usage
   difference upon Ang II treatment, both genes were associated with
   aging/senescence diseases ([154]Schreiber-Agus et al., 1998;
   [155]Mayosi et al., 2017; [156]Kruse et al., 2019; [157]Zhuo et al.,
   2019). Iso-Seq detected three isoforms of Mxi1, and two of these
   isoforms showed differential expression between Ang II and Control,
   despite no differential gene expression (Log2FC = 0.16), highlighting
   the importance of examining isoform level changes ([158]Supplementary
   Figure S7). Cdh2 gene is widely studied in CVDs and other diseases
   ([159]Mayosi et al., 2017; [160]Kruse et al., 2019; [161]Zhuo et al.,
   2019). Cdh2 showed PAS preference after Ang II stimulation, the major
   isoforms of Cdh2 in control tend to use the distal PAS while major
   isoforms in Ang II tend to use the proximal one ([162]Supplementary
   Figure S8).

Alternative promoter regulation in ang II-induced RAEC senescence

   As alternative promoter ([163]Demircioğlu et al., 2019; [164]Huang et
   al., 2021), alternative splicing ([165]Roy et al., 2013) and
   alternative polyadenylation ([166]Chen et al., 2017) are three major
   contributors to isoform diversity in mammalian cells, we therefore
   respectively explored their contributions in Ang II-induced RAECs
   senescence. To explore the alternative promoter (AP) usage, we applied
   proActive ([167]Demircioğlu et al., 2019) to carry out the promoter
   analysis, wherein the promoter usage was defined based on the first
   junction from transcriptome GTF file (here we used the GTF file
   generated from our Iso-Seq data), and the activity of the promoter was
   evaluated based on the number of junction reads supported by RNA-seq
   ([168]Figure 3A). The promoter usage predicted by proActive is
   consistent with CAGE and H3K4me3 histone ChIP-seq data
   ([169]Demircioğlu et al., 2019). We identified 1,109 genes with
   multiple promoters and 9,013 genes with single promoter ([170]Figure
   3B) using Iso-Seq data of RAECs. proActive classifies promoters into
   Major, Minor and Inactive according to their relative promoter
   activity. Promoters were also classified into known and novel promoters
   ([171]Figure 3C, see Methods for details), and the known promoter
   accounted for the largest proportion, while the novel ones accounted
   for a small proportion ([172]Figure 3D). We then applied DESeq2 to
   analyze differentially regulated promoters (DRPs) ([173]Huang et al.,
   2021) between Control and Ang II-treated cells, and found 188 and 197
   known promoters were upregulated and downregulated respectively in Ang
   II-treated RAECs. Notably, 18 and 10 novel promoters were upregulated
   and downregulated respectively upon Ang II stimulation ([174]Figure
   3E). Then we compared the genes with DRPs (DRPGs) with differentially
   expressed genes (DEGs) and found that 79.8% downregulated promoters
   overlapped with downregulated genes ([175]Supplementary Figure S9A).
   These genes with downregulated promoters were enriched in functional
   categories such as cell division, nuclear division, negative regulation
   of cell cycle ([176]Supplementary Figure S9B). 74.6% upregulated
   promoters overlapped with upregulated genes ([177]Supplementary Figure
   S9C), and genes with upregulated promoters were enriched in Ang II
   stimulation-related terms such as cellular response to external
   stimulus, wound healing, regulation of vascular development, regulation
   of angiogenesis, and reactive oxygen species metabolic process
   ([178]Supplementary Figure S9D). These results above indicated that the
   changed promoter expression may probably function in AngII-induced
   senescence in RAECs.

FIGURE 3.

   [179]FIGURE 3
   [180]Open in a new tab

   Alternative promoter activity analysis before and after Ang II
   treatment in RAECs. (A) Schematic representation of promoter definition
   in this study ([181]Demircioğlu et al., 2019). (B) Number of single
   promoter genes and multiple promoter genes identified in the
   transcriptome that Iso-Seq generated. (C) Example of known and novel
   promoters in RAECs. Promoters are considered known if known ncbiRefSeq
   transcripts also use this promoter. Promoters with the highest average
   activity are considered Major promoters, and all other promoters of the
   same gene are assigned as Minor promoters ([182]Huang et al., 2021).
   (D) The number of Major, Minor promoters in known and novel categories.
   Most promoters are known and the proportion of novel and minor
   promoters are small. (E) Differentially expressed promoter analysis in
   RAECs. Black color stands for not significantly differentially
   expressed promoters, gray color stands for significantly differentially
   expressed known promoters, and green color stands for significantly
   differentially expressed novel promoters. (F) Schematic representation
   of promoter activity switching. (G) Venn diagram shows the overlap of
   genes with upregulated (Up) and downregulated (Down) promoters. (H)
   Examples of genes that undergo dynamic changes in alternative promoter
   activity, alternative promoter was marked with ▲.

   Next, we asked how many promoters are experiencing dynamic changes
   between Control and Ang II based on the promoter usage and relative
   promoter activity ([183]Figure 3F). Relative promoter activity
   difference >0.1 and |FoldChange| > 1.5 were considered as alternative
   promoter (AP) between two conditions (Details in Methods section). A
   total of 239 APs derived from 121 genes were identified, including 34
   genes with both upregulated and downregulated APs, which means switched
   promoter usage upon Ang II stimulation ([184]Figure 3G). Three genes
   were selected as representatives for the promoter usage switch
   ([185]Figure 3H). Map1b showed to have two promoters (Prmtr.29507 and
   Prmtr.29508) in Control, and after Ang II stimulation, Prmtr.29507 was
   downregulated (log2FC = −2.56) and Prmtr.29508 was upregulated (log2FC
   = 1.21) ([186]Figure 3H left panel). In Cdk16, the inactive promoter
   (Prmtr.4399) in Control turned to be active after Ang II stimulation
   (log2FC = 1.95), though the major promoter (Prmtr.4398) kept not
   changed (log2FC = −0.21) ([187]Figure 3H middle panel). As to Hpcal1,
   the activated promoter (Prmtr.11774) originally in Control turned to be
   inactive after Ang II treatment (log2FC = −1.84), but the major
   promoter (Prmtr.11773) did not show obvious change (log2FC = 0.35)
   ([188]Figure 3H right panel). These observations suggest that APs serve
   as one contributor to the transcriptome diversity in Ang II-induced
   senescence in RAECs.

Alternative splicing regulation in ang II-induced RAEC senescence

   Alternative splicing (AS) is a well-known contributor to transcriptome
   diversity, and participates in diverse biological processes
   ([189]Wright et al., 2022). Therefore, we examined whether alternative
   splicing also showed dynamic changes upon Ang II stimulation in RAECs
   using SUPPA2 ([190]Trincado et al., 2018). AS patterns were generated
   based on the GTF file generated from Iso-Seq and quantified by Salmon
   quantification results (See Methods for details). All the AS events
   were classified into seven types: alternative 3′-acceptor (A3),
   alternative 5′-donor (A5), alternative first exon (AF), alternative
   last exon (AL), mutually exclusive exon (MX), intron retention (IR) and
   exon skipping (SE) ([191]Figure 4A). Among the expressed 10,122 genes
   based on PacBio Iso-Seq, 7,321 genes expressed more than one isoform.
   The most frequent AS type was SE (6,968), followed by AF (4,618), A3
   (3,701), A5 (3,026), AL (593), IR (455) and MX (356) ([192]Figure 4B).
   Interestingly, the most frequent novel AS type found was AF (3,877),
   and IR (2,047) was the second ([193]Figure 4B), suggesting these two
   types of AS events are especially less annotated in the ncbiRefSeq.
   Next, we compared the percent spliced-in (PSI) values between known and
   novel in the seven AS types. We found that the expression of known AS
   events are generally higher than the novel ones. In novel AS events,
   the expression of AF, AL, MX was relatively higher than the expression
   of other events ([194]Figure 4C). SUPPA2 was next used for
   identification of differential AS events between Ang II and Control,
   which was evaluated with ΔPSI, the difference of the mean PSI between
   Ang II and Control. There were 666 significant different AS events
   between Ang II and Control, among of which, AF counts the largest
   proportion (n = 223, 33.5%) ([195]Figure 4D). Of note, AF was the most
   frequent novel differentially spliced AS type and SE was the most
   frequent known differentially spliced AS type ([196]Figure 4E).

FIGURE 4.

   [197]FIGURE 4
   [198]Open in a new tab

   Overview of Alternative splicing (AS) events in Ang II-induced RAEC
   senescence. (A) Schematic diagram of the seven types of AS events
   defined by SUPPA2 when compared to ncbiRefSeq of rat genome. (B) The
   number of AS events detected by SUPPA2 based on Iso-Seq transcriptome.
   (C) Expression ratio of AS events in novel and known. (D) Differential
   AS events (p-value <0.05) between Ang II and control conditions. (E)
   The number of significantly spliced AS events between Ang II and
   control. (F) GO enrichment analysis of differentially spliced genes
   between Ang II and control cells. (G) RNA-seq track plot of gene
   Castor1 before and after Ang II stimulation. Dashed circle indicates
   the differential intron retention (IR) event between two conditions.
   (H) RNA-seq track plot of gene Atp5mc1 upon Ang II treatment. The
   dashed circle represents the alternative first exon (AF) event between
   two conditions.

   We functionally categorized the differential AS genes above and found
   that they were enriched in Gene Ontology (GO) terms such as wound
   healing, regulation of intrinsic apoptotic signaling pathway, and
   regulation of cell growth ([199]Figure 4F). The importance of
   differential AS in Ang II-induced RAEC senescence is indicated by the
   following two example genes. Castor1, a regulator of mTORC1 ([200]Li et
   al., 2021), which plays a key role in cell proliferation in response to
   nutrition and growth stimuli, showed decreased intron retention level
   (a novel AS event for Castor1) upon Ang II stimulation ([201]Figure
   4G). Atp5mc1, a gene associated with CVDs ([202]Hartmann et al., 2022),
   also displayed a usage shift of alternative first exon upon Ang II
   treatment ([203]Figure 4H). These results indicate that AS can
   contribute to Ang II-induced RAECs senescence through isoform changes
   in responsible genes.

Alternative polyadenylation regulation in Ang II-induced RAEC senescence

   Alternative polyadenylation (APA) fine-tunes gene expression and
   largely enhances the complexity of mammalian transcriptome. To explore
   whether and to what extent APA contributes to the isoform changes in
   response to Ang II stimulation in RAECs, we performed APA analysis by
   integrating both Iso-Seq and RNA-seq data. We used Iso-Seq to identify
   Polyadenylation sites (PASs) and RNA-seq to quantify APA changes. PASs
   within 50 nt were clustered into one PAS, and PASs with distance >50 nt
   were considered as different PASs, as defined previously ([204]Tang et
   al., 2022). If a PAS is within 50 nt of a known ncbiRefSeq PAS, it is
   considered as known, otherwise defined as novel ([205]Figure 5A). We
   found that Iso-Seq detected a large number of PASs, including thousands
   of novel PASs ([206]Figure 5B). To evaluate whether these novel PASs
   are authentic, we compared them with the experimentally verified
   3′-ends pool generated by 3′READ, PolyA-Seq and polyA_DB3 of rat
   samples. We found that most PASs were within 50 nt to those in the
   validated 3′-ends pool for both known and novel PASs ([207]Figure 5C).
   We analyzed the nucleotide composition of the 50 nt around the
   identified PASs and found that the 25 nt region upstream PAS is
   predominantly A-enriched, while the downstream is predominantly
   U-enriched ([208]Supplementary Figure S10A), consistent with known
   literature reports ([209]Chen et al., 1995). Moreover, we used MEME
   ([210]Bailey et al., 2015) to discover motif and found the canonical
   polyadenylation signal (AAUAAA) also over-represented in the PASs
   ([211]Supplementary Figure S10B) ([212]Elkon et al., 2013). These above
   lines of evidence suggested that these PASs identified by Iso-Seq are
   of satisfied quality and reliable for further analysis. A total of
   5,755 (57.32%) genes had at least two PASs based on Iso-Seq of RAECs
   ([213]Figure 5D). We next analyzed the differential usage of PAS
   between Ang II and control RAECs using the modified algorithm of DaPars
   ([214]Xia et al., 2014) for the differential PAS usage between Ang II
   and control RAECs. DaPars uses mean squared error (MSE) to predict the
   proximal PAS with mean RNA-seq coverage. Here Iso-Seq can provide
   reliable but not predicted PASs, we therefore used these PASs as prior
   knowledge and calculated their differential PAS usage between Ang II
   and Control cells by Fisher’s exact test ([215]Figure 5E, see details
   in Method section). We found that 278 and 18 genes showed a preference
   usage of proximal and distal PASs, respectively, showing a global 3′UTR
   shortening upon Ang II stimulation ([216]Figure 5F). Most of the
   length-changed 3′UTRs are derived from novel PAS ([217]Figure 5F),
   suggesting Iso-Seq provides previously-unknown information for studying
   the APA regulation in Ang II-induced RAEC senescence. Further analysis
   showed that genes tending to use proximal PASs were enriched in
   pathways related to senescence such as p53 signaling pathway
   ([218]Ciardiello and Tortora, 2008; [219]Huang and Fu, 2015)
   ([220]Figure 5G), implying that APA may fine-tunes certain key pathways
   to participate in regulation of Ang II-induced senescence.

FIGURE 5.

   [221]FIGURE 5
   [222]Open in a new tab

   Global analysis of alternative polyadenylation combined with Iso-Seq
   and RNA-seq in RAECs. (A) The Schematic diagram of known and novel
   PASs. (B) The number of known and novel PASs identified by Iso-Seq. (C)
   Histogram of the genomic distance between Iso-Seq identified PASs and
   validated 3′ends pool in both known and novel classes. (D) Distribution
   of genes that have one or more PAS. (E) Differential usage of PAS was
   calculated based on given proximal PAS as prior knowledge. (F) Scatter
   plot between ΔPDUI and expression changes for genes with significantly
   longer or shorter 3′UTRs. Green and gray represent genes with novel and
   known PASs, respectively. (G) KEGG pathway enrichment analysis for
   genes tended to use proximal PAS after Ang II stimulation.

   As mentioned above, Iso-Seq and RNA-seq have their advantages and
   disadvantages. PacBio Iso-Seq data provides more reliable PASs even at
   relatively low coverage, and DaPars was originally designed to use
   Illunima’s RNA-seq data to predict proximal PASs usage and analyze
   global 3′UTR length changes between two conditions. Here we ran both
   the original DaPars program solely on RNA-seq data (Hereinafter
   referred to as RNA-seq) and the modified DaPars on both Iso-Seq and
   RNA-seq data (Hereinafter referred to as Iso-Seq), the results run with
   original DaPars used as comparison references. The DaPars result based