Abstract

   Adenosine deaminase acting on RNA 1 (ADAR1) is the master RNA editor,
   catalyzing the deamination of adenosine to inosine. RNA editing is
   vital for preventing abnormal activation of cytosolic nucleic acid
   sensing pathways by self-double-stranded RNAs. Here we determine, by
   parallel analysis of RNA secondary structure sequencing (PARS-seq), the
   global RNA secondary structure changes in ADAR1 deficient cells.
   Surprisingly, ADAR1 silencing resulted in a lower global
   double-stranded to single-stranded RNA ratio, suggesting that A-to-I
   editing can stabilize a large subset of imperfect RNA duplexes. The
   duplexes destabilized by editing are composed of vastly complementary
   inverted Alus found in untranslated regions of genes performing vital
   biological processes, including housekeeping functions and type-I
   interferon responses. They are predominantly cytoplasmic and generally
   demonstrate higher ribosomal occupancy. Our findings imply that the
   editing effect on RNA secondary structure is context dependent and
   underline the intricate regulatory role of ADAR1 on global RNA
   secondary structure.
     __________________________________________________________________

   Adenosine deaminase acting on RNA 1 (ADAR1) edits adenosine to inosine.
   Here the authors, using parallel analysis of RNA secondary structure
   sequencing, provide evidence that ADAR1 induces
   sequence-context-dependent RNA secondary structures changes, often
   leading to stabilization of the RNA duplex.

Introduction

   Deamination of adenosine to inosine, A-to-I RNA editing^[56]1–[57]5, is
   the most widespread RNA-editing type in humans. This modification is
   catalyzed by the ADAR (adenosine deaminases acting on RNA) protein
   family. Most editing events are mediated by ADAR1 (gene symbol—ADAR)
   with two distinct protein isoforms p110 and p150. Double-stranded RNA
   (dsRNA), often formed between adjacent inverted repetitive (IR)
   elements, is the obligatory substrate of ADAR. Thus, a large fraction
   of editing sites occurs in Alu retroelements, interspersed within
   non-coding regions of mRNA transcripts including introns and
   untranslated regions (UTRs). The high abundance of the primate-specific
   Alu retroelements in the human genome, is a major cause of most of the
   RNA-editing activity in the human transcriptome^[58]6, [59]7. There are
   millions of editing sites in non-coding regions and although many are
   edited at only a tiny fraction of the transcripts, there are still
   thousands of sites where the editing level is significant^[60]8,
   suggesting a functional role.

   ADAR1 is required for normal fetal development. Knocked out mice die at
   an early embryonic stage (day 11.5) with defects in hematopoiesis and
   liver disintegration^[61]9, [62]10 apparently due to abnormal induction
   of a type-I interferon response. In hematopoietic stem cells, loss of
   ADAR1 led to global up regulation of interferon-stimulated genes and
   apoptosis^[63]11. ADAR1 is also vital for post-natal hepatic
   development and homeostasis, with hepatocyte-specific depletion of
   ADAR1 leads to liver inflammation and necrosis^[64]12, [65]13.
   Likewise, ADAR1 is vital for B lineage development^[66]14.

   dsRNA is a major inducer of anti-viral type-I interferon response via
   activation of cytosolic nucleic acid sensing pathways^[67]15. An
   essential role of A-to-I RNA editing is inhibition of the immune
   response to perfect self-dsRNA by editing such
   regions.^[68]16,[69]17–[70]19. Indeed, bi-allelic mutations in ADAR1
   were documented to cause Aicardi–Goutieres syndrome, an inflammatory
   genetic syndrome associated with toxic perturbation of the type-I
   interferon response (“interferonopathy”)^[71]20. Recent studies
   directly connect the severe phenotypes of ADAR1 deficiency to its mRNA
   editing activity^[72]21. ADAR1 is assumed to prevent activation of
   relevant cytosolic sensors of nucleic acids (mainly MDA5) by RNA
   duplexes dispersed in the human transcriptome.

   Editing may regulate RNA abundance and fate by changing the local
   structure^[73]22. This may change the stability of the RNA duplex,
   affect structural determinants recognized by trans-acting factors and
   can change accessibility of factors to defined regions^[74]23–[75]25.
   As even a single-nucleotide variant may affect RNA
   structure^[76]26–[77]28, RNA editing may well affect the transcriptome
   secondary structure landscape. Furthermore, A-to-I editing events
   frequently occur in clusters (hyper-edited regions), and such regions
   may undergo major structural alterations. Indeed, early pioneering
   studies demonstrated an unwinding activity for ADAR enzymes^[78]29,
   [79]30.

   Despite the accumulating evidence regarding the regulation of
   transcript structures by ADAR1 and the role of this regulation in
   health and disease, no study investigated the structural changes
   induced by ADAR using an experimental global approach.

   Recently, new methodologies were developed for experimental
   identification of RNA 2D structure in a transcriptome wide manner using
   next generation sequencing (NGS). Some are based on probing with
   single-strand (SS) and double-strand (DS) specific nucleases^[80]31,
   [81]32. Using such approaches, the global yeast, thale cress, fruit
   fly, nematode and human transcriptome 2D structures, were
   mapped^[82]27, [83]33–[84]35. Methods that are based on chemical
   probing^[85]36, more suitable for in vivo studies, were also developed
   and applied^[86]37–[87]40.

   Here, we report the use of high-throughput probing techniques to
   directly examine, the effect of ADAR1 deficiency and A-to-I editing on
   the whole transcriptome 2D structure and relate the data to the
   regulation of the cytosolic innate immune response. We show that in
   human cells, hundreds of transcripts undergo ADAR1 editing-dependent
   structural changes. In contrast to the prevailing paradigm that ADAR1
   activity decreases RNA duplex stability, we find that A-to-I editing
   may increase the stability of RNA duplexes and that the effect on the
   global RNA structure is context dependent. The smaller subset of genes
   whose transcripts are destabilized upon editing, as previously
   postulated, are significantly enriched for housekeeping genes and genes
   linked to the anti-viral interferon response. These transcripts tend to
   be localized in the cytoplasm, and are relevant to the involvement of
   ADAR1 in dampening innate immune response.

Results

PARS-seq of HepG2 cells captures RNA 2D structure

   In order to examine global RNA structural changes induced by ADAR1
   silencing, we applied PARS-seq^[88]32 as detailed in the Methods
   section. Purified poly-A + RNA samples from control HepG2 cells and
   ADAR knockdown (KD) HepG2 cells were extracted and in vitro probed with
   S1 nuclease (S1) and RNAse V1 (V1) that preferentially cleave RNA at SS
   and DS regions, respectively (Fig. [89]1 and Supplementary
   Table [90]1). The RNA was then reverse-transcribed and sequenced using
   Illumina HiSeq 2500. To uncover S1 or V1 digestion profiles we counted
   the number of reads that start at each base of the analyzed RNA
   molecule. The procedure was performed in two biological replicates for
   each treatment (Fig. [91]1).

Fig. 1.

   Fig. 1
   [92]Open in a new tab

   PARS-seq experimental flow. a Construction of single-stranded (SS) or
   double-stranded (DS) RNA profile is done by digestion with S1 or V1,
   respectively. b Samples used in this study

   To verify that the PARS-seq protocol reliably deciphers the structural
   profiles of the analyzed RNA molecules, we compared the structural
   profile obtained for a known control, the p4–p6 domain of the
   Tetrahymena ribozyme, inserted together with the studied human RNA, to
   its previously determined structure (PDB ID: 1x8w). We show that the
   mean PARS score is higher in known DS bases compared to known SS bases
   (Fig. [93]2b). The sum of raw V1 read starts is higher in DS bases
   compared to known SS bases while a reverse trend exists for S1 read
   starts (Fig. [94]2a), implying that the observed structural profile
   matches the solved 3D structure (~90% of the reads agree with the known
   structure, Fig. [95]2b, c). We also examined natively expressed human
   RNA molecules with a known fold. The PARS-seq structural profile of the
   human tRNA-ARG (UCSC ID: uc021ucc) matches the solved RNA structure of
   its yeast homolog (PDB ID: 1f7u, about ~60–80% of the reads agree,
   Supplementary Fig. [96]1) and the structural profile of human U2 (UCSC
   ID: uc021xtz) matches its conserved structure (Supplementary
   Table [97]2; Rfam ID: RF00004). Additionally, we verified that our PARS
   scores for long RNA molecules, such as the long non-coding RNA (lncRNA)
   MALAT1, show significant correlation between biological replicates
   (single-nucleotide resolution, Spearman’s correlation of R = 0.94). We
   also verified that calculated PARS scores from a previous study of
   lymphoblastoid cells transcriptome^[98]27 significantly correlate with
   our scores (Spearman’s R = 0.6), indicating that the PARS-seq approach
   we utilize reliably captures the global structural features of native
   long RNA molecules in different cell types. The V1 profiles obtained
   are significantly different from the S1 profiles in all the samples
   while the profiles of replicates are nicely correlated (heat maps in
   Fig. [99]2d and Supplementary Fig. [100]1d).

Fig. 2.

   [101]Fig. 2
   [102]Open in a new tab

   Verification of the PARS-seq 2D structural data. a Agreement with
   solved RNA structure. Ratio between the sum of read starts at known SS
   or DS bases based on the solved structure of Tetrahymena ribozyme (PDB
   ID: 1x8w). Blue: bases known as SS bases. Red: bases known as DS bases.
   S1, C1: control 1 cleaved by S1. V1, C1: control 1 cleaved by V1. S1,
   C2: control 2 cleaved by S1. V1, C2: control 2 cleaved by V1. b Mean
   PARS score. SS, C1: bases known as SS in control 1. DS, C1: bases known
   as DS in control 1. SS, C2: bases known as SS in control 2. DS, C2:
   bases known as DS in control 2. **Wilcoxon test, p < 0.01 c The solved
   structure of p4–p6 domain of the Tetrahymena ribozyme (PDB ID: 1x8w)
   colored by the PARS scores obtained in this study. Blue: PARS score < 0
   (SS region). Red: PARS score > 0 (DS region). d Spearman’s correlations
   were calculated between vectors per sample, where each vector element
   contains the number of reads which start at a particular base
   (correlation here is at single-nucleotide resolution). The correlation
   between samples is based on the 1000 most highly expressed genes to
   ensure sufficient coverage at the base level. Red: Spearman’s R = 1.
   Blue: Spearman’s R = 0. As expected, correlations between V1 and S1
   samples are low, further confirming the PARS-seq procedure

   In agreement with a previous study that applied a similar
   methodology^[103]32, start and stop codons indeed showed local minima
   of PARS score with major differences found in 2D structure near
   translation start/stop sites (Supplementary Fig. [104]2a and
   Supplementary Table [105]3). We likewise observed 2D structure
   periodicity at the CDS with lower PARS scores at the first base of the
   codons (Supplementary Fig. [106]2b, c and Supplementary Table [107]4).
   A significant trend (Wilcoxon test, p < 0.005) was detected for excess
   of V1 reads (DS) within Alu retroelements (Supplementary Table [108]5)
   as Alu elements in inverted orientation (IR) result in DS structures.
   Some specific RNA sites are not represented in neither the V1 nor the
   S1 libraries in PARS-seq experiments. Some transcripts expressed at low
   level may escape detection due to inadequate coverage. We considered an
   additional less trivial explanation for lack of detection, assuming
   that some sequences are not accessible to the solvent in the 3D fold
   and therefore cannot be digested by both V1 and S1 enzymes^[109]41. We
   put this hypothesis to the test and analyzed the Tetrahymena ribozyme
   domain p4–p6 that was used as control and checked if minimally cleaved
   regions have low solvent accessibility as computed based on the solved
   3D structure (PDB ID: 1x8w). Indeed, non-accessible (buried) bases were
   found to have, on average, lower read coverage in both V1 and S1
   libraries (p = 0.02, Supplementary Table [110]6).

Effects of ADAR1 silencing on editing and gene expression

   We silenced ADAR1 in HepG2 cells as described in the methods section.
   In brief, we used validated siRNA for ADAR1 (siADAR1) and scrambled
   siRNA to generate ADAR1 knockdown (KD) and control cells, respectively
   (Methods). quantitative reverse transcription (qRT)-PCR and
   western-blotting (Supplementary Fig. [111]3) showed an effective ADAR1
   KD of ~90%. RNA-seq and PARS-seq verified this observation with 72 and
   66% reduction of ADAR1 transcripts, respectively. 131,086 known
   potential editing sites (from the A-to-I RNA-editing collections in
   DARNED^[112]42 and RADAR^[113]8) were found to be edited in at least
   one PARS-seq sample. As expected, significant global decrease in
   RNA-editing level was observed in ADAR1 deficient cells (Supplementary
   Fig. [114]4a; p < 10^−15). Moreover, editing sites located in IR Alus
   in 3′UTR of genes, which are predominantly ADAR1 targets, showed
   reduction of about 70% in editing level (Supplementary Fig. [115]4f;
   p < 10^−15). To further identify sequence reads that are edited at many
   positions and could not be mapped by standard NGS alignment tools that
   were used to detect the known 131,086 editing sites, we also applied de
   novo detection of hyper-edited sites as described in Porath et
   al.^[116]43 Both numbers and levels of editing sites were reduced also
   in this set in the ADAR1 deficient cells (Supplementary Fig. [117]4b,
   e). A total of 5917 sites from this analysis were more confidently
   called, as they are located within uniquely aligned reads, and thus
   were added to the previous 131,086 set for further analysis.

   The significant enrichment (Supplementary Fig. [118]4c) of A-to-G
   mismatches compared with all other mismatches proves that our
   methodology is both sensitive and specific for A-to-I editing detection
   as previously described^[119]3. A strong ADAR1 binding motif was
   identified in the newly identified edited sites (using the Porath et
   al.^[120]43 pipeline), as demonstrated by under-representation of G at
   (−1) to the editing site (Supplementary Fig. [121]4d). Overall, 137,003
   editing sites were identified in at least one sample and were further
   intersected with the structural results. Complete lists of the editing
   sites in this study are available in Supplementary Data [122]1.

   We identified 5644 RNA-editing sites which have significantly lower
   editing level in ADAR1 KD than in control cells (p < 0.05). 508 of the
   sites also passed false discovery rate (FDR) filter (q < 0.1) in the S1
   samples or in the V1 samples. These differentially edited sites (DES)
   are likely specific targets of ADAR1 (Supplementary Data [123]2). DES
   have the potential to directly change Watson–Crick interfaces in the
   RNA, or alter binding sites of RNA-binding proteins that help to
   stabilize RNA in particular conformations.

   The expression of ADAR2 (synonym: ADARB1) was increased by ~2.2 fold in
   ADAR1 KD cells (FDR < 0.05 in both V1 and S1 samples), in line with the
   previous demonstration of a reciprocal compensatory change in
   abundance/activity between ADAR2 and ADAR1^[124]23, [125]44. Editing
   sites that demonstrate lower editing level in KD samples are therefore
   most likely ADAR1 substrates (DES, Supplementary Data [126]2), while
   sites with higher editing in KD are probably the targets of ADAR2 (only
   174 sites, included in Supplementary Data [127]3). The high ratio of
   ADAR1 to ADAR2-editing targets found here (~32-fold), further verifying
   our approach.

Regions with ADAR1-regulated secondary structure changes

   To identify transcripts where the V1 or S1 profiles were significantly
   changed between control and ADAR1 KD, we compared the correlation
   between the vectors of read start counts for each RefSeq gene. We
   derived the correlation difference (CorDiff) as an objective score for
   this analysis and filtered the data to retain only those transcripts
   with CorDiff > 0 in both V1 and S1 samples, that conform with a random
   shuffling test to avoid low sampling bias (Methods). 3196 RefSeq
   transcripts with secondary structure changes (SSC) had CorDiff > 0 in
   both V1 and S1. In total, 2155 of these transcripts have original
   CorDiff that is higher than 95% of the random trials (empirical
   p < 0.05, Supplementary Data [128]4). Similar analysis was also
   performed for individual exons (Supplementary Datas [129]5–[130]6).
   Analysis using DAVID^[131]45 and GOrilla^[132]46 of SSC transcripts
   revealed significant enrichment of biological and molecular functions
   related to ribonucleotide binding, mitosis, nuclear division, ATPase
   activity, RNA binding and p53 binding (FDR < 0.05; Supplementary
   Data [133]7). We found that SSC in 3′UTRs were enriched for DES
   (Table [134]1) and for IR Alu retroelements separated by <4000
   bases^[135]3 (Supplementary Table [136]7), as compared with a control
   set of 3′UTRs with similar coverage that were not altered. In addition,
   we found that the SSC 3′UTR exons overlapped with ADAR1 binding sites
   previously identified by Clip-seq (Supplementary Table [137]7)^[138]47.
   The lists of RNA-editing sites and DES in SSC 3′UTR exons are available
   in Supplementary Datas [139]8–[140]9.

Table 1.

   RNA editing is enriched within structurally changed (SSC) 3′UTR exons
   # RNA-editing sites^a Total genomic size (# bases) Fold p-value
   SSC 3′UTR (avg. read starts > 50^b) 1479 2,274,485 1.38 <10^−15
   Non-SSC 3′UTR (avg. read starts > 50) 9157 19,443,239
   SCC 3′UTR (avg. read starts > 200) 1293 1,550,212 1.65 <10^−15
   Non-SSC 3′UTR (avg. read starts > 200) 8227 16,311,452
   SSC 3′UTR (avg. read starts > 1000) 618 433,095 2.51 <10^−15
   Non-SSC 3′UTR (avg. read starts > 1000) 5303 9,328,336
   # DES ^c Total genomic size (# bases) Fold p-value
   SSC 3′UTR (avg. read starts > 50) 213 2,274,485 1.5 5.93 × 10^−10
   Non-SSC 3′UTR (avg. read starts > 50) 1215 19,443,239
   SCC 3′UTR (avg. read starts > 200) 206 1,550,212 1.88 <10^−15
   Non-SSC 3′UTR (avg. read starts > 200) 1152 16,311,452
   SSC 3′UTR (avg. read starts > 1000) 121 433,095 2.93 <10^−15
   Non-SSC 3′UTR (avg. read starts > 1000) 891 9,328,336
   [141]Open in a new tab

   ^aKnown RNA-editing sites and de novo detected hyper-edited sites
   (detected by the pipeline of Porath et al.^43) which are edited in the
   PARS-seq samples

   ^bAveraged number of read starts in all the PARS-seq samples

   ^cRNA-editing sites that showed significantly lower editing levels
   G/(A + G) in ADAR KD samples

   The observed structural changes may affect cellular physiology via
   several mechanisms. Structural alterations may affect binding
   specificity and avidity of RNA-binding proteins or ncRNAs. Search for
   SSC in 3′UTRs with long IR Alu regions revealed that these regions are
   enriched with binding sites for STAU1, an RNA binding protein which
   binds DS RNA and activates Staufen-mediated-decay (Supplementary
   Table [142]8, based on STAU1 hiCLIP data; Methods section)^[143]48. IR
   Alus in 3′UTRs that are influenced by editing-mediated SSC are also
   enriched by miR binding sites (Supplementary Table [144]9). Additional
   possible molecular outcomes of editing-mediated SSC include alteration
   of binding of lncRNAs, of splicing or 3′UTR shortening and change in
   nuclear/cytoplasmic mRNA localization and stability^[145]22.

   We showed before that ADAR indeed participates in alternative splicing
   regulation^[146]49. Since alternative splicing might be interpreted as
   SSC, we looked at single isoform genes, where there is no splicing that
   can affect transcript secondary structure. We found a similar (6.4%)
   fraction of SSC exons in single isoform genes compared to non-SSC exons
   (Supplementary Table [147]10), suggesting that alternative splicing
   does not play a significant role in the structural changes observed
   upon ADAR silencing. We also performed independent analysis to detect
   alternatively spliced exons based on the PARS-seq data using
   DEXSeq^[148]50 (Supplementary Data [149]10) and found that only ~22% of
   the SSC exons are identified as alternatively spliced (Supplementary
   Table [150]11). These findings indicate that most of the observed
   differences actually reflect local structural changes.

Structural changes in 3′UTRs of selected genes

   SSC 3′UTR are enriched with editing sites and DES (see above).
   Illustrative example for such cases, reflecting changes in mRNA folding
   are shown in Fig. [151]3a. The 3′UTR of F11R is highly edited with 24
   editing sites identified; at least 5 of them are DES (for the full DES
   list see Supplementary Data [152]2). A major change in 3′UTR folding
   upon ADAR knockdown was revealed by PARS-seq analysis and RNA 2D
   prediction using constraints derived from the PARS-seq experiment
   (Fig. [153]3a and Supplementary Fig. [154]5a).

Fig. 3.

   Fig. 3
   [155]Open in a new tab

   Examples of SSC in hyper-edited 3′UTRs. The arc diagrams show the base
   pairing in IR Alus in control (blue) and KD (green). a Prediction of
   control and KD structures using the PARS-seq data of F11R SSC 3′UTR IR
   Alu region (chr1:160966275-160967962/hg19) as constrains for RNAfold.
   Corresponding 2D structure diagrams from control (left) and KD (right)
   samples are shown. b Prediction of control and KD structures using the
   PARS-seq signals of PSMB2 IR Alu region (chr1:36066658-36067955/hg19)
   as constrains for RNAfold. Corresponding 2D structure diagrams of
   control (left) and KD (right) are shown. The change in base pairing is
   represented in both examples by the differences in the arcs in control
   and KD samples

   PSMB2 is also highly edited, with 61 editing sites in its 3′UTR, 21 of
   them DES, more than in any other gene (Supplementary
   Datas [156]8–[157]9). The structural profiles indicate that the 3′UTR
   structure is affected by ADAR silencing. Fig. [158]3b illustrates the
   IR Alu structures at the 3′UTR using the constraints derived from the
   PARS-seq experiment. Notably, the strong IR Alu structure of PSMB2
   3′UTR was shown to affect protein expression^[159]51; this strong
   duplex was recently used as ADAR “sponge”^[160]52, Moreover, the
   association between PSMB2 3′UTR and ADAR is also supported by the ADAR
   Clip-seq data we analyzed (Supplementary Table [161]12). Additional
   examples are the 3′UTR of MDM4 (93 editing sites, of which 16 are DES),
   and the 3′UTR of the XIAP gene (87 editing sites, 12 of them are DES)
   whose structural profiles are also altered by ADAR reduction
   (Supplementary Fig. [162]5b, c).

   We examined not only the minimum free energy structures but also the
   structural ensembles predicted using constraints taken from the
   PARS-seq of control and KD samples. The structural ensembles of IR Alus
   in F11R and PSMB2 differ significantly between control and KD cells
   with the KD samples showing fewer DS bases (Wilcoxon test, p < 0.0001
   for both, Supplementary Table [163]13). SeqFold^[164]53 analysis of
   these two examples reveal the same trend, as in both PSMB2 and F11R the
   structures are less stable in KD conditions (ΔG control < ΔG KD).

3′UTRs adopt different 2D structure upon editing

   Comparison of the structural profiles (at single nucleotide resolution)
   around edited bases (in regions up to 50 bases upstream to the edited
   adenosine) and the structural profiles of their non-edited version
   revealed apparent differences in the local 2D structure (Fig. [165]4a).
   This particular analysis was based only on the control samples and not
   on the ADAR KD cells, so the changes can be solely attributed to
   editing and not to other editing-independent functions of ADAR.

Fig. 4.

   [166]Fig. 4
   [167]Open in a new tab

   Structural differences between edited molecules and their non-edited
   versions based solely on control samples data. a Heatmap for the
   correlation between edited and non-edited control samples in a [−50,
   50] flanking regions centered on the edited adenosine. Each data point
   in the vectors being correlated is the number of reads that start in
   each position over all edited regions. Red: Spearman’s R = 1. Blue:
   Spearman’s R = −1. Note that edited reads show higher correlation to
   edited reads from the other sample and lower correlation to non-edited
   reads from the same sample. b Average relative PARS score for edited
   and non-edited reads in a region of 10 bases upstream to the edited
   adenosine. The error bars represent standard errors calculated from the
   relative PARS scores of the studied regions (87 and 223 regions for
   edited and non-edited versions, respectively). In both a and b we only
   analyzed regions centered on the edited adenosine with sufficient
   coverage in both replicates (>50 read starts in [−50, 50] region)

   Moreover, when comparing the relative PARS score upstream to the
   editing site between edited and non-edited reads, different patterns
   were inferred (Fig. [168]4b). Edited transcripts tend to adopt more
   duplex structures in the 5–9 bases upstream to the editing site, while
   non-edited transcripts tend to have more compact double stranded
   structures 2–4 bases upstream to the editing site. This analysis was
   restricted to bases located upstream to the edited adenosine (as both
   the read start and the edited position need to be on the same specific
   read in this analysis). Analysis of lymphoblatoid cells data showed
   similar patterns (Supplementary Table [169]14).

   Some genes show apparent differences in structural profiles in the
   vicinity of editing sites (50 bases upstream to the edited adenosine)
   between edited molecules and non-edited molecules (Supplementary
   Data [170]11). BPNT1 3′UTR is hyper edited with 37 editing sites, 19 of
   which are DES. One example for such structurally changed region is
   shown in Fig. [171]5a [from −50, to 50 bases] centered at the editing
   site (chr1:220231330/hg19. CorDiff score = 0.83, empirical p < 0.001).
   BPNT1 3′UTR was also found to be SSC when comparing KD to control
   samples. Fig. [172]5b depicts an example for editing associated with
   SSC (CorDiff score = 0.37, empirical p = 0.001) for PAICS 3′UTR in
   window [from −50, to 50 bases] centered at the editing site in
   chr4:57326262/hg19. PAICS 3′UTR, which is also hyper-edited, was shown
   to be involved in nuclear retention^[173]54.

Fig. 5.

   [174]Fig. 5
   [175]Open in a new tab

   Examples for editing sites associated with differences in 3′UTRs 2D
   structure between edited and their non-edited versions. a Region of
   [−50, 50] centered at the editing site (chr1:220231330/hg19) in the
   3′UTR of BNPT1. b Region of [−50, 50] centered at the editing site
   (chr4:57326262/hg19) in the 3′UTR of PAICS. The editing sites tested in
   each panel are marked with black arrow. Each track y-axis includes the
   normalized numbers of read starts at each position. Red: V1, non-edited
   samples, orange: V1, edited samples, blue: S1, non-edited samples,
   green: S1, edited samples

RNA editing often stabilizes RNA structures

   To check how ADAR silencing affects the number of paired (DS) bases,
   4021 long and highly expressed transcripts (longer than 100 bases and
   covered by more than 256 read starts) were subjected to Seqfold
   pars2spp analysis (PARS to structure preference profile; Ouyang et
   al.^[176]53. Pars2spp identifies confident paired (DS) or unpaired (SS)
   bases in each transcript according to statistical testing of the
   enrichment of V1 counts and S1 counts, respectively. Using these
   classifications, we calculated a gene-wise ratio of DS/(DS + SS), and
   compared it between control and ADAR KD samples. This ratio is found to
   be significantly lower in the KD samples (Wilcoxon’s test p < 10^−15),
   namely, the number of confident base pairs identified has been
   decreased (Fig. [177]6a, b). A related observation is that edited
   regions in the control samples are enriched with DS bases compared to
   the same regions in the KD samples (Supplementary Fig. [178]6a) and the
   DS/(DS + SS) ratio is positively correlated with editing level
   (G/(A + G)) (Supplementary Fig. [179]6b, Spearman’s R ~ 0.12, p < 0.01
   for both controls).

Fig. 6.

   [180]Fig. 6
   [181]Open in a new tab

   Editing is associated with global changes in 2D structure. a The
   relative number of DS bases in the control samples (red bars) is
   compared to the relative number of DS bases in the KD samples (blue
   bars) for each expressed transcript (length > 100 bases and number of
   read starts > 256). **Wilcoxon test, p < 0.01. b Density plots are
   shown for the distribution of per-transcript DS/(DS + SS) values in
   control and KD. Red: control, replicate 1, pink: control, replicate 2.
   Blue: KD, replicate 1, light blue: KD, replicate 2. c General scheme
   for editing of A:C and A:U pairs. Editing of A:C into I:C is more
   frequent than expected by chance (observed: 23%, expected: 13%. Fisher
   exact test, p = 0.001) and its editing level is higher than editing of
   A:U pairs into I:U. Editing of A:U into I:U occur less than expected by
   chance (observed: 73%, expected: 82%, Fisher exact test, p = 0.006).
   These numbers are based on alignment of only confident IR Alu duplexes
   intersected with 3′UTRs and depict the results from Table [182]2 and
   Supplementary Table [183]15. d Functional enrichment for DSWE genes.
   X-axis: functional terms. Y-axis: –Log[10](q-value). The dashed line
   indicates q = 0.05. e Averaged expression level for DSWE (436 genes)
   and SWE genes (2300 genes) in PARS-seq (averaged from 2 biological
   replicates per sample in both S1 and V1 treatments, i.e., 4 samples per
   control and 4 samples per KD). Error bars indicate standard errors.
   ***Wilcoxon test, p < 0.0001. f Density plots for the mean expression
   in DSWE and SWE genes in different tissues based on GNF atlas. Wilcoxon
   test, p = 1.08 × 10^−5.g The percent of DSWE/SWE genes among increasing
   number of genes ranked by cytoplasmic to nuclear ratio. The gene with
   the highest cytoplasm/nucleus expression ratio is ranked first, and the
   one with the lowest cytoplasm/nucleus expression ratio is ranked last.
   DSWE genes (red line) are enriched within genes highly expressed in the
   cytoplasm. The inset shows a snapshot of the first 1000 genes with the
   highest cytoplasmic/nuclear ratio and the lowest cytoplasmic/nuclear
   ratio. **Fisher exact test, p < 0.01

   The first role attributed to ADAR was unwinding of double helical
   RNA^[184]29, [185]30 and it was long assumed that ADAR destabilizes RNA
   duplexes^[186]55. This is also consistent with the recent results
   regarding its role in suppressing innate immune response by
   destabilizing self dsRNAs^[187]16–[188]19. However, Wong et al.^[189]25
   found that the preferred substrate for ADAR editing in GRIA2 site and
   in human hepatitis delta virus (HDV) is a pseudo-pairing between A and
   C (A:C) such that deamination of A-to-I, and therefore the formation of
   I:C pairing, increases RNA stability. We previously^[190]3 noted that
   the number of edited A:C pairing (the formation of I:C pairing) is
   significantly larger than expected by chance, even though in absolute
   terms most RNA-editing events target A:U pairing. In the current data
   the number of A:C pairing is also significantly larger than expected by
   chance (Fisher exact test, p = 0.001, Supplementary Table [191]15).
   Moreover, we found that editing level at A:C pairing within IR Alus is
   significantly higher than at A:U sites (averaged-editing level of 16%
   vs. 10%, for A:C and A:U, respectively. p < 10^−15; Table [192]2).
   Recent energy estimations^[193]56 showed that I:C pairing is about
   twice as stable as I:U pairing (−5.6 vs. −2.56 kcal per mol,
   respectively). As depicted schematically in Fig. [194]6c, we propose
   that the overall effect of A-to-I editing on DS stability is a
   trade-off between these two types of base-pair modulations. On the one
   hand, there is a strong preference to “correct” A:C mismatches
   (Fig. [195]6c on the right, with fewer sites but with higher editing
   level) into more thermodynamically stable I:C pairs. On the other hand,
   there is an abundant, but statistically less favored, editing of A:U
   pairs into I:U mismatches^[196]57 (left part of Fig. [197]6c, with more
   sites in absolute numbers but lower editing level). The model shown in
   Fig. [198]6c depicts the core results presented in Table [199]2 and in
   Supplementary Table [200]15.

Table 2.

   Average editing level [G/(A + G)] in A:C and A:U pairs within edited IR
   Alu intersected with 3′UTRs. Editing level [G/(A + G)] is averaged from
   all control samples
              Min    25%   Median  Mean   75%   Max  St dev  N   Wilcoxon’s p
   A:C pairs 0.0008 0.0316 0.0808 0.1586 0.1888 0.97 0.20   1414 <10^−15
   A:U pairs 0.0004 0.0238 0.0476 0.103  0.125  1    0.14   4378
   [201]Open in a new tab

   We restricted this analysis to sites with base coverage (A + G) > 5
   reads

   From this model, it is also clear that in the case of perfect DS RNAs,
   editing can in fact only destabilize the duplex as no A:C mismatches
   are present. On the other hand, the effect of editing on non-perfect DS
   RNAs is sequence dependent, and frequently stabilizing.

   ADAR1 deficient cells have fewer RNA paired bases, making their mRNAs
   largely less thermodynamically stable. Interestingly, an interaction
   between ADAR1 and HuR, a protein that stabilizes mRNA, was recently
   identified^[202]23, implying an additional indirect effect of ADAR on
   RNA stability.

   We next examined if the relative number of buried bases (associated
   with low digestion frequency by both V1 and S1) differs between control
   and KD samples. Interestingly, transcripts in the KD samples tend to
   contain relatively more buried bases than transcripts in the control
   state (Supplementary Table [203]16. p < 10^−15 for both replicates).
   Taken together with the above results, we suggest that globally, KD
   transcripts, which have less defined 2D structures, form a more
   globular (and less solvent accessible) 3D structure.

Editing-dependent destabilized transcripts

   We further divided the genes into two sets: 1. Genes with higher
   DS/(DS + SS) ratio in control cells than KD cells, with presumably a
   stabilizing editing effect, termed “stabilized while edited” (SWE,
   Supplementary Data [204]12). 2. Genes with lower DS/(DS + SS) ratio in
   control cells than in the KD cells, in which editing has presumably
   destabilizing effect. This set was termed “destabilized while edited”
   (DSWE, Supplementary Data [205]12).

   We found that the DSWE gene set was significantly enriched for Alu
   repeats and editing sites over the SWE gene set (Table [206]3). The
   DSWE transcripts were also enriched for short windows with identical
   nearby reverse complementary sequences (Supplementary Table [207]17).
   Importantly, we found that our DSWE gene set significantly overlapped
   (p < 0.01) with the gene set of long (>50 bases) hyper-edited duplexes,
   computed in accordance with the pipeline of Porath et al.^[208]43
   Further analysis of the DSWE gene set revealed a significantly lower
   free energy of confident IR Alu folding in 3′UTRs (Supplementary
   Table [209]18) and higher DS/(DS + SS) ratio in control cells
   (Supplementary Table [210]19). Taken together, our finding, that the
   DSWE gene set is enriched for complementary sequences that fold into
   highly stable duplexes, is consistent with our observation that A-to-I
   editing destabilizes this type of secondary structures. As the
   secondary structures of DSWE transcripts are a priori very stable,
   editing events which take place will likely reduce their stability. In
   KD samples the DSWE transcripts are not edited and thus retain more
   stable RNA duplex regions, (vice versa for SWE transcripts).

Table 3.

   Destabilized while edited (DSWE) genes are frequently edited and
   enriched with Alu repeats more than stabilized while edited (SWE) genes
   # Editing sites Summarized genomic size (# bases) Fold p-value
   DSWE genes^a 4156 14,121,576 1.22 <10^−15
   SWE genes^b 24,256 100,526,979
   # DES Summarized genomic size (# bases) Fold p-value
   DSWE genes 186 14,121,576 1.2 0.0089
   SWE genes 1109 100,526,979
   # Alu Summarized genomic size (# bases) Fold p-value
   DSWE genes 9587 14,121,576 1.04 6.06 × 10^−10
   SWE genes 65,706 100,526,979
   # Alu in IR Summarized genomic size (# bases) Fold p-value
   DSWE genes 7215 14,121,576 1.07 2.54 × 10^−15
   SWE genes 47,881 100,526,979
   [211]Open in a new tab

   ^aGenes that their average ratio of DS/(DS + SS) as resulted from
   pars2spp analysis (Methods section) is higher in KD than in control
   samples. Therefore, these genes are suggested to be less stable while
   edited

   ^bGenes that their average ratio of DS/(DS + SS) as resulted from
   pars2spp analysis (Methods section) is higher in control than in KD
   samples. Therefore, these genes are suggested to be more stable while
   edited

   To identify the enriched biological processes and molecular functions
   within the DSWE gene set, we applied Gene Set Enrichment Analysis
   (GSEA) (Fig. [212]6d; Supplementary Data [213]13). Relevant
   significantly enriched validated gene sets (FDR q < 0.05) included
   housekeeping genes and gene sets involved in a variety of vital
   biological and metabolic processes, such as protein and RNA
   biosynthesis. Importantly, enrichment for genes involved in the innate
   immune response to LPS and type-I interferon response were also
   detected. Analyses using clusterProfiler (for KEGG pathway enrichment
   analysis) and GOrilla showed similar functional enrichment signatures.

   Liddicoat et al. recently^[214]16 demonstrated that mouse Adar has an
   important role in destabilizing dsRNA duplexes formed within a very
   limited number of mRNA 3′UTRs, by folding of complementary inverted
   sequences, preventing over-activation of the MDA5-dependent interferon
   response in the mouse. The enrichment for interferon response genes in
   the DSWE set hints to a similar mechanism in humans. Interestingly,
   KEGG pathways analyses revealed that genes involved in systemic lupus
   erythematosus (SLE) are also enriched. While this result should be
   cautiously interpreted, as most SLE-related genes in our list are
   histone genes, over-activation of type-I interferon pathway was
   observed in both SLE and in AGS^[215]20. Many symptoms are shared
   between these pathologies^[216]20. Our results are also in agreement
   with the results of Vitali et al.^[217]58 which found that I:U pairs
   (which characterize the DSWE genes set) suppress the MDA5-dependent
   interferon response, while I:C pairs that stabilize the RNA duplex
   induce the interferon response.

   While analyzing the epigenetic markers of DSWE and SWE genes, a higher
   signal of H3K4me3 was clearly found for the promoter regions of DSWE,
   while a higher signal of H3K27me3 was detected in the promoter regions
   of SWE genes (Supplementary Table [218]20). In accordance, we found
   that DSWE genes show a significantly increased expression (~5-fold
   change; p < 10^−15) compared to SWE genes (Fig. [219]6e; expression
   values are available on Supplementary Data [220]12). A similar result
   (p = 1.08 × 10^−5) was also obtained by computing relevant GNF
   microarray data derived from several different tissue types
   (Fig. [221]6f). This is also supported by the RNA-seq and ribosome
   profiling we performed (see below) on control and ADAR KD HepG2 cells
   (Fig. [222]7b). Analysis of HepG2 gene expression data from the ENCODE
   database revealed that DSWE transcripts are enriched in the cytoplasm
   compared to SWE genes (Fig. [223]6g). The enrichment of the DSWE set
   for highly expressed cytoplasmic transcripts is consistent with the
   possible involvement of these transcripts in response to environmental
   signals, including the cytosolic immune response to viral nucleic
   acids. It is also in agreement with the known activity of ADAR p150 in
   the cytoplasm and its induction during the type-I interferon
   response^[224]13.

Fig. 7.

   [225]Fig. 7
   [226]Open in a new tab

   RNA-seq and ribosome profiling analysis. a Overlap between secondary
   structure changed (SSC) exons and the 1000 genes with the highest
   translation efficiency ratio (Methods section). (+) indicates SSC exons
   and (−) indicates non-SSC exons (background list). The fold-ratio is
   calculated according to the overlap with the (−) SSC exons. Only (+)
   SSC 3′UTR exons are enriched in the 1000 genes with the highest score.
   Fisher exact test, *p < 0.05, **p < 0.01, ***p < 0.001. b Averaged
   RNA-seq and Ribo-seq expression (averaged from 2 biological replicates
   of the Ribo-seq) differences (in RPKM values) between DSWE (210 genes)
   and SWE genes (1196 genes). Error bars indicate standard errors. t
   test, *p < 0.05, **p < 0.01, ***p < 0.001. c Mean ribosome occupancy
   around the start codon (left) and stop codon (right) of the coding
   sequence (CDS) of DSWE (red) and SWE (blue) genes. CDS is marked with
   gray shading. KS test p < 0.001 for both start/stop codon regions. d
   Percent of upregulated and downregulated genes in the KD samples
   between DSWE and SWE genes. DSWE genes are more frequently upregulated
   (Fisher exact test, p = 6.67 × 10^−13)

3′UTR structure changes and modulation of translation

   Translation is regulated directly by binding of translation factors to
   the mRNA 3′UTR^[227]59. This binding may depend on a defined RNA
   structure. Therefore, to explore if SSC 3′UTRs are associated with
   protein expression regulation, we conducted RNA-seq and ribosome
   profiling (Ribo-seq) in control and ADAR1 KD HepG2 cells (Methods
   section). Genes with SSC in their 3′UTR were found to be enriched for
   differences between Ribo-seq fold-change (FC) and RNA-seq FC (top 1000
   genes in the sorted list of all expressed genes, ranked by their
   translation efficiency ratio between KD and control; Fig. [228]7a). The
   structural changes may be related to modulation of RNA stability and
   abundance, or to regulation of translation. Interestingly, genes with
   SSC 3′UTR exons were shown to alter their translation efficiency more
   than SSC 5′UTR or SSC CDS exons (Fig. [229]7a), suggesting that ADAR
   regulation of translated genes is mediated mainly by 3′UTRs.

   We found that DSWE are highly translated compared to SWE
   (Fig. [230]7b), in agreement with their transcription levels. Ribosome
   occupancy signatures for DSWE and SWE are significantly different along
   the transcript (Fig. [231]7c; KS test p < 0.001 for both start/stop
   codon region), with DSWE genes consistently showing higher signals.
   More careful inspection of DSWE and SWE translation profile reveals
   that DSWE are frequently upregulated in the KD samples (Fig. [232]7d;
   Fisher exact, p = 6.67 × 10^−13), suggesting that KD of ADAR and
   reduction of RNA editing may contribute to DSWE mRNA stability and
   increase DSWE gene translation.

Discussion

   Our experimental data provide insight into the SSC induced by ADAR1 on
   dsRNA in the entire transcriptome of human cells. We demonstrated that
   the abundant SSC induced by ADAR1 silencing overlap with A-to-I editing
   sites in relevant transcripts, implying a direct role for editing in
   shaping mRNA structure. Yet, it is theoretically possible that some SSC
   resulted from an indirect editing-independent regulatory function of
   ADAR1.

   RNA bases can easily pair, intra-molecularly, resulting in RNA folding
   into a variety of secondary structures. Long intramolecular dsRNAs are
   the major substrate of ADAR1, and the pioneering studies in the
   RNA-editing field proposed that ADAR1 primarily unwinds dsRNAs^[233]29,
   [234]30. However, our results, rather unexpectedly, show that the
   effect of editing on the entire transcriptome is to increase the DS/SS
   ratio. Thus, our data support a modified paradigm whereby RNA editing
   can enhance or reduce proper intramolecular base pairing in a
   context-dependent manner. An explanation for the arguably contradicting
   observations is that in long perfect dsRNA structures, by definition,
   there are no A:C mismatches, and the only possible effect of editing is
   to make the structure less stable. However, in the human transcriptome
   where the dsRNA structures are formed by imperfect Alu repeats, with
   typically over 15% mismatches, editing that changes A:C mismatches to
   I:C can contribute to stabilization of the structures.

   RNA secondary structure acts as an additional layer of regulation which
   affects several steps in the gene expression program. The current
   understanding of the effects of small sequence variations and
   modifications on RNA structure is however still limited. The current
   study highlights the potential regulatory role of structural changes
   that take place following RNA modifications. This understanding is
   especially relevant in the current era, in which the extent and
   functional roles of various mRNA modifications, like m^6A and m^1A
   methylation^[235]60, [236]61, are being deciphered. Thus, A-to-I
   editing can play a significant role in this form of regulation as it
   affects Watson–Crick interactions.

   Combined deletion of genes involved in the cytosolic response to dsRNA,
   Mda5 and Mavs, rescues Adar null embryos to birth, suggesting that a
   major biological function for A-to-I editing of IRs is to prevent
   abnormal activation of MDA5 followed by a toxic type-I interferon
   response. While sensing of foreign, infectious agent-derived dsRNAs is
   essential for proper innate immune response, dampening of immune
   response to transcripts derived from endogenous repetitive elements
   (“endovirome”) that fold into long dsRNAs, is essential for the
   prevention of a deleterious response. We show that RNA editing can
   mediate both dsRNA stabilization (SWE genes) and destabilization (DSWE
   genes). The DSWE list of ~400 genes is significantly enriched for
   potential MDA5 ligands (i.e., complementary inverted Alus that fold
   into long “perfect” dsRNAs) for which editing of A:U pairs into I:U
   pseudo-pairs reduces stability. We hypothesize that DSWE transcripts
   are enriched for long, abundant, “perfect” dsRNA, involved in
   ADAR1-regulated cross talk with the innate immune system, which in the
   absence of adequate A-to-I editing may cause an overwhelming activation
   of MDA5/MAVS dsRNA sensing pathway. Interestingly, DSWE transcripts
   also contain increased numbers of STAU1 binding sites compare to SWE,
   suggesting that in DSWE transcript reduced ADAR1 levels may lead to
   escape from Staufen-mediated-decay and increase translation (as shown
   here by our Ribo-seq experiment, albeit not statistically significant,
   one-tail Wilcoxon test, p = 0.09).

   Our study provides the context to evaluate RNA-editing-dependent SSC
   throughout the human transcriptome and implies multiple
   post-transcriptional regulatory roles for this vital function of ADAR1.

Methods

Cell line and cell culture

   The Human HepG2, hepatocellular carcinoma, cell line (from ATCC) was
   maintained in Dulbecco’s modified Eagle’s medium, supplemented with
   4 mM glutamine, 100 U ml^−1 penicillin, 100 μg ml^−1 streptomycin and
   10% FBS (all from Thermo Fisher Scientific Inc.). The cells were
   routinely checked for mycoplasma contamination.

ADAR1 silencing

   HepG2 cells were stably transfected with validated siADAR1 (assay id:
   119581, cat. AM51331) or scrambled siRNA (negative control; cat.
   #AM4635). Both were purchased from Applied Biosystems. The optimal
   concentration of siRNA was 20 μM. The silencing was performed for 48 h
   with ~80% silencing levels. Transfections of the indicated siRNAs were
   performed using a Nucleofector® Device (Amaxa Biosystems, GmbH).

Western blot analysis and antibodies

   Proteins were extracted from cells using RIPA buffer (Sigma-Aldrich)
   supplemented with protease inhibitor (Roche). Following separation on a
   SDS-PAGE, proteins were transferred to a nitrocellulose membrane
   followed by staining with a primary antibody overnight at 40 °C and
   washed and incubated with the appropriate secondary antibody for 1 h at
   room temperature. Specific reactive bands were detected using the
   SuperSignal West Pico Chemiluminescent Substrate (Thermo Scientific).
   The antibodies used were as follows: anti-ADAR1 (Santa Cruz: sc-73408),
   anti-actin (Abcam: [237]Ab156302). For ADAR1, the antibody was diluted
   to 1:2000. For Actin, the antibody was diluted to 1:10,000. The
   uncropped western blot gel is found in Supplementary Fig. [238]7.

Probing mRNA with S1 nuclease and RNAse V1

   PARS-seq samples were digested using RNase V1 and S1 nuclease
   independently and the cleavage sites are captured by adapter ligation
   followed by high-throughput sequencing. Combing structural information
   obtained from these two enzymes independently creates greater
   confidence for whether a base is single or double-stranded than used
   alone. Here, we used PARS-seq to probe RNA structural changes that
   occur under ADAR1 silencing conditions of HepG2 cells. In brief, 2ug of
   poly-A selected mRNA was heated at 90 °C for 2 min and then immediately
   placed on ice for 2 min. Next 10× RNA structure buffer was added to the
   tubes that were transferred to a thermal cycler where the temperature
   was slowly increases from 4 °C to 23 °C over 20 min. These steps ensure
   that the RNA is properly renaturated and folded under in vitro
   conditions. For the enzymatic cleavage, each sample tube was either
   cleaved by RNase V1 or S1 at 23 °C for additional 15 min resulting in
   5′P overhangs. The enzymatic reaction was inactivated by
   phenol:chloroform:isoamyl alcohol, EtoH precipitated and resuspended in
   DDW for further fragmentation of the RNA into 200bases fragments
   (fragmentation was done by adding 1× alkaline hydrolysis buffer to the
   S1 or V1 cleaved RNA at 95 °C for 3 min). Size selection of RNA was
   done by gel electrophoresis and was followed by 5′ adapter ligation.
   This size selection step removes very short fragments of RNA, generated
   from the fragmentation step that could ligate to adapters.
   Fragmentation products with 3′P groups are converted to 3′OH groups by
   Antarctic phosphatase, enabling these products to be ligated to 3′
   adapters. This step is followed by RT, size selection and PCR to
   produce a cDNA library that is suitable for high-throughput sequencing.

Reverse transcription and real-time qPCR

   Total RNA from HepG2 cells was isolated using Trizol (Invitrogen) and
   treated with DNase I (Invitrogen). Poly-A mRNA was isolated using mRNA
   Sequencing Sample Preparation kit (Illumina Proprietary). Random-primed
   cDNA synthesis was done on 2 μg of total RNA using M-MLV RT
   (Invitrogen). Quantitative real-time PCR (qPCR) was performed using a
   Fast SYBR Green Master mix (Applied Biosystems, 4385612), and qPCR
   machine ABI 7900HT genetic analyzer (Applied Biosystems) with standard
   qPCR parameters to analyze the expression of indicated genes compared
   to the control gene ABL. Results were analyzed by the SDS 2.3 software
   with the comparative CT method and log10 (Relative quantification
   values values). Primers and probes sequences are detailed in
   Supplementary Table [239]21.

Libraries preparation and sequencing

   Libraries were prepared following the PARS-seq protocol^[240]62
   (briefly described below), using the TruSeq Small RNA sample
   preparation kit of the Illumina. We ligated the 5′ adapter as described
   in the TruSeq Small RNA sample preparation protocol (Illumina) and
   stopped the reaction with 1 µl stop solution (a component of the TruSeq
   Small RNA sample preparation kit) in 28 C for 15 min. 3′ end treatment
   with Antarctic phosphatase was done as described in the directional
   mRNA-Seq sample preparation protocol, followed by its inactivation with
   ethanol precipitation. 3′ adapter ligation, RT and PCR amplification
   were performed as described in the TruSeq Small RNA sample preparation
   protocol. The cDNA was size selected and cleaned using E-Gel 4% agarose
   (Invitrogen) and Agencourt AMPure XP beads (Beckman Coulter). The
   library was quantified using Qubit (Invitrogen) and validated using
   Agilent 2100 Bioanalyzer. Libraries were sequenced on Illumina
   Hiseq2500 machine. Sequencing yields of the samples are detailed in
   Supplementary Table [241]1.

Short read alignment and read starts counting

   Reads were aligned to the human reference genome (hg19) using
   bowtie2^[242]63 in local alignment option (bowtie2 -p 6 –local -x
   hg19_genome_index -U PARS_sample.fastq -S PARS_sample.sam). Only
   uniquely aligned reads were retained for further analysis. To count the
   number of reads that start at certain position in the genome we
   considered only reads that their 5′ matched the reference by at least
   five bases without gaps, carefully paying attention to the read
   orientation as PARS-seq protocol is strand specific. Samples were
   normalized such that at the end of the normalization step the sums of
   all read starts in each sample were equal. bigWig files containing the
   results of this counting step were uploaded to UCSC genome browser as
   custom tracks, for visualization proposes and deposited to GEO database
   under GEO ID: [243]GSE100210. In order to calculate RPKM values for
   each gene in the PARS-seq samples, coverage of each gene was calculated
   using coverageBed of Bedtools package^[244]64 with RefSeq transcripts
   annotation^[245]65 (coverageBed -split -count -abam BAM_FILE.bam -b
   REFSEQ.bed). The table with RPKM values can be found on Supplementary
   Data [246]14.

Calculation of PARS score for individual bases

   We used the definition of Kertesz et al.^[247]32
   [MATH: <msub><mrow><mi
   mathvariant="normal">P</mi><mi>A</mi><mi>R</mi><mi>S</mi><mspace
   width="0.3em"></mspace><mi
   mathvariant="normal">s</mi><mi>c</mi><mi>o</mi><mi>r</mi><mi>e</mi></mr
   ow><mrow><mi>i</mi><mi>j</mi></mrow></msub><mo>=</mo><msub><mrow><mi
   mathvariant="normal">Log</mi></mrow><mrow><mn>2</mn></mrow></msub><mfen
   ced close=")" open="(" separators=""><mrow><mi
   mathvariant="normal">V</mi><msub><mrow><mn>1</mn></mrow><mrow><mi>i</mi
   ><mi>j</mi></mrow></msub><mo>∕</mo><mi
   mathvariant="normal">S</mi><msub><mrow><mn>1</mn></mrow><mrow><mi>i</mi
   ><mi>j</mi></mrow></msub></mrow></mfenced> :MATH]
   1

   where V1[ij] and S1[ij] are the normalized count of V1 and S1 read
   start at base (j) in transcript (i), respectively. PARS scores for the
   control cells in this study were also applied for the analysis in
   Dominissini et al.^[248]61.

Calculation of correlation difference between structural profiles

   We defined the correlation difference (CorDiff) as follows:
   [MATH: <mi mathvariant="normal">CorDiff =</mi><msup><mrow><mfenced
   close=")" open="(" separators=""><mrow><msub><mrow><mi
   mathvariant="normal">Cor</mi></mrow><mrow><mi
   mathvariant="normal">c1,c2</mi></mrow></msub></mrow></mfenced></mrow><m
   row><mi mathvariant="normal">2</mi></mrow></msup><mi
   mathvariant="normal">+</mi><msup><mrow><mfenced close=")" open="("
   separators=""><mrow><msub><mrow><mi
   mathvariant="normal">Cor</mi></mrow><mrow><mi
   mathvariant="normal">kd1,kd2</mi></mrow></msub></mrow></mfenced></mrow>
   <mrow><mi mathvariant="normal">2</mi></mrow></msup><mi
   mathvariant="normal">-</mi><msup><mrow><mfenced close=")" open="("
   separators=""><mrow><msub><mrow><mi
   mathvariant="normal">Cor</mi></mrow><mrow><mi
   mathvariant="normal">c1,kd1</mi></mrow></msub></mrow></mfenced></mrow><
   mrow><mi mathvariant="normal">2</mi></mrow></msup><mi
   mathvariant="normal">-</mi><msup><mrow><mfenced close=")" open="("
   separators=""><mrow><msub><mrow><mi
   mathvariant="normal">Cor</mi></mrow><mrow><mi
   mathvariant="normal">c2,kd2</mi></mrow></msub></mrow></mfenced></mrow><
   mrow><mi mathvariant="normal">2</mi></mrow></msup> :MATH]
   2

   where Cor[ij] is the Pearson correlation between sample (i) to sample
   (j). c[i] stands for control sample i and kd[i] stands for ADAR
   knockdown sample i. Genes and genomic regions for which both CorDiff
   (V1) > 0 and CorDiff (S1) > 0 were considered as containing SSC
   regions. In order to calculate empirical p-value for this index, we
   calculated CorDiff based on random shuffle of the reads starts in each
   region (gene or exon). These CorDiff values were compared with the
   original CorDiff. Only original CorDiff that was higher than 95% of the
   random trials (100 random trials for exon level and gene level) was
   considered as significant (empirical p < 0.05).

Functional enrichment analysis and additional annotations

   Functional enrichment analysis of SSC exons was done with DAVID^[249]45
   and GOrilla^[250]46 using a background list of all exons with a similar
   coverage. Functional enrichment for the DSWE set was done using the
   GSEA online web site v6.1, available at:
   [251]http://software.broadinstitute.org/gsea/msigdb/annotate.jsp,
   employing the major gene sets collections of the MSigDB (Broad
   Institute, Massachusetts Institute of Technology). GOrilla and
   clusterProfiler^[252]66 were also applied with a background list of all
   transcripts longer than 100 bases with read starts coverage >256. miR
   binding sites were taken from TargetScan^[253]67. STAU1 hiClip data was
   downloaded from Sugimoto et al.^[254]68. ADAR1 Clip-seq data was
   downloaded from Bahn et al.^[255]47. Histone modification data of HepG2
   cells was downloaded from the ENCODE project^[256]69. Expression level
   of nuclear and cytoplasmic fractions in HepG2 cells were downloaded
   from the ENCODE project.

Defining significant DS or SS bases

   Illumina Hiseq2500 sequence reads were aligned to the knownCanonical
   transcripts list (downloaded from the UCSC table browser), using
   bowtie2 with local alignment option (bowtie2 -p 6 –local -x
   knownCanonical.index -U PARS_sample.fastq -S PARS_sample.sam). We only
   used uniquely aligned reads that their 5′ matched with no gaps for at
   least five bases (considering the read orientation). Read starts at
   each position of the transcriptome were counted. Samples were
   normalized such that at the end of the normalization step the sum of
   all read starts in each sample was equal. To classify bases as DS or SS
   or undetermined class (typically low coverage regions), we used the
   pars2spp script of SeqFold package^[257]53. This script gets the counts
   of read starts at each position and outputs the secondary structure
   class (DS or SS), employing Fisher exact test to assess significance.
   We ran this procedure on all transcripts and compared the relative
   number of DS bases in control and KD samples.

Differential expression and alternative splicing

   Differential expression analysis was done using DESeq^[258]70. Genes
   with FDR < 0.05 where considered as differentially expressed.
   Alternative splicing analysis was done using DEXSeq^[259]50. Exons with
   FDR < 0.1 were considered as alternatively spliced exons. To eliminate
   the option that the FDR < 0.1 is too stringent for our purposes (and
   more SSC exons represent in fact alternative splicing events), we also
   considered exons with more permissive filter of p < 0.05.

   To further assess the validity of our PARS-seq method following ADAR1
   silencing, a differential gene expression analysis was also performed
   based on the PARS-seq gene coverage. We found that 2412 and 2626
   transcripts were significantly differentially expressed in S1 and V1
   samples, respectively, based on DESeq statistics. 1884 of them
   overlapped (~8-fold enrichment; p < 10^−15 by Fisher’s exact test),
   increasing our confidence in the data and procedures.

   Based on gene FC between KD and control, high correlation between
   RNA-seq and PARS-seq was found (R = 0.67, p < 10^−15).

RNA-seq and Ribo-seq analysis

   RNA-seq and ribosome profiling^[260]71 (Ribo-seq) reads were aligned to
   the human reference genome (hg19) using bowtie2^[261]63 with local
   alignment option (bowtie2 -p 6–local). Only uniquely mapped reads were
   retained for further analyses. Coverage of each gene was calculated
   with coverageBed of Bedtools package^[262]64 using RefSeq transcripts
   annotation (coverageBed -split -count -abam BAM_FILE.bam -b REFSEQ.bed)
   downloaded from UCSC table browser. Using these gene coverage values
   RPKM values for RNA-seq and Ribo-seq were calculated (can be found on
   Supplementary Data [263]14).

   For the calculation of translation efficiency (TE)^[264]71 using
   Ribo-seq and RNA-seq, only genes with coverage of more than 100 reads
   in both Ribo-seq and RNA-seq were considered. The FC between ADAR KD
   and control was calculated using both the Ribo-seq and RNA-seq data
   with DESeq^[265]70. TE ratio (TER) was calculated based on the ratio of
   both FCs:
   [MATH: <mi mathvariant="normal">FC = Normalized</mi><mspace
   width="0.3em"></mspace><mi mathvariant="normal">read</mi><mspace
   width="0.3em"></mspace><mi mathvariant="normal">count</mi><mspace
   width="0.3em"></mspace><mfenced close=")" open="("
   separators=""><mrow><mi
   mathvariant="normal">KD</mi></mrow></mfenced><mi
   mathvariant="normal">∕Normalized</mi><mspace width="0.3em"></mspace><mi
   mathvariant="normal">read</mi><mspace width="0.3em"></mspace><mi
   mathvariant="normal">count</mi><mspace width="0.3em"></mspace><mfenced
   close=")" open="(" separators=""><mrow><mi
   mathvariant="normal">control</mi></mrow></mfenced> :MATH]
   3
   [MATH: <mi mathvariant="normal">TE = Normalized</mi><mspace
   width="0.3em"></mspace><mi mathvariant="normal">read</mi><mspace
   width="0.3em"></mspace><mi mathvariant="normal">count</mi><mspace
   width="0.3em"></mspace><mfenced close=")" open="("
   separators=""><mrow><mi
   mathvariant="normal">Ribo-seq</mi></mrow></mfenced><mi
   mathvariant="normal">∕Normalized</mi><mspace width="0.3em"></mspace><mi
   mathvariant="normal">read</mi><mspace width="0.3em"></mspace><mi
   mathvariant="normal">count</mi><mspace width="0.3em"></mspace><mfenced
   close=")" open="(" separators=""><mrow><mi mathvariant="normal">RNA -
   seq</mi></mrow></mfenced> :MATH]
   4
   [MATH: <msub><mrow><mi mathvariant="normal">TER =
   TE</mi></mrow><mrow><mi
   mathvariant="normal">KD</mi></mrow></msub><msub><mrow><mi
   mathvariant="normal">∕TE</mi></mrow><mrow><mi
   mathvariant="normal">control</mi></mrow></msub><msub><mrow><mi
   mathvariant="normal">= FC</mi></mrow><mrow><mi
   mathvariant="normal">Ribo-seq</mi></mrow></msub><msub><mrow><mi
   mathvariant="normal">∕FC</mi></mrow><mrow><mi
   mathvariant="normal">RNA-seq</mi></mrow></msub> :MATH]
   5

   Ribosome occupancy was calculated using the GWIPS-viz method in
   Ribogalaxy^[266]72 Meta-gene analysis was done in the span of 150 bases
   upstream/downstream to translation start/stop sites.

RNA-editing detection

   Known RNA-editing sites were examined in the bowtie2 alignment files
   based on list of editing sites gathered from RADAR^[267]8 and
   DANRED^[268]42. This list was used as input for mpileup of samtools
   package^[269]73 for detection of variants in defined positions. The
   mpileup file was parsed using mpileup2snp of VarScan^[270]74. Changes
   from A-to-G were retained.

   De novo RNA-editing sites in hyper-edited regions were detected using
   Porath et al.^[271]43 pipeline. We identified dense editing clusters of
   high-quality (Phred ≥ 30) A-to-G mismatches, in which the number of
   A-to-G mismatches was ≥5% of the read length and > 90% of the total
   number of mismatches. In order to find the editing level of these
   hyper-edited sites we used the bowtie2 alignment (uniquely aligned
   reads that were also used for the PARS-seq analysis), and the
   hyper-edited sites list was served as input for the mpileup–VarScan
   pipeline.

   Known and de novo sites for which their editing level G/(A + G) is
   significantly lower (p < 0.05) in ADAR KD cells were considered as
   differential-editing sites (DES).

   The lists of editing sites are given in Supplementary
   Datas [272]1–[273]3.

   The list of known RNA-editing sites from RADAR^[274]8 and
   DARNED^[275]42 was also used for editing analysis of lymphoblastoid
   cells PARS-seq^[276]27 in a similar manner.

Analysis of PARS-seq from lymphoblastoid cells

   Normalized read start count of V1 and S1 treatments for the samples
   from Wan et al.^[277]27 (NA12878 and NA12891, respectively) done on
   human lymphobalsotid cells were downloaded from GEO (GEO ID:
   [278]GSE50676). Raw reads (fastq files) for these samples were
   downloaded from SRA (SRA ID: SRP029656). Mapping, PARS-analysis and
   editing sites calling were performed as detailed above.

Prediction of RNA 2D structure and free energy

   RNA 2D predictions were performed for selected cases (as in the cases
   of IR Alu regions of PSMB2 and F11R).

   Average PARS scores from both biological replicates were calculated
   from all bases of the target regions. Bases with PARS score >2.5 were
   defined as DS and bases with PARS score <−2.5 were defined as SS. Bases
   with no significant trend were registered as unknown. The binary
   classifications were then used as constrains for RNAfold^[279]75
   (RNAfold –C) of the Vienna package^[280]75. In addition, editing sites
   which were significantly changed between control and KD (defined as DES
   by previous steps), covered by a total of more than 100 reads with
   editing level G/(A + G) > 0.1 on average in the control samples, were
   changed from A-to-G prior to modeling.

   Visualization of 2D RNA structures were done using RNAplot of Vienna
   package^[281]75. R-chie^[282]76 was used to draw the arc diagrams of
   the RNA 2D structures (as in Fig. [283]3).

   In order to compare structural ensembles in control and KD samples, we
   used RNAsubopt of Vienna package with 1000 random suboptimal structures
   using constrains derived from our PARS-seq experiment (RNAsubopt –p
   1000 –C).

   Selected examples (e.g., the F11R and PSMB2 IR Alu regions in
   Fig. [284]3) were analyzed additionally by SeqFold.

Comparison of PARS-seq signal to known RNA structures

   PARS score was calculated for each of the RNA molecules shown in
   Fig. [285]2 and Supplementary Fig. [286]1. Solved 3D structures were
   visualized using Jmol ([287]www.jmol.org) and colored by their PARS
   scores. Solved 3D structures were assigned to 2D structure using
   X3DNA/DSSR program^[288]77 (this was used in Supplementary
   Fig. [289]1). Known 2D structures were visualized using VARNA^[290]78
   and colored by their PARS score.

   In order to get the correlation heatmap in single-nucleotide resolution
   as in Fig. [291]2d, we first chose the 1000 genes with the highest
   coverage among PARS-seq samples. Next, we filtered for bases were read
   starts per base in all 8 PARS-seq samples together is >17 reads (>2.125
   read starts per base for each sample, on average). Next, we calculated
   the spearman correlation and plotted the resulted matrix.

Comparing PARS-seq signal between non-edited and edited reads

   Read starts information from reads containing the edited adenosine in
   control samples was registered. Similar information was register for
   reads covering the same site containing the non-edited version. Edited
   and non-edited profiles in control samples were compared using the
   CorDiff index (see above). Only regions with CorDiff >0 in both V1 and
   S1 for which the original CorDiff is higher than 95% of the random
   trials were considered as structurally changed between edited and
   non-edited versions (as in Supplementary Data [292]11). In order to
   draw the structural signature upstream to the edited adenosine, reads
   starts were normalized to the total read starts in that region. We
   averaged this normalized count and used it in order to calculate the
   relative PARS score in the upstream region to the edited adenosine.

Statistics

   All statistical tests were done using R^[293]79.

Code availability

   All R and Perl scripts used for the different bioinformatics assays can
   be obtained from the authors upon request.

Data availability

   The datasets generated during the current study are available in the
   GEO ([294]https://www.ncbi.nlm.nih.gov/geo/) repository, under GEO ID:
   [295]GSE100210. All raw sequencing data (fastq files) from this study
   were uploaded to GEO under GEO ID: [296]GSE100210. This includes also
   processed data of reads starts per base in genome coordinates (hg19,
   bigwig files) and in transcriptome coordinates (tab-delimited files).
   All PARS-seq alignment files (in BAM format, aligned to hg19 or to UCSC
   knownCanonical transcripts) can be downloaded from the following link:
   [297]http://bioinfo.lnx.biu.ac.il/downloads/Solomon_Paper/

Electronic supplementary material

   [298]Supplementary Information^ (1.4MB, pdf)
   [299]41467_2017_1458_MOESM2_ESM.pdf^ (177.8KB, pdf)

   Description of Additional Supplementary Files
   [300]Supplementary Data 1^ (3.1MB, txt)
   [301]Supplementary Data 2^ (131.2KB, txt)
   [302]Supplementary Data 3^ (4.2KB, txt)
   [303]Supplementary Data 4^ (12.7KB, xlsx)
   [304]Supplementary Data 5^ (108.2KB, xlsx)
   [305]Supplementary Data 6^ (345.8KB, xlsx)
   [306]Supplementary Data 7^ (33.1KB, xlsx)
   [307]Supplementary Data 8^ (21.3KB, xlsx)
   [308]Supplementary Data 9^ (11.9KB, xlsx)
   [309]Supplementary Data 10^ (173.5KB, xlsx)
   [310]Supplementary Data 11^ (14.5KB, xlsx)
   [311]Supplementary Data 12^ (665.4KB, xlsx)
   [312]Supplementary Data 13^ (20.2KB, xlsx)
   [313]Supplementary Data 14^ (2.8MB, xlsx)

Acknowledgements