Graphical abstract graphic file with name fx1.jpg [33]Open in a new tab Highlights * • IsomiRs are not a consequence of library artifacts in most RNA-seq protocols * • Randomized adapter protocols outperform fixed adapters in isomiR calling accuracy * • NTA-U calling is dependent on high-single-nucleotide-resolution performance * • NTA-U isomiR calling affects mRNA target prediction Motivation Sequence variants (isomiRs) of mature microRNAs can be detected by high-throughput sequencing methods. However, because isomiRs differ from the canonical (mature) miRNA sequence often by only a single nucleotide, most protocols introduce errors that may negatively influence biological interpretation. To investigate how sequencing protocols impact isomiR calling, we conducted a comprehensive comparison of 10 small RNA-seq protocols, providing a comprehensive reference guide for miRNA/isomiR interpretation by sequencing. __________________________________________________________________ Gómez-Martín et al. demonstrate that isomiRs are non-artifactual variants of miRNAs and that most small RNA sequencing protocols, especially those with randomized adapters, detect them with high accuracy. Single-nucleotide accuracy is crucial for making predictions about biological function. The work provides a reference for researchers interested in small ncRNA. Introduction MicroRNAs (miRNAs) are small non-coding RNA transcripts widely studied for their role in gene expression regulation, which they exert by imperfectly pairing to a target messenger RNA (mRNA).[34]^1 Expression profiles of the 2,600 reported human miRNAs over different tissues and pathophysiological conditions are a first step to understand the underlying biology and are useful for diagnostics[35]^2 and therapeutics.[36]^3 To generate comprehensive miRNA expression profiles, specific next-generation sequencing (NGS) protocols were developed. Initially, protocols were based on fixed (invariant) sequence adapters that introduced bias due to preferential ligation affinity to certain sequences, resulting in a high variability of less informative ncRNAs (Y-RNA, tRNA fragments).[37]^4^,[38]^5 To overcome this, recent sequencing protocols make use of random nucleotides at one or both sequencing adapters, thereby reducing detection bias of canonical miRNAs.[39]^5^,[40]^6^,[41]^7^,[42]^8 However, accurate detection of post-transcriptional modifications remains challenging, in particular for low-input applications such as extracellular miRNA profiling used for minimally invasive diagnostics. Early NGS studies revealed that a significant proportion of miRNA reads in biological samples deviate in length and sequence from the 20–21 nucleotide canonical miRNAs.[43]^9 Among these variants, termed “isomiRs,”[44]^10 one type is generated by post-transcriptional non-templated nucleotide additions (NTAs) by terminal nucleotidyl transferases.[45]^11 These enzymes add up to 3 nucleotides to the 3′ end, leading to slightly elongated mature miRNAs that will thus deviate from the genomic sequence. Functional implications of such post-transcriptional modifications are altered miRNA stability[46]^12 and/or different mRNA targets compared with the canonical sequence.[47]^13 A recent study describes that dysfunctional isomiRs accumulate in cancer cells, and that enzymatically restoring their canonical function may have therapeutic value.[48]^14 Indeed, modified miRNAs act as independent functional molecules that may support or counteract canonical miRNAs with diagnostic and prognostic implications. For instance, the targetome of one miR-411 isomiR that is upregulated in chronically ischemic human blood vessels has a very small overlap with the canonical miR-411 targetome.[49]^15 Due to their functional relevance, accurate discrimination between true isomiRs and sequencing or alignment artifacts is crucial to avoid misinterpretation of their biological function and diagnostic significance.[50]^16 Therefore we analyzed and compared the presence of isomiRs in a large body of sequencing data generated by independent laboratories using 10 different protocols to evaluate protocol differences in detecting true biological isomiRs. To assess comparable samples, we selected sequencing datasets from the miRXplore Universal Reference, a pool consisting of 963 different chemically synthesized miRNAs (thus theoretically free of isomiRs) and libraries obtained from the HEK293T cell line. To avoid any bioinformatics processing bias, all sample data were uniformly processed using sRNAbench,[51]^17 a broadly used miRNA analysis software that can also accommodate virtually any sequencing protocol. We also included samples from “IsoSeek,” our thoroughly optimized[52]^7 in-house 5N protocol that aims to account for all technical bias by using 5′ and 3′ randomized adapters, each of them with 5 random nucleotide unique molecular identifiers (UMIs) (5N) based on commercial adapters.[53]^18 Our results show that regardless of the adapter strategy, the majority of protocols generate only minor library preparation artifacts and detect biological variants with relevant overlap, mostly 3′ end NTAs. Moreover, 4N and 5N protocols appear to be highly suitable for isomiR profiling of biological samples, whereas some protocols suffer from biased quantification and/or generation of artificial isomiRs. Results Analysis of the 963-miRNA miRXplore reference pool reveals that the majority of small RNA-seq protocols generate low levels of false isomiRs A recent study cautioned that small RNA-seq protocols generate a large proportion of non-biological miRNA variants that should be considered as artifacts.[54]^8 We first scrutinized this claim by analyzing the canonical miRNA and isomiR profiles from 10 different protocols generated furthermore by different laboratories. Specifically, we analyzed data from a synthetic 963-miRNA reference pool (miRXplore Universal, Miltenyi Biotec) and HEK293T cellular RNA using commercial and non-commercial randomized-end adapter protocols (AQ-seq,[55]^6 4N-G and 4N-X,[56]^5^,[57]^6 NEXTflex, and our own 5N “IsoSeek” method[58]^7), one custom 2N adapter protocol (AQRNA-seq[59]^8), and three commercial invariant (fixed) adapter protocols (NEBNext,[60]^13^,[61]^19 TruSeq,[62]^20 QIAseq,[63]^21^,[64]^22 and Clean-Tag[65]^22). The main characteristics of the different protocols are summarized in [66]Table S1. Mapping statistics of the 963-miRNA pool revealed that 7 out of the 9 protocols showed 75%–80% recovery of exact matches, i.e., canonical sequences ([67]Figure 1A). AQRNA-seq and 4N-G protocols have a consistently lower percentage of canonical sequences but, in contrast, a higher percentage of 5′ end length variants (lv5p, see isomiR classification used in [68]Figure S1) than the other protocols. NTA isomiRs (see [69]Figure S1) should theoretically be absent in the reference pool as these samples were never in contact with nucleotidyl transferases. Indeed, this subclass is almost not detected by any protocol (∼1% for each NTA class), with the exception of NEBNext, which appears to generate the most artifacts (see also [70]Table 1A). However, more prominent were internal single mismatches (NucVar) that were present in ∼10%–12% of the miRNA-mapped reads in all protocols, except for 2 samples from the 4N-G protocol with only ∼4% of NucVar. This could be explained by a massive number of extensively trimmed 5′ length variants that drastically reduce the percentage of canonical sequences ([71]Figure 1A and [72]Table 1A). In fact, all other protocols have only ∼2% length variants ([73]Figure 1A and [74]Table 1A), often only lacking a single nucleotide at 3′ or rarely at the 5′ end ([75]Table 1A). Figure 1. [76]Figure 1 [77]Open in a new tab IsomiR distribution in 963-miRNA Mintenyi miRXplore Universal Reference pool libraries and HEK293T cells (A) IsomiR distribution in 62 libraries from the 963-miRNA Miltenyi miRXplore Universal Reference pool. A high percentage of canonical miRNAs (∼75%) is detected with almost all protocols, with the exception of 4N Giraldez in-house protocol (4N-G) and AQRNA-seq protocol that have a high percentage of 5′ end length variants and therefore a low percentage of canonical isoforms. The pool also contains a relatively large (∼10%) of NucVar isomiRs (only one mismatch compared to the canonical sequence), which was detected in all analyzed libraries. The height of each bar represents the mean percentage and the error bars the SD. (B) IsomiR distribution in 23 libraries from HEK293T cells. All small RNA sequencing protocols show ∼40% of canonical miRNAs. The most abundant isomiRs are lv3p (∼25%) and NTA-U/A (∼10% each one) and also NTA-C only in NEBNext samples (∼10%). The bars with a striped pattern indicate HEK293T cells with a TUT4/7 DKO. In all the protocols, a reduction of the NTA-U level in the DKO cells is observed. The height of each bar represents the mean percentage and the error bars the SD. Table 1. Percentage of non-templated addition (NTAs) isomiRs in 62 libraries from the 963-miRNA Miltenyi miRXplore Universal Reference pool and 12 libraries from HEK293T cells A 963-miRNA Miltenyi miRXplore Universal Reference __________________________________________________________________ NTA-U NTA-A NTA-C NTA-G TruSeq 0.67 0.53 0.26 0.69 CleanTag 0.61 0.39 0.24 0.39 NEBNext 2.63 2.97 2.99 1.62 AQRNA 1.25 1.43 1.27 0.5 QIAseq 0.41 0.39 0.15 0.67 NextFlex 0.81 0.6 0.19 0.34 IsoSeek 1.41 1.01 0.96 0.96 4N-X 1.19 1.08 0.95 0.44 4N-G 0.85 0.96 0.32 0.67 __________________________________________________________________ B HEK293T Cells __________________________________________________________________ NTA-U NTA-A NTA-C NTA-G NEBNext 7.4 8.41 2.13 7.21 QIAseq 5.03 2.75 0.1 0.35 AQ-seq 7.48 11.16 0.46 0.67 IsoSeek 7.16 4.45 0.68 0.51 [78]Open in a new tab A very low percentage (∼1%) of all NTAs is observed in the pool, with the highest value from the samples of the NEBNext protocol (∼3%). In contrast, a high percentage of NTA-U/A is observed in the HEK293T (∼15% with the exception of QIAseq protocol), while NTA-C/G (with no biological function underlined) percentages remain low. Only one fixed adapter protocol (NEBNext) showed a more significant higher percentage of NTA-C/G isomiRs. In contrast to the reference pool, the analysis of the isomiR distribution of the 23 HEK293T libraries revealed a different pattern. All protocols showed that ∼40%–50% of the miRNA-mapped reads represented the exact mature miRNA sequence (canonical) as annotated in miRBase ([79]Figure 1B). The most prominent isomiR class is the 3′ end length variants (lv3p), which are presumably generated by “sloppy” Drosha or Dicer cleavage or the action of exonucleases.[80]^1 Finally, approximately 10%–20% of the miRNA reads belong to NTA isomiRs, similarly to previously found in other studies.[81]^11 It was unexpected that the percentage of NucVar isomiRs was 3- to 4-fold smaller in cells (∼4%) compared with the reference pool (∼12%). This could indicate that, rather than artifacts of library preparation, artificial NucVar isomiRs represent errors already introduced during oligosynthesis of the 963-miRNAs pool.[82]^23 To investigate this possibility, we measured the presence of artificial NucVar isomiRs in a customized panel of 26 synthetic isomiRs, and we then compared the resulting percentage to the 963-miRNA pool results ([83]Figure S2A). Interestingly, we found a significantly lower percentage of NucVar in our customized spike-in set. Taking into account that our spike-in set was generated with the highest quality control standards, this suggests the presence of oligosynthesis errors in the commercial 963-miRNA pool. Moreover, we directly compared the performance of IsoSeek with the fixed adapter NEBNext protocol using the same 26 synthetic isomiRs that were sequenced without any biological background ([84]Figure S2B). The distribution of the different isomiRs was considerably more equal in the IsoSeek samples, with 0.88 coefficient of variation (CoV), compared with NEBNext with 1.82 CoV, indicating a major reduction in bias. Analysis of length variant isomiRs in the 963-miRNA set and HEK293T cells reveals 3′ end length variants outnumber 5′ end length variants We next performed an in-depth analysis of the four different length variant classes, i.e., extended or trimmed at the 5′ or 3′ end ([85]Figure S1). As can be observed in [86]Table 1A, the majority of the protocols showed that samples from the 963-miRNA pool have a relatively low percentage of length variants (∼5%). This contrasts with the HEK293T cells ([87]Table 1B) where the percentage of length variants is much (6-fold) higher (∼30%), suggesting a biological origin. In fact, in the case of the 963-miRNA pool samples, the most prominent length variants are at the 5′ end, while the HEK293T cells had a high presence of lv3p isomiRs, which are consistent with “sloppy” Drosha or Dicer cleavage or the action of exonucleases.[88]^12 Therefore, most 5′ truncated miRNAs (lv5pT) and other 5′ length variants are very likely due to library preparation artifacts. A high proportion of the NucVar in HEK293T cell isomiRs point to biological origin We next looked in more detail into the different classes of single-nucleotide changes (NucVar, see [89]Figure S1) in HEK293T cells. A high percentage of T>C and A>G single-nucleotide changes (in red) were consistently detected among the 4 protocols (NEBNext, QIAseq, AQ-seq, and IsoSeek) compared to all the other potential changes, which showed a much lower percentage ([90]Figure 2A). Figure 2. [91]Figure 2 [92]Open in a new tab NucVar isomiRs profiling in HEK293T cells compared to the 963-miRNA pool (A) Percentage of HEK293T different NucVar isomiRs. The most prominent classes were T>C and A>G changes. The height of each bar represents the mean percentage of each NucVar class of the total miRNA read counts of 4 different protocols (IsoSeek, QIAseq, AQ-seq, NEBNext, n = 20), and the error bar represents the SD. The mean line represents the mean percentage of all NucVar classes in all protocols. (B and C) Percentage of T>C and A>G modifications respectively compared to the total number of reads mapping to NucVar in both HEK293T cells and the 963-miRNA pool samples. In all the comparisons, the percentage of the variants in the cells was statistically significantly higher than in the 963-miRNA pool. The height of the bars represents the mean percentage and the error bars the SD. We then compared the percentage of these two NucVar types, T>C ([93]Figure 2B) and A>G ([94]Figure 2C), in HEK293T cells to the percentage in the 963-miRNA pool samples sequenced with the same protocol. The percentage of both variants in the HEK293T cells was significantly higher in both cases (p value < 0.01), regardless of the sequencing protocol used. Interestingly, these two variants specifically may be caused by enzymatic activity. Specifically, the A>G conversion can be a result of the ADAR enzyme activity[95]^24^,[96]^25 and, in the case of T>C, a result of APOBEC3C action.[97]^26 4 different small RNA protocols robustly detect NTA-U/A isomiRs in HEK293T cells that are absent in the artificial reference pool We observed that the percentage of NTA isomiRs in the 963-miRNA pool was consistently lower (∼1% per NTA class) than the percentage in HEK293T cells (∼20% total NTA), as shown in [98]Figure 1. We then looked deeply into the different NTA isomiRs, i.e., NTA-A, NTA-U, NTA-C, and NTA-G ([99]Table 2). In HEK293T cells, approximately 10%–30% of the total miRNA reads belonged to NTA isomiRs, similar to previous studies,[100]^11 which is up to 20-fold higher than in the reference pool libraries. The majority are NTA-U and NTA-A isomiRs, while NTA-C and NTA-G are hardly detected using AQ-seq and IsoSeek ([101]Figure 1B and [102]Table 2B). Moreover, in addition to a lower NTA percentage detected, a more equal and possibly random distribution of the percentage of the different NTA classes was observed ([103]Table 2A). Overall, randomized adapter protocols reveal the more equal percentage of NTA distribution and the lowest percentage of artificial NTA in the 963-miRNA pool samples in agreement with prior observations.[104]^11 Despite minor differences, we conclude that the majority of NTA-A and NTA-U classes that are detected in HEK293T cells represent true biological isomiRs. Table 2. Mean percentage of the different length variants isomiRs in 62 libraries from the 963-miRNA Miltenyi miRXplore Universal Reference pool and 12 libraries from HEK293T cells A 963-miRNA Miltenyi miRXplore Universal Reference __________________________________________________________________ lv5pT lv5pE lv3pT lv3pE TruSeq 3.77 0.03 1.31 0.22 CleanTag 4.55 0.05 0.93 0.28 NEBNext 2.67 0.03 7.08 0.13 AQRNA 26.25 3.47 9.71 0.37 QIAseq 3.74 0.05 0.98 0.28 NextFlex 6.11 0.08 1.02 0.32 IsoSeek 5.89 0.08 5.09 0.69 4N-X 9.64 0.24 3.77 0.44 4N-G 52.77 0.15 2.21 0.24 __________________________________________________________________ B HEK293T Cells __________________________________________________________________ lv5pT lv5pE lv3pT lv3pE NEBNext 2.14 0.53 9.86 11.9 QIAseq 3 0.19 15.04 11.77 AQ-seq 4.86 1.23 16.6 18.01 IsoSeek 0.59 2.93 19.89 13.49 [105]Open in a new tab A low percentage (∼10%) of length variants is observed in the pool in all protocols except the 4N in-house protocol from Giraldez et al.[106]^5 (4N-G) and AQRNA-seq protocol. Both these protocols seem to create a high percentage of 5′ trimmed length variants. In contrast, the length variants observed in the HEK293T cells are mainly at the 3′ end. The color intensity of each cell is relative to its value. TUT4/7 knockout HEK293 cells reveal concordance and differences between protocols in the detection of uridylated isomiRs (NTA-U) To assign specificity of the protocols for biological isomiR detection, we studied the biological activity of nucleotidyl (uridyl) transferases TUT4 and TUT7 in HEK293T CRISPR-Cas-mediated double knockout (DKO) HEK293T cells. TUT4 and TUT7 enzymes synthetize the addition of uridine nucleotides at the 3′ end of miRNAs.[107]^11 To fully establish that small RNA sequencing data obtained by various protocols contain biologically generated isomiRs, and because uridylation may be cell type specific,[108]^27 we analyzed 11 libraries from TUT4/7 DKO HEK293T cells[109]^13 prepared with 4 different protocols in 6 different laboratories (see striped bars in [110]Figure 1B). In agreement with knockdown studies using a NanoString-based detection in a different cell type,[111]^11 all small RNA sequencing protocols showed reduced percentages of NTA-U in the TUT4/7 DKO cell lines compared with wild-type (WT) counterparts ([112]Figure 1B). To establish the reproducibility of NTA-U isomiR detection by small RNA sequencing protocols, we assessed the detection of miRNAs with an uridylation ratio drop of at least 2-fold in TUT4/7 DKO HEK293T cells compared to the parental cells. We used data from 6 independent studies that employed 4 different protocols (NEBNext, QIAseq, AQ-seq, and IsoSeek). We identified 20 miRNAs (6% of all differentially uridylated miRNAs by at least 1 of the protocols) that were consistently detected as differentially uridylated due to the lack of TUT4/7 enzymes in all studies ([113]Figure 3). We validated this overlap by comparison to a random selection from the total list of differentially uridylated miRNAs (see more details in [114]STAR Methods), yielding a highly significant Z score of 28.5. These analyses indicate that the observed differences in NTA-Us between WT and DKO cell lines with multiple protocols are not random but are driven by the depletion of TUT4/7 enzymes and thus biologically motivated. Figure 3. [115]Figure 3 [116]Open in a new tab Differentially uridylated miRNAs in TUT4/7 DKO HEK293T cells vs. WT from 6 different studies A total of 20 miRNAs (red-highlighted bar) were found in common between the 6 different studies from 4 different protocols (O/E ratio = 20, Z score = 28.5). This represents 6% of the total of differentially uridylated miRNAs (See pie plot, black section). The inset box presents a table with the top 5 (with the highest fold-change [FC]) differentially uridylated miRNAs. Differences across protocols in detection of TUT4/7-dependent miRNA uridylation In order to estimate potential biases in the detection of NTA isomiRs, we performed a deep analysis into the NTA class distribution in parental and TUT4/7 DKO HEK293T cells. Using AQ-seq, IsoSeek, and QIAseq, we observed an overall reduction of NTA-U of 25%, 24%, and 34% respectively in TUT4/7 DKO cells as compared to parental cells ([117]Figure 4A). However, with the fixed adapter protocol NEBNext, this reduction was only 4% ([118]Figure 4A). Interestingly, using randomized adapter protocols, AQ-seq and IsoSeek, the global NTA-U reduction observed in DKO cells (∼24%) was complemented by a similar increase of the NTA-A fraction (∼23%). This phenomenon was recently described in cancer cells,[119]^27 and this subtle but significant correlation was not seen in the fixed adapter protocols ([120]Figure 4A). On the other hand, the QIAseq protocol detected a significant decrease on uridylation but not a clear increase in NTA-A isomiRs. Instead, NTA-C isomiRs were increased. Taking into account that such isomiRs have not been described as enzymatically driven, we cannot rule out the possibility of technical artifacts presence. Figure 4. [121]Figure 4 [122]Open in a new tab Differences in miRNA uridylation detection between protocols (A) NTA isomiRs distribution in 4 different protocols (NEBNext, QIAseq, AQ-seq, and IsoSeek) in both WT and TUT4/7 DKO HEK293T cells. (B–D) Difference of percentage of uridylation and adenylation, guadinylation, and cytidylation, respectively, between the HEK293T parental cells and TUT4/7 DKO cells of the differentially uridylated miRNAs according to IsoSeek. (E) Percentage of uridylation of miRNAs derived from the 3p arm (purple) or the 5p arm (green) in HEK293T WT and TUT4/7 DKO cells after library preparation using NEBNext, QIAseq, AQ-seq, and IsoSeek. We next investigated those miRNAs that displayed a clear difference in uridylation percentage in the DKO compared with the WT cells based on IsoSeek protocol. In most of the miRNAs where 3′ end uridylation was reduced in the DKO, we observed an increase in 3′ end adenylation ([123]Figure 4B). However, this was not the case for guadinylation and cytidylation ([124]Figures 4C and 4D respectively). Taken together, these results suggest that a specific increase in adenylation of selected miRNAs in DKO cells occurs in the absence of urydilation. These observations indicate that 3′ end NTA modification is a non-random biological process caused by competing NTA (uridylase vs. adynelase) nuclease activities. We also evaluated the uridylation of 3p and 5p arm-derived miRNAs and observed that TUT4/7-dependent uridylation occurs predominantly on 3p miRNAs. This preference is especially pronounced when using randomized adapter protocols with reduced bias while being much less evident with the NEBNext fixed adapter protocol ([125]Figure 4E). Biased detection of miRNA uridylation can affect target prediction To investigate whether differences in isomiR detection between protocols influence target prediction strategies,[126]^13 we compared the top 10 most uridylated miRNAs identified per protocol ([127]Figure 5A). miRNA hsa-miR-30e-3p, which has a role in cancer,[128]^28^,[129]^29 is heavily mono-uridylated according to randomized adapter protocols, while with NEBNext, the exact mature sequence is more abundant ([130]Figure 5B). Importantly, only IsoSeek and QIAseq protocols show a clear decrease in uridylation in TUT4/7 DKO cells and an increase in adenylation ([131]Figure 5B). Figure 5. [132]Figure 5 [133]Open in a new tab Biased detection of miRNA uridylation may negatively affect target prediction (A) Top 10 most uridylated miRNAs in HEK293T cells with the highest percentage of uridylation upon use of IsoSeek compared with NEBNext, AQ-seq, and QIAseq. (B) Percentage of canonical, mono-uridylated, and mono-adenylated miR-30e-3p in HEK293T libraries using NEBNext, QIAseq, AQ-seq and IsoSeek protocols. TUT4/7 DKO cells are represented with striped bars. (C) Target prediction of mature miR-30e-3p using the canonical seed sequence (red) or the mono-uridylated tail-U based targeting (TUMR, green) and its overlap. (D and E) Possible 3′ UTR targeting of miR-30e-3p. (D) shows the mature miR-30e-3p sequence and a potential targeting based on its seed sequence. Below are the significantly enriched Panther pathways in the target genes of the canonical isoform. (E) shows an alternative tail-U-based targeting of mono-uridylated miR-30e-3p and the significantly enriched Panther pathways of the TUMR targets. Having identified a biological isomiR, we performed target prediction for both the canonical hsa-miR-30e-3p and the 3′ end mono-uridylated isoform. We observed many unique potential target genes of both the canonical miR-30e-3p and the mono-uridylated isomiR. This is in agreement with the recently described tail-U-mediated repression (TUMR) as described by Yang et al.[134]^13 In fact, we found many more isomiR unique targets than overlapping targets ([135]Figure 5C). These observations may have critical implications for the biological function of these closely related isomiRs with a highly divergent targetome. Indeed, pathway enrichment analysis of the canonical targetome ([136]Figure 5D) and the TUMR targetome ([137]Figure 5E) showed that different pathways are enriched, leading to physiological changes. This analysis exemplifies the potential consequences of inaccurate isomiR detection derived from suboptimal choice of a sequencing protocol. Discussion The advent of NGS has revealed that miRNAs, apart from the 21/22 nucleotide canonical (mature) genomic miRNA sequences, are comprised of many variants known as isomiRs. We show here that distinguishing true isomiRs from technical artifacts and sequencing errors is of high importance, as it may result in misinterpretation of their biological function (i.e., targetome),[138]^13 and diagnostic[139]^16 and potential therapeutic significance.[140]^14 To demonstrate this, we thoroughly compared sequencing data from HEK293T cells and a 963-synthetic miRNA pool that should, theoretically, be free of any isomiRs. 95 samples were analyzed and sequenced with in total 10 different protocols including widely used fixed (or invariant) commercial adapters and newer protocols that make use of randomized adapters and other customized library preparation procedures. We report several novel findings, some of special relevance when studying the biology of isomiRs. First, when benchmarking protocols using a standard reference set, it is of key importance that the quality of all oligos (reference sets, primers, and adapters) is extremely high, as synthesis errors negatively influence the results ([141]Figures 1 and [142]S2 and [143]Tables 1 and [144]2). Second, some protocols may generate high numbers of library preparation artifacts ([145]Figure 1 and [146]Tables 1 and [147]2). Third, randomization of adapter sequences at the 3′ and 5′ ends strongly reduces ligation bias. Fourth, UMI-correction can reduce amplification bias but appears less essential when RNA input is not limiting. Fifth, to determine the biological relevance of isomiRs, analysis of control cells that lack relevant NTA-modifying enzymes is advised. Sixth, we determined higher percentages of T>C and A>G NucVar changes in cells when compared with the reference pool. This is in line with a biological role as these modifications are the only mutations known caused by RNA-editing enzymes.[148]^24^,[149]^25^,[150]^26 Seventh, uridylation on 3p-arm miRNAs is much more frequent than 5p miRNAs ([151]Figure 4E), confirming the idea that uridylation generally occurs after Drosha cleavage but before Dicer processing.[152]^30 Finally, only randomized adapter protocols detect NTA miRNA substrate competition ([153]Figure 4), a recently discovered biological phenomenon in cancer cells.[154]^27 What is the current experimental evidence for a biological role of isomiRs? In cartilage, a miR-140-3p isomiR is the dominant and active form, with a completely different seed and targetome.[155]^31 In addition, there is evidence that shRNA-induced liver toxicity is solely related to competition with a major isoform of miR-122.[156]^32 Moreover, some classes of isomiRs are preferentially sorted into communicating extracellular vesicles,[157]^33^,[158]^34 which could have important biological and diagnostic implications. Most strikingly, Qi et al. recently reported that expression of a plant nucleotidyl-transferase (RDR1) in human cancer cell lines blocks proliferation by targeting cell cycle genes and that these effects were dependent on the miRNA pathway.[159]^14 It appeared that miRNA duplex isoforms with 1-nt-shorter 3′ ends cannot be efficiently loaded onto the Argonaute complex and accumulate in human tumors. RDR1 expression restored the regular 2-nt overhang structure of miRNA duplexes, thereby rescuing the defective miRNA pathway in mouse xenograft models, which suppressed tumor growth.[160]^14 In another study, an isomiR sequence of miR-411 was 5-fold more abundant than the canonical form in primary human vascular cells and in venous tissue samples from patients with peripheral artery disease. The authors show that isomiR-411 and canonical miR-411 expression is differentially regulated under ischemic conditions in a murine hindlimb ischemia model.[161]^15 Finally, isomiRs may act as cancer biomarkers and function as either allies or antagonists of their canonical counterparts.[162]^35 By using cell, and not tissue, data, McCall et al. observed many cell-specific differences in the isomiR composition of miRNAs.[163]^36 Notably of 205 common miRNAs, the most abundant sequence did not match the reported sequence in [164]mirBase.org.[165]^36 Since a large body of studies shows that isomiRs have a biological function, it is thus critical to choose reliable library protocols and mapping strategies. We observed that 4-5N randomized adapters approaches resulted in the most unbiased protocols, showing the lowest amount of NTA percentage in the 963-miRNA reference pool samples. This is in agreement with previous observations only considering the exact miRNA sequences.[166]^11 IsoSeek and AQ-seq detect profound global changes in the miRNA uridylome in HEK293T cells, while this was not observed using the fixed adapter NEBNext protocol. Nevertheless, all protocols determined that hsa-miR-760, which has been correlated with breast cancer progression,[167]^37 is a heavily uridylated miRNA ([168]Figure 5A). Importantly, exploiting CRISPR-Cas9-generated TUT4/7 DKO cells, we could identify miRNAs that are specifically uridylated by the combined action of these TuTases. We identified with multiple protocols that at least 20 miRNAs (6%) are selectively uridylated by these enzymes ([169]Figure 3), a biologically relevant observation that was also highly significant (Z score = 28.5). NTAs by TUT4 and TUT7 triggers “arm-switching,” changing the repressive activity of modified miRNAs.[170]^30 The TUMR is abolished in cells lacking the uridylation enzymes TUT4 and TUT7.[171]^13 We show that the canonical and mono-uridylated miR-30e-3p targetome differ dramatically ([172]Figure 5), which highlights that the correct interpretation of this biological phenomenon relies on appropriate protocol selection. In summary, this study unveils the importance of an appropriate sequencing method for isomiR analysis. Randomized adapter-based protocols outperformed those with fixed adapters on this end. This is of crucial importance for the detection of non-templated additions (especially NTA-Us and NTA-As), which are miRNA modifications of high biological relevance that are underestimated or overestimated by certain protocols due to their different biases. Accurate profiling of isomiRs is mandatory to achieve detailed interpretations of miRNA biological functions. Given the growing body of evidence for the functional importance of certain classes of isomiRs,[173]^13^,[174]^14^,[175]^15^,[176]^19^,[177]^31^,[178]^32^,[17 9]^33^,[180]^34^,[181]^35^,[182]^36 we propose that miRNA expression profiling with sequencing is ideally analyzed at the individual isomiR level, rather than aggregating all functional variants and non-functional sequences into one miRNA read count number. Limitations of the study The main limitation of the study is that, despite conducting a comprehensive comparison of several available miRNA-seq protocols, other less-known or future protocols not included in this study have the potential to outperform those described here. However, by following the same analysis pipeline described here, any other potential protocol should be easily compared with the others. It is important to note that the choice of the bioinformatics pipeline is crucial, especially for isomiR analysis, and that the analysis with a different bioinformatics pipeline can lead to different results. In this study, we used the widely used and validated miRNA analysis software sRNAbench, which is also optimized for isomiR profiling. STAR★Methods Key resources table REAGENT or RESOURCE SOURCE IDENTIFIER Deposited data __________________________________________________________________ miRXplore™ Universal reference pool, sequenced with IsoSeek protocol, this paper SRA PRJNA867189 miRXplore™ Universal reference pool, generated by Tewari[183]^5 SRA SRP098948 miRXplore™ Universal reference pool, generated by Valihrach[184]^38 SRA SRP258941 miRXplore™ Universal reference pool, generated by Guan[185]^20 SRA SRP245378 miRXplore™ Universal reference pool, generated by Cao[186]^8 SRA SRP228584 miRXplore™ Universal reference pool, generated by Hackl[187]^22 SRA SRP348989 HEK293T cells WT and DKO for TUT4-7 enzymes, sequenced with IsoSeek protocol, this paper SRA PRJNA867189 HEK293T cells WT and DKO for TUT4-7 enzymes by Kim[188]^6 SRA SRP173274 HEK293T cells WT and DKO for TUT4-7 enzymes by Gu[189]^19 SRA SRP227370 HEK293T cells WT and DKO for TUT4-7 enzymes by Gu[190]^13 SRA SRP165868 HEK293T cells WT and DKO for TUT4-7 enzymes by Gu[191]^21 SRA SRP338096 [192]Open in a new tab Resource availability Lead contact Further information and requests for resources should be directed to and will be fulfilled by the lead contact, D. Michiel Pegtel ([193]d.pegtel@amsterdamumc.nl). Materials availability This study did not generate new unique reagents. HEK293T WT and TUT4/7 DKO cells were a kind gift from Dr. S. Gu group. Experimental model and subject details HEK293T cells HEK293T WT and TUT4/7 DKO cells, a kind gift from Dr. S. Gu, were cultured in DMEM (Gibco), supplemented with 10% FBS (Life Science Group), 100 U/mL penicillin G, 100 μg/mL streptomycin and 1x MEM non-essential amino acids (Thermo Fisher Scientific). 5N adapters and isomiR spike-ins All adapters and spike-ins were synthesized by and purchased from Eurogentec. The 5′- and -3′′-adapter sequences are based on the adapters from the NEBNext Multiplex Small RNA Library Prep Kit for Illumina (New England Biolabs) with the addition of 5 random nucleotides (5N) as described by van Eijndhoven et al.[194]^7 Method details Small RNA library preparation and sequencing All small RNA libraries were prepared using the NEBNext Multiplex Small RNA Library Prep Kit for Illumina with fixed NEBNext adapters from the kit or our custom designed 5′- and -3′′-5N-adapters (IsoSeek).[195]^7 Libraries from the Miltenyi miRXplore Universal pool were prepared from 5fmol to 200 ng total cellular RNA using both NEBNext and Isoseek protocols. Both adapter-sets and the RT primers were 1:2 diluted (5′-adapters 5.63 μM, 3′-adapters 2.5 μM). RNA isolation Total RNA from cell lines was isolated using TRIzol reagent (Thermo Fisher Scientific) according to the manufacturers' protocol. Processing of sequencing data and isomiR profiling Pre-processing, mapping of adapter trimmed reads and isomiR classification (as stated in [196]Figure S1) were performed using the latest version of sRNAbench[197]^17 command line tool. Default parameters were used for all analysis steps and miRBase[198]^39 was used as miRNA reference. In the case of Isoseek and NEBNext samples that were sequenced by our group, quality control was carried out using mirnaQC.[199]^40 IsomiR classification was made using a hierarchical classification schema (see [200]Figure S1). That way all reads assigned to a given miRNA were checked if they belong to one of the following classes. * 1. Canonical miRNA: The sequence is exactly the canonical sequence in miRbase.[201]^39 * 2. NucVar: The read starts and ends at the same position as the canonical sequence but shows internal sequence variation. * 3. NTA: The read has Non-Templated Additions, those meaning extra A, T(U), C or G at the 3′ end. * 4. Length variants: The read starts (lv3p) or ends (lv5p) at the same position as the canonical miRNA but it has been extended or shortened at one of its ends. * 5. Multiple length variants (mv): The read doesn’t start or end as the canonical sequence Prediction of canonical and TUMR miR-30e-3p targets Canonical target sites of miR-30e-3p were calculated using TargetScan version 8.0[202]^41 whereas conserved TUMR targets were obtained as previously described,[203]^13 adapting the script provided by the authors. The analysis was performed on a subset of 23 species where miR-30e-3p is fully conserved, which corresponded to 652096 3′-UTRs sequences from the TargetScan database. TUMR targets were searched in these 3′-UTRs sequences using base-pairing with up to 3 G:U wobble pairs. TUMR targets that were not conserved in at least 15 of the 23 studied species were not further considered. Data visualization ‘UpSetR’[204]^42 R package was used in [205]Figure 3 upset plot together with ‘ggplot2’. Quantification and statistical analysis Comparison of the T>C and A>G modifications percentage between the HEK293T WT cells and the 963-miRNA pool. For the statistical comparison of the percentage distribution of these modifications between the cells and the synthetic pool, a T-test to compare the means was performed, and only q-value lower than 0.05 was showed. miRXplore Universal reference pool: NEBnext, n = 9 (number of samples); QIAseq, n = 8; AQ-seq, n = 7; IsoSeek, n = 3. HEK293T cells: NEBnext, n = 15; QIAseq, n = 2; AQ-seq, n = 2; IsoSeek, n = 2. Differential expression and statistical analysis of the intersection Differential expression analysis in each study shown in [206]Figure 3 were computed by means of a t-test and the resulting p value was corrected for multiple testing using Benjamini-Hochberg’s FDR correction (q-value <0.05 to be considered as differentially uridylated). Only miRNAs with at least 1.5 times more expression in the WT than the DKO are considered as differentially uridylated and therefore shown in [207]Figure 3. To test if the observed number of commonly TUT dependent uridylated miRNAs is statistically significant, we randomly pick 1000 times the same number of miRNAs from the total of differentially uridylated miRNAs for each of the 6 studies. For each randomization we determine the intersection of the 6 sets. Thus, we obtain the expected value (mean of the randomization runs) and its standard deviation which allows us to calculate the Z score, i.e. the number of standard deviations the observed value is separated from the expectation. Gómez-Martín et al., 2023, NEBnext, n = 2 (number of samples); Yang, Shao et al. 2020, n = 3; Yang,Bofill-De Ros et al., 2019, n = 3; Gómez-Martín et al. 2023, IsoSeek, n = 2; Yang et al. 2022, n = 4; Kim et al., 2019, n = 3. Pathway enrichment analysis Panther Pathway enrichment analysis was performed by means of PantherDB[208]^43 webserver using as a reference all genes expressed by HEK293T cells as according to Human Protein Atlas database ([209]https://proteinatlas.org).[210]^44 An FDR corrected p value of 0.05 was used as cut-off. Acknowledgments