Abstract

   Library preparation is a key step in sequencing. For RNA sequencing
   there are advantages to both strand specificity and working with minute
   starting material, yet until recently there was no kit available
   enabling both. The Illumina TruSeq stranded mRNA Sample Preparation kit
   (TruSeq) requires abundant starting material while the Takara
   Bio SMART-Seq v4 Ultra Low Input RNA kit (V4) sacrifices strand
   specificity. The SMARTer Stranded Total RNA-Seq Kit v2 - Pico Input
   Mammalian (Pico) by Takara Bio claims to overcome these limitations.
   Comparative evaluation of these kits is important for selecting the
   appropriate protocol. We compared the three kits in a realistic
   differential expression analysis. We prepared and sequenced samples
   from two experimental conditions of biological interest with each of
   the three kits. We report differences between the kits at the level of
   differential gene expression; for example, the Pico kit results in 55%
   fewer differentially expressed genes than TruSeq. Nevertheless, the
   agreement of the observed enriched pathways suggests that comparable
   functional results can be obtained. In summary we conclude that the
   Pico kit sufficiently reproduces the results of the other kits at the
   level of pathway analysis while providing a combination of options that
   is not available in the other kits.

   Subject terms: Biotechnology, Computational biology and bioinformatics

Introduction

   RNA-Sequencing (RNA-Seq) analysis has become the de facto method for
   measuring gene expression genome wide. However, when designing an
   experiment, the investigator is faced with the task of making many
   decisions, including choice of platform and library preparation
   protocol, which can involve considerable trade-offs. We focus here on
   the Illumina and Takara Bio kits and investigate the differential
   effect of library prep protocol. The most widely used protocol involves
   the TruSeq (Illumina, catalog no. RS-122-2103) kit, which assumes
   abundant starting material (0.1–1 μg total RNA) and maintains strand
   specificity. Strand information is vital to the analysis of many
   studies^[44]1–[45]4. Indeed many genes undergo anti-sense
   transcription, which serves a regulatory purpose and has also been
   associated with disease^[46]5. In a standard treatment/control
   experiment, an important segment of signal comes from the anti-sense
   strand of annotated RNA and it is not uncommon to find genes with more
   anti-sense signal than sense. If data generated are not strand
   specific, then all such reads will get quantified as “sense” signal.

   In general, since one can obtain more accurate sense expression from
   strand specific data and one cannot obtain any anti-sense information
   from non-strand specific data, it is considered preferable to generate
   strand specific data. However, until recently it has not been possible
   to perform strand-specific sequencing on samples that require a high
   level of amplification from quantities below 10 ng total RNA. This
   situation has now been addressed in the new Pico kit (Takara Bio,
   catalog no. 634413). Different kits produce data with different biases.
   For example the act of removing the ribosomal RNA (rRNA) has a dramatic
   effect on the results; the polyA selection approach tends to result in
   a 3′ coverage bias, while the approach of hybridizing the rRNA out with
   beads results in a very different bias^[47]6. Therefore, the question
   naturally arises as to whether the same general results will be
   obtained from using two different kits on the same samples. To
   investigate this, we compared the TruSeq, Pico and V4 kits in a
   realistic differential expression (DE) analysis, using both abundant
   and minute quantities of starting material. The V4 kit (Takara Bio,
   catalog no. 634888) is the kit typically used for minute quantities
   which does not preserve strand specificity.

   Most comparative analyses of RNA-Seq methods, both wet bench and
   computational, are on samples that are too artificial to draw
   meaningful conclusions about performance in practice. For example, many
   studies^[48]7–[49]10 used the universal reference samples^[50]11 to
   benchmark differential expression (DE) analysis. Yet the comparison of
   technical replicates of two completely different reference samples is
   not realistic enough to draw practical conclusions, as we expect almost
   any expressed gene to be differentially expressed. Furthermore,
   technical replicates guarantee the distribution of each gene’s
   expression is independent from all other genes – which is in stark
   contrast to real data where a small minority of genes are
   differentially expressed, with an extremely complex background of
   non-DE genes involving population wide, highly dependent distributions.

   A few recent studies assess the quality of libraries produced with low
   starting material using a variety of kits. Palomares et al. evaluate
   the quality of libraries produced by different input amounts, and
   suggest that the non-stranded V4 kit performs well for low input
   quantities, compared to the TruSeq kit^[51]12. Similarly, Song et al.
   suggest that the TruSeq and V4 kits perform well for low starting
   material, while they also suggest the Pico kit if polyA and non-polyA
   mRNA are of interest^[52]13. However, neither looked at the ability of
   the Pico kit to produce stranded data with low input material, which is
   its key advantage.

   Here we designed a standard treatment/control experiment to assess the
   differences between the three kits in a typical differential expression
   analysis. Specifically we examined the hepatic inflammatory response of
   mice by assaying liver RNA from saline (control) and IL-1β treated
   mice^[53]6.

   There are several levels on which the data can be compared. First is
   the level of the raw data itself (alignment statistics, quality scores,
   quantified values, ribosomal content, etc.). The second level is to
   compare the results of a DE analysis of genes or gene features such as
   exons, introns or junctions. The third level on which to compare
   methods is on pathway enrichment of DE genes. There might be
   significant discordance at the level of alignment and even the specific
   lists of DE genes, but if they result in the same pathway enrichment
   results then it is likely the same conclusions will be drawn from each
   analysis; and indeed, this is what was observed in this study. The
   greatest differences between the kits were identified at the raw data
   and DE levels, however for the most part no major differences were
   identified at the pathway level.

   Although the pathway analyses were similar, one would still like to
   understand the source of the differences at the levels of raw data and
   DE. Therefore, we PCR validated some of the most discordant genes in
   order to determine which kit was more accurate.

Results

   Here we give the details of the comparative analysis of the three kits
   Pico, TruSeq and V4, on the levels of raw sequencing libraries,
   differential expression detection, and pathway enrichment analysis.

Gene expression evaluation

   As a standard treatment/control experiment, we sequenced three liver
   RNA samples from mice treated with saline and three treated with
   20 μg/Kg of IL-1β. We analyzed RNA from each sample with all three
   kits, and used 1.7–2.6 ng of RNA with the Pico kit, 0.8–1.3 ng with the
   V4, and 200 ng with the TruSeq. We followed the standard steps of a
   typical RNA-Seq analysis. Data were aligned, normalized, quantified,
   and Differential Expression p-values were computed (see Methods). The
   library prep, sequencing and analysis was repeated twice for the Pico
   kit. Additionally, we had sequenced the same samples with the TruSeq
   kit for a previous study, allowing us to compare the two TruSeq runs as
   control for batch effects when comparing TruSeq to Pico and V4.

   More than 90% of raw reads uniquely mapped to the reference genome for
   all three kits (Fig. [54]1a), which is reasonably high. Since rRNA is
   roughly 99% of total RNA, all kits necessarily perform ribosomal
   depletion. This typically removes more than 90% of rRNA, depending on
   tissue and other factors. Nevertheless, in the Pico kit ribosomal reads
   were retained up to 40–50% (Fig. [55]1b), while in the TruSeq kit it is
   ~7%. In their comparative study of ribosomal RNA removal kits, Herbert
   et al. report ~15% rRNA retention with the Pico kit^[56]14. Given the
   discrepancy we validated our findings in an independent library
   preparation and sequencing run. However, it is problematic to compare
   to Herbert et al., since they used very different samples - a mixture
   of 10 different cancer cell lines as compared to mouse liver - and used
   very different criteria for calling a read as ribosomal. They used a
   BWA alignment to the human genome build hg19 which unfortunately is
   missing half of 45 S, while we constructed a comprehensive rRNA library
   from GenBank and aligned to it using BLAST. These differences likely
   explain why their rRNA retention rate of 15% for the Pico kit is so
   much lower than ours and also why ours is probably more accurate. To
   investigate further, we compiled the distribution of the FPKM
   normalized ribosomal retention across the different ribosomal RNA
   subunits (5 S, 45 S, 18 S, 28 S, 5.8 S, 45 S spacers, and mitochondrial
   rRNAs). TruSeq and V4 have similar profiles, as expected since they
   both utilize polyA selection, while Pico’s ZapR method retains much
   higher levels of 5 S and 5.8 S (Supplemental Fig. [57]S1).

Figure 1.

   [58]Figure 1
   [59]Open in a new tab

   Profiling of RNA-Seq libraries. (a) Average rate of uniquely aligned
   reads to the reference genome, for each library preparation kit. The
   average total number of raw reads is printed for each kit in millions.
   (b) Average rate of uniquely aligned reads to ribosomal genes. TruSeq
   and V4 kits do polyA selection, while the Pico kit does ribo depletion
   using the ZapR enzyme. (c) Average rate of duplicated reads, after ribo
   removal. (d,e) Number of sense and anti-sense expressed genes per kit
   after normalization. We include genes that are expressed in at least
   one of the samples. (f) Percent of anti-sense reads mapping to genes
   out of all reads mapping to genes either sense or anti-sense, in the
   two stranded kits TruSeq and Pico. (g–i) Percentages of number of reads
   that map to gene, intron sense, and intergenic regions out of total
   number of reads after ribo removal. (a–i) The error bars represent the
   95% confidence interval around the percentage value. The T-statistic
   for confidence interval for standard error was calculated. (j,k)
   Pairwise comparisons of mean normalized sense gene expression of the
   ~200 genes with the highest anti-sense expression in the TruSeq kit, in
   both conditions. Pearson’s correlation is calculated. This illustrates
   the degree to which V4 over-estimates the sense signal on such genes,
   because it cannot differentiate with anti-sense. (l) Example of Tcp1
   gene body coverage in one of the IL-1β treated samples (ILB.9579);
   shows that Tcp1 has anti-sense expression (shown in blue color for Pico
   and TruSeq kits); indicates overestimation of the sense expression in
   the V4 un-stranded kit.

   Another common observation in RNA-Seq libraries is the high rate of
   duplicated reads. To evaluate this rate, we quantified the duplication
   rate after removal of ribosomal reads using the Duplication Rate
   Identifier for NGS Cleanup ([60]https://github.com/itmat/DRINC)
   developed in house, and showed that it is elevated in Pico (~20%)
   compared to the other two kits (Fig. [61]1c). Overall our results
   indicate that the Pico kit results in substantially higher ribosomal
   content and PCR duplication artifacts potentially associated with the
   low starting material.

   To assess the concordance of the kits on the level of quantification,
   the raw read counts were normalized and quantified at both the gene
   level and the exon-intron-junction levels, both sense and anti-sense
   (see Methods). The percentage of uniquely aligned reads that map to
   genes in the sense orientation (gene-mappers) in the Pico kit is ~10%
   smaller than in TruSeq (Fig. [62]1g), which can be explained by the
   greater intronic signal in Pico. Nonetheless, the observed number of
   expressed genes is comparable, across all three kits (Fig. [63]1d).

   The ability to quantify anti-sense gene expression is the key advantage
   of the Pico kit over the V4 kit. To investigate the extent of
   anti-sense signal in the two stranded kits Pico and TruSeq, we compared
   the balance of sense to anti-sense gene signal and we compared how many
   genes were identified with anti-sense transcription. Figure [64]1f
   shows the percent of anti-sense reads mapping to genes out of all reads
   mapping to genes either sense or anti-sense. Surprisingly, the percent
   is substantially higher in Pico, with roughly 1.5% anti-sense versus
   roughly 0.5% in TruSeq. As a result, Pico identified about 20% more
   genes expressing anti-sense signal in spite of having lower read depth
   and considerably higher rRNA retention (Fig. [65]1e). We therefore
   conclude that Pico is at least as sensitive to anti-sense transcription
   as TruSeq if not more.

   To investigate the adverse effect of non-strand specific sequencing on
   quantifying sense gene expression, we identified the genes with the
   highest anti-sense signal (~200 genes with mean anti-sense expression
   ≥20 normalized reads) for each treatment in the TruSeq kit. We then
   compared the sense expression of these genes across the three kits, to
   assess whether strandedness information was important for accurately
   quantifying gene expression levels. We expected that the correlation of
   the gene expression levels of TruSeq and Pico signals would be closer
   together because they eliminate anti-sense reads that do not constitute
   bona fide gene expression proxies. We observe that in both conditions
   the mean sense expression of Pico and TruSeq are highly correlated
   (R > 0.9), while V4 shows lower correlation with TruSeq
   (Fig. [66]1j,k); indeed, verifying that strandedness is important for
   accurate gene expression quantification. Figure [67]1l shows an example
   of V4 overestimating the sense gene expression; Tcp1 gene has
   anti-sense expression (shown in blue for Pico and TruSeq), which is
   combined in the sense gene expression in the V4 results.

   To investigate sense intron signal, we first identified using TruSeq
   which introns do not have any anti-sense signal. From these introns we
   compared the percentages of uniquely aligned reads between the three
   kits, and observed that it is elevated in the Pico kit, compared to the
   TruSeq and V4 kits (Fig. [68]1h). Similarly, we also observed an
   elevated percentage of reads mapping to intergenic regions in Pico,
   compared to the other two kits (Fig. [69]1i). In Supplemental
   Fig. [70]S2 we illustrate these two observations for Eri3 gene with a
   Genome Browser^[71]15 profile. This could be due to retained introns
   from pre-mRNA signal, since the Pico kit uses rRNA depletion instead of
   polyA selection.

   A hierarchical clustering of the 18 samples was performed
   (Fig. [72]2a), from which a clear distinction of kit type is observed.
   Thus the differences between the kits are more pronounced than the
   differences between the samples, in spite of a powerful treatment
   affecting thousands of genes with large effect sizes. After kit,
   samples cluster based on treatment. However, the correlation of the
   average of the normalized expression profiles between the three kits,
   as calculated using Spearman’s method, is high. Specifically, the
   highest correlation (0.96) is illustrated between the Pico and TruSeq
   kits (Fig. [73]2b). To further assess the concordance between the three
   kits, we performed pairwise comparisons of the means of normalized gene
   expressions (Fig. [74]2c–e). The higher linearity between the Pico and
   TruSeq kits (Fig. [75]2c) could be translated into the Pico kit being
   as reliable as the TruSeq kit.

Figure 2.

   [76]Figure 2
   [77]Open in a new tab

   Concordance of gene expression profiles. (a) Hierarchical clustering by
   normalized expression correlation of all 18 samples shows clear
   distinction of the samples first by kit type and secondly by treatment.
   (b) Average sample Spearman’s correlation of normalized gene expression
   shows higher concordance for Pico and TruSeq. (c–e) Pairwise
   comparisons of mean normalized gene expression demonstrate higher
   concordance between Pico and TruSeq. (f) Spearman’s rank correlation of
   normalized gene expression of IL-1β treated sample (ILB.9579) indicates
   that TruSeq is the most highly correlated to RT-PCR, while V4 is the
   least. The PCR metrics shown are the number of cycles (PCR.CT), and the
   efficiency scores controlled by Gapdh (PCR.Gapdh) or Vps13d
   (PCR.Vps13d).

   The high concordance of the Pico and TruSeq kits was also verified by a
   comparison of gene rankings by average expression between the two kits
   (Supplemental Fig. [78]S3; See Methods).

   To further evaluate the accuracy of the kits, we validated selected
   genes with RT-PCR. Specifically, we used one of the IL-1β treated
   samples (ILB.9579), based on which we selected 18 genes (Table [79]1)
   that were highly expressed or absent in only one of the kits and not in
   the others. We compared the expression values of RT-PCR and RNA-Seq,
   using the RT-PCR efficiency score and the RNA-Seq normalized read
   counts of all the loci the primers mapped to, as described in Methods.
   Both RT-PCR and RNA-Seq data were normalized with two controls, Gapdh
   and Vps13d (see Methods for explanation of why Vps13d was used in
   addition to Gapdh). The Spearman’s rank correlation of the normalized
   gene expression illustrates that TruSeq is the most correlated to
   RT-PCR (either relative to the average number of cycles, or the
   efficiency score normalized by Vps13d or Gapdh), while V4 is the least
   (Fig. [80]2f).

Table 1.

   RT-PCR-validated genes.
   Gene Pico V4 Truseq RT-PCR (AVG CT) Efficiency score using Vps13d
   Efficiency score using Gapdh
   Mup-ps19 12,839 137,475 87,486 15.68 0.007 2.037
   Saa1 6,358 114,234 81,987 15.84 0.007 1.959
   Gapdh 9,305 167,469 133,483 16.09 0.003 1.000
   Hyou1 3,152 635 14,199 19.38 0.008 24.915
   Myh9 2,893 197 3,171 20.41 0.303 90.535
   Gm15450 2,035 6,377 7,295 20.72 0.234 69.974
   Hist1h1e 251 58 3 20.98 0.523 127.969
   Dhx9 482 57 1,217 21.25 0.245 73.192
   Abhd2 808 61 2,181 21.46 0.370 110.569
   Gm14681 254 594 1,417 22.35 0.294 87.770
   Mll2 1,290 89 1,280 22.41 0.664 198.347
   Vps13d 1,092 115 1,377 22.71 1.000 298.751
   Acvr1b 160 12 488 22.72 0.453 135.227
   Kif1c 581 59 1,330 23.26 1.753 523.636
   Tcf20 491 16 365 23.84 5.227 1560.898
   Zfp445 431 15 562 23.89 0.927 276.947
   Ttll4 301 43 897 23.93 2.310 689.927
   Cdc42bpb 323 11 546 25.77 0.635 189.568
   [81]Open in a new tab

   The normalized gene expression of the 18 most discordant genes for
   sample ILB.9579 is given for all kits. The RT-PCR expression values are
   reported as the average number of PCR cycles, and the efficiency scores
   using two controls, Gapdh and Vps13d.

   To summarize, in this section we have been primarily concerned with
   evaluating the reliability of the signal from the Pico kit once it is
   quantified to genes and other features such as introns and intergenic
   regions. The ability of Pico to measure anti-sense signal is of
   interest as well as the maintenance of reliable sense signal. Such
   measurements are what is most relevant to the downstream analysis and
   we conclude from the various results that the Pico kit does provide
   robust anti-sense signal as claimed and also that properly quantifying
   anti-sense signal is equally important to accurately quantifying the
   sense signal. In spite of the TruSeq kit having better properties at
   the level of raw data (e.g. alignment statistics, rRNA retention rate,
   etc.) it appears that the Pico kit provides sufficiently reliable
   sense-signal and is generally more sensitive than TruSeq to the
   anti-sense signal, both for gene signal and intron signal.

Effects on differential expression identification

   To evaluate the kit impact on differential gene expression, we
   performed differential expression analysis with limma-voom on both gene
   and intron levels, comparing untreated (controls) to IL-1β treated. For
   different q-value cutoffs, we observed that the number of
   differentially expressed genes (DEGs), either up-regulated or
   down-regulated, follow the same trend across all kits (Fig. [82]3a).
   However, it is notable that the magnitude of the number of DEGs differs
   significantly, with Pico identifying 55% fewer DEGs than the TruSeq kit
   at q-value cutoff 0.1, and the V4 kit even less (Fig. [83]3a,
   Table [84]2). In contrast, the Pico and TruSeq kits identify a small
   and similar number of differentially expressed, retained introns,
   compared to the V4 kit (Fig. [85]3e), which could be due to V4
   incorrectly assigning anti-sense transcription to a gene’s body and
   introns. Moreover, the pairwise comparisons of the adjusted
   fold-changes (See Methods) of the average normalized gene expressions
   between kits also demonstrate that the Pico kit is highly concordant
   with the TruSeq (Fig. [86]3b–d).

Figure 3.

   [87]Figure 3
   [88]Open in a new tab

   Kit effect on differential gene expression. (a) Number of up- and
   down-regulated DEGs identified at different q-value cutoffs. (b–d)
   Scatterplots of pairwise comparisons of fold changes of the average
   normalized gene expression, between kits. Pearson’s correlation of
   adjusted fold changes of the average normalized gene expression between
   two kits, is calculated. (a) Number of up- and down-regulated,
   differentially expressed, retained introns identified at different
   q-value cutoffs.

Table 2.

   Number of up- and down-regulated, differentially expressed genes at
   different q-value cutoffs, for all kits.
   q-value cutoff Pico-Up Pico-Down V4-Up V4-Down Truseq-Up Truseq-Down
   0.9            9123    6600      9649  5900    10645     6869
   0.8            8258    5832      8634  5028    9900      6521
   0.7            7469    5140      7735  4227    9292      6031
   0.6            6643    4417      6763  3479    8695      5544
   0.5            5809    3737      5831  2870    8092      5064
   0.4            5070    3186      4949  2325    7437      4578
   0.3            4336    2667      4055  1825    6769      4075
   0.2            3575    2126      3171  1397    6067      3533
   0.1            2731    1578      2091  953     5053      2872
   0.05           2116    1200      1404  642     4296      2390
   0.01           1260    754       483   299     2815      1552
   [89]Open in a new tab

   We investigate why there is such a difference in the number of DEGs
   (Fig. [90]3a, Table [91]2), since the number of the genes being
   expressed is comparable across the kits (Fig. [92]1d). First, we
   examine to what extent the same DEGs are identified using the different
   kits. For various q-value cutoffs, we calculate the ratio of the number
   of DE genes found by both kits, to the number of genes found DE by at
   least one of the two kits. Values near 1 indicate that the DEG lists
   from both methods are nearly identical, while values near 0 indicate
   that only a small number of the total DEGs were found by both methods.
   As shown in Fig. [93]4a,b, we do not observe a high rate of common DEGs
   for any cutoffs. As a baseline for comparing V4 and Pico to TruSeq, we
   compared the two different runs of TruSeq (Supplemental Fig. [94]S4a).

Figure 4.

   [95]Figure 4
   [96]Open in a new tab

   Effects on differential expression. (a,b) Heatmap of the ratio of the
   number of DE genes found by both kits, to the number of genes found DE
   by at least one of the two kits, at varying q-value cutoffs (blue
   indicates nearly identical DEG lists and orange indicates highly
   discordant DEG lists). (c) Absolute value of the log[2] adjusted
   fold-changes for DEGs identified exclusively by Pico (n = 371) or
   TruSeq (n = 1755). (d) Absolute value of the log[2] adjusted
   fold-changes for DEGs identified exclusively by V4 (n = 317) or TruSeq
   (n = 3606). (c,d) The error bars represent the 95% confidence interval
   around the fold-change values. DEGs identified as those with
   q-values < 0.01. (e) Coefficients of variation for DEGs identified
   exclusively by Pico or TruSeq. (f) Coefficients of variation for DEGs
   identified exclusively by V4 or TruSeq. The difference among the kits
   was evaluated based on the overlapping of the notch region, defined as
   median m ± 1.58IQR/√n (See Methods). If the notches of the boxes of the
   two kits do not overlap, it suggests that the medians are significantly
   different, with 95% confidence interval. This suggests significant
   differences in variation of the two kits.

   A decrease in power to detect DEGs is due to an increase in variance
   and/or a decrease in effect size (e.g. fold-change), given all other
   things, such as number of replicates, are equal in the study designs.
   We investigated the source of the differences in the DE analyses by
   examining those genes that were significantly DE (q-value ≤ 0.01) in
   the TruSeq data but were not significant (q-value > 0.01) in either the
   Pico or V4 kits, and conversely (Fig. [97]4c–f). For genes detected as
   significantly DE in one kit, say kit A, and not in another kit, say kit
   B, the fold-changes are always larger in kit A (Fig. [98]4c,d).
   Meanwhile, the coefficients of variation (CV) are always equal or lower
   in the TruSeq data across all comparisons (Fig. [99]4e,f). The
   V4-specfic DEGs present an interesting case, since both the CV and
   fold-change are substantially higher in the V4 data than in the TruSeq
   data. For these DEGs, it appears the increased effect size has
   compensated for the increased variance. However, the TruSeq data
   identified nearly five times more kit specific DEGs than the Pico data,
   and over 11 times more than the V4 data. We looked for enrichment of
   general properties (gene length, GC-content, number of isoforms) among
   the Pico- and V4-specific DEGs, compared to the TruSeq-specific DEGs,
   but did not identify any significant factors. We also examined effect
   sizes and CVs for genes not found DE (q-value > 0.3; mean expression
   per condition >2) in data from any kit (Supplemental Fig. [100]S4b,c).
   Both analyses indicate that there is larger variance between the kits
   than between the biological replicate samples. To check for any impact
   of DE method on the observed DE discrepancies between kits, the DE
   analysis was repeated by quantifying gene expression with
   kallisto^[101]16 followed by DESeq2^[102]17 for the differential
   expression analysis. The results replicated our previous observations
   (Supplemental Fig. [103]S4d).

Implications on functional analysis

   To evaluate the functional effect of the DE discrepancy, we performed
   pathway enrichment analysis. The pathway analysis was performed using
   both the top 1,000 DEGs ranked by q-value, and the set of genes with
   q-value ≤ 0.05. In the first analysis a modest overlap was observed
   (Fig. [104]5a). Nevertheless, all three kits led to highly similar
   pathway enrichments, which are primarily related to inflammatory
   response, as expected. The heatmap of top 10 pathways identified in
   each of the three kits, sorted by enrichment p-value is shown in
   Fig. [105]5b.

Figure 5.

   [106]Figure 5
   [107]Open in a new tab

   Pathway enrichment concordance. (a) Overlaps of the 1,000 genes, used
   for pathway enrichment analysis for each kit. (b) Heatmap of top 10
   pathways of each kit, sorted by enrichment p-value. Pathway enrichment
   analysis for each kit was performed using the top 1,000 genes sorted by
   DE q-value. (c) Overlaps of the genes at DE q-value ≤ 0.05, used for
   pathway enrichment analysis for each kit. (d) Heatmap of top 10
   pathways of each kit, sorted by enrichment p-value. Pathway enrichment
   analysis for each kit was performed using genes with DE q-value ≤ 0.05.
   (b,d) Grey color indicates that pathway is not in the top 10 most
   enriched pathways of the specific kit.

   Similar results were produced using DEGs at q-value ≤ 0.05, for each
   kit (Fig. [108]5d), illustrating a modest overlap of genes
   (Fig. [109]5c). However, examining the two analyses (Fig. [110]5b,d)
   there is an overall consistency of the top 10 enriched pathways
   identified with Pico and TruSeq. Additionally, comparing the top 10
   pathways identified for the Pico kit by using different q-value cutoffs
   (Fig. [111]5b,d), we see consistency of the enrichment analysis for
   Pico. V4 shows a decreased reproducibility, demonstrating a lower
   overlap at different q-value cutoff, and having a less consistent
   overlap when compared to TruSeq kit.

Discussion

   While RNA-seq is a powerful technology in transcriptome profiling, some
   protocols do not retain the strand of the original transcripts.
   Anti-sense transcription plays an important role in the transcriptome
   and additionally this strand information is critical to accurately
   quantify sense gene expression, particularly for genes with overlapping
   genomic loci that are transcribed from opposite strands^[112]3. With
   our expanding appreciation for the regulatory and biological functions
   provided by anti-sense transcripts, strand-specific sequencing provides
   the most direct means for studying this class of RNAs^[113]4.

   Our goal was to evaluate whether the Pico kit performs as advertised.
   The claim is that it is comparable to the standard TruSeq kit, which is
   why we included the TruSeq as a gold standard. We further included the
   V4 just to have another baseline specific for low input materials, and
   also to investigate the differences of the strandedness retention. By
   focusing on these kits we were able to investigate the Pico kit to a
   level of detail which would have been cumbersome on an analysis of all
   kits in regular production. A broad analysis of many kits can be found
   in Palomares et al.^[114]12

   By focusing on samples representative of what is typically sequenced in
   practice, and in particular representing two experimental conditions,
   we were able to perform a detailed comparative evaluation of three
   RNA-Seq library preparation kits to evaluate the effect of the strand
   specificity on gene expression and pathway enrichment, and therefore
   the efficiency of the kits that use low quantities of RNA (Pico and V4
   kits) compared to the TruSeq kit.

   We observe that the alignment rates are comparable with over 90%
   alignment rates for all kits (Fig. [115]1a). The Pico kit data showed
   elevated levels of ribosomal content (Fig. [116]1b), which agrees with
   previous findings (Takara Bio 2015, Herbert et al. 2018^[117]14), as
   well as increased numbers of duplicated reads (Fig. [118]1c) and
   retained introns (Fig. [119]1h,i). However, the gene expression
   profiles indicated that the Pico and TruSeq kits have the greatest
   concordance (Fig. [120]2), which was also validated with RT-PCR of the
   19 most discordant genes (Fig. [121]2f). The number of DEGs (both up-
   and down-regulated) identified at varying q-value cutoffs, however,
   showed noticeable differences across kits (Fig. [122]3a, Table [123]2).
   Furthermore, the overlap between the sets of DEGs identified by both
   Pico (or V4) and TruSeq is not particularly high at any significance
   cutoff (Fig. [124]4a,b). This is likely due to the observed significant
   differences between the Pico (or V4) and TruSeq kits in gene expression
   variability across samples, for both treatment conditions
   (Fig. [125]4c–f). Specifically, Pico and V4 introduce considerably more
   variance than the TruSeq, which indicates that the largest variance is
   between the kits rather than the sample type. Finally, despite the
   differences observed at various levels of the comparative analysis,
   batch effects in preparation, and differences in sequencing geometry,
   the agreement of the observed enriched pathways indicates that
   meaningful and consistent results were obtained from using the
   different kits (Fig. [126]5b).

   Additionally, we showed that the un-stranded kit overestimates the
   sense gene expression (Fig. [127]1j,k) suggesting that the best
   practice for accurate gene expression quantification would be to retain
   the strand specificity in the RNA-Seq data. We therefore recommend the
   Pico kit over the V4 kit for library preparation when starting with
   small RNA quantities. Importantly, we show that treatment-wise
   variation of the libraries prepared with Pico considerably affects the
   identification of the differentially expressed genes, suggesting that
   adding more replicates could result in a more powerful study design.

Methods

Data

   Twelve-week old male C57/B6J mice were purchased from Jackson Labs
   and were housed in a controlled environment with regard to light,
   temperature and humidity in the animal facility of the University of
   Pennsylvania. All mice had free access to food and water. The animal
   care and treatment procedures were approved by the Institutional Animal
   Care and Use Committees of the University of Pennsylvania. All
   experiments were performed in accordance with relevant guidelines and
   regulations.

   Four hours prior to tissue collection, mice were treated with either
   20 µg/Kg of IL-1β or saline by intraperitoneal injection. Mice were
   euthanized through carbon dioxide induced asphyxiation 4 hrs after
   IL-1β or saline injection. Livers from animals perfused with ice-cold
   PBS were harvested and immediately stored in RNAlater® solution
   (Ambion, Austin, TX) at 4 °C. After 24 h, the tissue samples were
   transferred to −80 °C for storage until analysis. RNA was extracted
   using TRIzol® Reagent (Life Tehcnologies, Grand Island, NY) and RNeasy
   Kit (Qiagen, Valencia, CA) following the manufacturer’s protocol. The
   concentration and quality of extracted RNA were measured using
   NanoDrop® 1000 (Thermo Scientific, Wilmington, DE) and
   reverse-transcribed into cDNA using TaqMan Reverse Transcription
   Reagents (Applied Biosystems, Foster City, CA). Both treatment
   conditions have 3 biological replicates. Samples were extracted and
   aliquoted for the three runs, in order to assess the technical
   variability only.

RNA-Sequencing

   We performed RNA-Seq on 6 samples using three different library
   preparation kits following the manufacturer’s recommendations. UMI tags
   were not incorporated. (1) Pico: Total RNA from each liver sample was
   prepared for sequencing using the Takara Bio SMARTer: SMARTer® Stranded
   Total RNA-Seq Kit v2 - Pico Input Mammalian, with rRNA depletion,
   performed by first converting to cDNA using the Zapr enzyme which
   targets the ribosomal RNA sequences. 1.7–2.6 ng of RNA was used to
   prepare the RNA libraries. Five cycles of PCR were performed before
   rRNA depletion and fifteen cycles during the last library
   amplification. The libraries were sequenced on Illumina HiSeq 4000.
   Depths of 30–44 million paired-end 150 bp reads were generated for each
   sample. (2) V4: Total RNA from each liver sample was prepared for
   sequencing using the Takara Bio SMART-Seq: SMART-Seq® v4 Ultra® Low
   Input RNA Kit, with PolyA selection for ribo depletion. 0.8–1.3 ng of
   RNA was used to prepare the RNA libraries. Twelve cycles of PCR were
   performed during cDNA amplification and twelve cycles during Nextera
   library prep. The libraries were sequenced on Illumina HiSeq 4000.
   Depths of 15–30 million paired-end 150 bp reads were generated for each
   sample. (3) TruSeq: Total RNA from each liver sample was prepared for
   sequencing using the Illumina: TruSeq Stranded mRNA Sample Preparation
   Kit, with PolyA selection for ribo depletion. 200 ng of RNA was used to
   prepare the RNA libraries. Fourteen cycles of PCR were performed. The
   libraries were sequenced on Illumina HiSeq 2500. Depths of 16–34
   million paired-end 125 bp reads were generated for each sample. The
   experiment with the libraries prepared with the TruSeq kit was
   performed by Lahens et al.^[128]6.

   The Pico libraries were reproduced to repeat the experiment and ensure
   reproducibility.

RT-PCR

   Quantitative real time PCR was performed using Fast SYBR Green Master
   Mix on an ABI ViiA7 real-time PCR system in a 384 well plate at 95 °C
   for 20 s (hold stage), 40 cycles of 95 °C for 1 s and 60 °C for 20 s
   (PCR stage), and 95 °C for 15 s, 60 °C for 1 min, 95 °C for 15 s (melt
   curve stage). A standard curve with five different cDNA concentrations
   (0.125, 0.25, 0.5, 1 and 2 µl) was prepared using a representative cDNA
   for each gene of interest. 19 genes that were highly expressed or
   absent in one kit, were validated. All primers (final concentration-
   1 µM/reaction) were designed based on sense signal loci using Primer3
   software.

RNA-Seq analysis

   RNA-Seq data were aligned to the mouse genome build mm9 by
   GSNAP-v2018-07-04^[129]18. The raw read counts were normalized using a
   read-level resampling strategy, and quantified at the gene and
   exon-intron-junction levels, using the Pipeline Of RNA-Seq
   Transformations v0.8.5b-beta (PORT)
   ([130]https://github.com/itmat/Normalization). We used PORT in order to
   obtain normalized SAM files for comparisons at the read level. For the
   ribosomal analysis, we aligned the reads to the ribosomal subunit
   sequences with Blast v2.2.30 + ^[131]19 and FPKM normalized the read
   counts. For gene quantification, reads were required to respect the
   annotated exon/exon junctions. This applies to both sense and
   anti-sense gene level quantification. Read duplication rates were
   calculated with Duplication Rate Identifier for NGS Cleanup (DRINC)
   ([132]https://github.com/itmat/DRINC). Differential expressed genes
   were determined from a treated versus control comparison of first the
   PORT normalized expression values, using limma-voom -v3.34.9
   package^[133]20 and second by DESeq2 -v1.22.1^[134]17 applied on
   estimated expression by kallisto -v0.44.0^[135]16. Enrichment analysis
   was done using the Ingenuity Knowledge Base ([136]www.ingenuity.com),
   on the top 1000 differentially expressed genes ranked by q-value. The
   top 10 pathways of each kit are reported. All visualization is done
   with R-v3.4.3 packages.

Statistical analysis

Adjusted fold change

   Expression fold-changes were adjusted by adding 20 reads to both terms
   of the ratio as suggested in Nayak et al.^[137]21 as an optimal
   pseudocount.

   Difference in ranks is essentially the difference in the ranks of two
   kits, after sorting by the average gene expression (normalized read
   counts). The narrower the gene expression distributions are in
   Supplemental Fig. [138]3, the more concordant two kits are to each
   other.

Notches in boxplots

   In the boxplots (Fig. [139]4e,f), the upper whisker extends from the
   hinge to the largest value no further than 1.5 * IQR from the hinge
   (where IQR is the inter-quartile range, or distance between the first
   and third quartiles). The lower whisker extends from the hinge to the
   smallest value at most 1.5 * IQR of the hinge.

   The difference among the kits was evaluated based on the overlapping of
   the notch region. The notch is defined as median
   m ± 1.58IQR/√n^[140]22. This gives a roughly 95% confidence interval
   for comparing medians.

RT-PCR and RNA-Seq comparison

   To compare the expression values of RT-PCR and RNA-Seq, we converted
   the standard curve slope given by RT-PCR to efficiency score using the
   following formula
   [MATH: <mi
   mathvariant="normal">Efficiency</mi><mo>=</mo><mo>−</mo><mn>1</mn><mo>+
   </mo><msup><mrow><mn>10</mn></mrow><mrow><mo stretchy="false">(</mo><mo
   stretchy="false">(</mo><mo>−</mo><mspace
   width="-.25em"></mspace><mspace
   width="-.25em"></mspace><mn>1</mn><mo>/</mo><mi>s</mi><mi>l</mi><mi>o</
   mi><mi>p</mi><mi>e</mi><mo stretchy="false">)</mo><mspace
   width=".25em"></mspace><mo stretchy="false">)</mo></mrow></msup> :MATH]

   and calculated for the 19 genes, the RNA-Seq normalized read counts of
   all the loci the primers mapped to. We normalized both PCR and RNA-Seq
   data with two controls, Gapdh and Vps13d. Although Gapdh is a commonly
   used housekeeping gene, Vps13d appears as a cleaner control, as its
   primers only mapped to a single locus while the Gapdh primers mapped to
   numerous locations in several homologous genes. The efficiency scores
   of both controls and the number of cycles of the sample amplification
   are reported in Table [141]1. Gm5805 was omitted as it was detected
   after more than 30 CT and reported high amplification value.

Supplementary information

   [142]Supplemental_Figures^ (342.6KB, pdf)

Acknowledgements