Abstract

   Measuring mRNA decay in tumours is a prohibitive challenge, limiting
   our ability to map the post-transcriptional programs of cancer. Here,
   using a statistical framework to decouple transcriptional and
   post-transcriptional effects in RNA-seq data, we uncover the mRNA
   stability changes that accompany tumour development and progression.
   Analysis of 7760 samples across 18 cancer types suggests that mRNA
   stability changes are ~30% as frequent as transcriptional events,
   highlighting their widespread role in shaping the tumour transcriptome.
   Dysregulation of programs associated with >80 RNA-binding proteins
   (RBPs) and microRNAs (miRNAs) drive these changes, including
   multi-cancer inactivation of RBFOX and miR-29 families. Phenotypic
   activation or inhibition of RBFOX1 highlights its role in calcium
   signaling dysregulation, while modulation of miR-29 shows its impact on
   extracellular matrix organization and stemness genes. Overall, our
   study underlines the integral role of mRNA stability in shaping the
   cancer transcriptome, and provides a resource for systematic
   interrogation of cancer-associated stability pathways.

   Subject terms: Gene regulatory networks, Cancer genomics, Regulatory
   networks, Transcriptomics
     __________________________________________________________________

   The role of mRNA stability in shaping the cancer transcriptome is
   revealed using a statistical analysis of transcriptomic data.

Introduction

   Widespread disruption of gene expression programs is a hallmark of
   cancer and underlies the extensive transformation of tumour cell
   identity and behavior. Among the least understood aspects of this gene
   expression remodeling is the regulation of mRNA stability and decay.
   Previous studies have found specific programs that are involved in
   tumourigenesis or metastasis through modulation of mRNA
   stability^[40]1–[41]8; however, the extent to which mRNA stability
   contributes to cancer cell transcriptome has not been systematically
   studied, and the associated regulatory networks are mostly unknown. A
   key limitation in studying these post-transcriptional programs stems
   simply from our lack of ability to measure mRNA decay rate in vivo:
   traditional methods that measure mRNA decay rely on in vitro
   manipulations such as transcriptional inhibition with chemical
   inhibitors (e.g. actinomycin D) or metabolic labeling with nucleoside
   analogues (e.g. 4-thiouridine), combined with time series measurements
   of transcripts^[42]9–[43]11. Despite recent improvements^[44]12,[45]13,
   these methods are resource-intensive, have inherent limitations and
   biases such as triggering cellular stress and pleiotropic
   effects^[46]14, and, most importantly, are only applicable to in vitro
   models. As a result, the mRNA stability landscape of tumour remains
   almost completely uncharted across different cancer types.

   A potential solution comes from recent studies showing that tissue
   RNA-seq data contain enough information to disentangle transcription
   rate from mRNA decay rate. Briefly, under the assumption that RNA
   processing rate is constant^[47]15,[48]16, any change in unspliced
   (pre-mature) mRNA abundance (estimated from intronic reads) must
   reflect a proportional change in transcription rate, while any change
   in spliced (mature) mRNA abundance (estimated from exonic reads)
   reflects the combined effect of transcription rate and mRNA decay
   (Fig. [49]1a). This model enables the estimation of differential mRNA
   stability based on how the ratio of exonic and intronic reads changes
   across conditions^[50]15. A recent improvement on this model
   generalizes the unspliced-spliced relationship as a power-law function,
   with the power-law exponent reflecting the coupling between
   transcription rate and splicing rate^[51]17 (Supplementary Fig. [52]1a,
   b).

Fig. 1. Inference of differential mRNA stability using DiffRAC.

   [53]Fig. 1
   [54]Open in a new tab

   a Schematic representation of the effect of transcription and stability
   on the abundances of unspliced and spliced RNA. b DiffRAC models the
   mean (λ) of intronic (int) and exonic (exo) read distribution as a
   function of pre-mature (p) and mature (m) transcript abundances, in
   addition to gene-specific (l) and library-specific (s) scaling factors.
   Mature mRNA abundance is modeled as a function of the pre-mature RNA
   abundance and mRNA stability (γ), which is in turn a function (f) of
   the experimental variables. Also see Supplementary Fig. [55]1. c An
   example case with four samples and two experimental conditions, showing
   how DiffRAC’s model can be implemented in a regression with a log-link
   function, along with the interpretation of regression coefficients
   (also see Methods). d Comparison of DiffRAC stability estimates against
   experimental mRNA half-life (stability) measurements in mouse ES cells
   differentiated to terminal neurons (TN)^[56]15,[57]18,[58]19. Each data
   point stands for one gene, with the points coloured according the
   standard error of the mean (SEM) for DiffRAC estimates. e Comparison of
   DiffRAC estimates vs. measured mRNA stability for the 100 genes with
   the smallest (left) and largest (right) DiffRAC SEMs. Error bars
   represent the standard error of the mean (SEM). f The Pearson
   correlation between DiffRAC estimates and measured mRNA stability for
   bins of 50 genes sorted by their SEM. g Distribution of experimental
   mRNA half-life measurements for genes that DiffRAC has identified as
   significantly destabilized (blue boxplots) or stabilized (red boxplots)
   in TN vs. ES cells, at FDR cutoffs of 0.05 (dashed line) or 0.01 (solid
   line). Genes that are not called as significant by DiffRAC are
   represented with the grey boxplot.

   Here, we build on these methods to obtain a pan-cancer map of mRNA
   stability changes between tumour and normal tissues, as well as the
   mRNA stability changes that accompany tumour progression. To do so, we
   first introduce a general framework for statistical analysis of
   differential mRNA stability that takes into account the distributional
   properties of count data. We benchmark this method using experimental
   measurements of mRNA decay rate, and then apply it to the RNA-seq data
   from The Cancer Genome Atlas (TCGA) to map the mRNA stability
   landscapes of 18 cancer types. We identify thousands of transcripts
   whose stability is altered during tumour formation and/or
   progression––experimental measurements in cancer cell line models
   support these findings and suggest a role for mRNA stability
   alterations in tumour progression and invasiveness. Finally, using
   network modeling and functional experiments, we identify key microRNAs
   (miRNAs) and RNA-binding proteins (RBPs) that mediate these changes,
   providing new insights into the post-transcriptional mechanisms of
   transcriptome remodelling in cancer.

Results

A generalized linear model for statistical testing of mRNA stability

   The spliced and unspliced transcripts of each gene follow a power-law
   relationship, with deviations from this power-law trend reflecting
   changes in the degradation rate of the mature mRNA^[59]17
   (Supplementary Fig. [60]1a, b). The power-law exponent reflects the
   coupling between transcription rate and RNA processing rate–an exponent
   of 1 indicates no coupling between transcription and processing rate
   constants, whereas values smaller than 1 indicate that as transcription
   increases, processing rate constant decreases, potentially due to
   saturation of the RNA processing machinery (Supplementary Fig. [61]1a).
   To use this power-law relationship for the inference of differential
   stability, it is essential to correctly model the variability in
   RNA-seq counts. For this purpose, we developed DiffRAC
   ([62]https://github.com/csglab/DiffRAC), a framework that converts the
   unspliced-spliced relationship to a generalized linear model whose
   parameters can then be inferred from sequencing count data using an
   appropriate error model of choice (Fig. [63]1b, c and Supplementary
   Fig. [64]1c, d).

   We evaluated the performance of DiffRAC for estimating differential
   mRNA stability using a previously published dataset^[65]18,[66]19,
   consisting of RNA-seq data from mouse embryonic stem cells and terminal
   neurons, along with experimentally measured transcript half-life
   measurements after transcriptional blockage with actinomycin D, which
   here we consider as “ground-truth” measurements for benchmarking
   purposes. We observed an overall Pearson correlation of 0.22 between
   RNA-seq-based stability estimates from DiffRAC and ground-truth
   stability measurements (Fig. [67]1d and Supplementary Data [68]1a), in
   line with previous reports on RNA stability estimation using this
   specific benchmarking dataset^[69]15,[70]17. However, for transcripts
   that had narrow confidence intervals as estimated by DiffRAC, the
   Pearson correlation between RNA-seq-based estimates and ground truth
   exceeded 0.5 (Fig. [71]1d–f), indicating that the confidence intervals
   estimated by DiffRAC indeed reflect the true uncertainty in estimating
   differential mRNA stability. Based on (adjusted) P values associated
   with DiffRAC differential stability estimates, we identified 79
   transcripts with higher stability in embryonic stem cells and 37
   transcripts with higher stability in terminally differentiated neurons
   (FDR < 0.05), which closely correspond to differentially stable
   transcripts based on the ground-truth (Fig. [72]1g). We performed
   additional benchmarking using RNA-seq data from NAT10-deficient HeLa
   cells with matched stability data from metabolic labeling-based
   BRIC-seq measurements^[73]20. Using similar analysis methods as those
   described above, we observed that RNA-seq-based DiffRAC estimates for
   transcripts with narrow confidence intervals correlate with BRIC-seq
   stability measurements (Supplementary Fig. [74]2 and Supplementary
   Data [75]1b). Overall, these results suggest that DiffRAC can properly
   estimate not just the mean differential mRNA stability, but also its
   uncertainty and statistical significance.

   One limitation of the model described above is that, with increasing
   sample sizes, the number of latent variables that need to be estimated
   by regression also increases, which can become prohibitively expensive
   in terms of computational times. To overcome the challenges associated
   with fitting the model in large sample cohorts, we developed a
   simplified DiffRAC model that assumes most of the variance in
   transcription can be explained by the experimental variables (see
   Methods and Supplementary Fig. [76]3a–c). This assumption greatly
   reduces the number of parameters; however, we observed that it does not
   considerably alter the differential stability estimates in the
   benchmarking dataset (Supplementary Fig. [77]3d).

DiffRAC identifies cancer-associated changes in mRNA stability

   To investigate the post-transcriptional changes responsible for
   transcriptome remodeling in cancer, we performed a pan-cancer analysis
   of differential mRNA stability across TCGA (The Cancer Genome Atlas,
   available at [78]https://www.cancer.gov/tcga.), encompassing 7760
   samples from 18 cancer types. We used DiffRAC to identify transcripts
   that were differentially stabilized or destabilized in tumour compared
   to normal tissues in each cancer type. This analysis revealed an
   average of 3954 mRNAs that were differentially stabilized/destabilized
   per cancer type (FDR-adjusted p < 0.05) (Fig. [79]2a, b, Supplementary
   Figs. [80]4 and [81]5, and Supplementary Data [82]2), suggesting
   widespread post-transcriptional remodeling in cancer, with the majority
   of transcripts showing highly cancer-specific stability profiles
   (Fig. [83]2b). Interestingly, across TCGA samples, the degree of
   stability dysregulation, calculated as the number of differentially
   stabilized mRNAs per patient, was associated with reduced disease-free
   survival (log hazard ratio of 0.36, P < 0.005, using Cox
   proportional-hazards model correcting for the confounding effect of
   patient age, sex, tumour purity and cancer type). Per-cancer-type
   associations were also mostly positive (Fig. [84]2c), indicating that a
   greater disruption of mRNA stability is overall associated with worse
   patient outcomes.

Fig. 2. Pan-cancer analysis of differential mRNA stability.

   [85]Fig. 2
   [86]Open in a new tab

   a Volcano plot of differential RNA stability between tumour and normal
   tissues (T vs. N) for 18 TCGA cancer types. See Supplementary
   Fig. [87]4 for volcano plots of individual cancer types. b Heatmap of
   differential mRNA stability profiles across TCGA cancers. Genes with
   significant DiffRAC results in at least one cancer (FDR < 0.05) are
   included. The colour gradient represents a combination of the log[2]
   fold-change of mRNA stability and the FDR. BLCA bladder urothelial
   carcinoma, BRCA breast invasive carcinoma, CHOL cholangiocarcinoma,
   COAD colon adenocarcinoma, ESCA esophageal carcinoma, GBM glioblastoma
   multiforme, HNSC head and neck squamous cell carcinoma, KICH kidney
   chromophobe, KIRC kidney renal clear cell carcinoma, KIRP kidney renal
   papillary cell carcinoma, LIHC liver hepatocellular carcinoma, LUAD
   lung adenocarcinoma, LUSC lung squamous cell carcinoma, PRAD prostate
   adenocarcinoma, READ rectum adenocarcinoma, STAD stomach
   adenocarcinoma, THCA thyroid carcinoma, UCEC uterine corpus endometrial
   carcinoma. c Associations between the degree of disruption of mRNA
   stability, defined as a high or low number of differentially stabilized
   transcripts (relative to the median), and disease-free survival, tested
   using a Cox proportional-hazards model and correcting for patient age,
   sex and tumour purity. The bar height represents the Cox regression
   coefficient, while the colour gradient represents the p values, with
   red representing a worse prognosis, and blue representing a protective
   effect. The error bars represent the standard error of the mean (SEM).
   d Pathway enrichment analysis of genes with significant differential
   mRNA stability in each cancer type. Circles with black outline
   correspond to MSigDB hallmark gene sets that are significantly enriched
   among cancer-stabilized (red) and cancer-destabilized (blue) mRNAs
   (FDR < 0.05, Fisher’s exact test). Log-odds and P values are
   represented using the colour gradient and circle sizes, respectively. e
   Volcano plot showing the experimentally measured differential stability
   between highly metastatic MDA-LM2 cell line relative to its parental
   MDA-MB-231 line (see Methods). f Gene set enrichment analysis (GSEA)
   (Subramanian et al., 2005) for highly metastatic relative to poorly
   metastatic PDX models of breast cancer^[88]24–[89]26. Genes (x-axis)
   are sorted by Wald test statistic of differential stability between
   metastatic and primary PDXs. The red line represents the enrichment
   curve for the transcripts that were stabilized in MDA-LM2 relative to
   MDA-MB-231, while the blue line represents the enrichment curve for the
   destabilized transcripts. g Relative enrichment of transcripts that
   were stabilized (red) or destabilized (blue) in TCGA-BRCA tumours
   compared to normal samples, overlaid on the volcano plot from (e).
   Kernel density estimation was used to calculate the density of
   BRCA-stabilized and destabilized mRNAs across the plot, with the
   difference between the estimated densities of the two groups shown
   using the colour gradient. h Venn diagrams illustrating the overlap
   between transcripts that are significantly stabilized in BRCA tumours
   (relative to normal) and MDA-LM2 (relative to MDA-MB-231), and genes
   that are part of the mTORC1 signalling (top) or MYC targets (bottom). P
   values are based on Fisher’s exact test. i Transcription inhibition
   time-course graphs for two example genes, one involved in mTORC1
   signaling (RAB1A) and one among MYC targets (ODC1). The y-axis shows
   mRNA abundance after applying variance-stabilized transformation and
   correcting for mRNA abundance differences between the two cell lines at
   time zero. Time-course measurements in MDA-LM2 and parental MDA-MB-231
   cells are shown in red and blue, respectively, with the slope of each
   fitted line representing the rate of degradation.

   Several lines of evidence support the reliability of the stability
   profiles we have inferred. First, we observed that tumour mRNA
   stability profiles clustered by organ of origin (Fig. [90]2b),
   providing an internal validation for the robustness of stability
   inferences. Secondly, we observed that post-transcriptionally
   deregulated genes in each cancer type are functionally related
   (Fig. [91]2d), consistent with previously reported relationship between
   post-transcriptional regulons and functional gene
   modules^[92]21,[93]22. This analysis also highlights the role of mRNA
   stability in shaping the functional landscape of the cancer cell. For
   example, epithelial-mesenchymal transition genes and MYC targets are
   enriched among stabilized mRNAs across several cancer types, while
   metabolic pathways such as oxidative phosphorylation and lipid
   metabolism are highly enriched among destabilized mRNAs, most
   noticeably in cholangiocarcinoma (CHOL), liver hepatocellular carcinoma
   (LIHC) and head-neck squamous cell carcinoma (HNSC).

   Thirdly, we found that cancer-associated stability changes inferred
   from tissue RNA-seq data are highly consistent with experimentally
   measured mRNA stability changes in cancer cell line models.
   Specifically, we used time-series measurements of 4-thiouridine-labeled
   RNA^[94]23 from the MDA-MB-231 cell line, a model of breast cancer, as
   well as the highly invasive MDA-LM2 cells to identify mRNAs that are
   differentially stable between these two cell lines (Fig. [95]2e, see
   Methods for details; measurements are provided in Supplementary
   Data [96]3a). We then compared these experimental stability
   measurements to RNA-seq-based differential stability estimates between
   highly metastatic and poorly metastatic PDX models of breast
   cancer^[97]24–[98]26. We observed that the mRNAs that are more stable
   in the invasive MDA-LM2 cell line (based on experimental stability
   measurements) are also overall more stable in the highly metastatic
   PDXs compared to the poorly metastatic PDX (based on DiffRAC analysis
   of tissue RNA-seq data). Similarly, mRNAs that are less stable in the
   MDA-LM2 cell line are overall less stable in the poorly metastatic PDX
   (Fig. [99]2f; measurements are provided in Supplementary Data [100]3b).

   Interestingly, we found that the mRNAs that are more stable in primary
   breast tumours compared to normal tissue (based on DiffRAC analysis of
   TCGA data) are also overall more stable in the highly invasive LM2 line
   compared to the parental MDA line, and tumour-destabilized mRNAs are
   overall less stable in the LM2 line (Fig. [101]2g). This concordance
   can also be observed at the pathway level: two of the three pathways
   that were upregulated in breast tumours based on DiffRAC estimates also
   appear to be enriched among mRNAs that are stabilized in MDA-LM2
   compared to MDA-MB-231 cell lines (MYC targets and mTORC1 signaling,
   Fig. [102]2h; example genes are shown in Fig. [103]2i), supporting a
   role of mRNA stability in deregulation of these key pathways.

   Since the MDA-LM2 line is more invasive than MDA-MB-231, the above
   analysis suggests that, at least in breast cancer, normal-to-tumour
   stability changes persist during the progression of the disease to
   metastasis. To understand whether normal-to-tumour stability changes
   are correlated with progression-associated stability changes across
   other cancers, we used DiffRAC to examine the effect of tumour stage
   and grade on mRNA stability in each TCGA cancer type, by including
   stage/grade (as numerical variables) in DiffRAC’s GLM design while
   controlling for the confounding effects of age, sex and tumour purity
   (Supplementary Data [104]4). The differential stability results
   therefore reflect the change in stability that occurs as tumour stage
   or grade increases. We identified a total of 1966 transcripts with
   significant stability changes associated with tumour stage in at least
   one of the 11 cancers types that we analysed (Supplementary
   Data [105]5a), and 2013 transcripts whose stability was associated with
   tumour grade in at least one of the four cancer types for which this
   type of classification was available (Supplementary Data [106]6). We
   observed highly cancer-specific associations both for stage and grade
   (Fig. [107]3a). Importantly, we found that in most cases the stage- and
   grade-associated stability changes correlate with normal-to-tumour
   stability changes (Fig. [108]3b shows an example, with the overall
   results summarized in Fig. [109]3c).

Fig. 3. Stage- and grade-associated mRNA stability changes.

   [110]Fig. 3
   [111]Open in a new tab

   a The mRNA stability changes associated with tumour stage (left) and
   grade (right) across TCGA cancers. Genes with significant changes in at
   least one cancer at FDR < 0.05 are included. The colour gradient is the
   same as in Fig. [112]2b. b Comparison of the differential mRNA
   stability between tumour and normal (x-axis) and stage-associated
   differential mRNA stability (y-axis), in the TCGA-LIHC dataset as an
   example. Genes with significant changes along both axes at FDR < 0.05
   are coloured in blue. Pearson correlation coefficients and confidence
   intervals for all genes (black) and significant ones (blue) are shown
   on top. Panel (c) summarizes the Pearson correlations for significant
   genes in other cancer types (error bars represent the confidence
   intervals). d Schematic illustration of the model used for deconvolving
   the stage-associated changes in malignant and non-malignant cells. The
   equation on top represents the model used, with the interpretation of
   model coefficients shown on the plot. See Methods for details. e
   Stability changes associated with tumour stage across TCGA cancers that
   could be assigned to cancerous/pre-cancerous cells. Genes with
   significant DiffRAC results in at least one cancer (FDR < 0.05) are
   included. The colour gradient is the same as in Fig. [113]2b, with the
   exception that the log[2] fold-change of mRNA stability ranges from −1
   to 1 here. f Comparison of the stage-associated differential mRNA
   stability in non-cancerous cells (x-axis, left) or
   cancerous/pre-cancerous cells (x-axis, right) to the non-deconvoluted
   estimates (y-axis) in the TCGA-KIRC dataset. Pearson correlation
   coefficients and p values are shown on the plot. Panel (g) shows this
   Pearson correlations across all cancer types for non-cancerous (gray)
   and cancerous/pre-cancerous cells (red). Error bars represent the 95%
   confidence intervals. h The Pearson correlation between tumour vs.
   normal (T/N) differential stability and the deconvoluted
   stage-associated differential mRNA stability. Only cancer types with at
   least 5 significant deconvoluted stage-associated genes are shown.
   Error bars represent the 95% confidence intervals.

   We note that disease progression is often accompanied by substantial
   cell composition changes, which may confound the estimation of
   stage/grade-associated stability changes from bulk RNA-seq data.
   However, previous research has shown that cell type-specific gene
   expression changes can be identified from bulk RNA-seq data^[114]27. We
   implemented a similar design using DiffRAC to deconvolve the
   stage-associated stability changes occurring specifically in the
   malignant cells from those occurring in the tumour microenvironment, as
   well as changes that simply reflect cell composition differences
   (Fig. [115]3d, see Methods for details). We identified 275 genes whose
   stage-associated mRNA stability changes were confidently attributed to
   dysregulation in malignant cells (Fig. [116]3e and Supplementary
   Data [117]5b). With the exception of one cancer type, the
   stage-associated stability changes inferred from the tumour bulk were
   better correlated with the deconvoluted changes attributed to malignant
   cells compared to those of tumour microenvironment (Fig. [118]3f, g).
   Stage-associated changes that could be attributed to malignant cells
   were also positively correlated with tumour-to-normal changes in most
   cancer types (Fig. [119]3h). Taken together, these results highlight
   widespread mRNA stability changes in tumours, which affect key
   cancer-related pathways and continue to remodeling of the transcriptome
   in malignant cells through disease progression.

RNA-binding proteins play a key role in shaping the tumour mRNA stability
profile

   RNA-binding proteins (RBPs) and microRNAs (miRNAs) are the key
   regulators of mRNA stability. These sequence-specific factors primarily
   affect RNA stability through binding to the 3ʹ untranslated region
   (UTR) of their targets–RBPs either stabilize or destabilize their
   targets^[120]28, while miRNAs primarily destabilize their target
   mRNAs^[121]29,[122]30. Starting with RBPs, we set out to examine
   whether these factors underlie the mRNA stability changes in cancer. We
   specifically tested for the enrichment of the targets of each RBP among
   mRNAs that are differentially stable between tumour and normal tissues,
   after correcting for the background frequency of RBP binding to each
   transcript (see Methods). Figure [123]4a shows an example, where the
   binding targets of the RBFOX1 protein are enriched among transcripts
   that are destabilized in glioblastoma multiforme (GBM), relative to the
   binding targets of other RBPs. We can quantify this enrichment by
   statistical modeling of the relationship between the binding of a
   specific RBP to the 3ʹ UTR of a transcript and the tumour-specific
   stability status of that transcript (Fig. [124]4b). We performed a
   systematic quantification of these relationships for 35 RBPs whose
   stability target sets (regulons) have been previously mapped based on
   the presence of their preferred binding sequences in the 3ʹ UTRs as
   well as the expression pattern of the candidate target genes^[125]28.
   This analysis revealed significantly enriched regulons among
   tumour-stabilized or destabilized mRNAs across different cancer types,
   representing deregulation of 17 out of the 35 examined RBPs in at least
   one cancer type (Fig. [126]4c). Importantly, we observed excellent
   agreement between cancer-associated RBP expression changes and RBP
   target enrichments, after taking into account the expected function of
   each RBP in stabilizing or destabilizing its targets (Pearson
   correlation 0.61; Fig. [127]4d). For example, SNRPA, which is an
   RNA-destabilizing factor^[128]28, is upregulated in multiple cancers,
   consistent with the observed destabilization of its regulon
   (Fig. [129]4c, d). This strong correlation highlights the reliability
   of our regulon analysis approach for identifying dysregulated RBPs, and
   suggests that aberrant expression of RBPs in cancer drives coordinated
   changes in the stability of their regulons.

Fig. 4. Enrichment of RBP binding sites among differentially stabilized mRNAs
in cancer.

   [130]Fig. 4
   [131]Open in a new tab

   a An example case showing the enrichment of RBFOX1 binding sites among
   differentially stabilized mRNAs in TCGA-GBM. Genes are binned by FDR of
   their DiffRAC differential mRNA stability between tumour and normal,
   with destabilized mRNAs on the left and stabilized mRNAs on the right.
   The relative frequency of RBFOX1 targets (circles) and targets of all
   other RBPs (solid line) is shown for each bin. b Schematic
   representation of the logistic regression approach for modeling the
   enrichment of RBFOX1 targets (relative to other RBPs) as a function of
   differential stability. c Heatmap summarizing the results of applying
   the model in panel (b) to all RBPs. Positive (red) and negative (blue)
   regression coefficients indicate enrichment of RBP targets among mRNAs
   that are stabilized and destabilized in cancer, respectively. The
   circle size represents the significance level. Significant associations
   between RBP binding and stability status are shown using black outlines
   (FDR < 0.05). d Comparison of the differential RBP expression (tumour
   vs normal) and cancer-associated regulon activity. Regulon activity is
   defined to be the same as the enrichment coefficients from panel (c),
   with the sign of the coefficient inverted for RBPs whose binding leads
   to RNA destabilization (based on ref. ^[132]28). Each dot represents
   one RBP in one cancer type. RBFOX1 regulon activities are highlighted.
   Pearson correlation of differential expression vs. differential regulon
   activity is 0.61.

   Among the RBPs we analysed, two RBPs, namely RBFOX1 and RBFOX3, stand
   out as being consistently deregulated across several cancer types.
   Specifically, the targets of these RBPs are enriched among destabilized
   mRNAs in almost half of all the cancer types we analysed
   (Fig. [133]4c). Consistent with the role of RBFOX proteins in promoting
   mRNA stability^[134]28,[135]31, both RBFOX1 and RBFOX3 are
   downregulated across multiple cancers (Fig. [136]5a, b), suggesting
   that downregulation of RBFOX proteins leads to destabilization of their
   targets. For both RBFOX1 and RBFOX3, the highest expression in normal
   tissues can be seen in the brain tissue; subsequently, the most
   prominent case of their downregulation as well as the most significant
   changes in the stability of their regulons can be seen in GBM,
   suggesting a major role in determining tumour transcriptome in this
   cancer type. However, their effect is not limited to GBM, especially
   for RBFOX3, which shows a broader range of expression in normal tissues
   and is also downregulated in a greater number of cancers
   (Fig. [137]5b).

Fig. 5. Aberrant activity of RBFOX proteins mediates stability changes across
multiple cancers.

   [138]Fig. 5
   [139]Open in a new tab

   a RBFOX1 expression across TCGA cancer types. The box plot (top) shows
   the RBFOX1 log[2](RSEM) gene expression, retrieved from Firebrowse
   ([140]http://firebrowse.org/), in normal tissue samples. The bar plot
   (bottom) illustrates the average log fold-change of RBFOX1 expression
   in tumours compared to normal samples (T vs. N; error bars represent
   SEM). b RBFOX3 expression in normal tissue samples and differential
   expression in tumours, similar to panel a. c Conservation of mouse
   RBFOX1 binding sites in humans for high-confidence Rbfox HITS-CLIP
   targets^[141]32. The heatmap shows the sequences of human orthologs of
   the mouse Rbfox binding sites (Rbfox motif hits on the mouse sequences
   were identified, and the orthologous regions were extracted using
   liftOver^[142]69. The consensus sequence from the human orthologs is
   shown underneath the heatmap. The RBFOX1 motif from RNAcompete^[143]28
   is also shown at the bottom. d Heatmap showing the stability of RBFOX
   HITS-CLIP targets (as defined above). Rows correspond to genes and
   columns to cancer types, with the latter sorted in the same order as
   panels (a, b). e Gene set enrichment analysis (GSEA)^[144]70 for RBFOX1
   inhibition in terminally differentiated neurons. Genes (x-axis) are
   sorted by the Wald test statistic of differential expression between
   RBFOX1 knockdown (KD) and control (Ctrl) cells, with vertical black
   lines demarcating the pan-cancer-destabilized set of RBFOX1 targets.
   The blue line represents the enrichment curve for this gene
   set^[145]70. f GSEA for RBFOX1 rescue in mouse neurons deficient for
   RBFOX proteins, similar to panel (e). g Venn diagram illustrating the
   overlap between the leading-edge^[146]70 set of genes downregulated by
   RBFOX1 knockdown (from e) and the leading-edge set of genes upregulated
   by RBFOX1 rescue (from f). h Volcano plot of differential gene
   expression in RBFOX1-overexpressing (OE) A172 cells. i Gene set
   enrichment analysis (GSEA) (Subramanian et al., 2005) for RBFOX1
   overexpression (OE) in the A172 human glioblastoma cell line. Genes
   (x-axis) are sorted by the log[2] fold change of differential mRNA
   stability between RBFOX1 overexpressing and control cells. The blue
   line represents the enrichment curve for RBFOX1 direct targets. j
   Similar to panel i, with the blue line representing the enrichment
   curve for genes involved in the calcium signalling pathway. k
   Differential gene expression in RBFOX1-overexpressing A172 cells (n = 3
   biological replicates) relative to controls (n = 3 biological
   replicates), shown for pan-cancer destabilized mRNAs that are bound by
   RBFOX1 (from panel d). Error bars represent the SEM.

   To confirm that the downregulation of RBFOX proteins accompanies
   destabilization of their direct binding targets in cancer, we used
   HITS-CLIP data of Rbfox proteins in whole brain tissue lysate of
   mice^[147]32 to build a high-confidence stability network of
   transcripts that have the strongest binding sites in their 3ʹ UTRs (see
   Methods). We confirmed that RBFOX binding sites identified from mouse
   HITS-CLIP data are conserved in human (Fig. [148]5c), and observed
   overall destabilization of the associated targets across different
   cancers (Fig. [149]5d). We noticed a subset of mRNAs that are
   consistently destabilized across the same cancers in which either
   RBFOX1 or RBFOX3 is downregulated (Fig. [150]5d). Interestingly, a
   subgroup of these mRNAs is stabilized in the few cancer types in which
   RBFOX1 is upregulated (e.g. genes with positive mRNA stability values
   for LUSC, LUAD and THCA in Fig. [151]5d), further supporting the notion
   that their cancer-associated stability changes are driven by RBFOX
   proteins.

   To verify that the stability of these mRNAs is regulated by RBFOX1, we
   examined the RNA-seq data from differentiated primary human neural
   progenitor (PHNP) cells in which RBFOX1 is knocked
   down^[152]33,[153]34. As expected, cancer-destabilized mRNAs that were
   associated with RBFOX1 were also downregulated upon RBFOX1 knockdown
   (Supplementary Data [154]7a and Fig. [155]5e). In contrast, when RBFOX1
   expression is restored ectopically in mouse neurons lacking RBFOX
   proteins^[156]31,[157]35, the expression of these genes is also rescued
   (Fig. [158]5f). We identified a core set of eight transcripts that have
   RBFOX binding site in their 3ʹ UTRs, are concurrently destabilized
   across cancers, are inhibited when RBFOX1 is knocked down, and are
   upregulated when RBFOX1 expression is rescued (Fig. [159]5g).
   Interestingly, half of these genes belong to the calcium signaling
   pathway (based on KEGG pathways^[160]36, Fisher’s exact test
   P < 10^−6), suggesting that deregulation of RBFOX proteins primarily
   affects calcium signaling in cancer cells.

   Finally, to validate the role of RBFOX1 downregulation in mediating
   mRNA stability changes in human glioblastoma cells and to investigate
   whether restoring RBFOX1 activity can rescue the destabilization of its
   target transcripts, we overexpressed RBFOX1 in the human glioblastoma
   cell line A172 (Supplementary Fig. [161]6) and performed RNA-seq. As
   expected, we observed widespread changes in gene expression
   (Fig. [162]5h and Supplementary Data [163]7b), with overall
   upregulation of the RBFOX1 regulon in the RBFOX1-overexpressing A172
   cell line (Fig. [164]5i). Consistent with the pathway analysis
   described above, we observed significant upregulation of calcium
   signaling pathway genes after RBFOX1 overexpression (Fig. [165]5j).
   Furthermore, the majority of pan-cancer destabilized mRNAs that are
   bound by RBFOX1 are upregulated in A172 cells after RBFOX1
   overexpression (Fig. [166]5k). These results suggest that RBFOX1
   downregulation in glioblastoma cells leads to destabilization of its
   targets, including calcium signaling pathways genes, which can be
   partially rescued through RBFOX1 overexpression.

Dysregulation of miRNA regulons shapes the cancer transcriptome

   To examine the contribution of miRNAs to the dysregulation of mRNA
   stability in cancer, we systematically searched for miRNAs whose
   targets are disproportionately dysregulated at the stability level in
   cancer, similar to the RBP analysis above (Methods). Figure [167]6a
   shows miR-122 as an example; miR-122 is the most abundant miRNA
   expressed in liver cells^[168]37, was previously shown to be
   downregulated in cholangiocarcinoma, and acts as a tumour suppressor
   via suppression of cell proliferation and induction of
   apoptosis^[169]38,[170]39. As expected, our regulon analysis indicates
   that miR-122 targets are predominantly stabilized specifically in
   cholangiocarcinoma tumours compared to normal tissue (Fig. [171]6a),
   consistent with reduced activity of miR-122. This observation is
   consistent with TCGA miRNA expression data, which show specific
   downregulation of miR-122 expression in cholangiocarcinoma
   (Supplementary Fig. [172]7). Systematic application of this
   network-based approach revealed that, out of 153 broadly conserved
   miRNA families, the regulons of 63 miRNAs are deregulated in at least
   one cancer type, suggesting widespread disruption of miRNA networks
   (Fig. [173]6b).

Fig. 6. Dysregulation of miRNA regulons in cancer.

   [174]Fig. 6
   [175]Open in a new tab

   a An example case showing the enrichment of miR-122 targets among mRNAs
   stabilized in TCGA-CHOL tumours (relative to normal), similar to
   Fig. [176]4a. Note that since miRNAs are expected to destabilize their
   targets, enrichment in differentially stable mRNAs indicates
   downregulation of miRNA activity. b Heatmap summarizing enrichment
   analysis for all miRNAs across all cancer types, similar to
   Fig. [177]4c. c Enrichment of miR-29 targets among genes that are
   downregulated after transfection of miR-29 mimic in 786-O cells (n = 1)
   relative to control (n =1). The volcano plot (bottom) summarizes
   differential expression results between miR-29 mimic and control; the
   dot plot at the top shows enrichment of miR-29 targets at bins of
   differentially expressed genes, similar to panel (a). In the volcano
   plot, significantly differentially expressed genes (FDR < 0.05) are
   shown in red. d Enrichment of miR-29 binding sites, relative to other
   miRNA binding sites, in genes categories defined by their differential
   mRNA stability in TCGA-KIRC and differential expression after miR-29
   mimic expression in 786-O cells. Each dot represents a gene, and those
   with a black outline contain at least one miR-29 binding site. The
   colour gradient represents the log-odds of miR-29 binding site
   enrichment in each quarter. P values are based on Fisher’s exact test.
   Also see Supplementary Fig. [178]8 for miR-29 mimic expression in 786-O
   cells. e Similar to panel (d), but using differential expression after
   miR-29 inhibition in ACHN cells. f Venn diagram illustrating the
   overlap of genes that are bound by miR-29, upregulated in KIRC,
   downregulated after miR-29-mimic treatment of 786-O and A-498 cells,
   and upregulated after miR-29 inhibition in ACHN cells. g Differential
   expression of the 53 genes identified in panel (f), in 786-O or A-498
   cells expressing a miR-29 mimic, or in ACHN cells expressing a miR-29
   inhibitor. Error bars represent the SEM. Genes that are bold correspond
   to ECM genes (based on overlap with GO), and those with an asterisk are
   markers of embryonal carcinoma (based on StemCheker (Pinto et al.,
   2015)).

   Of interest, we observed that miR-29 targets are recurrently stabilized
   across more than half of the cancer types we analysed, suggesting a
   pan-cancer decrease in miR-29 activity. Among these cancer types, the
   miR-29 regulon showed the most significant enrichment among stabilized
   mRNAs in UCEC and KIRC (clear cell renal cell carcinoma), suggesting a
   major role in post-transcriptional remodeling in these cancer types. To
   understand whether restoring miR-29 activity can reverse these
   post-transcriptional changes, we expressed a miR-29 mimic in 786-O and
   A-498 cells, which are models for KIRC (Supplementary Fig. [179]8). As
   expected, expression of miR-29 mimic resulted in global downregulation
   of the miR-29 regulon (Fig. [180]6c, Supplementary Fig. [181]9a, and
   Supplementary Data [182]8a, b). Importantly, miR-29 mimic expression
   leads to downregulation of the majority of mRNAs that are significantly
   stabilized in KIRC (Fig. [183]6d and Supplementary Fig. [184]9b), most
   of which have a miR-29 binding site in their 3ʹ UTRs. Conversely,
   miR-29 inhibition in the ACHN cell line (also a model for KIRC)
   reversed these patterns, with a global upregulation of miR-29 targets
   (Supplementary Fig. [185]10 and Supplementary Data [186]8c), and
   upregulation of transcripts that are stabilized in KIRC and potentially
   targeted by miR-29 (Fig. [187]6e). Together, these results suggest that
   miR-29 downregulation has a widespread effect on the stability of
   transcripts in cancer, while restoring its activity partially rescues
   the normal mRNA stability landscape of the cell.

Discussion

   By quantifying differential mRNA stability patterns across 18 cancer
   types, our study presents a systematic resource for mining the
   post-transcriptional landscape of cancer. Importantly, our results
   uncovered recurrent changes in the stability of >13,000 mRNAs in at
   least one cancer type, highlighting the widespread role of
   post-transcriptional regulation in shaping the cancer transcriptome. We
   note that this resource also provides an approximation for the relative
   contribution of transcriptional and post-transcriptional events in
   shaping cancer transcriptome: on average, 19% of genes that are
   significantly upregulated at the expression level are detected by
   DiffRAC as significantly stabilized in tumours, and 23% of genes with
   significantly reduced expression are detected as significantly
   destabilized. In comparison, 66% and 61% of genes whose expression is
   significantly up- or downregulated are detected as transcriptionally
   activated or inhibited in tumours, respectively (Supplementary
   Fig. [188]11). We note that about 57% of the variability in the number
   of differentially stabilized genes across cancer types appears to be
   attributed to sample size, suggesting that our analysis may be
   underpowered for smaller cancer cohorts (Supplementary Fig. [189]12).
   Nonetheless, these results suggest an important role for
   post-transcriptional changes in shaping the cancer transcriptome, with
   recurrent changes that are ~30% as frequent as transcriptional events.

   Our study also highlights the coordinated post-transcriptional
   deregulation of genes that are involved in the same pathways. Notably,
   we observed recurrent stabilization of mRNAs that encode
   epithelial-mesenchymal transition (EMT) proteins and MYC targets across
   multiple cancer types. EMT is the process by which epithelial cells
   lose their apical-basal polarity and cell–cell adhesion, and instead
   acquire mesenchymal properties such as migratory and invasive
   potentials^[190]40; our results suggest that activation of the EMT
   pathway in cancer is at least partly mediated by post-transcriptional
   upregulation. Similarly, we observed post-transcriptional upregulation
   of MYC targets, which include growth-related genes that directly
   contribute to tumourigenesis^[191]41. MYC is a well-defined
   transcription factor and represents one of the most frequently
   amplified oncogenes^[192]42, leading to transcriptional activation of
   its targets in cancer. Therefore, our intriguing observation that MYC
   targets are also upregulated at the mRNA stability level suggests the
   presence of convergent transcriptional and post-transcriptional
   mechanisms that modulate overlapping gene sets. Furthermore, we
   observed coordinated destabilization of mRNAs for genes implicated in
   oxidative phosphorylation (OXPHOS) and related pathways such as fatty
   acid metabolism and adipogenesis, consistent with the well-documented
   Warburg effect in which upregulation of glucose consumption and
   glycolysis is accompanied by a downregulation of OXPHOS^[193]43.

   In addition, we observed widespread and coordinated
   post-transcriptional modulation of the targets of RNA-binding proteins
   (RBPs) in cancer, with the RBFOX family of RBPs standing out as having
   the most recurrently downregulated regulon across multiple cancer
   types. RBFOX proteins are known regulators of alternative splicing and
   mRNA stability^[194]28 and have been implicated in a number of
   neurological diseases^[195]17,[196]31,[197]44, but their role in cancer
   is less characterized. Nonetheless, at least the RBFOX1 locus appears
   to be among the most frequently deleted loci across different cancer
   types^[198]45,[199]46, with its deletion^[200]47 or other genetic
   defects^[201]48 being associated with poor survival. Our study suggests
   that downregulation of RBFOX proteins leads to destabilization of their
   target transcripts in tumours; many of these transcripts encode
   proteins involved in calcium signaling, a critical pathway that affects
   a wide range of cancer-associated processes such as proliferation,
   invasion, and apoptosis^[202]49. The association between RBFOX1 and
   calcium signaling is also supported by previous literature that shows a
   positive effect of RBFOX1 on the expression of some of the genes
   involved in this pathway^[203]50. We note that the RBFOX family of
   proteins includes RBFOX1, RBFOX2, and RBFOX3; however, RBFOX1 and
   RBFOX3 show the greatest extent of downregulation across different
   tumours (>60-fold, Fig. [204]5a, b), whereas RBFOX2 shows comparatively
   moderate downregulation (~3-fold, Supplementary Fig. [205]13).
   Furthermore, RBFOX2 does not show significant correlation with the
   expression of the mRNAs that contain the RBFOX-binding consensus
   sequence^[206]28. Taken together, these observations suggest that
   RBFOX1/3 are the most likely candidates driving dysregulation of the
   RBFOX regulon in cancer.

   In addition to RBPs, our results also highlight cancer type-specific
   deregulation of mRNA stability by miRNAs, with miR-29 standing out as a
   pan-cancer stability factor. Our observations are in line with previous
   studies showing that different miR-29 isoforms act as tumour
   suppressors and are downregulated in several cancer
   types^[207]51,[208]52, affecting cell proliferation, differentiation
   and apoptosis^[209]53. This downregulation correlates with more
   aggressive forms of cancer, characterized by increased metastasis,
   invasion and relapse^[210]54, and therapeutic restoration of miR-29 was
   suggested to improve disease prognosis^[211]55. In line with these
   reports, we observed pan-cancer stabilization of miR-29 targets,
   suggesting widespread reduction in miR-29 activity in cancer, which
   could be partially reversed by miR-29 rescue. We note that our results
   highlight a core set of 53 mRNAs that are miR-29 targets, stabilized at
   least in KIRC, downregulated after restoring miR-29 activity in the
   KIRC model cell lines 786-O and A-498, and upregulated after miR-29
   inhibition in ACHN cells (Fig. [212]6f). Importantly, seven of these
   genes are markers of embryonal carcinoma, suggesting that miR-29
   inhibition is essential for activation of an embryonic-like program in
   cancer (Fig. [213]6g). In addition, we observed a significant
   enrichment of the extracellular matrix (ECM) genes (Fig. [214]6g),
   suggesting that miR-29 inhibition also contributes to ECM remodeling in
   cancer, consistent with previous reports on ECM regulation by
   miR-29^[215]56.

   It should be noted that various pathways may affect mRNA stability and
   its estimates. For example, disruptions in the nonsense-mediated decay
   (NMD) pathway affects the translation-dependent stability of a wide
   range of mRNAs^[216]57. Since most of the affected transcripts are
   likely spliced^[217]58, such changes are expected to be properly
   captured by our analysis of spliced/unspliced transcript ratios.
   However, analysis of spliced/unspliced transcript ratios may not be
   suitable for studying NMD-dependent clearance of unspliced cytoplasmic
   transcripts^[218]59. Other proteins involved in the RNA decay pathway
   are also expected to influence mRNA stability, although we were not
   able to detect a significant association between the degree of RNA
   stability disruption and somatic alterations in RNA decay pathway
   proteins (Supplementary Fig. [219]14). While RNA surveillance pathways
   such as NMD and general RNA decay proteins affect mRNA stability
   globally, in this work we chose to focus on regulon-specific
   disruptions caused by abnormal activity of RBPs and miRNAs. We note
   that different mechanisms may underlie the observed disruption in the
   RBP/miRNA regulons in cancer, including changes in the expression
   levels of these regulatory factors, mutations, post-translational
   modifications in the case of RBPs, disruption of miRNA biogenesis,
   competition/cooperation with other regulatory factors, and
   enhanced/restricted access to binding sites on target transcripts.
   However, at least in the case of RBPs, we observed a strong correlation
   between their expression and regulon activity in cancer (Fig. [220]4d),
   suggesting that disruption of the expression of RBPs is most likely the
   dominant mechanism underlying the dysregulation of their regulons.

   Together, these results highlight a key role for mRNA stability
   programs, mediated by RBPs and miRNAs, in regulation of pathways that
   are integral to cancer development and progression. While the vast
   majority of current literature is focused on the role of
   transcriptional mechanisms in reprogramming cancer cells, this study
   underlines a critical and largely uncharacterized role for
   post-transcriptional remodeling of the cancer cell transcriptome, and
   provides a resource for exploring post-transcriptional pathways in
   cancer.

Methods

Joint modelling of intronic and exonic read counts and mRNA stability

   Our approach for statistical modeling of intronic and exonic read
   counts builds on previous research that connects the abundance of
   pre-mRNA and mature mRNA to mRNA stability (Supplementary Fig. [221]1a,
   b):
   [MATH: <mi>log</mi><mspace width="0.25em"></mspace><mi
   mathvariant="bold-italic">m</mi><mo>=</mo><mi>b</mi><mo>×</mo><mi>log</
   mi><mspace width="0.25em"></mspace><mi
   mathvariant="bold-italic">p</mi><mo>+</mo><mi>log</mi><mspace
   width="0.25em"></mspace><mi>φ</mi><mo>+</mo><mi>log</mi><mspace
   width="0.25em"></mspace><mi mathvariant="bold-italic">γ</mi> :MATH]
   1

   here, m corresponds to the vector of the mature mRNA abundance for a
   given gene across different samples, p is the abundance of the
   pre-mature mRNA, γ is the mRNA stability across samples, φ is the
   maximum processing rate of RNA, and b is the bias-term (Supplementary
   Fig. [222]1b). Vectors are differentiated from scalars using bold
   typeface.

   We further model the logarithm of mRNA stability as a linear function
   of a set of sample-level variables:
   [MATH: <mi>log</mi><mspace width="0.25em"></mspace><mi
   mathvariant="bold-italic">γ</mi><mo>=</mo><mi
   mathvariant="bold-italic">X</mi><mo>×</mo><mi
   mathvariant="bold-italic">β</mi><mo>+</mo><mi>α</mi> :MATH]
   2

   here, X is the n × k matrix of sample-level variables (for n samples
   and k variables), β is the vector of coefficients that quantify the
   effect of each variable on the mRNA stability, and α is an intercept
   (matrices are differentiated from vectors using capital letters). This
   leads to:
   [MATH: <mi>log</mi><mspace width="0.25em"></mspace><mi
   mathvariant="bold-italic">m</mi><mo>=</mo><mi>b</mi><mo>×</mo><mi>log</
   mi><mspace width="0.25em"></mspace><mi
   mathvariant="bold-italic">p</mi><mo>+</mo><mi>c</mi><mo>+</mo><mi
   mathvariant="bold-italic">X</mi><mo>×</mo><mi
   mathvariant="bold-italic">β</mi> :MATH]
   3

   where c = log φ + α. We model the mean of intronic read counts for a
   given gene across samples as a function of the pre-mRNA abundance for
   that gene, a gene-level scaling factor that can be interpreted as the
   effective length, and a sample-specific scaling factor that can be
   interpreted as library size (Fig. [223]1b):
   [MATH: <msup><mrow><mi
   mathvariant="bold-italic">λ</mi></mrow><mrow><mi>i</mi><mi>n</mi><mi>t<
   /mi></mrow></msup><mo>=</mo><mi
   mathvariant="bold-italic">p</mi><mo>×</mo><mi>l</mi><mo>×</mo><msup><mr
   ow><mi
   mathvariant="bold-italic">s</mi></mrow><mrow><mi>i</mi><mi>n</mi><mi>t<
   /mi></mrow></msup> :MATH]
   4

   here, int stands for intronic, λ represents the mean read count, l is
   the gene-specific scaling factor, and s is the sample-specific scaling
   factor. Similarly, the mean of exonic read counts for a given gene
   across samples can be expressed as:
   [MATH: <msup><mrow><mi
   mathvariant="bold-italic">λ</mi></mrow><mrow><mi>e</mi><mi>x</mi><mi>o<
   /mi></mrow></msup><mo>=</mo><mi
   mathvariant="bold-italic">m</mi><mo>×</mo><msup><mrow><mi>l</mi></mrow>
   <mrow><mo>′</mo></mrow></msup><mo>×</mo><msup><mrow><mi
   mathvariant="bold-italic">s</mi></mrow><mrow><mi>e</mi><mi>x</mi><mi>o<
   /mi></mrow></msup> :MATH]
   5

   The above equations can be collectively expressed by matrix operations
   as:
   [MATH: <mi>log</mi><mfenced close="]" open="["><mrow><mtable><mtr><mtd
   columnalign="center"><msup><mrow><mi
   mathvariant="bold-italic">λ</mi></mrow><mrow><mi>i</mi><mi>n</mi><mi>t<
   /mi></mrow></msup></mtd></mtr><mtr><mtd
   columnalign="center"><msup><mrow><mi
   mathvariant="bold-italic">λ</mi></mrow><mrow><mi>e</mi><mi>x</mi><mi>o<
   /mi></mrow></msup></mtd></mtr></mtable></mrow></mfenced><mo>=</mo><mi>l
   og</mi><mfenced close="]" open="["><mrow><mtable><mtr><mtd
   columnalign="center"><msup><mrow><mi
   mathvariant="bold-italic">s</mi></mrow><mrow><mi>i</mi><mi>n</mi><mi>t<
   /mi></mrow></msup></mtd></mtr><mtr><mtd
   columnalign="center"><msup><mrow><mi
   mathvariant="bold-italic">s</mi></mrow><mrow><mi>e</mi><mi>x</mi><mi>o<
   /mi></mrow></msup></mtd></mtr></mtable></mrow></mfenced><mo>+</mo><msup
   ><mrow><mi
   mathvariant="bold-italic">X</mi></mrow><mrow><mo>′</mo></mrow></msup><m
   i mathvariant="bold-italic">×</mi><mfenced close="]"
   open="["><mrow><mtable><mtr><mtd
   columnalign="center"><mi>log</mi><mspace
   width="0.25em"></mspace><msup><mrow><mi
   mathvariant="bold-italic">p</mi></mrow><mrow><mo>′</mo></mrow></msup></
   mtd></mtr><mtr><mtd
   columnalign="center"><msup><mrow><mi>c</mi></mrow><mrow><mo>′</mo></mro
   w></msup></mtd></mtr><mtr><mtd columnalign="center"><mi
   mathvariant="bold-italic">β</mi></mtd></mtr></mtable></mrow></mfenced>
   :MATH]
   6

   where
   [MATH: <msup><mrow><mi
   mathvariant="bold-italic">X</mi></mrow><mrow><mo>′</mo></mrow></msup><m
   o>=</mo><mfenced close="]" open="["><mrow><mtable><mtr><mtd
   columnalign="center"><msub><mrow><mi
   mathvariant="bold-italic">I</mi></mrow><mrow><mi>n</mi></mrow></msub></
   mtd><mtd columnalign="center"><msub><mrow><mi
   mathvariant="bold">0</mi></mrow><mrow><mi>n</mi><mo>×</mo><mn>1</mn></m
   row></msub></mtd><mtd columnalign="center"><msub><mrow><mi
   mathvariant="bold">0</mi></mrow><mrow><mi>n</mi><mo>×</mo><mi>k</mi></m
   row></msub></mtd></mtr><mtr><mtd
   columnalign="center"><msub><mrow><mi>b</mi><mo>×</mo><mi
   mathvariant="bold-italic">I</mi></mrow><mrow><mi>n</mi></mrow></msub></
   mtd><mtd columnalign="center"><msub><mrow><mi
   mathvariant="bold">1</mi></mrow><mrow><mi>n</mi><mo>×</mo><mn>1</mn></m
   row></msub></mtd><mtd columnalign="center"><msub><mrow><mi
   mathvariant="bold-italic">X</mi></mrow><mrow><mi>n</mi><mo>×</mo><mi>k<
   /mi></mrow></msub></mtd></mtr></mtable></mrow></mfenced> :MATH]
   7

   and p’ = p × l, c’ = c + log(l’) − b × log(l), and I is the identity
   matrix (matrix dimensions are indicated as subscripts). These equations
   connect pre-/mature mRNA abundance and mRNA stability to the observed
   intronic and exonic read counts for each given gene (see Supplementary
   Fig. [224]1c, d for matrix equations that consider all genes at the
   same time). This formulation enables the estimation of unknown
   parameters using a generalized linear model with a log-link function.
   In this study, we use DESeq2^[225]60 to fit the unknown parameters of
   this model, as explained below.

   It should be noted that changes in the ratio of spliced/unspliced
   mRNAs, and ultimately in the observed intronic and exonic read counts,
   may arise from a wide array of pathways affecting decay of pre-mRNAs or
   mature mRNAs in different manners. However, previous research has
   demonstrated that nuclear decay of pre-mRNAs does not affect the ratio
   of exonic/intronic reads^[226]17 (Supplementary Fig. [227]1b). This
   indicates that mechanisms affecting pre-mRNA levels do not lead to a
   substantial change in the final ratio of spliced/unspliced mRNAs as
   long as the pre-mRNA remains a potential substrate for the splicing
   machinery, since a change at the pre-mRNA level leads to an equivalent
   change at the mature mRNA level and, therefore, does not affect the
   ratio. The estimates of differential stability generated in this study
   therefore represent mostly the effect of change in degradation
   occurring at the mature mRNA levels.

   Different RNA selection methods can also affect the intronic read
   counts. Poly(A)-selected RNA will lead to a lower proportion of
   intronic reads compared to rRNA-depleted RNA. In the current study, we
   made use of several poly(A)-selected datasets, including the RNA-seq
   data from TCGA. However, since all samples in each dataset were
   analysed using the same method, the estimates are all affected in a
   similar manner across the sample types and cancer types. We note that
   poly(A)-selected RNA has previously been shown to produce sufficient
   intronic reads for stability estimation^[228]15. In addition, the large
   number of samples included in this study most likely mitigates any
   statistical power loss that results from lower amount of intronic
   reads.

Estimation of the effect of sample variables on mRNA stability

   The above equations allow us to estimate the distribution of latent
   variables log p’, c’, and β by fitting the model to observed intronic
   and exonic read counts. For this purpose, we use the matrix X’ as the
   design matrix in a DESeq2 model. In practice, we replace the first
   column of X’ with an intercept (Fig. [229]1c), which is an equivalent
   design matrix and does not change the interpretation of β, but enables
   the user to employ a beta prior (if desired) when fitting the DESeq2
   model.

   In order to be able to construct X’, the bias term b needs to be first
   estimated. We do this by first optimizing b in order to maximize the
   likelihood of observed intronic and exonic read counts across all genes
   in a model that assumes the mRNA stability is a gene-specific constant.
   Specifically, we use the below design matrix D to fit the model using
   DESeq2, while varying the value of b in the interval [0,1] to select
   the b that maximizes the sum of log-likelihood of the data across all
   genes:
   [MATH: <msup><mrow><mi
   mathvariant="bold-italic">D</mi></mrow><mrow><mo>′</mo></mrow></msup><m
   o>=</mo><mfenced close="]" open="["><mrow><mtable><mtr><mtd
   columnalign="center"><msub><mrow><mi
   mathvariant="bold-italic">I</mi></mrow><mrow><mi>n</mi></mrow></msub></
   mtd><mtd columnalign="center"><msub><mrow><mi
   mathvariant="bold">0</mi></mrow><mrow><mi>n</mi><mo>×</mo><mn>1</mn></m
   row></msub></mtd></mtr><mtr><mtd
   columnalign="center"><msub><mrow><mi>b</mi><mo>×</mo><mi
   mathvariant="bold-italic">I</mi></mrow><mrow><mi>n</mi></mrow></msub></
   mtd><mtd columnalign="center"><msub><mrow><mi
   mathvariant="bold">1</mi></mrow><mrow><mi>n</mi><mo>×</mo><mn>1</mn></m
   row></msub></mtd></mtr></mtable></mrow></mfenced> :MATH]
   8

   we use the ‘optimize’ function in R to select the optimal value of b.
   Once this optimal value is identified, it is used in the matrix X’ (see
   above), which is then used as the design matrix in DESeq2 to estimate
   the latent variables, including β (i.e. the effect of each variable on
   stability). This procedure is implemented in DiffRAC
   ([230]https://github.com/csglab/DiffRAC).

A modified design to accommodate larger sample sizes

   A major limitation of this approach is the considerable increase in
   computing time with larger sample sizes when DESeq2 is used to fit the
   model, since the model includes sample-specific latent variables for
   pre-mRNA abundance. To accommodate these cases, we have also
   implemented a model that assumes that most of the variance in pre-mRNA
   abundance can be explained by the experimental variables, instead of
   including sample-specific latent variables:
   [MATH: <mi>log</mi><mi mathvariant="bold-italic">p</mi><mo>=</mo><mi
   mathvariant="bold-italic">X</mi><mo>×</mo><mi
   mathvariant="bold-italic">ω</mi><mo>+</mo><mi>ρ</mi> :MATH]
   9

   Here, ω is the vector of coefficients that represent the effect of each
   variable on the pre-mRNA abundance of a given gene, and ρ is a
   gene-specific intercept. There, we also have:
   [MATH: <mi>log</mi><mi
   mathvariant="bold-italic">m</mi><mo>=</mo><mi>b</mi><mo>×</mo><mfenced
   close=")" open="("><mrow><mi
   mathvariant="bold-italic">X</mi><mo>×</mo><mi
   mathvariant="bold-italic">ω</mi><mo>+</mo><mi>ρ</mi></mrow></mfenced><m
   o>+</mo><mi>c</mi><mo>+</mo><mi
   mathvariant="bold-italic">X</mi><mo>×</mo><mi
   mathvariant="bold-italic">β</mi> :MATH]
   10

   This leads to a modified set of matrix equations (Supplementary
   Fig. [231]3a–c) that connect intronic/exonic read counts to sample
   variables:
   [MATH: <mi>log</mi><mfenced close="]" open="["><mrow><mtable><mtr><mtd
   columnalign="center"><msup><mrow><mi
   mathvariant="bold-italic">λ</mi></mrow><mrow><mi>i</mi><mi>n</mi><mi>t<
   /mi></mrow></msup></mtd></mtr><mtr><mtd
   columnalign="center"><msup><mrow><mi
   mathvariant="bold-italic">λ</mi></mrow><mrow><mi>e</mi><mi>x</mi><mi>o<
   /mi></mrow></msup></mtd></mtr></mtable></mrow></mfenced><mo>=</mo><mi>l
   og</mi><mfenced close="]" open="["><mrow><mtable><mtr><mtd
   columnalign="center"><msup><mrow><mi
   mathvariant="bold-italic">s</mi></mrow><mrow><mi>i</mi><mi>n</mi><mi>t<
   /mi></mrow></msup></mtd></mtr><mtr><mtd
   columnalign="center"><msup><mrow><mi
   mathvariant="bold-italic">s</mi></mrow><mrow><mi>e</mi><mi>x</mi><mi>o<
   /mi></mrow></msup></mtd></mtr></mtable></mrow></mfenced><mo>+</mo><msup
   ><mrow><mi
   mathvariant="bold-italic">X</mi></mrow><mrow><mo>′</mo></mrow></msup><m
   i mathvariant="bold-italic">×</mi><mfenced close="]"
   open="["><mrow><mtable><mtr><mtd columnalign="center"><mtable><mtr><mtd
   columnalign="center"><mi>ρ</mi><mo>′</mo></mtd></mtr><mtr><mtd
   columnalign="center"><mi
   mathvariant="bold-italic">ω</mi></mtd></mtr></mtable></mtd></mtr><mtr><
   mtd columnalign="center"><mtable><mtr><mtd
   columnalign="center"><msup><mrow><mi>c</mi></mrow><mrow><mo>′</mo></mro
   w></msup></mtd></mtr><mtr><mtd columnalign="center"><mi
   mathvariant="bold-italic">β</mi></mtd></mtr></mtable></mtd></mtr></mtab
   le></mrow></mfenced> :MATH]
   11

   where
   [MATH: <msup><mrow><mi
   mathvariant="bold-italic">X</mi></mrow><mrow><mo>′</mo></mrow></msup><m
   o>=</mo><mfenced close="]" open="["><mrow><mtable><mtr><mtd
   columnalign="center"><mtable><mtr><mtd
   columnalign="center"><msub><mrow><mi
   mathvariant="bold">1</mi></mrow><mrow><mi>n</mi><mo>×</mo><mn>1</mn></m
   row></msub></mtd><mtd columnalign="center"><msub><mrow><mi
   mathvariant="bold-italic">X</mi></mrow><mrow><mi>n</mi><mo>×</mo><mi>k<
   /mi></mrow></msub></mtd></mtr><mtr><mtd
   columnalign="center"><msub><mrow><mi
   mathvariant="bold">1</mi></mrow><mrow><mi>n</mi><mo>×</mo><mn>1</mn></m
   row></msub></mtd><mtd
   columnalign="center"><mi>b</mi><mo>×</mo><msub><mrow><mi
   mathvariant="bold-italic">X</mi></mrow><mrow><mi>n</mi><mo>×</mo><mi>k<
   /mi></mrow></msub></mtd></mtr></mtable></mtd><mtd
   columnalign="center"><mtable><mtr><mtd
   columnalign="center"><msub><mrow><mi
   mathvariant="bold">0</mi></mrow><mrow><mi>n</mi><mo>×</mo><mn>1</mn></m
   row></msub></mtd><mtd columnalign="center"><msub><mrow><mi
   mathvariant="bold">0</mi></mrow><mrow><mi>n</mi><mo>×</mo><mi>k</mi></m
   row></msub></mtd></mtr><mtr><mtd columnalign="center"><msub><mrow><mi
   mathvariant="bold">1</mi></mrow><mrow><mi>n</mi><mo>×</mo><mn>1</mn></m
   row></msub></mtd><mtd columnalign="center"><msub><mrow><mi
   mathvariant="bold-italic">X</mi></mrow><mrow><mi>n</mi><mo>×</mo><mi>k<
   /mi></mrow></msub></mtd></mtr></mtable></mtd></mtr></mtable></mrow></mf
   enced> :MATH]
   12

   and ρ‘ = ρ + log l, and c’ = c + log(l’/l) + ρ × (b – 1). Similar to
   the previous section, X’ can be used as the design matrix for DESeq2 to
   estimate the latent variables, including ω and β.

   To construct X’, the bias-term b is chosen so that it maximizes the sum
   of log-likelihood of data across all genes in a model that assumes
   gene-specific constant stability, i.e. with the below design matrix D’:
   [MATH: <msup><mrow><mi
   mathvariant="bold-italic">D</mi></mrow><mrow><mo>′</mo></mrow></msup><m
   o>=</mo><mfenced close="]" open="["><mrow><mtable><mtr><mtd
   columnalign="center"><msub><mrow><mi
   mathvariant="bold">1</mi></mrow><mrow><mi>n</mi><mo>×</mo><mn>1</mn></m
   row></msub></mtd><mtd columnalign="center"><msub><mrow><mi
   mathvariant="bold-italic">X</mi></mrow><mrow><mi>n</mi><mo>×</mo><mi>k<
   /mi></mrow></msub></mtd><mtd columnalign="center"><msub><mrow><mi
   mathvariant="bold">0</mi></mrow><mrow><mi>n</mi><mo>×</mo><mn>1</mn></m
   row></msub></mtd></mtr><mtr><mtd columnalign="center"><msub><mrow><mi
   mathvariant="bold">1</mi></mrow><mrow><mi>n</mi><mo>×</mo><mn>1</mn></m
   row></msub></mtd><mtd
   columnalign="center"><mi>b</mi><mo>×</mo><msub><mrow><mi
   mathvariant="bold-italic">X</mi></mrow><mrow><mi>n</mi><mo>×</mo><mi>k<
   /mi></mrow></msub></mtd><mtd columnalign="center"><msub><mrow><mi
   mathvariant="bold">1</mi></mrow><mrow><mi>n</mi><mo>×</mo><mn>1</mn></m
   row></msub></mtd></mtr></mtable></mrow></mfenced> :MATH]
   13

   This simplified model is also implemented in DiffRAC. Overall, we see
   strong agreement between DiffRAC’s estimates when using the two
   different models (i.e. sample-specific pre-mRNA abundances vs.
   condition-specific pre-mRNA abundances) on the same data (Supplementary
   Fig. [232]3d).

Differential RNA stability between NAT10 knockout and parental cells

   Raw BRIC sequencing (BRIC-seq) (5′-bromo-uridine [BrU]
   immunoprecipitation chase-deep sequencing analysis) reads for
   time-series measurements of BrU-pulsed RNAs in parental and NAT10^−/−
   HeLa cells^[233]20,[234]61 were obtained from GEO accession
   [235]GSE102113 (SRA accession SRP114504). This RNA-seq dataset
   represents time points 0, 2, 4, 8 and 16 h after a 24-hour treatment of
   cells with BrU (two replicates for each cell line at each time point).
   Reads were mapped to the GRCh38 genome assembly using HISAT2^[236]62,
   and gene-level read counts for each sample were obtained using
   HTSeq-count^[237]63 (“intersection-strict” mode) based on Ensembl
   GRCh38 v87 gene annotations. Ground-truth Differential mRNA stability
   between the control and NAT10KO cells was obtained using DESeq2^[238]60
   by modeling the RNA abundances as a function of ~c + t + c:t, where c
   is the cell type (0 for Control and 1 for NAT10KO), t is the time
   point, and c:t is the interaction between cell type and time. In this
   model, the coefficient of c would represent the differential expression
   between the two cell types (i.e. difference in abundance at time zero);
   the coefficient of t would represent the stability of each gene’s mRNA
   in the reference cell line (relative to the average of all genes); and
   the coefficient of the interaction term c:t would represent the
   differential mRNA stability between the two cell lines. For each gene,
   the coefficient of c:t and associated statistics were retrieved using
   DESeq2.

TCGA RNA-seq data processing

   RNA-seq BAM files for 7078 tumour samples and 682 adjacent normal
   samples from the 18 cancer types with at least 5 normal samples in TCGA
   were acquired from the National Cancer Institute (NCI) Genomic Data
   Commons (GDC) data portal ([239]https://portal.gdc.cancer.gov/GDC;
   dbGaP study accession phs000178.v1.p1). All TCGA RNA-seq data used in
   this study was generated from poly(A)-selected RNA. In order to
   quantify the number of reads corresponding to pre-mRNA and mature mRNA
   for the estimation of mRNA stability, we generated custom annotations
   for exons and introns for the transcripts supported by both Ensembl and
   Havana consortia, using GTF formatted annotations acquired from Ensembl
   GRCh38 version 87.

   We note that, in addition to mRNA stability, aberrant alternative
   splicing may affect the exonic read profiles. To avoid the potential
   confounding effect of alternative splicing on mature mRNA
   quantification, we exclusively retained exonic reads mapping to
   constitutive exons that are present in all Ensembl/Havana transcripts.
   Even when only constitutive exons are used for read counting, there
   might be cases where a splicing shift leads to transcripts that have
   reduced or enhanced stability. In such cases, DiffRAC should still
   detect the overall change in stability, even though it is caused by the
   interaction between abnormal alternative splicing and isoform-specific
   decay mechanisms. Similar to ref. ^[240]17, we limited our analysis of
   RBP and miRNA regulons to the genes that shared the same 3′ UTR across
   all their isoforms, with the 3ʹ UTR composed of a single exon, to
   mitigate the potential confounding effect of alternative 3ʹ UTR
   usage/splicing on mRNA stability.

   Intronic regions were included in our annotations only if they did not
   overlap with any exon, regardless of whether the exon was concordantly
   annotated by Ensembl or Havana consortia. The strandedness of RNA-seq
   data was determined using RSeQC^[241]64. Subsequently, BAM files were
   sorted by read name using SAMtools, and exonic and intronic reads were
   separately counted using HTSeq-count^[242]63, limiting to reads with a
   MAPQ score ≥30. Exonic reads were counted using the HTSeq
   “intersection-strict” mode, whereas intronic reads were counted using
   the “union” mode. The exonic/intronic read counts were then used as
   input to DiffRAC for stability analysis. We removed the cell cycle
   genes (based on GO term GO:000704) for downstream analyses, given that
   these genes are not at steady state, which is required for estimating
   stability from pre-/mature mRNA abundances.

Deconvolution of cellular origin from differential stability estimates

   We inferred stage-associated changes in stability specifically
   originating from the cancerous (or pre-cancerous) cells using DiffRAC
   with a design matrix that models the exonic/intronic read ratio as a
   function of the tumour stage (dichotomized into low-stage and
   high-stage categories), the impurity (fraction of non-malignant cells)
   of the tumour as measured by ABSOLUTE^[243]65, and an interaction term
   between stage and impurity, similar to ref. ^[244]27. As shown in
   Fig. [245]3d, different coefficients retrieved from this model
   represent the stage-associated changes in stability originating from
   cancerous or pre-cancerous cells specifically. Specifically, the
   coefficient of the tumour stage variable represents difference in
   stability between high- and low-stage tumours when impurity is zero,
   and thus can be interpreted as the stage-associated differential
   stability that is confidently attributed to malignant cells.

Pathway analysis

   MSigDB hallmark gene-sets^[246]66 were retrieved using the msigdbr R
   package
   ([247]https://cran.r-project.org/web/packages/msigdbr/index.html). For
   each TCGA cancer type, Fisher’s exact test was used to examine the
   association between each pathway and the sets of significantly
   stabilized or destabilized mRNAs, separately.

Differential RNA stability between MDA-MB-231 and MDA-LM2 cells

   Raw RNA-seq reads for time-series measurements of 4-thiouridine
   (4sU)-labeled RNA^[248]23,[249]67 from MDA-MB-231 and MDA-LM2 cells
   were obtained from GEO accession [250]GSE49608 (SRA accession
   SRP028570). This RNA-seq dataset represents time points 0, 2, 4, and
   7 h after a 2-hour treatment of cells with 4sU (four replicates for
   each cell line at each time point). Raw data was processed and
   differential mRNA stability between the MDA-MB-231 and MDA-LM2 cells
   was obtained in the same way as the NAT10KO BRIC-seq data (see above
   Methods).

RBP and miRNA regulon analysis

   The stability regulons of 35 RBPs (i.e. the set of mRNAs bound and
   regulated by each RBP) were obtained from a previous
   publication^[251]28. The regulons of miRNA families were obtained by
   identifying exact miRNA seed matches in mRNA 3ʹ UTRs. Specifically, 3ʹ
   UTR sequences of protein-coding genes were retrieved using the Ensembl
   GRCh38 version 87 annotations. We limited the analysis to the genes for
   which a single 3ʹ UTR, composed of a single exon, was shared across all
   isoforms, in order to avoid the possible confounding effects of
   alternative splicing. The miRNA seed sequences (8nt) were retrieved
   from TargetScan v7.2^[252]68, limiting to a set of 153 broadly
   conserved miRNA families (family conservation score ≥1). Exact seed
   sequence matches in 3ʹ UTR sequences were identified while limiting the
   search space to a maximum of 2000 nt downstream of the stop codon.

   The regulon enrichment among upregulated or downregulated genes was
   quantified using a logistic regression approach. Specifically, for each
   cancer type, we modeled the likelihood of being bound by each RBP/miRNA
   as a function of status, with –1 corresponding to significantly
   destabilized mRNAs (FDR ≤ 0.05), +1 corresponding to significantly
   stabilized mRNAs, and 0 corresponding to non-significant mRNAs. To
   account for the confounding factors that generally affect the number of
   binding sites of RNA-binding factors (rather than a specific RBP or
   miRNA; e.g. 3ʹ UTR length), we used the total number of binding sites
   of each mRNA for RBPs or miRNAs as the background. Specifically, we
   used a generalized linear model of the binomial family, in which the
   presence of a binding site for the specific RBP or miRNA of interest is
   considered as “success”, and the presence of binding sites for other
   RBPs or miRNAs considered as “failures”. These success/failure counts
   were modeled as a function of the stability status of the transcript
   using the glm function in R.

HITS-CLIP data analysis

   Pooled HITS-CLIP peaks of RBFOX1/2/3 proteins in whole brain tissue
   lysate of mice were retrieved from a previous study^[253]32. Peaks
   occurring in the 3ʹ UTR with a height greater or equal to 200
   overlapping CLIP tags were retained (peak height was extracted from
   Supplementary Table [254]1 of the source publication). The mRNAs that
   had at least one 3ʹ UTR high-confidence peak were considered
   high-confidence RBFOX targets, which were further filtered to include
   only those whose orthologs had expression measurements in TCGA. This
   resulted in 58 genes, 54 of which also have a 3ʹ UTR RBFOX binding site
   based on CIMS analysis of CLIP data.

Cell culture and transient transfection of miRNA mimics and inhibitors

   The established renal cancer cell line 786-O, A-498 and ACHN as well as
   the glioblastoma cell line A172 were purchased from the American Type
   Culture Collection (ATCC; Rockville, MD, USA) and cultured in
   Dulbecco’s Modified Eagle Medium (DMEM) supplemented with 10% fetal
   bovine serum (FBS) and 1% penicillin/streptomycin (Life technologies)
   at 37 °C with 5% CO2. For transient transfection, 786-O and A-498 cells
   (100,000 cells/well in 6-well plates) were reverse-transfected in
   antibiotic-free medium with 10 nM of miRNA-29 mimic (stem-loop
   sequence:
   UGGUUUCGUAUUGGUGCAUAGAAGUAUUAAUUUUGUAACUUGUCUAGCACCAUUUGAAACCAGU (two
   biological replicates for A-498, and one for 786-O), mature miRNA
   sequence: UAGCACCAUUUGAAACCAGU, ThermoFisher, 4464066) or control mimic
   (ThermoFisher, 4464058) (two biological replicates for A-498, and one
   for 786-O) using Lipofectamine RNAiMAX Reagent (ThermoFisher,13778075)
   according to the manufacturer’s recommendations. ACHN cells were
   transfected either with miR-29 inhibitor (ThermoFisher, 4464084, Assay
   ID: MH10103) or negative control (ThermoFisher 4464076) using the same
   protocol described above, with three biological replicates each. Two
   additional RNA-seq samples related to the miR-29 mimic experiment
   performed in A-498 cells were excluded due to potential mislabeling of
   the samples.

RNA isolation and qRT-PCR analysis of miRNAs

   Total RNA was extracted using All Prep DNA/RNA/miRNA Universal kit
   (Qiagen) 48 h after transient transfection. RT-PCR was done using
   TaqMan MicroRNA reverse transcription kit (Applied Biosystems,
   4366596). The LightCycler 480 instrument (Roche) was used to perform
   qRT-PCR analysis of miR-29 and miR-26 using TaqMan Fast Advanced miRNA
   Assays (ThermoFisher, 4444557) following guidelines provided by the
   manufacturer. Expression was reported as Ct values (Supplementary
   Fig. [255]8).

Stable cells expressing RBFOX1

   To generate stable A172 cell lines, HEK293T cells were transfected with
   lentiviral packaging plasmids (psPAX2 and MD2.g) together with a
   lentiviral expression plasmid for either GFP or RBFOX1 (three
   biological replicates each) using Lipofectamine 3000. Plasmids
   pLX317-GFP and pLX317-RBFOX1 were obtained from the TRC3 ORF collection
   from Sigma provided by McGill Platform for Cellular Perturbation (MPCP)
   at McGill University. After 48 h, media containing lentiviral particles
   were collected, filtered through a 0.45 μm syringe filter, and
   immediately added to A172 cells with 8 μg/ml polybrene. Over-expression
   of GFP and RBFOX1 were confirmed by fluorescence microscopy (for GFP)
   or qPCR (for RBFOX1). Total RNA was extracted using the All Prep
   DNA/RNA/miRNA Universal kit (Qiagen).

RNA-sequencing and analysis

   Library preparation from total RNA was performed using NEB
   rRNA-depleted (HMR) stranded library preparation kit according to
   manufacturer’s instructions, and sequenced using Illumina NovaSeq 6000
   (100 bp paired-end). RNA-seq reads were aligned to the GRCh38 genome
   assembly using HISAT2^[256]62, and gene-level read counts were obtained
   using HTSeq-count^[257]63 (“intersection-strict” mode) based on Ensembl
   GRCh38 v87 gene annotations. DESeq2^[258]60 was used to compute
   differential gene expression.

Statistics and reproducibility

   All statistical analysis were performed using by Bioconductor packages
   in R (version 4.1.2). The specific statistical tests used for each
   analysis and the associated measures of statistical significance are
   indicated within the main text, methods, in the figure, or in their
   legends. Statistical significance was set at P < 0.05 for all analyses
   and multiple testing correction was performed when applicable using the
   FDR method. Sample size for TCGA cohort analysis depended on publicly
   available data. No statistical analysis was performed to select the
   sample sizes for RNA-seq experiments. To ensure reproducibility for
   RNA-seq experiments, biological replicates were used and/or the
   findings were replicated in other cell lines.

Reporting summary

   Further information on research design is available in the [259]Nature
   Research Reporting Summary linked to this article.

Supplementary information

   [260]Peer Review File^ (945.8KB, pdf)
   [261]Supplementary Information^ (17.3MB, pdf)
   [262]42003_2022_3796_MOESM3_ESM.pdf^ (480.9KB, pdf)

   Description of Additional Supplementary Files
   [263]Supplementary Data 1^ (3MB, xlsx)
   [264]Supplementary Data 2^ (26.9MB, xlsx)
   [265]Supplementary Data 3^ (5.7MB, xlsx)
   [266]Supplementary Data 4^ (387.7KB, xlsx)
   [267]Supplementary Data 5^ (272.4KB, xlsx)
   [268]Supplementary Data 6^ (261.4KB, xlsx)
   [269]Supplementary Data 7^ (8MB, xlsx)
   [270]Supplementary Data 8^ (11.7MB, xlsx)
   [271]Reporting Summary^ (1.3MB, pdf)

Acknowledgements