Graphical abstract

   graphic file with name fx1.jpg
   [35]Open in a new tab

Highlights

     * •
       SNPs can alter mRNA translation potential
     * •
       Polysomal imbalance of heterozygous SNPs can mark allele-specific
       protein expression
     * •
       UTR SNPs associated with allele-specific translation demonstrate
       prognostic value
     __________________________________________________________________

   Molecular mechanism of gene regulation; Computational bioinformatics;
   Transcriptomics

Introduction

   Single-nucleotide polymorphisms (SNPs) represent one of the largest
   classes of genetic variations. They underlie and are responsible for
   inter-individual variations in complex-disease phenotypes, including
   cancer risk or aggressiveness. Wide attention has been given to SNPs
   that can lead to allele-specific changes in gene expression, for
   instance, by modifying the affinity of transcription factor-binding
   sites in promoter or enhancer regions directly, or indirectly via
   influencing epigenetic regulation. Included in these examples are
   functionally relevant SNPs affecting p53 function as a transcription
   factor or p53 protein expression by altering target binding sites
   ([36]Bond et al., 2006; [37]Menendez et al., 2007, [38]2019; [39]Tomso
   et al., 2005), or association-driven studies that have candidated SNPs
   as modulators of major cancer drivers, such as AR ([40]Bu et al., 2016;
   [41]Romanel et al., 2017), ER ([42]Bailey et al., 2016; [43]Dunning
   et al., 2016), or cMYC ([44]Pomerantz et al., 2009; [45]Takatsuno
   et al., 2013).

   A fraction of SNPs identified in the human population is located within
   coding regions or UTRs. In this case, both mechanism- and
   association-driven studies have pursued functional SNPs that can modify
   aspects of post-transcriptional gene regulation, for example, by
   altering microRNA-binding sites ([46]Landi et al., 2012).

   Specific tools have been implemented to mine the available wealth of
   RNA-seq-based gene expression data and identify and pursue instances of
   allele-specific gene expression ([47]Mayba et al., 2014; [48]Pandey
   et al., 2013; [49]Przytycki and Singh, 2020; [50]Romanel, 2019;
   [51]Romanel et al., 2015; [52]Rozowsky et al., 2011; [53]Skelly et al.,
   2011; [54]Wei et al., 2012). The same cannot be said for annotating
   candidate SNPs driving or being associated with alterations in mRNA
   translation potential. Recently, we showed that allelic imbalance
   restricted to polysome-bound mRNAs can be exploited to investigate the
   functional significance of 5′UTR single-nucleotide variants (SNVs) at
   the CDKN2A gene ([55]Andreotti et al., 2016), since specific SNVs
   affected CDKN2A translation. Those results led us to hypothesize that a
   comparative analysis of allelic-imbalance from total and polysomal
   mRNAs extracted and sequenced starting from the same cell sample, which
   is independently genotyped for heterozygous SNPs, could overcome the
   intrinsic noise in SNP calling and coverage from RNA-seq data. If so, a
   catalog of coding and UTRs SNPs associated and potentially causative of
   alterations in mRNA translation potential could be obtained. Here we
   describe such an approach starting from a commonly used breast cancer
   cell line and two disease-relevant treatments. About 4% of the
   heterozygous SNPs that could be followed showed a significant level of
   imbalance within polysomal RNAs. Nearly 25% of these are located in UTR
   sequences, and some appear to stratify patients with cancer for
   distinct clinical outcomes.

   Thus, our approach led us to distinguish both constitutive and
   treatment-dependent SNPs that are associated with or can cause changes
   in mRNA translation efficiency. We propose that our method can lead to
   identify a class of functional SNPs that we call tranSNPs, endowed with
   prognostic value for cancer or other diseases.

Results

Using polysome profiling to identify SNPs exhibiting allelic imbalance

   In order to develop an approach to map SNPs associated with allelic
   imbalance within translating ribosomes, we took advantage of a
   well-characterized cancer cell line with available genotyping data. We
   thus chose MCF7 cells, p53 wild-type breast adenocarcinoma-derived
   cells we had previously used for polysomal profiling studies
   ([56]Zaccara et al., 2014). After determining the set of heterozygous
   SNPs using DNA data, we exploited RNA-seq reads to compare relative
   allelic representation between the transcripts isolated in the
   cytoplasm because of their association with polysomes or lack thereof.
   Besides a mock condition, we included two treatments, 1 μM doxorubicin
   or 10 μM Nutlin, that we found can reduce global translation, engage
   p53 responses, and lead to a significant uncoupling between
   transcriptome and translatome changes. Indeed, many transcripts were
   significantly modulated by the treatments only at the level of
   polysomal RNA fractions ([57]Rizzotto et al., 2020; [58]Zaccara et al.,
   2014). We reasoned that these treatments could uncover specific allelic
   imbalance events due to changes in translation specificity mediated,
   for instance, by the action of RNA-binding proteins ([59]Zaccara
   et al., 2014; [60]Rizzotto et al., 2020). Polysomal profiling was
   performed using cytoplasmic lysates fractionated by a linear 10%-50%
   sucrose gradient, as previously described ([61]Rizzotto et al., 2020;
   [62]Rossi et al., 2019; [63]Zaccara et al., 2014). Cytoplasmic mRNA
   associated with light fractions, with ribosome subunits, or with the
   80S monosome, assumed to be not actively translating, were pooled
   together and sequenced separately from mRNAs associated with two or
   more ribosomes. Total RNA was also collected and sequenced. The RNA
   sequencing of two biological replicates for each treatment and fraction
   produced on average 32 million unique mapped reads. Differential gene
   expression analysis and pathway enrichment analysis confirmed that the
   two treatments activated a canonical p53 response ([64]Figure S1).

Identifying instances of allelic imbalance in polysomal RNA

   To identify SNPs exhibiting allelic imbalance in one or more RNA
   fractions, we first retrieved from public databases an exhaustive list
   of SNPs that are heterozygous in the MCF7 cell line. We integrated
   SNP-array and exome-based genotype calls and overall identified 11,544
   heterozygous SNPs of which 1,802 were in 3′ UTRs and 729 were in 5′
   UTRs ([65]Table S1). Then, we determined the allelic fractions of these
   SNPs in our RNA-seq data using a pileup approach based on high-quality
   reads and bases and requiring a minimum local depth of coverage of 10X.
   We found an average of 4,100 of the expected 11,544 SNPs per sample
   with confirmed coverage signal and, overall, we found 3,974 SNPs
   analyzable in at least one condition.

   Our approach focuses on the comparison of SNP allelic counts across
   biological replicates of the two RNA fractions that were sequenced for
   each of the three treatments. We first determined which is the
   distribution of SNP allelic fractions across all experiments and
   measured to what extent they are variable across biological replicates.
   Of interest, the variability of SNP allelic fractions was limited
   across biological replicates ([66]Figure S2) with a mean replicate's
   difference of about 7% that was consistent among the different RNA
   fractions and treatments ([67]Table S2). In addition, the range of SNP
   allelic fractions was quite wide ([68]Figure S2). This suggests that
   heterozygous SNP allelic counts from RNA-seq data are potentially
   biased by position-specific sequencing properties, indicating that
   divergence from an expected 0.5 allelic fraction should not, as
   commonly done, be directly linked to putative allele-specific
   expression phenomena or to a specific expression imbalance direction.
   Hence, we designed and implemented a computational approach that first
   calculates a condition-specific variability of SNP allelic fractions
   exploiting the available biological replicates and then uses it to
   identify significant SNP allelic imbalances across different
   conditions.

   We used this approach to identify allelic imbalances between polysomal
   and total RNA fractions. Specifically, considering SNPs allelic
   fraction (AF) intervals calculated by summing and subtracting
   condition-specific AF variabilities from SNPs AFs, we searched for SNPs
   presenting polysomal and total RNA non-overlapping AF intervals
   ([69]Figure 1A). The analysis led to 162 imbalance instances, involving
   147 SNPs that were identified in both biological replicates preserving
   the imbalance direction. This conservative list of 147 SNPs represents
   the first catalog of tranSNPs: polymorphisms exhibiting allelic
   imbalance specifically in polysome-bound mRNA fractions ([70]Tables S3
   and [71]S4). The list is almost equally divided between constitutive,
   doxorubicin-dependent, and Nutlin-dependent polysomal allelic
   imbalances ([72]Figure 1B). Only three tranSNPs were common to all
   conditions, and only eight to the two different p53-activating
   treatments. Despite the limited observed intersection among our
   conservative calls, the comparison of condition-associated tranSNPs
   allelic fractions across all conditions showed not only an expected
   clear shift in the condition-associated tranSNPs AF distribution but
   also a fraction of SNPs with comparable imbalance across two or three
   conditions ([73]Figure 1C). The condition-associated lists of genes
   harboring the tranSNPs showed mild enrichment for specific biological
   processes or molecular functions, including regulation of spindle
   organization, mitotic cytokinesis, protein kinase regulator activity,
   and catalytic activity ([74]Figure S3).

Figure 1.

   [75]Figure 1
   [76]Open in a new tab

   Identification of SNPs allelic imbalance across different RNA fractions

   (A) Schematic representation of the approach developed to identify RNA
   fraction-specific SNP allelic imbalances. RNA-seq-based SNP allelic
   fraction variability is estimated both in total and polysomal RNA
   fractions. Then variability extended SNP allelic fractions are compared
   and only non-overlapping total versus polysomal imbalances are retained
   as tranSNPs. In the example, SNP2 satisfies the condition and is hence
   nominated as tranSNP. AF, allelic fraction; V, mean AF variability
   among replicates.

   (B) Venn diagram showing private and shared tranSNPs identified across
   the three analyzed conditions.

   (C) Allelic imbalance distribution of condition-associated tranSNPs is
   shown across the different conditions. Aggregate distribution is shown
   using boxplots, whereas single SNPs distribution is shown using a
   heatmap, where red intensity represents the level of imbalance. In the
   boxplot, the imbalance is shown as absolute log2 ratio of allelic
   fraction in polysomal RNA and allelic fraction in total RNA. In the
   heatmap, red intensity is proportional to this value; gray represents
   no value.

   Characteristics of tranSNPs are shown in [77]Table 1 and are compared
   with those in the larger set of 11,544 initial SNPs and those in the
   set of 3,974 analyzable SNPs. Of the 147 tranSNPs we identified, 39 are
   located in UTRs and represent top candidates for a direct role on the
   observed allelic imbalance.

Table 1.

   Summary table of MCF7 heterozygous SNPs
                      MCF7 heterozygous SNPs Analyzable SNPs tranSNPs
   Number             11,544                 3,974           147
   Average Global MAF 0.26                   0.25            0.25
   Average EUR MAF    0.28                   0.28            0.27
   Overlapping genes  5,493                  2,532           139
   Intronic           4,724                  675             26
   Coding             6,161                  3,007           108
   UTR                2,553                  1,182           39
   LD blocks          8,656                  3,202           143
   [78]Open in a new tab

   Summary table presenting the characteristics of tranSNPs together with
   same features measured from the set of the SNPs heterozygous in the
   MCF7 cell line initially considered, and the set of SNPs analyzable
   from the RNA-seq data by our approach. Average Global and EUR MAF
   (Minor Allele Frequency) were retrieved from 1,000 Genomes Project
   data. Details on how the LD blocks were defined are provided in the
   text.

BRI3BP 3′ UTR rs1055472 and CDKN1A 5′ UTR rs2395655 alleles are functionally
distinct

   We chose two UTR SNPs showing polysomal allelic imbalance for
   validation and opted for a gene reporter assay to evaluate the
   functional impact of the two pairs of tranSNPs. We selected two
   Nutlin-dependent tranSNPs and two genes harboring them whose functions
   are related to p53.

   BRI3BP, also known as human cervical cancer oncogene 1 (HCCR-1), might
   act as negative modulator of p53 ([79]Cho et al., 2007; [80]Ha et al.,
   2008). The two rs1055472 alleles of the BRI3BP 373nt 3′ UTR were cloned
   downstream of the Firefly cDNA. MCF7 cells were transiently transfected
   with each of the two alleles, and the activity of the reporter relative
   to the Renilla control luciferase was measured in untreated cells or in
   cells treated with Nutlin. In both cases we observed that the
   alternative allele led to a relative increase in the reporter gene
   activity ([81]Figure 2A), suggesting an overall differential
   translation efficiency. Although the SNP was computationally classified
   as polysomally imbalanced only after Nutlin treatment ([82]Tables S3
   and [83]S4), computational and experimental data combined suggest that
   the two alleles are functionally distinct, impacting on gene
   expression.

Figure 2.

   [84]Figure 2
   [85]Open in a new tab

   TranSNPs results in functionally distinct alleles

   (A) MCF7 cells were transiently transfected with pGL4.13-based vectors
   containing BRI3BP 3′ UTR fragments differing for the indicated BRI3BP
   SNP allele, and the control pRLSV40 Renilla vector. After 24 h of
   transfection, cells were treated with Nutlin for 24 h before performing
   dual-luciferase assays. Firefly luciferase signals were normalized to
   Renilla to control for transfection efficiency and to relative Firefly
   mRNA levels to take into account differences in reporter's transcript
   levels. Individual values from independently transfected wells are
   plotted.

   (B) Same as (A), except that the p21-5′ UTR was cloned in the
   low-expression pGL3-basic vector. ∗∗p value < 0.01; ∗∗∗∗p value <
   0.0001, adjusted p value based on a two-way ANOVA with Sidak's multiple
   comparison test. Data are represented as mean ± SD.

   CDKN1A (p21) is an important cyclin-dependent kinase inhibitor and one
   of the major direct p53 transcriptional targets mediating cell-cycle
   arrest ([86]El-Deiry, 2016; [87]Kreis et al., 2019). The gene is highly
   regulated at the transcriptional and post-transcriptional levels
   ([88]Rossi et al., 2019; [89]Wang et al., 2000). CDKN1A transcripts
   were highly induced by both doxorubicin and Nutlin treatment, and the
   coverage of rs2395655 alleles in the untreated condition was very low
   (<10X). We evaluated the CDKN1A 5′ UTR rs2395655 alleles cloned
   upstream of the Firefly cDNA. We observed an overall bimodal
   distribution in the reporter gene activity, whose amplitude is reduced
   when the alternative allele is present ([90]Figure 2B, left panel). The
   treatment with Nutlin further mitigated this bimodal distribution with
   an overall reporter gene activity that is strongly reduced when the
   alternative allele is present ([91]Figure 2B, right panel). Since the
   SNP was concordantly classified as polysomally imbalanced after Nutlin
   treatment, in this case we highlight a scenario with functionally
   distinct alleles where differential translation efficiency may impact
   protein expression only under specific conditions.

TranSNPs can have prognostic significance in breast cancer TCGA data

   Considering that rs1055472 and rs2395655 are tranSNPs with functionally
   distinct alleles and the two genes harboring them are related to p53,
   we next investigated their potential clinical impact. We focused on
   breast cancer for consistency with the cell line model used and
   exploited the richness of data available in TCGA ([92]Koboldt et al.,
   2012). Exploring TCGA survival data ([93]Liu et al., 2018) and using
   Kaplan-Meier curves, we observed that patients harboring the rs1055472
   alternative allele show a statistically significant increase in
   progression-free interval time (p value = 0.042, [94]Figure 3A). Of
   note, BRI3BP and TP53 transcript levels were not correlated
   ([95]Figure S4A) and variant rs1055472 was not associated with
   patients' TP53 somatic aberration status ([96]Figure S4B) or with the
   utilization of DNA-damaging agents ([97]Figure S4C). Furthermore, the
   analysis repeated on TP53 aberrant or TP53 wild-type patients only
   ([98]Figure S4D) showed similar trends. Overall, this indicates that
   rs1055472 signal is not dependent on the TP53 status. In addition, the
   signal also persisted when a multivariate model including breast cancer
   subtype as covariate was used (p value = 0.028), also highlighting an
   independence of the signal from ER status ([99]Figure S4E).

Figure 3.

   [100]Figure 3
   [101]Open in a new tab

   Prognostic significance of tranSNPs in breast cancer

   (A) Progression-Free Interval analysis of BRI3BP-related tranSNP.
   Kaplan-Meyer curves along with summary statistics are reported.

   (B–D) Examples of tranSNPs presenting prognostic significance.
   Kaplan-Meyer curves along with summary statistics are reported.

   Motivated by this result, we then explored more systematically whether
   tranSNPs could be associated with distinct clinical variables in
   patients with cancer. To this end, our tranSNPs list was extended
   including all common SNPs in strong linkage disequilibrium (LD) (r^2 >
   0.8) with them, obtaining 3,003 SNPs distributed across 120 LD blocks
   with genotype imputable from TCGA data. The extended SNP list was then
   used to interrogate clinical data in the breast cancer cohort from TCGA
   ([102]Liu et al., 2018), using Overall Survival (OS), Disease and
   Progression Free Intervals (DFI, PFI), and Disease Specific Survival
   (DSS) endpoints. Kaplan-Meier curves were built stratifying patients
   for the presence of the alternative allele (AA versus AB + BB) or the
   presence of the homozygous genotype for the alternative allele (AA + AB
   versus BB). Cox proportional hazards regression models were built to
   perform the analysis. To limit false-positive results, for each
   considered outcome, association results demonstrating a p value<0.05
   were aggregated at the level of LD blocks and >5% of SNPs reproducing
   the association signal in a block were required to nominate the block
   as associated. On average, we found 10.7% (min 8.3%, max 14.2%)
   associated blocks across all outcomes with an average fraction of SNPs
   associated in a block equal to 56.8% and an average number of SNPs per
   block of 23.4 ([103]Table S5). Across the associated blocks, we
   identified 33 SNPs in the UTR sequences of 17 genes demonstrating
   significant prognostic effects ([104]Figures 3B-3D, and [105]Table S6).
   Both protective and risk alternative SNP alleles were found.

UTR tranSNPs could alter RNA-binding protein target sites

   To search for a mechanism that could underlie polysomal imbalance and
   the observed differential translation, we examined the potential impact
   on RNA-binding proteins (RBPs)-binding sites for the 33 UTR tranSNPs
   resulting from the survival analysis. These polymorphisms might indeed
   directly cause the observed allelic imbalance of SNPs located in UTRs
   by, for example, impacting on UTR structure or binding sites for RBPs
   or also microRNAs. RNA binding consensus motifs were retrieved from the
   RBPDB database ([106]Cook et al., 2011). TESS software ([107]Schug,
   2008) was used to compute motifs scores starting from 60-bp sequences
   with the SNP reference or alternative allele in the center. Overall, we
   identified 61 putative motif disruptions involving 27 UTR variants and
   RBPs associated with translation control and mRNA stability ([108]Table
   S7), such as ELAVLl/HuR, FUS, YBX1, and PABPC1 ([109]Behm-Ansmant
   et al., 2007; [110]Mazan-Mamczarz et al., 2003; [111]Peixeiro et al.,
   2012, p. 1; [112]Tanguay and Gallie, 1996; [113]Yasuda et al., 2013).

UTR-specific imbalance in ATF6 transcript

   Among the UTR SNPs showing potential prognostic significance, a
   peculiar allelic imbalance pattern was observed for polymorphisms in
   the ATF6 transcript. Although only one ATF6 SNP was present in our
   initial tranSNPs list, an overall trend of imbalance in
   polysomal-associated RNAs was observed and restricted to SNPs located
   in its relatively long 3′ UTR ([114]Figure 4A). This imbalance trend
   was consistent across most ATF6 3′ UTR SNPs that are heterozygous in
   MCF7, and the overall imbalance distribution was strongly statistically
   significant (p value < 1 × 10^−4) when compared with imbalance
   distributions obtained from 1,000 sets of random sequential
   heterozygous SNPs of the same cardinality ([115]Figure 4B). On the
   contrary, the ATF6 heterozygous coding SNPs did not show any
   significant imbalance. Furthermore, alternative alleles were confirmed
   to be all in phase using 1,000 Genomes Project phased genotype data,
   ATF6 transcript level was not associated with clinical outcomes in both
   univariate and multivariate models, and no differential ATF6 transcript
   level was observed for patients carrying the clinically distinct
   haplotype structure.

Figure 4.

   [116]Figure 4
   [117]Open in a new tab

   Haplotype structure and allelic imbalance along the ATF6 gene and
   impact of UTR TranSNPs

   (A) RNA-seq-based allelic fractions of ATF6 heterozygous SNPs are
   reported for both coding and 3′ UTR (in red) SNPs. On the top we show
   the distribution observed in the first biological replicate, and on the
   bottom we show the distribution observed in the second biological
   replicate.

   (B) Significance of ATF6 3′ UTR SNPs allelic imbalance (red line)
   versus distribution of sequential random SNPs imbalances. On top using
   heterozygous SNPs data from the first biological replicate, and on the
   bottom using data from the second biological replicate.

   (C) Dual-luciferase assays in MCF7 cells transiently transfected with
   reporter vectors containing ATF6 3′ UTR SNP alleles. Experiments were
   developed as described in [118]Figure 2. ∗∗∗∗p value < 0.0001, adjusted
   p value based on a two-way ANOVA with Sidak's multiple comparison test.
   Data are represented as mean ± SD.

   (D) RIP experiment probing the interaction of PABPC1 with the ATF6
   transcript. Bars plot the average fold enrichment relative to the input
   sample. Individual average values from three biological replicates are
   also shown. Results obtained with an IgG control antibody are included.
   ∗p value < 0.05, two-tailed, unpaired t test. Data are represented as
   mean ± SD.

   A gene reporter assay was used to evaluate the functional impact of two
   different sub-regions of the long ATF6 3′ UTR, corresponding to
   chr1:161930870-161931403 and chr1:161931723-161932422 (hg19) genomic
   coordinates ([119]Figure 4C). For the downstream region, we observed a
   consistent statistically significant increase in the reporter gene
   activity for the allele harboring the alternative SNPs bases
   ([120]Figure 4C), both in untreated and Nutlin-treated cells. For the
   upstream region, we observed a similar trend of increased reporter gene
   activity (although not reaching statistical significance) for the
   allele harboring the alternative SNPs bases in the untreated cells but
   not in the Nutlin-treated cells.

   Computational predictions suggest that the allele harboring the
   alternative SNPs bases loses two PABPC1-binding sites that have strong
   scores when considering the allele harboring the reference SNPs bases
   ([121]Table S7), suggesting an allele-dependent distinct processing of
   the 3′ UTR. Indeed, using RNA immunoprecipitation (RIP) assays, we
   demonstrate that the ATF6 transcript, and specifically the 3′ UTR
   region, is a PABPC1 target in MCF7 cells ([122]Figure 4D).

Discussion

   The relative abundance of an mRNA in polysomal versus subpolysomal
   fraction is frequently used as a proxy for protein synthesis and can
   capture changes both in global and specific translation efficiency,
   driven by structural or sequence elements in the context of cell stress
   responses. It is common knowledge that cancer cells experience chronic
   stress as well as acute stress responses that converge on translation
   controls, unfolded protein responses, oxidative stress, and proteasome
   functions. Recent studies, including some from our group, are placing
   wild-type p53 in an important position to respond and influence these
   pathways and also are indicating that the gain of function properties
   of several mutant p53 proteins may converge on the dysregulation of
   controls on translation quality, protein folding, and proteasome
   functionality. Hence, in this study, we chose a p53 wild-type cancer
   cell line well characterized in terms of p53-induced responses
   ([123]Andrysik et al., 2017), including post-transcriptional and
   translational changes ([124]Zaccara et al., 2014). We used both a
   commonly used chemotherapeutic drug that activates the DNA damage
   response, and a selective MDM2 inhibitor that results in acute p53
   activation without apparent genotoxic response ([125]Rizzotto et al.,
   2020; [126]Tovar et al., 2006). Doses and treatment time points were
   selected based on previous characterization of both p53 activation
   pathway and cell outcome ([127]Lion et al., 2013; [128]Nassiri et al.,
   2019). Furthermore, for both treatments, we have previously shown a
   global as well as a specific impact on translation ([129]Zaccara
   et al., 2014).

   Since MCF7 cells have been genotyped both by SNP arrays and
   exome-sequencing as part of the cell-line encyclopedia and NCI-60
   studies, we could leverage RNA-seq data of total and polysome-bound
   mRNAs to focus on allele counts for all transcribed SNPs in the
   heterozygous state to reveal an imbalance in one fraction over the
   other. We hypothesized that this approach, although restrictive, could
   overcome limitations deriving from the quality and quantity of sequence
   reads that can be attributed to each allele of a SNP pair.
   Computational approaches that use allele counts of heterozygous SNPs to
   identify imbalance events have been implemented and widely used to
   investigate allele-specific expression patterns both in healthy and
   disease tissues ([130]Pandey et al., 2013; [131]Romanel, 2019;
   [132]Romanel et al., 2015; [133]Rozowsky et al., 2011; [134]Skelly
   et al., 2011; [135]Wei et al., 2012). However, only few approaches
   ([136]Mayba et al., 2014; [137]Przytycki and Singh, 2020) have been
   developed to compare allelic imbalances across matched samples, and all
   of the available methods have been specifically designed in the context
   of cancer (e.g., to compare cancer samples with their matched normal
   counterparts), hence limiting hence their broad applicability.

   Using polysomal profiling and RNA relative quantification from either
   pooled or individual fractions typically show relatively small
   differences, that can, however, underlie an important effect at the
   level of protein amounts. The approach we developed to identify
   tranSNPs is focused on computing and comparing the relative change of
   allele counts across mRNA fractions. To account for technical noise,
   the approach embeds in the calculation the amount of allele counts
   variability observed in a given condition exploiting data from
   biological replicates. Of note, such variability was not affected by
   the cellular treatment employed and was consistent across different RNA
   fractions. We identified 147 tranSNPs, representing nearly 4% of the
   overall analyzed SNPs, a fraction that, although conservative, can be
   considered comparable with the recent fraction of about 2% of
   allele-specific expressed genes computed across tumor versus normal
   matched samples ([138]Przytycki and Singh, 2020). Indeed, in
   [139]Przytycki and Singh (2020), differential allele-specific
   expression (ASE) of genes was computed by aggregating at gene level
   heterozygous SNPs data and no statistically significant difference
   between genes with differential ASE and those without with respect to
   the number of heterozygous SNPs across the length of the genes was
   found. We checked the intersection of our tranSNP list with the current
   list of GWAS variants from the GWAS catalog ([140]Buniello et al.,
   2019) that are related to cancer (N = 2,657), although it is possible
   that tranSNPs might not be directly related to the risk of developing a
   tumor but be related to a distinct clinical outcome. Only an
   intersection between the LD block defined by our tranSNP rs760077 and
   association signal with gastric cancer (variant rs760077 and LD
   variants rs140081212, rs4072037) and breast cancer (LD variants
   rs2075570, rs2974935) was identified. We further explored the
   intersection between the tranSNPs and a set of GWAS-implicated cancer
   risk SNPs identified in 41 genes of the p53 response pathway, of which
   those mapping at KITLG, CDKN2A, and TEX9 genes were reported to
   potentially impact drug sensitivity ([141]Stracquadanio et al., 2016;
   [142]Zhang et al., 2021). Given that MCF7 cells express wild-type p53
   and that the doxorubicin or Nutlin treatments elicit a p53-dependent
   response, we were motivated to examine the potential overlap with our
   tranSNP list. However, only 13 of those 41 GWAS-implicated cancer risk
   SNPs could be evaluated in our MCF7 RNA-seq data. They map at six genes
   (CERSS, ISYNA1, CFLAR, SESN1, AKAP9, CYP51A1) that were not reported to
   impact drug sensitivity ([143]Stracquadanio et al., 2016; [144]Zhang
   et al., 2021). Finally, we considered the results of a very recent
   study where over 12,000 3′ UTR variants, including more than 2,000
   disease-associated SNPs from the GWAS catalog, were cloned within
   100-nt-long fragments and tested in a massively parallel reporter assay
   measuring relative RNA expression ([145]Griesemer et al., 2021).
   Surprisingly, about 25% of the tested GWAS SNPs resulted in
   steady-state changes in reporter transcripts' abundance. Among the GWAS
   SNPs included in that screen, 131 are heterozygous in MCF7 and could be
   analyzed by our method. We found no intersection with our tranSNP list,
   consistent with our approach being focused on allelic differences that
   are detected among polysomal-associated mRNAs but not total mRNAs.

   A subset of the identified UTR tranSNPs showed apparent prognostic
   value. Some of these are located in genes that have been already
   established to play a role in cancer in general and in breast cancer or
   in relation to p53 functions, in particular. These include BRI3BP, also
   known as human cervical cancer oncogene 1 (HCCR-1), that might act as a
   negative modulator of p53 ([146]Cho et al., 2007; [147]Ha et al.,
   2008); the ATF6 gene, an important modulator of endoplasmic reticulum
   stress that can promote cell survival ([148]Guan et al., 2017;
   [149]Sicari et al., 2019); and the protein lysine methyltransferase
   SETD3 gene, which was recently reported as a prognostic marker in
   patients with breast cancer ([150]Hassan et al., 2020) and also shown
   to promote apoptosis in response to DNA damage, in part through the
   modulation of p53 ([151]Abaev-Schneiderman et al., 2019). Other genes
   implicated by our results are GFM1, a mitochondrial translation
   elongation factor that has been linked to oxidative stress and cancer
   cell survival ([152]Norberg et al., 2017), and PLEC, an important
   member of a large family of scaffolding proteins that can modulate
   various aspects of cell function. The plectin gene is frequently found
   co-amplified with MYC in BRCA1-deficient breast cancer ([153]Annunziato
   et al., 2019). Sequence variants predicted to be deleterious have also
   been identified in primary breast cancers ([154]Horvath et al., 2013).
   Physical interaction of plectin with BRCA2 has also been reported
   ([155]Niwa et al., 2009). In all these cases, the tranSNPs we uncovered
   may be modifiers of protein expression, influencing mRNA translation
   efficiency, with consequences on cancer phenotypes.

   To begin exploring a more precise mechanism of action, motifs analysis
   based on eCLIP or RIP-seq data was exploited to identify UTR tranSNPs
   that may alter or lead to gain or loss of RNA-binding sites. Based on
   these predictions and focusing on a peculiar imbalance in ATF6
   transcript that we observed only across 3′ UTR SNPs, we performed RIP
   assays and confirmed that ATF6 is a target of PABPC1, a well-known
   RNA-binding protein. This, combined with predicted putative loss of two
   PABPC1-binding sites across one of the two ATF6 alleles, suggests an
   allele-dependent processing of the 3′ UTR.

   Overall, our approach led us to identify a class of SNPs that are
   associated with changes in mRNA translation efficiency, potentially
   acting at the post-transcriptional level, and that show prognostic
   value for cancer, hence implicating the potential identification or
   stratification of patients with cancer based on genetic markers that
   are helpful in the prediction of prognosis in regard to death,
   progression, or recurrence.

Limitations of the study

   This study considers a single cell line and two biological replicates
   for each considered treatment/condition. Survival analysis is focused
   on breast cancer data. Our catalog of tranSNPs is hence limited to the
   data we analyzed.

STAR★Methods

Key resources table

   REAGENT RESOURCE SOURCE IDENTIFIER
   Antibodies
     __________________________________________________________________

   Anti-PABP Abcam Cat# bb6125 [10E10]
   Normal mouse IgG Merck Millipore Cat# 12-371
     __________________________________________________________________

   Bacterial and virus strains
     __________________________________________________________________

   E. coli strain laboratory stock DH5alpha
     __________________________________________________________________

   Chemicals, peptides, and recombinant proteins
     __________________________________________________________________

   TRIzol Thermo Fisher Scientific Cat# 15596026
   Nutlin Cayman chemicals Cat#10004372
   Doxorubicin Sigma Aldrich Cat#25316-40-9
   Xba I New England Biolabs Cat# R0145S
   Nco I New England Biolabs Cat# R3193S
   T4 DNA ligase New England Biolabs Cat# M0202S
   Protein A Magnetic Beads Thermo Fisher Scientific Cat# 88845
     __________________________________________________________________

   Critical commercial assays
     __________________________________________________________________

   TruSeq RNA Library preparation kit v2 Illumina Cat#RS-122-2001
   Agilent RNA 6000 Nano kit Agilent Cat#5067-1511
   RevertAid RT Kit Thermo Fisher Scientific Cat# K1691
   Dual Luciferase Reporter assay system Promega Cat# E1910
     __________________________________________________________________

   Deposited data
     __________________________________________________________________

   Sequencing of total and polysomal RNA from normal and treated MCF7
   cells This paper Bioproject: PRJNA693005
   TCGA SNP 6.0 array calls Genomic Data Commons
   [156]http://portal.gdc.cancer.gov/legacy-archive
   TCGA clinical information [157]Liu et al. (2018)
   [158]https://doi.org/10.1016/j.cell.2018.02.052 ([159]Table S1)
   MCF7 heterozygous SNPs (array data) [160]Barretina et al., 2021 GEO
   [161]GSM888366
   MCF7 heterozygous SNPs (wes data) [162]Reinhold et al. (2012)
   [163]https://doi.org/10.1158/0008-5472.CAN-12-1370
   RBPDB [164]http://rbpdb.ccbr.utoronto.ca/
   [165]https://doi.org/10.1093/nar/gkq1069
     __________________________________________________________________

   Experimental models: Cell lines
     __________________________________________________________________

   MCF7 ICLC N/A
     __________________________________________________________________

   Oligonucleotides: Cloning primers
     __________________________________________________________________

   BRI3BP_F Metabion aatctagaAGGTCAGCCGGCCGGGCGGGTCCAC
   BRI3BP_R Metabion aatctagaAACTTGACTCAATCTGCCTTTATTA
   ATF6_852F Metabion aatctagaTTCATCCCTCGATTCCCAGC
   ATF6_852R Metabion aatctagaGCAACCCCCAAAAGGCAATC
   ATF6_848F Metabion aatctagaGGACACAGCTTCATTAGAGTGTT
   ATF6_848R Metabion aatctagaGGCTGTGAAAGCAAAAGTGGT
   p21_UTR_ref_F Metabion CATGGGGTGGCTATTTTGTCCTTGGGCTGCCT
   GTTTTCAGCTGCTGCAACCACAGGATTTCTTCT
   GTTCAGGCGCCC
   p21_UTR_ref_R Metabion CATGGGGCGCCTGAACAGAAGAAATCCCTGTGGT
   TGCAGCAGCTGAAAACAGGCAGCCCAAGGACAAA
   ATAGCCACCC
   p21_UTR_ALT_F Metabion CATGGGGTGGCTATTTTGTCCTTGGGCTGCCTGT
   TTTCAGCTGCTGCAACCACAGGGGTTTCTTCTGTT
   CAGGCGCCC
   p21_UTR_ALT_R Metabion CATGGGGCGCCTGAACAGAAGAAACCCCTGTGGT
   TGCAGCAGCTGAAAACAGGCAGCCCAAGGACAAA
   ATAGCCACCC
     __________________________________________________________________

   Oligonucleotides: PCR and sequencing primers
     __________________________________________________________________

   GAPDH_F Metabion TCCAAAATCAAGTGGGGCGA
   GAPDH_R Metabion AGTAGAGGCAGGGATGATGT
   ATF6_F Metabion CCCGTATTCTTCAGGGTGCT
   ATF6_R Metabion TCACTCCCTGAGTTCCTGCT
   ATF6_UTR_F Metabion GAACCTTCCTCCCCTGTGTG
   ATF6_UTR_R Metabion CAGAGTGAAAGGGGGCATCA
     __________________________________________________________________

   Software and algorithms
     __________________________________________________________________

   ASEQ [166]https://bitbucket.org/CibioBCG/aseq/
   [167]https://doi.org/10.1186/s12920-015-0084-2
   Ensembl rest API [168]https://rest.ensembl.org/
   [169]https://doi.org/10.1093/bioinformatics/btu613
   SHAPEIT
   [170]https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.h
   tml [171]https://doi.org/10.1016/j.ajhg.2013.09.002
   IMPUTE2 [172]http://mathgen.stats.ox.ac.uk/impute/impute_v2.html
   [173]https://doi.org/10.1371/journal.pgen.1000529
   Survival R package
   [174]https://cran.r-project.org/web/packages/survival/index.html
   [175]https://doi.org/10.1007/978-1-4757-3294-8
   TESS [176]https://www.cbil.upenn.edu/tess
   [177]https://doi.org/10.1002/0471250953.bi0206s21
   VariantAnnotation R package
   [178]https://bioconductor.org/packages/release/bioc/html/VariantAnnotat
   ion.html [179]https://doi.org/10.1093/bioinformatics/btu168
   [180]Open in a new tab

Resource availability

Lead contact

   Further information and requests for resources should be directed to
   and will be fulfilled by the lead contact, Alberto Inga
   ([181]alberto.inga@unitn.it).

Materials availability

   This study did not generate new materials.

Experimental model and subject details

Cell lines and culture conditions

   MCF7 cells were cultured in standard RPMI (Lonza) supplemented with 10%
   FBS (brand), 100 units/ml penicillin, 100 mg/mL streptomycin antibiotic
   mix, and 2 mM glutamine. Cells were periodically tested to ascertain
   the absence of mycoplasma contamination. The authenticity of the
   parental cells was confirmed by a commercial genotypic service (BMR
   genomics, Padova, Italy). Doxorubicin was purchased from Sigma. Nutlin
   was purchased from Cayman chemicals.

Method details

SNP status in MCF7 cells

   We retrieved from the literature genotype assays for MCF7. Two
   different public datasets (GEO and NCI-60) were used containing
   information about homozygosis or heterozygosis conditions for SNP
   alleles in MCF7, the unique identifier, the relative and alternative
   base in specific single positions. One SNP-array based file
   ([182]GSM888366) contains 756,260 SNPs with the information about the
   reference and alternative bases [183](Barretina et al., 2012), while
   the second exome-based file ([184]Reinhold et al., 2012) contains
   85,593 SNPs. The SNPs in common (N = 6,454) were compared to verify the
   consistency among the two datasets: more than 91% of the SNP calls were
   consistent. This concordance suggested the possibility to merge the two
   files removing the data with opposite information. A first screening
   was performed through the elimination of SNPs called as homozygous in
   the two datasets, but with a different base call: i.e. reference “0/0”
   vs alternative “1/1”. Together, the final file presents 11,544
   heterozygous SNPs of which 1,802 in 3′UTR and 729 in 5′UTR. Variants
   were annotated using VariantAnnotation R package ([185]Obenchain
   et al., 2014).

Differential allele-specific expression analysis

   To identify allelic imbalances between polysomal and total RNA
   fractions a computational approach that exploits SNP allelic fractions
   variability across biological replicates was implemented. Given an
   experimental condition C, SNP allelic fractions variability for
   condition C is calculated as:
   [MATH:
   <msub><mi>V</mi><mi>c</mi></msub><mo>=</mo><mtext>mean</mtext><mfenced>
   <mfenced open="{"
   close="}"><mrow><mrow><mo>|</mo><mi>s</mi><mi>n</mi><msubsup><mi>p</mi>
   <mi>i</mi><msub><mi>R</mi><mi>j</mi></msub></msubsup><mo>−</mo><mi>s</m
   i><mi>n</mi><msubsup><mi>p</mi><mi>i</mi><msub><mi>R</mi><mi>k</mi></ms
   ub></msubsup><mo>|</mo></mrow><mi>s</mi><mi>u</mi><mi>c</mi><mi>h</mi><
   mspace
   width="0.25em"></mspace><mi>t</mi><mi>h</mi><mi>a</mi><mi>t</mi><mspace
   width="0.25em"></mspace><mn>1</mn><mo>≤</mo><mi>i</mi><mo>≤</mo><mi>N</
   mi><mspace
   width="0.25em"></mspace><mi>a</mi><mi>n</mi><mi>d</mi><mspace
   width="0.5em"></mspace><mn>1</mn><mo>≤</mo><mi>j</mi><mo>,</mo><mi>k</m
   i><mo>≤</mo><mi>M</mi><mspace
   width="0.5em"></mspace><mi>a</mi><mi>n</mi><mi>d</mi><mspace
   width="0.5em"></mspace><mi>j</mi><mo><</mo><mi>k</mi></mrow></mfenced><
   /mfenced> :MATH]

   where N is the number of considered SNPs, M is the number of biological
   replicates, and
   [MATH:
   <mrow><mi>s</mi><mi>n</mi><msubsup><mi>p</mi><mi>i</mi><msub><mi>R</mi>
   <mi>j</mi></msub></msubsup></mrow> :MATH]
   is the allelic fraction of SNP i computed from the RNA-seq BAM file
   representing the biological replicate R[j] using ASEQ pileup module
   ([186]Romanel et al., 2015) and defined as:
   [MATH:
   <mrow><mi>s</mi><mi>n</mi><msubsup><mi>p</mi><mi>i</mi><msub><mi>R</mi>
   <mi>j</mi></msub></msubsup><mo
   linebreak="badbreak">=</mo><mfrac><mrow><mo>#</mo><mi>A</mi><mi>L</mi><
   mi>T</mi></mrow><mrow><mo>#</mo><mi>A</mi><mi>L</mi><mi>T</mi><mo
   linebreak="badbreak">+</mo><mo>#</mo><mi>R</mi><mi>E</mi><mi>F</mi></mr
   ow></mfrac></mrow> :MATH]

   where
   [MATH: <mrow><mo>#</mo><mi>A</mi><mi>L</mi><mi>T</mi></mrow> :MATH]
   and
   [MATH: <mrow><mo>#</mo><mi>R</mi><mi>E</mi><mi>F</mi></mrow> :MATH]
   correspond to, respectively, the number of reads supporting the SNP i
   alternative base and the number of reads supporting the SNP i reference
   base. Of note, in the pileup computation only reads and bases with
   quality above 20 and a minimum of local depth of coverage of 10X were
   considered.

   Now, given variabilities values V[poly] and V[tot] computed,
   respectively, from RNA-seq replicate samples of polysomal and total RNA
   fractions, we define as tranSNPs all SNPs satisfying the following
   condition:
   [MATH: <mrow><msub><mo>∀</mo><mrow><mi>j</mi><mo
   linebreak="badbreak">∈</mo><mrow><mo>{</mo><mrow><mn>1</mn><mo>,</mo><m
   o>⋯</mo><mo>,</mo><mi>M</mi></mrow><mo>}</mo></mrow></mrow></msub><mrow
   ><mo>(</mo><mrow><mi>s</mi><mi>n</mi><msubsup><mi>p</mi><mi>i</mi><mrow
   ><mi>p</mi><mi>o</mi><mi>l</mi><msub><mi>y</mi><mi>j</mi></msub></mrow>
   </msubsup><mo
   linebreak="badbreak">+</mo><msub><mi>V</mi><mrow><mi>p</mi><mi>o</mi><m
   i>l</mi><mi>y</mi></mrow></msub><mo
   linebreak="badbreak"><</mo><mi>s</mi><mi>n</mi><msubsup><mi>p</mi><mi>i
   </mi><mrow><mi>t</mi><mi>o</mi><msub><mi>t</mi><mi>j</mi></msub></mrow>
   </msubsup><mo
   linebreak="badbreak">−</mo><msub><mi>V</mi><mrow><mi>t</mi><mi>o</mi><m
   i>t</mi></mrow></msub></mrow><mo>)</mo></mrow><mo>∨</mo><msub><mo>∀</mo
   ><mrow><mi>j</mi><mo
   linebreak="badbreak">∈</mo><mrow><mo>{</mo><mrow><mn>1</mn><mo>,</mo><m
   o>⋯</mo><mo>,</mo><mi>M</mi></mrow><mo>}</mo></mrow></mrow></msub><mrow
   ><mo>(</mo><mrow><mi>s</mi><mi>n</mi><msubsup><mi>p</mi><mi>i</mi><mrow
   ><mi>p</mi><mi>o</mi><mi>l</mi><msub><mi>y</mi><mi>j</mi></msub></mrow>
   </msubsup><mo
   linebreak="badbreak">−</mo><msub><mi>V</mi><mrow><mi>p</mi><mi>o</mi><m
   i>l</mi><mi>y</mi></mrow></msub><mo
   linebreak="badbreak">></mo><mi>s</mi><mi>n</mi><msubsup><mi>p</mi><mi>i
   </mi><mrow><mi>t</mi><mi>o</mi><msub><mi>t</mi><mi>j</mi></msub></mrow>
   </msubsup><mo
   linebreak="badbreak">+</mo><msub><mi>V</mi><mrow><mi>t</mi><mi>o</mi><m
   i>t</mi></mrow></msub></mrow><mo>)</mo></mrow></mrow> :MATH]

Survival analysis using TCGA data

   From our list of 147 tranSNPs an extended list of variants was built
   retrieving variants in strong linkage disequilibrium (LD). We queried
   the Ensembl rest API version GRCh37 ([187]Yates et al., 2015) and
   retrieved all the variants with r^2 > 0.8 in a genomic window of 500kbp
   using general population data. For a total of 3,003 variants,
   distributed among 120 LD blocks, we were able to retrieve genotype
   calls from TCGA data. Specifically, raw TCGA genotype calls were
   downloaded from TCGA legacy data portal
   (portal.gdc.cancer.gov/legacy-archive) and only genotypes with
   confidence score larger than 0.1 were retained. Genotypes were analyzed
   with SHAPEIT v2 ([188]Delaneau et al., 2013) to infer haplotype
   structure and optimize genotype content information for the imputation
   process. Variants were imputed using IMPUTE v2.3.2 ([189]Howie et al.,
   2009) against a reference panel built from 1,000 Genomes Project data.
   The final extended list of variants was used to perform
   variant-specific survival analysis using TCGA breast cancer survival
   data ([190]Liu et al., 2018). Overall Survival (OS), Disease-Specific
   Survival (DSS), Disease-Free Interval (DFI) and Progression-Free
   Interval (PFI) were considered in our analysis. Dominant and recessive
   models were built, respectively, grouping together patients with
   homozygous alternative and heterozygous genotypes (AA vs AB+BB) and
   homozygous reference and heterozygous genotypes (AA+AB vs BB).
   Kaplan-Meier survival curves were built for each variant and Cox
   proportional hazards regression models were used and inspected. To
   limit false positive results, for each considered outcome, association
   results demonstrating a p value <0.05 (minimum between Log rank test p
   value and Wald test p value was considered) were aggregated at the
   level of LD blocks and >5% of variants reproducing the association
   signal in a block were required to nominate the block as associated.
   Survival analyses were performed using survival R package
   ([191]Therneau and Grambsch, 2000).

Identification of SNPs disrupting putative RBP consensus motifs

   We retrieved 552 human RNA binding consensus motifs from RBPDB database
   ([192]Cook et al., 2011). Motif analysis was performed on the list of
   33 UTR tranSNPs resulting from the survival analysis. For each variant,
   two sequences of length 60bp with the SNP reference or alternative
   allele in the middle were built using GRCh37 reference genome. All
   sequences were then collected in a FASTA file and TESS software
   ([193]Schug, 2008) was ran on this file to compute motifs scores. TESS
   tool provides a set of log-likelihood-ratio-based scores, among which
   we used the score La, which represent the log-odds ratio of the match,
   and the score Lm, which represents the maximum possible log-odds ratio
   for a match from a given consensus motif. For each motif overlapping a
   variant, results were collected only if the score La for the reference
   or the alternative allele was at least 3 and it was computed on the
   forward (transcribed) strand. Confident results were filtered by
   considering only entries having a La/Lm value greater than 0.5 in at
   least one of the two conditions. In addition, only variants presenting
   an absolute ratio between reference and alternative La scores greater
   than 10% or presence of a motif with the reference but not with the
   alternative (or vice-versa) were retained.

Polysomal profiling and RNA-sequencing

   MCF7 parental cells were seeded in P100 dishes and treated at about 70%
   confluence by 1 μM doxorubicin or 10 μM Nutlin, for 16 hours. DMSO was
   used for treatment control. Cytoplasmic lysates were obtained and
   polysome fractionations performed as described in ([194]Dassi et al.,
   2015; [195]Provenzani et al., 2006; [196]Rizzotto et al., 2020;
   [197]Rossi et al., 2019; [198]Zaccara et al., 2014). Briefly,
   cytoplasmic lysates were loaded on a 15 ml, 10-50% sucrose gradient,
   ultra-centrifuged (40k rpm for 100’) and fractionated with an automated
   fraction collector (1 ml per fraction Teledyne ISCO), tracing the
   254[nm] absorbance. All the lighter fractions containing subpolysomal
   fractions (from the top to the gradient up to the fractions
   corresponding to the 80S monosome) were pooled in a tube. Heavier
   fractions corresponding to polysomal RNAs (two or more ribosomes) were
   also pooled in a separate tube. RNA was purified by extraction with 1
   volume of phenol-chloroform. The aqueous fraction was recovered after
   centrifugation, transferred to a new tube where RNA precipitation was
   induced by the addition of 1 volume of isopropanol and a subsequent
   centrifugation step. Pellets were washed once using 70% V/V ethanol to
   remove phenol contaminations and resuspended in sterile, RNAse free
   water. Total RNA samples were instead obtained by TRIzol (ThermoFisher)
   extraction of a separate cell population prepared in parallel. RNA
   preparations were quantified by NanoDrop microvolume spectrophotometer.
   Purity and integrity were checked by Agilent 2100 Bionalyzer
   electrophoretic runs exploiting the Agilent RNA 6000 Nano kit. Sampling
   with RIN over 6 were used to prepare libraries for RNA sequencing. Two
   biological replicates for each RNA type (total and polysomal) and
   condition (mock, doxorubicin, Nutlin) were sequenced. Libraries were
   prepared following the TruSeq RNA Library preparation kit v2 protocol
   (Illumina), staring from 1 μg of input RNA. Paired end (100 bp)
   sequencing was performed on a HiSeq 2500 (Illumina). FASTA files were
   aligned to the reference genome GRCh37 using TopHat v2.0.10 ([199]Kim
   et al., 2013), resulting in BAM files with an average of ∼32 million
   unique mapping (mapping quality >5) reads. Differential expression
   analysis was performed with Cuffdiff ([200]Trapnell et al., 2013)
   comparing Nutlin and doxorubicin samples against mock samples,
   considering both total and polysomal RNA fractions. Pathway enrichment
   analysis of deregulated genes in all tested comparisons was performed
   using clusterProfiler R package ([201]Yu et al., 2012) exploiting KEGG
   database ([202]Kanehisa et al., 2020). Comparison of multiple gene
   lists obtained from condition-associated tranSNPs was performed with
   Metascape ([203]Zhou et al., 2019).

Cloning 5’UTR or 3’UTR allelic variants for gene reporter assays

   The commercial Firefly luciferase plasmid pGL4.13 was used to clone the
   two rs1055472 alleles BRI3BP 3′UTR as well as two regions of the ATF6
   3′UTR containing different set of SNPs, using a PCR based approach that
   introduced the XbaI restriction enzyme site. MCF7 cDNA was used as
   template, given the heterozygous state for the SNPs of interest. The
   cloned regions correspond to genomic coordinates
   chr12:125509977-125510349 (hg19) for BRI3BP and
   chr1:161930870-161931403 or chr1:161931723-161932422 for ATF6. Primers
   used for the cloning are listed in the [204]key resources table, where
   the lowercase bases are the restriction site sequence included in the
   primers after two initial adenines. The pGL3 basic vector was instead
   used to clone the two rs2395655 alleles of the CDKN1A 5’UTR exploiting
   the unique NcoI site that overlaps with the firefly start codon and
   using complementary primers containing the 5’UTR sequence of interest
   for each SNP allele that were annealed prior to ligation to generate
   NcoI sites at both ends. After in vitro ligation (T4 DNA ligase, NEB),
   E. coli competent cells were transformed and colonies picked to recover
   plasmids based first on a colony PCR approach, followed by plasmid
   recovery from positive colonies followed by electrophoresis migration
   and digestion pattern analysis. The successful cloning of both SNP
   alleles for the two targets was then verified by Sanger sequencing
   (Eurofins Genomics) of independent candidate colonies.

Dual-luciferase assays

   pGL4.13- or pGL3-derivative plasmids were transiently transfected along
   with the pRL-SV40 Renilla control luciferase in a 3:1 ratio (250 ng
   plus 50 ng for 24 well format) using Mirus-LTI transfection reagents
   following the manufacturer's protocol. When needed, cells were treated
   with doxorubicin or Nutlin for 16 hours, starting 24 hours after
   transfection. Luciferase assays were performed using Dual-Luciferase™
   Reporter (DLR™) Assay System (Promega), following the provided
   protocol. Light units were measured using a Tecan M200pro multilable
   plate reader through three technical replicates of well. Presented in
   the graphs are the average Relative Light Units (Firefly/Renilla) and
   standard deviations of three biological experiments, compared using a
   two-tailed, equal variance Student's t-test.

RIP assays

   Ribonucleoprotein particles immunoprecipitation (RIP) experiments were
   performed following a published protocol with some modifications
   described recently ([205]Keene et al., 2006; [206]Uroda et al., 2019),
   starting from 4 × 10^7 MCF7 cells. After lysis, the supernatants were
   collected and 1% of each sample was set aside as input while the
   remaining was incubated for 4 hours at 4°C with protein A magnetic
   beads (Thermo Fisher Scientific) coated either with 3 μg of an
   anti-PABPC1 antibody (Abcam) or with 3 μg of normal Rabbit IgG (Santa
   Cruz). After the washing steps, RNA was isolated from Input and IP
   samples using TRIzol (Thermo Fisher Scientific) and converted to cDNA
   using the RevertAid First Strand cDNA Synthesis Kit and standard
   protocol (Thermo Fisher Scientific). qPCR reactions were performed
   using the qPCRBIO SyGreen Mix (PCR Biosystems) master mix on a CFX384
   thermal cycler (Biorad). GAPDH was tested as a comparison. Two pairs of
   primers were used for ATF6, of which one targeting exon 9, and the
   second targeting the 3UTR sequence comprising the rs2499847 SNP whose
   alleles are predicted to impact a PABPC1 binding site ([207]Table S7).
   The primers sequences can be found in the [208]key resources table.

Quantification and statistical analysis

   Statistical tests applied throughout the study are specified in
   results, figure legends and in the methods accordingly. Given a list of
   variants, LD blocks were determined identifying the number of cliques
   in a graph were nodes are variants and edges are present between two
   nodes when the corresponding variants are in LD with an r^2 > 0.8. In
   all considered cases the number of cliques corresponded to the number
   of connected components. Luciferase assay data were analyzed using a
   2-way ANOVA with Sidak’s multiple comparison test. RIP assays were
   analyzed using a two-tailed unpaired t-test.

Acknowledgments