Graphical abstract graphic file with name fx1.jpg [35]Open in a new tab Highlights * • SNPs can alter mRNA translation potential * • Polysomal imbalance of heterozygous SNPs can mark allele-specific protein expression * • UTR SNPs associated with allele-specific translation demonstrate prognostic value __________________________________________________________________ Molecular mechanism of gene regulation; Computational bioinformatics; Transcriptomics Introduction Single-nucleotide polymorphisms (SNPs) represent one of the largest classes of genetic variations. They underlie and are responsible for inter-individual variations in complex-disease phenotypes, including cancer risk or aggressiveness. Wide attention has been given to SNPs that can lead to allele-specific changes in gene expression, for instance, by modifying the affinity of transcription factor-binding sites in promoter or enhancer regions directly, or indirectly via influencing epigenetic regulation. Included in these examples are functionally relevant SNPs affecting p53 function as a transcription factor or p53 protein expression by altering target binding sites ([36]Bond et al., 2006; [37]Menendez et al., 2007, [38]2019; [39]Tomso et al., 2005), or association-driven studies that have candidated SNPs as modulators of major cancer drivers, such as AR ([40]Bu et al., 2016; [41]Romanel et al., 2017), ER ([42]Bailey et al., 2016; [43]Dunning et al., 2016), or cMYC ([44]Pomerantz et al., 2009; [45]Takatsuno et al., 2013). A fraction of SNPs identified in the human population is located within coding regions or UTRs. In this case, both mechanism- and association-driven studies have pursued functional SNPs that can modify aspects of post-transcriptional gene regulation, for example, by altering microRNA-binding sites ([46]Landi et al., 2012). Specific tools have been implemented to mine the available wealth of RNA-seq-based gene expression data and identify and pursue instances of allele-specific gene expression ([47]Mayba et al., 2014; [48]Pandey et al., 2013; [49]Przytycki and Singh, 2020; [50]Romanel, 2019; [51]Romanel et al., 2015; [52]Rozowsky et al., 2011; [53]Skelly et al., 2011; [54]Wei et al., 2012). The same cannot be said for annotating candidate SNPs driving or being associated with alterations in mRNA translation potential. Recently, we showed that allelic imbalance restricted to polysome-bound mRNAs can be exploited to investigate the functional significance of 5′UTR single-nucleotide variants (SNVs) at the CDKN2A gene ([55]Andreotti et al., 2016), since specific SNVs affected CDKN2A translation. Those results led us to hypothesize that a comparative analysis of allelic-imbalance from total and polysomal mRNAs extracted and sequenced starting from the same cell sample, which is independently genotyped for heterozygous SNPs, could overcome the intrinsic noise in SNP calling and coverage from RNA-seq data. If so, a catalog of coding and UTRs SNPs associated and potentially causative of alterations in mRNA translation potential could be obtained. Here we describe such an approach starting from a commonly used breast cancer cell line and two disease-relevant treatments. About 4% of the heterozygous SNPs that could be followed showed a significant level of imbalance within polysomal RNAs. Nearly 25% of these are located in UTR sequences, and some appear to stratify patients with cancer for distinct clinical outcomes. Thus, our approach led us to distinguish both constitutive and treatment-dependent SNPs that are associated with or can cause changes in mRNA translation efficiency. We propose that our method can lead to identify a class of functional SNPs that we call tranSNPs, endowed with prognostic value for cancer or other diseases. Results Using polysome profiling to identify SNPs exhibiting allelic imbalance In order to develop an approach to map SNPs associated with allelic imbalance within translating ribosomes, we took advantage of a well-characterized cancer cell line with available genotyping data. We thus chose MCF7 cells, p53 wild-type breast adenocarcinoma-derived cells we had previously used for polysomal profiling studies ([56]Zaccara et al., 2014). After determining the set of heterozygous SNPs using DNA data, we exploited RNA-seq reads to compare relative allelic representation between the transcripts isolated in the cytoplasm because of their association with polysomes or lack thereof. Besides a mock condition, we included two treatments, 1 μM doxorubicin or 10 μM Nutlin, that we found can reduce global translation, engage p53 responses, and lead to a significant uncoupling between transcriptome and translatome changes. Indeed, many transcripts were significantly modulated by the treatments only at the level of polysomal RNA fractions ([57]Rizzotto et al., 2020; [58]Zaccara et al., 2014). We reasoned that these treatments could uncover specific allelic imbalance events due to changes in translation specificity mediated, for instance, by the action of RNA-binding proteins ([59]Zaccara et al., 2014; [60]Rizzotto et al., 2020). Polysomal profiling was performed using cytoplasmic lysates fractionated by a linear 10%-50% sucrose gradient, as previously described ([61]Rizzotto et al., 2020; [62]Rossi et al., 2019; [63]Zaccara et al., 2014). Cytoplasmic mRNA associated with light fractions, with ribosome subunits, or with the 80S monosome, assumed to be not actively translating, were pooled together and sequenced separately from mRNAs associated with two or more ribosomes. Total RNA was also collected and sequenced. The RNA sequencing of two biological replicates for each treatment and fraction produced on average 32 million unique mapped reads. Differential gene expression analysis and pathway enrichment analysis confirmed that the two treatments activated a canonical p53 response ([64]Figure S1). Identifying instances of allelic imbalance in polysomal RNA To identify SNPs exhibiting allelic imbalance in one or more RNA fractions, we first retrieved from public databases an exhaustive list of SNPs that are heterozygous in the MCF7 cell line. We integrated SNP-array and exome-based genotype calls and overall identified 11,544 heterozygous SNPs of which 1,802 were in 3′ UTRs and 729 were in 5′ UTRs ([65]Table S1). Then, we determined the allelic fractions of these SNPs in our RNA-seq data using a pileup approach based on high-quality reads and bases and requiring a minimum local depth of coverage of 10X. We found an average of 4,100 of the expected 11,544 SNPs per sample with confirmed coverage signal and, overall, we found 3,974 SNPs analyzable in at least one condition. Our approach focuses on the comparison of SNP allelic counts across biological replicates of the two RNA fractions that were sequenced for each of the three treatments. We first determined which is the distribution of SNP allelic fractions across all experiments and measured to what extent they are variable across biological replicates. Of interest, the variability of SNP allelic fractions was limited across biological replicates ([66]Figure S2) with a mean replicate's difference of about 7% that was consistent among the different RNA fractions and treatments ([67]Table S2). In addition, the range of SNP allelic fractions was quite wide ([68]Figure S2). This suggests that heterozygous SNP allelic counts from RNA-seq data are potentially biased by position-specific sequencing properties, indicating that divergence from an expected 0.5 allelic fraction should not, as commonly done, be directly linked to putative allele-specific expression phenomena or to a specific expression imbalance direction. Hence, we designed and implemented a computational approach that first calculates a condition-specific variability of SNP allelic fractions exploiting the available biological replicates and then uses it to identify significant SNP allelic imbalances across different conditions. We used this approach to identify allelic imbalances between polysomal and total RNA fractions. Specifically, considering SNPs allelic fraction (AF) intervals calculated by summing and subtracting condition-specific AF variabilities from SNPs AFs, we searched for SNPs presenting polysomal and total RNA non-overlapping AF intervals ([69]Figure 1A). The analysis led to 162 imbalance instances, involving 147 SNPs that were identified in both biological replicates preserving the imbalance direction. This conservative list of 147 SNPs represents the first catalog of tranSNPs: polymorphisms exhibiting allelic imbalance specifically in polysome-bound mRNA fractions ([70]Tables S3 and [71]S4). The list is almost equally divided between constitutive, doxorubicin-dependent, and Nutlin-dependent polysomal allelic imbalances ([72]Figure 1B). Only three tranSNPs were common to all conditions, and only eight to the two different p53-activating treatments. Despite the limited observed intersection among our conservative calls, the comparison of condition-associated tranSNPs allelic fractions across all conditions showed not only an expected clear shift in the condition-associated tranSNPs AF distribution but also a fraction of SNPs with comparable imbalance across two or three conditions ([73]Figure 1C). The condition-associated lists of genes harboring the tranSNPs showed mild enrichment for specific biological processes or molecular functions, including regulation of spindle organization, mitotic cytokinesis, protein kinase regulator activity, and catalytic activity ([74]Figure S3). Figure 1. [75]Figure 1 [76]Open in a new tab Identification of SNPs allelic imbalance across different RNA fractions (A) Schematic representation of the approach developed to identify RNA fraction-specific SNP allelic imbalances. RNA-seq-based SNP allelic fraction variability is estimated both in total and polysomal RNA fractions. Then variability extended SNP allelic fractions are compared and only non-overlapping total versus polysomal imbalances are retained as tranSNPs. In the example, SNP2 satisfies the condition and is hence nominated as tranSNP. AF, allelic fraction; V, mean AF variability among replicates. (B) Venn diagram showing private and shared tranSNPs identified across the three analyzed conditions. (C) Allelic imbalance distribution of condition-associated tranSNPs is shown across the different conditions. Aggregate distribution is shown using boxplots, whereas single SNPs distribution is shown using a heatmap, where red intensity represents the level of imbalance. In the boxplot, the imbalance is shown as absolute log2 ratio of allelic fraction in polysomal RNA and allelic fraction in total RNA. In the heatmap, red intensity is proportional to this value; gray represents no value. Characteristics of tranSNPs are shown in [77]Table 1 and are compared with those in the larger set of 11,544 initial SNPs and those in the set of 3,974 analyzable SNPs. Of the 147 tranSNPs we identified, 39 are located in UTRs and represent top candidates for a direct role on the observed allelic imbalance. Table 1. Summary table of MCF7 heterozygous SNPs MCF7 heterozygous SNPs Analyzable SNPs tranSNPs Number 11,544 3,974 147 Average Global MAF 0.26 0.25 0.25 Average EUR MAF 0.28 0.28 0.27 Overlapping genes 5,493 2,532 139 Intronic 4,724 675 26 Coding 6,161 3,007 108 UTR 2,553 1,182 39 LD blocks 8,656 3,202 143 [78]Open in a new tab Summary table presenting the characteristics of tranSNPs together with same features measured from the set of the SNPs heterozygous in the MCF7 cell line initially considered, and the set of SNPs analyzable from the RNA-seq data by our approach. Average Global and EUR MAF (Minor Allele Frequency) were retrieved from 1,000 Genomes Project data. Details on how the LD blocks were defined are provided in the text. BRI3BP 3′ UTR rs1055472 and CDKN1A 5′ UTR rs2395655 alleles are functionally distinct We chose two UTR SNPs showing polysomal allelic imbalance for validation and opted for a gene reporter assay to evaluate the functional impact of the two pairs of tranSNPs. We selected two Nutlin-dependent tranSNPs and two genes harboring them whose functions are related to p53. BRI3BP, also known as human cervical cancer oncogene 1 (HCCR-1), might act as negative modulator of p53 ([79]Cho et al., 2007; [80]Ha et al., 2008). The two rs1055472 alleles of the BRI3BP 373nt 3′ UTR were cloned downstream of the Firefly cDNA. MCF7 cells were transiently transfected with each of the two alleles, and the activity of the reporter relative to the Renilla control luciferase was measured in untreated cells or in cells treated with Nutlin. In both cases we observed that the alternative allele led to a relative increase in the reporter gene activity ([81]Figure 2A), suggesting an overall differential translation efficiency. Although the SNP was computationally classified as polysomally imbalanced only after Nutlin treatment ([82]Tables S3 and [83]S4), computational and experimental data combined suggest that the two alleles are functionally distinct, impacting on gene expression. Figure 2. [84]Figure 2 [85]Open in a new tab TranSNPs results in functionally distinct alleles (A) MCF7 cells were transiently transfected with pGL4.13-based vectors containing BRI3BP 3′ UTR fragments differing for the indicated BRI3BP SNP allele, and the control pRLSV40 Renilla vector. After 24 h of transfection, cells were treated with Nutlin for 24 h before performing dual-luciferase assays. Firefly luciferase signals were normalized to Renilla to control for transfection efficiency and to relative Firefly mRNA levels to take into account differences in reporter's transcript levels. Individual values from independently transfected wells are plotted. (B) Same as (A), except that the p21-5′ UTR was cloned in the low-expression pGL3-basic vector. ∗∗p value < 0.01; ∗∗∗∗p value < 0.0001, adjusted p value based on a two-way ANOVA with Sidak's multiple comparison test. Data are represented as mean ± SD. CDKN1A (p21) is an important cyclin-dependent kinase inhibitor and one of the major direct p53 transcriptional targets mediating cell-cycle arrest ([86]El-Deiry, 2016; [87]Kreis et al., 2019). The gene is highly regulated at the transcriptional and post-transcriptional levels ([88]Rossi et al., 2019; [89]Wang et al., 2000). CDKN1A transcripts were highly induced by both doxorubicin and Nutlin treatment, and the coverage of rs2395655 alleles in the untreated condition was very low (<10X). We evaluated the CDKN1A 5′ UTR rs2395655 alleles cloned upstream of the Firefly cDNA. We observed an overall bimodal distribution in the reporter gene activity, whose amplitude is reduced when the alternative allele is present ([90]Figure 2B, left panel). The treatment with Nutlin further mitigated this bimodal distribution with an overall reporter gene activity that is strongly reduced when the alternative allele is present ([91]Figure 2B, right panel). Since the SNP was concordantly classified as polysomally imbalanced after Nutlin treatment, in this case we highlight a scenario with functionally distinct alleles where differential translation efficiency may impact protein expression only under specific conditions. TranSNPs can have prognostic significance in breast cancer TCGA data Considering that rs1055472 and rs2395655 are tranSNPs with functionally distinct alleles and the two genes harboring them are related to p53, we next investigated their potential clinical impact. We focused on breast cancer for consistency with the cell line model used and exploited the richness of data available in TCGA ([92]Koboldt et al., 2012). Exploring TCGA survival data ([93]Liu et al., 2018) and using Kaplan-Meier curves, we observed that patients harboring the rs1055472 alternative allele show a statistically significant increase in progression-free interval time (p value = 0.042, [94]Figure 3A). Of note, BRI3BP and TP53 transcript levels were not correlated ([95]Figure S4A) and variant rs1055472 was not associated with patients' TP53 somatic aberration status ([96]Figure S4B) or with the utilization of DNA-damaging agents ([97]Figure S4C). Furthermore, the analysis repeated on TP53 aberrant or TP53 wild-type patients only ([98]Figure S4D) showed similar trends. Overall, this indicates that rs1055472 signal is not dependent on the TP53 status. In addition, the signal also persisted when a multivariate model including breast cancer subtype as covariate was used (p value = 0.028), also highlighting an independence of the signal from ER status ([99]Figure S4E). Figure 3. [100]Figure 3 [101]Open in a new tab Prognostic significance of tranSNPs in breast cancer (A) Progression-Free Interval analysis of BRI3BP-related tranSNP. Kaplan-Meyer curves along with summary statistics are reported. (B–D) Examples of tranSNPs presenting prognostic significance. Kaplan-Meyer curves along with summary statistics are reported. Motivated by this result, we then explored more systematically whether tranSNPs could be associated with distinct clinical variables in patients with cancer. To this end, our tranSNPs list was extended including all common SNPs in strong linkage disequilibrium (LD) (r^2 > 0.8) with them, obtaining 3,003 SNPs distributed across 120 LD blocks with genotype imputable from TCGA data. The extended SNP list was then used to interrogate clinical data in the breast cancer cohort from TCGA ([102]Liu et al., 2018), using Overall Survival (OS), Disease and Progression Free Intervals (DFI, PFI), and Disease Specific Survival (DSS) endpoints. Kaplan-Meier curves were built stratifying patients for the presence of the alternative allele (AA versus AB + BB) or the presence of the homozygous genotype for the alternative allele (AA + AB versus BB). Cox proportional hazards regression models were built to perform the analysis. To limit false-positive results, for each considered outcome, association results demonstrating a p value<0.05 were aggregated at the level of LD blocks and >5% of SNPs reproducing the association signal in a block were required to nominate the block as associated. On average, we found 10.7% (min 8.3%, max 14.2%) associated blocks across all outcomes with an average fraction of SNPs associated in a block equal to 56.8% and an average number of SNPs per block of 23.4 ([103]Table S5). Across the associated blocks, we identified 33 SNPs in the UTR sequences of 17 genes demonstrating significant prognostic effects ([104]Figures 3B-3D, and [105]Table S6). Both protective and risk alternative SNP alleles were found. UTR tranSNPs could alter RNA-binding protein target sites To search for a mechanism that could underlie polysomal imbalance and the observed differential translation, we examined the potential impact on RNA-binding proteins (RBPs)-binding sites for the 33 UTR tranSNPs resulting from the survival analysis. These polymorphisms might indeed directly cause the observed allelic imbalance of SNPs located in UTRs by, for example, impacting on UTR structure or binding sites for RBPs or also microRNAs. RNA binding consensus motifs were retrieved from the RBPDB database ([106]Cook et al., 2011). TESS software ([107]Schug, 2008) was used to compute motifs scores starting from 60-bp sequences with the SNP reference or alternative allele in the center. Overall, we identified 61 putative motif disruptions involving 27 UTR variants and RBPs associated with translation control and mRNA stability ([108]Table S7), such as ELAVLl/HuR, FUS, YBX1, and PABPC1 ([109]Behm-Ansmant et al., 2007; [110]Mazan-Mamczarz et al., 2003; [111]Peixeiro et al., 2012, p. 1; [112]Tanguay and Gallie, 1996; [113]Yasuda et al., 2013). UTR-specific imbalance in ATF6 transcript Among the UTR SNPs showing potential prognostic significance, a peculiar allelic imbalance pattern was observed for polymorphisms in the ATF6 transcript. Although only one ATF6 SNP was present in our initial tranSNPs list, an overall trend of imbalance in polysomal-associated RNAs was observed and restricted to SNPs located in its relatively long 3′ UTR ([114]Figure 4A). This imbalance trend was consistent across most ATF6 3′ UTR SNPs that are heterozygous in MCF7, and the overall imbalance distribution was strongly statistically significant (p value < 1 × 10^−4) when compared with imbalance distributions obtained from 1,000 sets of random sequential heterozygous SNPs of the same cardinality ([115]Figure 4B). On the contrary, the ATF6 heterozygous coding SNPs did not show any significant imbalance. Furthermore, alternative alleles were confirmed to be all in phase using 1,000 Genomes Project phased genotype data, ATF6 transcript level was not associated with clinical outcomes in both univariate and multivariate models, and no differential ATF6 transcript level was observed for patients carrying the clinically distinct haplotype structure. Figure 4. [116]Figure 4 [117]Open in a new tab Haplotype structure and allelic imbalance along the ATF6 gene and impact of UTR TranSNPs (A) RNA-seq-based allelic fractions of ATF6 heterozygous SNPs are reported for both coding and 3′ UTR (in red) SNPs. On the top we show the distribution observed in the first biological replicate, and on the bottom we show the distribution observed in the second biological replicate. (B) Significance of ATF6 3′ UTR SNPs allelic imbalance (red line) versus distribution of sequential random SNPs imbalances. On top using heterozygous SNPs data from the first biological replicate, and on the bottom using data from the second biological replicate. (C) Dual-luciferase assays in MCF7 cells transiently transfected with reporter vectors containing ATF6 3′ UTR SNP alleles. Experiments were developed as described in [118]Figure 2. ∗∗∗∗p value < 0.0001, adjusted p value based on a two-way ANOVA with Sidak's multiple comparison test. Data are represented as mean ± SD. (D) RIP experiment probing the interaction of PABPC1 with the ATF6 transcript. Bars plot the average fold enrichment relative to the input sample. Individual average values from three biological replicates are also shown. Results obtained with an IgG control antibody are included. ∗p value < 0.05, two-tailed, unpaired t test. Data are represented as mean ± SD. A gene reporter assay was used to evaluate the functional impact of two different sub-regions of the long ATF6 3′ UTR, corresponding to chr1:161930870-161931403 and chr1:161931723-161932422 (hg19) genomic coordinates ([119]Figure 4C). For the downstream region, we observed a consistent statistically significant increase in the reporter gene activity for the allele harboring the alternative SNPs bases ([120]Figure 4C), both in untreated and Nutlin-treated cells. For the upstream region, we observed a similar trend of increased reporter gene activity (although not reaching statistical significance) for the allele harboring the alternative SNPs bases in the untreated cells but not in the Nutlin-treated cells. Computational predictions suggest that the allele harboring the alternative SNPs bases loses two PABPC1-binding sites that have strong scores when considering the allele harboring the reference SNPs bases ([121]Table S7), suggesting an allele-dependent distinct processing of the 3′ UTR. Indeed, using RNA immunoprecipitation (RIP) assays, we demonstrate that the ATF6 transcript, and specifically the 3′ UTR region, is a PABPC1 target in MCF7 cells ([122]Figure 4D). Discussion The relative abundance of an mRNA in polysomal versus subpolysomal fraction is frequently used as a proxy for protein synthesis and can capture changes both in global and specific translation efficiency, driven by structural or sequence elements in the context of cell stress responses. It is common knowledge that cancer cells experience chronic stress as well as acute stress responses that converge on translation controls, unfolded protein responses, oxidative stress, and proteasome functions. Recent studies, including some from our group, are placing wild-type p53 in an important position to respond and influence these pathways and also are indicating that the gain of function properties of several mutant p53 proteins may converge on the dysregulation of controls on translation quality, protein folding, and proteasome functionality. Hence, in this study, we chose a p53 wild-type cancer cell line well characterized in terms of p53-induced responses ([123]Andrysik et al., 2017), including post-transcriptional and translational changes ([124]Zaccara et al., 2014). We used both a commonly used chemotherapeutic drug that activates the DNA damage response, and a selective MDM2 inhibitor that results in acute p53 activation without apparent genotoxic response ([125]Rizzotto et al., 2020; [126]Tovar et al., 2006). Doses and treatment time points were selected based on previous characterization of both p53 activation pathway and cell outcome ([127]Lion et al., 2013; [128]Nassiri et al., 2019). Furthermore, for both treatments, we have previously shown a global as well as a specific impact on translation ([129]Zaccara et al., 2014). Since MCF7 cells have been genotyped both by SNP arrays and exome-sequencing as part of the cell-line encyclopedia and NCI-60 studies, we could leverage RNA-seq data of total and polysome-bound mRNAs to focus on allele counts for all transcribed SNPs in the heterozygous state to reveal an imbalance in one fraction over the other. We hypothesized that this approach, although restrictive, could overcome limitations deriving from the quality and quantity of sequence reads that can be attributed to each allele of a SNP pair. Computational approaches that use allele counts of heterozygous SNPs to identify imbalance events have been implemented and widely used to investigate allele-specific expression patterns both in healthy and disease tissues ([130]Pandey et al., 2013; [131]Romanel, 2019; [132]Romanel et al., 2015; [133]Rozowsky et al., 2011; [134]Skelly et al., 2011; [135]Wei et al., 2012). However, only few approaches ([136]Mayba et al., 2014; [137]Przytycki and Singh, 2020) have been developed to compare allelic imbalances across matched samples, and all of the available methods have been specifically designed in the context of cancer (e.g., to compare cancer samples with their matched normal counterparts), hence limiting hence their broad applicability. Using polysomal profiling and RNA relative quantification from either pooled or individual fractions typically show relatively small differences, that can, however, underlie an important effect at the level of protein amounts. The approach we developed to identify tranSNPs is focused on computing and comparing the relative change of allele counts across mRNA fractions. To account for technical noise, the approach embeds in the calculation the amount of allele counts variability observed in a given condition exploiting data from biological replicates. Of note, such variability was not affected by the cellular treatment employed and was consistent across different RNA fractions. We identified 147 tranSNPs, representing nearly 4% of the overall analyzed SNPs, a fraction that, although conservative, can be considered comparable with the recent fraction of about 2% of allele-specific expressed genes computed across tumor versus normal matched samples ([138]Przytycki and Singh, 2020). Indeed, in [139]Przytycki and Singh (2020), differential allele-specific expression (ASE) of genes was computed by aggregating at gene level heterozygous SNPs data and no statistically significant difference between genes with differential ASE and those without with respect to the number of heterozygous SNPs across the length of the genes was found. We checked the intersection of our tranSNP list with the current list of GWAS variants from the GWAS catalog ([140]Buniello et al., 2019) that are related to cancer (N = 2,657), although it is possible that tranSNPs might not be directly related to the risk of developing a tumor but be related to a distinct clinical outcome. Only an intersection between the LD block defined by our tranSNP rs760077 and association signal with gastric cancer (variant rs760077 and LD variants rs140081212, rs4072037) and breast cancer (LD variants rs2075570, rs2974935) was identified. We further explored the intersection between the tranSNPs and a set of GWAS-implicated cancer risk SNPs identified in 41 genes of the p53 response pathway, of which those mapping at KITLG, CDKN2A, and TEX9 genes were reported to potentially impact drug sensitivity ([141]Stracquadanio et al., 2016; [142]Zhang et al., 2021). Given that MCF7 cells express wild-type p53 and that the doxorubicin or Nutlin treatments elicit a p53-dependent response, we were motivated to examine the potential overlap with our tranSNP list. However, only 13 of those 41 GWAS-implicated cancer risk SNPs could be evaluated in our MCF7 RNA-seq data. They map at six genes (CERSS, ISYNA1, CFLAR, SESN1, AKAP9, CYP51A1) that were not reported to impact drug sensitivity ([143]Stracquadanio et al., 2016; [144]Zhang et al., 2021). Finally, we considered the results of a very recent study where over 12,000 3′ UTR variants, including more than 2,000 disease-associated SNPs from the GWAS catalog, were cloned within 100-nt-long fragments and tested in a massively parallel reporter assay measuring relative RNA expression ([145]Griesemer et al., 2021). Surprisingly, about 25% of the tested GWAS SNPs resulted in steady-state changes in reporter transcripts' abundance. Among the GWAS SNPs included in that screen, 131 are heterozygous in MCF7 and could be analyzed by our method. We found no intersection with our tranSNP list, consistent with our approach being focused on allelic differences that are detected among polysomal-associated mRNAs but not total mRNAs. A subset of the identified UTR tranSNPs showed apparent prognostic value. Some of these are located in genes that have been already established to play a role in cancer in general and in breast cancer or in relation to p53 functions, in particular. These include BRI3BP, also known as human cervical cancer oncogene 1 (HCCR-1), that might act as a negative modulator of p53 ([146]Cho et al., 2007; [147]Ha et al., 2008); the ATF6 gene, an important modulator of endoplasmic reticulum stress that can promote cell survival ([148]Guan et al., 2017; [149]Sicari et al., 2019); and the protein lysine methyltransferase SETD3 gene, which was recently reported as a prognostic marker in patients with breast cancer ([150]Hassan et al., 2020) and also shown to promote apoptosis in response to DNA damage, in part through the modulation of p53 ([151]Abaev-Schneiderman et al., 2019). Other genes implicated by our results are GFM1, a mitochondrial translation elongation factor that has been linked to oxidative stress and cancer cell survival ([152]Norberg et al., 2017), and PLEC, an important member of a large family of scaffolding proteins that can modulate various aspects of cell function. The plectin gene is frequently found co-amplified with MYC in BRCA1-deficient breast cancer ([153]Annunziato et al., 2019). Sequence variants predicted to be deleterious have also been identified in primary breast cancers ([154]Horvath et al., 2013). Physical interaction of plectin with BRCA2 has also been reported ([155]Niwa et al., 2009). In all these cases, the tranSNPs we uncovered may be modifiers of protein expression, influencing mRNA translation efficiency, with consequences on cancer phenotypes. To begin exploring a more precise mechanism of action, motifs analysis based on eCLIP or RIP-seq data was exploited to identify UTR tranSNPs that may alter or lead to gain or loss of RNA-binding sites. Based on these predictions and focusing on a peculiar imbalance in ATF6 transcript that we observed only across 3′ UTR SNPs, we performed RIP assays and confirmed that ATF6 is a target of PABPC1, a well-known RNA-binding protein. This, combined with predicted putative loss of two PABPC1-binding sites across one of the two ATF6 alleles, suggests an allele-dependent processing of the 3′ UTR. Overall, our approach led us to identify a class of SNPs that are associated with changes in mRNA translation efficiency, potentially acting at the post-transcriptional level, and that show prognostic value for cancer, hence implicating the potential identification or stratification of patients with cancer based on genetic markers that are helpful in the prediction of prognosis in regard to death, progression, or recurrence. Limitations of the study This study considers a single cell line and two biological replicates for each considered treatment/condition. Survival analysis is focused on breast cancer data. Our catalog of tranSNPs is hence limited to the data we analyzed. STAR★Methods Key resources table REAGENT RESOURCE SOURCE IDENTIFIER Antibodies __________________________________________________________________ Anti-PABP Abcam Cat# bb6125 [10E10] Normal mouse IgG Merck Millipore Cat# 12-371 __________________________________________________________________ Bacterial and virus strains __________________________________________________________________ E. coli strain laboratory stock DH5alpha __________________________________________________________________ Chemicals, peptides, and recombinant proteins __________________________________________________________________ TRIzol Thermo Fisher Scientific Cat# 15596026 Nutlin Cayman chemicals Cat#10004372 Doxorubicin Sigma Aldrich Cat#25316-40-9 Xba I New England Biolabs Cat# R0145S Nco I New England Biolabs Cat# R3193S T4 DNA ligase New England Biolabs Cat# M0202S Protein A Magnetic Beads Thermo Fisher Scientific Cat# 88845 __________________________________________________________________ Critical commercial assays __________________________________________________________________ TruSeq RNA Library preparation kit v2 Illumina Cat#RS-122-2001 Agilent RNA 6000 Nano kit Agilent Cat#5067-1511 RevertAid RT Kit Thermo Fisher Scientific Cat# K1691 Dual Luciferase Reporter assay system Promega Cat# E1910 __________________________________________________________________ Deposited data __________________________________________________________________ Sequencing of total and polysomal RNA from normal and treated MCF7 cells This paper Bioproject: PRJNA693005 TCGA SNP 6.0 array calls Genomic Data Commons [156]http://portal.gdc.cancer.gov/legacy-archive TCGA clinical information [157]Liu et al. (2018) [158]https://doi.org/10.1016/j.cell.2018.02.052 ([159]Table S1) MCF7 heterozygous SNPs (array data) [160]Barretina et al., 2021 GEO [161]GSM888366 MCF7 heterozygous SNPs (wes data) [162]Reinhold et al. (2012) [163]https://doi.org/10.1158/0008-5472.CAN-12-1370 RBPDB [164]http://rbpdb.ccbr.utoronto.ca/ [165]https://doi.org/10.1093/nar/gkq1069 __________________________________________________________________ Experimental models: Cell lines __________________________________________________________________ MCF7 ICLC N/A __________________________________________________________________ Oligonucleotides: Cloning primers __________________________________________________________________ BRI3BP_F Metabion aatctagaAGGTCAGCCGGCCGGGCGGGTCCAC BRI3BP_R Metabion aatctagaAACTTGACTCAATCTGCCTTTATTA ATF6_852F Metabion aatctagaTTCATCCCTCGATTCCCAGC ATF6_852R Metabion aatctagaGCAACCCCCAAAAGGCAATC ATF6_848F Metabion aatctagaGGACACAGCTTCATTAGAGTGTT ATF6_848R Metabion aatctagaGGCTGTGAAAGCAAAAGTGGT p21_UTR_ref_F Metabion CATGGGGTGGCTATTTTGTCCTTGGGCTGCCT GTTTTCAGCTGCTGCAACCACAGGATTTCTTCT GTTCAGGCGCCC p21_UTR_ref_R Metabion CATGGGGCGCCTGAACAGAAGAAATCCCTGTGGT TGCAGCAGCTGAAAACAGGCAGCCCAAGGACAAA ATAGCCACCC p21_UTR_ALT_F Metabion CATGGGGTGGCTATTTTGTCCTTGGGCTGCCTGT TTTCAGCTGCTGCAACCACAGGGGTTTCTTCTGTT CAGGCGCCC p21_UTR_ALT_R Metabion CATGGGGCGCCTGAACAGAAGAAACCCCTGTGGT TGCAGCAGCTGAAAACAGGCAGCCCAAGGACAAA ATAGCCACCC __________________________________________________________________ Oligonucleotides: PCR and sequencing primers __________________________________________________________________ GAPDH_F Metabion TCCAAAATCAAGTGGGGCGA GAPDH_R Metabion AGTAGAGGCAGGGATGATGT ATF6_F Metabion CCCGTATTCTTCAGGGTGCT ATF6_R Metabion TCACTCCCTGAGTTCCTGCT ATF6_UTR_F Metabion GAACCTTCCTCCCCTGTGTG ATF6_UTR_R Metabion CAGAGTGAAAGGGGGCATCA __________________________________________________________________ Software and algorithms __________________________________________________________________ ASEQ [166]https://bitbucket.org/CibioBCG/aseq/ [167]https://doi.org/10.1186/s12920-015-0084-2 Ensembl rest API [168]https://rest.ensembl.org/ [169]https://doi.org/10.1093/bioinformatics/btu613 SHAPEIT [170]https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.h tml [171]https://doi.org/10.1016/j.ajhg.2013.09.002 IMPUTE2 [172]http://mathgen.stats.ox.ac.uk/impute/impute_v2.html [173]https://doi.org/10.1371/journal.pgen.1000529 Survival R package [174]https://cran.r-project.org/web/packages/survival/index.html [175]https://doi.org/10.1007/978-1-4757-3294-8 TESS [176]https://www.cbil.upenn.edu/tess [177]https://doi.org/10.1002/0471250953.bi0206s21 VariantAnnotation R package [178]https://bioconductor.org/packages/release/bioc/html/VariantAnnotat ion.html [179]https://doi.org/10.1093/bioinformatics/btu168 [180]Open in a new tab Resource availability Lead contact Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Alberto Inga ([181]alberto.inga@unitn.it). Materials availability This study did not generate new materials. Experimental model and subject details Cell lines and culture conditions MCF7 cells were cultured in standard RPMI (Lonza) supplemented with 10% FBS (brand), 100 units/ml penicillin, 100 mg/mL streptomycin antibiotic mix, and 2 mM glutamine. Cells were periodically tested to ascertain the absence of mycoplasma contamination. The authenticity of the parental cells was confirmed by a commercial genotypic service (BMR genomics, Padova, Italy). Doxorubicin was purchased from Sigma. Nutlin was purchased from Cayman chemicals. Method details SNP status in MCF7 cells We retrieved from the literature genotype assays for MCF7. Two different public datasets (GEO and NCI-60) were used containing information about homozygosis or heterozygosis conditions for SNP alleles in MCF7, the unique identifier, the relative and alternative base in specific single positions. One SNP-array based file ([182]GSM888366) contains 756,260 SNPs with the information about the reference and alternative bases [183](Barretina et al., 2012), while the second exome-based file ([184]Reinhold et al., 2012) contains 85,593 SNPs. The SNPs in common (N = 6,454) were compared to verify the consistency among the two datasets: more than 91% of the SNP calls were consistent. This concordance suggested the possibility to merge the two files removing the data with opposite information. A first screening was performed through the elimination of SNPs called as homozygous in the two datasets, but with a different base call: i.e. reference “0/0” vs alternative “1/1”. Together, the final file presents 11,544 heterozygous SNPs of which 1,802 in 3′UTR and 729 in 5′UTR. Variants were annotated using VariantAnnotation R package ([185]Obenchain et al., 2014). Differential allele-specific expression analysis To identify allelic imbalances between polysomal and total RNA fractions a computational approach that exploits SNP allelic fractions variability across biological replicates was implemented. Given an experimental condition C, SNP allelic fractions variability for condition C is calculated as: [MATH: Vc=mean |snp iRjsnpiRk|such< mspace width="0.25em">that1iNand1j,kMandj<k< /mfenced> :MATH] where N is the number of considered SNPs, M is the number of biological replicates, and [MATH: snpiR j :MATH] is the allelic fraction of SNP i computed from the RNA-seq BAM file representing the biological replicate R[j] using ASEQ pileup module ([186]Romanel et al., 2015) and defined as: [MATH: snpiR j=#AL< mi>T#ALT+#REF :MATH] where [MATH: #ALT :MATH] and [MATH: #REF :MATH] correspond to, respectively, the number of reads supporting the SNP i alternative base and the number of reads supporting the SNP i reference base. Of note, in the pileup computation only reads and bases with quality above 20 and a minimum of local depth of coverage of 10X were considered. Now, given variabilities values V[poly] and V[tot] computed, respectively, from RNA-seq replicate samples of polysomal and total RNA fractions, we define as tranSNPs all SNPs satisfying the following condition: [MATH: j{1,,M}(snpipolyj +Vpoly<snpi totj Vtot)j{1,,M}(snpipolyj Vpoly>snpi totj +Vtot) :MATH] Survival analysis using TCGA data From our list of 147 tranSNPs an extended list of variants was built retrieving variants in strong linkage disequilibrium (LD). We queried the Ensembl rest API version GRCh37 ([187]Yates et al., 2015) and retrieved all the variants with r^2 > 0.8 in a genomic window of 500kbp using general population data. For a total of 3,003 variants, distributed among 120 LD blocks, we were able to retrieve genotype calls from TCGA data. Specifically, raw TCGA genotype calls were downloaded from TCGA legacy data portal (portal.gdc.cancer.gov/legacy-archive) and only genotypes with confidence score larger than 0.1 were retained. Genotypes were analyzed with SHAPEIT v2 ([188]Delaneau et al., 2013) to infer haplotype structure and optimize genotype content information for the imputation process. Variants were imputed using IMPUTE v2.3.2 ([189]Howie et al., 2009) against a reference panel built from 1,000 Genomes Project data. The final extended list of variants was used to perform variant-specific survival analysis using TCGA breast cancer survival data ([190]Liu et al., 2018). Overall Survival (OS), Disease-Specific Survival (DSS), Disease-Free Interval (DFI) and Progression-Free Interval (PFI) were considered in our analysis. Dominant and recessive models were built, respectively, grouping together patients with homozygous alternative and heterozygous genotypes (AA vs AB+BB) and homozygous reference and heterozygous genotypes (AA+AB vs BB). Kaplan-Meier survival curves were built for each variant and Cox proportional hazards regression models were used and inspected. To limit false positive results, for each considered outcome, association results demonstrating a p value <0.05 (minimum between Log rank test p value and Wald test p value was considered) were aggregated at the level of LD blocks and >5% of variants reproducing the association signal in a block were required to nominate the block as associated. Survival analyses were performed using survival R package ([191]Therneau and Grambsch, 2000). Identification of SNPs disrupting putative RBP consensus motifs We retrieved 552 human RNA binding consensus motifs from RBPDB database ([192]Cook et al., 2011). Motif analysis was performed on the list of 33 UTR tranSNPs resulting from the survival analysis. For each variant, two sequences of length 60bp with the SNP reference or alternative allele in the middle were built using GRCh37 reference genome. All sequences were then collected in a FASTA file and TESS software ([193]Schug, 2008) was ran on this file to compute motifs scores. TESS tool provides a set of log-likelihood-ratio-based scores, among which we used the score La, which represent the log-odds ratio of the match, and the score Lm, which represents the maximum possible log-odds ratio for a match from a given consensus motif. For each motif overlapping a variant, results were collected only if the score La for the reference or the alternative allele was at least 3 and it was computed on the forward (transcribed) strand. Confident results were filtered by considering only entries having a La/Lm value greater than 0.5 in at least one of the two conditions. In addition, only variants presenting an absolute ratio between reference and alternative La scores greater than 10% or presence of a motif with the reference but not with the alternative (or vice-versa) were retained. Polysomal profiling and RNA-sequencing MCF7 parental cells were seeded in P100 dishes and treated at about 70% confluence by 1 μM doxorubicin or 10 μM Nutlin, for 16 hours. DMSO was used for treatment control. Cytoplasmic lysates were obtained and polysome fractionations performed as described in ([194]Dassi et al., 2015; [195]Provenzani et al., 2006; [196]Rizzotto et al., 2020; [197]Rossi et al., 2019; [198]Zaccara et al., 2014). Briefly, cytoplasmic lysates were loaded on a 15 ml, 10-50% sucrose gradient, ultra-centrifuged (40k rpm for 100’) and fractionated with an automated fraction collector (1 ml per fraction Teledyne ISCO), tracing the 254[nm] absorbance. All the lighter fractions containing subpolysomal fractions (from the top to the gradient up to the fractions corresponding to the 80S monosome) were pooled in a tube. Heavier fractions corresponding to polysomal RNAs (two or more ribosomes) were also pooled in a separate tube. RNA was purified by extraction with 1 volume of phenol-chloroform. The aqueous fraction was recovered after centrifugation, transferred to a new tube where RNA precipitation was induced by the addition of 1 volume of isopropanol and a subsequent centrifugation step. Pellets were washed once using 70% V/V ethanol to remove phenol contaminations and resuspended in sterile, RNAse free water. Total RNA samples were instead obtained by TRIzol (ThermoFisher) extraction of a separate cell population prepared in parallel. RNA preparations were quantified by NanoDrop microvolume spectrophotometer. Purity and integrity were checked by Agilent 2100 Bionalyzer electrophoretic runs exploiting the Agilent RNA 6000 Nano kit. Sampling with RIN over 6 were used to prepare libraries for RNA sequencing. Two biological replicates for each RNA type (total and polysomal) and condition (mock, doxorubicin, Nutlin) were sequenced. Libraries were prepared following the TruSeq RNA Library preparation kit v2 protocol (Illumina), staring from 1 μg of input RNA. Paired end (100 bp) sequencing was performed on a HiSeq 2500 (Illumina). FASTA files were aligned to the reference genome GRCh37 using TopHat v2.0.10 ([199]Kim et al., 2013), resulting in BAM files with an average of ∼32 million unique mapping (mapping quality >5) reads. Differential expression analysis was performed with Cuffdiff ([200]Trapnell et al., 2013) comparing Nutlin and doxorubicin samples against mock samples, considering both total and polysomal RNA fractions. Pathway enrichment analysis of deregulated genes in all tested comparisons was performed using clusterProfiler R package ([201]Yu et al., 2012) exploiting KEGG database ([202]Kanehisa et al., 2020). Comparison of multiple gene lists obtained from condition-associated tranSNPs was performed with Metascape ([203]Zhou et al., 2019). Cloning 5’UTR or 3’UTR allelic variants for gene reporter assays The commercial Firefly luciferase plasmid pGL4.13 was used to clone the two rs1055472 alleles BRI3BP 3′UTR as well as two regions of the ATF6 3′UTR containing different set of SNPs, using a PCR based approach that introduced the XbaI restriction enzyme site. MCF7 cDNA was used as template, given the heterozygous state for the SNPs of interest. The cloned regions correspond to genomic coordinates chr12:125509977-125510349 (hg19) for BRI3BP and chr1:161930870-161931403 or chr1:161931723-161932422 for ATF6. Primers used for the cloning are listed in the [204]key resources table, where the lowercase bases are the restriction site sequence included in the primers after two initial adenines. The pGL3 basic vector was instead used to clone the two rs2395655 alleles of the CDKN1A 5’UTR exploiting the unique NcoI site that overlaps with the firefly start codon and using complementary primers containing the 5’UTR sequence of interest for each SNP allele that were annealed prior to ligation to generate NcoI sites at both ends. After in vitro ligation (T4 DNA ligase, NEB), E. coli competent cells were transformed and colonies picked to recover plasmids based first on a colony PCR approach, followed by plasmid recovery from positive colonies followed by electrophoresis migration and digestion pattern analysis. The successful cloning of both SNP alleles for the two targets was then verified by Sanger sequencing (Eurofins Genomics) of independent candidate colonies. Dual-luciferase assays pGL4.13- or pGL3-derivative plasmids were transiently transfected along with the pRL-SV40 Renilla control luciferase in a 3:1 ratio (250 ng plus 50 ng for 24 well format) using Mirus-LTI transfection reagents following the manufacturer's protocol. When needed, cells were treated with doxorubicin or Nutlin for 16 hours, starting 24 hours after transfection. Luciferase assays were performed using Dual-Luciferase™ Reporter (DLR™) Assay System (Promega), following the provided protocol. Light units were measured using a Tecan M200pro multilable plate reader through three technical replicates of well. Presented in the graphs are the average Relative Light Units (Firefly/Renilla) and standard deviations of three biological experiments, compared using a two-tailed, equal variance Student's t-test. RIP assays Ribonucleoprotein particles immunoprecipitation (RIP) experiments were performed following a published protocol with some modifications described recently ([205]Keene et al., 2006; [206]Uroda et al., 2019), starting from 4 × 10^7 MCF7 cells. After lysis, the supernatants were collected and 1% of each sample was set aside as input while the remaining was incubated for 4 hours at 4°C with protein A magnetic beads (Thermo Fisher Scientific) coated either with 3 μg of an anti-PABPC1 antibody (Abcam) or with 3 μg of normal Rabbit IgG (Santa Cruz). After the washing steps, RNA was isolated from Input and IP samples using TRIzol (Thermo Fisher Scientific) and converted to cDNA using the RevertAid First Strand cDNA Synthesis Kit and standard protocol (Thermo Fisher Scientific). qPCR reactions were performed using the qPCRBIO SyGreen Mix (PCR Biosystems) master mix on a CFX384 thermal cycler (Biorad). GAPDH was tested as a comparison. Two pairs of primers were used for ATF6, of which one targeting exon 9, and the second targeting the 3UTR sequence comprising the rs2499847 SNP whose alleles are predicted to impact a PABPC1 binding site ([207]Table S7). The primers sequences can be found in the [208]key resources table. Quantification and statistical analysis Statistical tests applied throughout the study are specified in results, figure legends and in the methods accordingly. Given a list of variants, LD blocks were determined identifying the number of cliques in a graph were nodes are variants and edges are present between two nodes when the corresponding variants are in LD with an r^2 > 0.8. In all considered cases the number of cliques corresponded to the number of connected components. Luciferase assay data were analyzed using a 2-way ANOVA with Sidak’s multiple comparison test. RIP assays were analyzed using a two-tailed unpaired t-test. Acknowledgments