Highlights

* Differential TSS distribution in CML and prostate cancer cells for AGAP2 and HK1
* Less than 50 bp differences in the 5′ UTR isoforms generated in these cells
* Longer mRNA 5′ UTR isoform contains a G4 structure and shows reduced translation rates
* Evidence here supports that TSS selection within a cluster can affect translation rates

Molecular biology; Molecular mechanism of gene regulation; Bioinformatics; Cancer

Introduction

The transfer of information from the genome to the proteome is a coordinated multi-step process tightly controlled at the gene promoter level (influencing transcription), the mRNA level (processing and stability), and the translation level (ribosome binding and polysome association). However, the amount of mRNA in a cell does not always correlate with the amount of protein present, making RNA quantification an inexact tool to predict protein levels.[32]^1^,[33]^2^,[34]^3 We came across discrepancies between mRNA and protein levels when studying AGAP2 (ArfGAP with GTPase-like domain, ankyrin repeat, and PH domain 2, isoform 2) promoter regulation in different cancers.[35]^4 AGAP2 (also known as PIKE-A) is a ubiquitously expressed protein that has a role in hepatic fibrosis and cancer progression.[36]^5^,[37]^6^,[38]^7 It is classed as a proto-oncogene involved in cell survival, apoptosis, migration, and lipid metabolism,[39]^8^,[40]^9 and understanding its expression regulation would be key to modulating its functions. One of the crucial steps in gene expression regulation is transcription initiation: differential initiation can produce a heterogeneous population of mRNA isoforms from a single gene locus. Transcription does not initiate at a single nucleotide or discrete transcription start site (TSS) within a tissue or cell culture.
Instead, it is initiated across a cluster of multiple closely spaced TSSs within a promoter.[41]^10^,[42]^11^,[43]^12 It can also be initiated from TSSs located in separate clusters (alternative promoters). In fact, alternative transcription initiation driven by multiple promoter usage makes a higher contribution to tissue-dependent isoform-specific diversity than alternative splicing.[44]^13 It is estimated that 30-50% of human genes are regulated by alternative promoters that are active depending on the cell type, developmental stage,[45]^14 cellular environment, or disease stage.[46]^15 Indeed, AGAP2 is one of these genes, presenting two alternative promoters that lead to the production of two different protein isoforms with a differential N terminus (isoforms 1 and 2) that confers on them unique target specificity.[47]^16 Several attempts have been made in the last decade to precisely identify TSSs and characterize core promoter features. Notably, the FANTOM consortium and the DataBase of Transcriptional Start Sites (DBTSS) have comprehensively captured the dynamically changing landscape of TSS selection by using mRNA cap-guided deep sequencing technologies.[48]^10^,[49]^17 These databases have facilitated genome-wide analyses of promoter architecture and highlighted widespread differences in TSS selection, identifying an average of 4 robust TSS clusters per gene.[50]^10 In addition, other studies have also found cell-specific differential distribution of TSSs within a cluster.[51]^18 This highlights a potential relevance in gene expression regulation.
After all, a differential TSS selection will change the overall length of the mRNA 5′ UTR, likely altering the presence of regulatory elements such as upstream open reading frames (uORFs), upstream start codons (uAUGs), RNA secondary structures, and internal ribosome entry sites.[52]^19^,[53]^20^,[54]^21^,[55]^22^,[56]^23^,[57]^24 However, although previous studies have reported the translational impact of transcript isoforms derived from multiple closely situated promoters,[58]^19^,[59]^25^,[60]^26^,[61]^27^,[62]^28^,[63]^29 minor TSSs (alternative TSS selection within the same cluster) have been considered as nonadaptive and the product of molecular errors.[64]^30 We demonstrate here that AGAP2 (isoform 2) mRNA expression is differentially initiated from alternative TSSs within the same cluster in different cancer types, directly impacting mRNA translational efficiency. We used 5′ RACE to determine the transcription start sites in prostate cancer and CML cell lines, finding that the transcripts with a slightly longer 5′ UTR contained the consensus sequence for a G quadruplex (G4), a type of secondary structure. We demonstrated the formation of the G4 using circular dichroism and an in-house developed immunoprecipitation approach that we have termed rG4IP (RNA G4 Immunoprecipitation).[65]^31 We also determined that the presence of the G4 in the AGAP2 5′ UTR has a direct impact on translation efficiency, reducing the amount of mRNA associated with polysomes. More importantly, we hypothesized that this differential TSS selection could be a more widely used mechanism to regulate the amount of protein produced in cells. To test this, we developed a bioinformatics pipeline to interrogate data from the FANTOM project[66]^32 and from the NCI-60 microarray ([67]GSE32474) and NCI-60 SWATH-MS databases,[68]^3^,[69]^33 finding other genes behaving in a similar manner.
We validated our findings by demonstrating that HK1 expression can also be regulated through alternative TSS selection. Together, we present here compelling data supporting an alternative mechanism to regulate cellular protein content by controlling transcription initiation within a single TSS cluster.

Results

AGAP2 mRNA levels correlate negatively to protein levels in some cancers

Our group had previously studied the regulation of the AGAP2 promoter in prostate cancer (PC) and chronic myeloid leukemia (CML) cell lines and noted a stronger basal promoter activity in CML cells, which resulted in higher relative AGAP2 mRNA levels when compared to levels in PC cell lines.[70]^4 However, when AGAP2 protein levels were analyzed, we observed a significant negative correlation (Pearson’s R = −0.89, p = 0.016) between AGAP2 mRNA and protein across both types of cancers ([71]Figures 1A–1C). In the CML cell lines (KU812, TCC-S, and KCL-22), AGAP2 relative mRNA expression was higher, but the protein levels were lower compared to PC cell lines (DU145, PC3, and LNCaP); the opposite pattern was observed in PC cell lines. This mismatch between AGAP2 mRNA and protein levels was also observed in other cancer types ([72]Figures 1D and 1E) such as hepatocarcinoma (HepG2 cells), ovarian cancer (SKOV-3 cells), and acute myeloid leukemia (cell lines KG1 and Kasumi). However, when considering all cell lines analyzed together, the negative correlation between mRNA and protein was not as strong (Pearson’s R = −0.64, p = 0.011) as when focusing only on levels present in CML and PC cells ([73]Figure S1A), highlighting a specific cell line-dependent regulation of AGAP2 expression.

Figure 1. AGAP2 mRNA and protein level discrepancies

(A) AGAP2 mRNA basal levels were measured in prostate cancer (PC) cell lines (DU145, PC3, LNCaP) and chronic myeloid leukemia (CML) cell lines (KU812, TCC-S, KCL-22) by RT-qPCR.
The values presented were normalized against the levels of the housekeeping gene HPRT and shown relative to the prostate cancer cell line DU145. Statistical analyses were carried out by one-way ANOVA [F(5, 12) = 21.23, p < 0.0001] with post-hoc Sidak’s multiple comparison tests. (B) Representative image of AGAP2 protein levels detected by immunoblotting in CML and PC cell lines. β-Actin was used as a loading control. Densitometry values for the relative protein expression are represented below the blots. Differences were analyzed using Kruskal-Wallis [H(5) = 14.71, p = 0.012] followed by uncorrected Dunn’s test. (C) Strong negative correlation between AGAP2 mRNA (x axis) and protein levels (y axis) in PC and CML cell lines (Pearson’s R = −0.89, p = 0.016). The data presented are relative to DU145 (PC cell line). (D and E) AGAP2 relative mRNA levels (D) and protein (E) in different cancer cell lines, assessed as described in (A) and (B). Statistical analyses for mRNA levels in (D) were carried out by one-way ANOVA [F(9, 20) = 41.30, p < 0.001] with post-hoc Sidak’s multiple comparison tests. (F and G) Western blot analysis for the accumulation of ubiquitinated proteins (as positive control for the proteasomal inhibitors) and AGAP2 levels in CML cell lines treated with proteasomal inhibitors: MG132 [KU812 (5 μM), TCC-S (5 μM), KCL-22 (50 μM) for 4 h] and Bortezomib [KU812 (200 nM), TCC-S (10 nM), KCL-22 (100 nM) for 6 h]. β-Actin levels were used as a loading control. All data shown in the graphs in this figure are the mean ± SD from three independent experiments (performed in triplicate in the case of qPCRs); (∗p < 0.05; ∗∗p < 0.01; ∗∗∗p < 0.001).

Post-translational mechanisms can account for reduced protein levels. To rule out an enhanced protein degradation in CML cell lines by the ubiquitin-proteasome pathway, we treated CML cells with the proteasomal inhibitors MG132 and bortezomib.
At the concentrations used, the inhibitors increased the levels of ubiquitinated proteins in the CML cells ([76]Figure 1F) but did not significantly modify AGAP2 protein levels compared to untreated controls ([77]Figure 1G). These results suggest that the amount of protein produced (rather than the degradation) was key to the differential AGAP2 expression in these two types of cancers. However, translation is a complex mechanism with several layers of regulation. We studied the levels of the rate-limiting translation initiation factors in PC and CML cell lines, but we found no differences that could support the disparity in AGAP2 translational output ([78]Figure S1B). Furthermore, preliminary data on AGAP2 mRNA association with polysomes indicated a differential behavior in CML and PC cells, with AGAP2 mRNA associating with heavier ribosome fractions in DU145 cells ([79]Figure S1C), and we focused on exploring this variation further.

Differential AGAP2 transcription start site selection within a cluster leads to slightly different 5′ untranslated regions in chronic myeloid leukemia

The 5′ and 3′ untranslated regions (UTRs) of an mRNA play a very important role in regulating translation. Whilst the 3′ UTR has a well-characterized role in controlling mRNA stability and localization,[80]^34 the 5′ UTR allows for ribosome binding, supporting cap-dependent translation. Structures or motifs in this region can exert post-transcriptional control over gene expression, and multiple transcription initiation within a core promoter has been previously highlighted.[81]^35^,[82]^36 Therefore, we decided to focus initially on this region. Sanger sequencing of the area upstream of the start codon did not reveal any cell line-specific mutation ([83]Figure S2A) that could support differences in AGAP2 expression. Next, we used 5′ RACE to map the transcription start site (TSS) for AGAP2 in CML and PC cells to determine if the 5′ UTRs were of equal length in these cell lines.
We observed that transcription initiated from the same TSS cluster, but in KU812 cells (CML) the TSS was 35 nucleotides upstream compared to DU145 cells (PC) ([84]Figure 2A). Interestingly, in silico studies suggested that those extra nucleotides contained the consensus for a G quadruplex (G4) structure.

Figure 2. Alternative TSS usage for AGAP2 in PC and CML cell lines leads to differential 5′ UTR isoforms

(A) Image derived from the ZENBU browser ([87]http://fantom.gsc.riken.jp/zenbu/) showing the main TSSs in AGAP2 and their frequency (height of peaks). 5′ RACE was performed according to the manufacturer’s instructions using adenines for the tailing reaction. KU812, a CML cell line, presents an upstream TSS that produces an AGAP2 mRNA with a slightly longer (35 bp) 5′ UTR than the one found in DU145, a PC cell line. In those extra nucleotides, there is a repetition of guanine residues that fits the pattern predicted for the formation of G quadruplexes. (B) Comparison of the frequency of alternative TSSs used in DU145 (PC) and KU812 (CML) cell lines, mapped by 5′ RLM-RACE. The relative frequencies (in percentages) are shown as bars placed at the nucleotide position upstream from the start codon (n = 10). (C) TSSs for AGAP2 in KU812 obtained by 5′ RLM-RACE are plotted against the TSSs noted by the FANTOM CAGE database. (D) Cartoon representing the AGAP2 core promoter and its TSSs. The differential TSS selection creates slight differences in the length of the 5′ UTR, with upstream/earlier TSSs resulting in longer 5′ UTRs. The selection of an earlier TSS in CML KU812 produces an mRNA that encodes extra nucleotides in the 5′ UTR containing the consensus for a G quadruplex structure. (E) Relative levels of the longer AGAP2 5′ UTR containing the G quadruplex consensus sequence in PC and CML cell lines. The data represent the mean ± SD of three independent experiments.
Statistical differences were analyzed by one-way ANOVA [F(5, 12) = 29.35, p < 0.0001] with post-hoc Sidak’s multiple comparison tests, p-values shown (∗p < 0.05; ∗∗∗p < 0.001).

To gain a better understanding of TSS selection and distribution in CML and PC, we used 5′ RLM-RACE and performed Sanger sequencing of the RACE products. We found a differential TSS distribution in the cell lines, with a broader distribution and upstream TSSs more frequently noted in CML (KU812) ([88]Figure 2B). As KU812 was one of the cell lines included in the FANTOM project, we were able to compare the TSSs identified in our study with the start sites detected in the FANTOM CAGE database,[89]^32 observing a highly similar distribution ([90]Figure 2C). Interestingly, the TSS distribution for AGAP2 in KCL-22, another CML cell line also available in this database, showed a similar widespread distribution ([91]Figure S2B). Next, we decided to study by qPCR the expression of the AGAP2 mRNA containing the longer 5′ UTR in all the cell lines. The selection of upstream TSSs has the potential to incorporate a G4 structure at the beginning of the 5′ UTR ([92]Figure 2D). When located within the first 50 bp of the 5′ UTR, those structures have been found to affect ribosome binding and influence translation rates.[93]^37 Using primers that would detect the incorporation of the extra nucleotides that formed the G4 structure, the AGAP2 mRNA with the longer G4-containing 5′ UTR was found to be significantly more abundant in CML cell lines ([94]Figure 2E). This confirmed a cell-specific bias in AGAP2 TSS selection within this single TSS cluster.
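As an illustration of the kind of in silico check mentioned above, the following minimal Python sketch scans an RNA sequence for a commonly used G4 consensus motif (four runs of three or more guanines separated by loops of 1 to 7 nucleotides). This regular expression is a simplified heuristic chosen here for illustration; it is not the pqsfinder algorithm used in this study, and the names `G4_PATTERN` and `find_g4` are hypothetical. The two test sequences are the wild-type and mutant RNA oligos reported for the circular dichroism experiments in the next section.

```python
import re

# Simplified G4 consensus: four runs of >=3 guanines separated by
# loops of 1-7 nucleotides. This is an illustrative heuristic only,
# not the scoring scheme implemented by pqsfinder.
G4_PATTERN = re.compile(
    r"G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}"
)


def find_g4(rna):
    """Return (start, end, sequence) for each non-overlapping motif hit."""
    # Accept DNA or RNA input by normalising case and converting T to U.
    rna = rna.upper().replace("T", "U")
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(rna)]


# Oligo sequences as reported for the CD experiments.
wild_type = "GGGCGGGCAGGGGCGGGG"
mutant = "GAGCGAGCAGAGGCGGGG"

print(find_g4(wild_type))  # [(0, 18, 'GGGCGGGCAGGGGCGGGG')]
print(find_g4(mutant))     # []
```

The wild-type sequence matches the motif over its full length, whereas the guanine-to-adenine substitutions in the mutant leave only one run of three or more guanines, so no motif is found. A pattern match only indicates that a sequence fits the consensus; whether a G4 actually folds has to be established experimentally.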
Presence of a G quadruplex (G4) structure in AGAP2 longer 5′ UTR

G4 structures can modulate gene expression.[95]^38 Given the presence of a putative G4 consensus sequence in the AGAP2 5′ UTR, incorporated due to the selection of an upstream TSS, we evaluated the formation of the G4 structure in vitro and in vivo using circular dichroism and an in-house developed immunoprecipitation technique. To confirm that the sequence found in the longer AGAP2 5′ UTR could form a G4 in vitro, RNA oligos that contained the sequence under study (5′-GGGCGGGCAGGGGCGGGG-3′) or a mutant version (5′-GAGCGAGCAGAGGCGGGG-3′) were prepared and their circular dichroism (CD) spectra were obtained. The results were characteristic of the formation of a parallel G4 in the presence of salts ([96]Figure 3A, left panel). When the G4 consensus was destroyed by guanine-to-adenine point substitutions (mutant), the characteristic peaks in the spectra were no longer observed ([97]Figure 3A, right panel).

Figure 3. Presence of a G quadruplex structure in the longer AGAP2 5′ UTR

(A) RNA oligos containing either the sequence corresponding to the G quadruplex consensus found in the longer AGAP2 5′ UTR or a mutated version were folded in the presence of 100 mM NaCl, 100 mM KCl, or no salts, and their CD spectra are represented here. The characteristic pattern of a parallel G quadruplex (G4) structure was noted, exhibiting a positive peak at ∼260 nm and a negative peak at ∼240 nm in the presence of salts (left). This pattern was lost in the mutant RNA oligo where key guanosines were changed to adenosines (right). (B) Overview of the RNA G quadruplex immunoprecipitation (rG4IP) technique.[100]^31 Cells were treated with 25 μg/mL digitonin and the extracted cytoplasmic fraction was precleared and incubated overnight with a structure-specific G4 antibody (BG4) bound to protein G magnetic beads.
After incubation, the beads were washed, and the bound RNA was eluted by unfolding the G4 by heating at 65°C for 15 min. The eluate was treated with DNase I and analyzed by RT-qPCR. (C) rG4IP was performed in the TCC-S (CML) cell line and the immunoprecipitated samples were normalized by their input controls. NRAS and MMP16 mRNAs were used as positive controls for the presence of G4 structures, as documented in the literature. TBP mRNA was used as a negative control as it lacks G4 consensus sequences in its entire mRNA. Differences between samples were analyzed with unpaired t test. (D) rG4IP was performed in DU145 cells transfected with either an empty vector (with no 5′ UTR) or the same vector containing the AGAP2 longer 5′ UTR in front of the Renilla luciferase gene. The levels of Renilla mRNA in the immunoprecipitated samples were normalized by their input controls. An unspecific isotype antibody (IgG) was used as a negative control. Differences between samples were analyzed by unpaired two-tailed t-tests. All the data shown in this figure correspond to three independent immunoprecipitations and the error bars denote SD. (∗p < 0.05; ∗∗p < 0.01; ∗∗∗p < 0.001).

The next step was to demonstrate that the G4 was formed in vivo. However, the detection of RNA G4 structures in living cells is challenging and different approaches have variable success rates.[101]^39 Interestingly, a structure-specific G4 antibody (BG4) that selectively binds both DNA and RNA G4 was generated relatively recently.[102]^40 Using this antibody, we developed an RNA-specific G4 immunoprecipitation technique (rG4IP) to selectively enhance the detection of cytosolic mRNAs with G4 structures ([103]Figure 3B).[104]^31 Using rG4IP, we obtained an enrichment of AGAP2 mRNA in the BG4-pulled fraction compared to the negative IgG control, detected by qPCR ([105]Figure 3C).
We also observed BG4-mediated enrichment of NRAS and MMP16 mRNAs, which are known to present 5′ UTR G4 structures and were used here as positive controls.[106]^41^,[107]^42 TBP mRNA, which lacks a G4 consensus sequence in its entire mRNA, was used as a negative control. These results highlighted the effectiveness of our rG4IP technique and demonstrated the presence of native G4 structures in AGAP2 mRNA. However, an analysis of the AGAP2 mRNA sequence using the pqsfinder web application[108]^43 revealed several other potential G4 consensus sequences along its entire length, apart from the one predicted in the longer 5′ UTR ([109]Figure S3). Therefore, to detect the native G4 formation specifically in the longer 5′ UTR of AGAP2 mRNA, we performed rG4IP in DU145 cells transfected with either the empty bicistronic plasmid pcDNA3 RLuc Polires FLuc[110]^44 or the same plasmid with the AGAP2 longer 5′ UTR cloned in front of the Renilla luciferase (Rluc) gene. The results showed a significant Rluc enrichment in the cells transfected with the plasmid containing the cloned AGAP2 5′ UTR, unequivocally demonstrating the presence of G4 structures in that region ([111]Figure 3D).

The G4 structure in AGAP2 longer 5′ UTR influences mRNA translation negatively

The presence of G4 structures in the 5′ UTR has been previously shown to decrease mRNA translational efficiency.[112]^23 To study the influence of these structures on AGAP2 mRNA translation, we used the same bicistronic plasmid mentioned above.[113]^44 We generated dual-luciferase reporter constructs comprising either the shorter 5′ UTR without the G4-forming sequences (found in PC cells), the longer 5′ UTR containing G quadruplex-forming sequences (found in CML cells), or a mutated version of the longer 5′ UTR with the G4 consensus sequence destroyed ([114]Figure 4A). These 5′ UTR variants were fused to the Renilla luciferase (Rluc) open reading frame (ORF) under the control of the CMV promoter.
The Firefly luciferase (Fluc) was used as an internal control because, whilst a single mRNA is generated containing both Rluc and Fluc, the independent translation of Fluc was ensured through the presence of the poliovirus IRES sequence (cap-independent translation).

Figure 4. The G quadruplex (G4) structure in AGAP2 longer 5′ UTR influences mRNA translation negatively

(A) Schematic representation of the fragments cloned into the bicistronic luciferase reporter (pcDNA3 RLUC POLIRES FLUC) plasmid.[117]^44 The AGAP2 5′ UTR fragments (shorter 5′ UTR, longer 5′ UTR with G4 consensus, and longer 5′ UTR with G4 consensus mutated) were inserted at the unique NheI restriction site proximal to the Renilla luciferase (Rluc) ORF. The Rluc is driven by cap-dependent mRNA translation through the cloned 5′ UTR. The Firefly luciferase cistron was used as an internal control for normalization. (B) Relative luciferase activity of the AGAP2 5′ UTR constructs measured using an in vitro transcription and translation system. The graph represents the mean of 4 independent experiments ± standard deviation and data are expressed as the Rluc/Fluc ratio relative to the activity of the shorter 5′ UTR. Differences were analyzed using a Kruskal-Wallis test [H(2) = 47.13, p < 0.001] followed by the Mann-Whitney U test (∗∗∗p < 0.001). (C) Relative luciferase activity after transfecting the different constructs in DU145 and KU812 cells. The luciferase activity was analyzed 48 h after transfection in DU145 and 6 h after transfection in KU812. The graph represents the mean of three independent experiments performed in duplicate and expressed as relative Rluc/Fluc ratios. Differences between samples were analyzed with a Kruskal-Wallis test followed by the Mann-Whitney U test, ∗∗∗p < 0.001. The bars represent the mean ± standard deviation. (D) Lysates for polysome profiling were prepared from KU812 and TCC-S cells and fractionated through a sucrose gradient.
The profiles were monitored by measuring the absorbance at 254 nm (A254). A representative polysome profile from a KU812 extraction is shown on the left. The relative distribution of the mRNA for AGAP2 longer 5′ UTR isoform (concentrated in the non-polysomal fractions) is shown on the right. The abundance of the RNA detected per fraction is presented as the percentage of the total RNA. mRNA levels were normalized to exogenous spike-in luciferase control mRNA. The graph represents the mean ± SEM of two independent experiments.

Using in vitro transcription and translation, we observed that the plasmid with the longer 5′ UTR, containing the G4 sequence, mediated a significant decrease in the luciferase reporter activity relative to the short UTR. As this effect was reversed in the G4 mutant ([118]Figure 4B), these results confirmed the differential role for AGAP2 longer 5′ UTR in mRNA in vitro translation. Transfecting these 5′ UTR constructs into DU145 (PC) and KU812 (CML) cell lines demonstrated similar shifts in relative reporter activity in vivo ([119]Figure 4C). Although the pattern of relative luciferase activity was found to be similar, we noted that in the CML cell line (KU812), the impact of the longer 5′ UTR was less profound compared to the PC cell line. However, this could be explained by the differences in the method of transfection used.
The leukemia cell lines are notoriously difficult to transfect, and an electroporation-based technique (nucleofection) was used to achieve optimal gene transfer.[120]^45 However, nucleofection has been shown to induce non-specific changes in the metabolic activity of the transfected cells and to alter the phosphorylation state of the translation initiation factor eIF2α.[121]^46^,[122]^47^,[123]^48 These non-specific effects could affect the response of KU812 cells, as observed by the loss of differences in luciferase activity at later time points post-transfection and the loss of luciferase activity differences in DU145 when using nucleofection ([124]Figure S4). As the variations in translation efficiency found for the longer and shorter AGAP2 5′ UTRs could be attributed to differences in polysome seeding and occupancy,[125]^49 we examined the polysome association profiles of AGAP2 mRNA with the longer 5′ UTR. Interestingly, we noted a decreased polyribosome association of AGAP2 mRNA with the longer 5′ UTR in the CML cell lines KU812 and TCC-S ([126]Figure 4D), implying a lower translation rate for this mRNA population with a longer 5′ UTR. Together, these results highlight the negative influence that the presence of the G4 structure has on AGAP2 mRNA translation.

AGAP2 expression regulation is not an isolated case

Finding that AGAP2 expression could be regulated based on an alternative TSS selection within the same TSS cluster raised the question of whether this was an isolated example. To examine its relevance in other genes, we performed a bioinformatics analysis to find potential G4 sequences between alternative TSS isoforms within a single cluster ([127]Figure 5A, see also [128]STAR methods). We used data from the FANTOM project[129]^32 to select the transcripts with differential TSS usage.
Then, the nucleotide sequences between alternative TSSs were extracted and analyzed for the presence of potential G4s using the pqsfinder package in R.[130]^43 We identified 4,920 transcripts associated with 3,888 genes that contained potential G4 sequences between the two transcription start positions in the defined TSS cluster, upstream of the major TSS. The large majority (91.9%) of these transcripts were protein-coding. In order to identify enriched pathways that could be modulated by alternative TSS selection, we used MetaCore pathway analysis. The top three significantly enriched pathways included cytoskeleton remodeling, apoptosis and survival, and development ([131]Figure 5B).

Figure 5. Identification of potential targets regulated in a similar manner to AGAP2

(A) Workflow diagram used to identify genes whose expression could be regulated by the alternative selection of TSSs, involving the presence of a G quadruplex (G4). First, the FANTOM database was used to identify all the transcripts that contained a G quadruplex (G4) consensus sequence between alternative TSSs within their defined TSS cluster. The G4 consensus sequences were identified using the pqsfinder package in R.[134]^50 The FANTOM database was also used to detect differential expression in PC (DU145, PC3) and CML (KU812, K562, KCL-22) cell lines for those genes that would encode a G4 consensus between alternative TSSs. Microarray and SWATH-MS data from the NCI-60 database were integrated to characterize genes that demonstrated discrepancies between their mRNA and protein levels (high mRNA and low protein) within those genes showing differential 5′ UTRs with G4 sequences. (B) MetaCore pathway enrichment analysis of mRNAs with alternative 5′ UTRs that contain G4 consensus sequences. The dot plot shows the top 15 enriched pathways with the largest gene ratio.
The size of the dots represents the number of genes in each pathway and the color of the dots represents the adjusted p values (BH). (C) Venn diagram illustrating the overlapping genes in the FANTOM and NCI-60 databases showing differential 5′ UTRs with G4 consensus, with differentially expressed mRNA (≥1 log FC), and either no statistically significant differences in protein levels or significantly lower protein levels in CML cell lines (left) or PC cell lines (right). The differential expression and TSS distribution were computed by linear modeling followed by empirical Bayes statistics.

Next, we analyzed the distribution of these genes in PC and CML cell lines and found 1,007 genes that showed differential and cancer-type-specific distribution (similar to AGAP2, differential mRNA levels in both types of cancers). To identify suitable gene targets for validation, we used the NCI-60 dataset[135]^3 to confirm RNA expression levels (microarray data) and contrasted them with their protein (SWATH-MS) levels, compiling a reduced list of genes that had inconsistencies in RNA and protein levels whilst presenting potential G4 sequences between alternate TSSs within a cluster ([136]Figure 5C, see [137]STAR methods). From this list, we selected the HK1 gene (hexokinase 1, [138]NM_033496.2) as it showed very large differences in RNA vs protein levels in CML cell lines relative to PC cell lines. We tested HK1 expression in our system and, as shown in [139]Figure 6A, the relative mRNA levels were significantly higher in two of the CML cell lines included in the study (KU812 and TCC-S) whilst their HK I relative protein levels were lower ([140]Figure 6B). Furthermore, when analyzing the abundance of the mRNA with the longer 5′ UTR that contained the potential G4 sequence, the levels were significantly higher in both CML cell lines ([141]Figure 6C).
We were also able to detect an enrichment of HK1 mRNA in the BG4 fraction after performing rG4IP ([142]Figure 6D) and confirmed that HK1 mRNAs with the longer 5′ UTR preferentially associated with the non-polysomal fraction ([143]Figure 6E), although this preference was not as striking as in the case of AGAP2.

Figure 6. HK1 expression is also regulated by alternative TSSs and mediation of a G4 structure

(A) HK1 mRNA basal levels were detected in prostate cancer (PC) cell lines (DU145, PC3, and LNCaP) and chronic myeloid leukemia (CML) cell lines (KU812, TCC-S, and KCL-22) by RT-qPCR. HK1 expression was normalized against levels for the housekeeping gene HPRT and is shown relative to levels in the PC cell line DU145. The difference in RNA expression was analyzed using one-way ANOVA [F(5, 12) = 22.25, p < 0.001] with post-hoc Sidak’s multiple comparison tests, p-values shown (∗∗∗p < 0.001). The error bars denote standard deviation. (B) HK I protein levels were detected by western blot. The graph shows overall densitometry values relative to those in the DU145 cell line. (C) Relative expression levels for the HK1 isoform with the longer 5′ UTR containing a G4 consensus sequence, detected by RT-qPCR in PC and CML cell lines. The bars represent the mean ± SD of three independent experiments. Statistical differences were analyzed by one-way ANOVA [F(5, 12) = 8.6, p < 0.001] with post-hoc Sidak’s multiple comparison tests, p-values shown (∗p < 0.05; ∗∗p < 0.01). (D) rG4IP followed by HK1 detection by RT-qPCR. Expression levels were normalized by input control, and the data presented correspond to three independent immunoprecipitations. The error bars denote the standard deviation. Differences between samples were analyzed by unpaired two-tailed t-tests, p-values shown.
(E) Polysomal fractionation: relative abundance of the HK1 mRNA with the longer 5′ UTR in polysome fractions in TCC-S cells (left) and KU812 cells (right).

Overall, these results confirmed that AGAP2 is not an isolated case of protein expression regulation mediated by the presence of a G4 associated with the selection of an upstream TSS within the same cluster, and we propose this ([146]Figure 7) as an alternative mechanism for gene expression regulation.

Figure 7. Model for an alternative regulatory mechanism

(A) The selection of an earlier (upstream) TSS within a TSS cluster results in a slightly longer 5′ UTR that contains a G quadruplex (G4) structure. This G4 structure decreases the translational efficiency of the mRNA, possibly by impeding ribosome scanning, decreasing the formation of polysomes, and resulting in a reduced translational output. (B) The selection of a downstream TSS yields a shorter 5′ UTR without the G4 sequence. This mRNA isoform prominently associates with ribosomes, forming polysomes and increasing translation efficiency.

Discussion

Alternative transcription initiation contributes to the transcriptomic diversity of eukaryotic organisms. It produces different transcript isoforms from a single gene that differ qualitatively and quantitatively in their ability to produce proteins.[149]^15^,[150]^51^,[151]^52 However, studies investigating the regulatory role of alternative transcription initiation have focused so far on the transcript isoforms derived from alternative promoters. As a result, the consequence of differential TSS selection within a single TSS cluster is currently poorly understood. Here, we have demonstrated a differential distribution of AGAP2 TSSs within the same TSS cluster in prostate cancer (PC) and chronic myeloid leukemia (CML) cell lines, yielding a heterogeneous population of mRNAs with small nucleotide differences in their 5′ UTRs.
We have highlighted that these minor changes in 5′ UTR length can lead to the presence of regulatory elements, G quadruplexes (G4s) in our case, and influence mRNA translational efficiency.

During our studies on the role and regulation of AGAP2 (a proto-oncogene involved in the survival of several cancer cell types[152]^6^,[153]^7^,[154]^9), we identified a shared minimal promoter region for AGAP2 in PC and CML cells, observing significant differences in mRNA expression levels.[155]^4 In the current study, we have demonstrated a negative correlation between AGAP2 mRNA and protein levels in these cancers ([156]Figure 1). However, there are many instances where mRNA levels do not correspond with protein levels.[157]^2 The difference here is that we have also shown a differential distribution of TSSs for AGAP2 in PC and CML cell lines. A look at the CAGE tags representing the TSSs for AGAP2 in the FANTOM project[158]^53 shows a broad distribution with a dominant peak ([159]Figure 2A). However, when we analyzed the TSS selection in PC and CML cell lines, we observed a single dominant peak in DU145 (PC) and a broad distribution in KU812 and KCL-22 (CML) cell lines ([160]Figures 2 and [161]S2), with the distinctive presence of an upstream TSS and the consequent production of a longer 5′ UTR in AGAP2 mRNA in CML cell lines ([162]Figure 2E). Tissue-specific TSS usage within a TSS cluster has been previously described, even if the consequences were unknown,[163]^36 contributing to the notion that transcription initiation is precisely regulated at promoters.
But despite hints of this TSS distribution change being linked to processes such as cell cycle phases,[164]^54 a clear role for this differential selection is still missing and the concept of transcriptional “noise” remains.[165]^30

As the selection of an upstream TSS in CML cell lines led to the presence of a longer 5′ UTR that could easily be monitored in cells, we investigated a possible differential role in translation for this isoform when compared to the 5′ UTR isoform generated from the dominant peak in PC cells (the ‘shorter’ 5′ UTR). However, it should be noted that this longer 5′ UTR isoform represented a small percentage of the total 5′ UTR isoforms present in CML cells, both in our hands ([166]Figure 2B) and in the FANTOM database ([167]Figures 2C and [168]S2B).

Changes in the translation efficiency of mRNAs with differential 5′ UTRs are often due to specific sequences, with longer 5′ UTRs generally associating with lower translation efficiencies.[169]^26^,[170]^28 One of the features that can account for residual variance in translation rates is the presence of alternatively transcribed G quadruplexes (G4s). G4s in 5′ UTRs are generally associated with suppressed translation.[171]^41^,[172]^55 However, there are also examples of increased translation when this structure is present.[173]^56 Different approaches have been used to detect RNA G4 structures inside cells, including the use of G4-stabilising ligands/ions,[174]^57^,[175]^58 small molecule probes,[176]^59 RNA structural mapping,[177]^60 reverse transcription stalling,[178]^57 RNA G4 structure-protein interactions,[179]^61 ligands with fluorescence activity,[180]^62 self-biotinylation methodology,[181]^63 and a G4-structure specific antibody.[182]^40 Most of the methodologies mentioned above used specific ligands and/or reactive small molecules that could shift the equilibrium in favor of G4 formation and might not be representative of actual RNA G4 conformations in living cells.
Therefore, we developed the rG4IP technique to selectively enrich cytosolic RNAs containing G4s without fixing the cells, incorporating a step to degrade any trace of genomic DNA.[183]^31 Using circular dichroism and rG4IP, we were able to demonstrate the formation of a G4 structure in the longer AGAP2 5′ UTR ([184]Figure 3). As described for other mRNAs,[185]^41^,[186]^55 this structure was responsible for reduced protein expression ([187]Figures 4B and 4C) and reduced polysome association ([188]Figure 4D).

Next, we used a bioinformatics approach to detect other genes with protein levels negatively associated with the presence of a G4 and the selection of an upstream TSS within the same cluster. We identified a list of potential target genes implicated in key cellular pathways ([189]Figure 5). However, our approach has likely missed many other targets and underestimates their number because, for example, the SWATH-MS data for protein levels only included proteins common to all the cell lines in the NCI-60 database. Still, we were able to validate this association for HK1 expression ([190]Figure 6), a key protein in glucose metabolism that is implicated in neurodevelopmental abnormalities.[191]^64

Our data support TSS selection within a TSS cluster as a mechanism to modulate protein levels. As the longer 5′ UTR isoforms with the G4 are present when the levels of mRNA are higher, it would be interesting to explore their role as an mRNA reservoir ready to be translated under specific signals. Further research into TSS selection is also necessary: we know it can be influenced by factors such as the type of promoter,[192]^35 methylation patterns,[193]^65 chromatin remodeling, and histone modifications,[194]^66 and a detailed understanding would open new possibilities for manipulating protein expression.
In conclusion, when comparing CML and prostate cancer cell lines, our study has highlighted the relevance of TSS selection within a TSS cluster as a regulatory mechanism involving the differential formation of a G4 structure in the longer 5′ UTR isoforms, altering mRNA translation efficiency and associating with lower protein expression levels.

Limitation of the study
The negative correlation between AGAP2 mRNA and protein observed in [195]Figure 1 cannot be fully explained by the reduced protein expression caused by the presence of the longer 5′ UTR, as this is not a major isoform in these cells. Therefore, the other 5′ UTRs generated in CML cells are likely contributors to this reduced protein output, and it would be worth studying them for the presence of specific motifs/structures, in particular uORFs, as there are several predicted functional uORFs between the start codon and the main TSS peak[196]^67 that could become operative in the shorter 5′ UTR isoforms.

STAR★Methods

Key resources table

REAGENT or RESOURCE | SOURCE | IDENTIFIER

Antibodies
Goat polyclonal anti-AGAP2 | Sigma-Aldrich | Cat#SAB2501250; RRID:[197]AB_10620617
Mouse monoclonal anti-HK I | Santa Cruz Biotechnology | Cat#sc-46695; RRID:[198]AB_627721
Mouse monoclonal anti-DNA/RNA G-quadruplex (clone BG4) | Absolute Antibody | Cat#Ab00174-1.1
Mouse IgG Isotype Control antibody | ThermoFisher | Cat#31903; RRID:[199]AB_10959891
Rabbit polyclonal anti-Ubiquitin | Cell Signaling Technology | Cat#3933; RRID:[200]AB_2180538
Mouse monoclonal anti-β-Actin | Sigma-Aldrich | Cat#A2228; RRID:[201]AB_476697
Mouse monoclonal anti-β-Tubulin | Sigma-Aldrich | Cat#T8328; RRID:[202]AB_1844090
Rabbit monoclonal anti-eIF4A (clone C32B4) | Cell Signaling Technology | Cat#2013; RRID:[203]AB_2097363
Rabbit polyclonal anti-eIF4A1 | Cell Signaling Technology | Cat#2490; RRID:[204]AB_823487
Rabbit polyclonal anti-eIF4B | Cell Signaling Technology | Cat#3592; RRID:[205]AB_2293388
Rabbit monoclonal anti-eIF4E (clone C46H6) | Cell Signaling Technology | Cat#2067; RRID:[206]AB_2097675
Rabbit monoclonal anti-eIF4G (clone C45A4) | Cell Signaling Technology | Cat#2469; RRID:[207]AB_2096028
Rabbit monoclonal anti-eIF4H (clone D85F2) | Cell Signaling Technology | Cat#3469; RRID:[208]AB_2096038
Anti-rabbit IgG, HRP-linked Antibody | Cell Signaling Technology | Cat#7074; RRID:[209]AB_2099233
Anti-mouse IgG, HRP-linked Antibody | Cell Signaling Technology | Cat#7076; RRID:[210]AB_330924
Anti-goat IgG, HRP-linked Antibody | Sigma-Aldrich | Cat#A4174; RRID:[211]AB_258138

Bacterial and virus strains
DH5α | Thermo-Fisher | Cat#18265017
One Shot TOP10 Chemically Competent E. coli | Thermo-Fisher | Cat#C404010

Chemicals, peptides, and recombinant proteins
Potassium chloride | Sigma-Aldrich | Cat#P9333; CAS:7447-40-7
HEPES | Sigma-Aldrich | Cat#H3375; CAS:7365-45-9
NP-40 | Sigma-Aldrich | Cat#I3021; CAS:9002-93-1
Digitonin | Abcam | Cat#ab141501; CAS:11024-24-1
Absolute Ethanol for molecular biology | Fisher Scientific | Cat#10644795
2-Propanol for molecular biology | Sigma-Aldrich | Cat#278475
Nuclease free water | Promega | Cat#P1193
MG132, proteasome inhibitor | Sigma-Aldrich | Cat#474790; CAS:133407-82-6
Bortezomib, proteasome inhibitor | Santa Cruz | Cat#sc-217785; CAS:179324-69-7
Cycloheximide | Santa Cruz | Cat#sc-3508; CAS:66-81-9
DNase I (RNase-free) | ThermoFisher | Cat#AM2222
Complete EDTA-free protease inhibitor cocktail | Roche | Cat#5056489001
SureBeads Protein G | Biorad | Cat#161-4023
Ampicillin sodium salt | Sigma-Aldrich | Cat#A9518
Glycogen | ThermoFisher | Cat#AM9510
NheI restriction endonuclease | Promega | Cat#R6501
XhoI restriction endonuclease | Promega | Cat#R6161
Taq DNA polymerase | Promega | Cat#M7841
Alkaline Calf Intestinal Phosphatase | Promega | Cat#M1821
T4 DNA Ligase | Promega | Cat#M1801
TRIzol Reagent | ThermoFisher | Cat#15596026
LiCl Precipitation Solution | ThermoFisher | Cat#AM9480
Chloroform | Sigma-Aldrich | Cat#C2432
30% Acrylamide | Severn Biotech | Cat#20-2100-10
1 kb DNA Ladder | Promega | Cat#G5711
100 bp DNA Ladder | Promega | Cat#G2101
Precision Plus Protein™ Dual Color Standards | Biorad | Cat#1610374

Critical commercial assays
Dual-Luciferase Reporter Assay System | Promega | Cat#E1910
ReliaPrep RNA Miniprep Systems | Promega | Cat#Z6011
NucleoSpin Plasmid Columns | Fisher Scientific | Cat#11932392
GeneRacer Kit with SuperScript III RT and TOPO TA Cloning for 5′ RLM-RACE | ThermoFisher | Cat#L150201
Amaxa Cell Line Nucleofector Kit V | Lonza | Cat#VCA-1003
Pierce™ BCA Protein Assay Kit | ThermoFisher | Cat#23227
M-MLV Reverse Transcriptase | Promega | Cat#M1701
TOPO TA Cloning Kit | ThermoFisher | Cat#K4575J10
GoTaq® qPCR SYBR master mix | Promega | Cat#A6001
mMESSAGE mMACHINE T7 Transcription Kit | ThermoFisher | Cat#AM1344
ECL Western Blotting Substrate | Promega | Cat#W1001
Flexi Rabbit Reticulocyte Lysate System | Promega | Cat#L4540
Wizard SV Gel and PCR Clean-Up System | Promega | Cat#A9281
jetPRIME DNA/siRNA transfection reagent | Polyplus | Cat#114-01
DNeasy Blood & Tissue Kit | QIAGEN | Cat#69504

Deposited data
FANTOM5 database for TSS profiles | Lizio et al.,[212]^32 | [213]https://fantom.gsc.riken.jp/5/datafiles/latest/
NCI-60 microarray dataset | Reinhold et al.,[214]^33 | [215]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE32474; GEO: [216]GSE32474
NCI-60 SWATH-MS dataset | Guo et al.,[217]^68 | PRIDE: [218]PXD003539

Experimental models: Cell lines
KU812 (Human chronic myelogenous leukaemia) | ATCC | Cat#CRL-209; RRID:CVCL_0379
TCC-S (Human myelogenous leukaemia) | Van et al.,[219]^69 | N/A
KCL-22 (Human myelogenous leukaemia) | ATCC | N/A
DU145 (Human prostate cancer) | ATCC | Cat#HTB-81; RRID:CVCL_0105
PC3 (Human prostate adenocarcinoma) | ATCC | Cat#CRL-1435; RRID:CVCL_0035
LNCaP (Human prostate cancer) | ATCC | N/A
HepG2 (Human hepatocellular carcinoma) | ATCC | Cat#HB-8065; RRID:CVCL_0027
HuH7 (Human liver cancer) | JCRB | Cat#JCRB0403; RRID:CVCL_0336
MCF-7 (Human breast adenocarcinoma) | ATCC | Cat#HTB-22; RRID:CVCL_0031
PA-1 (Human ovary teratocarcinoma) | ATCC | Cat#CRL-1572; RRID:CVCL_0479
SK-OV-3 (Human ovary adenocarcinoma) | ATCC | Cat#HTB-77; RRID:CVCL_0532
U-2 OS (Human osteosarcoma) | ECACC | Cat#92022711; RRID:CVCL_0042
RAJI (Human Burkitt’s lymphoma) | ATCC | Cat#CCL-86; RRID:CVCL_0511
KG1 (Human acute myelogenous leukemia) | ATCC | Cat#CRL-8031; RRID:CVCL_0374
KASUMI-1 (Human acute myeloblastic leukemia) | ATCC | Cat#CRL-2724; RRID:CVCL_0589

Oligonucleotides
A full list of DNA oligos is available in [220]Table S1 | N/A | N/A
Random Primers | Promega | Cat#C1181
RNA oligo (CD spectroscopy) GGGCGGGCAGGGGCGGGG | This Study | N/A
Mutant RNA oligo (CD spectroscopy) GAGCGAGCAGAGGCGAGG | This Study | N/A

Recombinant DNA
pcDNA3 RLUC POLIRES FLUC | Addgene; Poulin et al.,[221]^44 | Cat#45642; RRID:Addgene_45642
pcDNA3 RLUC POLIRES FLUC-G1 (AGAP2 longer 5′ UTR) | This Study | N/A
pcDNA3 RLUC POLIRES FLUC-G2 (AGAP2 shorter 5′ UTR) | This Study | N/A
pcDNA3 RLUC POLIRES FLUC-G3 (AGAP2 mutated longer 5′ UTR) | This Study | N/A

Software and algorithms
GraphPad Prism 8 | GraphPad Software, Inc. | [222]https://www.graphpad.com/scientific-software/prism/
Image Studio™ Lite | Li-COR | [223]https://www.licor.com/bio/image-studio-lite/
MetaCore Pathway Analysis | Clarivate Analytics | [224]https://portal.genego.com/
BaseSpace Sequence Hub | Illumina | [225]https://basespace.illumina.com/
The Integrative Genomics Viewer (IGV) | Broad Institute | [226]http://software.broadinstitute.org/software/igv/
BEDTOOLS v2.28 | N/A | [227]https://bedtools.readthedocs.io/en/latest/index.html
Rstudio | Rstudio team | [228]https://www.rstudio.com/
DESeq2 | Love et al.,[229]^70 | [230]http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html
Pqsfinder | Hon et al.,[231]^50 | [232]http://bioconductor.org/packages/release/bioc/html/pqsfinder.html
Python programming language | Version 3.6.8 | [233]https://www.python.org/
Multiple sequence alignment | Corpet et al.,[234]^71 | [235]http://multalin.toulouse.inra.fr/multalin/
InteractiVenn for Venn diagram | Heberle et al.,[236]^72 | [237]http://www.interactivenn.net/

Other
Nitrocellulose membrane | GE Healthcare | Cat#10600006
RPMI 1640 cell culture Media | Gibco | Cat#52400025
Dulbecco’s Modified Eagle Medium with GlutaMAX | Gibco | Cat#10566016
Dulbecco’s Modified Eagle Medium/Nutrient Mixture F-12 | Gibco | Cat#11320033
Iscove’s Modified Dulbecco’s Medium | Gibco | Cat#12440053
Opti-MEM Reduced-Serum Medium | Gibco | Cat#31985062
Fetal Bovine Serum | Biosera | Cat#FB1090/500

Resource availability

Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Dr Cristina Montiel-Duarte ([239]cristina.montielduarte@ntu.ac.uk).

Materials availability
Plasmids generated in this study will be deposited to Addgene.
Experimental model and subject details

Cell lines and culture conditions
DU145 (RRID:CVCL_0105), HepG2 (RRID:CVCL_0027), HuH7 (RRID:CVCL_0336), and U-2 OS (RRID:CVCL_0042) were cultivated in DMEM GlutaMAX supplemented with 10% FBS. LNCaP (ATCC), KU812 (RRID:CVCL_0379), TCC-S,[240]^69 KCL-22 (ATCC), KASUMI-1 (RRID:CVCL_0589), and RAJI (RRID:CVCL_0511) were cultured in RPMI supplemented with 2 mM L-Glutamine and 10% FBS. PC3 (RRID:CVCL_0035) was grown in DMEM/F12 containing 2 mM L-Glutamine and 10% FBS. MCF-7 (RRID:CVCL_0031) was cultured in DMEM GlutaMAX supplemented with 10% FBS and 0.01 mg/mL human recombinant insulin. KG1 (RRID:CVCL_0374) was cultivated in Iscove’s Modified Dulbecco’s Medium, 2 mM Glutamine, and 20% FBS. All cell lines were maintained at 37°C in a 5% CO2 incubator and tested negative for mycoplasma contamination.

Method details

Protein extraction and western blot analysis
For total protein extraction and western blot analysis, cells were lysed in ice-cold RIPA buffer [50 mM Tris-Cl (pH 8.0), 1% NP-40, 1% sodium deoxycholate, 0.1% SDS, 150 mM NaCl] supplemented with protease inhibitors (Roche). The lysate was incubated on ice for 30 min and sonicated with ice-cooling for 3 × 5 s pulses at an amplitude of 5 microns using a Soniprep 150 plus (MSE), followed by centrifugation at 13,000 × g for 10 min at 4°C. The amount of protein was quantified using the Pierce BCA Protein Assay Kit (ThermoFisher). Typically, 50 μg of total protein in Laemmli buffer [2% SDS, 10% glycerol, 50 mM Tris-HCl (pH 6.8), 0.02% bromophenol blue, 1% β-mercaptoethanol] was heated to 95°C for 5 min, separated by SDS-PAGE, and subsequently transferred to Amersham Protran 0.2 μm nitrocellulose membrane (GE Healthcare). The membrane was blocked with 5% non-fat dry milk in TBST [20 mM Tris-HCl (pH 7.6), 150 mM NaCl, 0.1% Tween 20] and probed with the indicated primary antibodies overnight at 4°C.
Membranes were then washed with TBST three times for 10 min at room temperature, followed by incubation for 1 h with the appropriate secondary antibody. The membrane was washed again three times and signals were detected using ECL Western Blotting Substrate (Promega) and the luminescent image analyser LAS-4000 (Fujifilm).

RNA extraction and real-time quantitative PCR
Total RNA from cell lines was isolated with the ReliaPrep RNA Miniprep System (Promega) according to the manufacturer’s protocol. 2 μg of total RNA was reverse transcribed using M-MLV Reverse Transcriptase (Promega) with random hexamers (Promega). Quantitative real-time PCR (qPCR) was performed in triplicate using GoTaq qPCR SYBR master mix (Promega) on the Rotor-Gene Q real-time PCR cycler (Qiagen). The expression levels of AGAP2 and HK1 were normalized to the housekeeping gene HPRT. Primer sequences are presented in [241]Table S1. The relative gene expression levels were calculated using the comparative Ct method (2^−ΔΔCt).[242]^73 For amplification of AGAP2 5′ UTR isoforms, a nested PCR with outer and inner forward and reverse primers was used ([243]Table S1). The first-round PCR products were diluted 50-fold and used as the template for the second-round qPCR.

5′ RNA ligase-mediated rapid amplification of cDNA ends
The 5′ RNA ligase-mediated rapid amplification of cDNA ends (5′ RLM-RACE) [GeneRacer kit (ThermoFisher)] was performed following the manufacturer’s instructions. Briefly, 3 μg of total RNA were treated with calf intestinal phosphatase and tobacco acid pyrophosphatase to dephosphorylate and remove the 5′ mRNA cap structure, respectively. The RNA was then ligated to 250 ng of GeneRacer RNA adaptor by T4 RNA ligase. After each step, the RNA was precipitated using phenol/chloroform. The dephosphorylated, decapped, and ligated RNA was reverse transcribed using gene-specific primers ([244]Table S1).
The cDNA was amplified using the adaptor and gene-specific nested primers ([245]Table S1) and purified by 2% agarose gel electrophoresis. The purified product was cloned for sequencing using the TOPO TA Cloning Kit (ThermoFisher), and at least ten independent clones were sequenced for each cell line by Sanger sequencing (Source Bioscience).

Plasmid constructs, transient transfection, and dual luciferase reporter assay
The plasmid (pcDNA3 RLUC POLIRES FLUC) was a gift from Nahum Sonenberg (Addgene plasmid #45642; RRID:Addgene_45642). The 5′ UTR isoforms (shorter, longer, and mutated longer) were designed and purchased from GenScript (Hong Kong) ([246]Table S2). The 5′ UTR isoforms and the plasmid were digested with NheI, and the products were separated in a 1% agarose gel, purified with the Wizard SV kit (Promega), and treated with alkaline phosphatase (Promega). The purified digested plasmid and the 5′ UTR inserts were ligated using T4 DNA ligase (Promega). The constructs were transformed into DH5α competent cells (Thermo-Fisher) and cultured in LB medium with 100 μg/mL ampicillin (Sigma-Aldrich). Positive clones were chosen, purified, and confirmed by Sanger sequencing (Source Bioscience). Reporter constructs were transfected into the PC cell line (DU145) using jetPRIME transfection reagent (Polyplus) according to the manufacturer’s protocol. Cells were seeded at a density of 2.5 × 10^5 cells/well in 6-well plates 24 h before transfection and collected 48 h after transfection. For transient transfection of the CML cell line (KU812), 1 μg of each reporter plasmid was electroporated into 2 × 10^6 cells using Amaxa Cell Line Nucleofector solution V (Lonza), program X-001, and cells were collected after 6 h. After the indicated time points, cells were lysed with Passive Lysis Buffer (Promega) and their luciferase activities were measured using the Dual-Luciferase Reporter Assay System (Promega) on a CLARIOstar microplate reader (BMG Labtech).
The Firefly luciferase activity was used as an internal normalising control.

In vitro transcription and translation assay
The plasmid constructs were transcribed in vitro by T7 RNA polymerase using the mMESSAGE mMACHINE Transcription Kit (Thermo Scientific) following the manufacturer’s guidelines and precipitated using lithium chloride (ThermoFisher). The resulting RNAs were translated in vitro using the Flexi Rabbit Reticulocyte Lysate Translation System (Promega). The RNA was translated for 90 min at 30°C and the luciferase activity of the translation products was analyzed using the Dual-Luciferase Reporter Assay, as described above.

Circular dichroism spectroscopy
The circular dichroism (CD) experiments were performed on 5 μM of RNA oligos (see [247]key resources table) in Tris–HCl (pH 7.5) buffer containing either 100 mM of NaCl or KCl or no salts. The measurements were performed using a JASCO J-715 Spectropolarimeter (JASCO). Quartz cuvettes of 0.1 cm path length were used, and spectra were recorded between 220 and 320 nm at a scan speed of 50 nm/min with a response time of 2 s. The data presented are an average of six spectral scans with baseline buffer correction.

Polysome fractionation
Prior to harvesting, 25 × 10^6 cells were treated with 100 μg/mL cycloheximide (CHX) and incubated at 37°C and 5% CO2 for 10 min. The cells were washed and resuspended in lysis buffer (100 mM KCl, 5 mM MgCl2, 20 mM HEPES (pH 7.4), 0.5% NP-40, 100 μg/mL CHX, 2 mM DTT, 40 U/mL RNase inhibitor, and 1× protease inhibitor cocktail) and incubated for 10 min on ice, followed by centrifugation at 12,000 × g for 10 min at 4°C to pellet the nuclei and debris. The supernatant was layered on top of a 10–50% sucrose gradient and centrifuged at 190,000 × g for 90 min at 4°C. The gradients were then fractionated from top to bottom while measuring absorbance at 254 nm. Sucrose fractions of 500 μL were collected and RNA was isolated using TRIzol extraction.
Briefly, each fraction was resuspended in TRIzol (ThermoFisher) and chloroform (Sigma-Aldrich), mixed vigorously, and centrifuged at 13,000 × g for 15 min at 4°C to separate into 3 layers. The RNA in the top aqueous layer was precipitated using ethanol and 3 M sodium acetate (pH 5.2) and spiked with 500 ng of in vitro transcribed Renilla luciferase RNA control. The RNA was further cleaned and concentrated using ReliaPrep RNA Miniprep Systems (Promega). The samples were reverse transcribed and amplified using qPCR, as described above.

RNA G quadruplex immunoprecipitation (rG4IP)[248]^31
Briefly, rG4IP was performed with a structure-specific G quadruplex (BG4) antibody (Absolute Antibody). TCC-S cells (15 × 10^6) were collected, washed with ice-cold PBS, and resuspended in ice-cold lysis buffer (150 mM KCl, 50 mM HEPES, 25 μg/mL digitonin, 100 U/mL RNase inhibitor). The cells were incubated with lysis buffer for 10 min at 4°C using end-over-end rotation and centrifuged at 2,000 × g for 5 min at 4°C. The supernatant (cytosolic fraction) was saved and 10% was removed to be used as the input control. When transfections were required, 1 × 10^6 DU145 cells were seeded in a 100 mm dish, transfected using jetPRIME transfection reagent (Polyplus), trypsinised after 48 h, and processed as above. Precleared lysates were incubated overnight with 3 μg of BG4 antibody bound to Protein G magnetic beads (Biorad). After incubation, the beads were magnetised, washed three times with lysis buffer, and eluted by incubating at 65°C for 15 min to release the bound nucleic acids. The eluate was treated with 2 U of RNase-free DNase I (ThermoFisher) for 15 min at 37°C to remove contaminating DNA. The RNAs from the input and IP fractions were then isolated by TRIzol (ThermoFisher) extraction followed by isopropanol precipitation.

Quantification and statistical analysis

Statistics
All statistical analyses were performed using GraphPad Prism software (version 8).
For experiments where two groups were compared, a two-tailed Student’s t test was performed for normally distributed data and the Mann-Whitney U-test was used for non-parametric data. Normality was evaluated using the Shapiro-Wilk test. For comparison of three or more groups, a one-way ANOVA was performed followed by post-hoc Sidak’s multiple comparison tests. For non-parametric data, the Kruskal-Wallis test followed by uncorrected Dunn’s test was used. Unless otherwise stated, histogram columns represent the mean and error bars indicate the standard deviation. The data were considered statistically significant if p < 0.05, and this is indicated in the figure legends by asterisks (∗p < 0.05; ∗∗p < 0.01; ∗∗∗p < 0.001).

Mapping G quadruplex consensus sequences between alternative TSSs
The normalised 5′ CAGE tag density (Tags Per Million - TPM) data for all the available human samples in the FANTOM database were downloaded from the ZENBU genome browser.[249]^32 The CAGE tag starting sites mapped to a ±50 base pair region around the annotated transcription start site of each gene transcript were selected. The TSS annotations were downloaded from Ensembl GRCh38.p13 (release 98).[250]^74 Tags with a normalised density of less than 2 TPM were removed to select robust CAGE tag starting sites. The overlapping CAGE tags around the annotated TSS, as defined above, were considered part of a single cluster. The sequences between the CAGE tag with the highest TPM and the furthest upstream tag within the same cluster were extracted for all the transcripts using the Bio.Entrez module in Biopython[251]^75 ([252]Data S1). The sequences were then analyzed for the presence of G quadruplex consensus sequences using the pqsfinder package in R[253]^50 ([254]Data S2).
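As an illustration of the mapping steps described above, the following Python sketch filters CAGE tags to those within ±50 bp of an annotated TSS and at or above 2 TPM, takes the interval between the furthest upstream tag and the dominant (highest TPM) tag, and scans that interval for a G quadruplex consensus. It is a simplified stand-in under stated assumptions, not the analysis code in Data S1 and Data S2: the names `CageTag`, `robust_cluster_tags`, `upstream_to_dominant_interval`, and `find_g4_motifs` are hypothetical, the toy locus is invented, plus-strand coordinates are assumed, and a canonical regular expression (four runs of at least three G separated by loops of 1 to 7 nt) replaces pqsfinder, which scores imperfect motifs and may report different hits.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class CageTag(NamedTuple):
    position: int  # 0-based start coordinate of the tag (plus strand assumed)
    tpm: float     # normalised tag density (tags per million)


# Canonical consensus: four runs of >=3 G separated by loops of 1-7 nt.
# Assumption: this regex is a simplified substitute for pqsfinder's scoring.
G4_PATTERN = re.compile(
    r"G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}"
)


def robust_cluster_tags(tags: List[CageTag], annotated_tss: int,
                        window: int = 50, min_tpm: float = 2.0) -> List[CageTag]:
    """Keep tags within +/- window bp of the annotated TSS with TPM >= min_tpm."""
    return [t for t in tags
            if abs(t.position - annotated_tss) <= window and t.tpm >= min_tpm]


def upstream_to_dominant_interval(cluster: List[CageTag]) -> Optional[Tuple[int, int]]:
    """Return (furthest upstream position, dominant tag position), or None
    when the cluster is empty or the dominant tag is itself the furthest upstream."""
    if not cluster:
        return None
    dominant = max(cluster, key=lambda t: t.tpm)
    upstream = min(cluster, key=lambda t: t.position)
    if upstream.position == dominant.position:
        return None
    return upstream.position, dominant.position


def find_g4_motifs(sequence: str) -> List[Tuple[int, str]]:
    """Return (offset, matched sequence) for each canonical G4 consensus hit."""
    return [(m.start(), m.group()) for m in G4_PATTERN.finditer(sequence.upper())]


if __name__ == "__main__":
    # Invented toy locus embedding the wild-type oligo listed for CD spectroscopy.
    genome = "ACTACTACTA" + "GGGCGGGCAGGGGCGGGG" + "ACTACTACTACT"
    tags = [CageTag(5, 3.0), CageTag(20, 1.0), CageTag(32, 40.0), CageTag(200, 10.0)]
    cluster = robust_cluster_tags(tags, annotated_tss=30)
    interval = upstream_to_dominant_interval(cluster)
    if interval is not None:
        start, end = interval
        print(find_g4_motifs(genome[start:end]))
```

In this toy example the wild-type sequence used for CD spectroscopy matches the canonical pattern, whereas the mutant oligo, in which every G run is interrupted, does not. Minus-strand transcripts would need the coordinate comparison reversed and the sequence reverse-complemented, which this sketch leaves out.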
For analysing the differential distribution of G quadruplex forming TSSs in PC and CML cell lines, the data for PC cell lines [DU145 (10490-107B4), PC3 (10439-106E7)] and CML cell lines [replicates for K562 (10454-106G4, 10824-111C5)] were downloaded from the FANTOM database. The proportions of G quadruplex forming TSSs were estimated by dividing the number of tags within a 21 bp subregion upstream of the G quadruplex starting position by the total number of tags in the selected TSS cluster. The 21 bp subregion was selected to maintain uniformity and also because the CAGE tags are about 21 bp long, so any upstream overlapping tags within this region would belong to the same cluster.[255]^50^,[256]^76 The differential distribution was computed by linear modeling and an empirical Bayes approach using the Limma package in R ([257]Data S3 for data wrangling and analysis).

Identification of genes with a discrepancy in mRNA and protein levels
To identify genes showing discrepancies in mRNA and protein levels, as noted for AGAP2, the NCI-60 microarray data ([258]GSE32474) and the NCI-60 SWATH-MS database were used.[259]^33^,[260]^68 The differentially expressed RNAs in PC (DU145, PC3) and CML (K562) were analyzed using GEO2R (NCBI) ([261]Data S3). The significant differences in protein mass spectral intensity values were evaluated using Limma ([262]Data S3). Genes with differential RNA expression of 2-fold or greater (RNA logFC ≥ 1) and either no statistically significant difference in protein levels or significantly lower protein levels were considered to have a discrepancy in mRNA and protein levels. The genes with discrepancies were analyzed for the presence of alternative 5′ UTRs with G quadruplex consensus sequences to identify targets for validation.

Pathway analysis
Functional pathway maps of genes with alternatively transcribed G quadruplex structures were created using MetaCore (Clarivate Analytics).
The ranked hypergeometric test was used to determine enriched pathways and processes.

Acknowledgments