Abstract

NF-Y is a CCAAT-binding trimeric transcription factor, whose regulome, interactome and oncogenic potential point to direct involvement in cellular transformation. Yet little is known about the levels of NF-Y subunits in tumors. We focused on breast carcinomas, and analyzed RNA-Seq datasets of TCGA and 54 BRCA cell lines at gene and isoform level. We partitioned all tumors into the four major subclasses. NF-YA, but not the histone-fold subunits NF-YB/NF-YC, is globally overexpressed, correlating with the proliferative Ki67 marker and a common set of 840 genes, with cell-cycle and metabolism GO terms. Their promoters are enriched in NF-Y, GC-rich and E2F sites. Surprisingly, there is an isoform switch, with the “short” isoform -NF-YAs- becoming predominant in tumors. E2F genes are also overexpressed in BRCA, but no switch in isoforms is observed. In Basal-like Claudin^low cell lines and tumors, expression of the NF-YAl -long- isoform is high, together with 11 typical EMT markers and low levels of basal Keratins. Analysis of Progression-Free-Intervals indicates that tumors with unbalanced NF-YA isoform ratios have worse clinical outcomes. The data suggest that NF-YA overexpression increases CCAAT-dependent, pro-growth genes in BRCA. NF-YAs is associated with a proliferative signature, but high levels of NF-YAl signal loss of epithelial features, EMT and acquisition of a more aggressive behavior in a subset of Claudin^low Basal-like tumors.

Subject terms: Breast cancer, Cancer genomics

Introduction

The synergy and precise interplay of Transcription Factors -TFs- on promoters and enhancers dictate regulation of gene expression. Many TFs are pivotal in the control of cell growth, and their altered structure or expression leads to tumorigenesis. NF-Y is a TF binding with high specificity to the CCAAT box, an important regulatory element. NF-Y has a role as a “pioneer” TF, setting the chromatin stage for recruiting other TFs and coactivators^1–3.
It consists of three subunits: the histone fold domain -HFD- dimer NF-YB/NF-YC and the sequence-specific NF-YA^4. NF-YA and NF-YC are subject to alternative splicing^5,6. Specifically, there are two major isoforms of NF-YA, NF-YAs “short” and NF-YAl “long”, differing in 28/29 amino acids within the Gln-rich TransActivation Domain, TAD^5.

NF-Y genes are rarely mutated in cell lines or cancer specimens (http://www.sanger.ac.uk/genetics/CGP/cosmic/), yet different lines of evidence suggest that NF-Y plays a relevant role in cancer progression. Microarray profiling of genes overexpressed in tumor vs normal cells found cancer “signature” genes, and TFBSs -Transcription Factor Binding Sites- searches identified CCAAT as overrepresented in their promoters (reviewed in Ref.^7). The same conclusion was reached in Oncomine profiling data using unbiased de novo motif discovery tools^8. More recent profiling reports confirmed this, specifically in breast cancer^9–12. RNA-seq data analyses are fewer, but point in the same direction^13,14. It is well established that CCAAT, wherever present in promoters, is crucial for high-level expression of genes^15; thus, it appears that tumors rely on CCAAT-binding to activate a significant number of “cancer” genes. NF-Y was analyzed by the vast ENCODE consortium, and by independent ChIP-Seq experiments: connections to oncogenic and growth controlling TFs and signaling pathways emerged (Ref.^1, reviewed in Ref.^16).

What is not clear is whether NF-Y is overexpressed in cancer cells and, if so, in which tumor types. There is no widespread, systematic analysis of expression levels of the subunits in tumors, and the available information is limited to small cohorts of specific cancers. Epithelial ovarian cancer cells show increased NF-YA levels, specifically the short isoform, and tumors with high NF-YA levels have a poorer prognosis^17,18.
Elevated expression of NF-YA, along with other TFs, was reported in Triple Negative Breast Cancers^14. High levels of NF-YA mRNA were found in the “diffuse” type of gastric cancer^19, and of the NF-YC protein in gliomas^20 and colon adenocarcinomas^21. To close this gap in our knowledge of NF-Y biology, we analyzed the mRNA levels of NF-Y subunits in human tumor samples, both in quantitative and qualitative terms, by interrogating large-scale RNA-Seq datasets of TCGA. We then decided to focus specifically on breast carcinomas.

Results

NF-YA is widely overexpressed in tumors of epithelial origin

The global mRNA levels of the three NF-Y subunits were investigated with Firebrowse (http://firebrowse.org/viewGene.html) in 37 different types of tumors present in TCGA. Nine tumor types lack normal counterparts, and were not further considered. We restricted the analysis to tumor types with more than 5 matched normal samples. The analysis was therefore limited to 18 tumor types, and the results are shown in Fig. S1 as FPKM box plots of NF-YA, NF-YB and NF-YC. The levels of NF-YA are increased in many types of tumors and decreased in a few. Considering a p-value threshold of 1e-04, 11/18 tumors have higher levels of NF-YA, 2/18 lower levels. The increase is robust in epithelial tumors: carcinomas of breast (BRCA), colon (COAD), rectum (READ), stomach (STAD), liver (LIHC), prostate (PRAD), uterine (UCEC), head and neck squamous cells (HNCC), cholangiocarcinoma (CHOL), lung adenocarcinoma (LUAD) and squamous cells carcinoma (LUSC). The pattern is different for the HFD subunits, since overexpression is neither statistically overwhelming nor concordant: NF-YB is decreased in 7 tumors, increased in 5; NF-YC is increased in 6 and decreased in 3. An increase in all NF-Y subunits is observed in CHOL, LIHC (liver hepatocellular carcinoma) and STAD, a decrease in THCA (thyroid carcinoma) and KICH (kidney chromophobe).
In ESCA (esophageal carcinoma), KIRP (kidney renal papillary cell carcinoma) and GBM (glioblastoma multiforme), subunit expression is unchanged. In conclusion, there is an increase in mRNA levels of NF-YA, but not NF-YB/NF-YC, in most tumors, specifically those of epithelial origin.

One of the tumors in which overexpression of NF-YA is not observed is GBM. To verify this, we searched an independent RNA-seq GEO dataset (GSE59612) that includes samples taken from areas of tumors with mesenchymal and neural cells, matched with normal ones^22. Box plot analysis of expression of the two major splicing isoforms of NF-YA did not show a significant change; the same was true for the three isoforms of NF-YC, bar a modest increase in the 37 kD and a decrease of the 50 kD isoform. NF-YB was decreased (Fig. S2). These results confirm the TCGA data shown above in that there is no overexpression of NF-YA in GBM.

NF-YA is overexpressed in BRCA

We focused our attention on the BRCA dataset of TCGA: further quantitative analysis of RNA-Seq data found that the levels of NF-YA, but not NF-YB nor NF-YC, are increased in cancer samples compared to normal controls (Fig. 1A). Breast carcinomas are divided into several subtypes, according to different clinical, histological and molecular parameters. In theory, NF-YA overexpression could be specific to one -or more- of the cancer subtypes. Molecular classification of BRCA is defined by a gene expression signature of 50 genes -termed PAM50- partitioning four types: Basal-like, HER2E, Luminal A and Luminal B. Originally identified with mRNA profiling^23, PAM50 was later confirmed by qRT-PCR^24, RNA-Seq and partial analysis of TCGA samples^25. Classification of the four subtypes within TCGA was performed on 514^26 and later 817 tumor samples^27. Our first goal was to extend it to all 1083 BRCA for which RNA-Seq data are available. To do so, we employed a classifier based on PAM50, as defined previously^28.
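As an illustration of what a PAM50-style assignment involves, the sketch below labels one sample by its highest Spearman correlation to a set of subtype centroids, which is how nearest-centroid PAM50 classifiers are commonly described. It is not the classifier cited above, whose details are not given here. The four genes are a tiny stand-in for the 50-gene signature, and all centroid and sample values are invented for the example.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy stand-in for the 50-gene signature (assumption: values are invented).
GENES = ["ESR1", "ERBB2", "KRT5", "MKI67"]
CENTROIDS = {
    "Basal-like": [-2.0, -0.5, 2.0, 1.5],
    "HER2E": [-1.0, 2.5, -0.5, 1.0],
    "Luminal A": [2.0, -0.3, -1.0, -1.5],
    "Luminal B": [1.5, 0.2, -1.5, 1.0],
}

def classify(sample):
    """Return the best-matching subtype and the per-subtype correlations."""
    scores = {name: spearman(sample, c) for name, c in CENTROIDS.items()}
    return max(scores, key=scores.get), scores

# Hypothetical centered expression values, in the same order as GENES.
sample = [-1.8, -0.2, 2.4, 1.1]
subtype, scores = classify(sample)
print(subtype, scores)
```

With only four genes, ties between subtypes would be frequent on real data; the full signature and properly derived centroids are needed for any meaningful call.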
Venn diagrams of the 514, 817 and 1083 samples are shown in Fig. S3: with respect to the original partitioning of 524 tumors^26, relative proportions are very similar for Basal-like (now 203 tumors) and HER2E (now 126); we confirm a shift of samples from Luminal A -now 320 tumors- to Luminal B -now 425- as previously described^27. The expression heatmap of PAM50 genes in all BRCA samples shows the expected clustering (Fig. S4). Supplementary Table 1 shows the complete list of 1083 BRCA tumors classified according to the four subtypes. With this in hand, we compared the mRNA levels of the three subunits in the four subtypes to the 113 normal breast samples: Fig. 1B shows a global increase of NF-YA in all subtypes, somewhat less significant in HER2E, and very significant in Basal-like (p-value e-11). NF-YB is not affected, bar a statistically significant decrease in Basal-like. NF-YC shows some reduction in HER2E and a modest increase in Basal-like. These data indicate that overexpression of NF-YA is not restricted to a specific subtype of BRCA, and confirm little to no change in HFD subunits.

Figure 1. Analysis of NF-Y subunits expression in TCGA BRCA. (A) Box plots of NF-Y subunits expression at gene level in TCGA-BRCA, measured in TPMs. (B) Expression of NF-Y subunits at gene level across the TCGA BRCA subtypes after PAM50 classification of TCGA-BRCA cohort. p-values are calculated using a Wilcoxon signed-rank test.

NF-Y, GC-rich and E2F sites are enriched in promoters of common genes overexpressed in all BRCA subtypes

We analyzed the levels of gene expression in the four BRCA subtypes (Supplementary Table 2). The lists of the selected over- and under-expressed genes, using log2FC 2 and FDR 0.01 as thresholds, are in Supplementary Table 3. The overlap between BRCA subtypes is quite extensive, with some 840 genes commonly overexpressed, while 41–61 genes are specific for each subtype (Fig. 2A).
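The selection and overlap steps just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the gene names and values are placeholders, and treating the thresholds as inclusive (log2FC ≥ 2, FDR ≤ 0.01) is an assumption.

```python
LOG2FC_MIN = 2.0   # threshold from the text; inclusive comparison is assumed
FDR_MAX = 0.01     # threshold from the text; inclusive comparison is assumed

# Hypothetical differential expression results per subtype versus normal:
# gene -> (log2 fold change, FDR). All values are invented.
results = {
    "Basal-like": {"GENE_A": (3.1, 1e-8), "GENE_B": (2.4, 1e-4),
                   "GENE_C": (2.2, 0.2), "GENE_D": (2.9, 1e-5)},
    "HER2E": {"GENE_A": (2.5, 1e-6), "GENE_B": (2.1, 1e-3),
              "GENE_E": (4.0, 1e-9)},
    "Luminal A": {"GENE_A": (2.2, 1e-5), "GENE_B": (1.2, 1e-6),
                  "GENE_F": (2.6, 1e-3)},
    "Luminal B": {"GENE_A": (2.8, 1e-7), "GENE_B": (2.3, 1e-4),
                  "GENE_G": (-2.5, 1e-6)},
}

def overexpressed(table):
    """Genes passing both the fold-change and the FDR threshold."""
    return {gene for gene, (lfc, fdr) in table.items()
            if lfc >= LOG2FC_MIN and fdr <= FDR_MAX}

up = {subtype: overexpressed(table) for subtype, table in results.items()}

# Genes overexpressed in every subtype (the centre of the Venn diagram).
common = set.intersection(*up.values())

# Genes overexpressed in exactly one subtype (the outer Venn regions).
specific = {
    subtype: genes - set().union(*(other for name, other in up.items()
                                   if name != subtype))
    for subtype, genes in up.items()
}

print(sorted(common))
print({subtype: sorted(genes) for subtype, genes in specific.items()})
```

The same two set operations, applied to genes passing the opposite fold-change threshold, would give the common and subtype-specific under-expressed lists.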
A similar picture was observed in down-regulated genes (Fig. S5). We then analyzed promoter sequences -from −450 to +50 from the TSS- of overexpressed genes with the Pscan tool^29: this algorithm allows the retrieval of statistically enriched TFBSs (Transcription Factor Binding Sites) based on the DNA matrices present in the JASPAR database. The results are shown in Fig. 2A: each of the four distinct signatures contains a specific set of promoter TFBSs, different from each other. In the commonly up-regulated genes, the most enriched matrices are CCAAT/NF-Y, flavors of E2Fs (E2F4/6) and GC-rich matrices, binding to Zinc-finger TFs (KLFs, SP1/2/3). NF-Y or E2F sites are not found in the subtype-specific up-regulated genes. The same analysis was performed on promoters of down-regulated genes, and neither NF-Y nor E2F sites were found (Fig. S5). To verify these results, we ran Weeder, an algorithm for de novo motif discovery that finds matrices without any pre-existing bias^30. We found the NF-Y/CCAAT matrix with high frequency, in addition to a GC-rich matrix (Fig. 2B). E2F was not identified with this method. In summary, NF-Y/CCAAT is a centerpiece in promoters of the 840 genes commonly overexpressed in TCGA breast carcinomas.

Figure 2. Analysis of gene expression in TCGA BRCA. (A) Venn diagrams show the upregulated genes for each PAM50 subtype, comparing subtype samples to normal tissues in the TCGA BRCA cohort. On the borders, genes exclusively upregulated in each subtype are shown. For subtype-specific and common upregulated genes, the most represented promoter TFBSs are listed, obtained using the Pscan software. (B) The most represented motifs in the commonly upregulated genes from de novo discovery using Weeder. (C) The most represented Reactome pathways enriched in commonly upregulated genes are listed according to their p-value. The list is obtained using KOBAS.
(D) Expression levels of the proliferative marker Ki67 across TCGA-BRCA tumor samples ranked based on NF-YA expression.

We then analyzed the common and subtype-specific overexpressed genes for Gene Ontology terms using the KOBAS algorithm: the most enriched term in the commonly overexpressed genes is cell-cycle, specifically mitosis. Additional enriched terms are signaling and senescence. On the other hand, pathway-specific terms are distinctly enriched in genes overexpressed in the four subtypes of BRCA (Fig. 2C). These results suggest that the presence of the NF-Y-binding CCAAT matrix correlates with a signature of overexpressed “proliferative” genes. To confirm this, we ranked all BRCA tumors in 20 groups, according to NF-YA levels (Fig. 2D, Lower Panel) and observed the levels of Ki67, a proliferative marker: Ki67 is indeed progressively decreased from NF-YA^high to NF-YA^low groups (Fig. 2D, Upper Panel). Incidentally, Ki67 is a direct genomic target of NF-Y (not shown). In conclusion, increased NF-YA levels correlate positively with a “proliferative” signature of genes containing CCAAT in promoters, and with a marker of proliferation.

E2Fs overexpression in BRCA

The discovery of the E2F matrix in the BRCA commonly overexpressed genes was not a surprise, as it is often found in tumor cohorts^8. E2Fs are a family of 8 genes, and microarray profiling reported overexpression of some -E2F1/2/3- in breast carcinomas [^31 and references therein]. We quantified E2Fs expression in the RNA-Seq BRCA