Abstract ChIP-seq has been commonly applied to identify genomic occupation of transcription factors (TFs) in a context-specific manner. It is generally assumed that a TF should have similar binding patterns in cells from the same or closely related tissues. Surprisingly, this assumption has not been carefully examined. To this end, we systematically compared the genomic binding of the cell cycle regulator FOXM1 in eight cell lines from seven different human tissues at binding signal, peaks and target genes levels. We found that FOXM1 binding in ER-positive breast cancer cell line MCF-7 are distinct comparing to those in not only other non-breast cell lines, but also MDA-MB-231, ER-negative breast cancer cell line. However, binding sites in MDA-MB-231 and non-breast cell lines were highly consistent. The recruitment of estrogen receptor alpha (ERα) caused the unique FOXM1 binding patterns in MCF-7. Moreover, the activity of FOXM1 in MCF-7 reflects the regulatory functions of ERα, while in MDA-MB-231 and non-breast cell lines, FOXM1 activities regulate cell proliferation. Our results suggest that tissue similarity, in some specific contexts, does not hold precedence over TF-cofactors interactions in determining transcriptional states and that the genomic binding of a TF can be dramatically affected by a particular co-factor under certain conditions. Keywords: transcription factor, ChIP-seq, FOXM1 reprogramming, genomic binding, breast cancer prognosis INTRODUCTION Transcription factors (TFs) are crucial proteins that mediate gene transcriptional regulation by binding to specific DNA sequences in all living organisms. Several technologies have been developed to investigate the binding of TFs in a high-throughput manner [[32]1]. Among them, chromatin immunoprecipitation followed by microarray hybridization (ChIP-chip) [[33]2] or high-throughput DNA sequencing (ChIP-seq) [[34]3] have become the most widely used methods to detect binding events of individual TFs across the entire genome [[35]1, [36]4–[37]6]. These methods identify direct and indirect (through interacting with co-factors) binding sites of DNA-associated proteins of interest [[38]4, [39]7, [40]8]. Currently, ChIP-seq has become one of the most important technologies used in genomic studies as evidenced by the rapid accumulation of ChIP-seq data. Previous studies have reported a rapid turnover rate of binding sites of homologous TFs in different species [[41]9–[42]14], suggesting that individual binding sites of a TF are not conserved. Odom et al. performed ChIP-chip analysis and found that the binding sites varied extensively between human and mouse even for TFs that are highly conserved during evolution [[43]13]. Borneman et al. compared the pseudohyphal regulators STE12 and TEC1 binding sites in three yeast species, S. cerevisiae, S. mikatae, and S. bayanus under pseudohyphal conditions and reached a similar conclusion [[44]11]. Other than these comparative studies by experiments, computational studies based on systematic motif analysis also indicated high turnover rate of TF binding motifs in different organisms [[45]12, [46]14]. In spite of this, functional conservation has been demonstrated for many TFs even between species that are distantly related [[47]15, [48]16]. In other words, the homologous TFs participate in the regulation of the same biological process in different species. Interestingly, the functional conservation of them can be attained through regulating different sets of target genes in different species. For example, Tuch et al. showed that the target genes of MCM1 have diverged substantially in three related yeast species; however, in all species MCM1 is involved in regulating cell cycle and mating processes [[49]17]. Moreover, motif analyses indicate that the binding motifs associated with a TF is generally conserved across species, presumably due to the selective pressure imposed on its DNA binding domain [[50]7]. On the other hand, the genomic occupancy of a TF in multiple cell types of the same organism shows different degrees of variation. For some TFs, a high degree of shared occupancy between cell types has been observed. Investigation of CTCF binding in 19 human cell lines, for instance, indicates that on average 72% of CTCF sites were shared between any two cell types [[51]18]. Additionally, variable binding has been observed for 64% of CTCF sites which vary in at least one cell type. However, the binding variation for some other TFs are more dramatic. Shira et al. compared the REST genomic occupancy in 16 different human cell lines and found that only 7% of binding peaks are shared by all cell lines [[52]19]. According to the unpredictable binding of TFs described above, an interesting question arises: is the genomic occupancy of a TF more similar in more closely related cell types? Intuitively, this should be the case according to general knowledge from transcriptomic and other genomic studies. It has been shown in previous studies that gene expression [[53]20, [54]21] and DNA methylation [[55]22, [56]23] levels are highly consistent in cell lines from the same tissue. Moreover, TF binding is largely determined by local chromatin structure (i.e., the accessibility as measured by DNase I hypersensitivity analysis [[57]24, [58]25]) that is shaped by epigenetic mechanisms such as histone modification [[59]26, [60]27]. In fact, it has been demonstrated that a combination of these genomic features with motif analysis can predict TF binding sites with fairly high accuracy [[61]28–[62]32]. Furthermore, data generated from the Encyclopedia of DNA Elements (ENCODE) [[63]33] indicate that high similarity of DNase I hypersensitivity regions between cell lines of similar tissue origins [[64]34, [65]35]. Given these results, we would also expect TF binding profiles be more similar in closely related tissues or cell types. In this study, we investigate FOXM1 binding in several cell lines and show that overall genetic similarity of cell lines does not hold precedence over context-specific TF-co-factor interactions in determining TF binding profiles. FOXM1, forkhead box protein, is a crucial cell cycle regulator [[66]36, [67]37], which has been shown to be highly associated with multiple cancer types [[68]38–[69]42]. Recently, genomic binding data of FOXM1 have been generated by ChIP-seq experiments in several studies [[70]43–[71]47]. Based on these data, we performed a comparative analysis to identify the common and specific genome-wide binding events of FOXM1 in 8 distinct cell lines derived from 7 different tissues. By systematically comparing the binding sites and target genes of FOXM1, we find that even though MCF-7 and MDA-MB-231 are both breast cancer cell lines, FOXM1 binding events are substantially different between these two cell lines compared to non-breast cell lines. In particular, FOXM1 binding sites are more similar in MDA-MB-231, HeLa, U2OS, HEK293, GM12878, SK-N-SH and ECC-1 although they all represent different tissues. Moreover, the prognostic value of FOXM1 has been reported in several cancer types [[72]43, [73]48–[74]50] with the observation that FOXM1 activity is more predictive to prognosis than its mRNA level. We examined the ability of using target genes to infer FOXM1 activity in tumor samples and investigated their association with patient survival in breast cancer. We find that the inferred regulatory activity of FOXM1 is predictive of the survival of patients, and more interestingly, scores inferred based on FOXM1 targets from different cell lines provide complementary clinically related information -- MCF-7 specific FOXM1 targets inform estrogen receptor (ER) activity while targets in other cell lines inform the proliferative ability of tumor cells. These results indicate that the genomic occupation of TFs is more complicated than expected, and that these nuanced changes in binding activity are manifested at the clinical level. RESULTS Overview of our analyses To compare FOXM1 binding sites in different human cell lines, we searched for all available FOXM1 ChIP-seq datasets in the Gene Expression Omnibus (GEO) database [[75]51] and obtained a total of 23 ChIP-seq experiments for FOXM1 at the time of writing. These FOXM1 binding data were collected from 6 studies across 8 different cell lines (Table [76]1), including MCF-7 (ER-positive breast cancer), MDA-MB-231 (ER-negative breast cancer), ECC-1 (endometrium cancer), GM12878 (blood), HEK293 (kidney cancer), HeLa (cervical cancer), SK-N-SH (neuroblastoma), and U2OS (osteosarcoma). For these datasets, we performed systematic comparative analyses using different levels of information from these data (Figure [77]1). First, at the binding signal level we performed principle component analysis (PCA) [[78]52] on the normalized binding signals of the ChIP-seq experiments. Second, at the peak level we identified FOXM1 binding peaks in each of these ChIP-seq experiments and examined the number of shared peaks between each pair of experiments. Moreover, we performed motif analyses to examine the enrichment of 687 motifs available from the TFANSFAC [[79]53] and JASPAR [[80]54] databases in the binding peaks of each ChIP-seq experiment. Third, at the gene level we defined FOXM1 target genes using a probabilistic model and compared the shared genes between all pairs of experiments. Comparative analyses at the signal, peak, and gene target levels consistently support that MCF-7 ChIP-seq experiments are highly similar to each other but exhibit little resemblance to MDA-MB-231, which is more similar to non-breast cell lines. Finally, at the level of regulation activity, we applied the Binding Association with Sorted Expression (BASE) algorithm [[81]55] to a primary breast cancer expression dataset to infer FOXM1 regulatory activity in patient samples based on its target gene expression. Our results suggested that FOXM1 target genes identified in all cell lines except MCF-7 are informative of the proliferation-regulating activity of FOXM1, while the target genes identified in MCF-7 cell line reflect ER activity rather than FOXM1 activity. Excitingly, the activities inferred based on MCF-7 and other cell lines can be combined to achieve more accurate prediction of patient prognosis. Table 1. Numbers of reads, peaks and genes in each ChIP-seq experiment. ChIP-seq ID Total Reads Total Peaks Total Target Genes GSE Antibody Cell Line Tissue SRR577922 116,488,388 18,765 274 [82]GSE32465 SC-502 ECC-1 Endometrium SRR577923 94,505,444 5,892 228 SRR577673 168,945,472 54,916 243 [83]GSE32465 SC-502 GM12878 Blood SRR577674 121,585,588 30,273 199 SRR1534936 140,964,036 3,883 162 [84]GSE60032 GTX-102170 HEK293 Kidney SRR1534937 144,836,032 1,963 148 SRR1045855 68,473,196 3,789 243 [85]GSE52098 SC-502 HeLa Cervical SRR1045856 67,826,668 1,426 209 SRR2390493 168,903,060 12,214 112 [86]GSE72977 SC-501 MCF-7 Breast SRR2390494 143,504,268 5,032 97 SRR2390496 150,878,452 14,806 95 SRR567275 125,468,568 34,091 114 [87]GSE40762 SC-502 SRR567277 115,745,436 2,763 92 SRR567284 54,319,080 517 95 SRR567287 135,548,148 9,148 109 SRR577748 140,926,888 24,926 95 [88]GSE32465 SC-502 SRR577749 67,248,588 7,801 90 SRR567279 112,861,044 13,816 190 [89]GSE40762 SC-502 MDA-MB-231 SRR567281 129,971,924 11,160 111 SRR577710 119,592,208 13,405 170 [90]GSE32465 SC-502 SK-N-SH Neuroblastoma SRR577711 94,138,820 12,122 183 SRR500261 147,600,792 1,188 127 [91]GSE38170 SC-502 U2OS Osteosarcoma SRR500262 45,863,008 584 138 [92]Open in a new tab Figure 1. Schematic depicting the comparison of FOXM1 binding in different cell lines. Figure 1 [93]Open in a new tab We compared the difference based on three levels, (A) the raw signal profiles, (B) binding peaks and (C) target genes, to show the different binding of FOXM1 in different cells. (D) We applied the target gene profiles to infer FOXM1 activity and further compared the difference. Comparison of FOXM1 binding signals and peaks We first sought to investigate the difference in FOXM1 binding events between different cell types. We divided the whole human genome into 100 bp bins, and for each bin we calculated the normalized binding signal (mean coverage per million reads) based on the continuous-valued TF binding signal provided in the bedGraph files. Bins with non-zero signal in at least one of the 23 ChIP-seq profiles were selected for PCA analysis. The results showed that all MCF-7 cell ChIP-seq profiles clustered together according to the first principle component, whereas MDA-MB-231 profiles were grouped with HEK293, HeLa and U2OS (Figure [94]2A). This is interesting since MDA-MB-231 and MCF-7 are both breast cancer cell lines, yet FOXM1 binding profiles in MDA-MB-231 are more similar to those in non-breast cell lines. Figure 2. Comparison of FOXM1 binding events in different cell lines. Figure 2 [95]Open in a new tab (A) PCA analysis of the normalized binding signal of FOXM1 in different ChIP-seq experiments. Colored dots represent different ChIP-seq experiments. The first PC explains 41.13% variation and the second PC explains 14.79% variation. (B) Peak overlap analysis based on the called binding peak in different ChIP-seq experiments. The color bars in left and top represent different ChIP-seq experiments. (C) Genomic regions distribution of FOXM1 binding peaks in different ChIP-seq experiments. (D) Two specific examples of FOXM1 binding. Second, we called FOXM1 binding peaks for all ChIP-seq experiments using Model-based Analysis of ChIP-Seq (MACS2) [[96]56]. The number of FOXM1 binding peaks ranged from 517 to 54,916 depending on sequencing depth and other experimental factors, with details shown in Table [97]1. We examined the number of shared peaks between each pair of TF binding experiments. A peak in one experiment is counted as shared if there is at least a 1 bp overlap with peaks from the other experiment. Peak overlap analysis showed that most FOXM1 binding peaks called in MCF-7 cells are shared, despite the variation in peak numbers across different experiments (Figure [98]2B). As expected, peaks in MCF-7 exhibited low overlap with those called in the other cell types. Strikingly, binding peaks in MDA-MB-231 cells exhibited greater overlap with those in non-breast cell lines than with MCF-7 cells (Figure [99]2B). Moreover, FOXM1 binding peaks displayed different genomic distribution in different cell types (Figure [100]2C and see [101]Supplementary Figure 1). Compared to other cell types, a smaller fraction of FOXM1 binding peaks in MCF-7 cells were observed in promoter regions, but more binding events occurred in intergenic regions. In MDA-MB-231 and other non-breast cell lines, FOXM1 binding signals exhibited high enrichment in regions proximal to transcription start sites (TSS) (from −1 kb to 1 kb) (see [102]Supplementary Figure 2). Conversely, binding signals in MCF-7 cells were more enriched in regions distal to gene TSSs (from −3 kb to −2 kb and from 2 kb to 3kb). Figure [103]2D shows a strong FOXM1 binding peak associated with the gene CDC25B, a well-known FOXM1 target gene [[104]57], which was detected in all cell types but MCF-7. This phenomenon was also observed for many other cell cycle genes (data not shown), implying that FOXM1 may be less involved in cell cycle regulation in MCF-7 cells. In contrast, a FOXM1 binding peak associated with VMP1, a gene encoding a vacuole membrane protein [[105]58], was only detected in MCF-7 cells (Figure [106]2D). Taken together, these results suggest that FOXM1 binding in MCF-7 cells differs from that of the other cell types, most notably MDA-MB-231. Moreover, FOXM1 binding in MDA-MB-231 was more similar to binding in non-breast cell lines. One plausible explanation is that estrogen receptor alpha (ERα), which is present in MCF7 but not in MDA-MB-231 cells, modulates FOXM1 activity by recruiting FOXM1 to ERα binding sites [[107]59, [108]60] and causing differential FOXM1 binding between the two breast-cancer cell lines [[109]59]. Comparison of motifs enriched in FOXM1 binding sites To further investigate differences in FOXM1 binding events, we identified the TF binding motifs that are enriched in FOXM1 binding peaks in different cell lines. Motif enrichment analysis was conducted by scanning for 687 motifs from the TRANSFAC [[110]53] and JASPAR [[111]54] databases in peak regions (see details in Methods). To test whether a motif is enriched, we calculated enrichment scores for each FOXM1 binding peak across all the motifs enrolled in those two datasets. A log[2]-transformation was applied after performing the motif enrichment analysis so that a positive value indicates enrichment and a negative value indicates depletion. The results showed that experiments in MCF-7 cell lines (green bar) clustered together based on their enriched motifs compared with the other cell lines, whereas those from MDA-MB-231 (brown bar) cell lines clustered with other non-breast cells (Figure [112]3A). Figure 3. Comparison of enriched motifs of FOXM1 in different cells. Figure 3 [113]Open in a new tab (A) Motif enrichment analyses across all FOXM1 ChIP-seq experiments. The color bars in the left represent different cell lines. The value in heatmap is log 2 transferred enrichment score (ES). (B) Five specific examples show the different binding of FOXM1 in different ChIP-seq experiments. Barplot was performed in log2 transferred enrichment scores. * represents the significance of enrichment (FDR < 0.01). Mann Whitney Wilcoxon Test p-value was showed in the FOXA1 and ESR1 examples. In Figure [114]3A, we highlighted three interesting groups of enriched motifs. Group A contained the motifs enriched in all cell lines. Amongst these common motifs, NFYA (NFY CCAAT) has already been confirmed to be enriched in U2OS [[115]47] and HeLa [[116]61] cells; FOS and AP1 were previously shown to be associated with both ERα and FOXM1 binding [[117]61, [118]62]; and BACH1 and BACH2 are two FOXM1 related proteins [[119]63]. This group also contained some members of the E2F family of TFs involved in the cell cycle [[120]64]. Group B consisted of two groups, where group B1 was mainly enriched in MCF-7 cells while group B2 was specific to MCF-7 cells. Specifically, motifs in group B1 consisted of diverse forkhead family and GATA family motifs, including FOXA1, FOXA2 and GATA3 which have been shown to act as ERα pioneer factors [[121]65–[122]67]. On the other hand, group B2 contained motifs associated with many kinds of receptors including estrogen receptor, nuclear receptor, peroxisome proliferator-activated receptor and thyroid hormone receptor. In group B2, ESR1 and ESRRA are two specific motifs for ERα [[123]68, [124]69], and RORA (nuclear receptor related) and PPARG (peroxisome proliferator-activated receptor related) have been shown to associate with ERα [[125]70, [126]71]. Group C contained motifs that are specifically enriched in GM12878 cells, and motifs which are associated with signal transduction, activation of transcription, and interferon regulatory factors. Moreover, we compared log[2]-transformed ESs of five different motifs between the cluster groups including NFYA, AP-1, FOXA1, ESR1, and ISRE (Figure [127]3B). As shown, FOXA1 (Mann Whitney Wilcoxon Test P = 4e-03) and ESR1 (Mann Whitney Wilcoxon Test P = 3e-05) were significantly enriched in MCF-7 cell lines compared to others (Figure [128]3B). The same result was also observed for other ER-related motifs (see [129]Supplementary Figure 3). These findings suggest that ERα may interact with FOXM1 and mediate FOXM1 binding in MCF-7 cell line. Moreover, we conducted a preliminary exploration into other co-factors that may modulate FOXM1 binding activity. Due to higher enrichment in non-MCF-7 cells, we used NFH3 (see [130]Supplementary Figure 4), a FOXM1 motif included in the TF Encyclopedia dataset [[131]72] as the primary motif for SpaMo algorithm [[132]73]. Besides, we utilized HOCOMOCO V10 [[133]74], a human motif database as the secondary motif database as SpaMo input. Our results (see [134]Supplementary Table 1) suggest that the motif of STAT3, a regulator involved in signal transduction and activation of transcription [[135]75], was enriched in all cell lines except GM12878. Comparison of FOXM1 target genes Next, we explored whether the differential genome-wide binding sites of FOXM1 results in the regulation of different target genes across varied cell lines. To identify the target genes of FOXM1, we applied a probabilistic model, TIP [[136]76], to determine target genes for each ChIP-seq experiment (see [137]Supplementary Table 2). The numbers of identified target genes for each ChIP-seq experiment were shown in Table [138]1, with a range of 92 (in MCF-7 cell) to 274 (in ECC-1 cell). According to the target genes, pair-wised enrichment analyses were conducted to calculate the corresponding p values. Based on negative log 10 transferred p values, the cluster results showed that target genes from all MCF-7 related ChIP-seq experiments exhibit significant overlap with each other but little overlap with those from other cells (Figure [139]4A). Consistent with our binding peaks and motif analyses, target genes detected in MDA-MB-231 and other non-breast cell lines displayed highly degree of consistency. Figure 4. Comparison of target genes of FOXM1 in different cells. Figure 4 [140]Open in a new tab (A) Heatmap of the enrichment of the target genes of pair-wised ChIP-seq experiments. The color bars around the heat map represent different cells. (B) Heatmap of pathway enrichment results based on negative log 10 transferred p-value. The color bars in the left represent different cells. (C) Enrichment scores comparison of two pathways, the cell cycle and the ER nongenomic, in MCF-7 and other cell lines. Colored bars represent corresponding cells. Mann Whitney Wilcoxon Test p-value was showed. Furthermore, we performed comprehensive pathway analysis on target genes in each ChIP-seq experiment using the Molecular Signatures database (MSigDB) [[141]77] C2 pathway database which integrates KEGG [[142]78], REACTOME [[143]79], Pathway Interaction Database (PID) [[144]80] and BioCarta [[145]81] databases. Because FOXM1 is a critical cell cycle regulating TF [[146]82], we expected that the most significant enriched pathways would be related to the cell cycle. Indeed, our results showed that pathways related to the cell cycle were significantly enriched in FOXM1 target genes in MDA-MB-231 and other non-breast cell lines (Figure [147]4B). However, pathways enriched in FOXM1 target genes from MCF-7 cell lines exhibited lower enrichment in the cell cycle pathway (Figure [148]4C, Mann Whitney Wilcoxon Test P = 7e-04) but higher enrichment in an estrogen receptor related pathway, ER non-genomic pathway (Figure [149]4C, Mann Whitney Wilcoxon Test P = 0.005), when compared to other cell lines (Figure [150]4B). These results suggest that recruitment by ERα modifies the binding of FOXM1 in MCF-7 cell lines resulting in distinctive binding sites that differ from that of MDA-MB-231 and other non-breast cell lines. Moreover, the target genes of FOXM1 displayed a high degree of concordance between MDA-MD-231 and the other 6 non-breast cell lines even though they are derived from diverse tissues. Prognostic prediction based on FOXM1 activity in breast cancer Several studies have shown that the transcriptional activity of FOXM1 is more predictive than its mRNA expression in cancers [[151]43, [152]48–[153]50]. To further investigate whether the different binding of FOXM1 is able to disturb its activity, we applied a computational method that infers FOXM1 activity based on its target gene expression, and examined whether it is prognosticative. We performed our analysis in the METABRIC breast cancer dataset, which contains gene expression and clinical profiles for 1,992 breast cancer patients [[154]83]. We first merged target genes identified in each cell line into a core set of FOXM1 target genes and employed the BASE algorithm [[155]55] to calculate an individual Regulatory Activity Score (iRAS) for each tumor sample based on the expression of FOXM1 target genes in the tumor gene expression profile. A higher iRAS in a tumor sample indicates that target genes of FOXM1 tend to have higher expression level and therefore a higher transcriptional activity of FOXM1 in this sample. Since FOXM1 target genes have been identified in 8 cell lines, we inferred FOXM1 activities in breast cancer samples using target genes identified in each individual cell line, resulting in eight iRASs for a tumor sample. Here, we utilized the target genes from the two breast cancer cell lines, MCF-7 and MDA-MB-231, as an example to investigate whether the inferred TF activities are associated with cancer outcome. Specifically, we denote FOXM1 activity inferred based on target genes identified in MCF-7 cell line as iRAS[MCF-7]. Similarly, iRAS[MDA] indicates FOXM1 activity inferred based on its target genes in MDA-MB-231 cell line. We first compared the inferred regulatory activities of FOXM1 in ER-positive (ER+) versus ER-negative (ER−) breast tumor samples. When FOXM1 targets identified in MCF-7 were used for activity inference, the resulting iRAS[MCF-7] showed significant difference between ER+ and ER− samples (Figure [156]5A). As shown, ER+ samples exhibited significantly higher iRAS[MCF-7] than ER- samples (Mann Whitney Wilcoxon Test P = 4e-98). However, when we stratified patients into ER+ and ER- groups, iRAS[MCF-7] is not associated with prognosis any more (P > 0.1). This indicates that FOXM1 targets in MCF-7 cell lines reflect estrogen receptor activity in breast tumor samples, implying that ERα serves as the major factor that mediates FOXM1 genomic binding in ER+ breast cancer. Namely, in majority binding sites, ERα mediates FOXM1 through its DNA motifs, and thus regulating the transcription of ERα target genes. We further divided patients into two groups based on iRAS[MCF-7], patients with high iRAS[MCF-7] (iRAS[MCF-7] > 0) and those with low iRAS[MCF-7] (iRAS[MCF-7] < 0), and compared their prognosis. Patients with high iRAS[MCF-7] exhibited better prognosis compared to those with low iRAS[MCF-7] (Figure [157]5B, log-rank P = 6e-05). These observations further implied that iRAS[MCF-7] primarily reflects ERα activity in a tumor sample where higher ERα activity indicates greater sensitivity to hormone treatment, even though the major function of FOXM1 is to regulate cell cycle division. Figure 5. Associations between different FOXM1 activities and primary breast cancer sample prognosis. Figure 5 [158]Open in a new tab (A) Boxplot for iRAS[MCF-7] of ER+ and ER- patients. (B) Survival curve for ER+ and ER- patients with high or low iRAS[MCF]. (C) Boxplot for iRAS[MDA] of ER+ and ER- patients. (D) Survival curve for ER+ and ER- patients with high or low iRAS[MDA]. (E) Distribution of patients based on both iRAS[MCF-7] and iRAS[MDA]. Red dots: patients with both positive iRAS[MCF-7] and iRAS[MDA]. Blue dots: patients with both negative iRAS[MCF-7] and iRAS[MDA]. Green dots: patients with positive iRAS[MCF-7] and negative iRAS[MDA]. Pink dots: patients with negative iRAS[MCF-7] and positive iRAS[MDA]. The percentages at four corners are the fractions of patients with ER+ in the corresponding group. The number 1~4 mapped to the survival curves. (F) Survival curves for the four patient groups. We then examined the regulatory activity of FOXM1 inferred based on its targets in MDA-MB-231, an ER-negative breast cancer cell line, and generated the corresponding iRAS[MDA]. We observed significantly lower iRAS[MDA] in ER+ than in ER- tumor samples (Figure [159]5C, Mann Whitney Wilcoxon Test P = 1e-65), which is opposite to what we observed for iRAS[MCF-7]. Furthermore, patients with low iRAS[MDA] had better prognosis compared to those with high iRAS[MDA] (Figure [160]5D, log-rank P = 7e-17). These results suggested that iRAS[MDA] reflects the activity of FOXM1 in regulating cell proliferation. Lower iRAS[MDA] indicates slower proliferation of tumor cells in a patient's sample, and therefore informs better prognosis. The relevance of iRAS[MDA] with cell proliferation was further confirmed by its high positive correlation with the E2F4 score (Pearson Coefficient R = 0.93, see [161]Supplementary Figure 5), which is calculated based on an E2F4 gene signature that serves as an accurate indicator of cell proliferation in breast tumor cells [[162]84]. In contrast, iRAS[MCF-7] inferred based on FOXM1 targets in MCF-7 cell lines showed no significant correlation with iRAS[E2F4] (Pearson Coefficient R = 0.017, see [163]Supplementary Figure 5). However, the calculated iRAS[MCF-7] had high positive correlation (Pearson Coefficient R = 0.45, see [164]Supplementary Figure 5) with ESR1 gene expression while iRAS[MDA] was negatively correlated (Pearson Coefficient R = −0.34, [165]Supplementary Figure 5) with ESR1 gene expression. Moreover, we also inferred the FOXM1 activities in breast tumor samples based on its target genes identified in the other six non-breast cell lines. As expected the resulting iRAS were similar to iRAS[MDA] but different from iRAS[MCF-7] in terms of prognostic prediction ([166]Supplementary Figures 5 and [167]6). Our observations indicate that FOXM1 activity inferred based on its targets in MCF-7 cell line serves as a proxy for ERα activity, while iRAS[MDA] and other iRAS calculated using the target genes in non-breast cells serve as proxies for proliferation. Since both ER status and proliferation are critical factors for determining patient prognosis in breast cancer, we postulated that prognosis could be better predicated based on a combination of iRAS[MCF-7] and iRAS[MDA.] Specifically, we divided the patients into four categories based on both iRAS[MCF-7] and iRAS[MDA] (Figure [168]5E). For example, patients (green) with high iRAS[MCF-7] and low iRAS[MDA] were classified as group 1 which encompassed 33.4% of patients in the METABRIC dataset, 97.5% of which were ER+ patients. In contrast, patients (pink) in group 4 contained 23.1% of METABRIC patients with only 35.1% of these patients being ER+. This observation further suggests that the calculated iRAS[MCF-7] reflects the activity of estrogen receptor, which is consistent with Figure [169]5A where ER+ patients exhibited higher iRAS[MCF-7] than ER- patients. More interestingly, when we compared group 1 with group 4, we found that ER+ patients are more likely to maintain lower cell proliferation. We further found that patients in group 1 have the best outcome compared to the other 3 groups (Figure [170]5F, log-rank P = 3e-16), as these patients are associated with higher ERα activity and lower proliferation. Moreover, patients in group 1 and 2 had better prognosis compared to those in group 3 and 4 which implies that proliferation is more prognosticative than ER activity in breast cancer. These observations suggest that the combination of the inferred activities of both ERα and proliferation can provide more precise prognostic predictions in breast cancer. DISCUSSION TFs play crucial roles in regulating gene expression by binding to many cis-regulatory elements that decide cell fate. ChIP-seq technology has been widely utilized to investigate the binding sites of several TFs in a panoply of biological contexts. Intuitively, we would expect to observe more similar genomic binding profiles of a TF in cell lines from the same or more closely related tissues. However, our analyses on FOXM1, a well-known cell cycle regulator [[171]85], provide an example showing that the overall genetic similarity of cell lines does not always hold precedent over context-specific TF co-factors in determining TF binding profiles. In this study, we focus on the binding events of FOXM1 using 23 ChIP-seq experiments from 8 human cell lines that encompass 7 different tissues (Table [172]1). We comprehensively examine the binding patterns of FOXM1 across 4 information layers including raw binding signals, binding peaks, target genes, and regulator activity (Figure [173]1). The PCA analysis of raw binding signals showed that ChIP-seq experiments in MDA-MB-231 clustered with HEK293, HeLa and U2OS cell lines signals while those from MCF-7 cell lines group together independently. This is surprising due to the fact that MCF-7 and MDA-MB-231 are both breast cancer cell lines that are dissimilar in their FOXM1 binding profiles. In addition, binding profiles in MDA-MB-231 are more similar to those from other non-breast tissues. Peaks called in MCF-7 cell lines exhibited high overlap with each other even though the total number of peaks called differed between experiments. Contrastingly, peaks identified in another breast cancer cell line, MDA-MB-231, showed a high degree of overlap with other non-breast cell lines. Furthermore, utilizing the called binding peaks of FOXM1, we compared motifs enriched in different ChIP-seq experiments and demonstrated that the enriched motifs are highly overlapped between MCF-7 ChIP-seq experiments while those in MDA-MB-231 were extremely similar with that in the other non-breast cell lines. Notably, peaks called in MCF-7 cell lines were significantly enriched in ERα related motifs. Similar results are observed in the comparison of target genes as well. Specifically, genes identified in MCF-7 cell lines were highly redundant, whereas genes identified in MDA-MB-231 cell lines showed greater overlap with those found in non-breast cell lines. Additionally, the cell cycle pathway was highly enriched in FOXM1 target genes identified in MDA-MB-231 and non-breast cell lines, but not in MCF-7. However, the ER non-genomic pathway was significantly enriched in target genes identified in MCF-7 cell lines. Interestingly, ECC-1, an endometrial cell line that is also ER positive cells, shows similar pattern with other non-MCF-7 cell lines in terms of FOXM1 binding, suggesting that the effect of co-factor ERα on FOXM1 binding might be breast tissue-specific. Based on these results, we suspected that FOXM1 and ERα might exhibit a physical functional interaction in MCF-7 cell lines which is consistent with reports from previous studies [[174]59, [175]86, [176]87]. We hypothesized that this interaction alters the binding properties of FOXM1 in MCF-7 cell line and modulates FOXM1-driven transcriptional output. Sanders et al. [[177]59] indicated that the recruitment effects of ERα on FOXM1 binding in ER positive breast cancer cell lines differs in ER negative cell lines. In our study, we demonstrate that the influence of ERα not only causes differences between the two breast cancer cell lines, but also induces differences between ER positive breast cancer cell lines and non-breast cancer cell lines. Moreover, the binding sites of FOXM1 in MDA-MB-231, an ER negative breast cancer cell line, and those in non-breast cell lines showed a high degree of overlap even though they are classified as different tissues. These results suggest that the binding of FOXM1 can be modulated by a co-factor such as ERα, and this interaction lead to significant changes in activated pathways that hold precedence over tissue-specificity. Lastly, we further tested whether the varied binding affects the overall regulatory activity of FOXM1 in a cell type-specific fashion. In support of our earlier results, we found that the activity of FOXM1 is mainly captured by ERα binding events in MCF-7 cell line which reflects ERα activity. In contrast, the FOXM1 activity inferred using target genes in MDA-MB-231 or non-breast cell lines, is more indicative of cell proliferation. Both inferred activities are significantly predictive of breast cancer prognosis. Nevertheless, the results of multivariate Cox regression model suggested that the activity of FOXM1 in MDA-MB-231 cell line, proliferation activity, still is an independent biomarker in breast cancer sample prognosis (see [178]Supplementary Table 3). While FOXM1 activity in MCF-7 cells is not associated with prognosis after considering clinical variables which is consistent with the result of ER status. Moreover, combining information about the activity of both estrogen receptor and cell proliferation activities results in a gene signature that predicts patient prognosis with high accuracy. In light of our findings, we were surprised by how the influence of a single co-factor on the binding patterns of a certain TF could induce a completely different binding profile. The co-factor discovery analysis (see [179]Supplementary Table 1) suggests that STAT3 might be a potential co-factor of FOXM1, in line with previous report that the activation of FOXM1 is dependent on STAT3 activity [[180]88]. Because of the diversity and uncertainty of FOXM1 motifs (see [181]Supplementary Figure 4), we considered this analysis as a preliminary exploration to provide further directions. In view of our analyses, we inferred the binding patterns of FOXM1 in different cell lines based on its physical interactions with different co-factors. In MCF-7 cell lines FOXM1 binding might be mediated by ERα, supporting the following two possible patterns: (1) FOXM1 interacts with ERα and both bind their own motifs, eliciting a cooperative binding pattern, and (2) FOXM1 interacts with ERα but only ERα binds its DNA motif (Figure [182]6A). Our motif enrichment results (Figure [183]3A) indicate that FOXM1 binding sites in MCF-7 cells are enriched not only in ER-related motifs but also in cell cycle related motifs. This observation suggests that the cooperative binding (Figure [184]6A, pattern 1) is the more likely to be the binding pattern of FOXM1 in MCF-7, which is in agreement with the previous study by Sanders et al. [[185]59]. In contrast, in MDA-MB-231 and the non-breast cell lines FOXM1 may have four possible binding patterns (Figure [186]6B). (1) FOXM1 interacts with another TF x, each binding with its own motif; (2) FOXM1 binds with its motif without interacting with a co-factor. (3) FOXM1 interacts with another TF x, but DNA binding is mediated by FOXM1 motif. (4) FOXM1 interacts with another TF x that binds with its motif. These four binding patterns may co-exist in MCF-7 and non-breast cancer cell lines. When ChIP-seq data for FOXM1 are generated in more cell lines, we may expect to observe in some cell lines that FOXM1 binding events are mainly mediated by the interaction of another TF x with its DNA motif as observed in MCF-7 cells. Figure 6. Co-binding patterns for FOXM1 in different cells. Figure 6 [187]Open in a new tab (A) Two possible co-binding pattern of ERa and FOXM1 in MCF-7 cell lines. (B) Four possible co-binding patterns in MDA-MB-231 and other non-breast cell lines. Although the FOXM1 ChIP-seq data were collected from 6 different studies with different FOXM1 antibodies and sequence depths, we achieved consistent results according to the comparisons based on raw signals, binding peaks, motifs and target genes. Our systematic analyses revealed that binding of FOXM1 in MCF-7 cell line differs with MDA-MB-231 and other non-breast cell lines. Moreover, the binding of FOXM1 exhibited highly similarities in cell lines from MDA-MB-231 and six other different tissues. Our results provide new insights into TF binding and suggest that some TFs might not be as affected by tissue type as previously thought. However, there exists some limitations to our methods. First, noise caused by the hybridization process in ChIP-seq experiments[[188]3] may influence the accuracy of our results. Second, the computational nature of our analysis is unable to directly reveal a direct PPI between ERα and FOXM1. However, a previous study by Sander et al. [[189]59] revealed that ERα co-immunoprecipitates with FOXM1 demonstrating that they are able to interact in vitro. In summary, our analysis robustly demonstrated altered FOXM1 binding as a result of ERα activity and provided new insights concerning the binding modes of FOXM1 in different cell types. MATERIALS AND METHODS Collection of FOXM1 ChIP-seq experiments We retrieved 23 FOXM1 ChIP-seq experiments from the NCBI GEO database [[190]51] (Table [191]1). We downloaded two FOXM1 ChIP-seq experiments performed in ECC-1 cells (endometrium cancer cell line), two ChIP-seq experiments performed in GM12878 cells (blood cancer) and two ChIP-seq experiments performed in SK-N-SH cells (neuroblastoma cancer) from ENCODE using Series accession number [192]GSE32465 [[193]46]. Two FOXM1 ChIP-seq experiments in HEK293 cells (kidney cancer cell line) were acquired using Series accession number [194]GSE60032 [[195]44]. Two FOXM1 ChIP-seq experiments performed in HeLa cells were downloaded under Series accession number [196]GSE52098 [[197]45]. Two FOXM1 ChIP-seq experiments performed in U2OS cells (osteosarcoma cancer cell line) were downloaded under Series accession number [198]GSE38170 [[199]47]. Two FOXM1 ChIP-seq experiments performed in MDA-MB-231 cells (breast cancer cell line) were downloaded with Series accession number [200]GSE40762 [[201]59]. Nine FOXM1 ChIP-seq experiments performed in MCF-7 cells (breast cancer cell line) were downloaded under Series accession numbers [202]GSE32465 [[203]46], [204]GSE72977 [[205]43], and [206]GSE40762 [[207]59]. The antibodies used in [208]GSE60032 and [209]GSE72977 are GTX-102170 and SC-501, respectively. In addition, FOXM1 antibody SC-502 is used in the other experiments. Breast cancer database In this study, we downloaded the METABRIC breast cancer dataset [[210]83] to calculate iRAS and perform survival analysis with the accession number EGAS00000000083. This dataset contains gene expression profiles and clinicopathological data for 1,992 breast cancer samples including time-to-event information and ER status. ChIP-seq reads alignment After collecting the FOXM1 ChIP-seq experiments, we first used NCBI Sequence Read Archive [[211]89] to align the reads in each experiment. Fastq-dump was applied to convert the raw ChIP-seq experiment profiles to fastq files. We used Bowtie [[212]90] to map reads to the UCSC hg19 human sequence and generated SAM files for each ChIP-seq experiment. We then sorted these SAM files using SAM tools [[213]91] and then generated bedGraph files using BEDtools [[214]92]. The resulting reads for each ChIP-seq experiment were listed in Table [215]1. The bedGraph files were further used to perform PCA analysis [[216]52] and display specific peaks with Integrative Genomics Viewer [[217]93]. ChIP-seq peak calling and Genomic distribution of binding peaks Using the generated SAM files, significant peaks were called by MACS2 [[218]56] setting q-value cutoff as 0.01. The number of called peaks for each experiment were shown in Table [219]1. The peaks overlap coefficients were used to cluster ChIP-seq experiments. The Cis-regulatory Element Annotation System [[220]94] was applied to functionally annotate binding peaks across the genome. Motif enrichment analysis In our study, we utilized two high-quality transcription factor binding profile databases, TRANSFAC [[221]53] and JASPAR [[222]54], to perform the enrichment analysis. We used FIMO [[223]95] to scan peak regions for occurrences of all the 687 motifs contained in the two databases. Second, we refined the called peaks using the summit files generated by MACS [[224]56]. Specifically, for a given peak, we set up a region surrounding its summit (from −250 to 250 of summit). Then, the overlapped district of the peak and this region was considered as the refined peak. To perform the enrichment analysis, we set up two control regions (the same length of the refined peak) for each refined peak. One is upstream of the refined peak and ends at the start position of the refined peak. The other is downstream of the refined peak and starts at the ending position of the refined peak. Third, enrichment score (ES) was calculated and corresponding p value was calculated using Chi-squared test. The function heatmap.2 from the R package “gplots” was used to show the motif enrichment analysis results after log 2 transformation of the ES. Identification of target genes We used the TIP algorithm [[225]76] to identify the target genes of FOXM1 for different ChIP-seq experiments (bedGraph files) by assigning each gene a probability of being bound by FOXM1 based on the averaged background binding signal. An FDR threshold of 0.01 was used to identify putative target genes of FOXM1 for each ChIP-seq experiment. The number of target genes identified in each experiment are shown in Table [226]1 and the specific genes are listed in an additional table (see [227]Supplementary Table 1). Moreover, the average signal of FOXM1 binding in the DNA region surrounding TSS profiles were generated as the output of the TIP algorithm. These profiles were applied to show the FOXM1 binding distribution around gene TSSs (from −3 kb to 3 kb of TSS). Pathway enrichment analysis In our study, we used the MSigDB [[228]77] C2 pathway database to perform pathway enrichment analysis. The C2 database integrates 4 independent pathway datasets including REACTOME, PID, BIOCARTA, and KEGG [[229]78–[230]81]. We excluded pathways containing less than 40 target genes and calculated the ES and p-value for all FOXM1 target gene sets using a two-sided hypergeometric test. Clustering analysis of pathway enrichment was performed by log10-transforming the enrichment p-values and assigning a positive value if the corresponding ES was < 1 and a negative value if the ES was > 1. All p-value calculations were performed using the R “Hypergeometric” package and clustering analysis was implemented using the “gplots” package in R. Calculation of FOXM1 activity According to the target gene sets identified using our previously TIP algorithm [[231]76], we first merged target genes identified in each cell line into a core set of FOXM1 binding affinity data. Specifically, g[j] = {g[1,j], g[2,j], …g[i,j], …, g[n,j]}, where g[j] is the target gene set for cell line j, g[i,j] is the ith gene in whole human genome, n is the total number of whole human genome genes. g[i,j] is 1 suggests that the ith gene is FOXM1 target gene in cell line j, whereas, the ith gene is non-target gene. Integrating breast cancer samples gene expression profiles, we employed BASE [[232]55], a rank-based algorithm, to infer an iRAS for each breast tumor. Briefly, for a given sample S = {s[1], s[2], …, s[i], …, s[n]}, where s[i] is expression level of the ith gene. Basing on the FOXM1 binding affinity data, BASE calculated a cumulative distribution function for both target genes and non-target genes, denoting as T(i) and Non-T(i). The maximum deviation between these two functions infers the similarity score between the binding affinity data and sample gene expression. After 1000 times random permutation, we normalized those scores and got the iRAS which infers the activity of FOXM1 in a specific sample. Survival analysis The Kaplan-Meier method was applied to compare survival prognosis of patients in different groups and the log-rank test was used to calculate the p-value. Multivariate cox regression model was used to examine the independent prognostic power of iRAS considering important clinical variables including age, tumor stage, tumor grade, ER status, PR status and HER2 status. The R package “survival” was used to implement survival analysis. SUPPLEMENTARY MATERIALS FIGURES AND TABLES [233]oncotarget-08-76498-s001.pdf^ (6.3MB, pdf) [234]oncotarget-08-76498-s002.xlsx^ (18KB, xlsx) [235]oncotarget-08-76498-s003.xlsx^ (51.8KB, xlsx) ACKNOWLEDGMENTS AND FUNDING