Abstract

   Hypoxia inducible factor 1 alpha (HIF1A) is a transcription factor (TF)
   that forms highly structural and functional protein–protein
   interactions with other TFs to promote gene expression in hypoxic
   cancer cells. However, despite the importance of these TF-TF
   interactions, we still lack a comprehensive view of many of the TF
   cofactors involved and how they cooperate. In this study, we
   systematically studied HIF1A cofactors in eight cancer cell lines using
   the computational motif mining tool, SIOMICS, and discovered 201
   potential HIF1A cofactors, which included 21 of the 29 known HIF1A
   cofactors in public databases. These 201 cofactors were statistically
   and biologically significant, with 19 of the top 37 cofactors in our
   study directly validated in the literature. The remaining 18 were novel
   cofactors. These discovered cofactors can be essential to HIF1A’s
   regulatory functions and may lead to the discovery of new therapeutic
   targets in cancer treatment.

   Subject terms: Cancer genomics, Cancer, Computational biology and
   bioinformatics, Data integration, Gene regulatory networks

Introduction

   Most human cells typically require normal levels of oxygen to carry out
   their metabolic functions. When these cells are in a low oxygen/hypoxic
   environment, they will consequently activate a variety of adaptive
   responses to maintain oxygen homeostasis^[30]1. The effects of hypoxia
   are especially evident in cancer cells, as their ability to rapidly
   replicate and proliferate forces them to quickly exceed their oxygen
   supply^[31]2. To survive, cancer cells, mainly solid tumors, have
   developed various defense mechanisms to adapt to hypoxic
   environments^[32]3. Such defense mechanisms primarily rely on the
   activation of hypoxia inducible factor 1 alpha (HIF1A), a transcription
   factor (TF) that interacts with other cofactor TFs to activate
   transcriptional responses like increased glucose uptake and
   angiogenesis. Since the responses HIF1A regulates facilitate a cancer
   cell’s survival in hypoxic conditions^[33]4, many resources have been
   diverted in HIF1A targeted treatments^[34]5,[35]6.

   Though many attempts to target HIF1A have been proposed, one of the
   most appealing prospects has been to focus on targeting HIF1A through
   its protein–protein interactions (PPIs). This method has demonstrated
   high diagnostic potential^[36]7. Like many TFs, HIF1A interacts with a
   multitude of cofactor TFs to effectively control gene expression^[37]8.
   For instance, HIF1A is known to dimerize with its binding partner aryl
   hydrocarbon receptor nuclear translocator (ARNT) to cooperatively
   co-regulate target genes. PPIs between different TFs such as HIF1A and
   ARNT are extremely important due to their ability to confer high
   specificity and drive differential gene expression^[38]9,[39]10.
   Despite their importance, we found that in many databases containing
   experimentally curated PPIs—BioGRID^[40]11, HPRD^[41]12, and
   BIND^[42]13—very few of the HIF1A cofactors listed were TFs. The
   undiscovered TF cofactors could have novel therapeutic responses and
   contribute to a much deeper understanding of hypoxia’s regulatory
   mechanisms.

   In this study, we systematically investigated the potential TF
   interactions of HIF1A. Measuring possible PPIs in an experimental
   setup, usually done using co-immunoprecipitation arrays, is
   time-consuming and costly. Instead, using computational methods to find
   such cofactors possesses many benefits due to their high recall rate
   for predicting cofactors in a time-efficient manner^[43]14–[44]16. We
   computationally identified 201 potential HIF1A TF cofactors, many of
   which were supported by PPI databases and literature. These cofactors
   were conserved across multiple cell lines and crucial to regulating
   HIF1A regulated pathways. Our study facilitates an increased
   understanding of HIF1A’s regulatory functions with transcriptional
   cooperativity.

Material and methods

Retrieving genomic sequences that HIF1A binds

   To systematically study HIF1A cofactors in cancer cell lines, we
   retrieved raw ChIP-seq data in the following eight human cell lines:
   501A^[45]17, MDA-MB-231^[46]18, HepG2^[47]19, HKC8^[48]19,
   HUVEC^[49]20, K562^[50]21, LNCaP^[51]22, and PC3^[52]23, from the
   corresponding BioProjects at National Center for Biotechnology
   Information (NCBI) (Supplementary Tables [53]S1A & [54]S1B). The
   primary tissue-derived cell lines we have chosen represent the most
   common cancer types in each organ or system including blood, prostate,
   breast, liver, skin, and kidney. Next, we applied the tool
   Trimmomatic^[55]24 to the raw ChIP-seq read data to trim adaptor
   sequences and filter low-quality reads. We then mapped the processed
   reads in each cell line onto the human genome hg38 using Bowtie2^[56]25
   and defined the ChIP-seq peaks of the HIF1A binding regions using
   MACS2^[57]26. To discover TF binding sites (TFBSs) in these ChIP-seq
   experiments, we extended each MACS2 peak region so that each peak
   region was at least 800 base pairs long, which is about the median
   length of a cis-regulatory region^[58]27,[59]28. Finally, we extracted
   the repeat-masked genomic sequences from the UCSC genome browser^[60]29
   for each peak region. These sequences are likely to contain TFBSs of
   HIF1A and its cofactors.

Identifying potential HIF1A TF cofactor motifs and cofactors

   To find the potential TF cofactors of HIF1A in a cell line, we applied
   the tool SIOMICS to the above sequences from the extended ChIP-seq peak
   regions in this cell line to identify motifs (Fig. [61]1)^[62]16. A
   motif is a pattern of the TFBSs a TF binds, often represented by a
   position weight matrix. Many computational tools are developed for
   motif discovery in ChIP-seq data^[63]16,[64]30–[65]35. We chose SIOMICS
   here because it de novo identifies motifs by discovering motif modules,
   which can effectively work with large sequence datasets and
   significantly reduces false positive predictions^[66]15,[67]16,[68]36.
   A motif module is a group of motifs that significantly co-occurs in the
   input sequences, mimicking the group of motifs for a TF and its
   cofactors under a given experimental condition. In biological terms,
   these discovered motif modules are the binding patterns of TFs and
   their TF cofactors that frequently bind to the HIF1A ChIP-seq peak
   regions, which thus represent HIF1A and its cofactor motifs.

Figure 1.

   [69]Figure 1
   [70]Open in a new tab

   The pipeline to study hypoxia motifs and cofactor pairs.

   With the identified motifs in motif modules, we compared the predicted
   motifs with the known motifs in the JASPAR database^[71]37. We claimed
   that a predicted motif was similar to a known motif if their STAMP
   comparison E-value was smaller than 1.0E−5^[72]38. A predicted motif
   similar to a known motif suggests that the TF corresponding to this
   known motif may play a regulatory role in the data. As a side product,
   we considered this corresponding TF the HIF1A TF cofactor that binds to
   this predicted motif. In this way, we obtained 201 HIF1A cofactors.

Validating the predicted motifs and cofactors

   To validate the predicted motifs and cofactors, in addition to
   comparing the predicted motifs with known motifs, we compared the
   predicted motifs across the eight cell lines (Supplementary Table
   [73]S2). Two predicted motifs were similar if they had a STAMP E-value
   smaller than 1.0E−8. This more stringent E-value cutoff was used as
   previously^[74]39 because these motifs were predicted by the same tool
   and might be intrinsically more similar. A more stringent cutoff than
   the comparison of predicted with known motifs may control the false
   positives better.

   Similarly, we compared the predicted motif pairs across cell lines.
   Recall that SIOMICS outputs motif modules, and a motif module is a
   group of motifs that co-occur in a significant number of input
   sequences. We considered all motif pairs in every motif module as the
   motif pairs in each cell line. Two predicted motif pairs from two cell
   lines were similar if the corresponding predicted motifs in the two
   pairs were similar (STAMP E-value < 1.0E−8).

   In our study, motif pairs are significant since if the TFs are indeed
   cofactors of HIF1A, they should be interacting with one another. This
   combinatorial interaction of TFs within the TF complex is crucial to
   gene expression. Thus, with the predicted motif pairs, we investigated
   whether their corresponding cofactor pairs were enriched with known
   interacting TF pairs in BioGRID V4.4.209^[75]11. BioGRID is a database
   that stores curated PPIs, including TF-TF interactions. We gathered
   these TF-TF interactions and split them into two categories—direct and
   indirect. A direct interaction meant that the two TFs physically
   interacted, and an indirect interaction indicated that the TFs
   interacted through a third protein. Following this procedure, we
   gathered 6904 direct and 101,455 indirect TF-TF interactions. The
   predicted TF pairs that correspond to the predicted motif pairs in
   motif modules were then compared with these curated TF-TF interactions
   in BioGRID in enrichment analysis. Since each motif could correspond to
   multiple TFs, all of which had STAMP E-value smaller than 1.0E−5, we
   obtained the corresponding TF pairs from the predicted motif pair in
   two ways. One was to consider only the TF with its motif most similar
   to a predicted motif as the TF cofactor behind this predicted motif.
   The other was to consider up to the top five TFs with their motifs
   similar to a predicted motif as the TF cofactors of this predicted
   motif, because the default STAMP outputs only up to the top five TFs.
   For each of the obtained two sets of predicted cofactor pairs, the
   enrichment of known TF pairs in the set of predicted cofactor pairs in
   every cell line was carried out by the hypergeometric testing described
   in the next section.

   Finally, we studied the predicted HIF1A cofactors. We compared the
   HIF1A direct cofactors with the known HIF1A-interacting-TFs in
   BioGRID^[76]11, HPRD^[77]12, and BIND^[78]13, obtained from the NCBI
   page of the HIF1A gene ([79]https://www.ncbi.nlm.nih.gov/gene/3091).
   Using PubMed, we also searched for additional HIF1A cofactors that were
   not found in these databases by looking at if a predicted cofactor and
   HIF1A interact with each other, creating a high likelihood of a
   regulatory cascade. We used various experiments in the literature to
   further analyze and verify the predicted HIF1A cofactors.

Testing the enrichment of known interacting TF pairs in the predicted ones

   We tested the enrichment of known interacting TF pairs in the predicted
   ones by hypergeometric testing. Assume there were N TFs and M
   interacting TF pairs in the database. Assume there was n of the N TFs
   in a given group of predicted cofactor pairs in a cell line, and m
   cofactor pairs were known to interact. The p value of the enrichment of
   the known interacting TF pairs was then calculated as
   [MATH:
   <mrow><mi>p</mi><mi>h</mi><mi>y</mi><mi>p</mi><mi>e</mi><mi>r</mi><mfen
   ced close=")"
   open="("><mrow><mi>m</mi><mo>,</mo><mfrac><mrow><mi>n</mi><mfenced
   close=")"
   open="("><mrow><mi>n</mi><mo>-</mo><mn>1</mn></mrow></mfenced></mrow><m
   n>2</mn></mfrac><mo>,</mo><mi>M</mi><mo>,</mo><mfrac><mrow><mi>N</mi><m
   fenced close=")"
   open="("><mrow><mi>N</mi><mo>-</mo><mn>1</mn></mrow></mfenced></mrow><m
   n>2</mn></mfrac></mrow></mfenced></mrow> :MATH]
   , where
   [MATH:
   <mrow><mi>p</mi><mi>h</mi><mi>y</mi><mi>p</mi><mi>e</mi><mi>r</mi><mfen
   ced close=")"
   open="("><mrow><msub><mi>x</mi><mn>1</mn></msub><mo>,</mo><msub><mi>y</
   mi><mn>1</mn></msub><mo>,</mo><msub><mi>x</mi><mn>2</mn></msub><mo>,</m
   o><msub><mi>y</mi><mn>2</mn></msub></mrow></mfenced><mo>=</mo><munderov
   er><mo
   movablelimits="false">∑</mo><mrow><mrow><mi>k</mi><mo>=</mo><msub><mi>x
   </mi><mn>1</mn></msub></mrow></mrow><mrow><mtext>min</mtext><mfenced
   close=")"
   open="("><mrow><msub><mi>y</mi><mn>1</mn></msub><mo>,</mo><msub><mi>x</
   mi><mn>2</mn></msub></mrow></mfenced></mrow></munderover><mfrac><mrow><
   msub><mi>y</mi><mn>1</mn></msub><mo>!</mo><mfenced close=")"
   open="("><mrow><msub><mi>y</mi><mn>2</mn></msub><mo>-</mo><msub><mi>y</
   mi><mn>1</mn></msub></mrow></mfenced><mo>!</mo><msub><mi>x</mi><mn>2</m
   n></msub><mo>!</mo><mfenced close=")"
   open="("><mrow><msub><mi>y</mi><mn>2</mn></msub><mo>-</mo><msub><mi>x</
   mi><mn>2</mn></msub></mrow></mfenced><mo>!</mo></mrow><mrow><msub><mi>y
   </mi><mn>2</mn></msub><mo>!</mo><mi>k</mi><mo>!</mo><mfenced close=")"
   open="("><mrow><msub><mi>x</mi><mn>2</mn></msub><mo>-</mo><mi>k</mi></m
   row></mfenced><mo>!</mo><mfenced close=")"
   open="("><mrow><msub><mi>y</mi><mn>1</mn></msub><mo>-</mo><mi>k</mi></m
   row></mfenced><mo>!</mo><mfenced close=")"
   open="("><mrow><msub><mi>y</mi><mn>2</mn></msub><mo>-</mo><msub><mi>x</
   mi><mn>2</mn></msub><mo>-</mo><msub><mi>y</mi><mn>1</mn></msub><mo>+</m
   o><mi>k</mi></mrow></mfenced><mo>!</mo></mrow></mfrac></mrow> :MATH]
   for any non-negative integers x[1], y[1], x[2] and y[2]^[80]40,[81]41.

Gene Ontology (GO) and pathway analysis

   To study the regulatory mechanisms of the identified cofactors and the
   impacts of their cooperativity with HIF1A, we analyzed the binding
   specificity of the cofactors and their common target genes. We
   identified the target genes using the annotatePeaks tool from
   HOMER^[82]42 with the GENCODE^[83]43 annotation files and assigned
   every ChIP-seq peak to a gene according to the peak’s distance to the
   nearest transcription start site (TSS). Using the GO and Reactome
   pathway analysis offered by STRING^[84]44, the identified target genes
   and TFs were then researched regarding their roles in hypoxia-related
   pathways. Moreover, the tool ChIPseeker^[85]45 was also used to analyze
   the data, which allowed us to visualize the distance of the ChIP-seq
   peaks to the nearest gene and the pathway enrichment analysis of the
   ChIP-seq peaks.

Results

The predicted motifs and TF cofactors are biologically meaningful

   Using the process shown in Fig. [86]1, we could identify motifs and TF
   cofactors in every cell line (Supplementary Table [87]S1A and [88]S1B).
   To see whether the predicted motifs and TF cofactors are biologically
   meaningful, we compared the predicted motifs with known motifs in the
   JASPAR database^[89]37. In every cell line, on average, approximately
   65.6% (Supplementary Table [90]S2) of the predicted motifs were similar
   to known motifs (STAMP E-value < 1.0E−5) (Table [91]1). Not every
   predicted motif has a similar known JASPAR motif because JASPAR does
   not have known motifs for many TFs, and the current tools to compare
   motif similarity are imperfect^[92]36,[93]46. When we lowered the
   cutoff to 1.0E−4, more than 83.8% of the identified motifs were similar
   to a known motif. The high percentages of the predicted motifs similar
   to the known motifs indicate that these predicted motifs are
   biologically sound.

Table 1.

   The predicted motifs in eight cell lines.
   Cell lines #peaks #motifs %motifs similar to known motifs (STAMP
   E-value ≤ 1.0E−5) %motifs similar to known motifs (STAMP
   E-value ≤ 1.0E−4) %motifs in other cell lines %Peaks overlapped across
   cell lines (%)
   501A 13,061 100 60/100 = 60.0% 81/100 = 81.0% 90/100 = 90.0% 66.2
   MDA-MB-231 1500 32 25/32 = 78.1% 29/32 = 90.6% 32/32 = 100% 94.1
   HepG2 12,600 100 68/100 = 68.0% 88/100 = 88.0% 97/100 = 97.0% 60.9
   HKC8 391 100 58/100 = 58.0% 77/100 = 77.0% 71/100 = 71.0% 97.2
   HUVEC 661 27 13/27 = 48.1% 20/27 = 74.1% 25/27 = 92.3% 59.3
   K562 4156 100 71/100 = 71.0% 88/100 = 88.0% 90/100 = 90.0% 58.7
   LNCaP 1563 98 58/98 = 59.2% 77/98 = 78.6% 75/98 = 76.5% 69.1
   PC3 39,025 100 82/100 = 82.0% 93/100 = 93.0% 94/100 = 94.0% 66.2
   [94]Open in a new tab

   We also compared the predicted motifs across cell lines. The rationale
   for this comparison is that if a motif is a bona fide one, it is likely
   to occur in other cell lines, as different cell lines may share certain
   regulatory pathways. We found that, on average, approximately 88.9% of
   the predicted motifs in each cell line were independently identified
   across cell lines (STAMP E-value < 1.0E−8). This high percentage of
   motif conservation across cell lines corroborates the biological
   significance of the predicted motifs and cofactors.

   Note that such a high percentage of motif conservation across cell
   lines was not due to sharing ChIP-seq peaks by cell lines. We observed
   that the overlap percentage of ChIP-seq peaks did not correlate with
   the percentage of similar motifs between cell lines. For instance,
   although more than 87.7% of peaks in HKC8 overlapped with those in
   HepG2, only 25.9% of the motifs were similar between the two cell
   lines. Moreover, we removed the overlapping peaks in two cell lines and
   still observed a large percentage of shared motifs. For instance, the
   two cell lines, 501A and HepG2, which had a similar number of ChIP-seq
   peaks, shared 40.1% of their HIF1A ChIP-seq peaks and 83.0% of the
   predicted motifs. After removing all overlapping peaks, 59.0% of the
   motifs in 501A were similar to those in HepG2, and 52.0% of the motifs
   in HepG2 were similar to those in 501A. The substantial number of
   motifs still similar after removing overlapping peaks demonstrated that
   the sharing of the motifs was not due to overlapping peak regions
   across cell lines. It also suggested that the predicted motifs and
   cofactors were biologically meaningful.

   Finally, we compared the predicted motif pairs across cell lines. More
   than 63.9% of the motif pairs were independently discovered in other
   cell lines. To see the significance of such a high percentage of motif
   pairs independently discovered across cell lines, we generated the same
   numbers of random motif pairs in each cell line. We randomly chose the
   predicted motifs in each cell line to form the same number of random
   motif modules, with each random motif module composed of the same
   number of motifs as the corresponding predicted motif module. We then
   similarly obtained random motif pairs from these random motif modules.
   We observed that a much lower percentage of random motif pairs occurred
   in more than one cell line (Supplementary Table [95]S2, p
   value < 2.1E−15).

Databases and literature support the predicted cofactor interactions

   To evaluate the biological significance of the predicted cofactors, we
   studied whether the predicted cofactor pairs were enriched with known
   interacting TF pairs in BioGRID. In brief, for every predicted motif
   module in each cell line, we obtained the corresponding cofactor pairs
   in two ways, with every motif pair in each motif module corresponding
   to one or multiple cofactor pairs (Material and Methods). For the set
   of all cofactor pairs in each cell line, we then applied hypergeometric
   testing to see whether the known interacting TF pairs in BioGRID were
   significantly overrepresented in this set of the predicted cofactor
   pairs. We found that the known interacting TF pairs were significantly
   enriched in the predicted ones in all cell lines except HUVEC
   (Fig. [96]2A,B, and Supplementary Table [97]S3). As discussed in the
   following sections, many interacting TF pairs buried in literature were
   not curated in BioGRID, such as KLF4, EGR1, NFE2L2, TFAP2A, etc. In
   this sense, one should consider the above enrichment significance being
   underestimated. We did not identify any known cofactor pair in HUVEC,
   likely because HIF1A is not a dominant hypoxia-related TF here, as
   discussed in the next section.

Figure 2.

   [98]Figure 2
   [99]Open in a new tab

   The predicted cofactors are biologically sound. (A) and (B)
   Overrepresentation of the known TF-TF interactions in the predicted
   cofactor pairs. One predicted motif may correspond to multiple TFs in
   (A) while only one TF in (B). (C) The top 37 HIF1A cofactors ranked by
   the multiple of the -log of the STAMP E-value of its predicted motif
   and the number of motif modules in which this predicted motif occurred.
   An orange bar indicates that these cofactors were in BIND, HPRD, or
   BioGRID. A green bar indicates that they were not in the above
   databases but supported by literature. A blue bar indicates the
   remaining novel cofactors.

   The above analysis was for all predicted cofactor pairs in a cell line.
   We next sought to study how comprehensively the predicted cofactors
   included known TFs interacting with HIF1A. We obtained the curated
   known HIF1A cofactors from BioGRID^[100]11, HPRD^[101]12, and
   BIND^[102]13, large databases containing many experimentally verified
   PPIs. We filtered out the curated cofactors that were not TFs using the
   GO database^[103]47 and those that did not have a known motif in
   JASPAR. In this way, only 49 of the 431 curated cofactors in these
   databases are TFs, and only 29 of the 49 cofactors have motifs in
   JASPAR. We found that the predicted cofactors included 21 of these 29
   (72.4%) curated cofactors. We also compared the predicted HIF1A
   cofactors with those in another study by Semenza^[104]48 that listed 20
   HIF1A TF cofactors, many of which differed from the TFs gathered from
   BioGRID, HPRD, and BIND. We predicted 15 of the 20 (75.0%) HIF1A
   cofactors (STAMP E-value < 1.0E−5).

   We next investigated why eight of the 29 known cofactors were missed.
   Two cofactors, RORA (STAMP E-value 2.9E−3) and ELK1 (STAMP E-value
   4.5E−04), were identified in HKC8 and PC3, respectively, while excluded
   because they did not meet the STAMP E-value cutoff of 1.0E−5. All of
   the remaining six cofactors were paralogous to the predicted cofactors.
   Similarly, two of the twenty cofactors from Semenza did not satisfy the
   STAMP E-value cutoff, and all of the remaining three cofactors were
   paralogous to our predicted cofactors.

   Although the motifs of paralogous TFs are highly similar, there are
   subtle differences. The STAMP tool may have recognized such differences
   and predicted that the known motifs of only certain paralogs are
   similar to a predicted motif. Furthermore, the default STAMP analysis
   outputs only up to the top 5 TFs with their known motifs similar to a
   predicted motif, which may prevent discovering these missed paralogous
   cofactors. When we allowed STAMP to output more top TFs, all missed
   paralogous cofactors were discovered. For instance, the six of the 29
   known cofactors we missed, CEBPA, TP63, TP73, RUNX2, RUNX3, and ESRRB,
   had a STAMP E-value of 7.0E−5, 1.7E−4, 1.4E−3, 2.9E−3, and 1.6E−3,
   respectively. They were missed because their STAMP E-value did not
   satisfy our cutoff, and they were not in the top five TFs with motifs
   similar to the corresponding predicted motifs. Interestingly, the
   missed paralogous TFs might not be involved in the eight cell lines we
   considered here. We analyzed the experiments where the known
   interaction was gathered and found that the cell lines used to verify
   their interaction with HIF1A differed from the ones we used. For
   instance, to the best of our knowledge, RUNX3 has only been
   demonstrated in human gastric cells and RUNX3 transfected cells to
   interact with HIF1A^[105]52, and CREB3L1 has a high tissue-specific
   expression and is preferentially expressed in only osteoblasts and
   astrocytes^[106]55, which are thus not validated in the eight cell
   lines we considered under the hypoxia conditions.

   Because the number of the predicted HIF1A-interacting cofactors was
   large, we focused on the top 37 predicted cofactors that had their
   motifs co-occur with the HIF1A motif in motif modules (Fig. [107]2C).
   Intuitively, a top cofactor should have its motif more similar to a
   known motif and be active in more cell lines. We thus ranked the
   predicted cofactors based on the product of the -log of the STAMP
   E-value of its predicted motif and the number of cell lines in which
   its motif was predicted. A cutoff of 15 was used because a top cofactor
   should have an average STAMP E-value smaller than 1.0E−5 (Materials and
   Methods) and occur in more than three of the eight cell lines. We
   obtained the top 37 cofactors with this cutoff of 15. We found that 19
   of these 37 cofactors were directly supported by literature and the PPI
   databases. This high success rate in discovering known HIF1A cofactors
   supports the validity of the study and suggests that the predicted
   cofactors are likely biologically meaningful. Surprisingly, of these 19
   cofactors, 12 (63.1%) cofactors were not reported in the current PPI
   databases (Fig. [108]2C), suggesting that the current PPI databases are
   far from complete. For instance, four of the top five
   cofactors—NFE2L2^[109]49, KLF5^[110]50, TFAP2A^[111]51, and
   EGR1^[112]52—have all been experimentally identified to interact with
   HIF1A while not being found in the PPI databases. For the remaining 18
   novel cofactors, as shown in the last section, we could find
   experimental evidence to support the direct interaction of HIF1A with
   at least 11 cofactors. In other words, at least 30 of the top 37
   cofactors are most likely interactors of HIF1A.

HIF1A and cofactors regulate significant cancer-related pathways

   After validating the statistical and biological significance of the
   cofactors, we next sought to identify the mechanisms behind their
   contributions to hypoxia pathways, especially across different cell
   lines. We assigned the closest gene to each HIF1A ChIP-seq peak and
   analyzed the regulation of those genes and their relation to the
   cofactors (Supplementary Tables [113]S4–[114]S6).

   The regulatory regions, especially enhancers in mammalian genomes, are
   often far from their target genes^[115]53–[116]56. We thus first
   investigated how far the HIF1A ChIP-seq peaks are relative to their
   closest genes and whether it is proper to assign the closest genes as
   the target genes of the peak regions. Using the Homer tool^[117]42,
   annotePeaks.pl, we labeled all HIF1A ChIP-seq peaks by analyzing their
   distances to the nearest TSS. A representation of the binding
   specificities was then gathered using the tool ChIPseeker
   (Fig. [118]3). We observed that HIF1A and its respective TF cofactors
   have binding sites that are mostly found to be clustered around the TSS
   (Fig. [119]3), showing that HIF1A and most of the TF cofactors
   identified regulate gene expression by binding to promoters and are
   primarily promoter-centered TFs, a trend supported by other
   literature^[120]19,[121]57. Most HIF1A ChIP-seq peaks are thus close to
   the protein-coding genes and likely to regulate the closest
   protein-coding genes in all cell lines except HUVEC.

Figure 3.

   [122]Figure 3
   [123]Open in a new tab

   The distance of the HIF1A ChIP-seq peak regions to the proximal genes.
   Using the ChIPseeker tool, the HIF1A peaks were visualized in terms of
   their proximity to nearby genes.

   With the closest genes as the target genes, we next sought to identify
   pathways through GO, Reactome, and ChIPseeker pathway analysis
   (Supplementary Tables [124]S4–[125]S6). We discovered that among the
   most overrepresented GO terms and pathways, several of them identified
   were heavily related to hypoxia-induced pathways in cancers. For
   instance, GO terms like canonical glycolysis, glucose metabolism,
   IRE1-mediated unfolded protein response^[126]58, peptidyl-proline
   hydroxylation to 4-hydroxy-1-proline^[127]4, RHO GTPase cycle^[128]59,
   transcriptional regulation by TP53^[129]60, and cellular response to
   hypoxia were identified, while similar processes were found in the
   Reactome and ChIPseeker pathway analysis—transcriptional regulation by
   TP53, glycolysis, cellular responses to stress, etc. Consistently with
   all three methods, we found no enriched biological processes in HUVEC.

   Surprisingly, HUVEC differs greatly from the rest of our cell lines. As
   shown in Fig. [130]3, we also noticed that HIF1A’s binding
   distributions differed from those of other cell lines. One possible
   explanation is that endothelial PAS domain protein 1 (EPAS1, also known
   as HIF2A) instead of HIF1A is characterized as the predominant hypoxia
   inducible factor isoform in HUVEC^[131]20. However, it is still
   uncharacteristic that HIF1A does not have some other regulatory role.
   Instead, we propose that this is due to HIF1A’s lack of cofactors in
   the HUVEC cell line. It has been well documented that the cofactors
   involved in a complex are crucial to binding specificity and gene
   regulation. Consistent with our claim, according to Table [132]1, we
   can see that HUVEC has a low cofactor discovery rate, and in
   Fig. [133]2, there is a low enrichment for the TF pairs in HUVEC. As a
   result, we hypothesize that it is due to the lack of cofactors that
   HIF1A’s binding specificity is different and gene enrichment is
   minimal. Thus, this further supports the validity of the discovered
   cofactors in our study and the crucial roles they play in HIF1A’s
   functions.

   We next analyzed how different TF cofactors play different roles in
   different cell lines, resulting in potential cell-type-specific
   expression. Despite a high number of cofactors identified across cell
   lines, the identified cofactors were quite distinct when comparing two
   cell lines. On average, more than 71.1% of the cofactors in each cell
   line were not found in another cell line (Supplementary Table [134]S7).
   Since the difference in cofactors is essential to differential gene
   expression, we expect that the more different the cofactors are, the
   more different the enriched GO terms are. To quantify this, we graphed
   the difference in GO terms against the difference in cofactors in every
   pair of cell lines (Supplementary Figure [135]S1). We found that for
   most of the cell lines, a high difference between TF cofactors resulted
   in a high difference in the enriched GO terms (average R^2 = 0.77) in
   all cell lines except MDA-MB-231 (R^2 = 0.53) and HKC8 (R^2 = 0.23).
   This conclusion still held when we compared the difference in the
   identified pathways with the difference in the cofactors in pairs of
   cell lines. For instance, through the ChIPseeker pathways, although
   only 6.7% of the cofactors in MDA-MB-231 and 3.5% of the cofactors in
   HKC8 were shared (Supplementary Table [136]S7), four of the five most
   enriched pathways in HKC8 and four of the five most enriched pathways
   in MDA-MB-231 were identified in both cell lines (Supplementary Table
   [137]S6). The above analysis suggested that while different TF
   cofactors help promote various transcriptional responses, HIF1A may
   also cooperate with different TF cofactors in different cell lines to
   regulate overlapping target genes and pathways.

The novel HIF1A cofactors make biological sense

   We gathered the leftover TF cofactors that were not supported by
   external evidence, as found in Fig. [138]2C. In this study, we labeled
   a predicted cofactor as a probable HIF1A cofactor if it is confirmed to
   interact with HIF1A directly or indirectly, controlling the expression
   of HIF1A and vice versa and possess a high similarity in their target
   genes. We could not go into details about the 201 HIF1A cofactors
   discovered (Supplementary Table [139]S8), including those in
   Fig. [140]2C. Instead, we decided to validate the top TF cofactors
   (Fig. [141]4A) that were not in the databases or directly supported by
   literature, namely: heat shock transcription factor 1 (HSF1), E2F
   transcription factor 4 (E2F4), E2F6, nuclear respiratory factor 1
   (NRF1), FOS like 2, AP-1 transcription factor subunit (FOSL2), signal
   transducer and activator of transcription 1 (STAT1) and Kruppel like
   factor 4 (KLF4) (Fig. [142]4B). For the remaining top cofactors, we
   filled the supporting evidence as HIF1A cofactors to the best of our
   knowledge in Supplementary Table [143]S9.

Figure 4.

   [144]Figure 4
   [145]Open in a new tab

   Interactions between discovered HIF1A cofactors. (A) Cytoscape
   representation of the top 37 TFs and their interactions with one
   another. (B) Cytoscape network of interactions between six top novel TF
   cofactors, HSF1, E2F4, E2F6, STAT1, FOSL2, NRF1, and KLF4. SIOMICS
   predicted the edges shown.

   HSF1 is the main human protein responsible for regulating responses to
   severe heat and inflammation. It activates heat shock proteins like
   HSP70 and HSP90, which plays an essential role in tumorigenesis, as
   they allow cancer cells to manage their elevated heat and stress levels
   resulting from their rapid proliferation^[146]61. Due to the connection
   between hypoxia and cancer, it has also been demonstrated that HSF1 and
   HIF1A are closely related to tumor formation^[147]62. HSF1 regulates
   the mRNA binding protein, HuR, which controls the expression of HIF1A,
   and VEGF, a protein that stimulates neovascularization and plays an
   essential role in responses to hypoxia. A lack of HSF1 has been shown
   to dramatically deplete HIF1A and decrease angiogenesis, which is
   directly linked to a cancer cell’s growth and development. Our analysis
   further supports the importance of HSF1 in hypoxia-related pathways as
   the GO term “cellular responses to heat stress” was found in all cell
   lines except HUVEC. Interestingly, though it could mistakenly be
   inferred that HSF1 is a transcriptional regulator of HIF1A, there are
   very complex relations between the two TFs. It was found that while
   HIF1A upregulates HSF1 expression in Drosophila cells to activate the
   heat shock response pathway, HSF2 and HSF4 were found to upregulate
   HIF1A in mammalian cells^[148]62 to activate a hypoxic one. Moreover,
   we found that HSF1 was ranked as a top ten predicted cofactor by the
   tool PIP^[149]63, and seven of the top nine predictions by which were
   experimentally verified by previous studies. PIP predicts a potential
   interacting cofactor by testing the co-expression of the two proteins,
   orthology, domains, transitive scores, and other factors. All the above
   evidence shows the interconnectedness of the two TFs, which
   demonstrates a high possibility of HSF1 engaging in a PPI with HIF1A.

   E2F4 and E2F6 are members of the E2F family of TFs and have a high
   potential to interact with HIF1A directly. They are responsible for
   regulating cell cycle^[150]64. GO terms like “mitotic DNA damage
   checkpoint, regulation of transcription from RNA polymerase II promoter
   in response to hypoxia, and cell cycle phase transition” were found in
   our enrichment analysis (Supplementary Table [151]S5). Extensive
   studies have also linked both TFs and their functions to
   hypoxia-related pathways in cancers. For instance, E2F4 was found to
   work cooperatively with MYC and NRF1 to promote tumorigenesis in
   hypoxia-driven apoptosis^[152]65. Furthermore, Tiana et al.
   hypothesized that HIF1A might directly interact with E2F4/E2F6 to
   regulate the activity of SIN3A, a protein essential for a cell’s
   complete response to hypoxia^[153]20. E2F7, a paralog of E2F4/E2F6, has
   also been found to interact physically with HIF1A to form a repressor
   complex^[154]66.

   Interestingly, in the E2F4-NRF1-MYC complex mentioned above, NRF1 was
   also identified as a potential cofactor, while MYC is a known TF
   cofactor of HIF1A^[155]67. A previous study showed that NRF1 was found
   to be a regulator of HIF1A in HEK293T cells. Furthermore, Wierenga et
   al. performed a motif search on HIF1A binding sites^[156]68 and found
   that NRF1 was significantly enriched alongside SP1^[157]69 and
   ELK1^[158]70, both of which are known HIF1A cofactors. NRF1 has also
   been discovered to be a pioneer TF^[159]71. Pioneer TFs remodel their
   nearby landscape to allow non-pioneer TFs to interact with a specific
   DNA sequence, and in this case, HIF1A may interact with NRF1 to
   increase binding specificity and so that it can regulate target genes
   that it did not previously have access to.

   Much evidence supports the rest of the possible TF cofactors that were
   identified. For instance, FOSL2 is another potential cofactor. FOSL2
   and other members of the FOS family usually dimerize with proteins of
   the JUN family to form the AP-1 complex^[160]72. Our analysis
   identified many components of the AP-1 complex, including JUN, JUND,
   JUNB, FOS, FOSL1, and FOSL2. The AP-1 complex has conclusively been
   shown to interact with HIF1A to express a variety of hypoxia inducible
   factor target genes, including VEGF, the master regulator of
   angiogenesis. FOSL2 is also shown to have bound to several of HIF1A’s
   target genes and is itself upregulated by HIF1A^[161]73. Thus, it is
   very probable that FOSL2 is a possible HIF1A cofactor, either through
   interacting with HIF1A through the AP-1 complex or as an individual TF.
   STAT1 is another significant possible HIF1A cofactor identified in this
   study. While HIF1A has been shown to repress STAT1^[162]74, STAT1 has
   been, in turn, shown to repress HIF1A^[163]75. Furthermore, it has been
   well documented that STAT1 and STAT3 are closely related in terms of
   the pathways that they regulate, to the point where interfering with
   the expression of one protein may indirectly affect the expression of
   the other^[164]76. Since STAT3 is a known TF cofactor of HIF1A^[165]77,
   we predict that since the two TFs are so closely related, both may
   engage in an interaction with HIF1A. STAT1 and FOSL2 were found in the
   PIP database due to a high possibility of co-expression, many similar
   domains, and several common cofactors. Finally, KLF4 also shows high
   potential as, like NRF1, KLF4 additionally acts as a pioneer TF. It has
   been shown that while HIF1A upregulates KLF4 in vascular smooth
   cells^[166]78, KLF4 was found to inhibit HIF1A in Huh7 and HepG2
   cells^[167]79. This high interconnectivity between the two TFs
   demonstrates a high possibility of them engaging in crosstalk or a
   regulatory loop.

Discussion

   In this study, we sought to study HIF1A cofactors and how they connect
   to the role of HIF1A as a master regulator of a cancer cell’s response
   to hypoxia. Through the analysis of ChIP-seq data in eight cancer cell
   lines, we identified 201 potential HIF1A cofactors. Most of these
   identified cofactors were likely to be meaningful, as they were
   independently identified in other cell lines, enriched with known
   interacting TF-pairs, supported by curated databases and literature,
   and were connected to HIF1A’s gene regulatory pathways.

   We compared the predicted cofactors with the known HIF1A cofactors
   curated in public databases. We identify 21 of the 29 known TF
   cofactors. Two cofactors were missed due to the STAMP E-value cutoff
   used, suggesting the limitation of the current motif comparison tools.
   The other six missed cofactors are paralogous TFs to the identified
   cofactors, likely inactive in the eight cell lines we considered. For
   instance, the missed cofactor TP63 is paralogous to the identified
   cofactor TP53. The expression of TP63 was undetectable under hypoxic
   conditions^[168]80,[169]81. The high percentage of the identification
   of the known HIF1A cofactors suggests that our pipeline to identify
   cofactors is reliable, and the predicted HIF1A cofactors are
   biologically meaningful and worth further experimental validation.

   We also examined the cofactors not curated in the PPI databases. These
   cofactors could have not been included either because the PPI databases
   were not up to date, or they had not yet been discovered. We found that
   many of these cofactors were supported by the literature. For instance,
   of the top five cofactors that were not curated in the PPI databases,
   four of them—NFE2L2, KLF5, TFAP2A, and EGR1—had been experimentally
   verified as HIF1A cofactors in the literature. We also discovered that
   more than 51.3% of the top 37 cofactors were directly supported by
   experimental evidence in literature. We analyzed the remaining top
   cofactors that were not curated in databases or had their interaction
   with HIF1A experimentally validated. We found evidence supporting their
   interaction with HIF1A to modulate gene expression during hypoxia.

   Recent advancements in understanding the PPI’s role in diseases have
   made targeting PPIs a promising venture, especially in cancer
   therapies^[170]82. Due to the essential role that many of the
   discovered cofactors play in regulating HIF1A target genes, these new
   cofactors may have regulatory mechanisms that could be exploited for
   novel therapies. For instance, TFAP2A, an identified cofactor not found
   in any of the major databases, was demonstrated to interact with HIF1A
   to target the VEGF pathway in nasopharyngeal carcinoma cells. Shi et
   al. showed that inhibiting TFAP2A led to decreased tumor growth and
   conclusively proved that TFAP2A had the potential to be investigated as
   a therapeutic target^[171]51. Many of the other new potential cofactors
   identified also have similar implications. For instance, Toth et al.
   predicted that simultaneously targeting the HIF1A and NFE2L2 presents a
   novel approach for cancer therapies and proposed several molecular
   inhibitors that would target both proteins^[172]83.

   In five of the eight cell lines, we predicted 100 motifs, which was the
   upper limit when running the default SIOMICS. The discovered 100 motifs
   suggested that there could be more motifs unreported. In the future, we
   could explore the remaining motifs and study other cell lines and
   types. Moreover, of the 201 cofactors identified, only the top 37 was
   fully analyzed in this study. This was because it was too time
   consuming to find evidence that supports each and every single cofactor
   left. To gain a more comprehensive view of the HIF1A cofactors in
   hypoxic cancer cells, more studies will have to be conducted and
   experimental research like co-immunoprecipitation arrays would help
   verify our predictions. We look forward to further investigating the
   effects of HIF1A’s cofactors and the importance of TF cooperativity in
   many of its pathways.

Supplementary Information

   [173]Supplementary Information.^ (360.4KB, docx)

Acknowledgements