Abstract CBFA2T3-GLIS2 is a fusion oncogene found in pediatric acute megakaryoblastic leukemia that is associated with a poor prognosis. We establish a model of CBFA2T3-GLIS2 driven acute megakaryoblastic leukemia that allows the distinction of fusion specific changes from those that reflect the megakaryoblast lineage of this leukemia. Using this model, we map fusion genome wide binding that in turn imparts the characteristic transcriptional signature. A network of transcription factor genes bound and upregulated by the fusion are found to have downstream effects that result in dysregulated signaling of developmental pathways including NOTCH, Hedgehog, TGFβ, and WNT. Transcriptional regulation is mediated by homo-dimerization and binding of the ETO transcription factor through the nervy homology region 2. Loss of nerve homology region 2 abrogated the development of leukemia, leading to downregulation of JAK/STAT, Hedgehog, and NOTCH transcriptional signatures. These data contribute to the understanding of CBFA2T3-GLIS2 mediated leukemogenesis and identify potential therapeutic vulnerabilities for future studies. Subject terms: Acute myeloid leukaemia, Paediatric cancer, Cancer models, Cancer genomics __________________________________________________________________ CBFA2T3-GLIS2 is a fusion oncogene found in a subset of pediatric acute megakaryoblastic leukemia associated with a poor prognosis. Here, Garfinkle et al. map the gene regulatory network of CBFA2T3-GLIS2 using a humanized murine model and identify signaling pathways required for leukemogenesis. Introduction Pediatric acute megakaryoblastic leukemia (AMKL) comprises 4–15% of pediatric acute myeloid leukemia (AML) patients and is found in patients with Down syndrome (DS-AMKL) and in patients without Down syndrome (non-DS-AMKL)^[42]1. Children <4 years of age with Down syndrome have a 500-fold greater incidence of AMKL than the general population and have an event-free survival (EFS) approaching 90% with standard chemotherapy^[43]2. Malignant transformation of megakaryoblasts in DS-AMKL is due to trisomy 21 and somatic mutations in the GATA1 gene, which are rarely found in non-DS-AMKL patients^[44]3,[45]4. Prior to next-generation sequencing studies, the genomic alterations driving non-DS-AMKL were not well defined. In 2011, we reported a recurrent cryptic inversion event identified by RNA sequencing on chromosome 16 that produced the coding fusion CBFA2T3-GLIS2 in a subset of pediatric non-DS-AMKL patients^[46]5,[47]6. A follow-up study in 2017 revealed seven major genomic subtypes in pediatric non-DS-AMKL, five of which contain recurrent chromosomal translocations that code for chimeric transcription factors: HOX rearrangements (HOXr), RMB15-MKL1, NUP98-KDM5A, KMT2A rearrangements (KMT2Ar), and CBFA2T3-GLIS2^[48]4. CBFA2T3-GLIS2 was the most prevalent and the most aggressive of the fusions, with overall survival of only 14% despite maximum tolerated high-intensity chemotherapy and stem cell transplant in the first remission^[49]4. Patients harboring the CBFA2T3-GLIS2 fusion have significantly fewer cooperating somatic mutations compared to patients with other subtypes of non-DS-AMKL, suggesting CBFA2T3-GLIS2 is the driver of leukemogenesis^[50]5. Prior chromatin immunoprecipitation sequencing studies (ChIP-seq) of the CBFA2T3-GLIS2 fusion in engineered cell lines using a GFP tag and a CBFA2T3 antibody identified <2000 binding sites genome-wide with only 20% of target genes differentially expressed and 15% carrying a GLIS2-binding motif^[51]7. A significant number of peaks were found to have ETS consensus binding sites, and ERG ChIP-seq confirmed overlapping binding in 50% of the identified CBFA2T3-GLIS2 target genes; however, a biochemical association between the fusion and ERG has not been demonstrated. An analysis of super-enhancers identified ETO, ERG, PDGFRA, and KIT in cell lines. However, neither of these studies distinguished super-enhancers specific to the fusion as compared to the lineage state of the cell^[52]7,[53]8. Given the paucity of identified genomic targets, the lack of differential expression in 80% of these targets, and the genomic complexity of cell lines that carry additional mutations that may interfere with and/or impact results, we sought to establish an optimized model system to more thoroughly understand transcriptional regulation by the fusion. Herein we report on the development of a humanized murine model to identify key transcriptional co-factors and dysregulated signaling pathways that contribute to transformation. Advantages of the model include the isolation of CBFA2T3-GLIS2 as the only genomic event when compared to fusion negative megakaryoblasts and the ability to compare and contrast results in these two populations. The use of cleavage under targets and release using nuclease (CUT&RUN), which requires a lower input and provides superior sensitivity with a high signal-to-noise ratio, allowing for increased resolution of genes targeted by the fusion in our study^[54]9. With this approach, we were able to comprehensively map genome-wide binding with accuracy and identify fusion-specific changes to enhance our understanding of the dysregulated signaling and developmental pathways that may be targeted therapeutically. Results CBFA2T3-GLIS2 drives the transformation of human megakaryoblasts Current models to characterize CBFA2T3-GLIS2 positive non-DS-AMKL are limited. Immortalized cell lines derived from patients harboring this fusion, such as MO7e, have additional acquired mutations, such as in TP53, that are not represented in the patient population^[55]4,[56]5. In contrast to other fusions found in pediatric non-DS-AMKL, CBFA2T3-GLIS2 gene-modified murine bone marrow cells fail to induce a serially transplantable leukemia in the absence of a cooperating mutation^[57]10,[58]11. Several CBFA2T3-GLIS2-positive patient-derived xenograft models are available; however, patient cells are difficult to manipulate for experiments, cell quantity is limited, and they do not grow well in vitro. A reliable model of CBFA2T3-GLIS2-positive AMKL that recapitulates primary patient samples and can be experimentally manipulated is needed. To develop a model, CD34+ stem cells were isolated from human cord blood and transduced with a lentiviral vector encoding N-terminal tagged CBFA2T3-GLIS2 and the GFP reporter gene. The transduced stem cells were cultured in vitro in the presence of IL1β and TPO to promote megakaryoblast differentiation, flow-sorted for GFP purity, and transplanted into NOD.scid.Il2Rγc^null immunodeficient mice with the IL3, CSF2, and KITLG human transgenes (NSG-SGM3)^[59]12. In vivo expanded and recovered cells were defined as mouse CD45 negative and GFP positive (Fig. [60]1A). The fusion-positive megakaryoblasts induced a serially transplantable leukemia within 200 days compared to empty vector transduced megakaryoblasts lacking the fusion (Fig. [61]1B, Supplementary Data [62]1). Immunophenotyping of fusion-positive megakaryoblasts was consistent with patient specimens as determined by proteomics (Supplementary Fig. [63]2). 39% (12/31) of moribund mice presented with hind leg paralysis, and 45% (14/31) had enlarged spleens (Supplementary Data [64]1). Histology revealed neoplastic infiltrates in the bone marrow, spleen, and central nervous system, the latter of which caused the hind leg paralysis due to compression of the spinal cord (Supplementary Fig. [65]1A, B, Supplementary Data [66]1). One mouse developed a hind leg tumor (Supplementary Fig. [67]1A, Supplementary Data [68]1), and nine mice had neoplastic infiltrates in the lung (Supplementary Fig. [69]1B, Supplementary Data [70]1). The latency of disease and moribund phenotype were consistent across serial transplantations, suggesting the model is genomically stable (Fig. [71]1B, Supplementary Data [72]1). Expression of the CBFA2T3-GLIS2 fusion gene was confirmed through RT-PCR (Supplementary Fig. [73]1C) and Western blot (Supplementary Fig. [74]1D). Fig. 1. CBFA2T3-GLIS2 drives the transformation of primary human megakaryoblasts. [75]Fig. 1 [76]Open in a new tab A Generation of CBFA2T3-GLIS2-positive primary human megakaryoblasts. CD34+ stem cells were isolated from human cord blood, transduced with a lentivirus encoding the CBFA2T3-GLIS2 coding sequence and a GFP reporter, followed by differentiation to megakaryoblasts with human TPO and IL1β for 7–9 days. Cultures were sorted for GFP+ purity and transplanted into immunodeficient NSG-SGM3 mice. Recovered cells from transplantation were depleted for mouse CD45 cells, and GFP expression was confirmed by flow cytometry. B Serial transplantation (primary, N = 6; secondary, N = 11; and tertiary, N = 14) of fusion-positive megakaryoblasts induce AMKL within 200 days compared to empty vector megakaryoblasts (N = 6) which do not expand in vivo. Leukemia-free survival is shown. Source data are provided as a Source Data file. C t-SNE analysis of the gene expression profiles of in vitro cultured fusion-positive megakaryoblasts (light green), primary (red), secondary (blue), and tertiary (dark green) transplants, fusion-positive patient samples (orange), and AMKL patient samples harboring a different genetic driver other than the CBFA2T3-GLIS2 fusion (purple). Clusters are based on the top 1% of differentially expressed genes. We have previously shown that CBFA2T3-GLIS2-positive AMKL patient specimens (N = 9) display a distinct gene expression profile compared to other AMKL genomic subsets (N = 35)^[77]4,[78]5. To understand how closely our model mimics the transcriptional program found in patients, we subjected cultured cells as well as primary, secondary, and tertiary leukemias to RNA sequencing (Fig. [79]1C). An average of 18,660 transcripts were expressed with at least one read in our fusion-positive leukemic megakaryoblasts. The serially transplanted leukemic megakaryoblasts (N = 15) and fusion-positive megakaryoblasts cultured in vitro (N = 9) before transplantation were found to recapitulate the gene expression profile of fusion-positive patient specimens (Fig. [80]1C) with a strong correlation (R = 0.91, p < 2.2e−16) (Supplementary Fig. [81]1E). To assess our model on the translational level, tandem mass tag (TMT)-based quantitative proteomics was performed on cells from two transplanted leukemias and four CBFA2T3-GLIS2-positive patient-derived xenograft specimens^[82]13. Fusion-negative megakaryoblasts cultured in vitro were used as a control. Consistent with our transcriptome data, we found a strong correlation between transplanted leukemia cells and patient samples (R = 0.98, p < 2.2e−16) (Supplementary Fig. [83]1F, Supplementary Data [84]2), further supporting that this humanized murine model faithfully replicates primary patient samples phenotypically and can be accurately used to characterize the CBFA2T3-GLIS2 transcriptional complex. Spearman correlation of our transcriptome and proteome data was calculated using paired secondary (N = 1) and tertiary (N = 1) transplanted cells that underwent both RNA-sequencing and proteomic analysis^[85]14. The Spearman’s correlation coefficient was 0.46 (Supplementary Fig. [86]1G, Supplementary Data [87]2), a moderate correlation that compares favorably with prior publications that have looked at the paired transcriptome and proteome data and suggest post-transcriptional modifications and translational regulation that can be investigated in future studies^[88]14. CBFA2T3-GLIS2 binds at GC-rich promoter regions To map genome-wide binding of CBFA2T3-GLIS2 in our humanized model, cleavage under targets and release using nuclease with sequencing (CUT&RUN-seq) was done on CBFA2T3-GLIS2-positive leukemic megakaryoblasts from our murine model (N = 2) using an anti-TY1 antibody to recognize the N-terminal tag engineered on our fusion construct (see the “Methods” section). A total of 28,076 regions were identified, which corresponded to 5256 unique genes bound by the fusion at the promoter. 92% of these genes were expressed, and 55% were differentially expressed when compared to fusion-negative megakaryoblasts (Fig. [89]2A, Supplementary Data [90]3). Although a significant portion of peaks were located at gene promoters (52% of total gene regions), 27% and 16% were bound at intron and intergenic regions respectively, suggesting the fusion is also regulating genes within introns and binding at enhancer regions located upstream of promoter sites (Fig. [91]2A). Of the 55% differentially expressed genes bound by the fusion at promoters, 57% (1517/2657) were downregulated and 43% (1140/2657) were upregulated (Fig. [92]2B, Supplementary Data [93]3). Over-representation analysis (ORA), a statistical method to measure the enrichment of certain pathways in a subset of data, revealed a significant upregulation of genes associated with the WNT, TGFβ, and Hedgehog signaling pathways (Fig. [94]2B, Supplementary Data [95]4), consistent with enriched pathways in CBFA2T3-GLIS2-positive patient samples as previously published^[96]5,[97]15,[98]16. Fig. 2. CBFA2T3-GLIS2 binds at GC-rich promoter regions to alter gene expression. [99]Fig. 2 [100]Open in a new tab A Summary of CUT&RUN-seq data from fusion immunoprecipitation. The percentage of fusion-bound genes that are expressed and differentially expressed in the cell is shown. Distribution of CUT&RUN-seq peak locations from fusion immunoprecipitation. UTR untranslated region, TSS transcription start site. Source data are provided as a Source Data file. B RNA extraction and gene expression profiling were performed on three technical replicates from fusion-negative megakaryoblasts cultured in vitro and fusion-positive leukemic megakaryoblasts recovered from two secondary transplantations. A heatmap of differentially expressed genes bound by the fusion at promoter regions is shown (N = 2657). Log2 fold change (FC) was calculated using a generalized linear model with a negative binomial distribution in DESeq2 and p-values derived from the Wald t-test and corrected using the Benjamini–Hochberg correction. Over-representation analysis using the Reactome database of differentially expressed genes bound by the fusion. p-values were calculated using a one-sided Fisher’s exact and adjusted for multiple comparisons using the Benjamini–Hochberg correction. C Top 74 nodes from transcription factor gene regulatory network (GRN) analysis. To further characterize the altered gene expression profile of fusion-positive leukemic megakaryoblasts and understand the gene regulatory network that leads to dysregulation of genes in the absence of direct CBFA2T3-GLIS2 binding, we performed a global differential gene expression analysis comparing the two transplanted samples and the three fusion negative megakaryoblast samples. This analysis again showed differential expression with 54% of genes (4215) downregulated, and 46% (3590) upregulated compared to fusion negative megakaryoblasts (Supplementary Fig. [101]3A, Supplementary Data [102]3). In total, 68% (5275/7805) of dysregulated genes were not directly bound by CBFA2T3-GLIS2. Given the global-scale, we used gene set enrichment analysis (GSEA) and the Hallmark database to explore enriched pathways and found a consistent upregulation of WNT, TGFβ, and Hedgehog signaling-associated genes but also found upregulation of genes in the NOTCH and JAK/STAT pathways (Supplementary Fig. [103]3A, Supplementary Data [104]4), suggesting the fusion is dysregulating components of these pathways through binding of transcription factors that mediate these signaling pathways^[105]17–[106]19. To model the transcription factor gene regulatory network (GRN), we utilized GeneNetworkBuilder, a pipeline for building regulatory networks using genome-wide binding and gene expression data^[107]20. The top 74 nodes of this GRN when ranked by degree centrality (number of connections to and from a node), include transcription factors involved in JAK/STAT signaling (STAT2, STAT4, STAT6, JUN, JUNB), canonical and non-canonical WNT signaling (TCF7L2, NFATC3, NFYA, JUN), TGFβ (SMAD2, TGIF1, ZEB1), NOTCH (RBPJ), RUNX (RUNX1, RUNX3), and megakaryocyte development (IRF2, FLI1, GFI1B) pathways (Fig. [108]2C Supplementary Fig. [109]3B–G, Supplementary Data [110]5). We did not identify any nodes related to the Hedgehog signaling pathway, consistent with the absence of binding of CBFA2T3-GLIS2 to the GLI1-3 genes and the absence of GLIS1-3 in the GeneNetworkBuilder algorithm (Supplementary Data [111]3). Several transcription factors that are downstream effectors of multiple pathways were found, including JUN, which has been reported to function downstream of JAK/STAT, TGFβ, and WNT signaling. Additionally, JUN regulates the expression of members of the RUNX family, including RUNX2 and RUNX3, which play a role in leukemic progression, underscoring the significant cross-talk seen within the transcriptional network mediated by CBFA2T3-GLIS2 (Fig. [112]2C)^[113]21. Although the network was dominated by oncogenic and developmental transcription factors, a significant central node, IRF2, was identified as part of the megakaryocyte development pathway (Supplementary Fig. [114]3G). While IRF2 is primarily known as an antagonist of the tumor suppressor IRF1, studies have also shown that IRF2 expression in myeloid progenitor cells leads to reprogramming towards the megakaryocytic lineage and enables them to respond to thrombopoietin^[115]22,[116]23. Collectively, this analysis provides a map of the direct and indirect effects of CBFA2T3-GLIS2 expression in megakaryoblasts. Given that the fusion retains the five highly conserved DNA-binding zinc fingers of full-length GLIS2, we hypothesized a majority of genes bound by the fusion would contain a G-C-rich GLIS or the closely related GLI transcription factor consensus sequence. Homer's known motif search was used to identify the top represented consensus sequences from the promoter peak regions (filtered set—10,619) ranging in size from 34 to 11,355 base pairs associated with the 5256 fusion-bound genes. Consistent with previously published data in the MO7e cell line, ETS, GATA, and RUNX consensus sequences were highly represented (Supplementary Fig. [117]3H)^[118]7. Although we did not find a GLIS or GLI motif, we did identify the G-C-rich SP1 motif. To compare the SP1 motif to the GLI superfamily sequences, we performed a neighbor-joining analysis which revealed the SP1 consensus sequence clusters closely with the GLIS1–2 and GLIS1–4 consensus sequences (Supplementary Fig. [119]3I)^[120]24. We hypothesized the lack of GLI motifs in the Homer search was due to a lack of representation of GLI and GLIS consensus sequences in the database. Therefore, to focus the search on the GLI superfamily of transcription factors that share a high degree of homology in the DNA-binding zinc finger domains, a find individual motif occurrences (FIMO) search was used to determine the abundance of the GLIS and GLI family motifs in our data^[121]25. This custom search revealed the GLIS1–2 consensus sequence (DCWHNGTGGGGGGTV) was represented in 39.8% of the promoter regions, suggesting this is the preferential binding sequence. GLIS1–4 (SNSYSYRGGRGGBV) was also represented in 23.6% of regions and SP1 (NRGGGGCGGGGCBDGGVSSSGV) in 36.5% (Supplementary Fig. [122]2J). One or more GLIS1–2, GLIS1–4, and SP1 sites overlapped with fusion-bound sites in 77.5% (4072/5256), 77.8% (4089/5256), and 81% (4246/5256) unique genes, respectively (Supplementary Fig. [123]3K, Supplementary Data [124]6). Fusion preferential binding for GLIS consensus sequences is supported by Vasanth et al. who reported full-length GLIS2 binds GLIS binding sequences with a stronger affinity than GLI binding sequences^[125]26. CBFA2T3-GLIS2 alters super-enhancers To define the super-enhancer landscape in fusion-positive leukemic megakaryoblasts, we first mapped H3K27ac occupancy in fusion-negative and fusion-positive leukemic megakaryoblasts using CUT&RUN-seq. We identified 1537 new H3K27ac peaks in fusion-positive leukemic megakaryoblasts, 56% of which overlap with fusion-bound genes (Supplementary Fig. [126]4A, Supplementary Data [127]3). The ranking of super-enhancers (ROSE) algorithm was utilized to identify super-enhancers (SE) in both fusion-negative and fusion-positive leukemic megakaryoblasts. The signals from two biological replicates were averaged together (individual replicates are shown in Supplementary Fig. [128]3B, C) and identified 263 and 236 unique SE restricted to fusion-negative and fusion-positive leukemic megakaryoblasts, respectively. 74% of SE regions unique to the fusion-positive cells were bound by the fusion (Fig. [129]3A, B, Supplementary Data [130]7). As expected, a majority (61%) of the SE identified in fusion-negative megakaryoblasts led to upregulation of their associated genes (Fig. [131]3C). Similarly, a majority (60%) of the SE identified in fusion-positive leukemic megakaryoblasts were upregulated when compared to fusion negative megakaryoblasts (Fig. [132]3C) and this pattern held true when restricted to fusion-positive SE that were also bound by the fusion (Fig. [133]3C). Fig. 3. CBFA2T3-GLIS2 alters super-enhancers. [134]Fig. 3 [135]Open in a new tab A Averaged signal-ranking plots of H3K27ac bound genes were identified from CUT&RUN-seq in two fusion-negative megakaryoblast samples and two fusion-positive megakaryoblast samples from secondary transplants. Super-enhancer (SE) cut-off values are displayed on the graphs. Source data are provided as a Source Data file. B Venn diagram showing the overlap between SEs in fusion-negative megakaryoblasts (blue) and fusion-positive megakaryoblasts (red). C Heatmaps of differentially expressed SEs in fusion-negative megakaryoblasts (N = 222), fusion-positive megakaryoblasts (N = 187), and fusion-positive megakaryoblasts that are also bound by the fusion (N = 163). D H3K27ac binding profiles, fusion binding profiles, and corresponding RNA-sequencing tracks at the TESC, BCL2, and TCF7L2 SE. The TESC SE, essential for megakaryoblast differentiation, is lost in fusion-positive megakaryoblasts. BCL2, an anti-apoptotic gene, and TCF7L2, a member of the WNT signaling pathway, are upregulated super-enhancers exclusive to fusion-positive megakaryoblasts (generated using SparK v2.6.2)^[136]56. The 236 unique SE restricted to fusion-positive leukemic megakaryoblasts included gene targets related to pro-survival, hematopoietic differentiation, and developmental pathways (Fig. [137]3, Supplementary Data [138]7). Consistent with prior reports, we found SE associated with the BCL2 and KIT genes^[139]7. 171 SE were present that have not been previously reported, including TCF7L2, which encodes a transcription factor that plays a role in the regulation of the WNT signaling pathway (Fig. [140]3A-D, Supplementary Data [141]7)^[142]7,[143]8. TCF7L2 has also been shown to be required for the survival and proliferation of AML cells^[144]27. Consistent with a block in differentiation, we found the TESC SE present in fusion-negative megakaryoblasts to be lost and expression of this gene, which is required for megakaryoblast differentiation, downregulated in fusion-positive leukemic megakaryoblasts (Fig. [145]3D)^[146]28. Thus, a subset of genes are dysregulated through SE gains and losses in addition to regulation provided via promoter binding. ETO, CtBP1, and p300 associate with the CBFA2T3-GLIS2 The CBFA2T3 portion of the fusion retains three highly conserved nervy homology regions (NHR). NHR1 is a binding site for p300, a transcriptional co-activator and histone acetyltransferase^[147]29. NHR2 contains a hydrophobic heptad repeat region that plays a key role in homo- and hetero-oligomerization with the closely related transcriptional repressor ETO (also known as RUNX1T1 and CBFA2T1)^[148]30. ETO can also bind p300 thereby suggesting both transcriptional repression and activation can be altered through the hetero-oligomerization of ETO via NHR2^[149]31. While NHR3 does not contain any highly conserved domains and is not well characterized, it is suggested to support oligomerization^[150]30. In addition to the five DNA binding zinc-fingers retained in the GLIS2 portion of the fusion, there is one PXDLS consensus sequence and one degenerate sequence capable of binding CtBP1, a transcriptional repressor^[151]32,[152]33. p300, like many histone acetyltransferases, also contains a highly conserved PXDLS sequence in the center of its bromodomain that can bind to CtBP1, which, in turn, represses the transactivation activity of p300^[153]34. Based on these data, we hypothesized the CBFA2T3-GLIS2 fusion is able to recruit and associate with ETO, CtBP1, p300 and homo-dimerize to bi-directionally alter gene expression at fusion-bound genes (Supplementary Fig. [154]5A). To establish an association of ETO, CtBP1, and p300 with the fusion we conducted co-immunoprecipitation (co-IP) experiments in CBFA2T3-GLIS2 leukemic megakaryoblasts from our murine model (Fig. [155]4A, Supplementary Fig. [156]5B). Immunoprecipitation of ETO, CtBP1 and p300 followed by blotting for the fusion using the 3x-TY1 tag confirmed our predicted associations (Fig. [157]4A, Supplementary Fig. [158]5B). We then performed CUT&RUN-seq for ETO, CtBP1, and p300 in transplanted leukemic megakaryoblasts and in vitro cultured fusion negative megakaryoblasts to map co-occupancy with the fusion at target genes (Fig. [159]4B–D, Supplementary Data [160]3) and assess genome-wide shifts in the binding of these factors in the presence of the fusion (Fig. [161]4C, Supplementary Data [162]3). Overall, we found an increase in the absolute number of unique genes bound at promotor regions by ETO and p300 in megakaryoblasts harboring the fusion compared to those lacking the fusion, whereas there were a similar number of genes bound by CtBP1 in fusion-positive and negative megakaryoblasts (Fig. [163]4C, Supplementary Data [164]3). We then looked specifically at the bound genes retained, gained, and lost in the fusion-positive megakaryoblasts (Fig. [165]4C). This demonstrated clear genome-wide shifts for all three factors in the presence of the fusion, with ETO and p300 having a greater number of sites gained compared to lost leading to a net increase of 136% and 303% targets respectively. Although CtBP1 was also found to bind new sites across the genome, there was a significant loss of CtBP1 binding to genes resulting in a net loss of 11% (Fig. [166]4C). The majority of sites gained, irrespective of the co-factor, were co-bound by the fusion (ETO 64%, CtPB1 52%, and p300 57%) (Supplementary Fig. [167]5C, Supplementary Data [168]3). Accordingly, heatmaps demonstrated significant enrichment of these co-factors at fusion-bound genes (Fig. [169]4D, Supplementary Fig. [170]5D). Collectively, this data argues in favor of the recruitment of these co-factors to new sites in the genome by the fusion itself. Fig. 4. ETO, CtBP1, and p300 associated with the CBFA2T3–GLIS2 transcriptional complex. [171]Fig. 4 [172]Open in a new tab A Co-immunoprecipitations using fusion-positive leukemic megakaryoblasts confirm the association of ETO, CtBP1, and p300 with the fusion. All immunoprecipitations were repeated twice, representative blots are shown. Source data are provided as a Source Data file. B CUT&RUN-seq has performed in fusion-positive megakaryoblasts from two secondary transplants and fusion negative megakaryoblasts cultured in vitro for ETO, CtBP1, and p300. Distribution of CUT&RUN-seq peak locations for ETO, CtBP1, and p300. CBFA2T3-GLIS2 is included as a comparison. TSS transcription start site, TTS transcription termination site, UTR untranslated region. Source data are provided as a Source Data file. C Summary of ETO, CtBP1, and p300 bound genes retained, gained, and lost in fusion-positive megakaryoblasts compared to fusion-negative megakaryoblasts. Source data are provided as a Source Data file. D Heatmaps of enrichment for CBFA2T3-GLIS2 (CG-TY), H3K27ac, ETO, CtBP1, and p300 at genes bound by the fusion. E Venn diagram showing the overlap of genes bound at the promoters of ETO, CtBP1, and p300 exclusively in fusion-positive megakaryoblasts and their overlap with fusion-bound genes (Supplementary Data [173]3). To gain insight into the impact of these co-factors on the expression of target genes, we evaluated the directionality of expression in the presence of the ETO, CtBP1, and p300. Genes bound solely by the fusion served as a baseline, with 57% of genes downregulated and 43% of genes upregulated. Directionality of expression did not appear to be significantly impacted by an association with ETO, CtBP1, or p300 alone (Supplementary Fig. [174]5E, Supplementary Data [175]3). Genes bound by all three factors, however, demonstrated an increase in the proportion of genes upregulated (65%, p = 0.0028). An evaluation of the differential expression of target gene subsets bound by the fusion and one or more of these co-factors revealed a contribution of each of these co-factors to the dysregulation of oncogenic pathways. For example, target genes associated with RUNX transcriptional regulation were co-occupied by CtBP1, whereas WNT signaling genes were co-occupied by ETO (Table [176]1). Most significantly enriched pathways (ORA Reactome analysis), irrespective of the co-factor associated with them, were upregulated with the exception of platelet activation, aggregation, and degranulation downregulation associated with ETO and p300 co-occupancy which are likely a reflection of the block in terminal differentiation (Table [177]1, Supplementary Data [178]4). The lack of significant enrichment for downregulated gene sets co-occupied by the fusion and its co-factors was surprising given the proportion of genes whose expression is decreased by the fusion and its associated co-factors. ETO and CtBP1 are well known for their repressor activity, but in the context of co-occupancy with CBFA2T3-GLIS2, our data suggests they play a major role in transformation through the transcriptional activation of key pathways associated with proliferation and development. Table 1. Impact of co-factor binding on over-representation analysis Pathway Fusion only* Fusion p300* Fusion ETO* Fusion ETO/p300* Fusion CtBP1* Signaling by TGFβ 0.004 n.s. n.s. n.s. n.s. Signaling by WNT 0.05 0.01 0.02 n.s. n.s. Signaling by hedgehog 0.05 0.04 n.s. n.s. 0.05 Signaling by NOTCH4 n.s. 0.03 n.s. n.s. n.s. Transcriptional regulation by RUNX1 n.s. n.s. n.s. n.s. 0.05 Transcriptional regulation by RUNX2 0.01 0.01 n.s. n.s. n.s. Transcriptional regulation by RUNX3 0.03 0.04 n.s. n.s. n.s. Platelet aggregation n.s. n.s. n.s. 0.004 n.s. Platelet activation n.s. n.s. n.s. 0.01 n.s. Platelet degranulation n.s. n.s. n.s. 0.01 n.s. [179]Open in a new tab n.s. not significant *Adjusted p-values calculated with a one-sided Fisher’s exact test and adjusted for multiple comparisons using the Benjamini–Hochberg correction; all pathways were upregulated with the exception of platelet aggregation, platelet activation, and platelet degranulation, which were downregulated. Loss of NHR2 disrupts ETO association and dimerization Having demonstrated binding and genomic co-occupancy of the CBFA2T3-GLIS2 fusion with ETO, CtBP1, and p300, we next sought to determine which structural components of the fusion were essential for recruitment of these co-factors to the transcriptional complex. To do this, we established a series of mutant CBFA2T3-GLIS2 constructs for in vitro and in vivo assays based on prior studies of GLIS2 and ETO family members (Supplementary Fig. [180]5A). In AML1-ETO positive AML, deletion of NHR1 disrupted p300 binding and deletion of NHR2 abrogated ETO binding^[181]31,[182]35–[183]38. We therefore predicted removal of NHR1 from the fusion sequence would lead to a partial loss of p300 due to the continued recruitment of ETO and CtBP1, whereas deletion of NHR2 would abrogate ETO binding and disrupt homo-dimerization. It has been previously demonstrated that mutation of all four PXDLS sequences in the full-length GLIS2 protein abolished CtBP1 binding^[184]38. Therefore, we predicted mutation of the two PXDLS residues retained in the fusion (DL380, 487AS) would cause a partial loss of CtPB1 as recruitment of CtBP1 to the complex can still be achieved indirectly through an association of p300 via NHR1 or ETO. To disrupt the association of p300, ETO, and CtBP1, we predicted the fusion would need to have a loss of both NHR1-2 and mutation of the PXDLS residues (Supplementary Fig. [185]5A). We went on to perform a series of co-IPs with our mutant constructs in transfected 293T cells, a well-established embryonic kidney cell line that lacks CBFA2T3-GLIS2 but expresses ETO, CTBP1, and EP300. Cells were transiently transfected for forty-eight hours with a lentiviral construct encoding N-terminal 3xFLAG tagged CBFA2T3-GLIS2 or one of the several mutant constructs outlined in Fig. [186]S5A. Consistent with the literature on ETO family members, we found that loss of NHR2 abrogated ETO binding with the fusion (Fig. [187]5A, Supplementary Fig. [188]6A). NHR2 also contributes to homo- and hetero-oligomerization. Therefore, to assess the effect of NHR2 deletion on oligomerization, we first co-transfected 293T cells with a lentiviral construct encoding N-terminal 3xTY1-tagged CBFA2T3-GLIS2 (referred to as wild type) and a lentiviral construct encoding either a wild type N-terminal 3xFLAG tagged fusion or a mutant fusion construct that lacks NHR1, NHR2, or NHR3. We found a significant reduction in, but not complete disruption of, homo-oligomerization upon NHR2 deletion, providing support for the role of NHR3 in oligomerization (Fig. [189]5B). NHR1 and NHR3 deletion did not have a major impact on the ability to dimerize with the wild type fusion. To confirm our findings, we co-transfected 293T cells with two mutant constructs. In the absence of the wild-type fusion, we again saw a reduction in homo-dimerization only when NHR2 was deleted (Fig. [190]5C). Fig. 5. Loss of NHR2 disrupts ETO association and dimerization of the CBFA2T3–GLIS2 complex and abrogates leukemogenesis in vivo. [191]Fig. 5 [192]Open in a new tab A–E Co-immunoprecipitation using 293T cells transfected with empty vector (MIG), wild-type fusion, or a fusion containing one or more of the mutations outlined in Fig. [193]4A. All immunoprecipitations were repeated twice, representative blots are shown. Source data for all blots are provided as a Source Data file. A Immunoprecipitation of ETO followed by staining with the TY-1 tag to detect CBFA2T3-GLIS2 and the reciprocal immunoprecipitation of CBFA2T3-GLIS2 via the TY-1 tag followed by staining for ETO is shown. B TY-1-tagged wild-type CBFA2T3-GLIS2 and one of three FLAG-tagged mutant constructs were co-transfected into 293T cells to evaluate the ability of the mutant constructs to dimerize with the wild-type fusion. Immunoprecipitation by TY-1 followed by staining for FLAG and the reciprocal immunoprecipitation of FLAG followed by staining for TY-1 is shown. C 293T cells were co-transfected with two mutant constructs, one FLAG-tagged and one TY-1-tagged, to verify the results shown in (B). Immunoprecipitation of TY-1-tagged NHR2 deletion mutant followed by staining for FLAG again revealed a reduction of dimerization. D TY-1-tagged wild-type and mutant fusion constructs were transfected into 293T cells. Cells were immunoprecipitated for CtBP1 followed by staining for the fusion via TY-1 and the reciprocal immunoprecipitation of the fusion followed by staining for CtBP1. E TY-1-tagged wild-type and mutant fusion constructs were transfected into 293T cells. Cells were immunoprecipitated for p300, followed by staining for the fusion via TY-1, and the reciprocal immunoprecipitation of the fusion followed by staining for p300. F, G Transplantation of immunodeficient NSG-SGM3 mice with primary megakaryoblasts transduced with the mutant fusion constructs (N = 5 per mutant construct, N = 9 wild type in panel F and 12 wild type in panel G). p < 0.0001 for NHR2 deletion and NHR1-2 (DL380,487AS) mutant constructs compared to wild type. p-values determined by log-rank test. Source data are provided as a Source Data file. In contrast to prior reports on the full-length GLIS2 protein, mutation of the PXDLS motifs retained in the fusion sequence (DL380,487AS) only led to a reduction in CtBP1 binding, not a complete abrogation^[194]38. This may be due to the recruitment of CtBP1 to the complex through p300, independent of binding to the motif. However, deletion of NHR1 and 2 in combination with the PXDLS mutations only abrogated CtBP1 association when the fusion was directly immunoprecipitated, suggesting CtBP1 is still able to indirectly associate with the transcriptional complex even in the presence of all three structural mutations (Fig. [195]5D, Supplementary Fig. [196]6B). Although IP of p300 demonstrated an association with the fusion, we were unable to show direct binding of the fusion with p300 as when we immunoprecipitated for the fusion and probed for p300 we were unable to detect any bands. This suggests that p300 may only be recruited to the fusion complex via other co-factors, such as ETO and CtBP1 (Fig. [197]5E, Supplementary Fig. [198]6C). Loss of NHR2 abrogates leukemogenesis in vivo To assess the effects of the structural mutations and the requirement for these co-factors on leukemogenesis in vivo, we transplanted NSG-SGM3 mice with megakaryoblasts transduced with either wild-type CBFA2T3-GLIS2 (N = 13) or CBFA2T3-GLIS2 carrying a deletion of NHR1 (N = 5), NHR2 (N = 5), mutation of the PXDLS motif (DL380,487AS) (N = 5), or all mutations in combination (N = 5) (Fig. [199]5F, G, Supplementary Data [200]1). Loss of NHR2 abrogated leukemogenesis in vivo when monitored for over 250 days (p < 0.0001, Fig. [201]5F, Supplementary Data [202]1), whereas deletion of NHR1 or mutation of the PXDLS motif did not have an effect on leukemia-free survival (Fig. [203]5G, Supplementary Data [204]1). A previous study demonstrated an inability to engraft when CBFA2T3-GLIS2 PDX cells were transduced with NC128, an NHR2 domain competitive peptide, thereby further highlighting the importance of this domain on transformative capability^[205]7. Our demonstration of the retained ability of the fusion to dimerize, although at significantly reduced levels, even when NHR2 is deleted (Fig. [206]5B, C), suggests the loss of leukemogenesis in NHR2 deletion mutants cannot be solely due to the loss of oligomerization, arguing in favor of a role for ETO in transformation (Fig. [207]5A, Supplementary Fig. [208]6A). We sought to investigate genome-wide binding and gene expression changes upon deletion of NHR2 to identify key pathways required for leukemogenesis that may reveal therapeutic targets. Due to the lack of expansion in vivo when NHR2 is deleted, we performed CUT&RUN-seq and RNA-sequencing on megakaryoblasts transduced with either the wild-type fusion or a fusion construct lacking NHR2 grown in vitro. Differential gene expression analysis showed 64% (826/1296) of genes were downregulated, and 36% (470/1296) were upregulated when NHR2 was deleted compared to wild-type fusion-positive megakaryoblasts (Fig. [209]6A, Supplementary Data [210]3). GSEA found an enrichment of downregulated genes involved in the JAK/STAT, NOTCH, and Hedgehog signaling pathways (Fig. [211]6A, Supplementary Data [212]4). CUT&RUN-seq revealed that genome-wide, 61% of previously bound sites were lost, and 20% of sites had a decrease in the amount of fusion binding in NHR2 deletion mutants despite the retention of all five GLIS2 zinc finger DNA binding domains (Fig. [213]6B, Supplementary Data [214]8). Loss of binding was preferentially seen at transcriptional start sites, areas where co-factor binding and oligomerization are predicted to be important (Fig. [215]6C). When assessing gene expression of targets whose binding was lost at the transcriptional start sites, we found a concordant reversal in the signature imparted by the fusion (Fig. [216]6D). 90% (67/74) of the GRN nodes were no longer bound in NHR2 deletion mutants, so we reassessed the GRN and found a dramatic shift with only 26 nodes reaching significance (Fig. [217]6E, Supplementary Data [218]9). Collectively, the data argue in favor of impaired ETO binding and oligomerization of NHR2 deletion mutants, leading to a decrease in binding at transcriptional start sites and the loss of the GRN and key leukemogenic pathways. Fig. 6. Impact of NHR2 deletion on CBFA2T3-GLIS2 transcription. [219]Fig. 6 [220]Open in a new tab A RNA extraction and gene expression profiling were performed in technical triplicate on megakaryoblasts cultured in vitro transduced with either the fusion construct with NHR2 deleted or the wild-type fusion construct. In vitro, culturing was necessary due to the lack of expansion in vivo upon NHR2 deletion. A heatmap of all differentially expressed genes is shown (N = 1296). Significantly downregulated pathways, as determined by gene set enrichment analysis, are shown. For a full list of gene sets, please see supplementary data [221]4. Log2 fold change (FC) was calculated using a generalized linear model with a negative binomial distribution in DESeq2 and p-values derived from the Wald t-test and corrected using the Benjamini–Hochberg correction. False discovery rate (FDR) was calculated using the Benjamini–Hochberg correction. B CUT&RUN-seq was performed on NHR2 deletion mutants and compared to wild-type fusion cells. The proportion of sites where binding was lost, retained, and newly bound sites is shown. Source data are provided as a Source Data file. C Location of binding at newly bound targets, retained targets, and lost targets in NHR2 deletion mutants. Source data are provided as a Source Data file. D Gene Expression changes in NHR2 deletion mutants at sites where fusion binding was lost at the transcriptional start sites (N = 4059 genes). E Gene regulatory network of NHR2 deletion mutants. To assess the therapeutic potential of several of the key pathways identified in our studies and others, we subjected fusion-positive megakaryoblasts and cord blood-derived CD34 cells to a series of inhibitors, including cKIT (Ripretinib), JAK/STAT (Ruxolitinib), WNT (iCRT), p300 (A-485), and ETO (7.44) (Supplementary Fig. [222]7)^[223]8,[224]39–[225]41. The greatest sensitivity seen was to ruxolitinib and the p300 inhibitor A-485 with IC[50] values of 70 and 1300 nM, respectively (Supplementary Fig. [226]7). Only ruxolitinib, however, demonstrated a therapeutic window when taking into account the toxicity of the compounds on CD34 stem cells. We confirmed the hematologic toxicity of p300 loss by evaluating serial replating capacity in CBFA2T3-GLIS2-modified p300 knockout bone marrow (Supplementary Fig. [227]8). Confirming the sensitivity seen in vitro, p300 knockout abrogated colony development compared to p300 wild-type cells in the presence or absence of CBFA2T3-GLIS2 (Supplementary Fig. [228]8). Consistent with the sensitivity of our model to ruxolitinib in vitro, we have previously published the activity of this targeted inhibitor in a CBFA2T3-GLIS2 patient-derived xenograft model^[229]11. Combined, our data support a role for JAK/STAT signaling, and its potential as a therapeutic target. Discussion The majority of CBFA2T3-GLIS2-positive AMKL patients fail to be cured with the current treatment options that include chemotherapy in combination with stem cell transplant. Due to the relatively low incidence of this malignancy in the broader scope of pediatric cancer, these patients have not been afforded comprehensive research studies toward the development of new therapeutics^[230]42. We sought to contribute to the characterization of the CBFA2T3-GLIS2 transcriptional complex to enhance our understanding of the mechanism(s) of leukemogenesis and identify biologic pathways that are required in this process. Our data support a model in which the CBFA2T3-GLIS2 fusion facilitates homo-dimerization and recruits the transcriptional co-factors ETO, CtBP1, and p300 to impart differential gene expression at zinc finger bound G-C-rich motifs throughout the genome (Supplementary Fig. [231]9). Despite the large number of genes bound by the fusion, the impact of the fusion on gene expression extended far beyond fusion bound genes. This led us to map the gene regulatory network, which revealed a large number of well-known transcription factors that are dysregulated by the fusion and, in turn, affect downstream genes. While the fusion-bound genes included several associated with WNT, TGFβ, and Hedgehog signaling, NOTCH and the JAK/STAT oncogenic pathways only emerged when we looked at gene expression globally and resulted from upregulation of components of the gene regulatory network. Further, our ability to compare fusion-negative and fusion-positive megakaryoblasts allowed us to confirm the changes that were due to the presence of CBFA2T3-GLIS2 and rule out megakaryoblast lineage-specific changes. CBFA2T3 and GLIS2 are well-characterized transcription factors which led us to look at the potential role of ETO, CtBP1, and p300 co-factors on transcriptional regulation. We were able to confirm the association of these co-factors biochemically as well as co-localization at promoter sites bound by the fusion genome-wide. The generation of mutant constructs and their functional impact in vivo allowed us to narrow down the domain required for transformation to NHR2. When NHR2 is deleted, we found a significant reduction in homo-dimerization, a complete loss of ETO association, and impairment in DNA binding at transcriptional start sites. This had the greatest impact on JAK/STAT and NOTCH signaling pathways and the loss of the GRN at the gene expression level. Abrogation of the development of leukemia in vivo suggests that these pathways are important, in contrast to others that likely contribute to the transformation process, such as RUNX but were not affected by NHR2 deletion. Compilation of genomic occupancy mapping and pathway enrichment analyses in combination with NHR2 deletion gene dysregulation data has led us to propose that multiple pathways play an active role in transformation, which is disrupted when NHR2 is deleted. Thus, targeting a single pathway is not predicted to be effective therapeutically. BCL2, a downstream effector of JAK/STAT signaling and a therapeutic target of its own was found to be upregulated in fusion-positive leukemic megakaryoblasts as a result of the acquisition of a SE. The role of BCL-xL in megakaryocyte lineage survival, however, suggests a potential path of resistance to BCL2-specific inhibitors, which has recently been confirmed in CBFA2T3-GLIS2 model systems^[232]43,[233]44. In summary, we have developed a humanized murine model of AMKL that faithfully recapitulates the CBFA2T3-GLIS2 patient profile. We have used this model to show ETO, CtBP1, and p300 are associated with the transcriptional complex and play a role in gene dysregulation that is disrupted when NHR2 is deleted. Our model has the potential to be used for systematic knock-down experiments to identify absolute dependencies that can be therapeutically targeted and to aid in the pre-clinical testing of these agents. Methods Ethical regulations The research complies with all relevant ethical regulations. Human CD34+ stem cells utilized in this study were commercially purchased, de-identified units. All animal experiments were conducted under IACUC-approved protocols at one of two institutions: St. Jude Children’s Research Hospital, where “Animal Care and Use Committee” approved protocol entitled “Establishing Murine Models of Leukemia” 519-100568 (experiments conducted between 11/16/2018 and 8/25/2020); or Stanford University, where “Administrative Panel on Laboratory Animal Care” approved protocol entitled “Murine Modeling of High-Risk Pediatric Acute Leukemias” 33830 (experiments conducted between 08/25/2020 and 08/25/2023). Moribund mice were identified by the presentation of hind leg paralysis, hunched body position, labored breathing, and/or inability to eat or drink. Generation of primary human megakaryoblasts CD34+ stem cells were isolated from commercially purchased, de-identified human cord blood units (St. Louis Cord Blood Bank and Cleveland Cord Blood Bank). Cord blood was layered onto Ficoll-Paque (Fisher 17144003), and CD34+ stem cells were positively selected from the mononuclear layer using the human CD34 MicroBead kit (Miltenyi 130046702). CD34+ stem cells were cultured in GMP grade SCGM serum-free medium (CellGenix 208020500) supplemented with 1% penicillin–streptomycin and 100 ng/μl each of human TPO (Tonbo 218685), FLT3 ligand (Tonbo 218513), and SCF (PeproTech AF30007) for 48 h. Cultured CD34+ stem cells were then transduced using Retronectin (Takara Bio 62955748) seeded with lentiviral particles encoding the CBFA2T3-GLIS2 sequence. Packaged lentiviral particles were produced from transfected 293T cells (ATCC CRL-3216) using a third-generation lentiviral vector (see the section “Constructs” section), pMD2, and PAX2 packing plasmids with 2X BES (BES, NaCl, Na2PO4Heptahydrate, pH 6.9) and CaCl[2] (pH 5.5). Forty-eight hours post-transduction, cells were removed from the RetroNectin and differentiated towards the megakaryoblast lineage in GMP SCGM serum-free medium supplemented with 1% penicillin–streptomycin and 100 ng/ml human TPO and 10 ng/ml human IL1β (Tonbo 218018) for 7–9 days which allows for approximately 30-fold expansion^[234]45. Megakaryoblasts were confirmed by flow cytometry analysis. Cells were pelleted, washed with PBS, blocked with human Fc block (BD 564220) for 10 min at room temperature, and then stained for CD41 (human CD41-PE Life Technologies MHCD4104 clone VIPL3), CD34 (human CD34-APC Miltenyi 130113176 clone AC136), and viability with Ghost Dye Violet 510 (Tonbo 130870) for 20 min in the dark at room temperature. Cells were washed twice with FACS buffer (PBS supplemented with 1% FBS and 0.1 mM EDTA) and then analyzed on a BD Fortessa (St. Jude) or a Cytek DxP10 (Stanford). Megakaryoblasts were then sorted for GFP using fluorescence-activated cell sorting on a BD Biosciences Aria to obtain a pure GFP-positive population for downstream applications such as transplantation, RNA-sequencing, and CUT&RUN-sequencing. Transfection efficiencies averaged 15–30%. Constructs Overlap extension PCR was used to clone the N-terminal 3xFLAG tag CBFA2T3-GLIS2 coding sequence into the pCDH.MSCV.MCS.EF1acopGFP (pCDH) lentivirus backbone and N-terminal 3xTY1 tag CBFA2T3-GLIS2 coding sequence into the CSI.pEF1a.IRES.EmGFP (CEI) lentivirus backbone. NHR deletions and PXDLS motif mutations were made by site-directed mutagenesis using the PfuUltra high-fidelity DNA polymerase kit (Agilent 600380) as per the manufacturer’s instructions for targets <10 kb vector DNA. 250 ng of input DNA was amplified via the following PCR program: Step 1 95 °C for 1 min, Step 2 95 °C for 40 s, Step 3 64 °C for 50 s, Step 4 68 °C for 15 min, repeat steps 2–4 16 times, Step 5 72 °C for 15 min. The following primers were used (5’–3’): NHR1△ fw: CAGCACCTGCCCCCAGCCTGCGGGCTGCTGGACGCCAGCGCCTCCTCCCCC NHR1△ rv: GGGGGAGGAGGCGCTGGCGTCCAGCAGCCCGCAGGCTGGGGGCAGGTGCTG NHR2△ fw: CGGCAGGAAGAAGTGATCGACCACAAGGACACAAAGAAGGGCCCCGCTCCC NHR2△ rv: GGGAGCGGGGCCCTTCTTTGTGTCCTTGTGGTCGATCACTTCTTCCTGCCG NHR3△ fw: ACCCTCACCGGCTACGTGCCTGAGG/CGAAGCGGCAGGCCTCCGAGGAC NHR3△ rv: GTCCTCGGAGGCCTGCCGCTTCGCCTCAGGCACGTAGCCGGTGAGGGT DL380AS fw: CCTGCTGCCAGGCACCGTGCTGGCCTCGTCCACGGGCGTCAACTCAGCTG DL380AS rv: CAGCTGAGTTGACGCCCGTGGACGAGGCCAGCACGGTGCCTGGCAGCAGG DL487AS fw: ACCCCTGGCCCCCGGCCCCTTGCCTCCAGTGCCCTGGCCTGTGGCAACG DL487AS rv: CGTTGCCACAGGCCAGGGCACTGGAGGCAAGGGGGCCGGGGGCCAGGGGT Plasmids generated in this study are available from the corresponding author. Humanized murine model Immunodeficient NSG-SGM3 (NOD.Cg-Prkdc[scid] Il2rg[tm1Wjl] ITg(CMV-IL3, CSF2, KITLG) 1Eav/MloySzJ) breeder mice were purchased from the Jackson Laboratory (Strain 013062) and bred in-house maintained on the NOD.Cg-Prkdscid Il2rgtm1Wjl/SzJ background. Engraftment was found to be equal between male and female mice, so both sexes were utilized; see Supplementary Data [235]1 for the sex of each mouse reported in this study. Mice were treated with prophylaxis Baytril and Amoxicillin to minimize the contraction of bacterial infections. Both male and female mice four to six weeks of age were pre-conditioned with 200 rad (cesium irradiator at St. Jude, x-ray irradiator at Stanford) 24 h prior to megakaryoblast transplantation. Transplantations were done either via tail vein or intrafemoral injection. There was no difference in disease onset and phenotype with either method or based on the gender of recipient mice. All animal work was conducted under IACUC protocols: St. Jude Children’s Research Hospital “Establishing Murine Models of Leukemia” 519-100568 (11/16/2018–11/16/2021); Stanford University “Murine Modeling of High-Risk Pediatric Acute Leukemias” 33830 (08/25/2020–08/25/2023). Moribund mice were identified by the presentation of hind leg paralysis, hunched body position, labored breathing, and/or inability to eat or drink. Spleen and bone marrow cells from both hind legs were collected. Red blood cells were lysed with RBC lysis buffer (Invitrogen 501129743) according to the manufacturer’s instructions. Mouse CD45 cells were depleted using mouse CD45 microbeads (Miltenyi 130052301) according to the manufacturer’s instructions. Mouse CD45-negative cells were stained for mouse CD45 (BioLegend 103132 clone 30-F11) and for viability with Ghost Dye Violet 510 (Tonbo 130870) as described above and analyzed for GFP purity by flow cytometry to confirm recovery of transplanted, fusion-positive megakaryoblasts. Purified cells were used for downstream applications, including RNA-seq, CUT&RUN-seq, and co-IP/Western. Full-body necropsies with histology (hematoxylin and eosin stain) were also performed to assess the engraftment of transplanted megakaryoblasts. RT-PCR The iScript cDNA synthesis kit (Bio-Rad 1708890) was used to generate cDNA from primary human megakaryoblasts according to the manufacturer’s instructions. The Advantage 2 PCR Enzyme System (Takara Bio USA 639207) was used to amplify the fusion junction DNA sequence using the following primer sequences: forward (5’CATCTGGAGGAAGGCTGA3’) and reverse (5’CGGTTGTGGATCTTCAGGT3’). RNA-sequencing RNA was extracted using TRIzol reagent (Thermo Fisher 155960266) according to the manufacturer’s instructions, with all volumes divided in half due to low input (50,000–500,000 cells). Low input library prep (Ovation/NEB) and sequencing on the Illumina NovaSeq6000 (S4) PE150 sequencer were performed. Raw paired-end FASTQ files were filtered to remove low-quality reads and read with adaptor contamination. STAR v2.7.la was used to align clean reads to the human genome (hg138) and to generate gene counts. STAR output for each sample was combined for differential expression testing using DESeq2_1.32.0R package installed in R v4.1 using default settings. PCA was performed on rlog-transformed data (for sample size <20) or vst-transformed (for sample size ≥ 20) using DESeq2. Gene set enrichment analysis was performed using GSEA v4.2.2^[236]17 with parameters recommended for sample size. For pathway enrichment analysis on subsets of genes, over-representation analysis was performed using biomaRt_2.50.3, ReactomePA1_1.38.0, and clusterProfiler_4.2.2R packages. Pearson correlation was calculated using Intensity raw data that was screened for missing values for exclusion before log2 transformation in R. Data was then subset into Transplant and Patient groups, and then rowMean was taken by ID in each group. Cross-correlation plots were generated and correlation was measured using the Pearson method. Mass spectrometry-based proteomics 100 ug of protein from 10 million normal primary megakaryoblasts cultured in vitro (N = 2), recovered transplanted megakaryoblasts (N = 2), and CBFA2T3-GLIS2 patient samples (N = 4) were analyzed in technical replicate by tandem mass tag (TMT) mass spectrometry. Proteomic analysis was performed with a protocol of multiplexed tandem mass tag labeling, two-dimensional liquid chromatography and tandem mass spectrometry (TMT–LC/LC–MS/MS)^[237]46. TMT 10-plex isobaric label reagent set (Thermo Scientific) was used for proteome quantification of 10 samples simultaneously. 100 μg of protein per sample was used as starting material. Cells were lysed using a buffer that contains 50 mM HEPES, pH 8.5, 8 M urea, and 0.5% sodium deoxycholate. The lysate was split into three aliquots: one to measure the protein concentration, one for a positive control validation of BMP2 by western blotting, and one for proteomic analysis. Protein concentration of sample lysates was quantified by BCA protein assay (Thermo Fisher Scientific) with titrated BSA as a standard. ∼100 μg proteins per sample were first digested with Lys-C (Wako, 1:100 w/w) at room temperature for 2 h, diluted 4 times with 50 mM HEPES, pH 8.5, and then further digested with trypsin (Promega, 1:100 w/w) overnight at room temperature. 1% trifluoroacetic acid was added to quench the digestion reaction, followed by desalting with Sep-Pak C18 cartridge (Waters), and the desalted peptides were dried by speedvac. Samples were then resuspended in 50 mM HEPES, pH 8.5, and were labeled with 10-plex TMT reagents following the manufacturer’s instructions. Lastly, 10 isobaric labeled samples were pooled together with equal amount, desalted again by Sep-Pak C18 cartridge and then speedvac dried. TMT-labeled and unlabeled samples were verified by LC–MS/MS. The basic pH reverse phase liquid chromatography peptides pre-fractionation was performed on the Agilent 1220 LC system as previously described^[238]47. The analysis was carried out based on the St. Jude proteomics core optimized platform as previously described^[239]47,[240]48. The dried peptide fractions were reconstituted in loading buffer (5% formic acid), loaded on a reverse phase column (75 μm × 50 cm, 1.9 μm C18 resin (Dr. Maisch GmbH, Germany)), and interfaced with a FUSION or Q Exactive HF mass spectrometer (Thermo Fisher Scientific). Peptides were eluted up to 6 h by a 15–65% gradient of buffer B. (buffer A: 0.2% formic acid, 5% DMSO; buffer B: buffer A plus 65% acetonitrile, flow rate: 0.25 μl/min). A butterfly portfolio heater (Phoenix S&T) was applied to heat the column at 65 °C to reduce back pressure. The mass spectrometer was operated in data-dependent mode with MS1 settings of 60,000 resolution, 1 × 10^6 AGC target and 50 ms maximal ion time and top 20 MS/MS high-resolution scans with MS2 settings of 1m/z isolation window with offset 0.2, 60,000 resolution, 100 ms maximal ion time, 1 × 10^5 AGC target, HCD, 33 normalized collision energy, and 40 s dynamic exclusion (35 normalized collision energy and 20 s dynamic exclusion for phosphoproteome). Peptide identification was performed using the JUMP search engine^[241]49. Commercially available database search engines can be divided into two categories: tag-based De novo sequencing (e.g. PEAKS with limited sensitivity) and pattern-based database search (e.g. SEQUEST, MASCOT). The JUMP software integrates these two methods to score putative peptides, showing significant improvements compared with these commercially available tools11. The JUMP software has previously been utilized in numerous publications^[242]50. Analysis was done similarly as previously described, MS/MS raw files were first converted into mzXML format and searched against a composite target/decoy database FDR estimation^[243]48. The target protein database was compiled from the UniProt mouse and human database (Human database: 88,965 protein entries; Mouse database: 52,738 protein entries, downloaded in February 2015), and the decoy database was generated by reversing target protein sequences. Spectra were searched with ±10 ppm mass tolerance for both precursor ions and product ions with full tryptic restriction, static modification for TMT tag on N-terminus and lysine (+229.16293), dynamic modification for serine, threonine and tyrosine (+79.96633, for phosphoproteome analysis), three maximal modification sites, two maximal missed cleavages, and the assignments of a, b, and y ions. Peptide spectrum matches (PSM) were first filtered by MS mass accuracy (∼2 ppm, ±4 standard deviations). PSMs of doubly charged peptides with a JUMP Jscore of >30 were applied for global mass recalibration prior to the filtering. The survived PSMs were first grouped by precursor ion charge state and then further filtered by Jscore and dJn values. Cutoffs were applied to these values and were adjusted until a protein FDR < 1% was achieved. If one peptide was shared by multiple proteins, the protein with the highest PSM will represent the peptide according to the rule of parsimony^[244]46. For quantitation, an analysis was carried out in the following steps similarly as previously reported^[245]48. (i) TMT reporter ion intensities of each PSM were extracted; (ii) the raw intensities were corrected according to the isotopic distribution of each labeling reagent; (iii) PSMs with very low reporter ion intensities were excluded (e.g. minimum intensity <1000 and median intensity <5000); (iv) sample loading bias was corrected by normalization with the trimmed median intensity of all PSMs; (v) the mean-centered intensities across samples were calculated; (vi) protein or phosphosite relative intensities were summarized by averaging related PSMs; (vii) protein or phosphosite absolute intensities were derived by multiplying the relative intensities by the grand-mean intensity of the top three most highly abundant PSMs^[246]48. Protein lysates were digested and desalted and then underwent 16-plex TMT labeling. Samples were analyzed for total proteome by low pH reverse-phase liquid chromatography–tandem mass spectrometry (LC–MS/MS). Retrieved data was searched against the UniProt human database to identify peptides. This was determined by assigning related mass addition to all possible amino acid residues followed by an algorithm that evaluates the accuracy of modification localization. Peptide quantification data was corrected for mixing errors and summarized to derive protein quantification results. Using this approach, about 10,000 proteins are analyzed, which provides a comprehensive proteome analysis. Spearman’s correlation co-efficient For the paired secondary (N = 1) and tertiary (N = 1) transplant cells that underwent RNA-sequencing, transcript per million (TPM) values for protein-coding genes were log-transformed. Lowly expressed genes were filtered out as background noise (>1 TPM in 50% or more samples). For the transplant cells that underwent proteomics analysis, variance stabilizing normalization (VSN) was applied to the protein intensity data and replicates were combined by mean protein intensity value. 3525 overlapping genes between RNA-Sequencing and proteomics data were used in Spearman’s correlation analysis (measured as Spearman’s rho). Stringtie (v2.1.4) was used to convert BAM files to TPM values and R (v4.1.2) was used for the remaining analysis. CUT&RUN-sequencing 500,000 megakaryoblasts recovered from transplantations were cross-linked with 0.1% formaldehyde for one minute at room temperature. The reaction was quenched with 2.5 M glycine. Cross-linked cells were then used with the CUTANA ChIC/CUT&RUN kit (EpiCypher 14-1048) per the manufacturer’s instructions. Cells were washed twice with Wash Buffer (pre-wash buffer supplemented with 1% Triton-X and 0.05% SDS) and then bound to ConA beads. Cell-bound beads were collected and re-suspended in Antibody Buffer and 0.5 ug of one of the following antibodies: TY1 (GenScript A01004-40), ETO (Bethyl A303-509A), CtBP1 (ThermoFisher 10972-1-AP), p300 (ThermoFisher PA1848), H3K27ac (Active Motif 39034), and rabbit IgG (Epicypher 13-0042). Beads were incubated at 4 °C overnight on a nutator. The next day, beads were collected and washed twice with a cold Cell Permeabilization Buffer followed by incubation with pAG-MNase. Beads were collected and washed twice again with Cell Permeabilization Buffer and then 100 mM of calcium chloride was added on ice to activate the enzymatic reaction. Beads were incubated at 4 °C for two hours on a nutator. After the incubation, Stop Buffer was added, and the bound protein-DNA complexes were eluted off the beads at 37 °C. The recovered protein-DNA was collected and de-crosslinked by adding SDS and Proteinase K, followed by an overnight incubation at 55 °C. The next day, DNA was purified using the provided DNA Cleanup Columns and DNA Binding and Wash buffer reagents. DNA concentration was measured using the Qubit dsDNA HS kit (Fisher [247]Q32851). Library prep (NEBNext/ChIPSeq) and sequencing on Illumina NovaSeq600 (s4) PE150 was performed. CUT&RUN-sequencing was performed in biological replications and analyzed with paired RNA-sequencing data. Fastq files were trimmed and aligned to the hg19 genome via Basepair ([248]https://app.basepairtech.com). Bam files were extracted and MACS v2.2.71 was used to call peaks in each sample relative to IgG control background (parameters -f BAMPE, -g 2700000000, --p-value 1e−3, --call-summits, -B and -SPMR). To build the single tracks for each sample, macs2 bdgcmp was used to generate fold-enrichment track (parameters -M FE and -p 0.00001), and resulting bedGraphs were fixed for conversion to bigwig files using bdg2bw.sh and bedGraphToBigWig method (bedtools v2.27.1). To calculate correlation between replicates before combination, multiBigWigSummary (bins and -chromosomesToSkip,”chrX”, “chrY”, “chrM”, and “random”) and plotCorrelation (-corMethod pearson – skipZeros, -- whatToPlot scatterplot and -removeOutliers) from deeptools v3.3.0 were used. Annotation of narrowPeak files was performed using Homer v4.11 and hg19 as references. In the cofactor datasets, peaks were further filtered