Abstract

   CBFA2T3-GLIS2 is a fusion oncogene found in pediatric acute
   megakaryoblastic leukemia that is associated with a poor prognosis. We
   establish a model of CBFA2T3-GLIS2 driven acute megakaryoblastic
   leukemia that allows the distinction of fusion specific changes from
   those that reflect the megakaryoblast lineage of this leukemia. Using
   this model, we map fusion genome wide binding that in turn imparts the
   characteristic transcriptional signature. A network of transcription
   factor genes bound and upregulated by the fusion are found to have
   downstream effects that result in dysregulated signaling of
   developmental pathways including NOTCH, Hedgehog, TGFβ, and WNT.
   Transcriptional regulation is mediated by homo-dimerization and binding
   of the ETO transcription factor through the nervy homology region 2.
   Loss of nerve homology region 2 abrogated the development of leukemia,
   leading to downregulation of JAK/STAT, Hedgehog, and NOTCH
   transcriptional signatures. These data contribute to the understanding
   of CBFA2T3-GLIS2 mediated leukemogenesis and identify potential
   therapeutic vulnerabilities for future studies.

   Subject terms: Acute myeloid leukaemia, Paediatric cancer, Cancer
   models, Cancer genomics
     __________________________________________________________________

   CBFA2T3-GLIS2 is a fusion oncogene found in a subset of pediatric acute
   megakaryoblastic leukemia associated with a poor prognosis. Here,
   Garfinkle et al. map the gene regulatory network of CBFA2T3-GLIS2 using
   a humanized murine model and identify signaling pathways required for
   leukemogenesis.

Introduction

   Pediatric acute megakaryoblastic leukemia (AMKL) comprises 4–15% of
   pediatric acute myeloid leukemia (AML) patients and is found in
   patients with Down syndrome (DS-AMKL) and in patients without Down
   syndrome (non-DS-AMKL)^[42]1. Children <4 years of age with Down
   syndrome have a 500-fold greater incidence of AMKL than the general
   population and have an event-free survival (EFS) approaching 90% with
   standard chemotherapy^[43]2. Malignant transformation of
   megakaryoblasts in DS-AMKL is due to trisomy 21 and somatic mutations
   in the GATA1 gene, which are rarely found in non-DS-AMKL
   patients^[44]3,[45]4. Prior to next-generation sequencing studies, the
   genomic alterations driving non-DS-AMKL were not well defined. In 2011,
   we reported a recurrent cryptic inversion event identified by RNA
   sequencing on chromosome 16 that produced the coding fusion
   CBFA2T3-GLIS2 in a subset of pediatric non-DS-AMKL
   patients^[46]5,[47]6. A follow-up study in 2017 revealed seven major
   genomic subtypes in pediatric non-DS-AMKL, five of which contain
   recurrent chromosomal translocations that code for chimeric
   transcription factors: HOX rearrangements (HOXr), RMB15-MKL1,
   NUP98-KDM5A, KMT2A rearrangements (KMT2Ar), and CBFA2T3-GLIS2^[48]4.
   CBFA2T3-GLIS2 was the most prevalent and the most aggressive of the
   fusions, with overall survival of only 14% despite maximum tolerated
   high-intensity chemotherapy and stem cell transplant in the first
   remission^[49]4. Patients harboring the CBFA2T3-GLIS2 fusion have
   significantly fewer cooperating somatic mutations compared to patients
   with other subtypes of non-DS-AMKL, suggesting CBFA2T3-GLIS2 is the
   driver of leukemogenesis^[50]5. Prior chromatin immunoprecipitation
   sequencing studies (ChIP-seq) of the CBFA2T3-GLIS2 fusion in engineered
   cell lines using a GFP tag and a CBFA2T3 antibody identified <2000
   binding sites genome-wide with only 20% of target genes differentially
   expressed and 15% carrying a GLIS2-binding motif^[51]7. A significant
   number of peaks were found to have ETS consensus binding sites, and ERG
   ChIP-seq confirmed overlapping binding in 50% of the identified
   CBFA2T3-GLIS2 target genes; however, a biochemical association between
   the fusion and ERG has not been demonstrated. An analysis of
   super-enhancers identified ETO, ERG, PDGFRA, and KIT in cell lines.
   However, neither of these studies distinguished super-enhancers
   specific to the fusion as compared to the lineage state of the
   cell^[52]7,[53]8. Given the paucity of identified genomic targets, the
   lack of differential expression in 80% of these targets, and the
   genomic complexity of cell lines that carry additional mutations that
   may interfere with and/or impact results, we sought to establish an
   optimized model system to more thoroughly understand transcriptional
   regulation by the fusion. Herein we report on the development of a
   humanized murine model to identify key transcriptional co-factors and
   dysregulated signaling pathways that contribute to transformation.
   Advantages of the model include the isolation of CBFA2T3-GLIS2 as the
   only genomic event when compared to fusion negative megakaryoblasts and
   the ability to compare and contrast results in these two populations.
   The use of cleavage under targets and release using nuclease (CUT&RUN),
   which requires a lower input and provides superior sensitivity with a
   high signal-to-noise ratio, allowing for increased resolution of genes
   targeted by the fusion in our study^[54]9. With this approach, we were
   able to comprehensively map genome-wide binding with accuracy and
   identify fusion-specific changes to enhance our understanding of the
   dysregulated signaling and developmental pathways that may be targeted
   therapeutically.

Results

CBFA2T3-GLIS2 drives the transformation of human megakaryoblasts

   Current models to characterize CBFA2T3-GLIS2 positive non-DS-AMKL are
   limited. Immortalized cell lines derived from patients harboring this
   fusion, such as MO7e, have additional acquired mutations, such as in
   TP53, that are not represented in the patient population^[55]4,[56]5.
   In contrast to other fusions found in pediatric non-DS-AMKL,
   CBFA2T3-GLIS2 gene-modified murine bone marrow cells fail to induce a
   serially transplantable leukemia in the absence of a cooperating
   mutation^[57]10,[58]11. Several CBFA2T3-GLIS2-positive patient-derived
   xenograft models are available; however, patient cells are difficult to
   manipulate for experiments, cell quantity is limited, and they do not
   grow well in vitro. A reliable model of CBFA2T3-GLIS2-positive AMKL
   that recapitulates primary patient samples and can be experimentally
   manipulated is needed.

   To develop a model, CD34+ stem cells were isolated from human cord
   blood and transduced with a lentiviral vector encoding N-terminal
   tagged CBFA2T3-GLIS2 and the GFP reporter gene. The transduced stem
   cells were cultured in vitro in the presence of IL1β and TPO to promote
   megakaryoblast differentiation, flow-sorted for GFP purity, and
   transplanted into NOD.scid.Il2Rγc^null immunodeficient mice with the
   IL3, CSF2, and KITLG human transgenes (NSG-SGM3)^[59]12. In vivo
   expanded and recovered cells were defined as mouse CD45 negative and
   GFP positive (Fig. [60]1A). The fusion-positive megakaryoblasts induced
   a serially transplantable leukemia within 200 days compared to empty
   vector transduced megakaryoblasts lacking the fusion (Fig. [61]1B,
   Supplementary Data [62]1). Immunophenotyping of fusion-positive
   megakaryoblasts was consistent with patient specimens as determined by
   proteomics (Supplementary Fig. [63]2). 39% (12/31) of moribund mice
   presented with hind leg paralysis, and 45% (14/31) had enlarged spleens
   (Supplementary Data [64]1). Histology revealed neoplastic infiltrates
   in the bone marrow, spleen, and central nervous system, the latter of
   which caused the hind leg paralysis due to compression of the spinal
   cord (Supplementary Fig. [65]1A, B, Supplementary Data [66]1). One
   mouse developed a hind leg tumor (Supplementary Fig. [67]1A,
   Supplementary Data [68]1), and nine mice had neoplastic infiltrates in
   the lung (Supplementary Fig. [69]1B, Supplementary Data [70]1). The
   latency of disease and moribund phenotype were consistent across serial
   transplantations, suggesting the model is genomically stable
   (Fig. [71]1B, Supplementary Data [72]1). Expression of the
   CBFA2T3-GLIS2 fusion gene was confirmed through RT-PCR (Supplementary
   Fig. [73]1C) and Western blot (Supplementary Fig. [74]1D).

Fig. 1. CBFA2T3-GLIS2 drives the transformation of primary human
megakaryoblasts.

   [75]Fig. 1
   [76]Open in a new tab

   A Generation of CBFA2T3-GLIS2-positive primary human megakaryoblasts.
   CD34+ stem cells were isolated from human cord blood, transduced with a
   lentivirus encoding the CBFA2T3-GLIS2 coding sequence and a GFP
   reporter, followed by differentiation to megakaryoblasts with human TPO
   and IL1β for 7–9 days. Cultures were sorted for GFP+ purity and
   transplanted into immunodeficient NSG-SGM3 mice. Recovered cells from
   transplantation were depleted for mouse CD45 cells, and GFP expression
   was confirmed by flow cytometry. B Serial transplantation (primary,
   N = 6; secondary, N = 11; and tertiary, N = 14) of fusion-positive
   megakaryoblasts induce AMKL within 200 days compared to empty vector
   megakaryoblasts (N = 6) which do not expand in vivo. Leukemia-free
   survival is shown. Source data are provided as a Source Data file. C
   t-SNE analysis of the gene expression profiles of in vitro cultured
   fusion-positive megakaryoblasts (light green), primary (red), secondary
   (blue), and tertiary (dark green) transplants, fusion-positive patient
   samples (orange), and AMKL patient samples harboring a different
   genetic driver other than the CBFA2T3-GLIS2 fusion (purple). Clusters
   are based on the top 1% of differentially expressed genes.

   We have previously shown that CBFA2T3-GLIS2-positive AMKL patient
   specimens (N = 9) display a distinct gene expression profile compared
   to other AMKL genomic subsets (N = 35)^[77]4,[78]5. To understand how
   closely our model mimics the transcriptional program found in patients,
   we subjected cultured cells as well as primary, secondary, and tertiary
   leukemias to RNA sequencing (Fig. [79]1C). An average of 18,660
   transcripts were expressed with at least one read in our
   fusion-positive leukemic megakaryoblasts. The serially transplanted
   leukemic megakaryoblasts (N = 15) and fusion-positive megakaryoblasts
   cultured in vitro (N = 9) before transplantation were found to
   recapitulate the gene expression profile of fusion-positive patient
   specimens (Fig. [80]1C) with a strong correlation (R = 0.91,
   p < 2.2e−16) (Supplementary Fig. [81]1E). To assess our model on the
   translational level, tandem mass tag (TMT)-based quantitative
   proteomics was performed on cells from two transplanted leukemias and
   four CBFA2T3-GLIS2-positive patient-derived xenograft specimens^[82]13.
   Fusion-negative megakaryoblasts cultured in vitro were used as a
   control. Consistent with our transcriptome data, we found a strong
   correlation between transplanted leukemia cells and patient samples
   (R = 0.98, p < 2.2e−16) (Supplementary Fig. [83]1F, Supplementary
   Data [84]2), further supporting that this humanized murine model
   faithfully replicates primary patient samples phenotypically and can be
   accurately used to characterize the CBFA2T3-GLIS2 transcriptional
   complex. Spearman correlation of our transcriptome and proteome data
   was calculated using paired secondary (N = 1) and tertiary (N = 1)
   transplanted cells that underwent both RNA-sequencing and proteomic
   analysis^[85]14. The Spearman’s correlation coefficient was 0.46
   (Supplementary Fig. [86]1G, Supplementary Data [87]2), a moderate
   correlation that compares favorably with prior publications that have
   looked at the paired transcriptome and proteome data and suggest
   post-transcriptional modifications and translational regulation that
   can be investigated in future studies^[88]14.

CBFA2T3-GLIS2 binds at GC-rich promoter regions

   To map genome-wide binding of CBFA2T3-GLIS2 in our humanized model,
   cleavage under targets and release using nuclease with sequencing
   (CUT&RUN-seq) was done on CBFA2T3-GLIS2-positive leukemic
   megakaryoblasts from our murine model (N = 2) using an anti-TY1
   antibody to recognize the N-terminal tag engineered on our fusion
   construct (see the “Methods” section). A total of 28,076 regions were
   identified, which corresponded to 5256 unique genes bound by the fusion
   at the promoter. 92% of these genes were expressed, and 55% were
   differentially expressed when compared to fusion-negative
   megakaryoblasts (Fig. [89]2A, Supplementary Data [90]3). Although a
   significant portion of peaks were located at gene promoters (52% of
   total gene regions), 27% and 16% were bound at intron and intergenic
   regions respectively, suggesting the fusion is also regulating genes
   within introns and binding at enhancer regions located upstream of
   promoter sites (Fig. [91]2A). Of the 55% differentially expressed genes
   bound by the fusion at promoters, 57% (1517/2657) were downregulated
   and 43% (1140/2657) were upregulated (Fig. [92]2B, Supplementary
   Data [93]3). Over-representation analysis (ORA), a statistical method
   to measure the enrichment of certain pathways in a subset of data,
   revealed a significant upregulation of genes associated with the WNT,
   TGFβ, and Hedgehog signaling pathways (Fig. [94]2B, Supplementary
   Data [95]4), consistent with enriched pathways in
   CBFA2T3-GLIS2-positive patient samples as previously
   published^[96]5,[97]15,[98]16.

Fig. 2. CBFA2T3-GLIS2 binds at GC-rich promoter regions to alter gene
expression.

   [99]Fig. 2
   [100]Open in a new tab

   A Summary of CUT&RUN-seq data from fusion immunoprecipitation. The
   percentage of fusion-bound genes that are expressed and differentially
   expressed in the cell is shown. Distribution of CUT&RUN-seq peak
   locations from fusion immunoprecipitation. UTR untranslated region, TSS
   transcription start site. Source data are provided as a Source Data
   file. B RNA extraction and gene expression profiling were performed on
   three technical replicates from fusion-negative megakaryoblasts
   cultured in vitro and fusion-positive leukemic megakaryoblasts
   recovered from two secondary transplantations. A heatmap of
   differentially expressed genes bound by the fusion at promoter regions
   is shown (N = 2657). Log2 fold change (FC) was calculated using a
   generalized linear model with a negative binomial distribution in
   DESeq2 and p-values derived from the Wald t-test and corrected using
   the Benjamini–Hochberg correction. Over-representation analysis using
   the Reactome database of differentially expressed genes bound by the
   fusion. p-values were calculated using a one-sided Fisher’s exact and
   adjusted for multiple comparisons using the Benjamini–Hochberg
   correction. C Top 74 nodes from transcription factor gene regulatory
   network (GRN) analysis.

   To further characterize the altered gene expression profile of
   fusion-positive leukemic megakaryoblasts and understand the gene
   regulatory network that leads to dysregulation of genes in the absence
   of direct CBFA2T3-GLIS2 binding, we performed a global differential
   gene expression analysis comparing the two transplanted samples and the
   three fusion negative megakaryoblast samples. This analysis again
   showed differential expression with 54% of genes (4215) downregulated,
   and 46% (3590) upregulated compared to fusion negative megakaryoblasts
   (Supplementary Fig. [101]3A, Supplementary Data [102]3). In total, 68%
   (5275/7805) of dysregulated genes were not directly bound by
   CBFA2T3-GLIS2. Given the global-scale, we used gene set enrichment
   analysis (GSEA) and the Hallmark database to explore enriched pathways
   and found a consistent upregulation of WNT, TGFβ, and Hedgehog
   signaling-associated genes but also found upregulation of genes in the
   NOTCH and JAK/STAT pathways (Supplementary Fig. [103]3A, Supplementary
   Data [104]4), suggesting the fusion is dysregulating components of
   these pathways through binding of transcription factors that mediate
   these signaling pathways^[105]17–[106]19. To model the transcription
   factor gene regulatory network (GRN), we utilized GeneNetworkBuilder, a
   pipeline for building regulatory networks using genome-wide binding and
   gene expression data^[107]20. The top 74 nodes of this GRN when ranked
   by degree centrality (number of connections to and from a node),
   include transcription factors involved in JAK/STAT signaling (STAT2,
   STAT4, STAT6, JUN, JUNB), canonical and non-canonical WNT signaling
   (TCF7L2, NFATC3, NFYA, JUN), TGFβ (SMAD2, TGIF1, ZEB1), NOTCH (RBPJ),
   RUNX (RUNX1, RUNX3), and megakaryocyte development (IRF2, FLI1, GFI1B)
   pathways (Fig. [108]2C Supplementary Fig. [109]3B–G, Supplementary
   Data [110]5). We did not identify any nodes related to the Hedgehog
   signaling pathway, consistent with the absence of binding of
   CBFA2T3-GLIS2 to the GLI1-3 genes and the absence of GLIS1-3 in the
   GeneNetworkBuilder algorithm (Supplementary Data [111]3). Several
   transcription factors that are downstream effectors of multiple
   pathways were found, including JUN, which has been reported to function
   downstream of JAK/STAT, TGFβ, and WNT signaling. Additionally, JUN
   regulates the expression of members of the RUNX family, including RUNX2
   and RUNX3, which play a role in leukemic progression, underscoring the
   significant cross-talk seen within the transcriptional network mediated
   by CBFA2T3-GLIS2 (Fig. [112]2C)^[113]21. Although the network was
   dominated by oncogenic and developmental transcription factors, a
   significant central node, IRF2, was identified as part of the
   megakaryocyte development pathway (Supplementary Fig. [114]3G). While
   IRF2 is primarily known as an antagonist of the tumor suppressor IRF1,
   studies have also shown that IRF2 expression in myeloid progenitor
   cells leads to reprogramming towards the megakaryocytic lineage and
   enables them to respond to thrombopoietin^[115]22,[116]23.
   Collectively, this analysis provides a map of the direct and indirect
   effects of CBFA2T3-GLIS2 expression in megakaryoblasts.

   Given that the fusion retains the five highly conserved DNA-binding
   zinc fingers of full-length GLIS2, we hypothesized a majority of genes
   bound by the fusion would contain a G-C-rich GLIS or the closely
   related GLI transcription factor consensus sequence. Homer's known
   motif search was used to identify the top represented consensus
   sequences from the promoter peak regions (filtered set—10,619) ranging
   in size from 34 to 11,355 base pairs associated with the 5256
   fusion-bound genes. Consistent with previously published data in the
   MO7e cell line, ETS, GATA, and RUNX consensus sequences were highly
   represented (Supplementary Fig. [117]3H)^[118]7. Although we did not
   find a GLIS or GLI motif, we did identify the G-C-rich SP1 motif. To
   compare the SP1 motif to the GLI superfamily sequences, we performed a
   neighbor-joining analysis which revealed the SP1 consensus sequence
   clusters closely with the GLIS1–2 and GLIS1–4 consensus sequences
   (Supplementary Fig. [119]3I)^[120]24. We hypothesized the lack of GLI
   motifs in the Homer search was due to a lack of representation of GLI
   and GLIS consensus sequences in the database. Therefore, to focus the
   search on the GLI superfamily of transcription factors that share a
   high degree of homology in the DNA-binding zinc finger domains, a find
   individual motif occurrences (FIMO) search was used to determine the
   abundance of the GLIS and GLI family motifs in our data^[121]25. This
   custom search revealed the GLIS1–2 consensus sequence (DCWHNGTGGGGGGTV)
   was represented in 39.8% of the promoter regions, suggesting this is
   the preferential binding sequence. GLIS1–4 (SNSYSYRGGRGGBV) was also
   represented in 23.6% of regions and SP1 (NRGGGGCGGGGCBDGGVSSSGV) in
   36.5% (Supplementary Fig. [122]2J). One or more GLIS1–2, GLIS1–4, and
   SP1 sites overlapped with fusion-bound sites in 77.5% (4072/5256),
   77.8% (4089/5256), and 81% (4246/5256) unique genes, respectively
   (Supplementary Fig. [123]3K, Supplementary Data [124]6). Fusion
   preferential binding for GLIS consensus sequences is supported by
   Vasanth et al. who reported full-length GLIS2 binds GLIS binding
   sequences with a stronger affinity than GLI binding sequences^[125]26.

CBFA2T3-GLIS2 alters super-enhancers

   To define the super-enhancer landscape in fusion-positive leukemic
   megakaryoblasts, we first mapped H3K27ac occupancy in fusion-negative
   and fusion-positive leukemic megakaryoblasts using CUT&RUN-seq. We
   identified 1537 new H3K27ac peaks in fusion-positive leukemic
   megakaryoblasts, 56% of which overlap with fusion-bound genes
   (Supplementary Fig. [126]4A, Supplementary Data [127]3). The ranking of
   super-enhancers (ROSE) algorithm was utilized to identify
   super-enhancers (SE) in both fusion-negative and fusion-positive
   leukemic megakaryoblasts. The signals from two biological replicates
   were averaged together (individual replicates are shown in
   Supplementary Fig. [128]3B, C) and identified 263 and 236 unique SE
   restricted to fusion-negative and fusion-positive leukemic
   megakaryoblasts, respectively. 74% of SE regions unique to the
   fusion-positive cells were bound by the fusion (Fig. [129]3A, B,
   Supplementary Data [130]7). As expected, a majority (61%) of the SE
   identified in fusion-negative megakaryoblasts led to upregulation of
   their associated genes (Fig. [131]3C). Similarly, a majority (60%) of
   the SE identified in fusion-positive leukemic megakaryoblasts were
   upregulated when compared to fusion negative megakaryoblasts
   (Fig. [132]3C) and this pattern held true when restricted to
   fusion-positive SE that were also bound by the fusion (Fig. [133]3C).

Fig. 3. CBFA2T3-GLIS2 alters super-enhancers.

   [134]Fig. 3
   [135]Open in a new tab

   A Averaged signal-ranking plots of H3K27ac bound genes were identified
   from CUT&RUN-seq in two fusion-negative megakaryoblast samples and two
   fusion-positive megakaryoblast samples from secondary transplants.
   Super-enhancer (SE) cut-off values are displayed on the graphs. Source
   data are provided as a Source Data file. B Venn diagram showing the
   overlap between SEs in fusion-negative megakaryoblasts (blue) and
   fusion-positive megakaryoblasts (red). C Heatmaps of differentially
   expressed SEs in fusion-negative megakaryoblasts (N = 222),
   fusion-positive megakaryoblasts (N = 187), and fusion-positive
   megakaryoblasts that are also bound by the fusion (N = 163). D H3K27ac
   binding profiles, fusion binding profiles, and corresponding
   RNA-sequencing tracks at the TESC, BCL2, and TCF7L2 SE. The TESC SE,
   essential for megakaryoblast differentiation, is lost in
   fusion-positive megakaryoblasts. BCL2, an anti-apoptotic gene, and
   TCF7L2, a member of the WNT signaling pathway, are upregulated
   super-enhancers exclusive to fusion-positive megakaryoblasts (generated
   using SparK v2.6.2)^[136]56.

   The 236 unique SE restricted to fusion-positive leukemic
   megakaryoblasts included gene targets related to pro-survival,
   hematopoietic differentiation, and developmental pathways (Fig. [137]3,
   Supplementary Data [138]7). Consistent with prior reports, we found SE
   associated with the BCL2 and KIT genes^[139]7. 171 SE were present that
   have not been previously reported, including TCF7L2, which encodes a
   transcription factor that plays a role in the regulation of the WNT
   signaling pathway (Fig. [140]3A-D, Supplementary
   Data [141]7)^[142]7,[143]8. TCF7L2 has also been shown to be required
   for the survival and proliferation of AML cells^[144]27. Consistent
   with a block in differentiation, we found the TESC SE present in
   fusion-negative megakaryoblasts to be lost and expression of this gene,
   which is required for megakaryoblast differentiation, downregulated in
   fusion-positive leukemic megakaryoblasts (Fig. [145]3D)^[146]28. Thus,
   a subset of genes are dysregulated through SE gains and losses in
   addition to regulation provided via promoter binding.

ETO, CtBP1, and p300 associate with the CBFA2T3-GLIS2

   The CBFA2T3 portion of the fusion retains three highly conserved nervy
   homology regions (NHR). NHR1 is a binding site for p300, a
   transcriptional co-activator and histone acetyltransferase^[147]29.
   NHR2 contains a hydrophobic heptad repeat region that plays a key role
   in homo- and hetero-oligomerization with the closely related
   transcriptional repressor ETO (also known as RUNX1T1 and
   CBFA2T1)^[148]30. ETO can also bind p300 thereby suggesting both
   transcriptional repression and activation can be altered through the
   hetero-oligomerization of ETO via NHR2^[149]31. While NHR3 does not
   contain any highly conserved domains and is not well characterized, it
   is suggested to support oligomerization^[150]30. In addition to the
   five DNA binding zinc-fingers retained in the GLIS2 portion of the
   fusion, there is one PXDLS consensus sequence and one degenerate
   sequence capable of binding CtBP1, a transcriptional
   repressor^[151]32,[152]33. p300, like many histone acetyltransferases,
   also contains a highly conserved PXDLS sequence in the center of its
   bromodomain that can bind to CtBP1, which, in turn, represses the
   transactivation activity of p300^[153]34. Based on these data, we
   hypothesized the CBFA2T3-GLIS2 fusion is able to recruit and associate
   with ETO, CtBP1, p300 and homo-dimerize to bi-directionally alter gene
   expression at fusion-bound genes (Supplementary Fig. [154]5A).

   To establish an association of ETO, CtBP1, and p300 with the fusion we
   conducted co-immunoprecipitation (co-IP) experiments in CBFA2T3-GLIS2
   leukemic megakaryoblasts from our murine model (Fig. [155]4A,
   Supplementary Fig. [156]5B). Immunoprecipitation of ETO, CtBP1 and p300
   followed by blotting for the fusion using the 3x-TY1 tag confirmed our
   predicted associations (Fig. [157]4A, Supplementary Fig. [158]5B). We
   then performed CUT&RUN-seq for ETO, CtBP1, and p300 in transplanted
   leukemic megakaryoblasts and in vitro cultured fusion negative
   megakaryoblasts to map co-occupancy with the fusion at target genes
   (Fig. [159]4B–D, Supplementary Data [160]3) and assess genome-wide
   shifts in the binding of these factors in the presence of the fusion
   (Fig. [161]4C, Supplementary Data [162]3). Overall, we found an
   increase in the absolute number of unique genes bound at promotor
   regions by ETO and p300 in megakaryoblasts harboring the fusion
   compared to those lacking the fusion, whereas there were a similar
   number of genes bound by CtBP1 in fusion-positive and negative
   megakaryoblasts (Fig. [163]4C, Supplementary Data [164]3). We then
   looked specifically at the bound genes retained, gained, and lost in
   the fusion-positive megakaryoblasts (Fig. [165]4C). This demonstrated
   clear genome-wide shifts for all three factors in the presence of the
   fusion, with ETO and p300 having a greater number of sites gained
   compared to lost leading to a net increase of 136% and 303% targets
   respectively. Although CtBP1 was also found to bind new sites across
   the genome, there was a significant loss of CtBP1 binding to genes
   resulting in a net loss of 11% (Fig. [166]4C). The majority of sites
   gained, irrespective of the co-factor, were co-bound by the fusion (ETO
   64%, CtPB1 52%, and p300 57%) (Supplementary Fig. [167]5C,
   Supplementary Data [168]3). Accordingly, heatmaps demonstrated
   significant enrichment of these co-factors at fusion-bound genes
   (Fig. [169]4D, Supplementary Fig. [170]5D). Collectively, this data
   argues in favor of the recruitment of these co-factors to new sites in
   the genome by the fusion itself.

Fig. 4. ETO, CtBP1, and p300 associated with the CBFA2T3–GLIS2
transcriptional complex.

   [171]Fig. 4
   [172]Open in a new tab

   A Co-immunoprecipitations using fusion-positive leukemic
   megakaryoblasts confirm the association of ETO, CtBP1, and p300 with
   the fusion. All immunoprecipitations were repeated twice,
   representative blots are shown. Source data are provided as a Source
   Data file. B CUT&RUN-seq has performed in fusion-positive
   megakaryoblasts from two secondary transplants and fusion negative
   megakaryoblasts cultured in vitro for ETO, CtBP1, and p300.
   Distribution of CUT&RUN-seq peak locations for ETO, CtBP1, and p300.
   CBFA2T3-GLIS2 is included as a comparison. TSS transcription start
   site, TTS transcription termination site, UTR untranslated region.
   Source data are provided as a Source Data file. C Summary of ETO,
   CtBP1, and p300 bound genes retained, gained, and lost in
   fusion-positive megakaryoblasts compared to fusion-negative
   megakaryoblasts. Source data are provided as a Source Data file. D
   Heatmaps of enrichment for CBFA2T3-GLIS2 (CG-TY), H3K27ac, ETO, CtBP1,
   and p300 at genes bound by the fusion. E Venn diagram showing the
   overlap of genes bound at the promoters of ETO, CtBP1, and p300
   exclusively in fusion-positive megakaryoblasts and their overlap with
   fusion-bound genes (Supplementary Data [173]3).

   To gain insight into the impact of these co-factors on the expression
   of target genes, we evaluated the directionality of expression in the
   presence of the ETO, CtBP1, and p300. Genes bound solely by the fusion
   served as a baseline, with 57% of genes downregulated and 43% of genes
   upregulated. Directionality of expression did not appear to be
   significantly impacted by an association with ETO, CtBP1, or p300 alone
   (Supplementary Fig. [174]5E, Supplementary Data [175]3). Genes bound by
   all three factors, however, demonstrated an increase in the proportion
   of genes upregulated (65%, p = 0.0028). An evaluation of the
   differential expression of target gene subsets bound by the fusion and
   one or more of these co-factors revealed a contribution of each of
   these co-factors to the dysregulation of oncogenic pathways. For
   example, target genes associated with RUNX transcriptional regulation
   were co-occupied by CtBP1, whereas WNT signaling genes were co-occupied
   by ETO (Table [176]1). Most significantly enriched pathways (ORA
   Reactome analysis), irrespective of the co-factor associated with them,
   were upregulated with the exception of platelet activation,
   aggregation, and degranulation downregulation associated with ETO and
   p300 co-occupancy which are likely a reflection of the block in
   terminal differentiation (Table [177]1, Supplementary Data [178]4). The
   lack of significant enrichment for downregulated gene sets co-occupied
   by the fusion and its co-factors was surprising given the proportion of
   genes whose expression is decreased by the fusion and its associated
   co-factors. ETO and CtBP1 are well known for their repressor activity,
   but in the context of co-occupancy with CBFA2T3-GLIS2, our data
   suggests they play a major role in transformation through the
   transcriptional activation of key pathways associated with
   proliferation and development.

Table 1.

   Impact of co-factor binding on over-representation analysis
   Pathway Fusion only* Fusion p300* Fusion ETO* Fusion ETO/p300* Fusion
   CtBP1*
   Signaling by TGFβ 0.004 n.s. n.s. n.s. n.s.
   Signaling by WNT 0.05 0.01 0.02 n.s. n.s.
   Signaling by hedgehog 0.05 0.04 n.s. n.s. 0.05
   Signaling by NOTCH4 n.s. 0.03 n.s. n.s. n.s.
   Transcriptional regulation by RUNX1 n.s. n.s. n.s. n.s. 0.05
   Transcriptional regulation by RUNX2 0.01 0.01 n.s. n.s. n.s.
   Transcriptional regulation by RUNX3 0.03 0.04 n.s. n.s. n.s.
   Platelet aggregation n.s. n.s. n.s. 0.004 n.s.
   Platelet activation n.s. n.s. n.s. 0.01 n.s.
   Platelet degranulation n.s. n.s. n.s. 0.01 n.s.
   [179]Open in a new tab

   n.s. not significant

   *Adjusted p-values calculated with a one-sided Fisher’s exact test and
   adjusted for multiple comparisons using the Benjamini–Hochberg
   correction; all pathways were upregulated with the exception of
   platelet aggregation, platelet activation, and platelet degranulation,
   which were downregulated.

Loss of NHR2 disrupts ETO association and dimerization

   Having demonstrated binding and genomic co-occupancy of the
   CBFA2T3-GLIS2 fusion with ETO, CtBP1, and p300, we next sought to
   determine which structural components of the fusion were essential for
   recruitment of these co-factors to the transcriptional complex. To do
   this, we established a series of mutant CBFA2T3-GLIS2 constructs for in
   vitro and in vivo assays based on prior studies of GLIS2 and ETO family
   members (Supplementary Fig. [180]5A). In AML1-ETO positive AML,
   deletion of NHR1 disrupted p300 binding and deletion of NHR2 abrogated
   ETO binding^[181]31,[182]35–[183]38. We therefore predicted removal of
   NHR1 from the fusion sequence would lead to a partial loss of p300 due
   to the continued recruitment of ETO and CtBP1, whereas deletion of NHR2
   would abrogate ETO binding and disrupt homo-dimerization. It has been
   previously demonstrated that mutation of all four PXDLS sequences in
   the full-length GLIS2 protein abolished CtBP1 binding^[184]38.
   Therefore, we predicted mutation of the two PXDLS residues retained in
   the fusion (DL380, 487AS) would cause a partial loss of CtPB1 as
   recruitment of CtBP1 to the complex can still be achieved indirectly
   through an association of p300 via NHR1 or ETO. To disrupt the
   association of p300, ETO, and CtBP1, we predicted the fusion would need
   to have a loss of both NHR1-2 and mutation of the PXDLS residues
   (Supplementary Fig. [185]5A).

   We went on to perform a series of co-IPs with our mutant constructs in
   transfected 293T cells, a well-established embryonic kidney cell line
   that lacks CBFA2T3-GLIS2 but expresses ETO, CTBP1, and EP300. Cells
   were transiently transfected for forty-eight hours with a lentiviral
   construct encoding N-terminal 3xFLAG tagged CBFA2T3-GLIS2 or one of the
   several mutant constructs outlined in Fig. [186]S5A. Consistent with
   the literature on ETO family members, we found that loss of NHR2
   abrogated ETO binding with the fusion (Fig. [187]5A, Supplementary
   Fig. [188]6A). NHR2 also contributes to homo- and
   hetero-oligomerization. Therefore, to assess the effect of NHR2
   deletion on oligomerization, we first co-transfected 293T cells with a
   lentiviral construct encoding N-terminal 3xTY1-tagged CBFA2T3-GLIS2
   (referred to as wild type) and a lentiviral construct encoding either a
   wild type N-terminal 3xFLAG tagged fusion or a mutant fusion construct
   that lacks NHR1, NHR2, or NHR3. We found a significant reduction in,
   but not complete disruption of, homo-oligomerization upon NHR2
   deletion, providing support for the role of NHR3 in oligomerization
   (Fig. [189]5B). NHR1 and NHR3 deletion did not have a major impact on
   the ability to dimerize with the wild type fusion. To confirm our
   findings, we co-transfected 293T cells with two mutant constructs. In
   the absence of the wild-type fusion, we again saw a reduction in
   homo-dimerization only when NHR2 was deleted (Fig. [190]5C).

Fig. 5. Loss of NHR2 disrupts ETO association and dimerization of the
CBFA2T3–GLIS2 complex and abrogates leukemogenesis in vivo.

   [191]Fig. 5
   [192]Open in a new tab

   A–E Co-immunoprecipitation using 293T cells transfected with empty
   vector (MIG), wild-type fusion, or a fusion containing one or more of
   the mutations outlined in Fig. [193]4A. All immunoprecipitations were
   repeated twice, representative blots are shown. Source data for all
   blots are provided as a Source Data file. A Immunoprecipitation of ETO
   followed by staining with the TY-1 tag to detect CBFA2T3-GLIS2 and the
   reciprocal immunoprecipitation of CBFA2T3-GLIS2 via the TY-1 tag
   followed by staining for ETO is shown. B TY-1-tagged wild-type
   CBFA2T3-GLIS2 and one of three FLAG-tagged mutant constructs were
   co-transfected into 293T cells to evaluate the ability of the mutant
   constructs to dimerize with the wild-type fusion. Immunoprecipitation
   by TY-1 followed by staining for FLAG and the reciprocal
   immunoprecipitation of FLAG followed by staining for TY-1 is shown. C
   293T cells were co-transfected with two mutant constructs, one
   FLAG-tagged and one TY-1-tagged, to verify the results shown in (B).
   Immunoprecipitation of TY-1-tagged NHR2 deletion mutant followed by
   staining for FLAG again revealed a reduction of dimerization. D
   TY-1-tagged wild-type and mutant fusion constructs were transfected
   into 293T cells. Cells were immunoprecipitated for CtBP1 followed by
   staining for the fusion via TY-1 and the reciprocal immunoprecipitation
   of the fusion followed by staining for CtBP1. E TY-1-tagged wild-type
   and mutant fusion constructs were transfected into 293T cells. Cells
   were immunoprecipitated for p300, followed by staining for the fusion
   via TY-1, and the reciprocal immunoprecipitation of the fusion followed
   by staining for p300. F, G Transplantation of immunodeficient NSG-SGM3
   mice with primary megakaryoblasts transduced with the mutant fusion
   constructs (N = 5 per mutant construct, N = 9 wild type in panel F and
   12 wild type in panel G). p < 0.0001 for NHR2 deletion and NHR1-2
   (DL380,487AS) mutant constructs compared to wild type. p-values
   determined by log-rank test. Source data are provided as a Source Data
   file.

   In contrast to prior reports on the full-length GLIS2 protein, mutation
   of the PXDLS motifs retained in the fusion sequence (DL380,487AS) only
   led to a reduction in CtBP1 binding, not a complete abrogation^[194]38.
   This may be due to the recruitment of CtBP1 to the complex through
   p300, independent of binding to the motif. However, deletion of NHR1
   and 2 in combination with the PXDLS mutations only abrogated CtBP1
   association when the fusion was directly immunoprecipitated, suggesting
   CtBP1 is still able to indirectly associate with the transcriptional
   complex even in the presence of all three structural mutations
   (Fig. [195]5D, Supplementary Fig. [196]6B). Although IP of p300
   demonstrated an association with the fusion, we were unable to show
   direct binding of the fusion with p300 as when we immunoprecipitated
   for the fusion and probed for p300 we were unable to detect any bands.
   This suggests that p300 may only be recruited to the fusion complex via
   other co-factors, such as ETO and CtBP1 (Fig. [197]5E, Supplementary
   Fig. [198]6C).

Loss of NHR2 abrogates leukemogenesis in vivo

   To assess the effects of the structural mutations and the requirement
   for these co-factors on leukemogenesis in vivo, we transplanted
   NSG-SGM3 mice with megakaryoblasts transduced with either wild-type
   CBFA2T3-GLIS2 (N = 13) or CBFA2T3-GLIS2 carrying a deletion of NHR1
   (N = 5), NHR2 (N = 5), mutation of the PXDLS motif (DL380,487AS)
   (N = 5), or all mutations in combination (N = 5) (Fig. [199]5F, G,
   Supplementary Data [200]1). Loss of NHR2 abrogated leukemogenesis in
   vivo when monitored for over 250 days (p < 0.0001, Fig. [201]5F,
   Supplementary Data [202]1), whereas deletion of NHR1 or mutation of the
   PXDLS motif did not have an effect on leukemia-free survival
   (Fig. [203]5G, Supplementary Data [204]1). A previous study
   demonstrated an inability to engraft when CBFA2T3-GLIS2 PDX cells were
   transduced with NC128, an NHR2 domain competitive peptide, thereby
   further highlighting the importance of this domain on transformative
   capability^[205]7. Our demonstration of the retained ability of the
   fusion to dimerize, although at significantly reduced levels, even when
   NHR2 is deleted (Fig. [206]5B, C), suggests the loss of leukemogenesis
   in NHR2 deletion mutants cannot be solely due to the loss of
   oligomerization, arguing in favor of a role for ETO in transformation
   (Fig. [207]5A, Supplementary Fig. [208]6A).

   We sought to investigate genome-wide binding and gene expression
   changes upon deletion of NHR2 to identify key pathways required for
   leukemogenesis that may reveal therapeutic targets. Due to the lack of
   expansion in vivo when NHR2 is deleted, we performed CUT&RUN-seq and
   RNA-sequencing on megakaryoblasts transduced with either the wild-type
   fusion or a fusion construct lacking NHR2 grown in vitro. Differential
   gene expression analysis showed 64% (826/1296) of genes were
   downregulated, and 36% (470/1296) were upregulated when NHR2 was
   deleted compared to wild-type fusion-positive megakaryoblasts
   (Fig. [209]6A, Supplementary Data [210]3). GSEA found an enrichment of
   downregulated genes involved in the JAK/STAT, NOTCH, and Hedgehog
   signaling pathways (Fig. [211]6A, Supplementary Data [212]4).
   CUT&RUN-seq revealed that genome-wide, 61% of previously bound sites
   were lost, and 20% of sites had a decrease in the amount of fusion
   binding in NHR2 deletion mutants despite the retention of all five
   GLIS2 zinc finger DNA binding domains (Fig. [213]6B, Supplementary
   Data [214]8). Loss of binding was preferentially seen at
   transcriptional start sites, areas where co-factor binding and
   oligomerization are predicted to be important (Fig. [215]6C). When
   assessing gene expression of targets whose binding was lost at the
   transcriptional start sites, we found a concordant reversal in the
   signature imparted by the fusion (Fig. [216]6D). 90% (67/74) of the GRN
   nodes were no longer bound in NHR2 deletion mutants, so we reassessed
   the GRN and found a dramatic shift with only 26 nodes reaching
   significance (Fig. [217]6E, Supplementary Data [218]9). Collectively,
   the data argue in favor of impaired ETO binding and oligomerization of
   NHR2 deletion mutants, leading to a decrease in binding at
   transcriptional start sites and the loss of the GRN and key
   leukemogenic pathways.

Fig. 6. Impact of NHR2 deletion on CBFA2T3-GLIS2 transcription.

   [219]Fig. 6
   [220]Open in a new tab

   A RNA extraction and gene expression profiling were performed in
   technical triplicate on megakaryoblasts cultured in vitro transduced
   with either the fusion construct with NHR2 deleted or the wild-type
   fusion construct. In vitro, culturing was necessary due to the lack of
   expansion in vivo upon NHR2 deletion. A heatmap of all differentially
   expressed genes is shown (N = 1296). Significantly downregulated
   pathways, as determined by gene set enrichment analysis, are shown. For
   a full list of gene sets, please see supplementary data [221]4. Log2
   fold change (FC) was calculated using a generalized linear model with a
   negative binomial distribution in DESeq2 and p-values derived from the
   Wald t-test and corrected using the Benjamini–Hochberg correction.
   False discovery rate (FDR) was calculated using the Benjamini–Hochberg
   correction. B CUT&RUN-seq was performed on NHR2 deletion mutants and
   compared to wild-type fusion cells. The proportion of sites where
   binding was lost, retained, and newly bound sites is shown. Source data
   are provided as a Source Data file. C Location of binding at newly
   bound targets, retained targets, and lost targets in NHR2 deletion
   mutants. Source data are provided as a Source Data file. D Gene
   Expression changes in NHR2 deletion mutants at sites where fusion
   binding was lost at the transcriptional start sites (N = 4059 genes). E
   Gene regulatory network of NHR2 deletion mutants.

   To assess the therapeutic potential of several of the key pathways
   identified in our studies and others, we subjected fusion-positive
   megakaryoblasts and cord blood-derived CD34 cells to a series of
   inhibitors, including cKIT (Ripretinib), JAK/STAT (Ruxolitinib), WNT
   (iCRT), p300 (A-485), and ETO (7.44) (Supplementary
   Fig. [222]7)^[223]8,[224]39–[225]41. The greatest sensitivity seen was
   to ruxolitinib and the p300 inhibitor A-485 with IC[50] values of 70
   and 1300 nM, respectively (Supplementary Fig. [226]7). Only
   ruxolitinib, however, demonstrated a therapeutic window when taking
   into account the toxicity of the compounds on CD34 stem cells. We
   confirmed the hematologic toxicity of p300 loss by evaluating serial
   replating capacity in CBFA2T3-GLIS2-modified p300 knockout bone marrow
   (Supplementary Fig. [227]8). Confirming the sensitivity seen in vitro,
   p300 knockout abrogated colony development compared to p300 wild-type
   cells in the presence or absence of CBFA2T3-GLIS2 (Supplementary
   Fig. [228]8). Consistent with the sensitivity of our model to
   ruxolitinib in vitro, we have previously published the activity of this
   targeted inhibitor in a CBFA2T3-GLIS2 patient-derived xenograft
   model^[229]11. Combined, our data support a role for JAK/STAT
   signaling, and its potential as a therapeutic target.

Discussion

   The majority of CBFA2T3-GLIS2-positive AMKL patients fail to be cured
   with the current treatment options that include chemotherapy in
   combination with stem cell transplant. Due to the relatively low
   incidence of this malignancy in the broader scope of pediatric cancer,
   these patients have not been afforded comprehensive research studies
   toward the development of new therapeutics^[230]42. We sought to
   contribute to the characterization of the CBFA2T3-GLIS2 transcriptional
   complex to enhance our understanding of the mechanism(s) of
   leukemogenesis and identify biologic pathways that are required in this
   process. Our data support a model in which the CBFA2T3-GLIS2 fusion
   facilitates homo-dimerization and recruits the transcriptional
   co-factors ETO, CtBP1, and p300 to impart differential gene expression
   at zinc finger bound G-C-rich motifs throughout the genome
   (Supplementary Fig. [231]9). Despite the large number of genes bound by
   the fusion, the impact of the fusion on gene expression extended far
   beyond fusion bound genes. This led us to map the gene regulatory
   network, which revealed a large number of well-known transcription
   factors that are dysregulated by the fusion and, in turn, affect
   downstream genes. While the fusion-bound genes included several
   associated with WNT, TGFβ, and Hedgehog signaling, NOTCH and the
   JAK/STAT oncogenic pathways only emerged when we looked at gene
   expression globally and resulted from upregulation of components of the
   gene regulatory network. Further, our ability to compare
   fusion-negative and fusion-positive megakaryoblasts allowed us to
   confirm the changes that were due to the presence of CBFA2T3-GLIS2 and
   rule out megakaryoblast lineage-specific changes.

   CBFA2T3 and GLIS2 are well-characterized transcription factors which
   led us to look at the potential role of ETO, CtBP1, and p300 co-factors
   on transcriptional regulation. We were able to confirm the association
   of these co-factors biochemically as well as co-localization at
   promoter sites bound by the fusion genome-wide. The generation of
   mutant constructs and their functional impact in vivo allowed us to
   narrow down the domain required for transformation to NHR2. When NHR2
   is deleted, we found a significant reduction in homo-dimerization, a
   complete loss of ETO association, and impairment in DNA binding at
   transcriptional start sites. This had the greatest impact on JAK/STAT
   and NOTCH signaling pathways and the loss of the GRN at the gene
   expression level. Abrogation of the development of leukemia in vivo
   suggests that these pathways are important, in contrast to others that
   likely contribute to the transformation process, such as RUNX but were
   not affected by NHR2 deletion. Compilation of genomic occupancy mapping
   and pathway enrichment analyses in combination with NHR2 deletion gene
   dysregulation data has led us to propose that multiple pathways play an
   active role in transformation, which is disrupted when NHR2 is deleted.
   Thus, targeting a single pathway is not predicted to be effective
   therapeutically. BCL2, a downstream effector of JAK/STAT signaling and
   a therapeutic target of its own was found to be upregulated in
   fusion-positive leukemic megakaryoblasts as a result of the acquisition
   of a SE. The role of BCL-xL in megakaryocyte lineage survival, however,
   suggests a potential path of resistance to BCL2-specific inhibitors,
   which has recently been confirmed in CBFA2T3-GLIS2 model
   systems^[232]43,[233]44.

   In summary, we have developed a humanized murine model of AMKL that
   faithfully recapitulates the CBFA2T3-GLIS2 patient profile. We have
   used this model to show ETO, CtBP1, and p300 are associated with the
   transcriptional complex and play a role in gene dysregulation that is
   disrupted when NHR2 is deleted. Our model has the potential to be used
   for systematic knock-down experiments to identify absolute dependencies
   that can be therapeutically targeted and to aid in the pre-clinical
   testing of these agents.

Methods

Ethical regulations

   The research complies with all relevant ethical regulations. Human
   CD34+ stem cells utilized in this study were commercially purchased,
   de-identified units. All animal experiments were conducted under
   IACUC-approved protocols at one of two institutions: St. Jude
   Children’s Research Hospital, where “Animal Care and Use Committee”
   approved protocol entitled “Establishing Murine Models of Leukemia”
   519-100568 (experiments conducted between 11/16/2018 and 8/25/2020); or
   Stanford University, where “Administrative Panel on Laboratory Animal
   Care” approved protocol entitled “Murine Modeling of High-Risk
   Pediatric Acute Leukemias” 33830 (experiments conducted between
   08/25/2020 and 08/25/2023). Moribund mice were identified by the
   presentation of hind leg paralysis, hunched body position, labored
   breathing, and/or inability to eat or drink.

Generation of primary human megakaryoblasts

   CD34+ stem cells were isolated from commercially purchased,
   de-identified human cord blood units (St. Louis Cord Blood Bank and
   Cleveland Cord Blood Bank). Cord blood was layered onto Ficoll-Paque
   (Fisher 17144003), and CD34+ stem cells were positively selected from
   the mononuclear layer using the human CD34 MicroBead kit (Miltenyi
   130046702). CD34+ stem cells were cultured in GMP grade SCGM serum-free
   medium (CellGenix 208020500) supplemented with 1%
   penicillin–streptomycin and 100 ng/μl each of human TPO (Tonbo 218685),
   FLT3 ligand (Tonbo 218513), and SCF (PeproTech AF30007) for 48 h.
   Cultured CD34+ stem cells were then transduced using Retronectin
   (Takara Bio 62955748) seeded with lentiviral particles encoding the
   CBFA2T3-GLIS2 sequence. Packaged lentiviral particles were produced
   from transfected 293T cells (ATCC CRL-3216) using a third-generation
   lentiviral vector (see the section “Constructs” section), pMD2, and
   PAX2 packing plasmids with 2X BES (BES, NaCl, Na2PO4Heptahydrate, pH
   6.9) and CaCl[2] (pH 5.5). Forty-eight hours post-transduction, cells
   were removed from the RetroNectin and differentiated towards the
   megakaryoblast lineage in GMP SCGM serum-free medium supplemented with
   1% penicillin–streptomycin and 100 ng/ml human TPO and 10 ng/ml human
   IL1β (Tonbo 218018) for 7–9 days which allows for approximately 30-fold
   expansion^[234]45. Megakaryoblasts were confirmed by flow cytometry
   analysis. Cells were pelleted, washed with PBS, blocked with human Fc
   block (BD 564220) for 10 min at room temperature, and then stained for
   CD41 (human CD41-PE Life Technologies MHCD4104 clone VIPL3), CD34
   (human CD34-APC Miltenyi 130113176 clone AC136), and viability with
   Ghost Dye Violet 510 (Tonbo 130870) for 20 min in the dark at room
   temperature. Cells were washed twice with FACS buffer (PBS supplemented
   with 1% FBS and 0.1 mM EDTA) and then analyzed on a BD Fortessa (St.
   Jude) or a Cytek DxP10 (Stanford). Megakaryoblasts were then sorted for
   GFP using fluorescence-activated cell sorting on a BD Biosciences Aria
   to obtain a pure GFP-positive population for downstream applications
   such as transplantation, RNA-sequencing, and CUT&RUN-sequencing.
   Transfection efficiencies averaged 15–30%.

Constructs

   Overlap extension PCR was used to clone the N-terminal 3xFLAG tag
   CBFA2T3-GLIS2 coding sequence into the pCDH.MSCV.MCS.EF1acopGFP (pCDH)
   lentivirus backbone and N-terminal 3xTY1 tag CBFA2T3-GLIS2 coding
   sequence into the CSI.pEF1a.IRES.EmGFP (CEI) lentivirus backbone. NHR
   deletions and PXDLS motif mutations were made by site-directed
   mutagenesis using the PfuUltra high-fidelity DNA polymerase kit
   (Agilent 600380) as per the manufacturer’s instructions for targets
   <10 kb vector DNA. 250 ng of input DNA was amplified via the following
   PCR program: Step 1 95 °C for 1 min, Step 2 95 °C for 40 s, Step 3
   64 °C for 50 s, Step 4 68 °C for 15 min, repeat steps 2–4 16 times,
   Step 5 72 °C for 15 min. The following primers were used (5’–3’):

   NHR1△ fw: CAGCACCTGCCCCCAGCCTGCGGGCTGCTGGACGCCAGCGCCTCCTCCCCC

   NHR1△ rv: GGGGGAGGAGGCGCTGGCGTCCAGCAGCCCGCAGGCTGGGGGCAGGTGCTG

   NHR2△ fw: CGGCAGGAAGAAGTGATCGACCACAAGGACACAAAGAAGGGCCCCGCTCCC

   NHR2△ rv: GGGAGCGGGGCCCTTCTTTGTGTCCTTGTGGTCGATCACTTCTTCCTGCCG

   NHR3△ fw: ACCCTCACCGGCTACGTGCCTGAGG/CGAAGCGGCAGGCCTCCGAGGAC

   NHR3△ rv: GTCCTCGGAGGCCTGCCGCTTCGCCTCAGGCACGTAGCCGGTGAGGGT

   DL380AS fw: CCTGCTGCCAGGCACCGTGCTGGCCTCGTCCACGGGCGTCAACTCAGCTG

   DL380AS rv: CAGCTGAGTTGACGCCCGTGGACGAGGCCAGCACGGTGCCTGGCAGCAGG

   DL487AS fw: ACCCCTGGCCCCCGGCCCCTTGCCTCCAGTGCCCTGGCCTGTGGCAACG

   DL487AS rv: CGTTGCCACAGGCCAGGGCACTGGAGGCAAGGGGGCCGGGGGCCAGGGGT

   Plasmids generated in this study are available from the corresponding
   author.

Humanized murine model

   Immunodeficient NSG-SGM3 (NOD.Cg-Prkdc[scid] Il2rg[tm1Wjl] ITg(CMV-IL3,
   CSF2, KITLG) 1Eav/MloySzJ) breeder mice were purchased from the Jackson
   Laboratory (Strain 013062) and bred in-house maintained on the
   NOD.Cg-Prkdscid Il2rgtm1Wjl/SzJ background. Engraftment was found to be
   equal between male and female mice, so both sexes were utilized; see
   Supplementary Data [235]1 for the sex of each mouse reported in this
   study. Mice were treated with prophylaxis Baytril and Amoxicillin to
   minimize the contraction of bacterial infections. Both male and female
   mice four to six weeks of age were pre-conditioned with 200 rad (cesium
   irradiator at St. Jude, x-ray irradiator at Stanford) 24 h prior to
   megakaryoblast transplantation. Transplantations were done either via
   tail vein or intrafemoral injection. There was no difference in disease
   onset and phenotype with either method or based on the gender of
   recipient mice. All animal work was conducted under IACUC protocols:
   St. Jude Children’s Research Hospital “Establishing Murine Models of
   Leukemia” 519-100568 (11/16/2018–11/16/2021); Stanford University
   “Murine Modeling of High-Risk Pediatric Acute Leukemias” 33830
   (08/25/2020–08/25/2023). Moribund mice were identified by the
   presentation of hind leg paralysis, hunched body position, labored
   breathing, and/or inability to eat or drink. Spleen and bone marrow
   cells from both hind legs were collected. Red blood cells were lysed
   with RBC lysis buffer (Invitrogen 501129743) according to the
   manufacturer’s instructions. Mouse CD45 cells were depleted using mouse
   CD45 microbeads (Miltenyi 130052301) according to the manufacturer’s
   instructions. Mouse CD45-negative cells were stained for mouse CD45
   (BioLegend 103132 clone 30-F11) and for viability with Ghost Dye Violet
   510 (Tonbo 130870) as described above and analyzed for GFP purity by
   flow cytometry to confirm recovery of transplanted, fusion-positive
   megakaryoblasts. Purified cells were used for downstream applications,
   including RNA-seq, CUT&RUN-seq, and co-IP/Western. Full-body necropsies
   with histology (hematoxylin and eosin stain) were also performed to
   assess the engraftment of transplanted megakaryoblasts.

RT-PCR

   The iScript cDNA synthesis kit (Bio-Rad 1708890) was used to generate
   cDNA from primary human megakaryoblasts according to the manufacturer’s
   instructions. The Advantage 2 PCR Enzyme System (Takara Bio USA 639207)
   was used to amplify the fusion junction DNA sequence using the
   following primer sequences: forward (5’CATCTGGAGGAAGGCTGA3’) and
   reverse (5’CGGTTGTGGATCTTCAGGT3’).

RNA-sequencing

   RNA was extracted using TRIzol reagent (Thermo Fisher 155960266)
   according to the manufacturer’s instructions, with all volumes divided
   in half due to low input (50,000–500,000 cells). Low input library prep
   (Ovation/NEB) and sequencing on the Illumina NovaSeq6000 (S4) PE150
   sequencer were performed. Raw paired-end FASTQ files were filtered to
   remove low-quality reads and read with adaptor contamination. STAR
   v2.7.la was used to align clean reads to the human genome (hg138) and
   to generate gene counts. STAR output for each sample was combined for
   differential expression testing using DESeq2_1.32.0R package installed
   in R v4.1 using default settings. PCA was performed on rlog-transformed
   data (for sample size <20) or vst-transformed (for sample size ≥ 20)
   using DESeq2. Gene set enrichment analysis was performed using GSEA
   v4.2.2^[236]17 with parameters recommended for sample size. For pathway
   enrichment analysis on subsets of genes, over-representation analysis
   was performed using biomaRt_2.50.3, ReactomePA1_1.38.0, and
   clusterProfiler_4.2.2R packages. Pearson correlation was calculated
   using Intensity raw data that was screened for missing values for
   exclusion before log2 transformation in R. Data was then subset into
   Transplant and Patient groups, and then rowMean was taken by ID in each
   group. Cross-correlation plots were generated and correlation was
   measured using the Pearson method.

Mass spectrometry-based proteomics

   100 ug of protein from 10 million normal primary megakaryoblasts
   cultured in vitro (N = 2), recovered transplanted megakaryoblasts
   (N = 2), and CBFA2T3-GLIS2 patient samples (N = 4) were analyzed in
   technical replicate by tandem mass tag (TMT) mass spectrometry.
   Proteomic analysis was performed with a protocol of multiplexed tandem
   mass tag labeling, two-dimensional liquid chromatography and tandem
   mass spectrometry (TMT–LC/LC–MS/MS)^[237]46. TMT 10-plex isobaric label
   reagent set (Thermo Scientific) was used for proteome quantification of
   10 samples simultaneously. 100 μg of protein per sample was used as
   starting material. Cells were lysed using a buffer that contains 50 mM
   HEPES, pH 8.5, 8 M urea, and 0.5% sodium deoxycholate. The lysate was
   split into three aliquots: one to measure the protein concentration,
   one for a positive control validation of BMP2 by western blotting, and
   one for proteomic analysis. Protein concentration of sample lysates was
   quantified by BCA protein assay (Thermo Fisher Scientific) with
   titrated BSA as a standard. ∼100 μg proteins per sample were first
   digested with Lys-C (Wako, 1:100 w/w) at room temperature for 2 h,
   diluted 4 times with 50 mM HEPES, pH 8.5, and then further digested
   with trypsin (Promega, 1:100 w/w) overnight at room temperature. 1%
   trifluoroacetic acid was added to quench the digestion reaction,
   followed by desalting with Sep-Pak C18 cartridge (Waters), and the
   desalted peptides were dried by speedvac. Samples were then resuspended
   in 50 mM HEPES, pH 8.5, and were labeled with 10-plex TMT reagents
   following the manufacturer’s instructions. Lastly, 10 isobaric labeled
   samples were pooled together with equal amount, desalted again by
   Sep-Pak C18 cartridge and then speedvac dried. TMT-labeled and
   unlabeled samples were verified by LC–MS/MS. The basic pH reverse phase
   liquid chromatography peptides pre-fractionation was performed on the
   Agilent 1220 LC system as previously described^[238]47. The analysis
   was carried out based on the St. Jude proteomics core optimized
   platform as previously described^[239]47,[240]48. The dried peptide
   fractions were reconstituted in loading buffer (5% formic acid), loaded
   on a reverse phase column (75 μm × 50 cm, 1.9 μm C18 resin (Dr. Maisch
   GmbH, Germany)), and interfaced with a FUSION or Q Exactive HF mass
   spectrometer (Thermo Fisher Scientific). Peptides were eluted up to 6 h
   by a 15–65% gradient of buffer B. (buffer A: 0.2% formic acid, 5% DMSO;
   buffer B: buffer A plus 65% acetonitrile, flow rate: 0.25 μl/min). A
   butterfly portfolio heater (Phoenix S&T) was applied to heat the column
   at 65 °C to reduce back pressure. The mass spectrometer was operated in
   data-dependent mode with MS1 settings of 60,000 resolution, 1 × 10^6
   AGC target and 50 ms maximal ion time and top 20 MS/MS high-resolution
   scans with MS2 settings of 1m/z isolation window with offset 0.2,
   60,000 resolution, 100 ms maximal ion time, 1 × 10^5 AGC target, HCD,
   33 normalized collision energy, and 40 s dynamic exclusion (35
   normalized collision energy and 20 s dynamic exclusion for
   phosphoproteome). Peptide identification was performed using the JUMP
   search engine^[241]49. Commercially available database search engines
   can be divided into two categories: tag-based De novo sequencing (e.g.
   PEAKS with limited sensitivity) and pattern-based database search (e.g.
   SEQUEST, MASCOT). The JUMP software integrates these two methods to
   score putative peptides, showing significant improvements compared with
   these commercially available tools11. The JUMP software has previously
   been utilized in numerous publications^[242]50. Analysis was done
   similarly as previously described, MS/MS raw files were first converted
   into mzXML format and searched against a composite target/decoy
   database FDR estimation^[243]48. The target protein database was
   compiled from the UniProt mouse and human database (Human database:
   88,965 protein entries; Mouse database: 52,738 protein entries,
   downloaded in February 2015), and the decoy database was generated by
   reversing target protein sequences. Spectra were searched with ±10 ppm
   mass tolerance for both precursor ions and product ions with full
   tryptic restriction, static modification for TMT tag on N-terminus and
   lysine (+229.16293), dynamic modification for serine, threonine and
   tyrosine (+79.96633, for phosphoproteome analysis), three maximal
   modification sites, two maximal missed cleavages, and the assignments
   of a, b, and y ions. Peptide spectrum matches (PSM) were first filtered
   by MS mass accuracy (∼2 ppm, ±4 standard deviations). PSMs of doubly
   charged peptides with a JUMP Jscore of >30 were applied for global mass
   recalibration prior to the filtering. The survived PSMs were first
   grouped by precursor ion charge state and then further filtered by
   Jscore and dJn values. Cutoffs were applied to these values and were
   adjusted until a protein FDR < 1% was achieved. If one peptide was
   shared by multiple proteins, the protein with the highest PSM will
   represent the peptide according to the rule of parsimony^[244]46. For
   quantitation, an analysis was carried out in the following steps
   similarly as previously reported^[245]48. (i) TMT reporter ion
   intensities of each PSM were extracted; (ii) the raw intensities were
   corrected according to the isotopic distribution of each labeling
   reagent; (iii) PSMs with very low reporter ion intensities were
   excluded (e.g. minimum intensity <1000 and median intensity <5000);
   (iv) sample loading bias was corrected by normalization with the
   trimmed median intensity of all PSMs; (v) the mean-centered intensities
   across samples were calculated; (vi) protein or phosphosite relative
   intensities were summarized by averaging related PSMs; (vii) protein or
   phosphosite absolute intensities were derived by multiplying the
   relative intensities by the grand-mean intensity of the top three most
   highly abundant PSMs^[246]48. Protein lysates were digested and
   desalted and then underwent 16-plex TMT labeling. Samples were analyzed
   for total proteome by low pH reverse-phase liquid chromatography–tandem
   mass spectrometry (LC–MS/MS). Retrieved data was searched against the
   UniProt human database to identify peptides. This was determined by
   assigning related mass addition to all possible amino acid residues
   followed by an algorithm that evaluates the accuracy of modification
   localization. Peptide quantification data was corrected for mixing
   errors and summarized to derive protein quantification results. Using
   this approach, about 10,000 proteins are analyzed, which provides a
   comprehensive proteome analysis.

Spearman’s correlation co-efficient

   For the paired secondary (N = 1) and tertiary (N = 1) transplant cells
   that underwent RNA-sequencing, transcript per million (TPM) values for
   protein-coding genes were log-transformed. Lowly expressed genes were
   filtered out as background noise (>1 TPM in 50% or more samples). For
   the transplant cells that underwent proteomics analysis, variance
   stabilizing normalization (VSN) was applied to the protein intensity
   data and replicates were combined by mean protein intensity value. 3525
   overlapping genes between RNA-Sequencing and proteomics data were used
   in Spearman’s correlation analysis (measured as Spearman’s rho).
   Stringtie (v2.1.4) was used to convert BAM files to TPM values and R
   (v4.1.2) was used for the remaining analysis.

CUT&RUN-sequencing

   500,000 megakaryoblasts recovered from transplantations were
   cross-linked with 0.1% formaldehyde for one minute at room temperature.
   The reaction was quenched with 2.5 M glycine. Cross-linked cells were
   then used with the CUTANA ChIC/CUT&RUN kit (EpiCypher 14-1048) per the
   manufacturer’s instructions. Cells were washed twice with Wash Buffer
   (pre-wash buffer supplemented with 1% Triton-X and 0.05% SDS) and then
   bound to ConA beads. Cell-bound beads were collected and re-suspended
   in Antibody Buffer and 0.5 ug of one of the following antibodies: TY1
   (GenScript A01004-40), ETO (Bethyl A303-509A), CtBP1 (ThermoFisher
   10972-1-AP), p300 (ThermoFisher PA1848), H3K27ac (Active Motif 39034),
   and rabbit IgG (Epicypher 13-0042). Beads were incubated at 4 °C
   overnight on a nutator. The next day, beads were collected and washed
   twice with a cold Cell Permeabilization Buffer followed by incubation
   with pAG-MNase. Beads were collected and washed twice again with Cell
   Permeabilization Buffer and then 100 mM of calcium chloride was added
   on ice to activate the enzymatic reaction. Beads were incubated at 4 °C
   for two hours on a nutator. After the incubation, Stop Buffer was
   added, and the bound protein-DNA complexes were eluted off the beads at
   37 °C. The recovered protein-DNA was collected and de-crosslinked by
   adding SDS and Proteinase K, followed by an overnight incubation at
   55 °C. The next day, DNA was purified using the provided DNA Cleanup
   Columns and DNA Binding and Wash buffer reagents. DNA concentration was
   measured using the Qubit dsDNA HS kit (Fisher [247]Q32851). Library
   prep (NEBNext/ChIPSeq) and sequencing on Illumina NovaSeq600 (s4) PE150
   was performed. CUT&RUN-sequencing was performed in biological
   replications and analyzed with paired RNA-sequencing data. Fastq files
   were trimmed and aligned to the hg19 genome via Basepair
   ([248]https://app.basepairtech.com). Bam files were extracted and MACS
   v2.2.71 was used to call peaks in each sample relative to IgG control
   background (parameters -f BAMPE, -g 2700000000, --p-value 1e−3,
   --call-summits, -B and -SPMR). To build the single tracks for each
   sample, macs2 bdgcmp was used to generate fold-enrichment track
   (parameters -M FE and -p 0.00001), and resulting bedGraphs were fixed
   for conversion to bigwig files using bdg2bw.sh and bedGraphToBigWig
   method (bedtools v2.27.1). To calculate correlation between replicates
   before combination, multiBigWigSummary (bins and
   -chromosomesToSkip,”chrX”, “chrY”, “chrM”, and “random”) and
   plotCorrelation (-corMethod pearson – skipZeros, -- whatToPlot
   scatterplot and -removeOutliers) from deeptools v3.3.0 were used.
   Annotation of narrowPeak files was performed using Homer v4.11 and hg19
   as references. In the cofactor datasets, peaks were further filtered