Graphical abstract

   graphic file with name fx1.jpg
   [81]Open in a new tab

Highlights

     * •
       Patient-derived xenografts (PDXs) are important preclinical models
     * •
       50 pediatric leukemia PDXs are largely derived from Hispanic
       ethnicity
     * •
       Paired patient tumor, PDX, and germline for sequencing analysis
     * •
       This cohort uncovers genomic, transcriptomic, and epigenomic
       features
     __________________________________________________________________

   Cancer; Omics; Transcriptomics

Introduction

   Acute leukemia, the most common malignancy in children, accounts for
   over 25% of all childhood cancers.[82]^1 In recent years, significant
   changes to leukemia management, including improved risk stratification
   based on molecular and genetic alterations,[83]^2^,[84]^3 new
   therapeutic approaches,[85]^4^,[86]^5^,[87]^6^,[88]^7^,[89]^8^,[90]^9
   and new comprehensive supportive care measures,[91]^10 have improved
   outcomes for children with lymphoblastic and myeloid leukemias. Cure
   rates for pediatric leukemias have substantially increased over the
   past four decades, particularly for those with acute lymphoblastic
   leukemia (ALL), 90% of whom are predicted to become long-term
   survivors.[92]^11 While acute myeloid leukemia (AML) survival in
   pediatrics is lower than that of ALL, survival approaching 70% also
   indicates progress.[93]^2

   There are still opportunities to improve ALL and AML therapy. A
   considerable proportion of children with leukemia will not achieve a
   sustained remission, highlighted by the fact that leukemia remains the
   second leading cause of childhood cancer deaths.[94]^12 It is also
   recognized that children of Hispanic ethnicity have both a higher
   incidence and poorer outcome for leukemia when compared to non-Hispanic
   patients.[95]^13^,[96]^14^,[97]^15^,[98]^16^,[99]^17 Current approaches
   to therapy have likely reached their maximum benefit. In AML, for
   instance, the heterogeneity of the leukemia itself has limited
   biology-driven advances in therapy, and the ability to further
   intensify cytotoxic chemotherapy has likely been reached.[100]^18
   Finally, survivors of pediatric leukemia often suffer from long-term
   chronic health conditions such as osteonecrosis, cardiotoxicity, and
   peripheral neuropathy due to cytotoxic drug-based therapies. Further
   advancements to improve outcomes for the pediatric age group are likely
   to require more biology-based molecular targeted therapy, and this is
   particularly true for Hispanic patients where few preclinical model
   systems exist to identify potential therapeutic targets.

   Patient-derived xenograft (PDX) models represent a valuable tool to
   understand tumor evolution and to identify and develop novel
   therapeutics. PDX models are created by implanting cancer cells or
   tissues from a patient’s primary tumor into an immunodeficient mouse,
   simulating human tumor biology in vivo.[101]^19^,[102]^20 Currently,
   resources in the pediatric PDX community have two major shortcomings.
   First, most genomic studies in pediatric PDX models do not include
   germline DNA samples or patient primary tumor samples to delineate
   somatic from germline genetic
   alterations.[103]^21^,[104]^22^,[105]^23^,[106]^24^,[107]^25 Second,
   the majority of published PDX studies have only DNA and RNA sequencing
   data and fail to explore the epigenomic landscape of either PDX models
   or matched primary
   tumors.[108]^21^,[109]^22^,[110]^23^,[111]^24^,[112]^25 Therefore, a
   comprehensive multi-omics analysis of PDX models including both
   germline DNA and epigenomics studies are in an urgent need.

   An additional prevalent challenge for the PDX field is the lack of
   proper analysis method to remove mouse contamination from the
   multi-omics datasets of PDX models, especially DNA methylation data.
   Resected human PDX samples can exhibit a substantial presence of mouse
   DNA or RNA, reaching levels as high as 70–80% due to infiltration by
   murine stromal cells.[113]^26 Consequently, contamination from mouse
   reads could emerge as a critical source of errors during PDX sequencing
   data analysis. The majority of existing algorithmic frameworks tailored
   to alleviate mouse contaminant reads from human PDX samples have been
   designed to accommodate solely DNA sequencing data (e.g., MAPEX
   algorithm[114]^27), or RNA sequencing data (e.g., Xenome
   algorithm[115]^28). Though certain multi-omics solutions, such as
   XenofilteR,[116]^29 have sought to address both DNA and RNA-based
   contamination, but no published algorithm possess the capacity to
   effectively remove mouse contaminant from the three major
   next-generation sequencing (NGS) data modalities: DNA, RNA, and
   methylation-based sequencing data.

   In our cohort described here, we applied whole-exome,
   whole-transcriptome, and whole-genome bisulfite sequencing (WGBS)
   technologies to conduct genomic, transcriptomic, and epigenomic
   analysis on matched germline samples, primary leukemia and PDX models
   from 50 patients with pediatric leukemia, as well as developed a new
   algorithm, REMOCON, to remove mouse contaminant from all three major
   NGS data modalities including genomic, transcriptomic, and epigenomic
   data. This study represents a new resource of PDX models from pediatric
   patients with precursor B-cell and T cell ALL, AML, and mixed phenotype
   ALL samples, including the high fidelity of genomic, transcriptomic and
   epigenomic landscape of matched leukemia and PDXs from the same
   patients, as well as genetic features that correlate with PDX growth.

Results

Leukemia patient-derived xenografts were primarily generated from a
population enriched for Hispanic ethnicity

   Patients with pediatric leukemia with active leukemia were approached
   for specimen collection at three institutions in Texas. Patients and
   families were consented and leukemia-containing samples (either bone
   marrow or peripheral blood) were collected, processed and injected into
   immunodeficient mice at one institution. Overall, 117 patient samples
   were injected, and 82 of them successfully engrafted. In this study,
   the initial set of 50 PDXs with matched germline and primary leukemia
   samples underwent comprehensive multi-omics sequencing, making them the
   focus of our analysis. It is important to note that these 50 samples
   was based solely on their availability and did not involve any other
   influencing factors. The composition of these 50 PDXs includes 19 cases
   of standard risk preB ALL, 19 cases of high-risk preB ALL, 6 cases of
   AML, 5 cases of T cell ALL, and 1 case of mixed phenotype ALL.
   Whole-exome, whole-transcriptome, and whole-genome bisulfate sequencing
   were conducted for this cohort. Patient characteristics of these 50
   PDXs are highlighted in [117]Table 1, including age at diagnosis, sex,
   and ethnicity. More detailed demographic and clinical information for
   each of the pediatric leukemia samples are summarized in [118]Table S1,
   including initial white blood cell count, end of induction minimal
   residual disease (MRD) status, clinically relevant cytogenetics,
   immunophenotype, and relapse status.

Table 1.

   Demographic information for the patient leukemia samples utilized for
   the generation of the patient derived xenografts including age at
   diagnosis, gender, race, and ethnicity
   Diagnosis Number of PDXs Age at diagnosis median in years (range)
   Number of female (percentage) Race (number of white) Ethnicity
   Relapse/refractory
   Standard Risk preB ALL 19 4 (1–8) 10 (52%) 18 15 (79%) 1 (5%)
   High Risk preB ALL 19 14 (0–25) 7 (37%) 16 13 (68%) 5 (25%)
   Acute Myeloid Leukemia 6 9 (0.5–10) 4 (67%) 6 4 (67%) 4 (67%)
   T cell Leukemia/Lymphoma 5 10 (4–15) 1 (20%) 5 4 (80%) 4 (80%)
   Mixed Phenotype Acute Leukemia 1 0.67 1 (100%) 1 0 1 (100%)
   [119]Open in a new tab

   The final column records the percentage of patients who ultimately
   suffered refractory or relapsed disease.

   This PDX cohort is primarily derived from patients with preB cell ALL
   with an equal distribution between patients designated as standard risk
   (SR) and high risk (HR) at diagnosis based on the age of the patient
   and initial total white blood cell count. The percentage of patients
   with established PDXs who ultimately experienced relapsed/refractory
   disease was similar to that generally observed in preB ALL (5% for
   SR-ALL, 25% for HR-ALL), but rates higher than clinically
   reported[120]^30 were noted in patients with AML and T cell ALL (67%
   and 80% respectively) than ([121]Table 1). Therefore, this collection
   of PDX models, particularly for AML and T cell ALL, represent a unique
   resource for relapsed or refractory pediatric leukemia studies.

   In addition, consistent with the evolving demographics of the
   population in Texas, the majority of leukemia PDX models in this study
   were derived from leukemia samples collected from children of Hispanic
   ethnicity. We previously developed an ancestry informative marker (AIM)
   panel, called UT-AIM250,[122]^31 to infer 3-way genetic admixture from
   three distinct continental populations (African (AFR), European (EUR),
   and East Asian (EAS)). In order to validate self-reported ethnicity
   information in this study, we utilized this marker panel to confirm the
   Mexican ancestry for the majority of leukemia samples from our cohort
   ([123]Figure 1).

Figure 1.

   [124]Figure 1
   [125]Open in a new tab

   Determination of ancestral admixture

   (A) Two hundred and fifty (250) ancestral informative markers
   (UT-AIM250 panels) were extracted from WES to infer 3-way genetic
   admixtures of 49 samples (41 from germline DNAs, and 8 from tumor DNAs)
   and plotted as black dots in triangle plots.

   (B) PCA plots with 250 genotypes directly. Three continental
   populations (European, African, and East Asian from the 1000 genome
   project) are plotted in green, orange, and blue dots, respectively.
   Samples from Mexicans in LA (also part of the 1000 Genome Project) were
   plotted in red dots, overlapping with Texas patients in this study.

   (C) Bar chart of 49 patients with estimated proportion to each
   continental population. UT-AIM250 is capable to estimate admixture
   proportion from 8 tumor samples where germline DNAs were not available.

Leukemia patient-derived xenograft models show variability in time to
engraftment

   Across three different centers, a total of 117 primary leukemia samples
   were collected for PDX development. The majority of primary leukemia
   samples collected and injected ultimately engrafted with an overall
   efficiency of 70%. [126]Figure 2A illustrates the total number of
   primary leukemias samples of each subtype that were collected and
   injected and their engraftment status. The percentage of primary
   leukemia samples that engrafted is presented by subtype in
   [127]Figure 2B. AML had the poorest overall engraftment efficiency
   (54.8%) while preB ALL had the highest and showed no statistical
   difference between SR and HR (75.7% and 77.1% respectively). Because
   samples were collected at three separate centers in Texas but processed
   and injected into mice at one site (UT Health San Antonio), we assessed
   whether time from collection to processing for injection influenced
   engraftment efficiency. At one distant site (UT Southwestern), samples
   were collected and shipped with a median time from collection to
   processing of 2.29 days (range 2–6 days). The vast majority of samples
   collected at two sites in San Antonio were processed for injection on
   the day of collection. Delay from collection to injection did not
   measurably influence engraftment as samples injected in <24 h and those
   injected at > 24 h engrafted at similar rates [58/79 (73%) versus 24/38
   (63%), two-way ANOVA test, p = 0.39]. Delays of >72 h from collection
   to injection engrafted at similar rates (11/16, 68.8%). Samples from
   the distant site engrafted at the same overall rate as samples
   collected at the local site indicating no significant impact of time to
   injection [40/63 (63%) versus 42/54 (77%), two-way ANOVA test, p =
   0.81]. The mean time to engraftment across all 38 preB-cell ALL models
   was 10.7 weeks (range 4–26), which was similar to 9.5 weeks (range
   6–14) for engraftment in T cell ALL models ([128]Figure 2C). AML PDX
   models displayed an average of 16.5 weeks (range 5–35) to engraft,
   which was significantly longer than the other the ALL subtypes
   ([129]Figure 2C). Individual PDX growth and engraftment can be found in
   [130]Figure S1.

Figure 2.

   [131]Figure 2
   [132]Open in a new tab

   Leukemia PDX models show variability in time to engraftment and cell
   surface markers

   (A) Number of total leukemia samples collected for PDX collection by
   subtype detailing engraftment (black) versus no engraftment (gray)
   after implantation.

   (B) Percentage of engraftment samples for each leukemia subtype.

   (C) Engraftment rates of leukemia subtypes.

   (D) Percentage of maintained cell surface markers on the primary
   leukemia with its paired patient derived xenograft. Total number of
   paired samples analyzed for that particular cell surface marker noted
   later in discussion.

Patient-derived xenograft acute lymphoblastic leukemia models preserved the
immunophenotypes in the patient leukemia specimens

   Whether PDX models preserve biological features of primary tumor
   samples is a key issue and sets the foundation for their potential
   usefulness in preclinical studies.[133]^32^,[134]^33^,[135]^34 To
   assess the fidelity between the PDX and the patient’s primary leukemia,
   we compared the cell surface markers in the engrafted PDX to the paired
   primary leukemia diagnostic specimen assessed by flow cytometry.
   [136]Figure 2D shows the percentage of PDX/primary leukemia samples
   that expressed selected cluster of differentiation (CD) markers.
   Post-engraftment comparison of cell surface receptors of the PDX to the
   patient’s primary leukemia demonstrates excellent fidelity in standard
   risk preB ALL and T cell ALL with a higher proportion of discrepancies
   in the cell surface markers of AML models. A proportion of high risk
   preB ALL PDX models lost CD10 and CD19 expression which was present on
   the patient’s primary diagnostic sample. This contrasts standard risk
   preB ALL in which CD10 and CD19 expression was preserved. The loss of
   cell surface marker expression may be notable given the importance of
   tisagenlecleucel, a CAR-T therapy directed against CD19, for the
   treatment of refractory and relapsed preB ALL. It should be noted that
   not all cell surface markers were analyzed in the PDX models, and
   [137]Figure 2D only represents samples where data was available for
   both PDX and primary samples. T cell PDX models maintained
   immunophenotype fidelity across all analyzed cell surface markers,
   which suggested little evolution between the patient’s primary leukemia
   and the engrafted PDX. AML models had poor immunophenotypic fidelity
   when compared to the diagnostic specimens, particularly in CD13 which
   was maintained in only half of the PDX models. Full immunophenotype
   information for the PDX models and primary leukemias can be found in
   [138]Table S1.

The REMOCON algorithm effectively removes contaminating mouse DNA reads

   In addition to evaluating whether cell surface markers were preserved,
   we examined whether three PDX models maintain genomic, transcriptomic
   and epigenomic features of the primary leukemia samples. In order to
   accurately characterize the genomic, transcriptomic, and epigenomic
   landscape of these PDX models, we developed the REMOCON (REMOve
   CONtaminant reads) algorithm ([139]Figures 3A and 3B) to remove mouse
   reads in the next-generation sequencing data.

Figure 3.

   [140]Figure 3
   [141]Open in a new tab

   Development and application of REMOCON algorithm to analyze
   whole-exome, whole-transcriptome, and whole-genome methylation datasets
   from germline, leukemia, and PDX samples from the same patients with
   leukemia

   (A) A schematic strategy for the REMOCON algorithm to remove
   contaminant reads, in which reads are mapped to the human genome
   (target genome; TG) and mouse genome (contaminant genome; CG). The read
   which alignment score (AS) to the mouse genome is greater than that to
   the human genome is defined as contaminant read.

   (B) A Pipeline to analyze whole-exome, whole-genome,
   whole-transcriptome, and whole-genome methylation datasets in this
   cohort by removing contaminant reads.

   (C) Applying whole-exome data to build phylogenetic relations among
   germline (G for short), primary leukemia (PT for short) and PDX samples
   from the same patients after REMOCON analysis. To facilitate
   visualization, all PDX samples are labeled red.

   We directly compared whole-exome sequencing (WES) data before and after
   applying REMOCON. We focused on the unsupervised clustering based on
   WES data of paired primary leukemia and PDX samples. Before applying
   REMOCON ([142]Figure S2), only 25 of 50 paired primary leukemia and PDX
   samples clustered together using an unsupervised approach. In contrast,
   after applying REMOCON, all 50 paired primary leukemia and PDX samples
   from the same patient did cluster together ([143]Figure 3C) (Fisher
   exact test, p < 0.001), indicating that REMOCON effectively removes
   mouse contaminant reads. This step would be critical to reveal the
   biological patterns possibly masked by contaminating mouse nucleic acid
   sequencing reads.

   In addition, we further conducted analyses to compare experimentally
   measured and REMOCON-estimated mouse contamination in the same PDX
   models. Firstly, we utilized lactate dehydrogenase (LDH) isoenzyme
   assays to experimentally estimate the level of mouse contamination in
   each of 50 PDX samples (see [144]STAR Methods). Secondly, we employed
   REMOCON to independently calculate the level of mouse contamination in
   the same set of 50 PDX models. Because all 50 PDXs have WES data but
   other omics data are missing in some of 50 PDXs, we utilized WES data
   for estimating mouse contamination to maximize the number of samples
   available for REMOCON analysis. As shown in [145]Figure S3A, we
   compared the REMOCON-estimated mouse contamination (y axis) with the
   experimentally measured mouse contamination (x axis). We observed a
   significantly positive correlation between these two measures (Pearson
   correlation coefficient r = 0.53, p-value = 0.0457). This correlation
   provides strong evidence supporting the reliability of REMOCON as a
   tool for detecting mouse contamination in PDX models. Furthermore, we
   also applied another published method for removing mouse contamination
   reads to conduct the same analysis ([146]Figure S3B) and observed the
   same trend but less significant correlation (Pearson correlation
   coefficient r = 0.47, p-value = 0.1294).

Gene alterations and the transcriptome are conserved in patient-derived
xenografts models of leukemia

   To continue exploring the conservation between matched pediatric
   leukemia samples and PDXs in our cohort, we compared somatic mutations
   ([147]Figure 4A), copy-number variations ([148]Figure 4B), gene
   expression ([149]Figure 4C), presence of fusion genes ([150]Figure 4D),
   and whole-genome methylation ([151]Figure 4E) in matched leukemia-PDX
   pairs. In our analyses, we compared the matched leukemia-PDX pairs from
   the same patients (termed as “matched pairs” for short, red color in
   [152]Figure 4) to the randomly chosen leukemia-PDX pairs taken from the
   different patients (termed as “random background” for short, gray color
   in [153]Figure 4). To avoid trivial bias, we did not allow random
   matching between ALL and AML specimens.

Figure 4.

   [154]Figure 4
   [155]Open in a new tab

   Conservation between matched pediatric leukemia samples and PDXs (red
   color) on genomic, transcriptomic, and epigenomic landscape in
   comparison with randomly chosen pairs of leukemia samples and PDXs
   (gray color)

   This analysis contains mutation (A), copy-number alterations (B), gene
   expression (C), fusion genes (D), and genome-wide methylation states
   (E).

   First of all, as shown in [156]Figure 4A, for random background
   leukemia-PDX pairs defined above (gray color), as shown in the first
   gray bar from the left in [157]Figure 4A, 61% of them (from y axis) do
   not share any somatic mutations (from x axis), and the rest 39% (from y
   axis) only have 0 to 10% (from x axis) somatic mutations shared between
   them (the second gray bar from the left in [158]Figure 4A). None of the
   random background leukemia-PDX pairs have more than 10% somatic
   mutations shared between them, which is consistent with the lack of
   conservation between random chosen leukemia-PDX pairs from different
   patients. However, for all the matched leukemia-PDX pairs from the same
   patient, only 18% of them do not share somatic mutations (the first red
   bar from the left in [159]Figure 4A), and 82% of them have shared
   somatic mutation(s) (the rest of red bars, [160]Figure 4A). In summary,
   this analyses showed that 82% of matched leukemia-PDX pairs from the
   same patient shared somatic mutations as compared to only 39% of
   randomly chosen leukemia-PDX pairs (Wilcoxon rank-sum test, p < 1 X
   10–15, [161]Figure 4A), which supports a statistically significant
   conservation of somatic mutation landscape in matched leukemia-PDX
   pairs in comparison to random expectation.

   Secondly, we used the same rationale to analyze the conservation of
   copy-number alterations between random background and matched pairs. We
   calculated the Pearson correlation coefficients of the gene copy-number
   alterations between leukemia samples and PDX samples, as shown in x
   axis in [162]Figure 4B. 0 represents no correlation and 1 represents
   the highest correlation between leukemia samples and PDX samples. For
   random background leukemia-PDX pairs, the majority of leukemia-PDX
   pairs has no correlation (Pearson correlation coefficients equal to 0)
   on their gene copy-number alterations (gray color, [163]Figure 4B).
   However, in the matched leukemia-PDX pairs, Pearson correlation
   coefficients are not centered on 0 (red color, [164]Figure 4B) and is
   significantly higher than those in the random pairs (red vs. gray,
   Wilcoxon rank-sum test, p = 2.3 X 10-8) ([165]Figure 4B), which
   supports a notable degree of similarity in the copy-number alterations
   among matched leukemia-PDX pairs in comparison to random expectation.

   Thirdly, in order to explore the conservation of gene expression
   pattern, we calculated the Pearson correlation coefficients of the gene
   expression level between leukemia samples and PDX samples in random and
   matched pairs. We found that Pearson correlation coefficients of the
   gene expression is also higher in matched leukemia-PDX pairs than in
   the randomly chosen pairs (Wilcoxon rank-sum test, p = 0.0066)
   ([166]Figure 4C), indicating conserved gene expression pattern in
   matched leukemia-PDX pairs from the same patients.

   Fourthly, in order to explore the conservation of gene fusion pattern,
   we calculated the proportion of overlapping fusion genes between
   leukemia samples and PDX samples in random and matched pairs. We found
   that two-thirds of matched leukemia-PDX pairs showed expression of the
   same fusion genes, a finding does not present in any randomly chosen
   pairs (Wilcoxon rank-sum test, p < 1 X 10–15, [167]Figure 4D). Lastly,
   we observed that the methylation level in all protein-coding genes’
   promoter and coding regions was more similar in the tumor and PDX
   samples pairs (Wilcoxon rank-sum test, p = 0.0044) ([168]Figure 4E).

   To further display detailed transcriptome pattern of matched
   leukemia-PDX pairs, we focused on 715 known cancer genes reported in
   the COSMIC database[169]^35 to remove non-cancer transcription noise
   signals and then performed clustering analysis ([170]Figure S4A). We
   observed that all the samples are divided into three major clusters
   (from left to right) that are enriched in AML, T cell leukemia, and
   preB ALL subtypes, respectively. We repeated the clustering analysis
   using 24 known pediatric leukemia genes ([171]Figure S4B). We found
   that these 24 pediatric leukemia genes are clearly separated into two
   clusters (from top to bottom): the top-panel cluster ([172]Figure S4B)
   represents a group of genes highly expressed in most high-risk preB ALL
   samples and a small portion of standard risk preB ALL samples, while
   the bottom-panel cluster ([173]Figure S4B) represents genes highly
   expressed in most of standard risk preB ALL and AML samples. These
   genes suggest that these known cancer genes, especially known pediatric
   leukemia genes, have the potential as biomarkers to separate pediatric
   leukemia subtypes.

   In summary, we observed that matched leukemia-PDX pairs from the same
   patients (red color, [174]Figure 4) have a significantly higher
   correlation than random expectation (gray color, [175]Figure 4),
   suggesting that in our cohort the PDX samples preserve most of the
   genomic, transcriptomic, and epigenomic features from matched pediatric
   leukemia samples and therefore can faithfully model the disease.

Landscape of somatic mutations, homozygous copy-number deletions, and loss of
heterogeneity in matched pediatric leukemia samples and patient-derived
xenografts

   After confirming the general concordance between patient and PDX
   leukemia specimens, we next defined the somatic mutations, homozygous
   gene copy-number deletions, and loss of heterogeneity (LOH) from
   whole-exome sequencing data. [176]Figure 5A describes the top 30 genes
   that showed alterations in at least 5% of samples in our collection. We
   also clustered samples based on subtypes highlighted in [177]Table 1.
   In summary, NRAS harbored the most frequently recurrent alterations
   (21% of samples) followed by KRAS (11% of samples), both of which are
   known oncogenes in childhood AML[178]^36 and ALL.[179]^37^,[180]^38
   Besides these two established oncogenes, our analysis identified
   multiple known pediatric leukemia driver events, including deletion and
   LOH of ETV6[181]^39 (8% of samples), deletion of CDKN2A[182]^40 (7% of
   samples), as well as frameshift mutations and missense mutations in
   KMT2D[183]^41 (6% of samples). We also manually collected a list of
   previously reported fusion genes in pediatric leukemia and compared
   them with the fusion genes detected in our cohort. We identified two
   previously reported pediatric leukemia fusion genes in our cohort:
   ETV6-RUNX1 fusion[184]^42 in one PDX model (UHS0487 in [185]Table S1),
   and TCF3-PBX1 fusion[186]^43^,[187]^44 in two PDX models (UHS0518 and
   UHS0528 in [188]Table S1).

Figure 5.

   [189]Figure 5
   [190]Open in a new tab

   PDX mutation landscape and phenotype-transcriptome association

   (A) Mutation, homozygous copy-number deletion (LOSS), and loss of
   heterogeneity (LOH) landscape of the top 30 mutated genes in matched
   pediatric leukemia samples and PDXs from the same patients. TMB, total
   mutational burden, which is defined as the total number of somatic
   mutations per coding area of a genome.

   (B) The quantile-quantile (Q-Q) plot of the association test between
   PDX gene expression and PDX growth rate. Each circle in the plot
   represents one gene. The expected p value in a Q-Q plot is calculated
   as the following way: If no gene is associated with the trait (growth
   rate here), the p values we get from tests should follow a uniform
   (0,1) distribution. Suppose there are n genes in the plot, and the
   observed p values are sorted from smallest to largest. The expected p
   value for the i th gene is i/n.

   (C) Pathway enrichment analysis for PDX genes whose expression
   significantly associated with PDX growth rate in (A) (FDR-adjusted
   p < 0.05).

   In order to utilize this new PDX resource ([191]Figure 1) to study
   Hispanic-specific genetic alterations in pediatric leukemia, we
   re-analyzed the above mutations and copy-number pattern in Hispanic and
   non-Hispanic groups independently ([192]Figure S5). Among the 30 most
   frequent mutations and copy-number alterations in our cohort, 12 found
   only in PDXs from the Hispanic population included deletion and LOH of
   ETV6 and mutations in KMT2D ([193]Figure S5). However, due to a small
   sample size the enrichment of these mutations in leukemias from the
   Hispanic population was not statistically significant (Fisher exact
   test, p > 0.05). A re-examination on Hispanic-specific mutations and
   copy-number alterations in pediatric leukemia is warranted in our
   future study with a larger cohort.

Association between patient-derived xenograft gene expression and
patient-derived xenograft growth rate

   In order to elucidate factors that might influence PDX growth, we
   sought genes in which a positive correlation between PDX gene
   expression and PDX growth rate (i.e., elevated PDX gene expression
   correlated with increased PDX growth, or decreased PDX gene expression
   correlated with decreased PDX growth). In studying all 50 PDX models,
   we identified 201 genes (FDR-adjusted p < 0.05) whose expression
   positively correlated with PDX growth ([194]Figure 5B). The ALL- or
   AML-specific subsets lacked statistical power to detect associated
   genes because of the small sample size. Furthering this analysis, we
   identified that multiple GO or KEGG pathways significantly correlated
   with PDX growth rate ([195]Figure 5C). The hedgehog pathway, known for
   regulating cell proliferation and tumor growth,[196]^45^,[197]^46 had
   the most significant enrichment (FDR-adjusted q-value = 1.18 X 10^−5);
   seven genes overlapping with the KEGG hedgehog pathway term were
   identified to significantly associate with PDX growth rate
   ([198]Figure 5C). The contributions of the Hedgehog signaling pathway,
   and other identified pathways such as cell-cell signaling and cell fate
   commitment ([199]Figure 5C), to leukemia PDX engraftment and growth
   will need further functional evaluation in these models.

Discussion

   Novel pediatric PDX models are a critical research resource to evaluate
   the efficacy of new therapies. Pediatric PDX models which mimic the
   disease heterogeneity and patient ethnic diversity seen clinically have
   to date been limited. We report in this study on a comprehensive
   resource of whole-exome, whole-transcriptome, and whole-genome
   methylation sequencing datasets characterizing 50 new pediatric
   leukemia PDX models ([200]Table S1). All PDX models and related
   next-generation sequencing (NGS) datasets in our study are freely
   available for the cancer research community, as described in the Data
   Availability section. This large-scale PDX resource provides further
   tools for the pediatric leukemia research community. By establishing a
   cohort of pediatric leukemia PDXs covering multiple subtypes, these PDX
   models can complement other pediatric leukemia PDXs[201]^20^,[202]^47
   and more accurately recapitulate the underlying etiology of pediatric
   leukemia and assist in efficient drug development.

   It is established that the risk of leukemia in Hispanic children is
   higher than in the overall population, and these patients have poorer
   outcomes.[203]^16^,[204]^17 The majority of the leukemia samples
   collected in this cohort were from Hispanic patients and PDX
   development was equivalent in samples from Hispanic children compared
   to non-Hispanic patients. In the cohort of 50 analyzed here, the
   majority of PDX were from patients of Hispanic ethnicity. For standard
   risk 15 of 19 PDXs established (79%) and for high-risk preB ALL 13 of
   19 (68%) were derived from Hispanic patients. Similarly, 4 of 6 AML and
   4 of 5 T cell ALL were samples from Hispanic patients, although the
   sample number is low (n = 6 and 5, respectively). This cohort
   represents a novel, Hispanic-predominant set.

   Our work also includes the development of a competitive bioinformatics
   algorithm, REMOCON, to remove mouse contaminant reads from DNA-based,
   RNA-based, and methylation-based sequencing data, which is also freely
   available. Given PDX samples can harbor up to 70–80% mouse DNA or RNA
   due to the infiltration of murine stromal cells,[205]^26 mouse read
   contamination is an important source of errors in PDX sequencing data
   analysis and needs to be addressed prior to downstream analyses.
   Existing algorithm packages to remove mouse contaminant reads from
   human PDX samples have focused on either DNA sequencing data (e.g.,
   MAPEX algorithm[206]^27) or RNA sequencing data (e.g., Xenome
   algorithm[207]^28), or both (e.g., XenofilteR[208]^29). However, these
   previously published algorithms cannot process mouse contaminant reads
   from all three major NGS data types, including DNA-based, RNA-based, or
   methylation-based sequencing data. We demonstrated that REMOCON
   recovers the correct biological patterns of leukemia PDX models masked
   by mouse contaminant reads. In addition, we demonstrated a faithful
   recapitulation of pediatric leukemia disease in these PDX models
   through analysis of somatic mutations, copy number alterations, RNA
   expression, gene fusions, and whole-genome methylation patterns.

   Recently Ben-David et al.[209]^48 reported that PDX copy number
   patterns display significant divergence from the primary tumors that
   these PDXs originate from, and therefore questioned whether genetic
   evolution in PDXs can reflect the genomic conservation of primary
   tumors of the same origin, or as a consequence of mouse-specific
   selective pressures. However, other studies have not confirmed this
   report.[210]^49^,[211]^50^,[212]^51^,[213]^52 This is an important
   debate because the conclusion could impact the capacity of PDXs to
   faithfully model patient treatment response. In our PDX samples, we
   demonstrate significant fidelity in somatic mutations, copy number
   alterations, RNA expression, gene fusions, and whole-genome methylation
   patterns from matched leukemia samples, suggesting the high fidelity of
   PDX samples in our cohort.

Limitations of the study

   First of all, the majority of our PDX models are derived from patients
   with B cell ALL with fewer T cell and AML models generated, limiting
   the power of our genomic analyses in AML and T cell ALL more globally.
   We are continuing to expand this valuable leukemia PDX resource to
   include more T cell and AML models. The ability to generate patient
   derived xenografts from multiple centers spread across a significant
   geographic distance is critical in rare diseases such as pediatric
   cancer. The breakdown of our sample collection is reflective of the
   incidence of pediatric leukemia but is unique in the focus on patients
   of Hispanic ethnicity. Second, the role of genomic DNA methylation in
   the development of pediatric cancer is increasingly
   notable.[214]^53^,[215]^54^,[216]^55 However, it is cost prohibitive in
   large scale studies at this time. Our future endeavors include covering
   more samples in our cohort with multi-omics analysis including genomic
   DNA methylation sequencing. Third, we have observed the loss of certain
   surface markers (e.g., CD13) in the PDX models. However, the timeline
   of this shift in immunophenotype remains obscure. Further investigation
   on the time course of immunophenotype drift could provide valuable
   information to help understanding the related limitations and guide how
   to use these PDX models accordingly. Our forthcoming research endeavors
   are poised to delve comprehensively into this facet, thereby advancing
   our understanding of this immunophenotype drift process.

STAR★Methods

Key resources table

   REAGENT or RESOURCE SOURCE IDENTIFIER
   Antibodies
     __________________________________________________________________

   anti-mouse CD45-FITC antibody BD Biosciences, San Jose, CA, USA Cat.
   No. 103108
   anti-human CD45-APC antibody BD Biosciences, San Jose, CA, USA Cat. No.
   20-0459
     __________________________________________________________________

   Biological samples
     __________________________________________________________________

   Patient-derived xenografts (PDX) University of Texas Southwestern
   Medical Center/Children’s Health (Dallas, TX), the Greehey Children’s
   Cancer Research Institute UT Health San Antonio (San Antonio, TX), or
   Methodist Children’s Hospital (San Antonio, TX)
   [217]https://datacommons.swmed.edu/cce/ppdxe/
     __________________________________________________________________

   Critical commercial assays
     __________________________________________________________________

   lactate dehydrogenase (LDH) isoenzyme assay Helena Laboratories Cat.
   No. 3538T
   KAPA HyperPrep kit Roche Cat#5190-6210
   TruSeq mRNA Stranded Library Prep Kit Illumina Cat#20020595
     __________________________________________________________________

   Deposited data
     __________________________________________________________________

   Raw WES and RNAseq data This paper EGAS00001006710
   PDX information This paper
   [218]https://datacommons.swmed.edu/cce/ppdxe/
     __________________________________________________________________

   Software and algorithms
     __________________________________________________________________

   REMOCON algorithm This paper [219]https://github.com/jiwoongbio/REMOCON
   Burrows-Wheeler Aligner (BWA, v0.7.17) Li et al.[220]^56
   [221]https://github.com/lh3/bwa
   Picard (2.21.3) NA [222]https://broadinstitute.github.io/picard
   Genome Analysis Toolkit (GATK, 4.1.4.0) McKenna et al.[223]^57
   [224]https://gatk.broadinstitute.org/
   TopHat package Kim et al.[225]^58
   [226]http://ccb.jhu.edu/software/tophat/index.shtml
   DESeq2 R Bioconductor package Gentleman et al.[227]^59
   [228]https://bioconductor.org/packages/release/bioc/html/DESeq2.html
   CN-fusion NA [229]https://github.com/jiwoongbio/CN-fusion
   [230]Open in a new tab

Resource availability

Lead contact

   Further information and requests for resources should be directed to
   and will be fulfilled by the lead contacts Peter J. Houghton
   (HoughtonP@uthscsa.edu).

Materials availability

   This study did not generate unique reagents and is not part of a
   clinical trial.

Experimental model and study participant details

   Newly diagnosed or relapsed patient with leukemia younger than 26 years
   at the time of presentation were eligible and approached for enrollment
   on IRB approved institutional biobanking protocols. Informed consent
   was obtained prior to any study procedures including specimen
   collection. Patient specimens and demographic information were
   collected for consented patients at either the University of Texas
   Southwestern Medical Center/Children’s Health (Dallas, TX), the Greehey
   Children’s Cancer Research Institute UT Health San Antonio (San
   Antonio, TX), or Methodist Children’s Hospital (San Antonio, TX). Risk
   stratification and diagnosis were based on National Cancer Institute
   (NCI) criteria[231]^60 and World Health Organization (WHO)
   classification.[232]^61 Peripheral blood and bone marrow specimens of
   patients with active disease were collected in antibiotic containing
   RPMI 1640 Medium and stored at 4°C until implantation. Samples
   collected at UTSW/Children’s Health were shipped overnight to UTHSCSA
   using cold shipping containers to preserve specimens. DNA and RNA were
   extracted from Ficoll processed whole blood and bone marrow samples
   using Qiagen DNeasy Blood and Tissue Kit (Cat #69504) or Qiagen AllPrep
   DNA/RNA/protein kits (Cat#80004) respectively. Saliva was processed for
   germline DNA using prepIT.L2P (DNA Genotek). Patient germline sample
   from either whole blood or saliva was collected once marrow remission
   was clinically confirmed by flow cytometry. Ethnicity information was
   obtained by self-report from patient family. In our geographic area
   those reporting ethnicity are primarily of Mexican ancestry.

Method details

Establishment of leukemia PDX models

   Patient-derived xenografts (PDX) were generated as described[233]^62
   with slight modifications. NOD.Cg-Prkdc Il2rg/SzJ (NSG) mice (Jackson
   Laboratory, Bar Harbor, ME, USA) were used to establish patient-derived
   leukemia xenografts. Bone marrow and peripheral blood samples were
   collected from patients who provided written informed consent. This
   study was approved by the Institutional Review Board and the
   Institutional Animal Care and Use Committees of UTSW and UTHSA.

   Peripheral blood mononuclear cells were obtained through density
   gradient centrifugation (Histopaque®-1077 solution, Sigma-Aldrich,
   United Kingdom). Mononuclear cells (1 × 10^6 to 5 × 10^6) isolated from
   fresh BM or PB samples of primary leukemia patients were intravenously
   injected in tail of NSG mice. Assessment of engraftment of human
   leukemic cells in mice peripheral blood started from 3-4 weeks of
   implantation and analyzed over a period of 3-6 months. Time of being in
   the experiment depended on 1) the number and quality of viable cells,
   and 2) the type of leukemia injected in NSG mice. About 3-4 weeks after
   injection, leukemia progression was monitored in the peripheral blood
   of mice every 2-3 weeks by flow cytometry analyses using BD LSRFortessa
   X-20 Cell Analyzer (BD Biosciences, San Jose, CA, USA) with anti-mouse
   CD45-FITC and anti-human CD45-APC antibodies (BD Biosciences, San Jose,
   CA, USA). As shown in the [234]Figure S1, we reported the percentage of
   human CD45^+ cells from a flow cytometry analysis of a gated plot
   (hCD45+ vs. mCD45+). In addition, (hCD45+ vs. mCD45+) plot was
   determined from the live cells in the plot (viability vs. SSC), which
   preceded by doublets exclusion plots (FSC-A vs. FSC-H) and (FCS vs.
   SSC) respectively. We collected about 50 uL of blood (3 drops) from
   retro-orbital sinus with a sterile hematocrit capillary tube and under
   isoflurane anesthesia according Institutional IACUC protocol.

   We sacrificed the mice with the carbon dioxide asphyxiation or by
   cervical dislocation at the first indication of morbidity (≥20% weight
   loss, lethargy, riffled fur) according the Institutional IACUC protocol
   and/or when the proportion of human CD45^+ cells in the peripheral
   blood exceeded 50%. To make sure that the maximum of xenograft
   mononuclear cells would be collected, we harvested and purified them
   from bone marrow of femurs/tibias and spleen cells by density gradient
   centrifugation using Histopaque 1077.

   Purified tumor cells had been collected and cryopreserved in
   freeze-down sterile solution of 90% FBS and 10% DMSO for later use.
   Cryovials were kept at -80°C freezer for 24 h in Nalgene Cryo 1°C
   Freezing Containers and then transferred to liquid nitrogen for long
   storage. Aliquots a small number of cells kept frozen and used for
   downstream analysis. Human cells enrichment in the xenograft samples
   were evaluated by flow cytometry using the same antibody panel as for
   patient samples. PDX samples were analyzed by flow cytometry
   post-engraftment for expression of cell surface markers and compared to
   the patient primary leukemia sample flow cytometry completed at
   diagnosis. Bone marrow and spleen from PDX samples were collected at
   the stage of mortality or the highest level of engraftment. Xenograft
   identity was verified by DNA fingerprinting by STR analysis performed
   at the McDermott Center Sequencing Core, UT Southwestern Medical
   Center, Dallas, TX.

Human/mouse LDH isozyme assay for testing of leukemia samples

   With lactate dehydrogenase (LDH) isoenzyme assays (Helena Laboratories,
   Cat. No. 3538T), we quantitatively measured the levels of human/mouse
   lactate dehydrogenase (LDH) isoenzymes in the xenograft leukemia
   samples using agarose gel electrophoresis on the QuickGel Chamber from
   Helena Laboratories, Cat. No. 3538T. Lactate dehydrogenase (LDH) is a
   tetrameric enzyme that in vertebrates exists in five
   electrophoretically distinguishable forms known as isoenzymes
   (LD1-LD5). Each isoenzyme is designated by a number which is related to
   its electrophoretic mobility. LDH distribution patterns of mice and
   human are different. The isoenzymes of LDH have been determined by
   various methods. Electrophoresis provides far more information than the
   other methods because it allows complete separation of all five
   isoenzymes according to their electrophoretic mobility on agarose and
   visualization of the differences of patterns of distribution. After
   separation, each isoenzyme was detected colorimetrically: a tetrazolium
   salt is reduced with the formation of a colored formazan dye. A high
   quality scanning densitometer the QuickScan 2000 (Cat. No. 1660) was
   used to scan the gels for quantitative results with further analysis by
   ImageJ. A human control sample (HeLa, ATCC® CCL-2 cells) and mouse
   control sample (skeletal muscle tissue of normal mice) were included in
   each agarose gel. Fresh and frozen leukemia cells (1-5 X10^6) shows the
   same efficiency in this assay.

Flow cytometry analysis

   To characterize human leukemia cells from mouse cells, multicolor
   panels were designed for flow cytometry analysis, and each
   fluorochrome-conjugated antibody was titrated for optimal staining.

   Briefly, 50 μl of total peripheral blood or purified mononuclear cells
   (<10^6) were incubated 30 min at 4°C in the dark with fluorochrome
   labeled antibodies. The following monoclonal antibodies were used: FITC
   anti-human CD34, PE anti-human CD13, PE/Cy5 anti-human CD33,
   APC/Cyanine7 anti-human HLA-DR, PerCP anti-human CD61, PE Mouse
   Anti-Human Myeloperoxidase (MPO), PE/Cy5 anti-human CD3, PE anti-human
   CD5, PE/Cy7 anti-human CD7, PE/Cy7 anti-human CD10, APC/Cyanine7
   anti-mouse CD19, PE anti-human CD22, Alexa Fluor® 700 anti-human CD34,
   Alexa Fluor® 700 anti-human CD38, Brilliant Violet 711™ anti-human
   CD117 (c-kit), FITC Myeloperoxidase Monoclonal Antibody (8E6), Alexa
   Fluor® 594 anti-human CD79a, PE/Cy5 anti-human CD11b. Purified Rat
   Anti-Mouse CD16/CD32 (Mouse BD Fc Block™) Clone 2.4G2 (BioLegend, San
   Diego, CA, USA). After incubation, cells were washed with 5 ml of PBS
   and centrifuged at 380 × g (4°C) for 5 min. Red blood cells were
   removed by incubation with 1 ml of RBC lysis buffer (BioLegend, San
   Diego, CA, USA). Finally, cells were washed in PBS, resuspended in flow
   cytometry staining buffer (FCSB) (eBioscience, San Diego, CA, USA), and
   immediately analyzed by flow cytometry analysis.

   In the intracytoplasmic cell markers staining assay,
   Fixation/Permeabilization Working Solution (eBioscience, San Diego, CA,
   USA) was used. For flow cytometry acquisition (BD LSRFortessa X-20 Cell
   Analyzer (BD Biosciences, San Jose, CA, USA)), cells were resuspended
   in a final volume of 500 μl of FCSB. Data analysis was performed using
   FlowJo version 10.0.7 (Tree Star, CA). To define the best gating
   strategy to be applied, compensation was done with unstained, and
   “fluorescence minus one” (FMO) samples.

   Patient leukemia cells underwent diagnostic flow cytometry evaluation
   as part of routine, standard of care. Flow cytometry results were
   collected and cell surface markers which were identified as >dim + were
   noted. Patient derived xenograft leukemia cells were isolated and
   underwent flow cytometry as above. Paired samples were assessed for
   clusters of differentiation (CD) cell surface markers which were in
   common. Not all paired samples underwent the same, full analysis. CD
   markers assessed in both patient derived and PDX samples were
   identified as positive (>dim or >10%) or negative (no expression or
   dim) and the percentage in common calculated.

REMOCON algorithm

   In order to remove mouse contaminant reads, we developed an algorithm
   called REMOCON (short for REMOve CONtaminant reads) which is publicly
   accessible at [235]https://github.com/jiwoongbio/REMOCON. REMOCON is a
   series of PERL Script to be able to implement existing alignment tools,
   including BWA (v0.7.17)[236]^56 and SAMtools
   ([237]http://www.htslib.org). Detailed installation and implementation
   protocols are provided at [238]https://github.com/jiwoongbio/REMOCON.
   REMOCON utilizes the differential mappability of mouse reads onto the
   human and mouse reference genomes. REMOCON removes reads that are
   mapped only to the mouse reference genome or mapped with higher
   confidence to the mouse than the human reference genome. [239]Figure 3B
   displays our integrated analysis pipeline by leveraging REMOCON to
   remove mouse contaminant reads from DNA-based, RNA-based, and
   methylation-based sequencing data, and then performs downstream
   analyses, including but not limited to variant calling, copy-number
   calling, gene expression analysis, and whole-genome methylation
   measurements.

Whole exome sequencing and data analysis

   DNA libraries for whole-exome sequencing was constructed using KAPA
   HyperPrep kit. Approximately 250-500ng genomic DNA were sheared with
   Covaris S220 Ultra Sonicator to an average of 200-400bp fragments for
   DNA-seq library preparation. Then DNA-seq libraries were quantified and
   pooled together to go through two rounds of hybridization to enrich the
   DNA fragments of exome regions by using IDT xGen Exome Research Panel
   (V1 and V2). The final library was amplified, quantified, and loaded
   for 100bp paired end sequencing with Genome Sequencing Facility at
   UTHSA. An average of 60M reads were generated per sample. Trim Galore
   ([240]https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/)
   was used for quality and adapter trimming. The reference genome
   sequences of human (hg38) and mouse (mm10) were downloaded from
   Illumina iGenomes
   ([241]https://support.illumina.com/sequencing/sequencing_software/igeno
   me.html). The sequencing reads were aligned to human and mouse genome
   sequences using Burrows-Wheeler Aligner (BWA, v0.7.17),[242]^56 and
   contamination reads from mouse DNA were removed using REMOCON. Picard
   (2.21.3) ([243]https://broadinstitute.github.io/picard) was used to
   remove PCR duplicates and Genome Analysis Toolkit (GATK,
   4.1.4.0)[244]^57 was used to recalibrate base qualities. Calling
   variants and genotyping were performed using GATK HaplotypeCaller and
   low-quality variant calls were excluded by the following filtering
   thresholds: QD (Variant Confidence/Quality by Depth) < 2, FS
   (Phred-scaled p-value using Fisher's exact test to detect strand
   bias) > 60, MQ (RMS Mapping Quality) < 40, DP (Approximate read
   depth) < 3, GQ (Genotype Quality) < 7. A custom Perl script
   ([245]https://github.com/jiwoongbio/Annomen) was used to annotate
   variants with RefSeq[246]^63 human transcripts and proteins, dbSNP
   (build 151),[247]^64 Genome Aggregation Database (gnomAD,
   r3.0),[248]^65 Catalogue Of Somatic Mutations In Cancer (COSMIC,
   v90)[249]^35 (cancer.sanger.ac.uk). Jaccard distances between samples
   were calculated by point mutations (i.e., single nucleotide variation)
   from WES data, and then used to generate hierarchical clustering of
   samples. Somatic mutations were identified by GATK Mutect2 and somatic
   copy number alterations were identified by DEFOR.[250]^66 Oncoplot was
   generated using maftools[251]^67 with somatic mutations selected by the
   following criteria: variant-supporting reads in tumor sample ≥3,
   variant allele frequency (VAF) in tumor sample ≥0.1, variant-supporting
   reads in normal sample <3, VAF in normal sample <0.1, and also (dbSNP
   allele frequency ≤0.01, or gnomAD allele frequency ≤0.01, or COSMIC
   occurrence ≥3). Loss-of-heterozygosity (LOH) variants selected by the
   following criteria: VAF in tumor sample ≥0.7 and gnomAD homozygous
   individuals ≤3. The clustering analysis was based on Jaccard distances
   among mutations identified from WES data.

RNA sequencing and data analysis

   The quality of Total RNA was checked by Agilent Fragment Analyzer
   (Agilent Technologies, Santa Clara, CA), and only RNAs with RQN >7 were
   used for subsequent mRNA-seq library preparation and sequencing.
   Approximately 500ng Total RNA was used for RNA-seq library preparation
   by following the Illumina TruSeq stranded mRNA sample preparation
   guide. After RNA-seq libraries were subjected to quantification
   process, pooled for cBot amplification and subsequent 100bp paired read
   sequencing run with Illumina HiSeq 3000 platform. An average of 80M
   reads were obtained per sample. The reads were aligned to human and
   mouse transcript sequences using Bowtie (v2.3.4.3)[252]^68 within the
   TopHat package,[253]^58 and mouse contamination and ambiguous reads
   were removed using REMOCON. The quality of sample libraries and
   strand-specificity were estimated based on the alignments. SAMtools
   (v1.9) was employed to sort the alignments. HTSeq Python
   package[254]^69 was employed to count reads per gene and DESeq R
   Bioconductor package[255]^59 was used to normalize read.
   SpliceFisher[256]^70 ([257]https://github.com/jiwoongbio/SpliceFisher)
   was used to identify differential alternative splicing events and
   calculate PSI (percent spliced-in) values. Fusion genes were identified
   by a custom Perl script ([258]https://github.com/jiwoongbio/CN-fusion).

Whole genome bisulfite sequencing and data analysis

   Whole genome bisulfite sequencing (WGBS) was done with Zymo
   Methylation-Gold Kit for DNA bisulfite conversion and SwiftBiosciences
   Accel-NGS® Methyl-Seq kit for library preparation, using approximately
   100ng gDNA as starting material. After WGBS libraries were subjected to
   quantification process, pooled for cBot amplification and subsequent
   100bp paired read sequencing run with Illumina HiSeq 3000 platform. An
   average of 250-300M reads were obtained per sample. The reads were
   aligned to human and mouse genome sequences using Bismark
   (v0.22.3),[259]^71 and REMOCON was used to calculate alignment scores
   and remove contamination reads. The alignments were deduplicated and
   the methylation sites were called using Bismark pipeline.

Quantification and statistical analysis

   Copy number alterations and somatic mutations are discontinuous
   variables and therefore their associations with other variables are
   conducted by Fisher’s exact test. Gene expression values are continuous
   variables and therefore their associations with other features are
   conducted by Wilcoxon rank sum test. For each sample in [260]Figure 5B,
   the growth curve data was fitted using linear regression model to
   estimate the growth rate. Correlation analysis was performed as
   Spearman rank-order correlation with a two-tailed P value, and Spearman
   Rho was calculated. In all statistical tests, nominal p-values were
   corrected for multiple testing using the Benjamini–Hochberg method. A
   Benjamini-Hochberg adjusted P value of <0.05 was considered as
   statistically significant.

PDX model availability

   All PDX models are available after completing the request form on the
   PDX Explorer website under an MTA from UTHSA.

Acknowledgments