Abstract
As more clinically-relevant genomic features of myeloid malignancies
are revealed, it has become clear that targeted clinical genetic
testing is inadequate for risk stratification. Here, we develop and
validate a clinical transcriptome-based assay for stratification of
acute myeloid leukemia (AML). Comparison of ribonucleic acid sequencing
(RNA-Seq) to whole genome and exome sequencing reveals that a
standalone RNA-Seq assay offers the greatest diagnostic return,
enabling identification of expressed gene fusions, single nucleotide
and short insertion/deletion variants, and whole-transcriptome
expression information. Expression data from 154 AML patients are used
to develop a novel AML prognostic score, which is strongly associated
with patient outcomes across 620 patients from three independent
cohorts, and 42 patients from a prospective cohort. When combined with
molecular risk guidelines, the risk score allows for the
re-stratification of 22.1 to 25.3% of AML patients from three
independent cohorts into correct risk groups. Within the adverse-risk
subgroup, we identify a subset of patients characterized by
dysregulated integrin signaling and RUNX1 or TP53 mutation. We show
that these patients may benefit from therapy with inhibitors of focal
adhesion kinase, encoded by PTK2, demonstrating additional utility of
transcriptome-based testing for therapy selection in myeloid
malignancy.
Subject terms: Cancer genetics, Cancer genomics, Cancer screening,
Haematological cancer, Acute myeloid leukaemia
__________________________________________________________________
Several genomic features have been found for acute myeloid leukaemia
(AML) but targeted clinical genetic testing fails to predict prognosis.
Here, the authors generate an AML prognostic score from RNA-seq data of
patients, which successfully stratifies AML patients and which may
provide guidance for therapeutic strategies.
Introduction
The myeloid malignancies consist of a group of related hematopoietic
stem/progenitor cell cancers, including acute myeloid leukemia (AML)
and myelodysplastic syndromes (MDS)^[70]1–[71]3. The prognosis for AML
patients older than sixty years of age has not substantially improved
in decades and remains dismal. This lack of progress highlights the
need for improved diagnostic approaches, clinical assessment, and
treatment strategies, all of which require a better understanding of
the genetic basis of these diseases. Currently, clinical diagnostics
and risk stratification for AML rely on cytogenetic screening for
structural genomic alterations and targeted sequence-based screening
for prognostic and predictive genetic variants^[72]1,[73]4. However,
~50% of patients are stratified to an intermediate-risk group and
remain difficult to assign to an appropriate consolidation of therapy
regimen^[74]1, exemplifying the need for improved stratification of AML
patients.
While standard-of-care clinical guidelines used for AML stratification
are relatively conservative in their use of novel genetic markers,
several recent studies have proposed revised stratification schemes.
Recent updates to the European LeukemiaNet (ELN) guidelines for AML
stratification include RUNX1, TP53, and ASXL1^[75]5, and more recently
published schemes incorporate mutational status in 25 genes and propose
14 separate disease subtypes, although this has not been clinically
incorporated^[76]6. Additional models using mutational and clinical
data^[77]7 or gene expression profiling^[78]8–[79]10 have also been
proposed for improved patient stratification.
While next-generation sequencing of small gene panels is currently
common for diagnostic purposes, genome-wide screening offers many
potential benefits^[80]11. RNA-Seq assays are typically treated as
complementary to existing DNA-based assays rather than as stand-alone
assays. In some cases, however, RNA-Seq has been shown to be more
useful than whole-exome sequencing (WES) or whole-genome sequencing
(WGS) in providing actionable clinical hypotheses in
cancer^[81]12,[82]13 and Mendelian disorders^[83]14,[84]15. RNA-Seq
assays offer potential clinical benefits including the ability to
detect expressed structural variants (SVs), alternative isoform usage
and splicing variation, and global gene expression^[85]16, all of which
are known to be relevant for understanding the pathogenesis of myeloid
malignancies. In addition, RNA-Seq has the potential to supplant
cytogenetic testing, potentially improving cost-effectiveness. However,
for RNA-Seq assays to be incorporated into clinical workflows, strict
standards for analytic and clinical validity must be demonstrated and a
quality framework established^[86]17,[87]18.
In this work, we show that RNA-Seq based testing exceeds the current
clinical standard of care for the assessment of myeloid malignancies,
and provides the broadest range of genomic information, when compared
to WES- and WGS-based approaches. Further, we develop a novel gene
expression signature that allows for the restratification of cases
classified by the current protocol as intermediate-risk AMLs into high-
or low-risk subgroups, thereby allowing better risk stratification for
clinical management. Finally, to demonstrate the utility of
transcriptome-based testing in improving therapy selection in AML, we
identify a subset of high-risk patients with dysregulated integrin
signaling, which is potentially amenable to inhibitors of focal
adhesion kinase (FAK). As therapeutic options for myeloid malignancies
continue to evolve, a global transcriptome-based approach to
diagnostics will allow reconfiguration of mutation- and
expression-based predictors to best take advantage of new genomic
information as it arises.
Results
Experimental design and quality control
To compare RNA-Seq, WGS, and WES as potential platforms for clinical
assessment of myeloid malignancies, we constructed a patient cohort
(the AML Personalized Medicine Program, or AML PMP cohort) consisting
of patients with de novo AML, secondary AML (sAML), therapy-related AML
(tAML), MDS, and therapy-related MDS (tMDS). To demonstrate the
analytic validity of the RNA-Seq pipeline, we constructed a separate
validation cohort consisting of replicated patient and cell line
material, analyzed a local prospective cohort of newly diagnosed
patients, and re-analyzed patient RNA-Seq libraries from The Cancer
Genome Atlas AML (TCGA LAML)^[88]19, Beat AML^[89]20, and TARGET
AML^[90]21 cohorts (Fig. [91]1A, Supplementary Data [92]1–[93]8,
Supplementary Fig. [94]1). We generated and compared quality metrics
for the AML PMP cohort, determined appropriate quality thresholds
(Supplementary Fig. [95]2), and compared coverage depth across
sequencing platforms (Supplementary Fig. [96]3) for a set of 44 genes
with established disease relevance (Supplementary Data [97]9).
Fig. 1. Experimental overview and short nucleotide variant/indel analysis.
[98]Fig. 1
[99]Open in a new tab
A Overview of datasets used at each stage of the analysis, colored by
sequencing project, with sample size for each data set indicated in
brackets. For the AML PMP exploratory batch, MDS samples were used for
profiling the SNV and gene fusion pipelines, but not for
expression-based analyses, and validation RNA-Seq libraries were
prepared in triplicate. B–C Matched variant allele frequency (VAF) for
variants between whole-genome sequencing (WGS) and RNA-Seq (B) and
whole exome sequencing (WES) and RNA-Seq (C), by concordance status,
variant type, and coverage status (sites with ≤10x coverage are
indicated as ‘Low WGS/WES Depth’). Selected variants discussed in the
text are labeled with Human Genome Variation Society (HGVS)
nomenclature. D Variant observations in CEBPA. Potentially disruptive
mutations for each patient (y-axis) are indicated by their chromosomal
coordinate (x-axis), with predicted impact and variant concordance. E
Sensitivity and positive predictive value (PPV) for each variant caller
in the validation cohort (with 95% confidence intervals).
SNV and small indel detection
SNVs and short indel detection can be problematic in RNA-based assays
due to varying levels of transcription, the difficulty of spliced
alignment, allele-specific expression, and RNA editing^[100]16. We
therefore assessed concordance, sequence coverage, and variant allele
frequency (VAF) in matched RNA-Seq, WES, and WGS libraries, using three
different variant calling algorithms (GATK HaplotypeCaller^[101]22,
FreeBayes^[102]23, and VarScan2)^[103]24 and an ensemble caller.
Variant coverage depth was considerably higher for a given RNA-Seq
variant (median coverage: 317 and 168 across the RNA-Seq cohorts)
compared to WES (median coverage: 112) or WGS (median coverage: 36)
variants (Supplementary Fig. [104]4, Supplementary Data [105]10). These
observed coverage depths imply that sequencing RNA-Seq libraries with
fewer total reads would still provide sufficient sequencing coverage
for high-confidence variant detection (e.g., sequencing ~60 million
reads per library would result in an expected mean variant depth of
100x), although deeper sequencing would be expected to improve variant
detection sensitivity.
The majority of variants were concordant between matched RNA-Seq and
WGS (84/94 concordant calls) or WES (375/402 concordant calls)
libraries. Comparison of RNA-Seq to WGS revealed eight cases where
variant calls were present in the RNA-Seq but not WGS data, and two
variants in the WGS data that were not observed in the matched RNA-Seq
libraries (Fig. [106]1B, Supplementary Data [107]11). Comparison of
RNA-Seq to WES revealed 19 calls present in RNA-Seq but not WES data,
and eight calls unique to the WES libraries (Fig. [108]1C,
Supplementary Data [109]12). Seven out of eight of the variants that
were unique to WES and were not found in RNA-Seq data were rare
X-linked variants (in BCORL1, BCOR, and SMC1A) in female patients,
suggesting the variant alleles reside on the inactivated X chromosome
(Supplementary Data [110]13). These variants are all annotated as
benign, of unknown significance, or unreported in ClinVar. For the
variants of unknown significance, their lack of expression at the
transcript level suggests they are likely benign changes. The eighth
WES-specific variant was located on chromosome 3 (GATA2 p.Pro161Ala),
and is annotated as a likely benign germline change in ClinVar. In
general, we observed biased expression for variants in WT1 and GATA2,
which are located on chromosomes 11 and 3, respectively, and are known
to be imprinted^[111]25,[112]26. Variants present in RNA-Seq libraries
but not in matched WES or WGS libraries were mostly discordant due to
low coverage in the matched WES or WGS data. However, several cases
where variants were present at low allelic frequencies in the WES and
WGS data were found to be highly expressed in the RNA-Seq data. These
variants included a nonsense mutation in WT1, and a KIT p.Asp816Tyr
change, both with prognostic relevance that would be likely missed in
WGS or WES data.
Proper assignment of mono-allelic or bi-allelic mutation status in
CEBPA is necessary for patient stratification^[113]27. Sequence
coverage across this GC-rich gene was particularly poor in WES
libraries, leading to six additional calls being recovered in the
RNA-Seq data compared to WES (Fig. [114]1D, Supplementary Fig. [115]5).
Comparison of RNA-Seq to WGS revealed a single CEBPA variant that was
missed in the WGS data.
Validation analysis
To characterize the reproducibility of the RNA-Seq variant-calling
pipeline, we compared filtered call sets from technical replicate
samples to curated reference sets (Supplementary Figs. [116]6–[117]7,
Supplementary Data [118]5, [119]14–[120]15), calculating the
sensitivity and positive predictive value (PPV) of the observed
variants for each individual variant caller, as well as the ensemble
caller. Of the single callers, GATK HaplotypeCaller had the highest
sensitivity (0.978 (0.946–0.994)) and lowest PPV (0.985 (0.961–0.999))
(Fig. [121]1E). The ensemble caller improved on the PPV of GATK
HaplotypeCaller (1 (0.98–1)), but not the sensitivity (0.973
(0.938–0.991)). Notably, the difference in sensitivities between
callers was solely due to GATK HaplotypeCaller recovering two
additional synonymous variants compared to the other calls (while at
the same time not recovering a third synonymous variant) (Supplementary
Data [122]16). The remaining false-negative calls consisted of the same
X-linked synonymous variants observed in the exploratory cohort
(Supplementary Data [123]17), while the GATK false-positive calls
consisted of rare alignment artefacts around splice junctions
(Supplementary Data [124]18). While all three callers showed similar
high performance across the target space, we chose to use GATK
HaplotypeCaller for subsequent experiments in order to simplify
downstream analyses.
Structural variation
The myeloid malignancies are characterized by the presence of recurrent
SVs. We observed complete sensitivity in the exploratory cohort of 173
patients with RNA-Seq libraries for detection of the main clinically
relevant gene fusions, including PML-RARA (nine cases), MYH11-CBFB (11
cases), RUNX1-RUNX1T1 (eight cases), and MLLT3-KMT2A (four cases)
(Fig. [125]2A), in addition to 11 rarer translocations (Supplementary
Data [126]19). After establishing filtering criteria for novel fusions,
we observed several interesting novel fusion events which either
recapitulated the known cytogenetic events in greater resolution, or
identified cryptic events which were not detected by cytogenetics
(Supplementary Fig. [127]8A, Supplementary Data [128]20–[129]22). For
example, we observed multiple CUX1 fusions^[130]28 in a patient with
sAML, complex karyotype, and TP53 mutations. Similarly, we observed an
ERG fusion^[131]29 in another patient with sAML, complex karyotype, and
TP53 mutation. After review, we identified a total of 18 novel gene
fusions in disease-related genes after filtering, including seven
patients with novel rearrangements in the KMT2A gene family
(Supplementary Fig. [132]8B). These novel fusions would likely have
altered treatment decisions in several cases.
Fig. 2. Structural variation analysis.
[133]Fig. 2
[134]Open in a new tab
A Number of structural variation events detected by cytogenetics
analysis vs. RNA-Seq fusion detection. B
[MATH: z :MATH]
-scaled gene expression of GATA2 and MECOM in the AML PMP exploratory
patient cohort. MECOM- (red) and KMT2A-related (blue) structural
variants are colored. The dashed horizontal line indicates the
threshold of
[MATH: z :MATH]
≥ 3 for MECOM outlier expression.
The only case where a clinically relevant chromosomal rearrangement
was detected by cytogenetics, but was not detected by RNA-Seq was a
MECOM (EVI1)-RPN1 fusion representing an inv(3) karyotypic
rearrangement. This case was not recovered since the fusion breakpoint
occurs downstream of both genes^[135]30. Since inv(3) rearrangements
induce aberrant expression of MECOM by relocating a GATA2 enhancer
sequence, we compared the relative expression of these genes
(Fig. [136]2B). The relevant case exhibited the expected pattern of
elevated MECOM expression, as did another MECOM-rearranged sample. High
MECOM expression was also observed in KMT2A-family rearranged AMLs, as
noted previously^[137]31,[138]32 (Supplementary Data [139]23). A
similar pattern was observed in the TCGA LAML cohort (Supplementary
Data [140]24). These findings indicate that MECOM rearrangements can be
detected using expression analysis to identify cases with high MECOM
expression, but which lack KMT2A-family rearrangements.
We further evaluated the accuracy of our pipeline for two other
clinically relevant SVs: internal tandem duplications (ITDs) in FLT3
(Supplementary Figs. [141]8C, [142]D) and partial tandem duplications
(PTD) in KMT2A (MLL). We observed complete analytic sensitivity (33/33
positive results recovered) for FLT3-ITD detection, and also observed
ten novel FLT3-ITD calls in cases where a prior PCR-based assay showed
a negative result. These novel events were validated by panel
sequencing (eight cases) or manual review (two cases), and consisted
mainly of events with very low estimated burdens (Supplementary
Fig. [143]9). Although low- and high-burden FLT3-ITDs are thought to
confer differing risk status^[144]33, we did not observe a difference
in outcomes for patients between low and high allele fraction FLT3-ITD
variants (Supplementary Fig. [145]10). We also observed ten KMT2A (MLL)
PTD events, which are not detectable by standard clinical assays, but
are crucial for proper assessment of patient risk status^[146]34. In
five of these cases, detection of the PTD event resulted in a change of
patient stratification from intermediate to adverse by ELN
guidelines^[147]5.
Gene expression signature—AML prognostic score (APS)
Having validated the RNA-Seq assay as an improvement upon current
clinical assay performance, we next sought to derive a gene expression
signature that could be used to improve risk stratification for a broad
spectrum of AMLs. We used least absolute shrinkage and selection
operator (LASSO) regression^[148]35 to derive a 16-gene expression
signature, which we named the APS (Fig. [149]3A, Supplementary
Data [150]25). This model included both genes with previously described
leukemic associations, such as CD109^[151]36, and genes with previously
undescribed leukemic associations. The APS value was strongly
associated with overall survival in the AML PMP exploratory cohort
(HR = 5.46, p = 4.57e−13, Fig. [152]3B). To validate the utility of the
gene expression signature, we applied the APS to two separate
validation cohorts, observing a strong association with overall
survival in both the TCGA LAML (HR = 2.52, p = 2.48e−06, Fig. [153]3C)
and Beat AML (HR = 2.43, p = 1.88e−06, Fig. [154]3D) cohorts. We then
tested the utility of the APS signature in a prospective local cohort,
again observing a strong association with overall survival (HR = 2.97,
p = 0.00676, Fig. [155]3E). Interestingly, APS was also predictive in a
pediatric AML cohort^[156]21 (HR = 2.16, p = 0.00115, Fig. [157]3F).
The APS value also appears robust to changes in library preparation
protocol, as matched libraries prepared using ribo-depletion protocols
showed well-correlated gene expression values for the signature genes
(for ribo-depleted libraries with good mapping rates, Supplementary
Fig. [158]11).
Fig. 3. Training and validation of the AML prognostic score (APS) gene
expression signature.
[159]Fig. 3
[160]Open in a new tab
A APS model coefficients. The y-axis indicates the genes making up the
APS set, with the x-axis indicating the model coefficients. B–F
Survival plots for the AML PMP (B, n = 154), TCGA LAML (C, n = 173),
Beat AML (D, n = 293), AML PMP Prospective (E, n = 42), and TARGET
pediatric AML (F, n = 156) cohorts, for above-median and below-median
values of the APS value within each cohort. Dashed lines indicate time
to median survival. Log-rank p values are indicated for each cohort.
It has previously been demonstrated that the LSC17 score, based on the
expression of 17 genes linked to leukemic stem cells (LSCs), can be
used to identify AML cases with poor prognosis and treatment
resistance^[161]10. We calculated LSC17 scores for each of the de novo
AML (including favorable-risk cases), tAML, and sAML samples, and
observed that samples with above-median scores (within each cohort) had
significantly worse survival outcomes in the AML PMP (HR = 1.99,
p = 0.000995), TCGA LAML (HR = 1.86, p = 0.00116), and Beat AML
(HR = 1.69, p = 0.00324) cohorts. However, in each case, the hazard
ratio for the APS model was higher than the LSC17 model. In comparing
the APS values and LSC17 scores, we observed that most patients had the
same high/low status for both markers (AML PMP: 66.2%, TCGA LAML:
69.9%, Beat AML: 63.8%, Supplementary Fig. [162]12, Supplementary
Data [163]26). In multivariate survival models including ELN risk
category and either the APS value or LSC17 score, we saw that the APS
value was associated with a larger hazard ratio and stronger
significance in both the AML PMP (APS: HR = 3.6, p = 5.76e-06, LSC17:
HR = 1.63, p = 0.0235) and TCGA LAML (APS: HR = 1.98, p = 0.00567,
LSC17: HR = 1.46, p = 0.0582) cohorts (Supplementary Fig. [164]13).
Expression-based stratification
To determine whether the addition of gene expression information could
improve AML risk stratification, we analyzed the contributions of
clinical and molecular predictors in the AML PMP exploratory cohort
using univariate survival analysis (Fig. [165]4A, Supplementary
Data [166]27). Older patients (age ≥ 60 at diagnosis) had very poor
outcomes (HR = 3.11, adj. p = 1.32e−05), while the remaining clinical
variables were non-significant. We observed a strong association
between the presence of a KMT2A-related fusion or PTD with poor
outcomes (HR = 3.71, adj. p = 6.13e−05), and common molecular
alterations such as NPM1 and FLT3-ITD showed the expected positive and
negative associations with outcome^[167]5. TP53 mutations (HR = 4.59,
adj. p = 0.00163) showed a strongly adverse effect as previously
reported^[168]37.
Fig. 4. Univariate survival analysis.
[169]Fig. 4
[170]Open in a new tab
A–C Univariate survival analysis for selected clinical, gene fusion,
mutation, stratification, and expression-based predictors of survival
for the AML PMP (A), TCGA LAML (B), and Beat AML (C) cohorts. In each
panel, the x-axis indicates the hazard ratio, and the y-axis indicates
the FDR-adjusted p-value for each tested variable for univariate Cox
proportional hazards models. Point sizes are scaled to reflect the
proportion of patients affected by the relevant marker, and colored as
clinical variables—red, gene fusions—green, SNV/indels—purple,
expression—blue. D Distribution of APS values for AML PMP patients,
colored by ELN-RNA risk status (favorable —blue, intermediate—green,
adverse—red). Horizontal dashed lines indicate the bottom,
intermediate, and top terciles of the APS value distribution.
Comparison bars show two-sided Wilcoxon test p-values for each pairwise
comparison.
The expression level of several genes has been previously described as
having prognostic relevance for AML, including GPR56^[171]38,
BAALC^[172]39, MN1^[173]40, MECOM^[174]8,[175]31, and FLT3^[176]13. We
included continuous z-scaled expression of these genes (Supplementary
Fig. [177]14), in addition to the APS and LSC17 multigene expression
signatures in the univariate analysis. Patients with above-median APS
values had the highest observed hazard ratio (HR = 5.46, adj.
p = 1.74e−11). We also observed significant effects of above-median
LSC17 expression (HR = 1.99, adj. p = 0.0063), MECOM expression
(HR = 1.4, adj. p = 0.000392), and GPR56 expression (HR = 1.27, adj.
p = 0.0401). We observed similar results in both the TCGA LAML
(Fig. [178]4B) and Beat AML cohorts (Fig. [179]4C). In these validation
cohorts, the highest hazard ratios were associated with age, TP53
mutations, and high APS values.
To re-stratify the patient cohorts, we first used ELN criteria^[180]5
deriving information only from the RNA-Seq assay (ELN-RNA
stratification, Supplementary Fig. [181]15, Supplementary Data [182]28)
to stratify the patient cohorts into favorable-, intermediate-, and
adverse-risk groups. We chose to use data from the RNA-seq assay rather
than from standard clinical testing, because our data above indicate
that the RNA-seq assay was more accurate in identifying fusions and
relevant pathogenic variants, and thus would provide for more robust
stratification. For this stratification, we considered outlier MECOM
expression to be an adverse prognostic marker, in lieu of detecting
inv(3) rearrangements. We then divided the patient cohorts into
terciles based on APS value, noting that the APS values for favorable-,
intermediate-, and adverse-risk patients showed a significant increase
with increasing risk status (Fig. [183]4D). We then used these APS
tercile values to re-stratify the patients as follows: patients with
first-tercile (low-risk) APS values were stratified as favorable-risk,
patients with second-tercile APS values retained their ELN-RNA
stratification, and patients with third-tercile (high-risk) APS values
were stratified as adverse-risk.
In the AML PMP cohort, 34 of 154 (22.1%) patients were re-stratified
(Fig. [184]5A). As expected, the number of intermediate-risk patients
in the ELN-RNA-APS model was reduced—from 38 in the ELN-RNA model to 13
in the ELN-RNA-APS model. Six patients moved from the favorable-risk
group to adverse—a tAML case with inv(16) cytogenetics, and five other
cases with NPM1 mutations. The four cases that moved from the
intermediate to favorable categories all had long survival durations,
and three of the four carried both NPM1 and FLT3-ITD mutations.
Twenty-one cases were moved from the intermediate to adverse risk
groups. This set of patients was characterized by tAML and sAML
disease, and several normal-karyotype AML patients with no detected
fusions or alterations (all with very poor outcomes). Finally, three
patients transitioned from the adverse to favorable category—these
patients had either KMT2A-PTD alterations or a NUP98-KMT2A gene fusion.
Notably, the survival curves for the ELN-RNA-APS model were not
different from the ELN-RNA model, indicating that the patients
re-stratified away from intermediate status were accurately re-assigned
(Fig. [185]5B–C).
Fig. 5. Patient stratification.
[186]Fig. 5
[187]Open in a new tab
A Comparison of patient stratification by ELN-RNA and ELN-RNA-APS
models for the AML PMP cohort. The width of each link is scaled to the
number of patients shared between each set of risk categories. B–C
Survival curves for the ELN-RNA and ELN-RNA-APS models for the AML PMP
cohort. D–I Stratification and survival curves for the TCGA LAML and
Beat AML patient cohorts. Patient groups are colored as in Fig. [188]4D
(favorable—blue, intermediate—green, adverse—red), with log-rank p
values indicated.
We then applied the same models to the TCGA LAML and Beat AML cohorts
(Fig. [189]5D–I). In both validation cohorts, we observed a similar
pattern: the number of intermediate-risk patients was reduced, while
the outcomes for the newly assigned adverse-risk and favorable-risk
patients were quite similar to the ELN-RNA model. In the TCGA LAML
cohort (Fig. [190]5D), 43 of 173 (24.9%) cases were reassigned, while
in the Beat AML cohort (Fig. [191]5G), 74 of 293 (25.3%) were
reassigned. Re-stratified cases were similar in nature to those
described in the AML PMP cohort, with similar survival results.
Altogether, in all three cohorts, the addition of expression
information in the ELN-RNA-APS stratification model provided a clear
stratification improvement over the standard model using ELN criteria
alone.
To determine the impact of including clinical karyotyping information
instead of the purely RNA-Seq based stratification models, we generated
alternate patient stratification models which used the diagnostic
karyotype information rather than gene fusion data as the source of SV
information (Supplementary Fig. [192]15). While 22/154 cases were
discrepant between the stratification models relying on molecular
alterations alone, applying the APS re-stratification reduced the
number of discrepancies to 9/154. Of these nine cases, two were
discrepant due to cryptic KMT2A-family fusions observed in the RNA-Seq
data (and so were identified as adverse by the RNA-Seq based
models)—these patients both showed very poor outcomes (mean survival
time of 22 weeks). The remaining seven were discrepant due to rare
cytogenetic abnormalities without corresponding gene fusion events (and
so were identified as adverse by the karyotype-based models)—these
patients showed better outcomes (median survival time of 111 weeks).
Though the number of cases is low, these results suggest that including
diagnostic karyotyping information would not necessarily improve the
RNA-Seq based stratification model.
Pathway analysis and differential expression analysis
To demonstrate the utility of the transcriptome-based assay and gene
expression signature for therapy selection, we first used Ingenuity
Pathway Analysis (IPA)^[193]41 and Gene Set Enrichment Analysis
(GSEA)^[194]42 to identify differentially activated pathways between
patients with first-tercile (low-risk) and third-tercile (high-risk)
APS values, excluding acute promyelocytic leukemia (APL) patients. In
the IPA analysis, we observed results consistent with dysregulation of
molecules involved in integrin, chemokine, and cytokine signaling in
the high-risk group (Fig. [195]6A). We performed a similar analysis
using GSEA (with the Reactome^[196]43 database of pathways) and saw
similar results: integrin and chemokine signaling pathways were
strongly enriched across all three patient cohorts in the high-risk
group (Fig. [197]6B). To identify potentially targetable molecules from
the pathway enrichment results, we analyzed molecules that recurred
across multiple enriched pathways or gene sets. This analysis revealed
integrins (e.g., ITGB3, ITGA2B), focal adhesions and cytoskeletal
rearrangements (e.g., PTK2, MYLK, MYL9), SRC/RAS signaling components
(e.g., SRC, RASD1, RASGRP1), chemokines (e.g., CXCL1, CCR4, CXCL3), and
PI3K/AKT/MTOR signaling (e.g., AKT3) as being some of the key
dysregulated molecules in third-tercile APS patients (Supplementary
Fig. [198]16).
Fig. 6. Pathway enrichment analysis.
[199]Fig. 6
[200]Open in a new tab
A Pathway enrichment analysis of IPA canonical pathways overrepresented
in differentially expressed genes across all three cohorts. Pathways
are ranked by mean activation
[MATH: z :MATH]
-score, with point size scaled by Fisher’s exact test p values. B
Geneset enrichment analysis (GSEA) against the Reactome pathway
database. Pathways are ranked by the number of cohorts the pathway was
enriched in, then by the mean FDR-adjusted p value across cohorts. GSEA
p values are derived from permutation testing, and corrected for
multiple testing using the FDR method. In both panels, gene sets
containing PTK2 and/or ITGB3 are highlighted in bold and italic.
To further analyze the biological mechanisms underlying the differences
between patients with low and high APS values, we performed
differential expression analysis (Fig. [201]7). We observed large
differences for the APS genes CD109, CALCRL, and TMEM273.
Interestingly, CD109 was previously identified as being upregulated in
RUNX1-mutated AMLs^[202]44, and CALCRL has also recently been described
as being over-expressed in the immature CD34^+CD38^− compartment of AML
patient marrow^[203]45. Previously described genes with prognostic
implications in AML such as GPR56^[204]38 and MECOM^[205]8,[206]31 were
also consistently upregulated in the third-tercile set of patients.
Among the most downregulated genes, we observed TRH (previously
described as being associated with t(8;21) AML)^[207]46, IL5RA, and the
heme peroxidases LPO and MPO.
Fig. 7. Differential gene expression for patients with first-tercile vs.
third-tercile APS values.
[208]Fig. 7
[209]Open in a new tab
A–C Volcano plots of differentially expressed genes for the AML PMP
(A), TCGA LAML (B), and Beat AML (C) cohorts. Genes with absolute
log[2]fold-change of 1 and FDR-adjusted p ≤ 0.1 (based on Wald tests as
implemented in DESeq2) are highlighted in red (over-expressed in
third-tercile APS) and blue (over-expressed in first-tercile APS). D
Comparison of differentially expressed genes between cohorts. All genes
which were differentially expressed in any single cohort are displayed
with their log2FoldChange values across all cohorts. The top 40
over-expressed and bottom ten under-expressed genes (by mean
fold-change across cohorts) are labeled.
As a proof-of-principle for using transcriptome-based testing for
therapy selection, we chose to focus on integrin signaling via FAK. In
particular, we observed that ITGB3 and PTK2 (which encodes FAK), were
highly expressed in third-tercile APS patients in all three cohorts,
and were central molecules in many of the enriched pathways. FAK has
been previously characterized as a therapeutic target in various
cancers^[210]47–[211]49, while ITGB3 has been demonstrated to be a
vulnerability in some murine leukemias^[212]50.
Dysregulated integrin signaling in high-risk AML
Based on the pathway and differential expression analysis, we sought to
determine whether the high-risk AML cases with dysregulated integrin
signaling were characterized by specific genetic lesions. In all three
cohorts, we observed a significant concentration of RUNX1 or TP53
mutation in patients with elevated expression of PTK2 (Fig. [213]8A).
In addition, elevated expression of PTK2 often occurred in patients
with high APS values, and with sAML or tAML (Supplementary
Fig. [214]17). Interestingly, high expression of PTK2 was not
associated with FLT3-ITD mutation, indicating that this overexpression
was associated with a subset of high-risk AML cases, rather than all
high-risk AML. We also observed, using data from the AML Proteome
Atlas^[215]51, that FAK protein expression was strongly correlated with
the expression of several other proteins involved in focal adhesions,
including SRC, ITGB3, and ITGA2 (Fig. [216]8B). In addition, protein
expression of FAK was significantly higher in patients with RUNX1
mutation, and also elevated (non-significantly) in patients with TP53
mutation (Fig. [217]8C), but decreased in patients with FLT3 mutations,
confirming that the observations in the RNA-Seq data were also present
at the protein level.
Fig. 8. Correlation of PTK2 expression with specific mutations.
[218]Fig. 8
[219]Open in a new tab
A Samples from each cohort ranked by
[MATH: z :MATH]
-scaled expression of PTK2. The upper annotation plot for each cohort
indicates the presence of selected mutations. Indicated p values are
for Kruskal-Wallis tests comparing continuous PTK2 expression against
presence or absence of specific mutations. B Top highly correlated
proteins with FAK.
[MATH: r2
:MATH]
values indicate Pearson correlation coefficients. C Relative FAK
protein expression in patients with and without mutant RUNX1, TP53, or
FLT3 in the AML Proteome Atlas. Two-sided
[MATH: t :MATH]
-test p-values are indicated for each comparison. D Western blot
assessment of FAK and RUNX1 expression in KG1a and THP-1 cell line
derivatives. Cell lines were treated with either (C) control shRNA, (1)
shRUNX1-1, or (2) shRUNX1-2, with molecular weights quantified in
kilodaltons. Quantifications scaled to controls are shown as bar plots.
Each blot represents a single experiment, with the exception of RUNX1
in THP-1, which was performed twice with similar results. E, F
Colony-forming cell assay for AML cell lines with short-hairpin RNAs
against RUNX1, and treated with FAK inhibitors VS-4718 (0.5 μm for
KG1a, 1.5 μm for THP-1) or Defactinib (1 μm for both cell lines). The
indicated p values correspond to two-sided
[MATH: t :MATH]
-tests for each comparison.
We then sought to test whether inhibition of FAK in cell line model
systems could be an effective therapeutic strategy for RUNX1- or
TP53-mutated cells. First, in the MDSL cell line model^[220]52,[221]53,
we observed that CRISPR-mediated inactivation of RUNX1 or TP53 led to
elevated FAK expression and increased sensitivity to the FAK inhibitor
defactinib (Supplementary Fig. [222]18). Next, we tested the AML cell
lines KG1a and THP-1, following shRNA-mediated RUNX1 knockdown.
Depletion of RUNX1 reduced the levels of RUNX1 protein and induced
levels of FAK (Fig. [223]8D). RUNX1 knockdown sensitized cells to
inhibition by the FAK inhibitors VS-4718 and defactinib
(Fig. [224]8E–F), suggesting that these inhibitors may be efficacious
in AML cases with loss-of-function RUNX1 and possibly TP53 mutations.
Together, these results demonstrate a rationale for additional testing
of FAK inhibitors in AML patients with RUNX1 or TP53 mutation. Further,
our analyses revealed that some APS-high cases that did not exhibit a
RUNX1 or TP53 variant also showed increased PTK2 expression, suggesting
that transcriptome-based testing, in addition to better risk
stratification, could permit better selection of therapies for myeloid
malignancies. In the meantime, our approach reveals potential genetic
variant biomarkers that could assist in selecting AML patients for FAK
inhibitor clinical trials in AML.
Discussion
We compared RNA-Seq to WES and WGS-based approaches and found RNA-Seq
to be superior for clinical assessment of myeloid malignancies. The
RNA-Seq assay was capable of recovering (either directly or indirectly)
all the clinically relevant chromosomal translocations and inversions
in the exploratory patient cohort, as well as identifying novel
translocations. The RNA-Seq assay showed improved accuracy for
recovering FLT3-ITD events at low VAF. We also observed that, rather
than suffering compared to WES for detection of clinically relevant SNV
and short indel variants, the RNA-Seq assay had higher sensitivity (for
coding regions in myeloid-relevant targets). Our validation analyses
demonstrated that the reproducibility and analytic validity of RNA-Seq
meet the necessary standards for clinical implementation. While not all
variants are expressed in bulk RNA-Seq data^[225]54, clinically
relevant driver mutations are invariably present. Due to the high
sensitivity of the assay for detection of alterations in FLT3 and
IDH1/IDH2, the RNA-Seq assay could be used to accurately assign
patients to therapies targeting those genes. Indeed, since the RNA-Seq
assay provides a wealth of additional information about the size,
breakpoints, and expression level of FLT3-ITD events (compared to PCR-
or panel-based assays), the assay could be used to prospectively
determine the relationship between these parameters and response to
FLT3 inhibition.
Through expression-signature-based approaches, we saw that
incorporation of gene expression information offers substantial
benefits in terms of identifying lower- and higher-risk patients, and
the potential for identifying patients who may respond to specific
therapies. In particular, the APS value is a hazard indicator of
similar magnitude to TP53 mutation, with high values indicating a very
poor prognosis. Intriguingly, the APS signature was also strongly
prognostic for overall survival when applied to pediatric AML cases,
suggesting that even though the mutational spectra are somewhat
different between adult and pediatric AML, similar biological pathways
drive AML in children and adults. The APS signature outperformed the
LSC17 score with respect to predicting outcome, and there are at least
three potential explanations for this. First, the improved performance
of the APS signature may be due to the use of the entirety of the
transcriptome to generate the model, thus allowing for the capture of
cell nonautonomous signals from the microenvironment, as opposed to the
LSC17 score, which is based solely on a leukemic stem cell signature.
Second, recent studies have demonstrated the potential for mature
leukemia cells to de-differentiate and re-gain stem-like
properties^[226]55. Since the APS signature was trained on bulk
peripheral blood or bone marrow patient material, it could capture
signals identifying those cases with nascent de-differentiation
potential that are likely to progress. Third, since the APS signature
was trained using RNA-Seq data rather than microarray data, it might
benefit from the improved resolution of RNA-Seq. Our approach
highlights the benefit of incorporating gene expression analysis into
the clinical assessment of myeloid malignancies, and the utility of
integrative analysis of genomic data for stratification and therapy
selection. In the near term, this clinically validated RNA-Seq assay
would significantly improve the stratification of patients to stem-cell
transplant.
Our work also serves as a proof-of-concept for the use of a clinical
RNA-Seq assay to identify activated or inactivated pathways,
potentially enabling predictions about the susceptibility of AML
patients to specific targeted therapies. In particular, we show the
correlation of integrin pathway activation with RUNX1 or TP53 variants,
but not with other high-risk AMLs expressing FLT3-ITD variants. Given
the extremely poor outcome of AML with either of these mutations, and
the lack of current therapies directed against these variants, our data
suggest a potential role for FAK inhibitors in treating AML with RUNX1
and possibly TP53 variants. However, we also show that PTK2 and
integrin activation occurs in cases lacking RUNX1 or TP53 variants,
which arguably makes RNA-Seq the clinical assay of choice in high-risk
AML. We suggest that a clinical RNA-Seq assay could also be adapted for
routine clinical use to identify patient-specific therapeutic options
at diagnosis using the previously analyzed cohorts as comparators.
Methods
Study design
We generated sequence data from an exploratory cohort of 176 patients,
generating RNA-Seq libraries for all patients (Fig. [227]1A,
Supplementary Fig. [228]1). The exploratory patient cohort spanned a
range of myeloid malignancies, including AML, MDS, sAML, tAML, and tMDS
(Supplementary Data [229]1–[230]4). We sequenced material from patient
bone marrow, peripheral blood, or leukapheresis samples collected at
diagnosis. For most samples, prior clinical test results were available
(though there were many cases of missing data, and not all patients
received the same tests as the standard of care evolved through time).
The exploratory RNA-Seq libraries were prepared in two separate
batches, referred to as ‘batch one’ and batch two’. To compare RNA-Seq
against alternate technological platforms, we generated WGS data for 18
samples and WES data for 89 samples (Supplementary Data [231]2).
We then constructed an informatic pipeline to detect and annotate SVs,
SNVs, short indels, and gene expression counts for all samples (as
appropriate for each platform, Supplementary Fig. [232]1B). SVs were
assessed in an unbiased fashion, while SNVs and short indels were
inferred for a set of curated genes with known clinical relevance in
myeloid malignancies (Supplementary Data [233]9). For the SV and
SNV/short indel analyses, we used the entire exploratory batch of 176
cases, while for subsequent gene expression and stratification analyses
we used only the 154 AML, sAML, and tAML cases, as these analyses were
intended to be AML-specific (Fig. [234]1A). To determine the
reproducibility of the analysis, we prepared and sequenced material
from a validation cohort consisting of repeated patient samples and
commercially obtained cell line material, prepared as technical
replicates in triplicate from single RNA extractions (Supplementary
Data [235]5, [236]14).
We also re-analyzed samples from the TCGA LAML^[237]19 and Beat
AML^[238]20 cohorts (Supplementary Data [239]7-[240]8). Sample details
for the TCGA LAML cohort were retrieved from the National Cancer
Institute Genomic Data Commons (GDC)
[[241]https://portal.gdc.cancer.gov/projects/TCGA-LAML], cBioportal
[[242]https://www.cbioportal.org/study/summary?id=laml_tcga_pub], the
Broad GDAC Firehose [[243]https://gdac.broadinstitute.org], and from
the Tumor Fusion Gene Data Portal
[[244]https://tumorfusions.org]^[245]56. For data from the Beat AML
project, see the Data Availability section of Tyner et al^[246]20. For
both the TCGA LAML and Beat AML cohorts, we used the published SNV,
short indel, and SV calls, but performed gene expression analyses using
the pipeline described below. To further evaluate the derived gene
expression signature, we analyzed data from an additional local
prospective cohort of 42 cases (Supplementary Data [247]6). In order to
determine whether the APS was applicable to pediatric AML, we analyzed
data from the TARGET pediatric AML project^[248]21, which was also
retrieved from the GDC
[[249]https://portal.gdc.cancer.gov/projects/TARGET-AML]. Matched
ribo-depleted libraries for four cases were prepared by the Centre for
Epigenome Mapping platform, available via European Genome-Phenome
Archive Study EGAS00001000552
[[250]https://www.ebi.ac.uk/ega/studies/EGAS00001000552].
Ethics approval and consent to participate
Peripheral blood, bone marrow, and leukapheresis samples were obtained
from consenting patients via the Hematology Cell Bank of British
Columbia
[[251]http://hematology.med.ubc.ca/research/hematology-cell-bank-of-bc/
]. Ethics protocols were approved by the BC Cancer Research Ethics
Board, under protocols H04-61292, H09-01779, H11-01484, and H13-02687.
RNA extraction, library construction, and sequencing
RNA was manually extracted from bone marrow or peripheral blood using
Qiagen Allprep kits. Total RNA samples were checked using Agilent
Bioanalyzer RNA nanochip or Caliper GX HT RNA LabChip. Samples that
passed quality check were arrayed into a 96-well plate. Following this,
polyA^+ RNA was purified using the 96-well MultiMACS mRNA isolation kit
on the MultiMACS 96 separator (Miltenyi Biotec, Germany) from total RNA
with on column DNaseI-treatment as per the manufacturer’s instructions.
The eluted polyA^+ RNA was ethanol precipitated and resuspended in
10 µL of DEPC-treated water with 1:20 SuperaseIN (Life Technologies,
USA).
Double-stranded cDNA was synthesized from the purified polyA^+ RNA
using the Maxima H Minus First Strand cDNA Synthesis Kit (Thermo Fisher
Scientific Inc., USA) and random hexamer primers. Quality passed cDNA
plate was fragmented by Covaris LE220 for 2 × 65 s at duty cycle of
30%. The paired-end sequencing library was prepared following Canada’s
Michael Smith Genome Sciences Centre paired-end library preparation
plate-based library construction protocol on a Biomek FX robot
(Beckman-Coulter, USA). Briefly, the cDNA was subject to end-repair,
and phosphorylation by T4 DNA polymerase, Klenow DNA Polymerase, and T4
polynucleotide kinase respectively in a single reaction, followed by
cleanup using magnetic beads and 3′ A-tailing by Klenow fragment (3′ to
5′ exo minus). After cleanup, adapter ligation was performed. The
adapter-ligated products were purified using magnetic beads, then UNG
digested and PCR-amplified with Phusion DNA Polymerase (Thermo Fisher
Scientific Inc., USA) using Illumina’s PE primer set in a single
reaction, with cycle condition 37 °C 15 min, 98 °C 1 min followed by 13
cycles of 98 °C 15 s, 65 °C 30 s and 72 °C 30 s, and then 72 °C 5 min.
The PCR products were purified and size selected using magnetic beads,
checked with Caliper LabChip GX for DNA samples using the High
Sensitivity Assay (PerkinElmer, Inc. USA) and quantified with the
Quant-iT dsDNA HS Assay Kit using Qubit fluorometer (Invitrogen).
Libraries were normalized and pooled. The final concentration was
determined by Qubit dsDNA HS Assay for Illumina Sequencing.
For the first exploratory cohort RNA-Seq batch, we sequenced a single
library per Illumina HiSeq 2000 lane, using 2 × 75 bp reads, which
resulted in ~400 million reads per library. Ninety-two samples were
initially submitted for library construction, of which 89 were
successfully prepared and sequenced. For the second exploratory cohort
RNA-Seq batch, we sequenced two libraries per Illumina HiSeq 2000 lane,
using 2 × 75 bp reads, which resulted in ~200 million reads per
library. Ninety-two samples were initially submitted for library
construction, of which 87 were successfully prepared and sequenced. For
the validation and prospective RNA-Seq cohorts, we sequenced two
libraries per Illumina HiSeq 2000 lane, using 2 × 100 bp reads, which
resulted in ~200 million reads per library. For a set of three samples,
we sequenced two HiSeq lanes. Seventy-one samples were successfully
prepared and sequenced for the validation cohort, and 42 for the
prospective cohort. For both the validation and prospective cohorts, a
modified strand-specific RNA-Seq library construction protocol was
used.
DNA extraction, library construction, and sequencing
DNA was extracted from bone marrow or peripheral blood using Qiagen
Allprep kits. Genome libraries with fragment size ranges of ~400 bp
were constructed on a SPRI-TE robot (Beckman Coulter, USA) according to
the manufacturer’s instructions (SPRIworks Fragment Library System I
Kit, A84801). Briefly, 1 µg of genomic DNA in a 60 µL volume, and
96-well format, was fragmented by Covaris E210 sonication for 30 s
using a duty cycle of 20% and intensity of 5. Up to 10 paired-end
genome sequencing libraries were prepared in parallel using the SPRI-TE
300-600 bp size-selection program. Following completion of the SPRI-TE
run the adapter-ligated library templates were quantified using a Qubit
fluorometer. Five nanograms of adapter-ligated template was PCR
amplified using Phusion DNA Polymerase (Thermo Fisher Scientific Inc.
USA) and Illumina’s PE indexed primer set, with cycle conditions: 98 °C
for 30 s followed by 10 cycles of 98 °C for 15 s, 62 °C for 30 s and
72 °C for 30 s, and a final amplicon extension at 72 °C for 5 min. The
PCR products were purified using Ampure XP SPRI beads, and analyzed
with Caliper LabChip GX for DNA samples using the High Sensitivity
Assay (PerkinElmer, Inc. USA). PCR products of the desired size range
were purified using gel electrophoresis (8% PAGE or 1.5% Metaphor
agarose gels in a custom built robot) and the DNA quality was assessed
and quantified using an Agilent DNA 1000 series II assay and Quant-iT
dsDNA HS Assay Kit using Qubit fluorometer (Invitrogen), then diluted
to 8 nM. The final concentration was verified by Quant-iT dsDNA HS
Assay prior to Illumina Sequencing.
Two hundred and fifty nanograms of the constructed genome libraries
were used to capture the exome using Agilent SureSelect Human All Exon
(50 Mb) capture probes. 89 exome libraries were sequenced with
paired-end 100 bp reads (1 library per HiSeq 2000 lane). 20 WGS
libraries were sequenced with paired-end 100 bp reads on the Illumina
HiSeq 2000 instrument, sequencing three lanes per sample.
Bioinformatics pipeline overview
All samples were processed using customized in-house bioinformatics
pipelines, as well as the bcbio-nextgen pipeline framework
[[252]http://bcbio-nextgen.readthedocs.io/]. The WGS and WES data was
initially processed by the Bioinformatics Core at Canada’s Michael
Smith Genome Sciences Centre, Vancouver, Canada, while the RNA-Seq data
was processed by the authors. All variant-calling and tertiary analyses
were performed by the authors.
Read alignment and quality control
WGS and WES sequence data were aligned to the hg19 human reference
genome using BWA (version 0.5.7, ‘aln’ and ‘sampe’
subcommands)^[253]57, with duplicates marked using Picard
[[254]http://broadinstitute.github.io/picard/]. RNA-Seq data was
aligned using GSNAP (version 2013-10-28)^[255]58, using the hg19
reference and with command-line arguments ‘–novelsplicing 1
–max-mismatches 10 –use-splicing’.
We used RNA-SeQC (version v1.1.8)^[256]59 to gather quality metrics for
the RNA-Seq libraries. To examine the range of variation and identify
outliers in our data set, we constructed Levey-Jennings charts^[257]60
for all the metrics provided by RNA-SeQC (Supplementary Fig. [258]2).
These charts identified several samples which were consistent outliers
(metric values exceeding plus or minus two standard deviations from the
mean). From the available RNA-SeQC metrics, we selected six key
variables (number of mapped reads, mapping rate, rRNA rate, intragenic
rate, mean fragment length, and exonic rate). We then failed all
libraries which had outlying values (plus or minus two standard
deviations from the mean) for three or more of those variables. The
failed validation cohort libraries all corresponded to cases where we
attempted to re-sequence patient material, starting from limited
amounts of input RNA. Of the original 176 libraries in the exploratory
cohort, three samples were failed, while six samples from the
validation cohort were failed due to poor library quality. The failures
were generally associated with poor input material quality as most of
the failed libraries came from attempts to re-generate additional
sequencing libraries from older material.
SNV and small indel detection
We used the bcbio-nextgen pipeline (version 1.0.1) to call variants
using three callers: VarScan (version 2.4.2)^[259]24, GATK
HaplotypeCaller (version 3.7)^[260]22, and FreeBayes (version
1.1.0)^[261]23. In all cases, the variant-calling region was restricted
to the clinical targets of interest (Supplementary Data [262]9). For
these targets, we either selected all coding exons, or specific
‘hot-spot’ targets to assess for SNVs and short indels. All variant
callers were configured as specified by the bcbio pipeline software. In
addition to the three variant callers, we used the bcbio ensemble tool,
configured to report a consensus variant file containing all variants
called by at least two callers, without being filtered by any single
caller. We used vt (version 0.57)^[263]61 to decompose multiallelic
calls, and to normalize indel positions. All variants were then
annotated using snpEff (version 4.3 g)^[264]62, with HGVS nomenclature
generated using a single chosen transcript model for each gene. We used
gemini (version 0.20.0)^[265]63 to further annotate variants against
reference data sets including COSMIC (version 68))^[266]64, ExAC (r0.3)
and gnomAD (r2.0.1)^[267]65, ClinVar (v20160502)^[268]66, and CADD
(version 1.0)^[269]67.
We first determined the clinically relevant SNVs and short indel
alterations for each patient. Briefly, from the unfiltered variant
calls, we filtered coding variants with moderate ExAC^[270]65
population frequencies (≥0.01 for synonymous variants, ≥0.1 for
missense variants), and recurrent low-quality and artefactual variants.
We validated the presence of selected mutations (for 157 cases where
the material was available) using mononuclear cells (153 cases) or red
cell lysed marrow cells (four cases), and determined the somatic status
for variants where patient material was available (147 cases) using
cultured marrow fibroblasts (107 cases) or sorted CD3^+ T cells (40
cases) from the same patient specimen, using targeted sequencing
experiments. This validation was carried out by Canada’s Michael Smith
Genome Sciences Centre, with PCR primer design and data reviewed by the
authors.
Since the raw call set for the RNA-Seq libraries contained many
splice-alignment artefacts, we filtered all variants annotated as
intronic or in splice donor/acceptor sites. All calls from the ensemble
set were retained, as well as selected calls reported by only a single
caller, as determined during the report review. Additional recurrent
artefactual variants in the RNA-Seq data were manually reviewed and
removed if they were determined to be both recurrent and the result of
splice-alignment artefacts, polyA/T runs, or poor base quality. For
some cases where complex indel calls were reported with slight
nomenclature differences between callers, coordinates were manually
reviewed and merged if they reported identical events (e.g., for an
indel in CEBPA with calls from GATK-haplotype (c.283_288delGTGGGC and
c.292delA) and FreeBayes (c.283_289delGTGGGCC and c.292A > C), the
nomenclature was adjusted to allow the two calls to be counted as
concordant with each other). We additionally labeled all variants that
were either (a) synonymous with adjusted ExAC population frequencies
≥0.01 or (b) missense variants with adjusted ExAC population
frequencies ≥0.1 as benign.
To compare variant call sets between matched RNA-Seq and WES or WGS
libraries, we generated merged variant sets for each relevant library
pair, using the ensemble call set. We used bedtools (version
2.24.0)^[271]68 and VarScan2 (v2.4.2)^[272]24 to generate coverage
depth and the observed non-reference allele frequency across all SNV
targets (Supplementary Data [273]9). We used a coverage threshold of
50× for reporting low coverage, and 100× for a warning threshold in the
RNA-Seq data. In addition, we used the single-caller calls for each
caller and library to compare differences between callers. For these
comparisons, we considered sites with <10-fold coverage to be
effectively uncallable. For several sites in the RNA-Seq data where the
matched WES/WGS data indicated a potential missed call, we identified
an alignment issue in the pipeline that was addressed in a software
update and so did not consider these sites as discordant.
To analyze libraries from the validation cohort, we retained all calls
from all callers (i.e., keeping common synonymous variants), with the
exceptions of known recurrent artefacts, SNV calls near FLT3-ITD
boundaries, and sites with coverage <10-fold coverage (Supplementary
Data [274]15). Note that for samples 157-18 and 213-51, two of three
replicates failed sequencing quality checks, therefore these samples
were excluded from further analysis. In the raw call set, we observed
several false-negative and false-positive calls that were due to low
sequence coverage of the gene of interest (which would be reported as
uncallable by our standard procedure) or one of eight recurrent
spurious artefacts (which were added to a list of known artefacts for
the pipeline). For example, CEBPA expression is very low in the NIST
standard cell line NA12878^[275]69 (Supplementary Fig. [276]6) compared
to patient material. We thus filtered these calls from the dataset
(Supplementary Data [277]15), and examined the remaining true-positive,
false-positive, and false-negative calls (Supplementary Fig. [278]7).
We calculated the number of true positive, false positive, and false
negative sites as the average among replicates for each cell line or
repeated patient sample. In addition, we calculated the coefficient of
variation between replicates in terms of the number of retained variant
calls per replicate.
Structural variation
To detect inter- and intra-gene SVs, we performed de novo assembly on
all the RNA-Seq libraries, using Trans-ABySS (version 1.5.2)^[279]70
and PAVfinder (versions 0.2.0 and 0.3.0)^[280]71. PAVfinder relied on
GMAP (version 2014-12-28)^[281]58 and BWA MEM (version
0.7.12-r1039)^[282]72 for contig-to-genome, contig-to-transcript, and
read-to-contig alignments. The parameters used for trans-ABySS included
assembly at three values of k (the kmer size used for assembly)—32, 52,
and 72. These three assemblies were merged before further analysis with
PAVfinder. We used custom R^[283]73 and Python tools to annotate all
events with gene expression estimates, to annotate calls with similar
coordinates to each other, and allow for annotation of known recurrent
artefactual and common events.
We then filtered the raw event lists as follows. ‘Recurrent Artefact’
and ‘Recurrent Common’ events were identified based on manual review of
recurrence within the cohort, fusion properties, and literature review
(Supplementary Data [284]20), and were removed. We then collapsed
similar events (i.e., events with the same gene partners) into single
events, and filtered out events matching any of these criteria: minimal
support (fewer than eight supporting reads), average TPM expression
value for the two fusion partners <2 or >130), inter-gene distance for
putative fusion partners on the same chromosome of <2 × 10^6 bp, or
putative fusions involving HBA2.
We then constructed a list of expected gene fusions present in the
exploratory cohort using the cytogenetics nomenclature and determined
whether those events were captured in the RNA-Seq data. To construct
the set of novel gene fusions used for patient stratification and
downstream analysis, the set of fusion results was further filtered by
manual review, selecting gene fusions with strong supporting evidence
and/or genes previously known to be involved in myeloid malignancies
(Supplementary Data [285]21–[286]22).
To further verify the presence of the reported gene fusions, we
performed the following additional validation steps. All of the
retained fusions in disease-related genes (Supplementary Data [287]21)
were manually reviewed for supporting read evidence in IGV. In
addition, where copy-number array, WES, WGS, or gene panel data was
available, those data sources were also reviewed. Due to limited
patient material availability, validation of the remaining cases was
not able to be performed. This review identified two sets of fusion
partners (t(5;12) IL31RA-CTDSP2; t(X;Y) CD99P1-CD99) that were likely
to be false-positive results—these results were removed from downstream
analysis.
The reference truth set for FLT3-ITD calls was constructed principally
from prior research testing results using a PCR-based assay. The
complete set of PAVfinder results for each RNA-Seq library was filtered
for events annotated as ‘ITD’ or ‘ins’ (for insertion) within FLT3.
Each event was then annotated with the expression of FLT3 in
transcripts per million (TPM). For each library in the exploratory
cohort where a reference call was available, we assessed whether the
PAVfinder output agreed or disagreed with the reference call. To
estimate the relationship between RNA-Seq evidence and the allelic
fraction of detected FLT3-ITD events, we performed linear regression
analysis, modeling the observed VAF from GATK HaplotypeCaller (for
those ITD events that were captured by that tool in the RNA-Seq and WES
datasets) against combinations of spanning read support, ITD size,
median coverage over the FLT3 ITD region, and expression of FLT3. The
best model accuracy was obtained using only RNA-Seq VAF estimates,
using a linear model predicting FLT3-ITD VAF = (spanning reads * 0.293)
+ (median FLT3 coverage * −0.0213) + 29.4. This regression was then
used to estimate the VAF for all detected FLT3-ITD events. We then
dichotomized FLT3-ITD events into ‘FLT3-ITD-low’ and ‘FLT3-ITD-high’
support, using a VAF cutoff of 0.33 (corresponding to a mutant:wildtype
allelic ratio of 0.5). For selected FLT3-ITD events, we used
panel-based sequencing to validate the events. For the remaining cases,
the detected events were reviewed manually using IGV^[288]74. To detect
KMT2A-PTD events, we filtered the complete event set for events
annotated as ‘ins’, ‘ITD’, or ‘PTD’, with either ‘gene1’ or ‘gene2’
annotated as KMT2A. Two low-evidence ITD events were removed based on
manual inspection. Again, we used panel-based sequencing to confirm
that two additional low-evidence calls were truly false-positives in
the initial RNA-Seq result set.
Gene expression signatures
Expression quantification was performed for all RNA-Seq libraries using
sailfish (version 0.9.0)^[289]75, using RefSeq gene models downloaded
in Gene Transfer Format from the UCSC genome browser on 2014-08-21,
with gene models from non-standard chromosome sequences removed. Both
isoform- and gene-specific quantifications were generated, and both raw
estimated counts, as well as transcripts-per-million (TPM) estimates
were used in downstream analysis. For the TCGA LAML and Beat AML
cohorts, we re-quantified gene expression using the same software and
gene models.
From the raw, gene-level expression estimates, we filtered the data as
follows (calculating thresholds separately for each cohort). First,
genes with median expression ≤0 TPM were removed. Second, to normalize
counts across samples, we converted counts from TPM to log[2](TPM + 1)
and applied quantile normalization using the preprocessCore
Bioconductor package (version 1.48.0)^[290]76. To visualize and
determine outlier status for single genes, we calculated z-scaled
expression estimates (where
[MATH:
z=(x−μ)/σ :MATH]
) for each gene by comparing the expression of that gene in a
particular sample against the range of expression for that gene across
a given patient cohort. For MECOM outlier analysis, we labeled all
samples with z-scaled expression ≥3 as high outliers. For other genes
with known prognostic relevance in AML (FLT3, GPR56, BAALC, MN1, and
BCAT1), we used the z-scaled expression values as continuous variables
for survival modeling. We calculated values for the LSC17 gene
expression signature using the model coefficients described by Ng et
al.^[291]10, using the normalized, TPM-scaled expression estimates for
the AML PMP and TCGA LAML cohorts. We then dichotomized each cohort,
using the median LSC17 score within each cohort to define high and low
scores.
We used the R glmnet package (version 4.0-2)^[292]35,[293]77 to perform
LASSO regression to identify a subset of features that could be used to
fit a Cox regression model. We first filtered the retained gene set by
restricting the set to genes with a minimum expression of 0.5 (in
log[2](TPM + 1) units) across the AML PMP, TCGA LAML, and Beat AML
cohorts. We initially used 10-fold cross-validation, using the 154
AML-like samples from the exploratory cohort as the training set, using
parameters “family = ‘cox’, maxit = 10000” to select the value for the
[MATH: λ :MATH]
parameter with the minimum mean cross-validated error. We then used
this
[MATH: λ :MATH]
value to extract the final set of 16 genes and model coefficients (the
APS gene expression signature, Supplementary Data [294]25). We then
used the ‘predict’ function within the glmnet package to calculate
model predictions (i.e., APS values) for each of the samples,
dichotomizing patients by the median value into ‘APS-High’ and
‘APS-Low’ sets. APS values were applied to the TCGA LAML and Beat AML
cohorts in a similar manner (and dichotomized using the median
within-cohort APS values).
Expression-based survival models
To derive revised models for AML stratification that incorporated gene
expression information, we first tested many potential predictors in
univariate Cox proportional hazards analyses (Supplementary
Data [295]27). To control for multiple testing, we adjusted the
estimated p values using the false-discovery rate (FDR) method^[296]78.
To facilitate visualization, we transformed the adjusted p values by
calculating −10log[10](adj. p), using a threshold of 10 for
significance (corresponding to an adjusted p value of 0.10). Since none
of the APL patients succumbed to the disease, we could not calculate a
hazard ratio for the t(15;17) alteration in the AML PMP cohort.
Patient stratification by standard and expression-based models
Initial patient stratifications (ELN-RNA) were performed based ELN
molecular guidelines^[297]5, modified to incorporate equivalent
information available through the RNA-Seq assay (Supplementary
Data [298]28). In addition, we included t(15;17) translocations as a
favorable risk marker, and PTD in KMT2A (MLL) as an adverse risk
marker, based on their known strong influence on AML
outcomes^[299]6,[300]34. For the ELN-RNA-APS model, we re-stratified
patients with first-tercile or third-tercile values for the APS value
into the favorable and adverse risk categories, respectively. In each
case, custom R functions were written to stratify patients based on the
presence or absence of particular molecular markers. We then determined
the numbers of patients with agreeing or disagreeing risk
stratifications by the different criteria, and manually reviewed
discordant cases. We estimated survival outcomes for each patient
stratification model using the Kaplan–Meier estimator, using the R
survival (version 3.2-7)^[301]79 and survminer (version 0.4.8)^[302]80
packages. To compare the RNA-Seq-based stratifications with
stratifications utilizing diagnostic karyotyping information, we
re-stratified the patients using karyotype information (rather than
gene fusion information), for the ELN-Cyto model, and applied the same
APS rules to generate the ELN-Cyto-APS model (Supplementary
Fig. [303]15).
Pathway and differential expression analysis
We used IPA (IPA, December 2018 Release)^[304]41 to perform a Core
Analysis separately for each cohort, retaining all molecules meeting
the significance and fold-change thresholds, with other analysis
parameters set to their default values. To compare enriched pathways
across cohorts, we ranked the pathways by the number of cohorts they
occurred in, then by the mean activation
[MATH: z :MATH]
-score (as calculated by IPA). We used GSEA (version 3.0)^[305]42 to
perform gene-set enrichment analysis for selected pathway datasets
including the MSigDB hallmark gene set collection^[306]81, the Kyoto
Encyclopedia of Genes and Genomes (KEGG)^[307]82, Reactome^[308]43, the
Gene Ontology resource^[309]83, and Wikipathways^[310]84. For the GSEA
analysis, we used sets of genes ranked by their fold-change values as
input, gene sets from MSigDB v6.2, and command-line arguments ‘-norm
meandiv -nperm 1000 -scoring_scheme weighted -set_max 500 -set_min 15’.
We also performed exploratory analyses using EnrichR (version
2.1)^[311]85. To identify recurrently implicated molecules, we
inspected the IPA enriched molecule lists and GSEA leading edge genes
from the significantly enriched pathways, and identified molecules that
occurred repeatedly across those pathways.
We performed differential expression analysis using DESeq2 (version
1.26.0)^[312]86, comparing patients with first-tercile vs.
third-tercile APS values from within each cohort (removing APL patients
bearing t(15;17) translocations). For the DESeq2 analysis, we used the
raw expression count data, with all parameters set to their default
values. For considering genes to be significantly up- or downregulated
for downstream analysis, we used thresholds of FDR-adjusted p value of
≤0.1 and absolute log[2] foldchange ≥1 for all downstream analyses.
Proteomics data analysis
Proteomics data were retrieved from the AML Proteome Atlas^[313]51. To
identify proteins correlated with FAK protein expression, we calculated
Pearson correlation coefficients for FAK and all other proteins in the
dataset, and extracted the most highly correlated proteins. To compare
FAK protein expression in samples with different mutational status, we
extracted RUNX1 and TP53 mutation annotations from the experimental
design, and plotted relative FAK expression between samples with and
without those mutations, using two-sided t tests to determine
significance.
Cell line experiments
The RUNX1- and TP53-knockout MDSL cell lines were generated and
confirmed in Martinez-Hoyer et al.^[314]53. For shRNA knockdown
experiments, the shRNA lentiviral constructs targeting human RUNX1
(shRUNX1#1 TRCN0000013660 and shRUNX1 #2 TRCN0000338427) were purchased
from MilliporeSigma. The pLKO.1 backbones were modified to express EGFP
in place of the puromycin resistance gene. Non-targeting shScramble in
pLKO.1 vectors expressing EGFP were used as controls. The shRNA
Lentiviruses were prepared in HEK293T/17 cells, as previously
described^[315]53 and titres were typically in the
[MATH: 109
:MATH]
transducing units/ml range. KG1a and THP1 cells were resuspended in
growth media at 5 ×
[MATH: 105
:MATH]
cells/ml. Transduction was performed for 4–6 h in the presence of
8 µg/ml polybrene. After transduction, cells were washed and expanded
in growth media for 72 h, EGFP+ cells were further purified by
fluorescence-activated cell sorting.
For colony-forming cell assays (CFC), 500 lentivirus-transduced (GFP+)
KG1a or THP1 cells were plated in methylcellulose containing h-IL3,
h-erythropoietin, h-GM-CSF and h-SCF (H4434; StemCell technologies).
CFC medium was mixed before plating with FAK inhibitor or DMSO as
control. The colonies were scored after 7–10 days.
Western blotting was done as previously described^[316]53: cells were
pelleted and lysed in RIPA buffer (25 mM Tris, 150 mM NaCl, 1% TX-100,
0.25% sodium deoxycholate, and 0.1% SDS), to which protease and
phosphatase inhibitors were added. Cell extracts were centrifuged at
16,000 × g at 4 °C, and the supernatant was boiled for 5 min with
Laemmli buffer. Samples containing 20–50 μg protein were separated
using SDS–PAGE and transferred at 100 V for 1 h to nitrocellulose
membranes, which were subsequently blocked using 5% milk or BSA in
Tris-buffered saline with Tween-20 (20 mM Tris, 137 mM NaCl, 0.1%
Tween-20). Primary antibodies used were as follows: anti-RUNX1 (CST
4334, 1:1000), anti-FAK (CST 13009, 1:1000), anti-GAPDH (CST 2118,
1:5000).
For RT-qPCR, RNA was extracted using Qiagen AllPrep DNA/RNA mini kit
according to the manufacturer’s manual. For mRNA targets, cDNA was
synthesized using Maxima reverse transcriptase and random hexamer
(Thermo Scientific). RT-qPCR was performed using power SYBR green
master mix (Thermo Fisher Scientific), with the results normalized to
HPRT1 expression. Primers used for qPCR were as follows: PTK2_set 1
forward - GTCTGCCTTCGCTTCACG, reverse - GAATTTGTAACTGGAAGATGCAAG,
PTK2_set 2 forward GCGTCTAATCCGACAGCAACA, reverse
-CTCGAGAGAGTCTCACATCAGGTT, HPRT1 forward TGACCTTGATTTATTTTGCATACC,
reverse CGAGCAAGACGTTCAGTCCT.
Reporting summary
Further information on research design is available in the [317]Nature
Research Reporting Summary linked to this article.
Supplementary information
[318]Supplementary Information^ (10.9MB, pdf)
[319]Peer Review File^ (266.8KB, pdf)
[320]41467_2021_22625_MOESM3_ESM.pdf^ (134.2KB, pdf)
Description of Additional Supplementary Files
[321]Supplementary Data 1-28^ (275.1KB, xlsx)
[322]Reporting Summary^ (307.3KB, pdf)
Acknowledgements